Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [net-next PATCH 3/5] udp: Add support for software checksum and GSO_PARTIAL with GSO offload
From: Eric Dumazet @ 2018-05-04  3:50 UTC (permalink / raw)
  To: Alexander Duyck, netdev, willemb, davem, Steffen Klassert
In-Reply-To: <20180504003333.4496.7705.stgit@localhost.localdomain>



On 05/03/2018 05:33 PM, Alexander Duyck wrote:
> From: Alexander Duyck <alexander.h.duyck@intel.com>
> 
> This patch adds support for a software provided checksum and GSO_PARTIAL
> segmentation support. With this we can offload UDP segmentation on devices
> that only have partial support for tunnels.
> 
> Since we are no longer needing the hardware checksum we can drop the checks
> in the segmentation code that were verifying if it was present.
> 
> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
> ---
>  net/ipv4/udp_offload.c |   28 ++++++++++++++++++----------
>  net/ipv6/udp_offload.c |   11 +----------
>  2 files changed, 19 insertions(+), 20 deletions(-)
> 
> diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
> index 946d06d2aa0c..fd94bbb369b2 100644
> --- a/net/ipv4/udp_offload.c
> +++ b/net/ipv4/udp_offload.c
> @@ -217,6 +217,13 @@ struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb,
>  		return segs;
>  	}
>  
> +	/* GSO partial and frag_list segmentation only requires splitting
> +	 * the frame into an MSS multiple and possibly a remainder, both
> +	 * cases return a GSO skb. So update the mss now.
> +	 */
> +	if (skb_is_gso(segs))
> +		mss *= skb_shinfo(segs)->gso_segs;
> +
> 

I do not understand this code.

I am also seeing it in tcp, after commit 07b26c9454a2a ("gso: Support partial splitting at the frag_list pointer")

Presumably this broke tcp_gso_tstamp() , right ?

^ permalink raw reply

* Re: [PATCH RFC net-next] net: ipvs: Adjust gso_size for IPPROTO_TCP
From: David Ahern @ 2018-05-04  3:50 UTC (permalink / raw)
  To: Martin KaFai Lau, Julian Anastasov
  Cc: netdev, Tom Herbert, Eric Dumazet, Nikita Shirokov, kernel-team,
	lvs-devel
In-Reply-To: <20180503070114.bcuusvzga45klccs@kafai-mbp>

On 5/3/18 1:01 AM, Martin KaFai Lau wrote:
> On Wed, May 02, 2018 at 10:30:32PM +0300, Julian Anastasov wrote:
>>
>> 	Hello,
>>
>> On Wed, 2 May 2018, Martin KaFai Lau wrote:
>>
>>> On Wed, May 02, 2018 at 09:38:43AM +0300, Julian Anastasov wrote:
>>>>
>>>> - initial traffic for port 21 does not use GSO. But after
>>>> every packet IPVS calls maybe_update_pmtu (rt->dst.ops->update_pmtu)
>>>> to report the reduced MTU. These updates are stored in fnhe_pmtu
>>>> but they do not go to any route, even if we try to get fresh
>>>> output route. Why? Because the local routes are not cached, so
>>>> they can not use the fnhe. This is what my patch for route.c
>>>> will fix. With this fix FTP-DATA gets route with reduced PMTU.
>>> For IPv6, the 'if (rt6->rt6i_flags & RTF_LOCAL)' gate in
>>> __ip6_rt_update_pmtu() may need to be lifted also.
>>
>> 	Probably. I completely forgot the IPv6 part
>> but as I don't know the IPv6 code enough, it may take
>> some time to understand what can be the problem there...
>> I'm not sure whether everything started with commit 0a6b2a1dc2a2,
>> so that in some configurations before that commit things
>> worked and problem was not noticed.
>>
>> 	I think, we should focus on such direction for IPv6:
>>
>> - do we remember per-VIP PMTU for the local routes
> IPv6 used not to create cache route for DST_HOST route which
> is a /128 route (that includes local /128 route).
> 
> Because of this, it had a bug such that a PMTU for the DST_HOST
> route will trigger dst.ops->update_pmtu() which then set
> an expire on the permanent /128 route instead of a cache
> route.  The permanent route got unexpectedly expired/removed
> later.
> 
> The fix was to allow creating /128 cache route as long as
> it is not RTF_LOCAL in 653437d02f1f and 7035870d1219.  The
> first post spelled out the problem better:
> https://patchwork.ozlabs.org/patch/456050/
> 
> Later, when we only create cache route after seeing PMTU
> in 45e4fd26683c, this RTF_LOCAL checking was carried over
> to __ip6_rt_update_pmtu().
> 
> Out of my head, I don't see issue removing the
> RTF_LOCAL check from __ip6_rt_update_pmtu().
> DavidA, what do you think?

I agree; I think it is fine. A route is route. The IPVS use cases with
local redirects are blurring the line with needs for local routes.

^ permalink raw reply

* Re: possible deadlock in smc_close_non_accepted
From: Hangbin Liu @ 2018-05-04  3:45 UTC (permalink / raw)
  To: syzbot; +Cc: davem, linux-kernel, linux-s390, netdev, syzkaller-bugs, ubraun
In-Reply-To: <001a114218b2f8c0fe05666a2fb6@google.com>

#syz fix: net/smc: simplify wait when closing listen socket

^ permalink raw reply

* Re: VRF: ICMPV6 Echo Reply failed to egress if ingress pkt Src is IPV6 Global and Dest is IPV6 Link Local.
From: David Ahern @ 2018-05-04  3:32 UTC (permalink / raw)
  To: Sukumar Gopalakrishnan, netdev
In-Reply-To: <CADiZnkTJqH70iwpKmfCKo68-5T+aaq27gNNu76toWL6LAyeEcg@mail.gmail.com>

On 4/30/18 6:58 AM, Sukumar Gopalakrishnan wrote:
> VRF: ICMPV6 Echo Reply failed to egress if ingress pkt Src is IPV6
> Global and Dest is IPV6 Link Local.

...

> if (fl6->flowi6_oif == dev->ifindex) {

try adding ' && !rt6_need_strict(saddr)' to the above.

If it works, add a comment above the line noting this case (link local
dest and global source) submit a patch.

>                 dst = &net->ipv6.ip6_null_entry->dst;
>                 dst_hold(dst);
>                 return dst;
>         }
> ..
> 
> 
> TEMP FIX:
> =========
> 
> Return Ingress vlan device's ifIndex instead of skb->dev's.
> 
> static int icmp6_iif(const struct sk_buff *skb)
> {
>         int iif = IP6CB(skb)->iif; // skb->dev->ifindex;
> 
> ..
> ..
> }
> 

^ permalink raw reply

* [PATCH bpf v3] x86/cpufeature: bpf hack for clang not supporting asm goto
From: Yonghong Song @ 2018-05-04  3:31 UTC (permalink / raw)
  To: peterz, mingo, torvalds, ast, daniel, linux-kernel, x86, netdev
  Cc: kernel-team

Commit d0266046ad54 ("x86: Remove FAST_FEATURE_TESTS")
removed X86_FAST_FEATURE_TESTS and make macro static_cpu_has() always
use __always_inline function _static_cpu_has() funciton.
The static_cpu_has() uses gcc feature asm goto construct,
which is not supported by clang.

Issues
======

Currently, for BPF programs written in C, the only widely-supported
compiler is clang. Because of clang lacking support
of asm goto construct, if you try to compile
bpf programs under samples/bpf/,
   $ make -j20 && make headers_install && make samples/bpf/
you will see a lot of failures like below:

  clang  -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/7/include \
         -I/home/yhs/work/bpf-next/arch/x86/include \
         -I./arch/x86/include/generated  -I/home/yhs/work/bpf-next/include \
         -I./include -I/home/yhs/work/bpf-next/arch/x86/include/uapi \
         -I./arch/x86/include/generated/uapi \
         -I/home/yhs/work/bpf-next/include/uapi -I./include/generated/uapi \
         -include /home/yhs/work/bpf-next/include/linux/kconfig.h
         -Isamples/bpf \
	 -I/home/yhs/work/bpf-next/tools/testing/selftests/bpf/ \
	 -D__KERNEL__ -Wno-unused-value -Wno-pointer-sign \
	 -D__TARGET_ARCH_x86 -Wno-compare-distinct-pointer-types \
	 -Wno-gnu-variable-sized-type-not-at-end \
	 -Wno-address-of-packed-member -Wno-tautological-compare \
	 -Wno-unknown-warning-option  \
	 -O2 -emit-llvm -c /home/yhs/work/bpf-next/samples/bpf/xdp_redirect_cpu_kern.c \
         -o -| llc -march=bpf -filetype=obj -o samples/bpf/xdp_redirect_cpu_kern.o
  In file included from /home/yhs/work/bpf-next/samples/bpf/xdp_redirect_cpu_kern.c:10:
  In file included from /home/yhs/work/bpf-next/include/uapi/linux/in.h:24:
  In file included from /home/yhs/work/bpf-next/include/linux/socket.h:8:
  In file included from /home/yhs/work/bpf-next/include/linux/uio.h:13:
  In file included from /home/yhs/work/bpf-next/include/linux/thread_info.h:38:
  In file included from /home/yhs/work/bpf-next/arch/x86/include/asm/thread_info.h:53:
  /home/yhs/work/bpf-next/arch/x86/include/asm/cpufeature.h:150:2: error: 'asm goto' constructs are not supported yet
        asm_volatile_goto("1: jmp 6f\n"
        ^
  /home/yhs/work/bpf-next/include/linux/compiler-gcc.h:296:42: note: expanded from macro 'asm_volatile_goto'
  #define asm_volatile_goto(x...) do { asm goto(x); asm (""); } while (0)
                                           ^
  1 error generated.
  ...
  In file included from /home/yhs/work/bpf-next/samples/bpf/tracex4_kern.c:7:
  In file included from /home/yhs/work/bpf-next/include/linux/ptrace.h:6:
  In file included from /home/yhs/work/bpf-next/include/linux/sched.h:14:
  In file included from /home/yhs/work/bpf-next/include/linux/pid.h:5:
  In file included from /home/yhs/work/bpf-next/include/linux/rculist.h:11:
  In file included from /home/yhs/work/bpf-next/include/linux/rcupdate.h:40:
  In file included from /home/yhs/work/bpf-next/include/linux/preempt.h:81:
  In file included from /home/yhs/work/bpf-next/arch/x86/include/asm/preempt.h:7:
  In file included from /home/yhs/work/bpf-next/include/linux/thread_info.h:38:
  In file included from /home/yhs/work/bpf-next/arch/x86/include/asm/thread_info.h:53:
  /home/yhs/work/bpf-next/arch/x86/include/asm/cpufeature.h:150:2: error: 'asm goto' constructs are not supported yet
  ...

In the above, bpf compilation looks like below
  clang -target <native_arch> ... -emit-llvm -c -o - | llc -march=bpf ...
The clang compilation targets the native architecture and generates the LLVM IR bytecode, and
the llc compilation takes the IR bytecode and generates bpf instructions.

If kernel header files accessed by bpf program have asm goto construct, e.g.,
arch/x86/include/asm/cpufeature.h, the "clang -target <native_arch> ..."
compilation will fail as clang does not support asm goto yet.

Solution
=========

The solution proposed in this patch does not need user space change.
When the kernel detects the compiler has asm goto support,
it sets CC_HAVE_ASM_GOTO, the patch added the additional marco definition
__NO_CLANG_BPF_HACK. In arch/x86/include/asm/cpufeature.h, the
asm goto construct is accessed only if __NO_CLANG_BPF_HACK is defined.
This should not impact kernel compilation functionality since
for x86, we have
  ifndef CC_HAVE_ASM_GOTO
    $(error Compiler lacks asm-goto support.)
  endif
So __NO_CLANG_BPF_HACK should be always defined during kernel compilation
targeting x86.

User space code which compiles bpf program does not have
__NO_CLANG_BPF_HACK defined so it will go to alternate path
which avoids asm goto.

This approach is preferred since the already deployed bcc scripts, or
any other bpf applicaitons utilizing LLVM JIT compilation functionality,
will continue work with the new kernel without re-compilation and
re-deployment.

Note that this is a hack in the kernel to workaround bpf compilation issue.
The hack will be removed once clang starts to support asm goto.

Fixes: d0266046ad54 ("x86: Remove FAST_FEATURE_TESTS")
Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Yonghong Song <yhs@fb.com>
---
 Makefile                          | 1 +
 arch/x86/include/asm/cpufeature.h | 5 +++++
 2 files changed, 6 insertions(+)

Changelog:
 v2 -> v3:
   . Changed macro name from NO_BPF_WORKAROUND to __NO_CLANG_BPF_HACK.
     Explained the solution in more details.
 v1 -> v2:
   . Use NO_BPF_WORKAROUND macro instead of CC_HAVE_ASM_GOTO
     to make it explicit that this is a workaround.

Peter,

Could you take a look at this patch revision and give your opinion?
More and more people hit this issue and have to manually work around it.
It would be good if this can be resolved soon.

Thanks!

diff --git a/Makefile b/Makefile
index 83b6c54..cfd8759 100644
--- a/Makefile
+++ b/Makefile
@@ -504,6 +504,7 @@ export RETPOLINE_CFLAGS
 ifeq ($(call shell-cached,$(CONFIG_SHELL) $(srctree)/scripts/gcc-goto.sh $(CC) $(KBUILD_CFLAGS)), y)
   CC_HAVE_ASM_GOTO := 1
   KBUILD_CFLAGS += -DCC_HAVE_ASM_GOTO
+  KBUILD_CFLAGS += -D__NO_CLANG_BPF_HACK
   KBUILD_AFLAGS += -DCC_HAVE_ASM_GOTO
 endif
 
diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index b27da96..42edd5d 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -140,6 +140,8 @@ extern void clear_cpu_cap(struct cpuinfo_x86 *c, unsigned int bit);
 
 #define setup_force_cpu_bug(bit) setup_force_cpu_cap(bit)
 
+/* this macro is a temporary hack for bpf until clang gains asm-goto support */
+#ifdef __NO_CLANG_BPF_HACK
 /*
  * Static testing of CPU features.  Used the same as boot_cpu_has().
  * These will statically patch the target code for additional
@@ -195,6 +197,9 @@ static __always_inline __pure bool _static_cpu_has(u16 bit)
 		boot_cpu_has(bit) :				\
 		_static_cpu_has(bit)				\
 )
+#else
+#define static_cpu_has(bit)		boot_cpu_has(bit)
+#endif
 
 #define cpu_has_bug(c, bit)		cpu_has(c, (bit))
 #define set_cpu_bug(c, bit)		set_cpu_cap(c, (bit))
-- 
2.9.5

^ permalink raw reply related

* [PATCH V3] net/netlink: make sure the headers line up actual value output
From: YU Bo @ 2018-05-04  3:09 UTC (permalink / raw)
  To: davem, xiyou.wangcong, tsu.yubo, yuzibode; +Cc: netdev

Making sure the headers line up properly with the actual value output of the command
`cat /proc/net/netlink`

Before the patch:
<sk       Eth Pid    Groups   Rmem     Wmem     Dump     Locks     Drops     Inode
<ffff8cd2c2f7b000 0   909    00000550 0        0        0 2        0        18946

After the patch:
>sk               Eth Pid        Groups   Rmem     Wmem     Dump  Locks    Drops    Inode
>0000000033203952 0   897        00000113 0        0        0     2        0        14906

Signed-off-by: Bo YU <tsu.yubo@gmail.com>
---
Changes in V3:
  - Drop external website for commit message
  - Modify the commit accroding to DavidM
Changes in V2:
  - Do not break the code line

 net/netlink/af_netlink.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 55342c4d5cec..2e2dd88fc79f 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -2606,13 +2606,13 @@ static int netlink_seq_show(struct seq_file *seq, void *v)
 {
 	if (v == SEQ_START_TOKEN) {
 		seq_puts(seq,
-			 "sk       Eth Pid    Groups   "
-			 "Rmem     Wmem     Dump     Locks     Drops     Inode\n");
+			 "sk               Eth Pid        Groups   "
+			 "Rmem     Wmem     Dump  Locks    Drops    Inode\n");
 	} else {
 		struct sock *s = v;
 		struct netlink_sock *nlk = nlk_sk(s);

-		seq_printf(seq, "%pK %-3d %-6u %08x %-8d %-8d %d %-8d %-8d %-8lu\n",
+		seq_printf(seq, "%pK %-3d %-10u %08x %-8d %-8d %-5d %-8d %-8d %-8lu\n",
 			   s,
 			   s->sk_protocol,
 			   nlk->portid,

^ permalink raw reply related

* [PATCH V3] net/netlink: make sure the headers line up actual value output
From: YU Bo @ 2018-05-04  2:59 UTC (permalink / raw)
  To: davem, xiyou.wangcong, yuzibode, tsu.yubo; +Cc: netdev, janitors

Making sure the headers line up properly with the actual value output of the command
`cat /proc/net/netlink`

Before the patch:
<sk       Eth Pid    Groups   Rmem     Wmem     Dump     Locks     Drops     Inode
<ffff8cd2c2f7b000 0   909    00000550 0        0        0 2        0        18946

After the patch:
>sk               Eth Pid        Groups   Rmem     Wmem     Dump  Locks    Drops    Inode
>0000000033203952 0   897        00000113 0        0        0     2        0        14906

Signed-off-by: Bo YU <tsu.yubo@gmail.com>
---
Changes in V3:
  - Drop external website for commit message
  - Modify the commit accroding to DavidM
Changes in V2:
  - Do not break the code line

 net/netlink/af_netlink.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 55342c4d5cec..2e2dd88fc79f 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -2606,13 +2606,13 @@ static int netlink_seq_show(struct seq_file *seq, void *v)
 {
 	if (v == SEQ_START_TOKEN) {
 		seq_puts(seq,
-			 "sk       Eth Pid    Groups   "
-			 "Rmem     Wmem     Dump     Locks     Drops     Inode\n");
+			 "sk               Eth Pid        Groups   "
+			 "Rmem     Wmem     Dump  Locks    Drops    Inode\n");
 	} else {
 		struct sock *s = v;
 		struct netlink_sock *nlk = nlk_sk(s);

-		seq_printf(seq, "%pK %-3d %-6u %08x %-8d %-8d %d %-8d %-8d %-8lu\n",
+		seq_printf(seq, "%pK %-3d %-10u %08x %-8d %-8d %-5d %-8d %-8d %-8lu\n",
 			   s,
 			   s->sk_protocol,
 			   nlk->portid,

^ permalink raw reply related

* [bpf-next v2 9/9] samples/bpf: Add example of ipv4 and ipv6 forwarding in XDP
From: David Ahern @ 2018-05-04  2:54 UTC (permalink / raw)
  To: netdev, borkmann, ast
  Cc: davem, shm, roopa, brouer, toke, john.fastabend, David Ahern
In-Reply-To: <20180504025432.23451-1-dsahern@gmail.com>

Simple example of fast-path forwarding. It has a serious flaw
in not verifying the egress device index supports XDP forwarding.
If the egress device does not packets are dropped.

Take this only as a simple example of fast-path forwarding.

Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
---
 samples/bpf/Makefile                      |   4 +
 samples/bpf/xdp_fwd_kern.c                | 113 +++++++++++++++++++++++++
 samples/bpf/xdp_fwd_user.c                | 136 ++++++++++++++++++++++++++++++
 tools/testing/selftests/bpf/bpf_helpers.h |   3 +
 4 files changed, 256 insertions(+)
 create mode 100644 samples/bpf/xdp_fwd_kern.c
 create mode 100644 samples/bpf/xdp_fwd_user.c

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index 8e0c7fb6d7cc..28513d6be1bf 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -46,6 +46,7 @@ hostprogs-y += syscall_tp
 hostprogs-y += cpustat
 hostprogs-y += xdp_adjust_tail
 hostprogs-y += xdpsock
+hostprogs-y += xdp_fwd
 
 # Libbpf dependencies
 LIBBPF := ../../tools/lib/bpf/bpf.o ../../tools/lib/bpf/nlattr.o
@@ -100,6 +101,7 @@ syscall_tp-objs := bpf_load.o $(LIBBPF) syscall_tp_user.o
 cpustat-objs := bpf_load.o $(LIBBPF) cpustat_user.o
 xdp_adjust_tail-objs := bpf_load.o $(LIBBPF) xdp_adjust_tail_user.o
 xdpsock-objs := bpf_load.o $(LIBBPF) xdpsock_user.o
+xdp_fwd-objs := bpf_load.o $(LIBBPF) xdp_fwd_user.o
 
 # Tell kbuild to always build the programs
 always := $(hostprogs-y)
@@ -154,6 +156,7 @@ always += syscall_tp_kern.o
 always += cpustat_kern.o
 always += xdp_adjust_tail_kern.o
 always += xdpsock_kern.o
+always += xdp_fwd_kern.o
 
 HOSTCFLAGS += -I$(objtree)/usr/include
 HOSTCFLAGS += -I$(srctree)/tools/lib/
@@ -201,6 +204,7 @@ HOSTLOADLIBES_syscall_tp += -lelf
 HOSTLOADLIBES_cpustat += -lelf
 HOSTLOADLIBES_xdp_adjust_tail += -lelf
 HOSTLOADLIBES_xdpsock += -lelf -pthread
+HOSTLOADLIBES_xdp_fwd += -lelf
 
 # Allows pointing LLC/CLANG to a LLVM backend with bpf support, redefine on cmdline:
 #  make samples/bpf/ LLC=~/git/llvm/build/bin/llc CLANG=~/git/llvm/build/bin/clang
diff --git a/samples/bpf/xdp_fwd_kern.c b/samples/bpf/xdp_fwd_kern.c
new file mode 100644
index 000000000000..7eeaa32538b1
--- /dev/null
+++ b/samples/bpf/xdp_fwd_kern.c
@@ -0,0 +1,113 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2017-18 David Ahern <dsahern@gmail.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ */
+#define KBUILD_MODNAME "foo"
+#include <uapi/linux/bpf.h>
+#include <linux/in.h>
+#include <linux/if_ether.h>
+#include <linux/if_packet.h>
+#include <linux/if_vlan.h>
+#include <linux/ip.h>
+#include <linux/ipv6.h>
+
+#include "bpf_helpers.h"
+
+#define IPV6_FLOWINFO_MASK              cpu_to_be32(0x0FFFFFFF)
+
+struct bpf_map_def SEC("maps") tx_port = {
+	.type = BPF_MAP_TYPE_DEVMAP,
+	.key_size = sizeof(int),
+	.value_size = sizeof(int),
+	.max_entries = 64,
+};
+
+static __always_inline int xdp_fwd_flags(struct xdp_md *ctx, u32 flags)
+{
+	void *data_end = (void *)(long)ctx->data_end;
+	void *data = (void *)(long)ctx->data;
+	struct bpf_fib_lookup fib_params;
+	struct ethhdr *eth = data;
+	int out_index;
+	u16 h_proto;
+	u64 nh_off;
+
+	nh_off = sizeof(*eth);
+	if (data + nh_off > data_end)
+		return XDP_DROP;
+
+	__builtin_memset(&fib_params, 0, sizeof(fib_params));
+
+	h_proto = eth->h_proto;
+	if (h_proto == htons(ETH_P_IP)) {
+		struct iphdr *iph = data + nh_off;
+
+		if (iph + 1 > data_end)
+			return XDP_DROP;
+
+		fib_params.family	= AF_INET;
+		fib_params.tos		= iph->tos;
+		fib_params.l4_protocol	= iph->protocol;
+		fib_params.sport	= 0;
+		fib_params.dport	= 0;
+		fib_params.tot_len	= ntohs(iph->tot_len);
+		fib_params.ipv4_src	= iph->saddr;
+		fib_params.ipv4_dst	= iph->daddr;
+	} else if (h_proto == htons(ETH_P_IPV6)) {
+		struct ipv6hdr *iph = data + nh_off;
+
+		if (iph + 1 > data_end)
+			return XDP_DROP;
+
+		fib_params.family	= AF_INET6;
+		fib_params.flowlabel	= *(__be32 *)iph & IPV6_FLOWINFO_MASK;
+		fib_params.l4_protocol	= iph->nexthdr;
+		fib_params.sport	= 0;
+		fib_params.dport	= 0;
+		fib_params.tot_len	= ntohs(iph->payload_len);
+		fib_params.ipv6_src	= iph->saddr;
+		fib_params.ipv6_dst	= iph->daddr;
+	} else {
+		return XDP_PASS;
+	}
+
+	fib_params.ifindex = ctx->ingress_ifindex;
+
+	out_index = bpf_fib_lookup(ctx, &fib_params, sizeof(fib_params), flags);
+
+	/* verify egress index has xdp support
+	 * TO-DO bpf_map_lookup_elem(&tx_port, &key) fails with
+	 *       cannot pass map_type 14 into func bpf_map_lookup_elem#1:
+	 * NOTE: without verification that egress index supports XDP
+	 *       forwarding packets are dropped.
+	 */
+	if (out_index > 0) {
+		memcpy(eth->h_dest, fib_params.dmac, ETH_ALEN);
+		memcpy(eth->h_source, fib_params.smac, ETH_ALEN);
+		return bpf_redirect_map(&tx_port, out_index, 0);
+	}
+
+	return XDP_PASS;
+}
+
+SEC("xdp_fwd")
+int xdp_fwd_prog(struct xdp_md *ctx)
+{
+	return xdp_fwd_flags(ctx, 0);
+}
+
+SEC("xdp_fwd_direct")
+int xdp_fwd_direct_prog(struct xdp_md *ctx)
+{
+	return xdp_fwd_flags(ctx, BPF_FIB_LOOKUP_DIRECT);
+}
+
+char _license[] SEC("license") = "GPL";
diff --git a/samples/bpf/xdp_fwd_user.c b/samples/bpf/xdp_fwd_user.c
new file mode 100644
index 000000000000..9c6606f57126
--- /dev/null
+++ b/samples/bpf/xdp_fwd_user.c
@@ -0,0 +1,136 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2017-18 David Ahern <dsahern@gmail.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ */
+
+#include <linux/bpf.h>
+#include <linux/if_link.h>
+#include <linux/limits.h>
+#include <net/if.h>
+#include <errno.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <stdbool.h>
+#include <string.h>
+#include <unistd.h>
+#include <fcntl.h>
+#include <libgen.h>
+
+#include "bpf_load.h"
+#include "bpf_util.h"
+#include "libbpf.h"
+
+
+static int do_attach(int idx, int fd, const char *name)
+{
+	int err;
+
+	err = bpf_set_link_xdp_fd(idx, fd, 0);
+	if (err < 0)
+		printf("ERROR: failed to attach program to %s\n", name);
+
+	return err;
+}
+
+static int do_detach(int idx, const char *name)
+{
+	int err;
+
+	err = bpf_set_link_xdp_fd(idx, -1, 0);
+	if (err < 0)
+		printf("ERROR: failed to detach program from %s\n", name);
+
+	return err;
+}
+
+static void usage(const char *prog)
+{
+	fprintf(stderr,
+		"usage: %s [OPTS] interface-list\n"
+		"\nOPTS:\n"
+		"    -d    detach program\n"
+		"    -D    direct table lookups (skip fib rules)\n",
+		prog);
+}
+
+int main(int argc, char **argv)
+{
+	char filename[PATH_MAX];
+	int opt, i, idx, err;
+	int prog_id = 0;
+	int attach = 1;
+	int ret = 0;
+
+	while ((opt = getopt(argc, argv, ":dD")) != -1) {
+		switch (opt) {
+		case 'd':
+			attach = 0;
+			break;
+		case 'D':
+			prog_id = 1;
+			break;
+		default:
+			usage(basename(argv[0]));
+			return 1;
+		}
+	}
+
+	if (optind == argc) {
+		usage(basename(argv[0]));
+		return 1;
+	}
+
+	if (attach) {
+		snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);
+
+		if (access(filename, O_RDONLY) < 0) {
+			printf("error accessing file %s: %s\n",
+				filename, strerror(errno));
+			return 1;
+		}
+
+		if (load_bpf_file(filename)) {
+			printf("%s", bpf_log_buf);
+			return 1;
+		}
+
+		if (!prog_fd[prog_id]) {
+			printf("load_bpf_file: %s\n", strerror(errno));
+			return 1;
+		}
+	}
+	if (attach) {
+		for (i = 1; i < 64; ++i)
+			bpf_map_update_elem(map_fd[0], &i, &i, 0);
+	}
+
+	for (i = optind; i < argc; ++i) {
+		idx = if_nametoindex(argv[i]);
+		if (!idx)
+			idx = strtoul(argv[i], NULL, 0);
+
+		if (!idx) {
+			fprintf(stderr, "Invalid arg\n");
+			return 1;
+		}
+		if (!attach) {
+			err = do_detach(idx, argv[i]);
+			if (err)
+				ret = err;
+		} else {
+			err = do_attach(idx, prog_fd[prog_id], argv[i]);
+			if (err)
+				ret = err;
+		}
+	}
+
+	return ret;
+}
diff --git a/tools/testing/selftests/bpf/bpf_helpers.h b/tools/testing/selftests/bpf/bpf_helpers.h
index 265f8e0e8ada..2375d06c706b 100644
--- a/tools/testing/selftests/bpf/bpf_helpers.h
+++ b/tools/testing/selftests/bpf/bpf_helpers.h
@@ -103,6 +103,9 @@ static int (*bpf_skb_get_xfrm_state)(void *ctx, int index, void *state,
 	(void *) BPF_FUNC_skb_get_xfrm_state;
 static int (*bpf_get_stack)(void *ctx, void *buf, int size, int flags) =
 	(void *) BPF_FUNC_get_stack;
+static int (*bpf_fib_lookup)(void *ctx, struct bpf_fib_lookup *params,
+			     int plen, __u32 flags) =
+	(void *) BPF_FUNC_fib_lookup;
 
 /* llvm builtin functions that eBPF C program may use to
  * emit BPF_LD_ABS and BPF_LD_IND instructions
-- 
2.11.0

^ permalink raw reply related

* [bpf-next v2 8/9] bpf: Provide helper to do forwarding lookups in kernel FIB table
From: David Ahern @ 2018-05-04  2:54 UTC (permalink / raw)
  To: netdev, borkmann, ast
  Cc: davem, shm, roopa, brouer, toke, john.fastabend, David Ahern
In-Reply-To: <20180504025432.23451-1-dsahern@gmail.com>

Provide a helper for doing a FIB and neighbor lookup in the kernel
tables from an XDP program. The helper provides a fastpath for forwarding
packets. If the packet is a local delivery or for any reason is not a
simple lookup and forward, the packet continues up the stack.

If it is to be forwarded, the forwarding can be done directly if the
neighbor is already known. If the neighbor does not exist, the first
few packets go up the stack for neighbor resolution. Once resolved, the
xdp program provides the fast path.

On successful lookup the nexthop dmac, current device smac and egress
device index are returned.

The API supports IPv4, IPv6 and MPLS protocols, but only IPv4 and IPv6
are implemented in this patch. The API includes layer 4 parameters if
the XDP program chooses to do deep packet inspection to allow compare
against ACLs implemented as FIB rules.

Header rewrite is left to the XDP program.

The lookup takes 2 flags:
- BPF_FIB_LOOKUP_DIRECT to do a lookup that bypasses FIB rules and goes
  straight to the table associated with the device (expert setting for
  those looking to maximize throughput)

- BPF_FIB_LOOKUP_OUTPUT to do a lookup from the egress perspective.
  Default is an ingress lookup.

Initial performance numbers collected by Jesper, forwarded packets/sec:

       Full stack    XDP FIB lookup    XDP Direct lookup
IPv4   1,947,969       7,074,156          7,415,333
IPv6   1,728,000       6,165,504          7,262,720

These number are single CPU core forwarding on a Broadwell
E5-1650 v4 @ 3.60GHz.

Signed-off-by: David Ahern <dsahern@gmail.com>
---
 include/uapi/linux/bpf.h |  83 ++++++++++++++-
 net/core/filter.c        | 266 +++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 348 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 93d5a4eeec2a..ddc566cb7492 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -10,6 +10,8 @@
 
 #include <linux/types.h>
 #include <linux/bpf_common.h>
+#include <linux/if_ether.h>
+#include <linux/in6.h>
 
 /* Extended instruction set based on top of classic BPF */
 
@@ -1826,6 +1828,33 @@ union bpf_attr {
  * 	Return
  * 		0 on success, or a negative error in case of failure.
  *
+ *
+ * int bpf_fib_lookup(void *ctx, struct bpf_fib_lookup *params, int plen, u32 flags)
+ *	Description
+ *		Do FIB lookup in kernel tables using parameters in *params*.
+ *		If lookup is successful and result shows packet is to be
+ *		forwarded, the neighbor tables are searched for the nexthop.
+ *		If successful (ie., FIB lookup shows forwarding and nexthop
+ *		is resolved), the nexthop address is returned in ipv4_dst,
+ *		ipv6_dst or mpls_out based on family, smac is set to mac
+ *		address of egress device, dmac is set to nexthop mac address,
+ *		rt_metric is set to metric from route.
+ *
+ *             *plen* argument is the size of the passed in struct.
+ *             *flags* argument can be one or more BPF_FIB_LOOKUP_ flags:
+ *
+ *             **BPF_FIB_LOOKUP_DIRECT** means do a direct table lookup vs
+ *             full lookup using FIB rules
+ *             **BPF_FIB_LOOKUP_OUTPUT** means do lookup from an egress
+ *             perspective (default is ingress)
+ *
+ *             *ctx* is either **struct xdp_md** for XDP programs or
+ *             **struct sk_buff** tc cls_act programs.
+ *
+ *     Return
+ *             Egress device index on success, 0 if packet needs to continue
+ *             up the stack for further processing or a negative error in case
+ *             of failure.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -1896,7 +1925,8 @@ union bpf_attr {
 	FN(xdp_adjust_tail),		\
 	FN(skb_get_xfrm_state),		\
 	FN(get_stack),			\
-	FN(skb_load_bytes_relative),
+	FN(skb_load_bytes_relative),	\
+	FN(fib_lookup),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
@@ -2310,4 +2340,55 @@ struct bpf_raw_tracepoint_args {
 	__u64 args[0];
 };
 
+/* DIRECT:  Skip the FIB rules and go to FIB table associated with device
+ * OUTPUT:  Do lookup from egress perspective; default is ingress
+ */
+#define BPF_FIB_LOOKUP_DIRECT  BIT(0)
+#define BPF_FIB_LOOKUP_OUTPUT  BIT(1)
+
+struct bpf_fib_lookup {
+	/* input */
+	__u8	family;   /* network family, AF_INET, AF_INET6, AF_MPLS */
+
+	/* set if lookup is to consider L4 data - e.g., FIB rules */
+	__u8	l4_protocol;
+	__be16	sport;
+	__be16	dport;
+
+	/* total length of packet from network header - used for MTU check */
+	__u16	tot_len;
+	__u32	ifindex;  /* L3 device index for lookup */
+
+	union {
+		/* inputs to lookup */
+		__u8	tos;		/* AF_INET  */
+		__be32	flowlabel;	/* AF_INET6 */
+
+		/* output: metric of fib result */
+		__u32 rt_metric;
+	};
+
+	union {
+		__be32		mpls_in;
+		__be32		ipv4_src;
+		struct in6_addr	ipv6_src;
+	};
+
+	/* input to bpf_fib_lookup, *dst is destination address.
+	 * output: bpf_fib_lookup sets to gateway address
+	 */
+	union {
+		/* return for MPLS lookups */
+		__be32		mpls_out[4];  /* support up to 4 labels */
+		__be32		ipv4_dst;
+		struct in6_addr	ipv6_dst;
+	};
+
+	/* output */
+	__be16	h_vlan_proto;
+	__be16	h_vlan_TCI;
+	__u8	smac[ETH_ALEN];
+	__u8	dmac[ETH_ALEN];
+};
+
 #endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/net/core/filter.c b/net/core/filter.c
index 6877426c23a6..cf0d27acf1d1 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -60,6 +60,10 @@
 #include <net/xfrm.h>
 #include <linux/bpf_trace.h>
 #include <net/xdp_sock.h>
+#include <linux/inetdevice.h>
+#include <net/ip_fib.h>
+#include <net/flow.h>
+#include <net/arp.h>
 
 /**
  *	sk_filter_trim_cap - run a packet through a socket filter
@@ -4032,6 +4036,264 @@ static const struct bpf_func_proto bpf_skb_get_xfrm_state_proto = {
 };
 #endif
 
+#if IS_ENABLED(CONFIG_INET) || IS_ENABLED(CONFIG_IPV6)
+static int bpf_fib_set_fwd_params(struct bpf_fib_lookup *params,
+				  const struct neighbour *neigh,
+				  const struct net_device *dev)
+{
+	memcpy(params->dmac, neigh->ha, ETH_ALEN);
+	memcpy(params->smac, dev->dev_addr, ETH_ALEN);
+	params->h_vlan_TCI = 0;
+	params->h_vlan_proto = 0;
+
+	return dev->ifindex;
+}
+#endif
+
+#if IS_ENABLED(CONFIG_INET)
+static int bpf_ipv4_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
+			       u32 flags)
+{
+	struct in_device *in_dev;
+	struct neighbour *neigh;
+	struct net_device *dev;
+	struct fib_result res;
+	struct fib_nh *nh;
+	struct flowi4 fl4;
+	int err;
+
+	dev = dev_get_by_index_rcu(net, params->ifindex);
+	if (unlikely(!dev))
+		return -ENODEV;
+
+	/* verify forwarding is enabled on this interface */
+	in_dev = __in_dev_get_rcu(dev);
+	if (unlikely(!in_dev || !IN_DEV_FORWARD(in_dev)))
+		return 0;
+
+	if (flags & BPF_FIB_LOOKUP_OUTPUT) {
+		fl4.flowi4_iif = 1;
+		fl4.flowi4_oif = params->ifindex;
+	} else {
+		fl4.flowi4_iif = params->ifindex;
+		fl4.flowi4_oif = 0;
+	}
+	fl4.flowi4_tos = params->tos & IPTOS_RT_MASK;
+	fl4.flowi4_scope = RT_SCOPE_UNIVERSE;
+	fl4.flowi4_flags = 0;
+
+	fl4.flowi4_proto = params->l4_protocol;
+	fl4.daddr = params->ipv4_dst;
+	fl4.saddr = params->ipv4_src;
+	fl4.fl4_sport = params->sport;
+	fl4.fl4_dport = params->dport;
+
+	if (flags & BPF_FIB_LOOKUP_DIRECT) {
+		u32 tbid = l3mdev_fib_table_rcu(dev) ? : RT_TABLE_MAIN;
+		struct fib_table *tb;
+
+		tb = fib_get_table(net, tbid);
+		if (unlikely(!tb))
+			return 0;
+
+		err = fib_table_lookup(tb, &fl4, &res, FIB_LOOKUP_NOREF);
+	} else {
+		fl4.flowi4_mark = 0;
+		fl4.flowi4_secid = 0;
+		fl4.flowi4_tun_key.tun_id = 0;
+		fl4.flowi4_uid = sock_net_uid(net, NULL);
+
+		err = fib_lookup(net, &fl4, &res, FIB_LOOKUP_NOREF);
+	}
+
+	if (err || res.type != RTN_UNICAST)
+		return 0;
+
+	if (res.fi->fib_nhs > 1)
+		fib_select_path(net, &res, &fl4, NULL);
+
+	nh = &res.fi->fib_nh[res.nh_sel];
+
+	/* do not handle lwt encaps right now */
+	if (nh->nh_lwtstate)
+		return 0;
+
+	dev = nh->nh_dev;
+	if (unlikely(!dev))
+		return 0;
+
+	if (nh->nh_gw)
+		params->ipv4_dst = nh->nh_gw;
+
+	params->rt_metric = res.fi->fib_priority;
+
+	/* xdp and cls_bpf programs are run in RCU-bh so
+	 * rcu_read_lock_bh is not needed here
+	 */
+	neigh = __ipv4_neigh_lookup_noref(dev, (__force u32)params->ipv4_dst);
+	if (neigh)
+		return bpf_fib_set_fwd_params(params, neigh, dev);
+
+	return 0;
+}
+#endif
+
+#if IS_ENABLED(CONFIG_IPV6)
+static int bpf_ipv6_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
+			       u32 flags)
+{
+	struct neighbour *neigh;
+	struct net_device *dev;
+	struct inet6_dev *idev;
+	struct fib6_info *f6i;
+	struct flowi6 fl6;
+	int strict = 0;
+	int oif;
+
+	/* link local addresses are never forwarded */
+	if (rt6_need_strict(&params->ipv6_dst) ||
+	    rt6_need_strict(&params->ipv6_src))
+		return 0;
+
+	dev = dev_get_by_index_rcu(net, params->ifindex);
+	if (unlikely(!dev))
+		return -ENODEV;
+
+	idev = __in6_dev_get_safely(dev);
+	if (unlikely(!idev || !net->ipv6.devconf_all->forwarding))
+		return 0;
+
+	if (flags & BPF_FIB_LOOKUP_OUTPUT) {
+		fl6.flowi6_iif = 1;
+		oif = fl6.flowi6_oif = params->ifindex;
+	} else {
+		oif = fl6.flowi6_iif = params->ifindex;
+		fl6.flowi6_oif = 0;
+		strict = RT6_LOOKUP_F_HAS_SADDR;
+	}
+	fl6.flowlabel = params->flowlabel;
+	fl6.flowi6_scope = 0;
+	fl6.flowi6_flags = 0;
+	fl6.mp_hash = 0;
+
+	fl6.flowi6_proto = params->l4_protocol;
+	fl6.daddr = params->ipv6_dst;
+	fl6.saddr = params->ipv6_src;
+	fl6.fl6_sport = params->sport;
+	fl6.fl6_dport = params->dport;
+
+	if (flags & BPF_FIB_LOOKUP_DIRECT) {
+		u32 tbid = l3mdev_fib_table_rcu(dev) ? : RT_TABLE_MAIN;
+		struct fib6_table *tb;
+
+		tb = ipv6_stub->fib6_get_table(net, tbid);
+		if (unlikely(!tb))
+			return 0;
+
+		f6i = ipv6_stub->fib6_table_lookup(net, tb, oif, &fl6, strict);
+	} else {
+		fl6.flowi6_mark = 0;
+		fl6.flowi6_secid = 0;
+		fl6.flowi6_tun_key.tun_id = 0;
+		fl6.flowi6_uid = sock_net_uid(net, NULL);
+
+		f6i = ipv6_stub->fib6_lookup(net, oif, &fl6, strict);
+	}
+
+	if (unlikely(IS_ERR_OR_NULL(f6i) || f6i == net->ipv6.fib6_null_entry))
+		return 0;
+
+	if (unlikely(f6i->fib6_flags & RTF_REJECT ||
+	    f6i->fib6_type != RTN_UNICAST))
+		return 0;
+
+	if (f6i->fib6_nsiblings && fl6.flowi6_oif == 0)
+		f6i = ipv6_stub->fib6_multipath_select(net, f6i, &fl6,
+						       fl6.flowi6_oif, NULL,
+						       strict);
+
+	if (f6i->fib6_nh.nh_lwtstate)
+		return 0;
+
+	if (f6i->fib6_flags & RTF_GATEWAY)
+		params->ipv6_dst = f6i->fib6_nh.nh_gw;
+
+	dev = f6i->fib6_nh.nh_dev;
+	params->rt_metric = f6i->fib6_metric;
+
+	/* xdp and cls_bpf programs are run in RCU-bh so rcu_read_lock_bh is
+	 * not needed here. Can not use __ipv6_neigh_lookup_noref here
+	 * because we need to get nd_tbl via the stub
+	 */
+	neigh = ___neigh_lookup_noref(ipv6_stub->nd_tbl, neigh_key_eq128,
+				      ndisc_hashfn, &params->ipv6_dst, dev);
+	if (neigh)
+		return bpf_fib_set_fwd_params(params, neigh, dev);
+
+	return 0;
+}
+#endif
+
+BPF_CALL_4(bpf_xdp_fib_lookup, struct xdp_buff *, ctx,
+	   struct bpf_fib_lookup *, params, int, plen, u32, flags)
+{
+	if (plen < sizeof(*params))
+		return -EINVAL;
+
+	switch (params->family) {
+#if IS_ENABLED(CONFIG_INET)
+	case AF_INET:
+		return bpf_ipv4_fib_lookup(dev_net(ctx->rxq->dev), params,
+					   flags);
+#endif
+#if IS_ENABLED(CONFIG_IPV6)
+	case AF_INET6:
+		return bpf_ipv6_fib_lookup(dev_net(ctx->rxq->dev), params,
+					   flags);
+#endif
+	}
+	return 0;
+}
+
+static const struct bpf_func_proto bpf_xdp_fib_lookup_proto = {
+	.func		= bpf_xdp_fib_lookup,
+	.gpl_only	= true,
+	.ret_type	= RET_INTEGER,
+	.arg1_type      = ARG_PTR_TO_CTX,
+	.arg2_type      = ARG_PTR_TO_MEM,
+	.arg3_type      = ARG_CONST_SIZE,
+	.arg4_type	= ARG_ANYTHING,
+};
+
+BPF_CALL_4(bpf_skb_fib_lookup, struct sk_buff *, skb,
+	   struct bpf_fib_lookup *, params, int, plen, u32, flags)
+{
+	if (plen < sizeof(*params))
+		return -EINVAL;
+
+	switch (params->family) {
+#if IS_ENABLED(CONFIG_INET)
+	case AF_INET:
+		return bpf_ipv4_fib_lookup(dev_net(skb->dev), params, flags);
+#endif
+#if IS_ENABLED(CONFIG_IPV6)
+	case AF_INET6:
+		return bpf_ipv6_fib_lookup(dev_net(skb->dev), params, flags);
+#endif
+	}
+	return -ENOTSUPP;
+}
+
+static const struct bpf_func_proto bpf_skb_fib_lookup_proto = {
+	.func		= bpf_skb_fib_lookup,
+	.gpl_only	= true,
+	.ret_type	= RET_INTEGER,
+	.arg1_type      = ARG_PTR_TO_CTX,
+	.arg2_type      = ARG_PTR_TO_MEM,
+	.arg3_type      = ARG_CONST_SIZE,
+	.arg4_type	= ARG_ANYTHING,
+};
+
 static const struct bpf_func_proto *
 bpf_base_func_proto(enum bpf_func_id func_id)
 {
@@ -4181,6 +4443,8 @@ tc_cls_act_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 	case BPF_FUNC_skb_get_xfrm_state:
 		return &bpf_skb_get_xfrm_state_proto;
 #endif
+	case BPF_FUNC_fib_lookup:
+		return &bpf_skb_fib_lookup_proto;
 	default:
 		return bpf_base_func_proto(func_id);
 	}
@@ -4206,6 +4470,8 @@ xdp_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_xdp_redirect_map_proto;
 	case BPF_FUNC_xdp_adjust_tail:
 		return &bpf_xdp_adjust_tail_proto;
+	case BPF_FUNC_fib_lookup:
+		return &bpf_xdp_fib_lookup_proto;
 	default:
 		return bpf_base_func_proto(func_id);
 	}
-- 
2.11.0

^ permalink raw reply related

* [bpf-next v2 7/9] net/ipv6: Add fib lookup stubs for use in bpf helper
From: David Ahern @ 2018-05-04  2:54 UTC (permalink / raw)
  To: netdev, borkmann, ast
  Cc: davem, shm, roopa, brouer, toke, john.fastabend, David Ahern
In-Reply-To: <20180504025432.23451-1-dsahern@gmail.com>

Add stubs to retrieve a handle to an IPv6 FIB table, fib6_get_table,
a stub to do a lookup in a specific table, fib6_table_lookup, and
a stub for a full route lookup.

The stubs are needed for core bpf code to handle the case when the
IPv6 module is not builtin.

Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
---
 include/net/addrconf.h   | 14 ++++++++++++++
 net/ipv6/addrconf_core.c | 33 ++++++++++++++++++++++++++++++++-
 net/ipv6/af_inet6.c      |  6 +++++-
 3 files changed, 51 insertions(+), 2 deletions(-)

diff --git a/include/net/addrconf.h b/include/net/addrconf.h
index 8312cc25a3af..ff766ab207e0 100644
--- a/include/net/addrconf.h
+++ b/include/net/addrconf.h
@@ -223,6 +223,20 @@ struct ipv6_stub {
 				 const struct in6_addr *addr);
 	int (*ipv6_dst_lookup)(struct net *net, struct sock *sk,
 			       struct dst_entry **dst, struct flowi6 *fl6);
+
+	struct fib6_table *(*fib6_get_table)(struct net *net, u32 id);
+	struct fib6_info *(*fib6_lookup)(struct net *net, int oif,
+					 struct flowi6 *fl6, int flags);
+	struct fib6_info *(*fib6_table_lookup)(struct net *net,
+					      struct fib6_table *table,
+					      int oif, struct flowi6 *fl6,
+					      int flags);
+	struct fib6_info *(*fib6_multipath_select)(const struct net *net,
+						   struct fib6_info *f6i,
+						   struct flowi6 *fl6, int oif,
+						   const struct sk_buff *skb,
+						   int strict);
+
 	void (*udpv6_encap_enable)(void);
 	void (*ndisc_send_na)(struct net_device *dev, const struct in6_addr *daddr,
 			      const struct in6_addr *solicited_addr,
diff --git a/net/ipv6/addrconf_core.c b/net/ipv6/addrconf_core.c
index 32b564dfd02a..2fe754fd4f5e 100644
--- a/net/ipv6/addrconf_core.c
+++ b/net/ipv6/addrconf_core.c
@@ -134,8 +134,39 @@ static int eafnosupport_ipv6_dst_lookup(struct net *net, struct sock *u1,
 	return -EAFNOSUPPORT;
 }
 
+static struct fib6_table *eafnosupport_fib6_get_table(struct net *net, u32 id)
+{
+	return NULL;
+}
+
+static struct fib6_info *
+eafnosupport_fib6_table_lookup(struct net *net, struct fib6_table *table,
+			       int oif, struct flowi6 *fl6, int flags)
+{
+	return NULL;
+}
+
+static struct fib6_info *
+eafnosupport_fib6_lookup(struct net *net, int oif, struct flowi6 *fl6,
+			 int flags)
+{
+	return NULL;
+}
+
+static struct fib6_info *
+eafnosupport_fib6_multipath_select(const struct net *net, struct fib6_info *f6i,
+				   struct flowi6 *fl6, int oif,
+				   const struct sk_buff *skb, int strict)
+{
+	return f6i;
+}
+
 const struct ipv6_stub *ipv6_stub __read_mostly = &(struct ipv6_stub) {
-	.ipv6_dst_lookup = eafnosupport_ipv6_dst_lookup,
+	.ipv6_dst_lookup   = eafnosupport_ipv6_dst_lookup,
+	.fib6_get_table    = eafnosupport_fib6_get_table,
+	.fib6_table_lookup = eafnosupport_fib6_table_lookup,
+	.fib6_lookup       = eafnosupport_fib6_lookup,
+	.fib6_multipath_select = eafnosupport_fib6_multipath_select,
 };
 EXPORT_SYMBOL_GPL(ipv6_stub);
 
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index 36d622c477b1..c0e8255d50bb 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -887,7 +887,11 @@ static struct pernet_operations inet6_net_ops = {
 static const struct ipv6_stub ipv6_stub_impl = {
 	.ipv6_sock_mc_join = ipv6_sock_mc_join,
 	.ipv6_sock_mc_drop = ipv6_sock_mc_drop,
-	.ipv6_dst_lookup = ip6_dst_lookup,
+	.ipv6_dst_lookup   = ip6_dst_lookup,
+	.fib6_get_table	   = fib6_get_table,
+	.fib6_table_lookup = fib6_table_lookup,
+	.fib6_lookup       = fib6_lookup,
+	.fib6_multipath_select = fib6_multipath_select,
 	.udpv6_encap_enable = udpv6_encap_enable,
 	.ndisc_send_na = ndisc_send_na,
 	.nd_tbl	= &nd_tbl,
-- 
2.11.0

^ permalink raw reply related

* [bpf-next v2 6/9] net/ipv6: Update fib6 tracepoint to take fib6_info
From: David Ahern @ 2018-05-04  2:54 UTC (permalink / raw)
  To: netdev, borkmann, ast
  Cc: davem, shm, roopa, brouer, toke, john.fastabend, David Ahern
In-Reply-To: <20180504025432.23451-1-dsahern@gmail.com>

Similar to IPv4, IPv6 should use the FIB lookup result in the
tracepoint.

Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
---
 include/trace/events/fib6.h | 14 +++++++-------
 net/ipv6/route.c            | 14 ++++++--------
 2 files changed, 13 insertions(+), 15 deletions(-)

diff --git a/include/trace/events/fib6.h b/include/trace/events/fib6.h
index 7e8d48a81b91..1b8d951e3c12 100644
--- a/include/trace/events/fib6.h
+++ b/include/trace/events/fib6.h
@@ -12,10 +12,10 @@
 
 TRACE_EVENT(fib6_table_lookup,
 
-	TP_PROTO(const struct net *net, const struct rt6_info *rt,
+	TP_PROTO(const struct net *net, const struct fib6_info *f6i,
 		 struct fib6_table *table, const struct flowi6 *flp),
 
-	TP_ARGS(net, rt, table, flp),
+	TP_ARGS(net, f6i, table, flp),
 
 	TP_STRUCT__entry(
 		__field(	u32,	tb_id		)
@@ -48,20 +48,20 @@ TRACE_EVENT(fib6_table_lookup,
 		in6 = (struct in6_addr *)__entry->dst;
 		*in6 = flp->daddr;
 
-		if (rt->rt6i_idev) {
-			__assign_str(name, rt->rt6i_idev->dev->name);
+		if (f6i->fib6_nh.nh_dev) {
+			__assign_str(name, f6i->fib6_nh.nh_dev);
 		} else {
 			__assign_str(name, "");
 		}
-		if (rt == net->ipv6.ip6_null_entry) {
+		if (f6i == net->ipv6.fib6_null_entry) {
 			struct in6_addr in6_zero = {};
 
 			in6 = (struct in6_addr *)__entry->gw;
 			*in6 = in6_zero;
 
-		} else if (rt) {
+		} else if (f6i) {
 			in6 = (struct in6_addr *)__entry->gw;
-			*in6 = rt->rt6i_gateway;
+			*in6 = f6i->fib6_nh.nh_gw;
 		}
 	),
 
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index d0ace0c5c3e9..cf8de6899581 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1078,6 +1078,8 @@ static struct rt6_info *ip6_pol_route_lookup(struct net *net,
 			goto restart;
 	}
 
+	trace_fib6_table_lookup(net, f6i, table, fl6);
+
 	/* Search through exception table */
 	rt = rt6_find_cached_rt(f6i, &fl6->daddr, &fl6->saddr);
 	if (rt) {
@@ -1096,8 +1098,6 @@ static struct rt6_info *ip6_pol_route_lookup(struct net *net,
 
 	rcu_read_unlock();
 
-	trace_fib6_table_lookup(net, rt, table, fl6);
-
 	return rt;
 }
 
@@ -1827,6 +1827,8 @@ struct fib6_info *fib6_table_lookup(struct net *net, struct fib6_table *table,
 		}
 	}
 
+	trace_fib6_table_lookup(net, f6i, table, fl6);
+
 	return f6i;
 }
 
@@ -1853,7 +1855,6 @@ struct rt6_info *ip6_pol_route(struct net *net, struct fib6_table *table,
 		rt = net->ipv6.ip6_null_entry;
 		rcu_read_unlock();
 		dst_hold(&rt->dst);
-		trace_fib6_table_lookup(net, rt, table, fl6);
 		return rt;
 	}
 
@@ -1864,7 +1865,6 @@ struct rt6_info *ip6_pol_route(struct net *net, struct fib6_table *table,
 			dst_use_noref(&rt->dst, jiffies);
 
 		rcu_read_unlock();
-		trace_fib6_table_lookup(net, rt, table, fl6);
 		return rt;
 	} else if (unlikely((fl6->flowi6_flags & FLOWI_FLAG_KNOWN_NH) &&
 			    !(f6i->fib6_flags & RTF_GATEWAY))) {
@@ -1890,9 +1890,7 @@ struct rt6_info *ip6_pol_route(struct net *net, struct fib6_table *table,
 			dst_hold(&uncached_rt->dst);
 		}
 
-		trace_fib6_table_lookup(net, uncached_rt, table, fl6);
 		return uncached_rt;
-
 	} else {
 		/* Get a percpu copy */
 
@@ -1906,7 +1904,7 @@ struct rt6_info *ip6_pol_route(struct net *net, struct fib6_table *table,
 
 		local_bh_enable();
 		rcu_read_unlock();
-		trace_fib6_table_lookup(net, pcpu_rt, table, fl6);
+
 		return pcpu_rt;
 	}
 }
@@ -2486,7 +2484,7 @@ static struct rt6_info *__ip6_route_redirect(struct net *net,
 
 	rcu_read_unlock();
 
-	trace_fib6_table_lookup(net, ret, table, fl6);
+	trace_fib6_table_lookup(net, rt, table, fl6);
 	return ret;
 };
 
-- 
2.11.0

^ permalink raw reply related

* [bpf-next v2 5/9] net/ipv6: Add fib6_lookup
From: David Ahern @ 2018-05-04  2:54 UTC (permalink / raw)
  To: netdev, borkmann, ast
  Cc: davem, shm, roopa, brouer, toke, john.fastabend, David Ahern
In-Reply-To: <20180504025432.23451-1-dsahern@gmail.com>

Add IPv6 equivalent to fib_lookup. Does a fib lookup, including rules,
but returns a FIB entry, fib6_info, rather than a dst based rt6_info.
fib6_lookup is any where from 140% (MULTIPLE_TABLES config disabled)
to 60% faster than any of the dst based lookup methods (without custom
rules) and 25% faster with custom rules (e.g., l3mdev rule).

Since the lookup function has a completely different signature,
fib6_rule_action is split into 2 paths: the existing one is
renamed __fib6_rule_action and a new one for the fib6_info path
is added. fib6_rule_action decides which to call based on the
lookup_ptr. If it is fib6_table_lookup then the new path is taken.

Caller must hold rcu lock as no reference is taken on the returned
fib entry.

Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
---
 include/net/ip6_fib.h |  6 ++++
 net/ipv6/fib6_rules.c | 86 +++++++++++++++++++++++++++++++++++++++++++++++++--
 net/ipv6/ip6_fib.c    |  7 +++++
 3 files changed, 97 insertions(+), 2 deletions(-)

diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h
index 4f7b8f59ea6d..d920dd00139b 100644
--- a/include/net/ip6_fib.h
+++ b/include/net/ip6_fib.h
@@ -376,6 +376,12 @@ struct dst_entry *fib6_rule_lookup(struct net *net, struct flowi6 *fl6,
 				   const struct sk_buff *skb,
 				   int flags, pol_lookup_t lookup);
 
+/* called with rcu lock held; can return error pointer
+ * caller needs to select path
+ */
+struct fib6_info *fib6_lookup(struct net *net, int oif, struct flowi6 *fl6,
+			      int flags);
+
 /* called with rcu lock held; caller needs to select path */
 struct fib6_info *fib6_table_lookup(struct net *net, struct fib6_table *table,
 				    int oif, struct flowi6 *fl6, int strict);
diff --git a/net/ipv6/fib6_rules.c b/net/ipv6/fib6_rules.c
index d040c4bff3a0..f590446595d8 100644
--- a/net/ipv6/fib6_rules.c
+++ b/net/ipv6/fib6_rules.c
@@ -60,6 +60,39 @@ unsigned int fib6_rules_seq_read(struct net *net)
 	return fib_rules_seq_read(net, AF_INET6);
 }
 
+/* called with rcu lock held; no reference taken on fib6_info */
+struct fib6_info *fib6_lookup(struct net *net, int oif, struct flowi6 *fl6,
+			      int flags)
+{
+	struct fib6_info *f6i;
+	int err;
+
+	if (net->ipv6.fib6_has_custom_rules) {
+		struct fib_lookup_arg arg = {
+			.lookup_ptr = fib6_table_lookup,
+			.lookup_data = &oif,
+			.flags = FIB_LOOKUP_NOREF,
+		};
+
+		l3mdev_update_flow(net, flowi6_to_flowi(fl6));
+
+		err = fib_rules_lookup(net->ipv6.fib6_rules_ops,
+				       flowi6_to_flowi(fl6), flags, &arg);
+		if (err)
+			return ERR_PTR(err);
+
+		f6i = arg.result ? : net->ipv6.fib6_null_entry;
+	} else {
+		f6i = fib6_table_lookup(net, net->ipv6.fib6_local_tbl,
+					oif, fl6, flags);
+		if (!f6i || f6i == net->ipv6.fib6_null_entry)
+			f6i = fib6_table_lookup(net, net->ipv6.fib6_main_tbl,
+						oif, fl6, flags);
+	}
+
+	return f6i;
+}
+
 struct dst_entry *fib6_rule_lookup(struct net *net, struct flowi6 *fl6,
 				   const struct sk_buff *skb,
 				   int flags, pol_lookup_t lookup)
@@ -121,8 +154,48 @@ static int fib6_rule_saddr(struct net *net, struct fib_rule *rule, int flags,
 	return 0;
 }
 
-static int fib6_rule_action(struct fib_rule *rule, struct flowi *flp,
-			    int flags, struct fib_lookup_arg *arg)
+static int fib6_rule_action_alt(struct fib_rule *rule, struct flowi *flp,
+				int flags, struct fib_lookup_arg *arg)
+{
+	struct flowi6 *flp6 = &flp->u.ip6;
+	struct net *net = rule->fr_net;
+	struct fib6_table *table;
+	struct fib6_info *f6i;
+	int err = -EAGAIN, *oif;
+	u32 tb_id;
+
+	switch (rule->action) {
+	case FR_ACT_TO_TBL:
+		break;
+	case FR_ACT_UNREACHABLE:
+		return -ENETUNREACH;
+	case FR_ACT_PROHIBIT:
+		return -EACCES;
+	case FR_ACT_BLACKHOLE:
+	default:
+		return -EINVAL;
+	}
+
+	tb_id = fib_rule_get_table(rule, arg);
+	table = fib6_get_table(net, tb_id);
+	if (!table)
+		return -EAGAIN;
+
+	oif = (int *)arg->lookup_data;
+	f6i = fib6_table_lookup(net, table, *oif, flp6, flags);
+	if (f6i != net->ipv6.fib6_null_entry) {
+		err = fib6_rule_saddr(net, rule, flags, flp6,
+				      fib6_info_nh_dev(f6i));
+
+		if (likely(!err))
+			arg->result = f6i;
+	}
+
+	return err;
+}
+
+static int __fib6_rule_action(struct fib_rule *rule, struct flowi *flp,
+			      int flags, struct fib_lookup_arg *arg)
 {
 	struct flowi6 *flp6 = &flp->u.ip6;
 	struct rt6_info *rt = NULL;
@@ -182,6 +255,15 @@ static int fib6_rule_action(struct fib_rule *rule, struct flowi *flp,
 	return err;
 }
 
+static int fib6_rule_action(struct fib_rule *rule, struct flowi *flp,
+			    int flags, struct fib_lookup_arg *arg)
+{
+	if (arg->lookup_ptr == fib6_table_lookup)
+		return fib6_rule_action_alt(rule, flp, flags, arg);
+
+	return __fib6_rule_action(rule, flp, flags, arg);
+}
+
 static bool fib6_rule_suppress(struct fib_rule *rule, struct fib_lookup_arg *arg)
 {
 	struct rt6_info *rt = (struct rt6_info *) arg->result;
diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index 4cfffa0f676e..0b94c0a631cb 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -354,6 +354,13 @@ struct dst_entry *fib6_rule_lookup(struct net *net, struct flowi6 *fl6,
 	return &rt->dst;
 }
 
+/* called with rcu lock held; no reference taken on fib6_info */
+struct fib6_info *fib6_lookup(struct net *net, int oif, struct flowi6 *fl6,
+			      int flags)
+{
+	return fib6_table_lookup(net, net->ipv6.fib6_main_tbl, oif, fl6, flags);
+}
+
 static void __net_init fib6_tables_init(struct net *net)
 {
 	fib6_link_table(net, net->ipv6.fib6_main_tbl);
-- 
2.11.0

^ permalink raw reply related

* [bpf-next v2 4/9] net/ipv6: Refactor fib6_rule_action
From: David Ahern @ 2018-05-04  2:54 UTC (permalink / raw)
  To: netdev, borkmann, ast
  Cc: davem, shm, roopa, brouer, toke, john.fastabend, David Ahern
In-Reply-To: <20180504025432.23451-1-dsahern@gmail.com>

Move source address lookup from fib6_rule_action to a helper. It will be
used in a later patch by a second variant for fib6_rule_action.

Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
---
 net/ipv6/fib6_rules.c | 52 ++++++++++++++++++++++++++++++---------------------
 1 file changed, 31 insertions(+), 21 deletions(-)

diff --git a/net/ipv6/fib6_rules.c b/net/ipv6/fib6_rules.c
index 6547fc6491a6..d040c4bff3a0 100644
--- a/net/ipv6/fib6_rules.c
+++ b/net/ipv6/fib6_rules.c
@@ -96,6 +96,31 @@ struct dst_entry *fib6_rule_lookup(struct net *net, struct flowi6 *fl6,
 	return &net->ipv6.ip6_null_entry->dst;
 }
 
+static int fib6_rule_saddr(struct net *net, struct fib_rule *rule, int flags,
+			   struct flowi6 *flp6, const struct net_device *dev)
+{
+	struct fib6_rule *r = (struct fib6_rule *)rule;
+
+	/* If we need to find a source address for this traffic,
+	 * we check the result if it meets requirement of the rule.
+	 */
+	if ((rule->flags & FIB_RULE_FIND_SADDR) &&
+	    r->src.plen && !(flags & RT6_LOOKUP_F_HAS_SADDR)) {
+		struct in6_addr saddr;
+
+		if (ipv6_dev_get_saddr(net, dev, &flp6->daddr,
+				       rt6_flags2srcprefs(flags), &saddr))
+			return -EAGAIN;
+
+		if (!ipv6_prefix_equal(&saddr, &r->src.addr, r->src.plen))
+			return -EAGAIN;
+
+		flp6->saddr = saddr;
+	}
+
+	return 0;
+}
+
 static int fib6_rule_action(struct fib_rule *rule, struct flowi *flp,
 			    int flags, struct fib_lookup_arg *arg)
 {
@@ -134,27 +159,12 @@ static int fib6_rule_action(struct fib_rule *rule, struct flowi *flp,
 
 	rt = lookup(net, table, flp6, arg->lookup_data, flags);
 	if (rt != net->ipv6.ip6_null_entry) {
-		struct fib6_rule *r = (struct fib6_rule *)rule;
-
-		/*
-		 * If we need to find a source address for this traffic,
-		 * we check the result if it meets requirement of the rule.
-		 */
-		if ((rule->flags & FIB_RULE_FIND_SADDR) &&
-		    r->src.plen && !(flags & RT6_LOOKUP_F_HAS_SADDR)) {
-			struct in6_addr saddr;
-
-			if (ipv6_dev_get_saddr(net,
-					       ip6_dst_idev(&rt->dst)->dev,
-					       &flp6->daddr,
-					       rt6_flags2srcprefs(flags),
-					       &saddr))
-				goto again;
-			if (!ipv6_prefix_equal(&saddr, &r->src.addr,
-					       r->src.plen))
-				goto again;
-			flp6->saddr = saddr;
-		}
+		err = fib6_rule_saddr(net, rule, flags, flp6,
+				      ip6_dst_idev(&rt->dst)->dev);
+
+		if (err == -EAGAIN)
+			goto again;
+
 		err = rt->dst.error;
 		if (err != -EAGAIN)
 			goto out;
-- 
2.11.0

^ permalink raw reply related

* [bpf-next v2 3/9] net/ipv6: Extract table lookup from ip6_pol_route
From: David Ahern @ 2018-05-04  2:54 UTC (permalink / raw)
  To: netdev, borkmann, ast
  Cc: davem, shm, roopa, brouer, toke, john.fastabend, David Ahern
In-Reply-To: <20180504025432.23451-1-dsahern@gmail.com>

ip6_pol_route is used for ingress and egress FIB lookups. Refactor it
moving the table lookup into a separate fib6_table_lookup that can be
invoked separately and export the new function.

ip6_pol_route now calls fib6_table_lookup and uses the result to generate
a dst based rt6_info.

Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
---
 include/net/ip6_fib.h |  4 ++++
 net/ipv6/route.c      | 39 +++++++++++++++++++++++++--------------
 2 files changed, 29 insertions(+), 14 deletions(-)

diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h
index 80d76d8dc683..4f7b8f59ea6d 100644
--- a/include/net/ip6_fib.h
+++ b/include/net/ip6_fib.h
@@ -376,6 +376,10 @@ struct dst_entry *fib6_rule_lookup(struct net *net, struct flowi6 *fl6,
 				   const struct sk_buff *skb,
 				   int flags, pol_lookup_t lookup);
 
+/* called with rcu lock held; caller needs to select path */
+struct fib6_info *fib6_table_lookup(struct net *net, struct fib6_table *table,
+				    int oif, struct flowi6 *fl6, int strict);
+
 struct fib6_info *fib6_multipath_select(const struct net *net,
 					struct fib6_info *match,
 					struct flowi6 *fl6, int oif,
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 58af969f3a2c..d0ace0c5c3e9 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1800,21 +1800,12 @@ void rt6_age_exceptions(struct fib6_info *rt,
 	rcu_read_unlock_bh();
 }
 
-struct rt6_info *ip6_pol_route(struct net *net, struct fib6_table *table,
-			       int oif, struct flowi6 *fl6,
-			       const struct sk_buff *skb, int flags)
+/* must be called with rcu lock held */
+struct fib6_info *fib6_table_lookup(struct net *net, struct fib6_table *table,
+				    int oif, struct flowi6 *fl6, int strict)
 {
 	struct fib6_node *fn, *saved_fn;
 	struct fib6_info *f6i;
-	struct rt6_info *rt;
-	int strict = 0;
-
-	strict |= flags & RT6_LOOKUP_F_IFACE;
-	strict |= flags & RT6_LOOKUP_F_IGNORE_LINKSTATE;
-	if (net->ipv6.devconf_all->forwarding == 0)
-		strict |= RT6_LOOKUP_F_REACHABLE;
-
-	rcu_read_lock();
 
 	fn = fib6_node_lookup(&table->tb6_root, &fl6->daddr, &fl6->saddr);
 	saved_fn = fn;
@@ -1824,8 +1815,6 @@ struct rt6_info *ip6_pol_route(struct net *net, struct fib6_table *table,
 
 redo_rt6_select:
 	f6i = rt6_select(net, fn, oif, strict);
-	if (f6i->fib6_nsiblings)
-		f6i = fib6_multipath_select(net, f6i, fl6, oif, skb, strict);
 	if (f6i == net->ipv6.fib6_null_entry) {
 		fn = fib6_backtrack(fn, &fl6->saddr);
 		if (fn)
@@ -1838,6 +1827,28 @@ struct rt6_info *ip6_pol_route(struct net *net, struct fib6_table *table,
 		}
 	}
 
+	return f6i;
+}
+
+struct rt6_info *ip6_pol_route(struct net *net, struct fib6_table *table,
+			       int oif, struct flowi6 *fl6,
+			       const struct sk_buff *skb, int flags)
+{
+	struct fib6_info *f6i;
+	struct rt6_info *rt;
+	int strict = 0;
+
+	strict |= flags & RT6_LOOKUP_F_IFACE;
+	strict |= flags & RT6_LOOKUP_F_IGNORE_LINKSTATE;
+	if (net->ipv6.devconf_all->forwarding == 0)
+		strict |= RT6_LOOKUP_F_REACHABLE;
+
+	rcu_read_lock();
+
+	f6i = fib6_table_lookup(net, table, oif, fl6, strict);
+	if (f6i->fib6_nsiblings)
+		f6i = fib6_multipath_select(net, f6i, fl6, oif, skb, strict);
+
 	if (f6i == net->ipv6.fib6_null_entry) {
 		rt = net->ipv6.ip6_null_entry;
 		rcu_read_unlock();
-- 
2.11.0

^ permalink raw reply related

* [bpf-next v2 2/9] net/ipv6: Rename rt6_multipath_select
From: David Ahern @ 2018-05-04  2:54 UTC (permalink / raw)
  To: netdev, borkmann, ast
  Cc: davem, shm, roopa, brouer, toke, john.fastabend, David Ahern
In-Reply-To: <20180504025432.23451-1-dsahern@gmail.com>

Rename rt6_multipath_select to fib6_multipath_select and export it.
A later patch wants access to it similar to IPv4's fib_select_path.

Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
---
 include/net/ip6_fib.h |  5 +++++
 net/ipv6/route.c      | 17 +++++++++--------
 2 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h
index 5a16630179cb..80d76d8dc683 100644
--- a/include/net/ip6_fib.h
+++ b/include/net/ip6_fib.h
@@ -376,6 +376,11 @@ struct dst_entry *fib6_rule_lookup(struct net *net, struct flowi6 *fl6,
 				   const struct sk_buff *skb,
 				   int flags, pol_lookup_t lookup);
 
+struct fib6_info *fib6_multipath_select(const struct net *net,
+					struct fib6_info *match,
+					struct flowi6 *fl6, int oif,
+					const struct sk_buff *skb, int strict);
+
 struct fib6_node *fib6_node_lookup(struct fib6_node *root,
 				   const struct in6_addr *daddr,
 				   const struct in6_addr *saddr);
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index d903db30dfff..58af969f3a2c 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -419,11 +419,11 @@ static bool rt6_check_expired(const struct rt6_info *rt)
 	return false;
 }
 
-static struct fib6_info *rt6_multipath_select(const struct net *net,
-					      struct fib6_info *match,
-					     struct flowi6 *fl6, int oif,
-					     const struct sk_buff *skb,
-					     int strict)
+struct fib6_info *fib6_multipath_select(const struct net *net,
+					struct fib6_info *match,
+					struct flowi6 *fl6, int oif,
+					const struct sk_buff *skb,
+					int strict)
 {
 	struct fib6_info *sibling, *next_sibling;
 
@@ -1068,8 +1068,9 @@ static struct rt6_info *ip6_pol_route_lookup(struct net *net,
 		f6i = rt6_device_match(net, f6i, &fl6->saddr,
 				      fl6->flowi6_oif, flags);
 		if (f6i->fib6_nsiblings && fl6->flowi6_oif == 0)
-			f6i = rt6_multipath_select(net, f6i, fl6,
-						   fl6->flowi6_oif, skb, flags);
+			f6i = fib6_multipath_select(net, f6i, fl6,
+						    fl6->flowi6_oif, skb,
+						    flags);
 	}
 	if (f6i == net->ipv6.fib6_null_entry) {
 		fn = fib6_backtrack(fn, &fl6->saddr);
@@ -1824,7 +1825,7 @@ struct rt6_info *ip6_pol_route(struct net *net, struct fib6_table *table,
 redo_rt6_select:
 	f6i = rt6_select(net, fn, oif, strict);
 	if (f6i->fib6_nsiblings)
-		f6i = rt6_multipath_select(net, f6i, fl6, oif, skb, strict);
+		f6i = fib6_multipath_select(net, f6i, fl6, oif, skb, strict);
 	if (f6i == net->ipv6.fib6_null_entry) {
 		fn = fib6_backtrack(fn, &fl6->saddr);
 		if (fn)
-- 
2.11.0

^ permalink raw reply related

* [bpf-next v2 1/9] net/ipv6: Rename fib6_lookup to fib6_node_lookup
From: David Ahern @ 2018-05-04  2:54 UTC (permalink / raw)
  To: netdev, borkmann, ast
  Cc: davem, shm, roopa, brouer, toke, john.fastabend, David Ahern
In-Reply-To: <20180504025432.23451-1-dsahern@gmail.com>

Rename fib6_lookup to fib6_node_lookup to better reflect what it
returns. The fib6_lookup name will be used in a later patch for
an IPv6 equivalent to IPv4's fib_lookup.

Signed-off-by: David Ahern <dsahern@gmail.com>
---
 include/net/ip6_fib.h |  6 +++---
 net/ipv6/ip6_fib.c    | 14 ++++++++------
 net/ipv6/route.c      |  8 ++++----
 3 files changed, 15 insertions(+), 13 deletions(-)

diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h
index 1af450d4e923..5a16630179cb 100644
--- a/include/net/ip6_fib.h
+++ b/include/net/ip6_fib.h
@@ -376,9 +376,9 @@ struct dst_entry *fib6_rule_lookup(struct net *net, struct flowi6 *fl6,
 				   const struct sk_buff *skb,
 				   int flags, pol_lookup_t lookup);
 
-struct fib6_node *fib6_lookup(struct fib6_node *root,
-			      const struct in6_addr *daddr,
-			      const struct in6_addr *saddr);
+struct fib6_node *fib6_node_lookup(struct fib6_node *root,
+				   const struct in6_addr *daddr,
+				   const struct in6_addr *saddr);
 
 struct fib6_node *fib6_locate(struct fib6_node *root,
 			      const struct in6_addr *daddr, int dst_len,
diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index 6421c893466e..4cfffa0f676e 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -1354,8 +1354,8 @@ struct lookup_args {
 	const struct in6_addr	*addr;		/* search key			*/
 };
 
-static struct fib6_node *fib6_lookup_1(struct fib6_node *root,
-				       struct lookup_args *args)
+static struct fib6_node *fib6_node_lookup_1(struct fib6_node *root,
+					    struct lookup_args *args)
 {
 	struct fib6_node *fn;
 	__be32 dir;
@@ -1400,7 +1400,8 @@ static struct fib6_node *fib6_lookup_1(struct fib6_node *root,
 #ifdef CONFIG_IPV6_SUBTREES
 				if (subtree) {
 					struct fib6_node *sfn;
-					sfn = fib6_lookup_1(subtree, args + 1);
+					sfn = fib6_node_lookup_1(subtree,
+								 args + 1);
 					if (!sfn)
 						goto backtrack;
 					fn = sfn;
@@ -1422,8 +1423,9 @@ static struct fib6_node *fib6_lookup_1(struct fib6_node *root,
 
 /* called with rcu_read_lock() held
  */
-struct fib6_node *fib6_lookup(struct fib6_node *root, const struct in6_addr *daddr,
-			      const struct in6_addr *saddr)
+struct fib6_node *fib6_node_lookup(struct fib6_node *root,
+				   const struct in6_addr *daddr,
+				   const struct in6_addr *saddr)
 {
 	struct fib6_node *fn;
 	struct lookup_args args[] = {
@@ -1442,7 +1444,7 @@ struct fib6_node *fib6_lookup(struct fib6_node *root, const struct in6_addr *dad
 		}
 	};
 
-	fn = fib6_lookup_1(root, daddr ? args : args + 1);
+	fn = fib6_node_lookup_1(root, daddr ? args : args + 1);
 	if (!fn || fn->fn_flags & RTN_TL_ROOT)
 		fn = root;
 
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 7ee0a34fba46..d903db30dfff 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1006,7 +1006,7 @@ static struct fib6_node* fib6_backtrack(struct fib6_node *fn,
 		pn = rcu_dereference(fn->parent);
 		sn = FIB6_SUBTREE(pn);
 		if (sn && sn != fn)
-			fn = fib6_lookup(sn, NULL, saddr);
+			fn = fib6_node_lookup(sn, NULL, saddr);
 		else
 			fn = pn;
 		if (fn->fn_flags & RTN_RTINFO)
@@ -1059,7 +1059,7 @@ static struct rt6_info *ip6_pol_route_lookup(struct net *net,
 		flags &= ~RT6_LOOKUP_F_IFACE;
 
 	rcu_read_lock();
-	fn = fib6_lookup(&table->tb6_root, &fl6->daddr, &fl6->saddr);
+	fn = fib6_node_lookup(&table->tb6_root, &fl6->daddr, &fl6->saddr);
 restart:
 	f6i = rcu_dereference(fn->leaf);
 	if (!f6i) {
@@ -1815,7 +1815,7 @@ struct rt6_info *ip6_pol_route(struct net *net, struct fib6_table *table,
 
 	rcu_read_lock();
 
-	fn = fib6_lookup(&table->tb6_root, &fl6->daddr, &fl6->saddr);
+	fn = fib6_node_lookup(&table->tb6_root, &fl6->daddr, &fl6->saddr);
 	saved_fn = fn;
 
 	if (fl6->flowi6_flags & FLOWI_FLAG_SKIP_NH_OIF)
@@ -2420,7 +2420,7 @@ static struct rt6_info *__ip6_route_redirect(struct net *net,
 	 */
 
 	rcu_read_lock();
-	fn = fib6_lookup(&table->tb6_root, &fl6->daddr, &fl6->saddr);
+	fn = fib6_node_lookup(&table->tb6_root, &fl6->daddr, &fl6->saddr);
 restart:
 	for_each_fib6_node_rt_rcu(fn) {
 		if (rt->fib6_nh.nh_flags & RTNH_F_DEAD)
-- 
2.11.0

^ permalink raw reply related

* [bpf-next v2 0/9] bpf: Add helper to do FIB lookups
From: David Ahern @ 2018-05-04  2:54 UTC (permalink / raw)
  To: netdev, borkmann, ast
  Cc: davem, shm, roopa, brouer, toke, john.fastabend, David Ahern

Provide a helper for doing a FIB and neighbor lookup in the kernel
tables from an XDP program. The helper provides a fastpath for forwarding
packets. If the packet is a local delivery or for any reason is not a
simple lookup and forward, the packet is expected to continue up the stack
for full processing.

The response from a FIB and neighbor lookup is either the egress index
with the bpf_fib_lookup struct filled in with dmac and gateway or
0 meaning the packet should continue up the stack. In time we can
revisit this to return the FIB lookup result errno if it is one of the
special RTN_'s such as RTN_BLACKHOLE (-EINVAL) so that the XDP
programs can do an early drop if desired.

Patches 1-6 do some more refactoring to IPv6 with the end goal of
extracting a FIB lookup function that aligns with fib_lookup for IPv4,
basically returning a fib6_info without creating a dst based entry.

Patch 7 adds lookup functions to the ipv6 stub. These are needed since
bpf is built into the kernel and ipv6 may not be built or loaded.

Patch 8 adds the bpf helper and 9 adds a sample program.

v2
- removed pkt_access from bpf_func_proto as noticed by Daniel
- added check in that IPv6 forwarding is enabled 
- added DaveM's ack on patches 1-7 and 9 based on v1 response and
  fact that no changes were made to them in v2

v1
- updated commit messages and cover letter
- added comment to sample program noting lack of verification on
  egress device supporting XDP

RFC v2
- fixed use of foward helper from cls_act as noted by Daniel
- in patch 1 rename fib6_lookup_1 as well for consistency

David Ahern (9):
  net/ipv6: Rename fib6_lookup to fib6_node_lookup
  net/ipv6: Rename rt6_multipath_select
  net/ipv6: Extract table lookup from ip6_pol_route
  net/ipv6: Refactor fib6_rule_action
  net/ipv6: Add fib6_lookup
  net/ipv6: Update fib6 tracepoint to take fib6_info
  net/ipv6: Add fib lookup stubs for use in bpf helper
  bpf: Provide helper to do forwarding lookups in kernel FIB table
  samples/bpf: Add example of ipv4 and ipv6 forwarding in XDP

 include/net/addrconf.h                    |  14 ++
 include/net/ip6_fib.h                     |  21 ++-
 include/trace/events/fib6.h               |  14 +-
 include/uapi/linux/bpf.h                  |  83 +++++++++-
 net/core/filter.c                         | 266 ++++++++++++++++++++++++++++++
 net/ipv6/addrconf_core.c                  |  33 +++-
 net/ipv6/af_inet6.c                       |   6 +-
 net/ipv6/fib6_rules.c                     | 138 +++++++++++++---
 net/ipv6/ip6_fib.c                        |  21 ++-
 net/ipv6/route.c                          |  76 +++++----
 samples/bpf/Makefile                      |   4 +
 samples/bpf/xdp_fwd_kern.c                | 113 +++++++++++++
 samples/bpf/xdp_fwd_user.c                | 136 +++++++++++++++
 tools/testing/selftests/bpf/bpf_helpers.h |   3 +
 14 files changed, 853 insertions(+), 75 deletions(-)
 create mode 100644 samples/bpf/xdp_fwd_kern.c
 create mode 100644 samples/bpf/xdp_fwd_user.c

-- 
2.11.0

^ permalink raw reply

* Re: [PATCH bpf] bpf: use array_index_nospec in find_prog_type
From: Alexei Starovoitov @ 2018-05-04  2:32 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: netdev, Mark Rutland
In-Reply-To: <925948ff31d036a1e22dcaa59e989c485945e1c4.1525392315.git.daniel@iogearbox.net>

On Fri, May 04, 2018 at 02:13:57AM +0200, Daniel Borkmann wrote:
> Commit 9ef09e35e521 ("bpf: fix possible spectre-v1 in find_and_alloc_map()")
> converted find_and_alloc_map() over to use array_index_nospec() to sanitize
> map type that user space passes on map creation, and this patch does an
> analogous conversion for progs in find_prog_type() as it's also passed from
> user space when loading progs as attr->prog_type.
> 
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> Cc: Mark Rutland <mark.rutland@arm.com>

Applied to bpf tree, thanks Daniel.

^ permalink raw reply

* [PATCH net-next] liquidio: support use of ethtool to set link speed of CN23XX-225 cards
From: Felix Manlunas @ 2018-05-04  2:23 UTC (permalink / raw)
  To: davem
  Cc: netdev, raghu.vatsavayi, derek.chickles, satananda.burla,
	felix.manlunas, weilin.chang

From: Weilin Chang <weilin.chang@cavium.com>

Support setting the link speed of CN23XX-225 cards (which can do 25Gbps or
10Gbps) via ethtool_ops.set_link_ksettings.

Also fix the function assigned to ethtool_ops.get_link_ksettings to use the
new link_ksettings api completely (instead of partially via
ethtool_convert_legacy_u32_to_link_mode).

Signed-off-by: Weilin Chang <weilin.chang@cavium.com>
Acked-by: Raghu Vatsavayi <raghu.vatsavayi@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
---
 drivers/net/ethernet/cavium/liquidio/lio_core.c    | 196 +++++++++++++++++++++
 drivers/net/ethernet/cavium/liquidio/lio_ethtool.c | 193 +++++++++++++++++---
 drivers/net/ethernet/cavium/liquidio/lio_main.c    |  20 +++
 drivers/net/ethernet/cavium/liquidio/lio_vf_main.c |   5 +
 .../net/ethernet/cavium/liquidio/liquidio_common.h |   4 +
 .../net/ethernet/cavium/liquidio/octeon_device.h   |  14 ++
 .../net/ethernet/cavium/liquidio/octeon_network.h  |  15 ++
 7 files changed, 423 insertions(+), 24 deletions(-)

diff --git a/drivers/net/ethernet/cavium/liquidio/lio_core.c b/drivers/net/ethernet/cavium/liquidio/lio_core.c
index 6821afc..8093c5e 100644
--- a/drivers/net/ethernet/cavium/liquidio/lio_core.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_core.c
@@ -1481,3 +1481,199 @@ int octnet_get_link_stats(struct net_device *netdev)
 
 	return 0;
 }
+
+static void liquidio_nic_seapi_ctl_callback(struct octeon_device *oct,
+					    u32 status,
+					    void *buf)
+{
+	struct liquidio_nic_seapi_ctl_context *ctx;
+	struct octeon_soft_command *sc = buf;
+
+	ctx = sc->ctxptr;
+
+	oct = lio_get_device(ctx->octeon_id);
+	if (status) {
+		dev_err(&oct->pci_dev->dev, "%s: instruction failed. Status: %llx\n",
+			__func__,
+			CVM_CAST64(status));
+	}
+	ctx->status = status;
+	complete(&ctx->complete);
+}
+
+int liquidio_set_speed(struct lio *lio, int speed)
+{
+	struct liquidio_nic_seapi_ctl_context *ctx;
+	struct octeon_device *oct = lio->oct_dev;
+	struct oct_nic_seapi_resp *resp;
+	struct octeon_soft_command *sc;
+	union octnet_cmd *ncmd;
+	u32 ctx_size;
+	int retval;
+	u32 var;
+
+	if (oct->speed_setting == speed)
+		return 0;
+
+	if (!OCTEON_CN23XX_PF(oct)) {
+		dev_err(&oct->pci_dev->dev, "%s: SET SPEED only for PF\n",
+			__func__);
+		return -EOPNOTSUPP;
+	}
+
+	ctx_size = sizeof(struct liquidio_nic_seapi_ctl_context);
+	sc = octeon_alloc_soft_command(oct, OCTNET_CMD_SIZE,
+				       sizeof(struct oct_nic_seapi_resp),
+				       ctx_size);
+	if (!sc)
+		return -ENOMEM;
+
+	ncmd = sc->virtdptr;
+	ctx  = sc->ctxptr;
+	resp = sc->virtrptr;
+	memset(resp, 0, sizeof(struct oct_nic_seapi_resp));
+
+	ctx->octeon_id = lio_get_device_id(oct);
+	ctx->status = 0;
+	init_completion(&ctx->complete);
+
+	ncmd->u64 = 0;
+	ncmd->s.cmd = SEAPI_CMD_SPEED_SET;
+	ncmd->s.param1 = speed;
+
+	octeon_swap_8B_data((u64 *)ncmd, (OCTNET_CMD_SIZE >> 3));
+
+	sc->iq_no = lio->linfo.txpciq[0].s.q_no;
+
+	octeon_prepare_soft_command(oct, sc, OPCODE_NIC,
+				    OPCODE_NIC_UBOOT_CTL, 0, 0, 0);
+
+	sc->callback = liquidio_nic_seapi_ctl_callback;
+	sc->callback_arg = sc;
+	sc->wait_time = 5000;
+
+	retval = octeon_send_soft_command(oct, sc);
+	if (retval == IQ_SEND_FAILED) {
+		dev_info(&oct->pci_dev->dev, "Failed to send soft command\n");
+		retval = -EBUSY;
+	} else {
+		/* Wait for response or timeout */
+		if (wait_for_completion_timeout(&ctx->complete,
+						msecs_to_jiffies(10000)) == 0) {
+			dev_err(&oct->pci_dev->dev, "%s: sc timeout\n",
+				__func__);
+			octeon_free_soft_command(oct, sc);
+			return -EINTR;
+		}
+
+		retval = resp->status;
+
+		if (retval) {
+			dev_err(&oct->pci_dev->dev, "%s failed, retval=%d\n",
+				__func__, retval);
+			octeon_free_soft_command(oct, sc);
+			return -EIO;
+		}
+
+		var = be32_to_cpu((__force __be32)resp->speed);
+		if (var != speed) {
+			dev_err(&oct->pci_dev->dev,
+				"%s: setting failed speed= %x, expect %x\n",
+				__func__, var, speed);
+		}
+
+		oct->speed_setting = var;
+	}
+
+	octeon_free_soft_command(oct, sc);
+
+	return retval;
+}
+
+int liquidio_get_speed(struct lio *lio)
+{
+	struct liquidio_nic_seapi_ctl_context *ctx;
+	struct octeon_device *oct = lio->oct_dev;
+	struct oct_nic_seapi_resp *resp;
+	struct octeon_soft_command *sc;
+	union octnet_cmd *ncmd;
+	u32 ctx_size;
+	int retval;
+
+	ctx_size = sizeof(struct liquidio_nic_seapi_ctl_context);
+	sc = octeon_alloc_soft_command(oct, OCTNET_CMD_SIZE,
+				       sizeof(struct oct_nic_seapi_resp),
+				       ctx_size);
+	if (!sc)
+		return -ENOMEM;
+
+	ncmd = sc->virtdptr;
+	ctx  = sc->ctxptr;
+	resp = sc->virtrptr;
+	memset(resp, 0, sizeof(struct oct_nic_seapi_resp));
+
+	ctx->octeon_id = lio_get_device_id(oct);
+	ctx->status = 0;
+	init_completion(&ctx->complete);
+
+	ncmd->u64 = 0;
+	ncmd->s.cmd = SEAPI_CMD_SPEED_GET;
+
+	octeon_swap_8B_data((u64 *)ncmd, (OCTNET_CMD_SIZE >> 3));
+
+	sc->iq_no = lio->linfo.txpciq[0].s.q_no;
+
+	octeon_prepare_soft_command(oct, sc, OPCODE_NIC,
+				    OPCODE_NIC_UBOOT_CTL, 0, 0, 0);
+
+	sc->callback = liquidio_nic_seapi_ctl_callback;
+	sc->callback_arg = sc;
+	sc->wait_time = 5000;
+
+	retval = octeon_send_soft_command(oct, sc);
+	if (retval == IQ_SEND_FAILED) {
+		dev_info(&oct->pci_dev->dev, "Failed to send soft command\n");
+		oct->no_speed_setting = 1;
+		oct->speed_setting = 25;
+
+		retval = -EBUSY;
+	} else {
+		if (wait_for_completion_timeout(&ctx->complete,
+						msecs_to_jiffies(10000)) == 0) {
+			dev_err(&oct->pci_dev->dev, "%s: sc timeout\n",
+				__func__);
+
+			oct->speed_setting = 25;
+			oct->no_speed_setting = 1;
+
+			octeon_free_soft_command(oct, sc);
+
+			return -EINTR;
+		}
+		retval = resp->status;
+		if (retval) {
+			dev_err(&oct->pci_dev->dev,
+				"%s failed retval=%d\n", __func__, retval);
+			oct->no_speed_setting = 1;
+			oct->speed_setting = 25;
+			octeon_free_soft_command(oct, sc);
+			retval = -EIO;
+		} else {
+			u32 var;
+
+			var = be32_to_cpu((__force __be32)resp->speed);
+			oct->speed_setting = var;
+			if (var == 0xffff) {
+				oct->no_speed_setting = 1;
+				/* unable to access boot variables
+				 * get the default value based on the NIC type
+				 */
+				oct->speed_setting = 25;
+			}
+		}
+	}
+
+	octeon_free_soft_command(oct, sc);
+
+	return retval;
+}
diff --git a/drivers/net/ethernet/cavium/liquidio/lio_ethtool.c b/drivers/net/ethernet/cavium/liquidio/lio_ethtool.c
index a1d8430..e2d6d42 100644
--- a/drivers/net/ethernet/cavium/liquidio/lio_ethtool.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_ethtool.c
@@ -230,46 +230,147 @@ static int lio_get_link_ksettings(struct net_device *netdev,
 	struct lio *lio = GET_LIO(netdev);
 	struct octeon_device *oct = lio->oct_dev;
 	struct oct_link_info *linfo;
-	u32 supported = 0, advertising = 0;
 
 	linfo = &lio->linfo;
 
+	ethtool_link_ksettings_zero_link_mode(ecmd, supported);
+	ethtool_link_ksettings_zero_link_mode(ecmd, advertising);
+
 	switch (linfo->link.s.phy_type) {
 	case LIO_PHY_PORT_TP:
 		ecmd->base.port = PORT_TP;
-		supported = (SUPPORTED_10000baseT_Full |
-			     SUPPORTED_TP | SUPPORTED_Pause);
-		advertising = (ADVERTISED_10000baseT_Full | ADVERTISED_Pause);
 		ecmd->base.autoneg = AUTONEG_DISABLE;
+		ethtool_link_ksettings_add_link_mode(ecmd, supported, TP);
+		ethtool_link_ksettings_add_link_mode(ecmd, supported, Pause);
+		ethtool_link_ksettings_add_link_mode(ecmd, supported,
+						     10000baseT_Full);
+
+		ethtool_link_ksettings_add_link_mode(ecmd, advertising, Pause);
+		ethtool_link_ksettings_add_link_mode(ecmd, advertising,
+						     10000baseT_Full);
+
 		break;
 
 	case LIO_PHY_PORT_FIBRE:
-		ecmd->base.port = PORT_FIBRE;
-
-		if (linfo->link.s.speed == SPEED_10000) {
-			supported = SUPPORTED_10000baseT_Full;
-			advertising = ADVERTISED_10000baseT_Full;
+		if (linfo->link.s.if_mode == INTERFACE_MODE_XAUI ||
+		    linfo->link.s.if_mode == INTERFACE_MODE_RXAUI ||
+		    linfo->link.s.if_mode == INTERFACE_MODE_XLAUI ||
+		    linfo->link.s.if_mode == INTERFACE_MODE_XFI) {
+			dev_dbg(&oct->pci_dev->dev, "ecmd->base.transceiver is XCVR_EXTERNAL\n");
+		} else {
+			dev_err(&oct->pci_dev->dev, "Unknown link interface mode: %d\n",
+				linfo->link.s.if_mode);
 		}
 
-		supported |= SUPPORTED_FIBRE | SUPPORTED_Pause;
-		advertising |= ADVERTISED_Pause;
+		ecmd->base.port = PORT_FIBRE;
 		ecmd->base.autoneg = AUTONEG_DISABLE;
+		ethtool_link_ksettings_add_link_mode(ecmd, supported, FIBRE);
+
+		ethtool_link_ksettings_add_link_mode(ecmd, supported, Pause);
+		ethtool_link_ksettings_add_link_mode(ecmd, advertising, Pause);
+		if (oct->subsystem_id == OCTEON_CN2350_25GB_SUBSYS_ID ||
+		    oct->subsystem_id == OCTEON_CN2360_25GB_SUBSYS_ID) {
+			if (OCTEON_CN23XX_PF(oct)) {
+				ethtool_link_ksettings_add_link_mode
+					(ecmd, supported, 25000baseSR_Full);
+				ethtool_link_ksettings_add_link_mode
+					(ecmd, supported, 25000baseKR_Full);
+				ethtool_link_ksettings_add_link_mode
+					(ecmd, supported, 25000baseCR_Full);
+
+				if (oct->no_speed_setting == 0)  {
+					ethtool_link_ksettings_add_link_mode
+						(ecmd, supported,
+						 10000baseSR_Full);
+					ethtool_link_ksettings_add_link_mode
+						(ecmd, supported,
+						 10000baseKR_Full);
+					ethtool_link_ksettings_add_link_mode
+						(ecmd, supported,
+						 10000baseCR_Full);
+				}
+
+				if (oct->no_speed_setting == 0)
+					liquidio_get_speed(lio);
+				else
+					oct->speed_setting = 25;
+
+				if (oct->speed_setting == 10) {
+					ethtool_link_ksettings_add_link_mode
+						(ecmd, advertising,
+						 10000baseSR_Full);
+					ethtool_link_ksettings_add_link_mode
+						(ecmd, advertising,
+						 10000baseKR_Full);
+					ethtool_link_ksettings_add_link_mode
+						(ecmd, advertising,
+						 10000baseCR_Full);
+				}
+				if (oct->speed_setting == 25) {
+					ethtool_link_ksettings_add_link_mode
+						(ecmd, advertising,
+						 25000baseSR_Full);
+					ethtool_link_ksettings_add_link_mode
+						(ecmd, advertising,
+						 25000baseKR_Full);
+					ethtool_link_ksettings_add_link_mode
+						(ecmd, advertising,
+						 25000baseCR_Full);
+				}
+			} else { /* VF */
+				if (linfo->link.s.speed == 10000) {
+					ethtool_link_ksettings_add_link_mode
+						(ecmd, supported,
+						 10000baseSR_Full);
+					ethtool_link_ksettings_add_link_mode
+						(ecmd, supported,
+						 10000baseKR_Full);
+					ethtool_link_ksettings_add_link_mode
+						(ecmd, supported,
+						 10000baseCR_Full);
+
+					ethtool_link_ksettings_add_link_mode
+						(ecmd, advertising,
+						 10000baseSR_Full);
+					ethtool_link_ksettings_add_link_mode
+						(ecmd, advertising,
+						 10000baseKR_Full);
+					ethtool_link_ksettings_add_link_mode
+						(ecmd, advertising,
+						 10000baseCR_Full);
+				}
+
+				if (linfo->link.s.speed == 25000) {
+					ethtool_link_ksettings_add_link_mode
+						(ecmd, supported,
+						 25000baseSR_Full);
+					ethtool_link_ksettings_add_link_mode
+						(ecmd, supported,
+						 25000baseKR_Full);
+					ethtool_link_ksettings_add_link_mode
+						(ecmd, supported,
+						 25000baseCR_Full);
+
+					ethtool_link_ksettings_add_link_mode
+						(ecmd, advertising,
+						 25000baseSR_Full);
+					ethtool_link_ksettings_add_link_mode
+						(ecmd, advertising,
+						 25000baseKR_Full);
+					ethtool_link_ksettings_add_link_mode
+						(ecmd, advertising,
+						 25000baseCR_Full);
+				}
+			}
+		} else {
+			ethtool_link_ksettings_add_link_mode(ecmd, supported,
+							     10000baseT_Full);
+			ethtool_link_ksettings_add_link_mode(ecmd, advertising,
+							     10000baseT_Full);
+		}
 		break;
 	}
 
-	if (linfo->link.s.if_mode == INTERFACE_MODE_XAUI ||
-	    linfo->link.s.if_mode == INTERFACE_MODE_RXAUI ||
-	    linfo->link.s.if_mode == INTERFACE_MODE_XLAUI ||
-	    linfo->link.s.if_mode == INTERFACE_MODE_XFI) {
-		ethtool_convert_legacy_u32_to_link_mode(
-			ecmd->link_modes.supported, supported);
-		ethtool_convert_legacy_u32_to_link_mode(
-			ecmd->link_modes.advertising, advertising);
-	} else {
-		dev_err(&oct->pci_dev->dev, "Unknown link interface reported %d\n",
-			linfo->link.s.if_mode);
-	}
-
 	if (linfo->link.s.link_up) {
 		ecmd->base.speed = linfo->link.s.speed;
 		ecmd->base.duplex = linfo->link.s.duplex;
@@ -281,6 +382,49 @@ static int lio_get_link_ksettings(struct net_device *netdev,
 	return 0;
 }
 
+static int lio_set_link_ksettings(struct net_device *netdev,
+				  const struct ethtool_link_ksettings *ecmd)
+{
+	struct lio *lio = GET_LIO(netdev);
+	struct octeon_device *oct = lio->oct_dev;
+	struct oct_link_info *linfo;
+	const int speed = ecmd->base.speed;
+	u32 is25G = 0;
+
+	linfo = &lio->linfo;
+
+	if (oct->subsystem_id == OCTEON_CN2350_25GB_SUBSYS_ID ||
+	    oct->subsystem_id == OCTEON_CN2360_25GB_SUBSYS_ID) {
+		is25G = 1;
+	} else {
+		return -EOPNOTSUPP;
+	}
+
+	if (oct->no_speed_setting) {
+		dev_err(&oct->pci_dev->dev, "%s: Changing speed is not supported\n",
+			__func__);
+		return -EOPNOTSUPP;
+	}
+
+	if ((ecmd->base.duplex != DUPLEX_UNKNOWN &&
+	     ecmd->base.duplex != linfo->link.s.duplex) ||
+	     ecmd->base.autoneg != AUTONEG_DISABLE ||
+	    (ecmd->base.speed != 10000 && ecmd->base.speed != 25000 &&
+	     ecmd->base.speed != SPEED_UNKNOWN))
+		return -EOPNOTSUPP;
+
+	if ((oct->speed_boot == speed / 1000) &&
+	    oct->speed_boot == oct->speed_setting)
+		return 0;
+
+	liquidio_set_speed(lio, speed / 1000);
+
+	dev_dbg(&oct->pci_dev->dev, "Port speed is set to %dG\n",
+		oct->speed_setting);
+
+	return 0;
+}
+
 static void
 lio_get_drvinfo(struct net_device *netdev, struct ethtool_drvinfo *drvinfo)
 {
@@ -2966,6 +3110,7 @@ static int lio_set_priv_flags(struct net_device *netdev, u32 flags)
 
 static const struct ethtool_ops lio_ethtool_ops = {
 	.get_link_ksettings	= lio_get_link_ksettings,
+	.set_link_ksettings	= lio_set_link_ksettings,
 	.get_link		= ethtool_op_get_link,
 	.get_drvinfo		= lio_get_drvinfo,
 	.get_ringparam		= lio_ethtool_get_ringparam,
diff --git a/drivers/net/ethernet/cavium/liquidio/lio_main.c b/drivers/net/ethernet/cavium/liquidio/lio_main.c
index ee75048..e500528 100644
--- a/drivers/net/ethernet/cavium/liquidio/lio_main.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_main.c
@@ -912,6 +912,9 @@ liquidio_probe(struct pci_dev *pdev,
 	/* set linux specific device pointer */
 	oct_dev->pci_dev = (void *)pdev;
 
+	oct_dev->subsystem_id = pdev->subsystem_vendor |
+		(pdev->subsystem_device << 16);
+
 	hs = &handshake[oct_dev->octeon_id];
 	init_completion(&hs->init);
 	init_completion(&hs->started);
@@ -3664,6 +3667,23 @@ static int setup_nic_devices(struct octeon_device *octeon_dev)
 			"NIC ifidx:%d Setup successful\n", i);
 
 		octeon_free_soft_command(octeon_dev, sc);
+
+		if (octeon_dev->subsystem_id ==
+			OCTEON_CN2350_25GB_SUBSYS_ID ||
+		    octeon_dev->subsystem_id ==
+			OCTEON_CN2360_25GB_SUBSYS_ID) {
+			liquidio_get_speed(lio);
+
+			if (octeon_dev->speed_setting == 0) {
+				octeon_dev->speed_setting = 25;
+				octeon_dev->no_speed_setting = 1;
+			}
+		} else {
+			octeon_dev->no_speed_setting = 1;
+			octeon_dev->speed_setting = 10;
+		}
+		octeon_dev->speed_boot = octeon_dev->speed_setting;
+
 	}
 
 	devlink = devlink_alloc(&liquidio_devlink_ops,
diff --git a/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
index 08b682b..41b5c72 100644
--- a/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
@@ -411,6 +411,9 @@ liquidio_vf_probe(struct pci_dev *pdev,
 	/* set linux specific device pointer */
 	oct_dev->pci_dev = pdev;
 
+	oct_dev->subsystem_id = pdev->subsystem_vendor |
+		(pdev->subsystem_device << 16);
+
 	if (octeon_device_init(oct_dev)) {
 		liquidio_vf_remove(pdev);
 		return -ENOMEM;
@@ -2198,6 +2201,8 @@ static int setup_nic_devices(struct octeon_device *octeon_dev)
 			"NIC ifidx:%d Setup successful\n", i);
 
 		octeon_free_soft_command(octeon_dev, sc);
+
+		octeon_dev->no_speed_setting = 1;
 	}
 
 	return 0;
diff --git a/drivers/net/ethernet/cavium/liquidio/liquidio_common.h b/drivers/net/ethernet/cavium/liquidio/liquidio_common.h
index 0265704..285b248 100644
--- a/drivers/net/ethernet/cavium/liquidio/liquidio_common.h
+++ b/drivers/net/ethernet/cavium/liquidio/liquidio_common.h
@@ -93,6 +93,7 @@ enum octeon_tag_type {
 
 #define OPCODE_NIC_VF_REP_PKT          0x15
 #define OPCODE_NIC_VF_REP_CMD          0x16
+#define OPCODE_NIC_UBOOT_CTL           0x17
 
 #define CORE_DRV_TEST_SCATTER_OP    0xFFF5
 
@@ -249,6 +250,9 @@ static inline void add_sg_size(struct octeon_sg_entry *sg_entry,
 #define   OCTNET_CMD_VLAN_FILTER_ENABLE 0x1
 #define   OCTNET_CMD_VLAN_FILTER_DISABLE 0x0
 
+#define   SEAPI_CMD_SPEED_SET           0x2
+#define   SEAPI_CMD_SPEED_GET           0x3
+
 #define   LIO_CMD_WAIT_TM 100
 
 /* RX(packets coming from wire) Checksum verification flags */
diff --git a/drivers/net/ethernet/cavium/liquidio/octeon_device.h b/drivers/net/ethernet/cavium/liquidio/octeon_device.h
index 9430c0a..94a4ed88d 100644
--- a/drivers/net/ethernet/cavium/liquidio/octeon_device.h
+++ b/drivers/net/ethernet/cavium/liquidio/octeon_device.h
@@ -43,6 +43,13 @@
 #define  OCTEON_CN23XX_REV_1_1        0x01
 #define  OCTEON_CN23XX_REV_2_0        0x80
 
+/**SubsystemId for the chips */
+#define	 OCTEON_CN2350_10GB_SUBSYS_ID_1	0X3177d
+#define	 OCTEON_CN2350_10GB_SUBSYS_ID_2	0X4177d
+#define	 OCTEON_CN2360_10GB_SUBSYS_ID	0X5177d
+#define	 OCTEON_CN2350_25GB_SUBSYS_ID	0X7177d
+#define	 OCTEON_CN2360_25GB_SUBSYS_ID	0X6177d
+
 /** Endian-swap modes supported by Octeon. */
 enum octeon_pci_swap_mode {
 	OCTEON_PCI_PASSTHROUGH = 0,
@@ -430,6 +437,8 @@ struct octeon_device {
 
 	u16 rev_id;
 
+	u32 subsystem_id;
+
 	u16 pf_num;
 
 	u16 vf_num;
@@ -584,6 +593,11 @@ struct octeon_device {
 	struct lio_vf_rep_list vf_rep_list;
 	struct devlink *devlink;
 	enum devlink_eswitch_mode eswitch_mode;
+
+	/* for 25G NIC speed change */
+	u8  speed_boot;
+	u8  speed_setting;
+	u8  no_speed_setting;
 };
 
 #define  OCT_DRV_ONLINE 1
diff --git a/drivers/net/ethernet/cavium/liquidio/octeon_network.h b/drivers/net/ethernet/cavium/liquidio/octeon_network.h
index 8571f11..dd3177a 100644
--- a/drivers/net/ethernet/cavium/liquidio/octeon_network.h
+++ b/drivers/net/ethernet/cavium/liquidio/octeon_network.h
@@ -81,6 +81,18 @@ struct oct_nic_stats_ctrl {
 	struct net_device *netdev;
 };
 
+struct oct_nic_seapi_resp {
+	u64 rh;
+	u32 speed;
+	u64 status;
+};
+
+struct liquidio_nic_seapi_ctl_context {
+	int octeon_id;
+	u32 status;
+	struct completion complete;
+};
+
 /** LiquidIO per-interface network private data */
 struct lio {
 	/** State of the interface. Rx/Tx happens only in the RUNNING state.  */
@@ -230,6 +242,9 @@ void lio_delete_glists(struct lio *lio);
 
 int lio_setup_glists(struct octeon_device *oct, struct lio *lio, int num_qs);
 
+int liquidio_get_speed(struct lio *lio);
+int liquidio_set_speed(struct lio *lio, int speed);
+
 /**
  * \brief Net device change_mtu
  * @param netdev network device

^ permalink raw reply related

* Re: [PATCH net-next] net/mlx4_en: optimizes get_fixed_ipv6_csum()
From: Eric Dumazet @ 2018-05-04  2:10 UTC (permalink / raw)
  To: David Miller, eric.dumazet; +Cc: tariqt, saeedm, edumazet, netdev
In-Reply-To: <20180503.215255.1696584242788978423.davem@davemloft.net>



On 05/03/2018 06:52 PM, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Thu, 3 May 2018 17:05:06 -0700
> 
>>
>>
>> On 05/02/2018 07:18 AM, Tariq Toukan wrote:
>>>
>>>
>>> On 27/04/2018 1:56 AM, Saeed Mahameed wrote:
>>
>>>> LGTM,
>>>>
>>>> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
>>>>
>>>
>>> Acked-by: Tariq Toukan <tariqt@mellanox.com>
>>>
>>> Thanks Eric.
>>
>> Thanks guys. 
>>
>> I see this patch  ( http://patchwork.ozlabs.org/patch/901336/ ) in
>> a state I do not know : "Awaiting Upstream"
> 
> THat means I expect to see this change from the upstream
> maintainer, which in this case is Tariq.
> 

I see, but it seems Tariq does not know that, otherwise he would
not have sent an "Acked-by:"

I guess this will need an extra round-trip ...

^ permalink raw reply

* RE: KMSAN: uninit-value in strcmp
From: Jon Maloy @ 2018-05-04  1:58 UTC (permalink / raw)
  To: David Miller,
	syzbot+df0257c92ffd4fcc58cd@syzkaller.appspotmail.com
  Cc: linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	syzkaller-bugs@googlegroups.com,
	tipc-discussion@lists.sourceforge.net, ying.xue@windriver.com
In-Reply-To: <20180503.152213.740988890883315228.davem@davemloft.net>



> -----Original Message-----
> From: David Miller [mailto:davem@davemloft.net]
> Sent: Thursday, May 03, 2018 15:22
> To: syzbot+df0257c92ffd4fcc58cd@syzkaller.appspotmail.com
> Cc: Jon Maloy <jon.maloy@ericsson.com>; linux-kernel@vger.kernel.org;
> netdev@vger.kernel.org; syzkaller-bugs@googlegroups.com; tipc-
> discussion@lists.sourceforge.net; ying.xue@windriver.com
> Subject: Re: KMSAN: uninit-value in strcmp
> 
> From: syzbot <syzbot+df0257c92ffd4fcc58cd@syzkaller.appspotmail.com>
> Date: Thu, 03 May 2018 11:44:02 -0700
> 
> > Call Trace:
> >  __dump_stack lib/dump_stack.c:17 [inline]
> >  dump_stack+0x185/0x1d0 lib/dump_stack.c:53
> >  kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1067
> >  __msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:683
> >  strcmp+0xf7/0x160 lib/string.c:329
> >  tipc_nl_node_get_link+0x220/0x6f0 net/tipc/node.c:1881
> > genl_family_rcv_msg net/netlink/genetlink.c:599 [inline]
> 
> Hmmm, TIPC_NL_LINK_GET uses tipc_nl_policy, which has a proper nesting
> entry for TIPC_NLA_LINK.  I wonder how the code goes about validating
> TIPC_NLA_LINK_NAME in such a case?  Does it?

I assume that a strncmp() instead of a strcmp() would avert this particular crash, but it doesn't sound like that is what you are after here?
To be honest, I will need to study this code a little myself to understand if there is more that has to be done.

///jon

> 
> This may be the problem.

^ permalink raw reply

* Re: [net-next PATCH 4/5] udp: Do not copy destructor if one is not present
From: Eric Dumazet @ 2018-05-04  1:58 UTC (permalink / raw)
  To: Alexander Duyck, netdev, willemb, davem
In-Reply-To: <20180504003339.4496.26712.stgit@localhost.localdomain>



On 05/03/2018 05:33 PM, Alexander Duyck wrote:
> From: Alexander Duyck <alexander.h.duyck@intel.com>
> 
> This patch makes it so that if a destructor is not present we avoid trying
> to update the skb socket or any reference counting that would be associated
> with the NULL socket and/or descriptor. By doing this we can support
> traffic coming from another namespace without any issues.
> 
> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
> ---
>  net/ipv4/udp_offload.c |   19 +++++++++++++------
>  1 file changed, 13 insertions(+), 6 deletions(-)
> 
> diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
> index fd94bbb369b2..52760660d674 100644
> --- a/net/ipv4/udp_offload.c
> +++ b/net/ipv4/udp_offload.c
> @@ -195,6 +195,7 @@ struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb,
>  	unsigned int sum_truesize = 0;
>  	struct udphdr *uh;
>  	unsigned int mss;
> +	bool copy_dtor;
>  	__sum16 check;
>  	__be16 newlen;
>  
> @@ -208,12 +209,14 @@ struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb,
>  	skb_pull(gso_skb, sizeof(*uh));
>  
>  	/* clear destructor to avoid skb_segment assigning it to tail */
> -	WARN_ON_ONCE(gso_skb->destructor != sock_wfree);
> +	copy_dtor = gso_skb->destructor == sock_wfree;
> +	WARN_ON_ONCE(gso_skb->destructor && !copy_dtor);

Why not simply clear gso_skb->destructor only if copy_dtor is true ?

Then we can get rid of this WARN_ON_ONCE()

>  	gso_skb->destructor = NULL;
>  
>  	segs = skb_segment(gso_skb, features);
>  	if (unlikely(IS_ERR_OR_NULL(segs))) {
> -		gso_skb->destructor = sock_wfree;
> +		if (copy_dtor)
> +			gso_skb->destructor = sock_wfree;
>  		return segs;
>  	}
>  
> @@ -234,9 +237,11 @@ struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb,
>  					    (__force u32)newlen));
>  
>  	for (;;) {
> -		seg->destructor = sock_wfree;
> -		seg->sk = sk;
> -		sum_truesize += seg->truesize;
> +		if (copy_dtor) {
> +			seg->destructor = sock_wfree;
> +			seg->sk = sk;
> +			sum_truesize += seg->truesize;
> +		}
>  
>  		if (!seg->next)
>  			break;
> @@ -268,7 +273,9 @@ struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb,
>  		uh->check = gso_make_checksum(seg, ~check);
>  
>  	/* update refcount for the packet */
> -	refcount_add(sum_truesize - gso_skb->truesize, &sk->sk_wmem_alloc);
> +	if (copy_dtor)
> +		refcount_add(sum_truesize - gso_skb->truesize,
> +			     &sk->sk_wmem_alloc);
>  out:
>  	return segs;
>  }
> 

^ permalink raw reply

* Re: [net-next PATCH 2/5] udp: Do not pass checksum or MSS as parameters
From: Eric Dumazet @ 2018-05-04  1:56 UTC (permalink / raw)
  To: Alexander Duyck, netdev, willemb, davem
In-Reply-To: <20180504003326.4496.54432.stgit@localhost.localdomain>



On 05/03/2018 05:33 PM, Alexander Duyck wrote:
> From: Alexander Duyck <alexander.h.duyck@intel.com>
> 
> This patch is meant to be a start at cleaning up some of the UDP GSO
> segmentation code. Specifically we were passing mss and a recomputed
> checksum when we really didn't need to. The function itself could derive
> that information based on the already provided checksum, the length of the
> packet stored in the UDP header, and the gso_size.
> 
> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
> ---
>  include/net/udp.h      |    3 +-
>  net/ipv4/udp_offload.c |   61 +++++++++++++++++++++++++++++++-----------------
>  net/ipv6/udp_offload.c |    7 +-----
>  3 files changed, 41 insertions(+), 30 deletions(-)
>

There are really a lot of changes in this patch (not mentioned in changelog).

Can you please split all this in small patches ?

^ permalink raw reply

* Re: [PATCH net-next] net/mlx4_en: optimizes get_fixed_ipv6_csum()
From: David Miller @ 2018-05-04  1:52 UTC (permalink / raw)
  To: eric.dumazet; +Cc: tariqt, saeedm, edumazet, netdev
In-Reply-To: <be5f3289-2a60-5c61-90e9-99aad086ec9d@gmail.com>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 3 May 2018 17:05:06 -0700

> 
> 
> On 05/02/2018 07:18 AM, Tariq Toukan wrote:
>> 
>> 
>> On 27/04/2018 1:56 AM, Saeed Mahameed wrote:
> 
>>> LGTM,
>>>
>>> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
>>>
>> 
>> Acked-by: Tariq Toukan <tariqt@mellanox.com>
>> 
>> Thanks Eric.
> 
> Thanks guys. 
> 
> I see this patch  ( http://patchwork.ozlabs.org/patch/901336/ ) in
> a state I do not know : "Awaiting Upstream"

THat means I expect to see this change from the upstream
maintainer, which in this case is Tariq.

^ permalink raw reply

* Re: [net-next PATCH 1/5] udp: Record gso_segs when supporting UDP segmentation offload
From: Eric Dumazet @ 2018-05-04  1:50 UTC (permalink / raw)
  To: Alexander Duyck, netdev, willemb, davem
In-Reply-To: <20180504003321.4496.34585.stgit@localhost.localdomain>



On 05/03/2018 05:33 PM, Alexander Duyck wrote:
> From: Alexander Duyck <alexander.h.duyck@intel.com>
> 
> We need to record the number of segments that will be generated when this
> frame is segmented. The expectation is that if gso_size is set then
> gso_segs is set as well. Without this some drivers such as ixgbe get
> confused if they attempt to offload this as they record 0 segments for the
> entire packet instead of the correct value.
> 
> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
> ---
>  net/ipv4/udp.c |    2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> index dd3102a37ef9..e07db83b311e 100644
> --- a/net/ipv4/udp.c
> +++ b/net/ipv4/udp.c
> @@ -793,6 +793,8 @@ static int udp_send_skb(struct sk_buff *skb, struct flowi4 *fl4,
>  
>  		skb_shinfo(skb)->gso_size = cork->gso_size;
>  		skb_shinfo(skb)->gso_type = SKB_GSO_UDP_L4;
> +		skb_shinfo(skb)->gso_segs = DIV_ROUND_UP(len - sizeof(uh),
> +							 cork->gso_size);
>  		goto csum_partial;
>  	}
>  
> 

Yes, this also fixes qdisc_pkt_len_init()

Reviewed-by: Eric Dumazet <edumazet@google.com>

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox