* Re: [v2] ath10k: fix incorrect size of dma_free_coherent in ath10k_ce_alloc_src_ring_64
From: Kalle Valo @ 2018-06-14 15:11 UTC (permalink / raw)
To: YueHaibing
Cc: linux-wireless, netdev, YueHaibing, linux-kernel, ath10k, davem
In-Reply-To: <20180601112548.22592-1-yuehaibing@huawei.com>
YueHaibing <yuehaibing@huawei.com> wrote:
> sizeof(struct ce_desc) should be a copy-paste mistake
> just use sizeof(struct ce_desc_64) to avoid mem leak
>
> Fixes: b7ba83f7c414 ("ath10k: add support for shadow register for WNC3990")
> Signed-off-by: YueHaibing <yuehaibing@huawei.com>
> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Patch applied to ath-next branch of ath.git, thanks.
5a211627004e ath10k: fix incorrect size of dma_free_coherent in ath10k_ce_alloc_src_ring_64
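For readers skimming the archive, here is a minimal user-space sketch of why the size passed on the free path has to use the same descriptor type as the allocation. The struct layouts, the DEMO_RING_ALIGN padding and all demo_* names are hypothetical stand-ins, not the ath10k definitions:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for struct ce_desc and struct ce_desc_64. */
struct demo_desc {
	uint32_t addr;
	uint16_t nbytes;
	uint16_t flags;
};

struct demo_desc_64 {
	uint64_t addr;
	uint16_t nbytes;
	uint16_t flags;
	uint32_t extra;
};

#define DEMO_RING_ALIGN 8	/* assumed padding added to the ring */

/* Size requested when the 64-bit ring is allocated. */
static size_t demo_alloc_bytes(size_t nentries)
{
	return nentries * sizeof(struct demo_desc_64) + DEMO_RING_ALIGN;
}

/* Copy-paste variant: the free path uses the 32-bit descriptor size. */
static size_t demo_free_bytes_buggy(size_t nentries)
{
	return nentries * sizeof(struct demo_desc) + DEMO_RING_ALIGN;
}

/* Fixed variant: the free path uses the same type as the allocation. */
static size_t demo_free_bytes_fixed(size_t nentries)
{
	return nentries * sizeof(struct demo_desc_64) + DEMO_RING_ALIGN;
}

/* Returns nonzero when the fixed free size equals the allocated size. */
static int demo_free_matches_alloc(size_t nentries)
{
	assert(nentries > 0);
	return demo_free_bytes_fixed(nentries) == demo_alloc_bytes(nentries);
}
```

With these stand-in layouts the buggy variant under-reports the region by nentries times the difference between the two descriptor sizes.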
--
https://patchwork.kernel.org/patch/10443063/
https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
^ permalink raw reply
* Re: [PATCH bpf-next v5 00/10] BTF: BPF Type Format
From: Arnaldo Carvalho de Melo @ 2018-06-14 15:03 UTC (permalink / raw)
To: Martin KaFai Lau
Cc: netdev, Alexei Starovoitov, Daniel Borkmann, kernel-team,
Wang Nan, Jiri Olsa, Namhyung Kim, Ingo Molnar
In-Reply-To: <20180613232638.yyhktiovl6oeawgt@kafai-mbp>
On Wed, Jun 13, 2018 at 04:26:38PM -0700, Martin KaFai Lau wrote:
> On Tue, Jun 12, 2018 at 05:41:26PM -0300, Arnaldo Carvalho de Melo wrote:
> > On Tue, Jun 12, 2018 at 05:31:24PM -0300, Arnaldo Carvalho de Melo wrote:
> > > On Thu, Jun 07, 2018 at 01:07:01PM -0700, Martin KaFai Lau wrote:
> > > > On Thu, Jun 07, 2018 at 04:30:29PM -0300, Arnaldo Carvalho de Melo wrote:
> > > > > So this must be available in a newer llvm version? Which one?
> > > > I should have put in the details in my last email or
> > > > in the commit message, my bad.
> > > > 1. The tools/testing/selftests/bpf/Makefile has the CLANG_FLAGS and
> > > > LLC_FLAGS needed to compile the bpf prog. It requires a new
> > > > "-mattr=dwarf" llc option which was added to the future
> > > > llvm 7.0.
> > > [root@jouet bpf]# pahole hello.o
> > > struct clang version 5.0.1 (tags/RELEASE_501/final) {
> > > clang version 5.0.1 (tags/RELEASE_501/final) clang version 5.0.1 (tags/RELEASE_501/final); /* 0 4 */
> > > clang version 5.0.1 (tags/RELEASE_501/final) clang version 5.0.1 (tags/RELEASE_501/final); /* 4 4 */
> > > clang version 5.0.1 (tags/RELEASE_501/final) clang version 5.0.1 (tags/RELEASE_501/final); /* 8 4 */
> > > clang version 5.0.1 (tags/RELEASE_501/final) clang version 5.0.1 (tags/RELEASE_501/final); /* 12 4 */
> > > /* size: 16, cachelines: 1, members: 4 */
> > > /* last cacheline: 16 bytes */
> > > };
> > > [root@jouet bpf]#
> > >
> > > Ok, I guess I saw this case in the llvm/clang git logs, so this one was
> > > generated with the older clang, will regenerate and add that "-mattr=dwarf"
> > > part.
> > [root@jouet bpf]# pahole hello.o
> > struct clang version 7.0.0 <SNIP>
<SNIP>
> > /* size: 16, cachelines: 1, members: 4 */
> > /* last cacheline: 16 bytes */
> > };
> That means the "-mattr=dwarf" is not effective.
> Can you share your clang and llc command to create hello.o?
I tried it, but it didn't work, see:
[root@jouet bpf]# cat hello.c
#include "stdio.h"
int syscall_enter(openat)(void *ctx)
{
puts("Hello, world\n");
return 0;
}
[root@jouet bpf]# trace -e openat,hello.c touch /tmp/kafai
clang-6.0: error: unknown argument: '-mattr=dwarf'
ERROR: unable to compile hello.c
Hint: Check error message shown above.
Hint: You can also pre-compile it into .o using:
clang -target bpf -O2 -c hello.c
with proper -I and -D options.
event syntax error: 'hello.c'
\___ Failed to load hello.c from source: Error when compiling BPF scriptlet
(add -v to see detail)
Run 'perf list' for a list of valid events
Usage: perf trace [<options>] [<command>]
or: perf trace [<options>] -- <command> [<options>]
or: perf trace record [<options>] [<command>]
or: perf trace record [<options>] -- <command> [<options>]
-e, --event <event> event/syscall selector. use 'perf list' to list available events
[root@jouet bpf]#
The full command line with that is:
[root@jouet bpf]# trace -v -e openat,hello.c touch /tmp/kafai |& grep mattr
set env: CLANG_OPTIONS=-g -mattr=dwarf
llvm compiling command : /usr/local/bin/clang -D__KERNEL__ -D__NR_CPUS__=4 -DLINUX_VERSION_CODE=0x41100 -g -mattr=dwarf -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/7/include -I/home/acme/git/linux/arch/x86/include -I./arch/x86/include/generated -I/home/acme/git/linux/include -I./include -I/home/acme/git/linux/arch/x86/include/uapi -I./arch/x86/include/generated/uapi -I/home/acme/git/linux/include/uapi -I./include/generated/uapi -include /home/acme/git/linux/include/linux/kconfig.h -I/home/acme/lib/include/perf/bpf -Wno-unused-value -Wno-pointer-sign -working-directory /lib/modules/4.17.0-rc5/build -c /home/acme/bpf/hello.c -target bpf -O2 -o -
clang-6.0: error: unknown argument: '-mattr=dwarf'
[root@jouet bpf]#
This is with these llvm and clang trees:
[root@jouet llvm]# git log --oneline -5
98c78e82f54 (HEAD -> master, origin/master, origin/HEAD) [asan] Instrument comdat globals on COFF targets
6ad988b5998 [DAGCombiner] clean up comments; NFC
a735ba5b795 [X86][SSE] Support v8i16/v16i16 rotations
1503b9f6fe8 [x86] add tests for node-level FMF; NFC
4a49826736f [x86] regenerate test checks; NFC
[root@jouet llvm]#
[root@jouet llvm]# cd tools/clang/
[root@jouet clang]# git log --oneline -5
8c873daccc (HEAD -> master, origin/master, origin/HEAD) [X86] Add builtins for vpermq/vpermpd instructions to enable target feature checking.
a344be6ba4 [X86] Change immediate type for some builtins from char to int.
dcdd53793e [CUDA] Fix emission of constant strings in sections
a90c85acaf [X86] Add builtins for shufps and shufpd to enable target feature and immediate range checking.
ff71c0eccc [X86] Add builtins for pshufd, pshuflw, and pshufhw to enable target feature and immediate range checking.
[root@jouet clang]#
[root@jouet clang]# git log | grep mattr=dwarf
[root@jouet clang]# cd -
/home/acme/git.tmp/git/llvm
[root@jouet llvm]# git log | grep mattr=dwarf
bpf: introduce -mattr=dwarfris to disable DwarfUsesRelocationsAcrossSections
This patch introduces a new flag -mattr=dwarfris
[root@jouet llvm]#
Humm, so it's -mattr=dwarfris and not -mattr=dwarf?
Didn't help :-\
commit 0e0047f8c9ada2f0fe0c5f01579a80e2455b8df5
Author: Yonghong Song <yhs@fb.com>
Date: Thu Mar 1 23:04:59 2018 +0000
bpf: introduce -mattr=dwarfris to disable DwarfUsesRelocationsAcrossSections
Commit e4507fb8c94b ("bpf: disable DwarfUsesRelocationsAcrossSections")
disables MCAsmInfo DwarfUsesRelocationsAcrossSections unconditionally
so that dwarf will not use cross section (between dwarf and symbol table)
relocations. This new debug format enables pahole to dump structures
correctly as libdwarves.so does not have BPF backend support yet.
This new debug format, however, breaks bcc (https://github.com/iovisor/bcc)
source debug output as llvm in-memory Dwarf support has some issues to
handle it. More specifically, with DwarfUsesRelocationsAcrossSections
disabled, JIT compiler does not generate .debug_abbrev and Dwarf
DIE (debug info entry) processing is not happy about this.
This patch introduces a new flag -mattr=dwarfris
(dwarf relocation in section) to disable DwarfUsesRelocationsAcrossSections.
DwarfUsesRelocationsAcrossSections is true by default.
Signed-off-by: Yonghong Song <yhs@fb.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@326505 91177308-0d34-0410-b5e6-96231b3b80d8
^ permalink raw reply
* Re: [PULL] vhost: cleanups and fixes
From: Nitesh Narayan Lal @ 2018-06-14 15:01 UTC (permalink / raw)
To: Wei Wang, Linus Torvalds, Michael S. Tsirkin
Cc: KVM list, virtualization, Network Development,
Linux Kernel Mailing List, Andrew Morton, Bjorn Andersson
In-Reply-To: <5B1FA8EF.4030409@intel.com>
[-- Attachment #1.1: Type: text/plain, Size: 3316 bytes --]
Hi Wei,
On 06/12/2018 07:05 AM, Wei Wang wrote:
> On 06/12/2018 09:59 AM, Linus Torvalds wrote:
>> On Mon, Jun 11, 2018 at 6:36 PM Michael S. Tsirkin <mst@redhat.com>
>> wrote:
>>> Maybe it will help to have GFP_NONE which will make any allocation
>>> fail if attempted. Linus, would this address your comment?
>> It would definitely have helped me initially overlook that call chain.
>>
>> But then when I started looking at the whole dma_map_page() thing, it
>> just raised my hackles again.
>>
>> I would seriously suggest having a much simpler version for the "no
>> allocation, no dma mapping" case, so that it's *obvious* that that
>> never happens.
>>
>> So instead of having virtio_balloon_send_free_pages() call a really
>> generic complex chain of functions that in _some_ cases can do memory
>> allocation, why isn't there a short-circuited "virtqueue_add_datum()"
>> that is guaranteed to never do anything like that?
>>
>> Honestly, I look at "add_one_sg()" and it really doesn't make me
>> happy. It looks hacky as hell. If I read the code right, you're really
>> trying to just queue up a simple tuple of <pfn,len>, except you encode
>> it as a page pointer in order to play games with the SG logic, and
>> then you map that to the ring, except in this case it's all a fake
>> ring that just adds the cpu-physical address instead.
>>
>> And to figure that out, it's like five layers of indirection through
>> different helper functions that *can* do more generic things but in
>> this case don't.
>>
>> And you do all of this from a core VM callback function with some
>> _really_ core VM locks held.
>>
>> That makes no sense to me.
>>
>> How about this:
>>
>> - get rid of all that code
>>
>> - make the core VM callback save the "these are the free memory
>> regions" in a fixed and limited array. One that DOES JUST THAT. No
>> crazy "SG IO dma-mapping function crap". Just a plain array of a fixed
>> size, pre-allocated for that virtio instance.
>>
>> - make it obvious that what you do in that sequence is ten
>> instructions and no allocations ("Look ma, I wrote a value to an array
>> and incremented the array index, and I'M DONE")
>>
>> - then in that workqueue entry that you start *anyway*, you empty the
>> array and do all the crazy virtio stuff.
>>
>> In fact, while at it, just simplify the VM interface too. Instead of
>> traversing a random number of buddy lists, just traverse *one* - the
>> top-level one. Are you seriously ever going to shrink or mark
>> read-only anything *but* something big enough to be in the maximum
>> order?
>>
>> MAX_ORDER is what, 11? So we're talking 8MB blocks. Do you *really*
>> want the balloon code to work on smaller things, particularly since
>> the whole interface is fundamentally racy and opportunistic to begin
>> with?
>
> OK, I will implement a new version based on the suggestions. Thanks.
I have been working on a similar series [1] that is more generic, which
solves the problem of giving unused memory back to the host and could be
used to solve the migration problem as well. Can you take a look and see
if you can use my series in some way?
[1] https://www.spinics.net/lists/kvm/msg170113.html
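For reference, the fixed-array scheme outlined in the quoted text above might look roughly like the following. This is only a user-space sketch under stated assumptions: the demo_* names, the array size of 64 and the stub sender are all hypothetical, and the locking a real callback and worker would need is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MAX_REGIONS 64	/* assumed fixed capacity, chosen arbitrarily */

struct demo_region {
	unsigned long pfn;
	unsigned long len;
};

/* Pre-allocated once per device instance; never grows. */
struct demo_free_hints {
	struct demo_region regions[DEMO_MAX_REGIONS];
	size_t count;
};

/*
 * Callback side: store one <pfn,len> tuple and bump the index.
 * No allocation and no DMA mapping; returns false when the array is full.
 */
static bool demo_record_region(struct demo_free_hints *h,
			       unsigned long pfn, unsigned long len)
{
	assert(h != NULL);
	if (h->count >= DEMO_MAX_REGIONS)
		return false;
	h->regions[h->count].pfn = pfn;
	h->regions[h->count].len = len;
	h->count++;
	return true;
}

/* Hypothetical sender used only to make the sketch observable. */
static unsigned long demo_sent_total;

static void demo_send_stub(unsigned long pfn, unsigned long len)
{
	(void)pfn;
	demo_sent_total += len;
}

/*
 * Deferred side (the workqueue entry in the quoted text): hand every
 * recorded tuple to the sender, then reset the array for reuse.
 */
static size_t demo_drain(struct demo_free_hints *h,
			 void (*send)(unsigned long pfn, unsigned long len))
{
	size_t i, n = h->count;

	for (i = 0; i < n; i++)
		send(h->regions[i].pfn, h->regions[i].len);
	h->count = 0;
	return n;
}
```

The point of the split is that everything expensive happens in demo_drain(), while the recording step stays a bounded store plus an increment.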
>
> Best,
> Wei
>
--
Regards
Nitesh
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]
^ permalink raw reply
* [PATCH v2 3/3] bpfilter: check compiler capability in Kconfig
From: Masahiro Yamada @ 2018-06-14 14:39 UTC (permalink / raw)
To: netdev, Alexei Starovoitov, David S . Miller
Cc: Arnd Bergmann, Geert Uytterhoeven, linux-kernel, Masahiro Yamada,
linux-kbuild, Michal Marek, Daniel Borkmann
In-Reply-To: <1528987172-19810-1-git-send-email-yamada.masahiro@socionext.com>
With the brand-new syntax extension of Kconfig, we can directly
check the compiler capability in the configuration phase.
If the cc-can-link.sh fails, the BPFILTER_UMH is automatically
hidden by the dependency.
I deleted 'default n', which is no-op.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
---
Changes in v2:
- newly added
Makefile | 5 -----
net/Makefile | 4 ----
net/bpfilter/Kconfig | 2 +-
scripts/cc-can-link.sh | 2 +-
4 files changed, 2 insertions(+), 11 deletions(-)
diff --git a/Makefile b/Makefile
index 8a26b59..9ada673 100644
--- a/Makefile
+++ b/Makefile
@@ -507,11 +507,6 @@ ifeq ($(shell $(CONFIG_SHELL) $(srctree)/scripts/gcc-goto.sh $(CC) $(KBUILD_CFLA
KBUILD_AFLAGS += -DCC_HAVE_ASM_GOTO
endif
-ifeq ($(shell $(CONFIG_SHELL) $(srctree)/scripts/cc-can-link.sh $(CC)), y)
- CC_CAN_LINK := y
- export CC_CAN_LINK
-endif
-
# The expansion should be delayed until arch/$(SRCARCH)/Makefile is included.
# Some architectures define CROSS_COMPILE in arch/$(SRCARCH)/Makefile.
# CC_VERSION_TEXT is referenced from Kconfig (so it needs export),
diff --git a/net/Makefile b/net/Makefile
index 13ec0d5..bdaf539 100644
--- a/net/Makefile
+++ b/net/Makefile
@@ -20,11 +20,7 @@ obj-$(CONFIG_TLS) += tls/
obj-$(CONFIG_XFRM) += xfrm/
obj-$(CONFIG_UNIX) += unix/
obj-$(CONFIG_NET) += ipv6/
-ifneq ($(CC_CAN_LINK),y)
-$(warning CC cannot link executables. Skipping bpfilter.)
-else
obj-$(CONFIG_BPFILTER) += bpfilter/
-endif
obj-$(CONFIG_PACKET) += packet/
obj-$(CONFIG_NET_KEY) += key/
obj-$(CONFIG_BRIDGE) += bridge/
diff --git a/net/bpfilter/Kconfig b/net/bpfilter/Kconfig
index a948b07..76deb66 100644
--- a/net/bpfilter/Kconfig
+++ b/net/bpfilter/Kconfig
@@ -1,6 +1,5 @@
menuconfig BPFILTER
bool "BPF based packet filtering framework (BPFILTER)"
- default n
depends on NET && BPF && INET
help
This builds experimental bpfilter framework that is aiming to
@@ -9,6 +8,7 @@ menuconfig BPFILTER
if BPFILTER
config BPFILTER_UMH
tristate "bpfilter kernel module with user mode helper"
+ depends on $(success,$(srctree)/scripts/cc-can-link.sh $(CC))
default m
help
This builds bpfilter kernel module with embedded user mode helper
diff --git a/scripts/cc-can-link.sh b/scripts/cc-can-link.sh
index 208eb28..6efcead 100755
--- a/scripts/cc-can-link.sh
+++ b/scripts/cc-can-link.sh
@@ -1,7 +1,7 @@
#!/bin/sh
# SPDX-License-Identifier: GPL-2.0
-cat << "END" | $@ -x c - -o /dev/null >/dev/null 2>&1 && echo "y"
+cat << "END" | $@ -x c - -o /dev/null >/dev/null 2>&1
#include <stdio.h>
int main(void)
{
--
2.7.4
^ permalink raw reply related
* [PATCH v2 2/3] bpfilter: include bpfilter_umh in assembly instead of using objcopy
From: Masahiro Yamada @ 2018-06-14 14:39 UTC (permalink / raw)
To: netdev, Alexei Starovoitov, David S . Miller
Cc: Arnd Bergmann, Geert Uytterhoeven, linux-kernel, Masahiro Yamada,
Alexei Starovoitov, YueHaibing
In-Reply-To: <1528987172-19810-1-git-send-email-yamada.masahiro@socionext.com>
What we want here is to embed a user-space program into the kernel.
Instead of the complex ELF magic, let's simply wrap it in the assembly
with the '.incbin' directive.
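As a rough user-space illustration of how the C side consumes the two labels: the blob's data pointer is the start label and its length is the distance between the labels. The array below merely stands in for the bytes that '.incbin' would place between them; the demo_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the embedded image; in the patch the bytes come from .incbin. */
static const char demo_image[] = { 0x7f, 'E', 'L', 'F' };

/* Plays the role of the start label: address of the first byte. */
static const char *demo_blob_start(void)
{
	return demo_image;
}

/* Plays the role of the end label: address one past the last byte. */
static const char *demo_blob_end(void)
{
	return demo_image + sizeof(demo_image);
}

/* Length of the blob is the pointer difference between the two labels. */
static size_t demo_blob_len(const char *start, const char *end)
{
	assert(end >= start);
	return (size_t)(end - start);
}
```

A loader would be handed demo_blob_start() as the data pointer together with demo_blob_len(demo_blob_start(), demo_blob_end()) as the size.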
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
---
Changes in v2:
- Rebase
net/bpfilter/Makefile | 15 ++-------------
net/bpfilter/bpfilter_kern.c | 11 +++++------
net/bpfilter/bpfilter_umh_blob.S | 7 +++++++
3 files changed, 14 insertions(+), 19 deletions(-)
create mode 100644 net/bpfilter/bpfilter_umh_blob.S
diff --git a/net/bpfilter/Makefile b/net/bpfilter/Makefile
index e0bbe75..39c6980 100644
--- a/net/bpfilter/Makefile
+++ b/net/bpfilter/Makefile
@@ -15,18 +15,7 @@ ifeq ($(CONFIG_BPFILTER_UMH), y)
HOSTLDFLAGS += -static
endif
-# a bit of elf magic to convert bpfilter_umh binary into a binary blob
-# inside bpfilter_umh.o elf file referenced by
-# _binary_net_bpfilter_bpfilter_umh_start symbol
-# which bpfilter_kern.c passes further into umh blob loader at run-time
-quiet_cmd_copy_umh = GEN $@
- cmd_copy_umh = echo ':' > $(obj)/.bpfilter_umh.o.cmd; \
- $(OBJCOPY) -I binary -O `$(OBJDUMP) -f $<|grep format|cut -d' ' -f8` \
- -B `$(OBJDUMP) -f $<|grep architecture|cut -d, -f1|cut -d' ' -f2` \
- --rename-section .data=.init.rodata $< $@
-
-$(obj)/bpfilter_umh.o: $(obj)/bpfilter_umh
- $(call cmd,copy_umh)
+$(obj)/bpfilter_umh_blob.o: $(obj)/bpfilter_umh
obj-$(CONFIG_BPFILTER_UMH) += bpfilter.o
-bpfilter-objs += bpfilter_kern.o bpfilter_umh.o
+bpfilter-objs += bpfilter_kern.o bpfilter_umh_blob.o
diff --git a/net/bpfilter/bpfilter_kern.c b/net/bpfilter/bpfilter_kern.c
index 0952257..6de3ae5 100644
--- a/net/bpfilter/bpfilter_kern.c
+++ b/net/bpfilter/bpfilter_kern.c
@@ -10,11 +10,8 @@
#include <linux/file.h>
#include "msgfmt.h"
-#define UMH_start _binary_net_bpfilter_bpfilter_umh_start
-#define UMH_end _binary_net_bpfilter_bpfilter_umh_end
-
-extern char UMH_start;
-extern char UMH_end;
+extern char bpfilter_umh_start;
+extern char bpfilter_umh_end;
static struct umh_info info;
/* since ip_getsockopt() can run in parallel, serialize access to umh */
@@ -93,7 +90,9 @@ static int __init load_umh(void)
int err;
/* fork usermode process */
- err = fork_usermode_blob(&UMH_start, &UMH_end - &UMH_start, &info);
+ err = fork_usermode_blob(&bpfilter_umh_start,
+ &bpfilter_umh_end - &bpfilter_umh_start,
+ &info);
if (err)
return err;
pr_info("Loaded bpfilter_umh pid %d\n", info.pid);
diff --git a/net/bpfilter/bpfilter_umh_blob.S b/net/bpfilter/bpfilter_umh_blob.S
new file mode 100644
index 0000000..40311d1
--- /dev/null
+++ b/net/bpfilter/bpfilter_umh_blob.S
@@ -0,0 +1,7 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+ .section .init.rodata, "a"
+ .global bpfilter_umh_start
+bpfilter_umh_start:
+ .incbin "net/bpfilter/bpfilter_umh"
+ .global bpfilter_umh_end
+bpfilter_umh_end:
--
2.7.4
^ permalink raw reply related
* [PATCH v2 1/3] bpfilter: add bpfilter_umh to .gitignore
From: Masahiro Yamada @ 2018-06-14 14:39 UTC (permalink / raw)
To: netdev, Alexei Starovoitov, David S . Miller
Cc: Arnd Bergmann, Geert Uytterhoeven, linux-kernel, Masahiro Yamada
In-Reply-To: <1528987172-19810-1-git-send-email-yamada.masahiro@socionext.com>
bpfilter_umh is a generated file. It should be ignored by git.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
---
Changes in v2: None
net/bpfilter/.gitignore | 1 +
1 file changed, 1 insertion(+)
create mode 100644 net/bpfilter/.gitignore
diff --git a/net/bpfilter/.gitignore b/net/bpfilter/.gitignore
new file mode 100644
index 0000000..e97084e
--- /dev/null
+++ b/net/bpfilter/.gitignore
@@ -0,0 +1 @@
+bpfilter_umh
--
2.7.4
^ permalink raw reply related
* [PATCH v2 0/3] net: bpfilter: clean-up build rules
From: Masahiro Yamada @ 2018-06-14 14:39 UTC (permalink / raw)
To: netdev, Alexei Starovoitov, David S . Miller
Cc: Arnd Bergmann, Geert Uytterhoeven, linux-kernel, Masahiro Yamada,
linux-kbuild, Michal Marek, Alexei Starovoitov, Daniel Borkmann,
YueHaibing
Clean-up from Kbuild/Kconfig point of view.
I confirmed this series can apply and compile
based on today's Linus tree.
(commit 2837461dbe6f)
Masahiro Yamada (3):
bpfilter: add bpfilter_umh to .gitignore
bpfilter: include bpfilter_umh in assembly instead of using objcopy
bpfilter: check compiler capability in Kconfig
Makefile | 5 -----
net/Makefile | 4 ----
net/bpfilter/.gitignore | 1 +
net/bpfilter/Kconfig | 2 +-
net/bpfilter/Makefile | 15 ++-------------
net/bpfilter/bpfilter_kern.c | 11 +++++------
net/bpfilter/bpfilter_umh_blob.S | 7 +++++++
scripts/cc-can-link.sh | 2 +-
8 files changed, 17 insertions(+), 30 deletions(-)
create mode 100644 net/bpfilter/.gitignore
create mode 100644 net/bpfilter/bpfilter_umh_blob.S
--
2.7.4
^ permalink raw reply
* Re: [PATCH] iwlwifi: pcie: make array prop static, shrinks object size
From: Kalle Valo @ 2018-06-14 14:36 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: Joe Perches, Colin King, Johannes Berg, Emmanuel Grumbach,
Luca Coelho, Intel Linux Wireless, David S . Miller,
linux-wireless, netdev, kernel-janitors, linux-kernel
In-Reply-To: <20180611204506.GA21542@kroah.com>
Greg Kroah-Hartman <gregkh@linuxfoundation.org> writes:
> On Mon, Jun 11, 2018 at 12:40:55PM -0700, Joe Perches wrote:
>> (adding Greg KH)
>>
>> Now what is happening is that prop is being reloaded
>> each invocation with the constant addresses of the strings.
>>
>> It seems the prototype and function for kobject_uevent_env
>> should change as well to avoid this.
>>
>> Perhaps this should become:
>> ---
>> drivers/net/wireless/intel/iwlwifi/pcie/trans.c | 2 +-
>> include/linux/kobject.h | 2 +-
>> lib/kobject_uevent.c | 2 +-
>> 3 files changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/net/wireless/intel/iwlwifi/pcie/trans.c b/drivers/net/wireless/intel/iwlwifi/pcie/trans.c
>> index 7229991ae70d..6668a8aad22e 100644
>> --- a/drivers/net/wireless/intel/iwlwifi/pcie/trans.c
>> +++ b/drivers/net/wireless/intel/iwlwifi/pcie/trans.c
>> @@ -1946,7 +1946,7 @@ static void iwl_trans_pcie_removal_wk(struct work_struct *wk)
>> struct iwl_trans_pcie_removal *removal =
>> container_of(wk, struct iwl_trans_pcie_removal, work);
>> struct pci_dev *pdev = removal->pdev;
>> - char *prop[] = {"EVENT=INACCESSIBLE", NULL};
>> + static const char * const prop[] = {"EVENT=INACCESSIBLE", NULL};
>>
>> dev_err(&pdev->dev, "Device gone - attempting removal\n");
>> kobject_uevent_env(&pdev->dev.kobj, KOBJ_CHANGE, prop);
>> diff --git a/include/linux/kobject.h b/include/linux/kobject.h
>> index 7f6f93c3df9c..9f5cf553dd1e 100644
>> --- a/include/linux/kobject.h
>> +++ b/include/linux/kobject.h
>> @@ -217,7 +217,7 @@ extern struct kobject *firmware_kobj;
>>
>> int kobject_uevent(struct kobject *kobj, enum kobject_action action);
>> int kobject_uevent_env(struct kobject *kobj, enum kobject_action action,
>> - char *envp[]);
>> + const char * const envp[]);
>> int kobject_synth_uevent(struct kobject *kobj, const char *buf, size_t count);
>>
>> __printf(2, 3)
>> diff --git a/lib/kobject_uevent.c b/lib/kobject_uevent.c
>> index 63d0816ab23b..9107989a0cc8 100644
>> --- a/lib/kobject_uevent.c
>> +++ b/lib/kobject_uevent.c
>> @@ -452,7 +452,7 @@ static void zap_modalias_env(struct kobj_uevent_env *env)
>> * corresponding error when it fails.
>> */
>> int kobject_uevent_env(struct kobject *kobj, enum kobject_action action,
>> - char *envp_ext[])
>> + const char * const envp_ext[])
>> {
>> struct kobj_uevent_env *env;
>> const char *action_string = kobject_actions[action];
>
> No objection from me, care to make it a real patch so that I can apply
> it after 4.18-rc1 is out?
For the wireless part:
Acked-by: Kalle Valo <kvalo@codeaurora.org>
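A small stand-alone sketch of the point made in the quoted discussion: a static const array of string pointers is laid out once at build time, so every call sees the same object, and a callee that only reads the list can take it as 'const char * const []'. The demo_* names are hypothetical and this is not the kobject code.

```c
#include <assert.h>
#include <stddef.h>

/*
 * The array has static storage, so it is not refilled on each call
 * and the same address is returned every time.
 */
static const char * const *demo_static_env(void)
{
	static const char * const prop[] = { "EVENT=INACCESSIBLE", NULL };

	return prop;
}

/* Read-only consumer: counts entries up to the terminating NULL. */
static size_t demo_env_count(const char * const envp[])
{
	size_t n = 0;

	assert(envp != NULL);
	while (envp[n])
		n++;
	return n;
}
```

With the const-qualified parameter, passing the static const array needs no cast.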
--
Kalle Valo
^ permalink raw reply
* Re: [BUG] net: stmmac: socfpga ethernet no longer working on linux-next
From: Dinh Nguyen @ 2018-06-14 14:21 UTC (permalink / raw)
To: Marek Vasut; +Cc: Jose.Abreu, netdev, David Miller, clabbe, Dinh Nguyen
In-Reply-To: <08ba0471-5abd-f493-f148-062510f8e333@denx.de>
On Thu, Jun 14, 2018 at 6:14 AM Marek Vasut <marex@denx.de> wrote:
>
> On 06/14/2018 10:18 AM, Jose Abreu wrote:
> > On 14-06-2018 08:38, Jose Abreu wrote:
> >> Hello,
> >>
> >> On 13-06-2018 21:46, Dinh Nguyen wrote:
> >>> Hi,
> >>>
> >>> The stmmac ethernet has stopped working in linux-next and linus/master
> >>> branch(v4.17-11782-gbe779f03d563)
> >>>
> >>> It appears that the stmmac ethernet has stopped working after these 2 commits:
> >>>
> >>> 4dbbe8dde848 net: stmmac: Add support for U32 TC filter using Flexible RX Parser
> >>> 5f0456b43140 net: stmmac: Implement logic to automatically select HW Interface
> >>>
> >>> If I move to this commit "565020aaeebf net: stmmac: Disable ACS
> >>> Feature for GMAC >= 4", then the stmmac works again on SoCFPGA.
> >>>
> >>> I was following this thread:
> >>> https://www.spinics.net/lists/netdev/msg502858.html
> >>>
> >>> Was wondering if there was a patch to fix dwmac-sun8i that the socfpga
> >>> platform needs as well?
> >> Probably. I will check and get back to you ASAP.
> >
> > This seems to be a different problem. Can you send me your dmesg
> > log and DT bindings you are using?
>
> arch/arm/boot/dts/socfpga_arria10_socdk_sdmmc.dts
> for example fails for me in next/master. Worked on 4.17-rc7.
>
I'm using "arch/arm/boot/dts/socfpga_arria5_socdk.dts". Here's my boot log:
It appears to just get stuck in "eth0: link becomes ready", times out
and reinits:
[ 0.000000] Linux version 4.17.0-11782-gbe779f03d563-dirty (dinguyen@linux-builds1) (gcc version 7.2.1 20171011 (Linaro GCC 7.2-2017.11)) #26 SMP Thu Jun 14 09:01:38 CDT 2018
[ 0.000000] CPU: ARMv7 Processor [413fc090] revision 0 (ARMv7), cr=10c5387d
[ 0.000000] CPU: PIPT / VIPT nonaliasing data cache, VIPT aliasing instruction cache
[ 0.000000] OF: fdt: Machine model: Altera SOCFPGA Arria V SoC Development Kit
[ 0.000000] debug: ignoring loglevel setting.
[ 0.000000] Memory policy: Data cache writealloc
[ 0.000000] On node 0 totalpages: 262144
[ 0.000000] Normal zone: 1536 pages used for memmap
[ 0.000000] Normal zone: 0 pages reserved
[ 0.000000] Normal zone: 196608 pages, LIFO batch:31
[ 0.000000] HighMem zone: 65536 pages, LIFO batch:15
[ 0.000000] random: get_random_bytes called from start_kernel+0xac/0x488 with crng_init=0
[ 0.000000] percpu: Embedded 16 pages/cpu @(ptrval) s36044 r8192 d21300 u65536
[ 0.000000] pcpu-alloc: s36044 r8192 d21300 u65536 alloc=16*4096
[ 0.000000] pcpu-alloc: [0] 0 [0] 1
[ 0.000000] Built 1 zonelists, mobility grouping on. Total pages: 260608
[ 0.000000] Kernel command line: root=/dev/nfs rw nfsroot=10.122.105.139:/home/dinguyen/rootfs_yocto ip=dhcp debug ignore_loglevel
[ 0.000000] Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
[ 0.000000] Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
[ 0.000000] Memory: 1027460K/1048576K available (7168K kernel code, 508K rwdata, 1540K rodata, 1024K init, 133K bss, 21116K reserved, 0K cma-reserved, 262144K highmem)
[ 0.000000] Virtual kernel memory layout:
[ 0.000000] vector : 0xffff0000 - 0xffff1000 ( 4 kB)
[ 0.000000] fixmap : 0xffc00000 - 0xfff00000 (3072 kB)
[ 0.000000] vmalloc : 0xf0800000 - 0xff800000 ( 240 MB)
[ 0.000000] lowmem : 0xc0000000 - 0xf0000000 ( 768 MB)
[ 0.000000] pkmap : 0xbfe00000 - 0xc0000000 ( 2 MB)
[ 0.000000] modules : 0xbf000000 - 0xbfe00000 ( 14 MB)
[ 0.000000] .text : 0x(ptrval) - 0x(ptrval) (8160 kB)
[ 0.000000] .init : 0x(ptrval) - 0x(ptrval) (1024 kB)
[ 0.000000] .data : 0x(ptrval) - 0x(ptrval) ( 509 kB)
[ 0.000000] .bss : 0x(ptrval) - 0x(ptrval) ( 134 kB)
[ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=2, Nodes=1
[ 0.000000] ftrace: allocating 25769 entries in 76 pages
[ 0.000000] Hierarchical RCU implementation.
[ 0.000000] RCU event tracing is enabled.
[ 0.000000] NR_IRQS: 16, nr_irqs: 16, preallocated irqs: 16
[ 0.000000] L2C-310 enabling early BRESP for Cortex-A9
[ 0.000000] L2C-310 full line of zeros enabled for Cortex-A9
[ 0.000000] L2C-310 ID prefetch enabled, offset 8 lines
[ 0.000000] L2C-310 dynamic clock gating enabled, standby mode enabled
[ 0.000000] L2C-310 cache controller enabled, 8 ways, 512 kB
[ 0.000000] L2C-310: CACHE_ID 0x410030c9, AUX_CTRL 0x76460001
[ 0.000000] clocksource: timer1: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604467 ns
[ 0.000004] sched_clock: 32 bits at 100MHz, resolution 10ns, wraps every 21474836475ns
[ 0.000014] Switching to timer-based delay loop, resolution 10ns
[ 0.000150] Console: colour dummy device 80x30
[ 0.000528] console [tty0] enabled
[ 0.000552] Calibrating delay loop (skipped), value calculated using timer frequency.. 200.00 BogoMIPS (lpj=1000000)
[ 0.000575] pid_max: default: 32768 minimum: 301
[ 0.000689] Mount-cache hash table entries: 2048 (order: 1, 8192 bytes)
[ 0.000708] Mountpoint-cache hash table entries: 2048 (order: 1, 8192 bytes)
[ 0.001202] CPU: Testing write buffer coherency: ok
[ 0.001236] CPU0: Spectre v2: using BPIALL workaround
[ 0.001454] CPU0: thread -1, cpu 0, socket 0, mpidr 80000000
[ 0.001923] Setting up static identity map for 0x100000 - 0x100060
[ 0.002042] Hierarchical SRCU implementation.
[ 0.002489] smp: Bringing up secondary CPUs ...
[ 0.003035] CPU1: thread -1, cpu 1, socket 0, mpidr 80000001
[ 0.003041] CPU1: Spectre v2: using BPIALL workaround
[ 0.003146] smp: Brought up 1 node, 2 CPUs
[ 0.003165] SMP: Total of 2 processors activated (400.00 BogoMIPS).
[ 0.003177] CPU: All CPU(s) started in SVC mode.
[ 0.003986] devtmpfs: initialized
[ 0.007541] VFP support v0.3: implementor 41 architecture 3 part 30 variant 9 rev 4
[ 0.007737] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604462750000 ns
[ 0.007764] futex hash table entries: 512 (order: 3, 32768 bytes)
[ 0.008640] NET: Registered protocol family 16
[ 0.009583] DMA: preallocated 256 KiB pool for atomic coherent allocations
[ 0.010648] hw-breakpoint: found 5 (+1 reserved) breakpoint and 1 watchpoint registers.
[ 0.010676] hw-breakpoint: maximum watchpoint size is 4 bytes.
[ 0.027506] vgaarb: loaded
[ 0.027786] SCSI subsystem initialized
[ 0.028018] usbcore: registered new interface driver usbfs
[ 0.028109] usbcore: registered new interface driver hub
[ 0.028190] usbcore: registered new device driver usb
[ 0.028375] usb_phy_generic soc:usbphy: soc:usbphy supply vcc not found, using dummy regulator
[ 0.028884] pps_core: LinuxPPS API ver. 1 registered
[ 0.028906] pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti <giometti@linux.it>
[ 0.028941] PTP clock support registered
[ 0.029114] FPGA manager framework
[ 0.030434] clocksource: Switched to clocksource timer1
[ 0.078659] NET: Registered protocol family 2
[ 0.079128] tcp_listen_portaddr_hash hash table entries: 512 (order: 0, 6144 bytes)
[ 0.079164] TCP established hash table entries: 8192 (order: 3, 32768 bytes)
[ 0.079229] TCP bind hash table entries: 8192 (order: 4, 65536 bytes)
[ 0.079332] TCP: Hash tables configured (established 8192 bind 8192)
[ 0.079428] UDP hash table entries: 512 (order: 2, 16384 bytes)
[ 0.079469] UDP-Lite hash table entries: 512 (order: 2, 16384 bytes)
[ 0.079611] NET: Registered protocol family 1
[ 0.080101] RPC: Registered named UNIX socket transport module.
[ 0.080122] RPC: Registered udp transport module.
[ 0.080133] RPC: Registered tcp transport module.
[ 0.080143] RPC: Registered tcp NFSv4.1 backchannel transport module.
[ 0.080160] PCI: CLS 0 bytes, default 64
[ 0.080784] hw perfevents: enabled with armv7_cortex_a9 PMU driver, 7 counters available
[ 0.082584] workingset: timestamp_bits=30 max_order=18 bucket_order=0
[ 0.090349] NFS: Registering the id_resolver key type
[ 0.090453] Key type id_resolver registered
[ 0.090469] Key type id_legacy registered
[ 0.090491] Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
[ 0.091186] ntfs: driver 2.1.32 [Flags: R/W].
[ 0.091551] jffs2: version 2.2. (NAND) © 2001-2006 Red Hat, Inc.
[ 0.092807] bounce: pool size: 64 pages
[ 0.092835] io scheduler noop registered (default)
[ 0.092848] io scheduler mq-deadline registered
[ 0.092859] io scheduler kyber registered
[ 0.099974] dma-pl330 ffe01000.pdma: Loaded driver for PL330 DMAC-341330
[ 0.100011] dma-pl330 ffe01000.pdma: DBUFF-512x8bytes Num_Chans-8 Num_Peri-32 Num_Events-8
[ 0.105043] Serial: 8250/16550 driver, 2 ports, IRQ sharing disabled
[ 0.106125] ffc02000.serial0: ttyS0 at MMIO 0xffc02000 (irq = 38, base_baud = 6250000) is a 16550A
[ 0.763357] console [ttyS0] enabled
[ 0.767408] ffc03000.serial1: ttyS1 at MMIO 0xffc03000 (irq = 39, base_baud = 6250000) is a 16550A
[ 0.777740] brd: module loaded
[ 0.781670] cadence-qspi ff705000.spi: n25q512ax3 (65536 Kbytes)
[ 0.787808] 2 fixed-partitions partitions found on MTD device ff705000.spi.0
[ 0.794855] Creating 2 MTD partitions on "ff705000.spi.0":
[ 0.800328] 0x000000000000-0x000000800000 : "Flash 0 Raw Data"
[ 0.806873] 0x000000800000-0x000008000000 : "Flash 0 jffs2 Filesystem"
[ 0.813416] mtd: partition "Flash 0 jffs2 Filesystem" extends beyond the end of device "ff705000.spi.0" -- size truncated to 0x3800000
[ 0.826562] libphy: Fixed MDIO Bus: probed
[ 0.831312] CAN device driver interface
[ 0.835537] socfpga-dwmac ff702000.ethernet: PTP uses main clock
[ 0.841794] socfpga-dwmac ff702000.ethernet: Version ID not available
[ 0.848223] socfpga-dwmac ff702000.ethernet: DWMAC1000
[ 0.853454] socfpga-dwmac ff702000.ethernet: Normal descriptors
[ 0.859357] socfpga-dwmac ff702000.ethernet: Ring mode enabled
[ 0.865184] socfpga-dwmac ff702000.ethernet: DMA HW capability register supported
[ 0.872654] socfpga-dwmac ff702000.ethernet: RX Checksum Offload Engine supported
[ 0.880113] socfpga-dwmac ff702000.ethernet: COE Type 2
[ 0.885329] socfpga-dwmac ff702000.ethernet: TX Checksum insertion supported
[ 0.899744] libphy: stmmac: probed
[ 0.903175] Micrel KSZ9021 Gigabit PHY stmmac-0:04: attached PHY driver [Micrel KSZ9021 Gigabit PHY] (mii_bus:phy_addr=stmmac-0:04, irq=POLL)
[ 0.916772] dwc2 ffb40000.usb: ffb40000.usb supply vusb_d not found, using dummy regulator
[ 0.925092] dwc2 ffb40000.usb: ffb40000.usb supply vusb_a not found, using dummy regulator
[ 0.933461] dwc2 ffb40000.usb: dwc2_check_params: Invalid parameter lpm=1
[ 0.940230] dwc2 ffb40000.usb: dwc2_check_params: Invalid parameter lpm_clock_gating=1
[ 0.948136] dwc2 ffb40000.usb: dwc2_check_params: Invalid parameter besl=1
[ 0.954998] dwc2 ffb40000.usb: dwc2_check_params: Invalid parameter hird_threshold_en=1
[ 0.963004] dwc2 ffb40000.usb: EPs: 16, dedicated fifos, 8064 entries in SPRAM
[ 0.970490] dwc2 ffb40000.usb: DWC OTG Controller
[ 0.975204] dwc2 ffb40000.usb: new USB bus registered, assigned bus number 1
[ 0.982265] dwc2 ffb40000.usb: irq 40, io mem 0xffb40000
[ 0.988221] hub 1-0:1.0: USB hub found
[ 0.992038] hub 1-0:1.0: 1 port detected
[ 0.996636] usbcore: registered new interface driver usb-storage
[ 1.002925] i2c /dev entries driver
[ 1.007280] Synopsys Designware Multimedia Card Interface Driver
[ 1.013615] dw_mmc ff704000.dwmmc0: IDMAC supports 32-bit address mode.
[ 1.020245] dw_mmc ff704000.dwmmc0: Using internal DMA controller.
[ 1.026433] dw_mmc ff704000.dwmmc0: Version ID is 240a
[ 1.031604] dw_mmc ff704000.dwmmc0: DW MMC controller at irq 32,32 bit host data width,1024 deep fifo
[ 1.040982] mmc_host mmc0: card is polling.
[ 1.057816] mmc_host mmc0: Bus speed (slot 0) = 50000000Hz (slot req 400000Hz, actual 396825HZ div = 63)
[ 1.080628] ledtrig-cpu: registered to indicate activity on CPUs
[ 1.086793] usbcore: registered new interface driver usbhid
[ 1.092376] usbhid: USB HID core driver
[ 1.096453] fpga_manager fpga0: Altera SOCFPGA FPGA Manager registered
[ 1.103496] altera_hps2fpga_bridge ff400000.fpga_bridge: fpga bridge [lwhps2fpga] registered
[ 1.112159] altera_hps2fpga_bridge ff500000.fpga_bridge: fpga bridge [hps2fpga] registered
[ 1.120864] oprofile: using arm/armv7-ca9
[ 1.125539] NET: Registered protocol family 10
[ 1.130705] Segment Routing with IPv6
[ 1.134425] sit: IPv6, IPv4 and MPLS over IPv4 tunneling driver
[ 1.140892] NET: Registered protocol family 17
[ 1.145346] NET: Registered protocol family 15
[ 1.149776] can: controller area network core (rev 20170425 abi 9)
[ 1.156001] NET: Registered protocol family 29
[ 1.160447] can: raw protocol (rev 20170425)
[ 1.164704] can: broadcast manager protocol (rev 20170425 t)
[ 1.170354] can: netlink gateway (rev 20170425) max_hops=1
[ 1.175998] 8021q: 802.1Q VLAN Support v1.8
[ 1.180204] Key type dns_resolver registered
[ 1.184561] ThumbEE CPU extension supported.
[ 1.188825] Registering SWP/SWPB emulation handler
[ 1.198329] at24 0-0051: 4096 byte 24c32 EEPROM, writable, 32 bytes/write
[ 1.206310] rtc-ds1307 0-0068: SET TIME!
[ 1.214145] rtc-ds1307 0-0068: registered as rtc0
[ 1.220796] rtc-ds1307 0-0068: setting system clock to 2000-01-01 00:00:07 UTC (946684807)
[ 1.249625] mmc_host mmc0: Bus speed (slot 0) = 50000000Hz (slot req 50000000Hz, actual 50000000HZ div = 0)
[ 1.259416] mmc0: new high speed SDHC card at address 0007
[ 1.265633] mmcblk0: mmc0:0007 SD4GB 3.71 GiB
[ 1.271840] mmcblk0: p1 p2 p3 p4
[ 1.341106] Micrel KSZ9021 Gigabit PHY stmmac-0:04: attached PHY driver [Micrel KSZ9021 Gigabit PHY] (mii_bus:phy_addr=stmmac-0:04, irq=POLL)
[ 1.355710] socfpga-dwmac ff702000.ethernet eth0: No Safety Features support found
[ 1.363460] socfpga-dwmac ff702000.ethernet eth0: registered PTP clock
[ 1.370219] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[ 3.441194] socfpga-dwmac ff702000.ethernet eth0: Link is Up - 100Mbps/Full - flow control rx/tx
[ 3.450432] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ 3.480411] Sending DHCP requests ...... timed out!
[ 85.627555] Removed PTP HW clock successfully on eth0
[ 85.632788] IP-Config: Retrying forever (NFS root)...
[ 85.731113] Micrel KSZ9021 Gigabit PHY stmmac-0:04: attached PHY driver [Micrel KSZ9021 Gigabit PHY] (mii_bus:phy_addr=stmmac-0:04, irq=POLL)
[ 85.750434] socfpga-dwmac ff702000.ethernet eth0: No Safety Features support found
[ 85.758160] socfpga-dwmac ff702000.ethernet eth0: registered PTP clock
[ 85.764825] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[ 87.831196] socfpga-dwmac ff702000.ethernet eth0: Link is Up - 100Mbps/Full - flow control rx/tx
[ 87.840438] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
* [PATCH net-next,RFC 13/13] netfilter: nft_flow_offload: make sure route is not stale
From: Pablo Neira Ayuso @ 2018-06-14 14:19 UTC (permalink / raw)
To: netfilter-devel; +Cc: netdev, steffen.klassert
In-Reply-To: <20180614141947.3580-1-pablo@netfilter.org>
Use dst_check() to validate that the route is still valid; otherwise,
tear down the flow entry and pass the packet up to the standard forwarding
path so we have a chance to cache a fresh route again.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
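Editor's note, not part of the patch: a minimal userspace sketch of the
check described above, assuming a route checker that returns the cached
entry while it is still valid and NULL once it is stale. The names
cached_route, flow_entry, route_check and flow_forward are hypothetical
stand-ins, not kernel types or functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for a cached route and a flow entry; these are
 * not the kernel's struct dst_entry or struct flow_offload. */
struct cached_route {
	int obsolete;		/* non-zero once the route has been invalidated */
};

struct flow_entry {
	struct cached_route *route;
	int torn_down;		/* set when the flow leaves the fast path */
};

enum verdict { VERDICT_SLOW_PATH, VERDICT_FAST_PATH };

/* Return the route while it is still usable, NULL once it is stale. */
struct cached_route *route_check(struct cached_route *rt)
{
	return rt->obsolete ? NULL : rt;
}

/* Use the fast path only with a valid route; otherwise tear the flow
 * down and hand the packet to the slow path, which can learn a fresh
 * route. */
enum verdict flow_forward(struct flow_entry *flow)
{
	if (!route_check(flow->route)) {
		flow->torn_down = 1;
		return VERDICT_SLOW_PATH;
	}
	return VERDICT_FAST_PATH;
}
```

In this sketch the slow-path verdict plays the role that NF_ACCEPT plays
in the hook: the packet is not dropped, it simply takes the standard
forwarding path.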
net/netfilter/nf_flow_table_ip.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/net/netfilter/nf_flow_table_ip.c b/net/netfilter/nf_flow_table_ip.c
index 0828e49bd95e..2bdf740debac 100644
--- a/net/netfilter/nf_flow_table_ip.c
+++ b/net/netfilter/nf_flow_table_ip.c
@@ -244,6 +244,11 @@ nf_flow_offload_ip_hook(void *priv, struct sk_buff *skb,
flow = container_of(tuplehash, struct flow_offload, tuplehash[dir]);
rt = (struct rtable *)flow->tuplehash[dir].tuple.dst_cache;
+ if (!dst_check(&rt->dst, 0)) {
+ flow_offload_teardown(flow);
+ return NF_ACCEPT;
+ }
+
if (unlikely(nf_flow_exceeds_mtu(skb, flow->tuplehash[dir].tuple.mtu)) &&
(ip_hdr(skb)->frag_off & htons(IP_DF)) != 0)
return NF_ACCEPT;
@@ -462,6 +467,11 @@ nf_flow_offload_ipv6_hook(void *priv, struct sk_buff *skb,
flow = container_of(tuplehash, struct flow_offload, tuplehash[dir]);
rt = (struct rt6_info *)flow->tuplehash[dir].tuple.dst_cache;
+ if (!dst_check(&rt->dst, 0)) {
+ flow_offload_teardown(flow);
+ return NF_ACCEPT;
+ }
+
if (unlikely(nf_flow_exceeds_mtu(skb, flow->tuplehash[dir].tuple.mtu)))
return NF_ACCEPT;
--
2.11.0
* [PATCH net-next,RFC 12/13] netfilter: nft_flow_offload: remove secpath check
From: Pablo Neira Ayuso @ 2018-06-14 14:19 UTC (permalink / raw)
To: netfilter-devel; +Cc: netdev, steffen.klassert
In-Reply-To: <20180614141947.3580-1-pablo@netfilter.org>
It is safe to place a flow that is coming from IPsec into the flowtable,
so decapsulated packets can benefit from the flowtable fastpath.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
net/netfilter/nft_flow_offload.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/net/netfilter/nft_flow_offload.c b/net/netfilter/nft_flow_offload.c
index f2e95edfb4de..a7f529b79bdb 100644
--- a/net/netfilter/nft_flow_offload.c
+++ b/net/netfilter/nft_flow_offload.c
@@ -54,8 +54,6 @@ static bool nft_flow_offload_skip(struct sk_buff *skb)
if (unlikely(opt->optlen))
return true;
- if (skb_sec_path(skb))
- return true;
return false;
}
--
2.11.0
* [PATCH net-next,RFC 11/13] netfilter: nft_flow_offload: enable offload after second packet is seen
From: Pablo Neira Ayuso @ 2018-06-14 14:19 UTC (permalink / raw)
To: netfilter-devel; +Cc: netdev, steffen.klassert
In-Reply-To: <20180614141947.3580-1-pablo@netfilter.org>
Once we have a confirmed conntrack, i.e. a packet went through the stack
and a conntrack was added, allow the second packet to configure the
flowtable offload.
This allows UDP media traffic going in only one direction to enable offloads.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
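Editor's note, not part of the patch: a small userspace sketch of the
combined condition, assuming a plain status word with three flag bits.
ST_HELPER, ST_CONFIRMED and ST_OFFLOAD are hypothetical names modelled on
the bits the patch tests, and test_and_set() is a non-atomic stand-in for
the kernel's test_and_set_bit().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical status bits, named after the ones the patch tests. */
enum {
	ST_HELPER    = 1u << 0,	/* connection is handled by a helper */
	ST_CONFIRMED = 1u << 1,	/* first packet made it through the stack */
	ST_OFFLOAD   = 1u << 2,	/* offload has already been set up */
};

/* Non-atomic stand-in for test_and_set_bit(): set the flag and report
 * whether it was already set. */
bool test_and_set(unsigned int *status, unsigned int flag)
{
	bool was_set = (*status & flag) != 0;

	*status |= flag;
	return was_set;
}

/* True when this packet may set up the offload for its connection.
 * The || chain short-circuits, so the offload bit is only claimed when
 * the first two checks pass. */
bool may_offload(unsigned int *status)
{
	if ((*status & ST_HELPER) ||
	    !(*status & ST_CONFIRMED) ||
	    test_and_set(status, ST_OFFLOAD))
		return false;
	return true;
}
```

The point of the sketch is that connection direction no longer matters:
any packet seen after confirmation can claim the offload, which is what
lets one-way UDP streams qualify.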
net/netfilter/nft_flow_offload.c | 11 +++--------
1 file changed, 3 insertions(+), 8 deletions(-)
diff --git a/net/netfilter/nft_flow_offload.c b/net/netfilter/nft_flow_offload.c
index d6bab8c3cbb0..f2e95edfb4de 100644
--- a/net/netfilter/nft_flow_offload.c
+++ b/net/netfilter/nft_flow_offload.c
@@ -88,14 +88,9 @@ static void nft_flow_offload_eval(const struct nft_expr *expr,
goto out;
}
- if (test_bit(IPS_HELPER_BIT, &ct->status))
- goto out;
-
- if (ctinfo == IP_CT_NEW ||
- ctinfo == IP_CT_RELATED)
- goto out;
-
- if (test_and_set_bit(IPS_OFFLOAD_BIT, &ct->status))
+ if (test_bit(IPS_HELPER_BIT, &ct->status) ||
+ !test_bit(IPS_CONFIRMED_BIT, &ct->status) ||
+ test_and_set_bit(IPS_OFFLOAD_BIT, &ct->status))
goto out;
dir = CTINFO2DIR(ctinfo);
--
2.11.0
* [PATCH net-next,RFC 10/13] netfilter: nf_flow_table: add flowtable for early ingress hook
From: Pablo Neira Ayuso @ 2018-06-14 14:19 UTC (permalink / raw)
To: netfilter-devel; +Cc: netdev, steffen.klassert
In-Reply-To: <20180614141947.3580-1-pablo@netfilter.org>
Add a new flowtable type for the early ingress hook. This allows
us to combine the custom GRO chaining with the flowtable abstraction
to define fastpaths.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
include/net/netfilter/nf_flow_table.h | 3 ++
net/ipv4/netfilter/nf_flow_table_ipv4.c | 11 ++++++
net/netfilter/nf_flow_table_ip.c | 62 +++++++++++++++++++++++++++++++++
3 files changed, 76 insertions(+)
diff --git a/include/net/netfilter/nf_flow_table.h b/include/net/netfilter/nf_flow_table.h
index 4606bad41155..e270269dd1e8 100644
--- a/include/net/netfilter/nf_flow_table.h
+++ b/include/net/netfilter/nf_flow_table.h
@@ -126,6 +126,9 @@ unsigned int nf_flow_offload_ip_hook(void *priv, struct sk_buff *skb,
const struct nf_hook_state *state);
unsigned int nf_flow_offload_ipv6_hook(void *priv, struct sk_buff *skb,
const struct nf_hook_state *state);
+unsigned int nf_flow_offload_early_ingress_ip_hook(void *priv,
+ struct sk_buff *skb,
+ const struct nf_hook_state *state);
#define MODULE_ALIAS_NF_FLOWTABLE(family) \
MODULE_ALIAS("nf-flowtable-" __stringify(family))
diff --git a/net/ipv4/netfilter/nf_flow_table_ipv4.c b/net/ipv4/netfilter/nf_flow_table_ipv4.c
index 681c0d5c47d7..b771000ca894 100644
--- a/net/ipv4/netfilter/nf_flow_table_ipv4.c
+++ b/net/ipv4/netfilter/nf_flow_table_ipv4.c
@@ -14,15 +14,26 @@ static struct nf_flowtable_type flowtable_ipv4 = {
.owner = THIS_MODULE,
};
+static struct nf_flowtable_type flowtable_ipv4_early = {
+ .family = NFPROTO_IPV4,
+ .hooknum = NF_NETDEV_EARLY_INGRESS,
+ .init = nf_flow_table_init,
+ .free = nf_flow_table_free,
+ .hook = nf_flow_offload_early_ingress_ip_hook,
+ .owner = THIS_MODULE,
+};
+
static int __init nf_flow_ipv4_module_init(void)
{
nft_register_flowtable_type(&flowtable_ipv4);
+ nft_register_flowtable_type(&flowtable_ipv4_early);
return 0;
}
static void __exit nf_flow_ipv4_module_exit(void)
{
+ nft_unregister_flowtable_type(&flowtable_ipv4_early);
nft_unregister_flowtable_type(&flowtable_ipv4);
}
diff --git a/net/netfilter/nf_flow_table_ip.c b/net/netfilter/nf_flow_table_ip.c
index 15ed91309992..0828e49bd95e 100644
--- a/net/netfilter/nf_flow_table_ip.c
+++ b/net/netfilter/nf_flow_table_ip.c
@@ -11,6 +11,7 @@
#include <net/ip6_route.h>
#include <net/neighbour.h>
#include <net/netfilter/nf_flow_table.h>
+#include <net/xfrm.h>
/* For layer 4 checksum field offset. */
#include <linux/tcp.h>
#include <linux/udp.h>
@@ -487,3 +488,64 @@ nf_flow_offload_ipv6_hook(void *priv, struct sk_buff *skb,
return NF_STOLEN;
}
EXPORT_SYMBOL_GPL(nf_flow_offload_ipv6_hook);
+
+unsigned int
+nf_flow_offload_early_ingress_ip_hook(void *priv, struct sk_buff *skb,
+ const struct nf_hook_state *state)
+{
+ struct flow_offload_tuple_rhash *tuplehash;
+ struct nf_flowtable *flow_table = priv;
+ struct flow_offload_tuple tuple = {};
+ enum flow_offload_tuple_dir dir;
+ struct flow_offload *flow;
+ struct net_device *outdev;
+ const struct rtable *rt;
+ unsigned int thoff;
+ struct iphdr *iph;
+
+ if (skb->protocol != htons(ETH_P_IP))
+ return NF_ACCEPT;
+
+ if (nf_flow_tuple_ip(skb, state->in, &tuple) < 0)
+ return NF_ACCEPT;
+
+ tuplehash = flow_offload_lookup(flow_table, &tuple);
+ if (tuplehash == NULL)
+ return NF_ACCEPT;
+
+ outdev = dev_get_by_index_rcu(state->net, tuplehash->tuple.oifidx);
+ if (!outdev)
+ return NF_ACCEPT;
+
+ dir = tuplehash->tuple.dir;
+ flow = container_of(tuplehash, struct flow_offload, tuplehash[dir]);
+ rt = (const struct rtable *)flow->tuplehash[dir].tuple.dst_cache;
+
+ if (unlikely(nf_flow_exceeds_mtu(skb, flow->tuplehash[dir].tuple.mtu)) &&
+ (ip_hdr(skb)->frag_off & htons(IP_DF)) != 0)
+ return NF_ACCEPT;
+
+ if (skb_try_make_writable(skb, sizeof(*iph)))
+ return NF_DROP;
+
+ thoff = ip_hdr(skb)->ihl * 4;
+ if (nf_flow_state_check(flow, ip_hdr(skb)->protocol, skb, thoff))
+ return NF_ACCEPT;
+
+ if (flow->flags & (FLOW_OFFLOAD_SNAT | FLOW_OFFLOAD_DNAT) &&
+ nf_flow_nat_ip(flow, skb, thoff, dir) < 0)
+ return NF_DROP;
+
+ flow->timeout = (u32)jiffies + NF_FLOW_TIMEOUT;
+
+ skb_dst_set_noref(skb, flow->tuplehash[dir].tuple.dst_cache);
+
+ if (skb_dst(skb)->xfrm &&
+ !xfrm_dev_offload_ok(skb, skb_dst(skb)->xfrm))
+ return NF_ACCEPT;
+
+ NAPI_GRO_CB(skb)->is_ffwd = 1;
+
+ return NF_STOLEN;
+}
+EXPORT_SYMBOL_GPL(nf_flow_offload_early_ingress_ip_hook);
--
2.11.0
* [PATCH net-next,RFC 09/13] netfilter: nf_flow_table: add hooknum to flowtable type
From: Pablo Neira Ayuso @ 2018-06-14 14:19 UTC (permalink / raw)
To: netfilter-devel; +Cc: netdev, steffen.klassert
In-Reply-To: <20180614141947.3580-1-pablo@netfilter.org>
This allows us to register different flowtable variants depending on the
hook type, so we can define flowtables for new hook types.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
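Editor's note, not part of the patch: a userspace sketch of looking up a
registered type by the (family, hooknum) pair instead of by family alone.
The struct ft_type, the table contents and the numeric constants are
hypothetical; the series defines its own value for the early ingress hook.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical registry entry: one flowtable variant per family and hook. */
struct ft_type {
	int family;
	unsigned int hooknum;
	const char *name;
};

/* Illustrative values only. */
enum { FAMILY_IPV4 = 2, FAMILY_IPV6 = 10 };
enum { HOOK_INGRESS = 0, HOOK_EARLY_INGRESS = 1 };

static const struct ft_type ft_types[] = {
	{ FAMILY_IPV4, HOOK_INGRESS,       "ipv4" },
	{ FAMILY_IPV4, HOOK_EARLY_INGRESS, "ipv4-early" },
	{ FAMILY_IPV6, HOOK_INGRESS,       "ipv6" },
};

/* Return the variant registered for this family and hook, or NULL. */
const struct ft_type *ft_type_get(int family, unsigned int hooknum)
{
	size_t i;

	for (i = 0; i < sizeof(ft_types) / sizeof(ft_types[0]); i++) {
		if (ft_types[i].family == family &&
		    ft_types[i].hooknum == hooknum)
			return &ft_types[i];
	}
	return NULL;
}
```

With a family-only key, the two IPv4 entries above would be
indistinguishable; adding the hook number to the key is what lets both
coexist in one list.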
include/net/netfilter/nf_flow_table.h | 1 +
net/ipv4/netfilter/nf_flow_table_ipv4.c | 1 +
net/ipv6/netfilter/nf_flow_table_ipv6.c | 1 +
net/netfilter/nf_flow_table_inet.c | 1 +
net/netfilter/nf_tables_api.c | 120 +++++++++++++++++---------------
5 files changed, 67 insertions(+), 57 deletions(-)
diff --git a/include/net/netfilter/nf_flow_table.h b/include/net/netfilter/nf_flow_table.h
index ba9fa4592f2b..4606bad41155 100644
--- a/include/net/netfilter/nf_flow_table.h
+++ b/include/net/netfilter/nf_flow_table.h
@@ -14,6 +14,7 @@ struct nf_flowtable;
struct nf_flowtable_type {
struct list_head list;
int family;
+ unsigned int hooknum;
int (*init)(struct nf_flowtable *ft);
void (*free)(struct nf_flowtable *ft);
nf_hookfn *hook;
diff --git a/net/ipv4/netfilter/nf_flow_table_ipv4.c b/net/ipv4/netfilter/nf_flow_table_ipv4.c
index e1e56d7123d2..681c0d5c47d7 100644
--- a/net/ipv4/netfilter/nf_flow_table_ipv4.c
+++ b/net/ipv4/netfilter/nf_flow_table_ipv4.c
@@ -7,6 +7,7 @@
static struct nf_flowtable_type flowtable_ipv4 = {
.family = NFPROTO_IPV4,
+ .hooknum = NF_NETDEV_INGRESS,
.init = nf_flow_table_init,
.free = nf_flow_table_free,
.hook = nf_flow_offload_ip_hook,
diff --git a/net/ipv6/netfilter/nf_flow_table_ipv6.c b/net/ipv6/netfilter/nf_flow_table_ipv6.c
index c511d206bf9b..f1f976bdc151 100644
--- a/net/ipv6/netfilter/nf_flow_table_ipv6.c
+++ b/net/ipv6/netfilter/nf_flow_table_ipv6.c
@@ -8,6 +8,7 @@
static struct nf_flowtable_type flowtable_ipv6 = {
.family = NFPROTO_IPV6,
+ .hooknum = NF_NETDEV_INGRESS,
.init = nf_flow_table_init,
.free = nf_flow_table_free,
.hook = nf_flow_offload_ipv6_hook,
diff --git a/net/netfilter/nf_flow_table_inet.c b/net/netfilter/nf_flow_table_inet.c
index 99771aa7e7ea..347a640d9723 100644
--- a/net/netfilter/nf_flow_table_inet.c
+++ b/net/netfilter/nf_flow_table_inet.c
@@ -22,6 +22,7 @@ nf_flow_offload_inet_hook(void *priv, struct sk_buff *skb,
static struct nf_flowtable_type flowtable_inet = {
.family = NFPROTO_INET,
+ .hooknum = NF_NETDEV_INGRESS,
.init = nf_flow_table_init,
.free = nf_flow_table_free,
.hook = nf_flow_offload_inet_hook,
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index ca4c4d994ddb..5d6c3b9eee6b 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -5266,6 +5266,40 @@ static int nf_tables_parse_devices(const struct nft_ctx *ctx,
return err;
}
+static const struct nf_flowtable_type *__nft_flowtable_type_get(u8 family,
+ int hooknum)
+{
+ const struct nf_flowtable_type *type;
+
+ list_for_each_entry(type, &nf_tables_flowtables, list) {
+ if (family == type->family &&
+ hooknum == type->hooknum)
+ return type;
+ }
+ return NULL;
+}
+
+static const struct nf_flowtable_type *nft_flowtable_type_get(u8 family,
+ int hooknum)
+{
+ const struct nf_flowtable_type *type;
+
+ type = __nft_flowtable_type_get(family, hooknum);
+ if (type != NULL && try_module_get(type->owner))
+ return type;
+
+#ifdef CONFIG_MODULES
+ if (type == NULL) {
+ nfnl_unlock(NFNL_SUBSYS_NFTABLES);
+ request_module("nf-flowtable-%u", family);
+ nfnl_lock(NFNL_SUBSYS_NFTABLES);
+ if (__nft_flowtable_type_get(family, hooknum))
+ return ERR_PTR(-EAGAIN);
+ }
+#endif
+ return ERR_PTR(-ENOENT);
+}
+
static const struct nla_policy nft_flowtable_hook_policy[NFTA_FLOWTABLE_HOOK_MAX + 1] = {
[NFTA_FLOWTABLE_HOOK_NUM] = { .type = NLA_U32 },
[NFTA_FLOWTABLE_HOOK_PRIORITY] = { .type = NLA_U32 },
@@ -5278,6 +5312,7 @@ static int nf_tables_flowtable_parse_hook(const struct nft_ctx *ctx,
{
struct net_device *dev_array[NFT_FLOWTABLE_DEVICE_MAX];
struct nlattr *tb[NFTA_FLOWTABLE_HOOK_MAX + 1];
+ const struct nf_flowtable_type *type;
struct nf_hook_ops *ops;
int hooknum, priority;
int err, n = 0, i;
@@ -5293,19 +5328,31 @@ static int nf_tables_flowtable_parse_hook(const struct nft_ctx *ctx,
return -EINVAL;
hooknum = ntohl(nla_get_be32(tb[NFTA_FLOWTABLE_HOOK_NUM]));
- if (hooknum != NF_NETDEV_INGRESS)
+ if (hooknum != NF_NETDEV_INGRESS &&
+ hooknum != NF_NETDEV_EARLY_INGRESS)
return -EINVAL;
+ type = nft_flowtable_type_get(ctx->family, hooknum);
+ if (IS_ERR(type))
+ return PTR_ERR(type);
+
+ flowtable->data.type = type;
+ err = type->init(&flowtable->data);
+ if (err < 0)
+ goto err1;
+
priority = ntohl(nla_get_be32(tb[NFTA_FLOWTABLE_HOOK_PRIORITY]));
err = nf_tables_parse_devices(ctx, tb[NFTA_FLOWTABLE_HOOK_DEVS],
dev_array, &n);
if (err < 0)
- return err;
+ goto err2;
ops = kzalloc(sizeof(struct nf_hook_ops) * n, GFP_KERNEL);
- if (!ops)
- return -ENOMEM;
+ if (!ops) {
+ err = -ENOMEM;
+ goto err2;
+ }
flowtable->hooknum = hooknum;
flowtable->priority = priority;
@@ -5323,38 +5370,13 @@ static int nf_tables_flowtable_parse_hook(const struct nft_ctx *ctx,
GFP_KERNEL);
}
- return err;
-}
-
-static const struct nf_flowtable_type *__nft_flowtable_type_get(u8 family)
-{
- const struct nf_flowtable_type *type;
-
- list_for_each_entry(type, &nf_tables_flowtables, list) {
- if (family == type->family)
- return type;
- }
- return NULL;
-}
-
-static const struct nf_flowtable_type *nft_flowtable_type_get(u8 family)
-{
- const struct nf_flowtable_type *type;
-
- type = __nft_flowtable_type_get(family);
- if (type != NULL && try_module_get(type->owner))
- return type;
+ return 0;
+err2:
+ flowtable->data.type->free(&flowtable->data);
+err1:
+ module_put(type->owner);
-#ifdef CONFIG_MODULES
- if (type == NULL) {
- nfnl_unlock(NFNL_SUBSYS_NFTABLES);
- request_module("nf-flowtable-%u", family);
- nfnl_lock(NFNL_SUBSYS_NFTABLES);
- if (__nft_flowtable_type_get(family))
- return ERR_PTR(-EAGAIN);
- }
-#endif
- return ERR_PTR(-ENOENT);
+ return err;
}
static void nft_unregister_flowtable_net_hooks(struct net *net,
@@ -5377,7 +5399,6 @@ static int nf_tables_newflowtable(struct net *net, struct sock *nlsk,
struct netlink_ext_ack *extack)
{
const struct nfgenmsg *nfmsg = nlmsg_data(nlh);
- const struct nf_flowtable_type *type;
struct nft_flowtable *flowtable, *ft;
u8 genmask = nft_genmask_next(net);
int family = nfmsg->nfgen_family;
@@ -5429,21 +5450,10 @@ static int nf_tables_newflowtable(struct net *net, struct sock *nlsk,
goto err1;
}
- type = nft_flowtable_type_get(family);
- if (IS_ERR(type)) {
- err = PTR_ERR(type);
- goto err2;
- }
-
- flowtable->data.type = type;
- err = type->init(&flowtable->data);
- if (err < 0)
- goto err3;
-
err = nf_tables_flowtable_parse_hook(&ctx, nla[NFTA_FLOWTABLE_HOOK],
flowtable);
if (err < 0)
- goto err4;
+ goto err2;
for (i = 0; i < flowtable->ops_len; i++) {
if (!flowtable->ops[i].dev)
@@ -5457,37 +5467,33 @@ static int nf_tables_newflowtable(struct net *net, struct sock *nlsk,
if (flowtable->ops[i].dev == ft->ops[k].dev &&
flowtable->ops[i].pf == ft->ops[k].pf) {
err = -EBUSY;
- goto err5;
+ goto err3;
}
}
}
err = nf_register_net_hook(net, &flowtable->ops[i]);
if (err < 0)
- goto err5;
+ goto err3;
}
err = nft_trans_flowtable_add(&ctx, NFT_MSG_NEWFLOWTABLE, flowtable);
if (err < 0)
- goto err6;
+ goto err4;
list_add_tail_rcu(&flowtable->list, &table->flowtables);
table->use++;
return 0;
-err6:
+err4:
i = flowtable->ops_len;
-err5:
+err3:
for (k = i - 1; k >= 0; k--) {
kfree(flowtable->dev_name[k]);
nf_unregister_net_hook(net, &flowtable->ops[k]);
}
kfree(flowtable->ops);
-err4:
- flowtable->data.type->free(&flowtable->data);
-err3:
- module_put(type->owner);
err2:
kfree(flowtable->name);
err1:
--
2.11.0
* [PATCH net-next,RFC 08/13] netfilter: nft_chain_filter: add support for early ingress
From: Pablo Neira Ayuso @ 2018-06-14 14:19 UTC (permalink / raw)
To: netfilter-devel; +Cc: netdev, steffen.klassert
In-Reply-To: <20180614141947.3580-1-pablo@netfilter.org>
This patch adds the new filter chain at the early ingress hook.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
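Editor's note, not part of the patch: a short sketch of how a per-chain
hook bitmask admits a second hook. HOOK_INGRESS, HOOK_EARLY_INGRESS and
hook_supported() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical hook numbers; the real values come from the kernel headers. */
enum { HOOK_INGRESS, HOOK_EARLY_INGRESS, HOOK_MAX };

/* True when the chain type's mask has the bit for this hook set. */
bool hook_supported(unsigned int hook_mask, unsigned int hooknum)
{
	return hooknum < (unsigned int)HOOK_MAX &&
	       (hook_mask & (1u << hooknum)) != 0;
}
```

Before the change the mask would carry only the ingress bit; OR-ing in
the early ingress bit is what makes the second hook acceptable.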
net/netfilter/nft_chain_filter.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/net/netfilter/nft_chain_filter.c b/net/netfilter/nft_chain_filter.c
index 84c902477a91..bc7fb2dc0e44 100644
--- a/net/netfilter/nft_chain_filter.c
+++ b/net/netfilter/nft_chain_filter.c
@@ -277,9 +277,11 @@ static const struct nft_chain_type nft_chain_filter_netdev = {
.name = "filter",
.type = NFT_CHAIN_T_DEFAULT,
.family = NFPROTO_NETDEV,
- .hook_mask = (1 << NF_NETDEV_INGRESS),
+ .hook_mask = (1 << NF_NETDEV_INGRESS) |
+ (1 << NF_NETDEV_EARLY_INGRESS),
.hooks = {
- [NF_NETDEV_INGRESS] = nft_do_chain_netdev,
+ [NF_NETDEV_INGRESS] = nft_do_chain_netdev,
+ [NF_NETDEV_EARLY_INGRESS] = nft_do_chain_netdev,
},
};
--
2.11.0
* [PATCH net-next,RFC 07/13] netfilter: add ESP support for early ingress
From: Pablo Neira Ayuso @ 2018-06-14 14:19 UTC (permalink / raw)
To: netfilter-devel; +Cc: netdev, steffen.klassert
In-Reply-To: <20180614141947.3580-1-pablo@netfilter.org>
From: Steffen Klassert <steffen.klassert@secunet.com>
This patch adds the GSO logic for ESP and the codepath that allows
the xfrm infrastructure to signal the GRO layer that the packet is
following the fast forwarding path.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
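Editor's note, not part of the patch: a userspace sketch of the feature
masking that the ESP segmentation callback performs, assuming a 64-bit
feature word. The FEAT_* bits and esp_segment_features() are hypothetical
and only mirror the shape of the condition in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bits standing in for device offload features. */
#define FEAT_SG      (UINT64_C(1) << 0)	/* scatter/gather */
#define FEAT_CSUM    (UINT64_C(1) << 1)	/* checksum offload */
#define FEAT_HW_ESP  (UINT64_C(1) << 2)	/* ESP crypto offload */
#define FEAT_OTHER   (UINT64_C(1) << 3)	/* any unrelated feature */

/* Keep the full feature set only when the device offers ESP offload,
 * the state is offloaded, and it is offloaded to this same device.
 * Otherwise drop scatter/gather and checksum offload so segmentation
 * produces packets that software can finish. */
uint64_t esp_segment_features(uint64_t features, bool state_offloaded,
			      bool same_device)
{
	if (!(features & FEAT_HW_ESP) || !state_offloaded || !same_device)
		return features & ~(FEAT_SG | FEAT_CSUM);
	return features;
}
```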
include/net/netfilter/early_ingress.h | 2 ++
net/ipv4/netfilter/early_ingress.c | 8 ++++++++
net/ipv6/netfilter/early_ingress.c | 8 ++++++++
net/netfilter/early_ingress.c | 36 +++++++++++++++++++++++++++++++++++
net/xfrm/xfrm_output.c | 4 ++++
5 files changed, 58 insertions(+)
diff --git a/include/net/netfilter/early_ingress.h b/include/net/netfilter/early_ingress.h
index 9ba8e2875345..6653b294f25a 100644
--- a/include/net/netfilter/early_ingress.h
+++ b/include/net/netfilter/early_ingress.h
@@ -8,6 +8,8 @@ struct sk_buff **nft_udp_gro_receive(struct sk_buff **head,
struct sk_buff *skb);
struct sk_buff **nft_tcp_gro_receive(struct sk_buff **head,
struct sk_buff *skb);
+struct sk_buff *nft_esp_gso_segment(struct sk_buff *skb,
+ netdev_features_t features);
int nf_hook_early_ingress(struct sk_buff *skb);
diff --git a/net/ipv4/netfilter/early_ingress.c b/net/ipv4/netfilter/early_ingress.c
index 6ff6e34e5eff..74f3a7f1273d 100644
--- a/net/ipv4/netfilter/early_ingress.c
+++ b/net/ipv4/netfilter/early_ingress.c
@@ -5,6 +5,7 @@
#include <net/arp.h>
#include <net/udp.h>
#include <net/tcp.h>
+#include <net/esp.h>
#include <net/protocol.h>
#include <net/netfilter/early_ingress.h>
@@ -303,9 +304,16 @@ static const struct net_offload nft_tcp4_offload = {
},
};
+static const struct net_offload nft_esp4_offload = {
+ .callbacks = {
+ .gso_segment = nft_esp_gso_segment,
+ },
+};
+
static const struct net_offload __rcu *nft_ip_offloads[MAX_INET_PROTOS] __read_mostly = {
[IPPROTO_UDP] = &nft_udp4_offload,
[IPPROTO_TCP] = &nft_tcp4_offload,
+ [IPPROTO_ESP] = &nft_esp4_offload,
};
void nf_early_ingress_ip_enable(void)
diff --git a/net/ipv6/netfilter/early_ingress.c b/net/ipv6/netfilter/early_ingress.c
index 026d2814530a..fb00b083593b 100644
--- a/net/ipv6/netfilter/early_ingress.c
+++ b/net/ipv6/netfilter/early_ingress.c
@@ -5,6 +5,7 @@
#include <net/arp.h>
#include <net/udp.h>
#include <net/tcp.h>
+#include <net/esp.h>
#include <net/protocol.h>
#include <net/netfilter/early_ingress.h>
#include <net/ip6_route.h>
@@ -291,9 +292,16 @@ static const struct net_offload nft_tcp6_offload = {
},
};
+static const struct net_offload nft_esp6_offload = {
+ .callbacks = {
+ .gso_segment = nft_esp_gso_segment,
+ },
+};
+
static const struct net_offload __rcu *nft_ip6_offloads[MAX_INET_PROTOS] __read_mostly = {
[IPPROTO_UDP] = &nft_udp6_offload,
[IPPROTO_TCP] = &nft_tcp6_offload,
+ [IPPROTO_ESP] = &nft_esp6_offload,
};
void nf_early_ingress_ip6_enable(void)
diff --git a/net/netfilter/early_ingress.c b/net/netfilter/early_ingress.c
index 4daf6cfea304..10d718bbe495 100644
--- a/net/netfilter/early_ingress.c
+++ b/net/netfilter/early_ingress.c
@@ -5,6 +5,7 @@
#include <net/arp.h>
#include <net/udp.h>
#include <net/tcp.h>
+#include <net/esp.h>
#include <net/protocol.h>
#include <crypto/aead.h>
#include <net/netfilter/early_ingress.h>
@@ -274,6 +275,41 @@ struct sk_buff **nft_tcp_gro_receive(struct sk_buff **head, struct sk_buff *skb)
return pp;
}
+struct sk_buff *nft_esp_gso_segment(struct sk_buff *skb,
+ netdev_features_t features)
+{
+ struct xfrm_offload *xo = xfrm_offload(skb);
+ netdev_features_t esp_features = features;
+ struct crypto_aead *aead;
+ struct ip_esp_hdr *esph;
+ struct xfrm_state *x;
+
+ if (!xo)
+ return ERR_PTR(-EINVAL);
+
+ x = skb->sp->xvec[skb->sp->len - 1];
+ aead = x->data;
+ esph = ip_esp_hdr(skb);
+
+ if (esph->spi != x->id.spi)
+ return ERR_PTR(-EINVAL);
+
+ if (!pskb_may_pull(skb, sizeof(*esph) + crypto_aead_ivsize(aead)))
+ return ERR_PTR(-EINVAL);
+
+ __skb_pull(skb, sizeof(*esph) + crypto_aead_ivsize(aead));
+
+ skb->encap_hdr_csum = 1;
+
+ if (!(features & NETIF_F_HW_ESP) || !x->xso.offload_handle ||
+ (x->xso.dev != skb->dev))
+ esp_features = features & ~(NETIF_F_SG | NETIF_F_CSUM_MASK);
+
+ xo->flags |= XFRM_GSO_SEGMENT;
+
+ return x->outer_mode->gso_segment(x, skb, esp_features);
+}
+
static inline bool nf_hook_early_ingress_active(const struct sk_buff *skb)
{
#ifdef HAVE_JUMP_LABEL
diff --git a/net/xfrm/xfrm_output.c b/net/xfrm/xfrm_output.c
index 89b178a78dc7..c63b157f46ce 100644
--- a/net/xfrm/xfrm_output.c
+++ b/net/xfrm/xfrm_output.c
@@ -146,6 +146,10 @@ int xfrm_output_resume(struct sk_buff *skb, int err)
while (likely((err = xfrm_output_one(skb, err)) == 0)) {
nf_reset(skb);
+ if (!skb_dst(skb)->xfrm && skb->sp &&
+ (skb_shinfo(skb)->gso_type & SKB_GSO_NFT))
+ return -EREMOTE;
+
err = skb_dst(skb)->ops->local_out(net, skb->sk, skb);
if (unlikely(err != 1))
goto out;
--
2.11.0
* [PATCH net-next,RFC 06/13] netfilter: add early ingress support for IPv6
From: Pablo Neira Ayuso @ 2018-06-14 14:19 UTC (permalink / raw)
To: netfilter-devel; +Cc: netdev, steffen.klassert
In-Reply-To: <20180614141947.3580-1-pablo@netfilter.org>
From: Steffen Klassert <steffen.klassert@secunet.com>
This patch adds the custom GSO and GRO logic for the IPv6 early ingress
hook. Layer 4 supports UDP and TCP at this stage.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
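Editor's note, not part of the patch: a userspace sketch of dispatching
on the next-header value through a per-protocol callback table, which is
the pattern the IPv6 GSO path below follows. The offload_ops struct,
count_segments() and segment_by_nexthdr() are hypothetical; only the
protocol numbers 6 (TCP) and 17 (UDP) are the real IANA values.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PROTOS      256
#define PROTO_TCP       6
#define PROTO_UDP       17
#define ERR_UNSUPPORTED (-1)

/* Hypothetical per-protocol callbacks. */
struct offload_ops {
	int (*gso_segment)(int payload_len, int mss);
};

/* Number of segments needed to carry payload_len bytes, rounding up. */
int count_segments(int payload_len, int mss)
{
	return (payload_len + mss - 1) / mss;
}

static const struct offload_ops tcp_ops = { count_segments };
static const struct offload_ops udp_ops = { count_segments };

/* Sparse table indexed by protocol number; unset slots stay NULL. */
static const struct offload_ops *offloads[MAX_PROTOS] = {
	[PROTO_TCP] = &tcp_ops,
	[PROTO_UDP] = &udp_ops,
};

/* Look up the handler for the next header and call it, or report that
 * the protocol has no handler registered. */
int segment_by_nexthdr(unsigned char nexthdr, int payload_len, int mss)
{
	const struct offload_ops *ops = offloads[nexthdr];

	if (!ops || !ops->gso_segment)
		return ERR_UNSUPPORTED;
	return ops->gso_segment(payload_len, mss);
}
```

Supporting another layer 4 protocol in this scheme means filling one
more slot in the table, with no change to the dispatch function.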
include/net/netfilter/early_ingress.h | 2 +
net/ipv6/netfilter/Makefile | 1 +
net/ipv6/netfilter/early_ingress.c | 307 ++++++++++++++++++++++++++++++++++
net/netfilter/early_ingress.c | 2 +
4 files changed, 312 insertions(+)
create mode 100644 net/ipv6/netfilter/early_ingress.c
diff --git a/include/net/netfilter/early_ingress.h b/include/net/netfilter/early_ingress.h
index caaef9fe619f..9ba8e2875345 100644
--- a/include/net/netfilter/early_ingress.h
+++ b/include/net/netfilter/early_ingress.h
@@ -13,6 +13,8 @@ int nf_hook_early_ingress(struct sk_buff *skb);
void nf_early_ingress_ip_enable(void);
void nf_early_ingress_ip_disable(void);
+void nf_early_ingress_ip6_enable(void);
+void nf_early_ingress_ip6_disable(void);
void nf_early_ingress_enable(void);
void nf_early_ingress_disable(void);
diff --git a/net/ipv6/netfilter/Makefile b/net/ipv6/netfilter/Makefile
index 10a5a1c87320..445dfcf51ca8 100644
--- a/net/ipv6/netfilter/Makefile
+++ b/net/ipv6/netfilter/Makefile
@@ -2,6 +2,7 @@
#
# Makefile for the netfilter modules on top of IPv6.
#
+obj-$(CONFIG_NETFILTER_EARLY_INGRESS) += early_ingress.o
# Link order matters here.
obj-$(CONFIG_IP6_NF_IPTABLES) += ip6_tables.o
diff --git a/net/ipv6/netfilter/early_ingress.c b/net/ipv6/netfilter/early_ingress.c
new file mode 100644
index 000000000000..026d2814530a
--- /dev/null
+++ b/net/ipv6/netfilter/early_ingress.c
@@ -0,0 +1,307 @@
+#include <linux/kernel.h>
+#include <linux/netfilter.h>
+#include <linux/types.h>
+#include <net/xfrm.h>
+#include <net/arp.h>
+#include <net/udp.h>
+#include <net/tcp.h>
+#include <net/protocol.h>
+#include <net/netfilter/early_ingress.h>
+#include <net/ip6_route.h>
+
+static const struct net_offload __rcu *nft_ip6_offloads[MAX_INET_PROTOS] __read_mostly;
+
+static struct sk_buff *nft_udp6_gso_segment(struct sk_buff *skb,
+ netdev_features_t features)
+{
+ skb_push(skb, sizeof(struct ipv6hdr));
+ return nft_skb_segment(skb);
+}
+
+static struct sk_buff *nft_tcp6_gso_segment(struct sk_buff *skb,
+ netdev_features_t features)
+{
+ skb_push(skb, sizeof(struct ipv6hdr));
+ return nft_skb_segment(skb);
+}
+
+static struct sk_buff *nft_ipv6_gso_segment(struct sk_buff *skb,
+ netdev_features_t features)
+{
+ struct sk_buff *segs = ERR_PTR(-EINVAL);
+ const struct net_offload *ops;
+ struct packet_offload *ptype;
+ struct ipv6hdr *iph;
+ int proto;
+
+ if (!(skb_shinfo(skb)->gso_type & SKB_GSO_NFT)) {
+ ptype = dev_get_packet_offload(skb->protocol, 1);
+ if (ptype)
+ return ptype->callbacks.gso_segment(skb, features);
+
+ return ERR_PTR(-EPROTONOSUPPORT);
+ }
+
+ if (SKB_GSO_CB(skb)->encap_level == 0) {
+ iph = ipv6_hdr(skb);
+ skb_reset_network_header(skb);
+ } else {
+ iph = (struct ipv6hdr *)skb->data;
+ }
+
+ if (unlikely(!pskb_may_pull(skb, sizeof(*iph))))
+ goto out;
+
+ SKB_GSO_CB(skb)->encap_level += sizeof(*iph);
+
+ if (unlikely(!pskb_may_pull(skb, sizeof(*iph))))
+ goto out;
+
+ __skb_pull(skb, sizeof(*iph));
+
+ proto = iph->nexthdr;
+
+ segs = ERR_PTR(-EPROTONOSUPPORT);
+
+ ops = rcu_dereference(nft_ip6_offloads[proto]);
+ if (likely(ops && ops->callbacks.gso_segment))
+ segs = ops->callbacks.gso_segment(skb, features);
+
+out:
+ return segs;
+}
+
+static int nft_ipv6_gro_complete(struct sk_buff *skb, int nhoff)
+{
+ struct ipv6hdr *iph = (struct ipv6hdr *)(skb->data + nhoff);
+ struct dst_entry *dst = skb_dst(skb);
+ struct rt6_info *rt = (struct rt6_info *)dst;
+ const struct net_offload *ops;
+ struct packet_offload *ptype;
+ int proto = iph->nexthdr;
+ struct in6_addr *nexthop;
+ struct neighbour *neigh;
+ struct net_device *dev;
+ unsigned int hh_len;
+ int err = 0;
+ u16 count;
+
+ count = NAPI_GRO_CB(skb)->count;
+
+ if (!NAPI_GRO_CB(skb)->is_ffwd) {
+ ptype = dev_get_packet_offload(skb->protocol, 1);
+ if (ptype)
+ return ptype->callbacks.gro_complete(skb, nhoff);
+
+ return 0;
+ }
+
+ rcu_read_lock();
+ ops = rcu_dereference(nft_ip6_offloads[proto]);
+ if (!ops || !ops->callbacks.gro_complete)
+ goto out_unlock;
+
+ /* Only need to add sizeof(*iph) to get to the next hdr below
+ * because any hdr with option will have been flushed in
+ * inet_gro_receive().
+ */
+ err = ops->callbacks.gro_complete(skb, nhoff + sizeof(*iph));
+
+out_unlock:
+ rcu_read_unlock();
+
+ if (err)
+ return err;
+
+ skb_shinfo(skb)->gso_type |= SKB_GSO_NFT;
+ skb_shinfo(skb)->gso_segs = count;
+
+ dev = dst->dev;
+ dev_hold(dev);
+ skb->dev = dev;
+
+ if (skb_dst(skb)->xfrm) {
+ err = dst_output(dev_net(dev), NULL, skb);
+ if (err != -EREMOTE)
+ return -EINPROGRESS;
+ }
+
+ if (count <= 1)
+ skb_gso_reset(skb);
+
+ hh_len = LL_RESERVED_SPACE(dev);
+
+ if (unlikely(skb_headroom(skb) < hh_len && dev->header_ops)) {
+ struct sk_buff *skb2;
+
+ skb2 = skb_realloc_headroom(skb, LL_RESERVED_SPACE(dev));
+ if (!skb2) {
+ kfree_skb(skb);
+ return -ENOMEM;
+ }
+ consume_skb(skb);
+ skb = skb2;
+ }
+ rcu_read_lock();
+ nexthop = rt6_nexthop(rt, &iph->daddr);
+ neigh = __ipv6_neigh_lookup_noref(dev, nexthop);
+ if (unlikely(!neigh))
+ neigh = __neigh_create(&nd_tbl, nexthop, dev, false);
+ if (!IS_ERR(neigh))
+ neigh_output(neigh, skb);
+ rcu_read_unlock();
+
+ return -EINPROGRESS;
+}
+
+static struct sk_buff **nft_ipv6_gro_receive(struct sk_buff **head,
+ struct sk_buff *skb)
+{
+ const struct net_offload *ops;
+ struct packet_offload *ptype;
+ struct sk_buff **pp = NULL;
+ struct sk_buff *p;
+ struct ipv6hdr *iph;
+ unsigned int nlen;
+ unsigned int hlen;
+ unsigned int off;
+ int proto, ret;
+
+ off = skb_gro_offset(skb);
+ hlen = off + sizeof(*iph);
+
+ iph = skb_gro_header_slow(skb, hlen, off);
+ if (unlikely(!iph))
+ goto out;
+
+ proto = iph->nexthdr;
+
+ rcu_read_lock();
+
+ if (iph->version != 6)
+ goto out_unlock;
+
+ nlen = skb_network_header_len(skb);
+
+ ret = nf_hook_early_ingress(skb);
+ switch (ret) {
+ case NF_STOLEN:
+ break;
+ case NF_ACCEPT:
+ ptype = dev_get_packet_offload(skb->protocol, 1);
+ if (ptype)
+ pp = ptype->callbacks.gro_receive(head, skb);
+
+ goto out_unlock;
+ case NF_DROP:
+ pp = ERR_PTR(-EPERM);
+ goto out_unlock;
+ }
+
+ ops = rcu_dereference(nft_ip6_offloads[proto]);
+ if (!ops || !ops->callbacks.gro_receive)
+ goto out_unlock;
+
+ if (iph->hop_limit <= 1)
+ goto out_unlock;
+
+ skb->ip_summed = CHECKSUM_UNNECESSARY;
+
+ for (p = *head; p; p = p->next) {
+ struct ipv6hdr *iph2;
+ __be32 first_word; /* <Version:4><Traffic_Class:8><Flow_Label:20> */
+
+ if (!NAPI_GRO_CB(p)->same_flow)
+ continue;
+
+ if (!NAPI_GRO_CB(p)->is_ffwd) {
+ NAPI_GRO_CB(p)->same_flow = 0;
+ continue;
+ }
+
+ if (!skb_dst(p)) {
+ NAPI_GRO_CB(p)->same_flow = 0;
+ continue;
+ }
+
+ iph2 = ipv6_hdr(p);
+ first_word = *(__be32 *)iph ^ *(__be32 *)iph2;
+
+ /* All fields must match except length and Traffic Class.
+ * XXX skbs on the gro_list have all been parsed and pulled
+ * already so we don't need to compare nlen
+ * (nlen != (sizeof(*iph2) + ipv6_exthdrs_len(iph2, &ops)))
+ * memcmp() alone below is sufficient, right?
+ */
+ if ((first_word & htonl(0xF00FFFFF)) ||
+ memcmp(&iph->nexthdr, &iph2->nexthdr,
+ nlen - offsetof(struct ipv6hdr, nexthdr))) {
+ NAPI_GRO_CB(p)->same_flow = 0;
+ continue;
+ }
+ /* flush if Traffic Class fields are different */
+ NAPI_GRO_CB(p)->flush |= !!(first_word & htonl(0x0FF00000));
+
+ NAPI_GRO_CB(skb)->is_ffwd = 1;
+ skb_dst_set_noref(skb, skb_dst(p));
+ pp = &p;
+
+ break;
+ }
+
+ NAPI_GRO_CB(skb)->is_atomic = true;
+
+ iph->hop_limit--;
+
+ skb_pull(skb, off);
+ NAPI_GRO_CB(skb)->data_offset = sizeof(*iph);
+ skb_reset_network_header(skb);
+ skb_set_transport_header(skb, sizeof(*iph));
+
+ pp = call_gro_receive(ops->callbacks.gro_receive, head, skb);
+out_unlock:
+ rcu_read_unlock();
+
+out:
+ NAPI_GRO_CB(skb)->data_offset = 0;
+ return pp;
+}
+
+static struct packet_offload nft_ip6_packet_offload __read_mostly = {
+ .type = cpu_to_be16(ETH_P_IPV6),
+ .priority = 0,
+ .callbacks = {
+ .gro_receive = nft_ipv6_gro_receive,
+ .gro_complete = nft_ipv6_gro_complete,
+ .gso_segment = nft_ipv6_gso_segment,
+ },
+};
+
+static const struct net_offload nft_udp6_offload = {
+ .callbacks = {
+ .gso_segment = nft_udp6_gso_segment,
+ .gro_receive = nft_udp_gro_receive,
+ },
+};
+
+static const struct net_offload nft_tcp6_offload = {
+ .callbacks = {
+ .gso_segment = nft_tcp6_gso_segment,
+ .gro_receive = nft_tcp_gro_receive,
+ },
+};
+
+static const struct net_offload __rcu *nft_ip6_offloads[MAX_INET_PROTOS] __read_mostly = {
+ [IPPROTO_UDP] = &nft_udp6_offload,
+ [IPPROTO_TCP] = &nft_tcp6_offload,
+};
+
+void nf_early_ingress_ip6_enable(void)
+{
+ dev_add_offload(&nft_ip6_packet_offload);
+}
+
+void nf_early_ingress_ip6_disable(void)
+{
+ dev_remove_offload(&nft_ip6_packet_offload);
+}
diff --git a/net/netfilter/early_ingress.c b/net/netfilter/early_ingress.c
index bf31aa8b3721..4daf6cfea304 100644
--- a/net/netfilter/early_ingress.c
+++ b/net/netfilter/early_ingress.c
@@ -312,6 +312,7 @@ void nf_early_ingress_enable(void)
if (nf_early_ingress_use++ == 0) {
/* First user: register the early ingress offloads. */
nf_early_ingress_ip_enable();
+ nf_early_ingress_ip6_enable();
}
}
@@ -319,5 +320,6 @@ void nf_early_ingress_disable(void)
{
if (--nf_early_ingress_use == 0) {
nf_early_ingress_ip_disable();
+ nf_early_ingress_ip6_disable();
}
}
--
2.11.0
* [PATCH net-next,RFC 05/13] netfilter: add early ingress hook for IPv4
From: Pablo Neira Ayuso @ 2018-06-14 14:19 UTC (permalink / raw)
To: netfilter-devel; +Cc: netdev, steffen.klassert
In-Reply-To: <20180614141947.3580-1-pablo@netfilter.org>
From: Steffen Klassert <steffen.klassert@secunet.com>
Add the new early ingress hook for the netdev family. This new hook is
called from the GRO layer before the standard IPv4 GRO layers.
This hook allows us to perform early packet filtering and to define a fast
forwarding path through packet chaining and flowtables using the new GSO
netfilter type. Packets that don't follow the fast path are passed up to
the standard GRO path for aggregation as usual.
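As a rough illustration of the three outcomes described above, the sketch below maps a hook verdict to the path a packet takes. The verdict values match include/uapi/linux/netfilter.h; the enum early_path and early_ingress_route() names are made up for this example, and treating any other verdict as a drop is an assumption of the sketch, not something the patch does.

```c
#include <assert.h>

/* Verdict values as defined in include/uapi/linux/netfilter.h. */
#define NF_DROP   0
#define NF_ACCEPT 1
#define NF_STOLEN 2

/* Hypothetical names, used only in this sketch. */
enum early_path {
	PATH_DROPPED,		/* packet is consumed, nothing is passed up */
	PATH_STANDARD_GRO,	/* handed to the regular GRO callbacks */
	PATH_FAST_FORWARD,	/* kept by the hook for custom chaining */
};

/* Map an early ingress verdict to the path the packet follows. */
static enum early_path early_ingress_route(int verdict)
{
	switch (verdict) {
	case NF_STOLEN:
		return PATH_FAST_FORWARD;
	case NF_ACCEPT:
		return PATH_STANDARD_GRO;
	case NF_DROP:
	default:
		/* Assumption of this sketch: unknown verdicts are dropped. */
		return PATH_DROPPED;
	}
}
```

In the patch itself this decision is the switch on the return value of nf_hook_early_ingress() inside the .gro_receive callback.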
This patch adds the GRO and GSO logic for this custom packet chaining.
The chaining uses the frag_list pointer, so we do not need to mangle the
packets. The aggregation strategy we follow therefore does not modify the
packet as the standard GRO path does, and there is no need to
recalculate the checksum. This chain of packets is sent from the
.gro_complete callback directly to the neighbour layer. The first packet
in the chain holds a reference to the destination route.
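To make the chaining idea concrete, here is a minimal userspace sketch. struct pkt is a hypothetical stand-in for struct sk_buff that models only the bookkeeping fields; chain_append() follows the same pattern as nft_skb_gro_receive() in this patch, and chain_split() is a simplified take on what segmentation has to undo. Payloads are never touched, which is why no checksum update is needed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sk_buff, bookkeeping fields only. */
struct pkt {
	struct pkt *next;	/* next packet in a chain or list */
	struct pkt *frag_list;	/* head only: first chained packet */
	struct pkt *last;	/* head only: last packet of the chain */
	unsigned int len;	/* total length, chained packets included */
	unsigned int data_len;	/* bytes that live in chained packets */
	unsigned int count;	/* number of packets in the chain */
};

static void pkt_init(struct pkt *p, unsigned int len)
{
	p->next = NULL;
	p->frag_list = NULL;
	p->last = p;
	p->len = len;
	p->data_len = 0;
	p->count = 1;
}

/* Hang pkt off the head's frag_list; only the head's counters change. */
static void chain_append(struct pkt *head, struct pkt *pkt)
{
	assert(head != pkt);

	if (head->last == head)
		head->frag_list = pkt;
	else
		head->last->next = pkt;
	head->last = pkt;

	head->count++;
	head->data_len += pkt->len;
	head->len += pkt->len;
}

/* Turn the chain back into a plain list that starts with the head. */
static struct pkt *chain_split(struct pkt *head)
{
	struct pkt *p;

	for (p = head->frag_list; p; p = p->next) {
		head->len -= p->len;
		head->data_len -= p->len;
	}
	head->next = head->frag_list;
	head->frag_list = NULL;
	head->last = head;
	head->count = 1;
	return head;
}
```

The real code additionally fixes up header offsets and copies the link-layer header into each segment, which this sketch leaves out.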
Supported layer 4 protocols for this custom GRO packet chaining include
TCP and UDP.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
include/linux/netdevice.h | 2 +
include/linux/netfilter.h | 6 +
include/linux/netfilter_ingress.h | 1 +
include/net/netfilter/early_ingress.h | 20 +++
include/uapi/linux/netfilter.h | 1 +
net/ipv4/netfilter/Makefile | 1 +
net/ipv4/netfilter/early_ingress.c | 319 +++++++++++++++++++++++++++++++++
net/netfilter/Kconfig | 8 +
net/netfilter/Makefile | 1 +
net/netfilter/core.c | 35 +++-
net/netfilter/early_ingress.c | 323 ++++++++++++++++++++++++++++++++++
11 files changed, 716 insertions(+), 1 deletion(-)
create mode 100644 include/net/netfilter/early_ingress.h
create mode 100644 net/ipv4/netfilter/early_ingress.c
create mode 100644 net/netfilter/early_ingress.c
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 62734cf0c43a..c79922665be5 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1880,6 +1880,8 @@ struct net_device {
rx_handler_func_t __rcu *rx_handler;
void __rcu *rx_handler_data;
+ struct nf_hook_entries __rcu *nf_hooks_early_ingress;
+
#ifdef CONFIG_NET_CLS_ACT
struct mini_Qdisc __rcu *miniq_ingress;
#endif
diff --git a/include/linux/netfilter.h b/include/linux/netfilter.h
index 04551af2ff23..ad3f0b9ae4f1 100644
--- a/include/linux/netfilter.h
+++ b/include/linux/netfilter.h
@@ -429,4 +429,10 @@ extern struct nfnl_ct_hook __rcu *nfnl_ct_hook;
*/
DECLARE_PER_CPU(bool, nf_skb_duplicated);
+int nf_hook_netdev(struct sk_buff *skb, struct nf_hook_state *state,
+ const struct nf_hook_entries *e);
+
+void nf_early_ingress_enable(void);
+void nf_early_ingress_disable(void);
+
#endif /*__LINUX_NETFILTER_H*/
diff --git a/include/linux/netfilter_ingress.h b/include/linux/netfilter_ingress.h
index 554c920691dd..7b70c9d4c435 100644
--- a/include/linux/netfilter_ingress.h
+++ b/include/linux/netfilter_ingress.h
@@ -40,6 +40,7 @@ static inline int nf_hook_ingress(struct sk_buff *skb)
static inline void nf_hook_ingress_init(struct net_device *dev)
{
+ RCU_INIT_POINTER(dev->nf_hooks_early_ingress, NULL);
RCU_INIT_POINTER(dev->nf_hooks_ingress, NULL);
}
#else /* CONFIG_NETFILTER_INGRESS */
diff --git a/include/net/netfilter/early_ingress.h b/include/net/netfilter/early_ingress.h
new file mode 100644
index 000000000000..caaef9fe619f
--- /dev/null
+++ b/include/net/netfilter/early_ingress.h
@@ -0,0 +1,20 @@
+#ifndef _NF_EARLY_INGRESS_H_
+#define _NF_EARLY_INGRESS_H_
+
+#include <net/protocol.h>
+
+struct sk_buff *nft_skb_segment(struct sk_buff *head_skb);
+struct sk_buff **nft_udp_gro_receive(struct sk_buff **head,
+ struct sk_buff *skb);
+struct sk_buff **nft_tcp_gro_receive(struct sk_buff **head,
+ struct sk_buff *skb);
+
+int nf_hook_early_ingress(struct sk_buff *skb);
+
+void nf_early_ingress_ip_enable(void);
+void nf_early_ingress_ip_disable(void);
+
+void nf_early_ingress_enable(void);
+void nf_early_ingress_disable(void);
+
+#endif
diff --git a/include/uapi/linux/netfilter.h b/include/uapi/linux/netfilter.h
index cca10e767cd8..55d26b20e09f 100644
--- a/include/uapi/linux/netfilter.h
+++ b/include/uapi/linux/netfilter.h
@@ -54,6 +54,7 @@ enum nf_inet_hooks {
enum nf_dev_hooks {
NF_NETDEV_INGRESS,
+ NF_NETDEV_EARLY_INGRESS,
NF_NETDEV_NUMHOOKS
};
diff --git a/net/ipv4/netfilter/Makefile b/net/ipv4/netfilter/Makefile
index 8394c17c269f..faf5fab59f0f 100644
--- a/net/ipv4/netfilter/Makefile
+++ b/net/ipv4/netfilter/Makefile
@@ -2,6 +2,7 @@
#
# Makefile for the netfilter modules on top of IPv4.
#
+obj-$(CONFIG_NETFILTER_EARLY_INGRESS) += early_ingress.o
# objects for l3 independent conntrack
nf_conntrack_ipv4-y := nf_conntrack_l3proto_ipv4.o nf_conntrack_proto_icmp.o
diff --git a/net/ipv4/netfilter/early_ingress.c b/net/ipv4/netfilter/early_ingress.c
new file mode 100644
index 000000000000..6ff6e34e5eff
--- /dev/null
+++ b/net/ipv4/netfilter/early_ingress.c
@@ -0,0 +1,319 @@
+#include <linux/kernel.h>
+#include <linux/netfilter.h>
+#include <linux/types.h>
+#include <net/xfrm.h>
+#include <net/arp.h>
+#include <net/udp.h>
+#include <net/tcp.h>
+#include <net/protocol.h>
+#include <net/netfilter/early_ingress.h>
+
+static const struct net_offload __rcu *nft_ip_offloads[MAX_INET_PROTOS] __read_mostly;
+
+static struct sk_buff *nft_udp4_gso_segment(struct sk_buff *skb,
+ netdev_features_t features)
+{
+ skb_push(skb, sizeof(struct iphdr));
+ return nft_skb_segment(skb);
+}
+
+static struct sk_buff *nft_tcp4_gso_segment(struct sk_buff *skb,
+ netdev_features_t features)
+{
+ skb_push(skb, sizeof(struct iphdr));
+ return nft_skb_segment(skb);
+}
+
+static struct sk_buff *nft_ipv4_gso_segment(struct sk_buff *skb,
+ netdev_features_t features)
+{
+ struct sk_buff *segs = ERR_PTR(-EINVAL);
+ const struct net_offload *ops;
+ struct packet_offload *ptype;
+ struct iphdr *iph;
+ int proto;
+ int ihl;
+
+ if (!(skb_shinfo(skb)->gso_type & SKB_GSO_NFT)) {
+ ptype = dev_get_packet_offload(skb->protocol, 1);
+ if (ptype)
+ return ptype->callbacks.gso_segment(skb, features);
+
+ return ERR_PTR(-EPROTONOSUPPORT);
+ }
+
+ if (SKB_GSO_CB(skb)->encap_level == 0) {
+ iph = ip_hdr(skb);
+ skb_reset_network_header(skb);
+ } else {
+ iph = (struct iphdr *)skb->data;
+ }
+
+ if (unlikely(!pskb_may_pull(skb, sizeof(*iph))))
+ goto out;
+
+ ihl = iph->ihl * 4;
+ if (ihl < sizeof(*iph))
+ goto out;
+
+ SKB_GSO_CB(skb)->encap_level += ihl;
+
+ if (unlikely(!pskb_may_pull(skb, ihl)))
+ goto out;
+
+ __skb_pull(skb, ihl);
+
+ proto = iph->protocol;
+
+ segs = ERR_PTR(-EPROTONOSUPPORT);
+
+ ops = rcu_dereference(nft_ip_offloads[proto]);
+ if (likely(ops && ops->callbacks.gso_segment))
+ segs = ops->callbacks.gso_segment(skb, features);
+
+out:
+ return segs;
+}
+
+static int nft_ipv4_gro_complete(struct sk_buff *skb, int nhoff)
+{
+ struct iphdr *iph = (struct iphdr *)(skb->data + nhoff);
+ struct dst_entry *dst = skb_dst(skb);
+ struct rtable *rt = (struct rtable *)dst;
+ const struct net_offload *ops;
+ struct packet_offload *ptype;
+ struct net_device *dev;
+ struct neighbour *neigh;
+ unsigned int hh_len;
+ int err = 0;
+ u32 nexthop;
+ u16 count;
+
+ count = NAPI_GRO_CB(skb)->count;
+
+ if (!NAPI_GRO_CB(skb)->is_ffwd) {
+ ptype = dev_get_packet_offload(skb->protocol, 1);
+ if (ptype)
+ return ptype->callbacks.gro_complete(skb, nhoff);
+
+ return 0;
+ }
+
+ rcu_read_lock();
+ ops = rcu_dereference(nft_ip_offloads[iph->protocol]);
+ if (!ops || !ops->callbacks.gro_complete)
+ goto out_unlock;
+
+ /* Only need to add sizeof(*iph) to get to the next hdr below
+ * because any hdr with option will have been flushed in
+ * inet_gro_receive().
+ */
+ err = ops->callbacks.gro_complete(skb, nhoff + sizeof(*iph));
+
+out_unlock:
+ rcu_read_unlock();
+
+ if (err)
+ return err;
+
+ skb_shinfo(skb)->gso_type |= SKB_GSO_NFT;
+ skb_shinfo(skb)->gso_segs = count;
+
+ dev = dst->dev;
+ dev_hold(dev);
+ skb->dev = dev;
+
+ if (skb_dst(skb)->xfrm) {
+ err = dst_output(dev_net(dev), NULL, skb);
+ if (err != -EREMOTE)
+ return -EINPROGRESS;
+ }
+
+ if (count <= 1)
+ skb_gso_reset(skb);
+
+ hh_len = LL_RESERVED_SPACE(dev);
+
+ if (unlikely(skb_headroom(skb) < hh_len && dev->header_ops)) {
+ struct sk_buff *skb2;
+
+ skb2 = skb_realloc_headroom(skb, LL_RESERVED_SPACE(dev));
+ if (!skb2) {
+ kfree_skb(skb);
+ return -ENOMEM;
+ }
+ consume_skb(skb);
+ skb = skb2;
+ }
+ rcu_read_lock();
+ nexthop = (__force u32) rt_nexthop(rt, iph->daddr);
+ neigh = __ipv4_neigh_lookup_noref(dev, nexthop);
+ if (unlikely(!neigh))
+ neigh = __neigh_create(&arp_tbl, &nexthop, dev, false);
+ if (!IS_ERR(neigh))
+ neigh_output(neigh, skb);
+ rcu_read_unlock();
+
+ return -EINPROGRESS;
+}
+
+static struct sk_buff **nft_ipv4_gro_receive(struct sk_buff **head,
+ struct sk_buff *skb)
+{
+ const struct net_offload *ops;
+ struct packet_offload *ptype;
+ struct sk_buff **pp = NULL;
+ struct sk_buff *p;
+ struct iphdr *iph;
+ unsigned int hlen;
+ unsigned int off;
+ int proto, ret;
+
+ off = skb_gro_offset(skb);
+ hlen = off + sizeof(*iph);
+
+ iph = skb_gro_header_slow(skb, hlen, off);
+ if (unlikely(!iph)) {
+ pp = ERR_PTR(-EPERM);
+ goto out;
+ }
+
+ proto = iph->protocol;
+
+ rcu_read_lock();
+
+ if (*(u8 *)iph != 0x45) {
+ kfree_skb(skb);
+ pp = ERR_PTR(-EPERM);
+ goto out_unlock;
+ }
+
+ if (unlikely(ip_fast_csum((u8 *)iph, 5))) {
+ kfree_skb(skb);
+ pp = ERR_PTR(-EPERM);
+ goto out_unlock;
+ }
+
+ if (ip_is_fragment(iph))
+ goto out_unlock;
+
+ ret = nf_hook_early_ingress(skb);
+ switch (ret) {
+ case NF_STOLEN:
+ break;
+ case NF_ACCEPT:
+ ptype = dev_get_packet_offload(skb->protocol, 1);
+ if (ptype)
+ pp = ptype->callbacks.gro_receive(head, skb);
+
+ goto out_unlock;
+ case NF_DROP:
+ pp = ERR_PTR(-EPERM);
+ goto out_unlock;
+ }
+
+ ops = rcu_dereference(nft_ip_offloads[proto]);
+ if (!ops || !ops->callbacks.gro_receive)
+ goto out_unlock;
+
+ if (iph->ttl <= 1) {
+ kfree_skb(skb);
+ pp = ERR_PTR(-EPERM);
+ goto out_unlock;
+ }
+
+ skb->ip_summed = CHECKSUM_UNNECESSARY;
+
+ for (p = *head; p; p = p->next) {
+ struct iphdr *iph2;
+
+ if (!NAPI_GRO_CB(p)->same_flow)
+ continue;
+
+ iph2 = ip_hdr(p);
+ /* The above works because, with the exception of the top
+ * (inner most) layer, we only aggregate pkts with the same
+ * hdr length so all the hdrs we'll need to verify will start
+ * at the same offset.
+ */
+ if ((iph->protocol ^ iph2->protocol) |
+ ((__force u32)iph->saddr ^ (__force u32)iph2->saddr) |
+ ((__force u32)iph->daddr ^ (__force u32)iph2->daddr)) {
+ NAPI_GRO_CB(p)->same_flow = 0;
+ continue;
+ }
+
+ if (!NAPI_GRO_CB(p)->is_ffwd)
+ continue;
+
+ if (!skb_dst(p))
+ continue;
+
+ /* All fields must match except length and checksum. */
+ NAPI_GRO_CB(p)->flush |=
+ ((iph->ttl - 1) ^ iph2->ttl) |
+ (iph->tos ^ iph2->tos) |
+ ((iph->frag_off ^ iph2->frag_off) & htons(IP_DF));
+
+ pp = &p;
+
+ break;
+ }
+
+ NAPI_GRO_CB(skb)->is_atomic = !!(iph->frag_off & htons(IP_DF));
+
+ ip_decrease_ttl(iph);
+ skb->priority = rt_tos2priority(iph->tos);
+
+ skb_pull(skb, off);
+ NAPI_GRO_CB(skb)->data_offset = sizeof(*iph);
+ skb_reset_network_header(skb);
+ skb_set_transport_header(skb, sizeof(*iph));
+
+ pp = call_gro_receive(ops->callbacks.gro_receive, head, skb);
+out_unlock:
+ rcu_read_unlock();
+
+out:
+ NAPI_GRO_CB(skb)->data_offset = 0;
+ return pp;
+}
+
+static struct packet_offload nft_ipv4_packet_offload __read_mostly = {
+ .type = cpu_to_be16(ETH_P_IP),
+ .priority = 0,
+ .callbacks = {
+ .gro_receive = nft_ipv4_gro_receive,
+ .gro_complete = nft_ipv4_gro_complete,
+ .gso_segment = nft_ipv4_gso_segment,
+ },
+};
+
+static const struct net_offload nft_udp4_offload = {
+ .callbacks = {
+ .gso_segment = nft_udp4_gso_segment,
+ .gro_receive = nft_udp_gro_receive,
+ },
+};
+
+static const struct net_offload nft_tcp4_offload = {
+ .callbacks = {
+ .gso_segment = nft_tcp4_gso_segment,
+ .gro_receive = nft_tcp_gro_receive,
+ },
+};
+
+static const struct net_offload __rcu *nft_ip_offloads[MAX_INET_PROTOS] __read_mostly = {
+ [IPPROTO_UDP] = &nft_udp4_offload,
+ [IPPROTO_TCP] = &nft_tcp4_offload,
+};
+
+void nf_early_ingress_ip_enable(void)
+{
+ dev_add_offload(&nft_ipv4_packet_offload);
+}
+
+void nf_early_ingress_ip_disable(void)
+{
+ dev_remove_offload(&nft_ipv4_packet_offload);
+}
diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index dbd7d1fad277..8f803a1fd76e 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -9,6 +9,14 @@ config NETFILTER_INGRESS
This allows you to classify packets from ingress using the Netfilter
infrastructure.
+config NETFILTER_EARLY_INGRESS
+ bool "Netfilter early ingress support"
+ default y
+ help
+ This allows you to perform very early filtering and packet aggregation
+ for fast forwarding bypass by exercising the GRO engine from the
+ Netfilter infrastructure.
+
config NETFILTER_NETLINK
tristate
diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index 44449389e527..eebc0e35f9e5 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -1,5 +1,6 @@
# SPDX-License-Identifier: GPL-2.0
netfilter-objs := core.o nf_log.o nf_queue.o nf_sockopt.o utils.o
+netfilter-$(CONFIG_NETFILTER_EARLY_INGRESS) += early_ingress.o
nf_conntrack-y := nf_conntrack_core.o nf_conntrack_standalone.o nf_conntrack_expect.o nf_conntrack_helper.o nf_conntrack_proto.o nf_conntrack_l3proto_generic.o nf_conntrack_proto_generic.o nf_conntrack_proto_tcp.o nf_conntrack_proto_udp.o nf_conntrack_extend.o nf_conntrack_acct.o nf_conntrack_seqadj.o
nf_conntrack-$(CONFIG_NF_CONNTRACK_TIMEOUT) += nf_conntrack_timeout.o
diff --git a/net/netfilter/core.c b/net/netfilter/core.c
index 168af54db975..4885365380d3 100644
--- a/net/netfilter/core.c
+++ b/net/netfilter/core.c
@@ -306,6 +306,11 @@ nf_hook_entry_head(struct net *net, int pf, unsigned int hooknum,
return &dev->nf_hooks_ingress;
}
#endif
+ if (hooknum == NF_NETDEV_EARLY_INGRESS) {
+ if (dev && dev_net(dev) == net)
+ return &dev->nf_hooks_early_ingress;
+ }
+
WARN_ON_ONCE(1);
return NULL;
}
@@ -321,7 +326,8 @@ static int __nf_register_net_hook(struct net *net, int pf,
if (reg->hooknum == NF_NETDEV_INGRESS)
return -EOPNOTSUPP;
#endif
- if (reg->hooknum != NF_NETDEV_INGRESS ||
+ if ((reg->hooknum != NF_NETDEV_INGRESS &&
+ reg->hooknum != NF_NETDEV_EARLY_INGRESS) ||
!reg->dev || dev_net(reg->dev) != net)
return -EINVAL;
}
@@ -347,6 +353,9 @@ static int __nf_register_net_hook(struct net *net, int pf,
if (pf == NFPROTO_NETDEV && reg->hooknum == NF_NETDEV_INGRESS)
net_inc_ingress_queue();
#endif
+ if (pf == NFPROTO_NETDEV && reg->hooknum == NF_NETDEV_EARLY_INGRESS)
+ nf_early_ingress_enable();
+
#ifdef HAVE_JUMP_LABEL
static_key_slow_inc(&nf_hooks_needed[pf][reg->hooknum]);
#endif
@@ -404,6 +413,9 @@ static void __nf_unregister_net_hook(struct net *net, int pf,
#ifdef CONFIG_NETFILTER_INGRESS
if (pf == NFPROTO_NETDEV && reg->hooknum == NF_NETDEV_INGRESS)
net_dec_ingress_queue();
+
+ if (pf == NFPROTO_NETDEV && reg->hooknum == NF_NETDEV_EARLY_INGRESS)
+ nf_early_ingress_disable();
#endif
#ifdef HAVE_JUMP_LABEL
static_key_slow_dec(&nf_hooks_needed[pf][reg->hooknum]);
@@ -535,6 +547,27 @@ int nf_hook_slow(struct sk_buff *skb, struct nf_hook_state *state,
}
EXPORT_SYMBOL(nf_hook_slow);
+int nf_hook_netdev(struct sk_buff *skb, struct nf_hook_state *state,
+ const struct nf_hook_entries *e)
+{
+ unsigned int verdict, s, v = NF_ACCEPT;
+
+ for (s = 0; s < e->num_hook_entries; s++) {
+ verdict = nf_hook_entry_hookfn(&e->hooks[s], skb, state);
+ v = verdict & NF_VERDICT_MASK;
+ switch (v) {
+ case NF_ACCEPT:
+ break;
+ case NF_DROP:
+ kfree_skb(skb);
+ /* Fall through */
+ default:
+ return v;
+ }
+ }
+
+ return v;
+}
int skb_make_writable(struct sk_buff *skb, unsigned int writable_len)
{
diff --git a/net/netfilter/early_ingress.c b/net/netfilter/early_ingress.c
new file mode 100644
index 000000000000..bf31aa8b3721
--- /dev/null
+++ b/net/netfilter/early_ingress.c
@@ -0,0 +1,323 @@
+#include <linux/kernel.h>
+#include <linux/netfilter.h>
+#include <linux/types.h>
+#include <net/xfrm.h>
+#include <net/arp.h>
+#include <net/udp.h>
+#include <net/tcp.h>
+#include <net/protocol.h>
+#include <crypto/aead.h>
+#include <net/netfilter/early_ingress.h>
+
+/* XXX: Maybe export this from net/core/skbuff.c
+ * instead of holding a local copy */
+static void skb_headers_offset_update(struct sk_buff *skb, int off)
+{
+ /* Only adjust this if it actually is csum_start rather than csum */
+ if (skb->ip_summed == CHECKSUM_PARTIAL)
+ skb->csum_start += off;
+ /* {transport,network,mac}_header and tail are relative to skb->head */
+ skb->transport_header += off;
+ skb->network_header += off;
+ if (skb_mac_header_was_set(skb))
+ skb->mac_header += off;
+ skb->inner_transport_header += off;
+ skb->inner_network_header += off;
+ skb->inner_mac_header += off;
+}
+
+struct sk_buff *nft_skb_segment(struct sk_buff *head_skb)
+{
+ unsigned int headroom;
+ struct sk_buff *nskb;
+ struct sk_buff *segs = NULL;
+ struct sk_buff *tail = NULL;
+ unsigned int doffset = head_skb->data - skb_mac_header(head_skb);
+ struct sk_buff *list_skb = skb_shinfo(head_skb)->frag_list;
+ unsigned int tnl_hlen = skb_tnl_header_len(head_skb);
+ unsigned int delta_segs, delta_len, delta_truesize;
+
+ __skb_push(head_skb, doffset);
+
+ headroom = skb_headroom(head_skb);
+
+ delta_segs = delta_len = delta_truesize = 0;
+
+ skb_shinfo(head_skb)->frag_list = NULL;
+
+ segs = skb_clone(head_skb, GFP_ATOMIC);
+ if (unlikely(!segs))
+ return ERR_PTR(-ENOMEM);
+
+ do {
+ nskb = list_skb;
+
+ list_skb = list_skb->next;
+
+ if (!tail)
+ segs->next = nskb;
+ else
+ tail->next = nskb;
+
+ tail = nskb;
+
+ delta_len += nskb->len;
+ delta_truesize += nskb->truesize;
+
+ skb_push(nskb, doffset);
+
+ nskb->dev = head_skb->dev;
+ nskb->queue_mapping = head_skb->queue_mapping;
+ nskb->network_header = head_skb->network_header;
+ nskb->mac_len = head_skb->mac_len;
+ nskb->mac_header = head_skb->mac_header;
+ nskb->transport_header = head_skb->transport_header;
+
+ if (!secpath_exists(nskb))
+ nskb->sp = secpath_get(head_skb->sp);
+
+ skb_headers_offset_update(nskb, skb_headroom(nskb) - headroom);
+
+ skb_copy_from_linear_data_offset(head_skb, -tnl_hlen,
+ nskb->data - tnl_hlen,
+ doffset + tnl_hlen);
+
+ } while (list_skb);
+
+ segs->len = head_skb->len - delta_len;
+ segs->data_len = head_skb->data_len - delta_len;
+ segs->truesize += head_skb->data_len - delta_truesize;
+
+ head_skb->len = segs->len;
+ head_skb->data_len = segs->data_len;
+ head_skb->truesize += segs->truesize;
+
+ skb_shinfo(segs)->gso_size = 0;
+ skb_shinfo(segs)->gso_segs = 0;
+ skb_shinfo(segs)->gso_type = 0;
+
+ segs->prev = tail;
+
+ return segs;
+}
+
+static int nft_skb_gro_receive(struct sk_buff **head, struct sk_buff *skb)
+{
+ struct sk_buff *p = *head;
+
+ if (unlikely((!NAPI_GRO_CB(p)->is_ffwd) || !skb_dst(p)))
+ return -EINVAL;
+
+ if (NAPI_GRO_CB(p)->last == p)
+ skb_shinfo(p)->frag_list = skb;
+ else
+ NAPI_GRO_CB(p)->last->next = skb;
+ NAPI_GRO_CB(p)->last = skb;
+
+ NAPI_GRO_CB(p)->count++;
+ p->data_len += skb->len;
+ p->truesize += skb->truesize;
+ p->len += skb->len;
+
+ NAPI_GRO_CB(skb)->same_flow = 1;
+ return 0;
+}
+
+static struct sk_buff **udp_gro_ffwd_receive(struct sk_buff **head,
+ struct sk_buff *skb,
+ struct udphdr *uh)
+{
+ struct sk_buff *p = NULL;
+ struct sk_buff **pp = NULL;
+ struct udphdr *uh2;
+ int flush = 0;
+
+ for (; (p = *head); head = &p->next) {
+
+ if (!NAPI_GRO_CB(p)->same_flow)
+ continue;
+
+ uh2 = udp_hdr(p);
+
+ /* Match ports, and checksums must be either both zero
+ * or both nonzero.
+ */
+ if ((*(u32 *)&uh->source != *(u32 *)&uh2->source) ||
+ (!uh->check ^ !uh2->check)) {
+ NAPI_GRO_CB(p)->same_flow = 0;
+ continue;
+ }
+
+ goto found;
+ }
+
+ goto out;
+
+found:
+ p = *head;
+
+ if (nft_skb_gro_receive(head, skb))
+ flush = 1;
+
+out:
+ if (p && (!NAPI_GRO_CB(skb)->same_flow || flush))
+ pp = head;
+
+ NAPI_GRO_CB(skb)->flush |= flush;
+ return pp;
+}
+
+struct sk_buff **nft_udp_gro_receive(struct sk_buff **head, struct sk_buff *skb)
+{
+ struct udphdr *uh;
+
+ uh = skb_gro_header_slow(skb, skb_transport_offset(skb) + sizeof(struct udphdr),
+ skb_transport_offset(skb));
+
+ if (unlikely(!uh))
+ goto flush;
+
+ if (NAPI_GRO_CB(skb)->flush)
+ goto flush;
+
+ if (NAPI_GRO_CB(skb)->is_ffwd)
+ return udp_gro_ffwd_receive(head, skb, uh);
+
+flush:
+ NAPI_GRO_CB(skb)->flush = 1;
+ return NULL;
+}
+
+struct sk_buff **nft_tcp_gro_receive(struct sk_buff **head, struct sk_buff *skb)
+{
+ struct sk_buff **pp = NULL;
+ struct sk_buff *p;
+ struct tcphdr *th;
+ struct tcphdr *th2;
+ unsigned int len;
+ unsigned int thlen;
+ __be32 flags;
+ unsigned int mss = 1;
+ unsigned int hlen;
+ int flush = 1;
+ int i;
+
+ th = skb_gro_header_slow(skb, skb_transport_offset(skb) + sizeof(struct tcphdr),
+ skb_transport_offset(skb));
+ if (unlikely(!th))
+ goto out;
+
+ thlen = th->doff * 4;
+ if (thlen < sizeof(*th))
+ goto out;
+
+ hlen = skb_transport_offset(skb) + thlen;
+
+ th = skb_gro_header_slow(skb, hlen, skb_transport_offset(skb));
+ if (unlikely(!th))
+ goto out;
+
+ skb_gro_pull(skb, thlen);
+ len = skb_gro_len(skb);
+ flags = tcp_flag_word(th);
+
+ for (; (p = *head); head = &p->next) {
+ if (!NAPI_GRO_CB(p)->same_flow)
+ continue;
+
+ th2 = tcp_hdr(p);
+
+ if (*(u32 *)&th->source ^ *(u32 *)&th2->source) {
+ NAPI_GRO_CB(p)->same_flow = 0;
+ continue;
+ }
+
+ goto found;
+ }
+
+ goto out_check_final;
+
+found:
+ flush = NAPI_GRO_CB(p)->flush;
+ flush |= (__force int)(flags & TCP_FLAG_CWR);
+ flush |= (__force int)((flags ^ tcp_flag_word(th2)) &
+ ~(TCP_FLAG_CWR | TCP_FLAG_FIN | TCP_FLAG_PSH));
+ flush |= (__force int)(th->ack_seq ^ th2->ack_seq);
+ for (i = sizeof(*th); i < thlen; i += 4)
+ flush |= *(u32 *)((u8 *)th + i) ^
+ *(u32 *)((u8 *)th2 + i);
+
+ mss = skb_shinfo(p)->gso_size;
+
+ flush |= (len - 1) >= mss;
+ flush |= (ntohl(th2->seq) + (skb_gro_len(p) - (hlen * (NAPI_GRO_CB(p)->count - 1)))) ^ ntohl(th->seq);
+
+ if (flush || nft_skb_gro_receive(head, skb)) {
+ mss = 1;
+ goto out_check_final;
+ }
+
+ p = *head;
+
+out_check_final:
+ flush = len < mss;
+ flush |= (__force int)(flags & (TCP_FLAG_URG | TCP_FLAG_PSH |
+ TCP_FLAG_RST | TCP_FLAG_SYN |
+ TCP_FLAG_FIN));
+
+ if (p && (!NAPI_GRO_CB(skb)->same_flow || flush))
+ pp = head;
+
+out:
+ NAPI_GRO_CB(skb)->flush |= (flush != 0);
+
+ return pp;
+}
+
+static inline bool nf_hook_early_ingress_active(const struct sk_buff *skb)
+{
+#ifdef HAVE_JUMP_LABEL
+ if (!static_key_false(&nf_hooks_needed[NFPROTO_NETDEV][NF_NETDEV_EARLY_INGRESS]))
+ return false;
+#endif
+ return rcu_access_pointer(skb->dev->nf_hooks_early_ingress);
+}
+
+int nf_hook_early_ingress(struct sk_buff *skb)
+{
+ struct nf_hook_entries *e =
+ rcu_dereference(skb->dev->nf_hooks_early_ingress);
+ struct nf_hook_state state;
+ int ret = NF_ACCEPT;
+
+ if (nf_hook_early_ingress_active(skb)) {
+ if (unlikely(!e))
+ return 0;
+
+ nf_hook_state_init(&state, NF_NETDEV_EARLY_INGRESS,
+ NFPROTO_NETDEV, skb->dev, NULL, NULL,
+ dev_net(skb->dev), NULL);
+
+ ret = nf_hook_netdev(skb, &state, e);
+ }
+
+ return ret;
+}
+
+/* protected by nf_hook_mutex. */
+static int nf_early_ingress_use;
+
+void nf_early_ingress_enable(void)
+{
+ if (nf_early_ingress_use++ == 0) {
+ /* First user: register the early ingress offloads. */
+ nf_early_ingress_ip_enable();
+ }
+}
+
+void nf_early_ingress_disable(void)
+{
+ if (--nf_early_ingress_use == 0) {
+ nf_early_ingress_ip_disable();
+ }
+}
--
2.11.0
* [PATCH net-next,RFC 04/13] net: Use one bit of NAPI_GRO_CB for the netfilter fastpath.
From: Pablo Neira Ayuso @ 2018-06-14 14:19 UTC (permalink / raw)
To: netfilter-devel; +Cc: netdev, steffen.klassert
In-Reply-To: <20180614141947.3580-1-pablo@netfilter.org>
From: Steffen Klassert <steffen.klassert@secunet.com>
This patch adds an is_ffwd bit to the NAPI_GRO_CB to indicate
fastpath packets in the GRO layer. It also implements the
logic we need for this in the generic codepath. The rest
of the needed logic is implemented within netfilter and
introduced with a follow-up patch.
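A small sketch of how the generic code can act on such a bit, assuming a trimmed-down model of the flag bits. struct gro_cb_model, enum gro_action and gro_decide() are hypothetical names for this example; the decision order is meant to mirror the dev_gro_receive() hunk in this patch, where a flushed fastpath packet is completed directly instead of taking the normal receive path.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, trimmed-down model of the flag bits in struct napi_gro_cb. */
struct gro_cb_model {
	uint8_t same_flow:1;	/* packet was merged into an existing flow */
	uint8_t flush:1;	/* packet must not be held for aggregation */
	uint8_t is_ffwd:1;	/* packet belongs to the netfilter fastpath */
};

/* Hypothetical names for the possible outcomes. */
enum gro_action {
	GRO_ACT_MERGED,		/* already merged, nothing more to do */
	GRO_ACT_NORMAL,		/* pass up the regular receive path */
	GRO_ACT_FFWD_COMPLETE,	/* complete right away on the fastpath */
	GRO_ACT_HOLD,		/* keep on the GRO list for later merging */
};

static enum gro_action gro_decide(const struct gro_cb_model *cb)
{
	if (cb->same_flow)
		return GRO_ACT_MERGED;
	if (cb->flush)
		return cb->is_ffwd ? GRO_ACT_FFWD_COMPLETE : GRO_ACT_NORMAL;
	return GRO_ACT_HOLD;
}
```

Because the bit fills an existing one-bit hole, the control block does not grow, which keeps the BUILD_BUG_ON() size check in napi_gro_complete() satisfied.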
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
include/linux/netdevice.h | 2 +-
net/core/dev.c | 36 +++++++++++++++++++++++++++---------
2 files changed, 28 insertions(+), 10 deletions(-)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index d8cadfa3769b..62734cf0c43a 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2238,7 +2238,7 @@ struct napi_gro_cb {
/* Number of gro_receive callbacks this packet already went through */
u8 recursion_counter:4;
- /* 1 bit hole */
+ u8 is_ffwd:1;
/* used to support CHECKSUM_COMPLETE for tunneling protocols */
__wsum csum;
diff --git a/net/core/dev.c b/net/core/dev.c
index 115de8bfcb54..75f530886874 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4864,7 +4864,8 @@ static int napi_gro_complete(struct sk_buff *skb)
BUILD_BUG_ON(sizeof(struct napi_gro_cb) > sizeof(skb->cb));
- if (NAPI_GRO_CB(skb)->count == 1) {
+ if (NAPI_GRO_CB(skb)->count == 1 &&
+ !(NAPI_GRO_CB(skb)->is_ffwd)) {
skb_shinfo(skb)->gso_size = 0;
goto out;
}
@@ -4880,8 +4881,10 @@ static int napi_gro_complete(struct sk_buff *skb)
rcu_read_unlock();
if (err) {
- WARN_ON(&ptype->list == head);
- kfree_skb(skb);
+ if (err != -EINPROGRESS) {
+ WARN_ON(&ptype->list == head);
+ kfree_skb(skb);
+ }
return NET_RX_SUCCESS;
}
@@ -4936,8 +4939,10 @@ static void gro_list_prepare(struct napi_struct *napi, struct sk_buff *skb)
diffs = (unsigned long)p->dev ^ (unsigned long)skb->dev;
diffs |= p->vlan_tci ^ skb->vlan_tci;
- diffs |= skb_metadata_dst_cmp(p, skb);
- diffs |= skb_metadata_differs(p, skb);
+ if (!NAPI_GRO_CB(p)->is_ffwd) {
+ diffs |= skb_metadata_dst_cmp(p, skb);
+ diffs |= skb_metadata_differs(p, skb);
+ }
if (maclen == ETH_HLEN)
diffs |= compare_ether_header(skb_mac_header(p),
skb_mac_header(skb));
@@ -5019,6 +5024,7 @@ static enum gro_result dev_gro_receive(struct napi_struct *napi, struct sk_buff
NAPI_GRO_CB(skb)->is_fou = 0;
NAPI_GRO_CB(skb)->is_atomic = 1;
NAPI_GRO_CB(skb)->gro_remcsum_start = 0;
+ NAPI_GRO_CB(skb)->is_ffwd = 0;
/* Setup for GRO checksum validation */
switch (skb->ip_summed) {
@@ -5044,9 +5050,14 @@ static enum gro_result dev_gro_receive(struct napi_struct *napi, struct sk_buff
if (&ptype->list == head)
goto normal;
- if (IS_ERR(pp) && PTR_ERR(pp) == -EINPROGRESS) {
- ret = GRO_CONSUMED;
- goto ok;
+ if (IS_ERR(pp)) {
+ int err;
+
+ err = PTR_ERR(pp);
+ if (err == -EINPROGRESS || err == -EPERM) {
+ ret = GRO_CONSUMED;
+ goto ok;
+ }
}
same_flow = NAPI_GRO_CB(skb)->same_flow;
@@ -5064,8 +5075,15 @@ static enum gro_result dev_gro_receive(struct napi_struct *napi, struct sk_buff
if (same_flow)
goto ok;
- if (NAPI_GRO_CB(skb)->flush)
+ if (NAPI_GRO_CB(skb)->flush) {
+ if (NAPI_GRO_CB(skb)->is_ffwd) {
+ napi_gro_complete(skb);
+ ret = GRO_CONSUMED;
+ goto ok;
+ }
+
goto normal;
+ }
if (unlikely(napi->gro_count >= MAX_GRO_SKBS)) {
struct sk_buff *nskb = napi->gro_list;
--
2.11.0
^ permalink raw reply related
* [PATCH net-next,RFC 03/13] net: Add a GSO feature bit for the netfilter forward fastpath.
From: Pablo Neira Ayuso @ 2018-06-14 14:19 UTC (permalink / raw)
To: netfilter-devel; +Cc: netdev, steffen.klassert
In-Reply-To: <20180614141947.3580-1-pablo@netfilter.org>
From: Steffen Klassert <steffen.klassert@secunet.com>
The netfilter forward fastpath has its own logic to create
GSO packets. So add a feature bit that lets us catch GSO
packets that are generated by the fastpath GRO handler.
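A minimal user-space sketch of how a GSO type bit pairs with a device feature bit, in the spirit of net_gso_ok(). The toy_ names are hypothetical, and the shift value of 16 is an assumption about NETIF_F_GSO_SHIFT on this tree, not something this patch states:

```c
#include <stdint.h>

/* Assumed value of NETIF_F_GSO_SHIFT; check netdev_features.h. */
#define TOY_GSO_SHIFT	16

/* Mirrors SKB_GSO_NFT = 1 << 18 from this patch. */
#define TOY_SKB_GSO_NFT	(1u << 18)

/* Feature bit that corresponds to the GSO type above. */
#define TOY_F_GSO_NFT	((uint64_t)TOY_SKB_GSO_NFT << TOY_GSO_SHIFT)

/* Return 1 if every GSO type bit set in gso_type has its matching
 * feature bit set in features, 0 otherwise.
 */
static int toy_gso_ok(uint64_t features, unsigned int gso_type)
{
	uint64_t feature = (uint64_t)gso_type << TOY_GSO_SHIFT;

	return (features & feature) == feature;
}
```

A device that does not advertise the new feature bit fails this check for packets marked with the new GSO type, which is what lets such packets be caught.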
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
include/linux/netdev_features.h | 4 +++-
include/linux/netdevice.h | 1 +
include/linux/skbuff.h | 2 ++
3 files changed, 6 insertions(+), 1 deletion(-)
diff --git a/include/linux/netdev_features.h b/include/linux/netdev_features.h
index 623bb8ced060..f380a27410ef 100644
--- a/include/linux/netdev_features.h
+++ b/include/linux/netdev_features.h
@@ -56,8 +56,9 @@ enum {
NETIF_F_GSO_ESP_BIT, /* ... ESP with TSO */
NETIF_F_GSO_UDP_BIT, /* ... UFO, deprecated except tuntap */
NETIF_F_GSO_UDP_L4_BIT, /* ... UDP payload GSO (not UFO) */
+ NETIF_F_GSO_NFT_BIT, /* ... NFT generic */
/**/NETIF_F_GSO_LAST = /* last bit, see GSO_MASK */
- NETIF_F_GSO_UDP_L4_BIT,
+ NETIF_F_GSO_NFT_BIT,
NETIF_F_FCOE_CRC_BIT, /* FCoE CRC32 */
NETIF_F_SCTP_CRC_BIT, /* SCTP checksum offload */
@@ -140,6 +141,7 @@ enum {
#define NETIF_F_GSO_SCTP __NETIF_F(GSO_SCTP)
#define NETIF_F_GSO_ESP __NETIF_F(GSO_ESP)
#define NETIF_F_GSO_UDP __NETIF_F(GSO_UDP)
+#define NETIF_F_GSO_NFT __NETIF_F(GSO_NFT)
#define NETIF_F_HW_VLAN_STAG_FILTER __NETIF_F(HW_VLAN_STAG_FILTER)
#define NETIF_F_HW_VLAN_STAG_RX __NETIF_F(HW_VLAN_STAG_RX)
#define NETIF_F_HW_VLAN_STAG_TX __NETIF_F(HW_VLAN_STAG_TX)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 13a56f9b2a32..d8cadfa3769b 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -4229,6 +4229,7 @@ static inline bool net_gso_ok(netdev_features_t features, int gso_type)
BUILD_BUG_ON(SKB_GSO_ESP != (NETIF_F_GSO_ESP >> NETIF_F_GSO_SHIFT));
BUILD_BUG_ON(SKB_GSO_UDP != (NETIF_F_GSO_UDP >> NETIF_F_GSO_SHIFT));
BUILD_BUG_ON(SKB_GSO_UDP_L4 != (NETIF_F_GSO_UDP_L4 >> NETIF_F_GSO_SHIFT));
+ BUILD_BUG_ON(SKB_GSO_NFT != (NETIF_F_GSO_NFT >> NETIF_F_GSO_SHIFT));
return (features & feature) == feature;
}
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index c86885954994..4a5cff1ffcaa 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -575,6 +575,8 @@ enum {
SKB_GSO_UDP = 1 << 16,
SKB_GSO_UDP_L4 = 1 << 17,
+
+ SKB_GSO_NFT = 1 << 18,
};
#if BITS_PER_LONG > 32
--
2.11.0
^ permalink raw reply related
* [PATCH net-next,RFC 02/13] net: Change priority of ipv4 and ipv6 packet offloads.
From: Pablo Neira Ayuso @ 2018-06-14 14:19 UTC (permalink / raw)
To: netfilter-devel; +Cc: netdev, steffen.klassert
In-Reply-To: <20180614141947.3580-1-pablo@netfilter.org>
From: Steffen Klassert <steffen.klassert@secunet.com>
The forward fastpath needs to insert callbacks with
higher priority than the standard callbacks. So change
the priority of ipv4 and ipv6 packet offloads from zero
to one. With this we are able to insert callbacks with
priority zero if needed.
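A minimal user-space sketch of why the priority value matters, assuming the offload list is kept in ascending priority order at registration time (which is how dev_add_offload() reads to me). The toy_ types and names are hypothetical stand-ins, not kernel code:

```c
#include <stddef.h>

#define TOY_MAX_OFFLOADS 8

/* Hypothetical stand-in for struct packet_offload. */
struct toy_offload {
	int priority;
	const char *name;
};

struct toy_offload_list {
	struct toy_offload e[TOY_MAX_OFFLOADS];
	size_t n;
};

/* Insert po before the first entry with a strictly greater priority,
 * so the list stays sorted in ascending order and entries of equal
 * priority keep their registration order.
 * Returns 0 on success, -1 if the list is full.
 */
static int toy_add_offload(struct toy_offload_list *l, struct toy_offload po)
{
	size_t pos, i;

	if (l->n >= TOY_MAX_OFFLOADS)
		return -1;

	for (pos = 0; pos < l->n; pos++)
		if (po.priority < l->e[pos].priority)
			break;

	/* Shift the tail up by one slot to make room at pos. */
	for (i = l->n; i > pos; i--)
		l->e[i] = l->e[i - 1];

	l->e[pos] = po;
	l->n++;
	return 0;
}
```

With the standard handlers at priority one, a handler registered later with priority zero still ends up ahead of them in the list.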
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
net/ipv4/af_inet.c | 1 +
net/ipv6/ip6_offload.c | 1 +
2 files changed, 2 insertions(+)
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 15e125558c76..fbb90f7556ea 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1841,6 +1841,7 @@ static int ipv4_proc_init(void);
static struct packet_offload ip_packet_offload __read_mostly = {
.type = cpu_to_be16(ETH_P_IP),
+ .priority = 1,
.callbacks = {
.gso_segment = inet_gso_segment,
.gro_receive = inet_gro_receive,
diff --git a/net/ipv6/ip6_offload.c b/net/ipv6/ip6_offload.c
index 5b3f2f89ef41..863913fb690f 100644
--- a/net/ipv6/ip6_offload.c
+++ b/net/ipv6/ip6_offload.c
@@ -343,6 +343,7 @@ static int ip4ip6_gro_complete(struct sk_buff *skb, int nhoff)
static struct packet_offload ipv6_packet_offload __read_mostly = {
.type = cpu_to_be16(ETH_P_IPV6),
+ .priority = 1,
.callbacks = {
.gso_segment = ipv6_gso_segment,
.gro_receive = ipv6_gro_receive,
--
2.11.0
^ permalink raw reply related
* [PATCH net-next,RFC 01/13] net: Add a helper to get the packet offload callbacks by priority.
From: Pablo Neira Ayuso @ 2018-06-14 14:19 UTC (permalink / raw)
To: netfilter-devel; +Cc: netdev, steffen.klassert
In-Reply-To: <20180614141947.3580-1-pablo@netfilter.org>
From: Steffen Klassert <steffen.klassert@secunet.com>
With this helper it is possible to request callbacks with
a certain priority. This will be used in the upcoming forward
fastpath to pass packets to the standard GRO path.
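A minimal user-space sketch of the lookup rule the helper implements: return the first entry of the requested type that has both GRO callbacks and whose priority is not lower than the requested one. The toy_ struct and the array walk are hypothetical simplifications; the real helper walks the RCU-protected offload_base list:

```c
#include <stddef.h>

/* Hypothetical stand-in for struct packet_offload. */
struct toy_offload {
	unsigned short type;	/* protocol, e.g. 0x0800 for IPv4 */
	int priority;
	int has_gro_receive;	/* non-zero if the callback is set */
	int has_gro_complete;
};

/* Return the first matching entry, or NULL if there is none. */
static const struct toy_offload *
toy_get_packet_offload(const struct toy_offload *tbl, size_t n,
		       unsigned short type, int priority)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (tbl[i].type != type ||
		    !tbl[i].has_gro_receive ||
		    !tbl[i].has_gro_complete ||
		    tbl[i].priority < priority)
			continue;
		return &tbl[i];
	}

	return NULL;
}
```

In this model a handler sitting at priority zero can ask for priority one to skip itself and reach the standard handler of the same type.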
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
include/linux/netdevice.h | 1 +
net/core/dev.c | 14 ++++++++++++++
2 files changed, 15 insertions(+)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 3ec9850c7936..13a56f9b2a32 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2523,6 +2523,7 @@ void dev_remove_pack(struct packet_type *pt);
void __dev_remove_pack(struct packet_type *pt);
void dev_add_offload(struct packet_offload *po);
void dev_remove_offload(struct packet_offload *po);
+struct packet_offload *dev_get_packet_offload(__be16 type, int priority);
int dev_get_iflink(const struct net_device *dev);
int dev_fill_metadata_dst(struct net_device *dev, struct sk_buff *skb);
diff --git a/net/core/dev.c b/net/core/dev.c
index 6e18242a1cae..115de8bfcb54 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -468,7 +468,21 @@ void dev_remove_pack(struct packet_type *pt)
}
EXPORT_SYMBOL(dev_remove_pack);
+struct packet_offload *dev_get_packet_offload(__be16 type, int priority)
+{
+ struct list_head *offload_head = &offload_base;
+ struct packet_offload *ptype;
+
+ list_for_each_entry_rcu(ptype, offload_head, list) {
+ if (ptype->type != type || !ptype->callbacks.gro_receive || !ptype->callbacks.gro_complete || ptype->priority < priority)
+ continue;
+ return ptype;
+ }
+
+ return NULL;
+}
+EXPORT_SYMBOL(dev_get_packet_offload);
/**
* dev_add_offload - register offload handlers
* @po: protocol offload declaration
--
2.11.0
^ permalink raw reply related
* [PATCH net-next,RFC 00/13] New fast forwarding path
From: Pablo Neira Ayuso @ 2018-06-14 14:19 UTC (permalink / raw)
To: netfilter-devel; +Cc: netdev, steffen.klassert
Hi,
This patchset proposes a new fast forwarding path infrastructure that
combines the GRO/GSO and the flowtable infrastructures. The idea is to
add a hook at the GRO layer that is invoked before the standard GRO
protocol offloads. This allows us to build custom packet chains that we
can quickly pass in one go to the neighbour layer to define a fast
forwarding path for flows.
For each packet that gets into the GRO layer, we first check if there is
an entry in the flowtable. If so, the packet is placed in a list until
the GRO infrastructure decides to send the batch from gro_complete to
the neighbour layer. The first packet in the list takes the route from
the flowtable entry, so we avoid repeated routing lookups.
In case no entry is found in the flowtable, the packet is passed up to
the classic GRO offload handlers. Thus, this packet follows the standard
forwarding path. Note that the initial packets of the flow always go
through the standard IPv4/IPv6 netfilter forward hook, that is used to
configure what flows are placed in the flowtable. Therefore, only a few
(initial) packets follow the standard forwarding path while most of the
follow up packets take this new fast forwarding path.
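A minimal user-space sketch of the dispatch described above, under heavy simplification: a packet whose flow is found in the table takes the fast path, anything else takes the standard path, where the forward hook may add the flow. All toy_ names are hypothetical, the table is a plain array rather than the real flowtable, and batching and routing are left out:

```c
#include <stddef.h>

#define TOY_MAX_FLOWS 16

enum toy_path { TOY_PATH_STANDARD, TOY_PATH_FAST };

/* Hypothetical flow key; the real flowtable tuple carries more. */
struct toy_flow_key {
	unsigned int saddr, daddr;
	unsigned short sport, dport;
	unsigned char proto;
};

struct toy_flowtable {
	struct toy_flow_key keys[TOY_MAX_FLOWS];
	size_t n;
};

static int toy_key_equal(const struct toy_flow_key *a,
			 const struct toy_flow_key *b)
{
	return a->saddr == b->saddr && a->daddr == b->daddr &&
	       a->sport == b->sport && a->dport == b->dport &&
	       a->proto == b->proto;
}

/* Return 1 if the flow is already in the table, 0 otherwise. */
static int toy_flow_lookup(const struct toy_flowtable *ft,
			   const struct toy_flow_key *k)
{
	size_t i;

	for (i = 0; i < ft->n; i++)
		if (toy_key_equal(&ft->keys[i], k))
			return 1;
	return 0;
}

/* Models the forward hook placing a flow in the table.
 * Returns 0 on success, -1 if the table is full.
 */
static int toy_flow_add(struct toy_flowtable *ft,
			const struct toy_flow_key *k)
{
	if (ft->n >= TOY_MAX_FLOWS)
		return -1;
	ft->keys[ft->n++] = *k;
	return 0;
}

/* Decide which path a packet with this key takes. */
static enum toy_path toy_classify(const struct toy_flowtable *ft,
				  const struct toy_flow_key *k)
{
	return toy_flow_lookup(ft, k) ? TOY_PATH_FAST : TOY_PATH_STANDARD;
}
```

The first packet of a flow classifies as standard; once toy_flow_add() has run for it, later packets with the same key classify as fast.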
The fast forwarding path is enabled through explicit user policy, so the
user needs to request this behaviour from the control plane. The following
example shows how to place flows in the new fast forwarding path from
the netfilter forward chain:
table x {
	flowtable f {
		hook early_ingress priority 0; devices = { eth0, eth1 }
	}
	chain y {
		type filter hook forward priority 0;
		ip protocol tcp flow offload @f
	}
}
The example above defines a fastpath for TCP flows that are placed in
the flowtable 'f'; this flowtable is hooked at the new early_ingress
hook. The initial TCP packets that match this rule on the standard
forwarding path create an entry in the flowtable. From then on, GRO
builds a chain of the packets that find an entry in the flowtable and
sends them through the neighbour layer.
This new hook runs before the ingress taps, therefore packets
that follow this new fast forwarding path are not shown by tcpdump.
This patchset supports both layer 3 IPv4 and IPv6, and layer 4 TCP and
UDP protocols. This fastpath also integrates with the IPSec
infrastructure and the ESP protocol.
We have collected performance numbers:
	TCP TSO			TCP Fast Forward
	32.5 Gbps		35.6 Gbps
	UDP			UDP Fast Forward
	17.6 Gbps		35.6 Gbps
	ESP			ESP Fast Forward
	6 Gbps			7.5 Gbps
For UDP, this doubles performance, and we almost achieve line rate
with one single CPU using the Intel i40e NIC. We got similar numbers
with the Mellanox ConnectX-4. For TCP, this slightly improves things
even if TSO is being defeated, given that we need to segment the packet
chain in software. We would like to explore HW GRO support with hardware
vendors with this new mode; we think that should improve the TCP numbers
we are showing above even more. For ESP traffic, the performance improvement
is ~25%; in this case, perf shows the bottleneck becomes the crypto layer.
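For reference, the relative gains follow directly from the table above; a small helper (hypothetical name, plain arithmetic) to reproduce them:

```c
/* Relative improvement of fast over base, in percent. */
static double gain_percent(double base, double fast)
{
	return (fast - base) / base * 100.0;
}
```

Fed with the numbers above, this gives roughly 9.5% for TCP (32.5 to 35.6 Gbps), roughly 102% for UDP (17.6 to 35.6 Gbps) and 25% for ESP (6 to 7.5 Gbps).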
This patchset is co-authored work with Steffen Klassert.
Comments are welcome, thanks.
Pablo Neira Ayuso (6):
netfilter: nft_chain_filter: add support for early ingress
netfilter: nf_flow_table: add hooknum to flowtable type
netfilter: nf_flow_table: add flowtable for early ingress hook
netfilter: nft_flow_offload: enable offload after second packet is seen
netfilter: nft_flow_offload: remove secpath check
netfilter: nft_flow_offload: make sure route is not stale
Steffen Klassert (7):
net: Add a helper to get the packet offload callbacks by priority.
net: Change priority of ipv4 and ipv6 packet offloads.
net: Add a GSO feature bit for the netfilter forward fastpath.
net: Use one bit of NAPI_GRO_CB for the netfilter fastpath.
netfilter: add early ingress hook for IPv4
netfilter: add early ingress support for IPv6
netfilter: add ESP support for early ingress
include/linux/netdev_features.h | 4 +-
include/linux/netdevice.h | 6 +-
include/linux/netfilter.h | 6 +
include/linux/netfilter_ingress.h | 1 +
include/linux/skbuff.h | 2 +
include/net/netfilter/early_ingress.h | 24 +++
include/net/netfilter/nf_flow_table.h | 4 +
include/uapi/linux/netfilter.h | 1 +
net/core/dev.c | 50 ++++-
net/ipv4/af_inet.c | 1 +
net/ipv4/netfilter/Makefile | 1 +
net/ipv4/netfilter/early_ingress.c | 327 +++++++++++++++++++++++++++++
net/ipv4/netfilter/nf_flow_table_ipv4.c | 12 ++
net/ipv6/ip6_offload.c | 1 +
net/ipv6/netfilter/Makefile | 1 +
net/ipv6/netfilter/early_ingress.c | 315 ++++++++++++++++++++++++++++
net/ipv6/netfilter/nf_flow_table_ipv6.c | 1 +
net/netfilter/Kconfig | 8 +
net/netfilter/Makefile | 1 +
net/netfilter/core.c | 35 +++-
net/netfilter/early_ingress.c | 361 ++++++++++++++++++++++++++++++++
net/netfilter/nf_flow_table_inet.c | 1 +
net/netfilter/nf_flow_table_ip.c | 72 +++++++
net/netfilter/nf_tables_api.c | 120 ++++++-----
net/netfilter/nft_chain_filter.c | 6 +-
net/netfilter/nft_flow_offload.c | 13 +-
net/xfrm/xfrm_output.c | 4 +
27 files changed, 1297 insertions(+), 81 deletions(-)
create mode 100644 include/net/netfilter/early_ingress.h
create mode 100644 net/ipv4/netfilter/early_ingress.c
create mode 100644 net/ipv6/netfilter/early_ingress.c
create mode 100644 net/netfilter/early_ingress.c
--
2.11.0
^ permalink raw reply
* Re: FW: [PATCH 2/2] ath10k: allow ATH10K_SNOC with COMPILE_TEST
From: Kalle Valo @ 2018-06-14 14:09 UTC (permalink / raw)
To: Niklas Cassel
Cc: Govind Singh, bjorn.andersson, davem, netdev, linux-wireless,
linux-kernel, ath10k
In-Reply-To: <20180613132819.GA12603@centauri.ideon.se>
Niklas Cassel <niklas.cassel@linaro.org> writes:
> On Tue, Jun 12, 2018 at 02:44:03PM +0200, Niklas Cassel wrote:
>> On Tue, Jun 12, 2018 at 06:02:48PM +0530, Govind Singh wrote:
>> > On 2018-06-12 17:45, Govind Singh wrote:
>> > >
>> > > ATH10K_SNOC builds just fine with COMPILE_TEST, so make that possible.
>> > >
>> > > Signed-off-by: Niklas Cassel <niklas.cassel@linaro.org>
>> > > ---
>> > > drivers/net/wireless/ath/ath10k/Kconfig | 3 ++-
>> > > 1 file changed, 2 insertions(+), 1 deletion(-)
>> > >
>> > > diff --git a/drivers/net/wireless/ath/ath10k/Kconfig
>> > > b/drivers/net/wireless/ath/ath10k/Kconfig
>> > > index 54ff5930126c..6572a43590a8 100644
>> > > --- a/drivers/net/wireless/ath/ath10k/Kconfig
>> > > +++ b/drivers/net/wireless/ath/ath10k/Kconfig
>> > > @@ -42,7 +42,8 @@ config ATH10K_USB
>> > >
>> > > config ATH10K_SNOC
>> > > tristate "Qualcomm ath10k SNOC support (EXPERIMENTAL)"
>> > > - depends on ATH10K && ARCH_QCOM
>> > > + depends on ATH10K
>> > > + depends on ARCH_QCOM || COMPILE_TEST
>> > > ---help---
>> > > This module adds support for integrated WCN3990 chip connected
>> > > to system NOC(SNOC). Currently work in progress and will not
>> >
>> > Thanks Niklas for enabling COMPILE_TEST. With QMI set of
>> > changes(https://patchwork.kernel.org/patch/10448183/), we need to enable
>> > COMPILE_TEST for
>> > QCOM_SCM/QMI_HELPERS which seems broken today. Are you planning to fix the
>> > same.
>
> This patch is good as is.
>
> However, Govind's QMI patch set together with this patch
> resulted in build errors.
>
> FTR, these are fixed by:
> https://marc.info/?l=linux-kernel&m=152880985402356
> https://marc.info/?l=linux-kernel&m=152889452326350
So the problem is that if I apply this patch I can't apply Govind's QMI
patchset (due to the build problems) until Niklas' fixes to qcom and
rpmsg subsystems propagate back to my tree and that might take weeks, or
even months. But I really would like to apply the QMI patchset ASAP so
that we can complete the wcn3990 support and not unnecessarily delay it.
So what I propose is that I put this patch 2 as 'Awaiting Upstream' in
patchwork and apply it once Niklas' patches get to my tree. Does that
sound good?
--
Kalle Valo
^ permalink raw reply