* [PATCH net-next] sctp: define sctp_packet_gso_append to build GSO frames
From: Xin Long @ 2018-06-13 23:37 UTC
To: network dev, linux-sctp
Cc: Marcelo Ricardo Leitner, Neil Horman, davem, eric.dumazet
Currently sctp GSO uses skb_gro_receive() to append the data to the
head skb's frag_list. However, it only needs a small part of the code
in skb_gro_receive(). Besides, NAPI_GRO_CB has to be set up even though
most of its members are not needed here.
This patch adds sctp_packet_gso_append() to build GSO frames in place
of skb_gro_receive(), which avoids many unnecessary checks and makes
the code clearer.
Note that a later patch will make sctp build GSO frames with page frags
instead of frag_list. That may take time, though, as sctp's GSO frames
may consist of segments of different sizes, and skb_segment() can only
split a frame into frags of the same size, which would break the
boundaries of sctp chunks.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
include/net/sctp/structs.h | 5 +++++
net/sctp/output.c | 28 ++++++++++++++++++----------
2 files changed, 23 insertions(+), 10 deletions(-)
diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index ebf809e..dbe1b91 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -1133,6 +1133,11 @@ struct sctp_input_cb {
};
#define SCTP_INPUT_CB(__skb) ((struct sctp_input_cb *)&((__skb)->cb[0]))
+struct sctp_output_cb {
+ struct sk_buff *last;
+};
+#define SCTP_OUTPUT_CB(__skb) ((struct sctp_output_cb *)&((__skb)->cb[0]))
+
static inline const struct sk_buff *sctp_gso_headskb(const struct sk_buff *skb)
{
const struct sctp_chunk *chunk = SCTP_INPUT_CB(skb)->chunk;
diff --git a/net/sctp/output.c b/net/sctp/output.c
index e672dee..7f849b0 100644
--- a/net/sctp/output.c
+++ b/net/sctp/output.c
@@ -409,6 +409,21 @@ static void sctp_packet_set_owner_w(struct sk_buff *skb, struct sock *sk)
refcount_inc(&sk->sk_wmem_alloc);
}
+static void sctp_packet_gso_append(struct sk_buff *head, struct sk_buff *skb)
+{
+ if (SCTP_OUTPUT_CB(head)->last == head)
+ skb_shinfo(head)->frag_list = skb;
+ else
+ SCTP_OUTPUT_CB(head)->last->next = skb;
+ SCTP_OUTPUT_CB(head)->last = skb;
+
+ head->truesize += skb->truesize;
+ head->data_len += skb->len;
+ head->len += skb->len;
+
+ __skb_header_release(skb);
+}
+
static int sctp_packet_pack(struct sctp_packet *packet,
struct sk_buff *head, int gso, gfp_t gfp)
{
@@ -422,7 +437,7 @@ static int sctp_packet_pack(struct sctp_packet *packet,
if (gso) {
skb_shinfo(head)->gso_type = sk->sk_gso_type;
- NAPI_GRO_CB(head)->last = head;
+ SCTP_OUTPUT_CB(head)->last = head;
} else {
nskb = head;
pkt_size = packet->size;
@@ -503,15 +518,8 @@ static int sctp_packet_pack(struct sctp_packet *packet,
&packet->chunk_list);
}
- if (gso) {
- if (skb_gro_receive(&head, nskb)) {
- kfree_skb(nskb);
- return 0;
- }
- if (WARN_ON_ONCE(skb_shinfo(head)->gso_segs >=
- sk->sk_gso_max_segs))
- return 0;
- }
+ if (gso)
+ sctp_packet_gso_append(head, nskb);
pkt_count++;
} while (!list_empty(&packet->chunk_list));
--
2.1.0
* Re: [PATCH bpf-next v5 00/10] BTF: BPF Type Format
From: Martin KaFai Lau @ 2018-06-13 23:26 UTC
To: Arnaldo Carvalho de Melo
Cc: netdev, Alexei Starovoitov, Daniel Borkmann, kernel-team,
Wang Nan, Jiri Olsa, Namhyung Kim, Ingo Molnar
In-Reply-To: <20180612204126.GC22039@kernel.org>
On Tue, Jun 12, 2018 at 05:41:26PM -0300, Arnaldo Carvalho de Melo wrote:
> Em Tue, Jun 12, 2018 at 05:31:24PM -0300, Arnaldo Carvalho de Melo escreveu:
> > Em Thu, Jun 07, 2018 at 01:07:01PM -0700, Martin KaFai Lau escreveu:
> > > On Thu, Jun 07, 2018 at 04:30:29PM -0300, Arnaldo Carvalho de Melo wrote:
> > > > So this must be available in a newer llvm version? Which one?
> >
> > > I should have put in the details in my last email or
> > > in the commit message, my bad.
> >
> > > 1. The tools/testing/selftests/bpf/Makefile has the CLANG_FLAGS and
> > > LLC_FLAGS needed to compile the bpf prog. It requires a new
> > > "-mattr=dwarf" llc option which was added to the future
> > > llvm 7.0.
>
> > [root@jouet bpf]# pahole hello.o
> > struct clang version 5.0.1 (tags/RELEASE_501/final) {
> > clang version 5.0.1 (tags/RELEASE_501/final) clang version 5.0.1 (tags/RELEASE_501/final); /* 0 4 */
> > clang version 5.0.1 (tags/RELEASE_501/final) clang version 5.0.1 (tags/RELEASE_501/final); /* 4 4 */
> > clang version 5.0.1 (tags/RELEASE_501/final) clang version 5.0.1 (tags/RELEASE_501/final); /* 8 4 */
> > clang version 5.0.1 (tags/RELEASE_501/final) clang version 5.0.1 (tags/RELEASE_501/final); /* 12 4 */
> >
> > /* size: 16, cachelines: 1, members: 4 */
> > /* last cacheline: 16 bytes */
> > };
> > [root@jouet bpf]#
> >
> > Ok, I guess I saw this case in the llvm/clang git logs, so this one was
> > generated with the older clang, will regenerate and add that "-mattr=dwarf"
> > part.
>
> [root@jouet bpf]# pahole hello.o
> struct clang version 7.0.0 (http://llvm.org/git/clang.git 8c873daccce7ee5339b9fd82c81fe02b73543b65) (http://llvm.org/git/llvm.git 98c78e82f54be8fb0bb5f02e3ca674fbde10ef34) {
> clang version 7.0.0 (http://llvm.org/git/clang.git 8c873daccce7ee5339b9fd82c81fe02b73543b65) (http://llvm.org/git/llvm.git 98c78 clang version 7.0.0 (http://llvm.org/git/clang.git 8c873daccce7ee5339b9fd82c81fe02b73543b65) (http://llvm.org/git/llvm.git 98c78e82f54be8fb0bb5f02e3ca674fbde10ef34); /* 0 4 */
> clang version 7.0.0 (http://llvm.org/git/clang.git 8c873daccce7ee5339b9fd82c81fe02b73543b65) (http://llvm.org/git/llvm.git 98c78 clang version 7.0.0 (http://llvm.org/git/clang.git 8c873daccce7ee5339b9fd82c81fe02b73543b65) (http://llvm.org/git/llvm.git 98c78e82f54be8fb0bb5f02e3ca674fbde10ef34); /* 4 4 */
> clang version 7.0.0 (http://llvm.org/git/clang.git 8c873daccce7ee5339b9fd82c81fe02b73543b65) (http://llvm.org/git/llvm.git 98c78 clang version 7.0.0 (http://llvm.org/git/clang.git 8c873daccce7ee5339b9fd82c81fe02b73543b65) (http://llvm.org/git/llvm.git 98c78e82f54be8fb0bb5f02e3ca674fbde10ef34); /* 8 4 */
> clang version 7.0.0 (http://llvm.org/git/clang.git 8c873daccce7ee5339b9fd82c81fe02b73543b65) (http://llvm.org/git/llvm.git 98c78 clang version 7.0.0 (http://llvm.org/git/clang.git 8c873daccce7ee5339b9fd82c81fe02b73543b65) (http://llvm.org/git/llvm.git 98c78e82f54be8fb0bb5f02e3ca674fbde10ef34); /* 12 4 */
>
> /* size: 16, cachelines: 1, members: 4 */
> /* last cacheline: 16 bytes */
> };
That means the "-mattr=dwarf" is not effective.
Can you share your clang and llc command to create hello.o?
> [root@jouet bpf]#
>
> Ideas?
>
> [root@jouet bpf]# trace -e open*,hello.c
> clang-6.0: error: unknown argument: '-mattr=dwarf'
> ERROR: unable to compile hello.c
> Hint: Check error message shown above.
> Hint: You can also pre-compile it into .o using:
> clang -target bpf -O2 -c hello.c
> with proper -I and -D options.
> event syntax error: 'hello.c'
> \___ Failed to load hello.c from source: Error when compiling BPF scriptlet
>
> (add -v to see detail)
> Run 'perf list' for a list of valid events
>
> Usage: perf trace [<options>] [<command>]
> or: perf trace [<options>] -- <command> [<options>]
> or: perf trace record [<options>] [<command>]
> or: perf trace record [<options>] -- <command> [<options>]
>
> -e, --event <event> event/syscall selector. use 'perf list' to list available events
> [root@jouet bpf]#
>
> [root@jouet bpf]# trace -v -e open*,hello.c
> bpf: builtin compilation failed: -95, try external compiler
> Kernel build dir is set to /lib/modules/4.17.0-rc5/build
> set env: KBUILD_DIR=/lib/modules/4.17.0-rc5/build
> unset env: KBUILD_OPTS
> include option is set to -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/7/include -I/home/acme/git/linux/arch/x86/include -I./arch/x86/include/generated -I/home/acme/git/linux/include -I./include -I/home/acme/git/linux/arch/x86/include/uapi -I./arch/x86/include/generated/uapi -I/home/acme/git/linux/include/uapi -I./include/generated/uapi -include /home/acme/git/linux/include/linux/kconfig.h
> set env: NR_CPUS=4
> set env: LINUX_VERSION_CODE=0x41100
> set env: CLANG_EXEC=/usr/local/bin/clang
> set env: CLANG_OPTIONS=-g -mattr=dwarf
> set env: KERNEL_INC_OPTIONS= -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/7/include -I/home/acme/git/linux/arch/x86/include -I./arch/x86/include/generated -I/home/acme/git/linux/include -I./include -I/home/acme/git/linux/arch/x86/include/uapi -I./arch/x86/include/generated/uapi -I/home/acme/git/linux/include/uapi -I./include/generated/uapi -include /home/acme/git/linux/include/linux/kconfig.h
> set env: PERF_BPF_INC_OPTIONS=-I/home/acme/lib/include/perf/bpf
> set env: WORKING_DIR=/lib/modules/4.17.0-rc5/build
> set env: CLANG_SOURCE=/home/acme/bpf/hello.c
> llvm compiling command template: $CLANG_EXEC -D__KERNEL__ -D__NR_CPUS__=$NR_CPUS -DLINUX_VERSION_CODE=$LINUX_VERSION_CODE $CLANG_OPTIONS $KERNEL_INC_OPTIONS $PERF_BPF_INC_OPTIONS -Wno-unused-value -Wno-pointer-sign -working-directory $WORKING_DIR -c "$CLANG_SOURCE" -target bpf -O2 -o -
> llvm compiling command : /usr/local/bin/clang -D__KERNEL__ -D__NR_CPUS__=4 -DLINUX_VERSION_CODE=0x41100 -g -mattr=dwarf -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/7/include -I/home/acme/git/linux/arch/x86/include -I./arch/x86/include/generated -I/home/acme/git/linux/include -I./include -I/home/acme/git/linux/arch/x86/include/uapi -I./arch/x86/include/generated/uapi -I/home/acme/git/linux/include/uapi -I./include/generated/uapi -include /home/acme/git/linux/include/linux/kconfig.h -I/home/acme/lib/include/perf/bpf -Wno-unused-value -Wno-pointer-sign -working-directory /lib/modules/4.17.0-rc5/build -c /home/acme/bpf/hello.c -target bpf -O2 -o -
> clang-6.0: error: unknown argument: '-mattr=dwarf'
> ERROR: unable to compile hello.c
> Hint: Check error message shown above.
> Hint: You can also pre-compile it into .o using:
> clang -target bpf -O2 -c hello.c
> with proper -I and -D options.
> event syntax error: 'hello.c'
> \___ Failed to load hello.c from source: Error when compiling BPF scriptlet
>
* Re: KASAN: slab-out-of-bounds Read in bpf_skb_change_head
From: Daniel Borkmann @ 2018-06-13 23:01 UTC
To: syzbot, ast, davem, linux-kernel, netdev, syzkaller-bugs
In-Reply-To: <0000000000009920d3056e8d57a9@google.com>
On 06/14/2018 12:17 AM, syzbot wrote:
> Hello,
>
> syzbot found the following crash on:
>
> HEAD commit: 75d4e704fa8d netdev-FAQ: clarify DaveM's position for stab..
> git tree: bpf-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=16bd21af800000
> kernel config: https://syzkaller.appspot.com/x/.config?x=a601a80fec461d44
> dashboard link: https://syzkaller.appspot.com/bug?extid=567faa843005dda30737
> compiler: gcc (GCC) 8.0.1 20180413 (experimental)
> syzkaller repro:https://syzkaller.appspot.com/x/repro.syz?x=1039185f800000
> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=11e85cff800000
>
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+567faa843005dda30737@syzkaller.appspotmail.com
#syz fix: bpf: reject passing modified ctx to helper functions
* Re: [RFC PATCH 06/12] xen-blkfront: add callbacks for PM suspend and hibernation
From: Anchal Agarwal @ 2018-06-13 22:20 UTC
To: Roger Pau Monné
Cc: jgross, len.brown, kamatam, eduval, vallish, netdev, fllinden,
x86, rjw, linux-kernel, anchalag, cyberax, mingo, pavel, hpa,
linux-pm, xen-devel, boris.ostrovsky, guruanb, tglx
In-Reply-To: <20180613082428.bjdko4k6cnq6eid3@mac>
Hi Roger,
To answer your question: older dom0 kernels (<3.2) lack the commit
mentioned above (commit 12ea729645ac ("xen/blkback: unmap all persistent
grants when frontend gets disconnected")), so resume from hibernation
can fail on the guest side. Without that commit, persistent grants are
not unmapped immediately when the frontend is disconnected from the
backend, which leaves the block device in an inconsistent state. This
approach was chosen to avoid that instability and to work with a larger
set of kernel versions: once there are no pending requests/responses,
it is safer for the guest to resume from hibernation.
Thanks,
Anchal
On Wed, Jun 13, 2018 at 10:24:28AM +0200, Roger Pau Monné wrote:
> On Tue, Jun 12, 2018 at 08:56:13PM +0000, Anchal Agarwal wrote:
> > From: Munehisa Kamata <kamatam@amazon.com>
> >
> > Add freeze and restore callbacks for PM suspend and hibernation support.
> > The freeze handler stops a block-layer queue and disconnects the frontend
> > from the backend while freeing ring_info and associated resources. The
> > restore handler re-allocates ring_info and reconnects to the backend,
> > so the rest of the kernel can continue to use the block device
> > transparently. Also, the handlers are used for both PM
> > suspend and hibernation so that we can keep the existing suspend/resume
> > callbacks for Xen suspend without modification.
> > If a backend doesn't have commit 12ea729645ac ("xen/blkback: unmap all
> > persistent grants when frontend gets disconnected"), the frontend may see
> > a massive number of grant table warnings when freeing resources.
> >
> > [ 36.852659] deferring g.e. 0xf9 (pfn 0xffffffffffffffff)
> > [ 36.855089] xen:grant_table: WARNING: g.e. 0x112 still in use!
> >
> > In this case, persistent grants would need to be disabled.
> >
> > Ensure no reqs/rsps in rings before disconnecting. When disconnecting
> > the frontend from the backend in blkfront_freeze(), there still may be
> > unconsumed requests or responses in the rings, especially when the
> > backend is backed by a network-based device. If the frontend gets
> > disconnected with such reqs/rsps remaining there, it can cause
> > grant warnings and/or loss of reqs/rsps by freeing pages afterward.
>
> I'm not sure why having pending requests can cause grant warnings or
> loss of requests. If handled properly this shouldn't be an issue.
> Linux blkfront already does live migration (which also involves a
> reconnection of the frontend) with pending requests and that doesn't
> seem to be an issue.
>
> > This can lead the resumed kernel into an unrecoverable state, such as
> > unexpected freeing of a grant page and/or a hung task due to the lost
> > reqs or rsps. Therefore we have to ensure that there are no unconsumed
> > requests or responses before disconnecting.
>
> Given that we have multiqueue, plus multipage rings, I'm not sure
> waiting for the requests on the rings to complete is a good idea.
>
> Why can't you just disconnect the frontend and requeue all the
> requests in flight? When the frontend connects on resume those
> requests will be queued again.
>
> Thanks, Roger.
>
* Re: [PATCH v2] hv_netvsc: Add per-cpu ethtool stats for netvsc
From: Stephen Hemminger @ 2018-06-13 22:18 UTC
To: Yidong Ren
Cc: Yidong Ren, Stephen Hemminger, netdev@vger.kernel.org,
Haiyang Zhang, linux-kernel@vger.kernel.org,
devel@linuxdriverproject.org, David S. Miller
In-Reply-To: <SN6PR2101MB11353820325BBF4712076573847E0@SN6PR2101MB1135.namprd21.prod.outlook.com>
On Wed, 13 Jun 2018 22:03:34 +0000
Yidong Ren <Yidong.Ren@microsoft.com> wrote:
> > From: devel <driverdev-devel-bounces@linuxdriverproject.org> On Behalf Of Stephen Hemminger
> > > +/* statistics per queue (rx/tx packets/bytes) */
> > > +#define NETVSC_PCPU_STATS_LEN (num_present_cpus() * ARRAY_SIZE(pcpu_stats))
> >
> > Even though Hyper-V/Azure does not support hot plug cpu's it might be
> > better to num_cpu_possible to avoid any possible future surprises.
>
> That will create a very long output (num_cpu_possible = 128 on my machine) for ethtool,
> While doesn't provide additional info.
> num_present_cpus() would cause problem only if someone removed cpu
> between netvsc_get_sset_count() and netvsc_get_strings() and netvsc_get_ethtool_stats().
>
> An alternative way could be: Check all stats, and only output if not zero.
> This need to be done in two pass. First pass to get the correct count, second pass to print the number.
> Is there an elegant way to do this?
Ok, but there is a race between getting names and getting statistics.
If a cpu was added/removed then statistics would not match.
* KASAN: slab-out-of-bounds Read in bpf_skb_change_head
From: syzbot @ 2018-06-13 22:17 UTC
To: ast, daniel, davem, linux-kernel, netdev, syzkaller-bugs
Hello,
syzbot found the following crash on:
HEAD commit: 75d4e704fa8d netdev-FAQ: clarify DaveM's position for stab..
git tree: bpf-next
console output: https://syzkaller.appspot.com/x/log.txt?x=16bd21af800000
kernel config: https://syzkaller.appspot.com/x/.config?x=a601a80fec461d44
dashboard link: https://syzkaller.appspot.com/bug?extid=567faa843005dda30737
compiler: gcc (GCC) 8.0.1 20180413 (experimental)
syzkaller repro:https://syzkaller.appspot.com/x/repro.syz?x=1039185f800000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=11e85cff800000
IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+567faa843005dda30737@syzkaller.appspotmail.com
random: sshd: uninitialized urandom read (32 bytes read)
random: sshd: uninitialized urandom read (32 bytes read)
random: sshd: uninitialized urandom read (32 bytes read)
random: sshd: uninitialized urandom read (32 bytes read)
==================================================================
BUG: KASAN: slab-out-of-bounds in ____bpf_skb_change_head
net/core/filter.c:2921 [inline]
BUG: KASAN: slab-out-of-bounds in bpf_skb_change_head+0x80c/0x9d0
net/core/filter.c:2917
Read of size 4 at addr ffff8801d94ea680 by task syz-executor991/4551
CPU: 0 PID: 4551 Comm: syz-executor991 Not tainted 4.17.0-rc7+ #38
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x1b9/0x294 lib/dump_stack.c:113
print_address_description+0x6c/0x20b mm/kasan/report.c:256
kasan_report_error mm/kasan/report.c:354 [inline]
kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412
__asan_report_load4_noabort+0x14/0x20 mm/kasan/report.c:432
____bpf_skb_change_head net/core/filter.c:2921 [inline]
bpf_skb_change_head+0x80c/0x9d0 net/core/filter.c:2917
Allocated by task 0:
(stack is not available)
Freed by task 0:
(stack is not available)
The buggy address belongs to the object at ffff8801d94ea580
which belongs to the cache skbuff_head_cache of size 232
The buggy address is located 24 bytes to the right of
232-byte region [ffff8801d94ea580, ffff8801d94ea668)
The buggy address belongs to the page:
page:ffffea0007653a80 count:1 mapcount:0 mapping:ffff8801d94ea080 index:0x0
flags: 0x2fffc0000000100(slab)
raw: 02fffc0000000100 ffff8801d94ea080 0000000000000000 000000010000000c
raw: ffffea0006b2e720 ffffea00076595e0 ffff8801d9450e40 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff8801d94ea580: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8801d94ea600: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> ffff8801d94ea680: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
^
ffff8801d94ea700: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8801d94ea780: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
==================================================================
---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.
syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with
syzbot.
syzbot can test patches for this bug, for details see:
https://goo.gl/tpsmEJ#testing-patches
* RE: [PATCH v2] hv_netvsc: Add per-cpu ethtool stats for netvsc
From: Yidong Ren @ 2018-06-13 22:03 UTC
To: Stephen Hemminger, Yidong Ren
Cc: Stephen Hemminger, netdev@vger.kernel.org, Haiyang Zhang,
linux-kernel@vger.kernel.org, devel@linuxdriverproject.org,
David S. Miller
In-Reply-To: <20180613144813.708f26bf@xeon-e3>
> From: devel <driverdev-devel-bounces@linuxdriverproject.org> On Behalf Of Stephen Hemminger
> > +/* statistics per queue (rx/tx packets/bytes) */
> > +#define NETVSC_PCPU_STATS_LEN (num_present_cpus() * ARRAY_SIZE(pcpu_stats))
>
> Even though Hyper-V/Azure does not support hot plug cpu's it might be
> better to num_cpu_possible to avoid any possible future surprises.
That would create a very long ethtool output (num_possible_cpus() = 128
on my machine) without providing any additional info.
num_present_cpus() would cause a problem only if someone removed a CPU
between netvsc_get_sset_count(), netvsc_get_strings() and
netvsc_get_ethtool_stats().
An alternative could be to check all stats and only output those that
are non-zero. This needs to be done in two passes: the first pass gets
the correct count, the second pass prints the numbers.
Is there an elegant way to do this?
* Re: [PATCH v2] hv_netvsc: Add per-cpu ethtool stats for netvsc
From: Stephen Hemminger @ 2018-06-13 21:48 UTC
To: Yidong Ren
Cc: K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger,
David S. Miller, devel, netdev, linux-kernel
In-Reply-To: <20180613193608.444-1-yidren@linuxonhyperv.com>
On Wed, 13 Jun 2018 12:36:08 -0700
Yidong Ren <yidren@linuxonhyperv.com> wrote:
> From: Yidong Ren <yidren@microsoft.com>
>
> This patch implements following ethtool stats fields for netvsc:
> cpu<n>_tx/rx_packets/bytes
> cpu<n>_vf_tx/rx_packets/bytes
>
> Corresponding per-cpu counters exist in current code. Exposing these
> counters will help troubleshooting performance issues.
>
> Signed-off-by: Yidong Ren <yidren@microsoft.com>
> ---
> Changes in v2:
> - Remove cpp style comment
> - Resubmit after freeze
>
> drivers/net/hyperv/hyperv_net.h | 11 +++++
> drivers/net/hyperv/netvsc_drv.c | 104 +++++++++++++++++++++++++++++++++++++++-
> 2 files changed, 113 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
> index 23304ac..c825353 100644
> --- a/drivers/net/hyperv/hyperv_net.h
> +++ b/drivers/net/hyperv/hyperv_net.h
> @@ -873,6 +873,17 @@ struct netvsc_ethtool_stats {
> unsigned long wake_queue;
> };
>
> +struct netvsc_ethtool_pcpu_stats {
> + u64 rx_packets;
> + u64 rx_bytes;
> + u64 tx_packets;
> + u64 tx_bytes;
> + u64 vf_rx_packets;
> + u64 vf_rx_bytes;
> + u64 vf_tx_packets;
> + u64 vf_tx_bytes;
> +};
> +
> struct netvsc_vf_pcpu_stats {
> u64 rx_packets;
> u64 rx_bytes;
> diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
> index 7b18a8c..6803aae 100644
> --- a/drivers/net/hyperv/netvsc_drv.c
> +++ b/drivers/net/hyperv/netvsc_drv.c
> @@ -1105,6 +1105,66 @@ static void netvsc_get_vf_stats(struct net_device *net,
> }
> }
>
> +static void netvsc_get_pcpu_stats(struct net_device *net,
> + struct netvsc_ethtool_pcpu_stats
> + __percpu *pcpu_tot)
> +{
> + struct net_device_context *ndev_ctx = netdev_priv(net);
> + struct netvsc_device *nvdev = rcu_dereference_rtnl(ndev_ctx->nvdev);
> + int i;
> +
> + /* fetch percpu stats of vf */
> + for_each_possible_cpu(i) {
> + const struct netvsc_vf_pcpu_stats *stats =
> + per_cpu_ptr(ndev_ctx->vf_stats, i);
> + struct netvsc_ethtool_pcpu_stats *this_tot =
> + per_cpu_ptr(pcpu_tot, i);
> + unsigned int start;
> +
> + do {
> + start = u64_stats_fetch_begin_irq(&stats->syncp);
> + this_tot->vf_rx_packets = stats->rx_packets;
> + this_tot->vf_tx_packets = stats->tx_packets;
> + this_tot->vf_rx_bytes = stats->rx_bytes;
> + this_tot->vf_tx_bytes = stats->tx_bytes;
> + } while (u64_stats_fetch_retry_irq(&stats->syncp, start));
> + this_tot->rx_packets = this_tot->vf_rx_packets;
> + this_tot->tx_packets = this_tot->vf_tx_packets;
> + this_tot->rx_bytes = this_tot->vf_rx_bytes;
> + this_tot->tx_bytes = this_tot->vf_tx_bytes;
> + }
> +
> + /* fetch percpu stats of netvsc */
> + for (i = 0; i < nvdev->num_chn; i++) {
> + const struct netvsc_channel *nvchan = &nvdev->chan_table[i];
> + const struct netvsc_stats *stats;
> + struct netvsc_ethtool_pcpu_stats *this_tot =
> + per_cpu_ptr(pcpu_tot, nvchan->channel->target_cpu);
> + u64 packets, bytes;
> + unsigned int start;
> +
> + stats = &nvchan->tx_stats;
> + do {
> + start = u64_stats_fetch_begin_irq(&stats->syncp);
> + packets = stats->packets;
> + bytes = stats->bytes;
> + } while (u64_stats_fetch_retry_irq(&stats->syncp, start));
> +
> + this_tot->tx_bytes += bytes;
> + this_tot->tx_packets += packets;
> +
> + stats = &nvchan->rx_stats;
> + do {
> + start = u64_stats_fetch_begin_irq(&stats->syncp);
> + packets = stats->packets;
> + bytes = stats->bytes;
> + } while (u64_stats_fetch_retry_irq(&stats->syncp, start));
> +
> + this_tot->rx_bytes += bytes;
> + this_tot->rx_packets += packets;
> + }
> +}
> +
> static void netvsc_get_stats64(struct net_device *net,
> struct rtnl_link_stats64 *t)
> {
> @@ -1202,6 +1262,23 @@ static const struct {
> { "rx_no_memory", offsetof(struct netvsc_ethtool_stats, rx_no_memory) },
> { "stop_queue", offsetof(struct netvsc_ethtool_stats, stop_queue) },
> { "wake_queue", offsetof(struct netvsc_ethtool_stats, wake_queue) },
> +}, pcpu_stats[] = {
> + { "cpu%u_rx_packets",
> + offsetof(struct netvsc_ethtool_pcpu_stats, rx_packets) },
> + { "cpu%u_rx_bytes",
> + offsetof(struct netvsc_ethtool_pcpu_stats, rx_bytes) },
> + { "cpu%u_tx_packets",
> + offsetof(struct netvsc_ethtool_pcpu_stats, tx_packets) },
> + { "cpu%u_tx_bytes",
> + offsetof(struct netvsc_ethtool_pcpu_stats, tx_bytes) },
> + { "cpu%u_vf_rx_packets",
> + offsetof(struct netvsc_ethtool_pcpu_stats, vf_rx_packets) },
> + { "cpu%u_vf_rx_bytes",
> + offsetof(struct netvsc_ethtool_pcpu_stats, vf_rx_bytes) },
> + { "cpu%u_vf_tx_packets",
> + offsetof(struct netvsc_ethtool_pcpu_stats, vf_tx_packets) },
> + { "cpu%u_vf_tx_bytes",
> + offsetof(struct netvsc_ethtool_pcpu_stats, vf_tx_bytes) },
> }, vf_stats[] = {
> { "vf_rx_packets", offsetof(struct netvsc_vf_pcpu_stats, rx_packets) },
> { "vf_rx_bytes", offsetof(struct netvsc_vf_pcpu_stats, rx_bytes) },
> @@ -1213,6 +1290,9 @@ static const struct {
> #define NETVSC_GLOBAL_STATS_LEN ARRAY_SIZE(netvsc_stats)
> #define NETVSC_VF_STATS_LEN ARRAY_SIZE(vf_stats)
>
> +/* statistics per queue (rx/tx packets/bytes) */
> +#define NETVSC_PCPU_STATS_LEN (num_present_cpus() * ARRAY_SIZE(pcpu_stats))
Even though Hyper-V/Azure does not support hot-plug CPUs, it might be
better to use num_possible_cpus() to avoid any possible future surprises.
* More manual-page fixups.
From: Eric S. Raymond @ 2018-06-13 21:31 UTC
To: netdev
[-- Attachment #1: Type: text/plain, Size: 772 bytes --]
John Linville asked me to ship the ethtool.8 patch to this list.
That's the first 0001 patch in the enclosures and should be applied
to the ethtool repo.
The others are more syntax fixups for the iproute2 repo. Some
are things like list syntax errors that you don't notice when
rendering via groff because it has no error checking. Some others
unroll the pseudo-BNF used in some synopses to the standard form
described by DocBook synopsis markup.
There's more work to be done on this. Expect more patches, now
that I know where to send them.
--
<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>
My work is funded by the Internet Civil Engineering Institute: https://icei.org
Please visit their site and donate: the civilization you save might be your own.
[-- Attachment #2: 0001-In-ethtool.8-remove-superfluous-and-incorrect-c.patch --]
[-- Type: text/x-diff, Size: 724 bytes --]
>From 9496a4d44cbc569e15bf656e30bfd2fc0dfc64ce Mon Sep 17 00:00:00 2001
From: "Eric S. Raymond" <esr@thyrsus.com>
Date: Wed, 13 Jun 2018 17:16:21 -0400
Subject: [PATCH] In ethtool.8, remove superfluous and incorrect \c.
Signed-off-by: Eric S. Raymond <esr@thyrsus.com>
---
ethtool.8.in | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/ethtool.8.in b/ethtool.8.in
index 04c53c9..57b2893 100644
--- a/ethtool.8.in
+++ b/ethtool.8.in
@@ -684,7 +684,7 @@ Disable (wake on nothing). This option clears all previous options.
T}
.TE
.TP
-.B sopass \*(MA\c
+.B sopass \*(MA
Sets the SecureOn\[tm] password. The argument to this option must be 6
bytes in Ethernet MAC hex format (\*(MA).
.PP
--
2.17.1
[-- Attachment #3: 0001-Clean-up-markup-in-tc-bpf.8.patch --]
[-- Type: text/x-diff, Size: 15668 bytes --]
From 15472b7be6c619e5eb35037da70b35d066544838 Mon Sep 17 00:00:00 2001
From: "Eric S. Raymond" <esr@thyrsus.com>
Date: Tue, 12 Jun 2018 22:32:24 -0400
Subject: [PATCH 1/5] Clean up markup in tc-bpf.8
This page had multiple issues:
1. .in +4/.nf....fi/.in was used where .EX/.EE was called for.
2. .SS and running text shouldn't have been used in Synopsis section.
Inline text has been moved to PARAMETERS.
3. Under PARAMETERS, adjacent .SS tags could not be lifted to XML.
.TP is now used in that section instead.
Signed-off-by: Eric S. Raymond <esr@thyrsus.com>
---
man/man8/tc-bpf.8 | 239 +++++++++++++++++++++++-----------------------
1 file changed, 117 insertions(+), 122 deletions(-)
diff --git a/man/man8/tc-bpf.8 b/man/man8/tc-bpf.8
index d311f295..f883a912 100644
--- a/man/man8/tc-bpf.8
+++ b/man/man8/tc-bpf.8
@@ -3,7 +3,6 @@
BPF \- BPF programmable classifier and actions for ingress/egress
queueing disciplines
.SH SYNOPSIS
-.SS eBPF classifier (filter) or action:
.B tc filter ... bpf
[
.B object-file
@@ -28,7 +27,7 @@ POLICE_SPEC ] [
ACTION_SPEC ] [
.B classid
CLASSID ]
-.br
+
.B tc action ... bpf
[
.B object-file
@@ -40,7 +39,6 @@ UDS_FILE ] [
.B verbose
]
-.SS cBPF classifier (filter) or action:
.B tc filter ... bpf
[
.B bytecode-file
@@ -53,7 +51,7 @@ POLICE_SPEC ] [
ACTION_SPEC ] [
.B classid
CLASSID ]
-.br
+
.B tc action ... bpf
[
.B bytecode-file
@@ -110,7 +108,9 @@ are pushed into one map and use another one for dynamically load balancing
traffic based on the determined load, just to provide a few examples.
.SH PARAMETERS
-.SS object-file
+The first pair of filter/action invocations is for eBPF, the second for cBPF.
+.TP
+object-file
points to an object file that has an executable and linkable format (ELF)
and contains eBPF opcodes and eBPF map definitions. The LLVM compiler
infrastructure with
@@ -120,16 +120,16 @@ files that can be passed to the eBPF classifier (more details in the
.B EXAMPLES
section). This option is mandatory when an eBPF classifier or action is
to be loaded.
-
-.SS section
+.TP
+section
is the name of the ELF section from the object file, where the eBPF
classifier or action resides. By default the section name for the
classifier is called "classifier", and for the action "action". Given
that a single object file can contain multiple classifier and actions,
the corresponding section name needs to be specified, if it differs
from the defaults.
-
-.SS export
+.TP
+export
points to a Unix domain socket file. In case the eBPF object file also
contains a section named "maps" with eBPF map specifications, then the
map file descriptors can be handed off via the Unix domain socket to
@@ -139,18 +139,18 @@ import, that uses them for calling into
.B bpf(2)
system call to read out or update eBPF map data from user space, for
example, for monitoring purposes or to push down new policies.
-
-.SS verbose
+.TP
+verbose
if set, it will dump the eBPF verifier output, even if loading the eBPF
program was successful. By default, only on error, the verifier log is
being emitted to the user.
-
-.SS direct-action | da
+.TP
+direct-action | da
instructs eBPF classifier to not invoke external TC actions, instead use the
TC actions return codes (\fBTC_ACT_OK\fR, \fBTC_ACT_SHOT\fR etc.) for
classifiers.
-
-.SS skip_hw | skip_sw
+.TP
+skip_hw | skip_sw
hardware offload control flags. By default TC will try to offload
filters to hardware if possible.
.B skip_hw
@@ -159,21 +159,22 @@ explicitly disables the attempt to offload.
forces the offload and disables running the eBPF program in the kernel.
If hardware offload is not possible and this flag was set kernel will
report an error and filter will not be installed at all.
-
-.SS police
+.TP
+police
is an optional parameter for an eBPF/cBPF classifier that specifies a
police in
.B tc(1)
which is attached to the classifier, for example, on an ingress qdisc.
-
-.SS action
+.TP
+action
is an optional parameter for an eBPF/cBPF classifier that specifies a
subsequent action in
.B tc(1)
which is attached to a classifier.
-
-.SS classid
-.SS flowid
+.TP
+classid
+.TP
+flowid
provides the default traffic control class identifier for this eBPF/cBPF
classifier. The default class identifier can also be overwritten by the
return code of the eBPF/cBPF program. A default return code of
@@ -184,8 +185,8 @@ a return code other than these two will override the default classid. This
allows for efficient, non-linear classification with only a single eBPF/cBPF
program as opposed to having multiple individual programs for various class
identifiers which would need to reparse packet contents.
-
-.SS bytecode
+.TP
+bytecode
is being used for loading cBPF classifier and actions only. The cBPF bytecode
is directly passed as a text string in the form of
.B \'s,c t f k,c t f k,c t f k,...\'
@@ -211,8 +212,8 @@ that ships with the Linux kernel source tree under
or
.B bytecode-file
option is mandatory when a cBPF classifier or action is to be loaded.
-
-.SS bytecode-file
+.TP
+bytecode-file
also being used to load a cBPF classifier or action. It's effectively the
same as
.B bytecode
@@ -224,7 +225,7 @@ rather resides in a text file.
A full blown example including eBPF agent code can be found inside the
iproute2 source package under:
.B examples/bpf/
-
+.sp
As prerequisites, the kernel needs to have the eBPF system call namely
.B bpf(2)
enabled and ships with
@@ -234,9 +235,11 @@ and
kernel modules for the traffic control subsystem. To enable eBPF/eBPF JIT
support, depending which of the two the given architecture supports:
-.in +4n
+.RS
+.EX
.B echo 1 > /proc/sys/net/core/bpf_jit_enable
-.in
+.EE
+.RE
A given restricted C file can be compiled via LLVM as:
@@ -247,24 +250,24 @@ A given restricted C file can be compiled via LLVM as:
The compiler invocation might still simplify in future, so for now,
it's quite handy to alias this construct in one way or another, for
example:
-.in +4n
-.nf
-.sp
+.RS
+.EX
+
__bcc() {
clang -O2 -emit-llvm -c $1 -o - | \\
llc -march=bpf -filetype=obj -o "`basename $1 .c`.o"
}
alias bcc=__bcc
-.fi
-.in
+.EE
+.RE
A minimal, stand-alone unit, which matches on all traffic with the
default classid (return code of -1) looks like:
-.in +4n
-.nf
-.sp
+.RS
+.EX
+
#include <linux/bpf.h>
#ifndef __section
@@ -277,8 +280,8 @@ __section("classifier") int cls_main(struct __sk_buff *skb)
}
char __license[] __section("license") = "GPL";
-.fi
-.in
+.EE
+.RE
More examples can be found further below in subsection
.B eBPF PROGRAMMING
@@ -299,9 +302,9 @@ example
.B objdump(1)
for inspecting ELF section headers:
-.in +4n
-.nf
-.sp
+.RS
+.EX
+
objdump -h bpf.o
[...]
3 classifier 000007f8 0000000000000000 0000000000000000 00000040 2**3
@@ -315,56 +318,56 @@ objdump -h bpf.o
7 license 00000004 0000000000000000 0000000000000000 00000988 2**0
CONTENTS, ALLOC, LOAD, DATA
[...]
-.fi
-.in
+.EE
+.RE
Adding an eBPF classifier from an object file that contains a classifier
in the default ELF section is trivial (note that instead of "object-file"
also shortcuts such as "obj" can be used):
-.in +4n
+.RS
.B bcc bpf.c
.br
.B tc filter add dev em1 parent 1: bpf obj bpf.o flowid 1:1
-.in
+.RE
In case the classifier resides in ELF section "mycls", then that same
command needs to be invoked as:
-.in +4n
+.RS
.B tc filter add dev em1 parent 1: bpf obj bpf.o sec mycls flowid 1:1
-.in
+.RE
Dumping the classifier configuration will tell the location of the
classifier, in other words that it's from object file "bpf.o" under
section "mycls":
-.in +4n
+.RS
.B tc filter show dev em1
.br
.B filter parent 1: protocol all pref 49152 bpf
.br
.B filter parent 1: protocol all pref 49152 bpf handle 0x1 flowid 1:1 bpf.o:[mycls]
-.in
+.RE
The same program can also be installed on ingress qdisc side as opposed
to egress ...
-.in +4n
+.RS
.B tc qdisc add dev em1 handle ffff: ingress
.br
.B tc filter add dev em1 parent ffff: bpf obj bpf.o sec mycls flowid ffff:1
-.in
+.RE
\&... and again dumped from there:
-.in +4n
+.RS
.B tc filter show dev em1 parent ffff:
.br
.B filter protocol all pref 49152 bpf
.br
.B filter protocol all pref 49152 bpf handle 0x1 flowid ffff:1 bpf.o:[mycls]
-.in
+.RE
Attaching a classifier and action on ingress has the restriction that
it doesn't have an actual underlying queueing discipline. What ingress
@@ -382,15 +385,13 @@ object file within various sections. In that case, non-default section
names must be provided, which is the case for both actions in this
example:
-.in +4n
-.B tc filter add dev em1 parent 1: bpf obj bpf.o flowid 1:1 \e
-.br
-.in +25n
-.B action bpf obj bpf.o sec action-mark \e
-.br
-.B action bpf obj bpf.o sec action-rand ok
-.in -25n
-.in -4n
+.RS
+.EX
+tc filter add dev em1 parent 1: bpf obj bpf.o flowid 1:1 \e
+ action bpf obj bpf.o sec action-mark \e
+ action bpf obj bpf.o sec action-rand ok
+.EE
+.RE
The advantage of this is that the classifier and the two actions can
then share eBPF maps with each other, if implemented in the programs.
@@ -421,17 +422,14 @@ this fd-owner shell, they can terminate and restart without losing eBPF
maps file descriptors. Example invocation with the previous classifier and
action mixture:
-.in +4n
-.B tc exec bpf imp /tmp/bpf
-.br
-.B tc filter add dev em1 parent 1: bpf obj bpf.o exp /tmp/bpf flowid 1:1 \e
-.br
-.in +25n
-.B action bpf obj bpf.o sec action-mark \e
-.br
-.B action bpf obj bpf.o sec action-rand ok
-.in -25n
-.in -4n
+.RS
+.EX
+tc exec bpf imp /tmp/bpf
+tc filter add dev em1 parent 1: bpf obj bpf.o exp /tmp/bpf flowid 1:1 \e
+ action bpf obj bpf.o sec action-mark \e
+ action bpf obj bpf.o sec action-rand ok
+.EE
+.RE
Assuming that eBPF maps are shared with classifier and actions, it's
enough to export them once, for example, from within the classifier
@@ -454,9 +452,8 @@ member of
The environment in this example looks as follows:
-.in +4n
-.nf
-.sp
+.RS
+.EX
sh# env | grep BPF
BPF_NUM_MAPS=3
BPF_MAP1=6
@@ -468,8 +465,8 @@ sh# ls -la /proc/self/fd
lrwx------. 1 root root 64 Apr 14 16:46 6 -> anon_inode:bpf-map
lrwx------. 1 root root 64 Apr 14 16:46 7 -> anon_inode:bpf-map
sh# my_bpf_agent
-.fi
-.in
+.EE
+.RE
eBPF agents are very useful in that they can prepopulate eBPF maps from
user space, monitor statistics via maps and based on that feedback, for
@@ -495,7 +492,7 @@ from the iproute2 source package for a fully fledged flow dissector
example to better demonstrate some of the possibilities with eBPF.
Supported 32 bit classifier return codes from the C program and their meanings:
-.in +4n
+.RS
.B 0
, denotes a mismatch
.br
@@ -505,12 +502,12 @@ Supported 32 bit classifier return codes from the C program and their meanings:
.B else
, everything else will override the default classid to provide a facility for
non-linear matching
-.in
+.RE
Supported 32 bit action return codes from the C program and their meanings (
.B linux/pkt_cls.h
):
-.in +4n
+.RS
.B TC_ACT_OK (0)
, will terminate the packet processing pipeline and allows the packet to
proceed
@@ -532,7 +529,7 @@ from the beginning
.br
.B else
, everything else is an unspecified return code
-.in
+.RE
Both classifier and action return codes are supported in eBPF and cBPF
programs.
@@ -543,9 +540,8 @@ from a container, have previously been marked in interval [0, 255]. The
program keeps statistics on different marks for user space and maps the
classid to the root qdisc with the marking itself as the minor handle:
-.in +4n
-.nf
-.sp
+.RS
+.EX
#include <stdint.h>
#include <asm/types.h>
@@ -595,17 +591,17 @@ __section("cls") int cls_main(struct __sk_buff *skb)
}
char __license[] __section("license") = "GPL";
-.fi
-.in
+.EE
+.RE
Another small example is a port redirector which demuxes destination port
80 into the interval [8080, 8087] steered by RSS, that can then be attached
to ingress qdisc. The exercise of adding the egress counterpart and IPv6
support is left to the reader:
-.in +4n
-.nf
-.sp
+.RS
+.EX
+
#include <asm/types.h>
#include <asm/byteorder.h>
@@ -664,16 +660,15 @@ __section("lb") int lb_main(struct __sk_buff *skb)
}
char __license[] __section("license") = "GPL";
-.fi
-.in
+.EE
+.RE
The related helper header file
.B helpers.h
in both examples was:
-.in +4n
-.nf
-.sp
+.RS
+.EX
/* Misc helper macros. */
#define __section(x) __attribute__((section(x), used))
#define offsetof(x, y) __builtin_offsetof(x, y)
@@ -704,8 +699,8 @@ unsigned long long load_byte(void *skb, unsigned long long off)
asm ("llvm.bpf.load.byte");
unsigned long long load_half(void *skb, unsigned long long off)
asm ("llvm.bpf.load.half");
-.fi
-.in
+.EE
+.RE
Best practice, we recommend to only have a single eBPF classifier loaded
in tc and perform
@@ -733,9 +728,11 @@ the kernel log, which can be read via
.B dmesg(1)
:
-.in +4n
+.RS
+.EX
.B echo 2 > /proc/sys/net/core/bpf_jit_enable
-.in
+.EE
+.RE
The Linux kernel source tree ships additionally under
.B tools/net/
@@ -744,18 +741,18 @@ a small helper called
that reads out the opcode image dump from the kernel log and dumps the
resulting disassembly:
-.in +4n
+.RS
.B bpf_jit_disasm -o
-.in
+.RE
Other than that, the Linux kernel also contains an extensive eBPF/cBPF
test suite module called
.B test_bpf
\&. Upon ...
-.in +4n
+.RS
.B modprobe test_bpf
-.in
+.RE
\&... it performs a diversity of test cases and dumps the results into
the kernel log that can be inspected with
@@ -786,9 +783,9 @@ The raw interface with tc takes opcodes directly. For example, the most
minimal classifier matching on every packet resulting in the default
classid of 1:1 looks like:
-.in +4n
+.RS
.B tc filter add dev em1 parent 1: bpf bytecode '1,6 0 0 4294967295,' flowid 1:1
-.in
+.RE
The first decimal of the bytecode sequence denotes the number of subsequent
4-tuples of cBPF opcodes. As mentioned, such a 4-tuple consists of
@@ -813,9 +810,8 @@ internal classic BPF compiler, his code derived here for usage with
.B tc(8)
:
-.in +4n
-.nf
-.sp
+.RS
+.EX
#include <pcap.h>
#include <stdio.h>
@@ -850,25 +846,25 @@ int main(int argc, char **argv)
pcap_freecode(&prog);
return 0;
}
-.fi
-.in
+.EE
+.RE
Given this small helper, any
.B tcpdump(8)
filter expression can be abused as a classifier where a match will
result in the default classid:
-.in +4n
+.RS
.B bpftool EN10MB 'tcp[tcpflags] & tcp-syn != 0' > /var/bpf/tcp-syn
.br
.B tc filter add dev em1 parent 1: bpf bytecode-file /var/bpf/tcp-syn flowid 1:1
-.in
+.RE
Basically, such a minimal generator is equivalent to:
-.in +4n
+.RS
.B tcpdump -iem1 -ddd 'tcp[tcpflags] & tcp-syn != 0' | tr '\\\\n' ',' > /var/bpf/tcp-syn
-.in
+.RE
Since
.B libpcap
@@ -888,25 +884,24 @@ for classifying IPv4/TCP packets, saved in a text file called
.B foobar
:
-.in +4n
-.nf
-.sp
+.RS
+.EX
ldh [12]
jne #0x800, drop
ldb [23]
jneq #6, drop
ret #-1
drop: ret #0
-.fi
-.in
+.EE
+.RE
Similarly, such a classifier can be loaded as:
-.in +4n
+.RS
.B bpf_asm foobar > /var/bpf/tcp-syn
.br
.B tc filter add dev em1 parent 1: bpf bytecode-file /var/bpf/tcp-syn flowid 1:1
-.in
+.RE
For BPF classifiers, the Linux kernel provides additionally under
.B tools/net/
--
2.17.1
[-- Attachment #4: 0002-In-tcp-nat.8-change-command-synopsis-to-a-form-that-.patch --]
[-- Type: text/x-diff, Size: 2853 bytes --]
From 69842ef64825014c9c7f6054783d5af172f7879d Mon Sep 17 00:00:00 2001
From: "Eric S. Raymond" <esr@thyrsus.com>
Date: Tue, 12 Jun 2018 22:58:39 -0400
Subject: [PATCH 2/5] In tc-nat.8, change command synopsis to a form that can
be parsed.
(This means getting rid of the pseudo-BNF := notation.)
Also, correct a misspelling.
Signed-off-by: Eric S. Raymond <esr@thyrsus.com>
---
man/man8/tc-nat.8 | 45 ++++++++++++++++++++-------------------------
1 file changed, 20 insertions(+), 25 deletions(-)
diff --git a/man/man8/tc-nat.8 b/man/man8/tc-nat.8
index fdcc052a..3617ac6c 100644
--- a/man/man8/tc-nat.8
+++ b/man/man8/tc-nat.8
@@ -6,22 +6,9 @@ nat - stateless native address translation action
.in +8
.ti -8
.BR tc " ... " "action nat"
-.I DIRECTION OLD NEW
+.RB "{ " ingress " | " egress " }"
+old-addr new-addr
-.ti -8
-.IR DIRECTION " := { "
-.BR ingress " | " egress " }"
-
-.ti -8
-.IR OLD " := " IPV4_ADDR_SPEC
-
-.ti -8
-.IR NEW " := " IPV4_ADDR_SPEC
-
-.ti -8
-.IR IPV4_ADDR_SPEC " := { "
-.BR default " | " any " | " all " | "
-\fIin_addr\fR[\fB/\fR{\fIprefix\fR|\fInetmask\fR}]
.SH DESCRIPTION
The
.B nat
@@ -39,38 +26,46 @@ Translate destination addresses, i.e. perform DNAT.
.B egress
Translate source addresses, i.e. perform SNAT.
.TP
-.I OLD
+.I old-addr
Specifies addresses which should be translated.
.TP
-.I NEW
+.I new-addr
Specifies addresses which
-.I OLD
+.I old-addr
should be translated into.
.SH NOTES
The accepted address format in
-.IR OLD " and " NEW
+.IR old-addr " and " new-addr
is quite flexible. It may either consist of one of the keywords
.BR default ", " any " or " all ,
representing the all-zero IP address or a combination of IP address and netmask
or prefix length separated by a slash
.RB ( / )
sign. In any case, the mask (or prefix length) value of
-.I OLD
+.I old-addr
is used for
-.I NEW
+.I new-addr
as well so that a one-to-one mapping of addresses is assured.
+.PP
+The most general form is
+
+.RS
+.BR default " | " any " | " all " | "
+\fIin_addr\fR[\fB/\fR{\fIprefix\fR|\fInetmask\fR}]
+.RE
+
Address translation is done using a combination of binary operations. First, the
original (source or destination) address is matched against the value of
-.IR OLD .
+.IR old-addr .
If the original address fits, the new address is created by taking the leading
bits from
-.I NEW
+.I new-addr
(defined by the netmask of
-.IR OLD )
+.IR old-addr )
and taking the remaining bits from the original address.
-There is rudimental support for upper layer protocols, namely TCP, UDP and ICMP.
+There is rudimentary support for upper layer protocols, namely TCP, UDP and ICMP.
While for the first two only checksum recalculation is performed, the action
also takes care of embedded IP headers in ICMP packets by translating the
respective address therein, too.
--
2.17.1
[-- Attachment #5: 0003-In-tc-pie.8-fix-up-list-and-example-syntax.patch --]
[-- Type: text/x-diff, Size: 4086 bytes --]
From 4d787557880a25967da7dfdb070eea8c6cf9c635 Mon Sep 17 00:00:00 2001
From: "Eric S. Raymond" <esr@thyrsus.com>
Date: Wed, 13 Jun 2018 00:08:55 -0400
Subject: [PATCH 3/5] In tc-pie.8, fix up list and example syntax.
Signed-off-by: Eric S. Raymond <esr@thyrsus.com>
---
man/man8/tc-pie.8 | 56 ++++++++++++++++++++++++++++-------------------
1 file changed, 33 insertions(+), 23 deletions(-)
diff --git a/man/man8/tc-pie.8 b/man/man8/tc-pie.8
index 278293bd..64192b86 100644
--- a/man/man8/tc-pie.8
+++ b/man/man8/tc-pie.8
@@ -27,10 +27,14 @@ int ] [
Proportional Integral controller-Enhanced (PIE) is a control theoretic active
queue management scheme. It is based on the proportional integral controller but
aims to control delay. The main design goals are
- o Low latency control
- o High link utilization
- o Simple implementation
- o Guaranteed stability and fast responsiveness
+.IP
+Low latency control
+.IP
+High link utilization
+.IP
+Simple implementation
+.IP
+Guaranteed stability and fast responsiveness
.SH ALGORITHM
PIE is designed to control delay effectively. First, an average dequeue rate is
@@ -43,10 +47,11 @@ PIE makes adjustments to the probability based on the trend of the delay i.e.
whether it is going up or down.The delay converges quickly to the target value
specified.
-alpha and beta are statically chosen parameters chosen to control the drop probability
-growth and are determined through control theoretic approaches. alpha determines how
-the deviation between the current and target latency changes probability. beta exerts
-additional adjustments depending on the latency trend.
+alpha and beta are statically chosen parameters chosen to control the
+drop probability growth and are determined through control theoretic
+approaches. alpha determines how the deviation between the current and
+target latency changes probability. beta exerts additional adjustments
+depending on the latency trend.
The drop probabilty is used to mark packets in ecn mode. However, as in RED,
beyond 10% packets are dropped based on this probability. The bytemode is used
@@ -55,22 +60,24 @@ to drop packets proportional to the packet size.
Additional details can be found in the paper cited below.
.SH PARAMETERS
-.SS limit
+.TP
+limit
limit on the queue size in packets. Incoming packets are dropped when this limit
is reached. Default is 1000 packets.
-
-.SS target
+.TP
+target
is the expected queue delay. The default target delay is 20ms.
-
-.SS tupdate
+.TP
+tupdate
is the frequency at which the system drop probability is calculated. The default is 30ms.
-
-.SS alpha
-.SS beta
+.TP
+alpha
+.TP
+beta
alpha and beta are parameters chosen to control the drop probability. These
should be in the range between 0 and 32.
-
-.SS ecn | noecn
+.TP
+ecn | noecn
is used to mark packets instead of dropping
.B ecn
to turn on ecn mode,
@@ -78,8 +85,8 @@ to turn on ecn mode,
to turn off ecn mode. By default,
.B ecn
is turned off.
-
-.SS bytemode | nobytemode
+.TP
+bytemode | nobytemode
is used to scale drop probability proportional to packet size
.B bytemode
to turn on bytemode,
@@ -89,6 +96,7 @@ to turn off bytemode. By default,
is turned off.
.SH EXAMPLES
+.EX
# tc qdisc add dev eth0 root pie
# tc -s qdisc show
qdisc pie 8034: dev eth0 root refcnt 2 limit 200p target 19000us tupdate 29000us alpha 2 beta 20
@@ -113,7 +121,7 @@ is turned off.
backlog 33728b 32p requeues 0
prob 0.102262 delay 24000us avg_dq_rate 1464840
pkts_in 2468 overlimit 214 dropped 0 maxq 192 ecn_mark 71
-
+.EE
.SH SEE ALSO
.BR tc (8),
@@ -121,8 +129,10 @@ is turned off.
.BR tc-red (8)
.SH SOURCES
- o IETF draft submission is at http://tools.ietf.org/html/draft-pan-tsvwg-pie-00
- o IEEE Conference on High Performance Switching and Routing 2013 : "PIE: A
+.IP
+IETF draft submission is at http://tools.ietf.org/html/draft-pan-tsvwg-pie-00
+.IP
+IEEE Conference on High Performance Switching and Routing 2013 : "PIE: A
Lightweight Control Scheme to Address the Bufferbloat Problem"
.SH AUTHORS
--
2.17.1
[-- Attachment #6: 0004-Fix-list-syntax-errors-in-tc-pedit.8.patch --]
[-- Type: text/x-diff, Size: 892 bytes --]
From 053a817b72cb942d47bb3fbb45385e04d8b6f378 Mon Sep 17 00:00:00 2001
From: "Eric S. Raymond" <esr@thyrsus.com>
Date: Wed, 13 Jun 2018 09:49:12 -0400
Subject: [PATCH 4/5] Fix list syntax errors in tc-pedit.8.
Signed-off-by: Eric S. Raymond <esr@thyrsus.com>
---
man/man8/tc-pedit.8 | 13 +++++++------
1 file changed, 7 insertions(+), 6 deletions(-)
diff --git a/man/man8/tc-pedit.8 b/man/man8/tc-pedit.8
index bbd725c4..53d07fec 100644
--- a/man/man8/tc-pedit.8
+++ b/man/man8/tc-pedit.8
@@ -224,17 +224,17 @@ The supported keywords for
.I IP6HDR_FIELD
are:
.RS
-.TP
+.IP ""
.B src
-.TQ
+.IP ""
.B dst
-.TQ
+.IP ""
.B flow_lbl
-.TQ
+.IP ""
.B payload_len
-.TQ
+.IP ""
.B nexthdr
-.TQ
+.IP ""
.B hoplimit
.RE
.TP
@@ -250,6 +250,7 @@ are:
Source or destination TCP port number, a 16-bit value.
.TP
.B flags
+(To be documented)
.RE
.TP
.BI udp " UDPHDR_FIELD"
--
2.17.1
[-- Attachment #7: 0005-In-devlink.8-translate-unparseable-callout-syntax-to.patch --]
[-- Type: text/x-diff, Size: 1099 bytes --]
From 643db903b3f6cd90dce5cc90daafcd6830e83a8a Mon Sep 17 00:00:00 2001
From: "Eric S. Raymond" <esr@thyrsus.com>
Date: Wed, 13 Jun 2018 12:45:24 -0400
Subject: [PATCH 5/5] In devlink.8, translate unparseable callout syntax to
parseable form.
Signed-off-by: Eric S. Raymond <esr@thyrsus.com>
---
man/man8/devlink.8 | 14 +-------------
1 file changed, 1 insertion(+), 13 deletions(-)
diff --git a/man/man8/devlink.8 b/man/man8/devlink.8
index 7986310f..efc6e625 100644
--- a/man/man8/devlink.8
+++ b/man/man8/devlink.8
@@ -7,7 +7,7 @@ devlink \- Devlink tool
.in +8
.ti -8
.B devlink
-.RI "[ " OPTIONS " ] " OBJECT " { " COMMAND " | "
+.RI "[ " OPTIONS " ] { " dev | port | monitor | sb | resource " } { " COMMAND " | "
.BR help " }"
.sp
@@ -17,18 +17,6 @@ devlink \- Devlink tool
.BI "-batch " filename
.sp
-.ti -8
-.IR OBJECT " := { "
-.BR dev " | " port " | " monitor " | " sb " | " resource " }"
-.sp
-
-.ti -8
-.IR OPTIONS " := { "
-\fB\-V\fR[\fIersion\fR] |
-\fB\-n\fR[\fIno-nice-names\fR] }
-\fB\-j\fR[\fIjson\fR] }
-\fB\-p\fR[\fIpretty\fR] }
-
.SH OPTIONS
.TP
--
2.17.1
* RE: [PATCH v2] hv_netvsc: Add per-cpu ethtool stats for netvsc
From: Yidong Ren @ 2018-06-13 21:07 UTC
To: Eric Dumazet, Yidong Ren, KY Srinivasan, Haiyang Zhang,
Stephen Hemminger, David S. Miller, devel@linuxdriverproject.org,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <4c853799-44e0-ff33-3555-41982d601ebb@gmail.com>
> From: Eric Dumazet <eric.dumazet@gmail.com>
> You actually want to allocate memory local to this cpu, possibly in one chunk,
> not spread all over the places.
>
> kvmalloc(nr_cpu_ids * sizeof(struct netvsc_ethtool_pcpu_stats)) should be
> really better, since it would most of the time be satisfied by a single kmalloc()
Got it. I'm just trying to allocate memory for each CPU; it doesn't have to be a __percpu variable.
* Re: [PATCH 0/9] Netfilter fixes for net
From: David Miller @ 2018-06-13 21:05 UTC
To: pablo; +Cc: netfilter-devel, netdev
In-Reply-To: <20180613105700.12894-1-pablo@netfilter.org>
From: Pablo Neira Ayuso <pablo@netfilter.org>
Date: Wed, 13 Jun 2018 12:56:51 +0200
> The following patchset contains Netfilter patches for your net tree:
>
> 1) Fix NULL pointer dereference from nf_nat_decode_session() if NAT is
> not loaded, from Prashant Bhole.
>
> 2) Fix socket extension module autoload.
>
> 3) Don't bogusly reject sets with the NFT_SET_EVAL flag set on from
> the dynset extension.
>
> 4) Fix races with nf_tables module removal and netns exit path,
> patches from Florian Westphal.
>
> 5) Don't hit BUG_ON if jumpstack goes too deep, instead hit
> WARN_ON_ONCE, from Taehee Yoo.
>
> 6) Another NULL pointer dereference from ctnetlink, again if NAT is
> not loaded, from Florian Westphal.
>
> 7) Fix x_tables match list corruption in xt_connmark module removal
> path, also from Florian.
>
> 8) nf_conncount doesn't properly deal with conntrack zones, hence
> garbage collector may get rid of entries in a different zone.
> From Yi-Hung Wei.
>
> You can pull these changes from:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git
Pulled, thank you.
* Re: [RFC nf-next 0/5] netfilter: add ebpf translation infrastructure
From: Florian Westphal @ 2018-06-13 20:59 UTC
To: Alexei Starovoitov
Cc: Florian Westphal, netfilter-devel, ast, daniel, netdev,
David S. Miller, ecree
In-Reply-To: <20180613004324.x2xdyj2qlbkkpccy@ast-mbp.dhcp.thefacebook.com>
Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> On Tue, Jun 12, 2018 at 11:28:12AM +0200, Florian Westphal wrote:
> > I think it's important that user(space) can see which rules are jitted,
> > and which ebpf prog corresponds to which rule(s); using an expression as
> > container allows us to re-use existing nft config plane code to serialize
> > this via netlink attributes.
>
> In my mind it would be all or nothing. I don't think it helps
> to convert some rules and not all.
Ok. Still, even in that case I think it would be good if we'd be able to tell
userspace the ebpf program id that corresponds to the ruleset.
> > Step 1: 1:1 mapping, an nft rule has at most one ebpf prog.
> > Step 2: figure out how to handle maps, sets, and how to cope with
> > not-yet-translateable expressions
> > Step 3: m:n mapping: kernel provides adjacent rules to the UMH for
> > jitting. Example: user appends rules a, b, c. UMH creates
> > single ebpf prog from a/b/c.
> > nft-pseudo-expression replaces a/b/c in the
> > packet path, original rules a/b/c are linked from the pseudo
> > expression for tracking. If user deletes rule b, we provide
> > a/c to UMH to create new ebpf prog that replaces new
> > sequence a/c.
> > Step 4: always provide entire future base chain and all reachable chains
> > to the umh. Ideally all of it is replaced by single program.
[..]
> > Does that make sense to you?
> >
> > If you see this as flawed, please let me know, but as I have no idea
> > how to resolve these issues going from 0 to 4 makes no sense to me.
>
> I think the challenge is how to implement 4 without doing step 1, right?
Yes.
> imo doing such 1:1 (single rule to single bpf prog) translation does not
> help to break hard problem into smaller pieces. Such 1:1 is great
> for prototype, but not to land upstream.
> For the same reasons in bpfilter we did single iptable rule to single
> bpf prog translation, but such code doesn't belong in upstream tree,
> since it's not a scalable approach.
[..]
> > Okay, but without any idea how to consider existing expressions,
> > sets, maps etc. I'm not sure it makes sense to work on that at this
> > point.
>
> I think sets and ipset (in case of iptables) fit well into trie model.
Yes, but that's going to be a lot of effort to handle properly
without breaking (or replacing) userland plumbing.
For nft we could aim for full translation of the ingress hook
initially, as that takes stateful filtering out of the picture (ingress
occurs before conntrack).
We could also ignore sets for now and only deal with anonymous sets (they
are immutable and data stored in such sets can be made available to
UMH).
I can rework the RFC to emit the "future table" to the UMH instead of
individual rules, but I don't know yet when I will have time to work on
it again.
Thanks,
Florian
* Re: [PATCH v2] hv_netvsc: Add per-cpu ethtool stats for netvsc
From: Eric Dumazet @ 2018-06-13 20:57 UTC
To: Yidong Ren, K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger,
David S. Miller, devel, netdev, linux-kernel
In-Reply-To: <20180613193608.444-1-yidren@linuxonhyperv.com>
On 06/13/2018 12:36 PM, Yidong Ren wrote:
> From: Yidong Ren <yidren@microsoft.com>
>
> This patch implements following ethtool stats fields for netvsc:
> cpu<n>_tx/rx_packets/bytes
> cpu<n>_vf_tx/rx_packets/bytes
...
>
> + pcpu_sum = alloc_percpu(struct netvsc_ethtool_pcpu_stats);
> + netvsc_get_pcpu_stats(dev, pcpu_sum);
> + for_each_present_cpu(cpu) {
> + struct netvsc_ethtool_pcpu_stats *this_sum =
> + per_cpu_ptr(pcpu_sum, cpu);
> + for (j = 0; j < ARRAY_SIZE(pcpu_stats); j++)
> + data[i++] = *(u64 *)((void *)this_sum
> + + pcpu_stats[j].offset);
> + }
> + free_percpu(pcpu_sum);
>
Using alloc_percpu() / free_percpu() for a short section of code makes no sense.
You actually want to allocate memory local to this cpu, possibly in one chunk,
not spread all over the place.
kvmalloc(nr_cpu_ids * sizeof(struct netvsc_ethtool_pcpu_stats)) would be much better,
since most of the time it would be satisfied by a single kmalloc().
* [BUG] net: stmmac: socfpga ethernet no longer working on linux-next
From: Dinh Nguyen @ 2018-06-13 20:46 UTC
To: netdev; +Cc: David Miller, clabbe, joabreu, Dinh Nguyen, Marek Vasut
Hi,
The stmmac ethernet has stopped working in linux-next and the linus/master
branch (v4.17-11782-gbe779f03d563).
It appears that the stmmac ethernet has stopped working after these 2 commits:
4dbbe8dde848 net: stmmac: Add support for U32 TC filter using Flexible RX Parser
5f0456b43140 net: stmmac: Implement logic to automatically select HW Interface
If I move to this commit "565020aaeebf net: stmmac: Disable ACS
Feature for GMAC >= 4", then the stmmac works again on SoCFPGA.
I was following this thread:
https://www.spinics.net/lists/netdev/msg502858.html
I was wondering whether there is a patch to fix dwmac-sun8i that the socfpga
platform needs as well?
Thanks,
Dinh
^ permalink raw reply
* Re: [PATCH net] net: qcom/emac: Add missing of_node_put()
From: Timur Tabi @ 2018-06-13 20:45 UTC (permalink / raw)
To: YueHaibing, davem; +Cc: linux-kernel, netdev, Hemanth Puranik
In-Reply-To: <20180611130345.15172-1-yuehaibing@huawei.com>
On 06/11/2018 08:03 AM, YueHaibing wrote:
> Add missing of_node_put() call for device node returned by
> of_parse_phandle().
>
> Signed-off-by: YueHaibing<yuehaibing@huawei.com>
Acked-by: Timur Tabi <timur@codeaurora.org>
This seems legit. The comment for of_find_device_by_node() that says
the np needs to be released was added after the code was written, so
it's possible that I didn't know at the time that this was a requirement.
However, I no longer have the ability to test EMAC on device tree
platforms, so I can't verify this code.
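The rule discussed here is the usual device-tree reference-counting discipline: of_parse_phandle() returns a node with its reference count raised, so the caller must drop it with of_node_put() when done. The sketch below models this in userspace with a hypothetical mock_node type and hypothetical mock_parse_phandle()/mock_node_put()/mock_probe() helpers (not the kernel API). It shows every successful lookup balanced by exactly one put, on the error path as well.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical mock of a device-tree node with a reference count. */
struct mock_node {
	int refcount;
	int has_device;	/* whether a platform device exists for the node */
};

/* Models of_parse_phandle(): returns the node with a reference taken. */
static struct mock_node *mock_parse_phandle(struct mock_node *target)
{
	if (target)
		target->refcount++;
	return target;
}

/* Models of_node_put(): drops one reference; NULL is accepted. */
static void mock_node_put(struct mock_node *np)
{
	if (np)
		np->refcount--;
}

/* Hypothetical probe step: every successful lookup is paired with a put. */
static int mock_probe(struct mock_node *target)
{
	struct mock_node *np = mock_parse_phandle(target);
	int ret = 0;

	if (!np)
		return -ENODEV;		/* no reference was taken */
	if (!np->has_device)
		ret = -ENODEV;		/* error path must still drop it */
	mock_node_put(np);		/* the kind of call the patch adds */
	return ret;
}
```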
--
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
Technologies, Inc. Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.
^ permalink raw reply
* [PATCH v2] hv_netvsc: Add per-cpu ethtool stats for netvsc
From: Yidong Ren @ 2018-06-13 19:36 UTC (permalink / raw)
To: K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger,
David S. Miller, devel, netdev, linux-kernel
From: Yidong Ren <yidren@microsoft.com>
This patch implements the following ethtool stats fields for netvsc:
cpu<n>_tx/rx_packets/bytes
cpu<n>_vf_tx/rx_packets/bytes
The corresponding per-cpu counters already exist in the current code.
Exposing them will help with troubleshooting performance issues.
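The field names above come from printf-style templates. Each template fills one fixed-width ETH_GSTRING_LEN slot and is repeated for every present CPU. A minimal userspace sketch, assuming ETH_GSTRING_LEN is 32 (its value in the ethtool UAPI header) and a hypothetical DEMO_NR_CPUS present-CPU array. It bounds each write with snprintf() to the slot size, whereas the patch uses sprintf().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define ETH_GSTRING_LEN 32	/* value from <linux/ethtool.h> */
#define DEMO_NR_CPUS 2		/* hypothetical CPU count */

static const char *const pcpu_names[] = {
	"cpu%u_rx_packets", "cpu%u_rx_bytes",
	"cpu%u_tx_packets", "cpu%u_tx_bytes",
	"cpu%u_vf_rx_packets", "cpu%u_vf_rx_bytes",
	"cpu%u_vf_tx_packets", "cpu%u_vf_tx_bytes",
};

#define PCPU_NAMES (sizeof(pcpu_names) / sizeof(pcpu_names[0]))

/*
 * Write one NUL-padded, ETH_GSTRING_LEN-wide name per counter for each
 * present CPU. Returns the number of names written.
 */
static unsigned int fill_pcpu_strings(const int present[DEMO_NR_CPUS], char *data)
{
	char *p = data;
	unsigned int n = 0;

	for (unsigned int cpu = 0; cpu < DEMO_NR_CPUS; cpu++) {
		if (!present[cpu])
			continue;
		for (size_t i = 0; i < PCPU_NAMES; i++) {
			memset(p, 0, ETH_GSTRING_LEN);
			snprintf(p, ETH_GSTRING_LEN, pcpu_names[i], cpu);
			p += ETH_GSTRING_LEN;
			n++;
		}
	}
	return n;
}
```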
Signed-off-by: Yidong Ren <yidren@microsoft.com>
---
Changes in v2:
- Remove C++-style comment
- Resubmit after freeze
drivers/net/hyperv/hyperv_net.h | 11 +++++
drivers/net/hyperv/netvsc_drv.c | 104 +++++++++++++++++++++++++++++++++++++++-
2 files changed, 113 insertions(+), 2 deletions(-)
diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index 23304ac..c825353 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -873,6 +873,17 @@ struct netvsc_ethtool_stats {
unsigned long wake_queue;
};
+struct netvsc_ethtool_pcpu_stats {
+ u64 rx_packets;
+ u64 rx_bytes;
+ u64 tx_packets;
+ u64 tx_bytes;
+ u64 vf_rx_packets;
+ u64 vf_rx_bytes;
+ u64 vf_tx_packets;
+ u64 vf_tx_bytes;
+};
+
struct netvsc_vf_pcpu_stats {
u64 rx_packets;
u64 rx_bytes;
diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index 7b18a8c..6803aae 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -1105,6 +1105,66 @@ static void netvsc_get_vf_stats(struct net_device *net,
}
}
+static void netvsc_get_pcpu_stats(struct net_device *net,
+ struct netvsc_ethtool_pcpu_stats
+ __percpu *pcpu_tot)
+{
+ struct net_device_context *ndev_ctx = netdev_priv(net);
+ struct netvsc_device *nvdev = rcu_dereference_rtnl(ndev_ctx->nvdev);
+ int i;
+
+ /* fetch percpu stats of vf */
+ for_each_possible_cpu(i) {
+ const struct netvsc_vf_pcpu_stats *stats =
+ per_cpu_ptr(ndev_ctx->vf_stats, i);
+ struct netvsc_ethtool_pcpu_stats *this_tot =
+ per_cpu_ptr(pcpu_tot, i);
+ unsigned int start;
+
+ do {
+ start = u64_stats_fetch_begin_irq(&stats->syncp);
+ this_tot->vf_rx_packets = stats->rx_packets;
+ this_tot->vf_tx_packets = stats->tx_packets;
+ this_tot->vf_rx_bytes = stats->rx_bytes;
+ this_tot->vf_tx_bytes = stats->tx_bytes;
+ } while (u64_stats_fetch_retry_irq(&stats->syncp, start));
+ this_tot->rx_packets = this_tot->vf_rx_packets;
+ this_tot->tx_packets = this_tot->vf_tx_packets;
+ this_tot->rx_bytes = this_tot->vf_rx_bytes;
+ this_tot->tx_bytes = this_tot->vf_tx_bytes;
+ }
+
+ /* fetch percpu stats of netvsc */
+ for (i = 0; i < nvdev->num_chn; i++) {
+ const struct netvsc_channel *nvchan = &nvdev->chan_table[i];
+ const struct netvsc_stats *stats;
+ struct netvsc_ethtool_pcpu_stats *this_tot =
+ per_cpu_ptr(pcpu_tot, nvchan->channel->target_cpu);
+ u64 packets, bytes;
+ unsigned int start;
+
+ stats = &nvchan->tx_stats;
+ do {
+ start = u64_stats_fetch_begin_irq(&stats->syncp);
+ packets = stats->packets;
+ bytes = stats->bytes;
+ } while (u64_stats_fetch_retry_irq(&stats->syncp, start));
+
+ this_tot->tx_bytes += bytes;
+ this_tot->tx_packets += packets;
+
+ stats = &nvchan->rx_stats;
+ do {
+ start = u64_stats_fetch_begin_irq(&stats->syncp);
+ packets = stats->packets;
+ bytes = stats->bytes;
+ } while (u64_stats_fetch_retry_irq(&stats->syncp, start));
+
+ this_tot->rx_bytes += bytes;
+ this_tot->rx_packets += packets;
+ }
+}
+
static void netvsc_get_stats64(struct net_device *net,
struct rtnl_link_stats64 *t)
{
@@ -1202,6 +1262,23 @@ static const struct {
{ "rx_no_memory", offsetof(struct netvsc_ethtool_stats, rx_no_memory) },
{ "stop_queue", offsetof(struct netvsc_ethtool_stats, stop_queue) },
{ "wake_queue", offsetof(struct netvsc_ethtool_stats, wake_queue) },
+}, pcpu_stats[] = {
+ { "cpu%u_rx_packets",
+ offsetof(struct netvsc_ethtool_pcpu_stats, rx_packets) },
+ { "cpu%u_rx_bytes",
+ offsetof(struct netvsc_ethtool_pcpu_stats, rx_bytes) },
+ { "cpu%u_tx_packets",
+ offsetof(struct netvsc_ethtool_pcpu_stats, tx_packets) },
+ { "cpu%u_tx_bytes",
+ offsetof(struct netvsc_ethtool_pcpu_stats, tx_bytes) },
+ { "cpu%u_vf_rx_packets",
+ offsetof(struct netvsc_ethtool_pcpu_stats, vf_rx_packets) },
+ { "cpu%u_vf_rx_bytes",
+ offsetof(struct netvsc_ethtool_pcpu_stats, vf_rx_bytes) },
+ { "cpu%u_vf_tx_packets",
+ offsetof(struct netvsc_ethtool_pcpu_stats, vf_tx_packets) },
+ { "cpu%u_vf_tx_bytes",
+ offsetof(struct netvsc_ethtool_pcpu_stats, vf_tx_bytes) },
}, vf_stats[] = {
{ "vf_rx_packets", offsetof(struct netvsc_vf_pcpu_stats, rx_packets) },
{ "vf_rx_bytes", offsetof(struct netvsc_vf_pcpu_stats, rx_bytes) },
@@ -1213,6 +1290,9 @@ static const struct {
#define NETVSC_GLOBAL_STATS_LEN ARRAY_SIZE(netvsc_stats)
#define NETVSC_VF_STATS_LEN ARRAY_SIZE(vf_stats)
+/* statistics per present cpu (rx/tx packets/bytes, including vf) */
+#define NETVSC_PCPU_STATS_LEN (num_present_cpus() * ARRAY_SIZE(pcpu_stats))
+
/* 4 statistics per queue (rx/tx packets/bytes) */
#define NETVSC_QUEUE_STATS_LEN(dev) ((dev)->num_chn * 4)
@@ -1228,6 +1308,7 @@ static int netvsc_get_sset_count(struct net_device *dev, int string_set)
case ETH_SS_STATS:
return NETVSC_GLOBAL_STATS_LEN
+ NETVSC_VF_STATS_LEN
+ + NETVSC_PCPU_STATS_LEN
+ NETVSC_QUEUE_STATS_LEN(nvdev);
default:
return -EINVAL;
@@ -1242,9 +1323,10 @@ static void netvsc_get_ethtool_stats(struct net_device *dev,
const void *nds = &ndc->eth_stats;
const struct netvsc_stats *qstats;
struct netvsc_vf_pcpu_stats sum;
+ struct netvsc_ethtool_pcpu_stats __percpu *pcpu_sum;
unsigned int start;
u64 packets, bytes;
- int i, j;
+ int i, j, cpu;
if (!nvdev)
return;
@@ -1256,6 +1338,17 @@ static void netvsc_get_ethtool_stats(struct net_device *dev,
for (j = 0; j < NETVSC_VF_STATS_LEN; j++)
data[i++] = *(u64 *)((void *)&sum + vf_stats[j].offset);
+ pcpu_sum = alloc_percpu(struct netvsc_ethtool_pcpu_stats);
+ netvsc_get_pcpu_stats(dev, pcpu_sum);
+ for_each_present_cpu(cpu) {
+ struct netvsc_ethtool_pcpu_stats *this_sum =
+ per_cpu_ptr(pcpu_sum, cpu);
+ for (j = 0; j < ARRAY_SIZE(pcpu_stats); j++)
+ data[i++] = *(u64 *)((void *)this_sum
+ + pcpu_stats[j].offset);
+ }
+ free_percpu(pcpu_sum);
+
for (j = 0; j < nvdev->num_chn; j++) {
qstats = &nvdev->chan_table[j].tx_stats;
@@ -1283,7 +1376,7 @@ static void netvsc_get_strings(struct net_device *dev, u32 stringset, u8 *data)
struct net_device_context *ndc = netdev_priv(dev);
struct netvsc_device *nvdev = rtnl_dereference(ndc->nvdev);
u8 *p = data;
- int i;
+ int i, cpu;
if (!nvdev)
return;
@@ -1300,6 +1393,13 @@ static void netvsc_get_strings(struct net_device *dev, u32 stringset, u8 *data)
p += ETH_GSTRING_LEN;
}
+ for_each_present_cpu(cpu) {
+ for (i = 0; i < ARRAY_SIZE(pcpu_stats); i++) {
+ sprintf(p, pcpu_stats[i].name, cpu);
+ p += ETH_GSTRING_LEN;
+ }
+ }
+
for (i = 0; i < nvdev->num_chn; i++) {
sprintf(p, "tx_queue_%u_packets", i);
p += ETH_GSTRING_LEN;
--
2.7.4
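The aggregation in netvsc_get_pcpu_stats() above works in two passes. First, the VF counters are copied into every possible CPU's slot and also used as the starting totals. Then each channel's tx/rx counters are added to the slot of the CPU that channel targets (nvchan->channel->target_cpu). Below is a simplified userspace model of that logic. It assumes plain arrays and hypothetical vf_stats/chan_stats structs in place of the per-CPU and channel structures, and it omits the u64_stats_fetch_begin_irq()/u64_stats_fetch_retry_irq() snapshot loops.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_NR_CPUS 2	/* hypothetical number of possible CPUs */

struct pcpu_tot {
	uint64_t rx_packets, rx_bytes, tx_packets, tx_bytes;
	uint64_t vf_rx_packets, vf_rx_bytes, vf_tx_packets, vf_tx_bytes;
};

/* Simplified, hypothetical stand-ins for the VF and channel counters. */
struct vf_stats { uint64_t rx_packets, rx_bytes, tx_packets, tx_bytes; };
struct chan_stats {
	unsigned int target_cpu;
	uint64_t tx_packets, tx_bytes, rx_packets, rx_bytes;
};

static void aggregate(const struct vf_stats vf[DEMO_NR_CPUS],
		      const struct chan_stats *chan, unsigned int num_chn,
		      struct pcpu_tot tot[DEMO_NR_CPUS])
{
	/* Pass 1: VF counters become both the vf_* fields and the base totals. */
	for (unsigned int cpu = 0; cpu < DEMO_NR_CPUS; cpu++) {
		tot[cpu].vf_rx_packets = vf[cpu].rx_packets;
		tot[cpu].vf_rx_bytes = vf[cpu].rx_bytes;
		tot[cpu].vf_tx_packets = vf[cpu].tx_packets;
		tot[cpu].vf_tx_bytes = vf[cpu].tx_bytes;
		tot[cpu].rx_packets = tot[cpu].vf_rx_packets;
		tot[cpu].rx_bytes = tot[cpu].vf_rx_bytes;
		tot[cpu].tx_packets = tot[cpu].vf_tx_packets;
		tot[cpu].tx_bytes = tot[cpu].vf_tx_bytes;
	}

	/* Pass 2: add each channel's counters to the CPU it is bound to. */
	for (unsigned int i = 0; i < num_chn; i++) {
		struct pcpu_tot *t = &tot[chan[i].target_cpu];

		t->tx_packets += chan[i].tx_packets;
		t->tx_bytes += chan[i].tx_bytes;
		t->rx_packets += chan[i].rx_packets;
		t->rx_bytes += chan[i].rx_bytes;
	}
}
```

Pass 1 assigns all eight fields for every CPU, so the output buffer does not need to be pre-zeroed. Only pass 2 accumulates.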
^ permalink raw reply related
* Re: KASAN: slab-out-of-bounds Read in bpf_skb_vlan_push
From: syzbot @ 2018-06-13 18:49 UTC (permalink / raw)
To: ast, daniel, davem, linux-kernel, netdev, syzkaller-bugs
In-Reply-To: <26c434ee-0a0a-fbba-282c-dabddfac652e@iogearbox.net>
Hello,
syzbot has tested the proposed patch and the reproducer did not trigger crash:
Reported-and-tested-by: syzbot+76de61614cb1abdd73fc@syzkaller.appspotmail.com
Tested on:
commit: be779f03d563 Merge tag 'kbuild-v4.18-2' of git://git.kerne..
git tree: upstream
kernel config: https://syzkaller.appspot.com/x/.config?x=68d8eba98e3f8e88
compiler: gcc (GCC) 8.0.1 20180413 (experimental)
Note: testing is done by a robot and is best-effort only.
^ permalink raw reply
* Re: [PATCH ethtool 1/6] ethtool: fix uninitialized return value
From: John W. Linville @ 2018-06-13 18:31 UTC (permalink / raw)
To: Ivan Vecera; +Cc: netdev
In-Reply-To: <20180608092010.13041-1-cera@cera.cz>
On Fri, Jun 08, 2018 at 11:20:05AM +0200, Ivan Vecera wrote:
> Fixes: b0fe96d ("Ethtool: Implements ETHTOOL_PHY_GTUNABLE/ETHTOOL_PHY_STUNABLE and PHY downshift")
> Signed-off-by: Ivan Vecera <cera@cera.cz>
LGTM -- I have queued the series for the next release, including
extending the commit IDs...
Thanks!
John
--
John W. Linville Someday the world will need a hero, and you
linville@tuxdriver.com might be all we have. Be ready.
^ permalink raw reply
* Re: KASAN: slab-out-of-bounds Read in bpf_skb_vlan_push
From: Daniel Borkmann @ 2018-06-13 18:42 UTC (permalink / raw)
To: syzbot, ast, davem, linux-kernel, netdev, syzkaller-bugs
In-Reply-To: <0000000000000c58c2056e8a3a27@google.com>
On 06/13/2018 08:34 PM, syzbot wrote:
> Hello,
>
> syzbot has tested the proposed patch and the reproducer did not trigger crash:
>
> Reported-and-tested-by: syzbot+76de61614cb1abdd73fc@syzkaller.appspotmail.com
>
> Tested on:
>
> commit: be779f03d563 Merge tag 'kbuild-v4.18-2' of git://git.kerne..
> git tree: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/--
> kernel config: https://syzkaller.appspot.com/x/.config?x=68d8eba98e3f8e88
> compiler: gcc (GCC) 8.0.1 20180413 (experimental)
>
> Note: testing is done by a robot and is best-effort only.
#syz fix: bpf: reject passing modified ctx to helper functions
^ permalink raw reply
* Re: KASAN: slab-out-of-bounds Read in bpf_skb_vlan_push
From: syzbot @ 2018-06-13 18:34 UTC (permalink / raw)
To: ast, daniel, davem, linux-kernel, netdev, syzkaller-bugs
In-Reply-To: <b3af0f35-b0f8-859d-4f8f-b919d35ebaaa@iogearbox.net>
Hello,
syzbot has tested the proposed patch and the reproducer did not trigger crash:
Reported-and-tested-by: syzbot+76de61614cb1abdd73fc@syzkaller.appspotmail.com
Tested on:
commit: be779f03d563 Merge tag 'kbuild-v4.18-2' of git://git.kerne..
git tree: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/--
kernel config: https://syzkaller.appspot.com/x/.config?x=68d8eba98e3f8e88
compiler: gcc (GCC) 8.0.1 20180413 (experimental)
Note: testing is done by a robot and is best-effort only.
^ permalink raw reply
* Re: KASAN: slab-out-of-bounds Read in bpf_skb_vlan_push
From: Daniel Borkmann @ 2018-06-13 18:15 UTC (permalink / raw)
To: syzbot; +Cc: ast, davem, linux-kernel, netdev, syzkaller-bugs
In-Reply-To: <000000000000402477056e89f067@google.com>
On 06/13/2018 08:13 PM, syzbot wrote:
>> On 06/13/2018 06:17 PM, syzbot wrote:
>>> Hello,
>
>>> syzbot found the following crash on:
>
>>> HEAD commit: 75d4e704fa8d netdev-FAQ: clarify DaveM's position for stab..
>>> git tree: bpf-next
>>> console output: https://syzkaller.appspot.com/x/log.txt?x=1754783f800000
>>> kernel config: https://syzkaller.appspot.com/x/.config?x=a601a80fec461d44
>>> dashboard link: https://syzkaller.appspot.com/bug?extid=76de61614cb1abdd73fc
>>> compiler: gcc (GCC) 8.0.1 20180413 (experimental)
>>> syzkaller repro:https://syzkaller.appspot.com/x/repro.syz?x=12c1e1bf800000
>
>>> IMPORTANT: if you fix the bug, please add the following tag to the commit:
>>> Reported-by: syzbot+76de61614cb1abdd73fc@syzkaller.appspotmail.com
>
>>> IPv6: ADDRCONF(NETDEV_CHANGE): veth1: link becomes ready
>>> IPv6: ADDRCONF(NETDEV_CHANGE): veth0: link becomes ready
>>> 8021q: adding VLAN 0 to HW filter on device team0
>>> 8021q: adding VLAN 0 to HW filter on device team0
>>> ==================================================================
>>> BUG: KASAN: slab-out-of-bounds in skb_at_tc_ingress include/net/sch_generic.h:535 [inline]
>>> BUG: KASAN: slab-out-of-bounds in bpf_push_mac_rcsum net/core/filter.c:1625 [inline]
>>> BUG: KASAN: slab-out-of-bounds in ____bpf_skb_vlan_push net/core/filter.c:2446 [inline]
>>> BUG: KASAN: slab-out-of-bounds in bpf_skb_vlan_push+0x6b7/0x720 net/core/filter.c:2437
>>> Read of size 5 at addr ffff8801b77347d0 by task syz-executor6/6529
>
>> Should be fixed already by:
>
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=58990d1ff3f7896ee341030e9a7c2e4002570683
>
>
>> #syz test: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
>
> want 2 args (repo, branch), got 1
Fair enough ... assumed default would have been master. ;-)
#syz test: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
^ permalink raw reply
* Re: 4b66af2d("af_key: Always verify length of provided sadb_key")
From: Greg KH @ 2018-06-13 18:13 UTC (permalink / raw)
To: Zubin Mithra; +Cc: linux-netdev, groeck, stable
In-Reply-To: <20180613180252.GA34929@zsmcros.c.googlers.com>
On Wed, Jun 13, 2018 at 02:02:54PM -0400, Zubin Mithra wrote:
> Hello,
>
> Syzkaller has reported a crash here[1] for a slab OOB read in pfkey_add.
>
> Could the following patch be applied to stable kernels for 4.14, 4.4, 3.18, 3.14, 3.10 and 3.8?
>
> 4b66af2d("af_key: Always verify length of provided sadb_key")
>
> [1] https://syzkaller.appspot.com/bug?id=26cb120b31cd24d984fc16da67f50fb375c432a7
Now queued up, thanks.
greg k-h
^ permalink raw reply
* Re: Re: KASAN: slab-out-of-bounds Read in bpf_skb_vlan_push
From: syzbot @ 2018-06-13 18:13 UTC (permalink / raw)
To: Daniel Borkmann; +Cc: ast, daniel, davem, linux-kernel, netdev, syzkaller-bugs
In-Reply-To: <b3af0f35-b0f8-859d-4f8f-b919d35ebaaa@iogearbox.net>
> On 06/13/2018 06:17 PM, syzbot wrote:
>> Hello,
>> syzbot found the following crash on:
>> HEAD commit: 75d4e704fa8d netdev-FAQ: clarify DaveM's position for stab..
>> git tree: bpf-next
>> console output: https://syzkaller.appspot.com/x/log.txt?x=1754783f800000
>> kernel config: https://syzkaller.appspot.com/x/.config?x=a601a80fec461d44
>> dashboard link: https://syzkaller.appspot.com/bug?extid=76de61614cb1abdd73fc
>> compiler: gcc (GCC) 8.0.1 20180413 (experimental)
>> syzkaller repro:https://syzkaller.appspot.com/x/repro.syz?x=12c1e1bf800000
>> IMPORTANT: if you fix the bug, please add the following tag to the commit:
>> Reported-by: syzbot+76de61614cb1abdd73fc@syzkaller.appspotmail.com
>> IPv6: ADDRCONF(NETDEV_CHANGE): veth1: link becomes ready
>> IPv6: ADDRCONF(NETDEV_CHANGE): veth0: link becomes ready
>> 8021q: adding VLAN 0 to HW filter on device team0
>> 8021q: adding VLAN 0 to HW filter on device team0
>> ==================================================================
>> BUG: KASAN: slab-out-of-bounds in skb_at_tc_ingress include/net/sch_generic.h:535 [inline]
>> BUG: KASAN: slab-out-of-bounds in bpf_push_mac_rcsum net/core/filter.c:1625 [inline]
>> BUG: KASAN: slab-out-of-bounds in ____bpf_skb_vlan_push net/core/filter.c:2446 [inline]
>> BUG: KASAN: slab-out-of-bounds in bpf_skb_vlan_push+0x6b7/0x720 net/core/filter.c:2437
>> Read of size 5 at addr ffff8801b77347d0 by task syz-executor6/6529
> Should be fixed already by:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=58990d1ff3f7896ee341030e9a7c2e4002570683
> #syz test: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
want 2 args (repo, branch), got 1
^ permalink raw reply
* Re: KASAN: slab-out-of-bounds Read in bpf_skb_vlan_push
From: Daniel Borkmann @ 2018-06-13 18:13 UTC (permalink / raw)
To: syzbot, ast, davem, linux-kernel, netdev, syzkaller-bugs
In-Reply-To: <000000000000219be6056e8850fa@google.com>
On 06/13/2018 06:17 PM, syzbot wrote:
> Hello,
>
> syzbot found the following crash on:
>
> HEAD commit: 75d4e704fa8d netdev-FAQ: clarify DaveM's position for stab..
> git tree: bpf-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=1754783f800000
> kernel config: https://syzkaller.appspot.com/x/.config?x=a601a80fec461d44
> dashboard link: https://syzkaller.appspot.com/bug?extid=76de61614cb1abdd73fc
> compiler: gcc (GCC) 8.0.1 20180413 (experimental)
> syzkaller repro:https://syzkaller.appspot.com/x/repro.syz?x=12c1e1bf800000
>
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+76de61614cb1abdd73fc@syzkaller.appspotmail.com
>
> IPv6: ADDRCONF(NETDEV_CHANGE): veth1: link becomes ready
> IPv6: ADDRCONF(NETDEV_CHANGE): veth0: link becomes ready
> 8021q: adding VLAN 0 to HW filter on device team0
> 8021q: adding VLAN 0 to HW filter on device team0
> ==================================================================
> BUG: KASAN: slab-out-of-bounds in skb_at_tc_ingress include/net/sch_generic.h:535 [inline]
> BUG: KASAN: slab-out-of-bounds in bpf_push_mac_rcsum net/core/filter.c:1625 [inline]
> BUG: KASAN: slab-out-of-bounds in ____bpf_skb_vlan_push net/core/filter.c:2446 [inline]
> BUG: KASAN: slab-out-of-bounds in bpf_skb_vlan_push+0x6b7/0x720 net/core/filter.c:2437
> Read of size 5 at addr ffff8801b77347d0 by task syz-executor6/6529
Should be fixed already by:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=58990d1ff3f7896ee341030e9a7c2e4002570683
#syz test: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
^ permalink raw reply
* 4b66af2d("af_key: Always verify length of provided sadb_key")
From: Zubin Mithra @ 2018-06-13 18:02 UTC (permalink / raw)
To: linux-netdev; +Cc: groeck, stable
Hello,
Syzkaller has reported a crash [1] caused by a slab out-of-bounds read in pfkey_add.
Could the following patch be applied to the 4.14, 4.4, 3.18, 3.14, 3.10 and 3.8 stable kernels?
4b66af2d("af_key: Always verify length of provided sadb_key")
[1] https://syzkaller.appspot.com/bug?id=26cb120b31cd24d984fc16da67f50fb375c432a7
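Below is a minimal sketch of the kind of check the commit title describes. It is an illustration, not the upstream patch. Per RFC 2367, sadb_key_len counts 64-bit words including the 8-byte extension header, and sadb_key_bits gives the key length in bits. A well-formed extension must therefore be long enough to hold the header plus the advertised key bytes. The helper names min_key_words() and check_key_len() are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Layout of the key extension header from RFC 2367. */
struct sadb_key {
	uint16_t sadb_key_len;		/* total length in 64-bit words */
	uint16_t sadb_key_exttype;
	uint16_t sadb_key_bits;		/* key length in bits */
	uint16_t sadb_key_reserved;
};

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Minimum sadb_key_len (in 64-bit words) needed for the advertised key. */
static unsigned int min_key_words(const struct sadb_key *key)
{
	unsigned int key_bytes = DIV_ROUND_UP(key->sadb_key_bits, 8u);

	return DIV_ROUND_UP(sizeof(struct sadb_key) + key_bytes, sizeof(uint64_t));
}

/* Reject extensions whose declared length cannot hold the key material. */
static int check_key_len(const struct sadb_key *key)
{
	return min_key_words(key) > key->sadb_key_len ? -EINVAL : 0;
}
```

For example, a 128-bit key needs 8 + 16 = 24 bytes, which is 3 words. A check like this would run before any key bytes are read.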
Thanks,
- Zubin
^ permalink raw reply