* Re: [PATCHv2 net 3/3] samples/bpf: Enable powerpc support
From: Alexei Starovoitov @ 2016-04-04 17:24 UTC (permalink / raw)
To: Naveen N. Rao
Cc: linux-kernel, linuxppc-dev, netdev, Alexei Starovoitov,
Daniel Borkmann, David S . Miller, Ananth N Mavinakayanahalli,
Michael Ellerman
In-Reply-To: <28d2e09a7db94a0a71b3222b3e5ffacb0d4a8dd7.1459789086.git.naveen.n.rao@linux.vnet.ibm.com>
On Mon, Apr 04, 2016 at 10:31:34PM +0530, Naveen N. Rao wrote:
> Add the necessary definitions for building bpf samples on ppc.
>
> Since ppc doesn't store function return address on the stack, modify how
> PT_REGS_RET() and PT_REGS_FP() work.
>
> Also, introduce PT_REGS_IP() to access the instruction pointer.
>
> Cc: Alexei Starovoitov <ast@fb.com>
> Cc: Daniel Borkmann <daniel@iogearbox.net>
> Cc: David S. Miller <davem@davemloft.net>
> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
^ permalink raw reply
* Re: [PATCHv2 net 2/3] samples/bpf: Use llc in PATH, rather than a hardcoded value
From: Alexei Starovoitov @ 2016-04-04 17:24 UTC (permalink / raw)
To: Naveen N. Rao
Cc: linux-kernel, linuxppc-dev, netdev, Alexei Starovoitov,
David S . Miller, Daniel Borkmann, Ananth N Mavinakayanahalli,
Michael Ellerman
In-Reply-To: <000e23c21f475e7106c74a83eb6226c4cb8d4a14.1459789086.git.naveen.n.rao@linux.vnet.ibm.com>
On Mon, Apr 04, 2016 at 10:31:33PM +0530, Naveen N. Rao wrote:
> While at it, remove the generation of .s files and fix some typos in the
> related comment.
>
> Cc: Alexei Starovoitov <ast@fb.com>
> Cc: David S. Miller <davem@davemloft.net>
> Cc: Daniel Borkmann <daniel@iogearbox.net>
> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
^ permalink raw reply
* Re: [RFC PATCH 6/6] ppc: ebpf/jit: Implement JIT compiler for extended BPF
From: Naveen N. Rao @ 2016-04-04 17:09 UTC (permalink / raw)
To: Daniel Borkmann
Cc: Alexei Starovoitov, linux-kernel, linuxppc-dev, Matt Evans, oss,
Paul Mackerras, netdev, David S. Miller
In-Reply-To: <56FEBF51.7090608@iogearbox.net>
On 2016/04/01 08:34PM, Daniel Borkmann wrote:
> On 04/01/2016 08:10 PM, Alexei Starovoitov wrote:
> >On 4/1/16 2:58 AM, Naveen N. Rao wrote:
> >>PPC64 eBPF JIT compiler. Works for both ABIv1 and ABIv2.
> >>
> >>Enable with:
> >>echo 1 > /proc/sys/net/core/bpf_jit_enable
> >>or
> >>echo 2 > /proc/sys/net/core/bpf_jit_enable
> >>
> >>... to see the generated JIT code. This can further be processed with
> >>tools/net/bpf_jit_disasm.
> >>
> >>With CONFIG_TEST_BPF=m and 'modprobe test_bpf':
> >>test_bpf: Summary: 291 PASSED, 0 FAILED, [234/283 JIT'ed]
> >>
> >>... on both ppc64 BE and LE.
> >>
> >>The details of the approach are documented through various comments in
> >>the code, as are the TODOs. Some of the prominent TODOs include
> >>implementing BPF tail calls and skb loads.
> >>
> >>Cc: Matt Evans <matt@ozlabs.org>
> >>Cc: Michael Ellerman <mpe@ellerman.id.au>
> >>Cc: Paul Mackerras <paulus@samba.org>
> >>Cc: Alexei Starovoitov <ast@fb.com>
> >>Cc: "David S. Miller" <davem@davemloft.net>
> >>Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
> >>Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
> >>---
> >> arch/powerpc/include/asm/ppc-opcode.h | 19 +-
> >> arch/powerpc/net/Makefile | 4 +
> >> arch/powerpc/net/bpf_jit.h | 66 ++-
> >> arch/powerpc/net/bpf_jit64.h | 58 +++
> >> arch/powerpc/net/bpf_jit_comp64.c | 828 ++++++++++++++++++++++++++++++++++
> >> 5 files changed, 973 insertions(+), 2 deletions(-)
> >> create mode 100644 arch/powerpc/net/bpf_jit64.h
> >> create mode 100644 arch/powerpc/net/bpf_jit_comp64.c
> >...
> >>-#ifdef CONFIG_PPC64
> >>+#if defined(CONFIG_PPC64) && (!defined(_CALL_ELF) || _CALL_ELF != 2)
> >
> >impressive stuff!
>
> +1, awesome to see another one!
Thanks!
>
> >Everything nicely documented. Could you add few words for the above
> >condition as well ?
> >Or may be a new macro, since it occurs many times?
> >What are these _CALL_ELF == 2 and != 2 conditions mean? ppc ABIs ?
Yes, there are 2 ABIs: ppc64 (ABIv1) -- big endian and the recently
introduced ppc64le (ABIv2) which is currently only little endian. There
is also ppc32...
Good suggestion about using a macro. I will put out a patch for that.
> >Will there ever be v3 ?
Hope not! ;)
>
> Minor TODO would also be to convert to use bpf_jit_binary_alloc() and
> bpf_jit_binary_free() API for the image, which is done by other eBPF
> jits, too.
Sure. I will make that change.
>
> >So far most of the bpf jits were going via net-next tree, but if
> >in this case no changes to the core is necessary then I guess it's fine
> >to do it via powerpc tree. What's your plan?
I initially thought this has to go through the powerpc tree. I don't
really have a preference and I'll allow the maintainers to take a call
on that. I do however need a review of the JIT code from Michael
Ellerman/Paul Mackerras.
- Naveen
^ permalink raw reply
* [PATCHv2 net 3/3] samples/bpf: Enable powerpc support
From: Naveen N. Rao @ 2016-04-04 17:01 UTC (permalink / raw)
To: linux-kernel, linuxppc-dev, netdev
Cc: Alexei Starovoitov, Daniel Borkmann, David S . Miller,
Ananth N Mavinakayanahalli, Michael Ellerman
In-Reply-To: <5ab5eec91d2d0bc7f221d714ef84afac83b2604b.1459789086.git.naveen.n.rao@linux.vnet.ibm.com>
Add the necessary definitions for building bpf samples on ppc.
Since ppc doesn't store function return address on the stack, modify how
PT_REGS_RET() and PT_REGS_FP() work.
Also, introduce PT_REGS_IP() to access the instruction pointer.
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
---
v2: updated macros using ({ }) gcc extension as per Alexei
samples/bpf/bpf_helpers.h | 26 ++++++++++++++++++++++++++
samples/bpf/spintest_kern.c | 2 +-
samples/bpf/tracex2_kern.c | 4 ++--
samples/bpf/tracex4_kern.c | 2 +-
4 files changed, 30 insertions(+), 4 deletions(-)
diff --git a/samples/bpf/bpf_helpers.h b/samples/bpf/bpf_helpers.h
index 9363500..7904a2a 100644
--- a/samples/bpf/bpf_helpers.h
+++ b/samples/bpf/bpf_helpers.h
@@ -82,6 +82,7 @@ static int (*bpf_l4_csum_replace)(void *ctx, int off, int from, int to, int flag
#define PT_REGS_FP(x) ((x)->bp)
#define PT_REGS_RC(x) ((x)->ax)
#define PT_REGS_SP(x) ((x)->sp)
+#define PT_REGS_IP(x) ((x)->ip)
#elif defined(__s390x__)
@@ -94,6 +95,7 @@ static int (*bpf_l4_csum_replace)(void *ctx, int off, int from, int to, int flag
#define PT_REGS_FP(x) ((x)->gprs[11]) /* Works only with CONFIG_FRAME_POINTER */
#define PT_REGS_RC(x) ((x)->gprs[2])
#define PT_REGS_SP(x) ((x)->gprs[15])
+#define PT_REGS_IP(x) ((x)->ip)
#elif defined(__aarch64__)
@@ -106,6 +108,30 @@ static int (*bpf_l4_csum_replace)(void *ctx, int off, int from, int to, int flag
#define PT_REGS_FP(x) ((x)->regs[29]) /* Works only with CONFIG_FRAME_POINTER */
#define PT_REGS_RC(x) ((x)->regs[0])
#define PT_REGS_SP(x) ((x)->sp)
+#define PT_REGS_IP(x) ((x)->pc)
+
+#elif defined(__powerpc__)
+
+#define PT_REGS_PARM1(x) ((x)->gpr[3])
+#define PT_REGS_PARM2(x) ((x)->gpr[4])
+#define PT_REGS_PARM3(x) ((x)->gpr[5])
+#define PT_REGS_PARM4(x) ((x)->gpr[6])
+#define PT_REGS_PARM5(x) ((x)->gpr[7])
+#define PT_REGS_RC(x) ((x)->gpr[3])
+#define PT_REGS_SP(x) ((x)->sp)
+#define PT_REGS_IP(x) ((x)->nip)
#endif
+
+#ifdef __powerpc__
+#define BPF_KPROBE_READ_RET_IP(ip, ctx) ({ (ip) = (ctx)->link; })
+#define BPF_KRETPROBE_READ_RET_IP BPF_KPROBE_READ_RET_IP
+#else
+#define BPF_KPROBE_READ_RET_IP(ip, ctx) ({ \
+ bpf_probe_read(&(ip), sizeof(ip), (void *)PT_REGS_RET(ctx)); })
+#define BPF_KRETPROBE_READ_RET_IP(ip, ctx) ({ \
+ bpf_probe_read(&(ip), sizeof(ip), \
+ (void *)(PT_REGS_FP(ctx) + sizeof(ip))); })
+#endif
+
#endif
diff --git a/samples/bpf/spintest_kern.c b/samples/bpf/spintest_kern.c
index 4b27619..ce0167d 100644
--- a/samples/bpf/spintest_kern.c
+++ b/samples/bpf/spintest_kern.c
@@ -34,7 +34,7 @@ struct bpf_map_def SEC("maps") stackmap = {
#define PROG(foo) \
int foo(struct pt_regs *ctx) \
{ \
- long v = ctx->ip, *val; \
+ long v = PT_REGS_IP(ctx), *val; \
\
val = bpf_map_lookup_elem(&my_map, &v); \
bpf_map_update_elem(&my_map, &v, &v, BPF_ANY); \
diff --git a/samples/bpf/tracex2_kern.c b/samples/bpf/tracex2_kern.c
index 09c1adc..6d6eefd 100644
--- a/samples/bpf/tracex2_kern.c
+++ b/samples/bpf/tracex2_kern.c
@@ -27,10 +27,10 @@ int bpf_prog2(struct pt_regs *ctx)
long init_val = 1;
long *value;
- /* x64/s390x specific: read ip of kfree_skb caller.
+ /* read ip of kfree_skb caller.
* non-portable version of __builtin_return_address(0)
*/
- bpf_probe_read(&loc, sizeof(loc), (void *)PT_REGS_RET(ctx));
+ BPF_KPROBE_READ_RET_IP(loc, ctx);
value = bpf_map_lookup_elem(&my_map, &loc);
if (value)
diff --git a/samples/bpf/tracex4_kern.c b/samples/bpf/tracex4_kern.c
index ac46714..6dd8e38 100644
--- a/samples/bpf/tracex4_kern.c
+++ b/samples/bpf/tracex4_kern.c
@@ -40,7 +40,7 @@ int bpf_prog2(struct pt_regs *ctx)
long ip = 0;
/* get ip address of kmem_cache_alloc_node() caller */
- bpf_probe_read(&ip, sizeof(ip), (void *)(PT_REGS_FP(ctx) + sizeof(ip)));
+ BPF_KRETPROBE_READ_RET_IP(ip, ctx);
struct pair v = {
.val = bpf_ktime_get_ns(),
--
2.7.4
^ permalink raw reply related
* [PATCHv2 net 2/3] samples/bpf: Use llc in PATH, rather than a hardcoded value
From: Naveen N. Rao @ 2016-04-04 17:01 UTC (permalink / raw)
To: linux-kernel, linuxppc-dev, netdev
Cc: Alexei Starovoitov, David S . Miller, Daniel Borkmann,
Ananth N Mavinakayanahalli, Michael Ellerman
In-Reply-To: <5ab5eec91d2d0bc7f221d714ef84afac83b2604b.1459789086.git.naveen.n.rao@linux.vnet.ibm.com>
While at it, remove the generation of .s files and fix some typos in the
related comment.
Cc: Alexei Starovoitov <ast@fb.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
---
v2: removed generation of .s files
samples/bpf/Makefile | 12 +++---------
1 file changed, 3 insertions(+), 9 deletions(-)
diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index 502c9fc..b820cc9 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -76,16 +76,10 @@ HOSTLOADLIBES_offwaketime += -lelf
HOSTLOADLIBES_spintest += -lelf
HOSTLOADLIBES_map_perf_test += -lelf -lrt
-# point this to your LLVM backend with bpf support
-LLC=$(srctree)/tools/bpf/llvm/bld/Debug+Asserts/bin/llc
-
-# asm/sysreg.h inline assmbly used by it is incompatible with llvm.
-# But, ehere is not easy way to fix it, so just exclude it since it is
+# asm/sysreg.h - inline assembly used by it is incompatible with llvm.
+# But, there is no easy way to fix it, so just exclude it since it is
# useless for BPF samples.
$(obj)/%.o: $(src)/%.c
clang $(NOSTDINC_FLAGS) $(LINUXINCLUDE) $(EXTRA_CFLAGS) \
-D__KERNEL__ -D__ASM_SYSREG_H -Wno-unused-value -Wno-pointer-sign \
- -O2 -emit-llvm -c $< -o -| $(LLC) -march=bpf -filetype=obj -o $@
- clang $(NOSTDINC_FLAGS) $(LINUXINCLUDE) $(EXTRA_CFLAGS) \
- -D__KERNEL__ -D__ASM_SYSREG_H -Wno-unused-value -Wno-pointer-sign \
- -O2 -emit-llvm -c $< -o -| $(LLC) -march=bpf -filetype=asm -o $@.s
+ -O2 -emit-llvm -c $< -o -| llc -march=bpf -filetype=obj -o $@
--
2.7.4
^ permalink raw reply related
* [PATCHv2 net 1/3] samples/bpf: Fix build breakage with map_perf_test_user.c
From: Naveen N. Rao @ 2016-04-04 17:01 UTC (permalink / raw)
To: linux-kernel, linuxppc-dev, netdev
Cc: Alexei Starovoitov, Daniel Borkmann, David S . Miller,
Ananth N Mavinakayanahalli, Michael Ellerman
Building BPF samples is failing with the below error:
samples/bpf/map_perf_test_user.c: In function ‘main’:
samples/bpf/map_perf_test_user.c:134:9: error: variable ‘r’ has
initializer but incomplete type
struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
^
samples/bpf/map_perf_test_user.c:134:21: error: ‘RLIM_INFINITY’
undeclared (first use in this function)
struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
^
samples/bpf/map_perf_test_user.c:134:21: note: each undeclared
identifier is reported only once for each function it appears in
samples/bpf/map_perf_test_user.c:134:9: warning: excess elements in
struct initializer [enabled by default]
struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
^
samples/bpf/map_perf_test_user.c:134:9: warning: (near initialization
for ‘r’) [enabled by default]
samples/bpf/map_perf_test_user.c:134:9: warning: excess elements in
struct initializer [enabled by default]
samples/bpf/map_perf_test_user.c:134:9: warning: (near initialization
for ‘r’) [enabled by default]
samples/bpf/map_perf_test_user.c:134:16: error: storage size of ‘r’
isn’t known
struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
^
samples/bpf/map_perf_test_user.c:139:2: warning: implicit declaration of
function ‘setrlimit’ [-Wimplicit-function-declaration]
setrlimit(RLIMIT_MEMLOCK, &r);
^
samples/bpf/map_perf_test_user.c:139:12: error: ‘RLIMIT_MEMLOCK’
undeclared (first use in this function)
setrlimit(RLIMIT_MEMLOCK, &r);
^
samples/bpf/map_perf_test_user.c:134:16: warning: unused variable ‘r’
[-Wunused-variable]
struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
^
make[2]: *** [samples/bpf/map_perf_test_user.o] Error 1
Fix this by including the necessary header file.
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
---
v2: no changes
samples/bpf/map_perf_test_user.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/samples/bpf/map_perf_test_user.c b/samples/bpf/map_perf_test_user.c
index 95af56e..3147377 100644
--- a/samples/bpf/map_perf_test_user.c
+++ b/samples/bpf/map_perf_test_user.c
@@ -17,6 +17,7 @@
#include <linux/bpf.h>
#include <string.h>
#include <time.h>
+#include <sys/resource.h>
#include "libbpf.h"
#include "bpf_load.h"
--
2.7.4
^ permalink raw reply related
* [net PATCH v2 2/2] ipv4/GRO: Make GRO conform to RFC 6864
From: Alexander Duyck @ 2016-04-04 16:31 UTC (permalink / raw)
To: herbert, tom, jesse, alexander.duyck, edumazet, netdev, davem
In-Reply-To: <20160404162545.14332.653.stgit@localhost.localdomain>
RFC 6864 states that the IPv4 ID field MUST NOT be used for purposes other
than fragmentation and reassembly. Currently we are looking at this field
as a way of identifying what frames can be aggregated and which cannot for
GRO. While this is valid for frames that do not have DF set, it is invalid
to do so if the bit is set.
In addition we were generating IPv4 ID collisions when 2 or more flows were
interleaved over the same tunnel. To prevent that we store the result of
all IP ID checks via a "|=" instead of overwriting previous values.
With this patch we support two different approaches for the IP ID field.
The first is a non-incrementing IP ID with DF bit set. In such a case we
simply won't write to the flush_id field in the GRO context block. The
other option is the legacy option in which the IP ID must increment by 1
for every packet we aggregate.
In the case of the non-incrementing IP ID we will end up losing the data
that the IP ID is fixed. However as per RFC 6864 we should be able to
write any value into the IP ID when the DF bit is set so this should cause
minimal harm.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
---
v2: Updated patch so that we now only support one of two options. Either
the IP ID is fixed with DF bit set, or the IP ID is incrementing. That
allows us to support the fixed ID case as occurs with IPv6 to IPv4
header translation and what is likely already out there for some
devices with tunnel headers.
net/core/dev.c | 1 +
net/ipv4/af_inet.c | 25 ++++++++++++++++++-------
net/ipv6/ip6_offload.c | 3 ---
3 files changed, 19 insertions(+), 10 deletions(-)
diff --git a/net/core/dev.c b/net/core/dev.c
index 77a71cd68535..3429632398a4 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4352,6 +4352,7 @@ static void gro_list_prepare(struct napi_struct *napi, struct sk_buff *skb)
unsigned long diffs;
NAPI_GRO_CB(p)->flush = 0;
+ NAPI_GRO_CB(p)->flush_id = 0;
if (hash != skb_get_hash_raw(p)) {
NAPI_GRO_CB(p)->same_flow = 0;
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 9e481992dbae..33f6335448a2 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1324,6 +1324,7 @@ static struct sk_buff **inet_gro_receive(struct sk_buff **head,
for (p = *head; p; p = p->next) {
struct iphdr *iph2;
+ u16 flush_id;
if (!NAPI_GRO_CB(p)->same_flow)
continue;
@@ -1347,14 +1348,24 @@ static struct sk_buff **inet_gro_receive(struct sk_buff **head,
(iph->tos ^ iph2->tos) |
((iph->frag_off ^ iph2->frag_off) & htons(IP_DF));
- /* Save the IP ID check to be included later when we get to
- * the transport layer so only the inner most IP ID is checked.
- * This is because some GSO/TSO implementations do not
- * correctly increment the IP ID for the outer hdrs.
- */
- NAPI_GRO_CB(p)->flush_id =
- ((u16)(ntohs(iph2->id) + NAPI_GRO_CB(p)->count) ^ id);
NAPI_GRO_CB(p)->flush |= flush;
+
+ /* We must save the offset as it is possible to have multiple
+ * flows using the same protocol and address pairs so we
+ * need to wait until we can validate this is part of the
+ * same flow with a 5-tuple or better to avoid unnecessary
+ * collisions between flows. We can support one of two
+ * possible scenarios, either a fixed value with DF bit set
+ * or an incrementing value with DF either set or unset.
+ * In the case of a fixed value we will end up losing the
+ * data that the IP ID was a fixed value, however per RFC
+ * 6864 in such a case the actual value of the IP ID is
+ * meant to be ignored anyway.
+ */
+ flush_id = (u16)(id - ntohs(iph2->id));
+ if (flush_id || !(iph2->frag_off & htons(IP_DF)))
+ NAPI_GRO_CB(p)->flush_id |= flush_id ^
+ NAPI_GRO_CB(p)->count;
}
NAPI_GRO_CB(skb)->flush |= flush;
diff --git a/net/ipv6/ip6_offload.c b/net/ipv6/ip6_offload.c
index 82e9f3076028..9aa53f64dffd 100644
--- a/net/ipv6/ip6_offload.c
+++ b/net/ipv6/ip6_offload.c
@@ -238,9 +238,6 @@ static struct sk_buff **ipv6_gro_receive(struct sk_buff **head,
/* flush if Traffic Class fields are different */
NAPI_GRO_CB(p)->flush |= !!(first_word & htonl(0x0FF00000));
NAPI_GRO_CB(p)->flush |= flush;
-
- /* Clear flush_id, there's really no concept of ID in IPv6. */
- NAPI_GRO_CB(p)->flush_id = 0;
}
NAPI_GRO_CB(skb)->flush |= flush;
^ permalink raw reply related
* [net PATCH v2 1/2] GRE: Disable segmentation offloads w/ CSUM and we are encapsulated via FOU
From: Alexander Duyck @ 2016-04-04 16:28 UTC (permalink / raw)
To: herbert, tom, jesse, alexander.duyck, edumazet, netdev, davem
In-Reply-To: <20160404162545.14332.653.stgit@localhost.localdomain>
This patch fixes an issue I found in which we were dropping frames if we
had enabled checksums on GRE headers that were encapsulated by either FOU
or GUE. Without this patch I was barely able to get 1 Gb/s of throughput.
With this patch applied I am now at least getting around 6 Gb/s.
The issue is due to the fact that with FOU or GUE applied we do not provide
a transport offset pointing to the GRE header, nor do we offload it in
software as the GRE header is completely skipped by GSO and treated like a
VXLAN or GENEVE type header. As such we need to prevent the stack from
generating it and also prevent GRE from generating it via any interface we
create.
Fixes: c3483384ee511 ("gro: Allow tunnel stacking in the case of FOU/GUE")
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
---
include/linux/netdevice.h | 5 ++++-
net/core/dev.c | 1 +
net/ipv4/fou.c | 6 ++++++
net/ipv4/gre_offload.c | 8 ++++++++
net/ipv4/ip_gre.c | 13 ++++++++++---
5 files changed, 29 insertions(+), 4 deletions(-)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index cb0d5d09c2e4..8395308a2445 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2120,7 +2120,10 @@ struct napi_gro_cb {
/* Used in foo-over-udp, set in udp[46]_gro_receive */
u8 is_ipv6:1;
- /* 7 bit hole */
+ /* Used in GRE, set in fou/gue_gro_receive */
+ u8 is_fou:1;
+
+ /* 6 bit hole */
/* used to support CHECKSUM_COMPLETE for tunneling protocols */
__wsum csum;
diff --git a/net/core/dev.c b/net/core/dev.c
index b9bcbe77d913..77a71cd68535 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4439,6 +4439,7 @@ static enum gro_result dev_gro_receive(struct napi_struct *napi, struct sk_buff
NAPI_GRO_CB(skb)->flush = 0;
NAPI_GRO_CB(skb)->free = 0;
NAPI_GRO_CB(skb)->encap_mark = 0;
+ NAPI_GRO_CB(skb)->is_fou = 0;
NAPI_GRO_CB(skb)->gro_remcsum_start = 0;
/* Setup for GRO checksum validation */
diff --git a/net/ipv4/fou.c b/net/ipv4/fou.c
index 5a94aea280d3..a39068b4a4d9 100644
--- a/net/ipv4/fou.c
+++ b/net/ipv4/fou.c
@@ -203,6 +203,9 @@ static struct sk_buff **fou_gro_receive(struct sk_buff **head,
*/
NAPI_GRO_CB(skb)->encap_mark = 0;
+ /* Flag this frame as already having an outer encap header */
+ NAPI_GRO_CB(skb)->is_fou = 1;
+
rcu_read_lock();
offloads = NAPI_GRO_CB(skb)->is_ipv6 ? inet6_offloads : inet_offloads;
ops = rcu_dereference(offloads[proto]);
@@ -368,6 +371,9 @@ static struct sk_buff **gue_gro_receive(struct sk_buff **head,
*/
NAPI_GRO_CB(skb)->encap_mark = 0;
+ /* Flag this frame as already having an outer encap header */
+ NAPI_GRO_CB(skb)->is_fou = 1;
+
rcu_read_lock();
offloads = NAPI_GRO_CB(skb)->is_ipv6 ? inet6_offloads : inet_offloads;
ops = rcu_dereference(offloads[guehdr->proto_ctype]);
diff --git a/net/ipv4/gre_offload.c b/net/ipv4/gre_offload.c
index c47539d04b88..6a5bd4317866 100644
--- a/net/ipv4/gre_offload.c
+++ b/net/ipv4/gre_offload.c
@@ -150,6 +150,14 @@ static struct sk_buff **gre_gro_receive(struct sk_buff **head,
if ((greh->flags & ~(GRE_KEY|GRE_CSUM)) != 0)
goto out;
+ /* We can only support GRE_CSUM if we can track the location of
+ * the GRE header. In the case of FOU/GUE we cannot because the
+ * outer UDP header displaces the GRE header leaving us in a state
+ * of limbo.
+ */
+ if ((greh->flags & GRE_CSUM) && NAPI_GRO_CB(skb)->is_fou)
+ goto out;
+
type = greh->protocol;
rcu_read_lock();
diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index 31936d387cfd..af5d1f38217f 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -862,9 +862,16 @@ static void __gre_tunnel_init(struct net_device *dev)
dev->hw_features |= GRE_FEATURES;
if (!(tunnel->parms.o_flags & TUNNEL_SEQ)) {
- /* TCP offload with GRE SEQ is not supported. */
- dev->features |= NETIF_F_GSO_SOFTWARE;
- dev->hw_features |= NETIF_F_GSO_SOFTWARE;
+ /* TCP offload with GRE SEQ is not supported, nor
+ * can we support 2 levels of outer headers requiring
+ * an update.
+ */
+ if (!(tunnel->parms.o_flags & TUNNEL_CSUM) ||
+ (tunnel->encap.type == TUNNEL_ENCAP_NONE)) {
+ dev->features |= NETIF_F_GSO_SOFTWARE;
+ dev->hw_features |= NETIF_F_GSO_SOFTWARE;
+ }
+
/* Can use a lockless transmit, unless we generate
* output sequences
*/
^ permalink raw reply related
* [net PATCH v2 0/2] Fixes for GRO and GRE tunnels
From: Alexander Duyck @ 2016-04-04 16:27 UTC (permalink / raw)
To: herbert, tom, jesse, alexander.duyck, edumazet, netdev, davem
This pair of patches addresses a few issues I have discovered over the last
week or so concerning GRO and GRE tunnels.
The first patch addresses an item I called out as an issue with FOU/GUE
encapsulating GRE, and I finally had a chance to test it and verify that
the code concerning it was broken so I took the opportunity to fix it so
that we cannot generate a FOU/GUE frame that is encapsulating a GRE tunnel
with checksum while requesting TSO/GSO for the frame.
The second patch actually addresses something I realized was an issue if we
feed a tunnel through GRO and back out through GSO. Specifically it was
possible for GRO to generate overlapping IPv4 ID ranges as the outer IP IDs
were being ignored for tunnels. Ignoring the IP IDs like this should only
be valid if the DF bit is set. This is normally the case for IPIP, SIT,
and GRE tunnels, but not so for UDP tunnels. In the case that the DF bit
is not set we store off the fact that there was a delta from what we were
expecting and when we hit the inner-most header we validate the value as to
avoid generating a frame which could lead to an IP ID collision on packets
that could eventually be fragmented. A side effect is that the inner-most
IP ID test is relaxed as well, but the worst case scenario is that we GRO a
frame with a throw-away ID sequence anyway so if anything segmenting such a
frame with the wrong IP IDs should have no negative effects.
v2: Updated RFC 6864 patch so that we only support one of two modes. Either
the IP ID is incrementing, or it is fixed at a specific value with DF bit
set.
---
Alexander Duyck (2):
GRE: Disable segmentation offloads w/ CSUM and we are encapsulated via FOU
ipv4/GRO: Make GRO conform to RFC 6864
include/linux/netdevice.h | 5 ++++-
net/core/dev.c | 2 ++
net/ipv4/af_inet.c | 25 ++++++++++++++++++-------
net/ipv4/fou.c | 6 ++++++
net/ipv4/gre_offload.c | 8 ++++++++
net/ipv4/ip_gre.c | 13 ++++++++++---
net/ipv6/ip6_offload.c | 3 ---
7 files changed, 48 insertions(+), 14 deletions(-)
^ permalink raw reply
* [PATCH RFC 5/5] ARM: dts: sun8i: Enable Ethernet controller on the Orange PI PC
From: Chen-Yu Tsai @ 2016-04-04 16:22 UTC (permalink / raw)
To: Maxime Ripard, Florian Fainelli, Rob Herring, Pawel Moll,
Mark Rutland, Ian Campbell, Kumar Gala
Cc: Chen-Yu Tsai, netdev, linux-arm-kernel, linux-kernel, devicetree,
LABBE Corentin
In-Reply-To: <1459786954-12649-1-git-send-email-wens@csie.org>
The Orange PI PC uses the H3's internal Ethernet PHY with the EMAC
Ethernet controller.
Set a proper address for the PHY and enable the EMAC.
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
---
This patch depends on "ARM: dts: sun8i-h3: Add Ethernet controller device",
which uses an binding still in development.
Do not merge.
---
arch/arm/boot/dts/sun8i-h3-orangepi-pc.dts | 14 ++++++++++++++
1 file changed, 14 insertions(+)
diff --git a/arch/arm/boot/dts/sun8i-h3-orangepi-pc.dts b/arch/arm/boot/dts/sun8i-h3-orangepi-pc.dts
index daf50b9a6657..f01e10df812a 100644
--- a/arch/arm/boot/dts/sun8i-h3-orangepi-pc.dts
+++ b/arch/arm/boot/dts/sun8i-h3-orangepi-pc.dts
@@ -102,6 +102,20 @@
status = "okay";
};
+&ephy {
+ allwinner,ephy-addr = <0x1>;
+};
+
+&emac {
+ phy = <&phy1>;
+ phy-mode = "mii";
+ status = "okay";
+
+ phy1: ethernet-phy@1 {
+ reg = <1>;
+ };
+};
+
&ir {
pinctrl-names = "default";
pinctrl-0 = <&ir_pins_a>;
--
2.7.0
^ permalink raw reply related
* [PATCH RFC 4/5] ARM: dts: sun8i-h3: Add Ethernet controller device node to sun8i-h3.dtsi
From: Chen-Yu Tsai @ 2016-04-04 16:22 UTC (permalink / raw)
To: Maxime Ripard, Florian Fainelli, Rob Herring, Pawel Moll,
Mark Rutland, Ian Campbell, Kumar Gala
Cc: devicetree, netdev, linux-kernel, Chen-Yu Tsai, LABBE Corentin,
linux-arm-kernel
In-Reply-To: <1459786954-12649-1-git-send-email-wens@csie.org>
From: LABBE Corentin <clabbe.montjoie@gmail.com>
The Allwinner H3 SoC has a gigabit Ethernet controller.
Signed-off-by: LABBE Corentin <clabbe.montjoie@gmail.com>
[wens@csie.org: drop pinmux; update clocks/resets]
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
---
This is not a stable binding. Do not merge.
---
arch/arm/boot/dts/sun8i-h3.dtsi | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/arch/arm/boot/dts/sun8i-h3.dtsi b/arch/arm/boot/dts/sun8i-h3.dtsi
index 9a28aeba9bc6..7749af6354bb 100644
--- a/arch/arm/boot/dts/sun8i-h3.dtsi
+++ b/arch/arm/boot/dts/sun8i-h3.dtsi
@@ -635,6 +635,18 @@
status = "disabled";
};
+ emac: ethernet@1c30000 {
+ compatible = "allwinner,sun8i-h3-emac";
+ reg = <0x01c30000 0x1000>;
+ interrupts = <GIC_SPI 82 IRQ_TYPE_LEVEL_HIGH>;
+ resets = <&ahb_rst 17>;
+ reset-names = "ahb";
+ clocks = <&bus_gates 17>, <&ephy>;
+ clock-names = "ahb", "tx";
+ #address-cells = <1>;
+ #size-cells = <0>;
+ };
+
gic: interrupt-controller@01c81000 {
compatible = "arm,cortex-a7-gic", "arm,cortex-a15-gic";
reg = <0x01c81000 0x1000>,
--
2.7.0
^ permalink raw reply related
* [PATCH RFC 3/5] ARM: dts: sun8i-h3: Add H3 Ethernet PHY device node to sun8i-h3.dtsi
From: Chen-Yu Tsai @ 2016-04-04 16:22 UTC (permalink / raw)
To: Maxime Ripard, Florian Fainelli, Rob Herring, Pawel Moll,
Mark Rutland, Ian Campbell, Kumar Gala
Cc: Chen-Yu Tsai, netdev-u79uwXL29TY76Z2rM5mHXA,
linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
devicetree-u79uwXL29TY76Z2rM5mHXA, LABBE Corentin
In-Reply-To: <1459786954-12649-1-git-send-email-wens-jdAy2FN1RRM@public.gmane.org>
The Allwinner H3 SoC incorporates an Ethernet PHY, whose controls are
mapped to a system control register.
Signed-off-by: Chen-Yu Tsai <wens-jdAy2FN1RRM@public.gmane.org>
---
arch/arm/boot/dts/sun8i-h3.dtsi | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/arch/arm/boot/dts/sun8i-h3.dtsi b/arch/arm/boot/dts/sun8i-h3.dtsi
index 4a4926b0b0ed..9a28aeba9bc6 100644
--- a/arch/arm/boot/dts/sun8i-h3.dtsi
+++ b/arch/arm/boot/dts/sun8i-h3.dtsi
@@ -323,6 +323,15 @@
#size-cells = <1>;
ranges;
+ ephy: ethernet-phy@01c00030 {
+ compatible = "allwinner,sun8i-h3-ephy";
+ reg = <0x01c00030 0x4>;
+ clocks = <&bus_gates 128>;
+ resets = <&ahb_rst 66>;
+ #clock-cells = <0>;
+ clock-output-names = "emac_tx";
+ };
+
dma: dma-controller@01c02000 {
compatible = "allwinner,sun8i-h3-dma";
reg = <0x01c02000 0x1000>;
--
2.7.0
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply related
* [PATCH RFC 2/5] net: phy: sun8i-h3-ephy: Add driver for Allwinner H3 Ethernet PHY
From: Chen-Yu Tsai @ 2016-04-04 16:22 UTC (permalink / raw)
To: Maxime Ripard, Florian Fainelli, Rob Herring, Pawel Moll,
Mark Rutland, Ian Campbell, Kumar Gala
Cc: Chen-Yu Tsai, netdev, linux-arm-kernel, linux-kernel, devicetree,
LABBE Corentin
In-Reply-To: <1459786954-12649-1-git-send-email-wens@csie.org>
The Allwinner H3 SoC incorporates an Ethernet PHY. This is enabled and
configured through a memory mapped hardware register.
This same register also configures the MAC interface mode and TX clock
source. Also covered by the register, but not supported in this driver,
are TX/RX clock delay chains and inverters, and an RMII module.
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
---
drivers/net/phy/Kconfig | 8 ++
drivers/net/phy/Makefile | 1 +
drivers/net/phy/sun8i-h3-ephy.c | 266 ++++++++++++++++++++++++++++++++++++++++
3 files changed, 275 insertions(+)
create mode 100644 drivers/net/phy/sun8i-h3-ephy.c
diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig
index 6dad9a9c356c..f4b9eb6e9114 100644
--- a/drivers/net/phy/Kconfig
+++ b/drivers/net/phy/Kconfig
@@ -276,3 +276,11 @@ endif # PHYLIB
config MICREL_KS8995MA
tristate "Micrel KS8995MA 5-ports 10/100 managed Ethernet switch"
depends on SPI
+
+config SUN8I_H3_EPHY
+ tristate "Allwinner sun8i H3 Ethernet PHY"
+ depends on OF && HAS_IOMEM && COMMON_CLK && RESET_CONTROLLER
+ depends on MACH_SUN8I || COMPILE_TEST
+ help
+ This modules provides a driver for the internal Ethernet PHY
+ and Ethernet MAC controls found in the Allwinner H3 SoC.
diff --git a/drivers/net/phy/Makefile b/drivers/net/phy/Makefile
index fcdbb9299fab..7cec3aceb518 100644
--- a/drivers/net/phy/Makefile
+++ b/drivers/net/phy/Makefile
@@ -44,3 +44,4 @@ obj-$(CONFIG_MDIO_MOXART) += mdio-moxart.o
obj-$(CONFIG_MDIO_BCM_UNIMAC) += mdio-bcm-unimac.o
obj-$(CONFIG_MICROCHIP_PHY) += microchip.o
obj-$(CONFIG_MDIO_BCM_IPROC) += mdio-bcm-iproc.o
+obj-$(CONFIG_SUN8I_H3_EPHY) += sun8i-h3-ephy.o
diff --git a/drivers/net/phy/sun8i-h3-ephy.c b/drivers/net/phy/sun8i-h3-ephy.c
new file mode 100644
index 000000000000..bf2703bee024
--- /dev/null
+++ b/drivers/net/phy/sun8i-h3-ephy.c
@@ -0,0 +1,266 @@
+/*
+ * Allwinner sun8i H3 E(thernet) PHY driver
+ *
+ * Copyright (C) 2016 Chen-Yu Tsai <wens@csie.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ */
+
+#include <linux/clk.h>
+#include <linux/clk-provider.h>
+#include <linux/err.h>
+#include <linux/io.h>
+#include <linux/module.h>
+#include <linux/platform_device.h>
+#include <linux/reset.h>
+#include <linux/spinlock.h>
+
+#define REG_PHY_ADDR_SHIFT 20
+#define REG_PHY_ADDR_MASK GENMASK(4, 0)
+#define REG_LED_POL BIT(17) /* 1: active low, 0: active high */
+#define REG_SHUTDOWN BIT(16) /* 1: shutdown, 0: power up */
+#define REG_PHY_SELECT BIT(15) /* 1: internal PHY, 0: external PHY */
+
+#define REG_DEFAULT_VALUE 0x58000
+#define REG_DEFAULT_MASK GENMASK(31, 15)
+
+#define REG_GPIT 2
+#define REG_ETCS_MASK 0x3
+#define SUN8I_H3_EMAC_PARENTS 2
+#define MII_PHY_CLK_RATE 25000000
+#define INT_TX_CLK_RATE 125000000
+#define MII_PHY_CLK_NAME "mii_phy_tx"
+#define INT_TX_CLK_NAME "int_tx"
+
+struct sun8i_h3_ephy {
+ struct device *dev;
+ void __iomem *reg;
+ struct clk *clk;
+ struct reset_control *reset;
+ struct clk *mac_clks[SUN8I_H3_EMAC_PARENTS + 1];
+};
+
+static DEFINE_SPINLOCK(sun8i_h3_ephy_lock);
+
+/* The EMAC clock/interface controls are much like those found on the A20,
+ * which is supported by sun7i-a20-gmac-clk. The H3 adds an RMII module,
+ * enabled by bit 13. Unfortunately vendor code and documentation don't
+ * explain how it should work with the other bits, and no hardware
+ * actually uses it.
+ */
+static u32 sun8i_h3_emac_mux_table[SUN8I_H3_EMAC_PARENTS] = {
+ 0x0, /* Select TX clock from MII PHY */
+ 0x2, /* Select internal TX clock (for RGMII) */
+};
+
+static const char *sun8i_h3_emac_mux_parents[SUN8I_H3_EMAC_PARENTS] = {
+ MII_PHY_CLK_NAME,
+ INT_TX_CLK_NAME,
+};
+
+int sun8i_h3_ephy_register_clocks(struct sun8i_h3_ephy *phy)
+{
+ struct device *dev = phy->dev;
+ struct device_node *np = dev->of_node;
+ const char *clk_name;
+ struct clk_mux *mux;
+ struct clk_gate *gate;
+ int ret;
+
+ if (of_property_read_string(np, "clock-output-names", &clk_name)) {
+ dev_err(phy->dev, "No clock output name given\n");
+ return -EINVAL;
+ }
+
+ /* allocate mux and gate clock structs */
+ mux = devm_kzalloc(dev, sizeof(struct clk_mux), GFP_KERNEL);
+ if (!mux)
+ return -ENOMEM;
+
+ gate = devm_kzalloc(dev, sizeof(struct clk_gate), GFP_KERNEL);
+ if (!gate)
+ return -ENOMEM;
+
+ /* Register fixed clocks to use as parents */
+ phy->mac_clks[0] = clk_register_fixed_rate(dev, MII_PHY_CLK_NAME,
+ NULL, 0, MII_PHY_CLK_RATE);
+ if (IS_ERR(phy->mac_clks[0]))
+ return PTR_ERR(phy->mac_clks[0]);
+
+ phy->mac_clks[1] = clk_register_fixed_rate(dev, INT_TX_CLK_NAME,
+ NULL, 0, INT_TX_CLK_RATE);
+ if (IS_ERR(phy->mac_clks[1])) {
+ ret = PTR_ERR(phy->mac_clks[1]);
+ goto err_int_tx_clk;
+ }
+
+ /* set up gate and mux properties */
+ gate->reg = phy->reg;
+ gate->bit_idx = REG_GPIT;
+ gate->lock = &sun8i_h3_ephy_lock;
+ mux->reg = phy->reg;
+ mux->mask = REG_ETCS_MASK;
+ mux->table = sun8i_h3_emac_mux_table;
+ mux->lock = &sun8i_h3_ephy_lock;
+
+ phy->mac_clks[2] = clk_register_composite(dev, clk_name,
+ sun8i_h3_emac_mux_parents,
+ SUN8I_H3_EMAC_PARENTS,
+ &mux->hw, &clk_mux_ops,
+ NULL, NULL,
+ &gate->hw, &clk_gate_ops,
+ 0);
+ if (IS_ERR(phy->mac_clks[2])) {
+ ret = PTR_ERR(phy->mac_clks[2]);
+ goto err_tx_clk;
+ }
+
+ ret = of_clk_add_provider(dev->of_node, of_clk_src_simple_get,
+ phy->mac_clks[2]);
+ if (ret)
+ goto err_of_clk;
+
+ return 0;
+
+err_of_clk:
+ /* TODO: switch to clk_unregister_composite when it's available */
+ clk_unregister(phy->mac_clks[2]);
+err_tx_clk:
+ clk_unregister_fixed_rate(phy->mac_clks[1]);
+err_int_tx_clk:
+ clk_unregister_fixed_rate(phy->mac_clks[0]);
+
+ return ret;
+}
+
+static void sun8i_h3_ephy_init_phy(struct sun8i_h3_ephy *phy, u32 addr)
+{
+ u32 val = readl(phy->reg);
+
+ /* set default values, but don't touch clock settings */
+ val = (val & ~REG_DEFAULT_MASK) | REG_DEFAULT_VALUE;
+
+ if (addr < REG_PHY_ADDR_MASK) {
+ dev_info(phy->dev, "Using interal PHY at 0x%x\n", addr);
+
+ val |= addr << REG_PHY_ADDR_SHIFT;
+ val &= ~REG_SHUTDOWN;
+ val |= REG_PHY_SELECT;
+
+ if (of_property_read_bool(phy->dev->of_node,
+ "allwinner,leds-active-low"))
+ val |= REG_LED_POL;
+ } else {
+ dev_info(phy->dev, "Using external PHY\n");
+ }
+
+ writel(val, phy->reg);
+}
+
+static void sun8i_h3_ephy_reset_phy(struct sun8i_h3_ephy *phy)
+{
+ u32 val;
+
+ val = readl(phy->reg);
+ val = (val & ~REG_DEFAULT_MASK) | REG_DEFAULT_VALUE;
+ writel(val, phy->reg);
+}
+
+int sun8i_h3_ephy_probe(struct platform_device *pdev)
+{
+ struct device *dev = &pdev->dev;
+ struct device_node *np = dev->of_node;
+ struct sun8i_h3_ephy *phy;
+ struct resource *res;
+ u32 addr = REG_PHY_ADDR_MASK;
+ int ret;
+
+ ret = of_property_read_u32(np, "allwinner,ephy-addr", &addr);
+ if (!ret && addr > REG_PHY_ADDR_MASK) {
+ dev_err(dev, "invalid PHY address: 0x%x\n", addr);
+ return -EINVAL;
+ } else if (ret && ret != -EINVAL) {
+ dev_err(dev, "could not get PHY address: %d\n", ret);
+ return ret;
+ }
+
+ phy = devm_kzalloc(dev, sizeof(*phy), GFP_KERNEL);
+ if (!phy)
+ return -ENOMEM;
+
+ phy->dev = dev;
+ platform_set_drvdata(pdev, phy);
+
+ phy->clk = devm_clk_get(dev, NULL);
+ if (IS_ERR(phy->clk)) {
+ dev_err(dev, "failed to get clock\n");
+ return PTR_ERR(phy->clk);
+ }
+
+ phy->reset = devm_reset_control_get(dev, NULL);
+ if (IS_ERR(phy->reset)) {
+ dev_err(dev, "failed to get reset control\n");
+ return PTR_ERR(phy->reset);
+ }
+
+ res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+ phy->reg = devm_ioremap_resource(dev, res);
+ if (IS_ERR(phy->reg)) {
+ dev_err(dev, "failed to map registers\n");
+ return PTR_ERR(phy->reg);
+ }
+
+ ret = clk_prepare_enable(phy->clk);
+ if (ret) {
+ dev_err(dev, "failed to enable clock\n");
+ return ret;
+ }
+
+ ret = reset_control_deassert(phy->reset);
+ if (ret) {
+ dev_err(dev, "failed to deassert reset control\n");
+ goto err_reset;
+ }
+
+ sun8i_h3_ephy_init_phy(phy, addr);
+
+ ret = sun8i_h3_ephy_register_clocks(phy);
+ if (ret)
+ goto err_tx_clk;
+
+ return 0;
+
+err_tx_clk:
+ sun8i_h3_ephy_reset_phy(phy);
+ reset_control_assert(phy->reset);
+err_reset:
+ clk_disable_unprepare(phy->clk);
+ return ret;
+}
+
+static const struct of_device_id sun8i_h3_ephy_of_match[] = {
+ { .compatible = "allwinner,sun8i-h3-ephy", },
+ { },
+};
+MODULE_DEVICE_TABLE(of, sun8i_h3_ephy_of_match);
+
+static struct platform_driver sun8i_h3_ephy_driver = {
+ .probe = sun8i_h3_ephy_probe,
+ .driver = {
+ .of_match_table = sun8i_h3_ephy_of_match,
+ .name = "sun8i-h3-ephy",
+ }
+};
+module_platform_driver(sun8i_h3_ephy_driver);
+
+MODULE_DESCRIPTION("Allwinner sun8i H3 Ethernet PHY driver");
+MODULE_AUTHOR("Chen-Yu Tsai <wens@csie.org>");
+MODULE_LICENSE("GPL v2");
--
2.7.0
^ permalink raw reply related
* [PATCH RFC 1/5] net: phy: sun8i-h3-ephy: Add bindings for Allwinner H3 Ethernet PHY
From: Chen-Yu Tsai @ 2016-04-04 16:22 UTC (permalink / raw)
To: Maxime Ripard, Florian Fainelli, Rob Herring, Pawel Moll,
Mark Rutland, Ian Campbell, Kumar Gala
Cc: Chen-Yu Tsai, netdev, linux-arm-kernel, linux-kernel, devicetree,
LABBE Corentin
In-Reply-To: <1459786954-12649-1-git-send-email-wens@csie.org>
The Allwinner H3 SoC incorporates an Ethernet PHY. This is enabled and
configured through a memory mapped hardware register.
This same register also configures the MAC interface mode and TX clock
source. Also covered by the register, but not supported in these bindings,
are TX/RX clock delay chains and inverters, and an RMII module.
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
---
.../bindings/net/allwinner,sun8i-h3-ephy.txt | 44 ++++++++++++++++++++++
1 file changed, 44 insertions(+)
create mode 100644 Documentation/devicetree/bindings/net/allwinner,sun8i-h3-ephy.txt
diff --git a/Documentation/devicetree/bindings/net/allwinner,sun8i-h3-ephy.txt b/Documentation/devicetree/bindings/net/allwinner,sun8i-h3-ephy.txt
new file mode 100644
index 000000000000..146f227e6d88
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/allwinner,sun8i-h3-ephy.txt
@@ -0,0 +1,44 @@
+* Allwinner H3 E(thernet) PHY
+
+The Allwinner H3 integrates an MII ethernet PHY. As with external PHYs,
+before it can be configured over the MDIO bus and used, certain hardware
+features must be configured, such as the PHY address and LED polarity.
+The PHY must also be powered on and brought out of reset.
+
+This is accomplished with regulators and pull-up/downs for external PHYs.
+For this internal PHY, a hardware register is programmed.
+
+The same hardware register also contains clock and interface controls
+for the MAC. This is also present in earlier SoCs, and is covered by
+"allwinner,sun7i-a20-gmac-clk". The controls in the H3 are slightly
+different due to the inclusion of what appears to be an RMII-MII
+bridge.
+
+Required properties:
+- compatible: should be "allwinner,sun8i-h3-ephy"
+- reg: address and length of the register set for the device
+- clocks: A phandle to the reference clock for this device
+- resets: A phandle to the reset control for this device
+
+Ethernet PHY related properties:
+- allwinner,ephy-addr: the MDIO bus address the PHY should respond to.
+ If this is not set, the external PHY is used, and
+ everything else in this section is ignored.
+- allwinner,leds-active-low: LEDs are active low. Without this, LEDs are
+ active high.
+
+Ethernet MAC clock related properties:
+- #clock-cells: should be 0
+- clock-output-names: "mac_tx"
+
+Example:
+
+ethernet-phy@01c00030 {
+ compatible = "allwinner,sun8i-h3-ephy";
+ reg = <0x01c00030 0x4>;
+ clocks = <&bus_gates 128>;
+ resets = <&ahb_rst 66>;
+ allwinner,ephy-addr = <0x1>;
+ #clock-cells = <0>;
+ clock-output-names = "mac_tx";
+};
--
2.7.0
^ permalink raw reply related
* [PATCH RFC 0/5] net: phy: sun8i-h3-ephy: Add Allwinner H3 Ethernet PHY driver
From: Chen-Yu Tsai @ 2016-04-04 16:22 UTC (permalink / raw)
To: Maxime Ripard, Florian Fainelli, Rob Herring, Pawel Moll,
Mark Rutland, Ian Campbell, Kumar Gala
Cc: Chen-Yu Tsai, netdev-u79uwXL29TY76Z2rM5mHXA,
linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
devicetree-u79uwXL29TY76Z2rM5mHXA, LABBE Corentin
Hi everyone,
This is an attempt to support the Ethernet PHY incorporated in Allwinner's
H3 SoC. It is a MII PHY, supporting 10/100 Mbps Ethernet.
Before the PHY can be accessed via MDIO and used, it needs to be powered
on and configured. This is done via a system control register in the CPU
address space. The same register also controls PHY interface mode and
MAC TX clock sources.
This driver brings up the Ethernet PHY (if it is used), and exports a
clock control for the MAC TX clock.
Patch 1 adds the bindings for this piece of hardware.
Patch 2 adds the driver for it.
Patch 3 adds a device node for the PHY.
Patch 4 & 5 are not to be merged. They are provided here as a solid
example of how this driver is used. The sun8i-emac bindings have not
been submitted and are not stable yet.
Comments welcome.
Regards
ChenYu
Chen-Yu Tsai (4):
net: phy: sun8i-h3-ephy: Add bindings for Allwinner H3 Ethernet PHY
net: phy: sun8i-h3-ephy: Add driver for Allwinner H3 Ethernet PHY
ARM: dts: sun8i-h3: Add H3 Ethernet PHY device node to sun8i-h3.dtsi
ARM: dts: sun8i: Enable Ethernet controller on the Orange PI PC
LABBE Corentin (1):
ARM: dts: sun8i-h3: Add Ethernet controller device node to
sun8i-h3.dtsi
.../bindings/net/allwinner,sun8i-h3-ephy.txt | 44 ++++
arch/arm/boot/dts/sun8i-h3-orangepi-pc.dts | 14 ++
arch/arm/boot/dts/sun8i-h3.dtsi | 21 ++
drivers/net/phy/Kconfig | 8 +
drivers/net/phy/Makefile | 1 +
drivers/net/phy/sun8i-h3-ephy.c | 266 +++++++++++++++++++++
6 files changed, 354 insertions(+)
create mode 100644 Documentation/devicetree/bindings/net/allwinner,sun8i-h3-ephy.txt
create mode 100644 drivers/net/phy/sun8i-h3-ephy.c
--
2.7.0
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: [RFC PATCH 1/5] bpf: add PHYS_DEV prog type for early driver filter
From: Brenden Blanco @ 2016-04-04 16:17 UTC (permalink / raw)
To: John Fastabend
Cc: Jesper Dangaard Brouer, Tom Herbert, Daniel Borkmann,
David S. Miller, Linux Kernel Network Developers,
Alexei Starovoitov, ogerlitz
In-Reply-To: <57029127.3040303@gmail.com>
On Mon, Apr 04, 2016 at 09:07:03AM -0700, John Fastabend wrote:
> On 16-04-04 08:29 AM, Brenden Blanco wrote:
> > On Mon, Apr 04, 2016 at 05:12:27PM +0200, Jesper Dangaard Brouer wrote:
> >> On Mon, 4 Apr 2016 11:09:57 -0300
> >> Tom Herbert <tom@herbertland.com> wrote:
> >>
> >>> On Mon, Apr 4, 2016 at 10:36 AM, Daniel Borkmann <daniel@iogearbox.net> wrote:
> >>>> On 04/04/2016 03:07 PM, Jesper Dangaard Brouer wrote:
> >>>>>
> >>>>> On Mon, 04 Apr 2016 10:49:09 +0200 Daniel Borkmann <daniel@iogearbox.net>
> >>>>> wrote:
> >>>>>>
> >>>>>> On 04/02/2016 03:21 AM, Brenden Blanco wrote:
> >>>>>>>
> >>>>>>> Add a new bpf prog type that is intended to run in early stages of the
> >>>>>>> packet rx path. Only minimal packet metadata will be available, hence a
> >>>>>>> new
> >>>>>>> context type, struct xdp_metadata, is exposed to userspace. So far only
> >>>>>>> expose the readable packet length, and only in read mode.
> >>>>>>>
> >>>>>>> The PHYS_DEV name is chosen to represent that the program is meant only
> >>>>>>> for physical adapters, rather than all netdevs.
> >>>>>>>
> >>>>>>> While the user visible struct is new, the underlying context must be
> >>>>>>> implemented as a minimal skb in order for the packet load_* instructions
> >>>>>>> to work. The skb filled in by the driver must have skb->len, skb->head,
> >>>>>>> and skb->data set, and skb->data_len == 0.
> >>>>>>>
> >>>>> [...]
> >>>>>>
> >>>>>>
> >>>>>> Do you plan to support bpf_skb_load_bytes() as well? I like using
> >>>>>> this API especially when dealing with larger chunks (>4 bytes) to
> >>>>>> load into stack memory, plus content is kept in network byte order.
> >>>>>>
> >>>>>> What about other helpers such as bpf_skb_store_bytes() et al that
> >>>>>> work on skbs. Do you intent to reuse them as is and thus populate
> >>>>>> the per cpu skb with needed fields (faking linear data), or do you
> >>>>>> see larger obstacles that prevent for this?
> >>>>>
> >>>>>
> >>>>> Argh... maybe the minimal pseudo/fake SKB is the wrong "signal" to send
> >>>>> to users of this API.
> >>>>>
> >>>>> The hole idea is that an SKB is NOT allocated yet, and not needed at
> >>>>> this level. If we start supporting calling underlying SKB functions,
> >>>>> then we will end-up in the same place (performance wise).
> >>>>
> >>>>
> >>>> I'm talking about the current skb-related BPF helper functions we have,
> >>>> so the question is how much from that code we have we can reuse under
> >>>> these constraints (obviously things like the tunnel helpers are a different
> >>>> story) and if that trade-off is acceptable for us. I'm also thinking
> >>>> that, for example, if you need to parse the packet data anyway for a drop
> >>>> verdict, you might as well pass some meta data (that is set in the real
> >>>> skb later on) for those packets that go up the stack.
> >>>
> >>> Right, the meta data in this case is an abstracted receive descriptor.
> >>> This would include items that we get in a device receive descriptor
> >>> (computed checksum, hash, VLAN tag). This is purposely a small
> >>> restricted data structure. I'm hoping we can minimize the size of this
> >>> to not much more than 32 bytes (including pointers to data and
> >>> linkage).
> >>
> >> I agree.
> >>
> >>> How this translates to skb to maintain compatibility is with BPF
> >>> interesting question. One other consideration is that skb's are kernel
> >>> specific, we should be able to use the same BPF filter program in
> >>> userspace over DPDK for instance-- so an skb interface as the packet
> >>> abstraction might not be the right model...
> >>
> >> I agree. I don't think reusing the SKB data structure is the right
> >> model. We should drop the SKB pointer from the API.
> >>
> >> As Tom also points out, making the BPF interface independent of the SKB
> >> meta-data structure, would also make the eBPF program more generally
> >> applicable.
> > The initial approach that I tried went down this path. Alexei advised
> > that I use the pseudo skb, and in the future the API between drivers and
> > bpf can change to adopt non-skb context. The only user facing ABIs in
> > this patchset are the IFLA, the xdp_metadata struct, and the name of the
> > new enum.
> >
> > The reason to use a pseudo skb for now is that there will be a fair
> > amount of churn to get bpf jit and interpreter to understand non-skb
> > context in the bpf_load_pointer() code. I don't see the need for
> > requiring that for this patchset, as it will be internal-only change
> > if/when we use something else.
>
> Another option would be to have per driver JIT code to patch up the
> skb read/loads with descriptor reads and metadata. From a strictly
> performance stand point it should be better than pseudo skbs.
I considered (and implemented) this as well, but there my problem was
that I needed to inform the bpf() syscall at BPF_PROG_LOAD time which
ifindex to look at for fixups, so I had to add a new ifindex field to
bpf_attr. Then during verification I had to use a new ndo to get the
driver-specific offsets for its particular descriptor format. It seemed
kludgy.
>
> .John
^ permalink raw reply
* Re: [RFC PATCH 1/5] bpf: add PHYS_DEV prog type for early driver filter
From: John Fastabend @ 2016-04-04 16:07 UTC (permalink / raw)
To: Brenden Blanco, Jesper Dangaard Brouer
Cc: Tom Herbert, Daniel Borkmann, David S. Miller,
Linux Kernel Network Developers, Alexei Starovoitov, ogerlitz
In-Reply-To: <20160404152948.GA495@gmail.com>
On 16-04-04 08:29 AM, Brenden Blanco wrote:
> On Mon, Apr 04, 2016 at 05:12:27PM +0200, Jesper Dangaard Brouer wrote:
>> On Mon, 4 Apr 2016 11:09:57 -0300
>> Tom Herbert <tom@herbertland.com> wrote:
>>
>>> On Mon, Apr 4, 2016 at 10:36 AM, Daniel Borkmann <daniel@iogearbox.net> wrote:
>>>> On 04/04/2016 03:07 PM, Jesper Dangaard Brouer wrote:
>>>>>
>>>>> On Mon, 04 Apr 2016 10:49:09 +0200 Daniel Borkmann <daniel@iogearbox.net>
>>>>> wrote:
>>>>>>
>>>>>> On 04/02/2016 03:21 AM, Brenden Blanco wrote:
>>>>>>>
>>>>>>> Add a new bpf prog type that is intended to run in early stages of the
>>>>>>> packet rx path. Only minimal packet metadata will be available, hence a
>>>>>>> new
>>>>>>> context type, struct xdp_metadata, is exposed to userspace. So far only
>>>>>>> expose the readable packet length, and only in read mode.
>>>>>>>
>>>>>>> The PHYS_DEV name is chosen to represent that the program is meant only
>>>>>>> for physical adapters, rather than all netdevs.
>>>>>>>
>>>>>>> While the user visible struct is new, the underlying context must be
>>>>>>> implemented as a minimal skb in order for the packet load_* instructions
>>>>>>> to work. The skb filled in by the driver must have skb->len, skb->head,
>>>>>>> and skb->data set, and skb->data_len == 0.
>>>>>>>
>>>>> [...]
>>>>>>
>>>>>>
>>>>>> Do you plan to support bpf_skb_load_bytes() as well? I like using
>>>>>> this API especially when dealing with larger chunks (>4 bytes) to
>>>>>> load into stack memory, plus content is kept in network byte order.
>>>>>>
>>>>>> What about other helpers such as bpf_skb_store_bytes() et al that
>>>>>> work on skbs. Do you intent to reuse them as is and thus populate
>>>>>> the per cpu skb with needed fields (faking linear data), or do you
>>>>>> see larger obstacles that prevent for this?
>>>>>
>>>>>
>>>>> Argh... maybe the minimal pseudo/fake SKB is the wrong "signal" to send
>>>>> to users of this API.
>>>>>
>>>>> The hole idea is that an SKB is NOT allocated yet, and not needed at
>>>>> this level. If we start supporting calling underlying SKB functions,
>>>>> then we will end-up in the same place (performance wise).
>>>>
>>>>
>>>> I'm talking about the current skb-related BPF helper functions we have,
>>>> so the question is how much from that code we have we can reuse under
>>>> these constraints (obviously things like the tunnel helpers are a different
>>>> story) and if that trade-off is acceptable for us. I'm also thinking
>>>> that, for example, if you need to parse the packet data anyway for a drop
>>>> verdict, you might as well pass some meta data (that is set in the real
>>>> skb later on) for those packets that go up the stack.
>>>
>>> Right, the meta data in this case is an abstracted receive descriptor.
>>> This would include items that we get in a device receive descriptor
>>> (computed checksum, hash, VLAN tag). This is purposely a small
>>> restricted data structure. I'm hoping we can minimize the size of this
>>> to not much more than 32 bytes (including pointers to data and
>>> linkage).
>>
>> I agree.
>>
>>> How this translates to skb to maintain compatibility is with BPF
>>> interesting question. One other consideration is that skb's are kernel
>>> specific, we should be able to use the same BPF filter program in
>>> userspace over DPDK for instance-- so an skb interface as the packet
>>> abstraction might not be the right model...
>>
>> I agree. I don't think reusing the SKB data structure is the right
>> model. We should drop the SKB pointer from the API.
>>
>> As Tom also points out, making the BPF interface independent of the SKB
>> meta-data structure, would also make the eBPF program more generally
>> applicable.
> The initial approach that I tried went down this path. Alexei advised
> that I use the pseudo skb, and in the future the API between drivers and
> bpf can change to adopt non-skb context. The only user facing ABIs in
> this patchset are the IFLA, the xdp_metadata struct, and the name of the
> new enum.
>
> The reason to use a pseudo skb for now is that there will be a fair
> amount of churn to get bpf jit and interpreter to understand non-skb
> context in the bpf_load_pointer() code. I don't see the need for
> requiring that for this patchset, as it will be internal-only change
> if/when we use something else.
Another option would be to have per driver JIT code to patch up the
skb read/loads with descriptor reads and metadata. From a strictly
performance stand point it should be better than pseudo skbs.
.John
^ permalink raw reply
* Re: [RFC PATCH 1/5] bpf: add PHYS_DEV prog type for early driver filter
From: Brenden Blanco @ 2016-04-04 15:29 UTC (permalink / raw)
To: Jesper Dangaard Brouer
Cc: Tom Herbert, Daniel Borkmann, David S. Miller,
Linux Kernel Network Developers, Alexei Starovoitov, ogerlitz,
john fastabend
In-Reply-To: <20160404171227.1f862cb1@redhat.com>
On Mon, Apr 04, 2016 at 05:12:27PM +0200, Jesper Dangaard Brouer wrote:
> On Mon, 4 Apr 2016 11:09:57 -0300
> Tom Herbert <tom@herbertland.com> wrote:
>
> > On Mon, Apr 4, 2016 at 10:36 AM, Daniel Borkmann <daniel@iogearbox.net> wrote:
> > > On 04/04/2016 03:07 PM, Jesper Dangaard Brouer wrote:
> > >>
> > >> On Mon, 04 Apr 2016 10:49:09 +0200 Daniel Borkmann <daniel@iogearbox.net>
> > >> wrote:
> > >>>
> > >>> On 04/02/2016 03:21 AM, Brenden Blanco wrote:
> > >>>>
> > >>>> Add a new bpf prog type that is intended to run in early stages of the
> > >>>> packet rx path. Only minimal packet metadata will be available, hence a
> > >>>> new
> > >>>> context type, struct xdp_metadata, is exposed to userspace. So far only
> > >>>> expose the readable packet length, and only in read mode.
> > >>>>
> > >>>> The PHYS_DEV name is chosen to represent that the program is meant only
> > >>>> for physical adapters, rather than all netdevs.
> > >>>>
> > >>>> While the user visible struct is new, the underlying context must be
> > >>>> implemented as a minimal skb in order for the packet load_* instructions
> > >>>> to work. The skb filled in by the driver must have skb->len, skb->head,
> > >>>> and skb->data set, and skb->data_len == 0.
> > >>>>
> > >> [...]
> > >>>
> > >>>
> > >>> Do you plan to support bpf_skb_load_bytes() as well? I like using
> > >>> this API especially when dealing with larger chunks (>4 bytes) to
> > >>> load into stack memory, plus content is kept in network byte order.
> > >>>
> > >>> What about other helpers such as bpf_skb_store_bytes() et al that
> > >>> work on skbs. Do you intent to reuse them as is and thus populate
> > >>> the per cpu skb with needed fields (faking linear data), or do you
> > >>> see larger obstacles that prevent for this?
> > >>
> > >>
> > >> Argh... maybe the minimal pseudo/fake SKB is the wrong "signal" to send
> > >> to users of this API.
> > >>
> > >> The hole idea is that an SKB is NOT allocated yet, and not needed at
> > >> this level. If we start supporting calling underlying SKB functions,
> > >> then we will end-up in the same place (performance wise).
> > >
> > >
> > > I'm talking about the current skb-related BPF helper functions we have,
> > > so the question is how much from that code we have we can reuse under
> > > these constraints (obviously things like the tunnel helpers are a different
> > > story) and if that trade-off is acceptable for us. I'm also thinking
> > > that, for example, if you need to parse the packet data anyway for a drop
> > > verdict, you might as well pass some meta data (that is set in the real
> > > skb later on) for those packets that go up the stack.
> >
> > Right, the meta data in this case is an abstracted receive descriptor.
> > This would include items that we get in a device receive descriptor
> > (computed checksum, hash, VLAN tag). This is purposely a small
> > restricted data structure. I'm hoping we can minimize the size of this
> > to not much more than 32 bytes (including pointers to data and
> > linkage).
>
> I agree.
>
> > How this translates to skb to maintain compatibility is with BPF
> > interesting question. One other consideration is that skb's are kernel
> > specific, we should be able to use the same BPF filter program in
> > userspace over DPDK for instance-- so an skb interface as the packet
> > abstraction might not be the right model...
>
> I agree. I don't think reusing the SKB data structure is the right
> model. We should drop the SKB pointer from the API.
>
> As Tom also points out, making the BPF interface independent of the SKB
> meta-data structure, would also make the eBPF program more generally
> applicable.
The initial approach that I tried went down this path. Alexei advised
that I use the pseudo skb, and in the future the API between drivers and
bpf can change to adopt non-skb context. The only user facing ABIs in
this patchset are the IFLA, the xdp_metadata struct, and the name of the
new enum.
The reason to use a pseudo skb for now is that there will be a fair
amount of churn to get bpf jit and interpreter to understand non-skb
context in the bpf_load_pointer() code. I don't see the need for
requiring that for this patchset, as it will be internal-only change
if/when we use something else.
>
> --
> Best regards,
> Jesper Dangaard Brouer
> MSc.CS, Principal Kernel Engineer at Red Hat
> Author of http://www.iptv-analyzer.org
> LinkedIn: http://www.linkedin.com/in/brouer
^ permalink raw reply
* Re: [RFC PATCH 4/5] mlx4: add support for fast rx drop bpf program
From: Eric Dumazet @ 2016-04-04 15:22 UTC (permalink / raw)
To: Jesper Dangaard Brouer
Cc: Alexei Starovoitov, Brenden Blanco, davem, netdev, tom,
Or Gerlitz, daniel, john.fastabend
In-Reply-To: <20160404165701.2a25a17a@redhat.com>
On Mon, 2016-04-04 at 16:57 +0200, Jesper Dangaard Brouer wrote:
> On Fri, 1 Apr 2016 19:47:12 -0700 Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
>
> > My guess we're hitting 14.5Mpps limit for empty bpf program
> > and for program that actually looks into the packet because we're
> > hitting 10G phy limit of 40G nic. Since physically 40G nic
> > consists of four 10G phys. There will be the same problem
> > with 100G and 50G nics. Both will be hitting 25G phy limit.
> > We need to vary packets somehow. Hopefully Or can explain that
> > bit of hw design.
> > Jesper's experiments with mlx4 showed the same 14.5Mpps limit
> > when sender blasting the same packet over and over again.
>
> That is an interesting observation Alexei, and could explain the pps limit
> I hit on 40G, with single flow testing. AFAIK 40G is 4x 10G PHYs, and
> 100G is 4x 25G PHYs.
>
> I have a pktgen script that tried to avoid this pitfall. By creating a
> new flow per pktgen kthread. I call it "pktgen_sample05_flow_per_thread.sh"[1]
>
> [1] https://github.com/netoptimizer/network-testing/blob/master/pktgen/pktgen_sample05_flow_per_thread.sh
>
A single flow is able to use 40Gbit on those 40Gbit NIC, so there is not
a single 10GB trunk used for a given flow.
This 14Mpps thing seems to be a queue limitation on mlx4.
^ permalink raw reply
* Re: [RFC PATCH 1/5] bpf: add PHYS_DEV prog type for early driver filter
From: Edward Cree @ 2016-04-04 15:18 UTC (permalink / raw)
To: Eric Dumazet, Jesper Dangaard Brouer
Cc: Daniel Borkmann, Brenden Blanco, davem, netdev, tom,
alexei.starovoitov, gerlitz, john.fastabend
In-Reply-To: <1459780420.6473.337.camel@edumazet-glaptop3.roam.corp.google.com>
On 04/04/16 15:33, Eric Dumazet wrote:
> On Mon, 2016-04-04 at 15:07 +0200, Jesper Dangaard Brouer wrote:
>
>> Argh... maybe the minimal pseudo/fake SKB is the wrong "signal" to send
>> to users of this API.
>>
>> The hole idea is that an SKB is NOT allocated yet, and not needed at
>> this level. If we start supporting calling underlying SKB functions,
>> then we will end-up in the same place (performance wise).
>
> A BPF program can access many skb fields.
>
> If you plan to support BPF, your fake skb needs to be populated like a
> real one. Looks like some code will be replicated in all drivers that
> want this facility...
>
> Or accept (document ?) that some BPF instructions are just not there.
> (hash, queue_mapping ...)
If these progs are eventually going to get pushed down into supporting
hardware, many skb things won't make sense at all at that level. I would
suggest that anything hardware wouldn't reasonably have available should
be "just not there"; I suspect that'll lead you to the right API for early
driver filter as well. And it probably won't look much like an skb.
-Ed
^ permalink raw reply
* Re: [RFC PATCH 1/5] bpf: add PHYS_DEV prog type for early driver filter
From: Jesper Dangaard Brouer @ 2016-04-04 15:12 UTC (permalink / raw)
To: Tom Herbert
Cc: Daniel Borkmann, Brenden Blanco, David S. Miller,
Linux Kernel Network Developers, Alexei Starovoitov, gerlitz,
john fastabend, brouer
In-Reply-To: <CALx6S37aK79AbkUPBFTHkonUziSb7A1KV47vnG1OgciPD2qXcA@mail.gmail.com>
On Mon, 4 Apr 2016 11:09:57 -0300
Tom Herbert <tom@herbertland.com> wrote:
> On Mon, Apr 4, 2016 at 10:36 AM, Daniel Borkmann <daniel@iogearbox.net> wrote:
> > On 04/04/2016 03:07 PM, Jesper Dangaard Brouer wrote:
> >>
> >> On Mon, 04 Apr 2016 10:49:09 +0200 Daniel Borkmann <daniel@iogearbox.net>
> >> wrote:
> >>>
> >>> On 04/02/2016 03:21 AM, Brenden Blanco wrote:
> >>>>
> >>>> Add a new bpf prog type that is intended to run in early stages of the
> >>>> packet rx path. Only minimal packet metadata will be available, hence a
> >>>> new
> >>>> context type, struct xdp_metadata, is exposed to userspace. So far only
> >>>> expose the readable packet length, and only in read mode.
> >>>>
> >>>> The PHYS_DEV name is chosen to represent that the program is meant only
> >>>> for physical adapters, rather than all netdevs.
> >>>>
> >>>> While the user visible struct is new, the underlying context must be
> >>>> implemented as a minimal skb in order for the packet load_* instructions
> >>>> to work. The skb filled in by the driver must have skb->len, skb->head,
> >>>> and skb->data set, and skb->data_len == 0.
> >>>>
> >> [...]
> >>>
> >>>
> >>> Do you plan to support bpf_skb_load_bytes() as well? I like using
> >>> this API especially when dealing with larger chunks (>4 bytes) to
> >>> load into stack memory, plus content is kept in network byte order.
> >>>
> >>> What about other helpers such as bpf_skb_store_bytes() et al that
> >>> work on skbs. Do you intent to reuse them as is and thus populate
> >>> the per cpu skb with needed fields (faking linear data), or do you
> >>> see larger obstacles that prevent for this?
> >>
> >>
> >> Argh... maybe the minimal pseudo/fake SKB is the wrong "signal" to send
> >> to users of this API.
> >>
> >> The hole idea is that an SKB is NOT allocated yet, and not needed at
> >> this level. If we start supporting calling underlying SKB functions,
> >> then we will end-up in the same place (performance wise).
> >
> >
> > I'm talking about the current skb-related BPF helper functions we have,
> > so the question is how much from that code we have we can reuse under
> > these constraints (obviously things like the tunnel helpers are a different
> > story) and if that trade-off is acceptable for us. I'm also thinking
> > that, for example, if you need to parse the packet data anyway for a drop
> > verdict, you might as well pass some meta data (that is set in the real
> > skb later on) for those packets that go up the stack.
>
> Right, the meta data in this case is an abstracted receive descriptor.
> This would include items that we get in a device receive descriptor
> (computed checksum, hash, VLAN tag). This is purposely a small
> restricted data structure. I'm hoping we can minimize the size of this
> to not much more than 32 bytes (including pointers to data and
> linkage).
I agree.
> How this translates to skb to maintain compatibility is with BPF
> interesting question. One other consideration is that skb's are kernel
> specific, we should be able to use the same BPF filter program in
> userspace over DPDK for instance-- so an skb interface as the packet
> abstraction might not be the right model...
I agree. I don't think reusing the SKB data structure is the right
model. We should drop the SKB pointer from the API.
As Tom also points out, making the BPF interface independent of the SKB
meta-data structure, would also make the eBPF program more generally
applicable.
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
Author of http://www.iptv-analyzer.org
LinkedIn: http://www.linkedin.com/in/brouer
^ permalink raw reply
* Re: [PATCH] ath5k: Change led pin configuration for compaq c700 laptop
From: Kalle Valo @ 2016-04-04 15:03 UTC (permalink / raw)
To: Joseph Salisbury
Cc: jirislaby, mickflemm, mcgrof, linux-kernel, stable,
linux-wireless, netdev
In-Reply-To: <830d8fa0b7cb0b8cfb0b34360925cd50c3187341.1457977047.git.joseph.salisbury@canonical.com>
Joseph Salisbury <joseph.salisbury@canonical.com> writes:
> BugLink: http://bugs.launchpad.net/bugs/972604
>
> Commit 09c9bae26b0d3c9472cb6ae45010460a2cee8b8d ("ath5k: add led pin
> configuration for compaq c700 laptop") added a pin configuration for the Compaq
> c700 laptop. However, the polarity of the led pin is reversed. It should be
> red for wifi off and blue for wifi on, but it is the opposite. This bug was
> reported in the following bug report:
> http://pad.lv/972604
>
>
> Fixes: 09c9bae26b0d3c9472cb6ae45010460a2cee8b8d ("ath5k: add led pin
> configuration for compaq c700 laptop")
>
> Signed-off-by: Joseph Salisbury <joseph.salisbury@canonical.com>
> Cc: stable@vger.kernel.org
Applied to ath.git, thanks.
I just fixed the Fixes line to be one continuous line.
--
Kalle Valo
^ permalink raw reply
* Re: System hangs (unable to handle kernel paging request)
From: Oleksii Berezhniak @ 2016-04-04 15:01 UTC (permalink / raw)
To: netdev
In-Reply-To: <57027A82.6040807@gmail.com>
Can you please point me to more detailed description of similar issues
that you mentioned?
I can only find this:
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=32b3e08fff60494cd1d281a39b51583edfd2b18f
But there are no any hangs. Only performance issues.
BTW, GRO (Generic Receive Offloading) is disabled on our network adapter.
2016-04-04 17:30 GMT+03:00 Bastien Philbert <bastienphilbert@gmail.com>:
>
>
> On 2016-04-04 03:59 AM, Oleksii Berezhniak wrote:
>> Good day.
>>
>> We have PPPoE server with CentOS 7 (kernel 3.10.0-327.10.1.el7.dsip.x86_64)
>>
>> We applied some PPPoE related patches to this kernel:
>>
>> ppp: don't override sk->sk_state in pppoe_flush_dev()
>> ppp: fix pppoe_dev deletion condition in pppoe_release()
>> pppoe: fix memory corruption in padt work structure
>> pppoe: fix reference counting in PPPoE proxy
>>
>> Also we built latest version of ixgbe driver from Intel.
>>
>> Now we have crashes after approx. one week of uptime:
>>
>> [545444.673270] BUG: unable to handle kernel paging request at ffff88a005040200
>> [545444.673306] IP: [<ffffffff811c0e95>] kmem_cache_alloc+0x75/0x1d0
>> [545444.673335] PGD 0
>> [545444.673348] Oops: 0000 [#1] SMP
>> [545444.673367] Modules linked in: arc4 ppp_mppe act_police cls_u32
>> sch_ingress sch_tbf pptp gre pppoe pppox ppp_generic slhc 8021q garp
>> stp mrp llc iptable_nat nf_conn
>> track_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat iptable_filter xt_TCPMSS
>> iptable_mangle xt_CT nf_conntrack iptable_raw w83793 hwmon_vid
>> snd_hda_codec_realtek snd_hda_codec
>> _generic snd_hda_intel snd_hda_codec coretemp snd_hda_core iTCO_wdt
>> kvm iTCO_vendor_support snd_hwdep snd_seq snd_seq_device ipmi_ssif
>> ppdev lpc_ich snd_pcm pcspkr mfd_
>> core sg ipmi_si snd_timer snd i2c_i801 ipmi_msghandler ioatdma
>> parport_pc parport shpchp soundcore i7core_edac tpm_infineon edac_core
>> ip_tables ext4 mbcache jbd2 sd_mod
>> crct10dif_generic crc_t10dif crct10dif_common syscopyarea sysfillrect
>> firewire_ohci sysimgblt i2c_algo_bit drm_kms_helper ata_generic
>> pata_acpi
>> [545444.674383] ttm firewire_core crc_itu_t serio_raw drm ata_piix
>> libata crc32c_intel i2c_core ixgbe(OE) vxlan e1000e ip6_udp_tunnel
>> udp_tunnel aacraid dca ptp pps_co
>> re
>> [545444.674783] CPU: 5 PID: 0 Comm: swapper/5 Tainted: G OE
>> ------------ 3.10.0-327.10.1.el7.dsip.x86_64 #1
>> [545444.675032] Hardware name: empty empty/S7010, BIOS 'V2.06 ' 03/31/2010
>> [545444.675162] task: ffff880139c55c00 ti: ffff880139c84000 task.ti:
>> ffff880139c84000
>> [545444.675400] RIP: 0010:[<ffffffff811c0e95>] [<ffffffff811c0e95>]
>> kmem_cache_alloc+0x75/0x1d0
>> [545444.675641] RSP: 0018:ffff88023fc23ce8 EFLAGS: 00010286
>> [545444.675766] RAX: 0000000000000000 RBX: ffff8802302eab00 RCX:
>> 000000010eb8edbe
>> [545444.676002] RDX: 000000010eb8edbd RSI: 0000000000000020 RDI:
>> ffff88013b803700
>> [545444.676237] RBP: ffff88023fc23d18 R08: 00000000000175a0 R09:
>> ffffffff81517e70
>> [545444.676472] R10: 000000000000006b R11: 0000000000000000 R12:
>> ffff88a005040200
>> [545444.676706] R13: 0000000000000020 R14: ffff88013b803700 R15:
>> ffff88013b803700
>> [545444.676942] FS: 0000000000000000(0000) GS:ffff88023fc20000(0000)
>> knlGS:0000000000000000
>> [545444.677180] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>> [545444.677307] CR2: ffff88a005040200 CR3: 0000000237e63000 CR4:
>> 00000000000007e0
>> [545444.677543] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
>> 0000000000000000
>> [545444.677779] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
>> 0000000000000400
>> [545444.678014] Stack:
>> [545444.678127] ffff880237ea2040 ffff8802302eab00 0000000000000280
>> 0000000000000280
>> [545444.678370] 0000000000000006 ffff880236bb1b60 ffff88023fc23d40
>> ffffffff81517e70
>> [545444.678614] 0000000000000280 ffff8802302eab00 0000000000000000
>> ffff88023fc23d60
>> [545444.678857] Call Trace:
>> [545444.678973] <IRQ>
>>
>> [545444.678982]
>> [545444.679100] [<ffffffff81517e70>] build_skb+0x30/0x1d0
>> [545444.679222] [<ffffffff8151a973>] __alloc_rx_skb+0x63/0xb0
>> [545444.679349] [<ffffffff8151a9db>] __netdev_alloc_skb+0x1b/0x40
>> [545444.679492] [<ffffffffa0104d8e>] ixgbe_clean_rx_irq+0xee/0xa50 [ixgbe]
>> [545444.679624] [<ffffffff8152862f>] ? __napi_complete+0x1f/0x30
>> [545444.679756] [<ffffffffa0106738>] ixgbe_poll+0x2d8/0x6d0 [ixgbe]
>> [545444.679886] [<ffffffff8152b092>] net_rx_action+0x152/0x240
>> [545444.680015] [<ffffffff81084aef>] __do_softirq+0xef/0x280
>> [545444.680144] [<ffffffff8164735c>] call_softirq+0x1c/0x30
>> [545444.680277] [<ffffffff81016fc5>] do_softirq+0x65/0xa0
>> [545444.680402] [<ffffffff81084e85>] irq_exit+0x115/0x120
>> [545444.680529] [<ffffffff81647ef8>] do_IRQ+0x58/0xf0
>> [545444.680660] [<ffffffff8163d1ad>] common_interrupt+0x6d/0x6d
>> [545444.680786] <EOI>
>> [545444.680794]
>> [545444.680914] [<ffffffff81058e96>] ? native_safe_halt+0x6/0x10
>> [545444.681041] [<ffffffff8101dbcf>] default_idle+0x1f/0xc0
>> [545444.681168] [<ffffffff8101e4d6>] arch_cpu_idle+0x26/0x30
>> [545444.681297] [<ffffffff810d62c5>] cpu_startup_entry+0x245/0x290
>> [545444.681427] [<ffffffff810475fa>] start_secondary+0x1ba/0x230
>> [545444.681554] Code: ce 00 00 49 8b 50 08 4d 8b 20 49 8b 40 10 4d 85
>> e4 0f 84 1f 01 00 00 48 85 c0 0f 84 16 01 00 00 49 63 46 20 48 8d 4a
>> 01 4d 8b 06 <49> 8b 1c 04 4c
>> 89 e0 65 49 0f c7 08 0f 94 c0 84 c0 74 b9 49 63
>> [545444.682056] RIP [<ffffffff811c0e95>] kmem_cache_alloc+0x75/0x1d0
>> [545444.682186] RSP <ffff88023fc23ce8>
>> [545444.682305] CR2: ffff88a005040200
>>
>>
>> Every time description and call stack are the same.
>>
>> What can be cause of these crashes?
>>
>> Thanks.
>>
> I am wondering if your kernel has this commit id, 32b3e08fff60494cd1d281a39b51583edfd2b18f.
> As this seems to be added to fix issues that look very similar to the trace you are receiving.
> Nick
--
WBR
^ permalink raw reply
* Re: [RFC PATCH 4/5] mlx4: add support for fast rx drop bpf program
From: Jesper Dangaard Brouer @ 2016-04-04 14:57 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Eric Dumazet, Brenden Blanco, davem, netdev, tom, Or Gerlitz,
daniel, john.fastabend, brouer
In-Reply-To: <20160402024710.GA59703@ast-mbp.thefacebook.com>
On Fri, 1 Apr 2016 19:47:12 -0700 Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> My guess we're hitting 14.5Mpps limit for empty bpf program
> and for program that actually looks into the packet because we're
> hitting 10G phy limit of 40G nic. Since physically 40G nic
> consists of four 10G phys. There will be the same problem
> with 100G and 50G nics. Both will be hitting 25G phy limit.
> We need to vary packets somehow. Hopefully Or can explain that
> bit of hw design.
> Jesper's experiments with mlx4 showed the same 14.5Mpps limit
> when sender blasting the same packet over and over again.
That is an interesting observation Alexei, and could explain the pps limit
I hit on 40G, with single flow testing. AFAIK 40G is 4x 10G PHYs, and
100G is 4x 25G PHYs.
I have a pktgen script that tried to avoid this pitfall. By creating a
new flow per pktgen kthread. I call it "pktgen_sample05_flow_per_thread.sh"[1]
[1] https://github.com/netoptimizer/network-testing/blob/master/pktgen/pktgen_sample05_flow_per_thread.sh
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
Author of http://www.iptv-analyzer.org
LinkedIn: http://www.linkedin.com/in/brouer
^ permalink raw reply
* Re: [PATCH v5 2/4] Documentation: Bindings: Add STM32 DWMAC glue
From: Alexandre Torgue @ 2016-04-04 14:40 UTC (permalink / raw)
To: Joachim Eastwood
Cc: Rob Herring, Chen-Yu Tsai, Maxime Coquelin, Giuseppe Cavallaro,
netdev, devicetree, linux-kernel, linux-arm-kernel
In-Reply-To: <CAJgp7zxfFRhfNjcE23n+DKD0ffrvH4_sowPjK_-DZAS4ecH5dg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
Hi Rob,
2016-03-22 17:11 GMT+01:00 Alexandre Torgue <alexandre.torgue-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>:
> Hi guys,
>
> I will fix typo issues (s/vesrion/version and ethernet @).
>
> Concerning compatible string. For sure "snps,dwmac-3.50a" string is
> not used inside glue driver.
> I perfere to keep it for information but if you really want that I
> remove it I will not block ;)
>
> 2016-03-21 16:36 GMT+01:00 Joachim Eastwood <manabian-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>:
>> On 21 March 2016 at 13:40, Rob Herring <robh-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
>>> On Sat, Mar 19, 2016 at 12:00:22AM +0800, Chen-Yu Tsai wrote:
>>>> Hi,
>>>>
>>>> On Fri, Mar 18, 2016 at 11:37 PM, Alexandre TORGUE
>>>> <alexandre.torgue-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>>>> > +- clocks: Must contain a phandle for each entry in clock-names.
>>>> > +- clock-names: Should be "stmmaceth" for the host clock.
>>>
> We can remove host clock (stmmac eth) entry here and refer to
> stmmac.txt binding for common entry
>
>>> This doesn't sound like the clock input signal name...
>>>
>>>> > + Should be "tx-clk" for the MAC TX clock.
>>>> > + Should be "rx-clk" for the MAC RX clock.
>>>
>>> How can other DWMAC blocks not have these clocks? The glue can't really
>>> add these clocks. It could combine them into one or a new version of
>>> DWMAC could have a different number of clock inputs. So if there is
>>> variation here, then some of the bindings are probably wrong. I guess
>>> the only change I'm suggesting is possibly moving these into common
>>> binding doc.
>>
>> The LPC18xx implementation probably have these clocks as well but the
>> LPC1850 user manual only documents the main clock. Someone with access
>> to the IP block doc from Synopsys should be able to check which clocks
>> the MAC really needs.
>>
>> Rockchip bindings have two clocks named "mac_clk_rx" and "mac_clk_tx".
>> These are probably the same as stm32 needs so maybe use these names
>> and move them into the main doc and update the rockchip binding.
>>
> I think we can use same name. But I have a doubt on moving it in a
> common bindings (maybe I don't well understood). When you say "common
> binding file" is it "stmmac.txt" binding ? If yes does it mean that we
> have to control it inside stmmac driver (no more in glue) ? In this
> case those clocks will become "required" for stm32 and rockship but
> not for others chip. It could create confusion?
A gentle ping. Can you give me your feedback please ?
I will send next patchset version according to your answer.
Thanks in advance
Alex
>
> Best regards
>
> Alex
>
>>
>> regards,
>> Joachim Eastwood
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox