Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH net-next v4 0/5] Introduce NETIF_F_GRO_HW
From: Or Gerlitz @ 2017-12-14 20:01 UTC (permalink / raw)
  To: Michael Chan
  Cc: David Miller, Linux Netdev List, Andy Gospodarek, Gal Pressman
In-Reply-To: <CACKFLi=JhTRD8jK+HqDcTxJHcYzKoGXNfUXs9DKiFQULDUkKYg@mail.gmail.com>

On Thu, Dec 14, 2017 at 8:36 PM, Michael Chan <michael.chan@broadcom.com> wrote:
> On Thu, Dec 14, 2017 at 8:34 AM, Or Gerlitz <gerlitz.or@gmail.com> wrote:
>> On Mon, Dec 11, 2017 at 1:41 PM, Michael Chan <michael.chan@broadcom.com> wrote:
>>> Introduce NETIF_F_GRO_HW feature flag and convert drivers that support
>>> hardware GRO to use the new flag.
>>
>> Hi Michael,
>>
>> Could you add more performance motivations/results? looking on your
>> netconf slides [1]
>> I see (much) better CPU utilization (slide 5) -- do you have more
>> numbers to share?
>
> The motivations are the same as LRO:  to achieve higher RX throughput
> and lower CPU utilization.

makes sense, what is the +/- %% improvement in RX throughput you see?

> You can think of it as an improved version of LRO that is stricter and
> provides more information to construct the GRO frame so that it is
> reversible.

sounds good to me

^ permalink raw reply

* Re: [PATCHv2 net-next 09/15] net: sch: api: add extack support in qdisc_get_rtab
From: David Ahern @ 2017-12-14 20:02 UTC (permalink / raw)
  To: Alexander Aring, jhs
  Cc: xiyou.wangcong, jiri, davem, netdev, kernel, David Ahern
In-Reply-To: <20171214183905.23066-10-aring@mojatatu.com>

On 12/14/17 11:38 AM, Alexander Aring wrote:
> diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
> index 816e8b0c2609..24b286d763b7 100644
> --- a/net/sched/sch_api.c
> +++ b/net/sched/sch_api.c
> @@ -393,13 +393,16 @@ static __u8 __detect_linklayer(struct tc_ratespec *r, __u32 *rtab)
>  static struct qdisc_rate_table *qdisc_rtab_list;
>  
>  struct qdisc_rate_table *qdisc_get_rtab(struct tc_ratespec *r,
> -					struct nlattr *tab)
> +					struct nlattr *tab,
> +					struct netlink_ext_ack *extack)
>  {
>  	struct qdisc_rate_table *rtab;
>  
>  	if (tab == NULL || r->rate == 0 || r->cell_log == 0 ||
> -	    nla_len(tab) != TC_RTAB_SIZE)
> +	    nla_len(tab) != TC_RTAB_SIZE) {
> +		NL_SET_ERR_MSG(extack, "Invalid rtab parameters for searching");

What is rtab short for? Can you expand that to something meaningful to
the user?

>  		return NULL;
> +	}
>  
>  	for (rtab = qdisc_rtab_list; rtab; rtab = rtab->next) {
>  		if (!memcmp(&rtab->rate, r, sizeof(struct tc_ratespec)) &&
> @@ -418,6 +421,8 @@ struct qdisc_rate_table *qdisc_get_rtab(struct tc_ratespec *r,
>  			r->linklayer = __detect_linklayer(r, rtab->data);
>  		rtab->next = qdisc_rtab_list;
>  		qdisc_rtab_list = rtab;
> +	} else {
> +		NL_SET_ERR_MSG(extack, "Failed to allocate new qdisc rtab");

ditto here

^ permalink raw reply

* Re: [PATCH] openvswitch: Trim off padding before L3 conntrack processing
From: Ed Swierk @ 2017-12-14 20:05 UTC (permalink / raw)
  To: Pravin Shelar
  Cc: ovs-dev, Linux Kernel Network Developers, Lance Richardson,
	Benjamin Warren, Keith Holleman
In-Reply-To: <CAOrHB_CeoZ-MSmYNrMniccdTnT+VYUGEStq38CEi1HjtyGeJHA@mail.gmail.com>

On Wed, Dec 13, 2017 at 4:58 PM, Pravin Shelar <pshelar@ovn.org> wrote:
> On Tue, Dec 12, 2017 at 8:17 AM, Ed Swierk <eswierk@skyportsystems.com> wrote:
>> A short IPv4 packet may have up to 6 bytes of padding following the IP
>> payload when received on an Ethernet device.
>>
>> In the normal IPv4 receive path, ip_rcv() trims the packet to
>> ip_hdr->tot_len before invoking NF_INET_PRE_ROUTING hooks (including
>> conntrack). Then any subsequent L3+ processing steps, like
>> nf_checksum(), use skb->len as the length of the packet, rather than
>> referring back to ip_hdr->tot_len. In the IPv6 receive path, ip6_rcv()
>> does the same using ipv6_hdr->payload_len.
>>
>> In the OVS conntrack receive path, this trimming does not occur, so
>> the checksum verification in tcp_header() fails, printing "nf_ct_tcp:
>> bad TCP checksum". Extra zero bytes don't affect the checksum, but the
>> length in the IP pseudoheader does. That length is based on skb->len,
>> and without trimming, it doesn't match the length the sender used when
>> computing the checksum.
>>
>> With this change, OVS conntrack trims IPv4 and IPv6 packets prior to
>> L3 processing.
>>
>> Signed-off-by: Ed Swierk <eswierk@skyportsystems.com>
>> ---
>>  net/openvswitch/conntrack.c | 17 +++++++++++++++++
>>  1 file changed, 17 insertions(+)
>>
>> diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
>> index d558e882ca0c..3a7c9215c431 100644
>> --- a/net/openvswitch/conntrack.c
>> +++ b/net/openvswitch/conntrack.c
>> @@ -1105,12 +1105,29 @@ int ovs_ct_execute(struct net *net, struct sk_buff *skb,
>>                    const struct ovs_conntrack_info *info)
>>  {
>>         int nh_ofs;
>> +       unsigned int nh_len;
>>         int err;
>>
>>         /* The conntrack module expects to be working at L3. */
>>         nh_ofs = skb_network_offset(skb);
>>         skb_pull_rcsum(skb, nh_ofs);
>>
>> +       /* Trim to L3 length since nf_checksum() doesn't expect padding. */
> Can you explore if nf_checksum can be changed to avoid the padding?

The nf_ip_checksum() and nf_ip6_checksum() helper functions can easily
be changed to avoid the padding.

My worry is that conntrack is just one of many netfilter hooks that
perform L3+ processing, and may assume that once skb->data points to
the L3 header, skb->len reflects the length of the L3 header and
payload. For example, in nf_conntrack_ftp.c, help() uses skb->len to
determine the length of the FTP payload and the TCP sequence number of
the next packet; this would be thrown off by lower-layer padding.

br_netfilter, a cousin of OVS, has always preserved this
assumption--like ip_rcv() and ip6_rcv(), br_validate_ipv4() and
br_validate_ipv6() trim the skb to the L3 length before they invoke
NF_INET_PRE_ROUTING hooks. Modifying OVS to fit the mold seems more
straightforward than changing this assumption.

>> +       switch (skb->protocol) {
>> +       case htons(ETH_P_IP):
>> +               nh_len = ntohs(ip_hdr(skb)->tot_len);
>> +               break;
>> +       case htons(ETH_P_IPV6):
>> +               nh_len = ntohs(ipv6_hdr(skb)->payload_len)
>> +                       + sizeof(struct ipv6hdr);
>> +               break;
>> +       default:
>> +               nh_len = skb->len;
>> +       }
>> +       err = pskb_trim_rcsum(skb, nh_len);
>> +       if (err)
>> +               return err;
>> +
> In case of error skb needs to be freed.

Thanks, I will fix this.

--Ed

^ permalink raw reply

* [PATCH bpf 0/5] Couple of BPF JIT fixes
From: Daniel Borkmann @ 2017-12-14 20:07 UTC (permalink / raw)
  To: ast; +Cc: holzheu, naveen.n.rao, davem, netdev, Daniel Borkmann

Two fixes that deal with buggy usage of bpf_helper_changes_pkt_data()
in the sense that they also reload cached skb data when there's no
skb context but xdp one, for example. A fix where skb meta data is
reloaded out of the wrong register on helper call, rest is test cases
and making sure on verifier side that there's always the guarantee
that ctx sits in r1. Thanks!

Daniel Borkmann (5):
  bpf, s390x: do not reload skb pointers in non-skb context
  bpf, ppc64: do not reload skb pointers in non-skb context
  bpf: guarantee r1 to be ctx in case of bpf_helper_changes_pkt_data
  bpf, sparc: fix usage of wrong reg for load_skb_regs after call
  bpf: add test case for ld_abs and helper changing pkt data

 arch/powerpc/net/bpf_jit_comp64.c           |  6 ++--
 arch/s390/net/bpf_jit_comp.c                | 11 ++++----
 arch/sparc/net/bpf_jit_comp_64.c            |  6 ++--
 kernel/bpf/verifier.c                       |  6 ++++
 lib/test_bpf.c                              | 43 +++++++++++++++++++++++++++++
 tools/testing/selftests/bpf/test_verifier.c | 24 ++++++++++++++++
 6 files changed, 86 insertions(+), 10 deletions(-)

-- 
2.9.5

^ permalink raw reply

* [PATCH bpf 2/5] bpf, ppc64: do not reload skb pointers in non-skb context
From: Daniel Borkmann @ 2017-12-14 20:07 UTC (permalink / raw)
  To: ast; +Cc: holzheu, naveen.n.rao, davem, netdev, Daniel Borkmann
In-Reply-To: <20171214200727.22230-1-daniel@iogearbox.net>

The assumption of unconditionally reloading skb pointers on
BPF helper calls where bpf_helper_changes_pkt_data() holds
true is wrong. There can be different contexts where the helper
would enforce a reload such as in case of XDP. Here, we do
have a struct xdp_buff instead of struct sk_buff as context,
thus this will access garbage.

JITs only ever need to deal with cached skb pointer reload
when ld_abs/ind was seen, therefore guard the reload behind
SEEN_SKB.

Fixes: 156d0e290e96 ("powerpc/ebpf/jit: Implement JIT compiler for extended BPF")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: Sandipan Das <sandipan@linux.vnet.ibm.com>
---
 arch/powerpc/net/bpf_jit_comp64.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c
index 46d74e8..d183b48 100644
--- a/arch/powerpc/net/bpf_jit_comp64.c
+++ b/arch/powerpc/net/bpf_jit_comp64.c
@@ -763,7 +763,8 @@ static int bpf_jit_build_body(struct bpf_prog *fp, u32 *image,
 			func = (u8 *) __bpf_call_base + imm;
 
 			/* Save skb pointer if we need to re-cache skb data */
-			if (bpf_helper_changes_pkt_data(func))
+			if ((ctx->seen & SEEN_SKB) &&
+			    bpf_helper_changes_pkt_data(func))
 				PPC_BPF_STL(3, 1, bpf_jit_stack_local(ctx));
 
 			bpf_jit_emit_func_call(image, ctx, (u64)func);
@@ -772,7 +773,8 @@ static int bpf_jit_build_body(struct bpf_prog *fp, u32 *image,
 			PPC_MR(b2p[BPF_REG_0], 3);
 
 			/* refresh skb cache */
-			if (bpf_helper_changes_pkt_data(func)) {
+			if ((ctx->seen & SEEN_SKB) &&
+			    bpf_helper_changes_pkt_data(func)) {
 				/* reload skb pointer to r3 */
 				PPC_BPF_LL(3, 1, bpf_jit_stack_local(ctx));
 				bpf_jit_emit_skb_loads(image, ctx);
-- 
2.9.5

^ permalink raw reply related

* [PATCH bpf 1/5] bpf, s390x: do not reload skb pointers in non-skb context
From: Daniel Borkmann @ 2017-12-14 20:07 UTC (permalink / raw)
  To: ast; +Cc: holzheu, naveen.n.rao, davem, netdev, Daniel Borkmann
In-Reply-To: <20171214200727.22230-1-daniel@iogearbox.net>

The assumption of unconditionally reloading skb pointers on
BPF helper calls where bpf_helper_changes_pkt_data() holds
true is wrong. There can be different contexts where the
BPF helper would enforce a reload such as in case of XDP.
Here, we do have a struct xdp_buff instead of struct sk_buff
as context, thus this will access garbage.

JITs only ever need to deal with cached skb pointer reload
when ld_abs/ind was seen, therefore guard the reload behind
SEEN_SKB only. Tested on s390x.

Fixes: 9db7f2b81880 ("s390/bpf: recache skb->data/hlen for skb_vlan_push/pop")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com>
---
 arch/s390/net/bpf_jit_comp.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/arch/s390/net/bpf_jit_comp.c b/arch/s390/net/bpf_jit_comp.c
index e81c168..9557d8b 100644
--- a/arch/s390/net/bpf_jit_comp.c
+++ b/arch/s390/net/bpf_jit_comp.c
@@ -55,8 +55,7 @@ struct bpf_jit {
 #define SEEN_LITERAL	8	/* code uses literals */
 #define SEEN_FUNC	16	/* calls C functions */
 #define SEEN_TAIL_CALL	32	/* code uses tail calls */
-#define SEEN_SKB_CHANGE	64	/* code changes skb data */
-#define SEEN_REG_AX	128	/* code uses constant blinding */
+#define SEEN_REG_AX	64	/* code uses constant blinding */
 #define SEEN_STACK	(SEEN_FUNC | SEEN_MEM | SEEN_SKB)
 
 /*
@@ -448,12 +447,12 @@ static void bpf_jit_prologue(struct bpf_jit *jit, u32 stack_depth)
 			EMIT6_DISP_LH(0xe3000000, 0x0024, REG_W1, REG_0,
 				      REG_15, 152);
 	}
-	if (jit->seen & SEEN_SKB)
+	if (jit->seen & SEEN_SKB) {
 		emit_load_skb_data_hlen(jit);
-	if (jit->seen & SEEN_SKB_CHANGE)
 		/* stg %b1,ST_OFF_SKBP(%r0,%r15) */
 		EMIT6_DISP_LH(0xe3000000, 0x0024, BPF_REG_1, REG_0, REG_15,
 			      STK_OFF_SKBP);
+	}
 }
 
 /*
@@ -983,8 +982,8 @@ static noinline int bpf_jit_insn(struct bpf_jit *jit, struct bpf_prog *fp, int i
 		EMIT2(0x0d00, REG_14, REG_W1);
 		/* lgr %b0,%r2: load return value into %b0 */
 		EMIT4(0xb9040000, BPF_REG_0, REG_2);
-		if (bpf_helper_changes_pkt_data((void *)func)) {
-			jit->seen |= SEEN_SKB_CHANGE;
+		if ((jit->seen & SEEN_SKB) &&
+		    bpf_helper_changes_pkt_data((void *)func)) {
 			/* lg %b1,ST_OFF_SKBP(%r15) */
 			EMIT6_DISP_LH(0xe3000000, 0x0004, BPF_REG_1, REG_0,
 				      REG_15, STK_OFF_SKBP);
-- 
2.9.5

^ permalink raw reply related

* [PATCH bpf 3/5] bpf: guarantee r1 to be ctx in case of bpf_helper_changes_pkt_data
From: Daniel Borkmann @ 2017-12-14 20:07 UTC (permalink / raw)
  To: ast; +Cc: holzheu, naveen.n.rao, davem, netdev, Daniel Borkmann
In-Reply-To: <20171214200727.22230-1-daniel@iogearbox.net>

Some JITs don't cache skb context on stack in prologue, so when
LD_ABS/IND is used and helper calls yield bpf_helper_changes_pkt_data()
as true, then they temporarily save/restore skb pointer. However,
the assumption that skb always has to be in r1 is a bit of a
gamble. Right now it turned out to be true for all helpers listed
in bpf_helper_changes_pkt_data(), but lets enforce that from verifier
side, so that we make this a guarantee and bail out if the func
proto is misconfigured in future helpers.

In case of BPF helper calls from cBPF, bpf_helper_changes_pkt_data()
is completely unrelevant here (since cBPF is context read-only) and
therefore always false.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
---
 kernel/bpf/verifier.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index d459357..e39b013 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1674,7 +1674,13 @@ static int check_call(struct bpf_verifier_env *env, int func_id, int insn_idx)
 		return -EINVAL;
 	}
 
+	/* With LD_ABS/IND some JITs save/restore skb from r1. */
 	changes_data = bpf_helper_changes_pkt_data(fn->func);
+	if (changes_data && fn->arg1_type != ARG_PTR_TO_CTX) {
+		verbose(env, "kernel subsystem misconfigured func %s#%d: r1 != ctx\n",
+			func_id_name(func_id), func_id);
+		return -EINVAL;
+	}
 
 	memset(&meta, 0, sizeof(meta));
 	meta.pkt_access = fn->pkt_access;
-- 
2.9.5

^ permalink raw reply related

* [PATCH bpf 4/5] bpf, sparc: fix usage of wrong reg for load_skb_regs after call
From: Daniel Borkmann @ 2017-12-14 20:07 UTC (permalink / raw)
  To: ast; +Cc: holzheu, naveen.n.rao, davem, netdev, Daniel Borkmann
In-Reply-To: <20171214200727.22230-1-daniel@iogearbox.net>

When LD_ABS/IND is used in the program, and we have a BPF helper
call that changes packet data (bpf_helper_changes_pkt_data() returns
true), then in case of sparc JIT, we try to reload cached skb data
from bpf2sparc[BPF_REG_6]. However, there is no such guarantee or
assumption that skb sits in R6 at this point, all helpers changing
skb data only have a guarantee that skb sits in R1. Therefore,
store BPF R1 in L7 temporarily and after procedure call use L7 to
reload cached skb data. skb sitting in R6 is only true at the time
when LD_ABS/IND is executed.

Fixes: 7a12b5031c6b ("sparc64: Add eBPF JIT.")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
---
 arch/sparc/net/bpf_jit_comp_64.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/sparc/net/bpf_jit_comp_64.c b/arch/sparc/net/bpf_jit_comp_64.c
index 5765e7e..ff5f9cb 100644
--- a/arch/sparc/net/bpf_jit_comp_64.c
+++ b/arch/sparc/net/bpf_jit_comp_64.c
@@ -1245,14 +1245,16 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx)
 		u8 *func = ((u8 *)__bpf_call_base) + imm;
 
 		ctx->saw_call = true;
+		if (ctx->saw_ld_abs_ind && bpf_helper_changes_pkt_data(func))
+			emit_reg_move(bpf2sparc[BPF_REG_1], L7, ctx);
 
 		emit_call((u32 *)func, ctx);
 		emit_nop(ctx);
 
 		emit_reg_move(O0, bpf2sparc[BPF_REG_0], ctx);
 
-		if (bpf_helper_changes_pkt_data(func) && ctx->saw_ld_abs_ind)
-			load_skb_regs(ctx, bpf2sparc[BPF_REG_6]);
+		if (ctx->saw_ld_abs_ind && bpf_helper_changes_pkt_data(func))
+			load_skb_regs(ctx, L7);
 		break;
 	}
 
-- 
2.9.5

^ permalink raw reply related

* [PATCH bpf 5/5] bpf: add test case for ld_abs and helper changing pkt data
From: Daniel Borkmann @ 2017-12-14 20:07 UTC (permalink / raw)
  To: ast; +Cc: holzheu, naveen.n.rao, davem, netdev, Daniel Borkmann
In-Reply-To: <20171214200727.22230-1-daniel@iogearbox.net>

Add a test that i) uses LD_ABS, ii) zeroing R6 before call, iii) calls
a helper that triggers reload of cached skb data, iv) uses LD_ABS again.
It's added for test_bpf in order to do runtime testing after JITing as
well as test_verifier to test that the sequence is allowed.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
---
 lib/test_bpf.c                              | 43 +++++++++++++++++++++++++++++
 tools/testing/selftests/bpf/test_verifier.c | 24 ++++++++++++++++
 2 files changed, 67 insertions(+)

diff --git a/lib/test_bpf.c b/lib/test_bpf.c
index aa8812a..9e97480 100644
--- a/lib/test_bpf.c
+++ b/lib/test_bpf.c
@@ -435,6 +435,41 @@ static int bpf_fill_ld_abs_vlan_push_pop(struct bpf_test *self)
 	return 0;
 }
 
+static int bpf_fill_ld_abs_vlan_push_pop2(struct bpf_test *self)
+{
+	struct bpf_insn *insn;
+
+	insn = kmalloc_array(16, sizeof(*insn), GFP_KERNEL);
+	if (!insn)
+		return -ENOMEM;
+
+	/* Due to func address being non-const, we need to
+	 * assemble this here.
+	 */
+	insn[0] = BPF_MOV64_REG(R6, R1);
+	insn[1] = BPF_LD_ABS(BPF_B, 0);
+	insn[2] = BPF_LD_ABS(BPF_H, 0);
+	insn[3] = BPF_LD_ABS(BPF_W, 0);
+	insn[4] = BPF_MOV64_REG(R7, R6);
+	insn[5] = BPF_MOV64_IMM(R6, 0);
+	insn[6] = BPF_MOV64_REG(R1, R7);
+	insn[7] = BPF_MOV64_IMM(R2, 1);
+	insn[8] = BPF_MOV64_IMM(R3, 2);
+	insn[9] = BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
+			       bpf_skb_vlan_push_proto.func - __bpf_call_base);
+	insn[10] = BPF_MOV64_REG(R6, R7);
+	insn[11] = BPF_LD_ABS(BPF_B, 0);
+	insn[12] = BPF_LD_ABS(BPF_H, 0);
+	insn[13] = BPF_LD_ABS(BPF_W, 0);
+	insn[14] = BPF_MOV64_IMM(R0, 42);
+	insn[15] = BPF_EXIT_INSN();
+
+	self->u.ptr.insns = insn;
+	self->u.ptr.len = 16;
+
+	return 0;
+}
+
 static int bpf_fill_jump_around_ld_abs(struct bpf_test *self)
 {
 	unsigned int len = BPF_MAXINSNS;
@@ -6066,6 +6101,14 @@ static struct bpf_test tests[] = {
 		{},
 		{ {0x1, 0x42 } },
 	},
+	{
+		"LD_ABS with helper changing skb data",
+		{ },
+		INTERNAL,
+		{ 0x34 },
+		{ { ETH_HLEN, 42 } },
+		.fill_helper = bpf_fill_ld_abs_vlan_push_pop2,
+	},
 };
 
 static struct net_device dev;
diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c
index 3c64f30..b03ecfd 100644
--- a/tools/testing/selftests/bpf/test_verifier.c
+++ b/tools/testing/selftests/bpf/test_verifier.c
@@ -6117,6 +6117,30 @@ static struct bpf_test tests[] = {
 		.result = ACCEPT,
 	},
 	{
+		"ld_abs: tests on r6 and skb data reload helper",
+		.insns = {
+			BPF_MOV64_REG(BPF_REG_6, BPF_REG_1),
+			BPF_LD_ABS(BPF_B, 0),
+			BPF_LD_ABS(BPF_H, 0),
+			BPF_LD_ABS(BPF_W, 0),
+			BPF_MOV64_REG(BPF_REG_7, BPF_REG_6),
+			BPF_MOV64_IMM(BPF_REG_6, 0),
+			BPF_MOV64_REG(BPF_REG_1, BPF_REG_7),
+			BPF_MOV64_IMM(BPF_REG_2, 1),
+			BPF_MOV64_IMM(BPF_REG_3, 2),
+			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
+				     BPF_FUNC_skb_vlan_push),
+			BPF_MOV64_REG(BPF_REG_6, BPF_REG_7),
+			BPF_LD_ABS(BPF_B, 0),
+			BPF_LD_ABS(BPF_H, 0),
+			BPF_LD_ABS(BPF_W, 0),
+			BPF_MOV64_IMM(BPF_REG_0, 42),
+			BPF_EXIT_INSN(),
+		},
+		.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+		.result = ACCEPT,
+	},
+	{
 		"ld_ind: check calling conv, r1",
 		.insns = {
 			BPF_MOV64_REG(BPF_REG_6, BPF_REG_1),
-- 
2.9.5

^ permalink raw reply related

* Re: [PATCHv2 net-next 10/15] net: sch: api: add extack support in tcf_block_get
From: David Ahern @ 2017-12-14 20:08 UTC (permalink / raw)
  To: Alexander Aring, jhs
  Cc: xiyou.wangcong, jiri, davem, netdev, kernel, David Ahern
In-Reply-To: <20171214183905.23066-11-aring@mojatatu.com>

On 12/14/17 11:39 AM, Alexander Aring wrote:
> diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
> index 446ef956a79c..173107ed3726 100644
> --- a/net/sched/cls_api.c
> +++ b/net/sched/cls_api.c
> @@ -282,20 +282,24 @@ static void tcf_block_offload_unbind(struct tcf_block *block, struct Qdisc *q,
>  }
>  
>  int tcf_block_get_ext(struct tcf_block **p_block, struct Qdisc *q,
> -		      struct tcf_block_ext_info *ei)
> +		      struct tcf_block_ext_info *ei,
> +		      struct netlink_ext_ack *extack)
>  {
>  	struct tcf_block *block = kzalloc(sizeof(*block), GFP_KERNEL);
>  	struct tcf_chain *chain;
>  	int err;
>  
> -	if (!block)
> +	if (!block) {
> +		NL_SET_ERR_MSG(extack, "No tcf block given");

Wrong message for the failure. !block means kzalloc failed.

>  		return -ENOMEM;
> +	}
>  	INIT_LIST_HEAD(&block->chain_list);
>  	INIT_LIST_HEAD(&block->cb_list);
>  
>  	/* Create chain 0 by default, it has to be always present. */
>  	chain = tcf_chain_create(block, 0);
>  	if (!chain) {
> +		NL_SET_ERR_MSG(extack, "Failed to create new tcf chain");
>  		err = -ENOMEM;
>  		goto err_chain_create;
>  	}

^ permalink raw reply

* Re: [PATCHv2 net-next 14/15] net: sch: sch_cbs: add extack support
From: David Ahern @ 2017-12-14 20:11 UTC (permalink / raw)
  To: Alexander Aring, jhs
  Cc: xiyou.wangcong, jiri, davem, netdev, kernel, David Ahern
In-Reply-To: <20171214183905.23066-15-aring@mojatatu.com>

On 12/14/17 11:39 AM, Alexander Aring wrote:
> diff --git a/net/sched/sch_cbs.c b/net/sched/sch_cbs.c
> index 8bf6e163d29c..6d09ffd2371e 100644
> --- a/net/sched/sch_cbs.c
> +++ b/net/sched/sch_cbs.c
> @@ -219,14 +219,17 @@ static void cbs_disable_offload(struct net_device *dev,
>  }
>  
>  static int cbs_enable_offload(struct net_device *dev, struct cbs_sched_data *q,
> -			      const struct tc_cbs_qopt *opt)
> +			      const struct tc_cbs_qopt *opt,
> +			      struct netlink_ext_ack *extack)
>  {
>  	const struct net_device_ops *ops = dev->netdev_ops;
>  	struct tc_cbs_qopt_offload cbs = { };
>  	int err;
>  
> -	if (!ops->ndo_setup_tc)
> +	if (!ops->ndo_setup_tc) {
> +		NL_SET_ERR_MSG(extack, "Specified device does support cbs offload");

does *not* support ... ?

^ permalink raw reply

* Re: net-next libbpf broken on prev kernel release
From: Daniel Borkmann @ 2017-12-14 20:29 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Eric Leblond, Martin KaFai Lau, ast, netdev
In-Reply-To: <20171214142809.GF10463@kernel.org>

On 12/14/2017 03:28 PM, Arnaldo Carvalho de Melo wrote:
[...]
>> Arnaldo, will there be a rework of your fix that we could route to bpf tree?
> 
> I'm resuming work on it after I get my current batch tested and
> submitted, will reboot with an older kernel and follow your suggestions,
> that seems to match Alexei's and Martin's, my patch was just a RFC to
> show that we need a fallback for older kernels.

Okay, sounds good, thanks a lot!

> I needed to move on, so I updated my machine to a kernel where interlock
> of tools/ with the kernel happens and it worked, so I left this to see
> if someone else complained or if I was being too picky. :-)

:)

^ permalink raw reply

* [PATCH v2 ipsec-next 0/3] xfrm: offload api fixes
From: Shannon Nelson @ 2017-12-14 21:01 UTC (permalink / raw)
  To: steffen.klassert; +Cc: netdev

These are a couple of little fixes to the xfrm_offload API to make
life just a little easier for the poor driver developer.

Changes from v1:
 - removed netdev_err() notes  (Steffen)
 - fixed build when CONFIG_XFRM_OFFLOAD is off (kbuild robot)
 - split into multiple patches (me)


Shannon Nelson (3):
  xfrm: check for xdo_dev_state_free
  xfrm: check for xdo_dev_ops add and delete
  xfrm: wrap xfrmdev_ops with offload config

 include/linux/netdevice.h |  2 +-
 include/net/xfrm.h        |  3 ++-
 net/xfrm/xfrm_device.c    | 16 ++++++++++++----
 3 files changed, 15 insertions(+), 6 deletions(-)

-- 
2.7.4

^ permalink raw reply

* [PATCH v2 ipsec-next 2/3] xfrm: check for xdo_dev_ops add and delete
From: Shannon Nelson @ 2017-12-14 21:01 UTC (permalink / raw)
  To: steffen.klassert; +Cc: netdev
In-Reply-To: <1513285277-21092-1-git-send-email-shannon.nelson@oracle.com>

This adds a check for the required add and delete functions up front
at registration time to be sure both are defined.

Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
---
 net/xfrm/xfrm_device.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/net/xfrm/xfrm_device.c b/net/xfrm/xfrm_device.c
index 30e5746..5eb6b82 100644
--- a/net/xfrm/xfrm_device.c
+++ b/net/xfrm/xfrm_device.c
@@ -144,11 +144,19 @@ EXPORT_SYMBOL_GPL(xfrm_dev_offload_ok);
 
 static int xfrm_dev_register(struct net_device *dev)
 {
-	if ((dev->features & NETIF_F_HW_ESP) && !dev->xfrmdev_ops)
-		return NOTIFY_BAD;
-	if ((dev->features & NETIF_F_HW_ESP_TX_CSUM) &&
-	    !(dev->features & NETIF_F_HW_ESP))
+	if (!(dev->features & NETIF_F_HW_ESP)) {
+		if (dev->features & NETIF_F_HW_ESP_TX_CSUM)
+			return NOTIFY_BAD;
+		else
+			return NOTIFY_DONE;
+	}
+
+#ifdef CONFIG_XFRM_OFFLOAD
+	if (!(dev->xfrmdev_ops &&
+	      dev->xfrmdev_ops->xdo_dev_state_add &&
+	      dev->xfrmdev_ops->xdo_dev_state_delete))
 		return NOTIFY_BAD;
+#endif
 
 	return NOTIFY_DONE;
 }
-- 
2.7.4

^ permalink raw reply related

* [PATCH v2 ipsec-next 3/3] xfrm: wrap xfrmdev_ops with offload config
From: Shannon Nelson @ 2017-12-14 21:01 UTC (permalink / raw)
  To: steffen.klassert; +Cc: netdev
In-Reply-To: <1513285277-21092-1-git-send-email-shannon.nelson@oracle.com>

There's no reason to define netdev->xfrmdev_ops if
the offload facility is not CONFIG'd in.

Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
---
 include/linux/netdevice.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 2eaac7d..145d0de 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1697,7 +1697,7 @@ struct net_device {
 	const struct ndisc_ops *ndisc_ops;
 #endif
 
-#ifdef CONFIG_XFRM
+#ifdef CONFIG_XFRM_OFFLOAD
 	const struct xfrmdev_ops *xfrmdev_ops;
 #endif
 
-- 
2.7.4

^ permalink raw reply related

* [PATCH v2 ipsec-next 1/3] xfrm: check for xdo_dev_state_free
From: Shannon Nelson @ 2017-12-14 21:01 UTC (permalink / raw)
  To: steffen.klassert; +Cc: netdev
In-Reply-To: <1513285277-21092-1-git-send-email-shannon.nelson@oracle.com>

The current XFRM code assumes that we've implemented the
xdo_dev_state_free() callback, even if it is meaningless to the driver.
This patch adds a check for it before calling, as done in other APIs,
to prevent a NULL function pointer kernel crash.

Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
---
 include/net/xfrm.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index e015e16..dfabd04 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -1891,7 +1891,8 @@ static inline void xfrm_dev_state_free(struct xfrm_state *x)
 	 struct net_device *dev = xso->dev;
 
 	if (dev && dev->xfrmdev_ops) {
-		dev->xfrmdev_ops->xdo_dev_state_free(x);
+		if (dev->xfrmdev_ops->xdo_dev_state_free)
+			dev->xfrmdev_ops->xdo_dev_state_free(x);
 		xso->dev = NULL;
 		dev_put(dev);
 	}
-- 
2.7.4

^ permalink raw reply related

* Re: [PATCH 2/4] sctp: Add ip option support
From: Marcelo Ricardo Leitner @ 2017-12-14 21:04 UTC (permalink / raw)
  To: Paul Moore
  Cc: Richard Haines, selinux, netdev, linux-sctp,
	linux-security-module, Vlad Yasevich, nhorman, Stephen Smalley,
	Eric Paris
In-Reply-To: <CAHC9VhQ5iQsXum6NVkL-qdzu=bxVcaOscPjTSVufLzRbW_1_Pw@mail.gmail.com>

On Tue, Dec 12, 2017 at 05:24:46PM -0500, Paul Moore wrote:
> On Tue, Dec 12, 2017 at 4:56 PM, Marcelo Ricardo Leitner
> <marcelo.leitner@gmail.com> wrote:
> > On Tue, Dec 12, 2017 at 04:33:03PM -0500, Paul Moore wrote:
> >> On Tue, Dec 12, 2017 at 11:08 AM, Marcelo Ricardo Leitner
> >> <marcelo.leitner@gmail.com> wrote:
> >> > Hi Richard,
> >> >
> >> > On Mon, Nov 27, 2017 at 07:31:21PM +0000, Richard Haines wrote:
> >> > ...
> >> >> --- a/net/sctp/socket.c
> >> >> +++ b/net/sctp/socket.c
> >> >> @@ -3123,8 +3123,10 @@ static int sctp_setsockopt_maxseg(struct sock *sk, char __user *optval, unsigned
> >> >>
> >> >>       if (asoc) {
> >> >>               if (val == 0) {
> >> >> +                     struct sctp_af *af = sp->pf->af;
> >> >>                       val = asoc->pathmtu;
> >> >> -                     val -= sp->pf->af->net_header_len;
> >> >> +                     val -= af->ip_options_len(asoc->base.sk);
> >> >> +                     val -= af->net_header_len;
> >> >>                       val -= sizeof(struct sctphdr) +
> >> >>                                       sizeof(struct sctp_data_chunk);
> >> >>               }
> >> >
> >> > Right below here there is a call to sctp_frag_point(). That function
> >> > also needs this tweak.
> >> >
> >> > Yes, we should simplify all these calculations. I have a patch to use
> >> > sctp_frag_point on where it is currently recalculating it on
> >> > sctp_datamsg_from_user(), but probably should include other places as
> >> > well.
> >>
> >> FYI: Richard let me know he is occupied with another project at the
> >> moment and likely won't be able to do another respin until next week
> >> at the earliest.
> >
> > Okay, thanks. I can do a follow-up patch if it helps.
> 
> I'll leave that up to you, I think your comments are pretty
> straightforward and should be easy for Richard to incorporate, and
> there is a lot to be said for including the fix in the original patch,
> but if you would prefer to send a separate patch I think that's fine
> too.

This patchset conflicts with the stream schedulers patchset (on
sctp.h) and will also conflict with the stream interleaving patchsets
from Xin (other files).

I rebased the patchset upon current crypto tree but the last patchset
on stream interleaving is still to be accepted via net-next.

I'm unsure on how to proceed here. Please advise.

Thanks,
Marcelo

^ permalink raw reply

* Re: [RFC PATCH 0/9] ethtool netlink interface (WiP)
From: John W. Linville @ 2017-12-14 21:07 UTC (permalink / raw)
  To: Michal Kubecek; +Cc: netdev, linux-kernel
In-Reply-To: <cover.1513000306.git.mkubecek@suse.cz>

Michal,

Thanks for looking at this issue.

While the kernel and userland bits here are at least somewhat
decoupled, as the current maintainer for the userland ethtool I
feel compelled to comment. FWIW I had started down a similar road,
but I became dissatisfied with my own results. The main problem was
that the only obvious API-conversion technique essentially amounted
to re-implementing the ioctl-based API, but on top of the netlink
transport. Having been burned by something similar in the pre-nl80211
days with wireless extensions, I really didn't want to repeat that.

Even without considering the ioctl problesms, the current ethtool
API seems a bit crufty. It has been a catch-all, "where else would it
go?" dumping ground for a long time, and it has accrued a number of
not-entirely-related bits of functionality. In my mind, what needs
to happen is that these various bits of functionality need to be
reorganized into a handful of groupings. Then, each group needs an
API designed around semantics that are natural to the functionality
being addressed. I believe this is essentially the idea that others
have expressed with the "move some of the ethtool bits to devlink"
comments. I think that probably makes sense, although trying to shove
everything into devlink probably makes no more sense than keeping
the entire ethtool API intact on top of a netlink transport. Anyway,
I think that with a reasonable set of groupings, the semantics would
fall-out naturally and implementing them on netlink or any other
suitable transport would be reasonably trivial.

Unfortunately, I have yet to formultate a useful set of abstractions
for grouping the various bits of the ethtool API. I have reached-out
to a few community folks seeking their wisdom, but since no one has
been forthcoming with such a set of abstractions, I'll presume that
either I failed to convey my message, my idea isn't too good, or none
of them are any smarter than me -- I'll avoid identifying any of them
here in order to save us all some embarassment! :-)

In short, what I would like to see is a true rethink of what APIs need
to be provided to NICs, outside of the basic netdev abstraction. I
wish I had the answer to what those APIs should be, but I don't. I do
believe that with a well-conceived group of APIs, the proper semantics
will fall-out naturally. Any takers? :-)

Michal, once again -- thanks for attempting to address this
issue. Please do not take any of the above as discouragement. It is
clear that ethtool needs replacement, and we all know that 'perfect'
is the enemy of 'good'. I just would hate to miss the opportunity for a
'better' API just because ethtool's ioctly API has lived too long.

John

On Mon, Dec 11, 2017 at 02:53:11PM +0100, Michal Kubecek wrote:
> This is still work in progress and only a very small part of the ioctl
> interface is reimplemented but I would like to get some comments before
> the patchset becomes too big and changing things becomes too tedious.
> 
> The interface used for communication between ethtool and kernel is based on
> ioctl() and suffers from many problems. The most pressing seems the be the
> lack of extensibility. While some of the newer commands use structures
> designed to allow future extensions (e.g. GFEATURES or TEST), most either
> allow no extension at all (GPAUSEPARAM, GCOALESCE) or only limited set of
> reserved fields (GDRVINFO, GEEE). Even most of those which support future
> extensions limit the data types that can be used.
> 
> This series aims to provide an alternative interface based on netlink which
> is what other network configuration utilities use. In particular, it uses
> generic netlink (family "ethtool"). The goal is to provide an interface
> which would be extensible, flexible and practical both for ethtool and for
> other network configuration tools (e.g. wicked or systemd-networkd).
> 
> The interface is documented in Documentation/networking/ethtool-netlink.txt
> 
> I'll post RFC patch series for ethtool in a few days, it still needs some
> more polishing.
> 
> Basic concepts:
> 
> - the interface is based on generic netlink (family name "ethtool")
> 
> - provide everything ioctl can do but allow easier future extensions
> 
> - inextensibility of ioctl interface resulted in way too many commands,
>   many of them obsoleted by newer ones; reduce the number by  ignoring the
>   obsolete commands and grouping some together
> 
> - for "set" type commands, netlink allows providing only the attributes to
>   be changed; therefore we don't need a get-modify-set cycle, userspace can
>   simply say what it wants to change and how
> 
> - be less dependent on ethtool and kernel being in sync; allow e.g. saying
>   "ethtool -s eth0 advertise xyz off" without knowing what "xyz" means;
>   it's kernel's job to know what mode "xyz" is and if it exists and is
>   supported
> 
> Unresolved questions/tasks:
> 
> - allow dumps for "get" type requests, e.g. listing EEE settings for all
>   interfaces in current netns
> 
> - find reasonable format for data transfers (e.g. eeprom dump or flash)
> 
> - while the netlink interface allows easy future extensions, ethtool_ops
>   interface does not; some settings could be implemented using tunables and
>   accessed via relevant netlink messages (as well as tunables) from
>   userspace but in the long term, something better will be needed
> 
> - it would be nice if driver could provide useful error/warning messages to
>   be passed to userspace via extended ACK; example: while testing, I found
>   a driver which only allows values 0, 1, 3 and 10000 for certain parameter
>   but the only way poor user can find out is either by trying all values or
>   by checking driver source
> 
> Michal Kubecek (9):
>   netlink: introduce nla_put_bitfield32()
>   ethtool: introduce ethtool netlink interface
>   ethtool: helper functions for netlink interface
>   ethtool: netlink bitset handling
>   ethtool: implement GET_DRVINFO message
>   ethtool: implement GET_SETTINGS message
>   ethtool: implement SET_SETTINGS message
>   ethtool: implement GET_PARAMS message
>   ethtool: implement SET_PARAMS message
> 
>  Documentation/networking/ethtool-netlink.txt |  466 ++++++
>  include/linux/ethtool_netlink.h              |   12 +
>  include/linux/netdevice.h                    |    2 +
>  include/net/netlink.h                        |   15 +
>  include/uapi/linux/ethtool.h                 |    3 +
>  include/uapi/linux/ethtool_netlink.h         |  239 +++
>  net/Kconfig                                  |    7 +
>  net/core/Makefile                            |    3 +-
>  net/core/ethtool.c                           |  150 +-
>  net/core/ethtool_common.c                    |  158 ++
>  net/core/ethtool_common.h                    |   19 +
>  net/core/ethtool_netlink.c                   | 2323 ++++++++++++++++++++++++++
>  12 files changed, 3260 insertions(+), 137 deletions(-)
>  create mode 100644 Documentation/networking/ethtool-netlink.txt
>  create mode 100644 include/linux/ethtool_netlink.h
>  create mode 100644 include/uapi/linux/ethtool_netlink.h
>  create mode 100644 net/core/ethtool_common.c
>  create mode 100644 net/core/ethtool_common.h
>  create mode 100644 net/core/ethtool_netlink.c
> 
> -- 
> 2.15.1
> 
> 

-- 
John W. Linville		Someday the world will need a hero, and you
linville@tuxdriver.com			might be all we have.  Be ready.

^ permalink raw reply

* Re: [PATCH net-next 1/1] net: dsa: microchip: Add Microchip KSZ8895 DSA driver
From: Pavel Machek @ 2017-12-14 21:12 UTC (permalink / raw)
  To: Tristram.Ha
  Cc: andrew, f.fainelli, muvarov, nathan.leigh.conrad, vivien.didelot,
	UNGLinuxDriver, netdev
In-Reply-To: <93AF473E2DA327428DE3D46B72B1E9FD4113EACE@CHN-SV-EXMX02.mchp-main.com>

[-- Attachment #1: Type: text/plain, Size: 3179 bytes --]

Hi!

Thanks for the support.

> > root@miro:~# mii-tool lan3
> > lan3: negotiated 1000baseT-HD flow-control, link ok
> > 
> > But IIRC the switch is 100mbit? And dmesg does get it right. Its just
> > mii-tool that is confused.
> > 
> > Link detection seems to work
> > 
> > root@miro:/sys/bus/spi/devices/spi2.0# mii-tool lan1
> > lan1: negotiated 1000baseT-HD flow-control, link ok
> > root@miro:/sys/bus/spi/devices/spi2.0# mii-tool lan1
> > lan1: no link
> > 
> > (But that really should be 100baseT, not 1000baseT).
> 
> ethtool lan3 should also report the correct setting.

Yes, after port is configured, ethtool produces right results:

 Speed: 100Mb/s
 Duplex: Full

Before that, it looks rather confusing:

root@miro:~# ethtool lan2
root@miro:~# ethtool lan2
Settings for lan2:
...
	Speed: Unknown!
	Duplex: Unknown! (255)
	    
> > Is there register dump available somewhere? I was using
> > /sys/bus/spi/devices/spi32766.0/registers but this does not seem to be
> > available.
> 
> There is a patch to add that functionality.  It is very simple and I will send it
> to you later.  Without that it is hard to debug the DSA driver if there is
> something wrong.

That would be nice :-).

> I also have a simple utility to communicate with that registers file to read/write
> register individually.  Is there a standard Linux utility for that
> function?

I don't think standard utility exists. Binary file which can be
written by userspace shoudl be enough.

> >      p04_rx: 660
> >      p04_rx_hi: 0
> >      p04_rx_undersize: 0
> >      p04_rx_fragments: 20
> 
> This indicates a problem with the MAC.  Are you using a MII or RMII version?

I do have:
                mac0: ethernet@800f0000 {
	                       phy-mode = "rmii";
	                       pinctrl-names = "default";
			       ...
	       }

> >      p04_tx_hi: 0
> >      p04_tx_late_col: 0
> >      p04_tx_pause: 0
> >      p04_tx_bcast: 0
> >      p04_tx_mcast: 3
> 

> This indicates the host port tried to send frames to the MAC.
> >      tx_total_col: 0
> >      tx_exc_col: 0
> >      tx_single_col: 0
> >      tx_mult_col: 0
> >      rx_discards: 0
> >      tx_discards: 0
> 
> They just reported frames are received from the port.  Because of problem with
> the host port there is no transmission coming from the host port.

I disabled second ethernet port in the dts so it could not interfere
with testing, butno change.

Is there any way to debug the host port problems? I do have

   spi@0 {
        compatible = "microchip,ksz8895";
	...
 	ports {
 		port@4 {
 			reg = <4>;
 			label = "cpu";
 			ethernet = <&mac0>;
 			fixed-link {
 			    speed = <100>;
 			    full-duplex;
 			    };
 		};
        };
 };

On one side, and

 mac0: ethernet@800f0000 {
       phy-mode = "rmii";
       status = "okay";
       fixed-link {
       		  speed = <100>;
 		  full-duplex;
   };
 };

on the other...

Thanks,
									Pavel
 
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply

* Re: [PATCH 1/1] net: usb: qmi_wwan: add Telit ME910 PID 0x1101 support
From: Daniele Palmas @ 2017-12-14 21:28 UTC (permalink / raw)
  To: Bjørn Mork; +Cc: netdev
In-Reply-To: <87k1xpfb48.fsf@miraculix.mork.no>

2017-12-14 18:55 GMT+01:00 Bjørn Mork <bjorn@mork.no>:
> Daniele Palmas <dnlplm@gmail.com> writes:
>
>> This patch adds support for Telit ME910 PID 0x1101.
>>
>> Signed-off-by: Daniele Palmas <dnlplm@gmail.com>
>
> Acked-by: Bjørn Mork <bjorn@mork.no>
>
>>       bInterfaceSubClass    254
>
> Just curious: Is there some meaning hidden here?
>

That was puzzling also for me, so I checked the original QC
composition and had the same value. None of the documents I have has a
reference to this.

As far as I know, from the functional point of view, it is a normal AT
command port like the other one. But I'll try to investigate a bit
more..

Thanks,
Daniele

>
> Bjørn

^ permalink raw reply

* Re: [PATCH net-next] qmi_wwan: set FLAG_SEND_ZLP to avoid network initiated disconnect
From: Vamsi Samavedam @ 2017-12-14 21:39 UTC (permalink / raw)
  To: Bjørn Mork; +Cc: netdev, linux-usb
In-Reply-To: <20171214185550.15779-1-bjorn@mork.no>

On Thu, Dec 14, 2017 at 07:55:50PM +0100, Bjørn Mork wrote:
> It has been reported that the dummy byte we add to avoid
> ZLPs can be forwarded by the modem to the PGW/GGSN, and that
> some operators will drop the connection if this happens.
> 
> In theory, QMI devices are based on CDC ECM and should as such
> both support ZLPs and silently ignore the dummy byte.  The latter
> assumption failed.  Let's test out the first.
> 
> Signed-off-by: Bjørn Mork <bjorn@mork.no>
> ---
> I am a bit worried about the effect of this change on all the
> devices I can't test myself. But trying it is the only way we
> can ever find out....

flag zlp is tested with pids: 9034,9048,904c,9075, 908E,9079,
908A,909F and 90A4. In general, qualcomm usb firmware can
handle zlps.

^ permalink raw reply

* Re: [PATCH net-next] qmi_wwan: set FLAG_SEND_ZLP to avoid network initiated disconnect
From: Bjørn Mork @ 2017-12-14 21:42 UTC (permalink / raw)
  To: Vamsi Samavedam; +Cc: netdev, linux-usb
In-Reply-To: <20171214213959.GC21058@vskrishn2-linux.qualcomm.com>

Vamsi Samavedam <vskrishn@codeaurora.org> writes:

> On Thu, Dec 14, 2017 at 07:55:50PM +0100, Bjørn Mork wrote:
>> It has been reported that the dummy byte we add to avoid
>> ZLPs can be forwarded by the modem to the PGW/GGSN, and that
>> some operators will drop the connection if this happens.
>> 
>> In theory, QMI devices are based on CDC ECM and should as such
>> both support ZLPs and silently ignore the dummy byte.  The latter
>> assumption failed.  Let's test out the first.
>> 
>> Signed-off-by: Bjørn Mork <bjorn@mork.no>
>> ---
>> I am a bit worried about the effect of this change on all the
>> devices I can't test myself. But trying it is the only way we
>> can ever find out....
>
> flag zlp is tested with pids: 9034,9048,904c,9075, 908E,9079,
> 908A,909F and 90A4. In general, qualcomm usb firmware can
> handle zlps.

Thanks.  That's very good to know.


Bjørn

^ permalink raw reply

* Re: Linux 4.14 - regression: broken tun/tap / bridge network with virtio - bisected
From: Willem de Bruijn @ 2017-12-14 22:17 UTC (permalink / raw)
  To: Andreas Hartmann
  Cc: Michal Kubecek, Jason Wang, David Miller, Network Development
In-Reply-To: <d71df64e-e65f-4db4-6f2e-c002c15fcbe4@01019freenet.de>

>> Well, the patch does not fix hanging VMs, which have been shutdown and
>> can't be killed any more.
>> Because of the stack trace
>>
>> [<ffffffffc0d0e3c5>] vhost_net_ubuf_put_and_wait+0x35/0x60 [vhost_net]
>> [<ffffffffc0d0f264>] vhost_net_ioctl+0x304/0x870 [vhost_net]
>> [<ffffffff9b25460f>] do_vfs_ioctl+0x8f/0x5c0
>> [<ffffffff9b254bb4>] SyS_ioctl+0x74/0x80
>> [<ffffffff9b00365b>] do_syscall_64+0x5b/0x100
>> [<ffffffff9b78e7ab>] entry_SYSCALL64_slow_path+0x25/0x25
>> [<ffffffffffffffff>] 0xffffffffffffffff
>>
>> I was hoping, that the problems could be related - but that seems not to
>> be true.
>
> However, it turned out, that reverting the complete patchset "Remove UDP
> Fragmentation Offload support" prevent hanging qemu processes.

That implies a combination of UFO and vhost zerocopy. Disabling
experimental_zcopytx in vhost_net will probably work around the bug
then.

On the surface the two features are independent. Most of the relevant
UFO code is reverted with the patch mentioned earlier. Missing from
that is protocol stack support, but it is unlikely that your host OS is
generating these UFO packets.

They are coming from a guest over virtio_net, to which vhost_net then
applies zerocopy. Then the packet(s) is/are either freed without calling
uarg->callback() or queued somewhere for a very long time.

Looking at the diff-of-diffs between my stable patch and your full revert,
the majority of missing bits beside the procol layer is in device driver
support. Removing that causes the UFO packets to be segmented at any
dev_queue_xmit on their path. skb_segment ensures that when it segments
a large zerocopy packet, all new segments also point to the zerocopy
callback struct (ubuf_info), as the shared memory pages may not be
released until all skbs pointing to them are freed.

That may be wrong with vhost_zerocopy_callback, which does not use
refcounting. I will look into that. It may be that before the msg_zerocopy
patchsets large packets were copied before entering segmentation. It is
safe to enter segmentation for msg_zerocopy skbs, but not legacy zerocopy
skbs.

I will also set up two VMs and try to send UFO packets and see whether
they indeed are freed somewhere in the stack without notifying vhost_net.

^ permalink raw reply

* Re: Linux 4.14 - regression: broken tun/tap / bridge network with virtio - bisected
From: Willem de Bruijn @ 2017-12-14 22:47 UTC (permalink / raw)
  To: Andreas Hartmann
  Cc: Michal Kubecek, Jason Wang, David Miller, Network Development
In-Reply-To: <CAF=yD-JnDSfH-2wMjN5BUVdXeezp+k4ievSuvVCvEjX-jncD7Q@mail.gmail.com>

On Thu, Dec 14, 2017 at 5:17 PM, Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
>>> Well, the patch does not fix hanging VMs, which have been shutdown and
>>> can't be killed any more.
>>> Because of the stack trace
>>>
>>> [<ffffffffc0d0e3c5>] vhost_net_ubuf_put_and_wait+0x35/0x60 [vhost_net]
>>> [<ffffffffc0d0f264>] vhost_net_ioctl+0x304/0x870 [vhost_net]
>>> [<ffffffff9b25460f>] do_vfs_ioctl+0x8f/0x5c0
>>> [<ffffffff9b254bb4>] SyS_ioctl+0x74/0x80
>>> [<ffffffff9b00365b>] do_syscall_64+0x5b/0x100
>>> [<ffffffff9b78e7ab>] entry_SYSCALL64_slow_path+0x25/0x25
>>> [<ffffffffffffffff>] 0xffffffffffffffff
>>>
>>> I was hoping, that the problems could be related - but that seems not to
>>> be true.
>>
>> However, it turned out, that reverting the complete patchset "Remove UDP
>> Fragmentation Offload support" prevent hanging qemu processes.
>
> That implies a combination of UFO and vhost zerocopy. Disabling
> experimental_zcopytx in vhost_net will probably work around the bug
> then.
>
> On the surface the two features are independent. Most of the relevant
> UFO code is reverted with the patch mentioned earlier. Missing from
> that is protocol stack support, but it is unlikely that your host OS is
> generating these UFO packets.
>
> They are coming from a guest over virtio_net, to which vhost_net then
> applies zerocopy. Then the packet(s) is/are either freed without calling
> uarg->callback() or queued somewhere for a very long time.
>
> Looking at the diff-of-diffs between my stable patch and your full revert,
> the majority of missing bits beside the procol layer is in device driver
> support. Removing that causes the UFO packets to be segmented at any
> dev_queue_xmit on their path. skb_segment ensures that when it segments
> a large zerocopy packet, all new segments also point to the zerocopy
> callback struct (ubuf_info), as the shared memory pages may not be
> released until all skbs pointing to them are freed.
>
> That may be wrong with vhost_zerocopy_callback, which does not use
> refcounting. I will look into that. It may be that before the msg_zerocopy
> patchsets large packets were copied before entering segmentation. It is
> safe to enter segmentation for msg_zerocopy skbs, but not legacy zerocopy
> skbs.

If this is the cause, then the following, while not a real solution, would
probably also solve resolve the observed issue.

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index e140ba49b30a..8fe5bca1d6ae 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -3655,10 +3655,10 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
                skb_copy_from_linear_data_offset(head_skb, offset,
                                                 skb_put(nskb, hsize), hsize);

+               if (unlikely(skb_orphan_frags_rx(head_skb, GFP_ATOMIC)))
+                       goto err;
                skb_shinfo(nskb)->tx_flags |= skb_shinfo(head_skb)->tx_flags &
                                              SKBTX_SHARED_FRAG;
-               if (skb_zerocopy_clone(nskb, head_skb, GFP_ATOMIC))
-                       goto err;

This basically converts zerocopy TSO skbs to regular and calls their
uarg->callback just before segmenting them.

^ permalink raw reply related

* Re: [PATCH net-next v6 1/2] dt-bindings: net: add DT bindings for Socionext UniPhier AVE
From: Florian Fainelli @ 2017-12-14 23:24 UTC (permalink / raw)
  To: Kunihiko Hayashi, David Miller, netdev-u79uwXL29TY76Z2rM5mHXA
  Cc: Andrew Lunn, Rob Herring, Mark Rutland,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA, Masahiro Yamada,
	Masami Hiramatsu, Jassi Brar
In-Reply-To: <1513245910-15961-2-git-send-email-hayashi.kunihiko-uWyLwvC0a2jby3iVrkZq2A@public.gmane.org>



On 12/14/2017 02:05 AM, Kunihiko Hayashi wrote:
> DT bindings for the AVE ethernet controller found on Socionext's
> UniPhier platforms.
> 
> Signed-off-by: Kunihiko Hayashi <hayashi.kunihiko-uWyLwvC0a2jby3iVrkZq2A@public.gmane.org>
> Signed-off-by: Jassi Brar <jaswinder.singh-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
> Acked-by: Rob Herring <robh-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
> ---
>  .../bindings/net/socionext,uniphier-ave4.txt       | 48 ++++++++++++++++++++++
>  1 file changed, 48 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/net/socionext,uniphier-ave4.txt
> 
> diff --git a/Documentation/devicetree/bindings/net/socionext,uniphier-ave4.txt b/Documentation/devicetree/bindings/net/socionext,uniphier-ave4.txt
> new file mode 100644
> index 0000000..4700377
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/net/socionext,uniphier-ave4.txt
> @@ -0,0 +1,48 @@
> +* Socionext AVE ethernet controller
> +
> +This describes the devicetree bindings for AVE ethernet controller
> +implemented on Socionext UniPhier SoCs.
> +
> +Required properties:
> + - compatible: Should be
> +	- "socionext,uniphier-pro4-ave4" : for Pro4 SoC
> +	- "socionext,uniphier-pxs2-ave4" : for PXs2 SoC
> +	- "socionext,uniphier-ld11-ave4" : for LD11 SoC
> +	- "socionext,uniphier-ld20-ave4" : for LD20 SoC
> + - reg: Address where registers are mapped and size of region.
> + - interrupts: Should contain the MAC interrupt.
> + - phy-mode: See ethernet.txt in the same directory. Allow to choose
> +	"rgmii", "rmii", or "mii" according to the PHY.
> + - phy-handle: Should point to the external phy device.
> +	See ethernet.txt file in the same directory.
> + - clocks: A phandle to the clock for the MAC.
> +
> +Optional properties:
> + - resets: A phandle to the reset control for the MAC
> + - local-mac-address: See ethernet.txt in the same directory.
> +
> +Required subnode:
> + - mdio: Device tree subnode with the following required properties:
> +	- #address-cells: Must be <1>.
> +	- #size-cells: Must be <0>.
> +	- reg: phy ID number, usually a small integer.

Almost, the "reg" property applies to the MDIO child nodes: the Ethernet
PHYs/MDIO devices. For the MDIO controller itself, the "reg" property,
if specified, would be relative to the Ethernet controller's base
register address.

Please drop this property's description here.

> +
> +Example:
> +
> +	ether: ethernet@65000000 {
> +		compatible = "socionext,uniphier-ld20-ave4";
> +		reg = <0x65000000 0x8500>;
> +		interrupts = <0 66 4>;
> +		phy-mode = "rgmii";
> +		phy-handle = <&ethphy>;
> +		clocks = <&sys_clk 6>;
> +		resets = <&sys_rst 6>;
> +		local-mac-address = [00 00 00 00 00 00];
> +		mdio {
> +			#address-cells = <1>;
> +			#size-cells = <0>;
> +			ethphy: ethphy@1 {
> +				reg = <1>;
> +			};
> +		};
> +	};
> 

-- 
Florian
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox