Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH net 1/1] net/smc: coordinate wait queues for nonblocking connect
From: Cong Wang @ 2018-06-20 22:56 UTC (permalink / raw)
  To: Ursula Braun
  Cc: David Miller, Linux Kernel Network Developers, linux-s390,
	schwidefsky, Heiko Carstens, raspl, Christoph Hellwig
In-Reply-To: <20180620080737.50323-1-ubraun@linux.ibm.com>

On Wed, Jun 20, 2018 at 1:07 AM, Ursula Braun <ubraun@linux.ibm.com> wrote:
> diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
> index da7f02edcd37..21c84b924ffb 100644
> --- a/net/smc/af_smc.c
> +++ b/net/smc/af_smc.c
> @@ -605,6 +605,8 @@ static int smc_connect(struct socket *sock, struct sockaddr *addr,
>
>         smc_copy_sock_settings_to_clc(smc);
>         tcp_sk(smc->clcsock->sk)->syn_smc = 1;
> +       if (flags & O_NONBLOCK)
> +               sock->sk->sk_wq = smc->clcsock->sk->sk_wq;
>         rc = kernel_connect(smc->clcsock, addr, alen, flags);
>         if (rc)
>                 goto out;
> @@ -1285,12 +1287,9 @@ static __poll_t smc_poll_mask(struct socket *sock, __poll_t events)
>
>         smc = smc_sk(sock->sk);
>         sock_hold(sk);
> -       lock_sock(sk);
>         if ((sk->sk_state == SMC_INIT) || smc->use_fallback) {
>                 /* delegate to CLC child sock */
> -               release_sock(sk);
>                 mask = smc->clcsock->ops->poll_mask(smc->clcsock, events);
> -               lock_sock(sk);
>                 sk->sk_err = smc->clcsock->sk->sk_err;
>                 if (sk->sk_err) {
>                         mask |= EPOLLERR;
> @@ -1299,7 +1298,10 @@ static __poll_t smc_poll_mask(struct socket *sock, __poll_t events)
>                         if (sk->sk_state == SMC_INIT &&
>                             mask & EPOLLOUT &&
>                             smc->clcsock->sk->sk_state != TCP_CLOSE) {
> +                               sock->sk->sk_wq = smc->smcwq;
> +                               lock_sock(sk);

I think you need to use proper RCU API to protect these
assignment of sk->sk_wq.

^ permalink raw reply

* Re: WARNING: CPU: 3 PID: 0 at net/sched/sch_hfsc.c:1388 hfsc_dequeue+0x319/0x350 [sch_hfsc]
From: Cong Wang @ 2018-06-20 23:00 UTC (permalink / raw)
  To: Marco Berizzi; +Cc: Linux Kernel Network Developers
In-Reply-To: <1772931728.508825.1529478023614@mail.libero.it>

On Wed, Jun 20, 2018 at 12:00 AM, Marco Berizzi <pupilla@libero.it> wrote:
>> Il 19 giugno 2018 alle 13.42 Marco Berizzi <pupilla@libero.it> ha scritto:
>>
>> > Il 18 giugno 2018 alle 21.28 Cong Wang <xiyou.wangcong@gmail.com> ha scritto:
>> > Can you test the attached patch?
>> >
>> > It almost certainly fixes the warning, but I am not sure if
>> > it papers out any other real problem. Please make sure
>> > hfsc still works as expected. :)
>>
>> Hi Cong,
>>
>> Thanks a lot for the patch. I'm building 4.17.2 + your patch.
>> Within 24 hours I will update you.
>
> Hi Cong,
>
> I have applied your patch to 4.17.2
> From my point of view hfsc is working as expected.
> However my setup is pretty simple:
>
> tc class add dev eth2 parent 1:2 classid 1:16 hfsc ls rate 3000kbit ul rate 20000kbit
>
> I can see hfsc is shaping my traffic to 20MBit/s.
> The error message has not popped up :-)
> I will drop you a message withing 48 hours for confirmation.

Please also test HFSC_RSC ("rt") if possible.


> Will this patch merged in a future kernel release?
>

If you can confirm nothing breaks, I will send it out formally.

Thanks for testing!

^ permalink raw reply

* Re: [PATCH bpf-next 2/3] bpf: btf: add btf json print functionality
From: Song Liu @ 2018-06-20 23:14 UTC (permalink / raw)
  To: Okash Khawaja
  Cc: Daniel Borkmann, Martin Lau, Alexei Starovoitov, Yonghong Song,
	Quentin Monnet, Jakub Kicinski, David S. Miller,
	netdev@vger.kernel.org, Kernel Team, linux-kernel@vger.kernel.org
In-Reply-To: <20180620203703.101156292@fb.com>



> On Jun 20, 2018, at 1:30 PM, Okash Khawaja <osk@fb.com> wrote:
> 
> This consumes functionality exported in the previous patch. It does the
> main job of printing with BTF data. This is used in the following patch
> to provide a more readable output of a map's dump. It relies on
> json_writer to do json printing. Below is sample output where map keys
> are ints and values are of type struct A:
> 
> typedef int int_type;
> enum E {
>        E0,
>        E1,
> };
> 
> struct B {
>        int x;
>        int y;
> };
> 
> struct A {
>        int m;
>        unsigned long long n;
>        char o;
>        int p[8];
>        int q[4][8];
>        enum E r;
>        void *s;
>        struct B t;
>        const int u;
>        int_type v;
>        unsigned int w1: 3;
>        unsigned int w2: 3;
> };
> 
> $ sudo bpftool map dump -p id 14
> [{
>        "key": 0
>    },{
>        "value": {
>            "m": 1,
>            "n": 2,
>            "o": "c",
>            "p": [15,16,17,18,15,16,17,18
>            ],
>            "q": [[25,26,27,28,25,26,27,28
>                ],[35,36,37,38,35,36,37,38
>                ],[45,46,47,48,45,46,47,48
>                ],[55,56,57,58,55,56,57,58
>                ]
>            ],
>            "r": 1,
>            "s": 0x7ffff6f70568,
>            "t": {
>                "x": 5,
>                "y": 10
>            },
>            "u": 100,
>            "v": 20,
>            "w1": 0x7,
>            "w2": 0x3
>        }
>    }
> ]
> 
> This patch uses json's {} and [] to imply struct/union and array. More
> explicit information can be added later. For example, a command line
> option can be introduced to print whether a key or value is struct
> or union, name of a struct etc. This will however come at the expense
> of duplicating info when, for example, printing an array of structs.
> enums are printed as ints without their names.
> 
> Signed-off-by: Okash Khawaja <osk@fb.com>
> Acked-by: Martin KaFai Lau <kafai@fb.com>
> 
> ---
> tools/bpf/bpftool/btf_dumper.c |  247 +++++++++++++++++++++++++++++++++++++++++
> tools/bpf/bpftool/btf_dumper.h |   18 ++
> 2 files changed, 265 insertions(+)
> 
> --- /dev/null
> +++ b/tools/bpf/bpftool/btf_dumper.c
> @@ -0,0 +1,247 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/* Copyright (c) 2018 Facebook */
> +
> +#include <linux/btf.h>
> +#include <linux/err.h>
> +#include <stdio.h> /* for (FILE *) used by json_writer */
> +#include <linux/bitops.h>
> +#include <string.h>
> +#include <ctype.h>
> +
> +#include "btf.h"
> +#include "json_writer.h"
> +
> +#define BITS_PER_BYTE_MASK (BITS_PER_BYTE - 1)
> +#define BITS_PER_BYTE_MASKED(bits) ((bits) & BITS_PER_BYTE_MASK)
> +#define BITS_ROUNDDOWN_BYTES(bits) ((bits) >> 3)
> +#define BITS_ROUNDUP_BYTES(bits) \
> +	(BITS_ROUNDDOWN_BYTES(bits) + !!BITS_PER_BYTE_MASKED(bits))
> +
> +static int btf_dumper_do_type(const struct btf *btf, uint32_t type_id,
> +		uint8_t bit_offset, const void *data, json_writer_t *jw);
> +
> +static void btf_dumper_ptr(const void *data, json_writer_t *jw)
> +{
> +	jsonw_printf(jw, "%p", *((uintptr_t *)data));
> +}
> +
> +static int btf_dumper_modifier(const struct btf *btf, uint32_t type_id,
> +		const void *data, json_writer_t *jw)
> +{
> +	int32_t actual_type_id = btf__resolve_type(btf, type_id);
> +	int ret;
> +
> +	if (actual_type_id < 0)
> +		return actual_type_id;
> +
> +	ret = btf_dumper_do_type(btf, actual_type_id, 0, data, jw);
> +
> +	return ret;
> +}
> +
> +static void btf_dumper_enum(const void *data, json_writer_t *jw)
> +{
> +	jsonw_printf(jw, "%d", *((int32_t *)data));
> +}
> +
> +static int btf_dumper_array(const struct btf *btf, uint32_t type_id,
> +		const void *data, json_writer_t *jw)
> +{
> +	const struct btf_type *t = btf__type_by_id(btf, type_id);
> +	struct btf_array *arr = (struct btf_array *)(t + 1);
> +	int64_t elem_size;
> +	uint32_t i;
> +	int ret;
> +
> +	elem_size = btf__resolve_size(btf, arr->type);
> +	if (elem_size < 0)
> +		return elem_size;
> +
> +	jsonw_start_array(jw);
> +	for (i = 0; i < arr->nelems; i++) {
> +		ret = btf_dumper_do_type(btf, arr->type, 0,
> +				data + (i * elem_size), jw);
> +		if (ret)
> +			return ret;

Shall we call jsonw_end_array() before return here? 

> +	}
> +
> +	jsonw_end_array(jw);
> +
> +	return 0;
> +}
> +
> +static void btf_dumper_int_bits(uint32_t int_type, uint8_t bit_offset,
> +		const void *data, json_writer_t *jw)
> +{
> +	uint16_t total_bits_offset;
> +	uint32_t bits = BTF_INT_BITS(int_type);
> +	uint16_t bytes_to_copy;
> +	uint16_t bits_to_copy;
> +	uint8_t upper_bits;
> +	union {
> +		uint64_t u64_num;
> +		uint8_t u8_nums[8];
> +	} print_num;
> +
> +	total_bits_offset = bit_offset + BTF_INT_OFFSET(int_type);
> +	data += BITS_ROUNDDOWN_BYTES(total_bits_offset);
> +	bit_offset = BITS_PER_BYTE_MASKED(total_bits_offset);
> +	bits_to_copy = bits + bit_offset;
> +	bytes_to_copy = BITS_ROUNDUP_BYTES(bits_to_copy);
> +
> +	print_num.u64_num = 0;
> +	memcpy(&print_num.u64_num, data, bytes_to_copy);
> +
> +	upper_bits = BITS_PER_BYTE_MASKED(bits_to_copy);
> +	if (upper_bits) {
> +		uint8_t mask = (1 << upper_bits) - 1;
> +
> +		print_num.u8_nums[bytes_to_copy - 1] &= mask;
> +	}
> +
> +	print_num.u64_num >>= bit_offset;
> +
> +	jsonw_printf(jw, "0x%llx", print_num.u64_num);
> +}
> +
> +static int btf_dumper_int(const struct btf_type *t, uint8_t bit_offset,
> +		const void *data, json_writer_t *jw)
> +{
> +	uint32_t *int_type = (uint32_t *)(t + 1);
> +	uint32_t bits = BTF_INT_BITS(*int_type);
> +	int ret = 0;
> +
> +	/* if this is bit field */
> +	if (bit_offset || BTF_INT_OFFSET(*int_type) ||
> +			BITS_PER_BYTE_MASKED(bits)) {
> +		btf_dumper_int_bits(*int_type, bit_offset, data, jw);
> +		return ret;
> +	}
> +
> +	switch (BTF_INT_ENCODING(*int_type)) {
> +	case 0:
> +		if (BTF_INT_BITS(*int_type) == 64)
> +			jsonw_printf(jw, "%lu", *((uint64_t *)data));
> +		else if (BTF_INT_BITS(*int_type) == 32)
> +			jsonw_printf(jw, "%u", *((uint32_t *)data));
> +		else if (BTF_INT_BITS(*int_type) == 16)
> +			jsonw_printf(jw, "%hu", *((uint16_t *)data));
> +		else if (BTF_INT_BITS(*int_type) == 8)
> +			jsonw_printf(jw, "%hhu", *((uint8_t *)data));
> +		else
> +			btf_dumper_int_bits(*int_type, bit_offset, data, jw);
> +		break;
> +	case BTF_INT_SIGNED:
> +		if (BTF_INT_BITS(*int_type) == 64)
> +			jsonw_printf(jw, "%ld", *((int64_t *)data));
> +		else if (BTF_INT_BITS(*int_type) == 32)
> +			jsonw_printf(jw, "%d", *((int32_t *)data));
> +		else if (BTF_INT_BITS(*int_type) ==  16)
> +			jsonw_printf(jw, "%hd", *((int16_t *)data));
> +		else if (BTF_INT_BITS(*int_type) ==  8)
> +			jsonw_printf(jw, "%hhd", *((int8_t *)data));
> +		else
> +			btf_dumper_int_bits(*int_type, bit_offset, data, jw);
> +		break;
> +	case BTF_INT_CHAR:
> +		if (*((char *)data) == '\0')
> +			jsonw_null(jw);
> +		else if (isprint(*((char *)data)))
> +			jsonw_printf(jw, "\"%c\"", *((char *)data));
> +		else
> +			jsonw_printf(jw, "%hhx", *((char *)data));
> +		break;
> +	case BTF_INT_BOOL:
> +		jsonw_bool(jw, *((int *)data));
> +		break;
> +	default:
> +		/* shouldn't happen */
> +		ret = -EINVAL;
> +		break;
> +	}
> +
> +	return ret;
> +}
> +
> +static int btf_dumper_struct(const struct btf *btf, uint32_t type_id,
> +		const void *data, json_writer_t *jw)
> +{
> +	const struct btf_type *t = btf__type_by_id(btf, type_id);
> +	struct btf_member *m;
> +	int ret = 0;
> +	int i, vlen;
> +
> +	if (t == NULL)
> +		return -EINVAL;
> +
> +	vlen = BTF_INFO_VLEN(t->info);
> +	jsonw_start_object(jw);
> +	m = (struct btf_member *)(t + 1);
> +
> +	for (i = 0; i < vlen; i++) {
> +		jsonw_name(jw, btf__name_by_offset(btf, m[i].name_off));
> +		ret = btf_dumper_do_type(btf, m[i].type,
> +				BITS_PER_BYTE_MASKED(m[i].offset),
> +				data + BITS_ROUNDDOWN_BYTES(m[i].offset), jw);
> +		if (ret)
> +			return ret;

Shall we call jsonw_end_object() before return here? 

> +	}
> +
> +	jsonw_end_object(jw);
> +
> +	return 0;
> +}
> +
> +static int btf_dumper_do_type(const struct btf *btf, uint32_t type_id,
> +		uint8_t bit_offset, const void *data, json_writer_t *jw)
> +{
> +	const struct btf_type *t = btf__type_by_id(btf, type_id);
> +	int ret = 0;
> +
> +	switch (BTF_INFO_KIND(t->info)) {
> +	case BTF_KIND_INT:
> +		ret = btf_dumper_int(t, bit_offset, data, jw);
> +		break;
> +	case BTF_KIND_STRUCT:
> +	case BTF_KIND_UNION:
> +		ret = btf_dumper_struct(btf, type_id, data, jw);
> +		break;
> +	case BTF_KIND_ARRAY:
> +		ret = btf_dumper_array(btf, type_id, data, jw);
> +		break;
> +	case BTF_KIND_ENUM:
> +		btf_dumper_enum(data, jw);
> +		break;
> +	case BTF_KIND_PTR:
> +		btf_dumper_ptr(data, jw);
> +		break;
> +	case BTF_KIND_UNKN:
> +		jsonw_printf(jw, "(unknown)");
> +		break;
> +	case BTF_KIND_FWD:
> +		/* map key or value can't be forward */
> +		ret = -EINVAL;
> +		break;
> +	case BTF_KIND_TYPEDEF:
> +	case BTF_KIND_VOLATILE:
> +	case BTF_KIND_CONST:
> +	case BTF_KIND_RESTRICT:
> +		ret = btf_dumper_modifier(btf, type_id, data, jw);
> +		break;
> +	default:
> +		jsonw_printf(jw, "(unsupported-kind");

"(unsupported-kind)"   (missing ')'). 

> +		ret = -EINVAL;
> +		break;
> +	}
> +
> +	return ret;
> +}
> +
> +int32_t btf_dumper_type(const struct btf *btf, json_writer_t *jw,
> +		uint32_t type_id, const void *data)
> +{
> +	if (!jw)
> +		return -EINVAL;
> +
> +	return btf_dumper_do_type(btf, type_id, 0, data, jw);
nit: btf_dumper_do_type() returns int, while btf_dumper_type() returns int32_t. 
     This should not matter though. 

> +}
> --- /dev/null
> +++ b/tools/bpf/bpftool/btf_dumper.h
> @@ -0,0 +1,18 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/* Copyright (c) 2018 Facebook */
> +
> +#ifndef BTF_DUMPER_H
> +#define BTF_DUMPER_H
> +
> +/* btf_dumper_type - json print data along with type information
> + * @btf: btf instance initialised via btf__new()
> + * @jw: json writer used for printing
> + * @type_id: index in btf->types array. this points to the type to be dumped
> + * @data: pointer the actual data, i.e. the values to be printed
> + *
> + * Returns zero on success and negative error code otherwise
> + */
> +int32_t btf_dumper_type(const struct btf *btf, json_writer_t *jw,
> +		uint32_t type_id, void *data);
> +
> +#endif
> 

^ permalink raw reply

* Re: [PATCH net-next 0/2] fixes for ipsec selftests
From: Anders Roxell @ 2018-06-20 23:18 UTC (permalink / raw)
  To: shannon.nelson; +Cc: Networking, David Miller
In-Reply-To: <cd0832a6-0d65-b9de-35b4-fdcb72d6003d@oracle.com>

On Thu, 21 Jun 2018 at 00:26, Shannon Nelson <shannon.nelson@oracle.com> wrote:
>
> On 6/20/2018 12:09 PM, Anders Roxell wrote:
> > On Wed, 20 Jun 2018 at 07:42, Shannon Nelson <shannon.nelson@oracle.com> wrote:
> >>
> >> A couple of bad behaviors in the ipsec selftest were pointed out
> >> by Anders Roxell <anders.roxell@linaro.org> and are addressed here.
> >>
> >> Shannon Nelson (2):
> >>    selftests: rtnetlink: hide complaint from terminated monitor
> >>    selftests: rtnetlink: use a local IP address for IPsec tests
> >>
> >>   tools/testing/selftests/net/rtnetlink.sh | 11 +++++++----
> >>   1 file changed, 7 insertions(+), 4 deletions(-)
> >>
> >> --
> >> 2.7.4
> >>
> >
> > Hi Shannon,
> >
> > With this patches applied and my config patch.
> >
> > I still get this error when I run the ipsec test:
> >
> > FAIL: can't add fou port 7777, skipping test
> > RTNETLINK answers: Operation not supported
> > FAIL: can't add macsec interface, skipping test
> > RTNETLINK answers: Protocol not supported
> > RTNETLINK answers: No such process
> > RTNETLINK answers: No such process
> > FAIL: ipsec
>
> One of the odd things I noticed about this script is that there really
> aren't any diagnosis messages, just PASS or FAIL.  I followed this
> custom when I added the ipsec tests, but I think this is something that
> should change so we can get some idea of what breaks.
>
> I'm curious about the "RTNETLINK answers" messages and where they might
> be coming from, especially "RTNETLINK answers: Protocol not supported".

I added: "set -x" in the beginning of the rtnetlink.sh script.
+ ip x s add proto esp src 10.66.17.140 dst 10.66.17.141 spi 0x07 mode
transport reqid 0x07 replay-window 32 aead 'rfc4106(gcm(aes))'
0x3132333435
363738393031323334353664636261 128 sel src 10.66.17.140/24 dst 10.66.17.141/24
RTNETLINK answers: Protocol not supported

> What version of iproute2 are you using?  Is it older than iproute2-ss130716?

I use iproute2 release 4.17.0.

>
> What distro and kernel are you running?

for this test linux-next tag: next-20180620 distro OE (morty)

>
> What are the XFRM and AES settings in your kernel config - what is the
> output from
>         egrep -i "xfrm|_aes" .config

CONFIG_XFRM=y
CONFIG_XFRM_ALGO=y
CONFIG_XFRM_USER=y
CONFIG_INET_XFRM_MODE_TUNNEL=y
CONFIG_INET6_XFRM_MODE_TRANSPORT=y
CONFIG_INET6_XFRM_MODE_TUNNEL=y
CONFIG_INET6_XFRM_MODE_BEET=y
CONFIG_CRYPTO_AES=y

>
> I did also notice that the ipsec test should set ret=0 at its start.

did.

> Can you either add this or comment out all the other tests in
> kci_test_rtnl() so that only the kci_test_ipsec is run and send me the
> output?

done.

Same result as before... added "set -x" and this is the output:
+ devdummy=test-dummy0
+ ret=0
+ ksft_skip=4
++ id -u
+ '[' 0 -ne 0 ']'
+ for x in ip tc
+ ip -Version
+ '[' 0 -ne 0 ']'
+ for x in ip tc
+ tc -Version
+ '[' 0 -ne 0 ']'
+ kci_test_rtnl
+ kci_test_ipsec
+ ret=0
++ ip -o addr
++ awk '/inet / { print $4; }'
++ grep -v '^127'
++ head -1
++ cut -f1 -d/
+ srcip=10.66.17.140
++ echo 10.66.17.140
++ cut -f1-3 -d.
+ net=10.66.17
++ echo 10.66.17.140
++ cut -f4 -d.
+ base=140
++ expr 140 + 1
+ dstip=10.66.17.141
+ algo='aead rfc4106(gcm(aes)) 0x3132333435363738393031323334353664636261 128'
+ ip x s flush
+ ip x p flush
+ check_err 0
+ '[' 0 -eq 0 ']'
+ ret=0
++ mktemp ipsectestXXX
+ tmpfile=ipsectestHFP
+ mpid=3339
+ sleep 0.2
+ ipsecid='proto esp src 10.66.17.140 dst 10.66.17.141 spi 0x07'
+ ip x s add proto esp src 10.66.17.140 dst 10.66.17.141 spi 0x07 mode
transport reqid 0x07 replay-window 32 aead 'rfc4106(gcm(aes))'
0x3132333435
363738393031323334353664636261 128 sel src 10.66.17.140/24 dst 10.66.17.141/24
RTNETLINK answers: Protocol not supported
+ check_err 2
+ '[' 0 -eq 0 ']'
+ ret=2
++ ip x s list
++ grep 10.66.17.140
++ grep 10.66.17.141
++ wc -l
+ lines=0
+ test 0 -eq 2
+ check_err 1
+ '[' 2 -eq 0 ']'
+ ip x s count
+ grep -q 'SAD count 1'
+ check_err 1
+ '[' 2 -eq 0 ']'
++ ip x s get proto esp src 10.66.17.140 dst 10.66.17.141 spi 0x07
++ grep 10.66.17.140
++ grep 10.66.17.141
++ wc -l
RTNETLINK answers: No such process
+ lines=0
+ test 0 -eq 2
+ check_err 1
+ '[' 2 -eq 0 ']'
+ ip x s delete proto esp src 10.66.17.140 dst 10.66.17.141 spi 0x07
RTNETLINK answers: No such process
+ check_err 2
+ '[' 2 -eq 0 ']'
++ ip x s list
++ wc -l
+ lines=0
+ test 0 -eq 0
+ check_err 0
+ '[' 2 -eq 0 ']'
+ ipsecsel='dir out src 10.66.17.140/24 dst 10.66.17.141/24'
+ ip x p add dir out src 10.66.17.140/24 dst 10.66.17.141/24 tmpl
proto esp src 10.66.17.140 dst 10.66.17.141 spi 0x07 mode transport
reqid 0x07
+ check_err 0
+ '[' 2 -eq 0 ']'
++ grep 10.66.17.140
++ grep 10.66.17.141
++ wc -l
++ ip x p list
+ lines=2
+ test 2 -eq 2
+ check_err 0
+ '[' 2 -eq 0 ']'
+ ip x p count
+ grep -q 'SPD IN  0 OUT 1 FWD 0'
+ check_err 0
+ '[' 2 -eq 0 ']'
++ ip x p get dir out src 10.66.17.140/24 dst 10.66.17.141/24
++ grep 10.66.17.140
++ grep 10.66.17.141
++ wc -l
+ lines=2
+ test 2 -eq 2
+ check_err 0
+ '[' 2 -eq 0 ']'
+ ip x p delete dir out src 10.66.17.140/24 dst 10.66.17.141/24
+ check_err 0
+ '[' 2 -eq 0 ']'
++ ip x p list
++ wc -l
+ lines=0
+ test 0 -eq 0
+ check_err 0
+ '[' 2 -eq 0 ']'
+ kill 3339
++ wc -l ipsectestHFP
++ cut '-d ' -f1
+ lines=8
+ test 8 -eq 20
+ check_err 1
+ '[' 2 -eq 0 ']'
+ rm -rf ipsectestHFP
+ ip x s flush
+ check_err 0
+ '[' 2 -eq 0 ']'
+ ip x p flush
+ check_err 0
+ '[' 2 -eq 0 ']'
+ '[' 2 -ne 0 ']'
+ echo 'FAIL: ipsec'
FAIL: ipsec
+ return 1
+ exit 2

Cheers,
Anders

^ permalink raw reply

* Re: [PATCH bpf-next 3/3] bpf: btf: json print map dump with btf info
From: Song Liu @ 2018-06-20 23:22 UTC (permalink / raw)
  To: Okash Khawaja
  Cc: Daniel Borkmann, Martin Lau, Alexei Starovoitov, Yonghong Song,
	Quentin Monnet, Jakub Kicinski, David S. Miller,
	netdev@vger.kernel.org, Kernel Team, linux-kernel@vger.kernel.org
In-Reply-To: <20180620203703.208599277@fb.com>



> On Jun 20, 2018, at 1:30 PM, Okash Khawaja <osk@fb.com> wrote:
> 
> This patch modifies `bpftool map dump [-j|-p] id <map-id>` to json-
> print and pretty-json-print map dump. It calls btf_dumper introduced in
> previous patch to accomplish this.
> 
> The patch only prints debug info when -j or -p flags are supplied. Then
> too, if the map has associated btf data loaded. Otherwise the usual
> debug-less output is printed.
> 
> Signed-off-by: Okash Khawaja <osk@fb.com>
> Acked-by: Martin KaFai Lau <kafai@fb.com>
> 
> ---
> tools/bpf/bpftool/map.c |   94 ++++++++++++++++++++++++++++++++++++++++++++++--
> 1 file changed, 91 insertions(+), 3 deletions(-)
> 
> --- a/tools/bpf/bpftool/map.c
> +++ b/tools/bpf/bpftool/map.c
> @@ -43,9 +43,13 @@
> #include <unistd.h>
> #include <sys/types.h>
> #include <sys/stat.h>
> +#include <linux/err.h>
> 
> #include <bpf.h>
> 
> +#include "json_writer.h"
> +#include "btf.h"
> +#include "btf_dumper.h"
> #include "main.h"
> 
> static const char * const map_type_name[] = {
> @@ -508,6 +512,83 @@ static int do_show(int argc, char **argv
> 	return errno == ENOENT ? 0 : -1;
> }
> 
> +
> +static int do_dump_btf(struct btf *btf, struct bpf_map_info *map_info,
> +		void *key, void *value)
> +{
> +	int ret;
> +
> +	jsonw_start_object(json_wtr);
> +	jsonw_name(json_wtr, "key");
> +
> +	ret = btf_dumper_type(btf, json_wtr, map_info->btf_key_type_id, key);
> +	if (ret)
> +		goto out;
> +
> +	jsonw_end_object(json_wtr);
> +
> +	jsonw_start_object(json_wtr);
> +	jsonw_name(json_wtr, "value");
> +
> +	ret = btf_dumper_type(btf, json_wtr, map_info->btf_value_type_id,
> +			value);
> +
> +out:
> +	/* end of root object */
> +	jsonw_end_object(json_wtr);
> +
> +	return ret;
> +}

Please add some tests for do_dump_btf(), including some invalid data type/kind. 

> +
> +static struct btf *get_btf(struct bpf_map_info *map_info)
> +{
> +	int btf_fd = bpf_btf_get_fd_by_id(map_info->btf_id);
> +	struct bpf_btf_info btf_info = { 0 };
> +	__u32 len = sizeof(btf_info);
> +	uint32_t last_size;
> +	int err;
> +	struct btf *btf = NULL;
> +	void *ptr = NULL, *temp_ptr;
> +
> +	if (btf_fd < 0)
> +		return NULL;
> +
> +	btf_info.btf_size = 4096;

Is btf_size always a constant of 4096? 

> +	do {
> +		last_size = btf_info.btf_size;
> +		temp_ptr = realloc(ptr, last_size);
> +		if (!temp_ptr) {
> +			p_err("unable allocate memory for debug info.");
> +			goto exit_free;
> +		}
> +
> +		ptr = temp_ptr;
> +		bzero(ptr, last_size);
> +		btf_info.btf = ptr_to_u64(ptr);
> +		err = bpf_obj_get_info_by_fd(btf_fd, &btf_info, &len);
> +	} while (!err && btf_info.btf_size > last_size && last_size == 4096);
> +
> +	if (err || btf_info.btf_size > last_size) {
> +		p_info("can't get btf info. debug info won't be displayed. error: %s",
> +				err ? strerror(errno) : "exceeds size retry");
> +		goto exit_free;
> +	}
> +
> +	btf = btf__new((uint8_t *) btf_info.btf,
> +			btf_info.btf_size, NULL);
> +	if (IS_ERR(btf)) {
> +		printf("error when initialising btf: %s\n",
> +				strerror(PTR_ERR(btf)));
> +		btf = NULL;
> +	}
> +
> +exit_free:
> +	close(btf_fd);
> +	free(ptr);
> +
> +	return btf;
> +}
> +
> static int do_dump(int argc, char **argv)
> {
> 	void *key, *value, *prev_key;
> @@ -516,6 +597,7 @@ static int do_dump(int argc, char **argv
> 	__u32 len = sizeof(info);
> 	int err;
> 	int fd;
> +	struct btf *btf = NULL;
> 
> 	if (argc != 2)
> 		usage();
> @@ -538,6 +620,8 @@ static int do_dump(int argc, char **argv
> 		goto exit_free;
> 	}
> 
> +	btf = get_btf(&info);
> +
> 	prev_key = NULL;
> 	if (json_output)
> 		jsonw_start_array(json_wtr);
> @@ -550,9 +634,12 @@ static int do_dump(int argc, char **argv
> 		}
> 
> 		if (!bpf_map_lookup_elem(fd, key, value)) {
> -			if (json_output)
> -				print_entry_json(&info, key, value);
> -			else
> +			if (json_output) {
> +				if (btf)
> +					do_dump_btf(btf, &info, key, value);
> +				else
> +					print_entry_json(&info, key, value);
> +			} else
> 				print_entry_plain(&info, key, value);
> 		} else {
> 			if (json_output) {
> @@ -584,6 +671,7 @@ exit_free:
> 	free(key);
> 	free(value);
> 	close(fd);
> +	btf__free(btf);
> 
> 	return err;
> }
> 

^ permalink raw reply

* Re: [PATCH bpf-next 1/3] bpf: btf: export btf types and name by offset from lib
From: Song Liu @ 2018-06-20 23:24 UTC (permalink / raw)
  To: Okash Khawaja
  Cc: Daniel Borkmann, Martin Lau, Alexei Starovoitov, Yonghong Song,
	Quentin Monnet, Jakub Kicinski, David S. Miller, Networking,
	Kernel Team, linux-kernel@vger.kernel.org
In-Reply-To: <20180620224849.GA18488@w1t1fb>



> On Jun 20, 2018, at 3:48 PM, Okash Khawaja <osk@fb.com> wrote:
> 
> On Wed, Jun 20, 2018 at 11:40:19PM +0100, Song Liu wrote:
>> 
>> 
>>> On Jun 20, 2018, at 1:30 PM, Okash Khawaja <osk@fb.com> wrote:
>>> 
>>> This patch introduces btf__resolve_type() function and exports two
>>> existing functions from libbpf. btf__resolve_type follows modifier
>>> types like const and typedef until it hits a type which actually takes
>>> up memory, and then returns it. This function follows similar pattern
>>> to btf__resolve_size but instead of computing size, it just returns
>>> the type.
>>> 
>>> These  functions will be used in the followig patch which parses
>>> information inside array of `struct btf_type *`. btf_name_by_offset is
>>> used for printing variable names.
>>> 
>>> Signed-off-by: Okash Khawaja <osk@fb.com>
>>> Acked-by: Martin KaFai Lau <kafai@fb.com>
>>> 
>>> ---
>>> tools/lib/bpf/btf.c |   65 ++++++++++++++++++++++++++++++++++++----------------
>>> tools/lib/bpf/btf.h |    3 ++
>>> 2 files changed, 48 insertions(+), 20 deletions(-)
>>> 
>>> --- a/tools/lib/bpf/btf.c
>>> +++ b/tools/lib/bpf/btf.c
>>> @@ -17,6 +17,11 @@
>>> 
>>> #define BTF_MAX_NR_TYPES 65535
>>> 
>>> +#define IS_MODIFIER(k) (((k) == BTF_KIND_TYPEDEF) || \
>>> +		((k) == BTF_KIND_VOLATILE) || \
>>> +		((k) == BTF_KIND_CONST) || \
>>> +		((k) == BTF_KIND_RESTRICT))
>>> +
>>> static struct btf_type btf_void;
>>> 
>>> struct btf {
>>> @@ -33,14 +38,6 @@ struct btf {
>>> 	int fd;
>>> };
>>> 
>>> -static const char *btf_name_by_offset(const struct btf *btf, uint32_t offset)
>>> -{
>>> -	if (offset < btf->hdr->str_len)
>>> -		return &btf->strings[offset];
>>> -	else
>>> -		return NULL;
>>> -}
>>> -
>>> static int btf_add_type(struct btf *btf, struct btf_type *t)
>>> {
>>> 	if (btf->types_size - btf->nr_types < 2) {
>>> @@ -190,15 +187,6 @@ static int btf_parse_type_sec(struct btf
>>> 	return 0;
>>> }
>>> 
>>> -static const struct btf_type *btf_type_by_id(const struct btf *btf,
>>> -					     uint32_t type_id)
>>> -{
>>> -	if (type_id > btf->nr_types)
>>> -		return NULL;
>>> -
>>> -	return btf->types[type_id];
>>> -}
>> 
>> nit: Why do we need to move these two functions to later of the file? 
> No real reason. It looked like this file was following a convention of
> keeping statics together. I'll put them back in place if no one objects.

I see. Let's keep this patch as-is then. 

Thanks,
Song

> 
>> 
>> Other than this,
>> 
>> Acked-by: Song Liu <songliubraving@fb.com>
> Thanks
> 
>> 
>>> -
>>> static bool btf_type_is_void(const struct btf_type *t)
>>> {
>>> 	return t == &btf_void || BTF_INFO_KIND(t->info) == BTF_KIND_FWD;
>>> @@ -234,7 +222,7 @@ int64_t btf__resolve_size(const struct b
>>> 	int64_t size = -1;
>>> 	int i;
>>> 
>>> -	t = btf_type_by_id(btf, type_id);
>>> +	t = btf__type_by_id(btf, type_id);
>>> 	for (i = 0; i < MAX_RESOLVE_DEPTH && !btf_type_is_void_or_null(t);
>>> 	     i++) {
>>> 		size = btf_type_size(t);
>>> @@ -259,7 +247,7 @@ int64_t btf__resolve_size(const struct b
>>> 			return -EINVAL;
>>> 		}
>>> 
>>> -		t = btf_type_by_id(btf, type_id);
>>> +		t = btf__type_by_id(btf, type_id);
>>> 	}
>>> 
>>> 	if (size < 0)
>>> @@ -271,6 +259,26 @@ int64_t btf__resolve_size(const struct b
>>> 	return nelems * size;
>>> }
>>> 
>>> +int32_t btf__resolve_type(const struct btf *btf, uint32_t type_id)
>>> +{
>>> +	const struct btf_type *t;
>>> +	int depth = 0;
>>> +
>>> +	t = btf__type_by_id(btf, type_id);
>>> +	while (depth < MAX_RESOLVE_DEPTH &&
>>> +			!btf_type_is_void_or_null(t) &&
>>> +			IS_MODIFIER(BTF_INFO_KIND(t->info))) {
>>> +		type_id = t->type;
>>> +		t = btf__type_by_id(btf, type_id);
>>> +		depth++;
>>> +	}
>>> +
>>> +	if (depth == MAX_RESOLVE_DEPTH || btf_type_is_void_or_null(t))
>>> +		return -EINVAL;
>>> +
>>> +	return type_id;
>>> +}
>>> +
>>> int32_t btf__find_by_name(const struct btf *btf, const char *type_name)
>>> {
>>> 	uint32_t i;
>>> @@ -280,7 +288,7 @@ int32_t btf__find_by_name(const struct b
>>> 
>>> 	for (i = 1; i <= btf->nr_types; i++) {
>>> 		const struct btf_type *t = btf->types[i];
>>> -		const char *name = btf_name_by_offset(btf, t->name_off);
>>> +		const char *name = btf__name_by_offset(btf, t->name_off);
>>> 
>>> 		if (name && !strcmp(type_name, name))
>>> 			return i;
>>> @@ -371,3 +379,20 @@ int btf__fd(const struct btf *btf)
>>> {
>>> 	return btf->fd;
>>> }
>>> +
>>> +const char *btf__name_by_offset(const struct btf *btf, uint32_t offset)
>>> +{
>>> +	if (offset < btf->hdr->str_len)
>>> +		return &btf->strings[offset];
>>> +	else
>>> +		return NULL;
>>> +}
>>> +
>>> +const struct btf_type *btf__type_by_id(const struct btf *btf,
>>> +					     uint32_t type_id)
>>> +{
>>> +	if (type_id > btf->nr_types)
>>> +		return NULL;
>>> +
>>> +	return btf->types[type_id];
>>> +}
>>> --- a/tools/lib/bpf/btf.h
>>> +++ b/tools/lib/bpf/btf.h
>>> @@ -17,6 +17,9 @@ void btf__free(struct btf *btf);
>>> struct btf *btf__new(uint8_t *data, uint32_t size, btf_print_fn_t err_log);
>>> int32_t btf__find_by_name(const struct btf *btf, const char *type_name);
>>> int64_t btf__resolve_size(const struct btf *btf, uint32_t type_id);
>>> +int32_t btf__resolve_type(const struct btf *btf, uint32_t type_id);
>>> int btf__fd(const struct btf *btf);
>>> +const char *btf__name_by_offset(const struct btf *btf, uint32_t offset);
>>> +const struct btf_type *btf__type_by_id(const struct btf *btf, uint32_t type_id);
>>> 
>>> #endif

^ permalink raw reply

* Re: [PATCH] selftests: bpf: config: add config fragments
From: Anders Roxell @ 2018-06-20 23:31 UTC (permalink / raw)
  To: William Tu
  Cc: Daniel Borkmann, Alexei Starovoitov, Shuah Khan, Networking,
	Linux Kernel Mailing List, open list:KERNEL SELFTEST FRAMEWORK
In-Reply-To: <CALDO+SY2eE6DHtVmfnUpMjTjHe3shfzae48nk_OehfUyuKNYsg@mail.gmail.com>

On Tue, 19 Jun 2018 at 14:56, William Tu <u9012063@gmail.com> wrote:
>
> On Thu, Jun 14, 2018 at 11:41 PM, Anders Roxell
> <anders.roxell@linaro.org> wrote:
> > On Thu, 14 Jun 2018 at 14:09, William Tu <u9012063@gmail.com> wrote:
> >>
> >> On Thu, Jun 14, 2018 at 4:42 AM, Anders Roxell <anders.roxell@linaro.org> wrote:
> >> > On 14 June 2018 at 13:06, William Tu <u9012063@gmail.com> wrote:
> >> >> On Tue, Jun 12, 2018 at 5:08 PM, Daniel Borkmann <daniel@iogearbox.net> wrote:
> >> >>> On 06/12/2018 01:05 PM, Anders Roxell wrote:
> >> >>>> Tests test_tunnel.sh fails due to config fragments ins't enabled.
> >> >>>>
> >> >>>> Fixes: 933a741e3b82 ("selftests/bpf: bpf tunnel test.")
> >> >>>> Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
> >> >>>> ---
> >> >>>>
> >> >>>> All tests passes except ip6gretap that still fails. I'm unsure why.
> >> >>>> Ideas?
> >> >>
> >> >> Hi Anders,
> >> >>
> >> >> ip6erspan is based on ip6gretap, does ip6erspan pass?
> >> >
> >> > it did pass  when I was sending the email.
> >> > However, I retested this on next-20180613 and now it fails.
> >> >
> >> Does 'ip -s link show' show any errors/dropped on ip6gretap device?
> >
> > I rerun the test_ip6gretap test only and added "set -x" to
> > test_tunnel.sh here's the output.
> > I added "ip -s link show ip6gretap11" before the cleanup function in the script.
> >
> > # ./test_tunnel.sh
> > + PING_ARG='-c 3 -w 10 -q'
> > + ret=0
> > + GREEN='\033[0;92m'
> > + RED='\033[0;31m'
> > + NC='\033[0m'
> > + trap cleanup 0 3 6
> > + trap cleanup_exit 2 9
> > + cleanup
> > + ip netns delete at_ns0
> > + ip link del veth1
> > + ip link del ipip11
> > + ip link del ipip6tnl11
> > + ip link del gretap11
> > + ip link del ip6gre11
> > + ip link del ip6gretap11
> > + ip link del vxlan11
> > + ip link del ip6vxlan11
> > + ip link del geneve11
> > + ip link del ip6geneve11
> > + ip link del erspan11
> > + ip link del ip6erspan11
> > + bpf_tunnel_test
> > + echo 'Testing IP6GRETAP tunnel...'
> > Testing IP6GRETAP tunnel...
> > + test_ip6gretap
> > + TYPE=ip6gretap
> > + DEV_NS=ip6gretap00
> > + DEV=ip6gretap11
> > + ret=0
> > + check ip6gretap
> > + ip link help ip6gretap
> > + grep -q '^Usage:'
> > + '[' 0 -ne 0 ']'
> > + config_device
> > + ip netns add at_ns0
> > + ip link add veth0 type veth peer name veth1
> > + ip link set veth0 netns at_ns0
> > + ip netns exec at_ns0 ip addr add 172.16.1.100/24 dev veth0
> > + ip netns exec at_ns0 ip link set dev veth0 up
> > + ip link set dev veth1 up mtu 1500
> > + ip addr add dev veth1 172.16.1.200/24
> > + add_ip6gretap_tunnel
> > + ip netns exec at_ns0 ip addr add ::11/96 dev veth0
> > + ip netns exec at_ns0 ip link set dev veth0 up
> > + ip addr add dev veth1 ::22/96
> > + ip link set dev veth1 up
> > + ip netns exec at_ns0 ip link add dev ip6gretap00 type ip6gretap seq
> > flowlabel 0xbcdef key 2 local ::11 remote ::22
> > + ip netns exec at_ns0 ip addr add dev ip6gretap00 10.1.1.100/24
> > + ip netns exec at_ns0 ip addr add dev ip6gretap00 fc80::100/96
> > + ip netns exec at_ns0 ip link set dev ip6gretap00 up
> > + ip link add dev ip6gretap11 type ip6gretap external
> > + ip addr add dev ip6gretap11 10.1.1.200/24
> > + ip addr add dev ip6gretap11 fc80::200/24
> > + ip link set dev ip6gretap11 up
> > + attach_bpf ip6gretap11 ip6gretap_set_tunnel ip6gretap_get_tunnel
> > + DEV=ip6gretap11
> > + SET=ip6gretap_set_tunnel
> > + GET=ip6gretap_get_tunnel
> > + tc qdisc add dev ip6gretap11 clsact
> > + tc filter add dev ip6gretap11 egress bpf da obj test_tunnel_kern.o
> > sec ip6gretap_set_tunnel
> > + tc filter add dev ip6gretap11 ingress bpf da obj test_tunnel_kern.o
> > sec ip6gretap_get_tunnel
> > + ping6 -c 3 -w 10 -q ::11
> > PING ::11 (::11): 56 data bytes
> >
> > --- ::11 ping statistics ---
> > 5 packets transmitted, 3 packets received, 40% packet loss
> > round-trip min/avg/max = 0.139/1.857/5.293 ms
> > + ip netns exec at_ns0 ping -c 3 -w 10 -q 10.1.1.200
> > PING 10.1.1.200 (10.1.1.200): 56 data bytes
> >
> > --- 10.1.1.200 ping statistics ---
> > 3 packets transmitted, 3 packets received, 0% packet loss
> > round-trip min/avg/max = 0.214/0.256/0.305 ms
> > + ping -c 3 -w 10 -q 10.1.1.100
> > PING 10.1.1.100 (10.1.1.100): 56 data bytes
> >
> > --- 10.1.1.100 ping statistics ---
> > 3 packets transmitted, 3 packets received, 0% packet loss
> > round-trip min/avg/max = 0.210/0.211/0.213 ms
> > + check_err 0
> > + '[' 0 -eq 0 ']'
> > + ret=0
> > + ip netns exec at_ns0 ping6 -c 3 -w 10 -q fc80::200
> > PING fc80::200 (fc80::200): 56 data bytes
> >
> > --- fc80::200 ping statistics ---
> > 10 packets transmitted, 0 packets received, 100% packet loss
> > + check_err 1
> > + '[' 0 -eq 0 ']'
> > + ret=1
> > + ip -s link show ip6gretap11
> > 19: ip6gretap11@NONE: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1434 qdisc
> > pfifo_fast state UNKNOWN mode DEFAULT group default qlen 1000
> >     link/ether de:d2:0c:53:80:8c brd ff:ff:ff:ff:ff:ff
> >     RX: bytes  packets  errors  dropped overrun mcast
> >     2096       25       0       0       0       0
> >     TX: bytes  packets  errors  dropped carrier collsns
> >     5324       36       5       5       0       0
>
> So there are 5 errors at TX.

and today when I tried it on next-20180620 I saw 8 errors at TX.

> I couldn't reproduce in my local machine using 4.17-rc6.
> How do I checkin the "next-20180613" source code?

You can find the source code here [1], and I would look in the latest tag that I
said that I was able to reproduce it on above.

Cheers,
Anders
[1] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/

^ permalink raw reply

* Re: [PATCH v3 07/14] net: smc911x: remove the dmaengine compat need
From: David Miller @ 2018-06-20 23:39 UTC (permalink / raw)
  To: robert.jarzmik; +Cc: daniel, linux-kernel, netdev
In-Reply-To: <87muvppebs.fsf@belgarion.home>

From: Robert Jarzmik <robert.jarzmik@free.fr>
Date: Wed, 20 Jun 2018 19:17:43 +0200

> Could you (or somebody from netdev) review it and either ack it (and I'll take
> it through the pxa tree), or take it for v4.19 please ?

Acked-by: David S. Miller <davem@davemloft.net>

^ permalink raw reply

* Re: [PATCH v3 08/14] net: smc91x: remove the dmaengine compat need
From: David Miller @ 2018-06-20 23:40 UTC (permalink / raw)
  To: robert.jarzmik; +Cc: daniel, linux-kernel, netdev
In-Reply-To: <87lgb9pebm.fsf@belgarion.home>


Acked-by: David S. Miller <davem@davemloft.net>

^ permalink raw reply

* Re: [net RFC] net/mlx4_en: Use frag stride in crossing page boundary condition
From: Saeed Mahameed @ 2018-06-20 23:41 UTC (permalink / raw)
  To: eric.dumazet@gmail.com, kafai@fb.com, Tariq Toukan
  Cc: netdev@vger.kernel.org, edumazet@google.com
In-Reply-To: <1bd6da9b-fa46-25e7-8921-cb56eb91e71b@gmail.com>

On Tue, 2018-06-19 at 17:25 -0700, Eric Dumazet wrote:
> 
> On 06/19/2018 11:05 AM, Saeed Mahameed wrote:
> 
> > this is only true for XDP setup, for non XDP max stride_size can
> > only
> > be around ~3k and only for mtu > ~6k
> > 
> > For XDP setup you suggested:
> > -               priv->frag_info[0].frag_size = eff_mtu;
> > +               priv->frag_info[0].frag_size = PAGE_SIZE;
> > 
> > currently the condition is:
> > 
> > release = frags->page_offset + frag_info->frag_size > PAGE_SIZE;
> > 
> > so my solution and yours have the same problem you described above.
> > 
> > the problem is not with the initial values or with stride/farg size
> > math, it just that in XDP we shouldn't reuse at ALL. I agree with
> > you
> > that we need to optimize and maybe for PAGE_SIZE > 8k we need to
> > allow
> > XDP setup to reuses. but for now there is a data corruption to
> > handle.
> 
> 
> Sure, we all agree there is a bug to fix.
> 
> The way you are fixing it is kind of illogical.
> 
> The NIC can use a frag if its _size_ is big enough to receive the
> frame.
> 
> The _stride_  is an abstraction created by the driver to report an
> estimation of the _truesize_,
> or memory consumption, so that linux can better track overall memory
> usage.
> 
> For example, if MTU=1500, the size of the fragment is 1536 bytes, but
> since we can put only
> 2 fragments per 4KB page (on x86), we declare the _stride_ to be 2048
> bytes.
> 
> Declaring that a final blob of a page, being 1600 bytes, not able to
> receive a frame because
> _stride_ is 2048 is illogical and waste resources.
> 
> 

I see, I wanted to use _stride_ as grantee for how much a page frag can
grow, for example in mlx5 we need the whole stride to build_skb  around
the frag, since we always need the trailer, but it is different in here
and we can avoid resource waste.

so how a bout this: (As suggested by Martin).
currently as mlx4_en_complete_rx_desc assumes that priv->rx_headroom
is always 0 in non-XDP setup, hence:

frags->page_offset += sz_align;

where it really should be:
frags->page_offset += sz_align + priv->rx_headroom;

we can use it as a hint to not reuse as below:
what do you think ?


diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index 9f54ccbddea7..f14c7a574cc8 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -474,10 +474,10 @@ static int mlx4_en_complete_rx_desc(struct
mlx4_en_priv *priv,
 {
        const struct mlx4_en_frag_info *frag_info = priv->frag_info;
        unsigned int truesize = 0;
+       bool release = true;
        int nr, frag_size;
        struct page *page;
        dma_addr_t dma;
-       bool release;
index 9f54ccbddea7..f14c7a574cc8 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c

        /* Collect used fragments while replacing them in the HW
descriptors */
        for (nr = 0;; frags++) {
@@ -500,7 +500,7 @@ static int mlx4_en_complete_rx_desc(struct
mlx4_en_priv *priv,
                        release = page_count(page) != 1 ||
                                  page_is_pfmemalloc(page) ||
                                  page_to_nid(page) != numa_mem_id();
-               } else {
+               } elseif(!priv->rx_headroom) {
                        u32 sz_align = ALIGN(frag_size,
SMP_CACHE_BYTES);

                        frags->page_offset += sz_align;

^ permalink raw reply related

* Re: net-next compilation failures
From: David Miller @ 2018-06-20 23:43 UTC (permalink / raw)
  To: Manish.Chopra; +Cc: netdev
In-Reply-To: <CY1PR0701MB1163F3D32FBB527D5C7054E189770@CY1PR0701MB1163.namprd07.prod.outlook.com>

From: "Chopra, Manish" <Manish.Chopra@cavium.com>
Date: Wed, 20 Jun 2018 21:00:40 +0000

> I am trying to compile net-next kernel and I face these below
> compilation errros for some reason. Attached the kernel .config
> used.  Any idea for what reason these failures could be stemming ?

The net-next tree isn't open and therefore you shouldn't be doing
work against it.

When it is closed, I don't try hard to keep net-next up to date with
upstream and thus the recent upstream fixes for build problems or
other bugs.

^ permalink raw reply

* [PATCH net] bpf: enforce correct alignment for instructions
From: Eric Dumazet @ 2018-06-21  0:24 UTC (permalink / raw)
  To: David S . Miller
  Cc: netdev, Eric Dumazet, Eric Dumazet, Daniel Borkmann,
	Martin KaFai Lau, Alexei Starovoitov

After commit 9facc336876f ("bpf: reject any prog that failed read-only lock")
offsetof(struct bpf_binary_header, image) became 3 instead of 4,
breaking powerpc BPF badly, since instructions need to be word aligned.

Fixes: 9facc336876f ("bpf: reject any prog that failed read-only lock")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Alexei Starovoitov <ast@kernel.org>
---
 include/linux/filter.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index b615df57b7d5b2ccb468c411c3a2aae103cd2aea..20f2659dd829256d7fef206087ab3262e1e291f5 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -472,7 +472,9 @@ struct sock_fprog_kern {
 struct bpf_binary_header {
 	u16 pages;
 	u16 locked:1;
-	u8 image[];
+
+	/* Some arches need word alignment for their instructions */
+	u8 image[] __aligned(4);
 };
 
 struct bpf_prog {
-- 
2.18.0.rc1.244.gcf134e6275-goog

^ permalink raw reply related

* Re: [PATCH] bpfilter: fix user mode helper cross compilation
From: David Miller @ 2018-06-21  0:19 UTC (permalink / raw)
  To: mcroce; +Cc: netdev
In-Reply-To: <20180620140434.18139-1-mcroce@redhat.com>

From: Matteo Croce <mcroce@redhat.com>
Date: Wed, 20 Jun 2018 16:04:34 +0200

> Use $(OBJDUMP) instead of literal 'objdump' to avoid
> using host toolchain when cross compiling.
> 
> Fixes: 421780fd4983 ("bpfilter: fix build error")
> Signed-off-by: Matteo Croce <mcroce@redhat.com>

Applied.

^ permalink raw reply

* Re: [net RFC] net/mlx4_en: Use frag stride in crossing page boundary condition
From: Eric Dumazet @ 2018-06-21  0:28 UTC (permalink / raw)
  To: Saeed Mahameed, eric.dumazet@gmail.com, kafai@fb.com,
	Tariq Toukan
  Cc: netdev@vger.kernel.org, edumazet@google.com
In-Reply-To: <65f1a941a3013250e2a768a31f5e521dc21f73e8.camel@mellanox.com>



On 06/20/2018 04:41 PM, Saeed Mahameed wrote:
> 
> I see, I wanted to use _stride_ as grantee for how much a page frag can
> grow, for example in mlx5 we need the whole stride to build_skb  around
> the frag, since we always need the trailer, but it is different in here
> and we can avoid resource waste.
> 
> so how a bout this: (As suggested by Martin).
> currently as mlx4_en_complete_rx_desc assumes that priv->rx_headroom
> is always 0 in non-XDP setup, hence:
> 
> frags->page_offset += sz_align;
> 
> where it really should be:
> frags->page_offset += sz_align + priv->rx_headroom;
> 
> we can use it as a hint to not reuse as below:
> what do you think ?
> 
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> index 9f54ccbddea7..f14c7a574cc8 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> @@ -474,10 +474,10 @@ static int mlx4_en_complete_rx_desc(struct
> mlx4_en_priv *priv,
>  {
>         const struct mlx4_en_frag_info *frag_info = priv->frag_info;
>         unsigned int truesize = 0;
> +       bool release = true;
>         int nr, frag_size;
>         struct page *page;
>         dma_addr_t dma;
> -       bool release;
> index 9f54ccbddea7..f14c7a574cc8 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> 
>         /* Collect used fragments while replacing them in the HW
> descriptors */
>         for (nr = 0;; frags++) {
> @@ -500,7 +500,7 @@ static int mlx4_en_complete_rx_desc(struct
> mlx4_en_priv *priv,
>                         release = page_count(page) != 1 ||
>                                   page_is_pfmemalloc(page) ||
>                                   page_to_nid(page) != numa_mem_id();
> -               } else {
> +               } elseif(!priv->rx_headroom) {
>                         u32 sz_align = ALIGN(frag_size,
> SMP_CACHE_BYTES);
> 
>                         frags->page_offset += sz_align;
> 

I guess that would work, please double check priv->rx_headroom wont need another cache line,
thanks !

^ permalink raw reply

* Re: [PATCH net-next 0/2] fixes for ipsec selftests
From: Shannon Nelson @ 2018-06-21  0:32 UTC (permalink / raw)
  To: Anders Roxell; +Cc: Networking, David Miller
In-Reply-To: <CADYN=9Jmgk91RBiEyEbSdMr5_3JQgk058CO6HLRhDW6NFaF6qQ@mail.gmail.com>

On 6/20/2018 4:18 PM, Anders Roxell wrote:
> On Thu, 21 Jun 2018 at 00:26, Shannon Nelson <shannon.nelson@oracle.com> wrote:
>>
>> On 6/20/2018 12:09 PM, Anders Roxell wrote:
>>> On Wed, 20 Jun 2018 at 07:42, Shannon Nelson <shannon.nelson@oracle.com> wrote:
>>>>
>>>> A couple of bad behaviors in the ipsec selftest were pointed out
>>>> by Anders Roxell <anders.roxell@linaro.org> and are addressed here.
>>>>
>>>> Shannon Nelson (2):
>>>>     selftests: rtnetlink: hide complaint from terminated monitor
>>>>     selftests: rtnetlink: use a local IP address for IPsec tests
>>>>
>>>>    tools/testing/selftests/net/rtnetlink.sh | 11 +++++++----
>>>>    1 file changed, 7 insertions(+), 4 deletions(-)
>>>>
>>>> --
>>>> 2.7.4
>>>>
>>>
>>> Hi Shannon,
>>>
>>> With this patches applied and my config patch.
>>>
>>> I still get this error when I run the ipsec test:
>>>
>>> FAIL: can't add fou port 7777, skipping test
>>> RTNETLINK answers: Operation not supported
>>> FAIL: can't add macsec interface, skipping test
>>> RTNETLINK answers: Protocol not supported
>>> RTNETLINK answers: No such process
>>> RTNETLINK answers: No such process
>>> FAIL: ipsec
>>
>> One of the odd things I noticed about this script is that there really
>> aren't any diagnosis messages, just PASS or FAIL.  I followed this
>> custom when I added the ipsec tests, but I think this is something that
>> should change so we can get some idea of what breaks.
>>
>> I'm curious about the "RTNETLINK answers" messages and where they might
>> be coming from, especially "RTNETLINK answers: Protocol not supported".
> 
> I added: "set -x" in the beginning of the rtnetlink.sh script.
> + ip x s add proto esp src 10.66.17.140 dst 10.66.17.141 spi 0x07 mode
> transport reqid 0x07 replay-window 32 aead 'rfc4106(gcm(aes))'
> 0x3132333435
> 363738393031323334353664636261 128 sel src 10.66.17.140/24 dst 10.66.17.141/24
> RTNETLINK answers: Protocol not supported

Okay, so ip didn't like this command...

>> What are the XFRM and AES settings in your kernel config - what is the
>> output from
>>          egrep -i "xfrm|_aes" .config
> 
> CONFIG_XFRM=y
> CONFIG_XFRM_ALGO=y
> CONFIG_XFRM_USER=y
> CONFIG_INET_XFRM_MODE_TUNNEL=y
> CONFIG_INET6_XFRM_MODE_TRANSPORT=y
> CONFIG_INET6_XFRM_MODE_TUNNEL=y
> CONFIG_INET6_XFRM_MODE_BEET=y
> CONFIG_CRYPTO_AES=y

And this is probably why - there seem to be a few config variables 
missing, including CONFIG_INET_XFRM_MODE_TRANSPORT, which might be why 
the ip command fails above.

Here's what I have in my config:
CONFIG_XFRM=y
CONFIG_XFRM_OFFLOAD=y
CONFIG_XFRM_ALGO=m
CONFIG_XFRM_USER=m
# CONFIG_XFRM_SUB_POLICY is not set
# CONFIG_XFRM_MIGRATE is not set
CONFIG_XFRM_STATISTICS=y
CONFIG_XFRM_IPCOMP=m
CONFIG_INET_XFRM_TUNNEL=m
CONFIG_INET_XFRM_MODE_TRANSPORT=m
CONFIG_INET_XFRM_MODE_TUNNEL=m
CONFIG_INET_XFRM_MODE_BEET=m
CONFIG_INET6_XFRM_TUNNEL=m
CONFIG_INET6_XFRM_MODE_TRANSPORT=m
CONFIG_INET6_XFRM_MODE_TUNNEL=m
CONFIG_INET6_XFRM_MODE_BEET=m
CONFIG_INET6_XFRM_MODE_ROUTEOPTIMIZATION=m
CONFIG_SECURITY_NETWORK_XFRM=y
CONFIG_CRYPTO_AES=y
# CONFIG_CRYPTO_AES_TI is not set
CONFIG_CRYPTO_AES_X86_64=m
CONFIG_CRYPTO_AES_NI_INTEL=m
CONFIG_CRYPTO_CAMELLIA_AESNI_AVX_X86_64=m
CONFIG_CRYPTO_CAMELLIA_AESNI_AVX2_X86_64=m
CONFIG_CRYPTO_DEV_PADLOCK_AES=m

Can I talk you into adding CONFIG_INET_XFRM_MODE_TRANSPORT to your 
config and trying again?

sln

^ permalink raw reply

* Re: [PATCH] selftests: net: add config fragments
From: David Miller @ 2018-06-21  0:47 UTC (permalink / raw)
  To: anders.roxell
  Cc: shuah, fw, shannon.nelson, netdev, linux-kselftest, linux-kernel
In-Reply-To: <20180619164111.30785-1-anders.roxell@linaro.org>

From: Anders Roxell <anders.roxell@linaro.org>
Date: Tue, 19 Jun 2018 18:41:11 +0200

> Add fragments to pass bridge and vlan tests.
> 
> Fixes: 33b01b7b4f19 ("selftests: add rtnetlink test script")
> Signed-off-by: Anders Roxell <anders.roxell@linaro.org>

Applied, thank you.

^ permalink raw reply

* Re: [PATCH net][RESEND] strparser: Don't schedule in workqueue in paused state
From: David Miller @ 2018-06-21  0:54 UTC (permalink / raw)
  To: vakul.garg
  Cc: doronrk, tom, john.fastabend, davejwatson, netdev, ebiggers,
	linux-kernel
In-Reply-To: <20180620215949.32334-1-vakul.garg@nxp.com>

From: Vakul Garg <vakul.garg@nxp.com>
Date: Thu, 21 Jun 2018 03:29:49 +0530

> In function strp_data_ready(), it is useless to call queue_work if
> the state of strparser is already paused. The state checking should
> be done before calling queue_work. The change reduces the context
> switches and improves the ktls-rx throughput by approx 20% (measured
> on cortex-a53 based platform).
> 
> Signed-off-by: Vakul Garg <vakul.garg@nxp.com>

Applied, thank you.

^ permalink raw reply

* Re: [PATCH] r8169: Fix netpoll oops
From: David Miller @ 2018-06-21  0:56 UTC (permalink / raw)
  To: ville.syrjala; +Cc: netdev, nic_swsd, hkallweit1
In-Reply-To: <20180620120153.11676-1-ville.syrjala@linux.intel.com>

From: Ville Syrjala <ville.syrjala@linux.intel.com>
Date: Wed, 20 Jun 2018 15:01:53 +0300

> From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> 
> Pass the correct thing to rtl8169_interrupt() from netpoll.
> 
> Cc: Realtek linux nic maintainers <nic_swsd@realtek.com>
> Cc: netdev@vger.kernel.org
> Cc: Heiner Kallweit <hkallweit1@gmail.com>
> Cc: David S. Miller <davem@davemloft.net>
> Fixes: ebcd5daa7ffd ("r8169: change interrupt handler argument type")
> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>

Applied.

^ permalink raw reply

* Re: [PATCH net] ipvlan: call dev_change_flags when reset ipvlan mode
From: Hangbin Liu @ 2018-06-21  1:18 UTC (permalink / raw)
  To: Cong Wang
  Cc: David Miller, Linux Kernel Network Developers, Stefano Brivio,
	Paolo Abeni, Mahesh Bandewar
In-Reply-To: <CAM_iQpVdCJ2wjujco0+xZ_Jjeyu-qZ_g+8cVHv9LmUARhJmMoQ@mail.gmail.com>

On Wed, Jun 20, 2018 at 10:45:39AM -0700, Cong Wang wrote:
> On Tue, Jun 19, 2018 at 10:31 PM, David Miller <davem@davemloft.net> wrote:
> > From: Hangbin Liu <liuhangbin@gmail.com>
> > Date: Wed, 20 Jun 2018 11:22:54 +0800
> >
> >> The only case dev_change_flags() return an err is when we change IFF_UP flag.
> >> Since we only set/reset IFF_NOARP, do you think we still need to check the
> >> return value?
> >
> > It is bad to try and take shortcuts on error handling using assumptions
> > like that.
> >
> > If dev_change_flags() is adjusted to return error codes in more
> > situations, nobody is going to remember to undo your "optimziation"
> > here.
> >
> > Please check for errors, thank you.
> 
> Yeah. Also since the notifier is triggered in this case:
> 
>         if (dev->flags & IFF_UP &&
>             (changes & ~(IFF_UP | IFF_PROMISC | IFF_ALLMULTI | IFF_VOLATILE))) {
>                 struct netdev_notifier_change_info change_info = {
>                         .info = {
>                                 .dev = dev,
>                         },
>                         .flags_changed = changes,
>                 };
> 
>                 call_netdevice_notifiers_info(NETDEV_CHANGE, &change_info.info);
>         }
> 
> the return value of call_netdevice_notifiers_info() isn't captured
> either, but it should be.

Thanks for the explanation. I will fix it.

Regards
Hangbin

^ permalink raw reply

* [PATCH] cfg80211: use IDA to allocate wiphy indeces
From: Brian Norris @ 2018-06-21  1:29 UTC (permalink / raw)
  To: Johannes Berg; +Cc: linux-kernel, linux-wireless, netdev, Brian Norris

It's annoying to see the phy index increase arbitrarily, just because a
device got removed and re-probed (e.g., during a device reset, or due to
probe testing). We can use the in-kernel index allocator for this,
instead of just an increasing counter.

Signed-off-by: Brian Norris <briannorris@chromium.org>
---
 net/wireless/core.c | 31 ++++++++++++++-----------------
 1 file changed, 14 insertions(+), 17 deletions(-)

diff --git a/net/wireless/core.c b/net/wireless/core.c
index c0fd8a85e7f7..80c108c3ca38 100644
--- a/net/wireless/core.c
+++ b/net/wireless/core.c
@@ -8,6 +8,7 @@
 
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 
+#include <linux/idr.h>
 #include <linux/if.h>
 #include <linux/module.h>
 #include <linux/err.h>
@@ -380,11 +381,11 @@ static void cfg80211_propagate_cac_done_wk(struct work_struct *work)
 
 /* exported functions */
 
+static DEFINE_IDA(wiphy_ida);
+
 struct wiphy *wiphy_new_nm(const struct cfg80211_ops *ops, int sizeof_priv,
 			   const char *requested_name)
 {
-	static atomic_t wiphy_counter = ATOMIC_INIT(0);
-
 	struct cfg80211_registered_device *rdev;
 	int alloc_size;
 
@@ -413,18 +414,12 @@ struct wiphy *wiphy_new_nm(const struct cfg80211_ops *ops, int sizeof_priv,
 
 	rdev->ops = ops;
 
-	rdev->wiphy_idx = atomic_inc_return(&wiphy_counter);
-
+	rdev->wiphy_idx = ida_simple_get(&wiphy_ida, 0, 0, GFP_KERNEL);
 	if (unlikely(rdev->wiphy_idx < 0)) {
-		/* ugh, wrapped! */
-		atomic_dec(&wiphy_counter);
 		kfree(rdev);
 		return NULL;
 	}
 
-	/* atomic_inc_return makes it start at 1, make it start at 0 */
-	rdev->wiphy_idx--;
-
 	/* give it a proper name */
 	if (requested_name && requested_name[0]) {
 		int rv;
@@ -452,10 +447,8 @@ struct wiphy *wiphy_new_nm(const struct cfg80211_ops *ops, int sizeof_priv,
 		 * value, and use a different name if this one exists?
 		 */
 		rv = dev_set_name(&rdev->wiphy.dev, PHY_NAME "%d", rdev->wiphy_idx);
-		if (rv < 0) {
-			kfree(rdev);
-			return NULL;
-		}
+		if (rv < 0)
+			goto err;
 	}
 
 	INIT_LIST_HEAD(&rdev->wiphy.wdev_list);
@@ -497,10 +490,8 @@ struct wiphy *wiphy_new_nm(const struct cfg80211_ops *ops, int sizeof_priv,
 				   &rdev->wiphy.dev, RFKILL_TYPE_WLAN,
 				   &rdev->rfkill_ops, rdev);
 
-	if (!rdev->rfkill) {
-		kfree(rdev);
-		return NULL;
-	}
+	if (!rdev->rfkill)
+		goto err;
 
 	INIT_WORK(&rdev->rfkill_sync, cfg80211_rfkill_sync_work);
 	INIT_WORK(&rdev->conn_work, cfg80211_conn_work);
@@ -525,6 +516,11 @@ struct wiphy *wiphy_new_nm(const struct cfg80211_ops *ops, int sizeof_priv,
 	rdev->wiphy.max_sched_scan_plan_interval = U32_MAX;
 
 	return &rdev->wiphy;
+
+err:
+	ida_simple_remove(&wiphy_ida, rdev->wiphy_idx);
+	kfree(rdev);
+	return NULL;
 }
 EXPORT_SYMBOL(wiphy_new_nm);
 
@@ -972,6 +968,7 @@ void cfg80211_dev_free(struct cfg80211_registered_device *rdev)
 	}
 	list_for_each_entry_safe(scan, tmp, &rdev->bss_list, list)
 		cfg80211_put_bss(&rdev->wiphy, &scan->pub);
+	ida_simple_remove(&wiphy_ida, rdev->wiphy_idx);
 	kfree(rdev);
 }
 
-- 
2.18.0.rc1.244.gcf134e6275-goog

^ permalink raw reply related

* Re: [PATCH net] net: mscc: fix the injection header
From: David Miller @ 2018-06-21  1:30 UTC (permalink / raw)
  To: antoine.tenart
  Cc: f.fainelli, andrew, netdev, linux-kernel, thomas.petazzoni,
	alexandre.belloni, quentin.schulz, allan.nielsen
In-Reply-To: <20180620085046.2377-1-antoine.tenart@bootlin.com>

From: Antoine Tenart <antoine.tenart@bootlin.com>
Date: Wed, 20 Jun 2018 10:50:46 +0200

> When injecting frames in the Ocelot switch driver an injection header
> (IFH) should be used to configure various parameters related to a given
> frame, such as the port onto which the frame should be departed or its
> vlan id. Other parameters in the switch configuration can led to an
> injected frame being sent without an IFH but this led to various issues
> as the per-frame parameters are then not used. This is especially true
> when using multiple ports for injection.
> 
> The IFH was injected with the wrong endianness which led to the switch
> not taking it into account as the IFH_INJ_BYPASS bit was then unset.
> (The bit tells the switch to use the IFH over its internal
> configuration). This patch fixes it.
> 
> In addition to the endianness fix, the IFH is also fixed. As it was
> (unwillingly) unused, some of its fields were not configured the right
> way.
> 
> Fixes: a556c76adc05 ("net: mscc: Add initial Ocelot switch support")
> Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>

Applied, thank you.

^ permalink raw reply

* Re: [PATCH v2 0/2] net: davinci_emac: fix suspend/resume (both a regression and a common clk problem)
From: Florian Fainelli @ 2018-06-21  2:44 UTC (permalink / raw)
  To: Bartosz Golaszewski, Grygorii Strashko, David S . Miller,
	Dan Carpenter, Ivan Khoronzhuk, Rob Herring, Lukas Wunner,
	Kevin Hilman, David Lechner, Sekhar Nori, Andrew Lunn
  Cc: linux-omap, netdev, linux-kernel, Bartosz Golaszewski
In-Reply-To: <20180620080356.11900-1-brgl@bgdev.pl>



On 06/20/2018 01:03 AM, Bartosz Golaszewski wrote:
> From: Bartosz Golaszewski <bgolaszewski@baylibre.com>
> 
> Earlier I sent the first patch as a solution to a regression introduced
> during the v4.16 merge window, but after testing David's common clock
> series on top of 4.18-rc1 + this patch it turned out that the problem
> persisted.
> 
> This is a follow-up containing the regression fix and an additional
> patche that makes suspend/resume work with David's changes.
> 
> v1 -> v2:
> - dropped patch 2/3
> - in patch 2/2: check the device's parent's compatible

Much better:

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>

> 
> Bartosz Golaszewski (2):
>   net: ethernet: fix suspend/resume in davinci_emac
>   net: davinci_emac: match the mdio device against its compatible if
>     possible
> 
>  drivers/net/ethernet/ti/davinci_emac.c | 19 +++++++++++++++++--
>  1 file changed, 17 insertions(+), 2 deletions(-)
> 

-- 
Florian

^ permalink raw reply

* [PATCH v2 bpf-net] bpf: Change bpf_fib_lookup to return lookup status
From: dsahern @ 2018-06-21  3:00 UTC (permalink / raw)
  To: netdev, borkmann, ast; +Cc: davem, kafai, David Ahern

From: David Ahern <dsahern@gmail.com>

For ACLs implemented using either FIB rules or FIB entries, the BPF
program needs the FIB lookup status to be able to drop the packet.
Since the bpf_fib_lookup API has not reached a released kernel yet,
change the return code to contain an encoding of the FIB lookup
result and return the nexthop device index in the params struct.

In addition, inform the BPF program of any post FIB lookup reason as
to why the packet needs to go up the stack.

The fib result for unicast routes must have an egress device, so remove
the check that it is non-NULL.

Signed-off-by: David Ahern <dsahern@gmail.com>
---
v2
- drop BPF_FIB_LKUP_RET_NO_NHDEV; check in dev in fib result not needed
- enhance documentation of BPF_FIB_LKUP_RET_ codes

 include/uapi/linux/bpf.h   | 28 ++++++++++++++----
 net/core/filter.c          | 72 ++++++++++++++++++++++++++++++----------------
 samples/bpf/xdp_fwd_kern.c |  8 +++---
 3 files changed, 74 insertions(+), 34 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 59b19b6a40d7..b7db3261c62d 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1857,7 +1857,8 @@ union bpf_attr {
  *		is resolved), the nexthop address is returned in ipv4_dst
  *		or ipv6_dst based on family, smac is set to mac address of
  *		egress device, dmac is set to nexthop mac address, rt_metric
- *		is set to metric from route (IPv4/IPv6 only).
+ *		is set to metric from route (IPv4/IPv6 only), and ifindex
+ *		is set to the device index of the nexthop from the FIB lookup.
  *
  *             *plen* argument is the size of the passed in struct.
  *             *flags* argument can be a combination of one or more of the
@@ -1873,9 +1874,10 @@ union bpf_attr {
  *             *ctx* is either **struct xdp_md** for XDP programs or
  *             **struct sk_buff** tc cls_act programs.
  *     Return
- *             Egress device index on success, 0 if packet needs to continue
- *             up the stack for further processing or a negative error in case
- *             of failure.
+ *		* < 0 if any input argument is invalid
+ *		*   0 on success (packet is forwarded, nexthop neighbor exists)
+ *		* > 0 one of **BPF_FIB_LKUP_RET_** codes explaining why the
+ *		*     packet is not forwarded or needs assist from full stack
  *
  * int bpf_sock_hash_update(struct bpf_sock_ops_kern *skops, struct bpf_map *map, void *key, u64 flags)
  *	Description
@@ -2612,6 +2614,18 @@ struct bpf_raw_tracepoint_args {
 #define BPF_FIB_LOOKUP_DIRECT  BIT(0)
 #define BPF_FIB_LOOKUP_OUTPUT  BIT(1)
 
+enum {
+	BPF_FIB_LKUP_RET_SUCCESS,      /* lookup successful */
+	BPF_FIB_LKUP_RET_BLACKHOLE,    /* dest is blackholed; can be dropped */
+	BPF_FIB_LKUP_RET_UNREACHABLE,  /* dest is unreachable; can be dropped */
+	BPF_FIB_LKUP_RET_PROHIBIT,     /* dest not allowed; can be dropped */
+	BPF_FIB_LKUP_RET_NOT_FWDED,    /* packet is not forwarded */
+	BPF_FIB_LKUP_RET_FWD_DISABLED, /* fwding is not enabled on ingress */
+	BPF_FIB_LKUP_RET_UNSUPP_LWT,   /* fwd requires encapsulation */
+	BPF_FIB_LKUP_RET_NO_NEIGH,     /* no neighbor entry for nh */
+	BPF_FIB_LKUP_RET_FRAG_NEEDED,  /* fragmentation required to fwd */
+};
+
 struct bpf_fib_lookup {
 	/* input:  network family for lookup (AF_INET, AF_INET6)
 	 * output: network family of egress nexthop
@@ -2625,7 +2639,11 @@ struct bpf_fib_lookup {
 
 	/* total length of packet from network header - used for MTU check */
 	__u16	tot_len;
-	__u32	ifindex;  /* L3 device index for lookup */
+
+	/* input: L3 device index for lookup
+	 * output: device index from FIB lookup
+	 */
+	__u32	ifindex;
 
 	union {
 		/* inputs to lookup */
diff --git a/net/core/filter.c b/net/core/filter.c
index e7f12e9f598c..f8dd8aa89de4 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -4073,8 +4073,9 @@ static int bpf_fib_set_fwd_params(struct bpf_fib_lookup *params,
 	memcpy(params->smac, dev->dev_addr, ETH_ALEN);
 	params->h_vlan_TCI = 0;
 	params->h_vlan_proto = 0;
+	params->ifindex = dev->ifindex;
 
-	return dev->ifindex;
+	return 0;
 }
 #endif
 
@@ -4098,7 +4099,7 @@ static int bpf_ipv4_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
 	/* verify forwarding is enabled on this interface */
 	in_dev = __in_dev_get_rcu(dev);
 	if (unlikely(!in_dev || !IN_DEV_FORWARD(in_dev)))
-		return 0;
+		return BPF_FIB_LKUP_RET_FWD_DISABLED;
 
 	if (flags & BPF_FIB_LOOKUP_OUTPUT) {
 		fl4.flowi4_iif = 1;
@@ -4123,7 +4124,7 @@ static int bpf_ipv4_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
 
 		tb = fib_get_table(net, tbid);
 		if (unlikely(!tb))
-			return 0;
+			return BPF_FIB_LKUP_RET_NOT_FWDED;
 
 		err = fib_table_lookup(tb, &fl4, &res, FIB_LOOKUP_NOREF);
 	} else {
@@ -4135,8 +4136,20 @@ static int bpf_ipv4_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
 		err = fib_lookup(net, &fl4, &res, FIB_LOOKUP_NOREF);
 	}
 
-	if (err || res.type != RTN_UNICAST)
-		return 0;
+	if (err) {
+		/* map fib lookup errors to RTN_ type */
+		if (err == -EINVAL)
+			return BPF_FIB_LKUP_RET_BLACKHOLE;
+		if (err == -EHOSTUNREACH)
+			return BPF_FIB_LKUP_RET_UNREACHABLE;
+		if (err == -EACCES)
+			return BPF_FIB_LKUP_RET_PROHIBIT;
+
+		return BPF_FIB_LKUP_RET_NOT_FWDED;
+	}
+
+	if (res.type != RTN_UNICAST)
+		return BPF_FIB_LKUP_RET_NOT_FWDED;
 
 	if (res.fi->fib_nhs > 1)
 		fib_select_path(net, &res, &fl4, NULL);
@@ -4144,19 +4157,16 @@ static int bpf_ipv4_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
 	if (check_mtu) {
 		mtu = ip_mtu_from_fib_result(&res, params->ipv4_dst);
 		if (params->tot_len > mtu)
-			return 0;
+			return BPF_FIB_LKUP_RET_FRAG_NEEDED;
 	}
 
 	nh = &res.fi->fib_nh[res.nh_sel];
 
 	/* do not handle lwt encaps right now */
 	if (nh->nh_lwtstate)
-		return 0;
+		return BPF_FIB_LKUP_RET_UNSUPP_LWT;
 
 	dev = nh->nh_dev;
-	if (unlikely(!dev))
-		return 0;
-
 	if (nh->nh_gw)
 		params->ipv4_dst = nh->nh_gw;
 
@@ -4166,10 +4176,10 @@ static int bpf_ipv4_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
 	 * rcu_read_lock_bh is not needed here
 	 */
 	neigh = __ipv4_neigh_lookup_noref(dev, (__force u32)params->ipv4_dst);
-	if (neigh)
-		return bpf_fib_set_fwd_params(params, neigh, dev);
+	if (!neigh)
+		return BPF_FIB_LKUP_RET_NO_NEIGH;
 
-	return 0;
+	return bpf_fib_set_fwd_params(params, neigh, dev);
 }
 #endif
 
@@ -4190,7 +4200,7 @@ static int bpf_ipv6_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
 
 	/* link local addresses are never forwarded */
 	if (rt6_need_strict(dst) || rt6_need_strict(src))
-		return 0;
+		return BPF_FIB_LKUP_RET_NOT_FWDED;
 
 	dev = dev_get_by_index_rcu(net, params->ifindex);
 	if (unlikely(!dev))
@@ -4198,7 +4208,7 @@ static int bpf_ipv6_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
 
 	idev = __in6_dev_get_safely(dev);
 	if (unlikely(!idev || !net->ipv6.devconf_all->forwarding))
-		return 0;
+		return BPF_FIB_LKUP_RET_FWD_DISABLED;
 
 	if (flags & BPF_FIB_LOOKUP_OUTPUT) {
 		fl6.flowi6_iif = 1;
@@ -4225,7 +4235,7 @@ static int bpf_ipv6_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
 
 		tb = ipv6_stub->fib6_get_table(net, tbid);
 		if (unlikely(!tb))
-			return 0;
+			return BPF_FIB_LKUP_RET_NOT_FWDED;
 
 		f6i = ipv6_stub->fib6_table_lookup(net, tb, oif, &fl6, strict);
 	} else {
@@ -4238,11 +4248,23 @@ static int bpf_ipv6_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
 	}
 
 	if (unlikely(IS_ERR_OR_NULL(f6i) || f6i == net->ipv6.fib6_null_entry))
-		return 0;
+		return BPF_FIB_LKUP_RET_NOT_FWDED;
+
+	if (unlikely(f6i->fib6_flags & RTF_REJECT)) {
+		switch (f6i->fib6_type) {
+		case RTN_BLACKHOLE:
+			return BPF_FIB_LKUP_RET_BLACKHOLE;
+		case RTN_UNREACHABLE:
+			return BPF_FIB_LKUP_RET_UNREACHABLE;
+		case RTN_PROHIBIT:
+			return BPF_FIB_LKUP_RET_PROHIBIT;
+		default:
+			return BPF_FIB_LKUP_RET_NOT_FWDED;
+		}
+	}
 
-	if (unlikely(f6i->fib6_flags & RTF_REJECT ||
-	    f6i->fib6_type != RTN_UNICAST))
-		return 0;
+	if (f6i->fib6_type != RTN_UNICAST)
+		return BPF_FIB_LKUP_RET_NOT_FWDED;
 
 	if (f6i->fib6_nsiblings && fl6.flowi6_oif == 0)
 		f6i = ipv6_stub->fib6_multipath_select(net, f6i, &fl6,
@@ -4252,11 +4274,11 @@ static int bpf_ipv6_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
 	if (check_mtu) {
 		mtu = ipv6_stub->ip6_mtu_from_fib6(f6i, dst, src);
 		if (params->tot_len > mtu)
-			return 0;
+			return BPF_FIB_LKUP_RET_FRAG_NEEDED;
 	}
 
 	if (f6i->fib6_nh.nh_lwtstate)
-		return 0;
+		return BPF_FIB_LKUP_RET_UNSUPP_LWT;
 
 	if (f6i->fib6_flags & RTF_GATEWAY)
 		*dst = f6i->fib6_nh.nh_gw;
@@ -4270,10 +4292,10 @@ static int bpf_ipv6_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
 	 */
 	neigh = ___neigh_lookup_noref(ipv6_stub->nd_tbl, neigh_key_eq128,
 				      ndisc_hashfn, dst, dev);
-	if (neigh)
-		return bpf_fib_set_fwd_params(params, neigh, dev);
+	if (!neigh)
+		return BPF_FIB_LKUP_RET_NO_NEIGH;
 
-	return 0;
+	return bpf_fib_set_fwd_params(params, neigh, dev);
 }
 #endif
 
diff --git a/samples/bpf/xdp_fwd_kern.c b/samples/bpf/xdp_fwd_kern.c
index 6673cdb9f55c..a7e94e7ff87d 100644
--- a/samples/bpf/xdp_fwd_kern.c
+++ b/samples/bpf/xdp_fwd_kern.c
@@ -48,9 +48,9 @@ static __always_inline int xdp_fwd_flags(struct xdp_md *ctx, u32 flags)
 	struct ethhdr *eth = data;
 	struct ipv6hdr *ip6h;
 	struct iphdr *iph;
-	int out_index;
 	u16 h_proto;
 	u64 nh_off;
+	int rc;
 
 	nh_off = sizeof(*eth);
 	if (data + nh_off > data_end)
@@ -101,7 +101,7 @@ static __always_inline int xdp_fwd_flags(struct xdp_md *ctx, u32 flags)
 
 	fib_params.ifindex = ctx->ingress_ifindex;
 
-	out_index = bpf_fib_lookup(ctx, &fib_params, sizeof(fib_params), flags);
+	rc = bpf_fib_lookup(ctx, &fib_params, sizeof(fib_params), flags);
 
 	/* verify egress index has xdp support
 	 * TO-DO bpf_map_lookup_elem(&tx_port, &key) fails with
@@ -109,7 +109,7 @@ static __always_inline int xdp_fwd_flags(struct xdp_md *ctx, u32 flags)
 	 * NOTE: without verification that egress index supports XDP
 	 *       forwarding packets are dropped.
 	 */
-	if (out_index > 0) {
+	if (rc == 0) {
 		if (h_proto == htons(ETH_P_IP))
 			ip_decrease_ttl(iph);
 		else if (h_proto == htons(ETH_P_IPV6))
@@ -117,7 +117,7 @@ static __always_inline int xdp_fwd_flags(struct xdp_md *ctx, u32 flags)
 
 		memcpy(eth->h_dest, fib_params.dmac, ETH_ALEN);
 		memcpy(eth->h_source, fib_params.smac, ETH_ALEN);
-		return bpf_redirect_map(&tx_port, out_index, 0);
+		return bpf_redirect_map(&tx_port, fib_params.ifindex, 0);
 	}
 
 	return XDP_PASS;
-- 
2.11.0

^ permalink raw reply related

* [net:master 6/6] drivers/net/ethernet/mscc/ocelot.c:377:17: sparse: incorrect type in argument 2 (different base types)
From: kbuild test robot @ 2018-06-21  3:43 UTC (permalink / raw)
  To: Antoine Tenart; +Cc: kbuild-all, netdev, Alexandre Belloni

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git master
head:   08d02364b12faa54d76dbfea2090321fd27996f2
commit: 08d02364b12faa54d76dbfea2090321fd27996f2 [6/6] net: mscc: fix the injection header
reproduce:
        # apt-get install sparse
        git checkout 08d02364b12faa54d76dbfea2090321fd27996f2
        make ARCH=x86_64 allmodconfig
        make C=1 CF=-D__CHECK_ENDIAN__


sparse warnings: (new ones prefixed by >>)

>> drivers/net/ethernet/mscc/ocelot.c:377:17: sparse: incorrect type in argument 2 (different base types) @@    expected unsigned int [unsigned] [usertype] val @@    got ed int [unsigned] [usertype] val @@
   drivers/net/ethernet/mscc/ocelot.c:377:17:    expected unsigned int [unsigned] [usertype] val
   drivers/net/ethernet/mscc/ocelot.c:377:17:    got restricted __be32 [usertype] <noident>
   include/linux/device.h:678:13: sparse: undefined identifier '__builtin_mul_overflow'
   include/linux/device.h:678:13: sparse: call with no type!

vim +377 drivers/net/ethernet/mscc/ocelot.c

   353	
   354	static int ocelot_port_xmit(struct sk_buff *skb, struct net_device *dev)
   355	{
   356		struct ocelot_port *port = netdev_priv(dev);
   357		struct ocelot *ocelot = port->ocelot;
   358		u32 val, ifh[IFH_LEN];
   359		struct frame_info info = {};
   360		u8 grp = 0; /* Send everything on CPU group 0 */
   361		unsigned int i, count, last;
   362	
   363		val = ocelot_read(ocelot, QS_INJ_STATUS);
   364		if (!(val & QS_INJ_STATUS_FIFO_RDY(BIT(grp))) ||
   365		    (val & QS_INJ_STATUS_WMARK_REACHED(BIT(grp))))
   366			return NETDEV_TX_BUSY;
   367	
   368		ocelot_write_rix(ocelot, QS_INJ_CTRL_GAP_SIZE(1) |
   369				 QS_INJ_CTRL_SOF, QS_INJ_CTRL, grp);
   370	
   371		info.port = BIT(port->chip_port);
   372		info.tag_type = IFH_TAG_TYPE_C;
   373		info.vid = skb_vlan_tag_get(skb);
   374		ocelot_gen_ifh(ifh, &info);
   375	
   376		for (i = 0; i < IFH_LEN; i++)
 > 377			ocelot_write_rix(ocelot, cpu_to_be32(ifh[i]), QS_INJ_WR, grp);
   378	
   379		count = (skb->len + 3) / 4;
   380		last = skb->len % 4;
   381		for (i = 0; i < count; i++) {
   382			ocelot_write_rix(ocelot, ((u32 *)skb->data)[i], QS_INJ_WR, grp);
   383		}
   384	
   385		/* Add padding */
   386		while (i < (OCELOT_BUFFER_CELL_SZ / 4)) {
   387			ocelot_write_rix(ocelot, 0, QS_INJ_WR, grp);
   388			i++;
   389		}
   390	
   391		/* Indicate EOF and valid bytes in last word */
   392		ocelot_write_rix(ocelot, QS_INJ_CTRL_GAP_SIZE(1) |
   393				 QS_INJ_CTRL_VLD_BYTES(skb->len < OCELOT_BUFFER_CELL_SZ ? 0 : last) |
   394				 QS_INJ_CTRL_EOF,
   395				 QS_INJ_CTRL, grp);
   396	
   397		/* Add dummy CRC */
   398		ocelot_write_rix(ocelot, 0, QS_INJ_WR, grp);
   399		skb_tx_timestamp(skb);
   400	
   401		dev->stats.tx_packets++;
   402		dev->stats.tx_bytes += skb->len;
   403		dev_kfree_skb_any(skb);
   404	
   405		return NETDEV_TX_OK;
   406	}
   407	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

^ permalink raw reply

* Re: [PATCH net] bpf: enforce correct alignment for instructions
From: David Miller @ 2018-06-21  3:46 UTC (permalink / raw)
  To: edumazet; +Cc: netdev, eric.dumazet, daniel, kafai, ast
In-Reply-To: <20180621002409.63136-1-edumazet@google.com>

From: Eric Dumazet <edumazet@google.com>
Date: Wed, 20 Jun 2018 17:24:09 -0700

> After commit 9facc336876f ("bpf: reject any prog that failed read-only lock")
> offsetof(struct bpf_binary_header, image) became 3 instead of 4,
> breaking powerpc BPF badly, since instructions need to be word aligned.
> 
> Fixes: 9facc336876f ("bpf: reject any prog that failed read-only lock")
> Signed-off-by: Eric Dumazet <edumazet@google.com>

I'll apply this directly, thanks Eric.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox