Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH bpf] bpf: fix BTF verifier size resolution logic
From: Andrii Nakryiko @ 2019-07-11  1:45 UTC (permalink / raw)
  To: Yonghong Song
  Cc: Andrii Nakryiko, Alexei Starovoitov, daniel@iogearbox.net,
	bpf@vger.kernel.org, netdev@vger.kernel.org, Kernel Team,
	Martin Lau
In-Reply-To: <304d8535-5043-836d-2933-1a5efb7aec72@fb.com>

On Wed, Jul 10, 2019 at 5:36 PM Yonghong Song <yhs@fb.com> wrote:
>
>
>
> On 7/10/19 5:29 PM, Andrii Nakryiko wrote:
> > On Wed, Jul 10, 2019 at 5:16 PM Yonghong Song <yhs@fb.com> wrote:
> >>
> >>
> >>
> >> On 7/10/19 1:08 AM, Andrii Nakryiko wrote:
> >>> BTF verifier has Different logic depending on whether we are following
> >>> a PTR or STRUCT/ARRAY (or something else). This is an optimization to
> >>> stop early in DFS traversal while resolving BTF types. But it also
> >>> results in a size resolution bug, when there is a chain, e.g., of PTR ->
> >>> TYPEDEF -> ARRAY, in which case due to being in pointer context ARRAY
> >>> size won't be resolved, as it is considered to be a sink for pointer,
> >>> leading to TYPEDEF being in RESOLVED state with zero size, which is
> >>> completely wrong.
> >>>
> >>> Optimization is doubtful, though, as btf_check_all_types() will iterate
> >>> over all BTF types anyways, so the only saving is a potentially slightly
> >>> shorter stack. But correctness is more important that tiny savings.
> >>>
> >>> This bug manifests itself in rejecting BTF-defined maps that use array
> >>> typedef as a value type:
> >>>
> >>> typedef int array_t[16];
> >>>
> >>> struct {
> >>>        __uint(type, BPF_MAP_TYPE_ARRAY);
> >>>        __type(value, array_t); /* i.e., array_t *value; */
> >>> } test_map SEC(".maps");
> >>>
> >>> Fixes: eb3f595dab40 ("bpf: btf: Validate type reference")
> >>> Cc: Martin KaFai Lau <kafai@fb.com>
> >>> Signed-off-by: Andrii Nakryiko <andriin@fb.com>
> >>
> >> The change seems okay to me. Currently, looks like intermediate
> >> modifier type will carry size = 0 (in the internal data structure).
> >
> > Yes, which is totally wrong, especially that we use that size in some
> > cases to reject map with specified BTF.
> >
> >>
> >> If we remove RESOLVE logic, we probably want to double check
> >> whether we handle circular types correctly or not. Maybe we will
> >> be okay if all self tests pass.
> >
> > I checked, it does. We'll attempt to add referenced type unless it's a
> > "resolve sink" (where size is immediately known) or is already
> > resolved (it's state is RESOLVED). In other cases, we'll attempt to
> > env_stack_push(), which check that the state of that type is
> > NOT_VISITED. If it's RESOLVED or VISITED, it returns -EEXISTS. When
> > type is added into the stack, it's resolve state goes from NOT_VISITED
> > to VISITED.
> >
> > So, if there is a loop, then we'll detect it as soon as we'll attempt
> > to add the same type onto the stack second time.
> >
> >>
> >> I may still be worthwhile to qualify the RESOLVE optimization benefit
> >> before removing it.
> >
> > I don't think there is any, because every type will be visited exactly
> > once, due to DFS nature of algorithm. The only difference is that if
> > we have a long chain of modifiers, we can technically reach the max
> > limit and fail. But at 32 I think it's pretty unrealistic to have such
> > a long chain of PTR/TYPEDEF/CONST/VOLATILE/RESTRICTs :)
> >
> >>
> >> Another possible change is, for external usage, removing
> >> modifiers, before checking the size, something like below.
> >> Note that I am not strongly advocating my below patch as
> >> it has the same shortcoming that maintained modifier type
> >> size may not be correct.
> >
> > I don't think your patch helps, it can actually confuse things even
> > more. It skips modifiers until underlying type is found, but you still
> > don't guarantee that at that time that underlying type will have its
> > size resolved.
>
> It actually does help. It does not change the internal btf type
> traversal algorithms. It only change the implementation of
> an external API btf_type_id_size(). Previously, this function
> is used by externals and internal btf.c. I broke it into two,
> one internal __btf_type_id_size(), and another external
> btf_type_id_size(). The external one removes modifier before
> finding type size. The external one is typically used only
> after btf is validated.

Sure, for external callers yes, it solves the problem. But there is
deeper problem: we mark modifier types RESOLVED before types they
ultimately point to are resolved. Then in all those btf_xxx_resolve()
functions we have check:

if (!env_type_is_resolve_sink && !env_type_is_resolved)
  return env_stack_push();
else {

  /* here we assume that we can calculate size of the type */
  /* so even if we traverse through all the modifiers and find
underlying type */
  /* that type will have resolved_size = 0, because we haven't
processed it yet */
  /* but we will just incorrectly assume that zero is *final* size */
}

So I think that your patch is still just hiding the problem, not solving it.

BTW, I've also identified part of btf_ptr_resolve() logic that can be
now safely removed (it's a special case that "restarts" DFS traversal
for modifiers, because they could have been prematurely marked
resolved). This is another sign that there is something wrong in an
algorithm.

I'd rather remove unnecessary complexity and fix underlying problem,
especially given that there is no performance or correctness penalty.

I'll post v2 soon.

>
> Will go through your other comments later.
>
> >
> >>
> >> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> >> index 546ebee39e2a..6f927c3e0a89 100644
> >> --- a/kernel/bpf/btf.c
> >> +++ b/kernel/bpf/btf.c
> >> @@ -620,6 +620,54 @@ static bool btf_type_int_is_regular(const struct
> >> btf_type *t)
> >>           return true;
> >>    }
> >>
> >> +static const struct btf_type *__btf_type_id_size(const struct btf *btf,
> >> +                                                u32 *type_id, u32
> >> *ret_size,
> >> +                                                bool skip_modifier)
> >> +{
> >> +       const struct btf_type *size_type;
> >> +       u32 size_type_id = *type_id;
> >> +       u32 size = 0;
> >> +
> >> +       size_type = btf_type_by_id(btf, size_type_id);
> >> +       if (size_type && skip_modifier) {
> >> +               while (btf_type_is_modifier(size_type))
> >> +                       size_type = btf_type_by_id(btf, size_type->type);
> >> +       }
> >> +
> >> +       if (btf_type_nosize_or_null(size_type))
> >> +               return NULL;
> >> +
> >> +       if (btf_type_has_size(size_type)) {
> >> +               size = size_type->size;
> >> +       } else if (btf_type_is_array(size_type)) {
> >> +               size = btf->resolved_sizes[size_type_id];
> >> +       } else if (btf_type_is_ptr(size_type)) {
> >> +               size = sizeof(void *);
> >> +       } else {
> >> +               if (WARN_ON_ONCE(!btf_type_is_modifier(size_type) &&
> >> +                                !btf_type_is_var(size_type)))
> >> +                       return NULL;
> >> +
> >> +               size = btf->resolved_sizes[size_type_id];
> >> +               size_type_id = btf->resolved_ids[size_type_id];
> >> +               size_type = btf_type_by_id(btf, size_type_id);
> >> +               if (btf_type_nosize_or_null(size_type))
> >> +                       return NULL;
> >> +       }
> >> +
> >> +       *type_id = size_type_id;
> >> +       if (ret_size)
> >> +               *ret_size = size;
> >> +
> >> +       return size_type;
> >> +}
> >> +
> [...]

^ permalink raw reply

* RE: [PATCH v3 0/3] kernel/notifier.c: avoid duplicate registration
From: Nixiaoming @ 2019-07-11  1:32 UTC (permalink / raw)
  To: Greg KH
  Cc: adobriyan@gmail.com, akpm@linux-foundation.org,
	anna.schumaker@netapp.com, arjan@linux.intel.com,
	bfields@fieldses.org, chuck.lever@oracle.com, davem@davemloft.net,
	jlayton@kernel.org, luto@kernel.org, mingo@kernel.org,
	Nadia.Derbey@bull.net, paulmck@linux.vnet.ibm.com,
	semen.protsenko@linaro.org, stable@kernel.org,
	stern@rowland.harvard.edu, tglx@linutronix.de,
	torvalds@linux-foundation.org, trond.myklebust@hammerspace.com,
	viresh.kumar@linaro.org, vvs@virtuozzo.com, Huangjianhui (Alex),
	Dailei, linux-kernel@vger.kernel.org, linux-nfs@vger.kernel.org,
	netdev@vger.kernel.org
In-Reply-To: <20190710055628.GB5778@kroah.com>

On Wed, July 10, 2019 1:56 PM Greg KH wrote:
>On Wed, Jul 10, 2019 at 11:09:07AM +0800, Xiaoming Ni wrote:
>> Registering the same notifier to a hook repeatedly can cause the hook
>> list to form a ring or lose other members of the list.
>
>Then don't do that :)
>

Duplicate registration is checked and exited in notifier_chain_cond_register()

Duplicate registration was checked in notifier_chain_register() but only 
the alarm was triggered without exiting. added by commit 831246570d34692e 
("kernel/notifier.c: double register detection")

This patch is similar to commit 8312465 and notifier_chain_cond_register(),
 with actual prevention for such behaviour,  which I think is necessary to 
 avoid the formation of a linked list ring.

>Is there any in-kernel users that do do this?  If so, please just fix
>them.
>
Notifier_chain_register() is not a hotspot path.
Adding a check here can make the kernel more stable.

Thanks

Xiaoming Ni


>thanks,
>
>greg k-h
>

^ permalink raw reply

* Re: [bpf-next v3 05/12] selftests/bpf: Allow passing more information to BPF prog test run
From: Andrii Nakryiko @ 2019-07-11  1:17 UTC (permalink / raw)
  To: Krzesimir Nowak
  Cc: open list, Alban Crequy, Iago López Galeiras,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, David S. Miller, Jakub Kicinski,
	Jesper Dangaard Brouer, John Fastabend, Stanislav Fomichev,
	Networking, bpf, xdp-newbies
In-Reply-To: <20190708163121.18477-6-krzesimir@kinvolk.io>

On Mon, Jul 8, 2019 at 3:42 PM Krzesimir Nowak <krzesimir@kinvolk.io> wrote:
>
> The test case can now specify a custom length of the data member,
> context data and its length, which will be passed to
> bpf_prog_test_run_xattr. For backward compatilibity, if the data
> length is 0 (which is what will happen when the field is left
> unspecified in the designated initializer of a struct), then the
> length passed to the bpf_prog_test_run_xattr is TEST_DATA_LEN.
>
> Also for backward compatilibity, if context data length is 0, NULL is
> passed as a context to bpf_prog_test_run_xattr. This is to avoid
> breaking other tests, where context data being NULL and context data
> length being 0 is handled differently from the case where context data
> is not NULL and context data length is 0.
>
> Custom lengths still can't be greater than hardcoded 64 bytes for data
> and 192 for context data.
>
> 192 for context data was picked to allow passing struct
> bpf_perf_event_data as a context for perf event programs. The struct
> is quite large, because it contains struct pt_regs.
>
> Test runs for perf event programs will not allow the copying the data
> back to data_out buffer, so they require data_out_size to be zero and
> data_out to be NULL. Since test_verifier hardcodes it, make it
> possible to override the size. Overriding the size to zero will cause
> the buffer to be NULL.
>
> Changes since v2:
> - Allow overriding the data out size and buffer.
>
> Signed-off-by: Krzesimir Nowak <krzesimir@kinvolk.io>
> ---
>  tools/testing/selftests/bpf/test_verifier.c | 105 +++++++++++++++++---
>  1 file changed, 93 insertions(+), 12 deletions(-)
>
> diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c
> index 1640ba9f12c1..6f124cc4ee34 100644
> --- a/tools/testing/selftests/bpf/test_verifier.c
> +++ b/tools/testing/selftests/bpf/test_verifier.c
> @@ -54,6 +54,7 @@
>  #define MAX_TEST_RUNS  8
>  #define POINTER_VALUE  0xcafe4all
>  #define TEST_DATA_LEN  64
> +#define TEST_CTX_LEN   192
>
>  #define F_NEEDS_EFFICIENT_UNALIGNED_ACCESS     (1 << 0)
>  #define F_LOAD_WITH_STRICT_ALIGNMENT           (1 << 1)
> @@ -96,7 +97,12 @@ struct bpf_test {
>         enum bpf_prog_type prog_type;
>         uint8_t flags;
>         __u8 data[TEST_DATA_LEN];
> +       __u32 data_len;
> +       __u8 ctx[TEST_CTX_LEN];
> +       __u32 ctx_len;
>         void (*fill_helper)(struct bpf_test *self);
> +       bool override_data_out_len;
> +       __u32 overridden_data_out_len;
>         uint8_t runs;
>         struct {
>                 uint32_t retval, retval_unpriv;
> @@ -104,6 +110,9 @@ struct bpf_test {
>                         __u8 data[TEST_DATA_LEN];
>                         __u64 data64[TEST_DATA_LEN / 8];
>                 };
> +               __u32 data_len;
> +               __u8 ctx[TEST_CTX_LEN];
> +               __u32 ctx_len;
>         } retvals[MAX_TEST_RUNS];
>  };
>
> @@ -818,21 +827,35 @@ static int set_admin(bool admin)
>  }
>
>  static int do_prog_test_run(int fd_prog, bool unpriv, uint32_t expected_val,
> -                           void *data, size_t size_data)
> +                           void *data, size_t size_data, void *ctx,
> +                           size_t size_ctx, u32 *overridden_data_out_size)
>  {
> -       __u8 tmp[TEST_DATA_LEN << 2];
> -       __u32 size_tmp = sizeof(tmp);
> -       int saved_errno;
> -       int err;
>         struct bpf_prog_test_run_attr attr = {
>                 .prog_fd = fd_prog,
>                 .repeat = 1,
>                 .data_in = data,
>                 .data_size_in = size_data,
> -               .data_out = tmp,
> -               .data_size_out = size_tmp,
> +               .ctx_in = ctx,
> +               .ctx_size_in = size_ctx,
>         };
> +       __u8 tmp[TEST_DATA_LEN << 2];
> +       __u32 size_tmp = sizeof(tmp);
> +       __u32 size_buf = size_tmp;
> +       __u8 *buf = tmp;
> +       int saved_errno;
> +       int err;
>
> +       if (overridden_data_out_size)
> +               size_buf = *overridden_data_out_size;
> +       if (size_buf > size_tmp) {
> +               printf("FAIL: out data size (%d) greater than a buffer size (%d) ",
> +                      size_buf, size_tmp);
> +               return -EINVAL;
> +       }
> +       if (!size_buf)
> +               buf = NULL;
> +       attr.data_size_out = size_buf;
> +       attr.data_out = buf;
>         if (unpriv)
>                 set_admin(true);
>         err = bpf_prog_test_run_xattr(&attr);
> @@ -956,13 +979,45 @@ static void do_test_single(struct bpf_test *test, bool unpriv,
>         if (!alignment_prevented_execution && fd_prog >= 0) {
>                 uint32_t expected_val;
>                 int i;
> +               __u32 size_data;
> +               __u32 size_ctx;
> +               bool bad_size;
> +               void *ctx;
> +               __u32 *overridden_data_out_size;
>
>                 if (!test->runs) {
> +                       if (test->data_len > 0)
> +                               size_data = test->data_len;
> +                       else
> +                               size_data = sizeof(test->data);
> +                       if (test->override_data_out_len)
> +                               overridden_data_out_size = &test->overridden_data_out_len;
> +                       else
> +                               overridden_data_out_size = NULL;
> +                       size_ctx = test->ctx_len;
> +                       bad_size = false;

I hated all this duplication of logic, which with this patch becomes
even more expansive, so I removed it. Please see [0]. Can you please
apply that patch and add all this new logic only once?

  [0] https://patchwork.ozlabs.org/patch/1130601/

>                         expected_val = unpriv && test->retval_unpriv ?
>                                 test->retval_unpriv : test->retval;
>
> -                       err = do_prog_test_run(fd_prog, unpriv, expected_val,
> -                                              test->data, sizeof(test->data));
> +                       if (size_data > sizeof(test->data)) {
> +                               printf("FAIL: data size (%u) greater than TEST_DATA_LEN (%lu) ", size_data, sizeof(test->data));
> +                               bad_size = true;
> +                       }
> +                       if (size_ctx > sizeof(test->ctx)) {
> +                               printf("FAIL: ctx size (%u) greater than TEST_CTX_LEN (%lu) ", size_ctx, sizeof(test->ctx));

These look like way too long lines, wrap them?

> +                               bad_size = true;
> +                       }
> +                       if (size_ctx)
> +                               ctx = test->ctx;
> +                       else
> +                               ctx = NULL;

nit: single line:

ctx = size_ctx ? test->ctx : NULL;

> +                       if (bad_size)
> +                               err = 1;
> +                       else
> +                               err = do_prog_test_run(fd_prog, unpriv, expected_val,
> +                                                      test->data, size_data,
> +                                                      ctx, size_ctx,
> +                                                      overridden_data_out_size);
>                         if (err)
>                                 run_errs++;
>                         else
> @@ -970,14 +1025,40 @@ static void do_test_single(struct bpf_test *test, bool unpriv,
>                 }
>
>                 for (i = 0; i < test->runs; i++) {
> +                       if (test->retvals[i].data_len > 0)
> +                               size_data = test->retvals[i].data_len;
> +                       else
> +                               size_data = sizeof(test->retvals[i].data);
> +                       if (test->override_data_out_len)
> +                               overridden_data_out_size = &test->overridden_data_out_len;
> +                       else
> +                               overridden_data_out_size = NULL;
> +                       size_ctx = test->retvals[i].ctx_len;
> +                       bad_size = false;
>                         if (unpriv && test->retvals[i].retval_unpriv)
>                                 expected_val = test->retvals[i].retval_unpriv;
>                         else
>                                 expected_val = test->retvals[i].retval;
>
> -                       err = do_prog_test_run(fd_prog, unpriv, expected_val,
> -                                              test->retvals[i].data,
> -                                              sizeof(test->retvals[i].data));
> +                       if (size_data > sizeof(test->retvals[i].data)) {
> +                               printf("FAIL: data size (%u) at run %i greater than TEST_DATA_LEN (%lu) ", size_data, i + 1, sizeof(test->retvals[i].data));
> +                               bad_size = true;
> +                       }
> +                       if (size_ctx > sizeof(test->retvals[i].ctx)) {
> +                               printf("FAIL: ctx size (%u) at run %i greater than TEST_CTX_LEN (%lu) ", size_ctx, i + 1, sizeof(test->retvals[i].ctx));
> +                               bad_size = true;
> +                       }
> +                       if (size_ctx)
> +                               ctx = test->retvals[i].ctx;
> +                       else
> +                               ctx = NULL;
> +                       if (bad_size)
> +                               err = 1;
> +                       else
> +                               err = do_prog_test_run(fd_prog, unpriv, expected_val,
> +                                                      test->retvals[i].data, size_data,
> +                                                      ctx, size_ctx,
> +                                                      overridden_data_out_size);
>                         if (err) {
>                                 printf("(run %d/%d) ", i + 1, test->runs);
>                                 run_errs++;
> --
> 2.20.1
>

^ permalink raw reply

* [PATCH bpf-next] selftests/bpf: remove logic duplication in test_verifier.c
From: Andrii Nakryiko @ 2019-07-11  1:08 UTC (permalink / raw)
  To: andrii.nakryiko, kernel-team, ast, daniel, bpf, netdev
  Cc: Andrii Nakryiko, Krzesimir Nowak

test_verifier tests can specify single- and multi-runs tests. Internally
logic of handling them is duplicated. Get rid of it by making single run
retval specification to be a first retvals spec.

Cc: Krzesimir Nowak <krzesimir@kinvolk.io>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
---
 tools/testing/selftests/bpf/test_verifier.c | 37 ++++++++++-----------
 1 file changed, 18 insertions(+), 19 deletions(-)

diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c
index b0773291012a..120ecdf4a7db 100644
--- a/tools/testing/selftests/bpf/test_verifier.c
+++ b/tools/testing/selftests/bpf/test_verifier.c
@@ -86,7 +86,7 @@ struct bpf_test {
 	int fixup_sk_storage_map[MAX_FIXUPS];
 	const char *errstr;
 	const char *errstr_unpriv;
-	uint32_t retval, retval_unpriv, insn_processed;
+	uint32_t insn_processed;
 	int prog_len;
 	enum {
 		UNDEF,
@@ -95,16 +95,24 @@ struct bpf_test {
 	} result, result_unpriv;
 	enum bpf_prog_type prog_type;
 	uint8_t flags;
-	__u8 data[TEST_DATA_LEN];
 	void (*fill_helper)(struct bpf_test *self);
 	uint8_t runs;
-	struct {
-		uint32_t retval, retval_unpriv;
-		union {
-			__u8 data[TEST_DATA_LEN];
-			__u64 data64[TEST_DATA_LEN / 8];
+	union {
+		struct {
+			uint32_t retval, retval_unpriv;
+			union {
+				__u8 data[TEST_DATA_LEN];
+				__u64 data64[TEST_DATA_LEN / 8];
+			};
 		};
-	} retvals[MAX_TEST_RUNS];
+		struct {
+			uint32_t retval, retval_unpriv;
+			union {
+				__u8 data[TEST_DATA_LEN];
+				__u64 data64[TEST_DATA_LEN / 8];
+			};
+		} retvals[MAX_TEST_RUNS];
+	};
 	enum bpf_attach_type expected_attach_type;
 };
 
@@ -949,17 +957,8 @@ static void do_test_single(struct bpf_test *test, bool unpriv,
 		uint32_t expected_val;
 		int i;
 
-		if (!test->runs) {
-			expected_val = unpriv && test->retval_unpriv ?
-				test->retval_unpriv : test->retval;
-
-			err = do_prog_test_run(fd_prog, unpriv, expected_val,
-					       test->data, sizeof(test->data));
-			if (err)
-				run_errs++;
-			else
-				run_successes++;
-		}
+		if (!test->runs)
+			test->runs = 1;
 
 		for (i = 0; i < test->runs; i++) {
 			if (unpriv && test->retvals[i].retval_unpriv)
-- 
2.17.1


^ permalink raw reply related

* Re: [PATCH bpf] bpf: fix BTF verifier size resolution logic
From: Yonghong Song @ 2019-07-11  0:36 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Andrii Nakryiko, Alexei Starovoitov, daniel@iogearbox.net,
	bpf@vger.kernel.org, netdev@vger.kernel.org, Kernel Team,
	Martin Lau
In-Reply-To: <CAEf4BzaVouFd=3whC1EjhQ9mit62b-C+NhQuW4RiXW02Rq_1Ug@mail.gmail.com>



On 7/10/19 5:29 PM, Andrii Nakryiko wrote:
> On Wed, Jul 10, 2019 at 5:16 PM Yonghong Song <yhs@fb.com> wrote:
>>
>>
>>
>> On 7/10/19 1:08 AM, Andrii Nakryiko wrote:
>>> BTF verifier has Different logic depending on whether we are following
>>> a PTR or STRUCT/ARRAY (or something else). This is an optimization to
>>> stop early in DFS traversal while resolving BTF types. But it also
>>> results in a size resolution bug, when there is a chain, e.g., of PTR ->
>>> TYPEDEF -> ARRAY, in which case due to being in pointer context ARRAY
>>> size won't be resolved, as it is considered to be a sink for pointer,
>>> leading to TYPEDEF being in RESOLVED state with zero size, which is
>>> completely wrong.
>>>
>>> Optimization is doubtful, though, as btf_check_all_types() will iterate
>>> over all BTF types anyways, so the only saving is a potentially slightly
>>> shorter stack. But correctness is more important that tiny savings.
>>>
>>> This bug manifests itself in rejecting BTF-defined maps that use array
>>> typedef as a value type:
>>>
>>> typedef int array_t[16];
>>>
>>> struct {
>>>        __uint(type, BPF_MAP_TYPE_ARRAY);
>>>        __type(value, array_t); /* i.e., array_t *value; */
>>> } test_map SEC(".maps");
>>>
>>> Fixes: eb3f595dab40 ("bpf: btf: Validate type reference")
>>> Cc: Martin KaFai Lau <kafai@fb.com>
>>> Signed-off-by: Andrii Nakryiko <andriin@fb.com>
>>
>> The change seems okay to me. Currently, looks like intermediate
>> modifier type will carry size = 0 (in the internal data structure).
> 
> Yes, which is totally wrong, especially that we use that size in some
> cases to reject map with specified BTF.
> 
>>
>> If we remove RESOLVE logic, we probably want to double check
>> whether we handle circular types correctly or not. Maybe we will
>> be okay if all self tests pass.
> 
> I checked, it does. We'll attempt to add referenced type unless it's a
> "resolve sink" (where size is immediately known) or is already
> resolved (it's state is RESOLVED). In other cases, we'll attempt to
> env_stack_push(), which check that the state of that type is
> NOT_VISITED. If it's RESOLVED or VISITED, it returns -EEXISTS. When
> type is added into the stack, it's resolve state goes from NOT_VISITED
> to VISITED.
> 
> So, if there is a loop, then we'll detect it as soon as we'll attempt
> to add the same type onto the stack second time.
> 
>>
>> I may still be worthwhile to qualify the RESOLVE optimization benefit
>> before removing it.
> 
> I don't think there is any, because every type will be visited exactly
> once, due to DFS nature of algorithm. The only difference is that if
> we have a long chain of modifiers, we can technically reach the max
> limit and fail. But at 32 I think it's pretty unrealistic to have such
> a long chain of PTR/TYPEDEF/CONST/VOLATILE/RESTRICTs :)
> 
>>
>> Another possible change is, for external usage, removing
>> modifiers, before checking the size, something like below.
>> Note that I am not strongly advocating my below patch as
>> it has the same shortcoming that maintained modifier type
>> size may not be correct.
> 
> I don't think your patch helps, it can actually confuse things even
> more. It skips modifiers until underlying type is found, but you still
> don't guarantee that at that time that underlying type will have its
> size resolved.

It actually does help. It does not change the internal btf type
traversal algorithms. It only change the implementation of
an external API btf_type_id_size(). Previously, this function
is used by externals and internal btf.c. I broke it into two,
one internal __btf_type_id_size(), and another external
btf_type_id_size(). The external one removes modifier before
finding type size. The external one is typically used only
after btf is validated.

Will go through your other comments later.

> 
>>
>> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
>> index 546ebee39e2a..6f927c3e0a89 100644
>> --- a/kernel/bpf/btf.c
>> +++ b/kernel/bpf/btf.c
>> @@ -620,6 +620,54 @@ static bool btf_type_int_is_regular(const struct
>> btf_type *t)
>>           return true;
>>    }
>>
>> +static const struct btf_type *__btf_type_id_size(const struct btf *btf,
>> +                                                u32 *type_id, u32
>> *ret_size,
>> +                                                bool skip_modifier)
>> +{
>> +       const struct btf_type *size_type;
>> +       u32 size_type_id = *type_id;
>> +       u32 size = 0;
>> +
>> +       size_type = btf_type_by_id(btf, size_type_id);
>> +       if (size_type && skip_modifier) {
>> +               while (btf_type_is_modifier(size_type))
>> +                       size_type = btf_type_by_id(btf, size_type->type);
>> +       }
>> +
>> +       if (btf_type_nosize_or_null(size_type))
>> +               return NULL;
>> +
>> +       if (btf_type_has_size(size_type)) {
>> +               size = size_type->size;
>> +       } else if (btf_type_is_array(size_type)) {
>> +               size = btf->resolved_sizes[size_type_id];
>> +       } else if (btf_type_is_ptr(size_type)) {
>> +               size = sizeof(void *);
>> +       } else {
>> +               if (WARN_ON_ONCE(!btf_type_is_modifier(size_type) &&
>> +                                !btf_type_is_var(size_type)))
>> +                       return NULL;
>> +
>> +               size = btf->resolved_sizes[size_type_id];
>> +               size_type_id = btf->resolved_ids[size_type_id];
>> +               size_type = btf_type_by_id(btf, size_type_id);
>> +               if (btf_type_nosize_or_null(size_type))
>> +                       return NULL;
>> +       }
>> +
>> +       *type_id = size_type_id;
>> +       if (ret_size)
>> +               *ret_size = size;
>> +
>> +       return size_type;
>> +}
>> +
[...]

^ permalink raw reply

* Re: [PATCH net-next] net/mlx5e: Provide cb_list pointer when setting up tc block on rep
From: wenxu @ 2019-07-11  0:35 UTC (permalink / raw)
  To: Vlad Buslov, netdev; +Cc: jhs, xiyou.wangcong, jiri, davem, pablo, saeedm
In-Reply-To: <20190710182554.2988-1-vladbu@mellanox.com>


在 2019/7/11 2:25, Vlad Buslov 写道:
> Recent refactoring of tc block offloads infrastructure introduced new
> flow_block_cb_setup_simple() method intended to be used as unified way for
> all drivers to register offload callbacks. However, commit that actually
> extended all users (drivers) with block cb list and provided it to
> flow_block infra missed mlx5 en_rep. This leads to following NULL-pointer
> dereference when creating Qdisc:
>
> [  278.385175] BUG: kernel NULL pointer dereference, address: 0000000000000000
> [  278.393233] #PF: supervisor read access in kernel mode
> [  278.399446] #PF: error_code(0x0000) - not-present page
> [  278.405847] PGD 8000000850e73067 P4D 8000000850e73067 PUD 8620cd067 PMD 0
> [  278.414141] Oops: 0000 [#1] SMP PTI
> [  278.419019] CPU: 7 PID: 3369 Comm: tc Not tainted 5.2.0-rc6+ #492
> [  278.426580] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017
> [  278.435853] RIP: 0010:flow_block_cb_setup_simple+0xc4/0x190
> [  278.442953] Code: 10 48 89 42 08 48 89 10 48 b8 00 01 00 00 00 00 ad de 49 89 00 48 05 00 01 00 00 49 89 40 08 31 c0 c3 b8 a1 ff ff ff c3 f3 c3 <48> 8b 06 48 39 c6 75 0a eb 1a 48 8b 00 48 39 c6 74 12
>  48 3b 50 28
> [  278.464829] RSP: 0018:ffffaf07c3f97990 EFLAGS: 00010246
> [  278.471648] RAX: 0000000000000000 RBX: ffff9b43ed4c7680 RCX: ffff9b43d5f80840
> [  278.480408] RDX: ffffffffc0491650 RSI: 0000000000000000 RDI: ffffaf07c3f97998
> [  278.489110] RBP: ffff9b43ddff9000 R08: ffff9b43d5f80840 R09: 0000000000000001
> [  278.497838] R10: 0000000000000009 R11: 00000000000003ad R12: ffffaf07c3f97c08
> [  278.506595] R13: ffff9b43d5f80000 R14: ffff9b43ed4c7680 R15: ffff9b43dfa20b40
> [  278.515374] FS:  00007f796be1b400(0000) GS:ffff9b43ef840000(0000) knlGS:0000000000000000
> [  278.525099] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  278.532453] CR2: 0000000000000000 CR3: 0000000840398002 CR4: 00000000001606e0
> [  278.541197] Call Trace:
> [  278.545252]  tcf_block_offload_cmd.isra.52+0x7e/0xb0
> [  278.551871]  tcf_block_get_ext+0x365/0x3e0
> [  278.557569]  qdisc_create+0x15c/0x4e0
> [  278.562859]  ? kmem_cache_alloc_trace+0x1a2/0x1c0
> [  278.569235]  tc_modify_qdisc+0x1c8/0x780
> [  278.574761]  rtnetlink_rcv_msg+0x291/0x340
> [  278.580518]  ? _cond_resched+0x15/0x40
> [  278.585856]  ? rtnl_calcit.isra.29+0x120/0x120
> [  278.591868]  netlink_rcv_skb+0x4a/0x110
> [  278.597198]  netlink_unicast+0x1a0/0x250
> [  278.602601]  netlink_sendmsg+0x2c1/0x3c0
> [  278.608022]  sock_sendmsg+0x5b/0x60
> [  278.612969]  ___sys_sendmsg+0x289/0x310
> [  278.618231]  ? do_wp_page+0x99/0x730
> [  278.623216]  ? page_add_new_anon_rmap+0xbe/0x140
> [  278.629298]  ? __handle_mm_fault+0xc84/0x1360
> [  278.635113]  ? __sys_sendmsg+0x5e/0xa0
> [  278.640285]  __sys_sendmsg+0x5e/0xa0
> [  278.645239]  do_syscall_64+0x5b/0x1b0
> [  278.650274]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [  278.656697] RIP: 0033:0x7f796abdeb87
> [  278.661628] Code: 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 80 00 00 00 00 8b 05 6a 2b 2c 00 48 63 d2 48 63 ff 85 c0 75 18 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 59 f3 c3 0f 1f 80 00 00 00 00 53
>  48 89 f3 48
> [  278.683248] RSP: 002b:00007ffde213ba48 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
> [  278.692245] RAX: ffffffffffffffda RBX: 000000005d261e6f RCX: 00007f796abdeb87
> [  278.700862] RDX: 0000000000000000 RSI: 00007ffde213bab0 RDI: 0000000000000003
> [  278.709527] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000006
> [  278.718167] R10: 000000000000000c R11: 0000000000000246 R12: 0000000000000001
> [  278.726743] R13: 000000000067b580 R14: 0000000000000000 R15: 0000000000000000
> [  278.735302] Modules linked in: dummy vxlan ip6_udp_tunnel udp_tunnel sch_ingress nfsv3 nfs_acl nfs lockd grace fscache bridge stp llc sunrpc mlx5_ib ib_uverbs intel_rapl ib_core sb_edac x86_pkg_temp_
> thermal intel_powerclamp coretemp kvm_intel kvm mlx5_core irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel igb ghash_clmulni_intel ses mei_me enclosure mlxfw ipmi_ssif intel_cstate iTCO_wdt ptp mei
> pps_core iTCO_vendor_support pcspkr joydev intel_uncore i2c_i801 ipmi_si lpc_ich intel_rapl_perf ioatdma wmi dca pcc_cpufreq ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad ast i2c_algo_bit drm_k
> ms_helper ttm drm mpt3sas raid_class scsi_transport_sas
> [  278.802263] CR2: 0000000000000000
> [  278.807170] ---[ end trace b1f0a442a279e66f ]---
>
> Extend en_rep with new static mlx5e_rep_block_cb_list list and pass it to
> flow_block_cb_setup_simple() function instead of hardcoded NULL pointer.
>
> Fixes: 955bcb6ea0df ("drivers: net: use flow block API")
> Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/en_rep.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
> index 10ef90a7bddd..7245d287633d 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
> @@ -1175,6 +1175,8 @@ static int mlx5e_rep_setup_tc_cb(enum tc_setup_type type, void *type_data,
>  	}
>  }
>  
> +static LIST_HEAD(mlx5e_rep_block_cb_list);
> +

I think it is not necessary needs a extra LIST_HEAD, the early mlx5e_block_cb_list is ok

The early patch  http://patchwork.ozlabs.org/patch/1130439/ is enough.

>  static int mlx5e_rep_setup_tc(struct net_device *dev, enum tc_setup_type type,
>  			      void *type_data)
>  {
> @@ -1182,7 +1184,8 @@ static int mlx5e_rep_setup_tc(struct net_device *dev, enum tc_setup_type type,
>  
>  	switch (type) {
>  	case TC_SETUP_BLOCK:
> -		return flow_block_cb_setup_simple(type_data, NULL,
> +		return flow_block_cb_setup_simple(type_data,
> +						  &mlx5e_rep_block_cb_list,
>  						  mlx5e_rep_setup_tc_cb,
>  						  priv, priv, true);
>  	default:

^ permalink raw reply

* Re: [PATCH bpf] bpf: fix BTF verifier size resolution logic
From: Andrii Nakryiko @ 2019-07-11  0:29 UTC (permalink / raw)
  To: Yonghong Song
  Cc: Andrii Nakryiko, Alexei Starovoitov, daniel@iogearbox.net,
	bpf@vger.kernel.org, netdev@vger.kernel.org, Kernel Team,
	Martin Lau
In-Reply-To: <f6bc7a95-e8e1-eec4-9728-3b9e36b434fa@fb.com>

On Wed, Jul 10, 2019 at 5:16 PM Yonghong Song <yhs@fb.com> wrote:
>
>
>
> On 7/10/19 1:08 AM, Andrii Nakryiko wrote:
> > BTF verifier has Different logic depending on whether we are following
> > a PTR or STRUCT/ARRAY (or something else). This is an optimization to
> > stop early in DFS traversal while resolving BTF types. But it also
> > results in a size resolution bug, when there is a chain, e.g., of PTR ->
> > TYPEDEF -> ARRAY, in which case due to being in pointer context ARRAY
> > size won't be resolved, as it is considered to be a sink for pointer,
> > leading to TYPEDEF being in RESOLVED state with zero size, which is
> > completely wrong.
> >
> > Optimization is doubtful, though, as btf_check_all_types() will iterate
> > over all BTF types anyways, so the only saving is a potentially slightly
> > shorter stack. But correctness is more important that tiny savings.
> >
> > This bug manifests itself in rejecting BTF-defined maps that use array
> > typedef as a value type:
> >
> > typedef int array_t[16];
> >
> > struct {
> >       __uint(type, BPF_MAP_TYPE_ARRAY);
> >       __type(value, array_t); /* i.e., array_t *value; */
> > } test_map SEC(".maps");
> >
> > Fixes: eb3f595dab40 ("bpf: btf: Validate type reference")
> > Cc: Martin KaFai Lau <kafai@fb.com>
> > Signed-off-by: Andrii Nakryiko <andriin@fb.com>
>
> The change seems okay to me. Currently, looks like intermediate
> modifier type will carry size = 0 (in the internal data structure).

Yes, which is totally wrong, especially that we use that size in some
cases to reject map with specified BTF.

>
> If we remove RESOLVE logic, we probably want to double check
> whether we handle circular types correctly or not. Maybe we will
> be okay if all self tests pass.

I checked, it does. We'll attempt to add referenced type unless it's a
"resolve sink" (where size is immediately known) or is already
resolved (it's state is RESOLVED). In other cases, we'll attempt to
env_stack_push(), which check that the state of that type is
NOT_VISITED. If it's RESOLVED or VISITED, it returns -EEXISTS. When
type is added into the stack, it's resolve state goes from NOT_VISITED
to VISITED.

So, if there is a loop, then we'll detect it as soon as we'll attempt
to add the same type onto the stack second time.

>
> I may still be worthwhile to qualify the RESOLVE optimization benefit
> before removing it.

I don't think there is any, because every type will be visited exactly
once, due to DFS nature of algorithm. The only difference is that if
we have a long chain of modifiers, we can technically reach the max
limit and fail. But at 32 I think it's pretty unrealistic to have such
a long chain of PTR/TYPEDEF/CONST/VOLATILE/RESTRICTs :)

>
> Another possible change is, for external usage, removing
> modifiers, before checking the size, something like below.
> Note that I am not strongly advocating my below patch as
> it has the same shortcoming that maintained modifier type
> size may not be correct.

I don't think your patch helps, it can actually confuse things even
more. It skips modifiers until underlying type is found, but you still
don't guarantee that at that time that underlying type will have its
size resolved.

>
> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> index 546ebee39e2a..6f927c3e0a89 100644
> --- a/kernel/bpf/btf.c
> +++ b/kernel/bpf/btf.c
> @@ -620,6 +620,54 @@ static bool btf_type_int_is_regular(const struct
> btf_type *t)
>          return true;
>   }
>
> +static const struct btf_type *__btf_type_id_size(const struct btf *btf,
> +                                                u32 *type_id, u32
> *ret_size,
> +                                                bool skip_modifier)
> +{
> +       const struct btf_type *size_type;
> +       u32 size_type_id = *type_id;
> +       u32 size = 0;
> +
> +       size_type = btf_type_by_id(btf, size_type_id);
> +       if (size_type && skip_modifier) {
> +               while (btf_type_is_modifier(size_type))
> +                       size_type = btf_type_by_id(btf, size_type->type);
> +       }
> +
> +       if (btf_type_nosize_or_null(size_type))
> +               return NULL;
> +
> +       if (btf_type_has_size(size_type)) {
> +               size = size_type->size;
> +       } else if (btf_type_is_array(size_type)) {
> +               size = btf->resolved_sizes[size_type_id];
> +       } else if (btf_type_is_ptr(size_type)) {
> +               size = sizeof(void *);
> +       } else {
> +               if (WARN_ON_ONCE(!btf_type_is_modifier(size_type) &&
> +                                !btf_type_is_var(size_type)))
> +                       return NULL;
> +
> +               size = btf->resolved_sizes[size_type_id];
> +               size_type_id = btf->resolved_ids[size_type_id];
> +               size_type = btf_type_by_id(btf, size_type_id);
> +               if (btf_type_nosize_or_null(size_type))
> +                       return NULL;
> +       }
> +
> +       *type_id = size_type_id;
> +       if (ret_size)
> +               *ret_size = size;
> +
> +       return size_type;
> +}
> +
> +const struct btf_type *btf_type_id_size(const struct btf *btf,
> +                                       u32 *type_id, u32 *ret_size)
> +{
> +       return __btf_type_id_size(btf, type_id, ret_size, true);
> +}
> +
>   /*
>    * Check that given struct member is a regular int with expected
>    * offset and size.
> @@ -633,7 +681,7 @@ bool btf_member_is_reg_int(const struct btf *btf,
> const struct btf_type *s,
>          u8 nr_bits;
>
>          id = m->type;
> -       t = btf_type_id_size(btf, &id, NULL);
> +       t = __btf_type_id_size(btf, &id, NULL, false);
>          if (!t || !btf_type_is_int(t))
>                  return false;
>
> @@ -1051,42 +1099,6 @@ static const struct btf_type
> *btf_type_id_resolve(const struct btf *btf,
>          return btf_type_by_id(btf, *type_id);
>   }
>
> -const struct btf_type *btf_type_id_size(const struct btf *btf,
> -                                       u32 *type_id, u32 *ret_size)
> -{
> -       const struct btf_type *size_type;
> -       u32 size_type_id = *type_id;
> -       u32 size = 0;
> -
> -       size_type = btf_type_by_id(btf, size_type_id);
> -       if (btf_type_nosize_or_null(size_type))
> -               return NULL;
> -
> -       if (btf_type_has_size(size_type)) {
> -               size = size_type->size;
> -       } else if (btf_type_is_array(size_type)) {
> -               size = btf->resolved_sizes[size_type_id];
> -       } else if (btf_type_is_ptr(size_type)) {
> -               size = sizeof(void *);
> -       } else {
> -               if (WARN_ON_ONCE(!btf_type_is_modifier(size_type) &&
> -                                !btf_type_is_var(size_type)))
> -                       return NULL;
> -
> -               size = btf->resolved_sizes[size_type_id];
> -               size_type_id = btf->resolved_ids[size_type_id];
> -               size_type = btf_type_by_id(btf, size_type_id);
> -               if (btf_type_nosize_or_null(size_type))
> -                       return NULL;
> -       }
> -
> -       *type_id = size_type_id;
> -       if (ret_size)
> -               *ret_size = size;
> -
> -       return size_type;
> -}
> -
>   static int btf_df_check_member(struct btf_verifier_env *env,
>                                 const struct btf_type *struct_type,
>                                 const struct btf_member *member,
> @@ -1489,7 +1501,7 @@ static int btf_modifier_check_member(struct
> btf_verifier_env *env,
>          struct btf_member resolved_member;
>          struct btf *btf = env->btf;
>
> -       resolved_type = btf_type_id_size(btf, &resolved_type_id, NULL);
> +       resolved_type = __btf_type_id_size(btf, &resolved_type_id, NULL,
> false);
>          if (!resolved_type) {
>                  btf_verifier_log_member(env, struct_type, member,
>                                          "Invalid member");
> @@ -1514,7 +1526,7 @@ static int btf_modifier_check_kflag_member(struct
> btf_verifier_env *env,
>          struct btf_member resolved_member;
>          struct btf *btf = env->btf;
>
> -       resolved_type = btf_type_id_size(btf, &resolved_type_id, NULL);
> +       resolved_type = __btf_type_id_size(btf, &resolved_type_id, NULL,
> false);
>          if (!resolved_type) {
>                  btf_verifier_log_member(env, struct_type, member,
>                                          "Invalid member");
> @@ -1620,7 +1632,7 @@ static int btf_modifier_resolve(struct
> btf_verifier_env *env,
>           * save us a few type-following when we use it later (e.g. in
>           * pretty print).
>           */
> -       if (!btf_type_id_size(btf, &next_type_id, &next_type_size)) {
> +       if (!__btf_type_id_size(btf, &next_type_id, &next_type_size,
> false)) {
>                  if (env_type_is_resolved(env, next_type_id))
>                          next_type = btf_type_id_resolve(btf,
> &next_type_id);
>
> @@ -1675,7 +1687,7 @@ static int btf_var_resolve(struct btf_verifier_env
> *env,
>           * forward types or similar that would resolve to size of
>           * zero is allowed.
>           */
> -       if (!btf_type_id_size(btf, &next_type_id, &next_type_size)) {
> +       if (!__btf_type_id_size(btf, &next_type_id, &next_type_size,
> false)) {
>                  btf_verifier_log_type(env, v->t, "Invalid type_id");
>                  return -EINVAL;
>          }
> @@ -1725,7 +1737,7 @@ static int btf_ptr_resolve(struct btf_verifier_env
> *env,
>                                                resolved_type_id);
>          }
>
> -       if (!btf_type_id_size(btf, &next_type_id, NULL)) {
> +       if (!__btf_type_id_size(btf, &next_type_id, NULL, false)) {
>                  if (env_type_is_resolved(env, next_type_id))
>                          next_type = btf_type_id_resolve(btf,
> &next_type_id);
>
> @@ -1851,7 +1863,7 @@ static int btf_array_check_member(struct
> btf_verifier_env *env,
>          }
>
>          array_type_id = member->type;
> -       btf_type_id_size(btf, &array_type_id, &array_size);
> +       __btf_type_id_size(btf, &array_type_id, &array_size, false);
>          struct_size = struct_type->size;
>          bytes_offset = BITS_ROUNDDOWN_BYTES(struct_bits_off);
>          if (struct_size - bytes_offset < array_size) {
> @@ -1938,7 +1950,7 @@ static int btf_array_resolve(struct
> btf_verifier_env *env,
>              !env_type_is_resolved(env, index_type_id))
>                  return env_stack_push(env, index_type, index_type_id);
>
> -       index_type = btf_type_id_size(btf, &index_type_id, NULL);
> +       index_type = __btf_type_id_size(btf, &index_type_id, NULL, false);
>          if (!index_type || !btf_type_is_int(index_type) ||
>              !btf_type_int_is_regular(index_type)) {
>                  btf_verifier_log_type(env, v->t, "Invalid index");
> @@ -1959,7 +1971,7 @@ static int btf_array_resolve(struct
> btf_verifier_env *env,
>              !env_type_is_resolved(env, elem_type_id))
>                  return env_stack_push(env, elem_type, elem_type_id);
>
> -       elem_type = btf_type_id_size(btf, &elem_type_id, &elem_size);
> +       elem_type = __btf_type_id_size(btf, &elem_type_id, &elem_size,
> false);
>          if (!elem_type) {
>                  btf_verifier_log_type(env, v->t, "Invalid elem");
>                  return -EINVAL;
> @@ -2000,7 +2012,7 @@ static void btf_array_seq_show(const struct btf
> *btf, const struct btf_type *t,
>          u32 i, elem_size, elem_type_id;
>
>          elem_type_id = array->type;
> -       elem_type = btf_type_id_size(btf, &elem_type_id, &elem_size);
> +       elem_type = __btf_type_id_size(btf, &elem_type_id, &elem_size,
> false);
>          elem_ops = btf_type_ops(elem_type);
>          seq_puts(m, "[");
>          for (i = 0; i < array->nelems; i++) {
> @@ -2732,7 +2744,7 @@ static int btf_datasec_resolve(struct
> btf_verifier_env *env,
>                  }
>
>                  type_id = var_type->type;
> -               if (!btf_type_id_size(btf, &type_id, &type_size)) {
> +               if (!__btf_type_id_size(btf, &type_id, &type_size, false)) {
>                          btf_verifier_log_vsi(env, v->t, vsi, "Invalid
> type");
>                          return -EINVAL;
>                  }
> @@ -2813,7 +2825,7 @@ static int btf_func_proto_check(struct
> btf_verifier_env *env,
>                  }
>
>                  /* Ensure the return type is a type that has a size */
> -               if (!btf_type_id_size(btf, &ret_type_id, NULL)) {
> +               if (!__btf_type_id_size(btf, &ret_type_id, NULL, false)) {
>                          btf_verifier_log_type(env, t, "Invalid return
> type");
>                          return -EINVAL;
>                  }
> @@ -2861,7 +2873,7 @@ static int btf_func_proto_check(struct
> btf_verifier_env *env,
>                                  break;
>                  }
>
> -               if (!btf_type_id_size(btf, &arg_type_id, NULL)) {
> +               if (!__btf_type_id_size(btf, &arg_type_id, NULL, false)) {
>                          btf_verifier_log_type(env, t, "Invalid arg#%u",
> i + 1);
>                          err = -EINVAL;
>                          break;
> @@ -3014,7 +3026,7 @@ static bool btf_resolve_valid(struct
> btf_verifier_env *env,
>                  u32 elem_type_id = array->type;
>                  u32 elem_size;
>
> -               elem_type = btf_type_id_size(btf, &elem_type_id,
> &elem_size);
> +               elem_type = __btf_type_id_size(btf, &elem_type_id,
> &elem_size, false);
>                  return elem_type && !btf_type_is_modifier(elem_type) &&
>                          (array->nelems * elem_size ==
>                           btf->resolved_sizes[type_id]);
>
>
> > ---
> >   kernel/bpf/btf.c | 42 +++---------------------------------------
> >   1 file changed, 3 insertions(+), 39 deletions(-)
> >
> > diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> > index cad09858a5f2..c68c7e73b0d1 100644
> > --- a/kernel/bpf/btf.c
> > +++ b/kernel/bpf/btf.c
> > @@ -231,14 +231,6 @@ enum visit_state {
> >       RESOLVED,
> >   };
> >
> > -enum resolve_mode {
> > -     RESOLVE_TBD,    /* To Be Determined */
> > -     RESOLVE_PTR,    /* Resolving for Pointer */
> > -     RESOLVE_STRUCT_OR_ARRAY,        /* Resolving for struct/union
> > -                                      * or array
> > -                                      */
> > -};
> > -
> >   #define MAX_RESOLVE_DEPTH 32
> >
> >   struct btf_sec_info {
> > @@ -254,7 +246,6 @@ struct btf_verifier_env {
> >       u32 log_type_id;
> >       u32 top_stack;
> >       enum verifier_phase phase;
> > -     enum resolve_mode resolve_mode;
> >   };
> >
> >   static const char * const btf_kind_str[NR_BTF_KINDS] = {
> > @@ -964,26 +955,7 @@ static void btf_verifier_env_free(struct btf_verifier_env *env)
> >   static bool env_type_is_resolve_sink(const struct btf_verifier_env *env,
> >                                    const struct btf_type *next_type)
> >   {
> > -     switch (env->resolve_mode) {
> > -     case RESOLVE_TBD:
> > -             /* int, enum or void is a sink */
> > -             return !btf_type_needs_resolve(next_type);
> > -     case RESOLVE_PTR:
> > -             /* int, enum, void, struct, array, func or func_proto is a sink
> > -              * for ptr
> > -              */
> > -             return !btf_type_is_modifier(next_type) &&
> > -                     !btf_type_is_ptr(next_type);
> > -     case RESOLVE_STRUCT_OR_ARRAY:
> > -             /* int, enum, void, ptr, func or func_proto is a sink
> > -              * for struct and array
> > -              */
> > -             return !btf_type_is_modifier(next_type) &&
> > -                     !btf_type_is_array(next_type) &&
> > -                     !btf_type_is_struct(next_type);
> > -     default:
> > -             BUG();
> > -     }
> > +     return !btf_type_needs_resolve(next_type);
> >   }
> >
> >   static bool env_type_is_resolved(const struct btf_verifier_env *env,
> > @@ -1010,13 +982,6 @@ static int env_stack_push(struct btf_verifier_env *env,
> >       v->type_id = type_id;
> >       v->next_member = 0;
> >
> > -     if (env->resolve_mode == RESOLVE_TBD) {
> > -             if (btf_type_is_ptr(t))
> > -                     env->resolve_mode = RESOLVE_PTR;
> > -             else if (btf_type_is_struct(t) || btf_type_is_array(t))
> > -                     env->resolve_mode = RESOLVE_STRUCT_OR_ARRAY;
> > -     }
> > -
> >       return 0;
> >   }
> >
> > @@ -1038,7 +1003,7 @@ static void env_stack_pop_resolved(struct btf_verifier_env *env,
> >       env->visit_states[type_id] = RESOLVED;
> >   }
> >
> > -static const struct resolve_vertex *env_stack_peak(struct btf_verifier_env *env)
> > +static const struct resolve_vertex *env_stack_peek(struct btf_verifier_env *env)
> >   {
> >       return env->top_stack ? &env->stack[env->top_stack - 1] : NULL;
> >   }
> > @@ -3030,9 +2995,8 @@ static int btf_resolve(struct btf_verifier_env *env,
> >       const struct resolve_vertex *v;
> >       int err = 0;
> >
> > -     env->resolve_mode = RESOLVE_TBD;
> >       env_stack_push(env, t, type_id);
> > -     while (!err && (v = env_stack_peak(env))) {
> > +     while (!err && (v = env_stack_peek(env))) {
> >               env->log_type_id = v->type_id;
> >               err = btf_type_ops(v->t)->resolve(env, v);
> >       }
> >

^ permalink raw reply

* Re: [PATCH] ipvs: remove unnecessary space
From: Joe Perches @ 2019-07-11  0:22 UTC (permalink / raw)
  To: Simon Horman, yangxingwu, Pablo Neira Ayuso
  Cc: wensong, ja, kadlec, fw, davem, netdev, lvs-devel,
	netfilter-devel, coreteam, linux-kernel
In-Reply-To: <20190710080609.smxjqe2d5jyro4hv@verge.net.au>

On Wed, 2019-07-10 at 10:06 +0200, Simon Horman wrote:
> On Wed, Jul 10, 2019 at 03:45:52PM +0800, yangxingwu wrote:
> > this patch removes the extra space.
> > 
> > Signed-off-by: yangxingwu <xingwu.yang@gmail.com>
> 
> Thanks, this looks good to me.
> 
> Acked-by: Simon Horman <horms@verge.net.au>
> 
> Pablo, please consider including this in nf-next.
> 
> 
> > ---
> >  net/netfilter/ipvs/ip_vs_mh.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/net/netfilter/ipvs/ip_vs_mh.c b/net/netfilter/ipvs/ip_vs_mh.c
> > index 94d9d34..98e358e 100644
> > --- a/net/netfilter/ipvs/ip_vs_mh.c
> > +++ b/net/netfilter/ipvs/ip_vs_mh.c
> > @@ -174,8 +174,8 @@ static int ip_vs_mh_populate(struct ip_vs_mh_state *s,
> >  		return 0;
> >  	}
> >  
> > -	table =  kcalloc(BITS_TO_LONGS(IP_VS_MH_TAB_SIZE),
> > -			 sizeof(unsigned long), GFP_KERNEL);
> > +	table =	kcalloc(BITS_TO_LONGS(IP_VS_MH_TAB_SIZE),
> > +			sizeof(unsigned long), GFP_KERNEL);

bitmap_alloc?

> >  	if (!table)
> >  		return -ENOMEM;
> >  
> > -- 
> > 1.8.3.1
> > 


^ permalink raw reply

* Re: [PATCH net-next,v4 12/12] netfilter: nf_tables: add hardware offload support
From: Pablo Neira Ayuso @ 2019-07-11  0:18 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, thomas.lendacky, f.fainelli, ariel.elior,
	michael.chan, madalin.bucur, yisen.zhuang, salil.mehta,
	jeffrey.t.kirsher, tariqt, saeedm, jiri, idosch, jakub.kicinski,
	peppe.cavallaro, grygorii.strashko, andrew, vivien.didelot,
	alexandre.torgue, joabreu, linux-net-drivers, ogerlitz,
	Manish.Chopra, marcelo.leitner, mkubecek, venkatkumar.duvvuru,
	maxime.chevallier, cphealy, phil, netfilter-devel
In-Reply-To: <20190710075227.GA4362@nanopsycho>

On Wed, Jul 10, 2019 at 09:52:27AM +0200, Jiri Pirko wrote:
> Tue, Jul 09, 2019 at 10:55:50PM CEST, pablo@netfilter.org wrote:
> 
> [...]
> 
> >+	if (!dev || !dev->netdev_ops->ndo_setup_tc)
> 
> Why didn't you rename ndo_setup_tc? I put a comment about it in the
> previous version thread. I expect that you can at least write why it is
> a wrong idea.

This is a good idea. It happened that I read this email by when my new
patch series was ready. I will follow up with a patch to address this
rename as soon as the bug fixes are sorted out.

Thanks.

^ permalink raw reply

* Re: [PATCH bpf] bpf: fix BTF verifier size resolution logic
From: Yonghong Song @ 2019-07-11  0:16 UTC (permalink / raw)
  To: Andrii Nakryiko, andrii.nakryiko@gmail.com, Alexei Starovoitov,
	daniel@iogearbox.net, bpf@vger.kernel.org, netdev@vger.kernel.org,
	Kernel Team
  Cc: Martin Lau
In-Reply-To: <20190710080840.2613160-1-andriin@fb.com>



On 7/10/19 1:08 AM, Andrii Nakryiko wrote:
> BTF verifier has Different logic depending on whether we are following
> a PTR or STRUCT/ARRAY (or something else). This is an optimization to
> stop early in DFS traversal while resolving BTF types. But it also
> results in a size resolution bug, when there is a chain, e.g., of PTR ->
> TYPEDEF -> ARRAY, in which case due to being in pointer context ARRAY
> size won't be resolved, as it is considered to be a sink for pointer,
> leading to TYPEDEF being in RESOLVED state with zero size, which is
> completely wrong.
> 
> Optimization is doubtful, though, as btf_check_all_types() will iterate
> over all BTF types anyways, so the only saving is a potentially slightly
> shorter stack. But correctness is more important that tiny savings.
> 
> This bug manifests itself in rejecting BTF-defined maps that use array
> typedef as a value type:
> 
> typedef int array_t[16];
> 
> struct {
> 	__uint(type, BPF_MAP_TYPE_ARRAY);
> 	__type(value, array_t); /* i.e., array_t *value; */
> } test_map SEC(".maps");
> 
> Fixes: eb3f595dab40 ("bpf: btf: Validate type reference")
> Cc: Martin KaFai Lau <kafai@fb.com>
> Signed-off-by: Andrii Nakryiko <andriin@fb.com>

The change seems okay to me. Currently, looks like intermediate
modifier type will carry size = 0 (in the internal data structure).

If we remove RESOLVE logic, we probably want to double check
whether we handle circular types correctly or not. Maybe we will
be okay if all self tests pass.

I may still be worthwhile to qualify the RESOLVE optimization benefit
before removing it.

Another possible change is, for external usage, removing
modifiers, before checking the size, something like below.
Note that I am not strongly advocating my below patch as
it has the same shortcoming that maintained modifier type
size may not be correct.

diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 546ebee39e2a..6f927c3e0a89 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -620,6 +620,54 @@ static bool btf_type_int_is_regular(const struct 
btf_type *t)
         return true;
  }

+static const struct btf_type *__btf_type_id_size(const struct btf *btf,
+                                                u32 *type_id, u32 
*ret_size,
+                                                bool skip_modifier)
+{
+       const struct btf_type *size_type;
+       u32 size_type_id = *type_id;
+       u32 size = 0;
+
+       size_type = btf_type_by_id(btf, size_type_id);
+       if (size_type && skip_modifier) {
+               while (btf_type_is_modifier(size_type))
+                       size_type = btf_type_by_id(btf, size_type->type);
+       }
+
+       if (btf_type_nosize_or_null(size_type))
+               return NULL;
+
+       if (btf_type_has_size(size_type)) {
+               size = size_type->size;
+       } else if (btf_type_is_array(size_type)) {
+               size = btf->resolved_sizes[size_type_id];
+       } else if (btf_type_is_ptr(size_type)) {
+               size = sizeof(void *);
+       } else {
+               if (WARN_ON_ONCE(!btf_type_is_modifier(size_type) &&
+                                !btf_type_is_var(size_type)))
+                       return NULL;
+
+               size = btf->resolved_sizes[size_type_id];
+               size_type_id = btf->resolved_ids[size_type_id];
+               size_type = btf_type_by_id(btf, size_type_id);
+               if (btf_type_nosize_or_null(size_type))
+                       return NULL;
+       }
+
+       *type_id = size_type_id;
+       if (ret_size)
+               *ret_size = size;
+
+       return size_type;
+}
+
+const struct btf_type *btf_type_id_size(const struct btf *btf,
+                                       u32 *type_id, u32 *ret_size)
+{
+       return __btf_type_id_size(btf, type_id, ret_size, true);
+}
+
  /*
   * Check that given struct member is a regular int with expected
   * offset and size.
@@ -633,7 +681,7 @@ bool btf_member_is_reg_int(const struct btf *btf, 
const struct btf_type *s,
         u8 nr_bits;

         id = m->type;
-       t = btf_type_id_size(btf, &id, NULL);
+       t = __btf_type_id_size(btf, &id, NULL, false);
         if (!t || !btf_type_is_int(t))
                 return false;

@@ -1051,42 +1099,6 @@ static const struct btf_type 
*btf_type_id_resolve(const struct btf *btf,
         return btf_type_by_id(btf, *type_id);
  }

-const struct btf_type *btf_type_id_size(const struct btf *btf,
-                                       u32 *type_id, u32 *ret_size)
-{
-       const struct btf_type *size_type;
-       u32 size_type_id = *type_id;
-       u32 size = 0;
-
-       size_type = btf_type_by_id(btf, size_type_id);
-       if (btf_type_nosize_or_null(size_type))
-               return NULL;
-
-       if (btf_type_has_size(size_type)) {
-               size = size_type->size;
-       } else if (btf_type_is_array(size_type)) {
-               size = btf->resolved_sizes[size_type_id];
-       } else if (btf_type_is_ptr(size_type)) {
-               size = sizeof(void *);
-       } else {
-               if (WARN_ON_ONCE(!btf_type_is_modifier(size_type) &&
-                                !btf_type_is_var(size_type)))
-                       return NULL;
-
-               size = btf->resolved_sizes[size_type_id];
-               size_type_id = btf->resolved_ids[size_type_id];
-               size_type = btf_type_by_id(btf, size_type_id);
-               if (btf_type_nosize_or_null(size_type))
-                       return NULL;
-       }
-
-       *type_id = size_type_id;
-       if (ret_size)
-               *ret_size = size;
-
-       return size_type;
-}
-
  static int btf_df_check_member(struct btf_verifier_env *env,
                                const struct btf_type *struct_type,
                                const struct btf_member *member,
@@ -1489,7 +1501,7 @@ static int btf_modifier_check_member(struct 
btf_verifier_env *env,
         struct btf_member resolved_member;
         struct btf *btf = env->btf;

-       resolved_type = btf_type_id_size(btf, &resolved_type_id, NULL);
+       resolved_type = __btf_type_id_size(btf, &resolved_type_id, NULL, 
false);
         if (!resolved_type) {
                 btf_verifier_log_member(env, struct_type, member,
                                         "Invalid member");
@@ -1514,7 +1526,7 @@ static int btf_modifier_check_kflag_member(struct 
btf_verifier_env *env,
         struct btf_member resolved_member;
         struct btf *btf = env->btf;

-       resolved_type = btf_type_id_size(btf, &resolved_type_id, NULL);
+       resolved_type = __btf_type_id_size(btf, &resolved_type_id, NULL, 
false);
         if (!resolved_type) {
                 btf_verifier_log_member(env, struct_type, member,
                                         "Invalid member");
@@ -1620,7 +1632,7 @@ static int btf_modifier_resolve(struct 
btf_verifier_env *env,
          * save us a few type-following when we use it later (e.g. in
          * pretty print).
          */
-       if (!btf_type_id_size(btf, &next_type_id, &next_type_size)) {
+       if (!__btf_type_id_size(btf, &next_type_id, &next_type_size, 
false)) {
                 if (env_type_is_resolved(env, next_type_id))
                         next_type = btf_type_id_resolve(btf, 
&next_type_id);

@@ -1675,7 +1687,7 @@ static int btf_var_resolve(struct btf_verifier_env 
*env,
          * forward types or similar that would resolve to size of
          * zero is allowed.
          */
-       if (!btf_type_id_size(btf, &next_type_id, &next_type_size)) {
+       if (!__btf_type_id_size(btf, &next_type_id, &next_type_size, 
false)) {
                 btf_verifier_log_type(env, v->t, "Invalid type_id");
                 return -EINVAL;
         }
@@ -1725,7 +1737,7 @@ static int btf_ptr_resolve(struct btf_verifier_env 
*env,
                                               resolved_type_id);
         }

-       if (!btf_type_id_size(btf, &next_type_id, NULL)) {
+       if (!__btf_type_id_size(btf, &next_type_id, NULL, false)) {
                 if (env_type_is_resolved(env, next_type_id))
                         next_type = btf_type_id_resolve(btf, 
&next_type_id);

@@ -1851,7 +1863,7 @@ static int btf_array_check_member(struct 
btf_verifier_env *env,
         }

         array_type_id = member->type;
-       btf_type_id_size(btf, &array_type_id, &array_size);
+       __btf_type_id_size(btf, &array_type_id, &array_size, false);
         struct_size = struct_type->size;
         bytes_offset = BITS_ROUNDDOWN_BYTES(struct_bits_off);
         if (struct_size - bytes_offset < array_size) {
@@ -1938,7 +1950,7 @@ static int btf_array_resolve(struct 
btf_verifier_env *env,
             !env_type_is_resolved(env, index_type_id))
                 return env_stack_push(env, index_type, index_type_id);

-       index_type = btf_type_id_size(btf, &index_type_id, NULL);
+       index_type = __btf_type_id_size(btf, &index_type_id, NULL, false);
         if (!index_type || !btf_type_is_int(index_type) ||
             !btf_type_int_is_regular(index_type)) {
                 btf_verifier_log_type(env, v->t, "Invalid index");
@@ -1959,7 +1971,7 @@ static int btf_array_resolve(struct 
btf_verifier_env *env,
             !env_type_is_resolved(env, elem_type_id))
                 return env_stack_push(env, elem_type, elem_type_id);

-       elem_type = btf_type_id_size(btf, &elem_type_id, &elem_size);
+       elem_type = __btf_type_id_size(btf, &elem_type_id, &elem_size, 
false);
         if (!elem_type) {
                 btf_verifier_log_type(env, v->t, "Invalid elem");
                 return -EINVAL;
@@ -2000,7 +2012,7 @@ static void btf_array_seq_show(const struct btf 
*btf, const struct btf_type *t,
         u32 i, elem_size, elem_type_id;

         elem_type_id = array->type;
-       elem_type = btf_type_id_size(btf, &elem_type_id, &elem_size);
+       elem_type = __btf_type_id_size(btf, &elem_type_id, &elem_size, 
false);
         elem_ops = btf_type_ops(elem_type);
         seq_puts(m, "[");
         for (i = 0; i < array->nelems; i++) {
@@ -2732,7 +2744,7 @@ static int btf_datasec_resolve(struct 
btf_verifier_env *env,
                 }

                 type_id = var_type->type;
-               if (!btf_type_id_size(btf, &type_id, &type_size)) {
+               if (!__btf_type_id_size(btf, &type_id, &type_size, false)) {
                         btf_verifier_log_vsi(env, v->t, vsi, "Invalid 
type");
                         return -EINVAL;
                 }
@@ -2813,7 +2825,7 @@ static int btf_func_proto_check(struct 
btf_verifier_env *env,
                 }

                 /* Ensure the return type is a type that has a size */
-               if (!btf_type_id_size(btf, &ret_type_id, NULL)) {
+               if (!__btf_type_id_size(btf, &ret_type_id, NULL, false)) {
                         btf_verifier_log_type(env, t, "Invalid return 
type");
                         return -EINVAL;
                 }
@@ -2861,7 +2873,7 @@ static int btf_func_proto_check(struct 
btf_verifier_env *env,
                                 break;
                 }

-               if (!btf_type_id_size(btf, &arg_type_id, NULL)) {
+               if (!__btf_type_id_size(btf, &arg_type_id, NULL, false)) {
                         btf_verifier_log_type(env, t, "Invalid arg#%u", 
i + 1);
                         err = -EINVAL;
                         break;
@@ -3014,7 +3026,7 @@ static bool btf_resolve_valid(struct 
btf_verifier_env *env,
                 u32 elem_type_id = array->type;
                 u32 elem_size;

-               elem_type = btf_type_id_size(btf, &elem_type_id, 
&elem_size);
+               elem_type = __btf_type_id_size(btf, &elem_type_id, 
&elem_size, false);
                 return elem_type && !btf_type_is_modifier(elem_type) &&
                         (array->nelems * elem_size ==
                          btf->resolved_sizes[type_id]);


> ---
>   kernel/bpf/btf.c | 42 +++---------------------------------------
>   1 file changed, 3 insertions(+), 39 deletions(-)
> 
> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> index cad09858a5f2..c68c7e73b0d1 100644
> --- a/kernel/bpf/btf.c
> +++ b/kernel/bpf/btf.c
> @@ -231,14 +231,6 @@ enum visit_state {
>   	RESOLVED,
>   };
>   
> -enum resolve_mode {
> -	RESOLVE_TBD,	/* To Be Determined */
> -	RESOLVE_PTR,	/* Resolving for Pointer */
> -	RESOLVE_STRUCT_OR_ARRAY,	/* Resolving for struct/union
> -					 * or array
> -					 */
> -};
> -
>   #define MAX_RESOLVE_DEPTH 32
>   
>   struct btf_sec_info {
> @@ -254,7 +246,6 @@ struct btf_verifier_env {
>   	u32 log_type_id;
>   	u32 top_stack;
>   	enum verifier_phase phase;
> -	enum resolve_mode resolve_mode;
>   };
>   
>   static const char * const btf_kind_str[NR_BTF_KINDS] = {
> @@ -964,26 +955,7 @@ static void btf_verifier_env_free(struct btf_verifier_env *env)
>   static bool env_type_is_resolve_sink(const struct btf_verifier_env *env,
>   				     const struct btf_type *next_type)
>   {
> -	switch (env->resolve_mode) {
> -	case RESOLVE_TBD:
> -		/* int, enum or void is a sink */
> -		return !btf_type_needs_resolve(next_type);
> -	case RESOLVE_PTR:
> -		/* int, enum, void, struct, array, func or func_proto is a sink
> -		 * for ptr
> -		 */
> -		return !btf_type_is_modifier(next_type) &&
> -			!btf_type_is_ptr(next_type);
> -	case RESOLVE_STRUCT_OR_ARRAY:
> -		/* int, enum, void, ptr, func or func_proto is a sink
> -		 * for struct and array
> -		 */
> -		return !btf_type_is_modifier(next_type) &&
> -			!btf_type_is_array(next_type) &&
> -			!btf_type_is_struct(next_type);
> -	default:
> -		BUG();
> -	}
> +	return !btf_type_needs_resolve(next_type);
>   }
>   
>   static bool env_type_is_resolved(const struct btf_verifier_env *env,
> @@ -1010,13 +982,6 @@ static int env_stack_push(struct btf_verifier_env *env,
>   	v->type_id = type_id;
>   	v->next_member = 0;
>   
> -	if (env->resolve_mode == RESOLVE_TBD) {
> -		if (btf_type_is_ptr(t))
> -			env->resolve_mode = RESOLVE_PTR;
> -		else if (btf_type_is_struct(t) || btf_type_is_array(t))
> -			env->resolve_mode = RESOLVE_STRUCT_OR_ARRAY;
> -	}
> -
>   	return 0;
>   }
>   
> @@ -1038,7 +1003,7 @@ static void env_stack_pop_resolved(struct btf_verifier_env *env,
>   	env->visit_states[type_id] = RESOLVED;
>   }
>   
> -static const struct resolve_vertex *env_stack_peak(struct btf_verifier_env *env)
> +static const struct resolve_vertex *env_stack_peek(struct btf_verifier_env *env)
>   {
>   	return env->top_stack ? &env->stack[env->top_stack - 1] : NULL;
>   }
> @@ -3030,9 +2995,8 @@ static int btf_resolve(struct btf_verifier_env *env,
>   	const struct resolve_vertex *v;
>   	int err = 0;
>   
> -	env->resolve_mode = RESOLVE_TBD;
>   	env_stack_push(env, t, type_id);
> -	while (!err && (v = env_stack_peak(env))) {
> +	while (!err && (v = env_stack_peek(env))) {
>   		env->log_type_id = v->type_id;
>   		err = btf_type_ops(v->t)->resolve(env, v);
>   	}
> 

^ permalink raw reply related

* [PATCH net-next 2/3] net: flow_offload: rename tc_setup_cb_t to flow_setup_cb_t
From: Pablo Neira Ayuso @ 2019-07-11  0:12 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev, jiri, jakub.kicinski
In-Reply-To: <20190711001235.20686-1-pablo@netfilter.org>

Rename this type definition and adapt users.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
This patch is a dependency for patch 3/3, so include/net/flow_offload.h
does not need to include include/net/sch_cls.h, and hence avoid a
circular inclusion.

 drivers/net/ethernet/mellanox/mlxsw/spectrum.c |  2 +-
 drivers/net/ethernet/mscc/ocelot_tc.c          |  2 +-
 include/net/flow_offload.h                     | 16 ++++++++++------
 include/net/pkt_cls.h                          |  4 ++--
 include/net/sch_generic.h                      |  6 ++----
 net/core/flow_offload.c                        |  9 +++++----
 net/dsa/slave.c                                |  2 +-
 net/sched/cls_api.c                            |  2 +-
 net/sched/cls_bpf.c                            |  2 +-
 net/sched/cls_flower.c                         |  2 +-
 net/sched/cls_matchall.c                       |  2 +-
 net/sched/cls_u32.c                            |  6 +++---
 12 files changed, 29 insertions(+), 26 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
index a469035400cf..51cd0b6f1f3e 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
@@ -1679,7 +1679,7 @@ static int mlxsw_sp_setup_tc_block(struct mlxsw_sp_port *mlxsw_sp_port,
 				   struct flow_block_offload *f)
 {
 	struct flow_block_cb *block_cb;
-	tc_setup_cb_t *cb;
+	flow_setup_cb_t *cb;
 	bool ingress;
 	int err;
 
diff --git a/drivers/net/ethernet/mscc/ocelot_tc.c b/drivers/net/ethernet/mscc/ocelot_tc.c
index abbcb66bf5ac..fba9512e9ca6 100644
--- a/drivers/net/ethernet/mscc/ocelot_tc.c
+++ b/drivers/net/ethernet/mscc/ocelot_tc.c
@@ -134,7 +134,7 @@ static int ocelot_setup_tc_block(struct ocelot_port *port,
 				 struct flow_block_offload *f)
 {
 	struct flow_block_cb *block_cb;
-	tc_setup_cb_t *cb;
+	flow_setup_cb_t *cb;
 	int err;
 
 	netdev_dbg(port->dev, "tc_block command %d, binder_type %d\n",
diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h
index aa9b5287b231..98bf3af5c84d 100644
--- a/include/net/flow_offload.h
+++ b/include/net/flow_offload.h
@@ -3,7 +3,6 @@
 
 #include <linux/kernel.h>
 #include <net/flow_dissector.h>
-#include <net/sch_generic.h>
 
 struct flow_match {
 	struct flow_dissector	*dissector;
@@ -261,23 +260,27 @@ struct flow_block_offload {
 	struct netlink_ext_ack *extack;
 };
 
+enum tc_setup_type;
+typedef int flow_setup_cb_t(enum tc_setup_type type, void *type_data,
+			    void *cb_priv);
+
 struct flow_block_cb {
 	struct list_head	driver_list;
 	struct list_head	list;
-	tc_setup_cb_t		*cb;
+	flow_setup_cb_t		*cb;
 	void			*cb_ident;
 	void			*cb_priv;
 	void			(*release)(void *cb_priv);
 	unsigned int		refcnt;
 };
 
-struct flow_block_cb *flow_block_cb_alloc(tc_setup_cb_t *cb,
+struct flow_block_cb *flow_block_cb_alloc(flow_setup_cb_t *cb,
 					  void *cb_ident, void *cb_priv,
 					  void (*release)(void *cb_priv));
 void flow_block_cb_free(struct flow_block_cb *block_cb);
 
 struct flow_block_cb *flow_block_cb_lookup(struct flow_block_offload *offload,
-					   tc_setup_cb_t *cb, void *cb_ident);
+					   flow_setup_cb_t *cb, void *cb_ident);
 
 void *flow_block_cb_priv(struct flow_block_cb *block_cb);
 void flow_block_cb_incref(struct flow_block_cb *block_cb);
@@ -295,11 +298,12 @@ static inline void flow_block_cb_remove(struct flow_block_cb *block_cb,
 	list_move(&block_cb->list, &offload->cb_list);
 }
 
-bool flow_block_cb_is_busy(tc_setup_cb_t *cb, void *cb_ident,
+bool flow_block_cb_is_busy(flow_setup_cb_t *cb, void *cb_ident,
 			   struct list_head *driver_block_list);
 
 int flow_block_cb_setup_simple(struct flow_block_offload *f,
-			       struct list_head *driver_list, tc_setup_cb_t *cb,
+			       struct list_head *driver_list,
+			       flow_setup_cb_t *cb,
 			       void *cb_ident, void *cb_priv, bool ingress_only);
 
 enum flow_cls_command {
diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index 841faadceb6e..cee651b76a1f 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -126,14 +126,14 @@ static inline struct Qdisc *tcf_block_q(struct tcf_block *block)
 }
 
 static inline
-int tc_setup_cb_block_register(struct tcf_block *block, tc_setup_cb_t *cb,
+int tc_setup_cb_block_register(struct tcf_block *block, flow_setup_cb_t *cb,
 			       void *cb_priv)
 {
 	return 0;
 }
 
 static inline
-void tc_setup_cb_block_unregister(struct tcf_block *block, tc_setup_cb_t *cb,
+void tc_setup_cb_block_unregister(struct tcf_block *block, flow_setup_cb_t *cb,
 				  void *cb_priv)
 {
 }
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 855167bbc372..9482e060483b 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -15,6 +15,7 @@
 #include <linux/mutex.h>
 #include <net/gen_stats.h>
 #include <net/rtnetlink.h>
+#include <net/flow_offload.h>
 
 struct Qdisc_ops;
 struct qdisc_walker;
@@ -22,9 +23,6 @@ struct tcf_walker;
 struct module;
 struct bpf_flow_keys;
 
-typedef int tc_setup_cb_t(enum tc_setup_type type,
-			  void *type_data, void *cb_priv);
-
 typedef int tc_indr_block_bind_cb_t(struct net_device *dev, void *cb_priv,
 				    enum tc_setup_type type, void *type_data);
 
@@ -313,7 +311,7 @@ struct tcf_proto_ops {
 	void			(*walk)(struct tcf_proto *tp,
 					struct tcf_walker *arg, bool rtnl_held);
 	int			(*reoffload)(struct tcf_proto *tp, bool add,
-					     tc_setup_cb_t *cb, void *cb_priv,
+					     flow_setup_cb_t *cb, void *cb_priv,
 					     struct netlink_ext_ack *extack);
 	void			(*bind_class)(void *, u32, unsigned long);
 	void *			(*tmplt_create)(struct net *net,
diff --git a/net/core/flow_offload.c b/net/core/flow_offload.c
index 507de4b48815..a800fa78d96c 100644
--- a/net/core/flow_offload.c
+++ b/net/core/flow_offload.c
@@ -165,7 +165,7 @@ void flow_rule_match_enc_opts(const struct flow_rule *rule,
 }
 EXPORT_SYMBOL(flow_rule_match_enc_opts);
 
-struct flow_block_cb *flow_block_cb_alloc(tc_setup_cb_t *cb,
+struct flow_block_cb *flow_block_cb_alloc(flow_setup_cb_t *cb,
 					  void *cb_ident, void *cb_priv,
 					  void (*release)(void *cb_priv))
 {
@@ -194,7 +194,7 @@ void flow_block_cb_free(struct flow_block_cb *block_cb)
 EXPORT_SYMBOL(flow_block_cb_free);
 
 struct flow_block_cb *flow_block_cb_lookup(struct flow_block_offload *f,
-					   tc_setup_cb_t *cb, void *cb_ident)
+					   flow_setup_cb_t *cb, void *cb_ident)
 {
 	struct flow_block_cb *block_cb;
 
@@ -226,7 +226,7 @@ unsigned int flow_block_cb_decref(struct flow_block_cb *block_cb)
 }
 EXPORT_SYMBOL(flow_block_cb_decref);
 
-bool flow_block_cb_is_busy(tc_setup_cb_t *cb, void *cb_ident,
+bool flow_block_cb_is_busy(flow_setup_cb_t *cb, void *cb_ident,
 			   struct list_head *driver_block_list)
 {
 	struct flow_block_cb *block_cb;
@@ -243,7 +243,8 @@ EXPORT_SYMBOL(flow_block_cb_is_busy);
 
 int flow_block_cb_setup_simple(struct flow_block_offload *f,
 			       struct list_head *driver_block_list,
-			       tc_setup_cb_t *cb, void *cb_ident, void *cb_priv,
+			       flow_setup_cb_t *cb,
+			       void *cb_ident, void *cb_priv,
 			       bool ingress_only)
 {
 	struct flow_block_cb *block_cb;
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 6ca9ec58f881..d697a64fb564 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -951,7 +951,7 @@ static int dsa_slave_setup_tc_block(struct net_device *dev,
 				    struct flow_block_offload *f)
 {
 	struct flow_block_cb *block_cb;
-	tc_setup_cb_t *cb;
+	flow_setup_cb_t *cb;
 
 	if (f->binder_type == FLOW_BLOCK_BINDER_TYPE_CLSACT_INGRESS)
 		cb = dsa_slave_setup_tc_block_cb_ig;
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 278014e26aec..51fbe6e95a92 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -1514,7 +1514,7 @@ void tcf_block_put(struct tcf_block *block)
 EXPORT_SYMBOL(tcf_block_put);
 
 static int
-tcf_block_playback_offloads(struct tcf_block *block, tc_setup_cb_t *cb,
+tcf_block_playback_offloads(struct tcf_block *block, flow_setup_cb_t *cb,
 			    void *cb_priv, bool add, bool offload_in_use,
 			    struct netlink_ext_ack *extack)
 {
diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c
index 691f71830134..3f7a9c02b70c 100644
--- a/net/sched/cls_bpf.c
+++ b/net/sched/cls_bpf.c
@@ -651,7 +651,7 @@ static void cls_bpf_walk(struct tcf_proto *tp, struct tcf_walker *arg,
 	}
 }
 
-static int cls_bpf_reoffload(struct tcf_proto *tp, bool add, tc_setup_cb_t *cb,
+static int cls_bpf_reoffload(struct tcf_proto *tp, bool add, flow_setup_cb_t *cb,
 			     void *cb_priv, struct netlink_ext_ack *extack)
 {
 	struct cls_bpf_head *head = rtnl_dereference(tp->root);
diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index 38d6e85693fc..054123742e32 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -1800,7 +1800,7 @@ fl_get_next_hw_filter(struct tcf_proto *tp, struct cls_fl_filter *f, bool add)
 	return NULL;
 }
 
-static int fl_reoffload(struct tcf_proto *tp, bool add, tc_setup_cb_t *cb,
+static int fl_reoffload(struct tcf_proto *tp, bool add, flow_setup_cb_t *cb,
 			void *cb_priv, struct netlink_ext_ack *extack)
 {
 	struct tcf_block *block = tp->chain->block;
diff --git a/net/sched/cls_matchall.c b/net/sched/cls_matchall.c
index a30d2f8feb32..455ea2793f9b 100644
--- a/net/sched/cls_matchall.c
+++ b/net/sched/cls_matchall.c
@@ -282,7 +282,7 @@ static void mall_walk(struct tcf_proto *tp, struct tcf_walker *arg,
 	arg->count++;
 }
 
-static int mall_reoffload(struct tcf_proto *tp, bool add, tc_setup_cb_t *cb,
+static int mall_reoffload(struct tcf_proto *tp, bool add, flow_setup_cb_t *cb,
 			  void *cb_priv, struct netlink_ext_ack *extack)
 {
 	struct cls_mall_head *head = rtnl_dereference(tp->root);
diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index be9e46c77e8b..8614088edd1b 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -1152,7 +1152,7 @@ static void u32_walk(struct tcf_proto *tp, struct tcf_walker *arg,
 }
 
 static int u32_reoffload_hnode(struct tcf_proto *tp, struct tc_u_hnode *ht,
-			       bool add, tc_setup_cb_t *cb, void *cb_priv,
+			       bool add, flow_setup_cb_t *cb, void *cb_priv,
 			       struct netlink_ext_ack *extack)
 {
 	struct tc_cls_u32_offload cls_u32 = {};
@@ -1172,7 +1172,7 @@ static int u32_reoffload_hnode(struct tcf_proto *tp, struct tc_u_hnode *ht,
 }
 
 static int u32_reoffload_knode(struct tcf_proto *tp, struct tc_u_knode *n,
-			       bool add, tc_setup_cb_t *cb, void *cb_priv,
+			       bool add, flow_setup_cb_t *cb, void *cb_priv,
 			       struct netlink_ext_ack *extack)
 {
 	struct tc_u_hnode *ht = rtnl_dereference(n->ht_down);
@@ -1213,7 +1213,7 @@ static int u32_reoffload_knode(struct tcf_proto *tp, struct tc_u_knode *n,
 	return 0;
 }
 
-static int u32_reoffload(struct tcf_proto *tp, bool add, tc_setup_cb_t *cb,
+static int u32_reoffload(struct tcf_proto *tp, bool add, flow_setup_cb_t *cb,
 			 void *cb_priv, struct netlink_ext_ack *extack)
 {
 	struct tc_u_common *tp_c = tp->data;
-- 
2.11.0



^ permalink raw reply related

* [PATCH net-next 3/3] net: flow_offload: add flow_block structure and use it
From: Pablo Neira Ayuso @ 2019-07-11  0:12 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev, jiri, jakub.kicinski
In-Reply-To: <20190711001235.20686-1-pablo@netfilter.org>

This object stores the flow block callbacks that are attached to this
block. This patch restores block sharing.

Fixes: da3eeb904ff4 ("net: flow_offload: add list handling functions")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 include/net/flow_offload.h        |  5 +++++
 include/net/netfilter/nf_tables.h |  5 +++--
 include/net/sch_generic.h         |  2 +-
 net/core/flow_offload.c           |  2 +-
 net/netfilter/nf_tables_api.c     |  2 +-
 net/netfilter/nf_tables_offload.c |  5 +++--
 net/sched/cls_api.c               | 10 +++++++---
 7 files changed, 21 insertions(+), 10 deletions(-)

diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h
index 98bf3af5c84d..e50d94736829 100644
--- a/include/net/flow_offload.h
+++ b/include/net/flow_offload.h
@@ -248,6 +248,10 @@ enum flow_block_binder_type {
 	FLOW_BLOCK_BINDER_TYPE_CLSACT_EGRESS,
 };
 
+struct flow_block {
+	struct list_head cb_list;
+};
+
 struct netlink_ext_ack;
 
 struct flow_block_offload {
@@ -255,6 +259,7 @@ struct flow_block_offload {
 	enum flow_block_binder_type binder_type;
 	bool block_shared;
 	struct net *net;
+	struct flow_block *block;
 	struct list_head cb_list;
 	struct list_head *driver_block_list;
 	struct netlink_ext_ack *extack;
diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
index 35dfdd9f69b3..00658462f89b 100644
--- a/include/net/netfilter/nf_tables.h
+++ b/include/net/netfilter/nf_tables.h
@@ -11,6 +11,7 @@
 #include <linux/rhashtable.h>
 #include <net/netfilter/nf_flow_table.h>
 #include <net/netlink.h>
+#include <net/flow_offload.h>
 
 struct module;
 
@@ -951,7 +952,7 @@ struct nft_stats {
  *	@stats: per-cpu chain stats
  *	@chain: the chain
  *	@dev_name: device name that this base chain is attached to (if any)
- *	@cb_list: list of flow block callbacks (for hardware offload)
+ *	@block: flow block (for hardware offload)
  */
 struct nft_base_chain {
 	struct nf_hook_ops		ops;
@@ -961,7 +962,7 @@ struct nft_base_chain {
 	struct nft_stats __percpu	*stats;
 	struct nft_chain		chain;
 	char 				dev_name[IFNAMSIZ];
-	struct list_head		cb_list;
+	struct flow_block		block;
 };
 
 static inline struct nft_base_chain *nft_base_chain(const struct nft_chain *chain)
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 9482e060483b..58041cb0ce15 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -399,7 +399,7 @@ struct tcf_block {
 	refcount_t refcnt;
 	struct net *net;
 	struct Qdisc *q;
-	struct list_head cb_list;
+	struct flow_block flow;
 	struct list_head owner_list;
 	bool keep_dst;
 	unsigned int offloadcnt; /* Number of oddloaded filters */
diff --git a/net/core/flow_offload.c b/net/core/flow_offload.c
index a800fa78d96c..935c7f81a9ef 100644
--- a/net/core/flow_offload.c
+++ b/net/core/flow_offload.c
@@ -198,7 +198,7 @@ struct flow_block_cb *flow_block_cb_lookup(struct flow_block_offload *f,
 {
 	struct flow_block_cb *block_cb;
 
-	list_for_each_entry(block_cb, f->driver_block_list, driver_list) {
+	list_for_each_entry(block_cb, &f->block->cb_list, list) {
 		if (block_cb->cb == cb &&
 		    block_cb->cb_ident == cb_ident)
 			return block_cb;
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index ed17a7c29b86..c565f146435b 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -1662,7 +1662,7 @@ static int nf_tables_addchain(struct nft_ctx *ctx, u8 family, u8 genmask,
 
 		chain->flags |= NFT_BASE_CHAIN | flags;
 		basechain->policy = NF_ACCEPT;
-		INIT_LIST_HEAD(&basechain->cb_list);
+		INIT_LIST_HEAD(&basechain->block.cb_list);
 	} else {
 		chain = kzalloc(sizeof(*chain), GFP_KERNEL);
 		if (chain == NULL)
diff --git a/net/netfilter/nf_tables_offload.c b/net/netfilter/nf_tables_offload.c
index 2c3302845f67..2a184277ee58 100644
--- a/net/netfilter/nf_tables_offload.c
+++ b/net/netfilter/nf_tables_offload.c
@@ -116,7 +116,7 @@ static int nft_setup_cb_call(struct nft_base_chain *basechain,
 	struct flow_block_cb *block_cb;
 	int err;
 
-	list_for_each_entry(block_cb, &basechain->cb_list, list) {
+	list_for_each_entry(block_cb, &basechain->block.cb_list, list) {
 		err = block_cb->cb(type, type_data, block_cb->cb_priv);
 		if (err < 0)
 			return err;
@@ -154,7 +154,7 @@ static int nft_flow_offload_rule(struct nft_trans *trans,
 static int nft_flow_offload_bind(struct flow_block_offload *bo,
 				 struct nft_base_chain *basechain)
 {
-	list_splice(&bo->cb_list, &basechain->cb_list);
+	list_splice(&bo->cb_list, &basechain->block.cb_list);
 	return 0;
 }
 
@@ -198,6 +198,7 @@ static int nft_flow_offload_chain(struct nft_trans *trans,
 		return -EOPNOTSUPP;
 
 	bo.command = cmd;
+	bo.block = &basechain->block;
 	bo.binder_type = FLOW_BLOCK_BINDER_TYPE_CLSACT_INGRESS;
 	bo.extack = &extack;
 	INIT_LIST_HEAD(&bo.cb_list);
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 51fbe6e95a92..66181961ad6f 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -691,6 +691,8 @@ static void tc_indr_block_ing_cmd(struct tc_indr_block_dev *indr_dev,
 	if (!indr_dev->block)
 		return;
 
+	bo.block = &indr_dev->block->flow;
+
 	indr_block_cb->cb(indr_dev->dev, indr_block_cb->cb_priv, TC_SETUP_BLOCK,
 			  &bo);
 	tcf_block_setup(indr_dev->block, &bo);
@@ -775,6 +777,7 @@ static void tc_indr_block_call(struct tcf_block *block, struct net_device *dev,
 		.command	= command,
 		.binder_type	= ei->binder_type,
 		.net		= dev_net(dev),
+		.block		= &block->flow,
 		.block_shared	= tcf_block_shared(block),
 		.extack		= extack,
 	};
@@ -810,6 +813,7 @@ static int tcf_block_offload_cmd(struct tcf_block *block,
 	bo.net = dev_net(dev);
 	bo.command = command;
 	bo.binder_type = ei->binder_type;
+	bo.block = &block->flow;
 	bo.block_shared = tcf_block_shared(block);
 	bo.extack = extack;
 	INIT_LIST_HEAD(&bo.cb_list);
@@ -988,7 +992,7 @@ static struct tcf_block *tcf_block_create(struct net *net, struct Qdisc *q,
 	}
 	mutex_init(&block->lock);
 	INIT_LIST_HEAD(&block->chain_list);
-	INIT_LIST_HEAD(&block->cb_list);
+	INIT_LIST_HEAD(&block->flow.cb_list);
 	INIT_LIST_HEAD(&block->owner_list);
 	INIT_LIST_HEAD(&block->chain0.filter_chain_list);
 
@@ -1570,7 +1574,7 @@ static int tcf_block_bind(struct tcf_block *block,
 
 		i++;
 	}
-	list_splice(&bo->cb_list, &block->cb_list);
+	list_splice(&bo->cb_list, &block->flow.cb_list);
 
 	return 0;
 
@@ -3155,7 +3159,7 @@ int tc_setup_cb_call(struct tcf_block *block, enum tc_setup_type type,
 	if (block->nooffloaddevcnt && err_stop)
 		return -EOPNOTSUPP;
 
-	list_for_each_entry(block_cb, &block->cb_list, list) {
+	list_for_each_entry(block_cb, &block->flow.cb_list, list) {
 		err = block_cb->cb(type, type_data, block_cb->cb_priv);
 		if (err) {
 			if (err_stop)
-- 
2.11.0



^ permalink raw reply related

* [PATCH net-next 1/3] net: flow_offload: remove netns parameter from flow_block_cb_alloc()
From: Pablo Neira Ayuso @ 2019-07-11  0:12 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev, jiri, jakub.kicinski

No need to annotate the netns on the flow block callback object,
flow_block_cb_is_busy() already checks for used blocks.

Fixes: d63db30c8537 ("net: flow_offload: add flow_block_cb_alloc() and flow_block_cb_free()")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c    | 3 +--
 drivers/net/ethernet/mellanox/mlxsw/spectrum.c      | 5 ++---
 drivers/net/ethernet/mscc/ocelot_flower.c           | 3 +--
 drivers/net/ethernet/mscc/ocelot_tc.c               | 2 +-
 drivers/net/ethernet/netronome/nfp/flower/offload.c | 6 ++----
 include/net/flow_offload.h                          | 3 +--
 net/core/flow_offload.c                             | 9 +++------
 net/dsa/slave.c                                     | 2 +-
 8 files changed, 12 insertions(+), 21 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
index 7245d287633d..2162412073c5 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
@@ -735,8 +735,7 @@ mlx5e_rep_indr_setup_tc_block(struct net_device *netdev,
 		list_add(&indr_priv->list,
 			 &rpriv->uplink_priv.tc_indr_block_priv_list);
 
-		block_cb = flow_block_cb_alloc(f->net,
-					       mlx5e_rep_indr_setup_block_cb,
+		block_cb = flow_block_cb_alloc(mlx5e_rep_indr_setup_block_cb,
 					       indr_priv, indr_priv,
 					       mlx5e_rep_indr_tc_block_unbind);
 		if (IS_ERR(block_cb)) {
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
index 4d34d42b3b0e..a469035400cf 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
@@ -1610,8 +1610,7 @@ mlxsw_sp_setup_tc_block_flower_bind(struct mlxsw_sp_port *mlxsw_sp_port,
 		acl_block = mlxsw_sp_acl_block_create(mlxsw_sp, f->net);
 		if (!acl_block)
 			return -ENOMEM;
-		block_cb = flow_block_cb_alloc(f->net,
-					       mlxsw_sp_setup_tc_block_cb_flower,
+		block_cb = flow_block_cb_alloc(mlxsw_sp_setup_tc_block_cb_flower,
 					       mlxsw_sp, acl_block,
 					       mlxsw_sp_tc_block_flower_release);
 		if (IS_ERR(block_cb)) {
@@ -1702,7 +1701,7 @@ static int mlxsw_sp_setup_tc_block(struct mlxsw_sp_port *mlxsw_sp_port,
 					  &mlxsw_sp_block_cb_list))
 			return -EBUSY;
 
-		block_cb = flow_block_cb_alloc(f->net, cb, mlxsw_sp_port,
+		block_cb = flow_block_cb_alloc(cb, mlxsw_sp_port,
 					       mlxsw_sp_port, NULL);
 		if (IS_ERR(block_cb))
 			return PTR_ERR(block_cb);
diff --git a/drivers/net/ethernet/mscc/ocelot_flower.c b/drivers/net/ethernet/mscc/ocelot_flower.c
index 7aaddc09c185..6a11aea8b186 100644
--- a/drivers/net/ethernet/mscc/ocelot_flower.c
+++ b/drivers/net/ethernet/mscc/ocelot_flower.c
@@ -323,8 +323,7 @@ int ocelot_setup_tc_block_flower_bind(struct ocelot_port *port,
 		if (!port_block)
 			return -ENOMEM;
 
-		block_cb = flow_block_cb_alloc(f->net,
-					       ocelot_setup_tc_block_cb_flower,
+		block_cb = flow_block_cb_alloc(ocelot_setup_tc_block_cb_flower,
 					       port, port_block,
 					       ocelot_tc_block_unbind);
 		if (IS_ERR(block_cb)) {
diff --git a/drivers/net/ethernet/mscc/ocelot_tc.c b/drivers/net/ethernet/mscc/ocelot_tc.c
index 9e6464ffae5d..abbcb66bf5ac 100644
--- a/drivers/net/ethernet/mscc/ocelot_tc.c
+++ b/drivers/net/ethernet/mscc/ocelot_tc.c
@@ -156,7 +156,7 @@ static int ocelot_setup_tc_block(struct ocelot_port *port,
 		if (flow_block_cb_is_busy(cb, port, &ocelot_block_cb_list))
 			return -EBUSY;
 
-		block_cb = flow_block_cb_alloc(f->net, cb, port, port, NULL);
+		block_cb = flow_block_cb_alloc(cb, port, port, NULL);
 		if (IS_ERR(block_cb))
 			return PTR_ERR(block_cb);
 
diff --git a/drivers/net/ethernet/netronome/nfp/flower/offload.c b/drivers/net/ethernet/netronome/nfp/flower/offload.c
index 7e725fa60347..a0f8892bb4b5 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/offload.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/offload.c
@@ -1324,8 +1324,7 @@ static int nfp_flower_setup_tc_block(struct net_device *netdev,
 					  &nfp_block_cb_list))
 			return -EBUSY;
 
-		block_cb = flow_block_cb_alloc(f->net,
-					       nfp_flower_setup_tc_block_cb,
+		block_cb = flow_block_cb_alloc(nfp_flower_setup_tc_block_cb,
 					       repr, repr, NULL);
 		if (IS_ERR(block_cb))
 			return PTR_ERR(block_cb);
@@ -1430,8 +1429,7 @@ nfp_flower_setup_indr_tc_block(struct net_device *netdev, struct nfp_app *app,
 		cb_priv->app = app;
 		list_add(&cb_priv->list, &priv->indr_block_cb_priv);
 
-		block_cb = flow_block_cb_alloc(f->net,
-					       nfp_flower_setup_indr_block_cb,
+		block_cb = flow_block_cb_alloc(nfp_flower_setup_indr_block_cb,
 					       cb_priv, cb_priv,
 					       nfp_flower_setup_indr_tc_release);
 		if (IS_ERR(block_cb)) {
diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h
index db337299e81e..aa9b5287b231 100644
--- a/include/net/flow_offload.h
+++ b/include/net/flow_offload.h
@@ -264,7 +264,6 @@ struct flow_block_offload {
 struct flow_block_cb {
 	struct list_head	driver_list;
 	struct list_head	list;
-	struct net		*net;
 	tc_setup_cb_t		*cb;
 	void			*cb_ident;
 	void			*cb_priv;
@@ -272,7 +271,7 @@ struct flow_block_cb {
 	unsigned int		refcnt;
 };
 
-struct flow_block_cb *flow_block_cb_alloc(struct net *net, tc_setup_cb_t *cb,
+struct flow_block_cb *flow_block_cb_alloc(tc_setup_cb_t *cb,
 					  void *cb_ident, void *cb_priv,
 					  void (*release)(void *cb_priv));
 void flow_block_cb_free(struct flow_block_cb *block_cb);
diff --git a/net/core/flow_offload.c b/net/core/flow_offload.c
index 76f8db3841d7..507de4b48815 100644
--- a/net/core/flow_offload.c
+++ b/net/core/flow_offload.c
@@ -165,7 +165,7 @@ void flow_rule_match_enc_opts(const struct flow_rule *rule,
 }
 EXPORT_SYMBOL(flow_rule_match_enc_opts);
 
-struct flow_block_cb *flow_block_cb_alloc(struct net *net, tc_setup_cb_t *cb,
+struct flow_block_cb *flow_block_cb_alloc(tc_setup_cb_t *cb,
 					  void *cb_ident, void *cb_priv,
 					  void (*release)(void *cb_priv))
 {
@@ -175,7 +175,6 @@ struct flow_block_cb *flow_block_cb_alloc(struct net *net, tc_setup_cb_t *cb,
 	if (!block_cb)
 		return ERR_PTR(-ENOMEM);
 
-	block_cb->net = net;
 	block_cb->cb = cb;
 	block_cb->cb_ident = cb_ident;
 	block_cb->cb_priv = cb_priv;
@@ -200,8 +199,7 @@ struct flow_block_cb *flow_block_cb_lookup(struct flow_block_offload *f,
 	struct flow_block_cb *block_cb;
 
 	list_for_each_entry(block_cb, f->driver_block_list, driver_list) {
-		if (block_cb->net == f->net &&
-		    block_cb->cb == cb &&
+		if (block_cb->cb == cb &&
 		    block_cb->cb_ident == cb_ident)
 			return block_cb;
 	}
@@ -261,8 +259,7 @@ int flow_block_cb_setup_simple(struct flow_block_offload *f,
 		if (flow_block_cb_is_busy(cb, cb_ident, driver_block_list))
 			return -EBUSY;
 
-		block_cb = flow_block_cb_alloc(f->net, cb, cb_ident,
-					       cb_priv, NULL);
+		block_cb = flow_block_cb_alloc(cb, cb_ident, cb_priv, NULL);
 		if (IS_ERR(block_cb))
 			return PTR_ERR(block_cb);
 
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 614c38ece104..6ca9ec58f881 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -967,7 +967,7 @@ static int dsa_slave_setup_tc_block(struct net_device *dev,
 		if (flow_block_cb_is_busy(cb, dev, &dsa_slave_block_cb_list))
 			return -EBUSY;
 
-		block_cb = flow_block_cb_alloc(f->net, cb, dev, dev, NULL);
+		block_cb = flow_block_cb_alloc(cb, dev, dev, NULL);
 		if (IS_ERR(block_cb))
 			return PTR_ERR(block_cb);
 
-- 
2.11.0



^ permalink raw reply related

* Re: [bpf-next v3 04/12] selftests/bpf: Use bpf_prog_test_run_xattr
From: Andrii Nakryiko @ 2019-07-11  0:03 UTC (permalink / raw)
  To: Krzesimir Nowak
  Cc: open list, Alban Crequy, Iago López Galeiras,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, David S. Miller, Jakub Kicinski,
	Jesper Dangaard Brouer, John Fastabend, Stanislav Fomichev,
	Networking, bpf, xdp-newbies
In-Reply-To: <20190708163121.18477-5-krzesimir@kinvolk.io>

On Mon, Jul 8, 2019 at 3:43 PM Krzesimir Nowak <krzesimir@kinvolk.io> wrote:
>
> The bpf_prog_test_run_xattr function gives more options to set up a
> test run of a BPF program than the bpf_prog_test_run function.
>
> We will need this extra flexibility to pass ctx data later.
>
> Signed-off-by: Krzesimir Nowak <krzesimir@kinvolk.io>
> ---

lgtm, with some nits below

Acked-by: Andrii Nakryiko <andriin@fb.com>

>  tools/testing/selftests/bpf/test_verifier.c | 16 +++++++++++-----
>  1 file changed, 11 insertions(+), 5 deletions(-)
>
> diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c
> index c7541f572932..1640ba9f12c1 100644
> --- a/tools/testing/selftests/bpf/test_verifier.c
> +++ b/tools/testing/selftests/bpf/test_verifier.c
> @@ -822,14 +822,20 @@ static int do_prog_test_run(int fd_prog, bool unpriv, uint32_t expected_val,
>  {
>         __u8 tmp[TEST_DATA_LEN << 2];
>         __u32 size_tmp = sizeof(tmp);

nit: this is now is not needed as a separate local variable, inline?

> -       uint32_t retval;
>         int saved_errno;
>         int err;
> +       struct bpf_prog_test_run_attr attr = {
> +               .prog_fd = fd_prog,
> +               .repeat = 1,
> +               .data_in = data,
> +               .data_size_in = size_data,
> +               .data_out = tmp,
> +               .data_size_out = size_tmp,
> +       };
>
>         if (unpriv)
>                 set_admin(true);
> -       err = bpf_prog_test_run(fd_prog, 1, data, size_data,
> -                               tmp, &size_tmp, &retval, NULL);
> +       err = bpf_prog_test_run_xattr(&attr);
>         saved_errno = errno;
>         if (unpriv)
>                 set_admin(false);
> @@ -846,9 +852,9 @@ static int do_prog_test_run(int fd_prog, bool unpriv, uint32_t expected_val,
>                         return err;
>                 }
>         }
> -       if (retval != expected_val &&
> +       if (attr.retval != expected_val &&
>             expected_val != POINTER_VALUE) {

this if condition now fits one line, can you please combine? thanks!

> -               printf("FAIL retval %d != %d ", retval, expected_val);
> +               printf("FAIL retval %d != %d ", attr.retval, expected_val);
>                 return 1;
>         }
>
> --
> 2.20.1
>

^ permalink raw reply

* Re: [bpf-next v3 03/12] selftests/bpf: Avoid another case of errno clobbering
From: Andrii Nakryiko @ 2019-07-10 23:57 UTC (permalink / raw)
  To: Krzesimir Nowak
  Cc: open list, Alban Crequy, Iago López Galeiras,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, David S. Miller, Jakub Kicinski,
	Jesper Dangaard Brouer, John Fastabend, Stanislav Fomichev,
	Networking, bpf, xdp-newbies
In-Reply-To: <20190708163121.18477-4-krzesimir@kinvolk.io>

On Mon, Jul 8, 2019 at 3:43 PM Krzesimir Nowak <krzesimir@kinvolk.io> wrote:
>
> Commit 8184d44c9a57 ("selftests/bpf: skip verifier tests for
> unsupported program types") added a check for an unsupported program
> type. The function doing it changes errno, so test_verifier should
> save it before calling it if test_verifier wants to print a reason why
> verifying a BPF program of a supported type failed.
>
> Changes since v2:
> - Move the declaration to fit the reverse christmas tree style.
>
> Fixes: 8184d44c9a57 ("selftests/bpf: skip verifier tests for unsupported program types")
> Cc: Stanislav Fomichev <sdf@google.com>
> Signed-off-by: Krzesimir Nowak <krzesimir@kinvolk.io>
> ---

Acked-by: Andrii Nakryiko <andriin@fb.com>

>  tools/testing/selftests/bpf/test_verifier.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c
> index 3fe126e0083b..c7541f572932 100644
> --- a/tools/testing/selftests/bpf/test_verifier.c
> +++ b/tools/testing/selftests/bpf/test_verifier.c
> @@ -864,6 +864,7 @@ static void do_test_single(struct bpf_test *test, bool unpriv,
>         int run_errs, run_successes;
>         int map_fds[MAX_NR_MAPS];
>         const char *expected_err;
> +       int saved_errno;
>         int fixup_skips;

nit: combine those ints? or even with i and err below as well?

>         __u32 pflags;
>         int i, err;
> @@ -894,6 +895,7 @@ static void do_test_single(struct bpf_test *test, bool unpriv,
>                 pflags |= BPF_F_ANY_ALIGNMENT;
>         fd_prog = bpf_verify_program(prog_type, prog, prog_len, pflags,
>                                      "GPL", 0, bpf_vlog, sizeof(bpf_vlog), 4);
> +       saved_errno = errno;
>         if (fd_prog < 0 && !bpf_probe_prog_type(prog_type, 0)) {
>                 printf("SKIP (unsupported program type %d)\n", prog_type);
>                 skips++;
> @@ -910,7 +912,7 @@ static void do_test_single(struct bpf_test *test, bool unpriv,
>         if (expected_ret == ACCEPT) {
>                 if (fd_prog < 0) {
>                         printf("FAIL\nFailed to load prog '%s'!\n",
> -                              strerror(errno));
> +                              strerror(saved_errno));
>                         goto fail_log;
>                 }
>  #ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
> --
> 2.20.1
>

^ permalink raw reply

* Re: [bpf-next v3 02/12] selftests/bpf: Avoid a clobbering of errno
From: Andrii Nakryiko @ 2019-07-10 23:51 UTC (permalink / raw)
  To: Krzesimir Nowak
  Cc: open list, Alban Crequy, Iago López Galeiras,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, David S. Miller, Jakub Kicinski,
	Jesper Dangaard Brouer, John Fastabend, Stanislav Fomichev,
	Networking, bpf, xdp-newbies
In-Reply-To: <20190708163121.18477-3-krzesimir@kinvolk.io>

On Mon, Jul 8, 2019 at 3:42 PM Krzesimir Nowak <krzesimir@kinvolk.io> wrote:
>
> Save errno right after bpf_prog_test_run returns, so we later check
> the error code actually set by bpf_prog_test_run, not by some libcap
> function.
>
> Changes since v1:
> - Fix the "Fixes:" tag to mention actual commit that introduced the
>   bug
>
> Changes since v2:
> - Move the declaration so it fits the reverse christmas tree style.
>
> Cc: Daniel Borkmann <daniel@iogearbox.net>
> Fixes: 832c6f2c29ec ("bpf: test make sure to run unpriv test cases in test_verifier")
> Signed-off-by: Krzesimir Nowak <krzesimir@kinvolk.io>
> ---
>  tools/testing/selftests/bpf/test_verifier.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c
> index b8d065623ead..3fe126e0083b 100644
> --- a/tools/testing/selftests/bpf/test_verifier.c
> +++ b/tools/testing/selftests/bpf/test_verifier.c
> @@ -823,16 +823,18 @@ static int do_prog_test_run(int fd_prog, bool unpriv, uint32_t expected_val,
>         __u8 tmp[TEST_DATA_LEN << 2];
>         __u32 size_tmp = sizeof(tmp);
>         uint32_t retval;
> +       int saved_errno;
>         int err;
>
>         if (unpriv)
>                 set_admin(true);
>         err = bpf_prog_test_run(fd_prog, 1, data, size_data,
>                                 tmp, &size_tmp, &retval, NULL);

Given err is either 0 or -1, how about instead making err useful right
here without extra variable?

if (bpf_prog_test_run(...))
        err = errno;

> +       saved_errno = errno;
>         if (unpriv)
>                 set_admin(false);
>         if (err) {
> -               switch (errno) {
> +               switch (saved_errno) {
>                 case 524/*ENOTSUPP*/:

ENOTSUPP is defined in include/linux/errno.h, is there any problem
with using this in selftests?

>                         printf("Did not run the program (not supported) ");
>                         return 0;
> --
> 2.20.1
>

^ permalink raw reply

* Re: [bpf-next v3 01/12] selftests/bpf: Print a message when tester could not run a program
From: Andrii Nakryiko @ 2019-07-10 23:44 UTC (permalink / raw)
  To: Krzesimir Nowak
  Cc: open list, Alban Crequy, Iago López Galeiras,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, David S. Miller, Jakub Kicinski,
	Jesper Dangaard Brouer, John Fastabend, Stanislav Fomichev,
	Networking, bpf, xdp-newbies
In-Reply-To: <20190708163121.18477-2-krzesimir@kinvolk.io>

On Mon, Jul 8, 2019 at 3:42 PM Krzesimir Nowak <krzesimir@kinvolk.io> wrote:
>
> This prints a message when the error is about program type being not
> supported by the test runner or because of permissions problem. This
> is to see if the program we expected to run was actually executed.
>
> The messages are open-coded because strerror(ENOTSUPP) returns
> "Unknown error 524".
>
> Changes since v2:
> - Also print "FAIL" on an unexpected bpf_prog_test_run error, so there
>   is a corresponding "FAIL" message for each failed test.
>
> Signed-off-by: Krzesimir Nowak <krzesimir@kinvolk.io>
> ---
>  tools/testing/selftests/bpf/test_verifier.c | 17 +++++++++++++----
>  1 file changed, 13 insertions(+), 4 deletions(-)
>
> diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c
> index c5514daf8865..b8d065623ead 100644
> --- a/tools/testing/selftests/bpf/test_verifier.c
> +++ b/tools/testing/selftests/bpf/test_verifier.c
> @@ -831,11 +831,20 @@ static int do_prog_test_run(int fd_prog, bool unpriv, uint32_t expected_val,
>                                 tmp, &size_tmp, &retval, NULL);
>         if (unpriv)
>                 set_admin(false);
> -       if (err && errno != 524/*ENOTSUPP*/ && errno != EPERM) {
> -               printf("Unexpected bpf_prog_test_run error ");
> -               return err;
> +       if (err) {
> +               switch (errno) {
> +               case 524/*ENOTSUPP*/:
> +                       printf("Did not run the program (not supported) ");
> +                       return 0;
> +               case EPERM:
> +                       printf("Did not run the program (no permission) ");

Let's add "SKIP: " prefix to these?

> +                       return 0;
> +               default:
> +                       printf("FAIL: Unexpected bpf_prog_test_run error (%s) ", strerror(saved_errno));
> +                       return err;
> +               }
>         }
> -       if (!err && retval != expected_val &&
> +       if (retval != expected_val &&
>             expected_val != POINTER_VALUE) {
>                 printf("FAIL retval %d != %d ", retval, expected_val);
>                 return 1;
> --
> 2.20.1
>

^ permalink raw reply

* bpf_prog_free_deferred bug. WAS: Re: WARNING in mark_chain_precision
From: Andrii Nakryiko @ 2019-07-10 23:29 UTC (permalink / raw)
  To: Hillf Danton
  Cc: syzbot, aaron.f.brown, Alexei Starovoitov, bpf, Daniel Borkmann,
	David S. Miller, hawk, intel-wired-lan, Jakub Kicinski,
	jeffrey.t.kirsher, john fastabend, Martin Lau, open list,
	Networking, sasha.neftin, Song Liu, syzkaller-bugs, xdp-newbies,
	Yonghong Song
In-Reply-To: <000000000000a94981058d37f1a4@google.com>

On Tue, Jul 9, 2019 at 9:08 PM Hillf Danton <hdanton@sina.com> wrote:
>
>
> Mon, 08 Jul 2019 21:25:00 -0700 (PDT)
> > Hello,
> >
> > syzbot has tested the proposed patch but the reproducer still triggered
> > crash:
> > WARNING in bpf_jit_free
> >
> > WARNING: CPU: 0 PID: 9077 at kernel/bpf/core.c:851 bpf_jit_free+0x157/0x1b0
> > Kernel panic - not syncing: panic_on_warn set ...
> > CPU: 0 PID: 9077 Comm: kworker/0:3 Not tainted 5.2.0-rc6+ #1
> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> > Google 01/01/2011
> > Workqueue: events bpf_prog_free_deferred
> > Call Trace:
> >   __dump_stack lib/dump_stack.c:77 [inline]
> >   dump_stack+0x172/0x1f0 lib/dump_stack.c:113
> >   panic+0x2cb/0x744 kernel/panic.c:219
> >   __warn.cold+0x20/0x4d kernel/panic.c:576
> >   report_bug+0x263/0x2b0 lib/bug.c:186
> >   fixup_bug arch/x86/kernel/traps.c:179 [inline]
> >   fixup_bug arch/x86/kernel/traps.c:174 [inline]
> >   do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:272
> >   do_invalid_op+0x37/0x50 arch/x86/kernel/traps.c:291
> >   invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:986
> > RIP: 0010:bpf_jit_free+0x157/0x1b0
> > Code: 00 fc ff df 48 89 fa 48 c1 ea 03 80 3c 02 00 75 5d 48 b8 00 02 00 00
> > 00 00 ad de 48 39 43 70 0f 84 05 ff ff ff e8 09 7f f4 ff <0f> 0b e9 f9 fe
> > ff ff e8 2d 02 2e 00 e9 d9 fe ff ff 48 89 7d e0 e8
> > RSP: 0018:ffff888084affcb0 EFLAGS: 00010293
> > RAX: ffff88808a622100 RBX: ffff88809639d580 RCX: ffffffff817b0b0d
> > RDX: 0000000000000000 RSI: ffffffff817c4557 RDI: ffff88809639d5f0
> > RBP: ffff888084affcd0 R08: 1ffffffff150daa8 R09: fffffbfff150daa9
> > R10: fffffbfff150daa8 R11: ffffffff8a86d547 R12: ffffc90001921000
> > R13: ffff88809639d5e8 R14: ffff8880a0589800 R15: ffff8880ae834d40
> >   bpf_prog_free_deferred+0x27a/0x350 kernel/bpf/core.c:1982
> >   process_one_work+0x989/0x1790 kernel/workqueue.c:2269
> >   worker_thread+0x98/0xe40 kernel/workqueue.c:2415
> >   kthread+0x354/0x420 kernel/kthread.c:255
> >   ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352
> > Kernel Offset: disabled
> > Rebooting in 86400 seconds..
> >
> >
> > Tested on:
> >
> > commit:         b9321614 bpf: fix precision bit propagation for BPF_ST ins..
> > git tree:       https://github.com/anakryiko/linux bpf-fix-precise-bpf_st
> > console output: https://syzkaller.appspot.com/x/log.txt?x=112f0dfda00000
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=6bb3e6e7997c14f9
> > compiler:       gcc (GCC) 9.0.0 20181231 (experimental)
> >
>
> 1, currently, bpf_prog_put(), the put helper, deletes all kallsyms before
> invoking the free helper, bpf_prog_free(); the latter complains if kallsyms
> are detected not all off.
>
> 2, before commit 538950a1b752 ("soreuseport: setsockopt
> SO_ATTACH_REUSEPORT_[CE]BPF"), __bpf_prog_release() already called the put
> helper or the free one for a given prog depending on its type: put for
> BPF_PROG_TYPE_SOCKET_FILTER.
>
> In the commit we can see bpf_prog_destroy(), __bpf_prog_free(),
> bpf_prog_put(), here and then; Note in __get_bpf() the put for
> !BPF_PROG_TYPE_SOCKET_FILTER.
>
> 3, in commit 113214be7f6c ("bpf: refactor bpf_prog_get and type check into
> helper") bpf_prog_get_type() was added and put in __get_bpf().
>
> In the commit we can see other types like BPF_PROG_TYPE_SCHED_ACT and
> BPF_PROG_TYPE_SCHED_CLS.
>
> 4, in commit 8217ca653ec6 ("bpf: Enable BPF_PROG_TYPE_SK_REUSEPORT bpf prog
> in reuseport selection") sk_reuseport_prog_free() was added without a word
> in the log for it.
>
> 5, enrich prog type in __bpf_prog_release().

Thanks for investigation!

I'm curious what makes BPF_PROG_TYPE_SOCKET_FILTER (and
BPF_PROG_TYPE_SK_REUSEPORT) special, compared to all the other types
of BPF programs. If it's something to do with sockets specifically,
there are a bunch of other programs that deal with sockets, so should
they be handled specially as well (e.g., BPF_PROG_TYPE_CGROUP_SOCK)?

Daniel, do you have any insight here?

>
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -1142,11 +1142,15 @@ static void bpf_release_orig_filter(stru
>
>  static void __bpf_prog_release(struct bpf_prog *prog)
>  {
> -       if (prog->type == BPF_PROG_TYPE_SOCKET_FILTER) {
> +       switch (prog->type) {
> +       case BPF_PROG_TYPE_SOCKET_FILTER:
> +       case BPF_PROG_TYPE_SK_REUSEPORT:
>                 bpf_prog_put(prog);
> -       } else {
> +               break;
> +       default:
>                 bpf_release_orig_filter(prog);
>                 bpf_prog_free(prog);
> +               break;
>         }
>  }
>
> --
>

^ permalink raw reply

* Re: [RESEND] test_verifier #13 fails on arm64: "retval 65507 != -29"
From: Andi Kleen @ 2019-07-10 23:14 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Paolo Pisati, Yonghong Song, Daniel Borkmann, Alexei Starovoitov,
	Martin Lau, Song Liu, bpf@vger.kernel.org, Networking
In-Reply-To: <CAEf4Bzbtpqk-9ELnHFsHo278b5T4Z-2CgNnNbOqbD5Ocbuc-fg@mail.gmail.com>

> > Reading csum_partial/csum_fold, seems like after calculation of
> > checksum (so-called unfolded checksum), it is supposed to be passed
> > into csum_fold() to convert it into 16-bit one and invert.

Yes, you always need to fold at the end.

The low level code does fold sometimes, but not always.

-Andi

^ permalink raw reply

* Re: [PATCH net-next] net/mlx5e: Provide cb_list pointer when setting up tc block on rep
From: Pablo Neira Ayuso @ 2019-07-10 23:15 UTC (permalink / raw)
  To: Vlad Buslov; +Cc: netdev, jhs, xiyou.wangcong, jiri, davem, saeedm
In-Reply-To: <20190710182554.2988-1-vladbu@mellanox.com>

On Wed, Jul 10, 2019 at 09:25:54PM +0300, Vlad Buslov wrote:
> Recent refactoring of tc block offloads infrastructure introduced new
> flow_block_cb_setup_simple() method intended to be used as unified way for
> all drivers to register offload callbacks. However, commit that actually
> extended all users (drivers) with block cb list and provided it to
> flow_block infra missed mlx5 en_rep. This leads to following NULL-pointer
> dereference when creating Qdisc:
> 
> [  278.385175] BUG: kernel NULL pointer dereference, address: 0000000000000000
> [  278.393233] #PF: supervisor read access in kernel mode
> [  278.399446] #PF: error_code(0x0000) - not-present page
> [  278.405847] PGD 8000000850e73067 P4D 8000000850e73067 PUD 8620cd067 PMD 0
> [  278.414141] Oops: 0000 [#1] SMP PTI
> [  278.419019] CPU: 7 PID: 3369 Comm: tc Not tainted 5.2.0-rc6+ #492
> [  278.426580] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017
> [  278.435853] RIP: 0010:flow_block_cb_setup_simple+0xc4/0x190
> [  278.442953] Code: 10 48 89 42 08 48 89 10 48 b8 00 01 00 00 00 00 ad de 49 89 00 48 05 00 01 00 00 49 89 40 08 31 c0 c3 b8 a1 ff ff ff c3 f3 c3 <48> 8b 06 48 39 c6 75 0a eb 1a 48 8b 00 48 39 c6 74 12
>  48 3b 50 28
> [  278.464829] RSP: 0018:ffffaf07c3f97990 EFLAGS: 00010246
> [  278.471648] RAX: 0000000000000000 RBX: ffff9b43ed4c7680 RCX: ffff9b43d5f80840
> [  278.480408] RDX: ffffffffc0491650 RSI: 0000000000000000 RDI: ffffaf07c3f97998
> [  278.489110] RBP: ffff9b43ddff9000 R08: ffff9b43d5f80840 R09: 0000000000000001
> [  278.497838] R10: 0000000000000009 R11: 00000000000003ad R12: ffffaf07c3f97c08
> [  278.506595] R13: ffff9b43d5f80000 R14: ffff9b43ed4c7680 R15: ffff9b43dfa20b40
> [  278.515374] FS:  00007f796be1b400(0000) GS:ffff9b43ef840000(0000) knlGS:0000000000000000
> [  278.525099] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  278.532453] CR2: 0000000000000000 CR3: 0000000840398002 CR4: 00000000001606e0
> [  278.541197] Call Trace:
> [  278.545252]  tcf_block_offload_cmd.isra.52+0x7e/0xb0
> [  278.551871]  tcf_block_get_ext+0x365/0x3e0
> [  278.557569]  qdisc_create+0x15c/0x4e0
> [  278.562859]  ? kmem_cache_alloc_trace+0x1a2/0x1c0
> [  278.569235]  tc_modify_qdisc+0x1c8/0x780
> [  278.574761]  rtnetlink_rcv_msg+0x291/0x340
> [  278.580518]  ? _cond_resched+0x15/0x40
> [  278.585856]  ? rtnl_calcit.isra.29+0x120/0x120
> [  278.591868]  netlink_rcv_skb+0x4a/0x110
> [  278.597198]  netlink_unicast+0x1a0/0x250
> [  278.602601]  netlink_sendmsg+0x2c1/0x3c0
> [  278.608022]  sock_sendmsg+0x5b/0x60
> [  278.612969]  ___sys_sendmsg+0x289/0x310
> [  278.618231]  ? do_wp_page+0x99/0x730
> [  278.623216]  ? page_add_new_anon_rmap+0xbe/0x140
> [  278.629298]  ? __handle_mm_fault+0xc84/0x1360
> [  278.635113]  ? __sys_sendmsg+0x5e/0xa0
> [  278.640285]  __sys_sendmsg+0x5e/0xa0
> [  278.645239]  do_syscall_64+0x5b/0x1b0
> [  278.650274]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [  278.656697] RIP: 0033:0x7f796abdeb87
> [  278.661628] Code: 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 80 00 00 00 00 8b 05 6a 2b 2c 00 48 63 d2 48 63 ff 85 c0 75 18 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 59 f3 c3 0f 1f 80 00 00 00 00 53
>  48 89 f3 48
> [  278.683248] RSP: 002b:00007ffde213ba48 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
> [  278.692245] RAX: ffffffffffffffda RBX: 000000005d261e6f RCX: 00007f796abdeb87
> [  278.700862] RDX: 0000000000000000 RSI: 00007ffde213bab0 RDI: 0000000000000003
> [  278.709527] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000006
> [  278.718167] R10: 000000000000000c R11: 0000000000000246 R12: 0000000000000001
> [  278.726743] R13: 000000000067b580 R14: 0000000000000000 R15: 0000000000000000
> [  278.735302] Modules linked in: dummy vxlan ip6_udp_tunnel udp_tunnel sch_ingress nfsv3 nfs_acl nfs lockd grace fscache bridge stp llc sunrpc mlx5_ib ib_uverbs intel_rapl ib_core sb_edac x86_pkg_temp_
> thermal intel_powerclamp coretemp kvm_intel kvm mlx5_core irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel igb ghash_clmulni_intel ses mei_me enclosure mlxfw ipmi_ssif intel_cstate iTCO_wdt ptp mei
> pps_core iTCO_vendor_support pcspkr joydev intel_uncore i2c_i801 ipmi_si lpc_ich intel_rapl_perf ioatdma wmi dca pcc_cpufreq ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad ast i2c_algo_bit drm_k
> ms_helper ttm drm mpt3sas raid_class scsi_transport_sas
> [  278.802263] CR2: 0000000000000000
> [  278.807170] ---[ end trace b1f0a442a279e66f ]---
> 
> Extend en_rep with new static mlx5e_rep_block_cb_list list and pass it to
> flow_block_cb_setup_simple() function instead of hardcoded NULL pointer.
> 
> Fixes: 955bcb6ea0df ("drivers: net: use flow block API")
> Signed-off-by: Vlad Buslov <vladbu@mellanox.com>

Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>

There's a similar patch from wenxu BTW.

^ permalink raw reply

* Re: [PATCH net-next] net/sched: Fix kernel NULL pointer dereference
From: Pablo Neira Ayuso @ 2019-07-10 23:13 UTC (permalink / raw)
  To: wenxu; +Cc: davem, netfilter-devel, netdev
In-Reply-To: <20190710220337.tbflwdku4332ewo5@salvia>

On Thu, Jul 11, 2019 at 12:03:37AM +0200, Pablo Neira Ayuso wrote:
> On Wed, Jul 10, 2019 at 09:45:04PM +0800, wenxu@ucloud.cn wrote:
> > From: wenxu <wenxu@ucloud.cn>
> > 
> > [  697.665184] BUG: kernel NULL pointer dereference, address: 0000000000000030
> > [  697.665550] #PF: supervisor read access in kernel mode
> > [  697.665906] #PF: error_code(0x0000) - not-present page
> > [  697.666297] PGD 800000104e636067 P4D 800000104e636067 PUD ff4b02067 PMD 0
> > [  697.666710] Oops: 0000 [#1] SMP PTI
> > [  697.667115] CPU: 31 PID: 24466 Comm: modprobe Kdump: loaded Tainted: G           O      5.2.0-rc6+ #1
> > [  697.667867] Hardware name: Huawei Technologies Co., Ltd. RH1288 V3/BC11HGSC0, BIOS 3.57 02/26/2017
> > [  697.668620] RIP: 0010:tc_indr_block_ing_cmd.isra.52+0x4c/0xb0
> > [  697.669029] Code: 83 ec 40 65 48 8b 04 25 28 00 00 00 48 89 45 e8 31 c0 f3 48 ab 48 8b 06 49 8b b3 e8 04 00 00 44 89 45 b0 c7 45 b4 01 00 00 00 <8b> 48 30 48 89 75 c0 85 c9 48 8d 4d b0 0f 95 45 b8 48 85 c0 4c 8d
> > [  697.670132] RSP: 0018:ffffc90007bf7958 EFLAGS: 00010246
> > [  697.670537] RAX: 0000000000000000 RBX: ffff88905e2cbae8 RCX: 0000000000000000
> > [  697.670938] RDX: ffff88905e2cbcd8 RSI: ffffffff823a8480 RDI: ffffc90007bf7990
> > [  697.671352] RBP: ffffc90007bf79a8 R08: 0000000000000000 R09: ffff88905e2cbcc0
> > [  697.671761] R10: ffff888107c07780 R11: ffff88902c249000 R12: ffff88905e2cbcd0
> > [  697.672173] R13: ffff88905e2cbac0 R14: ffff88885596bc00 R15: ffff88905e2cbcc0
> > [  697.672582] FS:  00007fe0b4095740(0000) GS:ffff88905fbc0000(0000) knlGS:0000000000000000
> > [  697.673335] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [  697.673746] CR2: 0000000000000030 CR3: 0000000ff46b4005 CR4: 00000000001606e0
> > [  697.674156] Call Trace:
> > [  697.674563]  __tc_indr_block_cb_register+0x11e/0x3c0
> > [  697.674998]  mlx5e_nic_rep_netdevice_event+0x9e/0x110 [mlx5_core]
> > [  697.675411]  notifier_call_chain+0x53/0xa0
> > [  697.675812]  raw_notifier_call_chain+0x16/0x20
> > [  697.676223]  call_netdevice_notifiers_info+0x2d/0x60
> > [  697.676633]  register_netdevice+0x3fa/0x500
> > 
> > get indr_dev->block after check it.
> > 
> > Fixes: 955bcb6ea0df ("drivers: net: use flow block API")
> > Signed-off-by: wenxu <wenxu@ucloud.cn>
> 
> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>

Please, toss this patch.

Vlad's patch provides a more complete fix.

^ permalink raw reply

* Re: [RESEND] test_verifier #13 fails on arm64: "retval 65507 != -29"
From: Andrii Nakryiko @ 2019-07-10 22:54 UTC (permalink / raw)
  To: Paolo Pisati
  Cc: Yonghong Song, Daniel Borkmann, Alexei Starovoitov, Martin Lau,
	Song Liu, bpf@vger.kernel.org, Networking, ak
In-Reply-To: <CAEf4BzayQ+bEKFHcs8cUDcVnwPpQ2_2gzPaxX-j38r=AWDzVvg@mail.gmail.com>

On Wed, Jul 10, 2019 at 3:52 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> cc Andi Kleen re: refactoring, do you have any insight here?

OK, let's try again...

>
> On Wed, Jul 10, 2019 at 3:12 AM Paolo Pisati <p.pisati@gmail.com> wrote:
> >
> > On Mon, Jul 01, 2019 at 09:51:25PM +0000, Yonghong Song wrote:
> > >
> > > Below is the test case.
> > > {
> > >          "valid read map access into a read-only array 2",
> > >          .insns = {
> > >          BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
> > >          BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
> > >          BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
> > >          BPF_LD_MAP_FD(BPF_REG_1, 0),
> > >          BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
> > > BPF_FUNC_map_lookup_elem),
> > >          BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 6),
> > >
> > >          BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
> > >          BPF_MOV64_IMM(BPF_REG_2, 4),
> > >          BPF_MOV64_IMM(BPF_REG_3, 0),
> > >          BPF_MOV64_IMM(BPF_REG_4, 0),
> > >          BPF_MOV64_IMM(BPF_REG_5, 0),
> > >          BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
> > >                       BPF_FUNC_csum_diff),
> > >          BPF_EXIT_INSN(),
> > >          },
> > >          .prog_type = BPF_PROG_TYPE_SCHED_CLS,
> > >          .fixup_map_array_ro = { 3 },
> > >          .result = ACCEPT,
> > >          .retval = -29,
> > > },
> > >
> > > The issue may be with helper bpf_csum_diff().
> > > Maybe you can check bpf_csum_diff() helper return value
> > > to confirm and take a further look at bpf_csum_diff implementations
> > > between x64 and amd64.
> >
> > Indeed, the different result comes from csum_partial() or, more precisely,
> > do_csum().
>
> I assume this is checksum used for ipv4 header checksum ([0]), right?
> It's defined as "16-bit one's complement of the one's complement sum
> of all 16-bit words", so at the end it has to be folded into 16-bit
> anyways.
>
> Reading csum_partial/csum_fold, seems like after calculation of
> checksum (so-called unfolded checksum), it is supposed to be passed
> into csum_fold() to convert it into 16-bit one and invert.
>
> So maybe that's what is missing from bpf_csum_diff()? Though I've
> never used it and don't even know exactly what is its purpose, so
> might be (and probably am) totally wrong here (e.g., it might be that
> BPF app is supposed to do that or something).
>
>   [0] https://en.wikipedia.org/wiki/IPv4_header_checksum
>
> >
> > x86-64 uses an asm optimized version residing in arch/x86/lib/csum-partial_64.c,
> > while the generic version is in lib/checksum.c.
> >
> > I replaced the x86-64 csum_partial() / do_csum() code, with the one in
> > lib/checksum.c and by doing so i reproduced the same error on x86-64 (thus, it's
> > not an arch dependent issue).
> >
> > I added some debugging to bpf_csum_diff(), and here are the results with different
> > checksum implementation code:
> >
> > http://paste.debian.net/1091037/
> >
> > lib/checksum.c:
> > ...
> > [  206.084537] ____bpf_csum_diff from_size: 1 to_size: 0
> > [  206.085274] ____bpf_csum_diff from[0]: 28
> > [  206.085276] ____bpf_csum_diff diff[0]: 4294967267
> > [  206.085277] ____bpf_csum_diff diff_size: 4 seed: 0
> >
> > After csum_partial() call:
> >
> > [  206.086059] ____bpf_csum_diff csum: 65507 - 0xffe3
> >
> > arch/x86/lib/csum-partial_64.c
> > ...
> > [   40.467308] ____bpf_csum_diff from_size: 1 to_size: 0
> > [   40.468141] ____bpf_csum_diff from[0]: 28
> > [   40.468143] ____bpf_csum_diff diff[0]: 4294967267
> > [   40.468144] ____bpf_csum_diff diff_size: 4 seed: 0
> >
> > After csum_partial() call:
> >
> > [   40.468937] ____bpf_csum_diff csum: -29 - 0xffffffe3
> >
> > One thing that i noticed, x86-64 csum-partial_64.c::do_csum() doesn't reduce the
> > calculated checksum to 16bit before returning it (unless the input value is
> > odd - *):
> >
> > arch/x86/lib/csum-partial_64.c::do_csum()
> >                 ...
> >         if (unlikely(odd)) {
> >                 result = from32to16(result);
> >                 result = ((result >> 8) & 0xff) | ((result & 0xff) << 8);
> >         }
> >         return result;
> > }
> >
> > contrary to all the other do_csum() implementations (that i could understand):
> >
> > lib/checksum.c::do_csum()
> > arch/alpha/lib/checksum.c::do_csum()
> > arch/parisc/lib/checksum.c::do_csum()
> >
> > Apparently even ia64 does the folding (arch/ia64/lib/do_csum.S see a comment right
> > before .do_csum_exit:), and arch/c6x/lib/csum_64plus.S too (see
> > arch/c6x/lib/csum_64plus.S).
> >
> > Funnily enough, if i change do_csum() for x86-64, folding the
> > checksum to 16 bit (following all the other implementations):
> >
> > --- a/arch/x86/lib/csum-partial_64.c
> > +++ b/arch/x86/lib/csum-partial_64.c
> > @@ -112,8 +112,8 @@ static unsigned do_csum(const unsigned char *buff, unsigned
> > len)
> >         if (len & 1)
> >                 result += *buff;
> >         result = add32_with_carry(result>>32, result & 0xffffffff);
> > +       result = from32to16(result);
> >         if (unlikely(odd)) {
> > -               result = from32to16(result);
> >                 result = ((result >> 8) & 0xff) | ((result & 0xff) << 8);
> >         }
> >         return result;
> >
> > then, the x86-64 result match the others: 65507 or 0xffe3.
> >
> > As a last attempt, i tried running the bpf test_verifier on an armhf platform,
> > and i got a completely different number:
> >
> > [   57.667999] ____bpf_csum_diff from_size: 1 to_size: 0
> > [   57.668016] ____bpf_csum_diff from[0]: 28
> > [   57.668028] ____bpf_csum_diff diff[0]: 4294967267
> > [   57.668039] ____bpf_csum_diff diff_size: 4 seed: 0
> >
> > After csum_partial() call:
> >
> > [   57.668052] ____bpf_csum_diff::2002 csum: 131042 - 0x0001ffe2
> >
> > Not sure what to make of these number, but i have a question: whats is the
> > correct checksum of the memory chunk passed to csum_partial()? Is it really -29?
> >
> > Because, at least 2 other implementations i tested (the arm assembly code and
> > the c implementation in lib/checksum.c) computes a different value, so either
> > there's a bug in checksum calcution (2 out of 3???), or we are interpreting the
> > returned value from csum_partial() somehow wrongly.
> >
> > *: originally, the x86-64 did the 16bit folding, but the logic was changed to
> > what we have today during a big rewrite - search for:
> >
> > commit 3ef076bb685a461bbaff37a1f06010fc4d7ce733
> > Author: Andi Kleen <ak@suse.de>
> > Date:   Fri Jun 13 04:27:34 2003 -0700
> >
> >     [PATCH] x86-64 merge
> >
> > in this historic repo:
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
> > --
> > bye,
> > p.

^ permalink raw reply

* Re: [RESEND] test_verifier #13 fails on arm64: "retval 65507 != -29"
From: Andrii Nakryiko @ 2019-07-10 22:52 UTC (permalink / raw)
  To: Paolo Pisati
  Cc: Yonghong Song, Daniel Borkmann, Alexei Starovoitov, Martin Lau,
	Song Liu, bpf@vger.kernel.org, Networking, ak
In-Reply-To: <20190710100248.GA32281@harukaze>

cc Andi Kleen re: refactoring, do you have any insight here?

On Wed, Jul 10, 2019 at 3:12 AM Paolo Pisati <p.pisati@gmail.com> wrote:
>
> On Mon, Jul 01, 2019 at 09:51:25PM +0000, Yonghong Song wrote:
> >
> > Below is the test case.
> > {
> >          "valid read map access into a read-only array 2",
> >          .insns = {
> >          BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
> >          BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
> >          BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
> >          BPF_LD_MAP_FD(BPF_REG_1, 0),
> >          BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
> > BPF_FUNC_map_lookup_elem),
> >          BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 6),
> >
> >          BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
> >          BPF_MOV64_IMM(BPF_REG_2, 4),
> >          BPF_MOV64_IMM(BPF_REG_3, 0),
> >          BPF_MOV64_IMM(BPF_REG_4, 0),
> >          BPF_MOV64_IMM(BPF_REG_5, 0),
> >          BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
> >                       BPF_FUNC_csum_diff),
> >          BPF_EXIT_INSN(),
> >          },
> >          .prog_type = BPF_PROG_TYPE_SCHED_CLS,
> >          .fixup_map_array_ro = { 3 },
> >          .result = ACCEPT,
> >          .retval = -29,
> > },
> >
> > The issue may be with helper bpf_csum_diff().
> > Maybe you can check bpf_csum_diff() helper return value
> > to confirm and take a further look at bpf_csum_diff implementations
> > between x64 and amd64.
>
> Indeed, the different result comes from csum_partial() or, more precisely,
> do_csum().

I assume this is checksum used for ipv4 header checksum ([0]), right?
It's defined as "16-bit one's complement of the one's complement sum
of all 16-bit words", so at the end it has to be folded into 16-bit
anyways.

Reading csum_partial/csum_fold, seems like after calculation of
checksum (so-called unfolded checksum), it is supposed to be passed
into csum_fold() to convert it into 16-bit one and invert.

So maybe that's what is missing from bpf_csum_diff()? Though I've
never used it and don't even know exactly what is its purpose, so
might be (and probably am) totally wrong here (e.g., it might be that
BPF app is supposed to do that or something).

  [0] https://en.wikipedia.org/wiki/IPv4_header_checksum

>
> x86-64 uses an asm optimized version residing in arch/x86/lib/csum-partial_64.c,
> while the generic version is in lib/checksum.c.
>
> I replaced the x86-64 csum_partial() / do_csum() code, with the one in
> lib/checksum.c and by doing so i reproduced the same error on x86-64 (thus, it's
> not an arch dependent issue).
>
> I added some debugging to bpf_csum_diff(), and here are the results with different
> checksum implementation code:
>
> http://paste.debian.net/1091037/
>
> lib/checksum.c:
> ...
> [  206.084537] ____bpf_csum_diff from_size: 1 to_size: 0
> [  206.085274] ____bpf_csum_diff from[0]: 28
> [  206.085276] ____bpf_csum_diff diff[0]: 4294967267
> [  206.085277] ____bpf_csum_diff diff_size: 4 seed: 0
>
> After csum_partial() call:
>
> [  206.086059] ____bpf_csum_diff csum: 65507 - 0xffe3
>
> arch/x86/lib/csum-partial_64.c
> ...
> [   40.467308] ____bpf_csum_diff from_size: 1 to_size: 0
> [   40.468141] ____bpf_csum_diff from[0]: 28
> [   40.468143] ____bpf_csum_diff diff[0]: 4294967267
> [   40.468144] ____bpf_csum_diff diff_size: 4 seed: 0
>
> After csum_partial() call:
>
> [   40.468937] ____bpf_csum_diff csum: -29 - 0xffffffe3
>
> One thing that i noticed, x86-64 csum-partial_64.c::do_csum() doesn't reduce the
> calculated checksum to 16bit before returning it (unless the input value is
> odd - *):
>
> arch/x86/lib/csum-partial_64.c::do_csum()
>                 ...
>         if (unlikely(odd)) {
>                 result = from32to16(result);
>                 result = ((result >> 8) & 0xff) | ((result & 0xff) << 8);
>         }
>         return result;
> }
>
> contrary to all the other do_csum() implementations (that i could understand):
>
> lib/checksum.c::do_csum()
> arch/alpha/lib/checksum.c::do_csum()
> arch/parisc/lib/checksum.c::do_csum()
>
> Apparently even ia64 does the folding (arch/ia64/lib/do_csum.S see a comment right
> before .do_csum_exit:), and arch/c6x/lib/csum_64plus.S too (see
> arch/c6x/lib/csum_64plus.S).
>
> Funnily enough, if i change do_csum() for x86-64, folding the
> checksum to 16 bit (following all the other implementations):
>
> --- a/arch/x86/lib/csum-partial_64.c
> +++ b/arch/x86/lib/csum-partial_64.c
> @@ -112,8 +112,8 @@ static unsigned do_csum(const unsigned char *buff, unsigned
> len)
>         if (len & 1)
>                 result += *buff;
>         result = add32_with_carry(result>>32, result & 0xffffffff);
> +       result = from32to16(result);
>         if (unlikely(odd)) {
> -               result = from32to16(result);
>                 result = ((result >> 8) & 0xff) | ((result & 0xff) << 8);
>         }
>         return result;
>
> then, the x86-64 result match the others: 65507 or 0xffe3.
>
> As a last attempt, i tried running the bpf test_verifier on an armhf platform,
> and i got a completely different number:
>
> [   57.667999] ____bpf_csum_diff from_size: 1 to_size: 0
> [   57.668016] ____bpf_csum_diff from[0]: 28
> [   57.668028] ____bpf_csum_diff diff[0]: 4294967267
> [   57.668039] ____bpf_csum_diff diff_size: 4 seed: 0
>
> After csum_partial() call:
>
> [   57.668052] ____bpf_csum_diff::2002 csum: 131042 - 0x0001ffe2
>
> Not sure what to make of these number, but i have a question: whats is the
> correct checksum of the memory chunk passed to csum_partial()? Is it really -29?
>
> Because, at least 2 other implementations i tested (the arm assembly code and
> the c implementation in lib/checksum.c) computes a different value, so either
> there's a bug in checksum calcution (2 out of 3???), or we are interpreting the
> returned value from csum_partial() somehow wrongly.
>
> *: originally, the x86-64 did the 16bit folding, but the logic was changed to
> what we have today during a big rewrite - search for:
>
> commit 3ef076bb685a461bbaff37a1f06010fc4d7ce733
> Author: Andi Kleen <ak@suse.de>
> Date:   Fri Jun 13 04:27:34 2003 -0700
>
>     [PATCH] x86-64 merge
>
> in this historic repo:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
> --
> bye,
> p.

^ permalink raw reply

* Re: [PATCH net] net: fix use-after-free in __netif_receive_skb_core
From: Sabrina Dubroca @ 2019-07-10 22:47 UTC (permalink / raw)
  To: Edward Cree; +Cc: netdev, Andreas Steinmetz
In-Reply-To: <62ad16f6-c33a-407e-2f55-9be382b7ec52@solarflare.com>

2019-07-10, 16:07:43 +0100, Edward Cree wrote:
> On 10/07/2019 14:52, Sabrina Dubroca wrote:
> > -static int __netif_receive_skb_core(struct sk_buff *skb, bool pfmemalloc,
> > +static int __netif_receive_skb_core(struct sk_buff **pskb, bool pfmemalloc,
> >  				    struct packet_type **ppt_prev)
> >  {
> >  	struct packet_type *ptype, *pt_prev;
> >  	rx_handler_func_t *rx_handler;
> > +	struct sk_buff *skb = *pskb;
> Would it not be simpler just to change all users of skb to *pskb?
> Then you avoid having to keep doing "*pskb = skb;" whenever skb changes
>  (with concomitant risk of bugs if one gets missed).

Yes, that would be less risky. I wrote a version of the patch that did
exactly that, but found it really too ugly (both the patch and the
resulting code). We have more than 50 occurences of skb, including
things like:

    atomic_long_inc(&skb->dev->rx_dropped);

-- 
Sabrina

^ permalink raw reply

* Re: Fw: [Bug 204099] New: systemd-networkd fails on 5.2 - same version works on 5.1.16
From: David Ahern @ 2019-07-10 22:43 UTC (permalink / raw)
  To: Stephen Hemminger, netdev
In-Reply-To: <20190709074344.76049d02@hermes.lan>

On 7/9/19 8:43 AM, Stephen Hemminger wrote:
> Looks like the stricter netlink validation broke userspace.
> This is bad.

I believe other reports have traced this to

commit 7dc2bccab0ee37ac28096b8fcdc390a679a15841
Author: Maxim Mikityanskiy <maximmi@mellanox.com>
Date:   Tue May 21 06:40:04 2019 +0000

    Validate required parameters in inet6_validate_link_af

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox