Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH net-next] net/mlx5e: Provide cb_list pointer when setting up tc block on rep
From: wenxu @ 2019-07-11  0:35 UTC (permalink / raw)
  To: Vlad Buslov, netdev; +Cc: jhs, xiyou.wangcong, jiri, davem, pablo, saeedm
In-Reply-To: <20190710182554.2988-1-vladbu@mellanox.com>


在 2019/7/11 2:25, Vlad Buslov 写道:
> Recent refactoring of tc block offloads infrastructure introduced new
> flow_block_cb_setup_simple() method intended to be used as unified way for
> all drivers to register offload callbacks. However, commit that actually
> extended all users (drivers) with block cb list and provided it to
> flow_block infra missed mlx5 en_rep. This leads to following NULL-pointer
> dereference when creating Qdisc:
>
> [  278.385175] BUG: kernel NULL pointer dereference, address: 0000000000000000
> [  278.393233] #PF: supervisor read access in kernel mode
> [  278.399446] #PF: error_code(0x0000) - not-present page
> [  278.405847] PGD 8000000850e73067 P4D 8000000850e73067 PUD 8620cd067 PMD 0
> [  278.414141] Oops: 0000 [#1] SMP PTI
> [  278.419019] CPU: 7 PID: 3369 Comm: tc Not tainted 5.2.0-rc6+ #492
> [  278.426580] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017
> [  278.435853] RIP: 0010:flow_block_cb_setup_simple+0xc4/0x190
> [  278.442953] Code: 10 48 89 42 08 48 89 10 48 b8 00 01 00 00 00 00 ad de 49 89 00 48 05 00 01 00 00 49 89 40 08 31 c0 c3 b8 a1 ff ff ff c3 f3 c3 <48> 8b 06 48 39 c6 75 0a eb 1a 48 8b 00 48 39 c6 74 12
>  48 3b 50 28
> [  278.464829] RSP: 0018:ffffaf07c3f97990 EFLAGS: 00010246
> [  278.471648] RAX: 0000000000000000 RBX: ffff9b43ed4c7680 RCX: ffff9b43d5f80840
> [  278.480408] RDX: ffffffffc0491650 RSI: 0000000000000000 RDI: ffffaf07c3f97998
> [  278.489110] RBP: ffff9b43ddff9000 R08: ffff9b43d5f80840 R09: 0000000000000001
> [  278.497838] R10: 0000000000000009 R11: 00000000000003ad R12: ffffaf07c3f97c08
> [  278.506595] R13: ffff9b43d5f80000 R14: ffff9b43ed4c7680 R15: ffff9b43dfa20b40
> [  278.515374] FS:  00007f796be1b400(0000) GS:ffff9b43ef840000(0000) knlGS:0000000000000000
> [  278.525099] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  278.532453] CR2: 0000000000000000 CR3: 0000000840398002 CR4: 00000000001606e0
> [  278.541197] Call Trace:
> [  278.545252]  tcf_block_offload_cmd.isra.52+0x7e/0xb0
> [  278.551871]  tcf_block_get_ext+0x365/0x3e0
> [  278.557569]  qdisc_create+0x15c/0x4e0
> [  278.562859]  ? kmem_cache_alloc_trace+0x1a2/0x1c0
> [  278.569235]  tc_modify_qdisc+0x1c8/0x780
> [  278.574761]  rtnetlink_rcv_msg+0x291/0x340
> [  278.580518]  ? _cond_resched+0x15/0x40
> [  278.585856]  ? rtnl_calcit.isra.29+0x120/0x120
> [  278.591868]  netlink_rcv_skb+0x4a/0x110
> [  278.597198]  netlink_unicast+0x1a0/0x250
> [  278.602601]  netlink_sendmsg+0x2c1/0x3c0
> [  278.608022]  sock_sendmsg+0x5b/0x60
> [  278.612969]  ___sys_sendmsg+0x289/0x310
> [  278.618231]  ? do_wp_page+0x99/0x730
> [  278.623216]  ? page_add_new_anon_rmap+0xbe/0x140
> [  278.629298]  ? __handle_mm_fault+0xc84/0x1360
> [  278.635113]  ? __sys_sendmsg+0x5e/0xa0
> [  278.640285]  __sys_sendmsg+0x5e/0xa0
> [  278.645239]  do_syscall_64+0x5b/0x1b0
> [  278.650274]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [  278.656697] RIP: 0033:0x7f796abdeb87
> [  278.661628] Code: 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 80 00 00 00 00 8b 05 6a 2b 2c 00 48 63 d2 48 63 ff 85 c0 75 18 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 59 f3 c3 0f 1f 80 00 00 00 00 53
>  48 89 f3 48
> [  278.683248] RSP: 002b:00007ffde213ba48 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
> [  278.692245] RAX: ffffffffffffffda RBX: 000000005d261e6f RCX: 00007f796abdeb87
> [  278.700862] RDX: 0000000000000000 RSI: 00007ffde213bab0 RDI: 0000000000000003
> [  278.709527] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000006
> [  278.718167] R10: 000000000000000c R11: 0000000000000246 R12: 0000000000000001
> [  278.726743] R13: 000000000067b580 R14: 0000000000000000 R15: 0000000000000000
> [  278.735302] Modules linked in: dummy vxlan ip6_udp_tunnel udp_tunnel sch_ingress nfsv3 nfs_acl nfs lockd grace fscache bridge stp llc sunrpc mlx5_ib ib_uverbs intel_rapl ib_core sb_edac x86_pkg_temp_
> thermal intel_powerclamp coretemp kvm_intel kvm mlx5_core irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel igb ghash_clmulni_intel ses mei_me enclosure mlxfw ipmi_ssif intel_cstate iTCO_wdt ptp mei
> pps_core iTCO_vendor_support pcspkr joydev intel_uncore i2c_i801 ipmi_si lpc_ich intel_rapl_perf ioatdma wmi dca pcc_cpufreq ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad ast i2c_algo_bit drm_k
> ms_helper ttm drm mpt3sas raid_class scsi_transport_sas
> [  278.802263] CR2: 0000000000000000
> [  278.807170] ---[ end trace b1f0a442a279e66f ]---
>
> Extend en_rep with new static mlx5e_rep_block_cb_list list and pass it to
> flow_block_cb_setup_simple() function instead of hardcoded NULL pointer.
>
> Fixes: 955bcb6ea0df ("drivers: net: use flow block API")
> Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/en_rep.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
> index 10ef90a7bddd..7245d287633d 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
> @@ -1175,6 +1175,8 @@ static int mlx5e_rep_setup_tc_cb(enum tc_setup_type type, void *type_data,
>  	}
>  }
>  
> +static LIST_HEAD(mlx5e_rep_block_cb_list);
> +

I think it is not necessary needs a extra LIST_HEAD, the early mlx5e_block_cb_list is ok

The early patch  http://patchwork.ozlabs.org/patch/1130439/ is enough.

>  static int mlx5e_rep_setup_tc(struct net_device *dev, enum tc_setup_type type,
>  			      void *type_data)
>  {
> @@ -1182,7 +1184,8 @@ static int mlx5e_rep_setup_tc(struct net_device *dev, enum tc_setup_type type,
>  
>  	switch (type) {
>  	case TC_SETUP_BLOCK:
> -		return flow_block_cb_setup_simple(type_data, NULL,
> +		return flow_block_cb_setup_simple(type_data,
> +						  &mlx5e_rep_block_cb_list,
>  						  mlx5e_rep_setup_tc_cb,
>  						  priv, priv, true);
>  	default:

^ permalink raw reply

* Re: [PATCH bpf] bpf: fix BTF verifier size resolution logic
From: Andrii Nakryiko @ 2019-07-11  0:29 UTC (permalink / raw)
  To: Yonghong Song
  Cc: Andrii Nakryiko, Alexei Starovoitov, daniel@iogearbox.net,
	bpf@vger.kernel.org, netdev@vger.kernel.org, Kernel Team,
	Martin Lau
In-Reply-To: <f6bc7a95-e8e1-eec4-9728-3b9e36b434fa@fb.com>

On Wed, Jul 10, 2019 at 5:16 PM Yonghong Song <yhs@fb.com> wrote:
>
>
>
> On 7/10/19 1:08 AM, Andrii Nakryiko wrote:
> > BTF verifier has Different logic depending on whether we are following
> > a PTR or STRUCT/ARRAY (or something else). This is an optimization to
> > stop early in DFS traversal while resolving BTF types. But it also
> > results in a size resolution bug, when there is a chain, e.g., of PTR ->
> > TYPEDEF -> ARRAY, in which case due to being in pointer context ARRAY
> > size won't be resolved, as it is considered to be a sink for pointer,
> > leading to TYPEDEF being in RESOLVED state with zero size, which is
> > completely wrong.
> >
> > Optimization is doubtful, though, as btf_check_all_types() will iterate
> > over all BTF types anyways, so the only saving is a potentially slightly
> > shorter stack. But correctness is more important that tiny savings.
> >
> > This bug manifests itself in rejecting BTF-defined maps that use array
> > typedef as a value type:
> >
> > typedef int array_t[16];
> >
> > struct {
> >       __uint(type, BPF_MAP_TYPE_ARRAY);
> >       __type(value, array_t); /* i.e., array_t *value; */
> > } test_map SEC(".maps");
> >
> > Fixes: eb3f595dab40 ("bpf: btf: Validate type reference")
> > Cc: Martin KaFai Lau <kafai@fb.com>
> > Signed-off-by: Andrii Nakryiko <andriin@fb.com>
>
> The change seems okay to me. Currently, looks like intermediate
> modifier type will carry size = 0 (in the internal data structure).

Yes, which is totally wrong, especially that we use that size in some
cases to reject map with specified BTF.

>
> If we remove RESOLVE logic, we probably want to double check
> whether we handle circular types correctly or not. Maybe we will
> be okay if all self tests pass.

I checked, it does. We'll attempt to add referenced type unless it's a
"resolve sink" (where size is immediately known) or is already
resolved (it's state is RESOLVED). In other cases, we'll attempt to
env_stack_push(), which check that the state of that type is
NOT_VISITED. If it's RESOLVED or VISITED, it returns -EEXISTS. When
type is added into the stack, it's resolve state goes from NOT_VISITED
to VISITED.

So, if there is a loop, then we'll detect it as soon as we'll attempt
to add the same type onto the stack second time.

>
> I may still be worthwhile to qualify the RESOLVE optimization benefit
> before removing it.

I don't think there is any, because every type will be visited exactly
once, due to DFS nature of algorithm. The only difference is that if
we have a long chain of modifiers, we can technically reach the max
limit and fail. But at 32 I think it's pretty unrealistic to have such
a long chain of PTR/TYPEDEF/CONST/VOLATILE/RESTRICTs :)

>
> Another possible change is, for external usage, removing
> modifiers, before checking the size, something like below.
> Note that I am not strongly advocating my below patch as
> it has the same shortcoming that maintained modifier type
> size may not be correct.

I don't think your patch helps, it can actually confuse things even
more. It skips modifiers until underlying type is found, but you still
don't guarantee that at that time that underlying type will have its
size resolved.

>
> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> index 546ebee39e2a..6f927c3e0a89 100644
> --- a/kernel/bpf/btf.c
> +++ b/kernel/bpf/btf.c
> @@ -620,6 +620,54 @@ static bool btf_type_int_is_regular(const struct
> btf_type *t)
>          return true;
>   }
>
> +static const struct btf_type *__btf_type_id_size(const struct btf *btf,
> +                                                u32 *type_id, u32
> *ret_size,
> +                                                bool skip_modifier)
> +{
> +       const struct btf_type *size_type;
> +       u32 size_type_id = *type_id;
> +       u32 size = 0;
> +
> +       size_type = btf_type_by_id(btf, size_type_id);
> +       if (size_type && skip_modifier) {
> +               while (btf_type_is_modifier(size_type))
> +                       size_type = btf_type_by_id(btf, size_type->type);
> +       }
> +
> +       if (btf_type_nosize_or_null(size_type))
> +               return NULL;
> +
> +       if (btf_type_has_size(size_type)) {
> +               size = size_type->size;
> +       } else if (btf_type_is_array(size_type)) {
> +               size = btf->resolved_sizes[size_type_id];
> +       } else if (btf_type_is_ptr(size_type)) {
> +               size = sizeof(void *);
> +       } else {
> +               if (WARN_ON_ONCE(!btf_type_is_modifier(size_type) &&
> +                                !btf_type_is_var(size_type)))
> +                       return NULL;
> +
> +               size = btf->resolved_sizes[size_type_id];
> +               size_type_id = btf->resolved_ids[size_type_id];
> +               size_type = btf_type_by_id(btf, size_type_id);
> +               if (btf_type_nosize_or_null(size_type))
> +                       return NULL;
> +       }
> +
> +       *type_id = size_type_id;
> +       if (ret_size)
> +               *ret_size = size;
> +
> +       return size_type;
> +}
> +
> +const struct btf_type *btf_type_id_size(const struct btf *btf,
> +                                       u32 *type_id, u32 *ret_size)
> +{
> +       return __btf_type_id_size(btf, type_id, ret_size, true);
> +}
> +
>   /*
>    * Check that given struct member is a regular int with expected
>    * offset and size.
> @@ -633,7 +681,7 @@ bool btf_member_is_reg_int(const struct btf *btf,
> const struct btf_type *s,
>          u8 nr_bits;
>
>          id = m->type;
> -       t = btf_type_id_size(btf, &id, NULL);
> +       t = __btf_type_id_size(btf, &id, NULL, false);
>          if (!t || !btf_type_is_int(t))
>                  return false;
>
> @@ -1051,42 +1099,6 @@ static const struct btf_type
> *btf_type_id_resolve(const struct btf *btf,
>          return btf_type_by_id(btf, *type_id);
>   }
>
> -const struct btf_type *btf_type_id_size(const struct btf *btf,
> -                                       u32 *type_id, u32 *ret_size)
> -{
> -       const struct btf_type *size_type;
> -       u32 size_type_id = *type_id;
> -       u32 size = 0;
> -
> -       size_type = btf_type_by_id(btf, size_type_id);
> -       if (btf_type_nosize_or_null(size_type))
> -               return NULL;
> -
> -       if (btf_type_has_size(size_type)) {
> -               size = size_type->size;
> -       } else if (btf_type_is_array(size_type)) {
> -               size = btf->resolved_sizes[size_type_id];
> -       } else if (btf_type_is_ptr(size_type)) {
> -               size = sizeof(void *);
> -       } else {
> -               if (WARN_ON_ONCE(!btf_type_is_modifier(size_type) &&
> -                                !btf_type_is_var(size_type)))
> -                       return NULL;
> -
> -               size = btf->resolved_sizes[size_type_id];
> -               size_type_id = btf->resolved_ids[size_type_id];
> -               size_type = btf_type_by_id(btf, size_type_id);
> -               if (btf_type_nosize_or_null(size_type))
> -                       return NULL;
> -       }
> -
> -       *type_id = size_type_id;
> -       if (ret_size)
> -               *ret_size = size;
> -
> -       return size_type;
> -}
> -
>   static int btf_df_check_member(struct btf_verifier_env *env,
>                                 const struct btf_type *struct_type,
>                                 const struct btf_member *member,
> @@ -1489,7 +1501,7 @@ static int btf_modifier_check_member(struct
> btf_verifier_env *env,
>          struct btf_member resolved_member;
>          struct btf *btf = env->btf;
>
> -       resolved_type = btf_type_id_size(btf, &resolved_type_id, NULL);
> +       resolved_type = __btf_type_id_size(btf, &resolved_type_id, NULL,
> false);
>          if (!resolved_type) {
>                  btf_verifier_log_member(env, struct_type, member,
>                                          "Invalid member");
> @@ -1514,7 +1526,7 @@ static int btf_modifier_check_kflag_member(struct
> btf_verifier_env *env,
>          struct btf_member resolved_member;
>          struct btf *btf = env->btf;
>
> -       resolved_type = btf_type_id_size(btf, &resolved_type_id, NULL);
> +       resolved_type = __btf_type_id_size(btf, &resolved_type_id, NULL,
> false);
>          if (!resolved_type) {
>                  btf_verifier_log_member(env, struct_type, member,
>                                          "Invalid member");
> @@ -1620,7 +1632,7 @@ static int btf_modifier_resolve(struct
> btf_verifier_env *env,
>           * save us a few type-following when we use it later (e.g. in
>           * pretty print).
>           */
> -       if (!btf_type_id_size(btf, &next_type_id, &next_type_size)) {
> +       if (!__btf_type_id_size(btf, &next_type_id, &next_type_size,
> false)) {
>                  if (env_type_is_resolved(env, next_type_id))
>                          next_type = btf_type_id_resolve(btf,
> &next_type_id);
>
> @@ -1675,7 +1687,7 @@ static int btf_var_resolve(struct btf_verifier_env
> *env,
>           * forward types or similar that would resolve to size of
>           * zero is allowed.
>           */
> -       if (!btf_type_id_size(btf, &next_type_id, &next_type_size)) {
> +       if (!__btf_type_id_size(btf, &next_type_id, &next_type_size,
> false)) {
>                  btf_verifier_log_type(env, v->t, "Invalid type_id");
>                  return -EINVAL;
>          }
> @@ -1725,7 +1737,7 @@ static int btf_ptr_resolve(struct btf_verifier_env
> *env,
>                                                resolved_type_id);
>          }
>
> -       if (!btf_type_id_size(btf, &next_type_id, NULL)) {
> +       if (!__btf_type_id_size(btf, &next_type_id, NULL, false)) {
>                  if (env_type_is_resolved(env, next_type_id))
>                          next_type = btf_type_id_resolve(btf,
> &next_type_id);
>
> @@ -1851,7 +1863,7 @@ static int btf_array_check_member(struct
> btf_verifier_env *env,
>          }
>
>          array_type_id = member->type;
> -       btf_type_id_size(btf, &array_type_id, &array_size);
> +       __btf_type_id_size(btf, &array_type_id, &array_size, false);
>          struct_size = struct_type->size;
>          bytes_offset = BITS_ROUNDDOWN_BYTES(struct_bits_off);
>          if (struct_size - bytes_offset < array_size) {
> @@ -1938,7 +1950,7 @@ static int btf_array_resolve(struct
> btf_verifier_env *env,
>              !env_type_is_resolved(env, index_type_id))
>                  return env_stack_push(env, index_type, index_type_id);
>
> -       index_type = btf_type_id_size(btf, &index_type_id, NULL);
> +       index_type = __btf_type_id_size(btf, &index_type_id, NULL, false);
>          if (!index_type || !btf_type_is_int(index_type) ||
>              !btf_type_int_is_regular(index_type)) {
>                  btf_verifier_log_type(env, v->t, "Invalid index");
> @@ -1959,7 +1971,7 @@ static int btf_array_resolve(struct
> btf_verifier_env *env,
>              !env_type_is_resolved(env, elem_type_id))
>                  return env_stack_push(env, elem_type, elem_type_id);
>
> -       elem_type = btf_type_id_size(btf, &elem_type_id, &elem_size);
> +       elem_type = __btf_type_id_size(btf, &elem_type_id, &elem_size,
> false);
>          if (!elem_type) {
>                  btf_verifier_log_type(env, v->t, "Invalid elem");
>                  return -EINVAL;
> @@ -2000,7 +2012,7 @@ static void btf_array_seq_show(const struct btf
> *btf, const struct btf_type *t,
>          u32 i, elem_size, elem_type_id;
>
>          elem_type_id = array->type;
> -       elem_type = btf_type_id_size(btf, &elem_type_id, &elem_size);
> +       elem_type = __btf_type_id_size(btf, &elem_type_id, &elem_size,
> false);
>          elem_ops = btf_type_ops(elem_type);
>          seq_puts(m, "[");
>          for (i = 0; i < array->nelems; i++) {
> @@ -2732,7 +2744,7 @@ static int btf_datasec_resolve(struct
> btf_verifier_env *env,
>                  }
>
>                  type_id = var_type->type;
> -               if (!btf_type_id_size(btf, &type_id, &type_size)) {
> +               if (!__btf_type_id_size(btf, &type_id, &type_size, false)) {
>                          btf_verifier_log_vsi(env, v->t, vsi, "Invalid
> type");
>                          return -EINVAL;
>                  }
> @@ -2813,7 +2825,7 @@ static int btf_func_proto_check(struct
> btf_verifier_env *env,
>                  }
>
>                  /* Ensure the return type is a type that has a size */
> -               if (!btf_type_id_size(btf, &ret_type_id, NULL)) {
> +               if (!__btf_type_id_size(btf, &ret_type_id, NULL, false)) {
>                          btf_verifier_log_type(env, t, "Invalid return
> type");
>                          return -EINVAL;
>                  }
> @@ -2861,7 +2873,7 @@ static int btf_func_proto_check(struct
> btf_verifier_env *env,
>                                  break;
>                  }
>
> -               if (!btf_type_id_size(btf, &arg_type_id, NULL)) {
> +               if (!__btf_type_id_size(btf, &arg_type_id, NULL, false)) {
>                          btf_verifier_log_type(env, t, "Invalid arg#%u",
> i + 1);
>                          err = -EINVAL;
>                          break;
> @@ -3014,7 +3026,7 @@ static bool btf_resolve_valid(struct
> btf_verifier_env *env,
>                  u32 elem_type_id = array->type;
>                  u32 elem_size;
>
> -               elem_type = btf_type_id_size(btf, &elem_type_id,
> &elem_size);
> +               elem_type = __btf_type_id_size(btf, &elem_type_id,
> &elem_size, false);
>                  return elem_type && !btf_type_is_modifier(elem_type) &&
>                          (array->nelems * elem_size ==
>                           btf->resolved_sizes[type_id]);
>
>
> > ---
> >   kernel/bpf/btf.c | 42 +++---------------------------------------
> >   1 file changed, 3 insertions(+), 39 deletions(-)
> >
> > diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> > index cad09858a5f2..c68c7e73b0d1 100644
> > --- a/kernel/bpf/btf.c
> > +++ b/kernel/bpf/btf.c
> > @@ -231,14 +231,6 @@ enum visit_state {
> >       RESOLVED,
> >   };
> >
> > -enum resolve_mode {
> > -     RESOLVE_TBD,    /* To Be Determined */
> > -     RESOLVE_PTR,    /* Resolving for Pointer */
> > -     RESOLVE_STRUCT_OR_ARRAY,        /* Resolving for struct/union
> > -                                      * or array
> > -                                      */
> > -};
> > -
> >   #define MAX_RESOLVE_DEPTH 32
> >
> >   struct btf_sec_info {
> > @@ -254,7 +246,6 @@ struct btf_verifier_env {
> >       u32 log_type_id;
> >       u32 top_stack;
> >       enum verifier_phase phase;
> > -     enum resolve_mode resolve_mode;
> >   };
> >
> >   static const char * const btf_kind_str[NR_BTF_KINDS] = {
> > @@ -964,26 +955,7 @@ static void btf_verifier_env_free(struct btf_verifier_env *env)
> >   static bool env_type_is_resolve_sink(const struct btf_verifier_env *env,
> >                                    const struct btf_type *next_type)
> >   {
> > -     switch (env->resolve_mode) {
> > -     case RESOLVE_TBD:
> > -             /* int, enum or void is a sink */
> > -             return !btf_type_needs_resolve(next_type);
> > -     case RESOLVE_PTR:
> > -             /* int, enum, void, struct, array, func or func_proto is a sink
> > -              * for ptr
> > -              */
> > -             return !btf_type_is_modifier(next_type) &&
> > -                     !btf_type_is_ptr(next_type);
> > -     case RESOLVE_STRUCT_OR_ARRAY:
> > -             /* int, enum, void, ptr, func or func_proto is a sink
> > -              * for struct and array
> > -              */
> > -             return !btf_type_is_modifier(next_type) &&
> > -                     !btf_type_is_array(next_type) &&
> > -                     !btf_type_is_struct(next_type);
> > -     default:
> > -             BUG();
> > -     }
> > +     return !btf_type_needs_resolve(next_type);
> >   }
> >
> >   static bool env_type_is_resolved(const struct btf_verifier_env *env,
> > @@ -1010,13 +982,6 @@ static int env_stack_push(struct btf_verifier_env *env,
> >       v->type_id = type_id;
> >       v->next_member = 0;
> >
> > -     if (env->resolve_mode == RESOLVE_TBD) {
> > -             if (btf_type_is_ptr(t))
> > -                     env->resolve_mode = RESOLVE_PTR;
> > -             else if (btf_type_is_struct(t) || btf_type_is_array(t))
> > -                     env->resolve_mode = RESOLVE_STRUCT_OR_ARRAY;
> > -     }
> > -
> >       return 0;
> >   }
> >
> > @@ -1038,7 +1003,7 @@ static void env_stack_pop_resolved(struct btf_verifier_env *env,
> >       env->visit_states[type_id] = RESOLVED;
> >   }
> >
> > -static const struct resolve_vertex *env_stack_peak(struct btf_verifier_env *env)
> > +static const struct resolve_vertex *env_stack_peek(struct btf_verifier_env *env)
> >   {
> >       return env->top_stack ? &env->stack[env->top_stack - 1] : NULL;
> >   }
> > @@ -3030,9 +2995,8 @@ static int btf_resolve(struct btf_verifier_env *env,
> >       const struct resolve_vertex *v;
> >       int err = 0;
> >
> > -     env->resolve_mode = RESOLVE_TBD;
> >       env_stack_push(env, t, type_id);
> > -     while (!err && (v = env_stack_peak(env))) {
> > +     while (!err && (v = env_stack_peek(env))) {
> >               env->log_type_id = v->type_id;
> >               err = btf_type_ops(v->t)->resolve(env, v);
> >       }
> >

^ permalink raw reply

* Re: [PATCH] ipvs: remove unnecessary space
From: Joe Perches @ 2019-07-11  0:22 UTC (permalink / raw)
  To: Simon Horman, yangxingwu, Pablo Neira Ayuso
  Cc: wensong, ja, kadlec, fw, davem, netdev, lvs-devel,
	netfilter-devel, coreteam, linux-kernel
In-Reply-To: <20190710080609.smxjqe2d5jyro4hv@verge.net.au>

On Wed, 2019-07-10 at 10:06 +0200, Simon Horman wrote:
> On Wed, Jul 10, 2019 at 03:45:52PM +0800, yangxingwu wrote:
> > this patch removes the extra space.
> > 
> > Signed-off-by: yangxingwu <xingwu.yang@gmail.com>
> 
> Thanks, this looks good to me.
> 
> Acked-by: Simon Horman <horms@verge.net.au>
> 
> Pablo, please consider including this in nf-next.
> 
> 
> > ---
> >  net/netfilter/ipvs/ip_vs_mh.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/net/netfilter/ipvs/ip_vs_mh.c b/net/netfilter/ipvs/ip_vs_mh.c
> > index 94d9d34..98e358e 100644
> > --- a/net/netfilter/ipvs/ip_vs_mh.c
> > +++ b/net/netfilter/ipvs/ip_vs_mh.c
> > @@ -174,8 +174,8 @@ static int ip_vs_mh_populate(struct ip_vs_mh_state *s,
> >  		return 0;
> >  	}
> >  
> > -	table =  kcalloc(BITS_TO_LONGS(IP_VS_MH_TAB_SIZE),
> > -			 sizeof(unsigned long), GFP_KERNEL);
> > +	table =	kcalloc(BITS_TO_LONGS(IP_VS_MH_TAB_SIZE),
> > +			sizeof(unsigned long), GFP_KERNEL);

bitmap_alloc?

> >  	if (!table)
> >  		return -ENOMEM;
> >  
> > -- 
> > 1.8.3.1
> > 


^ permalink raw reply

* Re: [PATCH net-next,v4 12/12] netfilter: nf_tables: add hardware offload support
From: Pablo Neira Ayuso @ 2019-07-11  0:18 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, thomas.lendacky, f.fainelli, ariel.elior,
	michael.chan, madalin.bucur, yisen.zhuang, salil.mehta,
	jeffrey.t.kirsher, tariqt, saeedm, jiri, idosch, jakub.kicinski,
	peppe.cavallaro, grygorii.strashko, andrew, vivien.didelot,
	alexandre.torgue, joabreu, linux-net-drivers, ogerlitz,
	Manish.Chopra, marcelo.leitner, mkubecek, venkatkumar.duvvuru,
	maxime.chevallier, cphealy, phil, netfilter-devel
In-Reply-To: <20190710075227.GA4362@nanopsycho>

On Wed, Jul 10, 2019 at 09:52:27AM +0200, Jiri Pirko wrote:
> Tue, Jul 09, 2019 at 10:55:50PM CEST, pablo@netfilter.org wrote:
> 
> [...]
> 
> >+	if (!dev || !dev->netdev_ops->ndo_setup_tc)
> 
> Why didn't you rename ndo_setup_tc? I put a comment about it in the
> previous version thread. I expect that you can at least write why it is
> a wrong idea.

This is a good idea. It happened that I read this email by when my new
patch series was ready. I will follow up with a patch to address this
rename as soon as the bug fixes are sorted out.

Thanks.

^ permalink raw reply

* Re: [PATCH bpf] bpf: fix BTF verifier size resolution logic
From: Yonghong Song @ 2019-07-11  0:16 UTC (permalink / raw)
  To: Andrii Nakryiko, andrii.nakryiko@gmail.com, Alexei Starovoitov,
	daniel@iogearbox.net, bpf@vger.kernel.org, netdev@vger.kernel.org,
	Kernel Team
  Cc: Martin Lau
In-Reply-To: <20190710080840.2613160-1-andriin@fb.com>



On 7/10/19 1:08 AM, Andrii Nakryiko wrote:
> BTF verifier has Different logic depending on whether we are following
> a PTR or STRUCT/ARRAY (or something else). This is an optimization to
> stop early in DFS traversal while resolving BTF types. But it also
> results in a size resolution bug, when there is a chain, e.g., of PTR ->
> TYPEDEF -> ARRAY, in which case due to being in pointer context ARRAY
> size won't be resolved, as it is considered to be a sink for pointer,
> leading to TYPEDEF being in RESOLVED state with zero size, which is
> completely wrong.
> 
> Optimization is doubtful, though, as btf_check_all_types() will iterate
> over all BTF types anyways, so the only saving is a potentially slightly
> shorter stack. But correctness is more important that tiny savings.
> 
> This bug manifests itself in rejecting BTF-defined maps that use array
> typedef as a value type:
> 
> typedef int array_t[16];
> 
> struct {
> 	__uint(type, BPF_MAP_TYPE_ARRAY);
> 	__type(value, array_t); /* i.e., array_t *value; */
> } test_map SEC(".maps");
> 
> Fixes: eb3f595dab40 ("bpf: btf: Validate type reference")
> Cc: Martin KaFai Lau <kafai@fb.com>
> Signed-off-by: Andrii Nakryiko <andriin@fb.com>

The change seems okay to me. Currently, looks like intermediate
modifier type will carry size = 0 (in the internal data structure).

If we remove RESOLVE logic, we probably want to double check
whether we handle circular types correctly or not. Maybe we will
be okay if all self tests pass.

I may still be worthwhile to qualify the RESOLVE optimization benefit
before removing it.

Another possible change is, for external usage, removing
modifiers, before checking the size, something like below.
Note that I am not strongly advocating my below patch as
it has the same shortcoming that maintained modifier type
size may not be correct.

diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 546ebee39e2a..6f927c3e0a89 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -620,6 +620,54 @@ static bool btf_type_int_is_regular(const struct 
btf_type *t)
         return true;
  }

+static const struct btf_type *__btf_type_id_size(const struct btf *btf,
+                                                u32 *type_id, u32 
*ret_size,
+                                                bool skip_modifier)
+{
+       const struct btf_type *size_type;
+       u32 size_type_id = *type_id;
+       u32 size = 0;
+
+       size_type = btf_type_by_id(btf, size_type_id);
+       if (size_type && skip_modifier) {
+               while (btf_type_is_modifier(size_type))
+                       size_type = btf_type_by_id(btf, size_type->type);
+       }
+
+       if (btf_type_nosize_or_null(size_type))
+               return NULL;
+
+       if (btf_type_has_size(size_type)) {
+               size = size_type->size;
+       } else if (btf_type_is_array(size_type)) {
+               size = btf->resolved_sizes[size_type_id];
+       } else if (btf_type_is_ptr(size_type)) {
+               size = sizeof(void *);
+       } else {
+               if (WARN_ON_ONCE(!btf_type_is_modifier(size_type) &&
+                                !btf_type_is_var(size_type)))
+                       return NULL;
+
+               size = btf->resolved_sizes[size_type_id];
+               size_type_id = btf->resolved_ids[size_type_id];
+               size_type = btf_type_by_id(btf, size_type_id);
+               if (btf_type_nosize_or_null(size_type))
+                       return NULL;
+       }
+
+       *type_id = size_type_id;
+       if (ret_size)
+               *ret_size = size;
+
+       return size_type;
+}
+
+const struct btf_type *btf_type_id_size(const struct btf *btf,
+                                       u32 *type_id, u32 *ret_size)
+{
+       return __btf_type_id_size(btf, type_id, ret_size, true);
+}
+
  /*
   * Check that given struct member is a regular int with expected
   * offset and size.
@@ -633,7 +681,7 @@ bool btf_member_is_reg_int(const struct btf *btf, 
const struct btf_type *s,
         u8 nr_bits;

         id = m->type;
-       t = btf_type_id_size(btf, &id, NULL);
+       t = __btf_type_id_size(btf, &id, NULL, false);
         if (!t || !btf_type_is_int(t))
                 return false;

@@ -1051,42 +1099,6 @@ static const struct btf_type 
*btf_type_id_resolve(const struct btf *btf,
         return btf_type_by_id(btf, *type_id);
  }

-const struct btf_type *btf_type_id_size(const struct btf *btf,
-                                       u32 *type_id, u32 *ret_size)
-{
-       const struct btf_type *size_type;
-       u32 size_type_id = *type_id;
-       u32 size = 0;
-
-       size_type = btf_type_by_id(btf, size_type_id);
-       if (btf_type_nosize_or_null(size_type))
-               return NULL;
-
-       if (btf_type_has_size(size_type)) {
-               size = size_type->size;
-       } else if (btf_type_is_array(size_type)) {
-               size = btf->resolved_sizes[size_type_id];
-       } else if (btf_type_is_ptr(size_type)) {
-               size = sizeof(void *);
-       } else {
-               if (WARN_ON_ONCE(!btf_type_is_modifier(size_type) &&
-                                !btf_type_is_var(size_type)))
-                       return NULL;
-
-               size = btf->resolved_sizes[size_type_id];
-               size_type_id = btf->resolved_ids[size_type_id];
-               size_type = btf_type_by_id(btf, size_type_id);
-               if (btf_type_nosize_or_null(size_type))
-                       return NULL;
-       }
-
-       *type_id = size_type_id;
-       if (ret_size)
-               *ret_size = size;
-
-       return size_type;
-}
-
  static int btf_df_check_member(struct btf_verifier_env *env,
                                const struct btf_type *struct_type,
                                const struct btf_member *member,
@@ -1489,7 +1501,7 @@ static int btf_modifier_check_member(struct 
btf_verifier_env *env,
         struct btf_member resolved_member;
         struct btf *btf = env->btf;

-       resolved_type = btf_type_id_size(btf, &resolved_type_id, NULL);
+       resolved_type = __btf_type_id_size(btf, &resolved_type_id, NULL, 
false);
         if (!resolved_type) {
                 btf_verifier_log_member(env, struct_type, member,
                                         "Invalid member");
@@ -1514,7 +1526,7 @@ static int btf_modifier_check_kflag_member(struct 
btf_verifier_env *env,
         struct btf_member resolved_member;
         struct btf *btf = env->btf;

-       resolved_type = btf_type_id_size(btf, &resolved_type_id, NULL);
+       resolved_type = __btf_type_id_size(btf, &resolved_type_id, NULL, 
false);
         if (!resolved_type) {
                 btf_verifier_log_member(env, struct_type, member,
                                         "Invalid member");
@@ -1620,7 +1632,7 @@ static int btf_modifier_resolve(struct 
btf_verifier_env *env,
          * save us a few type-following when we use it later (e.g. in
          * pretty print).
          */
-       if (!btf_type_id_size(btf, &next_type_id, &next_type_size)) {
+       if (!__btf_type_id_size(btf, &next_type_id, &next_type_size, 
false)) {
                 if (env_type_is_resolved(env, next_type_id))
                         next_type = btf_type_id_resolve(btf, 
&next_type_id);

@@ -1675,7 +1687,7 @@ static int btf_var_resolve(struct btf_verifier_env 
*env,
          * forward types or similar that would resolve to size of
          * zero is allowed.
          */
-       if (!btf_type_id_size(btf, &next_type_id, &next_type_size)) {
+       if (!__btf_type_id_size(btf, &next_type_id, &next_type_size, 
false)) {
                 btf_verifier_log_type(env, v->t, "Invalid type_id");
                 return -EINVAL;
         }
@@ -1725,7 +1737,7 @@ static int btf_ptr_resolve(struct btf_verifier_env 
*env,
                                               resolved_type_id);
         }

-       if (!btf_type_id_size(btf, &next_type_id, NULL)) {
+       if (!__btf_type_id_size(btf, &next_type_id, NULL, false)) {
                 if (env_type_is_resolved(env, next_type_id))
                         next_type = btf_type_id_resolve(btf, 
&next_type_id);

@@ -1851,7 +1863,7 @@ static int btf_array_check_member(struct 
btf_verifier_env *env,
         }

         array_type_id = member->type;
-       btf_type_id_size(btf, &array_type_id, &array_size);
+       __btf_type_id_size(btf, &array_type_id, &array_size, false);
         struct_size = struct_type->size;
         bytes_offset = BITS_ROUNDDOWN_BYTES(struct_bits_off);
         if (struct_size - bytes_offset < array_size) {
@@ -1938,7 +1950,7 @@ static int btf_array_resolve(struct 
btf_verifier_env *env,
             !env_type_is_resolved(env, index_type_id))
                 return env_stack_push(env, index_type, index_type_id);

-       index_type = btf_type_id_size(btf, &index_type_id, NULL);
+       index_type = __btf_type_id_size(btf, &index_type_id, NULL, false);
         if (!index_type || !btf_type_is_int(index_type) ||
             !btf_type_int_is_regular(index_type)) {
                 btf_verifier_log_type(env, v->t, "Invalid index");
@@ -1959,7 +1971,7 @@ static int btf_array_resolve(struct 
btf_verifier_env *env,
             !env_type_is_resolved(env, elem_type_id))
                 return env_stack_push(env, elem_type, elem_type_id);

-       elem_type = btf_type_id_size(btf, &elem_type_id, &elem_size);
+       elem_type = __btf_type_id_size(btf, &elem_type_id, &elem_size, 
false);
         if (!elem_type) {
                 btf_verifier_log_type(env, v->t, "Invalid elem");
                 return -EINVAL;
@@ -2000,7 +2012,7 @@ static void btf_array_seq_show(const struct btf 
*btf, const struct btf_type *t,
         u32 i, elem_size, elem_type_id;

         elem_type_id = array->type;
-       elem_type = btf_type_id_size(btf, &elem_type_id, &elem_size);
+       elem_type = __btf_type_id_size(btf, &elem_type_id, &elem_size, 
false);
         elem_ops = btf_type_ops(elem_type);
         seq_puts(m, "[");
         for (i = 0; i < array->nelems; i++) {
@@ -2732,7 +2744,7 @@ static int btf_datasec_resolve(struct 
btf_verifier_env *env,
                 }

                 type_id = var_type->type;
-               if (!btf_type_id_size(btf, &type_id, &type_size)) {
+               if (!__btf_type_id_size(btf, &type_id, &type_size, false)) {
                         btf_verifier_log_vsi(env, v->t, vsi, "Invalid 
type");
                         return -EINVAL;
                 }
@@ -2813,7 +2825,7 @@ static int btf_func_proto_check(struct 
btf_verifier_env *env,
                 }

                 /* Ensure the return type is a type that has a size */
-               if (!btf_type_id_size(btf, &ret_type_id, NULL)) {
+               if (!__btf_type_id_size(btf, &ret_type_id, NULL, false)) {
                         btf_verifier_log_type(env, t, "Invalid return 
type");
                         return -EINVAL;
                 }
@@ -2861,7 +2873,7 @@ static int btf_func_proto_check(struct 
btf_verifier_env *env,
                                 break;
                 }

-               if (!btf_type_id_size(btf, &arg_type_id, NULL)) {
+               if (!__btf_type_id_size(btf, &arg_type_id, NULL, false)) {
                         btf_verifier_log_type(env, t, "Invalid arg#%u", 
i + 1);
                         err = -EINVAL;
                         break;
@@ -3014,7 +3026,7 @@ static bool btf_resolve_valid(struct 
btf_verifier_env *env,
                 u32 elem_type_id = array->type;
                 u32 elem_size;

-               elem_type = btf_type_id_size(btf, &elem_type_id, 
&elem_size);
+               elem_type = __btf_type_id_size(btf, &elem_type_id, 
&elem_size, false);
                 return elem_type && !btf_type_is_modifier(elem_type) &&
                         (array->nelems * elem_size ==
                          btf->resolved_sizes[type_id]);


> ---
>   kernel/bpf/btf.c | 42 +++---------------------------------------
>   1 file changed, 3 insertions(+), 39 deletions(-)
> 
> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> index cad09858a5f2..c68c7e73b0d1 100644
> --- a/kernel/bpf/btf.c
> +++ b/kernel/bpf/btf.c
> @@ -231,14 +231,6 @@ enum visit_state {
>   	RESOLVED,
>   };
>   
> -enum resolve_mode {
> -	RESOLVE_TBD,	/* To Be Determined */
> -	RESOLVE_PTR,	/* Resolving for Pointer */
> -	RESOLVE_STRUCT_OR_ARRAY,	/* Resolving for struct/union
> -					 * or array
> -					 */
> -};
> -
>   #define MAX_RESOLVE_DEPTH 32
>   
>   struct btf_sec_info {
> @@ -254,7 +246,6 @@ struct btf_verifier_env {
>   	u32 log_type_id;
>   	u32 top_stack;
>   	enum verifier_phase phase;
> -	enum resolve_mode resolve_mode;
>   };
>   
>   static const char * const btf_kind_str[NR_BTF_KINDS] = {
> @@ -964,26 +955,7 @@ static void btf_verifier_env_free(struct btf_verifier_env *env)
>   static bool env_type_is_resolve_sink(const struct btf_verifier_env *env,
>   				     const struct btf_type *next_type)
>   {
> -	switch (env->resolve_mode) {
> -	case RESOLVE_TBD:
> -		/* int, enum or void is a sink */
> -		return !btf_type_needs_resolve(next_type);
> -	case RESOLVE_PTR:
> -		/* int, enum, void, struct, array, func or func_proto is a sink
> -		 * for ptr
> -		 */
> -		return !btf_type_is_modifier(next_type) &&
> -			!btf_type_is_ptr(next_type);
> -	case RESOLVE_STRUCT_OR_ARRAY:
> -		/* int, enum, void, ptr, func or func_proto is a sink
> -		 * for struct and array
> -		 */
> -		return !btf_type_is_modifier(next_type) &&
> -			!btf_type_is_array(next_type) &&
> -			!btf_type_is_struct(next_type);
> -	default:
> -		BUG();
> -	}
> +	return !btf_type_needs_resolve(next_type);
>   }
>   
>   static bool env_type_is_resolved(const struct btf_verifier_env *env,
> @@ -1010,13 +982,6 @@ static int env_stack_push(struct btf_verifier_env *env,
>   	v->type_id = type_id;
>   	v->next_member = 0;
>   
> -	if (env->resolve_mode == RESOLVE_TBD) {
> -		if (btf_type_is_ptr(t))
> -			env->resolve_mode = RESOLVE_PTR;
> -		else if (btf_type_is_struct(t) || btf_type_is_array(t))
> -			env->resolve_mode = RESOLVE_STRUCT_OR_ARRAY;
> -	}
> -
>   	return 0;
>   }
>   
> @@ -1038,7 +1003,7 @@ static void env_stack_pop_resolved(struct btf_verifier_env *env,
>   	env->visit_states[type_id] = RESOLVED;
>   }
>   
> -static const struct resolve_vertex *env_stack_peak(struct btf_verifier_env *env)
> +static const struct resolve_vertex *env_stack_peek(struct btf_verifier_env *env)
>   {
>   	return env->top_stack ? &env->stack[env->top_stack - 1] : NULL;
>   }
> @@ -3030,9 +2995,8 @@ static int btf_resolve(struct btf_verifier_env *env,
>   	const struct resolve_vertex *v;
>   	int err = 0;
>   
> -	env->resolve_mode = RESOLVE_TBD;
>   	env_stack_push(env, t, type_id);
> -	while (!err && (v = env_stack_peak(env))) {
> +	while (!err && (v = env_stack_peek(env))) {
>   		env->log_type_id = v->type_id;
>   		err = btf_type_ops(v->t)->resolve(env, v);
>   	}
> 

^ permalink raw reply related

* [PATCH net-next 2/3] net: flow_offload: rename tc_setup_cb_t to flow_setup_cb_t
From: Pablo Neira Ayuso @ 2019-07-11  0:12 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev, jiri, jakub.kicinski
In-Reply-To: <20190711001235.20686-1-pablo@netfilter.org>

Rename this type definition and adapt users.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
This patch is a dependency for patch 3/3, so include/net/flow_offload.h
does not need to include include/net/sch_cls.h, and hence avoid a
circular inclusion.

 drivers/net/ethernet/mellanox/mlxsw/spectrum.c |  2 +-
 drivers/net/ethernet/mscc/ocelot_tc.c          |  2 +-
 include/net/flow_offload.h                     | 16 ++++++++++------
 include/net/pkt_cls.h                          |  4 ++--
 include/net/sch_generic.h                      |  6 ++----
 net/core/flow_offload.c                        |  9 +++++----
 net/dsa/slave.c                                |  2 +-
 net/sched/cls_api.c                            |  2 +-
 net/sched/cls_bpf.c                            |  2 +-
 net/sched/cls_flower.c                         |  2 +-
 net/sched/cls_matchall.c                       |  2 +-
 net/sched/cls_u32.c                            |  6 +++---
 12 files changed, 29 insertions(+), 26 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
index a469035400cf..51cd0b6f1f3e 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
@@ -1679,7 +1679,7 @@ static int mlxsw_sp_setup_tc_block(struct mlxsw_sp_port *mlxsw_sp_port,
 				   struct flow_block_offload *f)
 {
 	struct flow_block_cb *block_cb;
-	tc_setup_cb_t *cb;
+	flow_setup_cb_t *cb;
 	bool ingress;
 	int err;
 
diff --git a/drivers/net/ethernet/mscc/ocelot_tc.c b/drivers/net/ethernet/mscc/ocelot_tc.c
index abbcb66bf5ac..fba9512e9ca6 100644
--- a/drivers/net/ethernet/mscc/ocelot_tc.c
+++ b/drivers/net/ethernet/mscc/ocelot_tc.c
@@ -134,7 +134,7 @@ static int ocelot_setup_tc_block(struct ocelot_port *port,
 				 struct flow_block_offload *f)
 {
 	struct flow_block_cb *block_cb;
-	tc_setup_cb_t *cb;
+	flow_setup_cb_t *cb;
 	int err;
 
 	netdev_dbg(port->dev, "tc_block command %d, binder_type %d\n",
diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h
index aa9b5287b231..98bf3af5c84d 100644
--- a/include/net/flow_offload.h
+++ b/include/net/flow_offload.h
@@ -3,7 +3,6 @@
 
 #include <linux/kernel.h>
 #include <net/flow_dissector.h>
-#include <net/sch_generic.h>
 
 struct flow_match {
 	struct flow_dissector	*dissector;
@@ -261,23 +260,27 @@ struct flow_block_offload {
 	struct netlink_ext_ack *extack;
 };
 
+enum tc_setup_type;
+typedef int flow_setup_cb_t(enum tc_setup_type type, void *type_data,
+			    void *cb_priv);
+
 struct flow_block_cb {
 	struct list_head	driver_list;
 	struct list_head	list;
-	tc_setup_cb_t		*cb;
+	flow_setup_cb_t		*cb;
 	void			*cb_ident;
 	void			*cb_priv;
 	void			(*release)(void *cb_priv);
 	unsigned int		refcnt;
 };
 
-struct flow_block_cb *flow_block_cb_alloc(tc_setup_cb_t *cb,
+struct flow_block_cb *flow_block_cb_alloc(flow_setup_cb_t *cb,
 					  void *cb_ident, void *cb_priv,
 					  void (*release)(void *cb_priv));
 void flow_block_cb_free(struct flow_block_cb *block_cb);
 
 struct flow_block_cb *flow_block_cb_lookup(struct flow_block_offload *offload,
-					   tc_setup_cb_t *cb, void *cb_ident);
+					   flow_setup_cb_t *cb, void *cb_ident);
 
 void *flow_block_cb_priv(struct flow_block_cb *block_cb);
 void flow_block_cb_incref(struct flow_block_cb *block_cb);
@@ -295,11 +298,12 @@ static inline void flow_block_cb_remove(struct flow_block_cb *block_cb,
 	list_move(&block_cb->list, &offload->cb_list);
 }
 
-bool flow_block_cb_is_busy(tc_setup_cb_t *cb, void *cb_ident,
+bool flow_block_cb_is_busy(flow_setup_cb_t *cb, void *cb_ident,
 			   struct list_head *driver_block_list);
 
 int flow_block_cb_setup_simple(struct flow_block_offload *f,
-			       struct list_head *driver_list, tc_setup_cb_t *cb,
+			       struct list_head *driver_list,
+			       flow_setup_cb_t *cb,
 			       void *cb_ident, void *cb_priv, bool ingress_only);
 
 enum flow_cls_command {
diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index 841faadceb6e..cee651b76a1f 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -126,14 +126,14 @@ static inline struct Qdisc *tcf_block_q(struct tcf_block *block)
 }
 
 static inline
-int tc_setup_cb_block_register(struct tcf_block *block, tc_setup_cb_t *cb,
+int tc_setup_cb_block_register(struct tcf_block *block, flow_setup_cb_t *cb,
 			       void *cb_priv)
 {
 	return 0;
 }
 
 static inline
-void tc_setup_cb_block_unregister(struct tcf_block *block, tc_setup_cb_t *cb,
+void tc_setup_cb_block_unregister(struct tcf_block *block, flow_setup_cb_t *cb,
 				  void *cb_priv)
 {
 }
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 855167bbc372..9482e060483b 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -15,6 +15,7 @@
 #include <linux/mutex.h>
 #include <net/gen_stats.h>
 #include <net/rtnetlink.h>
+#include <net/flow_offload.h>
 
 struct Qdisc_ops;
 struct qdisc_walker;
@@ -22,9 +23,6 @@ struct tcf_walker;
 struct module;
 struct bpf_flow_keys;
 
-typedef int tc_setup_cb_t(enum tc_setup_type type,
-			  void *type_data, void *cb_priv);
-
 typedef int tc_indr_block_bind_cb_t(struct net_device *dev, void *cb_priv,
 				    enum tc_setup_type type, void *type_data);
 
@@ -313,7 +311,7 @@ struct tcf_proto_ops {
 	void			(*walk)(struct tcf_proto *tp,
 					struct tcf_walker *arg, bool rtnl_held);
 	int			(*reoffload)(struct tcf_proto *tp, bool add,
-					     tc_setup_cb_t *cb, void *cb_priv,
+					     flow_setup_cb_t *cb, void *cb_priv,
 					     struct netlink_ext_ack *extack);
 	void			(*bind_class)(void *, u32, unsigned long);
 	void *			(*tmplt_create)(struct net *net,
diff --git a/net/core/flow_offload.c b/net/core/flow_offload.c
index 507de4b48815..a800fa78d96c 100644
--- a/net/core/flow_offload.c
+++ b/net/core/flow_offload.c
@@ -165,7 +165,7 @@ void flow_rule_match_enc_opts(const struct flow_rule *rule,
 }
 EXPORT_SYMBOL(flow_rule_match_enc_opts);
 
-struct flow_block_cb *flow_block_cb_alloc(tc_setup_cb_t *cb,
+struct flow_block_cb *flow_block_cb_alloc(flow_setup_cb_t *cb,
 					  void *cb_ident, void *cb_priv,
 					  void (*release)(void *cb_priv))
 {
@@ -194,7 +194,7 @@ void flow_block_cb_free(struct flow_block_cb *block_cb)
 EXPORT_SYMBOL(flow_block_cb_free);
 
 struct flow_block_cb *flow_block_cb_lookup(struct flow_block_offload *f,
-					   tc_setup_cb_t *cb, void *cb_ident)
+					   flow_setup_cb_t *cb, void *cb_ident)
 {
 	struct flow_block_cb *block_cb;
 
@@ -226,7 +226,7 @@ unsigned int flow_block_cb_decref(struct flow_block_cb *block_cb)
 }
 EXPORT_SYMBOL(flow_block_cb_decref);
 
-bool flow_block_cb_is_busy(tc_setup_cb_t *cb, void *cb_ident,
+bool flow_block_cb_is_busy(flow_setup_cb_t *cb, void *cb_ident,
 			   struct list_head *driver_block_list)
 {
 	struct flow_block_cb *block_cb;
@@ -243,7 +243,8 @@ EXPORT_SYMBOL(flow_block_cb_is_busy);
 
 int flow_block_cb_setup_simple(struct flow_block_offload *f,
 			       struct list_head *driver_block_list,
-			       tc_setup_cb_t *cb, void *cb_ident, void *cb_priv,
+			       flow_setup_cb_t *cb,
+			       void *cb_ident, void *cb_priv,
 			       bool ingress_only)
 {
 	struct flow_block_cb *block_cb;
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 6ca9ec58f881..d697a64fb564 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -951,7 +951,7 @@ static int dsa_slave_setup_tc_block(struct net_device *dev,
 				    struct flow_block_offload *f)
 {
 	struct flow_block_cb *block_cb;
-	tc_setup_cb_t *cb;
+	flow_setup_cb_t *cb;
 
 	if (f->binder_type == FLOW_BLOCK_BINDER_TYPE_CLSACT_INGRESS)
 		cb = dsa_slave_setup_tc_block_cb_ig;
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 278014e26aec..51fbe6e95a92 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -1514,7 +1514,7 @@ void tcf_block_put(struct tcf_block *block)
 EXPORT_SYMBOL(tcf_block_put);
 
 static int
-tcf_block_playback_offloads(struct tcf_block *block, tc_setup_cb_t *cb,
+tcf_block_playback_offloads(struct tcf_block *block, flow_setup_cb_t *cb,
 			    void *cb_priv, bool add, bool offload_in_use,
 			    struct netlink_ext_ack *extack)
 {
diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c
index 691f71830134..3f7a9c02b70c 100644
--- a/net/sched/cls_bpf.c
+++ b/net/sched/cls_bpf.c
@@ -651,7 +651,7 @@ static void cls_bpf_walk(struct tcf_proto *tp, struct tcf_walker *arg,
 	}
 }
 
-static int cls_bpf_reoffload(struct tcf_proto *tp, bool add, tc_setup_cb_t *cb,
+static int cls_bpf_reoffload(struct tcf_proto *tp, bool add, flow_setup_cb_t *cb,
 			     void *cb_priv, struct netlink_ext_ack *extack)
 {
 	struct cls_bpf_head *head = rtnl_dereference(tp->root);
diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index 38d6e85693fc..054123742e32 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -1800,7 +1800,7 @@ fl_get_next_hw_filter(struct tcf_proto *tp, struct cls_fl_filter *f, bool add)
 	return NULL;
 }
 
-static int fl_reoffload(struct tcf_proto *tp, bool add, tc_setup_cb_t *cb,
+static int fl_reoffload(struct tcf_proto *tp, bool add, flow_setup_cb_t *cb,
 			void *cb_priv, struct netlink_ext_ack *extack)
 {
 	struct tcf_block *block = tp->chain->block;
diff --git a/net/sched/cls_matchall.c b/net/sched/cls_matchall.c
index a30d2f8feb32..455ea2793f9b 100644
--- a/net/sched/cls_matchall.c
+++ b/net/sched/cls_matchall.c
@@ -282,7 +282,7 @@ static void mall_walk(struct tcf_proto *tp, struct tcf_walker *arg,
 	arg->count++;
 }
 
-static int mall_reoffload(struct tcf_proto *tp, bool add, tc_setup_cb_t *cb,
+static int mall_reoffload(struct tcf_proto *tp, bool add, flow_setup_cb_t *cb,
 			  void *cb_priv, struct netlink_ext_ack *extack)
 {
 	struct cls_mall_head *head = rtnl_dereference(tp->root);
diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index be9e46c77e8b..8614088edd1b 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -1152,7 +1152,7 @@ static void u32_walk(struct tcf_proto *tp, struct tcf_walker *arg,
 }
 
 static int u32_reoffload_hnode(struct tcf_proto *tp, struct tc_u_hnode *ht,
-			       bool add, tc_setup_cb_t *cb, void *cb_priv,
+			       bool add, flow_setup_cb_t *cb, void *cb_priv,
 			       struct netlink_ext_ack *extack)
 {
 	struct tc_cls_u32_offload cls_u32 = {};
@@ -1172,7 +1172,7 @@ static int u32_reoffload_hnode(struct tcf_proto *tp, struct tc_u_hnode *ht,
 }
 
 static int u32_reoffload_knode(struct tcf_proto *tp, struct tc_u_knode *n,
-			       bool add, tc_setup_cb_t *cb, void *cb_priv,
+			       bool add, flow_setup_cb_t *cb, void *cb_priv,
 			       struct netlink_ext_ack *extack)
 {
 	struct tc_u_hnode *ht = rtnl_dereference(n->ht_down);
@@ -1213,7 +1213,7 @@ static int u32_reoffload_knode(struct tcf_proto *tp, struct tc_u_knode *n,
 	return 0;
 }
 
-static int u32_reoffload(struct tcf_proto *tp, bool add, tc_setup_cb_t *cb,
+static int u32_reoffload(struct tcf_proto *tp, bool add, flow_setup_cb_t *cb,
 			 void *cb_priv, struct netlink_ext_ack *extack)
 {
 	struct tc_u_common *tp_c = tp->data;
-- 
2.11.0



^ permalink raw reply related

* [PATCH net-next 3/3] net: flow_offload: add flow_block structure and use it
From: Pablo Neira Ayuso @ 2019-07-11  0:12 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev, jiri, jakub.kicinski
In-Reply-To: <20190711001235.20686-1-pablo@netfilter.org>

This object stores the flow block callbacks that are attached to this
block. This patch restores block sharing.

Fixes: da3eeb904ff4 ("net: flow_offload: add list handling functions")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 include/net/flow_offload.h        |  5 +++++
 include/net/netfilter/nf_tables.h |  5 +++--
 include/net/sch_generic.h         |  2 +-
 net/core/flow_offload.c           |  2 +-
 net/netfilter/nf_tables_api.c     |  2 +-
 net/netfilter/nf_tables_offload.c |  5 +++--
 net/sched/cls_api.c               | 10 +++++++---
 7 files changed, 21 insertions(+), 10 deletions(-)

diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h
index 98bf3af5c84d..e50d94736829 100644
--- a/include/net/flow_offload.h
+++ b/include/net/flow_offload.h
@@ -248,6 +248,10 @@ enum flow_block_binder_type {
 	FLOW_BLOCK_BINDER_TYPE_CLSACT_EGRESS,
 };
 
+struct flow_block {
+	struct list_head cb_list;
+};
+
 struct netlink_ext_ack;
 
 struct flow_block_offload {
@@ -255,6 +259,7 @@ struct flow_block_offload {
 	enum flow_block_binder_type binder_type;
 	bool block_shared;
 	struct net *net;
+	struct flow_block *block;
 	struct list_head cb_list;
 	struct list_head *driver_block_list;
 	struct netlink_ext_ack *extack;
diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
index 35dfdd9f69b3..00658462f89b 100644
--- a/include/net/netfilter/nf_tables.h
+++ b/include/net/netfilter/nf_tables.h
@@ -11,6 +11,7 @@
 #include <linux/rhashtable.h>
 #include <net/netfilter/nf_flow_table.h>
 #include <net/netlink.h>
+#include <net/flow_offload.h>
 
 struct module;
 
@@ -951,7 +952,7 @@ struct nft_stats {
  *	@stats: per-cpu chain stats
  *	@chain: the chain
  *	@dev_name: device name that this base chain is attached to (if any)
- *	@cb_list: list of flow block callbacks (for hardware offload)
+ *	@block: flow block (for hardware offload)
  */
 struct nft_base_chain {
 	struct nf_hook_ops		ops;
@@ -961,7 +962,7 @@ struct nft_base_chain {
 	struct nft_stats __percpu	*stats;
 	struct nft_chain		chain;
 	char 				dev_name[IFNAMSIZ];
-	struct list_head		cb_list;
+	struct flow_block		block;
 };
 
 static inline struct nft_base_chain *nft_base_chain(const struct nft_chain *chain)
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 9482e060483b..58041cb0ce15 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -399,7 +399,7 @@ struct tcf_block {
 	refcount_t refcnt;
 	struct net *net;
 	struct Qdisc *q;
-	struct list_head cb_list;
+	struct flow_block flow;
 	struct list_head owner_list;
 	bool keep_dst;
 	unsigned int offloadcnt; /* Number of oddloaded filters */
diff --git a/net/core/flow_offload.c b/net/core/flow_offload.c
index a800fa78d96c..935c7f81a9ef 100644
--- a/net/core/flow_offload.c
+++ b/net/core/flow_offload.c
@@ -198,7 +198,7 @@ struct flow_block_cb *flow_block_cb_lookup(struct flow_block_offload *f,
 {
 	struct flow_block_cb *block_cb;
 
-	list_for_each_entry(block_cb, f->driver_block_list, driver_list) {
+	list_for_each_entry(block_cb, &f->block->cb_list, list) {
 		if (block_cb->cb == cb &&
 		    block_cb->cb_ident == cb_ident)
 			return block_cb;
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index ed17a7c29b86..c565f146435b 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -1662,7 +1662,7 @@ static int nf_tables_addchain(struct nft_ctx *ctx, u8 family, u8 genmask,
 
 		chain->flags |= NFT_BASE_CHAIN | flags;
 		basechain->policy = NF_ACCEPT;
-		INIT_LIST_HEAD(&basechain->cb_list);
+		INIT_LIST_HEAD(&basechain->block.cb_list);
 	} else {
 		chain = kzalloc(sizeof(*chain), GFP_KERNEL);
 		if (chain == NULL)
diff --git a/net/netfilter/nf_tables_offload.c b/net/netfilter/nf_tables_offload.c
index 2c3302845f67..2a184277ee58 100644
--- a/net/netfilter/nf_tables_offload.c
+++ b/net/netfilter/nf_tables_offload.c
@@ -116,7 +116,7 @@ static int nft_setup_cb_call(struct nft_base_chain *basechain,
 	struct flow_block_cb *block_cb;
 	int err;
 
-	list_for_each_entry(block_cb, &basechain->cb_list, list) {
+	list_for_each_entry(block_cb, &basechain->block.cb_list, list) {
 		err = block_cb->cb(type, type_data, block_cb->cb_priv);
 		if (err < 0)
 			return err;
@@ -154,7 +154,7 @@ static int nft_flow_offload_rule(struct nft_trans *trans,
 static int nft_flow_offload_bind(struct flow_block_offload *bo,
 				 struct nft_base_chain *basechain)
 {
-	list_splice(&bo->cb_list, &basechain->cb_list);
+	list_splice(&bo->cb_list, &basechain->block.cb_list);
 	return 0;
 }
 
@@ -198,6 +198,7 @@ static int nft_flow_offload_chain(struct nft_trans *trans,
 		return -EOPNOTSUPP;
 
 	bo.command = cmd;
+	bo.block = &basechain->block;
 	bo.binder_type = FLOW_BLOCK_BINDER_TYPE_CLSACT_INGRESS;
 	bo.extack = &extack;
 	INIT_LIST_HEAD(&bo.cb_list);
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 51fbe6e95a92..66181961ad6f 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -691,6 +691,8 @@ static void tc_indr_block_ing_cmd(struct tc_indr_block_dev *indr_dev,
 	if (!indr_dev->block)
 		return;
 
+	bo.block = &indr_dev->block->flow;
+
 	indr_block_cb->cb(indr_dev->dev, indr_block_cb->cb_priv, TC_SETUP_BLOCK,
 			  &bo);
 	tcf_block_setup(indr_dev->block, &bo);
@@ -775,6 +777,7 @@ static void tc_indr_block_call(struct tcf_block *block, struct net_device *dev,
 		.command	= command,
 		.binder_type	= ei->binder_type,
 		.net		= dev_net(dev),
+		.block		= &block->flow,
 		.block_shared	= tcf_block_shared(block),
 		.extack		= extack,
 	};
@@ -810,6 +813,7 @@ static int tcf_block_offload_cmd(struct tcf_block *block,
 	bo.net = dev_net(dev);
 	bo.command = command;
 	bo.binder_type = ei->binder_type;
+	bo.block = &block->flow;
 	bo.block_shared = tcf_block_shared(block);
 	bo.extack = extack;
 	INIT_LIST_HEAD(&bo.cb_list);
@@ -988,7 +992,7 @@ static struct tcf_block *tcf_block_create(struct net *net, struct Qdisc *q,
 	}
 	mutex_init(&block->lock);
 	INIT_LIST_HEAD(&block->chain_list);
-	INIT_LIST_HEAD(&block->cb_list);
+	INIT_LIST_HEAD(&block->flow.cb_list);
 	INIT_LIST_HEAD(&block->owner_list);
 	INIT_LIST_HEAD(&block->chain0.filter_chain_list);
 
@@ -1570,7 +1574,7 @@ static int tcf_block_bind(struct tcf_block *block,
 
 		i++;
 	}
-	list_splice(&bo->cb_list, &block->cb_list);
+	list_splice(&bo->cb_list, &block->flow.cb_list);
 
 	return 0;
 
@@ -3155,7 +3159,7 @@ int tc_setup_cb_call(struct tcf_block *block, enum tc_setup_type type,
 	if (block->nooffloaddevcnt && err_stop)
 		return -EOPNOTSUPP;
 
-	list_for_each_entry(block_cb, &block->cb_list, list) {
+	list_for_each_entry(block_cb, &block->flow.cb_list, list) {
 		err = block_cb->cb(type, type_data, block_cb->cb_priv);
 		if (err) {
 			if (err_stop)
-- 
2.11.0



^ permalink raw reply related

* [PATCH net-next 1/3] net: flow_offload: remove netns parameter from flow_block_cb_alloc()
From: Pablo Neira Ayuso @ 2019-07-11  0:12 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev, jiri, jakub.kicinski

No need to annotate the netns on the flow block callback object,
flow_block_cb_is_busy() already checks for used blocks.

Fixes: d63db30c8537 ("net: flow_offload: add flow_block_cb_alloc() and flow_block_cb_free()")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c    | 3 +--
 drivers/net/ethernet/mellanox/mlxsw/spectrum.c      | 5 ++---
 drivers/net/ethernet/mscc/ocelot_flower.c           | 3 +--
 drivers/net/ethernet/mscc/ocelot_tc.c               | 2 +-
 drivers/net/ethernet/netronome/nfp/flower/offload.c | 6 ++----
 include/net/flow_offload.h                          | 3 +--
 net/core/flow_offload.c                             | 9 +++------
 net/dsa/slave.c                                     | 2 +-
 8 files changed, 12 insertions(+), 21 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
index 7245d287633d..2162412073c5 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
@@ -735,8 +735,7 @@ mlx5e_rep_indr_setup_tc_block(struct net_device *netdev,
 		list_add(&indr_priv->list,
 			 &rpriv->uplink_priv.tc_indr_block_priv_list);
 
-		block_cb = flow_block_cb_alloc(f->net,
-					       mlx5e_rep_indr_setup_block_cb,
+		block_cb = flow_block_cb_alloc(mlx5e_rep_indr_setup_block_cb,
 					       indr_priv, indr_priv,
 					       mlx5e_rep_indr_tc_block_unbind);
 		if (IS_ERR(block_cb)) {
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
index 4d34d42b3b0e..a469035400cf 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
@@ -1610,8 +1610,7 @@ mlxsw_sp_setup_tc_block_flower_bind(struct mlxsw_sp_port *mlxsw_sp_port,
 		acl_block = mlxsw_sp_acl_block_create(mlxsw_sp, f->net);
 		if (!acl_block)
 			return -ENOMEM;
-		block_cb = flow_block_cb_alloc(f->net,
-					       mlxsw_sp_setup_tc_block_cb_flower,
+		block_cb = flow_block_cb_alloc(mlxsw_sp_setup_tc_block_cb_flower,
 					       mlxsw_sp, acl_block,
 					       mlxsw_sp_tc_block_flower_release);
 		if (IS_ERR(block_cb)) {
@@ -1702,7 +1701,7 @@ static int mlxsw_sp_setup_tc_block(struct mlxsw_sp_port *mlxsw_sp_port,
 					  &mlxsw_sp_block_cb_list))
 			return -EBUSY;
 
-		block_cb = flow_block_cb_alloc(f->net, cb, mlxsw_sp_port,
+		block_cb = flow_block_cb_alloc(cb, mlxsw_sp_port,
 					       mlxsw_sp_port, NULL);
 		if (IS_ERR(block_cb))
 			return PTR_ERR(block_cb);
diff --git a/drivers/net/ethernet/mscc/ocelot_flower.c b/drivers/net/ethernet/mscc/ocelot_flower.c
index 7aaddc09c185..6a11aea8b186 100644
--- a/drivers/net/ethernet/mscc/ocelot_flower.c
+++ b/drivers/net/ethernet/mscc/ocelot_flower.c
@@ -323,8 +323,7 @@ int ocelot_setup_tc_block_flower_bind(struct ocelot_port *port,
 		if (!port_block)
 			return -ENOMEM;
 
-		block_cb = flow_block_cb_alloc(f->net,
-					       ocelot_setup_tc_block_cb_flower,
+		block_cb = flow_block_cb_alloc(ocelot_setup_tc_block_cb_flower,
 					       port, port_block,
 					       ocelot_tc_block_unbind);
 		if (IS_ERR(block_cb)) {
diff --git a/drivers/net/ethernet/mscc/ocelot_tc.c b/drivers/net/ethernet/mscc/ocelot_tc.c
index 9e6464ffae5d..abbcb66bf5ac 100644
--- a/drivers/net/ethernet/mscc/ocelot_tc.c
+++ b/drivers/net/ethernet/mscc/ocelot_tc.c
@@ -156,7 +156,7 @@ static int ocelot_setup_tc_block(struct ocelot_port *port,
 		if (flow_block_cb_is_busy(cb, port, &ocelot_block_cb_list))
 			return -EBUSY;
 
-		block_cb = flow_block_cb_alloc(f->net, cb, port, port, NULL);
+		block_cb = flow_block_cb_alloc(cb, port, port, NULL);
 		if (IS_ERR(block_cb))
 			return PTR_ERR(block_cb);
 
diff --git a/drivers/net/ethernet/netronome/nfp/flower/offload.c b/drivers/net/ethernet/netronome/nfp/flower/offload.c
index 7e725fa60347..a0f8892bb4b5 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/offload.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/offload.c
@@ -1324,8 +1324,7 @@ static int nfp_flower_setup_tc_block(struct net_device *netdev,
 					  &nfp_block_cb_list))
 			return -EBUSY;
 
-		block_cb = flow_block_cb_alloc(f->net,
-					       nfp_flower_setup_tc_block_cb,
+		block_cb = flow_block_cb_alloc(nfp_flower_setup_tc_block_cb,
 					       repr, repr, NULL);
 		if (IS_ERR(block_cb))
 			return PTR_ERR(block_cb);
@@ -1430,8 +1429,7 @@ nfp_flower_setup_indr_tc_block(struct net_device *netdev, struct nfp_app *app,
 		cb_priv->app = app;
 		list_add(&cb_priv->list, &priv->indr_block_cb_priv);
 
-		block_cb = flow_block_cb_alloc(f->net,
-					       nfp_flower_setup_indr_block_cb,
+		block_cb = flow_block_cb_alloc(nfp_flower_setup_indr_block_cb,
 					       cb_priv, cb_priv,
 					       nfp_flower_setup_indr_tc_release);
 		if (IS_ERR(block_cb)) {
diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h
index db337299e81e..aa9b5287b231 100644
--- a/include/net/flow_offload.h
+++ b/include/net/flow_offload.h
@@ -264,7 +264,6 @@ struct flow_block_offload {
 struct flow_block_cb {
 	struct list_head	driver_list;
 	struct list_head	list;
-	struct net		*net;
 	tc_setup_cb_t		*cb;
 	void			*cb_ident;
 	void			*cb_priv;
@@ -272,7 +271,7 @@ struct flow_block_cb {
 	unsigned int		refcnt;
 };
 
-struct flow_block_cb *flow_block_cb_alloc(struct net *net, tc_setup_cb_t *cb,
+struct flow_block_cb *flow_block_cb_alloc(tc_setup_cb_t *cb,
 					  void *cb_ident, void *cb_priv,
 					  void (*release)(void *cb_priv));
 void flow_block_cb_free(struct flow_block_cb *block_cb);
diff --git a/net/core/flow_offload.c b/net/core/flow_offload.c
index 76f8db3841d7..507de4b48815 100644
--- a/net/core/flow_offload.c
+++ b/net/core/flow_offload.c
@@ -165,7 +165,7 @@ void flow_rule_match_enc_opts(const struct flow_rule *rule,
 }
 EXPORT_SYMBOL(flow_rule_match_enc_opts);
 
-struct flow_block_cb *flow_block_cb_alloc(struct net *net, tc_setup_cb_t *cb,
+struct flow_block_cb *flow_block_cb_alloc(tc_setup_cb_t *cb,
 					  void *cb_ident, void *cb_priv,
 					  void (*release)(void *cb_priv))
 {
@@ -175,7 +175,6 @@ struct flow_block_cb *flow_block_cb_alloc(struct net *net, tc_setup_cb_t *cb,
 	if (!block_cb)
 		return ERR_PTR(-ENOMEM);
 
-	block_cb->net = net;
 	block_cb->cb = cb;
 	block_cb->cb_ident = cb_ident;
 	block_cb->cb_priv = cb_priv;
@@ -200,8 +199,7 @@ struct flow_block_cb *flow_block_cb_lookup(struct flow_block_offload *f,
 	struct flow_block_cb *block_cb;
 
 	list_for_each_entry(block_cb, f->driver_block_list, driver_list) {
-		if (block_cb->net == f->net &&
-		    block_cb->cb == cb &&
+		if (block_cb->cb == cb &&
 		    block_cb->cb_ident == cb_ident)
 			return block_cb;
 	}
@@ -261,8 +259,7 @@ int flow_block_cb_setup_simple(struct flow_block_offload *f,
 		if (flow_block_cb_is_busy(cb, cb_ident, driver_block_list))
 			return -EBUSY;
 
-		block_cb = flow_block_cb_alloc(f->net, cb, cb_ident,
-					       cb_priv, NULL);
+		block_cb = flow_block_cb_alloc(cb, cb_ident, cb_priv, NULL);
 		if (IS_ERR(block_cb))
 			return PTR_ERR(block_cb);
 
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 614c38ece104..6ca9ec58f881 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -967,7 +967,7 @@ static int dsa_slave_setup_tc_block(struct net_device *dev,
 		if (flow_block_cb_is_busy(cb, dev, &dsa_slave_block_cb_list))
 			return -EBUSY;
 
-		block_cb = flow_block_cb_alloc(f->net, cb, dev, dev, NULL);
+		block_cb = flow_block_cb_alloc(cb, dev, dev, NULL);
 		if (IS_ERR(block_cb))
 			return PTR_ERR(block_cb);
 
-- 
2.11.0



^ permalink raw reply related

* Re: [bpf-next v3 04/12] selftests/bpf: Use bpf_prog_test_run_xattr
From: Andrii Nakryiko @ 2019-07-11  0:03 UTC (permalink / raw)
  To: Krzesimir Nowak
  Cc: open list, Alban Crequy, Iago López Galeiras,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, David S. Miller, Jakub Kicinski,
	Jesper Dangaard Brouer, John Fastabend, Stanislav Fomichev,
	Networking, bpf, xdp-newbies
In-Reply-To: <20190708163121.18477-5-krzesimir@kinvolk.io>

On Mon, Jul 8, 2019 at 3:43 PM Krzesimir Nowak <krzesimir@kinvolk.io> wrote:
>
> The bpf_prog_test_run_xattr function gives more options to set up a
> test run of a BPF program than the bpf_prog_test_run function.
>
> We will need this extra flexibility to pass ctx data later.
>
> Signed-off-by: Krzesimir Nowak <krzesimir@kinvolk.io>
> ---

lgtm, with some nits below

Acked-by: Andrii Nakryiko <andriin@fb.com>

>  tools/testing/selftests/bpf/test_verifier.c | 16 +++++++++++-----
>  1 file changed, 11 insertions(+), 5 deletions(-)
>
> diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c
> index c7541f572932..1640ba9f12c1 100644
> --- a/tools/testing/selftests/bpf/test_verifier.c
> +++ b/tools/testing/selftests/bpf/test_verifier.c
> @@ -822,14 +822,20 @@ static int do_prog_test_run(int fd_prog, bool unpriv, uint32_t expected_val,
>  {
>         __u8 tmp[TEST_DATA_LEN << 2];
>         __u32 size_tmp = sizeof(tmp);

nit: this is now is not needed as a separate local variable, inline?

> -       uint32_t retval;
>         int saved_errno;
>         int err;
> +       struct bpf_prog_test_run_attr attr = {
> +               .prog_fd = fd_prog,
> +               .repeat = 1,
> +               .data_in = data,
> +               .data_size_in = size_data,
> +               .data_out = tmp,
> +               .data_size_out = size_tmp,
> +       };
>
>         if (unpriv)
>                 set_admin(true);
> -       err = bpf_prog_test_run(fd_prog, 1, data, size_data,
> -                               tmp, &size_tmp, &retval, NULL);
> +       err = bpf_prog_test_run_xattr(&attr);
>         saved_errno = errno;
>         if (unpriv)
>                 set_admin(false);
> @@ -846,9 +852,9 @@ static int do_prog_test_run(int fd_prog, bool unpriv, uint32_t expected_val,
>                         return err;
>                 }
>         }
> -       if (retval != expected_val &&
> +       if (attr.retval != expected_val &&
>             expected_val != POINTER_VALUE) {

this if condition now fits one line, can you please combine? thanks!

> -               printf("FAIL retval %d != %d ", retval, expected_val);
> +               printf("FAIL retval %d != %d ", attr.retval, expected_val);
>                 return 1;
>         }
>
> --
> 2.20.1
>

^ permalink raw reply

* Re: [bpf-next v3 03/12] selftests/bpf: Avoid another case of errno clobbering
From: Andrii Nakryiko @ 2019-07-10 23:57 UTC (permalink / raw)
  To: Krzesimir Nowak
  Cc: open list, Alban Crequy, Iago López Galeiras,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, David S. Miller, Jakub Kicinski,
	Jesper Dangaard Brouer, John Fastabend, Stanislav Fomichev,
	Networking, bpf, xdp-newbies
In-Reply-To: <20190708163121.18477-4-krzesimir@kinvolk.io>

On Mon, Jul 8, 2019 at 3:43 PM Krzesimir Nowak <krzesimir@kinvolk.io> wrote:
>
> Commit 8184d44c9a57 ("selftests/bpf: skip verifier tests for
> unsupported program types") added a check for an unsupported program
> type. The function doing it changes errno, so test_verifier should
> save it before calling it if test_verifier wants to print a reason why
> verifying a BPF program of a supported type failed.
>
> Changes since v2:
> - Move the declaration to fit the reverse christmas tree style.
>
> Fixes: 8184d44c9a57 ("selftests/bpf: skip verifier tests for unsupported program types")
> Cc: Stanislav Fomichev <sdf@google.com>
> Signed-off-by: Krzesimir Nowak <krzesimir@kinvolk.io>
> ---

Acked-by: Andrii Nakryiko <andriin@fb.com>

>  tools/testing/selftests/bpf/test_verifier.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c
> index 3fe126e0083b..c7541f572932 100644
> --- a/tools/testing/selftests/bpf/test_verifier.c
> +++ b/tools/testing/selftests/bpf/test_verifier.c
> @@ -864,6 +864,7 @@ static void do_test_single(struct bpf_test *test, bool unpriv,
>         int run_errs, run_successes;
>         int map_fds[MAX_NR_MAPS];
>         const char *expected_err;
> +       int saved_errno;
>         int fixup_skips;

nit: combine those ints? or even with i and err below as well?

>         __u32 pflags;
>         int i, err;
> @@ -894,6 +895,7 @@ static void do_test_single(struct bpf_test *test, bool unpriv,
>                 pflags |= BPF_F_ANY_ALIGNMENT;
>         fd_prog = bpf_verify_program(prog_type, prog, prog_len, pflags,
>                                      "GPL", 0, bpf_vlog, sizeof(bpf_vlog), 4);
> +       saved_errno = errno;
>         if (fd_prog < 0 && !bpf_probe_prog_type(prog_type, 0)) {
>                 printf("SKIP (unsupported program type %d)\n", prog_type);
>                 skips++;
> @@ -910,7 +912,7 @@ static void do_test_single(struct bpf_test *test, bool unpriv,
>         if (expected_ret == ACCEPT) {
>                 if (fd_prog < 0) {
>                         printf("FAIL\nFailed to load prog '%s'!\n",
> -                              strerror(errno));
> +                              strerror(saved_errno));
>                         goto fail_log;
>                 }
>  #ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
> --
> 2.20.1
>

^ permalink raw reply

* Re: [bpf-next v3 02/12] selftests/bpf: Avoid a clobbering of errno
From: Andrii Nakryiko @ 2019-07-10 23:51 UTC (permalink / raw)
  To: Krzesimir Nowak
  Cc: open list, Alban Crequy, Iago López Galeiras,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, David S. Miller, Jakub Kicinski,
	Jesper Dangaard Brouer, John Fastabend, Stanislav Fomichev,
	Networking, bpf, xdp-newbies
In-Reply-To: <20190708163121.18477-3-krzesimir@kinvolk.io>

On Mon, Jul 8, 2019 at 3:42 PM Krzesimir Nowak <krzesimir@kinvolk.io> wrote:
>
> Save errno right after bpf_prog_test_run returns, so we later check
> the error code actually set by bpf_prog_test_run, not by some libcap
> function.
>
> Changes since v1:
> - Fix the "Fixes:" tag to mention actual commit that introduced the
>   bug
>
> Changes since v2:
> - Move the declaration so it fits the reverse christmas tree style.
>
> Cc: Daniel Borkmann <daniel@iogearbox.net>
> Fixes: 832c6f2c29ec ("bpf: test make sure to run unpriv test cases in test_verifier")
> Signed-off-by: Krzesimir Nowak <krzesimir@kinvolk.io>
> ---
>  tools/testing/selftests/bpf/test_verifier.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c
> index b8d065623ead..3fe126e0083b 100644
> --- a/tools/testing/selftests/bpf/test_verifier.c
> +++ b/tools/testing/selftests/bpf/test_verifier.c
> @@ -823,16 +823,18 @@ static int do_prog_test_run(int fd_prog, bool unpriv, uint32_t expected_val,
>         __u8 tmp[TEST_DATA_LEN << 2];
>         __u32 size_tmp = sizeof(tmp);
>         uint32_t retval;
> +       int saved_errno;
>         int err;
>
>         if (unpriv)
>                 set_admin(true);
>         err = bpf_prog_test_run(fd_prog, 1, data, size_data,
>                                 tmp, &size_tmp, &retval, NULL);

Given err is either 0 or -1, how about instead making err useful right
here without extra variable?

if (bpf_prog_test_run(...))
        err = errno;

> +       saved_errno = errno;
>         if (unpriv)
>                 set_admin(false);
>         if (err) {
> -               switch (errno) {
> +               switch (saved_errno) {
>                 case 524/*ENOTSUPP*/:

ENOTSUPP is defined in include/linux/errno.h, is there any problem
with using this in selftests?

>                         printf("Did not run the program (not supported) ");
>                         return 0;
> --
> 2.20.1
>

^ permalink raw reply

* Re: [bpf-next v3 01/12] selftests/bpf: Print a message when tester could not run a program
From: Andrii Nakryiko @ 2019-07-10 23:44 UTC (permalink / raw)
  To: Krzesimir Nowak
  Cc: open list, Alban Crequy, Iago López Galeiras,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, David S. Miller, Jakub Kicinski,
	Jesper Dangaard Brouer, John Fastabend, Stanislav Fomichev,
	Networking, bpf, xdp-newbies
In-Reply-To: <20190708163121.18477-2-krzesimir@kinvolk.io>

On Mon, Jul 8, 2019 at 3:42 PM Krzesimir Nowak <krzesimir@kinvolk.io> wrote:
>
> This prints a message when the error is about program type being not
> supported by the test runner or because of permissions problem. This
> is to see if the program we expected to run was actually executed.
>
> The messages are open-coded because strerror(ENOTSUPP) returns
> "Unknown error 524".
>
> Changes since v2:
> - Also print "FAIL" on an unexpected bpf_prog_test_run error, so there
>   is a corresponding "FAIL" message for each failed test.
>
> Signed-off-by: Krzesimir Nowak <krzesimir@kinvolk.io>
> ---
>  tools/testing/selftests/bpf/test_verifier.c | 17 +++++++++++++----
>  1 file changed, 13 insertions(+), 4 deletions(-)
>
> diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c
> index c5514daf8865..b8d065623ead 100644
> --- a/tools/testing/selftests/bpf/test_verifier.c
> +++ b/tools/testing/selftests/bpf/test_verifier.c
> @@ -831,11 +831,20 @@ static int do_prog_test_run(int fd_prog, bool unpriv, uint32_t expected_val,
>                                 tmp, &size_tmp, &retval, NULL);
>         if (unpriv)
>                 set_admin(false);
> -       if (err && errno != 524/*ENOTSUPP*/ && errno != EPERM) {
> -               printf("Unexpected bpf_prog_test_run error ");
> -               return err;
> +       if (err) {
> +               switch (errno) {
> +               case 524/*ENOTSUPP*/:
> +                       printf("Did not run the program (not supported) ");
> +                       return 0;
> +               case EPERM:
> +                       printf("Did not run the program (no permission) ");

Let's add "SKIP: " prefix to these?

> +                       return 0;
> +               default:
> +                       printf("FAIL: Unexpected bpf_prog_test_run error (%s) ", strerror(saved_errno));
> +                       return err;
> +               }
>         }
> -       if (!err && retval != expected_val &&
> +       if (retval != expected_val &&
>             expected_val != POINTER_VALUE) {
>                 printf("FAIL retval %d != %d ", retval, expected_val);
>                 return 1;
> --
> 2.20.1
>

^ permalink raw reply

* bpf_prog_free_deferred bug. WAS: Re: WARNING in mark_chain_precision
From: Andrii Nakryiko @ 2019-07-10 23:29 UTC (permalink / raw)
  To: Hillf Danton
  Cc: syzbot, aaron.f.brown, Alexei Starovoitov, bpf, Daniel Borkmann,
	David S. Miller, hawk, intel-wired-lan, Jakub Kicinski,
	jeffrey.t.kirsher, john fastabend, Martin Lau, open list,
	Networking, sasha.neftin, Song Liu, syzkaller-bugs, xdp-newbies,
	Yonghong Song
In-Reply-To: <000000000000a94981058d37f1a4@google.com>

On Tue, Jul 9, 2019 at 9:08 PM Hillf Danton <hdanton@sina.com> wrote:
>
>
> Mon, 08 Jul 2019 21:25:00 -0700 (PDT)
> > Hello,
> >
> > syzbot has tested the proposed patch but the reproducer still triggered
> > crash:
> > WARNING in bpf_jit_free
> >
> > WARNING: CPU: 0 PID: 9077 at kernel/bpf/core.c:851 bpf_jit_free+0x157/0x1b0
> > Kernel panic - not syncing: panic_on_warn set ...
> > CPU: 0 PID: 9077 Comm: kworker/0:3 Not tainted 5.2.0-rc6+ #1
> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> > Google 01/01/2011
> > Workqueue: events bpf_prog_free_deferred
> > Call Trace:
> >   __dump_stack lib/dump_stack.c:77 [inline]
> >   dump_stack+0x172/0x1f0 lib/dump_stack.c:113
> >   panic+0x2cb/0x744 kernel/panic.c:219
> >   __warn.cold+0x20/0x4d kernel/panic.c:576
> >   report_bug+0x263/0x2b0 lib/bug.c:186
> >   fixup_bug arch/x86/kernel/traps.c:179 [inline]
> >   fixup_bug arch/x86/kernel/traps.c:174 [inline]
> >   do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:272
> >   do_invalid_op+0x37/0x50 arch/x86/kernel/traps.c:291
> >   invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:986
> > RIP: 0010:bpf_jit_free+0x157/0x1b0
> > Code: 00 fc ff df 48 89 fa 48 c1 ea 03 80 3c 02 00 75 5d 48 b8 00 02 00 00
> > 00 00 ad de 48 39 43 70 0f 84 05 ff ff ff e8 09 7f f4 ff <0f> 0b e9 f9 fe
> > ff ff e8 2d 02 2e 00 e9 d9 fe ff ff 48 89 7d e0 e8
> > RSP: 0018:ffff888084affcb0 EFLAGS: 00010293
> > RAX: ffff88808a622100 RBX: ffff88809639d580 RCX: ffffffff817b0b0d
> > RDX: 0000000000000000 RSI: ffffffff817c4557 RDI: ffff88809639d5f0
> > RBP: ffff888084affcd0 R08: 1ffffffff150daa8 R09: fffffbfff150daa9
> > R10: fffffbfff150daa8 R11: ffffffff8a86d547 R12: ffffc90001921000
> > R13: ffff88809639d5e8 R14: ffff8880a0589800 R15: ffff8880ae834d40
> >   bpf_prog_free_deferred+0x27a/0x350 kernel/bpf/core.c:1982
> >   process_one_work+0x989/0x1790 kernel/workqueue.c:2269
> >   worker_thread+0x98/0xe40 kernel/workqueue.c:2415
> >   kthread+0x354/0x420 kernel/kthread.c:255
> >   ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352
> > Kernel Offset: disabled
> > Rebooting in 86400 seconds..
> >
> >
> > Tested on:
> >
> > commit:         b9321614 bpf: fix precision bit propagation for BPF_ST ins..
> > git tree:       https://github.com/anakryiko/linux bpf-fix-precise-bpf_st
> > console output: https://syzkaller.appspot.com/x/log.txt?x=112f0dfda00000
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=6bb3e6e7997c14f9
> > compiler:       gcc (GCC) 9.0.0 20181231 (experimental)
> >
>
> 1, currently, bpf_prog_put(), the put helper, deletes all kallsyms before
> invoking the free helper, bpf_prog_free(); the latter complains if kallsyms
> are detected not all off.
>
> 2, before commit 538950a1b752 ("soreuseport: setsockopt
> SO_ATTACH_REUSEPORT_[CE]BPF"), __bpf_prog_release() already called the put
> helper or the free one for a given prog depending on its type: put for
> BPF_PROG_TYPE_SOCKET_FILTER.
>
> In the commit we can see bpf_prog_destroy(), __bpf_prog_free(),
> bpf_prog_put(), here and then; Note in __get_bpf() the put for
> !BPF_PROG_TYPE_SOCKET_FILTER.
>
> 3, in commit 113214be7f6c ("bpf: refactor bpf_prog_get and type check into
> helper") bpf_prog_get_type() was added and put in __get_bpf().
>
> In the commit we can see other types like BPF_PROG_TYPE_SCHED_ACT and
> BPF_PROG_TYPE_SCHED_CLS.
>
> 4, in commit 8217ca653ec6 ("bpf: Enable BPF_PROG_TYPE_SK_REUSEPORT bpf prog
> in reuseport selection") sk_reuseport_prog_free() was added without a word
> in the log for it.
>
> 5, enrich prog type in __bpf_prog_release().

Thanks for investigation!

I'm curious what makes BPF_PROG_TYPE_SOCKET_FILTER (and
BPF_PROG_TYPE_SK_REUSEPORT) special, compared to all the other types
of BPF programs. If it's something to do with sockets specifically,
there are a bunch of other programs that deal with sockets, so should
they be handled specially as well (e.g., BPF_PROG_TYPE_CGROUP_SOCK)?

Daniel, do you have any insight here?

>
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -1142,11 +1142,15 @@ static void bpf_release_orig_filter(stru
>
>  static void __bpf_prog_release(struct bpf_prog *prog)
>  {
> -       if (prog->type == BPF_PROG_TYPE_SOCKET_FILTER) {
> +       switch (prog->type) {
> +       case BPF_PROG_TYPE_SOCKET_FILTER:
> +       case BPF_PROG_TYPE_SK_REUSEPORT:
>                 bpf_prog_put(prog);
> -       } else {
> +               break;
> +       default:
>                 bpf_release_orig_filter(prog);
>                 bpf_prog_free(prog);
> +               break;
>         }
>  }
>
> --
>

^ permalink raw reply

* Re: [RESEND] test_verifier #13 fails on arm64: "retval 65507 != -29"
From: Andi Kleen @ 2019-07-10 23:14 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Paolo Pisati, Yonghong Song, Daniel Borkmann, Alexei Starovoitov,
	Martin Lau, Song Liu, bpf@vger.kernel.org, Networking
In-Reply-To: <CAEf4Bzbtpqk-9ELnHFsHo278b5T4Z-2CgNnNbOqbD5Ocbuc-fg@mail.gmail.com>

> > Reading csum_partial/csum_fold, seems like after calculation of
> > checksum (so-called unfolded checksum), it is supposed to be passed
> > into csum_fold() to convert it into 16-bit one and invert.

Yes, you always need to fold at the end.

The low level code does fold sometimes, but not always.

-Andi

^ permalink raw reply

* Re: [PATCH net-next] net/mlx5e: Provide cb_list pointer when setting up tc block on rep
From: Pablo Neira Ayuso @ 2019-07-10 23:15 UTC (permalink / raw)
  To: Vlad Buslov; +Cc: netdev, jhs, xiyou.wangcong, jiri, davem, saeedm
In-Reply-To: <20190710182554.2988-1-vladbu@mellanox.com>

On Wed, Jul 10, 2019 at 09:25:54PM +0300, Vlad Buslov wrote:
> Recent refactoring of tc block offloads infrastructure introduced new
> flow_block_cb_setup_simple() method intended to be used as unified way for
> all drivers to register offload callbacks. However, commit that actually
> extended all users (drivers) with block cb list and provided it to
> flow_block infra missed mlx5 en_rep. This leads to following NULL-pointer
> dereference when creating Qdisc:
> 
> [  278.385175] BUG: kernel NULL pointer dereference, address: 0000000000000000
> [  278.393233] #PF: supervisor read access in kernel mode
> [  278.399446] #PF: error_code(0x0000) - not-present page
> [  278.405847] PGD 8000000850e73067 P4D 8000000850e73067 PUD 8620cd067 PMD 0
> [  278.414141] Oops: 0000 [#1] SMP PTI
> [  278.419019] CPU: 7 PID: 3369 Comm: tc Not tainted 5.2.0-rc6+ #492
> [  278.426580] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017
> [  278.435853] RIP: 0010:flow_block_cb_setup_simple+0xc4/0x190
> [  278.442953] Code: 10 48 89 42 08 48 89 10 48 b8 00 01 00 00 00 00 ad de 49 89 00 48 05 00 01 00 00 49 89 40 08 31 c0 c3 b8 a1 ff ff ff c3 f3 c3 <48> 8b 06 48 39 c6 75 0a eb 1a 48 8b 00 48 39 c6 74 12
>  48 3b 50 28
> [  278.464829] RSP: 0018:ffffaf07c3f97990 EFLAGS: 00010246
> [  278.471648] RAX: 0000000000000000 RBX: ffff9b43ed4c7680 RCX: ffff9b43d5f80840
> [  278.480408] RDX: ffffffffc0491650 RSI: 0000000000000000 RDI: ffffaf07c3f97998
> [  278.489110] RBP: ffff9b43ddff9000 R08: ffff9b43d5f80840 R09: 0000000000000001
> [  278.497838] R10: 0000000000000009 R11: 00000000000003ad R12: ffffaf07c3f97c08
> [  278.506595] R13: ffff9b43d5f80000 R14: ffff9b43ed4c7680 R15: ffff9b43dfa20b40
> [  278.515374] FS:  00007f796be1b400(0000) GS:ffff9b43ef840000(0000) knlGS:0000000000000000
> [  278.525099] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  278.532453] CR2: 0000000000000000 CR3: 0000000840398002 CR4: 00000000001606e0
> [  278.541197] Call Trace:
> [  278.545252]  tcf_block_offload_cmd.isra.52+0x7e/0xb0
> [  278.551871]  tcf_block_get_ext+0x365/0x3e0
> [  278.557569]  qdisc_create+0x15c/0x4e0
> [  278.562859]  ? kmem_cache_alloc_trace+0x1a2/0x1c0
> [  278.569235]  tc_modify_qdisc+0x1c8/0x780
> [  278.574761]  rtnetlink_rcv_msg+0x291/0x340
> [  278.580518]  ? _cond_resched+0x15/0x40
> [  278.585856]  ? rtnl_calcit.isra.29+0x120/0x120
> [  278.591868]  netlink_rcv_skb+0x4a/0x110
> [  278.597198]  netlink_unicast+0x1a0/0x250
> [  278.602601]  netlink_sendmsg+0x2c1/0x3c0
> [  278.608022]  sock_sendmsg+0x5b/0x60
> [  278.612969]  ___sys_sendmsg+0x289/0x310
> [  278.618231]  ? do_wp_page+0x99/0x730
> [  278.623216]  ? page_add_new_anon_rmap+0xbe/0x140
> [  278.629298]  ? __handle_mm_fault+0xc84/0x1360
> [  278.635113]  ? __sys_sendmsg+0x5e/0xa0
> [  278.640285]  __sys_sendmsg+0x5e/0xa0
> [  278.645239]  do_syscall_64+0x5b/0x1b0
> [  278.650274]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [  278.656697] RIP: 0033:0x7f796abdeb87
> [  278.661628] Code: 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 80 00 00 00 00 8b 05 6a 2b 2c 00 48 63 d2 48 63 ff 85 c0 75 18 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 59 f3 c3 0f 1f 80 00 00 00 00 53
>  48 89 f3 48
> [  278.683248] RSP: 002b:00007ffde213ba48 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
> [  278.692245] RAX: ffffffffffffffda RBX: 000000005d261e6f RCX: 00007f796abdeb87
> [  278.700862] RDX: 0000000000000000 RSI: 00007ffde213bab0 RDI: 0000000000000003
> [  278.709527] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000006
> [  278.718167] R10: 000000000000000c R11: 0000000000000246 R12: 0000000000000001
> [  278.726743] R13: 000000000067b580 R14: 0000000000000000 R15: 0000000000000000
> [  278.735302] Modules linked in: dummy vxlan ip6_udp_tunnel udp_tunnel sch_ingress nfsv3 nfs_acl nfs lockd grace fscache bridge stp llc sunrpc mlx5_ib ib_uverbs intel_rapl ib_core sb_edac x86_pkg_temp_
> thermal intel_powerclamp coretemp kvm_intel kvm mlx5_core irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel igb ghash_clmulni_intel ses mei_me enclosure mlxfw ipmi_ssif intel_cstate iTCO_wdt ptp mei
> pps_core iTCO_vendor_support pcspkr joydev intel_uncore i2c_i801 ipmi_si lpc_ich intel_rapl_perf ioatdma wmi dca pcc_cpufreq ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad ast i2c_algo_bit drm_k
> ms_helper ttm drm mpt3sas raid_class scsi_transport_sas
> [  278.802263] CR2: 0000000000000000
> [  278.807170] ---[ end trace b1f0a442a279e66f ]---
> 
> Extend en_rep with new static mlx5e_rep_block_cb_list list and pass it to
> flow_block_cb_setup_simple() function instead of hardcoded NULL pointer.
> 
> Fixes: 955bcb6ea0df ("drivers: net: use flow block API")
> Signed-off-by: Vlad Buslov <vladbu@mellanox.com>

Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>

There's a similar patch from wenxu BTW.

^ permalink raw reply

* Re: [PATCH net-next] net/sched: Fix kernel NULL pointer dereference
From: Pablo Neira Ayuso @ 2019-07-10 23:13 UTC (permalink / raw)
  To: wenxu; +Cc: davem, netfilter-devel, netdev
In-Reply-To: <20190710220337.tbflwdku4332ewo5@salvia>

On Thu, Jul 11, 2019 at 12:03:37AM +0200, Pablo Neira Ayuso wrote:
> On Wed, Jul 10, 2019 at 09:45:04PM +0800, wenxu@ucloud.cn wrote:
> > From: wenxu <wenxu@ucloud.cn>
> > 
> > [  697.665184] BUG: kernel NULL pointer dereference, address: 0000000000000030
> > [  697.665550] #PF: supervisor read access in kernel mode
> > [  697.665906] #PF: error_code(0x0000) - not-present page
> > [  697.666297] PGD 800000104e636067 P4D 800000104e636067 PUD ff4b02067 PMD 0
> > [  697.666710] Oops: 0000 [#1] SMP PTI
> > [  697.667115] CPU: 31 PID: 24466 Comm: modprobe Kdump: loaded Tainted: G           O      5.2.0-rc6+ #1
> > [  697.667867] Hardware name: Huawei Technologies Co., Ltd. RH1288 V3/BC11HGSC0, BIOS 3.57 02/26/2017
> > [  697.668620] RIP: 0010:tc_indr_block_ing_cmd.isra.52+0x4c/0xb0
> > [  697.669029] Code: 83 ec 40 65 48 8b 04 25 28 00 00 00 48 89 45 e8 31 c0 f3 48 ab 48 8b 06 49 8b b3 e8 04 00 00 44 89 45 b0 c7 45 b4 01 00 00 00 <8b> 48 30 48 89 75 c0 85 c9 48 8d 4d b0 0f 95 45 b8 48 85 c0 4c 8d
> > [  697.670132] RSP: 0018:ffffc90007bf7958 EFLAGS: 00010246
> > [  697.670537] RAX: 0000000000000000 RBX: ffff88905e2cbae8 RCX: 0000000000000000
> > [  697.670938] RDX: ffff88905e2cbcd8 RSI: ffffffff823a8480 RDI: ffffc90007bf7990
> > [  697.671352] RBP: ffffc90007bf79a8 R08: 0000000000000000 R09: ffff88905e2cbcc0
> > [  697.671761] R10: ffff888107c07780 R11: ffff88902c249000 R12: ffff88905e2cbcd0
> > [  697.672173] R13: ffff88905e2cbac0 R14: ffff88885596bc00 R15: ffff88905e2cbcc0
> > [  697.672582] FS:  00007fe0b4095740(0000) GS:ffff88905fbc0000(0000) knlGS:0000000000000000
> > [  697.673335] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [  697.673746] CR2: 0000000000000030 CR3: 0000000ff46b4005 CR4: 00000000001606e0
> > [  697.674156] Call Trace:
> > [  697.674563]  __tc_indr_block_cb_register+0x11e/0x3c0
> > [  697.674998]  mlx5e_nic_rep_netdevice_event+0x9e/0x110 [mlx5_core]
> > [  697.675411]  notifier_call_chain+0x53/0xa0
> > [  697.675812]  raw_notifier_call_chain+0x16/0x20
> > [  697.676223]  call_netdevice_notifiers_info+0x2d/0x60
> > [  697.676633]  register_netdevice+0x3fa/0x500
> > 
> > get indr_dev->block after check it.
> > 
> > Fixes: 955bcb6ea0df ("drivers: net: use flow block API")
> > Signed-off-by: wenxu <wenxu@ucloud.cn>
> 
> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>

Please, toss this patch.

Vlad's patch provides a more complete fix.

^ permalink raw reply

* Re: [RESEND] test_verifier #13 fails on arm64: "retval 65507 != -29"
From: Andrii Nakryiko @ 2019-07-10 22:54 UTC (permalink / raw)
  To: Paolo Pisati
  Cc: Yonghong Song, Daniel Borkmann, Alexei Starovoitov, Martin Lau,
	Song Liu, bpf@vger.kernel.org, Networking, ak
In-Reply-To: <CAEf4BzayQ+bEKFHcs8cUDcVnwPpQ2_2gzPaxX-j38r=AWDzVvg@mail.gmail.com>

On Wed, Jul 10, 2019 at 3:52 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> cc Andi Kleen re: refactoring, do you have any insight here?

OK, let's try again...

>
> On Wed, Jul 10, 2019 at 3:12 AM Paolo Pisati <p.pisati@gmail.com> wrote:
> >
> > On Mon, Jul 01, 2019 at 09:51:25PM +0000, Yonghong Song wrote:
> > >
> > > Below is the test case.
> > > {
> > >          "valid read map access into a read-only array 2",
> > >          .insns = {
> > >          BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
> > >          BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
> > >          BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
> > >          BPF_LD_MAP_FD(BPF_REG_1, 0),
> > >          BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
> > > BPF_FUNC_map_lookup_elem),
> > >          BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 6),
> > >
> > >          BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
> > >          BPF_MOV64_IMM(BPF_REG_2, 4),
> > >          BPF_MOV64_IMM(BPF_REG_3, 0),
> > >          BPF_MOV64_IMM(BPF_REG_4, 0),
> > >          BPF_MOV64_IMM(BPF_REG_5, 0),
> > >          BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
> > >                       BPF_FUNC_csum_diff),
> > >          BPF_EXIT_INSN(),
> > >          },
> > >          .prog_type = BPF_PROG_TYPE_SCHED_CLS,
> > >          .fixup_map_array_ro = { 3 },
> > >          .result = ACCEPT,
> > >          .retval = -29,
> > > },
> > >
> > > The issue may be with helper bpf_csum_diff().
> > > Maybe you can check bpf_csum_diff() helper return value
> > > to confirm and take a further look at bpf_csum_diff implementations
> > > between x64 and amd64.
> >
> > Indeed, the different result comes from csum_partial() or, more precisely,
> > do_csum().
>
> I assume this is checksum used for ipv4 header checksum ([0]), right?
> It's defined as "16-bit one's complement of the one's complement sum
> of all 16-bit words", so at the end it has to be folded into 16-bit
> anyways.
>
> Reading csum_partial/csum_fold, seems like after calculation of
> checksum (so-called unfolded checksum), it is supposed to be passed
> into csum_fold() to convert it into 16-bit one and invert.
>
> So maybe that's what is missing from bpf_csum_diff()? Though I've
> never used it and don't even know exactly what is its purpose, so
> might be (and probably am) totally wrong here (e.g., it might be that
> BPF app is supposed to do that or something).
>
>   [0] https://en.wikipedia.org/wiki/IPv4_header_checksum
>
> >
> > x86-64 uses an asm optimized version residing in arch/x86/lib/csum-partial_64.c,
> > while the generic version is in lib/checksum.c.
> >
> > I replaced the x86-64 csum_partial() / do_csum() code, with the one in
> > lib/checksum.c and by doing so i reproduced the same error on x86-64 (thus, it's
> > not an arch dependent issue).
> >
> > I added some debugging to bpf_csum_diff(), and here are the results with different
> > checksum implementation code:
> >
> > http://paste.debian.net/1091037/
> >
> > lib/checksum.c:
> > ...
> > [  206.084537] ____bpf_csum_diff from_size: 1 to_size: 0
> > [  206.085274] ____bpf_csum_diff from[0]: 28
> > [  206.085276] ____bpf_csum_diff diff[0]: 4294967267
> > [  206.085277] ____bpf_csum_diff diff_size: 4 seed: 0
> >
> > After csum_partial() call:
> >
> > [  206.086059] ____bpf_csum_diff csum: 65507 - 0xffe3
> >
> > arch/x86/lib/csum-partial_64.c
> > ...
> > [   40.467308] ____bpf_csum_diff from_size: 1 to_size: 0
> > [   40.468141] ____bpf_csum_diff from[0]: 28
> > [   40.468143] ____bpf_csum_diff diff[0]: 4294967267
> > [   40.468144] ____bpf_csum_diff diff_size: 4 seed: 0
> >
> > After csum_partial() call:
> >
> > [   40.468937] ____bpf_csum_diff csum: -29 - 0xffffffe3
> >
> > One thing that i noticed, x86-64 csum-partial_64.c::do_csum() doesn't reduce the
> > calculated checksum to 16bit before returning it (unless the input value is
> > odd - *):
> >
> > arch/x86/lib/csum-partial_64.c::do_csum()
> >                 ...
> >         if (unlikely(odd)) {
> >                 result = from32to16(result);
> >                 result = ((result >> 8) & 0xff) | ((result & 0xff) << 8);
> >         }
> >         return result;
> > }
> >
> > contrary to all the other do_csum() implementations (that i could understand):
> >
> > lib/checksum.c::do_csum()
> > arch/alpha/lib/checksum.c::do_csum()
> > arch/parisc/lib/checksum.c::do_csum()
> >
> > Apparently even ia64 does the folding (arch/ia64/lib/do_csum.S see a comment right
> > before .do_csum_exit:), and arch/c6x/lib/csum_64plus.S too (see
> > arch/c6x/lib/csum_64plus.S).
> >
> > Funnily enough, if i change do_csum() for x86-64, folding the
> > checksum to 16 bit (following all the other implementations):
> >
> > --- a/arch/x86/lib/csum-partial_64.c
> > +++ b/arch/x86/lib/csum-partial_64.c
> > @@ -112,8 +112,8 @@ static unsigned do_csum(const unsigned char *buff, unsigned
> > len)
> >         if (len & 1)
> >                 result += *buff;
> >         result = add32_with_carry(result>>32, result & 0xffffffff);
> > +       result = from32to16(result);
> >         if (unlikely(odd)) {
> > -               result = from32to16(result);
> >                 result = ((result >> 8) & 0xff) | ((result & 0xff) << 8);
> >         }
> >         return result;
> >
> > then, the x86-64 result match the others: 65507 or 0xffe3.
> >
> > As a last attempt, i tried running the bpf test_verifier on an armhf platform,
> > and i got a completely different number:
> >
> > [   57.667999] ____bpf_csum_diff from_size: 1 to_size: 0
> > [   57.668016] ____bpf_csum_diff from[0]: 28
> > [   57.668028] ____bpf_csum_diff diff[0]: 4294967267
> > [   57.668039] ____bpf_csum_diff diff_size: 4 seed: 0
> >
> > After csum_partial() call:
> >
> > [   57.668052] ____bpf_csum_diff::2002 csum: 131042 - 0x0001ffe2
> >
> > Not sure what to make of these number, but i have a question: whats is the
> > correct checksum of the memory chunk passed to csum_partial()? Is it really -29?
> >
> > Because, at least 2 other implementations i tested (the arm assembly code and
> > the c implementation in lib/checksum.c) computes a different value, so either
> > there's a bug in checksum calcution (2 out of 3???), or we are interpreting the
> > returned value from csum_partial() somehow wrongly.
> >
> > *: originally, the x86-64 did the 16bit folding, but the logic was changed to
> > what we have today during a big rewrite - search for:
> >
> > commit 3ef076bb685a461bbaff37a1f06010fc4d7ce733
> > Author: Andi Kleen <ak@suse.de>
> > Date:   Fri Jun 13 04:27:34 2003 -0700
> >
> >     [PATCH] x86-64 merge
> >
> > in this historic repo:
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
> > --
> > bye,
> > p.

^ permalink raw reply

* Re: [RESEND] test_verifier #13 fails on arm64: "retval 65507 != -29"
From: Andrii Nakryiko @ 2019-07-10 22:52 UTC (permalink / raw)
  To: Paolo Pisati
  Cc: Yonghong Song, Daniel Borkmann, Alexei Starovoitov, Martin Lau,
	Song Liu, bpf@vger.kernel.org, Networking, ak
In-Reply-To: <20190710100248.GA32281@harukaze>

cc Andi Kleen re: refactoring, do you have any insight here?

On Wed, Jul 10, 2019 at 3:12 AM Paolo Pisati <p.pisati@gmail.com> wrote:
>
> On Mon, Jul 01, 2019 at 09:51:25PM +0000, Yonghong Song wrote:
> >
> > Below is the test case.
> > {
> >          "valid read map access into a read-only array 2",
> >          .insns = {
> >          BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
> >          BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
> >          BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
> >          BPF_LD_MAP_FD(BPF_REG_1, 0),
> >          BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
> > BPF_FUNC_map_lookup_elem),
> >          BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 6),
> >
> >          BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
> >          BPF_MOV64_IMM(BPF_REG_2, 4),
> >          BPF_MOV64_IMM(BPF_REG_3, 0),
> >          BPF_MOV64_IMM(BPF_REG_4, 0),
> >          BPF_MOV64_IMM(BPF_REG_5, 0),
> >          BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
> >                       BPF_FUNC_csum_diff),
> >          BPF_EXIT_INSN(),
> >          },
> >          .prog_type = BPF_PROG_TYPE_SCHED_CLS,
> >          .fixup_map_array_ro = { 3 },
> >          .result = ACCEPT,
> >          .retval = -29,
> > },
> >
> > The issue may be with helper bpf_csum_diff().
> > Maybe you can check bpf_csum_diff() helper return value
> > to confirm and take a further look at bpf_csum_diff implementations
> > between x64 and amd64.
>
> Indeed, the different result comes from csum_partial() or, more precisely,
> do_csum().

I assume this is checksum used for ipv4 header checksum ([0]), right?
It's defined as "16-bit one's complement of the one's complement sum
of all 16-bit words", so at the end it has to be folded into 16-bit
anyways.

Reading csum_partial/csum_fold, seems like after calculation of
checksum (so-called unfolded checksum), it is supposed to be passed
into csum_fold() to convert it into 16-bit one and invert.

So maybe that's what is missing from bpf_csum_diff()? Though I've
never used it and don't even know exactly what is its purpose, so
might be (and probably am) totally wrong here (e.g., it might be that
BPF app is supposed to do that or something).

  [0] https://en.wikipedia.org/wiki/IPv4_header_checksum

>
> x86-64 uses an asm optimized version residing in arch/x86/lib/csum-partial_64.c,
> while the generic version is in lib/checksum.c.
>
> I replaced the x86-64 csum_partial() / do_csum() code, with the one in
> lib/checksum.c and by doing so i reproduced the same error on x86-64 (thus, it's
> not an arch dependent issue).
>
> I added some debugging to bpf_csum_diff(), and here are the results with different
> checksum implementation code:
>
> http://paste.debian.net/1091037/
>
> lib/checksum.c:
> ...
> [  206.084537] ____bpf_csum_diff from_size: 1 to_size: 0
> [  206.085274] ____bpf_csum_diff from[0]: 28
> [  206.085276] ____bpf_csum_diff diff[0]: 4294967267
> [  206.085277] ____bpf_csum_diff diff_size: 4 seed: 0
>
> After csum_partial() call:
>
> [  206.086059] ____bpf_csum_diff csum: 65507 - 0xffe3
>
> arch/x86/lib/csum-partial_64.c
> ...
> [   40.467308] ____bpf_csum_diff from_size: 1 to_size: 0
> [   40.468141] ____bpf_csum_diff from[0]: 28
> [   40.468143] ____bpf_csum_diff diff[0]: 4294967267
> [   40.468144] ____bpf_csum_diff diff_size: 4 seed: 0
>
> After csum_partial() call:
>
> [   40.468937] ____bpf_csum_diff csum: -29 - 0xffffffe3
>
> One thing that i noticed, x86-64 csum-partial_64.c::do_csum() doesn't reduce the
> calculated checksum to 16bit before returning it (unless the input value is
> odd - *):
>
> arch/x86/lib/csum-partial_64.c::do_csum()
>                 ...
>         if (unlikely(odd)) {
>                 result = from32to16(result);
>                 result = ((result >> 8) & 0xff) | ((result & 0xff) << 8);
>         }
>         return result;
> }
>
> contrary to all the other do_csum() implementations (that i could understand):
>
> lib/checksum.c::do_csum()
> arch/alpha/lib/checksum.c::do_csum()
> arch/parisc/lib/checksum.c::do_csum()
>
> Apparently even ia64 does the folding (arch/ia64/lib/do_csum.S see a comment right
> before .do_csum_exit:), and arch/c6x/lib/csum_64plus.S too (see
> arch/c6x/lib/csum_64plus.S).
>
> Funnily enough, if i change do_csum() for x86-64, folding the
> checksum to 16 bit (following all the other implementations):
>
> --- a/arch/x86/lib/csum-partial_64.c
> +++ b/arch/x86/lib/csum-partial_64.c
> @@ -112,8 +112,8 @@ static unsigned do_csum(const unsigned char *buff, unsigned
> len)
>         if (len & 1)
>                 result += *buff;
>         result = add32_with_carry(result>>32, result & 0xffffffff);
> +       result = from32to16(result);
>         if (unlikely(odd)) {
> -               result = from32to16(result);
>                 result = ((result >> 8) & 0xff) | ((result & 0xff) << 8);
>         }
>         return result;
>
> then, the x86-64 result match the others: 65507 or 0xffe3.
>
> As a last attempt, i tried running the bpf test_verifier on an armhf platform,
> and i got a completely different number:
>
> [   57.667999] ____bpf_csum_diff from_size: 1 to_size: 0
> [   57.668016] ____bpf_csum_diff from[0]: 28
> [   57.668028] ____bpf_csum_diff diff[0]: 4294967267
> [   57.668039] ____bpf_csum_diff diff_size: 4 seed: 0
>
> After csum_partial() call:
>
> [   57.668052] ____bpf_csum_diff::2002 csum: 131042 - 0x0001ffe2
>
> Not sure what to make of these number, but i have a question: whats is the
> correct checksum of the memory chunk passed to csum_partial()? Is it really -29?
>
> Because, at least 2 other implementations i tested (the arm assembly code and
> the c implementation in lib/checksum.c) computes a different value, so either
> there's a bug in checksum calcution (2 out of 3???), or we are interpreting the
> returned value from csum_partial() somehow wrongly.
>
> *: originally, the x86-64 did the 16bit folding, but the logic was changed to
> what we have today during a big rewrite - search for:
>
> commit 3ef076bb685a461bbaff37a1f06010fc4d7ce733
> Author: Andi Kleen <ak@suse.de>
> Date:   Fri Jun 13 04:27:34 2003 -0700
>
>     [PATCH] x86-64 merge
>
> in this historic repo:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
> --
> bye,
> p.

^ permalink raw reply

* Re: [PATCH net] net: fix use-after-free in __netif_receive_skb_core
From: Sabrina Dubroca @ 2019-07-10 22:47 UTC (permalink / raw)
  To: Edward Cree; +Cc: netdev, Andreas Steinmetz
In-Reply-To: <62ad16f6-c33a-407e-2f55-9be382b7ec52@solarflare.com>

2019-07-10, 16:07:43 +0100, Edward Cree wrote:
> On 10/07/2019 14:52, Sabrina Dubroca wrote:
> > -static int __netif_receive_skb_core(struct sk_buff *skb, bool pfmemalloc,
> > +static int __netif_receive_skb_core(struct sk_buff **pskb, bool pfmemalloc,
> >  				    struct packet_type **ppt_prev)
> >  {
> >  	struct packet_type *ptype, *pt_prev;
> >  	rx_handler_func_t *rx_handler;
> > +	struct sk_buff *skb = *pskb;
> Would it not be simpler just to change all users of skb to *pskb?
> Then you avoid having to keep doing "*pskb = skb;" whenever skb changes
>  (with concomitant risk of bugs if one gets missed).

Yes, that would be less risky. I wrote a version of the patch that did
exactly that, but found it really too ugly (both the patch and the
resulting code). We have more than 50 occurences of skb, including
things like:

    atomic_long_inc(&skb->dev->rx_dropped);

-- 
Sabrina

^ permalink raw reply

* Re: Fw: [Bug 204099] New: systemd-networkd fails on 5.2 - same version works on 5.1.16
From: David Ahern @ 2019-07-10 22:43 UTC (permalink / raw)
  To: Stephen Hemminger, netdev
In-Reply-To: <20190709074344.76049d02@hermes.lan>

On 7/9/19 8:43 AM, Stephen Hemminger wrote:
> Looks like the stricter netlink validation broke userspace.
> This is bad.

I believe other reports have traced this to

commit 7dc2bccab0ee37ac28096b8fcdc390a679a15841
Author: Maxim Mikityanskiy <maximmi@mellanox.com>
Date:   Tue May 21 06:40:04 2019 +0000

    Validate required parameters in inet6_validate_link_af

^ permalink raw reply

* Re: [PATCH V2 0/1] tools/dtrace: initial implementation of DTrace
From: David Miller @ 2019-07-10 22:32 UTC (permalink / raw)
  To: kris.van.hees
  Cc: netdev, bpf, dtrace-devel, linux-kernel, rostedt, mhiramat, acme,
	ast, daniel, peterz, clm
In-Reply-To: <201907101537.x6AFboMR015946@aserv0122.oracle.com>

From: Kris Van Hees <kris.van.hees@oracle.com>
Date: Wed, 10 Jul 2019 08:37:50 -0700 (PDT)

> This is version 2 of the patch, incorporating feedback from Peter
> Zijlstra and Arnaldo Carvalho de Melo.

No way, NACK.

^ permalink raw reply

* Re: [PATCH V2 1/1 (was 0/1 by accident)] tools/dtrace: initial implementation of DTrace
From: David Miller @ 2019-07-10 22:31 UTC (permalink / raw)
  To: brendan.d.gregg
  Cc: kris.van.hees, daniel, corbet, netdev, bpf, dtrace-devel,
	linux-kernel, rostedt, mhiramat, acme, ast, peterz, clm
In-Reply-To: <CAE40pdeSVN+QhhUeQ4sEbsyzJ+NWkQA5XU5X0FrKAbRMHPzBsw@mail.gmail.com>

From: Brendan Gregg <brendan.d.gregg@gmail.com>
Date: Wed, 10 Jul 2019 14:49:52 -0700

> Hey Kris -- so you're referring to me, and I've used DTrace more than
> anyone over the past 15 years, and I don't think anyone has used all
> the different Linux tracers more than I have. I think my opinion has a
> lot of value.

+1

I seriously am against starting to merge all of these userland tracing
tools into the tree.

They belong as separate independant projects, outside of the kernel
tree.

^ permalink raw reply

* Re: NEIGH: BUG, double timer add, state is 8
From: David Ahern @ 2019-07-10 22:18 UTC (permalink / raw)
  To: Lorenzo Bianconi; +Cc: Marek Majkowski, David Miller, netdev, kernel-team
In-Reply-To: <20190705173011.GA2882@localhost.localdomain>

On 7/5/19 11:30 AM, Lorenzo Bianconi wrote:
> looking at the reproducer it seems to me the issue is due to the use of
> 'NTF_USE' from userspace.
> Should we unschedule the neigh timer if we are in IN_TIMER receiving this
> flag from userspace? (taking appropriate locking)

I think you are right. Do you want to send a patch?

^ permalink raw reply

* Re: [PATCH nf-next] net/mlx5e: Fix kernel NULL pointer dereference
From: Pablo Neira Ayuso @ 2019-07-10 22:04 UTC (permalink / raw)
  To: wenxu; +Cc: davem, netdev
In-Reply-To: <1562768310-20468-1-git-send-email-wenxu@ucloud.cn>

On Wed, Jul 10, 2019 at 10:18:30PM +0800, wenxu@ucloud.cn wrote:
> From: wenxu <wenxu@ucloud.cn>
> 
> [ 3444.666552] BUG: kernel NULL pointer dereference, address: 0000000000000000
> [ 3444.666631] #PF: supervisor read access in kernel mode
> [ 3444.666701] #PF: error_code(0x0000) - not-present page
> [ 3444.666769] PGD 8000000812dd7067 P4D 8000000812dd7067 PUD 8207cc067 PMD 0
> [ 3444.666843] Oops: 0000 [#1] SMP PTI
> [ 3444.666910] CPU: 17 PID: 27387 Comm: nft Kdump: loaded Tainted: G           O      5.2.0-rc6+ #1
> [ 3444.666987] Hardware name: Huawei Technologies Co., Ltd. RH1288 V3/BC11HGSC0, BIOS 3.57 02/26/2017
> [ 3444.667071] RIP: 0010:flow_block_cb_setup_simple+0x127/0x240
> [ 3444.667141] Code: 02 48 89 43 08 31 c0 48 83 c4 10 5b 41 5c 41 5d 41 5e 41 5f 5d c3 48 83 c4 10 b8 a1 ff ff ff 5b 41 5c 41 5d 41 5e 41 5f 5d c3 <49> 8b 04 24 49 39 c4 75 0a eb 2f 48 8b 00 49 39 c4 74 27 4c 3b 68
> [ 3444.668201] RSP: 0018:ffffc90007b7b888 EFLAGS: 00010246
> [ 3444.668595] RAX: 0000000000000000 RBX: ffff8890439a9b40 RCX: ffff88904d5008c0
> [ 3444.668992] RDX: ffffffffa0879850 RSI: 0000000000000000 RDI: ffffc90007b7b908
> [ 3444.669389] RBP: ffffc90007b7b8c0 R08: ffff88904d5008c0 R09: 0000000000000001
> [ 3444.669787] R10: ffff88885a797d00 R11: ffff8890439a9b00 R12: 0000000000000000
> [ 3444.670186] R13: ffffffffa0879850 R14: ffffc90007b7b908 R15: ffffffff823a8480
> [ 3444.670588] FS:  00007f357c2fa740(0000) GS:ffff88885fe40000(0000) knlGS:0000000000000000
> [ 3444.671313] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 3444.671705] CR2: 0000000000000000 CR3: 00000001a1600002 CR4: 00000000001626e0
> [ 3444.672103] Call Trace:
> [ 3444.672505]  ? jump_label_update+0x5f/0xc0
> [ 3444.672933]  mlx5e_rep_setup_tc+0x32/0x40 [mlx5_core]
> [ 3444.673335]  nft_flow_offload_chain+0xd0/0x1d0 [nf_tables]
> [ 3444.673729]  nft_flow_rule_offload_commit+0x91/0x11b [nf_tables]
> [ 3444.674129]  nf_tables_commit+0x90/0xe30 [nf_tables]
> [ 3444.674529]  nfnetlink_rcv_batch+0x3b9/0x750 [nfnetlink]
> 
> Init the driver_block_list parameter
> 
> Fixes: 955bcb6ea0df ("drivers: net: use flow block API")
> Signed-off-by: wenxu <wenxu@ucloud.cn>

Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>

^ permalink raw reply

* Re: [PATCH net-next] net/sched: Fix kernel NULL pointer dereference
From: Pablo Neira Ayuso @ 2019-07-10 22:03 UTC (permalink / raw)
  To: wenxu; +Cc: davem, netfilter-devel, netdev
In-Reply-To: <1562766304-20272-1-git-send-email-wenxu@ucloud.cn>

On Wed, Jul 10, 2019 at 09:45:04PM +0800, wenxu@ucloud.cn wrote:
> From: wenxu <wenxu@ucloud.cn>
> 
> [  697.665184] BUG: kernel NULL pointer dereference, address: 0000000000000030
> [  697.665550] #PF: supervisor read access in kernel mode
> [  697.665906] #PF: error_code(0x0000) - not-present page
> [  697.666297] PGD 800000104e636067 P4D 800000104e636067 PUD ff4b02067 PMD 0
> [  697.666710] Oops: 0000 [#1] SMP PTI
> [  697.667115] CPU: 31 PID: 24466 Comm: modprobe Kdump: loaded Tainted: G           O      5.2.0-rc6+ #1
> [  697.667867] Hardware name: Huawei Technologies Co., Ltd. RH1288 V3/BC11HGSC0, BIOS 3.57 02/26/2017
> [  697.668620] RIP: 0010:tc_indr_block_ing_cmd.isra.52+0x4c/0xb0
> [  697.669029] Code: 83 ec 40 65 48 8b 04 25 28 00 00 00 48 89 45 e8 31 c0 f3 48 ab 48 8b 06 49 8b b3 e8 04 00 00 44 89 45 b0 c7 45 b4 01 00 00 00 <8b> 48 30 48 89 75 c0 85 c9 48 8d 4d b0 0f 95 45 b8 48 85 c0 4c 8d
> [  697.670132] RSP: 0018:ffffc90007bf7958 EFLAGS: 00010246
> [  697.670537] RAX: 0000000000000000 RBX: ffff88905e2cbae8 RCX: 0000000000000000
> [  697.670938] RDX: ffff88905e2cbcd8 RSI: ffffffff823a8480 RDI: ffffc90007bf7990
> [  697.671352] RBP: ffffc90007bf79a8 R08: 0000000000000000 R09: ffff88905e2cbcc0
> [  697.671761] R10: ffff888107c07780 R11: ffff88902c249000 R12: ffff88905e2cbcd0
> [  697.672173] R13: ffff88905e2cbac0 R14: ffff88885596bc00 R15: ffff88905e2cbcc0
> [  697.672582] FS:  00007fe0b4095740(0000) GS:ffff88905fbc0000(0000) knlGS:0000000000000000
> [  697.673335] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  697.673746] CR2: 0000000000000030 CR3: 0000000ff46b4005 CR4: 00000000001606e0
> [  697.674156] Call Trace:
> [  697.674563]  __tc_indr_block_cb_register+0x11e/0x3c0
> [  697.674998]  mlx5e_nic_rep_netdevice_event+0x9e/0x110 [mlx5_core]
> [  697.675411]  notifier_call_chain+0x53/0xa0
> [  697.675812]  raw_notifier_call_chain+0x16/0x20
> [  697.676223]  call_netdevice_notifiers_info+0x2d/0x60
> [  697.676633]  register_netdevice+0x3fa/0x500
> 
> get indr_dev->block after check it.
> 
> Fixes: 955bcb6ea0df ("drivers: net: use flow block API")
> Signed-off-by: wenxu <wenxu@ucloud.cn>

Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox