Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH net-next] net/mlx5e: Provide cb_list pointer when setting up tc block on rep
From: wenxu @ 2019-07-11  0:35 UTC (permalink / raw)
  To: Vlad Buslov, netdev; +Cc: jhs, xiyou.wangcong, jiri, davem, pablo, saeedm
In-Reply-To: <20190710182554.2988-1-vladbu@mellanox.com>


在 2019/7/11 2:25, Vlad Buslov 写道:
> Recent refactoring of tc block offloads infrastructure introduced new
> flow_block_cb_setup_simple() method intended to be used as unified way for
> all drivers to register offload callbacks. However, commit that actually
> extended all users (drivers) with block cb list and provided it to
> flow_block infra missed mlx5 en_rep. This leads to following NULL-pointer
> dereference when creating Qdisc:
>
> [  278.385175] BUG: kernel NULL pointer dereference, address: 0000000000000000
> [  278.393233] #PF: supervisor read access in kernel mode
> [  278.399446] #PF: error_code(0x0000) - not-present page
> [  278.405847] PGD 8000000850e73067 P4D 8000000850e73067 PUD 8620cd067 PMD 0
> [  278.414141] Oops: 0000 [#1] SMP PTI
> [  278.419019] CPU: 7 PID: 3369 Comm: tc Not tainted 5.2.0-rc6+ #492
> [  278.426580] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017
> [  278.435853] RIP: 0010:flow_block_cb_setup_simple+0xc4/0x190
> [  278.442953] Code: 10 48 89 42 08 48 89 10 48 b8 00 01 00 00 00 00 ad de 49 89 00 48 05 00 01 00 00 49 89 40 08 31 c0 c3 b8 a1 ff ff ff c3 f3 c3 <48> 8b 06 48 39 c6 75 0a eb 1a 48 8b 00 48 39 c6 74 12
>  48 3b 50 28
> [  278.464829] RSP: 0018:ffffaf07c3f97990 EFLAGS: 00010246
> [  278.471648] RAX: 0000000000000000 RBX: ffff9b43ed4c7680 RCX: ffff9b43d5f80840
> [  278.480408] RDX: ffffffffc0491650 RSI: 0000000000000000 RDI: ffffaf07c3f97998
> [  278.489110] RBP: ffff9b43ddff9000 R08: ffff9b43d5f80840 R09: 0000000000000001
> [  278.497838] R10: 0000000000000009 R11: 00000000000003ad R12: ffffaf07c3f97c08
> [  278.506595] R13: ffff9b43d5f80000 R14: ffff9b43ed4c7680 R15: ffff9b43dfa20b40
> [  278.515374] FS:  00007f796be1b400(0000) GS:ffff9b43ef840000(0000) knlGS:0000000000000000
> [  278.525099] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  278.532453] CR2: 0000000000000000 CR3: 0000000840398002 CR4: 00000000001606e0
> [  278.541197] Call Trace:
> [  278.545252]  tcf_block_offload_cmd.isra.52+0x7e/0xb0
> [  278.551871]  tcf_block_get_ext+0x365/0x3e0
> [  278.557569]  qdisc_create+0x15c/0x4e0
> [  278.562859]  ? kmem_cache_alloc_trace+0x1a2/0x1c0
> [  278.569235]  tc_modify_qdisc+0x1c8/0x780
> [  278.574761]  rtnetlink_rcv_msg+0x291/0x340
> [  278.580518]  ? _cond_resched+0x15/0x40
> [  278.585856]  ? rtnl_calcit.isra.29+0x120/0x120
> [  278.591868]  netlink_rcv_skb+0x4a/0x110
> [  278.597198]  netlink_unicast+0x1a0/0x250
> [  278.602601]  netlink_sendmsg+0x2c1/0x3c0
> [  278.608022]  sock_sendmsg+0x5b/0x60
> [  278.612969]  ___sys_sendmsg+0x289/0x310
> [  278.618231]  ? do_wp_page+0x99/0x730
> [  278.623216]  ? page_add_new_anon_rmap+0xbe/0x140
> [  278.629298]  ? __handle_mm_fault+0xc84/0x1360
> [  278.635113]  ? __sys_sendmsg+0x5e/0xa0
> [  278.640285]  __sys_sendmsg+0x5e/0xa0
> [  278.645239]  do_syscall_64+0x5b/0x1b0
> [  278.650274]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [  278.656697] RIP: 0033:0x7f796abdeb87
> [  278.661628] Code: 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 80 00 00 00 00 8b 05 6a 2b 2c 00 48 63 d2 48 63 ff 85 c0 75 18 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 59 f3 c3 0f 1f 80 00 00 00 00 53
>  48 89 f3 48
> [  278.683248] RSP: 002b:00007ffde213ba48 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
> [  278.692245] RAX: ffffffffffffffda RBX: 000000005d261e6f RCX: 00007f796abdeb87
> [  278.700862] RDX: 0000000000000000 RSI: 00007ffde213bab0 RDI: 0000000000000003
> [  278.709527] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000006
> [  278.718167] R10: 000000000000000c R11: 0000000000000246 R12: 0000000000000001
> [  278.726743] R13: 000000000067b580 R14: 0000000000000000 R15: 0000000000000000
> [  278.735302] Modules linked in: dummy vxlan ip6_udp_tunnel udp_tunnel sch_ingress nfsv3 nfs_acl nfs lockd grace fscache bridge stp llc sunrpc mlx5_ib ib_uverbs intel_rapl ib_core sb_edac x86_pkg_temp_
> thermal intel_powerclamp coretemp kvm_intel kvm mlx5_core irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel igb ghash_clmulni_intel ses mei_me enclosure mlxfw ipmi_ssif intel_cstate iTCO_wdt ptp mei
> pps_core iTCO_vendor_support pcspkr joydev intel_uncore i2c_i801 ipmi_si lpc_ich intel_rapl_perf ioatdma wmi dca pcc_cpufreq ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad ast i2c_algo_bit drm_k
> ms_helper ttm drm mpt3sas raid_class scsi_transport_sas
> [  278.802263] CR2: 0000000000000000
> [  278.807170] ---[ end trace b1f0a442a279e66f ]---
>
> Extend en_rep with new static mlx5e_rep_block_cb_list list and pass it to
> flow_block_cb_setup_simple() function instead of hardcoded NULL pointer.
>
> Fixes: 955bcb6ea0df ("drivers: net: use flow block API")
> Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/en_rep.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
> index 10ef90a7bddd..7245d287633d 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
> @@ -1175,6 +1175,8 @@ static int mlx5e_rep_setup_tc_cb(enum tc_setup_type type, void *type_data,
>  	}
>  }
>  
> +static LIST_HEAD(mlx5e_rep_block_cb_list);
> +

I think it is not necessary needs a extra LIST_HEAD, the early mlx5e_block_cb_list is ok

The early patch  http://patchwork.ozlabs.org/patch/1130439/ is enough.

>  static int mlx5e_rep_setup_tc(struct net_device *dev, enum tc_setup_type type,
>  			      void *type_data)
>  {
> @@ -1182,7 +1184,8 @@ static int mlx5e_rep_setup_tc(struct net_device *dev, enum tc_setup_type type,
>  
>  	switch (type) {
>  	case TC_SETUP_BLOCK:
> -		return flow_block_cb_setup_simple(type_data, NULL,
> +		return flow_block_cb_setup_simple(type_data,
> +						  &mlx5e_rep_block_cb_list,
>  						  mlx5e_rep_setup_tc_cb,
>  						  priv, priv, true);
>  	default:

^ permalink raw reply

* Re: [PATCH bpf] bpf: fix BTF verifier size resolution logic
From: Yonghong Song @ 2019-07-11  0:36 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Andrii Nakryiko, Alexei Starovoitov, daniel@iogearbox.net,
	bpf@vger.kernel.org, netdev@vger.kernel.org, Kernel Team,
	Martin Lau
In-Reply-To: <CAEf4BzaVouFd=3whC1EjhQ9mit62b-C+NhQuW4RiXW02Rq_1Ug@mail.gmail.com>



On 7/10/19 5:29 PM, Andrii Nakryiko wrote:
> On Wed, Jul 10, 2019 at 5:16 PM Yonghong Song <yhs@fb.com> wrote:
>>
>>
>>
>> On 7/10/19 1:08 AM, Andrii Nakryiko wrote:
>>> BTF verifier has Different logic depending on whether we are following
>>> a PTR or STRUCT/ARRAY (or something else). This is an optimization to
>>> stop early in DFS traversal while resolving BTF types. But it also
>>> results in a size resolution bug, when there is a chain, e.g., of PTR ->
>>> TYPEDEF -> ARRAY, in which case due to being in pointer context ARRAY
>>> size won't be resolved, as it is considered to be a sink for pointer,
>>> leading to TYPEDEF being in RESOLVED state with zero size, which is
>>> completely wrong.
>>>
>>> Optimization is doubtful, though, as btf_check_all_types() will iterate
>>> over all BTF types anyways, so the only saving is a potentially slightly
>>> shorter stack. But correctness is more important that tiny savings.
>>>
>>> This bug manifests itself in rejecting BTF-defined maps that use array
>>> typedef as a value type:
>>>
>>> typedef int array_t[16];
>>>
>>> struct {
>>>        __uint(type, BPF_MAP_TYPE_ARRAY);
>>>        __type(value, array_t); /* i.e., array_t *value; */
>>> } test_map SEC(".maps");
>>>
>>> Fixes: eb3f595dab40 ("bpf: btf: Validate type reference")
>>> Cc: Martin KaFai Lau <kafai@fb.com>
>>> Signed-off-by: Andrii Nakryiko <andriin@fb.com>
>>
>> The change seems okay to me. Currently, looks like intermediate
>> modifier type will carry size = 0 (in the internal data structure).
> 
> Yes, which is totally wrong, especially that we use that size in some
> cases to reject map with specified BTF.
> 
>>
>> If we remove RESOLVE logic, we probably want to double check
>> whether we handle circular types correctly or not. Maybe we will
>> be okay if all self tests pass.
> 
> I checked, it does. We'll attempt to add referenced type unless it's a
> "resolve sink" (where size is immediately known) or is already
> resolved (it's state is RESOLVED). In other cases, we'll attempt to
> env_stack_push(), which check that the state of that type is
> NOT_VISITED. If it's RESOLVED or VISITED, it returns -EEXISTS. When
> type is added into the stack, it's resolve state goes from NOT_VISITED
> to VISITED.
> 
> So, if there is a loop, then we'll detect it as soon as we'll attempt
> to add the same type onto the stack second time.
> 
>>
>> I may still be worthwhile to qualify the RESOLVE optimization benefit
>> before removing it.
> 
> I don't think there is any, because every type will be visited exactly
> once, due to DFS nature of algorithm. The only difference is that if
> we have a long chain of modifiers, we can technically reach the max
> limit and fail. But at 32 I think it's pretty unrealistic to have such
> a long chain of PTR/TYPEDEF/CONST/VOLATILE/RESTRICTs :)
> 
>>
>> Another possible change is, for external usage, removing
>> modifiers, before checking the size, something like below.
>> Note that I am not strongly advocating my below patch as
>> it has the same shortcoming that maintained modifier type
>> size may not be correct.
> 
> I don't think your patch helps, it can actually confuse things even
> more. It skips modifiers until underlying type is found, but you still
> don't guarantee that at that time that underlying type will have its
> size resolved.

It actually does help. It does not change the internal btf type
traversal algorithms. It only change the implementation of
an external API btf_type_id_size(). Previously, this function
is used by externals and internal btf.c. I broke it into two,
one internal __btf_type_id_size(), and another external
btf_type_id_size(). The external one removes modifier before
finding type size. The external one is typically used only
after btf is validated.

Will go through your other comments later.

> 
>>
>> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
>> index 546ebee39e2a..6f927c3e0a89 100644
>> --- a/kernel/bpf/btf.c
>> +++ b/kernel/bpf/btf.c
>> @@ -620,6 +620,54 @@ static bool btf_type_int_is_regular(const struct
>> btf_type *t)
>>           return true;
>>    }
>>
>> +static const struct btf_type *__btf_type_id_size(const struct btf *btf,
>> +                                                u32 *type_id, u32
>> *ret_size,
>> +                                                bool skip_modifier)
>> +{
>> +       const struct btf_type *size_type;
>> +       u32 size_type_id = *type_id;
>> +       u32 size = 0;
>> +
>> +       size_type = btf_type_by_id(btf, size_type_id);
>> +       if (size_type && skip_modifier) {
>> +               while (btf_type_is_modifier(size_type))
>> +                       size_type = btf_type_by_id(btf, size_type->type);
>> +       }
>> +
>> +       if (btf_type_nosize_or_null(size_type))
>> +               return NULL;
>> +
>> +       if (btf_type_has_size(size_type)) {
>> +               size = size_type->size;
>> +       } else if (btf_type_is_array(size_type)) {
>> +               size = btf->resolved_sizes[size_type_id];
>> +       } else if (btf_type_is_ptr(size_type)) {
>> +               size = sizeof(void *);
>> +       } else {
>> +               if (WARN_ON_ONCE(!btf_type_is_modifier(size_type) &&
>> +                                !btf_type_is_var(size_type)))
>> +                       return NULL;
>> +
>> +               size = btf->resolved_sizes[size_type_id];
>> +               size_type_id = btf->resolved_ids[size_type_id];
>> +               size_type = btf_type_by_id(btf, size_type_id);
>> +               if (btf_type_nosize_or_null(size_type))
>> +                       return NULL;
>> +       }
>> +
>> +       *type_id = size_type_id;
>> +       if (ret_size)
>> +               *ret_size = size;
>> +
>> +       return size_type;
>> +}
>> +
[...]

^ permalink raw reply

* [PATCH bpf-next] selftests/bpf: remove logic duplication in test_verifier.c
From: Andrii Nakryiko @ 2019-07-11  1:08 UTC (permalink / raw)
  To: andrii.nakryiko, kernel-team, ast, daniel, bpf, netdev
  Cc: Andrii Nakryiko, Krzesimir Nowak

test_verifier tests can specify single- and multi-runs tests. Internally
logic of handling them is duplicated. Get rid of it by making single run
retval specification to be a first retvals spec.

Cc: Krzesimir Nowak <krzesimir@kinvolk.io>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
---
 tools/testing/selftests/bpf/test_verifier.c | 37 ++++++++++-----------
 1 file changed, 18 insertions(+), 19 deletions(-)

diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c
index b0773291012a..120ecdf4a7db 100644
--- a/tools/testing/selftests/bpf/test_verifier.c
+++ b/tools/testing/selftests/bpf/test_verifier.c
@@ -86,7 +86,7 @@ struct bpf_test {
 	int fixup_sk_storage_map[MAX_FIXUPS];
 	const char *errstr;
 	const char *errstr_unpriv;
-	uint32_t retval, retval_unpriv, insn_processed;
+	uint32_t insn_processed;
 	int prog_len;
 	enum {
 		UNDEF,
@@ -95,16 +95,24 @@ struct bpf_test {
 	} result, result_unpriv;
 	enum bpf_prog_type prog_type;
 	uint8_t flags;
-	__u8 data[TEST_DATA_LEN];
 	void (*fill_helper)(struct bpf_test *self);
 	uint8_t runs;
-	struct {
-		uint32_t retval, retval_unpriv;
-		union {
-			__u8 data[TEST_DATA_LEN];
-			__u64 data64[TEST_DATA_LEN / 8];
+	union {
+		struct {
+			uint32_t retval, retval_unpriv;
+			union {
+				__u8 data[TEST_DATA_LEN];
+				__u64 data64[TEST_DATA_LEN / 8];
+			};
 		};
-	} retvals[MAX_TEST_RUNS];
+		struct {
+			uint32_t retval, retval_unpriv;
+			union {
+				__u8 data[TEST_DATA_LEN];
+				__u64 data64[TEST_DATA_LEN / 8];
+			};
+		} retvals[MAX_TEST_RUNS];
+	};
 	enum bpf_attach_type expected_attach_type;
 };
 
@@ -949,17 +957,8 @@ static void do_test_single(struct bpf_test *test, bool unpriv,
 		uint32_t expected_val;
 		int i;
 
-		if (!test->runs) {
-			expected_val = unpriv && test->retval_unpriv ?
-				test->retval_unpriv : test->retval;
-
-			err = do_prog_test_run(fd_prog, unpriv, expected_val,
-					       test->data, sizeof(test->data));
-			if (err)
-				run_errs++;
-			else
-				run_successes++;
-		}
+		if (!test->runs)
+			test->runs = 1;
 
 		for (i = 0; i < test->runs; i++) {
 			if (unpriv && test->retvals[i].retval_unpriv)
-- 
2.17.1


^ permalink raw reply related

* Re: [bpf-next v3 05/12] selftests/bpf: Allow passing more information to BPF prog test run
From: Andrii Nakryiko @ 2019-07-11  1:17 UTC (permalink / raw)
  To: Krzesimir Nowak
  Cc: open list, Alban Crequy, Iago López Galeiras,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, David S. Miller, Jakub Kicinski,
	Jesper Dangaard Brouer, John Fastabend, Stanislav Fomichev,
	Networking, bpf, xdp-newbies
In-Reply-To: <20190708163121.18477-6-krzesimir@kinvolk.io>

On Mon, Jul 8, 2019 at 3:42 PM Krzesimir Nowak <krzesimir@kinvolk.io> wrote:
>
> The test case can now specify a custom length of the data member,
> context data and its length, which will be passed to
> bpf_prog_test_run_xattr. For backward compatilibity, if the data
> length is 0 (which is what will happen when the field is left
> unspecified in the designated initializer of a struct), then the
> length passed to the bpf_prog_test_run_xattr is TEST_DATA_LEN.
>
> Also for backward compatilibity, if context data length is 0, NULL is
> passed as a context to bpf_prog_test_run_xattr. This is to avoid
> breaking other tests, where context data being NULL and context data
> length being 0 is handled differently from the case where context data
> is not NULL and context data length is 0.
>
> Custom lengths still can't be greater than hardcoded 64 bytes for data
> and 192 for context data.
>
> 192 for context data was picked to allow passing struct
> bpf_perf_event_data as a context for perf event programs. The struct
> is quite large, because it contains struct pt_regs.
>
> Test runs for perf event programs will not allow the copying the data
> back to data_out buffer, so they require data_out_size to be zero and
> data_out to be NULL. Since test_verifier hardcodes it, make it
> possible to override the size. Overriding the size to zero will cause
> the buffer to be NULL.
>
> Changes since v2:
> - Allow overriding the data out size and buffer.
>
> Signed-off-by: Krzesimir Nowak <krzesimir@kinvolk.io>
> ---
>  tools/testing/selftests/bpf/test_verifier.c | 105 +++++++++++++++++---
>  1 file changed, 93 insertions(+), 12 deletions(-)
>
> diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c
> index 1640ba9f12c1..6f124cc4ee34 100644
> --- a/tools/testing/selftests/bpf/test_verifier.c
> +++ b/tools/testing/selftests/bpf/test_verifier.c
> @@ -54,6 +54,7 @@
>  #define MAX_TEST_RUNS  8
>  #define POINTER_VALUE  0xcafe4all
>  #define TEST_DATA_LEN  64
> +#define TEST_CTX_LEN   192
>
>  #define F_NEEDS_EFFICIENT_UNALIGNED_ACCESS     (1 << 0)
>  #define F_LOAD_WITH_STRICT_ALIGNMENT           (1 << 1)
> @@ -96,7 +97,12 @@ struct bpf_test {
>         enum bpf_prog_type prog_type;
>         uint8_t flags;
>         __u8 data[TEST_DATA_LEN];
> +       __u32 data_len;
> +       __u8 ctx[TEST_CTX_LEN];
> +       __u32 ctx_len;
>         void (*fill_helper)(struct bpf_test *self);
> +       bool override_data_out_len;
> +       __u32 overridden_data_out_len;
>         uint8_t runs;
>         struct {
>                 uint32_t retval, retval_unpriv;
> @@ -104,6 +110,9 @@ struct bpf_test {
>                         __u8 data[TEST_DATA_LEN];
>                         __u64 data64[TEST_DATA_LEN / 8];
>                 };
> +               __u32 data_len;
> +               __u8 ctx[TEST_CTX_LEN];
> +               __u32 ctx_len;
>         } retvals[MAX_TEST_RUNS];
>  };
>
> @@ -818,21 +827,35 @@ static int set_admin(bool admin)
>  }
>
>  static int do_prog_test_run(int fd_prog, bool unpriv, uint32_t expected_val,
> -                           void *data, size_t size_data)
> +                           void *data, size_t size_data, void *ctx,
> +                           size_t size_ctx, u32 *overridden_data_out_size)
>  {
> -       __u8 tmp[TEST_DATA_LEN << 2];
> -       __u32 size_tmp = sizeof(tmp);
> -       int saved_errno;
> -       int err;
>         struct bpf_prog_test_run_attr attr = {
>                 .prog_fd = fd_prog,
>                 .repeat = 1,
>                 .data_in = data,
>                 .data_size_in = size_data,
> -               .data_out = tmp,
> -               .data_size_out = size_tmp,
> +               .ctx_in = ctx,
> +               .ctx_size_in = size_ctx,
>         };
> +       __u8 tmp[TEST_DATA_LEN << 2];
> +       __u32 size_tmp = sizeof(tmp);
> +       __u32 size_buf = size_tmp;
> +       __u8 *buf = tmp;
> +       int saved_errno;
> +       int err;
>
> +       if (overridden_data_out_size)
> +               size_buf = *overridden_data_out_size;
> +       if (size_buf > size_tmp) {
> +               printf("FAIL: out data size (%d) greater than a buffer size (%d) ",
> +                      size_buf, size_tmp);
> +               return -EINVAL;
> +       }
> +       if (!size_buf)
> +               buf = NULL;
> +       attr.data_size_out = size_buf;
> +       attr.data_out = buf;
>         if (unpriv)
>                 set_admin(true);
>         err = bpf_prog_test_run_xattr(&attr);
> @@ -956,13 +979,45 @@ static void do_test_single(struct bpf_test *test, bool unpriv,
>         if (!alignment_prevented_execution && fd_prog >= 0) {
>                 uint32_t expected_val;
>                 int i;
> +               __u32 size_data;
> +               __u32 size_ctx;
> +               bool bad_size;
> +               void *ctx;
> +               __u32 *overridden_data_out_size;
>
>                 if (!test->runs) {
> +                       if (test->data_len > 0)
> +                               size_data = test->data_len;
> +                       else
> +                               size_data = sizeof(test->data);
> +                       if (test->override_data_out_len)
> +                               overridden_data_out_size = &test->overridden_data_out_len;
> +                       else
> +                               overridden_data_out_size = NULL;
> +                       size_ctx = test->ctx_len;
> +                       bad_size = false;

I hated all this duplication of logic, which with this patch becomes
even more expansive, so I removed it. Please see [0]. Can you please
apply that patch and add all this new logic only once?

  [0] https://patchwork.ozlabs.org/patch/1130601/

>                         expected_val = unpriv && test->retval_unpriv ?
>                                 test->retval_unpriv : test->retval;
>
> -                       err = do_prog_test_run(fd_prog, unpriv, expected_val,
> -                                              test->data, sizeof(test->data));
> +                       if (size_data > sizeof(test->data)) {
> +                               printf("FAIL: data size (%u) greater than TEST_DATA_LEN (%lu) ", size_data, sizeof(test->data));
> +                               bad_size = true;
> +                       }
> +                       if (size_ctx > sizeof(test->ctx)) {
> +                               printf("FAIL: ctx size (%u) greater than TEST_CTX_LEN (%lu) ", size_ctx, sizeof(test->ctx));

These look like way too long lines, wrap them?

> +                               bad_size = true;
> +                       }
> +                       if (size_ctx)
> +                               ctx = test->ctx;
> +                       else
> +                               ctx = NULL;

nit: single line:

ctx = size_ctx ? test->ctx : NULL;

> +                       if (bad_size)
> +                               err = 1;
> +                       else
> +                               err = do_prog_test_run(fd_prog, unpriv, expected_val,
> +                                                      test->data, size_data,
> +                                                      ctx, size_ctx,
> +                                                      overridden_data_out_size);
>                         if (err)
>                                 run_errs++;
>                         else
> @@ -970,14 +1025,40 @@ static void do_test_single(struct bpf_test *test, bool unpriv,
>                 }
>
>                 for (i = 0; i < test->runs; i++) {
> +                       if (test->retvals[i].data_len > 0)
> +                               size_data = test->retvals[i].data_len;
> +                       else
> +                               size_data = sizeof(test->retvals[i].data);
> +                       if (test->override_data_out_len)
> +                               overridden_data_out_size = &test->overridden_data_out_len;
> +                       else
> +                               overridden_data_out_size = NULL;
> +                       size_ctx = test->retvals[i].ctx_len;
> +                       bad_size = false;
>                         if (unpriv && test->retvals[i].retval_unpriv)
>                                 expected_val = test->retvals[i].retval_unpriv;
>                         else
>                                 expected_val = test->retvals[i].retval;
>
> -                       err = do_prog_test_run(fd_prog, unpriv, expected_val,
> -                                              test->retvals[i].data,
> -                                              sizeof(test->retvals[i].data));
> +                       if (size_data > sizeof(test->retvals[i].data)) {
> +                               printf("FAIL: data size (%u) at run %i greater than TEST_DATA_LEN (%lu) ", size_data, i + 1, sizeof(test->retvals[i].data));
> +                               bad_size = true;
> +                       }
> +                       if (size_ctx > sizeof(test->retvals[i].ctx)) {
> +                               printf("FAIL: ctx size (%u) at run %i greater than TEST_CTX_LEN (%lu) ", size_ctx, i + 1, sizeof(test->retvals[i].ctx));
> +                               bad_size = true;
> +                       }
> +                       if (size_ctx)
> +                               ctx = test->retvals[i].ctx;
> +                       else
> +                               ctx = NULL;
> +                       if (bad_size)
> +                               err = 1;
> +                       else
> +                               err = do_prog_test_run(fd_prog, unpriv, expected_val,
> +                                                      test->retvals[i].data, size_data,
> +                                                      ctx, size_ctx,
> +                                                      overridden_data_out_size);
>                         if (err) {
>                                 printf("(run %d/%d) ", i + 1, test->runs);
>                                 run_errs++;
> --
> 2.20.1
>

^ permalink raw reply

* RE: [PATCH v3 0/3] kernel/notifier.c: avoid duplicate registration
From: Nixiaoming @ 2019-07-11  1:32 UTC (permalink / raw)
  To: Greg KH
  Cc: adobriyan@gmail.com, akpm@linux-foundation.org,
	anna.schumaker@netapp.com, arjan@linux.intel.com,
	bfields@fieldses.org, chuck.lever@oracle.com, davem@davemloft.net,
	jlayton@kernel.org, luto@kernel.org, mingo@kernel.org,
	Nadia.Derbey@bull.net, paulmck@linux.vnet.ibm.com,
	semen.protsenko@linaro.org, stable@kernel.org,
	stern@rowland.harvard.edu, tglx@linutronix.de,
	torvalds@linux-foundation.org, trond.myklebust@hammerspace.com,
	viresh.kumar@linaro.org, vvs@virtuozzo.com, Huangjianhui (Alex),
	Dailei, linux-kernel@vger.kernel.org, linux-nfs@vger.kernel.org,
	netdev@vger.kernel.org
In-Reply-To: <20190710055628.GB5778@kroah.com>

On Wed, July 10, 2019 1:56 PM Greg KH wrote:
>On Wed, Jul 10, 2019 at 11:09:07AM +0800, Xiaoming Ni wrote:
>> Registering the same notifier to a hook repeatedly can cause the hook
>> list to form a ring or lose other members of the list.
>
>Then don't do that :)
>

Duplicate registration is checked and exited in notifier_chain_cond_register()

Duplicate registration was checked in notifier_chain_register() but only 
the alarm was triggered without exiting. added by commit 831246570d34692e 
("kernel/notifier.c: double register detection")

This patch is similar to commit 8312465 and notifier_chain_cond_register(),
 with actual prevention for such behaviour,  which I think is necessary to 
 avoid the formation of a linked list ring.

>Is there any in-kernel users that do do this?  If so, please just fix
>them.
>
Notifier_chain_register() is not a hotspot path.
Adding a check here can make the kernel more stable.

Thanks

Xiaoming Ni


>thanks,
>
>greg k-h
>

^ permalink raw reply

* Re: [PATCH bpf] bpf: fix BTF verifier size resolution logic
From: Andrii Nakryiko @ 2019-07-11  1:45 UTC (permalink / raw)
  To: Yonghong Song
  Cc: Andrii Nakryiko, Alexei Starovoitov, daniel@iogearbox.net,
	bpf@vger.kernel.org, netdev@vger.kernel.org, Kernel Team,
	Martin Lau
In-Reply-To: <304d8535-5043-836d-2933-1a5efb7aec72@fb.com>

On Wed, Jul 10, 2019 at 5:36 PM Yonghong Song <yhs@fb.com> wrote:
>
>
>
> On 7/10/19 5:29 PM, Andrii Nakryiko wrote:
> > On Wed, Jul 10, 2019 at 5:16 PM Yonghong Song <yhs@fb.com> wrote:
> >>
> >>
> >>
> >> On 7/10/19 1:08 AM, Andrii Nakryiko wrote:
> >>> BTF verifier has Different logic depending on whether we are following
> >>> a PTR or STRUCT/ARRAY (or something else). This is an optimization to
> >>> stop early in DFS traversal while resolving BTF types. But it also
> >>> results in a size resolution bug, when there is a chain, e.g., of PTR ->
> >>> TYPEDEF -> ARRAY, in which case due to being in pointer context ARRAY
> >>> size won't be resolved, as it is considered to be a sink for pointer,
> >>> leading to TYPEDEF being in RESOLVED state with zero size, which is
> >>> completely wrong.
> >>>
> >>> Optimization is doubtful, though, as btf_check_all_types() will iterate
> >>> over all BTF types anyways, so the only saving is a potentially slightly
> >>> shorter stack. But correctness is more important that tiny savings.
> >>>
> >>> This bug manifests itself in rejecting BTF-defined maps that use array
> >>> typedef as a value type:
> >>>
> >>> typedef int array_t[16];
> >>>
> >>> struct {
> >>>        __uint(type, BPF_MAP_TYPE_ARRAY);
> >>>        __type(value, array_t); /* i.e., array_t *value; */
> >>> } test_map SEC(".maps");
> >>>
> >>> Fixes: eb3f595dab40 ("bpf: btf: Validate type reference")
> >>> Cc: Martin KaFai Lau <kafai@fb.com>
> >>> Signed-off-by: Andrii Nakryiko <andriin@fb.com>
> >>
> >> The change seems okay to me. Currently, looks like intermediate
> >> modifier type will carry size = 0 (in the internal data structure).
> >
> > Yes, which is totally wrong, especially that we use that size in some
> > cases to reject map with specified BTF.
> >
> >>
> >> If we remove RESOLVE logic, we probably want to double check
> >> whether we handle circular types correctly or not. Maybe we will
> >> be okay if all self tests pass.
> >
> > I checked, it does. We'll attempt to add referenced type unless it's a
> > "resolve sink" (where size is immediately known) or is already
> > resolved (it's state is RESOLVED). In other cases, we'll attempt to
> > env_stack_push(), which check that the state of that type is
> > NOT_VISITED. If it's RESOLVED or VISITED, it returns -EEXISTS. When
> > type is added into the stack, it's resolve state goes from NOT_VISITED
> > to VISITED.
> >
> > So, if there is a loop, then we'll detect it as soon as we'll attempt
> > to add the same type onto the stack second time.
> >
> >>
> >> I may still be worthwhile to qualify the RESOLVE optimization benefit
> >> before removing it.
> >
> > I don't think there is any, because every type will be visited exactly
> > once, due to DFS nature of algorithm. The only difference is that if
> > we have a long chain of modifiers, we can technically reach the max
> > limit and fail. But at 32 I think it's pretty unrealistic to have such
> > a long chain of PTR/TYPEDEF/CONST/VOLATILE/RESTRICTs :)
> >
> >>
> >> Another possible change is, for external usage, removing
> >> modifiers, before checking the size, something like below.
> >> Note that I am not strongly advocating my below patch as
> >> it has the same shortcoming that maintained modifier type
> >> size may not be correct.
> >
> > I don't think your patch helps, it can actually confuse things even
> > more. It skips modifiers until underlying type is found, but you still
> > don't guarantee that at that time that underlying type will have its
> > size resolved.
>
> It actually does help. It does not change the internal btf type
> traversal algorithms. It only change the implementation of
> an external API btf_type_id_size(). Previously, this function
> is used by externals and internal btf.c. I broke it into two,
> one internal __btf_type_id_size(), and another external
> btf_type_id_size(). The external one removes modifier before
> finding type size. The external one is typically used only
> after btf is validated.

Sure, for external callers yes, it solves the problem. But there is
deeper problem: we mark modifier types RESOLVED before types they
ultimately point to are resolved. Then in all those btf_xxx_resolve()
functions we have check:

if (!env_type_is_resolve_sink && !env_type_is_resolved)
  return env_stack_push();
else {

  /* here we assume that we can calculate size of the type */
  /* so even if we traverse through all the modifiers and find
underlying type */
  /* that type will have resolved_size = 0, because we haven't
processed it yet */
  /* but we will just incorrectly assume that zero is *final* size */
}

So I think that your patch is still just hiding the problem, not solving it.

BTW, I've also identified part of btf_ptr_resolve() logic that can be
now safely removed (it's a special case that "restarts" DFS traversal
for modifiers, because they could have been prematurely marked
resolved). This is another sign that there is something wrong in an
algorithm.

I'd rather remove unnecessary complexity and fix underlying problem,
especially given that there is no performance or correctness penalty.

I'll post v2 soon.

>
> Will go through your other comments later.
>
> >
> >>
> >> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> >> index 546ebee39e2a..6f927c3e0a89 100644
> >> --- a/kernel/bpf/btf.c
> >> +++ b/kernel/bpf/btf.c
> >> @@ -620,6 +620,54 @@ static bool btf_type_int_is_regular(const struct
> >> btf_type *t)
> >>           return true;
> >>    }
> >>
> >> +static const struct btf_type *__btf_type_id_size(const struct btf *btf,
> >> +                                                u32 *type_id, u32
> >> *ret_size,
> >> +                                                bool skip_modifier)
> >> +{
> >> +       const struct btf_type *size_type;
> >> +       u32 size_type_id = *type_id;
> >> +       u32 size = 0;
> >> +
> >> +       size_type = btf_type_by_id(btf, size_type_id);
> >> +       if (size_type && skip_modifier) {
> >> +               while (btf_type_is_modifier(size_type))
> >> +                       size_type = btf_type_by_id(btf, size_type->type);
> >> +       }
> >> +
> >> +       if (btf_type_nosize_or_null(size_type))
> >> +               return NULL;
> >> +
> >> +       if (btf_type_has_size(size_type)) {
> >> +               size = size_type->size;
> >> +       } else if (btf_type_is_array(size_type)) {
> >> +               size = btf->resolved_sizes[size_type_id];
> >> +       } else if (btf_type_is_ptr(size_type)) {
> >> +               size = sizeof(void *);
> >> +       } else {
> >> +               if (WARN_ON_ONCE(!btf_type_is_modifier(size_type) &&
> >> +                                !btf_type_is_var(size_type)))
> >> +                       return NULL;
> >> +
> >> +               size = btf->resolved_sizes[size_type_id];
> >> +               size_type_id = btf->resolved_ids[size_type_id];
> >> +               size_type = btf_type_by_id(btf, size_type_id);
> >> +               if (btf_type_nosize_or_null(size_type))
> >> +                       return NULL;
> >> +       }
> >> +
> >> +       *type_id = size_type_id;
> >> +       if (ret_size)
> >> +               *ret_size = size;
> >> +
> >> +       return size_type;
> >> +}
> >> +
> [...]

^ permalink raw reply

* Re: linux-next: build failure after merge of the net-next tree
From: Stephen Rothwell @ 2019-07-11  1:50 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Leon Romanovsky, Bernard Metzler, Doug Ledford, David Miller,
	Networking, Linux Next Mailing List, Linux Kernel Mailing List
In-Reply-To: <20190710175212.GM2887@mellanox.com>

[-- Attachment #1: Type: text/plain, Size: 4311 bytes --]

Hi all,

On Wed, 10 Jul 2019 17:52:17 +0000 Jason Gunthorpe <jgg@mellanox.com> wrote:
>
> On Tue, Jul 09, 2019 at 09:43:46AM +0300, Leon Romanovsky wrote:
> > On Tue, Jul 09, 2019 at 01:56:36PM +1000, Stephen Rothwell wrote:  
> > > Hi all,
> > >
> > > After merging the net-next tree, today's linux-next build (x86_64
> > > allmodconfig) failed like this:
> > >
> > > drivers/infiniband/sw/siw/siw_cm.c: In function 'siw_create_listen':
> > > drivers/infiniband/sw/siw/siw_cm.c:1978:3: error: implicit declaration of function 'for_ifa'; did you mean 'fork_idle'? [-Werror=implicit-function-declaration]
> > >    for_ifa(in_dev)
> > >    ^~~~~~~
> > >    fork_idle
> > > drivers/infiniband/sw/siw/siw_cm.c:1978:18: error: expected ';' before '{' token
> > >    for_ifa(in_dev)
> > >                   ^
> > >                   ;
> > >    {
> > >    ~
> > >
> > > Caused by commit
> > >
> > >   6c52fdc244b5 ("rdma/siw: connection management")
> > >
> > > from the rdma tree.  I don't know why this didn't fail after I mereged
> > > that tree.  
> > 
> > I had the same question, because I have this fix for a couple of days already.
> > 
> > From 56c9e15ec670af580daa8c3ffde9503af3042d67 Mon Sep 17 00:00:00 2001
> > From: Leon Romanovsky <leonro@mellanox.com>
> > Date: Sun, 7 Jul 2019 10:43:42 +0300
> > Subject: [PATCH] Fixup to build SIW issue
> > 
> > Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
> >  drivers/infiniband/sw/siw/siw_cm.c | 5 ++---
> >  1 file changed, 2 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/infiniband/sw/siw/siw_cm.c b/drivers/infiniband/sw/siw/siw_cm.c
> > index 8e618cb7261f..c883bf514341 100644
> > +++ b/drivers/infiniband/sw/siw/siw_cm.c
> > @@ -1954,6 +1954,7 @@ static void siw_drop_listeners(struct iw_cm_id *id)
> >  int siw_create_listen(struct iw_cm_id *id, int backlog)
> >  {
> >  	struct net_device *dev = to_siw_dev(id->device)->netdev;
> > +	const struct in_ifaddr *ifa;
> >  	int rv = 0, listeners = 0;
> > 
> >  	siw_dbg(id->device, "id 0x%p: backlog %d\n", id, backlog);
> > @@ -1975,8 +1976,7 @@ int siw_create_listen(struct iw_cm_id *id, int backlog)
> >  			id, &s_laddr.sin_addr, ntohs(s_laddr.sin_port),
> >  			&s_raddr->sin_addr, ntohs(s_raddr->sin_port));
> > 
> > -		for_ifa(in_dev)
> > -		{
> > +		in_dev_for_each_ifa_rcu(ifa, in_dev) {
> >  			if (ipv4_is_zeronet(s_laddr.sin_addr.s_addr) ||  
> 
> Hum. There is no rcu lock held here and we can't use RCU anyhow as
> siw_listen_address will sleep.
> 
> I think this needs to use rtnl, as below. Bernard, please urgently
> confirm. Thanks
> 
> diff --git a/drivers/infiniband/sw/siw/siw_cm.c b/drivers/infiniband/sw/siw/siw_cm.c
> index 8e618cb7261f62..ee98e96a5bfaba 100644
> --- a/drivers/infiniband/sw/siw/siw_cm.c
> +++ b/drivers/infiniband/sw/siw/siw_cm.c
> @@ -1965,6 +1965,7 @@ int siw_create_listen(struct iw_cm_id *id, int backlog)
>  	 */
>  	if (id->local_addr.ss_family == AF_INET) {
>  		struct in_device *in_dev = in_dev_get(dev);
> +		const struct in_ifaddr *ifa;
>  		struct sockaddr_in s_laddr, *s_raddr;
>  
>  		memcpy(&s_laddr, &id->local_addr, sizeof(s_laddr));
> @@ -1975,8 +1976,8 @@ int siw_create_listen(struct iw_cm_id *id, int backlog)
>  			id, &s_laddr.sin_addr, ntohs(s_laddr.sin_port),
>  			&s_raddr->sin_addr, ntohs(s_raddr->sin_port));
>  
> -		for_ifa(in_dev)
> -		{
> +		rtnl_lock();
> +		in_dev_for_each_ifa_rtnl(ifa, in_dev) {
>  			if (ipv4_is_zeronet(s_laddr.sin_addr.s_addr) ||
>  			    s_laddr.sin_addr.s_addr == ifa->ifa_address) {
>  				s_laddr.sin_addr.s_addr = ifa->ifa_address;
> @@ -1988,7 +1989,7 @@ int siw_create_listen(struct iw_cm_id *id, int backlog)
>  					listeners++;
>  			}
>  		}
> -		endfor_ifa(in_dev);
> +		rtnl_unlock();
>  		in_dev_put(in_dev);
>  	} else if (id->local_addr.ss_family == AF_INET6) {
>  		struct inet6_dev *in6_dev = in6_dev_get(dev);

So today this failed to build after I merged the rdma tree (previously
it didn;t until after the net-next tree was merged (I assume a
dependency changed).  It failed because in_dev_for_each_ifa_rcu (and
in_dev_for_each_ifa_rtnl) is only defined in a commit in the net-next
tree :-(

I have disabled the driver again.

-- 
Cheers,
Stephen Rothwell

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* Re: [PATCH bpf] bpf: fix BTF verifier size resolution logic
From: Yonghong Song @ 2019-07-11  1:53 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Andrii Nakryiko, Alexei Starovoitov, daniel@iogearbox.net,
	bpf@vger.kernel.org, netdev@vger.kernel.org, Kernel Team,
	Martin Lau
In-Reply-To: <CAEf4Bza6Y87C2_Fobj9CwU-2YRTU32S61f8_8CQdhMPenJiJZQ@mail.gmail.com>



On 7/10/19 6:45 PM, Andrii Nakryiko wrote:
> On Wed, Jul 10, 2019 at 5:36 PM Yonghong Song <yhs@fb.com> wrote:
>>
>>
>>
>> On 7/10/19 5:29 PM, Andrii Nakryiko wrote:
>>> On Wed, Jul 10, 2019 at 5:16 PM Yonghong Song <yhs@fb.com> wrote:
>>>>
>>>>
>>>>
>>>> On 7/10/19 1:08 AM, Andrii Nakryiko wrote:
>>>>> BTF verifier has Different logic depending on whether we are following
>>>>> a PTR or STRUCT/ARRAY (or something else). This is an optimization to
>>>>> stop early in DFS traversal while resolving BTF types. But it also
>>>>> results in a size resolution bug, when there is a chain, e.g., of PTR ->
>>>>> TYPEDEF -> ARRAY, in which case due to being in pointer context ARRAY
>>>>> size won't be resolved, as it is considered to be a sink for pointer,
>>>>> leading to TYPEDEF being in RESOLVED state with zero size, which is
>>>>> completely wrong.
>>>>>
>>>>> Optimization is doubtful, though, as btf_check_all_types() will iterate
>>>>> over all BTF types anyways, so the only saving is a potentially slightly
>>>>> shorter stack. But correctness is more important that tiny savings.
>>>>>
>>>>> This bug manifests itself in rejecting BTF-defined maps that use array
>>>>> typedef as a value type:
>>>>>
>>>>> typedef int array_t[16];
>>>>>
>>>>> struct {
>>>>>         __uint(type, BPF_MAP_TYPE_ARRAY);
>>>>>         __type(value, array_t); /* i.e., array_t *value; */
>>>>> } test_map SEC(".maps");
>>>>>
>>>>> Fixes: eb3f595dab40 ("bpf: btf: Validate type reference")
>>>>> Cc: Martin KaFai Lau <kafai@fb.com>
>>>>> Signed-off-by: Andrii Nakryiko <andriin@fb.com>
>>>>
>>>> The change seems okay to me. Currently, looks like intermediate
>>>> modifier type will carry size = 0 (in the internal data structure).
>>>
>>> Yes, which is totally wrong, especially that we use that size in some
>>> cases to reject map with specified BTF.
>>>
>>>>
>>>> If we remove RESOLVE logic, we probably want to double check
>>>> whether we handle circular types correctly or not. Maybe we will
>>>> be okay if all self tests pass.
>>>
>>> I checked, it does. We'll attempt to add referenced type unless it's a
>>> "resolve sink" (where size is immediately known) or is already
>>> resolved (it's state is RESOLVED). In other cases, we'll attempt to
>>> env_stack_push(), which check that the state of that type is
>>> NOT_VISITED. If it's RESOLVED or VISITED, it returns -EEXISTS. When
>>> type is added into the stack, it's resolve state goes from NOT_VISITED
>>> to VISITED.
>>>
>>> So, if there is a loop, then we'll detect it as soon as we'll attempt
>>> to add the same type onto the stack second time.
>>>
>>>>
>>>> I may still be worthwhile to qualify the RESOLVE optimization benefit
>>>> before removing it.
>>>
>>> I don't think there is any, because every type will be visited exactly
>>> once, due to DFS nature of algorithm. The only difference is that if
>>> we have a long chain of modifiers, we can technically reach the max
>>> limit and fail. But at 32 I think it's pretty unrealistic to have such
>>> a long chain of PTR/TYPEDEF/CONST/VOLATILE/RESTRICTs :)
>>>
>>>>
>>>> Another possible change is, for external usage, removing
>>>> modifiers, before checking the size, something like below.
>>>> Note that I am not strongly advocating my below patch as
>>>> it has the same shortcoming that maintained modifier type
>>>> size may not be correct.
>>>
>>> I don't think your patch helps, it can actually confuse things even
>>> more. It skips modifiers until underlying type is found, but you still
>>> don't guarantee that at that time that underlying type will have its
>>> size resolved.
>>
>> It actually does help. It does not change the internal btf type
>> traversal algorithms. It only change the implementation of
>> an external API btf_type_id_size(). Previously, this function
>> is used by externals and internal btf.c. I broke it into two,
>> one internal __btf_type_id_size(), and another external
>> btf_type_id_size(). The external one removes modifier before
>> finding type size. The external one is typically used only
>> after btf is validated.
> 
> Sure, for external callers yes, it solves the problem. But there is
> deeper problem: we mark modifier types RESOLVED before types they
> ultimately point to are resolved. Then in all those btf_xxx_resolve()
> functions we have check:
> 
> if (!env_type_is_resolve_sink && !env_type_is_resolved)
>    return env_stack_push();
> else {
> 
>    /* here we assume that we can calculate size of the type */
>    /* so even if we traverse through all the modifiers and find
> underlying type */
>    /* that type will have resolved_size = 0, because we haven't
> processed it yet */
>    /* but we will just incorrectly assume that zero is *final* size */
> }
> 
> So I think that your patch is still just hiding the problem, not solving it.

That is why I am not advocating it.

The really long modifier chain (const volatile restrict ...) is rare.
So I agree removing this RESOLVE logic is okay.

> 
> BTW, I've also identified part of btf_ptr_resolve() logic that can be
> now safely removed (it's a special case that "restarts" DFS traversal
> for modifiers, because they could have been prematurely marked
> resolved). This is another sign that there is something wrong in an
> algorithm.
> 
> I'd rather remove unnecessary complexity and fix underlying problem,
> especially given that there is no performance or correctness penalty.
> 
> I'll post v2 soon.

Sounds good.

> 
>>
>> Will go through your other comments later.
>>
>>>
>>>>
>>>> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
>>>> index 546ebee39e2a..6f927c3e0a89 100644
>>>> --- a/kernel/bpf/btf.c
>>>> +++ b/kernel/bpf/btf.c
>>>> @@ -620,6 +620,54 @@ static bool btf_type_int_is_regular(const struct
>>>> btf_type *t)
>>>>            return true;
>>>>     }
>>>>
>>>> +static const struct btf_type *__btf_type_id_size(const struct btf *btf,
>>>> +                                                u32 *type_id, u32
>>>> *ret_size,
>>>> +                                                bool skip_modifier)
>>>> +{
>>>> +       const struct btf_type *size_type;
>>>> +       u32 size_type_id = *type_id;
>>>> +       u32 size = 0;
>>>> +
>>>> +       size_type = btf_type_by_id(btf, size_type_id);
>>>> +       if (size_type && skip_modifier) {
>>>> +               while (btf_type_is_modifier(size_type))
>>>> +                       size_type = btf_type_by_id(btf, size_type->type);
>>>> +       }
>>>> +
>>>> +       if (btf_type_nosize_or_null(size_type))
>>>> +               return NULL;
>>>> +
>>>> +       if (btf_type_has_size(size_type)) {
>>>> +               size = size_type->size;
>>>> +       } else if (btf_type_is_array(size_type)) {
>>>> +               size = btf->resolved_sizes[size_type_id];
>>>> +       } else if (btf_type_is_ptr(size_type)) {
>>>> +               size = sizeof(void *);
>>>> +       } else {
>>>> +               if (WARN_ON_ONCE(!btf_type_is_modifier(size_type) &&
>>>> +                                !btf_type_is_var(size_type)))
>>>> +                       return NULL;
>>>> +
>>>> +               size = btf->resolved_sizes[size_type_id];
>>>> +               size_type_id = btf->resolved_ids[size_type_id];
>>>> +               size_type = btf_type_by_id(btf, size_type_id);
>>>> +               if (btf_type_nosize_or_null(size_type))
>>>> +                       return NULL;
>>>> +       }
>>>> +
>>>> +       *type_id = size_type_id;
>>>> +       if (ret_size)
>>>> +               *ret_size = size;
>>>> +
>>>> +       return size_type;
>>>> +}
>>>> +
>> [...]

^ permalink raw reply

* RE: [PATCH v3 0/3] kernel/notifier.c: avoid duplicate registration
From: Nixiaoming @ 2019-07-11  1:55 UTC (permalink / raw)
  To: Vasily Averin, adobriyan@gmail.com, akpm@linux-foundation.org,
	anna.schumaker@netapp.com, arjan@linux.intel.com,
	bfields@fieldses.org, chuck.lever@oracle.com, davem@davemloft.net,
	gregkh@linuxfoundation.org, jlayton@kernel.org, luto@kernel.org,
	mingo@kernel.org, Nadia.Derbey@bull.net,
	paulmck@linux.vnet.ibm.com, semen.protsenko@linaro.org,
	stable@kernel.org, stern@rowland.harvard.edu, tglx@linutronix.de,
	torvalds@linux-foundation.org, trond.myklebust@hammerspace.com,
	viresh.kumar@linaro.org
  Cc: Huangjianhui (Alex), Dailei, linux-kernel@vger.kernel.org,
	linux-nfs@vger.kernel.org, netdev@vger.kernel.org
In-Reply-To: <f628ff03-eb47-62f3-465b-fe4ed046b30c@virtuozzo.com>

On Wed, July 10, 2019 1:49 PM Vasily Averin wrote:
>On 7/10/19 6:09 AM, Xiaoming Ni wrote:
>> Registering the same notifier to a hook repeatedly can cause the hook
>> list to form a ring or lose other members of the list.
>
>I think is not enough to _prevent_ 2nd register attempt,
>it's enough to detect just attempt and generate warning to mark host in bad state.
>

Duplicate registration is prevented in my patch, not just "mark host in bad state"

Duplicate registration is checked and exited in notifier_chain_cond_register()

Duplicate registration was checked in notifier_chain_register() but only 
the alarm was triggered without exiting. added by commit 831246570d34692e 
("kernel/notifier.c: double register detection")

My patch is like a combination of 831246570d34692e and notifier_chain_cond_register(),
 which triggers an alarm and exits when a duplicate registration is detected.

>Unexpected 2nd register of the same hook most likely will lead to 2nd unregister,
>and it can lead to host crash in any time: 
>you can unregister notifier on first attempt it can be too early, it can be still in use.
>on the other hand you can never call 2nd unregister at all.

Since the member was not added to the linked list at the time of the second registration, 
no linked list ring was formed. 
The member is released on the first unregistration and -ENOENT on the second unregistration.
After patching, the fault has been alleviated

It may be more helpful to return an error code when someone tries to register the same
notification program a second time.
But I noticed that notifier_chain_cond_register() returns 0 when duplicate registration 
is detected. At the same time, in all the existing export function comments of notify,
"Currently always returns zero"

I am a bit confused: which is better?

>
>Unfortunately I do not see any ways to handle such cases properly,
>and it seems for me your patches does not resolve this problem.
>
>Am I missed something probably?
> 
>> case1: An infinite loop in notifier_chain_register() can cause soft lockup
>>         atomic_notifier_chain_register(&test_notifier_list, &test1);
>>         atomic_notifier_chain_register(&test_notifier_list, &test1);
>>         atomic_notifier_chain_register(&test_notifier_list, &test2);

Thanks

Xiaoming Ni

^ permalink raw reply

* Re: [GIT PULL] Keys: Set 4 - Key ACLs for 5.3
From: Linus Torvalds @ 2019-07-11  1:59 UTC (permalink / raw)
  To: Linus Torvalds, David Howells, James Morris, keyrings, Netdev,
	linux-nfs, CIFS, linux-afs, linux-fsdevel, linux-integrity,
	LSM List, Linux List Kernel Mailing
In-Reply-To: <20190710201552.GB83443@gmail.com>

On Wed, Jul 10, 2019 at 1:15 PM Eric Biggers <ebiggers@kernel.org> wrote:
>
> Also worth noting that the key ACL patches were only in linux-next for 9 days
> before the pull request was sent.

Yes. I was not entirely happy with the whole key subsystem situation.
See my concerns in

  https://lore.kernel.org/lkml/CAHk-=wjEowdfG7v_4ttu3xhf9gqopj1+q1nGG86+mGfGDTEBBg@mail.gmail.com/

for more. That was before I realized it was buggy.

So it really would be good to have more people involved, and more
structure to the keys development (and, I suspect, much else under
security/)

Anyway, since it does seem like David is offline, I've just reverted
this from my tree, and will be continuing my normal merge window pull
requests (the other issues I have seen have fixes in their respective
trees).

                 Linus

^ permalink raw reply

* Re: linux-next: build failure after merge of the net-next tree
From: Jason Gunthorpe @ 2019-07-11  2:26 UTC (permalink / raw)
  To: Stephen Rothwell
  Cc: Leon Romanovsky, Bernard Metzler, Doug Ledford, David Miller,
	Networking, Linux Next Mailing List, Linux Kernel Mailing List
In-Reply-To: <20190711115054.7d7f468c@canb.auug.org.au>

On Thu, Jul 11, 2019 at 11:50:54AM +1000, Stephen Rothwell wrote:

> So today this failed to build after I merged the rdma tree (previously
> it didn;t until after the net-next tree was merged (I assume a
> dependency changed).  It failed because in_dev_for_each_ifa_rcu (and
> in_dev_for_each_ifa_rtnl) is only defined in a commit in the net-next
> tree :-(

? I'm confused.. 

rdma.git builds fine stand alone (I hope!)

If you merge it with netdev then the above patch is needed afer the
merge as netdev changed to ifa_rcu

I just did this a few hours ago to make and test the patch I sent
above..

Jason

^ permalink raw reply

* Re: [GIT PULL] Keys: Set 4 - Key ACLs for 5.3
From: Mimi Zohar @ 2019-07-11  3:07 UTC (permalink / raw)
  To: Linus Torvalds, David Howells, James Morris, keyrings, Netdev,
	linux-nfs, CIFS, linux-afs, linux-fsdevel, linux-integrity,
	LSM List, Linux List Kernel Mailing
In-Reply-To: <CAHk-=wiFti6=K2fyAYhx-PSX9ovQPJUNp0FMdV0pDaO_pSx9MQ@mail.gmail.com>

Hi Linus,

On Wed, 2019-07-10 at 18:59 -0700, Linus Torvalds wrote:
> Anyway, since it does seem like David is offline, I've just reverted
> this from my tree, and will be continuing my normal merge window pull
> requests (the other issues I have seen have fixes in their respective
> trees).

Sorry for the delay.  An exception is needed for loading builtin keys
"KEY_ALLOC_BUILT_IN" onto a keyring that is not writable by userspace.
 The following works, but probably is not how David would handle the
exception.

diff --git a/security/keys/key.c b/security/keys/key.c
index 519211a996e7..a99332c1e014 100644
--- a/security/keys/key.c
+++ b/security/keys/key.c
@@ -896,7 +896,7 @@ key_ref_t key_create_or_update(key_ref_t keyring_ref,
        /* if we're going to allocate a new key, we're going to have
         * to modify the keyring */
        ret = key_permission(keyring_ref, KEY_NEED_WRITE);
-       if (ret < 0) {
+       if (ret < 0 && !(flags & KEY_ALLOC_BUILT_IN)) {
                key_ref = ERR_PTR(ret);
                goto error_link_end;
        }

Mimi


^ permalink raw reply related

* Re: linux-next: build failure after merge of the net-next tree
From: Stephen Rothwell @ 2019-07-11  3:13 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Leon Romanovsky, Bernard Metzler, Doug Ledford, David Miller,
	Networking, Linux Next Mailing List, Linux Kernel Mailing List
In-Reply-To: <20190711015854.GC22409@mellanox.com>

[-- Attachment #1: Type: text/plain, Size: 835 bytes --]

Hi Jason,

On Thu, 11 Jul 2019 02:26:27 +0000 Jason Gunthorpe <jgg@mellanox.com> wrote:
>
> On Thu, Jul 11, 2019 at 11:50:54AM +1000, Stephen Rothwell wrote:
> 
> > So today this failed to build after I merged the rdma tree (previously
> > it didn;t until after the net-next tree was merged (I assume a
> > dependency changed).  It failed because in_dev_for_each_ifa_rcu (and
> > in_dev_for_each_ifa_rtnl) is only defined in a commit in the net-next
> > tree :-(  
> 
> ? I'm confused.. 
> 
> rdma.git builds fine stand alone (I hope!)

I have "Fixup to build SIW issue" from Leon (which switches to using
in_dev_for_each_ifa_rcu) included in the rmda tree merge commit because
without that the rdma tree would not build for me.  Are you saying that
I don't need that at all, now?

-- 
Cheers,
Stephen Rothwell

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* Re: linux-next: build failure after merge of the net-next tree
From: Stephen Rothwell @ 2019-07-11  3:16 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Leon Romanovsky, Bernard Metzler, Doug Ledford, David Miller,
	Networking, Linux Next Mailing List, Linux Kernel Mailing List
In-Reply-To: <20190711131344.452fc064@canb.auug.org.au>

[-- Attachment #1: Type: text/plain, Size: 1141 bytes --]

Hi all,

On Thu, 11 Jul 2019 13:13:44 +1000 Stephen Rothwell <sfr@canb.auug.org.au> wrote:
>
> On Thu, 11 Jul 2019 02:26:27 +0000 Jason Gunthorpe <jgg@mellanox.com> wrote:
> >
> > On Thu, Jul 11, 2019 at 11:50:54AM +1000, Stephen Rothwell wrote:
> >   
> > > So today this failed to build after I merged the rdma tree (previously
> > > it didn;t until after the net-next tree was merged (I assume a
> > > dependency changed).  It failed because in_dev_for_each_ifa_rcu (and
> > > in_dev_for_each_ifa_rtnl) is only defined in a commit in the net-next
> > > tree :-(    
> > 
> > ? I'm confused.. 
> > 
> > rdma.git builds fine stand alone (I hope!)  
> 
> I have "Fixup to build SIW issue" from Leon (which switches to using
> in_dev_for_each_ifa_rcu) included in the rmda tree merge commit because
> without that the rdma tree would not build for me.  Are you saying that
> I don't need that at all, now?

Actually , I get it now, "Fixup to build SIW issue" is really just a
fixup for the net-next and rdma trees merge ... OK, I will fix that up
tomorrow.  Sorry for my confusion.

-- 
Cheers,
Stephen Rothwell

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* Re: linux-next: build failure after merge of the net-next tree
From: Stephen Rothwell @ 2019-07-11  3:27 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Leon Romanovsky, Bernard Metzler, Doug Ledford, David Miller,
	Networking, Linux Next Mailing List, Linux Kernel Mailing List
In-Reply-To: <20190711131603.6b11b831@canb.auug.org.au>

[-- Attachment #1: Type: text/plain, Size: 1420 bytes --]

Hi all,

On Thu, 11 Jul 2019 13:16:03 +1000 Stephen Rothwell <sfr@canb.auug.org.au> wrote:
> 
> On Thu, 11 Jul 2019 13:13:44 +1000 Stephen Rothwell <sfr@canb.auug.org.au> wrote:
> >
> > On Thu, 11 Jul 2019 02:26:27 +0000 Jason Gunthorpe <jgg@mellanox.com> wrote:  
> > >
> > > On Thu, Jul 11, 2019 at 11:50:54AM +1000, Stephen Rothwell wrote:
> > >     
> > > > So today this failed to build after I merged the rdma tree (previously
> > > > it didn;t until after the net-next tree was merged (I assume a
> > > > dependency changed).  It failed because in_dev_for_each_ifa_rcu (and
> > > > in_dev_for_each_ifa_rtnl) is only defined in a commit in the net-next
> > > > tree :-(      
> > > 
> > > ? I'm confused.. 
> > > 
> > > rdma.git builds fine stand alone (I hope!)    
> > 
> > I have "Fixup to build SIW issue" from Leon (which switches to using
> > in_dev_for_each_ifa_rcu) included in the rmda tree merge commit because
> > without that the rdma tree would not build for me.  Are you saying that
> > I don't need that at all, now?  
> 
> Actually , I get it now, "Fixup to build SIW issue" is really just a
> fixup for the net-next and rdma trees merge ... OK, I will fix that up
> tomorrow.  Sorry for my confusion.

Actually, I have rewound my tree and am starting from the merge of the
rdma tree again, so hopefully it should all be good today.

-- 
Cheers,
Stephen Rothwell

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* [PATCH] ipvs: remove unnecessary space
From: yangxingwu @ 2019-07-11  3:39 UTC (permalink / raw)
  To: wensong
  Cc: horms, ja, pablo, kadlec, fw, davem, netdev, lvs-devel,
	netfilter-devel, coreteam, linux-kernel, yangxingwu

this patch removes the extra space and use bitmap_zalloc instead

Signed-off-by: yangxingwu <xingwu.yang@gmail.com>
---
 net/netfilter/ipvs/ip_vs_mh.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/netfilter/ipvs/ip_vs_mh.c b/net/netfilter/ipvs/ip_vs_mh.c
index 94d9d34..3229867 100644
--- a/net/netfilter/ipvs/ip_vs_mh.c
+++ b/net/netfilter/ipvs/ip_vs_mh.c
@@ -174,8 +174,7 @@ static int ip_vs_mh_populate(struct ip_vs_mh_state *s,
 		return 0;
 	}
 
-	table =  kcalloc(BITS_TO_LONGS(IP_VS_MH_TAB_SIZE),
-			 sizeof(unsigned long), GFP_KERNEL);
+	table = bitmap_zalloc(IP_VS_MH_TAB_SIZE, GFP_KERNEL);
 	if (!table)
 		return -ENOMEM;
 
-- 
1.8.3.1


^ permalink raw reply related

* Re: [PATCH v3 1/2] rtw88: pci: Rearrange the memory usage for skb in RX ISR
From: Jian-Hong Pan @ 2019-07-11  3:50 UTC (permalink / raw)
  To: David Laight
  Cc: Yan-Hsuan Chuang, Kalle Valo, David S . Miller, Larry Finger,
	Christoph Hellwig, linux-wireless@vger.kernel.org,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux@endlessm.com, Daniel Drake, stable@vger.kernel.org
In-Reply-To: <81a2b91c4b084617bab8656fca932f6d@AcuMS.aculab.com>

David Laight <David.Laight@aculab.com> 於 2019年7月10日 週三 下午4:57寫道:
>
> From: Jian-Hong Pan
> > Sent: 10 July 2019 09:38
> >
> > Testing with RTL8822BE hardware, when available memory is low, we
> > frequently see a kernel panic and system freeze.
> >
> > First, rtw_pci_rx_isr encounters a memory allocation failure (trimmed):
> >
> > rx routine starvation
> > WARNING: CPU: 7 PID: 9871 at drivers/net/wireless/realtek/rtw88/pci.c:822
> > rtw_pci_rx_isr.constprop.25+0x35a/0x370 [rtwpci]
> > [ 2356.580313] RIP: 0010:rtw_pci_rx_isr.constprop.25+0x35a/0x370 [rtwpci]
> >
> > Then we see a variety of different error conditions and kernel panics,
> > such as this one (trimmed):
> >
> > rtw_pci 0000:02:00.0: pci bus timeout, check dma status
> > skbuff: skb_over_panic: text:00000000091b6e66 len:415 put:415 head:00000000d2880c6f
> > data:000000007a02b1ea tail:0x1df end:0xc0 dev:<NULL>
> > ------------[ cut here ]------------
> > kernel BUG at net/core/skbuff.c:105!
> > invalid opcode: 0000 [#1] SMP NOPTI
> > RIP: 0010:skb_panic+0x43/0x45
> >
> > When skb allocation fails and the "rx routine starvation" is hit, the
> > function returns immediately without updating the RX ring. At this
> > point, the RX ring may continue referencing an old skb which was already
> > handed off to ieee80211_rx_irqsafe(). When it comes to be used again,
> > bad things happen.
> >
> > This patch allocates a new, data-sized skb first in RX ISR. After
> > copying the data in, we pass it to the upper layers. However, if skb
> > allocation fails, we effectively drop the frame. In both cases, the
> > original, full size ring skb is reused.
> >
> > In addition, by fixing the kernel crash, the RX routine should now
> > generally behave better under low memory conditions.
>
> A couple of minor nits (see below).
> You may want to do a followup patch that changes the rx buffers
> (used by the hardware) to by just memory buffers.
> Nothing (probably) relies on them being skb with all the accociated
> baggage.

It is a good idea for later commit.

>         David
>
> >
> > Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=204053
> > Signed-off-by: Jian-Hong Pan <jian-hong@endlessm.com>
> > Cc: <stable@vger.kernel.org>
> > ---
> > v2:
> >  - Allocate new data-sized skb and put data into it, then pass it to
> >    mac80211. Reuse the original skb in RX ring by DMA sync.
> >  - Modify the commit message.
> >  - Introduce following [PATCH v3 2/2] rtw88: pci: Use DMA sync instead
> >    of remapping in RX ISR.
> >
> > v3:
> >  - Same as v2.
> >
> >  drivers/net/wireless/realtek/rtw88/pci.c | 49 +++++++++++-------------
> >  1 file changed, 22 insertions(+), 27 deletions(-)
> >
> > diff --git a/drivers/net/wireless/realtek/rtw88/pci.c b/drivers/net/wireless/realtek/rtw88/pci.c
> > index cfe05ba7280d..e9fe3ad896c8 100644
> > --- a/drivers/net/wireless/realtek/rtw88/pci.c
> > +++ b/drivers/net/wireless/realtek/rtw88/pci.c
> > @@ -763,6 +763,7 @@ static void rtw_pci_rx_isr(struct rtw_dev *rtwdev, struct rtw_pci *rtwpci,
> >       u32 pkt_offset;
> >       u32 pkt_desc_sz = chip->rx_pkt_desc_sz;
> >       u32 buf_desc_sz = chip->rx_buf_desc_sz;
> > +     u32 new_len;
> >       u8 *rx_desc;
> >       dma_addr_t dma;
> >
> > @@ -790,40 +791,34 @@ static void rtw_pci_rx_isr(struct rtw_dev *rtwdev, struct rtw_pci *rtwpci,
> >               pkt_offset = pkt_desc_sz + pkt_stat.drv_info_sz +
> >                            pkt_stat.shift;
> >
> > -             if (pkt_stat.is_c2h) {
> > -                     /* keep rx_desc, halmac needs it */
> > -                     skb_put(skb, pkt_stat.pkt_len + pkt_offset);
> > +             /* discard current skb if the new skb cannot be allocated as a
> > +              * new one in rx ring later
> > +              */
>
> That comment isn't quite right.
> maybe: "Allocate a new skb for this frame, discard if none available"

Thanks!  I will tweak it.

> > +             new_len = pkt_stat.pkt_len + pkt_offset;
> > +             new = dev_alloc_skb(new_len);
> > +             if (WARN_ONCE(!new, "rx routine starvation\n"))
>
> I think you should count these??

Larry has a different idea here. [1]
I agree with Larry that just need to know not enough memory here.

[1] https://lkml.org/lkml/2019/7/8/1049

Jian-Hong Pan

> > +                     goto next_rp;
> > +
> > +             /* put the DMA data including rx_desc from phy to new skb */
> > +             skb_put_data(new, skb->data, new_len);
> >
> > -                     /* pass offset for further operation */
> > -                     *((u32 *)skb->cb) = pkt_offset;
> > -                     skb_queue_tail(&rtwdev->c2h_queue, skb);
> > +             if (pkt_stat.is_c2h) {
> > +                      /* pass rx_desc & offset for further operation */
> > +                     *((u32 *)new->cb) = pkt_offset;
> > +                     skb_queue_tail(&rtwdev->c2h_queue, new);
> >                       ieee80211_queue_work(rtwdev->hw, &rtwdev->c2h_work);
> >               } else {
> > -                     /* remove rx_desc, maybe use skb_pull? */
> > -                     skb_put(skb, pkt_stat.pkt_len);
> > -                     skb_reserve(skb, pkt_offset);
> > -
> > -                     /* alloc a smaller skb to mac80211 */
> > -                     new = dev_alloc_skb(pkt_stat.pkt_len);
> > -                     if (!new) {
> > -                             new = skb;
> > -                     } else {
> > -                             skb_put_data(new, skb->data, skb->len);
> > -                             dev_kfree_skb_any(skb);
> > -                     }
> > -                     /* TODO: merge into rx.c */
> > -                     rtw_rx_stats(rtwdev, pkt_stat.vif, skb);
> > +                     /* remove rx_desc */
> > +                     skb_pull(new, pkt_offset);
> > +
> > +                     rtw_rx_stats(rtwdev, pkt_stat.vif, new);
> >                       memcpy(new->cb, &rx_status, sizeof(rx_status));
> >                       ieee80211_rx_irqsafe(rtwdev->hw, new);
> >               }
> >
> > -             /* skb delivered to mac80211, alloc a new one in rx ring */
> > -             new = dev_alloc_skb(RTK_PCI_RX_BUF_SIZE);
> > -             if (WARN(!new, "rx routine starvation\n"))
> > -                     return;
> > -
> > -             ring->buf[cur_rp] = new;
> > -             rtw_pci_reset_rx_desc(rtwdev, new, ring, cur_rp, buf_desc_sz);
> > +next_rp:
> > +             /* new skb delivered to mac80211, re-enable original skb DMA */
> > +             rtw_pci_reset_rx_desc(rtwdev, skb, ring, cur_rp, buf_desc_sz);
> >
> >               /* host read next element in ring */
> >               if (++cur_rp >= ring->r.len)
> > --
> > 2.22.0
>
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> Registration No: 1397386 (Wales)
>

^ permalink raw reply

* Re: [PATCH bpf] bpf: fix BTF verifier size resolution logic
From: Yonghong Song @ 2019-07-11  4:14 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Andrii Nakryiko, Alexei Starovoitov, daniel@iogearbox.net,
	bpf@vger.kernel.org, netdev@vger.kernel.org, Kernel Team,
	Martin Lau
In-Reply-To: <CAEf4Bza6Y87C2_Fobj9CwU-2YRTU32S61f8_8CQdhMPenJiJZQ@mail.gmail.com>



On 7/10/19 6:45 PM, Andrii Nakryiko wrote:
> On Wed, Jul 10, 2019 at 5:36 PM Yonghong Song <yhs@fb.com> wrote:
>>
>>
>>
>> On 7/10/19 5:29 PM, Andrii Nakryiko wrote:
>>> On Wed, Jul 10, 2019 at 5:16 PM Yonghong Song <yhs@fb.com> wrote:
>>>>
>>>>
>>>>
>>>> On 7/10/19 1:08 AM, Andrii Nakryiko wrote:
>>>>> BTF verifier has Different logic depending on whether we are following
>>>>> a PTR or STRUCT/ARRAY (or something else). This is an optimization to
>>>>> stop early in DFS traversal while resolving BTF types. But it also
>>>>> results in a size resolution bug, when there is a chain, e.g., of PTR ->
>>>>> TYPEDEF -> ARRAY, in which case due to being in pointer context ARRAY
>>>>> size won't be resolved, as it is considered to be a sink for pointer,
>>>>> leading to TYPEDEF being in RESOLVED state with zero size, which is
>>>>> completely wrong.
>>>>>
>>>>> Optimization is doubtful, though, as btf_check_all_types() will iterate
>>>>> over all BTF types anyways, so the only saving is a potentially slightly
>>>>> shorter stack. But correctness is more important that tiny savings.
>>>>>
>>>>> This bug manifests itself in rejecting BTF-defined maps that use array
>>>>> typedef as a value type:
>>>>>
>>>>> typedef int array_t[16];
>>>>>
>>>>> struct {
>>>>>         __uint(type, BPF_MAP_TYPE_ARRAY);
>>>>>         __type(value, array_t); /* i.e., array_t *value; */
>>>>> } test_map SEC(".maps");
>>>>>
>>>>> Fixes: eb3f595dab40 ("bpf: btf: Validate type reference")
>>>>> Cc: Martin KaFai Lau <kafai@fb.com>
>>>>> Signed-off-by: Andrii Nakryiko <andriin@fb.com>
>>>>
>>>> The change seems okay to me. Currently, looks like intermediate
>>>> modifier type will carry size = 0 (in the internal data structure).
>>>
>>> Yes, which is totally wrong, especially that we use that size in some
>>> cases to reject map with specified BTF.
>>>
>>>>
>>>> If we remove RESOLVE logic, we probably want to double check
>>>> whether we handle circular types correctly or not. Maybe we will
>>>> be okay if all self tests pass.
>>>
>>> I checked, it does. We'll attempt to add referenced type unless it's a
>>> "resolve sink" (where size is immediately known) or is already
>>> resolved (it's state is RESOLVED). In other cases, we'll attempt to
>>> env_stack_push(), which check that the state of that type is
>>> NOT_VISITED. If it's RESOLVED or VISITED, it returns -EEXISTS. When
>>> type is added into the stack, it's resolve state goes from NOT_VISITED
>>> to VISITED.
>>>
>>> So, if there is a loop, then we'll detect it as soon as we'll attempt
>>> to add the same type onto the stack second time.
>>>
>>>>
>>>> I may still be worthwhile to qualify the RESOLVE optimization benefit
>>>> before removing it.
>>>
>>> I don't think there is any, because every type will be visited exactly
>>> once, due to DFS nature of algorithm. The only difference is that if
>>> we have a long chain of modifiers, we can technically reach the max
>>> limit and fail. But at 32 I think it's pretty unrealistic to have such
>>> a long chain of PTR/TYPEDEF/CONST/VOLATILE/RESTRICTs :)
>>>
>>>>
>>>> Another possible change is, for external usage, removing
>>>> modifiers, before checking the size, something like below.
>>>> Note that I am not strongly advocating my below patch as
>>>> it has the same shortcoming that maintained modifier type
>>>> size may not be correct.
>>>
>>> I don't think your patch helps, it can actually confuse things even
>>> more. It skips modifiers until underlying type is found, but you still
>>> don't guarantee that at that time that underlying type will have its
>>> size resolved.
>>
>> It actually does help. It does not change the internal btf type
>> traversal algorithms. It only change the implementation of
>> an external API btf_type_id_size(). Previously, this function
>> is used by externals and internal btf.c. I broke it into two,
>> one internal __btf_type_id_size(), and another external
>> btf_type_id_size(). The external one removes modifier before
>> finding type size. The external one is typically used only
>> after btf is validated.
> 
> Sure, for external callers yes, it solves the problem. But there is
> deeper problem: we mark modifier types RESOLVED before types they
> ultimately point to are resolved. Then in all those btf_xxx_resolve()
> functions we have check:
> 
> if (!env_type_is_resolve_sink && !env_type_is_resolved)
>    return env_stack_push();
> else {
> 
>    /* here we assume that we can calculate size of the type */
>    /* so even if we traverse through all the modifiers and find
> underlying type */
>    /* that type will have resolved_size = 0, because we haven't
> processed it yet */
>    /* but we will just incorrectly assume that zero is *final* size */
> }
> 
> So I think that your patch is still just hiding the problem, not solving it.
> 
> BTW, I've also identified part of btf_ptr_resolve() logic that can be
> now safely removed (it's a special case that "restarts" DFS traversal
> for modifiers, because they could have been prematurely marked
> resolved). This is another sign that there is something wrong in an
> algorithm.
> 
> I'd rather remove unnecessary complexity and fix underlying problem,
> especially given that there is no performance or correctness penalty.

Could you create a special btf with type like
typedef int a1;
typedef a1 a2;
...
typedef a65533 a65532;
(maximum kernel allowed number of types is 64KB)

In the BTF, the typedef order is reverse
1: typedef a65533 to 2
2: typedef ... to 3
3 ...

So kernel won't run into deep recursion or panic?

Thanks.

> 
> I'll post v2 soon.
> 
>>
>> Will go through your other comments later.
>>
>>>
>>>>
>>>> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
>>>> index 546ebee39e2a..6f927c3e0a89 100644
>>>> --- a/kernel/bpf/btf.c
>>>> +++ b/kernel/bpf/btf.c
>>>> @@ -620,6 +620,54 @@ static bool btf_type_int_is_regular(const struct
>>>> btf_type *t)
>>>>            return true;
>>>>     }
>>>>
>>>> +static const struct btf_type *__btf_type_id_size(const struct btf *btf,
>>>> +                                                u32 *type_id, u32
>>>> *ret_size,
>>>> +                                                bool skip_modifier)
>>>> +{
>>>> +       const struct btf_type *size_type;
>>>> +       u32 size_type_id = *type_id;
>>>> +       u32 size = 0;
>>>> +
>>>> +       size_type = btf_type_by_id(btf, size_type_id);
>>>> +       if (size_type && skip_modifier) {
>>>> +               while (btf_type_is_modifier(size_type))
>>>> +                       size_type = btf_type_by_id(btf, size_type->type);
>>>> +       }
>>>> +
>>>> +       if (btf_type_nosize_or_null(size_type))
>>>> +               return NULL;
>>>> +
>>>> +       if (btf_type_has_size(size_type)) {
>>>> +               size = size_type->size;
>>>> +       } else if (btf_type_is_array(size_type)) {
>>>> +               size = btf->resolved_sizes[size_type_id];
>>>> +       } else if (btf_type_is_ptr(size_type)) {
>>>> +               size = sizeof(void *);
>>>> +       } else {
>>>> +               if (WARN_ON_ONCE(!btf_type_is_modifier(size_type) &&
>>>> +                                !btf_type_is_var(size_type)))
>>>> +                       return NULL;
>>>> +
>>>> +               size = btf->resolved_sizes[size_type_id];
>>>> +               size_type_id = btf->resolved_ids[size_type_id];
>>>> +               size_type = btf_type_by_id(btf, size_type_id);
>>>> +               if (btf_type_nosize_or_null(size_type))
>>>> +                       return NULL;
>>>> +       }
>>>> +
>>>> +       *type_id = size_type_id;
>>>> +       if (ret_size)
>>>> +               *ret_size = size;
>>>> +
>>>> +       return size_type;
>>>> +}
>>>> +
>> [...]
> 

^ permalink raw reply

* Re: [PATCH bpf] bpf: fix BTF verifier size resolution logic
From: Andrii Nakryiko @ 2019-07-11  4:54 UTC (permalink / raw)
  To: Yonghong Song
  Cc: Andrii Nakryiko, Alexei Starovoitov, daniel@iogearbox.net,
	bpf@vger.kernel.org, netdev@vger.kernel.org, Kernel Team,
	Martin Lau
In-Reply-To: <eebd6ac9-d968-9efb-db07-e5d877f7ae4c@fb.com>

On Wed, Jul 10, 2019 at 6:53 PM Yonghong Song <yhs@fb.com> wrote:
>
>
>
> On 7/10/19 6:45 PM, Andrii Nakryiko wrote:
> > On Wed, Jul 10, 2019 at 5:36 PM Yonghong Song <yhs@fb.com> wrote:
> >>
> >>
> >>
> >> On 7/10/19 5:29 PM, Andrii Nakryiko wrote:
> >>> On Wed, Jul 10, 2019 at 5:16 PM Yonghong Song <yhs@fb.com> wrote:
> >>>>
> >>>>
> >>>>
> >>>> On 7/10/19 1:08 AM, Andrii Nakryiko wrote:
> >>>>> BTF verifier has Different logic depending on whether we are following
> >>>>> a PTR or STRUCT/ARRAY (or something else). This is an optimization to
> >>>>> stop early in DFS traversal while resolving BTF types. But it also
> >>>>> results in a size resolution bug, when there is a chain, e.g., of PTR ->
> >>>>> TYPEDEF -> ARRAY, in which case due to being in pointer context ARRAY
> >>>>> size won't be resolved, as it is considered to be a sink for pointer,
> >>>>> leading to TYPEDEF being in RESOLVED state with zero size, which is
> >>>>> completely wrong.
> >>>>>
> >>>>> Optimization is doubtful, though, as btf_check_all_types() will iterate
> >>>>> over all BTF types anyways, so the only saving is a potentially slightly
> >>>>> shorter stack. But correctness is more important that tiny savings.
> >>>>>
> >>>>> This bug manifests itself in rejecting BTF-defined maps that use array
> >>>>> typedef as a value type:
> >>>>>
> >>>>> typedef int array_t[16];
> >>>>>
> >>>>> struct {
> >>>>>         __uint(type, BPF_MAP_TYPE_ARRAY);
> >>>>>         __type(value, array_t); /* i.e., array_t *value; */
> >>>>> } test_map SEC(".maps");
> >>>>>
> >>>>> Fixes: eb3f595dab40 ("bpf: btf: Validate type reference")
> >>>>> Cc: Martin KaFai Lau <kafai@fb.com>
> >>>>> Signed-off-by: Andrii Nakryiko <andriin@fb.com>
> >>>>
> >>>> The change seems okay to me. Currently, looks like intermediate
> >>>> modifier type will carry size = 0 (in the internal data structure).
> >>>
> >>> Yes, which is totally wrong, especially that we use that size in some
> >>> cases to reject map with specified BTF.
> >>>
> >>>>
> >>>> If we remove RESOLVE logic, we probably want to double check
> >>>> whether we handle circular types correctly or not. Maybe we will
> >>>> be okay if all self tests pass.
> >>>
> >>> I checked, it does. We'll attempt to add referenced type unless it's a
> >>> "resolve sink" (where size is immediately known) or is already
> >>> resolved (it's state is RESOLVED). In other cases, we'll attempt to
> >>> env_stack_push(), which check that the state of that type is
> >>> NOT_VISITED. If it's RESOLVED or VISITED, it returns -EEXISTS. When
> >>> type is added into the stack, it's resolve state goes from NOT_VISITED
> >>> to VISITED.
> >>>
> >>> So, if there is a loop, then we'll detect it as soon as we'll attempt
> >>> to add the same type onto the stack second time.
> >>>
> >>>>
> >>>> I may still be worthwhile to qualify the RESOLVE optimization benefit
> >>>> before removing it.
> >>>
> >>> I don't think there is any, because every type will be visited exactly
> >>> once, due to DFS nature of algorithm. The only difference is that if
> >>> we have a long chain of modifiers, we can technically reach the max
> >>> limit and fail. But at 32 I think it's pretty unrealistic to have such
> >>> a long chain of PTR/TYPEDEF/CONST/VOLATILE/RESTRICTs :)
> >>>
> >>>>
> >>>> Another possible change is, for external usage, removing
> >>>> modifiers, before checking the size, something like below.
> >>>> Note that I am not strongly advocating my below patch as
> >>>> it has the same shortcoming that maintained modifier type
> >>>> size may not be correct.
> >>>
> >>> I don't think your patch helps, it can actually confuse things even
> >>> more. It skips modifiers until underlying type is found, but you still
> >>> don't guarantee that at that time that underlying type will have its
> >>> size resolved.
> >>
> >> It actually does help. It does not change the internal btf type
> >> traversal algorithms. It only change the implementation of
> >> an external API btf_type_id_size(). Previously, this function
> >> is used by externals and internal btf.c. I broke it into two,
> >> one internal __btf_type_id_size(), and another external
> >> btf_type_id_size(). The external one removes modifier before
> >> finding type size. The external one is typically used only
> >> after btf is validated.
> >
> > Sure, for external callers yes, it solves the problem. But there is
> > deeper problem: we mark modifier types RESOLVED before types they
> > ultimately point to are resolved. Then in all those btf_xxx_resolve()
> > functions we have check:
> >
> > if (!env_type_is_resolve_sink && !env_type_is_resolved)
> >    return env_stack_push();
> > else {
> >
> >    /* here we assume that we can calculate size of the type */
> >    /* so even if we traverse through all the modifiers and find
> > underlying type */
> >    /* that type will have resolved_size = 0, because we haven't
> > processed it yet */
> >    /* but we will just incorrectly assume that zero is *final* size */
> > }
> >
> > So I think that your patch is still just hiding the problem, not solving it.
>
> That is why I am not advocating it.
>
> The really long modifier chain (const volatile restrict ...) is rare.
> So I agree removing this RESOLVE logic is okay.

So :) thinking about this a bit more. Stack size is proportional not
to a longest chain of pointers and modifiers, but actually could be as
long as entire type graph (O(N)). So for this approach we'll need to
dynamically resize stack. This is easy to do, but I'm not sure how
much push back I'll get for such change.

But I'll think about doing it differently. The problem is with
resolved_sizes array, we assume it's filled for some types too early.
I'll see if I can get rid of it completely and instead just calculate
that on the fly by relying on resolved_ids. Will post v2 with one of
those approaches.

>
> >
> > BTW, I've also identified part of btf_ptr_resolve() logic that can be
> > now safely removed (it's a special case that "restarts" DFS traversal
> > for modifiers, because they could have been prematurely marked
> > resolved). This is another sign that there is something wrong in an
> > algorithm.
> >
> > I'd rather remove unnecessary complexity and fix underlying problem,
> > especially given that there is no performance or correctness penalty.
> >
> > I'll post v2 soon.
>
> Sounds good.
>
> >
> >>
> >> Will go through your other comments later.
> >>
> >>>
> >>>>
> >>>> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> >>>> index 546ebee39e2a..6f927c3e0a89 100644
> >>>> --- a/kernel/bpf/btf.c
> >>>> +++ b/kernel/bpf/btf.c
> >>>> @@ -620,6 +620,54 @@ static bool btf_type_int_is_regular(const struct
> >>>> btf_type *t)
> >>>>            return true;
> >>>>     }
> >>>>
> >>>> +static const struct btf_type *__btf_type_id_size(const struct btf *btf,
> >>>> +                                                u32 *type_id, u32
> >>>> *ret_size,
> >>>> +                                                bool skip_modifier)
> >>>> +{
> >>>> +       const struct btf_type *size_type;
> >>>> +       u32 size_type_id = *type_id;
> >>>> +       u32 size = 0;
> >>>> +
> >>>> +       size_type = btf_type_by_id(btf, size_type_id);
> >>>> +       if (size_type && skip_modifier) {
> >>>> +               while (btf_type_is_modifier(size_type))
> >>>> +                       size_type = btf_type_by_id(btf, size_type->type);
> >>>> +       }
> >>>> +
> >>>> +       if (btf_type_nosize_or_null(size_type))
> >>>> +               return NULL;
> >>>> +
> >>>> +       if (btf_type_has_size(size_type)) {
> >>>> +               size = size_type->size;
> >>>> +       } else if (btf_type_is_array(size_type)) {
> >>>> +               size = btf->resolved_sizes[size_type_id];
> >>>> +       } else if (btf_type_is_ptr(size_type)) {
> >>>> +               size = sizeof(void *);
> >>>> +       } else {
> >>>> +               if (WARN_ON_ONCE(!btf_type_is_modifier(size_type) &&
> >>>> +                                !btf_type_is_var(size_type)))
> >>>> +                       return NULL;
> >>>> +
> >>>> +               size = btf->resolved_sizes[size_type_id];
> >>>> +               size_type_id = btf->resolved_ids[size_type_id];
> >>>> +               size_type = btf_type_by_id(btf, size_type_id);
> >>>> +               if (btf_type_nosize_or_null(size_type))
> >>>> +                       return NULL;
> >>>> +       }
> >>>> +
> >>>> +       *type_id = size_type_id;
> >>>> +       if (ret_size)
> >>>> +               *ret_size = size;
> >>>> +
> >>>> +       return size_type;
> >>>> +}
> >>>> +
> >> [...]

^ permalink raw reply

* Re: [PATCH bpf] bpf: fix BTF verifier size resolution logic
From: Andrii Nakryiko @ 2019-07-11  4:56 UTC (permalink / raw)
  To: Yonghong Song
  Cc: Andrii Nakryiko, Alexei Starovoitov, daniel@iogearbox.net,
	bpf@vger.kernel.org, netdev@vger.kernel.org, Kernel Team,
	Martin Lau
In-Reply-To: <05db3afa-b94e-d0ba-7d61-ec1bf9a82777@fb.com>

On Wed, Jul 10, 2019 at 9:14 PM Yonghong Song <yhs@fb.com> wrote:
>
>
>
> On 7/10/19 6:45 PM, Andrii Nakryiko wrote:
> > On Wed, Jul 10, 2019 at 5:36 PM Yonghong Song <yhs@fb.com> wrote:
> >>
> >>
> >>
> >> On 7/10/19 5:29 PM, Andrii Nakryiko wrote:
> >>> On Wed, Jul 10, 2019 at 5:16 PM Yonghong Song <yhs@fb.com> wrote:
> >>>>
> >>>>
> >>>>
> >>>> On 7/10/19 1:08 AM, Andrii Nakryiko wrote:
> >>>>> BTF verifier has Different logic depending on whether we are following
> >>>>> a PTR or STRUCT/ARRAY (or something else). This is an optimization to
> >>>>> stop early in DFS traversal while resolving BTF types. But it also
> >>>>> results in a size resolution bug, when there is a chain, e.g., of PTR ->
> >>>>> TYPEDEF -> ARRAY, in which case due to being in pointer context ARRAY
> >>>>> size won't be resolved, as it is considered to be a sink for pointer,
> >>>>> leading to TYPEDEF being in RESOLVED state with zero size, which is
> >>>>> completely wrong.
> >>>>>
> >>>>> Optimization is doubtful, though, as btf_check_all_types() will iterate
> >>>>> over all BTF types anyways, so the only saving is a potentially slightly
> >>>>> shorter stack. But correctness is more important that tiny savings.
> >>>>>
> >>>>> This bug manifests itself in rejecting BTF-defined maps that use array
> >>>>> typedef as a value type:
> >>>>>
> >>>>> typedef int array_t[16];
> >>>>>
> >>>>> struct {
> >>>>>         __uint(type, BPF_MAP_TYPE_ARRAY);
> >>>>>         __type(value, array_t); /* i.e., array_t *value; */
> >>>>> } test_map SEC(".maps");
> >>>>>
> >>>>> Fixes: eb3f595dab40 ("bpf: btf: Validate type reference")
> >>>>> Cc: Martin KaFai Lau <kafai@fb.com>
> >>>>> Signed-off-by: Andrii Nakryiko <andriin@fb.com>
> >>>>
> >>>> The change seems okay to me. Currently, looks like intermediate
> >>>> modifier type will carry size = 0 (in the internal data structure).
> >>>
> >>> Yes, which is totally wrong, especially that we use that size in some
> >>> cases to reject map with specified BTF.
> >>>
> >>>>
> >>>> If we remove RESOLVE logic, we probably want to double check
> >>>> whether we handle circular types correctly or not. Maybe we will
> >>>> be okay if all self tests pass.
> >>>
> >>> I checked, it does. We'll attempt to add referenced type unless it's a
> >>> "resolve sink" (where size is immediately known) or is already
> >>> resolved (it's state is RESOLVED). In other cases, we'll attempt to
> >>> env_stack_push(), which check that the state of that type is
> >>> NOT_VISITED. If it's RESOLVED or VISITED, it returns -EEXISTS. When
> >>> type is added into the stack, it's resolve state goes from NOT_VISITED
> >>> to VISITED.
> >>>
> >>> So, if there is a loop, then we'll detect it as soon as we'll attempt
> >>> to add the same type onto the stack second time.
> >>>
> >>>>
> >>>> I may still be worthwhile to qualify the RESOLVE optimization benefit
> >>>> before removing it.
> >>>
> >>> I don't think there is any, because every type will be visited exactly
> >>> once, due to DFS nature of algorithm. The only difference is that if
> >>> we have a long chain of modifiers, we can technically reach the max
> >>> limit and fail. But at 32 I think it's pretty unrealistic to have such
> >>> a long chain of PTR/TYPEDEF/CONST/VOLATILE/RESTRICTs :)
> >>>
> >>>>
> >>>> Another possible change is, for external usage, removing
> >>>> modifiers, before checking the size, something like below.
> >>>> Note that I am not strongly advocating my below patch as
> >>>> it has the same shortcoming that maintained modifier type
> >>>> size may not be correct.
> >>>
> >>> I don't think your patch helps, it can actually confuse things even
> >>> more. It skips modifiers until underlying type is found, but you still
> >>> don't guarantee that at that time that underlying type will have its
> >>> size resolved.
> >>
> >> It actually does help. It does not change the internal btf type
> >> traversal algorithms. It only change the implementation of
> >> an external API btf_type_id_size(). Previously, this function
> >> is used by externals and internal btf.c. I broke it into two,
> >> one internal __btf_type_id_size(), and another external
> >> btf_type_id_size(). The external one removes modifier before
> >> finding type size. The external one is typically used only
> >> after btf is validated.
> >
> > Sure, for external callers yes, it solves the problem. But there is
> > deeper problem: we mark modifier types RESOLVED before types they
> > ultimately point to are resolved. Then in all those btf_xxx_resolve()
> > functions we have check:
> >
> > if (!env_type_is_resolve_sink && !env_type_is_resolved)
> >    return env_stack_push();
> > else {
> >
> >    /* here we assume that we can calculate size of the type */
> >    /* so even if we traverse through all the modifiers and find
> > underlying type */
> >    /* that type will have resolved_size = 0, because we haven't
> > processed it yet */
> >    /* but we will just incorrectly assume that zero is *final* size */
> > }
> >
> > So I think that your patch is still just hiding the problem, not solving it.
> >
> > BTW, I've also identified part of btf_ptr_resolve() logic that can be
> > now safely removed (it's a special case that "restarts" DFS traversal
> > for modifiers, because they could have been prematurely marked
> > resolved). This is another sign that there is something wrong in an
> > algorithm.
> >
> > I'd rather remove unnecessary complexity and fix underlying problem,
> > especially given that there is no performance or correctness penalty.
>
> Could you create a special btf with type like
> typedef int a1;
> typedef a1 a2;
> ...
> typedef a65533 a65532;
> (maximum kernel allowed number of types is 64KB)
>
> In the BTF, the typedef order is reverse
> 1: typedef a65533 to 2
> 2: typedef ... to 3
> 3 ...
>
> So kernel won't run into deep recursion or panic?

Yeah I was just thinking about the need to generate artificially
constructed BTFs to stress-test BTF verification. Will add something.

>
> Thanks.
>
> >
> > I'll post v2 soon.
> >
> >>
> >> Will go through your other comments later.
> >>
> >>>
> >>>>
> >>>> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> >>>> index 546ebee39e2a..6f927c3e0a89 100644
> >>>> --- a/kernel/bpf/btf.c
> >>>> +++ b/kernel/bpf/btf.c
> >>>> @@ -620,6 +620,54 @@ static bool btf_type_int_is_regular(const struct
> >>>> btf_type *t)
> >>>>            return true;
> >>>>     }
> >>>>
> >>>> +static const struct btf_type *__btf_type_id_size(const struct btf *btf,
> >>>> +                                                u32 *type_id, u32
> >>>> *ret_size,
> >>>> +                                                bool skip_modifier)
> >>>> +{
> >>>> +       const struct btf_type *size_type;
> >>>> +       u32 size_type_id = *type_id;
> >>>> +       u32 size = 0;
> >>>> +
> >>>> +       size_type = btf_type_by_id(btf, size_type_id);
> >>>> +       if (size_type && skip_modifier) {
> >>>> +               while (btf_type_is_modifier(size_type))
> >>>> +                       size_type = btf_type_by_id(btf, size_type->type);
> >>>> +       }
> >>>> +
> >>>> +       if (btf_type_nosize_or_null(size_type))
> >>>> +               return NULL;
> >>>> +
> >>>> +       if (btf_type_has_size(size_type)) {
> >>>> +               size = size_type->size;
> >>>> +       } else if (btf_type_is_array(size_type)) {
> >>>> +               size = btf->resolved_sizes[size_type_id];
> >>>> +       } else if (btf_type_is_ptr(size_type)) {
> >>>> +               size = sizeof(void *);
> >>>> +       } else {
> >>>> +               if (WARN_ON_ONCE(!btf_type_is_modifier(size_type) &&
> >>>> +                                !btf_type_is_var(size_type)))
> >>>> +                       return NULL;
> >>>> +
> >>>> +               size = btf->resolved_sizes[size_type_id];
> >>>> +               size_type_id = btf->resolved_ids[size_type_id];
> >>>> +               size_type = btf_type_by_id(btf, size_type_id);
> >>>> +               if (btf_type_nosize_or_null(size_type))
> >>>> +                       return NULL;
> >>>> +       }
> >>>> +
> >>>> +       *type_id = size_type_id;
> >>>> +       if (ret_size)
> >>>> +               *ret_size = size;
> >>>> +
> >>>> +       return size_type;
> >>>> +}
> >>>> +
> >> [...]
> >

^ permalink raw reply

* Re: [PATCH net-next] net: mlx5: Fix compiling error in tls.c
From: Saeed Mahameed @ 2019-07-11  5:07 UTC (permalink / raw)
  To: Mao Wenan
  Cc: David S. Miller, Saeed Mahameed, Linux Netdev List, linux-kernel
In-Reply-To: <20190710093852.34549-1-maowenan@huawei.com>

On Wed, Jul 10, 2019 at 2:33 AM Mao Wenan <maowenan@huawei.com> wrote:
>
> There are some errors while compiling tls.c if
> CONFIG_MLX5_FPGA_TLS is not obvious on.
>
> drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c: In function mlx5e_tls_set_ipv4_flow:
> ./include/linux/mlx5/device.h:61:39: error: invalid application of sizeof to incomplete type struct mlx5_ifc_tls_flow_bits
>  #define __mlx5_st_sz_bits(typ) sizeof(struct mlx5_ifc_##typ##_bits)
>                                        ^
> ./include/linux/compiler.h:330:9: note: in definition of macro __compiletime_assert
>    if (!(condition))     \
>          ^~~~~~~~~
> ...
>
> drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c: In function mlx5e_tls_build_netdev:
> drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c:202:13: error: MLX5_ACCEL_TLS_TX undeclared (first use in this function); did you mean __MLX5_ACCEL_TLS_H__?
>   if (caps & MLX5_ACCEL_TLS_TX) {
>              ^~~~~~~~~~~~~~~~~
>              __MLX5_ACCEL_TLS_H__
> drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c:207:13: error: MLX5_ACCEL_TLS_RX undeclared (first use in this function); did you mean MLX5_ACCEL_TLS_TX?
>   if (caps & MLX5_ACCEL_TLS_RX) {
>              ^~~~~~~~~~~~~~~~~
>              MLX5_ACCEL_TLS_TX
> drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c:212:15: error: MLX5_ACCEL_TLS_LRO undeclared (first use in this function); did you mean MLX5_ACCEL_TLS_RX?
>   if (!(caps & MLX5_ACCEL_TLS_LRO)) {
>                ^~~~~~~~~~~~~~~~~~
>                MLX5_ACCEL_TLS_RX
> make[5]: *** [drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.o] Error 1
> make[5]: *** Waiting for unfinished jobs....
> make[4]: *** [drivers/net/ethernet/mellanox/mlx5/core] Error 2
> make[3]: *** [drivers/net/ethernet/mellanox] Error 2
> make[3]: *** Waiting for unfinished jobs....
> make[2]: *** [drivers/net/ethernet] Error 2
> make[2]: *** Waiting for unfinished jobs....
> make[1]: *** [drivers/net] Error 2
> make[1]: *** Waiting for unfinished jobs....
> make: *** [drivers] Error 2
> make: *** Waiting for unfinished jobs....
>
> this patch is to fix this error using 'depends on MLX5_FPGA_TLS' when MLX5_TLS is set.
>

Hi Mao, Thanks for the patch. sorry for the delayed response, I was
out of office.

Actually MLX5_TLS doesn't depend on MLX5_FPGA_TLS anymore.
Tariq prepared a patch to fix this, we will submit it this week.


> Fixes: e2869fb2068b ("net/mlx5: Kconfig, Better organize compilation flags")
>
> Signed-off-by: Mao Wenan <maowenan@huawei.com>
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/Kconfig | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
> index 37fef8c..1da2770 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
> @@ -139,6 +139,7 @@ config MLX5_TLS
>         depends on MLX5_CORE_EN
>         depends on TLS_DEVICE
>         depends on TLS=y || MLX5_CORE=m
> +       depends on MLX5_FPGA_TLS
>         select MLX5_ACCEL
>         default n
>         help
> --
> 2.7.4
>

^ permalink raw reply

* RE: [PATCH v6 0/5] net: macb: cover letter
From: Parshuram Raju Thombare @ 2019-07-11  5:20 UTC (permalink / raw)
  To: David Miller
  Cc: andrew@lunn.ch, nicolas.ferre@microchip.com, f.fainelli@gmail.com,
	linux@armlinux.org.uk, netdev@vger.kernel.org,
	hkallweit1@gmail.com, linux-kernel@vger.kernel.org,
	Rafal Ciepiela, Piotr Sroka, Anil Joy Varughese, Arthur Marris,
	Steven Ho, Milind Parab
In-Reply-To: <20190710.114707.1137811182536299673.davem@davemloft.net>

Hi David,

Ok, I will resubmit it.

Regards,
Parshuram Thombare

^ permalink raw reply

* [PATCH v4 1/2] rtw88: pci: Rearrange the memory usage for skb in RX ISR
From: Jian-Hong Pan @ 2019-07-11  5:24 UTC (permalink / raw)
  To: Yan-Hsuan Chuang, Kalle Valo, David S . Miller, Larry Finger,
	David Laight, Christoph Hellwig
  Cc: linux-wireless, netdev, linux-kernel, linux, Daniel Drake,
	Jian-Hong Pan, stable
In-Reply-To: <CAPpJ_edDcaBq+0DocPmS-yYM10B4MkWvBn=f6wwbYdqzSGmp_g@mail.gmail.com>

Testing with RTL8822BE hardware, when available memory is low, we
frequently see a kernel panic and system freeze.

First, rtw_pci_rx_isr encounters a memory allocation failure (trimmed):

rx routine starvation
WARNING: CPU: 7 PID: 9871 at drivers/net/wireless/realtek/rtw88/pci.c:822 rtw_pci_rx_isr.constprop.25+0x35a/0x370 [rtwpci]
[ 2356.580313] RIP: 0010:rtw_pci_rx_isr.constprop.25+0x35a/0x370 [rtwpci]

Then we see a variety of different error conditions and kernel panics,
such as this one (trimmed):

rtw_pci 0000:02:00.0: pci bus timeout, check dma status
skbuff: skb_over_panic: text:00000000091b6e66 len:415 put:415 head:00000000d2880c6f data:000000007a02b1ea tail:0x1df end:0xc0 dev:<NULL>
------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:105!
invalid opcode: 0000 [#1] SMP NOPTI
RIP: 0010:skb_panic+0x43/0x45

When skb allocation fails and the "rx routine starvation" is hit, the
function returns immediately without updating the RX ring. At this
point, the RX ring may continue referencing an old skb which was already
handed off to ieee80211_rx_irqsafe(). When it comes to be used again,
bad things happen.

This patch allocates a new, data-sized skb first in RX ISR. After
copying the data in, we pass it to the upper layers. However, if skb
allocation fails, we effectively drop the frame. In both cases, the
original, full size ring skb is reused.

In addition, to fixing the kernel crash, the RX routine should now
generally behave better under low memory conditions.

Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=204053
Signed-off-by: Jian-Hong Pan <jian-hong@endlessm.com>
Cc: <stable@vger.kernel.org>
---
 drivers/net/wireless/realtek/rtw88/pci.c | 49 +++++++++++-------------
 1 file changed, 22 insertions(+), 27 deletions(-)

diff --git a/drivers/net/wireless/realtek/rtw88/pci.c b/drivers/net/wireless/realtek/rtw88/pci.c
index cfe05ba7280d..c415f5e94fed 100644
--- a/drivers/net/wireless/realtek/rtw88/pci.c
+++ b/drivers/net/wireless/realtek/rtw88/pci.c
@@ -763,6 +763,7 @@ static void rtw_pci_rx_isr(struct rtw_dev *rtwdev, struct rtw_pci *rtwpci,
 	u32 pkt_offset;
 	u32 pkt_desc_sz = chip->rx_pkt_desc_sz;
 	u32 buf_desc_sz = chip->rx_buf_desc_sz;
+	u32 new_len;
 	u8 *rx_desc;
 	dma_addr_t dma;
 
@@ -790,40 +791,34 @@ static void rtw_pci_rx_isr(struct rtw_dev *rtwdev, struct rtw_pci *rtwpci,
 		pkt_offset = pkt_desc_sz + pkt_stat.drv_info_sz +
 			     pkt_stat.shift;
 
-		if (pkt_stat.is_c2h) {
-			/* keep rx_desc, halmac needs it */
-			skb_put(skb, pkt_stat.pkt_len + pkt_offset);
+		/* allocate a new skb for this frame,
+		 * discard the frame if none available
+		 */
+		new_len = pkt_stat.pkt_len + pkt_offset;
+		new = dev_alloc_skb(new_len);
+		if (WARN_ONCE(!new, "rx routine starvation\n"))
+			goto next_rp;
+
+		/* put the DMA data including rx_desc from phy to new skb */
+		skb_put_data(new, skb->data, new_len);
 
-			/* pass offset for further operation */
-			*((u32 *)skb->cb) = pkt_offset;
-			skb_queue_tail(&rtwdev->c2h_queue, skb);
+		if (pkt_stat.is_c2h) {
+			 /* pass rx_desc & offset for further operation */
+			*((u32 *)new->cb) = pkt_offset;
+			skb_queue_tail(&rtwdev->c2h_queue, new);
 			ieee80211_queue_work(rtwdev->hw, &rtwdev->c2h_work);
 		} else {
-			/* remove rx_desc, maybe use skb_pull? */
-			skb_put(skb, pkt_stat.pkt_len);
-			skb_reserve(skb, pkt_offset);
-
-			/* alloc a smaller skb to mac80211 */
-			new = dev_alloc_skb(pkt_stat.pkt_len);
-			if (!new) {
-				new = skb;
-			} else {
-				skb_put_data(new, skb->data, skb->len);
-				dev_kfree_skb_any(skb);
-			}
-			/* TODO: merge into rx.c */
-			rtw_rx_stats(rtwdev, pkt_stat.vif, skb);
+			/* remove rx_desc */
+			skb_pull(new, pkt_offset);
+
+			rtw_rx_stats(rtwdev, pkt_stat.vif, new);
 			memcpy(new->cb, &rx_status, sizeof(rx_status));
 			ieee80211_rx_irqsafe(rtwdev->hw, new);
 		}
 
-		/* skb delivered to mac80211, alloc a new one in rx ring */
-		new = dev_alloc_skb(RTK_PCI_RX_BUF_SIZE);
-		if (WARN(!new, "rx routine starvation\n"))
-			return;
-
-		ring->buf[cur_rp] = new;
-		rtw_pci_reset_rx_desc(rtwdev, new, ring, cur_rp, buf_desc_sz);
+next_rp:
+		/* new skb delivered to mac80211, re-enable original skb DMA */
+		rtw_pci_reset_rx_desc(rtwdev, skb, ring, cur_rp, buf_desc_sz);
 
 		/* host read next element in ring */
 		if (++cur_rp >= ring->r.len)
-- 
2.22.0


^ permalink raw reply related

* [PATCH v4 2/2] rtw88: pci: Use DMA sync instead of remapping in RX ISR
From: Jian-Hong Pan @ 2019-07-11  5:24 UTC (permalink / raw)
  To: Yan-Hsuan Chuang, Kalle Valo, David S . Miller, Larry Finger,
	David Laight, Christoph Hellwig
  Cc: linux-wireless, netdev, linux-kernel, linux, Daniel Drake,
	Jian-Hong Pan, stable
In-Reply-To: <20190711052427.5582-1-jian-hong@endlessm.com>

Since each skb in RX ring is reused instead of new allocation, we can
treat the DMA in a more efficient way by DMA synchronization.

Signed-off-by: Jian-Hong Pan <jian-hong@endlessm.com>
Cc: <stable@vger.kernel.org>
---
 drivers/net/wireless/realtek/rtw88/pci.c | 24 +++++++++++++++++++++---
 1 file changed, 21 insertions(+), 3 deletions(-)

diff --git a/drivers/net/wireless/realtek/rtw88/pci.c b/drivers/net/wireless/realtek/rtw88/pci.c
index c415f5e94fed..68fae52151dd 100644
--- a/drivers/net/wireless/realtek/rtw88/pci.c
+++ b/drivers/net/wireless/realtek/rtw88/pci.c
@@ -206,6 +206,23 @@ static int rtw_pci_reset_rx_desc(struct rtw_dev *rtwdev, struct sk_buff *skb,
 	return 0;
 }
 
+static void rtw_pci_sync_rx_desc_device(struct rtw_dev *rtwdev, dma_addr_t dma,
+					struct rtw_pci_rx_ring *rx_ring,
+					u32 idx, u32 desc_sz)
+{
+	struct device *dev = rtwdev->dev;
+	struct rtw_pci_rx_buffer_desc *buf_desc;
+	int buf_sz = RTK_PCI_RX_BUF_SIZE;
+
+	dma_sync_single_for_device(dev, dma, buf_sz, DMA_FROM_DEVICE);
+
+	buf_desc = (struct rtw_pci_rx_buffer_desc *)(rx_ring->r.head +
+						     idx * desc_sz);
+	memset(buf_desc, 0, sizeof(*buf_desc));
+	buf_desc->buf_size = cpu_to_le16(RTK_PCI_RX_BUF_SIZE);
+	buf_desc->dma = cpu_to_le32(dma);
+}
+
 static int rtw_pci_init_rx_ring(struct rtw_dev *rtwdev,
 				struct rtw_pci_rx_ring *rx_ring,
 				u8 desc_size, u32 len)
@@ -782,8 +799,8 @@ static void rtw_pci_rx_isr(struct rtw_dev *rtwdev, struct rtw_pci *rtwpci,
 		rtw_pci_dma_check(rtwdev, ring, cur_rp);
 		skb = ring->buf[cur_rp];
 		dma = *((dma_addr_t *)skb->cb);
-		pci_unmap_single(rtwpci->pdev, dma, RTK_PCI_RX_BUF_SIZE,
-				 PCI_DMA_FROMDEVICE);
+		dma_sync_single_for_cpu(rtwdev->dev, dma, RTK_PCI_RX_BUF_SIZE,
+					DMA_FROM_DEVICE);
 		rx_desc = skb->data;
 		chip->ops->query_rx_desc(rtwdev, rx_desc, &pkt_stat, &rx_status);
 
@@ -818,7 +835,8 @@ static void rtw_pci_rx_isr(struct rtw_dev *rtwdev, struct rtw_pci *rtwpci,
 
 next_rp:
 		/* new skb delivered to mac80211, re-enable original skb DMA */
-		rtw_pci_reset_rx_desc(rtwdev, skb, ring, cur_rp, buf_desc_sz);
+		rtw_pci_sync_rx_desc_device(rtwdev, dma, ring, cur_rp,
+					    buf_desc_sz);
 
 		/* host read next element in ring */
 		if (++cur_rp >= ring->r.len)
-- 
2.22.0


^ permalink raw reply related

* Re: [PATCH v4 1/2] rtw88: pci: Rearrange the memory usage for skb in RX ISR
From: Jian-Hong Pan @ 2019-07-11  5:28 UTC (permalink / raw)
  To: Yan-Hsuan Chuang, Kalle Valo, David S . Miller, Larry Finger,
	David Laight, Christoph Hellwig
  Cc: linux-wireless, Linux Netdev List, Linux Kernel,
	Linux Upstreaming Team, Daniel Drake, stable
In-Reply-To: <20190711052427.5582-1-jian-hong@endlessm.com>

Jian-Hong Pan <jian-hong@endlessm.com> 於 2019年7月11日 週四 下午1:25寫道:
>
> Testing with RTL8822BE hardware, when available memory is low, we
> frequently see a kernel panic and system freeze.
>
> First, rtw_pci_rx_isr encounters a memory allocation failure (trimmed):
>
> rx routine starvation
> WARNING: CPU: 7 PID: 9871 at drivers/net/wireless/realtek/rtw88/pci.c:822 rtw_pci_rx_isr.constprop.25+0x35a/0x370 [rtwpci]
> [ 2356.580313] RIP: 0010:rtw_pci_rx_isr.constprop.25+0x35a/0x370 [rtwpci]
>
> Then we see a variety of different error conditions and kernel panics,
> such as this one (trimmed):
>
> rtw_pci 0000:02:00.0: pci bus timeout, check dma status
> skbuff: skb_over_panic: text:00000000091b6e66 len:415 put:415 head:00000000d2880c6f data:000000007a02b1ea tail:0x1df end:0xc0 dev:<NULL>
> ------------[ cut here ]------------
> kernel BUG at net/core/skbuff.c:105!
> invalid opcode: 0000 [#1] SMP NOPTI
> RIP: 0010:skb_panic+0x43/0x45
>
> When skb allocation fails and the "rx routine starvation" is hit, the
> function returns immediately without updating the RX ring. At this
> point, the RX ring may continue referencing an old skb which was already
> handed off to ieee80211_rx_irqsafe(). When it comes to be used again,
> bad things happen.
>
> This patch allocates a new, data-sized skb first in RX ISR. After
> copying the data in, we pass it to the upper layers. However, if skb
> allocation fails, we effectively drop the frame. In both cases, the
> original, full size ring skb is reused.
>
> In addition, to fixing the kernel crash, the RX routine should now
> generally behave better under low memory conditions.
>
> Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=204053
> Signed-off-by: Jian-Hong Pan <jian-hong@endlessm.com>
> Cc: <stable@vger.kernel.org>
> ---

Sorry, I forget to place the version difference here.

v2:
 - Allocate new data-sized skb and put data into it, then pass it to
   mac80211. Reuse the original skb in RX ring by DMA sync.
 - Modify the commit message.
 - Introduce following [PATCH v3 2/2] rtw88: pci: Use DMA sync instead
   of remapping in RX ISR.

v3:
 - Same as v2.

v4:
 - Fix comment: allocate a new skb for this frame, discard the frame
if none available

>  drivers/net/wireless/realtek/rtw88/pci.c | 49 +++++++++++-------------
>  1 file changed, 22 insertions(+), 27 deletions(-)
>
> diff --git a/drivers/net/wireless/realtek/rtw88/pci.c b/drivers/net/wireless/realtek/rtw88/pci.c
> index cfe05ba7280d..c415f5e94fed 100644
> --- a/drivers/net/wireless/realtek/rtw88/pci.c
> +++ b/drivers/net/wireless/realtek/rtw88/pci.c
> @@ -763,6 +763,7 @@ static void rtw_pci_rx_isr(struct rtw_dev *rtwdev, struct rtw_pci *rtwpci,
>         u32 pkt_offset;
>         u32 pkt_desc_sz = chip->rx_pkt_desc_sz;
>         u32 buf_desc_sz = chip->rx_buf_desc_sz;
> +       u32 new_len;
>         u8 *rx_desc;
>         dma_addr_t dma;
>
> @@ -790,40 +791,34 @@ static void rtw_pci_rx_isr(struct rtw_dev *rtwdev, struct rtw_pci *rtwpci,
>                 pkt_offset = pkt_desc_sz + pkt_stat.drv_info_sz +
>                              pkt_stat.shift;
>
> -               if (pkt_stat.is_c2h) {
> -                       /* keep rx_desc, halmac needs it */
> -                       skb_put(skb, pkt_stat.pkt_len + pkt_offset);
> +               /* allocate a new skb for this frame,
> +                * discard the frame if none available
> +                */
> +               new_len = pkt_stat.pkt_len + pkt_offset;
> +               new = dev_alloc_skb(new_len);
> +               if (WARN_ONCE(!new, "rx routine starvation\n"))
> +                       goto next_rp;
> +
> +               /* put the DMA data including rx_desc from phy to new skb */
> +               skb_put_data(new, skb->data, new_len);
>
> -                       /* pass offset for further operation */
> -                       *((u32 *)skb->cb) = pkt_offset;
> -                       skb_queue_tail(&rtwdev->c2h_queue, skb);
> +               if (pkt_stat.is_c2h) {
> +                        /* pass rx_desc & offset for further operation */
> +                       *((u32 *)new->cb) = pkt_offset;
> +                       skb_queue_tail(&rtwdev->c2h_queue, new);
>                         ieee80211_queue_work(rtwdev->hw, &rtwdev->c2h_work);
>                 } else {
> -                       /* remove rx_desc, maybe use skb_pull? */
> -                       skb_put(skb, pkt_stat.pkt_len);
> -                       skb_reserve(skb, pkt_offset);
> -
> -                       /* alloc a smaller skb to mac80211 */
> -                       new = dev_alloc_skb(pkt_stat.pkt_len);
> -                       if (!new) {
> -                               new = skb;
> -                       } else {
> -                               skb_put_data(new, skb->data, skb->len);
> -                               dev_kfree_skb_any(skb);
> -                       }
> -                       /* TODO: merge into rx.c */
> -                       rtw_rx_stats(rtwdev, pkt_stat.vif, skb);
> +                       /* remove rx_desc */
> +                       skb_pull(new, pkt_offset);
> +
> +                       rtw_rx_stats(rtwdev, pkt_stat.vif, new);
>                         memcpy(new->cb, &rx_status, sizeof(rx_status));
>                         ieee80211_rx_irqsafe(rtwdev->hw, new);
>                 }
>
> -               /* skb delivered to mac80211, alloc a new one in rx ring */
> -               new = dev_alloc_skb(RTK_PCI_RX_BUF_SIZE);
> -               if (WARN(!new, "rx routine starvation\n"))
> -                       return;
> -
> -               ring->buf[cur_rp] = new;
> -               rtw_pci_reset_rx_desc(rtwdev, new, ring, cur_rp, buf_desc_sz);
> +next_rp:
> +               /* new skb delivered to mac80211, re-enable original skb DMA */
> +               rtw_pci_reset_rx_desc(rtwdev, skb, ring, cur_rp, buf_desc_sz);
>
>                 /* host read next element in ring */
>                 if (++cur_rp >= ring->r.len)
> --
> 2.22.0
>

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox