Netdev List


* Re: [RFC] virtio-net: share receive_*() and add_recvbuf_*() with virtio-vsock
From: Jason Wang @ 2019-07-11  7:37 UTC (permalink / raw)
  To: Stefano Garzarella, Michael S. Tsirkin, Stefan Hajnoczi
  Cc: virtualization, netdev
In-Reply-To: <20190710153707.twmzgmwqqw3pstos@steredhat>

On 2019/7/10 11:37 PM, Stefano Garzarella wrote:
> Hi,
> as Jason suggested some months ago, I looked better at the virtio-net driver to
> understand if we can reuse some parts also in the virtio-vsock driver, since we
> have similar challenges (mergeable buffers, page allocation, small
> packets, etc.).
>
> Initially, I would add the skbuff in the virtio-vsock in order to re-use
> receive_*() functions.

Yes, that will be a good step.

> Then I would move receive_[small, big, mergeable]() and
> add_recvbuf_[small, big, mergeable]() outside of virtio-net driver, in order to
> call them also from virtio-vsock. I need to do some refactoring (e.g. leave the
> XDP part on the virtio-net driver), but I think it is feasible.
>
> The idea is to create a virtio-skb.[h,c] where put these functions and a new
> object where stores some attributes needed (e.g. hdr_len ) and status (e.g.
> some fields of struct receive_queue).

My understanding is we could be more ambitious here. Do you see any 
blocker for reusing virtio-net directly? It's better to reuse not only 
the functions but also the logic like NAPI to avoid re-inventing 
something buggy and duplicated.

> This is an idea of virtio-skb.h that
> I have in mind:
>      struct virtskb;

What fields do you want to store in virtskb? It looks like the existing
sk_buff is flexible enough for us?

>
>      struct sk_buff *virtskb_receive_small(struct virtskb *vs, ...);
>      struct sk_buff *virtskb_receive_big(struct virtskb *vs, ...);
>      struct sk_buff *virtskb_receive_mergeable(struct virtskb *vs, ...);
>
>      int virtskb_add_recvbuf_small(struct virtskb*vs, ...);
>      int virtskb_add_recvbuf_big(struct virtskb *vs, ...);
>      int virtskb_add_recvbuf_mergeable(struct virtskb *vs, ...);
>
> For the Guest->Host path it should be easier, so maybe I can add a
> "virtskb_send(struct virtskb *vs, struct sk_buff *skb)" with a part of the code
> of xmit_skb().

I may be missing something, but I don't see anything that prevents us from
using xmit_skb() directly.

>
> Let me know if you have in mind better names or if I should put these function
> in another place.
>
> I would like to leave the control part completely separate, so, for example,
> the two drivers will negotiate the features independently and they will call
> the right virtskb_receive_*() function based on the negotiation.

If it's only an issue of negotiation, we can simply change
virtnet_probe() to deal with different devices.

>
> I already started to work on it, but before to do more steps and send an RFC
> patch, I would like to hear your opinion.
> Do you think that makes sense?
> Do you see any issue or a better solution?

I still think we should look for a way of adding some code to virtio-net.c
directly if there's no huge difference in the processing of TX/RX. That
would save us a lot of time.

Thanks

>
> Thanks in advance,
> Stefano

^ permalink raw reply

* Re: [PATCH net 2/4] tcp: tcp_fragment() should apply sane memory limits
From: Christoph Paasch @ 2019-07-11  7:28 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Prout, Andrew - LLSC - MITLL, David Miller, netdev,
	Greg Kroah-Hartman, Jonathan Looney, Neal Cardwell, Tyler Hicks,
	Yuchung Cheng, Bruce Curtis, Jonathan Lemon, Dustin Marquess
In-Reply-To: <b1dfd327-a784-6609-3c83-dab42c3c7eda@gmail.com>



> On Jul 10, 2019, at 9:26 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> 
> 
> 
> On 7/10/19 8:53 PM, Prout, Andrew - LLSC - MITLL wrote:
>> 
>> Our initial rollout was v4.14.130, but I reproduced it with v4.14.132 as well, reliably for the samba test and once (not reliably) with synthetic test I was trying. A patched v4.14.132 with this patch partially reverted (just the four lines from tcp_fragment deleted) passed the samba test.
>> 
>> The synthetic test was a pair of simple send/recv test programs under the following conditions:
>> -The send socket was non-blocking
>> -SO_SNDBUF set to 128KiB
>> -The receiver NIC was being flooded with traffic from multiple hosts (to induce packet loss/retransmits)
>> -Load was on both systems: a while(1) program spinning on each CPU core
>> -The receiver was on an older unaffected kernel
>> 
> 
> SO_SNDBUF to 128KB does not permit to recover from heavy losses,
> since skbs needs to be allocated for retransmits.

Would it make sense to always allow the alloc in tcp_fragment when coming from __tcp_retransmit_skb() through the retransmit-timer ?

AFAICS, the crasher was when an attacker sends "fake" SACK-blocks. Thus, we would still be protected from too much fragmentation, but at least would always allow the retransmission to go out.
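
As a rough illustration of the suggestion above, here is a toy userspace model of a fragmentation limit that exempts retransmits driven by the retransmit timer. Everything in it (struct toy_sock, toy_fragment_allowed(), the factor of two) is hypothetical and chosen only for this sketch; it is not the kernel's tcp_fragment() logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-socket accounting discussed above. */
struct toy_sock {
	size_t wmem_queued;	/* bytes currently charged to the send queue */
	size_t sndbuf;		/* configured SO_SNDBUF limit */
};

/*
 * Toy policy: refuse to split a buffer when the queue is already well
 * over the limit, unless the caller is the retransmit timer.
 * The factor of two is an assumption for this sketch only.
 */
static bool toy_fragment_allowed(const struct toy_sock *sk, bool from_rto_timer)
{
	assert(sk != NULL);

	if (from_rto_timer)
		return true;	/* always let a timer-driven retransmit go out */

	return sk->wmem_queued <= 2 * sk->sndbuf;
}
```

With a 128 KiB limit and 300 KiB queued, this model rejects an ordinary split but still allows the timer-driven one, which is the behaviour being proposed.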


Christoph

> 
> The bug we fixed allowed remote attackers to crash all linux hosts,
> 
> I am afraid we have to enforce the real SO_SNDBUF limit, finally.
> 
> Even a cushion of 128KB per socket is dangerous, for servers with millions of TCP sockets.
> 
> You will either have to set SO_SNDBUF to higher values, or let autotuning in place.
> Or revert the patches and allow attackers hit you badly.
> 


^ permalink raw reply

* RE: [PATCH v6 rdma-next 0/6] RDMA/qedr: Use the doorbell overflow recovery mechanism for RDMA
From: Michal Kalderon @ 2019-07-11  7:23 UTC (permalink / raw)
  To: Gal Pressman, Ariel Elior, jgg@ziepe.ca, dledford@redhat.com
  Cc: linux-rdma@vger.kernel.org, davem@davemloft.net,
	netdev@vger.kernel.org, sleybo@amazon.com
In-Reply-To: <7b2f2205-6b5d-c9e7-2d59-296367e517ac@amazon.com>

> From: linux-rdma-owner@vger.kernel.org <linux-rdma-
> owner@vger.kernel.org> On Behalf Of Gal Pressman
> 
> On 09/07/2019 17:17, Michal Kalderon wrote:
> > This patch series uses the doorbell overflow recovery mechanism
> > introduced in commit 36907cd5cd72 ("qed: Add doorbell overflow
> > recovery mechanism") for rdma ( RoCE and iWARP )
> >
> > The first three patches modify the core code to contain helper
> > functions for managing mmap_xa inserting, getting and freeing entries.
> > The code was taken almost as is from the efa driver.
> > There is still an open discussion on whether we should take this even
> > further and make the entire mmap generic. Until a decision is made, I
> > only created the database API and modified the efa and qedr driver to
> > use it. The doorbell recovery code will be based on the common code.
> >
> > Efa driver was compile tested only.
> 
> For the whole series:
> Tested-by: Gal Pressman <galpress@amazon.com>

Thanks Gal!


^ permalink raw reply

* Re: [PATCH net-next iproute2 2/3] tc: Introduce tc ct action
From: Paul Blakey @ 2019-07-11  7:21 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner
  Cc: Jiri Pirko, Roi Dayan, Yossi Kuperman, Oz Shlomo,
	netdev@vger.kernel.org, David Miller, Aaron Conole, Zhike Wang,
	Justin Pettit, John Hurley, Rony Efraim, nst-kernel@redhat.com,
	Simon Horman
In-Reply-To: <20190709153657.GF3390@localhost.localdomain>


On 7/9/2019 6:36 PM, Marcelo Ricardo Leitner wrote:
> On Tue, Jul 09, 2019 at 06:58:36AM +0000, Paul Blakey wrote:
>> On 7/8/2019 8:54 PM, Marcelo Ricardo Leitner wrote:
>>> On Sun, Jul 07, 2019 at 11:53:47AM +0300, Paul Blakey wrote:
>>>> New tc action to send packets to conntrack module, commit
>>>> them, and set a zone, labels, mark, and nat on the connection.
>>>>
>>>> It can also clear the packet's conntrack state by using clear.
>>>>
>>>> Usage:
>>>>      ct clear
>>>>      ct commit [force] [zone] [mark] [label] [nat]
>>> Isn't the 'commit' also optional? More like
>>>       ct [commit [force]] [zone] [mark] [label] [nat]
>>>
>>>>      ct [nat] [zone]
>>>>
>>>> Signed-off-by: Paul Blakey <paulb@mellanox.com>
>>>> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
>>>> Signed-off-by: Yossi Kuperman <yossiku@mellanox.com>
>>>> Acked-by: Jiri Pirko <jiri@mellanox.com>
>>>> Acked-by: Roi Dayan <roid@mellanox.com>
>>>> ---
>>> ...
>>>> +static void
>>>> +usage(void)
>>>> +{
>>>> +	fprintf(stderr,
>>>> +		"Usage: ct clear\n"
>>>> +		"	ct commit [force] [zone ZONE] [mark MASKED_MARK] [label MASKED_LABEL] [nat NAT_SPEC]\n"
>>> Ditto here then.
>>
>> In commit msg and here, it means there is multiple modes of operation. I
>> think it's easier to split those.
> Yep, that is good.
> More below.
>
>> "ct clear" to clear it , not other options can be added here.
>>
>> "ct commit  [force].... " sends to conntrack and commit a connection,
>> and only for commit can you specify force mark  label, and nat with
>> nat_spec....
>>
>> and the last one, "ct [nat] [zone ZONE]" is to just send the packet to
>> conntrack on some zone [optional], restore nat [optional].
>>
>>
>>>> +		"	ct [nat] [zone ZONE]\n"
>>>> +		"Where: ZONE is the conntrack zone table number\n"
>>>> +		"	NAT_SPEC is {src|dst} addr addr1[-addr2] [port port1[-port2]]\n"
>>>> +		"\n");
>>>> +	exit(-1);
>>>> +}
>>> ...
>>>
>>> The validation below doesn't enforce that commit must be there for
>>> such case.
>> which case? commit is optional. the above are the three valid patterns.
> That's the point. But the 2nd example is saying 'commit' word is
> mandatory in that mode. It is written as it is a command that was
> selected.
>
> One may use just:
>      ct [zone]
> And not
>      ct commit [zone]
> Right?

It is optional in the overall syntax.


But I split it into modes:

clear, commit, and "restore" (I unofficially call it that, because it is
usually used to get the +est state on the packet and can restore nat; it
doesn't actually restore anything for the first packet on the -trk rule)

It is mandatory in the second mode (commit). If you don't specify commit
or clear, you can only use the third form, "restore", which sends the
packet to ct on an optional zone and optionally restores nat (so we get
ct [zone] [nat]).
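
The three modes described above can be sketched as a small classifier over the keywords that follow "ct". This is only an illustration under simplifying assumptions: ct_classify(), enum ct_mode and is_commit_only() are hypothetical names, option values are not validated, and this is not the parser from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

enum ct_mode { CT_MODE_CLEAR, CT_MODE_COMMIT, CT_MODE_RESTORE, CT_MODE_INVALID };

/* Keywords that, per the description above, are only valid with "commit". */
static bool is_commit_only(const char *kw)
{
	return strcmp(kw, "force") == 0 || strcmp(kw, "mark") == 0 ||
	       strcmp(kw, "label") == 0;
}

/* argv holds the tokens after "ct"; argc is their count. */
static enum ct_mode ct_classify(int argc, const char *const argv[])
{
	int i;

	assert(argc >= 0);

	/* "ct clear" takes no further options. */
	if (argc > 0 && strcmp(argv[0], "clear") == 0)
		return argc == 1 ? CT_MODE_CLEAR : CT_MODE_INVALID;

	/* "ct commit ..." may carry force, zone, mark, label and nat. */
	if (argc > 0 && strcmp(argv[0], "commit") == 0)
		return CT_MODE_COMMIT;

	/* Otherwise only the "restore" form is left: ct [nat] [zone ZONE]. */
	for (i = 0; i < argc; i++)
		if (is_commit_only(argv[i]))
			return CT_MODE_INVALID;

	return CT_MODE_RESTORE;
}
```

For example, { "zone", "5", "nat" } classifies as the restore form, while { "zone", "5", "mark", "1" } is rejected because mark requires commit.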

I think this syntax is easy, maybe I can label them as the modes of 
operation above (then I'll need to name the restore one better :)).

If there is a different syntax you think might be easier I'll change to 
that.


Thanks,

Paul.







^ permalink raw reply

* Re: [PATCH net-next v6 0/4] net/sched: Introduce tc connection tracking
From: Paul Blakey @ 2019-07-11  7:12 UTC (permalink / raw)
  To: David Miller
  Cc: Jiri Pirko, Roi Dayan, Yossi Kuperman, Oz Shlomo,
	marcelo.leitner@gmail.com, netdev@vger.kernel.org,
	aconole@redhat.com, wangzhike@jd.com, Rony Efraim,
	nst-kernel@redhat.com, john.hurley@netronome.com,
	simon.horman@netronome.com, jpettit@ovn.org
In-Reply-To: <20190709.121402.1804664264408465946.davem@davemloft.net>


On 7/9/2019 10:14 PM, David Miller wrote:
> From: Paul Blakey <paulb@mellanox.com>
> Date: Tue,  9 Jul 2019 10:30:47 +0300
>
>> This patch series add connection tracking capabilities in tc sw datapath.
>> It does so via a new tc action, called act_ct, and new tc flower classifier matching
>> on conntrack state, mark and label.
>   ...
>
> Ok, I applied this, but two things:
>
> 1) You owe Cong Wang an explanation, a real detailed one, about the L2
>     vs L3 design of this feature.  I did not see you address his feedback,
>     but if you did I apologize.
>
> 2) Because the MPLS changes went in first, TCA_ID_CT ended up in a
>     different spot in the enumeration and therefore the value is
>     different.
>
> Thanks.



Thanks!

Re 1, I provided one in "Re: [PATCH net-next v2 0/4] net/sched: 
Introduce tc connection tracking", hope that's enough.


^ permalink raw reply

* [PATCH v2 bpf-next 2/3] selftests/bpf: add trickier size resolution tests
From: Andrii Nakryiko @ 2019-07-11  6:53 UTC (permalink / raw)
  To: bpf, netdev, ast, daniel, yhs
  Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko
In-Reply-To: <20190711065307.2425636-1-andriin@fb.com>

Add more BTF tests, validating that size resolution logic is correct in
a few trickier cases.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
---
 tools/testing/selftests/bpf/test_btf.c | 88 ++++++++++++++++++++++++++
 1 file changed, 88 insertions(+)

diff --git a/tools/testing/selftests/bpf/test_btf.c b/tools/testing/selftests/bpf/test_btf.c
index 8351cb5f4a20..3d617e806054 100644
--- a/tools/testing/selftests/bpf/test_btf.c
+++ b/tools/testing/selftests/bpf/test_btf.c
@@ -3417,6 +3417,94 @@ static struct btf_raw_test raw_tests[] = {
 	.value_type_id = 1,
 	.max_entries = 4,
 },
+/*
+ * typedef int arr_t[16];
+ * struct s {
+ *	arr_t *a;
+ * };
+ */
+{
+	.descr = "struct->ptr->typedef->array->int size resolution",
+	.raw_types = {
+		BTF_STRUCT_ENC(NAME_TBD, 1, 8),			/* [1] */
+		BTF_MEMBER_ENC(NAME_TBD, 2, 0),
+		BTF_PTR_ENC(3),					/* [2] */
+		BTF_TYPEDEF_ENC(NAME_TBD, 4),			/* [3] */
+		BTF_TYPE_ARRAY_ENC(5, 5, 16),			/* [4] */
+		BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4),	/* [5] */
+		BTF_END_RAW,
+	},
+	BTF_STR_SEC("\0s\0a\0arr_t"),
+	.map_type = BPF_MAP_TYPE_ARRAY,
+	.map_name = "ptr_mod_chain_size_resolve_map",
+	.key_size = sizeof(int),
+	.value_size = sizeof(int) * 16,
+	.key_type_id = 5 /* int */,
+	.value_type_id = 3 /* arr_t */,
+	.max_entries = 4,
+},
+/*
+ * typedef int arr_t[16][8][4];
+ * struct s {
+ *	arr_t *a;
+ * };
+ */
+{
+	.descr = "struct->ptr->typedef->multi-array->int size resolution",
+	.raw_types = {
+		BTF_STRUCT_ENC(NAME_TBD, 1, 8),			/* [1] */
+		BTF_MEMBER_ENC(NAME_TBD, 2, 0),
+		BTF_PTR_ENC(3),					/* [2] */
+		BTF_TYPEDEF_ENC(NAME_TBD, 4),			/* [3] */
+		BTF_TYPE_ARRAY_ENC(5, 7, 16),			/* [4] */
+		BTF_TYPE_ARRAY_ENC(6, 7, 8),			/* [5] */
+		BTF_TYPE_ARRAY_ENC(7, 7, 4),			/* [6] */
+		BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4),	/* [7] */
+		BTF_END_RAW,
+	},
+	BTF_STR_SEC("\0s\0a\0arr_t"),
+	.map_type = BPF_MAP_TYPE_ARRAY,
+	.map_name = "multi_arr_size_resolve_map",
+	.key_size = sizeof(int),
+	.value_size = sizeof(int) * 16 * 8 * 4,
+	.key_type_id = 7 /* int */,
+	.value_type_id = 3 /* arr_t */,
+	.max_entries = 4,
+},
+/*
+ * typedef int int_t;
+ * typedef int_t arr3_t[4];
+ * typedef arr3_t arr2_t[8];
+ * typedef arr2_t arr1_t[16];
+ * struct s {
+ *	arr1_t *a;
+ * };
+ */
+{
+	.descr = "typedef/multi-arr mix size resolution",
+	.raw_types = {
+		BTF_STRUCT_ENC(NAME_TBD, 1, 8),			/* [1] */
+		BTF_MEMBER_ENC(NAME_TBD, 2, 0),
+		BTF_PTR_ENC(3),					/* [2] */
+		BTF_TYPEDEF_ENC(NAME_TBD, 4),			/* [3] */
+		BTF_TYPE_ARRAY_ENC(5, 10, 16),			/* [4] */
+		BTF_TYPEDEF_ENC(NAME_TBD, 6),			/* [5] */
+		BTF_TYPE_ARRAY_ENC(7, 10, 8),			/* [6] */
+		BTF_TYPEDEF_ENC(NAME_TBD, 8),			/* [7] */
+		BTF_TYPE_ARRAY_ENC(9, 10, 4),			/* [8] */
+		BTF_TYPEDEF_ENC(NAME_TBD, 10),			/* [9] */
+		BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4),	/* [10] */
+		BTF_END_RAW,
+	},
+	BTF_STR_SEC("\0s\0a\0arr1_t\0arr2_t\0arr3_t\0int_t"),
+	.map_type = BPF_MAP_TYPE_ARRAY,
+	.map_name = "typedef_arra_mix_size_resolve_map",
+	.key_size = sizeof(int),
+	.value_size = sizeof(int) * 16 * 8 * 4,
+	.key_type_id = 10 /* int */,
+	.value_type_id = 3 /* arr1_t */,
+	.max_entries = 4,
+},
 
 }; /* struct btf_raw_test raw_tests[] */
 
-- 
2.17.1


^ permalink raw reply related

* [PATCH v2 bpf-next 3/3] selftests/bpf: use typedef'ed arrays as map values
From: Andrii Nakryiko @ 2019-07-11  6:53 UTC (permalink / raw)
  To: bpf, netdev, ast, daniel, yhs
  Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko
In-Reply-To: <20190711065307.2425636-1-andriin@fb.com>

Convert a few tests that couldn't use typedef'ed arrays due to a kernel bug.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
---
 tools/testing/selftests/bpf/progs/test_get_stack_rawtp.c     | 3 ++-
 tools/testing/selftests/bpf/progs/test_stacktrace_build_id.c | 3 +--
 tools/testing/selftests/bpf/progs/test_stacktrace_map.c      | 2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/bpf/progs/test_get_stack_rawtp.c b/tools/testing/selftests/bpf/progs/test_get_stack_rawtp.c
index d06b47a09097..33254b771384 100644
--- a/tools/testing/selftests/bpf/progs/test_get_stack_rawtp.c
+++ b/tools/testing/selftests/bpf/progs/test_get_stack_rawtp.c
@@ -47,11 +47,12 @@ struct {
  * issue and avoid complicated C programming massaging.
  * This is an acceptable workaround since there is one entry here.
  */
+typedef __u64 raw_stack_trace_t[2 * MAX_STACK_RAWTP];
 struct {
 	__uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
 	__uint(max_entries, 1);
 	__type(key, __u32);
-	__u64 (*value)[2 * MAX_STACK_RAWTP];
+	__type(value, raw_stack_trace_t);
 } rawdata_map SEC(".maps");
 
 SEC("tracepoint/raw_syscalls/sys_enter")
diff --git a/tools/testing/selftests/bpf/progs/test_stacktrace_build_id.c b/tools/testing/selftests/bpf/progs/test_stacktrace_build_id.c
index bbfc8337b6f0..f5638e26865d 100644
--- a/tools/testing/selftests/bpf/progs/test_stacktrace_build_id.c
+++ b/tools/testing/selftests/bpf/progs/test_stacktrace_build_id.c
@@ -36,8 +36,7 @@ struct {
 	__uint(type, BPF_MAP_TYPE_ARRAY);
 	__uint(max_entries, 128);
 	__type(key, __u32);
-	/* there seems to be a bug in kernel not handling typedef properly */
-	struct bpf_stack_build_id (*value)[PERF_MAX_STACK_DEPTH];
+	__type(value, stack_trace_t);
 } stack_amap SEC(".maps");
 
 /* taken from /sys/kernel/debug/tracing/events/random/urandom_read/format */
diff --git a/tools/testing/selftests/bpf/progs/test_stacktrace_map.c b/tools/testing/selftests/bpf/progs/test_stacktrace_map.c
index 803c15dc109d..fa0be3e10a10 100644
--- a/tools/testing/selftests/bpf/progs/test_stacktrace_map.c
+++ b/tools/testing/selftests/bpf/progs/test_stacktrace_map.c
@@ -35,7 +35,7 @@ struct {
 	__uint(type, BPF_MAP_TYPE_ARRAY);
 	__uint(max_entries, 16384);
 	__type(key, __u32);
-	__u64 (*value)[PERF_MAX_STACK_DEPTH];
+	__type(value, stack_trace_t);
 } stack_amap SEC(".maps");
 
 /* taken from /sys/kernel/debug/tracing/events/sched/sched_switch/format */
-- 
2.17.1


^ permalink raw reply related

* [PATCH v2 bpf-next 0/3] fix BTF verification size resolution
From: Andrii Nakryiko @ 2019-07-11  6:53 UTC (permalink / raw)
  To: bpf, netdev, ast, daniel, yhs
  Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko

BTF size resolution logic isn't always resolving type size correctly, leading
to erroneous map creation failures due to value size mismatch.

This patch set:
1. fixes the issue (patch #1);
2. adds tests for trickier cases (patch #2);
3. and converts a few test cases utilizing BTF-defined maps, that previously
   couldn't use typedef'ed arrays due to a kernel bug (patch #3).

Patch #1 can be applied against bpf tree, but selftest ones (#2 and #3) have
to go against bpf-next for now.

Andrii Nakryiko (3):
  bpf: fix BTF verifier size resolution logic
  selftests/bpf: add trickier size resolution tests
  selftests/bpf: use typedef'ed arrays as map values

 kernel/bpf/btf.c                              | 14 ++-
 .../bpf/progs/test_get_stack_rawtp.c          |  3 +-
 .../bpf/progs/test_stacktrace_build_id.c      |  3 +-
 .../selftests/bpf/progs/test_stacktrace_map.c |  2 +-
 tools/testing/selftests/bpf/test_btf.c        | 88 +++++++++++++++++++
 5 files changed, 102 insertions(+), 8 deletions(-)

-- 
2.17.1

^ permalink raw reply

* [PATCH v2 bpf-next 1/3] bpf: fix BTF verifier size resolution logic
From: Andrii Nakryiko @ 2019-07-11  6:53 UTC (permalink / raw)
  To: bpf, netdev, ast, daniel, yhs
  Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko, Martin KaFai Lau
In-Reply-To: <20190711065307.2425636-1-andriin@fb.com>

BTF verifier has a size resolution bug which in some circumstances leads to
invalid size resolution for, e.g., TYPEDEF modifier.  This happens if we have
[1] PTR -> [2] TYPEDEF -> [3] ARRAY, in which case due to being in pointer
context ARRAY size won't be resolved (because for pointer it doesn't matter, so
it's a sink in pointer context), but it will be permanently remembered as zero
for TYPEDEF and TYPEDEF will be marked as RESOLVED. Eventually ARRAY size will
be resolved correctly, but TYPEDEF resolved_size won't be updated anymore.
This, subsequently, will lead to erroneous map creation failure, if that
TYPEDEF is specified as either key or value, as key_size/value_size won't
correspond to resolved size of TYPEDEF (kernel will believe it's zero).

Note that if BTF was ordered as [1] ARRAY <- [2] TYPEDEF <- [3] PTR, this
won't be a problem, as by the time we get to TYPEDEF, ARRAY's size is already
calculated and stored.

This bug manifests itself in rejecting BTF-defined maps that use array
typedef as a value type:

typedef int array_t[16];

struct {
    __uint(type, BPF_MAP_TYPE_ARRAY);
    __type(value, array_t); /* i.e., array_t *value; */
} test_map SEC(".maps");

The fix consists of not relying on modifier's resolved_size and instead using
modifier's resolved_id (type ID for "concrete" type to which modifier
eventually resolves) and doing size determination for that resolved type. This
allows preserving the existing "early DFS termination" logic for PTR or
STRUCT_OR_ARRAY contexts, while still doing correct size determination for
modifier types.
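
The idea of sizing a modifier from the concrete type it resolves to, rather than from a size cached during the walk, can be shown with a toy userspace model. This is a sketch only: struct toy_type, enum toy_kind and toy_type_size() are hypothetical, the type graph is assumed to be acyclic, and the real verifier works through resolved_ids instead of recursing.

```c
#include <assert.h>
#include <stddef.h>

enum toy_kind { TOY_INT, TOY_PTR, TOY_TYPEDEF, TOY_ARRAY };

struct toy_type {
	enum toy_kind kind;
	unsigned int ref;	/* referenced type ID (PTR, TYPEDEF, ARRAY element) */
	unsigned int nelems;	/* ARRAY only: number of elements */
	unsigned int size;	/* INT only: size in bytes */
};

/*
 * Compute a type's size from the concrete type it ends in. A typedef never
 * carries a size of its own here, so a "zero size remembered in pointer
 * context" cannot leak into the result.
 */
static size_t toy_type_size(const struct toy_type *types, size_t n,
			    unsigned int id)
{
	const struct toy_type *t;

	assert(id < n);
	t = &types[id];

	switch (t->kind) {
	case TOY_INT:
		return t->size;
	case TOY_PTR:
		return sizeof(void *);	/* pointee size does not matter */
	case TOY_TYPEDEF:
		return toy_type_size(types, n, t->ref);
	case TOY_ARRAY:
		return t->nelems * toy_type_size(types, n, t->ref);
	}
	return 0;
}
```

For the chain [1] PTR -> [2] TYPEDEF -> [3] ARRAY of 16 x [4] int, asking for the size of type 2 yields 64 bytes regardless of whether the pointer was visited first.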

Fixes: eb3f595dab40 ("bpf: btf: Validate type reference")
Cc: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
---
 kernel/bpf/btf.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index cad09858a5f2..22fe8b155e51 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -1073,11 +1073,18 @@ const struct btf_type *btf_type_id_size(const struct btf *btf,
 				 !btf_type_is_var(size_type)))
 			return NULL;

-		size = btf->resolved_sizes[size_type_id];
 		size_type_id = btf->resolved_ids[size_type_id];
 		size_type = btf_type_by_id(btf, size_type_id);
 		if (btf_type_nosize_or_null(size_type))
 			return NULL;
+		else if (btf_type_has_size(size_type))
+			size = size_type->size;
+		else if (btf_type_is_array(size_type))
+			size = btf->resolved_sizes[size_type_id];
+		else if (btf_type_is_ptr(size_type))
+			size = sizeof(void *);
+		else
+			return NULL;
 	}

 	*type_id = size_type_id;
@@ -1602,7 +1609,6 @@ static int btf_modifier_resolve(struct btf_verifier_env *env,
 	const struct btf_type *next_type;
 	u32 next_type_id = t->type;
 	struct btf *btf = env->btf;
-	u32 next_type_size = 0;

 	next_type = btf_type_by_id(btf, next_type_id);
 	if (!next_type || btf_type_is_resolve_source_only(next_type)) {
@@ -1620,7 +1626,7 @@ static int btf_modifier_resolve(struct btf_verifier_env *env,
 	 * save us a few type-following when we use it later (e.g. in
 	 * pretty print).
 	 */
-	if (!btf_type_id_size(btf, &next_type_id, &next_type_size)) {
+	if (!btf_type_id_size(btf, &next_type_id, NULL)) {
 		if (env_type_is_resolved(env, next_type_id))
 			next_type = btf_type_id_resolve(btf, &next_type_id);

@@ -1633,7 +1639,7 @@ static int btf_modifier_resolve(struct btf_verifier_env *env,
 		}
 	}

-	env_stack_pop_resolved(env, next_type_id, next_type_size);
+	env_stack_pop_resolved(env, next_type_id, 0);

 	return 0;
 }
-- 
2.17.1

^ permalink raw reply related

* Re: [PATCH] net/mlx5e: Move priv variable into case statement in mlx5e_setup_tc
From: Nathan Chancellor @ 2019-07-11  6:09 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: Saeed Mahameed, Leon Romanovsky, David S. Miller,
	Linux Netdev List, RDMA mailing list, linux-kernel,
	clang-built-linux
In-Reply-To: <CALzJLG9Aw=sVPDiewHr+4Jiuaod_1q=10vzMzCUVg-rCCXD6cQ@mail.gmail.com>

On Wed, Jul 10, 2019 at 11:02:00PM -0700, Saeed Mahameed wrote:
> On Wed, Jul 10, 2019 at 12:05 PM Nathan Chancellor
> <natechancellor@gmail.com> wrote:
> >
> > There is an unused variable warning on arm64 defconfig when
> > CONFIG_MLX5_ESWITCH is unset:
> >
> > drivers/net/ethernet/mellanox/mlx5/core/en_main.c:3467:21: warning:
> > unused variable 'priv' [-Wunused-variable]
> >         struct mlx5e_priv *priv = netdev_priv(dev);
> >                            ^
> > 1 warning generated.
> >
> > Move it down into the case statement where it is used.
> >
> > Fixes: 4e95bc268b91 ("net: flow_offload: add flow_block_cb_setup_simple()")
> > Link: https://github.com/ClangBuiltLinux/linux/issues/597
> > Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
> > ---
> >  drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 7 ++++---
> >  1 file changed, 4 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> > index 6d0ae87c8ded..651eb714eb5b 100644
> > --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> > +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> > @@ -3464,15 +3464,16 @@ static LIST_HEAD(mlx5e_block_cb_list);
> >  static int mlx5e_setup_tc(struct net_device *dev, enum tc_setup_type type,
> >                           void *type_data)
> >  {
> > -       struct mlx5e_priv *priv = netdev_priv(dev);
> > -
> >         switch (type) {
> >  #ifdef CONFIG_MLX5_ESWITCH
> > -       case TC_SETUP_BLOCK:
> > +       case TC_SETUP_BLOCK: {
> > +               struct mlx5e_priv *priv = netdev_priv(dev);
> > +
> >                 return flow_block_cb_setup_simple(type_data,
> >                                                   &mlx5e_block_cb_list,
> >                                                   mlx5e_setup_tc_block_cb,
> >                                                   priv, priv, true);
> > +       }
> 
> Hi Nathan,
> 
> We have another patch internally that fixes this, and it is already
> in my queue.
> It works differently, as we want to pass priv instead of netdev to
> mlx5e_setup_tc_mqprio below,
> which will also resolve the warning.
> 
> So I would like to submit that patch, if that is OK with you?

Hi Saeed,

Whatever works best for you, I just care that the warning gets fixed,
not how it is done :) I wouldn't mind being put on CC so I can pick it
up for my local tests.

Thanks for the follow up!
Nathan
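
For readers unfamiliar with the pattern in the quoted patch, here is a minimal standalone sketch of declaring a variable inside a braced case so that it only exists when the conditional code is compiled in. TOY_HAVE_BLOCK, struct toy_dev and toy_setup() are hypothetical names for this illustration, not driver code.

```c
#include <assert.h>
#include <stddef.h>

enum setup_type { SETUP_BLOCK, SETUP_MQPRIO };

struct toy_dev {
	int priv_data;
};

static int toy_setup(struct toy_dev *dev, enum setup_type type)
{
	assert(dev != NULL);

	switch (type) {
#ifdef TOY_HAVE_BLOCK
	case SETUP_BLOCK: {
		/*
		 * Declared only where it is used, so no unused variable
		 * is left behind when TOY_HAVE_BLOCK is not defined.
		 */
		int *priv = &dev->priv_data;

		return *priv;
	}
#endif
	case SETUP_MQPRIO:
		return 0;
	default:
		return -1;	/* unsupported type in this sketch */
	}
}
```

Built without TOY_HAVE_BLOCK, the SETUP_BLOCK case falls through to the default branch and the function compiles without a -Wunused-variable warning.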

^ permalink raw reply

* Re: [PATCH] net/mlx5e: Move priv variable into case statement in mlx5e_setup_tc
From: Saeed Mahameed @ 2019-07-11  6:02 UTC (permalink / raw)
  To: Nathan Chancellor
  Cc: Saeed Mahameed, Leon Romanovsky, David S. Miller,
	Linux Netdev List, RDMA mailing list, linux-kernel,
	clang-built-linux
In-Reply-To: <20190710190502.104010-1-natechancellor@gmail.com>

On Wed, Jul 10, 2019 at 12:05 PM Nathan Chancellor
<natechancellor@gmail.com> wrote:
>
> There is an unused variable warning on arm64 defconfig when
> CONFIG_MLX5_ESWITCH is unset:
>
> drivers/net/ethernet/mellanox/mlx5/core/en_main.c:3467:21: warning:
> unused variable 'priv' [-Wunused-variable]
>         struct mlx5e_priv *priv = netdev_priv(dev);
>                            ^
> 1 warning generated.
>
> Move it down into the case statement where it is used.
>
> Fixes: 4e95bc268b91 ("net: flow_offload: add flow_block_cb_setup_simple()")
> Link: https://github.com/ClangBuiltLinux/linux/issues/597
> Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> index 6d0ae87c8ded..651eb714eb5b 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> @@ -3464,15 +3464,16 @@ static LIST_HEAD(mlx5e_block_cb_list);
>  static int mlx5e_setup_tc(struct net_device *dev, enum tc_setup_type type,
>                           void *type_data)
>  {
> -       struct mlx5e_priv *priv = netdev_priv(dev);
> -
>         switch (type) {
>  #ifdef CONFIG_MLX5_ESWITCH
> -       case TC_SETUP_BLOCK:
> +       case TC_SETUP_BLOCK: {
> +               struct mlx5e_priv *priv = netdev_priv(dev);
> +
>                 return flow_block_cb_setup_simple(type_data,
>                                                   &mlx5e_block_cb_list,
>                                                   mlx5e_setup_tc_block_cb,
>                                                   priv, priv, true);
> +       }

Hi Nathan,

We have another patch internally that fixes this, and it is already
in my queue.
It works differently, as we want to pass priv instead of netdev to
mlx5e_setup_tc_mqprio below,
which will also resolve the warning.

So I would like to submit that patch, if that is OK with you?

>  #endif
>         case TC_SETUP_QDISC_MQPRIO:
>                 return mlx5e_setup_tc_mqprio(dev, type_data);
> --
> 2.22.0
>

^ permalink raw reply

* Re: Fw: [Bug 204099] New: systemd-networkd fails on 5.2 - same version works on 5.1.16
From: Leon Romanovsky @ 2019-07-11  5:54 UTC (permalink / raw)
  To: David Ahern; +Cc: Stephen Hemminger, netdev, Maxim Mikityanskiy
In-Reply-To: <37ee2993-f81b-6265-87b0-1179162f1a2d@gmail.com>

On Wed, Jul 10, 2019 at 04:43:18PM -0600, David Ahern wrote:
> On 7/9/19 8:43 AM, Stephen Hemminger wrote:
> > Looks like the stricter netlink validation broke userspace.
> > This is bad.

Actually, the initial bug is in systemd, and that is where it should be fixed.

>
> I believe other reports have traced this to
>
> commit 7dc2bccab0ee37ac28096b8fcdc390a679a15841
> Author: Maxim Mikityanskiy <maximmi@mellanox.com>
> Date:   Tue May 21 06:40:04 2019 +0000
>
>     Validate required parameters in inet6_validate_link_af

^ permalink raw reply

* Re: linux-next: build failure after merge of the net-next tree
From: Leon Romanovsky @ 2019-07-11  5:40 UTC (permalink / raw)
  To: Stephen Rothwell
  Cc: Jason Gunthorpe, Bernard Metzler, Doug Ledford, David Miller,
	Networking, Linux Next Mailing List, Linux Kernel Mailing List
In-Reply-To: <20190711131603.6b11b831@canb.auug.org.au>

On Thu, Jul 11, 2019 at 01:16:03PM +1000, Stephen Rothwell wrote:
> Hi all,
>
> On Thu, 11 Jul 2019 13:13:44 +1000 Stephen Rothwell <sfr@canb.auug.org.au> wrote:
> >
> > On Thu, 11 Jul 2019 02:26:27 +0000 Jason Gunthorpe <jgg@mellanox.com> wrote:
> > >
> > > On Thu, Jul 11, 2019 at 11:50:54AM +1000, Stephen Rothwell wrote:
> > >
> > > > So today this failed to build after I merged the rdma tree (previously
> > > > it didn;t until after the net-next tree was merged (I assume a
> > > > dependency changed).  It failed because in_dev_for_each_ifa_rcu (and
> > > > in_dev_for_each_ifa_rtnl) is only defined in a commit in the net-next
> > > > tree :-(
> > >
> > > ? I'm confused..
> > >
> > > rdma.git builds fine stand alone (I hope!)
> >
> > I have "Fixup to build SIW issue" from Leon (which switches to using
> > in_dev_for_each_ifa_rcu) included in the rmda tree merge commit because
> > without that the rdma tree would not build for me.  Are you saying that
> > I don't need that at all, now?
>
> Actually , I get it now, "Fixup to build SIW issue" is really just a
> fixup for the net-next and rdma trees merge ... OK, I will fix that up
> tomorrow.  Sorry for my confusion.

Yes, it was for build only.

>
> --
> Cheers,
> Stephen Rothwell



^ permalink raw reply

* Re: [PATCH v4 2/2] rtw88: pci: Use DMA sync instead of remapping in RX ISR
From: Jian-Hong Pan @ 2019-07-11  5:30 UTC (permalink / raw)
  To: Yan-Hsuan Chuang, Kalle Valo, David S . Miller, Larry Finger,
	David Laight, Christoph Hellwig
  Cc: linux-wireless, Linux Netdev List, Linux Kernel,
	Linux Upstreaming Team, Daniel Drake, stable
In-Reply-To: <20190711052427.5582-2-jian-hong@endlessm.com>

Jian-Hong Pan <jian-hong@endlessm.com> wrote on Thu, Jul 11, 2019 at 1:25 PM:
>
> Since each skb in RX ring is reused instead of new allocation, we can
> treat the DMA in a more efficient way by DMA synchronization.
>
> Signed-off-by: Jian-Hong Pan <jian-hong@endlessm.com>
> Cc: <stable@vger.kernel.org>
> ---

Sorry, I also forgot to include the version changelog here:

v2:
 - New patch by following [PATCH v3 1/2] rtw88: pci: Rearrange the
   memory usage for skb in RX ISR.

v3:
 - Remove rtw_pci_sync_rx_desc_cpu and call dma_sync_single_for_cpu in
   rtw_pci_rx_isr directly.
 - Remove the return value of rtw_pci_sync_rx_desc_device.
 - Use DMA_FROM_DEVICE instead of PCI_DMA_FROMDEVICE.

v4:
 - Same as v3.

>  drivers/net/wireless/realtek/rtw88/pci.c | 24 +++++++++++++++++++++---
>  1 file changed, 21 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/wireless/realtek/rtw88/pci.c b/drivers/net/wireless/realtek/rtw88/pci.c
> index c415f5e94fed..68fae52151dd 100644
> --- a/drivers/net/wireless/realtek/rtw88/pci.c
> +++ b/drivers/net/wireless/realtek/rtw88/pci.c
> @@ -206,6 +206,23 @@ static int rtw_pci_reset_rx_desc(struct rtw_dev *rtwdev, struct sk_buff *skb,
>         return 0;
>  }
>
> +static void rtw_pci_sync_rx_desc_device(struct rtw_dev *rtwdev, dma_addr_t dma,
> +                                       struct rtw_pci_rx_ring *rx_ring,
> +                                       u32 idx, u32 desc_sz)
> +{
> +       struct device *dev = rtwdev->dev;
> +       struct rtw_pci_rx_buffer_desc *buf_desc;
> +       int buf_sz = RTK_PCI_RX_BUF_SIZE;
> +
> +       dma_sync_single_for_device(dev, dma, buf_sz, DMA_FROM_DEVICE);
> +
> +       buf_desc = (struct rtw_pci_rx_buffer_desc *)(rx_ring->r.head +
> +                                                    idx * desc_sz);
> +       memset(buf_desc, 0, sizeof(*buf_desc));
> +       buf_desc->buf_size = cpu_to_le16(RTK_PCI_RX_BUF_SIZE);
> +       buf_desc->dma = cpu_to_le32(dma);
> +}
> +
>  static int rtw_pci_init_rx_ring(struct rtw_dev *rtwdev,
>                                 struct rtw_pci_rx_ring *rx_ring,
>                                 u8 desc_size, u32 len)
> @@ -782,8 +799,8 @@ static void rtw_pci_rx_isr(struct rtw_dev *rtwdev, struct rtw_pci *rtwpci,
>                 rtw_pci_dma_check(rtwdev, ring, cur_rp);
>                 skb = ring->buf[cur_rp];
>                 dma = *((dma_addr_t *)skb->cb);
> -               pci_unmap_single(rtwpci->pdev, dma, RTK_PCI_RX_BUF_SIZE,
> -                                PCI_DMA_FROMDEVICE);
> +               dma_sync_single_for_cpu(rtwdev->dev, dma, RTK_PCI_RX_BUF_SIZE,
> +                                       DMA_FROM_DEVICE);
>                 rx_desc = skb->data;
>                 chip->ops->query_rx_desc(rtwdev, rx_desc, &pkt_stat, &rx_status);
>
> @@ -818,7 +835,8 @@ static void rtw_pci_rx_isr(struct rtw_dev *rtwdev, struct rtw_pci *rtwpci,
>
>  next_rp:
>                 /* new skb delivered to mac80211, re-enable original skb DMA */
> -               rtw_pci_reset_rx_desc(rtwdev, skb, ring, cur_rp, buf_desc_sz);
> +               rtw_pci_sync_rx_desc_device(rtwdev, dma, ring, cur_rp,
> +                                           buf_desc_sz);
>
>                 /* host read next element in ring */
>                 if (++cur_rp >= ring->r.len)
> --
> 2.22.0
>

^ permalink raw reply

* Re: [PATCH v4 1/2] rtw88: pci: Rearrange the memory usage for skb in RX ISR
From: Jian-Hong Pan @ 2019-07-11  5:28 UTC (permalink / raw)
  To: Yan-Hsuan Chuang, Kalle Valo, David S . Miller, Larry Finger,
	David Laight, Christoph Hellwig
  Cc: linux-wireless, Linux Netdev List, Linux Kernel,
	Linux Upstreaming Team, Daniel Drake, stable
In-Reply-To: <20190711052427.5582-1-jian-hong@endlessm.com>

On Thu, Jul 11, 2019 at 1:25 PM Jian-Hong Pan <jian-hong@endlessm.com> wrote:
>
> Testing with RTL8822BE hardware, when available memory is low, we
> frequently see a kernel panic and system freeze.
>
> First, rtw_pci_rx_isr encounters a memory allocation failure (trimmed):
>
> rx routine starvation
> WARNING: CPU: 7 PID: 9871 at drivers/net/wireless/realtek/rtw88/pci.c:822 rtw_pci_rx_isr.constprop.25+0x35a/0x370 [rtwpci]
> [ 2356.580313] RIP: 0010:rtw_pci_rx_isr.constprop.25+0x35a/0x370 [rtwpci]
>
> Then we see a variety of different error conditions and kernel panics,
> such as this one (trimmed):
>
> rtw_pci 0000:02:00.0: pci bus timeout, check dma status
> skbuff: skb_over_panic: text:00000000091b6e66 len:415 put:415 head:00000000d2880c6f data:000000007a02b1ea tail:0x1df end:0xc0 dev:<NULL>
> ------------[ cut here ]------------
> kernel BUG at net/core/skbuff.c:105!
> invalid opcode: 0000 [#1] SMP NOPTI
> RIP: 0010:skb_panic+0x43/0x45
>
> When skb allocation fails and the "rx routine starvation" is hit, the
> function returns immediately without updating the RX ring. At this
> point, the RX ring may continue referencing an old skb which was already
> handed off to ieee80211_rx_irqsafe(). When it comes to be used again,
> bad things happen.
>
> This patch allocates a new, data-sized skb first in RX ISR. After
> copying the data in, we pass it to the upper layers. However, if skb
> allocation fails, we effectively drop the frame. In both cases, the
> original, full size ring skb is reused.
>
> In addition to fixing the kernel crash, the RX routine should now
> generally behave better under low memory conditions.
>
> Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=204053
> Signed-off-by: Jian-Hong Pan <jian-hong@endlessm.com>
> Cc: <stable@vger.kernel.org>
> ---

Sorry, I forgot to include the changelog here.

v2:
 - Allocate new data-sized skb and put data into it, then pass it to
   mac80211. Reuse the original skb in RX ring by DMA sync.
 - Modify the commit message.
 - Introduce following [PATCH v3 2/2] rtw88: pci: Use DMA sync instead
   of remapping in RX ISR.

v3:
 - Same as v2.

v4:
 - Fix comment: allocate a new skb for this frame, discard the frame
   if none available.

>  drivers/net/wireless/realtek/rtw88/pci.c | 49 +++++++++++-------------
>  1 file changed, 22 insertions(+), 27 deletions(-)
>
> diff --git a/drivers/net/wireless/realtek/rtw88/pci.c b/drivers/net/wireless/realtek/rtw88/pci.c
> index cfe05ba7280d..c415f5e94fed 100644
> --- a/drivers/net/wireless/realtek/rtw88/pci.c
> +++ b/drivers/net/wireless/realtek/rtw88/pci.c
> @@ -763,6 +763,7 @@ static void rtw_pci_rx_isr(struct rtw_dev *rtwdev, struct rtw_pci *rtwpci,
>         u32 pkt_offset;
>         u32 pkt_desc_sz = chip->rx_pkt_desc_sz;
>         u32 buf_desc_sz = chip->rx_buf_desc_sz;
> +       u32 new_len;
>         u8 *rx_desc;
>         dma_addr_t dma;
>
> @@ -790,40 +791,34 @@ static void rtw_pci_rx_isr(struct rtw_dev *rtwdev, struct rtw_pci *rtwpci,
>                 pkt_offset = pkt_desc_sz + pkt_stat.drv_info_sz +
>                              pkt_stat.shift;
>
> -               if (pkt_stat.is_c2h) {
> -                       /* keep rx_desc, halmac needs it */
> -                       skb_put(skb, pkt_stat.pkt_len + pkt_offset);
> +               /* allocate a new skb for this frame,
> +                * discard the frame if none available
> +                */
> +               new_len = pkt_stat.pkt_len + pkt_offset;
> +               new = dev_alloc_skb(new_len);
> +               if (WARN_ONCE(!new, "rx routine starvation\n"))
> +                       goto next_rp;
> +
> +               /* put the DMA data including rx_desc from phy to new skb */
> +               skb_put_data(new, skb->data, new_len);
>
> -                       /* pass offset for further operation */
> -                       *((u32 *)skb->cb) = pkt_offset;
> -                       skb_queue_tail(&rtwdev->c2h_queue, skb);
> +               if (pkt_stat.is_c2h) {
> +                        /* pass rx_desc & offset for further operation */
> +                       *((u32 *)new->cb) = pkt_offset;
> +                       skb_queue_tail(&rtwdev->c2h_queue, new);
>                         ieee80211_queue_work(rtwdev->hw, &rtwdev->c2h_work);
>                 } else {
> -                       /* remove rx_desc, maybe use skb_pull? */
> -                       skb_put(skb, pkt_stat.pkt_len);
> -                       skb_reserve(skb, pkt_offset);
> -
> -                       /* alloc a smaller skb to mac80211 */
> -                       new = dev_alloc_skb(pkt_stat.pkt_len);
> -                       if (!new) {
> -                               new = skb;
> -                       } else {
> -                               skb_put_data(new, skb->data, skb->len);
> -                               dev_kfree_skb_any(skb);
> -                       }
> -                       /* TODO: merge into rx.c */
> -                       rtw_rx_stats(rtwdev, pkt_stat.vif, skb);
> +                       /* remove rx_desc */
> +                       skb_pull(new, pkt_offset);
> +
> +                       rtw_rx_stats(rtwdev, pkt_stat.vif, new);
>                         memcpy(new->cb, &rx_status, sizeof(rx_status));
>                         ieee80211_rx_irqsafe(rtwdev->hw, new);
>                 }
>
> -               /* skb delivered to mac80211, alloc a new one in rx ring */
> -               new = dev_alloc_skb(RTK_PCI_RX_BUF_SIZE);
> -               if (WARN(!new, "rx routine starvation\n"))
> -                       return;
> -
> -               ring->buf[cur_rp] = new;
> -               rtw_pci_reset_rx_desc(rtwdev, new, ring, cur_rp, buf_desc_sz);
> +next_rp:
> +               /* new skb delivered to mac80211, re-enable original skb DMA */
> +               rtw_pci_reset_rx_desc(rtwdev, skb, ring, cur_rp, buf_desc_sz);
>
>                 /* host read next element in ring */
>                 if (++cur_rp >= ring->r.len)
> --
> 2.22.0
>

^ permalink raw reply

* [PATCH v4 2/2] rtw88: pci: Use DMA sync instead of remapping in RX ISR
From: Jian-Hong Pan @ 2019-07-11  5:24 UTC (permalink / raw)
  To: Yan-Hsuan Chuang, Kalle Valo, David S . Miller, Larry Finger,
	David Laight, Christoph Hellwig
  Cc: linux-wireless, netdev, linux-kernel, linux, Daniel Drake,
	Jian-Hong Pan, stable
In-Reply-To: <20190711052427.5582-1-jian-hong@endlessm.com>

Since each skb in the RX ring is now reused instead of being newly
allocated, we can handle the DMA more efficiently by synchronizing the
existing mapping instead of unmapping and remapping it.

Signed-off-by: Jian-Hong Pan <jian-hong@endlessm.com>
Cc: <stable@vger.kernel.org>
---
 drivers/net/wireless/realtek/rtw88/pci.c | 24 +++++++++++++++++++++---
 1 file changed, 21 insertions(+), 3 deletions(-)

diff --git a/drivers/net/wireless/realtek/rtw88/pci.c b/drivers/net/wireless/realtek/rtw88/pci.c
index c415f5e94fed..68fae52151dd 100644
--- a/drivers/net/wireless/realtek/rtw88/pci.c
+++ b/drivers/net/wireless/realtek/rtw88/pci.c
@@ -206,6 +206,23 @@ static int rtw_pci_reset_rx_desc(struct rtw_dev *rtwdev, struct sk_buff *skb,
 	return 0;
 }
 
+static void rtw_pci_sync_rx_desc_device(struct rtw_dev *rtwdev, dma_addr_t dma,
+					struct rtw_pci_rx_ring *rx_ring,
+					u32 idx, u32 desc_sz)
+{
+	struct device *dev = rtwdev->dev;
+	struct rtw_pci_rx_buffer_desc *buf_desc;
+	int buf_sz = RTK_PCI_RX_BUF_SIZE;
+
+	dma_sync_single_for_device(dev, dma, buf_sz, DMA_FROM_DEVICE);
+
+	buf_desc = (struct rtw_pci_rx_buffer_desc *)(rx_ring->r.head +
+						     idx * desc_sz);
+	memset(buf_desc, 0, sizeof(*buf_desc));
+	buf_desc->buf_size = cpu_to_le16(RTK_PCI_RX_BUF_SIZE);
+	buf_desc->dma = cpu_to_le32(dma);
+}
+
 static int rtw_pci_init_rx_ring(struct rtw_dev *rtwdev,
 				struct rtw_pci_rx_ring *rx_ring,
 				u8 desc_size, u32 len)
@@ -782,8 +799,8 @@ static void rtw_pci_rx_isr(struct rtw_dev *rtwdev, struct rtw_pci *rtwpci,
 		rtw_pci_dma_check(rtwdev, ring, cur_rp);
 		skb = ring->buf[cur_rp];
 		dma = *((dma_addr_t *)skb->cb);
-		pci_unmap_single(rtwpci->pdev, dma, RTK_PCI_RX_BUF_SIZE,
-				 PCI_DMA_FROMDEVICE);
+		dma_sync_single_for_cpu(rtwdev->dev, dma, RTK_PCI_RX_BUF_SIZE,
+					DMA_FROM_DEVICE);
 		rx_desc = skb->data;
 		chip->ops->query_rx_desc(rtwdev, rx_desc, &pkt_stat, &rx_status);
 
@@ -818,7 +835,8 @@ static void rtw_pci_rx_isr(struct rtw_dev *rtwdev, struct rtw_pci *rtwpci,
 
 next_rp:
 		/* new skb delivered to mac80211, re-enable original skb DMA */
-		rtw_pci_reset_rx_desc(rtwdev, skb, ring, cur_rp, buf_desc_sz);
+		rtw_pci_sync_rx_desc_device(rtwdev, dma, ring, cur_rp,
+					    buf_desc_sz);
 
 		/* host read next element in ring */
 		if (++cur_rp >= ring->r.len)
-- 
2.22.0


^ permalink raw reply related

* [PATCH v4 1/2] rtw88: pci: Rearrange the memory usage for skb in RX ISR
From: Jian-Hong Pan @ 2019-07-11  5:24 UTC (permalink / raw)
  To: Yan-Hsuan Chuang, Kalle Valo, David S . Miller, Larry Finger,
	David Laight, Christoph Hellwig
  Cc: linux-wireless, netdev, linux-kernel, linux, Daniel Drake,
	Jian-Hong Pan, stable
In-Reply-To: <CAPpJ_edDcaBq+0DocPmS-yYM10B4MkWvBn=f6wwbYdqzSGmp_g@mail.gmail.com>

Testing with RTL8822BE hardware, when available memory is low, we
frequently see a kernel panic and system freeze.

First, rtw_pci_rx_isr encounters a memory allocation failure (trimmed):

rx routine starvation
WARNING: CPU: 7 PID: 9871 at drivers/net/wireless/realtek/rtw88/pci.c:822 rtw_pci_rx_isr.constprop.25+0x35a/0x370 [rtwpci]
[ 2356.580313] RIP: 0010:rtw_pci_rx_isr.constprop.25+0x35a/0x370 [rtwpci]

Then we see a variety of different error conditions and kernel panics,
such as this one (trimmed):

rtw_pci 0000:02:00.0: pci bus timeout, check dma status
skbuff: skb_over_panic: text:00000000091b6e66 len:415 put:415 head:00000000d2880c6f data:000000007a02b1ea tail:0x1df end:0xc0 dev:<NULL>
------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:105!
invalid opcode: 0000 [#1] SMP NOPTI
RIP: 0010:skb_panic+0x43/0x45

When skb allocation fails and the "rx routine starvation" is hit, the
function returns immediately without updating the RX ring. At this
point, the RX ring may continue referencing an old skb which was already
handed off to ieee80211_rx_irqsafe(). When it comes to be used again,
bad things happen.

This patch allocates a new, data-sized skb first in RX ISR. After
copying the data in, we pass it to the upper layers. However, if skb
allocation fails, we effectively drop the frame. In both cases, the
original, full size ring skb is reused.

In addition to fixing the kernel crash, the RX routine should now
generally behave better under low memory conditions.

Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=204053
Signed-off-by: Jian-Hong Pan <jian-hong@endlessm.com>
Cc: <stable@vger.kernel.org>
---
 drivers/net/wireless/realtek/rtw88/pci.c | 49 +++++++++++-------------
 1 file changed, 22 insertions(+), 27 deletions(-)

diff --git a/drivers/net/wireless/realtek/rtw88/pci.c b/drivers/net/wireless/realtek/rtw88/pci.c
index cfe05ba7280d..c415f5e94fed 100644
--- a/drivers/net/wireless/realtek/rtw88/pci.c
+++ b/drivers/net/wireless/realtek/rtw88/pci.c
@@ -763,6 +763,7 @@ static void rtw_pci_rx_isr(struct rtw_dev *rtwdev, struct rtw_pci *rtwpci,
 	u32 pkt_offset;
 	u32 pkt_desc_sz = chip->rx_pkt_desc_sz;
 	u32 buf_desc_sz = chip->rx_buf_desc_sz;
+	u32 new_len;
 	u8 *rx_desc;
 	dma_addr_t dma;
 
@@ -790,40 +791,34 @@ static void rtw_pci_rx_isr(struct rtw_dev *rtwdev, struct rtw_pci *rtwpci,
 		pkt_offset = pkt_desc_sz + pkt_stat.drv_info_sz +
 			     pkt_stat.shift;
 
-		if (pkt_stat.is_c2h) {
-			/* keep rx_desc, halmac needs it */
-			skb_put(skb, pkt_stat.pkt_len + pkt_offset);
+		/* allocate a new skb for this frame,
+		 * discard the frame if none available
+		 */
+		new_len = pkt_stat.pkt_len + pkt_offset;
+		new = dev_alloc_skb(new_len);
+		if (WARN_ONCE(!new, "rx routine starvation\n"))
+			goto next_rp;
+
+		/* put the DMA data including rx_desc from phy to new skb */
+		skb_put_data(new, skb->data, new_len);
 
-			/* pass offset for further operation */
-			*((u32 *)skb->cb) = pkt_offset;
-			skb_queue_tail(&rtwdev->c2h_queue, skb);
+		if (pkt_stat.is_c2h) {
+			 /* pass rx_desc & offset for further operation */
+			*((u32 *)new->cb) = pkt_offset;
+			skb_queue_tail(&rtwdev->c2h_queue, new);
 			ieee80211_queue_work(rtwdev->hw, &rtwdev->c2h_work);
 		} else {
-			/* remove rx_desc, maybe use skb_pull? */
-			skb_put(skb, pkt_stat.pkt_len);
-			skb_reserve(skb, pkt_offset);
-
-			/* alloc a smaller skb to mac80211 */
-			new = dev_alloc_skb(pkt_stat.pkt_len);
-			if (!new) {
-				new = skb;
-			} else {
-				skb_put_data(new, skb->data, skb->len);
-				dev_kfree_skb_any(skb);
-			}
-			/* TODO: merge into rx.c */
-			rtw_rx_stats(rtwdev, pkt_stat.vif, skb);
+			/* remove rx_desc */
+			skb_pull(new, pkt_offset);
+
+			rtw_rx_stats(rtwdev, pkt_stat.vif, new);
 			memcpy(new->cb, &rx_status, sizeof(rx_status));
 			ieee80211_rx_irqsafe(rtwdev->hw, new);
 		}
 
-		/* skb delivered to mac80211, alloc a new one in rx ring */
-		new = dev_alloc_skb(RTK_PCI_RX_BUF_SIZE);
-		if (WARN(!new, "rx routine starvation\n"))
-			return;
-
-		ring->buf[cur_rp] = new;
-		rtw_pci_reset_rx_desc(rtwdev, new, ring, cur_rp, buf_desc_sz);
+next_rp:
+		/* new skb delivered to mac80211, re-enable original skb DMA */
+		rtw_pci_reset_rx_desc(rtwdev, skb, ring, cur_rp, buf_desc_sz);
 
 		/* host read next element in ring */
 		if (++cur_rp >= ring->r.len)
-- 
2.22.0
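
For readers following the commit message above, the copy-and-reuse idea
can be sketched outside the kernel. This is only a userspace model under
stated assumptions, not the driver code: struct frame, struct rx_ring,
rx_one() and the rx_fail_alloc hook are hypothetical names, malloc()
stands in for dev_alloc_skb(), and RX_BUF_SIZE stands in for
RTK_PCI_RX_BUF_SIZE. It shows that the ring slot is recycled and the
read pointer advances whether or not the right-sized copy could be
allocated.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define RING_LEN    4
#define RX_BUF_SIZE 64	/* stand-in for RTK_PCI_RX_BUF_SIZE */

/* Right-sized copy handed to the "upper layer" (hypothetical type). */
struct frame {
	size_t len;
	unsigned char data[];
};

struct rx_ring {
	unsigned char buf[RING_LEN][RX_BUF_SIZE];	/* long-lived slots */
	size_t used[RING_LEN];	/* bytes written by the "device" */
	unsigned int rp;	/* host read pointer */
};

/* Test hook: when non-zero, frame allocation fails (simulates low memory). */
static int rx_fail_alloc;

/*
 * Consume one ring slot. Returns a right-sized copy of the frame, or
 * NULL if the frame had to be dropped. Either way the slot is recycled
 * and the read pointer advances, so the ring never references a buffer
 * that was handed off to someone else.
 */
static struct frame *rx_one(struct rx_ring *ring)
{
	unsigned int idx = ring->rp;
	size_t len = ring->used[idx];
	struct frame *f;

	assert(len <= RX_BUF_SIZE);

	f = rx_fail_alloc ? NULL : malloc(sizeof(*f) + len);
	if (f) {
		f->len = len;
		memcpy(f->data, ring->buf[idx], len);
	}

	/* Equivalent of the next_rp label: recycle the slot in both cases. */
	ring->used[idx] = 0;
	if (++ring->rp >= RING_LEN)
		ring->rp = 0;

	return f;
}
```

A caller would fill a slot, call rx_one(), pass the returned frame up and
free it later. With rx_fail_alloc set, the call returns NULL but the ring
still moves on, which is the behaviour the patch aims for under memory
pressure.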


^ permalink raw reply related

* RE: [PATCH v6 0/5] net: macb: cover letter
From: Parshuram Raju Thombare @ 2019-07-11  5:20 UTC (permalink / raw)
  To: David Miller
  Cc: andrew@lunn.ch, nicolas.ferre@microchip.com, f.fainelli@gmail.com,
	linux@armlinux.org.uk, netdev@vger.kernel.org,
	hkallweit1@gmail.com, linux-kernel@vger.kernel.org,
	Rafal Ciepiela, Piotr Sroka, Anil Joy Varughese, Arthur Marris,
	Steven Ho, Milind Parab
In-Reply-To: <20190710.114707.1137811182536299673.davem@davemloft.net>

Hi David,

Ok, I will resubmit it.

Regards,
Parshuram Thombare

^ permalink raw reply

* Re: [PATCH net-next] net: mlx5: Fix compiling error in tls.c
From: Saeed Mahameed @ 2019-07-11  5:07 UTC (permalink / raw)
  To: Mao Wenan
  Cc: David S. Miller, Saeed Mahameed, Linux Netdev List, linux-kernel
In-Reply-To: <20190710093852.34549-1-maowenan@huawei.com>

On Wed, Jul 10, 2019 at 2:33 AM Mao Wenan <maowenan@huawei.com> wrote:
>
> There are some errors while compiling tls.c if
> CONFIG_MLX5_FPGA_TLS is not enabled.
>
> drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c: In function mlx5e_tls_set_ipv4_flow:
> ./include/linux/mlx5/device.h:61:39: error: invalid application of sizeof to incomplete type struct mlx5_ifc_tls_flow_bits
>  #define __mlx5_st_sz_bits(typ) sizeof(struct mlx5_ifc_##typ##_bits)
>                                        ^
> ./include/linux/compiler.h:330:9: note: in definition of macro __compiletime_assert
>    if (!(condition))     \
>          ^~~~~~~~~
> ...
>
> drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c: In function mlx5e_tls_build_netdev:
> drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c:202:13: error: MLX5_ACCEL_TLS_TX undeclared (first use in this function); did you mean __MLX5_ACCEL_TLS_H__?
>   if (caps & MLX5_ACCEL_TLS_TX) {
>              ^~~~~~~~~~~~~~~~~
>              __MLX5_ACCEL_TLS_H__
> drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c:207:13: error: MLX5_ACCEL_TLS_RX undeclared (first use in this function); did you mean MLX5_ACCEL_TLS_TX?
>   if (caps & MLX5_ACCEL_TLS_RX) {
>              ^~~~~~~~~~~~~~~~~
>              MLX5_ACCEL_TLS_TX
> drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c:212:15: error: MLX5_ACCEL_TLS_LRO undeclared (first use in this function); did you mean MLX5_ACCEL_TLS_RX?
>   if (!(caps & MLX5_ACCEL_TLS_LRO)) {
>                ^~~~~~~~~~~~~~~~~~
>                MLX5_ACCEL_TLS_RX
> make[5]: *** [drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.o] Error 1
> make[5]: *** Waiting for unfinished jobs....
> make[4]: *** [drivers/net/ethernet/mellanox/mlx5/core] Error 2
> make[3]: *** [drivers/net/ethernet/mellanox] Error 2
> make[3]: *** Waiting for unfinished jobs....
> make[2]: *** [drivers/net/ethernet] Error 2
> make[2]: *** Waiting for unfinished jobs....
> make[1]: *** [drivers/net] Error 2
> make[1]: *** Waiting for unfinished jobs....
> make: *** [drivers] Error 2
> make: *** Waiting for unfinished jobs....
>
> This patch fixes the error by adding 'depends on MLX5_FPGA_TLS' to MLX5_TLS.
>

Hi Mao, thanks for the patch. Sorry for the delayed response, I was
out of the office.

Actually MLX5_TLS doesn't depend on MLX5_FPGA_TLS anymore.
Tariq prepared a patch to fix this, we will submit it this week.


> Fixes: e2869fb2068b ("net/mlx5: Kconfig, Better organize compilation flags")
>
> Signed-off-by: Mao Wenan <maowenan@huawei.com>
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/Kconfig | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
> index 37fef8c..1da2770 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
> @@ -139,6 +139,7 @@ config MLX5_TLS
>         depends on MLX5_CORE_EN
>         depends on TLS_DEVICE
>         depends on TLS=y || MLX5_CORE=m
> +       depends on MLX5_FPGA_TLS
>         select MLX5_ACCEL
>         default n
>         help
> --
> 2.7.4
>

^ permalink raw reply

* Re: [PATCH bpf] bpf: fix BTF verifier size resolution logic
From: Andrii Nakryiko @ 2019-07-11  4:56 UTC (permalink / raw)
  To: Yonghong Song
  Cc: Andrii Nakryiko, Alexei Starovoitov, daniel@iogearbox.net,
	bpf@vger.kernel.org, netdev@vger.kernel.org, Kernel Team,
	Martin Lau
In-Reply-To: <05db3afa-b94e-d0ba-7d61-ec1bf9a82777@fb.com>

On Wed, Jul 10, 2019 at 9:14 PM Yonghong Song <yhs@fb.com> wrote:
>
>
>
> On 7/10/19 6:45 PM, Andrii Nakryiko wrote:
> > On Wed, Jul 10, 2019 at 5:36 PM Yonghong Song <yhs@fb.com> wrote:
> >>
> >>
> >>
> >> On 7/10/19 5:29 PM, Andrii Nakryiko wrote:
> >>> On Wed, Jul 10, 2019 at 5:16 PM Yonghong Song <yhs@fb.com> wrote:
> >>>>
> >>>>
> >>>>
> >>>> On 7/10/19 1:08 AM, Andrii Nakryiko wrote:
> >>>>> BTF verifier has different logic depending on whether we are following
> >>>>> a PTR or STRUCT/ARRAY (or something else). This is an optimization to
> >>>>> stop early in DFS traversal while resolving BTF types. But it also
> >>>>> results in a size resolution bug, when there is a chain, e.g., of PTR ->
> >>>>> TYPEDEF -> ARRAY, in which case due to being in pointer context ARRAY
> >>>>> size won't be resolved, as it is considered to be a sink for pointer,
> >>>>> leading to TYPEDEF being in RESOLVED state with zero size, which is
> >>>>> completely wrong.
> >>>>>
> >>>>> Optimization is doubtful, though, as btf_check_all_types() will iterate
> >>>>> over all BTF types anyways, so the only saving is a potentially slightly
> >>>>> shorter stack. But correctness is more important than tiny savings.
> >>>>>
> >>>>> This bug manifests itself in rejecting BTF-defined maps that use array
> >>>>> typedef as a value type:
> >>>>>
> >>>>> typedef int array_t[16];
> >>>>>
> >>>>> struct {
> >>>>>         __uint(type, BPF_MAP_TYPE_ARRAY);
> >>>>>         __type(value, array_t); /* i.e., array_t *value; */
> >>>>> } test_map SEC(".maps");
> >>>>>
> >>>>> Fixes: eb3f595dab40 ("bpf: btf: Validate type reference")
> >>>>> Cc: Martin KaFai Lau <kafai@fb.com>
> >>>>> Signed-off-by: Andrii Nakryiko <andriin@fb.com>
> >>>>
> >>>> The change seems okay to me. Currently, looks like intermediate
> >>>> modifier type will carry size = 0 (in the internal data structure).
> >>>
> >>> Yes, which is totally wrong, especially that we use that size in some
> >>> cases to reject map with specified BTF.
> >>>
> >>>>
> >>>> If we remove RESOLVE logic, we probably want to double check
> >>>> whether we handle circular types correctly or not. Maybe we will
> >>>> be okay if all self tests pass.
> >>>
> >>> I checked, it does. We'll attempt to add the referenced type unless it's a
> >>> "resolve sink" (where size is immediately known) or is already
> >>> resolved (its state is RESOLVED). In other cases, we'll attempt to
> >>> env_stack_push(), which checks that the state of that type is
> >>> NOT_VISITED. If it's RESOLVED or VISITED, it returns -EEXIST. When a
> >>> type is added to the stack, its resolve state goes from NOT_VISITED
> >>> to VISITED.
> >>>
> >>> So, if there is a loop, then we'll detect it as soon as we attempt
> >>> to add the same type onto the stack a second time.
> >>>
> >>>>
> >>>> It may still be worthwhile to quantify the RESOLVE optimization benefit
> >>>> before removing it.
> >>>
> >>> I don't think there is any, because every type will be visited exactly
> >>> once, due to DFS nature of algorithm. The only difference is that if
> >>> we have a long chain of modifiers, we can technically reach the max
> >>> limit and fail. But at 32 I think it's pretty unrealistic to have such
> >>> a long chain of PTR/TYPEDEF/CONST/VOLATILE/RESTRICTs :)
> >>>
> >>>>
> >>>> Another possible change is, for external usage, removing
> >>>> modifiers, before checking the size, something like below.
> >>>> Note that I am not strongly advocating my below patch as
> >>>> it has the same shortcoming that maintained modifier type
> >>>> size may not be correct.
> >>>
> >>> I don't think your patch helps, it can actually confuse things even
> >>> more. It skips modifiers until underlying type is found, but you still
> >>> don't guarantee that at that time that underlying type will have its
> >>> size resolved.
> >>
> >> It actually does help. It does not change the internal btf type
> >> traversal algorithms. It only changes the implementation of
> >> an external API btf_type_id_size(). Previously, this function
> >> is used by externals and internal btf.c. I broke it into two,
> >> one internal __btf_type_id_size(), and another external
> >> btf_type_id_size(). The external one removes modifier before
> >> finding type size. The external one is typically used only
> >> after btf is validated.
> >
> > Sure, for external callers yes, it solves the problem. But there is
> > deeper problem: we mark modifier types RESOLVED before types they
> > ultimately point to are resolved. Then in all those btf_xxx_resolve()
> > functions we have check:
> >
> > if (!env_type_is_resolve_sink && !env_type_is_resolved)
> >    return env_stack_push();
> > else {
> >
> >    /* here we assume that we can calculate size of the type */
> >    /* so even if we traverse through all the modifiers and find
> > underlying type */
> >    /* that type will have resolved_size = 0, because we haven't
> > processed it yet */
> >    /* but we will just incorrectly assume that zero is *final* size */
> > }
> >
> > So I think that your patch is still just hiding the problem, not solving it.
> >
> > BTW, I've also identified part of btf_ptr_resolve() logic that can be
> > now safely removed (it's a special case that "restarts" DFS traversal
> > for modifiers, because they could have been prematurely marked
> > resolved). This is another sign that there is something wrong in an
> > algorithm.
> >
> > I'd rather remove unnecessary complexity and fix underlying problem,
> > especially given that there is no performance or correctness penalty.
>
> Could you create a special btf with type like
> typedef int a1;
> typedef a1 a2;
> ...
> typedef a65533 a65532;
> (maximum kernel allowed number of types is 64KB)
>
> In the BTF, the typedef order is reverse
> 1: typedef a65533 to 2
> 2: typedef ... to 3
> 3 ...
>
> So kernel won't run into deep recursion or panic?

Yeah I was just thinking about the need to generate artificially
constructed BTFs to stress-test BTF verification. Will add something.

>
> Thanks.
>
> >
> > I'll post v2 soon.
> >
> >>
> >> Will go through your other comments later.
> >>
> >>>
> >>>>
> >>>> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> >>>> index 546ebee39e2a..6f927c3e0a89 100644
> >>>> --- a/kernel/bpf/btf.c
> >>>> +++ b/kernel/bpf/btf.c
> >>>> @@ -620,6 +620,54 @@ static bool btf_type_int_is_regular(const struct
> >>>> btf_type *t)
> >>>>            return true;
> >>>>     }
> >>>>
> >>>> +static const struct btf_type *__btf_type_id_size(const struct btf *btf,
> >>>> +                                                u32 *type_id, u32
> >>>> *ret_size,
> >>>> +                                                bool skip_modifier)
> >>>> +{
> >>>> +       const struct btf_type *size_type;
> >>>> +       u32 size_type_id = *type_id;
> >>>> +       u32 size = 0;
> >>>> +
> >>>> +       size_type = btf_type_by_id(btf, size_type_id);
> >>>> +       if (size_type && skip_modifier) {
> >>>> +               while (btf_type_is_modifier(size_type))
> >>>> +                       size_type = btf_type_by_id(btf, size_type->type);
> >>>> +       }
> >>>> +
> >>>> +       if (btf_type_nosize_or_null(size_type))
> >>>> +               return NULL;
> >>>> +
> >>>> +       if (btf_type_has_size(size_type)) {
> >>>> +               size = size_type->size;
> >>>> +       } else if (btf_type_is_array(size_type)) {
> >>>> +               size = btf->resolved_sizes[size_type_id];
> >>>> +       } else if (btf_type_is_ptr(size_type)) {
> >>>> +               size = sizeof(void *);
> >>>> +       } else {
> >>>> +               if (WARN_ON_ONCE(!btf_type_is_modifier(size_type) &&
> >>>> +                                !btf_type_is_var(size_type)))
> >>>> +                       return NULL;
> >>>> +
> >>>> +               size = btf->resolved_sizes[size_type_id];
> >>>> +               size_type_id = btf->resolved_ids[size_type_id];
> >>>> +               size_type = btf_type_by_id(btf, size_type_id);
> >>>> +               if (btf_type_nosize_or_null(size_type))
> >>>> +                       return NULL;
> >>>> +       }
> >>>> +
> >>>> +       *type_id = size_type_id;
> >>>> +       if (ret_size)
> >>>> +               *ret_size = size;
> >>>> +
> >>>> +       return size_type;
> >>>> +}
> >>>> +
> >> [...]
> >

^ permalink raw reply

* Re: [PATCH bpf] bpf: fix BTF verifier size resolution logic
From: Andrii Nakryiko @ 2019-07-11  4:54 UTC (permalink / raw)
  To: Yonghong Song
  Cc: Andrii Nakryiko, Alexei Starovoitov, daniel@iogearbox.net,
	bpf@vger.kernel.org, netdev@vger.kernel.org, Kernel Team,
	Martin Lau
In-Reply-To: <eebd6ac9-d968-9efb-db07-e5d877f7ae4c@fb.com>

On Wed, Jul 10, 2019 at 6:53 PM Yonghong Song <yhs@fb.com> wrote:
>
>
>
> On 7/10/19 6:45 PM, Andrii Nakryiko wrote:
> > On Wed, Jul 10, 2019 at 5:36 PM Yonghong Song <yhs@fb.com> wrote:
> >>
> >>
> >>
> >> On 7/10/19 5:29 PM, Andrii Nakryiko wrote:
> >>> On Wed, Jul 10, 2019 at 5:16 PM Yonghong Song <yhs@fb.com> wrote:
> >>>>
> >>>>
> >>>>
> >>>> On 7/10/19 1:08 AM, Andrii Nakryiko wrote:
> >>>>> BTF verifier has different logic depending on whether we are following
> >>>>> a PTR or STRUCT/ARRAY (or something else). This is an optimization to
> >>>>> stop early in DFS traversal while resolving BTF types. But it also
> >>>>> results in a size resolution bug, when there is a chain, e.g., of PTR ->
> >>>>> TYPEDEF -> ARRAY, in which case due to being in pointer context ARRAY
> >>>>> size won't be resolved, as it is considered to be a sink for pointer,
> >>>>> leading to TYPEDEF being in RESOLVED state with zero size, which is
> >>>>> completely wrong.
> >>>>>
> >>>>> Optimization is doubtful, though, as btf_check_all_types() will iterate
> >>>>> over all BTF types anyways, so the only saving is a potentially slightly
> >>>>> shorter stack. But correctness is more important than tiny savings.
> >>>>>
> >>>>> This bug manifests itself in rejecting BTF-defined maps that use array
> >>>>> typedef as a value type:
> >>>>>
> >>>>> typedef int array_t[16];
> >>>>>
> >>>>> struct {
> >>>>>         __uint(type, BPF_MAP_TYPE_ARRAY);
> >>>>>         __type(value, array_t); /* i.e., array_t *value; */
> >>>>> } test_map SEC(".maps");
> >>>>>
> >>>>> Fixes: eb3f595dab40 ("bpf: btf: Validate type reference")
> >>>>> Cc: Martin KaFai Lau <kafai@fb.com>
> >>>>> Signed-off-by: Andrii Nakryiko <andriin@fb.com>
> >>>>
> >>>> The change seems okay to me. Currently, looks like intermediate
> >>>> modifier type will carry size = 0 (in the internal data structure).
> >>>
> >>> Yes, which is totally wrong, especially that we use that size in some
> >>> cases to reject map with specified BTF.
> >>>
> >>>>
> >>>> If we remove RESOLVE logic, we probably want to double check
> >>>> whether we handle circular types correctly or not. Maybe we will
> >>>> be okay if all self tests pass.
> >>>
> >>> I checked, it does. We'll attempt to add the referenced type unless it's a
> >>> "resolve sink" (where size is immediately known) or is already
> >>> resolved (its state is RESOLVED). In other cases, we'll attempt to
> >>> env_stack_push(), which checks that the state of that type is
> >>> NOT_VISITED. If it's RESOLVED or VISITED, it returns -EEXIST. When a
> >>> type is added to the stack, its resolve state goes from NOT_VISITED
> >>> to VISITED.
> >>>
> >>> So, if there is a loop, then we'll detect it as soon as we attempt
> >>> to add the same type onto the stack a second time.
> >>>
> >>>>
> >>>> It may still be worthwhile to quantify the RESOLVE optimization benefit
> >>>> before removing it.
> >>>
> >>> I don't think there is any, because every type will be visited exactly
> >>> once, due to DFS nature of algorithm. The only difference is that if
> >>> we have a long chain of modifiers, we can technically reach the max
> >>> limit and fail. But at 32 I think it's pretty unrealistic to have such
> >>> a long chain of PTR/TYPEDEF/CONST/VOLATILE/RESTRICTs :)
> >>>
> >>>>
> >>>> Another possible change is, for external usage, removing
> >>>> modifiers, before checking the size, something like below.
> >>>> Note that I am not strongly advocating my below patch as
> >>>> it has the same shortcoming that maintained modifier type
> >>>> size may not be correct.
> >>>
> >>> I don't think your patch helps, it can actually confuse things even
> >>> more. It skips modifiers until underlying type is found, but you still
> >>> don't guarantee that at that time that underlying type will have its
> >>> size resolved.
> >>
> >> It actually does help. It does not change the internal btf type
> >> traversal algorithms. It only changes the implementation of
> >> an external API, btf_type_id_size(). Previously, this function
> >> was used both by external callers and internally by btf.c. I split
> >> it into two: an internal __btf_type_id_size() and an external
> >> btf_type_id_size(). The external one removes modifiers before
> >> finding the type size. The external one is typically used only
> >> after btf is validated.
> >
> > Sure, for external callers yes, it solves the problem. But there is
> > deeper problem: we mark modifier types RESOLVED before types they
> > ultimately point to are resolved. Then in all those btf_xxx_resolve()
> > functions we have check:
> >
> > if (!env_type_is_resolve_sink && !env_type_is_resolved)
> >    return env_stack_push();
> > else {
> >
> >    /*
> >     * Here we assume that we can calculate the size of the type.
> >     * So even if we traverse through all the modifiers and find the
> >     * underlying type, that type will have resolved_size = 0,
> >     * because we haven't processed it yet, but we will just
> >     * incorrectly assume that zero is the *final* size.
> >     */
> > }
> >
> > So I think that your patch is still just hiding the problem, not solving it.
>
> That is why I am not advocating it.
>
> The really long modifier chain (const volatile restrict ...) is rare.
> So I agree removing this RESOLVE logic is okay.

Thinking about this a bit more :) The stack size is proportional not
to the longest chain of pointers and modifiers, but could actually be
as deep as the entire type graph (O(N)). So for this approach we'd need
to resize the stack dynamically. That is easy to do, but I'm not sure
how much pushback I'd get for such a change.

So I'll think about doing it differently. The problem is with the
resolved_sizes array: we assume it's filled in for some types too early.
I'll see if I can get rid of it completely and instead calculate the
size on the fly by relying on resolved_ids. I will post v2 with one of
those approaches.
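
To make the stack-depth concern concrete, here is a minimal user-space
sketch of DFS type resolution with an explicit stack and the three
states discussed above. It is not the kernel code: struct toy_type,
toy_resolve_all() and the two helpers are hypothetical names, a type
has at most one reference, and id 0 is reserved to mark a sized leaf.
It only illustrates that a plain chain of N references needs N stack
slots, and that a loop is caught when a VISITED type is reached again.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

enum toy_state { NOT_VISITED = 0, VISITED, RESOLVED };

/* Hypothetical, heavily simplified type record (not struct btf_type). */
struct toy_type {
	unsigned int ref;  /* id of the referenced type; 0 marks a sized leaf */
	unsigned int size; /* byte size, meaningful for leaves only */
};

/*
 * Resolve every type with an iterative DFS and an explicit stack of at
 * most max_depth entries. Returns the deepest stack use on success,
 * -EEXIST when a reference loop is found, -E2BIG when the stack is full.
 */
int toy_resolve_all(const struct toy_type *types, unsigned int nr_types,
		    unsigned int max_depth, unsigned int *sizes)
{
	enum toy_state *state;
	unsigned int *stack, top, id, ref, i;
	int deepest = 0, err = 0;

	if (!max_depth)
		return -EINVAL;
	state = calloc(nr_types, sizeof(*state)); /* all NOT_VISITED */
	stack = calloc(max_depth, sizeof(*stack));
	if (!state || !stack) {
		err = -ENOMEM;
		goto out;
	}
	for (i = 1; i < nr_types && !err; i++) {
		if (state[i] == RESOLVED)
			continue;
		state[i] = VISITED;
		stack[0] = i;
		top = 1;
		while (top && !err) {
			if ((int)top > deepest)
				deepest = (int)top;
			id = stack[top - 1];
			ref = types[id].ref;
			if (ref >= nr_types) {
				err = -EINVAL;
			} else if (ref && state[ref] == VISITED) {
				err = -EEXIST; /* ref is still on the stack: a loop */
			} else if (ref && state[ref] == NOT_VISITED) {
				if (top == max_depth) {
					err = -E2BIG;
				} else {
					state[ref] = VISITED;
					stack[top++] = ref;
				}
			} else {
				/* leaf, or the referenced type is fully resolved */
				sizes[id] = ref ? sizes[ref] : types[id].size;
				state[id] = RESOLVED;
				top--;
			}
		}
	}
out:
	free(stack);
	free(state);
	return err ? err : deepest;
}

/* Chain 1 -> 2 -> ... -> len, where only type `len` is a sized leaf. */
int toy_chain_depth(unsigned int len, unsigned int max_depth)
{
	struct toy_type *types = calloc(len + 1, sizeof(*types));
	unsigned int *sizes = calloc(len + 1, sizeof(*sizes));
	unsigned int i;
	int ret = -ENOMEM;

	if (types && sizes) {
		for (i = 1; i < len; i++)
			types[i].ref = i + 1;
		types[len].size = 4; /* the final "int" */
		ret = toy_resolve_all(types, len + 1, max_depth, sizes);
		assert(ret < 0 || sizes[1] == 4);
	}
	free(sizes);
	free(types);
	return ret;
}

/* Two types that reference each other: 1 -> 2 -> 1. */
int toy_loop_result(void)
{
	struct toy_type types[3] = { {0, 0}, {2, 0}, {1, 0} };
	unsigned int sizes[3] = { 0 };

	return toy_resolve_all(types, 3, 32, sizes);
}
```

With a limit of 32, toy_chain_depth(32, 32) still succeeds while
toy_chain_depth(33, 32) fails with -E2BIG, which is the failure mode a
fixed-size stack would hit once early stopping is removed.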

>
> >
> > BTW, I've also identified part of btf_ptr_resolve() logic that can be
> > now safely removed (it's a special case that "restarts" DFS traversal
> > for modifiers, because they could have been prematurely marked
> > resolved). This is another sign that there is something wrong in an
> > algorithm.
> >
> > I'd rather remove unnecessary complexity and fix underlying problem,
> > especially given that there is no performance or correctness penalty.
> >
> > I'll post v2 soon.
>
> Sounds good.
>
> >
> >>
> >> Will go through your other comments later.
> >>
> >>>
> >>>>
> >>>> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> >>>> index 546ebee39e2a..6f927c3e0a89 100644
> >>>> --- a/kernel/bpf/btf.c
> >>>> +++ b/kernel/bpf/btf.c
> >>>> @@ -620,6 +620,54 @@ static bool btf_type_int_is_regular(const struct btf_type *t)
> >>>>            return true;
> >>>>     }
> >>>>
> >>>> +static const struct btf_type *__btf_type_id_size(const struct btf *btf,
> >>>> +                                                u32 *type_id,
> >>>> +                                                u32 *ret_size,
> >>>> +                                                bool skip_modifier)
> >>>> +{
> >>>> +       const struct btf_type *size_type;
> >>>> +       u32 size_type_id = *type_id;
> >>>> +       u32 size = 0;
> >>>> +
> >>>> +       size_type = btf_type_by_id(btf, size_type_id);
> >>>> +       if (size_type && skip_modifier) {
> >>>> +               while (btf_type_is_modifier(size_type))
> >>>> +                       size_type = btf_type_by_id(btf, size_type->type);
> >>>> +       }
> >>>> +
> >>>> +       if (btf_type_nosize_or_null(size_type))
> >>>> +               return NULL;
> >>>> +
> >>>> +       if (btf_type_has_size(size_type)) {
> >>>> +               size = size_type->size;
> >>>> +       } else if (btf_type_is_array(size_type)) {
> >>>> +               size = btf->resolved_sizes[size_type_id];
> >>>> +       } else if (btf_type_is_ptr(size_type)) {
> >>>> +               size = sizeof(void *);
> >>>> +       } else {
> >>>> +               if (WARN_ON_ONCE(!btf_type_is_modifier(size_type) &&
> >>>> +                                !btf_type_is_var(size_type)))
> >>>> +                       return NULL;
> >>>> +
> >>>> +               size = btf->resolved_sizes[size_type_id];
> >>>> +               size_type_id = btf->resolved_ids[size_type_id];
> >>>> +               size_type = btf_type_by_id(btf, size_type_id);
> >>>> +               if (btf_type_nosize_or_null(size_type))
> >>>> +                       return NULL;
> >>>> +       }
> >>>> +
> >>>> +       *type_id = size_type_id;
> >>>> +       if (ret_size)
> >>>> +               *ret_size = size;
> >>>> +
> >>>> +       return size_type;
> >>>> +}
> >>>> +
> >> [...]

^ permalink raw reply

* Re: [PATCH bpf] bpf: fix BTF verifier size resolution logic
From: Yonghong Song @ 2019-07-11  4:14 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Andrii Nakryiko, Alexei Starovoitov, daniel@iogearbox.net,
	bpf@vger.kernel.org, netdev@vger.kernel.org, Kernel Team,
	Martin Lau
In-Reply-To: <CAEf4Bza6Y87C2_Fobj9CwU-2YRTU32S61f8_8CQdhMPenJiJZQ@mail.gmail.com>



On 7/10/19 6:45 PM, Andrii Nakryiko wrote:
> On Wed, Jul 10, 2019 at 5:36 PM Yonghong Song <yhs@fb.com> wrote:
>>
>>
>>
>> On 7/10/19 5:29 PM, Andrii Nakryiko wrote:
>>> On Wed, Jul 10, 2019 at 5:16 PM Yonghong Song <yhs@fb.com> wrote:
>>>>
>>>>
>>>>
>>>> On 7/10/19 1:08 AM, Andrii Nakryiko wrote:
>>>>> BTF verifier has different logic depending on whether we are following
>>>>> a PTR or STRUCT/ARRAY (or something else). This is an optimization to
>>>>> stop early in DFS traversal while resolving BTF types. But it also
>>>>> results in a size resolution bug, when there is a chain, e.g., of PTR ->
>>>>> TYPEDEF -> ARRAY, in which case due to being in pointer context ARRAY
>>>>> size won't be resolved, as it is considered to be a sink for pointer,
>>>>> leading to TYPEDEF being in RESOLVED state with zero size, which is
>>>>> completely wrong.
>>>>>
>>>>> Optimization is doubtful, though, as btf_check_all_types() will iterate
>>>>> over all BTF types anyways, so the only saving is a potentially slightly
>>>>> shorter stack. But correctness is more important than tiny savings.
>>>>>
>>>>> This bug manifests itself in rejecting BTF-defined maps that use array
>>>>> typedef as a value type:
>>>>>
>>>>> typedef int array_t[16];
>>>>>
>>>>> struct {
>>>>>         __uint(type, BPF_MAP_TYPE_ARRAY);
>>>>>         __type(value, array_t); /* i.e., array_t *value; */
>>>>> } test_map SEC(".maps");
>>>>>
>>>>> Fixes: eb3f595dab40 ("bpf: btf: Validate type reference")
>>>>> Cc: Martin KaFai Lau <kafai@fb.com>
>>>>> Signed-off-by: Andrii Nakryiko <andriin@fb.com>
>>>>
>>>> The change seems okay to me. Currently, looks like intermediate
>>>> modifier type will carry size = 0 (in the internal data structure).
>>>
>>> Yes, which is totally wrong, especially that we use that size in some
>>> cases to reject map with specified BTF.
>>>
>>>>
>>>> If we remove RESOLVE logic, we probably want to double check
>>>> whether we handle circular types correctly or not. Maybe we will
>>>> be okay if all self tests pass.
>>>
>>> I checked, it does. We'll attempt to add the referenced type unless it's a
>>> "resolve sink" (where size is immediately known) or is already
>>> resolved (its state is RESOLVED). In other cases, we'll attempt to
>>> env_stack_push(), which checks that the state of that type is
>>> NOT_VISITED. If it's RESOLVED or VISITED, it returns -EEXIST. When a
>>> type is added onto the stack, its resolve state goes from NOT_VISITED
>>> to VISITED.
>>>
>>> So, if there is a loop, we'll detect it as soon as we attempt
>>> to add the same type onto the stack a second time.
>>>
>>>>
>>>> It may still be worthwhile to quantify the RESOLVE optimization benefit
>>>> before removing it.
>>>
>>> I don't think there is any, because every type will be visited exactly
>>> once, due to DFS nature of algorithm. The only difference is that if
>>> we have a long chain of modifiers, we can technically reach the max
>>> limit and fail. But at 32 I think it's pretty unrealistic to have such
>>> a long chain of PTR/TYPEDEF/CONST/VOLATILE/RESTRICTs :)
>>>
>>>>
>>>> Another possible change is, for external usage, removing
>>>> modifiers, before checking the size, something like below.
>>>> Note that I am not strongly advocating my below patch as
>>>> it has the same shortcoming that maintained modifier type
>>>> size may not be correct.
>>>
>>> I don't think your patch helps, it can actually confuse things even
>>> more. It skips modifiers until underlying type is found, but you still
>>> don't guarantee that at that time that underlying type will have its
>>> size resolved.
>>
>> It actually does help. It does not change the internal btf type
>> traversal algorithms. It only changes the implementation of
>> an external API, btf_type_id_size(). Previously, this function
>> was used both by external callers and internally by btf.c. I split
>> it into two: an internal __btf_type_id_size() and an external
>> btf_type_id_size(). The external one removes modifiers before
>> finding the type size. The external one is typically used only
>> after btf is validated.
> 
> Sure, for external callers yes, it solves the problem. But there is
> deeper problem: we mark modifier types RESOLVED before types they
> ultimately point to are resolved. Then in all those btf_xxx_resolve()
> functions we have check:
> 
> if (!env_type_is_resolve_sink && !env_type_is_resolved)
>    return env_stack_push();
> else {
> 
>    /*
>     * Here we assume that we can calculate the size of the type.
>     * So even if we traverse through all the modifiers and find the
>     * underlying type, that type will have resolved_size = 0,
>     * because we haven't processed it yet, but we will just
>     * incorrectly assume that zero is the *final* size.
>     */
> }
> 
> So I think that your patch is still just hiding the problem, not solving it.
> 
> BTW, I've also identified part of btf_ptr_resolve() logic that can be
> now safely removed (it's a special case that "restarts" DFS traversal
> for modifiers, because they could have been prematurely marked
> resolved). This is another sign that there is something wrong in an
> algorithm.
> 
> I'd rather remove unnecessary complexity and fix underlying problem,
> especially given that there is no performance or correctness penalty.

Could you create a special BTF with a long typedef chain like
typedef int a1;
typedef a1 a2;
...
typedef a65532 a65533;
(the maximum number of types the kernel allows is 64K)

In the BTF, the typedefs appear in reverse order:
1: typedef a65533 to 2
2: typedef ... to 3
3 ...

and check that the kernel won't run into deep recursion or panic?

Thanks.

> 
> I'll post v2 soon.
> 
>>
>> Will go through your other comments later.
>>
>>>
>>>>
>>>> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
>>>> index 546ebee39e2a..6f927c3e0a89 100644
>>>> --- a/kernel/bpf/btf.c
>>>> +++ b/kernel/bpf/btf.c
>>>> @@ -620,6 +620,54 @@ static bool btf_type_int_is_regular(const struct btf_type *t)
>>>>            return true;
>>>>     }
>>>>
>>>> +static const struct btf_type *__btf_type_id_size(const struct btf *btf,
>>>> +                                                u32 *type_id,
>>>> +                                                u32 *ret_size,
>>>> +                                                bool skip_modifier)
>>>> +{
>>>> +       const struct btf_type *size_type;
>>>> +       u32 size_type_id = *type_id;
>>>> +       u32 size = 0;
>>>> +
>>>> +       size_type = btf_type_by_id(btf, size_type_id);
>>>> +       if (size_type && skip_modifier) {
>>>> +               while (btf_type_is_modifier(size_type))
>>>> +                       size_type = btf_type_by_id(btf, size_type->type);
>>>> +       }
>>>> +
>>>> +       if (btf_type_nosize_or_null(size_type))
>>>> +               return NULL;
>>>> +
>>>> +       if (btf_type_has_size(size_type)) {
>>>> +               size = size_type->size;
>>>> +       } else if (btf_type_is_array(size_type)) {
>>>> +               size = btf->resolved_sizes[size_type_id];
>>>> +       } else if (btf_type_is_ptr(size_type)) {
>>>> +               size = sizeof(void *);
>>>> +       } else {
>>>> +               if (WARN_ON_ONCE(!btf_type_is_modifier(size_type) &&
>>>> +                                !btf_type_is_var(size_type)))
>>>> +                       return NULL;
>>>> +
>>>> +               size = btf->resolved_sizes[size_type_id];
>>>> +               size_type_id = btf->resolved_ids[size_type_id];
>>>> +               size_type = btf_type_by_id(btf, size_type_id);
>>>> +               if (btf_type_nosize_or_null(size_type))
>>>> +                       return NULL;
>>>> +       }
>>>> +
>>>> +       *type_id = size_type_id;
>>>> +       if (ret_size)
>>>> +               *ret_size = size;
>>>> +
>>>> +       return size_type;
>>>> +}
>>>> +
>> [...]
> 

^ permalink raw reply

* Re: [PATCH v3 1/2] rtw88: pci: Rearrange the memory usage for skb in RX ISR
From: Jian-Hong Pan @ 2019-07-11  3:50 UTC (permalink / raw)
  To: David Laight
  Cc: Yan-Hsuan Chuang, Kalle Valo, David S . Miller, Larry Finger,
	Christoph Hellwig, linux-wireless@vger.kernel.org,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux@endlessm.com, Daniel Drake, stable@vger.kernel.org
In-Reply-To: <81a2b91c4b084617bab8656fca932f6d@AcuMS.aculab.com>

David Laight <David.Laight@aculab.com> wrote on Wed, Jul 10, 2019 at 4:57 PM:
>
> From: Jian-Hong Pan
> > Sent: 10 July 2019 09:38
> >
> > Testing with RTL8822BE hardware, when available memory is low, we
> > frequently see a kernel panic and system freeze.
> >
> > First, rtw_pci_rx_isr encounters a memory allocation failure (trimmed):
> >
> > rx routine starvation
> > WARNING: CPU: 7 PID: 9871 at drivers/net/wireless/realtek/rtw88/pci.c:822
> > rtw_pci_rx_isr.constprop.25+0x35a/0x370 [rtwpci]
> > [ 2356.580313] RIP: 0010:rtw_pci_rx_isr.constprop.25+0x35a/0x370 [rtwpci]
> >
> > Then we see a variety of different error conditions and kernel panics,
> > such as this one (trimmed):
> >
> > rtw_pci 0000:02:00.0: pci bus timeout, check dma status
> > skbuff: skb_over_panic: text:00000000091b6e66 len:415 put:415 head:00000000d2880c6f
> > data:000000007a02b1ea tail:0x1df end:0xc0 dev:<NULL>
> > ------------[ cut here ]------------
> > kernel BUG at net/core/skbuff.c:105!
> > invalid opcode: 0000 [#1] SMP NOPTI
> > RIP: 0010:skb_panic+0x43/0x45
> >
> > When skb allocation fails and the "rx routine starvation" is hit, the
> > function returns immediately without updating the RX ring. At this
> > point, the RX ring may continue referencing an old skb which was already
> > handed off to ieee80211_rx_irqsafe(). When it comes to be used again,
> > bad things happen.
> >
> > This patch allocates a new, data-sized skb first in RX ISR. After
> > copying the data in, we pass it to the upper layers. However, if skb
> > allocation fails, we effectively drop the frame. In both cases, the
> > original, full size ring skb is reused.
> >
> > In addition, by fixing the kernel crash, the RX routine should now
> > generally behave better under low memory conditions.
>
> A couple of minor nits (see below).
> You may want to do a followup patch that changes the rx buffers
> (used by the hardware) to be just memory buffers.
> Nothing (probably) relies on them being skb with all the associated
> baggage.

That is a good idea for a later commit.
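
A rough user-space sketch of what that follow-up could look like, under
stated assumptions: the ring slots are plain memory buffers that never
leave the ring, and each received frame is copied into a freshly
allocated, data-sized buffer. If that allocation fails, the frame is
dropped and the slot is reused anyway. All names here (toy_rx_ring,
toy_rx_one(), toy_alloc_fail(), the sizes) are hypothetical and there
is no real DMA involved; it only models the buffer ownership.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_RING_LEN 4
#define TOY_BUF_SIZE 64 /* hypothetical full-size ring buffer */

struct toy_rx_ring {
	unsigned char buf[TOY_RING_LEN][TOY_BUF_SIZE]; /* never leaves the ring */
	size_t len[TOY_RING_LEN];                      /* bytes written into each slot */
	unsigned int rp;                               /* host read pointer */
};

/* Allocator hook, so a caller can simulate memory pressure. */
typedef void *(*toy_alloc_fn)(size_t);

static void *toy_alloc_fail(size_t size)
{
	(void)size;
	return NULL;
}

/*
 * Take one frame off the ring. Returns a data-sized heap copy that the
 * caller must free, or NULL when the frame had to be dropped.
 */
unsigned char *toy_rx_one(struct toy_rx_ring *ring, toy_alloc_fn alloc)
{
	unsigned int cur = ring->rp;
	size_t len = ring->len[cur];
	unsigned char *frame = alloc(len);

	if (frame)
		memcpy(frame, ring->buf[cur], len);

	/* Delivered or dropped, the slot goes back to the "hardware". */
	ring->len[cur] = 0;
	ring->rp = (cur + 1) % TOY_RING_LEN;
	return frame;
}

/* Receive three frames, with the second allocation forced to fail. */
int toy_rx_demo(void)
{
	struct toy_rx_ring ring = { .rp = 0 };
	unsigned char *frame;
	int delivered = 0;

	memcpy(ring.buf[0], "abc", 3);
	ring.len[0] = 3;
	memcpy(ring.buf[1], "defgh", 5);
	ring.len[1] = 5;
	memcpy(ring.buf[2], "ij", 2);
	ring.len[2] = 2;

	frame = toy_rx_one(&ring, malloc);
	if (frame && !memcmp(frame, "abc", 3))
		delivered++;
	free(frame);

	frame = toy_rx_one(&ring, toy_alloc_fail); /* simulated low memory */
	if (frame)
		delivered++;
	free(frame);

	frame = toy_rx_one(&ring, malloc);
	if (frame && !memcmp(frame, "ij", 2))
		delivered++;
	free(frame);

	/* The read pointer advanced past all three slots, drop included. */
	return ring.rp == 3 ? delivered : -1;
}
```

The point of the model is that an allocation failure costs one frame
and nothing else: the ring never ends up referencing a buffer that has
already been handed to someone else.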

>         David
>
> >
> > Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=204053
> > Signed-off-by: Jian-Hong Pan <jian-hong@endlessm.com>
> > Cc: <stable@vger.kernel.org>
> > ---
> > v2:
> >  - Allocate new data-sized skb and put data into it, then pass it to
> >    mac80211. Reuse the original skb in RX ring by DMA sync.
> >  - Modify the commit message.
> >  - Introduce following [PATCH v3 2/2] rtw88: pci: Use DMA sync instead
> >    of remapping in RX ISR.
> >
> > v3:
> >  - Same as v2.
> >
> >  drivers/net/wireless/realtek/rtw88/pci.c | 49 +++++++++++-------------
> >  1 file changed, 22 insertions(+), 27 deletions(-)
> >
> > diff --git a/drivers/net/wireless/realtek/rtw88/pci.c b/drivers/net/wireless/realtek/rtw88/pci.c
> > index cfe05ba7280d..e9fe3ad896c8 100644
> > --- a/drivers/net/wireless/realtek/rtw88/pci.c
> > +++ b/drivers/net/wireless/realtek/rtw88/pci.c
> > @@ -763,6 +763,7 @@ static void rtw_pci_rx_isr(struct rtw_dev *rtwdev, struct rtw_pci *rtwpci,
> >       u32 pkt_offset;
> >       u32 pkt_desc_sz = chip->rx_pkt_desc_sz;
> >       u32 buf_desc_sz = chip->rx_buf_desc_sz;
> > +     u32 new_len;
> >       u8 *rx_desc;
> >       dma_addr_t dma;
> >
> > @@ -790,40 +791,34 @@ static void rtw_pci_rx_isr(struct rtw_dev *rtwdev, struct rtw_pci *rtwpci,
> >               pkt_offset = pkt_desc_sz + pkt_stat.drv_info_sz +
> >                            pkt_stat.shift;
> >
> > -             if (pkt_stat.is_c2h) {
> > -                     /* keep rx_desc, halmac needs it */
> > -                     skb_put(skb, pkt_stat.pkt_len + pkt_offset);
> > +             /* discard current skb if the new skb cannot be allocated as a
> > +              * new one in rx ring later
> > +              */
>
> That comment isn't quite right.
> maybe: "Allocate a new skb for this frame, discard if none available"

Thanks!  I will tweak it.

> > +             new_len = pkt_stat.pkt_len + pkt_offset;
> > +             new = dev_alloc_skb(new_len);
> > +             if (WARN_ONCE(!new, "rx routine starvation\n"))
>
> I think you should count these??

Larry has a different idea here [1].
I agree with Larry that here we just need to know that there is not enough memory.

[1] https://lkml.org/lkml/2019/7/8/1049

Jian-Hong Pan

> > +                     goto next_rp;
> > +
> > +             /* put the DMA data including rx_desc from phy to new skb */
> > +             skb_put_data(new, skb->data, new_len);
> >
> > -                     /* pass offset for further operation */
> > -                     *((u32 *)skb->cb) = pkt_offset;
> > -                     skb_queue_tail(&rtwdev->c2h_queue, skb);
> > +             if (pkt_stat.is_c2h) {
> > +                      /* pass rx_desc & offset for further operation */
> > +                     *((u32 *)new->cb) = pkt_offset;
> > +                     skb_queue_tail(&rtwdev->c2h_queue, new);
> >                       ieee80211_queue_work(rtwdev->hw, &rtwdev->c2h_work);
> >               } else {
> > -                     /* remove rx_desc, maybe use skb_pull? */
> > -                     skb_put(skb, pkt_stat.pkt_len);
> > -                     skb_reserve(skb, pkt_offset);
> > -
> > -                     /* alloc a smaller skb to mac80211 */
> > -                     new = dev_alloc_skb(pkt_stat.pkt_len);
> > -                     if (!new) {
> > -                             new = skb;
> > -                     } else {
> > -                             skb_put_data(new, skb->data, skb->len);
> > -                             dev_kfree_skb_any(skb);
> > -                     }
> > -                     /* TODO: merge into rx.c */
> > -                     rtw_rx_stats(rtwdev, pkt_stat.vif, skb);
> > +                     /* remove rx_desc */
> > +                     skb_pull(new, pkt_offset);
> > +
> > +                     rtw_rx_stats(rtwdev, pkt_stat.vif, new);
> >                       memcpy(new->cb, &rx_status, sizeof(rx_status));
> >                       ieee80211_rx_irqsafe(rtwdev->hw, new);
> >               }
> >
> > -             /* skb delivered to mac80211, alloc a new one in rx ring */
> > -             new = dev_alloc_skb(RTK_PCI_RX_BUF_SIZE);
> > -             if (WARN(!new, "rx routine starvation\n"))
> > -                     return;
> > -
> > -             ring->buf[cur_rp] = new;
> > -             rtw_pci_reset_rx_desc(rtwdev, new, ring, cur_rp, buf_desc_sz);
> > +next_rp:
> > +             /* new skb delivered to mac80211, re-enable original skb DMA */
> > +             rtw_pci_reset_rx_desc(rtwdev, skb, ring, cur_rp, buf_desc_sz);
> >
> >               /* host read next element in ring */
> >               if (++cur_rp >= ring->r.len)
> > --
> > 2.22.0
>
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> Registration No: 1397386 (Wales)
>

^ permalink raw reply

* [PATCH] ipvs: remove unnecessary space
From: yangxingwu @ 2019-07-11  3:39 UTC (permalink / raw)
  To: wensong
  Cc: horms, ja, pablo, kadlec, fw, davem, netdev, lvs-devel,
	netfilter-devel, coreteam, linux-kernel, yangxingwu

This patch removes the extra space and uses bitmap_zalloc() instead of kcalloc().

Signed-off-by: yangxingwu <xingwu.yang@gmail.com>
---
 net/netfilter/ipvs/ip_vs_mh.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/netfilter/ipvs/ip_vs_mh.c b/net/netfilter/ipvs/ip_vs_mh.c
index 94d9d34..3229867 100644
--- a/net/netfilter/ipvs/ip_vs_mh.c
+++ b/net/netfilter/ipvs/ip_vs_mh.c
@@ -174,8 +174,7 @@ static int ip_vs_mh_populate(struct ip_vs_mh_state *s,
 		return 0;
 	}
 
-	table =  kcalloc(BITS_TO_LONGS(IP_VS_MH_TAB_SIZE),
-			 sizeof(unsigned long), GFP_KERNEL);
+	table = bitmap_zalloc(IP_VS_MH_TAB_SIZE, GFP_KERNEL);
 	if (!table)
 		return -ENOMEM;
 
-- 
1.8.3.1
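
For readers unfamiliar with the helper, the following user-space sketch
shows why the two allocations request the same amount of zeroed memory:
a bitmap of nbits bits is rounded up to a whole number of unsigned
longs. The TOY_* macros and toy_* functions are stand-ins written for
this note, not the kernel's BITS_TO_LONGS() or bitmap_zalloc(), and
calloc() stands in for the kernel allocator.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Number of bits in one unsigned long on this platform. */
#define TOY_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Round a bit count up to the number of unsigned longs that hold it. */
#define TOY_BITS_TO_LONGS(nbits) \
	(((nbits) + TOY_BITS_PER_LONG - 1) / TOY_BITS_PER_LONG)

/* Open-coded form: count the longs by hand, then allocate zeroed memory. */
unsigned long *toy_open_coded_alloc(size_t nbits)
{
	return calloc(TOY_BITS_TO_LONGS(nbits), sizeof(unsigned long));
}

/* Helper form: the caller only states how many bits it needs. */
unsigned long *toy_bitmap_zalloc(size_t nbits)
{
	return toy_open_coded_alloc(nbits);
}

/* Returns 1 if a freshly allocated bitmap of nbits bits is all zero. */
int toy_bitmap_is_zeroed(size_t nbits)
{
	unsigned long *map = toy_bitmap_zalloc(nbits);
	size_t i, nlongs = TOY_BITS_TO_LONGS(nbits);
	int zeroed = map != NULL;

	for (i = 0; map && i < nlongs; i++)
		if (map[i])
			zeroed = 0;
	free(map);
	return zeroed;
}
```

In the kernel, memory obtained from bitmap_zalloc() is normally
released with bitmap_free(), so the matching free path is worth a look
when reviewing this kind of conversion.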


^ permalink raw reply related

* Re: linux-next: build failure after merge of the net-next tree
From: Stephen Rothwell @ 2019-07-11  3:27 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Leon Romanovsky, Bernard Metzler, Doug Ledford, David Miller,
	Networking, Linux Next Mailing List, Linux Kernel Mailing List
In-Reply-To: <20190711131603.6b11b831@canb.auug.org.au>

[-- Attachment #1: Type: text/plain, Size: 1420 bytes --]

Hi all,

On Thu, 11 Jul 2019 13:16:03 +1000 Stephen Rothwell <sfr@canb.auug.org.au> wrote:
> 
> On Thu, 11 Jul 2019 13:13:44 +1000 Stephen Rothwell <sfr@canb.auug.org.au> wrote:
> >
> > On Thu, 11 Jul 2019 02:26:27 +0000 Jason Gunthorpe <jgg@mellanox.com> wrote:  
> > >
> > > On Thu, Jul 11, 2019 at 11:50:54AM +1000, Stephen Rothwell wrote:
> > >     
> > > > So today this failed to build after I merged the rdma tree (previously
> > > > > it didn't until after the net-next tree was merged; I assume a
> > > > > dependency changed).  It failed because in_dev_for_each_ifa_rcu (and
> > > > in_dev_for_each_ifa_rtnl) is only defined in a commit in the net-next
> > > > tree :-(      
> > > 
> > > ? I'm confused.. 
> > > 
> > > rdma.git builds fine stand alone (I hope!)    
> > 
> > I have "Fixup to build SIW issue" from Leon (which switches to using
> > > in_dev_for_each_ifa_rcu) included in the rdma tree merge commit because
> > without that the rdma tree would not build for me.  Are you saying that
> > I don't need that at all, now?  
> 
> Actually, I get it now, "Fixup to build SIW issue" is really just a
> fixup for the net-next and rdma trees merge ... OK, I will fix that up
> tomorrow.  Sorry for my confusion.

Actually, I have rewound my tree and am starting from the merge of the
rdma tree again, so hopefully it should all be good today.

-- 
Cheers,
Stephen Rothwell

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply
