Netdev List

Netdev List
 help / color / mirror / Atom feed

* RE: [PATCH v2 0/3]
From: Rakesh Pillai @ 2020-08-01  5:10 UTC (permalink / raw)
  To: 'Florian Fainelli', ath10k
  Cc: linux-wireless, linux-kernel, kvalo, davem, kuba, netdev
In-Reply-To: <c6c5b3c5-f862-9cee-6863-24f666cc28f5@gmail.com>



> -----Original Message-----
> From: Florian Fainelli <f.fainelli@gmail.com>
> Sent: Saturday, August 1, 2020 12:17 AM
> To: Rakesh Pillai <pillair@codeaurora.org>; ath10k@lists.infradead.org
> Cc: linux-wireless@vger.kernel.org; linux-kernel@vger.kernel.org;
> kvalo@codeaurora.org; davem@davemloft.net; kuba@kernel.org;
> netdev@vger.kernel.org
> Subject: Re: [PATCH v2 0/3]
> 
> On 7/31/20 11:27 AM, Rakesh Pillai wrote:
> > The history recording will be compiled only if
> > ATH10K_DEBUG is enabled, and also enabled via
> > the module parameter. Once the history recording
> > is enabled via module parameter, it can be enabled
> > or disabled runtime via debugfs.
> 
> Why not use trace prints and retrieving them via the function tracer?
> This seems very ad-hoc.

Tracing needs to be enabled to capture the events.
But these events can be turned on in some kind of a debug build and capture the history to help us debug in case there is a crash.
It wont even allocate memory if not enabled via module parameter.

> 
> >
> > ---
> > Changes from v1:
> > - Add module param and debugfs to enable/disable history recording.
> >
> > Rakesh Pillai (3):
> >   ath10k: Add history for tracking certain events
> >   ath10k: Add module param to enable history
> >   ath10k: Add debugfs support to enable event history
> >
> >  drivers/net/wireless/ath/ath10k/ce.c      |   1 +
> >  drivers/net/wireless/ath/ath10k/core.c    |   3 +
> >  drivers/net/wireless/ath/ath10k/core.h    |  82 ++++++++++++
> >  drivers/net/wireless/ath/ath10k/debug.c   | 207
> ++++++++++++++++++++++++++++++
> >  drivers/net/wireless/ath/ath10k/debug.h   |  75 +++++++++++
> >  drivers/net/wireless/ath/ath10k/snoc.c    |  15 ++-
> >  drivers/net/wireless/ath/ath10k/wmi-tlv.c |   1 +
> >  drivers/net/wireless/ath/ath10k/wmi.c     |  10 ++
> >  8 files changed, 393 insertions(+), 1 deletion(-)
> >
> 
> 
> --
> Florian


^ permalink raw reply

* Re: [PATCH] net: Phy: Add PHY lookup support on MDIO bus in case of ACPI probe
From: Florian Fainelli @ 2020-08-01  5:08 UTC (permalink / raw)
  To: Vikas Singh, Andrew Lunn
  Cc: hkallweit1, linux, netdev, Calvin Johnson (OSS), Kuldip Dwivedi,
	Madalin Bucur (OSS), Vikas Singh
In-Reply-To: <CADvVLtUAd0X7c39BX791CuyWBcmfBsp7Xvw9Jyp05w45agJY9g@mail.gmail.com>



On 7/31/2020 9:53 PM, Vikas Singh wrote:
> Hi Andrew,
>  
> As i have already mentioned that this patch is based on
> https://www.spinics.net/lists/netdev/msg662173.html,
> <https://www.spinics.net/lists/netdev/msg662173.html>
> 
> When MDIO bus gets registered itself along with devices on it , the
> function mdiobus_register() inside of_mdiobus_register(), brings
> up all the PHYs on the mdio bus and attach them to the bus with the help
> of_mdiobus_link_mdiodev() inside mdiobus_scan() .
> Additionally it has been discussed with the maintainers that the
> mdiobus_register() function should be capable of handling both ACPI &
> DTB stuff
> without any change to existing implementation.
> Now of_mdiobus_link_mdiodev() inside mdiobus_scan() see if the
> auto-probed phy has a corresponding child in the bus node, and set the
> "of_node" pointer in DT case.
> But lacks to set the "fwnode" pointer in ACPI case which is resulting in
> mdiobus_register() failure as an end result theoretically.
> 
> Now this patch set (changes) will attempt to fill this gap and
> generalise the mdiobus_register() implementation for both ACPI & DT with
> no duplicacy or redundancy.

Please reply in plain text and do not top-post, thank you.
-- 
Florian

^ permalink raw reply

* [PATCH bpf-next] bpf: make __htab_lookup_and_delete_batch faster when map is almost empty
From: Brian Vazquez @ 2020-08-01  4:57 UTC (permalink / raw)
  To: Brian Vazquez, Brian Vazquez, Alexei Starovoitov, Daniel Borkmann,
	David S . Miller
  Cc: linux-kernel, netdev, bpf, Luigi Rizzo, Yonghong Song

While running some experiments it was observed that map_lookup_batch was much
slower than get_next_key + lookup when the syscall overhead is minimal.
This was because the map_lookup_batch implementation was more expensive
traversing empty buckets, this can be really costly when the pre-allocated
map is too big.

This patch optimizes the case when the bucket is empty so we can move quickly
to next bucket.

The benchmark to exercise this is as follows:

-The map was populate with a single entry to make sure that the syscall overhead
is not helping the map_batch_lookup.
-The size of the preallocated map was increased to show the effect of
traversing empty buckets.

Results:

  Using get_next_key + lookup:

  Benchmark                Time(ns)        CPU(ns)     Iteration
  ---------------------------------------------------------------
  BM_DumpHashMap/1/1k          3593           3586         192680
  BM_DumpHashMap/1/4k          6004           5972         100000
  BM_DumpHashMap/1/16k        15755          15710          44341
  BM_DumpHashMap/1/64k        59525          59376          10000

  Using htab_lookup_batch before this patch:
  Benchmark                Time(ns)        CPU(ns)     Iterations
  ---------------------------------------------------------------
  BM_DumpHashMap/1/1k          3933           3927         177978
  BM_DumpHashMap/1/4k          9192           9177          73951
  BM_DumpHashMap/1/16k        42011          41970          16789
  BM_DumpHashMap/1/64k       117895         117661           6135

  Using htab_lookup_batch with this patch:
  Benchmark                Time(ns)        CPU(ns)     Iterations
  ---------------------------------------------------------------
  BM_DumpHashMap/1/1k          2809           2803         249212
  BM_DumpHashMap/1/4k          5318           5316         100000
  BM_DumpHashMap/1/16k        14925          14895          47448
  BM_DumpHashMap/1/64k        58870          58674          10000

Suggested-by: Luigi Rizzo <lrizzo@google.com>
Cc: Yonghong Song <yhs@fb.com>
Signed-off-by: Brian Vazquez <brianvv@google.com>
---
 kernel/bpf/hashtab.c | 23 ++++++++---------------
 1 file changed, 8 insertions(+), 15 deletions(-)

diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 2137e2200d95..150015ea6737 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -1351,7 +1351,6 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map,
 	struct hlist_nulls_head *head;
 	struct hlist_nulls_node *n;
 	unsigned long flags = 0;
-	bool locked = false;
 	struct htab_elem *l;
 	struct bucket *b;
 	int ret = 0;
@@ -1410,19 +1409,19 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map,
 	dst_val = values;
 	b = &htab->buckets[batch];
 	head = &b->head;
-	/* do not grab the lock unless need it (bucket_cnt > 0). */
-	if (locked)
-		flags = htab_lock_bucket(htab, b);
 
+	l = hlist_nulls_entry_safe(rcu_dereference_raw(hlist_nulls_first_rcu(head)),
+					struct htab_elem, hash_node);
+	if (!l && (batch + 1 < htab->n_buckets)) {
+		batch++;
+		goto again_nocopy;
+	}
+
+	flags = htab_lock_bucket(htab, b);
 	bucket_cnt = 0;
 	hlist_nulls_for_each_entry_rcu(l, n, head, hash_node)
 		bucket_cnt++;
 
-	if (bucket_cnt && !locked) {
-		locked = true;
-		goto again_nocopy;
-	}
-
 	if (bucket_cnt > (max_count - total)) {
 		if (total == 0)
 			ret = -ENOSPC;
@@ -1448,10 +1447,6 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map,
 		goto alloc;
 	}
 
-	/* Next block is only safe to run if you have grabbed the lock */
-	if (!locked)
-		goto next_batch;
-
 	hlist_nulls_for_each_entry_safe(l, n, head, hash_node) {
 		memcpy(dst_key, l->key, key_size);
 
@@ -1494,7 +1489,6 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map,
 	}
 
 	htab_unlock_bucket(htab, b, flags);
-	locked = false;
 
 	while (node_to_free) {
 		l = node_to_free;
@@ -1502,7 +1496,6 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map,
 		bpf_lru_push_free(&htab->lru, &l->lru_node);
 	}
 
-next_batch:
 	/* If we are not copying data, we can go to next bucket and avoid
 	 * unlocking the rcu.
 	 */
-- 
2.28.0.163.g6104cc2f0b6-goog


^ permalink raw reply related

* Re: [PATCH net-next] fib: fix another fib_rules_ops indirect call wrapper problem
From: Randy Dunlap @ 2020-08-01  3:52 UTC (permalink / raw)
  To: Brian Vazquez, Brian Vazquez, Eric Dumazet, Paolo Abeni,
	David S . Miller
  Cc: linux-kernel, netdev, Stephen Rothwell
In-Reply-To: <20200801030110.747164-1-brianvv@google.com>

On 7/31/20 8:01 PM, Brian Vazquez wrote:
> It turns out that on commit 41d707b7332f ("fib: fix fib_rules_ops
> indirect calls wrappers") I forgot to include the case when
> CONFIG_IP_MULTIPLE_TABLES is not set.
> 
> Fixes: 41d707b7332f ("fib: fix fib_rules_ops indirect calls wrappers")
> Reported-by: Randy Dunlap <rdunlap@infradead.org>
> Cc: Stephen Rothwell <sfr@canb.auug.org.au>
> Signed-off-by: Brian Vazquez <brianvv@google.com>

Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested

Thanks.

> ---
>  net/core/fib_rules.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/net/core/fib_rules.c b/net/core/fib_rules.c
> index fce645f6b9b10..a7a3f500a857b 100644
> --- a/net/core/fib_rules.c
> +++ b/net/core/fib_rules.c
> @@ -17,10 +17,16 @@
>  #include <linux/indirect_call_wrapper.h>
>  
>  #ifdef CONFIG_IPV6_MULTIPLE_TABLES
> +#ifdef CONFIG_IP_MULTIPLE_TABLES
>  #define INDIRECT_CALL_MT(f, f2, f1, ...) \
>  	INDIRECT_CALL_INET(f, f2, f1, __VA_ARGS__)
>  #else
> +#define INDIRECT_CALL_MT(f, f2, f1, ...) INDIRECT_CALL_1(f, f2, __VA_ARGS__)
> +#endif
> +#elif CONFIG_IP_MULTIPLE_TABLES
>  #define INDIRECT_CALL_MT(f, f2, f1, ...) INDIRECT_CALL_1(f, f1, __VA_ARGS__)
> +#else
> +#define INDIRECT_CALL_MT(f, f2, f1, ...) f(__VA_ARGS__)
>  #endif
>  
>  static const struct fib_kuid_range fib_kuid_range_unset = {
> 


-- 
~Randy

^ permalink raw reply

* Re: [RFC PATCH] bpftool btf: Add prefix option to dump C
From: Andrii Nakryiko @ 2020-08-01  3:47 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, John Fastabend, KP Singh,
	Quentin Monnet, Jakub Kicinski, Jiri Olsa,
	Toke Høiland-Jørgensen, Networking, bpf, open list,
	Stanislav Fomichev
In-Reply-To: <CAP-5=fXMUWFs6YtQVuxjenCrOmKtKYCqZE3YofwdR=ArDYSwbQ@mail.gmail.com>

On Fri, Jul 31, 2020 at 6:47 PM Ian Rogers <irogers@google.com> wrote:
>
> On Tue, Jul 21, 2020 at 11:58 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Tue, Jul 21, 2020 at 10:44 PM Ian Rogers <irogers@google.com> wrote:
> > >
> > > When bpftool dumps types and enum members into a header file for
> > > inclusion the names match those in the original source. If the same
> > > header file needs to be included in the original source and the bpf
> > > program, the names of structs, unions, typedefs and enum members will
> > > have naming collisions.
> >
> > vmlinux.h is not really intended to be used from user-space, because
> > it's incompatible with pretty much any other header that declares any
> > type. Ideally we should make this better, but that might require some
> > compiler support. We've been discussing with Yonghong extending Clang
> > with a compile-time check for whether some type is defined or not,
> > which would allow to guard every type and only declare it
> > conditionally, if it's missing. But that's just an idea at this point.
>
> Thanks Andrii! We're not looking at user-space code but the BPF code.
> The prefix idea comes from a way to solve this problem in C++ with
> namespaces:
>
> namespace vmlinux {
> #include "vmlinux.h"
> }
>
> As the BPF programs are C code then the prefix acts like the
> namespace. It seems strange to need to extend the language.

This is a classic case of jumping to designing a solution without
discussing a real problem first :)

You don't need to use any of the kernel headers together with
vmlinux.h (and it won't work as well), because vmlinux.h is supposed
to have all the **used** types from the kernel. So BPF programs only
include vmlinux.h and few libbpf-provided headers with helpers. Which
is why I assumed that you are trying to use it from user-space. But
see below on what went wrong.

>
> > Regardless, vmlinux.h is also very much Clang-specific, and shouldn't
> > work well with GCC. Could you elaborate on the specifics of the use
> > case you have in mind? That could help me see what might be the right
> > solution. Thanks!
>
> So the use-case is similar to btf_iter.h:
> https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/tools/testing/selftests/bpf/progs/bpf_iter.h
> To avoid collisions with somewhat cleaner macro or not games.
>
> Prompted by your concern I was looking into changing bpf_iter.h to use
> a prefix to show what the difference would be like. I also think that
> there may be issues with our kernel and tool set up that may mean that
> the prefix is unnecessary, if I fix something else. Anyway, to give an
> example I needed to build the selftests but this is failing for me.
> What I see is:
>
> $ git clone git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git
> $ cd bpf-next
> $ make defconfig
> $ cat >>.config <<EOF
> CONFIG_DEBUG_INFO=y
> CONFIG_DEBUG_INFO_BTF=y
> EOF
> $ make -j all
> $ mkdir /tmp/selftests
> $ make O=/tmp/selftests/ TARGETS=bpf kselftest
> ...
>   CLANG    /tmp/selftests//kselftest/bpf/tools/build/bpftool/profiler.bpf.o
> skeleton/profiler.bpf.c:18:21: error: invalid application of 'sizeof'
> to an incomplete type 'struct bpf_perf_event_value'
>         __uint(value_size, sizeof(struct bpf_perf_event_value));
>                            ^     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> Checking with bpftool the vmlinux lacks struct bpf_perf_event_value
> but as this is unconditionally defined in bpf.h this seems wrong. Do
> you have any suggestions and getting a working build?

It is unconditionally defined in bpf.h, but unless kernel code really
uses that type for something, compiler won't generate DWARF
information for that type, which subsequently won't get into BTF.
Adding CONFIG_DEBUG_INFO_BTF=y ensures you get BTF type info
generated, but only for subsystems that were compiled into vmlinux
according to your kernel config.

In this case, default config doesn't enable CONFIG_BPF_EVENTS, which
is a requirement to compile kernel/trace/bpf_trace.c, which in turn
uses struct bpf_perf_event_value in the helper signature.

So the solution in your case would be to use a slightly richer kernel
config, which enables more of the BPF subsystem. You can check
selftests/bpf/config for a list of options we typically enable to run
of selftests, for instance.

>
> > > To avoid these collisions an approach is to redeclare the header file
> > > types and enum members, which leads to duplication and possible
> > > inconsistencies. Another approach is to use preprocessor macros
> > > to rename conflicting names, but this can be cumbersome if there are
> > > many conflicts.
> > >
> > > This patch adds a prefix option for the dumped names. Use of this option
> > > can avoid name conflicts and compile time errors.
> > >
> > > Signed-off-by: Ian Rogers <irogers@google.com>
> > > ---
> > >  .../bpf/bpftool/Documentation/bpftool-btf.rst |  7 ++++++-
> > >  tools/bpf/bpftool/btf.c                       | 18 ++++++++++++++---
> > >  tools/lib/bpf/btf.h                           |  1 +
> > >  tools/lib/bpf/btf_dump.c                      | 20 +++++++++++++------
> > >  4 files changed, 36 insertions(+), 10 deletions(-)
> > >
> >
> > [...]
> >
> > > diff --git a/tools/lib/bpf/btf.h b/tools/lib/bpf/btf.h
> > > index 491c7b41ffdc..fea4baab00bd 100644
> > > --- a/tools/lib/bpf/btf.h
> > > +++ b/tools/lib/bpf/btf.h
> > > @@ -117,6 +117,7 @@ struct btf_dump;
> > >
> > >  struct btf_dump_opts {
> > >         void *ctx;
> > > +       const char *name_prefix;
> > >  };
> >
> > BTW, we can't do that, this breaks ABI. btf_dump_opts were added
> > before we understood the problem of backward/forward  compatibility of
> > libbpf APIs, unfortunately.
>
> This could be fixed by adding a "new" API for the parameter, which
> would be unfortunate compared to just amending the existing API. There
> may be solutions that are less duplicative.
>

Does ABI stability sucks for maintainers of the library? It absolutely
does! But we can't just go and break it.

> Thanks,
> Ian
>
> > >
> > >  typedef void (*btf_dump_printf_fn_t)(void *ctx, const char *fmt, va_list args);
> > > diff --git a/tools/lib/bpf/btf_dump.c b/tools/lib/bpf/btf_dump.c
> > > index e1c344504cae..baf2b4d82e1e 100644
> > > --- a/tools/lib/bpf/btf_dump.c
> > > +++ b/tools/lib/bpf/btf_dump.c
> >
> > [...]

^ permalink raw reply

* Re: linux-next: Tree for Jul 31 (net/decnet/ & FIB_RULES)
From: Brian Vazquez @ 2020-08-01  3:10 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: Stephen Rothwell, Linux Next Mailing List,
	Linux Kernel Mailing List, netdev@vger.kernel.org,
	linux-decnet-user, David S. Miller
In-Reply-To: <97853126-c3fb-fced-547f-6dd7d5c89ca9@infradead.org>

Ugh I completely missed CONFIG_IP_MULTIPLE_TABLES too, I sent the new
patch. This time I believe I cover all the cases. PTAL.

Thanks,
Brian

On Fri, Jul 31, 2020 at 5:50 PM Randy Dunlap <rdunlap@infradead.org> wrote:
>
> On 7/31/20 5:35 PM, Stephen Rothwell wrote:
> > Hi Randy,
> >
> > On Fri, 31 Jul 2020 08:53:09 -0700 Randy Dunlap <rdunlap@infradead.org> wrote:
> >>
> >> on i386:
> >>
> >> ld: net/core/fib_rules.o: in function `fib_rules_lookup':
> >> fib_rules.c:(.text+0x16b8): undefined reference to `fib4_rule_match'
> >> ld: fib_rules.c:(.text+0x16bf): undefined reference to `fib4_rule_match'
> >> ld: fib_rules.c:(.text+0x170d): undefined reference to `fib4_rule_action'
> >> ld: fib_rules.c:(.text+0x171e): undefined reference to `fib4_rule_action'
> >> ld: fib_rules.c:(.text+0x1751): undefined reference to `fib4_rule_suppress'
> >> ld: fib_rules.c:(.text+0x175d): undefined reference to `fib4_rule_suppress'
> >>
> >> CONFIG_DECNET=y
> >> CONFIG_DECNET_ROUTER=y
> >>
> >> DECNET_ROUTER selects FIB_RULES.
> >
> > I assume that CONFIG_IP_MULTIPLE_TABLES was not set for that build?
>
> Correct.
>
> > Caused by commit
> >
> >   b9aaec8f0be5 ("fib: use indirect call wrappers in the most common fib_rules_ops")
> >
> > from the net-next tree.
>
> thanks.
>
> --
> ~Randy
>

^ permalink raw reply

* [PATCH net-next] fib: fix another fib_rules_ops indirect call wrapper problem
From: Brian Vazquez @ 2020-08-01  3:01 UTC (permalink / raw)
  To: Brian Vazquez, Brian Vazquez, Eric Dumazet, Paolo Abeni,
	David S . Miller
  Cc: linux-kernel, netdev, Randy Dunlap, Stephen Rothwell

It turns out that on commit 41d707b7332f ("fib: fix fib_rules_ops
indirect calls wrappers") I forgot to include the case when
CONFIG_IP_MULTIPLE_TABLES is not set.

Fixes: 41d707b7332f ("fib: fix fib_rules_ops indirect calls wrappers")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Brian Vazquez <brianvv@google.com>
---
 net/core/fib_rules.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/net/core/fib_rules.c b/net/core/fib_rules.c
index fce645f6b9b10..a7a3f500a857b 100644
--- a/net/core/fib_rules.c
+++ b/net/core/fib_rules.c
@@ -17,10 +17,16 @@
 #include <linux/indirect_call_wrapper.h>
 
 #ifdef CONFIG_IPV6_MULTIPLE_TABLES
+#ifdef CONFIG_IP_MULTIPLE_TABLES
 #define INDIRECT_CALL_MT(f, f2, f1, ...) \
 	INDIRECT_CALL_INET(f, f2, f1, __VA_ARGS__)
 #else
+#define INDIRECT_CALL_MT(f, f2, f1, ...) INDIRECT_CALL_1(f, f2, __VA_ARGS__)
+#endif
+#elif CONFIG_IP_MULTIPLE_TABLES
 #define INDIRECT_CALL_MT(f, f2, f1, ...) INDIRECT_CALL_1(f, f1, __VA_ARGS__)
+#else
+#define INDIRECT_CALL_MT(f, f2, f1, ...) f(__VA_ARGS__)
 #endif
 
 static const struct fib_kuid_range fib_kuid_range_unset = {
-- 
2.28.0.163.g6104cc2f0b6-goog


^ permalink raw reply related

* [PATCH net-next v3 2/2] hinic: add check for mailbox msg from VF
From: Luo bin @ 2020-08-01  2:49 UTC (permalink / raw)
  To: davem
  Cc: linux-kernel, netdev, luoxianjun, yin.yinshi, cloud.wangxiaoyun,
	chiqijun
In-Reply-To: <20200801024935.20819-1-luobin9@huawei.com>

PF should check whether the cmd from VF is supported and its content
is right before passing it to hw.

Signed-off-by: Luo bin <luobin9@huawei.com>
---
V1~V2: fix W=1 C=1 warnings

 .../net/ethernet/huawei/hinic/hinic_hw_cmdq.h |   8 +
 .../net/ethernet/huawei/hinic/hinic_hw_mbox.c | 173 +++++++++++++++++-
 .../net/ethernet/huawei/hinic/hinic_hw_mbox.h |  14 ++
 .../net/ethernet/huawei/hinic/hinic_sriov.c   |  62 ++++++-
 4 files changed, 255 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/huawei/hinic/hinic_hw_cmdq.h b/drivers/net/ethernet/huawei/hinic/hinic_hw_cmdq.h
index f40c31e1879f..9c413e963a04 100644
--- a/drivers/net/ethernet/huawei/hinic/hinic_hw_cmdq.h
+++ b/drivers/net/ethernet/huawei/hinic/hinic_hw_cmdq.h
@@ -31,6 +31,10 @@
 			(((u64)(val) & HINIC_CMDQ_CTXT_##member##_MASK) \
 			 << HINIC_CMDQ_CTXT_##member##_SHIFT)
 
+#define HINIC_CMDQ_CTXT_PAGE_INFO_GET(val, member)	\
+			(((u64)(val) >> HINIC_CMDQ_CTXT_##member##_SHIFT) \
+			 & HINIC_CMDQ_CTXT_##member##_MASK)
+
 #define HINIC_CMDQ_CTXT_PAGE_INFO_CLEAR(val, member)    \
 			((val) & (~((u64)HINIC_CMDQ_CTXT_##member##_MASK \
 			 << HINIC_CMDQ_CTXT_##member##_SHIFT)))
@@ -45,6 +49,10 @@
 			(((u64)(val) & HINIC_CMDQ_CTXT_##member##_MASK) \
 			 << HINIC_CMDQ_CTXT_##member##_SHIFT)
 
+#define HINIC_CMDQ_CTXT_BLOCK_INFO_GET(val, member)	\
+			(((u64)(val) >> HINIC_CMDQ_CTXT_##member##_SHIFT) \
+			& HINIC_CMDQ_CTXT_##member##_MASK)
+
 #define HINIC_CMDQ_CTXT_BLOCK_INFO_CLEAR(val, member)   \
 			((val) & (~((u64)HINIC_CMDQ_CTXT_##member##_MASK \
 			 << HINIC_CMDQ_CTXT_##member##_SHIFT)))
diff --git a/drivers/net/ethernet/huawei/hinic/hinic_hw_mbox.c b/drivers/net/ethernet/huawei/hinic/hinic_hw_mbox.c
index 1e219adc23f6..d69f07d1da91 100644
--- a/drivers/net/ethernet/huawei/hinic/hinic_hw_mbox.c
+++ b/drivers/net/ethernet/huawei/hinic/hinic_hw_mbox.c
@@ -153,7 +153,6 @@ enum hinic_mbox_tx_status {
 			(MBOX_MSG_ID(func_to_func_mbox) + 1) & MBOX_MSG_ID_MASK)
 
 #define FUNC_ID_OFF_SET_8B		8
-#define FUNC_ID_OFF_SET_10B		10
 
 /* max message counter wait to process for one function */
 #define HINIC_MAX_MSG_CNT_TO_PROCESS	10
@@ -189,6 +188,37 @@ enum mbox_aeq_trig_type {
 	TRIGGER,
 };
 
+static bool check_func_id(struct hinic_hwdev *hwdev, u16 src_func_idx,
+			  const void *buf_in, u16 in_size, u16 offset)
+{
+	u16 func_idx;
+
+	if (in_size < offset + sizeof(func_idx)) {
+		dev_warn(&hwdev->hwif->pdev->dev,
+			 "Receive mailbox msg len: %d less than %d Bytes is invalid\n",
+			 in_size, offset);
+		return false;
+	}
+
+	func_idx = *((u16 *)((u8 *)buf_in + offset));
+
+	if (src_func_idx != func_idx) {
+		dev_warn(&hwdev->hwif->pdev->dev,
+			 "Receive mailbox function id: 0x%x not equal to msg function id: 0x%x\n",
+			 src_func_idx, func_idx);
+		return false;
+	}
+
+	return true;
+}
+
+bool hinic_mbox_check_func_id_8B(struct hinic_hwdev *hwdev, u16 func_idx,
+				 void *buf_in, u16 in_size)
+{
+	return check_func_id(hwdev, func_idx, buf_in, in_size,
+			     FUNC_ID_OFF_SET_8B);
+}
+
 /**
  * hinic_register_pf_mbox_cb - register mbox callback for pf
  * @hwdev: the pointer to hw device
@@ -1205,15 +1235,156 @@ static void free_mbox_wb_status(struct hinic_mbox_func_to_func *func_to_func)
 			  send_mbox->wb_paddr);
 }
 
+bool hinic_mbox_check_cmd_valid(struct hinic_hwdev *hwdev,
+				struct vf_cmd_check_handle *cmd_handle,
+				u16 vf_id, u8 cmd, void *buf_in,
+				u16 in_size, u8 size)
+{
+	u16 src_idx = vf_id + hinic_glb_pf_vf_offset(hwdev->hwif);
+	int i;
+
+	for (i = 0; i < size; i++) {
+		if (cmd == cmd_handle[i].cmd) {
+			if (cmd_handle[i].check_cmd)
+				return cmd_handle[i].check_cmd(hwdev, src_idx,
+							       buf_in, in_size);
+			else
+				return true;
+		}
+	}
+
+	dev_err(&hwdev->hwif->pdev->dev,
+		"PF Receive VF(%d) unsupported cmd(0x%x)\n",
+		vf_id + hinic_glb_pf_vf_offset(hwdev->hwif), cmd);
+
+	return false;
+}
+
+static bool hinic_cmdq_check_vf_ctxt(struct hinic_hwdev *hwdev,
+				     struct hinic_cmdq_ctxt *cmdq_ctxt)
+{
+	struct hinic_cmdq_ctxt_info *ctxt_info = &cmdq_ctxt->ctxt_info;
+	u64 curr_pg_pfn, wq_block_pfn;
+
+	if (cmdq_ctxt->ppf_idx != HINIC_HWIF_PPF_IDX(hwdev->hwif) ||
+	    cmdq_ctxt->cmdq_type > HINIC_MAX_CMDQ_TYPES)
+		return false;
+
+	curr_pg_pfn = HINIC_CMDQ_CTXT_PAGE_INFO_GET
+		(ctxt_info->curr_wqe_page_pfn, CURR_WQE_PAGE_PFN);
+	wq_block_pfn = HINIC_CMDQ_CTXT_BLOCK_INFO_GET
+		(ctxt_info->wq_block_pfn, WQ_BLOCK_PFN);
+	/* VF must use 0-level CLA */
+	if (curr_pg_pfn != wq_block_pfn)
+		return false;
+
+	return true;
+}
+
+static bool check_cmdq_ctxt(struct hinic_hwdev *hwdev, u16 func_idx,
+			    void *buf_in, u16 in_size)
+{
+	if (!hinic_mbox_check_func_id_8B(hwdev, func_idx, buf_in, in_size))
+		return false;
+
+	return hinic_cmdq_check_vf_ctxt(hwdev, buf_in);
+}
+
+#define HW_CTX_QPS_VALID(hw_ctxt)   \
+		((hw_ctxt)->rq_depth >= HINIC_QUEUE_MIN_DEPTH &&	\
+		(hw_ctxt)->rq_depth <= HINIC_QUEUE_MAX_DEPTH &&	\
+		(hw_ctxt)->sq_depth >= HINIC_QUEUE_MIN_DEPTH &&	\
+		(hw_ctxt)->sq_depth <= HINIC_QUEUE_MAX_DEPTH &&	\
+		(hw_ctxt)->rx_buf_sz_idx <= HINIC_MAX_RX_BUFFER_SIZE)
+
+static bool hw_ctxt_qps_param_valid(struct hinic_cmd_hw_ioctxt *hw_ctxt)
+{
+	if (HW_CTX_QPS_VALID(hw_ctxt))
+		return true;
+
+	if (!hw_ctxt->rq_depth && !hw_ctxt->sq_depth &&
+	    !hw_ctxt->rx_buf_sz_idx)
+		return true;
+
+	return false;
+}
+
+static bool check_hwctxt(struct hinic_hwdev *hwdev, u16 func_idx,
+			 void *buf_in, u16 in_size)
+{
+	struct hinic_cmd_hw_ioctxt *hw_ctxt = buf_in;
+
+	if (!hinic_mbox_check_func_id_8B(hwdev, func_idx, buf_in, in_size))
+		return false;
+
+	if (hw_ctxt->ppf_idx != HINIC_HWIF_PPF_IDX(hwdev->hwif))
+		return false;
+
+	if (hw_ctxt->set_cmdq_depth) {
+		if (hw_ctxt->cmdq_depth >= HINIC_QUEUE_MIN_DEPTH &&
+		    hw_ctxt->cmdq_depth <= HINIC_QUEUE_MAX_DEPTH)
+			return true;
+
+		return false;
+	}
+
+	return hw_ctxt_qps_param_valid(hw_ctxt);
+}
+
+static bool check_set_wq_page_size(struct hinic_hwdev *hwdev, u16 func_idx,
+				   void *buf_in, u16 in_size)
+{
+	struct hinic_wq_page_size *page_size_info = buf_in;
+
+	if (!hinic_mbox_check_func_id_8B(hwdev, func_idx, buf_in, in_size))
+		return false;
+
+	if (page_size_info->ppf_idx != HINIC_HWIF_PPF_IDX(hwdev->hwif))
+		return false;
+
+	if (((1U << page_size_info->page_size) * SZ_4K) !=
+	    HINIC_DEFAULT_WQ_PAGE_SIZE)
+		return false;
+
+	return true;
+}
+
+static struct vf_cmd_check_handle hw_cmd_support_vf[] = {
+	{HINIC_COMM_CMD_START_FLR, hinic_mbox_check_func_id_8B},
+	{HINIC_COMM_CMD_DMA_ATTR_SET, hinic_mbox_check_func_id_8B},
+	{HINIC_COMM_CMD_CMDQ_CTXT_SET, check_cmdq_ctxt},
+	{HINIC_COMM_CMD_CMDQ_CTXT_GET, check_cmdq_ctxt},
+	{HINIC_COMM_CMD_HWCTXT_SET, check_hwctxt},
+	{HINIC_COMM_CMD_HWCTXT_GET, check_hwctxt},
+	{HINIC_COMM_CMD_SQ_HI_CI_SET, hinic_mbox_check_func_id_8B},
+	{HINIC_COMM_CMD_RES_STATE_SET, hinic_mbox_check_func_id_8B},
+	{HINIC_COMM_CMD_IO_RES_CLEAR, hinic_mbox_check_func_id_8B},
+	{HINIC_COMM_CMD_CEQ_CTRL_REG_WR_BY_UP, hinic_mbox_check_func_id_8B},
+	{HINIC_COMM_CMD_MSI_CTRL_REG_WR_BY_UP, hinic_mbox_check_func_id_8B},
+	{HINIC_COMM_CMD_MSI_CTRL_REG_RD_BY_UP, hinic_mbox_check_func_id_8B},
+	{HINIC_COMM_CMD_L2NIC_RESET, hinic_mbox_check_func_id_8B},
+	{HINIC_COMM_CMD_PAGESIZE_SET, check_set_wq_page_size},
+};
+
 static int comm_pf_mbox_handler(void *handle, u16 vf_id, u8 cmd, void *buf_in,
 				u16 in_size, void *buf_out, u16 *out_size)
 {
+	u8 size = ARRAY_SIZE(hw_cmd_support_vf);
 	struct hinic_hwdev *hwdev = handle;
 	struct hinic_pfhwdev *pfhwdev;
 	int err = 0;
 
 	pfhwdev = container_of(hwdev, struct hinic_pfhwdev, hwdev);
 
+	if (!hinic_mbox_check_cmd_valid(handle, hw_cmd_support_vf, vf_id, cmd,
+					buf_in, in_size, size)) {
+		dev_err(&hwdev->hwif->pdev->dev,
+			"PF Receive VF: %d common cmd: 0x%x or mbox len: 0x%x is invalid\n",
+			vf_id + hinic_glb_pf_vf_offset(hwdev->hwif), cmd,
+			in_size);
+		return HINIC_MBOX_VF_CMD_ERROR;
+	}
+
 	if (cmd == HINIC_COMM_CMD_START_FLR) {
 		*out_size = 0;
 	} else {
diff --git a/drivers/net/ethernet/huawei/hinic/hinic_hw_mbox.h b/drivers/net/ethernet/huawei/hinic/hinic_hw_mbox.h
index 0618fe515d9c..46953190d29e 100644
--- a/drivers/net/ethernet/huawei/hinic/hinic_hw_mbox.h
+++ b/drivers/net/ethernet/huawei/hinic/hinic_hw_mbox.h
@@ -24,6 +24,12 @@
 
 #define MAX_FUNCTION_NUM		512
 
+struct vf_cmd_check_handle {
+	u8 cmd;
+	bool (*check_cmd)(struct hinic_hwdev *hwdev, u16 src_func_idx,
+			  void *buf_in, u16 in_size);
+};
+
 enum hinic_mbox_ack_type {
 	MBOX_ACK,
 	MBOX_NO_ACK,
@@ -122,6 +128,14 @@ struct vf_cmd_msg_handle {
 			       void *buf_out, u16 *out_size);
 };
 
+bool hinic_mbox_check_func_id_8B(struct hinic_hwdev *hwdev, u16 func_idx,
+				 void *buf_in, u16 in_size);
+
+bool hinic_mbox_check_cmd_valid(struct hinic_hwdev *hwdev,
+				struct vf_cmd_check_handle *cmd_handle,
+				u16 vf_id, u8 cmd, void *buf_in,
+				u16 in_size, u8 size);
+
 int hinic_register_pf_mbox_cb(struct hinic_hwdev *hwdev,
 			      enum hinic_mod_type mod,
 			      hinic_pf_mbox_cb callback);
diff --git a/drivers/net/ethernet/huawei/hinic/hinic_sriov.c b/drivers/net/ethernet/huawei/hinic/hinic_sriov.c
index 1d8a115cb9ec..4d63680f2143 100644
--- a/drivers/net/ethernet/huawei/hinic/hinic_sriov.c
+++ b/drivers/net/ethernet/huawei/hinic/hinic_sriov.c
@@ -429,6 +429,18 @@ static int hinic_get_vf_link_status_msg_handler(void *hwdev, u16 vf_id,
 	return 0;
 }
 
+static bool check_func_table(struct hinic_hwdev *hwdev, u16 func_idx,
+			     void *buf_in, u16 in_size)
+{
+	struct hinic_cmd_fw_ctxt *function_table = buf_in;
+
+	if (!hinic_mbox_check_func_id_8B(hwdev, func_idx, buf_in, in_size) ||
+	    !function_table->rx_buf_sz)
+		return false;
+
+	return true;
+}
+
 static struct vf_cmd_msg_handle nic_vf_cmd_msg_handler[] = {
 	{HINIC_PORT_CMD_VF_REGISTER, hinic_register_vf_msg_handler},
 	{HINIC_PORT_CMD_VF_UNREGISTER, hinic_unregister_vf_msg_handler},
@@ -439,6 +451,45 @@ static struct vf_cmd_msg_handle nic_vf_cmd_msg_handler[] = {
 	{HINIC_PORT_CMD_GET_LINK_STATE, hinic_get_vf_link_status_msg_handler},
 };
 
+static struct vf_cmd_check_handle nic_cmd_support_vf[] = {
+	{HINIC_PORT_CMD_VF_REGISTER, NULL},
+	{HINIC_PORT_CMD_VF_UNREGISTER, NULL},
+	{HINIC_PORT_CMD_CHANGE_MTU, hinic_mbox_check_func_id_8B},
+	{HINIC_PORT_CMD_ADD_VLAN, hinic_mbox_check_func_id_8B},
+	{HINIC_PORT_CMD_DEL_VLAN, hinic_mbox_check_func_id_8B},
+	{HINIC_PORT_CMD_SET_MAC, hinic_mbox_check_func_id_8B},
+	{HINIC_PORT_CMD_GET_MAC, hinic_mbox_check_func_id_8B},
+	{HINIC_PORT_CMD_DEL_MAC, hinic_mbox_check_func_id_8B},
+	{HINIC_PORT_CMD_SET_RX_MODE, hinic_mbox_check_func_id_8B},
+	{HINIC_PORT_CMD_GET_PAUSE_INFO, hinic_mbox_check_func_id_8B},
+	{HINIC_PORT_CMD_GET_LINK_STATE, hinic_mbox_check_func_id_8B},
+	{HINIC_PORT_CMD_SET_LRO, hinic_mbox_check_func_id_8B},
+	{HINIC_PORT_CMD_SET_RX_CSUM, hinic_mbox_check_func_id_8B},
+	{HINIC_PORT_CMD_SET_RX_VLAN_OFFLOAD, hinic_mbox_check_func_id_8B},
+	{HINIC_PORT_CMD_GET_VPORT_STAT, hinic_mbox_check_func_id_8B},
+	{HINIC_PORT_CMD_CLEAN_VPORT_STAT, hinic_mbox_check_func_id_8B},
+	{HINIC_PORT_CMD_GET_RSS_TEMPLATE_INDIR_TBL,
+	 hinic_mbox_check_func_id_8B},
+	{HINIC_PORT_CMD_SET_RSS_TEMPLATE_TBL, hinic_mbox_check_func_id_8B},
+	{HINIC_PORT_CMD_GET_RSS_TEMPLATE_TBL, hinic_mbox_check_func_id_8B},
+	{HINIC_PORT_CMD_SET_RSS_HASH_ENGINE, hinic_mbox_check_func_id_8B},
+	{HINIC_PORT_CMD_GET_RSS_HASH_ENGINE, hinic_mbox_check_func_id_8B},
+	{HINIC_PORT_CMD_GET_RSS_CTX_TBL, hinic_mbox_check_func_id_8B},
+	{HINIC_PORT_CMD_SET_RSS_CTX_TBL, hinic_mbox_check_func_id_8B},
+	{HINIC_PORT_CMD_RSS_TEMP_MGR, hinic_mbox_check_func_id_8B},
+	{HINIC_PORT_CMD_RSS_CFG, hinic_mbox_check_func_id_8B},
+	{HINIC_PORT_CMD_FWCTXT_INIT, check_func_table},
+	{HINIC_PORT_CMD_GET_MGMT_VERSION, NULL},
+	{HINIC_PORT_CMD_SET_FUNC_STATE, hinic_mbox_check_func_id_8B},
+	{HINIC_PORT_CMD_GET_GLOBAL_QPN, hinic_mbox_check_func_id_8B},
+	{HINIC_PORT_CMD_SET_TSO, hinic_mbox_check_func_id_8B},
+	{HINIC_PORT_CMD_SET_RQ_IQ_MAP, hinic_mbox_check_func_id_8B},
+	{HINIC_PORT_CMD_LINK_STATUS_REPORT, hinic_mbox_check_func_id_8B},
+	{HINIC_PORT_CMD_UPDATE_MAC, hinic_mbox_check_func_id_8B},
+	{HINIC_PORT_CMD_GET_CAP, hinic_mbox_check_func_id_8B},
+	{HINIC_PORT_CMD_GET_LINK_MODE, hinic_mbox_check_func_id_8B},
+};
+
 #define CHECK_IPSU_15BIT	0X8000
 
 static
@@ -972,6 +1023,7 @@ int hinic_ndo_set_vf_link_state(struct net_device *netdev, int vf_id, int link)
 static int nic_pf_mbox_handler(void *hwdev, u16 vf_id, u8 cmd, void *buf_in,
 			       u16 in_size, void *buf_out, u16 *out_size)
 {
+	u8 size = ARRAY_SIZE(nic_cmd_support_vf);
 	struct vf_cmd_msg_handle *vf_msg_handle;
 	struct hinic_hwdev *dev = hwdev;
 	struct hinic_func_to_io *nic_io;
@@ -980,7 +1032,15 @@ static int nic_pf_mbox_handler(void *hwdev, u16 vf_id, u8 cmd, void *buf_in,
 	u32 i;
 
 	if (!hwdev)
-		return -EFAULT;
+		return -EINVAL;
+
+	if (!hinic_mbox_check_cmd_valid(hwdev, nic_cmd_support_vf, vf_id, cmd,
+					buf_in, in_size, size)) {
+		dev_err(&dev->hwif->pdev->dev,
+			"PF Receive VF nic cmd: 0x%x, mbox len: 0x%x is invalid\n",
+			cmd, in_size);
+		return HINIC_MBOX_VF_CMD_ERROR;
+	}
 
 	pfhwdev = container_of(dev, struct hinic_pfhwdev, hwdev);
 	nic_io = &dev->func_to_io;
-- 
2.17.1


^ permalink raw reply related

* [PATCH net-next v3 1/2] hinic: add generating mailbox random index support
From: Luo bin @ 2020-08-01  2:49 UTC (permalink / raw)
  To: davem
  Cc: linux-kernel, netdev, luoxianjun, yin.yinshi, cloud.wangxiaoyun,
	chiqijun
In-Reply-To: <20200801024935.20819-1-luobin9@huawei.com>

add support to generate mailbox random id of VF to ensure that
mailbox messages PF received are from the correct VF.

Signed-off-by: Luo bin <luobin9@huawei.com>
---
V2~V3 fix review opinions pointed out by Jakub

 .../net/ethernet/huawei/hinic/hinic_hw_dev.h  |  13 ++
 .../net/ethernet/huawei/hinic/hinic_hw_mbox.c | 135 ++++++++++++++++++
 .../net/ethernet/huawei/hinic/hinic_hw_mbox.h |   8 ++
 .../net/ethernet/huawei/hinic/hinic_hw_mgmt.h |   2 +
 .../net/ethernet/huawei/hinic/hinic_sriov.c   |   7 +
 5 files changed, 165 insertions(+)

diff --git a/drivers/net/ethernet/huawei/hinic/hinic_hw_dev.h b/drivers/net/ethernet/huawei/hinic/hinic_hw_dev.h
index 2fb5f784f116..dc6e645f2689 100644
--- a/drivers/net/ethernet/huawei/hinic/hinic_hw_dev.h
+++ b/drivers/net/ethernet/huawei/hinic/hinic_hw_dev.h
@@ -28,6 +28,8 @@
 #define HINIC_MGMT_STATUS_EXIST				0x6
 #define HINIC_MGMT_CMD_UNSUPPORTED			0xFF
 
+#define HINIC_CMD_VER_FUNC_ID				2
+
 struct hinic_cap {
 	u16     max_qps;
 	u16     num_qps;
@@ -313,6 +315,17 @@ struct hinic_msix_config {
 	u8	rsvd1[3];
 };
 
+struct hinic_set_random_id {
+	u8    status;
+	u8    version;
+	u8    rsvd0[6];
+
+	u8    vf_in_pf;
+	u8    rsvd1;
+	u16   func_idx;
+	u32   random_id;
+};
+
 struct hinic_board_info {
 	u32	board_type;
 	u32	port_num;
diff --git a/drivers/net/ethernet/huawei/hinic/hinic_hw_mbox.c b/drivers/net/ethernet/huawei/hinic/hinic_hw_mbox.c
index 47c93f946b94..1e219adc23f6 100644
--- a/drivers/net/ethernet/huawei/hinic/hinic_hw_mbox.c
+++ b/drivers/net/ethernet/huawei/hinic/hinic_hw_mbox.c
@@ -486,6 +486,111 @@ static void recv_mbox_handler(struct hinic_mbox_func_to_func *func_to_func,
 	kfree(rcv_mbox_temp);
 }
 
+static int set_vf_mbox_random_id(struct hinic_hwdev *hwdev, u16 func_id)
+{
+	struct hinic_mbox_func_to_func *func_to_func = hwdev->func_to_func;
+	struct hinic_set_random_id rand_info = {0};
+	u16 out_size = sizeof(rand_info);
+	struct hinic_pfhwdev *pfhwdev;
+	int ret;
+
+	pfhwdev = container_of(hwdev, struct hinic_pfhwdev, hwdev);
+
+	rand_info.version = HINIC_CMD_VER_FUNC_ID;
+	rand_info.func_idx = func_id;
+	rand_info.vf_in_pf = func_id - hinic_glb_pf_vf_offset(hwdev->hwif);
+	rand_info.random_id = get_random_u32();
+
+	func_to_func->vf_mbx_rand_id[func_id] = rand_info.random_id;
+
+	ret = hinic_msg_to_mgmt(&pfhwdev->pf_to_mgmt, HINIC_MOD_COMM,
+				HINIC_MGMT_CMD_SET_VF_RANDOM_ID,
+				&rand_info, sizeof(rand_info),
+				&rand_info, &out_size, HINIC_MGMT_MSG_SYNC);
+	if ((rand_info.status != HINIC_MGMT_CMD_UNSUPPORTED &&
+	     rand_info.status) || !out_size || ret) {
+		dev_err(&hwdev->hwif->pdev->dev, "Set VF random id failed, err: %d, status: 0x%x, out size: 0x%x\n",
+			ret, rand_info.status, out_size);
+		return -EIO;
+	}
+
+	if (rand_info.status == HINIC_MGMT_CMD_UNSUPPORTED)
+		return rand_info.status;
+
+	func_to_func->vf_mbx_old_rand_id[func_id] =
+				func_to_func->vf_mbx_rand_id[func_id];
+
+	return 0;
+}
+
+static void update_random_id_work_handler(struct work_struct *work)
+{
+	struct hinic_mbox_work *mbox_work =
+			container_of(work, struct hinic_mbox_work, work);
+	struct hinic_mbox_func_to_func *func_to_func;
+	u16 src = mbox_work->src_func_idx;
+
+	func_to_func = mbox_work->func_to_func;
+
+	if (set_vf_mbox_random_id(func_to_func->hwdev, src))
+		dev_warn(&func_to_func->hwdev->hwif->pdev->dev, "Update VF id: 0x%x random id failed\n",
+			 mbox_work->src_func_idx);
+
+	kfree(mbox_work);
+}
+
+static bool check_vf_mbox_random_id(struct hinic_mbox_func_to_func *func_to_func,
+				    u8 *header)
+{
+	struct hinic_hwdev *hwdev = func_to_func->hwdev;
+	struct hinic_mbox_work *mbox_work = NULL;
+	u64 mbox_header = *((u64 *)header);
+	u16 offset, src;
+	u32 random_id;
+	int vf_in_pf;
+
+	src = HINIC_MBOX_HEADER_GET(mbox_header, SRC_GLB_FUNC_IDX);
+
+	if (IS_PF_OR_PPF_SRC(src) || !func_to_func->support_vf_random)
+		return true;
+
+	if (!HINIC_IS_PPF(hwdev->hwif)) {
+		offset = hinic_glb_pf_vf_offset(hwdev->hwif);
+		vf_in_pf = src - offset;
+
+		if (vf_in_pf < 1 || vf_in_pf > hwdev->nic_cap.max_vf) {
+			dev_warn(&hwdev->hwif->pdev->dev,
+				 "Receive vf id(0x%x) is invalid, vf id should be from 0x%x to 0x%x\n",
+				 src, offset + 1,
+				 hwdev->nic_cap.max_vf + offset);
+			return false;
+		}
+	}
+
+	random_id = be32_to_cpu(*(u32 *)(header + MBOX_SEG_LEN +
+					 MBOX_HEADER_SZ));
+
+	if (random_id == func_to_func->vf_mbx_rand_id[src] ||
+	    random_id == func_to_func->vf_mbx_old_rand_id[src])
+		return true;
+
+	dev_warn(&hwdev->hwif->pdev->dev,
+		 "The mailbox random id(0x%x) of func_id(0x%x) doesn't match with pf reservation(0x%x)\n",
+		 random_id, src, func_to_func->vf_mbx_rand_id[src]);
+
+	mbox_work = kzalloc(sizeof(*mbox_work), GFP_KERNEL);
+	if (!mbox_work)
+		return false;
+
+	mbox_work->func_to_func = func_to_func;
+	mbox_work->src_func_idx = src;
+
+	INIT_WORK(&mbox_work->work, update_random_id_work_handler);
+	queue_work(func_to_func->workq, &mbox_work->work);
+
+	return false;
+}
+
 void hinic_mbox_func_aeqe_handler(void *handle, void *header, u8 size)
 {
 	struct hinic_mbox_func_to_func *func_to_func;
@@ -504,6 +609,9 @@ void hinic_mbox_func_aeqe_handler(void *handle, void *header, u8 size)
 		return;
 	}
 
+	if (!check_vf_mbox_random_id(func_to_func, header))
+		return;
+
 	recv_mbox = (dir == HINIC_HWIF_DIRECT_SEND) ?
 		    &func_to_func->mbox_send[src] :
 		    &func_to_func->mbox_resp[src];
@@ -1210,3 +1318,30 @@ void hinic_func_to_func_free(struct hinic_hwdev *hwdev)
 
 	kfree(func_to_func);
 }
+
+int hinic_vf_mbox_random_id_init(struct hinic_hwdev *hwdev)
+{
+	u8 vf_in_pf;
+	int err = 0;
+
+	if (HINIC_IS_VF(hwdev->hwif))
+		return 0;
+
+	for (vf_in_pf = 1; vf_in_pf <= hwdev->nic_cap.max_vf; vf_in_pf++) {
+		err = set_vf_mbox_random_id(hwdev, hinic_glb_pf_vf_offset
+					    (hwdev->hwif) + vf_in_pf);
+		if (err)
+			break;
+	}
+
+	if (err == HINIC_MGMT_CMD_UNSUPPORTED) {
+		hwdev->func_to_func->support_vf_random = false;
+		err = 0;
+		dev_warn(&hwdev->hwif->pdev->dev, "Mgmt is unsupported to set VF%d random id\n",
+			 vf_in_pf - 1);
+	} else if (!err) {
+		hwdev->func_to_func->support_vf_random = true;
+	}
+
+	return err;
+}
diff --git a/drivers/net/ethernet/huawei/hinic/hinic_hw_mbox.h b/drivers/net/ethernet/huawei/hinic/hinic_hw_mbox.h
index 7b18559bfe80..0618fe515d9c 100644
--- a/drivers/net/ethernet/huawei/hinic/hinic_hw_mbox.h
+++ b/drivers/net/ethernet/huawei/hinic/hinic_hw_mbox.h
@@ -22,6 +22,8 @@
 #define HINIC_FUNC_CSR_MAILBOX_RESULT_H_OFF		0x0108
 #define HINIC_FUNC_CSR_MAILBOX_RESULT_L_OFF		0x010C
 
+#define MAX_FUNCTION_NUM		512
+
 enum hinic_mbox_ack_type {
 	MBOX_ACK,
 	MBOX_NO_ACK,
@@ -100,6 +102,10 @@ struct hinic_mbox_func_to_func {
 
 	/* lock for mbox event flag */
 	spinlock_t mbox_lock;
+
+	u32 vf_mbx_old_rand_id[MAX_FUNCTION_NUM];
+	u32 vf_mbx_rand_id[MAX_FUNCTION_NUM];
+	bool support_vf_random;
 };
 
 struct hinic_mbox_work {
@@ -151,4 +157,6 @@ int hinic_mbox_to_vf(struct hinic_hwdev *hwdev,
 		     enum hinic_mod_type mod, u16 vf_id, u8 cmd, void *buf_in,
 		     u16 in_size, void *buf_out, u16 *out_size, u32 timeout);
 
+int hinic_vf_mbox_random_id_init(struct hinic_hwdev *hwdev);
+
 #endif
diff --git a/drivers/net/ethernet/huawei/hinic/hinic_hw_mgmt.h b/drivers/net/ethernet/huawei/hinic/hinic_hw_mgmt.h
index f626100b85c1..4ca81cc838db 100644
--- a/drivers/net/ethernet/huawei/hinic/hinic_hw_mgmt.h
+++ b/drivers/net/ethernet/huawei/hinic/hinic_hw_mgmt.h
@@ -93,6 +93,8 @@ enum hinic_comm_cmd {
 
 	HINIC_COMM_CMD_WATCHDOG_INFO	= 0x56,
 
+	HINIC_MGMT_CMD_SET_VF_RANDOM_ID = 0x61,
+
 	HINIC_COMM_CMD_MAX,
 };
 
diff --git a/drivers/net/ethernet/huawei/hinic/hinic_sriov.c b/drivers/net/ethernet/huawei/hinic/hinic_sriov.c
index 141206917e4d..1d8a115cb9ec 100644
--- a/drivers/net/ethernet/huawei/hinic/hinic_sriov.c
+++ b/drivers/net/ethernet/huawei/hinic/hinic_sriov.c
@@ -1108,6 +1108,13 @@ int hinic_vf_func_init(struct hinic_hwdev *hwdev)
 	int err = 0;
 	u32 size, i;
 
+	err = hinic_vf_mbox_random_id_init(hwdev);
+	if (err) {
+		dev_err(&hwdev->hwif->pdev->dev, "Failed to init vf mbox random id, err: %d\n",
+			err);
+		return err;
+	}
+
 	nic_io = &hwdev->func_to_io;
 
 	if (HINIC_IS_VF(hwdev->hwif)) {
-- 
2.17.1


^ permalink raw reply related

* [PATCH net-next v3 0/2] hinic: mailbox channel enhancement
From: Luo bin @ 2020-08-01  2:49 UTC (permalink / raw)
  To: davem
  Cc: linux-kernel, netdev, luoxianjun, yin.yinshi, cloud.wangxiaoyun,
	chiqijun

add support to generate mailbox random id for VF to ensure that
the mailbox message from VF is valid and PF should check whether
the cmd from VF is supported before passing it to hw.

Luo bin (2):
  hinic: add generating mailbox random index support
  hinic: add check for mailbox msg from VF

 .../net/ethernet/huawei/hinic/hinic_hw_cmdq.h |   8 +
 .../net/ethernet/huawei/hinic/hinic_hw_dev.h  |  13 +
 .../net/ethernet/huawei/hinic/hinic_hw_mbox.c | 308 +++++++++++++++++-
 .../net/ethernet/huawei/hinic/hinic_hw_mbox.h |  22 ++
 .../net/ethernet/huawei/hinic/hinic_hw_mgmt.h |   2 +
 .../net/ethernet/huawei/hinic/hinic_sriov.c   |  69 +++-
 6 files changed, 420 insertions(+), 2 deletions(-)

-- 
2.17.1


^ permalink raw reply

* Re: [PATCH v2] drivers/net/wan/lapbether: Use needed_headroom instead of hard_header_len
From: Willem de Bruijn @ 2020-08-01  2:33 UTC (permalink / raw)
  To: Xie He
  Cc: David S. Miller, Jakub Kicinski, Linux Kernel Network Developers,
	LKML, Linux X25, Brian Norris
In-Reply-To: <CAJht_ENYxy4pseOO9gY=0R0bvPPvs4GKrGJOUMx6=LPwBa2+Bg@mail.gmail.com>

On Fri, Jul 31, 2020 at 4:41 PM Xie He <xie.he.0141@gmail.com> wrote:
>
> Thank you for your thorough review comment!
>
> On Fri, Jul 31, 2020 at 7:13 AM Willem de Bruijn
> <willemdebruijn.kernel@gmail.com> wrote:
> >
> > Thanks for fixing a kernel panic. The existing line was added recently
> > in commit 9dc829a135fb ("drivers/net/wan/lapbether: Fixed the value of
> > hard_header_len"). I assume a kernel with that commit reverted also
> > panics? It does looks like it would.
>
> Yes, that commit also fixed kernel panic. But that patch only fixed
> kernel panic when using AF_PACKET/DGRAM sockets. It didn't fix kernel
> panic when using AF_PACKET/RAW sockets. This patch attempts to fix
> kernel panic when using AF_PACKET/RAW sockets, too.

Ah, okay. That's good to know.

While this protocol is old and seemingly unmaintained, it probably is
still in use. But the packet interface is not the common datapath. We
have to be careful not to introduce regressions to that.

> > If this driver submits a modified packet to an underlying eth device,
> > it is akin to tunnel drivers. The hard_header_len vs needed_headroom
> > discussion also came up there recently [1]. That discussion points to
> > commit c95b819ad75b ("gre: Use needed_headroom"). So the general
> > approach in this patch is fine. Do note the point about mtu
> > calculations -- but this device just hardcodes a 1000 byte dev->mtu
> > irrespective of underlying ethernet device mtu, so I guess it has
> > bigger issues on that point.
>
> Yes, I didn't consider the issue of mtu calculation. Maybe we need to
> calculate the mtu of this device based on the underlying Ethernet
> device, too.
>
> We may also need to handle the situation where the mtu of the
> underlying Ethernet device changes.
>
> I'm not sure if the mtu of the device can be changed by the user
> without explicit support from the driver. If it can, we may also need
> to set max_mtu and min_mtu properly to prevent the user from setting
> it to invalid values.

I suggest to ignore mtu. It is out of scope of this patch, which does
address an unrelated real kernel panic.

> > But, packet sockets with SOCK_RAW have to pass a fully formed packet
> > with all the headers the ndo_start_xmit expects, i.e., it should be
> > safe for the device to just pull that many bytes. X25 requires the
> > peculiar one byte pseudo header you mention: lapbeth_xmit
> > unconditionally reads skb->data[0] and then calls skb_pull(skb, 1).
> > This could be considered the device hard header len.
>
> Yes, I agree that we can use hard_header_len (and min_header_len) to
> prevent packets shorter than 1 byte from passing.
>
> But because af_packet.c reserves a header space of needed_headroom for
> RAW sockets, but hard_header_len + needed_headroom for DGRAM sockets,
> it appears to me that af_packet.c expects hard_header_len to be the
> header length created by dev_hard_header. We can, however, set
> hard_header_len to 1 and let dev_hard_header generate a 0-sized
> header, but this makes af_packet.c to reserve an extra unused 1-byte
> header space for DGRAM sockets, and DGRAM sockets will not be
> protected by the 1-byte minimum length check like RAW sockets.

Good point.

> The best solution might be to implement header_ops for X.25 drivers
> and let dev_hard_header create this 1-byte header, so that
> hard_header_len can equal to the header length created by
> dev_hard_header. This might be the best way to fit the logic of
> af_packet.c. But this requires changing the interface of X.25 drivers
> so it might be a big change.

Agreed.

I quickly scanned the main x.25 datapath code. Specifically
x25_establish_link, x25_terminate_link and x25_send_frame. These all
write this 1 byte header. It appears to be an in-band communication
means between the network and data link layer, never actually ending
up on the wire?

Either lapbeth_xmit has to have a guard against 0 byte packets before
reading skb->data[0], or packet sockets should not be able to generate
those (is this actually possible today through PF_PACKET? not sure)

If SOCK_DGRAM has to always select one of the three values (0x00:
data, 0x01: establish, 0x02: terminate) the first seems most sensible.
Though if there is no way to establish a connection with
PF_PACKET/SOCK_DGRAM, that whole interface may still be academic.
Maybe eventually either 0x00 or 0x01 could be selected based on
lapb->state.. That however is out of scope of this fix.

Normally a fix should aim to have a Fixes: tag, but all this code
precedes git history, so that is not feasible here.

^ permalink raw reply

* Re: [PATCH net-next v2 1/2] hinic: add generating mailbox random index support
From: luobin (L) @ 2020-08-01  2:27 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: davem, linux-kernel, netdev, luoxianjun, yin.yinshi,
	cloud.wangxiaoyun, chiqijun
In-Reply-To: <20200731125212.4d58a90a@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>

On 2020/8/1 3:52, Jakub Kicinski wrote:
> On Fri, 31 Jul 2020 09:56:41 +0800 Luo bin wrote:
>> add support to generate mailbox random id of VF to ensure that
>> mailbox messages PF received are from the correct VF.
>>
>> Signed-off-by: Luo bin <luobin9@huawei.com>
> 
>> diff --git a/drivers/net/ethernet/huawei/hinic/hinic_hw_mbox.c b/drivers/net/ethernet/huawei/hinic/hinic_hw_mbox.c
>> index 47c93f946b94..c72aa8e8bce8 100644
>> --- a/drivers/net/ethernet/huawei/hinic/hinic_hw_mbox.c
>> +++ b/drivers/net/ethernet/huawei/hinic/hinic_hw_mbox.c
>> @@ -486,6 +486,111 @@ static void recv_mbox_handler(struct hinic_mbox_func_to_func *func_to_func,
>>  	kfree(rcv_mbox_temp);
>>  }
>>  
>> +static int set_vf_mbox_random_id(struct hinic_hwdev *hwdev, u16 func_id)
>> +{
>> +	struct hinic_mbox_func_to_func *func_to_func = hwdev->func_to_func;
>> +	struct hinic_set_random_id rand_info = {0};
>> +	u16 out_size = sizeof(rand_info);
>> +	struct hinic_pfhwdev *pfhwdev;
>> +	int ret;
>> +
>> +	pfhwdev = container_of(hwdev, struct hinic_pfhwdev, hwdev);
>> +
>> +	rand_info.version = HINIC_CMD_VER_FUNC_ID;
>> +	rand_info.func_idx = func_id;
>> +	rand_info.vf_in_pf = (u8)(func_id - hinic_glb_pf_vf_offset(hwdev->hwif));
> 
> this cast is unnecessary
> 
Will fix. Thanks for your review.
>> +	get_random_bytes(&rand_info.random_id, sizeof(u32));
> 
> get_random_u32()
> 
Will fix. Thanks for your review.
>> +
>> +	func_to_func->vf_mbx_rand_id[func_id] = rand_info.random_id;
>> +
>> +	ret = hinic_msg_to_mgmt(&pfhwdev->pf_to_mgmt, HINIC_MOD_COMM,
>> +				HINIC_MGMT_CMD_SET_VF_RANDOM_ID,
>> +				&rand_info, sizeof(rand_info),
>> +				&rand_info, &out_size, HINIC_MGMT_MSG_SYNC);
>> +	if ((rand_info.status != HINIC_MGMT_CMD_UNSUPPORTED &&
>> +	     rand_info.status) || !out_size || ret) {
>> +		dev_err(&hwdev->hwif->pdev->dev, "Set VF random id failed, err: %d, status: 0x%x, out size: 0x%x\n",
>> +			ret, rand_info.status, out_size);
>> +		return -EIO;
>> +	}
>> +
>> +	if (rand_info.status == HINIC_MGMT_CMD_UNSUPPORTED)
>> +		return rand_info.status;
>> +
>> +	func_to_func->vf_mbx_old_rand_id[func_id] =
>> +				func_to_func->vf_mbx_rand_id[func_id];
>> +
>> +	return 0;
>> +}
> 
>> +static bool check_vf_mbox_random_id(struct hinic_mbox_func_to_func *func_to_func,
>> +				    u8 *header)
>> +{
>> +	struct hinic_hwdev *hwdev = func_to_func->hwdev;
>> +	struct hinic_mbox_work *mbox_work = NULL;
>> +	u64 mbox_header = *((u64 *)header);
>> +	u16 offset, src;
>> +	u32 random_id;
>> +	int vf_in_pf;
>> +
>> +	src = HINIC_MBOX_HEADER_GET(mbox_header, SRC_GLB_FUNC_IDX);
>> +
>> +	if (IS_PF_OR_PPF_SRC(src) || !func_to_func->support_vf_random)
>> +		return true;
>> +
>> +	if (!HINIC_IS_PPF(hwdev->hwif)) {
>> +		offset = hinic_glb_pf_vf_offset(hwdev->hwif);
>> +		vf_in_pf = src - offset;
>> +
>> +		if (vf_in_pf < 1 || vf_in_pf > hwdev->nic_cap.max_vf) {
>> +			dev_warn(&hwdev->hwif->pdev->dev,
>> +				 "Receive vf id(0x%x) is invalid, vf id should be from 0x%x to 0x%x\n",
>> +				 src, offset + 1,
>> +				 hwdev->nic_cap.max_vf + offset);
>> +			return false;
>> +		}
>> +	}
>> +
>> +	random_id = be32_to_cpu(*(u32 *)(header + MBOX_SEG_LEN +
>> +					 MBOX_HEADER_SZ));
>> +
>> +	if (random_id == func_to_func->vf_mbx_rand_id[src] ||
>> +	    random_id == func_to_func->vf_mbx_old_rand_id[src])
> 
> What guarantees src < MAX_FUNCTION_NUM ?
> 
It has been checked if src >= MAX_FUNCTION_NUM in hinic_mbox_func_aeqe_handler before calling this function.
>> +		return true;
>> +
>> +	dev_warn(&hwdev->hwif->pdev->dev,
>> +		 "The mailbox random id(0x%x) of func_id(0x%x) doesn't match with pf reservation(0x%x)\n",
>> +		 random_id, src, func_to_func->vf_mbx_rand_id[src]);
>> +
>> +	mbox_work = kzalloc(sizeof(*mbox_work), GFP_KERNEL);
>> +	if (!mbox_work)
>> +		return false;
>> +
>> +	mbox_work->func_to_func = func_to_func;
>> +	mbox_work->src_func_idx = src;
>> +
>> +	INIT_WORK(&mbox_work->work, update_random_id_work_handler);
>> +	queue_work(func_to_func->workq, &mbox_work->work);
>> +
>> +	return false;
>> +}
> 
>> +int hinic_vf_mbox_random_id_init(struct hinic_hwdev *hwdev)
>> +{
>> +	u8 vf_in_pf;
>> +	int err = 0;
>> +
>> +	if (HINIC_IS_VF(hwdev->hwif))
>> +		return 0;
>> +
>> +	for (vf_in_pf = 1; vf_in_pf <= hwdev->nic_cap.max_vf; vf_in_pf++) {
>> +		err = set_vf_mbox_random_id(hwdev, hinic_glb_pf_vf_offset
>> +					    (hwdev->hwif) + vf_in_pf);
> 
> Parenthesis around hwdev->hwif not necessary
hwdev->hwif is the parameter of hinic_glb_pf_vf_offset function.
> 
>> +		if (err)
>> +			break;
>> +	}
>> +
>> +	if (err == HINIC_MGMT_CMD_UNSUPPORTED) {
>> +		hwdev->func_to_func->support_vf_random = false;
> 
> So all VFs need to support the feature for it to be used?
If this feature is not supported by fw, VFs can also be used, so we return success.
> 
>> +		err = 0;
>> +		dev_warn(&hwdev->hwif->pdev->dev, "Mgmt is unsupported to set VF%d random id\n",
>> +			 vf_in_pf - 1);
>> +	} else if (!err) {
>> +		hwdev->func_to_func->support_vf_random = true;
>> +		dev_info(&hwdev->hwif->pdev->dev, "PF Set VF random id success\n");
> 
> Is this info message really necessary?
I'll remove this info message. Thanks.
> 
>> +	}
> 
> .
> 

^ permalink raw reply

* [PATCH net-next] tcp: fix build fong CONFIG_MPTCP=n
From: Eric Dumazet @ 2020-08-01  2:09 UTC (permalink / raw)
  To: David S . Miller; +Cc: netdev, Eric Dumazet, Eric Dumazet, Florian Westphal

Fixes these errors:

net/ipv4/syncookies.c: In function 'tcp_get_cookie_sock':
net/ipv4/syncookies.c:216:19: error: 'struct tcp_request_sock' has no
member named 'drop_req'
  216 |   if (tcp_rsk(req)->drop_req) {
      |                   ^~
net/ipv4/syncookies.c: In function 'cookie_tcp_reqsk_alloc':
net/ipv4/syncookies.c:289:27: warning: unused variable 'treq'
[-Wunused-variable]
  289 |  struct tcp_request_sock *treq;
      |                           ^~~~
make[3]: *** [scripts/Makefile.build:280: net/ipv4/syncookies.o] Error 1
make[3]: *** Waiting for unfinished jobs....

Fixes: 9466a1ccebbe ("mptcp: enable JOIN requests even if cookies are in use")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Florian Westphal <fw@strlen.de>
---
 net/ipv4/syncookies.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/syncookies.c b/net/ipv4/syncookies.c
index 11b20474be8310d7070750a1c7b4013f2fba2f55..f0794f0232bae749244fff35d8b96b1f561a5e87 100644
--- a/net/ipv4/syncookies.c
+++ b/net/ipv4/syncookies.c
@@ -213,7 +213,7 @@ struct sock *tcp_get_cookie_sock(struct sock *sk, struct sk_buff *skb,
 		tcp_sk(child)->tsoffset = tsoff;
 		sock_rps_save_rxhash(child, skb);
 
-		if (tcp_rsk(req)->drop_req) {
+		if (rsk_drop_req(req)) {
 			refcount_set(&req->rsk_refcnt, 2);
 			return child;
 		}
@@ -286,10 +286,11 @@ struct request_sock *cookie_tcp_reqsk_alloc(const struct request_sock_ops *ops,
 					    struct sock *sk,
 					    struct sk_buff *skb)
 {
-	struct tcp_request_sock *treq;
 	struct request_sock *req;
 
 #ifdef CONFIG_MPTCP
+	struct tcp_request_sock *treq;
+
 	if (sk_is_mptcp(sk))
 		ops = &mptcp_subflow_request_sock_ops;
 #endif
-- 
2.28.0.163.g6104cc2f0b6-goog


^ permalink raw reply related

* Re: [PATCH v2 net-next 0/9] mptcp: add syncookie support
From: Eric Dumazet @ 2020-08-01  1:55 UTC (permalink / raw)
  To: David Miller, fw
  Cc: netdev, edumazet, mathew.j.martineau, matthieu.baerts, pabeni
In-Reply-To: <20200731.165627.166873468993268295.davem@davemloft.net>



On 7/31/20 4:56 PM, David Miller wrote:
> From: Florian Westphal <fw@strlen.de>
> Date: Thu, 30 Jul 2020 21:25:49 +0200
> 
>> Changes in v2:
>  ...
>> When syn-cookies are used the SYN?ACK never contains a MPTCP option,
>> because the code path that creates a request socket based on a valid
>> cookie ACK lacks the needed changes to construct MPTCP request sockets.
>>
>> After this series, if SYN carries MP_CAPABLE option, the option is not
>> cleared anymore and request socket will be reconstructed using the
>> MP_CAPABLE option data that is re-sent with the ACK.
>>
>> This means that no additional state gets encoded into the syn cookie or
>> the TCP timestamp.
>  ...
> 
> Series applied, thanks Florian.
> 

Build is broken

I had to use :

diff --git a/net/ipv4/syncookies.c b/net/ipv4/syncookies.c
index 11b20474be8310d7070750a1c7b4013f2fba2f55..f0794f0232bae749244fff35d8b96b1f561a5e87 100644
--- a/net/ipv4/syncookies.c
+++ b/net/ipv4/syncookies.c
@@ -213,7 +213,7 @@ struct sock *tcp_get_cookie_sock(struct sock *sk, struct sk_buff *skb,
                tcp_sk(child)->tsoffset = tsoff;
                sock_rps_save_rxhash(child, skb);
 
-               if (tcp_rsk(req)->drop_req) {
+               if (rsk_drop_req(req)) {
                        refcount_set(&req->rsk_refcnt, 2);
                        return child;
                }
@@ -286,10 +286,11 @@ struct request_sock *cookie_tcp_reqsk_alloc(const struct request_sock_ops *ops,
                                            struct sock *sk,
                                            struct sk_buff *skb)
 {
-       struct tcp_request_sock *treq;
        struct request_sock *req;
 
 #ifdef CONFIG_MPTCP
+       struct tcp_request_sock *treq;
+
        if (sk_is_mptcp(sk))
                ops = &mptcp_subflow_request_sock_ops;
 #endif

^ permalink raw reply related

* Re: pull-request: mac80211-next 2020-07-31
From: David Miller @ 2020-08-01  1:52 UTC (permalink / raw)
  To: johannes; +Cc: netdev, linux-wireless
In-Reply-To: <20200731165403.31899-1-johannes@sipsolutions.net>

From: Johannes Berg <johannes@sipsolutions.net>
Date: Fri, 31 Jul 2020 18:54:02 +0200

> Here's a set of patches for -next, in case we get a release
> on Sunday :-) If not I may have some more next week, but no
> point waiting now with this.
> 
> Please pull and let me know if there's any problem.

Pulled, thanks Johannes.

^ permalink raw reply

* Re: [PATCH v2 net-next 7/9] mptcp: enable JOIN requests even if cookies are in use
From: Eric Dumazet @ 2020-08-01  1:50 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netdev, mathew.j.martineau, Matthieu Baerts, Paolo Abeni
In-Reply-To: <20200730192558.25697-8-fw@strlen.de>

On Thu, Jul 30, 2020 at 12:26 PM Florian Westphal <fw@strlen.de> wrote:
>
> JOIN requests do not work in syncookie mode -- for HMAC validation, the
> peers nonce and the mptcp token (to obtain the desired connection socket
> the join is for) are required, but this information is only present in the
> initial syn.
>
> So either we need to drop all JOIN requests once a listening socket enters
> syncookie mode, or we need to store enough state to reconstruct the request
> socket later.
>
> This adds a state table (1024 entries) to store the data present in the
> MP_JOIN syn request and the random nonce used for the cookie syn/ack.
>
> When a MP_JOIN ACK passed cookie validation, the table is consulted
> to rebuild the request socket from it.
>
> An alternate approach would be to "cancel" syn-cookie mode and force
> MP_JOIN to always use a syn queue entry.
>
> However, doing so brings the backlog over the configured queue limit.
>
> v2: use req->syncookie, not (removed) want_cookie arg
>
> Suggested-by: Paolo Abeni <pabeni@redhat.com>
> Signed-off-by: Florian Westphal <fw@strlen.de>
> ---
>  net/ipv4/syncookies.c  |   6 ++
>  net/mptcp/Makefile     |   1 +
>  net/mptcp/ctrl.c       |   1 +
>  net/mptcp/protocol.h   |  20 +++++++
>  net/mptcp/subflow.c    |  14 +++++
>  net/mptcp/syncookies.c | 132 +++++++++++++++++++++++++++++++++++++++++
>  6 files changed, 174 insertions(+)
>  create mode 100644 net/mptcp/syncookies.c
>
> diff --git a/net/ipv4/syncookies.c b/net/ipv4/syncookies.c
> index 54838ee2e8d4..11b20474be83 100644
> --- a/net/ipv4/syncookies.c
> +++ b/net/ipv4/syncookies.c
> @@ -212,6 +212,12 @@ struct sock *tcp_get_cookie_sock(struct sock *sk, struct sk_buff *skb,
>                 refcount_set(&req->rsk_refcnt, 1);
>                 tcp_sk(child)->tsoffset = tsoff;
>                 sock_rps_save_rxhash(child, skb);
> +
> +               if (tcp_rsk(req)->drop_req) {
> +                       refcount_set(&req->rsk_refcnt, 2);
> +                       return child;
> +               }
> +


Hey, what happened to CONFIG_MPTCP=n ?

net/ipv4/syncookies.c: In function 'tcp_get_cookie_sock':
net/ipv4/syncookies.c:216:19: error: 'struct tcp_request_sock' has no
member named 'drop_req'
  216 |   if (tcp_rsk(req)->drop_req) {
      |                   ^~
net/ipv4/syncookies.c: In function 'cookie_tcp_reqsk_alloc':
net/ipv4/syncookies.c:289:27: warning: unused variable 'treq'
[-Wunused-variable]
  289 |  struct tcp_request_sock *treq;
      |                           ^~~~
make[3]: *** [scripts/Makefile.build:280: net/ipv4/syncookies.o] Error 1
make[3]: *** Waiting for unfinished jobs....

^ permalink raw reply

* Re: [PATCH net-next v2] rtnetlink: add support for protodown reason
From: David Miller @ 2020-08-01  1:49 UTC (permalink / raw)
  To: roopa; +Cc: kuba, netdev, nikolay
In-Reply-To: <1596242041-14347-1-git-send-email-roopa@cumulusnetworks.com>

From: Roopa Prabhu <roopa@cumulusnetworks.com>
Date: Fri, 31 Jul 2020 17:34:01 -0700

> From: Roopa Prabhu <roopa@cumulusnetworks.com>
> 
> netdev protodown is a mechanism that allows protocols to
> hold an interface down. It was initially introduced in
> the kernel to hold links down by a multihoming protocol.
> There was also an attempt to introduce protodown
> reason at the time but was rejected. protodown and protodown reason
> is supported by almost every switching and routing platform.
> It was ok for a while to live without a protodown reason.
> But, its become more critical now given more than
> one protocol may need to keep a link down on a system
> at the same time. eg: vrrp peer node, port security,
> multihoming protocol. Its common for Network operators and
> protocol developers to look for such a reason on a networking
> box (Its also known as errDisable by most networking operators)
> 
> This patch adds support for link protodown reason
> attribute. There are two ways to maintain protodown
> reasons.
> (a) enumerate every possible reason code in kernel
>     - A protocol developer has to make a request and
>       have that appear in a certain kernel version
> (b) provide the bits in the kernel, and allow user-space
> (sysadmin or NOS distributions) to manage the bit-to-reasonname
> map.
> 	- This makes extending reason codes easier (kind of like
>       the iproute2 table to vrf-name map /etc/iproute2/rt_tables.d/)
> 
> This patch takes approach (b).
> 
> a few things about the patch:
> - It treats the protodown reason bits as counter to indicate
> active protodown users
> - Since protodown attribute is already an exposed UAPI,
> the reason is not enforced on a protodown set. Its a no-op
> if not used.
> the patch follows the below algorithm:
>   - presence of reason bits set indicates protodown
>     is in use
>   - user can set protodown and protodown reason in a
>     single or multiple setlink operations
>   - setlink operation to clear protodown, will return -EBUSY
>     if there are active protodown reason bits
>   - reason is not included in link dumps if not used
> 
> example with patched iproute2:
> $cat /etc/iproute2/protodown_reasons.d/r.conf
> 0 mlag
> 1 evpn
> 2 vrrp
> 3 psecurity
> 
> $ip link set dev vxlan0 protodown on protodown_reason vrrp on
> $ip link set dev vxlan0 protodown_reason mlag on
> $ip link show
> 14: vxlan0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode
> DEFAULT group default qlen 1000
>     link/ether f6:06:be:17:91:e7 brd ff:ff:ff:ff:ff:ff protodown on <mlag,vrrp>
> 
> $ip link set dev vxlan0 protodown_reason mlag off
> $ip link set dev vxlan0 protodown off protodown_reason vrrp off
> 
> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
> ---
> v2 - remove unnecessary helper dev_get_proto_down_reason
>      - move dev->proto_down_reason to use an existing hole in struct net_device

Applied, thank you.

^ permalink raw reply

* Re: [RFC PATCH] bpftool btf: Add prefix option to dump C
From: Ian Rogers @ 2020-08-01  1:47 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, John Fastabend, KP Singh,
	Quentin Monnet, Jakub Kicinski, Jiri Olsa,
	Toke Høiland-Jørgensen, Networking, bpf, open list,
	Stanislav Fomichev
In-Reply-To: <CAEf4BzaBYaFJ3eUinS9nHeykJ0xEbZpwLts33ZDp1PT=bkyjww@mail.gmail.com>

On Tue, Jul 21, 2020 at 11:58 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Tue, Jul 21, 2020 at 10:44 PM Ian Rogers <irogers@google.com> wrote:
> >
> > When bpftool dumps types and enum members into a header file for
> > inclusion the names match those in the original source. If the same
> > header file needs to be included in the original source and the bpf
> > program, the names of structs, unions, typedefs and enum members will
> > have naming collisions.
>
> vmlinux.h is not really intended to be used from user-space, because
> it's incompatible with pretty much any other header that declares any
> type. Ideally we should make this better, but that might require some
> compiler support. We've been discussing with Yonghong extending Clang
> with a compile-time check for whether some type is defined or not,
> which would allow to guard every type and only declare it
> conditionally, if it's missing. But that's just an idea at this point.

Thanks Andrii! We're not looking at user-space code but the BPF code.
The prefix idea comes from a way to solve this problem in C++ with
namespaces:

namespace vmlinux {
#include "vmlinux.h"
}

As the BPF programs are C code then the prefix acts like the
namespace. It seems strange to need to extend the language.

> Regardless, vmlinux.h is also very much Clang-specific, and shouldn't
> work well with GCC. Could you elaborate on the specifics of the use
> case you have in mind? That could help me see what might be the right
> solution. Thanks!

So the use-case is similar to btf_iter.h:
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/tools/testing/selftests/bpf/progs/bpf_iter.h
To avoid collisions with somewhat cleaner macro or not games.

Prompted by your concern I was looking into changing bpf_iter.h to use
a prefix to show what the difference would be like. I also think that
there may be issues with our kernel and tool set up that may mean that
the prefix is unnecessary, if I fix something else. Anyway, to give an
example I needed to build the selftests but this is failing for me.
What I see is:

$ git clone git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git
$ cd bpf-next
$ make defconfig
$ cat >>.config <<EOF
CONFIG_DEBUG_INFO=y
CONFIG_DEBUG_INFO_BTF=y
EOF
$ make -j all
$ mkdir /tmp/selftests
$ make O=/tmp/selftests/ TARGETS=bpf kselftest
...
  CLANG    /tmp/selftests//kselftest/bpf/tools/build/bpftool/profiler.bpf.o
skeleton/profiler.bpf.c:18:21: error: invalid application of 'sizeof'
to an incomplete type 'struct bpf_perf_event_value'
        __uint(value_size, sizeof(struct bpf_perf_event_value));
                           ^     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Checking with bpftool the vmlinux lacks struct bpf_perf_event_value
but as this is unconditionally defined in bpf.h this seems wrong. Do
you have any suggestions and getting a working build?

> > To avoid these collisions an approach is to redeclare the header file
> > types and enum members, which leads to duplication and possible
> > inconsistencies. Another approach is to use preprocessor macros
> > to rename conflicting names, but this can be cumbersome if there are
> > many conflicts.
> >
> > This patch adds a prefix option for the dumped names. Use of this option
> > can avoid name conflicts and compile time errors.
> >
> > Signed-off-by: Ian Rogers <irogers@google.com>
> > ---
> >  .../bpf/bpftool/Documentation/bpftool-btf.rst |  7 ++++++-
> >  tools/bpf/bpftool/btf.c                       | 18 ++++++++++++++---
> >  tools/lib/bpf/btf.h                           |  1 +
> >  tools/lib/bpf/btf_dump.c                      | 20 +++++++++++++------
> >  4 files changed, 36 insertions(+), 10 deletions(-)
> >
>
> [...]
>
> > diff --git a/tools/lib/bpf/btf.h b/tools/lib/bpf/btf.h
> > index 491c7b41ffdc..fea4baab00bd 100644
> > --- a/tools/lib/bpf/btf.h
> > +++ b/tools/lib/bpf/btf.h
> > @@ -117,6 +117,7 @@ struct btf_dump;
> >
> >  struct btf_dump_opts {
> >         void *ctx;
> > +       const char *name_prefix;
> >  };
>
> BTW, we can't do that, this breaks ABI. btf_dump_opts were added
> before we understood the problem of backward/forward  compatibility of
> libbpf APIs, unfortunately.

This could be fixed by adding a "new" API for the parameter, which
would be unfortunate compared to just amending the existing API. There
may be solutions that are less duplicative.

Thanks,
Ian

> >
> >  typedef void (*btf_dump_printf_fn_t)(void *ctx, const char *fmt, va_list args);
> > diff --git a/tools/lib/bpf/btf_dump.c b/tools/lib/bpf/btf_dump.c
> > index e1c344504cae..baf2b4d82e1e 100644
> > --- a/tools/lib/bpf/btf_dump.c
> > +++ b/tools/lib/bpf/btf_dump.c
>
> [...]

^ permalink raw reply

* Re: [PATCH net-next] Add Mellanox BlueField Gigabit Ethernet driver
From: Andrew Lunn @ 2020-08-01  1:14 UTC (permalink / raw)
  To: Asmaa Mnebhi
  Cc: David Thompson, netdev@vger.kernel.org, davem@davemloft.net,
	kuba@kernel.org, Jiri Pirko
In-Reply-To: <VI1PR05MB4110ACD3FE86FD3DF5F59D84DA4E0@VI1PR05MB4110.eurprd05.prod.outlook.com>

> > > > +static int mlxbf_gige_mdio_read(struct mii_bus *bus, int phy_add,
> > > > +int
> > >
> > > > +phy_reg) {
> > >
> > > > +         struct mlxbf_gige *priv = bus->priv;
> > >
> > > > +         u32 cmd;
> > >
> > > > +         u32 ret;
> > >
> > > > +
> > >
> > > > +         /* If the lock is held by something else, drop the request.
> > >
> > > > +         * If the lock is cleared, that means the busy bit was cleared.
> > >
> > > > +         */
> > >
> > >
> > >
> > > How can this happen? The mdio core has a mutex which prevents parallel
> > access?
> > >
> > >
> > >
> > > This is a HW Lock. It is an actual register. So another HW entity can
> > > be holding that lock and reading/changing the values in the HW registers.
> > 
> > You have not explains how that can happen? Is there something in the driver
> > i missed which takes a backdoor to read/write MDIO transactions?
> 
> Ah ok! There is a HW entity (called YU) within the BlueField which is connected to the PHY device.
> I think the YU is what you are calling "backdoor" here. The YU contains several registers which control reads/writes
> To the PHY. So it is like an extra layer for reading MDIO registers. One of the YU registers is the gateway register (aka GW or
> MLXBF_GIGE_MDIO_GW_OFFSET in the code). If the GW register's LOCK bit is not cleared, we cannot write anything to the actual PHY MDIO registers.
> Did I answer your question?

Nope.

How can two transactions happen at the same time, causing this lock
bit to be locked? Given that the MDIO core has a mutex and serialises
all transactions. How can the lock bit every be set?

> > > > +         ret = mlxbf_gige_mdio_poll_bit(priv,
> > > > + MLXBF_GIGE_MDIO_GW_LOCK_MASK);
> > >
> > > > +         if (ret)
> > >
> > > > +                       return -EBUSY;
> > >
> > >
> > >
> > > PHY drivers are not going to like that. They are not going to retry.
> > > What is likely to happen is that phylib moves into the ERROR state,
> > > and the PHY driver grinds to a halt.
> > >
> > >
> > >
> > > This is a fairly quick HW transaction. So I don’t think it would cause
> > > and issue for the PHY drivers. In this case, we use the micrel
> > > KSZ9031. We haven’t seen issues.
> > 
> > So you have happy to debug hard to find and reproduce issues when it does
> > happen? Or would you like to spend a little bit of time now and just prevent
> > it happening at all?
> 
> I think I misunderstood your comment. Did you ask why we are polling here? Or that we shouldn't be returning -EBUSY?

I think you should not be returning EBUSY. If it every happens, bad
things will happen.

This lock bit seems to server no purpose. Software will ensure that
transactions are serialized. If it serves no purpose, just ensure it
is unlocked at probe time, and then ignore it. If you ignore it, you
will never return -EBUSY and so bad things will never happen.

Just because hardware exists does not mean you have to use it or that
it adds any value.

       Andrew

^ permalink raw reply

* Re: [PATCH v6 bpf-next 0/6] bpf: tailcalls in BPF subprograms
From: Daniel Borkmann @ 2020-08-01  1:03 UTC (permalink / raw)
  To: Maciej Fijalkowski, ast; +Cc: bpf, netdev, bjorn.topel, magnus.karlsson
In-Reply-To: <20200731000324.2253-1-maciej.fijalkowski@intel.com>

On 7/31/20 2:03 AM, Maciej Fijalkowski wrote:
> v5->v6:
> - propagate only those poke descriptors that individual subprogram is
>    actually using (Daniel)
> - drop the cumbersome check if poke desc got filled in map_poke_run()
> - move poke->ip renaming in bpf_jit_add_poke_descriptor() from patch 4
>    to patch 3 to provide bisectability (Daniel)

I did a basic test with Cilium on K8s with this set, spawning a few Pods
and checking connectivity & whether we're not crashing since it has bit more
elaborate tail call use. So far so good. I was inclined to push the series
out, but there is one more issue I noticed and didn't notice earlier when
reviewing, and that is overall stack size:

What happens when you create a single program that has nested BPF to BPF
calls e.g. either up to the maximum nesting or one call that is using up
the max stack size which is then doing another BPF to BPF call that contains
the tail call. In the tail call map, you have the same program in there.
This means we create a worst case stack from BPF size of max_stack_size *
max_tail_call_size, that is, 512*32. So that adds 16k worst case. For x86
we have a stack of arch/x86/include/asm/page_64_types.h:

   #define THREAD_SIZE_ORDER       (2 + KASAN_STACK_ORDER)
  #define THREAD_SIZE  (PAGE_SIZE << THREAD_SIZE_ORDER)

So we end up with 16k in a typical case. And this will cause kernel stack
overflow; I'm at least not seeing where we handle this situation in the
set. Hm, need to think more, but maybe this needs tracking of max stack
across tail calls to force an upper limit..

Thanks,
Daniel

^ permalink raw reply

* Re: linux-next: Tree for Jul 31 (net/decnet/ & FIB_RULES)
From: Randy Dunlap @ 2020-08-01  0:50 UTC (permalink / raw)
  To: Stephen Rothwell
  Cc: Linux Next Mailing List, Linux Kernel Mailing List,
	netdev@vger.kernel.org, linux-decnet-user, Brian Vazquez,
	David S. Miller
In-Reply-To: <20200801103507.03ae069b@canb.auug.org.au>

On 7/31/20 5:35 PM, Stephen Rothwell wrote:
> Hi Randy,
> 
> On Fri, 31 Jul 2020 08:53:09 -0700 Randy Dunlap <rdunlap@infradead.org> wrote:
>>
>> on i386:
>>
>> ld: net/core/fib_rules.o: in function `fib_rules_lookup':
>> fib_rules.c:(.text+0x16b8): undefined reference to `fib4_rule_match'
>> ld: fib_rules.c:(.text+0x16bf): undefined reference to `fib4_rule_match'
>> ld: fib_rules.c:(.text+0x170d): undefined reference to `fib4_rule_action'
>> ld: fib_rules.c:(.text+0x171e): undefined reference to `fib4_rule_action'
>> ld: fib_rules.c:(.text+0x1751): undefined reference to `fib4_rule_suppress'
>> ld: fib_rules.c:(.text+0x175d): undefined reference to `fib4_rule_suppress'
>>
>> CONFIG_DECNET=y
>> CONFIG_DECNET_ROUTER=y
>>
>> DECNET_ROUTER selects FIB_RULES.
> 
> I assume that CONFIG_IP_MULTIPLE_TABLES was not set for that build?

Correct.

> Caused by commit
> 
>   b9aaec8f0be5 ("fib: use indirect call wrappers in the most common fib_rules_ops")
> 
> from the net-next tree.

thanks.

-- 
~Randy


^ permalink raw reply

* Re: linux-next: Tree for Jul 31 (net/decnet/ & FIB_RULES)
From: Stephen Rothwell @ 2020-08-01  0:35 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: Linux Next Mailing List, Linux Kernel Mailing List,
	netdev@vger.kernel.org, linux-decnet-user, Brian Vazquez,
	David S. Miller
In-Reply-To: <4c6abcd0-e51b-0cf3-92de-5538c366e685@infradead.org>

[-- Attachment #1: Type: text/plain, Size: 959 bytes --]

Hi Randy,

On Fri, 31 Jul 2020 08:53:09 -0700 Randy Dunlap <rdunlap@infradead.org> wrote:
>
> on i386:
> 
> ld: net/core/fib_rules.o: in function `fib_rules_lookup':
> fib_rules.c:(.text+0x16b8): undefined reference to `fib4_rule_match'
> ld: fib_rules.c:(.text+0x16bf): undefined reference to `fib4_rule_match'
> ld: fib_rules.c:(.text+0x170d): undefined reference to `fib4_rule_action'
> ld: fib_rules.c:(.text+0x171e): undefined reference to `fib4_rule_action'
> ld: fib_rules.c:(.text+0x1751): undefined reference to `fib4_rule_suppress'
> ld: fib_rules.c:(.text+0x175d): undefined reference to `fib4_rule_suppress'
> 
> CONFIG_DECNET=y
> CONFIG_DECNET_ROUTER=y
> 
> DECNET_ROUTER selects FIB_RULES.

I assume that CONFIG_IP_MULTIPLE_TABLES was not set for that build?

Caused by commit

  b9aaec8f0be5 ("fib: use indirect call wrappers in the most common fib_rules_ops")

from the net-next tree.
-- 
Cheers,
Stephen Rothwell

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* [PATCH net-next v2] rtnetlink: add support for protodown reason
From: Roopa Prabhu @ 2020-08-01  0:34 UTC (permalink / raw)
  To: davem, kuba; +Cc: netdev, nikolay

From: Roopa Prabhu <roopa@cumulusnetworks.com>

netdev protodown is a mechanism that allows protocols to
hold an interface down. It was initially introduced in
the kernel to hold links down by a multihoming protocol.
There was also an attempt to introduce protodown
reason at the time but was rejected. protodown and protodown reason
is supported by almost every switching and routing platform.
It was ok for a while to live without a protodown reason.
But, its become more critical now given more than
one protocol may need to keep a link down on a system
at the same time. eg: vrrp peer node, port security,
multihoming protocol. Its common for Network operators and
protocol developers to look for such a reason on a networking
box (Its also known as errDisable by most networking operators)

This patch adds support for link protodown reason
attribute. There are two ways to maintain protodown
reasons.
(a) enumerate every possible reason code in kernel
    - A protocol developer has to make a request and
      have that appear in a certain kernel version
(b) provide the bits in the kernel, and allow user-space
(sysadmin or NOS distributions) to manage the bit-to-reasonname
map.
	- This makes extending reason codes easier (kind of like
      the iproute2 table to vrf-name map /etc/iproute2/rt_tables.d/)

This patch takes approach (b).

a few things about the patch:
- It treats the protodown reason bits as counter to indicate
active protodown users
- Since protodown attribute is already an exposed UAPI,
the reason is not enforced on a protodown set. Its a no-op
if not used.
the patch follows the below algorithm:
  - presence of reason bits set indicates protodown
    is in use
  - user can set protodown and protodown reason in a
    single or multiple setlink operations
  - setlink operation to clear protodown, will return -EBUSY
    if there are active protodown reason bits
  - reason is not included in link dumps if not used

example with patched iproute2:
$cat /etc/iproute2/protodown_reasons.d/r.conf
0 mlag
1 evpn
2 vrrp
3 psecurity

$ip link set dev vxlan0 protodown on protodown_reason vrrp on
$ip link set dev vxlan0 protodown_reason mlag on
$ip link show
14: vxlan0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode
DEFAULT group default qlen 1000
    link/ether f6:06:be:17:91:e7 brd ff:ff:ff:ff:ff:ff protodown on <mlag,vrrp>

$ip link set dev vxlan0 protodown_reason mlag off
$ip link set dev vxlan0 protodown off protodown_reason vrrp off

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
---
v2 - remove unnecessary helper dev_get_proto_down_reason
     - move dev->proto_down_reason to use an existing hole in struct net_device
 include/linux/netdevice.h    |   4 ++
 include/uapi/linux/if_link.h |  10 ++++
 net/core/dev.c               |  25 ++++++++++
 net/core/rtnetlink.c         | 113 +++++++++++++++++++++++++++++++++++++++++--
 4 files changed, 147 insertions(+), 5 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index ac2cd3f..ba0fa6b 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2058,6 +2058,8 @@ struct net_device {
 	struct timer_list	watchdog_timer;
 	int			watchdog_timeo;
 
+	u32                     proto_down_reason;
+
 	struct list_head	todo_list;
 	int __percpu		*pcpu_refcnt;
 
@@ -3810,6 +3812,8 @@ int dev_get_port_parent_id(struct net_device *dev,
 bool netdev_port_same_parent_id(struct net_device *a, struct net_device *b);
 int dev_change_proto_down(struct net_device *dev, bool proto_down);
 int dev_change_proto_down_generic(struct net_device *dev, bool proto_down);
+void dev_change_proto_down_reason(struct net_device *dev, unsigned long mask,
+				  u32 value);
 struct sk_buff *validate_xmit_skb_list(struct sk_buff *skb, struct net_device *dev, bool *again);
 struct sk_buff *dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
 				    struct netdev_queue *txq, int *ret);
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 63af646..7fba4de 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -170,12 +170,22 @@ enum {
 	IFLA_PROP_LIST,
 	IFLA_ALT_IFNAME, /* Alternative ifname */
 	IFLA_PERM_ADDRESS,
+	IFLA_PROTO_DOWN_REASON,
 	__IFLA_MAX
 };
 
 
 #define IFLA_MAX (__IFLA_MAX - 1)
 
+enum {
+	IFLA_PROTO_DOWN_REASON_UNSPEC,
+	IFLA_PROTO_DOWN_REASON_MASK,	/* u32, mask for reason bits */
+	IFLA_PROTO_DOWN_REASON_VALUE,   /* u32, reason bit value */
+
+	__IFLA_PROTO_DOWN_REASON_CNT,
+	IFLA_PROTO_DOWN_REASON_MAX = __IFLA_PROTO_DOWN_REASON_CNT - 1
+};
+
 /* backwards compatibility for userspace */
 #ifndef __KERNEL__
 #define IFLA_RTA(r)  ((struct rtattr*)(((char*)(r)) + NLMSG_ALIGN(sizeof(struct ifinfomsg))))
diff --git a/net/core/dev.c b/net/core/dev.c
index fe2e387..a1f18bb 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -8716,6 +8716,31 @@ int dev_change_proto_down_generic(struct net_device *dev, bool proto_down)
 }
 EXPORT_SYMBOL(dev_change_proto_down_generic);
 
+/**
+ *	dev_change_proto_down_reason - proto down reason
+ *
+ *	@dev: device
+ *	@mask: proto down mask
+ *	@value: proto down value
+ */
+void dev_change_proto_down_reason(struct net_device *dev, unsigned long mask,
+				  u32 value)
+{
+	int b;
+
+	if (!mask) {
+		dev->proto_down_reason = value;
+	} else {
+		for_each_set_bit(b, &mask, 32) {
+			if (value & (1 << b))
+				dev->proto_down_reason |= BIT(b);
+			else
+				dev->proto_down_reason &= ~BIT(b);
+		}
+	}
+}
+EXPORT_SYMBOL(dev_change_proto_down_reason);
+
 u32 __dev_xdp_query(struct net_device *dev, bpf_op_t bpf_op,
 		    enum bpf_netdev_command cmd)
 {
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 85a4b01..a54c3e0 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1000,6 +1000,16 @@ static size_t rtnl_prop_list_size(const struct net_device *dev)
 	return size;
 }
 
+static size_t rtnl_proto_down_size(const struct net_device *dev)
+{
+	size_t size = nla_total_size(1);
+
+	if (dev->proto_down_reason)
+		size += nla_total_size(0) + nla_total_size(4);
+
+	return size;
+}
+
 static noinline size_t if_nlmsg_size(const struct net_device *dev,
 				     u32 ext_filter_mask)
 {
@@ -1041,7 +1051,7 @@ static noinline size_t if_nlmsg_size(const struct net_device *dev,
 	       + nla_total_size(4)  /* IFLA_EVENT */
 	       + nla_total_size(4)  /* IFLA_NEW_NETNSID */
 	       + nla_total_size(4)  /* IFLA_NEW_IFINDEX */
-	       + nla_total_size(1)  /* IFLA_PROTO_DOWN */
+	       + rtnl_proto_down_size(dev)  /* proto down */
 	       + nla_total_size(4)  /* IFLA_TARGET_NETNSID */
 	       + nla_total_size(4)  /* IFLA_CARRIER_UP_COUNT */
 	       + nla_total_size(4)  /* IFLA_CARRIER_DOWN_COUNT */
@@ -1658,6 +1668,35 @@ static int rtnl_fill_prop_list(struct sk_buff *skb,
 	return ret;
 }
 
+static int rtnl_fill_proto_down(struct sk_buff *skb,
+				const struct net_device *dev)
+{
+	struct nlattr *pr;
+	u32 preason;
+
+	if (nla_put_u8(skb, IFLA_PROTO_DOWN, dev->proto_down))
+		goto nla_put_failure;
+
+	preason = dev->proto_down_reason;
+	if (!preason)
+		return 0;
+
+	pr = nla_nest_start(skb, IFLA_PROTO_DOWN_REASON);
+	if (!pr)
+		return -EMSGSIZE;
+
+	if (nla_put_u32(skb, IFLA_PROTO_DOWN_REASON_VALUE, preason)) {
+		nla_nest_cancel(skb, pr);
+		goto nla_put_failure;
+	}
+
+	nla_nest_end(skb, pr);
+	return 0;
+
+nla_put_failure:
+	return -EMSGSIZE;
+}
+
 static int rtnl_fill_ifinfo(struct sk_buff *skb,
 			    struct net_device *dev, struct net *src_net,
 			    int type, u32 pid, u32 seq, u32 change,
@@ -1708,13 +1747,15 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb,
 	    nla_put_u32(skb, IFLA_CARRIER_CHANGES,
 			atomic_read(&dev->carrier_up_count) +
 			atomic_read(&dev->carrier_down_count)) ||
-	    nla_put_u8(skb, IFLA_PROTO_DOWN, dev->proto_down) ||
 	    nla_put_u32(skb, IFLA_CARRIER_UP_COUNT,
 			atomic_read(&dev->carrier_up_count)) ||
 	    nla_put_u32(skb, IFLA_CARRIER_DOWN_COUNT,
 			atomic_read(&dev->carrier_down_count)))
 		goto nla_put_failure;
 
+	if (rtnl_fill_proto_down(skb, dev))
+		goto nla_put_failure;
+
 	if (event != IFLA_EVENT_NONE) {
 		if (nla_put_u32(skb, IFLA_EVENT, event))
 			goto nla_put_failure;
@@ -1834,6 +1875,7 @@ static const struct nla_policy ifla_policy[IFLA_MAX+1] = {
 	[IFLA_ALT_IFNAME]	= { .type = NLA_STRING,
 				    .len = ALTIFNAMSIZ - 1 },
 	[IFLA_PERM_ADDRESS]	= { .type = NLA_REJECT },
+	[IFLA_PROTO_DOWN_REASON] = { .type = NLA_NESTED },
 };
 
 static const struct nla_policy ifla_info_policy[IFLA_INFO_MAX+1] = {
@@ -2483,6 +2525,67 @@ static int do_set_master(struct net_device *dev, int ifindex,
 	return 0;
 }
 
+static const struct nla_policy ifla_proto_down_reason_policy[IFLA_PROTO_DOWN_REASON_VALUE + 1] = {
+	[IFLA_PROTO_DOWN_REASON_MASK]	= { .type = NLA_U32 },
+	[IFLA_PROTO_DOWN_REASON_VALUE]	= { .type = NLA_U32 },
+};
+
+static int do_set_proto_down(struct net_device *dev,
+			     struct nlattr *nl_proto_down,
+			     struct nlattr *nl_proto_down_reason,
+			     struct netlink_ext_ack *extack)
+{
+	struct nlattr *pdreason[IFLA_PROTO_DOWN_REASON_MAX + 1];
+	const struct net_device_ops *ops = dev->netdev_ops;
+	unsigned long mask = 0;
+	u32 value;
+	bool proto_down;
+	int err;
+
+	if (!ops->ndo_change_proto_down) {
+		NL_SET_ERR_MSG(extack,  "Protodown not supported by device");
+		return -EOPNOTSUPP;
+	}
+
+	if (nl_proto_down_reason) {
+		err = nla_parse_nested_deprecated(pdreason,
+						  IFLA_PROTO_DOWN_REASON_MAX,
+						  nl_proto_down_reason,
+						  ifla_proto_down_reason_policy,
+						  NULL);
+		if (err < 0)
+			return err;
+
+		if (!pdreason[IFLA_PROTO_DOWN_REASON_VALUE]) {
+			NL_SET_ERR_MSG(extack, "Invalid protodown reason value");
+			return -EINVAL;
+		}
+
+		value = nla_get_u32(pdreason[IFLA_PROTO_DOWN_REASON_VALUE]);
+
+		if (pdreason[IFLA_PROTO_DOWN_REASON_MASK])
+			mask = nla_get_u32(pdreason[IFLA_PROTO_DOWN_REASON_MASK]);
+
+		dev_change_proto_down_reason(dev, mask, value);
+	}
+
+	if (nl_proto_down) {
+		proto_down = nla_get_u8(nl_proto_down);
+
+		/* Dont turn off protodown if there are active reasons */
+		if (!proto_down && dev->proto_down_reason) {
+			NL_SET_ERR_MSG(extack, "Cannot clear protodown, active reasons");
+			return -EBUSY;
+		}
+		err = dev_change_proto_down(dev,
+					    proto_down);
+		if (err)
+			return err;
+	}
+
+	return 0;
+}
+
 #define DO_SETLINK_MODIFIED	0x01
 /* notify flag means notify + modified. */
 #define DO_SETLINK_NOTIFY	0x03
@@ -2771,9 +2874,9 @@ static int do_setlink(const struct sk_buff *skb,
 	}
 	err = 0;
 
-	if (tb[IFLA_PROTO_DOWN]) {
-		err = dev_change_proto_down(dev,
-					    nla_get_u8(tb[IFLA_PROTO_DOWN]));
+	if (tb[IFLA_PROTO_DOWN] || tb[IFLA_PROTO_DOWN_REASON]) {
+		err = do_set_proto_down(dev, tb[IFLA_PROTO_DOWN],
+					tb[IFLA_PROTO_DOWN_REASON], extack);
 		if (err)
 			goto errout;
 		status |= DO_SETLINK_NOTIFY;
-- 
2.1.4


^ permalink raw reply related

* Re: [PATCH][next] net/sched: cls_u32: Use struct_size() helper
From: Gustavo A. R. Silva @ 2020-08-01  0:06 UTC (permalink / raw)
  To: David Miller, gustavoars
  Cc: jhs, xiyou.wangcong, jiri, kuba, netdev, linux-kernel
In-Reply-To: <20200731.165057.1709367540726800070.davem@davemloft.net>

On 7/31/20 18:50, David Miller wrote:
> From: "Gustavo A. R. Silva" <gustavoars@kernel.org>
> Date: Thu, 30 Jul 2020 11:03:14 -0500
> 
>> Make use of the struct_size() helper, in multiple places, instead
>> of an open-coded version in order to avoid any potential type
>> mistakes and protect against potential integer overflows.
>>
>> Also, remove unnecessary object identifier size.
>>
>> Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
> 
> Applied, thank you.
> 

Thanks, Dave.
--
Gustavo

^ permalink raw reply

* [iproute2-next v2 5/5] devlink: support setting the overwrite mask
From: Jacob Keller @ 2020-08-01  0:21 UTC (permalink / raw)
  To: netdev; +Cc: Jacob Keller
In-Reply-To: <20200801002159.3300425-1-jacob.e.keller@intel.com>

Add support for specifying the overwrite sections to allow in the flash
update command. This is done by adding a new "overwrite" option which
can take either "settings" or "identifiers" passing the overwrite mode
multiple times will combine the fields using bitwise-OR.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---
 devlink/devlink.c | 37 +++++++++++++++++++++++++++++++++++--
 1 file changed, 35 insertions(+), 2 deletions(-)

diff --git a/devlink/devlink.c b/devlink/devlink.c
index 7dbe9c7e07a8..a3360a09898b 100644
--- a/devlink/devlink.c
+++ b/devlink/devlink.c
@@ -302,6 +302,7 @@ static void ifname_map_free(struct ifname_map *ifname_map)
 #define DL_OPT_TRAP_POLICER_BURST	BIT(36)
 #define DL_OPT_HEALTH_REPORTER_AUTO_DUMP     BIT(37)
 #define DL_OPT_PORT_FUNCTION_HW_ADDR BIT(38)
+#define DL_OPT_FLASH_OVERWRITE		BIT(39)
 
 struct dl_opts {
 	uint64_t present; /* flags of present items */
@@ -349,6 +350,7 @@ struct dl_opts {
 	uint64_t trap_policer_burst;
 	char port_function_hw_addr[MAX_ADDR_LEN];
 	uint32_t port_function_hw_addr_len;
+	uint32_t overwrite_mask;
 };
 
 struct dl {
@@ -1282,6 +1284,19 @@ eswitch_encap_mode_get(const char *typestr,
 	return 0;
 }
 
+static int flash_overwrite_mask_get(const char *sectionstr, uint32_t *mask)
+{
+	if (strcmp(sectionstr, "settings") == 0) {
+		*mask |= DEVLINK_FLASH_OVERWRITE_SETTINGS;
+	} else if (strcmp(sectionstr, "identifiers") == 0) {
+		*mask |= DEVLINK_FLASH_OVERWRITE_IDENTIFIERS;
+	} else {
+		pr_err("Unknown overwrite section \"%s\"\n", sectionstr);
+		return -EINVAL;
+	}
+	return 0;
+}
+
 static int param_cmode_get(const char *cmodestr,
 			   enum devlink_param_cmode *cmode)
 {
@@ -1624,6 +1639,21 @@ static int dl_argv_parse(struct dl *dl, uint64_t o_required,
 			if (err)
 				return err;
 			o_found |= DL_OPT_FLASH_COMPONENT;
+
+		} else if (dl_argv_match(dl, "overwrite") &&
+				(o_all & DL_OPT_FLASH_OVERWRITE)) {
+			const char *sectionstr;
+
+			dl_arg_inc(dl);
+			err = dl_argv_str(dl, &sectionstr);
+			if(err)
+				return err;
+			err = flash_overwrite_mask_get(sectionstr,
+							&opts->overwrite_mask);
+			if (err)
+				return err;
+			o_found |= DL_OPT_FLASH_OVERWRITE;
+
 		} else if (dl_argv_match(dl, "reporter") &&
 			   (o_all & DL_OPT_HEALTH_REPORTER_NAME)) {
 			dl_arg_inc(dl);
@@ -1851,6 +1881,9 @@ static void dl_opts_put(struct nlmsghdr *nlh, struct dl *dl)
 	if (opts->present & DL_OPT_FLASH_COMPONENT)
 		mnl_attr_put_strz(nlh, DEVLINK_ATTR_FLASH_UPDATE_COMPONENT,
 				  opts->flash_component);
+	if (opts->present & DL_OPT_FLASH_OVERWRITE)
+		mnl_attr_put_u32(nlh, DEVLINK_ATTR_FLASH_UPDATE_OVERWRITE_MASK,
+				 opts->overwrite_mask);
 	if (opts->present & DL_OPT_HEALTH_REPORTER_NAME)
 		mnl_attr_put_strz(nlh, DEVLINK_ATTR_HEALTH_REPORTER_NAME,
 				  opts->reporter_name);
@@ -1951,7 +1984,7 @@ static void cmd_dev_help(void)
 	pr_err("       devlink dev param show [DEV name PARAMETER]\n");
 	pr_err("       devlink dev reload DEV [ netns { PID | NAME | ID } ]\n");
 	pr_err("       devlink dev info [ DEV ]\n");
-	pr_err("       devlink dev flash DEV file PATH [ component NAME ]\n");
+	pr_err("       devlink dev flash DEV file PATH [ component NAME ] [ overwrite SECTION ]\n");
 }
 
 static bool cmp_arr_last_handle(struct dl *dl, const char *bus_name,
@@ -3205,7 +3238,7 @@ static int cmd_dev_flash(struct dl *dl)
 			       NLM_F_REQUEST | NLM_F_ACK);
 
 	err = dl_argv_parse_put(nlh, dl, DL_OPT_HANDLE | DL_OPT_FLASH_FILE_NAME,
-				DL_OPT_FLASH_COMPONENT);
+				DL_OPT_FLASH_COMPONENT | DL_OPT_FLASH_OVERWRITE);
 	if (err)
 		return err;
 
-- 
2.28.0.163.g6104cc2f0b60


^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox