Netdev List


* Re: Re: BUG: corrupted list in p9_read_work
From: Dmitry Vyukov @ 2018-10-10 16:36 UTC
  To: Dominique Martinet
  Cc: syzbot, David Miller, Eric Van Hensbergen, LKML, Latchesar Ionkov,
	netdev, Ron Minnich, syzkaller-bugs, v9fs-developer
In-Reply-To: <20181010161003.GA5371@nautica>

On Wed, Oct 10, 2018 at 6:10 PM, Dominique Martinet
<asmadeus@codewreck.org> wrote:
> Dominique Martinet wrote on Wed, Oct 10, 2018:
>> It works though, is it just picky because I didn't end it in .git? let's
>> try again, sorry for the noise...
>>
>> #syz test: git://github.com/martinetd/linux.git e4ca13f7d075e551dc158df6af18fb412a1dba0a
>
> And I guess the commit hash needs to be in the default clone branch to
> work ?
> ('git fetch <repo> <hash>' happily fetches the commit in a new clone for
> me... But that feels like a github specific behaviour maybe)

Yep, this is a bug:
https://github.com/google/syzkaller/issues/728

It turns out that git fetch works differently for a named remote and
for just a tree. The latter only fetches the main branch.

Is 'git fetch <repo> <hash>' a thing? Does it require special server
configuration? I remember trying something similar that wasn't able to
fetch an arbitrary commit hash every time...

The plan was to make a named remote and then fetch it; this should
fetch everything.


> Oh, well; made a branch for it, last try for me.
>
> #syz test: git://github.com/martinetd/linux.git for-syzbot
>
> --
> Dominique


* Re: [PATCH net-next] net: enable RPS on vlan devices
From: Shannon Nelson @ 2018-10-10 16:18 UTC
  To: Eric Dumazet, davem, netdev; +Cc: silviu.smarandache
In-Reply-To: <6b6794f4-fa61-a9bc-43ed-3c62eb680498@gmail.com>

On 10/9/2018 7:17 PM, Eric Dumazet wrote:
> 
> 
> On 10/09/2018 07:11 PM, Shannon Nelson wrote:
>>
>> Hence the reason we sent this as an RFC a couple of weeks ago.  We got no response, so followed up with this patch in order to get some input. Do you have any suggestions for how we might accomplish this in a less ugly way?
> 
> I dunno, maybe a modern way for all these very specific needs would be to use an eBPF
> hook to implement whatever combination of RPS/RFS/what_have_you
> 
> Then, we no longer have to review what various strategies are used by users.

We're trying to make use of an existing useful feature that was designed 
for exactly this kind of problem.  It is already there and no new user 
training is needed.  We're actually fixing what could arguably be called 
a bug since the /sys/class/net/<dev>/queues/rx-0/rps_cpus entry exists 
for vlan devices but currently doesn't do anything.  We're also 
addressing a security concern related to the recent L1TF excitement.

For this case, we want to target the network stack processing to happen 
on a certain subset of CPUs.  With admittedly only a cursory look 
through eBPF, I don't see an obvious way to target the packet processing 
to an alternative CPU, unless we add yet another field to the skb that 
eBPF/XDP could fill and then query that field at the same time as we 
currently check get_rps_cpu().  But adding to the skb is usually frowned 
upon unless absolutely necessary, and this seems like a duplication of 
what we already have with RPS, so why add a competing feature?

Back to my earlier question: are there any suggestions for how we might 
accomplish this in a less ugly way?

sln
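
For anyone who wants to try the rps_cpus entry mentioned above, here is a
minimal user-space sketch of building the value to write into it. It
assumes that rps_cpus takes a hexadecimal CPU bitmap and it only handles
CPUs 0 to 31; rps_mask_from_cpus() is a hypothetical helper name, not
something from the patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: format a hex CPU bitmask for CPUs 0..31.
 * Returns 0 on success, -1 if a CPU index is out of range or the
 * buffer is too small.
 */
int rps_mask_from_cpus(const unsigned int *cpus, size_t n,
		       char *buf, size_t buflen)
{
	uint32_t mask = 0;
	size_t i;
	int written;

	for (i = 0; i < n; i++) {
		if (cpus[i] >= 32)
			return -1;
		mask |= UINT32_C(1) << cpus[i];
	}

	written = snprintf(buf, buflen, "%x", (unsigned int)mask);
	if (written < 0 || (size_t)written >= buflen)
		return -1;

	return 0;
}
```

For CPUs 2 and 3 the helper produces the string "c", which would then be
written to /sys/class/net/<dev>/queues/rx-0/rps_cpus for the device in
question.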


* Re: [PATCH bpf-next v2 1/7] bpf: rename stack trace map operations
From: Song Liu @ 2018-10-10 16:15 UTC
  To: mauricio.vasquez; +Cc: Alexei Starovoitov, Daniel Borkmann, Networking
In-Reply-To: <153918035870.8915.8092114744007418400.stgit@kernel>

On Wed, Oct 10, 2018 at 7:05 AM Mauricio Vasquez B
<mauricio.vasquez@polito.it> wrote:
>
> In the following patches queue and stack maps (FIFO and LIFO
> data structures) will be implemented.  In order to avoid confusion and
> a possible name clash, rename stack_map_ops to stack_trace_map_ops.
>
> Signed-off-by: Mauricio Vasquez B <mauricio.vasquez@polito.it>

Acked-by: Song Liu <songliubraving@fb.com>

> ---
>  include/linux/bpf_types.h |    2 +-
>  kernel/bpf/stackmap.c     |    2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
> index 5432f4c9f50e..658509daacd4 100644
> --- a/include/linux/bpf_types.h
> +++ b/include/linux/bpf_types.h
> @@ -51,7 +51,7 @@ BPF_MAP_TYPE(BPF_MAP_TYPE_LRU_HASH, htab_lru_map_ops)
>  BPF_MAP_TYPE(BPF_MAP_TYPE_LRU_PERCPU_HASH, htab_lru_percpu_map_ops)
>  BPF_MAP_TYPE(BPF_MAP_TYPE_LPM_TRIE, trie_map_ops)
>  #ifdef CONFIG_PERF_EVENTS
> -BPF_MAP_TYPE(BPF_MAP_TYPE_STACK_TRACE, stack_map_ops)
> +BPF_MAP_TYPE(BPF_MAP_TYPE_STACK_TRACE, stack_trace_map_ops)
>  #endif
>  BPF_MAP_TYPE(BPF_MAP_TYPE_ARRAY_OF_MAPS, array_of_maps_map_ops)
>  BPF_MAP_TYPE(BPF_MAP_TYPE_HASH_OF_MAPS, htab_of_maps_map_ops)
> diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
> index b2ade10f7ec3..90daf285de03 100644
> --- a/kernel/bpf/stackmap.c
> +++ b/kernel/bpf/stackmap.c
> @@ -600,7 +600,7 @@ static void stack_map_free(struct bpf_map *map)
>         put_callchain_buffers();
>  }
>
> -const struct bpf_map_ops stack_map_ops = {
> +const struct bpf_map_ops stack_trace_map_ops = {
>         .map_alloc = stack_map_alloc,
>         .map_free = stack_map_free,
>         .map_get_next_key = stack_map_get_next_key,
>


* Re: [PATCH bpf-next v2 6/7] Sync uapi/bpf.h to tools/include
From: Song Liu @ 2018-10-10 16:14 UTC
  To: mauricio.vasquez; +Cc: Alexei Starovoitov, Daniel Borkmann, Networking
In-Reply-To: <153918038786.8915.164246795854402314.stgit@kernel>

On Wed, Oct 10, 2018 at 7:06 AM Mauricio Vasquez B
<mauricio.vasquez@polito.it> wrote:
>
> Sync both files.
>
> Signed-off-by: Mauricio Vasquez B <mauricio.vasquez@polito.it>
Acked-by: Song Liu <songliubraving@fb.com>

> ---
>  tools/include/uapi/linux/bpf.h |   30 +++++++++++++++++++++++++++++-
>  1 file changed, 29 insertions(+), 1 deletion(-)
>
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index f9187b41dff6..c8824d5364ff 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
> @@ -103,6 +103,7 @@ enum bpf_cmd {
>         BPF_BTF_LOAD,
>         BPF_BTF_GET_FD_BY_ID,
>         BPF_TASK_FD_QUERY,
> +       BPF_MAP_LOOKUP_AND_DELETE_ELEM,
>  };
>
>  enum bpf_map_type {
> @@ -128,6 +129,8 @@ enum bpf_map_type {
>         BPF_MAP_TYPE_CGROUP_STORAGE,
>         BPF_MAP_TYPE_REUSEPORT_SOCKARRAY,
>         BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE,
> +       BPF_MAP_TYPE_QUEUE,
> +       BPF_MAP_TYPE_STACK,
>  };
>
>  enum bpf_prog_type {
> @@ -462,6 +465,28 @@ union bpf_attr {
>   *     Return
>   *             0 on success, or a negative error in case of failure.
>   *
> + * int bpf_map_push_elem(struct bpf_map *map, const void *value, u64 flags)
> + *     Description
> + *             Push an element *value* in *map*. *flags* is one of:
> + *
> + *             **BPF_EXIST**
> + *             If the queue/stack is full, the oldest element is removed to
> + *             make room for this.
> + *     Return
> + *             0 on success, or a negative error in case of failure.
> + *
> + * int bpf_map_pop_elem(struct bpf_map *map, void *value)
> + *     Description
> + *             Pop an element from *map*.
> + *     Return
> + *             0 on success, or a negative error in case of failure.
> + *
> + * int bpf_map_peek_elem(struct bpf_map *map, void *value)
> + *     Description
> + *             Get an element from *map* without removing it.
> + *     Return
> + *             0 on success, or a negative error in case of failure.
> + *
>   * int bpf_probe_read(void *dst, u32 size, const void *src)
>   *     Description
>   *             For tracing programs, safely attempt to read *size* bytes from
> @@ -2303,7 +2328,10 @@ union bpf_attr {
>         FN(skb_ancestor_cgroup_id),     \
>         FN(sk_lookup_tcp),              \
>         FN(sk_lookup_udp),              \
> -       FN(sk_release),
> +       FN(sk_release),                 \
> +       FN(map_push_elem),              \
> +       FN(map_pop_elem),               \
> +       FN(map_peek_elem),
>
>  /* integer value in 'imm' field of BPF_CALL instruction selects which helper
>   * function eBPF program intends to call
>
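
The helper descriptions quoted above can be illustrated with a small
user-space model of the queue case. This is only a sketch under stated
assumptions: the model_* names, the capacity of 4, the int element type
and the -1 error value are all made up for illustration, and
MODEL_EXIST merely stands in for BPF_EXIST. It is not the kernel
implementation.

```c
#include <stddef.h>

#define MODEL_CAPACITY 4
#define MODEL_EXIST    2ULL	/* stands in for BPF_EXIST */

struct model_queue {
	int    items[MODEL_CAPACITY];
	size_t head;	/* index of the oldest element */
	size_t count;	/* number of stored elements   */
};

/* Push: when full, only succeed if the caller asked to evict the oldest. */
int model_push(struct model_queue *q, int value, unsigned long long flags)
{
	if (q->count == MODEL_CAPACITY) {
		if (!(flags & MODEL_EXIST))
			return -1;
		/* Remove the oldest element to make room. */
		q->head = (q->head + 1) % MODEL_CAPACITY;
		q->count--;
	}

	q->items[(q->head + q->count) % MODEL_CAPACITY] = value;
	q->count++;
	return 0;
}

/* Peek: read the oldest element without removing it. */
int model_peek(const struct model_queue *q, int *value)
{
	if (q->count == 0)
		return -1;

	*value = q->items[q->head];
	return 0;
}

/* Pop: read the oldest element and remove it. */
int model_pop(struct model_queue *q, int *value)
{
	if (model_peek(q, value))
		return -1;

	q->head = (q->head + 1) % MODEL_CAPACITY;
	q->count--;
	return 0;
}
```

After pushing 1, 2, 3 and 4, a fifth push without the flag fails, while
a push with MODEL_EXIST drops the 1 and leaves 2 as the next element to
be popped.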


* Re: [PATCH bpf-next v2 4/7] bpf/verifier: add ARG_PTR_TO_UNINIT_MAP_VALUE
From: Song Liu @ 2018-10-10 16:10 UTC
  To: mauricio.vasquez; +Cc: Alexei Starovoitov, Daniel Borkmann, Networking
In-Reply-To: <153918037654.8915.6512666573922033685.stgit@kernel>

On Wed, Oct 10, 2018 at 7:06 AM Mauricio Vasquez B
<mauricio.vasquez@polito.it> wrote:
>
> ARG_PTR_TO_UNINIT_MAP_VALUE argument is a pointer to a memory zone
> used to save the value of a map.  Basically the same as
> ARG_PTR_TO_UNINIT_MEM, but the size does not have to be passed as an
> extra argument.
>
> This will be used in the following patch that implements some new
> helpers that receive a pointer to be filled with a map value.
>
> Signed-off-by: Mauricio Vasquez B <mauricio.vasquez@polito.it>
Acked-by: Song Liu <songliubraving@fb.com>
> ---
>  include/linux/bpf.h   |    1 +
>  kernel/bpf/verifier.c |    9 ++++++---
>  2 files changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 5793f0c7fbb5..e37b4986bb45 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -139,6 +139,7 @@ enum bpf_arg_type {
>         ARG_CONST_MAP_PTR,      /* const argument used as pointer to bpf_map */
>         ARG_PTR_TO_MAP_KEY,     /* pointer to stack used as map key */
>         ARG_PTR_TO_MAP_VALUE,   /* pointer to stack used as map value */
> +       ARG_PTR_TO_UNINIT_MAP_VALUE,    /* pointer to valid memory used to store a map value */
>
>         /* the following constraints used to prototype bpf_memcmp() and other
>          * functions that access data on eBPF program stack
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 3f93a548a642..d84c91ac3b70 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -2117,7 +2117,8 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 regno,
>         }
>
>         if (arg_type == ARG_PTR_TO_MAP_KEY ||
> -           arg_type == ARG_PTR_TO_MAP_VALUE) {
> +           arg_type == ARG_PTR_TO_MAP_VALUE ||
> +           arg_type == ARG_PTR_TO_UNINIT_MAP_VALUE) {
>                 expected_type = PTR_TO_STACK;
>                 if (!type_is_pkt_pointer(type) && type != PTR_TO_MAP_VALUE &&
>                     type != expected_type)
> @@ -2187,7 +2188,8 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 regno,
>                 err = check_helper_mem_access(env, regno,
>                                               meta->map_ptr->key_size, false,
>                                               NULL);
> -       } else if (arg_type == ARG_PTR_TO_MAP_VALUE) {
> +       } else if (arg_type == ARG_PTR_TO_MAP_VALUE ||
> +                  arg_type == ARG_PTR_TO_UNINIT_MAP_VALUE) {
>                 /* bpf_map_xxx(..., map_ptr, ..., value) call:
>                  * check [value, value + map->value_size) validity
>                  */
> @@ -2196,9 +2198,10 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 regno,
>                         verbose(env, "invalid map_ptr to access map->value\n");
>                         return -EACCES;
>                 }
> +               meta->raw_mode = (arg_type == ARG_PTR_TO_UNINIT_MAP_VALUE);
>                 err = check_helper_mem_access(env, regno,
>                                               meta->map_ptr->value_size, false,
> -                                             NULL);
> +                                             meta);
>         } else if (arg_type_is_mem_size(arg_type)) {
>                 bool zero_size_allowed = (arg_type == ARG_CONST_SIZE_OR_ZERO);
>
>


* Re: [PATCH bpf-next v2 2/7] bpf/syscall: allow key to be null in map functions
From: Song Liu @ 2018-10-10 16:09 UTC
  To: mauricio.vasquez; +Cc: Alexei Starovoitov, Daniel Borkmann, Networking
In-Reply-To: <153918036421.8915.10832039936335740025.stgit@kernel>

On Wed, Oct 10, 2018 at 7:06 AM Mauricio Vasquez B
<mauricio.vasquez@polito.it> wrote:
>
> This commit adds the required logic to allow the key to be NULL
> when the key_size of the map is 0.
>
> A new __bpf_copy_key helper function only copies the key from
> userspace when key_size != 0; otherwise it enforces that the key must
> be NULL.
>
> Signed-off-by: Mauricio Vasquez B <mauricio.vasquez@polito.it>
Acked-by: Song Liu <songliubraving@fb.com>
> ---
>  kernel/bpf/syscall.c |   19 +++++++++++++++----
>  1 file changed, 15 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 4f416234251f..f36c080ad356 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -651,6 +651,17 @@ int __weak bpf_stackmap_copy(struct bpf_map *map, void *key, void *value)
>         return -ENOTSUPP;
>  }
>
> +static void *__bpf_copy_key(void __user *ukey, u64 key_size)
> +{
> +       if (key_size)
> +               return memdup_user(ukey, key_size);
> +
> +       if (ukey)
> +               return ERR_PTR(-EINVAL);
> +
> +       return NULL;
> +}
> +
>  /* last field in 'union bpf_attr' used by this command */
>  #define BPF_MAP_LOOKUP_ELEM_LAST_FIELD value
>
> @@ -678,7 +689,7 @@ static int map_lookup_elem(union bpf_attr *attr)
>                 goto err_put;
>         }
>
> -       key = memdup_user(ukey, map->key_size);
> +       key = __bpf_copy_key(ukey, map->key_size);
>         if (IS_ERR(key)) {
>                 err = PTR_ERR(key);
>                 goto err_put;
> @@ -774,7 +785,7 @@ static int map_update_elem(union bpf_attr *attr)
>                 goto err_put;
>         }
>
> -       key = memdup_user(ukey, map->key_size);
> +       key = __bpf_copy_key(ukey, map->key_size);
>         if (IS_ERR(key)) {
>                 err = PTR_ERR(key);
>                 goto err_put;
> @@ -876,7 +887,7 @@ static int map_delete_elem(union bpf_attr *attr)
>                 goto err_put;
>         }
>
> -       key = memdup_user(ukey, map->key_size);
> +       key = __bpf_copy_key(ukey, map->key_size);
>         if (IS_ERR(key)) {
>                 err = PTR_ERR(key);
>                 goto err_put;
> @@ -928,7 +939,7 @@ static int map_get_next_key(union bpf_attr *attr)
>         }
>
>         if (ukey) {
> -               key = memdup_user(ukey, map->key_size);
> +               key = __bpf_copy_key(ukey, map->key_size);
>                 if (IS_ERR(key)) {
>                         err = PTR_ERR(key);
>                         goto err_put;
>
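
The three cases handled by __bpf_copy_key() can be exercised outside
the kernel with a rough user-space model. The sketch below is an
assumption-laden stand-in: model_copy_key() is a hypothetical name,
malloc()/memcpy() replace memdup_user(), the error is returned as an
int instead of an ERR_PTR, and a NULL source with a non-zero size is
assumed to map to -EFAULT.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * User-space model of the key copy logic:
 *   key_size != 0             -> duplicate the key
 *   key_size == 0, ukey set   -> -EINVAL
 *   key_size == 0, ukey NULL  -> success with a NULL key
 * On success the (possibly NULL) copy is stored in *out.
 */
int model_copy_key(const void *ukey, size_t key_size, void **out)
{
	void *copy;

	*out = NULL;

	if (key_size) {
		if (!ukey)
			return -EFAULT;	/* assumed stand-in for a failed user copy */
		copy = malloc(key_size);
		if (!copy)
			return -ENOMEM;
		memcpy(copy, ukey, key_size);
		*out = copy;
		return 0;
	}

	if (ukey)
		return -EINVAL;

	return 0;
}
```

The caller owns the returned copy and releases it with free(), which
plays the role of the kfree() on the kernel side.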


* [PATCH net-next v4] net/ncsi: Extend NC-SI Netlink interface to allow user space to send NC-SI command
From: Justin.Lee1 @ 2018-10-10 16:00 UTC
  To: sam, joel; +Cc: linux-aspeed, netdev, openbmc, amithash, christian, vijaykhemka

Add a new command (NCSI_CMD_SEND_CMD) that allows a user space application
to send NC-SI commands to the network card.
Also add a new attribute (NCSI_ATTR_DATA) for transferring the request and response.

The workflow is as follows.

Request:
User space application
	-> Netlink interface (msg)
	-> new Netlink handler - ncsi_send_cmd_nl()
	-> ncsi_xmit_cmd()

Response:
Response received - ncsi_rcv_rsp()
	-> internal response handler - ncsi_rsp_handler_xxx()
	-> ncsi_rsp_handler_netlink()
	-> ncsi_send_netlink_rsp()
	-> Netlink interface (msg)
	-> user space application

Command timeout - ncsi_request_timeout()
	-> ncsi_send_netlink_timeout()
	-> Netlink interface (msg with zero data length)
	-> user space application

Error:
Error detected
	-> ncsi_send_netlink_err()
	-> Netlink interface (err msg)
	-> user space application


Signed-off-by: Justin Lee <justin.lee1@dell.com> 


---
V4: Update comments and remove some debug messages.
V3: Based on http://patchwork.ozlabs.org/patch/979688/ to remove the duplicated code.
V2: Remove unrelated debug messages and clean up the code.

 include/uapi/linux/ncsi.h |   6 ++
 net/ncsi/internal.h       |  10 ++-
 net/ncsi/ncsi-cmd.c       |   8 ++
 net/ncsi/ncsi-manage.c    |  16 ++++
 net/ncsi/ncsi-netlink.c   | 200 ++++++++++++++++++++++++++++++++++++++++++++++
 net/ncsi/ncsi-netlink.h   |  12 +++
 net/ncsi/ncsi-rsp.c       |  67 ++++++++++++++--
 7 files changed, 313 insertions(+), 6 deletions(-)

diff --git a/include/uapi/linux/ncsi.h b/include/uapi/linux/ncsi.h
index 4c292ec..0a26a55 100644
--- a/include/uapi/linux/ncsi.h
+++ b/include/uapi/linux/ncsi.h
@@ -23,6 +23,9 @@
  *	optionally the preferred NCSI_ATTR_CHANNEL_ID.
  * @NCSI_CMD_CLEAR_INTERFACE: clear any preferred package/channel combination.
  *	Requires NCSI_ATTR_IFINDEX.
+ * @NCSI_CMD_SEND_CMD: send NC-SI command to network card.
+ *	Requires NCSI_ATTR_IFINDEX, NCSI_ATTR_PACKAGE_ID
+ *	and NCSI_ATTR_CHANNEL_ID.
  * @NCSI_CMD_MAX: highest command number
  */
 enum ncsi_nl_commands {
@@ -30,6 +33,7 @@ enum ncsi_nl_commands {
 	NCSI_CMD_PKG_INFO,
 	NCSI_CMD_SET_INTERFACE,
 	NCSI_CMD_CLEAR_INTERFACE,
+	NCSI_CMD_SEND_CMD,
 
 	__NCSI_CMD_AFTER_LAST,
 	NCSI_CMD_MAX = __NCSI_CMD_AFTER_LAST - 1
@@ -43,6 +47,7 @@ enum ncsi_nl_commands {
  * @NCSI_ATTR_PACKAGE_LIST: nested array of NCSI_PKG_ATTR attributes
  * @NCSI_ATTR_PACKAGE_ID: package ID
  * @NCSI_ATTR_CHANNEL_ID: channel ID
+ * @NCSI_ATTR_DATA: command payload
  * @NCSI_ATTR_MAX: highest attribute number
  */
 enum ncsi_nl_attrs {
@@ -51,6 +56,7 @@ enum ncsi_nl_attrs {
 	NCSI_ATTR_PACKAGE_LIST,
 	NCSI_ATTR_PACKAGE_ID,
 	NCSI_ATTR_CHANNEL_ID,
+	NCSI_ATTR_DATA,
 
 	__NCSI_ATTR_AFTER_LAST,
 	NCSI_ATTR_MAX = __NCSI_ATTR_AFTER_LAST - 1
diff --git a/net/ncsi/internal.h b/net/ncsi/internal.h
index 3d0a33b..e9db100 100644
--- a/net/ncsi/internal.h
+++ b/net/ncsi/internal.h
@@ -175,6 +175,8 @@ struct ncsi_package;
 #define NCSI_RESERVED_CHANNEL	0x1f
 #define NCSI_CHANNEL_INDEX(c)	((c) & ((1 << NCSI_PACKAGE_SHIFT) - 1))
 #define NCSI_TO_CHANNEL(p, c)	(((p) << NCSI_PACKAGE_SHIFT) | (c))
+#define NCSI_MAX_PACKAGE	8
+#define NCSI_MAX_CHANNEL	32
 
 struct ncsi_channel {
 	unsigned char               id;
@@ -219,12 +221,17 @@ struct ncsi_request {
 	unsigned char        id;      /* Request ID - 0 to 255           */
 	bool                 used;    /* Request that has been assigned  */
 	unsigned int         flags;   /* NCSI request property           */
-#define NCSI_REQ_FLAG_EVENT_DRIVEN	1
+#define NCSI_REQ_FLAG_EVENT_DRIVEN		1
+#define NCSI_REQ_FLAG_NETLINK_DRIVEN	2
 	struct ncsi_dev_priv *ndp;    /* Associated NCSI device          */
 	struct sk_buff       *cmd;    /* Associated NCSI command packet  */
 	struct sk_buff       *rsp;    /* Associated NCSI response packet */
 	struct timer_list    timer;   /* Timer on waiting for response   */
 	bool                 enabled; /* Time has been enabled or not    */
+
+	u32                  snd_seq;     /* netlink sending sequence number */
+	u32                  snd_portid;  /* netlink portid of sender        */
+	struct nlmsghdr      nlhdr;       /* netlink message header          */
 };
 
 enum {
@@ -310,6 +317,7 @@ struct ncsi_cmd_arg {
 		unsigned int   dwords[4];
 	};
 	unsigned char        *data;       /* NCSI OEM data                 */
+	struct genl_info     *info;       /* Netlink information           */
 };
 
 extern struct list_head ncsi_dev_list;
diff --git a/net/ncsi/ncsi-cmd.c b/net/ncsi/ncsi-cmd.c
index 82b7d92..356af47 100644
--- a/net/ncsi/ncsi-cmd.c
+++ b/net/ncsi/ncsi-cmd.c
@@ -17,6 +17,7 @@
 #include <net/ncsi.h>
 #include <net/net_namespace.h>
 #include <net/sock.h>
+#include <net/genetlink.h>
 
 #include "internal.h"
 #include "ncsi-pkt.h"
@@ -346,6 +347,13 @@ int ncsi_xmit_cmd(struct ncsi_cmd_arg *nca)
 	if (!nr)
 		return -ENOMEM;
 
+	/* track netlink information */
+	if (nca->req_flags == NCSI_REQ_FLAG_NETLINK_DRIVEN) {
+		nr->snd_seq = nca->info->snd_seq;
+		nr->snd_portid = nca->info->snd_portid;
+		nr->nlhdr = *nca->info->nlhdr;
+	}
+
 	/* Prepare the packet */
 	nca->id = nr->id;
 	ret = nch->handler(nr->cmd, nca);
diff --git a/net/ncsi/ncsi-manage.c b/net/ncsi/ncsi-manage.c
index 0912847..76a4bcb 100644
--- a/net/ncsi/ncsi-manage.c
+++ b/net/ncsi/ncsi-manage.c
@@ -19,6 +19,7 @@
 #include <net/addrconf.h>
 #include <net/ipv6.h>
 #include <net/if_inet6.h>
+#include <net/genetlink.h>
 
 #include "internal.h"
 #include "ncsi-pkt.h"
@@ -406,6 +407,9 @@ static void ncsi_request_timeout(struct timer_list *t)
 {
 	struct ncsi_request *nr = from_timer(nr, t, timer);
 	struct ncsi_dev_priv *ndp = nr->ndp;
+	struct ncsi_package *np;
+	struct ncsi_channel *nc;
+	struct ncsi_cmd_pkt *cmd;
 	unsigned long flags;
 
 	/* If the request already had associated response,
@@ -419,6 +423,18 @@ static void ncsi_request_timeout(struct timer_list *t)
 	}
 	spin_unlock_irqrestore(&ndp->lock, flags);
 
+	if (nr->flags == NCSI_REQ_FLAG_NETLINK_DRIVEN) {
+		if (nr->cmd) {
+			/* Find the package */
+			cmd = (struct ncsi_cmd_pkt *)
+			      skb_network_header(nr->cmd);
+			ncsi_find_package_and_channel(ndp,
+						      cmd->cmd.common.channel,
+						      &np, &nc);
+			ncsi_send_netlink_timeout(nr, np, nc);
+		}
+	}
+
 	/* Release the request */
 	ncsi_free_request(nr);
 }
diff --git a/net/ncsi/ncsi-netlink.c b/net/ncsi/ncsi-netlink.c
index 45f33d6..62e191f 100644
--- a/net/ncsi/ncsi-netlink.c
+++ b/net/ncsi/ncsi-netlink.c
@@ -20,6 +20,7 @@
 #include <uapi/linux/ncsi.h>
 
 #include "internal.h"
+#include "ncsi-pkt.h"
 #include "ncsi-netlink.h"
 
 static struct genl_family ncsi_genl_family;
@@ -29,6 +30,7 @@ static const struct nla_policy ncsi_genl_policy[NCSI_ATTR_MAX + 1] = {
 	[NCSI_ATTR_PACKAGE_LIST] =	{ .type = NLA_NESTED },
 	[NCSI_ATTR_PACKAGE_ID] =	{ .type = NLA_U32 },
 	[NCSI_ATTR_CHANNEL_ID] =	{ .type = NLA_U32 },
+	[NCSI_ATTR_DATA] =		{ .type = NLA_BINARY, .len = 2048 },
 };
 
 static struct ncsi_dev_priv *ndp_from_ifindex(struct net *net, u32 ifindex)
@@ -366,6 +368,198 @@ static int ncsi_clear_interface_nl(struct sk_buff *msg, struct genl_info *info)
 	return 0;
 }
 
+static int ncsi_send_cmd_nl(struct sk_buff *msg, struct genl_info *info)
+{
+	struct ncsi_dev_priv *ndp;
+
+	struct ncsi_cmd_arg nca;
+	struct ncsi_pkt_hdr *hdr;
+
+	u32 package_id, channel_id;
+	unsigned char *data;
+	int len, ret;
+
+	if (!info || !info->attrs) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	if (!info->attrs[NCSI_ATTR_IFINDEX]) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	if (!info->attrs[NCSI_ATTR_PACKAGE_ID]) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	if (!info->attrs[NCSI_ATTR_CHANNEL_ID]) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	ndp = ndp_from_ifindex(get_net(sock_net(msg->sk)),
+			       nla_get_u32(info->attrs[NCSI_ATTR_IFINDEX]));
+	if (!ndp) {
+		ret = -ENODEV;
+		goto out;
+	}
+
+	package_id = nla_get_u32(info->attrs[NCSI_ATTR_PACKAGE_ID]);
+	channel_id = nla_get_u32(info->attrs[NCSI_ATTR_CHANNEL_ID]);
+
+	if (package_id >= NCSI_MAX_PACKAGE || channel_id >= NCSI_MAX_CHANNEL) {
+		ret = -ERANGE;
+		goto out_netlink;
+	}
+
+	len = nla_len(info->attrs[NCSI_ATTR_DATA]);
+	if (len < sizeof(struct ncsi_pkt_hdr)) {
+		netdev_info(ndp->ndev.dev, "NCSI: no command to send %u\n",
+			    package_id);
+		ret = -EINVAL;
+		goto out_netlink;
+	} else {
+		data = (unsigned char *)nla_data(info->attrs[NCSI_ATTR_DATA]);
+	}
+
+	hdr = (struct ncsi_pkt_hdr *)data;
+
+	nca.ndp = ndp;
+	nca.package = (unsigned char)package_id;
+	nca.channel = (unsigned char)channel_id;
+	nca.type = hdr->type;
+	nca.req_flags = NCSI_REQ_FLAG_NETLINK_DRIVEN;
+	nca.info = info;
+	nca.payload = ntohs(hdr->length);
+	nca.data = data + sizeof(*hdr);
+
+	ret = ncsi_xmit_cmd(&nca);
+out_netlink:
+	if (ret != 0) {
+		netdev_err(ndp->ndev.dev,
+			   "NCSI: Error %d sending OEM command\n",
+			   ret);
+		ncsi_send_netlink_err(ndp->ndev.dev,
+				      info->snd_seq,
+				      info->snd_portid,
+				      info->nlhdr,
+				      ret);
+	}
+out:
+	return ret;
+}
+
+int ncsi_send_netlink_rsp(struct ncsi_request *nr,
+			  struct ncsi_package *np,
+			  struct ncsi_channel *nc)
+{
+	struct sk_buff *skb;
+	struct net *net;
+	void *hdr;
+	int rc;
+
+	net = dev_net(nr->rsp->dev);
+
+	skb = genlmsg_new(NLMSG_DEFAULT_SIZE, GFP_ATOMIC);
+	if (!skb)
+		return -ENOMEM;
+
+	hdr = genlmsg_put(skb, nr->snd_portid, nr->snd_seq,
+			  &ncsi_genl_family, 0, NCSI_CMD_SEND_CMD);
+	if (!hdr) {
+		kfree_skb(skb);
+		return -EMSGSIZE;
+	}
+
+	nla_put_u32(skb, NCSI_ATTR_IFINDEX, nr->rsp->dev->ifindex);
+	if (np)
+		nla_put_u32(skb, NCSI_ATTR_PACKAGE_ID, np->id);
+	if (nc)
+		nla_put_u32(skb, NCSI_ATTR_CHANNEL_ID, nc->id);
+	else
+		nla_put_u32(skb, NCSI_ATTR_CHANNEL_ID, NCSI_RESERVED_CHANNEL);
+
+	rc = nla_put(skb, NCSI_ATTR_DATA, nr->rsp->len, (void *)nr->rsp->data);
+	if (rc)
+		goto err;
+
+	genlmsg_end(skb, hdr);
+	return genlmsg_unicast(net, skb, nr->snd_portid);
+
+err:
+	kfree_skb(skb);
+	return rc;
+}
+
+int ncsi_send_netlink_timeout(struct ncsi_request *nr,
+			      struct ncsi_package *np,
+			      struct ncsi_channel *nc)
+{
+	struct sk_buff *skb;
+	struct net *net;
+	void *hdr;
+
+	skb = genlmsg_new(NLMSG_DEFAULT_SIZE, GFP_ATOMIC);
+	if (!skb)
+		return -ENOMEM;
+
+	hdr = genlmsg_put(skb, nr->snd_portid, nr->snd_seq,
+			  &ncsi_genl_family, 0, NCSI_CMD_SEND_CMD);
+	if (!hdr) {
+		kfree_skb(skb);
+		return -EMSGSIZE;
+	}
+
+	net = dev_net(nr->cmd->dev);
+
+	nla_put_u32(skb, NCSI_ATTR_IFINDEX, nr->cmd->dev->ifindex);
+
+	if (np)
+		nla_put_u32(skb, NCSI_ATTR_PACKAGE_ID, np->id);
+	else
+		nla_put_u32(skb, NCSI_ATTR_PACKAGE_ID,
+			    NCSI_PACKAGE_INDEX((((struct ncsi_pkt_hdr *)
+						 nr->cmd->data)->channel)));
+
+	if (nc)
+		nla_put_u32(skb, NCSI_ATTR_CHANNEL_ID, nc->id);
+	else
+		nla_put_u32(skb, NCSI_ATTR_CHANNEL_ID, NCSI_RESERVED_CHANNEL);
+
+	genlmsg_end(skb, hdr);
+	return genlmsg_unicast(net, skb, nr->snd_portid);
+}
+
+int ncsi_send_netlink_err(struct net_device *dev,
+			  u32 snd_seq,
+			  u32 snd_portid,
+			  struct nlmsghdr *nlhdr,
+			  int err)
+{
+	struct sk_buff *skb;
+	struct nlmsghdr *nlh;
+	struct nlmsgerr *nle;
+	struct net *net;
+
+	skb = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_ATOMIC);
+	if (!skb)
+		return -ENOMEM;
+
+	net = dev_net(dev);
+
+	nlh = nlmsg_put(skb, snd_portid, snd_seq,
+			NLMSG_ERROR, sizeof(*nle), 0);
+	nle = (struct nlmsgerr *)nlmsg_data(nlh);
+	nle->error = err;
+	memcpy(&nle->msg, nlhdr, sizeof(*nlh));
+
+	nlmsg_end(skb, nlh);
+
+	return nlmsg_unicast(net->genl_sock, skb, snd_portid);
+}
+
 static const struct genl_ops ncsi_ops[] = {
 	{
 		.cmd = NCSI_CMD_PKG_INFO,
@@ -386,6 +580,12 @@ static const struct genl_ops ncsi_ops[] = {
 		.doit = ncsi_clear_interface_nl,
 		.flags = GENL_ADMIN_PERM,
 	},
+	{
+		.cmd = NCSI_CMD_SEND_CMD,
+		.policy = ncsi_genl_policy,
+		.doit = ncsi_send_cmd_nl,
+		.flags = GENL_ADMIN_PERM,
+	},
 };
 
 static struct genl_family ncsi_genl_family __ro_after_init = {
diff --git a/net/ncsi/ncsi-netlink.h b/net/ncsi/ncsi-netlink.h
index 91a5c25..c4a4688 100644
--- a/net/ncsi/ncsi-netlink.h
+++ b/net/ncsi/ncsi-netlink.h
@@ -14,6 +14,18 @@
 
 #include "internal.h"
 
+int ncsi_send_netlink_rsp(struct ncsi_request *nr,
+			  struct ncsi_package *np,
+			  struct ncsi_channel *nc);
+int ncsi_send_netlink_timeout(struct ncsi_request *nr,
+			      struct ncsi_package *np,
+			      struct ncsi_channel *nc);
+int ncsi_send_netlink_err(struct net_device *dev,
+			  u32 snd_seq,
+			  u32 snd_portid,
+			  struct nlmsghdr *nlhdr,
+			  int err);
+
 int ncsi_init_netlink(struct net_device *dev);
 int ncsi_unregister_netlink(struct net_device *dev);
 
diff --git a/net/ncsi/ncsi-rsp.c b/net/ncsi/ncsi-rsp.c
index d66b347..dd931d2 100644
--- a/net/ncsi/ncsi-rsp.c
+++ b/net/ncsi/ncsi-rsp.c
@@ -16,9 +16,11 @@
 #include <net/ncsi.h>
 #include <net/net_namespace.h>
 #include <net/sock.h>
+#include <net/genetlink.h>
 
 #include "internal.h"
 #include "ncsi-pkt.h"
+#include "ncsi-netlink.h"
 
 static int ncsi_validate_rsp_pkt(struct ncsi_request *nr,
 				 unsigned short payload)
@@ -32,15 +34,25 @@ static int ncsi_validate_rsp_pkt(struct ncsi_request *nr,
 	 * before calling this function.
 	 */
 	h = (struct ncsi_rsp_pkt_hdr *)skb_network_header(nr->rsp);
-	if (h->common.revision != NCSI_PKT_REVISION)
+
+	if (h->common.revision != NCSI_PKT_REVISION) {
+		netdev_dbg(nr->ndp->ndev.dev,
+			   "NCSI: unsupported header revision\n");
 		return -EINVAL;
-	if (ntohs(h->common.length) != payload)
+	}
+	if (ntohs(h->common.length) != payload) {
+		netdev_dbg(nr->ndp->ndev.dev,
+			   "NCSI: payload length mismatched\n");
 		return -EINVAL;
+	}
 
 	/* Check on code and reason */
 	if (ntohs(h->code) != NCSI_PKT_RSP_C_COMPLETED ||
-	    ntohs(h->reason) != NCSI_PKT_RSP_R_NO_ERROR)
-		return -EINVAL;
+	    ntohs(h->reason) != NCSI_PKT_RSP_R_NO_ERROR) {
+		netdev_dbg(nr->ndp->ndev.dev,
+			   "NCSI: non zero response/reason code\n");
+		return -EPERM;
+	}
 
 	/* Validate checksum, which might be zeroes if the
 	 * sender doesn't support checksum according to NCSI
@@ -52,8 +64,11 @@ static int ncsi_validate_rsp_pkt(struct ncsi_request *nr,
 
 	checksum = ncsi_calculate_checksum((unsigned char *)h,
 					   sizeof(*h) + payload - 4);
-	if (*pchecksum != htonl(checksum))
+
+	if (*pchecksum != htonl(checksum)) {
+		netdev_dbg(nr->ndp->ndev.dev, "NCSI: checksum mismatched\n");
 		return -EINVAL;
+	}
 
 	return 0;
 }
@@ -941,6 +956,26 @@ static int ncsi_rsp_handler_gpuuid(struct ncsi_request *nr)
 	return 0;
 }
 
+static int ncsi_rsp_handler_netlink(struct ncsi_request *nr)
+{
+	struct ncsi_rsp_pkt *rsp;
+	struct ncsi_dev_priv *ndp = nr->ndp;
+	struct ncsi_package *np;
+	struct ncsi_channel *nc;
+	int ret;
+
+	/* Find the package */
+	rsp = (struct ncsi_rsp_pkt *)skb_network_header(nr->rsp);
+	ncsi_find_package_and_channel(ndp, rsp->rsp.common.channel,
+				      &np, &nc);
+	if (!np)
+		return -ENODEV;
+
+	ret = ncsi_send_netlink_rsp(nr, np, nc);
+
+	return ret;
+}
+
 static struct ncsi_rsp_handler {
 	unsigned char	type;
 	int             payload;
@@ -1043,6 +1078,17 @@ int ncsi_rcv_rsp(struct sk_buff *skb, struct net_device *dev,
 		netdev_warn(ndp->ndev.dev,
 			    "NCSI: 'bad' packet ignored for type 0x%x\n",
 			    hdr->type);
+
+		if (nr->flags == NCSI_REQ_FLAG_NETLINK_DRIVEN) {
+			if (ret == -EPERM)
+				goto out_netlink;
+			else
+				ncsi_send_netlink_err(ndp->ndev.dev,
+						      nr->snd_seq,
+						      nr->snd_portid,
+						      &nr->nlhdr,
+						      ret);
+		}
 		goto out;
 	}
 
@@ -1052,6 +1098,17 @@ int ncsi_rcv_rsp(struct sk_buff *skb, struct net_device *dev,
 		netdev_err(ndp->ndev.dev,
 			   "NCSI: Handler for packet type 0x%x returned %d\n",
 			   hdr->type, ret);
+
+out_netlink:
+	if (nr->flags == NCSI_REQ_FLAG_NETLINK_DRIVEN) {
+		ret = ncsi_rsp_handler_netlink(nr);
+		if (ret) {
+			netdev_err(ndp->ndev.dev,
+				   "NCSI: Netlink handler for packet type 0x%x returned %d\n",
+				   hdr->type, ret);
+		}
+	}
+
 out:
 	ncsi_free_request(nr);
 	return ret;
-- 
2.9.3
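
The argument checks performed on the request path in
ncsi_send_cmd_nl() can be summarised with a small user-space model.
This is a sketch only: model_check_send_cmd() and the MODEL_* names are
hypothetical, and the header length is passed in rather than taken
from struct ncsi_pkt_hdr. The model also treats a missing data buffer
as invalid, which is an assumption of the sketch; the patch as posted
calls nla_len() on NCSI_ATTR_DATA without testing for its presence
first.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_PACKAGE 8	/* mirrors NCSI_MAX_PACKAGE in the patch */
#define MODEL_MAX_CHANNEL 32	/* mirrors NCSI_MAX_CHANNEL in the patch */

/*
 * Return 0 if the request would be handed to the transmit path,
 * or a negative errno value describing why it is rejected.
 */
int model_check_send_cmd(uint32_t package_id, uint32_t channel_id,
			 const unsigned char *data, size_t data_len,
			 size_t hdr_len)
{
	/* Package and channel IDs must be within the supported range. */
	if (package_id >= MODEL_MAX_PACKAGE ||
	    channel_id >= MODEL_MAX_CHANNEL)
		return -ERANGE;

	/* Assumption of this sketch: no payload attribute is an error. */
	if (!data)
		return -EINVAL;

	/* The payload must at least contain a full NC-SI packet header. */
	if (data_len < hdr_len)
		return -EINVAL;

	return 0;
}
```

A request for package 8 or channel 32 is rejected with -ERANGE, and a
payload shorter than the header is rejected with -EINVAL, matching the
error codes used in the patch.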




* [PATCH ipsec] xfrm: policy: use hlist rcu variants on insert
From: Florian Westphal @ 2018-10-10 16:02 UTC
  To: steffen.klassert; +Cc: netdev, Florian Westphal

bydst table/list lookups use rcu, so insertions must use rcu versions.

Fixes: a7c44247f704e ("xfrm: policy: make xfrm_policy_lookup_bytype lockless")
Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/xfrm/xfrm_policy.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 3110c3fbee20..3cf1fd7c0869 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -632,9 +632,9 @@ static void xfrm_hash_rebuild(struct work_struct *work)
 				break;
 		}
 		if (newpos)
-			hlist_add_behind(&policy->bydst, newpos);
+			hlist_add_behind_rcu(&policy->bydst, newpos);
 		else
-			hlist_add_head(&policy->bydst, chain);
+			hlist_add_head_rcu(&policy->bydst, chain);
 	}
 
 	spin_unlock_bh(&net->xfrm.xfrm_policy_lock);
@@ -774,9 +774,9 @@ int xfrm_policy_insert(int dir, struct xfrm_policy *policy, int excl)
 			break;
 	}
 	if (newpos)
-		hlist_add_behind(&policy->bydst, newpos);
+		hlist_add_behind_rcu(&policy->bydst, newpos);
 	else
-		hlist_add_head(&policy->bydst, chain);
+		hlist_add_head_rcu(&policy->bydst, chain);
 	__xfrm_policy_link(policy, dir);
 
 	/* After previous checking, family can either be AF_INET or AF_INET6 */
-- 
2.18.0
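
For readers following along, the reasoning in the commit message can be sketched in userspace C. This is not the kernel's hlist code: `struct node`, `list_add_head_publish()` and `list_find()` are hypothetical names, and C11 release/acquire atomics stand in for what the `_rcu` insert helpers and RCU readers do. The point is that the writer must finish initialising the new entry before publishing it, so a lockless reader never observes a half-built node.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical singly linked node; stands in for an hlist entry. */
struct node {
	int key;
	struct node *_Atomic next;
};

struct list {
	struct node *_Atomic first;
};

/* Writer side (caller holds the writer lock): initialise, then publish. */
static void list_add_head_publish(struct list *l, struct node *n)
{
	struct node *old = atomic_load_explicit(&l->first, memory_order_relaxed);

	/* n is not reachable yet, so a relaxed store is enough here. */
	atomic_store_explicit(&n->next, old, memory_order_relaxed);
	/* Release store: n->key and n->next become visible before n does. */
	atomic_store_explicit(&l->first, n, memory_order_release);
}

/* Reader side (lockless): acquire loads pair with the release store. */
static struct node *list_find(struct list *l, int key)
{
	struct node *n = atomic_load_explicit(&l->first, memory_order_acquire);

	while (n && n->key != key)
		n = atomic_load_explicit(&n->next, memory_order_acquire);
	return n;
}
```

A plain (non-release) store to `l->first` would allow the pointer to become visible before the node's fields, which is the class of problem the patch avoids by switching to the `_rcu` variants.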

^ permalink raw reply related

* Re: [PATCH stable 4.9 00/29] backport of IP fragmentation fixes
From: Stephen Hemminger @ 2018-10-10 23:18 UTC (permalink / raw)
  To: Florian Fainelli; +Cc: Eric Dumazet, netdev, davem, gregkh, stable, edumazet
In-Reply-To: <eae8e2b7-acc6-22d6-6354-0b7d50af3800@gmail.com>

On Tue, 9 Oct 2018 21:15:04 -0700
Florian Fainelli <f.fainelli@gmail.com> wrote:

> > 
> > Strange, I do not see "ip: use rb trees for IP frag queue." in this list ?  
> 
> And it was not in Stephen's backport to 4.14 either, wait, looks like it
> was somehow squashed into "net: sk_buff rbnode reorg". Stephen, was
> there a reason for that?
> 
> Let me go back and add bffa72cf7f9df842f0016ba03586039296b4caaf as well
> as eeea10b83a139451130df1594f26710c8fa390c8 to the rebase todo and see
> how things go from there.
> 
> Thanks for taking a look.

I don't remember, spent time doing cherry-pick and fixups. Maybe the reorg
commit got squashed as part of one rebase.

^ permalink raw reply

* Re: Possible bug in traffic control?
From: Josh Coombs @ 2018-10-10 15:52 UTC (permalink / raw)
  To: netdev
In-Reply-To: <CACcUnf_HsNiG5aDdpOCuXY436PY297Yof0MFfp7eVhKgPgUn_A@mail.gmail.com>

2.3 billion 1 byte packets failed to re-create the bug.  To try and
simplify the setup I removed macsec from the equation, using a single
host in the middle as the bridge.  Interestingly, rather than 1.3Gbits
a second in both directions, it ran around 8Mbits a second.  Switching
the filter from u32 to matchall didn't change the performance.  Going
back to the four machine test bed, again removing macsec and just
bridging through radically decreased the throughput to around 8Mbits.
Flip on macsec for the bridge and 1.3Gbits?

On Tue, Oct 9, 2018 at 11:58 AM Josh Coombs <jcoombs@staff.gwi.net> wrote:
>
> Hello all, I'm looking for some guidance in chasing what I believe to
> be a bug in kernel traffic control filters.  If I'm pinging the wrong
> list let me know.
>
> I have a homebrew MACSec bridge setup using two pairs of PCs.  I
> establish a MACSec link between them, and then use TC to bridge a
> second ethernet interface over the MACSec link. The second interface
> is connected to a Juniper switch at each end, and I'm using LACP over
> the links to bond them up for redundancy.  It turns out I need that
> redundancy as after awhile one pair of bridges will stop flowing
> packets in one direction.  I've since replicated this failure with a
> group of VMs as well.
>
> My test setup to replicate the failure inside ESXi:
> - Two MACSec bridge VMs, A and Z
> - Two IPerf VMs, A and Z
> My VMs are currently built using Ubuntu Server 18.04 to be quick, no
> additional packages are required outside of iperf3.  Kernel ver as
> shipped currently is 4.15.0-36.  I highly advise using a CPU with AES
> instruction support as MACSec eats CPU without it and will take longer
> to reproduce the symptoms.
>
> - A 'MACSec Bridge' network
> - A 'A Side link' network
> - A 'Z Side link' network
> In ESXi I used a dedicated vSwitch, 9000 MTU (to allow full 1500 eth
> packets + MACSec to pass on the bridge) and the security policy is
> full open (allow promiscuous, allow forged, allow mac changes) as
> we're abusing the networks as direct point to point links.  If using
> physical machines, just cable up, my example script bumps the MTU as
> required.
>
> The MACSec boxes have two ethernet interfaces each.  One pair is on
> the MACSec Bridge network.  The other interfaces go to the A and Z
> IPerf boxes respectively via their dedicated networks.  A and Z need
> their interfaces configured with IPs in a common subnet, such as
> 192.168.0.1/30 and 192.168.0.2/30.
>
> My script sets up MACSec, tweaks MTUs, and touches a few sysctls to
> turn the involved interfaces into silent actors.  It then uses TC to
> start the actual bridging.  From there I've been firing up iperf 3
> sessions in both directions between A and Z to hammer the bridge until
> it fails.  When it does, I can see packets stop being bridged in one
> direction on one MACSec host, but not the other.  The second host
> continues to flow packets in both directions.  Nothing is logged to
> dmesg when this fault occurs.  The fault seems to occur at roughly the
> same packet / traffic amount each time.  On my main application it's
> after approximately 2.5TB of traffic (random mix of sizes) and with my
> test bed it was after 5.5TB of 1500 byte packets.
>
> On the impacted MACSec node, watching interface packet counters via
> ifconfig and actual traffic with tcpdump I can see packets coming in
> MACSec and going out the host interface, the host reply coming in but
> not showing up on the MACSec interface to cross the bridge.  Clearing
> out the tc filter and qdisc and re-adding does not restore traffic
> flow.
>
> There is a PPA with 4.18 available for Ubuntu that I'm going to test
> with next to see if that makes a difference in behavior.  In the mean
> time I'd appreciate any suggestions on how to diagnose this.
>
> My MACSec bridge setup script, update sif, dif, the keys and rxmac to
> match your setup.  The rxmac is the mac addy of the remote bridge
> interface.  Keys need to be flipped between systems.
> -----------------------
> #!/bin/bash
>
> # Interfaces:
> # sif = Ingress physical interface (Source)
> # dif = Egress physical interface (Dest)
> # eif = Encrypted interface
> sif=eno2
> dif=enp1s0f0
> eif=macsec0
>
> # MACSec Keys:
> # txkey = Transmit (Local) key
> # rxkey = Receive (Remote) key
> # rxmac = Receive (Remote) MAC addy
> txkey=00000000000000000000000000000000
> rxkey=99999999999999999999999999999999
> rxmac=00:11:22:33:44:55
>
> # Use jumbo frames for macsec to allow full 1500 MTU passthrough:
> echo "* MTU update"
> ip link set "$sif" mtu 9000
> ip link set "$dif" mtu 9000
>
> # Bring up macsec:
> echo "* Enable MACSec"
> modprobe macsec
> ip link add link "$dif" "$eif" type macsec
> ip macsec add "$eif" tx sa 0 pn 1 on key 02 "$txkey"
> ip macsec add "$eif" rx address "$rxmac" port 1
> ip macsec add "$eif" rx address "$rxmac" port 1 sa 0 pn 1 on key 01 "$rxkey"
> ip link set "$eif" type macsec encrypt on
> #ip link set "$eif" type macsec replay on window 64
>
> # Keep system from trying to respond to observed traffic:
> echo "* Clamp the system so bridge ports NEVER respond to traffic"
> sysctl -w net.ipv4.conf.default.arp_filter=1
> sysctl -w net.ipv4.conf.all.arp_filter=1
> ip link set "$sif" down promisc on arp off multicast off
> sysctl -w net.ipv6.conf."$sif".autoconf=0
> sysctl -w net.ipv6.conf."$sif".accept_ra=0
> sysctl -w net.ipv4.conf."$sif".arp_ignore=8
> sysctl -w net.ipv4.conf."$sif".rp_filter=0
> ip link set "$dif" down promisc on arp off multicast off
> sysctl -w net.ipv6.conf."$dif".autoconf=0
> sysctl -w net.ipv6.conf."$dif".accept_ra=0
> sysctl -w net.ipv4.conf."$dif".arp_ignore=8
> sysctl -w net.ipv4.conf."$dif".rp_filter=0
> ip link set "$eif" down promisc on arp off multicast off
> sysctl -w net.ipv6.conf."$eif".autoconf=0
> sysctl -w net.ipv6.conf."$eif".accept_ra=0
> sysctl -w net.ipv4.conf."$eif".arp_ignore=8
> sysctl -w net.ipv4.conf."$eif".rp_filter=0
>
> # Set up traffic mirroring:
> echo "* Start Port Mirror"
> # sif to eif
> tc qdisc add dev "$sif" ingress
> tc filter add dev "$sif" parent ffff: \
>           protocol all \
>           u32 match u8 0 0 \
>           action mirred egress mirror dev "$eif"
>
> # eif to sif
> tc qdisc add dev "$eif" ingress
> tc filter add dev "$eif" parent ffff: \
>           protocol all \
>           u32 match u8 0 0 \
>           action mirred egress mirror dev "$sif"
>
> # Bring up the interfaces:
> echo "* Light tunnel NICS"
> ip link set "$sif" up
> ip link set "$dif" up
> ip link set "$eif" up
>
> echo " --=[ MACSec Up ]=--"
> -----------------------
>
> Josh Coombs

^ permalink raw reply

* RE: [PATCH net-next v3] net/ncsi: Extend NC-SI Netlink interface to allow user space to send NC-SI command
From: Justin.Lee1 @ 2018-10-10 15:51 UTC (permalink / raw)
  To: sam, joel; +Cc: linux-aspeed, netdev, openbmc, amithash, christian, vijaykhemka
In-Reply-To: <4069c511575c0a1d0245697692d758217ce115b4.camel@mendozajonas.com>

Hi Samuel,

I will address the comment and send out the v4.

Thanks,
Justin


> On Mon, 2018-10-08 at 23:13 +0000, Justin.Lee1@Dell.com wrote:
> > The new command (NCSI_CMD_SEND_CMD) is added to allow user space 
> > application to send NC-SI command to the network card.
> > Also, add a new attribute (NCSI_ATTR_DATA) for transferring request and response.
> > 
> > The work flow is as below. 
> > 
> > Request:
> > User space application
> > 	-> Netlink interface (msg)
> > 	-> new Netlink handler - ncsi_send_cmd_nl()
> > 	-> ncsi_xmit_cmd()
> > 
> > Response:
> > Response received - ncsi_rcv_rsp()
> > 	-> internal response handler - ncsi_rsp_handler_xxx()
> > 	-> ncsi_rsp_handler_netlink()
> > 	-> ncsi_send_netlink_rsp ()
> > 	-> Netlink interface (msg)
> > 	-> user space application
> > 
> > Command timeout - ncsi_request_timeout()
> > 	-> ncsi_send_netlink_timeout ()
> > 	-> Netlink interface (msg with zero data length)
> > 	-> user space application
> > 
> > Error:
> > Error detected
> > 	-> ncsi_send_netlink_err ()
> > 	-> Netlink interface (err msg)
> > 	-> user space application
> 
> Hi Justin,
> 
> I've built and tested this and it works as expected; except for some very
> minor comments below:
> 
> Reviewed-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
> 
> > 
> > V3: Based on http://patchwork.ozlabs.org/patch/979688/ to remove the duplicated code.
> > V2: Remove non-related debug message and clean up the code.
> 
> It's better to put these change notes under the --- below so they're not
> included in the commit message, but thanks for including them!
> 
> > 
> > 
> > Signed-off-by: Justin Lee <justin.lee1@dell.com> 
> > 
> > 
> > ---
> >  include/uapi/linux/ncsi.h |   3 +
> >  net/ncsi/internal.h       |  10 ++-
> >  net/ncsi/ncsi-cmd.c       |   8 ++
> >  net/ncsi/ncsi-manage.c    |  16 ++++
> >  net/ncsi/ncsi-netlink.c   | 204 ++++++++++++++++++++++++++++++++++++++++++++++
> >  net/ncsi/ncsi-netlink.h   |  12 +++
> >  net/ncsi/ncsi-rsp.c       |  67 +++++++++++++--
> >  7 files changed, 314 insertions(+), 6 deletions(-)
> > 
> > diff --git a/include/uapi/linux/ncsi.h b/include/uapi/linux/ncsi.h
> > index 4c292ec..4992bfc 100644
> > --- a/include/uapi/linux/ncsi.h
> > +++ b/include/uapi/linux/ncsi.h
> > @@ -30,6 +30,7 @@ enum ncsi_nl_commands {
> >  	NCSI_CMD_PKG_INFO,
> >  	NCSI_CMD_SET_INTERFACE,
> >  	NCSI_CMD_CLEAR_INTERFACE,
> > +	NCSI_CMD_SEND_CMD,
> >  
> >  	__NCSI_CMD_AFTER_LAST,
> >  	NCSI_CMD_MAX = __NCSI_CMD_AFTER_LAST - 1
> > @@ -43,6 +44,7 @@ enum ncsi_nl_commands {
> >   * @NCSI_ATTR_PACKAGE_LIST: nested array of NCSI_PKG_ATTR attributes
> >   * @NCSI_ATTR_PACKAGE_ID: package ID
> >   * @NCSI_ATTR_CHANNEL_ID: channel ID
> > + * @NCSI_ATTR_DATA: command payload
> >   * @NCSI_ATTR_MAX: highest attribute number
> >   */
> >  enum ncsi_nl_attrs {
> > @@ -51,6 +53,7 @@ enum ncsi_nl_attrs {
> >  	NCSI_ATTR_PACKAGE_LIST,
> >  	NCSI_ATTR_PACKAGE_ID,
> >  	NCSI_ATTR_CHANNEL_ID,
> > +	NCSI_ATTR_DATA,
> >  
> >  	__NCSI_ATTR_AFTER_LAST,
> >  	NCSI_ATTR_MAX = __NCSI_ATTR_AFTER_LAST - 1
> > diff --git a/net/ncsi/internal.h b/net/ncsi/internal.h
> > index 3d0a33b..e9db100 100644
> > --- a/net/ncsi/internal.h
> > +++ b/net/ncsi/internal.h
> > @@ -175,6 +175,8 @@ struct ncsi_package;
> >  #define NCSI_RESERVED_CHANNEL	0x1f
> >  #define NCSI_CHANNEL_INDEX(c)	((c) & ((1 << NCSI_PACKAGE_SHIFT) - 1))
> >  #define NCSI_TO_CHANNEL(p, c)	(((p) << NCSI_PACKAGE_SHIFT) | (c))
> > +#define NCSI_MAX_PACKAGE	8
> > +#define NCSI_MAX_CHANNEL	32
> >  
> >  struct ncsi_channel {
> >  	unsigned char               id;
> > @@ -219,12 +221,17 @@ struct ncsi_request {
> >  	unsigned char        id;      /* Request ID - 0 to 255           */
> >  	bool                 used;    /* Request that has been assigned  */
> >  	unsigned int         flags;   /* NCSI request property           */
> > -#define NCSI_REQ_FLAG_EVENT_DRIVEN	1
> > +#define NCSI_REQ_FLAG_EVENT_DRIVEN		1
> > +#define NCSI_REQ_FLAG_NETLINK_DRIVEN	2
> >  	struct ncsi_dev_priv *ndp;    /* Associated NCSI device          */
> >  	struct sk_buff       *cmd;    /* Associated NCSI command packet  */
> >  	struct sk_buff       *rsp;    /* Associated NCSI response packet */
> >  	struct timer_list    timer;   /* Timer on waiting for response   */
> >  	bool                 enabled; /* Time has been enabled or not    */
> > +
> > +	u32                  snd_seq;     /* netlink sending sequence number */
> > +	u32                  snd_portid;  /* netlink portid of sender        */
> > +	struct nlmsghdr      nlhdr;       /* netlink message header          */
> >  };
> >  
> >  enum {
> > @@ -310,6 +317,7 @@ struct ncsi_cmd_arg {
> >  		unsigned int   dwords[4];
> >  	};
> >  	unsigned char        *data;       /* NCSI OEM data                 */
> > +	struct genl_info     *info;       /* Netlink information           */
> >  };
> >  
> >  extern struct list_head ncsi_dev_list;
> > diff --git a/net/ncsi/ncsi-cmd.c b/net/ncsi/ncsi-cmd.c
> > index 82b7d92..356af47 100644
> > --- a/net/ncsi/ncsi-cmd.c
> > +++ b/net/ncsi/ncsi-cmd.c
> > @@ -17,6 +17,7 @@
> >  #include <net/ncsi.h>
> >  #include <net/net_namespace.h>
> >  #include <net/sock.h>
> > +#include <net/genetlink.h>
> >  
> >  #include "internal.h"
> >  #include "ncsi-pkt.h"
> > @@ -346,6 +347,13 @@ int ncsi_xmit_cmd(struct ncsi_cmd_arg *nca)
> >  	if (!nr)
> >  		return -ENOMEM;
> >  
> > +	/* track netlink information */
> > +	if (nca->req_flags == NCSI_REQ_FLAG_NETLINK_DRIVEN) {
> > +		nr->snd_seq = nca->info->snd_seq;
> > +		nr->snd_portid = nca->info->snd_portid;
> > +		nr->nlhdr = *nca->info->nlhdr;
> > +	}
> > +
> >  	/* Prepare the packet */
> >  	nca->id = nr->id;
> >  	ret = nch->handler(nr->cmd, nca);
> > diff --git a/net/ncsi/ncsi-manage.c b/net/ncsi/ncsi-manage.c
> > index 0912847..76a4bcb 100644
> > --- a/net/ncsi/ncsi-manage.c
> > +++ b/net/ncsi/ncsi-manage.c
> > @@ -19,6 +19,7 @@
> >  #include <net/addrconf.h>
> >  #include <net/ipv6.h>
> >  #include <net/if_inet6.h>
> > +#include <net/genetlink.h>
> >  
> >  #include "internal.h"
> >  #include "ncsi-pkt.h"
> > @@ -406,6 +407,9 @@ static void ncsi_request_timeout(struct timer_list *t)
> >  {
> >  	struct ncsi_request *nr = from_timer(nr, t, timer);
> >  	struct ncsi_dev_priv *ndp = nr->ndp;
> > +	struct ncsi_package *np;
> > +	struct ncsi_channel *nc;
> > +	struct ncsi_cmd_pkt *cmd;
> >  	unsigned long flags;
> >  
> >  	/* If the request already had associated response,
> > @@ -419,6 +423,18 @@ static void ncsi_request_timeout(struct timer_list *t)
> >  	}
> >  	spin_unlock_irqrestore(&ndp->lock, flags);
> >  
> > +	if (nr->flags == NCSI_REQ_FLAG_NETLINK_DRIVEN) {
> > +		if (nr->cmd) {
> > +			/* Find the package */
> > +			cmd = (struct ncsi_cmd_pkt *)
> > +			      skb_network_header(nr->cmd);
> > +			ncsi_find_package_and_channel(ndp,
> > +						      cmd->cmd.common.channel,
> > +						      &np, &nc);
> > +			ncsi_send_netlink_timeout(nr, np, nc);
> > +		}
> > +	}
> > +
> >  	/* Release the request */
> >  	ncsi_free_request(nr);
> >  }
> > diff --git a/net/ncsi/ncsi-netlink.c b/net/ncsi/ncsi-netlink.c
> > index 45f33d6..3941bf6 100644
> > --- a/net/ncsi/ncsi-netlink.c
> > +++ b/net/ncsi/ncsi-netlink.c
> > @@ -20,6 +20,7 @@
> >  #include <uapi/linux/ncsi.h>
> >  
> >  #include "internal.h"
> > +#include "ncsi-pkt.h"
> >  #include "ncsi-netlink.h"
> >  
> >  static struct genl_family ncsi_genl_family;
> > @@ -29,6 +30,7 @@ static const struct nla_policy ncsi_genl_policy[NCSI_ATTR_MAX + 1] = {
> >  	[NCSI_ATTR_PACKAGE_LIST] =	{ .type = NLA_NESTED },
> >  	[NCSI_ATTR_PACKAGE_ID] =	{ .type = NLA_U32 },
> >  	[NCSI_ATTR_CHANNEL_ID] =	{ .type = NLA_U32 },
> > +	[NCSI_ATTR_DATA] =		{ .type = NLA_BINARY, .len = 2048 },
> >  };
> >  
> >  static struct ncsi_dev_priv *ndp_from_ifindex(struct net *net, u32 ifindex)
> > @@ -366,6 +368,202 @@ static int ncsi_clear_interface_nl(struct sk_buff *msg, struct genl_info *info)
> >  	return 0;
> >  }
> >  
> > +static int ncsi_send_cmd_nl(struct sk_buff *msg, struct genl_info *info)
> > +{
> > +	struct ncsi_dev_priv *ndp;
> > +
> > +	struct ncsi_cmd_arg nca;
> > +	struct ncsi_pkt_hdr *hdr;
> > +
> > +	u32 package_id, channel_id;
> > +	unsigned char *data;
> > +	int len, ret;
> > +
> > +	if (!info || !info->attrs) {
> > +		ret = -EINVAL;
> > +		goto out;
> > +	}
> > +
> > +	if (!info->attrs[NCSI_ATTR_IFINDEX]) {
> > +		ret = -EINVAL;
> > +		goto out;
> > +	}
> > +
> > +	if (!info->attrs[NCSI_ATTR_PACKAGE_ID]) {
> > +		ret = -EINVAL;
> > +		goto out;
> > +	}
> > +
> > +	if (!info->attrs[NCSI_ATTR_CHANNEL_ID]) {
> > +		ret = -EINVAL;
> > +		goto out;
> > +	}
> > +
> > +	ndp = ndp_from_ifindex(get_net(sock_net(msg->sk)),
> > +			       nla_get_u32(info->attrs[NCSI_ATTR_IFINDEX]));
> > +	if (!ndp) {
> > +		ret = -ENODEV;
> > +		goto out;
> > +	}
> > +
> > +	package_id = nla_get_u32(info->attrs[NCSI_ATTR_PACKAGE_ID]);
> > +	channel_id = nla_get_u32(info->attrs[NCSI_ATTR_CHANNEL_ID]);
> > +
> > +	if (package_id >= NCSI_MAX_PACKAGE || channel_id >= NCSI_MAX_CHANNEL) {
> > +		ret = -ERANGE;
> > +		goto out_netlink;
> > +	}
> > +
> > +	len = nla_len(info->attrs[NCSI_ATTR_DATA]);
> > +	if (len < sizeof(struct ncsi_pkt_hdr)) {
> > +		netdev_info(ndp->ndev.dev, "NCSI: no OEM command to send %u\n",
> > +			    package_id);
> 
> Technically we can send any command via this interface so saying "no OEM
> command" may be a little confusing as a message.
> 
> > +		ret = -EINVAL;
> > +		goto out_netlink;
> > +	} else {
> > +		data = (unsigned char *)nla_data(info->attrs[NCSI_ATTR_DATA]);
> > +	}
> > +
> > +	hdr = (struct ncsi_pkt_hdr *)data;
> > +
> > +	nca.ndp = ndp;
> > +	nca.package = (unsigned char)package_id;
> > +	nca.channel = (unsigned char)channel_id;
> > +	nca.type = hdr->type;
> > +	nca.req_flags = NCSI_REQ_FLAG_NETLINK_DRIVEN;
> > +	nca.info = info;
> > +	nca.payload = ntohs(hdr->length);
> > +	nca.data = data + sizeof(*hdr);
> > +
> > +	ret = ncsi_xmit_cmd(&nca);
> > +out_netlink:
> > +	if (ret != 0) {
> > +		netdev_err(ndp->ndev.dev,
> > +			   "NCSI: Error %d sending OEM command\n",
> > +			   ret);
> > +		ncsi_send_netlink_err(ndp->ndev.dev,
> > +				      info->snd_seq,
> > +				      info->snd_portid,
> > +				      info->nlhdr,
> > +				      ret);
> > +	}
> > +out:
> > +	return ret;
> > +}
> > +
> > +int ncsi_send_netlink_rsp(struct ncsi_request *nr,
> > +			  struct ncsi_package *np,
> > +			  struct ncsi_channel *nc)
> > +{
> > +	struct sk_buff *skb;
> > +	struct net *net;
> > +	void *hdr;
> > +	int rc;
> > +
> > +	netdev_dbg(nr->ndp->ndev.dev, "NCSI: %s\n", __func__);
> 
> We probably don't need this message.
> 
> > +
> > +	net = dev_net(nr->rsp->dev);
> > +
> > +	skb = genlmsg_new(NLMSG_DEFAULT_SIZE, GFP_ATOMIC);
> > +	if (!skb)
> > +		return -ENOMEM;
> > +
> > +	hdr = genlmsg_put(skb, nr->snd_portid, nr->snd_seq,
> > +			  &ncsi_genl_family, 0, NCSI_CMD_SEND_CMD);
> > +	if (!hdr) {
> > +		kfree_skb(skb);
> > +		return -EMSGSIZE;
> > +	}
> > +
> > +	nla_put_u32(skb, NCSI_ATTR_IFINDEX, nr->rsp->dev->ifindex);
> > +	if (np)
> > +		nla_put_u32(skb, NCSI_ATTR_PACKAGE_ID, np->id);
> > +	if (nc)
> > +		nla_put_u32(skb, NCSI_ATTR_CHANNEL_ID, nc->id);
> > +	else
> > +		nla_put_u32(skb, NCSI_ATTR_CHANNEL_ID, NCSI_RESERVED_CHANNEL);
> > +
> > +	rc = nla_put(skb, NCSI_ATTR_DATA, nr->rsp->len, (void *)nr->rsp->data);
> > +	if (rc)
> > +		goto err;
> > +
> > +	genlmsg_end(skb, hdr);
> > +	return genlmsg_unicast(net, skb, nr->snd_portid);
> > +
> > +err:
> > +	kfree_skb(skb);
> > +	return rc;
> > +}
> > +
> > +int ncsi_send_netlink_timeout(struct ncsi_request *nr,
> > +			      struct ncsi_package *np,
> > +			      struct ncsi_channel *nc)
> > +{
> > +	struct sk_buff *skb;
> > +	struct net *net;
> > +	void *hdr;
> > +
> > +	netdev_dbg(nr->ndp->ndev.dev, "NCSI: %s\n", __func__);
> 
> Or this one.
> 
> > +
> > +	skb = genlmsg_new(NLMSG_DEFAULT_SIZE, GFP_ATOMIC);
> > +	if (!skb)
> > +		return -ENOMEM;
> > +
> > +	hdr = genlmsg_put(skb, nr->snd_portid, nr->snd_seq,
> > +			  &ncsi_genl_family, 0, NCSI_CMD_SEND_CMD);
> > +	if (!hdr) {
> > +		kfree_skb(skb);
> > +		return -EMSGSIZE;
> > +	}
> > +
> > +	net = dev_net(nr->cmd->dev);
> > +
> > +	nla_put_u32(skb, NCSI_ATTR_IFINDEX, nr->cmd->dev->ifindex);
> > +
> > +	if (np)
> > +		nla_put_u32(skb, NCSI_ATTR_PACKAGE_ID, np->id);
> > +	else
> > +		nla_put_u32(skb, NCSI_ATTR_PACKAGE_ID,
> > +			    NCSI_PACKAGE_INDEX((((struct ncsi_pkt_hdr *)
> > +						 nr->cmd->data)->channel)));
> > +
> > +	if (nc)
> > +		nla_put_u32(skb, NCSI_ATTR_CHANNEL_ID, nc->id);
> > +	else
> > +		nla_put_u32(skb, NCSI_ATTR_CHANNEL_ID, NCSI_RESERVED_CHANNEL);
> > +
> > +	genlmsg_end(skb, hdr);
> > +	return genlmsg_unicast(net, skb, nr->snd_portid);
> > +}
> > +
> > +int ncsi_send_netlink_err(struct net_device *dev,
> > +			  u32 snd_seq,
> > +			  u32 snd_portid,
> > +			  struct nlmsghdr *nlhdr,
> > +			  int err)
> > +{
> > +	struct sk_buff *skb;
> > +	struct nlmsghdr *nlh;
> > +	struct nlmsgerr *nle;
> > +	struct net *net;
> > +
> > +	skb = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_ATOMIC);
> > +	if (!skb)
> > +		return -ENOMEM;
> > +
> > +	net = dev_net(dev);
> > +
> > +	nlh = nlmsg_put(skb, snd_portid, snd_seq,
> > +			NLMSG_ERROR, sizeof(*nle), 0);
> > +	nle = (struct nlmsgerr *)nlmsg_data(nlh);
> > +	nle->error = err;
> > +	memcpy(&nle->msg, nlhdr, sizeof(*nlh));
> > +
> > +	nlmsg_end(skb, nlh);
> > +
> > +	return nlmsg_unicast(net->genl_sock, skb, snd_portid);
> > +}
> > +
> >  static const struct genl_ops ncsi_ops[] = {
> >  	{
> >  		.cmd = NCSI_CMD_PKG_INFO,
> > @@ -386,6 +584,12 @@ static const struct genl_ops ncsi_ops[] = {
> >  		.doit = ncsi_clear_interface_nl,
> >  		.flags = GENL_ADMIN_PERM,
> >  	},
> > +	{
> > +		.cmd = NCSI_CMD_SEND_CMD,
> > +		.policy = ncsi_genl_policy,
> > +		.doit = ncsi_send_cmd_nl,
> > +		.flags = GENL_ADMIN_PERM,
> > +	},
> >  };
> >  
> >  static struct genl_family ncsi_genl_family __ro_after_init = {
> > diff --git a/net/ncsi/ncsi-netlink.h b/net/ncsi/ncsi-netlink.h
> > index 91a5c25..c4a4688 100644
> > --- a/net/ncsi/ncsi-netlink.h
> > +++ b/net/ncsi/ncsi-netlink.h
> > @@ -14,6 +14,18 @@
> >  
> >  #include "internal.h"
> >  
> > +int ncsi_send_netlink_rsp(struct ncsi_request *nr,
> > +			  struct ncsi_package *np,
> > +			  struct ncsi_channel *nc);
> > +int ncsi_send_netlink_timeout(struct ncsi_request *nr,
> > +			      struct ncsi_package *np,
> > +			      struct ncsi_channel *nc);
> > +int ncsi_send_netlink_err(struct net_device *dev,
> > +			  u32 snd_seq,
> > +			  u32 snd_portid,
> > +			  struct nlmsghdr *nlhdr,
> > +			  int err);
> > +
> >  int ncsi_init_netlink(struct net_device *dev);
> >  int ncsi_unregister_netlink(struct net_device *dev);
> >  
> > diff --git a/net/ncsi/ncsi-rsp.c b/net/ncsi/ncsi-rsp.c
> > index d66b347..dd931d2 100644
> > --- a/net/ncsi/ncsi-rsp.c
> > +++ b/net/ncsi/ncsi-rsp.c
> > @@ -16,9 +16,11 @@
> >  #include <net/ncsi.h>
> >  #include <net/net_namespace.h>
> >  #include <net/sock.h>
> > +#include <net/genetlink.h>
> >  
> >  #include "internal.h"
> >  #include "ncsi-pkt.h"
> > +#include "ncsi-netlink.h"
> >  
> >  static int ncsi_validate_rsp_pkt(struct ncsi_request *nr,
> >  				 unsigned short payload)
> > @@ -32,15 +34,25 @@ static int ncsi_validate_rsp_pkt(struct ncsi_request *nr,
> >  	 * before calling this function.
> >  	 */
> >  	h = (struct ncsi_rsp_pkt_hdr *)skb_network_header(nr->rsp);
> > -	if (h->common.revision != NCSI_PKT_REVISION)
> > +
> > +	if (h->common.revision != NCSI_PKT_REVISION) {
> > +		netdev_dbg(nr->ndp->ndev.dev,
> > +			   "NCSI: unsupported header revision\n");
> >  		return -EINVAL;
> > -	if (ntohs(h->common.length) != payload)
> > +	}
> > +	if (ntohs(h->common.length) != payload) {
> > +		netdev_dbg(nr->ndp->ndev.dev,
> > +			   "NCSI: payload length mismatched\n");
> >  		return -EINVAL;
> > +	}
> >  
> >  	/* Check on code and reason */
> >  	if (ntohs(h->code) != NCSI_PKT_RSP_C_COMPLETED ||
> > -	    ntohs(h->reason) != NCSI_PKT_RSP_R_NO_ERROR)
> > -		return -EINVAL;
> > +	    ntohs(h->reason) != NCSI_PKT_RSP_R_NO_ERROR) {
> > +		netdev_dbg(nr->ndp->ndev.dev,
> > +			   "NCSI: non zero response/reason code\n");
> > +		return -EPERM;
> > +	}
> >  
> >  	/* Validate checksum, which might be zeroes if the
> >  	 * sender doesn't support checksum according to NCSI
> > @@ -52,8 +64,11 @@ static int ncsi_validate_rsp_pkt(struct ncsi_request *nr,
> >  
> >  	checksum = ncsi_calculate_checksum((unsigned char *)h,
> >  					   sizeof(*h) + payload - 4);
> > -	if (*pchecksum != htonl(checksum))
> > +
> > +	if (*pchecksum != htonl(checksum)) {
> > +		netdev_dbg(nr->ndp->ndev.dev, "NCSI: checksum mismatched\n");
> >  		return -EINVAL;
> > +	}
> >  
> >  	return 0;
> >  }
> > @@ -941,6 +956,26 @@ static int ncsi_rsp_handler_gpuuid(struct ncsi_request *nr)
> >  	return 0;
> >  }
> >  
> > +static int ncsi_rsp_handler_netlink(struct ncsi_request *nr)
> > +{
> > +	struct ncsi_rsp_pkt *rsp;
> > +	struct ncsi_dev_priv *ndp = nr->ndp;
> > +	struct ncsi_package *np;
> > +	struct ncsi_channel *nc;
> > +	int ret;
> > +
> > +	/* Find the package */
> > +	rsp = (struct ncsi_rsp_pkt *)skb_network_header(nr->rsp);
> > +	ncsi_find_package_and_channel(ndp, rsp->rsp.common.channel,
> > +				      &np, &nc);
> > +	if (!np)
> > +		return -ENODEV;
> > +
> > +	ret = ncsi_send_netlink_rsp(nr, np, nc);
> > +
> > +	return ret;
> > +}
> > +
> >  static struct ncsi_rsp_handler {
> >  	unsigned char	type;
> >  	int             payload;
> > @@ -1043,6 +1078,17 @@ int ncsi_rcv_rsp(struct sk_buff *skb, struct net_device *dev,
> >  		netdev_warn(ndp->ndev.dev,
> >  			    "NCSI: 'bad' packet ignored for type 0x%x\n",
> >  			    hdr->type);
> > +
> > +		if (nr->flags == NCSI_REQ_FLAG_NETLINK_DRIVEN) {
> > +			if (ret == -EPERM)
> > +				goto out_netlink;
> > +			else
> > +				ncsi_send_netlink_err(ndp->ndev.dev,
> > +						      nr->snd_seq,
> > +						      nr->snd_portid,
> > +						      &nr->nlhdr,
> > +						      ret);
> > +		}
> >  		goto out;
> >  	}
> >  
> > @@ -1052,6 +1098,17 @@ int ncsi_rcv_rsp(struct sk_buff *skb, struct net_device *dev,
> >  		netdev_err(ndp->ndev.dev,
> >  			   "NCSI: Handler for packet type 0x%x returned %d\n",
> >  			   hdr->type, ret);
> > +
> > +out_netlink:
> > +	if (nr->flags == NCSI_REQ_FLAG_NETLINK_DRIVEN) {
> > +		ret = ncsi_rsp_handler_netlink(nr);
> > +		if (ret) {
> > +			netdev_err(ndp->ndev.dev,
> > +				   "NCSI: Netlink handler for packet type 0x%x returned %d\n",
> > +				   hdr->type, ret);
> > +		}
> > +	}
> > +
> >  out:
> >  	ncsi_free_request(nr);
> >  	return ret;
> > -- 
> > 2.9.3
> > 
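
As a side note on the length check in ncsi_send_cmd_nl() quoted above, here is a minimal userspace sketch of validating a raw command blob of the kind passed in NCSI_ATTR_DATA. The helper name `ncsi_blob_check()` and the offset macros are hypothetical, and the 16-byte layout is a simplified stand-in for struct ncsi_pkt_hdr. The second check (declared payload length against the bytes actually supplied) is an assumption of this sketch and is not part of the patch.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the kernel's struct ncsi_pkt_hdr (16 bytes). */
#define HDR_LEN        16u
#define HDR_OFF_TYPE    4u
#define HDR_OFF_LENGTH  6u	/* big-endian 16-bit payload length */

/*
 * Hypothetical helper: returns 0 and fills *type and *payload on success.
 * Returns -EINVAL if the blob is shorter than a header, or (extra check,
 * not in the patch) if the declared payload exceeds the supplied bytes.
 */
static int ncsi_blob_check(const uint8_t *buf, size_t len,
			   uint8_t *type, uint16_t *payload)
{
	uint16_t plen;

	if (!buf || len < HDR_LEN)
		return -EINVAL;

	/* Decode big-endian by hand; no alignment or ntohs() dependency. */
	plen = (uint16_t)((buf[HDR_OFF_LENGTH] << 8) | buf[HDR_OFF_LENGTH + 1]);
	if (plen > len - HDR_LEN)
		return -EINVAL;

	*type = buf[HDR_OFF_TYPE];
	*payload = plen;
	return 0;
}
```

Because the check is about the whole command header rather than OEM commands specifically, a neutral log message fits it better, which matches the review comment above.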



^ permalink raw reply

* [net  1/1] tipc: queue socket protocol error messages into socket receive buffer
From: Jon Maloy @ 2018-10-10 15:50 UTC (permalink / raw)
  To: davem, netdev
  Cc: gordan.mihaljevic, tung.q.nguyen, hoang.h.le, jon.maloy,
	canh.d.luu, ying.xue, tipc-discussion

From: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>

In tipc_sk_filter_rcv(), when we detect protocol messages with error we
call tipc_sk_conn_proto_rcv() and let it reset the connection and notify
the socket by calling sk->sk_state_change().

However, tipc_sk_filter_rcv() may have been called from the function
tipc_backlog_rcv(), in which case the socket lock is held and the socket
already awake. This means that the sk_state_change() call is ignored and
the error notification lost. Now the receive queue will remain empty and
the socket sleeps forever.

In this commit, we convert the protocol message into a connection abort
message and enqueue it into the socket's receive queue. By this addition
to the above state change we cover all conditions.

Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
---
 net/tipc/socket.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index b6f99b0..49810fd 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -1196,6 +1196,7 @@ void tipc_sk_mcast_rcv(struct net *net, struct sk_buff_head *arrvq,
  * @skb: pointer to message buffer.
  */
 static void tipc_sk_conn_proto_rcv(struct tipc_sock *tsk, struct sk_buff *skb,
+				   struct sk_buff_head *inputq,
 				   struct sk_buff_head *xmitq)
 {
 	struct tipc_msg *hdr = buf_msg(skb);
@@ -1213,7 +1214,16 @@ static void tipc_sk_conn_proto_rcv(struct tipc_sock *tsk, struct sk_buff *skb,
 		tipc_node_remove_conn(sock_net(sk), tsk_peer_node(tsk),
 				      tsk_peer_port(tsk));
 		sk->sk_state_change(sk);
-		goto exit;
+
+		/* State change is ignored if socket already awake,
+		 * - convert msg to abort msg and add to inqueue
+		 */
+		msg_set_user(hdr, TIPC_CRITICAL_IMPORTANCE);
+		msg_set_type(hdr, TIPC_CONN_MSG);
+		msg_set_size(hdr, BASIC_H_SIZE);
+		msg_set_hdr_sz(hdr, BASIC_H_SIZE);
+		__skb_queue_tail(inputq, skb);
+		return;
 	}
 
 	tsk->probe_unacked = false;
@@ -1936,7 +1946,7 @@ static void tipc_sk_proto_rcv(struct sock *sk,
 
 	switch (msg_user(hdr)) {
 	case CONN_MANAGER:
-		tipc_sk_conn_proto_rcv(tsk, skb, xmitq);
+		tipc_sk_conn_proto_rcv(tsk, skb, inputq, xmitq);
 		return;
 	case SOCK_WAKEUP:
 		tipc_dest_del(&tsk->cong_links, msg_orignode(hdr), 0);
-- 
2.1.4

^ permalink raw reply related

* [net  1/1] tipc: set link tolerance correctly in broadcast link
From: Jon Maloy @ 2018-10-10 15:34 UTC (permalink / raw)
  To: davem, netdev
  Cc: gordan.mihaljevic, tung.q.nguyen, hoang.h.le, jon.maloy,
	canh.d.luu, ying.xue, tipc-discussion

In the patch referred to below we added link tolerance as an additional
criterion for declaring broadcast transmission "stale" and resetting the
affected links.

However, the 'tolerance' field of the broadcast link is never set, and
remains at zero. This renders the whole commit without the intended
improving effect, but luckily also with no negative effect.

In this commit we add the missing initialization.

Fixes: a4dc70d46cf1 ("tipc: extend link reset criteria for stale packet retransmission")
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
---
 net/tipc/link.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/net/tipc/link.c b/net/tipc/link.c
index fb886b5..d229a36 100644
--- a/net/tipc/link.c
+++ b/net/tipc/link.c
@@ -477,6 +477,8 @@ bool tipc_link_create(struct net *net, char *if_name, int bearer_id,
 	l->in_session = false;
 	l->bearer_id = bearer_id;
 	l->tolerance = tolerance;
+	if (bc_rcvlink)
+		bc_rcvlink->tolerance = tolerance;
 	l->net_plane = net_plane;
 	l->advertised_mtu = mtu;
 	l->mtu = mtu;
@@ -1031,7 +1033,7 @@ static int tipc_link_retrans(struct tipc_link *l, struct tipc_link *r,
 	/* Detect repeated retransmit failures on same packet */
 	if (r->last_retransm != buf_seqno(skb)) {
 		r->last_retransm = buf_seqno(skb);
-		r->stale_limit = jiffies + msecs_to_jiffies(l->tolerance);
+		r->stale_limit = jiffies + msecs_to_jiffies(r->tolerance);
 	} else if (++r->stale_cnt > 99 && time_after(jiffies, r->stale_limit)) {
 		link_retransmit_failure(l, skb);
 		if (link_is_bc_sndlink(l))
@@ -1576,9 +1578,10 @@ static int tipc_link_proto_rcv(struct tipc_link *l, struct sk_buff *skb,
 		strncpy(if_name, data, TIPC_MAX_IF_NAME);
 
 		/* Update own tolerance if peer indicates a non-zero value */
-		if (in_range(peers_tol, TIPC_MIN_LINK_TOL, TIPC_MAX_LINK_TOL))
+		if (in_range(peers_tol, TIPC_MIN_LINK_TOL, TIPC_MAX_LINK_TOL)) {
 			l->tolerance = peers_tol;
-
+			l->bc_rcvlink->tolerance = peers_tol;
+		}
 		/* Update own priority if peer's priority is higher */
 		if (in_range(peers_prio, l->priority + 1, TIPC_MAX_LINK_PRI))
 			l->priority = peers_prio;
@@ -1604,9 +1607,10 @@ static int tipc_link_proto_rcv(struct tipc_link *l, struct sk_buff *skb,
 		l->rcv_nxt_state = msg_seqno(hdr) + 1;
 
 		/* Update own tolerance if peer indicates a non-zero value */
-		if (in_range(peers_tol, TIPC_MIN_LINK_TOL, TIPC_MAX_LINK_TOL))
+		if (in_range(peers_tol, TIPC_MIN_LINK_TOL, TIPC_MAX_LINK_TOL)) {
 			l->tolerance = peers_tol;
-
+			l->bc_rcvlink->tolerance = peers_tol;
+		}
 		/* Update own prio if peer indicates a different value */
 		if ((peers_prio != l->priority) &&
 		    in_range(peers_prio, 1, TIPC_MAX_LINK_PRI)) {
@@ -2223,6 +2227,8 @@ void tipc_link_set_tolerance(struct tipc_link *l, u32 tol,
 			     struct sk_buff_head *xmitq)
 {
 	l->tolerance = tol;
+	if (l->bc_rcvlink)
+		l->bc_rcvlink->tolerance = tol;
 	if (link_is_up(l))
 		tipc_link_build_proto_msg(l, STATE_MSG, 0, 0, 0, tol, 0, xmitq);
 }
-- 
2.1.4

^ permalink raw reply related

* Re: Re: BUG: corrupted list in p9_read_work
From: Dominique Martinet @ 2018-10-10 22:55 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: syzbot, David Miller, Eric Van Hensbergen, LKML, Latchesar Ionkov,
	netdev, Ron Minnich, syzkaller-bugs, v9fs-developer
In-Reply-To: <CACT4Y+ayTAyRghU5VmJgBdKaHxNCyoXf0NxXLgMGQf8aiY37Tg@mail.gmail.com>

Dmitry Vyukov wrote on Wed, Oct 10, 2018:
> yeeeep, this is bug:
> https://github.com/google/syzkaller/issues/728

Yeah, it makes sense; I just had to stumble on it once :)

> Turns out git fetch of a named remote and just a tree work
> differently. The latter only fetches the main branch.
> 
> 'git fetch <repo> <hash>' is it a thing? Is it something that requires
> special server configuration? I remember something similar that wasn't
> able to fetch a random commit hash all the time...

With my version of git (2.19.1), this works with a local tree (git fetch
/path/to/repo ref) and with a github remote, but not with a gitolite
remote; I don't think it's safe to assume it'll always work, but it can
work.

> The plan was to make a named remote and then fetch it, this should
> fetch everything.

Yeah, that's less efficient but that'd fetch all named branches at
least; probably safer than what I tried :)

Anyway, the fix seems to be a hit, so cool!

Thanks!
-- 
Dominique

^ permalink raw reply

* Re: KASAN: use-after-free Read in sctp_id2assoc
From: Dmitry Vyukov @ 2018-10-10 15:28 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner
  Cc: syzbot, David Miller, LKML, linux-sctp, netdev, Neil Horman,
	syzkaller-bugs, Vladislav Yasevich
In-Reply-To: <20181005145855.GB6761@localhost.localdomain>

On Fri, Oct 5, 2018 at 4:58 PM, Marcelo Ricardo Leitner
<marcelo.leitner@gmail.com> wrote:
> On Thu, Oct 04, 2018 at 01:48:03AM -0700, syzbot wrote:
>> Hello,
>>
>> syzbot found the following crash on:
>>
>> HEAD commit:    4e6d47206c32 tls: Add support for inplace records encryption
>> git tree:       net-next
>> console output: https://syzkaller.appspot.com/x/log.txt?x=13834b81400000
>> kernel config:  https://syzkaller.appspot.com/x/.config?x=e569aa5632ebd436
>> dashboard link: https://syzkaller.appspot.com/bug?extid=c7dd55d7aec49d48e49a
>> compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
>>
>> Unfortunately, I don't have any reproducer for this crash yet.
>>
>> IMPORTANT: if you fix the bug, please add the following tag to the commit:
>> Reported-by: syzbot+c7dd55d7aec49d48e49a@syzkaller.appspotmail.com
>>
>> netlink: 'syz-executor1': attribute type 1 has an invalid length.
>> ==================================================================
>> BUG: KASAN: use-after-free in sctp_id2assoc+0x3a7/0x3e0
>> net/sctp/socket.c:276
>> Read of size 8 at addr ffff880195b3eb20 by task syz-executor2/15454
>>
>> CPU: 1 PID: 15454 Comm: syz-executor2 Not tainted 4.19.0-rc5+ #242
>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
>> Google 01/01/2011
>> Call Trace:
>>  __dump_stack lib/dump_stack.c:77 [inline]
>>  dump_stack+0x1c4/0x2b4 lib/dump_stack.c:113
>>  print_address_description.cold.8+0x9/0x1ff mm/kasan/report.c:256
>>  kasan_report_error mm/kasan/report.c:354 [inline]
>>  kasan_report.cold.9+0x242/0x309 mm/kasan/report.c:412
>>  __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433
>>  sctp_id2assoc+0x3a7/0x3e0 net/sctp/socket.c:276
>
> I'm not seeing yet how this could happen.
> All sockopts here are serialized by sock_lock.
> do_peeloff here would create another socket, but the issue was
> triggered before that.
> The same function that freed this memory, also removes the entry from
> idr mapping, so this entry shouldn't be there anymore.
>
> I have only two theories so far:
> - an issue with IDR/RCU.
> - something else happened that just the call stacks are not revealing.

The "asoc->base.sk != sk" check after idr_find suggests that we don't
actually know which sock the association belongs to. And if we don't know,
then locking this sock can't help keep another sock's association alive.
Am I missing something obvious here? Should we take an assoc ref while we
are still holding sctp_assocs_id_lock?
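
To make the question concrete, here is a rough userspace sketch of the
pattern I mean: look the object up, check the owner and take the reference
all under the same lock that the removal path takes. Everything below is a
made-up model (a small array stands in for the IDR, a pthread mutex for
sctp_assocs_id_lock, "owner" for asoc->base.sk); it is not code from
net/sctp and not a proposed patch.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical userspace model; names do not come from net/sctp. */
struct assoc {
	int id;
	int refcnt;		/* protected by table_lock in this sketch */
	const void *owner;	/* stands in for asoc->base.sk */
};

#define TABLE_SIZE 8
static struct assoc *table[TABLE_SIZE];	/* stands in for the IDR */
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

/* Look up by id; pin the object before the lock is dropped. */
static struct assoc *assoc_get(int id, const void *owner)
{
	struct assoc *a = NULL;

	pthread_mutex_lock(&table_lock);
	if (id >= 0 && id < TABLE_SIZE)
		a = table[id];
	if (a && a->owner != owner)
		a = NULL;	/* belongs to another socket */
	if (a)
		a->refcnt++;	/* cannot go away after unlock */
	pthread_mutex_unlock(&table_lock);
	return a;
}

/* Drop one reference; returns 1 if this was the last one. */
static int assoc_put(struct assoc *a)
{
	int last;

	pthread_mutex_lock(&table_lock);
	last = (--a->refcnt == 0);
	pthread_mutex_unlock(&table_lock);
	return last;
}

/* Removal path: unpublish the id; the caller then drops its own ref. */
static struct assoc *assoc_remove(int id)
{
	struct assoc *a = NULL;

	pthread_mutex_lock(&table_lock);
	if (id >= 0 && id < TABLE_SIZE) {
		a = table[id];
		table[id] = NULL;
	}
	pthread_mutex_unlock(&table_lock);
	return a;
}
```

In this model a caller that got a pointer from assoc_get() can keep using
it even if another thread runs assoc_remove() right after the lookup,
because the object is only released on the last assoc_put().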

^ permalink raw reply

* Re: [PATCH net v2 1/2] net: ipv4: update fnhe_pmtu when first hop's MTU changes
From: David Ahern @ 2018-10-10 15:27 UTC (permalink / raw)
  To: Sabrina Dubroca, netdev; +Cc: Stefano Brivio
In-Reply-To: <9db7a0c06a26f2c152efd08e22b1d2ef80e977c1.1539073548.git.sd@queasysnail.net>

On 10/9/18 9:48 AM, Sabrina Dubroca wrote:
> Since commit 5aad1de5ea2c ("ipv4: use separate genid for next hop
> exceptions"), exceptions get deprecated separately from cached
> routes. In particular, administrative changes don't clear PMTU anymore.
> 
> As Stefano described in commit e9fa1495d738 ("ipv6: Reflect MTU changes
> on PMTU of exceptions for MTU-less routes"), the PMTU discovered before
> the local MTU change can become stale:
>  - if the local MTU is now lower than the PMTU, that PMTU is now
>    incorrect
>  - if the local MTU was the lowest value in the path, and is increased,
>    we might discover a higher PMTU
> 
> Similarly to what commit e9fa1495d738 did for IPv6, update PMTU in those
> cases.
> 
> If the exception was locked, the discovered PMTU was smaller than the
> minimal accepted PMTU. In that case, if the new local MTU is smaller
> than the current PMTU, let PMTU discovery figure out if locking of the
> exception is still needed.
> 
> To do this, we need to know the old link MTU in the NETDEV_CHANGEMTU
> notifier. By the time the notifier is called, dev->mtu has been
> changed. This patch adds the old MTU as additional information in the
> notifier structure, and a new call_netdevice_notifiers_u32() function.
> 
> Fixes: 5aad1de5ea2c ("ipv4: use separate genid for next hop exceptions")
> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
> Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
> ---
> v2:
>  - s/u32/mtu/ in netdev_notifier_info_ext and call_netdevice_notifiers_
>    helper, suggested by David Ahern
>  - don't EXPORT_SYMBOL the helper, it's only used in net/core/dev.c
>  - fix typo in commit message
>  - fix kerneldoc comment, spotted by kbuild bot
> 
>  include/linux/netdevice.h |  7 ++++++
>  include/net/ip_fib.h      |  1 +
>  net/core/dev.c            | 28 ++++++++++++++++++++--
>  net/ipv4/fib_frontend.c   | 12 ++++++----
>  net/ipv4/fib_semantics.c  | 50 +++++++++++++++++++++++++++++++++++++++
>  5 files changed, 92 insertions(+), 6 deletions(-)

Reviewed-by: David Ahern <dsahern@gmail.com>
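
For anyone following along, the update rule described in the commit
message can be sketched in userspace roughly as follows. This is my reading
of the description quoted above, not the code from the patch; the struct
and function names are invented for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one cached exception; not the kernel's struct. */
struct pmtu_exception {
	uint32_t pmtu;	/* PMTU learned for this destination */
	bool locked;	/* learned PMTU was below the minimal accepted PMTU */
};

/* Apply a local MTU change (old_mtu -> new_mtu) to one exception. */
static void exception_update_mtu(struct pmtu_exception *e,
				 uint32_t old_mtu, uint32_t new_mtu)
{
	if (e->locked) {
		/* Let PMTU discovery decide again whether to lock */
		if (new_mtu < e->pmtu) {
			e->pmtu = new_mtu;
			e->locked = false;
		}
	} else if (new_mtu < e->pmtu || old_mtu == e->pmtu) {
		/*
		 * Either the cached PMTU is now too high, or the local
		 * link was the bottleneck and a higher PMTU may be found.
		 */
		e->pmtu = new_mtu;
	}
}
```

The old MTU is only needed for the second case, which is why the notifier
has to carry it.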

^ permalink raw reply

* RE: [RFC PATCH 2/2] net/ncsi: Configure multi-package, multi-channel modes with failover
From: Justin.Lee1 @ 2018-10-10 22:36 UTC (permalink / raw)
  To: sam, netdev; +Cc: davem, linux-kernel, openbmc
In-Reply-To: <20181009035815.5246-2-sam@mendozajonas.com>

Hi Samuel,

I am still testing your change and have some comments below.

Thanks,
Justin


> This patch extends the ncsi-netlink interface with two new commands and
> three new attributes to configure multiple packages and/or channels at
> once, and configure specific failover modes.
> 
> NCSI_CMD_SET_PACKAGE mask and NCSI_CMD_SET_CHANNEL_MASK set a whitelist
> of packages or channels allowed to be configured with the
> NCSI_ATTR_PACKAGE_MASK and NCSI_ATTR_CHANNEL_MASK attributes
> respectively. If one of these whitelists is set only packages or
> channels matching the whitelist are considered for the channel queue in
> ncsi_choose_active_channel().
> 
> These commands may also use the NCSI_ATTR_MULTI_FLAG to signal that
> multiple packages or channels may be configured simultaneously. NCSI
> hardware arbitration (HWA) must be available in order to enable
> multi-package mode. Multi-channel mode is always available.
> 
> If the NCSI_ATTR_CHANNEL_ID attribute is present in the
> NCSI_CMD_SET_CHANNEL_MASK command the it sets the preferred channel as
> with the NCSI_CMD_SET_INTERFACE command. The combination of preferred
> channel and channel whitelist defines a primary channel and the allowed
> failover channels.
> If the NCSI_ATTR_MULTI_FLAG attribute is also present then the preferred
> channel is configured for Tx/Rx and the other channels are enabled only
> for Rx.
> 
> Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
> ---
>  include/uapi/linux/ncsi.h |  16 +++
>  net/ncsi/internal.h       |  11 +-
>  net/ncsi/ncsi-aen.c       |   2 +-
>  net/ncsi/ncsi-manage.c    | 138 ++++++++++++++++--------
>  net/ncsi/ncsi-netlink.c   | 217 +++++++++++++++++++++++++++++++++-----
>  net/ncsi/ncsi-rsp.c       |   2 +-
>  6 files changed, 312 insertions(+), 74 deletions(-)
> 
> diff --git a/include/uapi/linux/ncsi.h b/include/uapi/linux/ncsi.h
> index 4c292ecbb748..035fba1693f9 100644
> --- a/include/uapi/linux/ncsi.h
> +++ b/include/uapi/linux/ncsi.h
> @@ -23,6 +23,13 @@
>   *	optionally the preferred NCSI_ATTR_CHANNEL_ID.
>   * @NCSI_CMD_CLEAR_INTERFACE: clear any preferred package/channel combination.
>   *	Requires NCSI_ATTR_IFINDEX.
> + * @NCSI_CMD_SET_PACKAGE_MASK: set a whitelist of allowed packages.
> + * @NCSI_CMD_SET_PACKAGE_MASK: set a whitelist of allowed channels.
> + *	Requires NCSI_ATTR_IFINDEX and NCSI_ATTR_PACKAGE_MASK.
> + * @NCSI_CMD_SET_PACKAGE_MASK: set a whitelist of allowed channels.
> + *	Requires NCSI_ATTR_IFINDEX, NCSI_ATTR_PACKAGE_ID, and
> + *	NCSI_ATTR_CHANNEL_MASK. If NCSI_ATTR_CHANNEL_ID is present it sets
> + *	the primary channel.
>   * @NCSI_CMD_MAX: highest command number
>   */

There are some typos in the description.
 * @NCSI_CMD_SET_PACKAGE_MASK: set a whitelist of allowed packages.
 *	Requires NCSI_ATTR_IFINDEX and NCSI_ATTR_PACKAGE_MASK.
 * @NCSI_CMD_SET_CHANNEL_MASK: set a whitelist of allowed channels.
 *	Requires NCSI_ATTR_IFINDEX, NCSI_ATTR_PACKAGE_ID, and
 *	NCSI_ATTR_CHANNEL_MASK. If NCSI_ATTR_CHANNEL_ID is present it sets
 *	the primary channel.

>  enum ncsi_nl_commands {
> @@ -30,6 +37,8 @@ enum ncsi_nl_commands {
>  	NCSI_CMD_PKG_INFO,
>  	NCSI_CMD_SET_INTERFACE,
>  	NCSI_CMD_CLEAR_INTERFACE,
> +	NCSI_CMD_SET_PACKAGE_MASK,
> +	NCSI_CMD_SET_CHANNEL_MASK,
>  
>  	__NCSI_CMD_AFTER_LAST,
>  	NCSI_CMD_MAX = __NCSI_CMD_AFTER_LAST - 1
> @@ -43,6 +52,10 @@ enum ncsi_nl_commands {
>   * @NCSI_ATTR_PACKAGE_LIST: nested array of NCSI_PKG_ATTR attributes
>   * @NCSI_ATTR_PACKAGE_ID: package ID
>   * @NCSI_ATTR_CHANNEL_ID: channel ID
> + * @NCSI_ATTR_MULTI_FLAG: flag to signal that multi-mode should be enabled with
> + *	NCSI_CMD_SET_PACKAGE_MASK or NCSI_CMD_SET_CHANNEL_MASK.
> + * @NCSI_ATTR_PACKAGE_MASK: 32-bit mask of allowed packages.
> + * @NCSI_ATTR_CHANNEL_MASK: 32-bit mask of allowed channels.
>   * @NCSI_ATTR_MAX: highest attribute number
>   */
>  enum ncsi_nl_attrs {
> @@ -51,6 +64,9 @@ enum ncsi_nl_attrs {
>  	NCSI_ATTR_PACKAGE_LIST,
>  	NCSI_ATTR_PACKAGE_ID,
>  	NCSI_ATTR_CHANNEL_ID,
> +	NCSI_ATTR_MULTI_FLAG,
> +	NCSI_ATTR_PACKAGE_MASK,
> +	NCSI_ATTR_CHANNEL_MASK,

Is there a case that we might set these two masks at the same time?
If not, maybe we can just have one generic MASK attribute.
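
For reference, this is how I read the two whitelists: both are 32-bit
masks, both are tested the same way in ncsi_choose_active_channel(), and
UINT_MAX (the default in this patch) allows everything. A minimal
userspace sketch, with helper names made up for illustration:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if bit 'id' is set in a 32-bit whitelist. */
static bool mask_allows(uint32_t mask, unsigned int id)
{
	/* IDs outside the mask width are never allowed */
	return id < 32 && (mask & (UINT32_C(1) << id)) != 0;
}

/* Hypothetical helper: whitelist that contains exactly one ID. */
static uint32_t mask_single(unsigned int id)
{
	return id < 32 ? UINT32_C(1) << id : 0;
}
```

Since the package mask and the channel mask would go through the same
helper, a single generic mask attribute looks workable to me.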

>  
>  	__NCSI_ATTR_AFTER_LAST,
>  	NCSI_ATTR_MAX = __NCSI_ATTR_AFTER_LAST - 1
> diff --git a/net/ncsi/internal.h b/net/ncsi/internal.h
> index 3d0a33b874f5..8437474d0a78 100644
> --- a/net/ncsi/internal.h
> +++ b/net/ncsi/internal.h
> @@ -213,6 +213,10 @@ struct ncsi_package {
>  	unsigned int         channel_num; /* Number of channels     */
>  	struct list_head     channels;    /* List of chanels        */
>  	struct list_head     node;        /* Form list of packages  */
> +
> +	bool                 multi_channel; /* Enable multiple channels  */
> +	u32                  channel_whitelist; /* Channels to configure */
> +	struct ncsi_channel  *preferred_channel; /* Primary channel      */
>  };
>  
>  struct ncsi_request {
> @@ -280,8 +284,6 @@ struct ncsi_dev_priv {
>  	unsigned int        package_num;     /* Number of packages         */
>  	struct list_head    packages;        /* List of packages           */
>  	struct ncsi_channel *hot_channel;    /* Channel was ever active    */
> -	struct ncsi_package *force_package;  /* Force a specific package   */
> -	struct ncsi_channel *force_channel;  /* Force a specific channel   */
>  	struct ncsi_request requests[256];   /* Request table              */
>  	unsigned int        request_id;      /* Last used request ID       */
>  #define NCSI_REQ_START_IDX	1
> @@ -294,6 +296,9 @@ struct ncsi_dev_priv {
>  	struct list_head    node;            /* Form NCSI device list      */
>  #define NCSI_MAX_VLAN_VIDS	15
>  	struct list_head    vlan_vids;       /* List of active VLAN IDs */
> +
> +	bool                multi_package;   /* Enable multiple packages   */
> +	u32                 package_whitelist; /* Packages to configure    */
>  };
>  
>  struct ncsi_cmd_arg {
> @@ -345,6 +350,8 @@ struct ncsi_request *ncsi_alloc_request(struct ncsi_dev_priv *ndp,
>  void ncsi_free_request(struct ncsi_request *nr);
>  struct ncsi_dev *ncsi_find_dev(struct net_device *dev);
>  int ncsi_process_next_channel(struct ncsi_dev_priv *ndp);
> +bool ncsi_channel_is_last(struct ncsi_dev_priv *ndp,
> +			  struct ncsi_channel *channel);
>  
>  /* Packet handlers */
>  u32 ncsi_calculate_checksum(unsigned char *data, int len);
> diff --git a/net/ncsi/ncsi-aen.c b/net/ncsi/ncsi-aen.c
> index 65f47a648be3..eac56aee30c4 100644
> --- a/net/ncsi/ncsi-aen.c
> +++ b/net/ncsi/ncsi-aen.c
> @@ -86,7 +86,7 @@ static int ncsi_aen_handler_lsc(struct ncsi_dev_priv *ndp,
>  	    !(state == NCSI_CHANNEL_ACTIVE && !(data & 0x1)))
>  		return 0;
>  
> -	if (state == NCSI_CHANNEL_ACTIVE)
> +	if (state == NCSI_CHANNEL_ACTIVE && ncsi_channel_is_last(ndp, nc))
>  		ndp->flags |= NCSI_DEV_RESHUFFLE;
>  
>  	ncsi_stop_channel_monitor(nc);
> diff --git a/net/ncsi/ncsi-manage.c b/net/ncsi/ncsi-manage.c
> index 665bee25ec44..6a55df700bcb 100644
> --- a/net/ncsi/ncsi-manage.c
> +++ b/net/ncsi/ncsi-manage.c
> @@ -27,6 +27,24 @@
>  LIST_HEAD(ncsi_dev_list);
>  DEFINE_SPINLOCK(ncsi_dev_lock);
>  
> +/* Returns true if the given channel is the last channel available */
> +bool ncsi_channel_is_last(struct ncsi_dev_priv *ndp,
> +			  struct ncsi_channel *channel)
> +{
> +	struct ncsi_package *np;
> +	struct ncsi_channel *nc;
> +
> +	NCSI_FOR_EACH_PACKAGE(ndp, np)
> +		NCSI_FOR_EACH_CHANNEL(np, nc) {
> +			if (nc == channel)
> +				continue;
> +			if (nc->state == NCSI_CHANNEL_ACTIVE)
> +				return false;
> +		}
> +
> +	return true;
> +}
> +
>  static void ncsi_report_link(struct ncsi_dev_priv *ndp, bool force_down)
>  {
>  	struct ncsi_dev *nd = &ndp->ndev;
> @@ -266,6 +284,7 @@ struct ncsi_package *ncsi_add_package(struct ncsi_dev_priv *ndp,
>  	np->ndp = ndp;
>  	spin_lock_init(&np->lock);
>  	INIT_LIST_HEAD(&np->channels);
> +	np->channel_whitelist = UINT_MAX;
>  
>  	spin_lock_irqsave(&ndp->lock, flags);
>  	tmp = ncsi_find_package(ndp, id);
> @@ -633,6 +652,34 @@ static int set_one_vid(struct ncsi_dev_priv *ndp, struct ncsi_channel *nc,
>  	return 0;
>  }
>  
> +/* Determine if a given channel should be the Tx channel */
> +bool ncsi_channel_is_tx(struct ncsi_dev_priv *ndp, struct ncsi_channel *nc)
> +{
> +	struct ncsi_package *np = nc->package;
> +	struct ncsi_channel *channel;
> +	struct ncsi_channel_mode *ncm;
> +
> +	NCSI_FOR_EACH_CHANNEL(np, channel) {
> +		ncm = &channel->modes[NCSI_MODE_TX_ENABLE];
> +		/* Another channel is already Tx */
> +		if (ncm->enable)
> +			return false;
> +	}
> +
> +	if (!np->preferred_channel)
> +		return true;
> +
> +	if (np->preferred_channel == nc)
> +		return true;
> +
> +	/* The preferred channel is not in the queue and not active */
> +	if (list_empty(&np->preferred_channel->link) &&
> +	    np->preferred_channel->state != NCSI_CHANNEL_ACTIVE)
> +		return true;
> +
> +	return false;
> +}
> +
>  static void ncsi_configure_channel(struct ncsi_dev_priv *ndp)
>  {
>  	struct ncsi_dev *nd = &ndp->ndev;
> @@ -745,18 +792,22 @@ static void ncsi_configure_channel(struct ncsi_dev_priv *ndp)
>  		} else if (nd->state == ncsi_dev_state_config_ebf) {
>  			nca.type = NCSI_PKT_CMD_EBF;
>  			nca.dwords[0] = nc->caps[NCSI_CAP_BC].cap;
> -			nd->state = ncsi_dev_state_config_ecnt;
> +			if (ncsi_channel_is_tx(ndp, nc))
> +				nd->state = ncsi_dev_state_config_ecnt;
> +			else
> +				nd->state = ncsi_dev_state_config_ec;
>  #if IS_ENABLED(CONFIG_IPV6)
>  			if (ndp->inet6_addr_num > 0 &&
>  			    (nc->caps[NCSI_CAP_GENERIC].cap &
>  			     NCSI_CAP_GENERIC_MC))
>  				nd->state = ncsi_dev_state_config_egmf;
> -			else
> -				nd->state = ncsi_dev_state_config_ecnt;
>  		} else if (nd->state == ncsi_dev_state_config_egmf) {
>  			nca.type = NCSI_PKT_CMD_EGMF;
>  			nca.dwords[0] = nc->caps[NCSI_CAP_MC].cap;
> -			nd->state = ncsi_dev_state_config_ecnt;
> +			if (ncsi_channel_is_tx(ndp, nc))
> +				nd->state = ncsi_dev_state_config_ecnt;
> +			else
> +				nd->state = ncsi_dev_state_config_ec;
>  #endif /* CONFIG_IPV6 */
>  		} else if (nd->state == ncsi_dev_state_config_ecnt) {
>  			nca.type = NCSI_PKT_CMD_ECNT;
> @@ -840,43 +891,35 @@ static void ncsi_configure_channel(struct ncsi_dev_priv *ndp)
>  
>  static int ncsi_choose_active_channel(struct ncsi_dev_priv *ndp)
>  {
> -	struct ncsi_package *np, *force_package;
> -	struct ncsi_channel *nc, *found, *hot_nc, *force_channel;
> +	struct ncsi_package *np;
> +	struct ncsi_channel *nc, *found, *hot_nc;
>  	struct ncsi_channel_mode *ncm;
> -	unsigned long flags;
> +	unsigned long flags, cflags;
> +	bool with_link;
>  
>  	spin_lock_irqsave(&ndp->lock, flags);
>  	hot_nc = ndp->hot_channel;
> -	force_channel = ndp->force_channel;
> -	force_package = ndp->force_package;
>  	spin_unlock_irqrestore(&ndp->lock, flags);
>  
> -	/* Force a specific channel whether or not it has link if we have been
> -	 * configured to do so
> -	 */
> -	if (force_package && force_channel) {
> -		found = force_channel;
> -		ncm = &found->modes[NCSI_MODE_LINK];
> -		if (!(ncm->data[2] & 0x1))
> -			netdev_info(ndp->ndev.dev,
> -				    "NCSI: Channel %u forced, but it is link down\n",
> -				    found->id);
> -		goto out;
> -	}
> -
> -	/* The search is done once an inactive channel with up
> -	 * link is found.
> +	/* By default the search is done once an inactive channel with up
> +	 * link is found, unless a preferred channel is set.
> +	 * If multi_package or multi_channel are configured all channels in the
> +	 * whitelist with link are added to the channel queue.
>  	 */
>  	found = NULL;
> +	with_link = false;
>  	NCSI_FOR_EACH_PACKAGE(ndp, np) {
> -		if (ndp->force_package && np != ndp->force_package)
> +		if (!(ndp->package_whitelist & (0x1 << np->id)))
>  			continue;
>  		NCSI_FOR_EACH_CHANNEL(np, nc) {
> -			spin_lock_irqsave(&nc->lock, flags);
> +			if (!(np->channel_whitelist & (0x1 << nc->id)))
> +				continue;
> +
> +			spin_lock_irqsave(&nc->lock, cflags);
>  
>  			if (!list_empty(&nc->link) ||
>  			    nc->state != NCSI_CHANNEL_INACTIVE) {
> -				spin_unlock_irqrestore(&nc->lock, flags);
> +				spin_unlock_irqrestore(&nc->lock, cflags);
>  				continue;
>  			}
>  
> @@ -888,32 +931,42 @@ static int ncsi_choose_active_channel(struct ncsi_dev_priv *ndp)
>  
>  			ncm = &nc->modes[NCSI_MODE_LINK];
>  			if (ncm->data[2] & 0x1) {

This data will not be updated if the channel monitor for the channel is not
running. If I move the cable from the currently configured channel to another
channel, the NC-SI module will not detect the link status, because the other
channel is not configured and no AEN will be generated.
Is it by design that the NC-SI module always uses the first interface with link?

> -				spin_unlock_irqrestore(&nc->lock, flags);
>  				found = nc;
> -				goto out;
> +				with_link = true;
> +
> +				spin_lock_irqsave(&ndp->lock, flags);
> +				list_add_tail_rcu(&found->link,
> +						  &ndp->channel_queue);
> +				spin_unlock_irqrestore(&ndp->lock, flags);
> +
> +				netdev_dbg(ndp->ndev.dev,
> +					   "NCSI: Channel %u added to queue (link %s)\n",
> +					   found->id,
> +					   ncm->data[2] & 0x1 ? "up" : "down");
>  			}
> +			spin_unlock_irqrestore(&nc->lock, cflags);
>  
> -			spin_unlock_irqrestore(&nc->lock, flags);
> +			if (with_link && !np->multi_channel)
> +				break;
>  		}
> +		if (with_link && !ndp->multi_package)
> +			break;
>  	}
>  
> -	if (!found) {
> +	if (!with_link && found) {
> +		netdev_info(ndp->ndev.dev,
> +			    "NCSI: No channel with link found, configuring channel %u\n",
> +			    found->id);
> +		spin_lock_irqsave(&ndp->lock, flags);
> +		list_add_tail_rcu(&found->link, &ndp->channel_queue);
> +		spin_unlock_irqrestore(&ndp->lock, flags);
> +	} else if (!found) {
>  		netdev_warn(ndp->ndev.dev,
> -			    "NCSI: No channel found with link\n");
> +			    "NCSI: No channel found to configure!\n");
>  		ncsi_report_link(ndp, true);
>  		return -ENODEV;
>  	}
>  
> -	ncm = &found->modes[NCSI_MODE_LINK];
> -	netdev_dbg(ndp->ndev.dev,
> -		   "NCSI: Channel %u added to queue (link %s)\n",
> -		   found->id, ncm->data[2] & 0x1 ? "up" : "down");
> -
> -out:
> -	spin_lock_irqsave(&ndp->lock, flags);
> -	list_add_tail_rcu(&found->link, &ndp->channel_queue);
> -	spin_unlock_irqrestore(&ndp->lock, flags);
> -
>  	return ncsi_process_next_channel(ndp);
>  }
>  
> @@ -1428,6 +1481,7 @@ struct ncsi_dev *ncsi_register_dev(struct net_device *dev,
>  	INIT_LIST_HEAD(&ndp->channel_queue);
>  	INIT_LIST_HEAD(&ndp->vlan_vids);
>  	INIT_WORK(&ndp->work, ncsi_dev_work);
> +	ndp->package_whitelist = UINT_MAX;
>  
>  	/* Initialize private NCSI device */
>  	spin_lock_init(&ndp->lock);
> diff --git a/net/ncsi/ncsi-netlink.c b/net/ncsi/ncsi-netlink.c
> index 32cb7751d216..33a091e6f466 100644
> --- a/net/ncsi/ncsi-netlink.c
> +++ b/net/ncsi/ncsi-netlink.c

Is the following missing from the patch?
static const struct nla_policy ncsi_genl_policy[NCSI_ATTR_MAX + 1] = {
...
	[NCSI_ATTR_MULTI_FLAG] =	{ .type = NLA_FLAG },
	[NCSI_ATTR_PACKAGE_MASK] =	{ .type = NLA_U32 },
	[NCSI_ATTR_CHANNEL_MASK] =	{ .type = NLA_U32 },

> @@ -67,7 +67,7 @@ static int ncsi_write_channel_info(struct sk_buff *skb,
>  	nla_put_u32(skb, NCSI_CHANNEL_ATTR_LINK_STATE, m->data[2]);
>  	if (nc->state == NCSI_CHANNEL_ACTIVE)
>  		nla_put_flag(skb, NCSI_CHANNEL_ATTR_ACTIVE);
> -	if (ndp->force_channel == nc)
> +	if (nc == nc->package->preferred_channel)
>  		nla_put_flag(skb, NCSI_CHANNEL_ATTR_FORCED);
>  
>  	nla_put_u32(skb, NCSI_CHANNEL_ATTR_VERSION_MAJOR, nc->version.version);
> @@ -112,7 +112,7 @@ static int ncsi_write_package_info(struct sk_buff *skb,
>  		if (!pnest)
>  			return -ENOMEM;
>  		nla_put_u32(skb, NCSI_PKG_ATTR_ID, np->id);
> -		if (ndp->force_package == np)
> +		if ((0x1 << np->id) == ndp->package_whitelist)
>  			nla_put_flag(skb, NCSI_PKG_ATTR_FORCED);
>  		cnest = nla_nest_start(skb, NCSI_PKG_ATTR_CHANNEL_LIST);
>  		if (!cnest) {
> @@ -288,45 +288,54 @@ static int ncsi_set_interface_nl(struct sk_buff *msg, struct genl_info *info)
>  	package_id = nla_get_u32(info->attrs[NCSI_ATTR_PACKAGE_ID]);
>  	package = NULL;
>  
> -	spin_lock_irqsave(&ndp->lock, flags);
> -
>  	NCSI_FOR_EACH_PACKAGE(ndp, np)
>  		if (np->id == package_id)
>  			package = np;
>  	if (!package) {
>  		/* The user has set a package that does not exist */
> -		spin_unlock_irqrestore(&ndp->lock, flags);
>  		return -ERANGE;
>  	}
>  
>  	channel = NULL;
> -	if (!info->attrs[NCSI_ATTR_CHANNEL_ID]) {
> -		/* Allow any channel */
> -		channel_id = NCSI_RESERVED_CHANNEL;
> -	} else {
> +	if (info->attrs[NCSI_ATTR_CHANNEL_ID]) {
>  		channel_id = nla_get_u32(info->attrs[NCSI_ATTR_CHANNEL_ID]);
>  		NCSI_FOR_EACH_CHANNEL(package, nc)
> -			if (nc->id == channel_id)
> +			if (nc->id == channel_id) {
>  				channel = nc;
> +				break;
> +			}
> +		if (!channel) {
> +			netdev_info(ndp->ndev.dev,
> +				    "NCSI: Channel %u does not exist!\n",
> +				    channel_id);
> +			return -ERANGE;
> +		}
>  	}
>  
> -	if (channel_id != NCSI_RESERVED_CHANNEL && !channel) {
> -		/* The user has set a channel that does not exist on this
> -		 * package
> -		 */
> -		spin_unlock_irqrestore(&ndp->lock, flags);
> -		netdev_info(ndp->ndev.dev, "NCSI: Channel %u does not exist!\n",
> -			    channel_id);
> -		return -ERANGE;
> -	}
> -
> -	ndp->force_package = package;
> -	ndp->force_channel = channel;
> +	spin_lock_irqsave(&ndp->lock, flags);
> +	ndp->package_whitelist = 0x1 << package->id;
> +	ndp->multi_package = false;
>  	spin_unlock_irqrestore(&ndp->lock, flags);
>  
> -	netdev_info(ndp->ndev.dev, "Set package 0x%x, channel 0x%x%s as preferred\n",
> -		    package_id, channel_id,
> -		    channel_id == NCSI_RESERVED_CHANNEL ? " (any)" : "");
> +	spin_lock_irqsave(&package->lock, flags);
> +	package->multi_channel = false;
> +	if (channel) {
> +		package->channel_whitelist = 0x1 << channel->id;
> +		package->preferred_channel = channel;
> +	} else {
> +		/* Allow any channel */
> +		package->channel_whitelist = UINT_MAX;
> +		package->preferred_channel = NULL;
> +	}
> +	spin_unlock_irqrestore(&package->lock, flags);
> +
> +	if (channel)
> +		netdev_info(ndp->ndev.dev,
> +			    "Set package 0x%x, channel 0x%x as preferred\n",
> +			    package_id, channel_id);
> +	else
> +		netdev_info(ndp->ndev.dev, "Set package 0x%x as preferred\n",
> +			    package_id);
>  
>  	/* Bounce the NCSI channel to set changes */
>  	ncsi_stop_dev(&ndp->ndev);
> @@ -338,6 +347,7 @@ static int ncsi_set_interface_nl(struct sk_buff *msg, struct genl_info *info)
>  static int ncsi_clear_interface_nl(struct sk_buff *msg, struct genl_info *info)
>  {
>  	struct ncsi_dev_priv *ndp;
> +	struct ncsi_package *np;
>  	unsigned long flags;
>  
>  	if (!info || !info->attrs)
> @@ -351,11 +361,19 @@ static int ncsi_clear_interface_nl(struct sk_buff *msg, struct genl_info *info)
>  	if (!ndp)
>  		return -ENODEV;
>  
> -	/* Clear any override */
> +	/* Reset any whitelists and disable multi mode */
>  	spin_lock_irqsave(&ndp->lock, flags);
> -	ndp->force_package = NULL;
> -	ndp->force_channel = NULL;
> +	ndp->package_whitelist = UINT_MAX;
> +	ndp->multi_package = false;
>  	spin_unlock_irqrestore(&ndp->lock, flags);
> +
> +	NCSI_FOR_EACH_PACKAGE(ndp, np) {
> +		spin_lock_irqsave(&np->lock, flags);
> +		np->multi_channel = false;
> +		np->channel_whitelist = UINT_MAX;
> +		np->preferred_channel = NULL;
> +		spin_unlock_irqrestore(&np->lock, flags);
> +	}
>  	netdev_info(ndp->ndev.dev, "NCSI: Cleared preferred package/channel\n");
>  
>  	/* Bounce the NCSI channel to set changes */
> @@ -365,6 +383,137 @@ static int ncsi_clear_interface_nl(struct sk_buff *msg, struct genl_info *info)
>  	return 0;
>  }
>  
> +static int ncsi_set_package_mask_nl(struct sk_buff *msg,
> +				    struct genl_info *info)
> +{
> +	struct ncsi_dev_priv *ndp;
> +	unsigned long flags;
> +	int rc;
> +
> +	if (!info || !info->attrs)
> +		return -EINVAL;
> +
> +	if (!info->attrs[NCSI_ATTR_IFINDEX])
> +		return -EINVAL;
> +
> +	if (!info->attrs[NCSI_ATTR_PACKAGE_MASK])
> +		return -EINVAL;
> +
> +	ndp = ndp_from_ifindex(get_net(sock_net(msg->sk)),
> +			       nla_get_u32(info->attrs[NCSI_ATTR_IFINDEX]));
> +	if (!ndp)
> +		return -ENODEV;
> +
> +	spin_lock_irqsave(&ndp->lock, flags);
> +	ndp->package_whitelist =
> +		nla_get_u32(info->attrs[NCSI_ATTR_PACKAGE_MASK]);
> +
> +	if (nla_get_flag(info->attrs[NCSI_ATTR_MULTI_FLAG])) {
> +		if (ndp->flags & NCSI_DEV_HWA) {
> +			ndp->multi_package = true;
> +			rc = 0;
> +		} else {
> +			netdev_err(ndp->ndev.dev,
> +				   "NCSI: Can't use multiple packages without HWA\n");
> +			rc = -EPERM;
> +		}
> +	} else {
> +		rc = 0;
> +	}
> +
> +	spin_unlock_irqrestore(&ndp->lock, flags);
> +
> +	if (!rc) {
> +		/* Bounce the NCSI channel to set changes */
> +		ncsi_stop_dev(&ndp->ndev);
> +		ncsi_start_dev(&ndp->ndev);

Is it possible to delay the restart? If we have two packages, we might send
the set_package_mask command once and the set_channel_mask command twice.
We would then see unnecessary reconfigurations within a very short period of time.

> +	}
> +
> +	return rc;
> +}
> +
> +static int ncsi_set_channel_mask_nl(struct sk_buff *msg,
> +				    struct genl_info *info)
> +{
> +	struct ncsi_package *np, *package;
> +	struct ncsi_channel *nc, *channel;
> +	struct ncsi_dev_priv *ndp;
> +	unsigned long flags;
> +	u32 package_id, channel_id;
> +
> +	if (!info || !info->attrs)
> +		return -EINVAL;
> +
> +	if (!info->attrs[NCSI_ATTR_IFINDEX])
> +		return -EINVAL;
> +
> +	if (!info->attrs[NCSI_ATTR_PACKAGE_ID])
> +		return -EINVAL;
> +
> +	if (!info->attrs[NCSI_ATTR_CHANNEL_MASK])
> +		return -EINVAL;
> +
> +	ndp = ndp_from_ifindex(get_net(sock_net(msg->sk)),
> +			       nla_get_u32(info->attrs[NCSI_ATTR_IFINDEX]));
> +	if (!ndp)
> +		return -ENODEV;
> +
> +	package_id = nla_get_u32(info->attrs[NCSI_ATTR_PACKAGE_ID]);
> +	package = NULL;
> +	NCSI_FOR_EACH_PACKAGE(ndp, np)
> +		if (np->id == package_id) {
> +			package = np;
> +			break;
> +		}
> +	if (!package)
> +		return -ERANGE;
> +
> +	spin_lock_irqsave(&package->lock, flags);
> +
> +	channel = NULL;
> +	if (info->attrs[NCSI_ATTR_CHANNEL_ID]) {
> +		channel_id = nla_get_u32(info->attrs[NCSI_ATTR_CHANNEL_ID]);
> +		NCSI_FOR_EACH_CHANNEL(np, nc)
> +			if (nc->id == channel_id) {
> +				channel = nc;
> +				break;
> +			}
> +		if (!channel) {
> +			spin_unlock_irqrestore(&package->lock, flags);
> +			return -ERANGE;
> +		}
> +		netdev_dbg(ndp->ndev.dev,
> +			   "NCSI: Channel %u set as preferred channel\n",
> +			   channel->id);
> +	}
> +
> +	package->channel_whitelist =
> +		nla_get_u32(info->attrs[NCSI_ATTR_CHANNEL_MASK]);
> +	if (package->channel_whitelist == 0)
> +		netdev_dbg(ndp->ndev.dev,
> +			   "NCSI: Package %u set to all channels disabled\n",
> +			   package->id);
> +
> +	package->preferred_channel = channel;
> +
> +	if (nla_get_flag(info->attrs[NCSI_ATTR_MULTI_FLAG])) {
> +		package->multi_channel = true;
> +		netdev_info(ndp->ndev.dev,
> +			    "NCSI: Multi-channel enabled on package %u\n",
> +			    package_id);
> +	} else {
> +		package->multi_channel = false;
> +	}
> +
> +	spin_unlock_irqrestore(&package->lock, flags);
> +
> +	/* Bounce the NCSI channel to set changes */
> +	ncsi_stop_dev(&ndp->ndev);
> +	ncsi_start_dev(&ndp->ndev);

Same question as for the set_package_mask function.
Is it possible to delay the restart? If we have two packages, we might send
the set_package_mask command once and the set_channel_mask command twice.
We would then see unnecessary reconfigurations within a very short period of time.

> +
> +	return 0;
> +}
> +
>  static const struct genl_ops ncsi_ops[] = {
>  	{
>  		.cmd = NCSI_CMD_PKG_INFO,
> @@ -385,6 +534,18 @@ static const struct genl_ops ncsi_ops[] = {
>  		.doit = ncsi_clear_interface_nl,
>  		.flags = GENL_ADMIN_PERM,
>  	},
> +	{
> +		.cmd = NCSI_CMD_SET_PACKAGE_MASK,
> +		.policy = ncsi_genl_policy,
> +		.doit = ncsi_set_package_mask_nl,
> +		.flags = GENL_ADMIN_PERM,
> +	},
> +	{
> +		.cmd = NCSI_CMD_SET_CHANNEL_MASK,
> +		.policy = ncsi_genl_policy,
> +		.doit = ncsi_set_channel_mask_nl,
> +		.flags = GENL_ADMIN_PERM,
> +	},
>  };
>  
>  static struct genl_family ncsi_genl_family __ro_after_init = {
> diff --git a/net/ncsi/ncsi-rsp.c b/net/ncsi/ncsi-rsp.c
> index d66b34749027..02ce7626b579 100644
> --- a/net/ncsi/ncsi-rsp.c
> +++ b/net/ncsi/ncsi-rsp.c
> @@ -241,7 +241,7 @@ static int ncsi_rsp_handler_dcnt(struct ncsi_request *nr)
>  	if (!ncm->enable)
>  		return 0;
>  
> -	ncm->enable = 1;
> +	ncm->enable = 0;
>  	return 0;
>  }
>  
> -- 
> 2.19.0

^ permalink raw reply

* Re: [PATCH net-next 0/3] nfp: flower: speed up stats update loop
From: Jakub Kicinski @ 2018-10-10 15:08 UTC (permalink / raw)
  To: Or Gerlitz; +Cc: David Miller, Linux Netdev List, oss-drivers
In-Reply-To: <CAJ3xEMi4p=XP9fbfjPxgSdd-8c+d++v3wdAUZihCwtf46+LODA@mail.gmail.com>

On Wed, 10 Oct 2018 16:40:56 +0300, Or Gerlitz wrote:
> On Tue, Oct 9, 2018 at 4:58 AM Jakub Kicinski
> <jakub.kicinski@netronome.com> wrote:
> 
> > Given that our statistic IDs are already array indices, and considering
> > each statistic is only 24B in size, we decided to forego the use  
> 
> 8B packet + 8B bytes --> 16B -- does your FW/HW provide last use? how do
> you express this lastuse value in host jiffies?

24B is the size of the structure we keep in the driver, which costs
host memory.  The message from FW contains this:

struct nfp_fl_stats_frame {
	__be32 stats_con_id;
	__be32 pkt_count;
	__be64 byte_count;
	__be64 stats_cookie;
};

^ permalink raw reply

* [PATCH iproute 2/2] utils: fix get_rtnl_link_stats_rta stats parsing
From: Lorenzo Bianconi @ 2018-10-10 15:00 UTC (permalink / raw)
  To: stephen; +Cc: netdev
In-Reply-To: <cover.1539182623.git.lorenzo.bianconi@redhat.com>

iproute2 walks through the list of available tunnels using the netlink
protocol in order to get device info instead of reading it from the
proc filesystem. However, the kernel reports device statistics using
the IFLA_INET6_STATS/IFLA_INET6_ICMP6STATS attributes nested in
IFLA_PROTINFO, while iproute2 expects this info in the
IFLA_STATS64/IFLA_STATS attributes.
The issue can be triggered with the following reproducer:

$ ip link add ip6d0 type ip6tnl mode ip6ip6 local 1111::1 remote 2222::1
$ ip -6 -d -s tunnel show ip6d0
ip6d0: ipv6/ipv6 remote 2222::1 local 1111::1 encaplimit 4 hoplimit 64
tclass 0x00 flowlabel 0x00000 (flowinfo 0x00000000)
Dump terminated

Fix the issue by introducing IFLA_INET6_STATS attribute parsing.

Fixes: 3e953938717f ("iptunnel/ip6tunnel: Use netlink to walk through tunnels list")

Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
---
 lib/utils.c | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/lib/utils.c b/lib/utils.c
index e87ecf31..7be2d6be 100644
--- a/lib/utils.c
+++ b/lib/utils.c
@@ -27,6 +27,7 @@
 #include <linux/param.h>
 #include <linux/if_arp.h>
 #include <linux/mpls.h>
+#include <linux/snmp.h>
 #include <time.h>
 #include <sys/time.h>
 #include <errno.h>
@@ -1549,6 +1550,24 @@ static void copy_rtnl_link_stats64(struct rtnl_link_stats64 *stats64,
 		*a++ = *b++;
 }
 
+#define IPSTATS_MIB_MAX_LEN	(__IPSTATS_MIB_MAX * sizeof(__u64))
+static void get_snmp_counters(struct rtnl_link_stats64 *stats64,
+			      struct rtattr *s)
+{
+	__u64 *mib = (__u64 *)RTA_DATA(s);
+
+	memset(stats64, 0, sizeof(*stats64));
+
+	stats64->rx_packets = mib[IPSTATS_MIB_INPKTS];
+	stats64->rx_bytes = mib[IPSTATS_MIB_INOCTETS];
+	stats64->tx_packets = mib[IPSTATS_MIB_OUTPKTS];
+	stats64->tx_bytes = mib[IPSTATS_MIB_OUTOCTETS];
+	stats64->rx_errors = mib[IPSTATS_MIB_INDISCARDS];
+	stats64->tx_errors = mib[IPSTATS_MIB_OUTDISCARDS];
+	stats64->multicast = mib[IPSTATS_MIB_INMCASTPKTS];
+	stats64->rx_frame_errors = mib[IPSTATS_MIB_CSUMERRORS];
+}
+
 int get_rtnl_link_stats_rta(struct rtnl_link_stats64 *stats64,
 			    struct rtattr *tb[])
 {
@@ -1565,6 +1584,14 @@ int get_rtnl_link_stats_rta(struct rtnl_link_stats64 *stats64,
 		rta = tb[IFLA_STATS];
 		size = sizeof(struct rtnl_link_stats);
 		s = &stats;
+	} else if (tb[IFLA_PROTINFO]) {
+		struct rtattr *ptb[IPSTATS_MIB_MAX_LEN + 1];
+
+		parse_rtattr_nested(ptb, IPSTATS_MIB_MAX_LEN,
+				    tb[IFLA_PROTINFO]);
+		if (ptb[IFLA_INET6_STATS])
+			get_snmp_counters(stats64, ptb[IFLA_INET6_STATS]);
+		return sizeof(*stats64);
 	} else {
 		return -1;
 	}
-- 
2.17.1

^ permalink raw reply related

* [PATCH iproute 1/2] uapi: add snmp header file
From: Lorenzo Bianconi @ 2018-10-10 15:00 UTC (permalink / raw)
  To: stephen; +Cc: netdev
In-Reply-To: <cover.1539182623.git.lorenzo.bianconi@redhat.com>

Introduce the snmp header file. It will be used in a subsequent patch in
order to parse device statistics reported in the
IFLA_INET6_STATS/IFLA_INET6_ICMP6STATS netlink attributes.

Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
---
 include/uapi/linux/snmp.h | 323 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 323 insertions(+)
 create mode 100644 include/uapi/linux/snmp.h

diff --git a/include/uapi/linux/snmp.h b/include/uapi/linux/snmp.h
new file mode 100644
index 00000000..f80135e5
--- /dev/null
+++ b/include/uapi/linux/snmp.h
@@ -0,0 +1,323 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * Definitions for MIBs
+ *
+ * Author: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
+ */
+
+#ifndef _LINUX_SNMP_H
+#define _LINUX_SNMP_H
+
+/* ipstats mib definitions */
+/*
+ * RFC 1213:  MIB-II
+ * RFC 2011 (updates 1213):  SNMPv2-MIB-IP
+ * RFC 2863:  Interfaces Group MIB
+ * RFC 2465:  IPv6 MIB: General Group
+ * draft-ietf-ipv6-rfc2011-update-10.txt: MIB for IP: IP Statistics Tables
+ */
+enum
+{
+	IPSTATS_MIB_NUM = 0,
+/* frequently written fields in fast path, kept in same cache line */
+	IPSTATS_MIB_INPKTS,			/* InReceives */
+	IPSTATS_MIB_INOCTETS,			/* InOctets */
+	IPSTATS_MIB_INDELIVERS,			/* InDelivers */
+	IPSTATS_MIB_OUTFORWDATAGRAMS,		/* OutForwDatagrams */
+	IPSTATS_MIB_OUTPKTS,			/* OutRequests */
+	IPSTATS_MIB_OUTOCTETS,			/* OutOctets */
+/* other fields */
+	IPSTATS_MIB_INHDRERRORS,		/* InHdrErrors */
+	IPSTATS_MIB_INTOOBIGERRORS,		/* InTooBigErrors */
+	IPSTATS_MIB_INNOROUTES,			/* InNoRoutes */
+	IPSTATS_MIB_INADDRERRORS,		/* InAddrErrors */
+	IPSTATS_MIB_INUNKNOWNPROTOS,		/* InUnknownProtos */
+	IPSTATS_MIB_INTRUNCATEDPKTS,		/* InTruncatedPkts */
+	IPSTATS_MIB_INDISCARDS,			/* InDiscards */
+	IPSTATS_MIB_OUTDISCARDS,		/* OutDiscards */
+	IPSTATS_MIB_OUTNOROUTES,		/* OutNoRoutes */
+	IPSTATS_MIB_REASMTIMEOUT,		/* ReasmTimeout */
+	IPSTATS_MIB_REASMREQDS,			/* ReasmReqds */
+	IPSTATS_MIB_REASMOKS,			/* ReasmOKs */
+	IPSTATS_MIB_REASMFAILS,			/* ReasmFails */
+	IPSTATS_MIB_FRAGOKS,			/* FragOKs */
+	IPSTATS_MIB_FRAGFAILS,			/* FragFails */
+	IPSTATS_MIB_FRAGCREATES,		/* FragCreates */
+	IPSTATS_MIB_INMCASTPKTS,		/* InMcastPkts */
+	IPSTATS_MIB_OUTMCASTPKTS,		/* OutMcastPkts */
+	IPSTATS_MIB_INBCASTPKTS,		/* InBcastPkts */
+	IPSTATS_MIB_OUTBCASTPKTS,		/* OutBcastPkts */
+	IPSTATS_MIB_INMCASTOCTETS,		/* InMcastOctets */
+	IPSTATS_MIB_OUTMCASTOCTETS,		/* OutMcastOctets */
+	IPSTATS_MIB_INBCASTOCTETS,		/* InBcastOctets */
+	IPSTATS_MIB_OUTBCASTOCTETS,		/* OutBcastOctets */
+	IPSTATS_MIB_CSUMERRORS,			/* InCsumErrors */
+	IPSTATS_MIB_NOECTPKTS,			/* InNoECTPkts */
+	IPSTATS_MIB_ECT1PKTS,			/* InECT1Pkts */
+	IPSTATS_MIB_ECT0PKTS,			/* InECT0Pkts */
+	IPSTATS_MIB_CEPKTS,			/* InCEPkts */
+	IPSTATS_MIB_REASM_OVERLAPS,		/* ReasmOverlaps */
+	__IPSTATS_MIB_MAX
+};
+
+/* icmp mib definitions */
+/*
+ * RFC 1213:  MIB-II ICMP Group
+ * RFC 2011 (updates 1213):  SNMPv2 MIB for IP: ICMP group
+ */
+enum
+{
+	ICMP_MIB_NUM = 0,
+	ICMP_MIB_INMSGS,			/* InMsgs */
+	ICMP_MIB_INERRORS,			/* InErrors */
+	ICMP_MIB_INDESTUNREACHS,		/* InDestUnreachs */
+	ICMP_MIB_INTIMEEXCDS,			/* InTimeExcds */
+	ICMP_MIB_INPARMPROBS,			/* InParmProbs */
+	ICMP_MIB_INSRCQUENCHS,			/* InSrcQuenchs */
+	ICMP_MIB_INREDIRECTS,			/* InRedirects */
+	ICMP_MIB_INECHOS,			/* InEchos */
+	ICMP_MIB_INECHOREPS,			/* InEchoReps */
+	ICMP_MIB_INTIMESTAMPS,			/* InTimestamps */
+	ICMP_MIB_INTIMESTAMPREPS,		/* InTimestampReps */
+	ICMP_MIB_INADDRMASKS,			/* InAddrMasks */
+	ICMP_MIB_INADDRMASKREPS,		/* InAddrMaskReps */
+	ICMP_MIB_OUTMSGS,			/* OutMsgs */
+	ICMP_MIB_OUTERRORS,			/* OutErrors */
+	ICMP_MIB_OUTDESTUNREACHS,		/* OutDestUnreachs */
+	ICMP_MIB_OUTTIMEEXCDS,			/* OutTimeExcds */
+	ICMP_MIB_OUTPARMPROBS,			/* OutParmProbs */
+	ICMP_MIB_OUTSRCQUENCHS,			/* OutSrcQuenchs */
+	ICMP_MIB_OUTREDIRECTS,			/* OutRedirects */
+	ICMP_MIB_OUTECHOS,			/* OutEchos */
+	ICMP_MIB_OUTECHOREPS,			/* OutEchoReps */
+	ICMP_MIB_OUTTIMESTAMPS,			/* OutTimestamps */
+	ICMP_MIB_OUTTIMESTAMPREPS,		/* OutTimestampReps */
+	ICMP_MIB_OUTADDRMASKS,			/* OutAddrMasks */
+	ICMP_MIB_OUTADDRMASKREPS,		/* OutAddrMaskReps */
+	ICMP_MIB_CSUMERRORS,			/* InCsumErrors */
+	__ICMP_MIB_MAX
+};
+
+#define __ICMPMSG_MIB_MAX 512	/* Out+In for all 8-bit ICMP types */
+
+/* icmp6 mib definitions */
+/*
+ * RFC 2466:  ICMPv6-MIB
+ */
+enum
+{
+	ICMP6_MIB_NUM = 0,
+	ICMP6_MIB_INMSGS,			/* InMsgs */
+	ICMP6_MIB_INERRORS,			/* InErrors */
+	ICMP6_MIB_OUTMSGS,			/* OutMsgs */
+	ICMP6_MIB_OUTERRORS,			/* OutErrors */
+	ICMP6_MIB_CSUMERRORS,			/* InCsumErrors */
+	__ICMP6_MIB_MAX
+};
+
+#define __ICMP6MSG_MIB_MAX 512 /* Out+In for all 8-bit ICMPv6 types */
+
+/* tcp mib definitions */
+/*
+ * RFC 1213:  MIB-II TCP group
+ * RFC 2012 (updates 1213):  SNMPv2-MIB-TCP
+ */
+enum
+{
+	TCP_MIB_NUM = 0,
+	TCP_MIB_RTOALGORITHM,			/* RtoAlgorithm */
+	TCP_MIB_RTOMIN,				/* RtoMin */
+	TCP_MIB_RTOMAX,				/* RtoMax */
+	TCP_MIB_MAXCONN,			/* MaxConn */
+	TCP_MIB_ACTIVEOPENS,			/* ActiveOpens */
+	TCP_MIB_PASSIVEOPENS,			/* PassiveOpens */
+	TCP_MIB_ATTEMPTFAILS,			/* AttemptFails */
+	TCP_MIB_ESTABRESETS,			/* EstabResets */
+	TCP_MIB_CURRESTAB,			/* CurrEstab */
+	TCP_MIB_INSEGS,				/* InSegs */
+	TCP_MIB_OUTSEGS,			/* OutSegs */
+	TCP_MIB_RETRANSSEGS,			/* RetransSegs */
+	TCP_MIB_INERRS,				/* InErrs */
+	TCP_MIB_OUTRSTS,			/* OutRsts */
+	TCP_MIB_CSUMERRORS,			/* InCsumErrors */
+	__TCP_MIB_MAX
+};
+
+/* udp mib definitions */
+/*
+ * RFC 1213:  MIB-II UDP group
+ * RFC 2013 (updates 1213):  SNMPv2-MIB-UDP
+ */
+enum
+{
+	UDP_MIB_NUM = 0,
+	UDP_MIB_INDATAGRAMS,			/* InDatagrams */
+	UDP_MIB_NOPORTS,			/* NoPorts */
+	UDP_MIB_INERRORS,			/* InErrors */
+	UDP_MIB_OUTDATAGRAMS,			/* OutDatagrams */
+	UDP_MIB_RCVBUFERRORS,			/* RcvbufErrors */
+	UDP_MIB_SNDBUFERRORS,			/* SndbufErrors */
+	UDP_MIB_CSUMERRORS,			/* InCsumErrors */
+	UDP_MIB_IGNOREDMULTI,			/* IgnoredMulti */
+	__UDP_MIB_MAX
+};
+
+/* linux mib definitions */
+enum
+{
+	LINUX_MIB_NUM = 0,
+	LINUX_MIB_SYNCOOKIESSENT,		/* SyncookiesSent */
+	LINUX_MIB_SYNCOOKIESRECV,		/* SyncookiesRecv */
+	LINUX_MIB_SYNCOOKIESFAILED,		/* SyncookiesFailed */
+	LINUX_MIB_EMBRYONICRSTS,		/* EmbryonicRsts */
+	LINUX_MIB_PRUNECALLED,			/* PruneCalled */
+	LINUX_MIB_RCVPRUNED,			/* RcvPruned */
+	LINUX_MIB_OFOPRUNED,			/* OfoPruned */
+	LINUX_MIB_OUTOFWINDOWICMPS,		/* OutOfWindowIcmps */
+	LINUX_MIB_LOCKDROPPEDICMPS,		/* LockDroppedIcmps */
+	LINUX_MIB_ARPFILTER,			/* ArpFilter */
+	LINUX_MIB_TIMEWAITED,			/* TimeWaited */
+	LINUX_MIB_TIMEWAITRECYCLED,		/* TimeWaitRecycled */
+	LINUX_MIB_TIMEWAITKILLED,		/* TimeWaitKilled */
+	LINUX_MIB_PAWSACTIVEREJECTED,		/* PAWSActiveRejected */
+	LINUX_MIB_PAWSESTABREJECTED,		/* PAWSEstabRejected */
+	LINUX_MIB_DELAYEDACKS,			/* DelayedACKs */
+	LINUX_MIB_DELAYEDACKLOCKED,		/* DelayedACKLocked */
+	LINUX_MIB_DELAYEDACKLOST,		/* DelayedACKLost */
+	LINUX_MIB_LISTENOVERFLOWS,		/* ListenOverflows */
+	LINUX_MIB_LISTENDROPS,			/* ListenDrops */
+	LINUX_MIB_TCPHPHITS,			/* TCPHPHits */
+	LINUX_MIB_TCPPUREACKS,			/* TCPPureAcks */
+	LINUX_MIB_TCPHPACKS,			/* TCPHPAcks */
+	LINUX_MIB_TCPRENORECOVERY,		/* TCPRenoRecovery */
+	LINUX_MIB_TCPSACKRECOVERY,		/* TCPSackRecovery */
+	LINUX_MIB_TCPSACKRENEGING,		/* TCPSACKReneging */
+	LINUX_MIB_TCPSACKREORDER,		/* TCPSACKReorder */
+	LINUX_MIB_TCPRENOREORDER,		/* TCPRenoReorder */
+	LINUX_MIB_TCPTSREORDER,			/* TCPTSReorder */
+	LINUX_MIB_TCPFULLUNDO,			/* TCPFullUndo */
+	LINUX_MIB_TCPPARTIALUNDO,		/* TCPPartialUndo */
+	LINUX_MIB_TCPDSACKUNDO,			/* TCPDSACKUndo */
+	LINUX_MIB_TCPLOSSUNDO,			/* TCPLossUndo */
+	LINUX_MIB_TCPLOSTRETRANSMIT,		/* TCPLostRetransmit */
+	LINUX_MIB_TCPRENOFAILURES,		/* TCPRenoFailures */
+	LINUX_MIB_TCPSACKFAILURES,		/* TCPSackFailures */
+	LINUX_MIB_TCPLOSSFAILURES,		/* TCPLossFailures */
+	LINUX_MIB_TCPFASTRETRANS,		/* TCPFastRetrans */
+	LINUX_MIB_TCPSLOWSTARTRETRANS,		/* TCPSlowStartRetrans */
+	LINUX_MIB_TCPTIMEOUTS,			/* TCPTimeouts */
+	LINUX_MIB_TCPLOSSPROBES,		/* TCPLossProbes */
+	LINUX_MIB_TCPLOSSPROBERECOVERY,		/* TCPLossProbeRecovery */
+	LINUX_MIB_TCPRENORECOVERYFAIL,		/* TCPRenoRecoveryFail */
+	LINUX_MIB_TCPSACKRECOVERYFAIL,		/* TCPSackRecoveryFail */
+	LINUX_MIB_TCPRCVCOLLAPSED,		/* TCPRcvCollapsed */
+	LINUX_MIB_TCPDSACKOLDSENT,		/* TCPDSACKOldSent */
+	LINUX_MIB_TCPDSACKOFOSENT,		/* TCPDSACKOfoSent */
+	LINUX_MIB_TCPDSACKRECV,			/* TCPDSACKRecv */
+	LINUX_MIB_TCPDSACKOFORECV,		/* TCPDSACKOfoRecv */
+	LINUX_MIB_TCPABORTONDATA,		/* TCPAbortOnData */
+	LINUX_MIB_TCPABORTONCLOSE,		/* TCPAbortOnClose */
+	LINUX_MIB_TCPABORTONMEMORY,		/* TCPAbortOnMemory */
+	LINUX_MIB_TCPABORTONTIMEOUT,		/* TCPAbortOnTimeout */
+	LINUX_MIB_TCPABORTONLINGER,		/* TCPAbortOnLinger */
+	LINUX_MIB_TCPABORTFAILED,		/* TCPAbortFailed */
+	LINUX_MIB_TCPMEMORYPRESSURES,		/* TCPMemoryPressures */
+	LINUX_MIB_TCPMEMORYPRESSURESCHRONO,	/* TCPMemoryPressuresChrono */
+	LINUX_MIB_TCPSACKDISCARD,		/* TCPSACKDiscard */
+	LINUX_MIB_TCPDSACKIGNOREDOLD,		/* TCPSACKIgnoredOld */
+	LINUX_MIB_TCPDSACKIGNOREDNOUNDO,	/* TCPSACKIgnoredNoUndo */
+	LINUX_MIB_TCPSPURIOUSRTOS,		/* TCPSpuriousRTOs */
+	LINUX_MIB_TCPMD5NOTFOUND,		/* TCPMD5NotFound */
+	LINUX_MIB_TCPMD5UNEXPECTED,		/* TCPMD5Unexpected */
+	LINUX_MIB_TCPMD5FAILURE,		/* TCPMD5Failure */
+	LINUX_MIB_SACKSHIFTED,
+	LINUX_MIB_SACKMERGED,
+	LINUX_MIB_SACKSHIFTFALLBACK,
+	LINUX_MIB_TCPBACKLOGDROP,
+	LINUX_MIB_PFMEMALLOCDROP,
+	LINUX_MIB_TCPMINTTLDROP, /* RFC 5082 */
+	LINUX_MIB_TCPDEFERACCEPTDROP,
+	LINUX_MIB_IPRPFILTER, /* IP Reverse Path Filter (rp_filter) */
+	LINUX_MIB_TCPTIMEWAITOVERFLOW,		/* TCPTimeWaitOverflow */
+	LINUX_MIB_TCPREQQFULLDOCOOKIES,		/* TCPReqQFullDoCookies */
+	LINUX_MIB_TCPREQQFULLDROP,		/* TCPReqQFullDrop */
+	LINUX_MIB_TCPRETRANSFAIL,		/* TCPRetransFail */
+	LINUX_MIB_TCPRCVCOALESCE,		/* TCPRcvCoalesce */
+	LINUX_MIB_TCPOFOQUEUE,			/* TCPOFOQueue */
+	LINUX_MIB_TCPOFODROP,			/* TCPOFODrop */
+	LINUX_MIB_TCPOFOMERGE,			/* TCPOFOMerge */
+	LINUX_MIB_TCPCHALLENGEACK,		/* TCPChallengeACK */
+	LINUX_MIB_TCPSYNCHALLENGE,		/* TCPSYNChallenge */
+	LINUX_MIB_TCPFASTOPENACTIVE,		/* TCPFastOpenActive */
+	LINUX_MIB_TCPFASTOPENACTIVEFAIL,	/* TCPFastOpenActiveFail */
+	LINUX_MIB_TCPFASTOPENPASSIVE,		/* TCPFastOpenPassive*/
+	LINUX_MIB_TCPFASTOPENPASSIVEFAIL,	/* TCPFastOpenPassiveFail */
+	LINUX_MIB_TCPFASTOPENLISTENOVERFLOW,	/* TCPFastOpenListenOverflow */
+	LINUX_MIB_TCPFASTOPENCOOKIEREQD,	/* TCPFastOpenCookieReqd */
+	LINUX_MIB_TCPFASTOPENBLACKHOLE,		/* TCPFastOpenBlackholeDetect */
+	LINUX_MIB_TCPSPURIOUS_RTX_HOSTQUEUES, /* TCPSpuriousRtxHostQueues */
+	LINUX_MIB_BUSYPOLLRXPACKETS,		/* BusyPollRxPackets */
+	LINUX_MIB_TCPAUTOCORKING,		/* TCPAutoCorking */
+	LINUX_MIB_TCPFROMZEROWINDOWADV,		/* TCPFromZeroWindowAdv */
+	LINUX_MIB_TCPTOZEROWINDOWADV,		/* TCPToZeroWindowAdv */
+	LINUX_MIB_TCPWANTZEROWINDOWADV,		/* TCPWantZeroWindowAdv */
+	LINUX_MIB_TCPSYNRETRANS,		/* TCPSynRetrans */
+	LINUX_MIB_TCPORIGDATASENT,		/* TCPOrigDataSent */
+	LINUX_MIB_TCPHYSTARTTRAINDETECT,	/* TCPHystartTrainDetect */
+	LINUX_MIB_TCPHYSTARTTRAINCWND,		/* TCPHystartTrainCwnd */
+	LINUX_MIB_TCPHYSTARTDELAYDETECT,	/* TCPHystartDelayDetect */
+	LINUX_MIB_TCPHYSTARTDELAYCWND,		/* TCPHystartDelayCwnd */
+	LINUX_MIB_TCPACKSKIPPEDSYNRECV,		/* TCPACKSkippedSynRecv */
+	LINUX_MIB_TCPACKSKIPPEDPAWS,		/* TCPACKSkippedPAWS */
+	LINUX_MIB_TCPACKSKIPPEDSEQ,		/* TCPACKSkippedSeq */
+	LINUX_MIB_TCPACKSKIPPEDFINWAIT2,	/* TCPACKSkippedFinWait2 */
+	LINUX_MIB_TCPACKSKIPPEDTIMEWAIT,	/* TCPACKSkippedTimeWait */
+	LINUX_MIB_TCPACKSKIPPEDCHALLENGE,	/* TCPACKSkippedChallenge */
+	LINUX_MIB_TCPWINPROBE,			/* TCPWinProbe */
+	LINUX_MIB_TCPKEEPALIVE,			/* TCPKeepAlive */
+	LINUX_MIB_TCPMTUPFAIL,			/* TCPMTUPFail */
+	LINUX_MIB_TCPMTUPSUCCESS,		/* TCPMTUPSuccess */
+	LINUX_MIB_TCPDELIVERED,			/* TCPDelivered */
+	LINUX_MIB_TCPDELIVEREDCE,		/* TCPDeliveredCE */
+	LINUX_MIB_TCPACKCOMPRESSED,		/* TCPAckCompressed */
+	LINUX_MIB_TCPZEROWINDOWDROP,		/* TCPZeroWindowDrop */
+	LINUX_MIB_TCPRCVQDROP,			/* TCPRcvQDrop */
+	__LINUX_MIB_MAX
+};
+
+/* linux Xfrm mib definitions */
+enum
+{
+	LINUX_MIB_XFRMNUM = 0,
+	LINUX_MIB_XFRMINERROR,			/* XfrmInError */
+	LINUX_MIB_XFRMINBUFFERERROR,		/* XfrmInBufferError */
+	LINUX_MIB_XFRMINHDRERROR,		/* XfrmInHdrError */
+	LINUX_MIB_XFRMINNOSTATES,		/* XfrmInNoStates */
+	LINUX_MIB_XFRMINSTATEPROTOERROR,	/* XfrmInStateProtoError */
+	LINUX_MIB_XFRMINSTATEMODEERROR,		/* XfrmInStateModeError */
+	LINUX_MIB_XFRMINSTATESEQERROR,		/* XfrmInStateSeqError */
+	LINUX_MIB_XFRMINSTATEEXPIRED,		/* XfrmInStateExpired */
+	LINUX_MIB_XFRMINSTATEMISMATCH,		/* XfrmInStateMismatch */
+	LINUX_MIB_XFRMINSTATEINVALID,		/* XfrmInStateInvalid */
+	LINUX_MIB_XFRMINTMPLMISMATCH,		/* XfrmInTmplMismatch */
+	LINUX_MIB_XFRMINNOPOLS,			/* XfrmInNoPols */
+	LINUX_MIB_XFRMINPOLBLOCK,		/* XfrmInPolBlock */
+	LINUX_MIB_XFRMINPOLERROR,		/* XfrmInPolError */
+	LINUX_MIB_XFRMOUTERROR,			/* XfrmOutError */
+	LINUX_MIB_XFRMOUTBUNDLEGENERROR,	/* XfrmOutBundleGenError */
+	LINUX_MIB_XFRMOUTBUNDLECHECKERROR,	/* XfrmOutBundleCheckError */
+	LINUX_MIB_XFRMOUTNOSTATES,		/* XfrmOutNoStates */
+	LINUX_MIB_XFRMOUTSTATEPROTOERROR,	/* XfrmOutStateProtoError */
+	LINUX_MIB_XFRMOUTSTATEMODEERROR,	/* XfrmOutStateModeError */
+	LINUX_MIB_XFRMOUTSTATESEQERROR,		/* XfrmOutStateSeqError */
+	LINUX_MIB_XFRMOUTSTATEEXPIRED,		/* XfrmOutStateExpired */
+	LINUX_MIB_XFRMOUTPOLBLOCK,		/* XfrmOutPolBlock */
+	LINUX_MIB_XFRMOUTPOLDEAD,		/* XfrmOutPolDead */
+	LINUX_MIB_XFRMOUTPOLERROR,		/* XfrmOutPolError */
+	LINUX_MIB_XFRMFWDHDRERROR,		/* XfrmFwdHdrError*/
+	LINUX_MIB_XFRMOUTSTATEINVALID,		/* XfrmOutStateInvalid */
+	LINUX_MIB_XFRMACQUIREERROR,		/* XfrmAcquireError */
+	__LINUX_MIB_XFRMMAX
+};
+
+#endif	/* _LINUX_SNMP_H */
-- 
2.17.1

^ permalink raw reply related

* [PATCH iproute 0/2] introduce IFLA_INET6_STATS attribute parsing
From: Lorenzo Bianconi @ 2018-10-10 15:00 UTC (permalink / raw)
  To: stephen; +Cc: netdev

Add IFLA_INET6_STATS netlink attribute parsing in order to fix an issue
triggered when dumping device statistics.
Introduce the snmp header as a helper to walk through the attribute subfields.

Lorenzo Bianconi (2):
  uapi: add snmp header file
  utils: fix get_rtnl_link_stats_rta stats parsing

 include/uapi/linux/snmp.h | 323 ++++++++++++++++++++++++++++++++++++++
 lib/utils.c               |  27 ++++
 2 files changed, 350 insertions(+)
 create mode 100644 include/uapi/linux/snmp.h

-- 
2.17.1

^ permalink raw reply

* Re: BUG: corrupted list in p9_read_work
From: Dmitry Vyukov @ 2018-10-10 14:51 UTC (permalink / raw)
  To: Dominique Martinet
  Cc: Leon Romanovsky, syzbot, David Miller, Eric Van Hensbergen, LKML,
	Latchesar Ionkov, netdev, Ron Minnich, syzkaller-bugs,
	v9fs-developer
In-Reply-To: <20181010144059.GA20918@nautica>

On Wed, Oct 10, 2018 at 4:40 PM, Dominique Martinet
<asmadeus@codewreck.org> wrote:
> Dmitry Vyukov wrote on Wed, Oct 10, 2018:
>> How can they be faked?
>> If we could create a private rdma/virtio stub instance per test
>> process, then we could I think easily use that instance for 9p. But is
>> it possible?
>
> "RDMA" itself can be faked pretty easily nowadays, there's a "rxe"
> driver that is soft RDMA over ethernet and can run over anything.
>
> The problem is that you can't just give the client a file like trans fd;
> you'd need to open an ""rdma socket"" (simplifying wording a bit), and
> afaik there is no standard tool for it ; or rather, the problem is that
> RDMA is packet based so even if there were you can't just write stuff
> in a fd and hope it'll work, so you need a server.
>
> If you're interested, 9p is trivial enough that I could provide you with
> a trivial server that works like your file (just need to reimplement
> something that parses header to packetize it properly; so you could
> write to its stdin for example) ; that'd require some setup in the VM
> (configure rxe and install that tool), but it would definitely be
> possible.
> What do you think ?

I would like to hear more details.
Opening a socket is not a problem. Why do we need a tool for this?
I don't understand the problem with "packet-based", or what it means
to have a separate server. And why?
We definitely don't want to involve a separate third-party server,
that's very problematic for multiple reasons. But we can have a chunk
of custom C code inside of syzkaller.
What setup exactly do we need?

I guess it will make things simpler if you provide some kind of "hello
world" C program that mounts 9p/rdma. I don't need exact messages
(they will be the same as with the pipe transport, right?) nor an actual
server implementation, but just the place where to inject these packets.

^ permalink raw reply

* Re: BUG: corrupted list in p9_read_work
From: Dmitry Vyukov @ 2018-10-10 14:42 UTC (permalink / raw)
  To: Dominique Martinet
  Cc: syzbot, David Miller, Eric Van Hensbergen, LKML, Latchesar Ionkov,
	netdev, Ron Minnich, syzkaller-bugs, v9fs-developer
In-Reply-To: <20181009020949.GA29622@nautica>

On Tue, Oct 9, 2018 at 4:09 AM, Dominique Martinet
<asmadeus@codewreck.org> wrote:
> syzbot wrote on Mon, Oct 08, 2018:
>> syzbot has found a reproducer for the following crash on:
>>
>> HEAD commit:    0854ba5ff5c9 Merge git://git.kernel.org/pub/scm/linux/kern..
>> git tree:       upstream
>> console output: https://syzkaller.appspot.com/x/log.txt?x=1514ec06400000
>> kernel config:  https://syzkaller.appspot.com/x/.config?x=88e9a8a39dc0be2d
>> dashboard link: https://syzkaller.appspot.com/bug?extid=2222c34dc40b515f30dc
>> compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
>> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=10b91685400000
>>
>> IMPORTANT: if you fix the bug, please add the following tag to the commit:
>> Reported-by: syzbot+2222c34dc40b515f30dc@syzkaller.appspotmail.com
>>
>> list_del corruption, ffff88019ae36ee8->next is LIST_POISON1
>> (dead000000000100)
>> ------------[ cut here ]------------
>> [...]
>>  list_del include/linux/list.h:125 [inline]
>>  p9_read_work+0xab6/0x10e0 net/9p/trans_fd.c:379
>
> Hmm this looks very much like the report from
> syzbot+735d926e9d1317c3310c@syzkaller.appspotmail.com
> which should have been fixed by Tomas in 9f476d7c540cb
> ("net/9p/trans_fd.c: fix race by holding the lock")...
>
> It looks like another double list_del, looking at the code again there
> actually are other ways this could happen around connection errors.
> For example,
>  - p9_read_work receives something and lookup works... meanwhile
>  - p9_write_work fails to write and calls p9_conn_cancel, which deletes
> from the req_list without waiting for other works to finish (could also
> happen in p9_poll_mux)
>  - p9_read_work finishes processing the read and deletes from list again
>
> For this one the simplest fix would probably be to just not
> list_del/call p9_client_cb at all if m->r?req->status isn't
> REQ_STATUS_ERROR in p9_read_work after the "got new packet" debug print,
> and frankly I think that's saner so I'll send a patch shortly doing
> that, but I have zero confidence there aren't similar bugs around, the
> tcp code is so messy... Most of the syzbot reports recently have been
> around trans_fd which I don't think is used much in real life, and this
> is not really motivating (i.e. I think it would probably need a more
> extensive rewrite but nobody cares) :/
>
>
> Dmitry, on that note, do you think syzbot could possibly test other
> transports somehow? rdma or virtio cannot be faked as easily as passing
> a fd around, but I'd be very interested in seeing these flayed a bit.
>
> (I'm also curious what logic is used to generate the syz tests, the
> write$P9_Rxx replies have nothing to do with what the client would
> expect so it probably doesn't test very far; this test in particular
> does not even get past the initial P9_TVERSION that the client would
> expect immediately after mount, so it's basically only testing logic
> around packet handling on error... Or if we're accepting a RREADDIR in
> reply to TVERSION we have bigger problems, and now I'm looking at it I
> think we just might never check that....... I'll look at that for the
> next cycle)

Good question.

It's a mix of dumb and not-so-dumb.
First, we have descriptions of the kernel interface; here are the 9p ones:
https://github.com/google/syzkaller/blob/master/sys/linux/9p.txt
These descriptions allow us to generate primitively meaningful things
(e.g. proper struct layout). They also capture some interrelations
between calls. For example, you can see "resource rfd9p" and
"resource wfd9p" at the top; these are "fd subtypes", and the descriptions
capture what produces these resources as output and what consumes
them as input. For example, rfd9p is produced by pipe and
consumed by mount, so we know that these calls need to be called in
that order. But this does not work too well for, for example, 9p
message tags/types, because we don't know exactly what message type we
will read out, and these tags expire after the reply.

Second, syzkaller uses code coverage as guidance. So as soon as it
learns to do proper handshake, it sees new coverage and memorizes this
program as useful and tries to extend it more in future. Later it
learns how to create a single file, sees new coverage, memorizes, etc.
This allows it to incrementally build more and more complex programs
over time.

You can see the code coverage it has achieved so far here (in the cover column):
https://syzkaller.appspot.com/#managers
e.g. (note: 80MB file):
https://storage.googleapis.com/syzkaller/cover/ci-upstream-kasan-gce-root.html

As far as I can see, it has made some non-trivial progress on the 9p
subsystem. I don't know whether it has reached everything reachable, though.

^ permalink raw reply

* [PATCH] r8152: limit MAC pass-through to one device
From: Oliver Neukum @ 2018-10-10 14:29 UTC (permalink / raw)
  To: netdev, davem, jkohoutek, mario_limonciello; +Cc: Oliver Neukum

MAC addresses have to be unique, so a MAC coming from the host
must be used by at most one device at a time. Hence the user must
be recorded, and additional users must fall back to conventional
methods.

Signed-off-by: Oliver Neukum <oneukum@suse.com>
Fixes: 34ee32c9a5696 ("r8152: Add support for setting pass through MAC address on RTL8153-AD")
---
 drivers/net/usb/r8152.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
index f1b5201cc320..7345a2258ee4 100644
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -766,6 +766,9 @@ enum tx_csum_stat {
 	TX_CSUM_NONE
 };
 
+/* pass through MACs are per host, hence concurrent use is forbidden */
+static struct r8152 *pass_through_user = NULL;
+
 /* Maximum number of multicast addresses to filter (vs. Rx-all-multicast).
  * The RTL chips use a 64 element hash table based on the Ethernet CRC.
  */
@@ -1221,7 +1224,14 @@ static int set_ethernet_addr(struct r8152 *tp)
 		 * or system doesn't provide valid _SB.AMAC this will be
 		 * be expected to non-zero
 		 */
-		ret = vendor_mac_passthru_addr_read(tp, &sa);
+		if (!pass_through_user) {
+			ret = vendor_mac_passthru_addr_read(tp, &sa);
+			if (ret >= 0)
+				/* we must record the user against concurrent use */
+				pass_through_user = tp;
+		} else {
+			ret = -EBUSY;
+		}
 		if (ret < 0)
 			ret = pla_ocp_read(tp, PLA_BACKUP, 8, sa.sa_data);
 	}
@@ -5304,6 +5314,8 @@ static void rtl8152_disconnect(struct usb_interface *intf)
 		cancel_delayed_work_sync(&tp->hw_phy_work);
 		tp->rtl_ops.unload(tp);
 		free_netdev(tp->netdev);
+		if (pass_through_user == tp)
+			pass_through_user = NULL;
 	}
 }
 
-- 
2.16.4

^ permalink raw reply related

* [PATCH net 1/3] devlink: Fix param set handling for string type
From: Moshe Shemesh @ 2018-10-10 13:09 UTC (permalink / raw)
  To: David S. Miller; +Cc: Jiri Pirko, netdev, linux-kernel, Moshe Shemesh
In-Reply-To: <1539176967-22172-1-git-send-email-moshe@mellanox.com>

When the devlink param type is string, the set handler needs to copy the
string value it got from the input into devlink_param_value.

Fixes: e3b7ca18ad7b ("devlink: Add param set command")
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
---
 include/net/devlink.h |  2 +-
 net/core/devlink.c    | 11 ++++++++---
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/include/net/devlink.h b/include/net/devlink.h
index b9b89d6..b0e17c0 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -311,7 +311,7 @@ enum devlink_param_type {
 	u8 vu8;
 	u16 vu16;
 	u32 vu32;
-	const char *vstr;
+	char vstr[DEVLINK_PARAM_MAX_STRING_VALUE];
 	bool vbool;
 };
 
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 8c0ed22..d808af7 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -2995,6 +2995,8 @@ static int devlink_nl_cmd_param_get_dumpit(struct sk_buff *msg,
 				  struct genl_info *info,
 				  union devlink_param_value *value)
 {
+	int len;
+
 	if (param->type != DEVLINK_PARAM_TYPE_BOOL &&
 	    !info->attrs[DEVLINK_ATTR_PARAM_VALUE_DATA])
 		return -EINVAL;
@@ -3010,10 +3012,13 @@ static int devlink_nl_cmd_param_get_dumpit(struct sk_buff *msg,
 		value->vu32 = nla_get_u32(info->attrs[DEVLINK_ATTR_PARAM_VALUE_DATA]);
 		break;
 	case DEVLINK_PARAM_TYPE_STRING:
-		if (nla_len(info->attrs[DEVLINK_ATTR_PARAM_VALUE_DATA]) >
-		    DEVLINK_PARAM_MAX_STRING_VALUE)
+		len = strnlen(nla_data(info->attrs[DEVLINK_ATTR_PARAM_VALUE_DATA]),
+			      nla_len(info->attrs[DEVLINK_ATTR_PARAM_VALUE_DATA]));
+		if (len == nla_len(info->attrs[DEVLINK_ATTR_PARAM_VALUE_DATA]) ||
+		    len >= DEVLINK_PARAM_MAX_STRING_VALUE)
 			return -EINVAL;
-		value->vstr = nla_data(info->attrs[DEVLINK_ATTR_PARAM_VALUE_DATA]);
+		strcpy(value->vstr,
+		       nla_data(info->attrs[DEVLINK_ATTR_PARAM_VALUE_DATA]));
 		break;
 	case DEVLINK_PARAM_TYPE_BOOL:
 		value->vbool = info->attrs[DEVLINK_ATTR_PARAM_VALUE_DATA] ?
-- 
1.8.3.1

^ permalink raw reply related
