Netdev List
 help / color / mirror / Atom feed
* [PATCH bpf-next v3 0/7] bpf: implement BPF_TASK_FD_QUERY
From: Yonghong Song @ 2018-05-22 16:30 UTC (permalink / raw)
  To: peterz, ast, daniel, netdev; +Cc: kernel-team

Currently, suppose a userspace application has loaded a bpf program
and attached it to a tracepoint/kprobe/uprobe, and a bpf
introspection tool, e.g., bpftool, wants to show which bpf program
is attached to which tracepoint/kprobe/uprobe. Such attachment
information will be really useful to understand the overall bpf
deployment in the system.

There is a name field (16 bytes) for each program, which could
be used to encode the attachment point. There are some drawbacks
for this approaches. First, bpftool user (e.g., an admin) may not
really understand the association between the name and the
attachment point. Second, if one program is attached to multiple
places, encoding a proper name which can imply all these
attachments becomes difficult.

This patch introduces a new bpf subcommand BPF_TASK_FD_QUERY.
Given a pid and fd, this command will return bpf related information
to user space. Right now it only supports tracepoint/kprobe/uprobe
perf event fd's. For such a fd, BPF_TASK_FD_QUERY will return
   . prog_id
   . tracepoint name, or
   . k[ret]probe funcname + offset or kernel addr, or
   . u[ret]probe filename + offset
to the userspace.
The user can use "bpftool prog" to find more information about
bpf program itself with prog_id.

Patch #1 adds function perf_get_event() in kernel/events/core.c.
Patch #2 implements the bpf subcommand BPF_TASK_FD_QUERY.
Patch #3 syncs tools bpf.h header and also add bpf_task_fd_query()
in the libbpf library for samples/selftests/bpftool to use.
Patch #4 adds ksym_get_addr() utility function.
Patch #5 add a test in samples/bpf for querying k[ret]probes and
u[ret]probes.
Patch #6 add a test in tools/testing/selftests/bpf for querying
raw_tracepoint and tracepoint.
Patch #7 add a new subcommand "perf" to bpftool.

Changelogs:
  v2 -> v3:
     . made perf_get_event() return perf_event pointer const.
       this was to ensure that event fields are not meddled.
     . detect whether newly BPF_TASK_FD_QUERY is supported or
       not in "bpftool perf" and warn users if it is not.
  v1 -> v2:
     . changed bpf subcommand name from BPF_PERF_EVENT_QUERY
       to BPF_TASK_FD_QUERY.
     . fixed various "bpftool perf" issues and added documentation
       and auto-completion.

Yonghong Song (7):
  perf/core: add perf_get_event() to return perf_event given a struct
    file
  bpf: introduce bpf subcommand BPF_TASK_FD_QUERY
  tools/bpf: sync kernel header bpf.h and add bpf_trace_event_query in
    libbpf
  tools/bpf: add ksym_get_addr() in trace_helpers
  samples/bpf: add a samples/bpf test for BPF_TASK_FD_QUERY
  tools/bpf: add two BPF_TASK_FD_QUERY tests in test_progs
  tools/bpftool: add perf subcommand

 include/linux/perf_event.h                       |   5 +
 include/linux/trace_events.h                     |  16 +
 include/uapi/linux/bpf.h                         |  27 ++
 kernel/bpf/syscall.c                             | 124 ++++++++
 kernel/events/core.c                             |   8 +
 kernel/trace/bpf_trace.c                         |  48 +++
 kernel/trace/trace_kprobe.c                      |  29 ++
 kernel/trace/trace_uprobe.c                      |  22 ++
 samples/bpf/Makefile                             |   4 +
 samples/bpf/task_fd_query_kern.c                 |  19 ++
 samples/bpf/task_fd_query_user.c                 | 379 +++++++++++++++++++++++
 tools/bpf/bpftool/Documentation/bpftool-perf.rst |  81 +++++
 tools/bpf/bpftool/Documentation/bpftool.rst      |   5 +-
 tools/bpf/bpftool/bash-completion/bpftool        |   9 +
 tools/bpf/bpftool/main.c                         |   3 +-
 tools/bpf/bpftool/main.h                         |   1 +
 tools/bpf/bpftool/perf.c                         | 244 +++++++++++++++
 tools/include/uapi/linux/bpf.h                   |  27 ++
 tools/lib/bpf/bpf.c                              |  24 ++
 tools/lib/bpf/bpf.h                              |   3 +
 tools/testing/selftests/bpf/test_progs.c         | 133 ++++++++
 tools/testing/selftests/bpf/trace_helpers.c      |  12 +
 tools/testing/selftests/bpf/trace_helpers.h      |   1 +
 23 files changed, 1222 insertions(+), 2 deletions(-)
 create mode 100644 samples/bpf/task_fd_query_kern.c
 create mode 100644 samples/bpf/task_fd_query_user.c
 create mode 100644 tools/bpf/bpftool/Documentation/bpftool-perf.rst
 create mode 100644 tools/bpf/bpftool/perf.c

-- 
2.9.5

^ permalink raw reply

* [PATCH bpf-next v3 1/7] perf/core: add perf_get_event() to return perf_event given a struct file
From: Yonghong Song @ 2018-05-22 16:30 UTC (permalink / raw)
  To: peterz, ast, daniel, netdev; +Cc: kernel-team
In-Reply-To: <20180522163048.3128924-1-yhs@fb.com>

A new extern function, perf_get_event(), is added to return a perf event
given a struct file. This function will be used in later patches.

Signed-off-by: Yonghong Song <yhs@fb.com>
---
 include/linux/perf_event.h | 5 +++++
 kernel/events/core.c       | 8 ++++++++
 2 files changed, 13 insertions(+)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index e71e99e..eec302b 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -868,6 +868,7 @@ extern void perf_event_exit_task(struct task_struct *child);
 extern void perf_event_free_task(struct task_struct *task);
 extern void perf_event_delayed_put(struct task_struct *task);
 extern struct file *perf_event_get(unsigned int fd);
+extern const struct perf_event *perf_get_event(struct file *file);
 extern const struct perf_event_attr *perf_event_attrs(struct perf_event *event);
 extern void perf_event_print_debug(void);
 extern void perf_pmu_disable(struct pmu *pmu);
@@ -1289,6 +1290,10 @@ static inline void perf_event_exit_task(struct task_struct *child)	{ }
 static inline void perf_event_free_task(struct task_struct *task)	{ }
 static inline void perf_event_delayed_put(struct task_struct *task)	{ }
 static inline struct file *perf_event_get(unsigned int fd)	{ return ERR_PTR(-EINVAL); }
+static inline const struct perf_event *perf_get_event(struct file *file)
+{
+	return ERR_PTR(-EINVAL);
+}
 static inline const struct perf_event_attr *perf_event_attrs(struct perf_event *event)
 {
 	return ERR_PTR(-EINVAL);
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 67612ce..6eeab86 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -11212,6 +11212,14 @@ struct file *perf_event_get(unsigned int fd)
 	return file;
 }
 
+const struct perf_event *perf_get_event(struct file *file)
+{
+	if (file->f_op != &perf_fops)
+		return ERR_PTR(-EINVAL);
+
+	return file->private_data;
+}
+
 const struct perf_event_attr *perf_event_attrs(struct perf_event *event)
 {
 	if (!event)
-- 
2.9.5

^ permalink raw reply related

* [PATCH bpf-next v3 3/7] tools/bpf: sync kernel header bpf.h and add bpf_trace_event_query in libbpf
From: Yonghong Song @ 2018-05-22 16:30 UTC (permalink / raw)
  To: peterz, ast, daniel, netdev; +Cc: kernel-team
In-Reply-To: <20180522163048.3128924-1-yhs@fb.com>

Sync kernel header bpf.h to tools/include/uapi/linux/bpf.h and
implement bpf_trace_event_query() in libbpf. The test programs
in samples/bpf and tools/testing/selftests/bpf, and later bpftool
will use this libbpf function to query kernel.

Signed-off-by: Yonghong Song <yhs@fb.com>
---
 tools/include/uapi/linux/bpf.h | 27 +++++++++++++++++++++++++++
 tools/lib/bpf/bpf.c            | 24 ++++++++++++++++++++++++
 tools/lib/bpf/bpf.h            |  3 +++
 3 files changed, 54 insertions(+)

diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 97446bb..a602150 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -97,6 +97,7 @@ enum bpf_cmd {
 	BPF_RAW_TRACEPOINT_OPEN,
 	BPF_BTF_LOAD,
 	BPF_BTF_GET_FD_BY_ID,
+	BPF_TASK_FD_QUERY,
 };
 
 enum bpf_map_type {
@@ -379,6 +380,22 @@ union bpf_attr {
 		__u32		btf_log_size;
 		__u32		btf_log_level;
 	};
+
+	struct {
+		int		pid;		/* input: pid */
+		int		fd;		/* input: fd */
+		__u32		flags;		/* input: flags */
+		__u32		buf_len;	/* input: buf len */
+		__aligned_u64	buf;		/* input/output:
+						 *   tp_name for tracepoint
+						 *   symbol for kprobe
+						 *   filename for uprobe
+						 */
+		__u32		prog_id;	/* output: prod_id */
+		__u32		attach_info;	/* output: BPF_ATTACH_* */
+		__u64		probe_offset;	/* output: probe_offset */
+		__u64		probe_addr;	/* output: probe_addr */
+	} task_fd_query;
 } __attribute__((aligned(8)));
 
 /* The description below is an attempt at providing documentation to eBPF
@@ -2458,4 +2475,14 @@ struct bpf_fib_lookup {
 	__u8	dmac[6];     /* ETH_ALEN */
 };
 
+/* used by <task, fd> based query */
+enum {
+	BPF_ATTACH_RAW_TRACEPOINT,	/* tp name */
+	BPF_ATTACH_TRACEPOINT,		/* tp name */
+	BPF_ATTACH_KPROBE,		/* (symbol + offset) or addr */
+	BPF_ATTACH_KRETPROBE,		/* (symbol + offset) or addr */
+	BPF_ATTACH_UPROBE,		/* filename + offset */
+	BPF_ATTACH_URETPROBE,		/* filename + offset */
+};
+
 #endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index 6a8a000..da3f336 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -643,3 +643,27 @@ int bpf_load_btf(void *btf, __u32 btf_size, char *log_buf, __u32 log_buf_size,
 
 	return fd;
 }
+
+int bpf_task_fd_query(int pid, int fd, __u32 flags, char *buf, __u32 buf_len,
+		      __u32 *prog_id, __u32 *attach_info,
+		      __u64 *probe_offset, __u64 *probe_addr)
+{
+	union bpf_attr attr = {};
+	int err;
+
+	attr.task_fd_query.pid = pid;
+	attr.task_fd_query.fd = fd;
+	attr.task_fd_query.flags = flags;
+	attr.task_fd_query.buf = ptr_to_u64(buf);
+	attr.task_fd_query.buf_len = buf_len;
+
+	err = sys_bpf(BPF_TASK_FD_QUERY, &attr, sizeof(attr));
+	if (!err) {
+		*prog_id = attr.task_fd_query.prog_id;
+		*attach_info = attr.task_fd_query.attach_info;
+		*probe_offset = attr.task_fd_query.probe_offset;
+		*probe_addr = attr.task_fd_query.probe_addr;
+	}
+
+	return err;
+}
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index 15bff77..9adfde6 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -107,4 +107,7 @@ int bpf_prog_query(int target_fd, enum bpf_attach_type type, __u32 query_flags,
 int bpf_raw_tracepoint_open(const char *name, int prog_fd);
 int bpf_load_btf(void *btf, __u32 btf_size, char *log_buf, __u32 log_buf_size,
 		 bool do_log);
+int bpf_task_fd_query(int pid, int fd, __u32 flags, char *buf, __u32 buf_len,
+		      __u32 *prog_id, __u32 *prog_info,
+		      __u64 *probe_offset, __u64 *probe_addr);
 #endif
-- 
2.9.5

^ permalink raw reply related

* [PATCH bpf-next v3 4/7] tools/bpf: add ksym_get_addr() in trace_helpers
From: Yonghong Song @ 2018-05-22 16:30 UTC (permalink / raw)
  To: peterz, ast, daniel, netdev; +Cc: kernel-team
In-Reply-To: <20180522163048.3128924-1-yhs@fb.com>

Given a kernel function name, ksym_get_addr() will return the kernel
address for this function, or 0 if it cannot find this function name
in /proc/kallsyms. This function will be used later when a kernel
address is used to initiate a kprobe perf event.

Signed-off-by: Yonghong Song <yhs@fb.com>
---
 tools/testing/selftests/bpf/trace_helpers.c | 12 ++++++++++++
 tools/testing/selftests/bpf/trace_helpers.h |  1 +
 2 files changed, 13 insertions(+)

diff --git a/tools/testing/selftests/bpf/trace_helpers.c b/tools/testing/selftests/bpf/trace_helpers.c
index 8fb4fe8..3868dcb 100644
--- a/tools/testing/selftests/bpf/trace_helpers.c
+++ b/tools/testing/selftests/bpf/trace_helpers.c
@@ -72,6 +72,18 @@ struct ksym *ksym_search(long key)
 	return &syms[0];
 }
 
+long ksym_get_addr(const char *name)
+{
+	int i;
+
+	for (i = 0; i < sym_cnt; i++) {
+		if (strcmp(syms[i].name, name) == 0)
+			return syms[i].addr;
+	}
+
+	return 0;
+}
+
 static int page_size;
 static int page_cnt = 8;
 static struct perf_event_mmap_page *header;
diff --git a/tools/testing/selftests/bpf/trace_helpers.h b/tools/testing/selftests/bpf/trace_helpers.h
index 36d90e3..3b4bcf7 100644
--- a/tools/testing/selftests/bpf/trace_helpers.h
+++ b/tools/testing/selftests/bpf/trace_helpers.h
@@ -11,6 +11,7 @@ struct ksym {
 
 int load_kallsyms(void);
 struct ksym *ksym_search(long key);
+long ksym_get_addr(const char *name);
 
 typedef enum bpf_perf_event_ret (*perf_event_print_fn)(void *data, int size);
 
-- 
2.9.5

^ permalink raw reply related

* Re: [PATCH bpf-next 3/7] bpf: btf: Check array->index_type
From: Martin KaFai Lau @ 2018-05-22 16:20 UTC (permalink / raw)
  To: Yonghong Song; +Cc: netdev, Alexei Starovoitov, Daniel Borkmann, kernel-team
In-Reply-To: <169ab3ed-d03f-eb3f-7d4f-6545c5516bec@fb.com>

On Mon, May 21, 2018 at 02:04:51PM -0700, Yonghong Song wrote:
> 
> 
> On 5/18/18 5:16 PM, Martin KaFai Lau wrote:
> > Instead of ingoring the array->index_type field.  Enforce that
> > it must be an unsigned BTF_KIND_INT.
> > 
> > Signed-off-by: Martin KaFai Lau <kafai@fb.com>
> > ---
> >   kernel/bpf/btf.c | 83 ++++++++++++++++++++++++++++++++++++++++----------------
> >   1 file changed, 59 insertions(+), 24 deletions(-)
> > 
> > diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> > index 536e5981ad8c..b4e48dae2240 100644
> > --- a/kernel/bpf/btf.c
> > +++ b/kernel/bpf/btf.c
> > @@ -444,6 +444,28 @@ static const struct btf_type *btf_type_by_id(const struct btf *btf, u32 type_id)
> >   	return btf->types[type_id];
> >   }
> > +/*
> > + * Regular int is not a bit field and it must be either
> > + * u8/u16/u32/u64.
> > + */
> > +static bool btf_type_int_is_regular(const struct btf_type *t)
> > +{
> > +	u16 nr_bits, nr_bytes;
> > +	u32 int_data;
> > +
> > +	int_data = btf_type_int(t);
> > +	nr_bits = BTF_INT_BITS(int_data);
> > +	nr_bytes = BITS_ROUNDUP_BYTES(nr_bits);
> > +	if (BITS_PER_BYTE_MASKED(nr_bits) ||
> > +	    BTF_INT_OFFSET(int_data) ||
> > +	    (nr_bytes != sizeof(u8) && nr_bytes != sizeof(u16) &&
> > +	     nr_bytes != sizeof(u32) && nr_bytes != sizeof(u64))) {
> > +		return false;
> > +	}
> > +
> > +	return true;
> > +}
> > +
> >   __printf(2, 3) static void __btf_verifier_log(struct bpf_verifier_log *log,
> >   					      const char *fmt, ...)
> >   {
> > @@ -1309,14 +1331,16 @@ static s32 btf_array_check_meta(struct btf_verifier_env *env,
> >   		return -EINVAL;
> >   	}
> > -	/* We are a little forgiving on array->index_type since
> > -	 * the kernel is not using it.
> > -	 */
> > -	/* Array elem cannot be in type void,
> > -	 * so !array->type is not allowed.
> > +	/* Array elem type and index type cannot be in type void,
> > +	 * so !array->type and !array->index_type are not allowed.
> >   	 */
> >   	if (!array->type || BTF_TYPE_PARENT(array->type)) {
> > -		btf_verifier_log_type(env, t, "Invalid type_id");
> > +		btf_verifier_log_type(env, t, "Invalid elem");
> > +		return -EINVAL;
> > +	}
> > +
> > +	if (!array->index_type || BTF_TYPE_PARENT(array->index_type)) {
> > +		btf_verifier_log_type(env, t, "Invalid index");
> >   		return -EINVAL;
> >   	}
> > @@ -1329,11 +1353,35 @@ static int btf_array_resolve(struct btf_verifier_env *env,
> >   			     const struct resolve_vertex *v)
> >   {
> >   	const struct btf_array *array = btf_type_array(v->t);
> > -	const struct btf_type *elem_type;
> > -	u32 elem_type_id = array->type;
> > +	const struct btf_type *elem_type, *index_type;
> > +	u32 elem_type_id, index_type_id;
> >   	struct btf *btf = env->btf;
> >   	u32 elem_size;
> > +	/* Check array->index_type */
> > +	index_type_id = array->index_type;
> > +	index_type = btf_type_by_id(btf, index_type_id);
> > +	if (btf_type_is_void_or_null(index_type)) {
> > +		btf_verifier_log_type(env, v->t, "Invalid index");
> > +		return -EINVAL;
> > +	}
> > +
> > +	if (!env_type_is_resolve_sink(env, index_type) &&
> > +	    !env_type_is_resolved(env, index_type_id))
> > +		return env_stack_push(env, index_type, index_type_id);
> > +
> > +	index_type = btf_type_id_size(btf, &index_type_id, NULL);
> > +	if (!index_type || !btf_type_is_int(index_type) ||
> > +	    /* bit field int is not allowed */
> > +	    !btf_type_int_is_regular(index_type) ||
> > +	    /* unsigned only */
> > +	    BTF_INT_ENCODING(btf_type_int(index_type))) {
> > +		btf_verifier_log_type(env, v->t, "Invalid index");
> > +		return -EINVAL;
> > +	}
> 
> Currently, in uapi/linux/btf.h, we have
> /* Attributes stored in the BTF_INT_ENCODING */
> #define BTF_INT_SIGNED  0x1
> #define BTF_INT_CHAR    0x2
> #define BTF_INT_BOOL    0x4
> #define BTF_INT_VARARGS 0x8
> 
> The BPF_INT_ENCODING value 0 stands for UNSIGNED.
It is a bit field, so getting 0 defined would be confusing.
If BTF_INT_SIGNED bit is not set, then it is not signed.

I think it will help to define them as (1 << x) to make
the bit field nature more obvious.

> Do we want to explicitly document this in uapi/linux/bpf.h?
> 
> > +
> > +	/* Check array->type */
> > +	elem_type_id = array->type;
> >   	elem_type = btf_type_by_id(btf, elem_type_id);
> >   	if (btf_type_is_void_or_null(elem_type)) {
> >   		btf_verifier_log_type(env, v->t,
> > @@ -1351,22 +1399,9 @@ static int btf_array_resolve(struct btf_verifier_env *env,
> >   		return -EINVAL;
> >   	}
> > -	if (btf_type_is_int(elem_type)) {
> > -		int int_type_data = btf_type_int(elem_type);
> > -		u16 nr_bits = BTF_INT_BITS(int_type_data);
> > -		u16 nr_bytes = BITS_ROUNDUP_BYTES(nr_bits);
> > -
> > -		/* Put more restriction on array of int.  The int cannot
> > -		 * be a bit field and it must be either u8/u16/u32/u64.
> > -		 */
> > -		if (BITS_PER_BYTE_MASKED(nr_bits) ||
> > -		    BTF_INT_OFFSET(int_type_data) ||
> > -		    (nr_bytes != sizeof(u8) && nr_bytes != sizeof(u16) &&
> > -		     nr_bytes != sizeof(u32) && nr_bytes != sizeof(u64))) {
> > -			btf_verifier_log_type(env, v->t,
> > -					      "Invalid array of int");
> > -			return -EINVAL;
> > -		}
> > +	if (btf_type_is_int(elem_type) && !btf_type_int_is_regular(elem_type)) {
> > +		btf_verifier_log_type(env, v->t, "Invalid array of int");
> > +		return -EINVAL;
> >   	}
> >   	if (array->nelems && elem_size > U32_MAX / array->nelems) {
> > 

^ permalink raw reply

* [PATCH][V2] net: vxge: fix spelling mistake in macro VXGE_HW_ERR_PRIVILAGED_OPEARATION
From: Colin King @ 2018-05-22 16:18 UTC (permalink / raw)
  To: Jon Mason, David S . Miller, netdev; +Cc: kernel-janitors, linux-kernel

From: Colin Ian King <colin.king@canonical.com>

Rename VXGE_HW_ERR_PRIVILAGED_OPEARATION to VXGE_HW_ERR_PRIVILEGED_OPERATION
to fix spelling mistake.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
---
V2: PRIVILAGED -> PRIVILEGED, thanks to Edward Cree for spotting that mistake

---
 drivers/net/ethernet/neterion/vxge/vxge-config.c  | 12 ++++++------
 drivers/net/ethernet/neterion/vxge/vxge-config.h  |  2 +-
 drivers/net/ethernet/neterion/vxge/vxge-ethtool.c |  2 +-
 drivers/net/ethernet/neterion/vxge/vxge-main.c    |  4 ++--
 4 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/neterion/vxge/vxge-config.c b/drivers/net/ethernet/neterion/vxge/vxge-config.c
index 6223930a8155..8656fcc9f2a0 100644
--- a/drivers/net/ethernet/neterion/vxge/vxge-config.c
+++ b/drivers/net/ethernet/neterion/vxge/vxge-config.c
@@ -693,7 +693,7 @@ __vxge_hw_device_is_privilaged(u32 host_type, u32 func_id)
 		VXGE_HW_DEVICE_ACCESS_RIGHT_MRPCIM)
 		return VXGE_HW_OK;
 	else
-		return VXGE_HW_ERR_PRIVILAGED_OPEARATION;
+		return VXGE_HW_ERR_PRIVILEGED_OPERATION;
 }
 
 /*
@@ -1920,7 +1920,7 @@ enum vxge_hw_status vxge_hw_device_getpause_data(struct __vxge_hw_device *hldev,
 	}
 
 	if (!(hldev->access_rights & VXGE_HW_DEVICE_ACCESS_RIGHT_MRPCIM)) {
-		status = VXGE_HW_ERR_PRIVILAGED_OPEARATION;
+		status = VXGE_HW_ERR_PRIVILEGED_OPERATION;
 		goto exit;
 	}
 
@@ -3153,7 +3153,7 @@ vxge_hw_mgmt_reg_read(struct __vxge_hw_device *hldev,
 	case vxge_hw_mgmt_reg_type_mrpcim:
 		if (!(hldev->access_rights &
 			VXGE_HW_DEVICE_ACCESS_RIGHT_MRPCIM)) {
-			status = VXGE_HW_ERR_PRIVILAGED_OPEARATION;
+			status = VXGE_HW_ERR_PRIVILEGED_OPERATION;
 			break;
 		}
 		if (offset > sizeof(struct vxge_hw_mrpcim_reg) - 8) {
@@ -3165,7 +3165,7 @@ vxge_hw_mgmt_reg_read(struct __vxge_hw_device *hldev,
 	case vxge_hw_mgmt_reg_type_srpcim:
 		if (!(hldev->access_rights &
 			VXGE_HW_DEVICE_ACCESS_RIGHT_SRPCIM)) {
-			status = VXGE_HW_ERR_PRIVILAGED_OPEARATION;
+			status = VXGE_HW_ERR_PRIVILEGED_OPERATION;
 			break;
 		}
 		if (index > VXGE_HW_TITAN_SRPCIM_REG_SPACES - 1) {
@@ -3279,7 +3279,7 @@ vxge_hw_mgmt_reg_write(struct __vxge_hw_device *hldev,
 	case vxge_hw_mgmt_reg_type_mrpcim:
 		if (!(hldev->access_rights &
 			VXGE_HW_DEVICE_ACCESS_RIGHT_MRPCIM)) {
-			status = VXGE_HW_ERR_PRIVILAGED_OPEARATION;
+			status = VXGE_HW_ERR_PRIVILEGED_OPERATION;
 			break;
 		}
 		if (offset > sizeof(struct vxge_hw_mrpcim_reg) - 8) {
@@ -3291,7 +3291,7 @@ vxge_hw_mgmt_reg_write(struct __vxge_hw_device *hldev,
 	case vxge_hw_mgmt_reg_type_srpcim:
 		if (!(hldev->access_rights &
 			VXGE_HW_DEVICE_ACCESS_RIGHT_SRPCIM)) {
-			status = VXGE_HW_ERR_PRIVILAGED_OPEARATION;
+			status = VXGE_HW_ERR_PRIVILEGED_OPERATION;
 			break;
 		}
 		if (index > VXGE_HW_TITAN_SRPCIM_REG_SPACES - 1) {
diff --git a/drivers/net/ethernet/neterion/vxge/vxge-config.h b/drivers/net/ethernet/neterion/vxge/vxge-config.h
index cfa970417f81..5ebdbfedc269 100644
--- a/drivers/net/ethernet/neterion/vxge/vxge-config.h
+++ b/drivers/net/ethernet/neterion/vxge/vxge-config.h
@@ -127,7 +127,7 @@ enum vxge_hw_status {
 	VXGE_HW_ERR_INVALID_TCODE 		  = VXGE_HW_BASE_ERR + 14,
 	VXGE_HW_ERR_INVALID_BLOCK_SIZE		  = VXGE_HW_BASE_ERR + 15,
 	VXGE_HW_ERR_INVALID_STATE		  = VXGE_HW_BASE_ERR + 16,
-	VXGE_HW_ERR_PRIVILAGED_OPEARATION	  = VXGE_HW_BASE_ERR + 17,
+	VXGE_HW_ERR_PRIVILEGED_OPERATION	  = VXGE_HW_BASE_ERR + 17,
 	VXGE_HW_ERR_INVALID_PORT 		  = VXGE_HW_BASE_ERR + 18,
 	VXGE_HW_ERR_FIFO		 	  = VXGE_HW_BASE_ERR + 19,
 	VXGE_HW_ERR_VPATH			  = VXGE_HW_BASE_ERR + 20,
diff --git a/drivers/net/ethernet/neterion/vxge/vxge-ethtool.c b/drivers/net/ethernet/neterion/vxge/vxge-ethtool.c
index 0452848d1316..2f1bde500420 100644
--- a/drivers/net/ethernet/neterion/vxge/vxge-ethtool.c
+++ b/drivers/net/ethernet/neterion/vxge/vxge-ethtool.c
@@ -276,7 +276,7 @@ static void vxge_get_ethtool_stats(struct net_device *dev,
 	*ptr++ = 0;
 	status = vxge_hw_device_xmac_stats_get(hldev, xmac_stats);
 	if (status != VXGE_HW_OK) {
-		if (status != VXGE_HW_ERR_PRIVILAGED_OPEARATION) {
+		if (status != VXGE_HW_ERR_PRIVILEGED_OPERATION) {
 			vxge_debug_init(VXGE_ERR,
 				"%s : %d Failure in getting xmac stats",
 				__func__, __LINE__);
diff --git a/drivers/net/ethernet/neterion/vxge/vxge-main.c b/drivers/net/ethernet/neterion/vxge/vxge-main.c
index b2299f2b2155..64fa94f8b471 100644
--- a/drivers/net/ethernet/neterion/vxge/vxge-main.c
+++ b/drivers/net/ethernet/neterion/vxge/vxge-main.c
@@ -3484,11 +3484,11 @@ static int vxge_device_register(struct __vxge_hw_device *hldev,
 				0,
 				&stat);
 
-	if (status == VXGE_HW_ERR_PRIVILAGED_OPEARATION)
+	if (status == VXGE_HW_ERR_PRIVILEGED_OPERATION)
 		vxge_debug_init(
 			vxge_hw_device_trace_level_get(hldev),
 			"%s: device stats clear returns"
-			"VXGE_HW_ERR_PRIVILAGED_OPEARATION", ndev->name);
+			"VXGE_HW_ERR_PRIVILEGED_OPERATION", ndev->name);
 
 	vxge_debug_entryexit(vxge_hw_device_trace_level_get(hldev),
 		"%s: %s:%d  Exiting...",
-- 
2.17.0

^ permalink raw reply related

* Re: [PATCH net-next v11 2/5] netvsc: refactor notifier/event handling code to use the failover framework
From: Jiri Pirko @ 2018-05-22 16:12 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Samudrala, Sridhar, stephen, davem, netdev, virtualization,
	virtio-dev, jesse.brandeburg, alexander.h.duyck, kubakici,
	jasowang, loseweigh, aaron.f.brown, anjali.singhai
In-Reply-To: <20180522184112-mutt-send-email-mst@kernel.org>

Fixing the subj, sorry about that.

Tue, May 22, 2018 at 05:46:21PM CEST, mst@redhat.com wrote:
>On Tue, May 22, 2018 at 05:36:14PM +0200, Jiri Pirko wrote:
>> Tue, May 22, 2018 at 05:28:42PM CEST, sridhar.samudrala@intel.com wrote:
>> >
>> >On 5/22/2018 2:08 AM, Jiri Pirko wrote:
>> >> Tue, May 22, 2018 at 11:06:37AM CEST, jiri@resnulli.us wrote:
>> >> > Tue, May 22, 2018 at 04:06:18AM CEST, sridhar.samudrala@intel.com wrote:
>> >> > > Use the registration/notification framework supported by the generic
>> >> > > failover infrastructure.
>> >> > > 
>> >> > > Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
>> >> > In previous patchset versions, the common code did
>> >> > netdev_rx_handler_register() and netdev_upper_dev_link() etc
>> >> > (netvsc_vf_join()). Now, this is still done in netvsc. Why?
>> >> > 
>> >> > This should be part of the common "failover" code.
>> >
>> >Based on Stephen's feedback on earlier patches, i tried to minimize the changes to
>> >netvsc and only commonize the notifier and the main event handler routine.
>> >Another complication is that netvsc does part of registration in a delayed workqueue.
>> 
>> :( This kind of degrades the whole efford of having single solution
>> in "failover" module. I think that common parts, as
>> netdev_rx_handler_register() and others certainly is should be inside
>> the common module. This is not a good time to minimize changes. Let's do
>> the thing properly and fix the netvsc mess now.
>> 
>> 
>> >
>> >It should be possible to move some of the code from net_failover.c to generic
>> >failover.c in future if Stephen is ok with it.
>> >
>> >
>> >> > 
>> >> Also note that in the current patchset you use IFF_FAILOVER flag for
>> >> master, yet for the slave you use IFF_SLAVE. That is wrong.
>> >> IFF_FAILOVER_SLAVE should be used.
>> >
>> >Not sure which code you are referring to.  I only set IFF_FAILOVER_SLAVE
>> >in patch 3.
>> 
>> The existing netvsc driver.
>
>We really can't change netvsc's flags now, even if it's interface is
>messy, it's being used in the field. We can add a flag that makes netvsc
>behave differently, and if this flag also allows enhanced functionality
>userspace will gradually switch.

Okay, although in this case, it really does not make much sense, so be
it. Leave the netvsc set the ->priv flag to IFF_SLAVE as it is doing
now. (This once-wrong-forever-wrong policy is flustrating me).

But since this patchset introduces private flag IFF_FAILOVER and
IFF_FAILOVER_SLAVE, and we set IFF_FAILOVER to the netvsc netdev
instance, we should also set IFF_FAILOVER_SLAVE to the enslaved VF
netdevice to get at least some consistency between virtio_net and
netvsc.


>
>Anything breaking userspace I fully expect Stephen to nack and
>IMO with good reason.
>
>-- 
>MST

^ permalink raw reply

* Re: [PATCH bpf-next 3/7] bpf: btf: Check array->index_type
From: Martin KaFai Lau @ 2018-05-22 16:04 UTC (permalink / raw)
  To: Yonghong Song; +Cc: netdev, Alexei Starovoitov, Daniel Borkmann, kernel-team
In-Reply-To: <a1016774-1604-f230-5eea-95656e01f692@fb.com>

On Mon, May 21, 2018 at 09:41:11PM -0700, Yonghong Song wrote:
> 
> 
> On 5/18/18 5:16 PM, Martin KaFai Lau wrote:
> > Instead of ingoring the array->index_type field.  Enforce that
> > it must be an unsigned BTF_KIND_INT.
> > 
> > Signed-off-by: Martin KaFai Lau <kafai@fb.com>
> > ---
> >   kernel/bpf/btf.c | 83 ++++++++++++++++++++++++++++++++++++++++----------------
> >   1 file changed, 59 insertions(+), 24 deletions(-)
> > 
> > diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> > index 536e5981ad8c..b4e48dae2240 100644
> > --- a/kernel/bpf/btf.c
> > +++ b/kernel/bpf/btf.c
> > @@ -444,6 +444,28 @@ static const struct btf_type *btf_type_by_id(const struct btf *btf, u32 type_id)
> >   	return btf->types[type_id];
> >   }
> > +/*
> > + * Regular int is not a bit field and it must be either
> > + * u8/u16/u32/u64.
> > + */
> > +static bool btf_type_int_is_regular(const struct btf_type *t)
> > +{
> > +	u16 nr_bits, nr_bytes;
> > +	u32 int_data;
> > +
> > +	int_data = btf_type_int(t);
> > +	nr_bits = BTF_INT_BITS(int_data);
> > +	nr_bytes = BITS_ROUNDUP_BYTES(nr_bits);
> > +	if (BITS_PER_BYTE_MASKED(nr_bits) ||
> > +	    BTF_INT_OFFSET(int_data) ||
> > +	    (nr_bytes != sizeof(u8) && nr_bytes != sizeof(u16) &&
> > +	     nr_bytes != sizeof(u32) && nr_bytes != sizeof(u64))) {
> > +		return false;
> > +	}
> > +
> > +	return true;
> > +}
> > +
> >   __printf(2, 3) static void __btf_verifier_log(struct bpf_verifier_log *log,
> >   					      const char *fmt, ...)
> >   {
> > @@ -1309,14 +1331,16 @@ static s32 btf_array_check_meta(struct btf_verifier_env *env,
> >   		return -EINVAL;
> >   	}
> > -	/* We are a little forgiving on array->index_type since
> > -	 * the kernel is not using it.
> > -	 */
> > -	/* Array elem cannot be in type void,
> > -	 * so !array->type is not allowed.
> > +	/* Array elem type and index type cannot be in type void,
> > +	 * so !array->type and !array->index_type are not allowed.
> >   	 */
> >   	if (!array->type || BTF_TYPE_PARENT(array->type)) {
> > -		btf_verifier_log_type(env, t, "Invalid type_id");
> > +		btf_verifier_log_type(env, t, "Invalid elem");
> > +		return -EINVAL;
> > +	}
> > +
> > +	if (!array->index_type || BTF_TYPE_PARENT(array->index_type)) {
> > +		btf_verifier_log_type(env, t, "Invalid index");
> >   		return -EINVAL;
> >   	}
> > @@ -1329,11 +1353,35 @@ static int btf_array_resolve(struct btf_verifier_env *env,
> >   			     const struct resolve_vertex *v)
> >   {
> >   	const struct btf_array *array = btf_type_array(v->t);
> > -	const struct btf_type *elem_type;
> > -	u32 elem_type_id = array->type;
> > +	const struct btf_type *elem_type, *index_type;
> > +	u32 elem_type_id, index_type_id;
> >   	struct btf *btf = env->btf;
> >   	u32 elem_size;
> > +	/* Check array->index_type */
> > +	index_type_id = array->index_type;
> > +	index_type = btf_type_by_id(btf, index_type_id);
> > +	if (btf_type_is_void_or_null(index_type)) {
> > +		btf_verifier_log_type(env, v->t, "Invalid index");
> > +		return -EINVAL;
> > +	}
> > +
> > +	if (!env_type_is_resolve_sink(env, index_type) &&
> > +	    !env_type_is_resolved(env, index_type_id))
> > +		return env_stack_push(env, index_type, index_type_id);
> > +
> > +	index_type = btf_type_id_size(btf, &index_type_id, NULL);
> > +	if (!index_type || !btf_type_is_int(index_type) ||
> > +	    /* bit field int is not allowed */
> > +	    !btf_type_int_is_regular(index_type) ||
> > +	    /* unsigned only */
> > +	    BTF_INT_ENCODING(btf_type_int(index_type))) {
> 
> Could you explain why you only support array index type to be
> unsigned? A lot of test cases  in Patch #7 are amended with unsigned types.
> In C, signed integers can surely be index, e.g., a[-1].
I will allow the signed in v2.

> 
> > +		btf_verifier_log_type(env, v->t, "Invalid index");
> > +		return -EINVAL;
> > +	}
> > +
> > +	/* Check array->type */
> > +	elem_type_id = array->type;
> >   	elem_type = btf_type_by_id(btf, elem_type_id);
> >   	if (btf_type_is_void_or_null(elem_type)) {
> >   		btf_verifier_log_type(env, v->t,
> > @@ -1351,22 +1399,9 @@ static int btf_array_resolve(struct btf_verifier_env *env,
> >   		return -EINVAL;
> >   	}
> > -	if (btf_type_is_int(elem_type)) {
> > -		int int_type_data = btf_type_int(elem_type);
> > -		u16 nr_bits = BTF_INT_BITS(int_type_data);
> > -		u16 nr_bytes = BITS_ROUNDUP_BYTES(nr_bits);
> > -
> > -		/* Put more restriction on array of int.  The int cannot
> > -		 * be a bit field and it must be either u8/u16/u32/u64.
> > -		 */
> > -		if (BITS_PER_BYTE_MASKED(nr_bits) ||
> > -		    BTF_INT_OFFSET(int_type_data) ||
> > -		    (nr_bytes != sizeof(u8) && nr_bytes != sizeof(u16) &&
> > -		     nr_bytes != sizeof(u32) && nr_bytes != sizeof(u64))) {
> > -			btf_verifier_log_type(env, v->t,
> > -					      "Invalid array of int");
> > -			return -EINVAL;
> > -		}
> > +	if (btf_type_is_int(elem_type) && !btf_type_int_is_regular(elem_type)) {
> > +		btf_verifier_log_type(env, v->t, "Invalid array of int");
> > +		return -EINVAL;
> >   	}
> >   	if (array->nelems && elem_size > U32_MAX / array->nelems) {
> > 

^ permalink raw reply

* Re: [lkp-robot] [selftests] 3f45449746: kernel_selftests.net.fib_rule_tests.sh.fail
From: Roopa Prabhu @ 2018-05-22 16:01 UTC (permalink / raw)
  To: kernel test robot
  Cc: David Miller, netdev, David Ahern, Nikolay Aleksandrov,
	Ido Schimmel, lkp
In-Reply-To: <20180522051541.GF7714@yexl-desktop>

On Mon, May 21, 2018 at 10:15 PM, kernel test robot
<xiaolong.ye@intel.com> wrote:
>
> FYI, we noticed the following commit (built with gcc-7):
>
> commit: 3f454497465bb6b1533772900bb35fab1324e35e ("selftests: net: initial fib rule tests")
> url: https://github.com/0day-ci/linux/commits/Roopa-Prabhu/ipv4-support-sport-dport-and-ip_proto-in-RTM_GETROUTE/20180518-051740
>
>
> in testcase: kernel_selftests
> with following parameters:
>
>         group: kselftests-02
>
> test-description: The kernel contains a set of "self tests" under the tools/testing/selftests/ directory. These are intended to be small unit tests to exercise individual code paths in the kernel.
> test-url: https://www.kernel.org/doc/Documentation/kselftest.txt
>
>
> on test machine: 88 threads Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz with 64G memory
>
> caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
>
>
> selftests: fib_rule_tests.sh
> ========================================
> selftests: Warning: file fib_rule_tests.sh is not executable, correct this.
> not ok 1..16 selftests: fib_rule_tests.sh [FAIL]
> make: Leaving directory '/usr/src/linux-selftests-x86_64-rhel-7.2-3f454497465bb6b1533772900bb35fab1324e35e/tools/testing/selftests/net'
>
>

Thanks for the report. will fix

^ permalink raw reply

* Re: [net-next 2/6] net/mlx5: Add pbmc and pptb in the port_access_reg_cap_mask
From: Saeed Mahameed @ 2018-05-22 16:01 UTC (permalink / raw)
  To: Or Gerlitz; +Cc: Huy Nguyen, David S. Miller, Linux Netdev List, Saeed Mahameed
In-Reply-To: <CAJ3xEMh=LS4_iv6B7VUrB0yWVytnY9YKW0RPGYFmDA31C4e7Cw@mail.gmail.com>

On Tue, May 22, 2018 at 3:21 AM, Or Gerlitz <gerlitz.or@gmail.com> wrote:
> On Tue, May 22, 2018 at 1:19 PM, Or Gerlitz <gerlitz.or@gmail.com> wrote:
>> On Tue, May 22, 2018 at 12:04 AM, Saeed Mahameed <saeedm@mellanox.com> wrote:
>>> From: Huy Nguyen <huyn@mellanox.com>
>>>
>>> Add pbmc and pptb in the port_access_reg_cap_mask. These two
>>> bits determine if device supports receive buffer configuration.
>>>
>>> Signed-off-by: Huy Nguyen <huyn@mellanox.com>
>>
>> Huy, Parav reviewed your code to death (but he's still alive and kicking!),
>> go a head and add his R.Bs note to the entire series.
>
> when you fix that, also address checkpatch's scream on
>
> WARNING: Missing or malformed SPDX-License-Identifier tag in line 1
>
> in four cases along the series
>

We are going to do this once for all mlx5 files soon, i don't want to
have two types of license headers in the meanwhile.
let's keep this as is until then.

>
>>> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

^ permalink raw reply

* RE: [PATCH net-next] sctp: add support for SCTP_REUSE_PORT sockopt
From: David Laight @ 2018-05-22 15:36 UTC (permalink / raw)
  To: 'Marcelo Ricardo Leitner', Michael Tuexen
  Cc: Neil Horman, Xin Long, network dev, linux-sctp@vger.kernel.org,
	davem@davemloft.net
In-Reply-To: <20180521155127.GB5489@localhost.localdomain>

...
> > >> the reason this was added is to have a specified way to allow a system to
> > >> behave like a client and server making use of the INIT collision.
> > >>
> > >> For 1-to-many style sockets you can do this by creating a socket, binding it,
> > >> calling listen on it and trying to connect to the peer.
> > >>
> > >> For 1-to-1 style sockets you need two sockets for it. One listener and one
> > >> you use to connect (and close it in case of failure, open a new one...).
> > >>
> > >> It was not clear if one can achieve this with SO_REUSEPORT and/or SO_REUSEADDR
> > >> on all platforms. We left that unspecified.
> > >>
> > >> I hope this makes the intention clearer.

That really doesn't help for 1-1 sockets.
You need a way of accepting connections that come from a specific remote host.

Otherwise the application has to verify that the remove address on the incoming
connection is the one it is expecting.

It is also a problem if two different applications (instances) want to
generate 'INIT collisions' for the same local addresses but different remote
ones.

The only way 'INIT collisions' are going to work is with a different
option that stops the receipt of an ABORT chunk erroring a connect.

	David

^ permalink raw reply

* Re: [PATCH v2] packet: track ring entry use using a shadow ring to prevent RX ring overrun
From: Willem de Bruijn @ 2018-05-22 15:46 UTC (permalink / raw)
  To: Jon Rosen (jrosen)
  Cc: David S. Miller, Willem de Bruijn, Eric Dumazet, Kees Cook,
	David Windsor, Rosen, Rami, Reshetova, Elena, Mike Maloney,
	Benjamin Poirier, Thomas Gleixner, Greg Kroah-Hartman,
	open list:NETWORKING [GENERAL], open list
In-Reply-To: <00de829386b04823bb58a555fb620317@XCH-RTP-016.cisco.com>

On Tue, May 22, 2018 at 10:12 AM, Jon Rosen (jrosen) <jrosen@cisco.com> wrote:
> On Monday, May 21, 2018 2:17 PM, Jon Rosen (jrosen) <jrosen@cisco.com> wrote:
>> On Monday, May 21, 2018 1:07 PM, Willem de Bruijn
>> <willemdebruijn.kernel@gmail.com> wrote:
>>> On Mon, May 21, 2018 at 8:57 AM, Jon Rosen (jrosen) <jrosen@cisco.com> wrote:
>
> ...snip...
>
>>>
>>> A setsockopt for userspace to signal a stricter interpretation of
>>> tp_status to elide the shadow hack could then be considered.
>>> It's not pretty. Either way, no full new version is required.
>>>
>>>> As much as I would like to find a solution that doesn't require
>>>> the spin lock I have yet to do so. Maybe the answer is that
>>>> existing applications will need to suffer the performance impact
>>>> but a new version or option for TPACKET_V1/V2 could be added to
>>>> indicate strict adherence of the TP_STATUS_USER bit and then the
>>>> original diffs could be used.
>
> It looks like adding new socket options is pretty rare so I
> wonder if a better option might be to define a new TP_STATUS_XXX
> bit which would signal from a userspace application to the kernel
> that it strictly interprets the TP_STATUS_USER bit to determine
> ownership.
>
> Todays applications set tp_status = TP_STATUS_KERNEL(0) for the
> kernel to pick up the entry.  We could define a new value to pass
> ownership as well as one to indicate to other kernel threads that
> an entry is inuse:
>
>         #define TP_STATUS_USER_TO_KERNEL        (1 << 8)
>         #define TP_STATUS_INUSE                 (1 << 9)
>
> If the kernel sees tp_status == TP_STATUS_KERNEL then it should
> use the shadow method for tacking ownership. If it sees tp_status
> == TP_STATUS_USER_TO_KERNEL then it can use the TP_STATUS_INUSE
> method.

Interesting idea. Userspace would have to consistently follow the
new behavior for all slots, which is hard to enforce. And as this is
checked at runtime, the kernel always has to defensively allocate
the shadow memory. I do think that a per-ring option set before
ring allocation is simpler. Either way, this optimization would be a
separate follow-on patch.

^ permalink raw reply

* Re: Shepherd request (P83): Multipath TCP: Present Use Cases and an Upstream Future
From: Michael S. Tsirkin @ 2018-05-22 15:46 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Samudrala, Sridhar, stephen, davem, netdev, virtualization,
	virtio-dev, jesse.brandeburg, alexander.h.duyck, kubakici,
	jasowang, loseweigh, aaron.f.brown, anjali.singhai
In-Reply-To: <20180522153614.GK2149@nanopsycho>

On Tue, May 22, 2018 at 05:36:14PM +0200, Jiri Pirko wrote:
> Tue, May 22, 2018 at 05:28:42PM CEST, sridhar.samudrala@intel.com wrote:
> >
> >On 5/22/2018 2:08 AM, Jiri Pirko wrote:
> >> Tue, May 22, 2018 at 11:06:37AM CEST, jiri@resnulli.us wrote:
> >> > Tue, May 22, 2018 at 04:06:18AM CEST, sridhar.samudrala@intel.com wrote:
> >> > > Use the registration/notification framework supported by the generic
> >> > > failover infrastructure.
> >> > > 
> >> > > Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
> >> > In previous patchset versions, the common code did
> >> > netdev_rx_handler_register() and netdev_upper_dev_link() etc
> >> > (netvsc_vf_join()). Now, this is still done in netvsc. Why?
> >> > 
> >> > This should be part of the common "failover" code.
> >
> >Based on Stephen's feedback on earlier patches, i tried to minimize the changes to
> >netvsc and only commonize the notifier and the main event handler routine.
> >Another complication is that netvsc does part of registration in a delayed workqueue.
> 
> :( This kind of degrades the whole efford of having single solution
> in "failover" module. I think that common parts, as
> netdev_rx_handler_register() and others certainly is should be inside
> the common module. This is not a good time to minimize changes. Let's do
> the thing properly and fix the netvsc mess now.
> 
> 
> >
> >It should be possible to move some of the code from net_failover.c to generic
> >failover.c in future if Stephen is ok with it.
> >
> >
> >> > 
> >> Also note that in the current patchset you use IFF_FAILOVER flag for
> >> master, yet for the slave you use IFF_SLAVE. That is wrong.
> >> IFF_FAILOVER_SLAVE should be used.
> >
> >Not sure which code you are referring to.  I only set IFF_FAILOVER_SLAVE
> >in patch 3.
> 
> The existing netvsc driver.

We really can't change netvsc's flags now, even if it's interface is
messy, it's being used in the field. We can add a flag that makes netvsc
behave differently, and if this flag also allows enhanced functionality
userspace will gradually switch.

Anything breaking userspace I fully expect Stephen to nack and
IMO with good reason.

-- 
MST

^ permalink raw reply

* Re: [PATCH net-next v4 1/3] ipv4: support sport, dport and ip_proto in RTM_GETROUTE
From: Roopa Prabhu @ 2018-05-22 15:45 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, netdev, Nikolay Aleksandrov, David Ahern,
	Ido Schimmel
In-Reply-To: <9e2a1bc7-e80a-1642-3b83-2341fddc6097@gmail.com>

On Tue, May 22, 2018 at 8:04 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
>
> On 05/22/2018 07:51 AM, Roopa Prabhu wrote:
>> From: Roopa Prabhu <roopa@cumulusnetworks.com>
>>
>>
>
> ...
>
>> diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
>> index 4d622112..cf5cfc5 100644
>> --- a/net/ipv4/fib_frontend.c
>> +++ b/net/ipv4/fib_frontend.c
>> @@ -649,6 +649,10 @@ const struct nla_policy rtm_ipv4_policy[RTA_MAX + 1] = {
>>       [RTA_ENCAP]             = { .type = NLA_NESTED },
>>       [RTA_UID]               = { .type = NLA_U32 },
>>       [RTA_MARK]              = { .type = NLA_U32 },
>> +     [RTA_TABLE]             = { .type = NLA_U32 },
>> +     [RTA_IP_PROTO]          = { .type = NLA_U8 },
>> +     [RTA_SPORT]             = { .type = NLA_U16 },
>> +     [RTA_DPORT]             = { .type = NLA_U16 },
>>  };
>
> Hi Roopa
>
> RTA_TABLE addition looks like a bug fix for net tree ?
>
> This should be sent as an independent patch IMO.
>
> Thanks.

sure, I can do that. thanks.

^ permalink raw reply

* Re: [PATCH net-next v11 2/5] netvsc: refactor notifier/event handling code to use the failover framework
From: Jiri Pirko @ 2018-05-22 15:45 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: alexander.h.duyck, virtio-dev, kubakici, Sridhar Samudrala,
	virtualization, loseweigh, netdev, anjali.singhai, aaron.f.brown,
	davem
In-Reply-To: <20180522181804-mutt-send-email-mst@kernel.org>

Tue, May 22, 2018 at 05:32:30PM CEST, mst@redhat.com wrote:
>On Tue, May 22, 2018 at 05:13:43PM +0200, Jiri Pirko wrote:
>> Tue, May 22, 2018 at 03:39:33PM CEST, mst@redhat.com wrote:
>> >On Tue, May 22, 2018 at 03:26:26PM +0200, Jiri Pirko wrote:
>> >> Tue, May 22, 2018 at 03:17:37PM CEST, mst@redhat.com wrote:
>> >> >On Tue, May 22, 2018 at 03:14:22PM +0200, Jiri Pirko wrote:
>> >> >> Tue, May 22, 2018 at 03:12:40PM CEST, mst@redhat.com wrote:
>> >> >> >On Tue, May 22, 2018 at 11:08:53AM +0200, Jiri Pirko wrote:
>> >> >> >> Tue, May 22, 2018 at 11:06:37AM CEST, jiri@resnulli.us wrote:
>> >> >> >> >Tue, May 22, 2018 at 04:06:18AM CEST, sridhar.samudrala@intel.com wrote:
>> >> >> >> >>Use the registration/notification framework supported by the generic
>> >> >> >> >>failover infrastructure.
>> >> >> >> >>
>> >> >> >> >>Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
>> >> >> >> >
>> >> >> >> >In previous patchset versions, the common code did
>> >> >> >> >netdev_rx_handler_register() and netdev_upper_dev_link() etc
>> >> >> >> >(netvsc_vf_join()). Now, this is still done in netvsc. Why?
>> >> >> >> >
>> >> >> >> >This should be part of the common "failover" code.
>> >> >> >> >
>> >> >> >> 
>> >> >> >> Also note that in the current patchset you use IFF_FAILOVER flag for
>> >> >> >> master, yet for the slave you use IFF_SLAVE. That is wrong.
>> >> >> >> IFF_FAILOVER_SLAVE should be used.
>> >> >> >
>> >> >> >Or drop IFF_FAILOVER_SLAVE and set both IFF_FAILOVER and IFF_SLAVE?
>> >> >> 
>> >> >> No. IFF_SLAVE is for bonding.
>> >> >
>> >> >What breaks if we reuse it for failover?
>> >> 
>> >> This is exposed to userspace. IFF_SLAVE is expected for bonding slaves.
>> >> And failover slave is not a bonding slave.
>> >
>> >That does not really answer the question.  I'd claim it's sufficiently
>> >like a bond slave for IFF_SLAVE to make sense.
>> >
>> >In fact you will find that netvsc already sets IFF_SLAVE, and so
>> 
>> netvsc does the whole failover thing in a wrong way. This patchset is
>> trying to fix it.
>
>Maybe, but we don't need gratuitous changes either, especially if they
>break userspace.

What do you mean by the "break"? It was a mistake to reuse IFF_SLAVE at
the first place, lets fix it. If some userspace depends on that flag, it
is broken anyway.


>
>> >does e.g. the eql driver.
>> >
>> >The advantage of using IFF_SLAVE is that userspace knows to skip it.  If
>> 
>> The userspace should know how to skip other types of slaves - team,
>> bridge, ovs, etc.
>> The "master link" should be the one to look at.
>> 
>
>How should existing userspace know which ones to skip and which one is
>the master?  Right now userspace seems to assume whatever does not have
>IFF_SLAVE should be looked at. Are you saying that's not the right thing

Why do you say so? What do you mean by "looked at"? Certainly not.
IFLA_MASTER is the attribute that should be looked at, nothing else.


>to do and userspace should be fixed? What should userspace do in
>your opinion that will be forward compatible with future kernels?
>
>> 
>> >we don't set IFF_SLAVE existing userspace tries to use the lowerdev.
>> 
>> Each master type has a IFF_ master flag and IFF_ slave flag.
>
>Could you give some examples please?

enum netdev_priv_flags {
        IFF_EBRIDGE                     = 1<<1,
        IFF_BRIDGE_PORT                 = 1<<9,
        IFF_OPENVSWITCH                 = 1<<20,
        IFF_OVS_DATAPATH                = 1<<10,
	IFF_L3MDEV_MASTER               = 1<<18,
        IFF_L3MDEV_SLAVE                = 1<<21,
        IFF_TEAM                        = 1<<22,
        IFF_TEAM_PORT                   = 1<<13,
};


>
>> In private
>> flag. I don't see no reason to break this pattern here.
>
>Other masters are setup from userspace, this one is set up automatically
>by kernel. So the bar is higher, we need an interface that existing
>userspace knows about.  We can't just say "oh if userspace set this up
>it should know to skip lowerdevs".
>
>Otherwise multiple interfaces with same mac tend to confuse userspace.

No difference, really.
Regardless who does the setup, and independent userspace deamon should
react accordingly.

^ permalink raw reply

* Re: [PATCH] net: vxge: fix spelling mistake in macro VXGE_HW_ERR_PRIVILAGED_OPEARATION
From: Edward Cree @ 2018-05-22 15:44 UTC (permalink / raw)
  To: Colin King, Jon Mason, David S . Miller, netdev
  Cc: kernel-janitors, linux-kernel
In-Reply-To: <20180522153616.13980-1-colin.king@canonical.com>

On 22/05/18 16:36, Colin King wrote:
> From: Colin Ian King <colin.king@canonical.com>
>
> Rename VXGE_HW_ERR_PRIVILAGED_OPEARATION to VXGE_HW_ERR_PRIVILAGED_OPERATION
> to fix spelling mistake.
>
> Signed-off-by: Colin Ian King <colin.king@canonical.com>
"Privilaged" doesn't look right either, maybe fix both at once?
 -> VXGE_HW_PRIVILEGED_OPERATION

-Ed

^ permalink raw reply

* [PATCH][V2] net/mlx4: fix spelling mistake: "Inrerface" -> "Interface" and rephrase message
From: Colin King @ 2018-05-22 15:42 UTC (permalink / raw)
  To: Tariq Toukan, David S . Miller, netdev, linux-rdma
  Cc: kernel-janitors, linux-kernel

From: Colin Ian King <colin.king@canonical.com>

Trivial fix to spelling mistake in mlx4_dbg debug message and also
change the phrasing of the message so that is is more readable

Signed-off-by: Colin Ian King <colin.king@canonical.com>

---
V2: rephrase message, as helpfully suggested by Tariq Toukan
---
 drivers/net/ethernet/mellanox/mlx4/intf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/intf.c b/drivers/net/ethernet/mellanox/mlx4/intf.c
index 2edcce98ab2d..65482f004e50 100644
--- a/drivers/net/ethernet/mellanox/mlx4/intf.c
+++ b/drivers/net/ethernet/mellanox/mlx4/intf.c
@@ -172,7 +172,7 @@ int mlx4_do_bond(struct mlx4_dev *dev, bool enable)
 		list_add_tail(&dev_ctx->list, &priv->ctx_list);
 		spin_unlock_irqrestore(&priv->ctx_lock, flags);
 
-		mlx4_dbg(dev, "Inrerface for protocol %d restarted with when bonded mode is %s\n",
+		mlx4_dbg(dev, "Interface for protocol %d restarted with bonded mode %s\n",
 			 dev_ctx->intf->protocol, enable ?
 			 "enabled" : "disabled");
 	}
-- 
2.17.0

^ permalink raw reply related

* Re: [PATCH v2] packet: track ring entry use using a shadow ring to prevent RX ring overrun
From: Willem de Bruijn @ 2018-05-22 15:41 UTC (permalink / raw)
  To: Jon Rosen (jrosen)
  Cc: David S. Miller, Willem de Bruijn, Eric Dumazet, Kees Cook,
	David Windsor, Rosen, Rami, Reshetova, Elena, Mike Maloney,
	Benjamin Poirier, Thomas Gleixner, Greg Kroah-Hartman,
	open list:NETWORKING [GENERAL], open list
In-Reply-To: <64c91d04479348cabbcdf1df0ff7d40f@XCH-RTP-016.cisco.com>

>>> I think the bigger issues as you've pointed out are the cost of
>>> the additional spin lock and should the additional state be
>>> stored in-band (fewer cache lines) or out-of band (less risk of
>>> breaking due to unpredictable application behavior).
>>
>> We don't need the spinlock if clearing the shadow byte after
>> setting the status to user.
>>
>> Worst case, user will set it back to kernel while the shadow
>> byte is not cleared yet and the next producer will drop a packet.
>> But next producers will make progress, so there is no deadlock
>> or corruption.
>
> I thought so too for a while but after spending more time than I
> care to admit I relized the following sequence was occuring:
>
>    Core A                       Core B
>    ------                       ------
>    - Enter spin_lock
>    -   Get tp_status of head (X)
>        tp_status == 0
>    -   Check inuse
>        inuse == 0
>    -   Allocate entry X
>        advance head (X+1)
>        set inuse=1
>    - Exit spin_lock
>
>      <very long delay>
>
>                                 <allocate N-1 entries
>                                 where N = size of ring>
>
>                                 - Enter spin_lock
>                                 -   get tp_status of head (X+N)
>                                     tp_status == 0 (but slot
>                                     in use for X on core A)
>
>    - write tp_status of         <--- trouble!
>      X = TP_STATUS_USER         <--- trouble!
>    - write inuse=0              <--- trouble!
>
>                                 -   Check inuse
>                                     inuse == 0
>                                 -   Allocate entry X+N
>                                     advance head (X+N+1)
>                                     set inuse=1
>                                 - Exit spin_lock
>
>
> At this point Core A just passed slot X to userspace with a
> packet and Core B has just been assigned slot X+N (same slot as
> X) for it's new packet. Both cores A and B end up filling in that
> slot.  Tracking ths donw was one of the reasons it took me a
> while to produce these updated diffs.

Is this not just an ordering issue? Since inuse is set after tp_status,
it has to be tested first (and barriers are needed to avoid reordering).

^ permalink raw reply

* Re: [PATCH net-next] net: stmmac: Add PPS and Flexible PPS support
From: Jose Abreu @ 2018-05-22 15:38 UTC (permalink / raw)
  To: Andrew Lunn, Jose Abreu, Richard Cochran
  Cc: netdev, David S. Miller, Joao Pinto, Vitor Soares,
	Giuseppe Cavallaro, Alexandre Torgue
In-Reply-To: <20180522151418.GA4396@lunn.ch>

On 22-05-2018 16:14, Andrew Lunn wrote:
> On Tue, May 22, 2018 at 01:58:40PM +0100, Jose Abreu wrote:
>> This adds support for PPS output and Flexible PPS (which is equivalent
>> to per_out output of PTP subsystem).
> You forgot to Cc: the PTP maintainer, Richard Cochran <richardcochran@gmail.com>

Thanks Andrew!

Richard,

Can you take a look at this patch? (I can cc you in the original
thread if you're not a netdev subscriber).

Thanks and Best Regards,
Jose Miguel Abreu

>
>     Andrew

^ permalink raw reply

* Re: [PATCH net-next 08/12] amd-xgbe: Add ethtool show/set channels support
From: Tom Lendacky @ 2018-05-22 15:37 UTC (permalink / raw)
  To: Edward Cree, Jakub Kicinski; +Cc: netdev, David Miller
In-Reply-To: <ddce7e2b-ab4d-8eb3-9b56-acb192ae29a6@solarflare.com>

On 5/22/2018 8:29 AM, Edward Cree wrote:
> On 22/05/18 14:24, Tom Lendacky wrote:
>> The amd-xgbe driver is not designed to allocate separate IRQs for Rx and
>> Tx.  In general, there is one IRQ for a channel of which Tx and Rx are
>> shared.  You can have more Tx channels than Rx channels or vice-versa, but
>> the min() of those numbers will be a combined Tx/Rx with the excess being
>> Tx or Rx only: e.g. combined 0 tx 8 rx 10 results in 8 combined channels
>> plus two Rx only channels.
> If you cannot allocate the channels requested by the user, surely the correct
>  thing is not to fudge it into something similar, but rather to return an
>  error from the ethtool set_channels() op.

Ok, another vote on changing the logic.  I'll update it and submit a v2.

Thanks,
Tom

> 
> -Ed
> 

^ permalink raw reply

* Re: [net-next 1/6] net/dcb: Add dcbnl buffer attribute
From: Huy Nguyen @ 2018-05-22 15:36 UTC (permalink / raw)
  To: Jakub Kicinski, Saeed Mahameed
  Cc: David S. Miller, netdev, Jiri Pirko, Or Gerlitz, Parav Pandit
In-Reply-To: <20180521222026.4f54f479@cakuba>

On 5/22/2018 12:20 AM, Jakub Kicinski wrote:
> On Mon, 21 May 2018 14:04:57 -0700, Saeed Mahameed wrote:
>> From: Huy Nguyen <huyn@mellanox.com>
>>
>> In this patch, we add dcbnl buffer attribute to allow user
>> change the NIC's buffer configuration such as priority
>> to buffer mapping and buffer size of individual buffer.
>>
>> This attribute combined with pfc attribute allows advance user to
>> fine tune the qos setting for specific priority queue. For example,
>> user can give dedicated buffer for one or more prirorities or user
>> can give large buffer to certain priorities.
>>
>> We present an use case scenario where dcbnl buffer attribute configured
>> by advance user helps reduce the latency of messages of different sizes.
>>
>> Scenarios description:
>> On ConnectX-5, we run latency sensitive traffic with
>> small/medium message sizes ranging from 64B to 256KB and bandwidth sensitive
>> traffic with large messages sizes 512KB and 1MB. We group small, medium,
>> and large message sizes to their own pfc enables priorities as follow.
>>    Priorities 1 & 2 (64B, 256B and 1KB)
>>    Priorities 3 & 4 (4KB, 8KB, 16KB, 64KB, 128KB and 256KB)
>>    Priorities 5 & 6 (512KB and 1MB)
>>
>> By default, ConnectX-5 maps all pfc enabled priorities to a single
>> lossless fixed buffer size of 50% of total available buffer space. The
>> other 50% is assigned to lossy buffer. Using dcbnl buffer attribute,
>> we create three equal size lossless buffers. Each buffer has 25% of total
>> available buffer space. Thus, the lossy buffer size reduces to 25%. Priority
>> to lossless  buffer mappings are set as follow.
>>    Priorities 1 & 2 on lossless buffer #1
>>    Priorities 3 & 4 on lossless buffer #2
>>    Priorities 5 & 6 on lossless buffer #3
>>
>> We observe improvements in latency for small and medium message sizes
>> as follows. Please note that the large message sizes bandwidth performance is
>> reduced but the total bandwidth remains the same.
>>    256B message size (42 % latency reduction)
>>    4K message size (21% latency reduction)
>>    64K message size (16% latency reduction)
>>
>> Signed-off-by: Huy Nguyen <huyn@mellanox.com>
>> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
> On a cursory look this bares a lot of resemblance to devlink shared
> buffer configuration ABI.  Did you look into using that?
>
> Just to be clear devlink shared buffer ABIs don't require representors
> and "switchdev mode".
> .
[HQN] Dear Jakub, there are several reasons that devlink shared buffer 
ABI cannot be used:
1. The devlink shared buffer ABI is written based on the switch cli 
which you can find out more
from this link https://community.mellanox.com/docs/DOC-2558.
2. The dcbnl interfaces have been used for QoS settings. In NIC, the 
buffer configuration are tied to
priority (ETS PFC). The buffer configuration are not tied to port like 
switch.
3. Shared buffer, alpha, threshold are switch specific terms.

Please let me know if you have any further question.
Regards,
Huy Nguyen

^ permalink raw reply

* Re: Shepherd request (P83): Multipath TCP: Present Use Cases and an Upstream Future
From: Jiri Pirko @ 2018-05-22 15:36 UTC (permalink / raw)
  To: Samudrala, Sridhar
  Cc: mst, stephen, davem, netdev, virtualization, virtio-dev,
	jesse.brandeburg, alexander.h.duyck, kubakici, jasowang,
	loseweigh, aaron.f.brown, anjali.singhai
In-Reply-To: <39081bce-3913-5b07-3d07-0c476fca5e78@intel.com>

Tue, May 22, 2018 at 05:28:42PM CEST, sridhar.samudrala@intel.com wrote:
>
>On 5/22/2018 2:08 AM, Jiri Pirko wrote:
>> Tue, May 22, 2018 at 11:06:37AM CEST, jiri@resnulli.us wrote:
>> > Tue, May 22, 2018 at 04:06:18AM CEST, sridhar.samudrala@intel.com wrote:
>> > > Use the registration/notification framework supported by the generic
>> > > failover infrastructure.
>> > > 
>> > > Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
>> > In previous patchset versions, the common code did
>> > netdev_rx_handler_register() and netdev_upper_dev_link() etc
>> > (netvsc_vf_join()). Now, this is still done in netvsc. Why?
>> > 
>> > This should be part of the common "failover" code.
>
>Based on Stephen's feedback on earlier patches, i tried to minimize the changes to
>netvsc and only commonize the notifier and the main event handler routine.
>Another complication is that netvsc does part of registration in a delayed workqueue.

:( This kind of degrades the whole efford of having single solution
in "failover" module. I think that common parts, as
netdev_rx_handler_register() and others certainly is should be inside
the common module. This is not a good time to minimize changes. Let's do
the thing properly and fix the netvsc mess now.


>
>It should be possible to move some of the code from net_failover.c to generic
>failover.c in future if Stephen is ok with it.
>
>
>> > 
>> Also note that in the current patchset you use IFF_FAILOVER flag for
>> master, yet for the slave you use IFF_SLAVE. That is wrong.
>> IFF_FAILOVER_SLAVE should be used.
>
>Not sure which code you are referring to.  I only set IFF_FAILOVER_SLAVE
>in patch 3.

The existing netvsc driver.

^ permalink raw reply

* [PATCH] net: vxge: fix spelling mistake in macro VXGE_HW_ERR_PRIVILAGED_OPEARATION
From: Colin King @ 2018-05-22 15:36 UTC (permalink / raw)
  To: Jon Mason, David S . Miller, netdev; +Cc: kernel-janitors, linux-kernel

From: Colin Ian King <colin.king@canonical.com>

Rename VXGE_HW_ERR_PRIVILAGED_OPEARATION to VXGE_HW_ERR_PRIVILAGED_OPERATION
to fix spelling mistake.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
---
 drivers/net/ethernet/neterion/vxge/vxge-config.c  | 12 ++++++------
 drivers/net/ethernet/neterion/vxge/vxge-config.h  |  2 +-
 drivers/net/ethernet/neterion/vxge/vxge-ethtool.c |  2 +-
 drivers/net/ethernet/neterion/vxge/vxge-main.c    |  4 ++--
 4 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/neterion/vxge/vxge-config.c b/drivers/net/ethernet/neterion/vxge/vxge-config.c
index 6223930a8155..8656fcc9f2a0 100644
--- a/drivers/net/ethernet/neterion/vxge/vxge-config.c
+++ b/drivers/net/ethernet/neterion/vxge/vxge-config.c
@@ -693,7 +693,7 @@ __vxge_hw_device_is_privilaged(u32 host_type, u32 func_id)
 		VXGE_HW_DEVICE_ACCESS_RIGHT_MRPCIM)
 		return VXGE_HW_OK;
 	else
-		return VXGE_HW_ERR_PRIVILAGED_OPEARATION;
+		return VXGE_HW_ERR_PRIVILAGED_OPERATION;
 }
 
 /*
@@ -1920,7 +1920,7 @@ enum vxge_hw_status vxge_hw_device_getpause_data(struct __vxge_hw_device *hldev,
 	}
 
 	if (!(hldev->access_rights & VXGE_HW_DEVICE_ACCESS_RIGHT_MRPCIM)) {
-		status = VXGE_HW_ERR_PRIVILAGED_OPEARATION;
+		status = VXGE_HW_ERR_PRIVILAGED_OPERATION;
 		goto exit;
 	}
 
@@ -3153,7 +3153,7 @@ vxge_hw_mgmt_reg_read(struct __vxge_hw_device *hldev,
 	case vxge_hw_mgmt_reg_type_mrpcim:
 		if (!(hldev->access_rights &
 			VXGE_HW_DEVICE_ACCESS_RIGHT_MRPCIM)) {
-			status = VXGE_HW_ERR_PRIVILAGED_OPEARATION;
+			status = VXGE_HW_ERR_PRIVILAGED_OPERATION;
 			break;
 		}
 		if (offset > sizeof(struct vxge_hw_mrpcim_reg) - 8) {
@@ -3165,7 +3165,7 @@ vxge_hw_mgmt_reg_read(struct __vxge_hw_device *hldev,
 	case vxge_hw_mgmt_reg_type_srpcim:
 		if (!(hldev->access_rights &
 			VXGE_HW_DEVICE_ACCESS_RIGHT_SRPCIM)) {
-			status = VXGE_HW_ERR_PRIVILAGED_OPEARATION;
+			status = VXGE_HW_ERR_PRIVILAGED_OPERATION;
 			break;
 		}
 		if (index > VXGE_HW_TITAN_SRPCIM_REG_SPACES - 1) {
@@ -3279,7 +3279,7 @@ vxge_hw_mgmt_reg_write(struct __vxge_hw_device *hldev,
 	case vxge_hw_mgmt_reg_type_mrpcim:
 		if (!(hldev->access_rights &
 			VXGE_HW_DEVICE_ACCESS_RIGHT_MRPCIM)) {
-			status = VXGE_HW_ERR_PRIVILAGED_OPEARATION;
+			status = VXGE_HW_ERR_PRIVILAGED_OPERATION;
 			break;
 		}
 		if (offset > sizeof(struct vxge_hw_mrpcim_reg) - 8) {
@@ -3291,7 +3291,7 @@ vxge_hw_mgmt_reg_write(struct __vxge_hw_device *hldev,
 	case vxge_hw_mgmt_reg_type_srpcim:
 		if (!(hldev->access_rights &
 			VXGE_HW_DEVICE_ACCESS_RIGHT_SRPCIM)) {
-			status = VXGE_HW_ERR_PRIVILAGED_OPEARATION;
+			status = VXGE_HW_ERR_PRIVILAGED_OPERATION;
 			break;
 		}
 		if (index > VXGE_HW_TITAN_SRPCIM_REG_SPACES - 1) {
diff --git a/drivers/net/ethernet/neterion/vxge/vxge-config.h b/drivers/net/ethernet/neterion/vxge/vxge-config.h
index cfa970417f81..5ebdbfedc269 100644
--- a/drivers/net/ethernet/neterion/vxge/vxge-config.h
+++ b/drivers/net/ethernet/neterion/vxge/vxge-config.h
@@ -127,7 +127,7 @@ enum vxge_hw_status {
 	VXGE_HW_ERR_INVALID_TCODE 		  = VXGE_HW_BASE_ERR + 14,
 	VXGE_HW_ERR_INVALID_BLOCK_SIZE		  = VXGE_HW_BASE_ERR + 15,
 	VXGE_HW_ERR_INVALID_STATE		  = VXGE_HW_BASE_ERR + 16,
-	VXGE_HW_ERR_PRIVILAGED_OPEARATION	  = VXGE_HW_BASE_ERR + 17,
+	VXGE_HW_ERR_PRIVILAGED_OPERATION	  = VXGE_HW_BASE_ERR + 17,
 	VXGE_HW_ERR_INVALID_PORT 		  = VXGE_HW_BASE_ERR + 18,
 	VXGE_HW_ERR_FIFO		 	  = VXGE_HW_BASE_ERR + 19,
 	VXGE_HW_ERR_VPATH			  = VXGE_HW_BASE_ERR + 20,
diff --git a/drivers/net/ethernet/neterion/vxge/vxge-ethtool.c b/drivers/net/ethernet/neterion/vxge/vxge-ethtool.c
index 0452848d1316..2f1bde500420 100644
--- a/drivers/net/ethernet/neterion/vxge/vxge-ethtool.c
+++ b/drivers/net/ethernet/neterion/vxge/vxge-ethtool.c
@@ -276,7 +276,7 @@ static void vxge_get_ethtool_stats(struct net_device *dev,
 	*ptr++ = 0;
 	status = vxge_hw_device_xmac_stats_get(hldev, xmac_stats);
 	if (status != VXGE_HW_OK) {
-		if (status != VXGE_HW_ERR_PRIVILAGED_OPEARATION) {
+		if (status != VXGE_HW_ERR_PRIVILAGED_OPERATION) {
 			vxge_debug_init(VXGE_ERR,
 				"%s : %d Failure in getting xmac stats",
 				__func__, __LINE__);
diff --git a/drivers/net/ethernet/neterion/vxge/vxge-main.c b/drivers/net/ethernet/neterion/vxge/vxge-main.c
index b2299f2b2155..64fa94f8b471 100644
--- a/drivers/net/ethernet/neterion/vxge/vxge-main.c
+++ b/drivers/net/ethernet/neterion/vxge/vxge-main.c
@@ -3484,11 +3484,11 @@ static int vxge_device_register(struct __vxge_hw_device *hldev,
 				0,
 				&stat);
 
-	if (status == VXGE_HW_ERR_PRIVILAGED_OPEARATION)
+	if (status == VXGE_HW_ERR_PRIVILAGED_OPERATION)
 		vxge_debug_init(
 			vxge_hw_device_trace_level_get(hldev),
 			"%s: device stats clear returns"
-			"VXGE_HW_ERR_PRIVILAGED_OPEARATION", ndev->name);
+			"VXGE_HW_ERR_PRIVILAGED_OPERATION", ndev->name);
 
 	vxge_debug_entryexit(vxge_hw_device_trace_level_get(hldev),
 		"%s: %s:%d  Exiting...",
-- 
2.17.0

^ permalink raw reply related

* [PATCH net-next v2 2/2] gso: limit udp gso to egress-only virtual devices
From: Willem de Bruijn @ 2018-05-22 15:34 UTC (permalink / raw)
  To: netdev; +Cc: davem, Willem de Bruijn, Alexander Duyck
In-Reply-To: <20180522153440.204128-1-willemdebruijn.kernel@gmail.com>

From: Willem de Bruijn <willemb@google.com>

Until the udp receive stack supports large packets (UDP GRO), GSO
packets must not loop from the egress to the ingress path.

Revert the change that added NETIF_F_GSO_UDP_L4 to various virtual
devices through NETIF_F_GSO_ENCAP_ALL as this included devices that
may loop packets, such as veth and macvlan.

Instead add it to specific devices that forward to another device's
egress path, bonding and team.

Fixes: 83aa025f535f ("udp: add gso support to virtual devices")
CC: Alexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
---
 drivers/net/bonding/bond_main.c | 5 +++--
 drivers/net/team/team.c         | 5 +++--
 include/linux/netdev_features.h | 1 -
 3 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 06efdf6a762b..fea17b92b1ae 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1107,7 +1107,8 @@ static void bond_compute_features(struct bonding *bond)
 
 done:
 	bond_dev->vlan_features = vlan_features;
-	bond_dev->hw_enc_features = enc_features | NETIF_F_GSO_ENCAP_ALL;
+	bond_dev->hw_enc_features = enc_features | NETIF_F_GSO_ENCAP_ALL |
+				    NETIF_F_GSO_UDP_L4;
 	bond_dev->gso_max_segs = gso_max_segs;
 	netif_set_gso_max_size(bond_dev, gso_max_size);
 
@@ -4268,7 +4269,7 @@ void bond_setup(struct net_device *bond_dev)
 				NETIF_F_HW_VLAN_CTAG_RX |
 				NETIF_F_HW_VLAN_CTAG_FILTER;
 
-	bond_dev->hw_features |= NETIF_F_GSO_ENCAP_ALL;
+	bond_dev->hw_features |= NETIF_F_GSO_ENCAP_ALL | NETIF_F_GSO_UDP_L4;
 	bond_dev->features |= bond_dev->hw_features;
 }
 
diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c
index 9dbd390ace34..d6ff881165d0 100644
--- a/drivers/net/team/team.c
+++ b/drivers/net/team/team.c
@@ -1026,7 +1026,8 @@ static void __team_compute_features(struct team *team)
 	}
 
 	team->dev->vlan_features = vlan_features;
-	team->dev->hw_enc_features = enc_features | NETIF_F_GSO_ENCAP_ALL;
+	team->dev->hw_enc_features = enc_features | NETIF_F_GSO_ENCAP_ALL |
+				     NETIF_F_GSO_UDP_L4;
 	team->dev->hard_header_len = max_hard_header_len;
 
 	team->dev->priv_flags &= ~IFF_XMIT_DST_RELEASE;
@@ -2117,7 +2118,7 @@ static void team_setup(struct net_device *dev)
 			   NETIF_F_HW_VLAN_CTAG_RX |
 			   NETIF_F_HW_VLAN_CTAG_FILTER;
 
-	dev->hw_features |= NETIF_F_GSO_ENCAP_ALL;
+	dev->hw_features |= NETIF_F_GSO_ENCAP_ALL | NETIF_F_GSO_UDP_L4;
 	dev->features |= dev->hw_features;
 }
 
diff --git a/include/linux/netdev_features.h b/include/linux/netdev_features.h
index c87c3a3453c1..623bb8ced060 100644
--- a/include/linux/netdev_features.h
+++ b/include/linux/netdev_features.h
@@ -220,7 +220,6 @@ enum {
 				 NETIF_F_GSO_GRE_CSUM |			\
 				 NETIF_F_GSO_IPXIP4 |			\
 				 NETIF_F_GSO_IPXIP6 |			\
-				 NETIF_F_GSO_UDP_L4 |			\
 				 NETIF_F_GSO_UDP_TUNNEL |		\
 				 NETIF_F_GSO_UDP_TUNNEL_CSUM)
 
-- 
2.17.0.441.gb46fe60e1d-goog

^ permalink raw reply related

* [PATCH net-next v2 1/2] udp: exclude gso from xfrm paths
From: Willem de Bruijn @ 2018-05-22 15:34 UTC (permalink / raw)
  To: netdev; +Cc: davem, Willem de Bruijn, Michal Kubecek
In-Reply-To: <20180522153440.204128-1-willemdebruijn.kernel@gmail.com>

From: Willem de Bruijn <willemb@google.com>

UDP GSO delays final datagram construction to the GSO layer. This
conflicts with protocol transformations.

Fixes: bec1f6f69736 ("udp: generate gso with UDP_SEGMENT")
CC: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Willem de Bruijn <willemb@google.com>
---
 net/ipv4/udp.c | 3 ++-
 net/ipv6/udp.c | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index ff4d4ba67735..d71f1f3e1155 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -788,7 +788,8 @@ static int udp_send_skb(struct sk_buff *skb, struct flowi4 *fl4,
 			return -EINVAL;
 		if (sk->sk_no_check_tx)
 			return -EINVAL;
-		if (skb->ip_summed != CHECKSUM_PARTIAL || is_udplite)
+		if (skb->ip_summed != CHECKSUM_PARTIAL || is_udplite ||
+		    dst_xfrm(skb_dst(skb)))
 			return -EIO;
 
 		skb_shinfo(skb)->gso_size = cork->gso_size;
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 2839c1bd1e58..426c9d2b418d 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -1053,7 +1053,8 @@ static int udp_v6_send_skb(struct sk_buff *skb, struct flowi6 *fl6,
 			return -EINVAL;
 		if (udp_sk(sk)->no_check6_tx)
 			return -EINVAL;
-		if (skb->ip_summed != CHECKSUM_PARTIAL || is_udplite)
+		if (skb->ip_summed != CHECKSUM_PARTIAL || is_udplite ||
+		    dst_xfrm(skb_dst(skb)))
 			return -EIO;
 
 		skb_shinfo(skb)->gso_size = cork->gso_size;
-- 
2.17.0.441.gb46fe60e1d-goog

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox