Netdev List

Netdev List
 help / color / mirror / Atom feed

* [PATCH 4/9] kbuild: Add filechk2 function
From: Jiri Olsa @ 2018-04-05 15:16 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann
  Cc: lkml, netdev, linux-kbuild, Quentin Monnet, Eugene Syromiatnikov,
	Jiri Benc, Stanislav Kozina, Jerome Marchand,
	Arnaldo Carvalho de Melo, Masahiro Yamada, Michal Marek,
	Jiri Kosina
In-Reply-To: <20180405151645.19130-1-jolsa@kernel.org>

Adding filechk2 function  that has the same semantics
as filechk, but it takes the target file from the 2nd
argument instead of from the '$@' as in the filechk
function.

This function is needed when we can't have separate
target for the file, like in the following patch.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 scripts/Kbuild.include | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/scripts/Kbuild.include b/scripts/Kbuild.include
index 065324a8046f..9775ce2771d4 100644
--- a/scripts/Kbuild.include
+++ b/scripts/Kbuild.include
@@ -67,6 +67,30 @@ define filechk
 	fi
 endef
 
+# filechk2 is used to check if the content of a generated file is updated.
+# It follows the same logic as the filechk except instead of the $@ target
+# variable, the checked file is passed in 2nd argument.
+#
+# Sample usage:
+# define filechk_sample
+#	echo $KERNELRELEASE
+# endef
+# version.h : Makefile
+#	$(call filechk2,sample,version.h)
+# endef
+define filechk2
+	$(Q)set -e;					\
+	$(kecho) '  CHK     $(2)';			\
+	mkdir -p $(dir $(2));				\
+	$(filechk_$(1)) < $< > $(2).tmp;		\
+	if [ -r $(2) ] && cmp -s $(2) $(2).tmp; then	\
+		rm -f $(2).tmp;				\
+	else						\
+		$(kecho) '  UPD     $(2)';		\
+		mv -f $(2).tmp $(2);			\
+	fi
+endef
+
 ######
 # gcc support functions
 # See documentation in Documentation/kbuild/makefiles.txt
-- 
2.13.6

^ permalink raw reply related

* [PATCH 3/9] kbuild: Do not pass arguments to link-vmlinux.sh
From: Jiri Olsa @ 2018-04-05 15:16 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann
  Cc: lkml, netdev, linux-kbuild, Quentin Monnet, Eugene Syromiatnikov,
	Jiri Benc, Stanislav Kozina, Jerome Marchand,
	Arnaldo Carvalho de Melo, Masahiro Yamada, Michal Marek,
	Jiri Kosina
In-Reply-To: <20180405151645.19130-1-jolsa@kernel.org>

There's no need to pass LD* arguments to link-vmlinux.sh,
because they are passed as variables. The only argument
the link-vmlinux.sh supports is the 'clean' argument.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Makefile b/Makefile
index d3300e46f925..a65a3919c6ad 100644
--- a/Makefile
+++ b/Makefile
@@ -1027,7 +1027,7 @@ ARCH_POSTLINK := $(wildcard $(srctree)/arch/$(SRCARCH)/Makefile.postlink)
 
 # Final link of vmlinux with optional arch pass after final link
 cmd_link-vmlinux =                                                 \
-	$(CONFIG_SHELL) $< $(LD) $(LDFLAGS) $(LDFLAGS_vmlinux) ;    \
+	$(CONFIG_SHELL) $< ;                                       \
 	$(if $(ARCH_POSTLINK), $(MAKE) -f $(ARCH_POSTLINK) $@, true)
 
 vmlinux: scripts/link-vmlinux.sh vmlinux_prereq $(vmlinux-deps) FORCE
-- 
2.13.6

^ permalink raw reply related

* [PATCH 2/9] perf tools: Add fetch_kernel_buildid function
From: Jiri Olsa @ 2018-04-05 15:16 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann
  Cc: lkml, netdev, linux-kbuild, Quentin Monnet, Eugene Syromiatnikov,
	Jiri Benc, Stanislav Kozina, Jerome Marchand,
	Arnaldo Carvalho de Melo, Masahiro Yamada, Michal Marek,
	Jiri Kosina
In-Reply-To: <20180405151645.19130-1-jolsa@kernel.org>

Adding fetch_kernel_buildid helper function to
retrieve build id from running kernel. It will
be used in following patches.

Link: http://lkml.kernel.org/n/tip-at98orsncas8v2ito61u3qod@git.kernel.org
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 tools/perf/util/util.c | 18 ++++++++++++++++++
 tools/perf/util/util.h |  2 ++
 2 files changed, 20 insertions(+)

diff --git a/tools/perf/util/util.c b/tools/perf/util/util.c
index 41fc61d941ef..26f93866bc02 100644
--- a/tools/perf/util/util.c
+++ b/tools/perf/util/util.c
@@ -495,6 +495,24 @@ int parse_notes_buildid(void *note_data, size_t note_len, void *bf,
 	return -1;
 }
 
+int fetch_kernel_buildid(char *buildid, int size)
+{
+	char path[PATH_MAX], *buf;
+	size_t len;
+	int err;
+
+	scnprintf(path, PATH_MAX, "%s/kernel/notes",
+		  sysfs__mountpoint());
+
+	if (filename__read_str(path, &buf, &len))
+		return -EINVAL;
+
+	err = parse_notes_buildid(buf, len, buildid, size, false);
+
+	free(buf);
+	return err;
+}
+
 const char *perf_tip(const char *dirpath)
 {
 	struct strlist *tips;
diff --git a/tools/perf/util/util.h b/tools/perf/util/util.h
index 27106548396b..1ccaa957e3ab 100644
--- a/tools/perf/util/util.h
+++ b/tools/perf/util/util.h
@@ -48,6 +48,8 @@ extern int cacheline_size;
 int fetch_kernel_version(unsigned int *puint,
 			 char *str, size_t str_sz);
 
+int fetch_kernel_buildid(char *buildid, int size);
+
 int parse_notes_buildid(void *note_data, size_t note_len, void *bf,
 			size_t size, bool need_swap);
 
-- 
2.13.6

^ permalink raw reply related

* [PATCH 1/9] perf tools: Make read_build_id function public
From: Jiri Olsa @ 2018-04-05 15:16 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann
  Cc: lkml, netdev, linux-kbuild, Quentin Monnet, Eugene Syromiatnikov,
	Jiri Benc, Stanislav Kozina, Jerome Marchand,
	Arnaldo Carvalho de Melo, Masahiro Yamada, Michal Marek,
	Jiri Kosina
In-Reply-To: <20180405151645.19130-1-jolsa@kernel.org>

And renaming it into parse_notes_buildid to be more
precise and usable in following patches.

Link: http://lkml.kernel.org/n/tip-v1mz76rkdxfnbfz2v05fumn6@git.kernel.org
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 tools/perf/util/symbol-minimal.c | 50 ++--------------------------------------
 tools/perf/util/util.c           | 48 ++++++++++++++++++++++++++++++++++++++
 tools/perf/util/util.h           |  4 ++++
 3 files changed, 54 insertions(+), 48 deletions(-)

diff --git a/tools/perf/util/symbol-minimal.c b/tools/perf/util/symbol-minimal.c
index ff48d0d49584..bd281d3dc508 100644
--- a/tools/perf/util/symbol-minimal.c
+++ b/tools/perf/util/symbol-minimal.c
@@ -24,53 +24,6 @@ static bool check_need_swap(int file_endian)
 	return host_endian != file_endian;
 }
 
-#define NOTE_ALIGN(sz) (((sz) + 3) & ~3)
-
-#define NT_GNU_BUILD_ID	3
-
-static int read_build_id(void *note_data, size_t note_len, void *bf,
-			 size_t size, bool need_swap)
-{
-	struct {
-		u32 n_namesz;
-		u32 n_descsz;
-		u32 n_type;
-	} *nhdr;
-	void *ptr;
-
-	ptr = note_data;
-	while (ptr < (note_data + note_len)) {
-		const char *name;
-		size_t namesz, descsz;
-
-		nhdr = ptr;
-		if (need_swap) {
-			nhdr->n_namesz = bswap_32(nhdr->n_namesz);
-			nhdr->n_descsz = bswap_32(nhdr->n_descsz);
-			nhdr->n_type = bswap_32(nhdr->n_type);
-		}
-
-		namesz = NOTE_ALIGN(nhdr->n_namesz);
-		descsz = NOTE_ALIGN(nhdr->n_descsz);
-
-		ptr += sizeof(*nhdr);
-		name = ptr;
-		ptr += namesz;
-		if (nhdr->n_type == NT_GNU_BUILD_ID &&
-		    nhdr->n_namesz == sizeof("GNU")) {
-			if (memcmp(name, "GNU", sizeof("GNU")) == 0) {
-				size_t sz = min(size, descsz);
-				memcpy(bf, ptr, sz);
-				memset(bf + sz, 0, size - sz);
-				return 0;
-			}
-		}
-		ptr += descsz;
-	}
-
-	return -1;
-}
-
 int filename__read_debuglink(const char *filename __maybe_unused,
 			     char *debuglink __maybe_unused,
 			     size_t size __maybe_unused)
@@ -153,7 +106,8 @@ int filename__read_build_id(const char *filename, void *bf, size_t size)
 			if (fread(buf, buf_size, 1, fp) != 1)
 				goto out_free;
 
-			ret = read_build_id(buf, buf_size, bf, size, need_swap);
+			ret = parse_notes_buildid(buf, buf_size, bf, size,
+						  need_swap);
 			if (ret == 0)
 				ret = size;
 			break;
diff --git a/tools/perf/util/util.c b/tools/perf/util/util.c
index 1019bbc5dbd8..41fc61d941ef 100644
--- a/tools/perf/util/util.c
+++ b/tools/perf/util/util.c
@@ -447,6 +447,54 @@ fetch_kernel_version(unsigned int *puint, char *str,
 	return 0;
 }
 
+#define NOTE_ALIGN(sz) (((sz) + 3) & ~3)
+
+#define NT_GNU_BUILD_ID	3
+
+int parse_notes_buildid(void *note_data, size_t note_len, void *bf,
+			size_t size, bool need_swap)
+{
+	struct {
+		u32 n_namesz;
+		u32 n_descsz;
+		u32 n_type;
+	} *nhdr;
+	void *ptr;
+
+	ptr = note_data;
+	while (ptr < (note_data + note_len)) {
+		const char *name;
+		size_t namesz, descsz;
+
+		nhdr = ptr;
+		if (need_swap) {
+			nhdr->n_namesz = bswap_32(nhdr->n_namesz);
+			nhdr->n_descsz = bswap_32(nhdr->n_descsz);
+			nhdr->n_type = bswap_32(nhdr->n_type);
+		}
+
+		namesz = NOTE_ALIGN(nhdr->n_namesz);
+		descsz = NOTE_ALIGN(nhdr->n_descsz);
+
+		ptr += sizeof(*nhdr);
+		name = ptr;
+		ptr += namesz;
+		if (nhdr->n_type == NT_GNU_BUILD_ID &&
+		    nhdr->n_namesz == sizeof("GNU")) {
+			if (memcmp(name, "GNU", sizeof("GNU")) == 0) {
+				size_t sz = min(size, descsz);
+
+				memcpy(bf, ptr, sz);
+				memset(bf + sz, 0, size - sz);
+				return 0;
+			}
+		}
+		ptr += descsz;
+	}
+
+	return -1;
+}
+
 const char *perf_tip(const char *dirpath)
 {
 	struct strlist *tips;
diff --git a/tools/perf/util/util.h b/tools/perf/util/util.h
index 9496365da3d7..27106548396b 100644
--- a/tools/perf/util/util.h
+++ b/tools/perf/util/util.h
@@ -47,6 +47,10 @@ extern int cacheline_size;
 
 int fetch_kernel_version(unsigned int *puint,
 			 char *str, size_t str_sz);
+
+int parse_notes_buildid(void *note_data, size_t note_len, void *bf,
+			size_t size, bool need_swap);
+
 #define KVER_VERSION(x)		(((x) >> 16) & 0xff)
 #define KVER_PATCHLEVEL(x)	(((x) >> 8) & 0xff)
 #define KVER_SUBLEVEL(x)	((x) & 0xff)
-- 
2.13.6

^ permalink raw reply related

* [RFC 0/9] bpf: Add buildid check support
From: Jiri Olsa @ 2018-04-05 15:16 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann
  Cc: lkml, netdev, linux-kbuild, Quentin Monnet, Eugene Syromiatnikov,
	Jiri Benc, Stanislav Kozina, Jerome Marchand,
	Arnaldo Carvalho de Melo, Masahiro Yamada, Michal Marek,
	Jiri Kosina

hi,
eBPF programs loaded for kprobes are allowed to read kernel
internal structures. We check the provided kernel version
to ensure that the program is loaded for the proper kernel. 

The problem is that the version check is not enough, because
it only follows the version setup from kernel's Makefile.
However, the internal kernel structures change based on the
.config data, so in practise we have different kernels with
same version.

The eBPF kprobe program thus then get loaded for different
kernel than it's been built for, get wrong data (silently)
and provide misleading output.

This patchset implements additional check in eBPF loading code
on provided build ID (from kernel's elf image, .notes section
GNU build ID) to ensure we load the eBPF program on correct
kernel.

Also available in here (based on bpf-next/master):
  https://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git
  bpf/checksum

This patchset consists of several changes:

- adding CONFIG_BUILDID_H option that instructs the build
  to generate uapi header file with build ID data, that
  will be included by eBPF program

- adding CONFIG_BPF_BUILDID_CHECK option and new bpf_attr
  field to allow build ID checking when loading the eBPF
  program

- changing libbpf to read and pass build ID to the kernel

- several small side fixes

- example perf eBPF code in bpf-samples/bpf-stdout-example.c
  to show the build ID support/usage.

    # perf record -vv  -e ./bpf-samples/bpf-stdout-example.c kill 2>&1 | grep buildid
    libbpf: section(7) buildid, size 21, link 0, flags 3, type=1
    libbpf: kernel buildid of ./bpf-samples/bpf-stdout-example.c is: 6e25edeb408513184e2753bebad25d42314501a0

  The buildid is provided the same way we provide kernel
  version, in a special "buildid" section:

    # cat ./bpf-samples/bpf-stdout-example.c
    ...
    #include <linux/buildid.h>

    char _buildid[] SEC("buildid") = LINUX_BUILDID_DATA;
    ...

  where LINUX_BUILDID_DATA is defined in the generated buildid.h.

please note it's an RFC ;-) any comments and suggestions are welcome

thanks,
jirka


---
Jiri Olsa (9):
      perf tools: Make read_build_id function public
      perf tools: Add fetch_kernel_buildid function
      kbuild: Do not pass arguments to link-vmlinux.sh
      kbuild: Add filechk2 function
      bpf: Add CONFIG_BUILDID_H option
      bpf: Add CONFIG_BPF_BUILDID_CHECK option
      libbpf: Synchronize uapi bpf.h header
      libbpf: Add support to attach buildid to program load
      perf tools: The buildid usage in example eBPF program

 Makefile                                    | 14 +++++++++++++-
 include/uapi/linux/bpf.h                    |  2 ++
 init/Kconfig                                | 12 ++++++++++++
 kernel/bpf/syscall.c                        | 84 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 scripts/Kbuild.include                      | 24 ++++++++++++++++++++++++
 scripts/Makefile                            |  1 +
 scripts/extract-buildid.c                   | 42 ++++++++++++++++++++++++++++++++++++++++++
 tools/bpf/bpftool/Makefile                  |  5 ++++-
 tools/include/uapi/linux/bpf.h              |  3 +++
 tools/lib/bpf/bpf.c                         |  6 ++++--
 tools/lib/bpf/bpf.h                         |  5 +++--
 tools/lib/bpf/libbpf.c                      | 46 ++++++++++++++++++++++++++++++++++++++++------
 tools/perf/bpf-samples/bpf-stdout-example.c | 42 ++++++++++++++++++++++++++++++++++++++++++
 tools/perf/tests/bpf.c                      |  9 ++++++++-
 tools/perf/util/symbol-minimal.c            | 50 ++------------------------------------------------
 tools/perf/util/util.c                      | 66 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 tools/perf/util/util.h                      |  6 ++++++
 17 files changed, 355 insertions(+), 62 deletions(-)
 create mode 100644 scripts/extract-buildid.c
 create mode 100644 tools/perf/bpf-samples/bpf-stdout-example.c

^ permalink raw reply

* Re: [PATCH] net: thunderx: rework mac addresses list to u64 array
From: Christoph Hellwig @ 2018-04-05 15:07 UTC (permalink / raw)
  To: Vadim Lomovtsev
  Cc: sgoutham, sunil.kovvuri, robert.richter, linux-arm-kernel, netdev,
	linux-kernel, davem, dnelson, gustavo, Vadim Lomovtsev
In-Reply-To: <20180405145756.12633-1-Vadim.Lomovtsev@caviumnetworks.com>

>  struct xcast_addr_list {
> -	struct list_head list;
>  	int              count;
> +	u64              mc[0];

Please use the standard C99 syntax here:

	u64              mc[];

> +				mc_list = kmalloc(sizeof(*mc_list) +
> +						  sizeof(u64) * netdev_mc_count(netdev),
> +						  GFP_ATOMIC);

kmalloc_array(), please.

^ permalink raw reply

* Re: [PATCH] net: thunderx: rework mac addresses list to u64 array
From: Vadim Lomovtsev @ 2018-04-05 15:00 UTC (permalink / raw)
  To: sgoutham, sunil.kovvuri, rric, linux-arm-kernel, netdev,
	linux-kernel, davem
  Cc: dnelson, gustavo, Vadim Lomovtsev
In-Reply-To: <20180405145756.12633-1-Vadim.Lomovtsev@caviumnetworks.com>

[correct Roberts' address]

On Thu, Apr 05, 2018 at 07:57:56AM -0700, Vadim Lomovtsev wrote:
> From: Vadim Lomovtsev <Vadim.Lomovtsev@cavium.com>
> 
> It is too expensive to pass u64 values via linked list, instead
> allocate array for them by overall number of mac addresses from netdev.
> 
> This eventually removes multiple kmalloc() calls, aviod memory
> fragmentation and allow to put single null check on kmalloc
> return value in order to prevent a potential null pointer dereference.
> 
> Addresses-Coverity-ID: 1467429 ("Dereference null return value")
> Fixes: 37c3347eb247 ("net: thunderx: add ndo_set_rx_mode callback implementation for VF")
> Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@cavium.com>
> ---
>  drivers/net/ethernet/cavium/thunder/nic.h        |  7 +-----
>  drivers/net/ethernet/cavium/thunder/nicvf_main.c | 28 +++++++++---------------
>  2 files changed, 11 insertions(+), 24 deletions(-)
> 
> diff --git a/drivers/net/ethernet/cavium/thunder/nic.h b/drivers/net/ethernet/cavium/thunder/nic.h
> index 5fc46c5a4f36..da052159f014 100644
> --- a/drivers/net/ethernet/cavium/thunder/nic.h
> +++ b/drivers/net/ethernet/cavium/thunder/nic.h
> @@ -265,14 +265,9 @@ struct nicvf_drv_stats {
>  
>  struct cavium_ptp;
>  
> -struct xcast_addr {
> -	struct list_head list;
> -	u64              addr;
> -};
> -
>  struct xcast_addr_list {
> -	struct list_head list;
>  	int              count;
> +	u64              mc[0];
>  };
>  
>  struct nicvf_work {
> diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_main.c b/drivers/net/ethernet/cavium/thunder/nicvf_main.c
> index 1e9a31fef729..a26d8bc92e01 100644
> --- a/drivers/net/ethernet/cavium/thunder/nicvf_main.c
> +++ b/drivers/net/ethernet/cavium/thunder/nicvf_main.c
> @@ -1929,7 +1929,7 @@ static void nicvf_set_rx_mode_task(struct work_struct *work_arg)
>  						  work.work);
>  	struct nicvf *nic = container_of(vf_work, struct nicvf, rx_mode_work);
>  	union nic_mbx mbx = {};
> -	struct xcast_addr *xaddr, *next;
> +	u8 idx = 0;
>  
>  	if (!vf_work)
>  		return;
> @@ -1956,16 +1956,10 @@ static void nicvf_set_rx_mode_task(struct work_struct *work_arg)
>  	/* check if we have any specific MACs to be added to PF DMAC filter */
>  	if (vf_work->mc) {
>  		/* now go through kernel list of MACs and add them one by one */
> -		list_for_each_entry_safe(xaddr, next,
> -					 &vf_work->mc->list, list) {
> +		for (idx = 0; idx < vf_work->mc->count; idx++) {
>  			mbx.xcast.msg = NIC_MBOX_MSG_ADD_MCAST;
> -			mbx.xcast.data.mac = xaddr->addr;
> +			mbx.xcast.data.mac = vf_work->mc->mc[idx];
>  			nicvf_send_msg_to_pf(nic, &mbx);
> -
> -			/* after receiving ACK from PF release memory */
> -			list_del(&xaddr->list);
> -			kfree(xaddr);
> -			vf_work->mc->count--;
>  		}
>  		kfree(vf_work->mc);
>  	}
> @@ -1996,17 +1990,15 @@ static void nicvf_set_rx_mode(struct net_device *netdev)
>  			mode |= BGX_XCAST_MCAST_FILTER;
>  			/* here we need to copy mc addrs */
>  			if (netdev_mc_count(netdev)) {
> -				struct xcast_addr *xaddr;
> -
> -				mc_list = kmalloc(sizeof(*mc_list), GFP_ATOMIC);
> -				INIT_LIST_HEAD(&mc_list->list);
> +				mc_list = kmalloc(sizeof(*mc_list) +
> +						  sizeof(u64) * netdev_mc_count(netdev),
> +						  GFP_ATOMIC);
> +				if (unlikely(!mc_list))
> +					return;
> +				mc_list->count = 0;
>  				netdev_hw_addr_list_for_each(ha, &netdev->mc) {
> -					xaddr = kmalloc(sizeof(*xaddr),
> -							GFP_ATOMIC);
> -					xaddr->addr =
> +					mc_list->mc[mc_list->count] =
>  						ether_addr_to_u64(ha->addr);
> -					list_add_tail(&xaddr->list,
> -						      &mc_list->list);
>  					mc_list->count++;
>  				}
>  			}
> -- 
> 2.14.3
> 

^ permalink raw reply

* [PATCH] net: thunderx: rework mac addresses list to u64 array
From: Vadim Lomovtsev @ 2018-04-05 14:57 UTC (permalink / raw)
  To: sgoutham, sunil.kovvuri, robert.richter, linux-arm-kernel, netdev,
	linux-kernel, davem
  Cc: dnelson, gustavo, Vadim Lomovtsev

From: Vadim Lomovtsev <Vadim.Lomovtsev@cavium.com>

It is too expensive to pass u64 values via linked list, instead
allocate array for them by overall number of mac addresses from netdev.

This eventually removes multiple kmalloc() calls, aviod memory
fragmentation and allow to put single null check on kmalloc
return value in order to prevent a potential null pointer dereference.

Addresses-Coverity-ID: 1467429 ("Dereference null return value")
Fixes: 37c3347eb247 ("net: thunderx: add ndo_set_rx_mode callback implementation for VF")
Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@cavium.com>
---
 drivers/net/ethernet/cavium/thunder/nic.h        |  7 +-----
 drivers/net/ethernet/cavium/thunder/nicvf_main.c | 28 +++++++++---------------
 2 files changed, 11 insertions(+), 24 deletions(-)

diff --git a/drivers/net/ethernet/cavium/thunder/nic.h b/drivers/net/ethernet/cavium/thunder/nic.h
index 5fc46c5a4f36..da052159f014 100644
--- a/drivers/net/ethernet/cavium/thunder/nic.h
+++ b/drivers/net/ethernet/cavium/thunder/nic.h
@@ -265,14 +265,9 @@ struct nicvf_drv_stats {
 
 struct cavium_ptp;
 
-struct xcast_addr {
-	struct list_head list;
-	u64              addr;
-};
-
 struct xcast_addr_list {
-	struct list_head list;
 	int              count;
+	u64              mc[0];
 };
 
 struct nicvf_work {
diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_main.c b/drivers/net/ethernet/cavium/thunder/nicvf_main.c
index 1e9a31fef729..a26d8bc92e01 100644
--- a/drivers/net/ethernet/cavium/thunder/nicvf_main.c
+++ b/drivers/net/ethernet/cavium/thunder/nicvf_main.c
@@ -1929,7 +1929,7 @@ static void nicvf_set_rx_mode_task(struct work_struct *work_arg)
 						  work.work);
 	struct nicvf *nic = container_of(vf_work, struct nicvf, rx_mode_work);
 	union nic_mbx mbx = {};
-	struct xcast_addr *xaddr, *next;
+	u8 idx = 0;
 
 	if (!vf_work)
 		return;
@@ -1956,16 +1956,10 @@ static void nicvf_set_rx_mode_task(struct work_struct *work_arg)
 	/* check if we have any specific MACs to be added to PF DMAC filter */
 	if (vf_work->mc) {
 		/* now go through kernel list of MACs and add them one by one */
-		list_for_each_entry_safe(xaddr, next,
-					 &vf_work->mc->list, list) {
+		for (idx = 0; idx < vf_work->mc->count; idx++) {
 			mbx.xcast.msg = NIC_MBOX_MSG_ADD_MCAST;
-			mbx.xcast.data.mac = xaddr->addr;
+			mbx.xcast.data.mac = vf_work->mc->mc[idx];
 			nicvf_send_msg_to_pf(nic, &mbx);
-
-			/* after receiving ACK from PF release memory */
-			list_del(&xaddr->list);
-			kfree(xaddr);
-			vf_work->mc->count--;
 		}
 		kfree(vf_work->mc);
 	}
@@ -1996,17 +1990,15 @@ static void nicvf_set_rx_mode(struct net_device *netdev)
 			mode |= BGX_XCAST_MCAST_FILTER;
 			/* here we need to copy mc addrs */
 			if (netdev_mc_count(netdev)) {
-				struct xcast_addr *xaddr;
-
-				mc_list = kmalloc(sizeof(*mc_list), GFP_ATOMIC);
-				INIT_LIST_HEAD(&mc_list->list);
+				mc_list = kmalloc(sizeof(*mc_list) +
+						  sizeof(u64) * netdev_mc_count(netdev),
+						  GFP_ATOMIC);
+				if (unlikely(!mc_list))
+					return;
+				mc_list->count = 0;
 				netdev_hw_addr_list_for_each(ha, &netdev->mc) {
-					xaddr = kmalloc(sizeof(*xaddr),
-							GFP_ATOMIC);
-					xaddr->addr =
+					mc_list->mc[mc_list->count] =
 						ether_addr_to_u64(ha->addr);
-					list_add_tail(&xaddr->list,
-						      &mc_list->list);
 					mc_list->count++;
 				}
 			}
-- 
2.14.3

^ permalink raw reply related

* Re: [PATCH] net: phy: marvell: Enable interrupt function on LED2 pin
From: Andrew Lunn @ 2018-04-05 14:43 UTC (permalink / raw)
  To: Bhadram Varka
  Cc: Esben Haabendal, Esben Haabendal, Rasmus Villemoes,
	Florian Fainelli, open list, netdev@vger.kernel.org
In-Reply-To: <4d9e0665abf7408988fc4ce20c26a08f@bgmail102.nvidia.com>

> diff --git a/drivers/net/phy/marvell.c b/drivers/net/phy/marvell.c index 0e0978d8a0eb..f03a510f1247 100644
> --- a/drivers/net/phy/marvell.c
> +++ b/drivers/net/phy/marvell.c
> @@ -457,6 +457,21 @@ static int marvell_of_reg_init(struct phy_device *phydev)  }  #endif /* CONFIG_OF_MDIO */
>  
> +static int m88e1318_config_intr(struct phy_device *phydev) {
> +	int err;
> +
> +	err = marvell_config_intr(phydev);
> +	if (err)
> +		return err;
> +
> +	/* Setup LED[2] as interrupt pin (active low) */
> +	return phy_modify(phydev, MII_88E1318S_PHY_LED_TCR,
> +			  MII_88E1318S_PHY_LED_TCR_FORCE_INT,
> +			  MII_88E1318S_PHY_LED_TCR_INTn_ENABLE |
> +			  MII_88E1318S_PHY_LED_TCR_INT_ACTIVE_LOW);
> 
> Can we move this part of the code to m88e1121_config_init() ?
> 
> Every time whether we disable or enable the interrupts this part of code will execute.

Yes, doing this once would be better. But please allow the LED pin to
be used as an LED when not using interrupts. phy_interrupt_is_valid()
should be involved somehow.

       Andrew

^ permalink raw reply

* Re: [PATCH v3 2/4] bus: fsl-mc: add restool userspace support
From: Laurentiu Tudor @ 2018-04-05 14:43 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Stuart Yoder, Arnd Bergmann, Ioana Ciornei, gregkh,
	Linux Kernel Mailing List, Ruxandra Ioana Ciocoi Radulescu,
	Razvan Stefanescu, Roy Pledge, Networking
In-Reply-To: <20180405124810.GE12178@lunn.ch>

Hi Andrew,

On 04/05/2018 03:48 PM, Andrew Lunn wrote:
>>> Hi Laurentiu
>>>
>>> So i can use switchdev without it? I can modprobe the switchdev
>>> driver, all the physical interfaces will appear, and i can use ip addr
>>> add etc. I do not need to use a user space tool at all in order to use
>>> the network functionality?
>>
>> Absolutely!
>
> Great.
>
> Then the easiest way forwards is to simply drop the IOCTL code for the
> moment. Get the basic support for the hardware into the kernel
> first. Then come back later to look at dynamic behaviour which needs
> some form of configuration.

Hmm, not sure I understand. We already have a fully functional ethernet 
driver [1] and a switch driver [2] ...

>> In normal use cases the system designer, depending on the requirements,
>> configures the various devices that it desires through a firmware
>> configuration (think something like a device tree). The devices
>> configured are presented on the mc-bus and probed normally by the
>> kernel. The standard networking linux tools can be used as expected.
>
> So what you should probably do is start a discussion on what this
> device tree binding looks like. But you need to be careful even
> here. Device tree describes the hardware, not how you configure the
> hardware. So maybe DT does not actually fit.

It's not an actual device tree, but a configuration file that happens to 
reuse the DTS format. I guess my analogy with a device tree was not the 
best.
Detailed documentation on the syntax can be found here [3], chapter 22.

[1] 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/staging/fsl-dpaa2/ethernet
[2] 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/staging/fsl-dpaa2/ethsw
[3] https://www.nxp.com/docs/en/user-guide/DPAA2_UM.pdf

---
Best Regards, Laurentiu

^ permalink raw reply

* Re: [PATCH net-next] netns: filter uevents correctly
From: Christian Brauner @ 2018-04-05 14:41 UTC (permalink / raw)
  To: Kirill Tkhai
  Cc: Christian Brauner, ebiederm, davem, gregkh, netdev, linux-kernel,
	avagin, serge
In-Reply-To: <941de2b9-332f-75fc-f8ac-4059a9b5426f@virtuozzo.com>

On Thu, Apr 05, 2018 at 05:26:59PM +0300, Kirill Tkhai wrote:
> On 05.04.2018 17:07, Christian Brauner wrote:
> > On Thu, Apr 05, 2018 at 04:01:03PM +0300, Kirill Tkhai wrote:
> >> On 04.04.2018 22:48, Christian Brauner wrote:
> >>> commit 07e98962fa77 ("kobject: Send hotplug events in all network namespaces")
> >>>
> >>> enabled sending hotplug events into all network namespaces back in 2010.
> >>> Over time the set of uevents that get sent into all network namespaces has
> >>> shrunk. We have now reached the point where hotplug events for all devices
> >>> that carry a namespace tag are filtered according to that namespace.
> >>>
> >>> Specifically, they are filtered whenever the namespace tag of the kobject
> >>> does not match the namespace tag of the netlink socket. One example are
> >>> network devices. Uevents for network devices only show up in the network
> >>> namespaces these devices are moved to or created in.
> >>>
> >>> However, any uevent for a kobject that does not have a namespace tag
> >>> associated with it will not be filtered and we will *try* to broadcast it
> >>> into all network namespaces.
> >>>
> >>> The original patchset was written in 2010 before user namespaces were a
> >>> thing. With the introduction of user namespaces sending out uevents became
> >>> partially isolated as they were filtered by user namespaces:
> >>>
> >>> net/netlink/af_netlink.c:do_one_broadcast()
> >>>
> >>> if (!net_eq(sock_net(sk), p->net)) {
> >>>         if (!(nlk->flags & NETLINK_F_LISTEN_ALL_NSID))
> >>>                 return;
> >>>
> >>>         if (!peernet_has_id(sock_net(sk), p->net))
> >>>                 return;
> >>>
> >>>         if (!file_ns_capable(sk->sk_socket->file, p->net->user_ns,
> >>>                              CAP_NET_BROADCAST))
> >>>         j       return;
> >>> }
> >>>
> >>> The file_ns_capable() check will check whether the caller had
> >>> CAP_NET_BROADCAST at the time of opening the netlink socket in the user
> >>> namespace of interest. This check is fine in general but seems insufficient
> >>> to me when paired with uevents. The reason is that devices always belong to
> >>> the initial user namespace so uevents for kobjects that do not carry a
> >>> namespace tag should never be sent into another user namespace. This has
> >>> been the intention all along. But there's one case where this breaks,
> >>> namely if a new user namespace is created by root on the host and an
> >>> identity mapping is established between root on the host and root in the
> >>> new user namespace. Here's a reproducer:
> >>>
> >>>  sudo unshare -U --map-root
> >>>  udevadm monitor -k
> >>>  # Now change to initial user namespace and e.g. do
> >>>  modprobe kvm
> >>>  # or
> >>>  rmmod kvm
> >>>
> >>> will allow the non-initial user namespace to retrieve all uevents from the
> >>> host. This seems very anecdotal given that in the general case user
> >>> namespaces do not see any uevents and also can't really do anything useful
> >>> with them.
> >>>
> >>> Additionally, it is now possible to send uevents from userspace. As such we
> >>> can let a sufficiently privileged (CAP_SYS_ADMIN in the owning user
> >>> namespace of the network namespace of the netlink socket) userspace process
> >>> make a decision what uevents should be sent.
> >>>
> >>> This makes me think that we should simply ensure that uevents for kobjects
> >>> that do not carry a namespace tag are *always* filtered by user namespace
> >>> in kobj_bcast_filter(). Specifically:
> >>> - If the owning user namespace of the uevent socket is not init_user_ns the
> >>>   event will always be filtered.
> >>> - If the network namespace the uevent socket belongs to was created in the
> >>>   initial user namespace but was opened from a non-initial user namespace
> >>>   the event will be filtered as well.
> >>> Put another way, uevents for kobjects not carrying a namespace tag are now
> >>> always only sent to the initial user namespace. The regression potential
> >>> for this is near to non-existent since user namespaces can't really do
> >>> anything with interesting devices.
> >>>
> >>> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
> >>> ---
> >>>  lib/kobject_uevent.c | 10 +++++++++-
> >>>  1 file changed, 9 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/lib/kobject_uevent.c b/lib/kobject_uevent.c
> >>> index 15ea216a67ce..cb98cddb6e3b 100644
> >>> --- a/lib/kobject_uevent.c
> >>> +++ b/lib/kobject_uevent.c
> >>> @@ -251,7 +251,15 @@ static int kobj_bcast_filter(struct sock *dsk, struct sk_buff *skb, void *data)
> >>>  		return sock_ns != ns;
> >>>  	}
> >>>  
> >>> -	return 0;
> >>> +	/*
> >>> +	 * The kobject does not carry a namespace tag so filter by user
> >>> +	 * namespace below.
> >>> +	 */
> >>> +	if (sock_net(dsk)->user_ns != &init_user_ns)
> >>> +		return 1;
> >>> +
> >>> +	/* Check if socket was opened from non-initial user namespace. */
> >>> +	return sk_user_ns(dsk) != &init_user_ns;
> >>>  }
> >>>  #endif
> >>
> >> So, this prohibits to listen events of all devices except network-related
> >> in containers? If it's so, I don't think it's a good solution. Uevents is not
> > 
> > No, this is not correct: As it is right now *without my patch* no
> > non-initial user namespace is receiving *any uevents* but those
> > specifically namespaced such as those for network devices. This patch
> > doesn't change that at all. The commit message outlines this in detail
> > how this comes about.
> > There is only one case where this currently breaks and this is as I
> > outlined explicitly in my commit message when you create a new user
> > namespace and map container(0) -> host(0). This patch fixes this.
> 
> Could you please point the place, where non-initial user namespaces are filtered?
> I only see the kobj_bcast_filter() logic, and it used to return 0, which means "accepted".
> Now it will return 1 sometimes.

Oh sure, it's in the commit message though. The callchain is
lib/kobject_uevent.c:kobject_uevent_net_broadcast() ->
nnet/netlink/af_netlink.c:netlink_broadcast_filtered() ->
net/netlink/af_netlink.c:do_one_broadcast():

This codepiece will check whether the openened socket holds
CAP_NET_BROADCAST in the user namespace of the target network namespace
which it won't because we don't have device namespaces and all devices
belong to the initial set of namespaces.

        if (!file_ns_capable(sk->sk_socket->file, p->net->user_ns,
                             CAP_NET_BROADCAST))
        j       return;

The only exception are network devices but they will pass the preceeding
if (!net_eq(sock_net(sk), p->net)) for any netlink uevent socket opened
in the same network namespace.

> 
> >> net-devices-only related interface and it's used for all devices in system.
> >> People may want to delegate block devices to nested user_ns, for example.
> > 
> > That's fine but that's why I added uevent injection in a previous patch
> > series: I repeat no non-initial user namespace will by default receive
> > uevents.
> 
> Thanks,
> Kirill

^ permalink raw reply

* Re: [PATCH net] arp: fix arp_filter on l3slave devices
From: David Ahern @ 2018-04-05 14:40 UTC (permalink / raw)
  To: Miguel Fadon Perlines, netdev, David Miller
In-Reply-To: <1522916738-192046-1-git-send-email-mfadon@teldat.com>

On 4/5/18 2:25 AM, Miguel Fadon Perlines wrote:
> arp_filter performs an ip_route_output search for arp source address and
> checks if output device is the same where the arp request was received,
> if it is not, the arp request is not answered.
> 
> This route lookup is always done on main route table so l3slave devices
> never find the proper route and arp is not answered.
> 
> Passing l3mdev_master_ifindex_rcu(dev) return value as oif fixes the
> lookup for l3slave devices while maintaining same behavior for non
> l3slave devices as this function returns 0 in that case.
> 
> Signed-off-by: Miguel Fadon Perlines <mfadon@teldat.com>
> ---
>  net/ipv4/arp.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 

Acked-by: David Ahern <dsa@cumulusnetworks.com>

DaveM: this is a day 1 bug for VRF. Best guess at a Fixes tag would be:

Fixes: 613d09b30f8b ("net: Use VRF device index for lookups on TX")

It would be good to get this into stable releases 4.9 and up. Thanks,

^ permalink raw reply

* Re: [PATCH net-next] netns: filter uevents correctly
From: Kirill Tkhai @ 2018-04-05 14:26 UTC (permalink / raw)
  To: Christian Brauner
  Cc: ebiederm, davem, gregkh, netdev, linux-kernel, avagin, serge
In-Reply-To: <20180405140709.GA1697@gmail.com>

On 05.04.2018 17:07, Christian Brauner wrote:
> On Thu, Apr 05, 2018 at 04:01:03PM +0300, Kirill Tkhai wrote:
>> On 04.04.2018 22:48, Christian Brauner wrote:
>>> commit 07e98962fa77 ("kobject: Send hotplug events in all network namespaces")
>>>
>>> enabled sending hotplug events into all network namespaces back in 2010.
>>> Over time the set of uevents that get sent into all network namespaces has
>>> shrunk. We have now reached the point where hotplug events for all devices
>>> that carry a namespace tag are filtered according to that namespace.
>>>
>>> Specifically, they are filtered whenever the namespace tag of the kobject
>>> does not match the namespace tag of the netlink socket. One example are
>>> network devices. Uevents for network devices only show up in the network
>>> namespaces these devices are moved to or created in.
>>>
>>> However, any uevent for a kobject that does not have a namespace tag
>>> associated with it will not be filtered and we will *try* to broadcast it
>>> into all network namespaces.
>>>
>>> The original patchset was written in 2010 before user namespaces were a
>>> thing. With the introduction of user namespaces sending out uevents became
>>> partially isolated as they were filtered by user namespaces:
>>>
>>> net/netlink/af_netlink.c:do_one_broadcast()
>>>
>>> if (!net_eq(sock_net(sk), p->net)) {
>>>         if (!(nlk->flags & NETLINK_F_LISTEN_ALL_NSID))
>>>                 return;
>>>
>>>         if (!peernet_has_id(sock_net(sk), p->net))
>>>                 return;
>>>
>>>         if (!file_ns_capable(sk->sk_socket->file, p->net->user_ns,
>>>                              CAP_NET_BROADCAST))
>>>         j       return;
>>> }
>>>
>>> The file_ns_capable() check will check whether the caller had
>>> CAP_NET_BROADCAST at the time of opening the netlink socket in the user
>>> namespace of interest. This check is fine in general but seems insufficient
>>> to me when paired with uevents. The reason is that devices always belong to
>>> the initial user namespace so uevents for kobjects that do not carry a
>>> namespace tag should never be sent into another user namespace. This has
>>> been the intention all along. But there's one case where this breaks,
>>> namely if a new user namespace is created by root on the host and an
>>> identity mapping is established between root on the host and root in the
>>> new user namespace. Here's a reproducer:
>>>
>>>  sudo unshare -U --map-root
>>>  udevadm monitor -k
>>>  # Now change to initial user namespace and e.g. do
>>>  modprobe kvm
>>>  # or
>>>  rmmod kvm
>>>
>>> will allow the non-initial user namespace to retrieve all uevents from the
>>> host. This seems very anecdotal given that in the general case user
>>> namespaces do not see any uevents and also can't really do anything useful
>>> with them.
>>>
>>> Additionally, it is now possible to send uevents from userspace. As such we
>>> can let a sufficiently privileged (CAP_SYS_ADMIN in the owning user
>>> namespace of the network namespace of the netlink socket) userspace process
>>> make a decision what uevents should be sent.
>>>
>>> This makes me think that we should simply ensure that uevents for kobjects
>>> that do not carry a namespace tag are *always* filtered by user namespace
>>> in kobj_bcast_filter(). Specifically:
>>> - If the owning user namespace of the uevent socket is not init_user_ns the
>>>   event will always be filtered.
>>> - If the network namespace the uevent socket belongs to was created in the
>>>   initial user namespace but was opened from a non-initial user namespace
>>>   the event will be filtered as well.
>>> Put another way, uevents for kobjects not carrying a namespace tag are now
>>> always only sent to the initial user namespace. The regression potential
>>> for this is near to non-existent since user namespaces can't really do
>>> anything with interesting devices.
>>>
>>> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
>>> ---
>>>  lib/kobject_uevent.c | 10 +++++++++-
>>>  1 file changed, 9 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/lib/kobject_uevent.c b/lib/kobject_uevent.c
>>> index 15ea216a67ce..cb98cddb6e3b 100644
>>> --- a/lib/kobject_uevent.c
>>> +++ b/lib/kobject_uevent.c
>>> @@ -251,7 +251,15 @@ static int kobj_bcast_filter(struct sock *dsk, struct sk_buff *skb, void *data)
>>>  		return sock_ns != ns;
>>>  	}
>>>  
>>> -	return 0;
>>> +	/*
>>> +	 * The kobject does not carry a namespace tag so filter by user
>>> +	 * namespace below.
>>> +	 */
>>> +	if (sock_net(dsk)->user_ns != &init_user_ns)
>>> +		return 1;
>>> +
>>> +	/* Check if socket was opened from non-initial user namespace. */
>>> +	return sk_user_ns(dsk) != &init_user_ns;
>>>  }
>>>  #endif
>>
>> So, this prohibits to listen events of all devices except network-related
>> in containers? If it's so, I don't think it's a good solution. Uevents is not
> 
> No, this is not correct: As it is right now *without my patch* no
> non-initial user namespace is receiving *any uevents* but those
> specifically namespaced such as those for network devices. This patch
> doesn't change that at all. The commit message outlines this in detail
> how this comes about.
> There is only one case where this currently breaks and this is as I
> outlined explicitly in my commit message when you create a new user
> namespace and map container(0) -> host(0). This patch fixes this.

Could you please point the place, where non-initial user namespaces are filtered?
I only see the kobj_bcast_filter() logic, and it used to return 0, which means "accepted".
Now it will return 1 sometimes.

>> net-devices-only related interface and it's used for all devices in system.
>> People may want to delegate block devices to nested user_ns, for example.
> 
> That's fine but that's why I added uevent injection in a previous patch
> series: I repeat no non-initial user namespace will by default receive
> uevents.

Thanks,
Kirill

^ permalink raw reply

* Re: [PATCH v3 2/4] bus: fsl-mc: add restool userspace support
From: gregkh @ 2018-04-05 14:19 UTC (permalink / raw)
  To: Laurentiu Tudor
  Cc: Andrew Lunn, Stuart Yoder, Arnd Bergmann, Ioana Ciornei,
	Linux Kernel Mailing List, Ruxandra Ioana Ciocoi Radulescu,
	Razvan Stefanescu, Roy Pledge, Networking
In-Reply-To: <5AC62E2A.90900@nxp.com>

On Thu, Apr 05, 2018 at 02:09:47PM +0000, Laurentiu Tudor wrote:
> Hi Greg,
> 
> On 04/05/2018 03:30 PM, gregkh wrote:
> > On Thu, Apr 05, 2018 at 10:30:01AM +0000, Laurentiu Tudor wrote:
> >> Hello,
> >>
> >> My 2c below.
> >>
> >> On 04/04/2018 03:42 PM, Andrew Lunn wrote:
> >>>> I hear you.  It is more complicated this way...having all these individual
> >>>> objects vs just a single "bundle" of them that represents a NIC.  But, that's
> >>>> the way the DPAA2 hardware is, and we're implementing kernel support for
> >>>> the hardware as it is.
> >>>
> >>> Hi Stuart
> >>>
> >>> I see we are not making any progress here.
> >>>
> >>> So what i suggest is you post the kernel code and configuration tool
> >>> concept to netdev for a full review. You want reviews from David
> >>> Miller, Jiri Pirko, Jakub Kicinski, David Ahern, etc.
> >>>
> >>
> >> I think that the discussion steered too much towards networking related
> >> topics, while this ioctl doesn't have much to do with networking.
> >> It's just an ioctl for our mc-bus bus driver that is used to manage the
> >> devices on this bus through userspace tools.
> >> In addition, I'd drop any mention of our reference user space app
> >> (restool) to emphasize that this ioctl is not added just for a
> >> particular user space app. I think Stuart also mentioned this.
> >
> > I'm not going to take a "generic device configuration ioctl" patch
> > unless it is documented to all exactly what it does, and why it is
> > there.
> 
> The ioctl() is just a simple pass-through interface to the firmware.

Ah, so a new syscall?  :)

> It passes commands to the firmware and returns the response back to the 
> userspace. Thus the ABI used by the firmware applies for this ioctl() 
> and it is documented in detail here:
> 
> https://www.nxp.com/docs/en/user-guide/DPAA2_UM.pdf

Let's wait on this until people all agree that it's ok to expose this
directly.

thanks,

greg k-h

^ permalink raw reply

* Re: [PATCH v3 2/4] bus: fsl-mc: add restool userspace support
From: Laurentiu Tudor @ 2018-04-05 14:09 UTC (permalink / raw)
  To: gregkh
  Cc: Andrew Lunn, Stuart Yoder, Arnd Bergmann, Ioana Ciornei,
	Linux Kernel Mailing List, Ruxandra Ioana Ciocoi Radulescu,
	Razvan Stefanescu, Roy Pledge, Networking
In-Reply-To: <20180405123018.GA17751@kroah.com>

Hi Greg,

On 04/05/2018 03:30 PM, gregkh wrote:
> On Thu, Apr 05, 2018 at 10:30:01AM +0000, Laurentiu Tudor wrote:
>> Hello,
>>
>> My 2c below.
>>
>> On 04/04/2018 03:42 PM, Andrew Lunn wrote:
>>>> I hear you.  It is more complicated this way...having all these individual
>>>> objects vs just a single "bundle" of them that represents a NIC.  But, that's
>>>> the way the DPAA2 hardware is, and we're implementing kernel support for
>>>> the hardware as it is.
>>>
>>> Hi Stuart
>>>
>>> I see we are not making any progress here.
>>>
>>> So what i suggest is you post the kernel code and configuration tool
>>> concept to netdev for a full review. You want reviews from David
>>> Miller, Jiri Pirko, Jakub Kicinski, David Ahern, etc.
>>>
>>
>> I think that the discussion steered too much towards networking related
>> topics, while this ioctl doesn't have much to do with networking.
>> It's just an ioctl for our mc-bus bus driver that is used to manage the
>> devices on this bus through userspace tools.
>> In addition, I'd drop any mention of our reference user space app
>> (restool) to emphasize that this ioctl is not added just for a
>> particular user space app. I think Stuart also mentioned this.
>
> I'm not going to take a "generic device configuration ioctl" patch
> unless it is documented to all exactly what it does, and why it is
> there.

The ioctl() is just a simple pass-through interface to the firmware.
It passes commands to the firmware and returns the response back to the 
userspace. Thus the ABI used by the firmware applies for this ioctl() 
and it is documented in detail here:

https://www.nxp.com/docs/en/user-guide/DPAA2_UM.pdf

---
Best Regards, Laurentiu

^ permalink raw reply

* Re: [PATCH net-next] netns: filter uevents correctly
From: Christian Brauner @ 2018-04-05 14:07 UTC (permalink / raw)
  To: Kirill Tkhai; +Cc: ebiederm, davem, gregkh, netdev, linux-kernel, avagin, serge
In-Reply-To: <442e89b8-e947-6eeb-1bcb-fa28f22a25f0@virtuozzo.com>

On Thu, Apr 05, 2018 at 04:01:03PM +0300, Kirill Tkhai wrote:
> On 04.04.2018 22:48, Christian Brauner wrote:
> > commit 07e98962fa77 ("kobject: Send hotplug events in all network namespaces")
> > 
> > enabled sending hotplug events into all network namespaces back in 2010.
> > Over time the set of uevents that get sent into all network namespaces has
> > shrunk. We have now reached the point where hotplug events for all devices
> > that carry a namespace tag are filtered according to that namespace.
> > 
> > Specifically, they are filtered whenever the namespace tag of the kobject
> > does not match the namespace tag of the netlink socket. One example are
> > network devices. Uevents for network devices only show up in the network
> > namespaces these devices are moved to or created in.
> > 
> > However, any uevent for a kobject that does not have a namespace tag
> > associated with it will not be filtered and we will *try* to broadcast it
> > into all network namespaces.
> > 
> > The original patchset was written in 2010 before user namespaces were a
> > thing. With the introduction of user namespaces sending out uevents became
> > partially isolated as they were filtered by user namespaces:
> > 
> > net/netlink/af_netlink.c:do_one_broadcast()
> > 
> > if (!net_eq(sock_net(sk), p->net)) {
> >         if (!(nlk->flags & NETLINK_F_LISTEN_ALL_NSID))
> >                 return;
> > 
> >         if (!peernet_has_id(sock_net(sk), p->net))
> >                 return;
> > 
> >         if (!file_ns_capable(sk->sk_socket->file, p->net->user_ns,
> >                              CAP_NET_BROADCAST))
> >         j       return;
> > }
> > 
> > The file_ns_capable() check will check whether the caller had
> > CAP_NET_BROADCAST at the time of opening the netlink socket in the user
> > namespace of interest. This check is fine in general but seems insufficient
> > to me when paired with uevents. The reason is that devices always belong to
> > the initial user namespace so uevents for kobjects that do not carry a
> > namespace tag should never be sent into another user namespace. This has
> > been the intention all along. But there's one case where this breaks,
> > namely if a new user namespace is created by root on the host and an
> > identity mapping is established between root on the host and root in the
> > new user namespace. Here's a reproducer:
> > 
> >  sudo unshare -U --map-root
> >  udevadm monitor -k
> >  # Now change to initial user namespace and e.g. do
> >  modprobe kvm
> >  # or
> >  rmmod kvm
> > 
> > will allow the non-initial user namespace to retrieve all uevents from the
> > host. This seems very anecdotal given that in the general case user
> > namespaces do not see any uevents and also can't really do anything useful
> > with them.
> > 
> > Additionally, it is now possible to send uevents from userspace. As such we
> > can let a sufficiently privileged (CAP_SYS_ADMIN in the owning user
> > namespace of the network namespace of the netlink socket) userspace process
> > make a decision what uevents should be sent.
> > 
> > This makes me think that we should simply ensure that uevents for kobjects
> > that do not carry a namespace tag are *always* filtered by user namespace
> > in kobj_bcast_filter(). Specifically:
> > - If the owning user namespace of the uevent socket is not init_user_ns the
> >   event will always be filtered.
> > - If the network namespace the uevent socket belongs to was created in the
> >   initial user namespace but was opened from a non-initial user namespace
> >   the event will be filtered as well.
> > Put another way, uevents for kobjects not carrying a namespace tag are now
> > always only sent to the initial user namespace. The regression potential
> > for this is near to non-existent since user namespaces can't really do
> > anything with interesting devices.
> > 
> > Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
> > ---
> >  lib/kobject_uevent.c | 10 +++++++++-
> >  1 file changed, 9 insertions(+), 1 deletion(-)
> > 
> > diff --git a/lib/kobject_uevent.c b/lib/kobject_uevent.c
> > index 15ea216a67ce..cb98cddb6e3b 100644
> > --- a/lib/kobject_uevent.c
> > +++ b/lib/kobject_uevent.c
> > @@ -251,7 +251,15 @@ static int kobj_bcast_filter(struct sock *dsk, struct sk_buff *skb, void *data)
> >  		return sock_ns != ns;
> >  	}
> >  
> > -	return 0;
> > +	/*
> > +	 * The kobject does not carry a namespace tag so filter by user
> > +	 * namespace below.
> > +	 */
> > +	if (sock_net(dsk)->user_ns != &init_user_ns)
> > +		return 1;
> > +
> > +	/* Check if socket was opened from non-initial user namespace. */
> > +	return sk_user_ns(dsk) != &init_user_ns;
> >  }
> >  #endif
> 
> So, this prohibits to listen events of all devices except network-related
> in containers? If it's so, I don't think it's a good solution. Uevents is not

No, this is not correct: As it is right now *without my patch* no
non-initial user namespace is receiving *any uevents* but those
specifically namespaced such as those for network devices. This patch
doesn't change that at all. The commit message outlines this in detail
how this comes about.
There is only one case where this currently breaks and this is as I
outlined explicitly in my commit message when you create a new user
namespace and map container(0) -> host(0). This patch fixes this.

> net-devices-only related interface and it's used for all devices in system.
> People may want to delegate block devices to nested user_ns, for example.

That's fine but that's why I added uevent injection in a previous patch
series: I repeat no non-initial user namespace will by default receive
uevents.

Thanks!
Christian

^ permalink raw reply

* Re: [PATCH 00/12] Ethernet: Add and use ether_<type>_addr globals
From: Felix Fietkau @ 2018-04-05 14:05 UTC (permalink / raw)
  To: Joe Perches, netdev, linux-wireless, b43-dev, bridge,
	netfilter-devel, coreteam
  Cc: linux-kernel, brcm80211-dev-list.pdl, brcm80211-dev-list
In-Reply-To: <1522936292.11185.15.camel@perches.com>

On 2018-04-05 15:51, Joe Perches wrote:
>>  You have to factor in
>> not just the .text size, but the fact that referencing an exported
>> symbol needs a .reloc entry as well, which also eats up some space (at
>> least when the code is being built as module).
> 
> Thanks, the modules I built got smaller.
Please post some numbers to show this. By the way, on other
architectures the numbers will probably be different, especially on
ARM/MIPS.
>> In my opinion, your series probably causes more bloat in common
>> configurations instead of reducing it.
>> 
>> You're also touching several places that could easily use
>> eth_broadcast_addr and eth_zero_addr. I think making those changes would
>> be more productive than what you did in this series.
> 
> Doubtful. AFAIK: possible unaligned addresses.
Those two are just memset calls, alignment does not matter.

- Felix

^ permalink raw reply

* RE: [PATCH] net: phy: marvell: Enable interrupt function on LED2 pin
From: Bhadram Varka @ 2018-04-05 14:00 UTC (permalink / raw)
  To: Esben Haabendal
  Cc: Esben Haabendal, Rasmus Villemoes, Andrew Lunn, Florian Fainelli,
	open list, netdev@vger.kernel.org
In-Reply-To: <20180405133504.12257-1-esben.haabendal@gmail.com>

Hi Esben,

-----Original Message-----
From: netdev-owner@vger.kernel.org <netdev-owner@vger.kernel.org> On Behalf Of Esben Haabendal
Sent: Thursday, April 05, 2018 7:05 PM
To: netdev@vger.kernel.org
Cc: Esben Haabendal <eha@deif.com>; Rasmus Villemoes <rasmus.villemoes@prevas.dk>; Andrew Lunn <andrew@lunn.ch>; Florian Fainelli <f.fainelli@gmail.com>; open list <linux-kernel@vger.kernel.org>
Subject: [PATCH] net: phy: marvell: Enable interrupt function on LED2 pin

From: Esben Haabendal <eha@deif.com>

The LED2[2]/INTn pin on Marvell 88E1318S as well as 88E1510/12/14/18 needs to be configured to be usable as interrupt not only when WOL is enabled, but whenever we rely on interrupts from the PHY.

Signed-off-by: Esben Haabendal <eha@deif.com>
Cc: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
---
 drivers/net/phy/marvell.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/drivers/net/phy/marvell.c b/drivers/net/phy/marvell.c index 0e0978d8a0eb..f03a510f1247 100644
--- a/drivers/net/phy/marvell.c
+++ b/drivers/net/phy/marvell.c
@@ -457,6 +457,21 @@ static int marvell_of_reg_init(struct phy_device *phydev)  }  #endif /* CONFIG_OF_MDIO */
 
+static int m88e1318_config_intr(struct phy_device *phydev) {
+	int err;
+
+	err = marvell_config_intr(phydev);
+	if (err)
+		return err;
+
+	/* Setup LED[2] as interrupt pin (active low) */
+	return phy_modify(phydev, MII_88E1318S_PHY_LED_TCR,
+			  MII_88E1318S_PHY_LED_TCR_FORCE_INT,
+			  MII_88E1318S_PHY_LED_TCR_INTn_ENABLE |
+			  MII_88E1318S_PHY_LED_TCR_INT_ACTIVE_LOW);

Can we move this part of the code to m88e1121_config_init() ?

Every time whether we disable or enable the interrupts this part of code will execute.

Thanks!

^ permalink raw reply

* Re: [PATCH 00/12] Ethernet: Add and use ether_<type>_addr globals
From: Joe Perches @ 2018-04-05 13:51 UTC (permalink / raw)
  To: Felix Fietkau, netdev, linux-wireless, b43-dev, bridge,
	netfilter-devel, coreteam
  Cc: linux-kernel, brcm80211-dev-list.pdl, brcm80211-dev-list
In-Reply-To: <e0ab9616-64fa-cdbc-7a77-7e25d70c1a52@nbd.name>

On Thu, 2018-04-05 at 15:27 +0200, Felix Fietkau wrote:
> On 2018-03-31 09:05, Joe Perches wrote:
> > There are many local static and non-static arrays that are used for
> > Ethernet broadcast address output or comparison.
> > 
> > Centralize the array into a single separate file and remove the local
> > arrays.
> 
> I suspect that for many targets and configurations, the local arrays
> might actually be smaller than exporting a global.

I tried x86-64 allnoconfig and defconfig.
Those both did not increase vmlinux size.

The defconfig actually got smaller, but that might have been
some object alignment oddity.

>  You have to factor in
> not just the .text size, but the fact that referencing an exported
> symbol needs a .reloc entry as well, which also eats up some space (at
> least when the code is being built as module).

Thanks, the modules I built got smaller.

> In my opinion, your series probably causes more bloat in common
> configurations instead of reducing it.
> 
> You're also touching several places that could easily use
> eth_broadcast_addr and eth_zero_addr. I think making those changes would
> be more productive than what you did in this series.

Doubtful. AFAIK: possible unaligned addresses.

^ permalink raw reply

* [PATCH net 6/6] vti6: better validate user provided tunnel names
From: Eric Dumazet @ 2018-04-05 13:39 UTC (permalink / raw)
  To: David S . Miller; +Cc: netdev, Steffen Klassert, Eric Dumazet, Eric Dumazet
In-Reply-To: <20180405133931.207634-1-edumazet@google.com>

Use valid_name() to make sure user does not provide illegal
device name.

Fixes: ed1efb2aefbb ("ipv6: Add support for IPsec virtual tunnel interfaces")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/ipv6/ip6_vti.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/net/ipv6/ip6_vti.c b/net/ipv6/ip6_vti.c
index 6ebb2e8777f42054ca5ee8338aa560f5501d8262..c214ffec02f06f6dccfb9769fc8640e5e56da618 100644
--- a/net/ipv6/ip6_vti.c
+++ b/net/ipv6/ip6_vti.c
@@ -212,10 +212,13 @@ static struct ip6_tnl *vti6_tnl_create(struct net *net, struct __ip6_tnl_parm *p
 	char name[IFNAMSIZ];
 	int err;
 
-	if (p->name[0])
+	if (p->name[0]) {
+		if (!dev_valid_name(p->name))
+			goto failed;
 		strlcpy(name, p->name, IFNAMSIZ);
-	else
+	} else {
 		sprintf(name, "ip6_vti%%d");
+	}
 
 	dev = alloc_netdev(sizeof(*t), name, NET_NAME_UNKNOWN, vti6_dev_setup);
 	if (!dev)
-- 
2.17.0.484.g0c8726318c-goog

^ permalink raw reply related

* [PATCH net 5/6] ip6_tunnel: better validate user provided tunnel names
From: Eric Dumazet @ 2018-04-05 13:39 UTC (permalink / raw)
  To: David S . Miller; +Cc: netdev, Steffen Klassert, Eric Dumazet, Eric Dumazet
In-Reply-To: <20180405133931.207634-1-edumazet@google.com>

Use valid_name() to make sure user does not provide illegal
device name.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv6/ip6_tunnel.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index df4c29f7d59f030729b1158b809a10f4115d4bbf..da66aaac51cecbf933827c8842e61a8cbb4d274f 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -297,13 +297,16 @@ static struct ip6_tnl *ip6_tnl_create(struct net *net, struct __ip6_tnl_parm *p)
 	struct net_device *dev;
 	struct ip6_tnl *t;
 	char name[IFNAMSIZ];
-	int err = -ENOMEM;
+	int err = -E2BIG;
 
-	if (p->name[0])
+	if (p->name[0]) {
+		if (!dev_valid_name(p->name))
+			goto failed;
 		strlcpy(name, p->name, IFNAMSIZ);
-	else
+	} else {
 		sprintf(name, "ip6tnl%%d");
-
+	}
+	err = -ENOMEM;
 	dev = alloc_netdev(sizeof(*t), name, NET_NAME_UNKNOWN,
 			   ip6_tnl_dev_setup);
 	if (!dev)
-- 
2.17.0.484.g0c8726318c-goog

^ permalink raw reply related

* [PATCH net 4/6] ip6_gre: better validate user provided tunnel names
From: Eric Dumazet @ 2018-04-05 13:39 UTC (permalink / raw)
  To: David S . Miller; +Cc: netdev, Steffen Klassert, Eric Dumazet, Eric Dumazet
In-Reply-To: <20180405133931.207634-1-edumazet@google.com>

Use dev_valid_name() to make sure user does not provide illegal
device name.

syzbot caught the following bug :

BUG: KASAN: stack-out-of-bounds in strlcpy include/linux/string.h:300 [inline]
BUG: KASAN: stack-out-of-bounds in ip6gre_tunnel_locate+0x334/0x860 net/ipv6/ip6_gre.c:339
Write of size 20 at addr ffff8801afb9f7b8 by task syzkaller851048/4466

CPU: 1 PID: 4466 Comm: syzkaller851048 Not tainted 4.16.0+ #1
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:17 [inline]
 dump_stack+0x1b9/0x29f lib/dump_stack.c:53
 print_address_description+0x6c/0x20b mm/kasan/report.c:256
 kasan_report_error mm/kasan/report.c:354 [inline]
 kasan_report.cold.7+0xac/0x2f5 mm/kasan/report.c:412
 check_memory_region_inline mm/kasan/kasan.c:260 [inline]
 check_memory_region+0x13e/0x1b0 mm/kasan/kasan.c:267
 memcpy+0x37/0x50 mm/kasan/kasan.c:303
 strlcpy include/linux/string.h:300 [inline]
 ip6gre_tunnel_locate+0x334/0x860 net/ipv6/ip6_gre.c:339
 ip6gre_tunnel_ioctl+0x69d/0x12e0 net/ipv6/ip6_gre.c:1195
 dev_ifsioc+0x43e/0xb90 net/core/dev_ioctl.c:334
 dev_ioctl+0x69a/0xcc0 net/core/dev_ioctl.c:525
 sock_ioctl+0x47e/0x680 net/socket.c:1015
 vfs_ioctl fs/ioctl.c:46 [inline]
 file_ioctl fs/ioctl.c:500 [inline]
 do_vfs_ioctl+0x1cf/0x1650 fs/ioctl.c:684
 ksys_ioctl+0xa9/0xd0 fs/ioctl.c:701
 SYSC_ioctl fs/ioctl.c:708 [inline]
 SyS_ioctl+0x24/0x30 fs/ioctl.c:706
 do_syscall_64+0x29e/0x9d0 arch/x86/entry/common.c:287
 entry_SYSCALL_64_after_hwframe+0x42/0xb7

Fixes: c12b395a4664 ("gre: Support GRE over IPv6")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
---
 net/ipv6/ip6_gre.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
index f8a103bdbd603be103bf6c18ed6a55703aab18df..69727bc168cb027009dac95431e40b71291697da 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -335,11 +335,13 @@ static struct ip6_tnl *ip6gre_tunnel_locate(struct net *net,
 	if (t || !create)
 		return t;
 
-	if (parms->name[0])
+	if (parms->name[0]) {
+		if (!dev_valid_name(parms->name))
+			return NULL;
 		strlcpy(name, parms->name, IFNAMSIZ);
-	else
+	} else {
 		strcpy(name, "ip6gre%d");
-
+	}
 	dev = alloc_netdev(sizeof(*t), name, NET_NAME_UNKNOWN,
 			   ip6gre_tunnel_setup);
 	if (!dev)
-- 
2.17.0.484.g0c8726318c-goog

^ permalink raw reply related

* [PATCH net 3/6] ipv6: sit: better validate user provided tunnel names
From: Eric Dumazet @ 2018-04-05 13:39 UTC (permalink / raw)
  To: David S . Miller; +Cc: netdev, Steffen Klassert, Eric Dumazet, Eric Dumazet
In-Reply-To: <20180405133931.207634-1-edumazet@google.com>

Use dev_valid_name() to make sure user does not provide illegal
device name.

syzbot caught the following bug :

BUG: KASAN: stack-out-of-bounds in strlcpy include/linux/string.h:300 [inline]
BUG: KASAN: stack-out-of-bounds in ipip6_tunnel_locate+0x63b/0xaa0 net/ipv6/sit.c:254
Write of size 33 at addr ffff8801b64076d8 by task syzkaller932654/4453

CPU: 0 PID: 4453 Comm: syzkaller932654 Not tainted 4.16.0+ #1
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:17 [inline]
 dump_stack+0x1b9/0x29f lib/dump_stack.c:53
 print_address_description+0x6c/0x20b mm/kasan/report.c:256
 kasan_report_error mm/kasan/report.c:354 [inline]
 kasan_report.cold.7+0xac/0x2f5 mm/kasan/report.c:412
 check_memory_region_inline mm/kasan/kasan.c:260 [inline]
 check_memory_region+0x13e/0x1b0 mm/kasan/kasan.c:267
 memcpy+0x37/0x50 mm/kasan/kasan.c:303
 strlcpy include/linux/string.h:300 [inline]
 ipip6_tunnel_locate+0x63b/0xaa0 net/ipv6/sit.c:254
 ipip6_tunnel_ioctl+0xe71/0x241b net/ipv6/sit.c:1221
 dev_ifsioc+0x43e/0xb90 net/core/dev_ioctl.c:334
 dev_ioctl+0x69a/0xcc0 net/core/dev_ioctl.c:525
 sock_ioctl+0x47e/0x680 net/socket.c:1015
 vfs_ioctl fs/ioctl.c:46 [inline]
 file_ioctl fs/ioctl.c:500 [inline]
 do_vfs_ioctl+0x1cf/0x1650 fs/ioctl.c:684
 ksys_ioctl+0xa9/0xd0 fs/ioctl.c:701
 SYSC_ioctl fs/ioctl.c:708 [inline]
 SyS_ioctl+0x24/0x30 fs/ioctl.c:706
 do_syscall_64+0x29e/0x9d0 arch/x86/entry/common.c:287
 entry_SYSCALL_64_after_hwframe+0x42/0xb7

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
---
 net/ipv6/sit.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
index 1522bcfd253fcc0a01a4daa0ebcfb8bf154ab5dc..2afce37a71776f5ebd44d20e2d064909cb91f8dd 100644
--- a/net/ipv6/sit.c
+++ b/net/ipv6/sit.c
@@ -250,11 +250,13 @@ static struct ip_tunnel *ipip6_tunnel_locate(struct net *net,
 	if (!create)
 		goto failed;
 
-	if (parms->name[0])
+	if (parms->name[0]) {
+		if (!dev_valid_name(parms->name))
+			goto failed;
 		strlcpy(name, parms->name, IFNAMSIZ);
-	else
+	} else {
 		strcpy(name, "sit%d");
-
+	}
 	dev = alloc_netdev(sizeof(*t), name, NET_NAME_UNKNOWN,
 			   ipip6_tunnel_setup);
 	if (!dev)
-- 
2.17.0.484.g0c8726318c-goog

^ permalink raw reply related

* [PATCH net 2/6] ip_tunnel: better validate user provided tunnel names
From: Eric Dumazet @ 2018-04-05 13:39 UTC (permalink / raw)
  To: David S . Miller; +Cc: netdev, Steffen Klassert, Eric Dumazet, Eric Dumazet
In-Reply-To: <20180405133931.207634-1-edumazet@google.com>

Use dev_valid_name() to make sure user does not provide illegal
device name.

syzbot caught the following bug :

BUG: KASAN: stack-out-of-bounds in strlcpy include/linux/string.h:300 [inline]
BUG: KASAN: stack-out-of-bounds in __ip_tunnel_create+0xca/0x6b0 net/ipv4/ip_tunnel.c:257
Write of size 20 at addr ffff8801ac79f810 by task syzkaller268107/4482

CPU: 0 PID: 4482 Comm: syzkaller268107 Not tainted 4.16.0+ #1
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:17 [inline]
 dump_stack+0x1b9/0x29f lib/dump_stack.c:53
 print_address_description+0x6c/0x20b mm/kasan/report.c:256
 kasan_report_error mm/kasan/report.c:354 [inline]
 kasan_report.cold.7+0xac/0x2f5 mm/kasan/report.c:412
 check_memory_region_inline mm/kasan/kasan.c:260 [inline]
 check_memory_region+0x13e/0x1b0 mm/kasan/kasan.c:267
 memcpy+0x37/0x50 mm/kasan/kasan.c:303
 strlcpy include/linux/string.h:300 [inline]
 __ip_tunnel_create+0xca/0x6b0 net/ipv4/ip_tunnel.c:257
 ip_tunnel_create net/ipv4/ip_tunnel.c:352 [inline]
 ip_tunnel_ioctl+0x818/0xd40 net/ipv4/ip_tunnel.c:861
 ipip_tunnel_ioctl+0x1c5/0x420 net/ipv4/ipip.c:350
 dev_ifsioc+0x43e/0xb90 net/core/dev_ioctl.c:334
 dev_ioctl+0x69a/0xcc0 net/core/dev_ioctl.c:525
 sock_ioctl+0x47e/0x680 net/socket.c:1015
 vfs_ioctl fs/ioctl.c:46 [inline]
 file_ioctl fs/ioctl.c:500 [inline]
 do_vfs_ioctl+0x1cf/0x1650 fs/ioctl.c:684
 ksys_ioctl+0xa9/0xd0 fs/ioctl.c:701
 SYSC_ioctl fs/ioctl.c:708 [inline]
 SyS_ioctl+0x24/0x30 fs/ioctl.c:706
 do_syscall_64+0x29e/0x9d0 arch/x86/entry/common.c:287
 entry_SYSCALL_64_after_hwframe+0x42/0xb7

Fixes: c54419321455 ("GRE: Refactor GRE tunneling code.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
---
 net/ipv4/ip_tunnel.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
index de6d94482fe7ef6be52eeea3291849cfb9d961f9..6b0e362cc99b5d3510ea7aa8db68dcb5c8eb675c 100644
--- a/net/ipv4/ip_tunnel.c
+++ b/net/ipv4/ip_tunnel.c
@@ -253,13 +253,14 @@ static struct net_device *__ip_tunnel_create(struct net *net,
 	struct net_device *dev;
 	char name[IFNAMSIZ];
 
-	if (parms->name[0])
+	err = -E2BIG;
+	if (parms->name[0]) {
+		if (!dev_valid_name(parms->name))
+			goto failed;
 		strlcpy(name, parms->name, IFNAMSIZ);
-	else {
-		if (strlen(ops->kind) > (IFNAMSIZ - 3)) {
-			err = -E2BIG;
+	} else {
+		if (strlen(ops->kind) > (IFNAMSIZ - 3))
 			goto failed;
-		}
 		strlcpy(name, ops->kind, IFNAMSIZ);
 		strncat(name, "%d", 2);
 	}
-- 
2.17.0.484.g0c8726318c-goog

^ permalink raw reply related

* [PATCH net 1/6] net: fool proof dev_valid_name()
From: Eric Dumazet @ 2018-04-05 13:39 UTC (permalink / raw)
  To: David S . Miller; +Cc: netdev, Steffen Klassert, Eric Dumazet, Eric Dumazet
In-Reply-To: <20180405133931.207634-1-edumazet@google.com>

We want to use dev_valid_name() to validate tunnel names,
so better use strnlen(name, IFNAMSIZ) than strlen(name) to make
sure to not upset KASAN.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/core/dev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 9b04a9fd1dfd0e065a7fe798dd840a07f0e0a4df..969462ebb296250fe5f3b7c4621e9ba9720a2dbe 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1027,7 +1027,7 @@ bool dev_valid_name(const char *name)
 {
 	if (*name == '\0')
 		return false;
-	if (strlen(name) >= IFNAMSIZ)
+	if (strnlen(name, IFNAMSIZ) == IFNAMSIZ)
 		return false;
 	if (!strcmp(name, ".") || !strcmp(name, ".."))
 		return false;
-- 
2.17.0.484.g0c8726318c-goog

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox