Netdev List


* Re: [bpf-next PATCH 0/3] bpf: improvements to xdp_fwd sample
From: Jesper Dangaard Brouer @ 2019-08-08  9:29 UTC (permalink / raw)
  To: Zvi Effron
  Cc: Xdp, Anton Protopopov, dsahern, Toke Høiland-Jørgensen,
	brouer, netdev@vger.kernel.org
In-Reply-To: <CAC1LvL29KS9CKcXYwR4EHeNo7++i4hYQuXfY5OLtbPFDVUO2mw@mail.gmail.com>

On Wed, 7 Aug 2019 15:09:09 -0700
Zvi Effron <zeffron@riotgames.com> wrote:

> On Wed, Aug 7, 2019 at 6:00 AM Jesper Dangaard Brouer <brouer@redhat.com> wrote:
> >
> > Toke's devmap lookup improvement is first avail in kernel v5.3.
> > Thus, not part of XDP-tutorial yet.
> >  
> I probably missed this in an earlier email, but what are Toke's devmap
> improvements? Performance? Capability?

Toke's devmap and redirect improvements are primarily about usability.

Currently, from BPF-context (kernel-side) you cannot read the contents
of a devmap (or cpumap or xskmap (AF_XDP)).  This is because for devmap you
would get the real pointer to the net_device, and we cannot allow you to
write/change that from BPF (the kernel would likely crash or become
inconsistent).

The work-around is to keep a shadow map that contains the "config" of
the devmap, which you check/validate against instead.  It is just a pain
to maintain this shadow map.  Toke's change allows you to read the devmap
from BPF-context.  Thus, you can avoid this shadow map.

Another improvement from Toke is that the bpf_redirect_map() helper
now also checks whether the redirect index is valid in the map.  If not,
it returns a value other than XDP_REDIRECT.  You can choose the
alternative return value yourself via "flags", e.g. XDP_PASS.  Thus,
you don't even need to check/validate the devmap in your BPF-code, as it is
part of the bpf_redirect_map() call now.

 action = bpf_redirect_map(&map, index, flags_as_xdp_value);

The default flags value used in most programs today is 0, which maps to
XDP_ABORTED.  This is sort of a small UAPI change, but for the better.
Today, the packet is dropped later, and the failure is only visible via
the tracepoint xdp:xdp_redirect_map_err.
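
To make the fallback behaviour concrete, here is a small userspace model
of it (a sketch only, not kernel or BPF code).  The names devmap_model and
redirect_map_model are made up for illustration, and treating any flags
value above XDP_TX as XDP_ABORTED is an assumption about how "higher bits
must be unset" is enforced.  The enum values mirror the XDP action codes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values mirror the XDP action codes; redefined here so the model
 * builds in plain userspace without kernel headers. */
enum xdp_action_model {
	XDP_ABORTED = 0,
	XDP_DROP,
	XDP_PASS,
	XDP_TX,
	XDP_REDIRECT,
};

/* Hypothetical stand-in for a devmap: slot i holds an ifindex,
 * and 0 marks an empty slot. */
struct devmap_model {
	size_t max_entries;
	const uint32_t *ifindex;
};

/* Model of the lookup-with-fallback semantics described above. */
int redirect_map_model(const struct devmap_model *map, uint32_t key,
		       uint64_t flags)
{
	assert(map != NULL);

	/* Assumption: only the lower two bits (values up to XDP_TX)
	 * are accepted as a fallback code. */
	if (flags > XDP_TX)
		return XDP_ABORTED;

	/* Failed lookup: hand back the caller-chosen fallback code. */
	if (key >= map->max_entries || map->ifindex[key] == 0)
		return (int)flags;

	return XDP_REDIRECT;
}
```

With flags left at 0 a failed lookup yields XDP_ABORTED, while passing
XDP_PASS lets the packet continue up the normal stack instead.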

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


* [PATCH v2 bpf-next] xdp: xdp_umem: fix umem pages mapping for 32bits systems
From: Ivan Khoronzhuk @ 2019-08-08  9:38 UTC (permalink / raw)
  To: bjorn.topel, magnus.karlsson
  Cc: davem, ast, daniel, john.fastabend, hawk, netdev, bpf,
	xdp-newbies, linux-kernel, Ivan Khoronzhuk

Use kmap instead of page_address as the page is not always in low memory.

Acked-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
---

Based on bpf-next/master
v2..v1:
	included highmem.h

v1: https://lkml.org/lkml/2019/6/26/693

 net/xdp/xdp_umem.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
index 83de74ca729a..a0607969f8c0 100644
--- a/net/xdp/xdp_umem.c
+++ b/net/xdp/xdp_umem.c
@@ -14,6 +14,7 @@
 #include <linux/netdevice.h>
 #include <linux/rtnetlink.h>
 #include <linux/idr.h>
+#include <linux/highmem.h>
 
 #include "xdp_umem.h"
 #include "xsk_queue.h"
@@ -164,6 +165,14 @@ void xdp_umem_clear_dev(struct xdp_umem *umem)
 	umem->zc = false;
 }
 
+static void xdp_umem_unmap_pages(struct xdp_umem *umem)
+{
+	unsigned int i;
+
+	for (i = 0; i < umem->npgs; i++)
+		kunmap(umem->pgs[i]);
+}
+
 static void xdp_umem_unpin_pages(struct xdp_umem *umem)
 {
 	unsigned int i;
@@ -207,6 +216,7 @@ static void xdp_umem_release(struct xdp_umem *umem)
 
 	xsk_reuseq_destroy(umem);
 
+	xdp_umem_unmap_pages(umem);
 	xdp_umem_unpin_pages(umem);
 
 	kfree(umem->pages);
@@ -369,7 +379,7 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
 	}
 
 	for (i = 0; i < umem->npgs; i++)
-		umem->pages[i].addr = page_address(umem->pgs[i]);
+		umem->pages[i].addr = kmap(umem->pgs[i]);
 
 	return 0;
 
-- 
2.17.1



* [PATCH] pcan_usb_fd: zero out the common command buffer
From: Oliver Neukum @ 2019-08-08  9:28 UTC (permalink / raw)
  To: davem, netdev; +Cc: Oliver Neukum

Lest we leak kernel memory to a device, we had better zero out buffers.

Reported-by: syzbot+513e4d0985298538bf9b@syzkaller.appspotmail.com
Signed-off-by: Oliver Neukum <oneukum@suse.com>
---
 drivers/net/can/usb/peak_usb/pcan_usb_fd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/can/usb/peak_usb/pcan_usb_fd.c b/drivers/net/can/usb/peak_usb/pcan_usb_fd.c
index 34761c3a6286..47cc1ff5b88e 100644
--- a/drivers/net/can/usb/peak_usb/pcan_usb_fd.c
+++ b/drivers/net/can/usb/peak_usb/pcan_usb_fd.c
@@ -841,7 +841,7 @@ static int pcan_usb_fd_init(struct peak_usb_device *dev)
 			goto err_out;
 
 		/* allocate command buffer once for all for the interface */
-		pdev->cmd_buffer_addr = kmalloc(PCAN_UFD_CMD_BUFFER_SIZE,
+		pdev->cmd_buffer_addr = kzalloc(PCAN_UFD_CMD_BUFFER_SIZE,
 						GFP_KERNEL);
 		if (!pdev->cmd_buffer_addr)
 			goto err_out_1;
-- 
2.16.4



* [PATCH] zd1211rw: remove false assertion from zd_mac_clear()
From: Oliver Neukum @ 2019-08-08  9:32 UTC (permalink / raw)
  To: davem, netdev; +Cc: Oliver Neukum

The function is called before the lock which is asserted was ever used.
Just remove the assertion.

Reported-by: syzbot+74c65761783d66a9c97c@syzkaller.appspotmail.com
Signed-off-by: Oliver Neukum <oneukum@suse.com>
---
 drivers/net/wireless/zydas/zd1211rw/zd_mac.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/wireless/zydas/zd1211rw/zd_mac.c b/drivers/net/wireless/zydas/zd1211rw/zd_mac.c
index da7e63fca9f5..a9999d10ae81 100644
--- a/drivers/net/wireless/zydas/zd1211rw/zd_mac.c
+++ b/drivers/net/wireless/zydas/zd1211rw/zd_mac.c
@@ -223,7 +223,6 @@ void zd_mac_clear(struct zd_mac *mac)
 {
 	flush_workqueue(zd_workqueue);
 	zd_chip_clear(&mac->chip);
-	lockdep_assert_held(&mac->lock);
 	ZD_MEMCLEAR(mac, sizeof(struct zd_mac));
 }
 
-- 
2.16.4



* [PATCH net 0/2] Fix collisions in socket cookie generation
From: Daniel Borkmann @ 2019-08-08  9:49 UTC (permalink / raw)
  To: davem; +Cc: netdev, bpf, m, edumazet, ast, willemb, Daniel Borkmann

This change turns the socket cookie generator into a global counter
instead of a per-netns one in order to fix cookie collisions for BPF use
cases we ran into. See main patch #1 for more details.

Given the change is small/trivial and fixes an issue we're seeing
my preference would be net tree (though it cleanly applies to
net-next as well). Went for net tree instead of bpf tree here given
the main change is in net/core/sock_diag.c, but either way would be
fine with me.

Thanks a lot!

Daniel Borkmann (2):
  sock: make cookie generation global instead of per netns
  bpf: sync bpf.h to tools infrastructure

 include/net/net_namespace.h    |  1 -
 include/uapi/linux/bpf.h       |  4 ++--
 net/core/sock_diag.c           |  3 ++-
 tools/include/uapi/linux/bpf.h | 11 +++++++----
 4 files changed, 11 insertions(+), 8 deletions(-)

-- 
2.17.1


* [PATCH net 2/2] bpf: sync bpf.h to tools infrastructure
From: Daniel Borkmann @ 2019-08-08  9:49 UTC (permalink / raw)
  To: davem; +Cc: netdev, bpf, m, edumazet, ast, willemb, Daniel Borkmann
In-Reply-To: <20190808094937.26918-1-daniel@iogearbox.net>

Pull in updates in BPF helper function description.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
---
 tools/include/uapi/linux/bpf.h | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 4e455018da65..a5aa7d3ac6a1 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -1466,8 +1466,8 @@ union bpf_attr {
  * 		If no cookie has been set yet, generate a new cookie. Once
  * 		generated, the socket cookie remains stable for the life of the
  * 		socket. This helper can be useful for monitoring per socket
- * 		networking traffic statistics as it provides a unique socket
- * 		identifier per namespace.
+ * 		networking traffic statistics as it provides a global socket
+ * 		identifier that can be assumed unique.
  * 	Return
  * 		A 8-byte long non-decreasing number on success, or 0 if the
  * 		socket field is missing inside *skb*.
@@ -1571,8 +1571,11 @@ union bpf_attr {
  * 		but this is only implemented for native XDP (with driver
  * 		support) as of this writing).
  *
- * 		All values for *flags* are reserved for future usage, and must
- * 		be left at zero.
+ * 		The lower two bits of *flags* are used as the return code if
+ * 		the map lookup fails. This is so that the return value can be
+ * 		one of the XDP program return codes up to XDP_TX, as chosen by
+ * 		the caller. Any higher bits in the *flags* argument must be
+ * 		unset.
  *
  * 		When used to redirect packets to net devices, this helper
  * 		provides a high performance increase over **bpf_redirect**\ ().
-- 
2.17.1



* [PATCH net 1/2] sock: make cookie generation global instead of per netns
From: Daniel Borkmann @ 2019-08-08  9:49 UTC (permalink / raw)
  To: davem; +Cc: netdev, bpf, m, edumazet, ast, willemb, Daniel Borkmann
In-Reply-To: <20190808094937.26918-1-daniel@iogearbox.net>

Generating and retrieving socket cookies is a useful feature that is
exposed to BPF for various program types through the
bpf_get_socket_cookie() helper.

The fact that the cookie counter is per netns is quite a limitation
for BPF in practice in particular for programs in host namespace that
use socket cookies as part of a map lookup key since they will be
causing socket cookie collisions e.g. when attached to BPF cgroup hooks
or cls_bpf on tc egress in host namespace handling container traffic
from veths or ipvlan slaves. Change the counter to be global instead.

Socket cookie consumers must assume the value as opaque in any case.
The cookie does not guarantee an always unique identifier since it
could wrap in fabricated corner cases where two sockets could end up
holding the same cookie, but is good enough to be used as a hint for
many use cases; not every socket must have a cookie generated hence
knowledge of the counter value does not provide much value either way.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Martynas Pumputis <m@lambda.lt>
---
 include/net/net_namespace.h | 1 -
 include/uapi/linux/bpf.h    | 4 ++--
 net/core/sock_diag.c        | 3 ++-
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 4a9da951a794..cb668bc2692d 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -61,7 +61,6 @@ struct net {
 	spinlock_t		rules_mod_lock;
 
 	u32			hash_mix;
-	atomic64_t		cookie_gen;
 
 	struct list_head	list;		/* list of network namespaces */
 	struct list_head	exit_list;	/* To linked to call pernet exit
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index fa1c753dcdbc..a5aa7d3ac6a1 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1466,8 +1466,8 @@ union bpf_attr {
  * 		If no cookie has been set yet, generate a new cookie. Once
  * 		generated, the socket cookie remains stable for the life of the
  * 		socket. This helper can be useful for monitoring per socket
- * 		networking traffic statistics as it provides a unique socket
- * 		identifier per namespace.
+ * 		networking traffic statistics as it provides a global socket
+ * 		identifier that can be assumed unique.
  * 	Return
  * 		A 8-byte long non-decreasing number on success, or 0 if the
  * 		socket field is missing inside *skb*.
diff --git a/net/core/sock_diag.c b/net/core/sock_diag.c
index 3312a5849a97..c13ffbd33d8d 100644
--- a/net/core/sock_diag.c
+++ b/net/core/sock_diag.c
@@ -19,6 +19,7 @@ static const struct sock_diag_handler *sock_diag_handlers[AF_MAX];
 static int (*inet_rcv_compat)(struct sk_buff *skb, struct nlmsghdr *nlh);
 static DEFINE_MUTEX(sock_diag_table_mutex);
 static struct workqueue_struct *broadcast_wq;
+static atomic64_t cookie_gen;
 
 u64 sock_gen_cookie(struct sock *sk)
 {
@@ -27,7 +28,7 @@ u64 sock_gen_cookie(struct sock *sk)
 
 		if (res)
 			return res;
-		res = atomic64_inc_return(&sock_net(sk)->cookie_gen);
+		res = atomic64_inc_return(&cookie_gen);
 		atomic64_cmpxchg(&sk->sk_cookie, 0, res);
 	}
 }
-- 
2.17.1



* Re: [PATCH rdma-next 0/4] Add XRQ and SRQ support to DEVX interface
From: Leon Romanovsky @ 2019-08-08 10:11 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: RDMA mailing list, Edward Srouji, Saeed Mahameed, Yishai Hadas,
	linux-netdev
In-Reply-To: <20190808084358.29517-1-leon@kernel.org>

On Thu, Aug 08, 2019 at 11:43:54AM +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@mellanox.com>
>
> Hi,
>
> This small series extends DEVX interface with SRQ and XRQ legacy commands.

Sorry for typo in cover letter, there is no SRQ here.

Thanks

>
> Thanks
>
> Yishai Hadas (4):
>   net/mlx5: Use debug message instead of warn
>   net/mlx5: Add XRQ legacy commands opcodes
>   IB/mlx5: Add legacy events to DEVX list
>   IB/mlx5: Expose XRQ legacy commands over the DEVX interface
>
>  drivers/infiniband/hw/mlx5/devx.c             | 12 ++++++++++++
>  drivers/net/ethernet/mellanox/mlx5/core/cmd.c |  4 ++++
>  drivers/net/ethernet/mellanox/mlx5/core/qp.c  |  2 +-
>  include/linux/mlx5/device.h                   |  9 +++++++++
>  include/linux/mlx5/mlx5_ifc.h                 |  2 ++
>  5 files changed, 28 insertions(+), 1 deletion(-)
>
> --
> 2.20.1
>


* Re: [PATCH] net/netfilter/nf_nat_proto.c - make tables static
From: Florian Westphal @ 2019-08-08 10:33 UTC (permalink / raw)
  To: Valdis Klētnieks
  Cc: Pablo Neira Ayuso, Jozsef Kadlecsik, Florian Westphal,
	netfilter-devel, coreteam, netdev, linux-kernel
In-Reply-To: <55481.1565243002@turing-police>

Valdis Klētnieks <valdis.kletnieks@vt.edu> wrote:
> Sparse warns about two tables not being declared.
> 
>   CHECK   net/netfilter/nf_nat_proto.c
> net/netfilter/nf_nat_proto.c:725:26: warning: symbol 'nf_nat_ipv4_ops' was not declared. Should it be static?
> net/netfilter/nf_nat_proto.c:964:26: warning: symbol 'nf_nat_ipv6_ops' was not declared. Should it be static?
> 
> And in fact they can indeed be static.

Acked-by: Florian Westphal <fw@strlen.de>

Seems I removed the static qualifier when I added inet nat support,
but the patch that was merged doesn't use them outside of
nf_nat_proto.c.

Thanks for fixing this.


* Re: [PATCH net 1/2] sock: make cookie generation global instead of per netns
From: Eric Dumazet @ 2019-08-08 10:45 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: David Miller, netdev, bpf, m, Alexei Starovoitov,
	Willem de Bruijn
In-Reply-To: <20190808094937.26918-2-daniel@iogearbox.net>

On Thu, Aug 8, 2019 at 11:50 AM Daniel Borkmann <daniel@iogearbox.net> wrote:
>

> Socket cookie consumers must assume the value as opaque in any case.
> The cookie does not guarantee an always unique identifier since it
> could wrap in fabricated corner cases where two sockets could end up
> holding the same cookie,

What do you mean by this ?

Cookie is guaranteed to be unique, it is from a 64bit counter...

There should be no collision.

> but is good enough to be used as a hint for
> many use cases; not every socket must have a cookie generated hence
> knowledge of the counter value does not provide much value either way.
>
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Alexei Starovoitov <ast@kernel.org>
> Cc: Willem de Bruijn <willemb@google.com>
> Cc: Martynas Pumputis <m@lambda.lt>
> ---
>  include/net/net_namespace.h | 1 -
>  include/uapi/linux/bpf.h    | 4 ++--
>  net/core/sock_diag.c        | 3 ++-
>  3 files changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
> index 4a9da951a794..cb668bc2692d 100644
> --- a/include/net/net_namespace.h
> +++ b/include/net/net_namespace.h
> @@ -61,7 +61,6 @@ struct net {
>         spinlock_t              rules_mod_lock;
>
>         u32                     hash_mix;
> -       atomic64_t              cookie_gen;
>
>         struct list_head        list;           /* list of network namespaces */
>         struct list_head        exit_list;      /* To linked to call pernet exit
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index fa1c753dcdbc..a5aa7d3ac6a1 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -1466,8 +1466,8 @@ union bpf_attr {
>   *             If no cookie has been set yet, generate a new cookie. Once
>   *             generated, the socket cookie remains stable for the life of the
>   *             socket. This helper can be useful for monitoring per socket
> - *             networking traffic statistics as it provides a unique socket
> - *             identifier per namespace.
> + *             networking traffic statistics as it provides a global socket
> + *             identifier that can be assumed unique.
>   *     Return
>   *             A 8-byte long non-decreasing number on success, or 0 if the
>   *             socket field is missing inside *skb*.
> diff --git a/net/core/sock_diag.c b/net/core/sock_diag.c
> index 3312a5849a97..c13ffbd33d8d 100644
> --- a/net/core/sock_diag.c
> +++ b/net/core/sock_diag.c
> @@ -19,6 +19,7 @@ static const struct sock_diag_handler *sock_diag_handlers[AF_MAX];
>  static int (*inet_rcv_compat)(struct sk_buff *skb, struct nlmsghdr *nlh);
>  static DEFINE_MUTEX(sock_diag_table_mutex);
>  static struct workqueue_struct *broadcast_wq;
> +static atomic64_t cookie_gen;
>
>  u64 sock_gen_cookie(struct sock *sk)
>  {
> @@ -27,7 +28,7 @@ u64 sock_gen_cookie(struct sock *sk)
>
>                 if (res)
>                         return res;
> -               res = atomic64_inc_return(&sock_net(sk)->cookie_gen);
> +               res = atomic64_inc_return(&cookie_gen);
>                 atomic64_cmpxchg(&sk->sk_cookie, 0, res);
>         }
>  }
> --
> 2.17.1
>


* Re: [PATCH bpf-next] btf: expose BTF info through sysfs
From: Jiri Olsa @ 2019-08-08 10:55 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, netdev, ast, daniel, yhs, andrii.nakryiko, kernel-team,
	Masahiro Yamada, Arnaldo Carvalho de Melo, Jiri Olsa
In-Reply-To: <20190807183821.138728-1-andriin@fb.com>

On Wed, Aug 07, 2019 at 11:38:21AM -0700, Andrii Nakryiko wrote:
> Make .BTF section allocated and expose its contents through sysfs.
> 
> /sys/kernel/btf directory is created to contain all the BTFs present
> inside kernel. Currently there is only kernel's main BTF, represented as
> /sys/kernel/btf/kernel file. Once kernel modules' BTFs are supported,
> each module will expose its BTF as /sys/kernel/btf/<module-name> file.
> 
> Current approach relies on a few pieces coming together:
> 1. pahole is used to take almost final vmlinux image (modulo .BTF and
>    kallsyms) and generate .BTF section by converting DWARF info into
>    BTF. This section is not allocated and not mapped to any segment,
>    though, so is not yet accessible from inside kernel at runtime.
> 2. objcopy dumps .BTF contents into binary file and subsequently
>    convert binary file into linkable object file with automatically
>    generated symbols _binary__btf_kernel_bin_start and
>    _binary__btf_kernel_bin_end, pointing to start and end, respectively,
>    of BTF raw data.
> 3. final vmlinux image is generated by linking this object file (and
>    kallsyms, if necessary). sysfs_btf.c then creates
>    /sys/kernel/btf/kernel file and exposes embedded BTF contents through
>    it. This allows, e.g., libbpf and bpftool access BTF info at
>    well-known location, without resorting to searching for vmlinux image
>    on disk (location of which is not standardized and vmlinux image
>    might not be even available in some scenarios, e.g., inside qemu
>    during testing).
> 
> Alternative approach using .incbin assembler directive to embed BTF
> contents directly was attempted but didn't work, because sysfs_btf.o is
> not re-compiled during link-vmlinux.sh stage. This is required, though,
> to update embedded BTF data (initially empty data is embedded, then
> pahole generates BTF info and we need to regenerate sysfs_btf.o with
> updated contents, but it's too late at that point).
> 
> If BTF couldn't be generated due to missing or too old pahole,
> sysfs_btf.c handles that gracefully by detecting that
> _binary__btf_kernel_bin_start (weak symbol) is 0 and not creating
> /sys/kernel/btf at all.
> 
> Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
> Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
> Cc: Jiri Olsa <jolsa@kernel.org>
> Signed-off-by: Andrii Nakryiko <andriin@fb.com>
> ---
>  kernel/bpf/Makefile     |  3 +++
>  kernel/bpf/sysfs_btf.c  | 52 ++++++++++++++++++++++++++++++++++++++++
>  scripts/link-vmlinux.sh | 53 ++++++++++++++++++++++++++---------------
>  3 files changed, 89 insertions(+), 19 deletions(-)
>  create mode 100644 kernel/bpf/sysfs_btf.c
> 
> diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
> index 29d781061cd5..e1d9adb212f9 100644
> --- a/kernel/bpf/Makefile
> +++ b/kernel/bpf/Makefile
> @@ -22,3 +22,6 @@ obj-$(CONFIG_CGROUP_BPF) += cgroup.o
>  ifeq ($(CONFIG_INET),y)
>  obj-$(CONFIG_BPF_SYSCALL) += reuseport_array.o
>  endif
> +ifeq ($(CONFIG_SYSFS),y)
> +obj-$(CONFIG_DEBUG_INFO_BTF) += sysfs_btf.o
> +endif
> diff --git a/kernel/bpf/sysfs_btf.c b/kernel/bpf/sysfs_btf.c
> new file mode 100644
> index 000000000000..ac06ce1d62e8
> --- /dev/null
> +++ b/kernel/bpf/sysfs_btf.c
> @@ -0,0 +1,52 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Provide kernel BTF information for introspection and use by eBPF tools.
> + */
> +#include <linux/kernel.h>
> +#include <linux/module.h>
> +#include <linux/kobject.h>
> +#include <linux/init.h>
> +
> +/* See scripts/link-vmlinux.sh, gen_btf() func for details */
> +extern char __weak _binary__btf_kernel_bin_start[];
> +extern char __weak _binary__btf_kernel_bin_end[];
> +
> +static ssize_t
> +btf_kernel_read(struct file *file, struct kobject *kobj,
> +		struct bin_attribute *bin_attr,
> +		char *buf, loff_t off, size_t len)
> +{
> +	memcpy(buf, _binary__btf_kernel_bin_start + off, len);

hum, should you check the end of the btf data?
maybe use the memory_read_from_buffer function instead

jirka


* Re: [PATCH bpf-next] btf: expose BTF info through sysfs
From: Jiri Olsa @ 2019-08-08 10:59 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, netdev, ast, daniel, yhs, andrii.nakryiko, kernel-team,
	Masahiro Yamada, Arnaldo Carvalho de Melo, Jiri Olsa
In-Reply-To: <20190808105558.GB31775@krava>

On Thu, Aug 08, 2019 at 12:56:01PM +0200, Jiri Olsa wrote:
> On Wed, Aug 07, 2019 at 11:38:21AM -0700, Andrii Nakryiko wrote:
> > Make .BTF section allocated and expose its contents through sysfs.
> > 
> > /sys/kernel/btf directory is created to contain all the BTFs present
> > inside kernel. Currently there is only kernel's main BTF, represented as
> > /sys/kernel/btf/kernel file. Once kernel modules' BTFs are supported,
> > each module will expose its BTF as /sys/kernel/btf/<module-name> file.
> > 
> > Current approach relies on a few pieces coming together:
> > 1. pahole is used to take almost final vmlinux image (modulo .BTF and
> >    kallsyms) and generate .BTF section by converting DWARF info into
> >    BTF. This section is not allocated and not mapped to any segment,
> >    though, so is not yet accessible from inside kernel at runtime.
> > 2. objcopy dumps .BTF contents into binary file and subsequently
> >    convert binary file into linkable object file with automatically
> >    generated symbols _binary__btf_kernel_bin_start and
> >    _binary__btf_kernel_bin_end, pointing to start and end, respectively,
> >    of BTF raw data.
> > 3. final vmlinux image is generated by linking this object file (and
> >    kallsyms, if necessary). sysfs_btf.c then creates
> >    /sys/kernel/btf/kernel file and exposes embedded BTF contents through
> >    it. This allows, e.g., libbpf and bpftool access BTF info at
> >    well-known location, without resorting to searching for vmlinux image
> >    on disk (location of which is not standardized and vmlinux image
> >    might not be even available in some scenarios, e.g., inside qemu
> >    during testing).
> > 
> > Alternative approach using .incbin assembler directive to embed BTF
> > contents directly was attempted but didn't work, because sysfs_btf.o is
> > not re-compiled during link-vmlinux.sh stage. This is required, though,
> > to update embedded BTF data (initially empty data is embedded, then
> > pahole generates BTF info and we need to regenerate sysfs_btf.o with
> > updated contents, but it's too late at that point).
> > 
> > If BTF couldn't be generated due to missing or too old pahole,
> > sysfs_btf.c handles that gracefully by detecting that
> > _binary__btf_kernel_bin_start (weak symbol) is 0 and not creating
> > /sys/kernel/btf at all.
> > 
> > Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
> > Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
> > Cc: Jiri Olsa <jolsa@kernel.org>
> > Signed-off-by: Andrii Nakryiko <andriin@fb.com>
> > ---
> >  kernel/bpf/Makefile     |  3 +++
> >  kernel/bpf/sysfs_btf.c  | 52 ++++++++++++++++++++++++++++++++++++++++
> >  scripts/link-vmlinux.sh | 53 ++++++++++++++++++++++++++---------------
> >  3 files changed, 89 insertions(+), 19 deletions(-)
> >  create mode 100644 kernel/bpf/sysfs_btf.c
> > 
> > diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
> > index 29d781061cd5..e1d9adb212f9 100644
> > --- a/kernel/bpf/Makefile
> > +++ b/kernel/bpf/Makefile
> > @@ -22,3 +22,6 @@ obj-$(CONFIG_CGROUP_BPF) += cgroup.o
> >  ifeq ($(CONFIG_INET),y)
> >  obj-$(CONFIG_BPF_SYSCALL) += reuseport_array.o
> >  endif
> > +ifeq ($(CONFIG_SYSFS),y)
> > +obj-$(CONFIG_DEBUG_INFO_BTF) += sysfs_btf.o
> > +endif
> > diff --git a/kernel/bpf/sysfs_btf.c b/kernel/bpf/sysfs_btf.c
> > new file mode 100644
> > index 000000000000..ac06ce1d62e8
> > --- /dev/null
> > +++ b/kernel/bpf/sysfs_btf.c
> > @@ -0,0 +1,52 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * Provide kernel BTF information for introspection and use by eBPF tools.
> > + */
> > +#include <linux/kernel.h>
> > +#include <linux/module.h>
> > +#include <linux/kobject.h>
> > +#include <linux/init.h>
> > +
> > +/* See scripts/link-vmlinux.sh, gen_btf() func for details */
> > +extern char __weak _binary__btf_kernel_bin_start[];
> > +extern char __weak _binary__btf_kernel_bin_end[];
> > +
> > +static ssize_t
> > +btf_kernel_read(struct file *file, struct kobject *kobj,
> > +		struct bin_attribute *bin_attr,
> > +		char *buf, loff_t off, size_t len)
> > +{
> > +	memcpy(buf, _binary__btf_kernel_bin_start + off, len);
> 
> hum, should you check the end of the btf data?
> maybe use the memory_read_from_buffer function instead

nah, looks like the size setting will do that for you
in sysfs_kf_bin_read, nice.. never mind then ;-)
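
For reference, the kind of clamping a bounded read performs can be
sketched in userspace like this.  It is only a model: the function name
and signature are made up, and it is not the sysfs code itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy at most count bytes starting at offset pos out of a buffer that
 * holds `available` bytes.  Returns the number of bytes copied, 0 at or
 * past the end, or -1 for a negative offset. */
long long bounded_read_model(void *to, size_t count, long long pos,
			     const void *from, size_t available)
{
	assert(to != NULL && from != NULL);

	if (pos < 0)
		return -1;
	if ((unsigned long long)pos >= available || count == 0)
		return 0;
	/* Clamp the request so it never runs past the end of the data. */
	if (count > available - (size_t)pos)
		count = available - (size_t)pos;

	memcpy(to, (const char *)from + pos, count);
	return (long long)count;
}
```

A plain memcpy() is safe only when the caller has already applied this
clamp, which is what the size setting on the attribute provides here.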

jirka


* Re: [PATCH net 1/2] sock: make cookie generation global instead of per netns
From: Daniel Borkmann @ 2019-08-08 11:09 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, netdev, bpf, m, Alexei Starovoitov,
	Willem de Bruijn
In-Reply-To: <CANn89iKzaxxyC=6s45PEnTsKfz7GN4HHOw3wtpb6-ozrJSRP=g@mail.gmail.com>

On 8/8/19 12:45 PM, Eric Dumazet wrote:
> On Thu, Aug 8, 2019 at 11:50 AM Daniel Borkmann <daniel@iogearbox.net> wrote:
> 
>> Socket cookie consumers must assume the value as opaque in any case.
>> The cookie does not guarantee an always unique identifier since it
>> could wrap in fabricated corner cases where two sockets could end up
>> holding the same cookie,
> 
> What do you mean by this ?
> 
> Cookie is guaranteed to be unique, it is from a 64bit counter...
> 
> There should be no collision.

I meant the [theoretical] corner case where socket_1 has cookie X and
we'd create, trigger sock_gen_cookie() to increment, close socket in a
loop until we wrap and get another cookie X for socket_2; agree it's
impractical and for little gain anyway. So in practice there should be
no collision which is what I tried to say.

Thanks,
Daniel


* Re: [PATCH net-next v1 1/8] netfilter: inlined four headers files into another one.
From: Pablo Neira Ayuso @ 2019-08-08 11:23 UTC (permalink / raw)
  To: Jeremy Sowden; +Cc: Netfilter Devel, Net Dev, Masahiro Yamada, kadlec
In-Reply-To: <20190807141705.4864-2-jeremy@azazel.net>

Hi Jeremy,

Thanks for working on this.

Cc'ing Jozsef.

On Wed, Aug 07, 2019 at 03:16:58PM +0100, Jeremy Sowden wrote:
[...]
> +/* Called from uadd only, protected by the set spinlock.
> + * The kadt functions don't use the comment extensions in any way.
> + */
> +static inline void
> +ip_set_init_comment(struct ip_set *set, struct ip_set_comment *comment,
> +		    const struct ip_set_ext *ext)

Not related to this patch, but I think the number of inline functions
could be reduced a bit by exporting symbols? Specifically for
functions that are called from the netlink control plane, ie. _uadd()
functions. I think forcing the compiler to inline this is not useful.
This could be done in a follow up patchset.

Thanks.


* Re: [PATCH] net: ethernet: et131x: Use GFP_KERNEL instead of GFP_ATOMIC when allocating tx_ring->tcb_ring
From: Matthew Wilcox @ 2019-08-08 11:24 UTC (permalink / raw)
  To: Jesse Brandeburg
  Cc: Christophe JAILLET, mark.einon, davem, f.fainelli, andrew, netdev,
	linux-kernel, kernel-janitors
In-Reply-To: <20190807222346.00002ba7@intel.com>

On Wed, Aug 07, 2019 at 10:23:46PM -0700, Jesse Brandeburg wrote:
> On Wed, 31 Jul 2019 09:38:42 +0200
> Christophe JAILLET <christophe.jaillet@wanadoo.fr> wrote:
> 
> > There is no good reason to use GFP_ATOMIC here. Other memory allocations
> > are performed with GFP_KERNEL (see other 'dma_alloc_coherent()' below and
> > 'kzalloc()' in 'et131x_rx_dma_memory_alloc()')
> > 
> > Use GFP_KERNEL which should be enough.
> > 
> > Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
> 
> Sure, but generally I'd say GFP_ATOMIC is ok if you're in an init path
> and you can afford to have the allocation thread sleep while memory is
> being found by the kernel.

That's not what GFP_ATOMIC means.  GFP_ATOMIC _will not_ sleep.  GFP_KERNEL
will.


* Re: [PATCH net 1/2] sock: make cookie generation global instead of per netns
From: Eric Dumazet @ 2019-08-08 11:37 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: David Miller, netdev, bpf, m, Alexei Starovoitov,
	Willem de Bruijn
In-Reply-To: <d87d35a1-0ebc-4e48-1950-e94fde62a6c4@iogearbox.net>

On Thu, Aug 8, 2019 at 1:09 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> On 8/8/19 12:45 PM, Eric Dumazet wrote:
> > On Thu, Aug 8, 2019 at 11:50 AM Daniel Borkmann <daniel@iogearbox.net> wrote:
> >
> >> Socket cookie consumers must assume the value as opaque in any case.
> >> The cookie does not guarantee an always unique identifier since it
> >> could wrap in fabricated corner cases where two sockets could end up
> >> holding the same cookie,
> >
> > What do you mean by this ?
> >
> > Cookie is guaranteed to be unique, it is from a 64bit counter...
> >
> > There should be no collision.
>
> I meant the [theoretical] corner case where socket_1 has cookie X and
> we'd create, trigger sock_gen_cookie() to increment, close socket in a
> loop until we wrap and get another cookie X for socket_2; agree it's
> impractical and for little gain anyway. So in practice there should be
> no collision which is what I tried to say.


If a 64bit counter, updated by one unit at a time, could overflow
during the lifetime of a host, I would agree with you, but this can
not happen, even if we succeed in making 1 billion locked increments
per second (this would still need 584 years).

I would prefer not mentioning something that can not possibly happen
in your changelog ;)
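
As a sanity check on the 584-year figure, here is a minimal sketch of the
arithmetic, assuming a 365-day year and a constant increment rate (the
function name is made up for illustration, it is not a kernel helper):

```c
#include <stdint.h>

/*
 * Whole years before a 64-bit counter, bumped by one per increment,
 * runs through all of its values at the given constant rate.
 * Assumption: a year is 365 days; a rate of 0 is treated as "never".
 */
static uint64_t years_to_wrap_u64(uint64_t incs_per_sec)
{
	const uint64_t secs_per_year = 365ULL * 24 * 60 * 60;

	if (incs_per_sec == 0)
		return UINT64_MAX;

	/* Integer division twice: seconds to wrap, then whole years. */
	return UINT64_MAX / incs_per_sec / secs_per_year;
}
```

With these assumptions, years_to_wrap_u64(1000000000ULL) evaluates to 584.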

^ permalink raw reply

* Re: [PATCH net 1/2] sock: make cookie generation global instead of per netns
From: Daniel Borkmann @ 2019-08-08 11:40 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, netdev, bpf, m, Alexei Starovoitov,
	Willem de Bruijn
In-Reply-To: <CANn89iLqhYF=JYtNtB25O=0a_tn50dRko3fqvvC-sWTZXuK+0g@mail.gmail.com>

On 8/8/19 1:37 PM, Eric Dumazet wrote:
> On Thu, Aug 8, 2019 at 1:09 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>> On 8/8/19 12:45 PM, Eric Dumazet wrote:
>>> On Thu, Aug 8, 2019 at 11:50 AM Daniel Borkmann <daniel@iogearbox.net> wrote:
>>>
>>>> Socket cookie consumers must assume the value as opaque in any case.
>>>> The cookie does not guarantee an always unique identifier since it
>>>> could wrap in fabricated corner cases where two sockets could end up
>>>> holding the same cookie,
>>>
>>> What do you mean by this ?
>>>
>>> Cookie is guaranteed to be unique, it is from a 64bit counter...
>>>
>>> There should be no collision.
>>
>> I meant the [theoretical] corner case where socket_1 has cookie X and
>> we'd create, trigger sock_gen_cookie() to increment, close socket in a
>> loop until we wrap and get another cookie X for socket_2; agree it's
>> impractical and for little gain anyway. So in practice there should be
>> no collision which is what I tried to say.
> 
> If a 64bit counter, updated by one unit at a time, could overflow
> during the lifetime of a host, I would agree with you, but this can
> not happen, even if we succeed in making 1 billion locked increments
> per second (this would still need 584 years).
> 
> I would prefer not mentioning something that can not possibly happen
> in your changelog ;)

Yep fair enough, makes sense. I'll fix it :)

^ permalink raw reply

* [PATCH v3 2/2] dt-bindings: net: meson-dwmac: convert to yaml
From: Neil Armstrong @ 2019-08-08 11:41 UTC (permalink / raw)
  To: robh+dt
  Cc: Neil Armstrong, martin.blumenstingl, devicetree, netdev,
	linux-amlogic, linux-arm-kernel, linux-kernel, Rob Herring
In-Reply-To: <20190808114101.29982-1-narmstrong@baylibre.com>

Now that we have the DT validation in place, let's convert the device tree
bindings for the Synopsys DWMAC Glue for Amlogic SoCs over to a YAML schema.

Reviewed-by: Rob Herring <robh@kernel.org>
Reviewed-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
---
 .../bindings/net/amlogic,meson-dwmac.yaml     | 113 ++++++++++++++++++
 .../devicetree/bindings/net/meson-dwmac.txt   |  71 -----------
 .../devicetree/bindings/net/snps,dwmac.yaml   |   5 +
 3 files changed, 118 insertions(+), 71 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/net/amlogic,meson-dwmac.yaml
 delete mode 100644 Documentation/devicetree/bindings/net/meson-dwmac.txt

diff --git a/Documentation/devicetree/bindings/net/amlogic,meson-dwmac.yaml b/Documentation/devicetree/bindings/net/amlogic,meson-dwmac.yaml
new file mode 100644
index 000000000000..ae91aa9d8616
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/amlogic,meson-dwmac.yaml
@@ -0,0 +1,113 @@
+# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause)
+# Copyright 2019 BayLibre, SAS
+%YAML 1.2
+---
+$id: "http://devicetree.org/schemas/net/amlogic,meson-dwmac.yaml#"
+$schema: "http://devicetree.org/meta-schemas/core.yaml#"
+
+title: Amlogic Meson DWMAC Ethernet controller
+
+maintainers:
+  - Neil Armstrong <narmstrong@baylibre.com>
+  - Martin Blumenstingl <martin.blumenstingl@googlemail.com>
+
+# We need a select here so we don't match all nodes with 'snps,dwmac'
+select:
+  properties:
+    compatible:
+      contains:
+        enum:
+          - amlogic,meson6-dwmac
+          - amlogic,meson8b-dwmac
+          - amlogic,meson8m2-dwmac
+          - amlogic,meson-gxbb-dwmac
+          - amlogic,meson-axg-dwmac
+  required:
+    - compatible
+
+allOf:
+  - $ref: "snps,dwmac.yaml#"
+  - if:
+      properties:
+        compatible:
+          contains:
+            enum:
+              - amlogic,meson8b-dwmac
+              - amlogic,meson8m2-dwmac
+              - amlogic,meson-gxbb-dwmac
+              - amlogic,meson-axg-dwmac
+
+    then:
+      properties:
+        clocks:
+          items:
+            - description: GMAC main clock
+            - description: First parent clock of the internal mux
+            - description: Second parent clock of the internal mux
+
+        clock-names:
+          minItems: 3
+          maxItems: 3
+          items:
+            - const: stmmaceth
+            - const: clkin0
+            - const: clkin1
+
+        amlogic,tx-delay-ns:
+          $ref: /schemas/types.yaml#definitions/uint32
+          description:
+            The internal RGMII TX clock delay (provided by this driver) in
+            nanoseconds. Allowed values are 0ns, 2ns, 4ns, 6ns.
+            When phy-mode is set to "rgmii" then the TX delay should be
+            explicitly configured. When not configured a fallback of 2ns is
+            used. When the phy-mode is set to either "rgmii-id" or "rgmii-txid"
+            the TX clock delay is already provided by the PHY. In that case
+            this property should be set to 0ns (which disables the TX clock
+            delay in the MAC to prevent the clock from going off because both
+            PHY and MAC are adding a delay).
+            Any configuration is ignored when the phy-mode is set to "rmii".
+
+properties:
+  compatible:
+    additionalItems: true
+    maxItems: 3
+    items:
+      - enum:
+          - amlogic,meson6-dwmac
+          - amlogic,meson8b-dwmac
+          - amlogic,meson8m2-dwmac
+          - amlogic,meson-gxbb-dwmac
+          - amlogic,meson-axg-dwmac
+    contains:
+      enum:
+        - snps,dwmac-3.70a
+        - snps,dwmac
+
+  reg:
+    items:
+      - description:
+          The first register range should be the one of the DWMAC controller
+      - description:
+          The second range is for the Amlogic specific configuration
+          (for example the PRG_ETHERNET register range on Meson8b and newer)
+
+required:
+  - compatible
+  - reg
+  - interrupts
+  - interrupt-names
+  - clocks
+  - clock-names
+  - phy-mode
+
+examples:
+  - |
+    ethmac: ethernet@c9410000 {
+         compatible = "amlogic,meson-gxbb-dwmac", "snps,dwmac";
+         reg = <0xc9410000 0x10000>, <0xc8834540 0x8>;
+         interrupts = <8>;
+         interrupt-names = "macirq";
+         clocks = <&clk_eth>, <&clkc_fclk_div2>, <&clk_mpll2>;
+         clock-names = "stmmaceth", "clkin0", "clkin1";
+         phy-mode = "rgmii";
+    };
diff --git a/Documentation/devicetree/bindings/net/meson-dwmac.txt b/Documentation/devicetree/bindings/net/meson-dwmac.txt
deleted file mode 100644
index 1321bb194ed9..000000000000
--- a/Documentation/devicetree/bindings/net/meson-dwmac.txt
+++ /dev/null
@@ -1,71 +0,0 @@
-* Amlogic Meson DWMAC Ethernet controller
-
-The device inherits all the properties of the dwmac/stmmac devices
-described in the file stmmac.txt in the current directory with the
-following changes.
-
-Required properties on all platforms:
-
-- compatible:	Depending on the platform this should be one of:
-			- "amlogic,meson6-dwmac"
-			- "amlogic,meson8b-dwmac"
-			- "amlogic,meson8m2-dwmac"
-			- "amlogic,meson-gxbb-dwmac"
-			- "amlogic,meson-axg-dwmac"
-		Additionally "snps,dwmac" and any applicable more
-		detailed version number described in net/stmmac.txt
-		should be used.
-
-- reg:	The first register range should be the one of the DWMAC
-	controller. The second range is is for the Amlogic specific
-	configuration (for example the PRG_ETHERNET register range
-	on Meson8b and newer)
-
-Required properties on Meson8b, Meson8m2, GXBB and newer:
-- clock-names:	Should contain the following:
-		- "stmmaceth" - see stmmac.txt
-		- "clkin0" - first parent clock of the internal mux
-		- "clkin1" - second parent clock of the internal mux
-
-Optional properties on Meson8b, Meson8m2, GXBB and newer:
-- amlogic,tx-delay-ns:	The internal RGMII TX clock delay (provided
-			by this driver) in nanoseconds. Allowed values
-			are: 0ns, 2ns, 4ns, 6ns.
-			When phy-mode is set to "rgmii" then the TX
-			delay should be explicitly configured. When
-			not configured a fallback of 2ns is used.
-			When the phy-mode is set to either "rgmii-id"
-			or "rgmii-txid" the TX clock delay is already
-			provided by the PHY. In that case this
-			property should be set to 0ns (which disables
-			the TX clock delay in the MAC to prevent the
-			clock from going off because both PHY and MAC
-			are adding a delay).
-			Any configuration is ignored when the phy-mode
-			is set to "rmii".
-
-Example for Meson6:
-
-	ethmac: ethernet@c9410000 {
-		compatible = "amlogic,meson6-dwmac", "snps,dwmac";
-		reg = <0xc9410000 0x10000
-		       0xc1108108 0x4>;
-		interrupts = <0 8 1>;
-		interrupt-names = "macirq";
-		clocks = <&clk81>;
-		clock-names = "stmmaceth";
-	}
-
-Example for GXBB:
-	ethmac: ethernet@c9410000 {
-		compatible = "amlogic,meson-gxbb-dwmac", "snps,dwmac";
-		reg = <0x0 0xc9410000 0x0 0x10000>,
-			<0x0 0xc8834540 0x0 0x8>;
-		interrupts = <0 8 1>;
-		interrupt-names = "macirq";
-		clocks = <&clkc CLKID_ETH>,
-				<&clkc CLKID_FCLK_DIV2>,
-				<&clkc CLKID_MPLL2>;
-		clock-names = "stmmaceth", "clkin0", "clkin1";
-		phy-mode = "rgmii";
-	};
diff --git a/Documentation/devicetree/bindings/net/snps,dwmac.yaml b/Documentation/devicetree/bindings/net/snps,dwmac.yaml
index 4377f511a51d..c78be15704b9 100644
--- a/Documentation/devicetree/bindings/net/snps,dwmac.yaml
+++ b/Documentation/devicetree/bindings/net/snps,dwmac.yaml
@@ -50,6 +50,11 @@ properties:
         - allwinner,sun8i-r40-emac
         - allwinner,sun8i-v3s-emac
         - allwinner,sun50i-a64-emac
+        - amlogic,meson6-dwmac
+        - amlogic,meson8b-dwmac
+        - amlogic,meson8m2-dwmac
+        - amlogic,meson-gxbb-dwmac
+        - amlogic,meson-axg-dwmac
         - snps,dwmac
         - snps,dwmac-3.50a
         - snps,dwmac-3.610
-- 
2.22.0


^ permalink raw reply related

* [PATCH v3 1/2] dt-bindings: net: snps,dwmac: update reg minItems maxItems
From: Neil Armstrong @ 2019-08-08 11:41 UTC (permalink / raw)
  To: robh+dt
  Cc: Neil Armstrong, martin.blumenstingl, devicetree, netdev,
	linux-amlogic, linux-arm-kernel, linux-kernel, Rob Herring,
	Maxime Ripard
In-Reply-To: <20190808114101.29982-1-narmstrong@baylibre.com>

The Amlogic Meson DWMAC glue bindings need a second reg cell for the
glue registers, thus update the reg minItems/maxItems to allow more
than a single reg cell.

Also update the allwinner,sun7i-a20-gmac.yaml derivative schema to specify
maxItems to 1.

Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
Acked-by: Rob Herring <robh@kernel.org>
Acked-by: Maxime Ripard <maxime.ripard@bootlin.com>
---
 .../devicetree/bindings/net/allwinner,sun7i-a20-gmac.yaml      | 3 +++
 Documentation/devicetree/bindings/net/snps,dwmac.yaml          | 3 ++-
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/Documentation/devicetree/bindings/net/allwinner,sun7i-a20-gmac.yaml b/Documentation/devicetree/bindings/net/allwinner,sun7i-a20-gmac.yaml
index 06b1cc8bea14..ef446ae166f3 100644
--- a/Documentation/devicetree/bindings/net/allwinner,sun7i-a20-gmac.yaml
+++ b/Documentation/devicetree/bindings/net/allwinner,sun7i-a20-gmac.yaml
@@ -17,6 +17,9 @@ properties:
   compatible:
     const: allwinner,sun7i-a20-gmac
 
+  reg:
+    maxItems: 1
+
   interrupts:
     maxItems: 1
 
diff --git a/Documentation/devicetree/bindings/net/snps,dwmac.yaml b/Documentation/devicetree/bindings/net/snps,dwmac.yaml
index 76fea2be66ac..4377f511a51d 100644
--- a/Documentation/devicetree/bindings/net/snps,dwmac.yaml
+++ b/Documentation/devicetree/bindings/net/snps,dwmac.yaml
@@ -61,7 +61,8 @@ properties:
         - snps,dwxgmac-2.10
 
   reg:
-    maxItems: 1
+    minItems: 1
+    maxItems: 2
 
   interrupts:
     minItems: 1
-- 
2.22.0


^ permalink raw reply related

* [PATCH v3 0/2] dt-bindings: net: meson-dwmac: convert to yaml
From: Neil Armstrong @ 2019-08-08 11:40 UTC (permalink / raw)
  To: robh+dt
  Cc: Neil Armstrong, martin.blumenstingl, devicetree, netdev,
	linux-amlogic, linux-arm-kernel, linux-kernel

This patchset converts the Amlogic Meson DWMAC glue bindings over to
YAML schemas using the already converted dwmac bindings.

The first patch is needed because the Amlogic glue needs a supplementary
reg cell to access the DWMAC glue registers.

Changes since v2:
- Added review tags
- Updated allwinner,sun7i-a20-gmac.yaml reg maxItems

Neil Armstrong (2):
  dt-bindings: net: snps,dwmac: update reg minItems maxItems
  dt-bindings: net: meson-dwmac: convert to yaml

 .../net/allwinner,sun7i-a20-gmac.yaml         |   3 +
 .../bindings/net/amlogic,meson-dwmac.yaml     | 113 ++++++++++++++++++
 .../devicetree/bindings/net/meson-dwmac.txt   |  71 -----------
 .../devicetree/bindings/net/snps,dwmac.yaml   |   8 +-
 4 files changed, 123 insertions(+), 72 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/net/amlogic,meson-dwmac.yaml
 delete mode 100644 Documentation/devicetree/bindings/net/meson-dwmac.txt

-- 
2.22.0


^ permalink raw reply

* RE: Clause 73 and USXGMII
From: Jose Abreu @ 2019-08-08 11:45 UTC (permalink / raw)
  To: Russell King - ARM Linux admin
  Cc: netdev@vger.kernel.org, Andrew Lunn, Florian Fainelli,
	Heiner Kallweit
In-Reply-To: <20190808092313.GC5193@shell.armlinux.org.uk>

From: Russell King - ARM Linux admin <linux@armlinux.org.uk>
Date: Aug/08/2019, 10:23:13 (UTC+00:00)

> On Thu, Aug 08, 2019 at 09:02:57AM +0000, Jose Abreu wrote:
> > From: Russell King - ARM Linux admin <linux@armlinux.org.uk>
> > Date: Aug/08/2019, 09:26:26 (UTC+00:00)
> > 
> > > Hi,
> > > 
> > > Have you tried enabling debug mode in phylink (add #define DEBUG at the
> > > top of the file) ?
> > 
> > Yes:
> > 
> > [ With > 2.5G modes removed ]
> > # dmesg | grep -i phy
> > libphy: stmmac: probed
> > stmmaceth 0000:04:00.0 enp4s0: PHY [stmmac-1:00] driver [Synopsys 10G]
> > stmmaceth 0000:04:00.0 enp4s0: phy: setting supported 
> > 00,00000000,0002e040 advertising 00,00000000,0002e040
> > stmmaceth 0000:04:00.0 enp4s0: configuring for phy/usxgmii link mode
> > stmmaceth 0000:04:00.0 enp4s0: phylink_mac_config: 
> > mode=phy/usxgmii/Unknown/Unknown adv=00,00000000,0002e040 pause=10 
> > link=0 an=1
> > stmmaceth 0000:04:00.0 enp4s0: phy link down usxgmii/Unknown/Unknown
> 
> This shows that the PHY isn't reporting that the link came up.  Did
> the PHY negotiate link?  If so, why isn't it reporting that the link
> came up?  Maybe something is mis-programming the capability bits in
> the PHY?  Maybe disabling the 10G speeds disables everything faster
> than 1G?

Autoneg was started but never finishes, and disabling the 10G modes is
what causes autoneg to fail.

> 
> > [ Without any limit ]
> > # dmesg | grep -i phy
> > libphy: stmmac: probed
> > stmmaceth 0000:04:00.0 enp4s0: PHY [stmmac-1:00] driver [Synopsys 10G]
> > stmmaceth 0000:04:00.0 enp4s0: phy: setting supported 
> > 00,00000000,000ee040 advertising 00,00000000,000ee040
> > stmmaceth 0000:04:00.0 enp4s0: configuring for phy/usxgmii link mode
> > stmmaceth 0000:04:00.0 enp4s0: phylink_mac_config: 
> > mode=phy/usxgmii/Unknown/Unknown adv=00,00000000,000ee040 pause=10 
> > link=0 an=1
> > stmmaceth 0000:04:00.0 enp4s0: phy link down usxgmii/Unknown/Unknown
> > stmmaceth 0000:04:00.0 enp4s0: phy link up usxgmii/2.5Gbps/Full
> > stmmaceth 0000:04:00.0 enp4s0: phylink_mac_config: 
> > mode=phy/usxgmii/2.5Gbps/Full adv=00,00000000,00000000 pause=0f link=1 
> > an=0
> > 
> > I'm thinking on whether this can be related with USXGMII. As link is 
> > operating in 10G but I configure USXGMII for 2.5G maybe autoneg outcome 
> > should always be 10G ?
> 
> As I understand USXGMII (which isn't very well, because the spec isn't
> available) I believe that it operates in a similar way to SGMII where
> data is replicated the appropriate number of times to achieve the link
> speed.  So, the USXGMII link always operates at a bit rate equivalent
> to 10G, but data is replicated twice for 5G, four times for 2.5G, ten
> times for 1G, etc.
> 
> I notice that you don't say that you support any copper speeds, which
> brings up the question about what the PHY's media is...

I just added the speeds that XPCS supports within Clause 73 
specification:
Technology Ability field. Indicates the supported technologies:
	A0: When this bit is set to 1, the 1000BASE-KX technology is supported
	A1: When this bit is set to 1, the 10GBASE-KX4 technology is supported
	A2: When this bit is set to 1, the 10GBASE-KR technology is supported
	A11: When this bit is set to 1, the 2.5GBASE-KX technology is supported
	A12: When this bit is set to 1, the 5GBASE-KR technology is supported

And, within USXGMII, XPCS supports the following:
	Single Port: 10G-SXGMII, 5G-SXGMII, 2.5G-SXGMII
	Dual Port: 10G-DXGMII, 5G-DXGMII
	Quad Port: 10G-XGMII

My HW is currently fixed for USXGMII at 2.5G.
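
To make the replication scheme Russell describes above concrete, here is a
rough sketch. It assumes, as he says, that the USXGMII link always runs at a
bit rate equivalent to 10G and that data is repeated to fill it. The helper
name is hypothetical and this is not taken from any driver or from the spec:

```c
/*
 * Number of times each data unit would be repeated on a 10G-equivalent
 * link to carry traffic at speed_mbps.
 * Returns -1 for speeds that do not divide the line rate evenly.
 */
static int usxgmii_replication(unsigned int speed_mbps)
{
	const unsigned int line_rate_mbps = 10000;

	if (speed_mbps == 0 || speed_mbps > line_rate_mbps ||
	    line_rate_mbps % speed_mbps != 0)
		return -1;

	return (int)(line_rate_mbps / speed_mbps);
}
```

Under that assumption 10G maps to 1 (no replication), 5G to 2, 2.5G to 4 and
1G to 10, which matches the factors quoted above.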

> 
> > > On Thu, Aug 08, 2019 at 08:17:29AM +0000, Jose Abreu wrote:
> > > > ++ PHY Experts
> > > > 
> > > > From: Jose Abreu <joabreu@synopsys.com>
> > > > Date: Aug/07/2019, 16:46:23 (UTC+00:00)
> > > > 
> > > > > Hello,
> > > > > 
> > > > > I've some sample code for Clause 73 support using Synopsys based XPCS 
> > > > > but I would like to clarify some things that I noticed.
> > > > > 
> > > > > I'm using USXGMII as interface and a single SERDES that operates at 10G 
> > > > > rate but MAC side is working at 2.5G. Maximum available bandwidth is 
> > > > > therefore 2.5Gbps.
> > > > > 
> > > > > So, I configure USXGMII for 2.5G mode and it works but if I try to limit 
> > > > > the autoneg abilities to 2.5G max then it never finishes:
> > > > > # ethtool enp4s0
> > > > > Settings for enp4s0:
> > > > > 	Supported ports: [ ]
> > > > > 	Supported link modes:   1000baseKX/Full 
> > > > > 	                        2500baseX/Full 
> > > > > 	Supported pause frame use: Symmetric Receive-only
> > > > > 	Supports auto-negotiation: Yes
> > > > > 	Supported FEC modes: Not reported
> > > > > 	Advertised link modes:  1000baseKX/Full 
> > > > > 	                        2500baseX/Full 
> > > > > 	Advertised pause frame use: Symmetric Receive-only
> > > > > 	Advertised auto-negotiation: Yes
> > > > > 	Advertised FEC modes: Not reported
> > > > > 	Speed: Unknown!
> > > > > 	Duplex: Unknown! (255)
> > > > > 	Port: MII
> > > > > 	PHYAD: 0
> > > > > 	Transceiver: internal
> > > > > 	Auto-negotiation: on
> > > > > 	Supports Wake-on: ug
> > > > > 	Wake-on: d
> > > > > 	Current message level: 0x0000003f (63)
> > > > > 			       drv probe link timer ifdown ifup
> > > > > 	Link detected: no
> > > > > 
> > > > > When I do not limit autoneg and I say that maximum limit is 10G then I 
> > > > > get Link Up and autoneg finishes with this outcome:
> > > > > # ethtool enp4s0
> > > > > Settings for enp4s0:
> > > > > 	Supported ports: [ ]
> > > > > 	Supported link modes:   1000baseKX/Full 
> > > > > 	                        2500baseX/Full 
> > > > > 	                        10000baseKX4/Full 
> > > > > 	                        10000baseKR/Full 
> > > > > 	Supported pause frame use: Symmetric Receive-only
> > > > > 	Supports auto-negotiation: Yes
> > > > > 	Supported FEC modes: Not reported
> > > > > 	Advertised link modes:  1000baseKX/Full 
> > > > > 	                        2500baseX/Full 
> > > > > 	                        10000baseKX4/Full 
> > > > > 	                        10000baseKR/Full 
> > > > > 	Advertised pause frame use: Symmetric Receive-only
> > > > > 	Advertised auto-negotiation: Yes
> > > > > 	Advertised FEC modes: Not reported
> > > > > 	Link partner advertised link modes:  1000baseKX/Full 
> > > > > 	                                     2500baseX/Full 
> > > > > 	                                     10000baseKX4/Full 
> > > > > 	                                     10000baseKR/Full 
> > > > > 	Link partner advertised pause frame use: Symmetric Receive-only
> > > > > 	Link partner advertised auto-negotiation: Yes
> > > > > 	Link partner advertised FEC modes: Not reported
> > > > > 	Speed: 2500Mb/s
> > > > > 	Duplex: Full
> > > > > 	Port: MII <- Never mind this, it's a SW issue
> > > > > 	PHYAD: 0
> > > > > 	Transceiver: internal
> > > > > 	Auto-negotiation: on
> > > > > 	Supports Wake-on: ug
> > > > > 	Wake-on: d
> > > > > 	Current message level: 0x0000003f (63)
> > > > > 			       drv probe link timer ifdown ifup
> > > > > 	Link detected: yes
> > > > > 
> > > > > I was expecting that, as MAC side is limited to 2.5G, I should set in 
> > > > > phylink the correct capabilities and then outcome of autoneg would only 
> > > > > have up to 2.5G modes. Am I wrong ?
> > > > > 
> > > > > ---
> > > > > Thanks,
> > > > > Jose Miguel Abreu
> > > > 
> > > > 
> > > > ---
> > > > Thanks,
> > > > Jose Miguel Abreu
> > > > 
> > > 
> > > -- 
> > > RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
> > > FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
> > > According to speedtest.net: 11.9Mbps down 500kbps up
> > 
> > 
> > ---
> > Thanks,
> > Jose Miguel Abreu
> > 
> 
> -- 
> RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
> FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
> According to speedtest.net: 11.9Mbps down 500kbps up


---
Thanks,
Jose Miguel Abreu

^ permalink raw reply

* Re: [PATCH net-next 5/5] r8152: change rx_frag_head_sz and rx_max_agg_num dynamically
From: Maciej Fijalkowski @ 2019-08-08 11:49 UTC (permalink / raw)
  To: Hayes Wang
  Cc: Jakub Kicinski, netdev@vger.kernel.org, nic_swsd,
	linux-kernel@vger.kernel.org, linux-usb@vger.kernel.org
In-Reply-To: <0835B3720019904CB8F7AA43166CEEB2F18D0D8E@RTITMBSVM03.realtek.com.tw>

On Thu, 8 Aug 2019 08:52:51 +0000
Hayes Wang <hayeswang@realtek.com> wrote:

> Jakub Kicinski [mailto:jakub.kicinski@netronome.com]
> > Sent: Wednesday, August 07, 2019 6:10 AM  
> [...]
> > On Tue, 6 Aug 2019 19:18:04 +0800, Hayes Wang wrote:  
> > > Let rx_frag_head_sz and rx_max_agg_num could be modified dynamically
> > > through the sysfs.
> > >
> > > Signed-off-by: Hayes Wang <hayeswang@realtek.com>  
> > 
> > Please don't expose those via sysfs. Ethtool's copybreak and descriptor
> > count should be applicable here, I think.  
> 
> Excuse me again.
> I find the kernel supports the copybreak of Ethtool.
> However, I couldn't find a command of Ethtool to use it.

Ummm there's set_tunable ops. Amazon's ena driver is making use of it from what
I see. Look at ena_set_tunable() in
drivers/net/ethernet/amazon/ena/ena_ethtool.c.

Maciej

> Do I miss something?
> 
> Best Regards,
> Hayes
> 


^ permalink raw reply

* [PATCH net v2 0/2] Fix collisions in socket cookie generation
From: Daniel Borkmann @ 2019-08-08 11:57 UTC (permalink / raw)
  To: davem; +Cc: netdev, bpf, m, edumazet, ast, willemb, Daniel Borkmann

This change makes the socket cookie generator a global counter
instead of per netns in order to fix cookie collisions for BPF use
cases we ran into. See main patch #1 for more details.
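
For readers without net/core/sock_diag.c at hand, the lazy cookie assignment
can be modelled in userspace roughly as follows. This is only a sketch: C11
atomics stand in for the kernel's atomic64_t helpers and the toy_* names are
made up. The point is that a single global counter hands out each value at
most once across all sockets, whichever namespace they live in:

```c
#include <stdatomic.h>
#include <stdint.h>

/* Toy stand-in for struct sock; only the cookie field is modelled. */
struct toy_sock {
	_Atomic uint64_t cookie;
};

/* One generator shared by every socket (previously one per netns). */
static _Atomic uint64_t toy_cookie_gen;

static uint64_t toy_sock_gen_cookie(struct toy_sock *sk)
{
	for (;;) {
		uint64_t res = atomic_load(&sk->cookie);
		uint64_t expected = 0;

		/* A cookie, once set, stays stable for the socket. */
		if (res)
			return res;

		/* Take the next value; install it only if still unset. */
		res = atomic_fetch_add(&toy_cookie_gen, 1) + 1;
		atomic_compare_exchange_strong(&sk->cookie, &expected, res);
	}
}
```

If two callers race on the same socket, one compare-and-exchange loses and
that value is simply skipped; both callers then return the installed cookie
on the next loop iteration.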

Given the change is small/trivial and fixes an issue we're seeing
my preference would be net tree (though it cleanly applies to
net-next as well). Went for net tree instead of bpf tree here given
the main change is in net/core/sock_diag.c, but either way would be
fine with me.

Thanks a lot!

v1 -> v2:
  - Fix up commit description in patch #1, thanks Eric!

Daniel Borkmann (2):
  sock: make cookie generation global instead of per netns
  bpf: sync bpf.h to tools infrastructure

 include/net/net_namespace.h    |  1 -
 include/uapi/linux/bpf.h       |  4 ++--
 net/core/sock_diag.c           |  3 ++-
 tools/include/uapi/linux/bpf.h | 11 +++++++----
 4 files changed, 11 insertions(+), 8 deletions(-)

-- 
2.17.1

^ permalink raw reply

* [PATCH net v2 1/2] sock: make cookie generation global instead of per netns
From: Daniel Borkmann @ 2019-08-08 11:57 UTC (permalink / raw)
  To: davem; +Cc: netdev, bpf, m, edumazet, ast, willemb, Daniel Borkmann
In-Reply-To: <20190808115726.31703-1-daniel@iogearbox.net>

Generating and retrieving socket cookies is a useful feature that is
exposed to BPF for various program types through the
bpf_get_socket_cookie() helper.

The fact that the cookie counter is per netns is quite a limitation
for BPF in practice in particular for programs in host namespace that
use socket cookies as part of a map lookup key since they will be
causing socket cookie collisions e.g. when attached to BPF cgroup hooks
or cls_bpf on tc egress in host namespace handling container traffic
from veth or ipvlan devices with peer in different netns. Change the
counter to be global instead.

Socket cookie consumers must assume the value as opaque in any case.
Not every socket must have a cookie generated and knowledge of the
counter value itself does not provide much value either way hence
conversion to global is fine.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Martynas Pumputis <m@lambda.lt>
---
 include/net/net_namespace.h | 1 -
 include/uapi/linux/bpf.h    | 4 ++--
 net/core/sock_diag.c        | 3 ++-
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 4a9da951a794..cb668bc2692d 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -61,7 +61,6 @@ struct net {
 	spinlock_t		rules_mod_lock;
 
 	u32			hash_mix;
-	atomic64_t		cookie_gen;
 
 	struct list_head	list;		/* list of network namespaces */
 	struct list_head	exit_list;	/* To linked to call pernet exit
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index fa1c753dcdbc..a5aa7d3ac6a1 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1466,8 +1466,8 @@ union bpf_attr {
  * 		If no cookie has been set yet, generate a new cookie. Once
  * 		generated, the socket cookie remains stable for the life of the
  * 		socket. This helper can be useful for monitoring per socket
- * 		networking traffic statistics as it provides a unique socket
- * 		identifier per namespace.
+ * 		networking traffic statistics as it provides a global socket
+ * 		identifier that can be assumed unique.
  * 	Return
  * 		A 8-byte long non-decreasing number on success, or 0 if the
  * 		socket field is missing inside *skb*.
diff --git a/net/core/sock_diag.c b/net/core/sock_diag.c
index 3312a5849a97..c13ffbd33d8d 100644
--- a/net/core/sock_diag.c
+++ b/net/core/sock_diag.c
@@ -19,6 +19,7 @@ static const struct sock_diag_handler *sock_diag_handlers[AF_MAX];
 static int (*inet_rcv_compat)(struct sk_buff *skb, struct nlmsghdr *nlh);
 static DEFINE_MUTEX(sock_diag_table_mutex);
 static struct workqueue_struct *broadcast_wq;
+static atomic64_t cookie_gen;
 
 u64 sock_gen_cookie(struct sock *sk)
 {
@@ -27,7 +28,7 @@ u64 sock_gen_cookie(struct sock *sk)
 
 		if (res)
 			return res;
-		res = atomic64_inc_return(&sock_net(sk)->cookie_gen);
+		res = atomic64_inc_return(&cookie_gen);
 		atomic64_cmpxchg(&sk->sk_cookie, 0, res);
 	}
 }
-- 
2.17.1


^ permalink raw reply related

* Re: [PATCH v2 2/2] tcp: Update TCP_BASE_MSS comment
From: Neal Cardwell @ 2019-08-08 11:58 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Josh Hunt, Netdev, David Miller, Eric Dumazet
In-Reply-To: <c1c2febd-742d-4e13-af9f-a7d7ec936ed9@gmail.com>

On Thu, Aug 8, 2019 at 2:13 AM Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On 8/8/19 1:52 AM, Josh Hunt wrote:
> > TCP_BASE_MSS is used as the default initial MSS value when MTU probing is
> > enabled. Update the comment to reflect this.
> >
> > Suggested-by: Neal Cardwell <ncardwell@google.com>
> > Signed-off-by: Josh Hunt <johunt@akamai.com>
> > ---
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Acked-by: Neal Cardwell <ncardwell@google.com>

Thanks, Josh!

neal

^ permalink raw reply
