Netdev List


* [PATCH bpf-next v6 4/8] samples/bpf: Extend RLIMIT_MEMLOCK for xdp_{sample_pkts, router_ipv4}
From: Maciej Fijalkowski @ 2019-02-01 21:42 UTC
  To: daniel, ast; +Cc: netdev, jakub.kicinski, brouer, john.fastabend
In-Reply-To: <20190201214230.1441-1-maciej.fijalkowski@intel.com>

A common problem with XDP samples occurs when a user wants to run a
particular sample while some BPF program is already loaded. The default
64 KiB RLIMIT_MEMLOCK resource limit then causes the following error
(assuming the failing XDP sample has been converted to libbpf usage):

libbpf: Error in bpf_object__probe_name():Operation not permitted(1).
Couldn't load basic 'r0 = 0' BPF program.
libbpf: failed to load object './xdp_sample_pkts_kern.o'

Fix it in xdp_sample_pkts and xdp_router_ipv4 by setting RLIMIT_MEMLOCK
to RLIM_INFINITY.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
---
 samples/bpf/xdp_router_ipv4_user.c | 7 +++++++
 samples/bpf/xdp_sample_pkts_user.c | 7 +++++++
 2 files changed, 14 insertions(+)

diff --git a/samples/bpf/xdp_router_ipv4_user.c b/samples/bpf/xdp_router_ipv4_user.c
index cea2306f5ab7..c63c6beec7d6 100644
--- a/samples/bpf/xdp_router_ipv4_user.c
+++ b/samples/bpf/xdp_router_ipv4_user.c
@@ -25,6 +25,7 @@
 #include <sys/syscall.h>
 #include "bpf_util.h"
 #include "bpf/libbpf.h"
+#include <sys/resource.h>
 
 int sock, sock_arp, flags = 0;
 static int total_ifindex;
@@ -609,6 +610,7 @@ static int monitor_route(void)
 
 int main(int ac, char **argv)
 {
+	struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
 	struct bpf_prog_load_attr prog_load_attr = {
 		.prog_type	= BPF_PROG_TYPE_XDP,
 	};
@@ -635,6 +637,11 @@ int main(int ac, char **argv)
 		ifname_list = (argv + 1);
 	}
 
+	if (setrlimit(RLIMIT_MEMLOCK, &r)) {
+		perror("setrlimit(RLIMIT_MEMLOCK)");
+		return 1;
+	}
+
 	if (bpf_prog_load_xattr(&prog_load_attr, &obj, &prog_fd))
 		return 1;
 
diff --git a/samples/bpf/xdp_sample_pkts_user.c b/samples/bpf/xdp_sample_pkts_user.c
index 8dd87c1eb560..5f5828ee0761 100644
--- a/samples/bpf/xdp_sample_pkts_user.c
+++ b/samples/bpf/xdp_sample_pkts_user.c
@@ -12,6 +12,7 @@
 #include <signal.h>
 #include <libbpf.h>
 #include <bpf/bpf.h>
+#include <sys/resource.h>
 
 #include "perf-sys.h"
 #include "trace_helpers.h"
@@ -99,6 +100,7 @@ static void sig_handler(int signo)
 
 int main(int argc, char **argv)
 {
+	struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
 	struct bpf_prog_load_attr prog_load_attr = {
 		.prog_type	= BPF_PROG_TYPE_XDP,
 	};
@@ -114,6 +116,11 @@ int main(int argc, char **argv)
 		return 1;
 	}
 
+	if (setrlimit(RLIMIT_MEMLOCK, &r)) {
+		perror("setrlimit(RLIMIT_MEMLOCK)");
+		return 1;
+	}
+
 	numcpus = get_nprocs();
 	if (numcpus > MAX_CPUS)
 		numcpus = MAX_CPUS;
-- 
2.16.1
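
As an illustration of the RLIMIT_MEMLOCK change in the patch above, here is a
minimal sketch of the same setrlimit() call wrapped in a helper. The helper
name bump_memlock_rlimit() and the fallback to the hard limit are assumptions
of this sketch; the patch itself only requests RLIM_INFINITY and bails out on
failure.

```c
#include <stdio.h>
#include <sys/resource.h>

/*
 * Try to lift RLIMIT_MEMLOCK entirely, as the patch does. If that is not
 * permitted (for example without CAP_SYS_RESOURCE), fall back to raising
 * the soft limit to the current hard limit. The fallback is an assumption
 * of this sketch and is not part of the patch.
 */
static int bump_memlock_rlimit(void)
{
	struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};

	if (!setrlimit(RLIMIT_MEMLOCK, &r))
		return 0;

	/* Could not go to infinity: read the limits we are allowed to use. */
	if (getrlimit(RLIMIT_MEMLOCK, &r))
		return -1;

	r.rlim_cur = r.rlim_max;
	if (setrlimit(RLIMIT_MEMLOCK, &r)) {
		perror("setrlimit(RLIMIT_MEMLOCK)");
		return -1;
	}
	return 0;
}
```

A sample would call such a helper before loading any BPF object, which is
where the patch places its setrlimit() call as well.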



* [PATCH bpf-next v6 2/8] samples/bpf: xdp_redirect_cpu have not need for read_trace_pipe
From: Maciej Fijalkowski @ 2019-02-01 21:42 UTC
  To: daniel, ast; +Cc: netdev, jakub.kicinski, brouer, john.fastabend
In-Reply-To: <20190201214230.1441-1-maciej.fijalkowski@intel.com>

From: Jesper Dangaard Brouer <brouer@redhat.com>

The sample xdp_redirect_cpu does not use the bpf_trace_printk helper.
Thus it makes no sense for the --debug option to read
from /sys/kernel/debug/tracing/trace_pipe via read_trace_pipe.
Simply remove it.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
---
 samples/bpf/xdp_redirect_cpu_user.c | 10 ----------
 1 file changed, 10 deletions(-)

diff --git a/samples/bpf/xdp_redirect_cpu_user.c b/samples/bpf/xdp_redirect_cpu_user.c
index 2d23054aaccf..f141e752ca0a 100644
--- a/samples/bpf/xdp_redirect_cpu_user.c
+++ b/samples/bpf/xdp_redirect_cpu_user.c
@@ -51,7 +51,6 @@ static const struct option long_options[] = {
 	{"help",	no_argument,		NULL, 'h' },
 	{"dev",		required_argument,	NULL, 'd' },
 	{"skb-mode",	no_argument,		NULL, 'S' },
-	{"debug",	no_argument,		NULL, 'D' },
 	{"sec",		required_argument,	NULL, 's' },
 	{"prognum",	required_argument,	NULL, 'p' },
 	{"qsize",	required_argument,	NULL, 'q' },
@@ -563,7 +562,6 @@ int main(int argc, char **argv)
 	bool use_separators = true;
 	bool stress_mode = false;
 	char filename[256];
-	bool debug = false;
 	int added_cpus = 0;
 	int longindex = 0;
 	int interval = 2;
@@ -624,9 +622,6 @@ int main(int argc, char **argv)
 		case 'S':
 			xdp_flags |= XDP_FLAGS_SKB_MODE;
 			break;
-		case 'D':
-			debug = true;
-			break;
 		case 'x':
 			stress_mode = true;
 			break;
@@ -688,11 +683,6 @@ int main(int argc, char **argv)
 		return EXIT_FAIL_XDP;
 	}
 
-	if (debug) {
-		printf("Debug-mode reading trace pipe (fix #define DEBUG)\n");
-		read_trace_pipe();
-	}
-
 	stats_poll(interval, use_separators, prog_num, stress_mode);
 	return EXIT_OK;
 }
-- 
2.16.1



* [PATCH bpf-next v6 1/8] libbpf: Add a helper for retrieving a map fd for a given name
From: Maciej Fijalkowski @ 2019-02-01 21:42 UTC
  To: daniel, ast; +Cc: netdev, jakub.kicinski, brouer, john.fastabend
In-Reply-To: <20190201214230.1441-1-maciej.fijalkowski@intel.com>

XDP samples mostly interact with eBPF maps through their file
descriptors. For an eBPF program that contains multiple maps, it
can be tiresome to iterate through them and call bpf_map__fd for each
one. Add a helper based mostly on bpf_object__find_map_by_name that
returns the map fd instead of the struct bpf_map pointer.

Suggested-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
---
 tools/lib/bpf/libbpf.c   | 6 ++++++
 tools/lib/bpf/libbpf.h   | 3 +++
 tools/lib/bpf/libbpf.map | 1 +
 3 files changed, 10 insertions(+)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 2ccde17957e6..03bc01ca2577 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -2884,6 +2884,12 @@ bpf_object__find_map_by_name(struct bpf_object *obj, const char *name)
 	return NULL;
 }
 
+int
+bpf_object__find_map_fd_by_name(struct bpf_object *obj, const char *name)
+{
+	return bpf_map__fd(bpf_object__find_map_by_name(obj, name));
+}
+
 struct bpf_map *
 bpf_object__find_map_by_offset(struct bpf_object *obj, size_t offset)
 {
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index 62ae6cb93da1..931be6f3408c 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -264,6 +264,9 @@ struct bpf_map;
 LIBBPF_API struct bpf_map *
 bpf_object__find_map_by_name(struct bpf_object *obj, const char *name);
 
+LIBBPF_API int
+bpf_object__find_map_fd_by_name(struct bpf_object *obj, const char *name);
+
 /*
  * Get bpf_map through the offset of corresponding struct bpf_map_def
  * in the BPF object file.
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index f6f96fc38c50..43ba9bb8d24b 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -131,4 +131,5 @@ LIBBPF_0.0.2 {
 		bpf_probe_map_type;
 		bpf_probe_prog_type;
 		bpf_map_lookup_elem_flags;
+		bpf_object__find_map_fd_by_name;
 } LIBBPF_0.0.1;
-- 
2.16.1
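
The composition used by the new helper in the patch above can be sketched
without libbpf. Everything prefixed with toy_ below is hypothetical and only
models the idea: a lookup that may return NULL is fed straight into an fd
getter, which is assumed to turn NULL into a negative error (libbpf's
bpf_map__fd is assumed here to behave that way).

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for struct bpf_map and struct bpf_object. */
struct toy_map {
	const char *name;
	int fd;
};

struct toy_object {
	const struct toy_map *maps;
	size_t nr_maps;
};

/* Linear search by name; returns NULL when no map matches. */
static const struct toy_map *
toy_object__find_map_by_name(const struct toy_object *obj, const char *name)
{
	for (size_t i = 0; i < obj->nr_maps; i++)
		if (!strcmp(obj->maps[i].name, name))
			return &obj->maps[i];
	return NULL;
}

/* NULL-safe fd getter: a missing map becomes -EINVAL. */
static int toy_map__fd(const struct toy_map *map)
{
	return map ? map->fd : -EINVAL;
}

/* Same shape as the helper in the patch: lookup, then fd. */
static int
toy_object__find_map_fd_by_name(const struct toy_object *obj, const char *name)
{
	return toy_map__fd(toy_object__find_map_by_name(obj, name));
}
```

Because the getter tolerates NULL, a caller only has to check the returned
fd for a negative value instead of checking the map pointer first.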



* [PATCH bpf-next v6 0/8] xdp: Avoid unloading xdp prog not attached by sample
From: Maciej Fijalkowski @ 2019-02-01 21:42 UTC
  To: daniel, ast; +Cc: netdev, jakub.kicinski, brouer, john.fastabend

Hi!
This patchset tries to address the situation where:
* user loads a particular xdp sample application that does stats polling
* user loads another sample application on the same interface
* then, user sends SIGINT/SIGTERM to the app that was attached first
* second application ends up with an unloaded xdp program

1st patch contains a helper libbpf function for getting the map fd by a
given map name.
In patch 2 Jesper removes the read_trace_pipe usage from xdp_redirect_cpu which
was a blocker for converting this sample to libbpf usage.
3rd patch updates a bunch of xdp samples to make use of libbpf.
Patch 4 adjusts RLIMIT_MEMLOCK for two samples touched in this patchset.
In patch 5 extack messages are added for cases where dev_change_xdp_fd returns
with an error, so the user has an idea why the xdp program could not be
attached to the interface.
Patch 6 makes the samples behavior similar to what iproute2 does when loading
xdp prog - the "force" flag is introduced.
Patch 7 introduces the libbpf function that will query the driver from
userspace about the currently attached xdp prog id.

Use it in samples that do polling by checking the prog id in signal handler
and comparing it with previously stored one which is the scope of patch 8.

Thanks!

v1->v2:
* add a libbpf helper for getting a prog via relative index
* include xdp_redirect_cpu into conversion

v2->v3: mostly addressing Daniel's/Jesper's comments
* get rid of the helper from v1->v2
* feed the xdp_redirect_cpu with program name instead of number

v3->v4:
* fix help message in xdp_sample_pkts

v4->v5:
* in get_link_xdp_fd, assign prog_id only when libbpf_nl_get_link returned
  with 0
* add extack messages in dev_change_xdp_fd
* check the return value of bpf_get_link_xdp_id when exiting from sample progs

v5->v6:
* rebase

Jesper Dangaard Brouer (1):
  samples/bpf: xdp_redirect_cpu have not need for read_trace_pipe

Maciej Fijalkowski (7):
  libbpf: Add a helper for retrieving a map fd for a given name
  samples/bpf: Convert XDP samples to libbpf usage
  samples/bpf: Extend RLIMIT_MEMLOCK for xdp_{sample_pkts, router_ipv4}
  xdp: Provide extack messages when prog attachment failed
  samples/bpf: Add a "force" flag to XDP samples
  libbpf: Add a support for getting xdp prog id on ifindex
  samples/bpf: Check the prog id before exiting

 net/core/dev.c                      |  12 ++-
 samples/bpf/Makefile                |   8 +-
 samples/bpf/xdp1_user.c             |  34 ++++++-
 samples/bpf/xdp_adjust_tail_user.c  |  38 +++++--
 samples/bpf/xdp_redirect_cpu_user.c | 196 +++++++++++++++++++++++++-----------
 samples/bpf/xdp_redirect_map_user.c | 106 +++++++++++++++----
 samples/bpf/xdp_redirect_user.c     | 103 ++++++++++++++++---
 samples/bpf/xdp_router_ipv4_user.c  | 179 +++++++++++++++++++++++---------
 samples/bpf/xdp_rxq_info_user.c     |  41 ++++++--
 samples/bpf/xdp_sample_pkts_user.c  |  81 ++++++++++++---
 samples/bpf/xdp_tx_iptunnel_user.c  |  71 ++++++++++---
 samples/bpf/xdpsock_user.c          |  30 +++++-
 tools/lib/bpf/libbpf.c              |   6 ++
 tools/lib/bpf/libbpf.h              |   4 +
 tools/lib/bpf/libbpf.map            |   2 +
 tools/lib/bpf/netlink.c             |  85 ++++++++++++++++
 16 files changed, 796 insertions(+), 200 deletions(-)

-- 
2.16.1
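
The scenario from the cover letter above, and the check that patch 8 adds on
exit, can be modelled with a small self-contained sketch. The toy_iface
struct and the toy_* functions are hypothetical; in the real samples the
currently attached id would come from the libbpf query function introduced
in patch 7.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical model of an interface: 0 means no XDP program attached. */
struct toy_iface {
	uint32_t attached_prog_id;
};

/* Attaching replaces whatever was there, like a forced load. */
static void toy_attach(struct toy_iface *ifc, uint32_t prog_id)
{
	ifc->attached_prog_id = prog_id;
}

/*
 * Exit path of a sample: detach only if the program on the interface is
 * still the one this sample attached. Returns 1 if detached, 0 otherwise.
 */
static int toy_detach_if_ours(struct toy_iface *ifc, uint32_t our_prog_id)
{
	uint32_t curr = ifc->attached_prog_id;

	if (curr == our_prog_id) {
		ifc->attached_prog_id = 0;
		return 1;
	}
	if (!curr)
		printf("no program attached to the interface\n");
	else
		printf("program on the interface changed, not removing\n");
	return 0;
}
```

With this check, the first sample receiving SIGINT no longer unloads the
program that the second sample attached in the meantime.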


* Re: [PATCH bpf-next v5 0/8] xdp: Avoid unloading xdp prog not attached by sample
From: Maciej Fijałkowski @ 2019-02-01 21:47 UTC
  To: Daniel Borkmann; +Cc: ast, netdev, jakub.kicinski, brouer, john.fastabend
In-Reply-To: <1221854d-a2ad-cc03-8e72-985a265c49c9@iogearbox.net>

On Fri, 1 Feb 2019 22:23:45 +0100
Daniel Borkmann <daniel@iogearbox.net> wrote:

> On 02/01/2019 01:19 AM, Maciej Fijalkowski wrote:
> > Hi!
> > This patchset tries to address the situation where:
> > * user loads a particular xdp sample application that does stats polling
> > * user loads another sample application on the same interface
> > * then, user sends SIGINT/SIGTERM to the app that was attached as a first
> > one
> > * second application ends up with an unloaded xdp program
> > 
> > 1st patch contains a helper libbpf function for getting the map fd by a
> > given map name.
> > In patch 2 Jesper removes the read_trace_pipe usage from xdp_redirect_cpu
> > which was a blocker for converting this sample to libbpf usage.
> > 3rd patch updates a bunch of xdp samples to make the use of libbpf.
> > Patch 4 adjusts RLIMIT_MEMLOCK for two samples touched in this patchset.
> > In patch 5 extack messages are added for cases where dev_change_xdp_fd
> > returns with an error so user has an idea what was the reason for not
> > attaching the xdp program onto interface.
> > Patch 6 makes the samples behavior similar to what iproute2 does when
> > loading xdp prog - the "force" flag is introduced.
> > Patch 7 introduces the libbpf function that will query the driver from
> > userspace about the currently attached xdp prog id.
> > 
> > Use it in samples that do polling by checking the prog id in signal handler
> > and comparing it with previously stored one which is the scope of patch 8.
> > 
> > Thanks!
> > 
> > v1->v2:
> > * add a libbpf helper for getting a prog via relative index
> > * include xdp_redirect_cpu into conversion
> > 
> > v2->v3: mostly addressing Daniel's/Jesper's comments
> > * get rid of the helper from v1->v2
> > * feed the xdp_redirect_cpu with program name instead of number
> > 
> > v3->v4:
> > * fix help message in xdp_sample_pkts
> > 
> > v4->v5:
> > * in get_link_xdp_fd, assign prog_id only when libbpf_nl_get_link returned
> >   with 0
> > * add extack messages in dev_change_xdp_fd
> > * check the return value of bpf_get_link_xdp_id when exiting from sample
> > progs
> 
> Series looks good to me, but doesn't apply cleanly, please rebase.
> 
> Thanks,
> Daniel

Sure, sending v6 in a minute.


* Re: Co-existing XDP generic and native mode? (Re: [PATCH bpf-next v5 5/8] xdp: Provide extack messages when prog attachment failed)
From: Jakub Kicinski @ 2019-02-01 21:44 UTC
  To: Daniel Borkmann
  Cc: Jesper Dangaard Brouer, ast, David Miller, Maciej Fijalkowski,
	netdev, john.fastabend, David Ahern, Saeed Mahameed
In-Reply-To: <abc267ff-1c6c-9013-d5b4-628a0bab12f1@iogearbox.net>

On Fri, 1 Feb 2019 22:33:22 +0100, Daniel Borkmann wrote:
> On 02/01/2019 07:47 PM, Jakub Kicinski wrote:
> >> These are only refactor ideas, so if you can argue why your internal
> >> feature request for simultaneous generic and native make more sense,
> >> then I'm open for allowing this ?  
> > 
> > The request was actually to enable xdpoffload and xdpgeneric at the
> > same time.  I'm happy to have that as another HW offload exclusive
> > for now :)  
> 
> The latter is probably fine, though what's the concrete use case? :)

I think it was more of a "I expected this to work, since driver+offload
worked" than a feature request.  Looking back at it, it was filed as a bug
and I converted it to a feature request.

> Reason we kept native vs generic separate is mainly so that native XDP
> drivers are discouraged to punt missing features to generic hook instead
> of properly implementing them in native mode.

Agreed, although one could make a counter argument that the
performance should be a strong enough incentive and we shouldn't stop
people from experimenting and prototyping :)  The code change looks
simple enough:

diff --git a/net/core/dev.c b/net/core/dev.c
index 8e276e0192a1..ce4880e5e95d 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -7976,11 +7976,13 @@ int dev_change_xdp_fd(struct net_device *dev, struct netlink_ext_ack *extack,
        enum bpf_netdev_command query;
        struct bpf_prog *prog = NULL;
        bpf_op_t bpf_op, bpf_chk;
+       bool offload;
        int err;
 
        ASSERT_RTNL();
 
-       query = flags & XDP_FLAGS_HW_MODE ? XDP_QUERY_PROG_HW : XDP_QUERY_PROG;
+       offload = flags & XDP_FLAGS_HW_MODE;
+       query = offload ? XDP_QUERY_PROG_HW : XDP_QUERY_PROG;
 
        bpf_op = bpf_chk = ops->ndo_bpf;
        if (!bpf_op && (flags & (XDP_FLAGS_DRV_MODE | XDP_FLAGS_HW_MODE)))
@@ -7991,8 +7993,7 @@ int dev_change_xdp_fd(struct net_device *dev, struct netlink_ext_ack *extack,
                bpf_chk = generic_xdp_install;
 
        if (fd >= 0) {
-               if (__dev_xdp_query(dev, bpf_chk, XDP_QUERY_PROG) ||
-                   __dev_xdp_query(dev, bpf_chk, XDP_QUERY_PROG_HW))
+               if (!offload && __dev_xdp_query(dev, bpf_chk, XDP_QUERY_PROG))
                        return -EEXIST;
                if ((flags & XDP_FLAGS_UPDATE_IF_NOEXIST) &&
                    __dev_xdp_query(dev, bpf_op, query))
@@ -8003,8 +8004,7 @@ int dev_change_xdp_fd(struct net_device *dev, struct netlink_ext_ack *extack,
                if (IS_ERR(prog))
                        return PTR_ERR(prog);
 
-               if (!(flags & XDP_FLAGS_HW_MODE) &&
-                   bpf_prog_is_dev_bound(prog->aux)) {
+               if (!offload && bpf_prog_is_dev_bound(prog->aux)) {
                        NL_SET_ERR_MSG(extack, "using device-bound program without HW_MODE flag is not supported");
                        bpf_prog_put(prog);
                        return -EINVAL;

Do you think we shouldn't do it?


* Re: [PATCH bpf-next v5 7/8] libbpf: Add a support for getting xdp prog id on ifindex
From: Daniel Borkmann @ 2019-02-01 21:43 UTC
  To: Maciej Fijalkowski, ast; +Cc: netdev, jakub.kicinski, brouer, john.fastabend
In-Reply-To: <20190201001954.4130-8-maciej.fijalkowski@intel.com>

On 02/01/2019 01:19 AM, Maciej Fijalkowski wrote:
> Since we have a dedicated netlink attributes for xdp setup on a
> particular interface, it is now possible to retrieve the program id that
> is currently attached to the interface. The use case is targeted for
> sample xdp programs, which will store the program id just after loading
> bpf program onto iface. On shutdown, the sample will make sure that it
> can unload the program by querying again the iface and verifying that
> both program id's matches.
> 
> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
[...]
> +int bpf_get_link_xdp_id(int ifindex, __u32 *prog_id, __u32 flags)
> +{
> +	struct xdp_id_md xdp_id = {};
> +	int sock, ret;
> +	__u32 nl_pid;
> +	__u32 mask;
> +
> +	if (flags & ~XDP_FLAGS_MASK)
> +		return -EINVAL;
> +
> +	/* Check whether the single {HW,DRV,SKB} mode is set */
> +	flags &= (XDP_FLAGS_SKB_MODE | XDP_FLAGS_DRV_MODE | XDP_FLAGS_HW_MODE);
> +	mask = flags - 1;
> +	if (flags && flags & mask)
> +		return -EINVAL;
> +
> +	sock = libbpf_netlink_open(&nl_pid);
> +	if (sock < 0)
> +		return sock;
> +
> +	xdp_id.ifindex = ifindex;
> +	xdp_id.flags = flags;
> +
> +	ret = libbpf_nl_get_link(sock, nl_pid, get_xdp_id, &xdp_id);
> +	if (!ret)
> +		*prog_id = xdp_id.id;
> +
> +	close(sock);
> +	return ret;
> +}

Btw, is anyone going to follow-up on XDP_ATTACHED_MULTI support as well
later on?

Thanks,
Daniel
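
The mode check in the quoted bpf_get_link_xdp_id() relies on the usual
"flags & (flags - 1)" test for more than one set bit. A minimal standalone
sketch of that test follows; the TOY_XDP_FLAGS_* values are assumed to
mirror the XDP_FLAGS_* mode bits from the uapi header, and the function name
is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror the XDP_FLAGS_* bits in linux/if_link.h. */
#define TOY_XDP_FLAGS_UPDATE_IF_NOEXIST	(1U << 0)
#define TOY_XDP_FLAGS_SKB_MODE		(1U << 1)
#define TOY_XDP_FLAGS_DRV_MODE		(1U << 2)
#define TOY_XDP_FLAGS_HW_MODE		(1U << 3)
#define TOY_XDP_FLAGS_MODES	(TOY_XDP_FLAGS_SKB_MODE | \
				 TOY_XDP_FLAGS_DRV_MODE | \
				 TOY_XDP_FLAGS_HW_MODE)

/* True when zero or exactly one of the {SKB,DRV,HW} mode bits is set. */
static bool toy_at_most_one_mode(uint32_t flags)
{
	uint32_t mask;

	flags &= TOY_XDP_FLAGS_MODES;
	/* Subtracting 1 clears the lowest set bit and sets all bits below. */
	mask = flags - 1;
	/* Any overlap means a second, higher bit was set as well. */
	return !(flags && (flags & mask));
}
```

The explicit "flags &&" guard matters for flags == 0, where flags - 1 wraps
to all ones; without the guard the expression would still be 0, but the
guard makes the "no mode given" case obvious to the reader.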


* Re: Co-existing XDP generic and native mode? (Re: [PATCH bpf-next v5 5/8] xdp: Provide extack messages when prog attachment failed)
From: Daniel Borkmann @ 2019-02-01 21:33 UTC
  To: Jakub Kicinski, Jesper Dangaard Brouer
  Cc: ast, David Miller, Maciej Fijalkowski, netdev, john.fastabend,
	David Ahern, Saeed Mahameed
In-Reply-To: <20190201104738.7a3b33d6@cakuba.hsd1.ca.comcast.net>

On 02/01/2019 07:47 PM, Jakub Kicinski wrote:
[...]
>> These are only refactor ideas, so if you can argue why your internal
>> feature request for simultaneous generic and native make more sense,
>> then I'm open for allowing this ?
> 
> The request was actually to enable xdpoffload and xdpgeneric at the
> same time.  I'm happy to have that as another HW offload exclusive
> for now :)

The latter is probably fine, though what's the concrete use case? :)
Reason we kept native vs generic separate is mainly so that native XDP
drivers are discouraged to punt missing features to generic hook instead
of properly implementing them in native mode.

Thanks,
Daniel


* Re: [PATCH bpf-next v5 0/8] xdp: Avoid unloading xdp prog not attached by sample
From: Daniel Borkmann @ 2019-02-01 21:23 UTC
  To: Maciej Fijalkowski, ast; +Cc: netdev, jakub.kicinski, brouer, john.fastabend
In-Reply-To: <20190201001954.4130-1-maciej.fijalkowski@intel.com>

On 02/01/2019 01:19 AM, Maciej Fijalkowski wrote:
> Hi!
> This patchset tries to address the situation where:
> * user loads a particular xdp sample application that does stats polling
> * user loads another sample application on the same interface
> * then, user sends SIGINT/SIGTERM to the app that was attached as a first one
> * second application ends up with an unloaded xdp program
> 
> 1st patch contains a helper libbpf function for getting the map fd by a
> given map name.
> In patch 2 Jesper removes the read_trace_pipe usage from xdp_redirect_cpu which
> was a blocker for converting this sample to libbpf usage.
> 3rd patch updates a bunch of xdp samples to make the use of libbpf.
> Patch 4 adjusts RLIMIT_MEMLOCK for two samples touched in this patchset.
> In patch 5 extack messages are added for cases where dev_change_xdp_fd returns
> with an error so user has an idea what was the reason for not attaching the
> xdp program onto interface.
> Patch 6 makes the samples behavior similar to what iproute2 does when loading
> xdp prog - the "force" flag is introduced.
> Patch 7 introduces the libbpf function that will query the driver from
> userspace about the currently attached xdp prog id.
> 
> Use it in samples that do polling by checking the prog id in signal handler
> and comparing it with previously stored one which is the scope of patch 8.
> 
> Thanks!
> 
> v1->v2:
> * add a libbpf helper for getting a prog via relative index
> * include xdp_redirect_cpu into conversion
> 
> v2->v3: mostly addressing Daniel's/Jesper's comments
> * get rid of the helper from v1->v2
> * feed the xdp_redirect_cpu with program name instead of number
> 
> v3->v4:
> * fix help message in xdp_sample_pkts
> 
> v4->v5:
> * in get_link_xdp_fd, assign prog_id only when libbpf_nl_get_link returned
>   with 0
> * add extack messages in dev_change_xdp_fd
> * check the return value of bpf_get_link_xdp_id when exiting from sample progs

Series looks good to me, but doesn't apply cleanly, please rebase.

Thanks,
Daniel


* [PATCH net] net: systemport: Fix WoL with password after deep sleep
From: Florian Fainelli @ 2019-02-01 21:23 UTC
  To: netdev; +Cc: davem, Florian Fainelli

Broadcom STB chips support a deep sleep mode where all register
contents are lost. Because we were stashing the MagicPacket password
in some of these registers, suspending into that deep sleep and then
resuming left the device unable to wake up from MagicPacket with
password again.

Fix this by keeping a software copy of the password and programming it
during suspend.

Fixes: 83e82f4c706b ("net: systemport: add Wake-on-LAN support")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 drivers/net/ethernet/broadcom/bcmsysport.c | 25 +++++++++-------------
 drivers/net/ethernet/broadcom/bcmsysport.h |  2 ++
 2 files changed, 12 insertions(+), 15 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bcmsysport.c b/drivers/net/ethernet/broadcom/bcmsysport.c
index f9521d0274b7..28c9b0bdf2f6 100644
--- a/drivers/net/ethernet/broadcom/bcmsysport.c
+++ b/drivers/net/ethernet/broadcom/bcmsysport.c
@@ -520,7 +520,6 @@ static void bcm_sysport_get_wol(struct net_device *dev,
 				struct ethtool_wolinfo *wol)
 {
 	struct bcm_sysport_priv *priv = netdev_priv(dev);
-	u32 reg;
 
 	wol->supported = WAKE_MAGIC | WAKE_MAGICSECURE | WAKE_FILTER;
 	wol->wolopts = priv->wolopts;
@@ -528,11 +527,7 @@ static void bcm_sysport_get_wol(struct net_device *dev,
 	if (!(priv->wolopts & WAKE_MAGICSECURE))
 		return;
 
-	/* Return the programmed SecureOn password */
-	reg = umac_readl(priv, UMAC_PSW_MS);
-	put_unaligned_be16(reg, &wol->sopass[0]);
-	reg = umac_readl(priv, UMAC_PSW_LS);
-	put_unaligned_be32(reg, &wol->sopass[2]);
+	memcpy(wol->sopass, priv->sopass, sizeof(priv->sopass));
 }
 
 static int bcm_sysport_set_wol(struct net_device *dev,
@@ -548,13 +543,8 @@ static int bcm_sysport_set_wol(struct net_device *dev,
 	if (wol->wolopts & ~supported)
 		return -EINVAL;
 
-	/* Program the SecureOn password */
-	if (wol->wolopts & WAKE_MAGICSECURE) {
-		umac_writel(priv, get_unaligned_be16(&wol->sopass[0]),
-			    UMAC_PSW_MS);
-		umac_writel(priv, get_unaligned_be32(&wol->sopass[2]),
-			    UMAC_PSW_LS);
-	}
+	if (wol->wolopts & WAKE_MAGICSECURE)
+		memcpy(priv->sopass, wol->sopass, sizeof(priv->sopass));
 
 	/* Flag the device and relevant IRQ as wakeup capable */
 	if (wol->wolopts) {
@@ -2649,13 +2639,18 @@ static int bcm_sysport_suspend_to_wol(struct bcm_sysport_priv *priv)
 	unsigned int index, i = 0;
 	u32 reg;
 
-	/* Password has already been programmed */
 	reg = umac_readl(priv, UMAC_MPD_CTRL);
 	if (priv->wolopts & (WAKE_MAGIC | WAKE_MAGICSECURE))
 		reg |= MPD_EN;
 	reg &= ~PSW_EN;
-	if (priv->wolopts & WAKE_MAGICSECURE)
+	if (priv->wolopts & WAKE_MAGICSECURE) {
+		/* Program the SecureOn password */
+		umac_writel(priv, get_unaligned_be16(&priv->sopass[0]),
+			    UMAC_PSW_MS);
+		umac_writel(priv, get_unaligned_be32(&priv->sopass[2]),
+			    UMAC_PSW_LS);
 		reg |= PSW_EN;
+	}
 	umac_writel(priv, reg, UMAC_MPD_CTRL);
 
 	if (priv->wolopts & WAKE_FILTER) {
diff --git a/drivers/net/ethernet/broadcom/bcmsysport.h b/drivers/net/ethernet/broadcom/bcmsysport.h
index 0887e6356649..0b192fea9c5d 100644
--- a/drivers/net/ethernet/broadcom/bcmsysport.h
+++ b/drivers/net/ethernet/broadcom/bcmsysport.h
@@ -12,6 +12,7 @@
 #define __BCM_SYSPORT_H
 
 #include <linux/bitmap.h>
+#include <linux/ethtool.h>
 #include <linux/if_vlan.h>
 #include <linux/net_dim.h>
 
@@ -778,6 +779,7 @@ struct bcm_sysport_priv {
 	unsigned int		crc_fwd:1;
 	u16			rev;
 	u32			wolopts;
+	u8			sopass[SOPASS_MAX];
 	unsigned int		wol_irq_disabled:1;
 
 	/* MIB related fields */
-- 
2.17.1
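
The approach in the patch above can be modelled outside the kernel: keep the
six-byte SecureOn password in driver-private memory and write it to the
registers only on the suspend path. The toy_priv struct and toy_* functions
are hypothetical; the big-endian 16-bit/32-bit split follows the
get_unaligned_be16()/get_unaligned_be32() calls in the diff.

```c
#include <stdint.h>
#include <string.h>

#define TOY_SOPASS_MAX 6

struct toy_priv {
	uint8_t  sopass[TOY_SOPASS_MAX];	/* software copy, survives deep sleep */
	uint32_t psw_ms;			/* stand-in for the UMAC_PSW_MS register */
	uint32_t psw_ls;			/* stand-in for the UMAC_PSW_LS register */
};

/* set_wol path: only remember the password, do not touch registers. */
static void toy_set_wol(struct toy_priv *priv, const uint8_t *sopass)
{
	memcpy(priv->sopass, sopass, sizeof(priv->sopass));
}

/* Deep sleep wipes register contents but not driver memory. */
static void toy_deep_sleep(struct toy_priv *priv)
{
	priv->psw_ms = 0;
	priv->psw_ls = 0;
}

/* Suspend path: program the registers from the software copy. */
static void toy_suspend_to_wol(struct toy_priv *priv)
{
	const uint8_t *p = priv->sopass;

	/* Bytes 0..1 as a big-endian 16-bit value. */
	priv->psw_ms = ((uint32_t)p[0] << 8) | p[1];
	/* Bytes 2..5 as a big-endian 32-bit value. */
	priv->psw_ls = ((uint32_t)p[2] << 24) | ((uint32_t)p[3] << 16) |
		       ((uint32_t)p[4] << 8) | p[5];
}
```

Since the registers are rewritten on every suspend, a wiped register file
after deep sleep no longer loses the password.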



* Bluetooth: hci0: last event is not cmd complete (0x0f) with LG TV
From: Paul Menzel @ 2019-02-01 21:20 UTC
  To: Marcel Holtmann, Johan Hedberg
  Cc: David S. Miller, linux-bluetooth, netdev, LKML

Dear Linux folks,

When trying to pair a Dell Latitude E7250 running Debian Sid/unstable
with Linux 4.20 and GNOME 3.30 with an LG TV, the TV is listed in the
Bluetooth dialog of GNOME Settings after starting the pairing process.

The TV displays the instructions below.

> Complete the next three steps on your mobile device:
> 1. Turn ON Bluetooth.
> 2. Select the TV name from the list of available devices.
>    • TV Name : 679
> 3. Confirm the connection request.

Selecting the TV in the GNOME dialog, a dialog is shown on my system 
with a PIN consisting of six numbers. With the dialog, Linux logs the 
message below.

      Bluetooth: hci0: last event is not cmd complete (0x0f)

But the TV does not show any PIN. When I confirm it anyway, nothing
further happens.

Is that supposed to work? It’d be great if you helped me to set this up.

Kind regards,

Paul


* Re: [PATCH v2] net: dp83640: expire old TX-skb
From: Andrew Lunn @ 2019-02-01 21:15 UTC
  To: Sebastian Andrzej Siewior
  Cc: Richard Cochran, Florian Fainelli, Heiner Kallweit, netdev
In-Reply-To: <20190201210918.gvufqxpqzvuzfk5n@linutronix.de>

On Fri, Feb 01, 2019 at 10:09:18PM +0100, Sebastian Andrzej Siewior wrote:
> During sendmsg() a cloned skb is saved via dp83640_txtstamp() in
> ->tx_queue. After the NIC sends this packet, the PHY will reply with a
> timestamp for that TX packet. If the cable is pulled at the right time I
> don't see that packet. It might gets flushed as part of queue shutdown
> on NIC's side.
> Once the link is up again then after the next sendmsg() we enqueue
> another skb in dp83640_txtstamp() and have two on the list. Then the PHY
> will send a reply and decode_txts() attaches it to the first skb on the
> list.
> No crash occurs since refcounting works but we are one packet behind.
> linuxptp/ptp4l usually closes the socket and opens a new one (in such a
> timeout case) so those "stale" replies never get there. However it does
> not resume normal operation anymore.
> 
> Purge old skbs in decode_txts().
> 
> Cc: stable@vger.kernel.org
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>

Hi Sebastian

netdev does not use the Cc: stable@vger.kernel.org.

https://www.kernel.org/doc/Documentation/networking/netdev-FAQ.txt

Please include a Fixes: tag, and a subject of [PATCH net] ...

Thanks
       Andrew


* [PATCH v2] net: dp83640: expire old TX-skb
From: Sebastian Andrzej Siewior @ 2019-02-01 21:09 UTC
  To: Richard Cochran; +Cc: Andrew Lunn, Florian Fainelli, Heiner Kallweit, netdev

During sendmsg() a cloned skb is saved via dp83640_txtstamp() in
->tx_queue. After the NIC sends this packet, the PHY will reply with a
timestamp for that TX packet. If the cable is pulled at the right time I
don't see that packet. It might get flushed as part of queue shutdown
on the NIC's side.
Once the link is up again, the next sendmsg() enqueues another skb in
dp83640_txtstamp() and there are two on the list. Then the PHY
will send a reply and decode_txts() attaches it to the first skb on the
list.
No crash occurs since refcounting works but we are one packet behind.
linuxptp/ptp4l usually closes the socket and opens a new one (in such a
timeout case) so those "stale" replies never get there. However it does
not resume normal operation anymore.

Purge old skbs in decode_txts().

Cc: stable@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
---
RFC … v2:
	- reverse xmas tree
	- stable tag

 drivers/net/phy/dp83640.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/net/phy/dp83640.c b/drivers/net/phy/dp83640.c
index 18b41bc345ab..6e8807212aa3 100644
--- a/drivers/net/phy/dp83640.c
+++ b/drivers/net/phy/dp83640.c
@@ -898,14 +898,14 @@ static void decode_txts(struct dp83640_private *dp83640,
 			struct phy_txts *phy_txts)
 {
 	struct skb_shared_hwtstamps shhwtstamps;
+	struct dp83640_skb_info *skb_info;
 	struct sk_buff *skb;
-	u64 ns;
 	u8 overflow;
+	u64 ns;
 
 	/* We must already have the skb that triggered this. */
-
+again:
 	skb = skb_dequeue(&dp83640->tx_queue);
-
 	if (!skb) {
 		pr_debug("have timestamp but tx_queue empty\n");
 		return;
@@ -920,6 +920,11 @@ static void decode_txts(struct dp83640_private *dp83640,
 		}
 		return;
 	}
+	skb_info = (struct dp83640_skb_info *)skb->cb;
+	if (time_after(jiffies, skb_info->tmo)) {
+		kfree_skb(skb);
+		goto again;
+	}
 
 	ns = phy2txts(phy_txts);
 	memset(&shhwtstamps, 0, sizeof(shhwtstamps));
@@ -1472,6 +1477,7 @@ static bool dp83640_rxtstamp(struct phy_device *phydev,
 static void dp83640_txtstamp(struct phy_device *phydev,
 			     struct sk_buff *skb, int type)
 {
+	struct dp83640_skb_info *skb_info = (struct dp83640_skb_info *)skb->cb;
 	struct dp83640_private *dp83640 = phydev->priv;
 
 	switch (dp83640->hwts_tx_en) {
@@ -1484,6 +1490,7 @@ static void dp83640_txtstamp(struct phy_device *phydev,
 		/* fall through */
 	case HWTSTAMP_TX_ON:
 		skb_shinfo(skb)->tx_flags |= SKBTX_IN_PROGRESS;
+		skb_info->tmo = jiffies + SKB_TIMESTAMP_TIMEOUT;
 		skb_queue_tail(&dp83640->tx_queue, skb);
 		break;
 
-- 
2.20.1
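
The purge logic from the patch above can be sketched in userspace. The
toy_queue ring buffer and all toy_* names are hypothetical; the comparison
uses the same wraparound-safe idea as the kernel's time_after(), with a
plain unsigned long tick counter standing in for jiffies. The fixed-size
ring has no overflow handling, which a real queue would need.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_QUEUE_LEN 8

/* Wraparound-safe "a is after b", modelled on the kernel's time_after(). */
static bool toy_time_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

struct toy_skb {
	int id;
	unsigned long tmo;	/* tick after which the entry is stale */
};

struct toy_queue {
	struct toy_skb slots[TOY_QUEUE_LEN];
	size_t head, tail;
};

/* txtstamp path: stamp the entry with its expiry time, then queue it. */
static void toy_txtstamp(struct toy_queue *q, int id,
			 unsigned long now, unsigned long timeout)
{
	q->slots[q->tail % TOY_QUEUE_LEN] =
		(struct toy_skb){ .id = id, .tmo = now + timeout };
	q->tail++;
}

/*
 * decode_txts path: a timestamp arrived at tick 'now'. Drop stale entries
 * from the front and return the id of the first live one, or -1 if the
 * queue runs empty.
 */
static int toy_decode_txts(struct toy_queue *q, unsigned long now)
{
	while (q->head != q->tail) {
		struct toy_skb skb = q->slots[q->head % TOY_QUEUE_LEN];

		q->head++;
		if (toy_time_after(now, skb.tmo))
			continue;	/* expired: its timestamp never arrived */
		return skb.id;
	}
	return -1;
}
```

In the cable-pull scenario the stale first entry is skipped, so the
timestamp lands on the packet that actually triggered it instead of the
queue staying one packet behind.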



* Re: [PATCH bpf-next v6 2/5] bpf: implement BPF_LWT_ENCAP_IP mode in bpf_lwt_push_encap
From: Alexei Starovoitov @ 2019-02-01 21:06 UTC
  To: Peter Oskolkov
  Cc: Alexei Starovoitov, Daniel Borkmann, netdev, Peter Oskolkov,
	David Ahern, Willem de Bruijn
In-Reply-To: <20190201172229.108867-3-posk@google.com>

On Fri, Feb 01, 2019 at 09:22:26AM -0800, Peter Oskolkov wrote:
> This patch implements BPF_LWT_ENCAP_IP mode in bpf_lwt_push_encap
> BPF helper. It enables BPF programs (specifically, BPF_PROG_TYPE_LWT_IN
> and BPF_PROG_TYPE_LWT_XMIT prog types) to add IP encapsulation headers
> to packets (e.g. IP/GRE, GUE, IPIP).
> 
> This is useful when thousands of different short-lived flows should be
> encapped, each with different and dynamically determined destination.
> Although lwtunnels can be used in some of these scenarios, the ability
> to dynamically generate encap headers adds more flexibility, e.g.
> when routing depends on the state of the host (reflected in global bpf
> maps).
> 
> Note: a follow-up patch will deal with GSO-enabled packets, which
> are currently rejected at encapping attempt.
> 
> Signed-off-by: Peter Oskolkov <posk@google.com>
...
> +int bpf_lwt_push_ip_encap(struct sk_buff *skb, void *hdr, u32 len, bool ingress)
> +{
> +	struct iphdr *iph;
> +	bool ipv4;
> +	int err;
> +
> +	if (unlikely(len < sizeof(struct iphdr) || len > LWT_BPF_MAX_HEADROOM))
> +		return -EINVAL;
> +
> +	/* GSO-enabled packets cannot be encapped at the moment. */
> +	if (unlikely(skb_is_gso(skb)))
> +		return -EINVAL;

I don't understand why that's 'unlikely'.
Both tx and rx are very likely to have gso skbs.
Are you saying this feature will require user to disable gro/gso on a device?
imo gso has to be supported from the start.
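
For readers unfamiliar with the annotation under discussion: unlikely() is a branch-layout hint, commonly built on the GCC/Clang builtin __builtin_expect. The sketch below is user-space C under stated assumptions: the macro definitions follow the common idiom, push_encap_check() is a hypothetical stand-in rather than the helper from the patch, and MIN_HDR_LEN/MAX_HDR_LEN are illustrative values. The hint influences only how the compiler lays out the branches, never the result, so annotating a frequent case as unlikely is a performance concern rather than a correctness bug.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define likely(x)	__builtin_expect(!!(x), 1)
#define unlikely(x)	__builtin_expect(!!(x), 0)

#define MIN_HDR_LEN 20u		/* assumed: size of a minimal IPv4 header */
#define MAX_HDR_LEN 256u	/* assumed: illustrative upper bound */

static_assert(MIN_HDR_LEN <= MAX_HDR_LEN, "header bounds must be ordered");

/* Hypothetical validation step mirroring the shape of the quoted checks. */
static int push_encap_check(size_t len, bool is_gso)
{
	/* A malformed length really is rare, so the hint fits here. */
	if (unlikely(len < MIN_HDR_LEN || len > MAX_HDR_LEN))
		return -EINVAL;

	/* No hint: if this case is common, let the compiler decide. */
	if (is_gso)
		return -EINVAL;

	return 0;
}
```

With or without the macros, push_encap_check() returns the same values for the same inputs.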


^ permalink raw reply

* Re: [RFC] net: dp83640: expire old TX-skb
From: Sebastian Andrzej Siewior @ 2019-02-01 21:04 UTC (permalink / raw)
  To: Richard Cochran; +Cc: Andrew Lunn, Florian Fainelli, Heiner Kallweit, netdev
In-Reply-To: <20190131042606.zkinycpnbjpsm3dg@localhost>

On 2019-01-30 20:26:06 [-0800], Richard Cochran wrote:
> Thanks for the detailed explanation.  This sounds like a really rare
> bug, but maybe you guys were able to trigger it reliably?

More or less reliably, yes. I had two switches (for the uplink of the
PHY) and a third for my regular network during testing. I had one
combination which triggered the issue after no longer than the fifth
disconnect and another one which happily survived _a lot_ longer. 

> > Purge old skbs in decode_txts().
> 
> It is too bad that the Tx timestamp from the HW doesn't provide
> matching fields.  Using the timeout is probably the best that we can
> do.

Yeah. I've been looking around and it seems there is nothing. And since
that packet from the PHY is usually seen very shortly after the
transmit, I went for something similar to what is already done in the
RX path.

Sebastian

^ permalink raw reply

* Re: [PATCH bpf-next v2 3/3] tools/bpf: simplify libbpf API function libbpf_set_print()
From: Yonghong Song @ 2019-02-01 20:37 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Arnaldo Carvalho de Melo, Magnus Karlsson, netdev@vger.kernel.org,
	Alexei Starovoitov, Daniel Borkmann, Kernel Team
In-Reply-To: <CAEf4Bzbb8u0L8c6yZS0mPf554A0Udvs_idaX9xBdwmC8vcUK3A@mail.gmail.com>



On 2/1/19 11:02 AM, Andrii Nakryiko wrote:
> On Fri, Feb 1, 2019 at 10:16 AM Yonghong Song <yhs@fb.com> wrote:
>>
>> Currently, the libbpf API function libbpf_set_print()
>> takes three function pointer parameters for warning, info
>> and debug printout respectively.
>>
>> This patch changes the API to have just one function pointer
>> parameter and the function pointer has one additional
>> parameter "debugging level". So if the debug level is
>> increased in the future, the function signature
>> won't change.
>>
>> Signed-off-by: Yonghong Song <yhs@fb.com>
>> ---
>>   tools/lib/bpf/libbpf.c                        | 28 ++++-----------
>>   tools/lib/bpf/libbpf.h                        | 14 +++-----
>>   tools/lib/bpf/test_libbpf.cpp                 |  2 +-
>>   tools/perf/util/bpf-loader.c                  | 32 +++++++----------
>>   tools/testing/selftests/bpf/test_btf.c        |  7 ++--
>>   .../testing/selftests/bpf/test_libbpf_open.c  | 36 +++++++++----------
>>   tools/testing/selftests/bpf/test_progs.c      | 20 +++++++++--
>>   7 files changed, 63 insertions(+), 76 deletions(-)
>>
>> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
>> index 1b1c0b504d25..d2337a179837 100644
>> --- a/tools/lib/bpf/libbpf.c
>> +++ b/tools/lib/bpf/libbpf.c
>> @@ -54,8 +54,8 @@
>>
>>   #define __printf(a, b) __attribute__((format(printf, a, b)))
>>
>> -__printf(1, 2)
>> -static int __base_pr(const char *format, ...)
>> +__printf(2, 3)
>> +static int __base_pr(enum libbpf_print_level level, const char *format, ...)
>>   {
>>          va_list args;
>>          int err;
>> @@ -66,17 +66,11 @@ static int __base_pr(const char *format, ...)
>>          return err;
>>   }
>>
>> -static __printf(1, 2) libbpf_print_fn_t __pr_warning = __base_pr;
>> -static __printf(1, 2) libbpf_print_fn_t __pr_info = __base_pr;
>> -static __printf(1, 2) libbpf_print_fn_t __pr_debug;
>> +static __printf(2, 3) libbpf_print_fn_t __libbpf_pr = __base_pr;
>>
>> -void libbpf_set_print(libbpf_print_fn_t warn,
>> -                     libbpf_print_fn_t info,
>> -                     libbpf_print_fn_t debug)
>> +void libbpf_set_print(libbpf_print_fn_t fn)
>>   {
>> -       __pr_warning = warn;
>> -       __pr_info = info;
>> -       __pr_debug = debug;
>> +       __libbpf_pr = fn;
>>   }
>>
>>   __printf(2, 3)
>> @@ -85,16 +79,8 @@ void libbpf_debug_print(enum libbpf_print_level level, const char *format, ...)
>>          va_list args;
>>
>>          va_start(args, format);
>> -       if (level == LIBBPF_WARN) {
>> -               if (__pr_warning)
>> -                       __pr_warning(format, args);
>> -       } else if (level == LIBBPF_INFO) {
>> -               if (__pr_info)
>> -                       __pr_info(format, args);
>> -       } else {
>> -               if (__pr_debug)
>> -                       __pr_debug(format, args);
>> -       }
>> +       if (__libbpf_pr)
> 
> If __libbpf_pr is NULL, is there a need to call va_start/va_end? If
> not, should they be moved inside if's body?

You are right. Will fix this in the next version.

> 
>> +               __libbpf_pr(level, format, args);
>>          va_end(args);
>>   }
>>
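
The point agreed on above, touching the variable argument list only when a callback is installed, can be sketched in isolation. This is not libbpf code: print_fn_t, enum print_level, set_print(), debug_print(), base_pr() and buf_pr() are hypothetical names. As a further assumption, the callback here takes a va_list instead of "...", since a va_list parameter is how standard C forwards variadic arguments to another function.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

enum print_level { PRINT_WARN, PRINT_INFO, PRINT_DEBUG };

/* Single callback type; the level travels as an ordinary argument. */
typedef int (*print_fn_t)(enum print_level level, const char *fmt, va_list ap);

/* Default callback: everything goes to stderr. */
static int base_pr(enum print_level level, const char *fmt, va_list ap)
{
	(void)level;
	return vfprintf(stderr, fmt, ap);
}

/* Alternative callback that keeps the last message in a buffer. */
static char last_msg[128];

static int buf_pr(enum print_level level, const char *fmt, va_list ap)
{
	(void)level;
	return vsnprintf(last_msg, sizeof(last_msg), fmt, ap);
}

static print_fn_t print_cb = base_pr;

static void set_print(print_fn_t fn)
{
	print_cb = fn;
}

__attribute__((format(printf, 2, 3)))
static int debug_print(enum print_level level, const char *fmt, ...)
{
	va_list ap;
	int ret;

	assert(fmt != NULL);
	if (!print_cb)
		return 0;	/* nothing installed: skip va_start/va_end */

	va_start(ap, fmt);
	ret = print_cb(level, fmt, ap);
	va_end(ap);
	return ret;
}
```

Calling set_print(NULL) silences all output, and debug_print() then returns before va_start() is reached.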

^ permalink raw reply

* Re: [PATCH bpf-next] bpf: powerpc64: add JIT support for bpf line info
From: Daniel Borkmann @ 2019-02-01 20:07 UTC (permalink / raw)
  To: Sandipan Das, ast; +Cc: kafai, naveen.n.rao, netdev
In-Reply-To: <20190201103232.3355-1-sandipan@linux.ibm.com>

On 02/01/2019 11:32 AM, Sandipan Das wrote:
> This adds support for generating bpf line info for
> JITed programs.
> 
> Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
> ---
>  arch/powerpc/net/bpf_jit_comp64.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c
> index 15bba765fa79..4194d3cfb60c 100644
> --- a/arch/powerpc/net/bpf_jit_comp64.c
> +++ b/arch/powerpc/net/bpf_jit_comp64.c
> @@ -1185,6 +1185,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
>  
>  	bpf_flush_icache(bpf_hdr, (u8 *)bpf_hdr + (bpf_hdr->pages * PAGE_SIZE));
>  	if (!fp->is_func || extra_pass) {
> +		bpf_prog_fill_jited_linfo(fp, addrs);
>  out_addrs:
>  		kfree(addrs);
>  		kfree(jit_data);
> 

Applied, thanks!

^ permalink raw reply

* Re: [PATCH v7 bpf-next 0/9] introduce bpf_spin_lock
From: Daniel Borkmann @ 2019-02-01 20:07 UTC (permalink / raw)
  To: Alexei Starovoitov, davem; +Cc: peterz, jannh, netdev, kernel-team
In-Reply-To: <20190131234012.3712779-1-ast@kernel.org>

On 02/01/2019 12:40 AM, Alexei Starovoitov wrote:
> Many algorithms need to read and modify several variables atomically.
> Until now it was hard to impossible to implement such algorithms in BPF.
> Hence introduce support for bpf_spin_lock.
> 
> The api consists of 'struct bpf_spin_lock' that should be placed
> inside hash/array/cgroup_local_storage element
> and bpf_spin_lock/unlock() helper function.
> 
> Example:
> struct hash_elem {
>     int cnt;
>     struct bpf_spin_lock lock;
> };
> struct hash_elem * val = bpf_map_lookup_elem(&hash_map, &key);
> if (val) {
>     bpf_spin_lock(&val->lock);
>     val->cnt++;
>     bpf_spin_unlock(&val->lock);
> }
> 
> and BPF_F_LOCK flag for lookup/update bpf syscall commands that
> allows user space to read/write map elements under lock.
> 
> Together these primitives allow race free access to map elements
> from bpf programs and from user space.
> 
> Key restriction: root only.
> Key requirement: maps must be annotated with BTF.
> 
> This concept was discussed at Linux Plumbers Conference 2018.
> Thank you everyone who participated and helped to iron out details
> of api and implementation.
> 
> Patch 1: bpf_spin_lock support in the verifier, BTF, hash, array.
> Patch 2: bpf_spin_lock in cgroup local storage.
> Patches 3,4,5: tests
> Patch 6: BPF_F_LOCK flag to lookup/update
> Patches 7,8,9: tests
> 
> v6->v7:
> - fixed this_cpu->__this_cpu per Peter's suggestion and added Ack.
> - simplified bpf_spin_lock and load/store overlap check in the verifier
>   as suggested by Andrii
> - rebase
> 
> v5->v6:
> - adopted arch_spinlock approach suggested by Peter
> - switched to spin_lock_irqsave equivalent as the simplest way
>   to avoid deadlocks in rare case of nested networking progs
>   (cgroup-bpf prog in preempt_disable vs clsbpf in softirq sharing
>   the same map with bpf_spin_lock)
>   bpf_spin_lock is only allowed in networking progs that don't
>   have arbitrary entry points unlike tracing progs.
> - rebase and split test_verifier tests
> 
> v4->v5:
> - disallow bpf_spin_lock for tracing progs due to insufficient preemption checks
> - socket filter progs cannot use bpf_spin_lock due to missing preempt_disable
> - fix atomic_set_release. Spotted by Peter.
> - fixed hash_of_maps
>   
> v3->v4:
> - fix BPF_EXIST | BPF_NOEXIST check patch 6. Spotted by Jakub. Thanks!
> - rebase
> 
> v2->v3:
> - fixed build on ia64 and archs where qspinlock is not supported
> - fixed missing lock init during lookup w/o BPF_F_LOCK. Spotted by Martin
> 
> v1->v2:
> - addressed several issues spotted by Daniel and Martin in patch 1
> - added test11 to patch 4 as suggested by Daniel
> 
> Alexei Starovoitov (9):
>   bpf: introduce bpf_spin_lock
>   bpf: add support for bpf_spin_lock to cgroup local storage
>   tools/bpf: sync include/uapi/linux/bpf.h
>   selftests/bpf: add bpf_spin_lock verifier tests
>   selftests/bpf: add bpf_spin_lock C test
>   bpf: introduce BPF_F_LOCK flag
>   tools/bpf: sync uapi/bpf.h
>   libbpf: introduce bpf_map_lookup_elem_flags()
>   selftests/bpf: test for BPF_F_LOCK
> 
>  include/linux/bpf.h                           |  39 ++-
>  include/linux/bpf_verifier.h                  |   1 +
>  include/linux/btf.h                           |   1 +
>  include/uapi/linux/bpf.h                      |   8 +-
>  kernel/Kconfig.locks                          |   3 +
>  kernel/bpf/arraymap.c                         |  23 +-
>  kernel/bpf/btf.c                              |  42 +++
>  kernel/bpf/core.c                             |   2 +
>  kernel/bpf/hashtab.c                          |  63 +++-
>  kernel/bpf/helpers.c                          |  96 +++++
>  kernel/bpf/local_storage.c                    |  16 +-
>  kernel/bpf/map_in_map.c                       |   5 +
>  kernel/bpf/syscall.c                          |  45 ++-
>  kernel/bpf/verifier.c                         | 171 ++++++++-
>  net/core/filter.c                             |  16 +-
>  tools/include/uapi/linux/bpf.h                |   8 +-
>  tools/lib/bpf/bpf.c                           |  13 +
>  tools/lib/bpf/bpf.h                           |   2 +
>  tools/lib/bpf/libbpf.map                      |   1 +
>  tools/testing/selftests/bpf/Makefile          |   2 +-
>  tools/testing/selftests/bpf/bpf_helpers.h     |   4 +
>  tools/testing/selftests/bpf/test_map_lock.c   |  66 ++++
>  tools/testing/selftests/bpf/test_progs.c      | 117 ++++++-
>  tools/testing/selftests/bpf/test_spin_lock.c  | 108 ++++++
>  tools/testing/selftests/bpf/test_verifier.c   | 104 +++++-
>  .../selftests/bpf/verifier/spin_lock.c        | 331 ++++++++++++++++++
>  26 files changed, 1248 insertions(+), 39 deletions(-)
>  create mode 100644 tools/testing/selftests/bpf/test_map_lock.c
>  create mode 100644 tools/testing/selftests/bpf/test_spin_lock.c
>  create mode 100644 tools/testing/selftests/bpf/verifier/spin_lock.c
> 

Applied, thanks!
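
As a user-space analogy for the pattern in the cover letter (a lock embedded in the element it protects), here is a minimal C11 sketch. It is not BPF code and says nothing about how bpf_spin_lock is implemented: struct demo_lock, demo_lock(), demo_unlock() and account_packet() are hypothetical names, and the "bytes" field is added only to show two variables being updated together.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Minimal test-and-set spin lock built on a C11 atomic_flag. */
struct demo_lock {
	atomic_flag locked;
};

/* The lock lives inside the element whose fields it guards. */
struct hash_elem {
	int cnt;
	long bytes;
	struct demo_lock lock;
};

static void demo_lock(struct demo_lock *l)
{
	/* Spin until the previous value was "clear". */
	while (atomic_flag_test_and_set_explicit(&l->locked,
						 memory_order_acquire))
		;
}

static void demo_unlock(struct demo_lock *l)
{
	atomic_flag_clear_explicit(&l->locked, memory_order_release);
}

/* Update both fields as one unit with respect to other lock holders. */
static void account_packet(struct hash_elem *val, long len)
{
	assert(val != NULL);
	demo_lock(&val->lock);
	val->cnt++;
	val->bytes += len;
	demo_unlock(&val->lock);
}
```

An element would be initialised as "struct hash_elem e = { 0, 0, { ATOMIC_FLAG_INIT } };" before its first use.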

^ permalink raw reply

* stable 3.18 backport: netlink: Trim skb to alloc size to avoid MSG_TRUNC
From: Mark Salyzyn @ 2019-02-01 19:41 UTC (permalink / raw)
  To: linux-kernel
  Cc: Mark Salyzyn, Ronen Arad, David S . Miller, Dmitry Safonov,
	David Ahern, Kirill Tkhai, Andrei Vagin, Li RongQing, YU Bo,
	Denys Vlasenko, netdev

Please apply upstream commit db65a3aaf29ecce2e34271d52e8d2336b97bd9fe to
stable 3.18.  This patch addresses a race condition where the update

 nlk->max_recvmsg_len = max(nlk->max_recvmsg_len, len);
 nlk->max_recvmsg_len = min_t(size_t, nlk->max_recvmsg_len,

runs in one thread in between another thread's calls to

 skb = netlink_alloc_skb(sk,

and

 skb_reserve(skb, skb_tailroom(skb) -
             nlk->max_recvmsg_len);

in netlink_dump.  The result can be a negative value and will cause
a kernel panic and BUG at net/core/skbuff.c because the negative value
turns into an extremely large positive value.

Original commit:

netlink_dump() allocates skb based on the calculated min_dump_alloc or
a per socket max_recvmsg_len.
min_alloc_size is maximum space required for any single netdev
attributes as calculated by rtnl_calcit().
max_recvmsg_len tracks the user provided buffer to netlink_recvmsg.
It is capped at 16KiB.
The intention is to avoid small allocations and to minimize the number
of calls required to obtain dump information for all net devices.

netlink_dump packs as many small messages as could fit within an skb
that was sized for the largest single netdev information. The actual
space available within an skb is larger than what is requested. It could
be much larger and up to near 2x with align to next power of 2 approach.

Allowing netlink_dump to use all the space available within the
allocated skb increases the buffer size a user has to provide to avoid
truncation (i.e. MSG_TRUNC flag set).

It was observed that with many VLANs configured on at least one netdev,
a larger buffer of near 64KiB was necessary to avoid "Message truncated"
error in "ip link" or "bridge [-c[ompressvlans]] vlan show" when
min_alloc_size was only little over 32KiB.

This patch trims skb to allocated size in order to allow the user to
avoid truncation with more reasonable buffer size.

Signed-off-by: Ronen Arad <ronen.arad@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

(cherry pick commit db65a3aaf29ecce2e34271d52e8d2336b97bd9fe)
Signed-off-by: Mark Salyzyn <salyzyn@android.com>
---
 net/netlink/af_netlink.c | 32 +++++++++++++++++++++-----------
 1 file changed, 21 insertions(+), 11 deletions(-)

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 50096e0edd8e..57d9a72f8b6d 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -1977,6 +1977,7 @@ static int netlink_dump(struct sock *sk)
 	struct nlmsghdr *nlh;
 	struct module *module;
 	int err = -ENOBUFS;
+	int alloc_min_size;
 	int alloc_size;

 	mutex_lock(nlk->cb_mutex);
@@ -1985,9 +1986,6 @@ static int netlink_dump(struct sock *sk)
 		goto errout_skb;
 	}

-	cb = &nlk->cb;
-	alloc_size = max_t(int, cb->min_dump_alloc, NLMSG_GOODSIZE);
-
 	if (atomic_read(&sk->sk_rmem_alloc) >= sk->sk_rcvbuf)
 		goto errout_skb;

@@ -1996,22 +1994,34 @@ static int netlink_dump(struct sock *sk)
 	 * to reduce number of system calls on dump operations, if user
 	 * ever provided a big enough buffer.
 	 */
+	cb = &nlk->cb;
+	alloc_min_size = max_t(int, cb->min_dump_alloc, NLMSG_GOODSIZE);
+
 	if (alloc_size < nlk->max_recvmsg_len) {
-		skb = netlink_alloc_skb(sk,
-					nlk->max_recvmsg_len,
-					nlk->portid,
+		alloc_size = nlk->max_recvmsg_len;
+		skb = netlink_alloc_skb(sk, alloc_size, nlk->portid,
 					(GFP_KERNEL & ~__GFP_WAIT) |
 					__GFP_NOWARN | __GFP_NORETRY);
-		/* available room should be exact amount to avoid MSG_TRUNC */
-		if (skb)
-			skb_reserve(skb, skb_tailroom(skb) -
-					 nlk->max_recvmsg_len);
 	}
-	if (!skb)
+	if (!skb) {
+		alloc_size = alloc_min_size;
 		skb = netlink_alloc_skb(sk, alloc_size, nlk->portid,
 					(GFP_KERNEL & ~__GFP_WAIT));
+	}
 	if (!skb)
 		goto errout_skb;
+
+	/* Trim skb to allocated size. User is expected to provide buffer as
+	 * large as max(min_dump_alloc, 16KiB (mac_recvmsg_len capped at
+	 * netlink_recvmsg())). dump will pack as many smaller messages as
+	 * could fit within the allocated skb. skb is typically allocated
+	 * with larger space than required (could be as much as near 2x the
+	 * requested size with align to next power of 2 approach). Allowing
+	 * dump to use the excess space makes it difficult for a user to have a
+	 * reasonable static buffer based on the expected largest dump of a
+	 * single netdev. The outcome is MSG_TRUNC error.
+	 */
+	skb_reserve(skb, skb_tailroom(skb) - alloc_size);
 	netlink_skb_set_owner_r(skb, sk);

 	if (nlk->dump_done_errno > 0)
-- 
2.20.1.611.gfbb209baf1-goog
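
The arithmetic behind the trim, and behind the race described at the top of this mail, can be sketched with plain integers. This is a simplified model, not kernel code: round_up_pow2(), trim_amount() and as_unsigned_len() are hypothetical helpers, the allocator is assumed to round every request up to the next power of two, and skb overhead is ignored.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Round up to the next power of two, standing in for allocator rounding. */
static size_t round_up_pow2(size_t n)
{
	size_t p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/* Bytes to reserve so that exactly 'want' bytes of tailroom remain.
 * Computed as a signed int on purpose, to expose the failure mode. */
static int trim_amount(size_t tailroom, int want)
{
	assert(tailroom <= INT_MAX);
	return (int)tailroom - want;
}

/* What a negative length becomes once it is treated as unsigned. */
static unsigned int as_unsigned_len(int len)
{
	return (unsigned int)len;
}
```

In this model a request for 33000 bytes yields 65536 bytes of room, and reserving trim_amount(65536, 33000) = 32536 bytes leaves exactly 33000 usable. If the buffer was sized for 8192 bytes but the length is re-read as 16384 before the trim, trim_amount(8192, 16384) is -8192, which as_unsigned_len() turns into a value just below UINT_MAX. Using one saved alloc_size for both the allocation and the trim, as the patch does, keeps the two numbers consistent.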

^ permalink raw reply related

* Re: [PATCH net] sctp: walk the list of asoc safely
From: David Miller @ 2019-02-01 19:04 UTC (permalink / raw)
  To: gregkh; +Cc: vyasevich, nhorman, marcelo.leitner, linux-sctp, netdev
In-Reply-To: <20190201141522.GA20785@kroah.com>

From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Date: Fri, 1 Feb 2019 15:15:22 +0100

> In sctp_sendmsg(), when walking the list of endpoint associations, the
> association can be dropped from the list, making the list corrupt.
> Properly handle this by using list_for_each_entry_safe().
> 
> Fixes: 4910280503f3 ("sctp: add support for snd flag SCTP_SENDALL process in sendmsg")
> Reported-by: Secunia Research <vuln@secunia.com>
> Tested-by: Secunia Research <vuln@secunia.com>
> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Applied and queued up for -stable.
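
The idea behind a "safe" list walk is to read the next pointer before the loop body gets a chance to free the current node. A minimal user-space sketch follows; it uses a hand-rolled singly linked list rather than the kernel's list API, and struct assoc, assoc_push(), assoc_count() and drop_odd() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct assoc {
	int id;
	struct assoc *next;
};

/* Push a new node at the front and return the new head. */
static struct assoc *assoc_push(struct assoc *head, int id)
{
	struct assoc *a = malloc(sizeof(*a));

	assert(a != NULL);
	a->id = id;
	a->next = head;
	return a;
}

static int assoc_count(const struct assoc *head)
{
	int n = 0;

	for (; head != NULL; head = head->next)
		n++;
	return n;
}

/* Unlink and free every node with an odd id; return the new head. */
static struct assoc *drop_odd(struct assoc *head)
{
	struct assoc **link = &head;	/* pointer that points at 'cur' */
	struct assoc *cur, *next;

	for (cur = head; cur != NULL; cur = next) {
		next = cur->next;	/* saved first: cur may be freed below */
		if (cur->id % 2) {
			*link = next;	/* unlink, then release */
			free(cur);
		} else {
			link = &cur->next;
		}
	}
	return head;
}
```

Advancing with "cur = cur->next" after the body instead would read from a node that may already have been freed.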

^ permalink raw reply

* Re: [PATCH 0/3] pull request for net-next: batman-adv 2019-02-01
From: David Miller @ 2019-02-01 19:04 UTC (permalink / raw)
  To: sw; +Cc: netdev, b.a.t.m.a.n
In-Reply-To: <20190201111810.14150-1-sw@simonwunderlich.de>

From: Simon Wunderlich <sw@simonwunderlich.de>
Date: Fri,  1 Feb 2019 12:18:07 +0100

> here is a small feature/cleanup pull request of batman-adv to go into net-next.
> 
> Please pull or let me know of any problem!

Pulled, thanks Simon.


^ permalink raw reply

* Re: [PATCH bpf-next v2 3/3] tools/bpf: simplify libbpf API function libbpf_set_print()
From: Andrii Nakryiko @ 2019-02-01 19:02 UTC (permalink / raw)
  To: Yonghong Song
  Cc: Arnaldo Carvalho de Melo, Magnus Karlsson, netdev,
	Alexei Starovoitov, Daniel Borkmann, kernel-team
In-Reply-To: <20190201174733.695666-1-yhs@fb.com>

On Fri, Feb 1, 2019 at 10:16 AM Yonghong Song <yhs@fb.com> wrote:
>
> Currently, the libbpf API function libbpf_set_print()
> takes three function pointer parameters for warning, info
> and debug printout respectively.
>
> This patch changes the API to have just one function pointer
> parameter and the function pointer has one additional
> parameter "debugging level". So if the debug level is
> increased in the future, the function signature
> won't change.
>
> Signed-off-by: Yonghong Song <yhs@fb.com>
> ---
>  tools/lib/bpf/libbpf.c                        | 28 ++++-----------
>  tools/lib/bpf/libbpf.h                        | 14 +++-----
>  tools/lib/bpf/test_libbpf.cpp                 |  2 +-
>  tools/perf/util/bpf-loader.c                  | 32 +++++++----------
>  tools/testing/selftests/bpf/test_btf.c        |  7 ++--
>  .../testing/selftests/bpf/test_libbpf_open.c  | 36 +++++++++----------
>  tools/testing/selftests/bpf/test_progs.c      | 20 +++++++++--
>  7 files changed, 63 insertions(+), 76 deletions(-)
>
> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> index 1b1c0b504d25..d2337a179837 100644
> --- a/tools/lib/bpf/libbpf.c
> +++ b/tools/lib/bpf/libbpf.c
> @@ -54,8 +54,8 @@
>
>  #define __printf(a, b) __attribute__((format(printf, a, b)))
>
> -__printf(1, 2)
> -static int __base_pr(const char *format, ...)
> +__printf(2, 3)
> +static int __base_pr(enum libbpf_print_level level, const char *format, ...)
>  {
>         va_list args;
>         int err;
> @@ -66,17 +66,11 @@ static int __base_pr(const char *format, ...)
>         return err;
>  }
>
> -static __printf(1, 2) libbpf_print_fn_t __pr_warning = __base_pr;
> -static __printf(1, 2) libbpf_print_fn_t __pr_info = __base_pr;
> -static __printf(1, 2) libbpf_print_fn_t __pr_debug;
> +static __printf(2, 3) libbpf_print_fn_t __libbpf_pr = __base_pr;
>
> -void libbpf_set_print(libbpf_print_fn_t warn,
> -                     libbpf_print_fn_t info,
> -                     libbpf_print_fn_t debug)
> +void libbpf_set_print(libbpf_print_fn_t fn)
>  {
> -       __pr_warning = warn;
> -       __pr_info = info;
> -       __pr_debug = debug;
> +       __libbpf_pr = fn;
>  }
>
>  __printf(2, 3)
> @@ -85,16 +79,8 @@ void libbpf_debug_print(enum libbpf_print_level level, const char *format, ...)
>         va_list args;
>
>         va_start(args, format);
> -       if (level == LIBBPF_WARN) {
> -               if (__pr_warning)
> -                       __pr_warning(format, args);
> -       } else if (level == LIBBPF_INFO) {
> -               if (__pr_info)
> -                       __pr_info(format, args);
> -       } else {
> -               if (__pr_debug)
> -                       __pr_debug(format, args);
> -       }
> +       if (__libbpf_pr)

If __libbpf_pr is NULL, is there a need to call va_start/va_end? If
not, should they be moved inside if's body?

> +               __libbpf_pr(level, format, args);
>         va_end(args);
>  }
>
> diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
> index 4e21971101c9..f8f27f1bb6bf 100644
> --- a/tools/lib/bpf/libbpf.h
> +++ b/tools/lib/bpf/libbpf.h
> @@ -53,17 +53,11 @@ enum libbpf_print_level {
>          LIBBPF_DEBUG,
>  };
>
> -/*
> - * __printf is defined in include/linux/compiler-gcc.h. However,
> - * it would be better if libbpf.h didn't depend on Linux header files.
> - * So instead of __printf, here we use gcc attribute directly.
> - */
> -typedef int (*libbpf_print_fn_t)(const char *, ...)
> -       __attribute__((format(printf, 1, 2)));
> +typedef int (*libbpf_print_fn_t)(enum libbpf_print_level level,
> +                                const char *, ...)
> +       __attribute__((format(printf, 2, 3)));
>
> -LIBBPF_API void libbpf_set_print(libbpf_print_fn_t warn,
> -                                libbpf_print_fn_t info,
> -                                libbpf_print_fn_t debug);
> +LIBBPF_API void libbpf_set_print(libbpf_print_fn_t fn);
>
>  /* Hide internal to user */
>  struct bpf_object;
> diff --git a/tools/lib/bpf/test_libbpf.cpp b/tools/lib/bpf/test_libbpf.cpp
> index be67f5ea2c19..fc134873bb6d 100644
> --- a/tools/lib/bpf/test_libbpf.cpp
> +++ b/tools/lib/bpf/test_libbpf.cpp
> @@ -8,7 +8,7 @@
>  int main(int argc, char *argv[])
>  {
>      /* libbpf.h */
> -    libbpf_set_print(NULL, NULL, NULL);
> +    libbpf_set_print(NULL);
>
>      /* bpf.h */
>      bpf_prog_get_fd_by_id(0);
> diff --git a/tools/perf/util/bpf-loader.c b/tools/perf/util/bpf-loader.c
> index 2f3eb6d293ee..c492f3a2acdc 100644
> --- a/tools/perf/util/bpf-loader.c
> +++ b/tools/perf/util/bpf-loader.c
> @@ -24,21 +24,17 @@
>  #include "llvm-utils.h"
>  #include "c++/clang-c.h"
>
> -#define DEFINE_PRINT_FN(name, level) \
> -static int libbpf_##name(const char *fmt, ...) \
> -{                                              \
> -       va_list args;                           \
> -       int ret;                                \
> -                                               \
> -       va_start(args, fmt);                    \
> -       ret = veprintf(level, verbose, pr_fmt(fmt), args);\
> -       va_end(args);                           \
> -       return ret;                             \
> -}
> +static int libbpf_perf_dprint(enum libbpf_print_level level __attribute__((unused)),
> +                             const char *fmt, ...)
> +{
> +       va_list args;
> +       int ret;
>
> -DEFINE_PRINT_FN(warning, 1)
> -DEFINE_PRINT_FN(info, 1)
> -DEFINE_PRINT_FN(debug, 1)
> +       va_start(args, fmt);
> +       ret = veprintf(1, verbose, pr_fmt(fmt), args);
> +       va_end(args);
> +       return ret;
> +}
>
>  struct bpf_prog_priv {
>         bool is_tp;
> @@ -59,9 +55,7 @@ bpf__prepare_load_buffer(void *obj_buf, size_t obj_buf_sz, const char *name)
>         struct bpf_object *obj;
>
>         if (!libbpf_initialized) {
> -               libbpf_set_print(libbpf_warning,
> -                                libbpf_info,
> -                                libbpf_debug);
> +               libbpf_set_print(libbpf_perf_dprint);
>                 libbpf_initialized = true;
>         }
>
> @@ -79,9 +73,7 @@ struct bpf_object *bpf__prepare_load(const char *filename, bool source)
>         struct bpf_object *obj;
>
>         if (!libbpf_initialized) {
> -               libbpf_set_print(libbpf_warning,
> -                                libbpf_info,
> -                                libbpf_debug);
> +               libbpf_set_print(libbpf_perf_dprint);
>                 libbpf_initialized = true;
>         }
>
> diff --git a/tools/testing/selftests/bpf/test_btf.c b/tools/testing/selftests/bpf/test_btf.c
> index 179f1d8ec5bf..aebaeff5a5a0 100644
> --- a/tools/testing/selftests/bpf/test_btf.c
> +++ b/tools/testing/selftests/bpf/test_btf.c
> @@ -54,8 +54,9 @@ static int count_result(int err)
>
>  #define __printf(a, b) __attribute__((format(printf, a, b)))
>
> -__printf(1, 2)
> -static int __base_pr(const char *format, ...)
> +__printf(2, 3)
> +static int __base_pr(enum libbpf_print_level level __attribute__((unused)),
> +                    const char *format, ...)
>  {
>         va_list args;
>         int err;
> @@ -5650,7 +5651,7 @@ int main(int argc, char **argv)
>                 return err;
>
>         if (args.always_log)
> -               libbpf_set_print(__base_pr, __base_pr, __base_pr);
> +               libbpf_set_print(__base_pr);
>
>         if (args.raw_test)
>                 err |= test_raw();
> diff --git a/tools/testing/selftests/bpf/test_libbpf_open.c b/tools/testing/selftests/bpf/test_libbpf_open.c
> index 8fcd1c076add..3fe258520e4b 100644
> --- a/tools/testing/selftests/bpf/test_libbpf_open.c
> +++ b/tools/testing/selftests/bpf/test_libbpf_open.c
> @@ -34,23 +34,22 @@ static void usage(char *argv[])
>         printf("\n");
>  }
>
> -#define DEFINE_PRINT_FN(name, enabled) \
> -static int libbpf_##name(const char *fmt, ...)         \
> -{                                                      \
> -        va_list args;                                  \
> -        int ret;                                       \
> -                                                       \
> -        va_start(args, fmt);                           \
> -       if (enabled) {                                  \
> -               fprintf(stderr, "[" #name "] ");        \
> -               ret = vfprintf(stderr, fmt, args);      \
> -       }                                               \
> -        va_end(args);                                  \
> -        return ret;                                    \
> +static bool debug = 0;
> +static int libbpf_print(enum libbpf_print_level level,
> +                       const char *fmt, ...)
> +{
> +       va_list args;
> +       int ret;
> +
> +       if (level == LIBBPF_DEBUG && !debug)
> +               return 0;
> +
> +       va_start(args, fmt);
> +       fprintf(stderr, "[%d] ", level);
> +       ret = vfprintf(stderr, fmt, args);
> +       va_end(args);
> +       return ret;
>  }
> -DEFINE_PRINT_FN(warning, 1)
> -DEFINE_PRINT_FN(info, 1)
> -DEFINE_PRINT_FN(debug, 1)
>
>  #define EXIT_FAIL_LIBBPF EXIT_FAILURE
>  #define EXIT_FAIL_OPTION 2
> @@ -120,15 +119,14 @@ int main(int argc, char **argv)
>         int longindex = 0;
>         int opt;
>
> -       libbpf_set_print(libbpf_warning, libbpf_info, NULL);
> +       libbpf_set_print(libbpf_print);
>
>         /* Parse commands line args */
>         while ((opt = getopt_long(argc, argv, "hDq",
>                                   long_options, &longindex)) != -1) {
>                 switch (opt) {
>                 case 'D':
> -                       libbpf_set_print(libbpf_warning, libbpf_info,
> -                                        libbpf_debug);
> +                       debug = 1;
>                         break;
>                 case 'q': /* Use in scripting mode */
>                         verbose = 0;
> diff --git a/tools/testing/selftests/bpf/test_progs.c b/tools/testing/selftests/bpf/test_progs.c
> index d8940b8b2f8d..5eff68ab2c1c 100644
> --- a/tools/testing/selftests/bpf/test_progs.c
> +++ b/tools/testing/selftests/bpf/test_progs.c
> @@ -10,6 +10,7 @@
>  #include <string.h>
>  #include <assert.h>
>  #include <stdlib.h>
> +#include <stdarg.h>
>  #include <time.h>
>
>  #include <linux/types.h>
> @@ -1783,6 +1784,21 @@ static void test_task_fd_query_tp(void)
>                                    "sys_enter_read");
>  }
>
> +static int libbpf_print(enum libbpf_print_level level,
> +                       const char *format, ...)
> +{
> +       va_list args;
> +       int ret;
> +
> +       if (level == LIBBPF_DEBUG)
> +               return 0;
> +
> +       va_start(args, format);
> +       ret = vfprintf(stderr, format, args);
> +       va_end(args);
> +       return ret;
> +}
> +
>  static void test_reference_tracking()
>  {
>         const char *file = "./test_sk_lookup_kern.o";
> @@ -1809,9 +1825,9 @@ static void test_reference_tracking()
>
>                 /* Expect verifier failure if test name has 'fail' */
>                 if (strstr(title, "fail") != NULL) {
> -                       libbpf_set_print(NULL, NULL, NULL);
> +                       libbpf_set_print(NULL);
>                         err = !bpf_program__load(prog, "GPL", 0);
> -                       libbpf_set_print(printf, printf, NULL);
> +                       libbpf_set_print(libbpf_print);
>                 } else {
>                         err = bpf_program__load(prog, "GPL", 0);
>                 }
> --
> 2.17.1
>
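
A note on the attribute change from format(printf, 1, 2) to format(printf, 2, 3) in the patch: the two numbers are the 1-based positions of the format string and of the first variadic argument, so both shift by one when a level parameter is placed in front. The sketch below is a user-space illustration with hypothetical names (enum log_level, log_print(), log_debug_enabled), not code from libbpf or its tests.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

enum log_level { LOG_WARN, LOG_INFO, LOG_DEBUG };

static int log_debug_enabled;	/* runtime switch, off by default */

/* Parameter 1 is the level, parameter 2 the format string and
 * parameter 3 the first variadic argument, hence (printf, 2, 3). */
__attribute__((format(printf, 2, 3)))
static int log_print(enum log_level level, const char *fmt, ...)
{
	static const char *const names[] = { "warn", "info", "debug" };
	va_list ap;
	int n, m;

	assert((unsigned int)level < sizeof(names) / sizeof(names[0]));
	if (level == LOG_DEBUG && !log_debug_enabled)
		return 0;	/* filtered out before any formatting */

	n = fprintf(stderr, "[%s] ", names[level]);
	va_start(ap, fmt);
	m = vfprintf(stderr, fmt, ap);
	va_end(ap);
	return (n < 0 || m < 0) ? -1 : n + m;
}
```

With the attribute in place, GCC and Clang can warn about a call such as log_print(LOG_INFO, "%s\n", 42) when format checking is enabled. Without it, or with the old (1, 2) positions, the mismatch would not be checked correctly.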

^ permalink raw reply

* Re: [PATCH net-next v4 00/12] net: y2038-safe socket timestamps
From: Willem de Bruijn @ 2019-02-01 18:59 UTC (permalink / raw)
  To: Deepa Dinamani
  Cc: David Miller, LKML, Network Development, Arnd Bergmann,
	y2038 Mailman List
In-Reply-To: <20190201154356.15536-1-deepa.kernel@gmail.com>

On Fri, Feb 1, 2019 at 7:47 AM Deepa Dinamani <deepa.kernel@gmail.com> wrote:
>
> The series introduces new socket timestamps that are
> y2038 safe.
>
> The time data types used for the existing socket timestamp
> options: SO_TIMESTAMP, SO_TIMESTAMPNS and SO_TIMESTAMPING
> are not y2038 safe. The series introduces SO_TIMESTAMP_NEW,
> SO_TIMESTAMPNS_NEW and SO_TIMESTAMPING_NEW to replace these.
> These new timestamps can be used on all architectures.
>
> The alternative considered was to extend the sys_setsockopt()
> by using the flags. We did not receive any strong opinions about
> either of the approaches. Hence, this was chosen, as glibc folks
> preferred this.
>
> The series does not deal with updating the internal kernel socket
> calls like rxrpc to make them y2038 safe. This will be dealt
> with separately.
>
> Note that the timestamp behavior already does not match the
> behavior specified in the man page:
> SIOCGSTAMP
>     This ioctl should only be used if the socket option SO_TIMESTAMP
>         is not set on the socket. Otherwise, it returns the timestamp of
>         the last packet that was received while SO_TIMESTAMP was not set,
>         or it fails if no such packet has been received,
>         (i.e., ioctl(2) returns -1 with errno set to ENOENT).
>
> The recommendation is to update the man page to remove the above statement.
>
> The overview of the socket timestamp series is as below:
> 1. Delete asm specific socket.h when possible.
> 2. Support SO/SCM_TIMESTAMP* options only in userspace.
> 3. Rename current SO/SCM_TIMESTAMP* to SO/SCM_TIMESTAMP*_OLD.
> 4. Alter socket options so that SOCK_RCVTSTAMPNS does
>    not rely on SOCK_RCVTSTAMP.
> 5. Introduce y2038 safe types for socket timestamp.
> 6. Introduce new y2038 safe socket options SO/SCM_TIMESTAMP*_NEW.
> 7. Introduce new y2038 safe socket timeout options.
>
> Changes since v3:
> * Rebased onto net-next and fixups as per review comments
> * Merged the socket timeout series
> * Integrated Arnd's patch to simplify compat handling of timeout syscalls
>
> Changes since v2:
> * Removed extra functions to reduce diff churn as per code review
>
> Changes since v1:
> * Dropped the change to disentangle sock flags
> * Renamed sock_timeval to __kernel_sock_timeval
> * Updated a few comments
> * Added documentation changes
>
> Arnd Bergmann (1):
>   socket: move compat timeout handling into sock.c
>
> Deepa Dinamani (11):
>   selftests: add missing include unistd
>   arch: Use asm-generic/socket.h when possible
>   sockopt: Rename SO_TIMESTAMP* to SO_TIMESTAMP*_OLD
>   arch: sparc: Override struct __kernel_old_timeval
>   socket: Use old_timeval types for socket timestamps
>   socket: Add struct __kernel_sock_timeval
>   socket: Add SO_TIMESTAMP[NS]_NEW
>   socket: Add SO_TIMESTAMPING_NEW
>   socket: Update timestamping Documentation
>   socket: Rename SO_RCVTIMEO/ SO_SNDTIMEO with _OLD suffixes
>   sock: Add SO_RCVTIMEO_NEW and SO_SNDTIMEO_NEW
>
>  Documentation/networking/timestamping.txt     |  43 ++++-
>  arch/alpha/include/uapi/asm/socket.h          |  47 ++++--
>  arch/ia64/include/uapi/asm/Kbuild             |   1 +
>  arch/ia64/include/uapi/asm/socket.h           | 122 --------------
>  arch/mips/include/uapi/asm/socket.h           |  47 ++++--
>  arch/parisc/include/uapi/asm/socket.h         |  46 ++++--
>  arch/powerpc/include/uapi/asm/socket.h        |   4 +-
>  arch/s390/include/uapi/asm/Kbuild             |   1 +
>  arch/s390/include/uapi/asm/socket.h           | 119 --------------
>  arch/sparc/include/uapi/asm/posix_types.h     |  10 ++
>  arch/sparc/include/uapi/asm/socket.h          |  49 ++++--
>  arch/x86/include/uapi/asm/Kbuild              |   1 +
>  arch/x86/include/uapi/asm/socket.h            |   1 -
>  arch/xtensa/include/asm/Kbuild                |   1 +
>  arch/xtensa/include/uapi/asm/Kbuild           |   1 +
>  arch/xtensa/include/uapi/asm/socket.h         | 124 --------------
>  drivers/isdn/mISDN/socket.c                   |   2 +-
>  fs/dlm/lowcomms.c                             |   4 +-
>  include/linux/skbuff.h                        |  24 ++-
>  include/linux/socket.h                        |   8 +
>  include/net/sock.h                            |   1 +
>  include/uapi/asm-generic/socket.h             |  48 ++++--
>  include/uapi/linux/errqueue.h                 |   4 +
>  include/uapi/linux/time.h                     |   7 +
>  net/bluetooth/hci_sock.c                      |   4 +-
>  net/compat.c                                  |  78 +--------
>  net/core/scm.c                                |  27 ++++
>  net/core/sock.c                               | 151 +++++++++++++-----
>  net/ipv4/tcp.c                                |  61 ++++---
>  net/rds/af_rds.c                              |  10 +-
>  net/rds/recv.c                                |  18 ++-
>  net/rxrpc/local_object.c                      |   2 +-
>  net/smc/af_smc.c                              |   3 +-
>  net/socket.c                                  |  50 ++++--
>  net/vmw_vsock/af_vsock.c                      |   4 +-
>  .../networking/timestamping/rxtimestamp.c     |   1 +
>  36 files changed, 541 insertions(+), 583 deletions(-)
>  delete mode 100644 arch/ia64/include/uapi/asm/socket.h
>  delete mode 100644 arch/s390/include/uapi/asm/socket.h
>  delete mode 100644 arch/x86/include/uapi/asm/socket.h
>  delete mode 100644 arch/xtensa/include/uapi/asm/socket.h
>
> --
> 2.17.1
>

For the series:

Acked-by: Willem de Bruijn <willemb@google.com>


* Re: [PATCH net-next v4 02/12] socket: move compat timeout handling into sock.c
From: Willem de Bruijn @ 2019-02-01 18:58 UTC (permalink / raw)
  To: Deepa Dinamani
  Cc: David Miller, LKML, Network Development, Arnd Bergmann,
	y2038 Mailman List
In-Reply-To: <20190201154356.15536-3-deepa.kernel@gmail.com>

On Fri, Feb 1, 2019 at 7:48 AM Deepa Dinamani <deepa.kernel@gmail.com> wrote:
>
> From: Arnd Bergmann <arnd@arndb.de>
>
> This is a cleanup to prepare for the addition of 64-bit time_t
> in SO_SNDTIMEO/SO_RCVTIMEO. The existing compat handler seems
> unnecessarily complex and error-prone; moving it all into the
> main setsockopt()/getsockopt() implementation requires half
> as much code and is easier to extend.
>
> 32-bit user space can now use old_timeval32 on both 32-bit
> and 64-bit machines, while 64-bit code can use
> __kernel_old_timeval.
>
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
> ---

> @@ -1121,7 +1155,8 @@ int sock_getsockopt(struct socket *sock, int level, int optname,
>                 int val;
>                 u64 val64;
>                 struct linger ling;
> -               struct timeval tm;
> +               struct old_timeval32 tm32;
> +               struct __kernel_old_timeval tm;

nit: not used?

same for stm added later in the series
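
As a rough userspace illustration of the idea in the commit message
(one handler that accepts either the 32-bit or the native timeval
layout instead of a separate compat path), consider the sketch below.
Everything in it is hypothetical: demo_timeval32, demo_timeval_native,
demo_read_timeout() and the is_compat flag are made-up names, and the
-1 return stands in for whatever error code the real code uses.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirrors of a 32-bit and a native-long timeval layout. */
struct demo_timeval32 { int32_t tv_sec; int32_t tv_usec; };
struct demo_timeval_native { long tv_sec; long tv_usec; };

/*
 * Read a timeout from a caller-supplied buffer in either layout.
 * is_compat selects the 32-bit layout. Returns 0 on success, -1 if
 * the buffer is too short or tv_usec is outside [0, 1000000).
 */
int demo_read_timeout(const void *optval, size_t optlen, int is_compat,
		      struct demo_timeval_native *out)
{
	assert(out != NULL);

	if (is_compat) {
		struct demo_timeval32 tv32;

		if (optlen < sizeof(tv32))
			return -1;
		memcpy(&tv32, optval, sizeof(tv32));
		/* Widen the 32-bit fields into the native layout. */
		out->tv_sec = tv32.tv_sec;
		out->tv_usec = tv32.tv_usec;
	} else {
		if (optlen < sizeof(*out))
			return -1;
		memcpy(out, optval, sizeof(*out));
	}

	if (out->tv_usec < 0 || out->tv_usec >= 1000000)
		return -1;
	return 0;
}
```

Both callers share the range check at the end, which is the kind of
duplication a single code path avoids.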


* Re: Co-existing XDP generic and native mode? (Re: [PATCH bpf-next v5 5/8] xdp: Provide extack messages when prog attachment failed)
From: Jakub Kicinski @ 2019-02-01 18:47 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: daniel, ast, David Miller, Maciej Fijalkowski, netdev,
	john.fastabend, David Ahern, Saeed Mahameed
In-Reply-To: <20190201080236.446d84d4@redhat.com>

On Fri, 1 Feb 2019 08:02:36 +0100, Jesper Dangaard Brouer wrote:
> On Thu, 31 Jan 2019 19:11:01 -0800
> Jakub Kicinski <jakub.kicinski@netronome.com> wrote:
> 
> > On Fri,  1 Feb 2019 01:19:51 +0100, Maciej Fijalkowski wrote:  
> > >  		if (__dev_xdp_query(dev, bpf_chk, XDP_QUERY_PROG) ||
> > > -		    __dev_xdp_query(dev, bpf_chk, XDP_QUERY_PROG_HW))
> > > +		    __dev_xdp_query(dev, bpf_chk, XDP_QUERY_PROG_HW)) {
> > > +			NL_SET_ERR_MSG(extack, "native and generic XDP can't be active at the same time");
> > >  			return -EEXIST;
> > > +		}    
> > 
> > This reminds me, since we allowed native/driver and offloaded XDP
> > programs to coexist in a25717d2b604 ("xdp: support simultaneous 
> > driver and hw XDP attachment") I got an internal feature request 
> > to also allow generic and native mode.  Would anyone object to that?  
> 
> Well, I will object ;-)
> 
> I have two refactor ideas [1] and [2], that depend on not allowing
> XDP-native and XDP-generic to co-exist.   The general idea is to let
> XDP-native use the same fields in net_device->rx[] as XDP-generic given
> they (currently) cannot co-exist. 
> The goal is (1) to move stuff out of driver code, and (2) hopefully
> make it easier to implement per RXq XDP progs.

You mean you'd use one pointer to keep the prog in the RXQ structure?
Then some form of extra flag will be necessary to distinguish?
I.e.:
 if (rxq->prog && rxq->is_native)
	/* got_prog */

rather than:
 if (rxq->native_prog)
	/* got_prog */
 
The cost of this reuse would be a read-only cache line per-q when XDP is
not enabled.  Right now drivers have the ability to pack the XDP prog
into a structure which is in cache already, and don't need to bring the
entire RXQ structure out (which is cache line aligned so driver authors
can't do anything to place it cleverly).

No doubt, though, that if we allow both to be enabled we will have to
bloat the data structures.
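
A self-contained sketch of the two layouts being compared, for anyone
who wants to see the trade-off spelled out. All names here (demo_prog,
demo_rxq_shared, demo_rxq_split and the two lookup helpers) are
hypothetical and exist only for illustration; they are not the
kernel's netdev_rx_queue or any driver structure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an attached XDP program. */
struct demo_prog { int id; };

/* Layout A: one pointer shared by both modes, plus a flag. */
struct demo_rxq_shared {
	struct demo_prog *prog;
	bool is_native;
};

/* Layout B: one pointer per mode, so both can be attached at once. */
struct demo_rxq_split {
	struct demo_prog *native_prog;
	struct demo_prog *generic_prog;
};

/* Layout A needs two loads to decide whether a native prog is attached. */
struct demo_prog *demo_shared_native(const struct demo_rxq_shared *rxq)
{
	assert(rxq != NULL);
	return (rxq->prog && rxq->is_native) ? rxq->prog : NULL;
}

/* Layout B needs a single pointer check, at the cost of an extra field. */
struct demo_prog *demo_split_native(const struct demo_rxq_split *rxq)
{
	assert(rxq != NULL);
	return rxq->native_prog;
}
```

With layout A a generic-mode program makes demo_shared_native() return
NULL even though prog is set; layout B keeps the modes independent but
grows by one pointer per queue.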

> These are only refactor ideas, so if you can argue why your internal
> feature request for simultaneous generic and native makes more sense,
> then I'm open to allowing this.

The request was actually to enable xdpoffload and xdpgeneric at the
same time.  I'm happy to have that as another HW offload exclusive
for now :)

> [1] https://github.com/xdp-project/xdp-project/blob/master/areas/core/xdp_per_rxq01.org#refactor-idea-move-xdp_rxq_info-to-net_devicenetdev_rx_queue
> 
> [2] https://github.com/xdp-project/xdp-project/blob/master/areas/core/xdp_per_rxq01.org#refactor-idea-xdpbpf_prog-into-netdev_rx_queuenet_device
> 
> > Apart from a touch up to test_offload.py I don't think anything 
> > would care.  netlink can already carry multiple IDs, iproute2
> > understands it, too..  
> 
> And we did notice you added support for HW+native:
>  [3] https://github.com/xdp-project/xdp-project/blob/master/areas/core/xdp_per_rxq01.org


