Netdev List

Netdev List
 help / color / mirror / Atom feed

* [PATCH 1/2] can: hi311x: Acquire SPI lock on ->do_get_berr_counter
From: Marc Kleine-Budde @ 2018-05-10 16:47 UTC (permalink / raw)
  To: netdev
  Cc: davem, linux-can, kernel, Lukas Wunner, Mathias Duckeck,
	Akshay Bhat, Casey Fitzpatrick, Stef Walter, Karel Zak, stable,
	Marc Kleine-Budde
In-Reply-To: <20180510164749.20481-1-mkl@pengutronix.de>

From: Lukas Wunner <lukas@wunner.de>

hi3110_get_berr_counter() may run concurrently to the rest of the driver
but neglects to acquire the lock protecting access to the SPI device.
As a result, it and the rest of the driver may clobber each other's tx
and rx buffers.

We became aware of this issue because transmission of packets with
"cangen -g 0 -i -x" frequently hung.  It turns out that agetty executes
->do_get_berr_counter every few seconds via the following call stack:

    CPU: 2 PID: 1605 Comm: agetty
    [<7f3f7500>] (hi3110_get_berr_counter [hi311x])
    [<7f130204>] (can_fill_info [can_dev])
    [<80693bc0>] (rtnl_fill_ifinfo)
    [<806949ec>] (rtnl_dump_ifinfo)
    [<806b4834>] (netlink_dump)
    [<806b4bc8>] (netlink_recvmsg)
    [<8065f180>] (sock_recvmsg)
    [<80660f90>] (___sys_recvmsg)
    [<80661e7c>] (__sys_recvmsg)
    [<80661ec0>] (SyS_recvmsg)
    [<80108b20>] (ret_fast_syscall+0x0/0x1c)

agetty listens to netlink messages in order to update the login prompt
when IP addresses change (if /etc/issue contains \4 or \6 escape codes):
https://git.kernel.org/pub/scm/utils/util-linux/util-linux.git/commit/?id=e36deb6424e8

It's a useful feature, though it seems questionable that it causes CAN
bit error statistics to be queried.

Be that as it may, if hi3110_get_berr_counter() is invoked while a frame
is sent by hi3110_hw_tx(), bogus SPI transfers like the following may
occur:

    => 12 00             (hi3110_get_berr_counter() wanted to transmit
                          EC 00 to query the transmit error counter,
                          but the first byte was overwritten by
                          hi3110_hw_tx_frame())

    => EA 00 3E 80 01 FB (hi3110_hw_tx_frame() wanted to transmit a
                          frame, but the first byte was overwritten by
                          hi3110_get_berr_counter() because it wanted
                          to query the receive error counter)

This sequence hangs the transmission because the driver believes it has
sent a frame and waits for the interrupt signaling completion, but in
reality the chip has never sent away the frame since the commands it
received were malformed.

Fix by acquiring the SPI lock in hi3110_get_berr_counter().

I've scrutinized the entire driver for further unlocked SPI accesses but
found no others.

Cc: Mathias Duckeck <m.duckeck@kunbus.de>
Cc: Akshay Bhat <akshay.bhat@timesys.com>
Cc: Casey Fitzpatrick <casey.fitzpatrick@timesys.com>
Cc: Stef Walter <stefw@redhat.com>
Cc: Karel Zak <kzak@redhat.com>
Cc: stable@vger.kernel.org # v4.12+
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Akshay Bhat <akshay.bhat@timesys.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
---
 drivers/net/can/spi/hi311x.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/can/spi/hi311x.c b/drivers/net/can/spi/hi311x.c
index 5590c559a8ca..c2cf254e4e95 100644
--- a/drivers/net/can/spi/hi311x.c
+++ b/drivers/net/can/spi/hi311x.c
@@ -427,8 +427,10 @@ static int hi3110_get_berr_counter(const struct net_device *net,
 	struct hi3110_priv *priv = netdev_priv(net);
 	struct spi_device *spi = priv->spi;
 
+	mutex_lock(&priv->hi3110_lock);
 	bec->txerr = hi3110_read(spi, HI3110_READ_TEC);
 	bec->rxerr = hi3110_read(spi, HI3110_READ_REC);
+	mutex_unlock(&priv->hi3110_lock);
 
 	return 0;
 }
-- 
2.17.0

^ permalink raw reply related

* [PATCH] [PATCH net v2] ipv6: remove min MTU check for ipsec tunnels
From: Ashwanth Goli @ 2018-05-10 16:48 UTC (permalink / raw)
  To: netdev, davem; +Cc: pabeni, dsahern

With 749439bfac "fix udpv6 sendmsg crash caused by too small MTU"
ipsec tunnels that report a MTU less than IPV6_MIN_MTU are broken
even for packets that are smaller than IPV6_MIN_MTU.

According to rfc2473#section-7.1

    if the original IPv6 packet is equal or  smaller  than  the
    IPv6 minimum link MTU, the tunnel entry-point node
    encapsulates the original packet, and subsequently
    fragments the resulting IPv6 tunnel packet into IPv6
    fragments that do not exceed the Path MTU to the tunnel
    exit-point.

Dropping the MTU check for ipsec tunnel destinations.

Signed-off-by: Ashwanth Goli <ashwanth@codeaurora.org>
---
 net/ipv6/ip6_output.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 2e891d2..c4c3313 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1235,7 +1235,7 @@ static int ip6_setup_cork(struct sock *sk, struct inet_cork_full *cork,
 		if (np->frag_size)
 			mtu = np->frag_size;
 	}
-	if (mtu < IPV6_MIN_MTU)
+	if (!(rt->dst.flags & DST_XFRM_TUNNEL) && mtu < IPV6_MIN_MTU)
 		return -EINVAL;
 	cork->base.fragsize = mtu;
 	if (dst_allfrag(xfrm_dst_path(&rt->dst)))
-- 
1.9.1

^ permalink raw reply related

* Re: [PATCH] [PATCH net v2] ipv6: remove min MTU check for ipsec tunnels
From: Eric Dumazet @ 2018-05-10 17:05 UTC (permalink / raw)
  To: Ashwanth Goli, netdev, davem; +Cc: pabeni, dsahern
In-Reply-To: <1525970887-26600-1-git-send-email-ashwanth@codeaurora.org>



On 05/10/2018 09:48 AM, Ashwanth Goli wrote:
> With 749439bfac "fix udpv6 sendmsg crash caused by too small MTU"
> ipsec tunnels that report a MTU less than IPV6_MIN_MTU are broken
> even for packets that are smaller than IPV6_MIN_MTU.
> 
> According to rfc2473#section-7.1
> 
>     if the original IPv6 packet is equal or  smaller  than  the
>     IPv6 minimum link MTU, the tunnel entry-point node
>     encapsulates the original packet, and subsequently
>     fragments the resulting IPv6 tunnel packet into IPv6
>     fragments that do not exceed the Path MTU to the tunnel
>     exit-point.
> 
> Dropping the MTU check for ipsec tunnel destinations.
> 
> Signed-off-by: Ashwanth Goli <ashwanth@codeaurora.org>
> ---
>  net/ipv6/ip6_output.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
> index 2e891d2..c4c3313 100644
> --- a/net/ipv6/ip6_output.c
> +++ b/net/ipv6/ip6_output.c
> @@ -1235,7 +1235,7 @@ static int ip6_setup_cork(struct sock *sk, struct inet_cork_full *cork,
>  		if (np->frag_size)
>  			mtu = np->frag_size;
>  	}
> -	if (mtu < IPV6_MIN_MTU)
> +	if (!(rt->dst.flags & DST_XFRM_TUNNEL) && mtu < IPV6_MIN_MTU)
>  		return -EINVAL;
>  	cork->base.fragsize = mtu;
>  	if (dst_allfrag(xfrm_dst_path(&rt->dst)))
> 

Given that the (mtu < IPV6_MIN_MTU) condition is unlikely, I would reorder to :

if (mtu < IPV6_MIN_MTU && !(rt->dst.flags & DST_XFRM_TUNNEL))
	return -EINVAL;

The fast path should not look at dst.flags

^ permalink raw reply

* [PATCH bpf] tools: bpf: handle NULL return in bpf_prog_load_xattr()
From: Jakub Kicinski @ 2018-05-10 17:09 UTC (permalink / raw)
  To: alexei.starovoitov, daniel; +Cc: oss-drivers, netdev, Jakub Kicinski

bpf_object__open() can return error pointer as well as NULL.
Fix error handling in bpf_prog_load_xattr() (and indirectly
bpf_prog_load()).

Fixes: 6f6d33f3b3d0 ("bpf: selftests add sockmap tests")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
---
 tools/lib/bpf/libbpf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 5922443063f0..0f9f06df49bc 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -2035,7 +2035,7 @@ int bpf_prog_load_xattr(const struct bpf_prog_load_attr *attr,
 		return -EINVAL;
 
 	obj = bpf_object__open(attr->file);
-	if (IS_ERR(obj))
+	if (IS_ERR_OR_NULL(obj))
 		return -ENOENT;
 
 	bpf_object__for_each_program(prog, obj) {
-- 
2.17.0

^ permalink raw reply related

* [PATCH] [PATCH net v2] ipv6: remove min MTU check for ipsec tunnels
From: Ashwanth Goli @ 2018-05-10 17:19 UTC (permalink / raw)
  To: netdev, davem; +Cc: pabeni, dsahern, eric.dumazet

With 749439bfac "fix udpv6 sendmsg crash caused by too small MTU"
ipsec tunnels that report a MTU less than IPV6_MIN_MTU are broken
even for packets that are smaller than IPV6_MIN_MTU.

According to rfc2473#section-7.1

    if the original IPv6 packet is equal or  smaller  than  the
    IPv6 minimum link MTU, the tunnel entry-point node
    encapsulates the original packet, and subsequently
    fragments the resulting IPv6 tunnel packet into IPv6
    fragments that do not exceed the Path MTU to the tunnel
    exit-point.

Dropping the MTU check for ipsec tunnel destinations.

Signed-off-by: Ashwanth Goli <ashwanth@codeaurora.org>
---
 net/ipv6/ip6_output.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 2e891d2..c4c3313 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1235,7 +1235,7 @@ static int ip6_setup_cork(struct sock *sk, struct inet_cork_full *cork,
 		if (np->frag_size)
 			mtu = np->frag_size;
 	}
-	if (mtu < IPV6_MIN_MTU)
+	if (!(rt->dst.flags & DST_XFRM_TUNNEL) && mtu < IPV6_MIN_MTU)
 		return -EINVAL;
 	cork->base.fragsize = mtu;
 	if (dst_allfrag(xfrm_dst_path(&rt->dst)))
-- 
1.9.1

^ permalink raw reply related

* [PATCH bpf-next 0/7] bpf: add perf event reading loop and move samples closer to libbpf
From: Jakub Kicinski @ 2018-05-10 17:24 UTC (permalink / raw)
  To: alexei.starovoitov, daniel; +Cc: oss-drivers, netdev, Jakub Kicinski

Hi!

This series started out as a follow up to the bpftool perf event dumping
patches.

As suggested by Daniel patch 1 makes use of PERF_SAMPLE_TIME to simplify
code and improve accuracy of timestamps.

Remaining patches are trying to move perf event loop into libbpf as
suggested by Alexei.  One user for this new function is bpftool which
links with libbpf nicely, the other, unfortunately, is in samples/bpf.
Remaining patches make samples/bpf link against full libbpf.a (not just
a handful of objects).  Once we have full power of libbpf at our disposal
we can convert some of XDP samples to use libbpf loader instead of
bpf_load.c.  My understanding is that this is the desired direction,
at least for networking code.

Jakub Kicinski (7):
  tools: bpftool: use PERF_SAMPLE_TIME instead of reading the clock
  samples: bpf: rename struct bpf_map_def to avoid conflict with libbpf
  samples: bpf: compile and link against full libbpf
  tools: bpf: move the event reading loop to libbpf
  tools: bpf: improve comments in libbpf.h
  tools: bpf: don't complain about no kernel version for networking code
  samples: bpf: convert some XDP samples from bpf_load to libbpf

 samples/bpf/Makefile                        |  37 +++++--
 samples/bpf/bpf_load.c                      |  10 +-
 samples/bpf/bpf_load.h                      |   4 +-
 samples/bpf/trace_output_user.c             |   6 +-
 samples/bpf/xdp1_user.c                     |  31 ++++--
 samples/bpf/xdp_adjust_tail_user.c          |  36 ++++---
 samples/bpf/xdp_rxq_info_user.c             |  46 ++++++---
 tools/bpf/bpftool/map_perf_ring.c           |  83 ++++-----------
 tools/lib/bpf/Makefile                      |   2 +-
 tools/lib/bpf/libbpf.c                      | 107 ++++++++++++++++++--
 tools/lib/bpf/libbpf.h                      |  61 ++++++-----
 tools/testing/selftests/bpf/Makefile        |   2 +-
 tools/testing/selftests/bpf/test_progs.c    |   6 +-
 tools/testing/selftests/bpf/trace_helpers.c |  87 ++++++----------
 tools/testing/selftests/bpf/trace_helpers.h |  11 +-
 15 files changed, 307 insertions(+), 222 deletions(-)

-- 
2.17.0

^ permalink raw reply

* [PATCH bpf-next 1/7] tools: bpftool: use PERF_SAMPLE_TIME instead of reading the clock
From: Jakub Kicinski @ 2018-05-10 17:24 UTC (permalink / raw)
  To: alexei.starovoitov, daniel; +Cc: oss-drivers, netdev, Jakub Kicinski
In-Reply-To: <20180510172443.17238-1-jakub.kicinski@netronome.com>

Ask the kernel to include sample time in each even instead of
reading the clock.  This is also more accurate because our
clock reading was done when user space would dump the buffer,
not when sample was produced.

Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
---
 tools/bpf/bpftool/map_perf_ring.c | 17 ++++++-----------
 1 file changed, 6 insertions(+), 11 deletions(-)

diff --git a/tools/bpf/bpftool/map_perf_ring.c b/tools/bpf/bpftool/map_perf_ring.c
index c5a2ced8552d..9ae4bb8a2cad 100644
--- a/tools/bpf/bpftool/map_perf_ring.c
+++ b/tools/bpf/bpftool/map_perf_ring.c
@@ -39,6 +39,7 @@ struct event_ring_info {
 
 struct perf_event_sample {
 	struct perf_event_header header;
+	u64 time;
 	__u32 size;
 	unsigned char data[];
 };
@@ -57,17 +58,9 @@ print_bpf_output(struct event_ring_info *ring, struct perf_event_sample *e)
 		__u64 id;
 		__u64 lost;
 	} *lost = (void *)e;
-	struct timespec ts;
-
-	if (clock_gettime(CLOCK_MONOTONIC, &ts)) {
-		perror("Can't read clock for timestamp");
-		return;
-	}
 
 	if (json_output) {
 		jsonw_start_object(json_wtr);
-		jsonw_name(json_wtr, "timestamp");
-		jsonw_uint(json_wtr, ts.tv_sec * 1000000000ull + ts.tv_nsec);
 		jsonw_name(json_wtr, "type");
 		jsonw_uint(json_wtr, e->header.type);
 		jsonw_name(json_wtr, "cpu");
@@ -75,6 +68,8 @@ print_bpf_output(struct event_ring_info *ring, struct perf_event_sample *e)
 		jsonw_name(json_wtr, "index");
 		jsonw_uint(json_wtr, ring->key);
 		if (e->header.type == PERF_RECORD_SAMPLE) {
+			jsonw_name(json_wtr, "timestamp");
+			jsonw_uint(json_wtr, e->time);
 			jsonw_name(json_wtr, "data");
 			print_data_json(e->data, e->size);
 		} else if (e->header.type == PERF_RECORD_LOST) {
@@ -89,8 +84,8 @@ print_bpf_output(struct event_ring_info *ring, struct perf_event_sample *e)
 		jsonw_end_object(json_wtr);
 	} else {
 		if (e->header.type == PERF_RECORD_SAMPLE) {
-			printf("== @%ld.%ld CPU: %d index: %d =====\n",
-			       (long)ts.tv_sec, ts.tv_nsec,
+			printf("== @%lld.%09lld CPU: %d index: %d =====\n",
+			       e->time / 1000000000ULL, e->time % 1000000000ULL,
 			       ring->cpu, ring->key);
 			fprint_hex(stdout, e->data, e->size, " ");
 			printf("\n");
@@ -185,7 +180,7 @@ static void perf_event_unmap(void *mem)
 static int bpf_perf_event_open(int map_fd, int key, int cpu)
 {
 	struct perf_event_attr attr = {
-		.sample_type = PERF_SAMPLE_RAW,
+		.sample_type = PERF_SAMPLE_RAW | PERF_SAMPLE_TIME,
 		.type = PERF_TYPE_SOFTWARE,
 		.config = PERF_COUNT_SW_BPF_OUTPUT,
 	};
-- 
2.17.0

^ permalink raw reply related

* [PATCH bpf-next 2/7] samples: bpf: rename struct bpf_map_def to avoid conflict with libbpf
From: Jakub Kicinski @ 2018-05-10 17:24 UTC (permalink / raw)
  To: alexei.starovoitov, daniel; +Cc: oss-drivers, netdev, Jakub Kicinski
In-Reply-To: <20180510172443.17238-1-jakub.kicinski@netronome.com>

Both tools/lib/bpf/libbpf.h and samples/bpf/bpf_load.h define their
own version of struct bpf_map_def.  The version in bpf_load.h has
more fields.  libbpf does not support inner maps and its definition
of struct bpf_map_def lacks the related fields.  Rename the definition
in bpf_load.h (samples/bpf) to avoid conflicts.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
---
 samples/bpf/bpf_load.c | 10 +++++-----
 samples/bpf/bpf_load.h |  4 ++--
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/samples/bpf/bpf_load.c b/samples/bpf/bpf_load.c
index da9bccfaf391..a6b290de5632 100644
--- a/samples/bpf/bpf_load.c
+++ b/samples/bpf/bpf_load.c
@@ -420,7 +420,7 @@ static int load_elf_maps_section(struct bpf_map_data *maps, int maps_shndx,
 
 	/* Keeping compatible with ELF maps section changes
 	 * ------------------------------------------------
-	 * The program size of struct bpf_map_def is known by loader
+	 * The program size of struct bpf_load_map_def is known by loader
 	 * code, but struct stored in ELF file can be different.
 	 *
 	 * Unfortunately sym[i].st_size is zero.  To calculate the
@@ -429,7 +429,7 @@ static int load_elf_maps_section(struct bpf_map_data *maps, int maps_shndx,
 	 * symbols.
 	 */
 	map_sz_elf = data_maps->d_size / nr_maps;
-	map_sz_copy = sizeof(struct bpf_map_def);
+	map_sz_copy = sizeof(struct bpf_load_map_def);
 	if (map_sz_elf < map_sz_copy) {
 		/*
 		 * Backward compat, loading older ELF file with
@@ -448,8 +448,8 @@ static int load_elf_maps_section(struct bpf_map_data *maps, int maps_shndx,
 
 	/* Memcpy relevant part of ELF maps data to loader maps */
 	for (i = 0; i < nr_maps; i++) {
+		struct bpf_load_map_def *def;
 		unsigned char *addr, *end;
-		struct bpf_map_def *def;
 		const char *map_name;
 		size_t offset;
 
@@ -464,9 +464,9 @@ static int load_elf_maps_section(struct bpf_map_data *maps, int maps_shndx,
 
 		/* Symbol value is offset into ELF maps section data area */
 		offset = sym[i].st_value;
-		def = (struct bpf_map_def *)(data_maps->d_buf + offset);
+		def = (struct bpf_load_map_def *)(data_maps->d_buf + offset);
 		maps[i].elf_offset = offset;
-		memset(&maps[i].def, 0, sizeof(struct bpf_map_def));
+		memset(&maps[i].def, 0, sizeof(struct bpf_load_map_def));
 		memcpy(&maps[i].def, def, map_sz_copy);
 
 		/* Verify no newer features were requested */
diff --git a/samples/bpf/bpf_load.h b/samples/bpf/bpf_load.h
index 2c3d0b448632..f9da59bca0cc 100644
--- a/samples/bpf/bpf_load.h
+++ b/samples/bpf/bpf_load.h
@@ -7,7 +7,7 @@
 #define MAX_MAPS 32
 #define MAX_PROGS 32
 
-struct bpf_map_def {
+struct bpf_load_map_def {
 	unsigned int type;
 	unsigned int key_size;
 	unsigned int value_size;
@@ -21,7 +21,7 @@ struct bpf_map_data {
 	int fd;
 	char *name;
 	size_t elf_offset;
-	struct bpf_map_def def;
+	struct bpf_load_map_def def;
 };
 
 typedef void (*fixup_map_cb)(struct bpf_map_data *map, int idx);
-- 
2.17.0

^ permalink raw reply related

* [PATCH bpf-next 3/7] samples: bpf: compile and link against full libbpf
From: Jakub Kicinski @ 2018-05-10 17:24 UTC (permalink / raw)
  To: alexei.starovoitov, daniel; +Cc: oss-drivers, netdev, Jakub Kicinski
In-Reply-To: <20180510172443.17238-1-jakub.kicinski@netronome.com>

samples/bpf currently cherry-picks object files from tools/lib/bpf
to link against.  Just compile the full library and link statically
against it.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
---
 samples/bpf/Makefile | 21 ++++++++++++++-------
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index 8e0c7fb6d7cc..5f0bf1367826 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -48,7 +48,7 @@ hostprogs-y += xdp_adjust_tail
 hostprogs-y += xdpsock
 
 # Libbpf dependencies
-LIBBPF := ../../tools/lib/bpf/bpf.o ../../tools/lib/bpf/nlattr.o
+LIBBPF := ../../tools/lib/bpf/libbpf.a
 CGROUP_HELPERS := ../../tools/testing/selftests/bpf/cgroup_helpers.o
 TRACE_HELPERS := ../../tools/testing/selftests/bpf/trace_helpers.o
 
@@ -73,10 +73,10 @@ offwaketime-objs := bpf_load.o $(LIBBPF) offwaketime_user.o $(TRACE_HELPERS)
 spintest-objs := bpf_load.o $(LIBBPF) spintest_user.o $(TRACE_HELPERS)
 map_perf_test-objs := bpf_load.o $(LIBBPF) map_perf_test_user.o
 test_overhead-objs := bpf_load.o $(LIBBPF) test_overhead_user.o
-test_cgrp2_array_pin-objs := $(LIBBPF) test_cgrp2_array_pin.o
-test_cgrp2_attach-objs := $(LIBBPF) test_cgrp2_attach.o
-test_cgrp2_attach2-objs := $(LIBBPF) test_cgrp2_attach2.o $(CGROUP_HELPERS)
-test_cgrp2_sock-objs := $(LIBBPF) test_cgrp2_sock.o
+test_cgrp2_array_pin-objs := test_cgrp2_array_pin.o $(LIBBPF)
+test_cgrp2_attach-objs := test_cgrp2_attach.o $(LIBBPF)
+test_cgrp2_attach2-objs := test_cgrp2_attach2.o $(LIBBPF) $(CGROUP_HELPERS)
+test_cgrp2_sock-objs := test_cgrp2_sock.o $(LIBBPF)
 test_cgrp2_sock2-objs := bpf_load.o $(LIBBPF) test_cgrp2_sock2.o
 xdp1-objs := bpf_load.o $(LIBBPF) xdp1_user.o
 # reuse xdp1 source intentionally
@@ -90,7 +90,7 @@ tc_l2_redirect-objs := bpf_load.o $(LIBBPF) tc_l2_redirect_user.o
 lwt_len_hist-objs := bpf_load.o $(LIBBPF) lwt_len_hist_user.o
 xdp_tx_iptunnel-objs := bpf_load.o $(LIBBPF) xdp_tx_iptunnel_user.o
 test_map_in_map-objs := bpf_load.o $(LIBBPF) test_map_in_map_user.o
-per_socket_stats_example-objs := $(LIBBPF) cookie_uid_helper_example.o
+per_socket_stats_example-objs := cookie_uid_helper_example.o $(LIBBPF)
 xdp_redirect-objs := bpf_load.o $(LIBBPF) xdp_redirect_user.o
 xdp_redirect_map-objs := bpf_load.o $(LIBBPF) xdp_redirect_map_user.o
 xdp_redirect_cpu-objs := bpf_load.o $(LIBBPF) xdp_redirect_cpu_user.o
@@ -162,6 +162,8 @@ HOSTCFLAGS += -I$(srctree)/tools/lib/ -I$(srctree)/tools/include
 HOSTCFLAGS += -I$(srctree)/tools/perf
 
 HOSTCFLAGS_bpf_load.o += -I$(objtree)/usr/include -Wno-unused-variable
+HOSTLOADLIBES_test_lru_dist += -lelf
+HOSTLOADLIBES_sock_example += -lelf
 HOSTLOADLIBES_fds_example += -lelf
 HOSTLOADLIBES_sockex1 += -lelf
 HOSTLOADLIBES_sockex2 += -lelf
@@ -173,6 +175,10 @@ HOSTLOADLIBES_tracex4 += -lelf -lrt
 HOSTLOADLIBES_tracex5 += -lelf
 HOSTLOADLIBES_tracex6 += -lelf
 HOSTLOADLIBES_tracex7 += -lelf
+HOSTLOADLIBES_test_cgrp2_array_pin += -lelf
+HOSTLOADLIBES_test_cgrp2_attach += -lelf
+HOSTLOADLIBES_test_cgrp2_attach2 += -lelf
+HOSTLOADLIBES_test_cgrp2_sock += -lelf
 HOSTLOADLIBES_test_cgrp2_sock2 += -lelf
 HOSTLOADLIBES_load_sock_ops += -lelf
 HOSTLOADLIBES_test_probe_write_user += -lelf
@@ -192,6 +198,7 @@ HOSTLOADLIBES_tc_l2_redirect += -l elf
 HOSTLOADLIBES_lwt_len_hist += -l elf
 HOSTLOADLIBES_xdp_tx_iptunnel += -lelf
 HOSTLOADLIBES_test_map_in_map += -lelf
+HOSTLOADLIBES_per_socket_stats_example += -lelf
 HOSTLOADLIBES_xdp_redirect += -lelf
 HOSTLOADLIBES_xdp_redirect_map += -lelf
 HOSTLOADLIBES_xdp_redirect_cpu += -lelf
@@ -222,7 +229,7 @@ all: $(LIBBPF)
 	@rm -f *~
 
 $(LIBBPF): FORCE
-	$(MAKE) -C $(dir $@) $(notdir $@)
+	$(MAKE) -C $(dir $@)
 
 $(obj)/syscall_nrs.s:	$(src)/syscall_nrs.c
 	$(call if_changed_dep,cc_s_c)
-- 
2.17.0

^ permalink raw reply related

* [PATCH bpf-next 4/7] tools: bpf: move the event reading loop to libbpf
From: Jakub Kicinski @ 2018-05-10 17:24 UTC (permalink / raw)
  To: alexei.starovoitov, daniel; +Cc: oss-drivers, netdev, Jakub Kicinski
In-Reply-To: <20180510172443.17238-1-jakub.kicinski@netronome.com>

There are two copies of event reading loop - in bpftool and
trace_helpers "library".  Consolidate them and move the code
to libbpf.  Return codes from trace_helpers are kept, but
renamed to include LIBBPF prefix.

Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
---
 samples/bpf/Makefile                        |  8 ++
 samples/bpf/trace_output_user.c             |  6 +-
 tools/bpf/bpftool/map_perf_ring.c           | 66 ++++------------
 tools/lib/bpf/Makefile                      |  2 +-
 tools/lib/bpf/libbpf.c                      | 61 +++++++++++++++
 tools/lib/bpf/libbpf.h                      | 13 +++
 tools/testing/selftests/bpf/Makefile        |  2 +-
 tools/testing/selftests/bpf/test_progs.c    |  6 +-
 tools/testing/selftests/bpf/trace_helpers.c | 87 +++++++--------------
 tools/testing/selftests/bpf/trace_helpers.h | 11 +--
 10 files changed, 139 insertions(+), 123 deletions(-)

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index 5f0bf1367826..4f99de93d4db 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -162,6 +162,14 @@ HOSTCFLAGS += -I$(srctree)/tools/lib/ -I$(srctree)/tools/include
 HOSTCFLAGS += -I$(srctree)/tools/perf
 
 HOSTCFLAGS_bpf_load.o += -I$(objtree)/usr/include -Wno-unused-variable
+HOSTCFLAGS_trace_helpers.o += -I$(srctree)/tools/lib/bpf/
+
+HOSTCFLAGS_trace_output_user.o += -I$(srctree)/tools/lib/bpf/
+HOSTCFLAGS_offwaketime_user.o += -I$(srctree)/tools/lib/bpf/
+HOSTCFLAGS_spintest_user.o += -I$(srctree)/tools/lib/bpf/
+HOSTCFLAGS_trace_event_user.o += -I$(srctree)/tools/lib/bpf/
+HOSTCFLAGS_sampleip_user.o += -I$(srctree)/tools/lib/bpf/
+
 HOSTLOADLIBES_test_lru_dist += -lelf
 HOSTLOADLIBES_sock_example += -lelf
 HOSTLOADLIBES_fds_example += -lelf
diff --git a/samples/bpf/trace_output_user.c b/samples/bpf/trace_output_user.c
index 5e78c2ecd08d..da98be721001 100644
--- a/samples/bpf/trace_output_user.c
+++ b/samples/bpf/trace_output_user.c
@@ -48,7 +48,7 @@ static int print_bpf_output(void *data, int size)
 	if (e->cookie != 0x12345678) {
 		printf("BUG pid %llx cookie %llx sized %d\n",
 		       e->pid, e->cookie, size);
-		return PERF_EVENT_ERROR;
+		return LIBBPF_PERF_EVENT_ERROR;
 	}
 
 	cnt++;
@@ -56,10 +56,10 @@ static int print_bpf_output(void *data, int size)
 	if (cnt == MAX_CNT) {
 		printf("recv %lld events per sec\n",
 		       MAX_CNT * 1000000000ll / (time_get_ns() - start_time));
-		return PERF_EVENT_DONE;
+		return LIBBPF_PERF_EVENT_DONE;
 	}
 
-	return PERF_EVENT_CONT;
+	return LIBBPF_PERF_EVENT_CONT;
 }
 
 static void test_bpf_perf_event(void)
diff --git a/tools/bpf/bpftool/map_perf_ring.c b/tools/bpf/bpftool/map_perf_ring.c
index 9ae4bb8a2cad..1832100d1b27 100644
--- a/tools/bpf/bpftool/map_perf_ring.c
+++ b/tools/bpf/bpftool/map_perf_ring.c
@@ -50,14 +50,15 @@ static void int_exit(int signo)
 	stop = true;
 }
 
-static void
-print_bpf_output(struct event_ring_info *ring, struct perf_event_sample *e)
+static enum bpf_perf_event_ret print_bpf_output(void *event, void *priv)
 {
+	struct event_ring_info *ring = priv;
+	struct perf_event_sample *e = event;
 	struct {
 		struct perf_event_header header;
 		__u64 id;
 		__u64 lost;
-	} *lost = (void *)e;
+	} *lost = event;
 
 	if (json_output) {
 		jsonw_start_object(json_wtr);
@@ -96,60 +97,23 @@ print_bpf_output(struct event_ring_info *ring, struct perf_event_sample *e)
 			       e->header.type, e->header.size);
 		}
 	}
+
+	return LIBBPF_PERF_EVENT_CONT;
 }
 
 static void
 perf_event_read(struct event_ring_info *ring, void **buf, size_t *buf_len)
 {
-	volatile struct perf_event_mmap_page *header = ring->mem;
-	__u64 buffer_size = MMAP_PAGE_CNT * get_page_size();
-	__u64 data_tail = header->data_tail;
-	__u64 data_head = header->data_head;
-	void *base, *begin, *end;
-
-	asm volatile("" ::: "memory"); /* in real code it should be smp_rmb() */
-	if (data_head == data_tail)
-		return;
-
-	base = ((char *)header) + get_page_size();
-
-	begin = base + data_tail % buffer_size;
-	end = base + data_head % buffer_size;
-
-	while (begin != end) {
-		struct perf_event_sample *e;
-
-		e = begin;
-		if (begin + e->header.size > base + buffer_size) {
-			long len = base + buffer_size - begin;
-
-			if (*buf_len < e->header.size) {
-				free(*buf);
-				*buf = malloc(e->header.size);
-				if (!*buf) {
-					fprintf(stderr,
-						"can't allocate memory");
-					stop = true;
-					return;
-				}
-				*buf_len = e->header.size;
-			}
-
-			memcpy(*buf, begin, len);
-			memcpy(*buf + len, base, e->header.size - len);
-			e = (void *)*buf;
-			begin = base + e->header.size - len;
-		} else if (begin + e->header.size == base + buffer_size) {
-			begin = base;
-		} else {
-			begin += e->header.size;
-		}
-
-		print_bpf_output(ring, e);
+	enum bpf_perf_event_ret ret;
+
+	ret = bpf_perf_event_read_simple(ring->mem,
+					 MMAP_PAGE_CNT * get_page_size(),
+					 get_page_size(), buf, buf_len,
+					 print_bpf_output, ring);
+	if (ret != LIBBPF_PERF_EVENT_CONT) {
+		fprintf(stderr, "perf read loop failed with %d\n", ret);
+		stop = true;
 	}
-
-	__sync_synchronize(); /* smp_mb() */
-	header->data_tail = data_head;
 }
 
 static int perf_mmap_size(void)
diff --git a/tools/lib/bpf/Makefile b/tools/lib/bpf/Makefile
index e6d5f8d1477f..f3fab4af4260 100644
--- a/tools/lib/bpf/Makefile
+++ b/tools/lib/bpf/Makefile
@@ -69,7 +69,7 @@ FEATURE_USER = .libbpf
 FEATURE_TESTS = libelf libelf-getphdrnum libelf-mmap bpf
 FEATURE_DISPLAY = libelf bpf
 
-INCLUDES = -I. -I$(srctree)/tools/include -I$(srctree)/tools/arch/$(ARCH)/include/uapi -I$(srctree)/tools/include/uapi
+INCLUDES = -I. -I$(srctree)/tools/include -I$(srctree)/tools/arch/$(ARCH)/include/uapi -I$(srctree)/tools/include/uapi -I$(srctree)/tools/perf
 FEATURE_CHECK_CFLAGS-bpf = $(INCLUDES)
 
 check_feat := 1
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 7bcdca13083a..ce96f1fe3f37 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -31,6 +31,7 @@
 #include <unistd.h>
 #include <fcntl.h>
 #include <errno.h>
+#include <perf-sys.h>
 #include <asm/unistd.h>
 #include <linux/err.h>
 #include <linux/kernel.h>
@@ -2210,3 +2211,63 @@ int bpf_prog_load_xattr(const struct bpf_prog_load_attr *attr,
 	*prog_fd = bpf_program__fd(first_prog);
 	return 0;
 }
+
+enum bpf_perf_event_ret
+bpf_perf_event_read_simple(void *mem, unsigned long size,
+			   unsigned long page_size, void **buf, size_t *buf_len,
+			   bpf_perf_event_print_t fn, void *priv)
+{
+	volatile struct perf_event_mmap_page *header = mem;
+	__u64 data_tail = header->data_tail;
+	__u64 data_head = header->data_head;
+	void *base, *begin, *end;
+	int ret;
+
+	asm volatile("" ::: "memory"); /* in real code it should be smp_rmb() */
+	if (data_head == data_tail)
+		return LIBBPF_PERF_EVENT_CONT;
+
+	base = ((char *)header) + page_size;
+
+	begin = base + data_tail % size;
+	end = base + data_head % size;
+
+	while (begin != end) {
+		struct perf_event_header *ehdr;
+
+		ehdr = begin;
+		if (begin + ehdr->size > base + size) {
+			long len = base + size - begin;
+
+			if (*buf_len < ehdr->size) {
+				free(*buf);
+				*buf = malloc(ehdr->size);
+				if (!*buf) {
+					ret = LIBBPF_PERF_EVENT_ERROR;
+					break;
+				}
+				*buf_len = ehdr->size;
+			}
+
+			memcpy(*buf, begin, len);
+			memcpy(*buf + len, base, ehdr->size - len);
+			ehdr = (void *)*buf;
+			begin = base + ehdr->size - len;
+		} else if (begin + ehdr->size == base + size) {
+			begin = base;
+		} else {
+			begin += ehdr->size;
+		}
+
+		ret = fn(ehdr, priv);
+		if (ret != LIBBPF_PERF_EVENT_CONT)
+			break;
+
+		data_tail += ehdr->size;
+	}
+
+	__sync_synchronize(); /* smp_mb() */
+	header->data_tail = data_tail;
+
+	return ret;
+}
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index 197f9ce2248c..ce681097584e 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -267,4 +267,17 @@ int bpf_prog_load(const char *file, enum bpf_prog_type type,
 		  struct bpf_object **pobj, int *prog_fd);
 
 int bpf_set_link_xdp_fd(int ifindex, int fd, __u32 flags);
+
+enum bpf_perf_event_ret {
+	LIBBPF_PERF_EVENT_DONE	= 0,
+	LIBBPF_PERF_EVENT_ERROR	= -1,
+	LIBBPF_PERF_EVENT_CONT	= -2,
+};
+
+typedef enum bpf_perf_event_ret (*bpf_perf_event_print_t)(void *event,
+							  void *priv);
+int bpf_perf_event_read_simple(void *mem, unsigned long size,
+			       unsigned long page_size,
+			       void **buf, size_t *buf_len,
+			       bpf_perf_event_print_t fn, void *priv);
 #endif
diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 9d762184b805..eac128afe15a 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -10,7 +10,7 @@ ifneq ($(wildcard $(GENHDR)),)
   GENFLAGS := -DHAVE_GENHDR
 endif
 
-CFLAGS += -Wall -O2 -I$(APIDIR) -I$(LIBDIR) -I$(GENDIR) $(GENFLAGS) -I../../../include
+CFLAGS += -Wall -O2 -I$(APIDIR) -I$(LIBDIR) -I$(BPFDIR) -I$(GENDIR) $(GENFLAGS) -I../../../include
 LDLIBS += -lcap -lelf -lrt -lpthread
 
 TEST_CUSTOM_PROGS = $(OUTPUT)/urandom_read
diff --git a/tools/testing/selftests/bpf/test_progs.c b/tools/testing/selftests/bpf/test_progs.c
index ed197eef1cfc..f7731973ec68 100644
--- a/tools/testing/selftests/bpf/test_progs.c
+++ b/tools/testing/selftests/bpf/test_progs.c
@@ -1337,12 +1337,12 @@ static int get_stack_print_output(void *data, int size)
 			good_user_stack = true;
 	}
 	if (!good_kern_stack || !good_user_stack)
-		return PERF_EVENT_ERROR;
+		return LIBBPF_PERF_EVENT_ERROR;
 
 	if (cnt == MAX_CNT_RAWTP)
-		return PERF_EVENT_DONE;
+		return LIBBPF_PERF_EVENT_DONE;
 
-	return PERF_EVENT_CONT;
+	return LIBBPF_PERF_EVENT_CONT;
 }
 
 static void test_get_stack_raw_tp(void)
diff --git a/tools/testing/selftests/bpf/trace_helpers.c b/tools/testing/selftests/bpf/trace_helpers.c
index ad025bd75f1c..8fb4fe8686e4 100644
--- a/tools/testing/selftests/bpf/trace_helpers.c
+++ b/tools/testing/selftests/bpf/trace_helpers.c
@@ -74,7 +74,7 @@ struct ksym *ksym_search(long key)
 
 static int page_size;
 static int page_cnt = 8;
-static volatile struct perf_event_mmap_page *header;
+static struct perf_event_mmap_page *header;
 
 int perf_event_mmap(int fd)
 {
@@ -107,74 +107,47 @@ struct perf_event_sample {
 	char data[];
 };
 
-static int perf_event_read(perf_event_print_fn fn)
+static enum bpf_perf_event_ret bpf_perf_event_print(void *event, void *priv)
 {
-	__u64 data_tail = header->data_tail;
-	__u64 data_head = header->data_head;
-	__u64 buffer_size = page_cnt * page_size;
-	void *base, *begin, *end;
-	char buf[256];
+	struct perf_event_sample *e = event;
+	perf_event_print_fn fn = priv;
 	int ret;
 
-	asm volatile("" ::: "memory"); /* in real code it should be smp_rmb() */
-	if (data_head == data_tail)
-		return PERF_EVENT_CONT;
-
-	base = ((char *)header) + page_size;
-
-	begin = base + data_tail % buffer_size;
-	end = base + data_head % buffer_size;
-
-	while (begin != end) {
-		struct perf_event_sample *e;
-
-		e = begin;
-		if (begin + e->header.size > base + buffer_size) {
-			long len = base + buffer_size - begin;
-
-			assert(len < e->header.size);
-			memcpy(buf, begin, len);
-			memcpy(buf + len, base, e->header.size - len);
-			e = (void *) buf;
-			begin = base + e->header.size - len;
-		} else if (begin + e->header.size == base + buffer_size) {
-			begin = base;
-		} else {
-			begin += e->header.size;
-		}
-
-		if (e->header.type == PERF_RECORD_SAMPLE) {
-			ret = fn(e->data, e->size);
-			if (ret != PERF_EVENT_CONT)
-				return ret;
-		} else if (e->header.type == PERF_RECORD_LOST) {
-			struct {
-				struct perf_event_header header;
-				__u64 id;
-				__u64 lost;
-			} *lost = (void *) e;
-			printf("lost %lld events\n", lost->lost);
-		} else {
-			printf("unknown event type=%d size=%d\n",
-			       e->header.type, e->header.size);
-		}
+	if (e->header.type == PERF_RECORD_SAMPLE) {
+		ret = fn(e->data, e->size);
+		if (ret != LIBBPF_PERF_EVENT_CONT)
+			return ret;
+	} else if (e->header.type == PERF_RECORD_LOST) {
+		struct {
+			struct perf_event_header header;
+			__u64 id;
+			__u64 lost;
+		} *lost = (void *) e;
+		printf("lost %lld events\n", lost->lost);
+	} else {
+		printf("unknown event type=%d size=%d\n",
+		       e->header.type, e->header.size);
 	}
 
-	__sync_synchronize(); /* smp_mb() */
-	header->data_tail = data_head;
-	return PERF_EVENT_CONT;
+	return LIBBPF_PERF_EVENT_CONT;
 }
 
 int perf_event_poller(int fd, perf_event_print_fn output_fn)
 {
-	int ret;
+	enum bpf_perf_event_ret ret;
+	void *buf = NULL;
+	size_t len = 0;
 
 	for (;;) {
 		perf_event_poll(fd);
-		ret = perf_event_read(output_fn);
-		if (ret != PERF_EVENT_CONT)
-			return ret;
+		ret = bpf_perf_event_read_simple(header, page_cnt * page_size,
+						 page_size, &buf, &len,
+						 bpf_perf_event_print,
+						 output_fn);
+		if (ret != LIBBPF_PERF_EVENT_CONT)
+			break;
 	}
+	free(buf);
 
-	return PERF_EVENT_DONE;
+	return ret;
 }
diff --git a/tools/testing/selftests/bpf/trace_helpers.h b/tools/testing/selftests/bpf/trace_helpers.h
index fe3eefd21e86..36d90e3b1ea9 100644
--- a/tools/testing/selftests/bpf/trace_helpers.h
+++ b/tools/testing/selftests/bpf/trace_helpers.h
@@ -2,6 +2,8 @@
 #ifndef __TRACE_HELPER_H
 #define __TRACE_HELPER_H
 
+#include <libbpf.h>
+
 struct ksym {
 	long addr;
 	char *name;
@@ -10,14 +12,9 @@ struct ksym {
 int load_kallsyms(void);
 struct ksym *ksym_search(long key);
 
-typedef int (*perf_event_print_fn)(void *data, int size);
-
-/* return code for perf_event_print_fn */
-#define PERF_EVENT_DONE		0
-#define PERF_EVENT_ERROR	-1
-#define PERF_EVENT_CONT		-2
+typedef enum bpf_perf_event_ret (*perf_event_print_fn)(void *data, int size);
 
 int perf_event_mmap(int fd);
-/* return PERF_EVENT_DONE or PERF_EVENT_ERROR */
+/* return LIBBPF_PERF_EVENT_DONE or LIBBPF_PERF_EVENT_ERROR */
 int perf_event_poller(int fd, perf_event_print_fn output_fn);
 #endif
-- 
2.17.0

^ permalink raw reply related

* [PATCH bpf-next 5/7] tools: bpf: improve comments in libbpf.h
From: Jakub Kicinski @ 2018-05-10 17:24 UTC (permalink / raw)
  To: alexei.starovoitov, daniel; +Cc: oss-drivers, netdev, Jakub Kicinski
In-Reply-To: <20180510172443.17238-1-jakub.kicinski@netronome.com>

Fix spelling mistakes, improve and clarify the language of comments
in libbpf.h.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
---
 tools/lib/bpf/libbpf.h | 48 +++++++++++++++++++++---------------------
 1 file changed, 24 insertions(+), 24 deletions(-)

diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index ce681097584e..4574b9563278 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -52,8 +52,8 @@ enum libbpf_errno {
 int libbpf_strerror(int err, char *buf, size_t size);
 
 /*
- * In include/linux/compiler-gcc.h, __printf is defined. However
- * it should be better if libbpf.h doesn't depend on Linux header file.
+ * __printf is defined in include/linux/compiler-gcc.h. However,
+ * it would be better if libbpf.h didn't depend on Linux header files.
  * So instead of __printf, here we use gcc attribute directly.
  */
 typedef int (*libbpf_print_fn_t)(const char *, ...)
@@ -92,7 +92,7 @@ int bpf_object__set_priv(struct bpf_object *obj, void *priv,
 			 bpf_object_clear_priv_t clear_priv);
 void *bpf_object__priv(struct bpf_object *prog);
 
-/* Accessors of bpf_program. */
+/* Accessors of bpf_program */
 struct bpf_program;
 struct bpf_program *bpf_program__next(struct bpf_program *prog,
 				      struct bpf_object *obj);
@@ -121,28 +121,28 @@ struct bpf_insn;
 
 /*
  * Libbpf allows callers to adjust BPF programs before being loaded
- * into kernel. One program in an object file can be transform into
- * multiple variants to be attached to different code.
+ * into kernel. One program in an object file can be transformed into
+ * multiple variants to be attached to different hooks.
  *
  * bpf_program_prep_t, bpf_program__set_prep and bpf_program__nth_fd
- * are APIs for this propose.
+ * form an API for this purpose.
  *
  * - bpf_program_prep_t:
- *   It defines 'preprocessor', which is a caller defined function
+ *   Defines a 'preprocessor', which is a caller defined function
  *   passed to libbpf through bpf_program__set_prep(), and will be
  *   called before program is loaded. The processor should adjust
- *   the program one time for each instances according to the number
+ *   the program one time for each instance according to the instance id
  *   passed to it.
  *
  * - bpf_program__set_prep:
- *   Attachs a preprocessor to a BPF program. The number of instances
- *   whould be created is also passed through this function.
+ *   Attaches a preprocessor to a BPF program. The number of instances
+ *   that should be created is also passed through this function.
  *
  * - bpf_program__nth_fd:
- *   After the program is loaded, get resuling fds from bpf program for
- *   each instances.
+ *   After the program is loaded, get resulting FD of a given instance
+ *   of the BPF program.
  *
- * If bpf_program__set_prep() is not used, the program whould be loaded
+ * If bpf_program__set_prep() is not used, the program would be loaded
  * without adjustment during bpf_object__load(). The program has only
  * one instance. In this case bpf_program__fd(prog) is equal to
  * bpf_program__nth_fd(prog, 0).
@@ -156,7 +156,7 @@ struct bpf_prog_prep_result {
 	struct bpf_insn *new_insn_ptr;
 	int new_insn_cnt;
 
-	/* If not NULL, result fd is set to it */
+	/* If not NULL, result FD is written to it. */
 	int *pfd;
 };
 
@@ -169,8 +169,8 @@ struct bpf_prog_prep_result {
  *  - res:	Output parameter, result of transformation.
  *
  * Return value:
- *  - Zero: pre-processing success.
- *  - Non-zero: pre-processing, stop loading.
+ *  - Zero:	pre-processing success.
+ *  - Non-zero:	pre-processing error, stop loading.
  */
 typedef int (*bpf_program_prep_t)(struct bpf_program *prog, int n,
 				  struct bpf_insn *insns, int insns_cnt,
@@ -182,7 +182,7 @@ int bpf_program__set_prep(struct bpf_program *prog, int nr_instance,
 int bpf_program__nth_fd(struct bpf_program *prog, int n);
 
 /*
- * Adjust type of bpf program. Default is kprobe.
+ * Adjust type of BPF program. Default is kprobe.
  */
 int bpf_program__set_socket_filter(struct bpf_program *prog);
 int bpf_program__set_tracepoint(struct bpf_program *prog);
@@ -206,10 +206,10 @@ bool bpf_program__is_xdp(struct bpf_program *prog);
 bool bpf_program__is_perf_event(struct bpf_program *prog);
 
 /*
- * We don't need __attribute__((packed)) now since it is
- * unnecessary for 'bpf_map_def' because they are all aligned.
- * In addition, using it will trigger -Wpacked warning message,
- * and will be treated as an error due to -Werror.
+ * No need for __attribute__((packed)), all members of 'bpf_map_def'
+ * are all aligned.  In addition, using __attribute__((packed))
+ * would trigger a -Wpacked warning message, and lead to an error
+ * if -Werror is set.
  */
 struct bpf_map_def {
 	unsigned int type;
@@ -220,8 +220,8 @@ struct bpf_map_def {
 };
 
 /*
- * There is another 'struct bpf_map' in include/linux/map.h. However,
- * it is not a uapi header so no need to consider name clash.
+ * The 'struct bpf_map' in include/linux/bpf.h is internal to the kernel,
+ * so no need to worry about a name clash.
  */
 struct bpf_map;
 struct bpf_map *
@@ -229,7 +229,7 @@ bpf_object__find_map_by_name(struct bpf_object *obj, const char *name);
 
 /*
  * Get bpf_map through the offset of corresponding struct bpf_map_def
- * in the bpf object file.
+ * in the BPF object file.
  */
 struct bpf_map *
 bpf_object__find_map_by_offset(struct bpf_object *obj, size_t offset);
-- 
2.17.0

^ permalink raw reply related

* [PATCH bpf-next 6/7] tools: bpf: don't complain about no kernel version for networking code
From: Jakub Kicinski @ 2018-05-10 17:24 UTC (permalink / raw)
  To: alexei.starovoitov, daniel
  Cc: oss-drivers, netdev, Jakub Kicinski, Stephen Rothwell
In-Reply-To: <20180510172443.17238-1-jakub.kicinski@netronome.com>

BPF programs only have to specify the target kernel version for
tracing related hooks, in networking world that requirement does
not really apply.  Loosen the checks in libbpf to reflect that.

bpf_object__open() users will continue to see the error for backward
compatibility (and because prog_type is not available there).

Error code for NULL file name is changed from ENOENT to EINVAL,
as it seems more appropriate, hopefully, that's an OK change.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
---
Merge note:

This will conflict with "tools: bpf: handle NULL return in 
bpf_prog_load_xattr()" the fix from bpf tree should be completely
discarded, we don't have to check for NULL any more.

CC: Stephen Rothwell <sfr@canb.auug.org.au>
---
 tools/lib/bpf/libbpf.c | 46 +++++++++++++++++++++++++++++++++++-------
 1 file changed, 39 insertions(+), 7 deletions(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index ce96f1fe3f37..df54c4c9e48a 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -1438,9 +1438,37 @@ bpf_object__load_progs(struct bpf_object *obj)
 	return 0;
 }
 
-static int bpf_object__validate(struct bpf_object *obj)
+static bool bpf_prog_type__needs_kver(enum bpf_prog_type type)
+{
+	switch (type) {
+	case BPF_PROG_TYPE_SOCKET_FILTER:
+	case BPF_PROG_TYPE_SCHED_CLS:
+	case BPF_PROG_TYPE_SCHED_ACT:
+	case BPF_PROG_TYPE_XDP:
+	case BPF_PROG_TYPE_CGROUP_SKB:
+	case BPF_PROG_TYPE_CGROUP_SOCK:
+	case BPF_PROG_TYPE_LWT_IN:
+	case BPF_PROG_TYPE_LWT_OUT:
+	case BPF_PROG_TYPE_LWT_XMIT:
+	case BPF_PROG_TYPE_SOCK_OPS:
+	case BPF_PROG_TYPE_SK_SKB:
+	case BPF_PROG_TYPE_CGROUP_DEVICE:
+	case BPF_PROG_TYPE_SK_MSG:
+	case BPF_PROG_TYPE_CGROUP_SOCK_ADDR:
+		return false;
+	case BPF_PROG_TYPE_UNSPEC:
+	case BPF_PROG_TYPE_KPROBE:
+	case BPF_PROG_TYPE_TRACEPOINT:
+	case BPF_PROG_TYPE_PERF_EVENT:
+	case BPF_PROG_TYPE_RAW_TRACEPOINT:
+	default:
+		return true;
+	}
+}
+
+static int bpf_object__validate(struct bpf_object *obj, bool needs_kver)
 {
-	if (obj->kern_version == 0) {
+	if (needs_kver && obj->kern_version == 0) {
 		pr_warning("%s doesn't provide kernel version\n",
 			   obj->path);
 		return -LIBBPF_ERRNO__KVERSION;
@@ -1449,7 +1477,8 @@ static int bpf_object__validate(struct bpf_object *obj)
 }
 
 static struct bpf_object *
-__bpf_object__open(const char *path, void *obj_buf, size_t obj_buf_sz)
+__bpf_object__open(const char *path, void *obj_buf, size_t obj_buf_sz,
+		   bool needs_kver)
 {
 	struct bpf_object *obj;
 	int err;
@@ -1467,7 +1496,7 @@ __bpf_object__open(const char *path, void *obj_buf, size_t obj_buf_sz)
 	CHECK_ERR(bpf_object__check_endianness(obj), err, out);
 	CHECK_ERR(bpf_object__elf_collect(obj), err, out);
 	CHECK_ERR(bpf_object__collect_reloc(obj), err, out);
-	CHECK_ERR(bpf_object__validate(obj), err, out);
+	CHECK_ERR(bpf_object__validate(obj, needs_kver), err, out);
 
 	bpf_object__elf_finish(obj);
 	return obj;
@@ -1484,7 +1513,7 @@ struct bpf_object *bpf_object__open(const char *path)
 
 	pr_debug("loading %s\n", path);
 
-	return __bpf_object__open(path, NULL, 0);
+	return __bpf_object__open(path, NULL, 0, true);
 }
 
 struct bpf_object *bpf_object__open_buffer(void *obj_buf,
@@ -1507,7 +1536,7 @@ struct bpf_object *bpf_object__open_buffer(void *obj_buf,
 	pr_debug("loading object '%s' from buffer\n",
 		 name);
 
-	return __bpf_object__open(name, obj_buf, obj_buf_sz);
+	return __bpf_object__open(name, obj_buf, obj_buf_sz, true);
 }
 
 int bpf_object__unload(struct bpf_object *obj)
@@ -2164,8 +2193,11 @@ int bpf_prog_load_xattr(const struct bpf_prog_load_attr *attr,
 
 	if (!attr)
 		return -EINVAL;
+	if (!attr->file)
+		return -EINVAL;
 
-	obj = bpf_object__open(attr->file);
+	obj = __bpf_object__open(attr->file, NULL, 0,
+				 bpf_prog_type__needs_kver(attr->prog_type));
 	if (IS_ERR(obj))
 		return -ENOENT;
 
-- 
2.17.0

^ permalink raw reply related

* [PATCH bpf-next 7/7] samples: bpf: convert some XDP samples from bpf_load to libbpf
From: Jakub Kicinski @ 2018-05-10 17:24 UTC (permalink / raw)
  To: alexei.starovoitov, daniel; +Cc: oss-drivers, netdev, Jakub Kicinski
In-Reply-To: <20180510172443.17238-1-jakub.kicinski@netronome.com>

Now that we can use full powers of libbpf in BPF samples, we
should perhaps make the simplest XDP programs not depend on
bpf_load helpers.  This way newcomers will be exposed to the
recommended library from the start.

Use of bpf_prog_load_xattr() will also make it trivial to later
on request offload of the programs by simply adding ifindex to
the xattr.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
---
 samples/bpf/Makefile               |  8 +++---
 samples/bpf/xdp1_user.c            | 31 +++++++++++++-------
 samples/bpf/xdp_adjust_tail_user.c | 36 ++++++++++++++---------
 samples/bpf/xdp_rxq_info_user.c    | 46 ++++++++++++++++++++----------
 4 files changed, 78 insertions(+), 43 deletions(-)

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index 4f99de93d4db..7181b23226a7 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -78,9 +78,9 @@ test_cgrp2_attach-objs := test_cgrp2_attach.o $(LIBBPF)
 test_cgrp2_attach2-objs := test_cgrp2_attach2.o $(LIBBPF) $(CGROUP_HELPERS)
 test_cgrp2_sock-objs := test_cgrp2_sock.o $(LIBBPF)
 test_cgrp2_sock2-objs := bpf_load.o $(LIBBPF) test_cgrp2_sock2.o
-xdp1-objs := bpf_load.o $(LIBBPF) xdp1_user.o
+xdp1-objs := xdp1_user.o $(LIBBPF)
 # reuse xdp1 source intentionally
-xdp2-objs := bpf_load.o $(LIBBPF) xdp1_user.o
+xdp2-objs := xdp1_user.o $(LIBBPF)
 xdp_router_ipv4-objs := bpf_load.o $(LIBBPF) xdp_router_ipv4_user.o
 test_current_task_under_cgroup-objs := bpf_load.o $(LIBBPF) $(CGROUP_HELPERS) \
 				       test_current_task_under_cgroup_user.o
@@ -95,10 +95,10 @@ xdp_redirect-objs := bpf_load.o $(LIBBPF) xdp_redirect_user.o
 xdp_redirect_map-objs := bpf_load.o $(LIBBPF) xdp_redirect_map_user.o
 xdp_redirect_cpu-objs := bpf_load.o $(LIBBPF) xdp_redirect_cpu_user.o
 xdp_monitor-objs := bpf_load.o $(LIBBPF) xdp_monitor_user.o
-xdp_rxq_info-objs := bpf_load.o $(LIBBPF) xdp_rxq_info_user.o
+xdp_rxq_info-objs := xdp_rxq_info_user.o $(LIBBPF)
 syscall_tp-objs := bpf_load.o $(LIBBPF) syscall_tp_user.o
 cpustat-objs := bpf_load.o $(LIBBPF) cpustat_user.o
-xdp_adjust_tail-objs := bpf_load.o $(LIBBPF) xdp_adjust_tail_user.o
+xdp_adjust_tail-objs := xdp_adjust_tail_user.o $(LIBBPF)
 xdpsock-objs := bpf_load.o $(LIBBPF) xdpsock_user.o
 
 # Tell kbuild to always build the programs
diff --git a/samples/bpf/xdp1_user.c b/samples/bpf/xdp1_user.c
index b901ee2b3336..b02c531510ed 100644
--- a/samples/bpf/xdp1_user.c
+++ b/samples/bpf/xdp1_user.c
@@ -16,9 +16,9 @@
 #include <libgen.h>
 #include <sys/resource.h>
 
-#include "bpf_load.h"
 #include "bpf_util.h"
-#include "libbpf.h"
+#include "bpf/bpf.h"
+#include "bpf/libbpf.h"
 
 static int ifindex;
 static __u32 xdp_flags;
@@ -31,7 +31,7 @@ static void int_exit(int sig)
 
 /* simple per-protocol drop counter
  */
-static void poll_stats(int interval)
+static void poll_stats(int map_fd, int interval)
 {
 	unsigned int nr_cpus = bpf_num_possible_cpus();
 	const unsigned int nr_keys = 256;
@@ -47,7 +47,7 @@ static void poll_stats(int interval)
 		for (key = 0; key < nr_keys; key++) {
 			__u64 sum = 0;
 
-			assert(bpf_map_lookup_elem(map_fd[0], &key, values) == 0);
+			assert(bpf_map_lookup_elem(map_fd, &key, values) == 0);
 			for (i = 0; i < nr_cpus; i++)
 				sum += (values[i] - prev[key][i]);
 			if (sum)
@@ -71,9 +71,14 @@ static void usage(const char *prog)
 int main(int argc, char **argv)
 {
 	struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
+	struct bpf_prog_load_attr prog_load_attr = {
+		.prog_type	= BPF_PROG_TYPE_XDP,
+	};
 	const char *optstr = "SN";
+	int prog_fd, map_fd, opt;
+	struct bpf_object *obj;
+	struct bpf_map *map;
 	char filename[256];
-	int opt;
 
 	while ((opt = getopt(argc, argv, optstr)) != -1) {
 		switch (opt) {
@@ -102,13 +107,19 @@ int main(int argc, char **argv)
 	ifindex = strtoul(argv[optind], NULL, 0);
 
 	snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);
+	prog_load_attr.file = filename;
 
-	if (load_bpf_file(filename)) {
-		printf("%s", bpf_log_buf);
+	if (bpf_prog_load_xattr(&prog_load_attr, &obj, &prog_fd))
+		return 1;
+
+	map = bpf_map__next(NULL, obj);
+	if (!map) {
+		printf("finding a map in obj file failed\n");
 		return 1;
 	}
+	map_fd = bpf_map__fd(map);
 
-	if (!prog_fd[0]) {
+	if (!prog_fd) {
 		printf("load_bpf_file: %s\n", strerror(errno));
 		return 1;
 	}
@@ -116,12 +127,12 @@ int main(int argc, char **argv)
 	signal(SIGINT, int_exit);
 	signal(SIGTERM, int_exit);
 
-	if (bpf_set_link_xdp_fd(ifindex, prog_fd[0], xdp_flags) < 0) {
+	if (bpf_set_link_xdp_fd(ifindex, prog_fd, xdp_flags) < 0) {
 		printf("link set xdp fd failed\n");
 		return 1;
 	}
 
-	poll_stats(2);
+	poll_stats(map_fd, 2);
 
 	return 0;
 }
diff --git a/samples/bpf/xdp_adjust_tail_user.c b/samples/bpf/xdp_adjust_tail_user.c
index f621a541b574..3042ce37dae8 100644
--- a/samples/bpf/xdp_adjust_tail_user.c
+++ b/samples/bpf/xdp_adjust_tail_user.c
@@ -18,9 +18,8 @@
 #include <netinet/ether.h>
 #include <unistd.h>
 #include <time.h>
-#include "bpf_load.h"
-#include "libbpf.h"
-#include "bpf_util.h"
+#include "bpf/bpf.h"
+#include "bpf/libbpf.h"
 
 #define STATS_INTERVAL_S 2U
 
@@ -36,7 +35,7 @@ static void int_exit(int sig)
 
 /* simple "icmp packet too big sent" counter
  */
-static void poll_stats(unsigned int kill_after_s)
+static void poll_stats(unsigned int map_fd, unsigned int kill_after_s)
 {
 	time_t started_at = time(NULL);
 	__u64 value = 0;
@@ -46,7 +45,7 @@ static void poll_stats(unsigned int kill_after_s)
 	while (!kill_after_s || time(NULL) - started_at <= kill_after_s) {
 		sleep(STATS_INTERVAL_S);
 
-		assert(bpf_map_lookup_elem(map_fd[0], &key, &value) == 0);
+		assert(bpf_map_lookup_elem(map_fd, &key, &value) == 0);
 
 		printf("icmp \"packet too big\" sent: %10llu pkts\n", value);
 	}
@@ -66,14 +65,17 @@ static void usage(const char *cmd)
 
 int main(int argc, char **argv)
 {
+	struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
+	struct bpf_prog_load_attr prog_load_attr = {
+		.prog_type	= BPF_PROG_TYPE_XDP,
+	};
 	unsigned char opt_flags[256] = {};
 	unsigned int kill_after_s = 0;
 	const char *optstr = "i:T:SNh";
-	struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
+	int i, prog_fd, map_fd, opt;
+	struct bpf_object *obj;
+	struct bpf_map *map;
 	char filename[256];
-	int opt;
-	int i;
-
 
 	for (i = 0; i < strlen(optstr); i++)
 		if (optstr[i] != 'h' && 'a' <= optstr[i] && optstr[i] <= 'z')
@@ -115,13 +117,19 @@ int main(int argc, char **argv)
 	}
 
 	snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);
+	prog_load_attr.file = filename;
+
+	if (bpf_prog_load_xattr(&prog_load_attr, &obj, &prog_fd))
+		return 1;
 
-	if (load_bpf_file(filename)) {
-		printf("%s", bpf_log_buf);
+	map = bpf_map__next(NULL, obj);
+	if (!map) {
+		printf("finding a map in obj file failed\n");
 		return 1;
 	}
+	map_fd = bpf_map__fd(map);
 
-	if (!prog_fd[0]) {
+	if (!prog_fd) {
 		printf("load_bpf_file: %s\n", strerror(errno));
 		return 1;
 	}
@@ -129,12 +137,12 @@ int main(int argc, char **argv)
 	signal(SIGINT, int_exit);
 	signal(SIGTERM, int_exit);
 
-	if (bpf_set_link_xdp_fd(ifindex, prog_fd[0], xdp_flags) < 0) {
+	if (bpf_set_link_xdp_fd(ifindex, prog_fd, xdp_flags) < 0) {
 		printf("link set xdp fd failed\n");
 		return 1;
 	}
 
-	poll_stats(kill_after_s);
+	poll_stats(map_fd, kill_after_s);
 
 	bpf_set_link_xdp_fd(ifindex, -1, xdp_flags);
 
diff --git a/samples/bpf/xdp_rxq_info_user.c b/samples/bpf/xdp_rxq_info_user.c
index 478d95412de4..e4e9ba52bff0 100644
--- a/samples/bpf/xdp_rxq_info_user.c
+++ b/samples/bpf/xdp_rxq_info_user.c
@@ -22,8 +22,8 @@ static const char *__doc__ = " XDP RX-queue info extract example\n\n"
 #include <arpa/inet.h>
 #include <linux/if_link.h>
 
-#include "libbpf.h"
-#include "bpf_load.h"
+#include "bpf/bpf.h"
+#include "bpf/libbpf.h"
 #include "bpf_util.h"
 
 static int ifindex = -1;
@@ -32,6 +32,9 @@ static char *ifname;
 
 static __u32 xdp_flags;
 
+static struct bpf_map *stats_global_map;
+static struct bpf_map *rx_queue_index_map;
+
 /* Exit return codes */
 #define EXIT_OK		0
 #define EXIT_FAIL		1
@@ -174,7 +177,7 @@ static struct datarec *alloc_record_per_cpu(void)
 
 static struct record *alloc_record_per_rxq(void)
 {
-	unsigned int nr_rxqs = map_data[2].def.max_entries;
+	unsigned int nr_rxqs = bpf_map__def(rx_queue_index_map)->max_entries;
 	struct record *array;
 	size_t size;
 
@@ -190,7 +193,7 @@ static struct record *alloc_record_per_rxq(void)
 
 static struct stats_record *alloc_stats_record(void)
 {
-	unsigned int nr_rxqs = map_data[2].def.max_entries;
+	unsigned int nr_rxqs = bpf_map__def(rx_queue_index_map)->max_entries;
 	struct stats_record *rec;
 	int i;
 
@@ -210,7 +213,7 @@ static struct stats_record *alloc_stats_record(void)
 
 static void free_stats_record(struct stats_record *r)
 {
-	unsigned int nr_rxqs = map_data[2].def.max_entries;
+	unsigned int nr_rxqs = bpf_map__def(rx_queue_index_map)->max_entries;
 	int i;
 
 	for (i = 0; i < nr_rxqs; i++)
@@ -254,11 +257,11 @@ static void stats_collect(struct stats_record *rec)
 {
 	int fd, i, max_rxqs;
 
-	fd = map_data[1].fd; /* map: stats_global_map */
+	fd = bpf_map__fd(stats_global_map);
 	map_collect_percpu(fd, 0, &rec->stats);
 
-	fd = map_data[2].fd; /* map: rx_queue_index_map */
-	max_rxqs = map_data[2].def.max_entries;
+	fd = bpf_map__fd(rx_queue_index_map);
+	max_rxqs = bpf_map__def(rx_queue_index_map)->max_entries;
 	for (i = 0; i < max_rxqs; i++)
 		map_collect_percpu(fd, i, &rec->rxq[i]);
 }
@@ -304,8 +307,8 @@ static void stats_print(struct stats_record *stats_rec,
 			struct stats_record *stats_prev,
 			int action)
 {
+	unsigned int nr_rxqs = bpf_map__def(rx_queue_index_map)->max_entries;
 	unsigned int nr_cpus = bpf_num_possible_cpus();
-	unsigned int nr_rxqs = map_data[2].def.max_entries;
 	double pps = 0, err = 0;
 	struct record *rec, *prev;
 	double t;
@@ -419,31 +422,44 @@ static void stats_poll(int interval, int action)
 int main(int argc, char **argv)
 {
 	struct rlimit r = {10 * 1024 * 1024, RLIM_INFINITY};
+	struct bpf_prog_load_attr prog_load_attr = {
+		.prog_type	= BPF_PROG_TYPE_XDP,
+	};
+	int prog_fd, map_fd, opt, err;
 	bool use_separators = true;
 	struct config cfg = { 0 };
+	struct bpf_object *obj;
+	struct bpf_map *map;
 	char filename[256];
 	int longindex = 0;
 	int interval = 2;
 	__u32 key = 0;
-	int opt, err;
 
 	char action_str_buf[XDP_ACTION_MAX_STRLEN + 1 /* for \0 */] = { 0 };
 	int action = XDP_PASS; /* Default action */
 	char *action_str = NULL;
 
 	snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);
+	prog_load_attr.file = filename;
 
 	if (setrlimit(RLIMIT_MEMLOCK, &r)) {
 		perror("setrlimit(RLIMIT_MEMLOCK)");
 		return 1;
 	}
 
-	if (load_bpf_file(filename)) {
-		fprintf(stderr, "ERR in load_bpf_file(): %s", bpf_log_buf);
+	if (bpf_prog_load_xattr(&prog_load_attr, &obj, &prog_fd))
+		return EXIT_FAIL;
+
+	map = bpf_map__next(NULL, obj);
+	stats_global_map = bpf_map__next(map, obj);
+	rx_queue_index_map = bpf_map__next(stats_global_map, obj);
+	if (!map || !stats_global_map || !rx_queue_index_map) {
+		printf("finding a map in obj file failed\n");
 		return EXIT_FAIL;
 	}
+	map_fd = bpf_map__fd(map);
 
-	if (!prog_fd[0]) {
+	if (!prog_fd) {
 		fprintf(stderr, "ERR: load_bpf_file: %s\n", strerror(errno));
 		return EXIT_FAIL;
 	}
@@ -512,7 +528,7 @@ int main(int argc, char **argv)
 		setlocale(LC_NUMERIC, "en_US");
 
 	/* User-side setup ifindex in config_map */
-	err = bpf_map_update_elem(map_fd[0], &key, &cfg, 0);
+	err = bpf_map_update_elem(map_fd, &key, &cfg, 0);
 	if (err) {
 		fprintf(stderr, "Store config failed (err:%d)\n", err);
 		exit(EXIT_FAIL_BPF);
@@ -521,7 +537,7 @@ int main(int argc, char **argv)
 	/* Remove XDP program when program is interrupted */
 	signal(SIGINT, int_exit);
 
-	if (bpf_set_link_xdp_fd(ifindex, prog_fd[0], xdp_flags) < 0) {
+	if (bpf_set_link_xdp_fd(ifindex, prog_fd, xdp_flags) < 0) {
 		fprintf(stderr, "link set xdp fd failed\n");
 		return EXIT_FAIL_XDP;
 	}
-- 
2.17.0

^ permalink raw reply related

* Re: [PATCH net-next] net:sched: add gkprio scheduler
From: Cong Wang @ 2018-05-10 17:38 UTC (permalink / raw)
  To: Michel Machado
  Cc: Nishanth Devarajan, Jiri Pirko, Jamal Hadi Salim, David Miller,
	Linux Kernel Network Developers, Cody Doucette
In-Reply-To: <d450d9da-a1d2-4aae-1ed7-edc2d0972119@digirati.com.br>

On Wed, May 9, 2018 at 7:09 AM, Michel Machado <michel@digirati.com.br> wrote:
> On 05/08/2018 10:24 PM, Cong Wang wrote:
>>
>> On Tue, May 8, 2018 at 5:59 AM, Michel Machado <michel@digirati.com.br>
>> wrote:
>>>>>
>>>>> Overall it looks good to me, just one thing below:
>>>>>
>>>>>> +struct Qdisc_ops gkprio_qdisc_ops __read_mostly = {
>>>>>> +       .id             =       "gkprio",
>>>>>> +       .priv_size      =       sizeof(struct gkprio_sched_data),
>>>>>> +       .enqueue        =       gkprio_enqueue,
>>>>>> +       .dequeue        =       gkprio_dequeue,
>>>>>> +       .peek           =       qdisc_peek_dequeued,
>>>>>> +       .init           =       gkprio_init,
>>>>>> +       .reset          =       gkprio_reset,
>>>>>> +       .change         =       gkprio_change,
>>>>>> +       .dump           =       gkprio_dump,
>>>>>> +       .destroy        =       gkprio_destroy,
>>>>>> +       .owner          =       THIS_MODULE,
>>>>>> +};
>>>>>
>>>>>
>>>>>
>>>>> You probably want to add Qdisc_class_ops here so that you can
>>>>> dump the stats of each internal queue.
>>>
>>>
>>>
>>> Hi Cong,
>>>
>>>     In the production scenario we are targeting, this priority queue must
>>> be
>>> classless; being classful would only bloat the code for us. I don't see
>>> making this queue classful as a problem per se, but I suggest leaving it
>>> as
>>> a future improvement for when someone can come up with a useful scenario
>>> for
>>> it.
>>
>>
>>
>> Take a look at sch_prio, it is fairly simple since your internal
>> queues are just an array... Per-queue stats are quite useful
>> in production, we definitely want to observe which queues are
>> full which are not.
>>
>
> DSprio cannot add Qdisc_class_ops without a rewrite of other queue
> disciplines, which doesn't seem desirable. Since the method cops->leaf is
> required (see register_qdisc()), we would need to replace the array struct
> sk_buff_head qdiscs[GKPRIO_MAX_PRIORITY] in struct gkprio_sched_data with
> the array struct Qdisc *queues[GKPRIO_MAX_PRIORITY] to be able to return a
> Qdisc in dsprio_leaf(). The problem with this change is that Qdisc does not
> have a method to dequeue from its tail. This new method may not even make
> sense in other queue disciplines. But without this method, gkprio_enqueue()
> cannot drop the lowest priority packet when the queue is full and an
> incoming packet has higher priority.

Sorry for giving you a bad example. Take a look at sch_fq_codel instead,
it returns NULL for ->leaf() and maps its internal flows to classes.

I thought sch_prio uses internal qdiscs, but I was wrong, as you noticed
it actually exposes them to user via classes.

My point is never to make it classful, just want to expose the useful stats,
like how fq_codel dumps its internal flows.


>
> Nevertheless, I see your point on being able to observe the distribution of
> queued packets per priority. A solution for that would be to add the array
> __u32 qlen[GKPRIO_MAX_PRIORITY] in struct tc_gkprio_qopt. This solution even
> avoids adding overhead in the critical paths of DSprio. Do you see a better
> solution?

I believe you can return NULL for ->leaf() and don't need to worry about
->graft() either. ;)


>
> By the way, I've used GKPRIO_MAX_PRIORITY and other names that include
> "gkprio" above to reflect the version 1 of this patch that we are
> discussing. We will rename these identifiers for version 2 of this patch to
> replace "gkprio" with "dsprio".
>

Sounds good.

Thanks.

^ permalink raw reply

* [PATCH] [PATCH net v3] ipv6: remove min MTU check for ipsec tunnels
From: Ashwanth Goli @ 2018-05-10 17:45 UTC (permalink / raw)
  To: netdev, davem; +Cc: pabeni, dsahern, eric.dumazet

With 749439bfac "fix udpv6 sendmsg crash caused by too small MTU"
ipsec tunnels that report a MTU less than IPV6_MIN_MTU are broken
even for packets that are smaller than IPV6_MIN_MTU.

According to rfc2473#section-7.1

    if the original IPv6 packet is equal or  smaller  than  the
    IPv6 minimum link MTU, the tunnel entry-point node
    encapsulates the original packet, and subsequently
    fragments the resulting IPv6 tunnel packet into IPv6
    fragments that do not exceed the Path MTU to the tunnel
    exit-point.

Dropping the MTU check for ipsec tunnel destinations.

Signed-off-by: Ashwanth Goli <ashwanth@codeaurora.org>
---
 net/ipv6/ip6_output.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 2e891d2..ea1ef1b 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1235,7 +1235,7 @@ static int ip6_setup_cork(struct sock *sk, struct inet_cork_full *cork,
 		if (np->frag_size)
 			mtu = np->frag_size;
 	}
-	if (mtu < IPV6_MIN_MTU)
+	if (mtu < IPV6_MIN_MTU && !(rt->dst.flags & DST_XFRM_TUNNEL))
 		return -EINVAL;
 	cork->base.fragsize = mtu;
 	if (dst_allfrag(xfrm_dst_path(&rt->dst)))
-- 
1.9.1

^ permalink raw reply related

* Re: [PATCH net-next 1/1] net sched actions: fix invalid pointer dereferencing if skbedit flags missing
From: Cong Wang @ 2018-05-10 17:54 UTC (permalink / raw)
  To: Roman Mashak
  Cc: David Miller, Linux Kernel Network Developers, kernel,
	Jamal Hadi Salim, Jiri Pirko, Alexander Duyck
In-Reply-To: <1525900723-17625-1-git-send-email-mrv@mojatatu.com>

On Wed, May 9, 2018 at 2:18 PM, Roman Mashak <mrv@mojatatu.com> wrote:
> The caller calls action's ->init() and passes pointer to "struct tc_action *a",
> which later may initialized to point at the existing action, otherwise
> "struct tc_action *a" is still invalid, and therefore dereferencing it is error.

So technically we just need to check if(exists) before calling
tcf_idr_release(), right?


>
> So checking flags should be done as early as possible, before idr lookups.

In theory, this could break the case of binding an existing action.
But in practice it seems nonsense to pass invalid flags in this case
either, so probably this is okay.

BTW, this should be a candidate for -net and possibly -stable too.

^ permalink raw reply

* Re: [PATCH bpf v3] x86/cpufeature: bpf hack for clang not supporting asm goto
From: Gianluca Borello @ 2018-05-10 17:57 UTC (permalink / raw)
  To: bp
  Cc: Alexei Starovoitov, peterz, Yonghong Song, mingo, torvalds,
	Alexei Starovoitov, Daniel Borkmann, linux-kernel, x86,
	Linux Networking Development Mailing List, kernel-team, tglx
In-Reply-To: <20180510162028.GB16130@pd.tnic>

On Thu, May 10, 2018 at 9:28 AM Borislav Petkov <bp@alien8.de> wrote:

> As someone already pointed out on IRC, arch/x86/include/asm/cpufeature.h
> is solely a kernel header so nothing but kernel should include it. So
> forget the userspace breakage "argument".

For what is worth, I have the same exact problem in a relatively popular
open source system call tracer, and my attempt to fix the issue from user
space has been:
https://github.com/draios/sysdig/commit/2958eb1d52e047f4b93d1238be803e7c405bdec2

While I can definitely live with that (and I'd be happy to submit a patch
to samples/ following the same approach) and absolutely respect the
technical authority of the reviewers here, it would be nice to recognize
that these changes actually affect users to a certain degree, even if from
a technical point of view they don't break userspace.

 From a practical point of view, BPF is widely used from userspace programs
to access some kernel data structures to gather visibility information, and
even the simplest use case, such as including linux/sched.h to access some
task_struct members, ends up pulling in arch/x86/include/asm/cpufeature.h,
thus (ab)using that file. Adding these quirks definitely increases the
complexity a developer needs to keep in mind in order to take advantage of
a BPF based instrumentation.

Thanks

^ permalink raw reply

* Re: [PATCH bpf v3] x86/cpufeature: bpf hack for clang not supporting asm goto
From: Alexei Starovoitov @ 2018-05-10 17:58 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Peter Zijlstra, Yonghong Song, mingo, torvalds, ast, daniel,
	linux-kernel, x86, netdev, kernel-team, Thomas Gleixner
In-Reply-To: <20180510162028.GB16130@pd.tnic>

On Thu, May 10, 2018 at 06:20:28PM +0200, Borislav Petkov wrote:
> On Thu, May 10, 2018 at 08:52:42AM -0700, Alexei Starovoitov wrote:
> > That makes me wonder what happened with "we do not break user space" rule?
> 
> As someone already pointed out on IRC, arch/x86/include/asm/cpufeature.h
> is solely a kernel header so nothing but kernel should include it. So
> forget the userspace breakage "argument".

libbcc is a library. It's not only used by bcc scripts, but by production
services that compile bpf programs with clang.
Without this kernel fix the companies won't be able to upgrade the kernel.
Even if we could somehow hack libbcc and workaround in user space
(which is not possible as already explained), it's a long dependency chain
to recompile and upgrade core services before upgrading the kernel
which makes even theoretical user space fix impractical.
I see no option, but to fix the kernel.
Regardless whether it's called user space breakage or kernel breakage.

^ permalink raw reply

* [PATCH net] ipv4: fix memory leaks in udp_sendmsg, ping_v4_sendmsg
From: Andrey Ignatov @ 2018-05-10 17:59 UTC (permalink / raw)
  To: netdev, davem; +Cc: Andrey Ignatov, ast, eric.dumazet, kernel-team

Fix more memory leaks in ip_cmsg_send() callers. Part of them were fixed
earlier in 919483096bfe.

* udp_sendmsg one was there since the beginning when linux sources were
  first added to git;
* ping_v4_sendmsg one was copy/pasted in c319b4d76b9e.

Whenever return happens in udp_sendmsg() or ping_v4_sendmsg() IP options
have to be freed if they were allocated previously.

Add label so that future callers (if any) can use it instead of kfree()
before return that is easy to forget.

Fixes: c319b4d76b9e (net: ipv4: add IPPROTO_ICMP socket kind)
Signed-off-by: Andrey Ignatov <rdna@fb.com>
---
 net/ipv4/ping.c | 7 +++++--
 net/ipv4/udp.c  | 7 +++++--
 2 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/net/ipv4/ping.c b/net/ipv4/ping.c
index 05e47d7..56a0106 100644
--- a/net/ipv4/ping.c
+++ b/net/ipv4/ping.c
@@ -775,8 +775,10 @@ static int ping_v4_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 	ipc.addr = faddr = daddr;
 
 	if (ipc.opt && ipc.opt->opt.srr) {
-		if (!daddr)
-			return -EINVAL;
+		if (!daddr) {
+			err = -EINVAL;
+			goto out_free;
+		}
 		faddr = ipc.opt->opt.faddr;
 	}
 	tos = get_rttos(&ipc, inet);
@@ -842,6 +844,7 @@ static int ping_v4_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 
 out:
 	ip_rt_put(rt);
+out_free:
 	if (free)
 		kfree(ipc.opt);
 	if (!err) {
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 24b5c59..dc4fc46 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -952,8 +952,10 @@ int udp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 	sock_tx_timestamp(sk, ipc.sockc.tsflags, &ipc.tx_flags);
 
 	if (ipc.opt && ipc.opt->opt.srr) {
-		if (!daddr)
-			return -EINVAL;
+		if (!daddr) {
+			err = -EINVAL;
+			goto out_free;
+		}
 		faddr = ipc.opt->opt.faddr;
 		connected = 0;
 	}
@@ -1074,6 +1076,7 @@ int udp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 
 out:
 	ip_rt_put(rt);
+out_free:
 	if (free)
 		kfree(ipc.opt);
 	if (!err)
-- 
2.9.5

^ permalink raw reply related

* Re: [PATCH net] tun: fix use after free for ptr_ring
From: Cong Wang @ 2018-05-10 18:08 UTC (permalink / raw)
  To: Jason Wang
  Cc: Linux Kernel Network Developers, LKML, Eric Dumazet,
	Michael S . Tsirkin
In-Reply-To: <1525849198-9786-1-git-send-email-jasowang@redhat.com>

On Tue, May 8, 2018 at 11:59 PM, Jason Wang <jasowang@redhat.com> wrote:
> We used to initialize ptr_ring during TUNSETIFF, this is because its
> size depends on the tx_queue_len of netdevice. And we try to clean it
> up when socket were detached from netdevice. A race were spotted when
> trying to do uninit during a read which will lead a use after free for
> pointer ring. Solving this by always initialize a zero size ptr_ring
> in open() and do resizing during TUNSETIFF, and then we can safely do
> cleanup during close(). With this, there's no need for the workaround
> that was introduced by commit 4df0bfc79904 ("tun: fix a memory leak
> for tfile->tx_array").
>

Ah, I didn't know ptr_ring_init(0) could work... Nice patch!
Except one thing below.


> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index ef33950..298cb96 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -681,15 +681,6 @@ static void tun_queue_purge(struct tun_file *tfile)
>         skb_queue_purge(&tfile->sk.sk_error_queue);
>  }
>
> -static void tun_cleanup_tx_ring(struct tun_file *tfile)
> -{
> -       if (tfile->tx_ring.queue) {
> -               ptr_ring_cleanup(&tfile->tx_ring, tun_ptr_free);
> -               xdp_rxq_info_unreg(&tfile->xdp_rxq);
> -               memset(&tfile->tx_ring, 0, sizeof(tfile->tx_ring));
> -       }
> -}


I don't think you can totally remove ptr_ring_cleanup(), it should be
called unconditionally with your ptr_ring_init(0) trick, right?

^ permalink raw reply

* Re: [PATCH ipsec-next] xfrm: Allow Output Mark to be Updated Using UPDSA
From: Nathan Harold @ 2018-05-10 18:14 UTC (permalink / raw)
  To: Eyal Birger; +Cc: netdev, Steffen Klassert, tobias
In-Reply-To: <20180510084459.598ce663@jimi>

That makes sense to me; the restriction about which you inquire is a
practical one rather than a philosophical one, which I will be happy
to see lifted.

With the new set_mark, a non-zero mask will indicate that the caller
has a set an "explicit" zero mark, which sidesteps the
currently-ambiguous situation; the logic can then become "if (set_mark
|| set_mark_mask) { // update mark and update mask}". There is a
question of the behavior for a caller who sets a set_mark and
set_mark_mask, then subsequently calls UPDSA with only a mark
(omitting the mask, or with explicit set_mask == 0). I think it's fair
and appropriate the mask be re-set to 0xFFFFFFFF (to avoid the
special-case of (if new_set_mask == 0 && set_mask != 0xFFFFFFFF). Of
course, this means that the inability to return to zero limitation
that I currently mention as being on the output_mark would transfer
under that proposal to the set_mark_mask. All of this is fix-able by
having the update take into account the presence or absence of the
XFRMAs sent rather than just looking at a built xfrm_state, but I'm
couldn't fathom any use cases for reverting the mark scheme back to an
"unused" state while the SA remains ACTIVE, so I think simpler is
better (same reasoning applied to the current change).

-Nathan

On Wed, May 9, 2018 at 10:44 PM, Eyal Birger <eyal.birger@gmail.com> wrote:
> Hi Nathan,
>
> On Wed,  9 May 2018 13:46:26 -0700
> Nathan Harold <nharold@google.com> wrote:
>
>> Allow UPDSA to change output_mark to permit
>> policy separation of packet routing decisions from
>> SA keying in systems that use mark-based routing.
>>
>> In the output_mark, used as a routing and firewall
>> mark for outbound packets, is made update-able which
>> allows routing decisions to be handled independently
>> of keying/SA creation. To maintain consistency with
>> other optional attributes, the output mark is only
>> updated if sent with a non-zero value. Once set, the
>> output mark may not be reset to zero, which ensures
>> that updating the SA does not require the mark to
>> be re-sent to avoid the value being clobbered.
>
> There is an attempt to extend the 'output_mark' to support the input
> direction and masking.
>
> In the proposed implementation, output_mark is converted to type 'struct
> xfrm_mark' where the semantics are as follows:
>
> - If mark is given by XFRMA_OUTPUT_MARK (renamed to XFRMA_SET_MARK)
>   then a new XFRMA_SET_MARK_MASK attribute is consulted to set the mask
>   value
> - if no XFRMA_SET_MARK_MASK attribute is provided, the mask is set to
>   0xffffffff
>
> Therefore, if the mask value is 0, we can regard the mark as 'not
> given'.
>
> My question is, in the context of this patch, it seems that the
> "Once set, the output mark may not be reset to zero" restriction may be
> lifted in favor of updating the mark only if the new mask is non zero.
>
> Does this make sense to you?
> Eyal

^ permalink raw reply

* Re: [PATCH bpf v3] x86/cpufeature: bpf hack for clang not supporting asm goto
From: Borislav Petkov @ 2018-05-10 18:16 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Peter Zijlstra, Yonghong Song, mingo, torvalds, ast, daniel,
	linux-kernel, x86, netdev, kernel-team, Thomas Gleixner
In-Reply-To: <20180510175833.nujqinu26mydeape@ast-mbp.dhcp.thefacebook.com>

On Thu, May 10, 2018 at 10:58:35AM -0700, Alexei Starovoitov wrote:
> libbcc is a library. It's not only used by bcc scripts, but by production
> services that compile bpf programs with clang.

Let me get this straight: libbcc fails to compile because it
includes (through some long include chain) the kernel header
arch/x86/include/asm/cpufeature.h.

Is that it?

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply

* Re: [RFC bpf-next 04/10] bpf: cfg: detect loop use domination information
From: John Fastabend @ 2018-05-10 18:17 UTC (permalink / raw)
  To: Jiong Wang, alexei.starovoitov, daniel; +Cc: netdev, oss-drivers
In-Reply-To: <1525688567-19618-5-git-send-email-jiong.wang@netronome.com>

On 05/07/2018 03:22 AM, Jiong Wang wrote:
> If one bb is dominating its predecessor, then there is loop.
> 
> Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
> ---
>  kernel/bpf/cfg.c      | 22 ++++++++++++++++++++++
>  kernel/bpf/cfg.h      |  1 +
>  kernel/bpf/verifier.c |  8 ++++++++
>  3 files changed, 31 insertions(+)
> 
> diff --git a/kernel/bpf/cfg.c b/kernel/bpf/cfg.c
> index b50937a..90692e4 100644
> --- a/kernel/bpf/cfg.c
> +++ b/kernel/bpf/cfg.c
> @@ -568,6 +568,28 @@ int subprog_build_dom_info(struct bpf_subprog_info *subprog)
>  	return ret;
>  }
>  
> +bool subprog_has_loop(struct bpf_subprog_info *subprog)
> +{
> +	int lane_len = BITS_TO_LONGS(subprog->bb_num - 2);
> +	struct list_head *bb_list = &subprog->bbs;
> +	struct bb_node *bb, *entry_bb;
> +	struct edge_node *e;
> +
> +	entry_bb = entry_bb(bb_list);
> +	bb = bb_next(entry_bb);
> +	list_for_each_entry_from(bb, &exit_bb(bb_list)->l, l)
> +		list_for_each_entry(e, &bb->e_prevs, l) {
> +			struct bb_node *latch = e->src;
> +
> +			if (latch != entry_bb &&
> +			    test_bit(bb->idx,
> +				     subprog->dtree + latch->idx * lane_len))
> +				return true;
> +		}
> +
> +	return false;
> +}
> +

Because we are using this to guard against loops we need to detect
all loops not just reducible loops. And because (assuming my understanding
is correct) Tarjan's algorithm will only detect all loops when the
graph is reducible we need additional tests.

There are a couple options to fix this with varying levels of complexity.
Because I'm using this to build loop info structures to find induction
variables and show termination. After the loop structures are built we
could search for any back-edges not in valid loops. This would be similar
to the existing back-edge detection code but with an extra check to
allow edges that have been validated. I would need to check that this
doesn't have any escapes before actually proposing it though.

The other method would be to properly test for reducibility using one of
the algorithms for this. I think the most intuitive is to remove back-edges
and test the graph is acyclic. This would be run before the dom tree is
built. This is IMO what we should do, it seems the most "correct" way to
do this.

The most complex would be to handle irreducible programs using some of the
more complex methods. I really don't think this is necessary but in theory
at least we could use something like Havlak-Tarjan algorithm and allow
some programs with irreducible loops. This is likely overkill especially
in a first iteration.

Here is a sample that fails without this series, using original back-edge
detection, but is allowed with this patch,

SEC("classifier_tc_mark")
int _tc_mark(struct __sk_buff *ctx)
{
        void *data      = (void *)(unsigned long)ctx->data;
        void *data_end  = (void *)(unsigned long)ctx->data_end;
        void *data_meta = (void *)(unsigned long)ctx->data_meta;
        struct meta_info *meta = data_meta;
        volatile int mark = ctx->mark;

        mark += 1;

        if (meta + 1 > data) {
B:
                mark += 2;

                if (mark < ctx->mark * 3)
                        goto C;
        } else if (meta < data) {
C:
                mark += 1;
                if (mark < 1000)
                        goto  B;
        }

        return TC_ACT_OK;
}

A more concise example could be made but I just hacked on one of the
sample programs. This generates the CFG as follows (I have a patch
on top of your stack to print the CFG and DOM tables)

CFG:  65535[-1,-1] -> 0[0,9]  0[0,9] -> 3[20,20]  0[0,9] -> 1[10,18]  1[10,18] -> 4[21,28]  1[10,18] -> 2[19,19]  2[19,19] -> 5[29,30]  3[20,20] -> 5[29,30]  3[20,20] -> 4[21,28]  4[21,28] -> 1[10,18]  4[21,28] -> 5[29,30]  5[29,30] -> 65534[31,65534] 
DOM:
 1  0  0  0  0  0 
 1  1  0  0  0  0 
 1  1  1  0  0  0 
 1  0  0  1  0  0 
 1  0  0  0  1  0 
 1  0  0  0  0  1 

Here we have the loop 1[10,18]->4[21,28] and the back-edge 4[21,28]->1[10,18].
The notation is #idx[head_insn,tail_insn]. The above can then be imported
into dot notation and graphed if needed.

Jiong, please verify this analysis is correct.

Thanks,
John

^ permalink raw reply

* RE: [PATCH] e1000e: Ignore TSYNCRXCTL when getting I219 clock attributes
From: Keller, Jacob E @ 2018-05-10 18:42 UTC (permalink / raw)
  To: Benjamin Poirier, Kirsher, Jeffrey T
  Cc: Achim Mildenberger, olouvignes@gmail.com, jayanth@goubiq.com,
	ehabkost@redhat.com, postmodern.mod3@gmail.com,
	Bart.VanAssche@wdc.com, intel-wired-lan@lists.osuosl.org,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <20180510072835.5549-1-bpoirier@suse.com>

> -----Original Message-----
> From: Benjamin Poirier [mailto:bpoirier@suse.com]
> Sent: Thursday, May 10, 2018 12:29 AM
> To: Kirsher, Jeffrey T <jeffrey.t.kirsher@intel.com>
> Cc: Keller, Jacob E <jacob.e.keller@intel.com>; Achim Mildenberger
> <admin@fph.physik.uni-karlsruhe.de>; olouvignes@gmail.com;
> jayanth@goubiq.com; ehabkost@redhat.com; postmodern.mod3@gmail.com;
> Bart.VanAssche@wdc.com; intel-wired-lan@lists.osuosl.org;
> netdev@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: [PATCH] e1000e: Ignore TSYNCRXCTL when getting I219 clock attributes
> 
> There have been multiple reports of crashes that look like
> kernel: RIP: 0010:[<ffffffff8110303f>] timecounter_read+0xf/0x50
> [...]
> kernel: Call Trace:
> kernel:  [<ffffffffa0806b0f>] e1000e_phc_gettime+0x2f/0x60 [e1000e]
> kernel:  [<ffffffffa0806c5d>] e1000e_systim_overflow_work+0x1d/0x80 [e1000e]
> kernel:  [<ffffffff810992c5>] process_one_work+0x155/0x440
> kernel:  [<ffffffff81099e16>] worker_thread+0x116/0x4b0
> kernel:  [<ffffffff8109f422>] kthread+0xd2/0xf0
> kernel:  [<ffffffff8163184f>] ret_from_fork+0x3f/0x70
> 
> These can be traced back to the fact that e1000e_systim_reset() skips the
> timecounter_init() call if e1000e_get_base_timinca() returns -EINVAL, which
> leads to a null deref in timecounter_read().
> 
> Commit 83129b37ef35 ("e1000e: fix systim issues", v4.2-rc1) reworked
> e1000e_get_base_timinca() in such a way that it can return -EINVAL for
> e1000_pch_spt if the SYSCFI bit is not set in TSYNCRXCTL.
> 
> Some experimentation has shown that on I219 (e1000_pch_spt, "MAC: 12")
> adapters, the E1000_TSYNCRXCTL_SYSCFI flag is unstable; TSYNCRXCTL reads
> sometimes don't have the SYSCFI bit set. Retrying the read shortly after
> finds the bit to be set. This was observed at boot (probe) but also link up
> and link down.
> 
> Moreover, the phc (PTP Hardware Clock) seems to operate normally even after
> reads where SYSCFI=0. Therefore, remove this register read and
> unconditionally set the clock parameters.
> 
> Reported-by: Achim Mildenberger <admin@fph.physik.uni-karlsruhe.de>
> Message-Id: <20180425065243.g5mqewg5irkwgwgv@f2>
> Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=1075876
> Fixes: 83129b37ef35 ("e1000e: fix systim issues")
> Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
> ---
>  drivers/net/ethernet/intel/e1000e/netdev.c | 15 ++++++---------
>  1 file changed, 6 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c
> b/drivers/net/ethernet/intel/e1000e/netdev.c
> index ec4a9759a6f2..3afb1f3b6f91 100644
> --- a/drivers/net/ethernet/intel/e1000e/netdev.c
> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
> @@ -3546,15 +3546,12 @@ s32 e1000e_get_base_timinca(struct e1000_adapter
> *adapter, u32 *timinca)
>  		}
>  		break;
>  	case e1000_pch_spt:
> -		if (er32(TSYNCRXCTL) & E1000_TSYNCRXCTL_SYSCFI) {
> -			/* Stable 24MHz frequency */
> -			incperiod = INCPERIOD_24MHZ;
> -			incvalue = INCVALUE_24MHZ;
> -			shift = INCVALUE_SHIFT_24MHZ;
> -			adapter->cc.shift = shift;
> -			break;
> -		}
> -		return -EINVAL;
> +		/* Stable 24MHz frequency */
> +		incperiod = INCPERIOD_24MHZ;
> +		incvalue = INCVALUE_24MHZ;
> +		shift = INCVALUE_SHIFT_24MHZ;
> +		adapter->cc.shift = shift;
> +		break;
>  	case e1000_pch_cnp:
>  		if (er32(TSYNCRXCTL) & E1000_TSYNCRXCTL_SYSCFI) {
>  			/* Stable 24MHz frequency */
> --
> 2.16.3

Given testing showing that the clock operates fine regardless of the register read, I think this is probably fine. Normally I believe the register was used to check which frequency was in use, but it doesn't seem to serve that purpose here.

Thanks,
Jake

^ permalink raw reply

* Re: [PATCH net-next 01/10] net: stmmac: Let descriptor code set skbuff address
From: David Miller @ 2018-05-10 19:06 UTC (permalink / raw)
  To: Jose.Abreu
  Cc: netdev, Joao.Pinto, Vitor.Soares, peppe.cavallaro,
	alexandre.torgue
In-Reply-To: <cfc227a5bce0caf2058a8efe1bf5071b9dec48e0.1525683833.git.joabreu@synopsys.com>

From: Jose Abreu <Jose.Abreu@synopsys.com>
Date: Tue,  8 May 2018 15:45:24 +0100

> Stop using if conditions depending on the GMAC version for setting the
> the descriptor skbuff address and use instead a helper implemented in
> the descriptor files.
> 
> Signed-off-by: Jose Abreu <joabreu@synopsys.com>

With Spectre mitigations, indirect calls are extremely expensive.  Much
more expensive than conditional checks.

And since this is the descriptor setup in the fast paths of the driver,
I advise that you keep these conditionals.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox