* Re: [net-next PATCH 1/5] udp: Record gso_segs when supporting UDP segmentation offload
From: Eric Dumazet @ 2018-05-04 1:50 UTC (permalink / raw)
To: Alexander Duyck, netdev, willemb, davem
In-Reply-To: <20180504003321.4496.34585.stgit@localhost.localdomain>
On 05/03/2018 05:33 PM, Alexander Duyck wrote:
> From: Alexander Duyck <alexander.h.duyck@intel.com>
>
> We need to record the number of segments that will be generated when this
> frame is segmented. The expectation is that if gso_size is set then
> gso_segs is set as well. Without this some drivers such as ixgbe get
> confused if they attempt to offload this as they record 0 segments for the
> entire packet instead of the correct value.
>
> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
> ---
> net/ipv4/udp.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> index dd3102a37ef9..e07db83b311e 100644
> --- a/net/ipv4/udp.c
> +++ b/net/ipv4/udp.c
> @@ -793,6 +793,8 @@ static int udp_send_skb(struct sk_buff *skb, struct flowi4 *fl4,
>
> skb_shinfo(skb)->gso_size = cork->gso_size;
> skb_shinfo(skb)->gso_type = SKB_GSO_UDP_L4;
> + skb_shinfo(skb)->gso_segs = DIV_ROUND_UP(len - sizeof(uh),
> + cork->gso_size);
> goto csum_partial;
> }
>
>
Yes, this also fixes qdisc_pkt_len_init()
Reviewed-by: Eric Dumazet <edumazet@google.com>
^ permalink raw reply
* [PATCH bpf-next 10/10] bpf: fix references to free_bpf_prog_info() in comments
From: Jakub Kicinski @ 2018-05-04 1:37 UTC (permalink / raw)
To: alexei.starovoitov, daniel; +Cc: oss-drivers, netdev, Jakub Kicinski
In-Reply-To: <20180504013717.29317-1-jakub.kicinski@netronome.com>
Comments in the verifier refer to free_bpf_prog_info() which
seems to have never existed in tree. Replace it with
free_used_maps().
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
---
kernel/bpf/verifier.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 23ec9efeb91d..f0633e89ba8b 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -5043,7 +5043,7 @@ static int replace_map_fd_with_map_ptr(struct bpf_verifier_env *env)
/* hold the map. If the program is rejected by verifier,
* the map will be released by release_maps() or it
* will be used by the valid program until it's unloaded
- * and all maps are released in free_bpf_prog_info()
+ * and all maps are released in free_used_maps()
*/
map = bpf_map_inc(map, false);
if (IS_ERR(map)) {
@@ -5792,7 +5792,7 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr)
err_release_maps:
if (!env->prog->aux->used_maps)
/* if we didn't copy map pointers into bpf_prog_info, release
- * them now. Otherwise free_bpf_prog_info() will release them.
+ * them now. Otherwise free_used_maps() will release them.
*/
release_maps(env);
*prog = env->prog;
--
2.17.0
^ permalink raw reply related
* [PATCH bpf-next 09/10] tools: bpftool: add simple perf event output reader
From: Jakub Kicinski @ 2018-05-04 1:37 UTC (permalink / raw)
To: alexei.starovoitov, daniel; +Cc: oss-drivers, netdev, Jakub Kicinski
In-Reply-To: <20180504013717.29317-1-jakub.kicinski@netronome.com>
Users of BPF sooner or later discover perf_event_output() helpers
and BPF_MAP_TYPE_PERF_EVENT_ARRAY. Dumping this array type is
not possible, however, we can add simple reading of perf events.
Create a new event_pipe subcommand for maps, this sub command
will only work with BPF_MAP_TYPE_PERF_EVENT_ARRAY maps.
Parts of the code from samples/bpf/trace_output_user.c.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
---
.../bpf/bpftool/Documentation/bpftool-map.rst | 29 +-
tools/bpf/bpftool/Documentation/bpftool.rst | 2 +-
tools/bpf/bpftool/Makefile | 7 +-
tools/bpf/bpftool/bash-completion/bpftool | 36 +-
tools/bpf/bpftool/common.c | 19 +
tools/bpf/bpftool/main.h | 4 +
tools/bpf/bpftool/map.c | 19 +-
tools/bpf/bpftool/map_perf_ring.c | 347 ++++++++++++++++++
8 files changed, 444 insertions(+), 19 deletions(-)
create mode 100644 tools/bpf/bpftool/map_perf_ring.c
diff --git a/tools/bpf/bpftool/Documentation/bpftool-map.rst b/tools/bpf/bpftool/Documentation/bpftool-map.rst
index c3eef8c972cd..a6258bc8ec4f 100644
--- a/tools/bpf/bpftool/Documentation/bpftool-map.rst
+++ b/tools/bpf/bpftool/Documentation/bpftool-map.rst
@@ -22,12 +22,13 @@ MAP COMMANDS
=============
| **bpftool** **map { show | list }** [*MAP*]
-| **bpftool** **map dump** *MAP*
-| **bpftool** **map update** *MAP* **key** *DATA* **value** *VALUE* [*UPDATE_FLAGS*]
-| **bpftool** **map lookup** *MAP* **key** *DATA*
-| **bpftool** **map getnext** *MAP* [**key** *DATA*]
-| **bpftool** **map delete** *MAP* **key** *DATA*
-| **bpftool** **map pin** *MAP* *FILE*
+| **bpftool** **map dump** *MAP*
+| **bpftool** **map update** *MAP* **key** *DATA* **value** *VALUE* [*UPDATE_FLAGS*]
+| **bpftool** **map lookup** *MAP* **key** *DATA*
+| **bpftool** **map getnext** *MAP* [**key** *DATA*]
+| **bpftool** **map delete** *MAP* **key** *DATA*
+| **bpftool** **map pin** *MAP* *FILE*
+| **bpftool** **map event_pipe** *MAP* [**cpu** *N* **index** *M*]
| **bpftool** **map help**
|
| *MAP* := { **id** *MAP_ID* | **pinned** *FILE* }
@@ -76,6 +77,22 @@ DESCRIPTION
Note: *FILE* must be located in *bpffs* mount.
+ **bpftool** **map event_pipe** *MAP* [**cpu** *N* **index** *M*]
+ Read events from a BPF_MAP_TYPE_PERF_EVENT_ARRAY map.
+
+ Install perf rings into a perf event array map and dump
+ output of any bpf_perf_event_output() call in the kernel.
+ By default read the number of CPUs on the system and
+ install perf ring for each CPU in the corresponding index
+ in the array.
+
+ If **cpu** and **index** are specified, install perf ring
+ for given **cpu** at **index** in the array (single ring).
+
+ Note that installing a perf ring into an array will silently
+ replace any existing ring. Any other application will stop
+ receiving events if it installed its rings earlier.
+
**bpftool map help**
Print short help message.
diff --git a/tools/bpf/bpftool/Documentation/bpftool.rst b/tools/bpf/bpftool/Documentation/bpftool.rst
index 20689a321ffe..564cb0d9692b 100644
--- a/tools/bpf/bpftool/Documentation/bpftool.rst
+++ b/tools/bpf/bpftool/Documentation/bpftool.rst
@@ -23,7 +23,7 @@ SYNOPSIS
*MAP-COMMANDS* :=
{ **show** | **list** | **dump** | **update** | **lookup** | **getnext** | **delete**
- | **pin** | **help** }
+ | **pin** | **event_pipe** | **help** }
*PROG-COMMANDS* := { **show** | **list** | **dump jited** | **dump xlated** | **pin**
| **load** | **help** }
diff --git a/tools/bpf/bpftool/Makefile b/tools/bpf/bpftool/Makefile
index 4e69782c4a79..892dbf095bff 100644
--- a/tools/bpf/bpftool/Makefile
+++ b/tools/bpf/bpftool/Makefile
@@ -39,7 +39,12 @@ CC = gcc
CFLAGS += -O2
CFLAGS += -W -Wall -Wextra -Wno-unused-parameter -Wshadow -Wno-missing-field-initializers
-CFLAGS += -DPACKAGE='"bpftool"' -D__EXPORTED_HEADERS__ -I$(srctree)/tools/include/uapi -I$(srctree)/tools/include -I$(srctree)/tools/lib/bpf -I$(srctree)/kernel/bpf/
+CFLAGS += -DPACKAGE='"bpftool"' -D__EXPORTED_HEADERS__ \
+ -I$(srctree)/kernel/bpf/ \
+ -I$(srctree)/tools/include \
+ -I$(srctree)/tools/include/uapi \
+ -I$(srctree)/tools/lib/bpf \
+ -I$(srctree)/tools/perf
CFLAGS += -DBPFTOOL_VERSION='"$(BPFTOOL_VERSION)"'
LIBS = -lelf -lbfd -lopcodes $(LIBBPF)
diff --git a/tools/bpf/bpftool/bash-completion/bpftool b/tools/bpf/bpftool/bash-completion/bpftool
index 852d84a98acd..b301c9b315f1 100644
--- a/tools/bpf/bpftool/bash-completion/bpftool
+++ b/tools/bpf/bpftool/bash-completion/bpftool
@@ -1,6 +1,6 @@
# bpftool(8) bash completion -*- shell-script -*-
#
-# Copyright (C) 2017 Netronome Systems, Inc.
+# Copyright (C) 2017-2018 Netronome Systems, Inc.
#
# This software is dual licensed under the GNU General License
# Version 2, June 1991 as shown in the file COPYING in the top-level
@@ -79,6 +79,14 @@ _bpftool_get_map_ids()
command sed -n 's/.*"id": \(.*\),$/\1/p' )" -- "$cur" ) )
}
+_bpftool_get_perf_map_ids()
+{
+ COMPREPLY+=( $( compgen -W "$( bpftool -jp map 2>&1 | \
+ command grep -C2 perf_event_array | \
+ command sed -n 's/.*"id": \(.*\),$/\1/p' )" -- "$cur" ) )
+}
+
+
_bpftool_get_prog_ids()
{
COMPREPLY+=( $( compgen -W "$( bpftool -jp prog 2>&1 | \
@@ -359,10 +367,34 @@ _bpftool()
fi
return 0
;;
+ event_pipe)
+ case $prev in
+ $command)
+ COMPREPLY=( $( compgen -W "$MAP_TYPE" -- "$cur" ) )
+ return 0
+ ;;
+ id)
+ _bpftool_get_perf_map_ids
+ return 0
+ ;;
+ cpu)
+ return 0
+ ;;
+ index)
+ return 0
+ ;;
+ *)
+ _bpftool_once_attr 'cpu'
+ _bpftool_once_attr 'index'
+ return 0
+ ;;
+ esac
+ ;;
*)
[[ $prev == $object ]] && \
COMPREPLY=( $( compgen -W 'delete dump getnext help \
- lookup pin show list update' -- "$cur" ) )
+ lookup pin event_pipe show list update' -- \
+ "$cur" ) )
;;
esac
;;
diff --git a/tools/bpf/bpftool/common.c b/tools/bpf/bpftool/common.c
index 9c620770c6ed..32f9e397a6c0 100644
--- a/tools/bpf/bpftool/common.c
+++ b/tools/bpf/bpftool/common.c
@@ -331,6 +331,16 @@ char *get_fdinfo(int fd, const char *key)
return NULL;
}
+void print_data_json(uint8_t *data, size_t len)
+{
+ unsigned int i;
+
+ jsonw_start_array(json_wtr);
+ for (i = 0; i < len; i++)
+ jsonw_printf(json_wtr, "%d", data[i]);
+ jsonw_end_array(json_wtr);
+}
+
void print_hex_data_json(uint8_t *data, size_t len)
{
unsigned int i;
@@ -421,6 +431,15 @@ void delete_pinned_obj_table(struct pinned_obj_table *tab)
}
}
+unsigned int get_page_size(void)
+{
+ static int result;
+
+ if (!result)
+ result = getpagesize();
+ return result;
+}
+
unsigned int get_possible_cpus(void)
{
static unsigned int result;
diff --git a/tools/bpf/bpftool/main.h b/tools/bpf/bpftool/main.h
index cbf8985da362..6173cd997e7a 100644
--- a/tools/bpf/bpftool/main.h
+++ b/tools/bpf/bpftool/main.h
@@ -117,14 +117,18 @@ int do_pin_fd(int fd, const char *name);
int do_prog(int argc, char **arg);
int do_map(int argc, char **arg);
+int do_event_pipe(int argc, char **argv);
int do_cgroup(int argc, char **arg);
int prog_parse_fd(int *argc, char ***argv);
+int map_parse_fd_and_info(int *argc, char ***argv, void *info, __u32 *info_len);
void disasm_print_insn(unsigned char *image, ssize_t len, int opcodes,
const char *arch);
+void print_data_json(uint8_t *data, size_t len);
void print_hex_data_json(uint8_t *data, size_t len);
+unsigned int get_page_size(void);
unsigned int get_possible_cpus(void);
const char *ifindex_to_bfd_name_ns(__u32 ifindex, __u64 ns_dev, __u64 ns_ino);
diff --git a/tools/bpf/bpftool/map.c b/tools/bpf/bpftool/map.c
index 5efefde5f578..af6766e956ba 100644
--- a/tools/bpf/bpftool/map.c
+++ b/tools/bpf/bpftool/map.c
@@ -130,8 +130,7 @@ static int map_parse_fd(int *argc, char ***argv)
return -1;
}
-static int
-map_parse_fd_and_info(int *argc, char ***argv, void *info, __u32 *info_len)
+int map_parse_fd_and_info(int *argc, char ***argv, void *info, __u32 *info_len)
{
int err;
int fd;
@@ -817,12 +816,13 @@ static int do_help(int argc, char **argv)
fprintf(stderr,
"Usage: %s %s { show | list } [MAP]\n"
- " %s %s dump MAP\n"
- " %s %s update MAP key DATA value VALUE [UPDATE_FLAGS]\n"
- " %s %s lookup MAP key DATA\n"
- " %s %s getnext MAP [key DATA]\n"
- " %s %s delete MAP key DATA\n"
- " %s %s pin MAP FILE\n"
+ " %s %s dump MAP\n"
+ " %s %s update MAP key DATA value VALUE [UPDATE_FLAGS]\n"
+ " %s %s lookup MAP key DATA\n"
+ " %s %s getnext MAP [key DATA]\n"
+ " %s %s delete MAP key DATA\n"
+ " %s %s pin MAP FILE\n"
+ " %s %s event_pipe MAP [cpu N index M]\n"
" %s %s help\n"
"\n"
" MAP := { id MAP_ID | pinned FILE }\n"
@@ -834,7 +834,7 @@ static int do_help(int argc, char **argv)
"",
bin_name, argv[-2], bin_name, argv[-2], bin_name, argv[-2],
bin_name, argv[-2], bin_name, argv[-2], bin_name, argv[-2],
- bin_name, argv[-2], bin_name, argv[-2]);
+ bin_name, argv[-2], bin_name, argv[-2], bin_name, argv[-2]);
return 0;
}
@@ -849,6 +849,7 @@ static const struct cmd cmds[] = {
{ "getnext", do_getnext },
{ "delete", do_delete },
{ "pin", do_pin },
+ { "event_pipe", do_event_pipe },
{ 0 }
};
diff --git a/tools/bpf/bpftool/map_perf_ring.c b/tools/bpf/bpftool/map_perf_ring.c
new file mode 100644
index 000000000000..c5a2ced8552d
--- /dev/null
+++ b/tools/bpf/bpftool/map_perf_ring.c
@@ -0,0 +1,347 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright (C) 2018 Netronome Systems, Inc. */
+/* This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ */
+#include <errno.h>
+#include <fcntl.h>
+#include <libbpf.h>
+#include <poll.h>
+#include <signal.h>
+#include <stdbool.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <time.h>
+#include <unistd.h>
+#include <linux/bpf.h>
+#include <linux/perf_event.h>
+#include <sys/ioctl.h>
+#include <sys/mman.h>
+#include <sys/syscall.h>
+
+#include <bpf.h>
+#include <perf-sys.h>
+
+#include "main.h"
+
+#define MMAP_PAGE_CNT 16
+
+static bool stop;
+
+struct event_ring_info {
+ int fd;
+ int key;
+ unsigned int cpu;
+ void *mem;
+};
+
+struct perf_event_sample {
+ struct perf_event_header header;
+ __u32 size;
+ unsigned char data[];
+};
+
+static void int_exit(int signo)
+{
+ fprintf(stderr, "Stopping...\n");
+ stop = true;
+}
+
+static void
+print_bpf_output(struct event_ring_info *ring, struct perf_event_sample *e)
+{
+ struct {
+ struct perf_event_header header;
+ __u64 id;
+ __u64 lost;
+ } *lost = (void *)e;
+ struct timespec ts;
+
+ if (clock_gettime(CLOCK_MONOTONIC, &ts)) {
+ perror("Can't read clock for timestamp");
+ return;
+ }
+
+ if (json_output) {
+ jsonw_start_object(json_wtr);
+ jsonw_name(json_wtr, "timestamp");
+ jsonw_uint(json_wtr, ts.tv_sec * 1000000000ull + ts.tv_nsec);
+ jsonw_name(json_wtr, "type");
+ jsonw_uint(json_wtr, e->header.type);
+ jsonw_name(json_wtr, "cpu");
+ jsonw_uint(json_wtr, ring->cpu);
+ jsonw_name(json_wtr, "index");
+ jsonw_uint(json_wtr, ring->key);
+ if (e->header.type == PERF_RECORD_SAMPLE) {
+ jsonw_name(json_wtr, "data");
+ print_data_json(e->data, e->size);
+ } else if (e->header.type == PERF_RECORD_LOST) {
+ jsonw_name(json_wtr, "lost");
+ jsonw_start_object(json_wtr);
+ jsonw_name(json_wtr, "id");
+ jsonw_uint(json_wtr, lost->id);
+ jsonw_name(json_wtr, "count");
+ jsonw_uint(json_wtr, lost->lost);
+ jsonw_end_object(json_wtr);
+ }
+ jsonw_end_object(json_wtr);
+ } else {
+ if (e->header.type == PERF_RECORD_SAMPLE) {
+ printf("== @%ld.%ld CPU: %d index: %d =====\n",
+ (long)ts.tv_sec, ts.tv_nsec,
+ ring->cpu, ring->key);
+ fprint_hex(stdout, e->data, e->size, " ");
+ printf("\n");
+ } else if (e->header.type == PERF_RECORD_LOST) {
+ printf("lost %lld events\n", lost->lost);
+ } else {
+ printf("unknown event type=%d size=%d\n",
+ e->header.type, e->header.size);
+ }
+ }
+}
+
+static void
+perf_event_read(struct event_ring_info *ring, void **buf, size_t *buf_len)
+{
+ volatile struct perf_event_mmap_page *header = ring->mem;
+ __u64 buffer_size = MMAP_PAGE_CNT * get_page_size();
+ __u64 data_tail = header->data_tail;
+ __u64 data_head = header->data_head;
+ void *base, *begin, *end;
+
+ asm volatile("" ::: "memory"); /* in real code it should be smp_rmb() */
+ if (data_head == data_tail)
+ return;
+
+ base = ((char *)header) + get_page_size();
+
+ begin = base + data_tail % buffer_size;
+ end = base + data_head % buffer_size;
+
+ while (begin != end) {
+ struct perf_event_sample *e;
+
+ e = begin;
+ if (begin + e->header.size > base + buffer_size) {
+ long len = base + buffer_size - begin;
+
+ if (*buf_len < e->header.size) {
+ free(*buf);
+ *buf = malloc(e->header.size);
+ if (!*buf) {
+ fprintf(stderr,
+ "can't allocate memory");
+ stop = true;
+ return;
+ }
+ *buf_len = e->header.size;
+ }
+
+ memcpy(*buf, begin, len);
+ memcpy(*buf + len, base, e->header.size - len);
+ e = (void *)*buf;
+ begin = base + e->header.size - len;
+ } else if (begin + e->header.size == base + buffer_size) {
+ begin = base;
+ } else {
+ begin += e->header.size;
+ }
+
+ print_bpf_output(ring, e);
+ }
+
+ __sync_synchronize(); /* smp_mb() */
+ header->data_tail = data_head;
+}
+
+static int perf_mmap_size(void)
+{
+ return get_page_size() * (MMAP_PAGE_CNT + 1);
+}
+
+static void *perf_event_mmap(int fd)
+{
+ int mmap_size = perf_mmap_size();
+ void *base;
+
+ base = mmap(NULL, mmap_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
+ if (base == MAP_FAILED) {
+ p_err("event mmap failed: %s\n", strerror(errno));
+ return NULL;
+ }
+
+ return base;
+}
+
+static void perf_event_unmap(void *mem)
+{
+ if (munmap(mem, perf_mmap_size()))
+ fprintf(stderr, "Can't unmap ring memory!\n");
+}
+
+static int bpf_perf_event_open(int map_fd, int key, int cpu)
+{
+ struct perf_event_attr attr = {
+ .sample_type = PERF_SAMPLE_RAW,
+ .type = PERF_TYPE_SOFTWARE,
+ .config = PERF_COUNT_SW_BPF_OUTPUT,
+ };
+ int pmu_fd;
+
+ pmu_fd = sys_perf_event_open(&attr, -1, cpu, -1, 0);
+ if (pmu_fd < 0) {
+ p_err("failed to open perf event %d for CPU %d", key, cpu);
+ return -1;
+ }
+
+ if (bpf_map_update_elem(map_fd, &key, &pmu_fd, BPF_ANY)) {
+ p_err("failed to update map for event %d for CPU %d", key, cpu);
+ goto err_close;
+ }
+ if (ioctl(pmu_fd, PERF_EVENT_IOC_ENABLE, 0)) {
+ p_err("failed to enable event %d for CPU %d", key, cpu);
+ goto err_close;
+ }
+
+ return pmu_fd;
+
+err_close:
+ close(pmu_fd);
+ return -1;
+}
+
+int do_event_pipe(int argc, char **argv)
+{
+ int i, nfds, map_fd, index = -1, cpu = -1;
+ struct bpf_map_info map_info = {};
+ struct event_ring_info *rings;
+ size_t tmp_buf_sz = 0;
+ void *tmp_buf = NULL;
+ struct pollfd *pfds;
+ __u32 map_info_len;
+ bool do_all = true;
+
+ map_info_len = sizeof(map_info);
+ map_fd = map_parse_fd_and_info(&argc, &argv, &map_info, &map_info_len);
+ if (map_fd < 0)
+ return -1;
+
+ if (map_info.type != BPF_MAP_TYPE_PERF_EVENT_ARRAY) {
+ p_err("map is not a perf event array");
+ goto err_close_map;
+ }
+
+ while (argc) {
+ if (argc < 2)
+ BAD_ARG();
+
+ if (is_prefix(*argv, "cpu")) {
+ char *endptr;
+
+ NEXT_ARG();
+ cpu = strtoul(*argv, &endptr, 0);
+ if (*endptr) {
+ p_err("can't parse %s as CPU ID", **argv);
+ goto err_close_map;
+ }
+
+ NEXT_ARG();
+ } else if (is_prefix(*argv, "index")) {
+ char *endptr;
+
+ NEXT_ARG();
+ index = strtoul(*argv, &endptr, 0);
+ if (*endptr) {
+ p_err("can't parse %s as index", **argv);
+ goto err_close_map;
+ }
+
+ NEXT_ARG();
+ } else {
+ BAD_ARG();
+ }
+
+ do_all = false;
+ }
+
+ if (!do_all) {
+ if (index == -1 || cpu == -1) {
+ p_err("cpu and index must be specified together");
+ goto err_close_map;
+ }
+
+ nfds = 1;
+ } else {
+ nfds = min(get_possible_cpus(), map_info.max_entries);
+ cpu = 0;
+ index = 0;
+ }
+
+ rings = calloc(nfds, sizeof(rings[0]));
+ if (!rings)
+ goto err_close_map;
+
+ pfds = calloc(nfds, sizeof(pfds[0]));
+ if (!pfds)
+ goto err_free_rings;
+
+ for (i = 0; i < nfds; i++) {
+ rings[i].cpu = cpu + i;
+ rings[i].key = index + i;
+
+ rings[i].fd = bpf_perf_event_open(map_fd, rings[i].key,
+ rings[i].cpu);
+ if (rings[i].fd < 0)
+ goto err_close_fds_prev;
+
+ rings[i].mem = perf_event_mmap(rings[i].fd);
+ if (!rings[i].mem)
+ goto err_close_fds_current;
+
+ pfds[i].fd = rings[i].fd;
+ pfds[i].events = POLLIN;
+ }
+
+ signal(SIGINT, int_exit);
+ signal(SIGHUP, int_exit);
+ signal(SIGTERM, int_exit);
+
+ if (json_output)
+ jsonw_start_array(json_wtr);
+
+ while (!stop) {
+ poll(pfds, nfds, 200);
+ for (i = 0; i < nfds; i++)
+ perf_event_read(&rings[i], &tmp_buf, &tmp_buf_sz);
+ }
+ free(tmp_buf);
+
+ if (json_output)
+ jsonw_end_array(json_wtr);
+
+ for (i = 0; i < nfds; i++) {
+ perf_event_unmap(rings[i].mem);
+ close(rings[i].fd);
+ }
+ free(pfds);
+ free(rings);
+ close(map_fd);
+
+ return 0;
+
+err_close_fds_prev:
+ while (i--) {
+ perf_event_unmap(rings[i].mem);
+err_close_fds_current:
+ close(rings[i].fd);
+ }
+ free(pfds);
+err_free_rings:
+ free(rings);
+err_close_map:
+ close(map_fd);
+ return -1;
+}
--
2.17.0
^ permalink raw reply related
* [PATCH bpf-next 08/10] tools: bpftool: move get_possible_cpus() to common code
From: Jakub Kicinski @ 2018-05-04 1:37 UTC (permalink / raw)
To: alexei.starovoitov, daniel; +Cc: oss-drivers, netdev, Jakub Kicinski
In-Reply-To: <20180504013717.29317-1-jakub.kicinski@netronome.com>
Move the get_possible_cpus() function to shared code. No functional
changes.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
---
tools/bpf/bpftool/common.c | 58 +++++++++++++++++++++++++++++++++++++-
tools/bpf/bpftool/main.h | 3 +-
tools/bpf/bpftool/map.c | 56 ------------------------------------
3 files changed, 59 insertions(+), 58 deletions(-)
diff --git a/tools/bpf/bpftool/common.c b/tools/bpf/bpftool/common.c
index 465995281dcd..9c620770c6ed 100644
--- a/tools/bpf/bpftool/common.c
+++ b/tools/bpf/bpftool/common.c
@@ -1,5 +1,5 @@
/*
- * Copyright (C) 2017 Netronome Systems, Inc.
+ * Copyright (C) 2017-2018 Netronome Systems, Inc.
*
* This software is dual licensed under the GNU General License Version 2,
* June 1991 as shown in the file COPYING in the top-level directory of this
@@ -33,6 +33,7 @@
/* Author: Jakub Kicinski <kubakici@wp.pl> */
+#include <ctype.h>
#include <errno.h>
#include <fcntl.h>
#include <fts.h>
@@ -420,6 +421,61 @@ void delete_pinned_obj_table(struct pinned_obj_table *tab)
}
}
+unsigned int get_possible_cpus(void)
+{
+ static unsigned int result;
+ char buf[128];
+ long int n;
+ char *ptr;
+ int fd;
+
+ if (result)
+ return result;
+
+ fd = open("/sys/devices/system/cpu/possible", O_RDONLY);
+ if (fd < 0) {
+ p_err("can't open sysfs possible cpus");
+ exit(-1);
+ }
+
+ n = read(fd, buf, sizeof(buf));
+ if (n < 2) {
+ p_err("can't read sysfs possible cpus");
+ exit(-1);
+ }
+ close(fd);
+
+ if (n == sizeof(buf)) {
+ p_err("read sysfs possible cpus overflow");
+ exit(-1);
+ }
+
+ ptr = buf;
+ n = 0;
+ while (*ptr && *ptr != '\n') {
+ unsigned int a, b;
+
+ if (sscanf(ptr, "%u-%u", &a, &b) == 2) {
+ n += b - a + 1;
+
+ ptr = strchr(ptr, '-') + 1;
+ } else if (sscanf(ptr, "%u", &a) == 1) {
+ n++;
+ } else {
+ assert(0);
+ }
+
+ while (isdigit(*ptr))
+ ptr++;
+ if (*ptr == ',')
+ ptr++;
+ }
+
+ result = n;
+
+ return result;
+}
+
static char *
ifindex_to_name_ns(__u32 ifindex, __u32 ns_dev, __u32 ns_ino, char *buf)
{
diff --git a/tools/bpf/bpftool/main.h b/tools/bpf/bpftool/main.h
index b8e9584d6246..cbf8985da362 100644
--- a/tools/bpf/bpftool/main.h
+++ b/tools/bpf/bpftool/main.h
@@ -1,5 +1,5 @@
/*
- * Copyright (C) 2017 Netronome Systems, Inc.
+ * Copyright (C) 2017-2018 Netronome Systems, Inc.
*
* This software is dual licensed under the GNU General License Version 2,
* June 1991 as shown in the file COPYING in the top-level directory of this
@@ -125,6 +125,7 @@ void disasm_print_insn(unsigned char *image, ssize_t len, int opcodes,
const char *arch);
void print_hex_data_json(uint8_t *data, size_t len);
+unsigned int get_possible_cpus(void);
const char *ifindex_to_bfd_name_ns(__u32 ifindex, __u64 ns_dev, __u64 ns_ino);
#endif
diff --git a/tools/bpf/bpftool/map.c b/tools/bpf/bpftool/map.c
index 7da77e4166ec..5efefde5f578 100644
--- a/tools/bpf/bpftool/map.c
+++ b/tools/bpf/bpftool/map.c
@@ -34,7 +34,6 @@
/* Author: Jakub Kicinski <kubakici@wp.pl> */
#include <assert.h>
-#include <ctype.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
@@ -69,61 +68,6 @@ static const char * const map_type_name[] = {
[BPF_MAP_TYPE_CPUMAP] = "cpumap",
};
-static unsigned int get_possible_cpus(void)
-{
- static unsigned int result;
- char buf[128];
- long int n;
- char *ptr;
- int fd;
-
- if (result)
- return result;
-
- fd = open("/sys/devices/system/cpu/possible", O_RDONLY);
- if (fd < 0) {
- p_err("can't open sysfs possible cpus");
- exit(-1);
- }
-
- n = read(fd, buf, sizeof(buf));
- if (n < 2) {
- p_err("can't read sysfs possible cpus");
- exit(-1);
- }
- close(fd);
-
- if (n == sizeof(buf)) {
- p_err("read sysfs possible cpus overflow");
- exit(-1);
- }
-
- ptr = buf;
- n = 0;
- while (*ptr && *ptr != '\n') {
- unsigned int a, b;
-
- if (sscanf(ptr, "%u-%u", &a, &b) == 2) {
- n += b - a + 1;
-
- ptr = strchr(ptr, '-') + 1;
- } else if (sscanf(ptr, "%u", &a) == 1) {
- n++;
- } else {
- assert(0);
- }
-
- while (isdigit(*ptr))
- ptr++;
- if (*ptr == ',')
- ptr++;
- }
-
- result = n;
-
- return result;
-}
-
static bool map_is_per_cpu(__u32 type)
{
return type == BPF_MAP_TYPE_PERCPU_HASH ||
--
2.17.0
^ permalink raw reply related
* [PATCH bpf-next 07/10] tools: bpftool: fold hex keyword in command help
From: Jakub Kicinski @ 2018-05-04 1:37 UTC (permalink / raw)
To: alexei.starovoitov, daniel; +Cc: oss-drivers, netdev, Jakub Kicinski
In-Reply-To: <20180504013717.29317-1-jakub.kicinski@netronome.com>
Instead of spelling [hex] BYTES everywhere use DATA as keyword
for generalized value. This will help us keep the messages
concise when longer command are added in the future. It will
also be useful once BTF support comes. We will only have to
change the definition of DATA.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
---
.../bpf/bpftool/Documentation/bpftool-map.rst | 19 ++++++++++---------
tools/bpf/bpftool/map.c | 13 +++++++------
2 files changed, 17 insertions(+), 15 deletions(-)
diff --git a/tools/bpf/bpftool/Documentation/bpftool-map.rst b/tools/bpf/bpftool/Documentation/bpftool-map.rst
index 5f512b14bff9..c3eef8c972cd 100644
--- a/tools/bpf/bpftool/Documentation/bpftool-map.rst
+++ b/tools/bpf/bpftool/Documentation/bpftool-map.rst
@@ -23,16 +23,17 @@ MAP COMMANDS
| **bpftool** **map { show | list }** [*MAP*]
| **bpftool** **map dump** *MAP*
-| **bpftool** **map update** *MAP* **key** [**hex**] *BYTES* **value** [**hex**] *VALUE* [*UPDATE_FLAGS*]
-| **bpftool** **map lookup** *MAP* **key** [**hex**] *BYTES*
-| **bpftool** **map getnext** *MAP* [**key** [**hex**] *BYTES*]
-| **bpftool** **map delete** *MAP* **key** [**hex**] *BYTES*
+| **bpftool** **map update** *MAP* **key** *DATA* **value** *VALUE* [*UPDATE_FLAGS*]
+| **bpftool** **map lookup** *MAP* **key** *DATA*
+| **bpftool** **map getnext** *MAP* [**key** *DATA*]
+| **bpftool** **map delete** *MAP* **key** *DATA*
| **bpftool** **map pin** *MAP* *FILE*
| **bpftool** **map help**
|
| *MAP* := { **id** *MAP_ID* | **pinned** *FILE* }
+| *DATA* := { [**hex**] *BYTES* }
| *PROG* := { **id** *PROG_ID* | **pinned** *FILE* | **tag** *PROG_TAG* }
-| *VALUE* := { *BYTES* | *MAP* | *PROG* }
+| *VALUE* := { *DATA* | *MAP* | *PROG* }
| *UPDATE_FLAGS* := { **any** | **exist** | **noexist** }
DESCRIPTION
@@ -48,7 +49,7 @@ DESCRIPTION
**bpftool map dump** *MAP*
Dump all entries in a given *MAP*.
- **bpftool map update** *MAP* **key** [**hex**] *BYTES* **value** [**hex**] *VALUE* [*UPDATE_FLAGS*]
+ **bpftool map update** *MAP* **key** *DATA* **value** *VALUE* [*UPDATE_FLAGS*]
Update map entry for a given *KEY*.
*UPDATE_FLAGS* can be one of: **any** update existing entry
@@ -61,13 +62,13 @@ DESCRIPTION
the bytes are parsed as decimal values, unless a "0x" prefix
(for hexadecimal) or a "0" prefix (for octal) is provided.
- **bpftool map lookup** *MAP* **key** [**hex**] *BYTES*
+ **bpftool map lookup** *MAP* **key** *DATA*
Lookup **key** in the map.
- **bpftool map getnext** *MAP* [**key** [**hex**] *BYTES*]
+ **bpftool map getnext** *MAP* [**key** *DATA*]
Get next key. If *key* is not specified, get first key.
- **bpftool map delete** *MAP* **key** [**hex**] *BYTES*
+ **bpftool map delete** *MAP* **key** *DATA*
Remove entry from the map.
**bpftool map pin** *MAP* *FILE*
diff --git a/tools/bpf/bpftool/map.c b/tools/bpf/bpftool/map.c
index a6cdb640a0d7..7da77e4166ec 100644
--- a/tools/bpf/bpftool/map.c
+++ b/tools/bpf/bpftool/map.c
@@ -1,5 +1,5 @@
/*
- * Copyright (C) 2017 Netronome Systems, Inc.
+ * Copyright (C) 2017-2018 Netronome Systems, Inc.
*
* This software is dual licensed under the GNU General License Version 2,
* June 1991 as shown in the file COPYING in the top-level directory of this
@@ -874,16 +874,17 @@ static int do_help(int argc, char **argv)
fprintf(stderr,
"Usage: %s %s { show | list } [MAP]\n"
" %s %s dump MAP\n"
- " %s %s update MAP key [hex] BYTES value [hex] VALUE [UPDATE_FLAGS]\n"
- " %s %s lookup MAP key [hex] BYTES\n"
- " %s %s getnext MAP [key [hex] BYTES]\n"
- " %s %s delete MAP key [hex] BYTES\n"
+ " %s %s update MAP key DATA value VALUE [UPDATE_FLAGS]\n"
+ " %s %s lookup MAP key DATA\n"
+ " %s %s getnext MAP [key DATA]\n"
+ " %s %s delete MAP key DATA\n"
" %s %s pin MAP FILE\n"
" %s %s help\n"
"\n"
" MAP := { id MAP_ID | pinned FILE }\n"
+ " DATA := { [hex] BYTES }\n"
" " HELP_SPEC_PROGRAM "\n"
- " VALUE := { BYTES | MAP | PROG }\n"
+ " VALUE := { DATA | MAP | PROG }\n"
" UPDATE_FLAGS := { any | exist | noexist }\n"
" " HELP_SPEC_OPTIONS "\n"
"",
--
2.17.0
^ permalink raw reply related
* [PATCH bpf-next 06/10] nfp: bpf: rewrite map pointers with NFP TIDs
From: Jakub Kicinski @ 2018-05-04 1:37 UTC (permalink / raw)
To: alexei.starovoitov, daniel; +Cc: oss-drivers, netdev, Jakub Kicinski
In-Reply-To: <20180504013717.29317-1-jakub.kicinski@netronome.com>
Kernel will now replace map fds with actual pointer before
calling the offload prepare. We can identify those pointers
and replace them with NFP table IDs instead of loading the
table ID in code generated for CALL instruction.
This allows us to support having the same CALL being used with
different maps.
Since we don't want to change the FW ABI we still need to
move the TID from R1 to portion of R0 before the jump.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
---
drivers/net/ethernet/netronome/nfp/bpf/jit.c | 44 ++++++++++++++-----
.../net/ethernet/netronome/nfp/bpf/verifier.c | 9 ----
2 files changed, 32 insertions(+), 21 deletions(-)
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/jit.c b/drivers/net/ethernet/netronome/nfp/bpf/jit.c
index 2127cf1548dd..326a2085d650 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/jit.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/jit.c
@@ -1395,15 +1395,9 @@ static int adjust_head(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
static int
map_call_stack_common(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
{
- struct bpf_offloaded_map *offmap;
- struct nfp_bpf_map *nfp_map;
bool load_lm_ptr;
u32 ret_tgt;
s64 lm_off;
- swreg tid;
-
- offmap = (struct bpf_offloaded_map *)meta->arg1.map_ptr;
- nfp_map = offmap->dev_priv;
/* We only have to reload LM0 if the key is not at start of stack */
lm_off = nfp_prog->stack_depth;
@@ -1416,17 +1410,12 @@ map_call_stack_common(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
if (meta->func_id == BPF_FUNC_map_update_elem)
emit_csr_wr(nfp_prog, reg_b(3 * 2), NFP_CSR_ACT_LM_ADDR2);
- /* Load map ID into a register, it should actually fit as an immediate
- * but in case it doesn't deal with it here, not in the delay slots.
- */
- tid = ur_load_imm_any(nfp_prog, nfp_map->tid, imm_a(nfp_prog));
-
emit_br_relo(nfp_prog, BR_UNC, BR_OFF_RELO + meta->func_id,
2, RELO_BR_HELPER);
ret_tgt = nfp_prog_current_offset(nfp_prog) + 2;
/* Load map ID into A0 */
- wrp_mov(nfp_prog, reg_a(0), tid);
+ wrp_mov(nfp_prog, reg_a(0), reg_a(2));
/* Load the return address into B0 */
wrp_immed_relo(nfp_prog, reg_b(0), ret_tgt, RELO_IMMED_REL);
@@ -3254,6 +3243,33 @@ static int nfp_bpf_optimize(struct nfp_prog *nfp_prog)
return 0;
}
+static int nfp_bpf_replace_map_ptrs(struct nfp_prog *nfp_prog)
+{
+ struct nfp_insn_meta *meta1, *meta2;
+ struct nfp_bpf_map *nfp_map;
+ struct bpf_map *map;
+
+ nfp_for_each_insn_walk2(nfp_prog, meta1, meta2) {
+ if (meta1->skip || meta2->skip)
+ continue;
+
+ if (meta1->insn.code != (BPF_LD | BPF_IMM | BPF_DW) ||
+ meta1->insn.src_reg != BPF_PSEUDO_MAP_FD)
+ continue;
+
+ map = (void *)(unsigned long)((u32)meta1->insn.imm |
+ (u64)meta2->insn.imm << 32);
+ if (bpf_map_offload_neutral(map))
+ continue;
+ nfp_map = map_to_offmap(map)->dev_priv;
+
+ meta1->insn.imm = nfp_map->tid;
+ meta2->insn.imm = 0;
+ }
+
+ return 0;
+}
+
static int nfp_bpf_ustore_calc(u64 *prog, unsigned int len)
{
__le64 *ustore = (__force __le64 *)prog;
@@ -3290,6 +3306,10 @@ int nfp_bpf_jit(struct nfp_prog *nfp_prog)
{
int ret;
+ ret = nfp_bpf_replace_map_ptrs(nfp_prog);
+ if (ret)
+ return ret;
+
ret = nfp_bpf_optimize(nfp_prog);
if (ret)
return ret;
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/verifier.c b/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
index c76b622743ae..e163f3cfa47d 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
@@ -151,15 +151,6 @@ nfp_bpf_map_call_ok(const char *fname, struct bpf_verifier_env *env,
return false;
}
- /* Rest of the checks is only if we re-parse the same insn */
- if (!meta->func_id)
- return true;
-
- if (meta->arg1.map_ptr != reg1->map_ptr) {
- pr_vlog(env, "%s: called for different map\n", fname);
- return false;
- }
-
return true;
}
--
2.17.0
^ permalink raw reply related
* [PATCH bpf-next 05/10] nfp: bpf: perf event output helpers support
From: Jakub Kicinski @ 2018-05-04 1:37 UTC (permalink / raw)
To: alexei.starovoitov, daniel; +Cc: oss-drivers, netdev, Jakub Kicinski
In-Reply-To: <20180504013717.29317-1-jakub.kicinski@netronome.com>
Add support for the perf_event_output family of helpers.
The implementation on the NFP will not match the host code exactly.
The state of the host map and rings is unknown to the device, hence
device can't return errors when rings are not installed. The device
simply packs the data into a firmware notification message and sends
it over to the host, returning success to the program.
There is no notion of a host CPU on the device when packets are being
processed. Device will only offload programs which set BPF_F_CURRENT_CPU.
Still, if map index doesn't match CPU no error will be returned (see
above).
Dropped/lost firmware notification messages will not cause "lost
events" event on the perf ring, they are only visible via device
error counters.
Firmware notification messages may also get reordered in respect
to the packets which caused their generation.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
---
drivers/net/ethernet/netronome/nfp/bpf/cmsg.c | 16 ++++-
drivers/net/ethernet/netronome/nfp/bpf/fw.h | 20 +++++-
drivers/net/ethernet/netronome/nfp/bpf/jit.c | 32 ++++++++-
drivers/net/ethernet/netronome/nfp/bpf/main.c | 3 +
drivers/net/ethernet/netronome/nfp/bpf/main.h | 4 ++
.../net/ethernet/netronome/nfp/bpf/offload.c | 47 +++++++++++++
.../net/ethernet/netronome/nfp/bpf/verifier.c | 69 ++++++++++++++++++-
7 files changed, 187 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/cmsg.c b/drivers/net/ethernet/netronome/nfp/bpf/cmsg.c
index 7e298148ca26..cb87fccb9f6a 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/cmsg.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/cmsg.c
@@ -1,5 +1,5 @@
/*
- * Copyright (C) 2017 Netronome Systems, Inc.
+ * Copyright (C) 2017-2018 Netronome Systems, Inc.
*
* This software is dual licensed under the GNU General License Version 2,
* June 1991 as shown in the file COPYING in the top-level directory of this
@@ -102,6 +102,15 @@ nfp_bpf_cmsg_map_req_alloc(struct nfp_app_bpf *bpf, unsigned int n)
return nfp_bpf_cmsg_alloc(bpf, size);
}
+static u8 nfp_bpf_cmsg_get_type(struct sk_buff *skb)
+{
+ struct cmsg_hdr *hdr;
+
+ hdr = (struct cmsg_hdr *)skb->data;
+
+ return hdr->type;
+}
+
static unsigned int nfp_bpf_cmsg_get_tag(struct sk_buff *skb)
{
struct cmsg_hdr *hdr;
@@ -431,6 +440,11 @@ void nfp_bpf_ctrl_msg_rx(struct nfp_app *app, struct sk_buff *skb)
goto err_free;
}
+ if (nfp_bpf_cmsg_get_type(skb) == CMSG_TYPE_BPF_EVENT) {
+ nfp_bpf_event_output(bpf, skb);
+ return;
+ }
+
nfp_ctrl_lock(bpf->app->ctrl);
tag = nfp_bpf_cmsg_get_tag(skb);
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/fw.h b/drivers/net/ethernet/netronome/nfp/bpf/fw.h
index 39639ac28b01..3dbc21653ce5 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/fw.h
+++ b/drivers/net/ethernet/netronome/nfp/bpf/fw.h
@@ -1,5 +1,5 @@
/*
- * Copyright (C) 2017 Netronome Systems, Inc.
+ * Copyright (C) 2017-2018 Netronome Systems, Inc.
*
* This software is dual licensed under the GNU General License Version 2,
* June 1991 as shown in the file COPYING in the top-level directory of this
@@ -37,6 +37,14 @@
#include <linux/bitops.h>
#include <linux/types.h>
+/* Kernel's enum bpf_reg_type is not uABI so people may change it breaking
+ * our FW ABI. In that case we will do translation in the driver.
+ */
+#define NFP_BPF_SCALAR_VALUE 1
+#define NFP_BPF_MAP_VALUE 4
+#define NFP_BPF_STACK 6
+#define NFP_BPF_PACKET_DATA 8
+
enum bpf_cap_tlv_type {
NFP_BPF_CAP_TYPE_FUNC = 1,
NFP_BPF_CAP_TYPE_ADJUST_HEAD = 2,
@@ -81,6 +89,7 @@ enum nfp_bpf_cmsg_type {
CMSG_TYPE_MAP_DELETE = 5,
CMSG_TYPE_MAP_GETNEXT = 6,
CMSG_TYPE_MAP_GETFIRST = 7,
+ CMSG_TYPE_BPF_EVENT = 8,
__CMSG_TYPE_MAP_MAX,
};
@@ -155,4 +164,13 @@ struct cmsg_reply_map_op {
__be32 resv;
struct cmsg_key_value_pair elem[0];
};
+
+struct cmsg_bpf_event {
+ struct cmsg_hdr hdr;
+ __be32 cpu_id;
+ __be64 map_ptr;
+ __be32 data_size;
+ __be32 pkt_size;
+ u8 data[0];
+};
#endif
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/jit.c b/drivers/net/ethernet/netronome/nfp/bpf/jit.c
index 65f0791cae0c..2127cf1548dd 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/jit.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/jit.c
@@ -1,5 +1,5 @@
/*
- * Copyright (C) 2016-2017 Netronome Systems, Inc.
+ * Copyright (C) 2016-2018 Netronome Systems, Inc.
*
* This software is dual licensed under the GNU General License Version 2,
* June 1991 as shown in the file COPYING in the top-level directory of this
@@ -1456,6 +1456,31 @@ nfp_get_prandom_u32(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
return 0;
}
+static int
+nfp_perf_event_output(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
+{
+ swreg ptr_type;
+ u32 ret_tgt;
+
+ ptr_type = ur_load_imm_any(nfp_prog, meta->arg1.type, imm_a(nfp_prog));
+
+ ret_tgt = nfp_prog_current_offset(nfp_prog) + 3;
+
+ emit_br_relo(nfp_prog, BR_UNC, BR_OFF_RELO + meta->func_id,
+ 2, RELO_BR_HELPER);
+
+ /* Load ptr type into A1 */
+ wrp_mov(nfp_prog, reg_a(1), ptr_type);
+
+ /* Load the return address into B0 */
+ wrp_immed_relo(nfp_prog, reg_b(0), ret_tgt, RELO_IMMED_REL);
+
+ if (!nfp_prog_confirm_current_offset(nfp_prog, ret_tgt))
+ return -EINVAL;
+
+ return 0;
+}
+
/* --- Callbacks --- */
static int mov_reg64(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
{
@@ -2411,6 +2436,8 @@ static int call(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
return map_call_stack_common(nfp_prog, meta);
case BPF_FUNC_get_prandom_u32:
return nfp_get_prandom_u32(nfp_prog, meta);
+ case BPF_FUNC_perf_event_output:
+ return nfp_perf_event_output(nfp_prog, meta);
default:
WARN_ONCE(1, "verifier allowed unsupported function\n");
return -EOPNOTSUPP;
@@ -3353,6 +3380,9 @@ void *nfp_bpf_relo_for_vnic(struct nfp_prog *nfp_prog, struct nfp_bpf_vnic *bv)
case BPF_FUNC_map_delete_elem:
val = nfp_prog->bpf->helpers.map_delete;
break;
+ case BPF_FUNC_perf_event_output:
+ val = nfp_prog->bpf->helpers.perf_event_output;
+ break;
default:
pr_err("relocation of unknown helper %d\n",
val);
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/main.c b/drivers/net/ethernet/netronome/nfp/bpf/main.c
index f65df341c557..d72f9e7f42da 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/main.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/main.c
@@ -298,6 +298,9 @@ nfp_bpf_parse_cap_func(struct nfp_app_bpf *bpf, void __iomem *value, u32 length)
case BPF_FUNC_map_delete_elem:
bpf->helpers.map_delete = readl(&cap->func_addr);
break;
+ case BPF_FUNC_perf_event_output:
+ bpf->helpers.perf_event_output = readl(&cap->func_addr);
+ break;
}
return 0;
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/main.h b/drivers/net/ethernet/netronome/nfp/bpf/main.h
index 5db6284a44ba..82682378d57f 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/main.h
+++ b/drivers/net/ethernet/netronome/nfp/bpf/main.h
@@ -136,6 +136,7 @@ enum pkt_vec {
* @helpers.map_lookup: map lookup helper address
* @helpers.map_update: map update helper address
* @helpers.map_delete: map delete helper address
+ * @helpers.perf_event_output: output perf event to a ring buffer
*
* @pseudo_random: FW initialized the pseudo-random machinery (CSRs)
*/
@@ -176,6 +177,7 @@ struct nfp_app_bpf {
u32 map_lookup;
u32 map_update;
u32 map_delete;
+ u32 perf_event_output;
} helpers;
bool pseudo_random;
@@ -458,5 +460,7 @@ int nfp_bpf_ctrl_lookup_entry(struct bpf_offloaded_map *offmap,
int nfp_bpf_ctrl_getnext_entry(struct bpf_offloaded_map *offmap,
void *key, void *next_key);
+int nfp_bpf_event_output(struct nfp_app_bpf *bpf, struct sk_buff *skb);
+
void nfp_bpf_ctrl_msg_rx(struct nfp_app *app, struct sk_buff *skb);
#endif
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/offload.c b/drivers/net/ethernet/netronome/nfp/bpf/offload.c
index e3ead906cc60..4db0ac1e42a8 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/offload.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/offload.c
@@ -441,6 +441,53 @@ int nfp_ndo_bpf(struct nfp_app *app, struct nfp_net *nn, struct netdev_bpf *bpf)
}
}
+static unsigned long
+nfp_bpf_perf_event_copy(void *dst, const void *src,
+ unsigned long off, unsigned long len)
+{
+ memcpy(dst, src + off, len);
+ return 0;
+}
+
+int nfp_bpf_event_output(struct nfp_app_bpf *bpf, struct sk_buff *skb)
+{
+ struct cmsg_bpf_event *cbe = (void *)skb->data;
+ u32 pkt_size, data_size;
+ struct bpf_map *map;
+
+ if (skb->len < sizeof(struct cmsg_bpf_event))
+ goto err_drop;
+
+ pkt_size = be32_to_cpu(cbe->pkt_size);
+ data_size = be32_to_cpu(cbe->data_size);
+ map = (void *)(unsigned long)be64_to_cpu(cbe->map_ptr);
+
+ if (skb->len < sizeof(struct cmsg_bpf_event) + pkt_size + data_size)
+ goto err_drop;
+ if (cbe->hdr.ver != CMSG_MAP_ABI_VERSION)
+ goto err_drop;
+
+ rcu_read_lock();
+ if (!rhashtable_lookup_fast(&bpf->maps_neutral, &map,
+ nfp_bpf_maps_neutral_params)) {
+ rcu_read_unlock();
+ pr_warn("perf event: dest map pointer %px not recognized, dropping event\n",
+ map);
+ goto err_drop;
+ }
+
+ bpf_event_output(map, be32_to_cpu(cbe->cpu_id),
+ &cbe->data[round_up(pkt_size, 4)], data_size,
+ cbe->data, pkt_size, nfp_bpf_perf_event_copy);
+ rcu_read_unlock();
+
+ dev_consume_skb_any(skb);
+ return 0;
+err_drop:
+ dev_kfree_skb_any(skb);
+ return -EINVAL;
+}
+
static int
nfp_net_bpf_load(struct nfp_net *nn, struct bpf_prog *prog,
struct netlink_ext_ack *extack)
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/verifier.c b/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
index 06ad53ce4ad9..c76b622743ae 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
@@ -1,5 +1,5 @@
/*
- * Copyright (C) 2016-2017 Netronome Systems, Inc.
+ * Copyright (C) 2016-2018 Netronome Systems, Inc.
*
* This software is dual licensed under the GNU General License Version 2,
* June 1991 as shown in the file COPYING in the top-level directory of this
@@ -36,6 +36,8 @@
#include <linux/kernel.h>
#include <linux/pkt_cls.h>
+#include "../nfp_app.h"
+#include "../nfp_main.h"
#include "fw.h"
#include "main.h"
@@ -216,6 +218,71 @@ nfp_bpf_check_call(struct nfp_prog *nfp_prog, struct bpf_verifier_env *env,
pr_vlog(env, "bpf_get_prandom_u32(): FW doesn't support random number generation\n");
return -EOPNOTSUPP;
+ case BPF_FUNC_perf_event_output:
+ BUILD_BUG_ON(NFP_BPF_SCALAR_VALUE != SCALAR_VALUE ||
+ NFP_BPF_MAP_VALUE != PTR_TO_MAP_VALUE ||
+ NFP_BPF_STACK != PTR_TO_STACK ||
+ NFP_BPF_PACKET_DATA != PTR_TO_PACKET);
+
+ if (!bpf->helpers.perf_event_output) {
+ pr_vlog(env, "event_output: not supported by FW\n");
+ return -EOPNOTSUPP;
+ }
+
+ /* Force current CPU to make sure we can report the event
+ * wherever we get the control message from FW.
+ */
+ if (reg3->var_off.mask & BPF_F_INDEX_MASK ||
+ (reg3->var_off.value & BPF_F_INDEX_MASK) !=
+ BPF_F_CURRENT_CPU) {
+ char tn_buf[48];
+
+ tnum_strn(tn_buf, sizeof(tn_buf), reg3->var_off);
+ pr_vlog(env, "event_output: must use BPF_F_CURRENT_CPU, var_off: %s\n",
+ tn_buf);
+ return -EOPNOTSUPP;
+ }
+
+ /* Save space in meta, we don't care about arguments other
+ * than 4th meta, shove it into arg1.
+ */
+ reg1 = cur_regs(env) + BPF_REG_4;
+
+ if (reg1->type != SCALAR_VALUE /* NULL ptr */ &&
+ reg1->type != PTR_TO_STACK &&
+ reg1->type != PTR_TO_MAP_VALUE &&
+ reg1->type != PTR_TO_PACKET) {
+ pr_vlog(env, "event_output: unsupported ptr type: %d\n",
+ reg1->type);
+ return -EOPNOTSUPP;
+ }
+
+ if (reg1->type == PTR_TO_STACK &&
+ !nfp_bpf_stack_arg_ok("event_output", env, reg1, NULL))
+ return -EOPNOTSUPP;
+
+ /* Warn user that on offload NFP may return success even if map
+ * is not going to accept the event, since the event output is
+ * fully async and device won't know the state of the map.
+ * There is also FW limitation on the event length.
+ *
+ * Lost events will not show up on the perf ring, driver
+ * won't see them at all. Events may also get reordered.
+ */
+ dev_warn_once(&nfp_prog->bpf->app->pf->pdev->dev,
+ "bpf: note: return codes and behavior of bpf_event_output() helper differs for offloaded programs!\n");
+ pr_vlog(env, "warning: return codes and behavior of event_output helper differ for offload!\n");
+
+ if (!meta->func_id)
+ break;
+
+ if (reg1->type != meta->arg1.type) {
+ pr_vlog(env, "event_output: ptr type changed: %d %d\n",
+ meta->arg1.type, reg1->type);
+ return -EINVAL;
+ }
+ break;
+
default:
pr_vlog(env, "unsupported function id: %d\n", func_id);
return -EOPNOTSUPP;
--
2.17.0
^ permalink raw reply related
* [PATCH bpf-next 04/10] bpf: replace map pointer loads before calling into offloads
From: Jakub Kicinski @ 2018-05-04 1:37 UTC (permalink / raw)
To: alexei.starovoitov, daniel; +Cc: oss-drivers, netdev, Jakub Kicinski
In-Reply-To: <20180504013717.29317-1-jakub.kicinski@netronome.com>
Offloads may find host map pointers more useful than map fds.
Map pointers can be used to identify the map, while fds are
only valid within the context of loading process.
Jump to skip_full_check on error in case verifier log overflow
has to be handled (replace_map_fd_with_map_ptr() prints to the
log, driver prep may do that too in the future).
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
---
kernel/bpf/verifier.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 712d8655e916..23ec9efeb91d 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -5712,16 +5712,16 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr)
if (!IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS))
env->strict_alignment = true;
+ ret = replace_map_fd_with_map_ptr(env);
+ if (ret < 0)
+ goto skip_full_check;
+
if (bpf_prog_is_dev_bound(env->prog->aux)) {
ret = bpf_prog_offload_verifier_prep(env);
if (ret)
- goto err_unlock;
+ goto skip_full_check;
}
- ret = replace_map_fd_with_map_ptr(env);
- if (ret < 0)
- goto skip_full_check;
-
env->explored_states = kcalloc(env->prog->len,
sizeof(struct bpf_verifier_state_list *),
GFP_USER);
--
2.17.0
^ permalink raw reply related
* [PATCH bpf-next 03/10] bpf: export bpf_event_output()
From: Jakub Kicinski @ 2018-05-04 1:37 UTC (permalink / raw)
To: alexei.starovoitov, daniel; +Cc: oss-drivers, netdev, Jakub Kicinski
In-Reply-To: <20180504013717.29317-1-jakub.kicinski@netronome.com>
bpf_event_output() is useful for offloads to add events to BPF
event rings, export it. Note that export is placed near the stub
since tracing is optional and kernel/bpf/core.c is always going
to be built.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
---
kernel/bpf/core.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 90feeba3a1a1..ab9bf1213b2e 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -1799,6 +1799,7 @@ bpf_event_output(struct bpf_map *map, u64 flags, void *meta, u64 meta_size,
{
return -ENOTSUPP;
}
+EXPORT_SYMBOL_GPL(bpf_event_output);
/* Always built-in helper functions. */
const struct bpf_func_proto bpf_tail_call_proto = {
--
2.17.0
^ permalink raw reply related
* [PATCH bpf-next 02/10] nfp: bpf: record offload neutral maps in the driver
From: Jakub Kicinski @ 2018-05-04 1:37 UTC (permalink / raw)
To: alexei.starovoitov, daniel; +Cc: oss-drivers, netdev, Jakub Kicinski
In-Reply-To: <20180504013717.29317-1-jakub.kicinski@netronome.com>
For asynchronous events originating from the device, like perf event
output, we need to be able to make sure that objects being referred
to by the FW message are valid on the host. FW events can get queued
and reordered. Even if we had a FW message "barrier" we should still
protect ourselves from bogus FW output.
Add a reverse-mapping hash table and record in it all raw map pointers
FW may refer to. Only record neutral maps, i.e. perf event arrays.
These are currently the only objects FW can refer to. Use RCU protection
on the read side, update side is under RTNL.
Since program vs map destruction order is slightly painful for offload
simply take an extra reference on all the recorded maps to make sure
they don't disappear.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
---
drivers/net/ethernet/netronome/nfp/bpf/main.c | 25 +++-
drivers/net/ethernet/netronome/nfp/bpf/main.h | 20 ++-
.../net/ethernet/netronome/nfp/bpf/offload.c | 125 +++++++++++++++++-
drivers/net/ethernet/netronome/nfp/nfp_app.c | 2 +-
kernel/bpf/syscall.c | 2 +
5 files changed, 168 insertions(+), 6 deletions(-)
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/main.c b/drivers/net/ethernet/netronome/nfp/bpf/main.c
index 1dc424685f4e..f65df341c557 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/main.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/main.c
@@ -1,5 +1,5 @@
/*
- * Copyright (C) 2017 Netronome Systems, Inc.
+ * Copyright (C) 2017-2018 Netronome Systems, Inc.
*
* This software is dual licensed under the GNU General License Version 2,
* June 1991 as shown in the file COPYING in the top-level directory of this
@@ -43,6 +43,14 @@
#include "fw.h"
#include "main.h"
+const struct rhashtable_params nfp_bpf_maps_neutral_params = {
+ .nelem_hint = 4,
+ .key_len = FIELD_SIZEOF(struct nfp_bpf_neutral_map, ptr),
+ .key_offset = offsetof(struct nfp_bpf_neutral_map, ptr),
+ .head_offset = offsetof(struct nfp_bpf_neutral_map, l),
+ .automatic_shrinking = true,
+};
+
static bool nfp_net_ebpf_capable(struct nfp_net *nn)
{
#ifdef __LITTLE_ENDIAN
@@ -401,17 +409,28 @@ static int nfp_bpf_init(struct nfp_app *app)
init_waitqueue_head(&bpf->cmsg_wq);
INIT_LIST_HEAD(&bpf->map_list);
- err = nfp_bpf_parse_capabilities(app);
+ err = rhashtable_init(&bpf->maps_neutral, &nfp_bpf_maps_neutral_params);
if (err)
goto err_free_bpf;
+ err = nfp_bpf_parse_capabilities(app);
+ if (err)
+ goto err_free_neutral_maps;
+
return 0;
+err_free_neutral_maps:
+ rhashtable_destroy(&bpf->maps_neutral);
err_free_bpf:
kfree(bpf);
return err;
}
+static void nfp_check_rhashtable_empty(void *ptr, void *arg)
+{
+ WARN_ON_ONCE(1);
+}
+
static void nfp_bpf_clean(struct nfp_app *app)
{
struct nfp_app_bpf *bpf = app->priv;
@@ -419,6 +438,8 @@ static void nfp_bpf_clean(struct nfp_app *app)
WARN_ON(!skb_queue_empty(&bpf->cmsg_replies));
WARN_ON(!list_empty(&bpf->map_list));
WARN_ON(bpf->maps_in_use || bpf->map_elems_in_use);
+ rhashtable_free_and_destroy(&bpf->maps_neutral,
+ nfp_check_rhashtable_empty, NULL);
kfree(bpf);
}
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/main.h b/drivers/net/ethernet/netronome/nfp/bpf/main.h
index 68b5d326483d..5db6284a44ba 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/main.h
+++ b/drivers/net/ethernet/netronome/nfp/bpf/main.h
@@ -1,5 +1,5 @@
/*
- * Copyright (C) 2016-2017 Netronome Systems, Inc.
+ * Copyright (C) 2016-2018 Netronome Systems, Inc.
*
* This software is dual licensed under the GNU General License Version 2,
* June 1991 as shown in the file COPYING in the top-level directory of this
@@ -39,6 +39,7 @@
#include <linux/bpf_verifier.h>
#include <linux/kernel.h>
#include <linux/list.h>
+#include <linux/rhashtable.h>
#include <linux/skbuff.h>
#include <linux/types.h>
#include <linux/wait.h>
@@ -114,6 +115,8 @@ enum pkt_vec {
* @maps_in_use: number of currently offloaded maps
* @map_elems_in_use: number of elements allocated to offloaded maps
*
+ * @maps_neutral: hash table of offload-neutral maps (on pointer)
+ *
* @adjust_head: adjust head capability
* @adjust_head.flags: extra flags for adjust head
* @adjust_head.off_min: minimal packet offset within buffer required
@@ -150,6 +153,8 @@ struct nfp_app_bpf {
unsigned int maps_in_use;
unsigned int map_elems_in_use;
+ struct rhashtable maps_neutral;
+
struct nfp_bpf_cap_adjust_head {
u32 flags;
int off_min;
@@ -199,6 +204,14 @@ struct nfp_bpf_map {
enum nfp_bpf_map_use use_map[];
};
+struct nfp_bpf_neutral_map {
+ struct rhash_head l;
+ struct bpf_map *ptr;
+ u32 count;
+};
+
+extern const struct rhashtable_params nfp_bpf_maps_neutral_params;
+
struct nfp_prog;
struct nfp_insn_meta;
typedef int (*instr_cb_t)(struct nfp_prog *, struct nfp_insn_meta *);
@@ -367,6 +380,8 @@ static inline bool is_mbpf_xadd(const struct nfp_insn_meta *meta)
* @error: error code if something went wrong
* @stack_depth: max stack depth from the verifier
* @adjust_head_location: if program has single adjust head call - the insn no.
+ * @map_records_cnt: the number of map pointers recorded for this prog
+ * @map_records: the map record pointers from bpf->maps_neutral
* @insns: list of BPF instruction wrappers (struct nfp_insn_meta)
*/
struct nfp_prog {
@@ -390,6 +405,9 @@ struct nfp_prog {
unsigned int stack_depth;
unsigned int adjust_head_location;
+ unsigned int map_records_cnt;
+ struct nfp_bpf_neutral_map **map_records;
+
struct list_head insns;
};
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/offload.c b/drivers/net/ethernet/netronome/nfp/bpf/offload.c
index 42d98792bd25..e3ead906cc60 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/offload.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/offload.c
@@ -1,5 +1,5 @@
/*
- * Copyright (C) 2016-2017 Netronome Systems, Inc.
+ * Copyright (C) 2016-2018 Netronome Systems, Inc.
*
* This software is dual licensed under the GNU General License Version 2,
* June 1991 as shown in the file COPYING in the top-level directory of this
@@ -56,6 +56,126 @@
#include "../nfp_net_ctrl.h"
#include "../nfp_net.h"
+static int
+nfp_map_ptr_record(struct nfp_app_bpf *bpf, struct nfp_prog *nfp_prog,
+ struct bpf_map *map)
+{
+ struct nfp_bpf_neutral_map *record;
+ int err;
+
+ /* Map record paths are entered via ndo, update side is protected. */
+ ASSERT_RTNL();
+
+ /* Reuse path - other offloaded program is already tracking this map. */
+ record = rhashtable_lookup_fast(&bpf->maps_neutral, &map,
+ nfp_bpf_maps_neutral_params);
+ if (record) {
+ nfp_prog->map_records[nfp_prog->map_records_cnt++] = record;
+ record->count++;
+ return 0;
+ }
+
+ /* Grab a single ref to the map for our record. The prog destroy ndo
+ * happens after free_used_maps().
+ */
+ map = bpf_map_inc(map, false);
+ if (IS_ERR(map))
+ return PTR_ERR(map);
+
+ record = kmalloc(sizeof(*record), GFP_KERNEL);
+ if (!record) {
+ err = -ENOMEM;
+ goto err_map_put;
+ }
+
+ record->ptr = map;
+ record->count = 1;
+
+ err = rhashtable_insert_fast(&bpf->maps_neutral, &record->l,
+ nfp_bpf_maps_neutral_params);
+ if (err)
+ goto err_free_rec;
+
+ nfp_prog->map_records[nfp_prog->map_records_cnt++] = record;
+
+ return 0;
+
+err_free_rec:
+ kfree(record);
+err_map_put:
+ bpf_map_put(map);
+ return err;
+}
+
+static void
+nfp_map_ptrs_forget(struct nfp_app_bpf *bpf, struct nfp_prog *nfp_prog)
+{
+ bool freed = false;
+ int i;
+
+ ASSERT_RTNL();
+
+ for (i = 0; i < nfp_prog->map_records_cnt; i++) {
+ if (--nfp_prog->map_records[i]->count) {
+ nfp_prog->map_records[i] = NULL;
+ continue;
+ }
+
+ WARN_ON(rhashtable_remove_fast(&bpf->maps_neutral,
+ &nfp_prog->map_records[i]->l,
+ nfp_bpf_maps_neutral_params));
+ freed = true;
+ }
+
+ if (freed) {
+ synchronize_rcu();
+
+ for (i = 0; i < nfp_prog->map_records_cnt; i++)
+ if (nfp_prog->map_records[i]) {
+ bpf_map_put(nfp_prog->map_records[i]->ptr);
+ kfree(nfp_prog->map_records[i]);
+ }
+ }
+
+ kfree(nfp_prog->map_records);
+ nfp_prog->map_records = NULL;
+ nfp_prog->map_records_cnt = 0;
+}
+
+static int
+nfp_map_ptrs_record(struct nfp_app_bpf *bpf, struct nfp_prog *nfp_prog,
+ struct bpf_prog *prog)
+{
+ int i, cnt, err;
+
+ /* Quickly count the maps we will have to remember */
+ cnt = 0;
+ for (i = 0; i < prog->aux->used_map_cnt; i++)
+ if (bpf_map_offload_neutral(prog->aux->used_maps[i]))
+ cnt++;
+ if (!cnt)
+ return 0;
+
+ nfp_prog->map_records = kmalloc_array(cnt,
+ sizeof(nfp_prog->map_records[0]),
+ GFP_KERNEL);
+ if (!nfp_prog->map_records)
+ return -ENOMEM;
+
+ for (i = 0; i < prog->aux->used_map_cnt; i++)
+ if (bpf_map_offload_neutral(prog->aux->used_maps[i])) {
+ err = nfp_map_ptr_record(bpf, nfp_prog,
+ prog->aux->used_maps[i]);
+ if (err) {
+ nfp_map_ptrs_forget(bpf, nfp_prog);
+ return err;
+ }
+ }
+ WARN_ON(cnt != nfp_prog->map_records_cnt);
+
+ return 0;
+}
+
static int
nfp_prog_prepare(struct nfp_prog *nfp_prog, const struct bpf_insn *prog,
unsigned int cnt)
@@ -151,7 +271,7 @@ static int nfp_bpf_translate(struct nfp_net *nn, struct bpf_prog *prog)
prog->aux->offload->jited_len = nfp_prog->prog_len * sizeof(u64);
prog->aux->offload->jited_image = nfp_prog->prog;
- return 0;
+ return nfp_map_ptrs_record(nfp_prog->bpf, nfp_prog, prog);
}
static int nfp_bpf_destroy(struct nfp_net *nn, struct bpf_prog *prog)
@@ -159,6 +279,7 @@ static int nfp_bpf_destroy(struct nfp_net *nn, struct bpf_prog *prog)
struct nfp_prog *nfp_prog = prog->aux->offload->dev_priv;
kvfree(nfp_prog->prog);
+ nfp_map_ptrs_forget(nfp_prog->bpf, nfp_prog);
nfp_prog_free(nfp_prog);
return 0;
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_app.c b/drivers/net/ethernet/netronome/nfp/nfp_app.c
index 6aedef0ad433..0e0253c7e17b 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_app.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_app.c
@@ -1,5 +1,5 @@
/*
- * Copyright (C) 2017 Netronome Systems, Inc.
+ * Copyright (C) 2017-2018 Netronome Systems, Inc.
*
* This software is dual licensed under the GNU General License Version 2,
* June 1991 as shown in the file COPYING in the top-level directory of this
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 263e13ede029..9b87198deea2 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -282,6 +282,7 @@ void bpf_map_put(struct bpf_map *map)
{
__bpf_map_put(map, true);
}
+EXPORT_SYMBOL_GPL(bpf_map_put);
void bpf_map_put_with_uref(struct bpf_map *map)
{
@@ -543,6 +544,7 @@ struct bpf_map *bpf_map_inc(struct bpf_map *map, bool uref)
atomic_inc(&map->usercnt);
return map;
}
+EXPORT_SYMBOL_GPL(bpf_map_inc);
struct bpf_map *bpf_map_get_with_uref(u32 ufd)
{
--
2.17.0
^ permalink raw reply related
* [PATCH bpf-next 01/10] bpf: offload: allow offloaded programs to use perf event arrays
From: Jakub Kicinski @ 2018-05-04 1:37 UTC (permalink / raw)
To: alexei.starovoitov, daniel; +Cc: oss-drivers, netdev, Jakub Kicinski
In-Reply-To: <20180504013717.29317-1-jakub.kicinski@netronome.com>
BPF_MAP_TYPE_PERF_EVENT_ARRAY is special as far as offload goes.
The map only holds glue to perf ring, not actual data. Allow
non-offloaded perf event arrays to be used in offloaded programs.
Offload driver can extract the events from HW and put them in
the map for user space to retrieve.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
---
include/linux/bpf.h | 5 +++++
kernel/bpf/offload.c | 6 ++++--
2 files changed, 9 insertions(+), 2 deletions(-)
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index c553f6f9c6b0..9d68bb665cb2 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -110,6 +110,11 @@ static inline struct bpf_offloaded_map *map_to_offmap(struct bpf_map *map)
return container_of(map, struct bpf_offloaded_map, map);
}
+static inline bool bpf_map_offload_neutral(const struct bpf_map *map)
+{
+ return map->map_type == BPF_MAP_TYPE_PERF_EVENT_ARRAY;
+}
+
static inline bool bpf_map_support_seq_show(const struct bpf_map *map)
{
return map->ops->map_seq_show_elem && map->ops->map_check_btf;
diff --git a/kernel/bpf/offload.c b/kernel/bpf/offload.c
index c9401075b58c..ac747d5cf7c6 100644
--- a/kernel/bpf/offload.c
+++ b/kernel/bpf/offload.c
@@ -1,5 +1,5 @@
/*
- * Copyright (C) 2017 Netronome Systems, Inc.
+ * Copyright (C) 2017-2018 Netronome Systems, Inc.
*
* This software is licensed under the GNU General License Version 2,
* June 1991 as shown in the file COPYING in the top-level directory of this
@@ -474,8 +474,10 @@ bool bpf_offload_dev_match(struct bpf_prog *prog, struct bpf_map *map)
struct bpf_prog_offload *offload;
bool ret;
- if (!bpf_prog_is_dev_bound(prog->aux) || !bpf_map_is_dev_bound(map))
+ if (!bpf_prog_is_dev_bound(prog->aux))
return false;
+ if (!bpf_map_is_dev_bound(map))
+ return bpf_map_offload_neutral(map);
down_read(&bpf_devs_lock);
offload = prog->aux->offload;
--
2.17.0
^ permalink raw reply related
* [PATCH bpf-next 00/10] bpf: support offload of bpf_event_output()
From: Jakub Kicinski @ 2018-05-04 1:37 UTC (permalink / raw)
To: alexei.starovoitov, daniel; +Cc: oss-drivers, netdev, Jakub Kicinski
Hi!
This series centres on NFP offload of bpf_event_output(). The
first patch allows perf event arrays to be used by offloaded
programs. Next patch makes the nfp driver keep track of such
arrays to be able to filter FW events referring to maps.
Perf event arrays are not device bound. Having driver
reimplement and manage the perf array seems brittle and unnecessary.
Patch 4 moves slightly the verifier step which replaces map fds
with map pointers. This is useful for nfp JIT since we can then
easily replace host pointers with NFP table ids (patch 6). This
allows us to lift the limitation on map helpers having to be used
with the same map pointer on all paths. Second use of replacing
fds with real host map pointers is that we can use the host map
pointer as a key for FW events in perf event array offload.
Patch 5 adds perf event output offload support for the NFP.
There are some differences between bpf_event_output() offloaded
and non-offloaded version. The FW messages which carry events
may get dropped and reordered relatively easily. The return codes
from the helper are also not guaranteed to match the host. Users
are warned about some of those discrepancies with a one time
warning message to kernel logs.
bpftool gains an ability to dump perf ring events in a very simple
format. This was very useful for testing and simple debug, maybe
it will be useful to others?
Last patch is a trivial comment fix.
Jakub Kicinski (10):
bpf: offload: allow offloaded programs to use perf event arrays
nfp: bpf: record offload neutral maps in the driver
bpf: export bpf_event_output()
bpf: replace map pointer loads before calling into offloads
nfp: bpf: perf event output helpers support
nfp: bpf: rewrite map pointers with NFP TIDs
tools: bpftool: fold hex keyword in command help
tools: bpftool: move get_possible_cpus() to common code
tools: bpftool: add simple perf event output reader
bpf: fix references to free_bpf_prog_info() in comments
drivers/net/ethernet/netronome/nfp/bpf/cmsg.c | 16 +-
drivers/net/ethernet/netronome/nfp/bpf/fw.h | 20 +-
drivers/net/ethernet/netronome/nfp/bpf/jit.c | 76 +++-
drivers/net/ethernet/netronome/nfp/bpf/main.c | 28 +-
drivers/net/ethernet/netronome/nfp/bpf/main.h | 24 +-
.../net/ethernet/netronome/nfp/bpf/offload.c | 172 ++++++++-
.../net/ethernet/netronome/nfp/bpf/verifier.c | 74 +++-
drivers/net/ethernet/netronome/nfp/nfp_app.c | 2 +-
include/linux/bpf.h | 5 +
kernel/bpf/core.c | 1 +
kernel/bpf/offload.c | 6 +-
kernel/bpf/syscall.c | 2 +
kernel/bpf/verifier.c | 14 +-
.../bpf/bpftool/Documentation/bpftool-map.rst | 40 +-
tools/bpf/bpftool/Documentation/bpftool.rst | 2 +-
tools/bpf/bpftool/Makefile | 7 +-
tools/bpf/bpftool/bash-completion/bpftool | 36 +-
tools/bpf/bpftool/common.c | 77 +++-
tools/bpf/bpftool/main.h | 7 +-
tools/bpf/bpftool/map.c | 80 +---
tools/bpf/bpftool/map_perf_ring.c | 347 ++++++++++++++++++
21 files changed, 912 insertions(+), 124 deletions(-)
create mode 100644 tools/bpf/bpftool/map_perf_ring.c
--
2.17.0
^ permalink raw reply
* [PATCHv2] net/xfrm: Revert "[XFRM]: Do not add a state whose SPI is zero to the SPI hash."
From: Dmitry Safonov @ 2018-05-04 1:20 UTC (permalink / raw)
To: linux-kernel
Cc: 0x7f454c46, Dmitry Safonov, Herbert Xu, Masahide NAKAMURA,
YOSHIFUJI Hideaki, Steffen Klassert, David S. Miller, netdev
This reverts commit 7b4dc3600e48 ("[XFRM]: Do not add a state whose SPI
is zero to the SPI hash.").
Zero SPI is legal and defined for IPcomp.
We shouldn't omit adding the state to SPI hash because it'll not be
possible to delete or lookup for it afterward:
__xfrm_state_insert() obviously doesn't add hash for zero
SPI in xfrm.state_byspi, and xfrm_user_state_lookup() will fail as
xfrm_state_lookup() does lookups by hash.
It also isn't possible to workaround from userspace as
xfrm_id_proto_match() will be always true for ah/esp/comp protos.
v1 link: https://lkml.kernel.org/r/<20180502020220.2027-1-dima@arista.com>
Cc: Masahide NAKAMURA <nakam@linux-ipv6.org>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Suggested-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Dmitry Safonov <dima@arista.com>
---
net/xfrm/xfrm_state.c | 39 +++++++++++++++------------------------
1 file changed, 15 insertions(+), 24 deletions(-)
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index f9d2f2233f09..03afe5423448 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -97,12 +97,9 @@ static void xfrm_hash_transfer(struct hlist_head *list,
nhashmask);
hlist_add_head_rcu(&x->bysrc, nsrctable + h);
- if (x->id.spi) {
- h = __xfrm_spi_hash(&x->id.daddr, x->id.spi,
- x->id.proto, x->props.family,
- nhashmask);
- hlist_add_head_rcu(&x->byspi, nspitable + h);
- }
+ h = __xfrm_spi_hash(&x->id.daddr, x->id.spi, x->id.proto,
+ x->props.family, nhashmask);
+ hlist_add_head_rcu(&x->byspi, nspitable + h);
}
}
@@ -613,8 +610,7 @@ int __xfrm_state_delete(struct xfrm_state *x)
list_del(&x->km.all);
hlist_del_rcu(&x->bydst);
hlist_del_rcu(&x->bysrc);
- if (x->id.spi)
- hlist_del_rcu(&x->byspi);
+ hlist_del_rcu(&x->byspi);
net->xfrm.state_num--;
spin_unlock(&net->xfrm.xfrm_state_lock);
@@ -958,7 +954,7 @@ xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr,
xfrm_state_addr_check(x, daddr, saddr, encap_family) &&
tmpl->mode == x->props.mode &&
tmpl->id.proto == x->id.proto &&
- (tmpl->id.spi == x->id.spi || !tmpl->id.spi))
+ tmpl->id.spi == x->id.spi)
xfrm_state_look_at(pol, x, fl, encap_family,
&best, &acquire_in_progress, &error);
}
@@ -974,7 +970,7 @@ xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr,
xfrm_addr_equal(&x->id.daddr, daddr, encap_family) &&
tmpl->mode == x->props.mode &&
tmpl->id.proto == x->id.proto &&
- (tmpl->id.spi == x->id.spi || !tmpl->id.spi))
+ tmpl->id.spi == x->id.spi)
xfrm_state_look_at(pol, x, fl, encap_family,
&best, &acquire_in_progress, &error);
}
@@ -982,8 +978,7 @@ xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr,
found:
x = best;
if (!x && !error && !acquire_in_progress) {
- if (tmpl->id.spi &&
- (x0 = __xfrm_state_lookup(net, mark, daddr, tmpl->id.spi,
+ if ((x0 = __xfrm_state_lookup(net, mark, daddr, tmpl->id.spi,
tmpl->id.proto, encap_family)) != NULL) {
to_put = x0;
error = -EEXIST;
@@ -1025,10 +1020,8 @@ xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr,
hlist_add_head_rcu(&x->bydst, net->xfrm.state_bydst + h);
h = xfrm_src_hash(net, daddr, saddr, encap_family);
hlist_add_head_rcu(&x->bysrc, net->xfrm.state_bysrc + h);
- if (x->id.spi) {
- h = xfrm_spi_hash(net, &x->id.daddr, x->id.spi, x->id.proto, encap_family);
- hlist_add_head_rcu(&x->byspi, net->xfrm.state_byspi + h);
- }
+ h = xfrm_spi_hash(net, &x->id.daddr, x->id.spi, x->id.proto, encap_family);
+ hlist_add_head_rcu(&x->byspi, net->xfrm.state_byspi + h);
x->lft.hard_add_expires_seconds = net->xfrm.sysctl_acq_expires;
tasklet_hrtimer_start(&x->mtimer, ktime_set(net->xfrm.sysctl_acq_expires, 0), HRTIMER_MODE_REL);
net->xfrm.state_num++;
@@ -1134,7 +1127,7 @@ static void __xfrm_state_insert(struct xfrm_state *x)
h = xfrm_src_hash(net, &x->id.daddr, &x->props.saddr, x->props.family);
hlist_add_head_rcu(&x->bysrc, net->xfrm.state_bysrc + h);
- if (x->id.spi) {
+ if (xfrm_id_proto_match(x->id.proto, IPSEC_PROTO_ANY)) {
h = xfrm_spi_hash(net, &x->id.daddr, x->id.spi, x->id.proto,
x->props.family);
@@ -1787,14 +1780,12 @@ int xfrm_alloc_spi(struct xfrm_state *x, u32 low, u32 high)
xfrm_state_put(x0);
}
}
- if (x->id.spi) {
- spin_lock_bh(&net->xfrm.xfrm_state_lock);
- h = xfrm_spi_hash(net, &x->id.daddr, x->id.spi, x->id.proto, x->props.family);
- hlist_add_head_rcu(&x->byspi, net->xfrm.state_byspi + h);
- spin_unlock_bh(&net->xfrm.xfrm_state_lock);
+ spin_lock_bh(&net->xfrm.xfrm_state_lock);
+ h = xfrm_spi_hash(net, &x->id.daddr, x->id.spi, x->id.proto, x->props.family);
+ hlist_add_head_rcu(&x->byspi, net->xfrm.state_byspi + h);
+ spin_unlock_bh(&net->xfrm.xfrm_state_lock);
- err = 0;
- }
+ err = 0;
unlock:
spin_unlock_bh(&x->lock);
--
2.13.6
^ permalink raw reply related
* Re: [PATCH net-next 1/4] ipv6: Calculate hash thresholds for IPv6 nexthops
From: David Ahern @ 2018-05-04 1:13 UTC (permalink / raw)
To: Thomas Winter, Ido Schimmel
Cc: Eric Dumazet, davem@davemloft.net, Ido Schimmel,
netdev@vger.kernel.org, roopa@cumulusnetworks.com,
nikolay@cumulusnetworks.com, pch@ordbogen.com, jkbs@redhat.com,
yoshfuji@linux-ipv6.org, mlxsw@mellanox.com
In-Reply-To: <84f1a89f-20f0-a559-8a1b-da1400794f29@gmail.com>
On 5/2/18 2:56 PM, David Ahern wrote:
> On 5/2/18 2:48 PM, Thomas Winter wrote:
>> Should I look at reworking this? It would be great to have these ECMP routes for other purposes.
>
> Looking at my IPv6 bug list this change is on it -- allowing ECMP routes
> to have a device only hop.
>
> Let me take a look at it at the same time as a few other bugs.
>
I see the problem: the multipath code for IPv6 tries to helpful and
auto-determine that a new route can be appended to an existing one --
basically adding another nexthop if it already exists. What it should be
doing is requiring the NLM_F_APPEND to modify an existing route. If the
same prefix and metric comes down and APPEND or REPLACE is not set it
should fail EEXISTS rather than consolidating into an ECMP.
Fixing it to do the right thing will break existing userspace, but as it
stands it prevents dev only nexthops (no gateway) and replace with a
REJECT route ends up adding another route
e.g., ip -6 ro replace unreachable 2001:db8:104::/64
leaves the existing route and adds a new entry which can never be hit:
$ ip -6 ro ls
...
2001:db8:104::/64 via 2001:db8:101::2 dev veth1 metric 1024 pref medium
unreachable 2001:db8:104::/64 dev lo metric 1024 pref medium
...
^ permalink raw reply
* Re: [PATCH net-next] net: stmmac: Add support for U32 TC filter using Flexible RX Parser
From: Jakub Kicinski @ 2018-05-04 0:57 UTC (permalink / raw)
To: Jose Abreu
Cc: netdev, David S. Miller, Joao Pinto, Vitor Soares,
Giuseppe Cavallaro, Alexandre Torgue
In-Reply-To: <9f57f98ef35360619001882fdcf0d7ef0558863f.1525351447.git.joabreu@synopsys.com>
On Thu, 3 May 2018 13:45:30 +0100, Jose Abreu wrote:
> + case TC_SETUP_CLSU32:
> + if (!(priv->dev->hw_features & NETIF_F_HW_TC))
> + ret = -EOPNOTSUPP;
> + else
> + ret = stmmac_tc_setup_cls_u32(priv, priv, type_data);
> + break;
Please just use tc_cls_can_offload_and_chain0(). The extack message it
sets is useful for debugging and you don't support chains other than 0
either.
^ permalink raw reply
* Re: [bpf-next v1 8/9] bpf: Provide helper to do lookups in kernel FIB table
From: David Ahern @ 2018-05-04 0:47 UTC (permalink / raw)
To: Daniel Borkmann, netdev, borkmann, ast
Cc: davem, shm, roopa, brouer, toke, john.fastabend
In-Reply-To: <df3ac0c5-de3c-f47d-2d8b-1b3b42a51187@iogearbox.net>
On 5/3/18 6:45 PM, Daniel Borkmann wrote:
>> + .ret_type = RET_INTEGER,
>> + .arg1_type = ARG_PTR_TO_CTX,
>> + .arg2_type = ARG_PTR_TO_MEM,
>> + .arg3_type = ARG_CONST_SIZE,
>> + .arg4_type = ARG_ANYTHING,
>> +};
>> +
>> +BPF_CALL_4(bpf_skb_fib_lookup, struct sk_buff *, skb,
>> + struct bpf_fib_lookup *, params, int, plen, u32, flags)
>> +{
>> + if (plen < sizeof(*params))
>> + return -EINVAL;
>> +
>> + switch (params->family) {
>> +#if IS_ENABLED(CONFIG_INET)
>> + case AF_INET:
>> + return bpf_ipv4_fib_lookup(dev_net(skb->dev), params, flags);
>> +#endif
>> +#if IS_ENABLED(CONFIG_IPV6)
>> + case AF_INET6:
>> + return bpf_ipv6_fib_lookup(dev_net(skb->dev), params, flags);
>> +#endif
>> + }
>> + return -ENOTSUPP;
>> +}
>> +
>> +static const struct bpf_func_proto bpf_skb_fib_lookup_proto = {
>> + .func = bpf_skb_fib_lookup,
>> + .gpl_only = true,
>> + .pkt_access = true,
>
> ... this should both not be marked as pkt_access = true. What this means is that
> arg2, which is the struct bpf_fib_lookup, could come from the raw packet buffer.
leftover from the first version which did pass in the packet. Will remove.
^ permalink raw reply
* Re: [bpf-next v1 8/9] bpf: Provide helper to do lookups in kernel FIB table
From: Daniel Borkmann @ 2018-05-04 0:45 UTC (permalink / raw)
To: David Ahern, netdev, borkmann, ast
Cc: davem, shm, roopa, brouer, toke, john.fastabend
In-Reply-To: <20180503035319.18290-9-dsahern@gmail.com>
On 05/03/2018 05:53 AM, David Ahern wrote:
[...]
> +
> +BPF_CALL_4(bpf_xdp_fib_lookup, struct xdp_buff *, ctx,
> + struct bpf_fib_lookup *, params, int, plen, u32, flags)
> +{
> + if (plen < sizeof(*params))
> + return -EINVAL;
> +
> + switch (params->family) {
> +#if IS_ENABLED(CONFIG_INET)
> + case AF_INET:
> + return bpf_ipv4_fib_lookup(dev_net(ctx->rxq->dev), params,
> + flags);
> +#endif
> +#if IS_ENABLED(CONFIG_IPV6)
> + case AF_INET6:
> + return bpf_ipv6_fib_lookup(dev_net(ctx->rxq->dev), params,
> + flags);
> +#endif
> + }
> + return -ENOTSUPP;
> +}
> +
> +static const struct bpf_func_proto bpf_xdp_fib_lookup_proto = {
> + .func = bpf_xdp_fib_lookup,
> + .gpl_only = true,
> + .pkt_access = true,
Hmm, this one and ...
> + .ret_type = RET_INTEGER,
> + .arg1_type = ARG_PTR_TO_CTX,
> + .arg2_type = ARG_PTR_TO_MEM,
> + .arg3_type = ARG_CONST_SIZE,
> + .arg4_type = ARG_ANYTHING,
> +};
> +
> +BPF_CALL_4(bpf_skb_fib_lookup, struct sk_buff *, skb,
> + struct bpf_fib_lookup *, params, int, plen, u32, flags)
> +{
> + if (plen < sizeof(*params))
> + return -EINVAL;
> +
> + switch (params->family) {
> +#if IS_ENABLED(CONFIG_INET)
> + case AF_INET:
> + return bpf_ipv4_fib_lookup(dev_net(skb->dev), params, flags);
> +#endif
> +#if IS_ENABLED(CONFIG_IPV6)
> + case AF_INET6:
> + return bpf_ipv6_fib_lookup(dev_net(skb->dev), params, flags);
> +#endif
> + }
> + return -ENOTSUPP;
> +}
> +
> +static const struct bpf_func_proto bpf_skb_fib_lookup_proto = {
> + .func = bpf_skb_fib_lookup,
> + .gpl_only = true,
> + .pkt_access = true,
... this should both not be marked as pkt_access = true. What this means is that
arg2, which is the struct bpf_fib_lookup, could come from the raw packet buffer.
Meaning instead of the struct sitting on stack it could in that case be passed
out of the packet directly. We have this enabled in helpers like csum_diff() or
map lookup/update where it would make sense to pass raw buffer there, but here
it seems unintentional and should be removed instead. See also description in
last paragraph in 36bbef52c7eb ("bpf: direct packet write and access for helpers
for clsact progs").
> + .ret_type = RET_INTEGER,
> + .arg1_type = ARG_PTR_TO_CTX,
> + .arg2_type = ARG_PTR_TO_MEM,
> + .arg3_type = ARG_CONST_SIZE,
> + .arg4_type = ARG_ANYTHING,
> +};
> +
> static const struct bpf_func_proto *
> bpf_base_func_proto(enum bpf_func_id func_id)
> {
> @@ -3933,6 +4192,8 @@ tc_cls_act_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
> case BPF_FUNC_skb_get_xfrm_state:
> return &bpf_skb_get_xfrm_state_proto;
> #endif
> + case BPF_FUNC_fib_lookup:
> + return &bpf_skb_fib_lookup_proto;
> default:
> return bpf_base_func_proto(func_id);
> }
> @@ -3958,6 +4219,8 @@ xdp_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
> return &bpf_xdp_redirect_map_proto;
> case BPF_FUNC_xdp_adjust_tail:
> return &bpf_xdp_adjust_tail_proto;
> + case BPF_FUNC_fib_lookup:
> + return &bpf_xdp_fib_lookup_proto;
> default:
> return bpf_base_func_proto(func_id);
> }
>
^ permalink raw reply
* [RFC PATCH 2/2] ixgbevf: Add support for UDP segmentation offload
From: Alexander Duyck @ 2018-05-04 0:39 UTC (permalink / raw)
To: netdev, willemb, intel-wired-lan
In-Reply-To: <20180504003556.4769.11407.stgit@localhost.localdomain>
From: Alexander Duyck <alexander.h.duyck@intel.com>
This patch adds support for UDP segmentation offload. Relatively few
changes were needed to add this support as it functions much like the TCP
segmentation offload.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
---
drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 25 ++++++++++++++++-----
1 file changed, 19 insertions(+), 6 deletions(-)
diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index 9a939dcaf727..c2986142c98a 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -3709,6 +3709,7 @@ static int ixgbevf_tso(struct ixgbevf_ring *tx_ring,
} ip;
union {
struct tcphdr *tcp;
+ struct udphdr *udp;
unsigned char *hdr;
} l4;
u32 paylen, l4_offset;
@@ -3731,7 +3732,8 @@ static int ixgbevf_tso(struct ixgbevf_ring *tx_ring,
l4.hdr = skb_checksum_start(skb);
/* ADV DTYP TUCMD MKRLOC/ISCSIHEDLEN */
- type_tucmd = IXGBE_ADVTXD_TUCMD_L4T_TCP;
+ type_tucmd = (skb->csum_offset == offsetof(struct tcphdr, check)) ?
+ IXGBE_ADVTXD_TUCMD_L4T_TCP : IXGBE_ADVTXD_TUCMD_L4T_UDP;
/* initialize outer IP header fields */
if (ip.v4->version == 4) {
@@ -3759,12 +3761,20 @@ static int ixgbevf_tso(struct ixgbevf_ring *tx_ring,
/* determine offset of inner transport header */
l4_offset = l4.hdr - skb->data;
- /* compute length of segmentation header */
- *hdr_len = (l4.tcp->doff * 4) + l4_offset;
-
/* remove payload length from inner checksum */
paylen = skb->len - l4_offset;
- csum_replace_by_diff(&l4.tcp->check, htonl(paylen));
+
+ if (type_tucmd & IXGBE_ADVTXD_TUCMD_L4T_TCP) {
+ /* compute length of segmentation header */
+ *hdr_len = (l4.tcp->doff * 4) + l4_offset;
+ csum_replace_by_diff(&l4.tcp->check,
+ (__force __wsum)htonl(paylen));
+ } else {
+ /* compute length of segmentation header */
+ *hdr_len = sizeof(*l4.udp) + l4_offset;
+ csum_replace_by_diff(&l4.udp->check,
+ (__force __wsum)htonl(paylen));
+ }
/* update gso size and bytecount with header size */
first->gso_segs = skb_shinfo(skb)->gso_segs;
@@ -4368,6 +4378,7 @@ static void ixgbevf_get_stats(struct net_device *netdev,
if (unlikely(mac_hdr_len > IXGBEVF_MAX_MAC_HDR_LEN))
return features & ~(NETIF_F_HW_CSUM |
NETIF_F_SCTP_CRC |
+ NETIF_F_GSO_UDP_L4 |
NETIF_F_HW_VLAN_CTAG_TX |
NETIF_F_TSO |
NETIF_F_TSO6);
@@ -4376,6 +4387,7 @@ static void ixgbevf_get_stats(struct net_device *netdev,
if (unlikely(network_hdr_len > IXGBEVF_MAX_NETWORK_HDR_LEN))
return features & ~(NETIF_F_HW_CSUM |
NETIF_F_SCTP_CRC |
+ NETIF_F_GSO_UDP_L4 |
NETIF_F_TSO |
NETIF_F_TSO6);
@@ -4571,7 +4583,8 @@ static int ixgbevf_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
NETIF_F_TSO6 |
NETIF_F_RXCSUM |
NETIF_F_HW_CSUM |
- NETIF_F_SCTP_CRC;
+ NETIF_F_SCTP_CRC |
+ NETIF_F_GSO_UDP_L4;
#define IXGBEVF_GSO_PARTIAL_FEATURES (NETIF_F_GSO_GRE | \
NETIF_F_GSO_GRE_CSUM | \
^ permalink raw reply related
* [RFC PATCH 1/2] ixgbe: Add support for UDP segmentation offload
From: Alexander Duyck @ 2018-05-04 0:39 UTC (permalink / raw)
To: netdev, willemb, intel-wired-lan
In-Reply-To: <20180504003556.4769.11407.stgit@localhost.localdomain>
From: Alexander Duyck <alexander.h.duyck@intel.com>
This patch adds support for UDP segmentation offload. Relatively few
changes were needed to add this support as it functions much like the TCP
segmentation offload.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
---
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 24 ++++++++++++++++++------
1 file changed, 18 insertions(+), 6 deletions(-)
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index a52d92e182ee..0bed3350a795 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -7693,6 +7693,7 @@ static int ixgbe_tso(struct ixgbe_ring *tx_ring,
} ip;
union {
struct tcphdr *tcp;
+ struct udphdr *udp;
unsigned char *hdr;
} l4;
u32 paylen, l4_offset;
@@ -7716,7 +7717,8 @@ static int ixgbe_tso(struct ixgbe_ring *tx_ring,
l4.hdr = skb_checksum_start(skb);
/* ADV DTYP TUCMD MKRLOC/ISCSIHEDLEN */
- type_tucmd = IXGBE_ADVTXD_TUCMD_L4T_TCP;
+ type_tucmd = (skb->csum_offset == offsetof(struct tcphdr, check)) ?
+ IXGBE_ADVTXD_TUCMD_L4T_TCP : IXGBE_ADVTXD_TUCMD_L4T_UDP;
/* initialize outer IP header fields */
if (ip.v4->version == 4) {
@@ -7746,12 +7748,20 @@ static int ixgbe_tso(struct ixgbe_ring *tx_ring,
/* determine offset of inner transport header */
l4_offset = l4.hdr - skb->data;
- /* compute length of segmentation header */
- *hdr_len = (l4.tcp->doff * 4) + l4_offset;
-
/* remove payload length from inner checksum */
paylen = skb->len - l4_offset;
- csum_replace_by_diff(&l4.tcp->check, (__force __wsum)htonl(paylen));
+
+ if (type_tucmd & IXGBE_ADVTXD_TUCMD_L4T_TCP) {
+ /* compute length of segmentation header */
+ *hdr_len = (l4.tcp->doff * 4) + l4_offset;
+ csum_replace_by_diff(&l4.tcp->check,
+ (__force __wsum)htonl(paylen));
+ } else {
+ /* compute length of segmentation header */
+ *hdr_len = sizeof(*l4.udp) + l4_offset;
+ csum_replace_by_diff(&l4.udp->check,
+ (__force __wsum)htonl(paylen));
+ }
/* update gso size and bytecount with header size */
first->gso_segs = skb_shinfo(skb)->gso_segs;
@@ -9931,6 +9941,7 @@ static void ixgbe_fwd_del(struct net_device *pdev, void *priv)
if (unlikely(mac_hdr_len > IXGBE_MAX_MAC_HDR_LEN))
return features & ~(NETIF_F_HW_CSUM |
NETIF_F_SCTP_CRC |
+ NETIF_F_GSO_UDP_L4 |
NETIF_F_HW_VLAN_CTAG_TX |
NETIF_F_TSO |
NETIF_F_TSO6);
@@ -9939,6 +9950,7 @@ static void ixgbe_fwd_del(struct net_device *pdev, void *priv)
if (unlikely(network_hdr_len > IXGBE_MAX_NETWORK_HDR_LEN))
return features & ~(NETIF_F_HW_CSUM |
NETIF_F_SCTP_CRC |
+ NETIF_F_GSO_UDP_L4 |
NETIF_F_TSO |
NETIF_F_TSO6);
@@ -10480,7 +10492,7 @@ static int ixgbe_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
IXGBE_GSO_PARTIAL_FEATURES;
if (hw->mac.type >= ixgbe_mac_82599EB)
- netdev->features |= NETIF_F_SCTP_CRC;
+ netdev->features |= NETIF_F_SCTP_CRC | NETIF_F_GSO_UDP_L4;
/* copy netdev features into list of user selectable features */
netdev->hw_features |= netdev->features |
^ permalink raw reply related
* [RFC PATCH 0/2] ixgbe/ixgbevf: Add support for UDP segmentation offload
From: Alexander Duyck @ 2018-05-04 0:39 UTC (permalink / raw)
To: netdev, willemb, intel-wired-lan
These patches are meant to be a follow-up to the following series:
https://patchwork.ozlabs.org/project/netdev/list/?series=42476&archive=both&state=*
These patches enable driver support for the new UDP segmentation offload
feature. For now I am pushing them as an RFC as they haven't been
officially productized or validated and I was using these mostly as a test
vehicle to verify the offload could be supported generically by drivers.
---
Alexander Duyck (2):
ixgbe: Add support for UDP segmentation offload
ixgbevf: Add support for UDP segmentation offload
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 24 +++++++++++++++-----
drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 25 ++++++++++++++++-----
2 files changed, 37 insertions(+), 12 deletions(-)
^ permalink raw reply
* [net-next PATCH 5/5] net: Add NETIF_F_GSO_UDP_L4 to list of GSO offloads with fallback
From: Alexander Duyck @ 2018-05-04 0:33 UTC (permalink / raw)
To: netdev, willemb, davem
In-Reply-To: <20180504002817.4496.64616.stgit@localhost.localdomain>
From: Alexander Duyck <alexander.h.duyck@intel.com>
Enable UDP offload as a generic software offload since we can now handle it
for multiple cases including if the checksum isn't present and via
gso_partial in the case of tunnels.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
---
include/linux/netdev_features.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/include/linux/netdev_features.h b/include/linux/netdev_features.h
index c87c3a3453c1..efbd8b2c0197 100644
--- a/include/linux/netdev_features.h
+++ b/include/linux/netdev_features.h
@@ -184,7 +184,8 @@ enum {
/* List of features with software fallbacks. */
#define NETIF_F_GSO_SOFTWARE (NETIF_F_ALL_TSO | \
- NETIF_F_GSO_SCTP)
+ NETIF_F_GSO_SCTP| \
+ NETIF_F_GSO_UDP_L4)
/*
* If one device supports one of these features, then enable them
^ permalink raw reply related
* [net-next PATCH 4/5] udp: Do not copy destructor if one is not present
From: Alexander Duyck @ 2018-05-04 0:33 UTC (permalink / raw)
To: netdev, willemb, davem
In-Reply-To: <20180504002817.4496.64616.stgit@localhost.localdomain>
From: Alexander Duyck <alexander.h.duyck@intel.com>
This patch makes it so that if a destructor is not present we avoid trying
to update the skb socket or any reference counting that would be associated
with the NULL socket and/or descriptor. By doing this we can support
traffic coming from another namespace without any issues.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
---
net/ipv4/udp_offload.c | 19 +++++++++++++------
1 file changed, 13 insertions(+), 6 deletions(-)
diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index fd94bbb369b2..52760660d674 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -195,6 +195,7 @@ struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb,
unsigned int sum_truesize = 0;
struct udphdr *uh;
unsigned int mss;
+ bool copy_dtor;
__sum16 check;
__be16 newlen;
@@ -208,12 +209,14 @@ struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb,
skb_pull(gso_skb, sizeof(*uh));
/* clear destructor to avoid skb_segment assigning it to tail */
- WARN_ON_ONCE(gso_skb->destructor != sock_wfree);
+ copy_dtor = gso_skb->destructor == sock_wfree;
+ WARN_ON_ONCE(gso_skb->destructor && !copy_dtor);
gso_skb->destructor = NULL;
segs = skb_segment(gso_skb, features);
if (unlikely(IS_ERR_OR_NULL(segs))) {
- gso_skb->destructor = sock_wfree;
+ if (copy_dtor)
+ gso_skb->destructor = sock_wfree;
return segs;
}
@@ -234,9 +237,11 @@ struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb,
(__force u32)newlen));
for (;;) {
- seg->destructor = sock_wfree;
- seg->sk = sk;
- sum_truesize += seg->truesize;
+ if (copy_dtor) {
+ seg->destructor = sock_wfree;
+ seg->sk = sk;
+ sum_truesize += seg->truesize;
+ }
if (!seg->next)
break;
@@ -268,7 +273,9 @@ struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb,
uh->check = gso_make_checksum(seg, ~check);
/* update refcount for the packet */
- refcount_add(sum_truesize - gso_skb->truesize, &sk->sk_wmem_alloc);
+ if (copy_dtor)
+ refcount_add(sum_truesize - gso_skb->truesize,
+ &sk->sk_wmem_alloc);
out:
return segs;
}
^ permalink raw reply related
* [net-next PATCH 3/5] udp: Add support for software checksum and GSO_PARTIAL with GSO offload
From: Alexander Duyck @ 2018-05-04 0:33 UTC (permalink / raw)
To: netdev, willemb, davem
In-Reply-To: <20180504002817.4496.64616.stgit@localhost.localdomain>
From: Alexander Duyck <alexander.h.duyck@intel.com>
This patch adds support for a software provided checksum and GSO_PARTIAL
segmentation support. With this we can offload UDP segmentation on devices
that only have partial support for tunnels.
Since we are no longer needing the hardware checksum we can drop the checks
in the segmentation code that were verifying if it was present.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
---
net/ipv4/udp_offload.c | 28 ++++++++++++++++++----------
net/ipv6/udp_offload.c | 11 +----------
2 files changed, 19 insertions(+), 20 deletions(-)
diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index 946d06d2aa0c..fd94bbb369b2 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -217,6 +217,13 @@ struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb,
return segs;
}
+ /* GSO partial and frag_list segmentation only requires splitting
+ * the frame into an MSS multiple and possibly a remainder, both
+ * cases return a GSO skb. So update the mss now.
+ */
+ if (skb_is_gso(segs))
+ mss *= skb_shinfo(segs)->gso_segs;
+
seg = segs;
uh = udp_hdr(seg);
@@ -237,6 +244,11 @@ struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb,
uh->len = newlen;
uh->check = check;
+ if (seg->ip_summed == CHECKSUM_PARTIAL)
+ gso_reset_checksum(seg, ~check);
+ else
+ uh->check = gso_make_checksum(seg, ~check);
+
seg = seg->next;
uh = udp_hdr(seg);
}
@@ -250,6 +262,11 @@ struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb,
uh->len = newlen;
uh->check = check;
+ if (seg->ip_summed == CHECKSUM_PARTIAL)
+ gso_reset_checksum(seg, ~check);
+ else
+ uh->check = gso_make_checksum(seg, ~check);
+
/* update refcount for the packet */
refcount_add(sum_truesize - gso_skb->truesize, &sk->sk_wmem_alloc);
out:
@@ -257,15 +274,6 @@ struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb,
}
EXPORT_SYMBOL_GPL(__udp_gso_segment);
-static struct sk_buff *__udp4_gso_segment(struct sk_buff *gso_skb,
- netdev_features_t features)
-{
- if (!can_checksum_protocol(features, htons(ETH_P_IP)))
- return ERR_PTR(-EIO);
-
- return __udp_gso_segment(gso_skb, features);
-}
-
static struct sk_buff *udp4_ufo_fragment(struct sk_buff *skb,
netdev_features_t features)
{
@@ -289,7 +297,7 @@ static struct sk_buff *udp4_ufo_fragment(struct sk_buff *skb,
goto out;
if (skb_shinfo(skb)->gso_type & SKB_GSO_UDP_L4)
- return __udp4_gso_segment(skb, features);
+ return __udp_gso_segment(skb, features);
mss = skb_shinfo(skb)->gso_size;
if (unlikely(skb->len <= mss))
diff --git a/net/ipv6/udp_offload.c b/net/ipv6/udp_offload.c
index 61e34f1d2fa2..03a2ff3fe1e6 100644
--- a/net/ipv6/udp_offload.c
+++ b/net/ipv6/udp_offload.c
@@ -17,15 +17,6 @@
#include <net/ip6_checksum.h>
#include "ip6_offload.h"
-static struct sk_buff *__udp6_gso_segment(struct sk_buff *gso_skb,
- netdev_features_t features)
-{
- if (!can_checksum_protocol(features, htons(ETH_P_IPV6)))
- return ERR_PTR(-EIO);
-
- return __udp_gso_segment(gso_skb, features);
-}
-
static struct sk_buff *udp6_ufo_fragment(struct sk_buff *skb,
netdev_features_t features)
{
@@ -58,7 +49,7 @@ static struct sk_buff *udp6_ufo_fragment(struct sk_buff *skb,
goto out;
if (skb_shinfo(skb)->gso_type & SKB_GSO_UDP_L4)
- return __udp6_gso_segment(skb, features);
+ return __udp_gso_segment(skb, features);
/* Do software UFO. Complete and fill in the UDP checksum as HW cannot
* do checksum of UDP packets sent as multiple IP fragments.
^ permalink raw reply related
* [net-next PATCH 2/5] udp: Do not pass checksum or MSS as parameters
From: Alexander Duyck @ 2018-05-04 0:33 UTC (permalink / raw)
To: netdev, willemb, davem
In-Reply-To: <20180504002817.4496.64616.stgit@localhost.localdomain>
From: Alexander Duyck <alexander.h.duyck@intel.com>
This patch is meant to be a start at cleaning up some of the UDP GSO
segmentation code. Specifically we were passing mss and a recomputed
checksum when we really didn't need to. The function itself could derive
that information based on the already provided checksum, the length of the
packet stored in the UDP header, and the gso_size.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
---
include/net/udp.h | 3 +-
net/ipv4/udp_offload.c | 61 +++++++++++++++++++++++++++++++-----------------
net/ipv6/udp_offload.c | 7 +-----
3 files changed, 41 insertions(+), 30 deletions(-)
diff --git a/include/net/udp.h b/include/net/udp.h
index 05990746810e..9289b6425032 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -175,8 +175,7 @@ struct sk_buff **udp_gro_receive(struct sk_buff **head, struct sk_buff *skb,
int udp_gro_complete(struct sk_buff *skb, int nhoff, udp_lookup_t lookup);
struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb,
- netdev_features_t features,
- unsigned int mss, __sum16 check);
+ netdev_features_t features);
static inline struct udphdr *udp_gro_udphdr(struct sk_buff *skb)
{
diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index 006257092f06..946d06d2aa0c 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -188,19 +188,23 @@ struct sk_buff *skb_udp_tunnel_segment(struct sk_buff *skb,
EXPORT_SYMBOL(skb_udp_tunnel_segment);
struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb,
- netdev_features_t features,
- unsigned int mss, __sum16 check)
+ netdev_features_t features)
{
+ struct sk_buff *seg, *segs = ERR_PTR(-EINVAL);
struct sock *sk = gso_skb->sk;
unsigned int sum_truesize = 0;
- struct sk_buff *segs, *seg;
- unsigned int hdrlen;
struct udphdr *uh;
+ unsigned int mss;
+ __sum16 check;
+ __be16 newlen;
+ mss = skb_shinfo(gso_skb)->gso_size;
if (gso_skb->len <= sizeof(*uh) + mss)
- return ERR_PTR(-EINVAL);
+ goto out;
+
+ if (!pskb_may_pull(gso_skb, sizeof(*uh)))
+ goto out;
- hdrlen = gso_skb->data - skb_mac_header(gso_skb);
skb_pull(gso_skb, sizeof(*uh));
/* clear destructor to avoid skb_segment assigning it to tail */
@@ -213,24 +217,42 @@ struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb,
return segs;
}
- for (seg = segs; seg; seg = seg->next) {
- uh = udp_hdr(seg);
- uh->len = htons(seg->len - hdrlen);
- uh->check = check;
+ seg = segs;
+ uh = udp_hdr(seg);
- /* last packet can be partial gso_size */
- if (!seg->next)
- csum_replace2(&uh->check, htons(mss),
- htons(seg->len - hdrlen - sizeof(*uh)));
+ /* compute checksum adjustment based on old length versus new */
+ newlen = htons(sizeof(*uh) + mss);
+ check = ~csum_fold((__force __wsum)((__force u32)uh->check +
+ ((__force u32)uh->len ^ 0xFFFF) +
+ (__force u32)newlen));
- uh->check = ~uh->check;
+ for (;;) {
seg->destructor = sock_wfree;
seg->sk = sk;
sum_truesize += seg->truesize;
+
+ if (!seg->next)
+ break;
+
+ uh->len = newlen;
+ uh->check = check;
+
+ seg = seg->next;
+ uh = udp_hdr(seg);
}
- refcount_add(sum_truesize - gso_skb->truesize, &sk->sk_wmem_alloc);
+ /* last packet can be partial gso_size, account for that in checksum */
+ newlen = htons(skb_tail_pointer(seg) - skb_transport_header(seg) +
+ seg->data_len);
+ check = ~csum_fold((__force __wsum)((__force u32)uh->check +
+ ((__force u32)uh->len ^ 0xFFFF) +
+ (__force u32)newlen));
+ uh->len = newlen;
+ uh->check = check;
+ /* update refcount for the packet */
+ refcount_add(sum_truesize - gso_skb->truesize, &sk->sk_wmem_alloc);
+out:
return segs;
}
EXPORT_SYMBOL_GPL(__udp_gso_segment);
@@ -238,15 +260,10 @@ struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb,
static struct sk_buff *__udp4_gso_segment(struct sk_buff *gso_skb,
netdev_features_t features)
{
- const struct iphdr *iph = ip_hdr(gso_skb);
- unsigned int mss = skb_shinfo(gso_skb)->gso_size;
-
if (!can_checksum_protocol(features, htons(ETH_P_IP)))
return ERR_PTR(-EIO);
- return __udp_gso_segment(gso_skb, features, mss,
- udp_v4_check(sizeof(struct udphdr) + mss,
- iph->saddr, iph->daddr, 0));
+ return __udp_gso_segment(gso_skb, features);
}
static struct sk_buff *udp4_ufo_fragment(struct sk_buff *skb,
diff --git a/net/ipv6/udp_offload.c b/net/ipv6/udp_offload.c
index f7b85b1e6b3e..61e34f1d2fa2 100644
--- a/net/ipv6/udp_offload.c
+++ b/net/ipv6/udp_offload.c
@@ -20,15 +20,10 @@
static struct sk_buff *__udp6_gso_segment(struct sk_buff *gso_skb,
netdev_features_t features)
{
- const struct ipv6hdr *ip6h = ipv6_hdr(gso_skb);
- unsigned int mss = skb_shinfo(gso_skb)->gso_size;
-
if (!can_checksum_protocol(features, htons(ETH_P_IPV6)))
return ERR_PTR(-EIO);
- return __udp_gso_segment(gso_skb, features, mss,
- udp_v6_check(sizeof(struct udphdr) + mss,
- &ip6h->saddr, &ip6h->daddr, 0));
+ return __udp_gso_segment(gso_skb, features);
}
static struct sk_buff *udp6_ufo_fragment(struct sk_buff *skb,
^ permalink raw reply related
* [net-next PATCH 1/5] udp: Record gso_segs when supporting UDP segmentation offload
From: Alexander Duyck @ 2018-05-04 0:33 UTC (permalink / raw)
To: netdev, willemb, davem
In-Reply-To: <20180504002817.4496.64616.stgit@localhost.localdomain>
From: Alexander Duyck <alexander.h.duyck@intel.com>
We need to record the number of segments that will be generated when this
frame is segmented. The expectation is that if gso_size is set then
gso_segs is set as well. Without this some drivers such as ixgbe get
confused if they attempt to offload this as they record 0 segments for the
entire packet instead of the correct value.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
---
net/ipv4/udp.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index dd3102a37ef9..e07db83b311e 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -793,6 +793,8 @@ static int udp_send_skb(struct sk_buff *skb, struct flowi4 *fl4,
skb_shinfo(skb)->gso_size = cork->gso_size;
skb_shinfo(skb)->gso_type = SKB_GSO_UDP_L4;
+ skb_shinfo(skb)->gso_segs = DIV_ROUND_UP(len - sizeof(uh),
+ cork->gso_size);
goto csum_partial;
}
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox