* Re: [PATCH net-next 1/1] net sched actions: fix invalid pointer dereferencing if skbedit flags missing
From: Cong Wang @ 2018-05-10 17:54 UTC (permalink / raw)
To: Roman Mashak
Cc: David Miller, Linux Kernel Network Developers, kernel,
Jamal Hadi Salim, Jiri Pirko, Alexander Duyck
In-Reply-To: <1525900723-17625-1-git-send-email-mrv@mojatatu.com>
On Wed, May 9, 2018 at 2:18 PM, Roman Mashak <mrv@mojatatu.com> wrote:
> The caller calls action's ->init() and passes pointer to "struct tc_action *a",
> which later may initialized to point at the existing action, otherwise
> "struct tc_action *a" is still invalid, and therefore dereferencing it is error.
So technically we just need to check if(exists) before calling
tcf_idr_release(), right?
>
> So checking flags should be done as early as possible, before idr lookups.
In theory, this could break the case of binding an existing action.
But in practice it seems nonsense to pass invalid flags in this case
either, so probably this is okay.
BTW, this should be a candidate for -net and possibly -stable too.
^ permalink raw reply
* [PATCH] [PATCH net v3] ipv6: remove min MTU check for ipsec tunnels
From: Ashwanth Goli @ 2018-05-10 17:45 UTC (permalink / raw)
To: netdev, davem; +Cc: pabeni, dsahern, eric.dumazet
With 749439bfac "fix udpv6 sendmsg crash caused by too small MTU"
ipsec tunnels that report a MTU less than IPV6_MIN_MTU are broken
even for packets that are smaller than IPV6_MIN_MTU.
According to rfc2473#section-7.1
if the original IPv6 packet is equal or smaller than the
IPv6 minimum link MTU, the tunnel entry-point node
encapsulates the original packet, and subsequently
fragments the resulting IPv6 tunnel packet into IPv6
fragments that do not exceed the Path MTU to the tunnel
exit-point.
Dropping the MTU check for ipsec tunnel destinations.
Signed-off-by: Ashwanth Goli <ashwanth@codeaurora.org>
---
net/ipv6/ip6_output.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 2e891d2..ea1ef1b 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1235,7 +1235,7 @@ static int ip6_setup_cork(struct sock *sk, struct inet_cork_full *cork,
if (np->frag_size)
mtu = np->frag_size;
}
- if (mtu < IPV6_MIN_MTU)
+ if (mtu < IPV6_MIN_MTU && !(rt->dst.flags & DST_XFRM_TUNNEL))
return -EINVAL;
cork->base.fragsize = mtu;
if (dst_allfrag(xfrm_dst_path(&rt->dst)))
--
1.9.1
^ permalink raw reply related
* Re: [PATCH net-next] net:sched: add gkprio scheduler
From: Cong Wang @ 2018-05-10 17:38 UTC (permalink / raw)
To: Michel Machado
Cc: Nishanth Devarajan, Jiri Pirko, Jamal Hadi Salim, David Miller,
Linux Kernel Network Developers, Cody Doucette
In-Reply-To: <d450d9da-a1d2-4aae-1ed7-edc2d0972119@digirati.com.br>
On Wed, May 9, 2018 at 7:09 AM, Michel Machado <michel@digirati.com.br> wrote:
> On 05/08/2018 10:24 PM, Cong Wang wrote:
>>
>> On Tue, May 8, 2018 at 5:59 AM, Michel Machado <michel@digirati.com.br>
>> wrote:
>>>>>
>>>>> Overall it looks good to me, just one thing below:
>>>>>
>>>>>> +struct Qdisc_ops gkprio_qdisc_ops __read_mostly = {
>>>>>> + .id = "gkprio",
>>>>>> + .priv_size = sizeof(struct gkprio_sched_data),
>>>>>> + .enqueue = gkprio_enqueue,
>>>>>> + .dequeue = gkprio_dequeue,
>>>>>> + .peek = qdisc_peek_dequeued,
>>>>>> + .init = gkprio_init,
>>>>>> + .reset = gkprio_reset,
>>>>>> + .change = gkprio_change,
>>>>>> + .dump = gkprio_dump,
>>>>>> + .destroy = gkprio_destroy,
>>>>>> + .owner = THIS_MODULE,
>>>>>> +};
>>>>>
>>>>>
>>>>>
>>>>> You probably want to add Qdisc_class_ops here so that you can
>>>>> dump the stats of each internal queue.
>>>
>>>
>>>
>>> Hi Cong,
>>>
>>> In the production scenario we are targeting, this priority queue must
>>> be
>>> classless; being classful would only bloat the code for us. I don't see
>>> making this queue classful as a problem per se, but I suggest leaving it
>>> as
>>> a future improvement for when someone can come up with a useful scenario
>>> for
>>> it.
>>
>>
>>
>> Take a look at sch_prio, it is fairly simple since your internal
>> queues are just an array... Per-queue stats are quite useful
>> in production, we definitely want to observe which queues are
>> full which are not.
>>
>
> DSprio cannot add Qdisc_class_ops without a rewrite of other queue
> disciplines, which doesn't seem desirable. Since the method cops->leaf is
> required (see register_qdisc()), we would need to replace the array struct
> sk_buff_head qdiscs[GKPRIO_MAX_PRIORITY] in struct gkprio_sched_data with
> the array struct Qdisc *queues[GKPRIO_MAX_PRIORITY] to be able to return a
> Qdisc in dsprio_leaf(). The problem with this change is that Qdisc does not
> have a method to dequeue from its tail. This new method may not even make
> sense in other queue disciplines. But without this method, gkprio_enqueue()
> cannot drop the lowest priority packet when the queue is full and an
> incoming packet has higher priority.
Sorry for giving you a bad example. Take a look at sch_fq_codel instead,
it returns NULL for ->leaf() and maps its internal flows to classes.
I thought sch_prio uses internal qdiscs, but I was wrong, as you noticed
it actually exposes them to user via classes.
My point is never to make it classful, just want to expose the useful stats,
like how fq_codel dumps its internal flows.
>
> Nevertheless, I see your point on being able to observe the distribution of
> queued packets per priority. A solution for that would be to add the array
> __u32 qlen[GKPRIO_MAX_PRIORITY] in struct tc_gkprio_qopt. This solution even
> avoids adding overhead in the critical paths of DSprio. Do you see a better
> solution?
I believe you can return NULL for ->leaf() and don't need to worry about
->graft() either. ;)
>
> By the way, I've used GKPRIO_MAX_PRIORITY and other names that include
> "gkprio" above to reflect the version 1 of this patch that we are
> discussing. We will rename these identifiers for version 2 of this patch to
> replace "gkprio" with "dsprio".
>
Sounds good.
Thanks.
^ permalink raw reply
* [PATCH bpf-next 7/7] samples: bpf: convert some XDP samples from bpf_load to libbpf
From: Jakub Kicinski @ 2018-05-10 17:24 UTC (permalink / raw)
To: alexei.starovoitov, daniel; +Cc: oss-drivers, netdev, Jakub Kicinski
In-Reply-To: <20180510172443.17238-1-jakub.kicinski@netronome.com>
Now that we can use full powers of libbpf in BPF samples, we
should perhaps make the simplest XDP programs not depend on
bpf_load helpers. This way newcomers will be exposed to the
recommended library from the start.
Use of bpf_prog_load_xattr() will also make it trivial to later
on request offload of the programs by simply adding ifindex to
the xattr.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
---
samples/bpf/Makefile | 8 +++---
samples/bpf/xdp1_user.c | 31 +++++++++++++-------
samples/bpf/xdp_adjust_tail_user.c | 36 ++++++++++++++---------
samples/bpf/xdp_rxq_info_user.c | 46 ++++++++++++++++++++----------
4 files changed, 78 insertions(+), 43 deletions(-)
diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index 4f99de93d4db..7181b23226a7 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -78,9 +78,9 @@ test_cgrp2_attach-objs := test_cgrp2_attach.o $(LIBBPF)
test_cgrp2_attach2-objs := test_cgrp2_attach2.o $(LIBBPF) $(CGROUP_HELPERS)
test_cgrp2_sock-objs := test_cgrp2_sock.o $(LIBBPF)
test_cgrp2_sock2-objs := bpf_load.o $(LIBBPF) test_cgrp2_sock2.o
-xdp1-objs := bpf_load.o $(LIBBPF) xdp1_user.o
+xdp1-objs := xdp1_user.o $(LIBBPF)
# reuse xdp1 source intentionally
-xdp2-objs := bpf_load.o $(LIBBPF) xdp1_user.o
+xdp2-objs := xdp1_user.o $(LIBBPF)
xdp_router_ipv4-objs := bpf_load.o $(LIBBPF) xdp_router_ipv4_user.o
test_current_task_under_cgroup-objs := bpf_load.o $(LIBBPF) $(CGROUP_HELPERS) \
test_current_task_under_cgroup_user.o
@@ -95,10 +95,10 @@ xdp_redirect-objs := bpf_load.o $(LIBBPF) xdp_redirect_user.o
xdp_redirect_map-objs := bpf_load.o $(LIBBPF) xdp_redirect_map_user.o
xdp_redirect_cpu-objs := bpf_load.o $(LIBBPF) xdp_redirect_cpu_user.o
xdp_monitor-objs := bpf_load.o $(LIBBPF) xdp_monitor_user.o
-xdp_rxq_info-objs := bpf_load.o $(LIBBPF) xdp_rxq_info_user.o
+xdp_rxq_info-objs := xdp_rxq_info_user.o $(LIBBPF)
syscall_tp-objs := bpf_load.o $(LIBBPF) syscall_tp_user.o
cpustat-objs := bpf_load.o $(LIBBPF) cpustat_user.o
-xdp_adjust_tail-objs := bpf_load.o $(LIBBPF) xdp_adjust_tail_user.o
+xdp_adjust_tail-objs := xdp_adjust_tail_user.o $(LIBBPF)
xdpsock-objs := bpf_load.o $(LIBBPF) xdpsock_user.o
# Tell kbuild to always build the programs
diff --git a/samples/bpf/xdp1_user.c b/samples/bpf/xdp1_user.c
index b901ee2b3336..b02c531510ed 100644
--- a/samples/bpf/xdp1_user.c
+++ b/samples/bpf/xdp1_user.c
@@ -16,9 +16,9 @@
#include <libgen.h>
#include <sys/resource.h>
-#include "bpf_load.h"
#include "bpf_util.h"
-#include "libbpf.h"
+#include "bpf/bpf.h"
+#include "bpf/libbpf.h"
static int ifindex;
static __u32 xdp_flags;
@@ -31,7 +31,7 @@ static void int_exit(int sig)
/* simple per-protocol drop counter
*/
-static void poll_stats(int interval)
+static void poll_stats(int map_fd, int interval)
{
unsigned int nr_cpus = bpf_num_possible_cpus();
const unsigned int nr_keys = 256;
@@ -47,7 +47,7 @@ static void poll_stats(int interval)
for (key = 0; key < nr_keys; key++) {
__u64 sum = 0;
- assert(bpf_map_lookup_elem(map_fd[0], &key, values) == 0);
+ assert(bpf_map_lookup_elem(map_fd, &key, values) == 0);
for (i = 0; i < nr_cpus; i++)
sum += (values[i] - prev[key][i]);
if (sum)
@@ -71,9 +71,14 @@ static void usage(const char *prog)
int main(int argc, char **argv)
{
struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
+ struct bpf_prog_load_attr prog_load_attr = {
+ .prog_type = BPF_PROG_TYPE_XDP,
+ };
const char *optstr = "SN";
+ int prog_fd, map_fd, opt;
+ struct bpf_object *obj;
+ struct bpf_map *map;
char filename[256];
- int opt;
while ((opt = getopt(argc, argv, optstr)) != -1) {
switch (opt) {
@@ -102,13 +107,19 @@ int main(int argc, char **argv)
ifindex = strtoul(argv[optind], NULL, 0);
snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);
+ prog_load_attr.file = filename;
- if (load_bpf_file(filename)) {
- printf("%s", bpf_log_buf);
+ if (bpf_prog_load_xattr(&prog_load_attr, &obj, &prog_fd))
+ return 1;
+
+ map = bpf_map__next(NULL, obj);
+ if (!map) {
+ printf("finding a map in obj file failed\n");
return 1;
}
+ map_fd = bpf_map__fd(map);
- if (!prog_fd[0]) {
+ if (!prog_fd) {
printf("load_bpf_file: %s\n", strerror(errno));
return 1;
}
@@ -116,12 +127,12 @@ int main(int argc, char **argv)
signal(SIGINT, int_exit);
signal(SIGTERM, int_exit);
- if (bpf_set_link_xdp_fd(ifindex, prog_fd[0], xdp_flags) < 0) {
+ if (bpf_set_link_xdp_fd(ifindex, prog_fd, xdp_flags) < 0) {
printf("link set xdp fd failed\n");
return 1;
}
- poll_stats(2);
+ poll_stats(map_fd, 2);
return 0;
}
diff --git a/samples/bpf/xdp_adjust_tail_user.c b/samples/bpf/xdp_adjust_tail_user.c
index f621a541b574..3042ce37dae8 100644
--- a/samples/bpf/xdp_adjust_tail_user.c
+++ b/samples/bpf/xdp_adjust_tail_user.c
@@ -18,9 +18,8 @@
#include <netinet/ether.h>
#include <unistd.h>
#include <time.h>
-#include "bpf_load.h"
-#include "libbpf.h"
-#include "bpf_util.h"
+#include "bpf/bpf.h"
+#include "bpf/libbpf.h"
#define STATS_INTERVAL_S 2U
@@ -36,7 +35,7 @@ static void int_exit(int sig)
/* simple "icmp packet too big sent" counter
*/
-static void poll_stats(unsigned int kill_after_s)
+static void poll_stats(unsigned int map_fd, unsigned int kill_after_s)
{
time_t started_at = time(NULL);
__u64 value = 0;
@@ -46,7 +45,7 @@ static void poll_stats(unsigned int kill_after_s)
while (!kill_after_s || time(NULL) - started_at <= kill_after_s) {
sleep(STATS_INTERVAL_S);
- assert(bpf_map_lookup_elem(map_fd[0], &key, &value) == 0);
+ assert(bpf_map_lookup_elem(map_fd, &key, &value) == 0);
printf("icmp \"packet too big\" sent: %10llu pkts\n", value);
}
@@ -66,14 +65,17 @@ static void usage(const char *cmd)
int main(int argc, char **argv)
{
+ struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
+ struct bpf_prog_load_attr prog_load_attr = {
+ .prog_type = BPF_PROG_TYPE_XDP,
+ };
unsigned char opt_flags[256] = {};
unsigned int kill_after_s = 0;
const char *optstr = "i:T:SNh";
- struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
+ int i, prog_fd, map_fd, opt;
+ struct bpf_object *obj;
+ struct bpf_map *map;
char filename[256];
- int opt;
- int i;
-
for (i = 0; i < strlen(optstr); i++)
if (optstr[i] != 'h' && 'a' <= optstr[i] && optstr[i] <= 'z')
@@ -115,13 +117,19 @@ int main(int argc, char **argv)
}
snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);
+ prog_load_attr.file = filename;
+
+ if (bpf_prog_load_xattr(&prog_load_attr, &obj, &prog_fd))
+ return 1;
- if (load_bpf_file(filename)) {
- printf("%s", bpf_log_buf);
+ map = bpf_map__next(NULL, obj);
+ if (!map) {
+ printf("finding a map in obj file failed\n");
return 1;
}
+ map_fd = bpf_map__fd(map);
- if (!prog_fd[0]) {
+ if (!prog_fd) {
printf("load_bpf_file: %s\n", strerror(errno));
return 1;
}
@@ -129,12 +137,12 @@ int main(int argc, char **argv)
signal(SIGINT, int_exit);
signal(SIGTERM, int_exit);
- if (bpf_set_link_xdp_fd(ifindex, prog_fd[0], xdp_flags) < 0) {
+ if (bpf_set_link_xdp_fd(ifindex, prog_fd, xdp_flags) < 0) {
printf("link set xdp fd failed\n");
return 1;
}
- poll_stats(kill_after_s);
+ poll_stats(map_fd, kill_after_s);
bpf_set_link_xdp_fd(ifindex, -1, xdp_flags);
diff --git a/samples/bpf/xdp_rxq_info_user.c b/samples/bpf/xdp_rxq_info_user.c
index 478d95412de4..e4e9ba52bff0 100644
--- a/samples/bpf/xdp_rxq_info_user.c
+++ b/samples/bpf/xdp_rxq_info_user.c
@@ -22,8 +22,8 @@ static const char *__doc__ = " XDP RX-queue info extract example\n\n"
#include <arpa/inet.h>
#include <linux/if_link.h>
-#include "libbpf.h"
-#include "bpf_load.h"
+#include "bpf/bpf.h"
+#include "bpf/libbpf.h"
#include "bpf_util.h"
static int ifindex = -1;
@@ -32,6 +32,9 @@ static char *ifname;
static __u32 xdp_flags;
+static struct bpf_map *stats_global_map;
+static struct bpf_map *rx_queue_index_map;
+
/* Exit return codes */
#define EXIT_OK 0
#define EXIT_FAIL 1
@@ -174,7 +177,7 @@ static struct datarec *alloc_record_per_cpu(void)
static struct record *alloc_record_per_rxq(void)
{
- unsigned int nr_rxqs = map_data[2].def.max_entries;
+ unsigned int nr_rxqs = bpf_map__def(rx_queue_index_map)->max_entries;
struct record *array;
size_t size;
@@ -190,7 +193,7 @@ static struct record *alloc_record_per_rxq(void)
static struct stats_record *alloc_stats_record(void)
{
- unsigned int nr_rxqs = map_data[2].def.max_entries;
+ unsigned int nr_rxqs = bpf_map__def(rx_queue_index_map)->max_entries;
struct stats_record *rec;
int i;
@@ -210,7 +213,7 @@ static struct stats_record *alloc_stats_record(void)
static void free_stats_record(struct stats_record *r)
{
- unsigned int nr_rxqs = map_data[2].def.max_entries;
+ unsigned int nr_rxqs = bpf_map__def(rx_queue_index_map)->max_entries;
int i;
for (i = 0; i < nr_rxqs; i++)
@@ -254,11 +257,11 @@ static void stats_collect(struct stats_record *rec)
{
int fd, i, max_rxqs;
- fd = map_data[1].fd; /* map: stats_global_map */
+ fd = bpf_map__fd(stats_global_map);
map_collect_percpu(fd, 0, &rec->stats);
- fd = map_data[2].fd; /* map: rx_queue_index_map */
- max_rxqs = map_data[2].def.max_entries;
+ fd = bpf_map__fd(rx_queue_index_map);
+ max_rxqs = bpf_map__def(rx_queue_index_map)->max_entries;
for (i = 0; i < max_rxqs; i++)
map_collect_percpu(fd, i, &rec->rxq[i]);
}
@@ -304,8 +307,8 @@ static void stats_print(struct stats_record *stats_rec,
struct stats_record *stats_prev,
int action)
{
+ unsigned int nr_rxqs = bpf_map__def(rx_queue_index_map)->max_entries;
unsigned int nr_cpus = bpf_num_possible_cpus();
- unsigned int nr_rxqs = map_data[2].def.max_entries;
double pps = 0, err = 0;
struct record *rec, *prev;
double t;
@@ -419,31 +422,44 @@ static void stats_poll(int interval, int action)
int main(int argc, char **argv)
{
struct rlimit r = {10 * 1024 * 1024, RLIM_INFINITY};
+ struct bpf_prog_load_attr prog_load_attr = {
+ .prog_type = BPF_PROG_TYPE_XDP,
+ };
+ int prog_fd, map_fd, opt, err;
bool use_separators = true;
struct config cfg = { 0 };
+ struct bpf_object *obj;
+ struct bpf_map *map;
char filename[256];
int longindex = 0;
int interval = 2;
__u32 key = 0;
- int opt, err;
char action_str_buf[XDP_ACTION_MAX_STRLEN + 1 /* for \0 */] = { 0 };
int action = XDP_PASS; /* Default action */
char *action_str = NULL;
snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);
+ prog_load_attr.file = filename;
if (setrlimit(RLIMIT_MEMLOCK, &r)) {
perror("setrlimit(RLIMIT_MEMLOCK)");
return 1;
}
- if (load_bpf_file(filename)) {
- fprintf(stderr, "ERR in load_bpf_file(): %s", bpf_log_buf);
+ if (bpf_prog_load_xattr(&prog_load_attr, &obj, &prog_fd))
+ return EXIT_FAIL;
+
+ map = bpf_map__next(NULL, obj);
+ stats_global_map = bpf_map__next(map, obj);
+ rx_queue_index_map = bpf_map__next(stats_global_map, obj);
+ if (!map || !stats_global_map || !rx_queue_index_map) {
+ printf("finding a map in obj file failed\n");
return EXIT_FAIL;
}
+ map_fd = bpf_map__fd(map);
- if (!prog_fd[0]) {
+ if (!prog_fd) {
fprintf(stderr, "ERR: load_bpf_file: %s\n", strerror(errno));
return EXIT_FAIL;
}
@@ -512,7 +528,7 @@ int main(int argc, char **argv)
setlocale(LC_NUMERIC, "en_US");
/* User-side setup ifindex in config_map */
- err = bpf_map_update_elem(map_fd[0], &key, &cfg, 0);
+ err = bpf_map_update_elem(map_fd, &key, &cfg, 0);
if (err) {
fprintf(stderr, "Store config failed (err:%d)\n", err);
exit(EXIT_FAIL_BPF);
@@ -521,7 +537,7 @@ int main(int argc, char **argv)
/* Remove XDP program when program is interrupted */
signal(SIGINT, int_exit);
- if (bpf_set_link_xdp_fd(ifindex, prog_fd[0], xdp_flags) < 0) {
+ if (bpf_set_link_xdp_fd(ifindex, prog_fd, xdp_flags) < 0) {
fprintf(stderr, "link set xdp fd failed\n");
return EXIT_FAIL_XDP;
}
--
2.17.0
^ permalink raw reply related
* [PATCH bpf-next 6/7] tools: bpf: don't complain about no kernel version for networking code
From: Jakub Kicinski @ 2018-05-10 17:24 UTC (permalink / raw)
To: alexei.starovoitov, daniel
Cc: oss-drivers, netdev, Jakub Kicinski, Stephen Rothwell
In-Reply-To: <20180510172443.17238-1-jakub.kicinski@netronome.com>
BPF programs only have to specify the target kernel version for
tracing related hooks, in networking world that requirement does
not really apply. Loosen the checks in libbpf to reflect that.
bpf_object__open() users will continue to see the error for backward
compatibility (and because prog_type is not available there).
Error code for NULL file name is changed from ENOENT to EINVAL,
as it seems more appropriate, hopefully, that's an OK change.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
---
Merge note:
This will conflict with "tools: bpf: handle NULL return in
bpf_prog_load_xattr()" the fix from bpf tree should be completely
discarded, we don't have to check for NULL any more.
CC: Stephen Rothwell <sfr@canb.auug.org.au>
---
tools/lib/bpf/libbpf.c | 46 +++++++++++++++++++++++++++++++++++-------
1 file changed, 39 insertions(+), 7 deletions(-)
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index ce96f1fe3f37..df54c4c9e48a 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -1438,9 +1438,37 @@ bpf_object__load_progs(struct bpf_object *obj)
return 0;
}
-static int bpf_object__validate(struct bpf_object *obj)
+static bool bpf_prog_type__needs_kver(enum bpf_prog_type type)
+{
+ switch (type) {
+ case BPF_PROG_TYPE_SOCKET_FILTER:
+ case BPF_PROG_TYPE_SCHED_CLS:
+ case BPF_PROG_TYPE_SCHED_ACT:
+ case BPF_PROG_TYPE_XDP:
+ case BPF_PROG_TYPE_CGROUP_SKB:
+ case BPF_PROG_TYPE_CGROUP_SOCK:
+ case BPF_PROG_TYPE_LWT_IN:
+ case BPF_PROG_TYPE_LWT_OUT:
+ case BPF_PROG_TYPE_LWT_XMIT:
+ case BPF_PROG_TYPE_SOCK_OPS:
+ case BPF_PROG_TYPE_SK_SKB:
+ case BPF_PROG_TYPE_CGROUP_DEVICE:
+ case BPF_PROG_TYPE_SK_MSG:
+ case BPF_PROG_TYPE_CGROUP_SOCK_ADDR:
+ return false;
+ case BPF_PROG_TYPE_UNSPEC:
+ case BPF_PROG_TYPE_KPROBE:
+ case BPF_PROG_TYPE_TRACEPOINT:
+ case BPF_PROG_TYPE_PERF_EVENT:
+ case BPF_PROG_TYPE_RAW_TRACEPOINT:
+ default:
+ return true;
+ }
+}
+
+static int bpf_object__validate(struct bpf_object *obj, bool needs_kver)
{
- if (obj->kern_version == 0) {
+ if (needs_kver && obj->kern_version == 0) {
pr_warning("%s doesn't provide kernel version\n",
obj->path);
return -LIBBPF_ERRNO__KVERSION;
@@ -1449,7 +1477,8 @@ static int bpf_object__validate(struct bpf_object *obj)
}
static struct bpf_object *
-__bpf_object__open(const char *path, void *obj_buf, size_t obj_buf_sz)
+__bpf_object__open(const char *path, void *obj_buf, size_t obj_buf_sz,
+ bool needs_kver)
{
struct bpf_object *obj;
int err;
@@ -1467,7 +1496,7 @@ __bpf_object__open(const char *path, void *obj_buf, size_t obj_buf_sz)
CHECK_ERR(bpf_object__check_endianness(obj), err, out);
CHECK_ERR(bpf_object__elf_collect(obj), err, out);
CHECK_ERR(bpf_object__collect_reloc(obj), err, out);
- CHECK_ERR(bpf_object__validate(obj), err, out);
+ CHECK_ERR(bpf_object__validate(obj, needs_kver), err, out);
bpf_object__elf_finish(obj);
return obj;
@@ -1484,7 +1513,7 @@ struct bpf_object *bpf_object__open(const char *path)
pr_debug("loading %s\n", path);
- return __bpf_object__open(path, NULL, 0);
+ return __bpf_object__open(path, NULL, 0, true);
}
struct bpf_object *bpf_object__open_buffer(void *obj_buf,
@@ -1507,7 +1536,7 @@ struct bpf_object *bpf_object__open_buffer(void *obj_buf,
pr_debug("loading object '%s' from buffer\n",
name);
- return __bpf_object__open(name, obj_buf, obj_buf_sz);
+ return __bpf_object__open(name, obj_buf, obj_buf_sz, true);
}
int bpf_object__unload(struct bpf_object *obj)
@@ -2164,8 +2193,11 @@ int bpf_prog_load_xattr(const struct bpf_prog_load_attr *attr,
if (!attr)
return -EINVAL;
+ if (!attr->file)
+ return -EINVAL;
- obj = bpf_object__open(attr->file);
+ obj = __bpf_object__open(attr->file, NULL, 0,
+ bpf_prog_type__needs_kver(attr->prog_type));
if (IS_ERR(obj))
return -ENOENT;
--
2.17.0
^ permalink raw reply related
* [PATCH bpf-next 5/7] tools: bpf: improve comments in libbpf.h
From: Jakub Kicinski @ 2018-05-10 17:24 UTC (permalink / raw)
To: alexei.starovoitov, daniel; +Cc: oss-drivers, netdev, Jakub Kicinski
In-Reply-To: <20180510172443.17238-1-jakub.kicinski@netronome.com>
Fix spelling mistakes, improve and clarify the language of comments
in libbpf.h.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
---
tools/lib/bpf/libbpf.h | 48 +++++++++++++++++++++---------------------
1 file changed, 24 insertions(+), 24 deletions(-)
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index ce681097584e..4574b9563278 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -52,8 +52,8 @@ enum libbpf_errno {
int libbpf_strerror(int err, char *buf, size_t size);
/*
- * In include/linux/compiler-gcc.h, __printf is defined. However
- * it should be better if libbpf.h doesn't depend on Linux header file.
+ * __printf is defined in include/linux/compiler-gcc.h. However,
+ * it would be better if libbpf.h didn't depend on Linux header files.
* So instead of __printf, here we use gcc attribute directly.
*/
typedef int (*libbpf_print_fn_t)(const char *, ...)
@@ -92,7 +92,7 @@ int bpf_object__set_priv(struct bpf_object *obj, void *priv,
bpf_object_clear_priv_t clear_priv);
void *bpf_object__priv(struct bpf_object *prog);
-/* Accessors of bpf_program. */
+/* Accessors of bpf_program */
struct bpf_program;
struct bpf_program *bpf_program__next(struct bpf_program *prog,
struct bpf_object *obj);
@@ -121,28 +121,28 @@ struct bpf_insn;
/*
* Libbpf allows callers to adjust BPF programs before being loaded
- * into kernel. One program in an object file can be transform into
- * multiple variants to be attached to different code.
+ * into kernel. One program in an object file can be transformed into
+ * multiple variants to be attached to different hooks.
*
* bpf_program_prep_t, bpf_program__set_prep and bpf_program__nth_fd
- * are APIs for this propose.
+ * form an API for this purpose.
*
* - bpf_program_prep_t:
- * It defines 'preprocessor', which is a caller defined function
+ * Defines a 'preprocessor', which is a caller defined function
* passed to libbpf through bpf_program__set_prep(), and will be
* called before program is loaded. The processor should adjust
- * the program one time for each instances according to the number
+ * the program one time for each instance according to the instance id
* passed to it.
*
* - bpf_program__set_prep:
- * Attachs a preprocessor to a BPF program. The number of instances
- * whould be created is also passed through this function.
+ * Attaches a preprocessor to a BPF program. The number of instances
+ * that should be created is also passed through this function.
*
* - bpf_program__nth_fd:
- * After the program is loaded, get resuling fds from bpf program for
- * each instances.
+ * After the program is loaded, get resulting FD of a given instance
+ * of the BPF program.
*
- * If bpf_program__set_prep() is not used, the program whould be loaded
+ * If bpf_program__set_prep() is not used, the program would be loaded
* without adjustment during bpf_object__load(). The program has only
* one instance. In this case bpf_program__fd(prog) is equal to
* bpf_program__nth_fd(prog, 0).
@@ -156,7 +156,7 @@ struct bpf_prog_prep_result {
struct bpf_insn *new_insn_ptr;
int new_insn_cnt;
- /* If not NULL, result fd is set to it */
+ /* If not NULL, result FD is written to it. */
int *pfd;
};
@@ -169,8 +169,8 @@ struct bpf_prog_prep_result {
* - res: Output parameter, result of transformation.
*
* Return value:
- * - Zero: pre-processing success.
- * - Non-zero: pre-processing, stop loading.
+ * - Zero: pre-processing success.
+ * - Non-zero: pre-processing error, stop loading.
*/
typedef int (*bpf_program_prep_t)(struct bpf_program *prog, int n,
struct bpf_insn *insns, int insns_cnt,
@@ -182,7 +182,7 @@ int bpf_program__set_prep(struct bpf_program *prog, int nr_instance,
int bpf_program__nth_fd(struct bpf_program *prog, int n);
/*
- * Adjust type of bpf program. Default is kprobe.
+ * Adjust type of BPF program. Default is kprobe.
*/
int bpf_program__set_socket_filter(struct bpf_program *prog);
int bpf_program__set_tracepoint(struct bpf_program *prog);
@@ -206,10 +206,10 @@ bool bpf_program__is_xdp(struct bpf_program *prog);
bool bpf_program__is_perf_event(struct bpf_program *prog);
/*
- * We don't need __attribute__((packed)) now since it is
- * unnecessary for 'bpf_map_def' because they are all aligned.
- * In addition, using it will trigger -Wpacked warning message,
- * and will be treated as an error due to -Werror.
+ * No need for __attribute__((packed)), all members of 'bpf_map_def'
+ * are all aligned. In addition, using __attribute__((packed))
+ * would trigger a -Wpacked warning message, and lead to an error
+ * if -Werror is set.
*/
struct bpf_map_def {
unsigned int type;
@@ -220,8 +220,8 @@ struct bpf_map_def {
};
/*
- * There is another 'struct bpf_map' in include/linux/map.h. However,
- * it is not a uapi header so no need to consider name clash.
+ * The 'struct bpf_map' in include/linux/bpf.h is internal to the kernel,
+ * so no need to worry about a name clash.
*/
struct bpf_map;
struct bpf_map *
@@ -229,7 +229,7 @@ bpf_object__find_map_by_name(struct bpf_object *obj, const char *name);
/*
* Get bpf_map through the offset of corresponding struct bpf_map_def
- * in the bpf object file.
+ * in the BPF object file.
*/
struct bpf_map *
bpf_object__find_map_by_offset(struct bpf_object *obj, size_t offset);
--
2.17.0
^ permalink raw reply related
* [PATCH bpf-next 4/7] tools: bpf: move the event reading loop to libbpf
From: Jakub Kicinski @ 2018-05-10 17:24 UTC (permalink / raw)
To: alexei.starovoitov, daniel; +Cc: oss-drivers, netdev, Jakub Kicinski
In-Reply-To: <20180510172443.17238-1-jakub.kicinski@netronome.com>
There are two copies of event reading loop - in bpftool and
trace_helpers "library". Consolidate them and move the code
to libbpf. Return codes from trace_helpers are kept, but
renamed to include LIBBPF prefix.
Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
---
samples/bpf/Makefile | 8 ++
samples/bpf/trace_output_user.c | 6 +-
tools/bpf/bpftool/map_perf_ring.c | 66 ++++------------
tools/lib/bpf/Makefile | 2 +-
tools/lib/bpf/libbpf.c | 61 +++++++++++++++
tools/lib/bpf/libbpf.h | 13 +++
tools/testing/selftests/bpf/Makefile | 2 +-
tools/testing/selftests/bpf/test_progs.c | 6 +-
tools/testing/selftests/bpf/trace_helpers.c | 87 +++++++--------------
tools/testing/selftests/bpf/trace_helpers.h | 11 +--
10 files changed, 139 insertions(+), 123 deletions(-)
diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index 5f0bf1367826..4f99de93d4db 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -162,6 +162,14 @@ HOSTCFLAGS += -I$(srctree)/tools/lib/ -I$(srctree)/tools/include
HOSTCFLAGS += -I$(srctree)/tools/perf
HOSTCFLAGS_bpf_load.o += -I$(objtree)/usr/include -Wno-unused-variable
+HOSTCFLAGS_trace_helpers.o += -I$(srctree)/tools/lib/bpf/
+
+HOSTCFLAGS_trace_output_user.o += -I$(srctree)/tools/lib/bpf/
+HOSTCFLAGS_offwaketime_user.o += -I$(srctree)/tools/lib/bpf/
+HOSTCFLAGS_spintest_user.o += -I$(srctree)/tools/lib/bpf/
+HOSTCFLAGS_trace_event_user.o += -I$(srctree)/tools/lib/bpf/
+HOSTCFLAGS_sampleip_user.o += -I$(srctree)/tools/lib/bpf/
+
HOSTLOADLIBES_test_lru_dist += -lelf
HOSTLOADLIBES_sock_example += -lelf
HOSTLOADLIBES_fds_example += -lelf
diff --git a/samples/bpf/trace_output_user.c b/samples/bpf/trace_output_user.c
index 5e78c2ecd08d..da98be721001 100644
--- a/samples/bpf/trace_output_user.c
+++ b/samples/bpf/trace_output_user.c
@@ -48,7 +48,7 @@ static int print_bpf_output(void *data, int size)
if (e->cookie != 0x12345678) {
printf("BUG pid %llx cookie %llx sized %d\n",
e->pid, e->cookie, size);
- return PERF_EVENT_ERROR;
+ return LIBBPF_PERF_EVENT_ERROR;
}
cnt++;
@@ -56,10 +56,10 @@ static int print_bpf_output(void *data, int size)
if (cnt == MAX_CNT) {
printf("recv %lld events per sec\n",
MAX_CNT * 1000000000ll / (time_get_ns() - start_time));
- return PERF_EVENT_DONE;
+ return LIBBPF_PERF_EVENT_DONE;
}
- return PERF_EVENT_CONT;
+ return LIBBPF_PERF_EVENT_CONT;
}
static void test_bpf_perf_event(void)
diff --git a/tools/bpf/bpftool/map_perf_ring.c b/tools/bpf/bpftool/map_perf_ring.c
index 9ae4bb8a2cad..1832100d1b27 100644
--- a/tools/bpf/bpftool/map_perf_ring.c
+++ b/tools/bpf/bpftool/map_perf_ring.c
@@ -50,14 +50,15 @@ static void int_exit(int signo)
stop = true;
}
-static void
-print_bpf_output(struct event_ring_info *ring, struct perf_event_sample *e)
+static enum bpf_perf_event_ret print_bpf_output(void *event, void *priv)
{
+ struct event_ring_info *ring = priv;
+ struct perf_event_sample *e = event;
struct {
struct perf_event_header header;
__u64 id;
__u64 lost;
- } *lost = (void *)e;
+ } *lost = event;
if (json_output) {
jsonw_start_object(json_wtr);
@@ -96,60 +97,23 @@ print_bpf_output(struct event_ring_info *ring, struct perf_event_sample *e)
e->header.type, e->header.size);
}
}
+
+ return LIBBPF_PERF_EVENT_CONT;
}
static void
perf_event_read(struct event_ring_info *ring, void **buf, size_t *buf_len)
{
- volatile struct perf_event_mmap_page *header = ring->mem;
- __u64 buffer_size = MMAP_PAGE_CNT * get_page_size();
- __u64 data_tail = header->data_tail;
- __u64 data_head = header->data_head;
- void *base, *begin, *end;
-
- asm volatile("" ::: "memory"); /* in real code it should be smp_rmb() */
- if (data_head == data_tail)
- return;
-
- base = ((char *)header) + get_page_size();
-
- begin = base + data_tail % buffer_size;
- end = base + data_head % buffer_size;
-
- while (begin != end) {
- struct perf_event_sample *e;
-
- e = begin;
- if (begin + e->header.size > base + buffer_size) {
- long len = base + buffer_size - begin;
-
- if (*buf_len < e->header.size) {
- free(*buf);
- *buf = malloc(e->header.size);
- if (!*buf) {
- fprintf(stderr,
- "can't allocate memory");
- stop = true;
- return;
- }
- *buf_len = e->header.size;
- }
-
- memcpy(*buf, begin, len);
- memcpy(*buf + len, base, e->header.size - len);
- e = (void *)*buf;
- begin = base + e->header.size - len;
- } else if (begin + e->header.size == base + buffer_size) {
- begin = base;
- } else {
- begin += e->header.size;
- }
-
- print_bpf_output(ring, e);
+ enum bpf_perf_event_ret ret;
+
+ ret = bpf_perf_event_read_simple(ring->mem,
+ MMAP_PAGE_CNT * get_page_size(),
+ get_page_size(), buf, buf_len,
+ print_bpf_output, ring);
+ if (ret != LIBBPF_PERF_EVENT_CONT) {
+ fprintf(stderr, "perf read loop failed with %d\n", ret);
+ stop = true;
}
-
- __sync_synchronize(); /* smp_mb() */
- header->data_tail = data_head;
}
static int perf_mmap_size(void)
diff --git a/tools/lib/bpf/Makefile b/tools/lib/bpf/Makefile
index e6d5f8d1477f..f3fab4af4260 100644
--- a/tools/lib/bpf/Makefile
+++ b/tools/lib/bpf/Makefile
@@ -69,7 +69,7 @@ FEATURE_USER = .libbpf
FEATURE_TESTS = libelf libelf-getphdrnum libelf-mmap bpf
FEATURE_DISPLAY = libelf bpf
-INCLUDES = -I. -I$(srctree)/tools/include -I$(srctree)/tools/arch/$(ARCH)/include/uapi -I$(srctree)/tools/include/uapi
+INCLUDES = -I. -I$(srctree)/tools/include -I$(srctree)/tools/arch/$(ARCH)/include/uapi -I$(srctree)/tools/include/uapi -I$(srctree)/tools/perf
FEATURE_CHECK_CFLAGS-bpf = $(INCLUDES)
check_feat := 1
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 7bcdca13083a..ce96f1fe3f37 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -31,6 +31,7 @@
#include <unistd.h>
#include <fcntl.h>
#include <errno.h>
+#include <perf-sys.h>
#include <asm/unistd.h>
#include <linux/err.h>
#include <linux/kernel.h>
@@ -2210,3 +2211,63 @@ int bpf_prog_load_xattr(const struct bpf_prog_load_attr *attr,
*prog_fd = bpf_program__fd(first_prog);
return 0;
}
+
+enum bpf_perf_event_ret
+bpf_perf_event_read_simple(void *mem, unsigned long size,
+ unsigned long page_size, void **buf, size_t *buf_len,
+ bpf_perf_event_print_t fn, void *priv)
+{
+ volatile struct perf_event_mmap_page *header = mem;
+ __u64 data_tail = header->data_tail;
+ __u64 data_head = header->data_head;
+ void *base, *begin, *end;
+ int ret;
+
+ asm volatile("" ::: "memory"); /* in real code it should be smp_rmb() */
+ if (data_head == data_tail)
+ return LIBBPF_PERF_EVENT_CONT;
+
+ base = ((char *)header) + page_size;
+
+ begin = base + data_tail % size;
+ end = base + data_head % size;
+
+ while (begin != end) {
+ struct perf_event_header *ehdr;
+
+ ehdr = begin;
+ if (begin + ehdr->size > base + size) {
+ long len = base + size - begin;
+
+ if (*buf_len < ehdr->size) {
+ free(*buf);
+ *buf = malloc(ehdr->size);
+ if (!*buf) {
+ ret = LIBBPF_PERF_EVENT_ERROR;
+ break;
+ }
+ *buf_len = ehdr->size;
+ }
+
+ memcpy(*buf, begin, len);
+ memcpy(*buf + len, base, ehdr->size - len);
+ ehdr = (void *)*buf;
+ begin = base + ehdr->size - len;
+ } else if (begin + ehdr->size == base + size) {
+ begin = base;
+ } else {
+ begin += ehdr->size;
+ }
+
+ ret = fn(ehdr, priv);
+ if (ret != LIBBPF_PERF_EVENT_CONT)
+ break;
+
+ data_tail += ehdr->size;
+ }
+
+ __sync_synchronize(); /* smp_mb() */
+ header->data_tail = data_tail;
+
+ return ret;
+}
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index 197f9ce2248c..ce681097584e 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -267,4 +267,17 @@ int bpf_prog_load(const char *file, enum bpf_prog_type type,
struct bpf_object **pobj, int *prog_fd);
int bpf_set_link_xdp_fd(int ifindex, int fd, __u32 flags);
+
+enum bpf_perf_event_ret {
+ LIBBPF_PERF_EVENT_DONE = 0,
+ LIBBPF_PERF_EVENT_ERROR = -1,
+ LIBBPF_PERF_EVENT_CONT = -2,
+};
+
+typedef enum bpf_perf_event_ret (*bpf_perf_event_print_t)(void *event,
+ void *priv);
+int bpf_perf_event_read_simple(void *mem, unsigned long size,
+ unsigned long page_size,
+ void **buf, size_t *buf_len,
+ bpf_perf_event_print_t fn, void *priv);
#endif
diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 9d762184b805..eac128afe15a 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -10,7 +10,7 @@ ifneq ($(wildcard $(GENHDR)),)
GENFLAGS := -DHAVE_GENHDR
endif
-CFLAGS += -Wall -O2 -I$(APIDIR) -I$(LIBDIR) -I$(GENDIR) $(GENFLAGS) -I../../../include
+CFLAGS += -Wall -O2 -I$(APIDIR) -I$(LIBDIR) -I$(BPFDIR) -I$(GENDIR) $(GENFLAGS) -I../../../include
LDLIBS += -lcap -lelf -lrt -lpthread
TEST_CUSTOM_PROGS = $(OUTPUT)/urandom_read
diff --git a/tools/testing/selftests/bpf/test_progs.c b/tools/testing/selftests/bpf/test_progs.c
index ed197eef1cfc..f7731973ec68 100644
--- a/tools/testing/selftests/bpf/test_progs.c
+++ b/tools/testing/selftests/bpf/test_progs.c
@@ -1337,12 +1337,12 @@ static int get_stack_print_output(void *data, int size)
good_user_stack = true;
}
if (!good_kern_stack || !good_user_stack)
- return PERF_EVENT_ERROR;
+ return LIBBPF_PERF_EVENT_ERROR;
if (cnt == MAX_CNT_RAWTP)
- return PERF_EVENT_DONE;
+ return LIBBPF_PERF_EVENT_DONE;
- return PERF_EVENT_CONT;
+ return LIBBPF_PERF_EVENT_CONT;
}
static void test_get_stack_raw_tp(void)
diff --git a/tools/testing/selftests/bpf/trace_helpers.c b/tools/testing/selftests/bpf/trace_helpers.c
index ad025bd75f1c..8fb4fe8686e4 100644
--- a/tools/testing/selftests/bpf/trace_helpers.c
+++ b/tools/testing/selftests/bpf/trace_helpers.c
@@ -74,7 +74,7 @@ struct ksym *ksym_search(long key)
static int page_size;
static int page_cnt = 8;
-static volatile struct perf_event_mmap_page *header;
+static struct perf_event_mmap_page *header;
int perf_event_mmap(int fd)
{
@@ -107,74 +107,47 @@ struct perf_event_sample {
char data[];
};
-static int perf_event_read(perf_event_print_fn fn)
+static enum bpf_perf_event_ret bpf_perf_event_print(void *event, void *priv)
{
- __u64 data_tail = header->data_tail;
- __u64 data_head = header->data_head;
- __u64 buffer_size = page_cnt * page_size;
- void *base, *begin, *end;
- char buf[256];
+ struct perf_event_sample *e = event;
+ perf_event_print_fn fn = priv;
int ret;
- asm volatile("" ::: "memory"); /* in real code it should be smp_rmb() */
- if (data_head == data_tail)
- return PERF_EVENT_CONT;
-
- base = ((char *)header) + page_size;
-
- begin = base + data_tail % buffer_size;
- end = base + data_head % buffer_size;
-
- while (begin != end) {
- struct perf_event_sample *e;
-
- e = begin;
- if (begin + e->header.size > base + buffer_size) {
- long len = base + buffer_size - begin;
-
- assert(len < e->header.size);
- memcpy(buf, begin, len);
- memcpy(buf + len, base, e->header.size - len);
- e = (void *) buf;
- begin = base + e->header.size - len;
- } else if (begin + e->header.size == base + buffer_size) {
- begin = base;
- } else {
- begin += e->header.size;
- }
-
- if (e->header.type == PERF_RECORD_SAMPLE) {
- ret = fn(e->data, e->size);
- if (ret != PERF_EVENT_CONT)
- return ret;
- } else if (e->header.type == PERF_RECORD_LOST) {
- struct {
- struct perf_event_header header;
- __u64 id;
- __u64 lost;
- } *lost = (void *) e;
- printf("lost %lld events\n", lost->lost);
- } else {
- printf("unknown event type=%d size=%d\n",
- e->header.type, e->header.size);
- }
+ if (e->header.type == PERF_RECORD_SAMPLE) {
+ ret = fn(e->data, e->size);
+ if (ret != LIBBPF_PERF_EVENT_CONT)
+ return ret;
+ } else if (e->header.type == PERF_RECORD_LOST) {
+ struct {
+ struct perf_event_header header;
+ __u64 id;
+ __u64 lost;
+ } *lost = (void *) e;
+ printf("lost %lld events\n", lost->lost);
+ } else {
+ printf("unknown event type=%d size=%d\n",
+ e->header.type, e->header.size);
}
- __sync_synchronize(); /* smp_mb() */
- header->data_tail = data_head;
- return PERF_EVENT_CONT;
+ return LIBBPF_PERF_EVENT_CONT;
}
int perf_event_poller(int fd, perf_event_print_fn output_fn)
{
- int ret;
+ enum bpf_perf_event_ret ret;
+ void *buf = NULL;
+ size_t len = 0;
for (;;) {
perf_event_poll(fd);
- ret = perf_event_read(output_fn);
- if (ret != PERF_EVENT_CONT)
- return ret;
+ ret = bpf_perf_event_read_simple(header, page_cnt * page_size,
+ page_size, &buf, &len,
+ bpf_perf_event_print,
+ output_fn);
+ if (ret != LIBBPF_PERF_EVENT_CONT)
+ break;
}
+ free(buf);
- return PERF_EVENT_DONE;
+ return ret;
}
diff --git a/tools/testing/selftests/bpf/trace_helpers.h b/tools/testing/selftests/bpf/trace_helpers.h
index fe3eefd21e86..36d90e3b1ea9 100644
--- a/tools/testing/selftests/bpf/trace_helpers.h
+++ b/tools/testing/selftests/bpf/trace_helpers.h
@@ -2,6 +2,8 @@
#ifndef __TRACE_HELPER_H
#define __TRACE_HELPER_H
+#include <libbpf.h>
+
struct ksym {
long addr;
char *name;
@@ -10,14 +12,9 @@ struct ksym {
int load_kallsyms(void);
struct ksym *ksym_search(long key);
-typedef int (*perf_event_print_fn)(void *data, int size);
-
-/* return code for perf_event_print_fn */
-#define PERF_EVENT_DONE 0
-#define PERF_EVENT_ERROR -1
-#define PERF_EVENT_CONT -2
+typedef enum bpf_perf_event_ret (*perf_event_print_fn)(void *data, int size);
int perf_event_mmap(int fd);
-/* return PERF_EVENT_DONE or PERF_EVENT_ERROR */
+/* return LIBBPF_PERF_EVENT_DONE or LIBBPF_PERF_EVENT_ERROR */
int perf_event_poller(int fd, perf_event_print_fn output_fn);
#endif
--
2.17.0
^ permalink raw reply related
* [PATCH bpf-next 3/7] samples: bpf: compile and link against full libbpf
From: Jakub Kicinski @ 2018-05-10 17:24 UTC (permalink / raw)
To: alexei.starovoitov, daniel; +Cc: oss-drivers, netdev, Jakub Kicinski
In-Reply-To: <20180510172443.17238-1-jakub.kicinski@netronome.com>
samples/bpf currently cherry-picks object files from tools/lib/bpf
to link against. Just compile the full library and link statically
against it.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
---
samples/bpf/Makefile | 21 ++++++++++++++-------
1 file changed, 14 insertions(+), 7 deletions(-)
diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index 8e0c7fb6d7cc..5f0bf1367826 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -48,7 +48,7 @@ hostprogs-y += xdp_adjust_tail
hostprogs-y += xdpsock
# Libbpf dependencies
-LIBBPF := ../../tools/lib/bpf/bpf.o ../../tools/lib/bpf/nlattr.o
+LIBBPF := ../../tools/lib/bpf/libbpf.a
CGROUP_HELPERS := ../../tools/testing/selftests/bpf/cgroup_helpers.o
TRACE_HELPERS := ../../tools/testing/selftests/bpf/trace_helpers.o
@@ -73,10 +73,10 @@ offwaketime-objs := bpf_load.o $(LIBBPF) offwaketime_user.o $(TRACE_HELPERS)
spintest-objs := bpf_load.o $(LIBBPF) spintest_user.o $(TRACE_HELPERS)
map_perf_test-objs := bpf_load.o $(LIBBPF) map_perf_test_user.o
test_overhead-objs := bpf_load.o $(LIBBPF) test_overhead_user.o
-test_cgrp2_array_pin-objs := $(LIBBPF) test_cgrp2_array_pin.o
-test_cgrp2_attach-objs := $(LIBBPF) test_cgrp2_attach.o
-test_cgrp2_attach2-objs := $(LIBBPF) test_cgrp2_attach2.o $(CGROUP_HELPERS)
-test_cgrp2_sock-objs := $(LIBBPF) test_cgrp2_sock.o
+test_cgrp2_array_pin-objs := test_cgrp2_array_pin.o $(LIBBPF)
+test_cgrp2_attach-objs := test_cgrp2_attach.o $(LIBBPF)
+test_cgrp2_attach2-objs := test_cgrp2_attach2.o $(LIBBPF) $(CGROUP_HELPERS)
+test_cgrp2_sock-objs := test_cgrp2_sock.o $(LIBBPF)
test_cgrp2_sock2-objs := bpf_load.o $(LIBBPF) test_cgrp2_sock2.o
xdp1-objs := bpf_load.o $(LIBBPF) xdp1_user.o
# reuse xdp1 source intentionally
@@ -90,7 +90,7 @@ tc_l2_redirect-objs := bpf_load.o $(LIBBPF) tc_l2_redirect_user.o
lwt_len_hist-objs := bpf_load.o $(LIBBPF) lwt_len_hist_user.o
xdp_tx_iptunnel-objs := bpf_load.o $(LIBBPF) xdp_tx_iptunnel_user.o
test_map_in_map-objs := bpf_load.o $(LIBBPF) test_map_in_map_user.o
-per_socket_stats_example-objs := $(LIBBPF) cookie_uid_helper_example.o
+per_socket_stats_example-objs := cookie_uid_helper_example.o $(LIBBPF)
xdp_redirect-objs := bpf_load.o $(LIBBPF) xdp_redirect_user.o
xdp_redirect_map-objs := bpf_load.o $(LIBBPF) xdp_redirect_map_user.o
xdp_redirect_cpu-objs := bpf_load.o $(LIBBPF) xdp_redirect_cpu_user.o
@@ -162,6 +162,8 @@ HOSTCFLAGS += -I$(srctree)/tools/lib/ -I$(srctree)/tools/include
HOSTCFLAGS += -I$(srctree)/tools/perf
HOSTCFLAGS_bpf_load.o += -I$(objtree)/usr/include -Wno-unused-variable
+HOSTLOADLIBES_test_lru_dist += -lelf
+HOSTLOADLIBES_sock_example += -lelf
HOSTLOADLIBES_fds_example += -lelf
HOSTLOADLIBES_sockex1 += -lelf
HOSTLOADLIBES_sockex2 += -lelf
@@ -173,6 +175,10 @@ HOSTLOADLIBES_tracex4 += -lelf -lrt
HOSTLOADLIBES_tracex5 += -lelf
HOSTLOADLIBES_tracex6 += -lelf
HOSTLOADLIBES_tracex7 += -lelf
+HOSTLOADLIBES_test_cgrp2_array_pin += -lelf
+HOSTLOADLIBES_test_cgrp2_attach += -lelf
+HOSTLOADLIBES_test_cgrp2_attach2 += -lelf
+HOSTLOADLIBES_test_cgrp2_sock += -lelf
HOSTLOADLIBES_test_cgrp2_sock2 += -lelf
HOSTLOADLIBES_load_sock_ops += -lelf
HOSTLOADLIBES_test_probe_write_user += -lelf
@@ -192,6 +198,7 @@ HOSTLOADLIBES_tc_l2_redirect += -l elf
HOSTLOADLIBES_lwt_len_hist += -l elf
HOSTLOADLIBES_xdp_tx_iptunnel += -lelf
HOSTLOADLIBES_test_map_in_map += -lelf
+HOSTLOADLIBES_per_socket_stats_example += -lelf
HOSTLOADLIBES_xdp_redirect += -lelf
HOSTLOADLIBES_xdp_redirect_map += -lelf
HOSTLOADLIBES_xdp_redirect_cpu += -lelf
@@ -222,7 +229,7 @@ all: $(LIBBPF)
@rm -f *~
$(LIBBPF): FORCE
- $(MAKE) -C $(dir $@) $(notdir $@)
+ $(MAKE) -C $(dir $@)
$(obj)/syscall_nrs.s: $(src)/syscall_nrs.c
$(call if_changed_dep,cc_s_c)
--
2.17.0
^ permalink raw reply related
* [PATCH bpf-next 2/7] samples: bpf: rename struct bpf_map_def to avoid conflict with libbpf
From: Jakub Kicinski @ 2018-05-10 17:24 UTC (permalink / raw)
To: alexei.starovoitov, daniel; +Cc: oss-drivers, netdev, Jakub Kicinski
In-Reply-To: <20180510172443.17238-1-jakub.kicinski@netronome.com>
Both tools/lib/bpf/libbpf.h and samples/bpf/bpf_load.h define their
own version of struct bpf_map_def. The version in bpf_load.h has
more fields. libbpf does not support inner maps and its definition
of struct bpf_map_def lacks the related fields. Rename the definition
in bpf_load.h (samples/bpf) to avoid conflicts.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
---
samples/bpf/bpf_load.c | 10 +++++-----
samples/bpf/bpf_load.h | 4 ++--
2 files changed, 7 insertions(+), 7 deletions(-)
diff --git a/samples/bpf/bpf_load.c b/samples/bpf/bpf_load.c
index da9bccfaf391..a6b290de5632 100644
--- a/samples/bpf/bpf_load.c
+++ b/samples/bpf/bpf_load.c
@@ -420,7 +420,7 @@ static int load_elf_maps_section(struct bpf_map_data *maps, int maps_shndx,
/* Keeping compatible with ELF maps section changes
* ------------------------------------------------
- * The program size of struct bpf_map_def is known by loader
+ * The program size of struct bpf_load_map_def is known by loader
* code, but struct stored in ELF file can be different.
*
* Unfortunately sym[i].st_size is zero. To calculate the
@@ -429,7 +429,7 @@ static int load_elf_maps_section(struct bpf_map_data *maps, int maps_shndx,
* symbols.
*/
map_sz_elf = data_maps->d_size / nr_maps;
- map_sz_copy = sizeof(struct bpf_map_def);
+ map_sz_copy = sizeof(struct bpf_load_map_def);
if (map_sz_elf < map_sz_copy) {
/*
* Backward compat, loading older ELF file with
@@ -448,8 +448,8 @@ static int load_elf_maps_section(struct bpf_map_data *maps, int maps_shndx,
/* Memcpy relevant part of ELF maps data to loader maps */
for (i = 0; i < nr_maps; i++) {
+ struct bpf_load_map_def *def;
unsigned char *addr, *end;
- struct bpf_map_def *def;
const char *map_name;
size_t offset;
@@ -464,9 +464,9 @@ static int load_elf_maps_section(struct bpf_map_data *maps, int maps_shndx,
/* Symbol value is offset into ELF maps section data area */
offset = sym[i].st_value;
- def = (struct bpf_map_def *)(data_maps->d_buf + offset);
+ def = (struct bpf_load_map_def *)(data_maps->d_buf + offset);
maps[i].elf_offset = offset;
- memset(&maps[i].def, 0, sizeof(struct bpf_map_def));
+ memset(&maps[i].def, 0, sizeof(struct bpf_load_map_def));
memcpy(&maps[i].def, def, map_sz_copy);
/* Verify no newer features were requested */
diff --git a/samples/bpf/bpf_load.h b/samples/bpf/bpf_load.h
index 2c3d0b448632..f9da59bca0cc 100644
--- a/samples/bpf/bpf_load.h
+++ b/samples/bpf/bpf_load.h
@@ -7,7 +7,7 @@
#define MAX_MAPS 32
#define MAX_PROGS 32
-struct bpf_map_def {
+struct bpf_load_map_def {
unsigned int type;
unsigned int key_size;
unsigned int value_size;
@@ -21,7 +21,7 @@ struct bpf_map_data {
int fd;
char *name;
size_t elf_offset;
- struct bpf_map_def def;
+ struct bpf_load_map_def def;
};
typedef void (*fixup_map_cb)(struct bpf_map_data *map, int idx);
--
2.17.0
^ permalink raw reply related
* [PATCH bpf-next 1/7] tools: bpftool: use PERF_SAMPLE_TIME instead of reading the clock
From: Jakub Kicinski @ 2018-05-10 17:24 UTC (permalink / raw)
To: alexei.starovoitov, daniel; +Cc: oss-drivers, netdev, Jakub Kicinski
In-Reply-To: <20180510172443.17238-1-jakub.kicinski@netronome.com>
Ask the kernel to include sample time in each even instead of
reading the clock. This is also more accurate because our
clock reading was done when user space would dump the buffer,
not when sample was produced.
Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
---
tools/bpf/bpftool/map_perf_ring.c | 17 ++++++-----------
1 file changed, 6 insertions(+), 11 deletions(-)
diff --git a/tools/bpf/bpftool/map_perf_ring.c b/tools/bpf/bpftool/map_perf_ring.c
index c5a2ced8552d..9ae4bb8a2cad 100644
--- a/tools/bpf/bpftool/map_perf_ring.c
+++ b/tools/bpf/bpftool/map_perf_ring.c
@@ -39,6 +39,7 @@ struct event_ring_info {
struct perf_event_sample {
struct perf_event_header header;
+ u64 time;
__u32 size;
unsigned char data[];
};
@@ -57,17 +58,9 @@ print_bpf_output(struct event_ring_info *ring, struct perf_event_sample *e)
__u64 id;
__u64 lost;
} *lost = (void *)e;
- struct timespec ts;
-
- if (clock_gettime(CLOCK_MONOTONIC, &ts)) {
- perror("Can't read clock for timestamp");
- return;
- }
if (json_output) {
jsonw_start_object(json_wtr);
- jsonw_name(json_wtr, "timestamp");
- jsonw_uint(json_wtr, ts.tv_sec * 1000000000ull + ts.tv_nsec);
jsonw_name(json_wtr, "type");
jsonw_uint(json_wtr, e->header.type);
jsonw_name(json_wtr, "cpu");
@@ -75,6 +68,8 @@ print_bpf_output(struct event_ring_info *ring, struct perf_event_sample *e)
jsonw_name(json_wtr, "index");
jsonw_uint(json_wtr, ring->key);
if (e->header.type == PERF_RECORD_SAMPLE) {
+ jsonw_name(json_wtr, "timestamp");
+ jsonw_uint(json_wtr, e->time);
jsonw_name(json_wtr, "data");
print_data_json(e->data, e->size);
} else if (e->header.type == PERF_RECORD_LOST) {
@@ -89,8 +84,8 @@ print_bpf_output(struct event_ring_info *ring, struct perf_event_sample *e)
jsonw_end_object(json_wtr);
} else {
if (e->header.type == PERF_RECORD_SAMPLE) {
- printf("== @%ld.%ld CPU: %d index: %d =====\n",
- (long)ts.tv_sec, ts.tv_nsec,
+ printf("== @%lld.%09lld CPU: %d index: %d =====\n",
+ e->time / 1000000000ULL, e->time % 1000000000ULL,
ring->cpu, ring->key);
fprint_hex(stdout, e->data, e->size, " ");
printf("\n");
@@ -185,7 +180,7 @@ static void perf_event_unmap(void *mem)
static int bpf_perf_event_open(int map_fd, int key, int cpu)
{
struct perf_event_attr attr = {
- .sample_type = PERF_SAMPLE_RAW,
+ .sample_type = PERF_SAMPLE_RAW | PERF_SAMPLE_TIME,
.type = PERF_TYPE_SOFTWARE,
.config = PERF_COUNT_SW_BPF_OUTPUT,
};
--
2.17.0
^ permalink raw reply related
* [PATCH bpf-next 0/7] bpf: add perf event reading loop and move samples closer to libbpf
From: Jakub Kicinski @ 2018-05-10 17:24 UTC (permalink / raw)
To: alexei.starovoitov, daniel; +Cc: oss-drivers, netdev, Jakub Kicinski
Hi!
This series started out as a follow up to the bpftool perf event dumping
patches.
As suggested by Daniel patch 1 makes use of PERF_SAMPLE_TIME to simplify
code and improve accuracy of timestamps.
Remaining patches are trying to move perf event loop into libbpf as
suggested by Alexei. One user for this new function is bpftool which
links with libbpf nicely, the other, unfortunately, is in samples/bpf.
Remaining patches make samples/bpf link against full libbpf.a (not just
a handful of objects). Once we have full power of libbpf at our disposal
we can convert some of XDP samples to use libbpf loader instead of
bpf_load.c. My understanding is that this is the desired direction,
at least for networking code.
Jakub Kicinski (7):
tools: bpftool: use PERF_SAMPLE_TIME instead of reading the clock
samples: bpf: rename struct bpf_map_def to avoid conflict with libbpf
samples: bpf: compile and link against full libbpf
tools: bpf: move the event reading loop to libbpf
tools: bpf: improve comments in libbpf.h
tools: bpf: don't complain about no kernel version for networking code
samples: bpf: convert some XDP samples from bpf_load to libbpf
samples/bpf/Makefile | 37 +++++--
samples/bpf/bpf_load.c | 10 +-
samples/bpf/bpf_load.h | 4 +-
samples/bpf/trace_output_user.c | 6 +-
samples/bpf/xdp1_user.c | 31 ++++--
samples/bpf/xdp_adjust_tail_user.c | 36 ++++---
samples/bpf/xdp_rxq_info_user.c | 46 ++++++---
tools/bpf/bpftool/map_perf_ring.c | 83 ++++-----------
tools/lib/bpf/Makefile | 2 +-
tools/lib/bpf/libbpf.c | 107 ++++++++++++++++++--
tools/lib/bpf/libbpf.h | 61 ++++++-----
tools/testing/selftests/bpf/Makefile | 2 +-
tools/testing/selftests/bpf/test_progs.c | 6 +-
tools/testing/selftests/bpf/trace_helpers.c | 87 ++++++----------
tools/testing/selftests/bpf/trace_helpers.h | 11 +-
15 files changed, 307 insertions(+), 222 deletions(-)
--
2.17.0
^ permalink raw reply
* [PATCH] [PATCH net v2] ipv6: remove min MTU check for ipsec tunnels
From: Ashwanth Goli @ 2018-05-10 17:19 UTC (permalink / raw)
To: netdev, davem; +Cc: pabeni, dsahern, eric.dumazet
With 749439bfac "fix udpv6 sendmsg crash caused by too small MTU"
ipsec tunnels that report a MTU less than IPV6_MIN_MTU are broken
even for packets that are smaller than IPV6_MIN_MTU.
According to rfc2473#section-7.1
if the original IPv6 packet is equal or smaller than the
IPv6 minimum link MTU, the tunnel entry-point node
encapsulates the original packet, and subsequently
fragments the resulting IPv6 tunnel packet into IPv6
fragments that do not exceed the Path MTU to the tunnel
exit-point.
Dropping the MTU check for ipsec tunnel destinations.
Signed-off-by: Ashwanth Goli <ashwanth@codeaurora.org>
---
net/ipv6/ip6_output.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 2e891d2..c4c3313 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1235,7 +1235,7 @@ static int ip6_setup_cork(struct sock *sk, struct inet_cork_full *cork,
if (np->frag_size)
mtu = np->frag_size;
}
- if (mtu < IPV6_MIN_MTU)
+ if (!(rt->dst.flags & DST_XFRM_TUNNEL) && mtu < IPV6_MIN_MTU)
return -EINVAL;
cork->base.fragsize = mtu;
if (dst_allfrag(xfrm_dst_path(&rt->dst)))
--
1.9.1
^ permalink raw reply related
* [PATCH bpf] tools: bpf: handle NULL return in bpf_prog_load_xattr()
From: Jakub Kicinski @ 2018-05-10 17:09 UTC (permalink / raw)
To: alexei.starovoitov, daniel; +Cc: oss-drivers, netdev, Jakub Kicinski
bpf_object__open() can return error pointer as well as NULL.
Fix error handling in bpf_prog_load_xattr() (and indirectly
bpf_prog_load()).
Fixes: 6f6d33f3b3d0 ("bpf: selftests add sockmap tests")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
---
tools/lib/bpf/libbpf.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 5922443063f0..0f9f06df49bc 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -2035,7 +2035,7 @@ int bpf_prog_load_xattr(const struct bpf_prog_load_attr *attr,
return -EINVAL;
obj = bpf_object__open(attr->file);
- if (IS_ERR(obj))
+ if (IS_ERR_OR_NULL(obj))
return -ENOENT;
bpf_object__for_each_program(prog, obj) {
--
2.17.0
^ permalink raw reply related
* Re: [PATCH] [PATCH net v2] ipv6: remove min MTU check for ipsec tunnels
From: Eric Dumazet @ 2018-05-10 17:05 UTC (permalink / raw)
To: Ashwanth Goli, netdev, davem; +Cc: pabeni, dsahern
In-Reply-To: <1525970887-26600-1-git-send-email-ashwanth@codeaurora.org>
On 05/10/2018 09:48 AM, Ashwanth Goli wrote:
> With 749439bfac "fix udpv6 sendmsg crash caused by too small MTU"
> ipsec tunnels that report a MTU less than IPV6_MIN_MTU are broken
> even for packets that are smaller than IPV6_MIN_MTU.
>
> According to rfc2473#section-7.1
>
> if the original IPv6 packet is equal or smaller than the
> IPv6 minimum link MTU, the tunnel entry-point node
> encapsulates the original packet, and subsequently
> fragments the resulting IPv6 tunnel packet into IPv6
> fragments that do not exceed the Path MTU to the tunnel
> exit-point.
>
> Dropping the MTU check for ipsec tunnel destinations.
>
> Signed-off-by: Ashwanth Goli <ashwanth@codeaurora.org>
> ---
> net/ipv6/ip6_output.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
> index 2e891d2..c4c3313 100644
> --- a/net/ipv6/ip6_output.c
> +++ b/net/ipv6/ip6_output.c
> @@ -1235,7 +1235,7 @@ static int ip6_setup_cork(struct sock *sk, struct inet_cork_full *cork,
> if (np->frag_size)
> mtu = np->frag_size;
> }
> - if (mtu < IPV6_MIN_MTU)
> + if (!(rt->dst.flags & DST_XFRM_TUNNEL) && mtu < IPV6_MIN_MTU)
> return -EINVAL;
> cork->base.fragsize = mtu;
> if (dst_allfrag(xfrm_dst_path(&rt->dst)))
>
Given that the (mtu < IPV6_MIN_MTU) condition is unlikely, I would reorder to :
if (mtu < IPV6_MIN_MTU && !(rt->dst.flags & DST_XFRM_TUNNEL))
return -EINVAL;
The fast path should not look at dst.flags
^ permalink raw reply
* [PATCH] [PATCH net v2] ipv6: remove min MTU check for ipsec tunnels
From: Ashwanth Goli @ 2018-05-10 16:48 UTC (permalink / raw)
To: netdev, davem; +Cc: pabeni, dsahern
With 749439bfac "fix udpv6 sendmsg crash caused by too small MTU"
ipsec tunnels that report a MTU less than IPV6_MIN_MTU are broken
even for packets that are smaller than IPV6_MIN_MTU.
According to rfc2473#section-7.1
if the original IPv6 packet is equal or smaller than the
IPv6 minimum link MTU, the tunnel entry-point node
encapsulates the original packet, and subsequently
fragments the resulting IPv6 tunnel packet into IPv6
fragments that do not exceed the Path MTU to the tunnel
exit-point.
Dropping the MTU check for ipsec tunnel destinations.
Signed-off-by: Ashwanth Goli <ashwanth@codeaurora.org>
---
net/ipv6/ip6_output.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 2e891d2..c4c3313 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1235,7 +1235,7 @@ static int ip6_setup_cork(struct sock *sk, struct inet_cork_full *cork,
if (np->frag_size)
mtu = np->frag_size;
}
- if (mtu < IPV6_MIN_MTU)
+ if (!(rt->dst.flags & DST_XFRM_TUNNEL) && mtu < IPV6_MIN_MTU)
return -EINVAL;
cork->base.fragsize = mtu;
if (dst_allfrag(xfrm_dst_path(&rt->dst)))
--
1.9.1
^ permalink raw reply related
* [PATCH 1/2] can: hi311x: Acquire SPI lock on ->do_get_berr_counter
From: Marc Kleine-Budde @ 2018-05-10 16:47 UTC (permalink / raw)
To: netdev
Cc: davem, linux-can, kernel, Lukas Wunner, Mathias Duckeck,
Akshay Bhat, Casey Fitzpatrick, Stef Walter, Karel Zak, stable,
Marc Kleine-Budde
In-Reply-To: <20180510164749.20481-1-mkl@pengutronix.de>
From: Lukas Wunner <lukas@wunner.de>
hi3110_get_berr_counter() may run concurrently to the rest of the driver
but neglects to acquire the lock protecting access to the SPI device.
As a result, it and the rest of the driver may clobber each other's tx
and rx buffers.
We became aware of this issue because transmission of packets with
"cangen -g 0 -i -x" frequently hung. It turns out that agetty executes
->do_get_berr_counter every few seconds via the following call stack:
CPU: 2 PID: 1605 Comm: agetty
[<7f3f7500>] (hi3110_get_berr_counter [hi311x])
[<7f130204>] (can_fill_info [can_dev])
[<80693bc0>] (rtnl_fill_ifinfo)
[<806949ec>] (rtnl_dump_ifinfo)
[<806b4834>] (netlink_dump)
[<806b4bc8>] (netlink_recvmsg)
[<8065f180>] (sock_recvmsg)
[<80660f90>] (___sys_recvmsg)
[<80661e7c>] (__sys_recvmsg)
[<80661ec0>] (SyS_recvmsg)
[<80108b20>] (ret_fast_syscall+0x0/0x1c)
agetty listens to netlink messages in order to update the login prompt
when IP addresses change (if /etc/issue contains \4 or \6 escape codes):
https://git.kernel.org/pub/scm/utils/util-linux/util-linux.git/commit/?id=e36deb6424e8
It's a useful feature, though it seems questionable that it causes CAN
bit error statistics to be queried.
Be that as it may, if hi3110_get_berr_counter() is invoked while a frame
is sent by hi3110_hw_tx(), bogus SPI transfers like the following may
occur:
=> 12 00 (hi3110_get_berr_counter() wanted to transmit
EC 00 to query the transmit error counter,
but the first byte was overwritten by
hi3110_hw_tx_frame())
=> EA 00 3E 80 01 FB (hi3110_hw_tx_frame() wanted to transmit a
frame, but the first byte was overwritten by
hi3110_get_berr_counter() because it wanted
to query the receive error counter)
This sequence hangs the transmission because the driver believes it has
sent a frame and waits for the interrupt signaling completion, but in
reality the chip has never sent away the frame since the commands it
received were malformed.
Fix by acquiring the SPI lock in hi3110_get_berr_counter().
I've scrutinized the entire driver for further unlocked SPI accesses but
found no others.
Cc: Mathias Duckeck <m.duckeck@kunbus.de>
Cc: Akshay Bhat <akshay.bhat@timesys.com>
Cc: Casey Fitzpatrick <casey.fitzpatrick@timesys.com>
Cc: Stef Walter <stefw@redhat.com>
Cc: Karel Zak <kzak@redhat.com>
Cc: stable@vger.kernel.org # v4.12+
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Akshay Bhat <akshay.bhat@timesys.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
---
drivers/net/can/spi/hi311x.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/net/can/spi/hi311x.c b/drivers/net/can/spi/hi311x.c
index 5590c559a8ca..c2cf254e4e95 100644
--- a/drivers/net/can/spi/hi311x.c
+++ b/drivers/net/can/spi/hi311x.c
@@ -427,8 +427,10 @@ static int hi3110_get_berr_counter(const struct net_device *net,
struct hi3110_priv *priv = netdev_priv(net);
struct spi_device *spi = priv->spi;
+ mutex_lock(&priv->hi3110_lock);
bec->txerr = hi3110_read(spi, HI3110_READ_TEC);
bec->rxerr = hi3110_read(spi, HI3110_READ_REC);
+ mutex_unlock(&priv->hi3110_lock);
return 0;
}
--
2.17.0
^ permalink raw reply related
* pull-request: can 2018-05-10
From: Marc Kleine-Budde @ 2018-05-10 16:47 UTC (permalink / raw)
To: netdev; +Cc: davem, linux-can, kernel
Hello David,
this is a pull request for net/master consisting of 2 patches.
Both patches are from Lukas Wunner and fix two problems found in the hi311x CAN
driver under high load situations.
regards,
Marc
---
The following changes since commit 4a026da91caaa36004a53a844dd00959370ea8fc:
net/9p: correct some comment errors in 9p file system code (2018-05-10 08:21:53 -0400)
are available in the Git repository at:
ssh://git@gitolite.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can.git tags/linux-can-fixes-for-4.17-20180510
for you to fetch changes up to 32bee8f48fa048a3198109de50e51c092507ff52:
can: hi311x: Work around TX complete interrupt erratum (2018-05-10 18:25:30 +0200)
----------------------------------------------------------------
linux-can-fixes-for-4.17-20180510
----------------------------------------------------------------
Lukas Wunner (2):
can: hi311x: Acquire SPI lock on ->do_get_berr_counter
can: hi311x: Work around TX complete interrupt erratum
drivers/net/can/spi/hi311x.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)
^ permalink raw reply
* [PATCH 2/2] can: hi311x: Work around TX complete interrupt erratum
From: Marc Kleine-Budde @ 2018-05-10 16:47 UTC (permalink / raw)
To: netdev
Cc: davem, linux-can, kernel, Lukas Wunner, Mathias Duckeck,
Akshay Bhat, Casey Fitzpatrick, stable, Marc Kleine-Budde
In-Reply-To: <20180510164749.20481-1-mkl@pengutronix.de>
From: Lukas Wunner <lukas@wunner.de>
When sending packets as fast as possible using "cangen -g 0 -i -x", the
HI-3110 occasionally latches the interrupt pin high on completion of a
packet, but doesn't set the TXCPLT bit in the INTF register. The INTF
register contains 0x00 as if no interrupt has occurred. Even waiting
for a few milliseconds after the interrupt doesn't help.
Work around this apparent erratum by instead checking the TXMTY bit in
the STATF register ("TX FIFO empty"). We know that we've queued up a
packet for transmission if priv->tx_len is nonzero. If the TX FIFO is
empty, transmission of that packet must have completed.
Note that this is congruent with our handling of received packets, which
likewise gleans from the STATF register whether a packet is waiting in
the RX FIFO, instead of looking at the INTF register.
Cc: Mathias Duckeck <m.duckeck@kunbus.de>
Cc: Akshay Bhat <akshay.bhat@timesys.com>
Cc: Casey Fitzpatrick <casey.fitzpatrick@timesys.com>
Cc: stable@vger.kernel.org # v4.12+
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Acked-by: Akshay Bhat <akshay.bhat@timesys.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
---
drivers/net/can/spi/hi311x.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/drivers/net/can/spi/hi311x.c b/drivers/net/can/spi/hi311x.c
index c2cf254e4e95..53e320c92a8b 100644
--- a/drivers/net/can/spi/hi311x.c
+++ b/drivers/net/can/spi/hi311x.c
@@ -91,6 +91,7 @@
#define HI3110_STAT_BUSOFF BIT(2)
#define HI3110_STAT_ERRP BIT(3)
#define HI3110_STAT_ERRW BIT(4)
+#define HI3110_STAT_TXMTY BIT(7)
#define HI3110_BTR0_SJW_SHIFT 6
#define HI3110_BTR0_BRP_SHIFT 0
@@ -737,10 +738,7 @@ static irqreturn_t hi3110_can_ist(int irq, void *dev_id)
}
}
- if (intf == 0)
- break;
-
- if (intf & HI3110_INT_TXCPLT) {
+ if (priv->tx_len && statf & HI3110_STAT_TXMTY) {
net->stats.tx_packets++;
net->stats.tx_bytes += priv->tx_len - 1;
can_led_event(net, CAN_LED_EVENT_TX);
@@ -750,6 +748,9 @@ static irqreturn_t hi3110_can_ist(int irq, void *dev_id)
}
netif_wake_queue(net);
}
+
+ if (intf == 0)
+ break;
}
mutex_unlock(&priv->hi3110_lock);
return IRQ_HANDLED;
--
2.17.0
^ permalink raw reply related
* Re: [PATCH v6 5/5] PCI: Remove unused pcie_get_minimum_link()
From: Bjorn Helgaas @ 2018-05-10 16:33 UTC (permalink / raw)
To: Jeff Kirsher, Ganesh Goudar, Michael Chan, Ariel Elior
Cc: linux-pci, everest-linux-l2, intel-wired-lan, netdev,
linux-kernel, Tal Gilboa, Tariq Toukan, Jacob Keller,
Jakub Kicinski
In-Reply-To: <152537764322.62474.552711932384023140.stgit@bhelgaas-glaptop.roam.corp.google.com>
On Thu, May 03, 2018 at 03:00:43PM -0500, Bjorn Helgaas wrote:
> From: Bjorn Helgaas <bhelgaas@google.com>
>
> In some cases pcie_get_minimum_link() returned misleading information
> because it found the slowest link and the narrowest link without
> considering the total bandwidth of the link.
>
> For example, consider a path with these two links:
>
> - 16.0 GT/s x1 link (16.0 * 10^9 * 128 / 130) * 1 / 8 = 1969 MB/s
> - 2.5 GT/s x16 link ( 2.5 * 10^9 * 8 / 10) * 16 / 8 = 4000 MB/s
>
> The available bandwidth of the path is limited by the 16 GT/s link to about
> 1969 MB/s, but pcie_get_minimum_link() returned 2.5 GT/s x1, which
> corresponds to only 250 MB/s.
>
> Callers should use pcie_print_link_status() instead, or
> pcie_bandwidth_available() if they need more detailed information.
>
> Remove pcie_get_minimum_link() since there are no callers left.
>
> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Hi Jeff,
I got your note that you applied this to dev-queue. I assume that
means you also applied the preceding patches that removed all the
users. I got a note about ixgbe, but not the others, so I'm just
double-checking.
> ---
> drivers/pci/pci.c | 43 -------------------------------------------
> include/linux/pci.h | 2 --
> 2 files changed, 45 deletions(-)
>
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index e597655a5643..4bafa817c40a 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -5069,49 +5069,6 @@ int pcie_set_mps(struct pci_dev *dev, int mps)
> }
> EXPORT_SYMBOL(pcie_set_mps);
>
> -/**
> - * pcie_get_minimum_link - determine minimum link settings of a PCI device
> - * @dev: PCI device to query
> - * @speed: storage for minimum speed
> - * @width: storage for minimum width
> - *
> - * This function will walk up the PCI device chain and determine the minimum
> - * link width and speed of the device.
> - */
> -int pcie_get_minimum_link(struct pci_dev *dev, enum pci_bus_speed *speed,
> - enum pcie_link_width *width)
> -{
> - int ret;
> -
> - *speed = PCI_SPEED_UNKNOWN;
> - *width = PCIE_LNK_WIDTH_UNKNOWN;
> -
> - while (dev) {
> - u16 lnksta;
> - enum pci_bus_speed next_speed;
> - enum pcie_link_width next_width;
> -
> - ret = pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &lnksta);
> - if (ret)
> - return ret;
> -
> - next_speed = pcie_link_speed[lnksta & PCI_EXP_LNKSTA_CLS];
> - next_width = (lnksta & PCI_EXP_LNKSTA_NLW) >>
> - PCI_EXP_LNKSTA_NLW_SHIFT;
> -
> - if (next_speed < *speed)
> - *speed = next_speed;
> -
> - if (next_width < *width)
> - *width = next_width;
> -
> - dev = dev->bus->self;
> - }
> -
> - return 0;
> -}
> -EXPORT_SYMBOL(pcie_get_minimum_link);
> -
> /**
> * pcie_bandwidth_available - determine minimum link settings of a PCIe
> * device and its bandwidth limitation
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 73178a2fcee0..230615620a4a 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -1079,8 +1079,6 @@ int pcie_get_readrq(struct pci_dev *dev);
> int pcie_set_readrq(struct pci_dev *dev, int rq);
> int pcie_get_mps(struct pci_dev *dev);
> int pcie_set_mps(struct pci_dev *dev, int mps);
> -int pcie_get_minimum_link(struct pci_dev *dev, enum pci_bus_speed *speed,
> - enum pcie_link_width *width);
> u32 pcie_bandwidth_available(struct pci_dev *dev, struct pci_dev **limiting_dev,
> enum pci_bus_speed *speed,
> enum pcie_link_width *width);
>
^ permalink raw reply
* Re: [PATCH] can: hi311x: Work around TX complete interrupt erratum
From: Marc Kleine-Budde @ 2018-05-10 16:27 UTC (permalink / raw)
To: Lukas Wunner, Wolfgang Grandegger, linux-can, netdev
Cc: Mathias Duckeck, Akshay Bhat, Casey Fitzpatrick
In-Reply-To: <dc7901c96bed90724bb372f4d3eead82bd22ac37.1525869303.git.lukas@wunner.de>
On 05/09/2018 02:43 PM, Lukas Wunner wrote:
> When sending packets as fast as possible using "cangen -g 0 -i -x", the
> HI-3110 occasionally latches the interrupt pin high on completion of a
> packet, but doesn't set the TXCPLT bit in the INTF register. The INTF
> register contains 0x00 as if no interrupt has occurred. Even waiting
> for a few milliseconds after the interrupt doesn't help.
>
> Work around this apparent erratum by instead checking the TXMTY bit in
> the STATF register ("TX FIFO empty"). We know that we've queued up a
> packet for transmission if priv->tx_len is nonzero. If the TX FIFO is
> empty, transmission of that packet must have completed.
>
> Note that this is congruent with our handling of received packets, which
> likewise gleans from the STATF register whether a packet is waiting in
> the RX FIFO, instead of looking at the INTF register.
>
> Cc: Mathias Duckeck <m.duckeck@kunbus.de>
> Cc: Akshay Bhat <akshay.bhat@timesys.com>
> Cc: Casey Fitzpatrick <casey.fitzpatrick@timesys.com>
> Cc: stable@vger.kernel.org # v4.12+
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
Applied to can.
Tnx,
Marc
--
Pengutronix e.K. | Marc Kleine-Budde |
Industrial Linux Solutions | Phone: +49-231-2826-924 |
Vertretung West/Dortmund | Fax: +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686 | http://www.pengutronix.de |
^ permalink raw reply
* Re: [PATCH] can: hi311x: Work around TX complete interrupt erratum
From: Akshay Bhat @ 2018-05-10 16:23 UTC (permalink / raw)
To: Lukas Wunner
Cc: Marc Kleine-Budde, Wolfgang Grandegger, linux-can, netdev,
Mathias Duckeck, Casey Fitzpatrick
In-Reply-To: <dc7901c96bed90724bb372f4d3eead82bd22ac37.1525869303.git.lukas@wunner.de>
On Wed, May 9, 2018 at 8:43 AM, Lukas Wunner <lukas@wunner.de> wrote:
> When sending packets as fast as possible using "cangen -g 0 -i -x", the
> HI-3110 occasionally latches the interrupt pin high on completion of a
> packet, but doesn't set the TXCPLT bit in the INTF register. The INTF
> register contains 0x00 as if no interrupt has occurred. Even waiting
> for a few milliseconds after the interrupt doesn't help.
>
> Work around this apparent erratum by instead checking the TXMTY bit in
> the STATF register ("TX FIFO empty"). We know that we've queued up a
> packet for transmission if priv->tx_len is nonzero. If the TX FIFO is
> empty, transmission of that packet must have completed.
>
> Note that this is congruent with our handling of received packets, which
> likewise gleans from the STATF register whether a packet is waiting in
> the RX FIFO, instead of looking at the INTF register.
>
> Cc: Mathias Duckeck <m.duckeck@kunbus.de>
> Cc: Akshay Bhat <akshay.bhat@timesys.com>
> Cc: Casey Fitzpatrick <casey.fitzpatrick@timesys.com>
> Cc: stable@vger.kernel.org # v4.12+
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> ---
Acked-by: Akshay Bhat <akshay.bhat@timesys.com>
^ permalink raw reply
* Re: [PATCH] can: hi311x: Acquire SPI lock on ->do_get_berr_counter
From: Marc Kleine-Budde @ 2018-05-10 16:23 UTC (permalink / raw)
To: Lukas Wunner, Wolfgang Grandegger, linux-can, netdev
Cc: Mathias Duckeck, Akshay Bhat, Casey Fitzpatrick, Stef Walter,
Karel Zak
In-Reply-To: <b1518690dd293a008959b56cc52f3f29f52be25e.1525869191.git.lukas@wunner.de>
On 05/09/2018 02:38 PM, Lukas Wunner wrote:
> hi3110_get_berr_counter() may run concurrently to the rest of the driver
> but neglects to acquire the lock protecting access to the SPI device.
> As a result, it and the rest of the driver may clobber each other's tx
> and rx buffers.
>
> We became aware of this issue because transmission of packets with
> "cangen -g 0 -i -x" frequently hung. It turns out that agetty executes
> ->do_get_berr_counter every few seconds via the following call stack:
>
> CPU: 2 PID: 1605 Comm: agetty
> [<7f3f7500>] (hi3110_get_berr_counter [hi311x])
> [<7f130204>] (can_fill_info [can_dev])
> [<80693bc0>] (rtnl_fill_ifinfo)
> [<806949ec>] (rtnl_dump_ifinfo)
> [<806b4834>] (netlink_dump)
> [<806b4bc8>] (netlink_recvmsg)
> [<8065f180>] (sock_recvmsg)
> [<80660f90>] (___sys_recvmsg)
> [<80661e7c>] (__sys_recvmsg)
> [<80661ec0>] (SyS_recvmsg)
> [<80108b20>] (ret_fast_syscall+0x0/0x1c)
>
> agetty listens to netlink messages in order to update the login prompt
> when IP addresses change (if /etc/issue contains \4 or \6 escape codes):
> https://git.kernel.org/pub/scm/utils/util-linux/util-linux.git/commit/?id=e36deb6424e8
>
> It's a useful feature, though it seems questionable that it causes CAN
> bit error statistics to be queried.
>
> Be that as it may, if hi3110_get_berr_counter() is invoked while a frame
> is sent by hi3110_hw_tx(), bogus SPI transfers like the following may
> occur:
>
> => 12 00 (hi3110_get_berr_counter() wanted to transmit
> EC 00 to query the transmit error counter,
> but the first byte was overwritten by
> hi3110_hw_tx_frame())
>
> => EA 00 3E 80 01 FB (hi3110_hw_tx_frame() wanted to transmit a
> frame, but the first byte was overwritten by
> hi3110_get_berr_counter() because it wanted
> to query the receive error counter)
>
> This sequence hangs the transmission because the driver believes it has
> sent a frame and waits for the interrupt signaling completion, but in
> reality the chip has never sent away the frame since the commands it
> received were malformed.
>
> Fix by acquiring the SPI lock in hi3110_get_berr_counter().
>
> I've scrutinized the entire driver for further unlocked SPI accesses but
> found no others.
>
> Cc: Mathias Duckeck <m.duckeck@kunbus.de>
> Cc: Akshay Bhat <akshay.bhat@timesys.com>
> Cc: Casey Fitzpatrick <casey.fitzpatrick@timesys.com>
> Cc: Stef Walter <stefw@redhat.com>
> Cc: Karel Zak <kzak@redhat.com>
> Cc: stable@vger.kernel.org # v4.12+
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
Applied to can.
Tnx,
Marc
--
Pengutronix e.K. | Marc Kleine-Budde |
Industrial Linux Solutions | Phone: +49-231-2826-924 |
Vertretung West/Dortmund | Fax: +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686 | http://www.pengutronix.de |
^ permalink raw reply
* Re: [PATCH bpf v3] x86/cpufeature: bpf hack for clang not supporting asm goto
From: Borislav Petkov @ 2018-05-10 16:20 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Peter Zijlstra, Yonghong Song, mingo, torvalds, ast, daniel,
linux-kernel, x86, netdev, kernel-team, Thomas Gleixner
In-Reply-To: <20180510155240.5s2fpgm2fwal3jlj@ast-mbp.dhcp.thefacebook.com>
On Thu, May 10, 2018 at 08:52:42AM -0700, Alexei Starovoitov wrote:
> That makes me wonder what happened with "we do not break user space" rule?
As someone already pointed out on IRC, arch/x86/include/asm/cpufeature.h
is solely a kernel header so nothing but kernel should include it. So
forget the userspace breakage "argument".
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
^ permalink raw reply
* Re: [PATCH] selinux: add AF_UNSPEC and INADDR_ANY checks to selinux_socket_bind()
From: Paul Moore @ 2018-05-10 16:05 UTC (permalink / raw)
To: Alexey Kodanev
Cc: Stephen Smalley, Richard Haines, selinux, Eric Paris,
linux-security-module, netdev
In-Reply-To: <aab86e3c-0891-766b-53aa-ad471394ac46@oracle.com>
On Thu, May 10, 2018 at 5:28 AM, Alexey Kodanev
<alexey.kodanev@oracle.com> wrote:
> On 10.05.2018 01:02, Paul Moore wrote:
> ...
>> I just had a better look at this and I believe that Alexey and Stephen
>> are right: this is the best option. My apologies for the noise
>> earlier. However, while looking at the code I think there are some
>> additional necessary changes:
>>
>> * In the case of an SCTP socket, we should return -EINVAL, just as we
>> do with other address families.
>
> Right.
>
>> * While not strictly related to AF_UNSPEC, we really should be passing
>> the address family of the sockaddr, and not the socket, to functions
>> that need to interpret the bind address/port.
>
> That looks like a correct solution. I guess we need the same fix for
> sctp_connectx(), in selinux_socket_connect_helper().
Yes. I think we also need a small change to
selinux_sctp_bind_connect() to both not error out on AF_UNSPEC, and to
return EINVAL on a bad address family (some of the callers pass on the
return value, some don't).
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 5f30045b2053..a8bac9b37ee7 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -5277,6 +5277,7 @@ static int selinux_sctp_bind_connect(struct sock *sk, int>
while (walk_size < addrlen) {
addr = addr_buf;
switch (addr->sa_family) {
+ case AF_UNSPEC:
case AF_INET:
len = sizeof(struct sockaddr_in);
break;
@@ -5284,7 +5285,7 @@ static int selinux_sctp_bind_connect(struct sock *sk, int>
len = sizeof(struct sockaddr_in6);
break;
default:
- return -EAFNOSUPPORT;
+ return -EINVAL;
}
err = -EINVAL;
>> I'm waiting for my kernel to compile so I haven't given this any
>> sanity testing, but the patch below is what I think we need ...
>>
>> diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
>> index 4cafe6a19167..5f30045b2053 100644
>> --- a/security/selinux/hooks.c
>> +++ b/security/selinux/hooks.c
>> @@ -4576,6 +4576,7 @@ static int selinux_socket_post_create(struct socket *sock,
>> int family,
>> static int selinux_socket_bind(struct socket *sock, struct sockaddr *address, i
>> nt addrlen)
>> {
>> struct sock *sk = sock->sk;
>> + struct sk_security_struct *sksec = sk->sk_security;
>> u16 family;
>> int err;
>>
>> @@ -4587,13 +4588,13 @@ static int selinux_socket_bind(struct socket *sock, stru
>> ct sockaddr *address, in
>> family = sk->sk_family;
>> if (family == PF_INET || family == PF_INET6) {
>> char *addrp;
>> - struct sk_security_struct *sksec = sk->sk_security;
>> struct common_audit_data ad;
>> struct lsm_network_audit net = {0,};
>> struct sockaddr_in *addr4 = NULL;
>> struct sockaddr_in6 *addr6 = NULL;
>> unsigned short snum;
>> u32 sid, node_perm;
>> + u16 family_sa = address->sa_family;
>>
>> /*
>> * sctp_bindx(3) calls via selinux_sctp_bind_connect()
>> @@ -4601,11 +4602,19 @@ static int selinux_socket_bind(struct socket *sock, stru
>> ct sockaddr *address, in
>> * need to check address->sa_family as it is possible to have
>> * sk->sk_family = PF_INET6 with addr->sa_family = AF_INET.
>> */
>> - switch (address->sa_family) {
>> + switch (family_sa) {
>> + case AF_UNSPEC:
>> case AF_INET:
>> if (addrlen < sizeof(struct sockaddr_in))
>> return -EINVAL;
>> addr4 = (struct sockaddr_in *)address;
>> + if (family_sa == AF_UNSPEC) {
>> + /* see "__inet_bind()", we only want to allow
>> + * AF_UNSPEC if the address is INADDR_ANY */
>> + if (addr4->sin_addr.s_addr != htonl(INADDR_ANY))
>> + goto err_af;
>> + family_sa = AF_INET;
>> + }
>> snum = ntohs(addr4->sin_port);
>> addrp = (char *)&addr4->sin_addr.s_addr;
>> break;
>> @@ -4617,15 +4626,14 @@ static int selinux_socket_bind(struct socket *sock, stru
>> ct sockaddr *address, in
>> addrp = (char *)&addr6->sin6_addr.s6_addr;
>> break;
>> default:
>> - /* Note that SCTP services expect -EINVAL, whereas
>> - * others expect -EAFNOSUPPORT.
>> - */
>> - if (sksec->sclass == SECCLASS_SCTP_SOCKET)
>> - return -EINVAL;
>> - else
>> - return -EAFNOSUPPORT;
>> + goto err_af;
>> }
>>
>> + ad.type = LSM_AUDIT_DATA_NET;
>> + ad.u.net = &net;
>> + ad.u.net->sport = htons(snum);
>> + ad.u.net->family = family_sa;
>> +
>
> May be we could move setting ad.u.net->v{4|6}info.saddr here as well?
I looked at that too, the problem is that if we set the IP address
here it will be reported in the audit record for a name_bind failure,
which is a change from the current behavior. One could argue this is
the correct thing to do, but I would like to limit the number of
changes for patches that are destined for the -rcX stream.
Let's leave them separate for now.
> Will send a v2 of this patch so that SCTP socket returns EINVAL with
> AF_UNSPEC. Should I prepare a patch with correcting 'ad.u.net->family'
> and sel_netnode_sid()?
Please, that would be helpful. I think all of the issues we have
identified in this thread should be fixed during the v4.17-rcX
releases, so if you don't do it I'll need to do it :)
--
paul moore
www.paul-moore.com
^ permalink raw reply related
* Re: [PATCH bpf v3] x86/cpufeature: bpf hack for clang not supporting asm goto
From: Alexei Starovoitov @ 2018-05-10 15:52 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Yonghong Song, mingo, torvalds, ast, daniel, linux-kernel, x86,
netdev, kernel-team, Thomas Gleixner
In-Reply-To: <20180510100634.GZ12217@hirez.programming.kicks-ass.net>
On Thu, May 10, 2018 at 12:06:34PM +0200, Peter Zijlstra wrote:
> On Thu, May 03, 2018 at 08:31:19PM -0700, Yonghong Song wrote:
>
> > This approach is preferred since the already deployed bcc scripts, or
> > any other bpf applicaitons utilizing LLVM JIT compilation functionality,
> > will continue work with the new kernel without re-compilation and
> > re-deployment.
>
> So I really hate this and would much rather see the BPF build
> environment changed. It not consistenyly having __BPF__ defined really
> smells like a bug on your end.
>
> Sometimes you just need to update tools... Is it really too hard to do
> -D__BPF__ in the bpf build process that we need to mollest the kernel
> for it?
>
> > Note that this is a hack in the kernel to workaround bpf compilation issue.
> > The hack will be removed once clang starts to support asm goto.
>
> Note that that ^^ already mandates people re-deploy their bpf tools, so
> why is llvm supporting asm-goto a better point to re-deploy than fixing
> a consistent __BPF__ define for the bpf build environment?
This thread has been going for over a month now. Looks like we're starting
to go in circles and what we discussed earlier got lost. To recap:
- this is NOT a bpf issue. libbcc is the first user that got broken
by commit d0266046ad54 ("x86: Remove FAST_FEATURE_TESTS")
- all other libraries and tools (like fuzzers, static code analyzers, etc)
that are using clang front-end to process kernel headers are broken too
That makes me wonder what happened with "we do not break user space" rule?
I think the safest course of action would be to partially revert the
offending commit d0266046ad54 which we proposed first.
Since you were concerned that this is somehow will disincentivize clang
folks to add asm-goto support, we agreed to add this __NO_CLANG_BPF_HACK
macro to un-break libbcc at least, so that existing libbcc installations
will continue to work when people upgrade kernels.
The __BPF__ macro is automatically defined by clang when '-target bpf' is used.
This is NOT the case here. libbcc has to use native target when
processing kernel headers.
It happens from time to time that one or the other libbcc script gets
broken by new kernel because kprobe symbol that the script is using
got changed in the recent kernel. That's ok and we deal with this situation
all the time. What's different this time that ALL bcc scripts are broken
because of this commit and we cannot workaround in user space.
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox