* Re: [net-next PATCH v5 1/6] net: virtio dynamically disable/enable LRO
From: Michael S. Tsirkin @ 2016-12-09 3:05 UTC (permalink / raw)
To: John Fastabend
Cc: daniel, shm, davem, tgraf, alexei.starovoitov, john.r.fastabend,
netdev, brouer
In-Reply-To: <5849F52A.7050105@gmail.com>
On Thu, Dec 08, 2016 at 04:04:58PM -0800, John Fastabend wrote:
> On 16-12-08 01:36 PM, Michael S. Tsirkin wrote:
> > On Wed, Dec 07, 2016 at 12:11:11PM -0800, John Fastabend wrote:
> >> This adds support for dynamically setting the LRO feature flag. The
> >> message to control guest features in the backend uses the
> >> CTRL_GUEST_OFFLOADS msg type.
> >>
> >> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> >> ---
> >> drivers/net/virtio_net.c | 40 +++++++++++++++++++++++++++++++++++++++-
> >> 1 file changed, 39 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> >> index a21d93a..a5c47b1 100644
> >> --- a/drivers/net/virtio_net.c
> >> +++ b/drivers/net/virtio_net.c
> >> @@ -1419,6 +1419,36 @@ static void virtnet_init_settings(struct net_device *dev)
> >> .set_settings = virtnet_set_settings,
> >> };
> >>
> >> +static int virtnet_set_features(struct net_device *netdev,
> >> + netdev_features_t features)
> >> +{
> >> + struct virtnet_info *vi = netdev_priv(netdev);
> >> + struct virtio_device *vdev = vi->vdev;
> >> + struct scatterlist sg;
> >> + u64 offloads = 0;
> >> +
> >> + if (features & NETIF_F_LRO)
> >> + offloads |= (1 << VIRTIO_NET_F_GUEST_TSO4) |
> >> + (1 << VIRTIO_NET_F_GUEST_TSO6);
> >> +
> >> + if (features & NETIF_F_RXCSUM)
> >> + offloads |= (1 << VIRTIO_NET_F_GUEST_CSUM);
> >> +
> >> + if (virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS)) {
> >> + sg_init_one(&sg, &offloads, sizeof(uint64_t));
> >> + if (!virtnet_send_command(vi,
> >> + VIRTIO_NET_CTRL_GUEST_OFFLOADS,
> >> + VIRTIO_NET_CTRL_GUEST_OFFLOADS_SET,
> >> + &sg)) {
> >
> > Hmm I just realised that this will slow down setups that bridge
> > virtio net interfaces since bridge calls this if provided.
> > See below.
>
>
> Really? What code is trying to turn off GRO via the GUEST_OFFLOADS LRO
> command? My qemu/Linux setup has a set of tap/vhost devices attached to
> a bridge and all of them have LRO enabled even with this patch series.
>
> I must be missing a setup handler somewhere?
Turning off LRO, not GRO.
> >
> >> + dev_warn(&netdev->dev,
> >> + "Failed to set guest offloads by virtnet command.\n");
> >> + return -EINVAL;
> >> + }
> >> + }
> >
> > Hmm if VIRTIO_NET_F_CTRL_GUEST_OFFLOADS is off, this fails
> > silently. It might actually be a good idea to avoid
> > breaking setups.
> >
> >> +
> >> + return 0;
> >> +}
> >> +
> >> static const struct net_device_ops virtnet_netdev = {
> >> .ndo_open = virtnet_open,
> >> .ndo_stop = virtnet_close,
> >> @@ -1435,6 +1465,7 @@ static void virtnet_init_settings(struct net_device *dev)
> >> #ifdef CONFIG_NET_RX_BUSY_POLL
> >> .ndo_busy_poll = virtnet_busy_poll,
> >> #endif
> >> + .ndo_set_features = virtnet_set_features,
> >> };
> >>
> >> static void virtnet_config_changed_work(struct work_struct *work)
> >> @@ -1815,6 +1846,12 @@ static int virtnet_probe(struct virtio_device *vdev)
> >> if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_CSUM))
> >> dev->features |= NETIF_F_RXCSUM;
> >>
> >> + if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) &&
> >> + virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO6)) {
> >> + dev->features |= NETIF_F_LRO;
> >> + dev->hw_features |= NETIF_F_LRO;
> >
> > So the issue is I think that the virtio "LRO" isn't really
> > LRO, it's typically just GRO forwarded to guests.
> > So these are easily re-split along MTU boundaries,
> > which makes it ok to forward these across bridges.
> >
> > It's not nice that we don't document this in the spec,
> > but it's the reality and people rely on this.
> >
> > For now, how about doing a custom thing and just disable/enable
> > it as XDP is attached/detached?
>
> The annoying part about doing this is ethtool will say that it is fixed
> yet it will be changed by a seemingly unrelated operation.
ATM ethtool does not tell you LRO is enabled, since
again, this isn't exactly LRO. It's a safe variant that
supports e.g. bridging.
> I'm not sure I
> like the idea to start automatically configuring the link via xdp_set.
That's what I thought up off-hand. Let me think about better
interfaces over the weekend.
> >
> >> + }
> >> +
> >> dev->vlan_features = dev->features;
> >>
> >> /* MTU range: 68 - 65535 */
> >> @@ -2057,7 +2094,8 @@ static int virtnet_restore(struct virtio_device *vdev)
> >> VIRTIO_NET_F_CTRL_RX, VIRTIO_NET_F_CTRL_VLAN, \
> >> VIRTIO_NET_F_GUEST_ANNOUNCE, VIRTIO_NET_F_MQ, \
> >> VIRTIO_NET_F_CTRL_MAC_ADDR, \
> >> - VIRTIO_NET_F_MTU
> >> + VIRTIO_NET_F_MTU, \
> >> + VIRTIO_NET_F_CTRL_GUEST_OFFLOADS
> >>
> >> static unsigned int features[] = {
> >> VIRTNET_FEATURES,
^ permalink raw reply
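For readers following the bit arithmetic in the quoted virtnet_set_features(), here is a minimal userspace sketch of how the 64-bit offloads mask could be assembled. The VIRTIO_NET_F_* bit positions are the ones given in the virtio specification; the WANT_* flags and build_guest_offloads() are hypothetical stand-ins for the kernel's NETIF_F_* flags and are not part of the patch. The sketch shifts 1ULL rather than a plain int 1, so the expression stays valid for any bit position in a 64-bit mask.

```c
#include <assert.h>
#include <stdint.h>

/* Feature bit positions as listed in the virtio specification. */
#define VIRTIO_NET_F_GUEST_CSUM 1
#define VIRTIO_NET_F_GUEST_TSO4 7
#define VIRTIO_NET_F_GUEST_TSO6 8

/* Hypothetical stand-ins for NETIF_F_RXCSUM and NETIF_F_LRO. */
#define WANT_RXCSUM (1u << 0)
#define WANT_LRO    (1u << 1)

/* Build the mask that a GUEST_OFFLOADS_SET command would carry. */
static uint64_t build_guest_offloads(unsigned int wanted)
{
	uint64_t offloads = 0;

	/* Only the two hypothetical flags above are understood here. */
	assert((wanted & ~(WANT_RXCSUM | WANT_LRO)) == 0);

	if (wanted & WANT_LRO)
		offloads |= (1ULL << VIRTIO_NET_F_GUEST_TSO4) |
			    (1ULL << VIRTIO_NET_F_GUEST_TSO6);
	if (wanted & WANT_RXCSUM)
		offloads |= 1ULL << VIRTIO_NET_F_GUEST_CSUM;

	return offloads;
}
```

With both flags set the mask has bits 1, 7 and 8 set; with neither it is zero, which would ask the device to turn the guest offloads off.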
* Re: [net-next PATCH v5 0/6] XDP for virtio_net
From: Michael S. Tsirkin @ 2016-12-09 3:01 UTC (permalink / raw)
To: David Miller
Cc: john.fastabend, alexei.starovoitov, daniel, shm, tgraf,
john.r.fastabend, netdev, brouer
In-Reply-To: <20161208.171602.544798927904408724.davem@davemloft.net>
On Thu, Dec 08, 2016 at 05:16:02PM -0500, David Miller wrote:
> From: John Fastabend <john.fastabend@gmail.com>
> Date: Thu, 8 Dec 2016 12:46:07 -0800
>
> > On 16-12-08 11:38 AM, Alexei Starovoitov wrote:
> >> On Thu, Dec 08, 2016 at 02:17:02PM -0500, David Miller wrote:
> >>> From: John Fastabend <john.fastabend@gmail.com>
> >>> Date: Wed, 07 Dec 2016 12:10:47 -0800
> >>>
> >>>> This implements virtio_net for the mergeable buffers and big_packet
> >>>> modes. I tested this with vhost_net running on qemu and did not see
> >>>> any issues. For testing num_buf > 1 I added a hack to vhost driver
> >>>> to only put 100 bytes per buffer.
> >>> ...
> >>>
> >>> So where are we with this?
> >
> > There is one possible issue with a hang that Michael pointed out. I can
> > either spin a v6 or if you pull this v5 series in I can post a bugfix
> > for it. I am not seeing the issue in practice; XDP virtio has been up
> > and running on my box here for days without issue.
>
> If that's the case then please spin a v6 and I'll apply it.
I think 1/6 should be reworked, disabling/enabling GUEST_TSO
transparently without touching the LRO flag, and without adding
an NDO callback.
--
MST
^ permalink raw reply
* [PATCHv3 perf/core 7/7] samples/bpf: Move open_raw_sock to separate header
From: Joe Stringer @ 2016-12-09 2:46 UTC (permalink / raw)
To: linux-kernel; +Cc: wangnan0, ast, daniel, acme, netdev
In-Reply-To: <20161209024620.31660-1-joe@ovn.org>
This function was declared in libbpf.c and was the only remaining
function in this library, but has nothing to do with BPF. Shift it out
into a new header, sock_example.h, and include it from the relevant
samples.
Signed-off-by: Joe Stringer <joe@ovn.org>
---
v3: First post.
---
samples/bpf/Makefile | 2 +-
samples/bpf/fds_example.c | 1 +
samples/bpf/libbpf.h | 3 ---
samples/bpf/sock_example.c | 1 +
samples/bpf/{libbpf.c => sock_example.h} | 3 +--
samples/bpf/sockex1_user.c | 1 +
samples/bpf/sockex2_user.c | 1 +
samples/bpf/sockex3_user.c | 1 +
8 files changed, 7 insertions(+), 6 deletions(-)
rename samples/bpf/{libbpf.c => sock_example.h} (92%)
diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index 0adc47e67e65..7f083faa6e16 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -30,7 +30,7 @@ hostprogs-y += sampleip
hostprogs-y += tc_l2_redirect
# Libbpf dependencies
-LIBBPF := libbpf.o ../../tools/lib/bpf/bpf.o
+LIBBPF := ../../tools/lib/bpf/bpf.o
test_verifier-objs := test_verifier.o $(LIBBPF)
test_maps-objs := test_maps.o $(LIBBPF)
diff --git a/samples/bpf/fds_example.c b/samples/bpf/fds_example.c
index 4ffd8f340496..7ae3b19f5c42 100644
--- a/samples/bpf/fds_example.c
+++ b/samples/bpf/fds_example.c
@@ -14,6 +14,7 @@
#include "bpf_load.h"
#include "libbpf.h"
+#include "sock_example.h"
#define BPF_F_PIN (1 << 0)
#define BPF_F_GET (1 << 1)
diff --git a/samples/bpf/libbpf.h b/samples/bpf/libbpf.h
index 09aedc320009..3705fba453a0 100644
--- a/samples/bpf/libbpf.h
+++ b/samples/bpf/libbpf.h
@@ -185,7 +185,4 @@ struct bpf_insn;
.off = 0, \
.imm = 0 })
-/* create RAW socket and bind to interface 'name' */
-int open_raw_sock(const char *name);
-
#endif
diff --git a/samples/bpf/sock_example.c b/samples/bpf/sock_example.c
index 7ab636c30154..bb418fd0a1f2 100644
--- a/samples/bpf/sock_example.c
+++ b/samples/bpf/sock_example.c
@@ -27,6 +27,7 @@
#include <linux/ip.h>
#include <stddef.h>
#include "libbpf.h"
+#include "sock_example.h"
static int test_sock(void)
{
diff --git a/samples/bpf/libbpf.c b/samples/bpf/sock_example.h
similarity index 92%
rename from samples/bpf/libbpf.c
rename to samples/bpf/sock_example.h
index bee473a494f1..09f7fe7e5fd7 100644
--- a/samples/bpf/libbpf.c
+++ b/samples/bpf/sock_example.h
@@ -1,4 +1,3 @@
-/* eBPF mini library */
#include <stdlib.h>
#include <stdio.h>
#include <linux/unistd.h>
@@ -11,7 +10,7 @@
#include <arpa/inet.h>
#include "libbpf.h"
-int open_raw_sock(const char *name)
+static inline int open_raw_sock(const char *name)
{
struct sockaddr_ll sll;
int sock;
diff --git a/samples/bpf/sockex1_user.c b/samples/bpf/sockex1_user.c
index 2956d893d732..5cdddc5c9015 100644
--- a/samples/bpf/sockex1_user.c
+++ b/samples/bpf/sockex1_user.c
@@ -3,6 +3,7 @@
#include <linux/bpf.h>
#include "libbpf.h"
#include "bpf_load.h"
+#include "sock_example.h"
#include <unistd.h>
#include <arpa/inet.h>
diff --git a/samples/bpf/sockex2_user.c b/samples/bpf/sockex2_user.c
index c43958a67cca..7ae4e2e5cf3a 100644
--- a/samples/bpf/sockex2_user.c
+++ b/samples/bpf/sockex2_user.c
@@ -3,6 +3,7 @@
#include <linux/bpf.h>
#include "libbpf.h"
#include "bpf_load.h"
+#include "sock_example.h"
#include <unistd.h>
#include <arpa/inet.h>
#include <sys/resource.h>
diff --git a/samples/bpf/sockex3_user.c b/samples/bpf/sockex3_user.c
index 2cb9011ea440..a100cf6c95bb 100644
--- a/samples/bpf/sockex3_user.c
+++ b/samples/bpf/sockex3_user.c
@@ -3,6 +3,7 @@
#include <linux/bpf.h>
#include "libbpf.h"
#include "bpf_load.h"
+#include "sock_example.h"
#include <unistd.h>
#include <arpa/inet.h>
#include <sys/resource.h>
--
2.10.2
^ permalink raw reply related
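The move from libbpf.c to a header works because the function becomes static inline: every sample that includes the header gets its own copy, so libbpf.o can drop out of the link line. Below is a minimal sketch of such a header-only helper. The header name, the example_open_raw_sock() name and the function body are assumptions for illustration; the patch shows only the first and last lines of the real open_raw_sock().

```c
/* sock_example_sketch.h: hypothetical header, not the file from the patch */
#ifndef SOCK_EXAMPLE_SKETCH_H
#define SOCK_EXAMPLE_SKETCH_H

#include <assert.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/if_packet.h>
#include <net/ethernet.h>
#include <net/if.h>
#include <arpa/inet.h>

/*
 * Create a raw packet socket bound to interface 'name'.
 * Returns the socket fd, or -1 on any failure.
 */
static inline int example_open_raw_sock(const char *name)
{
	struct sockaddr_ll sll;
	unsigned int ifindex;
	int sock;

	assert(name != NULL);

	/* Resolve the interface first; 0 means it does not exist. */
	ifindex = if_nametoindex(name);
	if (ifindex == 0)
		return -1;

	/* Packet sockets need CAP_NET_RAW; unprivileged callers get -1. */
	sock = socket(PF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
	if (sock < 0)
		return -1;

	memset(&sll, 0, sizeof(sll));
	sll.sll_family = AF_PACKET;
	sll.sll_ifindex = (int)ifindex;
	sll.sll_protocol = htons(ETH_P_ALL);
	if (bind(sock, (struct sockaddr *)&sll, sizeof(sll)) < 0) {
		close(sock);
		return -1;
	}

	return sock;
}

#endif /* SOCK_EXAMPLE_SKETCH_H */
```

A sample would include the header and call example_open_raw_sock("lo"), closing the returned descriptor when done.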
* [PATCHv3 perf/core 6/7] samples/bpf: Remove perf_event_open() declaration
From: Joe Stringer @ 2016-12-09 2:46 UTC (permalink / raw)
To: linux-kernel; +Cc: wangnan0, ast, daniel, acme, netdev
In-Reply-To: <20161209024620.31660-1-joe@ovn.org>
This declaration was made in samples/bpf/libbpf.c for convenience, but
there's already one in tools/perf/perf-sys.h. Reuse that one.
Signed-off-by: Joe Stringer <joe@ovn.org>
---
v3: First post.
---
samples/bpf/Makefile | 3 ++-
samples/bpf/bpf_load.c | 3 ++-
samples/bpf/libbpf.c | 7 -------
samples/bpf/libbpf.h | 3 ---
samples/bpf/sampleip_user.c | 3 ++-
samples/bpf/trace_event_user.c | 9 +++++----
samples/bpf/trace_output_user.c | 3 ++-
samples/bpf/tracex6_user.c | 3 ++-
8 files changed, 15 insertions(+), 19 deletions(-)
diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index c8f7ed37b2de..0adc47e67e65 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -92,7 +92,8 @@ always += test_current_task_under_cgroup_kern.o
always += trace_event_kern.o
always += sampleip_kern.o
-HOSTCFLAGS += -I$(objtree)/usr/include -I$(objtree)/tools/lib/
+HOSTCFLAGS += -I$(objtree)/usr/include -I$(objtree)/tools/lib/ \
+ -I$(objtree)/tools/include -I$(objtree)/tools/perf
HOSTCFLAGS_bpf_load.o += -I$(objtree)/usr/include -Wno-unused-variable
HOSTLOADLIBES_fds_example += -lelf
diff --git a/samples/bpf/bpf_load.c b/samples/bpf/bpf_load.c
index f8e3c58a0897..d683bd278171 100644
--- a/samples/bpf/bpf_load.c
+++ b/samples/bpf/bpf_load.c
@@ -19,6 +19,7 @@
#include <ctype.h>
#include "libbpf.h"
#include "bpf_load.h"
+#include "perf-sys.h"
#define DEBUGFS "/sys/kernel/debug/tracing/"
@@ -168,7 +169,7 @@ static int load_and_attach(const char *event, struct bpf_insn *prog, int size)
id = atoi(buf);
attr.config = id;
- efd = perf_event_open(&attr, -1/*pid*/, 0/*cpu*/, -1/*group_fd*/, 0);
+ efd = sys_perf_event_open(&attr, -1/*pid*/, 0/*cpu*/, -1/*group_fd*/, 0);
if (efd < 0) {
printf("event %d fd %d err %s\n", id, efd, strerror(errno));
return -1;
diff --git a/samples/bpf/libbpf.c b/samples/bpf/libbpf.c
index d9af876b4a2c..bee473a494f1 100644
--- a/samples/bpf/libbpf.c
+++ b/samples/bpf/libbpf.c
@@ -34,10 +34,3 @@ int open_raw_sock(const char *name)
return sock;
}
-
-int perf_event_open(struct perf_event_attr *attr, int pid, int cpu,
- int group_fd, unsigned long flags)
-{
- return syscall(__NR_perf_event_open, attr, pid, cpu,
- group_fd, flags);
-}
diff --git a/samples/bpf/libbpf.h b/samples/bpf/libbpf.h
index cc815624aacf..09aedc320009 100644
--- a/samples/bpf/libbpf.h
+++ b/samples/bpf/libbpf.h
@@ -188,7 +188,4 @@ struct bpf_insn;
/* create RAW socket and bind to interface 'name' */
int open_raw_sock(const char *name);
-struct perf_event_attr;
-int perf_event_open(struct perf_event_attr *attr, int pid, int cpu,
- int group_fd, unsigned long flags);
#endif
diff --git a/samples/bpf/sampleip_user.c b/samples/bpf/sampleip_user.c
index 09ab620b324c..476a11947180 100644
--- a/samples/bpf/sampleip_user.c
+++ b/samples/bpf/sampleip_user.c
@@ -21,6 +21,7 @@
#include <sys/ioctl.h>
#include "libbpf.h"
#include "bpf_load.h"
+#include "perf-sys.h"
#define DEFAULT_FREQ 99
#define DEFAULT_SECS 5
@@ -50,7 +51,7 @@ static int sampling_start(int *pmu_fd, int freq)
};
for (i = 0; i < nr_cpus; i++) {
- pmu_fd[i] = perf_event_open(&pe_sample_attr, -1 /* pid */, i,
+ pmu_fd[i] = sys_perf_event_open(&pe_sample_attr, -1 /* pid */, i,
-1 /* group_fd */, 0 /* flags */);
if (pmu_fd[i] < 0) {
fprintf(stderr, "ERROR: Initializing perf sampling\n");
diff --git a/samples/bpf/trace_event_user.c b/samples/bpf/trace_event_user.c
index de8fd0266d78..ccb0cba8324a 100644
--- a/samples/bpf/trace_event_user.c
+++ b/samples/bpf/trace_event_user.c
@@ -20,6 +20,7 @@
#include <sys/resource.h>
#include "libbpf.h"
#include "bpf_load.h"
+#include "perf-sys.h"
#define SAMPLE_FREQ 50
@@ -126,9 +127,9 @@ static void test_perf_event_all_cpu(struct perf_event_attr *attr)
/* open perf_event on all cpus */
for (i = 0; i < nr_cpus; i++) {
- pmu_fd[i] = perf_event_open(attr, -1, i, -1, 0);
+ pmu_fd[i] = sys_perf_event_open(attr, -1, i, -1, 0);
if (pmu_fd[i] < 0) {
- printf("perf_event_open failed\n");
+ printf("sys_perf_event_open failed\n");
goto all_cpu_err;
}
assert(ioctl(pmu_fd[i], PERF_EVENT_IOC_SET_BPF, prog_fd[0]) == 0);
@@ -147,9 +148,9 @@ static void test_perf_event_task(struct perf_event_attr *attr)
int pmu_fd;
/* open task bound event */
- pmu_fd = perf_event_open(attr, 0, -1, -1, 0);
+ pmu_fd = sys_perf_event_open(attr, 0, -1, -1, 0);
if (pmu_fd < 0) {
- printf("perf_event_open failed\n");
+ printf("sys_perf_event_open failed\n");
return;
}
assert(ioctl(pmu_fd, PERF_EVENT_IOC_SET_BPF, prog_fd[0]) == 0);
diff --git a/samples/bpf/trace_output_user.c b/samples/bpf/trace_output_user.c
index 9c38f7aa4515..64e692fd7d51 100644
--- a/samples/bpf/trace_output_user.c
+++ b/samples/bpf/trace_output_user.c
@@ -21,6 +21,7 @@
#include <signal.h>
#include "libbpf.h"
#include "bpf_load.h"
+#include "perf-sys.h"
static int pmu_fd;
@@ -160,7 +161,7 @@ static void test_bpf_perf_event(void)
};
int key = 0;
- pmu_fd = perf_event_open(&attr, -1/*pid*/, 0/*cpu*/, -1/*group_fd*/, 0);
+ pmu_fd = sys_perf_event_open(&attr, -1/*pid*/, 0/*cpu*/, -1/*group_fd*/, 0);
assert(pmu_fd >= 0);
assert(bpf_map_update_elem(map_fd[0], &key, &pmu_fd, BPF_ANY) == 0);
diff --git a/samples/bpf/tracex6_user.c b/samples/bpf/tracex6_user.c
index 7a3b4a4b19f3..1681cb7cd713 100644
--- a/samples/bpf/tracex6_user.c
+++ b/samples/bpf/tracex6_user.c
@@ -10,6 +10,7 @@
#include <linux/bpf.h>
#include "libbpf.h"
#include "bpf_load.h"
+#include "perf-sys.h"
#define SAMPLE_PERIOD 0x7fffffffffffffffULL
@@ -32,7 +33,7 @@ static void test_bpf_perf_event(void)
};
for (i = 0; i < nr_cpus; i++) {
- pmu_fd[i] = perf_event_open(&attr_insn_pmu, -1/*pid*/, i/*cpu*/, -1/*group_fd*/, 0);
+ pmu_fd[i] = sys_perf_event_open(&attr_insn_pmu, -1/*pid*/, i/*cpu*/, -1/*group_fd*/, 0);
if (pmu_fd[i] < 0) {
printf("event syscall failed\n");
goto exit;
--
2.10.2
^ permalink raw reply related
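The declaration removed in this patch was a thin wrapper around the raw system call, which is needed because glibc does not ship a perf_event_open() function. A minimal sketch of that pattern follows. The example_* names are hypothetical and this is not the sys_perf_event_open() helper from tools/perf/perf-sys.h, only an illustration of what such a wrapper does.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <linux/perf_event.h>

/* Hypothetical local wrapper around the perf_event_open system call. */
static inline int example_perf_event_open(struct perf_event_attr *attr,
					  pid_t pid, int cpu, int group_fd,
					  unsigned long flags)
{
	assert(attr != NULL);
	return (int)syscall(SYS_perf_event_open, attr, pid, cpu,
			    group_fd, flags);
}

/* Open a software CPU-clock counter for the calling thread on any CPU. */
static inline int example_open_cpu_clock(void)
{
	struct perf_event_attr attr;

	/* Zero everything so unused and reserved fields are cleared. */
	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = PERF_TYPE_SOFTWARE;
	attr.config = PERF_COUNT_SW_CPU_CLOCK;
	attr.disabled = 1;
	attr.exclude_kernel = 1;
	attr.exclude_hv = 1;

	/* pid 0 = calling thread, cpu -1 = any CPU, no group, no flags. */
	return example_perf_event_open(&attr, 0, -1, -1, 0);
}
```

The call returns a file descriptor on success and -1 on failure, for example when the system's perf settings deny access, so callers should check the result before using it.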
* [PATCHv3 perf/core 5/7] samples/bpf: Switch over to libbpf
From: Joe Stringer @ 2016-12-09 2:46 UTC (permalink / raw)
To: linux-kernel; +Cc: wangnan0, ast, daniel, acme, netdev
In-Reply-To: <20161209024620.31660-1-joe@ovn.org>
Now that libbpf under tools/lib/bpf/* is synced with the version from
samples/bpf, we can get rid of most of the libbpf library here.
Signed-off-by: Joe Stringer <joe@ovn.org>
---
v3: First post.
---
samples/bpf/Makefile | 60 +++++++++++++-------------
samples/bpf/README.rst | 4 +-
samples/bpf/libbpf.c | 111 -------------------------------------------------
samples/bpf/libbpf.h | 19 +--------
tools/lib/bpf/Makefile | 2 +
5 files changed, 38 insertions(+), 158 deletions(-)
diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index 72c58675973e..c8f7ed37b2de 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -29,35 +29,38 @@ hostprogs-y += trace_event
hostprogs-y += sampleip
hostprogs-y += tc_l2_redirect
-test_verifier-objs := test_verifier.o libbpf.o
-test_maps-objs := test_maps.o libbpf.o
-sock_example-objs := sock_example.o libbpf.o
-fds_example-objs := bpf_load.o libbpf.o fds_example.o
-sockex1-objs := bpf_load.o libbpf.o sockex1_user.o
-sockex2-objs := bpf_load.o libbpf.o sockex2_user.o
-sockex3-objs := bpf_load.o libbpf.o sockex3_user.o
-tracex1-objs := bpf_load.o libbpf.o tracex1_user.o
-tracex2-objs := bpf_load.o libbpf.o tracex2_user.o
-tracex3-objs := bpf_load.o libbpf.o tracex3_user.o
-tracex4-objs := bpf_load.o libbpf.o tracex4_user.o
-tracex5-objs := bpf_load.o libbpf.o tracex5_user.o
-tracex6-objs := bpf_load.o libbpf.o tracex6_user.o
-test_probe_write_user-objs := bpf_load.o libbpf.o test_probe_write_user_user.o
-trace_output-objs := bpf_load.o libbpf.o trace_output_user.o
-lathist-objs := bpf_load.o libbpf.o lathist_user.o
-offwaketime-objs := bpf_load.o libbpf.o offwaketime_user.o
-spintest-objs := bpf_load.o libbpf.o spintest_user.o
-map_perf_test-objs := bpf_load.o libbpf.o map_perf_test_user.o
-test_overhead-objs := bpf_load.o libbpf.o test_overhead_user.o
-test_cgrp2_array_pin-objs := libbpf.o test_cgrp2_array_pin.o
-xdp1-objs := bpf_load.o libbpf.o xdp1_user.o
+# Libbpf dependencies
+LIBBPF := libbpf.o ../../tools/lib/bpf/bpf.o
+
+test_verifier-objs := test_verifier.o $(LIBBPF)
+test_maps-objs := test_maps.o $(LIBBPF)
+sock_example-objs := sock_example.o $(LIBBPF)
+fds_example-objs := bpf_load.o $(LIBBPF) fds_example.o
+sockex1-objs := bpf_load.o $(LIBBPF) sockex1_user.o
+sockex2-objs := bpf_load.o $(LIBBPF) sockex2_user.o
+sockex3-objs := bpf_load.o $(LIBBPF) sockex3_user.o
+tracex1-objs := bpf_load.o $(LIBBPF) tracex1_user.o
+tracex2-objs := bpf_load.o $(LIBBPF) tracex2_user.o
+tracex3-objs := bpf_load.o $(LIBBPF) tracex3_user.o
+tracex4-objs := bpf_load.o $(LIBBPF) tracex4_user.o
+tracex5-objs := bpf_load.o $(LIBBPF) tracex5_user.o
+tracex6-objs := bpf_load.o $(LIBBPF) tracex6_user.o
+test_probe_write_user-objs := bpf_load.o $(LIBBPF) test_probe_write_user_user.o
+trace_output-objs := bpf_load.o $(LIBBPF) trace_output_user.o
+lathist-objs := bpf_load.o $(LIBBPF) lathist_user.o
+offwaketime-objs := bpf_load.o $(LIBBPF) offwaketime_user.o
+spintest-objs := bpf_load.o $(LIBBPF) spintest_user.o
+map_perf_test-objs := bpf_load.o $(LIBBPF) map_perf_test_user.o
+test_overhead-objs := bpf_load.o $(LIBBPF) test_overhead_user.o
+test_cgrp2_array_pin-objs := $(LIBBPF) test_cgrp2_array_pin.o
+xdp1-objs := bpf_load.o $(LIBBPF) xdp1_user.o
# reuse xdp1 source intentionally
-xdp2-objs := bpf_load.o libbpf.o xdp1_user.o
-test_current_task_under_cgroup-objs := bpf_load.o libbpf.o \
+xdp2-objs := bpf_load.o $(LIBBPF) xdp1_user.o
+test_current_task_under_cgroup-objs := bpf_load.o $(LIBBPF) \
test_current_task_under_cgroup_user.o
-trace_event-objs := bpf_load.o libbpf.o trace_event_user.o
-sampleip-objs := bpf_load.o libbpf.o sampleip_user.o
-tc_l2_redirect-objs := bpf_load.o libbpf.o tc_l2_redirect_user.o
+trace_event-objs := bpf_load.o $(LIBBPF) trace_event_user.o
+sampleip-objs := bpf_load.o $(LIBBPF) sampleip_user.o
+tc_l2_redirect-objs := bpf_load.o $(LIBBPF) tc_l2_redirect_user.o
# Tell kbuild to always build the programs
always := $(hostprogs-y)
@@ -89,7 +92,7 @@ always += test_current_task_under_cgroup_kern.o
always += trace_event_kern.o
always += sampleip_kern.o
-HOSTCFLAGS += -I$(objtree)/usr/include
+HOSTCFLAGS += -I$(objtree)/usr/include -I$(objtree)/tools/lib/
HOSTCFLAGS_bpf_load.o += -I$(objtree)/usr/include -Wno-unused-variable
HOSTLOADLIBES_fds_example += -lelf
@@ -123,6 +126,7 @@ CLANG ?= clang
# Trick to allow make to be run from this directory
all:
+ $(MAKE) -C ../../ tools/lib/bpf/
$(MAKE) -C ../../ $$PWD/
clean:
diff --git a/samples/bpf/README.rst b/samples/bpf/README.rst
index a43eae3f0551..79f9a58f1872 100644
--- a/samples/bpf/README.rst
+++ b/samples/bpf/README.rst
@@ -1,8 +1,8 @@
eBPF sample programs
====================
-This directory contains a mini eBPF library, test stubs, verifier
-test-suite and examples for using eBPF.
+This directory contains test stubs, verifier test-suite and examples
+for using eBPF. The examples use libbpf from tools/lib/bpf.
Build dependencies
==================
diff --git a/samples/bpf/libbpf.c b/samples/bpf/libbpf.c
index e5c5a69996fc..d9af876b4a2c 100644
--- a/samples/bpf/libbpf.c
+++ b/samples/bpf/libbpf.c
@@ -4,8 +4,6 @@
#include <linux/unistd.h>
#include <unistd.h>
#include <string.h>
-#include <linux/netlink.h>
-#include <linux/bpf.h>
#include <errno.h>
#include <net/ethernet.h>
#include <net/if.h>
@@ -13,115 +11,6 @@
#include <arpa/inet.h>
#include "libbpf.h"
-static __u64 ptr_to_u64(void *ptr)
-{
- return (__u64) (unsigned long) ptr;
-}
-
-int bpf_create_map(enum bpf_map_type map_type, int key_size, int value_size,
- int max_entries, int map_flags)
-{
- union bpf_attr attr = {
- .map_type = map_type,
- .key_size = key_size,
- .value_size = value_size,
- .max_entries = max_entries,
- .map_flags = map_flags,
- };
-
- return syscall(__NR_bpf, BPF_MAP_CREATE, &attr, sizeof(attr));
-}
-
-int bpf_map_update_elem(int fd, void *key, void *value, unsigned long long flags)
-{
- union bpf_attr attr = {
- .map_fd = fd,
- .key = ptr_to_u64(key),
- .value = ptr_to_u64(value),
- .flags = flags,
- };
-
- return syscall(__NR_bpf, BPF_MAP_UPDATE_ELEM, &attr, sizeof(attr));
-}
-
-int bpf_map_lookup_elem(int fd, void *key, void *value)
-{
- union bpf_attr attr = {
- .map_fd = fd,
- .key = ptr_to_u64(key),
- .value = ptr_to_u64(value),
- };
-
- return syscall(__NR_bpf, BPF_MAP_LOOKUP_ELEM, &attr, sizeof(attr));
-}
-
-int bpf_map_delete_elem(int fd, void *key)
-{
- union bpf_attr attr = {
- .map_fd = fd,
- .key = ptr_to_u64(key),
- };
-
- return syscall(__NR_bpf, BPF_MAP_DELETE_ELEM, &attr, sizeof(attr));
-}
-
-int bpf_map_get_next_key(int fd, void *key, void *next_key)
-{
- union bpf_attr attr = {
- .map_fd = fd,
- .key = ptr_to_u64(key),
- .next_key = ptr_to_u64(next_key),
- };
-
- return syscall(__NR_bpf, BPF_MAP_GET_NEXT_KEY, &attr, sizeof(attr));
-}
-
-#define ROUND_UP(x, n) (((x) + (n) - 1u) & ~((n) - 1u))
-
-int bpf_load_program(enum bpf_prog_type prog_type,
- const struct bpf_insn *insns, int prog_len,
- const char *license, int kern_version,
- char *log_buf, size_t log_buf_sz)
-{
- union bpf_attr attr = {
- .prog_type = prog_type,
- .insns = ptr_to_u64((void *) insns),
- .insn_cnt = prog_len / sizeof(struct bpf_insn),
- .license = ptr_to_u64((void *) license),
- .log_buf = ptr_to_u64(log_buf),
- .log_size = log_buf_sz,
- .log_level = 1,
- };
-
- /* assign one field outside of struct init to make sure any
- * padding is zero initialized
- */
- attr.kern_version = kern_version;
-
- log_buf[0] = 0;
-
- return syscall(__NR_bpf, BPF_PROG_LOAD, &attr, sizeof(attr));
-}
-
-int bpf_obj_pin(int fd, const char *pathname)
-{
- union bpf_attr attr = {
- .pathname = ptr_to_u64((void *)pathname),
- .bpf_fd = fd,
- };
-
- return syscall(__NR_bpf, BPF_OBJ_PIN, &attr, sizeof(attr));
-}
-
-int bpf_obj_get(const char *pathname)
-{
- union bpf_attr attr = {
- .pathname = ptr_to_u64((void *)pathname),
- };
-
- return syscall(__NR_bpf, BPF_OBJ_GET, &attr, sizeof(attr));
-}
-
int open_raw_sock(const char *name)
{
struct sockaddr_ll sll;
diff --git a/samples/bpf/libbpf.h b/samples/bpf/libbpf.h
index 1325152be4cd..cc815624aacf 100644
--- a/samples/bpf/libbpf.h
+++ b/samples/bpf/libbpf.h
@@ -2,24 +2,9 @@
#ifndef __LIBBPF_H
#define __LIBBPF_H
-struct bpf_insn;
-
-int bpf_create_map(enum bpf_map_type map_type, int key_size, int value_size,
- int max_entries, int map_flags);
-int bpf_map_update_elem(int fd, void *key, void *value, unsigned long long flags);
-int bpf_map_lookup_elem(int fd, void *key, void *value);
-int bpf_map_delete_elem(int fd, void *key);
-int bpf_map_get_next_key(int fd, void *key, void *next_key);
-
-int bpf_load_program(enum bpf_prog_type prog_type,
- const struct bpf_insn *insns, int insn_len,
- const char *license, int kern_version,
- char *log_buf, size_t log_buf_sz);
+#include <bpf/bpf.h>
-int bpf_obj_pin(int fd, const char *pathname);
-int bpf_obj_get(const char *pathname);
-
-#define BPF_LOG_BUF_SIZE 65536
+struct bpf_insn;
/* ALU ops on registers, bpf_add|sub|...: dst_reg += src_reg */
diff --git a/tools/lib/bpf/Makefile b/tools/lib/bpf/Makefile
index 62d89d50fcbd..616bd55f3be8 100644
--- a/tools/lib/bpf/Makefile
+++ b/tools/lib/bpf/Makefile
@@ -149,6 +149,8 @@ CMD_TARGETS = $(LIB_FILE)
TARGETS = $(CMD_TARGETS)
+libbpf: all
+
all: fixdep $(VERSION_FILES) all_cmd
all_cmd: $(CMD_TARGETS)
--
2.10.2
^ permalink raw reply related
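The wrappers deleted from samples/bpf/libbpf.c all follow one pattern: fill a union bpf_attr, store user pointers as 64-bit integers, and hand the block to the bpf system call. The sketch below shows the attribute-filling half of that pattern for a map lookup, without issuing the system call. example_fill_lookup_attr() is a hypothetical helper and is not part of either library.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <linux/bpf.h>

/* Widen a user pointer to the 64-bit integer fields bpf_attr uses. */
static inline uint64_t ptr_to_u64(const void *ptr)
{
	return (uint64_t)(uintptr_t)ptr;
}

/*
 * Hypothetical helper: prepare the attribute block that a
 * BPF_MAP_LOOKUP_ELEM call would take. No system call is made here.
 */
static inline void example_fill_lookup_attr(union bpf_attr *attr, int fd,
					    const void *key, void *value)
{
	assert(attr != NULL);

	/* Zero the whole union so padding and unused fields are cleared. */
	memset(attr, 0, sizeof(*attr));
	attr->map_fd = (uint32_t)fd;
	attr->key = ptr_to_u64(key);
	attr->value = ptr_to_u64(value);
}
```

Storing pointers as 64-bit integers keeps the attribute layout identical for 32-bit and 64-bit userspace, which is why the removed code went through ptr_to_u64() for every pointer field.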
* [PATCHv3 perf/core 4/7] samples/bpf: Make samples more libbpf-centric
From: Joe Stringer @ 2016-12-09 2:46 UTC (permalink / raw)
To: linux-kernel; +Cc: wangnan0, ast, daniel, acme, netdev
In-Reply-To: <20161209024620.31660-1-joe@ovn.org>
Switch all of the sample code to use the function names from
tools/lib/bpf so that they're consistent with that, and to declare their
own log buffers. This allows the next commit to be purely devoted to
getting rid of the duplicate library in samples/bpf.
Signed-off-by: Joe Stringer <joe@ovn.org>
---
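The API change here moves ownership of the log buffer from the library to the caller. A minimal sketch of that calling convention follows, assuming callers that do not want a log pass NULL and 0, so every write to the buffer is guarded. example_load() and EXAMPLE_LOG_BUF_SIZE are hypothetical and only simulate a loader; the sketch does not call into the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define EXAMPLE_LOG_BUF_SIZE 256

/*
 * Hypothetical loader: returns 0 on success, or -1 after describing
 * the failure in the caller-supplied buffer (if one was given).
 */
static inline int example_load(int insn_cnt, char *log_buf,
			       size_t log_buf_sz)
{
	/* A size without a buffer is a caller bug. */
	assert(log_buf != NULL || log_buf_sz == 0);

	/* Start from an empty log, but only if there is a buffer. */
	if (log_buf && log_buf_sz)
		log_buf[0] = '\0';

	if (insn_cnt > 0)
		return 0;

	if (log_buf && log_buf_sz)
		snprintf(log_buf, log_buf_sz,
			 "rejected: %d instructions", insn_cnt);
	return -1;
}
```

A caller that wants diagnostics declares its own buffer, passes it with sizeof(), and prints it when the call returns -1; a caller that passes NULL and 0 still gets the return code but no text.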
samples/bpf/bpf_load.c | 17 ++-
samples/bpf/fds_example.c | 9 +-
samples/bpf/lathist_user.c | 3 +-
samples/bpf/libbpf.c | 23 ++--
samples/bpf/libbpf.h | 18 +--
samples/bpf/map_perf_test_user.c | 1 +
samples/bpf/offwaketime_user.c | 10 +-
samples/bpf/sampleip_user.c | 5 +-
samples/bpf/sock_example.c | 10 +-
samples/bpf/sockex1_user.c | 8 +-
samples/bpf/sockex2_user.c | 6 +-
samples/bpf/sockex3_user.c | 6 +-
samples/bpf/spintest_user.c | 10 +-
samples/bpf/tc_l2_redirect_user.c | 4 +-
samples/bpf/test_cgrp2_array_pin.c | 4 +-
samples/bpf/test_current_task_under_cgroup_user.c | 10 +-
samples/bpf/test_maps.c | 142 +++++++++++-----------
samples/bpf/test_overhead_user.c | 2 +
samples/bpf/test_probe_write_user_user.c | 4 +-
samples/bpf/test_verifier.c | 8 +-
samples/bpf/trace_event_user.c | 15 +--
samples/bpf/trace_output_user.c | 3 +-
samples/bpf/tracex1_user.c | 2 +
samples/bpf/tracex2_user.c | 12 +-
samples/bpf/tracex3_user.c | 6 +-
samples/bpf/tracex4_user.c | 6 +-
samples/bpf/tracex5_user.c | 2 +
samples/bpf/tracex6_user.c | 4 +-
samples/bpf/xdp1_user.c | 4 +-
29 files changed, 200 insertions(+), 154 deletions(-)
diff --git a/samples/bpf/bpf_load.c b/samples/bpf/bpf_load.c
index 97913e109b14..f8e3c58a0897 100644
--- a/samples/bpf/bpf_load.c
+++ b/samples/bpf/bpf_load.c
@@ -18,7 +18,6 @@
#include <poll.h>
#include <ctype.h>
#include "libbpf.h"
-#include "bpf_helpers.h"
#include "bpf_load.h"
#define DEBUGFS "/sys/kernel/debug/tracing/"
@@ -26,17 +25,26 @@
static char license[128];
static int kern_version;
static bool processed_sec[128];
+char bpf_log_buf[BPF_LOG_BUF_SIZE];
int map_fd[MAX_MAPS];
int prog_fd[MAX_PROGS];
int event_fd[MAX_PROGS];
int prog_cnt;
int prog_array_fd = -1;
+struct bpf_map_def {
+ unsigned int type;
+ unsigned int key_size;
+ unsigned int value_size;
+ unsigned int max_entries;
+ unsigned int map_flags;
+};
+
static int populate_prog_array(const char *event, int prog_fd)
{
int ind = atoi(event), err;
- err = bpf_update_elem(prog_array_fd, &ind, &prog_fd, BPF_ANY);
+ err = bpf_map_update_elem(prog_array_fd, &ind, &prog_fd, BPF_ANY);
if (err < 0) {
printf("failed to store prog_fd in prog_array\n");
return -1;
@@ -77,9 +85,10 @@ static int load_and_attach(const char *event, struct bpf_insn *prog, int size)
return -1;
}
- fd = bpf_prog_load(prog_type, prog, size, license, kern_version);
+ fd = bpf_load_program(prog_type, prog, size, license, kern_version,
+ bpf_log_buf, BPF_LOG_BUF_SIZE);
if (fd < 0) {
- printf("bpf_prog_load() err=%d\n%s", errno, bpf_log_buf);
+ printf("bpf_load_program() err=%d\n%s", errno, bpf_log_buf);
return -1;
}
diff --git a/samples/bpf/fds_example.c b/samples/bpf/fds_example.c
index 625e797be6ef..4ffd8f340496 100644
--- a/samples/bpf/fds_example.c
+++ b/samples/bpf/fds_example.c
@@ -58,8 +58,9 @@ static int bpf_prog_create(const char *object)
assert(!load_bpf_file((char *)object));
return prog_fd[0];
} else {
- return bpf_prog_load(BPF_PROG_TYPE_SOCKET_FILTER,
- insns, sizeof(insns), "GPL", 0);
+ return bpf_load_program(BPF_PROG_TYPE_SOCKET_FILTER,
+ insns, sizeof(insns), "GPL", 0, NULL,
+ 0);
}
}
@@ -83,12 +84,12 @@ static int bpf_do_map(const char *file, uint32_t flags, uint32_t key,
}
if ((flags & BPF_F_KEY_VAL) == BPF_F_KEY_VAL) {
- ret = bpf_update_elem(fd, &key, &value, 0);
+ ret = bpf_map_update_elem(fd, &key, &value, 0);
printf("bpf: fd:%d u->(%u:%u) ret:(%d,%s)\n", fd, key, value,
ret, strerror(errno));
assert(ret == 0);
} else if (flags & BPF_F_KEY) {
- ret = bpf_lookup_elem(fd, &key, &value);
+ ret = bpf_map_lookup_elem(fd, &key, &value);
printf("bpf: fd:%d l->(%u):%u ret:(%d,%s)\n", fd, key, value,
ret, strerror(errno));
assert(ret == 0);
diff --git a/samples/bpf/lathist_user.c b/samples/bpf/lathist_user.c
index 65da8c1576de..bcdee00816b2 100644
--- a/samples/bpf/lathist_user.c
+++ b/samples/bpf/lathist_user.c
@@ -23,6 +23,7 @@ struct cpu_hist {
};
static struct cpu_hist cpu_hist[MAX_CPU];
+char bpf_log_buf[BPF_LOG_BUF_SIZE];
static void stars(char *str, long val, long max, int width)
{
@@ -73,7 +74,7 @@ static void get_data(int fd)
for (c = 0; c < MAX_CPU; c++) {
for (i = 0; i < MAX_ENTRIES; i++) {
key = c * MAX_ENTRIES + i;
- bpf_lookup_elem(fd, &key, &value);
+ bpf_map_lookup_elem(fd, &key, &value);
cpu_hist[c].data[i] = value;
if (value > cpu_hist[c].max)
diff --git a/samples/bpf/libbpf.c b/samples/bpf/libbpf.c
index 9969e35550c3..e5c5a69996fc 100644
--- a/samples/bpf/libbpf.c
+++ b/samples/bpf/libbpf.c
@@ -32,7 +32,7 @@ int bpf_create_map(enum bpf_map_type map_type, int key_size, int value_size,
return syscall(__NR_bpf, BPF_MAP_CREATE, &attr, sizeof(attr));
}
-int bpf_update_elem(int fd, void *key, void *value, unsigned long long flags)
+int bpf_map_update_elem(int fd, void *key, void *value, unsigned long long flags)
{
union bpf_attr attr = {
.map_fd = fd,
@@ -44,7 +44,7 @@ int bpf_update_elem(int fd, void *key, void *value, unsigned long long flags)
return syscall(__NR_bpf, BPF_MAP_UPDATE_ELEM, &attr, sizeof(attr));
}
-int bpf_lookup_elem(int fd, void *key, void *value)
+int bpf_map_lookup_elem(int fd, void *key, void *value)
{
union bpf_attr attr = {
.map_fd = fd,
@@ -55,7 +55,7 @@ int bpf_lookup_elem(int fd, void *key, void *value)
return syscall(__NR_bpf, BPF_MAP_LOOKUP_ELEM, &attr, sizeof(attr));
}
-int bpf_delete_elem(int fd, void *key)
+int bpf_map_delete_elem(int fd, void *key)
{
union bpf_attr attr = {
.map_fd = fd,
@@ -65,7 +65,7 @@ int bpf_delete_elem(int fd, void *key)
return syscall(__NR_bpf, BPF_MAP_DELETE_ELEM, &attr, sizeof(attr));
}
-int bpf_get_next_key(int fd, void *key, void *next_key)
+int bpf_map_get_next_key(int fd, void *key, void *next_key)
{
union bpf_attr attr = {
.map_fd = fd,
@@ -78,19 +78,18 @@ int bpf_get_next_key(int fd, void *key, void *next_key)
#define ROUND_UP(x, n) (((x) + (n) - 1u) & ~((n) - 1u))
-char bpf_log_buf[LOG_BUF_SIZE];
-
-int bpf_prog_load(enum bpf_prog_type prog_type,
- const struct bpf_insn *insns, int prog_len,
- const char *license, int kern_version)
+int bpf_load_program(enum bpf_prog_type prog_type,
+ const struct bpf_insn *insns, int prog_len,
+ const char *license, int kern_version,
+ char *log_buf, size_t log_buf_sz)
{
union bpf_attr attr = {
.prog_type = prog_type,
.insns = ptr_to_u64((void *) insns),
.insn_cnt = prog_len / sizeof(struct bpf_insn),
.license = ptr_to_u64((void *) license),
- .log_buf = ptr_to_u64(bpf_log_buf),
- .log_size = LOG_BUF_SIZE,
+ .log_buf = ptr_to_u64(log_buf),
+ .log_size = log_buf_sz,
.log_level = 1,
};
@@ -99,7 +98,7 @@ int bpf_prog_load(enum bpf_prog_type prog_type,
*/
attr.kern_version = kern_version;
- bpf_log_buf[0] = 0;
+ log_buf[0] = 0;
return syscall(__NR_bpf, BPF_PROG_LOAD, &attr, sizeof(attr));
}
diff --git a/samples/bpf/libbpf.h b/samples/bpf/libbpf.h
index ac6edb61b64a..1325152be4cd 100644
--- a/samples/bpf/libbpf.h
+++ b/samples/bpf/libbpf.h
@@ -6,20 +6,20 @@ struct bpf_insn;
int bpf_create_map(enum bpf_map_type map_type, int key_size, int value_size,
int max_entries, int map_flags);
-int bpf_update_elem(int fd, void *key, void *value, unsigned long long flags);
-int bpf_lookup_elem(int fd, void *key, void *value);
-int bpf_delete_elem(int fd, void *key);
-int bpf_get_next_key(int fd, void *key, void *next_key);
+int bpf_map_update_elem(int fd, void *key, void *value, unsigned long long flags);
+int bpf_map_lookup_elem(int fd, void *key, void *value);
+int bpf_map_delete_elem(int fd, void *key);
+int bpf_map_get_next_key(int fd, void *key, void *next_key);
-int bpf_prog_load(enum bpf_prog_type prog_type,
- const struct bpf_insn *insns, int insn_len,
- const char *license, int kern_version);
+int bpf_load_program(enum bpf_prog_type prog_type,
+ const struct bpf_insn *insns, int insn_len,
+ const char *license, int kern_version,
+ char *log_buf, size_t log_buf_sz);
int bpf_obj_pin(int fd, const char *pathname);
int bpf_obj_get(const char *pathname);
-#define LOG_BUF_SIZE 65536
-extern char bpf_log_buf[LOG_BUF_SIZE];
+#define BPF_LOG_BUF_SIZE 65536
/* ALU ops on registers, bpf_add|sub|...: dst_reg += src_reg */
diff --git a/samples/bpf/map_perf_test_user.c b/samples/bpf/map_perf_test_user.c
index 3147377e8fd3..62ed870590cc 100644
--- a/samples/bpf/map_perf_test_user.c
+++ b/samples/bpf/map_perf_test_user.c
@@ -37,6 +37,7 @@ static __u64 time_get_ns(void)
#define PERCPU_HASH_KMALLOC (1 << 3)
static int test_flags = ~0;
+char bpf_log_buf[BPF_LOG_BUF_SIZE];
static void test_hash_prealloc(int cpu)
{
diff --git a/samples/bpf/offwaketime_user.c b/samples/bpf/offwaketime_user.c
index 6f002a9c24fa..dd2598b1fabc 100644
--- a/samples/bpf/offwaketime_user.c
+++ b/samples/bpf/offwaketime_user.c
@@ -20,6 +20,8 @@
#define PRINT_RAW_ADDR 0
+char bpf_log_buf[BPF_LOG_BUF_SIZE];
+
static void print_ksym(__u64 addr)
{
struct ksym *sym;
@@ -49,14 +51,14 @@ static void print_stack(struct key_t *key, __u64 count)
int i;
printf("%s;", key->target);
- if (bpf_lookup_elem(map_fd[3], &key->tret, ip) != 0) {
+ if (bpf_map_lookup_elem(map_fd[3], &key->tret, ip) != 0) {
printf("---;");
} else {
for (i = PERF_MAX_STACK_DEPTH - 1; i >= 0; i--)
print_ksym(ip[i]);
}
printf("-;");
- if (bpf_lookup_elem(map_fd[3], &key->wret, ip) != 0) {
+ if (bpf_map_lookup_elem(map_fd[3], &key->wret, ip) != 0) {
printf("---;");
} else {
for (i = 0; i < PERF_MAX_STACK_DEPTH; i++)
@@ -77,8 +79,8 @@ static void print_stacks(int fd)
struct key_t key = {}, next_key;
__u64 value;
- while (bpf_get_next_key(fd, &key, &next_key) == 0) {
- bpf_lookup_elem(fd, &next_key, &value);
+ while (bpf_map_get_next_key(fd, &key, &next_key) == 0) {
+ bpf_map_lookup_elem(fd, &next_key, &value);
print_stack(&next_key, value);
key = next_key;
}
diff --git a/samples/bpf/sampleip_user.c b/samples/bpf/sampleip_user.c
index 260a6bdd6413..09ab620b324c 100644
--- a/samples/bpf/sampleip_user.c
+++ b/samples/bpf/sampleip_user.c
@@ -28,6 +28,7 @@
#define PAGE_OFFSET 0xffff880000000000
static int nr_cpus;
+char bpf_log_buf[BPF_LOG_BUF_SIZE];
static void usage(void)
{
@@ -95,8 +96,8 @@ static void print_ip_map(int fd)
/* fetch IPs and counts */
key = 0, i = 0;
- while (bpf_get_next_key(fd, &key, &next_key) == 0) {
- bpf_lookup_elem(fd, &next_key, &value);
+ while (bpf_map_get_next_key(fd, &key, &next_key) == 0) {
+ bpf_map_lookup_elem(fd, &next_key, &value);
counts[i].ip = next_key;
counts[i++].count = value;
key = next_key;
diff --git a/samples/bpf/sock_example.c b/samples/bpf/sock_example.c
index 28b60baa9fa8..7ab636c30154 100644
--- a/samples/bpf/sock_example.c
+++ b/samples/bpf/sock_example.c
@@ -55,8 +55,8 @@ static int test_sock(void)
BPF_EXIT_INSN(),
};
- prog_fd = bpf_prog_load(BPF_PROG_TYPE_SOCKET_FILTER, prog, sizeof(prog),
- "GPL", 0);
+ prog_fd = bpf_load_program(BPF_PROG_TYPE_SOCKET_FILTER, prog, sizeof(prog),
+ "GPL", 0, NULL, 0);
if (prog_fd < 0) {
printf("failed to load prog '%s'\n", strerror(errno));
goto cleanup;
@@ -72,13 +72,13 @@ static int test_sock(void)
for (i = 0; i < 10; i++) {
key = IPPROTO_TCP;
- assert(bpf_lookup_elem(map_fd, &key, &tcp_cnt) == 0);
+ assert(bpf_map_lookup_elem(map_fd, &key, &tcp_cnt) == 0);
key = IPPROTO_UDP;
- assert(bpf_lookup_elem(map_fd, &key, &udp_cnt) == 0);
+ assert(bpf_map_lookup_elem(map_fd, &key, &udp_cnt) == 0);
key = IPPROTO_ICMP;
- assert(bpf_lookup_elem(map_fd, &key, &icmp_cnt) == 0);
+ assert(bpf_map_lookup_elem(map_fd, &key, &icmp_cnt) == 0);
printf("TCP %lld UDP %lld ICMP %lld packets\n",
tcp_cnt, udp_cnt, icmp_cnt);
diff --git a/samples/bpf/sockex1_user.c b/samples/bpf/sockex1_user.c
index 678ce4693551..2956d893d732 100644
--- a/samples/bpf/sockex1_user.c
+++ b/samples/bpf/sockex1_user.c
@@ -6,6 +6,8 @@
#include <unistd.h>
#include <arpa/inet.h>
+char bpf_log_buf[BPF_LOG_BUF_SIZE];
+
int main(int ac, char **argv)
{
char filename[256];
@@ -32,13 +34,13 @@ int main(int ac, char **argv)
int key;
key = IPPROTO_TCP;
- assert(bpf_lookup_elem(map_fd[0], &key, &tcp_cnt) == 0);
+ assert(bpf_map_lookup_elem(map_fd[0], &key, &tcp_cnt) == 0);
key = IPPROTO_UDP;
- assert(bpf_lookup_elem(map_fd[0], &key, &udp_cnt) == 0);
+ assert(bpf_map_lookup_elem(map_fd[0], &key, &udp_cnt) == 0);
key = IPPROTO_ICMP;
- assert(bpf_lookup_elem(map_fd[0], &key, &icmp_cnt) == 0);
+ assert(bpf_map_lookup_elem(map_fd[0], &key, &icmp_cnt) == 0);
printf("TCP %lld UDP %lld ICMP %lld bytes\n",
tcp_cnt, udp_cnt, icmp_cnt);
diff --git a/samples/bpf/sockex2_user.c b/samples/bpf/sockex2_user.c
index 8a4085c2d117..c43958a67cca 100644
--- a/samples/bpf/sockex2_user.c
+++ b/samples/bpf/sockex2_user.c
@@ -7,6 +7,8 @@
#include <arpa/inet.h>
#include <sys/resource.h>
+char bpf_log_buf[BPF_LOG_BUF_SIZE];
+
struct pair {
__u64 packets;
__u64 bytes;
@@ -39,8 +41,8 @@ int main(int ac, char **argv)
int key = 0, next_key;
struct pair value;
- while (bpf_get_next_key(map_fd[0], &key, &next_key) == 0) {
- bpf_lookup_elem(map_fd[0], &next_key, &value);
+ while (bpf_map_get_next_key(map_fd[0], &key, &next_key) == 0) {
+ bpf_map_lookup_elem(map_fd[0], &next_key, &value);
printf("ip %s bytes %lld packets %lld\n",
inet_ntoa((struct in_addr){htonl(next_key)}),
value.bytes, value.packets);
diff --git a/samples/bpf/sockex3_user.c b/samples/bpf/sockex3_user.c
index 3fcfd8c4b2a3..2cb9011ea440 100644
--- a/samples/bpf/sockex3_user.c
+++ b/samples/bpf/sockex3_user.c
@@ -7,6 +7,8 @@
#include <arpa/inet.h>
#include <sys/resource.h>
+char bpf_log_buf[BPF_LOG_BUF_SIZE];
+
struct bpf_flow_keys {
__be32 src;
__be32 dst;
@@ -54,8 +56,8 @@ int main(int argc, char **argv)
sleep(1);
printf("IP src.port -> dst.port bytes packets\n");
- while (bpf_get_next_key(map_fd[2], &key, &next_key) == 0) {
- bpf_lookup_elem(map_fd[2], &next_key, &value);
+ while (bpf_map_get_next_key(map_fd[2], &key, &next_key) == 0) {
+ bpf_map_lookup_elem(map_fd[2], &next_key, &value);
printf("%s.%05d -> %s.%05d %12lld %12lld\n",
inet_ntoa((struct in_addr){htonl(next_key.src)}),
next_key.port16[0],
diff --git a/samples/bpf/spintest_user.c b/samples/bpf/spintest_user.c
index 311ede532230..8950a3d27a92 100644
--- a/samples/bpf/spintest_user.c
+++ b/samples/bpf/spintest_user.c
@@ -7,6 +7,8 @@
#include "libbpf.h"
#include "bpf_load.h"
+char bpf_log_buf[BPF_LOG_BUF_SIZE];
+
int main(int ac, char **argv)
{
struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
@@ -31,8 +33,8 @@ int main(int ac, char **argv)
for (i = 0; i < 5; i++) {
key = 0;
printf("kprobing funcs:");
- while (bpf_get_next_key(map_fd[0], &key, &next_key) == 0) {
- bpf_lookup_elem(map_fd[0], &next_key, &value);
+ while (bpf_map_get_next_key(map_fd[0], &key, &next_key) == 0) {
+ bpf_map_lookup_elem(map_fd[0], &next_key, &value);
assert(next_key == value);
sym = ksym_search(value);
printf(" %s", sym->name);
@@ -41,8 +43,8 @@ int main(int ac, char **argv)
if (key)
printf("\n");
key = 0;
- while (bpf_get_next_key(map_fd[0], &key, &next_key) == 0)
- bpf_delete_elem(map_fd[0], &next_key);
+ while (bpf_map_get_next_key(map_fd[0], &key, &next_key) == 0)
+ bpf_map_delete_elem(map_fd[0], &next_key);
sleep(1);
}
diff --git a/samples/bpf/tc_l2_redirect_user.c b/samples/bpf/tc_l2_redirect_user.c
index 4013c5337b91..28995a776560 100644
--- a/samples/bpf/tc_l2_redirect_user.c
+++ b/samples/bpf/tc_l2_redirect_user.c
@@ -60,9 +60,9 @@ int main(int argc, char **argv)
}
/* bpf_tunnel_key.remote_ipv4 expects host byte orders */
- ret = bpf_update_elem(array_fd, &array_key, &ifindex, 0);
+ ret = bpf_map_update_elem(array_fd, &array_key, &ifindex, 0);
if (ret) {
- perror("bpf_update_elem");
+ perror("bpf_map_update_elem");
goto out;
}
diff --git a/samples/bpf/test_cgrp2_array_pin.c b/samples/bpf/test_cgrp2_array_pin.c
index 70e86f7be69d..8a1b8b5d8def 100644
--- a/samples/bpf/test_cgrp2_array_pin.c
+++ b/samples/bpf/test_cgrp2_array_pin.c
@@ -85,9 +85,9 @@ int main(int argc, char **argv)
}
}
- ret = bpf_update_elem(array_fd, &array_key, &cg2_fd, 0);
+ ret = bpf_map_update_elem(array_fd, &array_key, &cg2_fd, 0);
if (ret) {
- perror("bpf_update_elem");
+ perror("bpf_map_update_elem");
goto out;
}
diff --git a/samples/bpf/test_current_task_under_cgroup_user.c b/samples/bpf/test_current_task_under_cgroup_user.c
index 30b0bce884f9..054915324aeb 100644
--- a/samples/bpf/test_current_task_under_cgroup_user.c
+++ b/samples/bpf/test_current_task_under_cgroup_user.c
@@ -28,6 +28,8 @@
#define log_err(MSG, ...) fprintf(stderr, "(%s:%d: errno: %s) " MSG "\n", \
__FILE__, __LINE__, clean_errno(), ##__VA_ARGS__)
+char bpf_log_buf[BPF_LOG_BUF_SIZE];
+
static int join_cgroup(char *path)
{
int fd, rc = 0;
@@ -94,7 +96,7 @@ int main(int argc, char **argv)
goto cleanup_cgroup_err;
}
- if (bpf_update_elem(map_fd[0], &idx, &cg2, BPF_ANY)) {
+ if (bpf_map_update_elem(map_fd[0], &idx, &cg2, BPF_ANY)) {
log_err("Adding target cgroup to map");
goto cleanup_cgroup_err;
}
@@ -109,7 +111,7 @@ int main(int argc, char **argv)
*/
sync();
- bpf_lookup_elem(map_fd[1], &idx, &remote_pid);
+ bpf_map_lookup_elem(map_fd[1], &idx, &remote_pid);
if (local_pid != remote_pid) {
fprintf(stderr,
@@ -123,10 +125,10 @@ int main(int argc, char **argv)
goto leave_cgroup_err;
remote_pid = 0;
- bpf_update_elem(map_fd[1], &idx, &remote_pid, BPF_ANY);
+ bpf_map_update_elem(map_fd[1], &idx, &remote_pid, BPF_ANY);
sync();
- bpf_lookup_elem(map_fd[1], &idx, &remote_pid);
+ bpf_map_lookup_elem(map_fd[1], &idx, &remote_pid);
if (local_pid == remote_pid) {
fprintf(stderr, "BPF cgroup negative test did not work\n");
diff --git a/samples/bpf/test_maps.c b/samples/bpf/test_maps.c
index cce2b59751eb..2618394a3cc7 100644
--- a/samples/bpf/test_maps.c
+++ b/samples/bpf/test_maps.c
@@ -36,68 +36,68 @@ static void test_hashmap_sanity(int i, void *data)
key = 1;
value = 1234;
/* insert key=1 element */
- assert(bpf_update_elem(map_fd, &key, &value, BPF_ANY) == 0);
+ assert(bpf_map_update_elem(map_fd, &key, &value, BPF_ANY) == 0);
value = 0;
/* BPF_NOEXIST means: add new element if it doesn't exist */
- assert(bpf_update_elem(map_fd, &key, &value, BPF_NOEXIST) == -1 &&
+ assert(bpf_map_update_elem(map_fd, &key, &value, BPF_NOEXIST) == -1 &&
/* key=1 already exists */
errno == EEXIST);
- assert(bpf_update_elem(map_fd, &key, &value, -1) == -1 && errno == EINVAL);
+ assert(bpf_map_update_elem(map_fd, &key, &value, -1) == -1 && errno == EINVAL);
/* check that key=1 can be found */
- assert(bpf_lookup_elem(map_fd, &key, &value) == 0 && value == 1234);
+ assert(bpf_map_lookup_elem(map_fd, &key, &value) == 0 && value == 1234);
key = 2;
/* check that key=2 is not found */
- assert(bpf_lookup_elem(map_fd, &key, &value) == -1 && errno == ENOENT);
+ assert(bpf_map_lookup_elem(map_fd, &key, &value) == -1 && errno == ENOENT);
/* BPF_EXIST means: update existing element */
- assert(bpf_update_elem(map_fd, &key, &value, BPF_EXIST) == -1 &&
+ assert(bpf_map_update_elem(map_fd, &key, &value, BPF_EXIST) == -1 &&
/* key=2 is not there */
errno == ENOENT);
/* insert key=2 element */
- assert(bpf_update_elem(map_fd, &key, &value, BPF_NOEXIST) == 0);
+ assert(bpf_map_update_elem(map_fd, &key, &value, BPF_NOEXIST) == 0);
/* key=1 and key=2 were inserted, check that key=0 cannot be inserted
* due to max_entries limit
*/
key = 0;
- assert(bpf_update_elem(map_fd, &key, &value, BPF_NOEXIST) == -1 &&
+ assert(bpf_map_update_elem(map_fd, &key, &value, BPF_NOEXIST) == -1 &&
errno == E2BIG);
/* update existing element, thought the map is full */
key = 1;
- assert(bpf_update_elem(map_fd, &key, &value, BPF_EXIST) == 0);
+ assert(bpf_map_update_elem(map_fd, &key, &value, BPF_EXIST) == 0);
key = 2;
- assert(bpf_update_elem(map_fd, &key, &value, BPF_ANY) == 0);
+ assert(bpf_map_update_elem(map_fd, &key, &value, BPF_ANY) == 0);
key = 1;
- assert(bpf_update_elem(map_fd, &key, &value, BPF_ANY) == 0);
+ assert(bpf_map_update_elem(map_fd, &key, &value, BPF_ANY) == 0);
/* check that key = 0 doesn't exist */
key = 0;
- assert(bpf_delete_elem(map_fd, &key) == -1 && errno == ENOENT);
+ assert(bpf_map_delete_elem(map_fd, &key) == -1 && errno == ENOENT);
/* iterate over two elements */
- assert(bpf_get_next_key(map_fd, &key, &next_key) == 0 &&
+ assert(bpf_map_get_next_key(map_fd, &key, &next_key) == 0 &&
(next_key == 1 || next_key == 2));
- assert(bpf_get_next_key(map_fd, &next_key, &next_key) == 0 &&
+ assert(bpf_map_get_next_key(map_fd, &next_key, &next_key) == 0 &&
(next_key == 1 || next_key == 2));
- assert(bpf_get_next_key(map_fd, &next_key, &next_key) == -1 &&
+ assert(bpf_map_get_next_key(map_fd, &next_key, &next_key) == -1 &&
errno == ENOENT);
/* delete both elements */
key = 1;
- assert(bpf_delete_elem(map_fd, &key) == 0);
+ assert(bpf_map_delete_elem(map_fd, &key) == 0);
key = 2;
- assert(bpf_delete_elem(map_fd, &key) == 0);
- assert(bpf_delete_elem(map_fd, &key) == -1 && errno == ENOENT);
+ assert(bpf_map_delete_elem(map_fd, &key) == 0);
+ assert(bpf_map_delete_elem(map_fd, &key) == -1 && errno == ENOENT);
key = 0;
/* check that map is empty */
- assert(bpf_get_next_key(map_fd, &key, &next_key) == -1 &&
+ assert(bpf_map_get_next_key(map_fd, &key, &next_key) == -1 &&
errno == ENOENT);
close(map_fd);
}
@@ -123,54 +123,54 @@ static void test_percpu_hashmap_sanity(int task, void *data)
key = 1;
/* insert key=1 element */
assert(!(expected_key_mask & key));
- assert(bpf_update_elem(map_fd, &key, value, BPF_ANY) == 0);
+ assert(bpf_map_update_elem(map_fd, &key, value, BPF_ANY) == 0);
expected_key_mask |= key;
/* BPF_NOEXIST means: add new element if it doesn't exist */
- assert(bpf_update_elem(map_fd, &key, value, BPF_NOEXIST) == -1 &&
+ assert(bpf_map_update_elem(map_fd, &key, value, BPF_NOEXIST) == -1 &&
/* key=1 already exists */
errno == EEXIST);
/* -1 is an invalid flag */
- assert(bpf_update_elem(map_fd, &key, value, -1) == -1 &&
+ assert(bpf_map_update_elem(map_fd, &key, value, -1) == -1 &&
errno == EINVAL);
/* check that key=1 can be found. value could be 0 if the lookup
* was run from a different cpu.
*/
value[0] = 1;
- assert(bpf_lookup_elem(map_fd, &key, value) == 0 && value[0] == 100);
+ assert(bpf_map_lookup_elem(map_fd, &key, value) == 0 && value[0] == 100);
key = 2;
/* check that key=2 is not found */
- assert(bpf_lookup_elem(map_fd, &key, value) == -1 && errno == ENOENT);
+ assert(bpf_map_lookup_elem(map_fd, &key, value) == -1 && errno == ENOENT);
/* BPF_EXIST means: update existing element */
- assert(bpf_update_elem(map_fd, &key, value, BPF_EXIST) == -1 &&
+ assert(bpf_map_update_elem(map_fd, &key, value, BPF_EXIST) == -1 &&
/* key=2 is not there */
errno == ENOENT);
/* insert key=2 element */
assert(!(expected_key_mask & key));
- assert(bpf_update_elem(map_fd, &key, value, BPF_NOEXIST) == 0);
+ assert(bpf_map_update_elem(map_fd, &key, value, BPF_NOEXIST) == 0);
expected_key_mask |= key;
/* key=1 and key=2 were inserted, check that key=0 cannot be inserted
* due to max_entries limit
*/
key = 0;
- assert(bpf_update_elem(map_fd, &key, value, BPF_NOEXIST) == -1 &&
+ assert(bpf_map_update_elem(map_fd, &key, value, BPF_NOEXIST) == -1 &&
errno == E2BIG);
/* check that key = 0 doesn't exist */
- assert(bpf_delete_elem(map_fd, &key) == -1 && errno == ENOENT);
+ assert(bpf_map_delete_elem(map_fd, &key) == -1 && errno == ENOENT);
/* iterate over two elements */
- while (!bpf_get_next_key(map_fd, &key, &next_key)) {
+ while (!bpf_map_get_next_key(map_fd, &key, &next_key)) {
assert((expected_key_mask & next_key) == next_key);
expected_key_mask &= ~next_key;
- assert(bpf_lookup_elem(map_fd, &next_key, value) == 0);
+ assert(bpf_map_lookup_elem(map_fd, &next_key, value) == 0);
for (i = 0; i < nr_cpus; i++)
assert(value[i] == i + 100);
@@ -180,18 +180,18 @@ static void test_percpu_hashmap_sanity(int task, void *data)
/* Update with BPF_EXIST */
key = 1;
- assert(bpf_update_elem(map_fd, &key, value, BPF_EXIST) == 0);
+ assert(bpf_map_update_elem(map_fd, &key, value, BPF_EXIST) == 0);
/* delete both elements */
key = 1;
- assert(bpf_delete_elem(map_fd, &key) == 0);
+ assert(bpf_map_delete_elem(map_fd, &key) == 0);
key = 2;
- assert(bpf_delete_elem(map_fd, &key) == 0);
- assert(bpf_delete_elem(map_fd, &key) == -1 && errno == ENOENT);
+ assert(bpf_map_delete_elem(map_fd, &key) == 0);
+ assert(bpf_map_delete_elem(map_fd, &key) == -1 && errno == ENOENT);
key = 0;
/* check that map is empty */
- assert(bpf_get_next_key(map_fd, &key, &next_key) == -1 &&
+ assert(bpf_map_get_next_key(map_fd, &key, &next_key) == -1 &&
errno == ENOENT);
close(map_fd);
}
@@ -211,41 +211,41 @@ static void test_arraymap_sanity(int i, void *data)
key = 1;
value = 1234;
/* insert key=1 element */
- assert(bpf_update_elem(map_fd, &key, &value, BPF_ANY) == 0);
+ assert(bpf_map_update_elem(map_fd, &key, &value, BPF_ANY) == 0);
value = 0;
- assert(bpf_update_elem(map_fd, &key, &value, BPF_NOEXIST) == -1 &&
+ assert(bpf_map_update_elem(map_fd, &key, &value, BPF_NOEXIST) == -1 &&
errno == EEXIST);
/* check that key=1 can be found */
- assert(bpf_lookup_elem(map_fd, &key, &value) == 0 && value == 1234);
+ assert(bpf_map_lookup_elem(map_fd, &key, &value) == 0 && value == 1234);
key = 0;
/* check that key=0 is also found and zero initialized */
- assert(bpf_lookup_elem(map_fd, &key, &value) == 0 && value == 0);
+ assert(bpf_map_lookup_elem(map_fd, &key, &value) == 0 && value == 0);
/* key=0 and key=1 were inserted, check that key=2 cannot be inserted
* due to max_entries limit
*/
key = 2;
- assert(bpf_update_elem(map_fd, &key, &value, BPF_EXIST) == -1 &&
+ assert(bpf_map_update_elem(map_fd, &key, &value, BPF_EXIST) == -1 &&
errno == E2BIG);
/* check that key = 2 doesn't exist */
- assert(bpf_lookup_elem(map_fd, &key, &value) == -1 && errno == ENOENT);
+ assert(bpf_map_lookup_elem(map_fd, &key, &value) == -1 && errno == ENOENT);
/* iterate over two elements */
- assert(bpf_get_next_key(map_fd, &key, &next_key) == 0 &&
+ assert(bpf_map_get_next_key(map_fd, &key, &next_key) == 0 &&
next_key == 0);
- assert(bpf_get_next_key(map_fd, &next_key, &next_key) == 0 &&
+ assert(bpf_map_get_next_key(map_fd, &next_key, &next_key) == 0 &&
next_key == 1);
- assert(bpf_get_next_key(map_fd, &next_key, &next_key) == -1 &&
+ assert(bpf_map_get_next_key(map_fd, &next_key, &next_key) == -1 &&
errno == ENOENT);
/* delete shouldn't succeed */
key = 1;
- assert(bpf_delete_elem(map_fd, &key) == -1 && errno == EINVAL);
+ assert(bpf_map_delete_elem(map_fd, &key) == -1 && errno == EINVAL);
close(map_fd);
}
@@ -269,12 +269,12 @@ static void test_percpu_arraymap_many_keys(void)
values[i] = i + 10;
for (key = 0; key < nr_keys; key++)
- assert(bpf_update_elem(map_fd, &key, values, BPF_ANY) == 0);
+ assert(bpf_map_update_elem(map_fd, &key, values, BPF_ANY) == 0);
for (key = 0; key < nr_keys; key++) {
for (i = 0; i < nr_cpus; i++)
values[i] = 0;
- assert(bpf_lookup_elem(map_fd, &key, values) == 0);
+ assert(bpf_map_lookup_elem(map_fd, &key, values) == 0);
for (i = 0; i < nr_cpus; i++)
assert(values[i] == i + 10);
}
@@ -300,40 +300,40 @@ static void test_percpu_arraymap_sanity(int i, void *data)
key = 1;
/* insert key=1 element */
- assert(bpf_update_elem(map_fd, &key, values, BPF_ANY) == 0);
+ assert(bpf_map_update_elem(map_fd, &key, values, BPF_ANY) == 0);
values[0] = 0;
- assert(bpf_update_elem(map_fd, &key, values, BPF_NOEXIST) == -1 &&
+ assert(bpf_map_update_elem(map_fd, &key, values, BPF_NOEXIST) == -1 &&
errno == EEXIST);
/* check that key=1 can be found */
- assert(bpf_lookup_elem(map_fd, &key, values) == 0 && values[0] == 100);
+ assert(bpf_map_lookup_elem(map_fd, &key, values) == 0 && values[0] == 100);
key = 0;
/* check that key=0 is also found and zero initialized */
- assert(bpf_lookup_elem(map_fd, &key, values) == 0 &&
+ assert(bpf_map_lookup_elem(map_fd, &key, values) == 0 &&
values[0] == 0 && values[nr_cpus - 1] == 0);
/* check that key=2 cannot be inserted due to max_entries limit */
key = 2;
- assert(bpf_update_elem(map_fd, &key, values, BPF_EXIST) == -1 &&
+ assert(bpf_map_update_elem(map_fd, &key, values, BPF_EXIST) == -1 &&
errno == E2BIG);
/* check that key = 2 doesn't exist */
- assert(bpf_lookup_elem(map_fd, &key, values) == -1 && errno == ENOENT);
+ assert(bpf_map_lookup_elem(map_fd, &key, values) == -1 && errno == ENOENT);
/* iterate over two elements */
- assert(bpf_get_next_key(map_fd, &key, &next_key) == 0 &&
+ assert(bpf_map_get_next_key(map_fd, &key, &next_key) == 0 &&
next_key == 0);
- assert(bpf_get_next_key(map_fd, &next_key, &next_key) == 0 &&
+ assert(bpf_map_get_next_key(map_fd, &next_key, &next_key) == 0 &&
next_key == 1);
- assert(bpf_get_next_key(map_fd, &next_key, &next_key) == -1 &&
+ assert(bpf_map_get_next_key(map_fd, &next_key, &next_key) == -1 &&
errno == ENOENT);
/* delete shouldn't succeed */
key = 1;
- assert(bpf_delete_elem(map_fd, &key) == -1 && errno == EINVAL);
+ assert(bpf_map_delete_elem(map_fd, &key) == -1 && errno == EINVAL);
close(map_fd);
}
@@ -359,21 +359,21 @@ static void test_map_large(void)
for (i = 0; i < MAP_SIZE; i++) {
key = (struct bigkey) {.c = i};
value = i;
- assert(bpf_update_elem(map_fd, &key, &value, BPF_NOEXIST) == 0);
+ assert(bpf_map_update_elem(map_fd, &key, &value, BPF_NOEXIST) == 0);
}
key.c = -1;
- assert(bpf_update_elem(map_fd, &key, &value, BPF_NOEXIST) == -1 &&
+ assert(bpf_map_update_elem(map_fd, &key, &value, BPF_NOEXIST) == -1 &&
errno == E2BIG);
/* iterate through all elements */
for (i = 0; i < MAP_SIZE; i++)
- assert(bpf_get_next_key(map_fd, &key, &key) == 0);
- assert(bpf_get_next_key(map_fd, &key, &key) == -1 && errno == ENOENT);
+ assert(bpf_map_get_next_key(map_fd, &key, &key) == 0);
+ assert(bpf_map_get_next_key(map_fd, &key, &key) == -1 && errno == ENOENT);
key.c = 0;
- assert(bpf_lookup_elem(map_fd, &key, &value) == 0 && value == 0);
+ assert(bpf_map_lookup_elem(map_fd, &key, &value) == 0 && value == 0);
key.a = 1;
- assert(bpf_lookup_elem(map_fd, &key, &value) == -1 && errno == ENOENT);
+ assert(bpf_map_lookup_elem(map_fd, &key, &value) == -1 && errno == ENOENT);
close(map_fd);
}
@@ -423,10 +423,10 @@ static void do_work(int fn, void *data)
for (i = fn; i < MAP_SIZE; i += TASKS) {
key = value = i;
if (do_update) {
- assert(bpf_update_elem(map_fd, &key, &value, BPF_NOEXIST) == 0);
- assert(bpf_update_elem(map_fd, &key, &value, BPF_EXIST) == 0);
+ assert(bpf_map_update_elem(map_fd, &key, &value, BPF_NOEXIST) == 0);
+ assert(bpf_map_update_elem(map_fd, &key, &value, BPF_EXIST) == 0);
} else {
- assert(bpf_delete_elem(map_fd, &key) == 0);
+ assert(bpf_map_delete_elem(map_fd, &key) == 0);
}
}
}
@@ -454,19 +454,19 @@ static void test_map_parallel(void)
run_parallel(TASKS, do_work, data);
/* check that key=0 is already there */
- assert(bpf_update_elem(map_fd, &key, &value, BPF_NOEXIST) == -1 &&
+ assert(bpf_map_update_elem(map_fd, &key, &value, BPF_NOEXIST) == -1 &&
errno == EEXIST);
/* check that all elements were inserted */
key = -1;
for (i = 0; i < MAP_SIZE; i++)
- assert(bpf_get_next_key(map_fd, &key, &key) == 0);
- assert(bpf_get_next_key(map_fd, &key, &key) == -1 && errno == ENOENT);
+ assert(bpf_map_get_next_key(map_fd, &key, &key) == 0);
+ assert(bpf_map_get_next_key(map_fd, &key, &key) == -1 && errno == ENOENT);
/* another check for all elements */
for (i = 0; i < MAP_SIZE; i++) {
key = MAP_SIZE - i - 1;
- assert(bpf_lookup_elem(map_fd, &key, &value) == 0 &&
+ assert(bpf_map_lookup_elem(map_fd, &key, &value) == 0 &&
value == key);
}
@@ -476,7 +476,7 @@ static void test_map_parallel(void)
/* nothing should be left */
key = -1;
- assert(bpf_get_next_key(map_fd, &key, &key) == -1 && errno == ENOENT);
+ assert(bpf_map_get_next_key(map_fd, &key, &key) == -1 && errno == ENOENT);
}
static void run_all_tests(void)
diff --git a/samples/bpf/test_overhead_user.c b/samples/bpf/test_overhead_user.c
index d291167fd3c7..74b79af98ba2 100644
--- a/samples/bpf/test_overhead_user.c
+++ b/samples/bpf/test_overhead_user.c
@@ -24,6 +24,8 @@
#define MAX_CNT 1000000
+char bpf_log_buf[BPF_LOG_BUF_SIZE];
+
static __u64 time_get_ns(void)
{
struct timespec ts;
diff --git a/samples/bpf/test_probe_write_user_user.c b/samples/bpf/test_probe_write_user_user.c
index a44bf347bedd..29af2160e4e2 100644
--- a/samples/bpf/test_probe_write_user_user.c
+++ b/samples/bpf/test_probe_write_user_user.c
@@ -9,6 +9,8 @@
#include <netinet/in.h>
#include <arpa/inet.h>
+char bpf_log_buf[BPF_LOG_BUF_SIZE];
+
int main(int ac, char **argv)
{
int serverfd, serverconnfd, clientfd;
@@ -50,7 +52,7 @@ int main(int ac, char **argv)
mapped_addr_in->sin_port = htons(5555);
mapped_addr_in->sin_addr.s_addr = inet_addr("255.255.255.255");
- assert(!bpf_update_elem(map_fd[0], &mapped_addr, &serv_addr, BPF_ANY));
+ assert(!bpf_map_update_elem(map_fd[0], &mapped_addr, &serv_addr, BPF_ANY));
assert(listen(serverfd, 5) == 0);
diff --git a/samples/bpf/test_verifier.c b/samples/bpf/test_verifier.c
index 369ffaad3799..2e85a8e2d696 100644
--- a/samples/bpf/test_verifier.c
+++ b/samples/bpf/test_verifier.c
@@ -24,6 +24,8 @@
#define MAX_FIXUPS 8
+char bpf_log_buf[BPF_LOG_BUF_SIZE];
+
struct bpf_test {
const char *descr;
struct bpf_insn insns[MAX_INSNS];
@@ -2466,9 +2468,9 @@ static int test(void)
printf("#%d %s ", i, tests[i].descr);
- prog_fd = bpf_prog_load(prog_type ?: BPF_PROG_TYPE_SOCKET_FILTER,
- prog, prog_len * sizeof(struct bpf_insn),
- "GPL", 0);
+ prog_fd = bpf_load_program(prog_type ?: BPF_PROG_TYPE_SOCKET_FILTER,
+ prog, prog_len * sizeof(struct bpf_insn),
+ "GPL", 0, bpf_log_buf, BPF_LOG_BUF_SIZE);
if (unpriv && tests[i].result_unpriv != UNDEF)
expected_result = tests[i].result_unpriv;
diff --git a/samples/bpf/trace_event_user.c b/samples/bpf/trace_event_user.c
index 9a130d31ecf2..de8fd0266d78 100644
--- a/samples/bpf/trace_event_user.c
+++ b/samples/bpf/trace_event_user.c
@@ -24,6 +24,7 @@
#define SAMPLE_FREQ 50
static bool sys_read_seen, sys_write_seen;
+char bpf_log_buf[BPF_LOG_BUF_SIZE];
static void print_ksym(__u64 addr)
{
@@ -61,14 +62,14 @@ static void print_stack(struct key_t *key, __u64 count)
int i;
printf("%3lld %s;", count, key->comm);
- if (bpf_lookup_elem(map_fd[1], &key->kernstack, ip) != 0) {
+ if (bpf_map_lookup_elem(map_fd[1], &key->kernstack, ip) != 0) {
printf("---;");
} else {
for (i = PERF_MAX_STACK_DEPTH - 1; i >= 0; i--)
print_ksym(ip[i]);
}
printf("-;");
- if (bpf_lookup_elem(map_fd[1], &key->userstack, ip) != 0) {
+ if (bpf_map_lookup_elem(map_fd[1], &key->userstack, ip) != 0) {
printf("---;");
} else {
for (i = PERF_MAX_STACK_DEPTH - 1; i >= 0; i--)
@@ -98,10 +99,10 @@ static void print_stacks(void)
int fd = map_fd[0], stack_map = map_fd[1];
sys_read_seen = sys_write_seen = false;
- while (bpf_get_next_key(fd, &key, &next_key) == 0) {
- bpf_lookup_elem(fd, &next_key, &value);
+ while (bpf_map_get_next_key(fd, &key, &next_key) == 0) {
+ bpf_map_lookup_elem(fd, &next_key, &value);
print_stack(&next_key, value);
- bpf_delete_elem(fd, &next_key);
+ bpf_map_delete_elem(fd, &next_key);
key = next_key;
}
@@ -111,8 +112,8 @@ static void print_stacks(void)
}
/* clear stack map */
- while (bpf_get_next_key(stack_map, &stackid, &next_id) == 0) {
- bpf_delete_elem(stack_map, &next_id);
+ while (bpf_map_get_next_key(stack_map, &stackid, &next_id) == 0) {
+ bpf_map_delete_elem(stack_map, &next_id);
stackid = next_id;
}
}
diff --git a/samples/bpf/trace_output_user.c b/samples/bpf/trace_output_user.c
index 661a7d052f2c..9c38f7aa4515 100644
--- a/samples/bpf/trace_output_user.c
+++ b/samples/bpf/trace_output_user.c
@@ -27,6 +27,7 @@ static int pmu_fd;
int page_size;
int page_cnt = 8;
volatile struct perf_event_mmap_page *header;
+char bpf_log_buf[BPF_LOG_BUF_SIZE];
typedef void (*print_fn)(void *data, int size);
@@ -162,7 +163,7 @@ static void test_bpf_perf_event(void)
pmu_fd = perf_event_open(&attr, -1/*pid*/, 0/*cpu*/, -1/*group_fd*/, 0);
assert(pmu_fd >= 0);
- assert(bpf_update_elem(map_fd[0], &key, &pmu_fd, BPF_ANY) == 0);
+ assert(bpf_map_update_elem(map_fd[0], &key, &pmu_fd, BPF_ANY) == 0);
ioctl(pmu_fd, PERF_EVENT_IOC_ENABLE, 0);
}
diff --git a/samples/bpf/tracex1_user.c b/samples/bpf/tracex1_user.c
index 31a48183beea..aa60cc3ef38c 100644
--- a/samples/bpf/tracex1_user.c
+++ b/samples/bpf/tracex1_user.c
@@ -4,6 +4,8 @@
#include "libbpf.h"
#include "bpf_load.h"
+char bpf_log_buf[BPF_LOG_BUF_SIZE];
+
int main(int ac, char **argv)
{
FILE *f;
diff --git a/samples/bpf/tracex2_user.c b/samples/bpf/tracex2_user.c
index ab5b19e68acf..5dd128cc5785 100644
--- a/samples/bpf/tracex2_user.c
+++ b/samples/bpf/tracex2_user.c
@@ -10,6 +10,8 @@
#define MAX_INDEX 64
#define MAX_STARS 38
+char bpf_log_buf[BPF_LOG_BUF_SIZE];
+
static void stars(char *str, long val, long max, int width)
{
int i;
@@ -46,12 +48,12 @@ static void print_hist_for_pid(int fd, void *task)
long max_value = 0;
int i, ind;
- while (bpf_get_next_key(fd, &key, &next_key) == 0) {
+ while (bpf_map_get_next_key(fd, &key, &next_key) == 0) {
if (memcmp(&next_key, task, SIZE)) {
key = next_key;
continue;
}
- bpf_lookup_elem(fd, &next_key, values);
+ bpf_map_lookup_elem(fd, &next_key, values);
value = 0;
for (i = 0; i < nr_cpus; i++)
value += values[i];
@@ -81,7 +83,7 @@ static void print_hist(int fd)
int task_cnt = 0;
int i;
- while (bpf_get_next_key(fd, &key, &next_key) == 0) {
+ while (bpf_map_get_next_key(fd, &key, &next_key) == 0) {
int found = 0;
for (i = 0; i < task_cnt; i++)
@@ -134,8 +136,8 @@ int main(int ac, char **argv)
for (i = 0; i < 5; i++) {
key = 0;
- while (bpf_get_next_key(map_fd[0], &key, &next_key) == 0) {
- bpf_lookup_elem(map_fd[0], &next_key, &value);
+ while (bpf_map_get_next_key(map_fd[0], &key, &next_key) == 0) {
+ bpf_map_lookup_elem(map_fd[0], &next_key, &value);
printf("location 0x%lx count %ld\n", next_key, value);
key = next_key;
}
diff --git a/samples/bpf/tracex3_user.c b/samples/bpf/tracex3_user.c
index 48716f7f0d8b..9c6ee595a882 100644
--- a/samples/bpf/tracex3_user.c
+++ b/samples/bpf/tracex3_user.c
@@ -18,6 +18,8 @@
#define SLOTS 100
+char bpf_log_buf[BPF_LOG_BUF_SIZE];
+
static void clear_stats(int fd)
{
unsigned int nr_cpus = sysconf(_SC_NPROCESSORS_CONF);
@@ -26,7 +28,7 @@ static void clear_stats(int fd)
memset(values, 0, sizeof(values));
for (key = 0; key < SLOTS; key++)
- bpf_update_elem(fd, &key, values, BPF_ANY);
+ bpf_map_update_elem(fd, &key, values, BPF_ANY);
}
const char *color[] = {
@@ -87,7 +89,7 @@ static void print_hist(int fd)
int i;
for (key = 0; key < SLOTS; key++) {
- bpf_lookup_elem(fd, &key, values);
+ bpf_map_lookup_elem(fd, &key, values);
value = 0;
for (i = 0; i < nr_cpus; i++)
value += values[i];
diff --git a/samples/bpf/tracex4_user.c b/samples/bpf/tracex4_user.c
index bc4a3bdea6ed..8b2f98c6c99a 100644
--- a/samples/bpf/tracex4_user.c
+++ b/samples/bpf/tracex4_user.c
@@ -15,6 +15,8 @@
#include "libbpf.h"
#include "bpf_load.h"
+char bpf_log_buf[BPF_LOG_BUF_SIZE];
+
struct pair {
long long val;
__u64 ip;
@@ -37,8 +39,8 @@ static void print_old_objects(int fd)
key = write(1, "\e[1;1H\e[2J", 12); /* clear screen */
key = -1;
- while (bpf_get_next_key(map_fd[0], &key, &next_key) == 0) {
- bpf_lookup_elem(map_fd[0], &next_key, &v);
+ while (bpf_map_get_next_key(map_fd[0], &key, &next_key) == 0) {
+ bpf_map_lookup_elem(map_fd[0], &next_key, &v);
key = next_key;
if (val - v.val < 1000000000ll)
/* object was allocated more then 1 sec ago */
diff --git a/samples/bpf/tracex5_user.c b/samples/bpf/tracex5_user.c
index 36b5925bb137..035beea6b31c 100644
--- a/samples/bpf/tracex5_user.c
+++ b/samples/bpf/tracex5_user.c
@@ -8,6 +8,8 @@
#include "bpf_load.h"
#include <sys/resource.h>
+char bpf_log_buf[BPF_LOG_BUF_SIZE];
+
/* install fake seccomp program to enable seccomp code path inside the kernel,
* so that our kprobe attached to seccomp_phase1() can be triggered
*/
diff --git a/samples/bpf/tracex6_user.c b/samples/bpf/tracex6_user.c
index 8ea4976cfcf1..7a3b4a4b19f3 100644
--- a/samples/bpf/tracex6_user.c
+++ b/samples/bpf/tracex6_user.c
@@ -13,6 +13,8 @@
#define SAMPLE_PERIOD 0x7fffffffffffffffULL
+char bpf_log_buf[BPF_LOG_BUF_SIZE];
+
static void test_bpf_perf_event(void)
{
int nr_cpus = sysconf(_SC_NPROCESSORS_CONF);
@@ -36,7 +38,7 @@ static void test_bpf_perf_event(void)
goto exit;
}
- bpf_update_elem(map_fd[0], &i, &pmu_fd[i], BPF_ANY);
+ bpf_map_update_elem(map_fd[0], &i, &pmu_fd[i], BPF_ANY);
ioctl(pmu_fd[i], PERF_EVENT_IOC_ENABLE, 0);
}
diff --git a/samples/bpf/xdp1_user.c b/samples/bpf/xdp1_user.c
index a5e109e398a1..7302322f26ff 100644
--- a/samples/bpf/xdp1_user.c
+++ b/samples/bpf/xdp1_user.c
@@ -18,6 +18,8 @@
#include "bpf_load.h"
#include "libbpf.h"
+char bpf_log_buf[BPF_LOG_BUF_SIZE];
+
static int set_link_xdp_fd(int ifindex, int fd)
{
struct sockaddr_nl sa;
@@ -134,7 +136,7 @@ static void poll_stats(int interval)
for (key = 0; key < nr_keys; key++) {
__u64 sum = 0;
- assert(bpf_lookup_elem(map_fd[0], &key, values) == 0);
+ assert(bpf_map_lookup_elem(map_fd[0], &key, values) == 0);
for (i = 0; i < nr_cpus; i++)
sum += (values[i] - prev[key][i]);
if (sum)
--
2.10.2
^ permalink raw reply related
* [PATCHv3 perf/core 3/7] tools lib bpf: Add flags to bpf_create_map()
From: Joe Stringer @ 2016-12-09 2:46 UTC (permalink / raw)
To: linux-kernel; +Cc: wangnan0, ast, daniel, acme, netdev
In-Reply-To: <20161209024620.31660-1-joe@ovn.org>
The map_flags argument to bpf_create_map() was previously not exposed.
By exposing it, users can access flags such as whether or not to
preallocate the map.
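For illustration only, a minimal userspace sketch of what the extra argument carries. The helper names sketch_fill_map_attr() and sketch_create_map() are hypothetical and not part of this patch; the sketch assumes the installed <linux/bpf.h> already has the map_flags field and BPF_F_NO_PREALLOC, and it issues the bpf(2) syscall directly rather than going through tools/lib/bpf.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/bpf.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper (not part of the patch): fill the attribute block
 * the way bpf_create_map() does once map_flags is passed through. */
void sketch_fill_map_attr(union bpf_attr *attr, enum bpf_map_type map_type,
			  int key_size, int value_size, int max_entries,
			  __u32 map_flags)
{
	memset(attr, 0, sizeof(*attr));
	attr->map_type = map_type;
	attr->key_size = key_size;
	attr->value_size = value_size;
	attr->max_entries = max_entries;
	attr->map_flags = map_flags;
}

/* Hypothetical wrapper: returns a map fd, or -1 with errno set (for
 * example EPERM when the caller lacks the required privileges). */
int sketch_create_map(enum bpf_map_type map_type, int key_size,
		      int value_size, int max_entries, __u32 map_flags)
{
	union bpf_attr attr;

	sketch_fill_map_attr(&attr, map_type, key_size, value_size,
			     max_entries, map_flags);
	return (int)syscall(__NR_bpf, BPF_MAP_CREATE, &attr, sizeof(attr));
}
```

A caller that does not want preallocation would pass BPF_F_NO_PREALLOC as the last argument; passing 0 keeps the previous behaviour.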
Signed-off-by: Joe Stringer <joe@ovn.org>
---
v3: Split from "tools lib bpf: Sync with samples/bpf/libbpf".
---
tools/lib/bpf/bpf.c | 3 ++-
tools/lib/bpf/bpf.h | 2 +-
tools/lib/bpf/libbpf.c | 3 ++-
3 files changed, 5 insertions(+), 3 deletions(-)
diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index 89e8e8e5b60e..d0afb26c2e0f 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -54,7 +54,7 @@ static int sys_bpf(enum bpf_cmd cmd, union bpf_attr *attr,
}
int bpf_create_map(enum bpf_map_type map_type, int key_size,
- int value_size, int max_entries)
+ int value_size, int max_entries, __u32 map_flags)
{
union bpf_attr attr;
@@ -64,6 +64,7 @@ int bpf_create_map(enum bpf_map_type map_type, int key_size,
attr.key_size = key_size;
attr.value_size = value_size;
attr.max_entries = max_entries;
+ attr.map_flags = map_flags;
return sys_bpf(BPF_MAP_CREATE, &attr, sizeof(attr));
}
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index 61130170a6ad..7fcdce16fd62 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -24,7 +24,7 @@
#include <linux/bpf.h>
int bpf_create_map(enum bpf_map_type map_type, int key_size, int value_size,
- int max_entries);
+ int max_entries, __u32 map_flags);
/* Recommend log buffer size */
#define BPF_LOG_BUF_SIZE 65536
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 2e974593f3e8..84e6b35da4bd 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -854,7 +854,8 @@ bpf_object__create_maps(struct bpf_object *obj)
*pfd = bpf_create_map(def->type,
def->key_size,
def->value_size,
- def->max_entries);
+ def->max_entries,
+ 0);
if (*pfd < 0) {
size_t j;
int err = *pfd;
--
2.10.2
^ permalink raw reply related
* [PATCHv3 perf/core 2/7] tools lib bpf: use __u32 from linux/types.h
From: Joe Stringer @ 2016-12-09 2:46 UTC (permalink / raw)
To: linux-kernel; +Cc: wangnan0, ast, daniel, acme, netdev
In-Reply-To: <20161209024620.31660-1-joe@ovn.org>
Fixes the following issue when building without access to 'u32' type:
./tools/lib/bpf/bpf.h:27:23: error: unknown type name ‘u32’
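As a rough illustration of why the double-underscore types help: __u32 and __u64 are provided to userspace by <linux/types.h>, while plain u32/u64 are kernel-internal typedefs. The function sketch_pack_version() below is made up for this example and does not appear in the patch.

```c
#include <assert.h>
#include <linux/types.h>

/* Hypothetical function, only here to show a userspace prototype that
 * builds with the exported __u32/__u64 types and no kernel-only typedefs. */
__u64 sketch_pack_version(__u32 major, __u32 minor)
{
	return ((__u64)major << 32) | minor;
}
```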
Signed-off-by: Joe Stringer <joe@ovn.org>
---
v3: Split from "tools lib bpf: Sync with samples/bpf/libbpf"
---
tools/lib/bpf/bpf.c | 4 ++--
tools/lib/bpf/bpf.h | 4 ++--
2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index 8143536b462a..89e8e8e5b60e 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -70,7 +70,7 @@ int bpf_create_map(enum bpf_map_type map_type, int key_size,
int bpf_load_program(enum bpf_prog_type type, struct bpf_insn *insns,
size_t insns_cnt, char *license,
- u32 kern_version, char *log_buf, size_t log_buf_sz)
+ __u32 kern_version, char *log_buf, size_t log_buf_sz)
{
int fd;
union bpf_attr attr;
@@ -98,7 +98,7 @@ int bpf_load_program(enum bpf_prog_type type, struct bpf_insn *insns,
}
int bpf_map_update_elem(int fd, void *key, void *value,
- u64 flags)
+ __u64 flags)
{
union bpf_attr attr;
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index 253c3dbb06b4..61130170a6ad 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -30,11 +30,11 @@ int bpf_create_map(enum bpf_map_type map_type, int key_size, int value_size,
#define BPF_LOG_BUF_SIZE 65536
int bpf_load_program(enum bpf_prog_type type, struct bpf_insn *insns,
size_t insns_cnt, char *license,
- u32 kern_version, char *log_buf,
+ __u32 kern_version, char *log_buf,
size_t log_buf_sz);
int bpf_map_update_elem(int fd, void *key, void *value,
- u64 flags);
+ __u64 flags);
int bpf_map_lookup_elem(int fd, void *key, void *value);
int bpf_map_delete_elem(int fd, void *key);
--
2.10.2
^ permalink raw reply related
* [PATCHv3 perf/core 1/7] tools lib bpf: Sync {tools,}/include/uapi/linux/bpf.h
From: Joe Stringer @ 2016-12-09 2:46 UTC (permalink / raw)
To: linux-kernel; +Cc: wangnan0, ast, daniel, acme, netdev
In-Reply-To: <20161209024620.31660-1-joe@ovn.org>
The tools version of this header is out of date; update it to the latest
version from the kernel headers.
Signed-off-by: Joe Stringer <joe@ovn.org>
Acked-by: Wang Nan <wangnan0@huawei.com>
---
v3: Add ack.
v2: No change.
---
tools/include/uapi/linux/bpf.h | 51 ++++++++++++++++++++++++++++++++++++++++++
1 file changed, 51 insertions(+)
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 9e5fc168c8a3..f09c70b97eca 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -95,6 +95,7 @@ enum bpf_prog_type {
BPF_PROG_TYPE_SCHED_ACT,
BPF_PROG_TYPE_TRACEPOINT,
BPF_PROG_TYPE_XDP,
+ BPF_PROG_TYPE_PERF_EVENT,
};
#define BPF_PSEUDO_MAP_FD 1
@@ -375,6 +376,56 @@ enum bpf_func_id {
*/
BPF_FUNC_probe_write_user,
+ /**
+ * bpf_current_task_under_cgroup(map, index) - Check cgroup2 membership of current task
+ * @map: pointer to bpf_map in BPF_MAP_TYPE_CGROUP_ARRAY type
+ * @index: index of the cgroup in the bpf_map
+ * Return:
+ * == 0 current failed the cgroup2 descendant test
+ * == 1 current succeeded the cgroup2 descendant test
+ * < 0 error
+ */
+ BPF_FUNC_current_task_under_cgroup,
+
+ /**
+ * bpf_skb_change_tail(skb, len, flags)
+ * The helper will resize the skb to the given new size,
+ * to be used f.e. with control messages.
+ * @skb: pointer to skb
+ * @len: new skb length
+ * @flags: reserved
+ * Return: 0 on success or negative error
+ */
+ BPF_FUNC_skb_change_tail,
+
+ /**
+ * bpf_skb_pull_data(skb, len)
+ * The helper will pull in non-linear data in case the
+ * skb is non-linear and not all of len are part of the
+ * linear section. Only needed for read/write with direct
+ * packet access.
+ * @skb: pointer to skb
+ * @len: len to make read/writeable
+ * Return: 0 on success or negative error
+ */
+ BPF_FUNC_skb_pull_data,
+
+ /**
+ * bpf_csum_update(skb, csum)
+ * Adds csum into skb->csum in case of CHECKSUM_COMPLETE.
+ * @skb: pointer to skb
+ * @csum: csum to add
+ * Return: csum on success or negative error
+ */
+ BPF_FUNC_csum_update,
+
+ /**
+ * bpf_set_hash_invalid(skb)
+ * Invalidate current skb>hash.
+ * @skb: pointer to skb
+ */
+ BPF_FUNC_set_hash_invalid,
+
__BPF_FUNC_MAX_ID,
};
--
2.10.2
^ permalink raw reply related
* [PATCHv3 perf/core 0/7] Reuse libbpf from samples/bpf
From: Joe Stringer @ 2016-12-09 2:46 UTC (permalink / raw)
To: linux-kernel; +Cc: netdev, wangnan0, ast, daniel, acme
(Was "libbpf: Synchronize implementations")
Update tools/lib/bpf to provide the remaining bpf wrapper pieces needed by the
samples/bpf/ code, then get rid of all of the duplicate BPF libraries in
samples/bpf/libbpf.[ch].
---
v3: Add ack for first patch.
Split out second patch from v2 into separate changes for remaining diff.
Add patches to switch samples/bpf over to using tools/lib/.
v2: https://www.mail-archive.com/netdev@vger.kernel.org/msg135088.html
Don't shift non-bpf code into libbpf.
Drop the patch to synchronize ELF definitions with tc.
v1: https://www.mail-archive.com/netdev@vger.kernel.org/msg135088.html
First post.
Joe Stringer (7):
tools lib bpf: Sync {tools,}/include/uapi/linux/bpf.h
tools lib bpf: use __u32 from linux/types.h
tools lib bpf: Add flags to bpf_create_map()
samples/bpf: Make samples more libbpf-centric
samples/bpf: Switch over to libbpf
samples/bpf: Remove perf_event_open() declaration
samples/bpf: Move open_raw_sock to separate header
samples/bpf/Makefile | 61 +++++----
samples/bpf/README.rst | 4 +-
samples/bpf/bpf_load.c | 20 ++-
samples/bpf/fds_example.c | 10 +-
samples/bpf/lathist_user.c | 3 +-
samples/bpf/libbpf.c | 155 ----------------------
samples/bpf/libbpf.h | 25 +---
samples/bpf/map_perf_test_user.c | 1 +
samples/bpf/offwaketime_user.c | 10 +-
samples/bpf/sampleip_user.c | 8 +-
samples/bpf/sock_example.c | 11 +-
samples/bpf/sock_example.h | 35 +++++
samples/bpf/sockex1_user.c | 9 +-
samples/bpf/sockex2_user.c | 7 +-
samples/bpf/sockex3_user.c | 7 +-
samples/bpf/spintest_user.c | 10 +-
samples/bpf/tc_l2_redirect_user.c | 4 +-
samples/bpf/test_cgrp2_array_pin.c | 4 +-
samples/bpf/test_current_task_under_cgroup_user.c | 10 +-
samples/bpf/test_maps.c | 142 ++++++++++----------
samples/bpf/test_overhead_user.c | 2 +
samples/bpf/test_probe_write_user_user.c | 4 +-
samples/bpf/test_verifier.c | 8 +-
samples/bpf/trace_event_user.c | 24 ++--
samples/bpf/trace_output_user.c | 6 +-
samples/bpf/tracex1_user.c | 2 +
samples/bpf/tracex2_user.c | 12 +-
samples/bpf/tracex3_user.c | 6 +-
samples/bpf/tracex4_user.c | 6 +-
samples/bpf/tracex5_user.c | 2 +
samples/bpf/tracex6_user.c | 7 +-
samples/bpf/xdp1_user.c | 4 +-
tools/include/uapi/linux/bpf.h | 51 +++++++
tools/lib/bpf/Makefile | 2 +
tools/lib/bpf/bpf.c | 7 +-
tools/lib/bpf/bpf.h | 6 +-
tools/lib/bpf/libbpf.c | 3 +-
37 files changed, 332 insertions(+), 356 deletions(-)
delete mode 100644 samples/bpf/libbpf.c
create mode 100644 samples/bpf/sock_example.h
--
2.10.2
^ permalink raw reply
* Re: [PATCH net-next 3/3] net: xgene: avoid bogus maybe-uninitialized warning
From: David Miller @ 2016-12-09 2:31 UTC (permalink / raw)
To: arnd; +Cc: isubramanian, kchudgar, qnguyen, kdinh, toanle, netdev,
linux-kernel
In-Reply-To: <20161208215727.44841-3-arnd@arndb.de>
From: Arnd Bergmann <arnd@arndb.de>
Date: Thu, 8 Dec 2016 22:57:05 +0100
> In some configurations, gcc cannot trace the state of variables
> across a spin_unlock() barrier, leading to a warning about
> correct code:
>
> xgene_enet_main.c: In function 'xgene_enet_start_xmit':
> ../../../phy/mdio-xgene.h:112:14: error: 'mss_index' may be used uninitialized in this function [-Werror=maybe-uninitialized]
>
> Here we can trivially move the assignment before that spin_unlock,
> which reliably avoids the warning.
>
> Fixes: e3978673f514 ("drivers: net: xgene: Fix MSS programming")
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Applied.
^ permalink raw reply
* Re: [PATCH net-next 2/3] net: xgene: move xgene_cle_ptree_ewdn data off stack
From: David Miller @ 2016-12-09 2:31 UTC (permalink / raw)
To: arnd; +Cc: isubramanian, kchudgar, kdinh, qnguyen, tinamdar, netdev,
linux-kernel
In-Reply-To: <20161208215727.44841-2-arnd@arndb.de>
From: Arnd Bergmann <arnd@arndb.de>
Date: Thu, 8 Dec 2016 22:57:04 +0100
> The array for initializing the cle is set up on the stack with
> almost entirely constant data and then passed to a function that
> converts it into HW specific bit patterns. With the latest
> addition, the size of this array has grown to the point that
> we get a warning about potential stack overflow in allmodconfig
> builds:
>
> xgene_enet_cle.c: In function ‘xgene_enet_cle_init’:
> xgene_enet_cle.c:836:1: error: the frame size of 1032 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
>
> Looking a bit deeper at the usage, I noticed that the only modification
> of the data is in dead code, as we don't even use the cle module
> for phy_mode other than PHY_INTERFACE_MODE_XGMII. This means we
> can simply mark the structure constant and access it directly rather
> than passing the pointer down through another structure, making
> the code more efficient at the same time as avoiding the
> warning.
>
> Fixes: a809701fed15 ("drivers: net: xgene: fix: RSS for non-TCP/UDP")
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Applied.
^ permalink raw reply
* Re: [PATCH net-next 1/3] net/mlx5e: use %pad format string for dma_addr_t
From: David Miller @ 2016-12-09 2:31 UTC (permalink / raw)
To: arnd-r2nGTMty4D4
Cc: saeedm-VPRAkNaXOzVWk0Htik3J/w, matanb-VPRAkNaXOzVWk0Htik3J/w,
leonro-VPRAkNaXOzVWk0Htik3J/w, danielj-VPRAkNaXOzVWk0Htik3J/w,
tariqt-VPRAkNaXOzVWk0Htik3J/w, netdev-u79uwXL29TY76Z2rM5mHXA,
linux-rdma-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20161208215727.44841-1-arnd-r2nGTMty4D4@public.gmane.org>
From: Arnd Bergmann <arnd@arndb.de>
Date: Thu, 8 Dec 2016 22:57:03 +0100
> On 32-bit ARM with 64-bit dma_addr_t I get this warning about an
> incorrect format string:
>
> In file included from /git/arm-soc/drivers/net/ethernet/mellanox/mlx5/core/alloc.c:42:0:
> drivers/net/ethernet/mellanox/mlx5/core/alloc.c: In function ‘mlx5_frag_buf_alloc_node’:
> drivers/net/ethernet/mellanox/mlx5/core/alloc.c:134:12: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast]
>
> We have the special %pad format for printing dma_addr_t, so use that
> to print the correct address and avoid the warning.
>
> Fixes: 1c1b522808a1 ("net/mlx5e: Implement Fragmented Work Queue (WQ)")
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Applied.
^ permalink raw reply
* Re: [PATCH net 0/2] net: ethernet: Make sure we set dev->dev.parent
From: David Miller @ 2016-12-09 2:27 UTC (permalink / raw)
To: f.fainelli; +Cc: netdev, madalin.bucur, johan
In-Reply-To: <20161208194125.13264-1-f.fainelli@gmail.com>
From: Florian Fainelli <f.fainelli@gmail.com>
Date: Thu, 8 Dec 2016 11:41:23 -0800
> This patch series builds atop:
>
> ec988ad78ed6d184a7f4ca6b8e962b0e8f1de461 ("phy: Don't increment MDIO bus
> refcount unless it's a different owner")
>
> FMAN is the one that potentially needs patching as well (call SET_NETDEV_DEV),
> but there appears to be no way that init_phy is called right now, or there is
> not such an in-tree user. Madalin, can you comment on that?
Series applied, thanks.
^ permalink raw reply
* Re: [PATCH-RESEND] vhost-vsock: fix orphan connection reset
From: David Miller @ 2016-12-09 2:25 UTC (permalink / raw)
To: bergwolf; +Cc: stefanha, kvm, virtualization, netdev
In-Reply-To: <1481217046-7058-1-git-send-email-bergwolf@gmail.com>
From: Peng Tao <bergwolf@gmail.com>
Date: Fri, 9 Dec 2016 01:10:46 +0800
> local_addr.svm_cid is host cid. We should check guest cid instead,
> which is remote_addr.svm_cid. Otherwise we end up resetting all
> connections to all guests.
>
> Cc: stable@vger.kernel.org [4.8+]
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> Signed-off-by: Peng Tao <bergwolf@gmail.com>
Applied and queued up for -stable, thanks.
^ permalink raw reply
* Re: [PATCH] [v4] net: phy: phy drivers should not set SUPPORTED_[Asym_]Pause
From: David Miller @ 2016-12-09 2:23 UTC (permalink / raw)
To: timur; +Cc: f.fainelli, netdev, jon.mason, nks.gnu
In-Reply-To: <1481138451-28144-1-git-send-email-timur@codeaurora.org>
Florian, please review this patch.
Thanks.
^ permalink raw reply
* Re: fs, net: deadlock between bind/splice on af_unix
From: Al Viro @ 2016-12-09 1:32 UTC (permalink / raw)
To: Cong Wang
Cc: Dmitry Vyukov, linux-fsdevel@vger.kernel.org, LKML, David Miller,
Rainer Weikusat, Hannes Frederic Sowa, netdev, Eric Dumazet,
syzkaller
In-Reply-To: <CAM_iQpXu+fyjmvrYRB9+VJCdSLS=7Jiet762hqWDANfsOM0XWw@mail.gmail.com>
On Thu, Dec 08, 2016 at 04:08:27PM -0800, Cong Wang wrote:
> On Thu, Dec 8, 2016 at 8:30 AM, Dmitry Vyukov <dvyukov@google.com> wrote:
> > Chain exists of:
> > Possible unsafe locking scenario:
> >
> > CPU0 CPU1
> > ---- ----
> > lock(sb_writers#5);
> > lock(&u->bindlock);
> > lock(sb_writers#5);
> > lock(&pipe->mutex/1);
>
> This looks false positive, probably just needs lockdep_set_class()
> to set keys for pipe->mutex and unix->bindlock.
I'm afraid that it's not a false positive at all.
Preparations:
* create an AF_UNIX socket.
* set SOCK_PASSCRED on it.
* create a pipe.
Child 1: splice from pipe to socket; locks pipe and proceeds down towards
unix_dgram_sendmsg().
Child 2: splice from pipe to /mnt/foo/bar; requests write access to /mnt
and blocks on attempt to lock the pipe already locked by (1).
Child 3: freeze /mnt; blocks until (2) is done
Child 4: bind() the socket to /mnt/barf; grabs ->bindlock on the socket and
proceeds to create /mnt/barf, which blocks due to fairness of freezer (no
extra write accesses to something that is in process of being frozen).
_Now_ (1) gets around to unix_dgram_sendmsg(). We still have NULL u->addr,
since bind() has not gotten through yet. We also have SOCK_PASSCRED set,
so we attempt autobind; it blocks on the ->bindlock, which won't be
released until bind() is done (at which point we'll see non-NULL u->addr
and bugger off from autobind), but bind() won't succeed until /mnt
goes through the freeze-thaw cycle, which won't happen until (2) finishes,
which won't happen until (1) unlocks the pipe. Deadlock.
Granted, ->bindlock is taken interruptibly, so it's not that much of
a problem (you can kill the damn thing), but you would need to intervene
and kill it.
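The preparation steps above could be sketched from userspace roughly as follows. This covers only the setup (socket, SO_PASSCRED, pipe), not the four children, the freeze, or the deadlock itself. The struct and function names are hypothetical, and SOCK_DGRAM is an assumption based on the unix_dgram_sendmsg() path mentioned above.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical holder for the descriptors created by the preparation. */
struct sketch_prep {
	int sock;	/* unbound AF_UNIX datagram socket */
	int pipefd[2];	/* [0] read end, [1] write end */
};

/* Returns 0 on success, -1 on failure (errno set by the failing call). */
int sketch_prepare(struct sketch_prep *p)
{
	int on = 1;

	p->sock = socket(AF_UNIX, SOCK_DGRAM, 0);
	if (p->sock < 0)
		return -1;

	/* SO_PASSCRED is the userspace option behind the SOCK_PASSCRED flag. */
	if (setsockopt(p->sock, SOL_SOCKET, SO_PASSCRED, &on, sizeof(on)) < 0) {
		close(p->sock);
		return -1;
	}

	if (pipe(p->pipefd) < 0) {
		close(p->sock);
		return -1;
	}
	return 0;
}
```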
Why do we do autobind there, anyway, and why is it conditional on
SOCK_PASSCRED? Note that e.g. for SOCK_STREAM we can bloody well get
to sending stuff without autobind ever done - just use socketpair()
to create that sucker and we won't be going through the connect()
at all.
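A small sketch of the socketpair() case, under the assumption that getsockname() on a never-bound AF_UNIX socket reports an address length covering only the family field. The function name is hypothetical; it sends one byte over a SOCK_STREAM pair with SO_PASSCRED set and returns the sender's reported address length so the caller can see whether any autobind took place.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Sends one byte over a socketpair() with SO_PASSCRED set, then asks
 * getsockname() for the sender's address.
 * Returns the reported address length, or -1 on failure. */
int sketch_stream_send_without_connect(void)
{
	int sv[2], on = 1;
	char byte = 'x', got = 0;
	struct sockaddr_un addr;
	socklen_t len = sizeof(addr);
	int ret = -1;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return -1;
	if (setsockopt(sv[0], SOL_SOCKET, SO_PASSCRED, &on, sizeof(on)) < 0)
		goto out;
	/* No bind() and no connect(): the pair is already connected. */
	if (write(sv[0], &byte, 1) != 1 || read(sv[1], &got, 1) != 1)
		goto out;
	if (got != byte)
		goto out;
	if (getsockname(sv[0], (struct sockaddr *)&addr, &len) < 0)
		goto out;
	ret = (int)len;
out:
	close(sv[0]);
	close(sv[1]);
	return ret;
}
```

If the returned length equals sizeof(sa_family_t), the sender went through the send path without ever acquiring an address.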
^ permalink raw reply
* Re: [Intel-wired-lan] [RFC PATCH] i40e: enable PCIe relax ordering for SPARC
From: tndave @ 2016-12-09 1:16 UTC (permalink / raw)
To: Alexander Duyck, David Laight
Cc: Jeff Kirsher, intel-wired-lan, Netdev, sparclinux
In-Reply-To: <fc3fc99a-ff53-d895-ec06-56f9cf8a0d13@oracle.com>
On 12/08/2016 04:45 PM, tndave wrote:
>
>
> On 12/08/2016 08:05 AM, Alexander Duyck wrote:
>> On Thu, Dec 8, 2016 at 2:43 AM, David Laight
>> <David.Laight@aculab.com> wrote:
>>> From: Alexander Duyck
>>>> Sent: 06 December 2016 17:10
>>> ...
>>>> I was thinking about it and I realized we can probably simplify
>>>> this even further. In the case of most other architectures the
>>>> DMA_ATTR_WEAK_ORDERING has no effect anyway. So from what I can
>>>> tell there is probably no reason not to just always pass that
>>>> attribute with the DMA mappings. From what I can tell the only
>>>> other architecture that uses this is the PowerPC Cell
>>>> architecture.
>>>
>>> And I should have read all the thread :-(
>>>
>>>> Also I was wondering if you actually needed to enable this
>>>> attribute for both Rx and Tx buffers or just Rx buffers? The
>>>> patch that enabled DMA_ATTR_WEAK_ORDERING for Sparc64 seems to
>>>> call out writes, but I didn't see anything about reads. I'm just
>>>> wondering if changing the code for Tx has any effect? If not you
>>>> could probably drop those changes and just focus on Rx.
>>>
>>> 'Weak ordering' only applies to PCIe read transfers, so can only
>>> have an effect on descriptor reads and transmit buffer reads.
>>>
>>> Basically PCIe is a comms protocol and an endpoint (or the host)
>>> can have multiple outstanding read requests (each of which might
>>> generate multiple response messages. The responses for each request
>>> must arrive in order, but responses for different requests can be
>>> interleaved. Setting 'not weak ordering' lets the host interwork
>>> with broken endpoints. (Or, like we did, you fix the fpga's PCIe
>>> implementation.)
>>
>> I get the basics of relaxed ordering. The question is how does the
>> Sparc64 IOMMU translate DMA_ATTR_WEAK_ORDERING into relaxed ordering
>> messages, and at what level the ordering is relaxed. Odds are the
>> wording in the description where this attribute was added to Sparc
>> is just awkward, but I was wanting to verify if this only applies to
>> writes, or also read completions.
> In Sparc64, passing DMA_ATTR_WEAK_ORDERING in dma map/unmap only affects
> PCIe root complex (Hostbridge). Using DMA_ATTR_WEAK_ORDERING, requested
> DMA transaction can be relaxed ordered within the PCIe root complex.
>
> In Sparc64, memory writes can be held at PCIe root complex not letting
> other memory writes to go through. By passing DMA_ATTR_WEAK_ORDERING in
> dma map/unmap allows memory writes to bypass other memory writes in PCIe
> root complex. (This applies to only PCIe root complex and does not
> affect at any other level of PCIe hierarchy e.g. PCIe bridges et al.
> Also the PCIe root complex when bypassing memory writes does follow PCIe
> relax ordering rules as per PCIe specification.
>
> For reference [old but still relevant write-up]: PCI-Express Relaxed
> Ordering and the Sun SPARC Enterprise M-class Servers
> https://blogs.oracle.com/olympus/entry/relaxed_ordering
>
>>
>>> In this case you need the reads of both transmit and receive rings
>>> to 'overtake' reads of transmit data.
>>
>> Actually that isn't quite right. With relaxed ordering completions
>> and writes can pass each other if I recall correctly, but reads will
>> always force all writes ahead of them to be completed before you can
>> begin generating the read completions.
> That is my understanding as well.
>
>>
>>> I'm not at all clear how this 'flag' can be set on dma_map(). It is
>>> a property of the PCIe subsystem.
> Because in Sparc64, passing DMA_ATTR_WEAK_ORDERING flag in DMA map/unmap
> adds an entry in IOMMU/ATU table so that an access to requested DMA
> address from PCIe root complex can be relaxed ordered.
>>
>> That was where my original question on this came in. We can do a
>> blanket enable of relaxed ordering for Tx and Rx data buffers, but
>> if we only need it on Rx then there isn't any need for us to make
>> unnecessary changes.
> I ran some quick test and it is likely that we don't need
> DMA_ATTR_WEAK_ORDERING for any TX dma buffer (because in case of TX dma
> buffers, its all memory reads from device).
in above line , s/from/by
+ cc sparclinux@vger.kernel.org
-Tushar
>
> -Tushar
>>
>> - Alex
>>
>
^ permalink raw reply
* Re: Soft lockup in inet_put_port on 4.6
From: Josef Bacik @ 2016-12-09 1:01 UTC (permalink / raw)
To: Eric Dumazet
Cc: Hannes Frederic Sowa, Tom Herbert,
Linux Kernel Network Developers
In-Reply-To: <1481243432.4930.145.camel@edumazet-glaptop3.roam.corp.google.com>
> On Dec 8, 2016, at 7:32 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
>> On Thu, 2016-12-08 at 16:36 -0500, Josef Bacik wrote:
>>
>> We can reproduce the problem at will, still trying to run down the
>> problem. I'll try and find one of the boxes that dumped a core and get
>> a bt of everybody. Thanks,
>
> OK, sounds good.
>
> I had a look and :
> - could not spot a fix that came after 4.6.
> - could not spot an obvious bug.
>
> Anything special in the program triggering the issue ?
> SO_REUSEPORT and/or special socket options ?
>
So they recently started using SO_REUSEPORT, and that's what triggered it; if they don't use it, everything is fine.
I added some instrumentation for get_port to see if it was looping in there and none of my printk's triggered. The softlockup messages are always on the inet_bind_bucket lock, sometimes in the process context in get_port or in the softirq context either through inet_put_port or inet_kill_twsk. On the box that I have a coredump for there's only one processor in the inet code so I'm not sure what to make of that. That was a box from last week so I'll look at a more recent core and see if it's different. Thanks,
Josef
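For readers unfamiliar with the option, a generic sketch of SO_REUSEPORT usage follows. This is not the program that triggers the lockup, whose details are not given in this thread; the helper names are hypothetical. It only shows two TCP sockets of the same user being bound to one port, which is the kind of sharing the option permits.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: TCP socket with SO_REUSEPORT set, bound to the
 * given port (network byte order; 0 lets the kernel pick one).
 * Returns the fd, or -1 on failure. */
int sketch_reuseport_bind(in_port_t port_be)
{
	struct sockaddr_in addr;
	int on = 1;
	int fd = socket(AF_INET, SOCK_STREAM, 0);

	if (fd < 0)
		return -1;
	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_ANY);
	addr.sin_port = port_be;
	/* The option has to be set before bind() on every sharing socket. */
	if (setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &on, sizeof(on)) < 0 ||
	    bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Returns the port (network byte order) a bound socket is on, or 0. */
in_port_t sketch_bound_port(int fd)
{
	struct sockaddr_in addr;
	socklen_t len = sizeof(addr);

	if (getsockname(fd, (struct sockaddr *)&addr, &len) < 0)
		return 0;
	return addr.sin_port;
}
```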
^ permalink raw reply
* Re: [PATCH 2/6] net: ethernet: ti: cpts: add support for ext rftclk selection
From: Stephen Boyd @ 2016-12-09 0:47 UTC (permalink / raw)
To: Grygorii Strashko
Cc: Richard Cochran, Murali Karicheri, David S. Miller, netdev,
Mugunthan V N, Sekhar Nori, linux-kernel, linux-omap, Rob Herring,
devicetree, Wingman Kwok, linux-clk
In-Reply-To: <11994fbc-3713-6ef7-8a44-8a2442106dfc@ti.com>
On 12/06, Grygorii Strashko wrote:
> Subject: [PATCH] cpts refclk sel
>
> Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
> ---
> arch/arm/boot/dts/keystone-k2e-netcp.dtsi | 10 +++++-
> drivers/net/ethernet/ti/cpts.c | 52 ++++++++++++++++++++++++++++++-
> 2 files changed, 60 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm/boot/dts/keystone-k2e-netcp.dtsi b/arch/arm/boot/dts/keystone-k2e-netcp.dtsi
> index 919e655..b27aa22 100644
> --- a/arch/arm/boot/dts/keystone-k2e-netcp.dtsi
> +++ b/arch/arm/boot/dts/keystone-k2e-netcp.dtsi
> @@ -138,7 +138,7 @@ netcp: netcp@24000000 {
> /* NetCP address range */
> ranges = <0 0x24000000 0x1000000>;
>
> - clocks = <&clkpa>, <&clkcpgmac>, <&chipclk12>;
> + clocks = <&clkpa>, <&clkcpgmac>, <&cpts_mux>;
> clock-names = "pa_clk", "ethss_clk", "cpts";
> dma-coherent;
>
> @@ -162,6 +162,14 @@ netcp: netcp@24000000 {
> cpts-ext-ts-inputs = <6>;
> cpts-ts-comp-length;
>
> + cpts_mux: cpts_refclk_mux {
> + #clock-cells = <0>;
> + clocks = <&chipclk12>, <&chipclk13>;
> + cpts-mux-tbl = <0>, <1>;
> + assigned-clocks = <&cpts_mux>;
> + assigned-clock-parents = <&chipclk12>;
Is there a binding update? Why the subnode? Why not have it as
part of the netcp node? Does the cpts-mux-tbl property change?
> + };
> +
> interfaces {
> gbe0: interface-0 {
> slave-port = <0>;
> diff --git a/drivers/net/ethernet/ti/cpts.c b/drivers/net/ethernet/ti/cpts.c
> index 938de22..ef94316 100644
> --- a/drivers/net/ethernet/ti/cpts.c
> +++ b/drivers/net/ethernet/ti/cpts.c
> @@ -17,6 +17,7 @@
> * along with this program; if not, write to the Free Software
> * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
> */
> +#include <linux/clk-provider.h>
> #include <linux/err.h>
> #include <linux/if.h>
> #include <linux/hrtimer.h>
> @@ -672,6 +673,7 @@ int cpts_register(struct cpts *cpts)
> cpts->phc_index = ptp_clock_index(cpts->clock);
>
> schedule_delayed_work(&cpts->overflow_work, cpts->ov_check_period);
> +
Maybe in another patch.
> return 0;
>
> err_ptp:
> @@ -741,6 +743,54 @@ static void cpts_calc_mult_shift(struct cpts *cpts)
> freq, cpts->cc_mult, cpts->cc.shift, (ns - NSEC_PER_SEC));
> }
>
> +static int cpts_of_mux_clk_setup(struct cpts *cpts, struct device_node *node)
> +{
> + unsigned int num_parents;
> + const char **parent_names;
> + struct device_node *refclk_np;
> + void __iomem *reg;
> + struct clk *clk;
> + u32 *mux_table;
> + int ret;
> +
> + refclk_np = of_get_child_by_name(node, "cpts_refclk_mux");
> + if (!refclk_np)
> + return -EINVAL;
> +
> + num_parents = of_clk_get_parent_count(refclk_np);
> + if (num_parents < 1) {
> + dev_err(cpts->dev, "mux-clock %s must have parents\n",
> + refclk_np->name);
> + return -EINVAL;
> + }
> + parent_names = devm_kzalloc(cpts->dev, (sizeof(char *) * num_parents),
> + GFP_KERNEL);
> + if (!parent_names)
> + return -ENOMEM;
> +
> + of_clk_parent_fill(refclk_np, parent_names, num_parents);
> +
> + mux_table = devm_kzalloc(cpts->dev, sizeof(*mux_table) * (32 + 1),
> + GFP_KERNEL);
> + if (!mux_table)
> + return -ENOMEM;
> +
> + ret = of_property_read_variable_u32_array(refclk_np, "cpts-mux-tbl",
> + mux_table, 1, 32);
> + if (ret < 0)
> + return ret;
> +
> + reg = &cpts->reg->rftclk_sel;
> +
> + clk = clk_register_mux_table(cpts->dev, refclk_np->name,
> + parent_names, num_parents,
> + 0, reg, 0, 0x1F, 0, mux_table, NULL);
> + if (IS_ERR(clk))
> + return PTR_ERR(clk);
> +
> + return of_clk_add_provider(refclk_np, of_clk_src_simple_get, clk);
Can you please use the clk_hw APIs instead?
--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
^ permalink raw reply
* Re: [Intel-wired-lan] [RFC PATCH] i40e: enable PCIe relax ordering for SPARC
From: tndave @ 2016-12-09 0:45 UTC (permalink / raw)
To: Alexander Duyck, David Laight; +Cc: Jeff Kirsher, intel-wired-lan, Netdev
In-Reply-To: <CAKgT0Uf3aZBoziv7B0tFMovYe9JNWEYZAZhe9iof2XcuRJD+uw@mail.gmail.com>
On 12/08/2016 08:05 AM, Alexander Duyck wrote:
> On Thu, Dec 8, 2016 at 2:43 AM, David Laight
> <David.Laight@aculab.com> wrote:
>> From: Alexander Duyck
>>> Sent: 06 December 2016 17:10
>> ...
>>> I was thinking about it and I realized we can probably simplify
>>> this even further. In the case of most other architectures the
>>> DMA_ATTR_WEAK_ORDERING has no effect anyway. So from what I can
>>> tell there is probably no reason not to just always pass that
>>> attribute with the DMA mappings. From what I can tell the only
>>> other architecture that uses this is the PowerPC Cell
>>> architecture.
>>
>> And I should have read all the thread :-(
>>
>>> Also I was wondering if you actually needed to enable this
>>> attribute for both Rx and Tx buffers or just Rx buffers? The
>>> patch that enabled DMA_ATTR_WEAK_ORDERING for Sparc64 seems to
>>> call out writes, but I didn't see anything about reads. I'm just
>>> wondering if changing the code for Tx has any effect? If not you
>>> could probably drop those changes and just focus on Rx.
>>
>> 'Weak ordering' only applies to PCIe read transfers, so can only
>> have an effect on descriptor reads and transmit buffer reads.
>>
>> Basically PCIe is a comms protocol and an endpoint (or the host)
>> can have multiple outstanding read requests (each of which might
>> generate multiple response messages. The responses for each request
>> must arrive in order, but responses for different requests can be
>> interleaved. Setting 'not weak ordering' lets the host interwork
>> with broken endpoints. (Or, like we did, you fix the fpga's PCIe
>> implementation.)
>
> I get the basics of relaxed ordering. The question is how does the
> Sparc64 IOMMU translate DMA_ATTR_WEAK_ORDERING into relaxed ordering
> messages, and at what level the ordering is relaxed. Odds are the
> wording in the description where this attribute was added to Sparc
> is just awkward, but I was wanting to verify if this only applies to
> writes, or also read completions.
On Sparc64, passing DMA_ATTR_WEAK_ORDERING in dma map/unmap only affects
the PCIe root complex (host bridge). With DMA_ATTR_WEAK_ORDERING, the
requested DMA transaction can be relaxed-ordered within the PCIe root
complex.
On Sparc64, memory writes can be held at the PCIe root complex, blocking
other memory writes from going through. Passing DMA_ATTR_WEAK_ORDERING in
dma map/unmap allows memory writes to bypass other memory writes in the
PCIe root complex. (This applies only to the PCIe root complex and has no
effect at any other level of the PCIe hierarchy, e.g. PCIe bridges.)
Also, when bypassing memory writes, the PCIe root complex does follow the
PCIe relaxed ordering rules as per the PCIe specification.
For reference [old but still relevant write-up]: PCI-Express Relaxed
Ordering and the Sun SPARC Enterprise M-class Servers
https://blogs.oracle.com/olympus/entry/relaxed_ordering
>
>> In this case you need the reads of both transmit and receive rings
>> to 'overtake' reads of transmit data.
>
> Actually that isn't quite right. With relaxed ordering completions
> and writes can pass each other if I recall correctly, but reads will
> always force all writes ahead of them to be completed before you can
> begin generating the read completions.
That is my understanding as well.
>
>> I'm not at all clear how this 'flag' can be set on dma_map(). It is
>> a property of the PCIe subsystem.
Because on Sparc64, passing the DMA_ATTR_WEAK_ORDERING flag in DMA
map/unmap adds an entry to the IOMMU/ATU table so that accesses to the
requested DMA address from the PCIe root complex can be relaxed-ordered.
>
> That was where my original question on this came in. We can do a
> blanket enable of relaxed ordering for Tx and Rx data buffers, but
> if we only need it on Rx then there isn't any need for us to make
> unnecessary changes.
I ran a quick test, and it is likely that we don't need
DMA_ATTR_WEAK_ORDERING for any TX DMA buffer (because in the case of TX
DMA buffers, it's all memory reads from the device).
-Tushar
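To make the Rx-only idea concrete, here is a minimal userspace sketch of choosing mapping attributes per buffer direction. The helper name buf_dma_attrs() is hypothetical, and the enum and macro below are stand-ins for the kernel definitions in <linux/dma-mapping.h> so that the snippet builds on its own; treat the numeric values as placeholders.

```c
/* Stand-ins for kernel definitions so the sketch builds in userspace.
 * In the kernel these come from <linux/dma-mapping.h>; the values here
 * are placeholders for illustration only. */
enum dma_data_direction { DMA_TO_DEVICE = 1, DMA_FROM_DEVICE = 2 };
#define DMA_ATTR_WEAK_ORDERING (1UL << 1)

/* Hypothetical helper: pick DMA mapping attributes per buffer direction. */
static unsigned long buf_dma_attrs(enum dma_data_direction dir)
{
    /* Rx buffers are written by the device (memory writes), which is
     * what the root complex may reorder when the attribute is set. */
    if (dir == DMA_FROM_DEVICE)
        return DMA_ATTR_WEAK_ORDERING;

    /* Tx buffers are only read by the device, so request nothing. */
    return 0;
}
```

In a driver, the returned value would presumably be passed as the attrs argument of dma_map_single_attrs() for the matching buffer, with the same attrs given again at unmap time.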
>
> - Alex
>
* Re: [PATCH 37/50] netfilter: nf_tables: atomic dump and reset for stateful objects
From: Paul Gortmaker @ 2016-12-09 0:40 UTC (permalink / raw)
To: Pablo Neira Ayuso
Cc: netfilter-devel, David Miller, netdev, linux-next@vger.kernel.org
In-Reply-To: <1481147576-5690-38-git-send-email-pablo@netfilter.org>
On Wed, Dec 7, 2016 at 4:52 PM, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> This patch adds a new NFT_MSG_GETOBJ_RESET command to perform an atomic
> dump-and-reset of the stateful object. This also adds support
> for atomic dump and reset for counter and quota objects.
This triggered a new build failure in linux-next on parisc-32, which a
hands-off bisect run lists as resulting from this:
ERROR: "__cmpxchg_u64" [net/netfilter/nft_counter.ko] undefined!
make[2]: *** [__modpost] Error 1
make[1]: *** [modules] Error 2
make: *** [sub-make] Error 2
43da04a593d8b2626f1cf4b56efe9402f6b53652 is the first bad commit
commit 43da04a593d8b2626f1cf4b56efe9402f6b53652
Author: Pablo Neira Ayuso <pablo@netfilter.org>
Date: Mon Nov 28 00:05:44 2016 +0100
netfilter: nf_tables: atomic dump and reset for stateful objects
This patch adds a new NFT_MSG_GETOBJ_RESET command perform an atomic
dump-and-reset of the stateful object. This also comes with add support
for atomic dump and reset for counter and quota objects.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
:040000 040000 6cd4554f69247e5c837db52342f26888beda1623
5908aca93c89e7922336546c3753bfcf2aceefba M include
:040000 040000 f25d5831eb30972436bd198c5bb237a0cb0b4856
4ee5751c8de02bb5a8dcaadb2a2df7986d90f8e9 M net
bisect run success
Guessing this is more an issue with parisc than it is with netfilter, but I
figured I'd mention it anyway.
Paul.
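For readers following along, the missing symbol comes from the 64-bit compare-and-swap that the counter reset relies on. Below is a userspace sketch of the same read-and-zero pattern using C11 atomics, not the kernel API; the function names are made up for illustration. The second variant uses an unconditional exchange, similar in spirit to what the quota side of the patch does with atomic64_xchg().

```c
#include <stdatomic.h>
#include <stdint.h>

/* Userspace analogue of the patch's __nft_counter_reset(): retry a
 * compare-and-swap until the value we read is the one we replaced. */
static uint64_t counter_reset_cas(_Atomic uint64_t *counter)
{
    uint64_t old = atomic_load(counter);

    /* On failure, 'old' is refreshed with the current value. */
    while (!atomic_compare_exchange_weak(counter, &old, (uint64_t)0))
        ;
    return old;
}

/* Same observable effect with a single unconditional exchange. */
static uint64_t counter_reset_xchg(_Atomic uint64_t *counter)
{
    return atomic_exchange(counter, (uint64_t)0);
}
```

Both return the previous value and leave the counter at zero. Whether an exchange-based variant would avoid the parisc-32 link error is not something this sketch can show; that depends on which 64-bit primitives the architecture provides.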
--
>
> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
> ---
> include/net/netfilter/nf_tables.h | 3 +-
> include/uapi/linux/netfilter/nf_tables.h | 2 ++
> net/netfilter/nf_tables_api.c | 29 ++++++++++++-----
> net/netfilter/nft_counter.c | 56 +++++++++++++++++++++++++++-----
> net/netfilter/nft_quota.c | 18 ++++++----
> 5 files changed, 85 insertions(+), 23 deletions(-)
>
> diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
> index 903cd618f50e..6f7d6a1dc09c 100644
> --- a/include/net/netfilter/nf_tables.h
> +++ b/include/net/netfilter/nf_tables.h
> @@ -997,7 +997,8 @@ struct nft_object_type {
> struct nft_object *obj);
> void (*destroy)(struct nft_object *obj);
> int (*dump)(struct sk_buff *skb,
> - const struct nft_object *obj);
> + struct nft_object *obj,
> + bool reset);
> };
>
> int nft_register_obj(struct nft_object_type *obj_type);
> diff --git a/include/uapi/linux/netfilter/nf_tables.h b/include/uapi/linux/netfilter/nf_tables.h
> index 3d47582caa80..399eac1eee91 100644
> --- a/include/uapi/linux/netfilter/nf_tables.h
> +++ b/include/uapi/linux/netfilter/nf_tables.h
> @@ -89,6 +89,7 @@ enum nft_verdicts {
> * @NFT_MSG_NEWOBJ: create a stateful object (enum nft_obj_attributes)
> * @NFT_MSG_GETOBJ: get a stateful object (enum nft_obj_attributes)
> * @NFT_MSG_DELOBJ: delete a stateful object (enum nft_obj_attributes)
> + * @NFT_MSG_GETOBJ_RESET: get and reset a stateful object (enum nft_obj_attributes)
> */
> enum nf_tables_msg_types {
> NFT_MSG_NEWTABLE,
> @@ -112,6 +113,7 @@ enum nf_tables_msg_types {
> NFT_MSG_NEWOBJ,
> NFT_MSG_GETOBJ,
> NFT_MSG_DELOBJ,
> + NFT_MSG_GETOBJ_RESET,
> NFT_MSG_MAX,
> };
>
> diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
> index 2ae717c5dcb8..bfc015af366a 100644
> --- a/net/netfilter/nf_tables_api.c
> +++ b/net/netfilter/nf_tables_api.c
> @@ -3972,14 +3972,14 @@ static struct nft_object *nft_obj_init(const struct nft_object_type *type,
> }
>
> static int nft_object_dump(struct sk_buff *skb, unsigned int attr,
> - const struct nft_object *obj)
> + struct nft_object *obj, bool reset)
> {
> struct nlattr *nest;
>
> nest = nla_nest_start(skb, attr);
> if (!nest)
> goto nla_put_failure;
> - if (obj->type->dump(skb, obj) < 0)
> + if (obj->type->dump(skb, obj, reset) < 0)
> goto nla_put_failure;
> nla_nest_end(skb, nest);
> return 0;
> @@ -4096,7 +4096,7 @@ static int nf_tables_newobj(struct net *net, struct sock *nlsk,
> static int nf_tables_fill_obj_info(struct sk_buff *skb, struct net *net,
> u32 portid, u32 seq, int event, u32 flags,
> int family, const struct nft_table *table,
> - const struct nft_object *obj)
> + struct nft_object *obj, bool reset)
> {
> struct nfgenmsg *nfmsg;
> struct nlmsghdr *nlh;
> @@ -4115,7 +4115,7 @@ static int nf_tables_fill_obj_info(struct sk_buff *skb, struct net *net,
> nla_put_string(skb, NFTA_OBJ_NAME, obj->name) ||
> nla_put_be32(skb, NFTA_OBJ_TYPE, htonl(obj->type->type)) ||
> nla_put_be32(skb, NFTA_OBJ_USE, htonl(obj->use)) ||
> - nft_object_dump(skb, NFTA_OBJ_DATA, obj))
> + nft_object_dump(skb, NFTA_OBJ_DATA, obj, reset))
> goto nla_put_failure;
>
> nlmsg_end(skb, nlh);
> @@ -4131,10 +4131,14 @@ static int nf_tables_dump_obj(struct sk_buff *skb, struct netlink_callback *cb)
> const struct nfgenmsg *nfmsg = nlmsg_data(cb->nlh);
> const struct nft_af_info *afi;
> const struct nft_table *table;
> - const struct nft_object *obj;
> unsigned int idx = 0, s_idx = cb->args[0];
> struct net *net = sock_net(skb->sk);
> int family = nfmsg->nfgen_family;
> + struct nft_object *obj;
> + bool reset = false;
> +
> + if (NFNL_MSG_TYPE(cb->nlh->nlmsg_type) == NFT_MSG_GETOBJ_RESET)
> + reset = true;
>
> rcu_read_lock();
> cb->seq = net->nft.base_seq;
> @@ -4156,7 +4160,7 @@ static int nf_tables_dump_obj(struct sk_buff *skb, struct netlink_callback *cb)
> cb->nlh->nlmsg_seq,
> NFT_MSG_NEWOBJ,
> NLM_F_MULTI | NLM_F_APPEND,
> - afi->family, table, obj) < 0)
> + afi->family, table, obj, reset) < 0)
> goto done;
>
> nl_dump_check_consistent(cb, nlmsg_hdr(skb));
> @@ -4183,6 +4187,7 @@ static int nf_tables_getobj(struct net *net, struct sock *nlsk,
> const struct nft_table *table;
> struct nft_object *obj;
> struct sk_buff *skb2;
> + bool reset = false;
> u32 objtype;
> int err;
>
> @@ -4214,9 +4219,12 @@ static int nf_tables_getobj(struct net *net, struct sock *nlsk,
> if (!skb2)
> return -ENOMEM;
>
> + if (NFNL_MSG_TYPE(nlh->nlmsg_type) == NFT_MSG_GETOBJ_RESET)
> + reset = true;
> +
> err = nf_tables_fill_obj_info(skb2, net, NETLINK_CB(skb).portid,
> nlh->nlmsg_seq, NFT_MSG_NEWOBJ, 0,
> - family, table, obj);
> + family, table, obj, reset);
> if (err < 0)
> goto err;
>
> @@ -4291,7 +4299,7 @@ static int nf_tables_obj_notify(const struct nft_ctx *ctx,
>
> err = nf_tables_fill_obj_info(skb, ctx->net, ctx->portid, ctx->seq,
> event, 0, ctx->afi->family, ctx->table,
> - obj);
> + obj, false);
> if (err < 0) {
> kfree_skb(skb);
> goto err;
> @@ -4482,6 +4490,11 @@ static const struct nfnl_callback nf_tables_cb[NFT_MSG_MAX] = {
> .attr_count = NFTA_OBJ_MAX,
> .policy = nft_obj_policy,
> },
> + [NFT_MSG_GETOBJ_RESET] = {
> + .call = nf_tables_getobj,
> + .attr_count = NFTA_OBJ_MAX,
> + .policy = nft_obj_policy,
> + },
> };
>
> static void nft_chain_commit_update(struct nft_trans *trans)
> diff --git a/net/netfilter/nft_counter.c b/net/netfilter/nft_counter.c
> index 6f3dd429f865..f6a02c5071c2 100644
> --- a/net/netfilter/nft_counter.c
> +++ b/net/netfilter/nft_counter.c
> @@ -100,10 +100,10 @@ static void nft_counter_obj_destroy(struct nft_object *obj)
> nft_counter_do_destroy(priv);
> }
>
> -static void nft_counter_fetch(const struct nft_counter_percpu __percpu *counter,
> +static void nft_counter_fetch(struct nft_counter_percpu __percpu *counter,
> struct nft_counter *total)
> {
> - const struct nft_counter_percpu *cpu_stats;
> + struct nft_counter_percpu *cpu_stats;
> u64 bytes, packets;
> unsigned int seq;
> int cpu;
> @@ -122,12 +122,52 @@ static void nft_counter_fetch(const struct nft_counter_percpu __percpu *counter,
> }
> }
>
> +static u64 __nft_counter_reset(u64 *counter)
> +{
> + u64 ret, old;
> +
> + do {
> + old = *counter;
> + ret = cmpxchg64(counter, old, 0);
> + } while (ret != old);
> +
> + return ret;
> +}
> +
> +static void nft_counter_reset(struct nft_counter_percpu __percpu *counter,
> + struct nft_counter *total)
> +{
> + struct nft_counter_percpu *cpu_stats;
> + u64 bytes, packets;
> + unsigned int seq;
> + int cpu;
> +
> + memset(total, 0, sizeof(*total));
> + for_each_possible_cpu(cpu) {
> + bytes = packets = 0;
> +
> + cpu_stats = per_cpu_ptr(counter, cpu);
> + do {
> + seq = u64_stats_fetch_begin_irq(&cpu_stats->syncp);
> + packets += __nft_counter_reset(&cpu_stats->counter.packets);
> + bytes += __nft_counter_reset(&cpu_stats->counter.bytes);
> + } while (u64_stats_fetch_retry_irq(&cpu_stats->syncp, seq));
> +
> + total->packets += packets;
> + total->bytes += bytes;
> + }
> +}
> +
> static int nft_counter_do_dump(struct sk_buff *skb,
> - const struct nft_counter_percpu_priv *priv)
> + const struct nft_counter_percpu_priv *priv,
> + bool reset)
> {
> struct nft_counter total;
>
> - nft_counter_fetch(priv->counter, &total);
> + if (reset)
> + nft_counter_reset(priv->counter, &total);
> + else
> + nft_counter_fetch(priv->counter, &total);
>
> if (nla_put_be64(skb, NFTA_COUNTER_BYTES, cpu_to_be64(total.bytes),
> NFTA_COUNTER_PAD) ||
> @@ -141,11 +181,11 @@ static int nft_counter_do_dump(struct sk_buff *skb,
> }
>
> static int nft_counter_obj_dump(struct sk_buff *skb,
> - const struct nft_object *obj)
> + struct nft_object *obj, bool reset)
> {
> - const struct nft_counter_percpu_priv *priv = nft_obj_data(obj);
> + struct nft_counter_percpu_priv *priv = nft_obj_data(obj);
>
> - return nft_counter_do_dump(skb, priv);
> + return nft_counter_do_dump(skb, priv, reset);
> }
>
> static const struct nla_policy nft_counter_policy[NFTA_COUNTER_MAX + 1] = {
> @@ -178,7 +218,7 @@ static int nft_counter_dump(struct sk_buff *skb, const struct nft_expr *expr)
> {
> const struct nft_counter_percpu_priv *priv = nft_expr_priv(expr);
>
> - return nft_counter_do_dump(skb, priv);
> + return nft_counter_do_dump(skb, priv, false);
> }
>
> static int nft_counter_init(const struct nft_ctx *ctx,
> diff --git a/net/netfilter/nft_quota.c b/net/netfilter/nft_quota.c
> index 0d344209803a..5d25f57497cb 100644
> --- a/net/netfilter/nft_quota.c
> +++ b/net/netfilter/nft_quota.c
> @@ -83,12 +83,17 @@ static int nft_quota_obj_init(const struct nlattr * const tb[],
> return nft_quota_do_init(tb, priv);
> }
>
> -static int nft_quota_do_dump(struct sk_buff *skb, const struct nft_quota *priv)
> +static int nft_quota_do_dump(struct sk_buff *skb, struct nft_quota *priv,
> + bool reset)
> {
> u32 flags = priv->invert ? NFT_QUOTA_F_INV : 0;
> u64 consumed;
>
> - consumed = atomic64_read(&priv->consumed);
> + if (reset)
> + consumed = atomic64_xchg(&priv->consumed, 0);
> + else
> + consumed = atomic64_read(&priv->consumed);
> +
> /* Since we inconditionally increment consumed quota for each packet
> * that we see, don't go over the quota boundary in what we send to
> * userspace.
> @@ -108,11 +113,12 @@ static int nft_quota_do_dump(struct sk_buff *skb, const struct nft_quota *priv)
> return -1;
> }
>
> -static int nft_quota_obj_dump(struct sk_buff *skb, const struct nft_object *obj)
> +static int nft_quota_obj_dump(struct sk_buff *skb, struct nft_object *obj,
> + bool reset)
> {
> struct nft_quota *priv = nft_obj_data(obj);
>
> - return nft_quota_do_dump(skb, priv);
> + return nft_quota_do_dump(skb, priv, reset);
> }
>
> static struct nft_object_type nft_quota_obj __read_mostly = {
> @@ -146,9 +152,9 @@ static int nft_quota_init(const struct nft_ctx *ctx,
>
> static int nft_quota_dump(struct sk_buff *skb, const struct nft_expr *expr)
> {
> - const struct nft_quota *priv = nft_expr_priv(expr);
> + struct nft_quota *priv = nft_expr_priv(expr);
>
> - return nft_quota_do_dump(skb, priv);
> + return nft_quota_do_dump(skb, priv, false);
> }
>
> static struct nft_expr_type nft_quota_type;
> --
> 2.1.4
>
* Re: [PATCH net-next] openvswitch: fix VxLAN-gpe port can't be created in ovs compat mode
From: Yang, Yi @ 2016-12-09 0:27 UTC (permalink / raw)
To: Pravin Shelar; +Cc: netdev, dev, jbenc
In-Reply-To: <CAOrHB_CVr0PQGsNkhrrvT7L8cX1KZypV4UXakXm1uuCz7ZPYHw@mail.gmail.com>
On Thu, Dec 08, 2016 at 11:41:58AM -0800, Pravin Shelar wrote:
> On Thu, Dec 8, 2016 at 12:20 AM, Yi Yang <yi.y.yang@intel.com> wrote:
> >
> > Signed-off-by: Yi Yang <yi.y.yang@intel.com>
> > ---
> > include/uapi/linux/openvswitch.h | 1 +
> > net/openvswitch/vport-vxlan.c | 15 +++++++++++++++
> > 2 files changed, 16 insertions(+)
> >
> There is no need for this patch in upstream kernel module. I am open
> to having such a patch in out of tree kernel if it simplifies feature
> compatibility code.
I'm very glad to hear this :-). The goal is to let the current OVS create
a vxlan-gpe port in compat mode without help from a new kernel. I'll post
a patch for OVS, thanks a lot.
* linux-next: build warning after merge of the net-next tree
From: Stephen Rothwell @ 2016-12-09 0:34 UTC (permalink / raw)
To: David Miller, Networking
Cc: linux-next, linux-kernel, Jacob Keller, Jeff Kirsher
Hi all,
After merging the net-next tree, today's linux-next build (powerpc
ppc64_defconfig) produced this warning:
In file included from include/linux/byteorder/big_endian.h:4:0,
from arch/powerpc/include/uapi/asm/byteorder.h:13,
from include/asm-generic/bitops/le.h:5,
from arch/powerpc/include/asm/bitops.h:279,
from include/linux/bitops.h:36,
from include/linux/kernel.h:10,
from include/linux/skbuff.h:17,
from include/linux/if_ether.h:23,
from include/linux/etherdevice.h:25,
from drivers/net/ethernet/intel/i40e/i40e_main.c:27:
drivers/net/ethernet/intel/i40e/i40e_main.c: In function 'i40e_sync_vsi_filters':
include/uapi/linux/byteorder/big_endian.h:34:26: warning: large integer implicitly truncated to unsigned type [-Woverflow]
#define __cpu_to_le16(x) ((__force __le16)__swab16((x)))
^
include/linux/byteorder/generic.h:89:21: note: in expansion of macro '__cpu_to_le16'
#define cpu_to_le16 __cpu_to_le16
^
drivers/net/ethernet/intel/i40e/i40e_main.c:2200:5: note: in expansion of macro 'cpu_to_le16'
cpu_to_le16((u16)I40E_AQC_MM_ERR_NO_RES);
^
Introduced by commit
ac9e23901441 ("i40e: refactor i40e_update_filter_state to avoid passing aq_err")
--
Cheers,
Stephen Rothwell
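The report does not say which assignment is being truncated. As a rough illustration of how this class of warning can arise on a big-endian build, the sketch below assumes a constant of 0xFF and an 8-bit destination; both are assumptions made for the example, and swab16() here is a local helper modelled on what cpu_to_le16() has to do on big-endian.

```c
#include <stdint.h>

/* Byte swap of a 16-bit value, i.e. what cpu_to_le16() amounts to on a
 * big-endian build. Local helper, not the kernel macro. */
static uint16_t swab16(uint16_t x)
{
    return (uint16_t)((x << 8) | (x >> 8));
}

/* ASSUMPTION: value chosen for illustration, not taken from the report. */
#define EXAMPLE_ERR_NO_RES 0xFF

/* What an 8-bit field ends up holding if it is assigned the swapped
 * 16-bit value: only the low byte survives the narrowing. */
static uint8_t store_swapped_in_u8(uint16_t v)
{
    return (uint8_t)swab16(v);
}
```

Under these assumptions the swap moves the 0xFF into the high byte, so the narrowed result is 0 rather than 0xFF. On a little-endian build cpu_to_le16() is a no-op and the value would fit, which would explain why only the powerpc build complains.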
* Re: net: deadlock on genl_mutex
From: Cong Wang @ 2016-12-09 0:32 UTC (permalink / raw)
To: Dmitry Vyukov
Cc: syzkaller, Eric Dumazet, David Miller, Matti Vaittinen,
Tycho Andersen, Florian Westphal, stephen hemminger, Tom Herbert,
netdev, LKML, Richard Guy Briggs, netdev-owner
In-Reply-To: <CACT4Y+Zy82UAJ55VbPbVadUM92ZSx1VJCFPdhhcmj53uxZ5PXQ@mail.gmail.com>
On Thu, Dec 8, 2016 at 9:16 AM, Dmitry Vyukov <dvyukov@google.com> wrote:
> Chain exists of:
> Possible unsafe locking scenario:
>
>        CPU0                    CPU1
>        ----                    ----
>   lock(genl_mutex);
>                                lock(nlk->cb_mutex);
>                                lock(genl_mutex);
>   lock(rtnl_mutex);
>
> *** DEADLOCK ***
This one looks legitimate, because nlk->cb_mutex could be rtnl_mutex.
Let me think about it.
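To spell out why the aliasing matters: if nlk->cb_mutex is rtnl_mutex, one path takes rtnl_mutex then genl_mutex while the other takes genl_mutex then rtnl_mutex, which is a classic ABBA inversion. The toy checker below records "taken while holding" edges and looks for a cycle. It is only loosely in the spirit of lockdep, and all names in it are hypothetical.

```c
#include <stdbool.h>

enum { GENL_MUTEX, RTNL_MUTEX, NLOCKS };   /* hypothetical lock classes */

/* dep[a][b] is true when lock b was taken while lock a was held. */
static bool dep[NLOCKS][NLOCKS];

static void record_order(int held, int acquired)
{
    dep[held][acquired] = true;
}

/* Depth-first search: is 'to' reachable from 'from' via recorded edges? */
static bool reaches(int from, int to, bool *seen)
{
    if (from == to)
        return true;
    seen[from] = true;
    for (int next = 0; next < NLOCKS; next++)
        if (dep[from][next] && !seen[next] && reaches(next, to, seen))
            return true;
    return false;
}

/* Taking 'acquired' while holding 'held' closes a cycle if 'held' is
 * already reachable from 'acquired'. */
static bool would_deadlock(int held, int acquired)
{
    bool seen[NLOCKS] = { false };

    return reaches(acquired, held, seen);
}
```

With the CPU1 path recorded as record_order(RTNL_MUTEX, GENL_MUTEX), asking would_deadlock(GENL_MUTEX, RTNL_MUTEX) for the CPU0 path reports the inversion.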