* [RFC PATCH 0/3] BPF socket filter to deal with skb frags
@ 2018-06-08 21:00 Tushar Dave
2018-06-08 21:00 ` [RFC PATCH 1/3] ebpf: add next_skb_frag bpf helper for sk filter Tushar Dave
` (2 more replies)
0 siblings, 3 replies; 7+ messages in thread
From: Tushar Dave @ 2018-06-08 21:00 UTC (permalink / raw)
To: netdev, ast, daniel, davem, john.fastabend, jakub.kicinski, kafai,
rdna, quentin.monnet, brakmo, acme
This RFC allows bpf socket filter programs to look into the complete
skb, i.e. both the linear and the non-linear part of the skb. (patch 1)
As a proof of concept, I'm using a sample RDS program that attaches a
bpf socket filter and inspects packet data from both the linear and the
non-linear part of the skb, e.g. skb frags. (patches 2 and 3)
I'm sharing this RFC to get some feedback on the direction.
Details:
patch1 adds a new bpf helper function and the infrastructure needed so
that a socket (sk) filter eBPF program can retrieve the non-linear part
of an skb (e.g. skb frags), unlike the current socket filter, which only
deals with the linear part of the skb. This patch adds very basic
functionality and, for now, allows socket filter programs only to read
packet data (from the linear and non-linear parts of the skb). The idea
behind this patch is to add an eBPF helper that allows a socket filter
eBPF program to walk through the skb frags using bpf tail calls. This
way the eBPF program can do deep packet inspection (i.e. it can look
into headers as well as payload).
patch2 adds a sample eBPF socket filter program that uses an RDS socket.
The sample program opens an RDS socket, attaches an eBPF program to it
and uses the bpf helper added in patch 1 to look into the skb. As a
test, the eBPF program currently only prints the first few bytes from
skb->data and from each skb frag.
patch3 allows rds_recv_incoming to invoke the bpf socket filter program
if one is attached to the RDS socket.
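To illustrate the frag walk that patch1 describes, here is a minimal
user-space sketch in plain C. It is not kernel code: the cursor_skb and
cursor_frag types and the cursor_* functions are hypothetical stand-ins
for struct sk_buff, skb_frag_t and the proposed helper, and the do/while
loop stands in for the chain of bpf tail calls.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for one page fragment. */
struct cursor_frag {
	const unsigned char *base;	/* start of the fragment data */
	size_t size;			/* fragment length in bytes */
};

/* Hypothetical user-space stand-in for an skb plus the cursor state
 * that the RFC keeps in the skb control block.
 */
struct cursor_skb {
	const unsigned char *linear;		/* models skb->data */
	size_t headlen;				/* models skb_headlen() */
	const struct cursor_frag *frags;	/* models the frags array */
	unsigned int nr_frags;
	unsigned int index;			/* next fragment to expose */
	const unsigned char *win_start;		/* readable window start */
	const unsigned char *win_end;		/* readable window end */
};

/* Point the window at the linear area and rewind the frag index. */
void cursor_reset(struct cursor_skb *skb)
{
	skb->index = 0;
	skb->win_start = skb->linear;
	skb->win_end = skb->linear + skb->headlen;
}

/* Move the window to the next fragment; -ENODATA when none is left. */
int cursor_next_frag(struct cursor_skb *skb)
{
	const struct cursor_frag *frag;

	if (skb->index == skb->nr_frags)
		return -ENODATA;

	frag = &skb->frags[skb->index++];
	skb->win_start = frag->base;
	skb->win_end = frag->base + frag->size;
	return 0;
}

/* Visit the linear area and then every fragment, summing all bytes. */
unsigned long cursor_sum_all(struct cursor_skb *skb)
{
	unsigned long sum = 0;
	const unsigned char *p;

	cursor_reset(skb);
	do {
		for (p = skb->win_start; p < skb->win_end; p++)
			sum += *p;
	} while (cursor_next_frag(skb) == 0);

	return sum;
}
```

Each call to cursor_next_frag() moves the readable window to the next
fragment and returns -ENODATA after the last one, which is the condition
the sample program in patch 2 checks before it stops tail-calling.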
FYI, I'm also working on a follow-up patchset that deals with *struct
scatterlist* to allow RDS filtering for IB/RDMA use cases that do not
have an sk_buff.
Thanks.
-Tushar
Tushar Dave (3):
ebpf: add next_skb_frag bpf helper for sk filter
samples/bpf: add sample RDS program
rds: invoke sk filter attached to rds socket
include/linux/filter.h | 2 +
include/uapi/linux/bpf.h | 10 +-
net/core/filter.c | 44 ++++-
net/rds/recv.c | 17 ++
samples/bpf/Makefile | 3 +
samples/bpf/rds_skb_kern.c | 87 +++++++++
samples/bpf/rds_skb_user.c | 311 ++++++++++++++++++++++++++++++
tools/include/uapi/linux/bpf.h | 10 +-
tools/testing/selftests/bpf/bpf_helpers.h | 2 +
9 files changed, 482 insertions(+), 4 deletions(-)
create mode 100644 samples/bpf/rds_skb_kern.c
create mode 100644 samples/bpf/rds_skb_user.c
--
1.8.3.1
* [RFC PATCH 1/3] ebpf: add next_skb_frag bpf helper for sk filter
2018-06-08 21:00 [RFC PATCH 0/3] BPF socket filter to deal with skb frags Tushar Dave
@ 2018-06-08 21:00 ` Tushar Dave
2018-06-08 21:27 ` Daniel Borkmann
2018-06-08 21:00 ` [RFC PATCH 2/3] samples/bpf: add sample RDS program Tushar Dave
2018-06-08 21:00 ` [RFC PATCH 3/3] rds: invoke sk filter attached to rds socket Tushar Dave
2 siblings, 1 reply; 7+ messages in thread
From: Tushar Dave @ 2018-06-08 21:00 UTC (permalink / raw)
To: netdev, ast, daniel, davem, john.fastabend, jakub.kicinski, kafai,
rdna, quentin.monnet, brakmo, acme
Today socket filter only deals with linear skbs. This change allows
ebpf programs to look into non-linear skb e.g. skb frags. This will be
useful when users need to look into data which is not contained in the
linear part of skb.
Signed-off-by: Tushar Dave <tushar.n.dave@oracle.com>
Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com>
Reviewed-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
---
include/linux/filter.h | 2 ++
include/uapi/linux/bpf.h | 10 ++++++-
net/core/filter.c | 44 +++++++++++++++++++++++++++++--
tools/include/uapi/linux/bpf.h | 10 ++++++-
tools/testing/selftests/bpf/bpf_helpers.h | 2 ++
5 files changed, 64 insertions(+), 4 deletions(-)
diff --git a/include/linux/filter.h b/include/linux/filter.h
index 9dbcb9d..603b8bf 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -500,6 +500,7 @@ struct sk_filter {
struct bpf_skb_data_end {
struct qdisc_skb_cb qdisc_cb;
+ u8 index;
void *data_meta;
void *data_end;
};
@@ -534,6 +535,7 @@ static inline void bpf_compute_data_pointers(struct sk_buff *skb)
BUILD_BUG_ON(sizeof(*cb) > FIELD_SIZEOF(struct sk_buff, cb));
cb->data_meta = skb->data - skb_metadata_len(skb);
cb->data_end = skb->data + skb_headlen(skb);
+ cb->index = 0;
}
static inline u8 *bpf_skb_cb(struct sk_buff *skb)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index d94d333..5fe9668 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1902,6 +1902,13 @@ struct bpf_stack_build_id {
* egress otherwise). This is the only flag supported for now.
* Return
* **SK_PASS** on success, or **SK_DROP** on error.
+ *
+ * int bpf_next_skb_frag(struct sk_buff *skb)
+ * Description
+ * This helper allows users to look into non-linear part of skb
+ * e.g. skb frags.
+ * Return
+ * 0 on success, or a negative error in case of failure.
*/
#define __BPF_FUNC_MAPPER(FN) \
FN(unspec), \
@@ -1976,7 +1983,8 @@ struct bpf_stack_build_id {
FN(fib_lookup), \
FN(sock_hash_update), \
FN(msg_redirect_hash), \
- FN(sk_redirect_hash),
+ FN(sk_redirect_hash), \
+ FN(next_skb_frag),
/* integer value in 'imm' field of BPF_CALL instruction selects which helper
* function eBPF program intends to call
diff --git a/net/core/filter.c b/net/core/filter.c
index 51ea7dd..fd8e90f 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3752,6 +3752,38 @@ static unsigned long bpf_xdp_copy(void *dst_buff, const void *src_buff,
.arg1_type = ARG_PTR_TO_CTX,
};
+BPF_CALL_1(bpf_next_skb_frag, struct sk_buff *, skb)
+{
+ struct bpf_skb_data_end *cb = (struct bpf_skb_data_end *)skb->cb;
+ const skb_frag_t *frag;
+
+ if (skb->data_len == 0)
+ return -ENODATA;
+
+ if (cb->index == (u8)skb_shinfo(skb)->nr_frags)
+ return -ENODATA;
+
+ /* get the frag start and end address into data_meta and data_end
+ * respectively so eBPF program can look into skb frag
+ */
+ frag = &skb_shinfo(skb)->frags[cb->index];
+ cb->data_meta = page_address(skb_frag_page(frag)) +
+ frag->page_offset;
+ cb->data_end = cb->data_meta + skb_frag_size(frag);
+
+ /* update frag index */
+ cb->index++;
+
+ return 0;
+}
+
+static const struct bpf_func_proto bpf_next_skb_frag_proto = {
+ .func = bpf_next_skb_frag,
+ .gpl_only = false,
+ .ret_type = RET_INTEGER,
+ .arg1_type = ARG_PTR_TO_CTX,
+};
+
BPF_CALL_5(bpf_setsockopt, struct bpf_sock_ops_kern *, bpf_sock,
int, level, int, optname, char *, optval, int, optlen)
{
@@ -4415,6 +4447,8 @@ static int bpf_ipv6_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
return &bpf_get_socket_cookie_proto;
case BPF_FUNC_get_socket_uid:
return &bpf_get_socket_uid_proto;
+ case BPF_FUNC_next_skb_frag:
+ return &bpf_next_skb_frag_proto;
default:
return bpf_base_func_proto(func_id);
}
@@ -4698,10 +4732,16 @@ static bool sk_filter_is_valid_access(int off, int size,
struct bpf_insn_access_aux *info)
{
switch (off) {
- case bpf_ctx_range(struct __sk_buff, tc_classid):
case bpf_ctx_range(struct __sk_buff, data):
- case bpf_ctx_range(struct __sk_buff, data_meta):
+ info->reg_type = PTR_TO_PACKET;
+ break;
case bpf_ctx_range(struct __sk_buff, data_end):
+ info->reg_type = PTR_TO_PACKET_END;
+ break;
+ case bpf_ctx_range(struct __sk_buff, data_meta):
+ info->reg_type = PTR_TO_PACKET;
+ break;
+ case bpf_ctx_range(struct __sk_buff, tc_classid):
case bpf_ctx_range_till(struct __sk_buff, family, local_port):
return false;
}
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index d94d333..5fe9668 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -1902,6 +1902,13 @@ struct bpf_stack_build_id {
* egress otherwise). This is the only flag supported for now.
* Return
* **SK_PASS** on success, or **SK_DROP** on error.
+ *
+ * int bpf_next_skb_frag(struct sk_buff *skb)
+ * Description
+ * This helper allows users to look into non-linear part of skb
+ * e.g. skb frags.
+ * Return
+ * 0 on success, or a negative error in case of failure.
*/
#define __BPF_FUNC_MAPPER(FN) \
FN(unspec), \
@@ -1976,7 +1983,8 @@ struct bpf_stack_build_id {
FN(fib_lookup), \
FN(sock_hash_update), \
FN(msg_redirect_hash), \
- FN(sk_redirect_hash),
+ FN(sk_redirect_hash), \
+ FN(next_skb_frag),
/* integer value in 'imm' field of BPF_CALL instruction selects which helper
* function eBPF program intends to call
diff --git a/tools/testing/selftests/bpf/bpf_helpers.h b/tools/testing/selftests/bpf/bpf_helpers.h
index 8f143df..51f2153 100644
--- a/tools/testing/selftests/bpf/bpf_helpers.h
+++ b/tools/testing/selftests/bpf/bpf_helpers.h
@@ -114,6 +114,8 @@ static int (*bpf_get_stack)(void *ctx, void *buf, int size, int flags) =
static int (*bpf_fib_lookup)(void *ctx, struct bpf_fib_lookup *params,
int plen, __u32 flags) =
(void *) BPF_FUNC_fib_lookup;
+static int (*bpf_next_skb_frag)(void *ctx) =
+ (void *) BPF_FUNC_next_skb_frag;
/* llvm builtin functions that eBPF C program may use to
* emit BPF_LD_ABS and BPF_LD_IND instructions
--
1.8.3.1
* [RFC PATCH 2/3] samples/bpf: add sample RDS program
2018-06-08 21:00 [RFC PATCH 0/3] BPF socket filter to deal with skb frags Tushar Dave
2018-06-08 21:00 ` [RFC PATCH 1/3] ebpf: add next_skb_frag bpf helper for sk filter Tushar Dave
@ 2018-06-08 21:00 ` Tushar Dave
2018-06-08 21:00 ` [RFC PATCH 3/3] rds: invoke sk filter attached to rds socket Tushar Dave
2 siblings, 0 replies; 7+ messages in thread
From: Tushar Dave @ 2018-06-08 21:00 UTC (permalink / raw)
To: netdev, ast, daniel, davem, john.fastabend, jakub.kicinski, kafai,
rdna, quentin.monnet, brakmo, acme
When run in server mode, the sample RDS program opens a PF_RDS socket
and attaches an eBPF program to it. The eBPF program then uses the
bpf_next_skb_frag helper along with bpf tail calls to inspect the skb's
linear and non-linear data.
To ease testing, RDS client functionality is also added so that users
can generate RDS packets.
Run server:
[root@lab71 bpf]# ./rds_skb -s 192.168.3.71
running server in a loop
transport tcp
server bound to address: 192.168.3.71 port 4000
server listening on 192.168.3.71
192.168.3.71 received a packet from 192.168.3.71 of len 8192 cmsg len 0,
on port 52287
payload contains:30 31 32 33 34 35 36 37 38 39 3a 3b 3c 3d 3e 3f 40 41
42 43 44 45 46 47 48 49 4a 4b 4c 4d 4e 4f 50 51 52 53 54 55 56 57 58 59
5a 5b 5c 5d 5e 5f 60 61 62 63 64 65 66 67 68 69 6a 6b ...
server listening on 192.168.3.71
Run client:
[root@lab70 bpf]# ./rds_skb -s 192.168.3.71 -c 192.168.3.70
transport tcp
client bound to address: 192.168.3.71 port 47437
client sending 8192 byte message from 192.168.3.71 to 192.168.3.70 on
port 47437
bpf program output:
[root@lab71]# cat /sys/kernel/debug/tracing/trace_pipe
<idle>-0 [000] ..s. 218923.839673: 0: 30 31 32
<idle>-0 [000] ..s. 218923.839682: 0: 33 34 35
<idle>-0 [000] ..s. 218923.845133: 0: be bf c0
<idle>-0 [000] ..s. 218923.845135: 0: c1 c2 c3
<idle>-0 [000] ..s. 218923.850581: 0: be bf c0
<idle>-0 [000] ..s. 218923.850582: 0: c1 c2 c3
<idle>-0 [000] ..s. 218923.850582: 0: no more skb frag
Note: changing the MTU to 9000 helps ensure that RDS gets skbs with
fragments.
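The bytes in the trace output can be related back to payload offsets.
The sketch below assumes that the traced bytes come from the buffer
filled by create_message() in rds_skb_user.c, which stores i + 0x30
truncated to 8 bits at offset i; pattern_byte() and
pattern_first_offset() are hypothetical helper names used only for this
illustration.

```c
#include <stddef.h>

/* Byte that create_message() stores at payload offset i:
 * i + 0x30, truncated to 8 bits.
 */
unsigned char pattern_byte(size_t i)
{
	return (unsigned char)(i + 0x30);
}

/* Smallest payload offset whose pattern byte equals b. The pattern
 * repeats every 256 bytes, so every offset congruent to this one
 * modulo 256 carries the same byte.
 */
size_t pattern_first_offset(unsigned char b)
{
	return (unsigned char)(b - 0x30);
}
```

Under that assumption, the 30 31 32 lines correspond to payload offset 0
and the be bf c0 lines to an offset congruent to 142 modulo 256, which
would be consistent with both fragments starting a multiple of 256 bytes
apart.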
Signed-off-by: Tushar Dave <tushar.n.dave@oracle.com>
Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com>
Reviewed-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
---
samples/bpf/Makefile | 3 +
samples/bpf/rds_skb_kern.c | 87 +++++++++++++
samples/bpf/rds_skb_user.c | 311 +++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 401 insertions(+)
create mode 100644 samples/bpf/rds_skb_kern.c
create mode 100644 samples/bpf/rds_skb_user.c
diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index 62a99ab..a05c3b2 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -51,6 +51,7 @@ hostprogs-y += cpustat
hostprogs-y += xdp_adjust_tail
hostprogs-y += xdpsock
hostprogs-y += xdp_fwd
+hostprogs-y += rds_skb
# Libbpf dependencies
LIBBPF = $(TOOLS_PATH)/lib/bpf/libbpf.a
@@ -105,6 +106,7 @@ cpustat-objs := bpf_load.o cpustat_user.o
xdp_adjust_tail-objs := xdp_adjust_tail_user.o
xdpsock-objs := bpf_load.o xdpsock_user.o
xdp_fwd-objs := bpf_load.o xdp_fwd_user.o
+rds_skb-objs := bpf_load.o rds_skb_user.o
# Tell kbuild to always build the programs
always := $(hostprogs-y)
@@ -160,6 +162,7 @@ always += cpustat_kern.o
always += xdp_adjust_tail_kern.o
always += xdpsock_kern.o
always += xdp_fwd_kern.o
+always += rds_skb_kern.o
HOSTCFLAGS += -I$(objtree)/usr/include
HOSTCFLAGS += -I$(srctree)/tools/lib/
diff --git a/samples/bpf/rds_skb_kern.c b/samples/bpf/rds_skb_kern.c
new file mode 100644
index 0000000..c8832d4
--- /dev/null
+++ b/samples/bpf/rds_skb_kern.c
@@ -0,0 +1,87 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/filter.h>
+#include <linux/ptrace.h>
+#include <linux/version.h>
+#include <uapi/linux/bpf.h>
+#include <linux/rds.h>
+#include "bpf_helpers.h"
+
+
+#define PROG(F) SEC("socket/"__stringify(F)) int bpf_func_##F
+
+#define bpf_printk(fmt, ...) \
+({ \
+ char ____fmt[] = fmt; \
+ bpf_trace_printk(____fmt, sizeof(____fmt), \
+ ##__VA_ARGS__); \
+})
+
+
+struct bpf_map_def SEC("maps") jmp_table = {
+ .type = BPF_MAP_TYPE_PROG_ARRAY,
+ .key_size = sizeof(u32),
+ .value_size = sizeof(u32),
+ .max_entries = 2,
+};
+
+#define FRAG 1
+
+static inline void dump_skb(struct __sk_buff *skb)
+{
+ void *data = (void *)(long) skb->data_meta;
+ void *data_end = (void *)(long) skb->data_end;
+ unsigned char *d;
+
+ if (data + 6 > data_end)
+ return;
+
+ d = (unsigned char *)data;
+ bpf_printk("%x %x %x\n", d[0], d[1], d[2]);
+ bpf_printk("%x %x %x\n", d[3], d[4], d[5]);
+ return;
+}
+
+static void populate_skb_frags(struct __sk_buff *skb)
+{
+ int ret;
+
+ ret = bpf_next_skb_frag(skb);
+ if (ret == -ENODATA) {
+ bpf_printk("no more skb frag\n");
+ return;
+ }
+
+ bpf_tail_call(skb, &jmp_table, 1);
+}
+
+/* walk skb frag */
+
+PROG(FRAG)(struct __sk_buff *skb)
+{
+ dump_skb(skb);
+ populate_skb_frags(skb);
+ return 0;
+}
+
+SEC("socket/0")
+int main_prog(struct __sk_buff *skb)
+{
+ void *data = (void *)(long) skb->data;
+ void *data_end = (void *)(long) skb->data_end;
+ int ret;
+ unsigned char *d;
+
+ if (data + 6 > data_end) {
+ bpf_printk("out\n");
+ return 0;
+ }
+
+ d = (unsigned char *)data;
+ bpf_printk("%x %x %x\n", d[0], d[1], d[2]);
+ bpf_printk("%x %x %x\n", d[3], d[4], d[5]);
+
+ populate_skb_frags(skb);
+ return 0;
+}
+
+char _license[] SEC("license") = "GPL";
diff --git a/samples/bpf/rds_skb_user.c b/samples/bpf/rds_skb_user.c
new file mode 100644
index 0000000..9f73dc3
--- /dev/null
+++ b/samples/bpf/rds_skb_user.c
@@ -0,0 +1,311 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <arpa/inet.h>
+#include <assert.h>
+#include "bpf_load.h"
+#include <getopt.h>
+#include <errno.h>
+#include <netinet/in.h>
+#include <limits.h>
+#include <linux/sockios.h>
+#include <linux/rds.h>
+#include <linux/errqueue.h>
+#include <linux/bpf.h>
+#include <strings.h>
+#include <sys/types.h>
+#include <sys/socket.h>
+#include <string.h>
+#include <stdlib.h>
+#include <stdio.h>
+#include <unistd.h>
+
+#define TESTPORT 4000
+#define BUFSIZE 8192
+
+static const char *trans2str(int trans)
+{
+ switch (trans) {
+ case RDS_TRANS_TCP:
+ return ("tcp");
+ case RDS_TRANS_NONE:
+ return ("none");
+ default:
+ return ("unknown");
+ }
+}
+
+static int gettransport(int sock)
+{
+ int err;
+ int val;
+ socklen_t len = sizeof(val);
+
+ err = getsockopt(sock, SOL_RDS, SO_RDS_TRANSPORT,
+ (char *)&val, &len);
+ if (err < 0) {
+ fprintf(stderr, "%s: getsockopt %s\n",
+ __func__, strerror(errno));
+ return err;
+ }
+ return (int)val;
+}
+
+static int settransport(int sock, int transport)
+{
+ int err;
+
+ err = setsockopt(sock, SOL_RDS, SO_RDS_TRANSPORT,
+ (char *)&transport, sizeof(transport));
+ if (err < 0) {
+ fprintf(stderr, "could not set transport %s, %s\n",
+ trans2str(transport), strerror(errno));
+ }
+ return err;
+}
+
+static void print_sock_local_info(int fd, char *str, struct sockaddr_in *ret)
+{
+ socklen_t sin_size = sizeof(struct sockaddr_in);
+ struct sockaddr_in sin;
+ int err;
+
+ err = getsockname(fd, (struct sockaddr *)&sin, &sin_size);
+ if (err < 0) {
+ fprintf(stderr, "%s getsockname %s\n",
+ __func__, strerror(errno));
+ return;
+ }
+ printf("%s address: %s port %d\n",
+ (str ? str : ""), inet_ntoa(sin.sin_addr), ntohs(sin.sin_port));
+
+ if (ret != NULL)
+ *ret = sin;
+}
+
+static void server(char *address, in_port_t port)
+{
+ struct sockaddr_in sin, din;
+ struct msghdr msg;
+ struct iovec *iov;
+ int rc, sock;
+ char *buf;
+
+ buf = calloc(BUFSIZE, sizeof(char));
+ if (!buf) {
+ fprintf(stderr, "%s: calloc %s\n", __func__, strerror(errno));
+ return;
+ }
+
+ sock = socket(PF_RDS, SOCK_SEQPACKET, 0);
+ if (sock < 0) {
+ fprintf(stderr, "%s: socket %s\n", __func__, strerror(errno));
+ goto out;
+ }
+ if (settransport(sock, RDS_TRANS_TCP) < 0)
+ goto out;
+
+ printf("transport %s\n", trans2str(gettransport(sock)));
+
+ memset(&sin, 0, sizeof(sin));
+ sin.sin_family = AF_INET;
+ sin.sin_addr.s_addr = inet_addr(address);
+ sin.sin_port = htons(port);
+
+ rc = bind(sock, (struct sockaddr *)&sin, sizeof(sin));
+ if (rc < 0) {
+ fprintf(stderr, "%s: bind %s\n", __func__, strerror(errno));
+ goto out;
+ }
+
+ /* attach eBPF program */
+ assert(setsockopt(sock, SOL_SOCKET, SO_ATTACH_BPF, &prog_fd[1],
+ sizeof(prog_fd[0])) == 0);
+
+ print_sock_local_info(sock, "server bound to", NULL);
+
+ iov = calloc(1, sizeof(struct iovec));
+ if (!iov) {
+ fprintf(stderr, "%s: calloc %s\n", __func__, strerror(errno));
+ goto out;
+ }
+
+ while (1) {
+ memset(buf, 0, BUFSIZE);
+ iov[0].iov_base = buf;
+ iov[0].iov_len = BUFSIZE;
+
+ memset(&msg, 0, sizeof(msg));
+ msg.msg_name = &din;
+ msg.msg_namelen = sizeof(din);
+ msg.msg_iov = iov;
+ msg.msg_iovlen = 1;
+
+ printf("server listening on %s\n", inet_ntoa(sin.sin_addr));
+
+ rc = recvmsg(sock, &msg, 0);
+ if (rc < 0) {
+ fprintf(stderr, "%s: recvmsg %s\n",
+ __func__, strerror(errno));
+ break;
+ }
+
+ printf("%s received a packet from %s of len %d cmsg len %d, on port %d\n",
+ inet_ntoa(sin.sin_addr),
+ inet_ntoa(din.sin_addr),
+ (uint32_t) iov[0].iov_len,
+ (uint32_t) msg.msg_controllen,
+ ntohs(din.sin_port));
+
+ {
+ int i;
+
+ printf("payload contains:");
+ for (i = 0; i < 60; i++)
+ printf("%x ", buf[i]);
+ printf("...\n");
+ }
+ }
+ free(iov);
+out:
+ free(buf);
+}
+
+static void create_message(char *buf)
+{
+ unsigned int i;
+
+ for (i = 0; i < BUFSIZE; i++) {
+ buf[i] = i + 0x30;
+ }
+}
+
+static int build_rds_packet(struct msghdr *msg, char *buf)
+{
+ struct iovec *iov;
+
+ iov = calloc(1, sizeof(struct iovec));
+ if (!iov) {
+ fprintf(stderr, "%s: calloc %s\n", __func__, strerror(errno));
+ return -1;
+ }
+
+ msg->msg_iov = iov;
+ msg->msg_iovlen = 1;
+
+ iov[0].iov_base = buf;
+ iov[0].iov_len = BUFSIZE * sizeof(char);
+
+ return 0;
+}
+
+static void client(char *localaddr, char *remoteaddr, in_port_t server_port)
+{
+ struct sockaddr_in sin, din;
+ struct msghdr msg;
+ int rc, sock;
+ char *buf;
+
+ buf = calloc(BUFSIZE, sizeof(char));
+ if (!buf) {
+ fprintf(stderr, "%s: calloc %s\n", __func__, strerror(errno));
+ return;
+ }
+
+ create_message(buf);
+
+ sock = socket(PF_RDS, SOCK_SEQPACKET, 0);
+ if (sock < 0) {
+ fprintf(stderr, "%s: socket %s\n", __func__, strerror(errno));
+ goto out;
+ }
+
+ if (settransport(sock, RDS_TRANS_TCP) < 0)
+ goto out;
+
+ printf("transport %s\n", trans2str(gettransport(sock)));
+
+ memset(&sin, 0, sizeof(sin));
+ sin.sin_family = AF_INET;
+ sin.sin_addr.s_addr = inet_addr(localaddr);
+ sin.sin_port = 0;
+
+ rc = bind(sock, (struct sockaddr *)&sin, sizeof(sin));
+ if (rc < 0) {
+ fprintf(stderr, "%s: bind %s\n", __func__, strerror(errno));
+ goto out;
+ }
+ print_sock_local_info(sock, "client bound to", &sin);
+
+ memset(&msg, 0, sizeof(msg));
+ msg.msg_name = &din;
+ msg.msg_namelen = sizeof(din);
+
+ memset(&din, 0, sizeof(din));
+ din.sin_family = AF_INET;
+ din.sin_addr.s_addr = inet_addr(remoteaddr);
+ din.sin_port = htons(server_port);
+
+ rc = build_rds_packet(&msg, buf);
+ if (rc < 0)
+ goto out;
+
+ printf("client sending %d byte message from %s to %s on port %d\n",
+ (uint32_t) msg.msg_iov->iov_len, localaddr,
+ remoteaddr, ntohs(sin.sin_port));
+
+ rc = sendmsg(sock, &msg, 0);
+ if (rc < 0)
+ fprintf(stderr, "%s: sendmsg %s\n", __func__, strerror(errno));
+
+ if (msg.msg_control)
+ free(msg.msg_control);
+ if (msg.msg_iov)
+ free(msg.msg_iov);
+out:
+ free(buf);
+
+ return;
+}
+
+static void usage(char *progname)
+{
+ fprintf(stderr, "Usage %s [-s srvaddr] [-c clientaddr]\n", progname);
+}
+
+int main(int argc, char **argv)
+{
+ in_port_t server_port = TESTPORT;
+ char *serveraddr = NULL;
+ char *clientaddr = NULL;
+ char filename[256];
+ int opt;
+
+ while ((opt = getopt(argc, argv, "s:c:")) != -1) {
+ switch (opt) {
+ case 's':
+ serveraddr = optarg;
+ break;
+ case 'c':
+ clientaddr = optarg;
+ break;
+ default:
+ usage(argv[0]);
+ return 1;
+ }
+ }
+
+ snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);
+
+ if (load_bpf_file(filename)) {
+ fprintf(stderr, "Error: load_bpf_file %s", bpf_log_buf);
+ return 1;
+ }
+
+ if (serveraddr && !clientaddr) {
+ printf("running server in a loop\n");
+ server(serveraddr, server_port);
+ } else if (serveraddr && clientaddr) {
+ client(clientaddr, serveraddr, server_port);
+ }
+
+ return 0;
+}
--
1.8.3.1
* [RFC PATCH 3/3] rds: invoke sk filter attached to rds socket
2018-06-08 21:00 [RFC PATCH 0/3] BPF socket filter to deal with skb frags Tushar Dave
2018-06-08 21:00 ` [RFC PATCH 1/3] ebpf: add next_skb_frag bpf helper for sk filter Tushar Dave
2018-06-08 21:00 ` [RFC PATCH 2/3] samples/bpf: add sample RDS program Tushar Dave
@ 2018-06-08 21:00 ` Tushar Dave
2 siblings, 0 replies; 7+ messages in thread
From: Tushar Dave @ 2018-06-08 21:00 UTC (permalink / raw)
To: netdev, ast, daniel, davem, john.fastabend, jakub.kicinski, kafai,
rdna, quentin.monnet, brakmo, acme
The RDS module sits on top of TCP (rds_tcp) and IB (rds_rdma), so
messages arrive in the form of an skb (over TCP) or a scatterlist (over
IB/RDMA). However, because the socket filter only deals with skbs (i.e.
struct sk_buff as bpf context), we can only use the socket filter for
rds_tcp and not for rds_rdma. For that reason this patch invokes the
socket filter only for RDS sockets with the TCP transport, i.e. rds_tcp.
note:
BTW, we don't want rds-core to be polluted by module-specific data
structures; e.g. here we included tcp.h to retrieve rds_tcp-specific
structures. For the non-RFC version we will add a way to get
transport-specific indirections to get the skb.
Signed-off-by: Tushar Dave <tushar.n.dave@oracle.com>
Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com>
Reviewed-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
---
net/rds/recv.c | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/net/rds/recv.c b/net/rds/recv.c
index dc67458..3be9628 100644
--- a/net/rds/recv.c
+++ b/net/rds/recv.c
@@ -39,6 +39,7 @@
#include <linux/rds.h>
#include "rds.h"
+#include "tcp.h"
void rds_inc_init(struct rds_incoming *inc, struct rds_connection *conn,
__be32 saddr)
@@ -369,6 +370,22 @@ void rds_recv_incoming(struct rds_connection *conn, __be32 saddr, __be32 daddr,
/* We can be racing with rds_release() which marks the socket dead. */
sk = rds_rs_to_sk(rs);
+ if (rs->rs_transport->t_type == RDS_TRANS_TCP) {
+ struct sk_buff *skb;
+ struct sk_filter *filter;
+ struct rds_tcp_incoming *tinc;
+
+ tinc = container_of(inc, struct rds_tcp_incoming, ti_inc);
+ skb = tinc->ti_skb_list.next;
+ rcu_read_lock();
+ filter = rcu_dereference(sk->sk_filter);
+ if (filter) {
+ bpf_compute_data_pointers(skb);
+ bpf_prog_run_save_cb(filter->prog, skb);
+ }
+ rcu_read_unlock();
+ }
+
/* serialize with rds_release -> sock_orphan */
write_lock_irqsave(&rs->rs_recv_lock, flags);
if (!sock_flag(sk, SOCK_DEAD)) {
--
1.8.3.1
* Re: [RFC PATCH 1/3] ebpf: add next_skb_frag bpf helper for sk filter
2018-06-08 21:00 ` [RFC PATCH 1/3] ebpf: add next_skb_frag bpf helper for sk filter Tushar Dave
@ 2018-06-08 21:27 ` Daniel Borkmann
2018-06-08 21:46 ` Tushar Dave
0 siblings, 1 reply; 7+ messages in thread
From: Daniel Borkmann @ 2018-06-08 21:27 UTC (permalink / raw)
To: Tushar Dave, netdev, ast, davem, john.fastabend, jakub.kicinski,
kafai, rdna, quentin.monnet, brakmo, acme
On 06/08/2018 11:00 PM, Tushar Dave wrote:
> Today socket filter only deals with linear skbs. This change allows
> ebpf programs to look into non-linear skb e.g. skb frags. This will be
> useful when users need to look into data which is not contained in the
> linear part of skb.
Hmm, I don't think this statement is correct in its form here ... they
can handle non-linear skbs just fine.
Straight forward way is to use bpf_skb_load_bytes(). It's simple and uses
internally skb_header_pointer(), and that one of course walks everything
if it really has to via skb_copy_bits() (page frags _and_ frag list). And
if you need to look into mac/net headers that may otherwise not be accessible
anymore from socket layer, there's bpf_skb_load_bytes_relative() helper
which is effectively doing the negative offset trick from ld_abs/ind more
efficient for multi-byte loads.
Thanks,
Daniel
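The offset-based access described above can be modelled in user space as
follows. This is only a sketch of the idea of copying a byte range that
may span the linear area and several page frags; the pkt_skb and
pkt_frag types and pkt_load_bytes() are hypothetical, the frag list is
not modelled, and the error handling (zero the destination and return
-EFAULT) is an assumption based on a reading of bpf_skb_load_bytes().

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical user-space stand-in for one page fragment. */
struct pkt_frag {
	const unsigned char *base;
	size_t size;
};

/* Hypothetical user-space stand-in for a non-linear skb. */
struct pkt_skb {
	const unsigned char *linear;	/* models skb->data */
	size_t headlen;			/* models skb_headlen() */
	const struct pkt_frag *frags;
	unsigned int nr_frags;
};

/* Total length: linear area plus all page fragments. */
size_t pkt_len(const struct pkt_skb *skb)
{
	size_t len = skb->headlen;
	unsigned int i;

	for (i = 0; i < skb->nr_frags; i++)
		len += skb->frags[i].size;
	return len;
}

/* Copy len bytes starting at packet offset 'offset' into 'to',
 * crossing segment boundaries as needed. Segment 0 is the linear
 * area, segment i (i >= 1) is frags[i - 1].
 */
int pkt_load_bytes(const struct pkt_skb *skb, size_t offset,
		   void *to, size_t len)
{
	unsigned char *dst = to;
	size_t total = pkt_len(skb);
	size_t seg_start = 0;	/* packet offset of the current segment */
	unsigned int i;

	if (offset > total || len > total - offset) {
		memset(to, 0, len);
		return -EFAULT;
	}

	for (i = 0; i <= skb->nr_frags && len > 0; i++) {
		const unsigned char *base =
			i ? skb->frags[i - 1].base : skb->linear;
		size_t size = i ? skb->frags[i - 1].size : skb->headlen;

		if (offset < seg_start + size) {
			size_t skip = offset - seg_start;
			size_t chunk = size - skip;

			if (chunk > len)
				chunk = len;
			memcpy(dst, base + skip, chunk);
			dst += chunk;
			offset += chunk;
			len -= chunk;
		}
		seg_start += size;
	}
	return 0;
}
```

The caller only supplies an offset and a length and never needs to know
where the fragment boundaries are, which is the main difference from the
window-per-fragment approach in the RFC.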
* Re: [RFC PATCH 1/3] ebpf: add next_skb_frag bpf helper for sk filter
2018-06-08 21:27 ` Daniel Borkmann
@ 2018-06-08 21:46 ` Tushar Dave
2018-06-08 22:24 ` Tushar Dave
0 siblings, 1 reply; 7+ messages in thread
From: Tushar Dave @ 2018-06-08 21:46 UTC (permalink / raw)
To: Daniel Borkmann, netdev, ast, davem, john.fastabend,
jakub.kicinski, kafai, rdna, quentin.monnet, brakmo, acme
On 06/08/2018 02:27 PM, Daniel Borkmann wrote:
> On 06/08/2018 11:00 PM, Tushar Dave wrote:
>> Today socket filter only deals with linear skbs. This change allows
>> ebpf programs to look into non-linear skb e.g. skb frags. This will be
>> useful when users need to look into data which is not contained in the
>> linear part of skb.
>
> Hmm, I don't think this statement is correct in its form here ... they
> can handle non-linear skbs just fine.
Thanks Daniel for your reply.
>
> Straight forward way is to use bpf_skb_load_bytes(). It's simple and uses
> internally skb_header_pointer(), and that one of course walks everything
> if it really has to via skb_copy_bits() (page frags _and_ frag list). And
> if you need to look into mac/net headers that may otherwise not be accessible
> anymore from socket layer, there's bpf_skb_load_bytes_relative() helper
> which is effectively doing the negative offset trick from ld_abs/ind more
> efficient for multi-byte loads.
I'm looking into bpf_skb_load_bytes and friends.
Thanks.
-Tushar
>
> Thanks,
> Daniel
>
* Re: [RFC PATCH 1/3] ebpf: add next_skb_frag bpf helper for sk filter
2018-06-08 21:46 ` Tushar Dave
@ 2018-06-08 22:24 ` Tushar Dave
0 siblings, 0 replies; 7+ messages in thread
From: Tushar Dave @ 2018-06-08 22:24 UTC (permalink / raw)
To: Daniel Borkmann, netdev, ast, davem, john.fastabend,
jakub.kicinski, kafai, rdna, quentin.monnet, brakmo, acme
On 06/08/2018 02:46 PM, Tushar Dave wrote:
>
>
> On 06/08/2018 02:27 PM, Daniel Borkmann wrote:
>> On 06/08/2018 11:00 PM, Tushar Dave wrote:
>>> Today socket filter only deals with linear skbs. This change allows
>>> ebpf programs to look into non-linear skb e.g. skb frags. This will be
>>> useful when users need to look into data which is not contained in the
>>> linear part of skb.
>>
>> Hmm, I don't think this statement is correct in its form here ... they
>> can handle non-linear skbs just fine.
> Thanks Daniel for your reply.
>>
>> Straight forward way is to use bpf_skb_load_bytes(). It's simple and uses
>> internally skb_header_pointer(), and that one of course walks everything
>> if it really has to via skb_copy_bits() (page frags _and_ frag list). And
>> if you need to look into mac/net headers that may otherwise not be
>> accessible
>> anymore from socket layer, there's bpf_skb_load_bytes_relative() helper
>> which is effectively doing the negative offset trick from ld_abs/ind more
>> efficient for multi-byte loads.
> I'm looking into bpf_skb_load_bytes and friends.
Daniel,
While I am trying to see if I can use the existing bpf_skb_load helpers,
I am wondering whether socket filter based eBPF programs are allowed to
change packet data. In other words, can we use them to build a firewall?
Thanks.
-Tushar
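On the firewall question: as far as sk_filter_trim_cap() in
net/core/filter.c appears to behave, a socket filter's return value is
treated as a verdict, where 0 drops the packet and a non-zero value is
the number of bytes to keep. The sketch below models that rule in
user-space C. model_apply_verdict() and its parameters are hypothetical
names, and the sketch does not show any way to rewrite packet contents.

```c
#include <errno.h>
#include <stddef.h>

/* Model of how the socket layer is assumed to act on a filter verdict.
 *
 * verdict == 0: the packet is dropped, reported here as -EPERM.
 * verdict != 0: the packet is accepted and, if it is longer than
 *               max(cap, verdict), trimmed to that length.
 *
 * Returns 0 on accept and updates *pkt_len to the resulting length.
 */
int model_apply_verdict(size_t *pkt_len, unsigned int verdict,
			unsigned int cap)
{
	size_t keep;

	if (verdict == 0)
		return -EPERM;

	keep = verdict > cap ? verdict : cap;
	if (*pkt_len > keep)
		*pkt_len = keep;
	return 0;
}
```

In this model a filter can accept, drop or truncate a packet but not
modify it. Note that the rds_recv_incoming() hunk in patch 3 calls
bpf_prog_run_save_cb() without using its return value, so in the RFC as
posted the verdict has no effect on delivery.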
>
> Thanks.
> -Tushar
>>
>> Thanks,
>> Daniel
>>
>