* Re: [RFC PATCH 1/3] ebpf: add next_skb_frag bpf helper for sk filter
From: Daniel Borkmann @ 2018-06-08 21:27 UTC
To: Tushar Dave, netdev, ast, davem, john.fastabend, jakub.kicinski,
kafai, rdna, quentin.monnet, brakmo, acme
In-Reply-To: <1528491607-10399-2-git-send-email-tushar.n.dave@oracle.com>
On 06/08/2018 11:00 PM, Tushar Dave wrote:
> Today socket filter only deals with linear skbs. This change allows
> ebpf programs to look into non-linear skb e.g. skb frags. This will be
> useful when users need to look into data which is not contained in the
> linear part of skb.
Hmm, I don't think this statement is correct in its current form ... socket
filters can handle non-linear skbs just fine.
The straightforward way is to use bpf_skb_load_bytes(). It's simple and
internally uses skb_header_pointer(), which of course walks everything
if it really has to via skb_copy_bits() (page frags _and_ frag list). And
if you need to look into mac/net headers that may otherwise no longer be
accessible from the socket layer, there's the bpf_skb_load_bytes_relative()
helper, which effectively does the negative offset trick from ld_abs/ind
more efficiently for multi-byte loads.
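As a rough illustration of the point above, here is a minimal user-space sketch (not kernel code) of how a load-bytes style helper can serve a read that starts in the linear area and continues into page frags. The names model_skb, model_frag and model_load_bytes are hypothetical and exist only for this sketch; the real path is bpf_skb_load_bytes() -> skb_header_pointer() -> skb_copy_bits(), which also handles the frag list that this model leaves out.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one page frag: a byte range. */
struct model_frag {
	const unsigned char *data;
	size_t len;
};

/* Hypothetical stand-in for an skb: a linear area plus page frags. */
struct model_skb {
	const unsigned char *linear;
	size_t headlen;			/* bytes in the linear area */
	const struct model_frag *frags;
	size_t nr_frags;
};

/*
 * Copy len bytes starting at packet offset into to, crossing from the
 * linear area into the frags when needed. Returns 0 on success, or -1
 * if the request runs past the end of the packet (in which case to may
 * have been partially written).
 */
static int model_load_bytes(const struct model_skb *skb, size_t offset,
			    void *to, size_t len)
{
	unsigned char *dst = to;
	size_t i, chunk;

	/* Serve what we can from the linear area first. */
	if (offset < skb->headlen) {
		chunk = skb->headlen - offset;
		if (chunk > len)
			chunk = len;
		memcpy(dst, skb->linear + offset, chunk);
		dst += chunk;
		len -= chunk;
		offset = 0;	/* continue at the start of frag 0 */
	} else {
		offset -= skb->headlen;	/* offset relative to frag 0 */
	}

	/* Then walk the frags, skipping those that end before offset. */
	for (i = 0; i < skb->nr_frags && len > 0; i++) {
		const struct model_frag *f = &skb->frags[i];

		if (offset >= f->len) {
			offset -= f->len;
			continue;
		}
		chunk = f->len - offset;
		if (chunk > len)
			chunk = len;
		memcpy(dst, f->data + offset, chunk);
		dst += chunk;
		len -= chunk;
		offset = 0;
	}

	return len == 0 ? 0 : -1;
}
```

In an actual socket filter the whole walk is hidden behind a single bpf_skb_load_bytes(skb, offset, buf, len) call, so the program never needs to know where the linear area ends.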
Thanks,
Daniel
* Re: Fw: [Bug 199995] New: Ramdomly sent TCP Reset from Kernel with bonding mode "brodcast"
From: Michal Kubecek @ 2018-06-08 21:04 UTC
To: Stephen Hemminger; +Cc: Eric Dumazet, netdev
In-Reply-To: <20180608095954.4a0437e4@xeon-e3>
On Fri, Jun 08, 2018 at 09:59:54AM -0700, Stephen Hemminger wrote:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=199995
>
> Bug ID: 199995
> Summary: Ramdomly sent TCP Reset from Kernel with bonding mode
> "brodcast"
>
> after a dist upgrade from Ubuntu 17.10 (Kernel 4.13.x) to Ubuntu 18.04 (Kernel
> 4.15.0) I suffer from randomly generated TCP RST packets sent (presumably) by
> the Kernel
> on a bonding device that uses bonding mode "broadcast" with 2 physical NICs.
>
> With tcpdump/wireshark I can see that the kernel randomly sends TCP-RST
> packets after the SYN/ACK/ACK packet is received (see attached PCAP).
> This only happens if the kernel receives the initial SYN packet on both
> physical NICs (and therefore sees it twice), before the connection is
> established by sending SYN/ACK.
> It's not happening in 100% of all cases and only if the system can use two or
> more CPU cores/threads. With only one CPU available to the system, this
> behaviour is not reproducible.
I have seen a similar report earlier from one of our customers running
SLE12 SP2 (kernel 4.4). The problem is that if a duplicated SYN packet is
received on both slaves, the two copies can be processed by the
lockless listener simultaneously on different CPUs, and each can reply with
a SYNACK carrying a different sequence number, which results in a reset.
I tried to think of a way to prevent this race without losing the
performance gain of the lockless listener but couldn't come up with anything.
Eventually, I managed to persuade the customer that this setup (where
each packet is received twice under normal circumstances) is not what
broadcast mode was designed for (based on the description in
Documentation/networking/bonding.txt).
However, the lockless listener was introduced in 4.4, so it's not clear
why the reporter started encountering this after an upgrade from 4.13 to
4.15.
Michal Kubecek
* [RFC PATCH 3/3] rds: invoke sk filter attached to rds socket
From: Tushar Dave @ 2018-06-08 21:00 UTC
To: netdev, ast, daniel, davem, john.fastabend, jakub.kicinski, kafai,
rdna, quentin.monnet, brakmo, acme
In-Reply-To: <1528491607-10399-1-git-send-email-tushar.n.dave@oracle.com>
The RDS module sits on top of TCP (rds_tcp) and IB (rds_rdma), so messages
arrive in the form of skbs (over TCP) and scatterlists (over IB/RDMA).
However, because socket filters only deal with skbs (i.e. struct sk_buff as
the bpf context), we can only use a socket filter for rds_tcp and not for
rds_rdma. For that reason this patch invokes the socket filter only for
rds sockets with the tcp transport, i.e. rds_tcp.
Note:
We don't want rds-core to be polluted by module-specific data
structures; in this RFC we include tcp.h to retrieve rds_tcp-specific
structures. For the non-RFC version we will add a way to get
transport-specific indirections to get the skb.
Signed-off-by: Tushar Dave <tushar.n.dave@oracle.com>
Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com>
Reviewed-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
---
net/rds/recv.c | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/net/rds/recv.c b/net/rds/recv.c
index dc67458..3be9628 100644
--- a/net/rds/recv.c
+++ b/net/rds/recv.c
@@ -39,6 +39,7 @@
#include <linux/rds.h>
#include "rds.h"
+#include "tcp.h"
void rds_inc_init(struct rds_incoming *inc, struct rds_connection *conn,
__be32 saddr)
@@ -369,6 +370,22 @@ void rds_recv_incoming(struct rds_connection *conn, __be32 saddr, __be32 daddr,
/* We can be racing with rds_release() which marks the socket dead. */
sk = rds_rs_to_sk(rs);
+ if (rs->rs_transport->t_type == RDS_TRANS_TCP) {
+ struct sk_buff *skb;
+ struct sk_filter *filter = sk->sk_filter;
+ struct rds_tcp_incoming *tinc;
+
+ tinc = container_of(inc, struct rds_tcp_incoming, ti_inc);
+ skb = tinc->ti_skb_list.next;
+ rcu_read_lock();
+ filter = rcu_dereference(sk->sk_filter);
+ if (filter) {
+ bpf_compute_data_pointers(skb);
+ bpf_prog_run_save_cb(filter->prog, skb);
+ }
+ rcu_read_unlock();
+ }
+
/* serialize with rds_release -> sock_orphan */
write_lock_irqsave(&rs->rs_recv_lock, flags);
if (!sock_flag(sk, SOCK_DEAD)) {
--
1.8.3.1
* [RFC PATCH 2/3] samples/bpf: add sample RDS program
From: Tushar Dave @ 2018-06-08 21:00 UTC
To: netdev, ast, daniel, davem, john.fastabend, jakub.kicinski, kafai,
rdna, quentin.monnet, brakmo, acme
In-Reply-To: <1528491607-10399-1-git-send-email-tushar.n.dave@oracle.com>
When run in server mode, the sample RDS program opens a PF_RDS socket and
attaches an eBPF program to it. The program then uses the bpf_next_skb_frag
helper along with bpf tail calls to inspect skb linear and non-linear
data.
To ease testing, RDS client functionality is also added so that users
can generate RDS packets.
Run server:
[root@lab71 bpf]# ./rds_skb -s 192.168.3.71
running server in a loop
transport tcp
server bound to address: 192.168.3.71 port 4000
server listening on 192.168.3.71
192.168.3.71 received a packet from 192.168.3.71 of len 8192 cmsg len 0,
on port 52287
payload contains:30 31 32 33 34 35 36 37 38 39 3a 3b 3c 3d 3e 3f 40 41
42 43 44 45 46 47 48 49 4a 4b 4c 4d 4e 4f 50 51 52 53 54 55 56 57 58 59
5a 5b 5c 5d 5e 5f 60 61 62 63 64 65 66 67 68 69 6a 6b ...
server listening on 192.168.3.71
Run client:
[root@lab70 bpf]# ./rds_skb -s 192.168.3.71 -c 192.168.3.70
transport tcp
client bound to address: 192.168.3.71 port 47437
client sending 8192 byte message from 192.168.3.71 to 192.168.3.70 on
port 47437
bpf program output:
[root@lab71]# cat /sys/kernel/debug/tracing/trace_pipe
<idle>-0 [000] ..s. 218923.839673: 0: 30 31 32
<idle>-0 [000] ..s. 218923.839682: 0: 33 34 35
<idle>-0 [000] ..s. 218923.845133: 0: be bf c0
<idle>-0 [000] ..s. 218923.845135: 0: c1 c2 c3
<idle>-0 [000] ..s. 218923.850581: 0: be bf c0
<idle>-0 [000] ..s. 218923.850582: 0: c1 c2 c3
<idle>-0 [000] ..s. 218923.850582: 0: no more skb frag
Note: changing the MTU to 9000 helps ensure that RDS gets skbs with
fragments.
Signed-off-by: Tushar Dave <tushar.n.dave@oracle.com>
Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com>
Reviewed-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
---
samples/bpf/Makefile | 3 +
samples/bpf/rds_skb_kern.c | 87 +++++++++++++
samples/bpf/rds_skb_user.c | 311 +++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 401 insertions(+)
create mode 100644 samples/bpf/rds_skb_kern.c
create mode 100644 samples/bpf/rds_skb_user.c
diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index 62a99ab..a05c3b2 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -51,6 +51,7 @@ hostprogs-y += cpustat
hostprogs-y += xdp_adjust_tail
hostprogs-y += xdpsock
hostprogs-y += xdp_fwd
+hostprogs-y += rds_skb
# Libbpf dependencies
LIBBPF = $(TOOLS_PATH)/lib/bpf/libbpf.a
@@ -105,6 +106,7 @@ cpustat-objs := bpf_load.o cpustat_user.o
xdp_adjust_tail-objs := xdp_adjust_tail_user.o
xdpsock-objs := bpf_load.o xdpsock_user.o
xdp_fwd-objs := bpf_load.o xdp_fwd_user.o
+rds_skb-objs := bpf_load.o rds_skb_user.o
# Tell kbuild to always build the programs
always := $(hostprogs-y)
@@ -160,6 +162,7 @@ always += cpustat_kern.o
always += xdp_adjust_tail_kern.o
always += xdpsock_kern.o
always += xdp_fwd_kern.o
+always += rds_skb_kern.o
HOSTCFLAGS += -I$(objtree)/usr/include
HOSTCFLAGS += -I$(srctree)/tools/lib/
diff --git a/samples/bpf/rds_skb_kern.c b/samples/bpf/rds_skb_kern.c
new file mode 100644
index 0000000..c8832d4
--- /dev/null
+++ b/samples/bpf/rds_skb_kern.c
@@ -0,0 +1,87 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/filter.h>
+#include <linux/ptrace.h>
+#include <linux/version.h>
+#include <uapi/linux/bpf.h>
+#include <linux/rds.h>
+#include "bpf_helpers.h"
+
+
+#define PROG(F) SEC("socket/"__stringify(F)) int bpf_func_##F
+
+#define bpf_printk(fmt, ...) \
+({ \
+ char ____fmt[] = fmt; \
+ bpf_trace_printk(____fmt, sizeof(____fmt), \
+ ##__VA_ARGS__); \
+})
+
+
+struct bpf_map_def SEC("maps") jmp_table = {
+ .type = BPF_MAP_TYPE_PROG_ARRAY,
+ .key_size = sizeof(u32),
+ .value_size = sizeof(u32),
+ .max_entries = 2,
+};
+
+#define FRAG 1
+
+static inline void dump_skb(struct __sk_buff *skb)
+{
+ void *data = (void *)(long) skb->data_meta;
+ void *data_end = (void *)(long) skb->data_end;
+ unsigned char *d;
+
+ if (data + 6 > data_end)
+ return;
+
+ d = (unsigned char *)data;
+ bpf_printk("%x %x %x\n", d[0], d[1], d[2]);
+ bpf_printk("%x %x %x\n", d[3], d[4], d[5]);
+ return;
+}
+
+static void populate_skb_frags(struct __sk_buff *skb)
+{
+ int ret;
+
+ ret = bpf_next_skb_frag(skb);
+ if (ret == -ENODATA) {
+ bpf_printk("no more skb frag\n");
+ return;
+ }
+
+ bpf_tail_call(skb, &jmp_table, 1);
+}
+
+/* walk skb frag */
+
+PROG(FRAG)(struct __sk_buff *skb)
+{
+ dump_skb(skb);
+ populate_skb_frags(skb);
+ return 0;
+}
+
+SEC("socket/0")
+int main_prog(struct __sk_buff *skb)
+{
+ void *data = (void *)(long) skb->data;
+ void *data_end = (void *)(long) skb->data_end;
+ int ret;
+ unsigned char *d;
+
+ if (data + 6 > data_end) {
+ bpf_printk("out\n");
+ return 0;
+ }
+
+ d = (unsigned char *)data;
+ bpf_printk("%x %x %x\n", d[0], d[1], d[2]);
+ bpf_printk("%x %x %x\n", d[3], d[4], d[5]);
+
+ populate_skb_frags(skb);
+ return 0;
+}
+
+char _license[] SEC("license") = "GPL";
diff --git a/samples/bpf/rds_skb_user.c b/samples/bpf/rds_skb_user.c
new file mode 100644
index 0000000..9f73dc3
--- /dev/null
+++ b/samples/bpf/rds_skb_user.c
@@ -0,0 +1,311 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <arpa/inet.h>
+#include <assert.h>
+#include "bpf_load.h"
+#include <getopt.h>
+#include <errno.h>
+#include <netinet/in.h>
+#include <limits.h>
+#include <linux/sockios.h>
+#include <linux/rds.h>
+#include <linux/errqueue.h>
+#include <linux/bpf.h>
+#include <strings.h>
+#include <sys/types.h>
+#include <sys/socket.h>
+#include <string.h>
+#include <stdlib.h>
+#include <stdio.h>
+#include <unistd.h>
+
+#define TESTPORT 4000
+#define BUFSIZE 8192
+
+static const char *trans2str(int trans)
+{
+ switch (trans) {
+ case RDS_TRANS_TCP:
+ return ("tcp");
+ case RDS_TRANS_NONE:
+ return ("none");
+ default:
+ return ("unknown");
+ }
+}
+
+static int gettransport(int sock)
+{
+ int err;
+ int val;
+ socklen_t len = sizeof(val);
+
+ err = getsockopt(sock, SOL_RDS, SO_RDS_TRANSPORT,
+ (char *)&val, &len);
+ if (err < 0) {
+ fprintf(stderr, "%s: getsockopt %s\n",
+ __func__, strerror(errno));
+ return err;
+ }
+ return (int)val;
+}
+
+static int settransport(int sock, int transport)
+{
+ int err;
+
+ err = setsockopt(sock, SOL_RDS, SO_RDS_TRANSPORT,
+ (char *)&transport, sizeof(transport));
+ if (err < 0) {
+ fprintf(stderr, "could not set transport %s, %s\n",
+ trans2str(transport), strerror(errno));
+ }
+ return err;
+}
+
+static void print_sock_local_info(int fd, char *str, struct sockaddr_in *ret)
+{
+ socklen_t sin_size = sizeof(struct sockaddr_in);
+ struct sockaddr_in sin;
+ int err;
+
+ err = getsockname(fd, (struct sockaddr *)&sin, &sin_size);
+ if (err < 0) {
+ fprintf(stderr, "%s getsockname %s\n",
+ __func__, strerror(errno));
+ return;
+ }
+ printf("%s address: %s port %d\n",
+ (str ? str : ""), inet_ntoa(sin.sin_addr), ntohs(sin.sin_port));
+
+ if (ret != NULL)
+ *ret = sin;
+}
+
+static void server(char *address, in_port_t port)
+{
+ struct sockaddr_in sin, din;
+ struct msghdr msg;
+ struct iovec *iov;
+ int rc, sock;
+ char *buf;
+
+ buf = calloc(BUFSIZE, sizeof(char));
+ if (!buf) {
+ fprintf(stderr, "%s: calloc %s\n", __func__, strerror(errno));
+ return;
+ }
+
+ sock = socket(PF_RDS, SOCK_SEQPACKET, 0);
+ if (sock < 0) {
+ fprintf(stderr, "%s: socket %s\n", __func__, strerror(errno));
+ goto out;
+ }
+ if (settransport(sock, RDS_TRANS_TCP) < 0)
+ goto out;
+
+ printf("transport %s\n", trans2str(gettransport(sock)));
+
+ memset(&sin, 0, sizeof(sin));
+ sin.sin_family = AF_INET;
+ sin.sin_addr.s_addr = inet_addr(address);
+ sin.sin_port = htons(port);
+
+ rc = bind(sock, (struct sockaddr *)&sin, sizeof(sin));
+ if (rc < 0) {
+ fprintf(stderr, "%s: bind %s\n", __func__, strerror(errno));
+ goto out;
+ }
+
+ /* attach eBPF program */
+ assert(setsockopt(sock, SOL_SOCKET, SO_ATTACH_BPF, &prog_fd[1],
+ sizeof(prog_fd[0])) == 0);
+
+ print_sock_local_info(sock, "server bound to", NULL);
+
+ iov = calloc(1, sizeof(struct iovec));
+ if (!iov) {
+ fprintf(stderr, "%s: calloc %s\n", __func__, strerror(errno));
+ goto out;
+ }
+
+ while (1) {
+ memset(buf, 0, BUFSIZE);
+ iov[0].iov_base = buf;
+ iov[0].iov_len = BUFSIZE;
+
+ memset(&msg, 0, sizeof(msg));
+ msg.msg_name = &din;
+ msg.msg_namelen = sizeof(din);
+ msg.msg_iov = iov;
+ msg.msg_iovlen = 1;
+
+ printf("server listening on %s\n", inet_ntoa(sin.sin_addr));
+
+ rc = recvmsg(sock, &msg, 0);
+ if (rc < 0) {
+ fprintf(stderr, "%s: recvmsg %s\n",
+ __func__, strerror(errno));
+ break;
+ }
+
+ printf("%s received a packet from %s of len %d cmsg len %d, on port %d\n",
+ inet_ntoa(sin.sin_addr),
+ inet_ntoa(din.sin_addr),
+ (uint32_t) iov[0].iov_len,
+ (uint32_t) msg.msg_controllen,
+ ntohs(din.sin_port));
+
+ {
+ int i;
+
+ printf("payload contains:");
+ for (i = 0; i < 60; i++)
+ printf("%x ", buf[i]);
+ printf("...\n");
+ }
+ }
+ free(iov);
+out:
+ free(buf);
+}
+
+static void create_message(char *buf)
+{
+ unsigned int i;
+
+ for (i = 0; i < BUFSIZE; i++) {
+ buf[i] = i + 0x30;
+ }
+}
+
+static int build_rds_packet(struct msghdr *msg, char *buf)
+{
+ struct iovec *iov;
+
+ iov = calloc(1, sizeof(struct iovec));
+ if (!iov) {
+ fprintf(stderr, "%s: calloc %s\n", __func__, strerror(errno));
+ return -1;
+ }
+
+ msg->msg_iov = iov;
+ msg->msg_iovlen = 1;
+
+ iov[0].iov_base = buf;
+ iov[0].iov_len = BUFSIZE * sizeof(char);
+
+ return 0;
+}
+
+static void client(char *localaddr, char *remoteaddr, in_port_t server_port)
+{
+ struct sockaddr_in sin, din;
+ struct msghdr msg;
+ int rc, sock;
+ char *buf;
+
+ buf = calloc(BUFSIZE, sizeof(char));
+ if (!buf) {
+ fprintf(stderr, "%s: calloc %s\n", __func__, strerror(errno));
+ return;
+ }
+
+ create_message(buf);
+
+ sock = socket(PF_RDS, SOCK_SEQPACKET, 0);
+ if (sock < 0) {
+ fprintf(stderr, "%s: socket %s\n", __func__, strerror(errno));
+ goto out;
+ }
+
+ if (settransport(sock, RDS_TRANS_TCP) < 0)
+ goto out;
+
+ printf("transport %s\n", trans2str(gettransport(sock)));
+
+ memset(&sin, 0, sizeof(sin));
+ sin.sin_family = AF_INET;
+ sin.sin_addr.s_addr = inet_addr(localaddr);
+ sin.sin_port = 0;
+
+ rc = bind(sock, (struct sockaddr *)&sin, sizeof(sin));
+ if (rc < 0) {
+ fprintf(stderr, "%s: bind %s\n", __func__, strerror(errno));
+ goto out;
+ }
+ print_sock_local_info(sock, "client bound to", &sin);
+
+ memset(&msg, 0, sizeof(msg));
+ msg.msg_name = &din;
+ msg.msg_namelen = sizeof(din);
+
+ memset(&din, 0, sizeof(din));
+ din.sin_family = AF_INET;
+ din.sin_addr.s_addr = inet_addr(remoteaddr);
+ din.sin_port = htons(server_port);
+
+ rc = build_rds_packet(&msg, buf);
+ if (rc < 0)
+ goto out;
+
+ printf("client sending %d byte message from %s to %s on port %d\n",
+ (uint32_t) msg.msg_iov->iov_len, localaddr,
+ remoteaddr, ntohs(sin.sin_port));
+
+ rc = sendmsg(sock, &msg, 0);
+ if (rc < 0)
+ fprintf(stderr, "%s: sendmsg %s\n", __func__, strerror(errno));
+
+ if (msg.msg_control)
+ free(msg.msg_control);
+ if (msg.msg_iov)
+ free(msg.msg_iov);
+out:
+ free(buf);
+
+ return;
+}
+
+static void usage(char *progname)
+{
+ fprintf(stderr, "Usage %s [-s srvaddr] [-c clientaddr]\n", progname);
+}
+
+int main(int argc, char **argv)
+{
+ in_port_t server_port = TESTPORT;
+ char *serveraddr = NULL;
+ char *clientaddr = NULL;
+ char filename[256];
+ int opt;
+
+ while ((opt = getopt(argc, argv, "s:c:")) != -1) {
+ switch (opt) {
+ case 's':
+ serveraddr = optarg;
+ break;
+ case 'c':
+ clientaddr = optarg;
+ break;
+ default:
+ usage(argv[0]);
+ return 1;
+ }
+ }
+
+ snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);
+
+ if (load_bpf_file(filename)) {
+ fprintf(stderr, "Error: load_bpf_file %s", bpf_log_buf);
+ return 1;
+ }
+
+ if (serveraddr && !clientaddr) {
+ printf("running server in a loop\n");
+ server(serveraddr, server_port);
+ } else if (serveraddr && clientaddr) {
+ client(clientaddr, serveraddr, server_port);
+ }
+
+ return 0;
+}
--
1.8.3.1
* [RFC PATCH 1/3] ebpf: add next_skb_frag bpf helper for sk filter
From: Tushar Dave @ 2018-06-08 21:00 UTC
To: netdev, ast, daniel, davem, john.fastabend, jakub.kicinski, kafai,
rdna, quentin.monnet, brakmo, acme
In-Reply-To: <1528491607-10399-1-git-send-email-tushar.n.dave@oracle.com>
Today socket filter only deals with linear skbs. This change allows
ebpf programs to look into non-linear skb e.g. skb frags. This will be
useful when users need to look into data which is not contained in the
linear part of skb.
Signed-off-by: Tushar Dave <tushar.n.dave@oracle.com>
Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com>
Reviewed-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
---
include/linux/filter.h | 2 ++
include/uapi/linux/bpf.h | 10 ++++++-
net/core/filter.c | 44 +++++++++++++++++++++++++++++--
tools/include/uapi/linux/bpf.h | 10 ++++++-
tools/testing/selftests/bpf/bpf_helpers.h | 2 ++
5 files changed, 64 insertions(+), 4 deletions(-)
diff --git a/include/linux/filter.h b/include/linux/filter.h
index 9dbcb9d..603b8bf 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -500,6 +500,7 @@ struct sk_filter {
struct bpf_skb_data_end {
struct qdisc_skb_cb qdisc_cb;
+ u8 index;
void *data_meta;
void *data_end;
};
@@ -534,6 +535,7 @@ static inline void bpf_compute_data_pointers(struct sk_buff *skb)
BUILD_BUG_ON(sizeof(*cb) > FIELD_SIZEOF(struct sk_buff, cb));
cb->data_meta = skb->data - skb_metadata_len(skb);
cb->data_end = skb->data + skb_headlen(skb);
+ cb->index = 0;
}
static inline u8 *bpf_skb_cb(struct sk_buff *skb)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index d94d333..5fe9668 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1902,6 +1902,13 @@ struct bpf_stack_build_id {
* egress otherwise). This is the only flag supported for now.
* Return
* **SK_PASS** on success, or **SK_DROP** on error.
+ *
+ * int bpf_next_skb_frag(struct sk_buff *skb)
+ * Description
+ * This helper allows users to look into non-linear part of skb
+ * e.g. skb frags.
+ * Return
+ * 0 on success, or a negative error in case of failure.
*/
#define __BPF_FUNC_MAPPER(FN) \
FN(unspec), \
@@ -1976,7 +1983,8 @@ struct bpf_stack_build_id {
FN(fib_lookup), \
FN(sock_hash_update), \
FN(msg_redirect_hash), \
- FN(sk_redirect_hash),
+ FN(sk_redirect_hash), \
+ FN(next_skb_frag),
/* integer value in 'imm' field of BPF_CALL instruction selects which helper
* function eBPF program intends to call
diff --git a/net/core/filter.c b/net/core/filter.c
index 51ea7dd..fd8e90f 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3752,6 +3752,38 @@ static unsigned long bpf_xdp_copy(void *dst_buff, const void *src_buff,
.arg1_type = ARG_PTR_TO_CTX,
};
+BPF_CALL_1(bpf_next_skb_frag, struct sk_buff *, skb)
+{
+ struct bpf_skb_data_end *cb = (struct bpf_skb_data_end *)skb->cb;
+ const skb_frag_t *frag;
+
+ if (skb->data_len == 0)
+ return -ENODATA;
+
+ if (cb->index == (u8)skb_shinfo(skb)->nr_frags)
+ return -ENODATA;
+
+ /* get the frag start and end address into data_meta and data_end
+ * respectively so eBPF program can look into skb frag
+ */
+ frag = &skb_shinfo(skb)->frags[cb->index];
+ cb->data_meta = page_address(skb_frag_page(frag)) +
+ frag->page_offset;
+ cb->data_end = cb->data_meta + skb_frag_size(frag);
+
+ /* update frag index */
+ cb->index++;
+
+ return 0;
+}
+
+static const struct bpf_func_proto bpf_next_skb_frag_proto = {
+ .func = bpf_next_skb_frag,
+ .gpl_only = false,
+ .ret_type = RET_INTEGER,
+ .arg1_type = ARG_PTR_TO_CTX,
+};
+
BPF_CALL_5(bpf_setsockopt, struct bpf_sock_ops_kern *, bpf_sock,
int, level, int, optname, char *, optval, int, optlen)
{
@@ -4415,6 +4447,8 @@ static int bpf_ipv6_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
return &bpf_get_socket_cookie_proto;
case BPF_FUNC_get_socket_uid:
return &bpf_get_socket_uid_proto;
+ case BPF_FUNC_next_skb_frag:
+ return &bpf_next_skb_frag_proto;
default:
return bpf_base_func_proto(func_id);
}
@@ -4698,10 +4732,16 @@ static bool sk_filter_is_valid_access(int off, int size,
struct bpf_insn_access_aux *info)
{
switch (off) {
- case bpf_ctx_range(struct __sk_buff, tc_classid):
case bpf_ctx_range(struct __sk_buff, data):
- case bpf_ctx_range(struct __sk_buff, data_meta):
+ info->reg_type = PTR_TO_PACKET;
+ break;
case bpf_ctx_range(struct __sk_buff, data_end):
+ info->reg_type = PTR_TO_PACKET_END;
+ break;
+ case bpf_ctx_range(struct __sk_buff, data_meta):
+ info->reg_type = PTR_TO_PACKET;
+ break;
+ case bpf_ctx_range(struct __sk_buff, tc_classid):
case bpf_ctx_range_till(struct __sk_buff, family, local_port):
return false;
}
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index d94d333..5fe9668 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -1902,6 +1902,13 @@ struct bpf_stack_build_id {
* egress otherwise). This is the only flag supported for now.
* Return
* **SK_PASS** on success, or **SK_DROP** on error.
+ *
+ * int bpf_next_skb_frag(struct sk_buff *skb)
+ * Description
+ * This helper allows users to look into non-linear part of skb
+ * e.g. skb frags.
+ * Return
+ * 0 on success, or a negative error in case of failure.
*/
#define __BPF_FUNC_MAPPER(FN) \
FN(unspec), \
@@ -1976,7 +1983,8 @@ struct bpf_stack_build_id {
FN(fib_lookup), \
FN(sock_hash_update), \
FN(msg_redirect_hash), \
- FN(sk_redirect_hash),
+ FN(sk_redirect_hash), \
+ FN(next_skb_frag),
/* integer value in 'imm' field of BPF_CALL instruction selects which helper
* function eBPF program intends to call
diff --git a/tools/testing/selftests/bpf/bpf_helpers.h b/tools/testing/selftests/bpf/bpf_helpers.h
index 8f143df..51f2153 100644
--- a/tools/testing/selftests/bpf/bpf_helpers.h
+++ b/tools/testing/selftests/bpf/bpf_helpers.h
@@ -114,6 +114,8 @@ static int (*bpf_get_stack)(void *ctx, void *buf, int size, int flags) =
static int (*bpf_fib_lookup)(void *ctx, struct bpf_fib_lookup *params,
int plen, __u32 flags) =
(void *) BPF_FUNC_fib_lookup;
+static unsigned long long (*bpf_next_skb_frag)(void *ctx) =
+ (void *) BPF_FUNC_next_skb_frag;
/* llvm builtin functions that eBPF C program may use to
* emit BPF_LD_ABS and BPF_LD_IND instructions
--
1.8.3.1
* [RFC PATCH 0/3] BPF socket filter to deal with skb frags
From: Tushar Dave @ 2018-06-08 21:00 UTC
To: netdev, ast, daniel, davem, john.fastabend, jakub.kicinski, kafai,
rdna, quentin.monnet, brakmo, acme
This RFC allows bpf socket filter programs to look into the complete skb,
i.e. both the linear and non-linear parts of the skb. (patch 1)
As a proof of concept I'm using an RDS sample program that uses a bpf socket
filter and inspects skb packet data from the linear and non-linear parts, e.g.
skb frags. (patches 2 and 3)
I'm sharing this RFC to get some feedback on the direction.
Details:
patch 1 adds a new bpf helper function and the needed infrastructure so that
a socket (sk) filter based eBPF program can retrieve the non-linear part of
an skb (e.g. skb frags), unlike the current socket filter, which only deals
with linear skbs. This patch adds very basic functionality and for now allows
socket filter programs only to read packet data (from the linear and
non-linear parts of the skb). The idea behind this patch is to add an eBPF
helper that allows a socket filter based eBPF program to walk through the
skb frags using bpf tail calls. This way an eBPF program can do deep packet
inspection (i.e. look into headers as well as payload).
patch 2 adds a sample eBPF socket filter program that uses an rds socket. The
sample program opens an rds socket, attaches an eBPF program to it,
and uses the bpf helper added in patch 1 to look into the skb. As a test,
the current eBPF program only prints the first few bytes from skb->data and
the skb frags.
patch 3 allows rds_recv_incoming to invoke the bpf socket filter program if
one is attached to the rds socket.
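The frag walk described for patch 1 can be sketched in user space roughly as follows. This is only a model under stated assumptions: walk_ctx, walk_frag, walk_next_frag and walk_total_bytes are hypothetical names, a plain loop stands in for the bpf tail call that the sample program uses to re-enter itself, and the window is exposed through data/data_end fields rather than through the skb control block.

```c
#include <errno.h>
#include <stddef.h>

#ifndef ENODATA
#define ENODATA 61	/* fallback for platforms whose errno.h lacks it */
#endif

/* Hypothetical stand-in for one page frag: a byte range. */
struct walk_frag {
	const unsigned char *data;
	size_t len;
};

/* Hypothetical cursor, modelled on the index/data/data_end idea. */
struct walk_ctx {
	const struct walk_frag *frags;
	size_t nr_frags;
	size_t index;			/* next frag to expose */
	const unsigned char *data;	/* start of the current window */
	const unsigned char *data_end;	/* one past the current window */
};

/*
 * Point the window at the next frag and advance the cursor.
 * Returns 0 on success, or -ENODATA once every frag has been visited.
 */
static int walk_next_frag(struct walk_ctx *ctx)
{
	const struct walk_frag *f;

	if (ctx->index == ctx->nr_frags)
		return -ENODATA;

	f = &ctx->frags[ctx->index];
	ctx->data = f->data;
	ctx->data_end = f->data + f->len;
	ctx->index++;
	return 0;
}

/* Visit every remaining frag and add up the bytes that were exposed. */
static size_t walk_total_bytes(struct walk_ctx *ctx)
{
	size_t total = 0;

	while (walk_next_frag(ctx) == 0)
		total += (size_t)(ctx->data_end - ctx->data);
	return total;
}
```

Each successful call exposes exactly one frag, so a program that wants to inspect all of them has to keep calling until it sees -ENODATA, which is the role the tail call plays in the sample.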
FYI, I'm also working on a follow-up patchset that deals with *struct
scatterlist* to allow RDS filtering for IB/RDMA use cases that do not
have an sk_buff.
Thanks.
-Tushar
Tushar Dave (3):
ebpf: add next_skb_frag bpf helper for sk filter
samples/bpf: add sample RDS program
rds: invoke sk filter attached to rds socket
include/linux/filter.h | 2 +
include/uapi/linux/bpf.h | 10 +-
net/core/filter.c | 44 ++++-
net/rds/recv.c | 17 ++
samples/bpf/Makefile | 3 +
samples/bpf/rds_skb_kern.c | 87 +++++++++
samples/bpf/rds_skb_user.c | 311 ++++++++++++++++++++++++++++++
tools/include/uapi/linux/bpf.h | 10 +-
tools/testing/selftests/bpf/bpf_helpers.h | 2 +
9 files changed, 482 insertions(+), 4 deletions(-)
create mode 100644 samples/bpf/rds_skb_kern.c
create mode 100644 samples/bpf/rds_skb_user.c
--
1.8.3.1
* Re: [PATCH 3/3] bpfilter: do not (ab)use host-program build rule
From: Alexei Starovoitov @ 2018-06-08 20:52 UTC
To: Masahiro Yamada
Cc: netdev, Alexei Starovoitov, David S . Miller, Arnd Bergmann,
Geert Uytterhoeven, linux-kernel, YueHaibing, Daniel Borkmann
In-Reply-To: <1528477930-7342-4-git-send-email-yamada.masahiro@socionext.com>
On Sat, Jun 09, 2018 at 02:12:10AM +0900, Masahiro Yamada wrote:
> It is an ugly hack to overwrite $(HOSTCC) with $(CC) to reuse the
> build rules from scripts/Makefile.host. It should not be tedious
> to write a build rule for its own.
>
> Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
> ---
>
> net/bpfilter/Makefile | 17 +++++++++++------
> net/bpfilter/{main.c => bpfilter_umh.c} | 0
> 2 files changed, 11 insertions(+), 6 deletions(-)
> rename net/bpfilter/{main.c => bpfilter_umh.c} (100%)
>
> diff --git a/net/bpfilter/Makefile b/net/bpfilter/Makefile
> index 39c6980..6571b30 100644
> --- a/net/bpfilter/Makefile
> +++ b/net/bpfilter/Makefile
> @@ -3,18 +3,23 @@
> # Makefile for the Linux BPFILTER layer.
> #
>
> -hostprogs-y := bpfilter_umh
> -bpfilter_umh-objs := main.o
> -HOSTCFLAGS += -I. -Itools/include/ -Itools/include/uapi
> -HOSTCC := $(CC)
that is a hack indeed. I don't like it either, but see below.
> -
> ifeq ($(CONFIG_BPFILTER_UMH), y)
> # builtin bpfilter_umh should be compiled with -static
> # since rootfs isn't mounted at the time of __init
> # function is called and do_execv won't find elf interpreter
> -HOSTLDFLAGS += -static
> +STATIC := -static
> endif
>
> +quiet_cmd_cc_user = CC $@
> + cmd_cc_user = $(CC) -Wall -Wmissing-prototypes -O2 -std=gnu89 \
> + -I$(srctree) -I$(srctree)/tools/include/ \
> + -I$(srctree)/tools/include/uapi $(STATIC) -o $@ $<
> +
> +$(obj)/bpfilter_umh: $(src)/bpfilter_umh.c FORCE
> + $(call if_changed,cc_user)
Does this scale?
Please see the two top patches here:
https://git.kernel.org/pub/scm/linux/kernel/git/ast/bpf.git/log/?h=ipt_bpf
that add more meat to bpfilter and a lot more files.
Recompiling all of them at once isn't nice either.
This Makefile needs different .c -> .o rules for bpfilter_kern.c files
that get compiled into kernel module and for the rest of umh files:
main.c ctor.c init.c gen.c etc
that need to be compiled .c -> .o differently.
I don't see how to do it without ugly hacks in Makefile.
In that sense the HOSTCC = CC hack looked the least ugly to me; that's
why I went with it.
Better ideas?
* Re: [PATCH 2/3] bpfilter: include bpfilter_umh in assembly instead of using objcopy
From: Alexei Starovoitov @ 2018-06-08 20:47 UTC
To: Masahiro Yamada
Cc: netdev, Alexei Starovoitov, David S . Miller, Arnd Bergmann,
Geert Uytterhoeven, linux-kernel, YueHaibing
In-Reply-To: <1528477930-7342-3-git-send-email-yamada.masahiro@socionext.com>
On Sat, Jun 09, 2018 at 02:12:09AM +0900, Masahiro Yamada wrote:
> Do not use the troublesome ELF magic. What is happening here is to
> embed a user-space program into the kernel. Simply wrap it in the
> assembly with the '.incbin' directive.
>
> Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
> ---
>
> net/bpfilter/Makefile | 15 ++-------------
> net/bpfilter/bpfilter_kern.c | 11 +++++------
> net/bpfilter/bpfilter_umh_blob.S | 7 +++++++
> 3 files changed, 14 insertions(+), 19 deletions(-)
> create mode 100644 net/bpfilter/bpfilter_umh_blob.S
>
> diff --git a/net/bpfilter/Makefile b/net/bpfilter/Makefile
> index aafa720..39c6980 100644
> --- a/net/bpfilter/Makefile
> +++ b/net/bpfilter/Makefile
> @@ -15,18 +15,7 @@ ifeq ($(CONFIG_BPFILTER_UMH), y)
> HOSTLDFLAGS += -static
> endif
>
> -# a bit of elf magic to convert bpfilter_umh binary into a binary blob
> -# inside bpfilter_umh.o elf file referenced by
> -# _binary_net_bpfilter_bpfilter_umh_start symbol
> -# which bpfilter_kern.c passes further into umh blob loader at run-time
> -quiet_cmd_copy_umh = GEN $@
> - cmd_copy_umh = echo ':' > $(obj)/.bpfilter_umh.o.cmd; \
> - $(OBJCOPY) -I binary -O $(CONFIG_OUTPUT_FORMAT) \
> - -B `$(OBJDUMP) -f $<|grep architecture|cut -d, -f1|cut -d' ' -f2` \
> - --rename-section .data=.init.rodata $< $@
> -
> -$(obj)/bpfilter_umh.o: $(obj)/bpfilter_umh
> - $(call cmd,copy_umh)
> +$(obj)/bpfilter_umh_blob.o: $(obj)/bpfilter_umh
>
> obj-$(CONFIG_BPFILTER_UMH) += bpfilter.o
> -bpfilter-objs += bpfilter_kern.o bpfilter_umh.o
> +bpfilter-objs += bpfilter_kern.o bpfilter_umh_blob.o
> diff --git a/net/bpfilter/bpfilter_kern.c b/net/bpfilter/bpfilter_kern.c
> index b13d058..fcc1a7c 100644
> --- a/net/bpfilter/bpfilter_kern.c
> +++ b/net/bpfilter/bpfilter_kern.c
> @@ -10,11 +10,8 @@
> #include <linux/file.h>
> #include "msgfmt.h"
>
> -#define UMH_start _binary_net_bpfilter_bpfilter_umh_start
> -#define UMH_end _binary_net_bpfilter_bpfilter_umh_end
> -
> -extern char UMH_start;
> -extern char UMH_end;
> +extern char bpfilter_umh_start;
> +extern char bpfilter_umh_end;
>
> static struct umh_info info;
> /* since ip_getsockopt() can run in parallel, serialize access to umh */
> @@ -89,7 +86,9 @@ static int __init load_umh(void)
> int err;
>
> /* fork usermode process */
> - err = fork_usermode_blob(&UMH_start, &UMH_end - &UMH_start, &info);
> + err = fork_usermode_blob(&bpfilter_umh_start,
> + &bpfilter_umh_end - &bpfilter_umh_start,
> + &info);
> if (err)
> return err;
> pr_info("Loaded bpfilter_umh pid %d\n", info.pid);
> diff --git a/net/bpfilter/bpfilter_umh_blob.S b/net/bpfilter/bpfilter_umh_blob.S
> new file mode 100644
> index 0000000..40311d1
> --- /dev/null
> +++ b/net/bpfilter/bpfilter_umh_blob.S
> @@ -0,0 +1,7 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> + .section .init.rodata, "a"
> + .global bpfilter_umh_start
> +bpfilter_umh_start:
> + .incbin "net/bpfilter/bpfilter_umh"
Interesting. I think this is a good idea. Looks cleaner than the objcopy magic.
Btw, CONFIG_OUTPUT_FORMAT was already fixed by
commit 8d97ca6b6755 ("bpfilter: fix OUTPUT_FORMAT") in the net tree.
Could you please rebase on top of that tree?
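For reference, a minimal userspace sketch of the start/end symbol pattern the new .S file relies on: the blob's address is the start symbol and its length is end minus start. The array, blob_view and blob_from_symbols() are made up for illustration; in the patch the bytes come from .incbin and the consumer is fork_usermode_blob().

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the embedded binary; the patch gets these bytes via .incbin. */
static const char blob[] = "\x7f" "ELF-demo";

/* Play the role of the bpfilter_umh_start / bpfilter_umh_end labels. */
static const char *const blob_start = blob;
static const char *const blob_end = blob + sizeof(blob);

/* Hypothetical view type: what a loader needs is (address, length). */
struct blob_view {
	const char *data;
	size_t len;
};

/* The data pointer is the start label; the length is end - start. */
static struct blob_view blob_from_symbols(const char *start, const char *end)
{
	assert(start != NULL && end >= start);
	struct blob_view v = { start, (size_t)(end - start) };
	return v;
}
```

Calling blob_from_symbols(blob_start, blob_end) yields a view that begins at blob and covers sizeof(blob) bytes.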
^ permalink raw reply
* Re: netdevice notifier and device private data
From: Michael Richardson @ 2018-06-08 19:37 UTC (permalink / raw)
To: Alexander Aring; +Cc: netdev, linux-wpan, linux-bluetooth
In-Reply-To: <20180608173455.vrnfvv7dlu4oxwqf@x220t>
[-- Attachment #1: Type: text/plain, Size: 822 bytes --]
Alexander Aring <aring@mojatatu.com> wrote:
Alex> I already see code outside who changed tun netdevice to the
Alex> ARPHRD_6LOWPAN type and I suppose they running into this
Alex> issue. (Btw: I don't know why somebody wants to changed that
Alex> type to ARPHRD_6LOWPAN on tun).
So that they can have the kernel do 6LoWPAN processing, emitting 6LoWPAN
packets into userspace to be transferred to a radio via some proprietary
interface (including, for instance, SLIP over a USB cable to a Contiki or
OpenWSN stack set up to act as a radio only).
--
] Never tell me the odds! | ipv6 mesh networks [
] Michael Richardson, Sandelman Software Works | network architect [
] mcr@sandelman.ca http://www.sandelman.ca/ | ruby on rails [
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 464 bytes --]
^ permalink raw reply
* Re: netdevice notifier and device private data
From: Alexander Aring @ 2018-06-08 19:41 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: netdev, linux-wpan, linux-bluetooth
In-Reply-To: <20180608111457.0a9b4cae@xeon-e3>
Hi Stephen,
On Fri, Jun 08, 2018 at 11:14:57AM -0700, Stephen Hemminger wrote:
...
>
> notifiers are always called with RTNL mutex held
> and dev->type should not change unless RTNL is held.
Thanks for your answer. I am not talking about a race between notifiers
and a dev->type change.
I am saying that dev->type was already changed, and a later notifier ends
in undefined behaviour when it dereferences dev->priv. I have a notifier
which, based on dev->type, casts dev->priv to a specific structure. That
structure is not there in tap/tun devices if they were changed to "my"
dev->type before the notifier fires.
- Alex
^ permalink raw reply
* Re: Qualcomm rmnet driver and qmi_wwan
From: Bjørn Mork @ 2018-06-08 19:10 UTC (permalink / raw)
To: Subash Abhinov Kasiviswanathan; +Cc: Daniele Palmas, Dan Williams, netdev
In-Reply-To: <8a77f905ddcd6a8136dd9f2d5de11438@codeaurora.org>
Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> writes:
>> I followed Dan's advice and prepared a very basic test patch
>> (attached) for testing it through ip link.
>>
>> Basically things seem to be properly working with qmicli, but I needed
>> to modify a bit qmi_wwan, so I'm adding Bjørn that maybe can help.
>>
>> Bjørn,
>>
>> I'm trying to add support to rmnet in qmi_wwan: I had to modify the
>> code as in the attached test patch, but I'm not sure it is the right
>> way.
>>
>> This is done under the assumption that the rmnet device would be the
>> only one to register an rx handler to qmi_wwan, but it is probably
>> wrong.
>>
>> Basically I'm wondering if there is a more correct way to understand
>> if an rmnet device is linked to the real qmi_wwan device.
>>
>> Thanks,
>> Daniele
>
>
> Hi Daniele / Bjørn
>
> Is it possible to define a pass through mode in qmi_wwan. This is to
> ensure that all packets in MAP format are passed through instead of
> processing in qmi_wwan layer. The pass through mode would just call
> netif_receive_skb() on all these packets.
>
> That would allow all the packets to be intercepted by the rx_handler
> attached by rmnet which would subsequently de-multiplex and process
> the packets.
This sounds like a good idea. I probably won't have any time to look at
this in the near future, though. Sorry about that. Extremely overloaded
both at work and in private right now...
But I trust that you and Daniele can work out something. Please keep me
CCed, but don't expect timely replies.
Bjørn
^ permalink raw reply
* Re: [PATCH net] failover: eliminate callback hell
From: Michael S. Tsirkin @ 2018-06-08 19:04 UTC (permalink / raw)
To: Stephen Hemminger
Cc: Alexander Duyck, Samudrala, Sridhar, Jiri Pirko, KY Srinivasan,
Haiyang Zhang, David Miller, Netdev, Stephen Hemminger
In-Reply-To: <20180608113008.76cbf425@xeon-e3>
On Fri, Jun 08, 2018 at 11:30:08AM -0700, Stephen Hemminger wrote:
> * what about nested KVM on Hyper-V? Would it make sense to
> have a way to pass subset of VF queues to guest?
Not as long as Hyper-V doesn't have a vIOMMU.
--
MST
^ permalink raw reply
* Re: [PATCH net] failover: eliminate callback hell
From: Stephen Hemminger @ 2018-06-08 18:30 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Alexander Duyck, Samudrala, Sridhar, Jiri Pirko, KY Srinivasan,
Haiyang Zhang, David Miller, Netdev, Stephen Hemminger
In-Reply-To: <20180607201850-mutt-send-email-mst@kernel.org>
On Thu, 7 Jun 2018 20:22:15 +0300
"Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Thu, Jun 07, 2018 at 09:17:42AM -0700, Stephen Hemminger wrote:
> > On Thu, 7 Jun 2018 18:41:31 +0300
> > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >
> > > > > Why would DPDK care what we do in the kernel? Isn't it just slapping
> > > > > vfio-pci on the netdevs it sees?
> > > >
> > > > > Alex, you are correct for Intel devices, but DPDK on Azure is not Intel based.
> > > > > The DPDK support uses:
> > > > > * Mellanox MLX5 which uses the InfiniBand hooks to do DMA directly to
> > > > > userspace. This means VF netdev device must exist and be visible.
> > > > > * Slow path using kernel netvsc device, TAP and BPF to get exception
> > > > > path packets to userspace.
> > > > > * An autodiscovery mechanism to set all this up that relies on
> > > > > 2 device model and sysfs.
> > >
> > > Could you describe what does it look for exactly? What will break if
> > > instead of MLX5 being a child of the PV, it's a child of the failover
> > > device?
> >
> > So in DPDK there is an internal four device model:
> > 1. failsafe is like failover in your model
> > 2. TAP is used like netvsc in kernel
> > 3. MLX5 is the VF
> > 4. vdev_netvsc is a pseudo device whose only reason to exist
> > is to glue everything together.
> >
> > Digging deeper inside...
> >
> > Vdev_netvsc does:
> > > * driver is started in a convoluted way off device arguments
> > * probe routine for driver runs
> > - scans list of kernel interfaces in sysfs
> > - matches those using VMBUS
>
> Could you tell a bit more what does this step entail?
Quick code high/low lights.
ret = vdev_netvsc_foreach_iface(vdev_netvsc_netvsc_probe, 1, name,
kvargs, specified, &matched);
static int
vdev_netvsc_foreach_iface(int (*func)(const struct if_nameindex *iface,
const struct ether_addr *eth_addr,
va_list ap), int is_netvsc, ...)
{
struct if_nameindex *iface = if_nameindex();
for (i = 0; iface[i].if_name; ++i) {
is_netvsc_ret = vdev_netvsc_iface_is_netvsc(&iface[i]) ? 1 : 0;
if (is_netvsc ^ is_netvsc_ret)
continue;
strlcpy(req.ifr_name, iface[i].if_name, sizeof(req.ifr_name));
if (ioctl(s, SIOCGIFHWADDR, &req) == -1) {
}
memcpy(eth_addr.addr_bytes, req.ifr_hwaddr.sa_data,
RTE_DIM(eth_addr.addr_bytes));
ret = func(&iface[i], &eth_addr, ap); << func is vdev_netvsc_netvsc_probe
static int
vdev_netvsc_netvsc_probe(const struct if_nameindex *iface,
const struct ether_addr *eth_addr,
va_list ap)
{
/* Routed NetVSC should not be probed. */
if (vdev_netvsc_has_route(iface, AF_INET) ||
vdev_netvsc_has_route(iface, AF_INET6)) {
if (!specified)
return 0;
DRV_LOG(WARNING, "probably using routed NetVSC interface \"%s\""
" (index %u)", iface->if_name, iface->if_index);
}
/* Create interface context. */
ctx = calloc(1, sizeof(*ctx));
...
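For anyone without the DPDK tree at hand, the scan in the excerpt boils down to the following self-contained sketch: walk if_nameindex(), then ask for each interface's MAC with SIOCGIFHWADDR. It is Linux-only, and format_mac() and print_iface_macs() are illustrative names, not DPDK functions; the netvsc/route filtering from the excerpt is left out.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <net/if.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Format a 6-byte MAC as "aa:bb:cc:dd:ee:ff"; out must hold 18 bytes. */
static void format_mac(const unsigned char mac[6], char out[18])
{
	assert(mac != NULL && out != NULL);
	snprintf(out, 18, "%02x:%02x:%02x:%02x:%02x:%02x",
		 mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);
}

/* Print index, name and MAC of every interface; returns count or -1. */
static int print_iface_macs(void)
{
	struct if_nameindex *ifaces = if_nameindex();
	int s = socket(AF_INET, SOCK_DGRAM, 0);
	int count = 0;

	if (!ifaces || s < 0) {
		if (ifaces)
			if_freenameindex(ifaces);
		if (s >= 0)
			close(s);
		return -1;
	}
	for (size_t i = 0; ifaces[i].if_name; ++i) {
		struct ifreq req;
		char text[18];

		memset(&req, 0, sizeof(req));
		strncpy(req.ifr_name, ifaces[i].if_name,
			sizeof(req.ifr_name) - 1);
		/* Skip interfaces whose hardware address cannot be read. */
		if (ioctl(s, SIOCGIFHWADDR, &req) == -1)
			continue;
		format_mac((const unsigned char *)req.ifr_hwaddr.sa_data, text);
		printf("%u %s %s\n", ifaces[i].if_index,
		       ifaces[i].if_name, text);
		++count;
	}
	if_freenameindex(ifaces);
	close(s);
	return count;
}
```

The MAC obtained this way is what the later step compares against the PCI devices to find the matching VF.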
>
> > - skip netvsc devices that have an IPV4 route
> > * scan for PCI devices that have same MAC address as kernel netvsc
> > devices discovered in previous step
> > * add these interfaces to arguments to failsafe
> >
> > Then failsafe configures based on arguments on device
> >
> > The code works but is specific to the Azure hardware model, and exposes lots
> > of things to application that it should not have to care about.
> >
> > If you try and walk through this code in DPDK, you will see why I have developed
> > a dislike for high levels of indirection.
> >
> >
> >
>
> Thanks that was helpful! I'll try to poke at it next week. Just from
> the description it seems the kernel is merely used to locate the MAC
> address through sysfs and that for this DPDK code to keep working the
> hidden device must be hidden from it in sysfs - is that a fair summary?
What is the point of the 3 device model? What value does it have
to userspace? How would userspace use each of the three devices?
Going back to the 3 device model really doesn't make sense to me if
there is no visible benefit.
Some other considerations:
* there is ongoing development to support RDMA failover as
well in netvsc.
* there is a new driver which implements the VMBUS protocol
in userspace for DPDK. This gets rid of several layers and
removes any special scanning code. The vmbus device is
unbound from netvsc and bound to UIO device. Then the user
space DPDK driver manages all the host signalling events
including VF discovery. It is really a 2 device model done
entirely in userspace. The kernel device is still needed when
the VF is Mellanox, because that is how the MLX DPDK driver
rolls.
* what about nested KVM on Hyper-V? Would it make sense to
have a way to pass a subset of VF queues to the guest?
^ permalink raw reply
* Re: netdevice notifier and device private data
From: Stephen Hemminger @ 2018-06-08 18:14 UTC (permalink / raw)
To: Alexander Aring; +Cc: netdev, linux-wpan, linux-bluetooth
In-Reply-To: <20180608173455.vrnfvv7dlu4oxwqf@x220t>
On Fri, 8 Jun 2018 13:34:55 -0400
Alexander Aring <aring@mojatatu.com> wrote:
> Hey netdev community,
>
> I am trying to solve some issue which Eric Dumazet points to me by
> commit ca0edb131bdf ("ieee802154: 6lowpan: fix possible NULL deref in
> lowpan_device_event()").
>
> The issue is that dev->type can be changed during runtime. We don't have
> any problems with the netdevice notifier which Eric Dumazet fixed. I am
> bother with another netdevice notifier which is broken because the same
> tun/tap feature and I don't have any dev->$SUBSYSTEM_DEV_POINTER to check
> if this is my netdevice type.
>
> This netdevice notifier will access the dev->priv area which is only
> available for the dev->type which was allocated and initialized with the
> right dev->priv room. If a tap/tun netdevice changed their dev->type I
> might have an illegal read of netdev->priv and I can't confirm that it
> has the data which I cast to it. The reason for that is that tap/tun
> netdevices doesn't run my netdevice init.
>
> I already see code outside who changed tun netdevice to the
> ARPHRD_6LOWPAN type and I suppose they running into this issue.
> (Btw: I don't know why somebody wants to changed that type to
> ARPHRD_6LOWPAN on tun).
>
> My question is:
>
> How we deal with that? Is it forbidden to access dev->priv from a
> global netdevice notifier which only checks for dev->type?
>
> I could solve it like Eric Dumazet and introduce a special
> dev->$SUBSYSTEM_DEV_POINTER and check on it if set. At least tun/tap
> will not set these pointers, then I am sure the netdevice was running
> through my init function. Seems for me the best solution right now and
> I think I will go for it.
>
> I assumed before the data of dev->priv is binded to dev->type.
> This tun/tap feature will break at least my handling and I am not sure
> if there are others users which using dev->priv in netdevice notifier
> and don't check on dev->$SUBSYSTEM_DEV_POINTER if they have one.
>
> Thanks for everybody in advance to solve this issue.
>
> - Alex
Notifiers are always called with the RTNL mutex held,
and dev->type should not change unless RTNL is held.
^ permalink raw reply
* Re: [PATCH bpf] bpf: implement dummy fops for bpf objects
From: Alexei Starovoitov @ 2018-06-08 18:05 UTC (permalink / raw)
To: Daniel Borkmann; +Cc: ast, netdev
In-Reply-To: <20180608161034.3854-1-daniel@iogearbox.net>
On Fri, Jun 08, 2018 at 06:10:34PM +0200, Daniel Borkmann wrote:
> syzkaller was able to trigger the following warning in
> do_dentry_open():
>
> WARNING: CPU: 1 PID: 4508 at fs/open.c:778 do_dentry_open+0x4ad/0xe40 fs/open.c:778
> Kernel panic - not syncing: panic_on_warn set ...
>
> CPU: 1 PID: 4508 Comm: syz-executor867 Not tainted 4.17.0+ #90
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> Call Trace:
> [...]
> vfs_open+0x139/0x230 fs/open.c:908
> do_last fs/namei.c:3370 [inline]
> path_openat+0x1717/0x4dc0 fs/namei.c:3511
> do_filp_open+0x249/0x350 fs/namei.c:3545
> do_sys_open+0x56f/0x740 fs/open.c:1101
> __do_sys_openat fs/open.c:1128 [inline]
> __se_sys_openat fs/open.c:1122 [inline]
> __x64_sys_openat+0x9d/0x100 fs/open.c:1122
> do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
> entry_SYSCALL_64_after_hwframe+0x49/0xbe
>
> Problem was that prog and map inodes in bpf fs did not
> implement a dummy file open operation that would return an
> error. The patch in do_dentry_open() checks whether f_ops
> are present and if not bails out with an error. While this
> may be fine, we really shouldn't be throwing a warning
> though. Thus follow the model similar to bad_file_ops and
> reject the request unconditionally with -EIO.
>
> Fixes: b2197755b263 ("bpf: add support for persistent maps/progs")
> Reported-by: syzbot+2e7fcab0f56fdbb330b8@syzkaller.appspotmail.com
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Applied, Thanks
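For the archive, the behaviour described in the commit message can be mimicked in userspace roughly as below. All demo_ names are mocks; -EIO for the dummy open comes from the commit message, while -ENODEV for the missing-ops path is my assumption about what do_dentry_open() returns.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct demo_file;

/* Mock of a file_operations table with only the open hook. */
struct demo_file_ops {
	int (*open)(struct demo_file *f);
};

struct demo_file {
	const struct demo_file_ops *f_op;
};

/* Counts how often the mock "WARN" path was taken. */
static int demo_warnings;

/* Dummy open in the spirit of bad_file_ops: always refuse with -EIO. */
static int demo_obj_open(struct demo_file *f)
{
	(void)f;
	return -EIO;
}

static const struct demo_file_ops demo_obj_fops = {
	.open = demo_obj_open,
};

/* Mock open path: missing ops warn, present ops get to reject quietly. */
static int demo_do_open(struct demo_file *f)
{
	assert(f != NULL);
	if (!f->f_op) {
		demo_warnings++;	/* stands in for the WARN splat */
		return -ENODEV;		/* assumed error code */
	}
	if (f->f_op->open)
		return f->f_op->open(f);
	return 0;
}
```

With demo_obj_fops installed the open still fails, but through the dummy hook rather than through the warning path.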
^ permalink raw reply
* [PATCH v2 1/1] iproute2: Add support for a few routing protocols
From: Donald Sharp @ 2018-06-08 17:47 UTC (permalink / raw)
To: netdev, dsahern, stephen
In-Reply-To: <20180608124638.4895-1-sharpd@cumulusnetworks.com>
Add support for:
BGP
ISIS
OSPF
RIP
EIGRP
Routing protocols to iproute2.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
---
v2: Update to latest version of code.
etc/iproute2/rt_protos | 5 +++++
lib/rt_names.c | 5 +++++
2 files changed, 10 insertions(+)
diff --git a/etc/iproute2/rt_protos b/etc/iproute2/rt_protos
index 2a9ee01b..b3a0ec8f 100644
--- a/etc/iproute2/rt_protos
+++ b/etc/iproute2/rt_protos
@@ -16,3 +16,8 @@
15 ntk
16 dhcp
42 babel
+186 bgp
+187 isis
+188 ospf
+189 rip
+192 eigrp
diff --git a/lib/rt_names.c b/lib/rt_names.c
index a02db35e..66d5f2f0 100644
--- a/lib/rt_names.c
+++ b/lib/rt_names.c
@@ -134,6 +134,11 @@ static char *rtnl_rtprot_tab[256] = {
[RTPROT_XORP] = "xorp",
[RTPROT_NTK] = "ntk",
[RTPROT_DHCP] = "dhcp",
+ [RTPROT_BGP] = "bgp",
+ [RTPROT_ISIS] = "isis",
+ [RTPROT_OSPF] = "ospf",
+ [RTPROT_RIP] = "rip",
+ [RTPROT_EIGRP] = "eigrp",
};
--
2.14.4
^ permalink raw reply related
* [PATCH v2 0/1] Addition of new routing protocols for iproute2
From: Donald Sharp @ 2018-06-08 17:47 UTC (permalink / raw)
To: netdev, dsahern, stephen
In-Reply-To: <20180608124638.4895-1-sharpd@cumulusnetworks.com>
The Linux kernel recently accepted some new RTPROT values for some
fairly standard routing protocols. This commit brings in support
for iproute2 to handle these new values.
v2 - Rebase onto the latest master, which already has the rtnetlink.h
changes, and drop the work already done.
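As a rough illustration of what the table entries buy the user, here is a standalone sketch of a number-to-name lookup with a numeric fallback. The values are the ones from the etc/iproute2/rt_protos hunk of patch 1/1; the DEMO_ and demo_ names are invented so the snippet does not depend on rtnetlink.h, and this is not the actual lib/rt_names.c code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Protocol numbers as listed in the rt_protos hunk of patch 1/1. */
enum {
	DEMO_RTPROT_BGP   = 186,
	DEMO_RTPROT_ISIS  = 187,
	DEMO_RTPROT_OSPF  = 188,
	DEMO_RTPROT_RIP   = 189,
	DEMO_RTPROT_EIGRP = 192,
};

/* Sparse table indexed by protocol number; unset slots stay NULL. */
static const char *demo_rtprot_tab[256] = {
	[DEMO_RTPROT_BGP]   = "bgp",
	[DEMO_RTPROT_ISIS]  = "isis",
	[DEMO_RTPROT_OSPF]  = "ospf",
	[DEMO_RTPROT_RIP]   = "rip",
	[DEMO_RTPROT_EIGRP] = "eigrp",
};

/* Return the name if known, otherwise the number printed into buf. */
static const char *demo_rtprot_n2a(unsigned int id, char *buf, size_t len)
{
	assert(buf != NULL && len > 0);
	if (id < 256 && demo_rtprot_tab[id])
		return demo_rtprot_tab[id];
	snprintf(buf, len, "%u", id);
	return buf;
}
```

With the new entries a route with protocol 186 shows up as "bgp", while an unassigned number such as 190 still falls back to its numeric form.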
Donald Sharp (1):
iproute2: Add support for a few routing protocols
etc/iproute2/rt_protos | 5 +++++
lib/rt_names.c | 5 +++++
2 files changed, 10 insertions(+)
--
2.14.4
^ permalink raw reply
* netdevice notifier and device private data
From: Alexander Aring @ 2018-06-08 17:34 UTC (permalink / raw)
To: netdev; +Cc: linux-wpan, linux-bluetooth
Hey netdev community,
I am trying to solve an issue that Eric Dumazet pointed out to me with
commit ca0edb131bdf ("ieee802154: 6lowpan: fix possible NULL deref in
lowpan_device_event()").
The issue is that dev->type can be changed at runtime. The netdevice
notifier which Eric Dumazet fixed is no longer a problem. I am
bothered by another netdevice notifier which is broken because of the same
tun/tap feature, and there I don't have any dev->$SUBSYSTEM_DEV_POINTER
to check whether this is my netdevice type.
This netdevice notifier accesses the dev->priv area, which is only
valid for a netdevice that was allocated and initialized with the
right dev->priv room for its dev->type. If a tap/tun netdevice changes
its dev->type, I might do an illegal read of netdev->priv, and I can't
confirm that it holds the data I cast it to. The reason is that tap/tun
netdevices don't run my netdevice init.
I have already seen code out there that changes a tun netdevice to the
ARPHRD_6LOWPAN type, and I suppose it runs into this issue.
(Btw: I don't know why somebody wants to change that type to
ARPHRD_6LOWPAN on tun.)
My question is:
How do we deal with that? Is it forbidden to access dev->priv from a
global netdevice notifier which only checks dev->type?
I could solve it like Eric Dumazet did and introduce a special
dev->$SUBSYSTEM_DEV_POINTER and check whether it is set. At least tun/tap
will not set these pointers, so I can be sure the netdevice ran
through my init function. That seems the best solution to me right now,
and I think I will go for it.
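To make the idea concrete, here is a small userspace mock of such a check. None of this is kernel code: demo_netdev, its lowpan_ptr field (standing in for dev->$SUBSYSTEM_DEV_POINTER) and the DEMO_ type values are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Invented type values; the real ones are the ARPHRD_* constants. */
#define DEMO_TYPE_6LOWPAN 1
#define DEMO_TYPE_TUN     2

/* Private data that only the 6LoWPAN init function sets up. */
struct demo_lowpan_priv {
	int magic;
};

/* Mock netdevice: type may change at runtime, lowpan_ptr may not. */
struct demo_netdev {
	unsigned short type;
	void *lowpan_ptr;	/* set only by the subsystem's own init */
	void *priv;		/* layout depends on who allocated the device */
};

/* Hand out priv only if the device really went through our init. */
static struct demo_lowpan_priv *
demo_get_lowpan_priv(const struct demo_netdev *dev)
{
	assert(dev != NULL);
	if (dev->type != DEMO_TYPE_6LOWPAN)
		return NULL;
	/* A tun/tap device that merely changed its type fails here. */
	if (!dev->lowpan_ptr)
		return NULL;
	return dev->priv;
}
```

A notifier built on this would bail out early when the helper returns NULL instead of casting dev->priv blindly.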
I assumed before that the data in dev->priv is bound to dev->type.
This tun/tap feature breaks at least my handling, and I am not sure
whether there are other users which use dev->priv in a netdevice notifier
and don't check dev->$SUBSYSTEM_DEV_POINTER if they have one.
Thanks in advance to everybody for helping to solve this issue.
- Alex
^ permalink raw reply
* Re: [PATCH 1/2] iproute2: Add support for a few routing protocols
From: Stephen Hemminger @ 2018-06-08 17:29 UTC (permalink / raw)
To: Donald Sharp; +Cc: netdev, dsahern
In-Reply-To: <20180608124638.4895-2-sharpd@cumulusnetworks.com>
On Fri, 8 Jun 2018 08:46:37 -0400
Donald Sharp <sharpd@cumulusnetworks.com> wrote:
> Add support for:
>
> BGP
> ISIS
> OSPF
> RIP
> EIGRP
>
> Routing protocols to iproute2.
>
> Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
> ---
> etc/iproute2/rt_protos | 5 +++++
> include/linux/rtnetlink.h | 5 +++++
> lib/rt_names.c | 5 +++++
> 3 files changed, 15 insertions(+)
>
I just merged iproute2-next into iproute2 and rtnetlink.h is now up to date.
Please rebase your patches.
^ permalink raw reply
* Re: [PATCH] Bluetooth: hci_bcm: Configure SCO routing automatically
From: Rob Herring @ 2018-06-08 17:25 UTC (permalink / raw)
To: attitokes
Cc: David S. Miller, Mark Rutland, Marcel Holtmann, Johan Hedberg,
Artiom Vaskov, netdev, devicetree, linux-kernel@vger.kernel.org,
open list:BLUETOOTH DRIVERS
In-Reply-To: <20180608162009.22762-1-attitokes@gmail.com>
On Fri, Jun 8, 2018 at 10:20 AM, <attitokes@gmail.com> wrote:
> From: Attila Tőkés <attitokes@gmail.com>
>
> Added support to automatically configure the SCO packet routing at the device setup. The SCO packets are used with the HSP / HFP profiles, but in some devices (ex. CYW43438) they are routed to a PCM output by default. This change allows sending the vendor specific HCI command to configure the SCO routing. The parameters of the command are loaded from the device tree.
Please wrap your commit msg.
>
> Signed-off-by: Attila Tőkés <attitokes@gmail.com>
> ---
> .../bindings/net/broadcom-bluetooth.txt | 7 ++
Please split bindings to separate patch.
> drivers/bluetooth/hci_bcm.c | 72 +++++++++++++++++++
> 2 files changed, 79 insertions(+)
>
> diff --git a/Documentation/devicetree/bindings/net/broadcom-bluetooth.txt b/Documentation/devicetree/bindings/net/broadcom-bluetooth.txt
> index 4194ff7e..aea3a094 100644
> --- a/Documentation/devicetree/bindings/net/broadcom-bluetooth.txt
> +++ b/Documentation/devicetree/bindings/net/broadcom-bluetooth.txt
> @@ -21,6 +21,12 @@ Optional properties:
> - clocks: clock specifier if external clock provided to the controller
> - clock-names: should be "extclk"
>
> + SCO routing parameters:
> + - sco-routing: 0-3 (PCM, Transport, Codec, I2S)
> + - pcm-interface-rate: 0-4 (128 Kbps - 2048 Kbps)
> + - pcm-frame-type: 0 (short), 1 (long)
> + - pcm-sync-mode: 0 (slave), 1 (master)
> + - pcm-clock-mode: 0 (slave), 1 (master)
Are these Broadcom specific? Properties need either a vendor prefix or
to be documented in a common location. I think these look like the
latter.
However, this also looks incomplete to me. For example, which SoC
I2S/PCM port is BT audio connected to and how does it fit into the
existing audio related bindings? There's been work on HDMI audio
bindings which would be similar (except for the SCO over UART at
least).
>
> Example:
>
> @@ -31,5 +37,6 @@ Example:
> bluetooth {
> compatible = "brcm,bcm43438-bt";
> max-speed = <921600>;
> + sco-routing = <1>; /* 1 = transport (UART) */
> };
> };
> diff --git a/drivers/bluetooth/hci_bcm.c b/drivers/bluetooth/hci_bcm.c
> index ddbd8c6a..0e729534 100644
> --- a/drivers/bluetooth/hci_bcm.c
> +++ b/drivers/bluetooth/hci_bcm.c
> @@ -83,6 +83,16 @@
> * @hu: pointer to HCI UART controller struct,
> * used to disable flow control during runtime suspend and system sleep
> * @is_suspended: whether flow control is currently disabled
> + *
> + * SCO routing parameters:
> + * used as the parameters for the bcm_set_pcm_int_params command
> + * @sco_routing:
> + * >= 255 (skip SCO routing configuration)
> + * 0-3 (PCM, Transport, Codec, I2S)
> + * @pcm_interface_rate: 0-4 (128 Kbps - 2048 Kbps)
> + * @pcm_frame_type: 0 (short), 1 (long)
> + * @pcm_sync_mode: 0 (slave), 1 (master)
> + * @pcm_clock_mode: 0 (slave), 1 (master)
> */
> struct bcm_device {
> /* Must be the first member, hci_serdev.c expects this. */
> @@ -114,6 +124,13 @@ struct bcm_device {
> struct hci_uart *hu;
> bool is_suspended;
> #endif
> +
> + /* SCO routing parameters */
> + u8 sco_routing;
> + u8 pcm_interface_rate;
> + u8 pcm_frame_type;
> + u8 pcm_sync_mode;
> + u8 pcm_clock_mode;
> };
>
> /* generic bcm uart resources */
> @@ -189,6 +206,40 @@ static int bcm_set_baudrate(struct hci_uart *hu, unsigned int speed)
> return 0;
> }
>
> +static int bcm_configure_sco_routing(struct hci_uart *hu, struct bcm_device *bcm_dev)
> +{
> + struct hci_dev *hdev = hu->hdev;
> + struct sk_buff *skb;
> + struct bcm_set_pcm_int_params params;
> +
> + if (bcm_dev->sco_routing >= 0xff) {
> + /* SCO routing configuration should be skipped */
> + return 0;
> + }
> +
> + bt_dev_dbg(hdev, "BCM: Configuring SCO routing (%d %d %d %d %d)",
> + bcm_dev->sco_routing, bcm_dev->pcm_interface_rate, bcm_dev->pcm_frame_type,
> + bcm_dev->pcm_sync_mode, bcm_dev->pcm_clock_mode);
> +
> + params.routing = bcm_dev->sco_routing;
> + params.rate = bcm_dev->pcm_interface_rate;
> + params.frame_sync = bcm_dev->pcm_frame_type;
> + params.sync_mode = bcm_dev->pcm_sync_mode;
> + params.clock_mode = bcm_dev->pcm_clock_mode;
> +
> + /* Send the SCO routing configuration command */
> + skb = __hci_cmd_sync(hdev, 0xfc1c, sizeof(params), &params, HCI_CMD_TIMEOUT);
> + if (IS_ERR(skb)) {
> + int err = PTR_ERR(skb);
> + bt_dev_err(hdev, "BCM: failed to configure SCO routing (%d)", err);
> + return err;
> + }
> +
> + kfree_skb(skb);
> +
> + return 0;
> +}
> +
> /* bcm_device_exists should be protected by bcm_device_lock */
> static bool bcm_device_exists(struct bcm_device *device)
> {
> @@ -534,6 +585,9 @@ static int bcm_setup(struct hci_uart *hu)
> host_set_baudrate(hu, speed);
> }
>
> + /* Configure SCO routing if needed */
> + bcm_configure_sco_routing(hu, bcm->dev);
> +
> finalize:
> release_firmware(fw);
>
> @@ -1004,9 +1058,21 @@ static int bcm_acpi_probe(struct bcm_device *dev)
> }
> #endif /* CONFIG_ACPI */
>
> +static void read_u8_device_property(struct device *device, const char *property, u8 *destination) {
> + u32 temp;
> + if (device_property_read_u32(device, property, &temp) == 0) {
> + *destination = temp & 0xff;
> + }
> +}
> +
> static int bcm_of_probe(struct bcm_device *bdev)
> {
> device_property_read_u32(bdev->dev, "max-speed", &bdev->oper_speed);
> + read_u8_device_property(bdev->dev, "sco-routing", &bdev->sco_routing);
> + read_u8_device_property(bdev->dev, "pcm-interface-rate", &bdev->pcm_interface_rate);
> + read_u8_device_property(bdev->dev, "pcm-frame-type", &bdev->pcm_frame_type);
> + read_u8_device_property(bdev->dev, "pcm-sync-mode", &bdev->pcm_sync_mode);
> + read_u8_device_property(bdev->dev, "pcm-clock-mode", &bdev->pcm_clock_mode);
These are actually broken because the DT properties are 32-bit.
Rob
^ permalink raw reply
* [ANNOUNCE] iproute 4.17
From: Stephen Hemminger @ 2018-06-08 17:25 UTC (permalink / raw)
To: netdev; +Cc: linux-kernel
New iproute2 release for Linux 4.17
Latest version of the iproute2 utility, supporting new features in Linux 4.17.
In addition to the usual range of small changes, some items worth noting:
* RDMA tool has gotten lots of updates
* lots of devlink updates
* more bpf tool updates from Daniel Borkmann
* more VRF related changes
* ss -s command no longer reports socket statistics off the slab cache.
This has been broken since early in the 2.6 development cycle and users
only noticed 10 years later.
* The ip command subtypes support JSON output.
Most of the tc commands do as well.
The tarball can be downloaded from:
https://www.kernel.org/pub/linux/utils/net/iproute2/iproute2-4.17.0.tar.gz
The upstream repositories for master and net-next branch are now
split. Master branch is at:
git://git.kernel.org/pub/scm/network/iproute2/iproute2.git
and patches for the next release are in (master branch):
git://git.kernel.org/pub/scm/network/iproute2/iproute2-next.git
Report problems (or enhancements) to the netdev@vger.kernel.org mailing list.
---
Adam Vyskovsky (1):
tc: fix an off-by-one error while printing tc actions
Alexander Alemayhu (4):
man: add examples to ip.8
man: fix man page warnings
tc: bpf: add ppc64 and sparc64 to list of archs with eBPF support
examples/bpf: update list of examples
Alexander Aring (5):
tc: m_ife: allow ife type to zero
tc: m_ife: print IEEE ethertype format
tc: m_ife: report about kernels default type
man: tc-ife: add default type note
tc: m_ife: fix match tcindex parsing
Alexander Heinlein (1):
ip/xfrm: Fix deleteall when having many policies installed
Alexander Zubkov (5):
iproute: list/flush/save filter also by metric
iproute: "list/flush/save default" selected all of the routes
treat "default" and "all"/"any" addresses differenty
treat "default" and "all"/"any" addresses differenty
arrange prefix parsing code after redundant patches
Alexey Kodanev (1):
fix typo in ip-xfrm man page, rmd610 -> rmd160
Amir Vadai (14):
libnetlink: Introduce rta_getattr_be*()
tc/cls_flower: Classify packet in ip tunnels
tc/act_tunnel: Introduce ip tunnel action
tc/pedit: Fix a typo in pedit usage message
tc/pedit: Extend pedit to specify offset relative to mac/transport headers
tc/pedit: Introduce 'add' operation
tc/pedit: p_ip: introduce editing ttl header
tc/pedit: Support fields bigger than 32 bits
tc/pedit: p_eth: ETH header editor
tc/pedit: p_tcp: introduce pedit tcp support
pedit: Fix a typo in warning
pedit: Do not allow using retain for too big fields
pedit: Check for extended capability in protocol parser
pedit: Introduce ipv6 support
Amritha Nambiar (4):
tc/mqprio: Offload mode and shaper options in mqprio
flower: Represent HW traffic classes as classid values
man: tc-mqprio: add documentation for new offload options
man: tc-flower: add explanation for hw_tc option
Andreas Henriksson (1):
ss: fix help/man TCP-STATE description for listening
Antonio Quartulli (2):
ss: fix crash when skipping disabled header field
ss: fix NULL pointer access when parsing unix sockets with oldformat
Arkadi Sharshevsky (15):
devlink: Change netlink attribute validation
devlink: Add support for pipeline debug (dpipe)
bridge: Distinguish between externally learned vs offloaded FDBs
devlink: Make match/action parsing more flexible
devlink: Add support for special format protocol headers
devlink: Add support for protocol IPv4/IPv6/Ethernet special formats
devlink: Ignore unknown attributes
devlink: Change empty line indication with indentations
devlink: mnlg: Add support for extended ack
devlink: Add support for devlink resource abstraction
devlink: Add support for hot reload
devlink: Move dpipe context from heap to stack
devlink: Add support for resource/dpipe relation
devlink: Update man pages and add resource man
devlink: Fix error reporting
Asbjørn Sloth Tønnesen (2):
testsuite: refactor kernel config search
testsuite: search for kernel config in /boot
Baruch Siach (5):
tc: add missing limits.h header
ip: include libc headers first
lib: fix multiple strlcpy definition
README: update libdb build dependency information
arpd: remove pthread dependency
Benjamin LaHaise (2):
f_flower: don't set TCA_FLOWER_KEY_ETH_TYPE for "protocol all"
tc: flower: support for matching MPLS labels
Boris Pismenny (1):
ip xfrm: Add xfrm state crypto offload
Casey Callendrello (1):
netns: make /var/run/netns bind-mount recursive
Chris Mi (3):
tc: fix command "tc actions del" hang issue
lib/libnetlink: Add a new function rtnl_talk_iov
tc: Add batchsize feature for filter and actions
Christian Brauner (1):
netns: allow negative nsid
Christian Ehrhardt (2):
tests: read limited amount from /dev/urandom
tests: make sure rand_dev suffix has 6 chars
Christoph Paasch (1):
ip: add fastopen_no_cookie option to ip route
Craig Gallek (2):
gre6: fix copy/paste bugs in GREv6 attribute manipulation
iplink: Expose IFLA_*_FWMARK attributes for supported link types
Cyrill Gorcunov (2):
libnetlink: Add test for error code returned from netlink reply
ss: Add inet raw sockets information gathering via netlink diag interface
Daniel Borkmann (19):
bpf: make tc's bpf loader generic and move into lib
bpf: check for owner_prog_type and notify users when differ
bpf: add initial support for attaching xdp progs
{f,m}_bpf: dump tag over insns
bpf: test for valid type in bpf_get_work_dir
bpf: add support for generic xdp
bpf: update printing of generic xdp mode
bpf: dump error to the user when retrieving pinned prog fails
bpf: indicate lderr when bpf_apply_relo_data fails
bpf: remove obsolete samples
bpf: support loading map in map from obj
bpf: dump id/jited info for cls/act programs
bpf: improve error reporting around tail calls
bpf: fix mnt path when from env
bpf: unbreak libelf linkage for bpf obj loader
bpf: minor cleanups for bpf_trace_pipe
bpf: consolidate dumps to use bpf_dump_prog_info
json: move json printer to common library
bpf: properly output json for xdp
David Ahern (56):
Makefile: really suppress printing of directories
lib bpf: Add support for BPF_PROG_ATTACH and BPF_PROG_DETACH
bpf: export bpf_prog_load
bpf: Add BPF_ macros
move cmd_exec to lib utils
Add filesystem APIs to lib
change name_is_vrf to return index
libnetlink: Add variant of rtnl_talk that does not display RTNETLINK answers error
Introduce ip vrf command
Fix compile warning in get_addr_1
ip vrf: Move kernel config hint to prog_load failure
ip vrf: Refactor ipvrf_identify
ip vrf: Fix reset to default VRF
ip netns: Reset vrf to default VRF on namespace switch
ip vrf: Fix run-on error message on mkdir failure
ip vrf: Improve cgroup2 error messages
ip vrf: Improve bpf error messages
Add support for rt_protos.d
rttable: Fix invalid range checking when table id is converted to u32
ip route: error out on multiple via without nexthop keyword
ip route: Make name of protocol 0 consistent
ip vrf: Handle vrf in a cgroup hierarchy
ip netns: refactor netns_identify
ip vrf: Handle VRF nesting in namespace
ip vrf: Detect invalid vrf name in pids command
ip: Add support for MPLS netconf
ip route: Add missing space between nexthop and via for mpls multipath routes
netlink: Add flag to suppress print of nlmsg error
ip netconf: Show all address families by default in dumps
ip netconf: show all families on dev request
ip vrf: Add command name next to pid
ip vrf: Add command name next to pid
ip: mpls: fix printing of mpls labels
ip: add support for more MPLS labels
netlink: Change rtnl_dump_done to always show error
ip address: Export ip_linkaddr_list
ip address: Move filter struct to ip_common.h
ip address: Change print_linkinfo_brief to take filter as an input
ip vrf: Add show command
lib: Dump ext-ack string by default
libnetlink: Fix extack attribute parsing
libnetlink: Handle extack messages for non-error case
Update headers from 4.15-rc3
Restore --no-print-directory option for silent builds
Update kernel headers to 4.15-rc8
Update kernel headers to 4.16.0-rc2+
Update kernel headers to 08009a760213
Import tc_em_ipt.h from kernel at commit 08009a760213
libnetlink: __rtnl_talk_iov should only loop max iovlen times
Update kernel headers to 4.16.0-rc4+
Update kernel headers
Update kernel headers
devlink: Print size of -1 as unlimited
utils: Do not reset family for default, any, all addresses
ip route: Print expires as signed int
iplink_vrf: Save device index from response for return code
David Forster (1):
ip6tunnel: Align ipv6 tunnel key display with ipv4
David Lebrun (9):
ip: add ip sr command to control SR-IPv6 internal structures
iproute: add support for SR-IPv6 lwtunnel encapsulation
man: add documentation for IPv6 SR commands
iproute: fix compilation issue with older glibc
iproute: add helper functions for SRH processing
iproute: add support for SRv6 local segment processing
man: add documentation for seg6local lwt
iproute: add support for seg6 l2encap mode
man: add documentation for seg6 l2encap mode
David Michael (1):
tc: make tc linking depend on libtc.a
Davide Caratti (4):
tc: m_csum: add support for SCTP checksum
tc: fix typo in tc-tcindex man page
tc: bash-completion: add missing 'classid' keyword
tc: fix parsing of the control action
Donald Sharp (5):
ip: mroute: Add table output to show command
ip: Properly display AF_BRIDGE address information for neighbor events
ip: Use the `struct fib_rule_hdr` for rules
ip: Display ip rule protocol used
ip: Allow rules to accept a specified protocol
Eli Cohen (1):
iplink: Update usage in help message
Eric Dumazet (2):
ss: print tcpi_rcv_mss and tcpi_advmss
tc: fq: support low_rate_threshold attribute
Eyal Birger (2):
tc: ematch: add parse_eopt_argv() method for providing ematches with argv parameters
tc: add em_ipt ematch for calling xtables matches from tc matching context
Filip Moc (1):
ip fou: pass family attribute as u8
Gal Pressman (3):
iplink: Validate minimum tx rate is less than maximum tx rate
ipaddress: Make sure VF min/max rate API is supported before using it
man: Document the meaning of zero in min/max_tx_rate parameters
GhantaKrishnamurthy MohanKrishna (1):
ss: Add support for TIPC socket diag in ss tool
Girish Moodalbail (2):
vxlan: Add support for modifying vxlan device attributes
geneve: support for modifying geneve device
Greg Greenway (1):
Add "show" subcommand to "ip fou"
Guillaume Nault (3):
ip/l2tp: remove offset and peer-offset options
l2tp: no need to export session offsets in JSON output
bridge: fix typo in hairpin error message
Hadar Hen Zion (4):
tc/cls_flower: Add dest UDP port to tunnel params
tc/m_tunnel_key: Add dest UDP port to tunnel key action
tc/cls_flower: Add to the usage encapsulation dest UDP port
tc/m_tunnel_key: Add to the usage encapsulation dest UDP port
Hangbin Liu (12):
iplink: bridge: add support for IFLA_BR_FDB_FLUSH
iplink: bridge: add support for IFLA_BR_VLAN_STATS_ENABLED
iplink: bridge: add support for IFLA_BR_MCAST_STATS_ENABLED
iplink: bridge: add support for IFLA_BR_MCAST_IGMP_VERSION
iplink: bridge: add support for IFLA_BR_MCAST_MLD_VERSION
iplink: bridge_slave: add support for IFLA_BRPORT_FLUSH
man: ip-link.8: Document bridge_slave fdb_flush option
man: ip-link.8: Document bridge_slave fdb_flush option
ip neigh: allow flush FAILED neighbour entry
utils: return default family when rtm_family is not RTNL_FAMILY_IPMR/IP6MR
lib/libnetlink: re malloc buff if size is not enough
lib/libnetlink: update rtnl_talk to support malloc buff at run time
Hoang Le (1):
tipc: TIPC_NLA_LINK_NAME value pass on nesting entry TIPC_NLA_LINK
Ido Schimmel (2):
iproute: Display offload indication per-nexthop
iproute: Parse last nexthop in a multipath route
Ivan Delalande (2):
utils: add print_escape_buf to format and print arbitrary bytes
ss: print MD5 signature keys configured on TCP sockets
Ivan Vecera (3):
lib: make resolve_hosts variable common
devlink: add batch command support
devlink: don't enforce NETLINK_{CAP,EXT}_ACK sock opts
Jakub Kicinski (23):
bpf: print xdp offloaded mode
bpf: add xdpdrv for requesting XDP driver mode
bpf: allow requesting XDP HW offload
bpf: initialize the verifier log
bpf: pass program type in struct bpf_cfg_in
bpf: keep parsed program mode in struct bpf_cfg_in
bpf: allocate opcode table in struct bpf_cfg_in
bpf: split parse from program loading
bpf: rename bpf_parse_common() to bpf_parse_and_load_common()
bpf: expose bpf_parse_common() and bpf_load_common()
bpf: allow loading programs for a specific ifindex
{f, m}_bpf: don't allow specifying multiple bpf programs
tc_filter: resolve device name before parsing filter
f_bpf: communicate ifindex for eBPF offload
iplink: communicate ifindex for xdp offload
ip: link: add support for netdevsim device type
tc: red: allow setting th_min and th_max to the same value
bpf: support map offload
tc: red: JSON-ify RED output
tc: prio: JSON-ify prio output
ip: address: fix stats64 JSON object name
tc: fix second printing of requeues
iplink_geneve: correct size of message to avoid spurious errors
Jakub Sitnicki (2):
iproute: Remove useless check for nexthop keyword when setting RTA_OIF
iproute: Abort if nexthop cannot be parsed
Jamal Hadi Salim (6):
utils: make hex2mem available to all users
actions: Add support for user cookies
tc actions: Improved batching and time filtered dumping
actions: update the man page to describe the "since" time filter
tc/actions: introduce support for jump action
tc: Fix filter protocol output
Jean-Philippe Brucker (1):
ss: fix NULL dereference when rendering without header
Jesus Sanchez-Palencia (1):
man: Clarify idleslope calculation for tc-cbs
Jiri Benc (3):
Revert "man pages: add man page for skbmod action"
tc: m_tunnel_key: reformat the usage text
tc: m_tunnel_key: add csum/nocsum option
Jiri Kosina (2):
iproute2: tc: introduce build dependency on libnetlink
iproute2: add support for invisible qdisc dumping
Jiri Pirko (28):
devlink: use DEVLINK_CMD_ESWITCH_* instead of DEVLINK_CMD_ESWITCH_MODE_*
tc_filter: add support for chain index
tc: actions: add helpers to parse and print control actions
tc/actions: introduce support for goto chain action
tc: flower: add support for tcp flags
tc: gact: fix control action parsing
tc: add support for TRAP action
tc: don't print error message on miss when parsing action with default
tc: move action cookie print out of the stats if
tc: remove action cookie len from printout
tc: jsonify qdisc core
tc: jsonify stats2
tc: jsonify fq_codel qdisc
tc: jsonify htb qdisc
tc: jsonify filter core
tc: jsonify flower filter
tc: jsonify matchall filter
tc: jsonify actions core
tc: jsonify gact action
tc: jsonify mirred action
tc: jsonify vlan action
man: add -json option to tc manpage
tc: fix json array closing
tc: introduce tc_qdisc_block_exists helper
tc: introduce support for block-handle for filter operations
tc: implement ingress/egress block index attributes for qdiscs
devlink: fix port new monitoring message typo
man: fix devlink object list
Joe Stringer (1):
bpf: Print section name when hitting non ld64 issue
Jon Maloy (3):
tipc: change family attribute from u32 to u16
tipc: introduce command for handling a new 128-bit node identity
tipc: change node address printout formats
Julien Fortin (31):
ip: vfinfo: remove code duplication for IFLA_VF_RSS_QUERY_EN
color: add new COLOR_NONE and disable_color function
ip: add new command line argument -json (mutually exclusive with -color)
json_writer: add new json handlers (null, float with format, lluint, hu)
ip: ip_print: add new API to print JSON or regular format output
ip: ipaddress.c: add support for json output
ip: iplink.c: open/close json obj for ip -brief -json link show dev DEV
ip: iplink_bond.c: add json output support
ip: iplink_bond_slave.c: add json output support (info_slave_data)
ip: iplink_hsr.c: add json output support
ip: iplink_bridge.c: add json output support
ip: iplink_bridge_slave.c: add json output support
ip: iplink_can.c: add json output support
ip: iplink_geneve.c: add json output support
ip: iplink_ipoib.c: add json output support
ip: iplink_ipvlan.c: add json output support
ip: iplink_vrf.c: add json output support
ip: iplink_vxlan.c: add json output support
ip: iplink_xdp.c: add json output support
ip: ipmacsec.c: add json output support
ip: link_gre.c: add json output support
ip: link_gre6.c: add json output support
ip: link_ip6tnl.c: add json output support
ip: link_iptnl.c: add json output support
ip: link_vti.c: add json output support
ip: link_vti6.c: add json output support
ip: link_macvlan.c: add json output support
ip: iplink_vlan.c: add json output support
ip: ipaddress: fix missing space after prefixlen
lib: json_print: rework 'new_json_obj' drop FILE* argument
lib: json_print: rework 'new_json_obj' drop FILE* argument
Khem Raj (1):
tc: include stdint.h explicitly for UINT16_MAX
Krister Johansen (3):
iptunnel: document mode parameter for sit tunnels
iptunnel: add support for mpls/ip to sit tunnels
iptunnel: add support for mpls/ip to ipip tunnels
Leon Romanovsky (34):
devlink: Call dl_free in early exit case
utils: Move BIT macro to common header
rdma: Add basic infrastructure for RDMA tool
rdma: Add dev object
rdma: Add link object
rdma: Add json and pretty outputs
rdma: Implement json output for dev object
rdma: Add json output to link object
rdma: Add initial manual for the tool
ip: Fix compilation break on old systems
rdma: Reduce scope of _dev_map_lookup call
rdma: Protect dev_map_lookup from wrong input
rdma: Move per-device handler function to generic code
rdma: Fix misspelled SYS_IMAGE_GUID
rdma: Check that port index exists before operate on link layer
rdma: Print supplied device name in case of wrong name
rdma: Get rid of dev_map_free call
rdma: Rename free function to be rd_cleanup
rdma: Rename rd_free_devmap to be rd_free
rdma: Move link execution logic to common code
rdma: Add option to provide "-" sign for the port number
rdma: Make visible the number of arguments
rdma: Add filtering infrastructure
rdma: Set pointer to device name position
rdma: Allow external usage of compare string routine
rdma: Add resource tracking summary
rdma: Add QP resource tracking information
rdma: Document resource tracking
rdma: Check return value of strdup call
rdma: Add batch command support
rdma: Avoid memory leak for skipper resource
rdma: Update device capabilities flags
rdma: Move RDMA UAPI header file to be under RDMA responsibility
rdma: Ignore unknown netlink attributes
Lorenzo Colitti (3):
ip: support UID range routing.
iproute: build more easily on Android
iproute2: fixes to compile on some systems.
Lubomir Rintel (1):
lib/namespace: don't try to mount rw /sys over a ro one
Luca Boccassi (7):
man: drop references to Debian-specific paths
man: add more keywords to ip.8 short description
man: ip-address: document 15-char limit for LABEL
man: routel/routef: don't mention filesystem paths
man: fix small formatting errors
Drop capabilities if not running ip exec vrf with libcap
ip: do not drop capabilities if net_admin=i is set
Lucas Bates (2):
man page: add page for skbmod action
Add new man page for tc actions.
Lukas Braun (1):
man: ip-route.8: Mention that lower metric means higher priority
Mahesh Bandewar (1):
ip/ipvlan: enhance ability to add mode flags to existing modes
Marcelo Ricardo Leitner (1):
tc-netem: fix limit description in man page
Martin KaFai Lau (1):
bpf: Add support for IFLA_XDP_PROG_ID
Masatake YAMATO (1):
ss: prepare rth when killing inet sock
Matteo Croce (3):
tc: fix typo in manpage
netns: avoid directory traversal
netns: more input validation
Matthias Schiffer (1):
devlink, rdma, tipc: properly define TARGETS without HAVE_MNL
Michal Kubecek (4):
iplink: check for message truncation in iplink_get()
iplink: double the buffer size also in iplink_get()
ip xfrm: use correct key length for netlink message
ip maddr: fix filtering by device
Michal Kubeček (1):
routel: fix infinite loop in line parser
Michal Privoznik (1):
tc: util: Don't call NEXT_ARG_FWD() in __parse_action_control()
Mike Frysinger (2):
mark shell scripts +x
ifcfg/rtpr: convert to POSIX shell
Nathan Harold (1):
iproute2: fix 'ip xfrm monitor all' command
Neal Cardwell (1):
ss: print new tcp_info fields: delivery_rate and app_limited
Nicolas Dichtel (4):
link_gre6: really support encaplimit option
ip: IFLA_NEW_NETNSID/IFLA_NEW_IFINDEX support
ip: display netns name instead of nsid
iplink: enable to specify a name for the link-netns
Nikhil Gajendrakumar (1):
bridge: this patch adds json support for bridge mdb show
Nikolay Aleksandrov (7):
bridge: fdb: add state filter support
ipmroute: add support for RTNH_F_UNRESOLVED
iplink: add support for xstats subcommand
iplink: bridge: add support for displaying xstats
iplink: bridge_slave: add support for displaying xstats
ip: bridge_slave: add support for per-port group_fwd_mask
ip: bridge_slave: add neigh_suppress to the type help and
Nishanth Devarajan (1):
tc: B.W limits can now be specified in %.
Nogah Frankel (4):
ifstat: Includes reorder
ifstat: Add extended statistics to ifstat
ifstat: Add "sw only" extended statistics to ifstat
ifstat: Add xstat to ifstat man page
Oliver Hartkopp (3):
ip: link add vxcan support
ip: add vxcan to help text
ip: add vxcan/veth to ip-link man page
Or Gerlitz (4):
tc: matchall: Print skip flags when dumping a filter
tc/pedit: p_udp: introduce pedit udp support
tc: Reflect HW offload status
tc: flower: add support for matching on ip tos and ttl
Paul Blakey (2):
tc: flower: support matching flags
tc: flower: Refactor matching flags to be more user friendly
Pavel Maltsev (1):
Allow to configure /var/run/netns directory
Petr Machata (1):
ip: link_gre6.c: Support IP6_TNL_F_ALLOW_LOCAL_REMOTE flag
Petr Vorel (8):
ip: fix igmp parsing when iface is long
color: use "light" colors for dark background
tests: Remove bashisms (s/source/.)
tests: Revert back /bin/sh in shebang
color: Fix ip segfault when using --color switch
color: Fix another ip segfault when using --color switch
color: Cleanup code to remove "magic" offset + 7
color: Rename enum
Phil Dibowitz (1):
Show 'external' link mode in output
Phil Sutter (113):
ss: Mark fall through in arg parsing switch()
ss: Drop empty lines in UDP output
ss: Add missing tab when printing UNIX details
ss: Use sockstat->type in all socket types
ss: introduce proc_ctx_print()
ss: Drop list traversal from unix_stats_print()
ss: Eliminate unix_use_proc()
ss: Turn generic_proc_open() wrappers into macros
ss: Make tmr_name local to tcp_timer_print()
ss: Make user_ent_hash_build_init local to user_ent_hash_build()
ss: Make some variables function-local
ss: Make slabstat_ids local to get_slabstat()
ss: Get rid of useless goto in handle_follow_request()
ss: Get rid of single-fielded struct snmpstat
ss: Make unix_state_map local to unix_show()
ss: Make sstate_name local to sock_state_print()
ss: Make sstate_namel local to scan_state()
ss: unix_show: No need to initialize members of calloc'ed structs
tc: m_xt: Fix segfault with iptables-1.6.0
tc: m_xt: Drop needless parentheses from #if checks
man: tc-csum.8: Fix example
man: ip-route.8: Fix 'expires' indenting
testsuite: Generate nlmsg blob at runtime
testsuite: Search kernel config in modules dir also
man: ss.8: Add missing protocols to description of -A
ip: link: bond: Fix whitespace in help text
ip: link: macvlan: Add newline to help output
ip: link: Unify link type help functions a bit
ip: link: Add missing link type help texts
man: ip-link: Specify min/max values for bridge slave priority and cost
man: ip-rule.8: Further clarify how to interpret priority value
man: ip.8: Document -brief flag
tc: m_xt: Prevent a segfault in libipt
man: Collect names of man pages automatically
bpf: Make bytecode-file reading a little more robust
Really fix get_addr() and get_prefix() error messages
tc-simple: Fix documentation
examples: Some shell fixes to cbq.init
ifcfg: Quote left-hand side of [ ] expression
tipc/node: Fix socket fd check in cmd_node_get_addr()
iproute_lwtunnel: Argument to strerror must be positive
iproute_lwtunnel: csum_mode value checking was ineffective
ss: Don't leak fd in tcp_show_netlink_file()
tc/em_ipset: Don't leak sockfd on error path
ipvrf: Fix error path of vrf_switch()
ifstat: Fix memleak in error case
ifstat: Fix memleak in dump_kern_db() for json output
ss: Fix potential memleak in unix_stats_print()
tipc/bearer: Fix resource leak in error path
devlink: No need for this self-assignment
ipntable: No need to check and assign to parms_rta
iproute: Fix for missing 'Oifs:' display
lib/rt_names: Drop dead code in rtnl_rttable_n2a()
ss: Skip useless check in parse_hostcond()
ss: Drop useless assignment
tc/m_gact: Drop dead code
ipaddress: Avoid accessing uninitialized variable lcl
iplink_can: Prevent overstepping array bounds
ipmaddr: Avoid accessing uninitialized data
ss: Use C99 initializer in netlink_show_one()
netem/maketable: Check return value of fstat()
tc/q_multiq: Don't pass garbage in TCA_OPTIONS
iproute: Check mark value input
iplink_vrf: Complain if main table is not found
devlink: Check return code of strslashrsplit()
lib/bpf: Don't leak fp in bpf_find_mntpt()
ifstat, nstat: Check fdopen() return value
tc/q_netem: Don't dereference possibly NULL pointer
tc/tc_filter: Make sure filter name is not empty
tipc/bearer: Prevent NULL pointer dereference
ipntable: Avoid memory allocation for filter.name
lib/fs: Fix format string in find_fs_mount()
lib/inet_proto: Review inet_proto_{a2n,n2a}()
lnstat_util: Simplify alloc_and_open() a bit
tc/m_xt: Fix for potential string buffer overflows
lib/ll_map: Choose size of new cache items at run-time
ss: Make struct tcpstat fields 'timer' and 'timeout' unsigned
ss: Make sure scanned index value to unix_state_map is sane
netem/maketable: Check return value of fscanf()
lib/bpf: Check return value of write()
lib/fs: Fix and simplify make_path()
lib/libnetlink: Don't pass NULL parameter to memcpy()
ss: Fix for added diag support check
link_gre6: Fix for changing tclass/flowlabel
link_gre6: Print the tunnel's tclass setting
utils: Implement strlcpy() and strlcat()
Convert the obvious cases to strlcpy()
Convert harmful calls to strncpy() to strlcpy()
ipxfrm: Replace STRBUF_CAT macro with strlcat()
tc_util: No need to terminate an snprintf'ed buffer
lnstat_util: Make sure buffer is NUL-terminated
lib/bpf: Fix bytecode-file parsing
utils: strlcpy() and strlcat() don't clobber dst
ipaddress: Fix segfault in 'addr showdump'
ip-route: Fix for listing routes with RTAX_LOCK attribute
ip{6, }tunnel: Avoid copying user-supplied interface name around
tc: flower: No need to cache indev arg
Check user supplied interface name lengths
ss: Distinguish between IPv4 and IPv6 wildcard sockets
ss: Detect IPPROTO_ICMPV6 sockets
tc_util: Drop needless pointer check
tc_util: Silence spurious compiler warning
link_gre6: Detect invalid encaplimit values
man: tc-csum.8: Fix inconsistency in example description
tc: Optimize gact action lookup
Remove leftovers from removed Latex documentation
ip-link: Fix use after free in nl_get_ll_addr_len()
man: ip-route.8: ssthresh parameter is NUMBER
man: tc-vlan.8: Fix for incorrect example
ssfilter: Eliminate shift/reduce conflicts
ss: Allow excluding a socket table from being queried
ss: Put filter DB parsing into a separate function
ss: Drop filter_default_dbs()
Philip Prindeville (1):
iproute2: add support for GRE ignore-df knob
Pieter Jansen van Vuuren (1):
tc: f_flower: Add support for matching first frag packets
Quentin Monnet (2):
README: update location of git repositories, remove broken info link
README: re-add updated information link
Ralf Baechle (1):
ip: HSR: Fix cut and paste error
Remigiusz Kołłątaj (1):
ip: add handling for new CAN netlink interface
Robert Shearman (6):
iplink: add support for afstats subcommand
man: Fix formatting of vrf parameter of ip-link show command
iproute: Add support for ttl-propagation attribute
iproute: Add support for MPLS LWT ttl attribute
gre: Fix ttl inherit option
vxlan: Make id optional when modifying a link
Roi Dayan (11):
devlink: Add usage help for eswitch subcommand
devlink: Add option to set and show eswitch inline mode
tc: flower: Fix typo and style in flower man page
tc: tunnel_key: Add tc-tunnel_key man page to Makefile
tc: flower: Fix flower output for src and dst ports
tc: flower: Add missing err check when parsing flower options
tc: flower: Fix incorrect error msg about eth type
tc: flower: Fix parsing ip address
devlink: Add json and pretty options to help and man
devlink: Add option to set and show eswitch encapsulation support
tc: Fix compilation error with old iptables
Roman Mashak (29):
tc: pass correct conversion specifier to print 'unsigned int' action index.
tc: fixed man page fonts for keywords and variable values
tc: updated man page to reflect filter-id use in filter GET command.
tc: distinguish Add/Replace action operations.
tc: print skbedit action when dumping actions.
tc: fix Makefile to build skbmod
tc: fixed typo in usage text.
tc: updated tc-u32 man page to reflect skip_sw and skip_hw parameters.
tc: updated ife man page.
ss: initialize 'fackets' member of tcpstat structure
bridge: isolate vlans parsing code in a separate API
bridge: dump vlan table information for link
bridge: request vlans along with link information
ip: added missing newline in man page
ip netns: use strtol() instead of atoi()
tc: distinguish Add/Replace qdisc operations
ss: remove duplicate assignment
ss: add missing path MTU parameter
tc: added tc monitor description in man page
tc: updated tc-bpf man page
tc: print actual action for sample action
tc: use get_u32() in psample action to match types
tc: print actual action for connmark action
tc: print index, refcnt & bindcnt for nat action
tc: add oneline mode
tc: enable json output for actions
tc: support oneline mode in action generic printer functions
tc: jsonify sample action
tc: return on invalid smac or dmac in ife action
Roopa Prabhu (9):
ip: extend route get to return matching fib route
iproute: extend route get for mpls routes
iplink: new option to set neigh suppression on a bridge port
iplink: bridge: support bridge port vlan_tunnel attribute
bridge: vlan: support for per vlan tunnel info
bridge: fdb: print NDA_SRC_VNI if available
ss: print skmeminfo for packet sockets
iprule: support for ip_proto, sport and dport match options
bridge: add option extern_learn to set NTF_EXT_LEARNED on fdb entries
Sabrina Dubroca (3):
man: ip-link.8: document bridge options
ip link: add support to display extended tun attributes
ip link: add json support for tun attributes
Serhey Popovych (90):
ip/tunnel: Unify setup and accept zero address for local/remote endpoints
ip/tunnel: Use get_addr() instead of get_prefix() for local/remote endpoints
ip: gre: fix IFLA_GRE_LINK attribute sizing
iplink: Improve index parameter handling
iplink: Process "alias" parameter correctly
iplink: Kill redundant network device name checks
ip/tunnel: Use tnl_parse_key() to parse tunnel key
link_ip6tnl: Use IN6ADDR_ANY_INIT to initialize local/remote endpoints
link_vti6: Always add local/remote endpoint attributes
utils: ll_addr: Handle ARPHRD_IP6GRE in ll_addr_n2a()
ip/tunnel: No need to free answer after rtnl_talk() on error
gre,ip6tnl/tunnel: Fix noencap- support
gre6/tunnel: Do not submit garbage in flowinfo
vxcan,veth: Forbid "type" for peer device
ip/tunnel: Document "external" parameter
link_iptnl: Kill code duplication
link_iptnl: Print tunnel mode
link_iptnl: Open "encap" JSON object
ip6/tunnel: Fix tclass output
ip6tnl/tunnel: Do not print obscure flowinfo
ip6/tunnel: Unify tclass printing
ip6/tunnel: Unify flowlabel printing
ip6/tunnel: Unify encap_limit printing
gre6/tunnel: Output flowlabel after tclass
ip6tnl/tunnel: Output hoplimit before encapsulation limit
ipaddress: Use family_name() for better code reuse
iplink: Fix "alias" parameter length calculations
iplink: Use ll_index_to_name() instead of if_indextoname()
ip/tunnel: Correct and unify ttl/hoplimit printing
ip/tunnel: Simplify and unify tos printing
ip/tunnel: Use print_0xhex() instead of print_string()
ip/tunnel: Abstract tunnel encapsulation options printing
gre/tunnel: Print erspan_index using print_uint()
vti/tunnel: Unify ikey/okey printing
vti6/tunnel: Unify and simplify link type help functions
tunnel: Return constant string without copying it
utils: Always specify family for address in get_addr_1()
utils: Always specify family and ->bytelen in get_prefix_1()
utils: Fast inet address classification after get_addr()
iplink_geneve: Get rid of inet_get_addr()
iplink_vxlan: Get rid of inet_get_addr()
ip: Get rid of inet_get_addr()
gre/gre6: Post merge fixes
tunnel: Add space between encap-dport and encap-sport in non-JSON output
iptnl/ip6tnl: Unify ttl/hoplimit parsing routines
vti/vti6: Minor improvements
iplink: Use ll_name_to_index() instead of if_nametoindex()
ip/tunnel: Be consistent when printing tunnel collect metadata
gre/gre6: Unify attribute addition to netlink buffer
utils: Introduce get_addr_rta() and inet_addr_match_rta()
ipaddress: Use inet_addr_match_rta()
iprule: Use inet_addr_match_rta()
ipmroute: Use inet_addr_match_rta()
ipneigh: Use inet_addr_match_rta()
ipl2tp: Use get_addr_rta()
tcp_metric: Use get_addr_rta()
ip/tunnel: Unify local/remote endpoint address printing
Revert "ip address: Change print_linkinfo_brief to take filter as an input"
ip: Consolidate ip, xdp and lwtunnel parse/dump prototypes in ip_common.h
ip: Minor cleanups
treewide: Use addattr_nest()/addattr_nest_end() to handle nested attributes
ipaddress: Unify print_link_stats() and print_link_stats64()
ip: Introduce get_rtnl_link_stats_rta() to get link statistics
tunnel: Split statistic getting and printing
iptunnel/ip6tunnel: Code cleanups
iptunnel/ip6tunnel: Use netlink to walk through tunnels list
tuntap: Use netlink to walk through tuntap list
vti/vti6: Unify vti_print_help()
gre/gre6: Unify gre_print_help()
iptnl/ip6tnl: Unify iptunnel_print_help()
ip/tunnel: Minor cleanups
ip: Use print_0xhex() where appropriate
utils: Introduce and use inet_prefix_reset()
vti/vti6: Unify local/remote endpoint address parsing
gre/gre6: Unify local/remote endpoint address parsing
iptnl/ip6tnl: Unify local/remote endpoint and 6rd address parsing
ip: Use single variable to represent -pretty
ipaddress: Abstract IFA_LABEL matching code
ipaddress: ll_map: Replace ll_idx_n2a() with ll_index_to_name()
utils: Reimplement ll_idx_n2a() and introduce ll_idx_a2n()
ipaddress: Improve print_linkinfo()
ipaddress: Simplify print_linkinfo_brief() and it's usage
lib: Correct object file dependencies
utils: Introduce and use get_ifname_rta()
utils: Introduce and use print_name_and_link() to print name@link
ipaddress: Make print_linkinfo_brief() static
utils: Introduce and use nodev() helper routine
iplink: Use "dev" and "name" parameters interchangeable when possible
iplink: Follow documented behaviour when "index" is given
iplink: Perform most of request buffer setups and checks in iplink_parse()
Shmulik Ladkani (2):
tc: m_mirred: Add support for ingress redirect/mirror
ip: link_ip6tnl.c/ip6tunnel.c: Support IP6_TNL_F_ALLOW_LOCAL_REMOTE flag
Simon Horman (20):
tc: flower: Support matching on SCTP ports
tc: flower: remove references to eth_type in manpage
tc: flower: document SCTP ip_proto
tc: flower: correct name of ip_proto parameter to flower_parse_port()
tc: flower: make use of flower_port_attr_type() safe and silent
tc: flower: introduce enum flower_endpoint
tc: flower: support matching on ICMP type and code
tc: flower: document that *_ip parameters take a PREFIX as an argument.
tc: flower: Allow *_mac options to accept a mask
tc: flower: document that *_ip parameters take a PREFIX as an argument.
tc: flower: Allow *_mac options to accept a mask
tc: flower: Update dest UDP port documentation
tc: ife: correct spelling of prio in example
tc: flower: Support matching ARP
tc: flower: use correct type when calling flower_icmp_attr_type
tc: flower: Update documentation to indicate ARP takes IPv4 prefixes
tc: flower: provide generic masked u8 parser helper
tc: flower: provide generic masked u8 print helper
tc: flower: support masked ICMP code and type match
tc actions: store and dump correct length of user cookies
Simon Ruderich (3):
man: document ip route get mark
man: document ip fou show
man: document ip xfrm policy nosock
Solio Sarabia (1):
iplink: validate maximum gso_max_size
Stefan Hajnoczi (2):
ss: allow AF_FAMILY constants >32
ss: add AF_VSOCK support
Stefano Brivio (8):
ss: Remove useless width specifier in process context print
ss: Streamline process context printing in netlink_show_one()
ss: Fix width calculations when Netid or State columns are missing
ss: Replace printf() calls for "main" output by calls to helper
ss: Introduce columns lightweight abstraction
ss: Buffer raw fields first, then render them as a table
ss: Implement automatic column width calculation
ss: Fix rendering of continuous output (-E, --events)
Stephen Hemminger (235):
update kernel headers to 4.9-net-next
update net-next headers
tc: flower checkpatch cleanups
Update kernel headers for XDP and tcp_info
update kernel headers from net-next
update kernel headers from net-next
update to net-next headers (pre 4.10 rc)
lwtunnel: style cleanup
libnetlink: break up dump function
utils: cleanup style
ipvrf: cleanup style issues
configure: fix elftest when warnings enabled
update kernel headers
Revert "tc: flower: document that *_ip parameters take a PREFIX as an argument."
Revert "tc: flower: Allow *_mac options to accept a mask"
minor kernel header update
whitespace cleanup
kernel headers update
add more uapi header files
include: remove unused header
update kernel headers (from 4.10-rc4)
update kernel headers from 4.10 net-next
update kernel headers from net-next
tcp: header file update
update headers from bridge tunnel metadata
tc: add missing sample file
update headers from net-next
update headers from 4.10-rc8
utils: hex2mem get rid of unnecessary goto
v4.10.0
add missing iplink_xstats.c
update headers from net-next
Update headers based on 4.11 merge window
netlink route attribute cleanup
xfrm: remove unnecessary casts
tc: use rta_getattr_u32
bpf: remove unnecessary cast
pie: remove always false condition
update headers from 4.11-rc2
update kernel headers from net-next
update headers from net-next
update headers from 4.11-rc3
update headers from net-next (post 4.11-rc3)
update kernel headers from net-next
netem: fix out of bounds access in maketable
Update kernel headers from 4.11 net-next
add seg6.h kernel headers
update kernel headers from net-next
remove unused header file sysctl.h
iplink: whitespace cleanup
pedit: fix whitespace
update headers to 4.11 net-next
v4.11.0
update kernel headers during 4.12 merge window
update headers from 4.12-rc2
include: remove no longer used iptables_common.h
update to current net-next headers
update headers to get changes for TCA_FLOWER
update headers to get IFLA_EVENT
updated headers from net-next
update headers from net-next (bpf and tc)
more bpf header updates
xfrm: get #define's from linux includes
update headers to get TCA_TUNNEL_CSUM
update kernel headers from net-next
v4.12.0
update kernel headers from net-next
update headers to 4.13-rc1
remove duplicated #include's
Update headers from net-next
ip: change flag names to an array
update headers from 4.13-rc4
tc: fix m_simple usage
update headers from 4.13 net-next
iproute: Add support for extended ack to rtnl_talk
ss: enclose IPv6 address in brackets
lib: fix extended ack with and without libmnl
lib: need to pass LIBMNL flag
include: update headers from net-next
tc, ip: more Makefile updates for LIBMNL
vti6: fix local/remote any addr handling
change how Config is used in Makefile's
vti: print keys in hex not dotted notation
more BPF headers update
seg6: add include/linux/seg6_local.h
include: add pfkeyv2.h drop ipv6.h
update kernel headers from net-next
config: put CFLAGS/LDLIBS in config.mk
add ERSPAN headers
rdma: fix duplicate initialization in port_names
libnetlink: drop unused parameter to rtnl_dump_done
bpf: drop unused parameter to bpf_report_map_in_map
tc: use named initializer for default mqprio options
devlink: header update
update headers from net-next
update headers from 4.14 merge
v4.13.0
BPF: update headers from 4.14-rc1
tc: flower remove unused variable
doc: remove obsolete ip-tunnels documentation
doc: remove outdated ss documentation
doc: remove outdated arpd documentation
doc: remove outdated nstat/rtstat documentation
ignore generated Config file
doc: remove outdated tc-filters documentation
doc: remove outdated IPv6 flow label document
doc: drop old ip command documentation
update headers from net-next rc
tipc: don't need custom CFLAGS
update uapi headers from 4.14-rc4 net-next
rdma: move headers to uapi
uapi: add include linux/vm_sockets_diag.h
netem: fix code indentation
update headers for TC and TIPC from net-next
bpf: update header file
include: add TCP fastopen option
update kernel headers
iproute: source code cleanup
bridge: checkpatch related cleanups
Update kernel headers based on 4.14-rc7
Update kernel headers from net-next (4.14-rc6)
update kernel headers from 4.14-rc7 net-next
Update kernel headers from 4.14-rc8 nete-next
Update kernel headers with new SPDK identifier
netem: use fixed rather than floating point for scaling
update kernel headers
update kernel headers from 4.14 net-next
drop unneeded include of syslog.h
v4.14.0
utils: remove duplicate include of ctype.h
v4.14.1
update headers from 4.15-rc1
ila: fix formatting of help message
update bpf header from net-next
tc: replace magic constant 16 with #define
tc: break long lines
SPDX license identifiers
m_vlan: style cleanups
m_action: style cleanup
m_gact: whitespace cleanup
m_mirred: style cleanups
update bpf header from net-next
update headers from 4.15-rc2
iplink: allow configuring GSO max values
uapi: add access to snd_cwnd and other sock_ops
uapi: tun add eBPF based queue selection method
iplink: add definitions for GSO_MAX
include: qdisc offload defines
ip: validate vlan value for vlan info
ss: fix crash with invalid command input file
utils: fix makeargs stack overflow
include: update ethernet headers
tc: remove no longer relevant README
v4.15.0
include: update uapi with BPF from 4.15-rc1
include: update netfilter headers from 4.15-rc1
include: update rdma uapi from 4.15-rc1
include: update interface UAPI from 4.15-rc1
include: update UAPI types.h
iproute: refactor printing flags
iproute: make printing icmpv6 a function
iproute: make printing IPv4 cache flags a function
iproute: refactor cacheinfo printing
iproute: refactor metrics print
iproute: refactor printing flow info
iproute: refactor newdst, gateway and via printing
iproute: refactor multipath print
iproute: refactor printing of interface
iproute: whitespace fixes
iproute: don't do assignment in condition
iproute: make flush a separate function
json: make pretty printing optional
man: add documentation for json and pretty flags
json: fix newline at end of array
iproute: implement JSON and color output
include: update rdma header from 4.16-rc1
uapi: update if_ether compat headers
ip: don't colorize the master device
ip: remove dead code
bridge: implement json pretty print flag
bridge: colorize output and use JSON print library
bridge: add json support for link command
bridge: update man page for new color and json changes
ip: always print interface name in color
tc: implement color output
json_writer: add SPDX Identifier (GPL-2/BSD-2)
ipneigh: add color and json support
ipaddrlabel: add json support
iprule: add json support
ipntable: add json support
ipnetconf: add JSON support
tcp_metrics; make tables const
tcp_metrics: add json support
ipsr: add json support
token: support JSON
tuntap: support JSON output
fou: break long lines
fou: support JSON output
ip: macsec cleanup
ipmacsec: collapse common code
macsec: support JSON
netns: add JSON support
ipmaddr: json and color support
ipmroute: convert to output JSON
ipmroute: better error message if no kernel mroute
Revert "iproute: "list/flush/save default" selected all of the routes"
tc: help and whitespace cleanup
rdma: fix man page typos
ip/ila: support json and color
ip/l2tp: add JSON support
bridge: avoid snprint truncation on time
pedit: fix strncpy warning
ip: use strlcpy() to avoid truncation
tunnel: use strlcpy to avoid strncpy warnings
tc_class: fix snprintf warning
ematch: fix possible snprintf overflow
misc: avoid snprintf warnings in ss and nstat
bpf: avoid compiler warnings about strncpy
namespace: limit the length of namespace name to avoid snprintf overflow
uapi/if_ether: add definition of ether type field
v4.16.0
uapi/bpf: update kernel header from 4.17-rc1
uapi/tipc: update header from 4.17-rc1
uapi/sctp: update header from 4.17-rc1
ipneigh: fix missing format specifier
flower: use 16 bit format where possible
bpf: fix warnings on gcc-8 about string truncation
rdma: align headers with upstream
rdma: add ib header files
ss: remove non-functional slabinfo
tc: allow 0% for percent options
ip: defer lookup interface index
rt_protos: drop old experimental gated names
uapi: update bpf.h to include padding
v4.17.0
Steve Wise (7):
rdma: update rdma_netlink.h
rdma: add UAPI rdma_user_cm.h
rdma: initialize the rd struct
rdma: Add CM_ID resource tracking information
rdma: Add CQ resource tracking information
rdma: Add MR resource tracking information
rdma: Add PD resource tracking information
Tariq Toukan (1):
ip-address: Fix negative prints of large TX rate limits
Thomas Egerer (3):
xfrm_policy: Add filter option for socket policies
xfrm_policy: Do not attempt to deleteall a socket policy
xfrm_{state, policy}: Allow to deleteall polices/states with marks
Thomas Graf (2):
bpf: Fix number of retries when growing log buffer
lwt: BPF support for LWT
Thomas Haller (1):
man: fix documentation for range of route table ID
Timothy Redaelli (2):
ip-route: Prevent some other double spaces in output
bridge: Prevent a double space in bridge mdb show
Toke Høiland-Jørgensen (4):
tc: Add missing documentation for codel and fq_codel parameters
tc: Add JSON output of fq_codel stats
ingress: Don't break JSON output
json_print: Fix hidden 64-bit type promotion
Tom Herbert (5):
ila: Fix reporting of ILA locators and locator match
ila: added csum neutral support to ipila
ila: support to configure checksum neutral-map-auto
ila: support for configuring identifier and hook types
ila: create ila_common.h
Vincent Bernat (2):
vxlan: use preferred address family when neither group or remote is specified
color: disable color when json output is requested
Vinicius Costa Gomes (2):
tc: Add support for the CBS qdisc
man: Add initial manpage for tc-cbs(8)
Vlad Yasevich (1):
ip: Add IFLA_EVENT output to ip monitor
Wei Wang (1):
ss: print tcpi_rcv_ssthresh
William Tu (5):
gre: add support for ERSPAN tunnel
ip6_gre: add support for ERSPAN tunnel
gre6: add collect metadata support
erspan: add erspan version II support
erspan: add erspan usage description
Wolfgang Bumiller (1):
tc/lexer: let quotes actually start strings
Yotam Gigi (10):
tc: man: matchall: Fix example indentation
tc: Add support for the sample tc action
tc: man: Add man entry for the tc-sample action
tc: man: matchall: Update examples to include sample
tc: bash-completion: Add the _from variant to _tc_one* funcs
tc: bash-completion: Prepare action autocomplete to support several actions
tc: bash-completion: Make the *_KIND variables global
tc: bash-completion: Add support for filter actions
tc: bash-completion: Add support for matchall
ip: mroute: Print offload indication
Yuchung Cheng (1):
ss: print new tcp_info fields: busy, rwnd-limited, sndbuf-limited times
Yulia Kartseva (1):
tc: fix ipv6 filter selector attribute for some prefix lengths
Yuval Mintz (2):
qdisc: print offload indication
tc: Correct json output for actions
Zhang Shengju (1):
iplink: add support for IFLA_CARRIER attribute
yupeng (1):
man: add additional explainations for ss
Élie Bouttier (1):
ip route: replace exits with returns
^ permalink raw reply
* Re: Qualcomm rmnet driver and qmi_wwan
From: Subash Abhinov Kasiviswanathan @ 2018-06-08 17:19 UTC (permalink / raw)
To: Daniele Palmas; +Cc: Bjørn Mork, Dan Williams, netdev
In-Reply-To: <CAGRyCJFqiDWDypSij3SGskLpJgtAJ_8f5qKLRY8Kt_yEKB=Q_g@mail.gmail.com>
> I followed Dan's advice and prepared a very basic test patch
> (attached) for testing it through ip link.
>
> Basically things seem to be properly working with qmicli, but I needed
> to modify a bit qmi_wwan, so I'm adding Bjørn that maybe can help.
>
> Bjørn,
>
> I'm trying to add support to rmnet in qmi_wwan: I had to modify the
> code as in the attached test patch, but I'm not sure it is the right
> way.
>
> This is done under the assumption that the rmnet device would be the
> only one to register an rx handler to qmi_wwan, but it is probably
> wrong.
>
> Basically I'm wondering if there is a more correct way to understand
> if an rmnet device is linked to the real qmi_wwan device.
>
> Thanks,
> Daniele
Hi Daniele / Bjørn
Is it possible to define a pass-through mode in qmi_wwan? This would
ensure that all packets in MAP format are passed through instead of
being processed in the qmi_wwan layer. The pass-through mode would just
call netif_receive_skb() on all these packets.
That would allow all the packets to be intercepted by the rx_handler
attached by rmnet which would subsequently de-multiplex and process
the packets.
--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
^ permalink raw reply
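The pass-through idea in the message above can be illustrated with a small user-space simulation. This is only a sketch under stated assumptions: dev_sim, pkt_sim, receive_sim(), driver_rx_sim() and rmnet_rx_sim() are hypothetical stand-ins, not qmi_wwan or rmnet code, and receive_sim() only imitates the fact that netif_receive_skb() gives a registered rx handler a chance to take the packet.

```c
#include <assert.h>

#define MAX_MUX 4

/* Hypothetical stand-in for an skb carrying one MAP frame. */
struct pkt_sim {
	unsigned int mux_id;
};

/* Hypothetical stand-in for a net device with an optional rx handler. */
struct dev_sim {
	int pass_through;                        /* assumed mode flag */
	void (*rx_handler)(struct pkt_sim *pkt); /* e.g. attached by rmnet */
	unsigned int processed_locally;
};

/* Per-channel counters filled in by the simulated rmnet handler. */
static unsigned int demuxed[MAX_MUX];

/* Plays the role of rmnet's handler: de-multiplex by mux id. */
static void rmnet_rx_sim(struct pkt_sim *pkt)
{
	assert(pkt->mux_id < MAX_MUX);
	demuxed[pkt->mux_id]++;
}

/* Simplified imitation of netif_receive_skb(): call the handler if set. */
static void receive_sim(struct dev_sim *dev, struct pkt_sim *pkt)
{
	if (dev->rx_handler)
		dev->rx_handler(pkt);
}

/* Driver receive path: in pass-through mode, do no MAP processing here. */
static void driver_rx_sim(struct dev_sim *dev, struct pkt_sim *pkt)
{
	if (dev->pass_through) {
		receive_sim(dev, pkt);
		return;
	}
	dev->processed_locally++;
}
```

With pass_through set, every packet reaches the handler untouched and the driver's own counter stays at zero; with it cleared, the driver consumes the packet and the handler never sees it.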
* [PATCH 3/3] bpfilter: do not (ab)use host-program build rule
From: Masahiro Yamada @ 2018-06-08 17:12 UTC (permalink / raw)
To: netdev, Alexei Starovoitov, David S . Miller
Cc: Arnd Bergmann, Geert Uytterhoeven, linux-kernel, Masahiro Yamada,
YueHaibing
In-Reply-To: <1528477930-7342-1-git-send-email-yamada.masahiro@socionext.com>
It is an ugly hack to overwrite $(HOSTCC) with $(CC) to reuse the
build rules from scripts/Makefile.host. It should not be tedious
to write a build rule of its own.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
---
net/bpfilter/Makefile | 17 +++++++++++------
net/bpfilter/{main.c => bpfilter_umh.c} | 0
2 files changed, 11 insertions(+), 6 deletions(-)
rename net/bpfilter/{main.c => bpfilter_umh.c} (100%)
diff --git a/net/bpfilter/Makefile b/net/bpfilter/Makefile
index 39c6980..6571b30 100644
--- a/net/bpfilter/Makefile
+++ b/net/bpfilter/Makefile
@@ -3,18 +3,23 @@
# Makefile for the Linux BPFILTER layer.
#
-hostprogs-y := bpfilter_umh
-bpfilter_umh-objs := main.o
-HOSTCFLAGS += -I. -Itools/include/ -Itools/include/uapi
-HOSTCC := $(CC)
-
ifeq ($(CONFIG_BPFILTER_UMH), y)
# builtin bpfilter_umh should be compiled with -static
# since rootfs isn't mounted at the time of __init
# function is called and do_execv won't find elf interpreter
-HOSTLDFLAGS += -static
+STATIC := -static
endif
+quiet_cmd_cc_user = CC $@
+ cmd_cc_user = $(CC) -Wall -Wmissing-prototypes -O2 -std=gnu89 \
+ -I$(srctree) -I$(srctree)/tools/include/ \
+ -I$(srctree)/tools/include/uapi $(STATIC) -o $@ $<
+
+$(obj)/bpfilter_umh: $(src)/bpfilter_umh.c FORCE
+ $(call if_changed,cc_user)
+
+targets += bpfilter_umh
+
$(obj)/bpfilter_umh_blob.o: $(obj)/bpfilter_umh
obj-$(CONFIG_BPFILTER_UMH) += bpfilter.o
diff --git a/net/bpfilter/main.c b/net/bpfilter/bpfilter_umh.c
similarity index 100%
rename from net/bpfilter/main.c
rename to net/bpfilter/bpfilter_umh.c
--
2.7.4
^ permalink raw reply related
* [PATCH 2/3] bpfilter: include bpfilter_umh in assembly instead of using objcopy
From: Masahiro Yamada @ 2018-06-08 17:12 UTC (permalink / raw)
To: netdev, Alexei Starovoitov, David S . Miller
Cc: Arnd Bergmann, Geert Uytterhoeven, linux-kernel, Masahiro Yamada,
YueHaibing
In-Reply-To: <1528477930-7342-1-git-send-email-yamada.masahiro@socionext.com>
Do not use the troublesome ELF magic. What is happening here is to
embed a user-space program into the kernel. Simply wrap it in the
assembly with the '.incbin' directive.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
---
net/bpfilter/Makefile | 15 ++-------------
net/bpfilter/bpfilter_kern.c | 11 +++++------
net/bpfilter/bpfilter_umh_blob.S | 7 +++++++
3 files changed, 14 insertions(+), 19 deletions(-)
create mode 100644 net/bpfilter/bpfilter_umh_blob.S
diff --git a/net/bpfilter/Makefile b/net/bpfilter/Makefile
index aafa720..39c6980 100644
--- a/net/bpfilter/Makefile
+++ b/net/bpfilter/Makefile
@@ -15,18 +15,7 @@ ifeq ($(CONFIG_BPFILTER_UMH), y)
HOSTLDFLAGS += -static
endif
-# a bit of elf magic to convert bpfilter_umh binary into a binary blob
-# inside bpfilter_umh.o elf file referenced by
-# _binary_net_bpfilter_bpfilter_umh_start symbol
-# which bpfilter_kern.c passes further into umh blob loader at run-time
-quiet_cmd_copy_umh = GEN $@
- cmd_copy_umh = echo ':' > $(obj)/.bpfilter_umh.o.cmd; \
- $(OBJCOPY) -I binary -O $(CONFIG_OUTPUT_FORMAT) \
- -B `$(OBJDUMP) -f $<|grep architecture|cut -d, -f1|cut -d' ' -f2` \
- --rename-section .data=.init.rodata $< $@
-
-$(obj)/bpfilter_umh.o: $(obj)/bpfilter_umh
- $(call cmd,copy_umh)
+$(obj)/bpfilter_umh_blob.o: $(obj)/bpfilter_umh
obj-$(CONFIG_BPFILTER_UMH) += bpfilter.o
-bpfilter-objs += bpfilter_kern.o bpfilter_umh.o
+bpfilter-objs += bpfilter_kern.o bpfilter_umh_blob.o
diff --git a/net/bpfilter/bpfilter_kern.c b/net/bpfilter/bpfilter_kern.c
index b13d058..fcc1a7c 100644
--- a/net/bpfilter/bpfilter_kern.c
+++ b/net/bpfilter/bpfilter_kern.c
@@ -10,11 +10,8 @@
#include <linux/file.h>
#include "msgfmt.h"
-#define UMH_start _binary_net_bpfilter_bpfilter_umh_start
-#define UMH_end _binary_net_bpfilter_bpfilter_umh_end
-
-extern char UMH_start;
-extern char UMH_end;
+extern char bpfilter_umh_start;
+extern char bpfilter_umh_end;
static struct umh_info info;
/* since ip_getsockopt() can run in parallel, serialize access to umh */
@@ -89,7 +86,9 @@ static int __init load_umh(void)
int err;
/* fork usermode process */
- err = fork_usermode_blob(&UMH_start, &UMH_end - &UMH_start, &info);
+ err = fork_usermode_blob(&bpfilter_umh_start,
+ &bpfilter_umh_end - &bpfilter_umh_start,
+ &info);
if (err)
return err;
pr_info("Loaded bpfilter_umh pid %d\n", info.pid);
diff --git a/net/bpfilter/bpfilter_umh_blob.S b/net/bpfilter/bpfilter_umh_blob.S
new file mode 100644
index 0000000..40311d1
--- /dev/null
+++ b/net/bpfilter/bpfilter_umh_blob.S
@@ -0,0 +1,7 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+ .section .init.rodata, "a"
+ .global bpfilter_umh_start
+bpfilter_umh_start:
+ .incbin "net/bpfilter/bpfilter_umh"
+ .global bpfilter_umh_end
+bpfilter_umh_end:
--
2.7.4
^ permalink raw reply related
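The patch above brackets the '.incbin' data with two labels, so the embedded program begins at bpfilter_umh_start and its length is the distance to bpfilter_umh_end. A minimal user-space sketch of that start/length arithmetic follows. It rests on assumptions: fake_image, blob_len() and load_blob() are hypothetical names for illustration, and load_blob() is not the kernel's fork_usermode_blob().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Stand-in for the embedded image. In the patch the bytes come from
 * .incbin and the two labels are emitted by the assembler; here the
 * region is simulated with a plain array.
 */
static const char fake_image[] = { 0x7f, 'E', 'L', 'F', 0, 0, 0, 0 };

/* Hypothetical helper: length in bytes of the region [start, end). */
static size_t blob_len(const char *start, const char *end)
{
	assert(end >= start);
	return (size_t)(end - start);
}

/*
 * Hypothetical loader stand-in: copies len bytes beginning at data.
 * Returns the number of bytes copied, or 0 if dst is too small.
 */
static size_t load_blob(const void *data, size_t len, char *dst, size_t cap)
{
	if (len > cap)
		return 0;
	memcpy(dst, data, len);
	return len;
}
```

A caller passes the start pointer together with blob_len(start, end). Passing the end pointer as the data argument would make the loader read past the embedded image.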
* [PATCH 1/3] bpfilter: add bpfilter_umh to .gitignore
From: Masahiro Yamada @ 2018-06-08 17:12 UTC (permalink / raw)
To: netdev, Alexei Starovoitov, David S . Miller
Cc: Arnd Bergmann, Geert Uytterhoeven, linux-kernel, Masahiro Yamada
In-Reply-To: <1528477930-7342-1-git-send-email-yamada.masahiro@socionext.com>
bpfilter_umh is a generated file. It should be ignored by git.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
---
net/bpfilter/.gitignore | 1 +
1 file changed, 1 insertion(+)
create mode 100644 net/bpfilter/.gitignore
diff --git a/net/bpfilter/.gitignore b/net/bpfilter/.gitignore
new file mode 100644
index 0000000..e97084e
--- /dev/null
+++ b/net/bpfilter/.gitignore
@@ -0,0 +1 @@
+bpfilter_umh
--
2.7.4
^ permalink raw reply related