* [PATCH bpf-next 0/8] bpf: offload: report device back to user space (take 2)
From: Jakub Kicinski @ 2017-12-20 4:09 UTC (permalink / raw)
To: netdev, alexei.starovoitov, daniel; +Cc: ktkhai, oss-drivers, Jakub Kicinski
Hi!
This series is a redo of reporting offload device information to
user space, after the first attempt did not take network namespaces
into account. As requested by Kirill, offloads are now protected by
an r/w semaphore. This allows us to remove the workqueue and free the
offload state fully when the device is removed (suggested by Alexei).
The net namespace is reported as a device/inode pair.
The accompanying bpftool support is placed in common code because
maps will have very similar info. Note that the UAPI information
can't be nicely encapsulated into a struct: if we ever need to grow
the device information, the new fields will have to be added at the
end of struct bpf_prog_info, since we can't grow structures in the
middle of bpf_prog_info.
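The layout constraint can be illustrated with a small userspace sketch. All struct and field names below are hypothetical stand-ins, not the real bpf_prog_info layout; the sketch only assumes a typical ABI where uint64_t is 8-byte aligned. It shows that growing a nested struct shifts every field after it, while appending to a flat struct leaves existing offsets alone.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: device info wrapped in its own struct, version 1. */
struct dev_info_v1 {
	uint32_t ifindex;
	uint32_t pad;
	uint64_t netns_dev;
	uint64_t netns_ino;
};

/* Hypothetical: the same struct after growing by one field. */
struct dev_info_v2 {
	uint32_t ifindex;
	uint32_t pad;
	uint64_t netns_dev;
	uint64_t netns_ino;
	uint64_t extra;
};

/* An info struct that embeds the device info in the middle. */
struct info_nested_v1 {
	uint32_t id;
	uint32_t type;
	struct dev_info_v1 dev;
	uint64_t later_field;	/* added by some later, unrelated change */
};

struct info_nested_v2 {
	uint32_t id;
	uint32_t type;
	struct dev_info_v2 dev;
	uint64_t later_field;
};

/* Flat alternative: device fields live directly in the info struct. */
struct info_flat_v1 {
	uint32_t id;
	uint32_t type;
	uint32_t ifindex;
	uint32_t pad;
	uint64_t netns_dev;
	uint64_t netns_ino;
	uint64_t later_field;
};

/* Growing the flat struct only ever appends at the end. */
struct info_flat_v2 {
	uint32_t id;
	uint32_t type;
	uint32_t ifindex;
	uint32_t pad;
	uint64_t netns_dev;
	uint64_t netns_ino;
	uint64_t later_field;
	uint64_t extra;
};

/* Returns 1 if growing the nested struct moved an existing field. */
static int nested_growth_breaks_abi(void)
{
	return offsetof(struct info_nested_v1, later_field) !=
	       offsetof(struct info_nested_v2, later_field);
}

/* Returns 1 if the flat layout grew without moving existing fields. */
static int flat_growth_keeps_abi(void)
{
	return offsetof(struct info_flat_v1, later_field) ==
	       offsetof(struct info_flat_v2, later_field) &&
	       sizeof(struct info_flat_v2) > sizeof(struct info_flat_v1);
}
```

An old binary compiled against the v1 layout would read later_field at the wrong offset in the nested case, which is why the device fields are added flat.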
Jakub Kicinski (8):
bpf: offload: don't require rtnl for dev list manipulation
bpf: offload: don't use prog->aux->offload as boolean
bpf: offload: allow netdev to disappear while verifier is running
bpf: offload: free prog->aux->offload when device disappears
bpf: offload: free program id when device disappears
bpf: offload: report device information for offloaded programs
tools: bpftool: report device information for offloaded programs
selftests/bpf: test device info reporting for bound progs
drivers/net/ethernet/netronome/nfp/bpf/main.h | 2 +-
drivers/net/ethernet/netronome/nfp/bpf/verifier.c | 2 +-
drivers/net/netdevsim/bpf.c | 2 +-
fs/nsfs.c | 2 +-
include/linux/bpf.h | 16 ++-
include/linux/bpf_verifier.h | 16 +--
include/linux/netdevice.h | 4 +-
include/linux/proc_ns.h | 1 +
include/uapi/linux/bpf.h | 3 +
kernel/bpf/offload.c | 114 ++++++++++++++++------
kernel/bpf/syscall.c | 19 +++-
kernel/bpf/verifier.c | 20 ++--
tools/bpf/bpftool/common.c | 52 ++++++++++
tools/bpf/bpftool/main.h | 2 +
tools/bpf/bpftool/prog.c | 3 +
tools/include/uapi/linux/bpf.h | 3 +
tools/testing/selftests/bpf/test_offload.py | 107 +++++++++++++++++---
17 files changed, 287 insertions(+), 81 deletions(-)
--
2.15.1
* [PATCH net-next v3] netdevsim: correctly check return value of debugfs_create_dir
From: Prashant Bhole @ 2017-12-20 3:18 UTC (permalink / raw)
To: David S . Miller; +Cc: Prashant Bhole, netdev, Jakub Kicinski
- Check the return value with IS_ERR_OR_NULL
- Add error handling where it was missing
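The reason for checking both cases can be sketched in userspace C. The helpers below are simplified re-implementations written for illustration (the kernel's own versions live in include/linux/err.h), and init_dir() is a hypothetical caller. The sketch assumes the directory-creation call may report failure either as NULL or as an error-encoded pointer, so a plain IS_ERR()-style test would let NULL through.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value in a pointer (illustrative re-implementation). */
static inline void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* True only for pointers in the top MAX_ERRNO addresses; NULL is NOT an error here. */
static inline int is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* True for NULL as well as for error-encoded pointers. */
static inline int is_err_or_null(const void *ptr)
{
	return !ptr || is_err(ptr);
}

/* Hypothetical caller: map any failure of the create call to -ENOMEM. */
static int init_dir(void *dir)
{
	if (is_err_or_null(dir))
		return -ENOMEM;
	return 0;
}
```

Returning a fixed -ENOMEM, as the patch does, avoids calling PTR_ERR() on a NULL pointer, which would yield 0 and look like success.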
Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
v3: nit-pick: directly returning error instead of going to label
drivers/net/netdevsim/bpf.c | 8 ++++----
drivers/net/netdevsim/netdev.c | 6 ++++--
2 files changed, 8 insertions(+), 6 deletions(-)
diff --git a/drivers/net/netdevsim/bpf.c b/drivers/net/netdevsim/bpf.c
index 078d2c37a6c1..aeb429428cc5 100644
--- a/drivers/net/netdevsim/bpf.c
+++ b/drivers/net/netdevsim/bpf.c
@@ -201,7 +201,6 @@ static int nsim_bpf_create_prog(struct netdevsim *ns, struct bpf_prog *prog)
{
struct nsim_bpf_bound_prog *state;
char name[16];
- int err;
state = kzalloc(sizeof(*state), GFP_KERNEL);
if (!state)
@@ -214,10 +213,9 @@ static int nsim_bpf_create_prog(struct netdevsim *ns, struct bpf_prog *prog)
/* Program id is not populated yet when we create the state. */
sprintf(name, "%u", ns->prog_id_gen++);
state->ddir = debugfs_create_dir(name, ns->ddir_bpf_bound_progs);
- if (IS_ERR(state->ddir)) {
- err = PTR_ERR(state->ddir);
+ if (IS_ERR_OR_NULL(state->ddir)) {
kfree(state);
- return err;
+ return -ENOMEM;
}
debugfs_create_u32("id", 0400, state->ddir, &prog->aux->id);
@@ -349,6 +347,8 @@ int nsim_bpf_init(struct netdevsim *ns)
&ns->bpf_bind_verifier_delay);
ns->ddir_bpf_bound_progs =
debugfs_create_dir("bpf_bound_progs", ns->ddir);
+ if (IS_ERR_OR_NULL(ns->ddir_bpf_bound_progs))
+ return -ENOMEM;
ns->bpf_tc_accept = true;
debugfs_create_bool("bpf_tc_accept", 0600, ns->ddir,
diff --git a/drivers/net/netdevsim/netdev.c b/drivers/net/netdevsim/netdev.c
index eb8c679fca9f..56d7ea93a983 100644
--- a/drivers/net/netdevsim/netdev.c
+++ b/drivers/net/netdevsim/netdev.c
@@ -151,6 +151,8 @@ static int nsim_init(struct net_device *dev)
ns->netdev = dev;
ns->ddir = debugfs_create_dir(netdev_name(dev), nsim_ddir);
+ if (IS_ERR_OR_NULL(ns->ddir))
+ return -ENOMEM;
err = nsim_bpf_init(ns);
if (err)
@@ -469,8 +471,8 @@ static int __init nsim_module_init(void)
int err;
nsim_ddir = debugfs_create_dir(DRV_NAME, NULL);
- if (IS_ERR(nsim_ddir))
- return PTR_ERR(nsim_ddir);
+ if (IS_ERR_OR_NULL(nsim_ddir))
+ return -ENOMEM;
err = bus_register(&nsim_bus);
if (err)
--
2.13.6
* [PATCH] selftests/bpf: remove the DEBUG macro for test_dev_cgroup
From: Chen Rong @ 2017-12-20 3:15 UTC (permalink / raw)
Cc: chenr.fnst, Alexei Starovoitov, Daniel Borkmann, Shuah Khan,
netdev, linux-kernel, linux-kselftest
The test may fail if the DEBUG macro is not enabled in dev_cgroup.c:
# ./test_dev_cgroup
libbpf: load bpf program failed: Operation not permitted
libbpf: failed to load program 'cgroup/dev'
libbpf: failed to load object './dev_cgroup.o'
Failed to load DEV_CGROUP program
Removing the DEBUG macro makes the test always pass.
Signed-off-by: Chen Rong <chenr.fnst@cn.fujitsu.com>
---
tools/testing/selftests/bpf/dev_cgroup.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/tools/testing/selftests/bpf/dev_cgroup.c b/tools/testing/selftests/bpf/dev_cgroup.c
index ce41a34..a167c6d 100644
--- a/tools/testing/selftests/bpf/dev_cgroup.c
+++ b/tools/testing/selftests/bpf/dev_cgroup.c
@@ -13,7 +13,6 @@ SEC("cgroup/dev")
int bpf_prog1(struct bpf_cgroup_dev_ctx *ctx)
{
short type = ctx->access_type & 0xFFFF;
-#ifdef DEBUG
short access = ctx->access_type >> 16;
char fmt[] = " %d:%d \n";
@@ -39,7 +38,6 @@ int bpf_prog1(struct bpf_cgroup_dev_ctx *ctx)
fmt[10] = 'm';
bpf_trace_printk(fmt, sizeof(fmt), ctx->major, ctx->minor);
-#endif
/* Allow access to /dev/zero and /dev/random.
* Forbid everything else.
--
2.5.0
* [PATCH v3 net-next 5/5] net: tracepoint: using sock_set_state tracepoint to trace SCTP state transition
From: Yafang Shao @ 2017-12-20 3:12 UTC (permalink / raw)
To: songliubraving, davem, marcelo.leitner, rostedt
Cc: bgregg, netdev, linux-kernel, Yafang Shao
In-Reply-To: <1513739574-3345-1-git-send-email-laoar.shao@gmail.com>
With the changes in the inet_ files, SCTP state transitions are traced with
the inet_sock_set_state tracepoint.
The SCTP state names, e.g. SCTP_SS_CLOSED and SCTP_SS_ESTABLISHED,
have the same values as the TCP state names, so the output still prints
the TCP state names, which keeps the code simple.
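A small userspace sketch of what this aliasing means for the trace output. The TCP values match the numeric table quoted in patch 1/5 of this series; the SCTP-to-TCP mapping is written from memory of include/net/sctp/constants.h and should be treated as an assumption. tcp_state_name() is a hypothetical stand-in for the tracepoint's symbolic printing.

```c
#include <assert.h>
#include <string.h>

enum {
	TCP_ESTABLISHED = 1, TCP_SYN_SENT, TCP_SYN_RECV, TCP_FIN_WAIT1,
	TCP_FIN_WAIT2, TCP_TIME_WAIT, TCP_CLOSE, TCP_CLOSE_WAIT,
	TCP_LAST_ACK, TCP_LISTEN, TCP_CLOSING, TCP_NEW_SYN_RECV
};

/* Assumed aliasing of SCTP socket states onto TCP state values. */
enum {
	SCTP_SS_CLOSED       = TCP_CLOSE,
	SCTP_SS_LISTENING    = TCP_LISTEN,
	SCTP_SS_ESTABLISHING = TCP_SYN_SENT,
	SCTP_SS_ESTABLISHED  = TCP_ESTABLISHED,
	SCTP_SS_CLOSING      = TCP_CLOSE_WAIT
};

static const char *const tcp_names[] = {
	[TCP_ESTABLISHED]  = "TCP_ESTABLISHED",
	[TCP_SYN_SENT]     = "TCP_SYN_SENT",
	[TCP_SYN_RECV]     = "TCP_SYN_RECV",
	[TCP_FIN_WAIT1]    = "TCP_FIN_WAIT1",
	[TCP_FIN_WAIT2]    = "TCP_FIN_WAIT2",
	[TCP_TIME_WAIT]    = "TCP_TIME_WAIT",
	[TCP_CLOSE]        = "TCP_CLOSE",
	[TCP_CLOSE_WAIT]   = "TCP_CLOSE_WAIT",
	[TCP_LAST_ACK]     = "TCP_LAST_ACK",
	[TCP_LISTEN]       = "TCP_LISTEN",
	[TCP_CLOSING]      = "TCP_CLOSING",
	[TCP_NEW_SYN_RECV] = "TCP_NEW_SYN_RECV",
};

/* Hypothetical lookup: what a TCP-only symbol table prints for a value. */
static const char *tcp_state_name(int state)
{
	if (state < TCP_ESTABLISHED || state > TCP_NEW_SYN_RECV)
		return "UNKNOWN";
	return tcp_names[state];
}
```

Under these assumptions an SCTP socket entering SCTP_SS_CLOSING would show up in the trace as TCP_CLOSE_WAIT, so readers of the output need to translate the names back for SCTP.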
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
net/sctp/endpointola.c | 2 +-
net/sctp/sm_sideeffect.c | 4 ++--
net/sctp/socket.c | 12 ++++++------
3 files changed, 9 insertions(+), 9 deletions(-)
diff --git a/net/sctp/endpointola.c b/net/sctp/endpointola.c
index ee1e601..8b31468 100644
--- a/net/sctp/endpointola.c
+++ b/net/sctp/endpointola.c
@@ -232,7 +232,7 @@ void sctp_endpoint_free(struct sctp_endpoint *ep)
{
ep->base.dead = true;
- ep->base.sk->sk_state = SCTP_SS_CLOSED;
+ inet_sk_set_state(ep->base.sk, SCTP_SS_CLOSED);
/* Unlink this endpoint, so we can't find it again! */
sctp_unhash_endpoint(ep);
diff --git a/net/sctp/sm_sideeffect.c b/net/sctp/sm_sideeffect.c
index 8adde71..c0c3ec6 100644
--- a/net/sctp/sm_sideeffect.c
+++ b/net/sctp/sm_sideeffect.c
@@ -878,12 +878,12 @@ static void sctp_cmd_new_state(struct sctp_cmd_seq *cmds,
* successfully completed a connect() call.
*/
if (sctp_state(asoc, ESTABLISHED) && sctp_sstate(sk, CLOSED))
- sk->sk_state = SCTP_SS_ESTABLISHED;
+ inet_sk_set_state(sk, SCTP_SS_ESTABLISHED);
/* Set the RCV_SHUTDOWN flag when a SHUTDOWN is received. */
if (sctp_state(asoc, SHUTDOWN_RECEIVED) &&
sctp_sstate(sk, ESTABLISHED)) {
- sk->sk_state = SCTP_SS_CLOSING;
+ inet_sk_set_state(sk, SCTP_SS_CLOSING);
sk->sk_shutdown |= RCV_SHUTDOWN;
}
}
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 7eec0a0..59b5689 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -1544,7 +1544,7 @@ static void sctp_close(struct sock *sk, long timeout)
lock_sock_nested(sk, SINGLE_DEPTH_NESTING);
sk->sk_shutdown = SHUTDOWN_MASK;
- sk->sk_state = SCTP_SS_CLOSING;
+ inet_sk_set_state(sk, SCTP_SS_CLOSING);
ep = sctp_sk(sk)->ep;
@@ -4653,7 +4653,7 @@ static void sctp_shutdown(struct sock *sk, int how)
if (how & SEND_SHUTDOWN && !list_empty(&ep->asocs)) {
struct sctp_association *asoc;
- sk->sk_state = SCTP_SS_CLOSING;
+ inet_sk_set_state(sk, SCTP_SS_CLOSING);
asoc = list_entry(ep->asocs.next,
struct sctp_association, asocs);
sctp_primitive_SHUTDOWN(net, asoc, NULL);
@@ -7509,13 +7509,13 @@ static int sctp_listen_start(struct sock *sk, int backlog)
* sockets.
*
*/
- sk->sk_state = SCTP_SS_LISTENING;
+ inet_sk_set_state(sk, SCTP_SS_LISTENING);
if (!ep->base.bind_addr.port) {
if (sctp_autobind(sk))
return -EAGAIN;
} else {
if (sctp_get_port(sk, inet_sk(sk)->inet_num)) {
- sk->sk_state = SCTP_SS_CLOSED;
+ inet_sk_set_state(sk, SCTP_SS_CLOSED);
return -EADDRINUSE;
}
}
@@ -8538,10 +8538,10 @@ static void sctp_sock_migrate(struct sock *oldsk, struct sock *newsk,
* is called, set RCV_SHUTDOWN flag.
*/
if (sctp_state(assoc, CLOSED) && sctp_style(newsk, TCP)) {
- newsk->sk_state = SCTP_SS_CLOSED;
+ inet_sk_set_state(newsk, SCTP_SS_CLOSED);
newsk->sk_shutdown |= RCV_SHUTDOWN;
} else {
- newsk->sk_state = SCTP_SS_ESTABLISHED;
+ inet_sk_set_state(newsk, SCTP_SS_ESTABLISHED);
}
release_sock(newsk);
--
1.8.3.1
* [PATCH v3 net-next 4/5] net: tracepoint: using sock_set_state tracepoint to trace DCCP state transition
From: Yafang Shao @ 2017-12-20 3:12 UTC (permalink / raw)
To: songliubraving, davem, marcelo.leitner, rostedt
Cc: bgregg, netdev, linux-kernel, Yafang Shao
In-Reply-To: <1513739574-3345-1-git-send-email-laoar.shao@gmail.com>
With the changes in the inet_ files, DCCP state transitions are traced with
the inet_sock_set_state tracepoint.
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
net/dccp/proto.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/dccp/proto.c b/net/dccp/proto.c
index 9d43c1f..7a75a1d 100644
--- a/net/dccp/proto.c
+++ b/net/dccp/proto.c
@@ -110,7 +110,7 @@ void dccp_set_state(struct sock *sk, const int state)
/* Change state AFTER socket is unhashed to avoid closed
* socket sitting in hash tables.
*/
- sk->sk_state = state;
+ inet_sk_set_state(sk, state);
}
EXPORT_SYMBOL_GPL(dccp_set_state);
--
1.8.3.1
* [PATCH v3 net-next 3/5] net: sock: replace sk_state_load with inet_sk_state_load and remove sk_state_store
From: Yafang Shao @ 2017-12-20 3:12 UTC (permalink / raw)
To: songliubraving, davem, marcelo.leitner, rostedt
Cc: bgregg, netdev, linux-kernel, Yafang Shao
In-Reply-To: <1513739574-3345-1-git-send-email-laoar.shao@gmail.com>
sk_state_load is only used by AF_INET/AF_INET6, so rename it to
inet_sk_state_load and move it into inet_sock.h.
sk_state_store is removed as it is not used any more.
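The load/store pairing that these helpers rely on can be sketched with C11 atomics in userspace. This is only an analogy: the kernel uses smp_load_acquire() and smp_store_release() rather than <stdatomic.h>, and struct fake_sock and the fake_* functions are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for struct sock: a state word plus data it guards. */
struct fake_sock {
	_Atomic int state;
	int backlog;	/* plain data published before the state change */
};

/* Writer side: the release store orders earlier writes before it. */
static void fake_state_store(struct fake_sock *sk, int newstate)
{
	atomic_store_explicit(&sk->state, newstate, memory_order_release);
}

/* Lockless reader side: the acquire load pairs with the release store. */
static int fake_state_load(struct fake_sock *sk)
{
	return atomic_load_explicit(&sk->state, memory_order_acquire);
}

/* Publish a backlog value, then flip the state that advertises it. */
static void fake_listen_start(struct fake_sock *sk, int backlog, int listen_state)
{
	sk->backlog = backlog;
	fake_state_store(sk, listen_state);
}
```

A reader that observes the new state through fake_state_load() is then guaranteed to also observe the backlog written before the store, which is the property lockless readers such as the poll and diag paths depend on.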
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
include/net/inet_sock.h | 25 ++++++++++++++++++++++++-
include/net/sock.h | 25 -------------------------
net/ipv4/inet_connection_sock.c | 2 +-
net/ipv4/tcp.c | 4 ++--
net/ipv4/tcp_diag.c | 2 +-
net/ipv4/tcp_ipv4.c | 2 +-
net/ipv6/tcp_ipv6.c | 2 +-
7 files changed, 30 insertions(+), 32 deletions(-)
diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
index a3431a4..0a671c3 100644
--- a/include/net/inet_sock.h
+++ b/include/net/inet_sock.h
@@ -290,9 +290,32 @@ static inline void inet_sk_copy_descendant(struct sock *sk_to,
#endif
int inet_sk_rebuild_header(struct sock *sk);
-void inet_sk_set_state(struct sock *sk, int state);
+
+/**
+ * inet_sk_state_load - read sk->sk_state for lockless contexts
+ * @sk: socket pointer
+ *
+ * Paired with inet_sk_state_store(). Used in places we don't hold socket lock:
+ * tcp_diag_get_info(), tcp_get_info(), tcp_poll(), get_tcp4_sock() ...
+ */
+static inline int inet_sk_state_load(const struct sock *sk)
+{
+ /* state change might impact lockless readers. */
+ return smp_load_acquire(&sk->sk_state);
+}
+
+/**
+ * inet_sk_state_store - update sk->sk_state
+ * @sk: socket pointer
+ * @newstate: new state
+ *
+ * Paired with inet_sk_state_load(). Should be used in contexts where
+ * state change might impact lockless readers.
+ */
void inet_sk_state_store(struct sock *sk, int newstate);
+void inet_sk_set_state(struct sock *sk, int state);
+
static inline unsigned int __inet_ehashfn(const __be32 laddr,
const __u16 lport,
const __be32 faddr,
diff --git a/include/net/sock.h b/include/net/sock.h
index 9a90472..4fd211b 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -2332,31 +2332,6 @@ static inline bool sk_listener(const struct sock *sk)
return (1 << sk->sk_state) & (TCPF_LISTEN | TCPF_NEW_SYN_RECV);
}
-/**
- * sk_state_load - read sk->sk_state for lockless contexts
- * @sk: socket pointer
- *
- * Paired with sk_state_store(). Used in places we do not hold socket lock :
- * tcp_diag_get_info(), tcp_get_info(), tcp_poll(), get_tcp4_sock() ...
- */
-static inline int sk_state_load(const struct sock *sk)
-{
- return smp_load_acquire(&sk->sk_state);
-}
-
-/**
- * sk_state_store - update sk->sk_state
- * @sk: socket pointer
- * @newstate: new state
- *
- * Paired with sk_state_load(). Should be used in contexts where
- * state change might impact lockless readers.
- */
-static inline void sk_state_store(struct sock *sk, int newstate)
-{
- smp_store_release(&sk->sk_state, newstate);
-}
-
void sock_enable_timestamp(struct sock *sk, int flag);
int sock_get_timestamp(struct sock *, struct timeval __user *);
int sock_get_timestampns(struct sock *, struct timespec __user *);
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index f460fc0..12410ec 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -685,7 +685,7 @@ static void reqsk_timer_handler(struct timer_list *t)
int max_retries, thresh;
u8 defer_accept;
- if (sk_state_load(sk_listener) != TCP_LISTEN)
+ if (inet_sk_state_load(sk_listener) != TCP_LISTEN)
goto drop;
max_retries = icsk->icsk_syn_retries ? : net->ipv4.sysctl_tcp_synack_retries;
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index d408fb4..67d39b7 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -502,7 +502,7 @@ unsigned int tcp_poll(struct file *file, struct socket *sock, poll_table *wait)
sock_poll_wait(file, sk_sleep(sk), wait);
- state = sk_state_load(sk);
+ state = inet_sk_state_load(sk);
if (state == TCP_LISTEN)
return inet_csk_listen_poll(sk);
@@ -2916,7 +2916,7 @@ void tcp_get_info(struct sock *sk, struct tcp_info *info)
if (sk->sk_type != SOCK_STREAM)
return;
- info->tcpi_state = sk_state_load(sk);
+ info->tcpi_state = inet_sk_state_load(sk);
/* Report meaningful fields for all TCP states, including listeners */
rate = READ_ONCE(sk->sk_pacing_rate);
diff --git a/net/ipv4/tcp_diag.c b/net/ipv4/tcp_diag.c
index abbf0ed..81148f7 100644
--- a/net/ipv4/tcp_diag.c
+++ b/net/ipv4/tcp_diag.c
@@ -24,7 +24,7 @@ static void tcp_diag_get_info(struct sock *sk, struct inet_diag_msg *r,
{
struct tcp_info *info = _info;
- if (sk_state_load(sk) == TCP_LISTEN) {
+ if (inet_sk_state_load(sk) == TCP_LISTEN) {
r->idiag_rqueue = sk->sk_ack_backlog;
r->idiag_wqueue = sk->sk_max_ack_backlog;
} else if (sk->sk_type == SOCK_STREAM) {
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 77ea45d..67ef303 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -2281,7 +2281,7 @@ static void get_tcp4_sock(struct sock *sk, struct seq_file *f, int i)
timer_expires = jiffies;
}
- state = sk_state_load(sk);
+ state = inet_sk_state_load(sk);
if (state == TCP_LISTEN)
rx_queue = sk->sk_ack_backlog;
else
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 1f04ec0..af2b2a2 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1795,7 +1795,7 @@ static void get_tcp6_sock(struct seq_file *seq, struct sock *sp, int i)
timer_expires = jiffies;
}
- state = sk_state_load(sp);
+ state = inet_sk_state_load(sp);
if (state == TCP_LISTEN)
rx_queue = sp->sk_ack_backlog;
else
--
1.8.3.1
* [PATCH v3 net-next 2/5] net: tracepoint: replace tcp_set_state tracepoint with inet_sock_set_state tracepoint
From: Yafang Shao @ 2017-12-20 3:12 UTC (permalink / raw)
To: songliubraving, davem, marcelo.leitner, rostedt
Cc: bgregg, netdev, linux-kernel, Yafang Shao
In-Reply-To: <1513739574-3345-1-git-send-email-laoar.shao@gmail.com>
As sk_state is a common field of struct sock, the state
transition tracepoint should not be a TCP-specific feature.
Currently it traces all AF_INET state transitions, so I rename this
tracepoint to inet_sock_set_state with some minor changes and move it
into trace/events/sock.h.
We don't need to create a file named trace/events/inet_sock.h for this single
tracepoint.
Two helpers are introduced to trace sk_state transitions:
- void inet_sk_state_store(struct sock *sk, int newstate);
- void inet_sk_set_state(struct sock *sk, int state);
As trace headers should not be included in other header files,
they are defined in af_inet.c.
A protocol such as SCTP may be compiled as a module (.ko), hence
inet_sk_set_state() is exported.
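The shape of the two helpers (report the transition, then store the new state) can be sketched in userspace. Everything here is hypothetical: the function-pointer hook stands in for the tracepoint, and struct fake_sock, fake_set_state() and record_transition() are illustrative names, not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sock. */
struct fake_sock {
	int state;
};

/* Stand-in for the tracepoint: NULL means tracing is disabled. */
typedef void (*state_hook_t)(const struct fake_sock *sk,
			     int oldstate, int newstate);
static state_hook_t state_hook;

/* Report the transition first, while sk->state still holds the old value. */
static void fake_set_state(struct fake_sock *sk, int state)
{
	if (state_hook)
		state_hook(sk, sk->state, state);
	sk->state = state;
}

/* Example consumer that records the most recent transition. */
static int last_old, last_new, hits;

static void record_transition(const struct fake_sock *sk,
			      int oldstate, int newstate)
{
	(void)sk;
	last_old = oldstate;
	last_new = newstate;
	hits++;
}
```

Routing every assignment to the state field through one setter is what lets a single hook observe all transitions; a direct sk->state = x elsewhere would silently bypass it.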
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
include/net/inet_sock.h | 2 +
include/trace/events/sock.h | 107 ++++++++++++++++++++++++++++++++++++++++
include/trace/events/tcp.h | 31 ------------
net/ipv4/af_inet.c | 14 ++++++
net/ipv4/inet_connection_sock.c | 6 +--
net/ipv4/inet_hashtables.c | 2 +-
net/ipv4/tcp.c | 6 +--
7 files changed, 128 insertions(+), 40 deletions(-)
diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
index 39efb96..a3431a4 100644
--- a/include/net/inet_sock.h
+++ b/include/net/inet_sock.h
@@ -290,6 +290,8 @@ static inline void inet_sk_copy_descendant(struct sock *sk_to,
#endif
int inet_sk_rebuild_header(struct sock *sk);
+void inet_sk_set_state(struct sock *sk, int state);
+void inet_sk_state_store(struct sock *sk, int newstate);
static inline unsigned int __inet_ehashfn(const __be32 laddr,
const __u16 lport,
diff --git a/include/trace/events/sock.h b/include/trace/events/sock.h
index ec4dade..3b9094a 100644
--- a/include/trace/events/sock.h
+++ b/include/trace/events/sock.h
@@ -6,7 +6,50 @@
#define _TRACE_SOCK_H
#include <net/sock.h>
+#include <net/ipv6.h>
#include <linux/tracepoint.h>
+#include <linux/ipv6.h>
+#include <linux/tcp.h>
+
+/* The protocol traced by sock_set_state */
+#define inet_protocol_names \
+ EM(IPPROTO_TCP) \
+ EM(IPPROTO_DCCP) \
+ EMe(IPPROTO_SCTP)
+
+#define tcp_state_names \
+ EM(TCP_ESTABLISHED) \
+ EM(TCP_SYN_SENT) \
+ EM(TCP_SYN_RECV) \
+ EM(TCP_FIN_WAIT1) \
+ EM(TCP_FIN_WAIT2) \
+ EM(TCP_TIME_WAIT) \
+ EM(TCP_CLOSE) \
+ EM(TCP_CLOSE_WAIT) \
+ EM(TCP_LAST_ACK) \
+ EM(TCP_LISTEN) \
+ EM(TCP_CLOSING) \
+ EMe(TCP_NEW_SYN_RECV)
+
+/* enums need to be exported to user space */
+#undef EM
+#undef EMe
+#define EM(a) TRACE_DEFINE_ENUM(a);
+#define EMe(a) TRACE_DEFINE_ENUM(a);
+
+inet_protocol_names
+tcp_state_names
+
+#undef EM
+#undef EMe
+#define EM(a) { a, #a },
+#define EMe(a) { a, #a }
+
+#define show_inet_protocol_name(val) \
+ __print_symbolic(val, inet_protocol_names)
+
+#define show_tcp_state_name(val) \
+ __print_symbolic(val, tcp_state_names)
TRACE_EVENT(sock_rcvqueue_full,
@@ -63,6 +106,70 @@
__entry->rmem_alloc)
);
+TRACE_EVENT(inet_sock_set_state,
+
+ TP_PROTO(const struct sock *sk, const int oldstate, const int newstate),
+
+ TP_ARGS(sk, oldstate, newstate),
+
+ TP_STRUCT__entry(
+ __field(const void *, skaddr)
+ __field(int, oldstate)
+ __field(int, newstate)
+ __field(__u16, sport)
+ __field(__u16, dport)
+ __field(__u8, protocol)
+ __array(__u8, saddr, 4)
+ __array(__u8, daddr, 4)
+ __array(__u8, saddr_v6, 16)
+ __array(__u8, daddr_v6, 16)
+ ),
+
+ TP_fast_assign(
+ struct inet_sock *inet = inet_sk(sk);
+ struct in6_addr *pin6;
+ __be32 *p32;
+
+ __entry->skaddr = sk;
+ __entry->oldstate = oldstate;
+ __entry->newstate = newstate;
+
+ __entry->protocol = sk->sk_protocol;
+ __entry->sport = ntohs(inet->inet_sport);
+ __entry->dport = ntohs(inet->inet_dport);
+
+ p32 = (__be32 *) __entry->saddr;
+ *p32 = inet->inet_saddr;
+
+ p32 = (__be32 *) __entry->daddr;
+ *p32 = inet->inet_daddr;
+
+#if IS_ENABLED(CONFIG_IPV6)
+ if (sk->sk_family == AF_INET6) {
+ pin6 = (struct in6_addr *)__entry->saddr_v6;
+ *pin6 = sk->sk_v6_rcv_saddr;
+ pin6 = (struct in6_addr *)__entry->daddr_v6;
+ *pin6 = sk->sk_v6_daddr;
+ } else
+#endif
+ {
+ pin6 = (struct in6_addr *)__entry->saddr_v6;
+ ipv6_addr_set_v4mapped(inet->inet_saddr, pin6);
+ pin6 = (struct in6_addr *)__entry->daddr_v6;
+ ipv6_addr_set_v4mapped(inet->inet_daddr, pin6);
+ }
+ ),
+
+ TP_printk("protocol=%s sport=%hu dport=%hu saddr=%pI4 daddr=%pI4 "
+ "saddrv6=%pI6c daddrv6=%pI6c oldstate=%s newstate=%s",
+ show_inet_protocol_name(__entry->protocol),
+ __entry->sport, __entry->dport,
+ __entry->saddr, __entry->daddr,
+ __entry->saddr_v6, __entry->daddr_v6,
+ show_tcp_state_name(__entry->oldstate),
+ show_tcp_state_name(__entry->newstate))
+);
+
#endif /* _TRACE_SOCK_H */
/* This part must be outside protection */
diff --git a/include/trace/events/tcp.h b/include/trace/events/tcp.h
index ec52fb3..8e88a16 100644
--- a/include/trace/events/tcp.h
+++ b/include/trace/events/tcp.h
@@ -9,37 +9,6 @@
#include <linux/tracepoint.h>
#include <net/ipv6.h>
-#define tcp_state_names \
- EM(TCP_ESTABLISHED) \
- EM(TCP_SYN_SENT) \
- EM(TCP_SYN_RECV) \
- EM(TCP_FIN_WAIT1) \
- EM(TCP_FIN_WAIT2) \
- EM(TCP_TIME_WAIT) \
- EM(TCP_CLOSE) \
- EM(TCP_CLOSE_WAIT) \
- EM(TCP_LAST_ACK) \
- EM(TCP_LISTEN) \
- EM(TCP_CLOSING) \
- EMe(TCP_NEW_SYN_RECV) \
-
-/* enums need to be exported to user space */
-#undef EM
-#undef EMe
-#define EM(a) TRACE_DEFINE_ENUM(a);
-#define EMe(a) TRACE_DEFINE_ENUM(a);
-
-tcp_state_names
-
-#undef EM
-#undef EMe
-#define EM(a) tcp_state_name(a),
-#define EMe(a) tcp_state_name(a)
-
-#define tcp_state_name(state) { state, #state }
-#define show_tcp_state_name(val) \
- __print_symbolic(val, tcp_state_names)
-
/*
* tcp event with arguments sk and skb
*
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index f00499a..bab98a4 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -121,6 +121,7 @@
#endif
#include <net/l3mdev.h>
+#include <trace/events/sock.h>
/* The inetsw table contains everything that inet_create needs to
* build a new socket.
@@ -1220,6 +1221,19 @@ int inet_sk_rebuild_header(struct sock *sk)
}
EXPORT_SYMBOL(inet_sk_rebuild_header);
+void inet_sk_set_state(struct sock *sk, int state)
+{
+ trace_inet_sock_set_state(sk, sk->sk_state, state);
+ sk->sk_state = state;
+}
+EXPORT_SYMBOL(inet_sk_set_state);
+
+void inet_sk_state_store(struct sock *sk, int newstate)
+{
+ trace_inet_sock_set_state(sk, sk->sk_state, newstate);
+ smp_store_release(&sk->sk_state, newstate);
+}
+
struct sk_buff *inet_gso_segment(struct sk_buff *skb,
netdev_features_t features)
{
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index 4ca46dc..f460fc0 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -783,7 +783,7 @@ struct sock *inet_csk_clone_lock(const struct sock *sk,
if (newsk) {
struct inet_connection_sock *newicsk = inet_csk(newsk);
- newsk->sk_state = TCP_SYN_RECV;
+ inet_sk_set_state(newsk, TCP_SYN_RECV);
newicsk->icsk_bind_hash = NULL;
inet_sk(newsk)->inet_dport = inet_rsk(req)->ir_rmt_port;
@@ -877,7 +877,7 @@ int inet_csk_listen_start(struct sock *sk, int backlog)
* It is OK, because this socket enters to hash table only
* after validation is complete.
*/
- sk_state_store(sk, TCP_LISTEN);
+ inet_sk_state_store(sk, TCP_LISTEN);
if (!sk->sk_prot->get_port(sk, inet->inet_num)) {
inet->inet_sport = htons(inet->inet_num);
@@ -888,7 +888,7 @@ int inet_csk_listen_start(struct sock *sk, int backlog)
return 0;
}
- sk->sk_state = TCP_CLOSE;
+ inet_sk_set_state(sk, TCP_CLOSE);
return err;
}
EXPORT_SYMBOL_GPL(inet_csk_listen_start);
diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index f6f5810..37b7da0 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -544,7 +544,7 @@ bool inet_ehash_nolisten(struct sock *sk, struct sock *osk)
sock_prot_inuse_add(sock_net(sk), sk->sk_prot, 1);
} else {
percpu_counter_inc(sk->sk_prot->orphan_count);
- sk->sk_state = TCP_CLOSE;
+ inet_sk_set_state(sk, TCP_CLOSE);
sock_set_flag(sk, SOCK_DEAD);
inet_csk_destroy_sock(sk);
}
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index c470fec..d408fb4 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -283,8 +283,6 @@
#include <asm/ioctls.h>
#include <net/busy_poll.h>
-#include <trace/events/tcp.h>
-
struct percpu_counter tcp_orphan_count;
EXPORT_SYMBOL_GPL(tcp_orphan_count);
@@ -2040,8 +2038,6 @@ void tcp_set_state(struct sock *sk, int state)
{
int oldstate = sk->sk_state;
- trace_tcp_set_state(sk, oldstate, state);
-
switch (state) {
case TCP_ESTABLISHED:
if (oldstate != TCP_ESTABLISHED)
@@ -2065,7 +2061,7 @@ void tcp_set_state(struct sock *sk, int state)
/* Change state AFTER socket is unhashed to avoid closed
* socket sitting in hash tables.
*/
- sk_state_store(sk, state);
+ inet_sk_state_store(sk, state);
#ifdef STATE_TRACE
SOCK_DEBUG(sk, "TCP sk=%p, State %s -> %s\n", sk, statename[oldstate], statename[state]);
--
1.8.3.1
* [PATCH v3 net-next 1/5] tcp: Export to userspace the TCP state names for the trace events
From: Yafang Shao @ 2017-12-20 3:12 UTC (permalink / raw)
To: songliubraving, davem, marcelo.leitner, rostedt
Cc: bgregg, netdev, linux-kernel, Yafang Shao
In-Reply-To: <1513739574-3345-1-git-send-email-laoar.shao@gmail.com>
From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
The TCP trace events (specifically tcp_set_state) map enums to symbol
names via __print_symbolic(). But this only works for reading trace events
from the tracefs trace files. If perf or trace-cmd were to record these
events, the event format file does not convert the enum names into numbers,
and you get something like:
__print_symbolic(REC->oldstate,
{ TCP_ESTABLISHED, "TCP_ESTABLISHED" },
{ TCP_SYN_SENT, "TCP_SYN_SENT" },
{ TCP_SYN_RECV, "TCP_SYN_RECV" },
{ TCP_FIN_WAIT1, "TCP_FIN_WAIT1" },
{ TCP_FIN_WAIT2, "TCP_FIN_WAIT2" },
{ TCP_TIME_WAIT, "TCP_TIME_WAIT" },
{ TCP_CLOSE, "TCP_CLOSE" },
{ TCP_CLOSE_WAIT, "TCP_CLOSE_WAIT" },
{ TCP_LAST_ACK, "TCP_LAST_ACK" },
{ TCP_LISTEN, "TCP_LISTEN" },
{ TCP_CLOSING, "TCP_CLOSING" },
{ TCP_NEW_SYN_RECV, "TCP_NEW_SYN_RECV" })
Where trace-cmd and perf do not know the values of those enums.
Use the TRACE_DEFINE_ENUM() macros that will have the trace events convert
the enum strings into their values at system boot. This will allow perf and
trace-cmd to see actual numbers and not enums:
__print_symbolic(REC->oldstate,
{ 1, "TCP_ESTABLISHED" },
{ 2, "TCP_SYN_SENT" },
{ 3, "TCP_SYN_RECV" },
{ 4, "TCP_FIN_WAIT1" },
{ 5, "TCP_FIN_WAIT2" },
{ 6, "TCP_TIME_WAIT" },
{ 7, "TCP_CLOSE" },
{ 8, "TCP_CLOSE_WAIT" },
{ 9, "TCP_LAST_ACK" },
{ 10, "TCP_LISTEN" },
{ 11, "TCP_CLOSING" },
{ 12, "TCP_NEW_SYN_RECV" })
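The EM()/EMe() list used in the patch below is an X-macro: one list of names, expanded several times with different macro definitions. A minimal userspace sketch of the table-building expansion follows; the ST_* enum, struct sym and show_state_name() are hypothetical, and the TRACE_DEFINE_ENUM() expansion is left out because it only exists inside the kernel's tracing infrastructure.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical enum standing in for the TCP states. */
enum { ST_ESTABLISHED = 1, ST_SYN_SENT, ST_SYN_RECV };

/* Single source of truth: EM() for every entry but the last, EMe() for the last. */
#define state_names		\
	EM(ST_ESTABLISHED)	\
	EM(ST_SYN_SENT)		\
	EMe(ST_SYN_RECV)

struct sym {
	int val;
	const char *name;
};

/* Expansion: turn each entry into a { value, "name" } initializer. */
#define EM(a)	{ a, #a },
#define EMe(a)	{ a, #a }

static const struct sym state_table[] = { state_names };

#undef EM
#undef EMe

/* Look up the symbolic name; this sketch returns "UNKNOWN" for unlisted values. */
static const char *show_state_name(int val)
{
	size_t i;

	for (i = 0; i < sizeof(state_table) / sizeof(state_table[0]); i++)
		if (state_table[i].val == val)
			return state_table[i].name;
	return "UNKNOWN";
}
```

Because the list is written once, a second expansion (in the kernel, the TRACE_DEFINE_ENUM() one) cannot drift out of sync with the name table.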
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
include/trace/events/tcp.h | 41 ++++++++++++++++++++++++++++-------------
1 file changed, 28 insertions(+), 13 deletions(-)
diff --git a/include/trace/events/tcp.h b/include/trace/events/tcp.h
index 07cccca..ec52fb3 100644
--- a/include/trace/events/tcp.h
+++ b/include/trace/events/tcp.h
@@ -9,21 +9,36 @@
#include <linux/tracepoint.h>
#include <net/ipv6.h>
+#define tcp_state_names \
+ EM(TCP_ESTABLISHED) \
+ EM(TCP_SYN_SENT) \
+ EM(TCP_SYN_RECV) \
+ EM(TCP_FIN_WAIT1) \
+ EM(TCP_FIN_WAIT2) \
+ EM(TCP_TIME_WAIT) \
+ EM(TCP_CLOSE) \
+ EM(TCP_CLOSE_WAIT) \
+ EM(TCP_LAST_ACK) \
+ EM(TCP_LISTEN) \
+ EM(TCP_CLOSING) \
+ EMe(TCP_NEW_SYN_RECV) \
+
+/* enums need to be exported to user space */
+#undef EM
+#undef EMe
+#define EM(a) TRACE_DEFINE_ENUM(a);
+#define EMe(a) TRACE_DEFINE_ENUM(a);
+
+tcp_state_names
+
+#undef EM
+#undef EMe
+#define EM(a) tcp_state_name(a),
+#define EMe(a) tcp_state_name(a)
+
#define tcp_state_name(state) { state, #state }
#define show_tcp_state_name(val) \
- __print_symbolic(val, \
- tcp_state_name(TCP_ESTABLISHED), \
- tcp_state_name(TCP_SYN_SENT), \
- tcp_state_name(TCP_SYN_RECV), \
- tcp_state_name(TCP_FIN_WAIT1), \
- tcp_state_name(TCP_FIN_WAIT2), \
- tcp_state_name(TCP_TIME_WAIT), \
- tcp_state_name(TCP_CLOSE), \
- tcp_state_name(TCP_CLOSE_WAIT), \
- tcp_state_name(TCP_LAST_ACK), \
- tcp_state_name(TCP_LISTEN), \
- tcp_state_name(TCP_CLOSING), \
- tcp_state_name(TCP_NEW_SYN_RECV))
+ __print_symbolic(val, tcp_state_names)
/*
* tcp event with arguments sk and skb
--
1.8.3.1
* [PATCH v3 net-next 0/5] replace tcp_set_state tracepoint with inet_sock_set_state
From: Yafang Shao @ 2017-12-20 3:12 UTC (permalink / raw)
To: songliubraving, davem, marcelo.leitner, rostedt
Cc: bgregg, netdev, linux-kernel, Yafang Shao
According to the discussion in the mail thread
https://patchwork.kernel.org/patch/10099243/,
the tcp_set_state tracepoint is renamed to inet_sock_set_state and is
moved to include/trace/events/sock.h.
With this new tracepoint, we can trace AF_INET/AF_INET6 sock state transitions.
As there is only a single tracepoint for inet, I didn't create a new trace
file named trace/events/inet_sock.h, and just placed it in
include/trace/events/sock.h.
Currently TCP/DCCP/SCTP state transitions are traced with this tracepoint.
- Why not more protocols?
If we really think that another protocol should be traced, I will modify the
code to trace it.
I just want to keep the code simple and not output useless information.
Steven Rostedt (VMware) (1):
tcp: Export to userspace the TCP state names for the trace events
Yafang Shao (4):
net: tracepoint: replace tcp_set_state tracepoint with
inet_sock_set_state tracepoint
net: sock: replace sk_state_load with inet_sk_state_load and remove
sk_state_store
net: tracepoint: using sock_set_state tracepoint to trace DCCP state
transition
net: tracepoint: using sock_set_state tracepoint to trace SCTP state
transition
include/net/inet_sock.h | 25 ++++++++++
include/net/sock.h | 25 ----------
include/trace/events/sock.h | 107 ++++++++++++++++++++++++++++++++++++++++
include/trace/events/tcp.h | 16 ------
net/dccp/proto.c | 2 +-
net/ipv4/af_inet.c | 14 ++++++
net/ipv4/inet_connection_sock.c | 8 +--
net/ipv4/inet_hashtables.c | 2 +-
net/ipv4/tcp.c | 10 ++--
net/ipv4/tcp_diag.c | 2 +-
net/ipv4/tcp_ipv4.c | 2 +-
net/ipv6/tcp_ipv6.c | 2 +-
net/sctp/endpointola.c | 2 +-
net/sctp/sm_sideeffect.c | 4 +-
net/sctp/socket.c | 12 ++---
15 files changed, 167 insertions(+), 66 deletions(-)
--
1.8.3.1
* Re: [PATCH v4 04/36] nds32: Kernel booting and initialization
From: Greentime Hu @ 2017-12-20 2:35 UTC (permalink / raw)
To: Randy Dunlap
Cc: Greentime, Linux Kernel Mailing List, Arnd Bergmann, linux-arch,
Thomas Gleixner, Jason Cooper, Marc Zyngier, Rob Herring, netdev,
Vincent Chen, DTML, Al Viro, David Howells, Will Deacon,
Daniel Lezcano, linux-serial, Geert Uytterhoeven, Linus Walleij,
Mark Rutland, Greg KH
In-Reply-To: <78afd442-4482-f104-746e-5984214658ee@infradead.org>
2017-12-20 6:01 GMT+08:00 Randy Dunlap <rdunlap@infradead.org>:
> On 12/17/2017 10:46 PM, Greentime Hu wrote:
>> From: Greentime Hu <greentime@andestech.com>
>>
>> This patch includes the kernel startup code. It can get dtb pointer
>> passed from bootloader. It will create a temp mapping by tlb
>> instructions at beginning and goto start_kernel.
>>
>> Signed-off-by: Vincent Chen <vincentc@andestech.com>
>> Signed-off-by: Greentime Hu <greentime@andestech.com>
>> ---
>> arch/nds32/kernel/head.S | 189 ++++++++++++++++++++++
>> arch/nds32/kernel/setup.c | 383 +++++++++++++++++++++++++++++++++++++++++++++
>> 2 files changed, 572 insertions(+)
>> create mode 100644 arch/nds32/kernel/head.S
>> create mode 100644 arch/nds32/kernel/setup.c
>>
>
>> diff --git a/arch/nds32/kernel/setup.c b/arch/nds32/kernel/setup.c
>> new file mode 100644
>> index 0000000..7718c58
>> --- /dev/null
>> +++ b/arch/nds32/kernel/setup.c
>> @@ -0,0 +1,383 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +// Copyright (C) 2005-2017 Andes Technology Corporation
>> +
>
> [snip]
>
>> +struct cache_info L1_cache_info[2];
>> +static void __init dump_cpu_info(int cpu)
>> +{
>> + int i, p = 0;
>> + char str[sizeof(hwcap_str) + 16];
>> +
>> + for (i = 0; hwcap_str[i]; i++) {
>> + if (elf_hwcap & (1 << i)) {
>> + sprintf(str + p, "%s ", hwcap_str[i]);
>> + p += strlen(hwcap_str[i]) + 1;
>> + }
>> + }
>> +
>> + pr_info("CPU%d Featuretures: %s\n", cpu, str);
>
> Features:
>
Thanks Randy. I will fix this typo.
^ permalink raw reply
* Re: [PATCH v4 25/36] nds32: Miscellaneous header files
From: Greentime Hu @ 2017-12-20 2:34 UTC (permalink / raw)
To: Arnd Bergmann
Cc: Greentime, Linux Kernel Mailing List, linux-arch, Thomas Gleixner,
Jason Cooper, Marc Zyngier, Rob Herring, Networking, Vincent Chen,
DTML, Al Viro, David Howells, Will Deacon, Daniel Lezcano,
linux-serial-u79uwXL29TY76Z2rM5mHXA, Geert Uytterhoeven,
Linus Walleij, Mark Rutland, Greg KH, Guo Ren
In-Reply-To: <CAK8P3a3Ofczq1DrQEcEcP1fZrgyeOLpFDwgd7uMZ4H0NpHs+wg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-12-19 17:54 GMT+08:00 Arnd Bergmann <arnd-r2nGTMty4D4@public.gmane.org>:
> On Tue, Dec 19, 2017 at 6:34 AM, Greentime Hu <green.hu-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>> Hi, Arnd:
>>
>> 2017-12-18 19:13 GMT+08:00 Arnd Bergmann <arnd-r2nGTMty4D4@public.gmane.org>:
>>> On Mon, Dec 18, 2017 at 7:46 AM, Greentime Hu <green.hu-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>>>> From: Greentime Hu <greentime-MUIXKm3Oiri1Z/+hSey0Gg@public.gmane.org>
>>>>
>>>> This patch introduces some miscellaneous header files.
>>>
>>>> +static inline void __delay(unsigned long loops)
>>>> +{
>>>> + __asm__ __volatile__(".align 2\n"
>>>> + "1:\n"
>>>> + "\taddi\t%0, %0, -1\n"
>>>> + "\tbgtz\t%0, 1b\n"
>>>> + :"=r"(loops)
>>>> + :"0"(loops));
>>>> +}
>>>> +
>>>> +static inline void __udelay(unsigned long usecs, unsigned long lpj)
>>>> +{
>>>> + usecs *= (unsigned long)(((0x8000000000000000ULL / (500000 / HZ)) +
>>>> + 0x80000000ULL) >> 32);
>>>> + usecs = (unsigned long)(((unsigned long long)usecs * lpj) >> 32);
>>>> + __delay(usecs);
>>>> +}
>>>
>>> Do you have a reliable clocksource that you can read here instead of doing the
>>> loop? It's generally preferred to have an accurate delay if at all possible, the
>>> delay loop calibration is only for those architectures that don't have any
>>> way to observe how much time has passed accurately.
>>>
>>
>> We currently only have atcpit100 as clocksource but it is an IP of SoC.
>> These delay API will be unavailable if we changed to another SoC
>> unless all these timer driver provided the same APIs.
>> It may suffer our customers if they forget to port these APIs in their
>> timer drivers when they try to use nds32 in the first beginning.
>
> Ok, thanks for the clarification.
>
>> Or maybe I can use a CONFIG_USE_ACCURATE_DELAY to keep these 2
>> implementions for these purposes?
>
> I'd just add a one-line comment in delay.h to explain that there is no
> cycle counter in the CPU.
>
Thanks.
Got it. I will add a one-line comment in delay.h
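
For readers following the arithmetic in the quoted __udelay(), here is a
minimal user-space sketch of the same fixed-point conversion. It is an
illustration only, not the nds32 code: the name model_udelay_loops is
hypothetical, HZ is passed in as a parameter rather than taken from the
kernel configuration, and uint32_t stands in for a 32-bit unsigned long.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of the quoted __udelay() arithmetic.
 * Step 1 scales usecs by roughly 2^32 * hz / 10^6 (rounded).
 * Step 2 multiplies by loops-per-jiffy and drops the 2^32 factor,
 * leaving approximately usecs * hz * lpj / 10^6 delay-loop iterations.
 */
static uint32_t model_udelay_loops(uint32_t usecs, uint32_t hz, uint32_t lpj)
{
	uint32_t scale = (uint32_t)(((0x8000000000000000ULL / (500000 / hz)) +
				     0x80000000ULL) >> 32);

	usecs *= scale;	/* wraps modulo 2^32, like a 32-bit unsigned long */
	return (uint32_t)(((uint64_t)usecs * lpj) >> 32);
}
```

With hz = 100 and lpj = 500000, a 1000 us delay works out to 50000 loop
iterations, matching 1000 * 100 * 500000 / 10^6. Because both steps
truncate, other inputs can come out one iteration short of the exact value.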
^ permalink raw reply
* Re: [PATCH net-next v2] netdevsim: correctly check return value of debugfs_create_dir
From: Jakub Kicinski @ 2017-12-20 2:34 UTC (permalink / raw)
To: Prashant Bhole; +Cc: David S . Miller, netdev
In-Reply-To: <20171220022715.2356-1-bhole_prashant_q7@lab.ntt.co.jp>
On Wed, 20 Dec 2017 11:27:15 +0900, Prashant Bhole wrote:
> diff --git a/drivers/net/netdevsim/netdev.c b/drivers/net/netdevsim/netdev.c
> index eb8c679fca9f..c2a02d1944b8 100644
> --- a/drivers/net/netdevsim/netdev.c
> +++ b/drivers/net/netdevsim/netdev.c
> @@ -147,10 +147,12 @@ struct device_type nsim_dev_type = {
> static int nsim_init(struct net_device *dev)
> {
> struct netdevsim *ns = netdev_priv(dev);
> - int err;
> + int err = -ENOMEM;
>
> ns->netdev = dev;
> ns->ddir = debugfs_create_dir(netdev_name(dev), nsim_ddir);
> + if (IS_ERR_OR_NULL(ns->ddir))
> + goto err;
nit:
Could you return err; here directly instead of go(ing )to return
and having label and variable of the same name? Same in
nsim_module_init().
With that feel free to add:
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Thanks!
> err = nsim_bpf_init(ns);
> if (err)
> @@ -171,6 +173,7 @@ static int nsim_init(struct net_device *dev)
> nsim_bpf_uninit(ns);
> err_debugfs_destroy:
> debugfs_remove_recursive(ns->ddir);
> +err:
> return err;
> }
>
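
A minimal sketch of the shape suggested in the nit above, assuming nothing
needs unwinding when the first step fails. All names here (fake_dir,
fake_create_dir, fake_init) are hypothetical user-space stand-ins, not
netdevsim or debugfs symbols.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a debugfs directory handle. */
struct fake_dir {
	int unused;
};

/* Hypothetical stand-in for a directory-creation call; NULL on failure. */
static struct fake_dir *fake_create_dir(int should_fail)
{
	static struct fake_dir dir;

	return should_fail ? NULL : &dir;
}

static int fake_init(int dir_fails, int later_step_fails)
{
	struct fake_dir *ddir = fake_create_dir(dir_fails);
	int err;

	if (!ddir)
		return -ENOMEM;	/* nothing to unwind yet: return directly */

	err = later_step_fails ? -EINVAL : 0;
	if (err)
		goto err_dir_destroy;

	return 0;

err_dir_destroy:
	/* real code would remove the directory here before returning */
	return err;
}
```

Returning directly on the first failure avoids a label that shares its name
with the err variable, while later failures still go through the unwind
label.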
^ permalink raw reply
* Re: [PATCH -tip v3 0/6] net: tcp: sctp: dccp: Replace jprobe usage with trace events
From: Masami Hiramatsu @ 2017-12-20 2:31 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Ingo Molnar, Stephen Hemminger, Steven Rostedt, Peter Zijlstra,
Thomas Gleixner, LKML, David S . Miller, netdev
In-Reply-To: <20171219180155.xxkv437fqmwhmhgg@ast-mbp.dhcp.thefacebook.com>
On Tue, 19 Dec 2017 10:01:56 -0800
Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> On Tue, Dec 19, 2017 at 05:56:55PM +0900, Masami Hiramatsu wrote:
> > include/trace/events/sctp.h | 98 ++++++++++++++
> > include/trace/events/tcp.h | 80 +++++++++++
> > net/Kconfig | 17 --
> > net/dccp/Kconfig | 17 --
> > net/dccp/Makefile | 2
> > net/dccp/probe.c | 203 -----------------------------
> > net/dccp/proto.c | 5 +
> > net/dccp/trace.h | 105 +++++++++++++++
> > net/ipv4/Makefile | 1
> > net/ipv4/tcp_input.c | 3
> > net/ipv4/tcp_probe.c | 301 -------------------------------------------
> > net/sctp/Kconfig | 12 --
> > net/sctp/Makefile | 3
> > net/sctp/probe.c | 244 -----------------------------------
> > net/sctp/sm_statefuns.c | 5 +
> > 15 files changed, 296 insertions(+), 800 deletions(-)
>
> You need to target net-next tree for this patch set.
>
Good point! I'll rebase on the net-next tree. Anyway, I hit an issue
building this on i386. I'll fix it and resend.
Thank you,
--
Masami Hiramatsu <mhiramat@kernel.org>
^ permalink raw reply
* [PATCH net-next v2] netdevsim: correctly check return value of debugfs_create_dir
From: Prashant Bhole @ 2017-12-20 2:27 UTC (permalink / raw)
To: David S . Miller; +Cc: Prashant Bhole, netdev, Jakub Kicinski
- Check return value with IS_ERR_OR_NULL
- Add error handling where it was missing
Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
---
drivers/net/netdevsim/bpf.c | 8 ++++----
drivers/net/netdevsim/netdev.c | 12 ++++++++----
2 files changed, 12 insertions(+), 8 deletions(-)
diff --git a/drivers/net/netdevsim/bpf.c b/drivers/net/netdevsim/bpf.c
index 078d2c37a6c1..aeb429428cc5 100644
--- a/drivers/net/netdevsim/bpf.c
+++ b/drivers/net/netdevsim/bpf.c
@@ -201,7 +201,6 @@ static int nsim_bpf_create_prog(struct netdevsim *ns, struct bpf_prog *prog)
{
struct nsim_bpf_bound_prog *state;
char name[16];
- int err;
state = kzalloc(sizeof(*state), GFP_KERNEL);
if (!state)
@@ -214,10 +213,9 @@ static int nsim_bpf_create_prog(struct netdevsim *ns, struct bpf_prog *prog)
/* Program id is not populated yet when we create the state. */
sprintf(name, "%u", ns->prog_id_gen++);
state->ddir = debugfs_create_dir(name, ns->ddir_bpf_bound_progs);
- if (IS_ERR(state->ddir)) {
- err = PTR_ERR(state->ddir);
+ if (IS_ERR_OR_NULL(state->ddir)) {
kfree(state);
- return err;
+ return -ENOMEM;
}
debugfs_create_u32("id", 0400, state->ddir, &prog->aux->id);
@@ -349,6 +347,8 @@ int nsim_bpf_init(struct netdevsim *ns)
&ns->bpf_bind_verifier_delay);
ns->ddir_bpf_bound_progs =
debugfs_create_dir("bpf_bound_progs", ns->ddir);
+ if (IS_ERR_OR_NULL(ns->ddir_bpf_bound_progs))
+ return -ENOMEM;
ns->bpf_tc_accept = true;
debugfs_create_bool("bpf_tc_accept", 0600, ns->ddir,
diff --git a/drivers/net/netdevsim/netdev.c b/drivers/net/netdevsim/netdev.c
index eb8c679fca9f..c2a02d1944b8 100644
--- a/drivers/net/netdevsim/netdev.c
+++ b/drivers/net/netdevsim/netdev.c
@@ -147,10 +147,12 @@ struct device_type nsim_dev_type = {
static int nsim_init(struct net_device *dev)
{
struct netdevsim *ns = netdev_priv(dev);
- int err;
+ int err = -ENOMEM;
ns->netdev = dev;
ns->ddir = debugfs_create_dir(netdev_name(dev), nsim_ddir);
+ if (IS_ERR_OR_NULL(ns->ddir))
+ goto err;
err = nsim_bpf_init(ns);
if (err)
@@ -171,6 +173,7 @@ static int nsim_init(struct net_device *dev)
nsim_bpf_uninit(ns);
err_debugfs_destroy:
debugfs_remove_recursive(ns->ddir);
+err:
return err;
}
@@ -466,11 +469,11 @@ struct dentry *nsim_ddir;
static int __init nsim_module_init(void)
{
- int err;
+ int err = -ENOMEM;
nsim_ddir = debugfs_create_dir(DRV_NAME, NULL);
- if (IS_ERR(nsim_ddir))
- return PTR_ERR(nsim_ddir);
+ if (IS_ERR_OR_NULL(nsim_ddir))
+ goto err;
err = bus_register(&nsim_bus);
if (err)
@@ -486,6 +489,7 @@ static int __init nsim_module_init(void)
bus_unregister(&nsim_bus);
err_debugfs_destroy:
debugfs_remove_recursive(nsim_ddir);
+err:
return err;
}
--
2.13.6
^ permalink raw reply related
* [PATCH v3,net-next 2/2] ip6_gre: fix error path when ip6erspan_rcv failed
From: Haishuang Yan @ 2017-12-20 2:21 UTC (permalink / raw)
To: David S. Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI
Cc: netdev, linux-kernel, Haishuang Yan, William Tu
In-Reply-To: <1513736507-22968-1-git-send-email-yanhaishuang@cmss.chinamobile.com>
As in the ipv4 code, when ip6erspan_rcv returns PACKET_REJECT, we
should call icmpv6_send to send an ICMP unreachable message in the error path.
Fixes: 5a963eb61b7c ("ip6_gre: Add ERSPAN native tunnel support")
Acked-by: William Tu <u9012063@gmail.com>
Cc: William Tu <u9012063@gmail.com>
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
---
Change since v2:
* Rebase on latest master branch.
* Fix wrong commit information.
---
net/ipv6/ip6_gre.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
index 45038a9..8451d00 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -604,12 +604,13 @@ static int gre_rcv(struct sk_buff *skb)
tpi.proto == htons(ETH_P_ERSPAN2))) {
if (ip6erspan_rcv(skb, hdr_len, &tpi) == PACKET_RCVD)
return 0;
- goto drop;
+ goto out;
}
if (ip6gre_rcv(skb, &tpi) == PACKET_RCVD)
return 0;
+out:
icmpv6_send(skb, ICMPV6_DEST_UNREACH, ICMPV6_PORT_UNREACH, 0);
drop:
kfree_skb(skb);
--
1.8.3.1
^ permalink raw reply related
* [PATCH v3,net-next 1/2] ip_gre: fix error path when erspan_rcv failed
From: Haishuang Yan @ 2017-12-20 2:21 UTC (permalink / raw)
To: David S. Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI
Cc: netdev, linux-kernel, Haishuang Yan, William Tu
In-Reply-To: <1513736507-22968-1-git-send-email-yanhaishuang@cmss.chinamobile.com>
When erspan_rcv returns PACKET_REJECT, we shouldn't call ipgre_rcv to
process the packet again; instead, send an ICMP unreachable message in the
error path.
Fixes: 84e54fe0a5ea ("gre: introduce native tunnel support for ERSPAN")
Acked-by: William Tu <u9012063@gmail.com>
Cc: William Tu <u9012063@gmail.com>
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
---
Change since v3:
* Rebase on latest master branch.
* Fix wrong commit information.
---
net/ipv4/ip_gre.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index 3029e3e..90c9123 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -436,11 +436,13 @@ static int gre_rcv(struct sk_buff *skb)
tpi.proto == htons(ETH_P_ERSPAN2))) {
if (erspan_rcv(skb, &tpi, hdr_len) == PACKET_RCVD)
return 0;
+ goto out;
}
if (ipgre_rcv(skb, &tpi, hdr_len) == PACKET_RCVD)
return 0;
+out:
icmp_send(skb, ICMP_DEST_UNREACH, ICMP_PORT_UNREACH, 0);
drop:
kfree_skb(skb);
--
1.8.3.1
^ permalink raw reply related
* [PATCH v3,net-next 0/2] net: erspan: fix erspan_rcv/ip6erspan_rcv error path
From: Haishuang Yan @ 2017-12-20 2:21 UTC (permalink / raw)
To: David S. Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI
Cc: netdev, linux-kernel, Haishuang Yan
This patch series fixes a potential issue in the error path.
Haishuang Yan (2):
ip_gre: fix error path when erspan_rcv failed
ip6_gre: fix error path when ip6erspan_rcv failed
net/ipv4/ip_gre.c | 2 ++
net/ipv6/ip6_gre.c | 3 ++-
2 files changed, 4 insertions(+), 1 deletion(-)
--
1.8.3.1
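
The control flow these two patches aim for can be sketched in user space as
follows. This is a simplified model under stated assumptions, not the kernel
function: model_gre_rcv, model_trace and the MODEL_* codes are hypothetical,
and the trace flags merely stand in for the ICMP send and skb free calls.

```c
#include <assert.h>

/* Illustrative result codes; the names only mirror those in the patches. */
enum model_result { MODEL_PACKET_RCVD, MODEL_PACKET_REJECT };

/* Records which steps of the modelled receive path ran. */
struct model_trace {
	int ipgre_called;
	int icmp_sent;
	int skb_freed;
};

static int model_gre_rcv(int hdr_ok, int is_erspan,
			 enum model_result erspan_ret,
			 enum model_result ipgre_ret,
			 struct model_trace *t)
{
	if (!hdr_ok)
		goto drop;	/* unparsable header: free without ICMP */

	if (is_erspan) {
		if (erspan_ret == MODEL_PACKET_RCVD)
			return 0;
		goto out;	/* rejected: skip the plain GRE handler */
	}

	t->ipgre_called = 1;
	if (ipgre_ret == MODEL_PACKET_RCVD)
		return 0;

out:
	t->icmp_sent = 1;	/* stands in for the ICMP unreachable message */
drop:
	t->skb_freed = 1;	/* stands in for freeing the skb */
	return 0;
}
```

In this model a rejected ERSPAN packet reaches the ICMP step without ever
being handed to the plain GRE handler, which is the behaviour both commit
messages describe.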
^ permalink raw reply
* Re: [PATCH v10 1/5] add infrastructure for tagging functions as error injectable
From: Alexei Starovoitov @ 2017-12-20 2:14 UTC (permalink / raw)
To: Masami Hiramatsu, Josef Bacik
Cc: rostedt, mingo, davem, netdev, linux-kernel, ast, kernel-team,
daniel, linux-btrfs, darrick.wong, Josef Bacik
In-Reply-To: <20171219152925.5789309c6c4d27807d42f11c@kernel.org>
On 12/18/17 10:29 PM, Masami Hiramatsu wrote:
>>
>> +#if defined(__KERNEL__) && !defined(__ASSEMBLY__)
>> +#ifdef CONFIG_BPF_KPROBE_OVERRIDE
>
> BTW, CONFIG_BPF_KPROBE_OVERRIDE is also confusable name.
> Since this feature override a function to just return with
> some return value (as far as I understand, or would you
> also plan to modify execution path inside a function?),
> I think it should be better CONFIG_BPF_FUNCTION_OVERRIDE or
> CONFIG_BPF_EXECUTION_OVERRIDE.
I don't think such renaming makes sense.
The feature is overriding kprobe by changing how kprobe returns.
It doesn't override BPF_FUNCTION or BPF_EXECUTION.
The kernel enters and exits bpf program as normal.
> Indeed, BPF is based on kprobes, but it seems you are limiting it
> with ftrace (function-call trace) (I'm not sure the reason why),
> so using "kprobes" for this feature seems strange for me.
Do you have an idea how kprobe override can happen when the kprobe is
placed in the middle of the function?
Please make your suggestion as patches based on top of bpf-next.
Thanks
^ permalink raw reply
* [PATCH v3,net-next 2/2] ip6_gre: fix potential memory leak in ip6erspan_rcv
From: Haishuang Yan @ 2017-12-20 2:07 UTC (permalink / raw)
To: David S. Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI
Cc: netdev, linux-kernel, Haishuang Yan, William Tu
In-Reply-To: <1513735621-21913-1-git-send-email-yanhaishuang@cmss.chinamobile.com>
If md is NULL, tun_dst must be freed, otherwise it will cause a memory
leak.
Fixes: ef7baf5e083c ("ip6_gre: add ip6 erspan collect_md mode")
Cc: William Tu <u9012063@gmail.com>
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
---
Changes since v3:
* Rebase on latest master branch.
* Fix wrong commit information.
---
net/ipv6/ip6_gre.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
index 9bd1103..45038a9 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -550,8 +550,10 @@ static int ip6erspan_rcv(struct sk_buff *skb, int gre_hdr_len,
info = &tun_dst->u.tun_info;
md = ip_tunnel_info_opts(info);
- if (!md)
+ if (!md) {
+ dst_release((struct dst_entry *)tun_dst);
return PACKET_REJECT;
+ }
memcpy(md, pkt_md, sizeof(*md));
md->version = ver;
--
1.8.3.1
^ permalink raw reply related
* [PATCH v3,net-next 1/2] ip_gre: fix potential memory leak in erspan_rcv
From: Haishuang Yan @ 2017-12-20 2:07 UTC (permalink / raw)
To: David S. Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI
Cc: netdev, linux-kernel, Haishuang Yan, William Tu
In-Reply-To: <1513735621-21913-1-git-send-email-yanhaishuang@cmss.chinamobile.com>
If md is NULL, tun_dst must be freed, otherwise it will cause a memory
leak.
Fixes: 1a66a836da6 ("gre: add collect_md mode to ERSPAN tunnel")
Cc: William Tu <u9012063@gmail.com>
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
---
Changes since v3:
* Rebase on latest master branch.
* Fix wrong commit information.
---
net/ipv4/ip_gre.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index fd4d6e9..3029e3e 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -313,8 +313,10 @@ static int erspan_rcv(struct sk_buff *skb, struct tnl_ptk_info *tpi,
return PACKET_REJECT;
md = ip_tunnel_info_opts(&tun_dst->u.tun_info);
- if (!md)
+ if (!md) {
+ dst_release((struct dst_entry *)tun_dst);
return PACKET_REJECT;
+ }
memcpy(md, pkt_md, sizeof(*md));
md->version = ver;
--
1.8.3.1
^ permalink raw reply related
* [PATCH v3,net-next 0/2] net: erspan: fix potential memory leak
From: Haishuang Yan @ 2017-12-20 2:06 UTC (permalink / raw)
To: David S. Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI
Cc: netdev, linux-kernel, Haishuang Yan
This patch series fixes a potential memory leak.
Haishuang Yan (2):
ip_gre: fix potential memory leak in erspan_rcv
ip6_gre: fix potential memory leak in ip6erspan_rcv
net/ipv4/ip_gre.c | 4 +++-
net/ipv6/ip6_gre.c | 4 +++-
2 files changed, 6 insertions(+), 2 deletions(-)
--
1.8.3.1
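
The leak pattern addressed by this series can be illustrated with a small
user-space model. It is a sketch under assumptions, not the kernel code:
model_dst, model_dst_alloc, model_dst_release and model_erspan_rcv are
hypothetical names, and a plain counter stands in for the kernel's
reference counting of the tunnel dst.

```c
#include <assert.h>
#include <stdlib.h>

enum leak_result { LEAK_PACKET_RCVD, LEAK_PACKET_REJECT };

/* Hypothetical stand-in for a reference-counted tunnel dst. */
struct model_dst {
	int refcnt;
};

static int model_live_dsts;	/* number of objects not yet freed */

static struct model_dst *model_dst_alloc(void)
{
	struct model_dst *d = malloc(sizeof(*d));

	if (!d)
		return NULL;
	d->refcnt = 1;
	model_live_dsts++;
	return d;
}

static void model_dst_release(struct model_dst *d)
{
	if (--d->refcnt == 0) {
		free(d);
		model_live_dsts--;
	}
}

static enum leak_result model_erspan_rcv(int md_present)
{
	struct model_dst *tun_dst = model_dst_alloc();

	if (!tun_dst)
		return LEAK_PACKET_REJECT;

	if (!md_present) {
		/* the fix: drop our reference before the early return */
		model_dst_release(tun_dst);
		return LEAK_PACKET_REJECT;
	}

	/* The real receive path hands the reference on; the model drops it. */
	model_dst_release(tun_dst);
	return LEAK_PACKET_RCVD;
}
```

Without the release in the early-return branch, model_live_dsts would stay
at one after every rejected packet, which is the leak the commit messages
describe.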
^ permalink raw reply
* [PATCH v4 iproute2 net-next] erspan: add erspan version II support
From: William Tu @ 2017-12-20 2:01 UTC (permalink / raw)
To: netdev; +Cc: dsahern
The patch adds support for configuring ERSPAN v2 for both the
ipv4 and ipv6 erspan implementations. Three additional fields
are added: 'erspan_ver' for distinguishing v1 or v2, 'erspan_dir'
for specifying the direction of the mirrored traffic, and 'erspan_hwid'
for users to set the ERSPAN engine ID within a system.
As for the manpage, the ERSPAN descriptions used to be under the GRE, IPIP,
SIT Type paragraph. Since IP6GRE/IP6GRETAP also support ERSPAN,
the patch removes the old description, creates a separate ERSPAN paragraph,
and adds an example.
Signed-off-by: William Tu <u9012063@gmail.com>
---
change in v4:
- use matches instead of strcmp on ingress/egress
change in v3:
- change erspan_dir 0/1 to "in[gress]/e[gress]"
- update manpage
change in v2:
- fix typo ETH_P_ERSPAN2
- fix space and indent
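
For reference, the range rules for the ERSPAN options can be sketched as
stand-alone helpers. The helper names are hypothetical and are not part of
iproute2. The index and version checks mirror the parser in this patch; the
6-bit hwid limit is taken from the manpage text only and is an assumption
here, since the parser itself only requires a 16-bit value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_ERSPAN_IDX_MAX	((1u << 20) - 1)	/* 20-bit index */
#define MODEL_ERSPAN_HWID_MAX	((1u << 6) - 1)	/* 6-bit hwid, per manpage */

/* A user-supplied index must be non-zero and fit in 20 bits. */
static bool model_erspan_idx_valid(uint32_t idx)
{
	return idx != 0 && idx <= MODEL_ERSPAN_IDX_MAX;
}

/* Only versions 1 and 2 are accepted. */
static bool model_erspan_ver_valid(uint8_t ver)
{
	return ver == 1 || ver == 2;
}

/* Assumed limit: the manpage describes hwid as a 6-bit value. */
static bool model_erspan_hwid_fits(uint16_t hwid)
{
	return hwid <= MODEL_ERSPAN_HWID_MAX;
}
```

The hwid value 17 used in the manpage example fits within this assumed
6-bit range.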
---
include/uapi/linux/if_ether.h | 1 +
include/uapi/linux/if_tunnel.h | 3 ++
ip/link_gre.c | 66 ++++++++++++++++++++++++++++--
ip/link_gre6.c | 67 ++++++++++++++++++++++++++++--
man/man8/ip-link.8.in | 92 ++++++++++++++++++++++++++++++++++++------
5 files changed, 210 insertions(+), 19 deletions(-)
diff --git a/include/uapi/linux/if_ether.h b/include/uapi/linux/if_ether.h
index 2eb529a90250..133567bf2e04 100644
--- a/include/uapi/linux/if_ether.h
+++ b/include/uapi/linux/if_ether.h
@@ -47,6 +47,7 @@
#define ETH_P_PUP 0x0200 /* Xerox PUP packet */
#define ETH_P_PUPAT 0x0201 /* Xerox PUP Addr Trans packet */
#define ETH_P_TSN 0x22F0 /* TSN (IEEE 1722) packet */
+#define ETH_P_ERSPAN2 0x22EB /* ERSPAN version 2 (type III) */
#define ETH_P_IP 0x0800 /* Internet Protocol packet */
#define ETH_P_X25 0x0805 /* CCITT X.25 */
#define ETH_P_ARP 0x0806 /* Address Resolution packet */
diff --git a/include/uapi/linux/if_tunnel.h b/include/uapi/linux/if_tunnel.h
index 38cdf90692f8..ecdc76669cfd 100644
--- a/include/uapi/linux/if_tunnel.h
+++ b/include/uapi/linux/if_tunnel.h
@@ -137,6 +137,9 @@ enum {
IFLA_GRE_IGNORE_DF,
IFLA_GRE_FWMARK,
IFLA_GRE_ERSPAN_INDEX,
+ IFLA_GRE_ERSPAN_VER,
+ IFLA_GRE_ERSPAN_DIR,
+ IFLA_GRE_ERSPAN_HWID,
__IFLA_GRE_MAX,
};
diff --git a/ip/link_gre.c b/ip/link_gre.c
index 43cb1af6196a..0b9c71baebaf 100644
--- a/ip/link_gre.c
+++ b/ip/link_gre.c
@@ -98,6 +98,9 @@ static int gre_parse_opt(struct link_util *lu, int argc, char **argv,
__u8 ignore_df = 0;
__u32 fwmark = 0;
__u32 erspan_idx = 0;
+ __u8 erspan_ver = 0;
+ __u8 erspan_dir = 0;
+ __u16 erspan_hwid = 0;
if (!(n->nlmsg_flags & NLM_F_CREATE)) {
if (rtnl_talk(&rth, &req.n, &answer) < 0) {
@@ -179,6 +182,15 @@ get_failed:
if (greinfo[IFLA_GRE_ERSPAN_INDEX])
erspan_idx = rta_getattr_u32(greinfo[IFLA_GRE_ERSPAN_INDEX]);
+ if (greinfo[IFLA_GRE_ERSPAN_VER])
+ erspan_ver = rta_getattr_u8(greinfo[IFLA_GRE_ERSPAN_VER]);
+
+ if (greinfo[IFLA_GRE_ERSPAN_DIR])
+ erspan_dir = rta_getattr_u8(greinfo[IFLA_GRE_ERSPAN_DIR]);
+
+ if (greinfo[IFLA_GRE_ERSPAN_HWID])
+ erspan_hwid = rta_getattr_u16(greinfo[IFLA_GRE_ERSPAN_HWID]);
+
free(answer);
}
@@ -343,6 +355,24 @@ get_failed:
invarg("invalid erspan index\n", *argv);
if (erspan_idx & ~((1<<20) - 1) || erspan_idx == 0)
invarg("erspan index must be > 0 and <= 20-bit\n", *argv);
+ } else if (strcmp(*argv, "erspan_ver") == 0) {
+ NEXT_ARG();
+ if (get_u8(&erspan_ver, *argv, 0))
+ invarg("invalid erspan version\n", *argv);
+ if (erspan_ver != 1 && erspan_ver != 2)
+ invarg("erspan version must be 1 or 2\n", *argv);
+ } else if (strcmp(*argv, "erspan_dir") == 0) {
+ NEXT_ARG();
+ if (matches(*argv, "ingress") == 0)
+ erspan_dir = 0;
+ else if (matches(*argv, "egress") == 0)
+ erspan_dir = 1;
+ else
+ invarg("Invalid erspan direction.", *argv);
+ } else if (strcmp(*argv, "erspan_hwid") == 0) {
+ NEXT_ARG();
+ if (get_u16(&erspan_hwid, *argv, 0))
+ invarg("invalid erspan hwid\n", *argv);
} else
usage();
argc--; argv++;
@@ -374,8 +404,15 @@ get_failed:
addattr_l(n, 1024, IFLA_GRE_TTL, &ttl, 1);
addattr_l(n, 1024, IFLA_GRE_TOS, &tos, 1);
addattr32(n, 1024, IFLA_GRE_FWMARK, fwmark);
- if (erspan_idx != 0)
- addattr32(n, 1024, IFLA_GRE_ERSPAN_INDEX, erspan_idx);
+ if (erspan_ver) {
+ addattr8(n, 1024, IFLA_GRE_ERSPAN_VER, erspan_ver);
+ if (erspan_ver == 1 && erspan_idx != 0) {
+ addattr32(n, 1024, IFLA_GRE_ERSPAN_INDEX, erspan_idx);
+ } else if (erspan_ver == 2) {
+ addattr8(n, 1024, IFLA_GRE_ERSPAN_DIR, erspan_dir);
+ addattr16(n, 1024, IFLA_GRE_ERSPAN_HWID, erspan_hwid);
+ }
+ }
} else {
addattr_l(n, 1024, IFLA_GRE_COLLECT_METADATA, NULL, 0);
}
@@ -514,7 +551,30 @@ static void gre_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[])
if (tb[IFLA_GRE_ERSPAN_INDEX]) {
__u32 erspan_idx = rta_getattr_u32(tb[IFLA_GRE_ERSPAN_INDEX]);
- fprintf(f, "erspan_index %u ", erspan_idx);
+ print_uint(PRINT_ANY, "erspan_index", "erspan_index %u ", erspan_idx);
+ }
+
+ if (tb[IFLA_GRE_ERSPAN_VER]) {
+ __u8 erspan_ver = rta_getattr_u8(tb[IFLA_GRE_ERSPAN_VER]);
+
+ print_uint(PRINT_ANY, "erspan_ver", "erspan_ver %u ", erspan_ver);
+ }
+
+ if (tb[IFLA_GRE_ERSPAN_DIR]) {
+ __u8 erspan_dir = rta_getattr_u8(tb[IFLA_GRE_ERSPAN_DIR]);
+
+ if (erspan_dir == 0)
+ print_string(PRINT_ANY, "erspan_dir",
+ "erspan_dir ingress ", NULL);
+ else
+ print_string(PRINT_ANY, "erspan_dir",
+ "erspan_dir egress ", NULL);
+ }
+
+ if (tb[IFLA_GRE_ERSPAN_HWID]) {
+ __u16 erspan_hwid = rta_getattr_u16(tb[IFLA_GRE_ERSPAN_HWID]);
+
+ print_hex(PRINT_ANY, "erspan_hwid", "erspan_hwid 0x%x ", erspan_hwid);
}
if (tb[IFLA_GRE_ENCAP_TYPE] &&
diff --git a/ip/link_gre6.c b/ip/link_gre6.c
index 2cb46ca116d0..e4a8e1f5ee41 100644
--- a/ip/link_gre6.c
+++ b/ip/link_gre6.c
@@ -109,6 +109,9 @@ static int gre_parse_opt(struct link_util *lu, int argc, char **argv,
int len;
__u32 fwmark = 0;
__u32 erspan_idx = 0;
+ __u8 erspan_ver = 0;
+ __u8 erspan_dir = 0;
+ __u16 erspan_hwid = 0;
if (!(n->nlmsg_flags & NLM_F_CREATE)) {
if (rtnl_talk(&rth, &req.n, &answer) < 0) {
@@ -191,6 +194,15 @@ get_failed:
if (greinfo[IFLA_GRE_ERSPAN_INDEX])
erspan_idx = rta_getattr_u32(greinfo[IFLA_GRE_ERSPAN_INDEX]);
+ if (greinfo[IFLA_GRE_ERSPAN_VER])
+ erspan_ver = rta_getattr_u8(greinfo[IFLA_GRE_ERSPAN_VER]);
+
+ if (greinfo[IFLA_GRE_ERSPAN_DIR])
+ erspan_dir = rta_getattr_u8(greinfo[IFLA_GRE_ERSPAN_DIR]);
+
+ if (greinfo[IFLA_GRE_ERSPAN_HWID])
+ erspan_hwid = rta_getattr_u16(greinfo[IFLA_GRE_ERSPAN_HWID]);
+
free(answer);
}
@@ -389,6 +401,24 @@ get_failed:
invarg("invalid erspan index\n", *argv);
if (erspan_idx & ~((1<<20) - 1) || erspan_idx == 0)
invarg("erspan index must be > 0 and <= 20-bit\n", *argv);
+ } else if (strcmp(*argv, "erspan_ver") == 0) {
+ NEXT_ARG();
+ if (get_u8(&erspan_ver, *argv, 0))
+ invarg("invalid erspan version\n", *argv);
+ if (erspan_ver != 1 && erspan_ver != 2)
+ invarg("erspan version must be 1 or 2\n", *argv);
+ } else if (strcmp(*argv, "erspan_dir") == 0) {
+ NEXT_ARG();
+ if (matches(*argv, "ingress") == 0)
+ erspan_dir = 0;
+ else if (matches(*argv, "egress") == 0)
+ erspan_dir = 1;
+ else
+ invarg("Invalid erspan direction.", *argv);
+ } else if (strcmp(*argv, "erspan_hwid") == 0) {
+ NEXT_ARG();
+ if (get_u16(&erspan_hwid, *argv, 0))
+ invarg("invalid erspan hwid\n", *argv);
} else
usage();
argc--; argv++;
@@ -408,9 +438,15 @@ get_failed:
addattr_l(n, 1024, IFLA_GRE_FLOWINFO, &flowinfo, 4);
addattr32(n, 1024, IFLA_GRE_FLAGS, flags);
addattr32(n, 1024, IFLA_GRE_FWMARK, fwmark);
- if (erspan_idx != 0)
- addattr32(n, 1024, IFLA_GRE_ERSPAN_INDEX, erspan_idx);
-
+ if (erspan_ver) {
+ addattr8(n, 1024, IFLA_GRE_ERSPAN_VER, erspan_ver);
+ if (erspan_ver == 1 && erspan_idx != 0) {
+ addattr32(n, 1024, IFLA_GRE_ERSPAN_INDEX, erspan_idx);
+ } else {
+ addattr8(n, 1024, IFLA_GRE_ERSPAN_DIR, erspan_dir);
+ addattr16(n, 1024, IFLA_GRE_ERSPAN_HWID, erspan_hwid);
+ }
+ }
addattr16(n, 1024, IFLA_GRE_ENCAP_TYPE, encaptype);
addattr16(n, 1024, IFLA_GRE_ENCAP_FLAGS, encapflags);
addattr16(n, 1024, IFLA_GRE_ENCAP_SPORT, htons(encapsport));
@@ -587,7 +623,30 @@ static void gre_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[])
if (tb[IFLA_GRE_ERSPAN_INDEX]) {
__u32 erspan_idx = rta_getattr_u32(tb[IFLA_GRE_ERSPAN_INDEX]);
- fprintf(f, "erspan_index %u ", erspan_idx);
+ print_uint(PRINT_ANY, "erspan_index", "erspan_index %u ", erspan_idx);
+ }
+
+ if (tb[IFLA_GRE_ERSPAN_VER]) {
+ __u8 erspan_ver = rta_getattr_u8(tb[IFLA_GRE_ERSPAN_VER]);
+
+ print_uint(PRINT_ANY, "erspan_ver", "erspan_ver %u ", erspan_ver);
+ }
+
+ if (tb[IFLA_GRE_ERSPAN_DIR]) {
+ __u8 erspan_dir = rta_getattr_u8(tb[IFLA_GRE_ERSPAN_DIR]);
+
+ if (erspan_dir == 0)
+ print_string(PRINT_ANY, "erspan_dir",
+ "erspan_dir ingress ", NULL);
+ else
+ print_string(PRINT_ANY, "erspan_dir",
+ "erspan_dir egress ", NULL);
+ }
+
+ if (tb[IFLA_GRE_ERSPAN_HWID]) {
+ __u16 erspan_hwid = rta_getattr_u16(tb[IFLA_GRE_ERSPAN_HWID]);
+
+ print_hex(PRINT_ANY, "erspan_hwid", "erspan_hwid 0x%x ", erspan_hwid);
}
if (tb[IFLA_GRE_ENCAP_TYPE] &&
diff --git a/man/man8/ip-link.8.in b/man/man8/ip-link.8.in
index 9e9a5f0d2cef..0086b3dfa09d 100644
--- a/man/man8/ip-link.8.in
+++ b/man/man8/ip-link.8.in
@@ -665,13 +665,13 @@ keyword.
.in -8
.TP
-GRE, IPIP, SIT, ERSPAN Type Support
+GRE, IPIP, SIT Type Support
For a link of types
-.I GRE/IPIP/SIT/ERSPAN
+.I GRE/IPIP/SIT
the following additional arguments are supported:
.BI "ip link add " DEVICE
-.BR type " { " gre " | " ipip " | " sit " | " erspan " }"
+.BR type " { " gre " | " ipip " | " sit " }"
.BI " remote " ADDR " local " ADDR
[
.BR encap " { " fou " | " gue " | " none " }"
@@ -685,8 +685,6 @@ the following additional arguments are supported:
.I " [no]encap-remcsum "
] [
.I " mode " { ip6ip | ipip | mplsip | any } "
-] [
-.BR erspan " \fIIDX "
]
.in +8
@@ -731,13 +729,6 @@ MPLS-Over-IPv4, "any" indicates IPv6, IPv4 or MPLS Over IPv4. Supported for
SIT where the default is "ip6ip" and IPIP where the default is "ipip".
IPv6-Over-IPv4 is not supported for IPIP.
-.sp
-.BR erspan " \fIIDX "
-- specifies the ERSPAN index field.
-.IR IDX
-indicates a 20 bit index/port number associated with the ERSPAN
-traffic's source port and direction.
-
.in -8
.TP
@@ -883,6 +874,76 @@ the following additional arguments are supported:
- specifies the mode (datagram or connected) to use.
.TP
+ERSPAN Type Support
+For a link of type
+.I ERSPAN/IP6ERSPAN
+the following additional arguments are supported:
+
+.BI "ip link add " DEVICE
+.BR type " { " erspan " | " ip6erspan " }"
+.BI remote " ADDR " local " ADDR " seq
+.RB key
+.I KEY
+.BR erspan_ver " \fIversion "
+[
+.BR erspan " \fIIDX "
+] [
+.BR erspan_dir " { " \fIingress " | " \fIegress " }"
+] [
+.BR erspan_hwid " \fIhwid "
+] [
+.RB external
+]
+
+.in +8
+.sp
+.BI remote " ADDR "
+- specifies the remote address of the tunnel.
+
+.sp
+.BI local " ADDR "
+- specifies the fixed local address for tunneled packets.
+It must be an address on another interface on this host.
+
+.sp
+.BR erspan_ver " \fIversion "
+- specifies the ERSPAN version number.
+.IR version
+indicates the ERSPAN version to be created: 1 for version 1 (type II)
+or 2 for version 2 (type III).
+
+.sp
+.BR erspan " \fIIDX "
+- specifies the ERSPAN v1 index field.
+.IR IDX
+indicates a 20 bit index/port number associated with the ERSPAN
+traffic's source port and direction.
+
+.sp
+.BR erspan_dir " { " \fIingress " | " \fIegress " }"
+- specifies the ERSPAN v2 mirrored traffic's direction.
+
+.sp
+.BR erspan_hwid " \fIhwid "
+- an unique identifier of an ERSPAN v2 engine within a system.
+.IR hwid
+is a 6-bit value for users to configure.
+
+.sp
+.BR external
+- make this tunnel externally controlled (or not, which is the default).
+In the kernel, this is referred to as collect metadata mode. This flag is
+mutually exclusive with the
+.BR remote ,
+.BR local ,
+.BR erspan_ver ,
+.BR erspan ,
+.BR erspan_dir " and " erspan_hwid
+options.
+
+.in -8
+
+.TP
GENEVE Type Support
For a link of type
.I GENEVE
@@ -2062,6 +2123,13 @@ ip link add link wpan0 lowpan0 type lowpan
Creates a 6LoWPAN interface named lowpan0 on the underlying
IEEE 802.15.4 device wpan0.
.RE
+.PP
+ip link add dev ip6erspan11 type ip6erspan seq key 102
+local fc00:100::2 remote fc00:100::1
+erspan_ver 2 erspan_dir ingress erspan_hwid 17
+.RS 4
+Creates a IP6ERSPAN version 2 interface named ip6erspan00.
+.RE
.SH SEE ALSO
.br
--
2.7.4
^ permalink raw reply related
* RCU callback crashes
From: Jakub Kicinski @ 2017-12-20 1:59 UTC (permalink / raw)
To: netdev@vger.kernel.org, Jiri Pirko, Cong Wang
Hi!
If I run the netdevsim test long enough on a kernel with no debugging
I get this:
[ 1400.450124] BUG: unable to handle kernel paging request at 000000046474e552
[ 1400.458005] IP: 0x46474e552
[ 1400.461231] PGD 0 P4D 0
[ 1400.464150] Oops: 0010 [#1] PREEMPT SMP
[ 1400.468525] Modules linked in: cls_bpf sch_ingress algif_hash af_alg netdevsim rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace f3
[ 1400.516951] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.15.0-rc3-perf-00918-g129c9981a55f #918
[ 1400.526678] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.3.4 11/08/2016
[ 1400.535150] RIP: 0010:0x46474e552
[ 1400.538941] RSP: 0018:ffff9f736f083f08 EFLAGS: 00010216
[ 1400.544870] RAX: ffff9f736b4771b8 RBX: ffff9f736f09b880 RCX: ffff9f736b4771b8
[ 1400.552935] RDX: 000000046474e552 RSI: ffff9f736f083f18 RDI: ffff9f736b4771b8
[ 1400.561001] RBP: ffffffff8bc4a740 R08: ffff9f736b4771b8 R09: 0000000000000000
[ 1400.569066] R10: ffff9f736f083d90 R11: 0000000000000000 R12: ffff9f736f09b8b8
[ 1400.577132] R13: 000000000000000a R14: 7fffffffffffffff R15: 0000000000000202
[ 1400.585197] FS: 0000000000000000(0000) GS:ffff9f736f080000(0000) knlGS:0000000000000000
[ 1400.594349] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1400.600859] CR2: 000000046474e552 CR3: 0000000839c09001 CR4: 00000000003606e0
[ 1400.608917] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1400.616982] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1400.625048] Call Trace:
[ 1400.627868] <IRQ>
[ 1400.630207] ? rcu_process_callbacks+0x1a0/0x4d0
[ 1400.635458] ? __do_softirq+0xd1/0x30a
[ 1400.639739] ? irq_exit+0xae/0xb0
[ 1400.643532] ? smp_apic_timer_interrupt+0x60/0x140
[ 1400.648977] ? apic_timer_interrupt+0x8c/0xa0
[ 1400.653934] </IRQ>
[ 1400.656370] ? cpuidle_enter_state+0xb0/0x2f0
[ 1400.661328] ? cpuidle_enter_state+0x8d/0x2f0
[ 1400.666287] ? do_idle+0x17b/0x1d0
[ 1400.670167] ? cpu_startup_entry+0x5f/0x70
[ 1400.674836] ? start_secondary+0x169/0x190
[ 1400.679504] ? secondary_startup_64+0xa5/0xb0
[ 1400.684466] Code: Bad RIP value.
[ 1400.688259] RIP: 0x46474e552 RSP: ffff9f736f083f08
[ 1400.693703] CR2: 000000046474e552
[ 1400.697501] ---[ end trace fab2c0fb826644df ]---
[ 1400.708442] Kernel panic - not syncing: Fatal exception in interrupt
[ 1400.715693] Kernel Offset: 0xa000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 1400.732994] ---[ end Kernel panic - not syncing: Fatal exception in interrupt
Unfortunately, reproducing the crash on an instrumented kernel seems to
be difficult.
I managed to gather this:
[ 26.157415] ------------[ cut here ]------------
[ 26.162670] ODEBUG: free active (active state 1) object type: rcu_head hint: (null)
[ 26.172361] WARNING: CPU: 19 PID: 1352 at ../lib/debugobjects.c:291 debug_print_object+0x64/0x80
[ 26.182288] Modules linked in: cls_bpf sch_ingress algif_hash af_alg netdevsim rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace f3
[ 26.230728] CPU: 19 PID: 1352 Comm: tc Not tainted 4.15.0-rc3-perf-00918-g129c9981a55f #4
[ 26.239977] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.3.4 11/08/2016
[ 26.248453] RIP: 0010:debug_print_object+0x64/0x80
[ 26.253896] RSP: 0018:ffffb7340410fa00 EFLAGS: 00010086
[ 26.259825] RAX: 0000000000000051 RBX: ffff8f1f6b7cc5a0 RCX: 0000000000000006
[ 26.267892] RDX: 0000000000000007 RSI: 0000000000000082 RDI: ffff8f1f6f48cdd0
[ 26.275959] RBP: ffffffffb3c48600 R08: 0000000000000000 R09: 00000000000005f2
[ 26.284042] R10: 000000000000001e R11: ffffffffb41c35ad R12: ffffffffb3a1d101
[ 26.292125] R13: ffff8f1f6b7cc5a0 R14: ffffffffb423a8b8 R15: 0000000000000001
[ 26.300194] FS: 00007f64d4956700(0000) GS:ffff8f1f6f480000(0000) knlGS:0000000000000000
[ 26.309346] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 26.315859] CR2: 0000000001cbc498 CR3: 000000086a8a2004 CR4: 00000000003606e0
[ 26.323925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 26.331994] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 26.331994] Call Trace:
[ 26.331998] debug_check_no_obj_freed+0x1e6/0x220
[ 26.332020] ? qdisc_graft+0x14f/0x450
[ 26.332025] kfree+0x14d/0x1b0
[ 26.332027] qdisc_graft+0x14f/0x450
[ 26.332029] tc_get_qdisc+0x12f/0x200
[ 26.332035] rtnetlink_rcv_msg+0x122/0x310
[ 26.332039] ? __skb_try_recv_datagram+0xef/0x150
[ 26.332040] ? __kmalloc_node_track_caller+0x205/0x2b0
[ 26.332042] ? rtnl_calcit.isra.12+0x100/0x100
[ 26.332044] netlink_rcv_skb+0x8d/0x130
[ 26.332046] netlink_unicast+0x16a/0x210
[ 26.332048] netlink_sendmsg+0x32a/0x370
[ 26.332054] sock_sendmsg+0x2d/0x40
[ 26.332056] ___sys_sendmsg+0x298/0x2e0
[ 26.332061] ? mem_cgroup_commit_charge+0x7a/0x540
[ 26.332062] ? mem_cgroup_try_charge+0x8e/0x1d0
[ 26.332066] ? __handle_mm_fault+0x3a1/0x1190
[ 26.332068] ? __sys_sendmsg+0x41/0x70
[ 26.332069] __sys_sendmsg+0x41/0x70
[ 26.332074] entry_SYSCALL_64_fastpath+0x1e/0x81
[ 26.332076] RIP: 0033:0x7f64d3b53450
[ 26.332076] RSP: 002b:00007fffb5ea4388 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[ 26.332077] RAX: ffffffffffffffda RBX: 00007f64d3e0fb20 RCX: 00007f64d3b53450
[ 26.332078] RDX: 0000000000000000 RSI: 00007fffb5ea43e0 RDI: 0000000000000003
[ 26.332078] RBP: 0000000000000a11 R08: 0000000000000000 R09: 000000000000000f
[ 26.332079] R10: 00000000000005e7 R11: 0000000000000246 R12: 00007f64d3e0fb78
[ 26.332079] R13: 00007f64d3e0fb78 R14: 000000000000270f R15: 00007f64d3e0fb78
[ 26.332081] Code: c1 83 c2 01 8b 4b 14 4c 8b 45 00 89 15 f6 d0 e5 00 8b 53 10 4c 89 e6 48 c7 c7 38 7c a3 b3 48 8b 14 d5 80 3d 85 b
[ 26.332097] ---[ end trace bd33b199ae76ad43 ]---
* [PATCH v3,net-next] ip6_gre: fix a potential issue in ip6erspan_rcv
From: Haishuang Yan @ 2017-12-20 1:53 UTC (permalink / raw)
To: David S. Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI
Cc: netdev, linux-kernel, Haishuang Yan, William Tu
pskb_may_pull() can change skb->data, so we need to load ipv6h/ershdr at
the right place.
Fixes: 5a963eb61b7c ("ip6_gre: Add ERSPAN native tunnel support")
Cc: William Tu <u9012063@gmail.com>
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
---
Change since v3:
* Rebase on latest master branch.
* Fix wrong commit information.
---
net/ipv6/ip6_gre.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
index 87b9892..9bd1103 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -507,12 +507,11 @@ static int ip6erspan_rcv(struct sk_buff *skb, int gre_hdr_len,
 	struct ip6_tnl *tunnel;
 	u8 ver;
-	ipv6h = ipv6_hdr(skb);
-	ershdr = (struct erspan_base_hdr *)skb->data;
-
 	if (unlikely(!pskb_may_pull(skb, sizeof(*ershdr))))
 		return PACKET_REJECT;
+	ipv6h = ipv6_hdr(skb);
+	ershdr = (struct erspan_base_hdr *)skb->data;
 	ver = (ntohs(ershdr->ver_vlan) & VER_MASK) >> VER_OFFSET;
 	tpi->key = cpu_to_be32(ntohs(ershdr->session_id) & ID_MASK);
--
1.8.3.1
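The ordering problem this patch fixes can be illustrated outside the kernel. The following is only a user-space sketch, not kernel code: struct toy_buf, toy_may_pull() and toy_read_version() are hypothetical names invented for this example, and the "pull" is modeled with a plain malloc()/free() pair. It shows why a header pointer has to be derived from the data pointer after the pull rather than before it.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an skb: only `linear` bytes are contiguous
 * at `data`; the rest of the packet lives in `frag`. */
struct toy_buf {
	uint8_t *data;        /* linear area, may be reallocated */
	size_t linear;        /* bytes currently available at data */
	const uint8_t *frag;  /* bytes not yet pulled into data */
	size_t frag_len;
};

/* Toy analog of pskb_may_pull(): make sure `len` bytes are linear.
 * Returns 1 on success, 0 on failure. On success b->data may have moved,
 * so any pointer computed from the old b->data is stale. */
static int toy_may_pull(struct toy_buf *b, size_t len)
{
	size_t take;
	uint8_t *grown;

	if (len <= b->linear)
		return 1;
	take = len - b->linear;
	if (take > b->frag_len)
		return 0;
	grown = malloc(len);
	if (!grown)
		return 0;
	memcpy(grown, b->data, b->linear);
	memcpy(grown + b->linear, b->frag, take);
	free(b->data);
	b->data = grown;
	b->linear = len;
	b->frag += take;
	b->frag_len -= take;
	return 1;
}

/* Correct ordering, mirroring the patch: pull first, then load the
 * header pointer from the (possibly new) data pointer. */
static int toy_read_version(struct toy_buf *b, uint8_t *ver)
{
	const uint8_t *hdr;

	assert(b != NULL && ver != NULL);
	if (!toy_may_pull(b, 2))
		return -1;
	hdr = b->data;         /* loaded after the pull, never before */
	*ver = hdr[1] >> 4;    /* version kept in the high nibble (toy layout) */
	return 0;
}
```

Had hdr been assigned before toy_may_pull(), it would still point at the freed buffer whenever the pull had to grow the linear area, which is the same class of bug as loading ipv6h/ershdr ahead of pskb_may_pull().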
* Re: [PATCH v3 iproute2 net-next] erspan: add erspan version II support
From: William Tu @ 2017-12-20 1:51 UTC (permalink / raw)
To: David Ahern; +Cc: Linux Kernel Network Developers
In-Reply-To: <8eb4e84f-2218-0c96-ece6-2b1008f2da2f@gmail.com>
On Tue, Dec 19, 2017 at 5:28 PM, David Ahern <dsahern@gmail.com> wrote:
> Hi William:
>
> On 12/19/17 6:08 PM, William Tu wrote:
>> @@ -343,6 +355,26 @@ get_failed:
>> invarg("invalid erspan index\n", *argv);
>> if (erspan_idx & ~((1<<20) - 1) || erspan_idx == 0)
>> invarg("erspan index must be > 0 and <= 20-bit\n", *argv);
>> + } else if (strcmp(*argv, "erspan_ver") == 0) {
>> + NEXT_ARG();
>> + if (get_u8(&erspan_ver, *argv, 0))
>> + invarg("invalid erspan version\n", *argv);
>> + if (erspan_ver != 1 && erspan_ver != 2)
>> + invarg("erspan version must be 1 or 2\n", *argv);
>> + } else if (strcmp(*argv, "erspan_dir") == 0) {
>> + NEXT_ARG();
>> + if (strcmp(*argv, "ingress") == 0 ||
>> + strcmp(*argv, "in") == 0)
>> + erspan_dir = 0;
>> + else if (strcmp(*argv, "egress") == 0 ||
>> + strcmp(*argv, "e") == 0)
>
> iproute2 has a matches() function that should be used -- it basically
> allows whatever shorthand notation matches -- in this case e, eg, egr,
> egres, egress all match. Check out ip/iplink.c and search for matches.
>
Hi David,
Thanks, will fix it in the next version.
William
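For readers unfamiliar with the shorthand matching David describes, here is a rough standalone sketch of the idea. It is not the iproute2 code: prefix_matches() and parse_erspan_dir() are hypothetical names made up for this example, the helper follows the strcmp() convention of returning 0 on a match, and the value 1 for egress is an assumption (the quoted hunk only shows 0 for ingress).

```c
#include <string.h>

/* Hypothetical helper: returns 0 when `arg` is a non-empty prefix of
 * `keyword`, and -1 otherwise. */
static int prefix_matches(const char *arg, const char *keyword)
{
	size_t len = strlen(arg);

	if (len == 0 || len > strlen(keyword))
		return -1;
	return memcmp(keyword, arg, len) == 0 ? 0 : -1;
}

/* Hypothetical parser for the erspan_dir argument.
 * Returns 0 for ingress, 1 for egress (assumed value), -1 if invalid. */
static int parse_erspan_dir(const char *arg)
{
	if (prefix_matches(arg, "ingress") == 0)
		return 0;
	if (prefix_matches(arg, "egress") == 0)
		return 1;
	return -1;
}
```

With this approach "e", "eg", "egr" and "egress" all select egress without listing each abbreviation by hand, while a longer string such as "egressx" is rejected. Rejecting the empty string is a choice made in this sketch only.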