* [PATCH bpf-next 0/8] bpf: offload: report device back to user space (take 2)
From: Jakub Kicinski @ 2017-12-20 4:09 UTC
To: netdev, alexei.starovoitov, daniel; +Cc: ktkhai, oss-drivers, Jakub Kicinski
Hi!
This series is a redo of reporting offload device information to
user space, after the first attempt did not take namespaces into
account. As requested by Kirill, offloads are now protected by an
r/w semaphore. This allows us to remove the waitqueue and free the
offload state fully when the device is removed (suggested by Alexei).
The net namespace is reported as a device/inode pair.
The accompanying bpftool support is placed in common code because
maps will have very similar info. Note that the UAPI information
can't be nicely encapsulated into a struct: if we ever need to grow
the device information, the new fields will have to be added at the
end of struct bpf_prog_info, since we can't grow structures in the
middle of bpf_prog_info.
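To illustrate why new fields can only be appended, here is a minimal
sketch using Python's struct module. The two layouts are hypothetical,
heavily simplified stand-ins and not the real struct bpf_prog_info;
fill_info() is an assumed helper that mimics the kernel copying out at
most info_len bytes.

```python
import struct

# Hypothetical, simplified layouts; NOT the real struct bpf_prog_info.
# v1: type, id, name
INFO_V1 = struct.Struct("<II16s")
# v2: the v1 fields, then ifindex, 4 bytes of padding, netns_dev, netns_ino
INFO_V2 = struct.Struct("<II16sI4xQQ")


def fill_info(info_len):
    """Pack the newest layout and return at most info_len bytes of it."""
    full = INFO_V2.pack(1, 42, b"prog", 7, 3, 4026531993)
    return full[:min(info_len, INFO_V2.size)]


# Old user space only knows v1 and asks for v1-sized info.
old = INFO_V1.unpack(fill_info(INFO_V1.size))
# New user space asks for the full structure.
new = INFO_V2.unpack(fill_info(INFO_V2.size))

# The fields old user space understands sit at unchanged offsets.
print(old == new[:3])
```

Because the new members live strictly after the old ones, a caller built
against the shorter layout keeps parsing its prefix correctly. Inserting
a nested struct in the middle and growing it later would shift every
field that follows it.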
Jakub Kicinski (8):
bpf: offload: don't require rtnl for dev list manipulation
bpf: offload: don't use prog->aux->offload as boolean
bpf: offload: allow netdev to disappear while verifier is running
bpf: offload: free prog->aux->offload when device disappears
bpf: offload: free program id when device disappears
bpf: offload: report device information for offloaded programs
tools: bpftool: report device information for offloaded programs
selftests/bpf: test device info reporting for bound progs
drivers/net/ethernet/netronome/nfp/bpf/main.h | 2 +-
drivers/net/ethernet/netronome/nfp/bpf/verifier.c | 2 +-
drivers/net/netdevsim/bpf.c | 2 +-
fs/nsfs.c | 2 +-
include/linux/bpf.h | 16 ++-
include/linux/bpf_verifier.h | 16 +--
include/linux/netdevice.h | 4 +-
include/linux/proc_ns.h | 1 +
include/uapi/linux/bpf.h | 3 +
kernel/bpf/offload.c | 114 ++++++++++++++++------
kernel/bpf/syscall.c | 19 +++-
kernel/bpf/verifier.c | 20 ++--
tools/bpf/bpftool/common.c | 52 ++++++++++
tools/bpf/bpftool/main.h | 2 +
tools/bpf/bpftool/prog.c | 3 +
tools/include/uapi/linux/bpf.h | 3 +
tools/testing/selftests/bpf/test_offload.py | 107 +++++++++++++++++---
17 files changed, 287 insertions(+), 81 deletions(-)
--
2.15.1
* [PATCH bpf-next 1/8] bpf: offload: don't require rtnl for dev list manipulation
From: Jakub Kicinski @ 2017-12-20 4:09 UTC
To: netdev, alexei.starovoitov, daniel; +Cc: ktkhai, oss-drivers, Jakub Kicinski
In-Reply-To: <20171220041006.25629-1-jakub.kicinski@netronome.com>
We only need to hold rtnl_lock() around ndo calls. The device
offload initialization doesn't require it. Neither will the
soon-to-come querying of the offload info. Use struct rw_semaphore
because map offload will require sleeping with the semaphore
held for read.
Suggested-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
---
kernel/bpf/offload.c | 21 +++++++++++++++++----
1 file changed, 17 insertions(+), 4 deletions(-)
diff --git a/kernel/bpf/offload.c b/kernel/bpf/offload.c
index 8455b89d1bbf..b88e5ebdc61d 100644
--- a/kernel/bpf/offload.c
+++ b/kernel/bpf/offload.c
@@ -20,8 +20,12 @@
#include <linux/netdevice.h>
#include <linux/printk.h>
#include <linux/rtnetlink.h>
+#include <linux/rwsem.h>
-/* protected by RTNL */
+/* Protects bpf_prog_offload_devs and offload members of all progs.
+ * RTNL lock cannot be taken when holding this lock.
+ */
+static struct rw_semaphore bpf_devs_lock;
static LIST_HEAD(bpf_prog_offload_devs);
int bpf_prog_offload_init(struct bpf_prog *prog, union bpf_attr *attr)
@@ -43,17 +47,21 @@ int bpf_prog_offload_init(struct bpf_prog *prog, union bpf_attr *attr)
offload->prog = prog;
init_waitqueue_head(&offload->verifier_done);
- rtnl_lock();
+ /* Our UNREGISTER notifier will grab bpf_devs_lock, so we are safe
+ * to assume the netdev doesn't get unregistered as long as we hold
+ * bpf_devs_lock.
+ */
+ down_write(&bpf_devs_lock);
offload->netdev = __dev_get_by_index(net, attr->prog_ifindex);
if (!offload->netdev) {
- rtnl_unlock();
+ up_write(&bpf_devs_lock);
kfree(offload);
return -EINVAL;
}
prog->aux->offload = offload;
list_add_tail(&offload->offloads, &bpf_prog_offload_devs);
- rtnl_unlock();
+ up_write(&bpf_devs_lock);
return 0;
}
@@ -126,7 +134,9 @@ void bpf_prog_offload_destroy(struct bpf_prog *prog)
wake_up(&offload->verifier_done);
rtnl_lock();
+ down_write(&bpf_devs_lock);
__bpf_prog_offload_destroy(prog);
+ up_write(&bpf_devs_lock);
rtnl_unlock();
kfree(offload);
@@ -181,11 +191,13 @@ static int bpf_offload_notification(struct notifier_block *notifier,
if (netdev->reg_state != NETREG_UNREGISTERING)
break;
+ down_write(&bpf_devs_lock);
list_for_each_entry_safe(offload, tmp, &bpf_prog_offload_devs,
offloads) {
if (offload->netdev == netdev)
__bpf_prog_offload_destroy(offload->prog);
}
+ up_write(&bpf_devs_lock);
break;
default:
break;
@@ -199,6 +211,7 @@ static struct notifier_block bpf_offload_notifier = {
static int __init bpf_offload_init(void)
{
+ init_rwsem(&bpf_devs_lock);
register_netdevice_notifier(&bpf_offload_notifier);
return 0;
}
--
2.15.1
* [PATCH bpf-next 2/8] bpf: offload: don't use prog->aux->offload as boolean
From: Jakub Kicinski @ 2017-12-20 4:10 UTC
To: netdev, alexei.starovoitov, daniel; +Cc: ktkhai, oss-drivers, Jakub Kicinski
In-Reply-To: <20171220041006.25629-1-jakub.kicinski@netronome.com>
We currently use aux->offload to indicate that a program is bound
to a specific device. This forces us to keep the offload structure
around even after the device is gone. Add a bool member to
struct bpf_prog_aux to indicate whether offload was requested.
Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
---
include/linux/bpf.h | 3 ++-
kernel/bpf/syscall.c | 4 +++-
2 files changed, 5 insertions(+), 2 deletions(-)
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index da54ef644fcd..838eee10e979 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -201,6 +201,7 @@ struct bpf_prog_aux {
u32 stack_depth;
u32 id;
u32 func_cnt;
+ bool offload_requested;
struct bpf_prog **func;
void *jit_data; /* JIT specific data. arch dependent */
struct latch_tree_node ksym_tnode;
@@ -529,7 +530,7 @@ int bpf_prog_offload_init(struct bpf_prog *prog, union bpf_attr *attr);
static inline bool bpf_prog_is_dev_bound(struct bpf_prog_aux *aux)
{
- return aux->offload;
+ return aux->offload_requested;
}
#else
static inline int bpf_prog_offload_init(struct bpf_prog *prog,
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index e2e1c78ce1dc..1143db61584c 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1151,6 +1151,8 @@ static int bpf_prog_load(union bpf_attr *attr)
if (!prog)
return -ENOMEM;
+ prog->aux->offload_requested = !!attr->prog_ifindex;
+
err = security_bpf_prog_alloc(prog->aux);
if (err)
goto free_prog_nouncharge;
@@ -1172,7 +1174,7 @@ static int bpf_prog_load(union bpf_attr *attr)
atomic_set(&prog->aux->refcnt, 1);
prog->gpl_compatible = is_gpl ? 1 : 0;
- if (attr->prog_ifindex) {
+ if (bpf_prog_is_dev_bound(prog->aux)) {
err = bpf_prog_offload_init(prog, attr);
if (err)
goto free_prog;
--
2.15.1
* [PATCH bpf-next 3/8] bpf: offload: allow netdev to disappear while verifier is running
From: Jakub Kicinski @ 2017-12-20 4:10 UTC
To: netdev, alexei.starovoitov, daniel; +Cc: ktkhai, oss-drivers, Jakub Kicinski
In-Reply-To: <20171220041006.25629-1-jakub.kicinski@netronome.com>
To allow verifier instruction callbacks without any extra locking,
the NETDEV_UNREGISTER notification would wait on a waitqueue for the
verifier to finish. This design decision was made when rtnl lock was
providing all the locking. Use the read/write lock instead and remove
the waitqueue.
The verifier will now call into the offload code, so dev_ops are moved
to the offload structure. Since verifier calls are all under
bpf_prog_is_dev_bound(), we no longer need static inline implementations
to please builds with CONFIG_NET=n.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
---
drivers/net/ethernet/netronome/nfp/bpf/main.h | 2 +-
drivers/net/ethernet/netronome/nfp/bpf/verifier.c | 2 +-
drivers/net/netdevsim/bpf.c | 2 +-
include/linux/bpf.h | 9 +++++--
include/linux/bpf_verifier.h | 16 ++----------
include/linux/netdevice.h | 4 +--
kernel/bpf/offload.c | 30 ++++++++++++-----------
kernel/bpf/verifier.c | 20 ++++++---------
8 files changed, 37 insertions(+), 48 deletions(-)
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/main.h b/drivers/net/ethernet/netronome/nfp/bpf/main.h
index aae1be9ed056..89a9b6393882 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/main.h
+++ b/drivers/net/ethernet/netronome/nfp/bpf/main.h
@@ -238,7 +238,7 @@ struct nfp_bpf_vnic {
int nfp_bpf_jit(struct nfp_prog *prog);
-extern const struct bpf_ext_analyzer_ops nfp_bpf_analyzer_ops;
+extern const struct bpf_prog_offload_ops nfp_bpf_analyzer_ops;
struct netdev_bpf;
struct nfp_app;
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/verifier.c b/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
index 9c2608445bd8..d8870c2f11f3 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
@@ -260,6 +260,6 @@ nfp_verify_insn(struct bpf_verifier_env *env, int insn_idx, int prev_insn_idx)
return 0;
}
-const struct bpf_ext_analyzer_ops nfp_bpf_analyzer_ops = {
+const struct bpf_prog_offload_ops nfp_bpf_analyzer_ops = {
.insn_hook = nfp_verify_insn,
};
diff --git a/drivers/net/netdevsim/bpf.c b/drivers/net/netdevsim/bpf.c
index c977fece64a3..e363658405ee 100644
--- a/drivers/net/netdevsim/bpf.c
+++ b/drivers/net/netdevsim/bpf.c
@@ -66,7 +66,7 @@ nsim_bpf_verify_insn(struct bpf_verifier_env *env, int insn_idx, int prev_insn)
return 0;
}
-static const struct bpf_ext_analyzer_ops nsim_bpf_analyzer_ops = {
+static const struct bpf_prog_offload_ops nsim_bpf_analyzer_ops = {
.insn_hook = nsim_bpf_verify_insn,
};
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 838eee10e979..669549f7e3e8 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -17,6 +17,7 @@
#include <linux/numa.h>
#include <linux/wait.h>
+struct bpf_verifier_env;
struct perf_event;
struct bpf_prog;
struct bpf_map;
@@ -184,14 +185,18 @@ struct bpf_verifier_ops {
struct bpf_prog *prog, u32 *target_size);
};
+struct bpf_prog_offload_ops {
+ int (*insn_hook)(struct bpf_verifier_env *env,
+ int insn_idx, int prev_insn_idx);
+};
+
struct bpf_dev_offload {
struct bpf_prog *prog;
struct net_device *netdev;
void *dev_priv;
struct list_head offloads;
bool dev_state;
- bool verifier_running;
- wait_queue_head_t verifier_done;
+ const struct bpf_prog_offload_ops *dev_ops;
};
struct bpf_prog_aux {
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index aaac589e490c..02ede122d35b 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -166,12 +166,6 @@ static inline bool bpf_verifier_log_full(const struct bpf_verifer_log *log)
return log->len_used >= log->len_total - 1;
}
-struct bpf_verifier_env;
-struct bpf_ext_analyzer_ops {
- int (*insn_hook)(struct bpf_verifier_env *env,
- int insn_idx, int prev_insn_idx);
-};
-
#define BPF_MAX_SUBPROGS 256
/* single container for all structs
@@ -185,7 +179,6 @@ struct bpf_verifier_env {
bool strict_alignment; /* perform strict pointer alignment checks */
struct bpf_verifier_state *cur_state; /* current verifier state */
struct bpf_verifier_state_list **explored_states; /* search pruning optimization */
- const struct bpf_ext_analyzer_ops *dev_ops; /* device analyzer ops */
struct bpf_map *used_maps[MAX_USED_MAPS]; /* array of map's used by eBPF program */
u32 used_map_cnt; /* number of used maps */
u32 id_gen; /* used to generate unique reg IDs */
@@ -205,13 +198,8 @@ static inline struct bpf_reg_state *cur_regs(struct bpf_verifier_env *env)
return cur->frame[cur->curframe]->regs;
}
-#if defined(CONFIG_NET) && defined(CONFIG_BPF_SYSCALL)
int bpf_prog_offload_verifier_prep(struct bpf_verifier_env *env);
-#else
-static inline int bpf_prog_offload_verifier_prep(struct bpf_verifier_env *env)
-{
- return -EOPNOTSUPP;
-}
-#endif
+int bpf_prog_offload_verify_insn(struct bpf_verifier_env *env,
+ int insn_idx, int prev_insn_idx);
#endif /* _LINUX_BPF_VERIFIER_H */
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index cc4ce7456e38..0a1a4a111546 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -804,7 +804,7 @@ enum bpf_netdev_command {
BPF_OFFLOAD_DESTROY,
};
-struct bpf_ext_analyzer_ops;
+struct bpf_prog_offload_ops;
struct netlink_ext_ack;
struct netdev_bpf {
@@ -826,7 +826,7 @@ struct netdev_bpf {
/* BPF_OFFLOAD_VERIFIER_PREP */
struct {
struct bpf_prog *prog;
- const struct bpf_ext_analyzer_ops *ops; /* callee set */
+ const struct bpf_prog_offload_ops *ops; /* callee set */
} verifier;
/* BPF_OFFLOAD_TRANSLATE, BPF_OFFLOAD_DESTROY */
struct {
diff --git a/kernel/bpf/offload.c b/kernel/bpf/offload.c
index b88e5ebdc61d..cda2d8350fe1 100644
--- a/kernel/bpf/offload.c
+++ b/kernel/bpf/offload.c
@@ -45,7 +45,6 @@ int bpf_prog_offload_init(struct bpf_prog *prog, union bpf_attr *attr)
return -ENOMEM;
offload->prog = prog;
- init_waitqueue_head(&offload->verifier_done);
/* Our UNREGISTER notifier will grab bpf_devs_lock, so we are safe
* to assume the netdev doesn't get unregistered as long as we hold
@@ -95,15 +94,28 @@ int bpf_prog_offload_verifier_prep(struct bpf_verifier_env *env)
if (err)
goto exit_unlock;
- env->dev_ops = data.verifier.ops;
-
+ env->prog->aux->offload->dev_ops = data.verifier.ops;
env->prog->aux->offload->dev_state = true;
- env->prog->aux->offload->verifier_running = true;
exit_unlock:
rtnl_unlock();
return err;
}
+int bpf_prog_offload_verify_insn(struct bpf_verifier_env *env,
+ int insn_idx, int prev_insn_idx)
+{
+ struct bpf_dev_offload *offload;
+ int ret = -ENODEV;
+
+ down_read(&bpf_devs_lock);
+ offload = env->prog->aux->offload;
+ if (offload->netdev)
+ ret = offload->dev_ops->insn_hook(env, insn_idx, prev_insn_idx);
+ up_read(&bpf_devs_lock);
+
+ return ret;
+}
+
static void __bpf_prog_offload_destroy(struct bpf_prog *prog)
{
struct bpf_dev_offload *offload = prog->aux->offload;
@@ -115,9 +127,6 @@ static void __bpf_prog_offload_destroy(struct bpf_prog *prog)
data.offload.prog = prog;
- if (offload->verifier_running)
- wait_event(offload->verifier_done, !offload->verifier_running);
-
if (offload->dev_state)
WARN_ON(__bpf_offload_ndo(prog, BPF_OFFLOAD_DESTROY, &data));
@@ -130,9 +139,6 @@ void bpf_prog_offload_destroy(struct bpf_prog *prog)
{
struct bpf_dev_offload *offload = prog->aux->offload;
- offload->verifier_running = false;
- wake_up(&offload->verifier_done);
-
rtnl_lock();
down_write(&bpf_devs_lock);
__bpf_prog_offload_destroy(prog);
@@ -144,15 +150,11 @@ void bpf_prog_offload_destroy(struct bpf_prog *prog)
static int bpf_prog_offload_translate(struct bpf_prog *prog)
{
- struct bpf_dev_offload *offload = prog->aux->offload;
struct netdev_bpf data = {};
int ret;
data.offload.prog = prog;
- offload->verifier_running = false;
- wake_up(&offload->verifier_done);
-
rtnl_lock();
ret = __bpf_offload_ndo(prog, BPF_OFFLOAD_TRANSLATE, &data);
rtnl_unlock();
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 48b2901cf483..6b95efad5828 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -4341,15 +4341,6 @@ static int is_state_visited(struct bpf_verifier_env *env, int insn_idx)
return 0;
}
-static int ext_analyzer_insn_hook(struct bpf_verifier_env *env,
- int insn_idx, int prev_insn_idx)
-{
- if (env->dev_ops && env->dev_ops->insn_hook)
- return env->dev_ops->insn_hook(env, insn_idx, prev_insn_idx);
-
- return 0;
-}
-
static int do_check(struct bpf_verifier_env *env)
{
struct bpf_verifier_state *state;
@@ -4431,9 +4422,12 @@ static int do_check(struct bpf_verifier_env *env)
env->allow_ptr_leaks);
}
- err = ext_analyzer_insn_hook(env, insn_idx, prev_insn_idx);
- if (err)
- return err;
+ if (bpf_prog_is_dev_bound(env->prog->aux)) {
+ err = bpf_prog_offload_verify_insn(env, insn_idx,
+ prev_insn_idx);
+ if (err)
+ return err;
+ }
regs = cur_regs(env);
env->insn_aux_data[insn_idx].seen = true;
@@ -5341,7 +5335,7 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr)
if (!IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS))
env->strict_alignment = true;
- if (env->prog->aux->offload) {
+ if (bpf_prog_is_dev_bound(env->prog->aux)) {
ret = bpf_prog_offload_verifier_prep(env);
if (ret)
goto err_unlock;
--
2.15.1
* [PATCH bpf-next 4/8] bpf: offload: free prog->aux->offload when device disappears
From: Jakub Kicinski @ 2017-12-20 4:10 UTC
To: netdev, alexei.starovoitov, daniel; +Cc: ktkhai, oss-drivers, Jakub Kicinski
In-Reply-To: <20171220041006.25629-1-jakub.kicinski@netronome.com>
All bpf offload operations should now be under bpf_devs_lock, so
it's safe to free and clear the entire offload structure,
not only the netdev pointer.
__bpf_prog_offload_destroy() will no longer be called multiple
times.
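The resulting lifecycle can be modelled in a few lines of Python. This
is only a toy sketch: the class and function names are hypothetical, and
a plain threading.Lock stands in for bpf_devs_lock, so it does not
capture the reader/writer distinction of the real rw_semaphore.

```python
import errno
import threading

# Stand-in for bpf_devs_lock (a plain mutex here, not an rwsem).
devs_lock = threading.Lock()


class Offload:
    """Toy counterpart of the per-program offload state."""

    def __init__(self, netdev, insn_hook):
        self.netdev = netdev
        self.insn_hook = insn_hook


class Prog:
    def __init__(self, offload):
        # Becomes None once the device is gone.
        self.offload = offload


def verify_insn(prog, insn_idx):
    """Call the device hook, or fail with -ENODEV if the state was freed."""
    with devs_lock:
        offload = prog.offload
        if offload is None:
            return -errno.ENODEV
        return offload.insn_hook(insn_idx)


def device_unregistered(progs, netdev):
    """Drop the whole offload state of every program bound to netdev."""
    with devs_lock:
        for prog in progs:
            if prog.offload is not None and prog.offload.netdev == netdev:
                prog.offload = None


prog = Prog(Offload("eth0", lambda insn_idx: 0))
before = verify_insn(prog, 0)
device_unregistered([prog], "eth0")
after = verify_insn(prog, 0)
print(before, after == -errno.ENODEV)
```

Since the state is dropped exactly once and every user re-reads the
pointer under the lock, a second unregister pass finds nothing left to
destroy.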
Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
---
kernel/bpf/offload.c | 23 +++++++++--------------
1 file changed, 9 insertions(+), 14 deletions(-)
diff --git a/kernel/bpf/offload.c b/kernel/bpf/offload.c
index cda2d8350fe1..9988dc4038e6 100644
--- a/kernel/bpf/offload.c
+++ b/kernel/bpf/offload.c
@@ -68,12 +68,14 @@ int bpf_prog_offload_init(struct bpf_prog *prog, union bpf_attr *attr)
static int __bpf_offload_ndo(struct bpf_prog *prog, enum bpf_netdev_command cmd,
struct netdev_bpf *data)
{
- struct net_device *netdev = prog->aux->offload->netdev;
+ struct bpf_dev_offload *offload = prog->aux->offload;
+ struct net_device *netdev;
ASSERT_RTNL();
- if (!netdev)
+ if (!offload)
return -ENODEV;
+ netdev = offload->netdev;
if (!netdev->netdev_ops->ndo_bpf)
return -EOPNOTSUPP;
@@ -109,7 +111,7 @@ int bpf_prog_offload_verify_insn(struct bpf_verifier_env *env,
down_read(&bpf_devs_lock);
offload = env->prog->aux->offload;
- if (offload->netdev)
+ if (offload)
ret = offload->dev_ops->insn_hook(env, insn_idx, prev_insn_idx);
up_read(&bpf_devs_lock);
@@ -121,31 +123,24 @@ static void __bpf_prog_offload_destroy(struct bpf_prog *prog)
struct bpf_dev_offload *offload = prog->aux->offload;
struct netdev_bpf data = {};
- /* Caution - if netdev is destroyed before the program, this function
- * will be called twice.
- */
-
data.offload.prog = prog;
if (offload->dev_state)
WARN_ON(__bpf_offload_ndo(prog, BPF_OFFLOAD_DESTROY, &data));
- offload->dev_state = false;
list_del_init(&offload->offloads);
- offload->netdev = NULL;
+ kfree(offload);
+ prog->aux->offload = NULL;
}
void bpf_prog_offload_destroy(struct bpf_prog *prog)
{
- struct bpf_dev_offload *offload = prog->aux->offload;
-
rtnl_lock();
down_write(&bpf_devs_lock);
- __bpf_prog_offload_destroy(prog);
+ if (prog->aux->offload)
+ __bpf_prog_offload_destroy(prog);
up_write(&bpf_devs_lock);
rtnl_unlock();
-
- kfree(offload);
}
static int bpf_prog_offload_translate(struct bpf_prog *prog)
--
2.15.1
* [PATCH bpf-next 6/8] bpf: offload: report device information for offloaded programs
From: Jakub Kicinski @ 2017-12-20 4:10 UTC
To: netdev, alexei.starovoitov, daniel
Cc: ktkhai, oss-drivers, Jakub Kicinski, Eric W . Biederman
In-Reply-To: <20171220041006.25629-1-jakub.kicinski@netronome.com>
Report the ifindex and namespace information of offloaded programs
to the user. If the device has disappeared, return -ENODEV. Specify
the namespace using a dev/inode combination.
CC: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
---
fs/nsfs.c | 2 +-
include/linux/bpf.h | 2 ++
include/linux/proc_ns.h | 1 +
include/uapi/linux/bpf.h | 3 +++
kernel/bpf/offload.c | 39 +++++++++++++++++++++++++++++++++++++++
kernel/bpf/syscall.c | 6 ++++++
tools/include/uapi/linux/bpf.h | 3 +++
7 files changed, 55 insertions(+), 1 deletion(-)
diff --git a/fs/nsfs.c b/fs/nsfs.c
index 7c6f76d29f56..e50628675935 100644
--- a/fs/nsfs.c
+++ b/fs/nsfs.c
@@ -51,7 +51,7 @@ static void nsfs_evict(struct inode *inode)
ns->ops->put(ns);
}
-static void *__ns_get_path(struct path *path, struct ns_common *ns)
+void *__ns_get_path(struct path *path, struct ns_common *ns)
{
struct vfsmount *mnt = nsfs_mnt;
struct dentry *dentry;
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 9a916ab34299..7810ae57b357 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -531,6 +531,8 @@ static inline struct bpf_prog *bpf_prog_get_type(u32 ufd,
int bpf_prog_offload_compile(struct bpf_prog *prog);
void bpf_prog_offload_destroy(struct bpf_prog *prog);
+int bpf_prog_offload_info_fill(struct bpf_prog_info *info,
+ struct bpf_prog *prog);
#if defined(CONFIG_NET) && defined(CONFIG_BPF_SYSCALL)
int bpf_prog_offload_init(struct bpf_prog *prog, union bpf_attr *attr);
diff --git a/include/linux/proc_ns.h b/include/linux/proc_ns.h
index 2ff18c9840a7..1733359cf713 100644
--- a/include/linux/proc_ns.h
+++ b/include/linux/proc_ns.h
@@ -76,6 +76,7 @@ static inline int ns_alloc_inum(struct ns_common *ns)
extern struct file *proc_ns_fget(int fd);
#define get_proc_ns(inode) ((struct ns_common *)(inode)->i_private)
+extern void *__ns_get_path(struct path *path, struct ns_common *ns);
extern void *ns_get_path(struct path *path, struct task_struct *task,
const struct proc_ns_operations *ns_ops);
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index d01f1cb3cfc0..72b37fc3bc0c 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -921,6 +921,9 @@ struct bpf_prog_info {
__u32 nr_map_ids;
__aligned_u64 map_ids;
char name[BPF_OBJ_NAME_LEN];
+ __u32 ifindex;
+ __u64 netns_dev;
+ __u64 netns_ino;
} __attribute__((aligned(8)));
struct bpf_map_info {
diff --git a/kernel/bpf/offload.c b/kernel/bpf/offload.c
index 1af94cb4f815..0543f24542ae 100644
--- a/kernel/bpf/offload.c
+++ b/kernel/bpf/offload.c
@@ -16,9 +16,11 @@
#include <linux/bpf.h>
#include <linux/bpf_verifier.h>
#include <linux/bug.h>
+#include <linux/kdev_t.h>
#include <linux/list.h>
#include <linux/netdevice.h>
#include <linux/printk.h>
+#include <linux/proc_ns.h>
#include <linux/rtnetlink.h>
#include <linux/rwsem.h>
@@ -174,6 +176,43 @@ int bpf_prog_offload_compile(struct bpf_prog *prog)
return bpf_prog_offload_translate(prog);
}
+int bpf_prog_offload_info_fill(struct bpf_prog_info *info,
+ struct bpf_prog *prog)
+{
+ struct bpf_dev_offload *offload;
+ struct inode *ns_inode;
+ struct path ns_path;
+ struct net *net;
+ void *ptr;
+
+again:
+ down_read(&bpf_devs_lock);
+ offload = prog->aux->offload;
+ if (!offload) {
+ up_read(&bpf_devs_lock);
+ return -ENODEV;
+ }
+
+ net = dev_net(offload->netdev);
+ get_net(net); /* __ns_get_path() drops the reference */
+
+ ptr = __ns_get_path(&ns_path, &net->ns);
+ if (IS_ERR(ptr)) {
+ up_read(&bpf_devs_lock);
+ if (PTR_ERR(ptr) == -EAGAIN)
+ goto again;
+ return PTR_ERR(ptr);
+ }
+ ns_inode = ns_path.dentry->d_inode;
+
+ info->ifindex = offload->netdev->ifindex;
+ info->netns_dev = new_encode_dev(ns_inode->i_sb->s_dev);
+ info->netns_ino = ns_inode->i_ino;
+ up_read(&bpf_devs_lock);
+
+ return 0;
+}
+
const struct bpf_prog_ops bpf_offload_prog_ops = {
};
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 7d9f5b0f0e49..20444fd678d0 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1624,6 +1624,12 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog *prog,
return -EFAULT;
}
+ if (bpf_prog_is_dev_bound(prog->aux)) {
+ err = bpf_prog_offload_info_fill(&info, prog);
+ if (err)
+ return err;
+ }
+
done:
if (copy_to_user(uinfo, &info, info_len) ||
put_user(info_len, &uattr->info.info_len))
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index db1b0923a308..4e8c60acfa32 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -921,6 +921,9 @@ struct bpf_prog_info {
__u32 nr_map_ids;
__aligned_u64 map_ids;
char name[BPF_OBJ_NAME_LEN];
+ __u32 ifindex;
+ __u64 netns_dev;
+ __u64 netns_ino;
} __attribute__((aligned(8)));
struct bpf_map_info {
--
2.15.1
* [PATCH bpf-next 5/8] bpf: offload: free program id when device disappears
From: Jakub Kicinski @ 2017-12-20 4:10 UTC
To: netdev, alexei.starovoitov, daniel; +Cc: ktkhai, oss-drivers, Jakub Kicinski
In-Reply-To: <20171220041006.25629-1-jakub.kicinski@netronome.com>
Bound programs are quite useless after their device disappears.
They are simply waiting for their reference count to go to zero.
Don't list them in BPF_PROG_GET_NEXT_ID; free their ID early
instead.
Note that orphaned offload programs will return -ENODEV on
BPF_OBJ_GET_INFO_BY_FD, so the user will never see ID 0.
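As a rough illustration of the intended behaviour, the following toy
model replaces the kernel's idr with a dict. ProgIds and its methods
are hypothetical names invented for this sketch; only the idea of
removing the entry and zeroing the ID comes from the patch.

```python
class ProgIds:
    """Toy ID table; a dict stands in for the kernel's idr store."""

    def __init__(self):
        self._by_id = {}
        self._next = 1

    def alloc(self, prog):
        prog["id"] = self._next
        self._by_id[self._next] = prog
        self._next += 1

    def free(self, prog):
        # ID 0 means "not in the store", so freeing twice is harmless.
        if not prog["id"]:
            return
        del self._by_id[prog["id"]]
        prog["id"] = 0

    def get_next_id(self, start):
        """Return the smallest live ID greater than start, or None."""
        later = [i for i in self._by_id if i > start]
        return min(later) if later else None


ids = ProgIds()
bound = {"id": 0}
other = {"id": 0}
ids.alloc(bound)
ids.alloc(other)

first_before = ids.get_next_id(0)
ids.free(bound)  # the device disappeared
first_after = ids.get_next_id(0)
print(first_before, first_after)
```

Once the entry is removed, iteration skips straight to the next live
program, while anyone still holding a reference to the orphaned program
keeps it alive until the last put.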
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
---
include/linux/bpf.h | 2 ++
kernel/bpf/offload.c | 3 +++
kernel/bpf/syscall.c | 9 +++++++--
3 files changed, 12 insertions(+), 2 deletions(-)
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 669549f7e3e8..9a916ab34299 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -357,6 +357,8 @@ void bpf_prog_put(struct bpf_prog *prog);
int __bpf_prog_charge(struct user_struct *user, u32 pages);
void __bpf_prog_uncharge(struct user_struct *user, u32 pages);
+void bpf_prog_free_id(struct bpf_prog *prog, bool do_idr_lock);
+
struct bpf_map *bpf_map_get_with_uref(u32 ufd);
struct bpf_map *__bpf_map_get(struct fd f);
struct bpf_map * __must_check bpf_map_inc(struct bpf_map *map, bool uref);
diff --git a/kernel/bpf/offload.c b/kernel/bpf/offload.c
index 9988dc4038e6..1af94cb4f815 100644
--- a/kernel/bpf/offload.c
+++ b/kernel/bpf/offload.c
@@ -128,6 +128,9 @@ static void __bpf_prog_offload_destroy(struct bpf_prog *prog)
if (offload->dev_state)
WARN_ON(__bpf_offload_ndo(prog, BPF_OFFLOAD_DESTROY, &data));
+ /* Make sure BPF_PROG_GET_NEXT_ID can't find this dead program */
+ bpf_prog_free_id(prog, true);
+
list_del_init(&offload->offloads);
kfree(offload);
prog->aux->offload = NULL;
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 1143db61584c..7d9f5b0f0e49 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -905,9 +905,13 @@ static int bpf_prog_alloc_id(struct bpf_prog *prog)
return id > 0 ? 0 : id;
}
-static void bpf_prog_free_id(struct bpf_prog *prog, bool do_idr_lock)
+void bpf_prog_free_id(struct bpf_prog *prog, bool do_idr_lock)
{
- /* cBPF to eBPF migrations are currently not in the idr store. */
+ /* cBPF to eBPF migrations are currently not in the idr store.
+ * Offloaded programs are removed from the store when their device
+ * disappears - even if someone grabs an fd to them they are unusable,
+ * simply waiting for refcnt to drop to be freed.
+ */
if (!prog->aux->id)
return;
@@ -917,6 +921,7 @@ static void bpf_prog_free_id(struct bpf_prog *prog, bool do_idr_lock)
__acquire(&prog_idr_lock);
idr_remove(&prog_idr, prog->aux->id);
+ prog->aux->id = 0;
if (do_idr_lock)
spin_unlock_bh(&prog_idr_lock);
--
2.15.1
* [PATCH bpf-next 7/8] tools: bpftool: report device information for offloaded programs
From: Jakub Kicinski @ 2017-12-20 4:10 UTC
To: netdev, alexei.starovoitov, daniel; +Cc: ktkhai, oss-drivers, Jakub Kicinski
In-Reply-To: <20171220041006.25629-1-jakub.kicinski@netronome.com>
Print the just-exposed information about the device to which
a program is bound.
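For readers who prefer a scripting view, the name lookup can be sketched
in Python as below. This is an illustrative analogue rather than part of
the patch: the ns_path parameter is an assumption added for testability,
and the dev/inode values are compared directly against os.stat() output,
mirroring the direct comparison in the C helper.

```python
import os
import socket


def ifindex_to_name_ns(ifindex, ns_dev, ns_ino, ns_path="/proc/self/ns/net"):
    """Resolve ifindex to a name only if we are in the reported netns.

    Returns None when the namespace cannot be inspected, when it differs
    from the reported dev/inode pair, or when the ifindex is unknown.
    """
    try:
        st = os.stat(ns_path)
    except OSError:
        return None

    # An ifindex is only meaningful inside the namespace it came from.
    if st.st_dev != ns_dev or st.st_ino != ns_ino:
        return None

    try:
        return socket.if_indextoname(ifindex)
    except OSError:
        return None
```

A caller would fall back to printing the raw ifindex and the dev/inode
pair whenever this returns None, which is what the plain-text output
path in the patch does.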
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
---
tools/bpf/bpftool/common.c | 52 ++++++++++++++++++++++++++++++++++++++++++++++
tools/bpf/bpftool/main.h | 2 ++
tools/bpf/bpftool/prog.c | 3 +++
3 files changed, 57 insertions(+)
diff --git a/tools/bpf/bpftool/common.c b/tools/bpf/bpftool/common.c
index b62c94e3997a..6601c95a9258 100644
--- a/tools/bpf/bpftool/common.c
+++ b/tools/bpf/bpftool/common.c
@@ -44,7 +44,9 @@
#include <unistd.h>
#include <linux/limits.h>
#include <linux/magic.h>
+#include <net/if.h>
#include <sys/mount.h>
+#include <sys/stat.h>
#include <sys/types.h>
#include <sys/vfs.h>
@@ -412,3 +414,53 @@ void delete_pinned_obj_table(struct pinned_obj_table *tab)
free(obj);
}
}
+
+static char *
+ifindex_to_name_ns(__u32 ifindex, __u32 ns_dev, __u32 ns_ino, char *buf)
+{
+ struct stat st;
+ int err;
+
+ err = stat("/proc/self/ns/net", &st);
+ if (err) {
+ p_err("Can't stat /proc/self/ns/net: %s", strerror(errno));
+ return NULL;
+ }
+
+ if (st.st_dev != ns_dev || st.st_ino != ns_ino)
+ return NULL;
+
+ return if_indextoname(ifindex, buf);
+}
+
+void print_dev_plain(__u32 ifindex, __u64 ns_dev, __u64 ns_inode)
+{
+ char name[IF_NAMESIZE];
+
+ if (!ifindex)
+ return;
+
+ printf(" dev ");
+ if (ifindex_to_name_ns(ifindex, ns_dev, ns_inode, name))
+ printf("%s", name);
+ else
+ printf("ifindex %u ns_dev %llu ns_ino %llu",
+ ifindex, ns_dev, ns_inode);
+}
+
+void print_dev_json(__u32 ifindex, __u64 ns_dev, __u64 ns_inode)
+{
+ char name[IF_NAMESIZE];
+
+ if (!ifindex)
+ return;
+
+ jsonw_name(json_wtr, "dev");
+ jsonw_start_object(json_wtr);
+ jsonw_uint_field(json_wtr, "ifindex", ifindex);
+ jsonw_uint_field(json_wtr, "ns_dev", ns_dev);
+ jsonw_uint_field(json_wtr, "ns_inode", ns_inode);
+ if (ifindex_to_name_ns(ifindex, ns_dev, ns_inode, name))
+ jsonw_string_field(json_wtr, "ifname", name);
+ jsonw_end_object(json_wtr);
+}
diff --git a/tools/bpf/bpftool/main.h b/tools/bpf/bpftool/main.h
index 8f6d3cac0347..65b526fe6e7e 100644
--- a/tools/bpf/bpftool/main.h
+++ b/tools/bpf/bpftool/main.h
@@ -96,6 +96,8 @@ struct pinned_obj {
int build_pinned_obj_table(struct pinned_obj_table *table,
enum bpf_obj_type type);
void delete_pinned_obj_table(struct pinned_obj_table *tab);
+void print_dev_plain(__u32 ifindex, __u64 ns_dev, __u64 ns_inode);
+void print_dev_json(__u32 ifindex, __u64 ns_dev, __u64 ns_inode);
struct cmd {
const char *cmd;
diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c
index 037484ceaeaf..4ccf6301f0fe 100644
--- a/tools/bpf/bpftool/prog.c
+++ b/tools/bpf/bpftool/prog.c
@@ -230,6 +230,8 @@ static void print_prog_json(struct bpf_prog_info *info, int fd)
info->tag[0], info->tag[1], info->tag[2], info->tag[3],
info->tag[4], info->tag[5], info->tag[6], info->tag[7]);
+ print_dev_json(info->ifindex, info->netns_dev, info->netns_ino);
+
if (info->load_time) {
char buf[32];
@@ -287,6 +289,7 @@ static void print_prog_plain(struct bpf_prog_info *info, int fd)
printf("tag ");
fprint_hex(stdout, info->tag, BPF_TAG_SIZE, "");
+ print_dev_plain(info->ifindex, info->netns_dev, info->netns_ino);
printf("\n");
if (info->load_time) {
--
2.15.1
* [PATCH bpf-next 8/8] selftests/bpf: test device info reporting for bound progs
From: Jakub Kicinski @ 2017-12-20 4:10 UTC
To: netdev, alexei.starovoitov, daniel; +Cc: ktkhai, oss-drivers, Jakub Kicinski
In-Reply-To: <20171220041006.25629-1-jakub.kicinski@netronome.com>
Check if bound programs report correct device info. Test
in the local namespace, in a remote one, and back in the local ns,
then remove the device and check that the information is cleared.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
---
tools/testing/selftests/bpf/test_offload.py | 107 +++++++++++++++++++++++++---
1 file changed, 96 insertions(+), 11 deletions(-)
diff --git a/tools/testing/selftests/bpf/test_offload.py b/tools/testing/selftests/bpf/test_offload.py
index c940505c2978..a581eb5b0f05 100755
--- a/tools/testing/selftests/bpf/test_offload.py
+++ b/tools/testing/selftests/bpf/test_offload.py
@@ -18,6 +18,8 @@ import argparse
import json
import os
import pprint
+import random
+import string
import subprocess
import time
@@ -27,6 +29,7 @@ bpf_test_dir = os.path.dirname(os.path.realpath(__file__))
pp = pprint.PrettyPrinter()
devs = [] # devices we created for clean up
files = [] # files to be removed
+netns = [] # net namespaces to be removed
def log_get_sec(level=0):
return "*" * (log_level + level)
@@ -128,22 +131,25 @@ files = [] # files to be removed
if f in files:
files.remove(f)
-def tool(name, args, flags, JSON=True, fail=True):
+def tool(name, args, flags, JSON=True, ns="", fail=True):
params = ""
if JSON:
params += "%s " % (flags["json"])
- ret, out = cmd(name + " " + params + args, fail=fail)
+ if ns != "":
+ ns = "ip netns exec %s " % (ns)
+
+ ret, out = cmd(ns + name + " " + params + args, fail=fail)
if JSON and len(out.strip()) != 0:
return ret, json.loads(out)
else:
return ret, out
-def bpftool(args, JSON=True, fail=True):
- return tool("bpftool", args, {"json":"-p"}, JSON=JSON, fail=fail)
+def bpftool(args, JSON=True, ns="", fail=True):
+ return tool("bpftool", args, {"json":"-p"}, JSON=JSON, ns=ns, fail=fail)
-def bpftool_prog_list(expected=None):
- _, progs = bpftool("prog show", JSON=True, fail=True)
+def bpftool_prog_list(expected=None, ns=""):
+ _, progs = bpftool("prog show", JSON=True, ns=ns, fail=True)
if expected is not None:
if len(progs) != expected:
fail(True, "%d BPF programs loaded, expected %d" %
@@ -158,13 +164,13 @@ files = [] # files to be removed
time.sleep(0.05)
raise Exception("Time out waiting for program counts to stabilize want %d, have %d" % (expected, nprogs))
-def ip(args, force=False, JSON=True, fail=True):
+def ip(args, force=False, JSON=True, ns="", fail=True):
if force:
args = "-force " + args
- return tool("ip", args, {"json":"-j"}, JSON=JSON, fail=fail)
+ return tool("ip", args, {"json":"-j"}, JSON=JSON, ns=ns, fail=fail)
-def tc(args, JSON=True, fail=True):
- return tool("tc", args, {"json":"-p"}, JSON=JSON, fail=fail)
+def tc(args, JSON=True, ns="", fail=True):
+ return tool("tc", args, {"json":"-p"}, JSON=JSON, ns=ns, fail=fail)
def ethtool(dev, opt, args, fail=True):
return cmd("ethtool %s %s %s" % (opt, dev["ifname"], args), fail=fail)
@@ -178,6 +184,15 @@ files = [] # files to be removed
def bpf_bytecode(bytecode):
return "bytecode \"%s\"" % (bytecode)
+def mknetns(n_retry=10):
+ for i in range(n_retry):
+ name = ''.join([random.choice(string.ascii_letters) for i in range(8)])
+ ret, _ = ip("netns add %s" % (name), fail=False)
+ if ret == 0:
+ netns.append(name)
+ return name
+ return None
+
class DebugfsDir:
"""
Class for accessing DebugFS directories as a dictionary.
@@ -237,6 +252,8 @@ files = [] # files to be removed
self.dev = self._netdevsim_create()
devs.append(self)
+ self.ns = ""
+
self.dfs_dir = '/sys/kernel/debug/netdevsim/%s' % (self.dev['ifname'])
self.dfs_refresh()
@@ -257,7 +274,7 @@ files = [] # files to be removed
def remove(self):
devs.remove(self)
- ip("link del dev %s" % (self.dev["ifname"]))
+ ip("link del dev %s" % (self.dev["ifname"]), ns=self.ns)
def dfs_refresh(self):
self.dfs = DebugfsDir(self.dfs_dir)
@@ -285,6 +302,11 @@ files = [] # files to be removed
time.sleep(0.05)
raise Exception("Time out waiting for program counts to stabilize want %d/%d, have %d bound, %d loaded" % (bound, total, nbound, nprogs))
+ def set_ns(self, ns):
+ name = "1" if ns == "" else ns
+ ip("link set dev %s netns %s" % (self.dev["ifname"], name), ns=self.ns)
+ self.ns = ns
+
def set_mtu(self, mtu, fail=True):
return ip("link set dev %s mtu %d" % (self.dev["ifname"], mtu),
fail=fail)
@@ -372,6 +394,8 @@ files = [] # files to be removed
dev.remove()
for f in files:
cmd("rm -f %s" % (f))
+ for ns in netns:
+ cmd("ip netns delete %s" % (ns))
def pin_prog(file_name, idx=0):
progs = bpftool_prog_list(expected=(idx + 1))
@@ -381,6 +405,30 @@ files = [] # files to be removed
return file_name, bpf_pinned(file_name)
+def check_dev_info(other_ns, ns, removed=False):
+ if removed:
+ bpftool_prog_list()
+ return
+ progs = bpftool_prog_list(expected=int(not removed), ns=ns)
+ prog = progs[0]
+
+ fail("dev" not in prog.keys(), "Device parameters not reported")
+ dev = prog["dev"]
+ fail("ifindex" not in dev.keys(), "Device parameters not reported")
+ fail("ns_dev" not in dev.keys(), "Device parameters not reported")
+ fail("ns_inode" not in dev.keys(), "Device parameters not reported")
+
+ if not removed and not other_ns:
+ fail("ifname" not in dev.keys(), "Ifname not reported")
+ fail(dev["ifname"] != sim["ifname"],
+ "Ifname incorrect %s vs %s" % (dev["ifname"], sim["ifname"]))
+ else:
+ fail("ifname" in dev.keys(), "Ifname is reported for other ns")
+ if removed:
+ fail(dev["ifindex"] != 0, "Device parameters not zero on removed")
+ fail(dev["ns_dev"] != 0, "Device parameters not zero on removed")
+ fail(dev["ns_inode"] != 0, "Device parameters not zero on removed")
+
# Parse command line
parser = argparse.ArgumentParser()
parser.add_argument("--log", help="output verbose log to given file")
@@ -417,6 +465,12 @@ samples = ["sample_ret0.o"]
skip(ret != 0, "sample %s/%s not found, please compile it" %
(bpf_test_dir, s))
+# Check if net namespaces seem to work
+ns = mknetns()
+skip(ns is None, "Could not create a net namespace")
+cmd("ip netns delete %s" % (ns))
+netns = []
+
try:
obj = bpf_obj("sample_ret0.o")
bytecode = bpf_bytecode("1,6 0 0 4294967295,")
@@ -549,6 +603,8 @@ samples = ["sample_ret0.o"]
progs = bpftool_prog_list(expected=1)
fail(ipl["xdp"]["prog"]["id"] != progs[0]["id"],
"Loaded program has wrong ID")
+ fail("dev" in progs[0].keys(),
+ "Device parameters reported for non-offloaded program")
start_test("Test XDP prog replace with bad flags...")
ret, _ = sim.set_xdp(obj, "offload", force=True, fail=False)
@@ -673,6 +729,35 @@ samples = ["sample_ret0.o"]
fail(time_diff < delay_sec, "Removal process took %s, expected %s" %
(time_diff, delay_sec))
+ # Remove all pinned files and reinstantiate the netdev
+ clean_up()
+ bpftool_prog_list_wait(expected=0)
+
+ sim = NetdevSim()
+ sim.set_ethtool_tc_offloads(True)
+ sim.set_xdp(obj, "offload")
+
+ start_test("Test bpftool bound info reporting (own ns)...")
+ check_dev_info(False, "")
+
+ start_test("Test bpftool bound info reporting (other ns)...")
+ ns = mknetns()
+ sim.set_ns(ns)
+ check_dev_info(True, "")
+
+ start_test("Test bpftool bound info reporting (remote ns)...")
+ check_dev_info(False, ns)
+
+ start_test("Test bpftool bound info reporting (back to own ns)...")
+ sim.set_ns("")
+ check_dev_info(False, "")
+
+ pin_prog("/sys/fs/bpf/tmp")
+ sim.remove()
+
+ start_test("Test bpftool bound info reporting (removed dev)...")
+ check_dev_info(True, "", removed=True)
+
print("%s: OK" % (os.path.basename(__file__)))
finally:
--
2.15.1
* [PATCH net-next v4 0/6] net: tcp: sctp: dccp: Replace jprobe usage with trace events
From: Masami Hiramatsu @ 2017-12-20 4:14 UTC (permalink / raw)
To: Ingo Molnar, David S . Miller, Ian McDonald, Vlad Yasevich,
Stephen Hemminger, Steven Rostedt
Cc: Peter Zijlstra, Thomas Gleixner, LKML, H . Peter Anvin,
Gerrit Renker, Neil Horman, dccp, netdev, linux-sctp,
Stephen Rothwell, mhiramat
Hi,
This series is v4 of the replacement of jprobe usage with trace
events. This version is rebased on net-next, fixes a build warning
and moves a temporary variable definition into a block.
The previous version is here:
https://lkml.org/lkml/2017/12/19/153
Changes from v3:
All: Rebased on net-next
[3/6]: fixes a build warning for i386 by casting the pointer to
unsigned long instead of __u64, and moves a temporary variable
definition into a block.
Thank you,
---
Masami Hiramatsu (6):
net: tcp: Add trace events for TCP congestion window tracing
net: tcp: Remove TCP probe module
net: sctp: Add SCTP ACK tracking trace event
net: sctp: Remove debug SCTP probe module
net: dccp: Add DCCP sendmsg trace event
net: dccp: Remove dccpprobe module
include/trace/events/sctp.h | 99 ++++++++++++++
include/trace/events/tcp.h | 80 +++++++++++
net/Kconfig | 17 --
net/dccp/Kconfig | 17 --
net/dccp/Makefile | 2
net/dccp/probe.c | 203 -----------------------------
net/dccp/proto.c | 5 +
net/dccp/trace.h | 105 +++++++++++++++
net/ipv4/Makefile | 1
net/ipv4/tcp_input.c | 3
net/ipv4/tcp_probe.c | 301 -------------------------------------------
net/sctp/Kconfig | 12 --
net/sctp/Makefile | 3
net/sctp/probe.c | 244 -----------------------------------
net/sctp/sm_statefuns.c | 5 +
15 files changed, 297 insertions(+), 800 deletions(-)
create mode 100644 include/trace/events/sctp.h
delete mode 100644 net/dccp/probe.c
create mode 100644 net/dccp/trace.h
delete mode 100644 net/ipv4/tcp_probe.c
delete mode 100644 net/sctp/probe.c
--
Masami Hiramatsu (Linaro) <mhiramat@kernel.org>
* [PATCH net-next v4 1/6] net: tcp: Add trace events for TCP congestion window tracing
From: Masami Hiramatsu @ 2017-12-20 4:14 UTC (permalink / raw)
To: Ingo Molnar, David S . Miller, Ian McDonald, Vlad Yasevich,
Stephen Hemminger, Steven Rostedt
Cc: Peter Zijlstra, Thomas Gleixner, LKML, H . Peter Anvin,
Gerrit Renker, Neil Horman, dccp, netdev, linux-sctp,
Stephen Rothwell, mhiramat
In-Reply-To: <151374325126.2497.6934744693865165386.stgit@devbox>
This adds an event to trace TCP state variables with a
slightly intrusive trace event. It uses the ftrace/perf
event log buffer to record that state, so there is no need
to prepare a dedicated ring buffer or custom user apps.
Users can trace this event with ftrace as below:
# cd /sys/kernel/debug/tracing
# echo 1 > events/tcp/tcp_probe/enable
(run workloads)
# cat trace
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
---
Changes in v3:
- Fix build errors caused by including events/tcp.h twice.
- Sort out the included headers.
---
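The lines read from the trace file follow the TP_printk() format in this
patch, so they can be post-processed with a few lines of script. The
following is a minimal sketch, assuming the text after "tcp_probe:" is a
space-separated list of key=value pairs. The function name parse_tcp_probe
and the sample line are hypothetical and are not captured output.

```python
import re

# Fields that TP_printk() prints with %#x; everything else numeric is decimal.
_HEX_FIELDS = {"mark", "snd_nxt", "snd_una"}


def parse_tcp_probe(line):
    """Parse the key=value part of one tcp_probe trace line into a dict."""
    # Drop the ftrace prefix (task, CPU, timestamp) if it is present.
    payload = line.split("tcp_probe:", 1)[-1]
    fields = {}
    for key, value in re.findall(r"(\w+)=(\S+)", payload):
        if key in ("src", "dest"):
            fields[key] = value            # address:port, kept as a string
        elif key in _HEX_FIELDS:
            fields[key] = int(value, 16)   # accepts an optional 0x prefix
        else:
            fields[key] = int(value)
    return fields


# Made-up sample line for illustration only.
sample = ("<idle>-0 [001] ..s. 100.000000: tcp_probe: "
          "src=10.0.0.1:22 dest=10.0.0.2:51234 mark=0x0 length=52 "
          "snd_nxt=0x1f4 snd_una=0x1f4 snd_cwnd=10 ssthresh=2147483647 "
          "snd_wnd=29312 srtt=250 rcv_wnd=29200")
print(parse_tcp_probe(sample)["snd_cwnd"])
```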
include/trace/events/tcp.h | 80 ++++++++++++++++++++++++++++++++++++++++++++
net/ipv4/tcp_input.c | 3 ++
2 files changed, 83 insertions(+)
diff --git a/include/trace/events/tcp.h b/include/trace/events/tcp.h
index 07cccca6cbf1..14ad60b468fb 100644
--- a/include/trace/events/tcp.h
+++ b/include/trace/events/tcp.h
@@ -1,3 +1,4 @@
+/* SPDX-License-Identifier: GPL-2.0 */
#undef TRACE_SYSTEM
#define TRACE_SYSTEM tcp
@@ -8,6 +9,7 @@
#include <linux/tcp.h>
#include <linux/tracepoint.h>
#include <net/ipv6.h>
+#include <net/tcp.h>
#define tcp_state_name(state) { state, #state }
#define show_tcp_state_name(val) \
@@ -293,6 +295,84 @@ TRACE_EVENT(tcp_retransmit_synack,
__entry->saddr_v6, __entry->daddr_v6)
);
+TRACE_EVENT(tcp_probe,
+
+ TP_PROTO(struct sock *sk, struct sk_buff *skb),
+
+ TP_ARGS(sk, skb),
+
+ TP_STRUCT__entry(
+ /* sockaddr_in6 is always bigger than sockaddr_in */
+ __array(__u8, saddr, sizeof(struct sockaddr_in6))
+ __array(__u8, daddr, sizeof(struct sockaddr_in6))
+ __field(__u16, sport)
+ __field(__u16, dport)
+ __field(__u32, mark)
+ __field(__u16, length)
+ __field(__u32, snd_nxt)
+ __field(__u32, snd_una)
+ __field(__u32, snd_cwnd)
+ __field(__u32, ssthresh)
+ __field(__u32, snd_wnd)
+ __field(__u32, srtt)
+ __field(__u32, rcv_wnd)
+ ),
+
+ TP_fast_assign(
+ const struct tcp_sock *tp = tcp_sk(sk);
+ const struct inet_sock *inet = inet_sk(sk);
+
+ memset(__entry->saddr, 0, sizeof(struct sockaddr_in6));
+ memset(__entry->daddr, 0, sizeof(struct sockaddr_in6));
+
+ if (sk->sk_family == AF_INET) {
+ struct sockaddr_in *v4 = (void *)__entry->saddr;
+
+ v4->sin_family = AF_INET;
+ v4->sin_port = inet->inet_sport;
+ v4->sin_addr.s_addr = inet->inet_saddr;
+ v4 = (void *)__entry->daddr;
+ v4->sin_family = AF_INET;
+ v4->sin_port = inet->inet_dport;
+ v4->sin_addr.s_addr = inet->inet_daddr;
+#if IS_ENABLED(CONFIG_IPV6)
+ } else if (sk->sk_family == AF_INET6) {
+ struct sockaddr_in6 *v6 = (void *)__entry->saddr;
+
+ v6->sin6_family = AF_INET6;
+ v6->sin6_port = inet->inet_sport;
+ v6->sin6_addr = inet6_sk(sk)->saddr;
+ v6 = (void *)__entry->daddr;
+ v6->sin6_family = AF_INET6;
+ v6->sin6_port = inet->inet_dport;
+ v6->sin6_addr = sk->sk_v6_daddr;
+#endif
+ }
+
+ /* For filtering use */
+ __entry->sport = ntohs(inet->inet_sport);
+ __entry->dport = ntohs(inet->inet_dport);
+ __entry->mark = skb->mark;
+
+ __entry->length = skb->len;
+ __entry->snd_nxt = tp->snd_nxt;
+ __entry->snd_una = tp->snd_una;
+ __entry->snd_cwnd = tp->snd_cwnd;
+ __entry->snd_wnd = tp->snd_wnd;
+ __entry->rcv_wnd = tp->rcv_wnd;
+ __entry->ssthresh = tcp_current_ssthresh(sk);
+ __entry->srtt = tp->srtt_us >> 3;
+ ),
+
+ TP_printk("src=%pISpc dest=%pISpc mark=%#x length=%d snd_nxt=%#x "
+ "snd_una=%#x snd_cwnd=%u ssthresh=%u snd_wnd=%u srtt=%u "
+ "rcv_wnd=%u",
+ __entry->saddr, __entry->daddr, __entry->mark,
+ __entry->length, __entry->snd_nxt, __entry->snd_una,
+ __entry->snd_cwnd, __entry->ssthresh, __entry->snd_wnd,
+ __entry->srtt, __entry->rcv_wnd)
+);
+
#endif /* _TRACE_TCP_H */
/* This part must be outside protection */
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 4d55c4b338ee..ff71b18d9682 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -5299,6 +5299,9 @@ void tcp_rcv_established(struct sock *sk, struct sk_buff *skb,
unsigned int len = skb->len;
struct tcp_sock *tp = tcp_sk(sk);
+ /* TCP congestion window tracking */
+ trace_tcp_probe(sk, skb);
+
tcp_mstamp_refresh(tp);
if (unlikely(!sk->sk_rx_dst))
inet_csk(sk)->icsk_af_ops->sk_rx_dst_set(sk, skb);
* [PATCH net-next v4 2/6] net: tcp: Remove TCP probe module
From: Masami Hiramatsu @ 2017-12-20 4:15 UTC (permalink / raw)
To: Ingo Molnar, David S . Miller, Ian McDonald, Vlad Yasevich,
Stephen Hemminger, Steven Rostedt
Cc: Peter Zijlstra, Thomas Gleixner, LKML, H . Peter Anvin,
Gerrit Renker, Neil Horman, dccp, netdev, linux-sctp,
Stephen Rothwell, mhiramat
In-Reply-To: <151374325126.2497.6934744693865165386.stgit@devbox>
Remove the TCP probe module since jprobe has been deprecated.
Its functionality is now provided by the tcp/tcp_probe trace event,
which you can use via ftrace or perf tools.
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
---
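The removed module selected packets with its port and fwmark parameters.
With the trace event, a similar selection can probably be expressed as an
event filter on the sport, dport and mark fields added in 1/6. The sketch
below only builds the filter string. The function name tcp_probe_filter is
hypothetical, and writing the result to events/tcp/tcp_probe/filter is an
assumption about how it would be applied. The module's "full" parameter
(log only cwnd changes) has no counterpart here.

```python
def tcp_probe_filter(port=0, fwmark=0):
    """Build an ftrace event filter approximating the old port/fwmark match.

    Returns "0", which clears an event filter, when nothing is selected,
    matching the module default of logging every connection.
    """
    clauses = []
    if port:
        clauses.append("sport == %d" % port)
        clauses.append("dport == %d" % port)
    if fwmark:
        clauses.append("mark == %d" % fwmark)
    return " || ".join(clauses) if clauses else "0"


print(tcp_probe_filter(port=80))
```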
net/Kconfig | 17 ---
net/ipv4/Makefile | 1
net/ipv4/tcp_probe.c | 301 --------------------------------------------------
3 files changed, 319 deletions(-)
delete mode 100644 net/ipv4/tcp_probe.c
diff --git a/net/Kconfig b/net/Kconfig
index 9dba2715919d..efe930db3c08 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -336,23 +336,6 @@ config NET_PKTGEN
To compile this code as a module, choose M here: the
module will be called pktgen.
-config NET_TCPPROBE
- tristate "TCP connection probing"
- depends on INET && PROC_FS && KPROBES
- ---help---
- This module allows for capturing the changes to TCP connection
- state in response to incoming packets. It is used for debugging
- TCP congestion avoidance modules. If you don't understand
- what was just said, you don't need it: say N.
-
- Documentation on how to use TCP connection probing can be found
- at:
-
- http://www.linuxfoundation.org/collaborate/workgroups/networking/tcpprobe
-
- To compile this code as a module, choose M here: the
- module will be called tcp_probe.
-
config NET_DROP_MONITOR
tristate "Network packet drop alerting service"
depends on INET && TRACEPOINTS
diff --git a/net/ipv4/Makefile b/net/ipv4/Makefile
index c6c8ad1d4b6d..47a0a6649a9d 100644
--- a/net/ipv4/Makefile
+++ b/net/ipv4/Makefile
@@ -43,7 +43,6 @@ obj-$(CONFIG_INET_DIAG) += inet_diag.o
obj-$(CONFIG_INET_TCP_DIAG) += tcp_diag.o
obj-$(CONFIG_INET_UDP_DIAG) += udp_diag.o
obj-$(CONFIG_INET_RAW_DIAG) += raw_diag.o
-obj-$(CONFIG_NET_TCPPROBE) += tcp_probe.o
obj-$(CONFIG_TCP_CONG_BBR) += tcp_bbr.o
obj-$(CONFIG_TCP_CONG_BIC) += tcp_bic.o
obj-$(CONFIG_TCP_CONG_CDG) += tcp_cdg.o
diff --git a/net/ipv4/tcp_probe.c b/net/ipv4/tcp_probe.c
deleted file mode 100644
index 697f4c67b2e3..000000000000
--- a/net/ipv4/tcp_probe.c
+++ /dev/null
@@ -1,301 +0,0 @@
-/*
- * tcpprobe - Observe the TCP flow with kprobes.
- *
- * The idea for this came from Werner Almesberger's umlsim
- * Copyright (C) 2004, Stephen Hemminger <shemminger@osdl.org>
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, write to the Free Software
- * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
- */
-
-#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
-
-#include <linux/kernel.h>
-#include <linux/kprobes.h>
-#include <linux/socket.h>
-#include <linux/tcp.h>
-#include <linux/slab.h>
-#include <linux/proc_fs.h>
-#include <linux/module.h>
-#include <linux/ktime.h>
-#include <linux/time.h>
-#include <net/net_namespace.h>
-
-#include <net/tcp.h>
-
-MODULE_AUTHOR("Stephen Hemminger <shemminger@linux-foundation.org>");
-MODULE_DESCRIPTION("TCP cwnd snooper");
-MODULE_LICENSE("GPL");
-MODULE_VERSION("1.1");
-
-static int port __read_mostly;
-MODULE_PARM_DESC(port, "Port to match (0=all)");
-module_param(port, int, 0);
-
-static unsigned int bufsize __read_mostly = 4096;
-MODULE_PARM_DESC(bufsize, "Log buffer size in packets (4096)");
-module_param(bufsize, uint, 0);
-
-static unsigned int fwmark __read_mostly;
-MODULE_PARM_DESC(fwmark, "skb mark to match (0=no mark)");
-module_param(fwmark, uint, 0);
-
-static int full __read_mostly;
-MODULE_PARM_DESC(full, "Full log (1=every ack packet received, 0=only cwnd changes)");
-module_param(full, int, 0);
-
-static const char procname[] = "tcpprobe";
-
-struct tcp_log {
- ktime_t tstamp;
- union {
- struct sockaddr raw;
- struct sockaddr_in v4;
- struct sockaddr_in6 v6;
- } src, dst;
- u16 length;
- u32 snd_nxt;
- u32 snd_una;
- u32 snd_wnd;
- u32 rcv_wnd;
- u32 snd_cwnd;
- u32 ssthresh;
- u32 srtt;
-};
-
-static struct {
- spinlock_t lock;
- wait_queue_head_t wait;
- ktime_t start;
- u32 lastcwnd;
-
- unsigned long head, tail;
- struct tcp_log *log;
-} tcp_probe;
-
-static inline int tcp_probe_used(void)
-{
- return (tcp_probe.head - tcp_probe.tail) & (bufsize - 1);
-}
-
-static inline int tcp_probe_avail(void)
-{
- return bufsize - tcp_probe_used() - 1;
-}
-
-#define tcp_probe_copy_fl_to_si4(inet, si4, mem) \
- do { \
- si4.sin_family = AF_INET; \
- si4.sin_port = inet->inet_##mem##port; \
- si4.sin_addr.s_addr = inet->inet_##mem##addr; \
- } while (0) \
-
-/*
- * Hook inserted to be called before each receive packet.
- * Note: arguments must match tcp_rcv_established()!
- */
-static void jtcp_rcv_established(struct sock *sk, struct sk_buff *skb,
- const struct tcphdr *th)
-{
- unsigned int len = skb->len;
- const struct tcp_sock *tp = tcp_sk(sk);
- const struct inet_sock *inet = inet_sk(sk);
-
- /* Only update if port or skb mark matches */
- if (((port == 0 && fwmark == 0) ||
- ntohs(inet->inet_dport) == port ||
- ntohs(inet->inet_sport) == port ||
- (fwmark > 0 && skb->mark == fwmark)) &&
- (full || tp->snd_cwnd != tcp_probe.lastcwnd)) {
-
- spin_lock(&tcp_probe.lock);
- /* If log fills, just silently drop */
- if (tcp_probe_avail() > 1) {
- struct tcp_log *p = tcp_probe.log + tcp_probe.head;
-
- p->tstamp = ktime_get();
- switch (sk->sk_family) {
- case AF_INET:
- tcp_probe_copy_fl_to_si4(inet, p->src.v4, s);
- tcp_probe_copy_fl_to_si4(inet, p->dst.v4, d);
- break;
- case AF_INET6:
- memset(&p->src.v6, 0, sizeof(p->src.v6));
- memset(&p->dst.v6, 0, sizeof(p->dst.v6));
-#if IS_ENABLED(CONFIG_IPV6)
- p->src.v6.sin6_family = AF_INET6;
- p->src.v6.sin6_port = inet->inet_sport;
- p->src.v6.sin6_addr = inet6_sk(sk)->saddr;
-
- p->dst.v6.sin6_family = AF_INET6;
- p->dst.v6.sin6_port = inet->inet_dport;
- p->dst.v6.sin6_addr = sk->sk_v6_daddr;
-#endif
- break;
- default:
- BUG();
- }
-
- p->length = len;
- p->snd_nxt = tp->snd_nxt;
- p->snd_una = tp->snd_una;
- p->snd_cwnd = tp->snd_cwnd;
- p->snd_wnd = tp->snd_wnd;
- p->rcv_wnd = tp->rcv_wnd;
- p->ssthresh = tcp_current_ssthresh(sk);
- p->srtt = tp->srtt_us >> 3;
-
- tcp_probe.head = (tcp_probe.head + 1) & (bufsize - 1);
- }
- tcp_probe.lastcwnd = tp->snd_cwnd;
- spin_unlock(&tcp_probe.lock);
-
- wake_up(&tcp_probe.wait);
- }
-
- jprobe_return();
-}
-
-static struct jprobe tcp_jprobe = {
- .kp = {
- .symbol_name = "tcp_rcv_established",
- },
- .entry = jtcp_rcv_established,
-};
-
-static int tcpprobe_open(struct inode *inode, struct file *file)
-{
- /* Reset (empty) log */
- spin_lock_bh(&tcp_probe.lock);
- tcp_probe.head = tcp_probe.tail = 0;
- tcp_probe.start = ktime_get();
- spin_unlock_bh(&tcp_probe.lock);
-
- return 0;
-}
-
-static int tcpprobe_sprint(char *tbuf, int n)
-{
- const struct tcp_log *p
- = tcp_probe.log + tcp_probe.tail;
- struct timespec64 ts
- = ktime_to_timespec64(ktime_sub(p->tstamp, tcp_probe.start));
-
- return scnprintf(tbuf, n,
- "%lu.%09lu %pISpc %pISpc %d %#x %#x %u %u %u %u %u\n",
- (unsigned long)ts.tv_sec,
- (unsigned long)ts.tv_nsec,
- &p->src, &p->dst, p->length, p->snd_nxt, p->snd_una,
- p->snd_cwnd, p->ssthresh, p->snd_wnd, p->srtt, p->rcv_wnd);
-}
-
-static ssize_t tcpprobe_read(struct file *file, char __user *buf,
- size_t len, loff_t *ppos)
-{
- int error = 0;
- size_t cnt = 0;
-
- if (!buf)
- return -EINVAL;
-
- while (cnt < len) {
- char tbuf[256];
- int width;
-
- /* Wait for data in buffer */
- error = wait_event_interruptible(tcp_probe.wait,
- tcp_probe_used() > 0);
- if (error)
- break;
-
- spin_lock_bh(&tcp_probe.lock);
- if (tcp_probe.head == tcp_probe.tail) {
- /* multiple readers race? */
- spin_unlock_bh(&tcp_probe.lock);
- continue;
- }
-
- width = tcpprobe_sprint(tbuf, sizeof(tbuf));
-
- if (cnt + width < len)
- tcp_probe.tail = (tcp_probe.tail + 1) & (bufsize - 1);
-
- spin_unlock_bh(&tcp_probe.lock);
-
- /* if record greater than space available
- return partial buffer (so far) */
- if (cnt + width >= len)
- break;
-
- if (copy_to_user(buf + cnt, tbuf, width))
- return -EFAULT;
- cnt += width;
- }
-
- return cnt == 0 ? error : cnt;
-}
-
-static const struct file_operations tcpprobe_fops = {
- .owner = THIS_MODULE,
- .open = tcpprobe_open,
- .read = tcpprobe_read,
- .llseek = noop_llseek,
-};
-
-static __init int tcpprobe_init(void)
-{
- int ret = -ENOMEM;
-
- /* Warning: if the function signature of tcp_rcv_established,
- * has been changed, you also have to change the signature of
- * jtcp_rcv_established, otherwise you end up right here!
- */
- BUILD_BUG_ON(__same_type(tcp_rcv_established,
- jtcp_rcv_established) == 0);
-
- init_waitqueue_head(&tcp_probe.wait);
- spin_lock_init(&tcp_probe.lock);
-
- if (bufsize == 0)
- return -EINVAL;
-
- bufsize = roundup_pow_of_two(bufsize);
- tcp_probe.log = kcalloc(bufsize, sizeof(struct tcp_log), GFP_KERNEL);
- if (!tcp_probe.log)
- goto err0;
-
- if (!proc_create(procname, S_IRUSR, init_net.proc_net, &tcpprobe_fops))
- goto err0;
-
- ret = register_jprobe(&tcp_jprobe);
- if (ret)
- goto err1;
-
- pr_info("probe registered (port=%d/fwmark=%u) bufsize=%u\n",
- port, fwmark, bufsize);
- return 0;
- err1:
- remove_proc_entry(procname, init_net.proc_net);
- err0:
- kfree(tcp_probe.log);
- return ret;
-}
-module_init(tcpprobe_init);
-
-static __exit void tcpprobe_exit(void)
-{
- remove_proc_entry(procname, init_net.proc_net);
- unregister_jprobe(&tcp_jprobe);
- kfree(tcp_probe.log);
-}
-module_exit(tcpprobe_exit);
* [PATCH net-next v4 3/6] net: sctp: Add SCTP ACK tracking trace event
From: Masami Hiramatsu @ 2017-12-20 4:15 UTC (permalink / raw)
To: Ingo Molnar, David S . Miller, Ian McDonald, Vlad Yasevich,
Stephen Hemminger, Steven Rostedt
Cc: Peter Zijlstra, Thomas Gleixner, LKML, H . Peter Anvin,
Gerrit Renker, Neil Horman, dccp, netdev, linux-sctp,
Stephen Rothwell, mhiramat
In-Reply-To: <151374325126.2497.6934744693865165386.stgit@devbox>
Add an SCTP ACK tracking trace event to trace the changes of SCTP
association state in response to incoming packets.
It is used for debugging SCTP congestion control algorithms,
and will replace the sctp_probe module.
Note that this event is a bit tricky. Since it consists of 2
events (sctp_probe and sctp_probe_path), you have to enable
both events as below.
# cd /sys/kernel/debug/tracing
# echo 1 > events/sctp/sctp_probe/enable
# echo 1 > events/sctp/sctp_probe_path/enable
Or, you can enable all the events under sctp.
# echo 1 > events/sctp/enable
Since the sctp_probe_path event is always invoked from the
sctp_probe event, you will not see any output if you enable only
sctp_probe_path.
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
---
Changes in v3:
- Add checking whether sctp_probe_path event is enabled
before iterating sctp paths to record. Thanks Steven.
Changes in v4:
- Move a temporary variable definition into the block.
- Fix to cast the pointer to unsigned long instead of __u64
for 32-bit environments.
---
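Because both events have to be switched on together, a small script can do
the two writes in one step. This is a minimal sketch, not part of the patch:
the function name enable_events and the SCTP_PROBE_EVENTS list are
hypothetical, and the tracing directory is passed in as a parameter instead
of assuming where tracefs is mounted.

```python
import os


def enable_events(tracing_dir, events, on=True):
    """Write '1' (or '0') to events/<system>/<event>/enable for each event."""
    for event in events:
        path = os.path.join(tracing_dir, "events", event, "enable")
        with open(path, "w") as f:
            f.write("1" if on else "0")


# Both events must be enabled to get per-path output.
SCTP_PROBE_EVENTS = ["sctp/sctp_probe", "sctp/sctp_probe_path"]

# Example (needs root and a kernel with this patch applied):
# enable_events("/sys/kernel/debug/tracing", SCTP_PROBE_EVENTS)
```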
include/trace/events/sctp.h | 99 +++++++++++++++++++++++++++++++++++++++++++
net/sctp/sm_statefuns.c | 5 ++
2 files changed, 104 insertions(+)
create mode 100644 include/trace/events/sctp.h
diff --git a/include/trace/events/sctp.h b/include/trace/events/sctp.h
new file mode 100644
index 000000000000..7475c7be165a
--- /dev/null
+++ b/include/trace/events/sctp.h
@@ -0,0 +1,99 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM sctp
+
+#if !defined(_TRACE_SCTP_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_SCTP_H
+
+#include <net/sctp/structs.h>
+#include <linux/tracepoint.h>
+
+TRACE_EVENT(sctp_probe_path,
+
+ TP_PROTO(struct sctp_transport *sp,
+ const struct sctp_association *asoc),
+
+ TP_ARGS(sp, asoc),
+
+ TP_STRUCT__entry(
+ __field(__u64, asoc)
+ __field(__u32, primary)
+ __array(__u8, ipaddr, sizeof(union sctp_addr))
+ __field(__u32, state)
+ __field(__u32, cwnd)
+ __field(__u32, ssthresh)
+ __field(__u32, flight_size)
+ __field(__u32, partial_bytes_acked)
+ __field(__u32, pathmtu)
+ ),
+
+ TP_fast_assign(
+ __entry->asoc = (unsigned long)asoc;
+ __entry->primary = (sp == asoc->peer.primary_path);
+ memcpy(__entry->ipaddr, &sp->ipaddr, sizeof(union sctp_addr));
+ __entry->state = sp->state;
+ __entry->cwnd = sp->cwnd;
+ __entry->ssthresh = sp->ssthresh;
+ __entry->flight_size = sp->flight_size;
+ __entry->partial_bytes_acked = sp->partial_bytes_acked;
+ __entry->pathmtu = sp->pathmtu;
+ ),
+
+ TP_printk("asoc=%#llx%s ipaddr=%pISpc state=%u cwnd=%u ssthresh=%u "
+ "flight_size=%u partial_bytes_acked=%u pathmtu=%u",
+ __entry->asoc, __entry->primary ? "(*)" : "",
+ __entry->ipaddr, __entry->state, __entry->cwnd,
+ __entry->ssthresh, __entry->flight_size,
+ __entry->partial_bytes_acked, __entry->pathmtu)
+);
+
+TRACE_EVENT(sctp_probe,
+
+ TP_PROTO(const struct sctp_endpoint *ep,
+ const struct sctp_association *asoc,
+ struct sctp_chunk *chunk),
+
+ TP_ARGS(ep, asoc, chunk),
+
+ TP_STRUCT__entry(
+ __field(__u64, asoc)
+ __field(__u32, mark)
+ __field(__u16, bind_port)
+ __field(__u16, peer_port)
+ __field(__u32, pathmtu)
+ __field(__u32, rwnd)
+ __field(__u16, unack_data)
+ ),
+
+ TP_fast_assign(
+ struct sk_buff *skb = chunk->skb;
+
+ __entry->asoc = (unsigned long)asoc;
+ __entry->mark = skb->mark;
+ __entry->bind_port = ep->base.bind_addr.port;
+ __entry->peer_port = asoc->peer.port;
+ __entry->pathmtu = asoc->pathmtu;
+ __entry->rwnd = asoc->peer.rwnd;
+ __entry->unack_data = asoc->unack_data;
+
+ if (trace_sctp_probe_path_enabled()) {
+ struct sctp_transport *sp;
+
+ list_for_each_entry(sp, &asoc->peer.transport_addr_list,
+ transports) {
+ trace_sctp_probe_path(sp, asoc);
+ }
+ }
+ ),
+
+ TP_printk("asoc=%#llx mark=%#x bind_port=%d peer_port=%d pathmtu=%d "
+ "rwnd=%u unack_data=%d",
+ __entry->asoc, __entry->mark, __entry->bind_port,
+ __entry->peer_port, __entry->pathmtu, __entry->rwnd,
+ __entry->unack_data)
+);
+
+#endif /* _TRACE_SCTP_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/net/sctp/sm_statefuns.c b/net/sctp/sm_statefuns.c
index 541f34735346..eb7905ffe5f2 100644
--- a/net/sctp/sm_statefuns.c
+++ b/net/sctp/sm_statefuns.c
@@ -59,6 +59,9 @@
#include <net/sctp/sm.h>
#include <net/sctp/structs.h>
+#define CREATE_TRACE_POINTS
+#include <trace/events/sctp.h>
+
static struct sctp_packet *sctp_abort_pkt_new(
struct net *net,
const struct sctp_endpoint *ep,
@@ -3219,6 +3222,8 @@ enum sctp_disposition sctp_sf_eat_sack_6_2(struct net *net,
struct sctp_sackhdr *sackh;
__u32 ctsn;
+ trace_sctp_probe(ep, asoc, chunk);
+
if (!sctp_vtag_verify(chunk, asoc))
return sctp_sf_pdiscard(net, ep, asoc, type, arg, commands);
* [PATCH net-next v4 4/6] net: sctp: Remove debug SCTP probe module
From: Masami Hiramatsu @ 2017-12-20 4:16 UTC (permalink / raw)
To: Ingo Molnar, David S . Miller, Ian McDonald, Vlad Yasevich,
Stephen Hemminger, Steven Rostedt
Cc: Peter Zijlstra, Thomas Gleixner, LKML, H . Peter Anvin,
Gerrit Renker, Neil Horman, dccp, netdev, linux-sctp,
Stephen Rothwell, mhiramat
In-Reply-To: <151374325126.2497.6934744693865165386.stgit@devbox>
Remove the SCTP probe module since jprobe has been deprecated.
Its functionality is now provided by the sctp/sctp_probe and
sctp/sctp_probe_path trace events.
You can use them via ftrace or perf tools.
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
---
net/sctp/Kconfig | 12 ---
net/sctp/Makefile | 3 -
net/sctp/probe.c | 244 -----------------------------------------------------
3 files changed, 259 deletions(-)
delete mode 100644 net/sctp/probe.c
diff --git a/net/sctp/Kconfig b/net/sctp/Kconfig
index d9c04dc1b3f3..c740b189d4ba 100644
--- a/net/sctp/Kconfig
+++ b/net/sctp/Kconfig
@@ -37,18 +37,6 @@ menuconfig IP_SCTP
if IP_SCTP
-config NET_SCTPPROBE
- tristate "SCTP: Association probing"
- depends on PROC_FS && KPROBES
- ---help---
- This module allows for capturing the changes to SCTP association
- state in response to incoming packets. It is used for debugging
- SCTP congestion control algorithms. If you don't understand
- what was just said, you don't need it: say N.
-
- To compile this code as a module, choose M here: the
- module will be called sctp_probe.
-
config SCTP_DBG_OBJCNT
bool "SCTP: Debug object counts"
depends on PROC_FS
diff --git a/net/sctp/Makefile b/net/sctp/Makefile
index 54bd9c1a8aa1..6776582ec449 100644
--- a/net/sctp/Makefile
+++ b/net/sctp/Makefile
@@ -4,7 +4,6 @@
#
obj-$(CONFIG_IP_SCTP) += sctp.o
-obj-$(CONFIG_NET_SCTPPROBE) += sctp_probe.o
obj-$(CONFIG_INET_SCTP_DIAG) += sctp_diag.o
sctp-y := sm_statetable.o sm_statefuns.o sm_sideeffect.o \
@@ -16,8 +15,6 @@ sctp-y := sm_statetable.o sm_statefuns.o sm_sideeffect.o \
offload.o stream_sched.o stream_sched_prio.o \
stream_sched_rr.o stream_interleave.o
-sctp_probe-y := probe.o
-
sctp-$(CONFIG_SCTP_DBG_OBJCNT) += objcnt.o
sctp-$(CONFIG_PROC_FS) += proc.o
sctp-$(CONFIG_SYSCTL) += sysctl.o
diff --git a/net/sctp/probe.c b/net/sctp/probe.c
deleted file mode 100644
index 1280f85a598d..000000000000
--- a/net/sctp/probe.c
+++ /dev/null
@@ -1,244 +0,0 @@
-/*
- * sctp_probe - Observe the SCTP flow with kprobes.
- *
- * The idea for this came from Werner Almesberger's umlsim
- * Copyright (C) 2004, Stephen Hemminger <shemminger@osdl.org>
- *
- * Modified for SCTP from Stephen Hemminger's code
- * Copyright (C) 2010, Wei Yongjun <yjwei@cn.fujitsu.com>
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or
- * (at your option) any later version.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, write to the Free Software
- * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
- */
-
-#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
-
-#include <linux/kernel.h>
-#include <linux/kprobes.h>
-#include <linux/socket.h>
-#include <linux/sctp.h>
-#include <linux/proc_fs.h>
-#include <linux/vmalloc.h>
-#include <linux/module.h>
-#include <linux/kfifo.h>
-#include <linux/time.h>
-#include <net/net_namespace.h>
-
-#include <net/sctp/sctp.h>
-#include <net/sctp/sm.h>
-
-MODULE_SOFTDEP("pre: sctp");
-MODULE_AUTHOR("Wei Yongjun <yjwei@cn.fujitsu.com>");
-MODULE_DESCRIPTION("SCTP snooper");
-MODULE_LICENSE("GPL");
-
-static int port __read_mostly = 0;
-MODULE_PARM_DESC(port, "Port to match (0=all)");
-module_param(port, int, 0);
-
-static unsigned int fwmark __read_mostly = 0;
-MODULE_PARM_DESC(fwmark, "skb mark to match (0=no mark)");
-module_param(fwmark, uint, 0);
-
-static int bufsize __read_mostly = 64 * 1024;
-MODULE_PARM_DESC(bufsize, "Log buffer size (default 64k)");
-module_param(bufsize, int, 0);
-
-static int full __read_mostly = 1;
-MODULE_PARM_DESC(full, "Full log (1=every ack packet received, 0=only cwnd changes)");
-module_param(full, int, 0);
-
-static const char procname[] = "sctpprobe";
-
-static struct {
- struct kfifo fifo;
- spinlock_t lock;
- wait_queue_head_t wait;
- struct timespec64 tstart;
-} sctpw;
-
-static __printf(1, 2) void printl(const char *fmt, ...)
-{
- va_list args;
- int len;
- char tbuf[256];
-
- va_start(args, fmt);
- len = vscnprintf(tbuf, sizeof(tbuf), fmt, args);
- va_end(args);
-
- kfifo_in_locked(&sctpw.fifo, tbuf, len, &sctpw.lock);
- wake_up(&sctpw.wait);
-}
-
-static int sctpprobe_open(struct inode *inode, struct file *file)
-{
- kfifo_reset(&sctpw.fifo);
- ktime_get_ts64(&sctpw.tstart);
-
- return 0;
-}
-
-static ssize_t sctpprobe_read(struct file *file, char __user *buf,
- size_t len, loff_t *ppos)
-{
- int error = 0, cnt = 0;
- unsigned char *tbuf;
-
- if (!buf)
- return -EINVAL;
-
- if (len == 0)
- return 0;
-
- tbuf = vmalloc(len);
- if (!tbuf)
- return -ENOMEM;
-
- error = wait_event_interruptible(sctpw.wait,
- kfifo_len(&sctpw.fifo) != 0);
- if (error)
- goto out_free;
-
- cnt = kfifo_out_locked(&sctpw.fifo, tbuf, len, &sctpw.lock);
- error = copy_to_user(buf, tbuf, cnt) ? -EFAULT : 0;
-
-out_free:
- vfree(tbuf);
-
- return error ? error : cnt;
-}
-
-static const struct file_operations sctpprobe_fops = {
- .owner = THIS_MODULE,
- .open = sctpprobe_open,
- .read = sctpprobe_read,
- .llseek = noop_llseek,
-};
-
-static enum sctp_disposition jsctp_sf_eat_sack(
- struct net *net,
- const struct sctp_endpoint *ep,
- const struct sctp_association *asoc,
- const union sctp_subtype type,
- void *arg,
- struct sctp_cmd_seq *commands)
-{
- struct sctp_chunk *chunk = arg;
- struct sk_buff *skb = chunk->skb;
- struct sctp_transport *sp;
- static __u32 lcwnd = 0;
- struct timespec64 now;
-
- sp = asoc->peer.primary_path;
-
- if (((port == 0 && fwmark == 0) ||
- asoc->peer.port == port ||
- ep->base.bind_addr.port == port ||
- (fwmark > 0 && skb->mark == fwmark)) &&
- (full || sp->cwnd != lcwnd)) {
- lcwnd = sp->cwnd;
-
- ktime_get_ts64(&now);
- now = timespec64_sub(now, sctpw.tstart);
-
- printl("%lu.%06lu ", (unsigned long) now.tv_sec,
- (unsigned long) now.tv_nsec / NSEC_PER_USEC);
-
- printl("%p %5d %5d %5d %8d %5d ", asoc,
- ep->base.bind_addr.port, asoc->peer.port,
- asoc->pathmtu, asoc->peer.rwnd, asoc->unack_data);
-
- list_for_each_entry(sp, &asoc->peer.transport_addr_list,
- transports) {
- if (sp == asoc->peer.primary_path)
- printl("*");
-
- printl("%pISc %2u %8u %8u %8u %8u %8u ",
- &sp->ipaddr, sp->state, sp->cwnd, sp->ssthresh,
- sp->flight_size, sp->partial_bytes_acked,
- sp->pathmtu);
- }
- printl("\n");
- }
-
- jprobe_return();
- return 0;
-}
-
-static struct jprobe sctp_recv_probe = {
- .kp = {
- .symbol_name = "sctp_sf_eat_sack_6_2",
- },
- .entry = jsctp_sf_eat_sack,
-};
-
-static __init int sctp_setup_jprobe(void)
-{
- int ret = register_jprobe(&sctp_recv_probe);
-
- if (ret) {
- if (request_module("sctp"))
- goto out;
- ret = register_jprobe(&sctp_recv_probe);
- }
-
-out:
- return ret;
-}
-
-static __init int sctpprobe_init(void)
-{
- int ret = -ENOMEM;
-
- /* Warning: if the function signature of sctp_sf_eat_sack_6_2,
- * has been changed, you also have to change the signature of
- * jsctp_sf_eat_sack, otherwise you end up right here!
- */
- BUILD_BUG_ON(__same_type(sctp_sf_eat_sack_6_2,
- jsctp_sf_eat_sack) == 0);
-
- init_waitqueue_head(&sctpw.wait);
- spin_lock_init(&sctpw.lock);
- if (kfifo_alloc(&sctpw.fifo, bufsize, GFP_KERNEL))
- return ret;
-
- if (!proc_create(procname, S_IRUSR, init_net.proc_net,
- &sctpprobe_fops))
- goto free_kfifo;
-
- ret = sctp_setup_jprobe();
- if (ret)
- goto remove_proc;
-
- pr_info("probe registered (port=%d/fwmark=%u) bufsize=%u\n",
- port, fwmark, bufsize);
- return 0;
-
-remove_proc:
- remove_proc_entry(procname, init_net.proc_net);
-free_kfifo:
- kfifo_free(&sctpw.fifo);
- return ret;
-}
-
-static __exit void sctpprobe_exit(void)
-{
- kfifo_free(&sctpw.fifo);
- remove_proc_entry(procname, init_net.proc_net);
- unregister_jprobe(&sctp_recv_probe);
-}
-
-module_init(sctpprobe_init);
-module_exit(sctpprobe_exit);
* [PATCH net-next v4 5/6] net: dccp: Add DCCP sendmsg trace event
From: Masami Hiramatsu @ 2017-12-20 4:16 UTC (permalink / raw)
To: Ingo Molnar, David S . Miller, Ian McDonald, Vlad Yasevich,
Stephen Hemminger, Steven Rostedt
Cc: Peter Zijlstra, Thomas Gleixner, LKML, H . Peter Anvin,
Gerrit Renker, Neil Horman, dccp, netdev, linux-sctp,
Stephen Rothwell, mhiramat
In-Reply-To: <151374325126.2497.6934744693865165386.stgit@devbox>
Add a DCCP sendmsg trace event (dccp/dccp_probe) to
replace dccpprobe. Users can trace this event via
ftrace or perf tools.
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
---
net/dccp/proto.c | 5 +++
net/dccp/trace.h | 105 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 110 insertions(+)
create mode 100644 net/dccp/trace.h
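As a rough illustration of the "trace this event via ftrace" remark: the
event added here is declared with TRACE_SYSTEM dccp and the name dccp_probe,
so its enable switch would normally show up under the tracefs events
directory. The user-space sketch below builds that path and writes "1" or
"0" to it. The mount point /sys/kernel/tracing and both helper names are
assumptions made for this example and are not part of the patch; some
setups expose tracefs under /sys/kernel/debug/tracing instead.

```c
#include <stdio.h>

/* Assumed tracefs mount point; not taken from the patch. */
#define TRACEFS_DEFAULT "/sys/kernel/tracing"

/*
 * Hypothetical helper: build "<tracefs>/events/<system>/<event>/enable".
 * Returns 0 on success, -1 if the buffer is too small.
 */
int trace_event_enable_path(char *buf, size_t len, const char *tracefs,
			    const char *system, const char *event)
{
	int n = snprintf(buf, len, "%s/events/%s/%s/enable",
			 tracefs, system, event);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Hypothetical helper: switch an event on (1) or off (0).
 * Needs root and a mounted tracefs; returns 0 on success, -1 on error.
 */
int trace_event_set(const char *tracefs, const char *system,
		    const char *event, int on)
{
	char path[256];
	FILE *f;
	int ret;

	if (trace_event_enable_path(path, sizeof(path), tracefs, system, event))
		return -1;

	f = fopen(path, "w");
	if (!f)
		return -1;

	ret = fputs(on ? "1" : "0", f) < 0 ? -1 : 0;
	if (fclose(f) != 0)
		ret = -1;
	return ret;
}
```

Calling trace_event_set(TRACEFS_DEFAULT, "dccp", "dccp_probe", 1) as root
should be roughly equivalent to echoing 1 into the event's enable file;
the formatted records can then be read from the trace file in the same
tracefs mount.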
diff --git a/net/dccp/proto.c b/net/dccp/proto.c
index 9d43c1f40274..e57b5db495cd 100644
--- a/net/dccp/proto.c
+++ b/net/dccp/proto.c
@@ -38,6 +38,9 @@
#include "dccp.h"
#include "feat.h"
+#define CREATE_TRACE_POINTS
+#include "trace.h"
+
DEFINE_SNMP_STAT(struct dccp_mib, dccp_statistics) __read_mostly;
EXPORT_SYMBOL_GPL(dccp_statistics);
@@ -761,6 +764,8 @@ int dccp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
int rc, size;
long timeo;
+ trace_dccp_probe(sk, len);
+
if (len > dp->dccps_mss_cache)
return -EMSGSIZE;
diff --git a/net/dccp/trace.h b/net/dccp/trace.h
new file mode 100644
index 000000000000..aa01321a6c37
--- /dev/null
+++ b/net/dccp/trace.h
@@ -0,0 +1,105 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM dccp
+
+#if !defined(_TRACE_DCCP_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_DCCP_H
+
+#include <net/sock.h>
+#include "dccp.h"
+#include "ccids/ccid3.h"
+#include <linux/tracepoint.h>
+
+TRACE_EVENT(dccp_probe,
+
+ TP_PROTO(struct sock *sk, size_t size),
+
+ TP_ARGS(sk, size),
+
+ TP_STRUCT__entry(
+ /* sockaddr_in6 is always bigger than sockaddr_in */
+ __array(__u8, saddr, sizeof(struct sockaddr_in6))
+ __array(__u8, daddr, sizeof(struct sockaddr_in6))
+ __field(__u16, sport)
+ __field(__u16, dport)
+ __field(__u16, size)
+ __field(__u16, tx_s)
+ __field(__u32, tx_rtt)
+ __field(__u32, tx_p)
+ __field(__u32, tx_x_calc)
+ __field(__u64, tx_x_recv)
+ __field(__u64, tx_x)
+ __field(__u32, tx_t_ipi)
+ ),
+
+ TP_fast_assign(
+ const struct inet_sock *inet = inet_sk(sk);
+ struct ccid3_hc_tx_sock *hc = NULL;
+
+ if (ccid_get_current_tx_ccid(dccp_sk(sk)) == DCCPC_CCID3)
+ hc = ccid3_hc_tx_sk(sk);
+
+ memset(__entry->saddr, 0, sizeof(struct sockaddr_in6));
+ memset(__entry->daddr, 0, sizeof(struct sockaddr_in6));
+
+ if (sk->sk_family == AF_INET) {
+ struct sockaddr_in *v4 = (void *)__entry->saddr;
+
+ v4->sin_family = AF_INET;
+ v4->sin_port = inet->inet_sport;
+ v4->sin_addr.s_addr = inet->inet_saddr;
+ v4 = (void *)__entry->daddr;
+ v4->sin_family = AF_INET;
+ v4->sin_port = inet->inet_dport;
+ v4->sin_addr.s_addr = inet->inet_daddr;
+#if IS_ENABLED(CONFIG_IPV6)
+ } else if (sk->sk_family == AF_INET6) {
+ struct sockaddr_in6 *v6 = (void *)__entry->saddr;
+
+ v6->sin6_family = AF_INET6;
+ v6->sin6_port = inet->inet_sport;
+ v6->sin6_addr = inet6_sk(sk)->saddr;
+ v6 = (void *)__entry->daddr;
+ v6->sin6_family = AF_INET6;
+ v6->sin6_port = inet->inet_dport;
+ v6->sin6_addr = sk->sk_v6_daddr;
+#endif
+ }
+
+ /* For filtering use */
+ __entry->sport = ntohs(inet->inet_sport);
+ __entry->dport = ntohs(inet->inet_dport);
+
+ __entry->size = size;
+ if (hc) {
+ __entry->tx_s = hc->tx_s;
+ __entry->tx_rtt = hc->tx_rtt;
+ __entry->tx_p = hc->tx_p;
+ __entry->tx_x_calc = hc->tx_x_calc;
+ __entry->tx_x_recv = hc->tx_x_recv >> 6;
+ __entry->tx_x = hc->tx_x >> 6;
+ __entry->tx_t_ipi = hc->tx_t_ipi;
+ } else {
+ __entry->tx_s = 0;
+ memset(&__entry->tx_rtt, 0, (void *)&__entry->tx_t_ipi -
+ (void *)&__entry->tx_rtt +
+ sizeof(__entry->tx_t_ipi));
+ }
+ ),
+
+ TP_printk("src=%pISpc dest=%pISpc size=%d tx_s=%d tx_rtt=%d "
+ "tx_p=%d tx_x_calc=%u tx_x_recv=%llu tx_x=%llu tx_t_ipi=%d",
+ __entry->saddr, __entry->daddr, __entry->size,
+ __entry->tx_s, __entry->tx_rtt, __entry->tx_p,
+ __entry->tx_x_calc, __entry->tx_x_recv, __entry->tx_x,
+ __entry->tx_t_ipi)
+);
+
+#endif /* _TRACE_DCCP_H */
+
+/* This part must be outside protection */
+#undef TRACE_INCLUDE_PATH
+#define TRACE_INCLUDE_PATH .
+#undef TRACE_INCLUDE_FILE
+#define TRACE_INCLUDE_FILE trace
+#include <trace/define_trace.h>
* [PATCH net-next v4 6/6] net: dccp: Remove dccpprobe module
From: Masami Hiramatsu @ 2017-12-20 4:17 UTC (permalink / raw)
To: Ingo Molnar, David S . Miller, Ian McDonald, Vlad Yasevich,
Stephen Hemminger, Steven Rostedt
Cc: Peter Zijlstra, Thomas Gleixner, LKML, H . Peter Anvin,
Gerrit Renker, Neil Horman, dccp, netdev, linux-sctp,
Stephen Rothwell, mhiramat
In-Reply-To: <151374325126.2497.6934744693865165386.stgit@devbox>
Remove the DCCP probe module, since jprobes have been deprecated.
Its functionality is now provided by the dccp/dccp_probe trace event,
which can be used via ftrace or perf tools.
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
---
net/dccp/Kconfig | 17 ----
net/dccp/Makefile | 2 -
net/dccp/probe.c | 203 -----------------------------------------------------
3 files changed, 222 deletions(-)
delete mode 100644 net/dccp/probe.c
diff --git a/net/dccp/Kconfig b/net/dccp/Kconfig
index 8c0ef71bed2f..b270e84d9c13 100644
--- a/net/dccp/Kconfig
+++ b/net/dccp/Kconfig
@@ -39,23 +39,6 @@ config IP_DCCP_DEBUG
Just say N.
-config NET_DCCPPROBE
- tristate "DCCP connection probing"
- depends on PROC_FS && KPROBES
- ---help---
- This module allows for capturing the changes to DCCP connection
- state in response to incoming packets. It is used for debugging
- DCCP congestion avoidance modules. If you don't understand
- what was just said, you don't need it: say N.
-
- Documentation on how to use DCCP connection probing can be found
- at:
-
- http://www.linuxfoundation.org/collaborate/workgroups/networking/dccpprobe
-
- To compile this code as a module, choose M here: the
- module will be called dccp_probe.
-
endmenu
diff --git a/net/dccp/Makefile b/net/dccp/Makefile
index 2e7b56097bc4..9d0383d2f277 100644
--- a/net/dccp/Makefile
+++ b/net/dccp/Makefile
@@ -21,9 +21,7 @@ obj-$(subst y,$(CONFIG_IP_DCCP),$(CONFIG_IPV6)) += dccp_ipv6.o
dccp_ipv6-y := ipv6.o
obj-$(CONFIG_INET_DCCP_DIAG) += dccp_diag.o
-obj-$(CONFIG_NET_DCCPPROBE) += dccp_probe.o
dccp-$(CONFIG_SYSCTL) += sysctl.o
dccp_diag-y := diag.o
-dccp_probe-y := probe.o
diff --git a/net/dccp/probe.c b/net/dccp/probe.c
deleted file mode 100644
index 3d3fda05b32d..000000000000
--- a/net/dccp/probe.c
+++ /dev/null
@@ -1,203 +0,0 @@
-/*
- * dccp_probe - Observe the DCCP flow with kprobes.
- *
- * The idea for this came from Werner Almesberger's umlsim
- * Copyright (C) 2004, Stephen Hemminger <shemminger@osdl.org>
- *
- * Modified for DCCP from Stephen Hemminger's code
- * Copyright (C) 2006, Ian McDonald <ian.mcdonald@jandi.co.nz>
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or
- * (at your option) any later version.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, write to the Free Software
- * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
- */
-
-#include <linux/kernel.h>
-#include <linux/kprobes.h>
-#include <linux/socket.h>
-#include <linux/dccp.h>
-#include <linux/proc_fs.h>
-#include <linux/module.h>
-#include <linux/kfifo.h>
-#include <linux/vmalloc.h>
-#include <linux/time64.h>
-#include <linux/gfp.h>
-#include <net/net_namespace.h>
-
-#include "dccp.h"
-#include "ccid.h"
-#include "ccids/ccid3.h"
-
-static int port;
-
-static int bufsize = 64 * 1024;
-
-static const char procname[] = "dccpprobe";
-
-static struct {
- struct kfifo fifo;
- spinlock_t lock;
- wait_queue_head_t wait;
- struct timespec64 tstart;
-} dccpw;
-
-static void printl(const char *fmt, ...)
-{
- va_list args;
- int len;
- struct timespec64 now;
- char tbuf[256];
-
- va_start(args, fmt);
- getnstimeofday64(&now);
-
- now = timespec64_sub(now, dccpw.tstart);
-
- len = sprintf(tbuf, "%lu.%06lu ",
- (unsigned long) now.tv_sec,
- (unsigned long) now.tv_nsec / NSEC_PER_USEC);
- len += vscnprintf(tbuf+len, sizeof(tbuf)-len, fmt, args);
- va_end(args);
-
- kfifo_in_locked(&dccpw.fifo, tbuf, len, &dccpw.lock);
- wake_up(&dccpw.wait);
-}
-
-static int jdccp_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
-{
- const struct inet_sock *inet = inet_sk(sk);
- struct ccid3_hc_tx_sock *hc = NULL;
-
- if (ccid_get_current_tx_ccid(dccp_sk(sk)) == DCCPC_CCID3)
- hc = ccid3_hc_tx_sk(sk);
-
- if (port == 0 || ntohs(inet->inet_dport) == port ||
- ntohs(inet->inet_sport) == port) {
- if (hc)
- printl("%pI4:%u %pI4:%u %d %d %d %d %u %llu %llu %d\n",
- &inet->inet_saddr, ntohs(inet->inet_sport),
- &inet->inet_daddr, ntohs(inet->inet_dport), size,
- hc->tx_s, hc->tx_rtt, hc->tx_p,
- hc->tx_x_calc, hc->tx_x_recv >> 6,
- hc->tx_x >> 6, hc->tx_t_ipi);
- else
- printl("%pI4:%u %pI4:%u %d\n",
- &inet->inet_saddr, ntohs(inet->inet_sport),
- &inet->inet_daddr, ntohs(inet->inet_dport),
- size);
- }
-
- jprobe_return();
- return 0;
-}
-
-static struct jprobe dccp_send_probe = {
- .kp = {
- .symbol_name = "dccp_sendmsg",
- },
- .entry = jdccp_sendmsg,
-};
-
-static int dccpprobe_open(struct inode *inode, struct file *file)
-{
- kfifo_reset(&dccpw.fifo);
- getnstimeofday64(&dccpw.tstart);
- return 0;
-}
-
-static ssize_t dccpprobe_read(struct file *file, char __user *buf,
- size_t len, loff_t *ppos)
-{
- int error = 0, cnt = 0;
- unsigned char *tbuf;
-
- if (!buf)
- return -EINVAL;
-
- if (len == 0)
- return 0;
-
- tbuf = vmalloc(len);
- if (!tbuf)
- return -ENOMEM;
-
- error = wait_event_interruptible(dccpw.wait,
- kfifo_len(&dccpw.fifo) != 0);
- if (error)
- goto out_free;
-
- cnt = kfifo_out_locked(&dccpw.fifo, tbuf, len, &dccpw.lock);
- error = copy_to_user(buf, tbuf, cnt) ? -EFAULT : 0;
-
-out_free:
- vfree(tbuf);
-
- return error ? error : cnt;
-}
-
-static const struct file_operations dccpprobe_fops = {
- .owner = THIS_MODULE,
- .open = dccpprobe_open,
- .read = dccpprobe_read,
- .llseek = noop_llseek,
-};
-
-static __init int dccpprobe_init(void)
-{
- int ret = -ENOMEM;
-
- init_waitqueue_head(&dccpw.wait);
- spin_lock_init(&dccpw.lock);
- if (kfifo_alloc(&dccpw.fifo, bufsize, GFP_KERNEL))
- return ret;
- if (!proc_create(procname, S_IRUSR, init_net.proc_net, &dccpprobe_fops))
- goto err0;
-
- ret = register_jprobe(&dccp_send_probe);
- if (ret) {
- ret = request_module("dccp");
- if (!ret)
- ret = register_jprobe(&dccp_send_probe);
- }
-
- if (ret)
- goto err1;
-
- pr_info("DCCP watch registered (port=%d)\n", port);
- return 0;
-err1:
- remove_proc_entry(procname, init_net.proc_net);
-err0:
- kfifo_free(&dccpw.fifo);
- return ret;
-}
-module_init(dccpprobe_init);
-
-static __exit void dccpprobe_exit(void)
-{
- kfifo_free(&dccpw.fifo);
- remove_proc_entry(procname, init_net.proc_net);
- unregister_jprobe(&dccp_send_probe);
-
-}
-module_exit(dccpprobe_exit);
-
-MODULE_PARM_DESC(port, "Port to match (0=all)");
-module_param(port, int, 0);
-
-MODULE_PARM_DESC(bufsize, "Log buffer size (default 64k)");
-module_param(bufsize, int, 0);
-
-MODULE_AUTHOR("Ian McDonald <ian.mcdonald@jandi.co.nz>");
-MODULE_DESCRIPTION("DCCP snooper");
-MODULE_LICENSE("GPL");
* [PATCH net-next] virtio_net: Add ethtool stats
From: Toshiaki Makita @ 2017-12-20 4:40 UTC (permalink / raw)
To: David S . Miller, Michael S . Tsirkin, Jason Wang
Cc: Toshiaki Makita, netdev, virtualization
The main purpose of this patch is to add a way of checking per-queue stats.
This is useful for debugging performance problems in multiqueue environments.
$ ethtool -S ens10
NIC statistics:
rx_packets: 4172939
tx_packets: 5855538
rx_bytes: 6317757408
tx_bytes: 8865151846
rx_dropped: 0
rx_length_errors: 0
rx_frame_errors: 0
tx_dropped: 0
tx_fifo_errors: 0
rx_queue_0_packets: 2090408
rx_queue_0_bytes: 3164825094
rx_queue_1_packets: 2082531
rx_queue_1_bytes: 3152932314
tx_queue_0_packets: 2770841
tx_queue_0_bytes: 4194955474
tx_queue_1_packets: 3084697
tx_queue_1_bytes: 4670196372
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
---
drivers/net/virtio_net.c | 187 ++++++++++++++++++++++++++++++++++-------------
1 file changed, 136 insertions(+), 51 deletions(-)
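A quick way to sanity-check the sample output in the description: it lists
9 device-wide counters plus packets/bytes for each RX and TX queue, which
matches the VIRTNET_GSTATS_LEN + curr_queue_pairs * 4 count returned by
virtnet_get_sset_count() in this patch. The user-space sketch below mirrors
that layout (device-wide names first, then all RX queues, then all TX
queues). The helper names stat_count() and fill_names() and the local
GSTRING_LEN definition are illustrative assumptions, not driver code.

```c
#include <stdio.h>

#define GSTRING_LEN 32	/* mirrors ETH_GSTRING_LEN from the ethtool UAPI */

/* Device-wide counter names, in the order used by the patch. */
static const char *const global_names[] = {
	"rx_packets", "tx_packets", "rx_bytes", "tx_bytes", "rx_dropped",
	"rx_length_errors", "rx_frame_errors", "tx_dropped", "tx_fifo_errors",
};
#define N_GLOBAL (sizeof(global_names) / sizeof(global_names[0]))

/* Number of strings reported for a given number of active queue pairs. */
unsigned int stat_count(unsigned int queue_pairs)
{
	return N_GLOBAL + queue_pairs * 4;	/* rx+tx, packets+bytes each */
}

/*
 * Fill 'out' (stat_count(queue_pairs) slots of GSTRING_LEN bytes each)
 * with names laid out like virtnet_get_strings(). Returns slots written.
 */
unsigned int fill_names(char (*out)[GSTRING_LEN], unsigned int queue_pairs)
{
	static const char *const dir[] = { "rx", "tx" };
	unsigned int idx = 0, d, i;

	for (i = 0; i < N_GLOBAL; i++)
		snprintf(out[idx++], GSTRING_LEN, "%s", global_names[i]);

	for (d = 0; d < 2; d++) {
		for (i = 0; i < queue_pairs; i++) {
			snprintf(out[idx++], GSTRING_LEN,
				 "%s_queue_%u_packets", dir[d], i);
			snprintf(out[idx++], GSTRING_LEN,
				 "%s_queue_%u_bytes", dir[d], i);
		}
	}
	return idx;
}
```

With two queue pairs, as in the ens10 example, this gives 9 + 2 * 4 = 17
entries, and the values filled in by virtnet_get_ethtool_stats() have to
follow the same order for names and numbers to line up.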
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 6fb7b65..a0a7bf5 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -65,14 +65,31 @@
VIRTIO_NET_F_GUEST_UFO
};
-struct virtnet_stats {
- struct u64_stats_sync tx_syncp;
- struct u64_stats_sync rx_syncp;
- u64 tx_bytes;
- u64 tx_packets;
-
- u64 rx_bytes;
- u64 rx_packets;
+struct virtnet_gstats {
+ char stat_string[ETH_GSTRING_LEN];
+ int stat_offset;
+};
+
+#define VIRTNET_NETDEV_STAT(m) offsetof(struct rtnl_link_stats64, m)
+
+static const struct virtnet_gstats virtnet_gstrings_stats[] = {
+ { "rx_packets", VIRTNET_NETDEV_STAT(rx_packets) },
+ { "tx_packets", VIRTNET_NETDEV_STAT(tx_packets) },
+ { "rx_bytes", VIRTNET_NETDEV_STAT(rx_bytes) },
+ { "tx_bytes", VIRTNET_NETDEV_STAT(tx_bytes) },
+ { "rx_dropped", VIRTNET_NETDEV_STAT(rx_dropped) },
+ { "rx_length_errors", VIRTNET_NETDEV_STAT(rx_length_errors) },
+ { "rx_frame_errors", VIRTNET_NETDEV_STAT(rx_frame_errors) },
+ { "tx_dropped", VIRTNET_NETDEV_STAT(tx_dropped) },
+ { "tx_fifo_errors", VIRTNET_NETDEV_STAT(tx_fifo_errors) },
+};
+
+# define VIRTNET_GSTATS_LEN ARRAY_SIZE(virtnet_gstrings_stats)
+
+struct virtnet_queue_stats {
+ struct u64_stats_sync syncp;
+ u64 bytes;
+ u64 packets;
};
/* Internal representation of a send virtqueue */
@@ -86,6 +103,8 @@ struct send_queue {
/* Name of the send queue: output.$index */
char name[40];
+ struct virtnet_queue_stats stats;
+
struct napi_struct napi;
};
@@ -98,6 +117,8 @@ struct receive_queue {
struct bpf_prog __rcu *xdp_prog;
+ struct virtnet_queue_stats stats;
+
/* Chain pages by the private ptr. */
struct page *pages;
@@ -149,9 +170,6 @@ struct virtnet_info {
/* Packet virtio header size */
u8 hdr_len;
- /* Active statistics */
- struct virtnet_stats __percpu *stats;
-
/* Work struct for refilling if we run low on memory. */
struct delayed_work refill;
@@ -1121,7 +1139,6 @@ static int virtnet_receive(struct receive_queue *rq, int budget, bool *xdp_xmit)
struct virtnet_info *vi = rq->vq->vdev->priv;
unsigned int len, received = 0, bytes = 0;
void *buf;
- struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
if (!vi->big_packets || vi->mergeable_rx_bufs) {
void *ctx;
@@ -1144,10 +1161,10 @@ static int virtnet_receive(struct receive_queue *rq, int budget, bool *xdp_xmit)
schedule_delayed_work(&vi->refill, 0);
}
- u64_stats_update_begin(&stats->rx_syncp);
- stats->rx_bytes += bytes;
- stats->rx_packets += received;
- u64_stats_update_end(&stats->rx_syncp);
+ u64_stats_update_begin(&rq->stats.syncp);
+ rq->stats.bytes += bytes;
+ rq->stats.packets += received;
+ u64_stats_update_end(&rq->stats.syncp);
return received;
}
@@ -1156,8 +1173,6 @@ static void free_old_xmit_skbs(struct send_queue *sq)
{
struct sk_buff *skb;
unsigned int len;
- struct virtnet_info *vi = sq->vq->vdev->priv;
- struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
unsigned int packets = 0;
unsigned int bytes = 0;
@@ -1176,10 +1191,10 @@ static void free_old_xmit_skbs(struct send_queue *sq)
if (!packets)
return;
- u64_stats_update_begin(&stats->tx_syncp);
- stats->tx_bytes += bytes;
- stats->tx_packets += packets;
- u64_stats_update_end(&stats->tx_syncp);
+ u64_stats_update_begin(&sq->stats.syncp);
+ sq->stats.bytes += bytes;
+ sq->stats.packets += packets;
+ u64_stats_update_end(&sq->stats.syncp);
}
static void virtnet_poll_cleantx(struct receive_queue *rq)
@@ -1463,24 +1478,24 @@ static void virtnet_stats(struct net_device *dev,
struct rtnl_link_stats64 *tot)
{
struct virtnet_info *vi = netdev_priv(dev);
- int cpu;
unsigned int start;
+ int i;
- for_each_possible_cpu(cpu) {
- struct virtnet_stats *stats = per_cpu_ptr(vi->stats, cpu);
+ for (i = 0; i < vi->max_queue_pairs; i++) {
u64 tpackets, tbytes, rpackets, rbytes;
+ struct receive_queue *rq = &vi->rq[i];
+ struct send_queue *sq = &vi->sq[i];
do {
- start = u64_stats_fetch_begin_irq(&stats->tx_syncp);
- tpackets = stats->tx_packets;
- tbytes = stats->tx_bytes;
- } while (u64_stats_fetch_retry_irq(&stats->tx_syncp, start));
-
+ start = u64_stats_fetch_begin_irq(&sq->stats.syncp);
+ tpackets = sq->stats.packets;
+ tbytes = sq->stats.bytes;
+ } while (u64_stats_fetch_retry_irq(&sq->stats.syncp, start));
do {
- start = u64_stats_fetch_begin_irq(&stats->rx_syncp);
- rpackets = stats->rx_packets;
- rbytes = stats->rx_bytes;
- } while (u64_stats_fetch_retry_irq(&stats->rx_syncp, start));
+ start = u64_stats_fetch_begin_irq(&rq->stats.syncp);
+ rpackets = rq->stats.packets;
+ rbytes = rq->stats.bytes;
+ } while (u64_stats_fetch_retry_irq(&rq->stats.syncp, start));
tot->rx_packets += rpackets;
tot->tx_packets += tpackets;
@@ -1817,6 +1832,84 @@ static int virtnet_set_channels(struct net_device *dev,
return err;
}
+static void virtnet_get_strings(struct net_device *dev, u32 stringset, u8 *data)
+{
+ struct virtnet_info *vi = netdev_priv(dev);
+ char *p = (char *)data;
+ unsigned int i;
+
+ switch (stringset) {
+ case ETH_SS_STATS:
+ for (i = 0; i < VIRTNET_GSTATS_LEN; i++) {
+ memcpy(p, virtnet_gstrings_stats[i].stat_string,
+ ETH_GSTRING_LEN);
+ p += ETH_GSTRING_LEN;
+ }
+ for (i = 0; i < vi->curr_queue_pairs; i++) {
+ sprintf(p, "rx_queue_%u_packets", i);
+ p += ETH_GSTRING_LEN;
+ sprintf(p, "rx_queue_%u_bytes", i);
+ p += ETH_GSTRING_LEN;
+ }
+ for (i = 0; i < vi->curr_queue_pairs; i++) {
+ sprintf(p, "tx_queue_%u_packets", i);
+ p += ETH_GSTRING_LEN;
+ sprintf(p, "tx_queue_%u_bytes", i);
+ p += ETH_GSTRING_LEN;
+ }
+ break;
+ }
+}
+
+static int virtnet_get_sset_count(struct net_device *dev, int sset)
+{
+ struct virtnet_info *vi = netdev_priv(dev);
+
+ switch (sset) {
+ case ETH_SS_STATS:
+ return VIRTNET_GSTATS_LEN + vi->curr_queue_pairs * 4;
+ default:
+ return -EOPNOTSUPP;
+ }
+}
+
+static void virtnet_get_ethtool_stats(struct net_device *dev,
+ struct ethtool_stats *stats, u64 *data)
+{
+ struct virtnet_info *vi = netdev_priv(dev);
+ struct rtnl_link_stats64 storage;
+ unsigned int idx = 0, start, i;
+ const u8 *stats_base;
+
+ stats_base = (u8 *)dev_get_stats(dev, &storage);
+ for (i = 0; i < VIRTNET_GSTATS_LEN; i++) {
+ data[idx++] = *(u64 *)(stats_base +
+ virtnet_gstrings_stats[i].stat_offset);
+ }
+
+ for (i = 0; i < vi->curr_queue_pairs; i++) {
+ struct receive_queue *rq = &vi->rq[i];
+
+ do {
+ start = u64_stats_fetch_begin_irq(&rq->stats.syncp);
+ data[idx] = rq->stats.packets;
+ data[idx + 1] = rq->stats.bytes;
+ } while (u64_stats_fetch_retry_irq(&rq->stats.syncp, start));
+ idx += 2;
+ }
+
+ for (i = 0; i < vi->curr_queue_pairs; i++) {
+ struct send_queue *sq = &vi->sq[i];
+
+ do {
+ start = u64_stats_fetch_begin_irq(&sq->stats.syncp);
+ data[idx] = sq->stats.packets;
+ data[idx + 1] = sq->stats.bytes;
+ } while (u64_stats_fetch_retry_irq(&sq->stats.syncp, start));
+ idx += 2;
+ }
+}
+
static void virtnet_get_channels(struct net_device *dev,
struct ethtool_channels *channels)
{
@@ -1898,6 +1991,9 @@ static void virtnet_init_settings(struct net_device *dev)
.get_drvinfo = virtnet_get_drvinfo,
.get_link = ethtool_op_get_link,
.get_ringparam = virtnet_get_ringparam,
+ .get_strings = virtnet_get_strings,
+ .get_sset_count = virtnet_get_sset_count,
+ .get_ethtool_stats = virtnet_get_ethtool_stats,
.set_channels = virtnet_set_channels,
.get_channels = virtnet_get_channels,
.get_ts_info = ethtool_op_get_ts_info,
@@ -2389,6 +2485,9 @@ static int virtnet_alloc_queues(struct virtnet_info *vi)
sg_init_table(vi->rq[i].sg, ARRAY_SIZE(vi->rq[i].sg));
ewma_pkt_len_init(&vi->rq[i].mrg_avg_pkt_len);
sg_init_table(vi->sq[i].sg, ARRAY_SIZE(vi->sq[i].sg));
+
+ u64_stats_init(&vi->rq[i].stats.syncp);
+ u64_stats_init(&vi->sq[i].stats.syncp);
}
return 0;
@@ -2513,7 +2612,7 @@ static int virtnet_validate(struct virtio_device *vdev)
static int virtnet_probe(struct virtio_device *vdev)
{
- int i, err;
+ int i, err = -ENOMEM;
struct net_device *dev;
struct virtnet_info *vi;
u16 max_queue_pairs;
@@ -2590,17 +2689,6 @@ static int virtnet_probe(struct virtio_device *vdev)
vi->dev = dev;
vi->vdev = vdev;
vdev->priv = vi;
- vi->stats = alloc_percpu(struct virtnet_stats);
- err = -ENOMEM;
- if (vi->stats == NULL)
- goto free;
-
- for_each_possible_cpu(i) {
- struct virtnet_stats *virtnet_stats;
- virtnet_stats = per_cpu_ptr(vi->stats, i);
- u64_stats_init(&virtnet_stats->tx_syncp);
- u64_stats_init(&virtnet_stats->rx_syncp);
- }
INIT_WORK(&vi->config_work, virtnet_config_changed_work);
@@ -2637,7 +2725,7 @@ static int virtnet_probe(struct virtio_device *vdev)
*/
dev_err(&vdev->dev, "device MTU appears to have changed "
"it is now %d < %d", mtu, dev->min_mtu);
- goto free_stats;
+ goto free;
}
dev->mtu = mtu;
@@ -2661,7 +2749,7 @@ static int virtnet_probe(struct virtio_device *vdev)
/* Allocate/initialize the rx/tx queues, and invoke find_vqs */
err = init_vqs(vi);
if (err)
- goto free_stats;
+ goto free;
#ifdef CONFIG_SYSFS
if (vi->mergeable_rx_bufs)
@@ -2715,8 +2803,6 @@ static int virtnet_probe(struct virtio_device *vdev)
cancel_delayed_work_sync(&vi->refill);
free_receive_page_frags(vi);
virtnet_del_vqs(vi);
-free_stats:
- free_percpu(vi->stats);
free:
free_netdev(dev);
return err;
@@ -2749,7 +2835,6 @@ static void virtnet_remove(struct virtio_device *vdev)
remove_vq_common(vi);
- free_percpu(vi->stats);
free_netdev(vi->dev);
}
--
1.8.3.1
* Re: [PATCH net 0/3] Few mvneta fixes
From: Willy Tarreau @ 2017-12-20 5:19 UTC (permalink / raw)
To: Arnd Bergmann
Cc: Thomas Petazzoni, Andrew Lunn, Jason Cooper, Networking,
Antoine Tenart, Linux Kernel Mailing List, Dmitri Epshtein,
Nadav Haklai, Lior Amsalem, Miquèl Raynal, Gregory CLEMENT,
Marcin Wojtas, David S. Miller, Linux ARM, Sebastian Hesselbarth
In-Reply-To: <CAK8P3a18h_JB_P4DOFmd+v+f5KM1X9h513qUke_7nxoSJtiOUw@mail.gmail.com>
Hi Arnd,
On Tue, Dec 19, 2017 at 09:18:35PM +0100, Arnd Bergmann wrote:
> On Tue, Dec 19, 2017 at 5:59 PM, Gregory CLEMENT
> <gregory.clement@free-electrons.com> wrote:
> > Hello,
> >
> > here it is a small series of fixes found on the mvneta driver. They
> > had been already used in the vendor kernel and are now ported to
> > mainline.
>
> Does one of the patches look like it addresses the rare Oops we discussed on
> #kernelci this morning?
>
> https://storage.kernelci.org/stable/linux-4.9.y/v4.9.70/arm/mvebu_v7_defconfig/lab-free-electrons/boot-armada-375-db.html
I could be wrong, but as far as I know the 375 uses mvpp2, not mvneta,
so this should have no effect there.
Willy
* Re: r8169 regression: UDP packets dropped intermittently
From: Jonathan Woithe @ 2017-12-20 5:20 UTC (permalink / raw)
To: Michal Kubecek; +Cc: Holger Hoffstätte, netdev, linux-kernel
In-Reply-To: <20171219122523.lhavmoxo3ippftyn@unicorn.suse.cz>
On Tue, Dec 19, 2017 at 01:25:23PM +0100, Michal Kubecek wrote:
> On Tue, Dec 19, 2017 at 04:15:32PM +1030, Jonathan Woithe wrote:
> > This clearly indicates that not every card using the r8169 driver is
> > vulnerable to the problem. It also explains why Holger was unable to
> > reproduce the result on his system: the PCIe cards do not appear to suffer
> > from the problem. Most likely the PCI RTL-8169 chip is affected, but newer
> > PCIe variations are not. However, obviously more testing will be required
> > with a wider variety of cards if this inference is to hold up.
>
> The r8169 driver supports many slightly different variants of the chip.
> To identify your variant more precisely, look for a line like
>
> r8169 0000:02:00.0 eth0: RTL8168evl/8111evl at 0xffffc90003135000, d4:3d:7e:2a:30:08, XID 0c900800 IRQ 38
>
> in kernel log.
The PCIe card (the one which works correctly with the current driver) shows
this:
r8169 0000:02:00.0 eth0: RTL8168e/8111e at 0xf862e000, 80:1f:02:45:25:a4,
XID 0c200000 IRQ 30
r8169 0000:02:00.0 eth0: jumbo features [frames: 9200 bytes,
tx checksumming: ko]
The PCI card (Netgear GA311) which is affected by the problem shows this:
r8169 0000:05:01.0 eth1: RTL8110s at 0xf8706800, e0:91:f5:1b:5f:c6,
XID 04000000 IRQ 22
r8169 0000:05:01.0 eth1: jumbo features [frames: 7152 bytes,
tx checksumming: ok]
The system which has shown the regressed behaviour is running a 32-bit
kernel; for various reasons we can't move to a 64-bit kernel at present.
However, I was able to boot this system using Slackware 14.2 install discs,
and therefore test using both 32-bit and 64-bit 4.4.14 kernels. In both
cases the fault was observed within 30 minutes of starting the tests when
the GA311 card was in use. The fault is therefore not specific to 32-bit
environments.
Regards
jonathan
* Re: [PATCH net 1/2] cls_bpf: fix offload assumptions after callback conversion
From: Jiri Pirko @ 2017-12-20 6:05 UTC (permalink / raw)
To: Jakub Kicinski; +Cc: netdev, daniel, oss-drivers
In-Reply-To: <20171219213214.1084-2-jakub.kicinski@netronome.com>
Tue, Dec 19, 2017 at 10:32:13PM CET, jakub.kicinski@netronome.com wrote:
>cls_bpf used to take care of tracking what offload state a filter
>is in, i.e. it would track if offload request succeeded or not.
>This information would then be used to issue correct requests to
>the driver, e.g. requests for statistics only on offloaded filters,
>removing only filters which were offloaded, using add instead of
>replace if previous filter was not added etc.
>
>This tracking of offload state no longer functions with the new
>callback infrastructure. There could be multiple entities trying
>to offload the same filter.
>
>Throw out all the tracking and corresponding commands and simply
>pass to the drivers both old and new bpf program. Drivers will
>have to deal with offload state tracking by themselves.
>
>Fixes: 3f7889c4c79b ("net: sched: cls_bpf: call block callbacks for offload")
>Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Thanks Jakub!
* Re: RCU callback crashes
From: Jiri Pirko @ 2017-12-20 6:11 UTC (permalink / raw)
To: Jakub Kicinski; +Cc: netdev@vger.kernel.org, Cong Wang
In-Reply-To: <20171219175921.7db9b0e1@cakuba.netronome.com>
Wed, Dec 20, 2017 at 02:59:21AM CET, kubakici@wp.pl wrote:
>Hi!
>
>If I run the netdevsim test long enough on a kernel with no debugging
Just running tools/testing/selftests/bpf/test_offload.py?
>I get this:
Could you try to run it with kasan on?
>
>[ 1400.450124] BUG: unable to handle kernel paging request at 000000046474e552
>[ 1400.458005] IP: 0x46474e552
>[ 1400.461231] PGD 0 P4D 0
>[ 1400.464150] Oops: 0010 [#1] PREEMPT SMP
>[ 1400.468525] Modules linked in: cls_bpf sch_ingress algif_hash af_alg netdevsim rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace f3
>[ 1400.516951] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.15.0-rc3-perf-00918-g129c9981a55f #918
>[ 1400.526678] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.3.4 11/08/2016
>[ 1400.535150] RIP: 0010:0x46474e552
>[ 1400.538941] RSP: 0018:ffff9f736f083f08 EFLAGS: 00010216
>[ 1400.544870] RAX: ffff9f736b4771b8 RBX: ffff9f736f09b880 RCX: ffff9f736b4771b8
>[ 1400.552935] RDX: 000000046474e552 RSI: ffff9f736f083f18 RDI: ffff9f736b4771b8
>[ 1400.561001] RBP: ffffffff8bc4a740 R08: ffff9f736b4771b8 R09: 0000000000000000
>[ 1400.569066] R10: ffff9f736f083d90 R11: 0000000000000000 R12: ffff9f736f09b8b8
>[ 1400.577132] R13: 000000000000000a R14: 7fffffffffffffff R15: 0000000000000202
>[ 1400.585197] FS: 0000000000000000(0000) GS:ffff9f736f080000(0000) knlGS:0000000000000000
>[ 1400.594349] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>[ 1400.600859] CR2: 000000046474e552 CR3: 0000000839c09001 CR4: 00000000003606e0
>[ 1400.608917] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>[ 1400.616982] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>[ 1400.625048] Call Trace:
>[ 1400.627868] <IRQ>
>[ 1400.630207] ? rcu_process_callbacks+0x1a0/0x4d0
>[ 1400.635458] ? __do_softirq+0xd1/0x30a
>[ 1400.639739] ? irq_exit+0xae/0xb0
>[ 1400.643532] ? smp_apic_timer_interrupt+0x60/0x140
>[ 1400.648977] ? apic_timer_interrupt+0x8c/0xa0
>[ 1400.653934] </IRQ>
>[ 1400.656370] ? cpuidle_enter_state+0xb0/0x2f0
>[ 1400.661328] ? cpuidle_enter_state+0x8d/0x2f0
>[ 1400.666287] ? do_idle+0x17b/0x1d0
>[ 1400.670167] ? cpu_startup_entry+0x5f/0x70
>[ 1400.674836] ? start_secondary+0x169/0x190
>[ 1400.679504] ? secondary_startup_64+0xa5/0xb0
>[ 1400.684466] Code: Bad RIP value.
>[ 1400.688259] RIP: 0x46474e552 RSP: ffff9f736f083f08
>[ 1400.693703] CR2: 000000046474e552
>[ 1400.697501] ---[ end trace fab2c0fb826644df ]---
>[ 1400.708442] Kernel panic - not syncing: Fatal exception in interrupt
>[ 1400.715693] Kernel Offset: 0xa000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
>[ 1400.732994] ---[ end Kernel panic - not syncing: Fatal exception in interrupt
>
>Unfortunately reproducing the crash on an instrumented kernel seems to
>be difficult..
>
>I managed to gather this:
>
>[ 26.157415] ------------[ cut here ]------------
>[ 26.162670] ODEBUG: free active (active state 1) object type: rcu_head hint: (null)
>[ 26.172361] WARNING: CPU: 19 PID: 1352 at ../lib/debugobjects.c:291 debug_print_object+0x64/0x80
>[ 26.182288] Modules linked in: cls_bpf sch_ingress algif_hash af_alg netdevsim rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace f3
>[ 26.230728] CPU: 19 PID: 1352 Comm: tc Not tainted 4.15.0-rc3-perf-00918-g129c9981a55f #4
>[ 26.239977] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.3.4 11/08/2016
>[ 26.248453] RIP: 0010:debug_print_object+0x64/0x80
>[ 26.253896] RSP: 0018:ffffb7340410fa00 EFLAGS: 00010086
>[ 26.259825] RAX: 0000000000000051 RBX: ffff8f1f6b7cc5a0 RCX: 0000000000000006
>[ 26.267892] RDX: 0000000000000007 RSI: 0000000000000082 RDI: ffff8f1f6f48cdd0
>[ 26.275959] RBP: ffffffffb3c48600 R08: 0000000000000000 R09: 00000000000005f2
>[ 26.284042] R10: 000000000000001e R11: ffffffffb41c35ad R12: ffffffffb3a1d101
>[ 26.292125] R13: ffff8f1f6b7cc5a0 R14: ffffffffb423a8b8 R15: 0000000000000001
>[ 26.300194] FS: 00007f64d4956700(0000) GS:ffff8f1f6f480000(0000) knlGS:0000000000000000
>[ 26.309346] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>[ 26.315859] CR2: 0000000001cbc498 CR3: 000000086a8a2004 CR4: 00000000003606e0
>[ 26.323925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>[ 26.331994] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>[ 26.331994] Call Trace:
>[ 26.331998] debug_check_no_obj_freed+0x1e6/0x220
>[ 26.332020] ? qdisc_graft+0x14f/0x450
>[ 26.332025] kfree+0x14d/0x1b0
>[ 26.332027] qdisc_graft+0x14f/0x450
>[ 26.332029] tc_get_qdisc+0x12f/0x200
>[ 26.332035] rtnetlink_rcv_msg+0x122/0x310
>[ 26.332039] ? __skb_try_recv_datagram+0xef/0x150
>[ 26.332040] ? __kmalloc_node_track_caller+0x205/0x2b0
>[ 26.332042] ? rtnl_calcit.isra.12+0x100/0x100
>[ 26.332044] netlink_rcv_skb+0x8d/0x130
>[ 26.332046] netlink_unicast+0x16a/0x210
>[ 26.332048] netlink_sendmsg+0x32a/0x370
>[ 26.332054] sock_sendmsg+0x2d/0x40
>[ 26.332056] ___sys_sendmsg+0x298/0x2e0
>[ 26.332061] ? mem_cgroup_commit_charge+0x7a/0x540
>[ 26.332062] ? mem_cgroup_try_charge+0x8e/0x1d0
>[ 26.332066] ? __handle_mm_fault+0x3a1/0x1190
>[ 26.332068] ? __sys_sendmsg+0x41/0x70
>[ 26.332069] __sys_sendmsg+0x41/0x70
>[ 26.332074] entry_SYSCALL_64_fastpath+0x1e/0x81
>[ 26.332076] RIP: 0033:0x7f64d3b53450
>[ 26.332076] RSP: 002b:00007fffb5ea4388 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
>[ 26.332077] RAX: ffffffffffffffda RBX: 00007f64d3e0fb20 RCX: 00007f64d3b53450
>[ 26.332078] RDX: 0000000000000000 RSI: 00007fffb5ea43e0 RDI: 0000000000000003
>[ 26.332078] RBP: 0000000000000a11 R08: 0000000000000000 R09: 000000000000000f
>[ 26.332079] R10: 00000000000005e7 R11: 0000000000000246 R12: 00007f64d3e0fb78
>[ 26.332079] R13: 00007f64d3e0fb78 R14: 000000000000270f R15: 00007f64d3e0fb78
>[ 26.332081] Code: c1 83 c2 01 8b 4b 14 4c 8b 45 00 89 15 f6 d0 e5 00 8b 53 10 4c 89 e6 48 c7 c7 38 7c a3 b3 48 8b 14 d5 80 3d 85 b
>[ 26.332097] ---[ end trace bd33b199ae76ad43 ]---
^ permalink raw reply
* Re: [PATCH V2 net-next 01/17] net: hns3: add support to query tqps number
From: lipeng (Y) @ 2017-12-20 6:14 UTC (permalink / raw)
To: David Miller; +Cc: netdev, linux-kernel, linuxarm, salil.mehta
In-Reply-To: <20171219.141644.1574828912135603880.davem@davemloft.net>
On 2017/12/20 3:16, David Miller wrote:
> From: Lipeng <lipeng321@huawei.com>
> Date: Tue, 19 Dec 2017 12:02:23 +0800
>
>> @@ -5002,6 +5002,26 @@ static void hclge_uninit_ae_dev(struct hnae3_ae_dev *ae_dev)
>> ae_dev->priv = NULL;
>> }
>>
>> +static u32 hclge_get_max_channels(struct hnae3_handle *handle)
>> +{
>> + struct hclge_vport *vport = hclge_get_vport(handle);
>> + struct hnae3_knic_private_info *kinfo = &handle->kinfo;
>> + struct hclge_dev *hdev = vport->back;
>> +
> Please order local variables from longest to shortest line.
>
> Please audit your entire submission for this problem.
>
> .
Will check the whole patch set for this problem. Thanks.
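
As an illustration of the requested ordering, here is a sketch of the three
quoted declarations sorted from longest to shortest line. The struct
definitions, the hclge_get_vport() body, the function name
demo_get_max_channels, and the return expression are all stand-in assumptions
so the snippet compiles outside the kernel; the quoted hunk does not show the
real function body.

```c
#include <stdint.h>

/* Minimal stand-ins; the real hns3 definitions have many more fields. */
struct hnae3_knic_private_info { uint16_t num_tqps; };
struct hclge_dev { uint16_t num_tqps; };
struct hclge_vport { struct hclge_dev *back; };
struct hnae3_handle {
	struct hnae3_knic_private_info kinfo;
	struct hclge_vport *vport;	/* assumption: stand-in back-pointer */
};

/* Stand-in for the driver's helper of the same name. */
static struct hclge_vport *hclge_get_vport(struct hnae3_handle *handle)
{
	return handle->vport;
}

uint32_t demo_get_max_channels(struct hnae3_handle *handle)
{
	/* Longest declaration line first, shortest last. */
	struct hnae3_knic_private_info *kinfo = &handle->kinfo;
	struct hclge_vport *vport = hclge_get_vport(handle);
	struct hclge_dev *hdev = vport->back;

	/* Placeholder body: return the smaller of the two queue counts. */
	return kinfo->num_tqps < hdev->num_tqps ? kinfo->num_tqps
						: hdev->num_tqps;
}
```

The reordering is safe here because hdev is still initialised after vport,
the only declaration it depends on.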
^ permalink raw reply
* Re: [PATCH V2 net-next 02/17] net: hns3: add support to modify tqps number
From: lipeng (Y) @ 2017-12-20 6:15 UTC (permalink / raw)
To: David Miller; +Cc: netdev, linux-kernel, linuxarm, salil.mehta
In-Reply-To: <20171219.141840.1031328928935244349.davem@davemloft.net>
On 2017/12/20 3:18, David Miller wrote:
> From: Lipeng <lipeng321@huawei.com>
> Date: Tue, 19 Dec 2017 12:02:24 +0800
>
>> @@ -2651,6 +2651,19 @@ static int hns3_get_ring_config(struct hns3_nic_priv *priv)
>> return ret;
>> }
>>
>> +static void hns3_put_ring_config(struct hns3_nic_priv *priv)
>> +{
>> + struct hnae3_handle *h = priv->ae_handle;
>> + u16 i;
>> +
>> + for (i = 0; i < h->kinfo.num_tqps; i++) {
> Please use a plain "int" for index iteration loops like this since
> that is the canonical type to use.
Will check and fix this, thanks.
>> +static void hclge_release_tqp(struct hclge_vport *vport)
>> +{
>> + struct hnae3_knic_private_info *kinfo = &vport->nic.kinfo;
>> + struct hclge_dev *hdev = vport->back;
>> + u16 i;
>> +
>> + for (i = 0; i < kinfo->num_tqps; i++) {
> Likewise.
>
> .
>
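
A small sketch of the loop shape requested in the review, with a plain int
index iterating up to a 16-bit count. The struct demo_kinfo and the function
demo_count_tqps are hypothetical; the per-queue work of the real
hns3_put_ring_config() and hclge_release_tqp() is not shown in the quoted
hunks and is replaced by a counter.

```c
#include <stdint.h>

/* Hypothetical stand-in for the structure holding the queue count. */
struct demo_kinfo { uint16_t num_tqps; };

int demo_count_tqps(const struct demo_kinfo *kinfo)
{
	int visited = 0;
	int i;	/* plain int index, even though num_tqps is 16 bits wide */

	/* num_tqps is promoted to int for the comparison. */
	for (i = 0; i < kinfo->num_tqps; i++)
		visited++;	/* stand-in for the per-queue work */

	return visited;
}
```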
^ permalink raw reply
* Re: RCU callback crashes
From: Jakub Kicinski @ 2017-12-20 6:22 UTC (permalink / raw)
To: Jiri Pirko; +Cc: netdev@vger.kernel.org, Cong Wang
In-Reply-To: <20171220061118.GB1916@nanopsycho>
On Wed, 20 Dec 2017 07:11:18 +0100, Jiri Pirko wrote:
> Wed, Dec 20, 2017 at 02:59:21AM CET, kubakici@wp.pl wrote:
> >Hi!
> >
> >If I run the netdevsim test long enough on a kernel with no debugging
>
> Just running tools/testing/selftests/bpf/test_offload.py?
Yes, like this:
while ./linux/tools/testing/selftests/bpf/test_offload.py --log /tmp/log; do echo; done
It usually crashes after ~10 minutes on my machine.
> >I get this:
>
> Could you try to run it with kasan on?
I didn't manage to reproduce it with KASAN on so far :( Even enabling
object debugging to get the second splat in my email (which is more
useful) actually makes the crash go away, I only see the warning...
^ permalink raw reply
* Re: [Intel-wired-lan] v4.15-rc2 on thinkpad x60: ethernet stopped working
From: Neftin, Sasha @ 2017-12-20 6:24 UTC (permalink / raw)
To: Pavel Machek, jacob.e.keller
Cc: bpoirier, nix.or.die, netdev, linux-kernel, intel-wired-lan,
lsorense, David Miller
In-Reply-To: <077087f2-551a-c045-6b07-b1b661e53dad@intel.com>
On 12/18/2017 17:50, Neftin, Sasha wrote:
> On 12/18/2017 13:58, Pavel Machek wrote:
>> On Mon 2017-12-18 13:24:40, Neftin, Sasha wrote:
>>> On 12/18/2017 12:26, Pavel Machek wrote:
>>>> Hi!
>>>>
>>>>>>>> In v4.15-rc2+, network manager can not see my ethernet card, and
>>>>>>>> manual attempts to ifconfig it up did not really help, either.
>>>>>>>>
>>>>>>>> Card is:
>>>>>>>>
>>>>>>>> 02:00.0 Ethernet controller: Intel Corporation 82573L Gigabit Ethernet Controller
>>>>>> ....
>>>>>>>> Any ideas ?
>>>>>>> Yes , 19110cfbb34d4af0cdfe14cd243f3b09dc95b013 broke it.
>>>>>>>
>>>>>>> See:
>>>>>>> https://bugzilla.kernel.org/show_bug.cgi?id=198047
>>>>>>>
>>>>>>> Fix there :
>>>>>>> https://marc.info/?l=linux-kernel&m=151272209903675&w=2
>>>>>> I don't see the patch in latest mainline. Not having ethernet
>>>>>> is... somehow annoying. What is going on there?
>>>>> Generally speaking, e1000 maintenance has been handled very
>>>>> poorly over the past few years, I have to say.
>>>>>
>>>>> Fixes take forever to propagate even when someone other than the
>>>>> maintainer provides a working and tested fix, just like this case.
>>>>>
>>>>> Jeff, please take e1000 maintenance seriously and get these critical
>>>>> bug fixes propagated.
>>>> No response AFAICT. I guess I should test reverting
>>>> 19110cfbb34d4af0cdfe14cd243f3b09dc95b013, then ask you for revert?
>>> Hello Pavel,
>>>
>>> Before asking to revert 19110cfbb..., please check whether the following
>>> patch from Benjamin works for you: http://patchwork.ozlabs.org/patch/846825/
>> Jacob said, in another email:
>>
>> # Digging into this, the problem is complicated. The original bug
>> # assumed behavior of the .check_for_link call, which is universally not
>> # implemented.
>> #
>> # I think the correct fix is to revert 19110cfbb34d ("e1000e: Separate
>> # signaling for link check/link up", 2017-10-10) and find a more
>> # proper solution.
>>
>> ...which makes me think that a revert is preferred?
>>
>> Pavel
>>
> Pavel, before asking for a revert, let's check Benjamin's patch that
> follows up on his previous one. The previous patch was not complete, and
> the latest one completes the changes.
>
> _______________________________________________
> Intel-wired-lan mailing list
> Intel-wired-lan@osuosl.org
> https://lists.osuosl.org/mailman/listinfo/intel-wired-lan
Pavel, any update? Did Benjamin's latest patch solve your network problem?
^ permalink raw reply