* Re: [PATCH iproute2-next v2 2/4] devlink: Add devlink trap set and show commands
From: Jiri Pirko @ 2019-08-13 8:44 UTC (permalink / raw)
To: Ido Schimmel; +Cc: netdev, dsahern, stephen, jiri, mlxsw, Ido Schimmel
In-Reply-To: <20190813083143.13509-3-idosch@idosch.org>
Tue, Aug 13, 2019 at 10:31:41AM CEST, idosch@idosch.org wrote:
>From: Ido Schimmel <idosch@mellanox.com>
>
>The trap set command allows the user to set the action of an individual
>trap. Example:
>
># devlink trap set netdevsim/netdevsim10 trap blackhole_route action trap
>
>The trap show command allows the user to get the current status of an
>individual trap or a dump of all traps in case one is not specified.
>When '-s' is specified the trap's statistics are shown. When '-v' is
>specified the metadata types the trap can provide are shown. Example:
>
># devlink -jvps trap show netdevsim/netdevsim10 trap blackhole_route
>{
> "trap": {
> "netdevsim/netdevsim10": [ {
> "name": "blackhole_route",
> "type": "drop",
> "generic": true,
> "action": "trap",
> "group": "l3_drops",
> "metadata": [ "input_port" ],
> "stats": {
> "rx": {
> "bytes": 0,
> "packets": 0
> }
> }
> } ]
> }
>}
>
>Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
^ permalink raw reply
* Re: tun: mark small packets as owned by the tap sock
From: Jason Wang @ 2019-08-13 8:33 UTC (permalink / raw)
To: Dave Jones, Alexis Bauvin; +Cc: netdev
In-Reply-To: <20190812221954.GA13314@codemonkey.org.uk>
On 2019/8/13 上午6:19, Dave Jones wrote:
> On Wed, Aug 07, 2019 at 12:30:07AM +0000, Linux Kernel wrote:
> > Commit: 4b663366246be1d1d4b1b8b01245b2e88ad9e706
> > Parent: 16b2084a8afa1432d14ba72b7c97d7908e178178
> > Web: https://git.kernel.org/torvalds/c/4b663366246be1d1d4b1b8b01245b2e88ad9e706
> > Author: Alexis Bauvin <abauvin@scaleway.com>
> > AuthorDate: Tue Jul 23 16:23:01 2019 +0200
> >
> > tun: mark small packets as owned by the tap sock
> >
> > - v1 -> v2: Move skb_set_owner_w to __tun_build_skb to reduce patch size
> >
> > Small packets going out of a tap device go through an optimized code
> > path that uses build_skb() rather than sock_alloc_send_pskb(). The
> > latter calls skb_set_owner_w(), but the small packet code path does not.
> >
> > The net effect is that small packets are not owned by the userland
> > application's socket (e.g. QEMU), while large packets are.
> > This can be seen with a TCP session, where packets are not owned when
> > the window size is small enough (around PAGE_SIZE), while they are once
> > the window grows (note that this requires the host to support virtio
> > tso for the guest to offload segmentation).
> > All this leads to inconsistent behaviour in the kernel, especially on
> > netfilter modules that uses sk->socket (e.g. xt_owner).
> >
> > Fixes: 66ccbc9c87c2 ("tap: use build_skb() for small packet")
> > Signed-off-by: Alexis Bauvin <abauvin@scaleway.com>
> > Acked-by: Jason Wang <jasowang@redhat.com>
>
> This commit breaks ipv6 routing when I deployed on it a linode.
> It seems to work briefly after boot, and then silently all packets get
> dropped. (Presumably, it's dropping RA or ND packets)
>
> With this reverted, everything works as it did in rc3.
>
> Dave
Hi:
Two questions:
- Are you using XDP for TUN?
- Does it work before 66ccbc9c87c2? If yes, could you show us the result
of net_dropmonitor?
Thanks
>
^ permalink raw reply
* Re: [PATCH 14/16] include/linux: prefer __section from compiler_attributes.h
From: Will Deacon @ 2019-08-13 8:32 UTC (permalink / raw)
To: Nick Desaulniers
Cc: akpm, sedat.dilek, jpoimboe, yhs, miguel.ojeda.sandonis,
clang-built-linux, Luc Van Oostenryck, Lai Jiangshan,
Paul E. McKenney, Josh Triplett, Steven Rostedt,
Mathieu Desnoyers, Alexei Starovoitov, Daniel Borkmann,
Martin KaFai Lau, Song Liu, Thomas Gleixner, Ingo Molnar,
Peter Zijlstra (Intel), Nicholas Piggin, Jiri Kosina,
Ard Biesheuvel, Michael Ellerman, Masahiro Yamada,
Hans Liljestrand, Elena Reshetova, David Windsor, Marc Zyngier,
Ming Lei, Dou Liyang, Julien Thierry, Mauro Carvalho Chehab,
Jens Axboe, linux-kernel, linux-sparse, rcu, netdev, bpf
In-Reply-To: <20190812215052.71840-14-ndesaulniers@google.com>
On Mon, Aug 12, 2019 at 02:50:47PM -0700, Nick Desaulniers wrote:
> Link: https://github.com/ClangBuiltLinux/linux/issues/619
> Reported-by: Sedat Dilek <sedat.dilek@gmail.com>
> Suggested-by: Josh Poimboeuf <jpoimboe@redhat.com>
> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
> ---
-ENOCOMMITMESSAGE
Otherwise, patch looks good to me.
Will
^ permalink raw reply
* [PATCH iproute2-next v2 4/4] devlink: Add man page for devlink-trap
From: Ido Schimmel @ 2019-08-13 8:31 UTC (permalink / raw)
To: netdev; +Cc: dsahern, stephen, jiri, mlxsw, Ido Schimmel
In-Reply-To: <20190813083143.13509-1-idosch@idosch.org>
From: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
man/man8/devlink-monitor.8 | 3 +-
man/man8/devlink-trap.8 | 138 +++++++++++++++++++++++++++++++++++++
man/man8/devlink.8 | 11 ++-
3 files changed, 150 insertions(+), 2 deletions(-)
create mode 100644 man/man8/devlink-trap.8
diff --git a/man/man8/devlink-monitor.8 b/man/man8/devlink-monitor.8
index 13fe641dc8f5..fffab3a4ce88 100644
--- a/man/man8/devlink-monitor.8
+++ b/man/man8/devlink-monitor.8
@@ -21,7 +21,7 @@ command is the first in the command line and then the object list.
.I OBJECT-LIST
is the list of object types that we want to monitor.
It may contain
-.BR dev ", " port ".
+.BR dev ", " port ", " trap ", " trap-group .
.B devlink
opens Devlink Netlink socket, listens on it and dumps state changes.
@@ -31,6 +31,7 @@ opens Devlink Netlink socket, listens on it and dumps state changes.
.BR devlink-dev (8),
.BR devlink-sb (8),
.BR devlink-port (8),
+.BR devlink-trap (8),
.br
.SH AUTHOR
diff --git a/man/man8/devlink-trap.8 b/man/man8/devlink-trap.8
new file mode 100644
index 000000000000..4f079eb86d7b
--- /dev/null
+++ b/man/man8/devlink-trap.8
@@ -0,0 +1,138 @@
+.TH DEVLINK\-TRAP 8 "2 August 2019" "iproute2" "Linux"
+.SH NAME
+devlink-trap \- devlink trap configuration
+.SH SYNOPSIS
+.sp
+.ad l
+.in +8
+.ti -8
+.B devlink
+.RI "[ " OPTIONS " ]"
+.B trap
+.RI "{ " COMMAND " |"
+.BR help " }"
+.sp
+
+.ti -8
+.IR OPTIONS " := { "
+\fB\-v\fR[\fIerbose\fR] |
+\fB\-s\fR[\fItatistics\fR] }
+
+.ti -8
+.B "devlink trap show"
+.RI "[ " DEV
+.B trap
+.IR TRAP " ]"
+
+.ti -8
+.BI "devlink trap set " DEV " trap " TRAP
+.RB "[ " action " { " trap " | " drop " } ]"
+
+.ti -8
+.B "devlink trap group show"
+.RI "[ " DEV
+.B group
+.IR GROUP " ]"
+
+.ti -8
+.BI "devlink trap group set " DEV " group " GROUP
+.RB "[ " action " { " trap " | " drop " } ]"
+
+.ti -8
+.B devlink trap help
+
+.SH "DESCRIPTION"
+.SS devlink trap show - display available packet traps and their attributes
+
+.PP
+.I "DEV"
+- specifies the devlink device from which to show packet traps.
+If this argument is omitted all packet traps of all devices are listed.
+
+.PP
+.BI "trap " TRAP
+- specifies the packet trap.
+Only applicable if a devlink device is also specified.
+
+.SS devlink trap set - set attributes of a packet trap
+
+.PP
+.I "DEV"
+- specifies the devlink device the packet trap belongs to.
+
+.PP
+.BI "trap " TRAP
+- specifies the packet trap.
+
+.TP
+.BR action " { " trap " | " drop " } "
+packet trap action.
+
+.I trap
+- the sole copy of the packet is sent to the CPU.
+
+.I drop
+- the packet is dropped by the underlying device and a copy is not sent to the CPU.
+
+.SS devlink trap group show - display available packet trap groups and their attributes
+
+.PP
+.I "DEV"
+- specifies the devlink device from which to show packet trap groups.
+If this argument is omitted all packet trap groups of all devices are listed.
+
+.PP
+.BI "group " GROUP
+- specifies the packet trap group.
+Only applicable if a devlink device is also specified.
+
+.SS devlink trap group set - set attributes of a packet trap group
+
+.PP
+.I "DEV"
+- specifies the devlink device the packet trap group belongs to.
+
+.PP
+.BI "group " GROUP
+- specifies the packet trap group.
+
+.TP
+.BR action " { " trap " | " drop " } "
+packet trap action. The action is set for all the packet traps member in the
+trap group. The actions of non-drop traps cannot be changed and are thus
+skipped.
+
+.SH "EXAMPLES"
+.PP
+devlink trap show
+.RS 4
+List available packet traps.
+.RE
+.PP
+devlink trap group show
+.RS 4
+List available packet trap groups.
+.RE
+.PP
+devlink -vs trap show pci/0000:01:00.0 trap source_mac_is_multicast
+.RS 4
+Show attributes and statistics of a specific packet trap.
+.RE
+.PP
+devlink -s trap group show pci/0000:01:00.0 group l2_drops
+.RS 4
+Show attributes and statistics of a specific packet trap group.
+.RE
+.PP
+devlink trap set pci/0000:01:00.0 trap source_mac_is_multicast action trap
+.RS 4
+Set the action of a specific packet trap to 'trap'.
+
+.SH SEE ALSO
+.BR devlink (8),
+.BR devlink-dev (8),
+.BR devlink-monitor (8),
+.br
+
+.SH AUTHOR
+Ido Schimmel <idosch@mellanox.com>
diff --git a/man/man8/devlink.8 b/man/man8/devlink.8
index 13d4dcd908b3..12d489440a3d 100644
--- a/man/man8/devlink.8
+++ b/man/man8/devlink.8
@@ -7,7 +7,7 @@ devlink \- Devlink tool
.in +8
.ti -8
.B devlink
-.RI "[ " OPTIONS " ] { " dev | port | monitor | sb | resource | region | health " } { " COMMAND " | "
+.RI "[ " OPTIONS " ] { " dev | port | monitor | sb | resource | region | health | trap " } { " COMMAND " | "
.BR help " }"
.sp
@@ -51,6 +51,10 @@ When combined with -j generate a pretty JSON output.
.BR "\-v" , " --verbose"
Turn on verbose output.
+.TP
+.BR "\-s" , " --statistics"
+Output statistics.
+
.SS
.I OBJECT
@@ -82,6 +86,10 @@ Turn on verbose output.
.B health
- devlink reporting and recovery
+.TP
+.B trap
+- devlink trap configuration
+
.SS
.I COMMAND
@@ -114,6 +122,7 @@ Exit status is 0 if command was successful or a positive integer upon failure.
.BR devlink-resource (8),
.BR devlink-region (8),
.BR devlink-health (8),
+.BR devlink-trap (8),
.br
.SH REPORTING BUGS
--
2.21.0
^ permalink raw reply related
* [PATCH iproute2-next v2 3/4] devlink: Add devlink trap group set and show commands
From: Ido Schimmel @ 2019-08-13 8:31 UTC (permalink / raw)
To: netdev; +Cc: dsahern, stephen, jiri, mlxsw, Ido Schimmel
In-Reply-To: <20190813083143.13509-1-idosch@idosch.org>
From: Ido Schimmel <idosch@mellanox.com>
These commands are similar to the trap set and show commands, but
operate on a trap group and not individual traps. Example:
# devlink trap group set netdevsim/netdevsim10 group l3_drops action trap
# devlink -jps trap group show netdevsim/netdevsim10 group l3_drops
{
"trap_group": {
"netdevsim/netdevsim10": [ {
"name": "l3_drops",
"generic": true,
"stats": {
"rx": {
"bytes": 0,
"packets": 0
}
}
} ]
}
}
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
devlink/devlink.c | 135 +++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 133 insertions(+), 2 deletions(-)
diff --git a/devlink/devlink.c b/devlink/devlink.c
index 81fff4429841..2f084c020765 100644
--- a/devlink/devlink.c
+++ b/devlink/devlink.c
@@ -235,6 +235,7 @@ static void ifname_map_free(struct ifname_map *ifname_map)
#define DL_OPT_HEALTH_REPORTER_AUTO_RECOVER BIT(28)
#define DL_OPT_TRAP_NAME BIT(29)
#define DL_OPT_TRAP_ACTION BIT(30)
+#define DL_OPT_TRAP_GROUP_NAME BIT(31)
struct dl_opts {
uint64_t present; /* flags of present items */
@@ -272,6 +273,7 @@ struct dl_opts {
uint64_t reporter_graceful_period;
bool reporter_auto_recover;
const char *trap_name;
+ const char *trap_group_name;
enum devlink_trap_action trap_action;
};
@@ -1092,6 +1094,7 @@ static const struct dl_args_metadata dl_args_required[] = {
{DL_OPT_REGION_LENGTH, "Region length value expected."},
{DL_OPT_HEALTH_REPORTER_NAME, "Reporter's name is expected."},
{DL_OPT_TRAP_NAME, "Trap's name is expected."},
+ {DL_OPT_TRAP_GROUP_NAME, "Trap group's name is expected."},
};
static int dl_args_finding_required_validate(uint64_t o_required,
@@ -1388,6 +1391,13 @@ static int dl_argv_parse(struct dl *dl, uint64_t o_required,
if (err)
return err;
o_found |= DL_OPT_TRAP_NAME;
+ } else if (dl_argv_match(dl, "group") &&
+ (o_all & DL_OPT_TRAP_GROUP_NAME)) {
+ dl_arg_inc(dl);
+ err = dl_argv_str(dl, &opts->trap_group_name);
+ if (err)
+ return err;
+ o_found |= DL_OPT_TRAP_GROUP_NAME;
} else if (dl_argv_match(dl, "action") &&
(o_all & DL_OPT_TRAP_ACTION)) {
const char *actionstr;
@@ -1516,6 +1526,9 @@ static void dl_opts_put(struct nlmsghdr *nlh, struct dl *dl)
if (opts->present & DL_OPT_TRAP_NAME)
mnl_attr_put_strz(nlh, DEVLINK_ATTR_TRAP_NAME,
opts->trap_name);
+ if (opts->present & DL_OPT_TRAP_GROUP_NAME)
+ mnl_attr_put_strz(nlh, DEVLINK_ATTR_TRAP_GROUP_NAME,
+ opts->trap_group_name);
if (opts->present & DL_OPT_TRAP_ACTION)
mnl_attr_put_u8(nlh, DEVLINK_ATTR_TRAP_ACTION,
opts->trap_action);
@@ -3869,6 +3882,10 @@ static const char *cmd_name(uint8_t cmd)
case DEVLINK_CMD_TRAP_SET: return "set";
case DEVLINK_CMD_TRAP_NEW: return "new";
case DEVLINK_CMD_TRAP_DEL: return "del";
+ case DEVLINK_CMD_TRAP_GROUP_GET: return "get";
+ case DEVLINK_CMD_TRAP_GROUP_SET: return "set";
+ case DEVLINK_CMD_TRAP_GROUP_NEW: return "new";
+ case DEVLINK_CMD_TRAP_GROUP_DEL: return "del";
default: return "<unknown cmd>";
}
}
@@ -3902,6 +3919,11 @@ static const char *cmd_obj(uint8_t cmd)
case DEVLINK_CMD_TRAP_NEW:
case DEVLINK_CMD_TRAP_DEL:
return "trap";
+ case DEVLINK_CMD_TRAP_GROUP_GET:
+ case DEVLINK_CMD_TRAP_GROUP_SET:
+ case DEVLINK_CMD_TRAP_GROUP_NEW:
+ case DEVLINK_CMD_TRAP_GROUP_DEL:
+ return "trap-group";
default: return "<unknown obj>";
}
}
@@ -3928,6 +3950,7 @@ static bool cmd_filter_check(struct dl *dl, uint8_t cmd)
static void pr_out_region(struct dl *dl, struct nlattr **tb);
static void pr_out_trap(struct dl *dl, struct nlattr **tb, bool array);
+static void pr_out_trap_group(struct dl *dl, struct nlattr **tb, bool array);
static int cmd_mon_show_cb(const struct nlmsghdr *nlh, void *data)
{
@@ -3999,6 +4022,18 @@ static int cmd_mon_show_cb(const struct nlmsghdr *nlh, void *data)
pr_out_mon_header(genl->cmd);
pr_out_trap(dl, tb, false);
break;
+ case DEVLINK_CMD_TRAP_GROUP_GET: /* fall through */
+ case DEVLINK_CMD_TRAP_GROUP_SET: /* fall through */
+ case DEVLINK_CMD_TRAP_GROUP_NEW: /* fall through */
+ case DEVLINK_CMD_TRAP_GROUP_DEL:
+ mnl_attr_parse(nlh, sizeof(*genl), attr_cb, tb);
+ if (!tb[DEVLINK_ATTR_BUS_NAME] || !tb[DEVLINK_ATTR_DEV_NAME] ||
+ !tb[DEVLINK_ATTR_TRAP_GROUP_NAME] ||
+ !tb[DEVLINK_ATTR_STATS])
+ return MNL_CB_ERROR;
+ pr_out_mon_header(genl->cmd);
+ pr_out_trap_group(dl, tb, false);
+ break;
}
return MNL_CB_OK;
}
@@ -4013,7 +4048,8 @@ static int cmd_mon_show(struct dl *dl)
if (strcmp(cur_obj, "all") != 0 &&
strcmp(cur_obj, "dev") != 0 &&
strcmp(cur_obj, "port") != 0 &&
- strcmp(cur_obj, "trap") != 0) {
+ strcmp(cur_obj, "trap") != 0 &&
+ strcmp(cur_obj, "trap-group") != 0) {
pr_err("Unknown object \"%s\"\n", cur_obj);
return -EINVAL;
}
@@ -4030,7 +4066,7 @@ static int cmd_mon_show(struct dl *dl)
static void cmd_mon_help(void)
{
pr_err("Usage: devlink monitor [ all | OBJECT-LIST ]\n"
- "where OBJECT-LIST := { dev | port | trap }\n");
+ "where OBJECT-LIST := { dev | port | trap | trap-group }\n");
}
static int cmd_mon(struct dl *dl)
@@ -6546,6 +6582,8 @@ static void cmd_trap_help(void)
{
pr_err("Usage: devlink trap set DEV trap TRAP [ action { trap | drop } ]\n");
pr_err(" devlink trap show [ DEV trap TRAP ]\n");
+ pr_err(" devlink trap group set DEV group GROUP [ action { trap | drop } ]\n");
+ pr_err(" devlink trap group show [ DEV group GROUP ]\n");
}
static int cmd_trap_show(struct dl *dl)
@@ -6589,6 +6627,96 @@ static int cmd_trap_set(struct dl *dl)
return _mnlg_socket_sndrcv(dl->nlg, nlh, NULL, NULL);
}
+static void pr_out_trap_group(struct dl *dl, struct nlattr **tb, bool array)
+{
+ if (array)
+ pr_out_handle_start_arr(dl, tb);
+ else
+ __pr_out_handle_start(dl, tb, true, false);
+
+ pr_out_str(dl, "name",
+ mnl_attr_get_str(tb[DEVLINK_ATTR_TRAP_GROUP_NAME]));
+ pr_out_bool(dl, "generic", !!tb[DEVLINK_ATTR_TRAP_GENERIC]);
+ pr_out_stats(dl, tb[DEVLINK_ATTR_STATS]);
+ pr_out_handle_end(dl);
+}
+
+static int cmd_trap_group_show_cb(const struct nlmsghdr *nlh, void *data)
+{
+ struct genlmsghdr *genl = mnl_nlmsg_get_payload(nlh);
+ struct nlattr *tb[DEVLINK_ATTR_MAX + 1] = {};
+ struct dl *dl = data;
+
+ mnl_attr_parse(nlh, sizeof(*genl), attr_cb, tb);
+ if (!tb[DEVLINK_ATTR_BUS_NAME] || !tb[DEVLINK_ATTR_DEV_NAME] ||
+ !tb[DEVLINK_ATTR_TRAP_GROUP_NAME] || !tb[DEVLINK_ATTR_STATS])
+ return MNL_CB_ERROR;
+
+ pr_out_trap_group(dl, tb, true);
+
+ return MNL_CB_OK;
+}
+
+static int cmd_trap_group_show(struct dl *dl)
+{
+ uint16_t flags = NLM_F_REQUEST | NLM_F_ACK;
+ struct nlmsghdr *nlh;
+ int err;
+
+ if (dl_argc(dl) == 0)
+ flags |= NLM_F_DUMP;
+
+ nlh = mnlg_msg_prepare(dl->nlg, DEVLINK_CMD_TRAP_GROUP_GET, flags);
+
+ if (dl_argc(dl) > 0) {
+ err = dl_argv_parse_put(nlh, dl,
+ DL_OPT_HANDLE | DL_OPT_TRAP_GROUP_NAME,
+ 0);
+ if (err)
+ return err;
+ }
+
+ pr_out_section_start(dl, "trap_group");
+ err = _mnlg_socket_sndrcv(dl->nlg, nlh, cmd_trap_group_show_cb, dl);
+ pr_out_section_end(dl);
+
+ return err;
+}
+
+static int cmd_trap_group_set(struct dl *dl)
+{
+ struct nlmsghdr *nlh;
+ int err;
+
+ nlh = mnlg_msg_prepare(dl->nlg, DEVLINK_CMD_TRAP_GROUP_SET,
+ NLM_F_REQUEST | NLM_F_ACK);
+
+ err = dl_argv_parse_put(nlh, dl,
+ DL_OPT_HANDLE | DL_OPT_TRAP_GROUP_NAME,
+ DL_OPT_TRAP_ACTION);
+ if (err)
+ return err;
+
+ return _mnlg_socket_sndrcv(dl->nlg, nlh, NULL, NULL);
+}
+
+static int cmd_trap_group(struct dl *dl)
+{
+ if (dl_argv_match(dl, "help")) {
+ cmd_trap_help();
+ return 0;
+ } else if (dl_argv_match(dl, "show") ||
+ dl_argv_match(dl, "list") || dl_no_arg(dl)) {
+ dl_arg_inc(dl);
+ return cmd_trap_group_show(dl);
+ } else if (dl_argv_match(dl, "set")) {
+ dl_arg_inc(dl);
+ return cmd_trap_group_set(dl);
+ }
+ pr_err("Command \"%s\" not found\n", dl_argv(dl));
+ return -ENOENT;
+}
+
static int cmd_trap(struct dl *dl)
{
if (dl_argv_match(dl, "help")) {
@@ -6601,6 +6729,9 @@ static int cmd_trap(struct dl *dl)
} else if (dl_argv_match(dl, "set")) {
dl_arg_inc(dl);
return cmd_trap_set(dl);
+ } else if (dl_argv_match(dl, "group")) {
+ dl_arg_inc(dl);
+ return cmd_trap_group(dl);
}
pr_err("Command \"%s\" not found\n", dl_argv(dl));
return -ENOENT;
--
2.21.0
^ permalink raw reply related
* [PATCH iproute2-next v2 2/4] devlink: Add devlink trap set and show commands
From: Ido Schimmel @ 2019-08-13 8:31 UTC (permalink / raw)
To: netdev; +Cc: dsahern, stephen, jiri, mlxsw, Ido Schimmel
In-Reply-To: <20190813083143.13509-1-idosch@idosch.org>
From: Ido Schimmel <idosch@mellanox.com>
The trap set command allows the user to set the action of an individual
trap. Example:
# devlink trap set netdevsim/netdevsim10 trap blackhole_route action trap
The trap show command allows the user to get the current status of an
individual trap or a dump of all traps in case one is not specified.
When '-s' is specified the trap's statistics are shown. When '-v' is
specified the metadata types the trap can provide are shown. Example:
# devlink -jvps trap show netdevsim/netdevsim10 trap blackhole_route
{
"trap": {
"netdevsim/netdevsim10": [ {
"name": "blackhole_route",
"type": "drop",
"generic": true,
"action": "trap",
"group": "l3_drops",
"metadata": [ "input_port" ],
"stats": {
"rx": {
"bytes": 0,
"packets": 0
}
}
} ]
}
}
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
devlink/devlink.c | 293 +++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 288 insertions(+), 5 deletions(-)
diff --git a/devlink/devlink.c b/devlink/devlink.c
index 4ed240e251f5..81fff4429841 100644
--- a/devlink/devlink.c
+++ b/devlink/devlink.c
@@ -233,6 +233,8 @@ static void ifname_map_free(struct ifname_map *ifname_map)
#define DL_OPT_HEALTH_REPORTER_NAME BIT(27)
#define DL_OPT_HEALTH_REPORTER_GRACEFUL_PERIOD BIT(27)
#define DL_OPT_HEALTH_REPORTER_AUTO_RECOVER BIT(28)
+#define DL_OPT_TRAP_NAME BIT(29)
+#define DL_OPT_TRAP_ACTION BIT(30)
struct dl_opts {
uint64_t present; /* flags of present items */
@@ -269,6 +271,8 @@ struct dl_opts {
const char *reporter_name;
uint64_t reporter_graceful_period;
bool reporter_auto_recover;
+ const char *trap_name;
+ enum devlink_trap_action trap_action;
};
struct dl {
@@ -282,6 +286,7 @@ struct dl {
bool json_output;
bool pretty_output;
bool verbose;
+ bool stats;
struct {
bool present;
char *bus_name;
@@ -436,6 +441,19 @@ static const enum mnl_attr_data_type devlink_policy[DEVLINK_ATTR_MAX + 1] = {
[DEVLINK_ATTR_HEALTH_REPORTER_RECOVER_COUNT] = MNL_TYPE_U64,
[DEVLINK_ATTR_HEALTH_REPORTER_DUMP_TS] = MNL_TYPE_U64,
[DEVLINK_ATTR_HEALTH_REPORTER_GRACEFUL_PERIOD] = MNL_TYPE_U64,
+ [DEVLINK_ATTR_STATS] = MNL_TYPE_NESTED,
+ [DEVLINK_ATTR_TRAP_NAME] = MNL_TYPE_STRING,
+ [DEVLINK_ATTR_TRAP_ACTION] = MNL_TYPE_U8,
+ [DEVLINK_ATTR_TRAP_TYPE] = MNL_TYPE_U8,
+ [DEVLINK_ATTR_TRAP_GENERIC] = MNL_TYPE_FLAG,
+ [DEVLINK_ATTR_TRAP_METADATA] = MNL_TYPE_NESTED,
+ [DEVLINK_ATTR_TRAP_GROUP_NAME] = MNL_TYPE_STRING,
+};
+
+static const enum mnl_attr_data_type
+devlink_stats_policy[DEVLINK_ATTR_STATS_MAX + 1] = {
+ [DEVLINK_ATTR_STATS_RX_PACKETS] = MNL_TYPE_U64,
+ [DEVLINK_ATTR_STATS_RX_BYTES] = MNL_TYPE_U64,
};
static int attr_cb(const struct nlattr *attr, void *data)
@@ -454,6 +472,25 @@ static int attr_cb(const struct nlattr *attr, void *data)
return MNL_CB_OK;
}
+static int attr_stats_cb(const struct nlattr *attr, void *data)
+{
+ const struct nlattr **tb = data;
+ int type;
+
+ /* Allow the tool to work on top of newer kernels that might contain
+ * more attributes.
+ */
+ if (mnl_attr_type_valid(attr, DEVLINK_ATTR_STATS_MAX) < 0)
+ return MNL_CB_OK;
+
+ type = mnl_attr_get_type(attr);
+ if (mnl_attr_validate(attr, devlink_stats_policy[type]) < 0)
+ return MNL_CB_ERROR;
+
+ tb[type] = attr;
+ return MNL_CB_OK;
+}
+
static int ifname_map_cb(const struct nlmsghdr *nlh, void *data)
{
struct nlattr *tb[DEVLINK_ATTR_MAX + 1] = {};
@@ -1014,6 +1051,20 @@ static int param_cmode_get(const char *cmodestr,
return 0;
}
+static int trap_action_get(const char *actionstr,
+ enum devlink_trap_action *p_action)
+{
+ if (strcmp(actionstr, "drop") == 0) {
+ *p_action = DEVLINK_TRAP_ACTION_DROP;
+ } else if (strcmp(actionstr, "trap") == 0) {
+ *p_action = DEVLINK_TRAP_ACTION_TRAP;
+ } else {
+ pr_err("Unknown trap action \"%s\"\n", actionstr);
+ return -EINVAL;
+ }
+ return 0;
+}
+
struct dl_args_metadata {
uint64_t o_flag;
char err_msg[DL_ARGS_REQUIRED_MAX_ERR_LEN];
@@ -1040,6 +1091,7 @@ static const struct dl_args_metadata dl_args_required[] = {
{DL_OPT_REGION_ADDRESS, "Region address value expected."},
{DL_OPT_REGION_LENGTH, "Region length value expected."},
{DL_OPT_HEALTH_REPORTER_NAME, "Reporter's name is expected."},
+ {DL_OPT_TRAP_NAME, "Trap's name is expected."},
};
static int dl_args_finding_required_validate(uint64_t o_required,
@@ -1329,6 +1381,25 @@ static int dl_argv_parse(struct dl *dl, uint64_t o_required,
if (err)
return err;
o_found |= DL_OPT_HEALTH_REPORTER_AUTO_RECOVER;
+ } else if (dl_argv_match(dl, "trap") &&
+ (o_all & DL_OPT_TRAP_NAME)) {
+ dl_arg_inc(dl);
+ err = dl_argv_str(dl, &opts->trap_name);
+ if (err)
+ return err;
+ o_found |= DL_OPT_TRAP_NAME;
+ } else if (dl_argv_match(dl, "action") &&
+ (o_all & DL_OPT_TRAP_ACTION)) {
+ const char *actionstr;
+
+ dl_arg_inc(dl);
+ err = dl_argv_str(dl, &actionstr);
+ if (err)
+ return err;
+ err = trap_action_get(actionstr, &opts->trap_action);
+ if (err)
+ return err;
+ o_found |= DL_OPT_TRAP_ACTION;
} else {
pr_err("Unknown option \"%s\"\n", dl_argv(dl));
return -EINVAL;
@@ -1442,6 +1513,12 @@ static void dl_opts_put(struct nlmsghdr *nlh, struct dl *dl)
if (opts->present & DL_OPT_HEALTH_REPORTER_AUTO_RECOVER)
mnl_attr_put_u8(nlh, DEVLINK_ATTR_HEALTH_REPORTER_AUTO_RECOVER,
opts->reporter_auto_recover);
+ if (opts->present & DL_OPT_TRAP_NAME)
+ mnl_attr_put_strz(nlh, DEVLINK_ATTR_TRAP_NAME,
+ opts->trap_name);
+ if (opts->present & DL_OPT_TRAP_ACTION)
+ mnl_attr_put_u8(nlh, DEVLINK_ATTR_TRAP_ACTION,
+ opts->trap_action);
}
@@ -1945,6 +2022,30 @@ static void pr_out_entry_end(struct dl *dl)
__pr_out_newline();
}
+static void pr_out_stats(struct dl *dl, struct nlattr *nla_stats)
+{
+ struct nlattr *tb[DEVLINK_ATTR_STATS_MAX + 1] = {};
+ int err;
+
+ if (!dl->stats)
+ return;
+
+ err = mnl_attr_parse_nested(nla_stats, attr_stats_cb, tb);
+ if (err != MNL_CB_OK)
+ return;
+
+ pr_out_object_start(dl, "stats");
+ pr_out_object_start(dl, "rx");
+ if (tb[DEVLINK_ATTR_STATS_RX_BYTES])
+ pr_out_u64(dl, "bytes",
+ mnl_attr_get_u64(tb[DEVLINK_ATTR_STATS_RX_BYTES]));
+ if (tb[DEVLINK_ATTR_STATS_RX_PACKETS])
+ pr_out_u64(dl, "packets",
+ mnl_attr_get_u64(tb[DEVLINK_ATTR_STATS_RX_PACKETS]));
+ pr_out_object_end(dl);
+ pr_out_object_end(dl);
+}
+
static const char *param_cmode_name(uint8_t cmode)
{
switch (cmode) {
@@ -3764,6 +3865,10 @@ static const char *cmd_name(uint8_t cmd)
case DEVLINK_CMD_REGION_SET: return "set";
case DEVLINK_CMD_REGION_NEW: return "new";
case DEVLINK_CMD_REGION_DEL: return "del";
+ case DEVLINK_CMD_TRAP_GET: return "get";
+ case DEVLINK_CMD_TRAP_SET: return "set";
+ case DEVLINK_CMD_TRAP_NEW: return "new";
+ case DEVLINK_CMD_TRAP_DEL: return "del";
default: return "<unknown cmd>";
}
}
@@ -3792,6 +3897,11 @@ static const char *cmd_obj(uint8_t cmd)
case DEVLINK_CMD_REGION_NEW:
case DEVLINK_CMD_REGION_DEL:
return "region";
+ case DEVLINK_CMD_TRAP_GET:
+ case DEVLINK_CMD_TRAP_SET:
+ case DEVLINK_CMD_TRAP_NEW:
+ case DEVLINK_CMD_TRAP_DEL:
+ return "trap";
default: return "<unknown obj>";
}
}
@@ -3817,6 +3927,7 @@ static bool cmd_filter_check(struct dl *dl, uint8_t cmd)
}
static void pr_out_region(struct dl *dl, struct nlattr **tb);
+static void pr_out_trap(struct dl *dl, struct nlattr **tb, bool array);
static int cmd_mon_show_cb(const struct nlmsghdr *nlh, void *data)
{
@@ -3872,6 +3983,22 @@ static int cmd_mon_show_cb(const struct nlmsghdr *nlh, void *data)
pr_out_mon_header(genl->cmd);
pr_out_region(dl, tb);
break;
+ case DEVLINK_CMD_TRAP_GET: /* fall through */
+ case DEVLINK_CMD_TRAP_SET: /* fall through */
+ case DEVLINK_CMD_TRAP_NEW: /* fall through */
+ case DEVLINK_CMD_TRAP_DEL:
+ mnl_attr_parse(nlh, sizeof(*genl), attr_cb, tb);
+ if (!tb[DEVLINK_ATTR_BUS_NAME] || !tb[DEVLINK_ATTR_DEV_NAME] ||
+ !tb[DEVLINK_ATTR_TRAP_NAME] ||
+ !tb[DEVLINK_ATTR_TRAP_TYPE] ||
+ !tb[DEVLINK_ATTR_TRAP_ACTION] ||
+ !tb[DEVLINK_ATTR_TRAP_GROUP_NAME] ||
+ !tb[DEVLINK_ATTR_TRAP_METADATA] ||
+ !tb[DEVLINK_ATTR_STATS])
+ return MNL_CB_ERROR;
+ pr_out_mon_header(genl->cmd);
+ pr_out_trap(dl, tb, false);
+ break;
}
return MNL_CB_OK;
}
@@ -3885,7 +4012,8 @@ static int cmd_mon_show(struct dl *dl)
while ((cur_obj = dl_argv_index(dl, index++))) {
if (strcmp(cur_obj, "all") != 0 &&
strcmp(cur_obj, "dev") != 0 &&
- strcmp(cur_obj, "port") != 0) {
+ strcmp(cur_obj, "port") != 0 &&
+ strcmp(cur_obj, "trap") != 0) {
pr_err("Unknown object \"%s\"\n", cur_obj);
return -EINVAL;
}
@@ -3902,7 +4030,7 @@ static int cmd_mon_show(struct dl *dl)
static void cmd_mon_help(void)
{
pr_err("Usage: devlink monitor [ all | OBJECT-LIST ]\n"
- "where OBJECT-LIST := { dev | port }\n");
+ "where OBJECT-LIST := { dev | port | trap }\n");
}
static int cmd_mon(struct dl *dl)
@@ -6330,12 +6458,160 @@ static int cmd_health(struct dl *dl)
return -ENOENT;
}
+static const char *trap_type_name(uint8_t type)
+{
+ switch (type) {
+ case DEVLINK_TRAP_TYPE_DROP:
+ return "drop";
+ case DEVLINK_TRAP_TYPE_EXCEPTION:
+ return "exception";
+ default:
+ return "<unknown type>";
+ }
+}
+
+static const char *trap_action_name(uint8_t action)
+{
+ switch (action) {
+ case DEVLINK_TRAP_ACTION_DROP:
+ return "drop";
+ case DEVLINK_TRAP_ACTION_TRAP:
+ return "trap";
+ default:
+ return "<unknown action>";
+ }
+}
+
+static const char *trap_metadata_name(const struct nlattr *attr)
+{
+ switch (attr->nla_type) {
+ case DEVLINK_ATTR_TRAP_METADATA_TYPE_IN_PORT:
+ return "input_port";
+ default:
+ return "<unknown metadata type>";
+ }
+}
+static void pr_out_trap_metadata(struct dl *dl, struct nlattr *attr)
+{
+ struct nlattr *attr_metadata;
+
+ pr_out_array_start(dl, "metadata");
+ mnl_attr_for_each_nested(attr_metadata, attr)
+ pr_out_str_value(dl, trap_metadata_name(attr_metadata));
+ pr_out_array_end(dl);
+}
+
+static void pr_out_trap(struct dl *dl, struct nlattr **tb, bool array)
+{
+ uint8_t action = mnl_attr_get_u8(tb[DEVLINK_ATTR_TRAP_ACTION]);
+ uint8_t type = mnl_attr_get_u8(tb[DEVLINK_ATTR_TRAP_TYPE]);
+
+ if (array)
+ pr_out_handle_start_arr(dl, tb);
+ else
+ __pr_out_handle_start(dl, tb, true, false);
+
+ pr_out_str(dl, "name", mnl_attr_get_str(tb[DEVLINK_ATTR_TRAP_NAME]));
+ pr_out_str(dl, "type", trap_type_name(type));
+ pr_out_bool(dl, "generic", !!tb[DEVLINK_ATTR_TRAP_GENERIC]);
+ pr_out_str(dl, "action", trap_action_name(action));
+ pr_out_str(dl, "group",
+ mnl_attr_get_str(tb[DEVLINK_ATTR_TRAP_GROUP_NAME]));
+ if (dl->verbose)
+ pr_out_trap_metadata(dl, tb[DEVLINK_ATTR_TRAP_METADATA]);
+ pr_out_stats(dl, tb[DEVLINK_ATTR_STATS]);
+ pr_out_handle_end(dl);
+}
+
+static int cmd_trap_show_cb(const struct nlmsghdr *nlh, void *data)
+{
+ struct genlmsghdr *genl = mnl_nlmsg_get_payload(nlh);
+ struct nlattr *tb[DEVLINK_ATTR_MAX + 1] = {};
+ struct dl *dl = data;
+
+ mnl_attr_parse(nlh, sizeof(*genl), attr_cb, tb);
+ if (!tb[DEVLINK_ATTR_BUS_NAME] || !tb[DEVLINK_ATTR_DEV_NAME] ||
+ !tb[DEVLINK_ATTR_TRAP_NAME] || !tb[DEVLINK_ATTR_TRAP_TYPE] ||
+ !tb[DEVLINK_ATTR_TRAP_ACTION] ||
+ !tb[DEVLINK_ATTR_TRAP_GROUP_NAME] ||
+ !tb[DEVLINK_ATTR_TRAP_METADATA] || !tb[DEVLINK_ATTR_STATS])
+ return MNL_CB_ERROR;
+
+ pr_out_trap(dl, tb, true);
+
+ return MNL_CB_OK;
+}
+
+static void cmd_trap_help(void)
+{
+ pr_err("Usage: devlink trap set DEV trap TRAP [ action { trap | drop } ]\n");
+ pr_err(" devlink trap show [ DEV trap TRAP ]\n");
+}
+
+static int cmd_trap_show(struct dl *dl)
+{
+ uint16_t flags = NLM_F_REQUEST | NLM_F_ACK;
+ struct nlmsghdr *nlh;
+ int err;
+
+ if (dl_argc(dl) == 0)
+ flags |= NLM_F_DUMP;
+
+ nlh = mnlg_msg_prepare(dl->nlg, DEVLINK_CMD_TRAP_GET, flags);
+
+ if (dl_argc(dl) > 0) {
+ err = dl_argv_parse_put(nlh, dl,
+ DL_OPT_HANDLE | DL_OPT_TRAP_NAME, 0);
+ if (err)
+ return err;
+ }
+
+ pr_out_section_start(dl, "trap");
+ err = _mnlg_socket_sndrcv(dl->nlg, nlh, cmd_trap_show_cb, dl);
+ pr_out_section_end(dl);
+
+ return err;
+}
+
+static int cmd_trap_set(struct dl *dl)
+{
+ struct nlmsghdr *nlh;
+ int err;
+
+ nlh = mnlg_msg_prepare(dl->nlg, DEVLINK_CMD_TRAP_SET,
+ NLM_F_REQUEST | NLM_F_ACK);
+
+ err = dl_argv_parse_put(nlh, dl, DL_OPT_HANDLE | DL_OPT_TRAP_NAME,
+ DL_OPT_TRAP_ACTION);
+ if (err)
+ return err;
+
+ return _mnlg_socket_sndrcv(dl->nlg, nlh, NULL, NULL);
+}
+
+static int cmd_trap(struct dl *dl)
+{
+ if (dl_argv_match(dl, "help")) {
+ cmd_trap_help();
+ return 0;
+ } else if (dl_argv_match(dl, "show") ||
+ dl_argv_match(dl, "list") || dl_no_arg(dl)) {
+ dl_arg_inc(dl);
+ return cmd_trap_show(dl);
+ } else if (dl_argv_match(dl, "set")) {
+ dl_arg_inc(dl);
+ return cmd_trap_set(dl);
+ }
+ pr_err("Command \"%s\" not found\n", dl_argv(dl));
+ return -ENOENT;
+}
+
static void help(void)
{
pr_err("Usage: devlink [ OPTIONS ] OBJECT { COMMAND | help }\n"
" devlink [ -f[orce] ] -b[atch] filename\n"
- "where OBJECT := { dev | port | sb | monitor | dpipe | resource | region | health }\n"
- " OPTIONS := { -V[ersion] | -n[o-nice-names] | -j[son] | -p[retty] | -v[erbose] }\n");
+ "where OBJECT := { dev | port | sb | monitor | dpipe | resource | region | health | trap }\n"
+ " OPTIONS := { -V[ersion] | -n[o-nice-names] | -j[son] | -p[retty] | -v[erbose] -s[tatistics] }\n");
}
static int dl_cmd(struct dl *dl, int argc, char **argv)
@@ -6370,6 +6646,9 @@ static int dl_cmd(struct dl *dl, int argc, char **argv)
} else if (dl_argv_match(dl, "health")) {
dl_arg_inc(dl);
return cmd_health(dl);
+ } else if (dl_argv_match(dl, "trap")) {
+ dl_arg_inc(dl);
+ return cmd_trap(dl);
}
pr_err("Object \"%s\" not found\n", dl_argv(dl));
return -ENOENT;
@@ -6479,6 +6758,7 @@ int main(int argc, char **argv)
{ "json", no_argument, NULL, 'j' },
{ "pretty", no_argument, NULL, 'p' },
{ "verbose", no_argument, NULL, 'v' },
+ { "statistics", no_argument, NULL, 's' },
{ NULL, 0, NULL, 0 }
};
const char *batch_file = NULL;
@@ -6494,7 +6774,7 @@ int main(int argc, char **argv)
return EXIT_FAILURE;
}
- while ((opt = getopt_long(argc, argv, "Vfb:njpv",
+ while ((opt = getopt_long(argc, argv, "Vfb:njpvs",
long_options, NULL)) >= 0) {
switch (opt) {
@@ -6520,6 +6800,9 @@ int main(int argc, char **argv)
case 'v':
dl->verbose = true;
break;
+ case 's':
+ dl->stats = true;
+ break;
default:
pr_err("Unknown option.\n");
help();
--
2.21.0
^ permalink raw reply related
* [PATCH iproute2-next v2 1/4] devlink: Increase number of supported options
From: Ido Schimmel @ 2019-08-13 8:31 UTC (permalink / raw)
To: netdev; +Cc: dsahern, stephen, jiri, mlxsw, Ido Schimmel
In-Reply-To: <20190813083143.13509-1-idosch@idosch.org>
From: Ido Schimmel <idosch@mellanox.com>
Currently, the number of supported options is capped at 32 which is a
problem given we are about to add a few more and go over the limit.
Increase the limit to 64 options.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
devlink/devlink.c | 24 ++++++++++++------------
1 file changed, 12 insertions(+), 12 deletions(-)
diff --git a/devlink/devlink.c b/devlink/devlink.c
index 91c85dc1de73..4ed240e251f5 100644
--- a/devlink/devlink.c
+++ b/devlink/devlink.c
@@ -235,7 +235,7 @@ static void ifname_map_free(struct ifname_map *ifname_map)
#define DL_OPT_HEALTH_REPORTER_AUTO_RECOVER BIT(28)
struct dl_opts {
- uint32_t present; /* flags of present items */
+ uint64_t present; /* flags of present items */
char *bus_name;
char *dev_name;
uint32_t port_index;
@@ -735,7 +735,7 @@ static int dl_argv_handle_port(struct dl *dl, char **p_bus_name,
static int dl_argv_handle_both(struct dl *dl, char **p_bus_name,
char **p_dev_name, uint32_t *p_port_index,
- uint32_t *p_handle_bit)
+ uint64_t *p_handle_bit)
{
char *str = dl_argv_next(dl);
unsigned int slash_count;
@@ -1015,7 +1015,7 @@ static int param_cmode_get(const char *cmodestr,
}
struct dl_args_metadata {
- uint32_t o_flag;
+ uint64_t o_flag;
char err_msg[DL_ARGS_REQUIRED_MAX_ERR_LEN];
};
@@ -1042,10 +1042,10 @@ static const struct dl_args_metadata dl_args_required[] = {
{DL_OPT_HEALTH_REPORTER_NAME, "Reporter's name is expected."},
};
-static int dl_args_finding_required_validate(uint32_t o_required,
- uint32_t o_found)
+static int dl_args_finding_required_validate(uint64_t o_required,
+ uint64_t o_found)
{
- uint32_t o_flag;
+ uint64_t o_flag;
int i;
for (i = 0; i < ARRAY_SIZE(dl_args_required); i++) {
@@ -1058,16 +1058,16 @@ static int dl_args_finding_required_validate(uint32_t o_required,
return 0;
}
-static int dl_argv_parse(struct dl *dl, uint32_t o_required,
- uint32_t o_optional)
+static int dl_argv_parse(struct dl *dl, uint64_t o_required,
+ uint64_t o_optional)
{
struct dl_opts *opts = &dl->opts;
- uint32_t o_all = o_required | o_optional;
- uint32_t o_found = 0;
+ uint64_t o_all = o_required | o_optional;
+ uint64_t o_found = 0;
int err;
if (o_required & DL_OPT_HANDLE && o_required & DL_OPT_HANDLEP) {
- uint32_t handle_bit;
+ uint64_t handle_bit;
err = dl_argv_handle_both(dl, &opts->bus_name, &opts->dev_name,
&opts->port_index, &handle_bit);
@@ -1446,7 +1446,7 @@ static void dl_opts_put(struct nlmsghdr *nlh, struct dl *dl)
}
static int dl_argv_parse_put(struct nlmsghdr *nlh, struct dl *dl,
- uint32_t o_required, uint32_t o_optional)
+ uint64_t o_required, uint64_t o_optional)
{
int err;
--
2.21.0
^ permalink raw reply related
* [PATCH iproute2-next v2 0/4] Add devlink-trap support
From: Ido Schimmel @ 2019-08-13 8:31 UTC (permalink / raw)
To: netdev; +Cc: dsahern, stephen, jiri, mlxsw, Ido Schimmel
From: Ido Schimmel <idosch@mellanox.com>
This patchset adds devlink-trap support in iproute2.
Patch #1 increases the number of options devlink can handle.
Patches #2-#3 gradually add support for all devlink-trap commands.
Patch #4 adds a man page for devlink-trap.
See individual commit messages for example usage and output.
Changes in v2:
* Remove report option and monitor command since monitoring is done
using drop monitor
Ido Schimmel (4):
devlink: Increase number of supported options
devlink: Add devlink trap set and show commands
devlink: Add devlink trap group set and show commands
devlink: Add man page for devlink-trap
devlink/devlink.c | 448 +++++++++++++++++++++++++++++++++++--
man/man8/devlink-monitor.8 | 3 +-
man/man8/devlink-trap.8 | 138 ++++++++++++
man/man8/devlink.8 | 11 +-
4 files changed, 581 insertions(+), 19 deletions(-)
create mode 100644 man/man8/devlink-trap.8
--
2.21.0
^ permalink raw reply
* Re: [PATCH 14/16] include/linux: prefer __section from compiler_attributes.h
From: Ard Biesheuvel @ 2019-08-13 8:31 UTC (permalink / raw)
To: Nick Desaulniers
Cc: Andrew Morton, Sedat Dilek, Josh Poimboeuf, yhs, Miguel Ojeda,
clang-built-linux, Luc Van Oostenryck, Lai Jiangshan,
Paul E. McKenney, Josh Triplett, Steven Rostedt,
Mathieu Desnoyers, Alexei Starovoitov, Daniel Borkmann,
Martin KaFai Lau, Song Liu, Thomas Gleixner, Ingo Molnar,
Peter Zijlstra (Intel), Nicholas Piggin, Jiri Kosina, Will Deacon,
Michael Ellerman, Masahiro Yamada, Hans Liljestrand,
Elena Reshetova, David Windsor, Marc Zyngier, Ming Lei,
Dou Liyang, Julien Thierry, Mauro Carvalho Chehab, Jens Axboe,
Linux Kernel Mailing List, Linux-Sparse, rcu,
<netdev@vger.kernel.org>, bpf
In-Reply-To: <20190812215052.71840-14-ndesaulniers@google.com>
On Tue, 13 Aug 2019 at 00:53, Nick Desaulniers <ndesaulniers@google.com> wrote:
>
This patch needs a commit log that describes the reason for making this change.
> Link: https://github.com/ClangBuiltLinux/linux/issues/619
> Reported-by: Sedat Dilek <sedat.dilek@gmail.com>
> Suggested-by: Josh Poimboeuf <jpoimboe@redhat.com>
> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
> ---
> include/linux/cache.h | 6 +++---
> include/linux/compiler.h | 8 ++++----
> include/linux/cpu.h | 2 +-
> include/linux/export.h | 2 +-
> include/linux/init_task.h | 4 ++--
> include/linux/interrupt.h | 5 ++---
> include/linux/sched/debug.h | 2 +-
> include/linux/srcutree.h | 2 +-
> 8 files changed, 15 insertions(+), 16 deletions(-)
>
> diff --git a/include/linux/cache.h b/include/linux/cache.h
> index 750621e41d1c..3f4df9eef1e1 100644
> --- a/include/linux/cache.h
> +++ b/include/linux/cache.h
> @@ -28,7 +28,7 @@
> * but may get written to during init, so can't live in .rodata (via "const").
> */
> #ifndef __ro_after_init
> -#define __ro_after_init __attribute__((__section__(".data..ro_after_init")))
> +#define __ro_after_init __section(.data..ro_after_init)
> #endif
>
> #ifndef ____cacheline_aligned
> @@ -45,8 +45,8 @@
>
> #ifndef __cacheline_aligned
> #define __cacheline_aligned \
> - __attribute__((__aligned__(SMP_CACHE_BYTES), \
> - __section__(".data..cacheline_aligned")))
> + __aligned(SMP_CACHE_BYTES) \
> + __section(.data..cacheline_aligned)
> #endif /* __cacheline_aligned */
>
> #ifndef __cacheline_aligned_in_smp
> diff --git a/include/linux/compiler.h b/include/linux/compiler.h
> index f0fd5636fddb..5e88e7e33abe 100644
> --- a/include/linux/compiler.h
> +++ b/include/linux/compiler.h
> @@ -24,7 +24,7 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
> long ______r; \
> static struct ftrace_likely_data \
> __aligned(4) \
> - __section("_ftrace_annotated_branch") \
> + __section(_ftrace_annotated_branch) \
> ______f = { \
> .data.func = __func__, \
> .data.file = __FILE__, \
> @@ -60,7 +60,7 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
> #define __trace_if_value(cond) ({ \
> static struct ftrace_branch_data \
> __aligned(4) \
> - __section("_ftrace_branch") \
> + __section(_ftrace_branch) \
> __if_trace = { \
> .func = __func__, \
> .file = __FILE__, \
> @@ -118,7 +118,7 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
> ".popsection\n\t"
>
> /* Annotate a C jump table to allow objtool to follow the code flow */
> -#define __annotate_jump_table __section(".rodata..c_jump_table")
> +#define __annotate_jump_table __section(.rodata..c_jump_table)
>
> #else
> #define annotate_reachable()
> @@ -298,7 +298,7 @@ unsigned long read_word_at_a_time(const void *addr)
> * visible to the compiler.
> */
> #define __ADDRESSABLE(sym) \
> - static void * __section(".discard.addressable") __used \
> + static void * __section(.discard.addressable) __used \
> __PASTE(__addressable_##sym, __LINE__) = (void *)&sym;
>
> /**
> diff --git a/include/linux/cpu.h b/include/linux/cpu.h
> index fcb1386bb0d4..186bbd79d6ce 100644
> --- a/include/linux/cpu.h
> +++ b/include/linux/cpu.h
> @@ -166,7 +166,7 @@ void cpu_startup_entry(enum cpuhp_state state);
> void cpu_idle_poll_ctrl(bool enable);
>
> /* Attach to any functions which should be considered cpuidle. */
> -#define __cpuidle __attribute__((__section__(".cpuidle.text")))
> +#define __cpuidle __section(.cpuidle.text)
>
> bool cpu_in_idle(unsigned long pc);
>
> diff --git a/include/linux/export.h b/include/linux/export.h
> index fd8711ed9ac4..808c1a0c2ef9 100644
> --- a/include/linux/export.h
> +++ b/include/linux/export.h
> @@ -104,7 +104,7 @@ struct kernel_symbol {
> * discarded in the final link stage.
> */
> #define __ksym_marker(sym) \
> - static int __ksym_marker_##sym[0] __section(".discard.ksym") __used
> + static int __ksym_marker_##sym[0] __section(.discard.ksym) __used
>
> #define __EXPORT_SYMBOL(sym, sec) \
> __ksym_marker(sym); \
> diff --git a/include/linux/init_task.h b/include/linux/init_task.h
> index 6049baa5b8bc..50139505da34 100644
> --- a/include/linux/init_task.h
> +++ b/include/linux/init_task.h
> @@ -51,12 +51,12 @@ extern struct cred init_cred;
>
> /* Attach to the init_task data structure for proper alignment */
> #ifdef CONFIG_ARCH_TASK_STRUCT_ON_STACK
> -#define __init_task_data __attribute__((__section__(".data..init_task")))
> +#define __init_task_data __section(.data..init_task)
> #else
> #define __init_task_data /**/
> #endif
>
> /* Attach to the thread_info data structure for proper alignment */
> -#define __init_thread_info __attribute__((__section__(".data..init_thread_info")))
> +#define __init_thread_info __section(.data..init_thread_info)
>
> #endif
> diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
> index 5b8328a99b2a..29debfe4dd0f 100644
> --- a/include/linux/interrupt.h
> +++ b/include/linux/interrupt.h
> @@ -741,8 +741,7 @@ extern int arch_early_irq_init(void);
> /*
> * We want to know which function is an entrypoint of a hardirq or a softirq.
> */
> -#define __irq_entry __attribute__((__section__(".irqentry.text")))
> -#define __softirq_entry \
> - __attribute__((__section__(".softirqentry.text")))
> +#define __irq_entry __section(.irqentry.text)
> +#define __softirq_entry __section(.softirqentry.text)
>
> #endif
> diff --git a/include/linux/sched/debug.h b/include/linux/sched/debug.h
> index 95fb9e025247..e17b66221fdd 100644
> --- a/include/linux/sched/debug.h
> +++ b/include/linux/sched/debug.h
> @@ -42,7 +42,7 @@ extern void proc_sched_set_task(struct task_struct *p);
> #endif
>
> /* Attach to any functions which should be ignored in wchan output. */
> -#define __sched __attribute__((__section__(".sched.text")))
> +#define __sched __section(.sched.text)
>
> /* Linker adds these: start and end of __sched functions */
> extern char __sched_text_start[], __sched_text_end[];
> diff --git a/include/linux/srcutree.h b/include/linux/srcutree.h
> index 9cfcc8a756ae..9de652f4e1bd 100644
> --- a/include/linux/srcutree.h
> +++ b/include/linux/srcutree.h
> @@ -124,7 +124,7 @@ struct srcu_struct {
> # define __DEFINE_SRCU(name, is_static) \
> is_static struct srcu_struct name; \
> struct srcu_struct * const __srcu_struct_##name \
> - __section("___srcu_struct_ptrs") = &name
> + __section(___srcu_struct_ptrs) = &name
> #else
> # define __DEFINE_SRCU(name, is_static) \
> static DEFINE_PER_CPU(struct srcu_data, name##_srcu_data); \
> --
> 2.23.0.rc1.153.gdeed80330f-goog
>
^ permalink raw reply
* Re: [PATCH V5 0/9] Fixes for vhost metadata acceleration
From: Jason Wang @ 2019-08-13 8:31 UTC (permalink / raw)
To: Jason Gunthorpe, Michael S. Tsirkin
Cc: kvm, virtualization, netdev, linux-kernel, linux-mm
In-Reply-To: <20190812130252.GE24457@ziepe.ca>
On 2019/8/12 下午9:02, Jason Gunthorpe wrote:
> On Mon, Aug 12, 2019 at 05:49:08AM -0400, Michael S. Tsirkin wrote:
>> On Mon, Aug 12, 2019 at 10:44:51AM +0800, Jason Wang wrote:
>>> On 2019/8/11 上午1:52, Michael S. Tsirkin wrote:
>>>> On Fri, Aug 09, 2019 at 01:48:42AM -0400, Jason Wang wrote:
>>>>> Hi all:
>>>>>
>>>>> This series try to fix several issues introduced by meta data
>>>>> accelreation series. Please review.
>>>>>
>>>>> Changes from V4:
>>>>> - switch to use spinlock synchronize MMU notifier with accessors
>>>>>
>>>>> Changes from V3:
>>>>> - remove the unnecessary patch
>>>>>
>>>>> Changes from V2:
>>>>> - use seqlck helper to synchronize MMU notifier with vhost worker
>>>>>
>>>>> Changes from V1:
>>>>> - try not use RCU to syncrhonize MMU notifier with vhost worker
>>>>> - set dirty pages after no readers
>>>>> - return -EAGAIN only when we find the range is overlapped with
>>>>> metadata
>>>>>
>>>>> Jason Wang (9):
>>>>> vhost: don't set uaddr for invalid address
>>>>> vhost: validate MMU notifier registration
>>>>> vhost: fix vhost map leak
>>>>> vhost: reset invalidate_count in vhost_set_vring_num_addr()
>>>>> vhost: mark dirty pages during map uninit
>>>>> vhost: don't do synchronize_rcu() in vhost_uninit_vq_maps()
>>>>> vhost: do not use RCU to synchronize MMU notifier with worker
>>>>> vhost: correctly set dirty pages in MMU notifiers callback
>>>>> vhost: do not return -EAGAIN for non blocking invalidation too early
>>>>>
>>>>> drivers/vhost/vhost.c | 202 +++++++++++++++++++++++++-----------------
>>>>> drivers/vhost/vhost.h | 6 +-
>>>>> 2 files changed, 122 insertions(+), 86 deletions(-)
>>>> This generally looks more solid.
>>>>
>>>> But this amounts to a significant overhaul of the code.
>>>>
>>>> At this point how about we revert 7f466032dc9e5a61217f22ea34b2df932786bbfc
>>>> for this release, and then re-apply a corrected version
>>>> for the next one?
>>>
>>> If possible, consider we've actually disabled the feature. How about just
>>> queued those patches for next release?
>>>
>>> Thanks
>> Sorry if I was unclear. My idea is that
>> 1. I revert the disabled code
>> 2. You send a patch readding it with all the fixes squashed
>> 3. Maybe optimizations on top right away?
>> 4. We queue *that* for next and see what happens.
>>
>> And the advantage over the patchy approach is that the current patches
>> are hard to review. E.g. it's not reasonable to ask RCU guys to review
>> the whole of vhost for RCU usage but it's much more reasonable to ask
>> about a specific patch.
> I think there are other problems here too, I don't like that the use
> of mmu notifiers is so different from every other driver, or that GUP
> is called under spinlock.
What kind of issues do you see? Spinlock is to synchronize GUP with MMU
notifier in this series.
Btw, back to the original question. May I know why synchronize_rcu() is
not suitable? Consider:
- MMU notifier are allowed to sleep
- MMU notifier could be preempted
If you mean something that prevents RCU grace period from running. I'm
afraid MMU notifier is not the only victim. But it should be no more
worse than some one is holding a lock for very long time. If the only
concern is the preemption of vhost kthread, I can switch to use
rcu_read_lock_bh() instead.
Thanks
>
> So I favor the revert and try again approach as well. It is hard to
> get a clear picture with these endless bug fix patches
>
> Jason
Ok.
Thanks
^ permalink raw reply
* RE: [PATCH net-next v2 04/12] net: stmmac: Add Split Header support and enable it in XGMAC cores
From: Jose Abreu @ 2019-08-13 8:30 UTC (permalink / raw)
To: David Miller, jakub.kicinski@netronome.com
Cc: netdev@vger.kernel.org, Joao.Pinto@synopsys.com,
peppe.cavallaro@st.com, alexandre.torgue@st.com,
mcoquelin.stm32@gmail.com,
linux-stm32@st-md-mailman.stormreply.com,
linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org
In-Reply-To: <20190812.140618.1288127671943783439.davem@davemloft.net>
++ Jakub
From: David Miller <davem@davemloft.net>
Date: Aug/12/2019, 22:06:18 (UTC+00:00)
> From: Jose Abreu <Jose.Abreu@synopsys.com>
> Date: Mon, 12 Aug 2019 11:44:03 +0200
>
> > - Add performance info (David)
>
> Ummm...
>
> Whilst cpu utilization is interesting, I might be mainly interested in
> how this effects "networking" performance. I find it very surprising
> that it isn't obvious that this is what I wanted.
>
> Do you not do performance testing on the networking level when you
> make fundamental changes to how packets are processed by the
> hardware/driver?
I do.
In my HW this feature does not impact performance neither improves it as
I'm already on max of theoretical bandwidth.
I do expect it to reduce CPU usage and memory footprint because we no
longer have to memcpy the entire buffer to SKB and instead we just copy
the header and then append payload as page which is passed directly to
net layer.
This feature is already used in some drivers and is part of GRO
offloading I think.
Jakub, as David is off can you please comment on how can we proceed with
this series ? I can add more information in commit log for this patch
...
---
Thanks,
Jose Miguel Abreu
^ permalink raw reply
* Re: [PATCH net-next] net: openvswitch: Set OvS recirc_id from tc chain index
From: Paul Blakey @ 2019-08-13 8:29 UTC (permalink / raw)
To: Pravin Shelar
Cc: Linux Kernel Network Developers, David S. Miller, Justin Pettit,
Simon Horman, Marcelo Ricardo Leitner, Vlad Buslov, Jiri Pirko,
Roi Dayan, Yossi Kuperman, Rony Efraim, Oz Shlomo
In-Reply-To: <CAOrHB_CDrau-jLycRYxRkn1tEXVrRhoSYSd8sAcGPiZ-bp+FEg@mail.gmail.com>
On 8/12/2019 7:18 PM, Pravin Shelar wrote:
> On Sun, Aug 11, 2019 at 3:46 AM Paul Blakey <paulb@mellanox.com> wrote:
>>
>> On 8/8/2019 11:53 PM, Pravin Shelar wrote:
>>> On Wed, Aug 7, 2019 at 5:08 AM Paul Blakey <paulb@mellanox.com> wrote:
>>>> Offloaded OvS datapath rules are translated one to one to tc rules,
>>>> for example the following simplified OvS rule:
>>>>
>>>> recirc_id(0),in_port(dev1),eth_type(0x0800),ct_state(-trk) actions:ct(),recirc(2)
>>>>
>>>> Will be translated to the following tc rule:
>>>>
>>>> $ tc filter add dev dev1 ingress \
>>>> prio 1 chain 0 proto ip \
>>>> flower tcp ct_state -trk \
>>>> action ct pipe \
>>>> action goto chain 2
>>>>
>>>> Received packets will first travel though tc, and if they aren't stolen
>>>> by it, like in the above rule, they will continue to OvS datapath.
>>>> Since we already did some actions (action ct in this case) which might
>>>> modify the packets, and updated action stats, we would like to continue
>>>> the proccessing with the correct recirc_id in OvS (here recirc_id(2))
>>>> where we left off.
>>>>
>>>> To support this, introduce a new skb extension for tc, which
>>>> will be used for translating tc chain to ovs recirc_id to
>>>> handle these miss cases. Last tc chain index will be set
>>>> by tc goto chain action and read by OvS datapath.
>>>>
>>>> Signed-off-by: Paul Blakey <paulb@mellanox.com>
>>>> Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
>>>> Acked-by: Jiri Pirko <jiri@mellanox.com>
>>>> ---
>>>> include/linux/skbuff.h | 13 +++++++++++++
>>>> include/net/sch_generic.h | 5 ++++-
>>>> net/core/skbuff.c | 6 ++++++
>>>> net/openvswitch/flow.c | 9 +++++++++
>>>> net/sched/Kconfig | 13 +++++++++++++
>>>> net/sched/act_api.c | 1 +
>>>> net/sched/cls_api.c | 12 ++++++++++++
>>>> 7 files changed, 58 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
>>>> index 3aef8d8..fb2a792 100644
>>>> --- a/include/linux/skbuff.h
>>>> +++ b/include/linux/skbuff.h
>>>> @@ -279,6 +279,16 @@ struct nf_bridge_info {
>>>> };
>>>> #endif
>>>>
>>>> +#if IS_ENABLED(CONFIG_NET_TC_SKB_EXT)
>>>> +/* Chain in tc_skb_ext will be used to share the tc chain with
>>>> + * ovs recirc_id. It will be set to the current chain by tc
>>>> + * and read by ovs to recirc_id.
>>>> + */
>>>> +struct tc_skb_ext {
>>>> + __u32 chain;
>>>> +};
>>>> +#endif
>>>> +
>>>> struct sk_buff_head {
>>>> /* These two members must be first. */
>>>> struct sk_buff *next;
>>>> @@ -4050,6 +4060,9 @@ enum skb_ext_id {
>>>> #ifdef CONFIG_XFRM
>>>> SKB_EXT_SEC_PATH,
>>>> #endif
>>>> +#if IS_ENABLED(CONFIG_NET_TC_SKB_EXT)
>>>> + TC_SKB_EXT,
>>>> +#endif
>>>> SKB_EXT_NUM, /* must be last */
>>>> };
>>>>
>>>> diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
>>>> index 6b6b012..871feea 100644
>>>> --- a/include/net/sch_generic.h
>>>> +++ b/include/net/sch_generic.h
>>>> @@ -275,7 +275,10 @@ struct tcf_result {
>>>> unsigned long class;
>>>> u32 classid;
>>>> };
>>>> - const struct tcf_proto *goto_tp;
>>>> + struct {
>>>> + const struct tcf_proto *goto_tp;
>>>> + u32 goto_index;
>>>> + };
>>>>
>>>> /* used in the skb_tc_reinsert function */
>>>> struct {
>>>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
>>>> index ea8e8d3..2b40b5a 100644
>>>> --- a/net/core/skbuff.c
>>>> +++ b/net/core/skbuff.c
>>>> @@ -4087,6 +4087,9 @@ int skb_gro_receive(struct sk_buff *p, struct sk_buff *skb)
>>>> #ifdef CONFIG_XFRM
>>>> [SKB_EXT_SEC_PATH] = SKB_EXT_CHUNKSIZEOF(struct sec_path),
>>>> #endif
>>>> +#if IS_ENABLED(CONFIG_NET_TC_SKB_EXT)
>>>> + [TC_SKB_EXT] = SKB_EXT_CHUNKSIZEOF(struct tc_skb_ext),
>>>> +#endif
>>>> };
>>>>
>>>> static __always_inline unsigned int skb_ext_total_length(void)
>>>> @@ -4098,6 +4101,9 @@ static __always_inline unsigned int skb_ext_total_length(void)
>>>> #ifdef CONFIG_XFRM
>>>> skb_ext_type_len[SKB_EXT_SEC_PATH] +
>>>> #endif
>>>> +#if IS_ENABLED(CONFIG_NET_TC_SKB_EXT)
>>>> + skb_ext_type_len[TC_SKB_EXT] +
>>>> +#endif
>>>> 0;
>>>> }
>>>>
>>>> diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
>>>> index bc89e16..0287ead 100644
>>>> --- a/net/openvswitch/flow.c
>>>> +++ b/net/openvswitch/flow.c
>>>> @@ -816,6 +816,9 @@ static int key_extract_mac_proto(struct sk_buff *skb)
>>>> int ovs_flow_key_extract(const struct ip_tunnel_info *tun_info,
>>>> struct sk_buff *skb, struct sw_flow_key *key)
>>>> {
>>>> +#if IS_ENABLED(CONFIG_NET_TC_SKB_EXT)
>>>> + struct tc_skb_ext *tc_ext;
>>>> +#endif
>>>> int res, err;
>>>>
>>>> /* Extract metadata from packet. */
>>>> @@ -848,7 +851,13 @@ int ovs_flow_key_extract(const struct ip_tunnel_info *tun_info,
>>>> if (res < 0)
>>>> return res;
>>>> key->mac_proto = res;
>>>> +
>>>> +#if IS_ENABLED(CONFIG_NET_TC_SKB_EXT)
>>>> + tc_ext = skb_ext_find(skb, TC_SKB_EXT);
>>>> + key->recirc_id = tc_ext ? tc_ext->chain : 0;
>>>> +#else
>>>> key->recirc_id = 0;
>>>> +#endif
>>>>
>>> Most of cases the config would be turned on, so the ifdef is not that
>>> useful. Can you add static key to avoid searching the skb-ext in non
>>> offload cases.
>> Hi,
>>
>> What do you mean by a static key?
>>
> https://www.kernel.org/doc/Documentation/static-keys.txt
>
> Static key can be enabled when a flow is added to the tc filter.
Hi and thanks for the feedback,
The skb_ext_find() just checks a single bit on the
skb->active_extensions, and if so returns an offset. Do you think it
will impact performance much?
But to your suggestion, do you mean that the first tc goto action
instance with the relevant ifdef (CONFIG_NET_TC_SKB_EXT) it will enable
the OvS static key that guards this skb_ext_find()?
I guess calling it in tcf_action_set_ctrlact() if goto_chain != 0.
This will expose some OvS helper function (or static key) to
net/sched/act_api.c right?
Thanks,
Paul.
^ permalink raw reply
* Re: [PATCH 12/16] arm64: prefer __section from compiler_attributes.h
From: Will Deacon @ 2019-08-13 8:27 UTC (permalink / raw)
To: Nick Desaulniers
Cc: akpm, sedat.dilek, jpoimboe, yhs, miguel.ojeda.sandonis,
clang-built-linux, Catalin Marinas, Alexei Starovoitov,
Daniel Borkmann, Martin KaFai Lau, Song Liu, Andrey Konovalov,
Greg Kroah-Hartman, Enrico Weigelt, Suzuki K Poulose,
Thomas Gleixner, Masayoshi Mizuma, Shaokun Zhang, Alexios Zavras,
Allison Randal, linux-arm-kernel, linux-kernel, netdev, bpf
In-Reply-To: <20190812215052.71840-12-ndesaulniers@google.com>
Hi Nick,
On Mon, Aug 12, 2019 at 02:50:45PM -0700, Nick Desaulniers wrote:
> GCC unescapes escaped string section names while Clang does not. Because
> __section uses the `#` stringification operator for the section name, it
> doesn't need to be escaped.
>
> This antipattern was found with:
> $ grep -e __section\(\" -e __section__\(\" -r
>
> Reported-by: Sedat Dilek <sedat.dilek@gmail.com>
> Suggested-by: Josh Poimboeuf <jpoimboe@redhat.com>
> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
> ---
> arch/arm64/include/asm/cache.h | 2 +-
> arch/arm64/kernel/smp_spin_table.c | 2 +-
> 2 files changed, 2 insertions(+), 2 deletions(-)
Does this fix a build issue, or is it just cosmetic or do we end up with
duplicate sections or something else?
Happy to route it via arm64, just having trouble working out whether it's
5.3 material!
Will
^ permalink raw reply
* [PATCH] net/mlx5: Fix a memory leak bug
From: Wenwen Wang @ 2019-08-13 8:21 UTC (permalink / raw)
To: Wenwen Wang
Cc: Saeed Mahameed, Leon Romanovsky, David S. Miller,
open list:MELLANOX MLX5 core VPI driver,
open list:MELLANOX MLX5 core VPI driver, open list
In mlx5_cmd_invoke(), 'ent' is allocated through kzalloc() in alloc_cmd().
After the work is queued, wait_func() is invoked to wait the completion of
the work. If wait_func() returns -ETIMEDOUT, the following execution will
be terminated. However, the allocated 'ent' is not deallocated on this
program path, leading to a memory leak bug.
To fix the above issue, free 'ent' before returning the error.
Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu>
---
drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
index 8cdd7e6..90cdb9a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
@@ -1036,7 +1036,7 @@ static int mlx5_cmd_invoke(struct mlx5_core_dev *dev, struct mlx5_cmd_msg *in,
err = wait_func(dev, ent);
if (err == -ETIMEDOUT)
- goto out;
+ goto out_free;
ds = ent->ts2 - ent->ts1;
op = MLX5_GET(mbox_in, in->first.data, opcode);
--
2.7.4
^ permalink raw reply related
* Re: [PATCH V5 0/9] Fixes for vhost metadata acceleration
From: Jason Wang @ 2019-08-13 8:12 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: kvm, virtualization, netdev, linux-kernel, linux-mm, jgg
In-Reply-To: <20190812054429-mutt-send-email-mst@kernel.org>
On 2019/8/12 下午5:49, Michael S. Tsirkin wrote:
> On Mon, Aug 12, 2019 at 10:44:51AM +0800, Jason Wang wrote:
>> On 2019/8/11 上午1:52, Michael S. Tsirkin wrote:
>>> On Fri, Aug 09, 2019 at 01:48:42AM -0400, Jason Wang wrote:
>>>> Hi all:
>>>>
>>>> This series try to fix several issues introduced by meta data
>>>> accelreation series. Please review.
>>>>
>>>> Changes from V4:
>>>> - switch to use spinlock synchronize MMU notifier with accessors
>>>>
>>>> Changes from V3:
>>>> - remove the unnecessary patch
>>>>
>>>> Changes from V2:
>>>> - use seqlck helper to synchronize MMU notifier with vhost worker
>>>>
>>>> Changes from V1:
>>>> - try not use RCU to syncrhonize MMU notifier with vhost worker
>>>> - set dirty pages after no readers
>>>> - return -EAGAIN only when we find the range is overlapped with
>>>> metadata
>>>>
>>>> Jason Wang (9):
>>>> vhost: don't set uaddr for invalid address
>>>> vhost: validate MMU notifier registration
>>>> vhost: fix vhost map leak
>>>> vhost: reset invalidate_count in vhost_set_vring_num_addr()
>>>> vhost: mark dirty pages during map uninit
>>>> vhost: don't do synchronize_rcu() in vhost_uninit_vq_maps()
>>>> vhost: do not use RCU to synchronize MMU notifier with worker
>>>> vhost: correctly set dirty pages in MMU notifiers callback
>>>> vhost: do not return -EAGAIN for non blocking invalidation too early
>>>>
>>>> drivers/vhost/vhost.c | 202 +++++++++++++++++++++++++-----------------
>>>> drivers/vhost/vhost.h | 6 +-
>>>> 2 files changed, 122 insertions(+), 86 deletions(-)
>>> This generally looks more solid.
>>>
>>> But this amounts to a significant overhaul of the code.
>>>
>>> At this point how about we revert 7f466032dc9e5a61217f22ea34b2df932786bbfc
>>> for this release, and then re-apply a corrected version
>>> for the next one?
>>
>> If possible, consider we've actually disabled the feature. How about just
>> queued those patches for next release?
>>
>> Thanks
> Sorry if I was unclear. My idea is that
> 1. I revert the disabled code
> 2. You send a patch readding it with all the fixes squashed
> 3. Maybe optimizations on top right away?
> 4. We queue *that* for next and see what happens.
>
> And the advantage over the patchy approach is that the current patches
> are hard to review. E.g. it's not reasonable to ask RCU guys to review
> the whole of vhost for RCU usage but it's much more reasonable to ask
> about a specific patch.
Ok. Then I agree to revert.
Thanks
^ permalink raw reply
* Re: [PATCH v2 bpf-next] mm: mmap: increase sockets maximum memory size pgoff for 32bits
From: Magnus Karlsson @ 2019-08-13 8:02 UTC (permalink / raw)
To: Ivan Khoronzhuk
Cc: Björn Töpel, linux-mm, Xdp, Network Development, bpf,
linux-kernel, akpm, Alexei Starovoitov, Karlsson, Magnus
In-Reply-To: <20190812124326.32146-1-ivan.khoronzhuk@linaro.org>
On Mon, Aug 12, 2019 at 2:45 PM Ivan Khoronzhuk
<ivan.khoronzhuk@linaro.org> wrote:
>
> The AF_XDP sockets umem mapping interface uses XDP_UMEM_PGOFF_FILL_RING
> and XDP_UMEM_PGOFF_COMPLETION_RING offsets. The offsets seems like are
> established already and are part of configuration interface.
>
> But for 32-bit systems, while AF_XDP socket configuration, the values
> are to large to pass maximum allowed file size verification.
> The offsets can be tuned ofc, but instead of changing existent
> interface - extend max allowed file size for sockets.
Can you use mmap2() instead that takes a larger offset (2^44) even on
32-bit systems?
/Magnus
> Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
> ---
>
> Based on bpf-next/master
>
> v2..v1:
> removed not necessarily #ifdev as ULL and UL for 64 has same size
>
> mm/mmap.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 7e8c3e8ae75f..578f52812361 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -1358,6 +1358,9 @@ static inline u64 file_mmap_size_max(struct file *file, struct inode *inode)
> if (S_ISBLK(inode->i_mode))
> return MAX_LFS_FILESIZE;
>
> + if (S_ISSOCK(inode->i_mode))
> + return MAX_LFS_FILESIZE;
> +
> /* Special "we do even unsigned file positions" case */
> if (file->f_mode & FMODE_UNSIGNED_OFFSET)
> return 0;
> --
> 2.17.1
>
^ permalink raw reply
* [PATCH net-next] net/mvpp2: Replace tasklet with softirq hrtimer
From: Sebastian Andrzej Siewior @ 2019-08-13 8:00 UTC (permalink / raw)
To: netdev; +Cc: Thomas Gleixner, David S. Miller, Maxime Chevallier
From: Thomas Gleixner <tglx@linutronix.de>
The tx_done_tasklet tasklet is used in invoke the hrtimer
(mvpp2_hr_timer_cb) in softirq context. This can be also achieved without
the tasklet but with HRTIMER_MODE_SOFT as hrtimer mode.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
drivers/net/ethernet/marvell/mvpp2/mvpp2.h | 3 +-
.../net/ethernet/marvell/mvpp2/mvpp2_main.c | 58 +++++++------------
2 files changed, 23 insertions(+), 38 deletions(-)
diff --git a/drivers/net/ethernet/marvell/mvpp2/mvpp2.h b/drivers/net/ethernet/marvell/mvpp2/mvpp2.h
index 4d9564ba68f69..ee3bab508ee8c 100644
--- a/drivers/net/ethernet/marvell/mvpp2/mvpp2.h
+++ b/drivers/net/ethernet/marvell/mvpp2/mvpp2.h
@@ -829,9 +829,8 @@ struct mvpp2_pcpu_stats {
/* Per-CPU port control */
struct mvpp2_port_pcpu {
struct hrtimer tx_done_timer;
+ struct net_device *dev;
bool timer_scheduled;
- /* Tasklet for egress finalization */
- struct tasklet_struct tx_done_tasklet;
};
struct mvpp2_queue_vector {
diff --git a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
index 74fd9e1718654..12e799e99803c 100644
--- a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
+++ b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
@@ -2651,31 +2651,21 @@ static irqreturn_t mvpp2_link_status_isr(int irq, void *dev_id)
return IRQ_HANDLED;
}
-static void mvpp2_timer_set(struct mvpp2_port_pcpu *port_pcpu)
-{
- ktime_t interval;
-
- if (!port_pcpu->timer_scheduled) {
- port_pcpu->timer_scheduled = true;
- interval = MVPP2_TXDONE_HRTIMER_PERIOD_NS;
- hrtimer_start(&port_pcpu->tx_done_timer, interval,
- HRTIMER_MODE_REL_PINNED);
- }
-}
-
-static void mvpp2_tx_proc_cb(unsigned long data)
+static enum hrtimer_restart mvpp2_hr_timer_cb(struct hrtimer *timer)
{
- struct net_device *dev = (struct net_device *)data;
- struct mvpp2_port *port = netdev_priv(dev);
+ struct net_device *dev;
+ struct mvpp2_port *port;
struct mvpp2_port_pcpu *port_pcpu;
unsigned int tx_todo, cause;
- port_pcpu = per_cpu_ptr(port->pcpu,
- mvpp2_cpu_to_thread(port->priv, smp_processor_id()));
+ port_pcpu = container_of(timer, struct mvpp2_port_pcpu, tx_done_timer);
+ dev = port_pcpu->dev;
if (!netif_running(dev))
- return;
+ return HRTIMER_NORESTART;
+
port_pcpu->timer_scheduled = false;
+ port = netdev_priv(dev);
/* Process all the Tx queues */
cause = (1 << port->ntxqs) - 1;
@@ -2683,18 +2673,13 @@ static void mvpp2_tx_proc_cb(unsigned long data)
mvpp2_cpu_to_thread(port->priv, smp_processor_id()));
/* Set the timer in case not all the packets were processed */
- if (tx_todo)
- mvpp2_timer_set(port_pcpu);
-}
-
-static enum hrtimer_restart mvpp2_hr_timer_cb(struct hrtimer *timer)
-{
- struct mvpp2_port_pcpu *port_pcpu = container_of(timer,
- struct mvpp2_port_pcpu,
- tx_done_timer);
-
- tasklet_schedule(&port_pcpu->tx_done_tasklet);
+ if (tx_todo && !port_pcpu->timer_scheduled) {
+ port_pcpu->timer_scheduled = true;
+ hrtimer_forward_now(&port_pcpu->tx_done_timer,
+ MVPP2_TXDONE_HRTIMER_PERIOD_NS);
+ return HRTIMER_RESTART;
+ }
return HRTIMER_NORESTART;
}
@@ -3182,7 +3167,12 @@ static netdev_tx_t mvpp2_tx(struct sk_buff *skb, struct net_device *dev)
txq_pcpu->count > 0) {
struct mvpp2_port_pcpu *port_pcpu = per_cpu_ptr(port->pcpu, thread);
- mvpp2_timer_set(port_pcpu);
+ if (!port_pcpu->timer_scheduled) {
+ port_pcpu->timer_scheduled = true;
+ hrtimer_start(&port_pcpu->tx_done_timer,
+ MVPP2_TXDONE_HRTIMER_PERIOD_NS,
+ HRTIMER_MODE_REL_PINNED_SOFT);
+ }
}
if (test_bit(thread, &port->priv->lock_map))
@@ -3619,7 +3609,6 @@ static int mvpp2_stop(struct net_device *dev)
hrtimer_cancel(&port_pcpu->tx_done_timer);
port_pcpu->timer_scheduled = false;
- tasklet_kill(&port_pcpu->tx_done_tasklet);
}
}
mvpp2_cleanup_rxqs(port);
@@ -5183,13 +5172,10 @@ static int mvpp2_port_probe(struct platform_device *pdev,
port_pcpu = per_cpu_ptr(port->pcpu, thread);
hrtimer_init(&port_pcpu->tx_done_timer, CLOCK_MONOTONIC,
- HRTIMER_MODE_REL_PINNED);
+ HRTIMER_MODE_REL_PINNED_SOFT);
port_pcpu->tx_done_timer.function = mvpp2_hr_timer_cb;
port_pcpu->timer_scheduled = false;
-
- tasklet_init(&port_pcpu->tx_done_tasklet,
- mvpp2_tx_proc_cb,
- (unsigned long)dev);
+ port_pcpu->dev = dev;
}
}
--
2.23.0.rc1
^ permalink raw reply related
* Re: tun: mark small packets as owned by the tap sock
From: Jack Wang @ 2019-08-13 7:58 UTC (permalink / raw)
To: Dave Jones; +Cc: Alexis Bauvin, netdev, stable
In-Reply-To: <20190812221954.GA13314@codemonkey.org.uk>
Dave Jones <davej@codemonkey.org.uk> 于2019年8月13日周二 上午1:05写道:
>
> On Wed, Aug 07, 2019 at 12:30:07AM +0000, Linux Kernel wrote:
> > Commit: 4b663366246be1d1d4b1b8b01245b2e88ad9e706
> > Parent: 16b2084a8afa1432d14ba72b7c97d7908e178178
> > Web: https://git.kernel.org/torvalds/c/4b663366246be1d1d4b1b8b01245b2e88ad9e706
> > Author: Alexis Bauvin <abauvin@scaleway.com>
> > AuthorDate: Tue Jul 23 16:23:01 2019 +0200
> >
> > tun: mark small packets as owned by the tap sock
> >
> > - v1 -> v2: Move skb_set_owner_w to __tun_build_skb to reduce patch size
> >
> > Small packets going out of a tap device go through an optimized code
> > path that uses build_skb() rather than sock_alloc_send_pskb(). The
> > latter calls skb_set_owner_w(), but the small packet code path does not.
> >
> > The net effect is that small packets are not owned by the userland
> > application's socket (e.g. QEMU), while large packets are.
> > This can be seen with a TCP session, where packets are not owned when
> > the window size is small enough (around PAGE_SIZE), while they are once
> > the window grows (note that this requires the host to support virtio
> > tso for the guest to offload segmentation).
> > All this leads to inconsistent behaviour in the kernel, especially on
> > netfilter modules that uses sk->socket (e.g. xt_owner).
> >
> > Fixes: 66ccbc9c87c2 ("tap: use build_skb() for small packet")
> > Signed-off-by: Alexis Bauvin <abauvin@scaleway.com>
> > Acked-by: Jason Wang <jasowang@redhat.com>
>
> This commit breaks ipv6 routing when I deployed on it a linode.
> It seems to work briefly after boot, and then silently all packets get
> dropped. (Presumably, it's dropping RA or ND packets)
>
> With this reverted, everything works as it did in rc3.
>
> Dave
>
Thanks for reporting, Dave.
+cc stable
Just noticed, the patch has been backported to 4.14,4.19, 5.2
Regards,
Jack Wang
^ permalink raw reply
* Re: [RFC PATCH 0/5] PTP: add support for Intel's TGPIO controller
From: Felipe Balbi @ 2019-08-13 7:50 UTC (permalink / raw)
To: Andrew Lunn
Cc: Richard Cochran, netdev, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, H . Peter Anvin, x86, linux-kernel,
Christopher S . Hall
In-Reply-To: <20190719132021.GC24930@lunn.ch>
Hi,
Andrew Lunn <andrew@lunn.ch> writes:
>> Andrew Lunn <andrew@lunn.ch> writes:
>> > On Tue, Jul 16, 2019 at 10:20:33AM +0300, Felipe Balbi wrote:
>> >> TGPIO is a new IP which allows for time synchronization between systems
>> >> without any other means of synchronization such as PTP or NTP. The
>> >> driver is implemented as part of the PTP framework since its features
>> >> covered most of what this controller can do.
>> >
>> > Hi Felipe
>> >
>> > Given the name TGPIO, can it also be used for plain old boring GPIO?
>>
>> not really, no. This is a misnomer, IMHO :-) We can only assert output
>> pulses at specified intervals or capture a timestamp of an external
>> signal.
>
> Hi Felipe
>
> So i guess Intel Marketing wants to call it a GPIO, but between
> engineers can we give it a better name?
If we do that we make it difficult for those reading specification and
trying to find the matching driver.
>> > Also, is this always embedded into a SoC? Or could it actually be in a
>> > discrete NIC?
>>
>> Technically, this could be done as a discrete, but it isn't. In any
>> case, why does that matter? From a linux-point of view, we have a device
>> driver either way.
>
> I've seen a lot of i210 used with ARM SoCs. How necessary is the tsc
> patch? Is there an architecture independent alternative?
Without the TSC patch, we don't get the timestamp we need. One can argue
that $this driver could call get_tsc_ns() directly instead of providing
a wrapper for it. But that's something else entirely.
--
balbi
^ permalink raw reply
* Re: [PATCH 13/16] include/asm-generic: prefer __section from compiler_attributes.h
From: Arnd Bergmann @ 2019-08-13 7:57 UTC (permalink / raw)
To: Nick Desaulniers
Cc: Andrew Morton, sedat.dilek, Josh Poimboeuf, Yonghong Song,
Miguel Ojeda Sandonis, clang-built-linux, Naveen N. Rao,
Anil S Keshavamurthy, David S. Miller, Masami Hiramatsu,
Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
linux-arch, Linux Kernel Mailing List, Networking, bpf
In-Reply-To: <20190812215052.71840-13-ndesaulniers@google.com>
On Mon, Aug 12, 2019 at 11:52 PM 'Nick Desaulniers' via Clang Built
Linux <clang-built-linux@googlegroups.com> wrote:
>
> Reported-by: Sedat Dilek <sedat.dilek@gmail.com>
> Suggested-by: Josh Poimboeuf <jpoimboe@redhat.com>
> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
The patch looks fine, but it looks like you forgot to add a description.
Arnd
^ permalink raw reply
* [PATCH net-next v2 09/14] devlink: Add generic packet traps and groups
From: Ido Schimmel @ 2019-08-13 7:53 UTC (permalink / raw)
To: netdev
Cc: davem, nhorman, jiri, toke, dsahern, roopa, nikolay,
jakub.kicinski, andy, f.fainelli, andrew, vivien.didelot, mlxsw,
Ido Schimmel
In-Reply-To: <20190813075400.11841-1-idosch@idosch.org>
From: Ido Schimmel <idosch@mellanox.com>
Add generic packet traps and groups that can report dropped packets as
well as exceptions such as TTL error.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
include/net/devlink.h | 40 ++++++++++++++++++++++++++++++++++++++++
net/core/devlink.c | 12 ++++++++++++
2 files changed, 52 insertions(+)
diff --git a/include/net/devlink.h b/include/net/devlink.h
index 03b32e33e93e..fb02e0e89f9d 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -541,18 +541,58 @@ struct devlink_trap {
};
enum devlink_trap_generic_id {
+ DEVLINK_TRAP_GENERIC_ID_SMAC_MC,
+ DEVLINK_TRAP_GENERIC_ID_VLAN_TAG_MISMATCH,
+ DEVLINK_TRAP_GENERIC_ID_INGRESS_VLAN_FILTER,
+ DEVLINK_TRAP_GENERIC_ID_INGRESS_STP_FILTER,
+ DEVLINK_TRAP_GENERIC_ID_EMPTY_TX_LIST,
+ DEVLINK_TRAP_GENERIC_ID_PORT_LOOPBACK_FILTER,
+ DEVLINK_TRAP_GENERIC_ID_BLACKHOLE_ROUTE,
+ DEVLINK_TRAP_GENERIC_ID_TTL_ERROR,
+ DEVLINK_TRAP_GENERIC_ID_TAIL_DROP,
+
/* Add new generic trap IDs above */
__DEVLINK_TRAP_GENERIC_ID_MAX,
DEVLINK_TRAP_GENERIC_ID_MAX = __DEVLINK_TRAP_GENERIC_ID_MAX - 1,
};
enum devlink_trap_group_generic_id {
+ DEVLINK_TRAP_GROUP_GENERIC_ID_L2_DROPS,
+ DEVLINK_TRAP_GROUP_GENERIC_ID_L3_DROPS,
+ DEVLINK_TRAP_GROUP_GENERIC_ID_BUFFER_DROPS,
+
/* Add new generic trap group IDs above */
__DEVLINK_TRAP_GROUP_GENERIC_ID_MAX,
DEVLINK_TRAP_GROUP_GENERIC_ID_MAX =
__DEVLINK_TRAP_GROUP_GENERIC_ID_MAX - 1,
};
+#define DEVLINK_TRAP_GENERIC_NAME_SMAC_MC \
+ "source_mac_is_multicast"
+#define DEVLINK_TRAP_GENERIC_NAME_VLAN_TAG_MISMATCH \
+ "vlan_tag_mismatch"
+#define DEVLINK_TRAP_GENERIC_NAME_INGRESS_VLAN_FILTER \
+ "ingress_vlan_filter"
+#define DEVLINK_TRAP_GENERIC_NAME_INGRESS_STP_FILTER \
+ "ingress_spanning_tree_filter"
+#define DEVLINK_TRAP_GENERIC_NAME_EMPTY_TX_LIST \
+ "port_list_is_empty"
+#define DEVLINK_TRAP_GENERIC_NAME_PORT_LOOPBACK_FILTER \
+ "port_loopback_filter"
+#define DEVLINK_TRAP_GENERIC_NAME_BLACKHOLE_ROUTE \
+ "blackhole_route"
+#define DEVLINK_TRAP_GENERIC_NAME_TTL_ERROR \
+ "ttl_value_is_too_small"
+#define DEVLINK_TRAP_GENERIC_NAME_TAIL_DROP \
+ "tail_drop"
+
+#define DEVLINK_TRAP_GROUP_GENERIC_NAME_L2_DROPS \
+ "l2_drops"
+#define DEVLINK_TRAP_GROUP_GENERIC_NAME_L3_DROPS \
+ "l3_drops"
+#define DEVLINK_TRAP_GROUP_GENERIC_NAME_BUFFER_DROPS \
+ "buffer_drops"
+
#define DEVLINK_TRAP_GENERIC(_type, _init_action, _id, _group, _metadata_cap) \
{ \
.type = DEVLINK_TRAP_TYPE_##_type, \
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 3a3ef2f8c133..6b15b9f744c7 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -7442,6 +7442,15 @@ EXPORT_SYMBOL_GPL(devlink_region_snapshot_create);
}
static const struct devlink_trap devlink_trap_generic[] = {
+ DEVLINK_TRAP(SMAC_MC, DROP),
+ DEVLINK_TRAP(VLAN_TAG_MISMATCH, DROP),
+ DEVLINK_TRAP(INGRESS_VLAN_FILTER, DROP),
+ DEVLINK_TRAP(INGRESS_STP_FILTER, DROP),
+ DEVLINK_TRAP(EMPTY_TX_LIST, DROP),
+ DEVLINK_TRAP(PORT_LOOPBACK_FILTER, DROP),
+ DEVLINK_TRAP(BLACKHOLE_ROUTE, DROP),
+ DEVLINK_TRAP(TTL_ERROR, EXCEPTION),
+ DEVLINK_TRAP(TAIL_DROP, DROP),
};
#define DEVLINK_TRAP_GROUP(_id) \
@@ -7451,6 +7460,9 @@ static const struct devlink_trap devlink_trap_generic[] = {
}
static const struct devlink_trap_group devlink_trap_group_generic[] = {
+ DEVLINK_TRAP_GROUP(L2_DROPS),
+ DEVLINK_TRAP_GROUP(L3_DROPS),
+ DEVLINK_TRAP_GROUP(BUFFER_DROPS),
};
static int devlink_trap_generic_verify(const struct devlink_trap *trap)
--
2.21.0
^ permalink raw reply related
* [PATCH net-next v2 14/14] Documentation: Add a section for devlink-trap testing
From: Ido Schimmel @ 2019-08-13 7:54 UTC (permalink / raw)
To: netdev
Cc: davem, nhorman, jiri, toke, dsahern, roopa, nikolay,
jakub.kicinski, andy, f.fainelli, andrew, vivien.didelot, mlxsw,
Ido Schimmel
In-Reply-To: <20190813075400.11841-1-idosch@idosch.org>
From: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
Documentation/networking/devlink-trap.rst | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/Documentation/networking/devlink-trap.rst b/Documentation/networking/devlink-trap.rst
index fe4f6e149623..7b07442b3ec3 100644
--- a/Documentation/networking/devlink-trap.rst
+++ b/Documentation/networking/devlink-trap.rst
@@ -196,3 +196,12 @@ narrow. The description of these groups must be added to the following table:
* - ``buffer_drops``
- Contains packet traps for packets that were dropped by the device due to
an enqueue decision
+
+Testing
+=======
+
+See ``tools/testing/selftests/net/devlink_trap.sh`` for a test covering the
+core infrastructure. Test cases should be added for any new functionality.
+
+Device drivers should focus their tests on device-specific functionality, such
+as the triggering of supported packet traps.
--
2.21.0
^ permalink raw reply related
* [PATCH net-next v2 12/14] Documentation: Add description of netdevsim traps
From: Ido Schimmel @ 2019-08-13 7:53 UTC (permalink / raw)
To: netdev
Cc: davem, nhorman, jiri, toke, dsahern, roopa, nikolay,
jakub.kicinski, andy, f.fainelli, andrew, vivien.didelot, mlxsw,
Ido Schimmel
In-Reply-To: <20190813075400.11841-1-idosch@idosch.org>
From: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
.../networking/devlink-trap-netdevsim.rst | 20 +++++++++++++++++++
Documentation/networking/devlink-trap.rst | 11 ++++++++++
Documentation/networking/index.rst | 1 +
drivers/net/netdevsim/dev.c | 3 +++
4 files changed, 35 insertions(+)
create mode 100644 Documentation/networking/devlink-trap-netdevsim.rst
diff --git a/Documentation/networking/devlink-trap-netdevsim.rst b/Documentation/networking/devlink-trap-netdevsim.rst
new file mode 100644
index 000000000000..b721c9415473
--- /dev/null
+++ b/Documentation/networking/devlink-trap-netdevsim.rst
@@ -0,0 +1,20 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+======================
+Devlink Trap netdevsim
+======================
+
+Driver-specific Traps
+=====================
+
+.. list-table:: List of Driver-specific Traps Registered by ``netdevsim``
+ :widths: 5 5 90
+
+ * - Name
+ - Type
+ - Description
+ * - ``fid_miss``
+ - ``exception``
+ - When a packet enters the device it is classified to a filtering
+ indentifier (FID) based on the ingress port and VLAN. This trap is used
+ to trap packets for which a FID could not be found
diff --git a/Documentation/networking/devlink-trap.rst b/Documentation/networking/devlink-trap.rst
index dbc7a3e00fd8..fe4f6e149623 100644
--- a/Documentation/networking/devlink-trap.rst
+++ b/Documentation/networking/devlink-trap.rst
@@ -162,6 +162,17 @@ be added to the following table:
- Traps packets that the device decided to drop because they could not be
enqueued to a transmission queue which is full
+Driver-specific Packet Traps
+============================
+
+Device drivers can register driver-specific packet traps, but these must be
+clearly documented. Such traps can correspond to device-specific exceptions and
+help debug packet drops caused by these exceptions. The following list includes
+links to the description of driver-specific traps registered by various device
+drivers:
+
+ * :doc:`/devlink-trap-netdevsim`
+
Generic Packet Trap Groups
==========================
diff --git a/Documentation/networking/index.rst b/Documentation/networking/index.rst
index 86a814e4d450..37eabc17894c 100644
--- a/Documentation/networking/index.rst
+++ b/Documentation/networking/index.rst
@@ -15,6 +15,7 @@ Contents:
dsa/index
devlink-info-versions
devlink-trap
+ devlink-trap-netdevsim
ieee802154
kapi
z8530book
diff --git a/drivers/net/netdevsim/dev.c b/drivers/net/netdevsim/dev.c
index 2758d95c8d18..6691eeb039ae 100644
--- a/drivers/net/netdevsim/dev.c
+++ b/drivers/net/netdevsim/dev.c
@@ -270,6 +270,9 @@ struct nsim_trap_data {
spinlock_t trap_lock; /* Protects trap_items_arr */
};
+/* All driver-specific traps must be documented in
+ * Documentation/networking/devlink-trap-netdevsim.rst
+ */
enum {
NSIM_TRAP_ID_BASE = DEVLINK_TRAP_GENERIC_ID_MAX,
NSIM_TRAP_ID_FID_MISS,
--
2.21.0
^ permalink raw reply related
* [PATCH net-next v2 13/14] selftests: devlink_trap: Add test cases for devlink-trap
From: Ido Schimmel @ 2019-08-13 7:53 UTC (permalink / raw)
To: netdev
Cc: davem, nhorman, jiri, toke, dsahern, roopa, nikolay,
jakub.kicinski, andy, f.fainelli, andrew, vivien.didelot, mlxsw,
Ido Schimmel
In-Reply-To: <20190813075400.11841-1-idosch@idosch.org>
From: Ido Schimmel <idosch@mellanox.com>
Add test cases for devlink-trap on top of the netdevsim implementation.
The tests focus on the devlink-trap core infrastructure and user space
API. They test both good and bad flows and also dismantle of the netdev
and devlink device used to report trapped packets.
This allows device drivers to focus their tests on device-specific
functionality.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
tools/testing/selftests/net/Makefile | 2 +-
tools/testing/selftests/net/config | 1 +
tools/testing/selftests/net/devlink_trap.sh | 616 ++++++++++++++++++++
3 files changed, 618 insertions(+), 1 deletion(-)
create mode 100755 tools/testing/selftests/net/devlink_trap.sh
diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
index 0bd6b23c97ef..4ea59bf4eeec 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -10,7 +10,7 @@ TEST_PROGS += fib_tests.sh fib-onlink-tests.sh pmtu.sh udpgso.sh ip_defrag.sh
TEST_PROGS += udpgso_bench.sh fib_rule_tests.sh msg_zerocopy.sh psock_snd.sh
TEST_PROGS += udpgro_bench.sh udpgro.sh test_vxlan_under_vrf.sh reuseport_addr_any.sh
TEST_PROGS += test_vxlan_fdb_changelink.sh so_txtime.sh ipv6_flowlabel.sh
-TEST_PROGS += tcp_fastopen_backup_key.sh fcnal-test.sh l2tp.sh
+TEST_PROGS += tcp_fastopen_backup_key.sh fcnal-test.sh l2tp.sh devlink_trap.sh
TEST_PROGS_EXTENDED := in_netns.sh
TEST_GEN_FILES = socket nettest
TEST_GEN_FILES += psock_fanout psock_tpacket msg_zerocopy reuseport_addr_any
diff --git a/tools/testing/selftests/net/config b/tools/testing/selftests/net/config
index b8503a8119b0..ddd400fd5dce 100644
--- a/tools/testing/selftests/net/config
+++ b/tools/testing/selftests/net/config
@@ -29,3 +29,4 @@ CONFIG_NET_SCH_FQ=m
CONFIG_NET_SCH_ETF=m
CONFIG_TEST_BLACKHOLE_DEV=m
CONFIG_KALLSYMS=y
+CONFIG_NETDEVSIM=y
diff --git a/tools/testing/selftests/net/devlink_trap.sh b/tools/testing/selftests/net/devlink_trap.sh
new file mode 100755
index 000000000000..577bbd6c9411
--- /dev/null
+++ b/tools/testing/selftests/net/devlink_trap.sh
@@ -0,0 +1,616 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+#
+# This test is for checking devlink-trap functionality. It makes use of
+# netdevsim which implements the required callbacks.
+
+##############################################################################
+# Global variables
+
+# Kselftest framework requirement - SKIP code is 4.
+KSFT_SKIP=4
+# Exit status to return at the end. Set in case one of the tests fails.
+EXIT_STATUS=0
+# Per-test return value. Clear at the beginning of each test.
+RET=0
+
+# netdevsim-specific variables.
+NETDEVSIM_PATH=/sys/bus/netdevsim/
+DEV_ADDR=1337
+SLEEP_TIME=.3
+DEVLINK_DEV=""
+NETDEV=""
+DEV=""
+
+##############################################################################
+# Dependencies
+
+if [ "$(id -u)" -ne 0 ]; then
+ echo "SKIP: Need root privileges"
+ exit $KSFT_SKIP
+fi
+
+if [ ! -d "$NETDEVSIM_PATH" ]; then
+ echo "SKIP: No netdevsim support"
+ exit $KSFT_SKIP
+fi
+
+if [ -d "${NETDEVSIM_PATH}/devices/netdevsim${DEV_ADDR}" ]; then
+ echo "SKIP: Device netdevsim${DEV_ADDR} already exists"
+ exit $KSFT_SKIP
+fi
+
+if [ ! -x "$(command -v jq)" ]; then
+ echo "SKIP: Could not run test without jq tool"
+ exit $KSFT_SKIP
+fi
+
+if [ ! -x "$(command -v udevadm)" ]; then
+ echo "SKIP: Could not run test without udevadm tool"
+ exit $KSFT_SKIP
+fi
+
+if [ ! -x "$(command -v ip)" ]; then
+ echo "SKIP: Could not run test without ip tool"
+ exit $KSFT_SKIP
+fi
+
+if [ ! -x "$(command -v devlink)" ]; then
+ echo "SKIP: Could not run test without devlink tool"
+ exit $KSFT_SKIP
+fi
+
+devlink help 2>&1 | grep -q "trap"
+if [ $? -ne 0 ]; then
+ echo "SKIP: iproute2 too old, missing devlink-trap"
+ exit $KSFT_SKIP
+fi
+
+##############################################################################
+# Helpers
+
+check_err()
+{
+ local err=$1; shift
+ local msg=$1; shift
+
+ if [[ $RET -eq 0 && $err -ne 0 ]]; then
+ RET=$err
+ RETMSG=$msg
+ fi
+}
+
+check_fail()
+{
+ local err=$1; shift
+ local msg=$1; shift
+
+ if [[ $RET -eq 0 && $err -eq 0 ]]; then
+ RET=1
+ RETMSG=$msg
+ fi
+}
+
+log_test()
+{
+ local test_name=$1
+ local opt_str=$2
+
+ if [[ $# -eq 2 ]]; then
+ opt_str="($opt_str)"
+ fi
+
+ if [[ $RET -ne 0 ]]; then
+ EXIT_STATUS=1
+ printf "TEST: %-60s [FAIL]\n" "$test_name $opt_str"
+ if [[ ! -z "$RETMSG" ]]; then
+ printf "\t%s\n" "$RETMSG"
+ fi
+ return 1
+ fi
+
+ printf "TEST: %-60s [ OK ]\n" "$test_name $opt_str"
+ return 0
+}
+
+netdevsim_dev_create()
+{
+ echo "$DEV_ADDR 0" > ${NETDEVSIM_PATH}/new_device
+}
+
+netdevsim_dev_destroy()
+{
+ echo "$DEV_ADDR" > ${NETDEVSIM_PATH}/del_device
+}
+
+netdevsim_port_create()
+{
+ echo 1 > ${NETDEVSIM_PATH}/devices/${DEV}/new_port
+}
+
+netdevsim_port_destroy()
+{
+ echo 1 > ${NETDEVSIM_PATH}/devices/${DEV}/del_port
+}
+
+##############################################################################
+# Trap helpers
+
+devlink_traps_num_get()
+{
+ devlink -j trap | jq '.[]["'$DEVLINK_DEV'"] | length'
+}
+
+devlink_traps_get()
+{
+ devlink -j trap | jq -r '.[]["'$DEVLINK_DEV'"][].name'
+}
+
+devlink_trap_type_get()
+{
+ local trap_name=$1; shift
+
+ devlink -j trap show $DEVLINK_DEV trap $trap_name \
+ | jq -r '.[][][].type'
+}
+
+devlink_trap_action_set()
+{
+ local trap_name=$1; shift
+ local action=$1; shift
+
+ # Pipe output to /dev/null to avoid expected warnings.
+ devlink trap set $DEVLINK_DEV trap $trap_name \
+ action $action &> /dev/null
+}
+
+devlink_trap_action_get()
+{
+ local trap_name=$1; shift
+
+ devlink -j trap show $DEVLINK_DEV trap $trap_name \
+ | jq -r '.[][][].action'
+}
+
+devlink_trap_group_get()
+{
+ devlink -j trap show $DEVLINK_DEV trap $trap_name \
+ | jq -r '.[][][].group'
+}
+
+devlink_trap_metadata_test()
+{
+ local trap_name=$1; shift
+ local metadata=$1; shift
+
+ devlink -jv trap show $DEVLINK_DEV trap $trap_name \
+ | jq -e '.[][][].metadata | contains(["'$metadata'"])' \
+ &> /dev/null
+}
+
+devlink_trap_rx_packets_get()
+{
+ local trap_name=$1; shift
+
+ devlink -js trap show $DEVLINK_DEV trap $trap_name \
+ | jq '.[][][]["stats"]["rx"]["packets"]'
+}
+
+devlink_trap_rx_bytes_get()
+{
+ local trap_name=$1; shift
+
+ devlink -js trap show $DEVLINK_DEV trap $trap_name \
+ | jq '.[][][]["stats"]["rx"]["bytes"]'
+}
+
+devlink_trap_stats_idle_test()
+{
+ local trap_name=$1; shift
+ local t0_packets t0_bytes
+ local t1_packets t1_bytes
+
+ t0_packets=$(devlink_trap_rx_packets_get $trap_name)
+ t0_bytes=$(devlink_trap_rx_bytes_get $trap_name)
+
+ sleep $SLEEP_TIME
+
+ t1_packets=$(devlink_trap_rx_packets_get $trap_name)
+ t1_bytes=$(devlink_trap_rx_bytes_get $trap_name)
+
+ if [[ $t0_packets -eq $t1_packets && $t0_bytes -eq $t1_bytes ]]; then
+ return 0
+ else
+ return 1
+ fi
+}
+
+devlink_traps_enable_all()
+{
+ local trap_name
+
+ for trap_name in $(devlink_traps_get); do
+ devlink_trap_action_set $trap_name "trap"
+ done
+}
+
+devlink_traps_disable_all()
+{
+ for trap_name in $(devlink_traps_get); do
+ devlink_trap_action_set $trap_name "drop"
+ done
+}
+
+##############################################################################
+# Trap group helpers
+
+devlink_trap_groups_get()
+{
+ devlink -j trap group | jq -r '.[]["'$DEVLINK_DEV'"][].name'
+}
+
+devlink_trap_group_action_set()
+{
+ local group_name=$1; shift
+ local action=$1; shift
+
+ # Pipe output to /dev/null to avoid expected warnings.
+ devlink trap group set $DEVLINK_DEV group $group_name action $action \
+ &> /dev/null
+}
+
+devlink_trap_group_rx_packets_get()
+{
+ local group_name=$1; shift
+
+ devlink -js trap group show $DEVLINK_DEV group $group_name \
+ | jq '.[][][]["stats"]["rx"]["packets"]'
+}
+
+devlink_trap_group_rx_bytes_get()
+{
+ local group_name=$1; shift
+
+ devlink -js trap group show $DEVLINK_DEV group $group_name \
+ | jq '.[][][]["stats"]["rx"]["bytes"]'
+}
+
+devlink_trap_group_stats_idle_test()
+{
+ local group_name=$1; shift
+ local t0_packets t0_bytes
+ local t1_packets t1_bytes
+
+ t0_packets=$(devlink_trap_group_rx_packets_get $group_name)
+ t0_bytes=$(devlink_trap_group_rx_bytes_get $group_name)
+
+ sleep $SLEEP_TIME
+
+ t1_packets=$(devlink_trap_group_rx_packets_get $group_name)
+ t1_bytes=$(devlink_trap_group_rx_bytes_get $group_name)
+
+ if [[ $t0_packets -eq $t1_packets && $t0_bytes -eq $t1_bytes ]]; then
+ return 0
+ else
+ return 1
+ fi
+}
+
+##############################################################################
+# Initialization
+
+setup_prepare()
+{
+ local netdev
+
+ DEV=netdevsim${DEV_ADDR}
+ DEVLINK_DEV=netdevsim/${DEV}
+
+ netdevsim_dev_create
+
+ if [ ! -d "${NETDEVSIM_PATH}/devices/${DEV}" ]; then
+ echo "Failed to create netdevsim device"
+ exit 1
+ fi
+
+ netdevsim_port_create
+
+ if [ ! -d "${NETDEVSIM_PATH}/devices/${DEV}/net/" ]; then
+ echo "Failed to create netdevsim port"
+ exit 1
+ fi
+
+ # Wait for udev to rename newly created netdev.
+ udevadm settle
+
+ NETDEV=$(ls ${NETDEVSIM_PATH}/devices/${DEV}/net/)
+}
+
+cleanup()
+{
+ netdevsim_port_destroy
+ netdevsim_dev_destroy
+}
+
+##############################################################################
+# Tests
+
+init_test()
+{
+ RET=0
+
+ test $(devlink_traps_num_get) -ne 0
+ check_err $? "No traps were registered"
+
+ log_test "Initialization"
+}
+
+trap_action_test()
+{
+ local orig_action
+ local trap_name
+ local action
+
+ RET=0
+
+ for trap_name in $(devlink_traps_get); do
+ # The action of non-drop traps cannot be changed.
+ if [ $(devlink_trap_type_get $trap_name) = "drop" ]; then
+ devlink_trap_action_set $trap_name "trap"
+ action=$(devlink_trap_action_get $trap_name)
+ if [ $action != "trap" ]; then
+ check_err 1 "Trap $trap_name did not change action to trap"
+ fi
+
+ devlink_trap_action_set $trap_name "drop"
+ action=$(devlink_trap_action_get $trap_name)
+ if [ $action != "drop" ]; then
+ check_err 1 "Trap $trap_name did not change action to drop"
+ fi
+ else
+ orig_action=$(devlink_trap_action_get $trap_name)
+
+ devlink_trap_action_set $trap_name "trap"
+ action=$(devlink_trap_action_get $trap_name)
+ if [ $action != $orig_action ]; then
+ check_err 1 "Trap $trap_name changed action when should not"
+ fi
+
+ devlink_trap_action_set $trap_name "drop"
+ action=$(devlink_trap_action_get $trap_name)
+ if [ $action != $orig_action ]; then
+ check_err 1 "Trap $trap_name changed action when should not"
+ fi
+ fi
+ done
+
+ log_test "Trap action"
+}
+
+trap_metadata_test()
+{
+ local trap_name
+
+ RET=0
+
+ for trap_name in $(devlink_traps_get); do
+ devlink_trap_metadata_test $trap_name "input_port"
+ check_err $? "Input port not reported as metadata of trap $trap_name"
+ done
+
+ log_test "Trap metadata"
+}
+
+bad_trap_test()
+{
+ RET=0
+
+ devlink_trap_action_set "made_up_trap" "drop"
+ check_fail $? "Did not get an error for non-existing trap"
+
+ log_test "Non-existing trap"
+}
+
+bad_trap_action_test()
+{
+ local traps_arr
+ local trap_name
+
+ RET=0
+
+ # Pick first trap.
+ traps_arr=($(devlink_traps_get))
+ trap_name=${traps_arr[0]}
+
+ devlink_trap_action_set $trap_name "made_up_action"
+ check_fail $? "Did not get an error for non-existing trap action"
+
+ log_test "Non-existing trap action"
+}
+
+trap_stats_test()
+{
+ local trap_name
+
+ RET=0
+
+ for trap_name in $(devlink_traps_get); do
+ devlink_trap_stats_idle_test $trap_name
+ check_err $? "Stats of trap $trap_name not idle when netdev down"
+
+ ip link set dev $NETDEV up
+
+ if [ $(devlink_trap_type_get $trap_name) = "drop" ]; then
+ devlink_trap_action_set $trap_name "trap"
+ devlink_trap_stats_idle_test $trap_name
+ check_fail $? "Stats of trap $trap_name idle when action is trap"
+
+ devlink_trap_action_set $trap_name "drop"
+ devlink_trap_stats_idle_test $trap_name
+ check_err $? "Stats of trap $trap_name not idle when action is drop"
+ else
+ devlink_trap_stats_idle_test $trap_name
+ check_fail $? "Stats of non-drop trap $trap_name idle when should not"
+ fi
+
+ ip link set dev $NETDEV down
+ done
+
+ log_test "Trap statistics"
+}
+
+trap_group_action_test()
+{
+ local curr_group group_name
+ local trap_name
+ local trap_type
+ local action
+
+ RET=0
+
+ for group_name in $(devlink_trap_groups_get); do
+ devlink_trap_group_action_set $group_name "trap"
+
+ for trap_name in $(devlink_traps_get); do
+ curr_group=$(devlink_trap_group_get $trap_name)
+ if [ $curr_group != $group_name ]; then
+ continue
+ fi
+
+ trap_type=$(devlink_trap_type_get $trap_name)
+ if [ $trap_type != "drop" ]; then
+ continue
+ fi
+
+ action=$(devlink_trap_action_get $trap_name)
+ if [ $action != "trap" ]; then
+ check_err 1 "Trap $trap_name did not change action to trap"
+ fi
+ done
+
+ devlink_trap_group_action_set $group_name "drop"
+
+ for trap_name in $(devlink_traps_get); do
+ curr_group=$(devlink_trap_group_get $trap_name)
+ if [ $curr_group != $group_name ]; then
+ continue
+ fi
+
+ trap_type=$(devlink_trap_type_get $trap_name)
+ if [ $trap_type != "drop" ]; then
+ continue
+ fi
+
+ action=$(devlink_trap_action_get $trap_name)
+ if [ $action != "drop" ]; then
+ check_err 1 "Trap $trap_name did not change action to drop"
+ fi
+ done
+ done
+
+ log_test "Trap group action"
+}
+
+bad_trap_group_test()
+{
+ RET=0
+
+ devlink_trap_group_action_set "made_up_trap_group" "drop"
+ check_fail $? "Did not get an error for non-existing trap group"
+
+ log_test "Non-existing trap group"
+}
+
+trap_group_stats_test()
+{
+ local group_name
+
+ RET=0
+
+ for group_name in $(devlink_trap_groups_get); do
+ devlink_trap_group_stats_idle_test $group_name
+ check_err $? "Stats of trap group $group_name not idle when netdev down"
+
+ ip link set dev $NETDEV up
+
+ devlink_trap_group_action_set $group_name "trap"
+ devlink_trap_group_stats_idle_test $group_name
+ check_fail $? "Stats of trap group $group_name idle when action is trap"
+
+ devlink_trap_group_action_set $group_name "drop"
+ ip link set dev $NETDEV down
+ done
+
+ log_test "Trap group statistics"
+}
+
+port_del_test()
+{
+ local group_name
+ local i
+
+ # The test never fails. It is meant to exercise different code paths
+ # and make sure we properly dismantle a port while packets are
+ # in-flight.
+ RET=0
+
+ devlink_traps_enable_all
+
+ for i in $(seq 1 10); do
+ ip link set dev $NETDEV up
+
+ sleep $SLEEP_TIME
+
+ netdevsim_port_destroy
+ netdevsim_port_create
+ udevadm settle
+ done
+
+ devlink_traps_disable_all
+
+ log_test "Port delete"
+}
+
+dev_del_test()
+{
+ local group_name
+ local i
+
+ # The test never fails. It is meant to exercise different code paths
+ # and make sure we properly unregister traps while packets are
+ # in-flight.
+ RET=0
+
+ devlink_traps_enable_all
+
+ for i in $(seq 1 10); do
+ ip link set dev $NETDEV up
+
+ sleep $SLEEP_TIME
+
+ cleanup
+ setup_prepare
+ done
+
+ devlink_traps_disable_all
+
+ log_test "Device delete"
+}
+
+trap cleanup EXIT
+
+# Each test should make sure that initial state is resumed at the end.
+setup_prepare
+init_test
+trap_action_test
+trap_metadata_test
+bad_trap_test
+bad_trap_action_test
+trap_stats_test
+trap_group_action_test
+bad_trap_group_test
+trap_group_stats_test
+port_del_test
+dev_del_test
+
+exit $EXIT_STATUS
--
2.21.0
^ permalink raw reply related
* [PATCH net-next v2 11/14] netdevsim: Add devlink-trap support
From: Ido Schimmel @ 2019-08-13 7:53 UTC (permalink / raw)
To: netdev
Cc: davem, nhorman, jiri, toke, dsahern, roopa, nikolay,
jakub.kicinski, andy, f.fainelli, andrew, vivien.didelot, mlxsw,
Ido Schimmel
In-Reply-To: <20190813075400.11841-1-idosch@idosch.org>
From: Ido Schimmel <idosch@mellanox.com>
Have netdevsim register its trap groups and traps with devlink during
initialization and periodically report trapped packets to devlink core.
Since netdevsim is not a real device, the trapped packets are emulated
using a workqueue that periodically reports a UDP packet with a random
5-tuple from each active packet trap and from each running netdev.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
drivers/net/netdevsim/dev.c | 278 +++++++++++++++++++++++++++++-
drivers/net/netdevsim/netdevsim.h | 1 +
2 files changed, 278 insertions(+), 1 deletion(-)
diff --git a/drivers/net/netdevsim/dev.c b/drivers/net/netdevsim/dev.c
index 08ca59fc189b..2758d95c8d18 100644
--- a/drivers/net/netdevsim/dev.c
+++ b/drivers/net/netdevsim/dev.c
@@ -17,11 +17,21 @@
#include <linux/debugfs.h>
#include <linux/device.h>
+#include <linux/etherdevice.h>
+#include <linux/inet.h>
+#include <linux/jiffies.h>
+#include <linux/kernel.h>
#include <linux/list.h>
#include <linux/mutex.h>
#include <linux/random.h>
+#include <linux/workqueue.h>
+#include <linux/random.h>
#include <linux/rtnetlink.h>
#include <net/devlink.h>
+#include <net/ip.h>
+#include <uapi/linux/devlink.h>
+#include <uapi/linux/ip.h>
+#include <uapi/linux/udp.h>
#include "netdevsim.h"
@@ -248,6 +258,213 @@ static void nsim_devlink_param_load_driverinit_values(struct devlink *devlink)
nsim_dev->test1 = saved_value.vbool;
}
+struct nsim_trap_item {
+ void *trap_ctx;
+ enum devlink_trap_action action;
+};
+
+struct nsim_trap_data {
+ struct delayed_work trap_report_dw;
+ struct nsim_trap_item *trap_items_arr;
+ struct nsim_dev *nsim_dev;
+ spinlock_t trap_lock; /* Protects trap_items_arr */
+};
+
+enum {
+ NSIM_TRAP_ID_BASE = DEVLINK_TRAP_GENERIC_ID_MAX,
+ NSIM_TRAP_ID_FID_MISS,
+};
+
+#define NSIM_TRAP_NAME_FID_MISS "fid_miss"
+
+#define NSIM_TRAP_METADATA DEVLINK_TRAP_METADATA_TYPE_F_IN_PORT
+
+#define NSIM_TRAP_DROP(_id, _group_id) \
+ DEVLINK_TRAP_GENERIC(DROP, DROP, _id, \
+ DEVLINK_TRAP_GROUP_GENERIC(_group_id), \
+ NSIM_TRAP_METADATA)
+#define NSIM_TRAP_EXCEPTION(_id, _group_id) \
+ DEVLINK_TRAP_GENERIC(EXCEPTION, TRAP, _id, \
+ DEVLINK_TRAP_GROUP_GENERIC(_group_id), \
+ NSIM_TRAP_METADATA)
+#define NSIM_TRAP_DRIVER_EXCEPTION(_id, _group_id) \
+ DEVLINK_TRAP_DRIVER(EXCEPTION, TRAP, NSIM_TRAP_ID_##_id, \
+ NSIM_TRAP_NAME_##_id, \
+ DEVLINK_TRAP_GROUP_GENERIC(_group_id), \
+ NSIM_TRAP_METADATA)
+
+static const struct devlink_trap nsim_traps_arr[] = {
+ NSIM_TRAP_DROP(SMAC_MC, L2_DROPS),
+ NSIM_TRAP_DROP(VLAN_TAG_MISMATCH, L2_DROPS),
+ NSIM_TRAP_DROP(INGRESS_VLAN_FILTER, L2_DROPS),
+ NSIM_TRAP_DROP(INGRESS_STP_FILTER, L2_DROPS),
+ NSIM_TRAP_DROP(EMPTY_TX_LIST, L2_DROPS),
+ NSIM_TRAP_DROP(PORT_LOOPBACK_FILTER, L2_DROPS),
+ NSIM_TRAP_DRIVER_EXCEPTION(FID_MISS, L2_DROPS),
+ NSIM_TRAP_DROP(BLACKHOLE_ROUTE, L3_DROPS),
+ NSIM_TRAP_EXCEPTION(TTL_ERROR, L3_DROPS),
+ NSIM_TRAP_DROP(TAIL_DROP, BUFFER_DROPS),
+};
+
+#define NSIM_TRAP_L4_DATA_LEN 100
+
+static struct sk_buff *nsim_dev_trap_skb_build(void)
+{
+ int tot_len, data_len = NSIM_TRAP_L4_DATA_LEN;
+ struct sk_buff *skb;
+ struct udphdr *udph;
+ struct ethhdr *eth;
+ struct iphdr *iph;
+
+ skb = alloc_skb(NLMSG_GOODSIZE, GFP_ATOMIC);
+ if (!skb)
+ return NULL;
+ tot_len = sizeof(struct iphdr) + sizeof(struct udphdr) + data_len;
+
+ eth = skb_put(skb, sizeof(struct ethhdr));
+ eth_random_addr(eth->h_dest);
+ eth_random_addr(eth->h_source);
+ eth->h_proto = htons(ETH_P_IP);
+ skb->protocol = htons(ETH_P_IP);
+
+ iph = skb_put(skb, sizeof(struct iphdr));
+ iph->protocol = IPPROTO_UDP;
+ iph->saddr = in_aton("192.0.2.1");
+ iph->daddr = in_aton("198.51.100.1");
+ iph->version = 0x4;
+ iph->frag_off = 0;
+ iph->ihl = 0x5;
+ iph->tot_len = htons(tot_len);
+ iph->ttl = 100;
+ ip_send_check(iph);
+
+ udph = skb_put_zero(skb, sizeof(struct udphdr) + data_len);
+ get_random_bytes(&udph->source, sizeof(u16));
+ get_random_bytes(&udph->dest, sizeof(u16));
+ udph->len = htons(sizeof(struct udphdr) + data_len);
+
+ return skb;
+}
+
+static void nsim_dev_trap_report(struct nsim_dev_port *nsim_dev_port)
+{
+ struct nsim_dev *nsim_dev = nsim_dev_port->ns->nsim_dev;
+ struct nsim_trap_data *nsim_trap_data = nsim_dev->trap_data;
+ struct devlink *devlink = priv_to_devlink(nsim_dev);
+ int i;
+
+ spin_lock(&nsim_trap_data->trap_lock);
+ for (i = 0; i < ARRAY_SIZE(nsim_traps_arr); i++) {
+ struct nsim_trap_item *nsim_trap_item;
+ struct sk_buff *skb;
+
+ nsim_trap_item = &nsim_trap_data->trap_items_arr[i];
+ if (nsim_trap_item->action == DEVLINK_TRAP_ACTION_DROP)
+ continue;
+
+ skb = nsim_dev_trap_skb_build();
+ if (!skb)
+ continue;
+ skb->dev = nsim_dev_port->ns->netdev;
+
+ /* Trapped packets are usually passed to devlink in softIRQ,
+ * but in this case they are generated in a workqueue. Disable
+ * softIRQs to prevent lockdep from complaining about
+ * "incosistent lock state".
+ */
+ local_bh_disable();
+ devlink_trap_report(devlink, skb, nsim_trap_item->trap_ctx,
+ &nsim_dev_port->devlink_port);
+ local_bh_enable();
+ consume_skb(skb);
+ }
+ spin_unlock(&nsim_trap_data->trap_lock);
+}
+
+#define NSIM_TRAP_REPORT_INTERVAL_MS 100
+
+static void nsim_dev_trap_report_work(struct work_struct *work)
+{
+ struct nsim_trap_data *nsim_trap_data;
+ struct nsim_dev_port *nsim_dev_port;
+ struct nsim_dev *nsim_dev;
+
+ nsim_trap_data = container_of(work, struct nsim_trap_data,
+ trap_report_dw.work);
+ nsim_dev = nsim_trap_data->nsim_dev;
+
+ /* For each running port and enabled packet trap, generate a UDP
+ * packet with a random 5-tuple and report it.
+ */
+ mutex_lock(&nsim_dev->port_list_lock);
+ list_for_each_entry(nsim_dev_port, &nsim_dev->port_list, list) {
+ if (!netif_running(nsim_dev_port->ns->netdev))
+ continue;
+
+ nsim_dev_trap_report(nsim_dev_port);
+ }
+ mutex_unlock(&nsim_dev->port_list_lock);
+
+ schedule_delayed_work(&nsim_dev->trap_data->trap_report_dw,
+ msecs_to_jiffies(NSIM_TRAP_REPORT_INTERVAL_MS));
+}
+
+static int nsim_dev_traps_init(struct devlink *devlink)
+{
+ struct nsim_dev *nsim_dev = devlink_priv(devlink);
+ struct nsim_trap_data *nsim_trap_data;
+ int err;
+
+ nsim_trap_data = kzalloc(sizeof(*nsim_trap_data), GFP_KERNEL);
+ if (!nsim_trap_data)
+ return -ENOMEM;
+
+ nsim_trap_data->trap_items_arr = kcalloc(ARRAY_SIZE(nsim_traps_arr),
+ sizeof(struct nsim_trap_item),
+ GFP_KERNEL);
+ if (!nsim_trap_data->trap_items_arr) {
+ err = -ENOMEM;
+ goto err_trap_data_free;
+ }
+
+ /* The lock is used to protect the action state of the registered
+ * traps. The value is written by user and read in delayed work when
+ * iterating over all the traps.
+ */
+ spin_lock_init(&nsim_trap_data->trap_lock);
+ nsim_trap_data->nsim_dev = nsim_dev;
+ nsim_dev->trap_data = nsim_trap_data;
+
+ err = devlink_traps_register(devlink, nsim_traps_arr,
+ ARRAY_SIZE(nsim_traps_arr), NULL);
+ if (err)
+ goto err_trap_items_free;
+
+ INIT_DELAYED_WORK(&nsim_dev->trap_data->trap_report_dw,
+ nsim_dev_trap_report_work);
+ schedule_delayed_work(&nsim_dev->trap_data->trap_report_dw,
+ msecs_to_jiffies(NSIM_TRAP_REPORT_INTERVAL_MS));
+
+ return 0;
+
+err_trap_items_free:
+ kfree(nsim_trap_data->trap_items_arr);
+err_trap_data_free:
+ kfree(nsim_trap_data);
+ return err;
+}
+
+static void nsim_dev_traps_exit(struct devlink *devlink)
+{
+ struct nsim_dev *nsim_dev = devlink_priv(devlink);
+
+ cancel_delayed_work_sync(&nsim_dev->trap_data->trap_report_dw);
+ devlink_traps_unregister(devlink, nsim_traps_arr,
+ ARRAY_SIZE(nsim_traps_arr));
+ kfree(nsim_dev->trap_data->trap_items_arr);
+ kfree(nsim_dev->trap_data);
+}
+
static int nsim_dev_reload(struct devlink *devlink,
struct netlink_ext_ack *extack)
{
@@ -315,9 +532,61 @@ static int nsim_dev_flash_update(struct devlink *devlink, const char *file_name,
return 0;
}
+static struct nsim_trap_item *
+nsim_dev_trap_item_lookup(struct nsim_dev *nsim_dev, u16 trap_id)
+{
+ struct nsim_trap_data *nsim_trap_data = nsim_dev->trap_data;
+ int i;
+
+ for (i = 0; i < ARRAY_SIZE(nsim_traps_arr); i++) {
+ if (nsim_traps_arr[i].id == trap_id)
+ return &nsim_trap_data->trap_items_arr[i];
+ }
+
+ return NULL;
+}
+
+static int nsim_dev_devlink_trap_init(struct devlink *devlink,
+ const struct devlink_trap *trap,
+ void *trap_ctx)
+{
+ struct nsim_dev *nsim_dev = devlink_priv(devlink);
+ struct nsim_trap_item *nsim_trap_item;
+
+ nsim_trap_item = nsim_dev_trap_item_lookup(nsim_dev, trap->id);
+ if (WARN_ON(!nsim_trap_item))
+ return -ENOENT;
+
+ nsim_trap_item->trap_ctx = trap_ctx;
+ nsim_trap_item->action = trap->init_action;
+
+ return 0;
+}
+
+static int
+nsim_dev_devlink_trap_action_set(struct devlink *devlink,
+ const struct devlink_trap *trap,
+ enum devlink_trap_action action)
+{
+ struct nsim_dev *nsim_dev = devlink_priv(devlink);
+ struct nsim_trap_item *nsim_trap_item;
+
+ nsim_trap_item = nsim_dev_trap_item_lookup(nsim_dev, trap->id);
+ if (WARN_ON(!nsim_trap_item))
+ return -ENOENT;
+
+ spin_lock(&nsim_dev->trap_data->trap_lock);
+ nsim_trap_item->action = action;
+ spin_unlock(&nsim_dev->trap_data->trap_lock);
+
+ return 0;
+}
+
static const struct devlink_ops nsim_dev_devlink_ops = {
.reload = nsim_dev_reload,
.flash_update = nsim_dev_flash_update,
+ .trap_init = nsim_dev_devlink_trap_init,
+ .trap_action_set = nsim_dev_devlink_trap_action_set,
};
#define NSIM_DEV_MAX_MACS_DEFAULT 32
@@ -363,10 +632,14 @@ nsim_dev_create(struct nsim_bus_dev *nsim_bus_dev, unsigned int port_count)
goto err_dl_unregister;
nsim_devlink_set_params_init_values(nsim_dev, devlink);
- err = nsim_dev_debugfs_init(nsim_dev);
+ err = nsim_dev_traps_init(devlink);
if (err)
goto err_params_unregister;
+ err = nsim_dev_debugfs_init(nsim_dev);
+ if (err)
+ goto err_traps_exit;
+
err = nsim_bpf_dev_init(nsim_dev);
if (err)
goto err_debugfs_exit;
@@ -376,6 +649,8 @@ nsim_dev_create(struct nsim_bus_dev *nsim_bus_dev, unsigned int port_count)
err_debugfs_exit:
nsim_dev_debugfs_exit(nsim_dev);
+err_traps_exit:
+ nsim_dev_traps_exit(devlink);
err_params_unregister:
devlink_params_unregister(devlink, nsim_devlink_params,
ARRAY_SIZE(nsim_devlink_params));
@@ -396,6 +671,7 @@ static void nsim_dev_destroy(struct nsim_dev *nsim_dev)
nsim_bpf_dev_exit(nsim_dev);
nsim_dev_debugfs_exit(nsim_dev);
+ nsim_dev_traps_exit(devlink);
devlink_params_unregister(devlink, nsim_devlink_params,
ARRAY_SIZE(nsim_devlink_params));
devlink_unregister(devlink);
diff --git a/drivers/net/netdevsim/netdevsim.h b/drivers/net/netdevsim/netdevsim.h
index 95751a817508..d8207ac85562 100644
--- a/drivers/net/netdevsim/netdevsim.h
+++ b/drivers/net/netdevsim/netdevsim.h
@@ -145,6 +145,7 @@ struct nsim_dev_port {
struct nsim_dev {
struct nsim_bus_dev *nsim_bus_dev;
struct nsim_fib_data *fib_data;
+ struct nsim_trap_data *trap_data;
struct dentry *ddir;
struct dentry *ports_ddir;
struct bpf_offload_dev *bpf_dev;
--
2.21.0
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox