Netdev List
 help / color / mirror / Atom feed
* [PATCH] brcmfmac: btcoex: replace init_timer with setup_timer
From: Xie Qirong @ 2017-05-03 11:27 UTC (permalink / raw)
  To: Arend van Spriel, Franky Lin, Hante Meuleman, Kalle Valo
  Cc: linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	brcm80211-dev-list.pdl-dY08KVG/lbpWk0Htik3J/w,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Piotr Haber, Xie Qirong

setup_timer.cocci suggested the following improvement:
drivers/net/wireless/broadcom/brcm80211/brcmfmac/btcoex.c:383:1-11: Use
setup_timer function for function on line 384.

Signed-off-by: Xie Qirong <cheerx1994-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
---

 Patch was compile checked with: x86_64_defconfig + CONFIG_BRCMFMAC=y +
 CONFIG_BRCMFMAC_USB=y + CONFIG_BRCMFMAC_PCIE=y + CONFIG_BRCM_TRACING=y +
 CONFIG_BRCMDBG=y
 
 Kernel version: next-20170502 (localversion-next is next-20170502)

 Thanks for Arend's advice

 drivers/net/wireless/broadcom/brcm80211/brcmfmac/btcoex.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/btcoex.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/btcoex.c
index 14a70d4..3559fb5 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/btcoex.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/btcoex.c
@@ -380,9 +380,7 @@ int brcmf_btcoex_attach(struct brcmf_cfg80211_info *cfg)
 	/* Set up timer for BT  */
 	btci->timer_on = false;
 	btci->timeout = BRCMF_BTCOEX_OPPR_WIN_TIME;
-	init_timer(&btci->timer);
-	btci->timer.data = (ulong)btci;
-	btci->timer.function = brcmf_btcoex_timerfunc;
+	setup_timer(&btci->timer, brcmf_btcoex_timerfunc, (ulong)btci);
 	btci->cfg = cfg;
 	btci->saved_regs_part1 = false;
 	btci->saved_regs_part2 = false;
-- 
2.9.3

^ permalink raw reply related

* Re: [PATCH net v3] driver: veth: Fix one possbile memleak when fail to register_netdevice
From: Xin Long @ 2017-05-03 11:25 UTC (permalink / raw)
  To: Gao Feng; +Cc: Gao Feng, davem, jarod, Stephen Hemminger, dsa, network dev
In-Reply-To: <006101d2c3d7$bf814230$3e83c690$@foxmail.com>

On Wed, May 3, 2017 at 2:37 PM, Gao Feng <gfree.wind@foxmail.com> wrote:
>> From: Xin Long [mailto:lucien.xin@gmail.com]
>> Sent: Wednesday, May 3, 2017 1:38 PM
>> On Wed, May 3, 2017 at 10:07 AM, Gao Feng <gfree.wind@foxmail.com>
>> wrote:
>> >> From: netdev-owner@vger.kernel.org
>> >> [mailto:netdev-owner@vger.kernel.org]
>> >> On Behalf Of Xin Long
>> >> Sent: Wednesday, May 3, 2017 12:59 AM On Tue, May 2, 2017 at 7:03 PM,
>> >> Gao Feng <gfree.wind@vip.163.com> wrote:
>> >> >> From: Xin Long [mailto:lucien.xin@gmail.com]
>> >> >> Sent: Tuesday, May 2, 2017 3:56 PM On Sat, Apr 29, 2017 at 11:51
>> >> >> AM,  <gfree.wind@foxmail.com> wrote:
>> >> >> > From: Gao Feng <gfree.wind@foxmail.com>
> [...]
>> > The fix you mentioned change the original logic.
>> > The dev->vstats is freed in advance in the ndo_uninit, not destructor.
>> > It may break the backward.
>> Sorry, I didn't get your "backward"
>> I can't see there will be any problem caused by it.
>> can you say this patch also break the 'backward' ?
>> https://patchwork.ozlabs.org/patch/748964/
>>
>> It's really weird to do dev->reg_state check in ndo_unint ndo_unint is supposed
>> to free the memory alloced in ndo_init.
>>
>
> I am not sure if it would break the backward, so I said it MAY break.
> I assumed there may be someone would access the dev->vstats after ndo_uninit,
> because current veth driver free the mem in the destructor.
> I selected this approach because I don't want to bring new bugs during fix bug.
>
> If you're sure it is safe to free dev->vstats in ndo_uninit, I would like to update it.
yes, stats are accessed in .ndo_start_xmit waited by synchronize_net() and
.ndo_get_stats64 protected by rtnl_lock().


>
> BTW there are too many drivers which have possible memleak.
> You could find the list by https://www.mail-archive.com/netdev@vger.kernel.org/msg166629.html.
ah, cool.
I'm not sure about other dev's stuff, have to check them for sure later.

>
> Some drivers allocate the resources in ndo_init, free some in ndo_uninit and free left in destructor.
> I think there are some reasons.
> We could not move all free in the ndo_uninit from destructor. What's your opinion?
>
> Best Regards
> Feng
>
>
>

^ permalink raw reply

* [patch iproute2 v2 2/2] devlink: Add support for pipeline debug (dpipe)
From: Jiri Pirko @ 2017-05-03 11:25 UTC (permalink / raw)
  To: netdev; +Cc: arkadis, stephen, davem, mlxsw, dsa
In-Reply-To: <1493810723-2536-1-git-send-email-jiri@resnulli.us>

From: Arkadi Sharshevsky <arkadis@mellanox.com>

Add support for pipeline debug (dpipe). The headers are used both the
gain visibillity into the headers supported by the hardware, and to
build the headers/field database which is used by other commands.

Examples:

First we can see the headers supported by the hardware:

$devlink dpipe header show pci/0000:03:00.0

pci/0000:03:00.0:
  name mlxsw_meta
  field:
    name erif_port bitwidth 32 mapping_type ifindex
    name l3_forward bitwidth 1
    name l3_drop bitwidth 1

Note that mapping_type is presented only if relevant. Also the header/
field id's are reported by the kernel they are not shown by default.
They can be observed by using the -v option. Also the headers scope
(global/local) is specified.

$devlink -v dpipe header show pci/0000:03:00.0

pci/0000:03:00.0:
  name mlxsw_meta id 0 global false
  field:
    name erif_port id 0 bitwidth 32 mapping_type ifindex
    name l3_forward id 1 bitwidth 1
    name l3_drop id 2 bitwidth 1

Second we can examine the tables supported by the hardware. In order
to dump all the tables no table name should be provided:
$devlink dpipe table show pci/0000:03:00.0

In order to examine specific table its name have to be specified:
$devlink dpipe table show pci/0000:03:00.0 name erif

pci/0000:03:00.0:
  name mlxsw_erif size 800 counters_enabled true
  match:
    type field_exact header mlxsw_meta field erif_port mapping ifindex
  action:
    type field_modify header mlxsw_meta field l3_forward
    type field_modify header mlxsw_meta field l3_drop

To enable/disable counters on the table:
$devlink dpipe table set pci/0000:03:00.0 name erif counters enable
$devlink dpipe table set pci/0000:03:00.0 name erif counters disable

In order to see the current entries in the hardware for specific table:
$devlink dpipe table dump pci/0000:03:00.0 name erif

pci/0000:03:00.0:
  index 0 counter 0
  match_value:
    type field_exact header mlxsw_meta field erif_port mapping ifindex mapping_value 383 value 0
  action_value:
    type field_modify header mlxsw_meta field l3_forward value 1

  index 1 counter 0
  match_value:
    type field_exact header mlxsw_meta field erif_port mapping ifindex mapping_value 381 value 1
  action_value:
    type field_modify header mlxsw_meta field l3_forward value 1

In the above example the table contains two entries which does match
on erif port and forwards the packet or drop it (currently only the
forward count is implemented). The counter values are provided for
example. In case the counting is not enabled on the table the counters
will not be available.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 devlink/devlink.c | 1353 +++++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 1254 insertions(+), 99 deletions(-)

diff --git a/devlink/devlink.c b/devlink/devlink.c
index 35220d8..e22ee0a 100644
--- a/devlink/devlink.c
+++ b/devlink/devlink.c
@@ -34,7 +34,15 @@
 #define ESWITCH_INLINE_MODE_TRANSPORT "transport"
 
 #define pr_err(args...) fprintf(stderr, ##args)
-#define pr_out(args...) fprintf(stdout, ##args)
+#define pr_out(args...)						\
+	do {							\
+		if (g_indent_newline) {				\
+			fprintf(stdout, "%s", g_indent_str);	\
+			g_indent_newline = false;		\
+		}						\
+		fprintf(stdout, ##args);			\
+	} while (0)
+
 #define pr_out_sp(num, args...)					\
 	do {							\
 		int ret = fprintf(stdout, ##args);		\
@@ -42,6 +50,35 @@
 			fprintf(stdout, "%*s", num - ret, "");	\
 	} while (0)
 
+static int g_indent_level;
+static bool g_indent_newline;
+#define INDENT_STR_STEP 2
+#define INDENT_STR_MAXLEN 32
+static char g_indent_str[INDENT_STR_MAXLEN + 1] = "";
+
+static void __pr_out_indent_inc(void)
+{
+	if (g_indent_level + INDENT_STR_STEP > INDENT_STR_MAXLEN)
+		return;
+	g_indent_level += INDENT_STR_STEP;
+	memset(g_indent_str, ' ', sizeof(g_indent_str));
+	g_indent_str[g_indent_level] = '\0';
+}
+
+static void __pr_out_indent_dec(void)
+{
+	if (g_indent_level - INDENT_STR_STEP < 0)
+		return;
+	g_indent_level -= INDENT_STR_STEP;
+	g_indent_str[g_indent_level] = '\0';
+}
+
+static void __pr_out_newline(void)
+{
+	pr_out("\n");
+	g_indent_newline = true;
+}
+
 static int _mnlg_socket_recv_run(struct mnlg_socket *nlg,
 				 mnl_cb_t data_cb, void *data)
 {
@@ -137,6 +174,8 @@ static void ifname_map_free(struct ifname_map *ifname_map)
 #define DL_OPT_SB_TC		BIT(10)
 #define DL_OPT_ESWITCH_MODE	BIT(11)
 #define DL_OPT_ESWITCH_INLINE_MODE	BIT(12)
+#define DL_OPT_DPIPE_TABLE_NAME	BIT(13)
+#define DL_OPT_DPIPE_TABLE_COUNTERS	BIT(14)
 
 struct dl_opts {
 	uint32_t present; /* flags of present items */
@@ -154,6 +193,8 @@ struct dl_opts {
 	uint16_t sb_tc_index;
 	enum devlink_eswitch_mode eswitch_mode;
 	enum devlink_eswitch_inline_mode eswitch_inline_mode;
+	const char *dpipe_table_name;
+	bool dpipe_counters_enable;
 };
 
 struct dl {
@@ -166,6 +207,7 @@ struct dl {
 	json_writer_t *jw;
 	bool json_output;
 	bool pretty_output;
+	bool verbose;
 	struct {
 		bool present;
 		char *bus_name;
@@ -257,6 +299,38 @@ static const enum mnl_attr_data_type devlink_policy[DEVLINK_ATTR_MAX + 1] = {
 	[DEVLINK_ATTR_SB_OCC_MAX] = MNL_TYPE_U32,
 	[DEVLINK_ATTR_ESWITCH_MODE] = MNL_TYPE_U16,
 	[DEVLINK_ATTR_ESWITCH_INLINE_MODE] = MNL_TYPE_U8,
+	[DEVLINK_ATTR_DPIPE_TABLES] = MNL_TYPE_NESTED,
+	[DEVLINK_ATTR_DPIPE_TABLE] = MNL_TYPE_NESTED,
+	[DEVLINK_ATTR_DPIPE_TABLE_NAME] = MNL_TYPE_STRING,
+	[DEVLINK_ATTR_DPIPE_TABLE_SIZE] = MNL_TYPE_U64,
+	[DEVLINK_ATTR_DPIPE_TABLE_MATCHES] = MNL_TYPE_NESTED,
+	[DEVLINK_ATTR_DPIPE_TABLE_ACTIONS] = MNL_TYPE_NESTED,
+	[DEVLINK_ATTR_DPIPE_TABLE_COUNTERS_ENABLED] =  MNL_TYPE_U8,
+	[DEVLINK_ATTR_DPIPE_ENTRIES] = MNL_TYPE_NESTED,
+	[DEVLINK_ATTR_DPIPE_ENTRY] = MNL_TYPE_NESTED,
+	[DEVLINK_ATTR_DPIPE_ENTRY_INDEX] = MNL_TYPE_U64,
+	[DEVLINK_ATTR_DPIPE_ENTRY_MATCH_VALUES] = MNL_TYPE_NESTED,
+	[DEVLINK_ATTR_DPIPE_ENTRY_ACTION_VALUES] = MNL_TYPE_NESTED,
+	[DEVLINK_ATTR_DPIPE_ENTRY_COUNTER] = MNL_TYPE_U64,
+	[DEVLINK_ATTR_DPIPE_MATCH] = MNL_TYPE_NESTED,
+	[DEVLINK_ATTR_DPIPE_MATCH_VALUE] = MNL_TYPE_NESTED,
+	[DEVLINK_ATTR_DPIPE_MATCH_TYPE] = MNL_TYPE_U32,
+	[DEVLINK_ATTR_DPIPE_ACTION] = MNL_TYPE_NESTED,
+	[DEVLINK_ATTR_DPIPE_ACTION_VALUE] = MNL_TYPE_NESTED,
+	[DEVLINK_ATTR_DPIPE_ACTION_TYPE] = MNL_TYPE_U32,
+	[DEVLINK_ATTR_DPIPE_VALUE_MAPPING] = MNL_TYPE_U32,
+	[DEVLINK_ATTR_DPIPE_HEADERS] = MNL_TYPE_NESTED,
+	[DEVLINK_ATTR_DPIPE_HEADER] = MNL_TYPE_NESTED,
+	[DEVLINK_ATTR_DPIPE_HEADER_NAME] = MNL_TYPE_STRING,
+	[DEVLINK_ATTR_DPIPE_HEADER_ID] = MNL_TYPE_U32,
+	[DEVLINK_ATTR_DPIPE_HEADER_FIELDS] = MNL_TYPE_NESTED,
+	[DEVLINK_ATTR_DPIPE_HEADER_GLOBAL] = MNL_TYPE_U8,
+	[DEVLINK_ATTR_DPIPE_HEADER_INDEX] = MNL_TYPE_U32,
+	[DEVLINK_ATTR_DPIPE_FIELD] = MNL_TYPE_NESTED,
+	[DEVLINK_ATTR_DPIPE_FIELD_NAME] = MNL_TYPE_STRING,
+	[DEVLINK_ATTR_DPIPE_FIELD_ID] = MNL_TYPE_U32,
+	[DEVLINK_ATTR_DPIPE_FIELD_BITWIDTH] = MNL_TYPE_U32,
+	[DEVLINK_ATTR_DPIPE_FIELD_MAPPING_TYPE] = MNL_TYPE_U32,
 };
 
 static int attr_cb(const struct nlattr *attr, void *data)
@@ -666,6 +740,20 @@ static int eswitch_inline_mode_get(const char *typestr,
 	return 0;
 }
 
+static int dpipe_counters_enable_get(const char *typestr,
+				     bool *counters_enable)
+{
+	if (strcmp(typestr, "enable") == 0) {
+		*counters_enable = 1;
+	} else if (strcmp(typestr, "disable") == 0) {
+		*counters_enable = 0;
+	} else {
+		pr_err("Unknown counter_state \"%s\"\n", typestr);
+		return -EINVAL;
+	}
+	return 0;
+}
+
 static int dl_argv_parse(struct dl *dl, uint32_t o_required,
 			 uint32_t o_optional)
 {
@@ -800,6 +888,27 @@ static int dl_argv_parse(struct dl *dl, uint32_t o_required,
 			if (err)
 				return err;
 			o_found |= DL_OPT_ESWITCH_INLINE_MODE;
+		} else if (dl_argv_match(dl, "name") &&
+			   (o_all & DL_OPT_DPIPE_TABLE_NAME)) {
+			dl_arg_inc(dl);
+			err = dl_argv_str(dl, &opts->dpipe_table_name);
+			if (err)
+				return err;
+			o_found |= DL_OPT_DPIPE_TABLE_NAME;
+		} else if (dl_argv_match(dl, "counters") &&
+			   (o_all & DL_OPT_DPIPE_TABLE_COUNTERS)) {
+			const char *typestr;
+
+			dl_arg_inc(dl);
+			err = dl_argv_str(dl, &typestr);
+			if (err)
+				return err;
+			err = dpipe_counters_enable_get(typestr,
+							&opts->dpipe_counters_enable);
+			if (err)
+				return err;
+			o_found |= DL_OPT_DPIPE_TABLE_COUNTERS;
+
 		} else {
 			pr_err("Unknown option \"%s\"\n", dl_argv(dl));
 			return -EINVAL;
@@ -866,6 +975,17 @@ static int dl_argv_parse(struct dl *dl, uint32_t o_required,
 		return -EINVAL;
 	}
 
+	if ((o_required & DL_OPT_DPIPE_TABLE_NAME) &&
+	    !(o_found & DL_OPT_DPIPE_TABLE_NAME)) {
+		pr_err("Dpipe table name expected\n");
+		return -EINVAL;
+	}
+
+	if ((o_required & DL_OPT_DPIPE_TABLE_COUNTERS) &&
+	    !(o_found & DL_OPT_DPIPE_TABLE_COUNTERS)) {
+		pr_err("Dpipe table counter state expected\n");
+		return -EINVAL;
+	}
 	return 0;
 }
 
@@ -915,6 +1035,12 @@ static void dl_opts_put(struct nlmsghdr *nlh, struct dl *dl)
 	if (opts->present & DL_OPT_ESWITCH_INLINE_MODE)
 		mnl_attr_put_u8(nlh, DEVLINK_ATTR_ESWITCH_INLINE_MODE,
 				opts->eswitch_inline_mode);
+	if (opts->present & DL_OPT_DPIPE_TABLE_NAME)
+		mnl_attr_put_strz(nlh, DEVLINK_ATTR_DPIPE_TABLE_NAME,
+				  opts->dpipe_table_name);
+	if (opts->present & DL_OPT_DPIPE_TABLE_COUNTERS)
+		mnl_attr_put_u8(nlh, DEVLINK_ATTR_DPIPE_TABLE_COUNTERS_ENABLED,
+				opts->dpipe_counters_enable);
 }
 
 static int dl_argv_parse_put(struct nlmsghdr *nlh, struct dl *dl,
@@ -1033,7 +1159,19 @@ static void __pr_out_handle_start(struct dl *dl, struct nlattr **tb,
 			jsonw_start_object(dl->jw);
 		}
 	} else {
-		pr_out("%s%s", buf, content ? ":" : "");
+		if (array) {
+			if (should_arr_last_handle_end(dl, bus_name, dev_name))
+				__pr_out_indent_dec();
+			if (should_arr_last_handle_start(dl, bus_name,
+							 dev_name)) {
+				pr_out("%s%s", buf, content ? ":" : "");
+				__pr_out_newline();
+				__pr_out_indent_inc();
+				arr_last_handle_set(dl, bus_name, dev_name);
+			}
+		} else {
+			pr_out("%s%s", buf, content ? ":" : "");
+		}
 	}
 }
 
@@ -1047,7 +1185,7 @@ static void pr_out_handle_end(struct dl *dl)
 	if (dl->json_output)
 		jsonw_end_object(dl->jw);
 	else
-		pr_out("\n");
+		__pr_out_newline();
 }
 
 static void pr_out_handle(struct dl *dl, struct nlattr **tb)
@@ -1163,18 +1301,26 @@ static void pr_out_port_handle_end(struct dl *dl)
 
 static void pr_out_str(struct dl *dl, const char *name, const char *val)
 {
-	if (dl->json_output)
+	if (dl->json_output) {
 		jsonw_string_field(dl->jw, name, val);
-	else
-		pr_out(" %s %s", name, val);
+	} else {
+		if (g_indent_newline)
+			pr_out("%s %s", name, val);
+		else
+			pr_out(" %s %s", name, val);
+	}
 }
 
 static void pr_out_uint(struct dl *dl, const char *name, unsigned int val)
 {
-	if (dl->json_output)
+	if (dl->json_output) {
 		jsonw_uint_field(dl->jw, name, val);
-	else
-		pr_out(" %s %u", name, val);
+	} else {
+		if (g_indent_newline)
+			pr_out("%s %u", name, val);
+		else
+			pr_out(" %s %u", name, val);
+	}
 }
 
 static void pr_out_dev(struct dl *dl, struct nlattr **tb)
@@ -1201,6 +1347,42 @@ static void pr_out_section_end(struct dl *dl)
 	}
 }
 
+static void pr_out_array_start(struct dl *dl, const char *name)
+{
+	if (dl->json_output) {
+		jsonw_name(dl->jw, name);
+		jsonw_start_array(dl->jw);
+	} else {
+		if (!g_indent_newline)
+			__pr_out_newline();
+		pr_out("%s:", name);
+		__pr_out_newline();
+		__pr_out_indent_inc();
+	}
+}
+
+static void pr_out_array_end(struct dl *dl)
+{
+	if (dl->json_output)
+		jsonw_end_array(dl->jw);
+	else
+		__pr_out_indent_dec();
+}
+
+static void pr_out_entry_start(struct dl *dl)
+{
+	if (dl->json_output)
+		jsonw_start_object(dl->jw);
+}
+
+static void pr_out_entry_end(struct dl *dl)
+{
+	if (dl->json_output)
+		jsonw_end_object(dl->jw);
+	else
+		__pr_out_newline();
+}
+
 static const char *eswitch_mode_name(uint32_t mode)
 {
 	switch (mode) {
@@ -2423,129 +2605,1102 @@ static int cmd_mon(struct dl *dl)
 	return -ENOENT;
 }
 
-static void help(void)
+struct dpipe_field {
+	char *name;
+	unsigned int id;
+	unsigned int bitwidth;
+	enum devlink_dpipe_field_mapping_type mapping_type;
+};
+
+struct dpipe_header {
+	struct list_head list;
+	char *name;
+	unsigned int id;
+	struct dpipe_field *fields;
+	unsigned int fields_count;
+};
+
+struct dpipe_ctx {
+	struct dl *dl;
+	int err;
+	struct list_head global_headers;
+	struct list_head local_headers;
+	bool print_headers;
+};
+
+static struct dpipe_header *dpipe_header_alloc(unsigned int fields_count)
 {
-	pr_err("Usage: devlink [ OPTIONS ] OBJECT { COMMAND | help }\n"
-	       "where  OBJECT := { dev | port | sb | monitor }\n"
-	       "       OPTIONS := { -V[ersion] | -n[no-nice-names] | -j[json] | -p[pretty] }\n");
+	struct dpipe_header *header;
+
+	header = calloc(1, sizeof(struct dpipe_header));
+	if (!header)
+		return NULL;
+	header->fields = calloc(fields_count, sizeof(struct dpipe_field));
+	if (!header->fields)
+		goto err_fields_alloc;
+	header->fields_count = fields_count;
+	return header;
+
+err_fields_alloc:
+	free(header);
+	return NULL;
 }
 
-static int dl_cmd(struct dl *dl)
+static void dpipe_header_free(struct dpipe_header *header)
 {
-	if (dl_argv_match(dl, "help") || dl_no_arg(dl)) {
-		help();
-		return 0;
-	} else if (dl_argv_match(dl, "dev")) {
-		dl_arg_inc(dl);
-		return cmd_dev(dl);
-	} else if (dl_argv_match(dl, "port")) {
-		dl_arg_inc(dl);
-		return cmd_port(dl);
-	} else if (dl_argv_match(dl, "sb")) {
-		dl_arg_inc(dl);
-		return cmd_sb(dl);
-	} else if (dl_argv_match(dl, "monitor")) {
-		dl_arg_inc(dl);
-		return cmd_mon(dl);
+	free(header->fields);
+	free(header);
+}
+
+static void dpipe_header_clear(struct dpipe_header *header)
+{
+	struct dpipe_field *field;
+	int i;
+
+	for (i = 0; i < header->fields_count; i++) {
+		field = &header->fields[i];
+		free(field->name);
 	}
-	pr_err("Object \"%s\" not found\n", dl_argv(dl));
-	return -ENOENT;
+	free(header->name);
 }
 
-static int dl_init(struct dl *dl, int argc, char **argv)
+static void dpipe_header_add(struct dpipe_ctx *ctx,
+			     struct dpipe_header *header, bool global)
 {
-	int err;
+	if (global)
+		list_add(&header->list, &ctx->global_headers);
+	else
+		list_add(&header->list, &ctx->local_headers);
+}
 
-	dl->argc = argc;
-	dl->argv = argv;
+static void dpipe_header_del(struct dpipe_header *header)
+{
+	list_del(&header->list);
+}
 
-	dl->nlg = mnlg_socket_open(DEVLINK_GENL_NAME, DEVLINK_GENL_VERSION);
-	if (!dl->nlg) {
-		pr_err("Failed to connect to devlink Netlink\n");
-		return -errno;
+static struct dpipe_ctx *dpipe_ctx_alloc(struct dl *dl)
+{
+	struct dpipe_ctx *ctx;
+
+	ctx = calloc(1, sizeof(struct dpipe_ctx));
+	if (!ctx)
+		return NULL;
+	ctx->dl = dl;
+	INIT_LIST_HEAD(&ctx->global_headers);
+	INIT_LIST_HEAD(&ctx->local_headers);
+	return ctx;
+}
+
+static void dpipe_ctx_free(struct dpipe_ctx *ctx)
+{
+	free(ctx);
+}
+
+static void dpipe_ctx_clear(struct dpipe_ctx *ctx)
+{
+	struct dpipe_header *header, *tmp;
+
+	list_for_each_entry_safe(header, tmp, &ctx->global_headers,
+				 list) {
+		dpipe_header_del(header);
+		dpipe_header_clear(header);
+		dpipe_header_free(header);
+	}
+	list_for_each_entry_safe(header, tmp, &ctx->local_headers,
+				 list) {
+		dpipe_header_del(header);
+		dpipe_header_clear(header);
+		dpipe_header_free(header);
 	}
+}
 
-	err = ifname_map_init(dl);
-	if (err) {
-		pr_err("Failed to create index map\n");
-		goto err_ifname_map_create;
+static const char *dpipe_header_id2s(struct dpipe_ctx *ctx,
+				     uint32_t header_id, bool global)
+{
+	struct list_head *header_list;
+	struct dpipe_header *header;
+
+	if (global)
+		header_list = &ctx->global_headers;
+	else
+		header_list = &ctx->local_headers;
+	list_for_each_entry(header, header_list, list) {
+		if (header->id != header_id)
+			continue;
+		return header->name;
 	}
-	if (dl->json_output) {
-		dl->jw = jsonw_new(stdout);
-		if (!dl->jw) {
-			pr_err("Failed to create JSON writer\n");
-			goto err_json_new;
-		}
-		jsonw_pretty(dl->jw, dl->pretty_output);
+	return NULL;
+}
+
+static const char *dpipe_field_id2s(struct dpipe_ctx *ctx,
+				    uint32_t header_id,
+				    uint32_t field_id, bool global)
+{
+	struct list_head *header_list;
+	struct dpipe_header *header;
+
+	if (global)
+		header_list = &ctx->global_headers;
+	else
+		header_list = &ctx->local_headers;
+	list_for_each_entry(header, header_list, list) {
+		if (header->id != header_id)
+			continue;
+		return header->fields[field_id].name;
 	}
-	return 0;
+	return NULL;
+}
 
-err_json_new:
-	ifname_map_fini(dl);
-err_ifname_map_create:
-	mnlg_socket_close(dl->nlg);
-	return err;
+static const char *
+dpipe_field_mapping_e2s(enum devlink_dpipe_field_mapping_type mapping_type)
+{
+	switch (mapping_type) {
+	case DEVLINK_DPIPE_FIELD_MAPPING_TYPE_NONE:
+		return NULL;
+	case DEVLINK_DPIPE_FIELD_MAPPING_TYPE_IFINDEX:
+		return "ifindex";
+	default:
+		return "<unknown>";
+	}
 }
 
-static void dl_fini(struct dl *dl)
+static const char *
+dpipe_mapping_get(struct dpipe_ctx *ctx, uint32_t header_id,
+		  uint32_t field_id, bool global)
 {
-	if (dl->json_output)
-		jsonw_destroy(&dl->jw);
-	ifname_map_fini(dl);
-	mnlg_socket_close(dl->nlg);
+	enum devlink_dpipe_field_mapping_type mapping_type;
+	struct list_head *header_list;
+	struct dpipe_header *header;
+
+	if (global)
+		header_list = &ctx->global_headers;
+	else
+		header_list = &ctx->local_headers;
+	list_for_each_entry(header, header_list, list) {
+		if (header->id != header_id)
+			continue;
+		mapping_type = header->fields[field_id].mapping_type;
+		return dpipe_field_mapping_e2s(mapping_type);
+	}
+	return NULL;
 }
 
-static struct dl *dl_alloc(void)
+static void pr_out_dpipe_fields(struct dpipe_ctx *ctx,
+				struct dpipe_field *fields,
+				unsigned int field_count)
 {
-	struct dl *dl;
+	struct dpipe_field *field;
+	int i;
 
-	dl = calloc(1, sizeof(*dl));
-	if (!dl)
-		return NULL;
-	return dl;
+	for (i = 0; i < field_count; i++) {
+		field = &fields[i];
+		pr_out_entry_start(ctx->dl);
+		pr_out_str(ctx->dl, "name", field->name);
+		if (ctx->dl->verbose)
+			pr_out_uint(ctx->dl, "id", field->id);
+		pr_out_uint(ctx->dl, "bitwidth", field->bitwidth);
+		if (field->mapping_type)
+			pr_out_str(ctx->dl, "mapping_type",
+				   dpipe_field_mapping_e2s(field->mapping_type));
+		pr_out_entry_end(ctx->dl);
+	}
 }
 
-static void dl_free(struct dl *dl)
+static void
+pr_out_dpipe_header(struct dpipe_ctx *ctx, struct nlattr **tb,
+		    struct dpipe_header *header, bool global)
 {
-	free(dl);
+	pr_out_handle_start_arr(ctx->dl, tb);
+	pr_out_str(ctx->dl, "name", header->name);
+	if (ctx->dl->verbose) {
+		pr_out_uint(ctx->dl, "id", header->id);
+		pr_out_str(ctx->dl, "global",
+			   global ? "true" : "false");
+	}
+	pr_out_array_start(ctx->dl, "field");
+	pr_out_dpipe_fields(ctx, header->fields,
+			    header->fields_count);
+	pr_out_array_end(ctx->dl);
+	pr_out_handle_end(ctx->dl);
 }
 
-int main(int argc, char **argv)
+static void pr_out_dpipe_headers(struct dpipe_ctx *ctx,
+				 struct nlattr **tb)
 {
-	static const struct option long_options[] = {
-		{ "Version",		no_argument,		NULL, 'V' },
-		{ "no-nice-names",	no_argument,		NULL, 'n' },
-		{ "json",		no_argument,		NULL, 'j' },
-		{ "pretty",		no_argument,		NULL, 'p' },
-		{ NULL, 0, NULL, 0 }
-	};
-	struct dl *dl;
-	int opt;
+	struct dpipe_header *header;
+
+	list_for_each_entry(header, &ctx->local_headers, list)
+		pr_out_dpipe_header(ctx, tb, header, false);
+
+	list_for_each_entry(header, &ctx->global_headers, list)
+		pr_out_dpipe_header(ctx, tb, header, true);
+}
+
+static int dpipe_header_field_get(struct nlattr *nl, struct dpipe_field *field)
+{
+	struct nlattr *nla_field[DEVLINK_ATTR_MAX + 1] = {};
+	const char *name;
 	int err;
-	int ret;
 
-	dl = dl_alloc();
-	if (!dl) {
-		pr_err("Failed to allocate memory for devlink\n");
-		return EXIT_FAILURE;
+	err = mnl_attr_parse_nested(nl, attr_cb, nla_field);
+	if (err != MNL_CB_OK)
+		return -EINVAL;
+	if (!nla_field[DEVLINK_ATTR_DPIPE_FIELD_ID] ||
+	    !nla_field[DEVLINK_ATTR_DPIPE_FIELD_NAME] ||
+	    !nla_field[DEVLINK_ATTR_DPIPE_FIELD_BITWIDTH] ||
+	    !nla_field[DEVLINK_ATTR_DPIPE_FIELD_MAPPING_TYPE])
+		return -EINVAL;
+
+	name = mnl_attr_get_str(nla_field[DEVLINK_ATTR_DPIPE_FIELD_NAME]);
+	field->id = mnl_attr_get_u32(nla_field[DEVLINK_ATTR_DPIPE_FIELD_ID]);
+	field->bitwidth = mnl_attr_get_u32(nla_field[DEVLINK_ATTR_DPIPE_FIELD_BITWIDTH]);
+	field->name = strdup(name);
+	if (!field->name)
+		return -ENOMEM;
+	field->mapping_type = mnl_attr_get_u32(nla_field[DEVLINK_ATTR_DPIPE_FIELD_MAPPING_TYPE]);
+	return 0;
+}
+
+static int dpipe_header_fields_get(struct nlattr *nla_fields,
+				   struct dpipe_field *fields)
+{
+	struct nlattr *nla_field;
+	int count = 0;
+	int err;
+
+	mnl_attr_for_each_nested(nla_field, nla_fields) {
+		err = dpipe_header_field_get(nla_field, &fields[count]);
+		if (err)
+			return err;
+		count++;
 	}
+	return 0;
+}
 
-	while ((opt = getopt_long(argc, argv, "Vnjp",
-				  long_options, NULL)) >= 0) {
+static unsigned int dpipe_header_field_count_get(struct nlattr *nla_fields)
+{
+	struct nlattr *nla_field;
+	unsigned int count = 0;
 
-		switch (opt) {
-		case 'V':
-			printf("devlink utility, iproute2-ss%s\n", SNAPSHOT);
-			ret = EXIT_SUCCESS;
-			goto dl_free;
-		case 'n':
-			dl->no_nice_names = true;
-			break;
-		case 'j':
-			dl->json_output = true;
-			break;
-		case 'p':
-			dl->pretty_output = true;
+	mnl_attr_for_each_nested(nla_field, nla_fields)
+		count++;
+	return count;
+}
+
+static int dpipe_header_get(struct dpipe_ctx *ctx, struct nlattr *nl)
+{
+	struct nlattr *nla_header[DEVLINK_ATTR_MAX + 1] = {};
+	struct dpipe_header *header;
+	unsigned int fields_count;
+	const char *header_name;
+	bool global;
+	int err;
+
+	err = mnl_attr_parse_nested(nl, attr_cb, nla_header);
+	if (err != MNL_CB_OK)
+		return -EINVAL;
+
+	if (!nla_header[DEVLINK_ATTR_DPIPE_HEADER_NAME] ||
+	    !nla_header[DEVLINK_ATTR_DPIPE_HEADER_ID] ||
+	    !nla_header[DEVLINK_ATTR_DPIPE_HEADER_FIELDS])
+		return -EINVAL;
+
+	fields_count = dpipe_header_field_count_get(nla_header[DEVLINK_ATTR_DPIPE_HEADER_FIELDS]);
+	header = dpipe_header_alloc(fields_count);
+	if (!header)
+		return -ENOMEM;
+
+	header_name = mnl_attr_get_str(nla_header[DEVLINK_ATTR_DPIPE_HEADER_NAME]);
+	header->name = strdup(header_name);
+	header->id = mnl_attr_get_u32(nla_header[DEVLINK_ATTR_DPIPE_HEADER_ID]);
+	header->fields_count = fields_count;
+	global = !!mnl_attr_get_u8(nla_header[DEVLINK_ATTR_DPIPE_HEADER_GLOBAL]);
+
+	err = dpipe_header_fields_get(nla_header[DEVLINK_ATTR_DPIPE_HEADER_FIELDS],
+				      header->fields);
+	if (err)
+		goto err_field_get;
+	dpipe_header_add(ctx, header, global);
+	return 0;
+
+err_field_get:
+	dpipe_header_free(header);
+	return err;
+}
+
+static int dpipe_headers_get(struct dpipe_ctx *ctx, struct nlattr **tb)
+{
+	struct nlattr *nla_headers = tb[DEVLINK_ATTR_DPIPE_HEADERS];
+	struct nlattr *nla_header;
+	int err;
+
+	mnl_attr_for_each_nested(nla_header, nla_headers) {
+		err = dpipe_header_get(ctx, nla_header);
+		if (err)
+			return err;
+	}
+	return 0;
+}
+
+static int cmd_dpipe_header_cb(const struct nlmsghdr *nlh, void *data)
+{
+	struct dpipe_ctx *ctx = data;
+	struct nlattr *tb[DEVLINK_ATTR_MAX + 1] = {};
+	struct genlmsghdr *genl = mnl_nlmsg_get_payload(nlh);
+	int err;
+
+	mnl_attr_parse(nlh, sizeof(*genl), attr_cb, tb);
+	if (!tb[DEVLINK_ATTR_BUS_NAME] || !tb[DEVLINK_ATTR_DEV_NAME] ||
+	    !tb[DEVLINK_ATTR_DPIPE_HEADERS])
+		return MNL_CB_ERROR;
+	err = dpipe_headers_get(ctx, tb);
+	if (err) {
+		ctx->err = err;
+		return MNL_CB_ERROR;
+	}
+
+	if (ctx->print_headers)
+		pr_out_dpipe_headers(ctx, tb);
+	return MNL_CB_OK;
+}
+
+static int cmd_dpipe_headers_show(struct dl *dl)
+{
+	struct nlmsghdr *nlh;
+	struct dpipe_ctx *ctx;
+	uint16_t flags = NLM_F_REQUEST | NLM_F_ACK;
+	int err;
+
+	nlh = mnlg_msg_prepare(dl->nlg, DEVLINK_CMD_DPIPE_HEADERS_GET, flags);
+
+	err = dl_argv_parse_put(nlh, dl, DL_OPT_HANDLE, 0);
+	if (err)
+		return err;
+
+	ctx = dpipe_ctx_alloc(dl);
+	if (!ctx)
+		return -ENOMEM;
+
+	ctx->print_headers = true;
+
+	pr_out_section_start(dl, "header");
+	err = _mnlg_socket_sndrcv(dl->nlg, nlh, cmd_dpipe_header_cb, ctx);
+	if (err)
+		pr_err("error get headers %s\n", strerror(ctx->err));
+	pr_out_section_end(dl);
+
+	dpipe_ctx_clear(ctx);
+	dpipe_ctx_free(ctx);
+	return err;
+}
+
+static void cmd_dpipe_header_help(void)
+{
+	pr_err("Usage: devlink dpipe headers show DEV\n");
+}
+
+static int cmd_dpipe_header(struct dl *dl)
+{
+	if (dl_argv_match(dl, "help") || dl_no_arg(dl)) {
+		cmd_dpipe_header_help();
+		return 0;
+	} else if (dl_argv_match(dl, "show")) {
+		dl_arg_inc(dl);
+		return cmd_dpipe_headers_show(dl);
+	}
+	pr_err("Command \"%s\" not found\n", dl_argv(dl));
+	return -ENOENT;
+}
+
+static const char
+*dpipe_action_type_e2s(enum devlink_dpipe_action_type action_type)
+{
+	switch (action_type) {
+	case DEVLINK_DPIPE_ACTION_TYPE_FIELD_MODIFY:
+		return "field_modify";
+	default:
+		return "<unknown>";
+	}
+}
+
+static void pr_out_dpipe_action(struct dpipe_ctx *ctx,
+				uint32_t header_id, uint32_t field_id,
+				uint32_t action_type, bool global)
+{
+	const char *mapping;
+
+	pr_out_str(ctx->dl, "type", dpipe_action_type_e2s(action_type));
+	pr_out_str(ctx->dl, "header", dpipe_header_id2s(ctx, header_id,
+							global));
+	pr_out_str(ctx->dl, "field", dpipe_field_id2s(ctx, header_id, field_id,
+						      global));
+	mapping = dpipe_mapping_get(ctx, header_id, field_id, global);
+	if (mapping)
+		pr_out_str(ctx->dl, "mapping", mapping);
+}
+
+static int dpipe_action_show(struct dpipe_ctx *ctx, struct nlattr *nl)
+{
+	struct nlattr *nla_action[DEVLINK_ATTR_MAX + 1] = {};
+	uint32_t header_id, field_id, action_type;
+	bool global;
+	int err;
+
+	err = mnl_attr_parse_nested(nl, attr_cb, nla_action);
+	if (err != MNL_CB_OK)
+		return -EINVAL;
+
+	if (!nla_action[DEVLINK_ATTR_DPIPE_ACTION_TYPE] ||
+	    !nla_action[DEVLINK_ATTR_DPIPE_HEADER_INDEX] ||
+	    !nla_action[DEVLINK_ATTR_DPIPE_HEADER_ID] ||
+	    !nla_action[DEVLINK_ATTR_DPIPE_FIELD_ID]) {
+		return -EINVAL;
+	}
+
+	header_id = mnl_attr_get_u32(nla_action[DEVLINK_ATTR_DPIPE_HEADER_ID]);
+	field_id = mnl_attr_get_u32(nla_action[DEVLINK_ATTR_DPIPE_FIELD_ID]);
+	action_type = mnl_attr_get_u32(nla_action[DEVLINK_ATTR_DPIPE_ACTION_TYPE]);
+	global = !!mnl_attr_get_u8(nla_action[DEVLINK_ATTR_DPIPE_HEADER_GLOBAL]);
+
+	pr_out_dpipe_action(ctx, header_id, field_id, action_type, global);
+	return 0;
+}
+
+static int dpipe_table_actions_show(struct dpipe_ctx *ctx,
+				    struct nlattr *nla_actions)
+{
+	struct nlattr *nla_action;
+
+	mnl_attr_for_each_nested(nla_action, nla_actions) {
+		pr_out_entry_start(ctx->dl);
+		if (dpipe_action_show(ctx, nla_action))
+			goto err_action_show;
+		pr_out_entry_end(ctx->dl);
+	}
+	return 0;
+
+err_action_show:
+	pr_out_entry_end(ctx->dl);
+	return -EINVAL;
+}
+
+static const char *
+dpipe_match_type_e2s(enum devlink_dpipe_match_type match_type)
+{
+	switch (match_type) {
+	case DEVLINK_DPIPE_MATCH_TYPE_FIELD_EXACT:
+		return "field_exact";
+	default:
+		return "<unknown>";
+	}
+}
+
+static void pr_out_dpipe_match(struct dpipe_ctx *ctx,
+			       uint32_t header_id, uint32_t field_id,
+			       uint32_t match_type, bool global)
+{
+	const char *mapping;
+
+	pr_out_str(ctx->dl, "type", dpipe_match_type_e2s(match_type));
+	pr_out_str(ctx->dl, "header", dpipe_header_id2s(ctx, header_id,
+							global));
+	pr_out_str(ctx->dl, "field", dpipe_field_id2s(ctx, header_id, field_id,
+						      global));
+	mapping = dpipe_mapping_get(ctx, header_id, field_id, global);
+	if (mapping)
+		pr_out_str(ctx->dl, "mapping", mapping);
+
+}
+
+static int dpipe_match_show(struct dpipe_ctx *ctx, struct nlattr *nl)
+{
+	struct nlattr *nla_match[DEVLINK_ATTR_MAX + 1] = {};
+	uint32_t header_id, field_id, match_type;
+	bool global;
+	int err;
+
+	err = mnl_attr_parse_nested(nl, attr_cb, nla_match);
+	if (err != MNL_CB_OK)
+		return -EINVAL;
+
+	if (!nla_match[DEVLINK_ATTR_DPIPE_MATCH_TYPE] ||
+	    !nla_match[DEVLINK_ATTR_DPIPE_HEADER_INDEX] ||
+	    !nla_match[DEVLINK_ATTR_DPIPE_HEADER_ID] ||
+	    !nla_match[DEVLINK_ATTR_DPIPE_FIELD_ID]) {
+		return -EINVAL;
+	}
+
+	match_type = mnl_attr_get_u32(nla_match[DEVLINK_ATTR_DPIPE_MATCH_TYPE]);
+	header_id = mnl_attr_get_u32(nla_match[DEVLINK_ATTR_DPIPE_HEADER_ID]);
+	field_id = mnl_attr_get_u32(nla_match[DEVLINK_ATTR_DPIPE_FIELD_ID]);
+	global = !!mnl_attr_get_u8(nla_match[DEVLINK_ATTR_DPIPE_HEADER_GLOBAL]);
+
+	pr_out_dpipe_match(ctx, header_id, field_id, match_type, global);
+	return 0;
+}
+
+static int dpipe_table_matches_show(struct dpipe_ctx *ctx,
+				    struct nlattr *nla_matches)
+{
+	struct nlattr *nla_match;
+
+	mnl_attr_for_each_nested(nla_match, nla_matches) {
+		pr_out_entry_start(ctx->dl);
+		if (dpipe_match_show(ctx, nla_match))
+			goto err_match_show;
+		pr_out_entry_end(ctx->dl);
+	}
+	return 0;
+
+err_match_show:
+	pr_out_entry_end(ctx->dl);
+	return -EINVAL;
+}
+
+static int dpipe_table_show(struct dpipe_ctx *ctx, struct nlattr *nl)
+{
+	struct nlattr *nla_table[DEVLINK_ATTR_MAX + 1] = {};
+	bool counters_enabled;
+	const char *name;
+	uint32_t size;
+	int err;
+
+	err = mnl_attr_parse_nested(nl, attr_cb, nla_table);
+	if (err != MNL_CB_OK)
+		return -EINVAL;
+
+	if (!nla_table[DEVLINK_ATTR_DPIPE_TABLE_NAME] ||
+	    !nla_table[DEVLINK_ATTR_DPIPE_TABLE_SIZE] ||
+	    !nla_table[DEVLINK_ATTR_DPIPE_TABLE_ACTIONS] ||
+	    !nla_table[DEVLINK_ATTR_DPIPE_TABLE_MATCHES] ||
+	    !nla_table[DEVLINK_ATTR_DPIPE_TABLE_COUNTERS_ENABLED]) {
+		return -EINVAL;
+	}
+
+	name = mnl_attr_get_str(nla_table[DEVLINK_ATTR_DPIPE_TABLE_NAME]);
+	size = mnl_attr_get_u32(nla_table[DEVLINK_ATTR_DPIPE_TABLE_SIZE]);
+	counters_enabled = !!mnl_attr_get_u8(nla_table[DEVLINK_ATTR_DPIPE_TABLE_COUNTERS_ENABLED]);
+
+	pr_out_str(ctx->dl, "name", name);
+	pr_out_uint(ctx->dl, "size", size);
+	pr_out_str(ctx->dl, "counters_enabled",
+		   counters_enabled ? "true" : "false");
+
+	pr_out_array_start(ctx->dl, "match");
+	if (dpipe_table_matches_show(ctx, nla_table[DEVLINK_ATTR_DPIPE_TABLE_MATCHES]))
+		goto err_matches_show;
+	pr_out_array_end(ctx->dl);
+
+	pr_out_array_start(ctx->dl, "action");
+	if (dpipe_table_actions_show(ctx, nla_table[DEVLINK_ATTR_DPIPE_TABLE_ACTIONS]))
+		goto err_actions_show;
+	pr_out_array_end(ctx->dl);
+
+	return 0;
+
+err_actions_show:
+err_matches_show:
+	pr_out_array_end(ctx->dl);
+	return -EINVAL;
+}
+
+static int dpipe_tables_show(struct dpipe_ctx *ctx, struct nlattr **tb)
+{
+	struct nlattr *nla_tables = tb[DEVLINK_ATTR_DPIPE_TABLES];
+	struct nlattr *nla_table;
+
+	mnl_attr_for_each_nested(nla_table, nla_tables) {
+		pr_out_handle_start_arr(ctx->dl, tb);
+		if (dpipe_table_show(ctx, nla_table))
+			goto err_table_show;
+		pr_out_handle_end(ctx->dl);
+	}
+	return 0;
+
+err_table_show:
+	pr_out_handle_end(ctx->dl);
+	return -EINVAL;
+}
+
+static int cmd_dpipe_table_show_cb(const struct nlmsghdr *nlh, void *data)
+{
+	struct dpipe_ctx *ctx = data;
+	struct nlattr *tb[DEVLINK_ATTR_MAX + 1] = {};
+	struct genlmsghdr *genl = mnl_nlmsg_get_payload(nlh);
+
+	mnl_attr_parse(nlh, sizeof(*genl), attr_cb, tb);
+	if (!tb[DEVLINK_ATTR_BUS_NAME] || !tb[DEVLINK_ATTR_DEV_NAME] ||
+	    !tb[DEVLINK_ATTR_DPIPE_TABLES])
+		return MNL_CB_ERROR;
+
+	if (dpipe_tables_show(ctx, tb))
+		return MNL_CB_ERROR;
+	return MNL_CB_OK;
+}
+
+static int cmd_dpipe_table_show(struct dl *dl)
+{
+	struct nlmsghdr *nlh;
+	struct dpipe_ctx *ctx;
+	uint16_t flags = NLM_F_REQUEST;
+	int err;
+
+	ctx = dpipe_ctx_alloc(dl);
+	if (!ctx)
+		return -ENOMEM;
+
+	err = dl_argv_parse(dl, DL_OPT_HANDLE, DL_OPT_DPIPE_TABLE_NAME);
+	if (err)
+		goto out;
+
+	nlh = mnlg_msg_prepare(dl->nlg, DEVLINK_CMD_DPIPE_HEADERS_GET, flags);
+	dl_opts_put(nlh, dl);
+	err = _mnlg_socket_sndrcv(dl->nlg, nlh, cmd_dpipe_header_cb, ctx);
+	if (err) {
+		pr_err("error get headers %s\n", strerror(ctx->err));
+		goto out;
+	}
+
+	flags = NLM_F_REQUEST | NLM_F_ACK;
+	nlh = mnlg_msg_prepare(dl->nlg, DEVLINK_CMD_DPIPE_TABLE_GET, flags);
+	dl_opts_put(nlh, dl);
+
+	pr_out_section_start(dl, "table");
+	_mnlg_socket_sndrcv(dl->nlg, nlh, cmd_dpipe_table_show_cb, ctx);
+	pr_out_section_end(dl);
+out:
+	dpipe_ctx_clear(ctx);
+	dpipe_ctx_free(ctx);
+	return err;
+}
+
+static int cmd_dpipe_table_set(struct dl *dl)
+{
+	struct nlmsghdr *nlh;
+	int err;
+
+	nlh = mnlg_msg_prepare(dl->nlg, DEVLINK_CMD_DPIPE_TABLE_COUNTERS_SET,
+			       NLM_F_REQUEST | NLM_F_ACK);
+
+	err = dl_argv_parse_put(nlh, dl,
+				DL_OPT_HANDLE | DL_OPT_DPIPE_TABLE_NAME |
+				DL_OPT_DPIPE_TABLE_COUNTERS, 0);
+	if (err)
+		return err;
+
+	return _mnlg_socket_sndrcv(dl->nlg, nlh, NULL, NULL);
+}
+
+static int dpipe_entry_value_show(struct dpipe_ctx *ctx,
+				  struct nlattr **nla_match_value)
+{
+	uint16_t value_len;
+	bool mask, mapping;
+
+	mask = !!nla_match_value[DEVLINK_ATTR_DPIPE_VALUE_MASK];
+	mapping = !!nla_match_value[DEVLINK_ATTR_DPIPE_VALUE_MAPPING];
+
+	value_len = mnl_attr_get_payload_len(nla_match_value[DEVLINK_ATTR_DPIPE_VALUE]);
+	if (value_len == sizeof(uint32_t)) {
+		uint32_t value, value_mask, value_mapping;
+
+		if (mapping) {
+			value_mapping = mnl_attr_get_u32(nla_match_value[DEVLINK_ATTR_DPIPE_VALUE_MAPPING]);
+			pr_out_uint(ctx->dl, "mapping_value", value_mapping);
+		}
+
+		if (mask) {
+			value_mask = mnl_attr_get_u32(nla_match_value[DEVLINK_ATTR_DPIPE_VALUE_MASK]);
+			pr_out_uint(ctx->dl, "mask_value", value_mask);
+		}
+
+		value = mnl_attr_get_u32(nla_match_value[DEVLINK_ATTR_DPIPE_VALUE]);
+		pr_out_uint(ctx->dl, "value", value);
+
+	} else {
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int dpipe_entry_match_value_show(struct dpipe_ctx *ctx,
+					struct nlattr *nl)
+{
+	struct nlattr *nla_match_value[DEVLINK_ATTR_MAX + 1] = {};
+	int err;
+
+	err = mnl_attr_parse_nested(nl, attr_cb, nla_match_value);
+	if (err != MNL_CB_OK)
+		return -EINVAL;
+
+	if (!nla_match_value[DEVLINK_ATTR_DPIPE_MATCH] ||
+	    !nla_match_value[DEVLINK_ATTR_DPIPE_VALUE]) {
+		return -EINVAL;
+	}
+
+	pr_out_entry_start(ctx->dl);
+	if (dpipe_match_show(ctx, nla_match_value[DEVLINK_ATTR_DPIPE_MATCH]))
+		goto err_match_show;
+	if (dpipe_entry_value_show(ctx, nla_match_value))
+		goto err_value_show;
+	pr_out_entry_end(ctx->dl);
+
+	return 0;
+
+err_match_show:
+err_value_show:
+	pr_out_entry_end(ctx->dl);
+	return -EINVAL;
+}
+
+static int dpipe_entry_action_value_show(struct dpipe_ctx *ctx,
+					 struct nlattr *nl)
+{
+	struct nlattr *nla_action_value[DEVLINK_ATTR_MAX + 1] = {};
+	int err;
+
+	err = mnl_attr_parse_nested(nl, attr_cb, nla_action_value);
+	if (err != MNL_CB_OK)
+		return -EINVAL;
+
+	if (!nla_action_value[DEVLINK_ATTR_DPIPE_ACTION] ||
+	    !nla_action_value[DEVLINK_ATTR_DPIPE_VALUE]) {
+		return -EINVAL;
+	}
+
+	pr_out_entry_start(ctx->dl);
+	if (dpipe_action_show(ctx, nla_action_value[DEVLINK_ATTR_DPIPE_ACTION]))
+		goto err_action_show;
+	if (dpipe_entry_value_show(ctx, nla_action_value))
+		goto err_value_show;
+	pr_out_entry_end(ctx->dl);
+
+	return 0;
+
+err_action_show:
+err_value_show:
+	pr_out_entry_end(ctx->dl);
+	return -EINVAL;
+}
+
+static int
+dpipe_tables_action_values_show(struct dpipe_ctx *ctx,
+				struct nlattr *nla_action_values)
+{
+	struct nlattr *nla_action_value;
+
+	mnl_attr_for_each_nested(nla_action_value, nla_action_values) {
+		if (dpipe_entry_action_value_show(ctx, nla_action_value))
+			return -EINVAL;
+	}
+	return 0;
+}
+
+static int
+dpipe_tables_match_values_show(struct dpipe_ctx *ctx,
+			       struct nlattr *nla_match_values)
+{
+	struct nlattr *nla_match_value;
+
+	mnl_attr_for_each_nested(nla_match_value, nla_match_values) {
+		if (dpipe_entry_match_value_show(ctx, nla_match_value))
+			return -EINVAL;
+	}
+	return 0;
+}
+
+static int dpipe_entry_show(struct dpipe_ctx *ctx, struct nlattr *nl)
+{
+	struct nlattr *nla_entry[DEVLINK_ATTR_MAX + 1] = {};
+	uint32_t entry_index;
+	uint64_t counter;
+	int err;
+
+	err = mnl_attr_parse_nested(nl, attr_cb, nla_entry);
+	if (err != MNL_CB_OK)
+		return -EINVAL;
+
+	if (!nla_entry[DEVLINK_ATTR_DPIPE_ENTRY_INDEX] ||
+	    !nla_entry[DEVLINK_ATTR_DPIPE_ENTRY_MATCH_VALUES] ||
+	    !nla_entry[DEVLINK_ATTR_DPIPE_ENTRY_ACTION_VALUES]) {
+		return -EINVAL;
+	}
+
+	entry_index = mnl_attr_get_u32(nla_entry[DEVLINK_ATTR_DPIPE_ENTRY_INDEX]);
+	pr_out_uint(ctx->dl, "index", entry_index);
+
+	if (nla_entry[DEVLINK_ATTR_DPIPE_ENTRY_COUNTER]) {
+		counter = mnl_attr_get_u64(nla_entry[DEVLINK_ATTR_DPIPE_ENTRY_COUNTER]);
+		pr_out_uint(ctx->dl, "counter", counter);
+	}
+
+	pr_out_array_start(ctx->dl, "match_value");
+	if (dpipe_tables_match_values_show(ctx,
+					   nla_entry[DEVLINK_ATTR_DPIPE_ENTRY_MATCH_VALUES]))
+		goto err_match_values_show;
+	pr_out_array_end(ctx->dl);
+
+	pr_out_array_start(ctx->dl, "action_value");
+	if (dpipe_tables_action_values_show(ctx,
+					    nla_entry[DEVLINK_ATTR_DPIPE_ENTRY_ACTION_VALUES]))
+		goto err_action_values_show;
+	pr_out_array_end(ctx->dl);
+	return 0;
+
+err_action_values_show:
+err_match_values_show:
+	pr_out_array_end(ctx->dl);
+	return -EINVAL;
+}
+
+static int dpipe_table_entries_show(struct dpipe_ctx *ctx, struct nlattr **tb)
+{
+	struct nlattr *nla_entries = tb[DEVLINK_ATTR_DPIPE_ENTRIES];
+	struct nlattr *nla_entry;
+
+	mnl_attr_for_each_nested(nla_entry, nla_entries) {
+		pr_out_handle_start_arr(ctx->dl, tb);
+		if (dpipe_entry_show(ctx, nla_entry))
+			goto err_entry_show;
+		pr_out_handle_end(ctx->dl);
+	}
+	return 0;
+
+err_entry_show:
+	pr_out_handle_end(ctx->dl);
+	return -EINVAL;
+}
+
+static int cmd_dpipe_table_entry_dump_cb(const struct nlmsghdr *nlh, void *data)
+{
+	struct dpipe_ctx *ctx = data;
+	struct nlattr *tb[DEVLINK_ATTR_MAX + 1] = {};
+	struct genlmsghdr *genl = mnl_nlmsg_get_payload(nlh);
+
+	mnl_attr_parse(nlh, sizeof(*genl), attr_cb, tb);
+	if (!tb[DEVLINK_ATTR_BUS_NAME] || !tb[DEVLINK_ATTR_DEV_NAME] ||
+	    !tb[DEVLINK_ATTR_DPIPE_ENTRIES])
+		return MNL_CB_ERROR;
+
+	if (dpipe_table_entries_show(ctx, tb))
+		return MNL_CB_ERROR;
+	return MNL_CB_OK;
+}
+
+static int cmd_dpipe_table_dump(struct dl *dl)
+{
+	struct nlmsghdr *nlh;
+	struct dpipe_ctx *ctx;
+	uint16_t flags = NLM_F_REQUEST;
+	int err;
+
+	ctx = dpipe_ctx_alloc(dl);
+	if (!ctx)
+		return -ENOMEM;
+
+	err = dl_argv_parse(dl, DL_OPT_HANDLE | DL_OPT_DPIPE_TABLE_NAME, 0);
+	if (err)
+		goto out;
+
+	nlh = mnlg_msg_prepare(dl->nlg, DEVLINK_CMD_DPIPE_HEADERS_GET, flags);
+	dl_opts_put(nlh, dl);
+	err = _mnlg_socket_sndrcv(dl->nlg, nlh, cmd_dpipe_header_cb, ctx);
+	if (err) {
+		pr_err("error get headers %s\n", strerror(ctx->err));
+		goto out;
+	}
+
+	flags = NLM_F_REQUEST | NLM_F_ACK;
+	nlh = mnlg_msg_prepare(dl->nlg, DEVLINK_CMD_DPIPE_ENTRIES_GET, flags);
+	dl_opts_put(nlh, dl);
+
+	pr_out_section_start(dl, "table_entry");
+	_mnlg_socket_sndrcv(dl->nlg, nlh, cmd_dpipe_table_entry_dump_cb, ctx);
+	pr_out_section_end(dl);
+out:
+	dpipe_ctx_clear(ctx);
+	dpipe_ctx_free(ctx);
+	return err;
+}
+
+static void cmd_dpipe_table_help(void)
+{
+	pr_err("Usage: devlink dpipe table [ OBJECT-LIST ]\n"
+	       "where  OBJECT-LIST := { show | set | dump }\n");
+}
+
+static int cmd_dpipe_table(struct dl *dl)
+{
+	if (dl_argv_match(dl, "help") || dl_no_arg(dl)) {
+		cmd_dpipe_table_help();
+		return 0;
+	} else if (dl_argv_match(dl, "show")) {
+		dl_arg_inc(dl);
+		return cmd_dpipe_table_show(dl);
+	} else if (dl_argv_match(dl, "set")) {
+		dl_arg_inc(dl);
+		return cmd_dpipe_table_set(dl);
+	}  else if (dl_argv_match(dl, "dump")) {
+		dl_arg_inc(dl);
+		return cmd_dpipe_table_dump(dl);
+	}
+	pr_err("Command \"%s\" not found\n", dl_argv(dl));
+	return -ENOENT;
+}
+
+static void cmd_dpipe_help(void)
+{
+	pr_err("Usage: devlink dpipe [ OBJECT-LIST ]\n"
+	       "where  OBJECT-LIST := { header | table }\n");
+}
+
+static int cmd_dpipe(struct dl *dl)
+{
+	if (dl_argv_match(dl, "help") || dl_no_arg(dl)) {
+		cmd_dpipe_help();
+		return 0;
+	} else if (dl_argv_match(dl, "header")) {
+		dl_arg_inc(dl);
+		return cmd_dpipe_header(dl);
+	} else if (dl_argv_match(dl, "table")) {
+		dl_arg_inc(dl);
+		return cmd_dpipe_table(dl);
+	}
+	pr_err("Command \"%s\" not found\n", dl_argv(dl));
+	return -ENOENT;
+}
+
+static void help(void)
+{
+	pr_err("Usage: devlink [ OPTIONS ] OBJECT { COMMAND | help }\n"
+	       "where  OBJECT := { dev | port | sb | monitor | dpipe }\n"
+	       "       OPTIONS := { -V[ersion] | -n[no-nice-names] | -j[json] | -p[pretty] | -v[verbose] }\n");
+}
+
+static int dl_cmd(struct dl *dl)
+{
+	if (dl_argv_match(dl, "help") || dl_no_arg(dl)) {
+		help();
+		return 0;
+	} else if (dl_argv_match(dl, "dev")) {
+		dl_arg_inc(dl);
+		return cmd_dev(dl);
+	} else if (dl_argv_match(dl, "port")) {
+		dl_arg_inc(dl);
+		return cmd_port(dl);
+	} else if (dl_argv_match(dl, "sb")) {
+		dl_arg_inc(dl);
+		return cmd_sb(dl);
+	} else if (dl_argv_match(dl, "monitor")) {
+		dl_arg_inc(dl);
+		return cmd_mon(dl);
+	} else if (dl_argv_match(dl, "dpipe")) {
+		dl_arg_inc(dl);
+		return cmd_dpipe(dl);
+	}
+	pr_err("Object \"%s\" not found\n", dl_argv(dl));
+	return -ENOENT;
+}
+
+static int dl_init(struct dl *dl, int argc, char **argv)
+{
+	int err;
+
+	dl->argc = argc;
+	dl->argv = argv;
+
+	dl->nlg = mnlg_socket_open(DEVLINK_GENL_NAME, DEVLINK_GENL_VERSION);
+	if (!dl->nlg) {
+		pr_err("Failed to connect to devlink Netlink\n");
+		return -errno;
+	}
+
+	err = ifname_map_init(dl);
+	if (err) {
+		pr_err("Failed to create index map\n");
+		goto err_ifname_map_create;
+	}
+	if (dl->json_output) {
+		dl->jw = jsonw_new(stdout);
+		if (!dl->jw) {
+			pr_err("Failed to create JSON writer\n");
+			goto err_json_new;
+		}
+		jsonw_pretty(dl->jw, dl->pretty_output);
+	}
+	return 0;
+
+err_json_new:
+	ifname_map_fini(dl);
+err_ifname_map_create:
+	mnlg_socket_close(dl->nlg);
+	return err;
+}
+
+static void dl_fini(struct dl *dl)
+{
+	if (dl->json_output)
+		jsonw_destroy(&dl->jw);
+	ifname_map_fini(dl);
+	mnlg_socket_close(dl->nlg);
+}
+
+static struct dl *dl_alloc(void)
+{
+	struct dl *dl;
+
+	dl = calloc(1, sizeof(*dl));
+	if (!dl)
+		return NULL;
+	return dl;
+}
+
+static void dl_free(struct dl *dl)
+{
+	free(dl);
+}
+
+int main(int argc, char **argv)
+{
+	static const struct option long_options[] = {
+		{ "Version",		no_argument,		NULL, 'V' },
+		{ "no-nice-names",	no_argument,		NULL, 'n' },
+		{ "json",		no_argument,		NULL, 'j' },
+		{ "pretty",		no_argument,		NULL, 'p' },
+		{ "verbose",		no_argument,		NULL, 'v' },
+		{ NULL, 0, NULL, 0 }
+	};
+	struct dl *dl;
+	int opt;
+	int err;
+	int ret;
+
+	dl = dl_alloc();
+	if (!dl) {
+		pr_err("Failed to allocate memory for devlink\n");
+		return EXIT_FAILURE;
+	}
+
+	while ((opt = getopt_long(argc, argv, "Vnjpv",
+				  long_options, NULL)) >= 0) {
+
+		switch (opt) {
+		case 'V':
+			printf("devlink utility, iproute2-ss%s\n", SNAPSHOT);
+			ret = EXIT_SUCCESS;
+			goto dl_free;
+		case 'n':
+			dl->no_nice_names = true;
+			break;
+		case 'j':
+			dl->json_output = true;
+			break;
+		case 'p':
+			dl->pretty_output = true;
+			break;
+		case 'v':
+			dl->verbose = true;
 			break;
 		default:
 			pr_err("Unknown option.\n");
-- 
2.7.4

^ permalink raw reply related

* [patch iproute2 v2 1/2] devlink: Change netlink attribute validation
From: Jiri Pirko @ 2017-05-03 11:25 UTC (permalink / raw)
  To: netdev; +Cc: arkadis, stephen, davem, mlxsw, dsa
In-Reply-To: <1493810723-2536-1-git-send-email-jiri@resnulli.us>

From: Arkadi Sharshevsky <arkadis@mellanox.com>

Currently the netlink attribute resolving is done by a sequence of
if's. Change the attribute resolving to table lookup.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 devlink/devlink.c | 103 ++++++++++++++++--------------------------------------
 1 file changed, 30 insertions(+), 73 deletions(-)

diff --git a/devlink/devlink.c b/devlink/devlink.c
index e90226e..35220d8 100644
--- a/devlink/devlink.c
+++ b/devlink/devlink.c
@@ -232,88 +232,45 @@ static bool dl_no_arg(struct dl *dl)
 	return dl_argc(dl) == 0;
 }
 
+static const enum mnl_attr_data_type devlink_policy[DEVLINK_ATTR_MAX + 1] = {
+	[DEVLINK_ATTR_BUS_NAME] = MNL_TYPE_NUL_STRING,
+	[DEVLINK_ATTR_DEV_NAME] = MNL_TYPE_NUL_STRING,
+	[DEVLINK_ATTR_PORT_INDEX] = MNL_TYPE_U32,
+	[DEVLINK_ATTR_PORT_TYPE] = MNL_TYPE_U16,
+	[DEVLINK_ATTR_PORT_DESIRED_TYPE] = MNL_TYPE_U16,
+	[DEVLINK_ATTR_PORT_NETDEV_IFINDEX] = MNL_TYPE_U32,
+	[DEVLINK_ATTR_PORT_NETDEV_NAME] = MNL_TYPE_NUL_STRING,
+	[DEVLINK_ATTR_PORT_IBDEV_NAME] = MNL_TYPE_NUL_STRING,
+	[DEVLINK_ATTR_SB_INDEX] = MNL_TYPE_U32,
+	[DEVLINK_ATTR_SB_SIZE] = MNL_TYPE_U32,
+	[DEVLINK_ATTR_SB_INGRESS_POOL_COUNT] = MNL_TYPE_U16,
+	[DEVLINK_ATTR_SB_EGRESS_POOL_COUNT] = MNL_TYPE_U16,
+	[DEVLINK_ATTR_SB_INGRESS_TC_COUNT] = MNL_TYPE_U16,
+	[DEVLINK_ATTR_SB_EGRESS_TC_COUNT] = MNL_TYPE_U16,
+	[DEVLINK_ATTR_SB_POOL_INDEX] = MNL_TYPE_U16,
+	[DEVLINK_ATTR_SB_POOL_TYPE] = MNL_TYPE_U8,
+	[DEVLINK_ATTR_SB_POOL_SIZE] = MNL_TYPE_U32,
+	[DEVLINK_ATTR_SB_POOL_THRESHOLD_TYPE] = MNL_TYPE_U8,
+	[DEVLINK_ATTR_SB_THRESHOLD] = MNL_TYPE_U32,
+	[DEVLINK_ATTR_SB_TC_INDEX] = MNL_TYPE_U16,
+	[DEVLINK_ATTR_SB_OCC_CUR] = MNL_TYPE_U32,
+	[DEVLINK_ATTR_SB_OCC_MAX] = MNL_TYPE_U32,
+	[DEVLINK_ATTR_ESWITCH_MODE] = MNL_TYPE_U16,
+	[DEVLINK_ATTR_ESWITCH_INLINE_MODE] = MNL_TYPE_U8,
+};
+
 static int attr_cb(const struct nlattr *attr, void *data)
 {
 	const struct nlattr **tb = data;
 	int type;
 
-	type = mnl_attr_get_type(attr);
-
 	if (mnl_attr_type_valid(attr, DEVLINK_ATTR_MAX) < 0)
 		return MNL_CB_ERROR;
 
-	if (type == DEVLINK_ATTR_BUS_NAME &&
-	    mnl_attr_validate(attr, MNL_TYPE_NUL_STRING) < 0)
-		return MNL_CB_ERROR;
-	if (type == DEVLINK_ATTR_DEV_NAME &&
-	    mnl_attr_validate(attr, MNL_TYPE_NUL_STRING) < 0)
-		return MNL_CB_ERROR;
-	if (type == DEVLINK_ATTR_PORT_INDEX &&
-	    mnl_attr_validate(attr, MNL_TYPE_U32) < 0)
-		return MNL_CB_ERROR;
-	if (type == DEVLINK_ATTR_PORT_TYPE &&
-	    mnl_attr_validate(attr, MNL_TYPE_U16) < 0)
-		return MNL_CB_ERROR;
-	if (type == DEVLINK_ATTR_PORT_DESIRED_TYPE &&
-	    mnl_attr_validate(attr, MNL_TYPE_U16) < 0)
-		return MNL_CB_ERROR;
-	if (type == DEVLINK_ATTR_PORT_NETDEV_IFINDEX &&
-	    mnl_attr_validate(attr, MNL_TYPE_U32) < 0)
-		return MNL_CB_ERROR;
-	if (type == DEVLINK_ATTR_PORT_NETDEV_NAME &&
-	    mnl_attr_validate(attr, MNL_TYPE_NUL_STRING) < 0)
-		return MNL_CB_ERROR;
-	if (type == DEVLINK_ATTR_PORT_IBDEV_NAME &&
-	    mnl_attr_validate(attr, MNL_TYPE_NUL_STRING) < 0)
-		return MNL_CB_ERROR;
-	if (type == DEVLINK_ATTR_SB_INDEX &&
-	    mnl_attr_validate(attr, MNL_TYPE_U32) < 0)
-		return MNL_CB_ERROR;
-	if (type == DEVLINK_ATTR_SB_SIZE &&
-	    mnl_attr_validate(attr, MNL_TYPE_U32) < 0)
-		return MNL_CB_ERROR;
-	if (type == DEVLINK_ATTR_SB_INGRESS_POOL_COUNT &&
-	    mnl_attr_validate(attr, MNL_TYPE_U16) < 0)
-		return MNL_CB_ERROR;
-	if (type == DEVLINK_ATTR_SB_EGRESS_POOL_COUNT &&
-	    mnl_attr_validate(attr, MNL_TYPE_U16) < 0)
-		return MNL_CB_ERROR;
-	if (type == DEVLINK_ATTR_SB_INGRESS_TC_COUNT &&
-	    mnl_attr_validate(attr, MNL_TYPE_U16) < 0)
-		return MNL_CB_ERROR;
-	if (type == DEVLINK_ATTR_SB_EGRESS_TC_COUNT &&
-	    mnl_attr_validate(attr, MNL_TYPE_U16) < 0)
-		return MNL_CB_ERROR;
-	if (type == DEVLINK_ATTR_SB_POOL_INDEX &&
-	    mnl_attr_validate(attr, MNL_TYPE_U16) < 0)
-		return MNL_CB_ERROR;
-	if (type == DEVLINK_ATTR_SB_POOL_TYPE &&
-	    mnl_attr_validate(attr, MNL_TYPE_U8) < 0)
-		return MNL_CB_ERROR;
-	if (type == DEVLINK_ATTR_SB_POOL_SIZE &&
-	    mnl_attr_validate(attr, MNL_TYPE_U32) < 0)
-		return MNL_CB_ERROR;
-	if (type == DEVLINK_ATTR_SB_POOL_THRESHOLD_TYPE &&
-	    mnl_attr_validate(attr, MNL_TYPE_U8) < 0)
-		return MNL_CB_ERROR;
-	if (type == DEVLINK_ATTR_SB_THRESHOLD &&
-	    mnl_attr_validate(attr, MNL_TYPE_U32) < 0)
-		return MNL_CB_ERROR;
-	if (type == DEVLINK_ATTR_SB_TC_INDEX &&
-	    mnl_attr_validate(attr, MNL_TYPE_U16) < 0)
-		return MNL_CB_ERROR;
-	if (type == DEVLINK_ATTR_SB_OCC_CUR &&
-	    mnl_attr_validate(attr, MNL_TYPE_U32) < 0)
-		return MNL_CB_ERROR;
-	if (type == DEVLINK_ATTR_SB_OCC_MAX &&
-	    mnl_attr_validate(attr, MNL_TYPE_U32) < 0)
-		return MNL_CB_ERROR;
-	if (type == DEVLINK_ATTR_ESWITCH_MODE &&
-	    mnl_attr_validate(attr, MNL_TYPE_U16) < 0)
-		return MNL_CB_ERROR;
-	if (type == DEVLINK_ATTR_ESWITCH_INLINE_MODE &&
-	    mnl_attr_validate(attr, MNL_TYPE_U8) < 0)
+	type = mnl_attr_get_type(attr);
+	if (mnl_attr_validate(attr, devlink_policy[type]) < 0)
 		return MNL_CB_ERROR;
+
 	tb[type] = attr;
 	return MNL_CB_OK;
 }
-- 
2.7.4

^ permalink raw reply related

* [patch iproute2 v2 0/2] devlink: Add support for pipeline
From: Jiri Pirko @ 2017-05-03 11:25 UTC (permalink / raw)
  To: netdev; +Cc: arkadis, stephen, davem, mlxsw, dsa

From: Jiri Pirko <jiri@mellanox.com>

Arkadi says:

Add support for pipeline debug (dpipe). As a preparation step the netlink
attribute validation was changed before adding new dpipe attributes.
---
v1->v2
- Change netlink attribute validation. 
- Fix commit message typos

Arkadi Sharshevsky (2):
  devlink: Change netlink attribute validation
  devlink: Add support for pipeline debug (dpipe)

 devlink/devlink.c | 1450 ++++++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 1281 insertions(+), 169 deletions(-)

-- 
2.7.4

^ permalink raw reply

* Eine Spende von 1 Million Britische Pfund zu Ihnen in gutem Glauben
From: Mr Neil Trotter @ 2017-05-03  9:53 UTC (permalink / raw)




^ permalink raw reply

* Re: [PATCH 1/1] net: usb: qmi_wwan: add Telit ME910 support
From: Bjørn Mork @ 2017-05-03  9:43 UTC (permalink / raw)
  To: Daniele Palmas; +Cc: netdev
In-Reply-To: <1493800211-7758-1-git-send-email-dnlplm@gmail.com>

Daniele Palmas <dnlplm@gmail.com> writes:

> This patch adds support for Telit ME910 PID 0x1100.
>
> Signed-off-by: Daniele Palmas <dnlplm@gmail.com>

Acked-by: Bjørn Mork <bjorn@mork.no>

David, please add this to your stable queue as well.  Thanks


Bjørn

^ permalink raw reply

* Re: [PATCH 3/4] net: macb: Add hardware PTP support
From: Richard Cochran @ 2017-05-03  9:43 UTC (permalink / raw)
  To: Rafal Ozieblo
  Cc: David Miller,
	nicolas.ferre-AIFe0yeh4nAAvxtiuMwx3w@public.gmane.org,
	netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	devicetree-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org,
	harinikatakamlinux-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org,
	harini.katakam-gjFFaj9aHVfQT0dZR+AlfA@public.gmane.org,
	Andrei.Pistirica-UWL1GkI3JZL3oGB3hsPCZA@public.gmane.org
In-Reply-To: <BN3PR07MB2516757CB4EA7367F6623C06C9170-EldUQEzkDQfxGZiqM5fOI+FPX92sqiQdvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>

On Tue, May 02, 2017 at 01:57:15PM +0000, Rafal Ozieblo wrote:
> > What is the point of this wrapper function anyhow?  Please remove it.
> gem_ptp_gettime() is assigned in ptp_clock_info and it has to have 
> ptp_clock_info pointer as first parameter. gem_tsu_get_time() is used in
> the source code but with macb pointer.
> Do you want me to do something like:
> gem_ptp_gettime(macb->ptp, ts);
> and first would be getting macb pointer from ptp ?
> struct macb *bp = container_of(ptp, struct macb, ptp_clock_info);

Yes.  Unless your sub-function is used in more than one place, then it
is wasteful and confusing to wrap the functionality for no apparent
reason.

> > > +	switch (rq->type) {
> > > +	case PTP_CLK_REQ_EXTTS:	/* Toggle TSU match interrupt */
> > > +		if (on)
> > > +			macb_writel(bp, IER, MACB_BIT(TCI));
> > 
> > No locking to protect IER and IDE?
> There is no need.

But what happens when the PTP_CLK_REQ_EXTTS and PTP_CLK_REQ_PPS ioctls
are called at the same time?

You need to ensure that IDR is consistent.  If the bits are write
only, then you should comment this fact.

> > > +		else
> > > +			macb_writel(bp, IDR, MACB_BIT(TCI));
> > > +		break;
> > > +	case PTP_CLK_REQ_PEROUT: /* Toggle Periodic output */
> > > +		return -EOPNOTSUPP;
> > > +		/* break; */
> > > +	case PTP_CLK_REQ_PPS:	/* Toggle TSU periodic (second)
> > interrupt */
> > > +		if (on)
> > > +			macb_writel(bp, IER, MACB_BIT(SRI));
> > > +		else
> > > +			macb_writel(bp, IDR, MACB_BIT(SRI));
> > > +		break;
> > > +	default:
> > > +		break;
> > > +	}
> > > +	return 0;
> > > +}

Thanks,
Richard
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [PATCH 09/16] netfilter: xt_socket: Fix broken IPv6 handling
From: Pablo Neira Ayuso @ 2017-05-03  9:32 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <1493803931-2837-1-git-send-email-pablo@netfilter.org>

From: Peter Tirsek <peter@tirsek.com>

Commit 834184b1f3a4 ("netfilter: defrag: only register defrag
functionality if needed") used the outdated XT_SOCKET_HAVE_IPV6 macro
which was removed earlier in commit 8db4c5be88f6 ("netfilter: move
socket lookup infrastructure to nf_socket_ipv{4,6}.c"). With that macro
never being defined, the xt_socket match emits an "Unknown family 10"
warning when used with IPv6:

WARNING: CPU: 0 PID: 1377 at net/netfilter/xt_socket.c:160 socket_mt_enable_defrag+0x47/0x50 [xt_socket]
Unknown family 10
Modules linked in: xt_socket nf_socket_ipv4 nf_socket_ipv6 nf_defrag_ipv4 [...]
CPU: 0 PID: 1377 Comm: ip6tables-resto Not tainted 4.10.10 #1
Hardware name: [...]
Call Trace:
? __warn+0xe7/0x100
? socket_mt_enable_defrag+0x47/0x50 [xt_socket]
? socket_mt_enable_defrag+0x47/0x50 [xt_socket]
? warn_slowpath_fmt+0x39/0x40
? socket_mt_enable_defrag+0x47/0x50 [xt_socket]
? socket_mt_v2_check+0x12/0x40 [xt_socket]
? xt_check_match+0x6b/0x1a0 [x_tables]
? xt_find_match+0x93/0xd0 [x_tables]
? xt_request_find_match+0x20/0x80 [x_tables]
? translate_table+0x48e/0x870 [ip6_tables]
? translate_table+0x577/0x870 [ip6_tables]
? walk_component+0x3a/0x200
? kmalloc_order+0x1d/0x50
? do_ip6t_set_ctl+0x181/0x490 [ip6_tables]
? filename_lookup+0xa5/0x120
? nf_setsockopt+0x3a/0x60
? ipv6_setsockopt+0xb0/0xc0
? sock_common_setsockopt+0x23/0x30
? SyS_socketcall+0x41d/0x630
? vfs_read+0xfa/0x120
? do_fast_syscall_32+0x7a/0x110
? entry_SYSENTER_32+0x47/0x71

This patch brings the conditional back in line with how the rest of the
file handles IPv6.

Fixes: 834184b1f3a4 ("netfilter: defrag: only register defrag functionality if needed")
Signed-off-by: Peter Tirsek <peter@tirsek.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/xt_socket.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netfilter/xt_socket.c b/net/netfilter/xt_socket.c
index 770bbec878f1..e75ef39669c5 100644
--- a/net/netfilter/xt_socket.c
+++ b/net/netfilter/xt_socket.c
@@ -152,7 +152,7 @@ static int socket_mt_enable_defrag(struct net *net, int family)
 	switch (family) {
 	case NFPROTO_IPV4:
 		return nf_defrag_ipv4_enable(net);
-#ifdef XT_SOCKET_HAVE_IPV6
+#if IS_ENABLED(CONFIG_IP6_NF_IPTABLES)
 	case NFPROTO_IPV6:
 		return nf_defrag_ipv6_enable(net);
 #endif
-- 
2.1.4

^ permalink raw reply related

* [PATCH 08/16] netfilter: ctnetlink: acquire ct->lock before operating nf_ct_seqadj
From: Pablo Neira Ayuso @ 2017-05-03  9:32 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <1493803931-2837-1-git-send-email-pablo@netfilter.org>

From: Liping Zhang <zlpnobody@gmail.com>

We should acquire the ct->lock before accessing or modifying the
nf_ct_seqadj, as another CPU may modify the nf_ct_seqadj at the same
time during its packet proccessing.

Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nf_conntrack_netlink.c | 21 +++++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
index 86deed6a8db4..78f8c9adbd3c 100644
--- a/net/netfilter/nf_conntrack_netlink.c
+++ b/net/netfilter/nf_conntrack_netlink.c
@@ -417,8 +417,7 @@ dump_ct_seq_adj(struct sk_buff *skb, const struct nf_ct_seqadj *seq, int type)
 	return -1;
 }
 
-static int ctnetlink_dump_ct_seq_adj(struct sk_buff *skb,
-				     const struct nf_conn *ct)
+static int ctnetlink_dump_ct_seq_adj(struct sk_buff *skb, struct nf_conn *ct)
 {
 	struct nf_conn_seqadj *seqadj = nfct_seqadj(ct);
 	struct nf_ct_seqadj *seq;
@@ -426,15 +425,20 @@ static int ctnetlink_dump_ct_seq_adj(struct sk_buff *skb,
 	if (!(ct->status & IPS_SEQ_ADJUST) || !seqadj)
 		return 0;
 
+	spin_lock_bh(&ct->lock);
 	seq = &seqadj->seq[IP_CT_DIR_ORIGINAL];
 	if (dump_ct_seq_adj(skb, seq, CTA_SEQ_ADJ_ORIG) == -1)
-		return -1;
+		goto err;
 
 	seq = &seqadj->seq[IP_CT_DIR_REPLY];
 	if (dump_ct_seq_adj(skb, seq, CTA_SEQ_ADJ_REPLY) == -1)
-		return -1;
+		goto err;
 
+	spin_unlock_bh(&ct->lock);
 	return 0;
+err:
+	spin_unlock_bh(&ct->lock);
+	return -1;
 }
 
 static int ctnetlink_dump_id(struct sk_buff *skb, const struct nf_conn *ct)
@@ -1637,11 +1641,12 @@ ctnetlink_change_seq_adj(struct nf_conn *ct,
 	if (!seqadj)
 		return 0;
 
+	spin_lock_bh(&ct->lock);
 	if (cda[CTA_SEQ_ADJ_ORIG]) {
 		ret = change_seq_adj(&seqadj->seq[IP_CT_DIR_ORIGINAL],
 				     cda[CTA_SEQ_ADJ_ORIG]);
 		if (ret < 0)
-			return ret;
+			goto err;
 
 		set_bit(IPS_SEQ_ADJUST_BIT, &ct->status);
 	}
@@ -1650,12 +1655,16 @@ ctnetlink_change_seq_adj(struct nf_conn *ct,
 		ret = change_seq_adj(&seqadj->seq[IP_CT_DIR_REPLY],
 				     cda[CTA_SEQ_ADJ_REPLY]);
 		if (ret < 0)
-			return ret;
+			goto err;
 
 		set_bit(IPS_SEQ_ADJUST_BIT, &ct->status);
 	}
 
+	spin_unlock_bh(&ct->lock);
 	return 0;
+err:
+	spin_unlock_bh(&ct->lock);
+	return ret;
 }
 
 static int
-- 
2.1.4

^ permalink raw reply related

* [PATCH 04/16] netfilter: nft_set_bitmap: free dummy elements when destroy the set
From: Pablo Neira Ayuso @ 2017-05-03  9:31 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <1493803931-2837-1-git-send-email-pablo@netfilter.org>

From: Liping Zhang <zlpnobody@gmail.com>

We forget to free dummy elements when deleting the set. So when I was
running nft-test.py, I saw many kmemleak warnings:
  kmemleak: 1344 new suspected memory leaks ...
  # cat /sys/kernel/debug/kmemleak
  unreferenced object 0xffff8800631345c8 (size 32):
  comm "nft", pid 9075, jiffies 4295743309 (age 1354.815s)
  hex dump (first 32 bytes):
    f8 63 13 63 00 88 ff ff 88 79 13 63 00 88 ff ff  .c.c.....y.c....
    04 0c 00 00 00 00 00 00 00 00 00 00 08 03 00 00  ................
  backtrace:
    [<ffffffff819059da>] kmemleak_alloc+0x4a/0xa0
    [<ffffffff81288174>] __kmalloc+0x164/0x310
    [<ffffffffa061269d>] nft_set_elem_init+0x3d/0x1b0 [nf_tables]
    [<ffffffffa06130da>] nft_add_set_elem+0x45a/0x8c0 [nf_tables]
    [<ffffffffa0613645>] nf_tables_newsetelem+0x105/0x1d0 [nf_tables]
    [<ffffffffa05fe6d4>] nfnetlink_rcv+0x414/0x770 [nfnetlink]
    [<ffffffff817f0ca6>] netlink_unicast+0x1f6/0x310
    [<ffffffff817f10c6>] netlink_sendmsg+0x306/0x3b0
  ...

Fixes: e920dde516088 ("netfilter: nft_set_bitmap: keep a list of dummy elements")
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nft_set_bitmap.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/net/netfilter/nft_set_bitmap.c b/net/netfilter/nft_set_bitmap.c
index 8ebbc2940f4c..b988162b5b15 100644
--- a/net/netfilter/nft_set_bitmap.c
+++ b/net/netfilter/nft_set_bitmap.c
@@ -257,6 +257,11 @@ static int nft_bitmap_init(const struct nft_set *set,
 
 static void nft_bitmap_destroy(const struct nft_set *set)
 {
+	struct nft_bitmap *priv = nft_set_priv(set);
+	struct nft_bitmap_elem *be, *n;
+
+	list_for_each_entry_safe(be, n, &priv->list, head)
+		nft_set_elem_destroy(set, be, true);
 }
 
 static bool nft_bitmap_estimate(const struct nft_set_desc *desc, u32 features,
-- 
2.1.4

^ permalink raw reply related

* [PATCH 01/16] netfilter: xt_CT: fix refcnt leak on error path
From: Pablo Neira Ayuso @ 2017-05-03  9:31 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <1493803931-2837-1-git-send-email-pablo@netfilter.org>

From: Gao Feng <fgao@ikuai8.com>

There are two cases which causes refcnt leak.

1. When nf_ct_timeout_ext_add failed in xt_ct_set_timeout, it should
free the timeout refcnt.
Now goto the err_put_timeout error handler instead of going ahead.

2. When the time policy is not found, we should call module_put.
Otherwise, the related cthelper module cannot be removed anymore.
It is easy to reproduce by typing the following command:
  # iptables -t raw -A OUTPUT -p tcp -j CT --helper ftp --timeout xxx

Signed-off-by: Gao Feng <fgao@ikuai8.com>
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/xt_CT.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/net/netfilter/xt_CT.c b/net/netfilter/xt_CT.c
index b008db0184b8..81fdcdca7457 100644
--- a/net/netfilter/xt_CT.c
+++ b/net/netfilter/xt_CT.c
@@ -167,8 +167,10 @@ xt_ct_set_timeout(struct nf_conn *ct, const struct xt_tgchk_param *par,
 		goto err_put_timeout;
 	}
 	timeout_ext = nf_ct_timeout_ext_add(ct, timeout, GFP_ATOMIC);
-	if (timeout_ext == NULL)
+	if (!timeout_ext) {
 		ret = -ENOMEM;
+		goto err_put_timeout;
+	}
 
 	rcu_read_unlock();
 	return ret;
@@ -200,6 +202,7 @@ static int xt_ct_tg_check(const struct xt_tgchk_param *par,
 			  struct xt_ct_target_info_v1 *info)
 {
 	struct nf_conntrack_zone zone;
+	struct nf_conn_help *help;
 	struct nf_conn *ct;
 	int ret = -EOPNOTSUPP;
 
@@ -248,7 +251,7 @@ static int xt_ct_tg_check(const struct xt_tgchk_param *par,
 	if (info->timeout[0]) {
 		ret = xt_ct_set_timeout(ct, par, info->timeout);
 		if (ret < 0)
-			goto err3;
+			goto err4;
 	}
 	__set_bit(IPS_CONFIRMED_BIT, &ct->status);
 	nf_conntrack_get(&ct->ct_general);
@@ -256,6 +259,10 @@ static int xt_ct_tg_check(const struct xt_tgchk_param *par,
 	info->ct = ct;
 	return 0;
 
+err4:
+	help = nfct_help(ct);
+	if (help)
+		module_put(help->helper->me);
 err3:
 	nf_ct_tmpl_free(ct);
 err2:
-- 
2.1.4

^ permalink raw reply related

* [PATCH 16/16] netfilter: nf_tables: check if same extensions are set when adding elements
From: Pablo Neira Ayuso @ 2017-05-03  9:32 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <1493803931-2837-1-git-send-email-pablo@netfilter.org>

If no NLM_F_EXCL is set and the element already exists in the set, make
sure that both elements have the same extensions.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nf_tables_api.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 434c739dfeca..11a96e8dd3cd 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -3749,6 +3749,11 @@ static int nft_add_set_elem(struct nft_ctx *ctx, struct nft_set *set,
 	err = set->ops->insert(ctx->net, set, &elem, &ext2);
 	if (err) {
 		if (err == -EEXIST) {
+			if (nft_set_ext_exists(ext, NFT_SET_EXT_DATA) ^
+			    nft_set_ext_exists(ext2, NFT_SET_EXT_DATA) ||
+			    nft_set_ext_exists(ext, NFT_SET_EXT_OBJREF) ^
+			    nft_set_ext_exists(ext2, NFT_SET_EXT_OBJREF))
+				return -EBUSY;
 			if ((nft_set_ext_exists(ext, NFT_SET_EXT_DATA) &&
 			     nft_set_ext_exists(ext2, NFT_SET_EXT_DATA) &&
 			     memcmp(nft_set_ext_data(ext),
-- 
2.1.4


^ permalink raw reply related

* [PATCH 15/16] netfilter: update MAINTAINERS file
From: Pablo Neira Ayuso @ 2017-05-03  9:32 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <1493803931-2837-1-git-send-email-pablo@netfilter.org>

Several updates on the MAINTAINERS section for Netfilter:

1) Add Florian Westphal, he's been part of the coreteam since October 2012.
   He's been dedicating tireless efforts to improve the Netfilter codebase,
   fix bugs and push ongoing new developments ever since.

2) Add http://www.nftables.org/ URL, currently pointing to
   http://www.netfilter.org.

3) Update project status from Supported to Maintained.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Florian Westphal <fw@strlen.de>
---
 MAINTAINERS | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 38d3e4ed7208..fc95cb06fb29 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8701,14 +8701,16 @@ F:	drivers/net/ethernet/neterion/
 NETFILTER
 M:	Pablo Neira Ayuso <pablo@netfilter.org>
 M:	Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
+M:	Florian Westphal <fw@strlen.de>
 L:	netfilter-devel@vger.kernel.org
 L:	coreteam@netfilter.org
 W:	http://www.netfilter.org/
 W:	http://www.iptables.org/
+W:	http://www.nftables.org/
 Q:	http://patchwork.ozlabs.org/project/netfilter-devel/list/
 T:	git git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git
 T:	git git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next.git
-S:	Supported
+S:	Maintained
 F:	include/linux/netfilter*
 F:	include/linux/netfilter/
 F:	include/net/netfilter/
-- 
2.1.4


^ permalink raw reply related

* [PATCH 14/16] netfilter: x_tables: unlock on error in xt_find_table_lock()
From: Pablo Neira Ayuso @ 2017-05-03  9:32 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <1493803931-2837-1-git-send-email-pablo@netfilter.org>

From: Dan Carpenter <dan.carpenter@oracle.com>

According to my static checker we should unlock here before the return.
That seems reasonable to me as well.

Fixes" b9e69e127397 ("netfilter: xtables: don't hook tables by default")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/x_tables.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
index 14857afc9937..f134d384852f 100644
--- a/net/netfilter/x_tables.c
+++ b/net/netfilter/x_tables.c
@@ -1051,8 +1051,10 @@ struct xt_table *xt_find_table_lock(struct net *net, u_int8_t af,
 	list_for_each_entry(t, &init_net.xt.tables[af], list) {
 		if (strcmp(t->name, name))
 			continue;
-		if (!try_module_get(t->me))
+		if (!try_module_get(t->me)) {
+			mutex_unlock(&xt[af].mutex);
 			return NULL;
+		}
 
 		mutex_unlock(&xt[af].mutex);
 		if (t->table_init(net) != 0) {
-- 
2.1.4


^ permalink raw reply related

* [PATCH 13/16] ipvs: explicitly forbid ipv6 service/dest creation if ipv6 mod is disabled
From: Pablo Neira Ayuso @ 2017-05-03  9:32 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <1493803931-2837-1-git-send-email-pablo@netfilter.org>

From: Paolo Abeni <pabeni@redhat.com>

When creating a new ipvs service, ipv6 addresses are always accepted
if CONFIG_IP_VS_IPV6 is enabled. On dest creation the address family
is not explicitly checked.

This allows the user-space to configure ipvs services even if the
system is booted with ipv6.disable=1. On specific configuration, ipvs
can try to call ipv6 routing code at setup time, causing the kernel to
oops due to fib6_rules_ops being NULL.

This change addresses the issue adding a check for the ipv6
module being enabled while validating ipv6 service operations and
adding the same validation for dest operations.

According to git history, this issue is apparently present since
the introduction of ipv6 support, and the oops can be triggered
since commit 09571c7ae30865ad ("IPVS: Add function to determine
if IPv6 address is local")

Fixes: 09571c7ae30865ad ("IPVS: Add function to determine if IPv6 address is local")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
---
 net/netfilter/ipvs/ip_vs_ctl.c | 22 +++++++++++++++++-----
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
index 5aeb0dde6ccc..4d753beaac32 100644
--- a/net/netfilter/ipvs/ip_vs_ctl.c
+++ b/net/netfilter/ipvs/ip_vs_ctl.c
@@ -3078,6 +3078,17 @@ static int ip_vs_genl_dump_services(struct sk_buff *skb,
 	return skb->len;
 }
 
+static bool ip_vs_is_af_valid(int af)
+{
+	if (af == AF_INET)
+		return true;
+#ifdef CONFIG_IP_VS_IPV6
+	if (af == AF_INET6 && ipv6_mod_enabled())
+		return true;
+#endif
+	return false;
+}
+
 static int ip_vs_genl_parse_service(struct netns_ipvs *ipvs,
 				    struct ip_vs_service_user_kern *usvc,
 				    struct nlattr *nla, int full_entry,
@@ -3104,11 +3115,7 @@ static int ip_vs_genl_parse_service(struct netns_ipvs *ipvs,
 	memset(usvc, 0, sizeof(*usvc));
 
 	usvc->af = nla_get_u16(nla_af);
-#ifdef CONFIG_IP_VS_IPV6
-	if (usvc->af != AF_INET && usvc->af != AF_INET6)
-#else
-	if (usvc->af != AF_INET)
-#endif
+	if (!ip_vs_is_af_valid(usvc->af))
 		return -EAFNOSUPPORT;
 
 	if (nla_fwmark) {
@@ -3610,6 +3617,11 @@ static int ip_vs_genl_set_cmd(struct sk_buff *skb, struct genl_info *info)
 		if (udest.af == 0)
 			udest.af = svc->af;
 
+		if (!ip_vs_is_af_valid(udest.af)) {
+			ret = -EAFNOSUPPORT;
+			goto out;
+		}
+
 		if (udest.af != svc->af && cmd != IPVS_CMD_DEL_DEST) {
 			/* The synchronization protocol is incompatible
 			 * with mixed family services
-- 
2.1.4


^ permalink raw reply related

* [PATCH 12/16] netfilter: Wrong icmp6 checksum for ICMPV6_TIME_EXCEED in reverse SNATv6 path
From: Pablo Neira Ayuso @ 2017-05-03  9:32 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <1493803931-2837-1-git-send-email-pablo@netfilter.org>

From: Dave Johnson <dave-kernel@centerclick.org>

When recalculating the outer ICMPv6 checksum for a reverse path NATv6
such as ICMPV6_TIME_EXCEED nf_nat_icmpv6_reply_translation() was
accessing data beyond the headlen of the skb for non-linear skb.  This
resulted in incorrect ICMPv6 checksum as garbage data was used.

Patch replaces csum_partial() with skb_checksum() which supports
non-linear skbs similar to nf_nat_icmp_reply_translation() from ipv4.

Signed-off-by: Dave Johnson <dave-kernel@centerclick.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/ipv6/netfilter/nf_nat_l3proto_ipv6.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv6/netfilter/nf_nat_l3proto_ipv6.c b/net/ipv6/netfilter/nf_nat_l3proto_ipv6.c
index e0be97e636a4..69937b637ee5 100644
--- a/net/ipv6/netfilter/nf_nat_l3proto_ipv6.c
+++ b/net/ipv6/netfilter/nf_nat_l3proto_ipv6.c
@@ -235,7 +235,7 @@ int nf_nat_icmpv6_reply_translation(struct sk_buff *skb,
 		inside->icmp6.icmp6_cksum =
 			csum_ipv6_magic(&ipv6h->saddr, &ipv6h->daddr,
 					skb->len - hdrlen, IPPROTO_ICMPV6,
-					csum_partial(&inside->icmp6,
+					skb_checksum(skb, hdrlen,
 						     skb->len - hdrlen, 0));
 	}
 
-- 
2.1.4


^ permalink raw reply related

* [PATCH 11/16] netfilter: nft_dynset: continue to next expr if _OP_ADD succeeded
From: Pablo Neira Ayuso @ 2017-05-03  9:32 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <1493803931-2837-1-git-send-email-pablo@netfilter.org>

From: Liping Zhang <zlpnobody@gmail.com>

Currently, after adding the following nft rules:
  # nft add set x target1 { type ipv4_addr \; flags timeout \;}
  # nft add rule x y set add ip daddr timeout 1d @target1 counter

the counters will always be zero despite of the elements are added
to the dynamic set "target1" or not, as we will break the nft expr
traversal unconditionally:
  # nft list ruleset
  ...
  set target1 {
      ...
      elements = { 8.8.8.8 expires 23h59m53s}
  }
  chain output {
      ...
      set add ip daddr timeout 1d @target1 counter packets 0 bytes 0
                                                           ^       ^
      ...
  }

Since we add the elements to the set successfully, we should continue
to the next expression.

Additionally, if elements are added to "flow table" successfully, we
will _always_ continue to the next expr, even if the operation is
_OP_ADD. So it's better to keep them to be consistent.

Fixes: 22fe54d5fefc ("netfilter: nf_tables: add support for dynamic set updates")
Reported-by: Robert White <rwhite@pobox.com>
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nft_dynset.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/net/netfilter/nft_dynset.c b/net/netfilter/nft_dynset.c
index 049ad2d9ee66..fafbeea3ed04 100644
--- a/net/netfilter/nft_dynset.c
+++ b/net/netfilter/nft_dynset.c
@@ -82,8 +82,7 @@ static void nft_dynset_eval(const struct nft_expr *expr,
 		    nft_set_ext_exists(ext, NFT_SET_EXT_EXPIRATION)) {
 			timeout = priv->timeout ? : set->timeout;
 			*nft_set_ext_expiration(ext) = jiffies + timeout;
-		} else if (sexpr == NULL)
-			goto out;
+		}
 
 		if (sexpr != NULL)
 			sexpr->ops->eval(sexpr, regs, pkt);
@@ -92,7 +91,7 @@ static void nft_dynset_eval(const struct nft_expr *expr,
 			regs->verdict.code = NFT_BREAK;
 		return;
 	}
-out:
+
 	if (!priv->invert)
 		regs->verdict.code = NFT_BREAK;
 }
-- 
2.1.4


^ permalink raw reply related

* [PATCH 10/16] bridge: ebtables: fix reception of frames DNAT-ed to bridge device/port
From: Pablo Neira Ayuso @ 2017-05-03  9:32 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <1493803931-2837-1-git-send-email-pablo@netfilter.org>

From: Linus Lüssing <linus.luessing@c0d3.blue>

When trying to redirect bridged frames to the bridge device itself or
a bridge port (brouting) via the dnat target then this currently fails:

The ethernet destination of the frame is dnat'ed to the MAC address of
the bridge device or port just fine. However, the IP code drops it in
the beginning of ip_input.c/ip_rcv() as the dnat target left
the skb->pkt_type as PACKET_OTHERHOST.

Fixing this by resetting skb->pkt_type to an appropriate type after
dnat'ing.

Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/bridge/netfilter/ebt_dnat.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/net/bridge/netfilter/ebt_dnat.c b/net/bridge/netfilter/ebt_dnat.c
index 4e0b0c359325..e0bb624c3845 100644
--- a/net/bridge/netfilter/ebt_dnat.c
+++ b/net/bridge/netfilter/ebt_dnat.c
@@ -9,6 +9,7 @@
  */
 #include <linux/module.h>
 #include <net/sock.h>
+#include "../br_private.h"
 #include <linux/netfilter.h>
 #include <linux/netfilter/x_tables.h>
 #include <linux/netfilter_bridge/ebtables.h>
@@ -18,11 +19,30 @@ static unsigned int
 ebt_dnat_tg(struct sk_buff *skb, const struct xt_action_param *par)
 {
 	const struct ebt_nat_info *info = par->targinfo;
+	struct net_device *dev;
 
 	if (!skb_make_writable(skb, 0))
 		return EBT_DROP;
 
 	ether_addr_copy(eth_hdr(skb)->h_dest, info->mac);
+
+	if (is_multicast_ether_addr(info->mac)) {
+		if (is_broadcast_ether_addr(info->mac))
+			skb->pkt_type = PACKET_BROADCAST;
+		else
+			skb->pkt_type = PACKET_MULTICAST;
+	} else {
+		if (xt_hooknum(par) != NF_BR_BROUTING)
+			dev = br_port_get_rcu(xt_in(par))->br->dev;
+		else
+			dev = xt_in(par);
+
+		if (ether_addr_equal(info->mac, dev->dev_addr))
+			skb->pkt_type = PACKET_HOST;
+		else
+			skb->pkt_type = PACKET_OTHERHOST;
+	}
+
 	return info->target;
 }
 
-- 
2.1.4


^ permalink raw reply related

* [PATCH 07/16] netfilter: ctnetlink: make it safer when updating ct->status
From: Pablo Neira Ayuso @ 2017-05-03  9:32 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <1493803931-2837-1-git-send-email-pablo@netfilter.org>

From: Liping Zhang <zlpnobody@gmail.com>

After converting to use rcu for conntrack hash, one CPU may update
the ct->status via ctnetlink, while another CPU may process the
packets and update the ct->status.

So the non-atomic operation "ct->status |= status;" via ctnetlink
becomes unsafe, and this may clear the IPS_DYING_BIT bit set by
another CPU unexpectedly. For example:
         CPU0                            CPU1
  ctnetlink_change_status        __nf_conntrack_find_get
      old = ct->status              nf_ct_gc_expired
          -                         nf_ct_kill
          -                      test_and_set_bit(IPS_DYING_BIT
      new = old | status;                 -
  ct->status = new; <-- oops, _DYING_ is cleared!

Now using a series of atomic bit operation to solve the above issue.

Also note, user shouldn't set IPS_TEMPLATE, IPS_SEQ_ADJUST directly,
so make these two bits be unchangable too.

If we set the IPS_TEMPLATE_BIT, ct will be freed by nf_ct_tmpl_free,
but actually it is alloced by nf_conntrack_alloc.
If we set the IPS_SEQ_ADJUST_BIT, this may cause the NULL pointer
deference, as the nfct_seqadj(ct) maybe NULL.

Last, add some comments to describe the logic change due to the
commit a963d710f367 ("netfilter: ctnetlink: Fix regression in CTA_STATUS
processing"), which makes me feel a little confusing.

Fixes: 76507f69c44e ("[NETFILTER]: nf_conntrack: use RCU for conntrack hash")
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 include/uapi/linux/netfilter/nf_conntrack_common.h | 13 ++++++---
 net/netfilter/nf_conntrack_netlink.c               | 33 ++++++++++++++++------
 2 files changed, 33 insertions(+), 13 deletions(-)

diff --git a/include/uapi/linux/netfilter/nf_conntrack_common.h b/include/uapi/linux/netfilter/nf_conntrack_common.h
index 6a8e33dd4ecb..38fc383139f0 100644
--- a/include/uapi/linux/netfilter/nf_conntrack_common.h
+++ b/include/uapi/linux/netfilter/nf_conntrack_common.h
@@ -82,10 +82,6 @@ enum ip_conntrack_status {
 	IPS_DYING_BIT = 9,
 	IPS_DYING = (1 << IPS_DYING_BIT),
 
-	/* Bits that cannot be altered from userland. */
-	IPS_UNCHANGEABLE_MASK = (IPS_NAT_DONE_MASK | IPS_NAT_MASK |
-				 IPS_EXPECTED | IPS_CONFIRMED | IPS_DYING),
-
 	/* Connection has fixed timeout. */
 	IPS_FIXED_TIMEOUT_BIT = 10,
 	IPS_FIXED_TIMEOUT = (1 << IPS_FIXED_TIMEOUT_BIT),
@@ -101,6 +97,15 @@ enum ip_conntrack_status {
 	/* Conntrack got a helper explicitly attached via CT target. */
 	IPS_HELPER_BIT = 13,
 	IPS_HELPER = (1 << IPS_HELPER_BIT),
+
+	/* Be careful here, modifying these bits can make things messy,
+	 * so don't let users modify them directly.
+	 */
+	IPS_UNCHANGEABLE_MASK = (IPS_NAT_DONE_MASK | IPS_NAT_MASK |
+				 IPS_EXPECTED | IPS_CONFIRMED | IPS_DYING |
+				 IPS_SEQ_ADJUST | IPS_TEMPLATE),
+
+	__IPS_MAX_BIT = 14,
 };
 
 /* Connection tracking event types */
diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
index e5f97777b1f4..86deed6a8db4 100644
--- a/net/netfilter/nf_conntrack_netlink.c
+++ b/net/netfilter/nf_conntrack_netlink.c
@@ -1419,6 +1419,24 @@ ctnetlink_parse_nat_setup(struct nf_conn *ct,
 }
 #endif
 
+static void
+__ctnetlink_change_status(struct nf_conn *ct, unsigned long on,
+			  unsigned long off)
+{
+	unsigned int bit;
+
+	/* Ignore these unchangable bits */
+	on &= ~IPS_UNCHANGEABLE_MASK;
+	off &= ~IPS_UNCHANGEABLE_MASK;
+
+	for (bit = 0; bit < __IPS_MAX_BIT; bit++) {
+		if (on & (1 << bit))
+			set_bit(bit, &ct->status);
+		else if (off & (1 << bit))
+			clear_bit(bit, &ct->status);
+	}
+}
+
 static int
 ctnetlink_change_status(struct nf_conn *ct, const struct nlattr * const cda[])
 {
@@ -1438,10 +1456,7 @@ ctnetlink_change_status(struct nf_conn *ct, const struct nlattr * const cda[])
 		/* ASSURED bit can only be set */
 		return -EBUSY;
 
-	/* Be careful here, modifying NAT bits can screw up things,
-	 * so don't let users modify them directly if they don't pass
-	 * nf_nat_range. */
-	ct->status |= status & ~(IPS_NAT_DONE_MASK | IPS_NAT_MASK);
+	__ctnetlink_change_status(ct, status, 0);
 	return 0;
 }
 
@@ -1628,7 +1643,7 @@ ctnetlink_change_seq_adj(struct nf_conn *ct,
 		if (ret < 0)
 			return ret;
 
-		ct->status |= IPS_SEQ_ADJUST;
+		set_bit(IPS_SEQ_ADJUST_BIT, &ct->status);
 	}
 
 	if (cda[CTA_SEQ_ADJ_REPLY]) {
@@ -1637,7 +1652,7 @@ ctnetlink_change_seq_adj(struct nf_conn *ct,
 		if (ret < 0)
 			return ret;
 
-		ct->status |= IPS_SEQ_ADJUST;
+		set_bit(IPS_SEQ_ADJUST_BIT, &ct->status);
 	}
 
 	return 0;
@@ -2289,10 +2304,10 @@ ctnetlink_update_status(struct nf_conn *ct, const struct nlattr * const cda[])
 	/* This check is less strict than ctnetlink_change_status()
 	 * because callers often flip IPS_EXPECTED bits when sending
 	 * an NFQA_CT attribute to the kernel.  So ignore the
-	 * unchangeable bits but do not error out.
+	 * unchangeable bits but do not error out. Also user programs
+	 * are allowed to clear the bits that they are allowed to change.
 	 */
-	ct->status = (status & ~IPS_UNCHANGEABLE_MASK) |
-		     (ct->status & IPS_UNCHANGEABLE_MASK);
+	__ctnetlink_change_status(ct, status, ~status);
 	return 0;
 }
 
-- 
2.1.4


^ permalink raw reply related

* [PATCH 06/16] netfilter: ctnetlink: fix deadlock due to acquire _expect_lock twice
From: Pablo Neira Ayuso @ 2017-05-03  9:32 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <1493803931-2837-1-git-send-email-pablo@netfilter.org>

From: Liping Zhang <zlpnobody@gmail.com>

Currently, ctnetlink_change_conntrack is always protected by _expect_lock,
but this will cause a deadlock when deleting the helper from a conntrack,
as the _expect_lock will be acquired again by nf_ct_remove_expectations:

         CPU0
        ----
  lock(nf_conntrack_expect_lock);
  lock(nf_conntrack_expect_lock);

  *** DEADLOCK ***
  May be due to missing lock nesting notation

  2 locks held by lt-conntrack_gr/12853:
  #0:  (&table[i].mutex){+.+.+.}, at: [<ffffffffa05e2009>]
       nfnetlink_rcv_msg+0x399/0x6a9 [nfnetlink]
  #1:  (nf_conntrack_expect_lock){+.....}, at: [<ffffffffa05f2c1f>]
       ctnetlink_new_conntrack+0x17f/0x408 [nf_conntrack_netlink]

  Call Trace:
   dump_stack+0x85/0xc2
   __lock_acquire+0x1608/0x1680
   ? ctnetlink_parse_tuple_proto+0x10f/0x1c0 [nf_conntrack_netlink]
   lock_acquire+0x100/0x1f0
   ? nf_ct_remove_expectations+0x32/0x90 [nf_conntrack]
   _raw_spin_lock_bh+0x3f/0x50
   ? nf_ct_remove_expectations+0x32/0x90 [nf_conntrack]
   nf_ct_remove_expectations+0x32/0x90 [nf_conntrack]
   ctnetlink_change_helper+0xc6/0x190 [nf_conntrack_netlink]
   ctnetlink_new_conntrack+0x1b2/0x408 [nf_conntrack_netlink]
   nfnetlink_rcv_msg+0x60a/0x6a9 [nfnetlink]
   ? nfnetlink_rcv_msg+0x1b9/0x6a9 [nfnetlink]
   ? nfnetlink_bind+0x1a0/0x1a0 [nfnetlink]
   netlink_rcv_skb+0xa4/0xc0
   nfnetlink_rcv+0x87/0x770 [nfnetlink]

Since the operations are unrelated to nf_ct_expect, so we can drop the
_expect_lock. Also note, after removing the _expect_lock protection,
another CPU may invoke nf_conntrack_helper_unregister, so we should
use rcu_read_lock to protect __nf_conntrack_helper_find invoked by
ctnetlink_change_helper.

Fixes: ca7433df3a67 ("netfilter: conntrack: seperate expect locking from nf_conntrack_lock")
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nf_conntrack_netlink.c | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
index 48c184552de0..e5f97777b1f4 100644
--- a/net/netfilter/nf_conntrack_netlink.c
+++ b/net/netfilter/nf_conntrack_netlink.c
@@ -1510,23 +1510,29 @@ static int ctnetlink_change_helper(struct nf_conn *ct,
 		return 0;
 	}
 
+	rcu_read_lock();
 	helper = __nf_conntrack_helper_find(helpname, nf_ct_l3num(ct),
 					    nf_ct_protonum(ct));
-	if (helper == NULL)
+	if (helper == NULL) {
+		rcu_read_unlock();
 		return -EOPNOTSUPP;
+	}
 
 	if (help) {
 		if (help->helper == helper) {
 			/* update private helper data if allowed. */
 			if (helper->from_nlattr)
 				helper->from_nlattr(helpinfo, ct);
-			return 0;
+			err = 0;
 		} else
-			return -EBUSY;
+			err = -EBUSY;
+	} else {
+		/* we cannot set a helper for an existing conntrack */
+		err = -EOPNOTSUPP;
 	}
 
-	/* we cannot set a helper for an existing conntrack */
-	return -EOPNOTSUPP;
+	rcu_read_unlock();
+	return err;
 }
 
 static int ctnetlink_change_timeout(struct nf_conn *ct,
@@ -1945,9 +1951,7 @@ static int ctnetlink_new_conntrack(struct net *net, struct sock *ctnl,
 	err = -EEXIST;
 	ct = nf_ct_tuplehash_to_ctrack(h);
 	if (!(nlh->nlmsg_flags & NLM_F_EXCL)) {
-		spin_lock_bh(&nf_conntrack_expect_lock);
 		err = ctnetlink_change_conntrack(ct, cda);
-		spin_unlock_bh(&nf_conntrack_expect_lock);
 		if (err == 0) {
 			nf_conntrack_eventmask_report((1 << IPCT_REPLY) |
 						      (1 << IPCT_ASSURED) |
@@ -2342,11 +2346,7 @@ ctnetlink_glue_parse(const struct nlattr *attr, struct nf_conn *ct)
 	if (ret < 0)
 		return ret;
 
-	spin_lock_bh(&nf_conntrack_expect_lock);
-	ret = ctnetlink_glue_parse_ct((const struct nlattr **)cda, ct);
-	spin_unlock_bh(&nf_conntrack_expect_lock);
-
-	return ret;
+	return ctnetlink_glue_parse_ct((const struct nlattr **)cda, ct);
 }
 
 static int ctnetlink_glue_exp_parse(const struct nlattr * const *cda,
-- 
2.1.4


^ permalink raw reply related

* [PATCH 05/16] netfilter: ctnetlink: drop the incorrect cthelper module request
From: Pablo Neira Ayuso @ 2017-05-03  9:32 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <1493803931-2837-1-git-send-email-pablo@netfilter.org>

From: Liping Zhang <zlpnobody@gmail.com>

First, when creating a new ct, we will invoke request_module to try to
load the related inkernel cthelper. So there's no need to call the
request_module again when updating the ct helpinfo.

Second, ctnetlink_change_helper may be called with rcu_read_lock held,
i.e. rcu_read_lock -> nfqnl_recv_verdict -> nfqnl_ct_parse ->
ctnetlink_glue_parse -> ctnetlink_glue_parse_ct ->
ctnetlink_change_helper. But the request_module invocation may sleep,
so we can't call it with the rcu_read_lock held.

Remove it now.

Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nf_conntrack_netlink.c | 17 +----------------
 1 file changed, 1 insertion(+), 16 deletions(-)

diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
index dc7dfd68fafe..48c184552de0 100644
--- a/net/netfilter/nf_conntrack_netlink.c
+++ b/net/netfilter/nf_conntrack_netlink.c
@@ -1512,23 +1512,8 @@ static int ctnetlink_change_helper(struct nf_conn *ct,
 
 	helper = __nf_conntrack_helper_find(helpname, nf_ct_l3num(ct),
 					    nf_ct_protonum(ct));
-	if (helper == NULL) {
-#ifdef CONFIG_MODULES
-		spin_unlock_bh(&nf_conntrack_expect_lock);
-
-		if (request_module("nfct-helper-%s", helpname) < 0) {
-			spin_lock_bh(&nf_conntrack_expect_lock);
-			return -EOPNOTSUPP;
-		}
-
-		spin_lock_bh(&nf_conntrack_expect_lock);
-		helper = __nf_conntrack_helper_find(helpname, nf_ct_l3num(ct),
-						    nf_ct_protonum(ct));
-		if (helper)
-			return -EAGAIN;
-#endif
+	if (helper == NULL)
 		return -EOPNOTSUPP;
-	}
 
 	if (help) {
 		if (help->helper == helper) {
-- 
2.1.4


^ permalink raw reply related

* [PATCH 03/16] netfilter: nf_ct_helper: permit cthelpers with different names via nfnetlink
From: Pablo Neira Ayuso @ 2017-05-03  9:31 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <1493803931-2837-1-git-send-email-pablo@netfilter.org>

From: Liping Zhang <zlpnobody@gmail.com>

cthelpers added via nfnetlink may have the same tuple, i.e. except for
the l3proto and l4proto, other fields are all zero. So even with the
different names, we will also fail to add them:
  # nfct helper add ssdp inet udp
  # nfct helper add tftp inet udp
  nfct v1.4.3: netlink error: File exists

So in order to avoid unpredictable behaviour, we should:
1. cthelpers can be selected by nft ct helper obj or xt_CT target, so
report error if duplicated { name, l3proto, l4proto } tuple exist.
2. cthelpers can be selected by nf_ct_tuple_src_mask_cmp when
nf_ct_auto_assign_helper is enabled, so also report error if duplicated
{ l3proto, l4proto, src-port } tuple exist.

Also note, if the cthelper is added from userspace, then the src-port will
always be zero, it's invalid for nf_ct_auto_assign_helper, so there's no
need to check the second point listed above.

Fixes: 893e093c786c ("netfilter: nf_ct_helper: bail out on duplicated helpers")
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nf_conntrack_helper.c | 26 +++++++++++++++++++++-----
 1 file changed, 21 insertions(+), 5 deletions(-)

diff --git a/net/netfilter/nf_conntrack_helper.c b/net/netfilter/nf_conntrack_helper.c
index 4eeb3418366a..99bcd44aac70 100644
--- a/net/netfilter/nf_conntrack_helper.c
+++ b/net/netfilter/nf_conntrack_helper.c
@@ -386,17 +386,33 @@ int nf_conntrack_helper_register(struct nf_conntrack_helper *me)
 	struct nf_conntrack_tuple_mask mask = { .src.u.all = htons(0xFFFF) };
 	unsigned int h = helper_hash(&me->tuple);
 	struct nf_conntrack_helper *cur;
-	int ret = 0;
+	int ret = 0, i;
 
 	BUG_ON(me->expect_policy == NULL);
 	BUG_ON(me->expect_class_max >= NF_CT_MAX_EXPECT_CLASSES);
 	BUG_ON(strlen(me->name) > NF_CT_HELPER_NAME_LEN - 1);
 
 	mutex_lock(&nf_ct_helper_mutex);
-	hlist_for_each_entry(cur, &nf_ct_helper_hash[h], hnode) {
-		if (nf_ct_tuple_src_mask_cmp(&cur->tuple, &me->tuple, &mask)) {
-			ret = -EEXIST;
-			goto out;
+	for (i = 0; i < nf_ct_helper_hsize; i++) {
+		hlist_for_each_entry(cur, &nf_ct_helper_hash[i], hnode) {
+			if (!strcmp(cur->name, me->name) &&
+			    (cur->tuple.src.l3num == NFPROTO_UNSPEC ||
+			     cur->tuple.src.l3num == me->tuple.src.l3num) &&
+			    cur->tuple.dst.protonum == me->tuple.dst.protonum) {
+				ret = -EEXIST;
+				goto out;
+			}
+		}
+	}
+
+	/* avoid unpredictable behaviour for auto_assign_helper */
+	if (!(me->flags & NF_CT_HELPER_F_USERSPACE)) {
+		hlist_for_each_entry(cur, &nf_ct_helper_hash[h], hnode) {
+			if (nf_ct_tuple_src_mask_cmp(&cur->tuple, &me->tuple,
+						     &mask)) {
+				ret = -EEXIST;
+				goto out;
+			}
 		}
 	}
 	hlist_add_head_rcu(&me->hnode, &nf_ct_helper_hash[h]);
-- 
2.1.4


^ permalink raw reply related

* [PATCH 02/16] openvswitch: Delete conntrack entry clashing with an expectation.
From: Pablo Neira Ayuso @ 2017-05-03  9:31 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <1493803931-2837-1-git-send-email-pablo@netfilter.org>

From: Jarno Rajahalme <jarno@ovn.org>

Conntrack helpers do not check for a potentially clashing conntrack
entry when creating a new expectation.  Also, nf_conntrack_in() will
check expectations (via init_conntrack()) only if a conntrack entry
can not be found.  The expectation for a packet which also matches an
existing conntrack entry will not be removed by conntrack, and is
currently handled inconsistently by OVS, as OVS expects the
expectation to be removed when the connection tracking entry matching
that expectation is confirmed.

It should be noted that normally an IP stack would not allow reuse of
a 5-tuple of an old (possibly lingering) connection for a new data
connection, so this is somewhat unlikely corner case.  However, it is
possible that a misbehaving source could cause conntrack entries be
created that could then interfere with new related connections.

Fix this in the OVS module by deleting the clashing conntrack entry
after an expectation has been matched.  This causes the following
nf_conntrack_in() call also find the expectation and remove it when
creating the new conntrack entry, as well as the forthcoming reply
direction packets to match the new related connection instead of the
old clashing conntrack entry.

Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action")
Reported-by: Yang Song <yangsong@vmware.com>
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/openvswitch/conntrack.c | 30 +++++++++++++++++++++++++++++-
 1 file changed, 29 insertions(+), 1 deletion(-)

diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index 7b2c2fce408a..c92a5795dcda 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -514,10 +514,38 @@ ovs_ct_expect_find(struct net *net, const struct nf_conntrack_zone *zone,
 		   u16 proto, const struct sk_buff *skb)
 {
 	struct nf_conntrack_tuple tuple;
+	struct nf_conntrack_expect *exp;
 
 	if (!nf_ct_get_tuplepr(skb, skb_network_offset(skb), proto, net, &tuple))
 		return NULL;
-	return __nf_ct_expect_find(net, zone, &tuple);
+
+	exp = __nf_ct_expect_find(net, zone, &tuple);
+	if (exp) {
+		struct nf_conntrack_tuple_hash *h;
+
+		/* Delete existing conntrack entry, if it clashes with the
+		 * expectation.  This can happen since conntrack ALGs do not
+		 * check for clashes between (new) expectations and existing
+		 * conntrack entries.  nf_conntrack_in() will check the
+		 * expectations only if a conntrack entry can not be found,
+		 * which can lead to OVS finding the expectation (here) in the
+		 * init direction, but which will not be removed by the
+		 * nf_conntrack_in() call, if a matching conntrack entry is
+		 * found instead.  In this case all init direction packets
+		 * would be reported as new related packets, while reply
+		 * direction packets would be reported as un-related
+		 * established packets.
+		 */
+		h = nf_conntrack_find_get(net, zone, &tuple);
+		if (h) {
+			struct nf_conn *ct = nf_ct_tuplehash_to_ctrack(h);
+
+			nf_ct_delete(ct, 0, 0);
+			nf_conntrack_put(&ct->ct_general);
+		}
+	}
+
+	return exp;
 }
 
 /* This replicates logic from nf_conntrack_core.c that is not exported. */
-- 
2.1.4


^ permalink raw reply related

* [PATCH 00/16] Netfilter/IPVS/OVS fixes for net
From: Pablo Neira Ayuso @ 2017-05-03  9:31 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev

Hi David,

The following patchset contains a rather large batch of Netfilter, IPVS
and OVS fixes for your net tree. This includes fixes for ctnetlink, the
userspace conntrack helper infrastructure, conntrack OVS support,
ebtables DNAT target, several leaks in error path among other. More
specifically, they are:

1) Fix reference count leak in the CT target error path, from Gao Feng.

2) Remove conntrack entry clashing with a matching expectation, patch
   from Jarno Rajahalme.

3) Fix bogus EEXIST when registering two different userspace helpers,
   from Liping Zhang.

4) Don't leak dummy elements in the new bitmap set type in nf_tables,
   from Liping Zhang.

5) Get rid of module autoload from conntrack update path in ctnetlink,
   we don't need autoload at this late stage and it is happening with
   rcu read lock held which is not good. From Liping Zhang.

6) Fix deadlock due to double-acquire of the expect_lock from conntrack
   update path, this fixes a bug that was introduced when the central
   spinlock got removed. Again from Liping Zhang.

7) Safe ct->status update from ctnetlink path, from Liping. The expect_lock
   protection that was selected when the central spinlock was removed was
   not really protecting anything at all.

8) Protect sequence adjustment under ct->lock.

9) Missing socket match with IPv6, from Peter Tirsek.

10) Adjust skb->pkt_type of DNAT'ed frames from ebtables, from
    Linus Luessing.

11) Don't give up on evaluating the expression on new entries added via
    dynset expression in nf_tables, from Liping Zhang.

12) Use skb_checksum() when mangling icmpv6 in IPv6 NAT as this deals
    with non-linear skbuffs.

13) Don't allow IPv6 service in IPVS if no IPv6 support is available,
    from Paolo Abeni.

14) Missing mutex release in error path of xt_find_table_lock(), from
    Dan Carpenter.

15) Update maintainers files, Netfilter section. Add Florian to the
    file, refer to nftables.org and change project status from Supported
    to Maintained.

16) Bail out on mismatching extensions in element updates in nf_tables.

You can pull these changes from:

  git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git

Thanks!

----------------------------------------------------------------

The following changes since commit 94836ecf1e7378b64d37624fbb81fe48fbd4c772:

  Merge tag 'nfsd-4.11-2' of git://linux-nfs.org/~bfields/linux (2017-04-21 16:37:48 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git HEAD

for you to fetch changes up to 9744a6fcefcb4d56501d69adb04c24559d353cad:

  netfilter: nf_tables: check if same extensions are set when adding elements (2017-05-03 10:58:00 +0200)

----------------------------------------------------------------
Dan Carpenter (1):
      netfilter: x_tables: unlock on error in xt_find_table_lock()

Dave Johnson (1):
      netfilter: Wrong icmp6 checksum for ICMPV6_TIME_EXCEED in reverse SNATv6 path

Gao Feng (1):
      netfilter: xt_CT: fix refcnt leak on error path

Jarno Rajahalme (1):
      openvswitch: Delete conntrack entry clashing with an expectation.

Linus Lüssing (1):
      bridge: ebtables: fix reception of frames DNAT-ed to bridge device/port

Liping Zhang (7):
      netfilter: nf_ct_helper: permit cthelpers with different names via nfnetlink
      netfilter: nft_set_bitmap: free dummy elements when destroy the set
      netfilter: ctnetlink: drop the incorrect cthelper module request
      netfilter: ctnetlink: fix deadlock due to acquire _expect_lock twice
      netfilter: ctnetlink: make it safer when updating ct->status
      netfilter: ctnetlink: acquire ct->lock before operating nf_ct_seqadj
      netfilter: nft_dynset: continue to next expr if _OP_ADD succeeded

Pablo Neira Ayuso (3):
      Merge tag 'ipvs-fixes-for-v4.11' of http://git.kernel.org/.../horms/ipvs
      netfilter: update MAINTAINERS file
      netfilter: nf_tables: check if same extensions are set when adding elements

Paolo Abeni (1):
      ipvs: explicitly forbid ipv6 service/dest creation if ipv6 mod is disabled

Peter Tirsek (1):
      netfilter: xt_socket: Fix broken IPv6 handling

 MAINTAINERS                                        |  4 +-
 include/uapi/linux/netfilter/nf_conntrack_common.h | 13 +++-
 net/bridge/netfilter/ebt_dnat.c                    | 20 +++++
 net/ipv6/netfilter/nf_nat_l3proto_ipv6.c           |  2 +-
 net/netfilter/ipvs/ip_vs_ctl.c                     | 22 ++++--
 net/netfilter/nf_conntrack_helper.c                | 26 +++++--
 net/netfilter/nf_conntrack_netlink.c               | 89 ++++++++++++----------
 net/netfilter/nf_tables_api.c                      |  5 ++
 net/netfilter/nft_dynset.c                         |  5 +-
 net/netfilter/nft_set_bitmap.c                     |  5 ++
 net/netfilter/x_tables.c                           |  4 +-
 net/netfilter/xt_CT.c                              | 11 ++-
 net/netfilter/xt_socket.c                          |  2 +-
 net/openvswitch/conntrack.c                        | 30 +++++++-
 14 files changed, 174 insertions(+), 64 deletions(-)

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox