netfilter-devel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/3] src: trace infrastructure support
@ 2015-11-26 17:59 Florian Westphal
  2015-11-26 17:59 ` [PATCH v2 libnftnl 1/3] src: add " Florian Westphal
                   ` (4 more replies)
  0 siblings, 5 replies; 15+ messages in thread
From: Florian Westphal @ 2015-11-26 17:59 UTC (permalink / raw)
  To: netfilter-devel

This is V2 of the nft trace patch set.

See the individual patches for a change vs. v1 description.

Patrick: I did a quick test w. your last public patches
(plus a few minor changes to make it build again) and things
seem to work fine.

What does not work correctly is decoding of vlan header.
I think whats happening is that we cannot splice sub-byte
sized quantities from a larger chunk.

Other than that, things appear to work correctly.

Cheers,
Florian


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v2 libnftnl 1/3] src: add trace infrastructure support
  2015-11-26 17:59 [PATCH v2 0/3] src: trace infrastructure support Florian Westphal
@ 2015-11-26 17:59 ` Florian Westphal
  2015-11-26 22:11   ` Patrick McHardy
  2015-11-26 17:59 ` [PATCH v2 nf-next 2/3] netfilter: nf_tables: extend tracing infrastructure Florian Westphal
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 15+ messages in thread
From: Florian Westphal @ 2015-11-26 17:59 UTC (permalink / raw)
  To: netfilter-devel; +Cc: Florian Westphal

parses trace monitor netlink messages from the kernel and builds
nftnl_trace struct that contains the dissected information.

Provides getters to access these attributes.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 Changes since v1:
  - removed all printf support, not needed by Patricks nft patch
  - adjust for changed NFTA_TRACE* kernel attributes
  - VERDICT is now nested, so treat it accordingly
  - removed TRANSPORTPROTO, seems its not needed at this time
  - add NFTNL_TRACE_QUEUE_ID to obtain nfqueue number
  - NF_VERDICT is normalized in this case, i.e.
  if verdict from kernel is '42 << 16 | NF_QUEUE', make
  NF_QUEUE the verdict and put the queue number ->queue_id, accessible
  via NFTNL_TRACE_QUEUE_ID.

 include/libnftnl/Makefile.am        |   1 +
 include/libnftnl/trace.h            |  52 +++++
 include/linux/netfilter/nf_tables.h |  46 +++++
 src/Makefile.am                     |   1 +
 src/libnftnl.map                    |  15 ++
 src/trace.c                         | 400 ++++++++++++++++++++++++++++++++++++
 src/utils.c                         |  12 ++
 7 files changed, 527 insertions(+)
 create mode 100644 include/libnftnl/trace.h
 create mode 100644 src/trace.c

diff --git a/include/libnftnl/Makefile.am b/include/libnftnl/Makefile.am
index a20aaee..84f01b6 100644
--- a/include/libnftnl/Makefile.am
+++ b/include/libnftnl/Makefile.am
@@ -1,5 +1,6 @@
 pkginclude_HEADERS = batch.h		\
 		     table.h		\
+		     trace.h		\
 		     chain.h		\
 		     rule.h		\
 		     expr.h		\
diff --git a/include/libnftnl/trace.h b/include/libnftnl/trace.h
new file mode 100644
index 0000000..1d94013
--- /dev/null
+++ b/include/libnftnl/trace.h
@@ -0,0 +1,52 @@
+#ifndef _LIBNFTNL_TRACE_H_
+#define _LIBNFTNL_TRACE_H_
+
+#include <stdint.h>
+#include <stdbool.h>
+#include <sys/types.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+enum nftnl_trace_attr {
+	NFTNL_TRACE_CHAIN = 0,
+	NFTNL_TRACE_FAMILY,
+	NFTNL_TRACE_ID,
+	NFTNL_TRACE_IIF,
+	NFTNL_TRACE_IIFTYPE,
+	NFTNL_TRACE_JUMP_TARGET,
+	NFTNL_TRACE_OIF,
+	NFTNL_TRACE_MARK,
+	NFTNL_TRACE_LL_HEADER,
+	NFTNL_TRACE_NETWORK_HEADER,
+	NFTNL_TRACE_TRANSPORT_HEADER,
+	NFTNL_TRACE_TABLE,
+	NFTNL_TRACE_TYPE,
+	NFTNL_TRACE_RULE_HANDLE,
+	NFTNL_TRACE_VERDICT,
+	NFTNL_TRACE_QUEUE_ID,
+};
+#define NFTNL_TRACE_MAX NFTNL_TRACE_VLAN_TAG
+
+struct nftnl_trace;
+
+struct nftnl_trace *nftnl_trace_alloc(void);
+void nftnl_trace_free(struct nftnl_trace *trace);
+
+bool nftnl_trace_is_set(const struct nftnl_trace *trace, uint16_t type);
+
+const void *nftnl_trace_get_data(const struct nftnl_trace *trace,
+				 uint16_t type, uint32_t *data_len);
+
+uint16_t nftnl_trace_get_u16(const struct nftnl_trace *trace, uint16_t type);
+uint32_t nftnl_trace_get_u32(const struct nftnl_trace *trace, uint16_t type);
+uint64_t nftnl_trace_get_u64(const struct nftnl_trace *trace, uint16_t type);
+const char *nftnl_trace_get_str(const struct nftnl_trace *trace, uint16_t type);
+
+int nftnl_trace_nlmsg_parse(const struct nlmsghdr *nlh, struct nftnl_trace *t);
+#ifdef __cplusplus
+} /* extern "C" */
+#endif
+
+#endif /* _LIBNFTNL_TRACE_H_ */
diff --git a/include/linux/netfilter/nf_tables.h b/include/linux/netfilter/nf_tables.h
index 9796d82..a37fcc4 100644
--- a/include/linux/netfilter/nf_tables.h
+++ b/include/linux/netfilter/nf_tables.h
@@ -83,6 +83,7 @@ enum nft_verdicts {
  * @NFT_MSG_DELSETELEM: delete a set element (enum nft_set_elem_attributes)
  * @NFT_MSG_NEWGEN: announce a new generation, only for events (enum nft_gen_attributes)
  * @NFT_MSG_GETGEN: get the rule-set generation (enum nft_gen_attributes)
+ * @NFT_MSG_TRACE: trace event (enum nft_trace_attributes)
  */
 enum nf_tables_msg_types {
 	NFT_MSG_NEWTABLE,
@@ -102,6 +103,7 @@ enum nf_tables_msg_types {
 	NFT_MSG_DELSETELEM,
 	NFT_MSG_NEWGEN,
 	NFT_MSG_GETGEN,
+	NFT_MSG_TRACE,
 	NFT_MSG_MAX,
 };
 
@@ -970,4 +972,48 @@ enum nft_gen_attributes {
 };
 #define NFTA_GEN_MAX		(__NFTA_GEN_MAX - 1)
 
+/**
+ * enum nft_trace_attributes - nf_tables trace netlink attributes
+ *
+ * @NFTA_TRACE_TABLE: name of the table (NLA_STRING)
+ * @NFTA_TRACE_CHAIN: name of the chain (NLA_STRING)
+ * @NFTA_TRACE_RULE_HANDLE: numeric handle of the rule (NLA_U64)
+ * @NFTA_TRACE_TYPE: type of the event (NLA_U32: nft_trace_types)
+ * @NFTA_TRACE_VERDICT: verdict returned by hook (NLA_NESTED: nft_verdicts)
+ * @NFTA_TRACE_ID: pseudo-id, same for each skb traced (NLA_U32)
+ * @NFTA_TRACE_LL_HEADER: linklayer header (NLA_BINARY)
+ * @NFTA_TRACE_NETWORK_HEADER: network header (NLA_BINARY)
+ * @NFTA_TRACE_TRANSPORT_HEADER: transport header (NLA_BINARY)
+ * @NFTA_TRACE_IIFTYPE: netdev->type of indev (NLA_U16)
+ * @NFTA_TRACE_IIF: indev ifindex (NLA_U32)
+ * @NFTA_TRACE_OIF: outdev ifindex (NLA_U32)
+ * @NFTA_TRACE_MARK: nfmark (NLA_U32)
+ */
+enum nft_trace_attibutes {
+	NFTA_TRACE_UNSPEC,
+	NFTA_TRACE_TABLE,
+	NFTA_TRACE_CHAIN,
+	NFTA_TRACE_RULE_HANDLE,
+	NFTA_TRACE_TYPE,
+	NFTA_TRACE_VERDICT,
+	NFTA_TRACE_ID,
+	NFTA_TRACE_LL_HEADER,
+	NFTA_TRACE_NETWORK_HEADER,
+	NFTA_TRACE_TRANSPORT_HEADER,
+	NFTA_TRACE_IIFTYPE,
+	NFTA_TRACE_IIF,
+	NFTA_TRACE_OIF,
+	NFTA_TRACE_MARK,
+	__NFTA_TRACE_MAX
+};
+#define NFTA_TRACE_MAX (__NFTA_TRACE_MAX - 1)
+
+enum nft_trace_types {
+	NFT_TRACETYPE_UNSPEC,
+	NFT_TRACETYPE_POLICY,
+	NFT_TRACETYPE_RETURN,
+	NFT_TRACETYPE_RULE,
+	__NFT_TRACETYPE_MAX
+};
+#define NFT_TRACETYPE_MAX (__NFT_TRACETYPE_MAX - 1)
 #endif /* _LINUX_NF_TABLES_H */
diff --git a/src/Makefile.am b/src/Makefile.am
index 107cae5..795307d 100644
--- a/src/Makefile.am
+++ b/src/Makefile.am
@@ -11,6 +11,7 @@ libnftnl_la_SOURCES = utils.c		\
 		      common.c		\
 		      gen.c		\
 		      table.c		\
+		      trace.c		\
 		      chain.c		\
 		      rule.c		\
 		      set.c		\
diff --git a/src/libnftnl.map b/src/libnftnl.map
index a52b54e..2e193b7 100644
--- a/src/libnftnl.map
+++ b/src/libnftnl.map
@@ -498,3 +498,18 @@ global:
 
 local: *;
 };
+
+LIBNFTNL_4.1 {
+	nftnl_trace_alloc;
+	nftnl_trace_free;
+
+	nftnl_trace_is_set;
+
+	nftnl_trace_get_u16;
+	nftnl_trace_get_u32;
+	nftnl_trace_get_u64;
+	nftnl_trace_get_str;
+	nftnl_trace_get_data;
+
+	nftnl_trace_nlmsg_parse;
+} LIBNFTNL_4;
diff --git a/src/trace.c b/src/trace.c
new file mode 100644
index 0000000..0fb1c74
--- /dev/null
+++ b/src/trace.c
@@ -0,0 +1,400 @@
+/*
+ * (C) 2015 Red Hat GmbH
+ * Author: Florian Westphal <fw@strlen.de>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published
+ * by the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+#include "internal.h"
+
+#include <time.h>
+#include <endian.h>
+#include <stdint.h>
+#include <stdlib.h>
+#include <string.h>
+#include <errno.h>
+#include <netinet/in.h>
+
+#include <libmnl/libmnl.h>
+#include <linux/netfilter.h>
+#include <linux/netfilter/nfnetlink.h>
+#include <linux/netfilter/nf_tables.h>
+
+#include <libnftnl/trace.h>
+
+struct nftnl_header_data {
+	char *data;
+	unsigned int len;
+};
+
+struct nftnl_trace {
+	char *table;
+	char *chain;
+	char *jump_target;
+	uint64_t rule_handle;
+	struct nftnl_header_data ll;
+	struct nftnl_header_data nh;
+	struct nftnl_header_data th;
+	uint32_t family;
+	uint32_t type;
+	uint32_t id;
+	uint32_t iif;
+	uint32_t oif;
+	uint32_t mark;
+	uint32_t verdict;
+	uint16_t vlan_tag;
+	uint16_t queue_id;
+	uint16_t iiftype;
+
+	uint32_t flags;
+};
+
+EXPORT_SYMBOL(nftnl_trace_alloc);
+struct nftnl_trace *nftnl_trace_alloc(void)
+{
+	return calloc(1, sizeof(struct nftnl_trace));
+}
+
+EXPORT_SYMBOL(nftnl_trace_free);
+void nftnl_trace_free(struct nftnl_trace *t)
+{
+	xfree(t->chain);
+	xfree(t->table);
+	xfree(t->jump_target);
+	xfree(t->ll.data);
+	xfree(t->nh.data);
+	xfree(t->th.data);
+	xfree(t);
+}
+
+EXPORT_SYMBOL(nftnl_trace_is_set);
+bool nftnl_trace_is_set(const struct nftnl_trace *t, uint16_t attr)
+{
+	return t->flags & (1 << attr);
+}
+
+static int nftnl_trace_parse_attr_cb(const struct nlattr *attr, void *data)
+{
+	const struct nlattr **tb = data;
+	enum nft_trace_attibutes type = mnl_attr_get_type(attr);
+
+	if (mnl_attr_type_valid(attr, NFTA_TRACE_MAX) < 0)
+		return MNL_CB_OK;
+
+	switch (type) {
+	case NFTA_TRACE_UNSPEC:
+	case __NFTA_TRACE_MAX:
+		break;
+	case NFTA_TRACE_VERDICT:
+                if (mnl_attr_validate(attr, MNL_TYPE_NESTED) < 0)
+			abi_breakage();
+		break;
+	case NFTA_TRACE_IIFTYPE:
+		if (mnl_attr_validate(attr, MNL_TYPE_U16) < 0)
+			abi_breakage();
+		break;
+	case NFTA_TRACE_ID:
+	case NFTA_TRACE_IIF:
+	case NFTA_TRACE_MARK:
+	case NFTA_TRACE_OIF:
+	case NFTA_TRACE_TYPE:
+		if (mnl_attr_validate(attr, MNL_TYPE_U32) < 0)
+			abi_breakage();
+		break;
+	case NFTA_TRACE_CHAIN:
+	case NFTA_TRACE_TABLE:
+		if (mnl_attr_validate(attr, MNL_TYPE_STRING) < 0)
+			abi_breakage();
+		break;
+	case NFTA_TRACE_RULE_HANDLE:
+		if (mnl_attr_validate(attr, MNL_TYPE_U64) < 0)
+			abi_breakage();
+		break;
+	case NFTA_TRACE_LL_HEADER:	/* fallthrough */
+	case NFTA_TRACE_NETWORK_HEADER:
+	case NFTA_TRACE_TRANSPORT_HEADER:
+		if (mnl_attr_get_payload_len(attr) == 0)
+			abi_breakage();
+		break;
+	};
+
+	tb[type] = attr;
+	return MNL_CB_OK;
+}
+
+EXPORT_SYMBOL(nftnl_trace_get_data);
+const void *nftnl_trace_get_data(const struct nftnl_trace *trace,
+				 uint16_t type, uint32_t *data_len)
+{
+	enum nftnl_trace_attr attr = type;
+
+	if (!(trace->flags & (1 << type)))
+		return NULL;
+
+	switch (attr) {
+	case NFTNL_TRACE_FAMILY:
+		*data_len = sizeof(uint32_t);
+		return &trace->family;
+	case NFTNL_TRACE_ID:
+		*data_len = sizeof(uint32_t);
+		return &trace->id;
+	case NFTNL_TRACE_IIF:
+		*data_len = sizeof(uint32_t);
+		return &trace->iif;
+	case NFTNL_TRACE_OIF:
+		*data_len = sizeof(uint32_t);
+		return &trace->oif;
+	case NFTNL_TRACE_LL_HEADER:
+		*data_len = trace->ll.len;
+		return trace->ll.data;
+	case NFTNL_TRACE_MARK:
+		*data_len = sizeof(uint32_t);
+		return &trace->mark;
+	case NFTNL_TRACE_NETWORK_HEADER:
+		*data_len = trace->nh.len;
+		return trace->nh.data;
+	case NFTNL_TRACE_TYPE:
+		*data_len = sizeof(uint32_t);
+		return &trace->type;
+	case NFTNL_TRACE_CHAIN:
+		*data_len = strlen(trace->chain);
+		return trace->chain;
+	case NFTNL_TRACE_TABLE:
+		*data_len = strlen(trace->table);
+		return trace->table;
+	case NFTNL_TRACE_JUMP_TARGET:
+		*data_len = strlen(trace->jump_target);
+		return trace->jump_target;
+	case NFTNL_TRACE_TRANSPORT_HEADER:
+		*data_len = trace->th.len;
+		return trace->th.data;
+	case NFTNL_TRACE_RULE_HANDLE:
+		*data_len = sizeof(uint64_t);
+		return &trace->rule_handle;
+	case NFTNL_TRACE_VERDICT:
+		*data_len = sizeof(uint32_t);
+		return &trace->verdict;
+	case NFTNL_TRACE_IIFTYPE:
+		*data_len = sizeof(uint16_t);
+		return &trace->iiftype;
+	case NFTNL_TRACE_QUEUE_ID:
+		*data_len = sizeof(uint16_t);
+		return &trace->queue_id;
+	}
+
+	return NULL;
+}
+
+EXPORT_SYMBOL(nftnl_trace_get_str);
+const char *nftnl_trace_get_str(const struct nftnl_trace *trace, uint16_t type)
+{
+	if (!nftnl_trace_is_set(trace, type))
+		return NULL;
+
+	switch (type) {
+	case NFTNL_TRACE_CHAIN: return trace->chain;
+	case NFTNL_TRACE_TABLE: return trace->table;
+	default: break;
+	}
+	return NULL;
+}
+
+EXPORT_SYMBOL(nftnl_trace_get_u16);
+uint16_t nftnl_trace_get_u16(const struct nftnl_trace *trace, uint16_t type)
+{
+	const uint16_t *d;
+	uint32_t dlen;
+
+	d = nftnl_trace_get_data(trace, type, &dlen);
+	if (d && dlen == sizeof(*d))
+		return *d;
+
+	return 0;
+}
+
+EXPORT_SYMBOL(nftnl_trace_get_u32);
+uint32_t nftnl_trace_get_u32(const struct nftnl_trace *trace, uint16_t type)
+{
+	const uint32_t *d;
+	uint32_t dlen;
+
+	d = nftnl_trace_get_data(trace, type, &dlen);
+	if (d && dlen == sizeof(*d))
+		return *d;
+
+	return 0;
+}
+
+EXPORT_SYMBOL(nftnl_trace_get_u64);
+uint64_t nftnl_trace_get_u64(const struct nftnl_trace *trace, uint16_t type)
+{
+	const uint64_t *d;
+	uint32_t dlen;
+
+	d = nftnl_trace_get_data(trace, type, &dlen);
+	if (d && dlen == sizeof(*d))
+		return *d;
+
+	return 0;
+}
+
+static bool nftnl_trace_nlmsg_parse_hdrdata(struct nlattr *attr,
+					    struct nftnl_header_data *header)
+{
+	uint32_t len;
+
+	if (!attr)
+		return false;
+
+	len = mnl_attr_get_payload_len(attr);
+
+	header->data = malloc(len);
+	if (header->data) {
+		memcpy(header->data, mnl_attr_get_payload(attr), len);
+		header->len = len;
+		return true;
+	}
+
+	return false;
+}
+
+static int nftnl_trace_parse_verdict_cb(const struct nlattr *attr, void *data)
+{
+	int type = mnl_attr_get_type(attr);
+	const struct nlattr **tb = data;
+
+	switch (type) {
+	case NFTA_VERDICT_CODE:
+		if (mnl_attr_validate(attr, MNL_TYPE_U32) < 0)
+			abi_breakage();
+		tb[type] = attr;
+		break;
+	case NFTA_VERDICT_CHAIN:
+		if (mnl_attr_validate(attr, MNL_TYPE_STRING) < 0)
+			abi_breakage();
+		tb[type] = attr;
+		break;
+	}
+
+	return MNL_CB_OK;
+}
+
+static void
+nftnl_trace_parse_verdict(const struct nlattr *attr, struct nftnl_trace *t)
+{
+	struct nlattr *tb[NFTA_VERDICT_MAX+1];
+
+	mnl_attr_parse_nested(attr, nftnl_trace_parse_verdict_cb, tb);
+
+	if (!tb[NFTA_VERDICT_CODE])
+		abi_breakage();
+
+	t->verdict = ntohl(mnl_attr_get_u32(tb[NFTA_VERDICT_CODE]));
+	t->flags |= (1 << NFTNL_TRACE_VERDICT);
+
+	switch (t->verdict) {
+	case NFT_GOTO: /* fallthough */
+	case NFT_JUMP:
+		if (!tb[NFTA_VERDICT_CHAIN])
+			abi_breakage();
+		t->jump_target = strdup(mnl_attr_get_str(tb[NFTA_VERDICT_CHAIN]));
+		if (t->jump_target)
+			t->flags |= (1 << NFTNL_TRACE_JUMP_TARGET);
+		break;
+	case NFT_RETURN: /* all other NFT_* cases fall through */
+	case NFT_CONTINUE:
+	case NFT_BREAK:
+		break;
+	case NF_ACCEPT: /* standard verdicts fall though */
+	case NF_DROP:
+	case NF_STOLEN:
+	case NF_REPEAT:
+	case NF_STOP:
+		break;
+	default: /* Unknown NF_ verdict, or verdict contains extra data */
+		switch (t->verdict & NF_VERDICT_MASK) {
+		case NF_QUEUE:
+			t->queue_id = t->verdict >> 16;
+			t->verdict = NF_QUEUE;
+			t->flags |= (1 << NFTNL_TRACE_QUEUE_ID);
+			break;
+		}
+		break;
+	}
+}
+
+EXPORT_SYMBOL(nftnl_trace_nlmsg_parse);
+int nftnl_trace_nlmsg_parse(const struct nlmsghdr *nlh, struct nftnl_trace *t)
+{
+	struct nfgenmsg *nfg = mnl_nlmsg_get_payload(nlh);
+	struct nlattr *tb[NFTA_TRACE_MAX+1] = {};
+
+	if (mnl_attr_parse(nlh, sizeof(*nfg), nftnl_trace_parse_attr_cb, tb) < 0)
+		return -1;
+
+	if (!tb[NFTA_TRACE_ID])
+		abi_breakage();
+
+	if (!tb[NFTA_TRACE_TYPE])
+		abi_breakage();
+
+	if (tb[NFTA_TRACE_TABLE])
+		t->table = strdup(mnl_attr_get_str(tb[NFTA_TRACE_TABLE]));
+	if (tb[NFTA_TRACE_CHAIN])
+		t->chain = strdup(mnl_attr_get_str(tb[NFTA_TRACE_CHAIN]));
+
+	t->family = nfg->nfgen_family;
+	t->flags |= (1 << NFTNL_TRACE_FAMILY);
+
+	t->type = ntohl(mnl_attr_get_u32(tb[NFTA_TRACE_TYPE]));
+	t->flags |= (1 << NFTNL_TRACE_TYPE);
+
+	t->id = ntohl(mnl_attr_get_u32(tb[NFTA_TRACE_ID]));
+	t->flags |= (1 << NFTNL_TRACE_ID);
+
+	if (tb[NFTA_TRACE_IIFTYPE]) {
+		t->iiftype = ntohs(mnl_attr_get_u16(tb[NFTA_TRACE_IIFTYPE]));
+		t->flags |= (1 << NFTNL_TRACE_IIFTYPE);
+	}
+
+	if (tb[NFTA_TRACE_IIF]) {
+		t->iif = ntohl(mnl_attr_get_u32(tb[NFTA_TRACE_IIF]));
+		t->flags |= (1 << NFTNL_TRACE_IIF);
+	}
+
+	if (tb[NFTA_TRACE_OIF]) {
+		t->oif = ntohl(mnl_attr_get_u32(tb[NFTA_TRACE_OIF]));
+		t->flags |= (1 << NFTNL_TRACE_OIF);
+	}
+
+	if (tb[NFTA_TRACE_MARK]) {
+		t->mark = ntohl(mnl_attr_get_u32(tb[NFTA_TRACE_MARK]));
+		t->flags |= (1 << NFTNL_TRACE_MARK);
+	}
+
+	if (tb[NFTA_TRACE_RULE_HANDLE]) {
+		t->rule_handle = be64toh(mnl_attr_get_u64(tb[NFTA_TRACE_RULE_HANDLE]));
+		t->flags |= (1 << NFTNL_TRACE_RULE_HANDLE);
+	}
+
+	if (tb[NFTA_TRACE_VERDICT])
+		nftnl_trace_parse_verdict(tb[NFTA_TRACE_VERDICT], t);
+
+	if (nftnl_trace_nlmsg_parse_hdrdata(tb[NFTA_TRACE_LL_HEADER], &t->ll))
+		t->flags |= (1 << NFTNL_TRACE_LL_HEADER);
+
+	if (nftnl_trace_nlmsg_parse_hdrdata(tb[NFTA_TRACE_NETWORK_HEADER], &t->nh))
+		t->flags |= (1 << NFTNL_TRACE_NETWORK_HEADER);
+
+	if (nftnl_trace_nlmsg_parse_hdrdata(tb[NFTA_TRACE_TRANSPORT_HEADER], &t->th))
+		t->flags |= (1 << NFTNL_TRACE_TRANSPORT_HEADER);
+
+	if (t->chain)
+		t->flags |= (1 << NFTNL_TRACE_CHAIN);
+	if (t->table)
+		t->flags |= (1 << NFTNL_TRACE_TABLE);
+
+	return 0;
+}
diff --git a/src/utils.c b/src/utils.c
index 84fbe94..ba36bc4 100644
--- a/src/utils.c
+++ b/src/utils.c
@@ -146,12 +146,24 @@ const char *nftnl_verdict2str(uint32_t verdict)
 		return "accept";
 	case NF_DROP:
 		return "drop";
+	case NF_STOLEN:
+		return "stolen";
+	case NF_QUEUE:
+		return "queue";
+	case NF_REPEAT:
+		return "repeat";
+	case NF_STOP:
+		return "stop";
 	case NFT_RETURN:
 		return "return";
 	case NFT_JUMP:
 		return "jump";
 	case NFT_GOTO:
 		return "goto";
+	case NFT_CONTINUE:
+		return "continue";
+	case NFT_BREAK:
+		return "break";
 	default:
 		return "unknown";
 	}
-- 
2.4.10


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 nf-next 2/3] netfilter: nf_tables: extend tracing infrastructure
  2015-11-26 17:59 [PATCH v2 0/3] src: trace infrastructure support Florian Westphal
  2015-11-26 17:59 ` [PATCH v2 libnftnl 1/3] src: add " Florian Westphal
@ 2015-11-26 17:59 ` Florian Westphal
  2015-11-26 23:20   ` Patrick McHardy
  2015-11-27  8:47   ` Patrick McHardy
  2015-11-26 17:59 ` [PATCH v2 nf-next 3/3] netfilter: nf_tables: wrap tracing with a static key Florian Westphal
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 15+ messages in thread
From: Florian Westphal @ 2015-11-26 17:59 UTC (permalink / raw)
  To: netfilter-devel; +Cc: Florian Westphal

nft monitor mode can then decode and display this trace data.

Parts of LL/Network/Transport headers are provided as separate
attributes.

Otherwise, printing IP address data becomes virtually impossible
for userspace since in the case of the netdev family we really don't
want userspace to have to know all the possible link layer types
and/or sizes just to display/print an ip address.

We also don't want userspace to have to follow ipv6 header chains
to get the s/dport info, the kernel already did this work for us.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 Changes since v1:
  - move trace into new file, nf_tables_trace.c (Pablo)
  - add new netlink group for trace messages (Pablo, Patrick)
  - make verdict nested attr (Patrick)
  - add percpu area to store current processed skb id
  (suggested by Patrick)
  - fold arguments to nft_trace_notify into a structure,
  nft_traceinfo.  This would allow easier extension later
  on, e.g. ptr to registers to e.g. log NFT_BREAK/mismatch
  as well.

 include/net/netfilter/nf_tables.h        |  17 +++
 include/uapi/linux/netfilter/nf_tables.h |  46 ++++++
 include/uapi/linux/netfilter/nfnetlink.h |   2 +
 net/netfilter/Makefile                   |   2 +-
 net/netfilter/nf_tables_api.c            |  12 +-
 net/netfilter/nf_tables_core.c           |  35 ++++-
 net/netfilter/nf_tables_trace.c          | 253 +++++++++++++++++++++++++++++++
 net/netfilter/nfnetlink.c                |   1 +
 net/netfilter/nft_meta.c                 |   4 +-
 9 files changed, 356 insertions(+), 16 deletions(-)
 create mode 100644 net/netfilter/nf_tables_trace.c

diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
index 101d7d7..59331ba 100644
--- a/include/net/netfilter/nf_tables.h
+++ b/include/net/netfilter/nf_tables.h
@@ -888,6 +888,23 @@ void nft_unregister_chain_type(const struct nf_chain_type *);
 int nft_register_expr(struct nft_expr_type *);
 void nft_unregister_expr(struct nft_expr_type *);
 
+int nft_verdict_dump(struct sk_buff *skb,
+		     const struct nft_verdict *v, u16 type);
+
+struct nft_traceinfo {
+	const struct nft_chain *chain;
+	const struct nft_rule *rule;
+	const struct nft_verdict *verdict;
+	enum nft_trace_types type;
+
+	bool packet_dumped;
+};
+
+void nft_trace_notify(struct nft_traceinfo *info,
+		      const struct nft_pktinfo *pkt);
+
+void nft_trace_start(struct sk_buff *skb);
+
 #define nft_dereference(p)					\
 	nfnl_dereference(p, NFNL_SUBSYS_NFTABLES)
 
diff --git a/include/uapi/linux/netfilter/nf_tables.h b/include/uapi/linux/netfilter/nf_tables.h
index 5f3ecec..d6abe1c 100644
--- a/include/uapi/linux/netfilter/nf_tables.h
+++ b/include/uapi/linux/netfilter/nf_tables.h
@@ -83,6 +83,7 @@ enum nft_verdicts {
  * @NFT_MSG_DELSETELEM: delete a set element (enum nft_set_elem_attributes)
  * @NFT_MSG_NEWGEN: announce a new generation, only for events (enum nft_gen_attributes)
  * @NFT_MSG_GETGEN: get the rule-set generation (enum nft_gen_attributes)
+ * @NFT_MSG_TRACE: trace event (enum nft_trace_attributes)
  */
 enum nf_tables_msg_types {
 	NFT_MSG_NEWTABLE,
@@ -102,6 +103,7 @@ enum nf_tables_msg_types {
 	NFT_MSG_DELSETELEM,
 	NFT_MSG_NEWGEN,
 	NFT_MSG_GETGEN,
+	NFT_MSG_TRACE,
 	NFT_MSG_MAX,
 };
 
@@ -987,4 +989,48 @@ enum nft_gen_attributes {
 };
 #define NFTA_GEN_MAX		(__NFTA_GEN_MAX - 1)
 
+/**
+ * enum nft_trace_attributes - nf_tables trace netlink attributes
+ *
+ * @NFTA_TRACE_TABLE: name of the table (NLA_STRING)
+ * @NFTA_TRACE_CHAIN: name of the chain (NLA_STRING)
+ * @NFTA_TRACE_RULE_HANDLE: numeric handle of the rule (NLA_U64)
+ * @NFTA_TRACE_TYPE: type of the event (NLA_U32: nft_trace_types)
+ * @NFTA_TRACE_VERDICT: verdict returned by hook (NLA_NESTED: nft_verdicts)
+ * @NFTA_TRACE_ID: pseudo-id, same for each skb traced (NLA_U32)
+ * @NFTA_TRACE_LL_HEADER: linklayer header (NLA_BINARY)
+ * @NFTA_TRACE_NETWORK_HEADER: network header (NLA_BINARY)
+ * @NFTA_TRACE_TRANSPORT_HEADER: transport header (NLA_BINARY)
+ * @NFTA_TRACE_IIFTYPE: netdev->type of indev (NLA_U16)
+ * @NFTA_TRACE_IIF: indev ifindex (NLA_U32)
+ * @NFTA_TRACE_OIF: outdev ifindex (NLA_U32)
+ * @NFTA_TRACE_MARK: nfmark (NLA_U32)
+ */
+enum nft_trace_attibutes {
+	NFTA_TRACE_UNSPEC,
+	NFTA_TRACE_TABLE,
+	NFTA_TRACE_CHAIN,
+	NFTA_TRACE_RULE_HANDLE,
+	NFTA_TRACE_TYPE,
+	NFTA_TRACE_VERDICT,
+	NFTA_TRACE_ID,
+	NFTA_TRACE_LL_HEADER,
+	NFTA_TRACE_NETWORK_HEADER,
+	NFTA_TRACE_TRANSPORT_HEADER,
+	NFTA_TRACE_IIFTYPE,
+	NFTA_TRACE_IIF,
+	NFTA_TRACE_OIF,
+	NFTA_TRACE_MARK,
+	__NFTA_TRACE_MAX
+};
+#define NFTA_TRACE_MAX (__NFTA_TRACE_MAX - 1)
+
+enum nft_trace_types {
+	NFT_TRACETYPE_UNSPEC,
+	NFT_TRACETYPE_POLICY,
+	NFT_TRACETYPE_RETURN,
+	NFT_TRACETYPE_RULE,
+	__NFT_TRACETYPE_MAX
+};
+#define NFT_TRACETYPE_MAX (__NFT_TRACETYPE_MAX - 1)
 #endif /* _LINUX_NF_TABLES_H */
diff --git a/include/uapi/linux/netfilter/nfnetlink.h b/include/uapi/linux/netfilter/nfnetlink.h
index 354a7e5..4bb8cb7 100644
--- a/include/uapi/linux/netfilter/nfnetlink.h
+++ b/include/uapi/linux/netfilter/nfnetlink.h
@@ -22,6 +22,8 @@ enum nfnetlink_groups {
 #define NFNLGRP_NFTABLES                NFNLGRP_NFTABLES
 	NFNLGRP_ACCT_QUOTA,
 #define NFNLGRP_ACCT_QUOTA		NFNLGRP_ACCT_QUOTA
+	NFNLGRP_NFTRACE,
+#define NFNLGRP_NFTRACE			NFNLGRP_NFTRACE
 	__NFNLGRP_MAX,
 };
 #define NFNLGRP_MAX	(__NFNLGRP_MAX - 1)
diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index 7638c36..2293484 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -67,7 +67,7 @@ obj-$(CONFIG_NF_NAT_TFTP) += nf_nat_tftp.o
 obj-$(CONFIG_NETFILTER_SYNPROXY) += nf_synproxy_core.o
 
 # nf_tables
-nf_tables-objs += nf_tables_core.o nf_tables_api.o
+nf_tables-objs += nf_tables_core.o nf_tables_api.o nf_tables_trace.o
 nf_tables-objs += nft_immediate.o nft_cmp.o nft_lookup.o nft_dynset.o
 nf_tables-objs += nft_bitwise.o nft_byteorder.o nft_payload.o
 
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 93cc473..717373a 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -4446,22 +4446,22 @@ static void nft_verdict_uninit(const struct nft_data *data)
 	}
 }
 
-static int nft_verdict_dump(struct sk_buff *skb, const struct nft_data *data)
+int nft_verdict_dump(struct sk_buff *skb, const struct nft_verdict *v, u16 type)
 {
 	struct nlattr *nest;
 
-	nest = nla_nest_start(skb, NFTA_DATA_VERDICT);
+	nest = nla_nest_start(skb, type);
 	if (!nest)
 		goto nla_put_failure;
 
-	if (nla_put_be32(skb, NFTA_VERDICT_CODE, htonl(data->verdict.code)))
+	if (nla_put_be32(skb, NFTA_VERDICT_CODE, htonl(v->code)))
 		goto nla_put_failure;
 
-	switch (data->verdict.code) {
+	switch (v->code) {
 	case NFT_JUMP:
 	case NFT_GOTO:
 		if (nla_put_string(skb, NFTA_VERDICT_CHAIN,
-				   data->verdict.chain->name))
+				   v->chain->name))
 			goto nla_put_failure;
 	}
 	nla_nest_end(skb, nest);
@@ -4572,7 +4572,7 @@ int nft_data_dump(struct sk_buff *skb, int attr, const struct nft_data *data,
 		err = nft_value_dump(skb, data, len);
 		break;
 	case NFT_DATA_VERDICT:
-		err = nft_verdict_dump(skb, data);
+		err = nft_verdict_dump(skb, &data->verdict, NFTA_DATA_VERDICT);
 		break;
 	default:
 		err = -EINVAL;
diff --git a/net/netfilter/nf_tables_core.c b/net/netfilter/nf_tables_core.c
index f3695a4..e8ba9da 100644
--- a/net/netfilter/nf_tables_core.c
+++ b/net/netfilter/nf_tables_core.c
@@ -54,12 +54,23 @@ static void __nft_trace_packet(const struct nft_pktinfo *pkt,
 		     rulenum);
 }
 
-static inline void nft_trace_packet(const struct nft_pktinfo *pkt,
+static inline void nft_trace_packet(struct nft_traceinfo *info,
+				    const struct nft_pktinfo *pkt,
 				    const struct nft_chain *chain,
-				    int rulenum, enum nft_trace type)
+				    const struct nft_rule *rule,
+				    const struct nft_verdict *verdict,
+				    int rulenum,
+				    enum nft_trace_types type)
 {
-	if (unlikely(pkt->skb->nf_trace))
+	if (unlikely(pkt->skb->nf_trace)) {
+		info->chain = chain;
+		info->rule = rule;
+		info->verdict = verdict;
+		info->type = type;
+		nft_trace_notify(info, pkt);
+
 		__nft_trace_packet(pkt, chain, rulenum, type);
+	}
 }
 
 static void nft_cmp_fast_eval(const struct nft_expr *expr,
@@ -121,7 +132,9 @@ nft_do_chain(struct nft_pktinfo *pkt, void *priv)
 	struct nft_stats *stats;
 	int rulenum;
 	unsigned int gencursor = nft_genmask_cur(net);
+	struct nft_traceinfo info;
 
+	info.packet_dumped = false;
 do_chain:
 	rulenum = 0;
 	rule = list_entry(&chain->rules, struct nft_rule, list);
@@ -151,7 +164,8 @@ next_rule:
 			regs.verdict.code = NFT_CONTINUE;
 			continue;
 		case NFT_CONTINUE:
-			nft_trace_packet(pkt, chain, rulenum, NFT_TRACE_RULE);
+			nft_trace_packet(&info, pkt, chain, rule, &regs.verdict,
+					 rulenum, NFT_TRACETYPE_RULE);
 			continue;
 		}
 		break;
@@ -161,7 +175,8 @@ next_rule:
 	case NF_ACCEPT:
 	case NF_DROP:
 	case NF_QUEUE:
-		nft_trace_packet(pkt, chain, rulenum, NFT_TRACE_RULE);
+		nft_trace_packet(&info, pkt, chain, rule, &regs.verdict,
+				 rulenum, NFT_TRACETYPE_RULE);
 		return regs.verdict.code;
 	}
 
@@ -174,7 +189,8 @@ next_rule:
 		stackptr++;
 		/* fall through */
 	case NFT_GOTO:
-		nft_trace_packet(pkt, chain, rulenum, NFT_TRACE_RULE);
+		nft_trace_packet(&info, pkt, chain, rule, &regs.verdict,
+				 rulenum, NFT_TRACETYPE_RULE);
 
 		chain = regs.verdict.chain;
 		goto do_chain;
@@ -182,7 +198,9 @@ next_rule:
 		rulenum++;
 		/* fall through */
 	case NFT_RETURN:
-		nft_trace_packet(pkt, chain, rulenum, NFT_TRACE_RETURN);
+		if (stackptr)
+			nft_trace_packet(&info, pkt, chain, rule, &regs.verdict,
+					 rulenum, NFT_TRACETYPE_RETURN);
 		break;
 	default:
 		WARN_ON(1);
@@ -196,7 +214,8 @@ next_rule:
 		goto next_rule;
 	}
 
-	nft_trace_packet(pkt, basechain, -1, NFT_TRACE_POLICY);
+	nft_trace_packet(&info, pkt, basechain, NULL, NULL, -1,
+			 NFT_TRACETYPE_POLICY);
 
 	rcu_read_lock_bh();
 	stats = this_cpu_ptr(rcu_dereference(nft_base_chain(basechain)->stats));
diff --git a/net/netfilter/nf_tables_trace.c b/net/netfilter/nf_tables_trace.c
new file mode 100644
index 0000000..2f557b7
--- /dev/null
+++ b/net/netfilter/nf_tables_trace.c
@@ -0,0 +1,253 @@
+/*
+ * (C) 2015 Red Hat GmbH
+ * Author: Florian Westphal <fw@strlen.de>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/module.h>
+#include <linux/hash.h>
+#include <linux/if_vlan.h>
+#include <linux/init.h>
+#include <linux/skbuff.h>
+#include <linux/netlink.h>
+#include <linux/netfilter.h>
+#include <linux/netfilter/nfnetlink.h>
+#include <linux/netfilter/nf_tables.h>
+#include <net/netfilter/nf_tables_core.h>
+#include <net/netfilter/nf_tables.h>
+
+#define NFT_TRACETYPE_LL_HSIZE		20
+#define NFT_TRACETYPE_NETWORK_HSIZE	40
+#define NFT_TRACETYPE_TRANSPORT_HSIZE	20
+
+#define NFT_ID_SKB_HASHBITS	12
+
+/* only used if tracing is on */
+static DEFINE_PER_CPU_READ_MOSTLY(u32, trace_seq);
+static atomic_t trace_cnt;
+
+static int trace_fill_id(struct sk_buff *nlskb, unsigned long skb_addr)
+{
+	u32 seq, id;
+
+	/* using hash_ptr(skb) causes immediate re-use of old ID when
+	 * skb moves to another cpu, this can happen e.g. with nfqueue.
+	 *
+	 * using skb address as ID results in a limited number of
+	 * values (and quick reuse).
+	 *
+	 * In order to also get a 'new' id in that case we use both.
+	 */
+	seq = __this_cpu_read(trace_seq);
+	id = seq + hash_long(skb_addr >> L1_CACHE_SHIFT, NFT_ID_SKB_HASHBITS);
+
+	return nla_put_be32(nlskb, NFTA_TRACE_ID, htonl(id));
+}
+
+static int trace_fill_header(struct sk_buff *nlskb, u16 type,
+			     const struct sk_buff *skb,
+			     int off, unsigned int len)
+{
+	struct nlattr *nla;
+
+	if (len == 0)
+		return 0;
+
+	if (skb_tailroom(nlskb) < nla_total_size(len))
+		return -1;
+
+	nla = (struct nlattr *)skb_put(nlskb, nla_total_size(len));
+	nla->nla_type = type;
+	nla->nla_len = nla_attr_size(len);
+
+	if (skb_copy_bits(skb, off, nla_data(nla), len))
+		return -1;
+
+	return 0;
+}
+
+static int nf_trace_fill_meta_info(struct sk_buff *nlskb,
+				   const struct nft_pktinfo *pkt)
+{
+	if (pkt->in) {
+		if (nla_put_be32(nlskb, NFTA_TRACE_IIF,
+				 htonl(pkt->in->ifindex)))
+			return -1;
+
+		/* needed for LL_HEADER decode */
+		if (nla_put_be16(nlskb, NFTA_TRACE_IIFTYPE,
+				 htons(pkt->in->type)))
+			return -1;
+	}
+
+	if (pkt->out &&
+	    nla_put_be32(nlskb, NFTA_TRACE_OIF, htonl(pkt->out->ifindex)))
+		return -1;
+
+	return 0;
+}
+
+static int nf_trace_fill_ll_header(struct sk_buff *nlskb,
+				   const struct sk_buff *skb)
+{
+	struct vlan_ethhdr veth;
+	unsigned int len, off;
+
+	off = skb_mac_header(skb) - skb->data;
+
+	if (skb_copy_bits(skb, off, &veth, ETH_HLEN))
+		return -1;
+
+	veth.h_vlan_proto = skb->vlan_proto;
+	veth.h_vlan_TCI = htons(skb_vlan_tag_get(skb));
+	veth.h_vlan_encapsulated_proto = skb->protocol;
+
+	len = min_t(unsigned int,
+		    skb->data - skb_mac_header(skb), NFT_TRACETYPE_LL_HSIZE);
+
+	if (WARN_ON_ONCE(len != ETH_HLEN))
+		return -1;
+
+	len = sizeof(veth);
+	if (skb_tailroom(nlskb) < nla_total_size(len))
+		return -1;
+
+	return nla_put(nlskb, NFTA_TRACE_LL_HEADER, len, &veth);
+}
+
+static int nf_trace_fill_pkt_info(struct sk_buff *nlskb,
+				  const struct nft_pktinfo *pkt)
+{
+	const struct sk_buff *skb = pkt->skb;
+	unsigned int len = min_t(unsigned int,
+				 pkt->xt.thoff - skb_network_offset(skb),
+				 NFT_TRACETYPE_NETWORK_HSIZE);
+	int off = skb_network_offset(skb);
+
+	if (trace_fill_header(nlskb, NFTA_TRACE_NETWORK_HEADER, skb, off, len))
+		return -1;
+
+	len = min_t(unsigned int, skb->len - pkt->xt.thoff,
+		    NFT_TRACETYPE_TRANSPORT_HSIZE);
+
+	if (trace_fill_header(nlskb, NFTA_TRACE_TRANSPORT_HEADER, skb,
+			      pkt->xt.thoff, len))
+		return -1;
+
+	if (!skb_mac_header_was_set(skb))
+		return 0;
+
+	if (skb_vlan_tag_get(skb))
+		return nf_trace_fill_ll_header(nlskb, skb);
+
+	len = min_t(unsigned int,
+		    skb->data - skb_mac_header(skb), NFT_TRACETYPE_LL_HSIZE);
+	off = skb_mac_header(skb) - skb->data;
+	return trace_fill_header(nlskb, NFTA_TRACE_LL_HEADER,
+				 skb, off, len);
+}
+
+static int nf_trace_fill_chain_info(struct sk_buff *nlskb,
+				    const struct nft_chain *chain)
+{
+	if (nla_put_string(nlskb, NFTA_TRACE_TABLE, chain->table->name))
+		return -1;
+
+	return nla_put_string(nlskb, NFTA_TRACE_CHAIN, chain->name);
+}
+
+void nft_trace_notify(struct nft_traceinfo *info,
+		      const struct nft_pktinfo *pkt)
+{
+	struct nfgenmsg *nfmsg;
+	struct nlmsghdr *nlh;
+	struct sk_buff *skb;
+	unsigned int size;
+	int event = (NFNL_SUBSYS_NFTABLES << 8) | NFT_MSG_TRACE;
+
+	if (!nfnetlink_has_listeners(pkt->net, NFNLGRP_NFTRACE))
+		return;
+
+	size = nlmsg_total_size(sizeof(struct nfgenmsg)) +
+		nla_total_size(NFT_TABLE_MAXNAMELEN) +
+		nla_total_size(NFT_CHAIN_MAXNAMELEN) +
+		nla_total_size(sizeof(__be64)) +	/* rule handle */
+		nla_total_size(sizeof(__be32)) +	/* trace type */
+		nla_total_size(0) +			/* VERDICT, nested */
+			nla_total_size(sizeof(u32)) +	/* verdict code */
+			nla_total_size(NFT_CHAIN_MAXNAMELEN) + /* jump target */
+		nla_total_size(sizeof(u32)) +		/* id */
+		nla_total_size(NFT_TRACETYPE_LL_HSIZE) +
+		nla_total_size(NFT_TRACETYPE_NETWORK_HSIZE) +
+		nla_total_size(NFT_TRACETYPE_TRANSPORT_HSIZE) +
+		nla_total_size(sizeof(u32)) +		/* mark */
+		nla_total_size(sizeof(u32)) +		/* iif */
+		nla_total_size(sizeof(u32)) +		/* oif */
+		nla_total_size(sizeof(__be16));		/* device type */
+
+	skb = nlmsg_new(size, GFP_ATOMIC);
+	if (!skb)
+		return;
+
+	nlh = nlmsg_put(skb, 0, 0, event, sizeof(struct nfgenmsg), 0);
+	if (!nlh)
+		goto nla_put_failure;
+
+	nfmsg = nlmsg_data(nlh);
+	nfmsg->nfgen_family	= pkt->pf;
+	nfmsg->version		= NFNETLINK_V0;
+	nfmsg->res_id		= 0;
+
+	if (nla_put_be32(skb, NFTA_TRACE_TYPE, htonl(info->type)))
+		goto nla_put_failure;
+
+	if (trace_fill_id(skb, (unsigned long)pkt->skb))
+		goto nla_put_failure;
+
+	if (info->chain &&
+	    nf_trace_fill_chain_info(skb, info->chain))
+		goto nla_put_failure;
+
+	if (info->rule &&
+	    nla_put_be64(skb, NFTA_TRACE_RULE_HANDLE,
+			 cpu_to_be64(info->rule->handle)))
+		goto nla_put_failure;
+
+	if ((info->type == NFT_TRACETYPE_RETURN ||
+	     info->type == NFT_TRACETYPE_RULE) &&
+	    nft_verdict_dump(skb, info->verdict, NFTA_TRACE_VERDICT))
+		goto nla_put_failure;
+
+	if (pkt->skb->mark &&
+	    nla_put_be32(skb, NFTA_TRACE_MARK, htonl(pkt->skb->mark)))
+		goto nla_put_failure;
+
+	if (!info->packet_dumped) {
+		if (nf_trace_fill_meta_info(skb, pkt))
+			goto nla_put_failure;
+
+		if (nf_trace_fill_pkt_info(skb, pkt))
+			goto nla_put_failure;
+		info->packet_dumped = true;
+	}
+
+	nlmsg_end(skb, nlh);
+	nfnetlink_send(skb, pkt->net, 0, NFNLGRP_NFTRACE, 0, GFP_ATOMIC);
+	return;
+
+ nla_put_failure:
+	WARN_ON_ONCE(1);
+	kfree_skb(skb);
+}
+
+void nft_trace_start(struct sk_buff *skb)
+{
+	u32 seq = atomic_inc_return(&trace_cnt) << NFT_ID_SKB_HASHBITS;
+
+	__this_cpu_write(trace_seq, seq);
+	skb->nf_trace = 1;
+}
+EXPORT_SYMBOL(nft_trace_start);
diff --git a/net/netfilter/nfnetlink.c b/net/netfilter/nfnetlink.c
index 46453ab..28591fa 100644
--- a/net/netfilter/nfnetlink.c
+++ b/net/netfilter/nfnetlink.c
@@ -49,6 +49,7 @@ static const int nfnl_group2type[NFNLGRP_MAX+1] = {
 	[NFNLGRP_CONNTRACK_EXP_DESTROY] = NFNL_SUBSYS_CTNETLINK_EXP,
 	[NFNLGRP_NFTABLES]		= NFNL_SUBSYS_NFTABLES,
 	[NFNLGRP_ACCT_QUOTA]		= NFNL_SUBSYS_ACCT,
+	[NFNLGRP_NFTRACE]		= NFNL_SUBSYS_NFTABLES,
 };
 
 void nfnl_lock(__u8 subsys_id)
diff --git a/net/netfilter/nft_meta.c b/net/netfilter/nft_meta.c
index 9dfaf4d..eaf41e4 100644
--- a/net/netfilter/nft_meta.c
+++ b/net/netfilter/nft_meta.c
@@ -204,7 +204,9 @@ void nft_meta_set_eval(const struct nft_expr *expr,
 		skb->priority = value;
 		break;
 	case NFT_META_NFTRACE:
-		skb->nf_trace = 1;
+		if (skb->nf_trace)
+			break;
+		nft_trace_start(pkt->skb);
 		break;
 	default:
 		WARN_ON(1);
-- 
2.4.10


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 nf-next 3/3] netfilter: nf_tables: wrap tracing with a static key
  2015-11-26 17:59 [PATCH v2 0/3] src: trace infrastructure support Florian Westphal
  2015-11-26 17:59 ` [PATCH v2 libnftnl 1/3] src: add " Florian Westphal
  2015-11-26 17:59 ` [PATCH v2 nf-next 2/3] netfilter: nf_tables: extend tracing infrastructure Florian Westphal
@ 2015-11-26 17:59 ` Florian Westphal
  2015-11-26 21:30   ` Patrick McHardy
  2015-11-26 20:53 ` [PATCH v2 0/3] src: trace infrastructure support Patrick McHardy
  2015-11-27  1:33 ` Patrick McHardy
  4 siblings, 1 reply; 15+ messages in thread
From: Florian Westphal @ 2015-11-26 17:59 UTC (permalink / raw)
  To: netfilter-devel; +Cc: Florian Westphal

Only needed when meta nftrace rule(s) were added.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 Changes since v1:
  - do not wrap rulenum++ into static key, because
  it might get incremented too late when trace
  rules are added/removed at high rate, giving wrong results.
  - add EXPORT_SYMBOL for nft_meta_set_destroy.

 There was another followup patch to disable old tracing
 infrastructure, but I dropped this for now.

 So, for the time being, you get both tracing formats.

 Should be easy to add sysctl later on to disable the old
 format.

 include/net/netfilter/nf_tables_core.h |  1 +
 include/net/netfilter/nft_meta.h       |  3 +++
 net/bridge/netfilter/nft_meta_bridge.c |  1 +
 net/netfilter/nf_tables_core.c         |  6 +++++-
 net/netfilter/nf_tables_trace.c        |  4 ++++
 net/netfilter/nft_meta.c               | 16 ++++++++++++++++
 6 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/include/net/netfilter/nf_tables_core.h b/include/net/netfilter/nf_tables_core.h
index 4ff5424..a9060dd 100644
--- a/include/net/netfilter/nf_tables_core.h
+++ b/include/net/netfilter/nf_tables_core.h
@@ -57,6 +57,7 @@ struct nft_payload_set {
 };
 
 extern const struct nft_expr_ops nft_payload_fast_ops;
+extern struct static_key_false nft_trace_enabled;
 
 int nft_payload_module_init(void);
 void nft_payload_module_exit(void);
diff --git a/include/net/netfilter/nft_meta.h b/include/net/netfilter/nft_meta.h
index 711887a..d27588c 100644
--- a/include/net/netfilter/nft_meta.h
+++ b/include/net/netfilter/nft_meta.h
@@ -33,4 +33,7 @@ void nft_meta_set_eval(const struct nft_expr *expr,
 		       struct nft_regs *regs,
 		       const struct nft_pktinfo *pkt);
 
+void nft_meta_set_destroy(const struct nft_ctx *ctx,
+			  const struct nft_expr *expr);
+
 #endif
diff --git a/net/bridge/netfilter/nft_meta_bridge.c b/net/bridge/netfilter/nft_meta_bridge.c
index a21269b..4b901d9 100644
--- a/net/bridge/netfilter/nft_meta_bridge.c
+++ b/net/bridge/netfilter/nft_meta_bridge.c
@@ -84,6 +84,7 @@ static const struct nft_expr_ops nft_meta_bridge_set_ops = {
 	.size		= NFT_EXPR_SIZE(sizeof(struct nft_meta)),
 	.eval		= nft_meta_set_eval,
 	.init		= nft_meta_set_init,
+	.destroy	= nft_meta_set_destroy,
 	.dump		= nft_meta_set_dump,
 };
 
diff --git a/net/netfilter/nf_tables_core.c b/net/netfilter/nf_tables_core.c
index e8ba9da..5d3d37b 100644
--- a/net/netfilter/nf_tables_core.c
+++ b/net/netfilter/nf_tables_core.c
@@ -16,6 +16,7 @@
 #include <linux/skbuff.h>
 #include <linux/netlink.h>
 #include <linux/netfilter.h>
+#include <linux/static_key.h>
 #include <linux/netfilter/nfnetlink.h>
 #include <linux/netfilter/nf_tables.h>
 #include <net/netfilter/nf_tables_core.h>
@@ -62,7 +63,10 @@ static inline void nft_trace_packet(struct nft_traceinfo *info,
 				    int rulenum,
 				    enum nft_trace_types type)
 {
-	if (unlikely(pkt->skb->nf_trace)) {
+	if (static_branch_unlikely(&nft_trace_enabled)) {
+		if (!pkt->skb->nf_trace)
+			return;
+
 		info->chain = chain;
 		info->rule = rule;
 		info->verdict = verdict;
diff --git a/net/netfilter/nf_tables_trace.c b/net/netfilter/nf_tables_trace.c
index 2f557b7..7446891 100644
--- a/net/netfilter/nf_tables_trace.c
+++ b/net/netfilter/nf_tables_trace.c
@@ -8,6 +8,7 @@
  */
 
 #include <linux/module.h>
+#include <linux/static_key.h>
 #include <linux/hash.h>
 #include <linux/if_vlan.h>
 #include <linux/init.h>
@@ -25,6 +26,9 @@
 
 #define NFT_ID_SKB_HASHBITS	12
 
+DEFINE_STATIC_KEY_FALSE(nft_trace_enabled);
+EXPORT_SYMBOL_GPL(nft_trace_enabled);
+
 /* only used if tracing is on */
 static DEFINE_PER_CPU_READ_MOSTLY(u32, trace_seq);
 static atomic_t trace_cnt;
diff --git a/net/netfilter/nft_meta.c b/net/netfilter/nft_meta.c
index eaf41e4..1cf6814 100644
--- a/net/netfilter/nft_meta.c
+++ b/net/netfilter/nft_meta.c
@@ -18,10 +18,12 @@
 #include <linux/ip.h>
 #include <linux/ipv6.h>
 #include <linux/smp.h>
+#include <linux/static_key.h>
 #include <net/dst.h>
 #include <net/sock.h>
 #include <net/tcp_states.h> /* for TCP_TIME_WAIT */
 #include <net/netfilter/nf_tables.h>
+#include <net/netfilter/nf_tables_core.h>
 #include <net/netfilter/nft_meta.h>
 
 void nft_meta_get_eval(const struct nft_expr *expr,
@@ -299,6 +301,9 @@ int nft_meta_set_init(const struct nft_ctx *ctx,
 	if (err < 0)
 		return err;
 
+	if (priv->key == NFT_META_NFTRACE)
+		static_branch_inc(&nft_trace_enabled);
+
 	return 0;
 }
 EXPORT_SYMBOL_GPL(nft_meta_set_init);
@@ -336,6 +341,16 @@ nla_put_failure:
 }
 EXPORT_SYMBOL_GPL(nft_meta_set_dump);
 
+void nft_meta_set_destroy(const struct nft_ctx *ctx,
+			  const struct nft_expr *expr)
+{
+	const struct nft_meta *priv = nft_expr_priv(expr);
+
+	if (priv->key == NFT_META_NFTRACE)
+		static_branch_dec(&nft_trace_enabled);
+}
+EXPORT_SYMBOL_GPL(nft_meta_set_destroy);
+
 static struct nft_expr_type nft_meta_type;
 static const struct nft_expr_ops nft_meta_get_ops = {
 	.type		= &nft_meta_type,
@@ -350,6 +365,7 @@ static const struct nft_expr_ops nft_meta_set_ops = {
 	.size		= NFT_EXPR_SIZE(sizeof(struct nft_meta)),
 	.eval		= nft_meta_set_eval,
 	.init		= nft_meta_set_init,
+	.destroy	= nft_meta_set_destroy,
 	.dump		= nft_meta_set_dump,
 };
 
-- 
2.4.10


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 0/3] src: trace infrastructure support
  2015-11-26 17:59 [PATCH v2 0/3] src: trace infrastructure support Florian Westphal
                   ` (2 preceding siblings ...)
  2015-11-26 17:59 ` [PATCH v2 nf-next 3/3] netfilter: nf_tables: wrap tracing with a static key Florian Westphal
@ 2015-11-26 20:53 ` Patrick McHardy
  2015-11-27  1:33 ` Patrick McHardy
  4 siblings, 0 replies; 15+ messages in thread
From: Patrick McHardy @ 2015-11-26 20:53 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel

On 26.11, Florian Westphal wrote:
> This is V2 of the nft trace patch set.
> 
> See the individual patches for a change vs. v1 description.
> 
> Patrick: I did a quick test w. your last public patches
> (plus a few minor changes to make it build again) and things
> seem to work fine.

Thanks, I have it running, will start making the missing adaptions now.

> What does not work correctly is decoding of vlan header.
> I think whats happening is that we cannot splice sub-byte
> sized quantities from a larger chunk.

It does work for IPv4 (version/hdrlength), but they don't cross byte
boundaries, maybe its related to that. I'll check it out.

> Other than that, things appear to work correctly.

Very nice, thanks!

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 nf-next 3/3] netfilter: nf_tables: wrap tracing with a static key
  2015-11-26 17:59 ` [PATCH v2 nf-next 3/3] netfilter: nf_tables: wrap tracing with a static key Florian Westphal
@ 2015-11-26 21:30   ` Patrick McHardy
  0 siblings, 0 replies; 15+ messages in thread
From: Patrick McHardy @ 2015-11-26 21:30 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel

[-- Attachment #1: Type: text/plain, Size: 1484 bytes --]

On 26.11, Florian Westphal wrote:
> Only needed when meta nftrace rule(s) were added.
> 
> Signed-off-by: Florian Westphal <fw@strlen.de>
> ---
>  Changes since v1:
>   - do not wrap rulenum++ into static key, because
>   it might get incremented too late when trace
>   rules are added/removed at high rate, giving wrong results.
>   - add EXPORT_SYMBOL for nft_meta_set_destroy.
> 
>  There was another followup patch to disable old tracing
>  infrastructure, but I dropped this for now.
> 
>  So, for the time being, you get both tracing formats.
> 
>  Should be easy to add sysctl later on to disable the old
>  format.

Just an observation so far: this really bloats nft_do_chain(), in my
compilation it adds 655 bytes and inlines all the tracing stuff multiple
times into critical spots. We really need to get this down.

First simple and easy attempt was the attached patch, it brings it down
to only +236b. But I think we can still do better.

One idea would be to wrap the local state in a struct. That way we
can pass all of it together using a single pointer. For that We could
use the current jump stack frame to hold also the *current* chain and
rule pointers if we can make sure the compiler will still use registers
for the current ones.

The nft_trace_notify() invocation should probably be moved to
__nft_trace_packet() and if still necessary that should be marked
noinline.

Just a couple of ideas for a start. We really need to avoid bloating
these critical loops.

[-- Attachment #2: x --]
[-- Type: text/plain, Size: 1142 bytes --]

diff --git a/net/netfilter/nf_tables_core.c b/net/netfilter/nf_tables_core.c
index 5d3d37b..414622a 100644
--- a/net/netfilter/nf_tables_core.c
+++ b/net/netfilter/nf_tables_core.c
@@ -45,14 +45,14 @@ static struct nf_loginfo trace_loginfo = {
 	},
 };
 
-static void __nft_trace_packet(const struct nft_pktinfo *pkt,
-			       const struct nft_chain *chain,
-			       int rulenum, enum nft_trace type)
+static noinline void __nft_trace_packet(struct nft_traceinfo *info,
+			       const struct nft_pktinfo *pkt,
+			       int rulenum)
 {
 	nf_log_trace(pkt->net, pkt->pf, pkt->hook, pkt->skb, pkt->in,
 		     pkt->out, &trace_loginfo, "TRACE: %s:%s:%s:%u ",
-		     chain->table->name, chain->name, comments[type],
-		     rulenum);
+		     info->chain->table->name, info->chain->name,
+		     comments[info->type], rulenum);
 }
 
 static inline void nft_trace_packet(struct nft_traceinfo *info,
@@ -73,7 +73,7 @@ static inline void nft_trace_packet(struct nft_traceinfo *info,
 		info->type = type;
 		nft_trace_notify(info, pkt);
 
-		__nft_trace_packet(pkt, chain, rulenum, type);
+		__nft_trace_packet(info, pkt, rulenum);
 	}
 }
 

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 libnftnl 1/3] src: add trace infrastructure support
  2015-11-26 17:59 ` [PATCH v2 libnftnl 1/3] src: add " Florian Westphal
@ 2015-11-26 22:11   ` Patrick McHardy
  0 siblings, 0 replies; 15+ messages in thread
From: Patrick McHardy @ 2015-11-26 22:11 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel

On 26.11, Florian Westphal wrote:
>   - add NFTNL_TRACE_QUEUE_ID to obtain nfqueue number
>   - NF_VERDICT is normalized in this case, i.e.
>   if verdict from kernel is '42 << 16 | NF_QUEUE', make
>   NF_QUEUE the verdict and put the queue number ->queue_id, accessible
>   via NFTNL_TRACE_QUEUE_ID.
> 
> +static int nftnl_trace_parse_verdict_cb(const struct nlattr *attr, void *data)
> +{
> +	int type = mnl_attr_get_type(attr);
> +	const struct nlattr **tb = data;
> +
> +	switch (type) {
> +	case NFTA_VERDICT_CODE:
> +		if (mnl_attr_validate(attr, MNL_TYPE_U32) < 0)
> +			abi_breakage();
> +		tb[type] = attr;
> +		break;
> +	case NFTA_VERDICT_CHAIN:
> +		if (mnl_attr_validate(attr, MNL_TYPE_STRING) < 0)
> +			abi_breakage();
> +		tb[type] = attr;
> +		break;
> +	}
> +
> +	return MNL_CB_OK;
> +}
> +
> +static void
> +nftnl_trace_parse_verdict(const struct nlattr *attr, struct nftnl_trace *t)
> +{
> +	struct nlattr *tb[NFTA_VERDICT_MAX+1];
> +
> +	mnl_attr_parse_nested(attr, nftnl_trace_parse_verdict_cb, tb);
> +
> +	if (!tb[NFTA_VERDICT_CODE])
> +		abi_breakage();
> +
> +	t->verdict = ntohl(mnl_attr_get_u32(tb[NFTA_VERDICT_CODE]));
> +	t->flags |= (1 << NFTNL_TRACE_VERDICT);
> +
> +	switch (t->verdict) {
> +	case NFT_GOTO: /* fallthough */
> +	case NFT_JUMP:
> +		if (!tb[NFTA_VERDICT_CHAIN])
> +			abi_breakage();
> +		t->jump_target = strdup(mnl_attr_get_str(tb[NFTA_VERDICT_CHAIN]));
> +		if (t->jump_target)
> +			t->flags |= (1 << NFTNL_TRACE_JUMP_TARGET);
> +		break;
> +	case NFT_RETURN: /* all other NFT_* cases fall through */
> +	case NFT_CONTINUE:
> +	case NFT_BREAK:
> +		break;
> +	case NF_ACCEPT: /* standard verdicts fall though */
> +	case NF_DROP:
> +	case NF_STOLEN:
> +	case NF_REPEAT:
> +	case NF_STOP:
> +		break;
> +	default: /* Unknown NF_ verdict, or verdict contains extra data */
> +		switch (t->verdict & NF_VERDICT_MASK) {
> +		case NF_QUEUE:
> +			t->queue_id = t->verdict >> 16;
> +			t->verdict = NF_QUEUE;
> +			t->flags |= (1 << NFTNL_TRACE_QUEUE_ID);
> +			break;
> +		}
> +		break;
> +	}
> +}

This doesn't handle NF_DROP_ERRNO() codes. I would suggest using
nftnl_parse_verdict() and struct nftnl_data_reg so we have a common
interface for verdicts. Mid term I'd prefer if we pass around verdict
structs and have an interface for those instead of having
IMM_VERDICT/IMM_CHAIN, SET_ELEM_VERDICT/SET_ELEM_CHAIN, ...

Regarding the NFTNL_TRACE_QUEUE_ID, I'm not convinced this is a good
way to go. We'd need another NFTNL_TRACE_DROP_VERDICT_ERRNO or something
like that, and the user needs to make another distinction on which further
attributes to get based on the verdict itself. It seems easier to just
decode it in the user and don't try to put too much knowledge in the
library, it's after all just meant for communication.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 nf-next 2/3] netfilter: nf_tables: extend tracing infrastructure
  2015-11-26 17:59 ` [PATCH v2 nf-next 2/3] netfilter: nf_tables: extend tracing infrastructure Florian Westphal
@ 2015-11-26 23:20   ` Patrick McHardy
  2015-11-27  0:10     ` Florian Westphal
  2015-11-27  8:47   ` Patrick McHardy
  1 sibling, 1 reply; 15+ messages in thread
From: Patrick McHardy @ 2015-11-26 23:20 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel

On 26.11, Florian Westphal wrote:
> nft monitor mode can then decode and display this trace data.

First round of comments below:

> diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
> index 101d7d7..59331ba 100644
> --- a/include/net/netfilter/nf_tables.h
> +++ b/include/net/netfilter/nf_tables.h
>  
> +int nft_verdict_dump(struct sk_buff *skb,
> +		     const struct nft_verdict *v, u16 type);

We usually use "struct sk_buff *skb, int attr, ...". I know int is actually
not the correct type but that's what all the nla_ functions use. Let's keep
it consistent please.

> +struct nft_traceinfo {
> +	const struct nft_chain *chain;
> +	const struct nft_rule *rule;
> +	const struct nft_verdict *verdict;
> +	enum nft_trace_types type;
> +
> +	bool packet_dumped;
> +};

Also consistent style please, all structs in that file indent the field
names neatly and they are usually also commented.

> +++ b/net/netfilter/nf_tables_core.c
> @@ -196,7 +214,8 @@ next_rule:
>  		goto next_rule;
>  	}
>  
> -	nft_trace_packet(pkt, basechain, -1, NFT_TRACE_POLICY);
> +	nft_trace_packet(&info, pkt, basechain, NULL, NULL, -1,
> +			 NFT_TRACETYPE_POLICY);

We unfortunately don't get the actual policy this way. I know we don't
have a verdict struct available, but would be nice to fix this.

> +++ b/net/netfilter/nf_tables_trace.c
> +static int nf_trace_fill_meta_info(struct sk_buff *nlskb,
> +				   const struct nft_pktinfo *pkt)
> +{
> +	if (pkt->in) {
> +		if (nla_put_be32(nlskb, NFTA_TRACE_IIF,
> +				 htonl(pkt->in->ifindex)))
> +			return -1;
> +
> +		/* needed for LL_HEADER decode */
> +		if (nla_put_be16(nlskb, NFTA_TRACE_IIFTYPE,
> +				 htons(pkt->in->type)))
> +			return -1;
> +	}
> +
> +	if (pkt->out &&
> +	    nla_put_be32(nlskb, NFTA_TRACE_OIF, htonl(pkt->out->ifindex)))
> +		return -1;

I think we might also need the OIFTYPE and the hook number for full coverage.
On the output path we don't have an iif but still might have an LL header.

> +static int nf_trace_fill_chain_info(struct sk_buff *nlskb,
> +				    const struct nft_chain *chain)
> +{
> +	if (nla_put_string(nlskb, NFTA_TRACE_TABLE, chain->table->name))
> +		return -1;
> +
> +	return nla_put_string(nlskb, NFTA_TRACE_CHAIN, chain->name);

I think we also need the netfilter family here. We get pkt->pf, but that
might differ from the table/chain family in case of NFPROTO_INET. Right
now we get IPv4/v6 and are unable to look up any objects since we're searching
in the wrong family.

> +void nft_trace_start(struct sk_buff *skb)
> +{
> +	u32 seq = atomic_inc_return(&trace_cnt) << NFT_ID_SKB_HASHBITS;
> +
> +	__this_cpu_write(trace_seq, seq);

So we're using a global counter to initialize the per CPU vars? That doesn't
seem to make sense. I was thinking about partitioning the ID space among
CPUs in advance, this global counter kind of defeats using per CPU at all.

Ok I think I see what you're doing here, you want a stable ID while the
packet is processed. The problem is we can not really guarantee that for
the output path with involuntary preemption, between different hook
invocations another packet might come along.

I actually didn't consider that we need to get the same ID multiple times
when I suggested the per CPU counter, I was only thinking about getting a
non-clashing ID *once*. I think we need to come up with a different idea,
sorry about that.

> +	skb->nf_trace = 1;
> +}
> +EXPORT_SYMBOL(nft_trace_start);

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 nf-next 2/3] netfilter: nf_tables: extend tracing infrastructure
  2015-11-26 23:20   ` Patrick McHardy
@ 2015-11-27  0:10     ` Florian Westphal
  2015-11-27  1:24       ` Patrick McHardy
  0 siblings, 1 reply; 15+ messages in thread
From: Florian Westphal @ 2015-11-27  0:10 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Florian Westphal, netfilter-devel

Patrick McHardy <kaber@trash.net> wrote:
> On 26.11, Florian Westphal wrote:
> > +int nft_verdict_dump(struct sk_buff *skb,
> > +		     const struct nft_verdict *v, u16 type);
> 
> We usually use "struct sk_buff *skb, int attr, ...". I know int is actually
> not the correct type but that's what all the nla_ functions use. Let's keep
> it consistent please.

OK.

> > +struct nft_traceinfo {
> > +	const struct nft_chain *chain;
> > +	const struct nft_rule *rule;
> > +	const struct nft_verdict *verdict;
> > +	enum nft_trace_types type;
> > +
> > +	bool packet_dumped;
> > +};
> 
> Also consistent style please, all structs in that file indent the field
> names neatly and they are usually also commented.

Fair enough.

> > +++ b/net/netfilter/nf_tables_core.c
> > @@ -196,7 +214,8 @@ next_rule:
> >  		goto next_rule;
> >  	}
> >  
> > -	nft_trace_packet(pkt, basechain, -1, NFT_TRACE_POLICY);
> > +	nft_trace_packet(&info, pkt, basechain, NULL, NULL, -1,
> > +			 NFT_TRACETYPE_POLICY);
> 
> We unfortunately don't get the actual policy this way. I know we don't
> have a verdict struct available, but would be nice to fix this.

Suggestions?  Should I add a new attribute or pretend policy was a
verdict? (i.e., add fake verdict struct and set verdict code to policy)?

> > +	if (pkt->out &&
> > +	    nla_put_be32(nlskb, NFTA_TRACE_OIF, htonl(pkt->out->ifindex)))
> > +		return -1;
> 
> I think we might also need the OIFTYPE and the hook number for full coverage.
> On the output path we don't have an iif but still might have an LL header.

Okay. Any preference on how to do this?
I.e, should I include OIFTYPE only if no indev is available?

Alternatives would be to include both and let userspace decide
or use more generic DEV_TYPE attr that is set to indev->type,
and, if no indev is there, outdev->type?

> > +static int nf_trace_fill_chain_info(struct sk_buff *nlskb,
> > +				    const struct nft_chain *chain)
> > +{
> > +	if (nla_put_string(nlskb, NFTA_TRACE_TABLE, chain->table->name))
> > +		return -1;
> > +
> > +	return nla_put_string(nlskb, NFTA_TRACE_CHAIN, chain->name);
> 
> I think we also need the netfilter family here. We get pkt->pf, but that
> might differ from the table/chain family in case of NFPROTO_INET.

Good point, will add NFTA_TRACE_FAMILY

> now we get IPv4/v6 and are unable to look up any objects since we're searching
> in the wrong family.
> 
> > +void nft_trace_start(struct sk_buff *skb)
> > +{
> > +	u32 seq = atomic_inc_return(&trace_cnt) << NFT_ID_SKB_HASHBITS;
> > +
> > +	__this_cpu_write(trace_seq, seq);
> 
> So we're using a global counter to initialize the per CPU vars? That doesn't
> seem to make sense. I was thinking about partitioning the ID space among
> CPUs in advance, this global counter kind of defeats using per CPU at all.
> 
> Ok I think I see what you're doing here, you want a stable ID while the
> packet is processed.

Yes, that was the idea -- which is why v1 used hash_ptr(skb)

> The problem is we can not really guarantee that for
> the output path with involuntary preemption, between different hook
> invocations another packet might come along.

Yes.

> I actually didn't consider that we need to get the same ID multiple times
> when I suggested the per CPU counter, I was only thinking about getting a
> non-clashing ID *once*.

Oh.  What would be the purpose of that?
The idea was to allow to easily figure out which trace messages belong
to the same packet. (i.e., don't make it difficult to dissect events
from different packets being processed on other cpus...)

> I think we need to come up with a different idea,
> sorry about that.

Hmm, in that case I have no idea how to fix this, sorry.
Only *stable* id is hashing skb address, but as you pointed out
that causes fast id reuse

(I think that would be ok, since its good enough to tell
 near-simultaneous trace events apart).


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 nf-next 2/3] netfilter: nf_tables: extend tracing infrastructure
  2015-11-27  0:10     ` Florian Westphal
@ 2015-11-27  1:24       ` Patrick McHardy
  0 siblings, 0 replies; 15+ messages in thread
From: Patrick McHardy @ 2015-11-27  1:24 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel

On 27.11, Florian Westphal wrote:
> Patrick McHardy <kaber@trash.net> wrote:
> > > @@ -196,7 +214,8 @@ next_rule:
> > >  		goto next_rule;
> > >  	}
> > >  
> > > -	nft_trace_packet(pkt, basechain, -1, NFT_TRACE_POLICY);
> > > +	nft_trace_packet(&info, pkt, basechain, NULL, NULL, -1,
> > > +			 NFT_TRACETYPE_POLICY);
> > 
> > We unfortunately don't get the actual policy this way. I know we don't
> > have a verdict struct available, but would be nice to fix this.
> 
> Suggestions?  Should I add a new attribute or pretend policy was a
> verdict? (i.e., add fake verdict struct and set verdict code to policy)?

I don't see any other alternative right now.

> > > +	if (pkt->out &&
> > > +	    nla_put_be32(nlskb, NFTA_TRACE_OIF, htonl(pkt->out->ifindex)))
> > > +		return -1;
> > 
> > I think we might also need the OIFTYPE and the hook number for full coverage.
> > On the output path we don't have an iif but still might have an LL header.
> 
> Okay. Any preference on how to do this?
> I.e, should I include OIFTYPE only if no indev is available?
> 
> Alternatives would be to include both and let userspace decide
> or use more generic DEV_TYPE attr that is set to indev->type,
> and, if no indev is there, outdev->type?

One of both *seems* fine. If we have an iif, the header is still based
on the received packet. If we only have an oif, the header, if present,
will be based on that type, so I think that should work.

> > > +static int nf_trace_fill_chain_info(struct sk_buff *nlskb,
> > > +				    const struct nft_chain *chain)
> > > +{
> > > +	if (nla_put_string(nlskb, NFTA_TRACE_TABLE, chain->table->name))
> > > +		return -1;
> > > +
> > > +	return nla_put_string(nlskb, NFTA_TRACE_CHAIN, chain->name);
> > 
> > I think we also need the netfilter family here. We get pkt->pf, but that
> > might differ from the table/chain family in case of NFPROTO_INET.
> 
> Good point, will add NFTA_TRACE_FAMILY

Actually thinking about that, all the other nftables messages use the
netfilter family for nfgen_family, so for consistency we should do that
as well and put pkt->pf in the new attribute.

> > now we get IPv4/v6 and are unable to look up any objects since we're searching
> > in the wrong family.
> > 
> > > +void nft_trace_start(struct sk_buff *skb)
> > > +{
> > > +	u32 seq = atomic_inc_return(&trace_cnt) << NFT_ID_SKB_HASHBITS;
> > > +
> > > +	__this_cpu_write(trace_seq, seq);
> > 
> > So we're using a global counter to initialize the per CPU vars? That doesn't
> > seem to make sense. I was thinking about partitioning the ID space among
> > CPUs in advance, this global counter kind of defeats using per CPU at all.
> > 
> > Ok I think I see what you're doing here, you want a stable ID while the
> > packet is processed.
> 
> Yes, that was the idea -- which is why v1 used hash_ptr(skb)
> 
> > The problem is we can not really guarantee that for
> > the output path with involuntary preemption, between different hook
> > invocations another packet might come along.
> 
> Yes.
> 
> > I actually didn't consider that we need to get the same ID multiple times
> > when I suggested the per CPU counter, I was only thinking about getting a
> > non-clashing ID *once*.
> 
> Oh.  What would be the purpose of that?
> The idea was to allow to easily figure out which trace messages belong
> to the same packet. (i.e., don't make it difficult to dissect events
> from different packets being processed on other cpus...)

Yeah, I didn't consider that we need to get the same identifier in
multiple hook invocations. 

> > I think we need to come up with a different idea,
> > sorry about that.
> 
> Hmm, in that case I have no idea how to fix this, sorry.
> Only *stable* id is hashing skb address, but as you pointed out
> that causes fast id reuse
> 
> (I think that would be ok, since its good enough to tell
>  near-simultaneous trace events apart).

I also don't have a better suggestion right now. We could try to find
additional skb members that are stable within netfilter, but other than
that I have no better ideas.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 0/3] src: trace infrastructure support
  2015-11-26 17:59 [PATCH v2 0/3] src: trace infrastructure support Florian Westphal
                   ` (3 preceding siblings ...)
  2015-11-26 20:53 ` [PATCH v2 0/3] src: trace infrastructure support Patrick McHardy
@ 2015-11-27  1:33 ` Patrick McHardy
  4 siblings, 0 replies; 15+ messages in thread
From: Patrick McHardy @ 2015-11-27  1:33 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel

On 26.11, Florian Westphal wrote:
> This is V2 of the nft trace patch set.
> 
> See the individual patches for a change vs. v1 description.
> 
> Patrick: I did a quick test w. your last public patches
> (plus a few minor changes to make it build again) and things
> seem to work fine.
> 
> What does not work correctly is decoding of vlan header.
> I think whats happening is that we cannot splice sub-byte
> sized quantities from a larger chunk.

I've managed to decode it properly, so the kernel part is fine:

ether saddr 63:f6:4b:00:54:52 ether daddr c9:4b:a9:00:54:52 vlan pcp 0 vlan cfi 0 vlan id 1000 vlan type ip ip saddr 10.0.0.1 ip daddr 10.0.0.2 ip tos 0 ip ttl 64 ip id 61847 ip length 84 icmp type echo-request icmp code 0 icmp id 29049 icmp sequence 4267 icmp type 61 icmp code 177 icmp id 0 icmp sequence 0 icmp type 116 icmp code 211

There are a number of problems however. The easy one is wrong header
definitions in nft, also my decoding is a bit hackish so far and occasionally
causes endless loops, I'll fix that.

The somewhat bigger problem are some inconsistencies. Usually we can see
the original lower layer header on the receive path in the higher layers,
f.i. the ethernet header. With VLAN offloading this doesn't work since we
loose the meta data once the packet has been received through the VLAN code.
This means we'll just see a regular ethernet header.

This does not only affect tracing, but anything that wants to access that
header. Its not a *huge* problem, but it is unexpected for the user,
especially since he is able to treat the offloaded header like a regular
VLAN header before "decapsulation".

I have no good suggestion what to do about that. The information in the skb
is gone. The only possibility I can think of is to reconstruct (most of) it
based on the incoming device, the known upper protocol and skb->priority,
but that's quite ugly.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 nf-next 2/3] netfilter: nf_tables: extend tracing infrastructure
  2015-11-26 17:59 ` [PATCH v2 nf-next 2/3] netfilter: nf_tables: extend tracing infrastructure Florian Westphal
  2015-11-26 23:20   ` Patrick McHardy
@ 2015-11-27  8:47   ` Patrick McHardy
  2015-11-27  9:23     ` Florian Westphal
  1 sibling, 1 reply; 15+ messages in thread
From: Patrick McHardy @ 2015-11-27  8:47 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel

Some more comments on use of the netlink functions below:

On 26.11, Florian Westphal wrote:
> +++ b/net/netfilter/nf_tables_trace.c
> @@ -0,0 +1,253 @@
> +static int trace_fill_header(struct sk_buff *nlskb, u16 type,
> +			     const struct sk_buff *skb,
> +			     int off, unsigned int len)
> +{
> +	struct nlattr *nla;
> +
> +	if (len == 0)
> +		return 0;
> +
> +	if (skb_tailroom(nlskb) < nla_total_size(len))
> +		return -1;
> +
> +	nla = (struct nlattr *)skb_put(nlskb, nla_total_size(len));
> +	nla->nla_type = type;
> +	nla->nla_len = nla_attr_size(len);

nla_reserve()? That will also take care of zeroing the padding.

> +
> +	if (skb_copy_bits(skb, off, nla_data(nla), len))
> +		return -1;
> +
> +	return 0;
> +}
> +
> +static int nf_trace_fill_meta_info(struct sk_buff *nlskb,
> +				   const struct nft_pktinfo *pkt)
> +{
> +	if (pkt->in) {
> +		if (nla_put_be32(nlskb, NFTA_TRACE_IIF,
> +				 htonl(pkt->in->ifindex)))
> +			return -1;
> +
> +		/* needed for LL_HEADER decode */
> +		if (nla_put_be16(nlskb, NFTA_TRACE_IIFTYPE,
> +				 htons(pkt->in->type)))
> +			return -1;
> +	}
> +
> +	if (pkt->out &&
> +	    nla_put_be32(nlskb, NFTA_TRACE_OIF, htonl(pkt->out->ifindex)))
> +		return -1;
> +
> +	return 0;
> +}
> +
> +static int nf_trace_fill_ll_header(struct sk_buff *nlskb,
> +				   const struct sk_buff *skb)
> +{
> +	struct vlan_ethhdr veth;
> +	unsigned int len, off;
> +
> +	off = skb_mac_header(skb) - skb->data;

off is negative or zero, it should use a signed int I think.

> +
> +	if (skb_copy_bits(skb, off, &veth, ETH_HLEN))
> +		return -1;
> +
> +	veth.h_vlan_proto = skb->vlan_proto;
> +	veth.h_vlan_TCI = htons(skb_vlan_tag_get(skb));
> +	veth.h_vlan_encapsulated_proto = skb->protocol;
> +
> +	len = min_t(unsigned int,
> +		    skb->data - skb_mac_header(skb), NFT_TRACETYPE_LL_HSIZE);

min_t(unsigned int, -off, NFT_TRACETYPE_LL_HSIZE) ?

> +
> +	if (WARN_ON_ONCE(len != ETH_HLEN))
> +		return -1;

Why the min_t at all if we fail on != ETH_HLEN? I'd simply check that
it is exactly ETH_HLEN before even starting copying.

> +
> +	len = sizeof(veth);
> +	if (skb_tailroom(nlskb) < nla_total_size(len))
> +		return -1;
> +
> +	return nla_put(nlskb, NFTA_TRACE_LL_HEADER, len, &veth);

nla_put() already checks the tailroom.

> +}
> +
> +static int nf_trace_fill_pkt_info(struct sk_buff *nlskb,
> +				  const struct nft_pktinfo *pkt)
> +{
> +	const struct sk_buff *skb = pkt->skb;
> +	unsigned int len = min_t(unsigned int,
> +				 pkt->xt.thoff - skb_network_offset(skb),
> +				 NFT_TRACETYPE_NETWORK_HSIZE);
> +	int off = skb_network_offset(skb);
> +
> +	if (trace_fill_header(nlskb, NFTA_TRACE_NETWORK_HEADER, skb, off, len))
> +		return -1;
> +
> +	len = min_t(unsigned int, skb->len - pkt->xt.thoff,
> +		    NFT_TRACETYPE_TRANSPORT_HSIZE);
> +
> +	if (trace_fill_header(nlskb, NFTA_TRACE_TRANSPORT_HEADER, skb,
> +			      pkt->xt.thoff, len))
> +		return -1;
> +
> +	if (!skb_mac_header_was_set(skb))
> +		return 0;
> +
> +	if (skb_vlan_tag_get(skb))
> +		return nf_trace_fill_ll_header(nlskb, skb);
> +
> +	len = min_t(unsigned int,
> +		    skb->data - skb_mac_header(skb), NFT_TRACETYPE_LL_HSIZE);
> +	off = skb_mac_header(skb) - skb->data;

Could again avoid the double calculation by using -off in min_t().

> +	return trace_fill_header(nlskb, NFTA_TRACE_LL_HEADER,
> +				 skb, off, len);
> +}
> +
> +static int nf_trace_fill_chain_info(struct sk_buff *nlskb,
> +				    const struct nft_chain *chain)
> +{
> +	if (nla_put_string(nlskb, NFTA_TRACE_TABLE, chain->table->name))
> +		return -1;
> +
> +	return nla_put_string(nlskb, NFTA_TRACE_CHAIN, chain->name);

I would not move simple stuff that is only used once like this to another
function. Easier to read if you don't have to jump around in the code.

> +}
> +
> +void nft_trace_notify(struct nft_traceinfo *info,
> +		      const struct nft_pktinfo *pkt)
> +{
> +	struct nfgenmsg *nfmsg;
> +	struct nlmsghdr *nlh;
> +	struct sk_buff *skb;
> +	unsigned int size;
> +	int event = (NFNL_SUBSYS_NFTABLES << 8) | NFT_MSG_TRACE;
> +
> +	if (!nfnetlink_has_listeners(pkt->net, NFNLGRP_NFTRACE))
> +		return;
> +
> +	size = nlmsg_total_size(sizeof(struct nfgenmsg)) +
> +		nla_total_size(NFT_TABLE_MAXNAMELEN) +
> +		nla_total_size(NFT_CHAIN_MAXNAMELEN) +
> +		nla_total_size(sizeof(__be64)) +	/* rule handle */
> +		nla_total_size(sizeof(__be32)) +	/* trace type */
> +		nla_total_size(0) +			/* VERDICT, nested */
> +			nla_total_size(sizeof(u32)) +	/* verdict code */
> +			nla_total_size(NFT_CHAIN_MAXNAMELEN) + /* jump target */
> +		nla_total_size(sizeof(u32)) +		/* id */
> +		nla_total_size(NFT_TRACETYPE_LL_HSIZE) +
> +		nla_total_size(NFT_TRACETYPE_NETWORK_HSIZE) +
> +		nla_total_size(NFT_TRACETYPE_TRANSPORT_HSIZE) +
> +		nla_total_size(sizeof(u32)) +		/* mark */
> +		nla_total_size(sizeof(u32)) +		/* iif */
> +		nla_total_size(sizeof(u32)) +		/* oif */
> +		nla_total_size(sizeof(__be16));		/* device type */
> +
> +	skb = nlmsg_new(size, GFP_ATOMIC);
> +	if (!skb)
> +		return;
> +
> +	nlh = nlmsg_put(skb, 0, 0, event, sizeof(struct nfgenmsg), 0);
> +	if (!nlh)
> +		goto nla_put_failure;
> +
> +	nfmsg = nlmsg_data(nlh);
> +	nfmsg->nfgen_family	= pkt->pf;
> +	nfmsg->version		= NFNETLINK_V0;
> +	nfmsg->res_id		= 0;
> +
> +	if (nla_put_be32(skb, NFTA_TRACE_TYPE, htonl(info->type)))
> +		goto nla_put_failure;
> +
> +	if (trace_fill_id(skb, (unsigned long)pkt->skb))
> +		goto nla_put_failure;
> +
> +	if (info->chain &&
> +	    nf_trace_fill_chain_info(skb, info->chain))
> +		goto nla_put_failure;
> +
> +	if (info->rule &&
> +	    nla_put_be64(skb, NFTA_TRACE_RULE_HANDLE,
> +			 cpu_to_be64(info->rule->handle)))
> +		goto nla_put_failure;
> +
> +	if ((info->type == NFT_TRACETYPE_RETURN ||
> +	     info->type == NFT_TRACETYPE_RULE) &&
> +	    nft_verdict_dump(skb, info->verdict, NFTA_TRACE_VERDICT))
> +		goto nla_put_failure;
> +
> +	if (pkt->skb->mark &&
> +	    nla_put_be32(skb, NFTA_TRACE_MARK, htonl(pkt->skb->mark)))
> +		goto nla_put_failure;
> +
> +	if (!info->packet_dumped) {
> +		if (nf_trace_fill_meta_info(skb, pkt))
> +			goto nla_put_failure;
> +
> +		if (nf_trace_fill_pkt_info(skb, pkt))
> +			goto nla_put_failure;
> +		info->packet_dumped = true;
> +	}
> +
> +	nlmsg_end(skb, nlh);
> +	nfnetlink_send(skb, pkt->net, 0, NFNLGRP_NFTRACE, 0, GFP_ATOMIC);
> +	return;
> +
> + nla_put_failure:
> +	WARN_ON_ONCE(1);
> +	kfree_skb(skb);
> +}
> +
> +void nft_trace_start(struct sk_buff *skb)
> +{
> +	u32 seq = atomic_inc_return(&trace_cnt) << NFT_ID_SKB_HASHBITS;
> +
> +	__this_cpu_write(trace_seq, seq);
> +	skb->nf_trace = 1;
> +}

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 nf-next 2/3] netfilter: nf_tables: extend tracing infrastructure
  2015-11-27  8:47   ` Patrick McHardy
@ 2015-11-27  9:23     ` Florian Westphal
  0 siblings, 0 replies; 15+ messages in thread
From: Florian Westphal @ 2015-11-27  9:23 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Florian Westphal, netfilter-devel

Patrick McHardy <kaber@trash.net> wrote:
> On 26.11, Florian Westphal wrote:
> > +++ b/net/netfilter/nf_tables_trace.c
> > @@ -0,0 +1,253 @@
> > +static int trace_fill_header(struct sk_buff *nlskb, u16 type,
> > +			     const struct sk_buff *skb,
> > +			     int off, unsigned int len)
> > +{
> > +	struct nlattr *nla;
> > +
> > +	if (len == 0)
> > +		return 0;
> > +
> > +	if (skb_tailroom(nlskb) < nla_total_size(len))
> > +		return -1;
> > +
> > +	nla = (struct nlattr *)skb_put(nlskb, nla_total_size(len));
> > +	nla->nla_type = type;
> > +	nla->nla_len = nla_attr_size(len);
> 
> nla_reserve()? That will also take care of zeroing the padding.

Fixed, thanks.

> > +static int nf_trace_fill_ll_header(struct sk_buff *nlskb,
> > +				   const struct sk_buff *skb)
> > +{
> > +	struct vlan_ethhdr veth;
> > +	unsigned int len, off;
> > +
> > +	off = skb_mac_header(skb) - skb->data;
> 
> off is negative or zero, it should use a signed int I think.

Right, fixed.

> > +
> > +	if (skb_copy_bits(skb, off, &veth, ETH_HLEN))
> > +		return -1;
> > +
> > +	veth.h_vlan_proto = skb->vlan_proto;
> > +	veth.h_vlan_TCI = htons(skb_vlan_tag_get(skb));
> > +	veth.h_vlan_encapsulated_proto = skb->protocol;
> > +
> > +	len = min_t(unsigned int,
> > +		    skb->data - skb_mac_header(skb), NFT_TRACETYPE_LL_HSIZE);
> 
> min_t(unsigned int, -off, NFT_TRACETYPE_LL_HSIZE) ?

[..]

> Why the min_t at all if we fail on != ETH_HLEN? I'd simply check that
> it is exactly ETH_HLEN before even starting copying.

Good point, I removed min test completely and changed it to:

    int off;

    BUILD_BUG_ON(NFT_TRACETYPE_LL_HSIZE < ETH_HLEN);

    off = skb_mac_header(skb) - skb->data;
    if (len != -ETH_HLEN)
            return -1;
    ...
> > +
> > +	len = sizeof(veth);
> > +	if (skb_tailroom(nlskb) < nla_total_size(len))
> > +		return -1;
> > +
> > +	return nla_put(nlskb, NFTA_TRACE_LL_HEADER, len, &veth);
> 
> nla_put() already checks the tailroom.

Right, my bad.  Removed.

> > +	len = min_t(unsigned int,
> > +		    skb->data - skb_mac_header(skb), NFT_TRACETYPE_LL_HSIZE);
> > +	off = skb_mac_header(skb) - skb->data;
> 
> Could again avoid the double calculation by using -off in min_t().

Fixed.

> > +static int nf_trace_fill_chain_info(struct sk_buff *nlskb,
> > +				    const struct nft_chain *chain)
> > +{
> > +	if (nla_put_string(nlskb, NFTA_TRACE_TABLE, chain->table->name))
> > +		return -1;
> > +
> > +	return nla_put_string(nlskb, NFTA_TRACE_CHAIN, chain->name);
> 
> I would not move simple stuff that is only used once like this to another
> function. Easier to read if you don't have to jump around in the code.

Ok, open-coded this instead.

Thanks for the review, I'll send a v3 soon.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v2 nf-next 2/3] netfilter: nf_tables: extend tracing infrastructure
  2015-11-27 18:43 GIT: [PATCH v3 0/3] netfilter " Florian Westphal
@ 2015-11-27 18:43 ` Florian Westphal
  2015-11-28 12:36   ` Patrick McHardy
  0 siblings, 1 reply; 15+ messages in thread
From: Florian Westphal @ 2015-11-27 18:43 UTC (permalink / raw)
  To: netfilter-devel; +Cc: Florian Westphal

nft monitor mode can then decode and display this trace data.

Parts of LL/Network/Transport headers are provided as separate
attributes.

Otherwise, printing IP address data becomes virtually impossible
for userspace since in the case of the netdev family we really don't
want userspace to have to know all the possible link layer types
and/or sizes just to display/print an ip address.

We also don't want userspace to have to follow ipv6 header chains
to get the s/dport info, the kernel already did this work for us.

To avoid bloating nft_do_chain all data required for tracing is
encapsulated in nft_traceinfo.

The structure is initialized unconditionally(!) for each nft_do_chain
invocation.

This unconditionall call will be moved under a static key in a
followup patch.

With lots of help from Patrick McHardy and Pablo Neira.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
Changes since v2:
 - set nfgen_family to the rule family, not pkt->pf.
   To get the 'logical' family (e.g. NFPROTO_INET), we
   keep basechain ptr in nft_traceinfo struct.
 - export oif type, policy and pkt->pf
 - miniminze bloating of nft_do_chain
 - revert to a simpler id for now, based on skb address and
   rx hash
 - several cosmetic changes

 Changes since v1:
  - move trace into new file, nf_tables_trace.c (Pablo)
  - add new netlink group for trace messages (Pablo, Patrick)
  - make verdict nested attr (Patrick)
  - add percpu area to store current processed skb id
  (suggested by Patrick)
  - fold arguments to nft_trace_notify into a structure,
  nft_traceinfo.  This would allow easier extension later
  on, e.g. ptr to registers to e.g. log NFT_BREAK/mismatch
  as well.

 include/net/netfilter/nf_tables.h        |  32 ++++
 include/uapi/linux/netfilter/nf_tables.h |  52 ++++++
 include/uapi/linux/netfilter/nfnetlink.h |   2 +
 net/netfilter/Makefile                   |   2 +-
 net/netfilter/nf_tables_api.c            |  12 +-
 net/netfilter/nf_tables_core.c           |  45 ++++--
 net/netfilter/nf_tables_trace.c          | 269 +++++++++++++++++++++++++++++++
 net/netfilter/nfnetlink.c                |   1 +
 8 files changed, 396 insertions(+), 19 deletions(-)
 create mode 100644 net/netfilter/nf_tables_trace.c

diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
index 101d7d7..b313cda 100644
--- a/include/net/netfilter/nf_tables.h
+++ b/include/net/netfilter/nf_tables.h
@@ -888,6 +888,38 @@ void nft_unregister_chain_type(const struct nf_chain_type *);
 int nft_register_expr(struct nft_expr_type *);
 void nft_unregister_expr(struct nft_expr_type *);
 
+int nft_verdict_dump(struct sk_buff *skb, int type,
+		     const struct nft_verdict *v);
+
+/**
+ *	struct nft_traceinfo - nft tracing information and state
+ *
+ *	@pkt: pktinfo currently processed
+ *	@basechain: base chain currently processed
+ *	@chain: chain currently processed
+ *	@rule:  rule that was evaluated
+ *	@verdict: verdict given by rule
+ *	@type: event type (enum nft_trace_types)
+ *	@packet_dumped: packet headers sent in a previous traceinfo message
+ *	@trace: other struct members are initialised
+ */
+struct nft_traceinfo {
+	const struct nft_pktinfo	*pkt;
+	const struct nft_base_chain	*basechain;
+	const struct nft_chain		*chain;
+	const struct nft_rule		*rule;
+	const struct nft_verdict	*verdict;
+	enum nft_trace_types		type;
+	bool				packet_dumped;
+	bool				trace;
+};
+
+void nft_trace_init(struct nft_traceinfo *info, const struct nft_pktinfo *pkt,
+		    const struct nft_verdict *verdict,
+		    const struct nft_chain *basechain);
+
+void nft_trace_notify(struct nft_traceinfo *info);
+
 #define nft_dereference(p)					\
 	nfnl_dereference(p, NFNL_SUBSYS_NFTABLES)
 
diff --git a/include/uapi/linux/netfilter/nf_tables.h b/include/uapi/linux/netfilter/nf_tables.h
index 5f3ecec..b48a3ab 100644
--- a/include/uapi/linux/netfilter/nf_tables.h
+++ b/include/uapi/linux/netfilter/nf_tables.h
@@ -83,6 +83,7 @@ enum nft_verdicts {
  * @NFT_MSG_DELSETELEM: delete a set element (enum nft_set_elem_attributes)
  * @NFT_MSG_NEWGEN: announce a new generation, only for events (enum nft_gen_attributes)
  * @NFT_MSG_GETGEN: get the rule-set generation (enum nft_gen_attributes)
+ * @NFT_MSG_TRACE: trace event (enum nft_trace_attributes)
  */
 enum nf_tables_msg_types {
 	NFT_MSG_NEWTABLE,
@@ -102,6 +103,7 @@ enum nf_tables_msg_types {
 	NFT_MSG_DELSETELEM,
 	NFT_MSG_NEWGEN,
 	NFT_MSG_GETGEN,
+	NFT_MSG_TRACE,
 	NFT_MSG_MAX,
 };
 
@@ -987,4 +989,54 @@ enum nft_gen_attributes {
 };
 #define NFTA_GEN_MAX		(__NFTA_GEN_MAX - 1)
 
+/**
+ * enum nft_trace_attributes - nf_tables trace netlink attributes
+ *
+ * @NFTA_TRACE_TABLE: name of the table (NLA_STRING)
+ * @NFTA_TRACE_CHAIN: name of the chain (NLA_STRING)
+ * @NFTA_TRACE_RULE_HANDLE: numeric handle of the rule (NLA_U64)
+ * @NFTA_TRACE_TYPE: type of the event (NLA_U32: nft_trace_types)
+ * @NFTA_TRACE_VERDICT: verdict returned by hook (NLA_NESTED: nft_verdicts)
+ * @NFTA_TRACE_ID: pseudo-id, same for each skb traced (NLA_U32)
+ * @NFTA_TRACE_LL_HEADER: linklayer header (NLA_BINARY)
+ * @NFTA_TRACE_NETWORK_HEADER: network header (NLA_BINARY)
+ * @NFTA_TRACE_TRANSPORT_HEADER: transport header (NLA_BINARY)
+ * @NFTA_TRACE_IIF: indev ifindex (NLA_U32)
+ * @NFTA_TRACE_IIFTYPE: netdev->type of indev (NLA_U16)
+ * @NFTA_TRACE_OIF: outdev ifindex (NLA_U32)
+ * @NFTA_TRACE_OIFTYPE: netdev->type of outdev (NLA_U16)
+ * @NFTA_TRACE_MARK: nfmark (NLA_U32)
+ * @NFTA_TRACE_NFPROTO: nf protocol processed (NLA_U32)
+ * @NFTA_TRACE_POLICY: policy that decided fate of packet (NLA_U32)
+ */
+enum nft_trace_attibutes {
+	NFTA_TRACE_UNSPEC,
+	NFTA_TRACE_TABLE,
+	NFTA_TRACE_CHAIN,
+	NFTA_TRACE_RULE_HANDLE,
+	NFTA_TRACE_TYPE,
+	NFTA_TRACE_VERDICT,
+	NFTA_TRACE_ID,
+	NFTA_TRACE_LL_HEADER,
+	NFTA_TRACE_NETWORK_HEADER,
+	NFTA_TRACE_TRANSPORT_HEADER,
+	NFTA_TRACE_IIF,
+	NFTA_TRACE_IIFTYPE,
+	NFTA_TRACE_OIF,
+	NFTA_TRACE_OIFTYPE,
+	NFTA_TRACE_MARK,
+	NFTA_TRACE_NFPROTO,
+	NFTA_TRACE_POLICY,
+	__NFTA_TRACE_MAX
+};
+#define NFTA_TRACE_MAX (__NFTA_TRACE_MAX - 1)
+
+enum nft_trace_types {
+	NFT_TRACETYPE_UNSPEC,
+	NFT_TRACETYPE_POLICY,
+	NFT_TRACETYPE_RETURN,
+	NFT_TRACETYPE_RULE,
+	__NFT_TRACETYPE_MAX
+};
+#define NFT_TRACETYPE_MAX (__NFT_TRACETYPE_MAX - 1)
 #endif /* _LINUX_NF_TABLES_H */
diff --git a/include/uapi/linux/netfilter/nfnetlink.h b/include/uapi/linux/netfilter/nfnetlink.h
index 354a7e5..4bb8cb7 100644
--- a/include/uapi/linux/netfilter/nfnetlink.h
+++ b/include/uapi/linux/netfilter/nfnetlink.h
@@ -22,6 +22,8 @@ enum nfnetlink_groups {
 #define NFNLGRP_NFTABLES                NFNLGRP_NFTABLES
 	NFNLGRP_ACCT_QUOTA,
 #define NFNLGRP_ACCT_QUOTA		NFNLGRP_ACCT_QUOTA
+	NFNLGRP_NFTRACE,
+#define NFNLGRP_NFTRACE			NFNLGRP_NFTRACE
 	__NFNLGRP_MAX,
 };
 #define NFNLGRP_MAX	(__NFNLGRP_MAX - 1)
diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index 7638c36..2293484 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -67,7 +67,7 @@ obj-$(CONFIG_NF_NAT_TFTP) += nf_nat_tftp.o
 obj-$(CONFIG_NETFILTER_SYNPROXY) += nf_synproxy_core.o
 
 # nf_tables
-nf_tables-objs += nf_tables_core.o nf_tables_api.o
+nf_tables-objs += nf_tables_core.o nf_tables_api.o nf_tables_trace.o
 nf_tables-objs += nft_immediate.o nft_cmp.o nft_lookup.o nft_dynset.o
 nf_tables-objs += nft_bitwise.o nft_byteorder.o nft_payload.o
 
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 93cc473..c4969a0 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -4446,22 +4446,22 @@ static void nft_verdict_uninit(const struct nft_data *data)
 	}
 }
 
-static int nft_verdict_dump(struct sk_buff *skb, const struct nft_data *data)
+int nft_verdict_dump(struct sk_buff *skb, int type, const struct nft_verdict *v)
 {
 	struct nlattr *nest;
 
-	nest = nla_nest_start(skb, NFTA_DATA_VERDICT);
+	nest = nla_nest_start(skb, type);
 	if (!nest)
 		goto nla_put_failure;
 
-	if (nla_put_be32(skb, NFTA_VERDICT_CODE, htonl(data->verdict.code)))
+	if (nla_put_be32(skb, NFTA_VERDICT_CODE, htonl(v->code)))
 		goto nla_put_failure;
 
-	switch (data->verdict.code) {
+	switch (v->code) {
 	case NFT_JUMP:
 	case NFT_GOTO:
 		if (nla_put_string(skb, NFTA_VERDICT_CHAIN,
-				   data->verdict.chain->name))
+				   v->chain->name))
 			goto nla_put_failure;
 	}
 	nla_nest_end(skb, nest);
@@ -4572,7 +4572,7 @@ int nft_data_dump(struct sk_buff *skb, int attr, const struct nft_data *data,
 		err = nft_value_dump(skb, data, len);
 		break;
 	case NFT_DATA_VERDICT:
-		err = nft_verdict_dump(skb, data);
+		err = nft_verdict_dump(skb, NFTA_DATA_VERDICT, &data->verdict);
 		break;
 	default:
 		err = -EINVAL;
diff --git a/net/netfilter/nf_tables_core.c b/net/netfilter/nf_tables_core.c
index f3695a4..2395de7 100644
--- a/net/netfilter/nf_tables_core.c
+++ b/net/netfilter/nf_tables_core.c
@@ -44,22 +44,36 @@ static struct nf_loginfo trace_loginfo = {
 	},
 };
 
-static void __nft_trace_packet(const struct nft_pktinfo *pkt,
-			       const struct nft_chain *chain,
-			       int rulenum, enum nft_trace type)
+static noinline void __nft_trace_packet(struct nft_traceinfo *info,
+					const struct nft_chain *chain,
+					int rulenum, enum nft_trace type)
 {
+	const struct nft_pktinfo *pkt = info->pkt;
+
+	if (!pkt->skb->nf_trace)
+		return;
+
+	info->chain = chain;
+	info->type = type;
+
+	nft_trace_notify(info);
+
 	nf_log_trace(pkt->net, pkt->pf, pkt->hook, pkt->skb, pkt->in,
 		     pkt->out, &trace_loginfo, "TRACE: %s:%s:%s:%u ",
 		     chain->table->name, chain->name, comments[type],
 		     rulenum);
 }
 
-static inline void nft_trace_packet(const struct nft_pktinfo *pkt,
+static inline void nft_trace_packet(struct nft_traceinfo *info,
 				    const struct nft_chain *chain,
-				    int rulenum, enum nft_trace type)
+				    const struct nft_rule *rule,
+				    int rulenum,
+				    enum nft_trace_types type)
 {
-	if (unlikely(pkt->skb->nf_trace))
-		__nft_trace_packet(pkt, chain, rulenum, type);
+	if (unlikely(info->trace)) {
+		info->rule = rule;
+		__nft_trace_packet(info, chain, rulenum, type);
+	}
 }
 
 static void nft_cmp_fast_eval(const struct nft_expr *expr,
@@ -121,7 +135,9 @@ nft_do_chain(struct nft_pktinfo *pkt, void *priv)
 	struct nft_stats *stats;
 	int rulenum;
 	unsigned int gencursor = nft_genmask_cur(net);
+	struct nft_traceinfo info;
 
+	nft_trace_init(&info, pkt, &regs.verdict, basechain);
 do_chain:
 	rulenum = 0;
 	rule = list_entry(&chain->rules, struct nft_rule, list);
@@ -151,7 +167,8 @@ next_rule:
 			regs.verdict.code = NFT_CONTINUE;
 			continue;
 		case NFT_CONTINUE:
-			nft_trace_packet(pkt, chain, rulenum, NFT_TRACE_RULE);
+			nft_trace_packet(&info, chain, rule,
+					 rulenum, NFT_TRACETYPE_RULE);
 			continue;
 		}
 		break;
@@ -161,7 +178,8 @@ next_rule:
 	case NF_ACCEPT:
 	case NF_DROP:
 	case NF_QUEUE:
-		nft_trace_packet(pkt, chain, rulenum, NFT_TRACE_RULE);
+		nft_trace_packet(&info, chain, rule,
+				 rulenum, NFT_TRACETYPE_RULE);
 		return regs.verdict.code;
 	}
 
@@ -174,7 +192,8 @@ next_rule:
 		stackptr++;
 		/* fall through */
 	case NFT_GOTO:
-		nft_trace_packet(pkt, chain, rulenum, NFT_TRACE_RULE);
+		nft_trace_packet(&info, chain, rule,
+				 rulenum, NFT_TRACETYPE_RULE);
 
 		chain = regs.verdict.chain;
 		goto do_chain;
@@ -182,7 +201,8 @@ next_rule:
 		rulenum++;
 		/* fall through */
 	case NFT_RETURN:
-		nft_trace_packet(pkt, chain, rulenum, NFT_TRACE_RETURN);
+		nft_trace_packet(&info, chain, rule,
+				 rulenum, NFT_TRACETYPE_RETURN);
 		break;
 	default:
 		WARN_ON(1);
@@ -196,7 +216,8 @@ next_rule:
 		goto next_rule;
 	}
 
-	nft_trace_packet(pkt, basechain, -1, NFT_TRACE_POLICY);
+	nft_trace_packet(&info, basechain, NULL, -1,
+			 NFT_TRACETYPE_POLICY);
 
 	rcu_read_lock_bh();
 	stats = this_cpu_ptr(rcu_dereference(nft_base_chain(basechain)->stats));
diff --git a/net/netfilter/nf_tables_trace.c b/net/netfilter/nf_tables_trace.c
new file mode 100644
index 0000000..672add8
--- /dev/null
+++ b/net/netfilter/nf_tables_trace.c
@@ -0,0 +1,269 @@
+/*
+ * (C) 2015 Red Hat GmbH
+ * Author: Florian Westphal <fw@strlen.de>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/module.h>
+#include <linux/hash.h>
+#include <linux/jhash.h>
+#include <linux/if_vlan.h>
+#include <linux/init.h>
+#include <linux/skbuff.h>
+#include <linux/netlink.h>
+#include <linux/netfilter.h>
+#include <linux/netfilter/nfnetlink.h>
+#include <linux/netfilter/nf_tables.h>
+#include <net/netfilter/nf_tables_core.h>
+#include <net/netfilter/nf_tables.h>
+
+#define NFT_TRACETYPE_LL_HSIZE		20
+#define NFT_TRACETYPE_NETWORK_HSIZE	40
+#define NFT_TRACETYPE_TRANSPORT_HSIZE	20
+
+static int trace_fill_id(struct sk_buff *nlskb, struct sk_buff *skb)
+{
+	__be32 id;
+
+	/* using skb address as ID results in a limited number of
+	 * values (and quick reuse).
+	 *
+	 * So we attempt to use as many skb members that will not
+	 * change while skb is with netfilter.
+	 */
+	id = (__be32)jhash_2words(hash32_ptr(skb), skb_get_hash(skb),
+				  skb->skb_iif);
+
+	return nla_put_be32(nlskb, NFTA_TRACE_ID, id);
+}
+
+static int trace_fill_header(struct sk_buff *nlskb, u16 type,
+			     const struct sk_buff *skb,
+			     int off, unsigned int len)
+{
+	struct nlattr *nla;
+
+	if (len == 0)
+		return 0;
+
+	nla = nla_reserve(nlskb, type, len);
+	if (!nla || skb_copy_bits(skb, off, nla_data(nla), len))
+		return -1;
+
+	return 0;
+}
+
+static int nf_trace_fill_ll_header(struct sk_buff *nlskb,
+				   const struct sk_buff *skb)
+{
+	struct vlan_ethhdr veth;
+	int off;
+
+	BUILD_BUG_ON(sizeof(veth) > NFT_TRACETYPE_LL_HSIZE);
+
+	off = skb_mac_header(skb) - skb->data;
+	if (off != -ETH_HLEN)
+		return -1;
+
+	if (skb_copy_bits(skb, off, &veth, ETH_HLEN))
+		return -1;
+
+	veth.h_vlan_proto = skb->vlan_proto;
+	veth.h_vlan_TCI = htons(skb_vlan_tag_get(skb));
+	veth.h_vlan_encapsulated_proto = skb->protocol;
+
+	return nla_put(nlskb, NFTA_TRACE_LL_HEADER, sizeof(veth), &veth);
+}
+
+static int nf_trace_fill_dev_info(struct sk_buff *nlskb,
+				  const struct net_device *indev,
+				  const struct net_device *outdev)
+{
+	if (indev) {
+		if (nla_put_be32(nlskb, NFTA_TRACE_IIF,
+				 htonl(indev->ifindex)))
+			return -1;
+
+		if (nla_put_be16(nlskb, NFTA_TRACE_IIFTYPE,
+				 htons(indev->type)))
+			return -1;
+	}
+
+	if (outdev) {
+		if (nla_put_be32(nlskb, NFTA_TRACE_OIF,
+				 htonl(outdev->ifindex)))
+			return -1;
+
+		if (nla_put_be16(nlskb, NFTA_TRACE_OIFTYPE,
+				 htons(outdev->type)))
+			return -1;
+	}
+
+	return 0;
+}
+
+static int nf_trace_fill_pkt_info(struct sk_buff *nlskb,
+				  const struct nft_pktinfo *pkt)
+{
+	const struct sk_buff *skb = pkt->skb;
+	unsigned int len = min_t(unsigned int,
+				 pkt->xt.thoff - skb_network_offset(skb),
+				 NFT_TRACETYPE_NETWORK_HSIZE);
+	int off = skb_network_offset(skb);
+
+	if (trace_fill_header(nlskb, NFTA_TRACE_NETWORK_HEADER, skb, off, len))
+		return -1;
+
+	len = min_t(unsigned int, skb->len - pkt->xt.thoff,
+		    NFT_TRACETYPE_TRANSPORT_HSIZE);
+
+	if (trace_fill_header(nlskb, NFTA_TRACE_TRANSPORT_HEADER, skb,
+			      pkt->xt.thoff, len))
+		return -1;
+
+	if (!skb_mac_header_was_set(skb))
+		return 0;
+
+	if (skb_vlan_tag_get(skb))
+		return nf_trace_fill_ll_header(nlskb, skb);
+
+	off = skb_mac_header(skb) - skb->data;
+	len = min_t(unsigned int, -off, NFT_TRACETYPE_LL_HSIZE);
+	return trace_fill_header(nlskb, NFTA_TRACE_LL_HEADER,
+				 skb, off, len);
+}
+
+static int nf_trace_fill_rule_info(struct sk_buff *nlskb,
+				   const struct nft_traceinfo *info)
+{
+	if (!info->rule)
+		return 0;
+
+	/* a continue verdict with ->type == RETURN means that this is
+	 * an implicit return (end of chain reached).
+	 *
+	 * Since no rule matched, the ->rule pointer is invalid.
+	 */
+	if (info->type == NFT_TRACETYPE_RETURN &&
+	    info->verdict->code == NFT_CONTINUE)
+		return 0;
+
+	return nla_put_be64(nlskb, NFTA_TRACE_RULE_HANDLE,
+			    cpu_to_be64(info->rule->handle));
+}
+
+void nft_trace_notify(struct nft_traceinfo *info)
+{
+	const struct nft_pktinfo *pkt = info->pkt;
+	struct nfgenmsg *nfmsg;
+	struct nlmsghdr *nlh;
+	struct sk_buff *skb;
+	unsigned int size;
+	int event = (NFNL_SUBSYS_NFTABLES << 8) | NFT_MSG_TRACE;
+
+	if (!nfnetlink_has_listeners(pkt->net, NFNLGRP_NFTRACE))
+		return;
+
+	size = nlmsg_total_size(sizeof(struct nfgenmsg)) +
+		nla_total_size(NFT_TABLE_MAXNAMELEN) +
+		nla_total_size(NFT_CHAIN_MAXNAMELEN) +
+		nla_total_size(sizeof(__be64)) +	/* rule handle */
+		nla_total_size(sizeof(__be32)) +	/* trace type */
+		nla_total_size(0) +			/* VERDICT, nested */
+			nla_total_size(sizeof(u32)) +	/* verdict code */
+			nla_total_size(NFT_CHAIN_MAXNAMELEN) + /* jump target */
+		nla_total_size(sizeof(u32)) +		/* id */
+		nla_total_size(NFT_TRACETYPE_LL_HSIZE) +
+		nla_total_size(NFT_TRACETYPE_NETWORK_HSIZE) +
+		nla_total_size(NFT_TRACETYPE_TRANSPORT_HSIZE) +
+		nla_total_size(sizeof(u32)) +		/* mark */
+		nla_total_size(sizeof(u32)) +		/* iif */
+		nla_total_size(sizeof(__be16)) +	/* iiftype */
+		nla_total_size(sizeof(u32)) +		/* oif */
+		nla_total_size(sizeof(__be16));		/* oiftype */
+
+	skb = nlmsg_new(size, GFP_ATOMIC);
+	if (!skb)
+		return;
+
+	nlh = nlmsg_put(skb, 0, 0, event, sizeof(struct nfgenmsg), 0);
+	if (!nlh)
+		goto nla_put_failure;
+
+	nfmsg = nlmsg_data(nlh);
+	nfmsg->nfgen_family	= info->basechain->type->family;
+	nfmsg->version		= NFNETLINK_V0;
+	nfmsg->res_id		= 0;
+
+	if (nla_put_be32(skb, NFTA_TRACE_NFPROTO, htonl(pkt->pf)))
+		goto nla_put_failure;
+
+	if (nla_put_be32(skb, NFTA_TRACE_TYPE, htonl(info->type)))
+		goto nla_put_failure;
+
+	if (trace_fill_id(skb, pkt->skb))
+		goto nla_put_failure;
+
+	if (info->chain) {
+		if (nla_put_string(skb, NFTA_TRACE_CHAIN,
+				   info->chain->name))
+			goto nla_put_failure;
+		if (nla_put_string(skb, NFTA_TRACE_TABLE,
+				   info->chain->table->name))
+			goto nla_put_failure;
+	}
+
+	if (nf_trace_fill_rule_info(skb, info))
+		goto nla_put_failure;
+
+	switch (info->type) {
+	case NFT_TRACETYPE_UNSPEC:
+	case __NFT_TRACETYPE_MAX:
+		break;
+	case NFT_TRACETYPE_RETURN:
+	case NFT_TRACETYPE_RULE:
+		if (nft_verdict_dump(skb, NFTA_TRACE_VERDICT, info->verdict))
+			goto nla_put_failure;
+		break;
+	case NFT_TRACETYPE_POLICY:
+		if (nla_put_be32(skb, NFTA_TRACE_POLICY,
+				 info->basechain->policy))
+			goto nla_put_failure;
+		break;
+	}
+
+	if (pkt->skb->mark &&
+	    nla_put_be32(skb, NFTA_TRACE_MARK, htonl(pkt->skb->mark)))
+		goto nla_put_failure;
+
+	if (!info->packet_dumped) {
+		if (nf_trace_fill_dev_info(skb, pkt->in, pkt->out))
+			goto nla_put_failure;
+
+		if (nf_trace_fill_pkt_info(skb, pkt))
+			goto nla_put_failure;
+		info->packet_dumped = true;
+	}
+
+	nlmsg_end(skb, nlh);
+	nfnetlink_send(skb, pkt->net, 0, NFNLGRP_NFTRACE, 0, GFP_ATOMIC);
+	return;
+
+ nla_put_failure:
+	WARN_ON_ONCE(1);
+	kfree_skb(skb);
+}
+
+void nft_trace_init(struct nft_traceinfo *info, const struct nft_pktinfo *pkt,
+		    const struct nft_verdict *verdict,
+		    const struct nft_chain *chain)
+{
+	info->basechain = nft_base_chain(chain);
+	info->trace = true;
+	info->packet_dumped = false;
+	info->pkt = pkt;
+	info->verdict = verdict;
+}
diff --git a/net/netfilter/nfnetlink.c b/net/netfilter/nfnetlink.c
index 46453ab..28591fa 100644
--- a/net/netfilter/nfnetlink.c
+++ b/net/netfilter/nfnetlink.c
@@ -49,6 +49,7 @@ static const int nfnl_group2type[NFNLGRP_MAX+1] = {
 	[NFNLGRP_CONNTRACK_EXP_DESTROY] = NFNL_SUBSYS_CTNETLINK_EXP,
 	[NFNLGRP_NFTABLES]		= NFNL_SUBSYS_NFTABLES,
 	[NFNLGRP_ACCT_QUOTA]		= NFNL_SUBSYS_ACCT,
+	[NFNLGRP_NFTRACE]		= NFNL_SUBSYS_NFTABLES,
 };
 
 void nfnl_lock(__u8 subsys_id)
-- 
2.4.10


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 nf-next 2/3] netfilter: nf_tables: extend tracing infrastructure
  2015-11-27 18:43 ` [PATCH v2 nf-next 2/3] netfilter: nf_tables: extend tracing infrastructure Florian Westphal
@ 2015-11-28 12:36   ` Patrick McHardy
  0 siblings, 0 replies; 15+ messages in thread
From: Patrick McHardy @ 2015-11-28 12:36 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel

On 27.11, Florian Westphal wrote:
> nft monitor mode can then decode and display this trace data.
> 
> Parts of LL/Network/Transport headers are provided as separate
> attributes.
> 
> Otherwise, printing IP address data becomes virtually impossible
> for userspace since in the case of the netdev family we really don't
> want userspace to have to know all the possible link layer types
> and/or sizes just to display/print an ip address.
> 
> We also don't want userspace to have to follow ipv6 header chains
> to get the s/dport info, the kernel already did this work for us.
> 
> To avoid bloating nft_do_chain all data required for tracing is
> encapsulated in nft_traceinfo.
> 
> The structure is initialized unconditionally(!) for each nft_do_chain
> invocation.
> 
> This unconditionall call will be moved under a static key in a
> followup patch.
> 
> With lots of help from Patrick McHardy and Pablo Neira.
> 
> Signed-off-by: Florian Westphal <fw@strlen.de>

Looks very good to me, nice work!

Acked-by: Patrick McHardy <kaber@trash.net>

for both kernel patches.

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2015-11-28 12:36 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-11-26 17:59 [PATCH v2 0/3] src: trace infrastructure support Florian Westphal
2015-11-26 17:59 ` [PATCH v2 libnftnl 1/3] src: add " Florian Westphal
2015-11-26 22:11   ` Patrick McHardy
2015-11-26 17:59 ` [PATCH v2 nf-next 2/3] netfilter: nf_tables: extend tracing infrastructure Florian Westphal
2015-11-26 23:20   ` Patrick McHardy
2015-11-27  0:10     ` Florian Westphal
2015-11-27  1:24       ` Patrick McHardy
2015-11-27  8:47   ` Patrick McHardy
2015-11-27  9:23     ` Florian Westphal
2015-11-26 17:59 ` [PATCH v2 nf-next 3/3] netfilter: nf_tables: wrap tracing with a static key Florian Westphal
2015-11-26 21:30   ` Patrick McHardy
2015-11-26 20:53 ` [PATCH v2 0/3] src: trace infrastructure support Patrick McHardy
2015-11-27  1:33 ` Patrick McHardy
  -- strict thread matches above, loose matches on Subject: below --
2015-11-27 18:43 GIT: [PATCH v3 0/3] netfilter " Florian Westphal
2015-11-27 18:43 ` [PATCH v2 nf-next 2/3] netfilter: nf_tables: extend tracing infrastructure Florian Westphal
2015-11-28 12:36   ` Patrick McHardy

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).