Netdev List


* [PATCH 28/31] netfilter: nf_flow_table: remove unnecessary nat flag check code
From: Pablo Neira Ayuso @ 2018-10-08 23:01 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <20181008230125.2330-1-pablo@netfilter.org>

From: Taehee Yoo <ap420073@gmail.com>

nf_flow_offload_{ip/ipv6}_hook() check the NAT flags and then call
nf_flow_nat_{ip/ipv6}(), which checks the same flags again, so the
NAT flag check in nf_flow_offload_{ip/ipv6}_hook() is unnecessary.

Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nf_flow_table_ip.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/net/netfilter/nf_flow_table_ip.c b/net/netfilter/nf_flow_table_ip.c
index 15ed91309992..1d291a51cd45 100644
--- a/net/netfilter/nf_flow_table_ip.c
+++ b/net/netfilter/nf_flow_table_ip.c
@@ -254,8 +254,7 @@ nf_flow_offload_ip_hook(void *priv, struct sk_buff *skb,
 	if (nf_flow_state_check(flow, ip_hdr(skb)->protocol, skb, thoff))
 		return NF_ACCEPT;
 
-	if (flow->flags & (FLOW_OFFLOAD_SNAT | FLOW_OFFLOAD_DNAT) &&
-	    nf_flow_nat_ip(flow, skb, thoff, dir) < 0)
+	if (nf_flow_nat_ip(flow, skb, thoff, dir) < 0)
 		return NF_DROP;
 
 	flow->timeout = (u32)jiffies + NF_FLOW_TIMEOUT;
@@ -471,8 +470,7 @@ nf_flow_offload_ipv6_hook(void *priv, struct sk_buff *skb,
 	if (skb_try_make_writable(skb, sizeof(*ip6h)))
 		return NF_DROP;
 
-	if (flow->flags & (FLOW_OFFLOAD_SNAT | FLOW_OFFLOAD_DNAT) &&
-	    nf_flow_nat_ipv6(flow, skb, dir) < 0)
+	if (nf_flow_nat_ipv6(flow, skb, dir) < 0)
 		return NF_DROP;
 
 	flow->timeout = (u32)jiffies + NF_FLOW_TIMEOUT;
-- 
2.11.0
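
The redundancy described in the commit message can be illustrated outside the kernel. The following is a minimal user-space sketch, assuming (as the commit message states) that the NAT helper tests the flags itself; every demo_* name is hypothetical and only stands in for the kernel symbols.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for FLOW_OFFLOAD_SNAT / FLOW_OFFLOAD_DNAT. */
enum { DEMO_SNAT = 1u << 0, DEMO_DNAT = 1u << 1 };

struct demo_flow {
	uint32_t flags;
	int nat_calls;	/* counts how many NAT rewrites were attempted */
};

/* The helper guards each rewrite with its own flag test, so a flow with
 * neither flag set falls through and returns 0 (success). */
int demo_flow_nat(struct demo_flow *flow)
{
	if (flow->flags & DEMO_SNAT)
		flow->nat_calls++;
	if (flow->flags & DEMO_DNAT)
		flow->nat_calls++;
	return 0;
}

/* Old caller: tests the flags first, then calls the helper. */
int demo_hook_old(struct demo_flow *flow)
{
	if ((flow->flags & (DEMO_SNAT | DEMO_DNAT)) && demo_flow_nat(flow) < 0)
		return -1;
	return 0;
}

/* New caller: relies on the flag tests inside the helper. */
int demo_hook_new(struct demo_flow *flow)
{
	if (demo_flow_nat(flow) < 0)
		return -1;
	return 0;
}

/* Both callers return the same value and do the same work for any flags. */
void demo_check(uint32_t flags)
{
	struct demo_flow a = { flags, 0 }, b = { flags, 0 };

	assert(demo_hook_old(&a) == demo_hook_new(&b));
	assert(a.nat_calls == b.nat_calls);
}
```

Calling demo_check() with each of the four flag combinations exercises both variants; the only difference left is one extra function call when no NAT flag is set.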


* [PATCH 27/31] netfilter: nf_tables: add requirements for connsecmark support
From: Pablo Neira Ayuso @ 2018-10-08 23:01 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <20181008230125.2330-1-pablo@netfilter.org>

From: Christian Göttsche <cgzones@googlemail.com>

Add ability to set the connection tracking secmark value.

Add ability to set the meta secmark value.

Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nft_ct.c   | 17 ++++++++++++++++-
 net/netfilter/nft_meta.c |  8 ++++++++
 2 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/net/netfilter/nft_ct.c b/net/netfilter/nft_ct.c
index d74afa70774f..586627c361df 100644
--- a/net/netfilter/nft_ct.c
+++ b/net/netfilter/nft_ct.c
@@ -279,7 +279,7 @@ static void nft_ct_set_eval(const struct nft_expr *expr,
 {
 	const struct nft_ct *priv = nft_expr_priv(expr);
 	struct sk_buff *skb = pkt->skb;
-#ifdef CONFIG_NF_CONNTRACK_MARK
+#if defined(CONFIG_NF_CONNTRACK_MARK) || defined(CONFIG_NF_CONNTRACK_SECMARK)
 	u32 value = regs->data[priv->sreg];
 #endif
 	enum ip_conntrack_info ctinfo;
@@ -298,6 +298,14 @@ static void nft_ct_set_eval(const struct nft_expr *expr,
 		}
 		break;
 #endif
+#ifdef CONFIG_NF_CONNTRACK_SECMARK
+	case NFT_CT_SECMARK:
+		if (ct->secmark != value) {
+			ct->secmark = value;
+			nf_conntrack_event_cache(IPCT_SECMARK, ct);
+		}
+		break;
+#endif
 #ifdef CONFIG_NF_CONNTRACK_LABELS
 	case NFT_CT_LABELS:
 		nf_connlabels_replace(ct,
@@ -565,6 +573,13 @@ static int nft_ct_set_init(const struct nft_ctx *ctx,
 		len = sizeof(u32);
 		break;
 #endif
+#ifdef CONFIG_NF_CONNTRACK_SECMARK
+	case NFT_CT_SECMARK:
+		if (tb[NFTA_CT_DIRECTION])
+			return -EINVAL;
+		len = sizeof(u32);
+		break;
+#endif
 	default:
 		return -EOPNOTSUPP;
 	}
diff --git a/net/netfilter/nft_meta.c b/net/netfilter/nft_meta.c
index 91fd6e677ad7..6180626c3f80 100644
--- a/net/netfilter/nft_meta.c
+++ b/net/netfilter/nft_meta.c
@@ -284,6 +284,11 @@ static void nft_meta_set_eval(const struct nft_expr *expr,
 
 		skb->nf_trace = !!value8;
 		break;
+#ifdef CONFIG_NETWORK_SECMARK
+	case NFT_META_SECMARK:
+		skb->secmark = value;
+		break;
+#endif
 	default:
 		WARN_ON(1);
 	}
@@ -436,6 +441,9 @@ static int nft_meta_set_init(const struct nft_ctx *ctx,
 	switch (priv->key) {
 	case NFT_META_MARK:
 	case NFT_META_PRIORITY:
+#ifdef CONFIG_NETWORK_SECMARK
+	case NFT_META_SECMARK:
+#endif
 		len = sizeof(u32);
 		break;
 	case NFT_META_NFTRACE:
-- 
2.11.0


* [PATCH 26/31] netfilter: nf_tables: add SECMARK support
From: Pablo Neira Ayuso @ 2018-10-08 23:01 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <20181008230125.2330-1-pablo@netfilter.org>

From: Christian Göttsche <cgzones@googlemail.com>

Add the ability to set the security context of packets within the nf_tables framework.
Add an nft_object for holding security contexts in the kernel and manipulating packets on the wire.

Convert the security context strings at rule addition time to security identifiers.
This is the same behavior as in xt_SECMARK and offers better performance than computing it per packet.

Set the maximum security context length to 256.

Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 include/net/netfilter/nf_tables_core.h   |   4 ++
 include/uapi/linux/netfilter/nf_tables.h |  18 +++++-
 net/netfilter/nf_tables_core.c           |  28 ++++++--
 net/netfilter/nft_meta.c                 | 108 +++++++++++++++++++++++++++++++
 4 files changed, 153 insertions(+), 5 deletions(-)

diff --git a/include/net/netfilter/nf_tables_core.h b/include/net/netfilter/nf_tables_core.h
index 8da837d2aaf9..2046d104f323 100644
--- a/include/net/netfilter/nf_tables_core.h
+++ b/include/net/netfilter/nf_tables_core.h
@@ -16,6 +16,10 @@ extern struct nft_expr_type nft_meta_type;
 extern struct nft_expr_type nft_rt_type;
 extern struct nft_expr_type nft_exthdr_type;
 
+#ifdef CONFIG_NETWORK_SECMARK
+extern struct nft_object_type nft_secmark_obj_type;
+#endif
+
 int nf_tables_core_module_init(void);
 void nf_tables_core_module_exit(void);
 
diff --git a/include/uapi/linux/netfilter/nf_tables.h b/include/uapi/linux/netfilter/nf_tables.h
index 702e4f0bec56..5444e76870bb 100644
--- a/include/uapi/linux/netfilter/nf_tables.h
+++ b/include/uapi/linux/netfilter/nf_tables.h
@@ -1177,6 +1177,21 @@ enum nft_quota_attributes {
 #define NFTA_QUOTA_MAX		(__NFTA_QUOTA_MAX - 1)
 
 /**
+ * enum nft_secmark_attributes - nf_tables secmark object netlink attributes
+ *
+ * @NFTA_SECMARK_CTX: security context (NLA_STRING)
+ */
+enum nft_secmark_attributes {
+	NFTA_SECMARK_UNSPEC,
+	NFTA_SECMARK_CTX,
+	__NFTA_SECMARK_MAX,
+};
+#define NFTA_SECMARK_MAX	(__NFTA_SECMARK_MAX - 1)
+
+/* Max security context length */
+#define NFT_SECMARK_CTX_MAXLEN		256
+
+/**
  * enum nft_reject_types - nf_tables reject expression reject types
  *
  * @NFT_REJECT_ICMP_UNREACH: reject using ICMP unreachable
@@ -1432,7 +1447,8 @@ enum nft_ct_timeout_timeout_attributes {
 #define NFT_OBJECT_CONNLIMIT	5
 #define NFT_OBJECT_TUNNEL	6
 #define NFT_OBJECT_CT_TIMEOUT	7
-#define __NFT_OBJECT_MAX	8
+#define NFT_OBJECT_SECMARK	8
+#define __NFT_OBJECT_MAX	9
 #define NFT_OBJECT_MAX		(__NFT_OBJECT_MAX - 1)
 
 /**
diff --git a/net/netfilter/nf_tables_core.c b/net/netfilter/nf_tables_core.c
index ffd5c0f9412b..3fbce3b9c5ec 100644
--- a/net/netfilter/nf_tables_core.c
+++ b/net/netfilter/nf_tables_core.c
@@ -249,12 +249,24 @@ static struct nft_expr_type *nft_basic_types[] = {
 	&nft_exthdr_type,
 };
 
+static struct nft_object_type *nft_basic_objects[] = {
+#ifdef CONFIG_NETWORK_SECMARK
+	&nft_secmark_obj_type,
+#endif
+};
+
 int __init nf_tables_core_module_init(void)
 {
-	int err, i;
+	int err, i, j = 0;
+
+	for (i = 0; i < ARRAY_SIZE(nft_basic_objects); i++) {
+		err = nft_register_obj(nft_basic_objects[i]);
+		if (err)
+			goto err;
+	}
 
-	for (i = 0; i < ARRAY_SIZE(nft_basic_types); i++) {
-		err = nft_register_expr(nft_basic_types[i]);
+	for (j = 0; j < ARRAY_SIZE(nft_basic_types); j++) {
+		err = nft_register_expr(nft_basic_types[j]);
 		if (err)
 			goto err;
 	}
@@ -262,8 +274,12 @@ int __init nf_tables_core_module_init(void)
 	return 0;
 
 err:
+	while (j-- > 0)
+		nft_unregister_expr(nft_basic_types[j]);
+
 	while (i-- > 0)
-		nft_unregister_expr(nft_basic_types[i]);
+		nft_unregister_obj(nft_basic_objects[i]);
+
 	return err;
 }
 
@@ -274,4 +290,8 @@ void nf_tables_core_module_exit(void)
 	i = ARRAY_SIZE(nft_basic_types);
 	while (i-- > 0)
 		nft_unregister_expr(nft_basic_types[i]);
+
+	i = ARRAY_SIZE(nft_basic_objects);
+	while (i-- > 0)
+		nft_unregister_obj(nft_basic_objects[i]);
 }
diff --git a/net/netfilter/nft_meta.c b/net/netfilter/nft_meta.c
index 297fe7d97c18..91fd6e677ad7 100644
--- a/net/netfilter/nft_meta.c
+++ b/net/netfilter/nft_meta.c
@@ -543,3 +543,111 @@ struct nft_expr_type nft_meta_type __read_mostly = {
 	.maxattr	= NFTA_META_MAX,
 	.owner		= THIS_MODULE,
 };
+
+#ifdef CONFIG_NETWORK_SECMARK
+struct nft_secmark {
+	u32 secid;
+	char *ctx;
+};
+
+static const struct nla_policy nft_secmark_policy[NFTA_SECMARK_MAX + 1] = {
+	[NFTA_SECMARK_CTX]     = { .type = NLA_STRING, .len = NFT_SECMARK_CTX_MAXLEN },
+};
+
+static int nft_secmark_compute_secid(struct nft_secmark *priv)
+{
+	u32 tmp_secid = 0;
+	int err;
+
+	err = security_secctx_to_secid(priv->ctx, strlen(priv->ctx), &tmp_secid);
+	if (err)
+		return err;
+
+	if (!tmp_secid)
+		return -ENOENT;
+
+	err = security_secmark_relabel_packet(tmp_secid);
+	if (err)
+		return err;
+
+	priv->secid = tmp_secid;
+	return 0;
+}
+
+static void nft_secmark_obj_eval(struct nft_object *obj, struct nft_regs *regs,
+				 const struct nft_pktinfo *pkt)
+{
+	const struct nft_secmark *priv = nft_obj_data(obj);
+	struct sk_buff *skb = pkt->skb;
+
+	skb->secmark = priv->secid;
+}
+
+static int nft_secmark_obj_init(const struct nft_ctx *ctx,
+				const struct nlattr * const tb[],
+				struct nft_object *obj)
+{
+	struct nft_secmark *priv = nft_obj_data(obj);
+	int err;
+
+	if (tb[NFTA_SECMARK_CTX] == NULL)
+		return -EINVAL;
+
+	priv->ctx = nla_strdup(tb[NFTA_SECMARK_CTX], GFP_KERNEL);
+	if (!priv->ctx)
+		return -ENOMEM;
+
+	err = nft_secmark_compute_secid(priv);
+	if (err) {
+		kfree(priv->ctx);
+		return err;
+	}
+
+	security_secmark_refcount_inc();
+
+	return 0;
+}
+
+static int nft_secmark_obj_dump(struct sk_buff *skb, struct nft_object *obj,
+				bool reset)
+{
+	struct nft_secmark *priv = nft_obj_data(obj);
+	int err;
+
+	if (nla_put_string(skb, NFTA_SECMARK_CTX, priv->ctx))
+		return -1;
+
+	if (reset) {
+		err = nft_secmark_compute_secid(priv);
+		if (err)
+			return err;
+	}
+
+	return 0;
+}
+
+static void nft_secmark_obj_destroy(const struct nft_ctx *ctx, struct nft_object *obj)
+{
+	struct nft_secmark *priv = nft_obj_data(obj);
+
+	security_secmark_refcount_dec();
+
+	kfree(priv->ctx);
+}
+
+static const struct nft_object_ops nft_secmark_obj_ops = {
+	.type		= &nft_secmark_obj_type,
+	.size		= sizeof(struct nft_secmark),
+	.init		= nft_secmark_obj_init,
+	.eval		= nft_secmark_obj_eval,
+	.dump		= nft_secmark_obj_dump,
+	.destroy	= nft_secmark_obj_destroy,
+};
+struct nft_object_type nft_secmark_obj_type __read_mostly = {
+	.type		= NFT_OBJECT_SECMARK,
+	.ops		= &nft_secmark_obj_ops,
+	.maxattr	= NFTA_SECMARK_MAX,
+	.policy		= nft_secmark_policy,
+	.owner		= THIS_MODULE,
+};
+#endif /* CONFIG_NETWORK_SECMARK */
-- 
2.11.0
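
The init path in nf_tables_core_module_init() above registers two arrays in sequence and unwinds both on failure. The following is a minimal user-space sketch of that two-counter rollback pattern; the demo_* names and the slot array are hypothetical and stand in for nft_register_obj() / nft_register_expr().

```c
#include <assert.h>

#define DEMO_NOBJ  2	/* number of "object types" */
#define DEMO_NEXPR 3	/* number of "expression types" */

int demo_registered[DEMO_NOBJ + DEMO_NEXPR];	/* 1 = currently registered */
int demo_fail_at = -1;	/* slot whose registration fails, -1 = none */

static int demo_register(int slot)
{
	if (slot == demo_fail_at)
		return -1;
	demo_registered[slot] = 1;
	return 0;
}

static void demo_unregister(int slot)
{
	assert(demo_registered[slot]);	/* never undo what was not done */
	demo_registered[slot] = 0;
}

int demo_init(void)
{
	/* j starts at 0 so that a failure in the first loop leaves the
	 * second unwind loop with nothing to do. */
	int err, i, j = 0;

	for (i = 0; i < DEMO_NOBJ; i++) {
		err = demo_register(i);
		if (err)
			goto err;
	}

	for (j = 0; j < DEMO_NEXPR; j++) {
		err = demo_register(DEMO_NOBJ + j);
		if (err)
			goto err;
	}

	return 0;

err:
	/* Unwind in reverse order: expressions first, then objects. */
	while (j-- > 0)
		demo_unregister(DEMO_NOBJ + j);

	while (i-- > 0)
		demo_unregister(i);

	return err;
}

int demo_count(void)
{
	int n = 0, k;

	for (k = 0; k < DEMO_NOBJ + DEMO_NEXPR; k++)
		n += demo_registered[k];
	return n;
}
```

Setting demo_fail_at to any slot and calling demo_init() should leave demo_count() at zero, whichever loop the failure happens in.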


* [PATCH 29/31] netfilter: nf_tables: use rhashtable_lookup() instead of rhashtable_lookup_fast()
From: Pablo Neira Ayuso @ 2018-10-08 23:01 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <20181008230125.2330-1-pablo@netfilter.org>

From: Taehee Yoo <ap420073@gmail.com>

Internally, rhashtable_lookup_fast() calls rcu_read_lock() and then
rhashtable_lookup(), so in places that already run under the RCU read
lock, rhashtable_lookup() is enough.

Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nf_flow_table_core.c | 4 ++--
 net/netfilter/nft_set_hash.c       | 8 ++++----
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/net/netfilter/nf_flow_table_core.c b/net/netfilter/nf_flow_table_core.c
index da3044482317..185c633b6872 100644
--- a/net/netfilter/nf_flow_table_core.c
+++ b/net/netfilter/nf_flow_table_core.c
@@ -233,8 +233,8 @@ flow_offload_lookup(struct nf_flowtable *flow_table,
 	struct flow_offload *flow;
 	int dir;
 
-	tuplehash = rhashtable_lookup_fast(&flow_table->rhashtable, tuple,
-					   nf_flow_offload_rhash_params);
+	tuplehash = rhashtable_lookup(&flow_table->rhashtable, tuple,
+				      nf_flow_offload_rhash_params);
 	if (!tuplehash)
 		return NULL;
 
diff --git a/net/netfilter/nft_set_hash.c b/net/netfilter/nft_set_hash.c
index 4f9c01715856..339a9dd1c832 100644
--- a/net/netfilter/nft_set_hash.c
+++ b/net/netfilter/nft_set_hash.c
@@ -88,7 +88,7 @@ static bool nft_rhash_lookup(const struct net *net, const struct nft_set *set,
 		.key	 = key,
 	};
 
-	he = rhashtable_lookup_fast(&priv->ht, &arg, nft_rhash_params);
+	he = rhashtable_lookup(&priv->ht, &arg, nft_rhash_params);
 	if (he != NULL)
 		*ext = &he->ext;
 
@@ -106,7 +106,7 @@ static void *nft_rhash_get(const struct net *net, const struct nft_set *set,
 		.key	 = elem->key.val.data,
 	};
 
-	he = rhashtable_lookup_fast(&priv->ht, &arg, nft_rhash_params);
+	he = rhashtable_lookup(&priv->ht, &arg, nft_rhash_params);
 	if (he != NULL)
 		return he;
 
@@ -129,7 +129,7 @@ static bool nft_rhash_update(struct nft_set *set, const u32 *key,
 		.key	 = key,
 	};
 
-	he = rhashtable_lookup_fast(&priv->ht, &arg, nft_rhash_params);
+	he = rhashtable_lookup(&priv->ht, &arg, nft_rhash_params);
 	if (he != NULL)
 		goto out;
 
@@ -217,7 +217,7 @@ static void *nft_rhash_deactivate(const struct net *net,
 	};
 
 	rcu_read_lock();
-	he = rhashtable_lookup_fast(&priv->ht, &arg, nft_rhash_params);
+	he = rhashtable_lookup(&priv->ht, &arg, nft_rhash_params);
 	if (he != NULL &&
 	    !nft_rhash_flush(net, set, he))
 		he = NULL;
-- 
2.11.0
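
The relationship between the two lookup variants can be sketched in user space. This is only an illustration under the assumption stated in the commit message (the "fast" variant takes the read lock and then calls the plain lookup); the mock lock counter and all demo_* names are hypothetical and do not model real RCU semantics.

```c
#include <assert.h>
#include <stddef.h>

int demo_read_depth;	/* nesting depth of the mock read-side lock */
int demo_lock_ops;	/* total lock + unlock operations performed */

void demo_read_lock(void)   { demo_read_depth++; demo_lock_ops++; }
void demo_read_unlock(void) { demo_read_depth--; demo_lock_ops++; }

static const int demo_table[] = { 10, 20, 30 };

/* Plain lookup: the caller must already be inside a read-side section. */
const int *demo_lookup(int key)
{
	size_t i;

	assert(demo_read_depth > 0);
	for (i = 0; i < sizeof(demo_table) / sizeof(demo_table[0]); i++)
		if (demo_table[i] == key)
			return &demo_table[i];
	return NULL;
}

/* Convenience wrapper: enters the read-side section itself. */
const int *demo_lookup_fast(int key)
{
	const int *p;

	demo_read_lock();
	p = demo_lookup(key);
	demo_read_unlock();
	return p;
}
```

A caller that already holds the read lock gets the same result from demo_lookup() without the extra lock/unlock pair that demo_lookup_fast() would add.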


* [PATCH 31/31] netfilter: xt_quota: Don't use aligned attribute in sizeof
From: Pablo Neira Ayuso @ 2018-10-08 23:01 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <20181008230125.2330-1-pablo@netfilter.org>

From: Nathan Chancellor <natechancellor@gmail.com>

Clang warns:

net/netfilter/xt_quota.c:47:44: warning: 'aligned' attribute ignored
when parsing type [-Wignored-attributes]
        BUILD_BUG_ON(sizeof(atomic64_t) != sizeof(__aligned_u64));
                                                  ^~~~~~~~~~~~~

Use 'sizeof(__u64)' instead, as the alignment doesn't affect the size
of the type.

Fixes: e9837e55b020 ("netfilter: xt_quota: fix the behavior of xt_quota module")
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/xt_quota.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netfilter/xt_quota.c b/net/netfilter/xt_quota.c
index 6afa7f468a73..fceae245eb03 100644
--- a/net/netfilter/xt_quota.c
+++ b/net/netfilter/xt_quota.c
@@ -44,7 +44,7 @@ static int quota_mt_check(const struct xt_mtchk_param *par)
 {
 	struct xt_quota_info *q = par->matchinfo;
 
-	BUILD_BUG_ON(sizeof(atomic64_t) != sizeof(__aligned_u64));
+	BUILD_BUG_ON(sizeof(atomic64_t) != sizeof(__u64));
 
 	if (q->flags & ~XT_QUOTA_MASK)
 		return -EINVAL;
-- 
2.11.0
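
The claim that alignment does not affect the size of the type can be checked with a small user-space sketch. The typedef below is a hypothetical stand-in for __aligned_u64 and uses GCC/Clang attribute syntax; it is not the uapi definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for __aligned_u64 (GCC/Clang attribute syntax). */
typedef uint64_t demo_aligned_u64 __attribute__((aligned(8)));

/* Mirrors the shape of a uapi struct with an explicitly aligned member. */
struct demo_info {
	uint32_t flags;
	demo_aligned_u64 quota;
};

/* Alignment decides where a member may start; it does not change sizeof. */
void demo_check_layout(void)
{
	assert(sizeof(demo_aligned_u64) == sizeof(uint64_t));
	assert(offsetof(struct demo_info, quota) % 8 == 0);
}
```

Because the sizes are equal, comparing against the plain 64-bit type in a build-time size check gives the same result while avoiding the ignored-attribute warning.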


* [PATCH 30/31] netfilter: xt_quota: fix the behavior of xt_quota module
From: Pablo Neira Ayuso @ 2018-10-08 23:01 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <20181008230125.2330-1-pablo@netfilter.org>

From: Chenbo Feng <fengc@google.com>

A major flaw of the current xt_quota module is that quota in a specific
rule gets reset every time there is a rule change in the same table. It
makes the xt_quota module not very useful in a table in which iptables
rules are changed at run time. This fix introduces a new counter that is
visible to userspace as the remaining quota of the current rule. When
userspace restores the rules in a table, it can restore the counter to
the remaining quota instead of resetting it to the full quota.

Signed-off-by: Chenbo Feng <fengc@google.com>
Suggested-by: Maciej Żenczykowski <maze@google.com>
Reviewed-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 include/uapi/linux/netfilter/xt_quota.h |  8 +++--
 net/netfilter/xt_quota.c                | 55 +++++++++++++--------------------
 2 files changed, 27 insertions(+), 36 deletions(-)

diff --git a/include/uapi/linux/netfilter/xt_quota.h b/include/uapi/linux/netfilter/xt_quota.h
index f3ba5d9e58b6..d72fd52adbba 100644
--- a/include/uapi/linux/netfilter/xt_quota.h
+++ b/include/uapi/linux/netfilter/xt_quota.h
@@ -15,9 +15,11 @@ struct xt_quota_info {
 	__u32 flags;
 	__u32 pad;
 	__aligned_u64 quota;
-
-	/* Used internally by the kernel */
-	struct xt_quota_priv	*master;
+#ifdef __KERNEL__
+	atomic64_t counter;
+#else
+	__aligned_u64 remain;
+#endif
 };
 
 #endif /* _XT_QUOTA_H */
diff --git a/net/netfilter/xt_quota.c b/net/netfilter/xt_quota.c
index 10d61a6eed71..6afa7f468a73 100644
--- a/net/netfilter/xt_quota.c
+++ b/net/netfilter/xt_quota.c
@@ -11,11 +11,6 @@
 #include <linux/netfilter/xt_quota.h>
 #include <linux/module.h>
 
-struct xt_quota_priv {
-	spinlock_t	lock;
-	uint64_t	quota;
-};
-
 MODULE_LICENSE("GPL");
 MODULE_AUTHOR("Sam Johnston <samj@samj.net>");
 MODULE_DESCRIPTION("Xtables: countdown quota match");
@@ -26,54 +21,48 @@ static bool
 quota_mt(const struct sk_buff *skb, struct xt_action_param *par)
 {
 	struct xt_quota_info *q = (void *)par->matchinfo;
-	struct xt_quota_priv *priv = q->master;
+	u64 current_count = atomic64_read(&q->counter);
 	bool ret = q->flags & XT_QUOTA_INVERT;
-
-	spin_lock_bh(&priv->lock);
-	if (priv->quota >= skb->len) {
-		priv->quota -= skb->len;
-		ret = !ret;
-	} else {
-		/* we do not allow even small packets from now on */
-		priv->quota = 0;
-	}
-	spin_unlock_bh(&priv->lock);
-
-	return ret;
+	u64 old_count, new_count;
+
+	do {
+		if (current_count == 1)
+			return ret;
+		if (current_count <= skb->len) {
+			atomic64_set(&q->counter, 1);
+			return ret;
+		}
+		old_count = current_count;
+		new_count = current_count - skb->len;
+		current_count = atomic64_cmpxchg(&q->counter, old_count,
+						 new_count);
+	} while (current_count != old_count);
+	return !ret;
 }
 
 static int quota_mt_check(const struct xt_mtchk_param *par)
 {
 	struct xt_quota_info *q = par->matchinfo;
 
+	BUILD_BUG_ON(sizeof(atomic64_t) != sizeof(__aligned_u64));
+
 	if (q->flags & ~XT_QUOTA_MASK)
 		return -EINVAL;
+	if (atomic64_read(&q->counter) > q->quota + 1)
+		return -ERANGE;
 
-	q->master = kmalloc(sizeof(*q->master), GFP_KERNEL);
-	if (q->master == NULL)
-		return -ENOMEM;
-
-	spin_lock_init(&q->master->lock);
-	q->master->quota = q->quota;
+	if (atomic64_read(&q->counter) == 0)
+		atomic64_set(&q->counter, q->quota + 1);
 	return 0;
 }
 
-static void quota_mt_destroy(const struct xt_mtdtor_param *par)
-{
-	const struct xt_quota_info *q = par->matchinfo;
-
-	kfree(q->master);
-}
-
 static struct xt_match quota_mt_reg __read_mostly = {
 	.name       = "quota",
 	.revision   = 0,
 	.family     = NFPROTO_UNSPEC,
 	.match      = quota_mt,
 	.checkentry = quota_mt_check,
-	.destroy    = quota_mt_destroy,
 	.matchsize  = sizeof(struct xt_quota_info),
-	.usersize   = offsetof(struct xt_quota_info, master),
 	.me         = THIS_MODULE,
 };
 
-- 
2.11.0
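
The lock-free countdown in quota_mt() above can be sketched in user space with C11 atomics. This is an illustration only, assuming the same encoding as the patch (the counter holds the remaining quota plus one, 1 means exhausted, 0 means not yet initialized); the demo_* names are hypothetical and the sketch ignores the XT_QUOTA_INVERT flag.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

struct demo_quota {
	_Atomic uint64_t counter;	/* remaining bytes + 1; 1 = exhausted */
};

/* Only a zero counter is (re)initialized, so a counter restored by the
 * caller keeps its remaining value instead of being reset. */
void demo_quota_init(struct demo_quota *q, uint64_t quota)
{
	if (atomic_load(&q->counter) == 0)
		atomic_store(&q->counter, quota + 1);
}

/* Returns true if len bytes still fit into the remaining quota. */
bool demo_quota_consume(struct demo_quota *q, uint64_t len)
{
	uint64_t cur = atomic_load(&q->counter);

	do {
		if (cur == 1)
			return false;		/* already exhausted */
		if (cur <= len) {
			/* Not enough left: latch the exhausted state. */
			atomic_store(&q->counter, 1);
			return false;
		}
		/* On failure the CAS reloads cur and the loop retries. */
	} while (!atomic_compare_exchange_weak(&q->counter, &cur, cur - len));

	return true;
}
```

With a quota of 100, a first 60-byte packet is accepted and leaves the counter at 41; a second 60-byte packet is rejected and latches the counter to 1, after which nothing is accepted.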


* Re: [PATCH v2 1/3] bpf: allow zero-initializing hash map seed
From: Song Liu @ 2018-10-08 23:07 UTC (permalink / raw)
  To: lmb; +Cc: Alexei Starovoitov, Daniel Borkmann, Networking, linux-api
In-Reply-To: <20181008103221.13468-2-lmb@cloudflare.com>

On Mon, Oct 8, 2018 at 3:34 AM Lorenz Bauer <lmb@cloudflare.com> wrote:
>
> Add a new flag BPF_F_ZERO_SEED, which forces a hash map
> to initialize the seed to zero. This is useful when doing
> performance analysis both on individual BPF programs, as
> well as the kernel's hash table implementation.
>
> Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
> ---
>  include/uapi/linux/bpf.h |  2 ++
>  kernel/bpf/hashtab.c     | 13 +++++++++++--
>  2 files changed, 13 insertions(+), 2 deletions(-)
>
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index f9187b41dff6..2c121f862082 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -253,6 +253,8 @@ enum bpf_attach_type {
>  #define BPF_F_NO_COMMON_LRU    (1U << 1)
>  /* Specify numa node during map creation */
>  #define BPF_F_NUMA_NODE                (1U << 2)
> +/* Zero-initialize hash function seed. This should only be used for testing. */
> +#define BPF_F_ZERO_SEED                (1U << 6)

Please add this line after
#define BPF_F_STACK_BUILD_ID    (1U << 5)

Other than this
Acked-by: Song Liu <songliubraving@fb.com>


>
>  /* flags for BPF_PROG_QUERY */
>  #define BPF_F_QUERY_EFFECTIVE  (1U << 0)
> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
> index 2c1790288138..4b7c76765d9d 100644
> --- a/kernel/bpf/hashtab.c
> +++ b/kernel/bpf/hashtab.c
> @@ -23,7 +23,7 @@
>
>  #define HTAB_CREATE_FLAG_MASK                                          \
>         (BPF_F_NO_PREALLOC | BPF_F_NO_COMMON_LRU | BPF_F_NUMA_NODE |    \
> -        BPF_F_RDONLY | BPF_F_WRONLY)
> +        BPF_F_RDONLY | BPF_F_WRONLY | BPF_F_ZERO_SEED)
>
>  struct bucket {
>         struct hlist_nulls_head head;
> @@ -244,6 +244,7 @@ static int htab_map_alloc_check(union bpf_attr *attr)
>          */
>         bool percpu_lru = (attr->map_flags & BPF_F_NO_COMMON_LRU);
>         bool prealloc = !(attr->map_flags & BPF_F_NO_PREALLOC);
> +       bool zero_seed = (attr->map_flags & BPF_F_ZERO_SEED);
>         int numa_node = bpf_map_attr_numa_node(attr);
>
>         BUILD_BUG_ON(offsetof(struct htab_elem, htab) !=
> @@ -257,6 +258,10 @@ static int htab_map_alloc_check(union bpf_attr *attr)
>                  */
>                 return -EPERM;
>
> +       if (zero_seed && !capable(CAP_SYS_ADMIN))
> +               /* Guard against local DoS, and discourage production use. */
> +               return -EPERM;
> +
>         if (attr->map_flags & ~HTAB_CREATE_FLAG_MASK)
>                 /* reserved bits should not be used */
>                 return -EINVAL;
> @@ -373,7 +378,11 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
>         if (!htab->buckets)
>                 goto free_htab;
>
> -       htab->hashrnd = get_random_int();
> +       if (htab->map.map_flags & BPF_F_ZERO_SEED)
> +               htab->hashrnd = 0;
> +       else
> +               htab->hashrnd = get_random_int();
> +
>         for (i = 0; i < htab->n_buckets; i++) {
>                 INIT_HLIST_NULLS_HEAD(&htab->buckets[i].head, i);
>                 raw_spin_lock_init(&htab->buckets[i].lock);
> --
> 2.17.1
>
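
The effect of the proposed flag on seed selection can be sketched in user space. This is a toy illustration: demo_hash() is an FNV-1a variant chosen for brevity, not the hash the kernel uses, rand() merely stands in for get_random_int(), and all demo_* and DEMO_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_F_ZERO_SEED (1u << 6)	/* mirrors the bit used in the patch */

/* Toy seeded hash (FNV-1a variant); not the kernel's hash function. */
uint32_t demo_hash(const void *key, size_t len, uint32_t seed)
{
	const unsigned char *p = key;
	uint32_t h = 2166136261u ^ seed;

	assert(key != NULL);
	while (len--) {
		h ^= *p++;
		h *= 16777619u;
	}
	return h;
}

/* Zero seed when the flag is set, otherwise a random one. */
uint32_t demo_pick_seed(uint32_t map_flags)
{
	if (map_flags & DEMO_F_ZERO_SEED)
		return 0;
	return (uint32_t)rand();
}
```

With the flag set, every map hashes a given key to the same value, which makes bucket placement repeatable across runs; that is what makes the flag useful for benchmarking and also why the patch restricts it to privileged users.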


* Re: [PATCH v2 2/3] tools: sync linux/bpf.h
From: Song Liu @ 2018-10-08 23:12 UTC (permalink / raw)
  To: lmb; +Cc: Alexei Starovoitov, Daniel Borkmann, Networking, linux-api
In-Reply-To: <20181008103221.13468-3-lmb@cloudflare.com>

On Mon, Oct 8, 2018 at 3:34 AM Lorenz Bauer <lmb@cloudflare.com> wrote:
>
> Synchronize changes to linux/bpf.h from
> commit 88db241b34bf ("bpf: allow zero-initializing hash map seed").
I guess we cannot keep this hash during git-am? We probably don't
need this hash anyway, as the two patches will be applied back to back.

>
> Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
> ---
>  tools/include/uapi/linux/bpf.h | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index f9187b41dff6..2c121f862082 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
> @@ -253,6 +253,8 @@ enum bpf_attach_type {
>  #define BPF_F_NO_COMMON_LRU    (1U << 1)
>  /* Specify numa node during map creation */
>  #define BPF_F_NUMA_NODE                (1U << 2)
> +/* Zero-initialize hash function seed. This should only be used for testing. */
> +#define BPF_F_ZERO_SEED                (1U << 6)

Same comment as on patch 1/3.

>
>  /* flags for BPF_PROG_QUERY */
>  #define BPF_F_QUERY_EFFECTIVE  (1U << 0)
> --
> 2.17.1
>


* [PATCH net-next v3] net/ncsi: Extend NC-SI Netlink interface to allow user space to send NC-SI command
From: Justin.Lee1 @ 2018-10-08 23:13 UTC (permalink / raw)
  To: sam, joel; +Cc: linux-aspeed, netdev, openbmc, amithash, christian, vijaykhemka

The new command (NCSI_CMD_SEND_CMD) is added to allow user space
applications to send NC-SI commands to the network card.
Also, add a new attribute (NCSI_ATTR_DATA) for transferring requests and responses.

The workflow is as below.

Request:
User space application
	-> Netlink interface (msg)
	-> new Netlink handler - ncsi_send_cmd_nl()
	-> ncsi_xmit_cmd()

Response:
Response received - ncsi_rcv_rsp()
	-> internal response handler - ncsi_rsp_handler_xxx()
	-> ncsi_rsp_handler_netlink()
	-> ncsi_send_netlink_rsp()
	-> Netlink interface (msg)
	-> user space application

Command timeout - ncsi_request_timeout()
	-> ncsi_send_netlink_timeout()
	-> Netlink interface (msg with zero data length)
	-> user space application

Error:
Error detected
	-> ncsi_send_netlink_err()
	-> Netlink interface (err msg)
	-> user space application
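
The three completion paths above can be summarized in a small user-space sketch. It only models which kind of netlink message each outcome produces for a netlink-driven request; the demo_* names and the reply struct are hypothetical and are not part of the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mirrors of the request flags used in the patch. */
enum demo_req_flag { DEMO_EVENT_DRIVEN = 1, DEMO_NETLINK_DRIVEN = 2 };
enum demo_outcome  { DEMO_RESPONSE, DEMO_TIMEOUT, DEMO_ERROR };

struct demo_reply {
	int sent;		/* 1 if a netlink message goes to user space */
	int is_error;		/* 1 if it is an error message */
	size_t data_len;	/* length of the data attribute */
};

/* Decide what user space sees when a request completes. */
struct demo_reply demo_complete(enum demo_req_flag flag,
				enum demo_outcome outcome, size_t rsp_len)
{
	struct demo_reply r = { 0, 0, 0 };

	if (flag != DEMO_NETLINK_DRIVEN)
		return r;	/* kernel-internal request: no netlink reply */

	r.sent = 1;
	switch (outcome) {
	case DEMO_RESPONSE:
		r.data_len = rsp_len;	/* forward the response payload */
		break;
	case DEMO_TIMEOUT:
		r.data_len = 0;		/* message with zero data length */
		break;
	case DEMO_ERROR:
		r.is_error = 1;		/* error message instead of data */
		break;
	}
	return r;
}

/* Usage example covering each path of the workflow. */
void demo_selfcheck(void)
{
	assert(demo_complete(DEMO_NETLINK_DRIVEN, DEMO_RESPONSE, 64).data_len == 64);
	assert(demo_complete(DEMO_NETLINK_DRIVEN, DEMO_TIMEOUT, 64).data_len == 0);
	assert(demo_complete(DEMO_NETLINK_DRIVEN, DEMO_ERROR, 0).is_error == 1);
	assert(demo_complete(DEMO_EVENT_DRIVEN, DEMO_RESPONSE, 64).sent == 0);
}
```

A user space client can therefore tell a timeout from a normal response by the zero-length data attribute, and an error by receiving an error message instead.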

V3: Based on http://patchwork.ozlabs.org/patch/979688/ to remove the duplicated code.
V2: Remove non-related debug message and clean up the code.


Signed-off-by: Justin Lee <justin.lee1@dell.com> 


---
 include/uapi/linux/ncsi.h |   3 +
 net/ncsi/internal.h       |  10 ++-
 net/ncsi/ncsi-cmd.c       |   8 ++
 net/ncsi/ncsi-manage.c    |  16 ++++
 net/ncsi/ncsi-netlink.c   | 204 ++++++++++++++++++++++++++++++++++++++++++++++
 net/ncsi/ncsi-netlink.h   |  12 +++
 net/ncsi/ncsi-rsp.c       |  67 +++++++++++++--
 7 files changed, 314 insertions(+), 6 deletions(-)

diff --git a/include/uapi/linux/ncsi.h b/include/uapi/linux/ncsi.h
index 4c292ec..4992bfc 100644
--- a/include/uapi/linux/ncsi.h
+++ b/include/uapi/linux/ncsi.h
@@ -30,6 +30,7 @@ enum ncsi_nl_commands {
 	NCSI_CMD_PKG_INFO,
 	NCSI_CMD_SET_INTERFACE,
 	NCSI_CMD_CLEAR_INTERFACE,
+	NCSI_CMD_SEND_CMD,
 
 	__NCSI_CMD_AFTER_LAST,
 	NCSI_CMD_MAX = __NCSI_CMD_AFTER_LAST - 1
@@ -43,6 +44,7 @@ enum ncsi_nl_commands {
  * @NCSI_ATTR_PACKAGE_LIST: nested array of NCSI_PKG_ATTR attributes
  * @NCSI_ATTR_PACKAGE_ID: package ID
  * @NCSI_ATTR_CHANNEL_ID: channel ID
+ * @NCSI_ATTR_DATA: command payload
  * @NCSI_ATTR_MAX: highest attribute number
  */
 enum ncsi_nl_attrs {
@@ -51,6 +53,7 @@ enum ncsi_nl_attrs {
 	NCSI_ATTR_PACKAGE_LIST,
 	NCSI_ATTR_PACKAGE_ID,
 	NCSI_ATTR_CHANNEL_ID,
+	NCSI_ATTR_DATA,
 
 	__NCSI_ATTR_AFTER_LAST,
 	NCSI_ATTR_MAX = __NCSI_ATTR_AFTER_LAST - 1
diff --git a/net/ncsi/internal.h b/net/ncsi/internal.h
index 3d0a33b..e9db100 100644
--- a/net/ncsi/internal.h
+++ b/net/ncsi/internal.h
@@ -175,6 +175,8 @@ struct ncsi_package;
 #define NCSI_RESERVED_CHANNEL	0x1f
 #define NCSI_CHANNEL_INDEX(c)	((c) & ((1 << NCSI_PACKAGE_SHIFT) - 1))
 #define NCSI_TO_CHANNEL(p, c)	(((p) << NCSI_PACKAGE_SHIFT) | (c))
+#define NCSI_MAX_PACKAGE	8
+#define NCSI_MAX_CHANNEL	32
 
 struct ncsi_channel {
 	unsigned char               id;
@@ -219,12 +221,17 @@ struct ncsi_request {
 	unsigned char        id;      /* Request ID - 0 to 255           */
 	bool                 used;    /* Request that has been assigned  */
 	unsigned int         flags;   /* NCSI request property           */
-#define NCSI_REQ_FLAG_EVENT_DRIVEN	1
+#define NCSI_REQ_FLAG_EVENT_DRIVEN		1
+#define NCSI_REQ_FLAG_NETLINK_DRIVEN	2
 	struct ncsi_dev_priv *ndp;    /* Associated NCSI device          */
 	struct sk_buff       *cmd;    /* Associated NCSI command packet  */
 	struct sk_buff       *rsp;    /* Associated NCSI response packet */
 	struct timer_list    timer;   /* Timer on waiting for response   */
 	bool                 enabled; /* Time has been enabled or not    */
+
+	u32                  snd_seq;     /* netlink sending sequence number */
+	u32                  snd_portid;  /* netlink portid of sender        */
+	struct nlmsghdr      nlhdr;       /* netlink message header          */
 };
 
 enum {
@@ -310,6 +317,7 @@ struct ncsi_cmd_arg {
 		unsigned int   dwords[4];
 	};
 	unsigned char        *data;       /* NCSI OEM data                 */
+	struct genl_info     *info;       /* Netlink information           */
 };
 
 extern struct list_head ncsi_dev_list;
diff --git a/net/ncsi/ncsi-cmd.c b/net/ncsi/ncsi-cmd.c
index 82b7d92..356af47 100644
--- a/net/ncsi/ncsi-cmd.c
+++ b/net/ncsi/ncsi-cmd.c
@@ -17,6 +17,7 @@
 #include <net/ncsi.h>
 #include <net/net_namespace.h>
 #include <net/sock.h>
+#include <net/genetlink.h>
 
 #include "internal.h"
 #include "ncsi-pkt.h"
@@ -346,6 +347,13 @@ int ncsi_xmit_cmd(struct ncsi_cmd_arg *nca)
 	if (!nr)
 		return -ENOMEM;
 
+	/* track netlink information */
+	if (nca->req_flags == NCSI_REQ_FLAG_NETLINK_DRIVEN) {
+		nr->snd_seq = nca->info->snd_seq;
+		nr->snd_portid = nca->info->snd_portid;
+		nr->nlhdr = *nca->info->nlhdr;
+	}
+
 	/* Prepare the packet */
 	nca->id = nr->id;
 	ret = nch->handler(nr->cmd, nca);
diff --git a/net/ncsi/ncsi-manage.c b/net/ncsi/ncsi-manage.c
index 0912847..76a4bcb 100644
--- a/net/ncsi/ncsi-manage.c
+++ b/net/ncsi/ncsi-manage.c
@@ -19,6 +19,7 @@
 #include <net/addrconf.h>
 #include <net/ipv6.h>
 #include <net/if_inet6.h>
+#include <net/genetlink.h>
 
 #include "internal.h"
 #include "ncsi-pkt.h"
@@ -406,6 +407,9 @@ static void ncsi_request_timeout(struct timer_list *t)
 {
 	struct ncsi_request *nr = from_timer(nr, t, timer);
 	struct ncsi_dev_priv *ndp = nr->ndp;
+	struct ncsi_package *np;
+	struct ncsi_channel *nc;
+	struct ncsi_cmd_pkt *cmd;
 	unsigned long flags;
 
 	/* If the request already had associated response,
@@ -419,6 +423,18 @@ static void ncsi_request_timeout(struct timer_list *t)
 	}
 	spin_unlock_irqrestore(&ndp->lock, flags);
 
+	if (nr->flags == NCSI_REQ_FLAG_NETLINK_DRIVEN) {
+		if (nr->cmd) {
+			/* Find the package */
+			cmd = (struct ncsi_cmd_pkt *)
+			      skb_network_header(nr->cmd);
+			ncsi_find_package_and_channel(ndp,
+						      cmd->cmd.common.channel,
+						      &np, &nc);
+			ncsi_send_netlink_timeout(nr, np, nc);
+		}
+	}
+
 	/* Release the request */
 	ncsi_free_request(nr);
 }
diff --git a/net/ncsi/ncsi-netlink.c b/net/ncsi/ncsi-netlink.c
index 45f33d6..3941bf6 100644
--- a/net/ncsi/ncsi-netlink.c
+++ b/net/ncsi/ncsi-netlink.c
@@ -20,6 +20,7 @@
 #include <uapi/linux/ncsi.h>
 
 #include "internal.h"
+#include "ncsi-pkt.h"
 #include "ncsi-netlink.h"
 
 static struct genl_family ncsi_genl_family;
@@ -29,6 +30,7 @@ static const struct nla_policy ncsi_genl_policy[NCSI_ATTR_MAX + 1] = {
 	[NCSI_ATTR_PACKAGE_LIST] =	{ .type = NLA_NESTED },
 	[NCSI_ATTR_PACKAGE_ID] =	{ .type = NLA_U32 },
 	[NCSI_ATTR_CHANNEL_ID] =	{ .type = NLA_U32 },
+	[NCSI_ATTR_DATA] =		{ .type = NLA_BINARY, .len = 2048 },
 };
 
 static struct ncsi_dev_priv *ndp_from_ifindex(struct net *net, u32 ifindex)
@@ -366,6 +368,202 @@ static int ncsi_clear_interface_nl(struct sk_buff *msg, struct genl_info *info)
 	return 0;
 }
 
+static int ncsi_send_cmd_nl(struct sk_buff *msg, struct genl_info *info)
+{
+	struct ncsi_dev_priv *ndp;
+
+	struct ncsi_cmd_arg nca;
+	struct ncsi_pkt_hdr *hdr;
+
+	u32 package_id, channel_id;
+	unsigned char *data;
+	int len, ret;
+
+	if (!info || !info->attrs) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	if (!info->attrs[NCSI_ATTR_IFINDEX]) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	if (!info->attrs[NCSI_ATTR_PACKAGE_ID]) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	if (!info->attrs[NCSI_ATTR_CHANNEL_ID]) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	ndp = ndp_from_ifindex(get_net(sock_net(msg->sk)),
+			       nla_get_u32(info->attrs[NCSI_ATTR_IFINDEX]));
+	if (!ndp) {
+		ret = -ENODEV;
+		goto out;
+	}
+
+	package_id = nla_get_u32(info->attrs[NCSI_ATTR_PACKAGE_ID]);
+	channel_id = nla_get_u32(info->attrs[NCSI_ATTR_CHANNEL_ID]);
+
+	if (package_id >= NCSI_MAX_PACKAGE || channel_id >= NCSI_MAX_CHANNEL) {
+		ret = -ERANGE;
+		goto out_netlink;
+	}
+
+	len = nla_len(info->attrs[NCSI_ATTR_DATA]);
+	if (len < sizeof(struct ncsi_pkt_hdr)) {
+		netdev_info(ndp->ndev.dev, "NCSI: no OEM command to send %u\n",
+			    package_id);
+		ret = -EINVAL;
+		goto out_netlink;
+	} else {
+		data = (unsigned char *)nla_data(info->attrs[NCSI_ATTR_DATA]);
+	}
+
+	hdr = (struct ncsi_pkt_hdr *)data;
+
+	nca.ndp = ndp;
+	nca.package = (unsigned char)package_id;
+	nca.channel = (unsigned char)channel_id;
+	nca.type = hdr->type;
+	nca.req_flags = NCSI_REQ_FLAG_NETLINK_DRIVEN;
+	nca.info = info;
+	nca.payload = ntohs(hdr->length);
+	nca.data = data + sizeof(*hdr);
+
+	ret = ncsi_xmit_cmd(&nca);
+out_netlink:
+	if (ret != 0) {
+		netdev_err(ndp->ndev.dev,
+			   "NCSI: Error %d sending OEM command\n",
+			   ret);
+		ncsi_send_netlink_err(ndp->ndev.dev,
+				      info->snd_seq,
+				      info->snd_portid,
+				      info->nlhdr,
+				      ret);
+	}
+out:
+	return ret;
+}
+
+int ncsi_send_netlink_rsp(struct ncsi_request *nr,
+			  struct ncsi_package *np,
+			  struct ncsi_channel *nc)
+{
+	struct sk_buff *skb;
+	struct net *net;
+	void *hdr;
+	int rc;
+
+	netdev_dbg(nr->ndp->ndev.dev, "NCSI: %s\n", __func__);
+
+	net = dev_net(nr->rsp->dev);
+
+	skb = genlmsg_new(NLMSG_DEFAULT_SIZE, GFP_ATOMIC);
+	if (!skb)
+		return -ENOMEM;
+
+	hdr = genlmsg_put(skb, nr->snd_portid, nr->snd_seq,
+			  &ncsi_genl_family, 0, NCSI_CMD_SEND_CMD);
+	if (!hdr) {
+		kfree_skb(skb);
+		return -EMSGSIZE;
+	}
+
+	nla_put_u32(skb, NCSI_ATTR_IFINDEX, nr->rsp->dev->ifindex);
+	if (np)
+		nla_put_u32(skb, NCSI_ATTR_PACKAGE_ID, np->id);
+	if (nc)
+		nla_put_u32(skb, NCSI_ATTR_CHANNEL_ID, nc->id);
+	else
+		nla_put_u32(skb, NCSI_ATTR_CHANNEL_ID, NCSI_RESERVED_CHANNEL);
+
+	rc = nla_put(skb, NCSI_ATTR_DATA, nr->rsp->len, (void *)nr->rsp->data);
+	if (rc)
+		goto err;
+
+	genlmsg_end(skb, hdr);
+	return genlmsg_unicast(net, skb, nr->snd_portid);
+
+err:
+	kfree_skb(skb);
+	return rc;
+}
+
+int ncsi_send_netlink_timeout(struct ncsi_request *nr,
+			      struct ncsi_package *np,
+			      struct ncsi_channel *nc)
+{
+	struct sk_buff *skb;
+	struct net *net;
+	void *hdr;
+
+	netdev_dbg(nr->ndp->ndev.dev, "NCSI: %s\n", __func__);
+
+	skb = genlmsg_new(NLMSG_DEFAULT_SIZE, GFP_ATOMIC);
+	if (!skb)
+		return -ENOMEM;
+
+	hdr = genlmsg_put(skb, nr->snd_portid, nr->snd_seq,
+			  &ncsi_genl_family, 0, NCSI_CMD_SEND_CMD);
+	if (!hdr) {
+		kfree_skb(skb);
+		return -EMSGSIZE;
+	}
+
+	net = dev_net(nr->cmd->dev);
+
+	nla_put_u32(skb, NCSI_ATTR_IFINDEX, nr->cmd->dev->ifindex);
+
+	if (np)
+		nla_put_u32(skb, NCSI_ATTR_PACKAGE_ID, np->id);
+	else
+		nla_put_u32(skb, NCSI_ATTR_PACKAGE_ID,
+			    NCSI_PACKAGE_INDEX((((struct ncsi_pkt_hdr *)
+						 nr->cmd->data)->channel)));
+
+	if (nc)
+		nla_put_u32(skb, NCSI_ATTR_CHANNEL_ID, nc->id);
+	else
+		nla_put_u32(skb, NCSI_ATTR_CHANNEL_ID, NCSI_RESERVED_CHANNEL);
+
+	genlmsg_end(skb, hdr);
+	return genlmsg_unicast(net, skb, nr->snd_portid);
+}
+
+int ncsi_send_netlink_err(struct net_device *dev,
+			  u32 snd_seq,
+			  u32 snd_portid,
+			  struct nlmsghdr *nlhdr,
+			  int err)
+{
+	struct sk_buff *skb;
+	struct nlmsghdr *nlh;
+	struct nlmsgerr *nle;
+	struct net *net;
+
+	skb = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_ATOMIC);
+	if (!skb)
+		return -ENOMEM;
+
+	net = dev_net(dev);
+
+	nlh = nlmsg_put(skb, snd_portid, snd_seq,
+			NLMSG_ERROR, sizeof(*nle), 0);
+	nle = (struct nlmsgerr *)nlmsg_data(nlh);
+	nle->error = err;
+	memcpy(&nle->msg, nlhdr, sizeof(*nlh));
+
+	nlmsg_end(skb, nlh);
+
+	return nlmsg_unicast(net->genl_sock, skb, snd_portid);
+}
+
 static const struct genl_ops ncsi_ops[] = {
 	{
 		.cmd = NCSI_CMD_PKG_INFO,
@@ -386,6 +584,12 @@ static const struct genl_ops ncsi_ops[] = {
 		.doit = ncsi_clear_interface_nl,
 		.flags = GENL_ADMIN_PERM,
 	},
+	{
+		.cmd = NCSI_CMD_SEND_CMD,
+		.policy = ncsi_genl_policy,
+		.doit = ncsi_send_cmd_nl,
+		.flags = GENL_ADMIN_PERM,
+	},
 };
 
 static struct genl_family ncsi_genl_family __ro_after_init = {
diff --git a/net/ncsi/ncsi-netlink.h b/net/ncsi/ncsi-netlink.h
index 91a5c25..c4a4688 100644
--- a/net/ncsi/ncsi-netlink.h
+++ b/net/ncsi/ncsi-netlink.h
@@ -14,6 +14,18 @@
 
 #include "internal.h"
 
+int ncsi_send_netlink_rsp(struct ncsi_request *nr,
+			  struct ncsi_package *np,
+			  struct ncsi_channel *nc);
+int ncsi_send_netlink_timeout(struct ncsi_request *nr,
+			      struct ncsi_package *np,
+			      struct ncsi_channel *nc);
+int ncsi_send_netlink_err(struct net_device *dev,
+			  u32 snd_seq,
+			  u32 snd_portid,
+			  struct nlmsghdr *nlhdr,
+			  int err);
+
 int ncsi_init_netlink(struct net_device *dev);
 int ncsi_unregister_netlink(struct net_device *dev);
 
diff --git a/net/ncsi/ncsi-rsp.c b/net/ncsi/ncsi-rsp.c
index d66b347..dd931d2 100644
--- a/net/ncsi/ncsi-rsp.c
+++ b/net/ncsi/ncsi-rsp.c
@@ -16,9 +16,11 @@
 #include <net/ncsi.h>
 #include <net/net_namespace.h>
 #include <net/sock.h>
+#include <net/genetlink.h>
 
 #include "internal.h"
 #include "ncsi-pkt.h"
+#include "ncsi-netlink.h"
 
 static int ncsi_validate_rsp_pkt(struct ncsi_request *nr,
 				 unsigned short payload)
@@ -32,15 +34,25 @@ static int ncsi_validate_rsp_pkt(struct ncsi_request *nr,
 	 * before calling this function.
 	 */
 	h = (struct ncsi_rsp_pkt_hdr *)skb_network_header(nr->rsp);
-	if (h->common.revision != NCSI_PKT_REVISION)
+
+	if (h->common.revision != NCSI_PKT_REVISION) {
+		netdev_dbg(nr->ndp->ndev.dev,
+			   "NCSI: unsupported header revision\n");
 		return -EINVAL;
-	if (ntohs(h->common.length) != payload)
+	}
+	if (ntohs(h->common.length) != payload) {
+		netdev_dbg(nr->ndp->ndev.dev,
+			   "NCSI: payload length mismatched\n");
 		return -EINVAL;
+	}
 
 	/* Check on code and reason */
 	if (ntohs(h->code) != NCSI_PKT_RSP_C_COMPLETED ||
-	    ntohs(h->reason) != NCSI_PKT_RSP_R_NO_ERROR)
-		return -EINVAL;
+	    ntohs(h->reason) != NCSI_PKT_RSP_R_NO_ERROR) {
+		netdev_dbg(nr->ndp->ndev.dev,
+			   "NCSI: non zero response/reason code\n");
+		return -EPERM;
+	}
 
 	/* Validate checksum, which might be zeroes if the
 	 * sender doesn't support checksum according to NCSI
@@ -52,8 +64,11 @@ static int ncsi_validate_rsp_pkt(struct ncsi_request *nr,
 
 	checksum = ncsi_calculate_checksum((unsigned char *)h,
 					   sizeof(*h) + payload - 4);
-	if (*pchecksum != htonl(checksum))
+
+	if (*pchecksum != htonl(checksum)) {
+		netdev_dbg(nr->ndp->ndev.dev, "NCSI: checksum mismatched\n");
 		return -EINVAL;
+	}
 
 	return 0;
 }
@@ -941,6 +956,26 @@ static int ncsi_rsp_handler_gpuuid(struct ncsi_request *nr)
 	return 0;
 }
 
+static int ncsi_rsp_handler_netlink(struct ncsi_request *nr)
+{
+	struct ncsi_rsp_pkt *rsp;
+	struct ncsi_dev_priv *ndp = nr->ndp;
+	struct ncsi_package *np;
+	struct ncsi_channel *nc;
+	int ret;
+
+	/* Find the package */
+	rsp = (struct ncsi_rsp_pkt *)skb_network_header(nr->rsp);
+	ncsi_find_package_and_channel(ndp, rsp->rsp.common.channel,
+				      &np, &nc);
+	if (!np)
+		return -ENODEV;
+
+	ret = ncsi_send_netlink_rsp(nr, np, nc);
+
+	return ret;
+}
+
 static struct ncsi_rsp_handler {
 	unsigned char	type;
 	int             payload;
@@ -1043,6 +1078,17 @@ int ncsi_rcv_rsp(struct sk_buff *skb, struct net_device *dev,
 		netdev_warn(ndp->ndev.dev,
 			    "NCSI: 'bad' packet ignored for type 0x%x\n",
 			    hdr->type);
+
+		if (nr->flags == NCSI_REQ_FLAG_NETLINK_DRIVEN) {
+			if (ret == -EPERM)
+				goto out_netlink;
+			else
+				ncsi_send_netlink_err(ndp->ndev.dev,
+						      nr->snd_seq,
+						      nr->snd_portid,
+						      &nr->nlhdr,
+						      ret);
+		}
 		goto out;
 	}
 
@@ -1052,6 +1098,17 @@ int ncsi_rcv_rsp(struct sk_buff *skb, struct net_device *dev,
 		netdev_err(ndp->ndev.dev,
 			   "NCSI: Handler for packet type 0x%x returned %d\n",
 			   hdr->type, ret);
+
+out_netlink:
+	if (nr->flags == NCSI_REQ_FLAG_NETLINK_DRIVEN) {
+		ret = ncsi_rsp_handler_netlink(nr);
+		if (ret) {
+			netdev_err(ndp->ndev.dev,
+				   "NCSI: Netlink handler for packet type 0x%x returned %d\n",
+				   hdr->type, ret);
+		}
+	}
+
 out:
 	ncsi_free_request(nr);
 	return ret;
-- 
2.9.3


^ permalink raw reply related

* Re: [PATCH v2 3/3] tools: add selftest for BPF_F_ZERO_SEED
From: Song Liu @ 2018-10-08 23:15 UTC (permalink / raw)
  To: lmb; +Cc: Alexei Starovoitov, Daniel Borkmann, Networking, linux-api
In-Reply-To: <20181008103221.13468-4-lmb@cloudflare.com>

On Mon, Oct 8, 2018 at 3:34 AM Lorenz Bauer <lmb@cloudflare.com> wrote:
>
> Check that iterating two separate hash maps produces the same
> order of keys if BPF_F_ZERO_SEED is used.
>
> Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
> ---
>  tools/testing/selftests/bpf/test_maps.c | 68 +++++++++++++++++++++----
>  1 file changed, 57 insertions(+), 11 deletions(-)
>
> diff --git a/tools/testing/selftests/bpf/test_maps.c b/tools/testing/selftests/bpf/test_maps.c
> index 9b552c0fc47d..a8d6af27803a 100644
> --- a/tools/testing/selftests/bpf/test_maps.c
> +++ b/tools/testing/selftests/bpf/test_maps.c
> @@ -257,23 +257,35 @@ static void test_hashmap_percpu(int task, void *data)
>         close(fd);
>  }
>
> +static int helper_fill_hashmap(int max_entries)

How about we add map_flags as the second argument of this function?
This will help avoid the old_flags hack.

Thanks,
Song

> +{
> +       int i, fd, ret;
> +       long long key, value;
> +
> +       fd = bpf_create_map(BPF_MAP_TYPE_HASH, sizeof(key), sizeof(value),
> +                           max_entries, map_flags);
> +       CHECK(fd < 0,
> +             "failed to create hashmap",
> +             "err: %s, flags: 0x%x\n", strerror(errno), map_flags);
> +
> +       for (i = 0; i < max_entries; i++) {
> +               key = i; value = key;
> +               ret = bpf_map_update_elem(fd, &key, &value, BPF_NOEXIST);
> +               CHECK(ret != 0,
> +                     "can't update hashmap",
> +                     "err: %s\n", strerror(ret));
> +       }
> +
> +       return fd;
> +}
> +
>  static void test_hashmap_walk(int task, void *data)
>  {
>         int fd, i, max_entries = 1000;
>         long long key, value, next_key;
>         bool next_key_valid = true;
>
> -       fd = bpf_create_map(BPF_MAP_TYPE_HASH, sizeof(key), sizeof(value),
> -                           max_entries, map_flags);
> -       if (fd < 0) {
> -               printf("Failed to create hashmap '%s'!\n", strerror(errno));
> -               exit(1);
> -       }
> -
> -       for (i = 0; i < max_entries; i++) {
> -               key = i; value = key;
> -               assert(bpf_map_update_elem(fd, &key, &value, BPF_NOEXIST) == 0);
> -       }
> +       fd = helper_fill_hashmap(max_entries);
>
>         for (i = 0; bpf_map_get_next_key(fd, !i ? NULL : &key,
>                                          &next_key) == 0; i++) {
> @@ -305,6 +317,39 @@ static void test_hashmap_walk(int task, void *data)
>         close(fd);
>  }
>
> +static void test_hashmap_zero_seed(void)
> +{
> +       int i, first, second, old_flags;
> +       long long key, next_first, next_second;
> +
> +       old_flags = map_flags;
> +       map_flags |= BPF_F_ZERO_SEED;
> +
> +       first = helper_fill_hashmap(3);
> +       second = helper_fill_hashmap(3);
> +
> +       for (i = 0; ; i++) {
> +               void *key_ptr = !i ? NULL : &key;
> +
> +               if (bpf_map_get_next_key(first, key_ptr, &next_first) != 0)
> +                       break;
> +
> +               CHECK(bpf_map_get_next_key(second, key_ptr, &next_second) != 0,
> +                     "next_key for second map must succeed",
> +                     "key_ptr: %p", key_ptr);
> +               CHECK(next_first != next_second,
> +                     "keys must match",
> +                     "i: %d first: %lld second: %lld\n", i,
> +                     next_first, next_second);
> +
> +               key = next_first;
> +       }
> +
> +       map_flags = old_flags;
> +       close(first);
> +       close(second);
> +}
> +
>  static void test_arraymap(int task, void *data)
>  {
>         int key, next_key, fd;
> @@ -1417,6 +1462,7 @@ static void run_all_tests(void)
>         test_hashmap(0, NULL);
>         test_hashmap_percpu(0, NULL);
>         test_hashmap_walk(0, NULL);
> +       test_hashmap_zero_seed();
>
>         test_arraymap(0, NULL);
>         test_arraymap_percpu(0, NULL);
> --
> 2.17.1
>
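
For illustration, the suggestion above (passing the flags in rather than saving and restoring a global) could look roughly like the sketch below. This is a userspace model only: demo_create_map() is a hypothetical stub standing in for bpf_create_map(), and DEMO_F_ZERO_SEED, demo_map_flags and demo_fill_hashmap() are made-up names, not the selftest's actual code.

```c
#include <stdint.h>

/* Hypothetical flag value, used only for this sketch. */
#define DEMO_F_ZERO_SEED (1U << 6)

/* Hypothetical stand-in for the selftest's global default flags. */
static uint32_t demo_map_flags;

/* Hypothetical stub that records the flags a map was created with.
 * It does not call into libbpf; it only hands out fake descriptors.
 */
static uint32_t demo_last_flags;
static int demo_next_fd = 3;

static int demo_create_map(int max_entries, uint32_t flags)
{
	(void)max_entries;
	demo_last_flags = flags;
	return demo_next_fd++;
}

/* The flags arrive as a parameter, so a caller that wants an extra
 * flag never has to modify (and later restore) the global default.
 */
static int demo_fill_hashmap(int max_entries, uint32_t flags)
{
	return demo_create_map(max_entries, flags);
}
```

A zero-seed test would then call demo_fill_hashmap(3, demo_map_flags | DEMO_F_ZERO_SEED) twice, while the existing walk test keeps passing demo_map_flags unchanged.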

^ permalink raw reply

* Re: [PATCH iptables] extensions: libxt_quota: Allow setting the remaining quota
From: Pablo Neira Ayuso @ 2018-10-08 23:16 UTC (permalink / raw)
  To: Chenbo Feng
  Cc: netdev, netfilter-devel, kernel-team, Lorenzo Colitti, maze,
	Chenbo Feng
In-Reply-To: <1538443388-6881-2-git-send-email-chenbofeng.kernel@gmail.com>

On Mon, Oct 01, 2018 at 06:23:07PM -0700, Chenbo Feng wrote:
> From: Chenbo Feng <fengc@google.com>
> 
> The current xt_quota module cannot track the current remaining quota
> of a specific rule. Every time an unrelated rule is updated in the same
> iptables table, the quota will be reset. This is not a very useful
> function for iptables that get changed at run time. This patch fixes the
> above problem by adding a new field in the struct that records the
> current remaining quota.
> 
> Fixed a print out bug in verbose print out wrt. inversion.

Applied, thanks.

Please, send me a patch to:

#1 add tests for iptables-tests.py, see .t file under extensions/
#2 document this new option in the manpage.

Thanks again.

^ permalink raw reply

* Re: [PATCH net-next v7 25/28] crypto: port Poly1305 to Zinc
From: Eric Biggers @ 2018-10-08 23:21 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: linux-kernel, netdev, davem, gregkh, Samuel Neves,
	Andy Lutomirski, linux-crypto
In-Reply-To: <20181006025709.4019-26-Jason@zx2c4.com>

On Sat, Oct 06, 2018 at 04:57:06AM +0200, Jason A. Donenfeld wrote:
> diff --git a/crypto/poly1305_zinc.c b/crypto/poly1305_zinc.c
> new file mode 100644
> index 000000000000..4794442edf26
> --- /dev/null
> +++ b/crypto/poly1305_zinc.c
> @@ -0,0 +1,98 @@
> +/* SPDX-License-Identifier: GPL-2.0
> + *
> + * Copyright (C) 2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
> + */
> +
> +#include <crypto/algapi.h>
> +#include <crypto/internal/hash.h>
> +#include <zinc/poly1305.h>
> +#include <linux/crypto.h>
> +#include <linux/kernel.h>
> +#include <linux/module.h>
> +#include <linux/simd.h>
> +
> +struct poly1305_desc_ctx {
> +	struct poly1305_ctx ctx;
> +	u8 key[POLY1305_KEY_SIZE];
> +	unsigned int rem_key_bytes;
> +};
> +
> +static int crypto_poly1305_init(struct shash_desc *desc)
> +{
> +	struct poly1305_desc_ctx *dctx = shash_desc_ctx(desc);
> +	dctx->rem_key_bytes = POLY1305_KEY_SIZE;
> +	return 0;
> +}
> +
> +static int crypto_poly1305_update(struct shash_desc *desc, const u8 *src,
> +				  unsigned int srclen)
> +{
> +	struct poly1305_desc_ctx *dctx = shash_desc_ctx(desc);
> +	simd_context_t simd_context;
> +
> +	if (unlikely(dctx->rem_key_bytes)) {
> +		unsigned int key_bytes = min(srclen, dctx->rem_key_bytes);
> +		memcpy(dctx->key + (POLY1305_KEY_SIZE - dctx->rem_key_bytes),
> +		       src, key_bytes);
> +		src += key_bytes;
> +		srclen -= key_bytes;
> +		dctx->rem_key_bytes -= key_bytes;
> +		if (!dctx->rem_key_bytes) {
> +			poly1305_init(&dctx->ctx, dctx->key);
> +			memzero_explicit(dctx->key, sizeof(dctx->key));
> +		}
> +		if (!srclen)
> +			return 0;
> +	}
> +
> +	simd_get(&simd_context);
> +	poly1305_update(&dctx->ctx, src, srclen, &simd_context);
> +	simd_put(&simd_context);
> +
> +	return 0;
> +}
> +
> +static int crypto_poly1305_final(struct shash_desc *desc, u8 *dst)
> +{
> +	struct poly1305_desc_ctx *dctx = shash_desc_ctx(desc);
> +	simd_context_t simd_context;
> +
> +	simd_get(&simd_context);
> +	poly1305_final(&dctx->ctx, dst, &simd_context);
> +	simd_put(&simd_context);
> +	return 0;
> +}

This crashes on very short inputs.  crypto_poly1305_final() is missing:

	if (dctx->rem_key_bytes)
		return -ENOKEY;

- Eric
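
To make the failure mode concrete, here is a minimal userspace sketch of the key bookkeeping, assuming a simplified model of the descriptor context. The model_* names and KEY_SIZE are hypothetical stand-ins, the Poly1305 arithmetic itself is omitted, and only the rem_key_bytes logic mirrors the quoted patch. It shows final() refusing to run until the whole key has been consumed.

```c
#include <errno.h>
#include <string.h>

#ifndef ENOKEY
#define ENOKEY 126	/* fallback for platforms without the Linux errno */
#endif

#define KEY_SIZE 32	/* stands in for POLY1305_KEY_SIZE */

/* Hypothetical model of poly1305_desc_ctx: key bookkeeping only. */
struct model_desc_ctx {
	unsigned char key[KEY_SIZE];
	unsigned int rem_key_bytes;
	int mac_initialized;	/* set once the full key has arrived */
};

static int model_init(struct model_desc_ctx *dctx)
{
	memset(dctx, 0, sizeof(*dctx));
	dctx->rem_key_bytes = KEY_SIZE;
	return 0;
}

static int model_update(struct model_desc_ctx *dctx,
			const unsigned char *src, unsigned int srclen)
{
	if (dctx->rem_key_bytes) {
		unsigned int key_bytes = srclen < dctx->rem_key_bytes ?
					 srclen : dctx->rem_key_bytes;

		/* The first KEY_SIZE input bytes are the one-time key. */
		memcpy(dctx->key + (KEY_SIZE - dctx->rem_key_bytes),
		       src, key_bytes);
		src += key_bytes;
		srclen -= key_bytes;
		dctx->rem_key_bytes -= key_bytes;
		if (!dctx->rem_key_bytes)
			dctx->mac_initialized = 1;
		if (!srclen)
			return 0;
	}

	/* Real code would pass src/srclen to poly1305_update() here. */
	(void)src;
	return 0;
}

static int model_final(struct model_desc_ctx *dctx)
{
	/* Without this check, final() would run on a MAC state that
	 * was never initialised whenever the input was shorter than
	 * the key.
	 */
	if (dctx->rem_key_bytes)
		return -ENOKEY;
	return 0;
}
```

Feeding 10 bytes and then calling model_final() returns -ENOKEY in this model; feeding at least KEY_SIZE bytes first lets it return 0.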

^ permalink raw reply

* Re: [PATCH] bpf: btf: Fix a missing check bug
From: Song Liu @ 2018-10-09  6:55 UTC (permalink / raw)
  To: valdis.kletnieks
  Cc: wang6495, kjlu, Alexei Starovoitov, Daniel Borkmann, Networking,
	open list
In-Reply-To: <9337.1539047240@turing-police.cc.vt.edu>

On Mon, Oct 8, 2018 at 6:07 PM <valdis.kletnieks@vt.edu> wrote:
>
> On Mon, 08 Oct 2018 17:44:46 -0700, Song Liu said:
>
> > I think I get the security concept here. However, hdr_len here is only used to
> > copy the whole header into kernel space, and it is not used in other
> > logic at all.
> > I cannot imagine any security flaw with either hdr_len > btf->hdr->hdr_len case or
> > hdr_len < btf->hdr->hdr_len. Could you please provide more insights on what
> > would break by malicious user space?
>
> Say the biggest allowed value for hdr_len is 128.  We check the value, the user has 98.
> They then stuff 16,383 into there.
>
> Now here's the problem - hdr_len is a local variable, and evaporates when the function
> returns.  From here on out, anybody who cares about the header length will use the
> value in btf->hdr_len....
>
> (And yes, somebody *does* care about the length, otherwise we wouldn't need a field
> saying what the length was....)
>
> Now think how many ways that can go pear-shaped.  You copied in 98 bytes, but outside
> the function, they think that header is almost 4 pages long.  Does that ever get used as
> a length for kmemcpy()?  Or a limit for a 'for (i=start; i< (start+hdr->hdr_len); i++)' that
> walks across a variable length header?
>
> Can you cook up a way to have a good chance to oops the kernel when it walks off the
> page you allocated the 98 bytes on?  Can you use it to export chunks of memory out to
> userspace?  Lots and lots of ways for this to kersplat a kernel...;

In the current code, I don't think any malicious hdr_len value could pass
btf_check_sec_info().
On the other hand, I agree this is a good-to-have check.

Acked-by: Song Liu <songliubraving@fb.com>
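
The double-fetch concern discussed in this thread can be modelled in userspace as in the sketch below. It is an illustration under assumptions, not the BTF code: struct demo_hdr, demo_parse_hdr() and the race hook are hypothetical, and plain memcpy() stands in for copy_from_user(). The length is read once and validated, the whole header is then copied (which reads the length a second time), and the copy is rejected if the two values disagree.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header layout; only hdr_len matters for the sketch. */
struct demo_hdr {
	uint16_t magic;
	uint8_t version;
	uint8_t flags;
	uint32_t hdr_len;
	uint32_t payload[4];
};

/* Test hook standing in for a racing user thread; may be NULL. */
typedef void (*race_hook_t)(void *user_buf);

static int demo_parse_hdr(void *user_buf, size_t user_size,
			  struct demo_hdr *out, race_hook_t race)
{
	const size_t len_off = offsetof(struct demo_hdr, hdr_len);
	const size_t min_len = len_off + sizeof(uint32_t);
	uint32_t hdr_len;

	if (user_size < min_len)
		return -EINVAL;

	/* First fetch: read only the length field and validate it. */
	memcpy(&hdr_len, (unsigned char *)user_buf + len_off,
	       sizeof(hdr_len));
	if (hdr_len < min_len || hdr_len > sizeof(*out) ||
	    hdr_len > user_size)
		return -EINVAL;

	/* Window in which user space could rewrite the buffer. */
	if (race)
		race(user_buf);

	/* Second fetch: copy the whole header, hdr_len included. */
	memset(out, 0, sizeof(*out));
	memcpy(out, user_buf, hdr_len);

	/* The copied length must match the validated one; otherwise
	 * later users of out->hdr_len would trust an unchecked value.
	 */
	if (out->hdr_len != hdr_len)
		return -EINVAL;

	return 0;
}

/* Simulates the attack: bump the length between the two fetches. */
static void demo_race_bump_len(void *user_buf)
{
	uint32_t evil = 16383;

	memcpy((unsigned char *)user_buf + offsetof(struct demo_hdr, hdr_len),
	       &evil, sizeof(evil));
}
```

With no hook the header parses cleanly; with demo_race_bump_len() as the hook, the final comparison catches the changed length and the function returns -EINVAL.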

^ permalink raw reply

* Re: [PATCH net 10/10] rxrpc: Fix the packet reception routine
From: David Howells @ 2018-10-08 23:41 UTC (permalink / raw)
  Cc: dhowells, netdev, pabeni, eric.dumazet, linux-afs, linux-kernel
In-Reply-To: <153903891257.17944.7418193390115112585.stgit@warthog.procyon.org.uk>

David Howells <dhowells@redhat.com> wrote:

>  struct rxrpc_call *rxrpc_new_incoming_call(struct rxrpc_local *local,
>  					   struct rxrpc_sock *rx,
> -					   struct rxrpc_peer *peer,
> -					   struct rxrpc_connection *conn,
>  					   struct sk_buff *skb)
>  {
>  	struct rxrpc_skb_priv *sp = rxrpc_skb(skb);
> +	struct rxrpc_connection *conn;
> +	struct rxrpc_peer *peer;

'peer' needs to be initialised to NULL here to prevent an oops later if we
fail to look the peer up.

	-	struct rxrpc_peer *peer;
	+	struct rxrpc_peer *peer = NULL;

I've repushed the branch with a new tag rxrpc-fixes-20181008-b if you could
pull that instead.  I can repost the series if you'd prefer.

David

^ permalink raw reply

* [RFC PATCH 11/11] net: ethernet: ti: cpsw: deprecate cpsw-phy-sel driver
From: Grygorii Strashko @ 2018-10-08 23:49 UTC (permalink / raw)
  To: David S. Miller, netdev, Tony Lindgren, Rob Herring,
	Kishon Vijay Abraham I
  Cc: Sekhar Nori, linux-kernel, linux-omap, devicetree,
	Grygorii Strashko
In-Reply-To: <20181008234949.15416-1-grygorii.strashko@ti.com>

Deprecate cpsw-phy-sel driver as it's been replaced with new
TI phy-gmii-sel PHY driver.

Cc: Kishon Vijay Abraham I <kishon@ti.com>
Cc: Tony Lindgren <tony@atomide.com>
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
---
 drivers/net/ethernet/ti/Kconfig | 6 +++---
 drivers/net/ethernet/ti/cpsw.h  | 6 ++++++
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/ti/Kconfig b/drivers/net/ethernet/ti/Kconfig
index f932923..96415da 100644
--- a/drivers/net/ethernet/ti/Kconfig
+++ b/drivers/net/ethernet/ti/Kconfig
@@ -49,10 +49,11 @@ config TI_DAVINCI_CPDMA
 	  will be called davinci_cpdma.  This is recommended.
 
 config TI_CPSW_PHY_SEL
-	bool
+	bool "TI CPSW Phy mode Selection (DEPRECATED)"
+	default n
 	---help---
 	  This driver supports configuring of the phy mode connected to
-	  the CPSW.
+	  the CPSW. DEPRECATED: use PHY_TI_GMII_SEL.
 
 config TI_CPSW_ALE
 	tristate "TI CPSW ALE Support"
@@ -64,7 +65,6 @@ config TI_CPSW
 	depends on ARCH_DAVINCI || ARCH_OMAP2PLUS || COMPILE_TEST
 	select TI_DAVINCI_CPDMA
 	select TI_DAVINCI_MDIO
-	select TI_CPSW_PHY_SEL
 	select TI_CPSW_ALE
 	select MFD_SYSCON
 	select REGMAP
diff --git a/drivers/net/ethernet/ti/cpsw.h b/drivers/net/ethernet/ti/cpsw.h
index cf111db..907e05fc 100644
--- a/drivers/net/ethernet/ti/cpsw.h
+++ b/drivers/net/ethernet/ti/cpsw.h
@@ -21,7 +21,13 @@
 			 ((mac)[2] << 16) | ((mac)[3] << 24))
 #define mac_lo(mac)	(((mac)[4] << 0) | ((mac)[5] << 8))
 
+#if IS_ENABLED(CONFIG_TI_CPSW_PHY_SEL)
 void cpsw_phy_sel(struct device *dev, phy_interface_t phy_mode, int slave);
+#else
+static inline
+void cpsw_phy_sel(struct device *dev, phy_interface_t phy_mode, int slave)
+{}
+#endif
 int ti_cm_get_macid(struct device *dev, int slave, u8 *mac_addr);
 
 #endif /* __CPSW_H__ */
-- 
2.10.5

^ permalink raw reply related

* [RFC PATCH 00/11] net: ethernet: ti: cpsw: replace cpsw-phy-sel with phy driver
From: Grygorii Strashko @ 2018-10-08 23:49 UTC (permalink / raw)
  To: David S. Miller, netdev, Tony Lindgren, Rob Herring,
	Kishon Vijay Abraham I
  Cc: Sekhar Nori, linux-kernel, linux-omap, devicetree,
	Grygorii Strashko

TI am335x/am437x/dra7(am5)/dm814x CPSW3G Ethernet Subsystem supports 
two 10/100/1000 Ethernet ports with selectable G/MII, RMII, and RGMII interfaces.
The interface mode is selected by configuring the MII mode selection register(s)
(GMII_SEL) in the System Control Module chapter (SCM).
                                               +--------------+
        +-------------------------------+      |SCM           |
        |                     CPSW      |      |  +---------+ |
        |        +--------------------------------+gmii_sel | |
        |        |                      |      |  +---------+ |
        |   +----v---+     +--------+   |      +--------------+
        |   |Port 1..<--+-->GMII/MII<------->
        |   |        |  |  |        |   |
        |   +--------+  |  +--------+   |
        |               |               |
        |               |  +--------+   |
        |               |  | RMII   <------->
        |               +-->        |   |
        |               |  +--------+   |
        |               |               |
        |               |  +--------+   |
        |               |  | RGMII  <------->
        |               +-->        |   |
        |                  +--------+   |
        +-------------------------------+

The placement of the GMII_SEL register(s) and bit fields in the SCM differs
between SoCs, while the meaning of the fields is the same. GMII_SEL selects the
port GMII/MII/RMII/RGMII mode, the RGMII internal delay mode (SoC dependent), and
the RMII reference clock output mode (SoC dependent).

Historically, interface mode selection for the CPSW external ports was
implemented with a custom driver and API, cpsw-phy-sel.c.
This leads to unnecessary driver, DT binding, and custom API support effort.
Moreover, even the definition of the cpsw-phy-sel node in DTs is logically incorrect [1]:

mac: ethernet@4a100000 {
	compatible = "ti,am4372-cpsw","ti,cpsw";
	...

	phy_sel: cpsw-phy-sel@44e10650 {
		compatible = "ti,am43xx-cpsw-phy-sel";
		reg= <0x44e10650 0x4>;
		reg-names = "gmii-sel";
	};
};

This series attempts to drop the custom CPSW port interface selection
implementation (cpsw-phy-sel.c) and use the well-defined Linux PHY framework instead.
It introduces the CPSW port PHY interface mode selection driver (phy-gmii-sel),
which implements the standard Linux PHY interface. The phy-gmii-sel PHY device
should be defined as a child device of the SCM node (scm_conf) and can be attached to
each CPSW port node using standard PHY bindings (cell 1 - port number,
cell 2 - RMII refclk mode).

scm_conf: scm_conf@0 {
	compatible = "syscon", "simple-bus";

	gmii_sel_phy: cpsw-sel-netif {
		compatible = "ti,am43xx-gmii-sel-phy";
		syscon-scm = <&scm_conf>;
		#phy-cells = <2>;
	};
};

mac: ethernet@4a100000 {
	compatible = "ti,am4372-cpsw","ti,cpsw";

	cpsw_emac0: slave@4a100200 {
		phy-mode = "rgmii";
		phys = <&gmii_sel_phy 1 0>;
	};
};

The CPSW driver requests phy-gmii-sel PHY for each external port and uses newly
introduced API phy_set_netif_mode() (Patch 1) for port interface mode selection
when netdev is opened.

	slave->data->gmii_sel_phy = devm_of_phy_get(&pdev->dev, port_node, NULL);
	slave->data->phy_if = of_get_phy_mode(port_node);

cpsw_ndo_open()
	phy_set_netif_mode(slave->data->gmii_sel_phy, slave->data->phy_if);

Note. CPSW Port interface has to be reconfigured every time netdev is opened for
proper System Suspend support where CPSW can lose context.

I've considered two options while working on this series:
1) extend enum phy_mode {} and introduce more enum elements to cover missing,
required Network PHY's Interface Mode definitions, like MII/GMII/RGMII(-XID),
but it'd mean copy-paste and data duplication from phy_interface_t -> phy_mode.
Moreover, phy_interface_t can still continue growing.

2) introduce new PHY API for network interface mode selection which will use
already defined set of modes from phy_interface_t.

Option 2 was selected for this series.

[1] https://www.mail-archive.com/netdev@vger.kernel.org/msg247135.html

Cc: Kishon Vijay Abraham I <kishon@ti.com>
Cc: Tony Lindgren <tony@atomide.com>

Grygorii Strashko (11):
  phy: core add phy_set_netif_mode() api
  dt-bindings: phy: add cpsw port interface mode selection phy bindings
  phy: ti: introduce phy-gmii-sel driver
  dt-bindings: net: ti: cpsw: switch to use phy-gmii-sel phy
  net: ethernet: ti: cpsw: add support for port interface mode selection
    phy
  ARM: dts: dra7: switch to use phy-gmii-sel
  ARM: dts: dm814x: switch to use phy-gmii-sel
  ARM: dts: am4372: switch to use phy-gmii-sel
  ARM: dts: am335x: switch to use phy-gmii-sel
  dt-bindings: net: ti: deprecate cpsw-phy-sel bindings
  net: ethernet: ti: cpsw: deprecate cpsw-phy-sel driver

 .../devicetree/bindings/net/cpsw-phy-sel.txt       |   2 +-
 Documentation/devicetree/bindings/net/cpsw.txt     |   8 +-
 .../devicetree/bindings/phy/ti-phy-gmii-sel.txt    |  68 ++++
 arch/arm/boot/dts/am335x-baltos-ir2110.dts         |   4 -
 arch/arm/boot/dts/am335x-baltos-ir3220.dts         |   3 -
 arch/arm/boot/dts/am335x-baltos-ir5221.dts         |   3 -
 arch/arm/boot/dts/am335x-chiliboard.dts            |   3 -
 arch/arm/boot/dts/am335x-icev2.dts                 |   4 -
 arch/arm/boot/dts/am335x-igep0033.dtsi             |   3 -
 arch/arm/boot/dts/am335x-lxm.dts                   |   3 -
 arch/arm/boot/dts/am335x-moxa-uc-8100-me-t.dts     |   5 -
 arch/arm/boot/dts/am335x-phycore-som.dtsi          |   3 -
 arch/arm/boot/dts/am33xx.dtsi                      |  14 +-
 arch/arm/boot/dts/am4372.dtsi                      |  16 +-
 arch/arm/boot/dts/am43x-epos-evm.dts               |   5 +-
 arch/arm/boot/dts/dm814x.dtsi                      |  15 +-
 arch/arm/boot/dts/dra7.dtsi                        |  14 +-
 drivers/net/ethernet/ti/Kconfig                    |   6 +-
 drivers/net/ethernet/ti/cpsw.c                     |  18 +-
 drivers/net/ethernet/ti/cpsw.h                     |   6 +
 drivers/phy/phy-core.c                             |  15 +
 drivers/phy/ti/Kconfig                             |  10 +
 drivers/phy/ti/Makefile                            |   1 +
 drivers/phy/ti/phy-gmii-sel.c                      | 345 +++++++++++++++++++++
 include/linux/phy/phy.h                            |  12 +
 25 files changed, 520 insertions(+), 66 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/phy/ti-phy-gmii-sel.txt
 create mode 100644 drivers/phy/ti/phy-gmii-sel.c

-- 
2.10.5

^ permalink raw reply

* Re: [PATCH net-next v7 25/28] crypto: port Poly1305 to Zinc
From: Jason A. Donenfeld @ 2018-10-09  0:02 UTC (permalink / raw)
  To: Eric Biggers
  Cc: LKML, Netdev, David Miller, Greg Kroah-Hartman, Samuel Neves,
	Andrew Lutomirski, Linux Crypto Mailing List
In-Reply-To: <20181008232059.GA164708@gmail.com>

Hi Eric,

On Tue, Oct 9, 2018 at 1:21 AM Eric Biggers <ebiggers@kernel.org> wrote:
> This crashes on very short inputs.  crypto_poly1305_final() is missing:
>
>         if (dctx->rem_key_bytes)
>                 return -ENOKEY;

Good catch, thanks. Queued for v8.

Jason

^ permalink raw reply

* Re: [PATCH bpf-next 0/6] Error handling when map lookup isn't supported
From: Prashant Bhole @ 2018-10-09  0:02 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann
  Cc: Alexei Starovoitov, Jakub Kicinski, David S . Miller,
	Quentin Monnet, netdev
In-Reply-To: <20181005183512.lmjbgf2qoene3r4w@ast-mbp.dhcp.thefacebook.com>



On 10/6/2018 3:35 AM, Alexei Starovoitov wrote:
> On Fri, Oct 05, 2018 at 12:35:55PM +0900, Prashant Bhole wrote:
>> Currently, when a map lookup fails, the user space API cannot make any
>> distinction whether the given key was not found or lookup is not supported
>> by the particular map.
>>
>> In this series we modify return value of maps which do not support
>> lookup. Lookup on such map implementation will return -EOPNOTSUPP.
>> bpf() syscall with BPF_MAP_LOOKUP_ELEM command will set EOPNOTSUPP
>> errno. We also handle this error in bpftool to print appropriate
>> message.
>>
>> Patch 1: adds handling of the BPF_MAP_LOOKUP_ELEM command of the bpf syscall
>> such that errno will be set to EOPNOTSUPP when the map doesn't support lookup
>>
>> Patch 2: Modifies the return value of map_lookup_elem() to EOPNOTSUPP
>> for maps which do not support lookup
>>
>> Patch 3: Splits do_dump() in bpftool/map.c. Element printing code is
>> moved out into a new function, dump_map_elem(). This was done in order to
>> reduce deep indentation and accommodate further changes.
>>
>> Patch 4: Changes bpftool to print the strerror() message when a lookup
>> error occurs. This will result in an appropriate message like
>> "Operation not supported" when the map doesn't support lookup.
>>
>> Patch 5: test_verifier: change fixup map naming convention as
>> suggested by Alexei
>>
>> Patch 6: Adds verifier tests to check whether the verifier rejects calls
>> to bpf_map_lookup_elem from a bpf program, for all map types that
>> do not support map lookup.
> 
> for the set:
> Acked-by: Alexei Starovoitov <ast@kernel.org>

Thanks. Is there any reason this series did not get posted on
netdev-list and cannot be seen in patchwork?

^ permalink raw reply

* Re: [PATCH] dt-bindings: Add bindings for aliases node
From: Geert Uytterhoeven @ 2018-10-09  7:22 UTC (permalink / raw)
  To: Brian Norris
  Cc: Matthias Kaehlcke, Rob Herring, Mark Rutland,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS,
	Linux Kernel Mailing List, linux-wireless, linux-spi, netdev,
	swboyd, Florian Fainelli
In-Reply-To: <20181009010711.GA1577@ban.mtv.corp.google.com>

Hi Brian,

On Tue, Oct 9, 2018 at 3:07 AM Brian Norris <briannorris@chromium.org> wrote:
> On Tue, Sep 25, 2018 at 02:02:55PM -0700, Matthias Kaehlcke wrote:
> > Add a global binding for the 'aliases' node. This includes an initial list
> > of standardized alias names for some hardware components that are commonly
> > found in 'aliases'.
> >
> > Signed-off-by: Matthias Kaehlcke <mka@chromium.org>

> > --- /dev/null
> > +++ b/Documentation/devicetree/bindings/aliases.txt
> > @@ -0,0 +1,47 @@
> > +The aliases node
> > +----------------
>
> I like the idea in general, and it might be good to note (e.g., commit
> message) that this was inspired by this thread:
>
> https://lore.kernel.org/lkml/20180815221601.GB24830@rob-hp-laptop/
>
> where we were interested in firmware-to-device-tree path stability --
> and the answer was basically: don't memorize paths, just use aliases
> instead. But then, it was clear that aliases were not documented very
> formally at all.
>
> So here we are!
>
> > +
> > +The aliases node contains properties that represent aliases to device tree
> > +nodes. The name of the property is the alias name, the value is the path of
> > +the device tree node that corresponds to the alias. The path may be
> > +specified as a string or a phandle.
> > +
> > +Alias names are often suffixed with a numeric ID, especially when there may
> > +be multiple instances of the same type. The ID typically corresponds to the
> > +hardware layout, it may also be used by drivers for a stable mapping of
> > +device names and hardware entities.
> > +
> > +Alias names
> > +-----------
> > +
> > +The devicetree specification doesn't require the use of specific alias
> > +names to refer to hardware entities of a given type, however the Linux
> > +kernel aims for a certain level of consistency.
> > +
> > +The following standardized alias names shall be used for their

s/shall/may/

> > +corresponding hardware components:
> > +
> > +  bluetoothN         Bluetooth controller
> > +  ethernetN          Ethernet interface
> > +  gpioN              GPIO controller
> > +  i2cN               i2c bus
> > +  mmcN               MMC bus
> > +  rtcN               Real time clock
> > +  serialN            UART port
> > +  spiN               SPI bus
> > +  wifiN              Wireless network interface
>
> For the network-device-related names (bluetooth, ethernet, and wifi), I
> think there's a clear documented reason for this (supporting MAC address
> plumbing from a DT-aware bootloader). I'm not quite as sure about all
> the others, and unfortunately, I'm aware of at least one subsystem owner
> that explicitly does NOT like the aliases usage that is currently
> supported (spi), and shot down a patch where I tried to use it in a DTS
> file (despite its regular usage in many other DTS files).
>
> So I guess I'm saying: perhaps we should get buy-in from various
> subsystems before we include them? So maybe it's wiser to start
> small(er) and only add once we're sure they are useful? Or perhaps Rob
> has other thoughts.

Please note these aliases become cumbersome once you start considering
(dynamic) DT overlays.  That's why I made them optional in the sh-sci
serial driver, cfr. commit 7678f4c20fa7670f ("serial: sh-sci: Add support
for dynamic instances").
Relevant parts of the commit description are:

    On DT platforms, the sh-sci driver requires the presence of "serialN"
    aliases in DT, from which instance IDs are derived.  If a DT alias is
    missing, the driver fails to probe the corresponding serial port.

    This becomes cumbersome when considering DT overlays, as currently
    there is no upstream support for dynamically updating the /aliases node
    in DT.  Furthermore, even in the presence of such support, hardcoded
    instance IDs in independent overlays are prone to conflicts.

    Hence add support for dynamic instance IDs, to be used in the absence of
    a DT alias.  This makes serial ports behave similar to I2C and SPI
    buses, which already support dynamic instances.
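
The fallback described in that commit message can be sketched in plain
userspace C. This is only an illustration, not the sh-sci driver's code:
assign_port_id(), struct port_ids and MAX_PORTS are hypothetical names,
and a negative alias_id stands for "no alias present in DT".

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical upper bound on the number of port instances. */
#define MAX_PORTS 18

/* Tracks which instance IDs are already taken. */
struct port_ids {
	bool used[MAX_PORTS];
};

/*
 * Pick an instance ID for a port.
 *
 * alias_id >= 0: the ID came from a "serialN" style alias; reserve exactly
 *                that ID and fail if it is out of range or already taken.
 * alias_id <  0: no alias was found; fall back to the first free ID.
 *
 * Returns the assigned ID, or -1 on conflict or exhaustion.
 */
int assign_port_id(struct port_ids *ids, int alias_id)
{
	if (alias_id >= 0) {
		if (alias_id >= MAX_PORTS || ids->used[alias_id])
			return -1;
		ids->used[alias_id] = true;
		return alias_id;
	}

	for (int i = 0; i < MAX_PORTS; i++) {
		if (!ids->used[i]) {
			ids->used[i] = true;
			return i;
		}
	}
	return -1;
}
```

With this scheme a port that carries an alias keeps its stable ID, while
ports added later (for example by an overlay without aliases) simply take
the next free slot. A second port asking for an ID that is already taken
is rejected, which is the conflict case the commit message warns about.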

To clarify my point: R-Car M2-W has 4 different types of serial ports, for a
total of 18 ports, and the two ports on a board labeled 0 and 1 may not
correspond to the physical first two ports (what's "first" in a collection of
4 different types?).

Aliases may be fine for referring to the main serial console (labeled
port 0 on the device, too), and the primary Ethernet interface (so U-Boot
knows where to add the "local-mac-address" property), but beyond that,
I think they should be avoided.

Just my two^H^H^Hfive €c.

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply

* Re: [PATCH bpf] xsk: do not call synchronize_net() under RCU read lock
From: Song Liu @ 2018-10-09  0:30 UTC (permalink / raw)
  To: Björn Töpel
  Cc: Alexei Starovoitov, Daniel Borkmann, Networking, eric.dumazet,
	Björn Töpel, Magnus Karlsson, Magnus Karlsson
In-Reply-To: <20181008174016.12307-1-bjorn.topel@gmail.com>

On Mon, Oct 8, 2018 at 10:41 AM Björn Töpel <bjorn.topel@gmail.com> wrote:
>
> From: Björn Töpel <bjorn.topel@intel.com>
>
> The XSKMAP update and delete functions called synchronize_net(), which
> can sleep. It is not allowed to sleep during an RCU read section.
>
> Instead we need to make sure that the sock sk_destruct (xsk_destruct)
> function is asynchronously called after an RCU grace period. Setting
> the SOCK_RCU_FREE flag for XDP sockets takes care of this.
>
> Fixes: fbfc504a24f5 ("bpf: introduce new bpf AF_XDP map type BPF_MAP_TYPE_XSKMAP")
> Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Acked-by: Song Liu <songliubraving@fb.com>

> ---
>  kernel/bpf/xskmap.c | 10 ++--------
>  net/xdp/xsk.c       |  2 ++
>  2 files changed, 4 insertions(+), 8 deletions(-)
>
> diff --git a/kernel/bpf/xskmap.c b/kernel/bpf/xskmap.c
> index 9f8463afda9c..47147c9e184d 100644
> --- a/kernel/bpf/xskmap.c
> +++ b/kernel/bpf/xskmap.c
> @@ -192,11 +192,8 @@ static int xsk_map_update_elem(struct bpf_map *map, void *key, void *value,
>         sock_hold(sock->sk);
>
>         old_xs = xchg(&m->xsk_map[i], xs);
> -       if (old_xs) {
> -               /* Make sure we've flushed everything. */
> -               synchronize_net();
> +       if (old_xs)
>                 sock_put((struct sock *)old_xs);
> -       }
>
>         sockfd_put(sock);
>         return 0;
> @@ -212,11 +209,8 @@ static int xsk_map_delete_elem(struct bpf_map *map, void *key)
>                 return -EINVAL;
>
>         old_xs = xchg(&m->xsk_map[k], NULL);
> -       if (old_xs) {
> -               /* Make sure we've flushed everything. */
> -               synchronize_net();
> +       if (old_xs)
>                 sock_put((struct sock *)old_xs);
> -       }
>
>         return 0;
>  }
> diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> index 0577cd49aa72..07156f43d295 100644
> --- a/net/xdp/xsk.c
> +++ b/net/xdp/xsk.c
> @@ -754,6 +754,8 @@ static int xsk_create(struct net *net, struct socket *sock, int protocol,
>         sk->sk_destruct = xsk_destruct;
>         sk_refcnt_debug_inc(sk);
>
> +       sock_set_flag(sk, SOCK_RCU_FREE);
> +
>         xs = xdp_sk(sk);
>         mutex_init(&xs->mutex);
>         spin_lock_init(&xs->tx_completion_lock);
> --
> 2.17.1
>

^ permalink raw reply

* Re: [PATCH bpf-next 0/6] Error handling when map lookup isn't supported
From: Daniel Borkmann @ 2018-10-09  0:43 UTC (permalink / raw)
  To: Prashant Bhole, Alexei Starovoitov
  Cc: Alexei Starovoitov, Jakub Kicinski, David S . Miller,
	Quentin Monnet, netdev
In-Reply-To: <49b6c6b1-d66b-db0d-d866-9720b269c1a2@lab.ntt.co.jp>

On 10/09/2018 02:02 AM, Prashant Bhole wrote:
> On 10/6/2018 3:35 AM, Alexei Starovoitov wrote:
>> On Fri, Oct 05, 2018 at 12:35:55PM +0900, Prashant Bhole wrote:
>>> Currently, when a map lookup fails, the user space API cannot tell
>>> whether the given key was not found or lookup is not supported by
>>> the particular map.
>>>
>>> In this series we modify the return value of maps which do not
>>> support lookup. Lookup on such a map implementation will return
>>> -EOPNOTSUPP. The bpf() syscall with the BPF_MAP_LOOKUP_ELEM command
>>> will set errno to EOPNOTSUPP. We also handle this error in bpftool
>>> to print an appropriate message.
>>>
>>> Patch 1: Adds handling to the BPF_MAP_LOOKUP_ELEM command of the bpf
>>> syscall such that errno is set to EOPNOTSUPP when the map doesn't
>>> support lookup.
>>>
>>> Patch 2: Modifies the return value of map_lookup_elem() to EOPNOTSUPP
>>> for maps which do not support lookup.
>>>
>>> Patch 3: Splits do_dump() in bpftool/map.c. The element printing code
>>> is moved out into a new function, dump_map_elem(). This was done in
>>> order to reduce deep indentation and accommodate further changes.
>>>
>>> Patch 4: Changes bpftool to print the strerror() message when a
>>> lookup error occurs. This results in an appropriate message like
>>> "Operation not supported" when the map doesn't support lookup.
>>>
>>> Patch 5: test_verifier: changes the fixup map naming convention as
>>> suggested by Alexei.
>>>
>>> Patch 6: Adds verifier tests to check that the verifier rejects calls
>>> to bpf_map_lookup_elem from a bpf program for all map types that do
>>> not support map lookup.
>>
>> for the set:
>> Acked-by: Alexei Starovoitov <ast@kernel.org>
> 
> Thanks. Is there any reason this series did not get posted on netdev-list and can not be seen in the patchwork?

Hmm, could you repost to netdev? Perhaps a netdev or patchwork issue that
it did not land there. I just double-checked and it's indeed not present.

Thanks,
Daniel

^ permalink raw reply

* Re: [PATCH bpf-next 0/6] Error handling when map lookup isn't supported
From: Prashant Bhole @ 2018-10-09  0:47 UTC (permalink / raw)
  To: Daniel Borkmann, Alexei Starovoitov
  Cc: Alexei Starovoitov, Jakub Kicinski, David S . Miller,
	Quentin Monnet, netdev
In-Reply-To: <5029a2fe-10f8-116a-428d-e02790ec8366@iogearbox.net>



On 10/9/2018 9:43 AM, Daniel Borkmann wrote:
> On 10/09/2018 02:02 AM, Prashant Bhole wrote:
>> On 10/6/2018 3:35 AM, Alexei Starovoitov wrote:
>>> On Fri, Oct 05, 2018 at 12:35:55PM +0900, Prashant Bhole wrote:
>>>> Currently, when a map lookup fails, the user space API cannot tell
>>>> whether the given key was not found or lookup is not supported by
>>>> the particular map.
>>>>
>>>> In this series we modify the return value of maps which do not
>>>> support lookup. Lookup on such a map implementation will return
>>>> -EOPNOTSUPP. The bpf() syscall with the BPF_MAP_LOOKUP_ELEM command
>>>> will set errno to EOPNOTSUPP. We also handle this error in bpftool
>>>> to print an appropriate message.
>>>>
>>>> Patch 1: Adds handling to the BPF_MAP_LOOKUP_ELEM command of the bpf
>>>> syscall such that errno is set to EOPNOTSUPP when the map doesn't
>>>> support lookup.
>>>>
>>>> Patch 2: Modifies the return value of map_lookup_elem() to EOPNOTSUPP
>>>> for maps which do not support lookup.
>>>>
>>>> Patch 3: Splits do_dump() in bpftool/map.c. The element printing code
>>>> is moved out into a new function, dump_map_elem(). This was done in
>>>> order to reduce deep indentation and accommodate further changes.
>>>>
>>>> Patch 4: Changes bpftool to print the strerror() message when a
>>>> lookup error occurs. This results in an appropriate message like
>>>> "Operation not supported" when the map doesn't support lookup.
>>>>
>>>> Patch 5: test_verifier: changes the fixup map naming convention as
>>>> suggested by Alexei.
>>>>
>>>> Patch 6: Adds verifier tests to check that the verifier rejects calls
>>>> to bpf_map_lookup_elem from a bpf program for all map types that do
>>>> not support map lookup.
>>>
>>> for the set:
>>> Acked-by: Alexei Starovoitov <ast@kernel.org>
>>
>> Thanks. Is there any reason this series did not get posted on netdev-list and can not be seen in the patchwork?
> 
> Hmm, could you repost to netdev? Perhaps a netdev or patchwork issue that
> it did not land there. I just double-checked and it's indeed not present.
> 

Shall I repost with the same version and Alexei's Acked-by for the series?

-Prashant

^ permalink raw reply

* Re: [PATCH bpf-next 0/6] Error handling when map lookup isn't supported
From: Alexei Starovoitov @ 2018-10-09  0:57 UTC (permalink / raw)
  To: Prashant Bhole
  Cc: Daniel Borkmann, Alexei Starovoitov, Jakub Kicinski,
	David S. Miller, Quentin Monnet, Network Development
In-Reply-To: <57e8a677-bfe7-4a39-f58c-daa1a5dba72b@lab.ntt.co.jp>

On Tue, Oct 9, 2018 at 12:48 AM Prashant Bhole
<bhole_prashant_q7@lab.ntt.co.jp> wrote:
>
> Shall I repost with the same version and Alexei's Acked-by for the series?

yes. please repost as-is and add my Ack to all patches.
Thanks!

^ permalink raw reply

* [PATCH bpf-next 0/6] Error handling when map lookup isn't supported
From: Prashant Bhole @ 2018-10-09  1:04 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann
  Cc: Prashant Bhole, Jakub Kicinski, David S . Miller, Quentin Monnet,
	netdev

Currently, when a map lookup fails, the user space API cannot tell
whether the given key was not found or lookup is not supported by
the particular map.

In this series we modify the return value of maps which do not
support lookup. Lookup on such a map implementation will return
-EOPNOTSUPP. The bpf() syscall with the BPF_MAP_LOOKUP_ELEM command
will set errno to EOPNOTSUPP. We also handle this error in bpftool
to print an appropriate message.

Patch 1: Adds handling to the BPF_MAP_LOOKUP_ELEM command of the bpf
syscall such that errno is set to EOPNOTSUPP when the map doesn't
support lookup.

Patch 2: Modifies the return value of map_lookup_elem() to EOPNOTSUPP
for maps which do not support lookup.

Patch 3: Splits do_dump() in bpftool/map.c. The element printing code
is moved out into a new function, dump_map_elem(). This was done in
order to reduce deep indentation and accommodate further changes.

Patch 4: Changes bpftool to print the strerror() message when a
lookup error occurs. This results in an appropriate message like
"Operation not supported" when the map doesn't support lookup.

Patch 5: test_verifier: changes the fixup map naming convention as
suggested by Alexei.

Patch 6: Adds verifier tests to check that the verifier rejects calls
to bpf_map_lookup_elem from a bpf program for all map types that do
not support map lookup.
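
From the user space side, the distinction the series introduces could be
consumed roughly as follows. This is a hedged sketch, not bpftool's
actual code: classify_lookup(), format_lookup_error() and the
lookup_status enum are hypothetical names, and ret/err stand for the
return value and errno of a failed BPF_MAP_LOOKUP_ELEM call.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical classification of a map lookup result. */
enum lookup_status {
	LOOKUP_OK,          /* element found */
	LOOKUP_NO_ENTRY,    /* key not present (ENOENT) */
	LOOKUP_UNSUPPORTED, /* map type has no lookup (EOPNOTSUPP) */
	LOOKUP_OTHER,       /* any other failure */
};

/* ret is the syscall return value, err the errno captured right after it. */
enum lookup_status classify_lookup(int ret, int err)
{
	if (ret == 0)
		return LOOKUP_OK;

	switch (err) {
	case ENOENT:
		return LOOKUP_NO_ENTRY;
	case EOPNOTSUPP:
		return LOOKUP_UNSUPPORTED;
	default:
		return LOOKUP_OTHER;
	}
}

/* Build a human-readable message from errno, in the spirit of patch 4. */
int format_lookup_error(int err, char *buf, size_t len)
{
	return snprintf(buf, len, "lookup failed: %s", strerror(err));
}
```

A caller would typically skip LOOKUP_NO_ENTRY silently while dumping a
map, and print the formatted message for LOOKUP_UNSUPPORTED so the user
sees why no values are shown.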

Prashant Bhole (6):
  bpf: error handling when map_lookup_elem isn't supported
  bpf: return EOPNOTSUPP when map lookup isn't supported
  tools/bpf: bpftool, split the function do_dump()
  tools/bpf: bpftool, print strerror when map lookup error occurs
  selftests/bpf: test_verifier, change names of fixup maps
  selftests/bpf: test_verifier, check bpf_map_lookup_elem access in bpf
    prog

 kernel/bpf/arraymap.c                       |   2 +-
 kernel/bpf/sockmap.c                        |   2 +-
 kernel/bpf/stackmap.c                       |   2 +-
 kernel/bpf/syscall.c                        |   9 +-
 kernel/bpf/xskmap.c                         |   2 +-
 tools/bpf/bpftool/map.c                     | 102 ++--
 tools/testing/selftests/bpf/test_verifier.c | 501 ++++++++++++--------
 7 files changed, 389 insertions(+), 231 deletions(-)

-- 
2.17.1

^ permalink raw reply

* [PATCH bpf-next 1/6] bpf: error handling when map_lookup_elem isn't supported
From: Prashant Bhole @ 2018-10-09  1:04 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann
  Cc: Prashant Bhole, Jakub Kicinski, David S . Miller, Quentin Monnet,
	netdev
In-Reply-To: <20181009010454.6652-1-bhole_prashant_q7@lab.ntt.co.jp>

The error value returned by map_lookup_elem doesn't differentiate
whether lookup failed because of an invalid key or because lookup is
not supported.

Let's add handling for the -EOPNOTSUPP return value of the map's
map_lookup_elem() method, with the expectation that a map
implementation returns -EOPNOTSUPP if lookup is not supported.

The errno of the bpf syscall for the BPF_MAP_LOOKUP_ELEM command will
be set to EOPNOTSUPP if map lookup is not supported.

Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
Acked-by: Alexei Starovoitov <ast@kernel.org>
---
 kernel/bpf/syscall.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 5742df21598c..4f416234251f 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -719,10 +719,15 @@ static int map_lookup_elem(union bpf_attr *attr)
 	} else {
 		rcu_read_lock();
 		ptr = map->ops->map_lookup_elem(map, key);
-		if (ptr)
+		if (IS_ERR(ptr)) {
+			err = PTR_ERR(ptr);
+		} else if (!ptr) {
+			err = -ENOENT;
+		} else {
+			err = 0;
 			memcpy(value, ptr, value_size);
+		}
 		rcu_read_unlock();
-		err = ptr ? 0 : -ENOENT;
 	}
 
 	if (err)
-- 
2.17.1

^ permalink raw reply related
