* [PATCH net-next 1/4] rhashtable: add function to replace an element
2015-09-24 16:30 [PATCH net-next 0/4] ila: Use NF_INET_PRE_ROUTING nfhook Tom Herbert
@ 2015-09-24 16:30 ` Tom Herbert
2015-09-24 16:30 ` [PATCH net-next 2/4] netlink: add a start callback for starting a netlink dump Tom Herbert
` (3 subsequent siblings)
4 siblings, 0 replies; 10+ messages in thread
From: Tom Herbert @ 2015-09-24 16:30 UTC
To: davem, netdev; +Cc: kernel-team
Add the rhashtable_replace_fast function. This replaces one object in
the table with another atomically. The hashes of the new and old objects
must be equal.
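For readers unfamiliar with the chain manipulation involved, the following is a minimal userspace sketch of the replace operation, not the kernel code itself. The names `struct node`, `struct table`, `table_insert` and `table_replace` are hypothetical stand-ins; the sketch stores a precomputed hash in each node and uses plain stores where the patch takes the bucket lock and uses rcu_assign_pointer().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct rhash_head: one chain link. */
struct node {
	struct node *next;
	unsigned int hash;	/* precomputed hash, for illustration only */
};

#define NBUCKETS 4

struct table {
	struct node *buckets[NBUCKETS];
};

/* Push a node at the head of its bucket chain. */
static void table_insert(struct table *t, struct node *n)
{
	assert(t && n);
	n->next = t->buckets[n->hash % NBUCKETS];
	t->buckets[n->hash % NBUCKETS] = n;
}

/*
 * Replace obj_old with obj_new in place. Returns 0 on success, -EINVAL if
 * the two hashes differ, -ENOENT if obj_old is not found in its chain.
 * No locking or RCU publication is modelled here.
 */
static int table_replace(struct table *t, struct node *obj_old,
			 struct node *obj_new)
{
	struct node **pprev;

	assert(t && obj_old && obj_new);
	if (obj_old->hash != obj_new->hash)
		return -EINVAL;

	for (pprev = &t->buckets[obj_old->hash % NBUCKETS]; *pprev;
	     pprev = &(*pprev)->next) {
		if (*pprev != obj_old)
			continue;
		/* Point the new node at the tail first, then swing the
		 * predecessor, so the chain is never left broken.
		 */
		obj_new->next = obj_old->next;
		*pprev = obj_new;
		return 0;
	}
	return -ENOENT;
}
```

The order of the two stores is the point of the sketch: the new node is fully linked before it becomes reachable, which is what lets a concurrent reader in the real implementation see either the old or the new object but never a truncated chain.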
Signed-off-by: Tom Herbert <tom@herbertland.com>
---
include/linux/rhashtable.h | 80 ++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 80 insertions(+)
diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h
index 843ceca..c7c86cb 100644
--- a/include/linux/rhashtable.h
+++ b/include/linux/rhashtable.h
@@ -819,4 +819,84 @@ out:
return err;
}
+/* Internal function, please use rhashtable_replace_fast() instead */
+static inline int __rhashtable_replace_fast(
+ struct rhashtable *ht, struct bucket_table *tbl,
+ struct rhash_head *obj_old, struct rhash_head *obj_new,
+ const struct rhashtable_params params)
+{
+ struct rhash_head __rcu **pprev;
+ struct rhash_head *he;
+ spinlock_t *lock;
+ unsigned int hash;
+ int err = -ENOENT;
+
+ /* Minimally, the old and new objects must have same hash */
+ hash = rht_head_hashfn(ht, tbl, obj_old, params);
+ if (hash != rht_head_hashfn(ht, tbl, obj_new, params))
+ return -EINVAL;
+
+ lock = rht_bucket_lock(tbl, hash);
+
+ spin_lock_bh(lock);
+
+ pprev = &tbl->buckets[hash];
+ rht_for_each(he, tbl, hash) {
+ if (he != obj_old) {
+ pprev = &he->next;
+ continue;
+ }
+
+ rcu_assign_pointer(obj_new->next, obj_old->next);
+ rcu_assign_pointer(*pprev, obj_new);
+ err = 0;
+ break;
+ }
+
+ spin_unlock_bh(lock);
+
+ return err;
+}
+
+/**
+ * rhashtable_replace_fast - replace an object in hash table
+ * @ht: hash table
+ * @obj_old: pointer to hash head inside object being replaced
+ * @obj_new: pointer to hash head inside object which is new
+ * @params: hash table parameters
+ *
+ * Replacing an object doesn't affect the number of elements in the hash table
+ * or bucket, so we don't need to worry about shrinking or expanding the
+ * table here.
+ *
+ * Returns zero on success, -ENOENT if the entry could not be found,
+ * -EINVAL if hash is not the same for the old and new objects.
+ */
+static inline int rhashtable_replace_fast(
+ struct rhashtable *ht, struct rhash_head *obj_old,
+ struct rhash_head *obj_new,
+ const struct rhashtable_params params)
+{
+ struct bucket_table *tbl;
+ int err;
+
+ rcu_read_lock();
+
+ tbl = rht_dereference_rcu(ht->tbl, ht);
+
+ /* Because we have already taken (and released) the bucket
+ * lock in old_tbl, if we find that future_tbl is not yet
+ * visible then that guarantees the entry to still be in
+ * the old tbl if it exists.
+ */
+ while ((err = __rhashtable_replace_fast(ht, tbl, obj_old,
+ obj_new, params)) &&
+ (tbl = rht_dereference_rcu(tbl->future_tbl, ht)))
+ ;
+
+ rcu_read_unlock();
+
+ return err;
+}
+
#endif /* _LINUX_RHASHTABLE_H */
--
2.4.6
* [PATCH net-next 2/4] netlink: add a start callback for starting a netlink dump
2015-09-24 16:30 [PATCH net-next 0/4] ila: Use NF_INET_PRE_ROUTING nfhook Tom Herbert
2015-09-24 16:30 ` [PATCH net-next 1/4] rhashtable: add function to replace an element Tom Herbert
@ 2015-09-24 16:30 ` Tom Herbert
2015-09-24 16:30 ` [PATCH net-next 3/4] ila: Create net/ipv6/ila directory Tom Herbert
` (2 subsequent siblings)
4 siblings, 0 replies; 10+ messages in thread
From: Tom Herbert @ 2015-09-24 16:30 UTC
To: davem, netdev; +Cc: kernel-team
The start callback allows the caller to set up a context for the
dump callbacks. Presumably, the context can then be destroyed in
the done callback.
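As a rough illustration of the intended lifecycle, here is a userspace sketch, assuming a simplified callback structure. `struct dump_cb`, `run_dump` and the `demo_*` callbacks are hypothetical names and do not exist in the kernel; the real dump callback also takes an skb and is driven across several recvmsg() calls rather than in one loop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model of struct netlink_callback. */
struct dump_cb {
	int (*start)(struct dump_cb *cb);	/* optional: set up context */
	int (*dump)(struct dump_cb *cb);	/* >0: more data, 0: finished */
	int (*done)(struct dump_cb *cb);	/* optional: tear down context */
	void *ctx;				/* owned by start/done */
};

/* Drive one dump: start once, dump until it returns <= 0, then done. */
static int run_dump(struct dump_cb *cb)
{
	int ret;

	assert(cb && cb->dump);
	if (cb->start)
		cb->start(cb);	/* return value ignored, as in this patch */

	while ((ret = cb->dump(cb)) > 0)
		;

	if (cb->done)
		cb->done(cb);
	return ret;
}

/* Example context: a countdown of entries still to be emitted. */
struct countdown {
	int remaining;
	int emitted;
};

static struct countdown demo_state;

static int demo_start(struct dump_cb *cb)
{
	demo_state.remaining = 3;
	demo_state.emitted = 0;
	cb->ctx = &demo_state;
	return 0;
}

static int demo_dump(struct dump_cb *cb)
{
	struct countdown *c = cb->ctx;

	if (c->remaining == 0)
		return 0;
	c->remaining--;
	c->emitted++;
	return 1;
}

static int demo_done(struct dump_cb *cb)
{
	cb->ctx = NULL;
	return 0;
}
```

With these callbacks, run_dump() emits three entries and leaves the context pointer cleared. The benefit of a separate start step is that the dump callback no longer has to detect its own first invocation in order to allocate state.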
Signed-off-by: Tom Herbert <tom@herbertland.com>
---
include/linux/netlink.h | 2 ++
include/net/genetlink.h | 2 ++
net/netlink/af_netlink.c | 4 ++++
net/netlink/genetlink.c | 16 ++++++++++++++++
4 files changed, 24 insertions(+)
diff --git a/include/linux/netlink.h b/include/linux/netlink.h
index 639e9b8..0b41959 100644
--- a/include/linux/netlink.h
+++ b/include/linux/netlink.h
@@ -131,6 +131,7 @@ netlink_skb_clone(struct sk_buff *skb, gfp_t gfp_mask)
struct netlink_callback {
struct sk_buff *skb;
const struct nlmsghdr *nlh;
+ int (*start)(struct netlink_callback *);
int (*dump)(struct sk_buff * skb,
struct netlink_callback *cb);
int (*done)(struct netlink_callback *cb);
@@ -153,6 +154,7 @@ struct nlmsghdr *
__nlmsg_put(struct sk_buff *skb, u32 portid, u32 seq, int type, int len, int flags);
struct netlink_dump_control {
+ int (*start)(struct netlink_callback *);
int (*dump)(struct sk_buff *skb, struct netlink_callback *);
int (*done)(struct netlink_callback *);
void *data;
diff --git a/include/net/genetlink.h b/include/net/genetlink.h
index a9af1cc..d76f2da 100644
--- a/include/net/genetlink.h
+++ b/include/net/genetlink.h
@@ -114,6 +114,7 @@ static inline void genl_info_net_set(struct genl_info *info, struct net *net)
* @flags: flags
* @policy: attribute validation policy
* @doit: standard command callback
+ * @start: start callback for dumps
* @dumpit: callback for dumpers
* @done: completion callback for dumps
* @ops_list: operations list
@@ -122,6 +123,7 @@ struct genl_ops {
const struct nla_policy *policy;
int (*doit)(struct sk_buff *skb,
struct genl_info *info);
+ int (*start)(struct netlink_callback *cb);
int (*dumpit)(struct sk_buff *skb,
struct netlink_callback *cb);
int (*done)(struct netlink_callback *cb);
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 7f86d3b..25c6633 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -2870,6 +2870,7 @@ int __netlink_dump_start(struct sock *ssk, struct sk_buff *skb,
cb = &nlk->cb;
memset(cb, 0, sizeof(*cb));
+ cb->start = control->start;
cb->dump = control->dump;
cb->done = control->done;
cb->nlh = nlh;
@@ -2882,6 +2883,9 @@ int __netlink_dump_start(struct sock *ssk, struct sk_buff *skb,
mutex_unlock(nlk->cb_mutex);
+ if (cb->start)
+ cb->start(cb);
+
ret = netlink_dump(sk);
sock_put(sk);
diff --git a/net/netlink/genetlink.c b/net/netlink/genetlink.c
index 2ed5f96..3d111b0 100644
--- a/net/netlink/genetlink.c
+++ b/net/netlink/genetlink.c
@@ -513,6 +513,20 @@ void *genlmsg_put(struct sk_buff *skb, u32 portid, u32 seq,
}
EXPORT_SYMBOL(genlmsg_put);
+static int genl_lock_start(struct netlink_callback *cb)
+{
+ /* our ops are always const - netlink API doesn't propagate that */
+ const struct genl_ops *ops = cb->data;
+ int rc = 0;
+
+ if (ops->start) {
+ genl_lock();
+ rc = ops->start(cb);
+ genl_unlock();
+ }
+ return rc;
+}
+
static int genl_lock_dumpit(struct sk_buff *skb, struct netlink_callback *cb)
{
/* our ops are always const - netlink API doesn't propagate that */
@@ -577,6 +591,7 @@ static int genl_family_rcv_msg(struct genl_family *family,
.module = family->module,
/* we have const, but the netlink API doesn't */
.data = (void *)ops,
+ .start = genl_lock_start,
.dump = genl_lock_dumpit,
.done = genl_lock_done,
};
@@ -588,6 +603,7 @@ static int genl_family_rcv_msg(struct genl_family *family,
} else {
struct netlink_dump_control c = {
.module = family->module,
+ .start = ops->start,
.dump = ops->dumpit,
.done = ops->done,
};
--
2.4.6
* [PATCH net-next 3/4] ila: Create net/ipv6/ila directory
2015-09-24 16:30 [PATCH net-next 0/4] ila: Use NF_INET_PRE_ROUTING nfhook Tom Herbert
2015-09-24 16:30 ` [PATCH net-next 1/4] rhashtable: add function to replace an element Tom Herbert
2015-09-24 16:30 ` [PATCH net-next 2/4] netlink: add a start callback for starting a netlink dump Tom Herbert
@ 2015-09-24 16:30 ` Tom Herbert
2015-09-24 16:30 ` [PATCH net-next 4/4] ila: Add support for netfilter NF_INET_PRE_ROUTING hook Tom Herbert
2015-09-27 6:15 ` [PATCH net-next 0/4] ila: Use NF_INET_PRE_ROUTING nfhook David Miller
4 siblings, 0 replies; 10+ messages in thread
From: Tom Herbert @ 2015-09-24 16:30 UTC
To: davem, netdev; +Cc: kernel-team
Create the ila directory in preparation for supporting kernel hooks
other than LWT for doing ILA. This includes:
- Moving ila.c to ila/ila_lwt.c
- Splitting out some common functions into ila_common.c
Signed-off-by: Tom Herbert <tom@herbertland.com>
---
net/ipv6/Makefile | 2 +-
net/ipv6/ila.c | 229 ----------------------------------------------
net/ipv6/ila/Makefile | 7 ++
net/ipv6/ila/ila.h | 46 ++++++++++
net/ipv6/ila/ila_common.c | 94 +++++++++++++++++++
net/ipv6/ila/ila_lwt.c | 149 ++++++++++++++++++++++++++++++
6 files changed, 297 insertions(+), 230 deletions(-)
delete mode 100644 net/ipv6/ila.c
create mode 100644 net/ipv6/ila/Makefile
create mode 100644 net/ipv6/ila/ila.h
create mode 100644 net/ipv6/ila/ila_common.c
create mode 100644 net/ipv6/ila/ila_lwt.c
diff --git a/net/ipv6/Makefile b/net/ipv6/Makefile
index 2c900c7..2fbd90b 100644
--- a/net/ipv6/Makefile
+++ b/net/ipv6/Makefile
@@ -34,7 +34,7 @@ obj-$(CONFIG_INET6_XFRM_MODE_TUNNEL) += xfrm6_mode_tunnel.o
obj-$(CONFIG_INET6_XFRM_MODE_ROUTEOPTIMIZATION) += xfrm6_mode_ro.o
obj-$(CONFIG_INET6_XFRM_MODE_BEET) += xfrm6_mode_beet.o
obj-$(CONFIG_IPV6_MIP6) += mip6.o
-obj-$(CONFIG_IPV6_ILA) += ila.o
+obj-$(CONFIG_IPV6_ILA) += ila/
obj-$(CONFIG_NETFILTER) += netfilter/
obj-$(CONFIG_IPV6_VTI) += ip6_vti.o
diff --git a/net/ipv6/ila.c b/net/ipv6/ila.c
deleted file mode 100644
index 678d2df..0000000
--- a/net/ipv6/ila.c
+++ /dev/null
@@ -1,229 +0,0 @@
-#include <linux/errno.h>
-#include <linux/ip.h>
-#include <linux/kernel.h>
-#include <linux/module.h>
-#include <linux/skbuff.h>
-#include <linux/socket.h>
-#include <linux/types.h>
-#include <net/checksum.h>
-#include <net/ip.h>
-#include <net/ip6_fib.h>
-#include <net/lwtunnel.h>
-#include <net/protocol.h>
-#include <uapi/linux/ila.h>
-
-struct ila_params {
- __be64 locator;
- __be64 locator_match;
- __wsum csum_diff;
-};
-
-static inline struct ila_params *ila_params_lwtunnel(
- struct lwtunnel_state *lwstate)
-{
- return (struct ila_params *)lwstate->data;
-}
-
-static inline __wsum compute_csum_diff8(const __be32 *from, const __be32 *to)
-{
- __be32 diff[] = {
- ~from[0], ~from[1], to[0], to[1],
- };
-
- return csum_partial(diff, sizeof(diff), 0);
-}
-
-static inline __wsum get_csum_diff(struct ipv6hdr *ip6h, struct ila_params *p)
-{
- if (*(__be64 *)&ip6h->daddr == p->locator_match)
- return p->csum_diff;
- else
- return compute_csum_diff8((__be32 *)&ip6h->daddr,
- (__be32 *)&p->locator);
-}
-
-static void update_ipv6_locator(struct sk_buff *skb, struct ila_params *p)
-{
- __wsum diff;
- struct ipv6hdr *ip6h = ipv6_hdr(skb);
- size_t nhoff = sizeof(struct ipv6hdr);
-
- /* First update checksum */
- switch (ip6h->nexthdr) {
- case NEXTHDR_TCP:
- if (likely(pskb_may_pull(skb, nhoff + sizeof(struct tcphdr)))) {
- struct tcphdr *th = (struct tcphdr *)
- (skb_network_header(skb) + nhoff);
-
- diff = get_csum_diff(ip6h, p);
- inet_proto_csum_replace_by_diff(&th->check, skb,
- diff, true);
- }
- break;
- case NEXTHDR_UDP:
- if (likely(pskb_may_pull(skb, nhoff + sizeof(struct udphdr)))) {
- struct udphdr *uh = (struct udphdr *)
- (skb_network_header(skb) + nhoff);
-
- if (uh->check || skb->ip_summed == CHECKSUM_PARTIAL) {
- diff = get_csum_diff(ip6h, p);
- inet_proto_csum_replace_by_diff(&uh->check, skb,
- diff, true);
- if (!uh->check)
- uh->check = CSUM_MANGLED_0;
- }
- }
- break;
- case NEXTHDR_ICMP:
- if (likely(pskb_may_pull(skb,
- nhoff + sizeof(struct icmp6hdr)))) {
- struct icmp6hdr *ih = (struct icmp6hdr *)
- (skb_network_header(skb) + nhoff);
-
- diff = get_csum_diff(ip6h, p);
- inet_proto_csum_replace_by_diff(&ih->icmp6_cksum, skb,
- diff, true);
- }
- break;
- }
-
- /* Now change destination address */
- *(__be64 *)&ip6h->daddr = p->locator;
-}
-
-static int ila_output(struct sock *sk, struct sk_buff *skb)
-{
- struct dst_entry *dst = skb_dst(skb);
-
- if (skb->protocol != htons(ETH_P_IPV6))
- goto drop;
-
- update_ipv6_locator(skb, ila_params_lwtunnel(dst->lwtstate));
-
- return dst->lwtstate->orig_output(sk, skb);
-
-drop:
- kfree_skb(skb);
- return -EINVAL;
-}
-
-static int ila_input(struct sk_buff *skb)
-{
- struct dst_entry *dst = skb_dst(skb);
-
- if (skb->protocol != htons(ETH_P_IPV6))
- goto drop;
-
- update_ipv6_locator(skb, ila_params_lwtunnel(dst->lwtstate));
-
- return dst->lwtstate->orig_input(skb);
-
-drop:
- kfree_skb(skb);
- return -EINVAL;
-}
-
-static struct nla_policy ila_nl_policy[ILA_ATTR_MAX + 1] = {
- [ILA_ATTR_LOCATOR] = { .type = NLA_U64, },
-};
-
-static int ila_build_state(struct net_device *dev, struct nlattr *nla,
- unsigned int family, const void *cfg,
- struct lwtunnel_state **ts)
-{
- struct ila_params *p;
- struct nlattr *tb[ILA_ATTR_MAX + 1];
- size_t encap_len = sizeof(*p);
- struct lwtunnel_state *newts;
- const struct fib6_config *cfg6 = cfg;
- int ret;
-
- if (family != AF_INET6)
- return -EINVAL;
-
- ret = nla_parse_nested(tb, ILA_ATTR_MAX, nla,
- ila_nl_policy);
- if (ret < 0)
- return ret;
-
- if (!tb[ILA_ATTR_LOCATOR])
- return -EINVAL;
-
- newts = lwtunnel_state_alloc(encap_len);
- if (!newts)
- return -ENOMEM;
-
- newts->len = encap_len;
- p = ila_params_lwtunnel(newts);
-
- p->locator = (__force __be64)nla_get_u64(tb[ILA_ATTR_LOCATOR]);
-
- if (cfg6->fc_dst_len > sizeof(__be64)) {
- /* Precompute checksum difference for translation since we
- * know both the old locator and the new one.
- */
- p->locator_match = *(__be64 *)&cfg6->fc_dst;
- p->csum_diff = compute_csum_diff8(
- (__be32 *)&p->locator_match, (__be32 *)&p->locator);
- }
-
- newts->type = LWTUNNEL_ENCAP_ILA;
- newts->flags |= LWTUNNEL_STATE_OUTPUT_REDIRECT |
- LWTUNNEL_STATE_INPUT_REDIRECT;
-
- *ts = newts;
-
- return 0;
-}
-
-static int ila_fill_encap_info(struct sk_buff *skb,
- struct lwtunnel_state *lwtstate)
-{
- struct ila_params *p = ila_params_lwtunnel(lwtstate);
-
- if (nla_put_u64(skb, ILA_ATTR_LOCATOR, (__force u64)p->locator))
- goto nla_put_failure;
-
- return 0;
-
-nla_put_failure:
- return -EMSGSIZE;
-}
-
-static int ila_encap_nlsize(struct lwtunnel_state *lwtstate)
-{
- /* No encapsulation overhead */
- return 0;
-}
-
-static int ila_encap_cmp(struct lwtunnel_state *a, struct lwtunnel_state *b)
-{
- struct ila_params *a_p = ila_params_lwtunnel(a);
- struct ila_params *b_p = ila_params_lwtunnel(b);
-
- return (a_p->locator != b_p->locator);
-}
-
-static const struct lwtunnel_encap_ops ila_encap_ops = {
- .build_state = ila_build_state,
- .output = ila_output,
- .input = ila_input,
- .fill_encap = ila_fill_encap_info,
- .get_encap_size = ila_encap_nlsize,
- .cmp_encap = ila_encap_cmp,
-};
-
-static int __init ila_init(void)
-{
- return lwtunnel_encap_add_ops(&ila_encap_ops, LWTUNNEL_ENCAP_ILA);
-}
-
-static void __exit ila_fini(void)
-{
- lwtunnel_encap_del_ops(&ila_encap_ops, LWTUNNEL_ENCAP_ILA);
-}
-
-module_init(ila_init);
-module_exit(ila_fini);
-MODULE_AUTHOR("Tom Herbert <tom@herbertland.com>");
-MODULE_LICENSE("GPL");
diff --git a/net/ipv6/ila/Makefile b/net/ipv6/ila/Makefile
new file mode 100644
index 0000000..31d136b
--- /dev/null
+++ b/net/ipv6/ila/Makefile
@@ -0,0 +1,7 @@
+#
+# Makefile for ILA module
+#
+
+obj-$(CONFIG_IPV6_ILA) += ila.o
+
+ila-objs := ila_common.o ila_lwt.o
diff --git a/net/ipv6/ila/ila.h b/net/ipv6/ila/ila.h
new file mode 100644
index 0000000..b94081f
--- /dev/null
+++ b/net/ipv6/ila/ila.h
@@ -0,0 +1,46 @@
+/*
+ * Copyright (c) 2015 Tom Herbert <tom@herbertland.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of
+ * the License, or (at your option) any later version.
+ *
+ */
+
+#ifndef __ILA_H
+#define __ILA_H
+
+#include <linux/errno.h>
+#include <linux/ip.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/socket.h>
+#include <linux/skbuff.h>
+#include <linux/types.h>
+#include <net/checksum.h>
+#include <net/ip.h>
+#include <net/protocol.h>
+#include <uapi/linux/ila.h>
+
+struct ila_params {
+ __be64 locator;
+ __be64 locator_match;
+ __wsum csum_diff;
+};
+
+static inline __wsum compute_csum_diff8(const __be32 *from, const __be32 *to)
+{
+ __be32 diff[] = {
+ ~from[0], ~from[1], to[0], to[1],
+ };
+
+ return csum_partial(diff, sizeof(diff), 0);
+}
+
+void update_ipv6_locator(struct sk_buff *skb, struct ila_params *p);
+
+int ila_lwt_init(void);
+void ila_lwt_fini(void);
+
+#endif /* __ILA_H */
diff --git a/net/ipv6/ila/ila_common.c b/net/ipv6/ila/ila_common.c
new file mode 100644
index 0000000..7cfd54d
--- /dev/null
+++ b/net/ipv6/ila/ila_common.c
@@ -0,0 +1,94 @@
+#include <linux/errno.h>
+#include <linux/ip.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/skbuff.h>
+#include <linux/socket.h>
+#include <linux/types.h>
+#include <net/checksum.h>
+#include <net/ip.h>
+#include <net/ip6_fib.h>
+#include <net/protocol.h>
+#include <uapi/linux/ila.h>
+#include "ila.h"
+
+static __wsum get_csum_diff(struct ipv6hdr *ip6h, struct ila_params *p)
+{
+ if (*(__be64 *)&ip6h->daddr == p->locator_match)
+ return p->csum_diff;
+ else
+ return compute_csum_diff8((__be32 *)&ip6h->daddr,
+ (__be32 *)&p->locator);
+}
+
+void update_ipv6_locator(struct sk_buff *skb, struct ila_params *p)
+{
+ __wsum diff;
+ struct ipv6hdr *ip6h = ipv6_hdr(skb);
+ size_t nhoff = sizeof(struct ipv6hdr);
+
+ /* First update checksum */
+ switch (ip6h->nexthdr) {
+ case NEXTHDR_TCP:
+ if (likely(pskb_may_pull(skb, nhoff + sizeof(struct tcphdr)))) {
+ struct tcphdr *th = (struct tcphdr *)
+ (skb_network_header(skb) + nhoff);
+
+ diff = get_csum_diff(ip6h, p);
+ inet_proto_csum_replace_by_diff(&th->check, skb,
+ diff, true);
+ }
+ break;
+ case NEXTHDR_UDP:
+ if (likely(pskb_may_pull(skb, nhoff + sizeof(struct udphdr)))) {
+ struct udphdr *uh = (struct udphdr *)
+ (skb_network_header(skb) + nhoff);
+
+ if (uh->check || skb->ip_summed == CHECKSUM_PARTIAL) {
+ diff = get_csum_diff(ip6h, p);
+ inet_proto_csum_replace_by_diff(&uh->check, skb,
+ diff, true);
+ if (!uh->check)
+ uh->check = CSUM_MANGLED_0;
+ }
+ }
+ break;
+ case NEXTHDR_ICMP:
+ if (likely(pskb_may_pull(skb,
+ nhoff + sizeof(struct icmp6hdr)))) {
+ struct icmp6hdr *ih = (struct icmp6hdr *)
+ (skb_network_header(skb) + nhoff);
+
+ diff = get_csum_diff(ip6h, p);
+ inet_proto_csum_replace_by_diff(&ih->icmp6_cksum, skb,
+ diff, true);
+ }
+ break;
+ }
+
+ /* Now change destination address */
+ *(__be64 *)&ip6h->daddr = p->locator;
+}
+
+static int __init ila_init(void)
+{
+ int ret;
+
+ ret = ila_lwt_init();
+
+ if (ret)
+ goto fail_lwt;
+
+fail_lwt:
+ return ret;
+}
+
+static void __exit ila_fini(void)
+{
+ ila_lwt_fini();
+}
+
+module_init(ila_init);
+module_exit(ila_fini);
+MODULE_AUTHOR("Tom Herbert <tom@herbertland.com>");
+MODULE_LICENSE("GPL");
diff --git a/net/ipv6/ila/ila_lwt.c b/net/ipv6/ila/ila_lwt.c
new file mode 100644
index 0000000..e423f06
--- /dev/null
+++ b/net/ipv6/ila/ila_lwt.c
@@ -0,0 +1,149 @@
+#include <linux/errno.h>
+#include <linux/ip.h>
+#include <linux/kernel.h>
+#include <linux/skbuff.h>
+#include <linux/socket.h>
+#include <linux/types.h>
+#include <net/ip.h>
+#include <net/ip6_fib.h>
+#include <net/lwtunnel.h>
+#include <uapi/linux/ila.h>
+#include "ila.h"
+
+static inline struct ila_params *ila_params_lwtunnel(
+ struct lwtunnel_state *lwstate)
+{
+ return (struct ila_params *)lwstate->data;
+}
+
+static int ila_output(struct sock *sk, struct sk_buff *skb)
+{
+ struct dst_entry *dst = skb_dst(skb);
+
+ if (skb->protocol != htons(ETH_P_IPV6))
+ goto drop;
+
+ update_ipv6_locator(skb, ila_params_lwtunnel(dst->lwtstate));
+
+ return dst->lwtstate->orig_output(sk, skb);
+
+drop:
+ kfree_skb(skb);
+ return -EINVAL;
+}
+
+static int ila_input(struct sk_buff *skb)
+{
+ struct dst_entry *dst = skb_dst(skb);
+
+ if (skb->protocol != htons(ETH_P_IPV6))
+ goto drop;
+
+ update_ipv6_locator(skb, ila_params_lwtunnel(dst->lwtstate));
+
+ return dst->lwtstate->orig_input(skb);
+
+drop:
+ kfree_skb(skb);
+ return -EINVAL;
+}
+
+static struct nla_policy ila_nl_policy[ILA_ATTR_MAX + 1] = {
+ [ILA_ATTR_LOCATOR] = { .type = NLA_U64, },
+};
+
+static int ila_build_state(struct net_device *dev, struct nlattr *nla,
+ unsigned int family, const void *cfg,
+ struct lwtunnel_state **ts)
+{
+ struct ila_params *p;
+ struct nlattr *tb[ILA_ATTR_MAX + 1];
+ size_t encap_len = sizeof(*p);
+ struct lwtunnel_state *newts;
+ const struct fib6_config *cfg6 = cfg;
+ int ret;
+
+ if (family != AF_INET6)
+ return -EINVAL;
+
+ ret = nla_parse_nested(tb, ILA_ATTR_MAX, nla,
+ ila_nl_policy);
+ if (ret < 0)
+ return ret;
+
+ if (!tb[ILA_ATTR_LOCATOR])
+ return -EINVAL;
+
+ newts = lwtunnel_state_alloc(encap_len);
+ if (!newts)
+ return -ENOMEM;
+
+ newts->len = encap_len;
+ p = ila_params_lwtunnel(newts);
+
+ p->locator = (__force __be64)nla_get_u64(tb[ILA_ATTR_LOCATOR]);
+
+ if (cfg6->fc_dst_len > sizeof(__be64)) {
+ /* Precompute checksum difference for translation since we
+ * know both the old locator and the new one.
+ */
+ p->locator_match = *(__be64 *)&cfg6->fc_dst;
+ p->csum_diff = compute_csum_diff8(
+ (__be32 *)&p->locator_match, (__be32 *)&p->locator);
+ }
+
+ newts->type = LWTUNNEL_ENCAP_ILA;
+ newts->flags |= LWTUNNEL_STATE_OUTPUT_REDIRECT |
+ LWTUNNEL_STATE_INPUT_REDIRECT;
+
+ *ts = newts;
+
+ return 0;
+}
+
+static int ila_fill_encap_info(struct sk_buff *skb,
+ struct lwtunnel_state *lwtstate)
+{
+ struct ila_params *p = ila_params_lwtunnel(lwtstate);
+
+ if (nla_put_u64(skb, ILA_ATTR_LOCATOR, (__force u64)p->locator))
+ goto nla_put_failure;
+
+ return 0;
+
+nla_put_failure:
+ return -EMSGSIZE;
+}
+
+static int ila_encap_nlsize(struct lwtunnel_state *lwtstate)
+{
+ /* No encapsulation overhead */
+ return 0;
+}
+
+static int ila_encap_cmp(struct lwtunnel_state *a, struct lwtunnel_state *b)
+{
+ struct ila_params *a_p = ila_params_lwtunnel(a);
+ struct ila_params *b_p = ila_params_lwtunnel(b);
+
+ return (a_p->locator != b_p->locator);
+}
+
+static const struct lwtunnel_encap_ops ila_encap_ops = {
+ .build_state = ila_build_state,
+ .output = ila_output,
+ .input = ila_input,
+ .fill_encap = ila_fill_encap_info,
+ .get_encap_size = ila_encap_nlsize,
+ .cmp_encap = ila_encap_cmp,
+};
+
+int ila_lwt_init(void)
+{
+ return lwtunnel_encap_add_ops(&ila_encap_ops, LWTUNNEL_ENCAP_ILA);
+}
+
+void ila_lwt_fini(void)
+{
+ lwtunnel_encap_del_ops(&ila_encap_ops, LWTUNNEL_ENCAP_ILA);
+}
--
2.4.6
* [PATCH net-next 4/4] ila: Add support for netfilter NF_INET_PRE_ROUTING hook
2015-09-24 16:30 [PATCH net-next 0/4] ila: Use NF_INET_PRE_ROUTING nfhook Tom Herbert
` (2 preceding siblings ...)
2015-09-24 16:30 ` [PATCH net-next 3/4] ila: Create net/ipv6/ila directory Tom Herbert
@ 2015-09-24 16:30 ` Tom Herbert
2015-09-27 6:15 ` [PATCH net-next 0/4] ila: Use NF_INET_PRE_ROUTING nfhook David Miller
4 siblings, 0 replies; 10+ messages in thread
From: Tom Herbert @ 2015-09-24 16:30 UTC
To: davem, netdev; +Cc: kernel-team
This patch sets up a hook at NF_INET_PRE_ROUTING to perform ILA
translation. Doing the translation here means it happens before early
demux, which can be a significant performance advantage over LWT,
where translation occurs after early demux.
The implementation uses an rhashtable for the locator lookup. The
entries in the rhashtable are configured via new netlink commands.
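Each rhashtable entry heads a list of mappings that share an identifier, kept sorted from most to least specific so that the first match wins. The following userspace sketch models that ordering and the wildcard match under simplifying assumptions: `struct map_entry`, `entry_order`, `entry_mismatch`, `entry_insert` and `entry_lookup` are hypothetical names, there is no locking or RCU, and the duplicate check done by the real add path is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DIR_IN  (1u << 0)
#define DIR_OUT (1u << 1)

/* Hypothetical, simplified mapping entry; zero fields act as wildcards. */
struct map_entry {
	uint64_t locator_match;	/* 0 matches any locator */
	int ifindex;		/* 0 matches any interface */
	unsigned int dir;	/* bitmask of DIR_IN / DIR_OUT */
	uint64_t locator;	/* translation result */
	struct map_entry *next;	/* next entry with the same identifier */
};

/* Specificity score in the spirit of ila_order(): ifindex outweighs locator. */
static int entry_order(const struct map_entry *e)
{
	return (e->locator_match ? 1 : 0) + (e->ifindex ? 2 : 0);
}

/* Nonzero if the entry does NOT match, in the spirit of ila_cmp_wildcards(). */
static int entry_mismatch(const struct map_entry *e, uint64_t loc,
			  int ifindex, unsigned int dir)
{
	return (e->locator_match && e->locator_match != loc) ||
	       (e->ifindex && e->ifindex != ifindex) ||
	       !(e->dir & dir);
}

/* Insert while keeping the list sorted from most to least specific. */
static void entry_insert(struct map_entry **head, struct map_entry *e)
{
	assert(head && e);
	while (*head && entry_order(e) <= entry_order(*head))
		head = &(*head)->next;
	e->next = *head;
	*head = e;
}

/* Return the first (most specific) matching entry, or NULL. */
static struct map_entry *entry_lookup(struct map_entry *head, uint64_t loc,
				      int ifindex, unsigned int dir)
{
	for (; head; head = head->next)
		if (!entry_mismatch(head, loc, ifindex, dir))
			return head;
	return NULL;
}
```

Because the list is sorted at insert time, the lookup on the packet path stays a simple linear walk that stops at the first match. When a new entry sorts ahead of the current head, the real code has to swap the head stored in the hash table, which is where rhashtable_replace_fast() from patch 1/4 comes in.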
Signed-off-by: Tom Herbert <tom@herbertland.com>
---
include/uapi/linux/ila.h | 22 ++
net/ipv6/ila/Makefile | 2 +-
net/ipv6/ila/ila.h | 2 +
net/ipv6/ila/ila_common.c | 8 +
net/ipv6/ila/ila_xlat.c | 665 ++++++++++++++++++++++++++++++++++++++++++++++
5 files changed, 698 insertions(+), 1 deletion(-)
create mode 100644 net/ipv6/ila/ila_xlat.c
diff --git a/include/uapi/linux/ila.h b/include/uapi/linux/ila.h
index 7ed9e67..abde7bb 100644
--- a/include/uapi/linux/ila.h
+++ b/include/uapi/linux/ila.h
@@ -3,13 +3,35 @@
#ifndef _UAPI_LINUX_ILA_H
#define _UAPI_LINUX_ILA_H
+/* NETLINK_GENERIC related info */
+#define ILA_GENL_NAME "ila"
+#define ILA_GENL_VERSION 0x1
+
enum {
ILA_ATTR_UNSPEC,
ILA_ATTR_LOCATOR, /* u64 */
+ ILA_ATTR_IDENTIFIER, /* u64 */
+ ILA_ATTR_LOCATOR_MATCH, /* u64 */
+ ILA_ATTR_IFINDEX, /* s32 */
+ ILA_ATTR_DIR, /* u32 */
__ILA_ATTR_MAX,
};
#define ILA_ATTR_MAX (__ILA_ATTR_MAX - 1)
+enum {
+ ILA_CMD_UNSPEC,
+ ILA_CMD_ADD,
+ ILA_CMD_DEL,
+ ILA_CMD_GET,
+
+ __ILA_CMD_MAX,
+};
+
+#define ILA_CMD_MAX (__ILA_CMD_MAX - 1)
+
+#define ILA_DIR_IN (1 << 0)
+#define ILA_DIR_OUT (1 << 1)
+
#endif /* _UAPI_LINUX_ILA_H */
diff --git a/net/ipv6/ila/Makefile b/net/ipv6/ila/Makefile
index 31d136b..4b32e59 100644
--- a/net/ipv6/ila/Makefile
+++ b/net/ipv6/ila/Makefile
@@ -4,4 +4,4 @@
obj-$(CONFIG_IPV6_ILA) += ila.o
-ila-objs := ila_common.o ila_lwt.o
+ila-objs := ila_common.o ila_lwt.o ila_xlat.o
diff --git a/net/ipv6/ila/ila.h b/net/ipv6/ila/ila.h
index b94081f..28542cb 100644
--- a/net/ipv6/ila/ila.h
+++ b/net/ipv6/ila/ila.h
@@ -42,5 +42,7 @@ void update_ipv6_locator(struct sk_buff *skb, struct ila_params *p);
int ila_lwt_init(void);
void ila_lwt_fini(void);
+int ila_xlat_init(void);
+void ila_xlat_fini(void);
#endif /* __ILA_H */
diff --git a/net/ipv6/ila/ila_common.c b/net/ipv6/ila/ila_common.c
index 7cfd54d..5f97559 100644
--- a/net/ipv6/ila/ila_common.c
+++ b/net/ipv6/ila/ila_common.c
@@ -79,12 +79,20 @@ static int __init ila_init(void)
if (ret)
goto fail_lwt;
+ ret = ila_xlat_init();
+ if (ret)
+ goto fail_xlat;
+
+ return 0;
+fail_xlat:
+ ila_lwt_fini();
fail_lwt:
return ret;
}
static void __exit ila_fini(void)
{
+ ila_xlat_fini();
ila_lwt_fini();
}
diff --git a/net/ipv6/ila/ila_xlat.c b/net/ipv6/ila/ila_xlat.c
new file mode 100644
index 0000000..85641bd
--- /dev/null
+++ b/net/ipv6/ila/ila_xlat.c
@@ -0,0 +1,665 @@
+#include <linux/jhash.h>
+#include <linux/netfilter.h>
+#include <linux/rcupdate.h>
+#include <linux/rhashtable.h>
+#include <linux/vmalloc.h>
+#include <net/genetlink.h>
+#include <net/netns/generic.h>
+#include <net/xfrm.h>
+#include <uapi/linux/genetlink.h>
+#include "ila.h"
+
+struct ila_xlat_params {
+ struct ila_params ip;
+ __be64 identifier;
+ int ifindex;
+ unsigned int dir;
+};
+
+struct ila_map {
+ struct ila_xlat_params p;
+ struct rhash_head node;
+ struct ila_map *next;
+ struct rcu_head rcu;
+};
+
+static unsigned int ila_net_id;
+
+struct ila_net {
+ struct rhashtable rhash_table;
+ spinlock_t *locks; /* Bucket locks for entry manipulation */
+ unsigned int locks_mask;
+};
+
+#define LOCKS_PER_CPU 10
+
+static int alloc_ila_locks(struct ila_net *ilan, gfp_t gfp)
+{
+ unsigned int i, size;
+ unsigned int nr_pcpus = num_possible_cpus();
+
+ nr_pcpus = min_t(unsigned int, nr_pcpus, 32UL);
+ size = roundup_pow_of_two(nr_pcpus * LOCKS_PER_CPU);
+
+ if (sizeof(spinlock_t) != 0) {
+#ifdef CONFIG_NUMA
+ if (size * sizeof(spinlock_t) > PAGE_SIZE &&
+ gfp == GFP_KERNEL)
+ ilan->locks = vmalloc(size * sizeof(spinlock_t));
+ else
+#endif
+ ilan->locks = kmalloc_array(size, sizeof(spinlock_t),
+ gfp);
+ if (!ilan->locks)
+ return -ENOMEM;
+ for (i = 0; i < size; i++)
+ spin_lock_init(&ilan->locks[i]);
+ }
+ ilan->locks_mask = size - 1;
+
+ return 0;
+}
+
+static u32 hashrnd __read_mostly;
+static __always_inline void __ila_hash_secret_init(void)
+{
+ net_get_random_once(&hashrnd, sizeof(hashrnd));
+}
+
+static inline u32 ila_identifier_hash(__be64 identifier)
+{
+ u32 *v = (u32 *)&identifier;
+
+ return jhash_2words(v[0], v[1], hashrnd);
+}
+
+static inline spinlock_t *ila_get_lock(struct ila_net *ilan, __be64 identifier)
+{
+ return &ilan->locks[ila_identifier_hash(identifier) & ilan->locks_mask];
+}
+
+static inline int ila_cmp_wildcards(struct ila_map *ila, __be64 loc,
+ int ifindex, unsigned int dir)
+{
+ return (ila->p.ip.locator_match && ila->p.ip.locator_match != loc) ||
+ (ila->p.ifindex && ila->p.ifindex != ifindex) ||
+ !(ila->p.dir & dir);
+}
+
+static inline int ila_cmp_params(struct ila_map *ila, struct ila_xlat_params *p)
+{
+ return (ila->p.ip.locator_match != p->ip.locator_match) ||
+ (ila->p.ifindex != p->ifindex) ||
+ (ila->p.dir != p->dir);
+}
+
+static int ila_cmpfn(struct rhashtable_compare_arg *arg,
+ const void *obj)
+{
+ const struct ila_map *ila = obj;
+
+ return (ila->p.identifier != *(__be64 *)arg->key);
+}
+
+static inline int ila_order(struct ila_map *ila)
+{
+ int score = 0;
+
+ if (ila->p.ip.locator_match)
+ score += 1 << 0;
+
+ if (ila->p.ifindex)
+ score += 1 << 1;
+
+ return score;
+}
+
+static const struct rhashtable_params rht_params = {
+ .nelem_hint = 1024,
+ .head_offset = offsetof(struct ila_map, node),
+ .key_offset = offsetof(struct ila_map, p.identifier),
+ .key_len = sizeof(u64), /* identifier */
+ .max_size = 1048576,
+ .min_size = 256,
+ .automatic_shrinking = true,
+ .obj_cmpfn = ila_cmpfn,
+};
+
+static struct genl_family ila_nl_family = {
+ .id = GENL_ID_GENERATE,
+ .hdrsize = 0,
+ .name = ILA_GENL_NAME,
+ .version = ILA_GENL_VERSION,
+ .maxattr = ILA_ATTR_MAX,
+ .netnsok = true,
+ .parallel_ops = true,
+};
+
+static struct nla_policy ila_nl_policy[ILA_ATTR_MAX + 1] = {
+ [ILA_ATTR_IDENTIFIER] = { .type = NLA_U64, },
+ [ILA_ATTR_LOCATOR] = { .type = NLA_U64, },
+ [ILA_ATTR_LOCATOR_MATCH] = { .type = NLA_U64, },
+ [ILA_ATTR_IFINDEX] = { .type = NLA_U32, },
+ [ILA_ATTR_DIR] = { .type = NLA_U32, },
+};
+
+static int parse_nl_config(struct genl_info *info,
+ struct ila_xlat_params *p)
+{
+ memset(p, 0, sizeof(*p));
+
+ if (info->attrs[ILA_ATTR_IDENTIFIER])
+ p->identifier = (__force __be64)nla_get_u64(
+ info->attrs[ILA_ATTR_IDENTIFIER]);
+
+ if (info->attrs[ILA_ATTR_LOCATOR])
+ p->ip.locator = (__force __be64)nla_get_u64(
+ info->attrs[ILA_ATTR_LOCATOR]);
+
+ if (info->attrs[ILA_ATTR_LOCATOR_MATCH])
+ p->ip.locator_match = (__force __be64)nla_get_u64(
+ info->attrs[ILA_ATTR_LOCATOR_MATCH]);
+
+ if (info->attrs[ILA_ATTR_IFINDEX])
+ p->ifindex = nla_get_s32(info->attrs[ILA_ATTR_IFINDEX]);
+
+ if (info->attrs[ILA_ATTR_DIR])
+ p->dir = nla_get_u32(info->attrs[ILA_ATTR_DIR]);
+
+ return 0;
+}
+
+/* Must be called with rcu readlock */
+static inline struct ila_map *ila_lookup_wildcards(__be64 id, __be64 loc,
+ int ifindex,
+ unsigned int dir,
+ struct ila_net *ilan)
+{
+ struct ila_map *ila;
+
+ ila = rhashtable_lookup_fast(&ilan->rhash_table, &id, rht_params);
+ while (ila) {
+ if (!ila_cmp_wildcards(ila, loc, ifindex, dir))
+ return ila;
+ ila = rcu_access_pointer(ila->next);
+ }
+
+ return NULL;
+}
+
+/* Must be called with rcu readlock */
+static inline struct ila_map *ila_lookup_by_params(struct ila_xlat_params *p,
+ struct ila_net *ilan)
+{
+ struct ila_map *ila;
+
+ ila = rhashtable_lookup_fast(&ilan->rhash_table, &p->identifier,
+ rht_params);
+ while (ila) {
+ if (!ila_cmp_params(ila, p))
+ return ila;
+ ila = rcu_access_pointer(ila->next);
+ }
+
+ return NULL;
+}
+
+static inline void ila_release(struct ila_map *ila)
+{
+ kfree_rcu(ila, rcu);
+}
+
+static void ila_free_cb(void *ptr, void *arg)
+{
+ struct ila_map *ila = (struct ila_map *)ptr, *next;
+
+ /* Assume rcu_readlock held */
+ while (ila) {
+ next = rcu_access_pointer(ila->next);
+ ila_release(ila);
+ ila = next;
+ }
+}
+
+static int ila_add_mapping(struct net *net, struct ila_xlat_params *p)
+{
+ struct ila_net *ilan = net_generic(net, ila_net_id);
+ struct ila_map *ila, *head;
+ spinlock_t *lock = ila_get_lock(ilan, p->identifier);
+ int err = 0, order;
+
+ ila = kzalloc(sizeof(*ila), GFP_KERNEL);
+ if (!ila)
+ return -ENOMEM;
+
+ ila->p = *p;
+
+ if (p->ip.locator_match) {
+ /* Precompute checksum difference for translation since we
+ * know both the old locator and the new one.
+ */
+ ila->p.ip.csum_diff = compute_csum_diff8(
+ (__be32 *)&p->ip.locator_match,
+ (__be32 *)&p->ip.locator);
+ }
+
+ order = ila_order(ila);
+
+ spin_lock(lock);
+
+ head = rhashtable_lookup_fast(&ilan->rhash_table, &p->identifier,
+ rht_params);
+ if (!head) {
+ /* New entry for the rhash_table */
+ err = rhashtable_lookup_insert_fast(&ilan->rhash_table,
+ &ila->node, rht_params);
+ } else {
+ struct ila_map *tila = head, *prev = NULL;
+
+ do {
+ if (!ila_cmp_params(tila, p)) {
+ err = -EEXIST;
+ goto out;
+ }
+
+ if (order > ila_order(tila))
+ break;
+
+ prev = tila;
+ tila = rcu_dereference_protected(tila->next,
+ lockdep_is_held(lock));
+ } while (tila);
+
+ if (prev) {
+ /* Insert in sub list of head */
+ RCU_INIT_POINTER(ila->next, tila);
+ rcu_assign_pointer(prev->next, ila);
+ } else {
+ /* Make this ila new head */
+ RCU_INIT_POINTER(ila->next, head);
+ err = rhashtable_replace_fast(&ilan->rhash_table,
+ &head->node,
+ &ila->node, rht_params);
+ if (err)
+ goto out;
+ }
+ }
+
+out:
+ spin_unlock(lock);
+
+ if (err)
+ kfree(ila);
+
+ return err;
+}
+
+static int ila_del_mapping(struct net *net, struct ila_xlat_params *p)
+{
+ struct ila_net *ilan = net_generic(net, ila_net_id);
+ struct ila_map *ila, *head, *prev;
+ spinlock_t *lock = ila_get_lock(ilan, p->identifier);
+ int err = -ENOENT;
+
+ spin_lock(lock);
+
+ head = rhashtable_lookup_fast(&ilan->rhash_table,
+ &p->identifier, rht_params);
+ ila = head;
+
+ prev = NULL;
+
+ while (ila) {
+ if (ila_cmp_params(ila, p)) {
+ prev = ila;
+ ila = rcu_dereference_protected(ila->next,
+ lockdep_is_held(lock));
+ continue;
+ }
+
+ err = 0;
+
+ if (prev) {
+ /* Not head, just delete from list */
+ rcu_assign_pointer(prev->next, ila->next);
+ } else {
+ /* It is the head. If there is something in the
+ * sublist we need to make a new head.
+ */
+ head = rcu_dereference_protected(ila->next,
+ lockdep_is_held(lock));
+ if (head) {
+ /* Put first entry in the sublist into the
+ * table
+ */
+ err = rhashtable_replace_fast(
+ &ilan->rhash_table, &ila->node,
+ &head->node, rht_params);
+ if (err)
+ goto out;
+ } else {
+ /* Entry no longer used */
+ err = rhashtable_remove_fast(&ilan->rhash_table,
+ &ila->node,
+ rht_params);
+ }
+ }
+
+ ila_release(ila);
+
+ break;
+ }
+
+out:
+ spin_unlock(lock);
+
+ return err;
+}
+
+static int ila_nl_cmd_add_mapping(struct sk_buff *skb, struct genl_info *info)
+{
+ struct net *net = genl_info_net(info);
+ struct ila_xlat_params p;
+ int err;
+
+ err = parse_nl_config(info, &p);
+ if (err)
+ return err;
+
+ return ila_add_mapping(net, &p);
+}
+
+static int ila_nl_cmd_del_mapping(struct sk_buff *skb, struct genl_info *info)
+{
+ struct net *net = genl_info_net(info);
+ struct ila_xlat_params p;
+ int err;
+
+ err = parse_nl_config(info, &p);
+ if (err)
+ return err;
+
+ ila_del_mapping(net, &p);
+
+ return 0;
+}
+
+static int ila_fill_info(struct ila_map *ila, struct sk_buff *msg)
+{
+ if (nla_put_u64(msg, ILA_ATTR_IDENTIFIER,
+ (__force u64)ila->p.identifier) ||
+ nla_put_u64(msg, ILA_ATTR_LOCATOR,
+ (__force u64)ila->p.ip.locator) ||
+ nla_put_u64(msg, ILA_ATTR_LOCATOR_MATCH,
+ (__force u64)ila->p.ip.locator_match) ||
+ nla_put_s32(msg, ILA_ATTR_IFINDEX, ila->p.ifindex) ||
+ nla_put_u32(msg, ILA_ATTR_DIR, ila->p.dir))
+ return -1;
+
+ return 0;
+}
+
+static int ila_dump_info(struct ila_map *ila,
+ u32 portid, u32 seq, u32 flags,
+ struct sk_buff *skb, u8 cmd)
+{
+ void *hdr;
+
+ hdr = genlmsg_put(skb, portid, seq, &ila_nl_family, flags, cmd);
+ if (!hdr)
+ return -ENOMEM;
+
+ if (ila_fill_info(ila, skb) < 0)
+ goto nla_put_failure;
+
+ genlmsg_end(skb, hdr);
+ return 0;
+
+nla_put_failure:
+ genlmsg_cancel(skb, hdr);
+ return -EMSGSIZE;
+}
+
+static int ila_nl_cmd_get_mapping(struct sk_buff *skb, struct genl_info *info)
+{
+ struct net *net = genl_info_net(info);
+ struct ila_net *ilan = net_generic(net, ila_net_id);
+ struct sk_buff *msg;
+ struct ila_xlat_params p;
+ struct ila_map *ila;
+ int ret;
+
+ ret = parse_nl_config(info, &p);
+ if (ret)
+ return ret;
+
+ msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
+ if (!msg)
+ return -ENOMEM;
+
+ rcu_read_lock();
+
+ ila = ila_lookup_by_params(&p, ilan);
+ if (ila) {
+ ret = ila_dump_info(ila,
+ info->snd_portid,
+ info->snd_seq, 0, msg,
+ info->genlhdr->cmd);
+ }
+
+ rcu_read_unlock();
+
+ if (ret < 0)
+ goto out_free;
+
+ return genlmsg_reply(msg, info);
+
+out_free:
+ nlmsg_free(msg);
+ return ret;
+}
+
+struct ila_dump_iter {
+ struct rhashtable_iter rhiter;
+};
+
+static int ila_nl_dump_start(struct netlink_callback *cb)
+{
+ struct net *net = sock_net(cb->skb->sk);
+ struct ila_net *ilan = net_generic(net, ila_net_id);
+ struct ila_dump_iter *iter = (struct ila_dump_iter *)cb->args;
+
+ return rhashtable_walk_init(&ilan->rhash_table, &iter->rhiter);
+}
+
+static int ila_nl_dump_done(struct netlink_callback *cb)
+{
+ struct ila_dump_iter *iter = (struct ila_dump_iter *)cb->args;
+
+ rhashtable_walk_exit(&iter->rhiter);
+
+ return 0;
+}
+
+static int ila_nl_dump(struct sk_buff *skb, struct netlink_callback *cb)
+{
+ struct ila_dump_iter *iter = (struct ila_dump_iter *)cb->args;
+ struct rhashtable_iter *rhiter = &iter->rhiter;
+ struct ila_map *ila;
+ int ret;
+
+ ret = rhashtable_walk_start(rhiter);
+ if (ret && ret != -EAGAIN)
+ goto done;
+
+ for (;;) {
+ ila = rhashtable_walk_next(rhiter);
+
+ if (IS_ERR(ila)) {
+ if (PTR_ERR(ila) == -EAGAIN)
+ continue;
+ ret = PTR_ERR(ila);
+ goto done;
+ } else if (!ila) {
+ break;
+ }
+
+ while (ila) {
+ ret = ila_dump_info(ila, NETLINK_CB(cb->skb).portid,
+ cb->nlh->nlmsg_seq, NLM_F_MULTI,
+ skb, ILA_CMD_GET);
+ if (ret)
+ goto done;
+
+ ila = rcu_access_pointer(ila->next);
+ }
+ }
+
+ ret = skb->len;
+
+done:
+ rhashtable_walk_stop(rhiter);
+ return ret;
+}
+
+static const struct genl_ops ila_nl_ops[] = {
+ {
+ .cmd = ILA_CMD_ADD,
+ .doit = ila_nl_cmd_add_mapping,
+ .policy = ila_nl_policy,
+ .flags = GENL_ADMIN_PERM,
+ },
+ {
+ .cmd = ILA_CMD_DEL,
+ .doit = ila_nl_cmd_del_mapping,
+ .policy = ila_nl_policy,
+ .flags = GENL_ADMIN_PERM,
+ },
+ {
+ .cmd = ILA_CMD_GET,
+ .doit = ila_nl_cmd_get_mapping,
+ .start = ila_nl_dump_start,
+ .dumpit = ila_nl_dump,
+ .done = ila_nl_dump_done,
+ .policy = ila_nl_policy,
+ },
+};
+
+static __net_init int ila_init_net(struct net *net)
+{
+ int err;
+ struct ila_net *ilan = net_generic(net, ila_net_id);
+
+ err = alloc_ila_locks(ilan, GFP_KERNEL);
+ if (err)
+ return err;
+
+ rhashtable_init(&ilan->rhash_table, &rht_params);
+
+ return 0;
+}
+
+static __net_exit void ila_exit_net(struct net *net)
+{
+ struct ila_net *ilan = net_generic(net, ila_net_id);
+
+ rhashtable_free_and_destroy(&ilan->rhash_table, ila_free_cb, NULL);
+
+ kvfree(ilan->locks);
+}
+
+static struct pernet_operations ila_net_ops = {
+ .init = ila_init_net,
+ .exit = ila_exit_net,
+ .id = &ila_net_id,
+ .size = sizeof(struct ila_net),
+};
+
+static int ila_xlat_addr_in(struct sk_buff *skb)
+{
+ struct ila_map *ila;
+ struct ipv6hdr *ip6h = ipv6_hdr(skb);
+ struct net *net = dev_net(skb->dev);
+ struct ila_net *ilan = net_generic(net, ila_net_id);
+ __be64 identifier, locator_match;
+ size_t nhoff;
+
+ /* Assumes skb contains a valid IPv6 header that is pulled */
+
+ identifier = *(__be64 *)&ip6h->daddr.in6_u.u6_addr8[8];
+ locator_match = *(__be64 *)&ip6h->daddr.in6_u.u6_addr8[0];
+ nhoff = sizeof(struct ipv6hdr);
+
+ rcu_read_lock();
+
+ ila = ila_lookup_wildcards(identifier, locator_match,
+ skb->dev->ifindex, ILA_DIR_IN, ilan);
+ if (ila)
+ update_ipv6_locator(skb, &ila->p.ip);
+
+ rcu_read_unlock();
+
+ return 0;
+}
+
+static unsigned int
+ila_nf_input(void *priv,
+ struct sk_buff *skb,
+ const struct nf_hook_state *state)
+{
+ ila_xlat_addr_in(skb);
+ return NF_ACCEPT;
+}
+
+static struct nf_hook_ops ila_nf_hook_ops[] __read_mostly = {
+ {
+ .hook = ila_nf_input,
+ .pf = NFPROTO_IPV6,
+ .hooknum = NF_INET_PRE_ROUTING,
+ .priority = -1,
+ },
+};
+
+int ila_xlat_init(void)
+{
+ int ret;
+ int nf_hook = 0;
+
+ ret = register_pernet_device(&ila_net_ops);
+ if (ret)
+ goto exit;
+
+ ret = genl_register_family_with_ops(&ila_nl_family,
+ ila_nl_ops);
+ if (ret < 0)
+ goto unregister;
+
+ for (; nf_hook < ARRAY_SIZE(ila_nf_hook_ops); nf_hook++) {
+ ret = nf_register_hook(&ila_nf_hook_ops[nf_hook]);
+ if (ret < 0)
+ goto unregister;
+ }
+
+ return 0;
+
+unregister:
+ while (nf_hook > 0) {
+ nf_hook--;
+ nf_unregister_hook(&ila_nf_hook_ops[nf_hook]);
+ }
+ unregister_pernet_device(&ila_net_ops);
+exit:
+ return ret;
+}
+
+void ila_xlat_fini(void)
+{
+ int i;
+
+ for (i = 0; i < ARRAY_SIZE(ila_nf_hook_ops); i++)
+ nf_unregister_hook(&ila_nf_hook_ops[i]);
+
+ genl_unregister_family(&ila_nl_family);
+ unregister_pernet_device(&ila_net_ops);
+}
+
--
2.4.6
* Re: [PATCH net-next 0/4] ila: Use NF_INET_PRE_ROUTING nfhook
2015-09-24 16:30 [PATCH net-next 0/4] ila: Use NF_INET_PRE_ROUTING nfhook Tom Herbert
` (3 preceding siblings ...)
2015-09-24 16:30 ` [PATCH net-next 4/4] ila: Add support for netfilter NF_INET_PRE_ROUTING hook Tom Herbert
@ 2015-09-27 6:15 ` David Miller
2015-09-27 8:10 ` Florian Westphal
4 siblings, 1 reply; 10+ messages in thread
From: David Miller @ 2015-09-27 6:15 UTC (permalink / raw)
To: tom; +Cc: netdev, kernel-team
From: Tom Herbert <tom@herbertland.com>
Date: Thu, 24 Sep 2015 09:30:20 -0700
> This patch set addresses the issue for ILA by adding a fast locator
> lookup that occurs before early demux. This is done by using a hook
> at NF_INET_PRE_ROUTING. For the backend we implement an rhashtable
> that contains identifier-to-locator mappings. The table also
> allows more specific matches that include original locator and
> interface.
I really don't think we should use netfilter hooks to perform
operations setup outside of netfilter's normal configuration
mechanisms.
It is not a set of arbitrary hooks to take advantage of in another
subsystem or facility.
If ILA were instead configured inside of netfilter's normal mechanisms
then there would be full transparency about whether ILA
transformations are performed before or after the user's other
netfilter rules. And the user would have full control over this.
As implemented here, they don't.
So sorry, I'm not too keen on this and I bet if netfilter developers
reviewed this patch series they'd have similar objections.
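For reference on the ordering concern: netfilter runs the hooks registered at one hook point in ascending priority order, and the patch hard-codes .priority = -1. The following is a minimal user-space sketch of that ordering, not kernel code. The struct and function names are hypothetical; the numeric priorities are the NF_IP6_PRI_* values named in the comments.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a registered hook: a name and a priority. */
struct hook_entry {
	const char *name;
	int priority;
};

/* Lower priority value runs first, as in netfilter. */
static int hook_cmp(const void *a, const void *b)
{
	const struct hook_entry *ha = a, *hb = b;

	return (ha->priority > hb->priority) - (ha->priority < hb->priority);
}

/* Sort hooks into run order and return the position of the named hook,
 * or -1 if it is not present.
 */
static int hook_run_position(struct hook_entry *hooks, size_t n,
			     const char *name)
{
	size_t i;

	qsort(hooks, n, sizeof(*hooks), hook_cmp);
	for (i = 0; i < n; i++)
		if (!strcmp(hooks[i].name, name))
			return (int)i;
	return -1;
}

/* Example set of IPv6 hooks at PRE_ROUTING (illustrative selection). */
static struct hook_entry prerouting_hooks[] = {
	{ "ila",       -1   },	/* priority hard-coded by this patch */
	{ "nat_dst",   -100 },	/* NF_IP6_PRI_NAT_DST */
	{ "raw",       -300 },	/* NF_IP6_PRI_RAW */
	{ "mangle",    -150 },	/* NF_IP6_PRI_MANGLE */
	{ "conntrack", -200 },	/* NF_IP6_PRI_CONNTRACK */
};
```

Under these assumptions the ILA hook sorts last of the five, after conntrack, mangle and destination NAT; its position comes from the constant in the source rather than from anything the user configures.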
* Re: [PATCH net-next 0/4] ila: Use NF_INET_PRE_ROUTING nfhook
2015-09-27 6:15 ` [PATCH net-next 0/4] ila: Use NF_INET_PRE_ROUTING nfhook David Miller
@ 2015-09-27 8:10 ` Florian Westphal
2015-09-28 22:27 ` Tom Herbert
0 siblings, 1 reply; 10+ messages in thread
From: Florian Westphal @ 2015-09-27 8:10 UTC (permalink / raw)
To: David Miller; +Cc: tom, netdev, kernel-team
David Miller <davem@davemloft.net> wrote:
> > This patch set addresses the issue for ILA by adding a fast locator
> > lookup that occurs before early demux. This is done by using a hook
> > at NF_INET_PRE_ROUTING. For the backend we implement an rhashtable
> > that contains identifier-to-locator mappings. The table also
> > allows more specific matches that include original locator and
> > interface.
>
> I really don't think we should use netfilter hooks to perform
> operations setup outside of netfilter's normal configuration
> mechanisms.
Thanks, that's my thinking as well.
> If ILA were instead configured inside of netfilter's normal mechanisms
> then there would be full transparency about whether ILA
> transformations are performed before or after the user's other
> netfilter rules. And the user would have full control over this.
> As implemented here, they don't.
>
> So sorry, I'm not too keen on this and I bet if netfilter developers
> reviewed this patch series they'd have similar objections.
Seems this should/could be implemented similar to RFC6296 network
prefix translations (see net/ipv6/netfilter/ip6t_NPT.c).
* Re: [PATCH net-next 0/4] ila: Use NF_INET_PRE_ROUTING nfhook
2015-09-27 8:10 ` Florian Westphal
@ 2015-09-28 22:27 ` Tom Herbert
2015-09-28 23:00 ` Florian Westphal
0 siblings, 1 reply; 10+ messages in thread
From: Tom Herbert @ 2015-09-28 22:27 UTC (permalink / raw)
To: Florian Westphal
Cc: David Miller, Linux Kernel Network Developers, Kernel Team
On Sun, Sep 27, 2015 at 1:10 AM, Florian Westphal <fw@strlen.de> wrote:
> David Miller <davem@davemloft.net> wrote:
>> > This patch set addresses the issue for ILA by adding a fast locator
>> > lookup that occurs before early demux. This is done by using a hook
>> > at NF_INET_PRE_ROUTING. For the backend we implement an rhashtable
>> > that contains identifier-to-locator mappings. The table also
>> > allows more specific matches that include original locator and
>> > interface.
>>
>> I really don't think we should use netfilter hooks to perform
>> operations setup outside of netfilter's normal configuration
>> mechanisms.
>
> Thanks, that's my thinking as well.
>
>> If ILA were instead configured inside of netfilter's normal mechanisms
>> then there would be full transparency about whether ILA
>> transformations are performed before or after the user's other
>> netfilter rules. And the user would have full control over this.
>> As implemented here, they don't.
>>
>> So sorry, I'm not too keen on this and I bet if netfilter developers
>> reviewed this patch series they'd have similar objections.
>
> Seems this should/could be implemented similar to RFC6296 network
> prefix translations (see net/ipv6/netfilter/ip6t_NPT.c).
Hi Florian,
RFC6296 doesn't work because it allows an invalid checksum to be sent
on wire relative to the addresses used on the wire. That means we
would lose CHECKSUM_UNNECESSARY for ILA which is way too big of a
performance hit. This might actually be worse for some devices that
are doing NETIF_F_IP_CSUM if they are calculating the checksum based
on the addresses in the packet instead of the ultimate destination address,
which seems to be what RFC6296 wants (this could result in checksum errors).
Unfortunately we're going to be seeing this problem more and more as
there are more methods that rewrite addresses without updating the
transport checksum before leaving the host (e.g. segment routing).
For devices doing CHECKSUM_COMPLETE I suspect there are also some
issues. In ip6t_npt_map_pfx it seems like if the addresses are being
rewritten in the receive path, the skb->csum should be updated
accordingly. GRO might consistently fail on bad checksums also...
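To illustrate what updating a checksum after an address rewrite involves, here is a minimal user-space sketch of the incremental ones'-complement update from RFC 1624 (eqn. 3). It is not the kernel's csum API; all function names are hypothetical, and the buffer is treated as a plain array of 16-bit words.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into a 16-bit ones'-complement sum. */
static uint16_t csum_fold(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Checksum field value for a buffer: inverted ones'-complement sum
 * of its n 16-bit words.
 */
static uint16_t csum_full(const uint16_t *w, size_t n)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i < n; i++)
		sum += w[i];
	return (uint16_t)~csum_fold(sum);
}

/* Incremental update, RFC 1624 eqn. 3: HC' = ~(~HC + ~m + m'),
 * applied to n replaced words (old_w -> new_w).
 */
static uint16_t csum_update(uint16_t check, const uint16_t *old_w,
			    const uint16_t *new_w, size_t n)
{
	uint32_t sum = (uint16_t)~check;
	size_t i;

	for (i = 0; i < n; i++) {
		sum += (uint16_t)~old_w[i];
		sum += new_w[i];
	}
	return (uint16_t)~csum_fold(sum);
}
```

Replacing the four 16-bit words of a 64-bit locator and calling csum_update() gives the same value as recomputing csum_full() over the modified buffer. The per-word difference depends only on the old and new locator, so it can be computed once per mapping, which appears to be the purpose of the csum_diff precomputation in ila_add_mapping().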
In any case, I did at one point create some netfilter targets for ILA
to do the translation correctly updating the checksum. While this
provided the required functionality, I couldn't get sufficient
performance. A specialized fixed length lookup table gets most of the
performance needed for ILA.
Thanks,
Tom
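As a rough illustration of the lookup described in this thread (a table keyed by identifier, with optional more specific matches on original locator and interface), here is a user-space sketch of one per-identifier chain. It assumes that a zero field means "any" and that more specific entries are kept first; the struct and function names are hypothetical, and the hash table that maps an identifier to its chain head is left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mapping entry; a zero locator_match or ifindex means "any". */
struct map_entry {
	uint64_t identifier;		/* hash key, shared by the whole chain */
	uint64_t locator_match;		/* original locator to match, or 0 */
	int ifindex;			/* interface to match, or 0 */
	uint64_t locator;		/* replacement locator */
	struct map_entry *next;		/* next entry with the same identifier */
};

/* Specificity score: entries with more non-wildcard fields sort earlier. */
static int map_order(const struct map_entry *e)
{
	return (e->locator_match != 0) + (e->ifindex != 0);
}

/* Insert into a chain kept in descending map_order(); returns the new head. */
static struct map_entry *map_insert(struct map_entry *head,
				    struct map_entry *e)
{
	struct map_entry **pp = &head;

	while (*pp && map_order(*pp) >= map_order(e))
		pp = &(*pp)->next;
	e->next = *pp;
	*pp = e;
	return head;
}

/* Return the first entry whose non-wildcard fields all match, or NULL. */
static const struct map_entry *map_lookup(const struct map_entry *head,
					  uint64_t loc, int ifindex)
{
	for (; head; head = head->next) {
		if (head->locator_match && head->locator_match != loc)
			continue;
		if (head->ifindex && head->ifindex != ifindex)
			continue;
		return head;
	}
	return NULL;
}
```

Because the chain is ordered by specificity, the first match is also the most specific one, so a lookup costs one hash probe plus a short list walk with fixed-size comparisons.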
* Re: [PATCH net-next 0/4] ila: Use NF_INET_PRE_ROUTING nfhook
2015-09-28 22:27 ` Tom Herbert
@ 2015-09-28 23:00 ` Florian Westphal
2015-09-29 17:17 ` Tom Herbert
0 siblings, 1 reply; 10+ messages in thread
From: Florian Westphal @ 2015-09-28 23:00 UTC (permalink / raw)
To: Tom Herbert
Cc: Florian Westphal, David Miller, Linux Kernel Network Developers,
Kernel Team
Tom Herbert <tom@herbertland.com> wrote:
> RFC6296 doesn't work because it allows an invalid checksum to be sent
> on wire relative to the addresses used on the wire. That means we
> would lose CHECKSUM_UNNECESSARY for ILA which is way too big of a
> performance hit.
Not following. I did not say you should use NPT instead of ILA.
[..]
> In any case, I did at one point create some netfilter targets for ILA
> to do the translation correctly updating the checksum. While this
> provided the required functionality, I couldn't get sufficient
> performance. A specialized fixed length lookup table gets most of the
> performance needed for ILA.
I'm not following at all.
Could you explain why you can't just 'relocate' your proposed
implementation to netfilter/ipv6?
F.e. I see no reason why you could not use a lookup table in a netfilter
target (or nft expression, for that matter) ... ?
Thanks,
Florian
* Re: [PATCH net-next 0/4] ila: Use NF_INET_PRE_ROUTING nfhook
2015-09-28 23:00 ` Florian Westphal
@ 2015-09-29 17:17 ` Tom Herbert
0 siblings, 0 replies; 10+ messages in thread
From: Tom Herbert @ 2015-09-29 17:17 UTC (permalink / raw)
To: Florian Westphal
Cc: David Miller, Linux Kernel Network Developers, Kernel Team
On Mon, Sep 28, 2015 at 4:00 PM, Florian Westphal <fw@strlen.de> wrote:
> Tom Herbert <tom@herbertland.com> wrote:
>> RFC6296 doesn't work because it allows an invalid checksum to be sent
>> on wire relative to the addresses used on the wire. That means we
>> would lose CHECKSUM_UNNECESSARY for ILA which is way too big of a
>> performance hit.
>
> Not following. I did not say you should use NPT instead of ILA.
>
> [..]
>> In any case, I did at one point create some netfilter targets for ILA
>> to do the translation correctly updating the checksum. While this
>> provided the required functionality, I couldn't get sufficient
>> performance. A specialized fixed length lookup table gets most of the
>> performance needed for ILA.
>
> I'm not following at all.
>
> Could you explain why you can't just 'relocate' your proposed
> implementation to netfilter/ipv6?
>
Florian
I modified DNPT to perform ILA. Performance results are below. What I
see is that DNPT offers only a slight improvement over just doing
translation at LWT and not getting a hit in early demux. Top function
in perf is:
2.49% [kernel] [k] ip6t_do_table
so I think this performance result is mostly the overhead of netfilter
and not ILA translation. In any case, doing a direct specialized
lookup like what we do in this patch gets us close to the performance
seen without ILA enabled. Low performance overhead is critical for our
ILA use cases.
Tom
No ILA, baseline
85.72% CPU utilization
1861945 tps
93/163/330 50/90/99% latencies
ILA before fix (LWT on both input and output)
83.47% CPU utilization
16583186 tps (-11% from baseline)
107/183/338 50/90/99% latencies
ILA after fix (hook for input)
84.97% CPU utilization
1833948 tps (-1.5% from baseline)
95/164/331 50/90/99% latencies
Modify DNPT to do ILA
(ip6tables -t mangle -I PREROUTING -d 2001:0:0:3::/64 -j DNPT --src-pfx 2001:0:0:3::/64 --dst-pfx 3333:0:0:1::/64)
80.94% CPU utilization
1683315 tps (-10% from baseline)
104/179/350 50/90/99% latencies
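The percentages in these results can be rechecked from the raw tps figures. A minimal sketch in C, where pct_change is a hypothetical helper name; the LWT row is left out because its printed tps value does not look consistent with the stated -11%.

```c
/* Relative change of a measurement against a baseline, in percent. */
static double pct_change(double baseline, double value)
{
	return (value - baseline) / baseline * 100.0;
}
```

Against the baseline of 1861945 tps, the hook result of 1833948 tps works out to about -1.5%, and the DNPT result of 1683315 tps to about -9.6%, which the results above round to -10%.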
> F.e. I see no reason why you could not use a lookup table in a netfilter
> target (or nft expression, for that matter) ... ?
>
> Thanks,
> Florian