* [PATCH net-next 0/2] ila: Cache a route in ILA lwt structure
From: Tom Herbert @ 2016-10-14 0:57 UTC
To: davem, netdev, roopa; +Cc: kernel-team
Add a dst_cache to the ila_lwt structure. This holds a cached route for the
translated address. In ila_output we now perform a route lookup after
translation and, if possible (i.e. the destination prefix of the original
route is a full 128 bits), we set the dst_cache. Subsequent calls to
ila_output can then use the cache to avoid the route lookup.
This eliminates the need to set a gateway on ILA routes, as was previously
required. Now we can do something like:
./ip route add 3333::2000:0:0:2/128 encap ila 2222:0:0:2 \
csum-mode neutral-map dev eth0 ## No via needed!
Also, add a destroy_state operation to the lwt ops. We need this to destroy
the dst_cache.
Tested:
  Running 200 TCP_RR streams:
  Baseline, no ILA
    1730716 tps
    102/170/313 50/90/99% latencies
    88.11 CPU utilization
  Using ILA in both directions
    1680428 tps
    105/176/325 50/90/99% latencies
    88.16 CPU utilization
Tom Herbert (2):
lwtunnel: Add destroy state operation
ila: Cache a route to translated address
include/net/lwtunnel.h | 7 ++---
net/core/lwtunnel.c | 13 +++++++++
net/ipv6/ila/ila_lwt.c | 76 +++++++++++++++++++++++++++++++++++++++++++++++---
3 files changed, 88 insertions(+), 8 deletions(-)
--
2.9.3
* [PATCH net-next 1/2] lwtunnel: Add destroy state operation
From: Tom Herbert @ 2016-10-14 0:57 UTC
To: davem, netdev, roopa; +Cc: kernel-team
Users of lwt tunnels may set up some secondary state in the build_state
function. Add a corresponding destroy_state function to allow users to
clean up that state. The destroy_state function is called from
lwtstate_free. Also, when a destroy_state function is provided, we now free
the lwtstate using kfree_rcu, so users can assume the structure is not
freed before an RCU grace period has elapsed.
Signed-off-by: Tom Herbert <tom@herbertland.com>
---
include/net/lwtunnel.h | 7 +++----
net/core/lwtunnel.c | 13 +++++++++++++
2 files changed, 16 insertions(+), 4 deletions(-)
diff --git a/include/net/lwtunnel.h b/include/net/lwtunnel.h
index ea3f80f..119b11b 100644
--- a/include/net/lwtunnel.h
+++ b/include/net/lwtunnel.h
@@ -29,6 +29,7 @@ struct lwtunnel_state {
int (*orig_input)(struct sk_buff *);
int len;
__u16 headroom;
+ struct rcu_head rcu;
__u8 data[0];
};
@@ -43,13 +44,11 @@ struct lwtunnel_encap_ops {
int (*get_encap_size)(struct lwtunnel_state *lwtstate);
int (*cmp_encap)(struct lwtunnel_state *a, struct lwtunnel_state *b);
int (*xmit)(struct sk_buff *skb);
+ void (*destroy_state)(struct lwtunnel_state *lws);
};
#ifdef CONFIG_LWTUNNEL
-static inline void lwtstate_free(struct lwtunnel_state *lws)
-{
- kfree(lws);
-}
+void lwtstate_free(struct lwtunnel_state *lws);
static inline struct lwtunnel_state *
lwtstate_get(struct lwtunnel_state *lws)
diff --git a/net/core/lwtunnel.c b/net/core/lwtunnel.c
index e5f84c2..7401474 100644
--- a/net/core/lwtunnel.c
+++ b/net/core/lwtunnel.c
@@ -130,6 +130,19 @@ int lwtunnel_build_state(struct net_device *dev, u16 encap_type,
}
EXPORT_SYMBOL(lwtunnel_build_state);
+void lwtstate_free(struct lwtunnel_state *lws)
+{
+ const struct lwtunnel_encap_ops *ops = lwtun_encaps[lws->type];
+
+ if (ops->destroy_state) {
+ ops->destroy_state(lws);
+ kfree_rcu(lws, rcu);
+ } else {
+ kfree(lws);
+ }
+}
+EXPORT_SYMBOL(lwtstate_free);
+
int lwtunnel_fill_encap(struct sk_buff *skb, struct lwtunnel_state *lwtstate)
{
const struct lwtunnel_encap_ops *ops;
--
2.9.3
* [PATCH net-next 2/2] ila: Cache a route to translated address
From: Tom Herbert @ 2016-10-14 0:57 UTC
To: davem, netdev, roopa; +Cc: kernel-team
Add a dst_cache to the ila_lwt structure. This holds a cached route for the
translated address. In ila_output we now perform a route lookup after
translation and, if possible (i.e. the destination prefix of the original
route is a full 128 bits), we set the dst_cache. Subsequent calls to
ila_output can then use the cache to avoid the route lookup.
This eliminates the need to set a gateway on ILA routes, as was previously
required. Now we can do something like:
./ip route add 3333::2000:0:0:2/128 encap ila 2222:0:0:2 \
csum-mode neutral-map dev eth0 ## No via needed!
Signed-off-by: Tom Herbert <tom@herbertland.com>
---
net/ipv6/ila/ila_lwt.c | 76 +++++++++++++++++++++++++++++++++++++++++++++++---
1 file changed, 72 insertions(+), 4 deletions(-)
diff --git a/net/ipv6/ila/ila_lwt.c b/net/ipv6/ila/ila_lwt.c
index e50c27a..df14b00 100644
--- a/net/ipv6/ila/ila_lwt.c
+++ b/net/ipv6/ila/ila_lwt.c
@@ -6,29 +6,80 @@
#include <linux/socket.h>
#include <linux/types.h>
#include <net/checksum.h>
+#include <net/dst_cache.h>
#include <net/ip.h>
#include <net/ip6_fib.h>
+#include <net/ip6_route.h>
#include <net/lwtunnel.h>
#include <net/protocol.h>
#include <uapi/linux/ila.h>
#include "ila.h"
+struct ila_lwt {
+ struct ila_params p;
+ struct dst_cache dst_cache;
+ u32 connected : 1;
+};
+
+static inline struct ila_lwt *ila_lwt_lwtunnel(
+ struct lwtunnel_state *lwstate)
+{
+ return (struct ila_lwt *)lwstate->data;
+}
+
static inline struct ila_params *ila_params_lwtunnel(
struct lwtunnel_state *lwstate)
{
- return (struct ila_params *)lwstate->data;
+ return &((struct ila_lwt *)lwstate->data)->p;
}
static int ila_output(struct net *net, struct sock *sk, struct sk_buff *skb)
{
- struct dst_entry *dst = skb_dst(skb);
+ struct dst_entry *orig_dst = skb_dst(skb);
+ struct ila_lwt *ilwt = ila_lwt_lwtunnel(orig_dst->lwtstate);
+ struct dst_entry *dst;
+ int err = -EINVAL;
if (skb->protocol != htons(ETH_P_IPV6))
goto drop;
- ila_update_ipv6_locator(skb, ila_params_lwtunnel(dst->lwtstate), true);
+ ila_update_ipv6_locator(skb, ila_params_lwtunnel(orig_dst->lwtstate),
+ true);
- return dst->lwtstate->orig_output(net, sk, skb);
+ dst = dst_cache_get(&ilwt->dst_cache);
+ if (unlikely(!dst)) {
+ struct ipv6hdr *ip6h = ipv6_hdr(skb);
+ struct flowi6 fl6;
+
+ /* Lookup a route for the new destination. Take into
+ * account that the base route may already have a gateway.
+ */
+
+ memset(&fl6, 0, sizeof(fl6));
+ fl6.flowi6_oif = orig_dst->dev->ifindex;
+ fl6.flowi6_iif = LOOPBACK_IFINDEX;
+ fl6.daddr = *rt6_nexthop((struct rt6_info *)orig_dst,
+ &ip6h->daddr);
+
+ dst = ip6_route_output(net, NULL, &fl6);
+ if (dst->error) {
+ err = -EHOSTUNREACH;
+ dst_release(dst);
+ goto drop;
+ }
+
+ dst = xfrm_lookup(net, dst, flowi6_to_flowi(&fl6), NULL, 0);
+ if (IS_ERR(dst)) {
+ err = PTR_ERR(dst);
+ goto drop;
+ }
+
+ if (ilwt->connected)
+ dst_cache_set_ip6(&ilwt->dst_cache, dst, &fl6.saddr);
+ }
+
+ skb_dst_set(skb, dst);
+ return dst_output(dev_net(skb_dst(skb)->dev), sk, skb);
drop:
kfree_skb(skb);
@@ -60,6 +111,7 @@ static int ila_build_state(struct net_device *dev, struct nlattr *nla,
unsigned int family, const void *cfg,
struct lwtunnel_state **ts)
{
+ struct ila_lwt *ilwt;
struct ila_params *p;
struct nlattr *tb[ILA_ATTR_MAX + 1];
size_t encap_len = sizeof(*p);
@@ -99,6 +151,13 @@ static int ila_build_state(struct net_device *dev, struct nlattr *nla,
if (!newts)
return -ENOMEM;
+ ilwt = ila_lwt_lwtunnel(newts);
+ ret = dst_cache_init(&ilwt->dst_cache, GFP_ATOMIC);
+ if (ret) {
+ kfree(newts);
+ return ret;
+ }
+
newts->len = encap_len;
p = ila_params_lwtunnel(newts);
@@ -120,11 +179,19 @@ static int ila_build_state(struct net_device *dev, struct nlattr *nla,
newts->flags |= LWTUNNEL_STATE_OUTPUT_REDIRECT |
LWTUNNEL_STATE_INPUT_REDIRECT;
+ if (cfg6->fc_dst_len == sizeof(struct in6_addr))
+ ilwt->connected = 1;
+
*ts = newts;
return 0;
}
+static void ila_destroy_state(struct lwtunnel_state *lwt)
+{
+ dst_cache_destroy(&ila_lwt_lwtunnel(lwt)->dst_cache);
+}
+
static int ila_fill_encap_info(struct sk_buff *skb,
struct lwtunnel_state *lwtstate)
{
@@ -159,6 +226,7 @@ static int ila_encap_cmp(struct lwtunnel_state *a, struct lwtunnel_state *b)
static const struct lwtunnel_encap_ops ila_encap_ops = {
.build_state = ila_build_state,
+ .destroy_state = ila_destroy_state,
.output = ila_output,
.input = ila_input,
.fill_encap = ila_fill_encap_info,
--
2.9.3
* Re: [PATCH net-next 1/2] lwtunnel: Add destroy state operation
From: Roopa Prabhu @ 2016-10-14 5:58 UTC
To: Tom Herbert; +Cc: davem, netdev, kernel-team
On 10/13/16, 5:57 PM, Tom Herbert wrote:
> Users of lwt tunnels may set up some secondary state in build_state
> function. Add a corresponding destroy_state function to allow users to
> clean up state. This destroy state function is called from lwstate_free.
> Also, we now free lwstate using kfree_rcu so user can assume structure
> is not freed before rcu.
>
> Signed-off-by: Tom Herbert <tom@herbertland.com>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
This will be useful elsewhere too, thanks!
* Re: [PATCH net-next 2/2] ila: Cache a route to translated address
From: Roopa Prabhu @ 2016-10-14 6:22 UTC
To: Tom Herbert; +Cc: davem, netdev, kernel-team
On 10/13/16, 5:57 PM, Tom Herbert wrote:
> Add a dst_cache to ila_lwt structure. This holds a cached route for the
> translated address. In ila_output we now perform a route lookup after
> translation and if possible (destination in original route is full 128
> bits) we set the dst_cache. Subsequent calls to ila_output can then use
> the cache to avoid the route lookup.
>
> This eliminates the need to set the gateway on ILA routes as previously
> was being done. Now we can do something like:
>
> ./ip route add 3333::2000:0:0:2/128 encap ila 2222:0:0:2 \
> csum-mode neutral-map dev eth0 ## No via needed!
>
> Signed-off-by: Tom Herbert <tom@herbertland.com>
> ---
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
This is a nice way to cache and redirect to a new dst.
This removes the last and only user of lwt orig_output, so we can drop it
subsequently. But since orig_input is still in use, it is probably better to
keep it around for symmetry and for other uses in the future.
Thanks.
* Re: [PATCH net-next 1/2] lwtunnel: Add destroy state operation
From: Jiri Benc @ 2016-10-14 8:59 UTC
To: Tom Herbert; +Cc: davem, netdev, roopa, kernel-team
On Thu, 13 Oct 2016 17:57:42 -0700, Tom Herbert wrote:
> @@ -43,13 +44,11 @@ struct lwtunnel_encap_ops {
> int (*get_encap_size)(struct lwtunnel_state *lwtstate);
> int (*cmp_encap)(struct lwtunnel_state *a, struct lwtunnel_state *b);
> int (*xmit)(struct sk_buff *skb);
> + void (*destroy_state)(struct lwtunnel_state *lws);
> };
Could you add destroy_state next to build_state? Seems weird to have
those two scattered at the opposite ends of the structure. Looks good
otherwise.
Thanks,
Jiri
* Re: [PATCH net-next 2/2] ila: Cache a route to translated address
From: Jiri Benc @ 2016-10-14 9:04 UTC
To: Roopa Prabhu; +Cc: Tom Herbert, davem, netdev, kernel-team
On Thu, 13 Oct 2016 23:22:14 -0700, Roopa Prabhu wrote:
> This removes the last and only user of lwt orig_output. we can drop it
> subsequently. But since orig_input is still in use, probably better to keep it
> around for symmetry and for other uses in the future.
If it's no longer used, let's remove it. It can always be added again later
if needed. We don't keep things around just because they might be used for
something in the future.
Thanks,
Jiri
* Re: [PATCH net-next 1/2] lwtunnel: Add destroy state operation
From: David Miller @ 2016-10-14 15:15 UTC
To: tom; +Cc: netdev, roopa, kernel-team
From: Tom Herbert <tom@herbertland.com>
Date: Thu, 13 Oct 2016 17:57:42 -0700
> @@ -130,6 +130,19 @@ int lwtunnel_build_state(struct net_device *dev, u16 encap_type,
> }
> EXPORT_SYMBOL(lwtunnel_build_state);
>
> +void lwtstate_free(struct lwtunnel_state *lws)
There should only be one space between "void" and "lwtstate_free".