* [PATCH net v2 1/6] ipv4: fix sending of redirects
2012-10-08 21:41 [PATCH net v2 0/6] ipv4: Changes for rt_gateway Julian Anastasov
@ 2012-10-08 21:41 ` Julian Anastasov
2012-10-08 21:41 ` [PATCH net v2 2/6] ipv4: fix forwarding for strict source routes Julian Anastasov
` (5 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: Julian Anastasov @ 2012-10-08 21:41 UTC (permalink / raw)
To: netdev
After "Cache input routes in fib_info nexthops" (commit
d2d68ba9fe) and "Elide fib_validate_source() completely when possible"
(commit 7a9bc9b81a) we can not send ICMP redirects. It seems we
should not cache the RTCF_DOREDIRECT flag in nh_rth_input because
the same fib_info can be used for traffic that is not redirected,
eg. from other input devices or from sources that are not in same subnet.
As result, we have to disable the caching of RTCF_DOREDIRECT
flag and to force source validation for the case when forwarding
traffic to the input device. If traffic comes from directly connected
source we allow redirection as it was done before both changes.
Avoid setting RTCF_DOREDIRECT if IN_DEV_TX_REDIRECTS
is disabled, this can avoid source address validation and to
help caching the routes.
After the change "Adjust semantics of rt->rt_gateway"
(commit f8126f1d51) we should make sure our ICMP_REDIR_HOST messages
contain daddr instead of 0.0.0.0 when target is directly connected.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
---
net/ipv4/fib_frontend.c | 3 ++-
net/ipv4/route.c | 30 ++++++++++++++++--------------
2 files changed, 18 insertions(+), 15 deletions(-)
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 68c93d1..825c608 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -322,7 +322,8 @@ int fib_validate_source(struct sk_buff *skb, __be32 src, __be32 dst,
{
int r = secpath_exists(skb) ? 0 : IN_DEV_RPFILTER(idev);
- if (!r && !fib_num_tclassid_users(dev_net(dev))) {
+ if (!r && !fib_num_tclassid_users(dev_net(dev)) &&
+ (dev->ifindex != oif || !IN_DEV_TX_REDIRECTS(idev))) {
*itag = 0;
return 0;
}
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 132e0df..b90da1b 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -802,7 +802,8 @@ void ip_rt_send_redirect(struct sk_buff *skb)
net = dev_net(rt->dst.dev);
peer = inet_getpeer_v4(net->ipv4.peers, ip_hdr(skb)->saddr, 1);
if (!peer) {
- icmp_send(skb, ICMP_REDIRECT, ICMP_REDIR_HOST, rt->rt_gateway);
+ icmp_send(skb, ICMP_REDIRECT, ICMP_REDIR_HOST,
+ rt_nexthop(rt, ip_hdr(skb)->daddr));
return;
}
@@ -827,7 +828,9 @@ void ip_rt_send_redirect(struct sk_buff *skb)
time_after(jiffies,
(peer->rate_last +
(ip_rt_redirect_load << peer->rate_tokens)))) {
- icmp_send(skb, ICMP_REDIRECT, ICMP_REDIR_HOST, rt->rt_gateway);
+ __be32 gw = rt_nexthop(rt, ip_hdr(skb)->daddr);
+
+ icmp_send(skb, ICMP_REDIRECT, ICMP_REDIR_HOST, gw);
peer->rate_last = jiffies;
++peer->rate_tokens;
#ifdef CONFIG_IP_ROUTE_VERBOSE
@@ -835,7 +838,7 @@ void ip_rt_send_redirect(struct sk_buff *skb)
peer->rate_tokens == ip_rt_redirect_number)
net_warn_ratelimited("host %pI4/if%d ignores redirects for %pI4 to %pI4\n",
&ip_hdr(skb)->saddr, inet_iif(skb),
- &ip_hdr(skb)->daddr, &rt->rt_gateway);
+ &ip_hdr(skb)->daddr, &gw);
#endif
}
out_put_peer:
@@ -1442,10 +1445,13 @@ static int __mkroute_input(struct sk_buff *skb,
goto cleanup;
}
- if (out_dev == in_dev && err &&
+ do_cache = res->fi && !itag;
+ if (out_dev == in_dev && err && IN_DEV_TX_REDIRECTS(out_dev) &&
(IN_DEV_SHARED_MEDIA(out_dev) ||
- inet_addr_onlink(out_dev, saddr, FIB_RES_GW(*res))))
+ inet_addr_onlink(out_dev, saddr, FIB_RES_GW(*res)))) {
flags |= RTCF_DOREDIRECT;
+ do_cache = false;
+ }
if (skb->protocol != htons(ETH_P_IP)) {
/* Not IP (i.e. ARP). Do not create route, if it is
@@ -1462,15 +1468,11 @@ static int __mkroute_input(struct sk_buff *skb,
}
}
- do_cache = false;
- if (res->fi) {
- if (!itag) {
- rth = rcu_dereference(FIB_RES_NH(*res).nh_rth_input);
- if (rt_cache_valid(rth)) {
- skb_dst_set_noref(skb, &rth->dst);
- goto out;
- }
- do_cache = true;
+ if (do_cache) {
+ rth = rcu_dereference(FIB_RES_NH(*res).nh_rth_input);
+ if (rt_cache_valid(rth)) {
+ skb_dst_set_noref(skb, &rth->dst);
+ goto out;
}
}
--
1.7.3.4
^ permalink raw reply related [flat|nested] 8+ messages in thread* [PATCH net v2 2/6] ipv4: fix forwarding for strict source routes
2012-10-08 21:41 [PATCH net v2 0/6] ipv4: Changes for rt_gateway Julian Anastasov
2012-10-08 21:41 ` [PATCH net v2 1/6] ipv4: fix sending of redirects Julian Anastasov
@ 2012-10-08 21:41 ` Julian Anastasov
2012-10-08 21:41 ` [PATCH net v2 3/6] ipv4: make sure nh_pcpu_rth_output is always allocated Julian Anastasov
` (4 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: Julian Anastasov @ 2012-10-08 21:41 UTC (permalink / raw)
To: netdev
After the change "Adjust semantics of rt->rt_gateway"
(commit f8126f1d51) rt_gateway can be 0 but ip_forward() compares
it directly with nexthop. What we want here is to check if traffic
is to directly connected nexthop and to fail if using gateway.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
---
net/ipv4/ip_forward.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/net/ipv4/ip_forward.c b/net/ipv4/ip_forward.c
index ab09b12..7f35ac2 100644
--- a/net/ipv4/ip_forward.c
+++ b/net/ipv4/ip_forward.c
@@ -85,7 +85,7 @@ int ip_forward(struct sk_buff *skb)
rt = skb_rtable(skb);
- if (opt->is_strictroute && opt->nexthop != rt->rt_gateway)
+ if (opt->is_strictroute && rt->rt_gateway)
goto sr_failed;
if (unlikely(skb->len > dst_mtu(&rt->dst) && !skb_is_gso(skb) &&
--
1.7.3.4
^ permalink raw reply related [flat|nested] 8+ messages in thread* [PATCH net v2 3/6] ipv4: make sure nh_pcpu_rth_output is always allocated
2012-10-08 21:41 [PATCH net v2 0/6] ipv4: Changes for rt_gateway Julian Anastasov
2012-10-08 21:41 ` [PATCH net v2 1/6] ipv4: fix sending of redirects Julian Anastasov
2012-10-08 21:41 ` [PATCH net v2 2/6] ipv4: fix forwarding for strict source routes Julian Anastasov
@ 2012-10-08 21:41 ` Julian Anastasov
2012-10-08 21:41 ` [PATCH net v2 4/6] ipv4: introduce rt_uses_gateway Julian Anastasov
` (3 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: Julian Anastasov @ 2012-10-08 21:41 UTC (permalink / raw)
To: netdev
Avoid checking nh_pcpu_rth_output in fast path,
abort fib_info creation on alloc_percpu failure.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
---
net/ipv4/fib_semantics.c | 2 ++
net/ipv4/route.c | 3 ---
2 files changed, 2 insertions(+), 3 deletions(-)
diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index 2677530..71b125c 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -840,6 +840,8 @@ struct fib_info *fib_create_info(struct fib_config *cfg)
change_nexthops(fi) {
nexthop_nh->nh_parent = fi;
nexthop_nh->nh_pcpu_rth_output = alloc_percpu(struct rtable __rcu *);
+ if (!nexthop_nh->nh_pcpu_rth_output)
+ goto failure;
} endfor_nexthops(fi)
if (cfg->fc_mx) {
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index b90da1b..5b0180f 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1207,8 +1207,6 @@ static bool rt_cache_route(struct fib_nh *nh, struct rtable *rt)
if (rt_is_input_route(rt)) {
p = (struct rtable **)&nh->nh_rth_input;
} else {
- if (!nh->nh_pcpu_rth_output)
- goto nocache;
p = (struct rtable **)__this_cpu_ptr(nh->nh_pcpu_rth_output);
}
orig = *p;
@@ -1223,7 +1221,6 @@ static bool rt_cache_route(struct fib_nh *nh, struct rtable *rt)
* unsuccessful at storing this route into the cache
* we really need to set it.
*/
-nocache:
rt->dst.flags |= DST_NOCACHE;
ret = false;
}
--
1.7.3.4
^ permalink raw reply related [flat|nested] 8+ messages in thread* [PATCH net v2 4/6] ipv4: introduce rt_uses_gateway
2012-10-08 21:41 [PATCH net v2 0/6] ipv4: Changes for rt_gateway Julian Anastasov
` (2 preceding siblings ...)
2012-10-08 21:41 ` [PATCH net v2 3/6] ipv4: make sure nh_pcpu_rth_output is always allocated Julian Anastasov
@ 2012-10-08 21:41 ` Julian Anastasov
2012-10-08 21:41 ` [PATCH net v2 5/6] ipv4: Add FLOWI_FLAG_KNOWN_NH Julian Anastasov
` (2 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: Julian Anastasov @ 2012-10-08 21:41 UTC (permalink / raw)
To: netdev
Add new flag to remember when route is via gateway.
We will use it to allow rt_gateway to contain address of
directly connected host for the cases when DST_NOCACHE is
used or when the NH exception caches per-destination route
without DST_NOCACHE flag, i.e. when routes are not used for
other destinations. By this way we force the neighbour
resolving to work with the routed destination but we
can use different address in the packet, feature needed
for IPVS-DR where original packet for virtual IP is routed
via route to real IP.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
---
include/net/route.h | 3 +-
net/ipv4/inet_connection_sock.c | 4 +-
net/ipv4/ip_forward.c | 2 +-
net/ipv4/ip_output.c | 4 +-
net/ipv4/route.c | 48 +++++++++++++++++++++-----------------
net/ipv4/xfrm4_policy.c | 1 +
6 files changed, 34 insertions(+), 28 deletions(-)
diff --git a/include/net/route.h b/include/net/route.h
index da22243..bc40b63 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -48,7 +48,8 @@ struct rtable {
int rt_genid;
unsigned int rt_flags;
__u16 rt_type;
- __u16 rt_is_input;
+ __u8 rt_is_input;
+ __u8 rt_uses_gateway;
int rt_iif;
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index f0c5b9c..d34ce29 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -406,7 +406,7 @@ struct dst_entry *inet_csk_route_req(struct sock *sk,
rt = ip_route_output_flow(net, fl4, sk);
if (IS_ERR(rt))
goto no_route;
- if (opt && opt->opt.is_strictroute && rt->rt_gateway)
+ if (opt && opt->opt.is_strictroute && rt->rt_uses_gateway)
goto route_err;
return &rt->dst;
@@ -442,7 +442,7 @@ struct dst_entry *inet_csk_route_child_sock(struct sock *sk,
rt = ip_route_output_flow(net, fl4, sk);
if (IS_ERR(rt))
goto no_route;
- if (opt && opt->opt.is_strictroute && rt->rt_gateway)
+ if (opt && opt->opt.is_strictroute && rt->rt_uses_gateway)
goto route_err;
rcu_read_unlock();
return &rt->dst;
diff --git a/net/ipv4/ip_forward.c b/net/ipv4/ip_forward.c
index 7f35ac2..694de3b 100644
--- a/net/ipv4/ip_forward.c
+++ b/net/ipv4/ip_forward.c
@@ -85,7 +85,7 @@ int ip_forward(struct sk_buff *skb)
rt = skb_rtable(skb);
- if (opt->is_strictroute && rt->rt_gateway)
+ if (opt->is_strictroute && rt->rt_uses_gateway)
goto sr_failed;
if (unlikely(skb->len > dst_mtu(&rt->dst) && !skb_is_gso(skb) &&
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 24a29a3..6537a40 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -193,7 +193,7 @@ static inline int ip_finish_output2(struct sk_buff *skb)
}
rcu_read_lock_bh();
- nexthop = rt->rt_gateway ? rt->rt_gateway : ip_hdr(skb)->daddr;
+ nexthop = (__force u32) rt_nexthop(rt, ip_hdr(skb)->daddr);
neigh = __ipv4_neigh_lookup_noref(dev, nexthop);
if (unlikely(!neigh))
neigh = __neigh_create(&arp_tbl, &nexthop, dev, false);
@@ -371,7 +371,7 @@ int ip_queue_xmit(struct sk_buff *skb, struct flowi *fl)
skb_dst_set_noref(skb, &rt->dst);
packet_routed:
- if (inet_opt && inet_opt->opt.is_strictroute && rt->rt_gateway)
+ if (inet_opt && inet_opt->opt.is_strictroute && rt->rt_uses_gateway)
goto no_route;
/* OK, we know where to send it, allocate and build IP header. */
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 5b0180f..3a116cb 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1126,7 +1126,7 @@ static unsigned int ipv4_mtu(const struct dst_entry *dst)
mtu = dst->dev->mtu;
if (unlikely(dst_metric_locked(dst, RTAX_MTU))) {
- if (rt->rt_gateway && mtu > 576)
+ if (rt->rt_uses_gateway && mtu > 576)
mtu = 576;
}
@@ -1177,7 +1177,9 @@ static bool rt_bind_exception(struct rtable *rt, struct fib_nh_exception *fnhe,
if (fnhe->fnhe_gw) {
rt->rt_flags |= RTCF_REDIRECTED;
rt->rt_gateway = fnhe->fnhe_gw;
- }
+ rt->rt_uses_gateway = 1;
+ } else if (!rt->rt_gateway)
+ rt->rt_gateway = daddr;
orig = rcu_dereference(fnhe->fnhe_rth);
rcu_assign_pointer(fnhe->fnhe_rth, rt);
@@ -1186,13 +1188,6 @@ static bool rt_bind_exception(struct rtable *rt, struct fib_nh_exception *fnhe,
fnhe->fnhe_stamp = jiffies;
ret = true;
- } else {
- /* Routes we intend to cache in nexthop exception have
- * the DST_NOCACHE bit clear. However, if we are
- * unsuccessful at storing this route into the cache
- * we really need to set it.
- */
- rt->dst.flags |= DST_NOCACHE;
}
spin_unlock_bh(&fnhe_lock);
@@ -1215,15 +1210,8 @@ static bool rt_cache_route(struct fib_nh *nh, struct rtable *rt)
if (prev == orig) {
if (orig)
rt_free(orig);
- } else {
- /* Routes we intend to cache in the FIB nexthop have
- * the DST_NOCACHE bit clear. However, if we are
- * unsuccessful at storing this route into the cache
- * we really need to set it.
- */
- rt->dst.flags |= DST_NOCACHE;
+ } else
ret = false;
- }
return ret;
}
@@ -1284,8 +1272,10 @@ static void rt_set_nexthop(struct rtable *rt, __be32 daddr,
if (fi) {
struct fib_nh *nh = &FIB_RES_NH(*res);
- if (nh->nh_gw && nh->nh_scope == RT_SCOPE_LINK)
+ if (nh->nh_gw && nh->nh_scope == RT_SCOPE_LINK) {
rt->rt_gateway = nh->nh_gw;
+ rt->rt_uses_gateway = 1;
+ }
dst_init_metrics(&rt->dst, fi->fib_metrics, true);
#ifdef CONFIG_IP_ROUTE_CLASSID
rt->dst.tclassid = nh->nh_tclassid;
@@ -1294,8 +1284,18 @@ static void rt_set_nexthop(struct rtable *rt, __be32 daddr,
cached = rt_bind_exception(rt, fnhe, daddr);
else if (!(rt->dst.flags & DST_NOCACHE))
cached = rt_cache_route(nh, rt);
- }
- if (unlikely(!cached))
+ if (unlikely(!cached)) {
+ /* Routes we intend to cache in nexthop exception or
+ * FIB nexthop have the DST_NOCACHE bit clear.
+ * However, if we are unsuccessful at storing this
+ * route into the cache we really need to set it.
+ */
+ rt->dst.flags |= DST_NOCACHE;
+ if (!rt->rt_gateway)
+ rt->rt_gateway = daddr;
+ rt_add_uncached_list(rt);
+ }
+ } else
rt_add_uncached_list(rt);
#ifdef CONFIG_IP_ROUTE_CLASSID
@@ -1363,6 +1363,7 @@ static int ip_route_input_mc(struct sk_buff *skb, __be32 daddr, __be32 saddr,
rth->rt_iif = 0;
rth->rt_pmtu = 0;
rth->rt_gateway = 0;
+ rth->rt_uses_gateway = 0;
INIT_LIST_HEAD(&rth->rt_uncached);
if (our) {
rth->dst.input= ip_local_deliver;
@@ -1432,7 +1433,6 @@ static int __mkroute_input(struct sk_buff *skb,
return -EINVAL;
}
-
err = fib_validate_source(skb, saddr, daddr, tos, FIB_RES_OIF(*res),
in_dev->dev, in_dev, &itag);
if (err < 0) {
@@ -1488,6 +1488,7 @@ static int __mkroute_input(struct sk_buff *skb,
rth->rt_iif = 0;
rth->rt_pmtu = 0;
rth->rt_gateway = 0;
+ rth->rt_uses_gateway = 0;
INIT_LIST_HEAD(&rth->rt_uncached);
rth->dst.input = ip_forward;
@@ -1658,6 +1659,7 @@ local_input:
rth->rt_iif = 0;
rth->rt_pmtu = 0;
rth->rt_gateway = 0;
+ rth->rt_uses_gateway = 0;
INIT_LIST_HEAD(&rth->rt_uncached);
if (res.type == RTN_UNREACHABLE) {
rth->dst.input= ip_error;
@@ -1826,6 +1828,7 @@ static struct rtable *__mkroute_output(const struct fib_result *res,
rth->rt_iif = orig_oif ? : 0;
rth->rt_pmtu = 0;
rth->rt_gateway = 0;
+ rth->rt_uses_gateway = 0;
INIT_LIST_HEAD(&rth->rt_uncached);
RT_CACHE_STAT_INC(out_slow_tot);
@@ -2104,6 +2107,7 @@ struct dst_entry *ipv4_blackhole_route(struct net *net, struct dst_entry *dst_or
rt->rt_flags = ort->rt_flags;
rt->rt_type = ort->rt_type;
rt->rt_gateway = ort->rt_gateway;
+ rt->rt_uses_gateway = ort->rt_uses_gateway;
INIT_LIST_HEAD(&rt->rt_uncached);
@@ -2182,7 +2186,7 @@ static int rt_fill_info(struct net *net, __be32 dst, __be32 src,
if (nla_put_be32(skb, RTA_PREFSRC, fl4->saddr))
goto nla_put_failure;
}
- if (rt->rt_gateway &&
+ if (rt->rt_uses_gateway &&
nla_put_be32(skb, RTA_GATEWAY, rt->rt_gateway))
goto nla_put_failure;
diff --git a/net/ipv4/xfrm4_policy.c b/net/ipv4/xfrm4_policy.c
index 681ea2f..05c5ab8 100644
--- a/net/ipv4/xfrm4_policy.c
+++ b/net/ipv4/xfrm4_policy.c
@@ -91,6 +91,7 @@ static int xfrm4_fill_dst(struct xfrm_dst *xdst, struct net_device *dev,
RTCF_LOCAL);
xdst->u.rt.rt_type = rt->rt_type;
xdst->u.rt.rt_gateway = rt->rt_gateway;
+ xdst->u.rt.rt_uses_gateway = rt->rt_uses_gateway;
xdst->u.rt.rt_pmtu = rt->rt_pmtu;
INIT_LIST_HEAD(&xdst->u.rt.rt_uncached);
--
1.7.3.4
^ permalink raw reply related [flat|nested] 8+ messages in thread* [PATCH net v2 5/6] ipv4: Add FLOWI_FLAG_KNOWN_NH
2012-10-08 21:41 [PATCH net v2 0/6] ipv4: Changes for rt_gateway Julian Anastasov
` (3 preceding siblings ...)
2012-10-08 21:41 ` [PATCH net v2 4/6] ipv4: introduce rt_uses_gateway Julian Anastasov
@ 2012-10-08 21:41 ` Julian Anastasov
2012-10-08 21:41 ` [PATCH net v2 6/6] ipvs: fix ARP resolving for direct routing mode Julian Anastasov
2012-10-08 21:43 ` [PATCH net v2 0/6] ipv4: Changes for rt_gateway David Miller
6 siblings, 0 replies; 8+ messages in thread
From: Julian Anastasov @ 2012-10-08 21:41 UTC (permalink / raw)
To: netdev
Add flag to request that output route should be
returned with known rt_gateway, in case we want to use
it as nexthop for neighbour resolving.
The returned route can be cached as follows:
- in NH exception: because the cached routes are not shared
with other destinations
- in FIB NH: when using gateway because all destinations for
NH share same gateway
As last option, to return rt_gateway!=0 we have to
set DST_NOCACHE.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
---
include/net/flow.h | 1 +
net/ipv4/route.c | 21 +++++++++++++++++----
2 files changed, 18 insertions(+), 4 deletions(-)
diff --git a/include/net/flow.h b/include/net/flow.h
index e1dd508..628e11b 100644
--- a/include/net/flow.h
+++ b/include/net/flow.h
@@ -21,6 +21,7 @@ struct flowi_common {
__u8 flowic_flags;
#define FLOWI_FLAG_ANYSRC 0x01
#define FLOWI_FLAG_CAN_SLEEP 0x02
+#define FLOWI_FLAG_KNOWN_NH 0x04
__u32 flowic_secid;
};
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 3a116cb..1a0da8d 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1762,6 +1762,7 @@ static struct rtable *__mkroute_output(const struct fib_result *res,
struct in_device *in_dev;
u16 type = res->type;
struct rtable *rth;
+ bool do_cache;
in_dev = __in_dev_get_rcu(dev_out);
if (!in_dev)
@@ -1798,24 +1799,36 @@ static struct rtable *__mkroute_output(const struct fib_result *res,
}
fnhe = NULL;
+ do_cache = fi != NULL;
if (fi) {
struct rtable __rcu **prth;
+ struct fib_nh *nh = &FIB_RES_NH(*res);
- fnhe = find_exception(&FIB_RES_NH(*res), fl4->daddr);
+ fnhe = find_exception(nh, fl4->daddr);
if (fnhe)
prth = &fnhe->fnhe_rth;
- else
- prth = __this_cpu_ptr(FIB_RES_NH(*res).nh_pcpu_rth_output);
+ else {
+ if (unlikely(fl4->flowi4_flags &
+ FLOWI_FLAG_KNOWN_NH &&
+ !(nh->nh_gw &&
+ nh->nh_scope == RT_SCOPE_LINK))) {
+ do_cache = false;
+ goto add;
+ }
+ prth = __this_cpu_ptr(nh->nh_pcpu_rth_output);
+ }
rth = rcu_dereference(*prth);
if (rt_cache_valid(rth)) {
dst_hold(&rth->dst);
return rth;
}
}
+
+add:
rth = rt_dst_alloc(dev_out,
IN_DEV_CONF_GET(in_dev, NOPOLICY),
IN_DEV_CONF_GET(in_dev, NOXFRM),
- fi);
+ do_cache);
if (!rth)
return ERR_PTR(-ENOBUFS);
--
1.7.3.4
^ permalink raw reply related [flat|nested] 8+ messages in thread* [PATCH net v2 6/6] ipvs: fix ARP resolving for direct routing mode
2012-10-08 21:41 [PATCH net v2 0/6] ipv4: Changes for rt_gateway Julian Anastasov
` (4 preceding siblings ...)
2012-10-08 21:41 ` [PATCH net v2 5/6] ipv4: Add FLOWI_FLAG_KNOWN_NH Julian Anastasov
@ 2012-10-08 21:41 ` Julian Anastasov
2012-10-08 21:43 ` [PATCH net v2 0/6] ipv4: Changes for rt_gateway David Miller
6 siblings, 0 replies; 8+ messages in thread
From: Julian Anastasov @ 2012-10-08 21:41 UTC (permalink / raw)
To: netdev
After the change "Make neigh lookups directly in output packet path"
(commit a263b30936) IPVS can not reach the real server for DR mode
because we resolve the destination address from IP header, not from
route neighbour. Use the new FLOWI_FLAG_KNOWN_NH flag to request
output routes with known nexthop, so that it has preference
on resolving.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
---
net/netfilter/ipvs/ip_vs_xmit.c | 6 +++++-
1 files changed, 5 insertions(+), 1 deletions(-)
diff --git a/net/netfilter/ipvs/ip_vs_xmit.c b/net/netfilter/ipvs/ip_vs_xmit.c
index 56f6d5d..cc4c809 100644
--- a/net/netfilter/ipvs/ip_vs_xmit.c
+++ b/net/netfilter/ipvs/ip_vs_xmit.c
@@ -50,6 +50,7 @@ enum {
* local
*/
IP_VS_RT_MODE_CONNECT = 8, /* Always bind route to saddr */
+ IP_VS_RT_MODE_KNOWN_NH = 16,/* Route via remote addr */
};
/*
@@ -113,6 +114,8 @@ static struct rtable *do_output_route4(struct net *net, __be32 daddr,
fl4.daddr = daddr;
fl4.saddr = (rt_mode & IP_VS_RT_MODE_CONNECT) ? *saddr : 0;
fl4.flowi4_tos = rtos;
+ fl4.flowi4_flags = (rt_mode & IP_VS_RT_MODE_KNOWN_NH) ?
+ FLOWI_FLAG_KNOWN_NH : 0;
retry:
rt = ip_route_output_key(net, &fl4);
@@ -1061,7 +1064,8 @@ ip_vs_dr_xmit(struct sk_buff *skb, struct ip_vs_conn *cp,
if (!(rt = __ip_vs_get_out_rt(skb, cp->dest, cp->daddr.ip,
RT_TOS(iph->tos),
IP_VS_RT_MODE_LOCAL |
- IP_VS_RT_MODE_NON_LOCAL, NULL)))
+ IP_VS_RT_MODE_NON_LOCAL |
+ IP_VS_RT_MODE_KNOWN_NH, NULL)))
goto tx_error_icmp;
if (rt->rt_flags & RTCF_LOCAL) {
ip_rt_put(rt);
--
1.7.3.4
^ permalink raw reply related [flat|nested] 8+ messages in thread* Re: [PATCH net v2 0/6] ipv4: Changes for rt_gateway
2012-10-08 21:41 [PATCH net v2 0/6] ipv4: Changes for rt_gateway Julian Anastasov
` (5 preceding siblings ...)
2012-10-08 21:41 ` [PATCH net v2 6/6] ipvs: fix ARP resolving for direct routing mode Julian Anastasov
@ 2012-10-08 21:43 ` David Miller
6 siblings, 0 replies; 8+ messages in thread
From: David Miller @ 2012-10-08 21:43 UTC (permalink / raw)
To: ja; +Cc: netdev
From: Julian Anastasov <ja@ssi.bg>
Date: Tue, 9 Oct 2012 00:41:14 +0300
> This patchset fixes some problems for the routing caused
> by the new rt_gateway semantics. What started as a fix for
> IPVS-DR ended as fixes for more problems. To solve the IPVS
> problem I decided to name the flag FLOWI_FLAG_KNOWN_NH, so that
> we can even get route cached in FNHE or FIB NH.
>
> Different flag FLOWI_FLAG_RT_NOCACHE could be equally good
> for IPVS, we again would be able to use data from fnhe but working
> with cached routes should be preferred. If there is no FNHE, the
> common case is IPVS to get uncached route, of course, IPVS caches
> it itself.
>
> Patches 1-3 are fixes not related to IPVS problem,
> 4 and 5 add code that will be used by IPVS in patch 6.
This series looks great, applied, thanks Julian.
^ permalink raw reply [flat|nested] 8+ messages in thread