* [PATCH net] ipv6: do not overwrite inetpeer metrics prematurely
@ 2014-03-06 9:50 Michal Kubecek
2014-03-06 19:24 ` David Miller
0 siblings, 1 reply; 31+ messages in thread
From: Michal Kubecek @ 2014-03-06 9:50 UTC (permalink / raw)
To: David S. Miller
Cc: netdev, Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI,
Patrick McHardy
If an IPv6 host route with metrics exists, an attempt to add a
new route for the same target with different metrics fails but
rewrites the metrics anyway:
12sp0:~ # ip route add fec0::1 dev eth0 rto_min 1000
12sp0:~ # ip -6 route show
fe80::/64 dev eth0 proto kernel metric 256
fec0::1 dev eth0 metric 1024 rto_min lock 1s
12sp0:~ # ip route add fec0::1 dev eth0 rto_min 1500
RTNETLINK answers: File exists
12sp0:~ # ip -6 route show
fe80::/64 dev eth0 proto kernel metric 256
fec0::1 dev eth0 metric 1024 rto_min lock 1.5s
This is caused by all IPv6 host routes using the metrics in
their inetpeer (or the shared default). This also holds for the
new route created in ip6_route_add() which shares the metrics
with the already existing route and thus ip6_route_add()
rewrites the metrics even if the new route ends up not being
used at all.
Allocate separate metrics in ip6_route_add() and copy them into
inetpeer (and update dst->_metrics) just before the new route is
actually inserted in fib6_add_rt2node().
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
---
net/ipv6/ip6_fib.c | 28 ++++++++++++++++++++++++++++
net/ipv6/route.c | 10 +++++-----
2 files changed, 33 insertions(+), 5 deletions(-)
diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index 075602f..3067715 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -638,6 +638,32 @@ static inline bool rt6_qualify_for_ecmp(struct rt6_info *rt)
RTF_GATEWAY;
}
+/* Called just before inserting a new route into the tree.
+ * If it is a host route and has its own metrics allocated, copy
+ * them to its inetpeer (create and/or bind if needed) and free
+ * the temporary block.
+ */
+
+static void rt6_metrics_to_peer(struct rt6_info *rt)
+{
+ struct inet_peer *peer = rt6_has_peer(rt) ? rt6_peer_ptr(rt) : NULL;
+ struct dst_entry *dst = &rt->dst;
+ unsigned long old = dst->_metrics;
+
+ if (!(rt->dst.flags & DST_HOST) || dst_metrics_read_only(dst))
+ return;
+ if (peer && dst_metrics_ptr(dst) == peer->metrics)
+ return;
+
+ dst->ops->cow_metrics(dst, old);
+ if (dst->_metrics != old) {
+ u32 *old_p = __DST_METRICS_PTR(old);
+
+ memcpy(dst_metrics_ptr(dst), old_p, RTAX_MAX * sizeof(u32));
+ kfree(old_p);
+ }
+}
+
/*
* Insert routing information in a node.
*/
@@ -752,6 +778,7 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct rt6_info *rt,
add:
rt->dst.rt6_next = iter;
+ rt6_metrics_to_peer(rt);
*ins = rt;
rt->rt6i_node = fn;
atomic_inc(&rt->rt6i_ref);
@@ -770,6 +797,7 @@ add:
pr_warn("NLM_F_REPLACE set, but no existing node found!\n");
return -ENOENT;
}
+ rt6_metrics_to_peer(rt);
*ins = rt;
rt->rt6i_node = fn;
rt->dst.rt6_next = iter->dst.rt6_next;
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 11dac21..4ba981e 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -324,8 +324,10 @@ static void ip6_dst_destroy(struct dst_entry *dst)
struct rt6_info *rt = (struct rt6_info *)dst;
struct inet6_dev *idev = rt->rt6i_idev;
struct dst_entry *from = dst->from;
+ struct inet_peer *peer = rt6_has_peer(rt) ? rt6_peer_ptr(rt) : NULL;
- if (!(rt->dst.flags & DST_HOST))
+ if (!(dst->flags & DST_HOST) || !peer ||
+ DST_METRICS_PTR(dst) != peer->metrics)
dst_destroy_metrics_generic(dst);
if (idev) {
@@ -336,10 +338,8 @@ static void ip6_dst_destroy(struct dst_entry *dst)
dst->from = NULL;
dst_release(from);
- if (rt6_has_peer(rt)) {
- struct inet_peer *peer = rt6_peer_ptr(rt);
+ if (peer)
inet_putpeer(peer);
- }
}
static void ip6_dst_ifdown(struct dst_entry *dst, struct net_device *dev,
@@ -1546,7 +1546,7 @@ int ip6_route_add(struct fib6_config *cfg)
if (rt->rt6i_dst.plen == 128)
rt->dst.flags |= DST_HOST;
- if (!(rt->dst.flags & DST_HOST) && cfg->fc_mx) {
+ if (cfg->fc_mx) {
u32 *metrics = kzalloc(sizeof(u32) * RTAX_MAX, GFP_KERNEL);
if (!metrics) {
err = -ENOMEM;
--
1.8.1.4
^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: [PATCH net] ipv6: do not overwrite inetpeer metrics prematurely
2014-03-06 9:50 [PATCH net] ipv6: do not overwrite inetpeer metrics prematurely Michal Kubecek
@ 2014-03-06 19:24 ` David Miller
2014-03-06 20:06 ` Michal Kubecek
0 siblings, 1 reply; 31+ messages in thread
From: David Miller @ 2014-03-06 19:24 UTC (permalink / raw)
To: mkubecek; +Cc: netdev, kuznet, jmorris, yoshfuji, kaber
From: Michal Kubecek <mkubecek@suse.cz>
Date: Thu, 6 Mar 2014 10:50:54 +0100 (CET)
> If an IPv6 host route with metrics exists, an attempt to add a
> new route for the same target with different metrics fails but
> rewrites the metrics anyway:
>
> 12sp0:~ # ip route add fec0::1 dev eth0 rto_min 1000
> 12sp0:~ # ip -6 route show
> fe80::/64 dev eth0 proto kernel metric 256
> fec0::1 dev eth0 metric 1024 rto_min lock 1s
> 12sp0:~ # ip route add fec0::1 dev eth0 rto_min 1500
> RTNETLINK answers: File exists
> 12sp0:~ # ip -6 route show
> fe80::/64 dev eth0 proto kernel metric 256
> fec0::1 dev eth0 metric 1024 rto_min lock 1.5s
>
> This is caused by all IPv6 host routes using the metrics in
> their inetpeer (or the shared default). This also holds for the
> new route created in ip6_route_add() which shares the metrics
> with the already existing route and thus ip6_route_add()
> rewrites the metrics even if the new route ends up not being
> used at all.
>
> Allocate separate metrics in ip6_route_add() and copy them into
> inetpeer (and update dst->_metrics) just before the new route is
> actually inserted in fib6_add_rt2node().
>
> Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Thanks for working on this.
> +static void rt6_metrics_to_peer(struct rt6_info *rt)
> +{
> + struct inet_peer *peer = rt6_has_peer(rt) ? rt6_peer_ptr(rt) : NULL;
> + struct dst_entry *dst = &rt->dst;
> + unsigned long old = dst->_metrics;
> +
> + if (!(rt->dst.flags & DST_HOST) || dst_metrics_read_only(dst))
> + return;
> + if (peer && dst_metrics_ptr(dst) == peer->metrics)
> + return;
> +
> + dst->ops->cow_metrics(dst, old);
> + if (dst->_metrics != old) {
> + u32 *old_p = __DST_METRICS_PTR(old);
> +
> + memcpy(dst_metrics_ptr(dst), old_p, RTAX_MAX * sizeof(u32));
> + kfree(old_p);
> + }
> +}
Hmmm... if inet_metrics_new() is true then ->cow_metrics() will copy the
metrics from old to new. So you therefore shouldn't have to do the copy
explicitly here.
If inet_metrics_new() is not true, you are overwriting non-new metrics.
If there is some reason why what rt6_metrics_to_peer() is doing is OK, I'd
like you to explain this in the commit message.
Thanks!
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH net] ipv6: do not overwrite inetpeer metrics prematurely
2014-03-06 19:24 ` David Miller
@ 2014-03-06 20:06 ` Michal Kubecek
2014-03-07 12:36 ` [PATCH net v2] " Michal Kubecek
2014-03-07 20:52 ` [PATCH net] " David Miller
0 siblings, 2 replies; 31+ messages in thread
From: Michal Kubecek @ 2014-03-06 20:06 UTC (permalink / raw)
To: David Miller; +Cc: netdev, kuznet, jmorris, yoshfuji, kaber
On Thu, Mar 06, 2014 at 02:24:25PM -0500, David Miller wrote:
>
> > +static void rt6_metrics_to_peer(struct rt6_info *rt)
> > +{
> > + struct inet_peer *peer = rt6_has_peer(rt) ? rt6_peer_ptr(rt) : NULL;
> > + struct dst_entry *dst = &rt->dst;
> > + unsigned long old = dst->_metrics;
> > +
> > + if (!(rt->dst.flags & DST_HOST) || dst_metrics_read_only(dst))
> > + return;
> > + if (peer && dst_metrics_ptr(dst) == peer->metrics)
> > + return;
> > +
> > + dst->ops->cow_metrics(dst, old);
> > + if (dst->_metrics != old) {
> > + u32 *old_p = __DST_METRICS_PTR(old);
> > +
> > + memcpy(dst_metrics_ptr(dst), old_p, RTAX_MAX * sizeof(u32));
> > + kfree(old_p);
> > + }
> > +}
>
> Hmmm... if inet_metrics_new() is true then ->cow_metrics() will copy the
> metrics from old to new. So you therefore shouldn't have to do the copy
> explicitly here.
If we are replacing an existing host route with metrics, e.g.
ip route add fec0::1 dev eth0 rto_min 1000
ip route change fec0::1 dev eth0 rto_min 1500
then peer will be the existing inetpeer and inet_metrics_new() will be
false. However, we still need to copy the new metrics from the netlink
message over the old ones.
> If inet_metrics_new() is not true, you are overwriting non-new metrics.
The only problem with this is IMHO that if inet_metrics_new() is true,
i.e. when adding a new route with new inetpeer (or old inetpeer whose
metrics were not used before), the memcpy() is done twice, once in
ipv6_cow_metrics() and once in rt6_metrics_to_peer(). We are copying the
same data twice so that the result is correct but it's not efficient and
it's not nice.
The only way I can see to avoid this (except using own metrics always
instead of those in struct inetpeer as we do for non-host routes) would
be not to call ipv6_cow_metrics() at all and write a special function
for this purpose in net/ipv6/route.c which would duplicate the parts of
ipv6_cow_metrics() we really need (and add the free()). Do you think
this is the way to go?
Michal Kubecek
^ permalink raw reply [flat|nested] 31+ messages in thread
* [PATCH net v2] ipv6: do not overwrite inetpeer metrics prematurely
2014-03-06 20:06 ` Michal Kubecek
@ 2014-03-07 12:36 ` Michal Kubecek
2014-03-07 20:52 ` [PATCH net] " David Miller
1 sibling, 0 replies; 31+ messages in thread
From: Michal Kubecek @ 2014-03-07 12:36 UTC (permalink / raw)
To: David S. Miller
Cc: netdev, Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI,
Patrick McHardy
If an IPv6 host route with metrics exists, an attempt to add a
new route for the same target with different metrics fails but
rewrites the metrics anyway:
12sp0:~ # ip route add fec0::1 dev eth0 rto_min 1000
12sp0:~ # ip -6 route show
fe80::/64 dev eth0 proto kernel metric 256
fec0::1 dev eth0 metric 1024 rto_min lock 1s
12sp0:~ # ip route add fec0::1 dev eth0 rto_min 1500
RTNETLINK answers: File exists
12sp0:~ # ip -6 route show
fe80::/64 dev eth0 proto kernel metric 256
fec0::1 dev eth0 metric 1024 rto_min lock 1.5s
This is caused by all IPv6 host routes using the metrics in
their inetpeer (or the shared default). This also holds for the
new route created in ip6_route_add() which shares the metrics
with the already existing route and thus ip6_route_add()
rewrites the metrics even if the new route ends up not being
used at all.
Allocate separate metrics in ip6_route_add() and copy them into
inetpeer (and update dst->_metrics) just before the new route is
actually inserted in fib6_add_rt2node() (so that we don't have
to care about simultaneous access).
v2: rather than trying to (ab)use ipv6_cow_metrics(), add a new
function doing only what is actually needed. Keep some checks in
an inline wrapper to avoid a function call in most cases.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
---
include/net/ip6_route.h | 1 +
net/ipv6/ip6_fib.c | 10 ++++++++++
net/ipv6/route.c | 28 +++++++++++++++++++++++-----
3 files changed, 34 insertions(+), 5 deletions(-)
diff --git a/include/net/ip6_route.h b/include/net/ip6_route.h
index 017badb..abb3bfa 100644
--- a/include/net/ip6_route.h
+++ b/include/net/ip6_route.h
@@ -117,6 +117,7 @@ int rt6_dump_route(struct rt6_info *rt, void *p_arg);
void rt6_ifdown(struct net *net, struct net_device *dev);
void rt6_mtu_change(struct net_device *dev, unsigned int mtu);
void rt6_remove_prefsrc(struct inet6_ifaddr *ifp);
+void rt6_move_metrics_to_peer(struct rt6_info *rt);
/*
diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index 075602f..454eff7 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -638,6 +638,14 @@ static inline bool rt6_qualify_for_ecmp(struct rt6_info *rt)
RTF_GATEWAY;
}
+static inline void rt6_metrics_to_peer(struct rt6_info *rt)
+{
+ struct dst_entry *dst = &rt->dst;
+
+ if ((dst->flags & DST_HOST) && !dst_metrics_read_only(dst))
+ rt6_move_metrics_to_peer(rt);
+}
+
/*
* Insert routing information in a node.
*/
@@ -752,6 +760,7 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct rt6_info *rt,
add:
rt->dst.rt6_next = iter;
+ rt6_metrics_to_peer(rt);
*ins = rt;
rt->rt6i_node = fn;
atomic_inc(&rt->rt6i_ref);
@@ -770,6 +779,7 @@ add:
pr_warn("NLM_F_REPLACE set, but no existing node found!\n");
return -ENOENT;
}
+ rt6_metrics_to_peer(rt);
*ins = rt;
rt->rt6i_node = fn;
rt->dst.rt6_next = iter->dst.rt6_next;
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index fba54a4..6edc32e 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -164,6 +164,24 @@ static u32 *ipv6_cow_metrics(struct dst_entry *dst, unsigned long old)
return p;
}
+/* Copy standalone metrics to inetpeer and free the block.
+ * Assumes that route is not inserted yet and that caller checked
+ * the metrics are not read-only.
+ */
+
+void rt6_move_metrics_to_peer(struct rt6_info *rt)
+{
+ struct dst_entry *dst = &rt->dst;
+ struct inet_peer *peer = rt6_get_peer_create(rt);
+ u32 *old = dst_metrics_ptr(dst);
+
+ if (!peer || old == peer->metrics)
+ return;
+ memcpy(peer->metrics, old, RTAX_MAX * sizeof(u32));
+ dst_init_metrics(dst, peer->metrics, false);
+ kfree(old);
+}
+
static inline const void *choose_neigh_daddr(struct rt6_info *rt,
struct sk_buff *skb,
const void *daddr)
@@ -324,8 +342,10 @@ static void ip6_dst_destroy(struct dst_entry *dst)
struct rt6_info *rt = (struct rt6_info *)dst;
struct inet6_dev *idev = rt->rt6i_idev;
struct dst_entry *from = dst->from;
+ struct inet_peer *peer = rt6_has_peer(rt) ? rt6_peer_ptr(rt) : NULL;
- if (!(rt->dst.flags & DST_HOST))
+ if (!(dst->flags & DST_HOST) || !peer ||
+ DST_METRICS_PTR(dst) != peer->metrics)
dst_destroy_metrics_generic(dst);
if (idev) {
@@ -336,10 +356,8 @@ static void ip6_dst_destroy(struct dst_entry *dst)
dst->from = NULL;
dst_release(from);
- if (rt6_has_peer(rt)) {
- struct inet_peer *peer = rt6_peer_ptr(rt);
+ if (peer)
inet_putpeer(peer);
- }
}
static void ip6_dst_ifdown(struct dst_entry *dst, struct net_device *dev,
@@ -1546,7 +1564,7 @@ int ip6_route_add(struct fib6_config *cfg)
if (rt->rt6i_dst.plen == 128)
rt->dst.flags |= DST_HOST;
- if (!(rt->dst.flags & DST_HOST) && cfg->fc_mx) {
+ if (cfg->fc_mx) {
u32 *metrics = kzalloc(sizeof(u32) * RTAX_MAX, GFP_KERNEL);
if (!metrics) {
err = -ENOMEM;
--
1.8.1.4
^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: [PATCH net] ipv6: do not overwrite inetpeer metrics prematurely
2014-03-06 20:06 ` Michal Kubecek
2014-03-07 12:36 ` [PATCH net v2] " Michal Kubecek
@ 2014-03-07 20:52 ` David Miller
2014-03-07 21:38 ` Michal Kubecek
2014-03-08 8:06 ` Hannes Frederic Sowa
1 sibling, 2 replies; 31+ messages in thread
From: David Miller @ 2014-03-07 20:52 UTC (permalink / raw)
To: mkubecek; +Cc: netdev, kuznet, jmorris, yoshfuji, kaber
From: Michal Kubecek <mkubecek@suse.cz>
Date: Thu, 6 Mar 2014 21:06:58 +0100
> On Thu, Mar 06, 2014 at 02:24:25PM -0500, David Miller wrote:
>>
>> > +static void rt6_metrics_to_peer(struct rt6_info *rt)
>> > +{
>> > + struct inet_peer *peer = rt6_has_peer(rt) ? rt6_peer_ptr(rt) : NULL;
>> > + struct dst_entry *dst = &rt->dst;
>> > + unsigned long old = dst->_metrics;
>> > +
>> > + if (!(rt->dst.flags & DST_HOST) || dst_metrics_read_only(dst))
>> > + return;
>> > + if (peer && dst_metrics_ptr(dst) == peer->metrics)
>> > + return;
>> > +
>> > + dst->ops->cow_metrics(dst, old);
>> > + if (dst->_metrics != old) {
>> > + u32 *old_p = __DST_METRICS_PTR(old);
>> > +
>> > + memcpy(dst_metrics_ptr(dst), old_p, RTAX_MAX * sizeof(u32));
>> > + kfree(old_p);
>> > + }
>> > +}
>>
>> Hmmm... if inet_metrics_new() is true then ->cow_metrics() will copy the
>> metrics from old to new. So you therefore shouldn't have to do the copy
>> explicitly here.
>
> If we are replacing an existing host route with metrics, e.g.
>
> ip route add fec0::1 dev eth0 rto_min 1000
> ip route change fec0::1 dev eth0 rto_min 1500
>
> then peer will be the existing inetpeer and inet_metrics_new() will be
> false. However, we still need to copy the new metrics from the netlink
> message over the old ones.
>
>> If inet_metrics_new() is not true, you are overwriting non-new metrics.
>
> The only problem with this is IMHO that if inet_metrics_new() is true,
> i.e. when adding a new route with new inetpeer (or old inetpeer whose
> metrics were not used before), the memcpy() is done twice, once in
> ipv6_cow_metrics() and once in rt6_metrics_to_peer(). We are copying the
> same data twice so that the result is correct but it's not efficient and
> it's not nice.
>
> The only way I can see to avoid this (except using own metrics always
> instead of those in struct inetpeer as we do for non-host routes) would
> be not to call ipv6_cow_metrics() at all and write a special function
> for this purpose in net/ipv6/route.c which would duplicate the parts of
> ipv6_cow_metrics() we really need (and add the free()). Do you think
> this is the way to go?
Thank you for explaining all of this, I would like to think about this
some more.
My initial suspicion is that the something about the test in cow
metrics might need to be adjusted.
The conceptual attributes we have built for inetpeer metrics, that of
"newness" and "read-only", might not be built adequately for the task
at hand here.
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH net] ipv6: do not overwrite inetpeer metrics prematurely
2014-03-07 20:52 ` [PATCH net] " David Miller
@ 2014-03-07 21:38 ` Michal Kubecek
2014-03-08 8:34 ` Hannes Frederic Sowa
2014-03-08 8:06 ` Hannes Frederic Sowa
1 sibling, 1 reply; 31+ messages in thread
From: Michal Kubecek @ 2014-03-07 21:38 UTC (permalink / raw)
To: David Miller; +Cc: netdev, kuznet, jmorris, yoshfuji, kaber
On Fri, Mar 07, 2014 at 03:52:58PM -0500, David Miller wrote:
> From: Michal Kubecek <mkubecek@suse.cz>
> Date: Thu, 6 Mar 2014 21:06:58 +0100
>
> > On Thu, Mar 06, 2014 at 02:24:25PM -0500, David Miller wrote:
> >>
> >> > +static void rt6_metrics_to_peer(struct rt6_info *rt)
> >> > +{
> >> > + struct inet_peer *peer = rt6_has_peer(rt) ? rt6_peer_ptr(rt) : NULL;
> >> > + struct dst_entry *dst = &rt->dst;
> >> > + unsigned long old = dst->_metrics;
> >> > +
> >> > + if (!(rt->dst.flags & DST_HOST) || dst_metrics_read_only(dst))
> >> > + return;
> >> > + if (peer && dst_metrics_ptr(dst) == peer->metrics)
> >> > + return;
> >> > +
> >> > + dst->ops->cow_metrics(dst, old);
> >> > + if (dst->_metrics != old) {
> >> > + u32 *old_p = __DST_METRICS_PTR(old);
> >> > +
> >> > + memcpy(dst_metrics_ptr(dst), old_p, RTAX_MAX * sizeof(u32));
> >> > + kfree(old_p);
> >> > + }
> >> > +}
> >>
> >> Hmmm... if inet_metrics_new() is true then ->cow_metrics() will copy the
> >> metrics from old to new. So you therefore shouldn't have to do the copy
> >> explicitly here.
> >
> > If we are replacing an existing host route with metrics, e.g.
> >
> > ip route add fec0::1 dev eth0 rto_min 1000
> > ip route change fec0::1 dev eth0 rto_min 1500
> >
> > then peer will be the existing inetpeer and inet_metrics_new() will be
> > false. However, we still need to copy the new metrics from the netlink
> > message over the old ones.
> >
> >> If inet_metrics_new() is not true, you are overwriting non-new metrics.
> >
> > The only problem with this is IMHO that if inet_metrics_new() is true,
> > i.e. when adding a new route with new inetpeer (or old inetpeer whose
> > metrics were not used before), the memcpy() is done twice, once in
> > ipv6_cow_metrics() and once in rt6_metrics_to_peer(). We are copying the
> > same data twice so that the result is correct but it's not efficient and
> > it's not nice.
> >
> > The only way I can see to avoid this (except using own metrics always
> > instead of those in struct inetpeer as we do for non-host routes) would
> > be not to call ipv6_cow_metrics() at all and write a special function
> > for this purpose in net/ipv6/route.c which would duplicate the parts of
> > ipv6_cow_metrics() we really need (and add the free()). Do you think
> > this is the way to go?
>
> Thank you for explaining all of this, I would like to think about this
> some more.
>
> My initial suspicion is that the something about the test in cow
> metrics might need to be adjusted.
>
> The conceptual attributes we have built for inetpeer metrics, that of
> "newness" and "read-only", might not be built adequately for the task
> at hand here.
Today I also realized another problem with current code: if we already
have inetpeer with metrics and use "ip route change" with metrics, e.g.
ip route add fec0::1 dev eth0 rto_min 1000
ip route change fec0::1 dev eth0 hoplimit 10
only the metrics listed in the netlink message (hoplimit here) are
modified and the rest (rto_min here) is preserved. This is inconsistent
with non-host IPv6 routes and IPv4 routes where the whole set of metrics
is replaced.
Michal Kubecek
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH net] ipv6: do not overwrite inetpeer metrics prematurely
2014-03-07 20:52 ` [PATCH net] " David Miller
2014-03-07 21:38 ` Michal Kubecek
@ 2014-03-08 8:06 ` Hannes Frederic Sowa
2014-03-10 0:26 ` David Miller
1 sibling, 1 reply; 31+ messages in thread
From: Hannes Frederic Sowa @ 2014-03-08 8:06 UTC (permalink / raw)
To: David Miller; +Cc: mkubecek, netdev, kuznet, jmorris, yoshfuji, kaber
On Fri, Mar 07, 2014 at 03:52:58PM -0500, David Miller wrote:
> Thank you for explaining all of this, I would like to think about this
> some more.
>
> My initial suspicion is that the something about the test in cow
> metrics might need to be adjusted.
Hmm, you mean we can do the check when looking up the route and testing that
on cloning or cowing and maybe do the copy then?
Just trying to think out loud:
I think we always should have the metrics (if we have them) in inet_peer
in case the dst is a DST_HOST. We cannot do that directly in ip6_route_add
because we would also publish the metrics in case we don't actually
install the route because of an error. So this is a no-go, we need to
stage them first.
In case we do that in ip6_rt_copy / ip6_pol_route:
We don't reinsert DST_HOST routes into fib so they will never go through
ip6_rt_copy, as such they are never pushed through ipv6_cow_metrics logic.
A new check would be needed in ip6_pol_route for DST_HOST routes and
check inet_peer metrics pointer with dst->metrics one. As we don't set
RTF_CACHE in rt6i_flags afterwards we would need to do that on every DST_HOST
result every time (if metrics pointer is writable).
So moving the metrics from staging area into inet_peer metrics in
rt2node seems like the best solution to me so far. I am fine with the
current approach. I also agree that current patch should not have a
performance impact, as we don't reinsert DST_HOST nodes into the fib,
so the new inline rt6_metrics_to_peer should be ok to protect against
this common case.
> The conceptual attributes we have built for inetpeer metrics, that of
> "newness" and "read-only", might not be built adequately for the task
> at hand here.
One question regarding the patch:
In case fc_mx is set we now always kzalloc a non-inet-peer region to
store the metrics for staging. I would find it a bit easier to switch
away from dst_metric_set and use array notation to make it more clear we
don't operate on inet_peer metrics below the install_route: mark in
ip6_route_add. This shouldn't alter the behaviour but just make it clear we
don't operate on inet_peer metric storage.
Hope this made any sense,
Hannes
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH net] ipv6: do not overwrite inetpeer metrics prematurely
2014-03-07 21:38 ` Michal Kubecek
@ 2014-03-08 8:34 ` Hannes Frederic Sowa
0 siblings, 0 replies; 31+ messages in thread
From: Hannes Frederic Sowa @ 2014-03-08 8:34 UTC (permalink / raw)
To: Michal Kubecek; +Cc: David Miller, netdev, kuznet, jmorris, yoshfuji, kaber
On Fri, Mar 07, 2014 at 10:38:42PM +0100, Michal Kubecek wrote:
> Today I also realized another problem with current code: if we already
> have inetpeer with metrics and use "ip route change" with metrics, e.g.
>
> ip route add fec0::1 dev eth0 rto_min 1000
> ip route change fec0::1 dev eth0 hoplimit 10
Hmm, those routes seem to not have RTF_NONEXTHOP (no gateway). I am
a bit concerend that rt6_select returns !RTF_CACHE node and we do end
up doing the inetpeer lookup on every lookup on them. Maybe you have
already investigated?
Thanks,
Hannes
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH net] ipv6: do not overwrite inetpeer metrics prematurely
2014-03-08 8:06 ` Hannes Frederic Sowa
@ 2014-03-10 0:26 ` David Miller
2014-03-10 0:52 ` Hannes Frederic Sowa
0 siblings, 1 reply; 31+ messages in thread
From: David Miller @ 2014-03-10 0:26 UTC (permalink / raw)
To: hannes; +Cc: mkubecek, netdev, kuznet, jmorris, yoshfuji, kaber
From: Hannes Frederic Sowa <hannes@stressinduktion.org>
Date: Sat, 8 Mar 2014 09:06:16 +0100
> I think we always should have the metrics (if we have them) in inet_peer
> in case the dst is a DST_HOST. We cannot do that directly in ip6_route_add
> because we would also publish the metrics in case we don't actually
> install the route because of an error. So this is a no-go, we need to
> stage them first.
Absolutely, we cannot make the metrics visible until the final commit
of the route insert.
This worked transparently before we placed metrics into the inetpeer
cache, because the metrics lives in the dst itself.
I also agree with Michal's observarion of another unintended
behavioral change by this conversion. Specifically the handling of
metric changes of existing routes. The original code replaced all of
the metrics completely with defaults overlayed with whatever was
provided in the route change request, whereas the current inetpeer
code only updates metrics explicitly provided.
This is part of why we need to remove that DST_HOST check currently
guarding the kzalloc()'d metrics in ip6_route_add().
> So moving the metrics from staging area into inet_peer metrics in
> rt2node seems like the best solution to me so far. I am fine with the
> current approach.
I agree.
> In case fc_mx is set we now always kzalloc a non-inet-peer region to
> store the metrics for staging. I would find it a bit easier to switch
> away from dst_metric_set and use array notation to make it more clear we
> don't operate on inet_peer metrics below the install_route: mark in
> ip6_route_add. This shouldn't alter the behaviour but just make it clear we
> don't operate on inet_peer metric storage.
In fact, thinking about this some more, it's almost silly to allocate
this temporary array at all.
A different, potentially much cleaner approach, would be to pass the
cfg->fc_mx down into the __ip6_ins_rt() code path.
Then there is _no_ ambiguity. These are metrics from a netlink
request and we must write them into the final route which we end
up with.
This makes it's way down into fib6_add_rt2node(), and right before the
insertion we make the metrics writable, clear them, and copy in the
explicit values provided from the netlink message.
All of these operations can error, and that's fine, the call chain
can handle errors signalled from here already.
Michal, Hannes, what do you think?
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH net] ipv6: do not overwrite inetpeer metrics prematurely
2014-03-10 0:26 ` David Miller
@ 2014-03-10 0:52 ` Hannes Frederic Sowa
2014-03-10 5:03 ` David Miller
0 siblings, 1 reply; 31+ messages in thread
From: Hannes Frederic Sowa @ 2014-03-10 0:52 UTC (permalink / raw)
To: David Miller; +Cc: mkubecek, netdev, kuznet, jmorris, yoshfuji, kaber
On Sun, Mar 09, 2014 at 08:26:58PM -0400, David Miller wrote:
> In fact, thinking about this some more, it's almost silly to allocate
> this temporary array at all.
>
> A different, potentially much cleaner approach, would be to pass the
> cfg->fc_mx down into the __ip6_ins_rt() code path.
Maybe some people call this a layering violation, but I like the idea! :)
> Then there is _no_ ambiguity. These are metrics from a netlink
> request and we must write them into the final route which we end
> up with.
Exactly, we can be sure that we don't accidentally always lookup
the inetpeer if we try to reinsert a cloned route where !(rt->rt6i_flags &
(RTF_NONEXTHOP | RTF_GATEWAY)) validates to true and DST_HOST is set.
This addresses the feedback from my other mail because I think if
we insert interface routes (ip r a 2000::/xx dev foo) we don't set
RTF_NONEXTHOP. (Maybe we should change that in rtm_to_fib6_config.)
> This makes it's way down into fib6_add_rt2node(), and right before the
> insertion we make the metrics writable, clear them, and copy in the
> explicit values provided from the netlink message.
>
> All of these operations can error, and that's fine, the call chain
> can handle errors signalled from here already.
>
> Michal, Hannes, what do you think?
So we need to change __ip6_ins_rt to take fib6_config as the second
argument instead, still have the info pointer in fc_nlinfo and only have
to wrap the info pointer in ip6_ins_rt into another structure. fib6_add
seems to be straightforward from there.
Sounds good to me!
Thanks,
Hannes
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH net] ipv6: do not overwrite inetpeer metrics prematurely
2014-03-10 0:52 ` Hannes Frederic Sowa
@ 2014-03-10 5:03 ` David Miller
2014-03-10 8:15 ` Michal Kubecek
0 siblings, 1 reply; 31+ messages in thread
From: David Miller @ 2014-03-10 5:03 UTC (permalink / raw)
To: hannes; +Cc: mkubecek, netdev, kuznet, jmorris, yoshfuji, kaber
From: Hannes Frederic Sowa <hannes@stressinduktion.org>
Date: Mon, 10 Mar 2014 01:52:20 +0100
> So we need to change __ip6_ins_rt to take fib6_config as the second
> argument instead, still have the info pointer in fc_nlinfo and only have
> to wrap the info pointer in ip6_ins_rt into another structure. fib6_add
> seems to be straightforward from there.
fib6_config is too large to place on the stack I think, I was more
thinking something along these lines:
diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h
index aca0c27..9bcb220 100644
--- a/include/net/ip6_fib.h
+++ b/include/net/ip6_fib.h
@@ -284,7 +284,8 @@ struct fib6_node *fib6_locate(struct fib6_node *root,
void fib6_clean_all(struct net *net, int (*func)(struct rt6_info *, void *arg),
void *arg);
-int fib6_add(struct fib6_node *root, struct rt6_info *rt, struct nl_info *info);
+int fib6_add(struct fib6_node *root, struct rt6_info *rt, struct nl_info *info,
+ struct nlattr *mx, int mx_len);
int fib6_del(struct rt6_info *rt, struct nl_info *info);
diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index 075602f..941df4f 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -643,7 +643,7 @@ static inline bool rt6_qualify_for_ecmp(struct rt6_info *rt)
*/
static int fib6_add_rt2node(struct fib6_node *fn, struct rt6_info *rt,
- struct nl_info *info)
+ struct nl_info *info, struct nlattr *mx, int mx_len)
{
struct rt6_info *iter = NULL;
struct rt6_info **ins;
@@ -743,6 +743,30 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct rt6_info *rt,
BUG_ON(rt6i_nsiblings != rt->rt6i_nsiblings);
}
+ if (mx) {
+ struct nlattr *nla;
+ bool was_writable;
+ int remaining;
+ u32 *mp;
+
+ was_writable = !dst_metrics_read_only(&rt->dst);
+ mp = dst_metrics_write_ptr(&rt->dst);
+
+ if (was_writable)
+ memset(mp, 0, RTAX_MAX * sizeof(u32));
+
+ nla_for_each_attr(nla, mx, mx_len, remaining) {
+ int type = nla_type(nla);
+
+ if (type) {
+ if (type > RTAX_MAX)
+ return -EINVAL;
+
+ mp[type - 1] = nla_get_u32(nla);
+ }
+ }
+ }
+
/*
* insert node
*/
@@ -806,7 +830,8 @@ void fib6_force_start_gc(struct net *net)
* with source addr info in sub-trees
*/
-int fib6_add(struct fib6_node *root, struct rt6_info *rt, struct nl_info *info)
+int fib6_add(struct fib6_node *root, struct rt6_info *rt, struct nl_info *info,
+ struct nlattr *mx, int mx_len)
{
struct fib6_node *fn, *pn = NULL;
int err = -ENOMEM;
@@ -900,7 +925,7 @@ int fib6_add(struct fib6_node *root, struct rt6_info *rt, struct nl_info *info)
}
#endif
- err = fib6_add_rt2node(fn, rt, info);
+ err = fib6_add_rt2node(fn, rt, info, mx, mx_len);
if (!err) {
fib6_start_gc(info->nl_net, rt);
if (!(rt->rt6i_flags & RTF_CACHE))
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index fba54a4..a39d3da 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -857,14 +857,15 @@ EXPORT_SYMBOL(rt6_lookup);
be destroyed.
*/
-static int __ip6_ins_rt(struct rt6_info *rt, struct nl_info *info)
+static int __ip6_ins_rt(struct rt6_info *rt, struct nl_info *info,
+ struct nlattr *mx, int mx_len)
{
int err;
struct fib6_table *table;
table = rt->rt6i_table;
write_lock_bh(&table->tb6_lock);
- err = fib6_add(&table->tb6_root, rt, info);
+ err = fib6_add(&table->tb6_root, rt, info, mx, mx_len);
write_unlock_bh(&table->tb6_lock);
return err;
@@ -875,7 +876,7 @@ int ip6_ins_rt(struct rt6_info *rt)
struct nl_info info = {
.nl_net = dev_net(rt->dst.dev),
};
- return __ip6_ins_rt(rt, &info);
+ return __ip6_ins_rt(rt, &info, NULL, 0);
}
static struct rt6_info *rt6_alloc_cow(struct rt6_info *ort,
@@ -1546,14 +1547,6 @@ int ip6_route_add(struct fib6_config *cfg)
if (rt->rt6i_dst.plen == 128)
rt->dst.flags |= DST_HOST;
- if (!(rt->dst.flags & DST_HOST) && cfg->fc_mx) {
- u32 *metrics = kzalloc(sizeof(u32) * RTAX_MAX, GFP_KERNEL);
- if (!metrics) {
- err = -ENOMEM;
- goto out;
- }
- dst_init_metrics(&rt->dst, metrics, 0);
- }
#ifdef CONFIG_IPV6_SUBTREES
ipv6_addr_prefix(&rt->rt6i_src.addr, &cfg->fc_src, cfg->fc_src_len);
rt->rt6i_src.plen = cfg->fc_src_len;
@@ -1672,31 +1665,13 @@ int ip6_route_add(struct fib6_config *cfg)
rt->rt6i_flags = cfg->fc_flags;
install_route:
- if (cfg->fc_mx) {
- struct nlattr *nla;
- int remaining;
-
- nla_for_each_attr(nla, cfg->fc_mx, cfg->fc_mx_len, remaining) {
- int type = nla_type(nla);
-
- if (type) {
- if (type > RTAX_MAX) {
- err = -EINVAL;
- goto out;
- }
-
- dst_metric_set(&rt->dst, type, nla_get_u32(nla));
- }
- }
- }
-
rt->dst.dev = dev;
rt->rt6i_idev = idev;
rt->rt6i_table = table;
cfg->fc_nlinfo.nl_net = dev_net(dev);
- return __ip6_ins_rt(rt, &cfg->fc_nlinfo);
+ return __ip6_ins_rt(rt, &cfg->fc_nlinfo, cfg->fc_mx, cfg->fc_mx_len);
out:
if (dev)
^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: [PATCH net] ipv6: do not overwrite inetpeer metrics prematurely
2014-03-10 5:03 ` David Miller
@ 2014-03-10 8:15 ` Michal Kubecek
2014-03-10 12:00 ` Hannes Frederic Sowa
0 siblings, 1 reply; 31+ messages in thread
From: Michal Kubecek @ 2014-03-10 8:15 UTC (permalink / raw)
To: David Miller; +Cc: hannes, netdev, kuznet, jmorris, yoshfuji, kaber
On Mon, Mar 10, 2014 at 01:03:13AM -0400, David Miller wrote:
> @@ -743,6 +743,30 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct rt6_info *rt,
> BUG_ON(rt6i_nsiblings != rt->rt6i_nsiblings);
> }
>
> + if (mx) {
> + struct nlattr *nla;
> + bool was_writable;
> + int remaining;
> + u32 *mp;
> +
> + was_writable = !dst_metrics_read_only(&rt->dst);
> + mp = dst_metrics_write_ptr(&rt->dst);
> +
> + if (was_writable)
> + memset(mp, 0, RTAX_MAX * sizeof(u32));
> +
> + nla_for_each_attr(nla, mx, mx_len, remaining) {
> + int type = nla_type(nla);
> +
> + if (type) {
> + if (type > RTAX_MAX)
> + return -EINVAL;
> +
> + mp[type - 1] = nla_get_u32(nla);
> + }
> + }
> + }
> +
> /*
> * insert node
> */
This is too early. After this point, the insertion can still fail if
(!found && !add) (i.e. when trying to modify a non-existent route with
"ip route change").
> @@ -1546,14 +1547,6 @@ int ip6_route_add(struct fib6_config *cfg)
> if (rt->rt6i_dst.plen == 128)
> rt->dst.flags |= DST_HOST;
>
> - if (!(rt->dst.flags & DST_HOST) && cfg->fc_mx) {
> - u32 *metrics = kzalloc(sizeof(u32) * RTAX_MAX, GFP_KERNEL);
> - if (!metrics) {
> - err = -ENOMEM;
> - goto out;
> - }
> - dst_init_metrics(&rt->dst, metrics, 0);
> - }
> #ifdef CONFIG_IPV6_SUBTREES
> ipv6_addr_prefix(&rt->rt6i_src.addr, &cfg->fc_src, cfg->fc_src_len);
> rt->rt6i_src.plen = cfg->fc_src_len;
I think this should stay. We still need to allocate a separate copy of
metrics in the non-DST_HOST case, otherwise we would reintroduce the
issue fixed by commit 8e2ec6391 ("ipv6: don't use inetpeer to store
metrics for routes."). Or rather get a null pointer dereference in
fib6_add_rt2node() as dst_metrics_write_ptr() would return NULL. An
alternative solution (and perhaps a more suitable one) would be to
allocate the block in ip6_cow_metrics() instead of returning NULL.
Other than that, I believe this should work. Actually, I was also
considering this approach but I wasn't brave enough to propose passing
those extra parameters all the way down to rt6_add_rt2node(). But if you
are OK with it, I agree that saving the extra kzalloc()/kfree() is worth
that bit of ugliness. We could also extract the metrics from the info
parameter we already have but it would be inefficient to parse the whole
message again.
Michal Kubecek
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH net] ipv6: do not overwrite inetpeer metrics prematurely
2014-03-10 8:15 ` Michal Kubecek
@ 2014-03-10 12:00 ` Hannes Frederic Sowa
2014-03-10 13:15 ` Michal Kubecek
2014-03-11 2:38 ` David Miller
0 siblings, 2 replies; 31+ messages in thread
From: Hannes Frederic Sowa @ 2014-03-10 12:00 UTC (permalink / raw)
To: Michal Kubecek; +Cc: David Miller, netdev, kuznet, jmorris, yoshfuji, kaber
On Mon, Mar 10, 2014 at 09:15:33AM +0100, Michal Kubecek wrote:
> On Mon, Mar 10, 2014 at 01:03:13AM -0400, David Miller wrote:
> > @@ -743,6 +743,30 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct rt6_info *rt,
> > BUG_ON(rt6i_nsiblings != rt->rt6i_nsiblings);
> > }
> >
> > + if (mx) {
> > + struct nlattr *nla;
> > + bool was_writable;
> > + int remaining;
> > + u32 *mp;
> > +
> > + was_writable = !dst_metrics_read_only(&rt->dst);
> > + mp = dst_metrics_write_ptr(&rt->dst);
> > +
> > + if (was_writable)
> > + memset(mp, 0, RTAX_MAX * sizeof(u32));
> > +
> > + nla_for_each_attr(nla, mx, mx_len, remaining) {
> > + int type = nla_type(nla);
> > +
> > + if (type) {
> > + if (type > RTAX_MAX)
> > + return -EINVAL;
> > +
> > + mp[type - 1] = nla_get_u32(nla);
> > + }
> > + }
> > + }
> > +
> > /*
> > * insert node
> > */
>
> This is too early. After this point, the insertion can still fail if
> (!found && !add) (i.e. when trying to modify a non-existent route with
> "ip route change").
Yep, but we can make this a static function to ip6_fib.c and call it from the
code points you already proposed in your original patch?
> > @@ -1546,14 +1547,6 @@ int ip6_route_add(struct fib6_config *cfg)
> > if (rt->rt6i_dst.plen == 128)
> > rt->dst.flags |= DST_HOST;
> >
> > - if (!(rt->dst.flags & DST_HOST) && cfg->fc_mx) {
> > - u32 *metrics = kzalloc(sizeof(u32) * RTAX_MAX, GFP_KERNEL);
> > - if (!metrics) {
> > - err = -ENOMEM;
> > - goto out;
> > - }
> > - dst_init_metrics(&rt->dst, metrics, 0);
> > - }
> > #ifdef CONFIG_IPV6_SUBTREES
> > ipv6_addr_prefix(&rt->rt6i_src.addr, &cfg->fc_src, cfg->fc_src_len);
> > rt->rt6i_src.plen = cfg->fc_src_len;
>
> I think this should stay. We still need to allocate a separate copy of
> metrics in the non-DST_HOST case, otherwise we would reintroduce the
> issue fixed by commit 8e2ec6391 ("ipv6: don't use inetpeer to store
> metrics for routes.").
Full ACK, we must only intantiate inetpeers for DST_HOST dsts but it seems to
be possible at this point, too.
The reason why I liked David's proposal is that it removes the ambiguity
when looking up the inetpeer (we only get the inetpeer if we are sure
a new route is added). I don't mind the kzalloc that much. The only
overhead it produces would be if a user inserts a route for a /128 target
and specifies metrics.
Maybe we can remove this ambiguity by checking the already present
info pointer.
> Or rather get a null pointer dereference in
> fib6_add_rt2node() as dst_metrics_write_ptr() would return NULL. An
> alternative solution (and perhaps a more suitable one) would be to
> allocate the block in ip6_cow_metrics() instead of returning NULL.
I am not sure of the advantage here. If this patch targets net I would
like to keep the changes a bit more straightforward. In case it gives
a nice advantage, maybe another patch for net-next? Haven't had a close look
on that yet.
> Other than that, I believe this should work. Actually, I was also
> considering this approach but I wasn't brave enough to propose passing
> those extra parameters all the way down to rt6_add_rt2node(). But if you
> are OK with it, I agree that saving the extra kzalloc()/kfree() is worth
> that bit of ugliness. We could also extract the metrics from the info
> parameter we already have but it would be inefficient to parse the whole
> message again.
Great! :)
Thanks,
Hannes
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH net] ipv6: do not overwrite inetpeer metrics prematurely
2014-03-10 12:00 ` Hannes Frederic Sowa
@ 2014-03-10 13:15 ` Michal Kubecek
2014-03-11 2:38 ` David Miller
1 sibling, 0 replies; 31+ messages in thread
From: Michal Kubecek @ 2014-03-10 13:15 UTC (permalink / raw)
To: Hannes Frederic Sowa
Cc: David Miller, netdev, kuznet, jmorris, yoshfuji, kaber
On Mon, Mar 10, 2014 at 01:00:16PM +0100, Hannes Frederic Sowa wrote:
> On Mon, Mar 10, 2014 at 09:15:33AM +0100, Michal Kubecek wrote:
> > On Mon, Mar 10, 2014 at 01:03:13AM -0400, David Miller wrote:
> > > @@ -743,6 +743,30 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct rt6_info *rt,
> > > BUG_ON(rt6i_nsiblings != rt->rt6i_nsiblings);
> > > }
> > >
> > > + if (mx) {
> > > + struct nlattr *nla;
> > > + bool was_writable;
> > > + int remaining;
> > > + u32 *mp;
> > > +
> > > + was_writable = !dst_metrics_read_only(&rt->dst);
> > > + mp = dst_metrics_write_ptr(&rt->dst);
> > > +
> > > + if (was_writable)
> > > + memset(mp, 0, RTAX_MAX * sizeof(u32));
> > > +
> > > + nla_for_each_attr(nla, mx, mx_len, remaining) {
> > > + int type = nla_type(nla);
> > > +
> > > + if (type) {
> > > + if (type > RTAX_MAX)
> > > + return -EINVAL;
> > > +
> > > + mp[type - 1] = nla_get_u32(nla);
> > > + }
> > > + }
> > > + }
> > > +
> > > /*
> > > * insert node
> > > */
> >
> > This is too early. After this point, the insertion can still fail if
> > (!found && !add) (i.e. when trying to modify a non-existent route with
> > "ip route change").
>
> Yep, but we can make this a static function to ip6_fib.c and call it from the
> code points you already proposed in your original patch?
Yes, this sounds like the best option.
> The reason why I liked David's proposal is that it removes the ambiguity
> when looking up the inetpeer (we only get the inetpeer if we are sure
> a new route is added). I don't mind the kzalloc that much. The only
> overhead it produces would be if a user inserts a route for a /128 target
> and specifies metrics.
Right, I didn't realize that.
> > Or rather get a null pointer dereference in
> > fib6_add_rt2node() as dst_metrics_write_ptr() would return NULL. An
> > alternative solution (and perhaps a more suitable one) would be to
> > allocate the block in ip6_cow_metrics() instead of returning NULL.
>
> I am not sure of the advantage here. If this patch targets net I would
> like to keep the changes a bit more straightforward. In case it gives
> a nice advantage, maybe another patch for net-next? Haven't had a close look
> on that yet.
Agreed. It would only delay the allocation for non-DST_HOST routes to
the point we are sure we need it; on the other hand, it would require
reorganizing the ip6_cow_metrics() code so I guess it can wait for
net-next (if at all).
Michal Kubecek
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH net] ipv6: do not overwrite inetpeer metrics prematurely
2014-03-10 12:00 ` Hannes Frederic Sowa
2014-03-10 13:15 ` Michal Kubecek
@ 2014-03-11 2:38 ` David Miller
2014-03-11 9:53 ` Michal Kubecek
1 sibling, 1 reply; 31+ messages in thread
From: David Miller @ 2014-03-11 2:38 UTC (permalink / raw)
To: hannes; +Cc: mkubecek, netdev, kuznet, jmorris, yoshfuji, kaber
From: Hannes Frederic Sowa <hannes@stressinduktion.org>
Date: Mon, 10 Mar 2014 13:00:16 +0100
> On Mon, Mar 10, 2014 at 09:15:33AM +0100, Michal Kubecek wrote:
...
> Full ACK, we must only intantiate inetpeers for DST_HOST dsts but it seems to
> be possible at this point, too.
...
>> Other than that, I believe this should work. Actually, I was also
>> considering this approach but I wasn't brave enough to propose passing
>> those extra parameters all the way down to rt6_add_rt2node(). But if you
>> are OK with it, I agree that saving the extra kzalloc()/kfree() is worth
>> that bit of ugliness. We could also extract the metrics from the info
>> parameter we already have but it would be inefficient to parse the whole
>> message again.
>
> Great! :)
Does this version address everyone's concerns? Michal can you test it
for me?
Changes since my original patch:
1) non-DST_HOST routes keep getting the private, non-inetpeer, kzalloc()'d
metrics
2) only instantiate the inetpeer when we really do commit to linking
the route into the tree
diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h
index aca0c27..9bcb220 100644
--- a/include/net/ip6_fib.h
+++ b/include/net/ip6_fib.h
@@ -284,7 +284,8 @@ struct fib6_node *fib6_locate(struct fib6_node *root,
void fib6_clean_all(struct net *net, int (*func)(struct rt6_info *, void *arg),
void *arg);
-int fib6_add(struct fib6_node *root, struct rt6_info *rt, struct nl_info *info);
+int fib6_add(struct fib6_node *root, struct rt6_info *rt, struct nl_info *info,
+ struct nlattr *mx, int mx_len);
int fib6_del(struct rt6_info *rt, struct nl_info *info);
diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index 075602f..d342086 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -638,12 +638,38 @@ static inline bool rt6_qualify_for_ecmp(struct rt6_info *rt)
RTF_GATEWAY;
}
+static int fib_commit_metrics(struct dst_entry *dst, struct nlattr *mx, int mx_len)
+{
+ struct nlattr *nla;
+ bool was_writable;
+ int remaining;
+ u32 *mp;
+
+ was_writable = !dst_metrics_read_only(dst);
+ mp = dst_metrics_write_ptr(dst);
+
+ if (was_writable)
+ memset(mp, 0, RTAX_MAX * sizeof(u32));
+
+ nla_for_each_attr(nla, mx, mx_len, remaining) {
+ int type = nla_type(nla);
+
+ if (type) {
+ if (type > RTAX_MAX)
+ return -EINVAL;
+
+ mp[type - 1] = nla_get_u32(nla);
+ }
+ }
+ return 0;
+}
+
/*
* Insert routing information in a node.
*/
static int fib6_add_rt2node(struct fib6_node *fn, struct rt6_info *rt,
- struct nl_info *info)
+ struct nl_info *info, struct nlattr *mx, int mx_len)
{
struct rt6_info *iter = NULL;
struct rt6_info **ins;
@@ -653,6 +679,7 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct rt6_info *rt,
(info->nlh->nlmsg_flags & NLM_F_CREATE));
int found = 0;
bool rt_can_ecmp = rt6_qualify_for_ecmp(rt);
+ int err;
ins = &fn->leaf;
@@ -751,6 +778,9 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct rt6_info *rt,
pr_warn("NLM_F_CREATE should be set when creating new route\n");
add:
+ err = fib_commit_metrics(&rt->dst, mx, mx_len);
+ if (err)
+ return err;
rt->dst.rt6_next = iter;
*ins = rt;
rt->rt6i_node = fn;
@@ -770,6 +800,9 @@ add:
pr_warn("NLM_F_REPLACE set, but no existing node found!\n");
return -ENOENT;
}
+ err = fib_commit_metrics(&rt->dst, mx, mx_len);
+ if (err)
+ return err;
*ins = rt;
rt->rt6i_node = fn;
rt->dst.rt6_next = iter->dst.rt6_next;
@@ -806,7 +839,8 @@ void fib6_force_start_gc(struct net *net)
* with source addr info in sub-trees
*/
-int fib6_add(struct fib6_node *root, struct rt6_info *rt, struct nl_info *info)
+int fib6_add(struct fib6_node *root, struct rt6_info *rt, struct nl_info *info,
+ struct nlattr *mx, int mx_len)
{
struct fib6_node *fn, *pn = NULL;
int err = -ENOMEM;
@@ -900,7 +934,7 @@ int fib6_add(struct fib6_node *root, struct rt6_info *rt, struct nl_info *info)
}
#endif
- err = fib6_add_rt2node(fn, rt, info);
+ err = fib6_add_rt2node(fn, rt, info, mx, mx_len);
if (!err) {
fib6_start_gc(info->nl_net, rt);
if (!(rt->rt6i_flags & RTF_CACHE))
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index fba54a4..d6aead7 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -857,14 +857,15 @@ EXPORT_SYMBOL(rt6_lookup);
be destroyed.
*/
-static int __ip6_ins_rt(struct rt6_info *rt, struct nl_info *info)
+static int __ip6_ins_rt(struct rt6_info *rt, struct nl_info *info,
+ struct nlattr *mx, int mx_len)
{
int err;
struct fib6_table *table;
table = rt->rt6i_table;
write_lock_bh(&table->tb6_lock);
- err = fib6_add(&table->tb6_root, rt, info);
+ err = fib6_add(&table->tb6_root, rt, info, mx, mx_len);
write_unlock_bh(&table->tb6_lock);
return err;
@@ -875,7 +876,7 @@ int ip6_ins_rt(struct rt6_info *rt)
struct nl_info info = {
.nl_net = dev_net(rt->dst.dev),
};
- return __ip6_ins_rt(rt, &info);
+ return __ip6_ins_rt(rt, &info, NULL, 0);
}
static struct rt6_info *rt6_alloc_cow(struct rt6_info *ort,
@@ -1672,31 +1673,13 @@ int ip6_route_add(struct fib6_config *cfg)
rt->rt6i_flags = cfg->fc_flags;
install_route:
- if (cfg->fc_mx) {
- struct nlattr *nla;
- int remaining;
-
- nla_for_each_attr(nla, cfg->fc_mx, cfg->fc_mx_len, remaining) {
- int type = nla_type(nla);
-
- if (type) {
- if (type > RTAX_MAX) {
- err = -EINVAL;
- goto out;
- }
-
- dst_metric_set(&rt->dst, type, nla_get_u32(nla));
- }
- }
- }
-
rt->dst.dev = dev;
rt->rt6i_idev = idev;
rt->rt6i_table = table;
cfg->fc_nlinfo.nl_net = dev_net(dev);
- return __ip6_ins_rt(rt, &cfg->fc_nlinfo);
+ return __ip6_ins_rt(rt, &cfg->fc_nlinfo, cfg->fc_mx, cfg->fc_mx_len);
out:
if (dev)
^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: [PATCH net] ipv6: do not overwrite inetpeer metrics prematurely
2014-03-11 2:38 ` David Miller
@ 2014-03-11 9:53 ` Michal Kubecek
2014-03-11 15:08 ` Michal Kubecek
0 siblings, 1 reply; 31+ messages in thread
From: Michal Kubecek @ 2014-03-11 9:53 UTC (permalink / raw)
To: David Miller; +Cc: hannes, netdev, kuznet, jmorris, yoshfuji, kaber
On Mon, Mar 10, 2014 at 10:38:41PM -0400, David Miller wrote:
>
> Does this version address everyone's concerns? Michal can you test it
> for me?
There is still a problem:
12sp0:~ # ip route add fec0::1 dev eth0 rto_min 1000
12sp0:~ # ip -6 route show
fe80::/64 dev eth0 proto kernel metric 256
fec0::1 dev eth0 metric 1024 rto_min lock 1s
12sp0:~ # ip route change fec0::1 dev eth0 hoplimit 10
12sp0:~ # ip -6 route show
fe80::/64 dev eth0 proto kernel metric 256
fec0::1 dev eth0 metric 1024 hoplimit 10 rto_min lock 1s
IMHO this comes from the was_writable test in fib_commit_metrics():
> +static int fib_commit_metrics(struct dst_entry *dst, struct nlattr *mx, int mx_len)
> +{
> + struct nlattr *nla;
> + bool was_writable;
> + int remaining;
> + u32 *mp;
> +
> + was_writable = !dst_metrics_read_only(dst);
> + mp = dst_metrics_write_ptr(dst);
> +
> + if (was_writable)
> + memset(mp, 0, RTAX_MAX * sizeof(u32));
> +
> + nla_for_each_attr(nla, mx, mx_len, remaining) {
> + int type = nla_type(nla);
> +
> + if (type) {
> + if (type > RTAX_MAX)
> + return -EINVAL;
> +
> + mp[type - 1] = nla_get_u32(nla);
> + }
> + }
> + return 0;
> +}
For DST_HOST, was_writable is always false as at this point,
dst->_metrics still points to the shared default. I believe what we want
to know is whether the _original_ dst_entry (the one we are replacing,
if any) has writable metrics. Because if it has, we are reusing them, if
not, we get new ones. For non-DST_HOST routes, we can always skip the
memset() as we never share (writeable) metrics between old and new in
the non-DST_HOST case.
I also believe the function should return immediately if mx is null so
that we don't call dst_metrics_write_ptr() if no metrics are to be set
for the new route.
I'll test these changes and send an updated patch if I find no problem
with it.
Michal Kubecek
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH net] ipv6: do not overwrite inetpeer metrics prematurely
2014-03-11 9:53 ` Michal Kubecek
@ 2014-03-11 15:08 ` Michal Kubecek
2014-03-11 15:20 ` Hannes Frederic Sowa
` (2 more replies)
0 siblings, 3 replies; 31+ messages in thread
From: Michal Kubecek @ 2014-03-11 15:08 UTC (permalink / raw)
To: David Miller; +Cc: hannes, netdev, kuznet, jmorris, yoshfuji, kaber
On Tue, Mar 11, 2014 at 10:53:09AM +0100, Michal Kubecek wrote:
> On Mon, Mar 10, 2014 at 10:38:41PM -0400, David Miller wrote:
>
> > +static int fib_commit_metrics(struct dst_entry *dst, struct nlattr *mx, int mx_len)
> > +{
> > + struct nlattr *nla;
> > + bool was_writable;
> > + int remaining;
> > + u32 *mp;
> > +
> > + was_writable = !dst_metrics_read_only(dst);
> > + mp = dst_metrics_write_ptr(dst);
> > +
> > + if (was_writable)
> > + memset(mp, 0, RTAX_MAX * sizeof(u32));
> > +
> > + nla_for_each_attr(nla, mx, mx_len, remaining) {
> > + int type = nla_type(nla);
> > +
> > + if (type) {
> > + if (type > RTAX_MAX)
> > + return -EINVAL;
> > +
> > + mp[type - 1] = nla_get_u32(nla);
> > + }
> > + }
> > + return 0;
> > +}
>
> For DST_HOST, was_writable is always false as at this point,
> dst->_metrics still points to the shared default. I believe what we want
> to know is whether the _original_ dst_entry (the one we are replacing,
> if any) has writable metrics. Because if it has, we are reusing them, if
> not, we get new ones. For non-DST_HOST routes, we can always skip the
> memset() as we never share (writeable) metrics between old and new in
> the non-DST_HOST case.
>
> I also believe the function should return immediately if mx is null so
> that we don't call dst_metrics_write_ptr() if no metrics are to be set
> for the new route.
Not so easy... :-( This would cause a problem if a host route is changed
twice in this way:
ip route add fec0::1 dev eth0 rto_min 1000
ip route change fec0::1 dev eth0
ip route change fec0::1 dev eth0 hoplimit 10
First route has metrics in its inetpeer. This inetpeer is then inherited
by the second route but the metrics in it are not used as its dst_entry
points to the read-only default. But when it is replaced by the third
version, it inherits the inetpeer and it is not cleaned up by
ip6_cow_metrics() because it is not new.
What I ended up with is below. It uses the metrics in inetpeer if there
is one even if the new host metric doesn't have any metrics to set (in
which case it clears them first). I tested various scenarios and the
results were correct.
Changes against the previous version:
1. fib_commit_metrics() renamed to fib6_commit_metrics() for consistency
2. old_dst parameter added to it; it is set to the route we are replacing
or null if a new route is being added
3. if no mx/mx_len are passed, do nothing except the case we are
inheriting an inetpeer with metrics from the old route
4. zero the metrics if DST_HOST is set and the _old_ route has metrics
writeable
diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h
index aca0c27..9bcb220 100644
--- a/include/net/ip6_fib.h
+++ b/include/net/ip6_fib.h
@@ -284,7 +284,8 @@ struct fib6_node *fib6_locate(struct fib6_node *root,
void fib6_clean_all(struct net *net, int (*func)(struct rt6_info *, void *arg),
void *arg);
-int fib6_add(struct fib6_node *root, struct rt6_info *rt, struct nl_info *info);
+int fib6_add(struct fib6_node *root, struct rt6_info *rt, struct nl_info *info,
+ struct nlattr *mx, int mx_len);
int fib6_del(struct rt6_info *rt, struct nl_info *info);
diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index 075602f..8111248 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -638,12 +638,42 @@ static inline bool rt6_qualify_for_ecmp(struct rt6_info *rt)
RTF_GATEWAY;
}
+static int fib6_commit_metrics(struct dst_entry *old_dst, struct dst_entry *dst,
+ struct nlattr *mx, int mx_len)
+{
+ struct nlattr *nla;
+ int remaining;
+ u32 *mp;
+
+ /* no new metrics and not inheriting an inetpeer with old ones */
+ if (!mx && (!(dst->flags & DST_HOST) || !old_dst ||
+ dst_metrics_read_only(old_dst)))
+ return 0;
+
+ mp = dst_metrics_write_ptr(dst);
+ if ((dst->flags & DST_HOST) &&
+ old_dst && !dst_metrics_read_only(old_dst))
+ memset(mp, 0, RTAX_MAX * sizeof(u32));
+
+ nla_for_each_attr(nla, mx, mx_len, remaining) {
+ int type = nla_type(nla);
+
+ if (type) {
+ if (type > RTAX_MAX)
+ return -EINVAL;
+
+ mp[type - 1] = nla_get_u32(nla);
+ }
+ }
+ return 0;
+}
+
/*
* Insert routing information in a node.
*/
static int fib6_add_rt2node(struct fib6_node *fn, struct rt6_info *rt,
- struct nl_info *info)
+ struct nl_info *info, struct nlattr *mx, int mx_len)
{
struct rt6_info *iter = NULL;
struct rt6_info **ins;
@@ -653,6 +683,7 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct rt6_info *rt,
(info->nlh->nlmsg_flags & NLM_F_CREATE));
int found = 0;
bool rt_can_ecmp = rt6_qualify_for_ecmp(rt);
+ int err;
ins = &fn->leaf;
@@ -751,6 +782,9 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct rt6_info *rt,
pr_warn("NLM_F_CREATE should be set when creating new route\n");
add:
+ err = fib6_commit_metrics(NULL, &rt->dst, mx, mx_len);
+ if (err)
+ return err;
rt->dst.rt6_next = iter;
*ins = rt;
rt->rt6i_node = fn;
@@ -770,6 +804,9 @@ add:
pr_warn("NLM_F_REPLACE set, but no existing node found!\n");
return -ENOENT;
}
+ err = fib6_commit_metrics(&iter->dst, &rt->dst, mx, mx_len);
+ if (err)
+ return err;
*ins = rt;
rt->rt6i_node = fn;
rt->dst.rt6_next = iter->dst.rt6_next;
@@ -806,7 +843,8 @@ void fib6_force_start_gc(struct net *net)
* with source addr info in sub-trees
*/
-int fib6_add(struct fib6_node *root, struct rt6_info *rt, struct nl_info *info)
+int fib6_add(struct fib6_node *root, struct rt6_info *rt, struct nl_info *info,
+ struct nlattr *mx, int mx_len)
{
struct fib6_node *fn, *pn = NULL;
int err = -ENOMEM;
@@ -900,7 +938,7 @@ int fib6_add(struct fib6_node *root, struct rt6_info *rt, struct nl_info *info)
}
#endif
- err = fib6_add_rt2node(fn, rt, info);
+ err = fib6_add_rt2node(fn, rt, info, mx, mx_len);
if (!err) {
fib6_start_gc(info->nl_net, rt);
if (!(rt->rt6i_flags & RTF_CACHE))
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index fba54a4..d6aead7 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -857,14 +857,15 @@ EXPORT_SYMBOL(rt6_lookup);
be destroyed.
*/
-static int __ip6_ins_rt(struct rt6_info *rt, struct nl_info *info)
+static int __ip6_ins_rt(struct rt6_info *rt, struct nl_info *info,
+ struct nlattr *mx, int mx_len)
{
int err;
struct fib6_table *table;
table = rt->rt6i_table;
write_lock_bh(&table->tb6_lock);
- err = fib6_add(&table->tb6_root, rt, info);
+ err = fib6_add(&table->tb6_root, rt, info, mx, mx_len);
write_unlock_bh(&table->tb6_lock);
return err;
@@ -875,7 +876,7 @@ int ip6_ins_rt(struct rt6_info *rt)
struct nl_info info = {
.nl_net = dev_net(rt->dst.dev),
};
- return __ip6_ins_rt(rt, &info);
+ return __ip6_ins_rt(rt, &info, NULL, 0);
}
static struct rt6_info *rt6_alloc_cow(struct rt6_info *ort,
@@ -1672,31 +1673,13 @@ int ip6_route_add(struct fib6_config *cfg)
rt->rt6i_flags = cfg->fc_flags;
install_route:
- if (cfg->fc_mx) {
- struct nlattr *nla;
- int remaining;
-
- nla_for_each_attr(nla, cfg->fc_mx, cfg->fc_mx_len, remaining) {
- int type = nla_type(nla);
-
- if (type) {
- if (type > RTAX_MAX) {
- err = -EINVAL;
- goto out;
- }
-
- dst_metric_set(&rt->dst, type, nla_get_u32(nla));
- }
- }
- }
-
rt->dst.dev = dev;
rt->rt6i_idev = idev;
rt->rt6i_table = table;
cfg->fc_nlinfo.nl_net = dev_net(dev);
- return __ip6_ins_rt(rt, &cfg->fc_nlinfo);
+ return __ip6_ins_rt(rt, &cfg->fc_nlinfo, cfg->fc_mx, cfg->fc_mx_len);
out:
if (dev)
--
1.8.1.4
^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: [PATCH net] ipv6: do not overwrite inetpeer metrics prematurely
2014-03-11 15:08 ` Michal Kubecek
@ 2014-03-11 15:20 ` Hannes Frederic Sowa
2014-03-11 15:39 ` Michal Kubecek
2014-03-12 20:54 ` [PATCH net] " David Miller
2 siblings, 0 replies; 31+ messages in thread
From: Hannes Frederic Sowa @ 2014-03-11 15:20 UTC (permalink / raw)
To: Michal Kubecek; +Cc: David Miller, netdev, kuznet, jmorris, yoshfuji, kaber
On Tue, Mar 11, 2014 at 04:08:14PM +0100, Michal Kubecek wrote:
> +static int fib6_commit_metrics(struct dst_entry *old_dst, struct dst_entry *dst,
> + struct nlattr *mx, int mx_len)
> +{
> + struct nlattr *nla;
> + int remaining;
> + u32 *mp;
> +
> + /* no new metrics and not inheriting an inetpeer with old ones */
> + if (!mx && (!(dst->flags & DST_HOST) || !old_dst ||
> + dst_metrics_read_only(old_dst)))
> + return 0;
> +
> + mp = dst_metrics_write_ptr(dst);
> + if ((dst->flags & DST_HOST) &&
> + old_dst && !dst_metrics_read_only(old_dst))
> + memset(mp, 0, RTAX_MAX * sizeof(u32));
> +
Small question:
Wouldn't make it easier if we move the kzalloc/dst_init_metrics part to
here in case we aren't a DST_HOST and remove it from ip6_route_add, too?
Otherwise (but really just a first look!), it seems ok to me.
Thanks,
Hannes
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH net] ipv6: do not overwrite inetpeer metrics prematurely
2014-03-11 15:08 ` Michal Kubecek
2014-03-11 15:20 ` Hannes Frederic Sowa
@ 2014-03-11 15:39 ` Michal Kubecek
2014-03-25 19:11 ` Hannes Frederic Sowa
2014-03-12 20:54 ` [PATCH net] " David Miller
2 siblings, 1 reply; 31+ messages in thread
From: Michal Kubecek @ 2014-03-11 15:39 UTC (permalink / raw)
To: David Miller; +Cc: hannes, netdev, kuznet, jmorris, yoshfuji, kaber
On Tue, Mar 11, 2014 at 04:08:14PM +0100, Michal Kubecek wrote:
> >
> > I also believe the function should return immediately if mx is null so
> > that we don't call dst_metrics_write_ptr() if no metrics are to be set
> > for the new route.
>
> Not so easy... :-( This would cause a problem if a host route is changed
> twice in this way:
>
> ip route add fec0::1 dev eth0 rto_min 1000
> ip route change fec0::1 dev eth0
> ip route change fec0::1 dev eth0 hoplimit 10
>
> First route has metrics in its inetpeer. This inetpeer is then inherited
> by the second route but the metrics in it are not used as its dst_entry
> points to the read-only default. But when it is replaced by the third
> version, it inherits the inetpeer and it is not cleaned up by
> ip6_cow_metrics() because it is not new.
>
> What I ended up with is below. It uses the metrics in inetpeer if there
> is one even if the new host metric doesn't have any metrics to set (in
> which case it clears them first). I tested various scenarios and the
> results were correct.
This doesn't cover all cases either:
ip route add fec0::1 dev eth0 rto_min 1000
ip route delete fec0::1
ip route add fec0::1 dev eth0
ip route change fec0::1 dev eth0 hoplimit 10
This way the inetpeer with rto_min 1000 persists until the fourth
command but neither third nor fourth see old dst_entry with writeable
metrics.
I think changing the condition for copying in ip6_cow_metrics() would
do the trick but I better stop now and look at it again later with
a fresh mind.
Michal Kubecek
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH net] ipv6: do not overwrite inetpeer metrics prematurely
2014-03-11 15:08 ` Michal Kubecek
2014-03-11 15:20 ` Hannes Frederic Sowa
2014-03-11 15:39 ` Michal Kubecek
@ 2014-03-12 20:54 ` David Miller
2 siblings, 0 replies; 31+ messages in thread
From: David Miller @ 2014-03-12 20:54 UTC (permalink / raw)
To: mkubecek; +Cc: hannes, netdev, kuznet, jmorris, yoshfuji, kaber
From: Michal Kubecek <mkubecek@suse.cz>
Date: Tue, 11 Mar 2014 16:08:14 +0100
> + nla_for_each_attr(nla, mx, mx_len, remaining) {
> + int type = nla_type(nla);
> +
> + if (type) {
> + if (type > RTAX_MAX)
> + return -EINVAL;
> +
> + mp[type - 1] = nla_get_u32(nla);
> + }
> + }
It was surprising to me that nla_for_each_attr() works with a second
argument of NULL.
It only does so when the length is zero, due to how the test in nla_ok()
is codified.
I know it's a minor nit, but can you only execute this loop if mx is
non-NULL?
Perhaps you can put this into (yet another) helper function:
int __fib6_commit_metrics(u32 *mp, struct nlattr *mx, int mx_len)
Otherwise looks good to me.
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH net] ipv6: do not overwrite inetpeer metrics prematurely
2014-03-11 15:39 ` Michal Kubecek
@ 2014-03-25 19:11 ` Hannes Frederic Sowa
2014-03-26 15:09 ` Michal Kubecek
2014-03-26 15:42 ` [PATCH net-next v3] " Michal Kubecek
0 siblings, 2 replies; 31+ messages in thread
From: Hannes Frederic Sowa @ 2014-03-25 19:11 UTC (permalink / raw)
To: Michal Kubecek; +Cc: David Miller, netdev, kuznet, jmorris, yoshfuji, kaber
Hi Michal,
On Tue, Mar 11, 2014 at 04:39:31PM +0100, Michal Kubecek wrote:
> On Tue, Mar 11, 2014 at 04:08:14PM +0100, Michal Kubecek wrote:
> > >
> > > I also believe the function should return immediately if mx is null so
> > > that we don't call dst_metrics_write_ptr() if no metrics are to be set
> > > for the new route.
> >
> > Not so easy... :-( This would cause a problem if a host route is changed
> > twice in this way:
> >
> > ip route add fec0::1 dev eth0 rto_min 1000
> > ip route change fec0::1 dev eth0
> > ip route change fec0::1 dev eth0 hoplimit 10
> >
> > First route has metrics in its inetpeer. This inetpeer is then inherited
> > by the second route but the metrics in it are not used as its dst_entry
> > points to the read-only default. But when it is replaced by the third
> > version, it inherits the inetpeer and it is not cleaned up by
> > ip6_cow_metrics() because it is not new.
> >
> > What I ended up with is below. It uses the metrics in inetpeer if there
> > is one even if the new host metric doesn't have any metrics to set (in
> > which case it clears them first). I tested various scenarios and the
> > results were correct.
>
> This doesn't cover all cases either:
>
> ip route add fec0::1 dev eth0 rto_min 1000
> ip route delete fec0::1
> ip route add fec0::1 dev eth0
> ip route change fec0::1 dev eth0 hoplimit 10
>
> This way the inetpeer with rto_min 1000 persists until the fourth
> command but neither third nor fourth see old dst_entry with writeable
> metrics.
>
> I think changing the condition for copying in ip6_cow_metrics() would
> do the trick but I better stop now and look at it again later with
> a fresh mind.
Do you need some help regarding this patch so that it may be included in
net-next before it gets closed?
Thanks,
Hannes
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH net] ipv6: do not overwrite inetpeer metrics prematurely
2014-03-25 19:11 ` Hannes Frederic Sowa
@ 2014-03-26 15:09 ` Michal Kubecek
2014-03-26 15:42 ` [PATCH net-next v3] " Michal Kubecek
1 sibling, 0 replies; 31+ messages in thread
From: Michal Kubecek @ 2014-03-26 15:09 UTC (permalink / raw)
To: Hannes Frederic Sowa
Cc: David Miller, netdev, kuznet, jmorris, yoshfuji, kaber
On Tue, Mar 25, 2014 at 08:11:55PM +0100, Hannes Frederic Sowa wrote:
> > > What I ended up with is below. It uses the metrics in inetpeer if there
> > > is one even if the new host metric doesn't have any metrics to set (in
> > > which case it clears them first). I tested various scenarios and the
> > > results were correct.
> >
> > This doesn't cover all cases either:
> >
> > ip route add fec0::1 dev eth0 rto_min 1000
> > ip route delete fec0::1
> > ip route add fec0::1 dev eth0
> > ip route change fec0::1 dev eth0 hoplimit 10
> >
> > This way the inetpeer with rto_min 1000 persists until the fourth
> > command but neither third nor fourth see old dst_entry with writeable
> > metrics.
> >
> > I think changing the condition for copying in ip6_cow_metrics() would
> > do the trick but I better stop now and look at it again later with
> > a fresh mind.
>
> Do you need some help regarding this patch so that it may be included in
> net-next before it gets closed?
Sorry for the delay, I was working on some more urgent bugs.
While rethinking the problem, I realized that these metrics hidden in
old inetpeer are actually even more dangerous. Assume we create a (host)
route with metrics, delete it but the inetpeer with the metrics still
exists. If we then create a new route without metrics, this inetpeer is
(later) bound to it but metrics in it are not used. However, as soon as
ipv6_cow_metrics() is called, they are linked into struct dst_entry but
as they are not "new", they are not overwritten.
The approach I tried was to add a new flag DST_KEEP_METRICS (should
rather be something like DST_OVERRIDE_INETPEER_METRICS but that looks
awful) which is set in ip6_route_add() and makes ipv6_cow_metrics()
always rewrite metrics in attached inetpeer and then reset the flag. In
ip6_rt_copy(), this flag is not copied (but the metrics are overwritten
anyway).
I ran some tests with all scenarios I could think of and all seem to
work fine. I'll post the patch for review in a moment.
Michal Kubecek
^ permalink raw reply [flat|nested] 31+ messages in thread
* [PATCH net-next v3] ipv6: do not overwrite inetpeer metrics prematurely
2014-03-25 19:11 ` Hannes Frederic Sowa
2014-03-26 15:09 ` Michal Kubecek
@ 2014-03-26 15:42 ` Michal Kubecek
2014-03-26 15:50 ` Hannes Frederic Sowa
1 sibling, 1 reply; 31+ messages in thread
From: Michal Kubecek @ 2014-03-26 15:42 UTC (permalink / raw)
To: David S. Miller
Cc: Hannes Frederic Sowa, netdev, Alexey Kuznetsov, James Morris,
Hideaki YOSHIFUJI, Patrick McHardy
If an IPv6 host route with metrics exists, an attempt to add a
new route for the same target with different metrics fails but
rewrites the metrics anyway:
12sp0:~ # ip route add fec0::1 dev eth0 rto_min 1000
12sp0:~ # ip -6 route show
fe80::/64 dev eth0 proto kernel metric 256
fec0::1 dev eth0 metric 1024 rto_min lock 1s
12sp0:~ # ip route add fec0::1 dev eth0 rto_min 1500
RTNETLINK answers: File exists
12sp0:~ # ip -6 route show
fe80::/64 dev eth0 proto kernel metric 256
fec0::1 dev eth0 metric 1024 rto_min lock 1.5s
This is caused by all IPv6 host routes using the metrics in
their inetpeer (or the shared default). This also holds for the
new route created in ip6_route_add() which shares the metrics
with the already existing route and thus ip6_route_add()
rewrites the metrics even if the new route ends up not being
used at all.
Another problem is that old metrics in inetpeer can reappear
unexpectedly for a new route, e.g.
12sp0:~ # ip route add fec0::1 dev eth0 rto_min 1000
12sp0:~ # ip route del fec0::1
12sp0:~ # ip route add fec0::1 dev eth0
12sp0:~ # ip route change fec0::1 dev eth0 hoplimit 10
12sp0:~ # ip -6 route show
fe80::/64 dev eth0 proto kernel metric 256
fec0::1 dev eth0 metric 1024 hoplimit 10 rto_min lock 1s
Resolve the first problem by moving the setting of metrics down
into fib6_add_rt2node() to the point we are sure we are
inserting the new route into the tree. Second problem is
addressed by introducing new flag DST_KEEP_METRICS which is set
for a new host route in ip6_route_add() and makes
ipv6_cow_metrics() always overwrite the metrics in inetpeer
(even if they are not "new"); it is reset after that.
v3: rewritten based on David Miller's idea to move setting the
metrics (and allocation in non-host case) down to the point we
already know the route is to be inserted. Also rebased to
net-next as it is quite late in the cycle.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
---
include/net/dst.h | 1 +
include/net/ip6_fib.h | 3 ++-
net/ipv6/ip6_fib.c | 47 ++++++++++++++++++++++++++++++++++++++++++++---
net/ipv6/route.c | 41 +++++++++--------------------------------
4 files changed, 56 insertions(+), 36 deletions(-)
diff --git a/include/net/dst.h b/include/net/dst.h
index e01a826..8cf6772 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -57,6 +57,7 @@ struct dst_entry {
#define DST_FAKE_RTABLE 0x0040
#define DST_XFRM_TUNNEL 0x0080
#define DST_XFRM_QUEUE 0x0100
+#define DST_KEEP_METRICS 0x0400
unsigned short pending_confirm;
diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h
index aca0c27..9bcb220 100644
--- a/include/net/ip6_fib.h
+++ b/include/net/ip6_fib.h
@@ -284,7 +284,8 @@ struct fib6_node *fib6_locate(struct fib6_node *root,
void fib6_clean_all(struct net *net, int (*func)(struct rt6_info *, void *arg),
void *arg);
-int fib6_add(struct fib6_node *root, struct rt6_info *rt, struct nl_info *info);
+int fib6_add(struct fib6_node *root, struct rt6_info *rt, struct nl_info *info,
+ struct nlattr *mx, int mx_len);
int fib6_del(struct rt6_info *rt, struct nl_info *info);
diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index 075602f..4ee487b 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -638,12 +638,41 @@ static inline bool rt6_qualify_for_ecmp(struct rt6_info *rt)
RTF_GATEWAY;
}
+static int fib6_commit_metrics(struct dst_entry *dst,
+ struct nlattr *mx, int mx_len)
+{
+ struct nlattr *nla;
+ int remaining;
+ u32 *mp;
+
+ if (dst->flags & DST_HOST) {
+ mp = dst_metrics_write_ptr(dst);
+ } else {
+ mp = kzalloc(sizeof(u32) * RTAX_MAX, GFP_KERNEL);
+ if (!mp)
+ return -ENOMEM;
+ dst_init_metrics(dst, mp, 0);
+ }
+
+ nla_for_each_attr(nla, mx, mx_len, remaining) {
+ int type = nla_type(nla);
+
+ if (type) {
+ if (type > RTAX_MAX)
+ return -EINVAL;
+
+ mp[type - 1] = nla_get_u32(nla);
+ }
+ }
+ return 0;
+}
+
/*
* Insert routing information in a node.
*/
static int fib6_add_rt2node(struct fib6_node *fn, struct rt6_info *rt,
- struct nl_info *info)
+ struct nl_info *info, struct nlattr *mx, int mx_len)
{
struct rt6_info *iter = NULL;
struct rt6_info **ins;
@@ -653,6 +682,7 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct rt6_info *rt,
(info->nlh->nlmsg_flags & NLM_F_CREATE));
int found = 0;
bool rt_can_ecmp = rt6_qualify_for_ecmp(rt);
+ int err;
ins = &fn->leaf;
@@ -751,6 +781,11 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct rt6_info *rt,
pr_warn("NLM_F_CREATE should be set when creating new route\n");
add:
+ if (mx) {
+ err = fib6_commit_metrics(&rt->dst, mx, mx_len);
+ if (err)
+ return err;
+ }
rt->dst.rt6_next = iter;
*ins = rt;
rt->rt6i_node = fn;
@@ -770,6 +805,11 @@ add:
pr_warn("NLM_F_REPLACE set, but no existing node found!\n");
return -ENOENT;
}
+ if (mx) {
+ err = fib6_commit_metrics(&rt->dst, mx, mx_len);
+ if (err)
+ return err;
+ }
*ins = rt;
rt->rt6i_node = fn;
rt->dst.rt6_next = iter->dst.rt6_next;
@@ -806,7 +846,8 @@ void fib6_force_start_gc(struct net *net)
* with source addr info in sub-trees
*/
-int fib6_add(struct fib6_node *root, struct rt6_info *rt, struct nl_info *info)
+int fib6_add(struct fib6_node *root, struct rt6_info *rt, struct nl_info *info,
+ struct nlattr *mx, int mx_len)
{
struct fib6_node *fn, *pn = NULL;
int err = -ENOMEM;
@@ -900,7 +941,7 @@ int fib6_add(struct fib6_node *root, struct rt6_info *rt, struct nl_info *info)
}
#endif
- err = fib6_add_rt2node(fn, rt, info);
+ err = fib6_add_rt2node(fn, rt, info, mx, mx_len);
if (!err) {
fib6_start_gc(info->nl_net, rt);
if (!(rt->rt6i_flags & RTF_CACHE))
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index fba54a4..b6dda6a 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -149,8 +149,10 @@ static u32 *ipv6_cow_metrics(struct dst_entry *dst, unsigned long old)
unsigned long prev, new;
p = peer->metrics;
- if (inet_metrics_new(peer))
+ if (inet_metrics_new(peer) || (dst->flags | DST_KEEP_METRICS)) {
memcpy(p, old_p, sizeof(u32) * RTAX_MAX);
+ dst->flags &= ~DST_KEEP_METRICS;
+ }
new = (unsigned long) p;
prev = cmpxchg(&dst->_metrics, old, new);
@@ -857,14 +859,15 @@ EXPORT_SYMBOL(rt6_lookup);
be destroyed.
*/
-static int __ip6_ins_rt(struct rt6_info *rt, struct nl_info *info)
+static int __ip6_ins_rt(struct rt6_info *rt, struct nl_info *info,
+ struct nlattr *mx, int mx_len)
{
int err;
struct fib6_table *table;
table = rt->rt6i_table;
write_lock_bh(&table->tb6_lock);
- err = fib6_add(&table->tb6_root, rt, info);
+ err = fib6_add(&table->tb6_root, rt, info, mx, mx_len);
write_unlock_bh(&table->tb6_lock);
return err;
@@ -875,7 +878,7 @@ int ip6_ins_rt(struct rt6_info *rt)
struct nl_info info = {
.nl_net = dev_net(rt->dst.dev),
};
- return __ip6_ins_rt(rt, &info);
+ return __ip6_ins_rt(rt, &info, NULL, 0);
}
static struct rt6_info *rt6_alloc_cow(struct rt6_info *ort,
@@ -1544,16 +1547,8 @@ int ip6_route_add(struct fib6_config *cfg)
ipv6_addr_prefix(&rt->rt6i_dst.addr, &cfg->fc_dst, cfg->fc_dst_len);
rt->rt6i_dst.plen = cfg->fc_dst_len;
if (rt->rt6i_dst.plen == 128)
- rt->dst.flags |= DST_HOST;
+ rt->dst.flags |= DST_HOST | DST_KEEP_METRICS;
- if (!(rt->dst.flags & DST_HOST) && cfg->fc_mx) {
- u32 *metrics = kzalloc(sizeof(u32) * RTAX_MAX, GFP_KERNEL);
- if (!metrics) {
- err = -ENOMEM;
- goto out;
- }
- dst_init_metrics(&rt->dst, metrics, 0);
- }
#ifdef CONFIG_IPV6_SUBTREES
ipv6_addr_prefix(&rt->rt6i_src.addr, &cfg->fc_src, cfg->fc_src_len);
rt->rt6i_src.plen = cfg->fc_src_len;
@@ -1672,31 +1667,13 @@ int ip6_route_add(struct fib6_config *cfg)
rt->rt6i_flags = cfg->fc_flags;
install_route:
- if (cfg->fc_mx) {
- struct nlattr *nla;
- int remaining;
-
- nla_for_each_attr(nla, cfg->fc_mx, cfg->fc_mx_len, remaining) {
- int type = nla_type(nla);
-
- if (type) {
- if (type > RTAX_MAX) {
- err = -EINVAL;
- goto out;
- }
-
- dst_metric_set(&rt->dst, type, nla_get_u32(nla));
- }
- }
- }
-
rt->dst.dev = dev;
rt->rt6i_idev = idev;
rt->rt6i_table = table;
cfg->fc_nlinfo.nl_net = dev_net(dev);
- return __ip6_ins_rt(rt, &cfg->fc_nlinfo);
+ return __ip6_ins_rt(rt, &cfg->fc_nlinfo, cfg->fc_mx, cfg->fc_mx_len);
out:
if (dev)
--
1.8.1.4
^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: [PATCH net-next v3] ipv6: do not overwrite inetpeer metrics prematurely
2014-03-26 15:42 ` [PATCH net-next v3] " Michal Kubecek
@ 2014-03-26 15:50 ` Hannes Frederic Sowa
2014-03-26 15:56 ` Michal Kubecek
2014-03-26 16:05 ` [PATCH net-next v4] " Michal Kubecek
0 siblings, 2 replies; 31+ messages in thread
From: Hannes Frederic Sowa @ 2014-03-26 15:50 UTC (permalink / raw)
To: Michal Kubecek
Cc: David S. Miller, netdev, Alexey Kuznetsov, James Morris,
Hideaki YOSHIFUJI, Patrick McHardy
On Wed, Mar 26, 2014 at 04:42:45PM +0100, Michal Kubecek wrote:
> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> index fba54a4..b6dda6a 100644
> --- a/net/ipv6/route.c
> +++ b/net/ipv6/route.c
> @@ -149,8 +149,10 @@ static u32 *ipv6_cow_metrics(struct dst_entry *dst, unsigned long old)
> unsigned long prev, new;
>
> p = peer->metrics;
> - if (inet_metrics_new(peer))
> + if (inet_metrics_new(peer) || (dst->flags | DST_KEEP_METRICS)) {
> memcpy(p, old_p, sizeof(u32) * RTAX_MAX);
> + dst->flags &= ~DST_KEEP_METRICS;
> + }
The (dst->flags | DST_KEEP_METRICS) looks very suspicious. ;)
Bye,
Hannes
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH net-next v3] ipv6: do not overwrite inetpeer metrics prematurely
2014-03-26 15:50 ` Hannes Frederic Sowa
@ 2014-03-26 15:56 ` Michal Kubecek
2014-03-26 16:05 ` [PATCH net-next v4] " Michal Kubecek
1 sibling, 0 replies; 31+ messages in thread
From: Michal Kubecek @ 2014-03-26 15:56 UTC (permalink / raw)
To: Hannes Frederic Sowa
Cc: David Miller, netdev, kuznet, jmorris, yoshfuji, kaber
On Wed, Mar 26, 2014 at 04:50:47PM +0100, Hannes Frederic Sowa wrote:
> On Wed, Mar 26, 2014 at 04:42:45PM +0100, Michal Kubecek wrote:
> > diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> > index fba54a4..b6dda6a 100644
> > --- a/net/ipv6/route.c
> > +++ b/net/ipv6/route.c
> > @@ -149,8 +149,10 @@ static u32 *ipv6_cow_metrics(struct dst_entry *dst, unsigned long old)
> > unsigned long prev, new;
> >
> > p = peer->metrics;
> > - if (inet_metrics_new(peer))
> > + if (inet_metrics_new(peer) || (dst->flags | DST_KEEP_METRICS)) {
> > memcpy(p, old_p, sizeof(u32) * RTAX_MAX);
> > + dst->flags &= ~DST_KEEP_METRICS;
> > + }
>
> The (dst->flags | DST_KEEP_METRICS) looks very suspicious. ;)
Oops. I already fixed this typo... and then did the submission from a
different branch. :-(
Michal Kubecek
^ permalink raw reply [flat|nested] 31+ messages in thread
* [PATCH net-next v4] ipv6: do not overwrite inetpeer metrics prematurely
2014-03-26 15:50 ` Hannes Frederic Sowa
2014-03-26 15:56 ` Michal Kubecek
@ 2014-03-26 16:05 ` Michal Kubecek
2014-03-27 5:06 ` Hannes Frederic Sowa
1 sibling, 1 reply; 31+ messages in thread
From: Michal Kubecek @ 2014-03-26 16:05 UTC (permalink / raw)
To: David S. Miller
Cc: Hannes Frederic Sowa, netdev, Alexey Kuznetsov, James Morris,
Hideaki YOSHIFUJI, Patrick McHardy
If an IPv6 host route with metrics exists, an attempt to add a
new route for the same target with different metrics fails but
rewrites the metrics anyway:
12sp0:~ # ip route add fec0::1 dev eth0 rto_min 1000
12sp0:~ # ip -6 route show
fe80::/64 dev eth0 proto kernel metric 256
fec0::1 dev eth0 metric 1024 rto_min lock 1s
12sp0:~ # ip route add fec0::1 dev eth0 rto_min 1500
RTNETLINK answers: File exists
12sp0:~ # ip -6 route show
fe80::/64 dev eth0 proto kernel metric 256
fec0::1 dev eth0 metric 1024 rto_min lock 1.5s
This is caused by all IPv6 host routes using the metrics in
their inetpeer (or the shared default). This also holds for the
new route created in ip6_route_add() which shares the metrics
with the already existing route and thus ip6_route_add()
rewrites the metrics even if the new route ends up not being
used at all.
Another problem is that old metrics in inetpeer can reappear
unexpectedly for a new route, e.g.
12sp0:~ # ip route add fec0::1 dev eth0 rto_min 1000
12sp0:~ # ip route del fec0::1
12sp0:~ # ip route add fec0::1 dev eth0
12sp0:~ # ip route change fec0::1 dev eth0 hoplimit 10
12sp0:~ # ip -6 route show
fe80::/64 dev eth0 proto kernel metric 256
fec0::1 dev eth0 metric 1024 hoplimit 10 rto_min lock 1s
Resolve the first problem by moving the setting of metrics down
into fib6_add_rt2node() to the point we are sure we are
inserting the new route into the tree. Second problem is
addressed by introducing new flag DST_KEEP_METRICS which is set
for a new host route in ip6_route_add() and makes
ipv6_cow_metrics() always overwrite the metrics in inetpeer
(even if they are not "new"); it is reset after that.
v4: fix a typo making a condition always true (thanks to Hannes
Frederic Sowa)
v3: rewritten based on David Miller's idea to move setting the
metrics (and allocation in non-host case) down to the point we
already know the route is to be inserted. Also rebased to
net-next as it is quite late in the cycle.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
---
include/net/dst.h | 1 +
include/net/ip6_fib.h | 3 ++-
net/ipv6/ip6_fib.c | 47 ++++++++++++++++++++++++++++++++++++++++++++---
net/ipv6/route.c | 41 +++++++++--------------------------------
4 files changed, 56 insertions(+), 36 deletions(-)
diff --git a/include/net/dst.h b/include/net/dst.h
index e01a826..8cf6772 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -57,6 +57,7 @@ struct dst_entry {
#define DST_FAKE_RTABLE 0x0040
#define DST_XFRM_TUNNEL 0x0080
#define DST_XFRM_QUEUE 0x0100
+#define DST_KEEP_METRICS 0x0400
unsigned short pending_confirm;
diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h
index aca0c27..9bcb220 100644
--- a/include/net/ip6_fib.h
+++ b/include/net/ip6_fib.h
@@ -284,7 +284,8 @@ struct fib6_node *fib6_locate(struct fib6_node *root,
void fib6_clean_all(struct net *net, int (*func)(struct rt6_info *, void *arg),
void *arg);
-int fib6_add(struct fib6_node *root, struct rt6_info *rt, struct nl_info *info);
+int fib6_add(struct fib6_node *root, struct rt6_info *rt, struct nl_info *info,
+ struct nlattr *mx, int mx_len);
int fib6_del(struct rt6_info *rt, struct nl_info *info);
diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index 075602f..4ee487b 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -638,12 +638,41 @@ static inline bool rt6_qualify_for_ecmp(struct rt6_info *rt)
RTF_GATEWAY;
}
+static int fib6_commit_metrics(struct dst_entry *dst,
+ struct nlattr *mx, int mx_len)
+{
+ struct nlattr *nla;
+ int remaining;
+ u32 *mp;
+
+ if (dst->flags & DST_HOST) {
+ mp = dst_metrics_write_ptr(dst);
+ } else {
+ mp = kzalloc(sizeof(u32) * RTAX_MAX, GFP_KERNEL);
+ if (!mp)
+ return -ENOMEM;
+ dst_init_metrics(dst, mp, 0);
+ }
+
+ nla_for_each_attr(nla, mx, mx_len, remaining) {
+ int type = nla_type(nla);
+
+ if (type) {
+ if (type > RTAX_MAX)
+ return -EINVAL;
+
+ mp[type - 1] = nla_get_u32(nla);
+ }
+ }
+ return 0;
+}
+
/*
* Insert routing information in a node.
*/
static int fib6_add_rt2node(struct fib6_node *fn, struct rt6_info *rt,
- struct nl_info *info)
+ struct nl_info *info, struct nlattr *mx, int mx_len)
{
struct rt6_info *iter = NULL;
struct rt6_info **ins;
@@ -653,6 +682,7 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct rt6_info *rt,
(info->nlh->nlmsg_flags & NLM_F_CREATE));
int found = 0;
bool rt_can_ecmp = rt6_qualify_for_ecmp(rt);
+ int err;
ins = &fn->leaf;
@@ -751,6 +781,11 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct rt6_info *rt,
pr_warn("NLM_F_CREATE should be set when creating new route\n");
add:
+ if (mx) {
+ err = fib6_commit_metrics(&rt->dst, mx, mx_len);
+ if (err)
+ return err;
+ }
rt->dst.rt6_next = iter;
*ins = rt;
rt->rt6i_node = fn;
@@ -770,6 +805,11 @@ add:
pr_warn("NLM_F_REPLACE set, but no existing node found!\n");
return -ENOENT;
}
+ if (mx) {
+ err = fib6_commit_metrics(&rt->dst, mx, mx_len);
+ if (err)
+ return err;
+ }
*ins = rt;
rt->rt6i_node = fn;
rt->dst.rt6_next = iter->dst.rt6_next;
@@ -806,7 +846,8 @@ void fib6_force_start_gc(struct net *net)
* with source addr info in sub-trees
*/
-int fib6_add(struct fib6_node *root, struct rt6_info *rt, struct nl_info *info)
+int fib6_add(struct fib6_node *root, struct rt6_info *rt, struct nl_info *info,
+ struct nlattr *mx, int mx_len)
{
struct fib6_node *fn, *pn = NULL;
int err = -ENOMEM;
@@ -900,7 +941,7 @@ int fib6_add(struct fib6_node *root, struct rt6_info *rt, struct nl_info *info)
}
#endif
- err = fib6_add_rt2node(fn, rt, info);
+ err = fib6_add_rt2node(fn, rt, info, mx, mx_len);
if (!err) {
fib6_start_gc(info->nl_net, rt);
if (!(rt->rt6i_flags & RTF_CACHE))
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index fba54a4..4e5b19e 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -149,8 +149,10 @@ static u32 *ipv6_cow_metrics(struct dst_entry *dst, unsigned long old)
unsigned long prev, new;
p = peer->metrics;
- if (inet_metrics_new(peer))
+ if (inet_metrics_new(peer) || (dst->flags & DST_KEEP_METRICS)) {
memcpy(p, old_p, sizeof(u32) * RTAX_MAX);
+ dst->flags &= ~DST_KEEP_METRICS;
+ }
new = (unsigned long) p;
prev = cmpxchg(&dst->_metrics, old, new);
@@ -857,14 +859,15 @@ EXPORT_SYMBOL(rt6_lookup);
be destroyed.
*/
-static int __ip6_ins_rt(struct rt6_info *rt, struct nl_info *info)
+static int __ip6_ins_rt(struct rt6_info *rt, struct nl_info *info,
+ struct nlattr *mx, int mx_len)
{
int err;
struct fib6_table *table;
table = rt->rt6i_table;
write_lock_bh(&table->tb6_lock);
- err = fib6_add(&table->tb6_root, rt, info);
+ err = fib6_add(&table->tb6_root, rt, info, mx, mx_len);
write_unlock_bh(&table->tb6_lock);
return err;
@@ -875,7 +878,7 @@ int ip6_ins_rt(struct rt6_info *rt)
struct nl_info info = {
.nl_net = dev_net(rt->dst.dev),
};
- return __ip6_ins_rt(rt, &info);
+ return __ip6_ins_rt(rt, &info, NULL, 0);
}
static struct rt6_info *rt6_alloc_cow(struct rt6_info *ort,
@@ -1544,16 +1547,8 @@ int ip6_route_add(struct fib6_config *cfg)
ipv6_addr_prefix(&rt->rt6i_dst.addr, &cfg->fc_dst, cfg->fc_dst_len);
rt->rt6i_dst.plen = cfg->fc_dst_len;
if (rt->rt6i_dst.plen == 128)
- rt->dst.flags |= DST_HOST;
+ rt->dst.flags |= DST_HOST | DST_KEEP_METRICS;
- if (!(rt->dst.flags & DST_HOST) && cfg->fc_mx) {
- u32 *metrics = kzalloc(sizeof(u32) * RTAX_MAX, GFP_KERNEL);
- if (!metrics) {
- err = -ENOMEM;
- goto out;
- }
- dst_init_metrics(&rt->dst, metrics, 0);
- }
#ifdef CONFIG_IPV6_SUBTREES
ipv6_addr_prefix(&rt->rt6i_src.addr, &cfg->fc_src, cfg->fc_src_len);
rt->rt6i_src.plen = cfg->fc_src_len;
@@ -1672,31 +1667,13 @@ int ip6_route_add(struct fib6_config *cfg)
rt->rt6i_flags = cfg->fc_flags;
install_route:
- if (cfg->fc_mx) {
- struct nlattr *nla;
- int remaining;
-
- nla_for_each_attr(nla, cfg->fc_mx, cfg->fc_mx_len, remaining) {
- int type = nla_type(nla);
-
- if (type) {
- if (type > RTAX_MAX) {
- err = -EINVAL;
- goto out;
- }
-
- dst_metric_set(&rt->dst, type, nla_get_u32(nla));
- }
- }
- }
-
rt->dst.dev = dev;
rt->rt6i_idev = idev;
rt->rt6i_table = table;
cfg->fc_nlinfo.nl_net = dev_net(dev);
- return __ip6_ins_rt(rt, &cfg->fc_nlinfo);
+ return __ip6_ins_rt(rt, &cfg->fc_nlinfo, cfg->fc_mx, cfg->fc_mx_len);
out:
if (dev)
--
1.8.1.4
^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: [PATCH net-next v4] ipv6: do not overwrite inetpeer metrics prematurely
2014-03-26 16:05 ` [PATCH net-next v4] " Michal Kubecek
@ 2014-03-27 5:06 ` Hannes Frederic Sowa
2014-03-27 7:43 ` Michal Kubecek
2014-03-27 12:04 ` [PATCH net-next v5] " Michal Kubecek
0 siblings, 2 replies; 31+ messages in thread
From: Hannes Frederic Sowa @ 2014-03-27 5:06 UTC (permalink / raw)
To: Michal Kubecek
Cc: David S. Miller, netdev, Alexey Kuznetsov, James Morris,
Hideaki YOSHIFUJI, Patrick McHardy
On Wed, Mar 26, 2014 at 05:05:03PM +0100, Michal Kubecek wrote:
> If an IPv6 host route with metrics exists, an attempt to add a
> new route for the same target with different metrics fails but
> rewrites the metrics anyway:
>
> 12sp0:~ # ip route add fec0::1 dev eth0 rto_min 1000
> 12sp0:~ # ip -6 route show
> fe80::/64 dev eth0 proto kernel metric 256
> fec0::1 dev eth0 metric 1024 rto_min lock 1s
> 12sp0:~ # ip route add fec0::1 dev eth0 rto_min 1500
> RTNETLINK answers: File exists
> 12sp0:~ # ip -6 route show
> fe80::/64 dev eth0 proto kernel metric 256
> fec0::1 dev eth0 metric 1024 rto_min lock 1.5s
>
> This is caused by all IPv6 host routes using the metrics in
> their inetpeer (or the shared default). This also holds for the
> new route created in ip6_route_add() which shares the metrics
> with the already existing route and thus ip6_route_add()
> rewrites the metrics even if the new route ends up not being
> used at all.
>
> Another problem is that old metrics in inetpeer can reappear
> unexpectedly for a new route, e.g.
>
> 12sp0:~ # ip route add fec0::1 dev eth0 rto_min 1000
> 12sp0:~ # ip route del fec0::1
> 12sp0:~ # ip route add fec0::1 dev eth0
> 12sp0:~ # ip route change fec0::1 dev eth0 hoplimit 10
> 12sp0:~ # ip -6 route show
> fe80::/64 dev eth0 proto kernel metric 256
> fec0::1 dev eth0 metric 1024 hoplimit 10 rto_min lock 1s
>
> Resolve the first problem by moving the setting of metrics down
> into fib6_add_rt2node() to the point we are sure we are
> inserting the new route into the tree. Second problem is
> addressed by introducing new flag DST_KEEP_METRICS which is set
> for a new host route in ip6_route_add() and makes
> ipv6_cow_metrics() always overwrite the metrics in inetpeer
> (even if they are not "new"); it is reset after that.
>
> v4: fix a typo making a condition always true (thanks to Hannes
> Frederic Sowa)
>
> v3: rewritten based on David Miller's idea to move setting the
> metrics (and allocation in non-host case) down to the point we
> already know the route is to be inserted. Also rebased to
> net-next as it is quite late in the cycle.
>
> Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
> ---
> include/net/dst.h | 1 +
> include/net/ip6_fib.h | 3 ++-
> net/ipv6/ip6_fib.c | 47 ++++++++++++++++++++++++++++++++++++++++++++---
> net/ipv6/route.c | 41 +++++++++--------------------------------
> 4 files changed, 56 insertions(+), 36 deletions(-)
>
> diff --git a/include/net/dst.h b/include/net/dst.h
> index e01a826..8cf6772 100644
> --- a/include/net/dst.h
> +++ b/include/net/dst.h
> @@ -57,6 +57,7 @@ struct dst_entry {
> #define DST_FAKE_RTABLE 0x0040
> #define DST_XFRM_TUNNEL 0x0080
> #define DST_XFRM_QUEUE 0x0100
> +#define DST_KEEP_METRICS 0x0400
Minor nit: 0x0200
You jumped over one bit.
> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> index fba54a4..4e5b19e 100644
> --- a/net/ipv6/route.c
> +++ b/net/ipv6/route.c
> @@ -149,8 +149,10 @@ static u32 *ipv6_cow_metrics(struct dst_entry *dst, unsigned long old)
> unsigned long prev, new;
>
> p = peer->metrics;
> - if (inet_metrics_new(peer))
> + if (inet_metrics_new(peer) || (dst->flags & DST_KEEP_METRICS)) {
> memcpy(p, old_p, sizeof(u32) * RTAX_MAX);
> + dst->flags &= ~DST_KEEP_METRICS;
> + }
>
I wonder if we can make this more concurrency friendly:
The idea that came to my mind would be to store KEEP_METRICS flag in the
metrics pointer and introduce new DST_METRICS_FORCE_OVERWRITE flag like
DST_METRICS_READ_ONLY.
__DST_METRICS_PTR would have to mask it out, too (so would the check in
dst_metrics_write_ptr).
Then cmpxchg would atomically change status of KEEP_METRICS flag, too.
In fib6_commit_metrics we could then add the flag somehow if dst is
DST_HOST and mx != NULL.
Just a thought, depends if this feature would be useful for other
users, too.
Greetings,
Hannes
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH net-next v4] ipv6: do not overwrite inetpeer metrics prematurely
2014-03-27 5:06 ` Hannes Frederic Sowa
@ 2014-03-27 7:43 ` Michal Kubecek
2014-03-27 12:04 ` [PATCH net-next v5] " Michal Kubecek
1 sibling, 0 replies; 31+ messages in thread
From: Michal Kubecek @ 2014-03-27 7:43 UTC (permalink / raw)
To: Hannes Frederic Sowa
Cc: David S. Miller, netdev, Alexey Kuznetsov, James Morris,
Hideaki YOSHIFUJI, Patrick McHardy
On Thursday 27 of March 2014 06:06:18 Hannes Frederic Sowa wrote:
> On Wed, Mar 26, 2014 at 05:05:03PM +0100, Michal Kubecek wrote:
> > diff --git a/include/net/dst.h b/include/net/dst.h
> > index e01a826..8cf6772 100644
> > --- a/include/net/dst.h
> > +++ b/include/net/dst.h
> > @@ -57,6 +57,7 @@ struct dst_entry {
> >
> > #define DST_FAKE_RTABLE 0x0040
> > #define DST_XFRM_TUNNEL 0x0080
> > #define DST_XFRM_QUEUE 0x0100
> >
> > +#define DST_KEEP_METRICS 0x0400
>
> Minor nit: 0x0200
>
> You jumped over one bit.
Oh yes. This is because of DST_NOPEER which is still in net but was
removed in net-next.
> > diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> > index fba54a4..4e5b19e 100644
> > --- a/net/ipv6/route.c
> > +++ b/net/ipv6/route.c
> > @@ -149,8 +149,10 @@ static u32 *ipv6_cow_metrics(struct dst_entry
> > *dst, unsigned long old)>
> > unsigned long prev, new;
> >
> > p = peer->metrics;
> >
> > - if (inet_metrics_new(peer))
> > + if (inet_metrics_new(peer) || (dst->flags & DST_KEEP_METRICS)) {
> >
> > memcpy(p, old_p, sizeof(u32) * RTAX_MAX);
> >
> > + dst->flags &= ~DST_KEEP_METRICS;
> > + }
>
> I wonder if we can make this more concurrency friendly:
>
> The idea that came to my mind would be to store KEEP_METRICS flag in
> the metrics pointer and introduce new DST_METRICS_FORCE_OVERWRITE
> flag like DST_METRICS_READ_ONLY.
I was also considering this as logically the flag belongs here more
than into the dst_entry flags. For some reason I was afraid it would
be less efficient but now thinking about it again, I don't see why it
should be.
> __DST_METRICS_PTR would have to mask it out, too (so would the check
> in dst_metrics_write_ptr).
I believe the test in dst_metrics_write_ptr() should stay the way it is.
We want to call cow_metrics() whenever the metrics are read only and
consider DST_METRICS_FORCE_OVERWRITE only in ipv6_cow_metrics(). But I
better check again.
> Then cmpxchg would atomically change status of KEEP_METRICS flag, too.
That would be useful. However, I'm more concerned about
> In fib6_commit_metrics we could then add the flag somehow if dst is
> DST_HOST and mx != NULL.
IMHO it should do so even if mx is null. Otherwise we could end up with
a dst using read only defaults but having old metrics hidden in inetpeer
which would get revived once cow_metrics() is called for any reason.
I'll prepare a patch and test it.
Michal Kubecek
^ permalink raw reply [flat|nested] 31+ messages in thread
* [PATCH net-next v5] ipv6: do not overwrite inetpeer metrics prematurely
2014-03-27 5:06 ` Hannes Frederic Sowa
2014-03-27 7:43 ` Michal Kubecek
@ 2014-03-27 12:04 ` Michal Kubecek
2014-03-27 16:30 ` Hannes Frederic Sowa
1 sibling, 1 reply; 31+ messages in thread
From: Michal Kubecek @ 2014-03-27 12:04 UTC (permalink / raw)
To: David S. Miller
Cc: Hannes Frederic Sowa, netdev, Alexey Kuznetsov, James Morris,
Hideaki YOSHIFUJI, Patrick McHardy
If an IPv6 host route with metrics exists, an attempt to add a
new route for the same target with different metrics fails but
rewrites the metrics anyway:
12sp0:~ # ip route add fec0::1 dev eth0 rto_min 1000
12sp0:~ # ip -6 route show
fe80::/64 dev eth0 proto kernel metric 256
fec0::1 dev eth0 metric 1024 rto_min lock 1s
12sp0:~ # ip route add fec0::1 dev eth0 rto_min 1500
RTNETLINK answers: File exists
12sp0:~ # ip -6 route show
fe80::/64 dev eth0 proto kernel metric 256
fec0::1 dev eth0 metric 1024 rto_min lock 1.5s
This is caused by all IPv6 host routes using the metrics in
their inetpeer (or the shared default). This also holds for the
new route created in ip6_route_add() which shares the metrics
with the already existing route and thus ip6_route_add()
rewrites the metrics even if the new route ends up not being
used at all.
Another problem is that old metrics in inetpeer can reappear
unexpectedly for a new route, e.g.
12sp0:~ # ip route add fec0::1 dev eth0 rto_min 1000
12sp0:~ # ip route del fec0::1
12sp0:~ # ip route add fec0::1 dev eth0
12sp0:~ # ip route change fec0::1 dev eth0 hoplimit 10
12sp0:~ # ip -6 route show
fe80::/64 dev eth0 proto kernel metric 256
fec0::1 dev eth0 metric 1024 hoplimit 10 rto_min lock 1s
Resolve the first problem by moving the setting of metrics down
into fib6_add_rt2node() to the point we are sure we are
inserting the new route into the tree. Second problem is
addressed by introducing new flag DST_METRICS_FORCE_OVERWRITE
which is set for a new host route in ip6_route_add() and makes
ipv6_cow_metrics() always overwrite the metrics in inetpeer
(even if they are not "new"); it is reset after that.
v5: use a flag in _metrics member rather than one in flags
v4: fix a typo making a condition always true (thanks to Hannes
Frederic Sowa)
v3: rewritten based on David Miller's idea to move setting the
metrics (and allocation in non-host case) down to the point we
already know the route is to be inserted. Also rebased to
net-next as it is quite late in the cycle.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
---
include/net/dst.h | 11 +++++++++--
include/net/ip6_fib.h | 3 ++-
net/ipv6/ip6_fib.c | 47 ++++++++++++++++++++++++++++++++++++++++++++---
net/ipv6/route.c | 44 +++++++++++---------------------------------
4 files changed, 66 insertions(+), 39 deletions(-)
diff --git a/include/net/dst.h b/include/net/dst.h
index e01a826..46ed958 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -108,9 +108,11 @@ struct dst_entry {
u32 *dst_cow_metrics_generic(struct dst_entry *dst, unsigned long old);
extern const u32 dst_default_metrics[];
-#define DST_METRICS_READ_ONLY 0x1UL
+#define DST_METRICS_READ_ONLY 0x1UL
+#define DST_METRICS_FORCE_OVERWRITE 0x2UL
+#define DST_METRICS_FLAGS 0x3UL
#define __DST_METRICS_PTR(Y) \
- ((u32 *)((Y) & ~DST_METRICS_READ_ONLY))
+ ((u32 *)((Y) & ~DST_METRICS_FLAGS))
#define DST_METRICS_PTR(X) __DST_METRICS_PTR((X)->_metrics)
static inline bool dst_metrics_read_only(const struct dst_entry *dst)
@@ -118,6 +120,11 @@ static inline bool dst_metrics_read_only(const struct dst_entry *dst)
return dst->_metrics & DST_METRICS_READ_ONLY;
}
+static inline void dst_metrics_set_force_overwrite(struct dst_entry *dst)
+{
+ dst->_metrics |= DST_METRICS_FORCE_OVERWRITE;
+}
+
void __dst_destroy_metrics_generic(struct dst_entry *dst, unsigned long old);
static inline void dst_destroy_metrics_generic(struct dst_entry *dst)
diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h
index aca0c27..9bcb220 100644
--- a/include/net/ip6_fib.h
+++ b/include/net/ip6_fib.h
@@ -284,7 +284,8 @@ struct fib6_node *fib6_locate(struct fib6_node *root,
void fib6_clean_all(struct net *net, int (*func)(struct rt6_info *, void *arg),
void *arg);
-int fib6_add(struct fib6_node *root, struct rt6_info *rt, struct nl_info *info);
+int fib6_add(struct fib6_node *root, struct rt6_info *rt, struct nl_info *info,
+ struct nlattr *mx, int mx_len);
int fib6_del(struct rt6_info *rt, struct nl_info *info);
diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index 075602f..4ee487b 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -638,12 +638,41 @@ static inline bool rt6_qualify_for_ecmp(struct rt6_info *rt)
RTF_GATEWAY;
}
+static int fib6_commit_metrics(struct dst_entry *dst,
+ struct nlattr *mx, int mx_len)
+{
+ struct nlattr *nla;
+ int remaining;
+ u32 *mp;
+
+ if (dst->flags & DST_HOST) {
+ mp = dst_metrics_write_ptr(dst);
+ } else {
+ mp = kzalloc(sizeof(u32) * RTAX_MAX, GFP_KERNEL);
+ if (!mp)
+ return -ENOMEM;
+ dst_init_metrics(dst, mp, 0);
+ }
+
+ nla_for_each_attr(nla, mx, mx_len, remaining) {
+ int type = nla_type(nla);
+
+ if (type) {
+ if (type > RTAX_MAX)
+ return -EINVAL;
+
+ mp[type - 1] = nla_get_u32(nla);
+ }
+ }
+ return 0;
+}
+
/*
* Insert routing information in a node.
*/
static int fib6_add_rt2node(struct fib6_node *fn, struct rt6_info *rt,
- struct nl_info *info)
+ struct nl_info *info, struct nlattr *mx, int mx_len)
{
struct rt6_info *iter = NULL;
struct rt6_info **ins;
@@ -653,6 +682,7 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct rt6_info *rt,
(info->nlh->nlmsg_flags & NLM_F_CREATE));
int found = 0;
bool rt_can_ecmp = rt6_qualify_for_ecmp(rt);
+ int err;
ins = &fn->leaf;
@@ -751,6 +781,11 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct rt6_info *rt,
pr_warn("NLM_F_CREATE should be set when creating new route\n");
add:
+ if (mx) {
+ err = fib6_commit_metrics(&rt->dst, mx, mx_len);
+ if (err)
+ return err;
+ }
rt->dst.rt6_next = iter;
*ins = rt;
rt->rt6i_node = fn;
@@ -770,6 +805,11 @@ add:
pr_warn("NLM_F_REPLACE set, but no existing node found!\n");
return -ENOENT;
}
+ if (mx) {
+ err = fib6_commit_metrics(&rt->dst, mx, mx_len);
+ if (err)
+ return err;
+ }
*ins = rt;
rt->rt6i_node = fn;
rt->dst.rt6_next = iter->dst.rt6_next;
@@ -806,7 +846,8 @@ void fib6_force_start_gc(struct net *net)
* with source addr info in sub-trees
*/
-int fib6_add(struct fib6_node *root, struct rt6_info *rt, struct nl_info *info)
+int fib6_add(struct fib6_node *root, struct rt6_info *rt, struct nl_info *info,
+ struct nlattr *mx, int mx_len)
{
struct fib6_node *fn, *pn = NULL;
int err = -ENOMEM;
@@ -900,7 +941,7 @@ int fib6_add(struct fib6_node *root, struct rt6_info *rt, struct nl_info *info)
}
#endif
- err = fib6_add_rt2node(fn, rt, info);
+ err = fib6_add_rt2node(fn, rt, info, mx, mx_len);
if (!err) {
fib6_start_gc(info->nl_net, rt);
if (!(rt->rt6i_flags & RTF_CACHE))
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index fba54a4..b93ae6a 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -149,7 +149,8 @@ static u32 *ipv6_cow_metrics(struct dst_entry *dst, unsigned long old)
unsigned long prev, new;
p = peer->metrics;
- if (inet_metrics_new(peer))
+ if (inet_metrics_new(peer) ||
+ (old & DST_METRICS_FORCE_OVERWRITE))
memcpy(p, old_p, sizeof(u32) * RTAX_MAX);
new = (unsigned long) p;
@@ -857,14 +858,15 @@ EXPORT_SYMBOL(rt6_lookup);
be destroyed.
*/
-static int __ip6_ins_rt(struct rt6_info *rt, struct nl_info *info)
+static int __ip6_ins_rt(struct rt6_info *rt, struct nl_info *info,
+ struct nlattr *mx, int mx_len)
{
int err;
struct fib6_table *table;
table = rt->rt6i_table;
write_lock_bh(&table->tb6_lock);
- err = fib6_add(&table->tb6_root, rt, info);
+ err = fib6_add(&table->tb6_root, rt, info, mx, mx_len);
write_unlock_bh(&table->tb6_lock);
return err;
@@ -875,7 +877,7 @@ int ip6_ins_rt(struct rt6_info *rt)
struct nl_info info = {
.nl_net = dev_net(rt->dst.dev),
};
- return __ip6_ins_rt(rt, &info);
+ return __ip6_ins_rt(rt, &info, NULL, 0);
}
static struct rt6_info *rt6_alloc_cow(struct rt6_info *ort,
@@ -1543,17 +1545,11 @@ int ip6_route_add(struct fib6_config *cfg)
ipv6_addr_prefix(&rt->rt6i_dst.addr, &cfg->fc_dst, cfg->fc_dst_len);
rt->rt6i_dst.plen = cfg->fc_dst_len;
- if (rt->rt6i_dst.plen == 128)
- rt->dst.flags |= DST_HOST;
-
- if (!(rt->dst.flags & DST_HOST) && cfg->fc_mx) {
- u32 *metrics = kzalloc(sizeof(u32) * RTAX_MAX, GFP_KERNEL);
- if (!metrics) {
- err = -ENOMEM;
- goto out;
- }
- dst_init_metrics(&rt->dst, metrics, 0);
+ if (rt->rt6i_dst.plen == 128) {
+ rt->dst.flags |= DST_HOST;
+ dst_metrics_set_force_overwrite(&rt->dst);
}
+
#ifdef CONFIG_IPV6_SUBTREES
ipv6_addr_prefix(&rt->rt6i_src.addr, &cfg->fc_src, cfg->fc_src_len);
rt->rt6i_src.plen = cfg->fc_src_len;
@@ -1672,31 +1668,13 @@ int ip6_route_add(struct fib6_config *cfg)
rt->rt6i_flags = cfg->fc_flags;
install_route:
- if (cfg->fc_mx) {
- struct nlattr *nla;
- int remaining;
-
- nla_for_each_attr(nla, cfg->fc_mx, cfg->fc_mx_len, remaining) {
- int type = nla_type(nla);
-
- if (type) {
- if (type > RTAX_MAX) {
- err = -EINVAL;
- goto out;
- }
-
- dst_metric_set(&rt->dst, type, nla_get_u32(nla));
- }
- }
- }
-
rt->dst.dev = dev;
rt->rt6i_idev = idev;
rt->rt6i_table = table;
cfg->fc_nlinfo.nl_net = dev_net(dev);
- return __ip6_ins_rt(rt, &cfg->fc_nlinfo);
+ return __ip6_ins_rt(rt, &cfg->fc_nlinfo, cfg->fc_mx, cfg->fc_mx_len);
out:
if (dev)
--
1.8.1.4
^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: [PATCH net-next v5] ipv6: do not overwrite inetpeer metrics prematurely
2014-03-27 12:04 ` [PATCH net-next v5] " Michal Kubecek
@ 2014-03-27 16:30 ` Hannes Frederic Sowa
2014-03-27 19:09 ` David Miller
0 siblings, 1 reply; 31+ messages in thread
From: Hannes Frederic Sowa @ 2014-03-27 16:30 UTC (permalink / raw)
To: Michal Kubecek
Cc: David S. Miller, netdev, Alexey Kuznetsov, James Morris,
Hideaki YOSHIFUJI, Patrick McHardy
On Thu, Mar 27, 2014 at 01:04:08PM +0100, Michal Kubecek wrote:
> If an IPv6 host route with metrics exists, an attempt to add a
> new route for the same target with different metrics fails but
> rewrites the metrics anyway:
>
> 12sp0:~ # ip route add fec0::1 dev eth0 rto_min 1000
> 12sp0:~ # ip -6 route show
> fe80::/64 dev eth0 proto kernel metric 256
> fec0::1 dev eth0 metric 1024 rto_min lock 1s
> 12sp0:~ # ip route add fec0::1 dev eth0 rto_min 1500
> RTNETLINK answers: File exists
> 12sp0:~ # ip -6 route show
> fe80::/64 dev eth0 proto kernel metric 256
> fec0::1 dev eth0 metric 1024 rto_min lock 1.5s
>
> This is caused by all IPv6 host routes using the metrics in
> their inetpeer (or the shared default). This also holds for the
> new route created in ip6_route_add() which shares the metrics
> with the already existing route and thus ip6_route_add()
> rewrites the metrics even if the new route ends up not being
> used at all.
>
> Another problem is that old metrics in inetpeer can reappear
> unexpectedly for a new route, e.g.
>
> 12sp0:~ # ip route add fec0::1 dev eth0 rto_min 1000
> 12sp0:~ # ip route del fec0::1
> 12sp0:~ # ip route add fec0::1 dev eth0
> 12sp0:~ # ip route change fec0::1 dev eth0 hoplimit 10
> 12sp0:~ # ip -6 route show
> fe80::/64 dev eth0 proto kernel metric 256
> fec0::1 dev eth0 metric 1024 hoplimit 10 rto_min lock 1s
>
> Resolve the first problem by moving the setting of metrics down
> into fib6_add_rt2node() to the point we are sure we are
> inserting the new route into the tree. Second problem is
> addressed by introducing new flag DST_METRICS_FORCE_OVERWRITE
> which is set for a new host route in ip6_route_add() and makes
> ipv6_cow_metrics() always overwrite the metrics in inetpeer
> (even if they are not "new"); it is reset after that.
>
> v5: use a flag in _metrics member rather than one in flags
>
> v4: fix a typo making a condition always true (thanks to Hannes
> Frederic Sowa)
>
> v3: rewritten based on David Miller's idea to move setting the
> metrics (and allocation in non-host case) down to the point we
> already know the route is to be inserted. Also rebased to
> net-next as it is quite late in the cycle.
>
> Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
I like your patch! :)
Actually I was a bit surprised how easy it turned out.
Thanks!
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH net-next v5] ipv6: do not overwrite inetpeer metrics prematurely
2014-03-27 16:30 ` Hannes Frederic Sowa
@ 2014-03-27 19:09 ` David Miller
0 siblings, 0 replies; 31+ messages in thread
From: David Miller @ 2014-03-27 19:09 UTC (permalink / raw)
To: hannes; +Cc: mkubecek, netdev, kuznet, jmorris, yoshfuji, kaber
From: Hannes Frederic Sowa <hannes@stressinduktion.org>
Date: Thu, 27 Mar 2014 17:30:46 +0100
> On Thu, Mar 27, 2014 at 01:04:08PM +0100, Michal Kubecek wrote:
>> If an IPv6 host route with metrics exists, an attempt to add a
>> new route for the same target with different metrics fails but
>> rewrites the metrics anyway:
>>
>> 12sp0:~ # ip route add fec0::1 dev eth0 rto_min 1000
>> 12sp0:~ # ip -6 route show
>> fe80::/64 dev eth0 proto kernel metric 256
>> fec0::1 dev eth0 metric 1024 rto_min lock 1s
>> 12sp0:~ # ip route add fec0::1 dev eth0 rto_min 1500
>> RTNETLINK answers: File exists
>> 12sp0:~ # ip -6 route show
>> fe80::/64 dev eth0 proto kernel metric 256
>> fec0::1 dev eth0 metric 1024 rto_min lock 1.5s
>>
>> This is caused by all IPv6 host routes using the metrics in
>> their inetpeer (or the shared default). This also holds for the
>> new route created in ip6_route_add() which shares the metrics
>> with the already existing route and thus ip6_route_add()
>> rewrites the metrics even if the new route ends up not being
>> used at all.
>>
>> Another problem is that old metrics in inetpeer can reappear
>> unexpectedly for a new route, e.g.
>>
>> 12sp0:~ # ip route add fec0::1 dev eth0 rto_min 1000
>> 12sp0:~ # ip route del fec0::1
>> 12sp0:~ # ip route add fec0::1 dev eth0
>> 12sp0:~ # ip route change fec0::1 dev eth0 hoplimit 10
>> 12sp0:~ # ip -6 route show
>> fe80::/64 dev eth0 proto kernel metric 256
>> fec0::1 dev eth0 metric 1024 hoplimit 10 rto_min lock 1s
>>
>> Resolve the first problem by moving the setting of metrics down
>> into fib6_add_rt2node() to the point we are sure we are
>> inserting the new route into the tree. Second problem is
>> addressed by introducing new flag DST_METRICS_FORCE_OVERWRITE
>> which is set for a new host route in ip6_route_add() and makes
>> ipv6_cow_metrics() always overwrite the metrics in inetpeer
>> (even if they are not "new"); it is reset after that.
>>
>> v5: use a flag in _metrics member rather than one in flags
>>
>> v4: fix a typo making a condition always true (thanks to Hannes
>> Frederic Sowa)
>>
>> v3: rewritten based on David Miller's idea to move setting the
>> metrics (and allocation in non-host case) down to the point we
>> already know the route is to be inserted. Also rebased to
>> net-next as it is quite late in the cycle.
>>
>> Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
>
> I like your patch! :)
>
> Actually I was a bit surprised how easy it turned out.
>
> Thanks!
>
> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Yeah this looks fantastic, applied, thanks!
^ permalink raw reply [flat|nested] 31+ messages in thread
end of thread, other threads:[~2014-03-27 19:09 UTC | newest]
Thread overview: 31+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-03-06 9:50 [PATCH net] ipv6: do not overwrite inetpeer metrics prematurely Michal Kubecek
2014-03-06 19:24 ` David Miller
2014-03-06 20:06 ` Michal Kubecek
2014-03-07 12:36 ` [PATCH net v2] " Michal Kubecek
2014-03-07 20:52 ` [PATCH net] " David Miller
2014-03-07 21:38 ` Michal Kubecek
2014-03-08 8:34 ` Hannes Frederic Sowa
2014-03-08 8:06 ` Hannes Frederic Sowa
2014-03-10 0:26 ` David Miller
2014-03-10 0:52 ` Hannes Frederic Sowa
2014-03-10 5:03 ` David Miller
2014-03-10 8:15 ` Michal Kubecek
2014-03-10 12:00 ` Hannes Frederic Sowa
2014-03-10 13:15 ` Michal Kubecek
2014-03-11 2:38 ` David Miller
2014-03-11 9:53 ` Michal Kubecek
2014-03-11 15:08 ` Michal Kubecek
2014-03-11 15:20 ` Hannes Frederic Sowa
2014-03-11 15:39 ` Michal Kubecek
2014-03-25 19:11 ` Hannes Frederic Sowa
2014-03-26 15:09 ` Michal Kubecek
2014-03-26 15:42 ` [PATCH net-next v3] " Michal Kubecek
2014-03-26 15:50 ` Hannes Frederic Sowa
2014-03-26 15:56 ` Michal Kubecek
2014-03-26 16:05 ` [PATCH net-next v4] " Michal Kubecek
2014-03-27 5:06 ` Hannes Frederic Sowa
2014-03-27 7:43 ` Michal Kubecek
2014-03-27 12:04 ` [PATCH net-next v5] " Michal Kubecek
2014-03-27 16:30 ` Hannes Frederic Sowa
2014-03-27 19:09 ` David Miller
2014-03-12 20:54 ` [PATCH net] " David Miller
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).