ECMP ipv6 vs ipv4

public inbox for netdev@vger.kernel.org

* ECMP ipv6 vs ipv4
@ 2013-04-15  7:58 Wilco Baan Hofman
  2013-04-15 15:51 ` Nicolas Dichtel
  0 siblings, 1 reply; 7+ messages in thread
From: Wilco Baan Hofman @ 2013-04-15  7:58 UTC (permalink / raw)
  To: netdev

Hi,

I'm working on a patch to implement 'nexthop weight' for multipath ipv6.
However, the ECMPv6 implementation has a few flaws that are quite
annoying.

One of the flaws is that the netlink nexthop API is asymmetrical: you
can add nexthops through the netlink API, but when the result is
requested it comes back in a completely different form, so bird6 removes
the route because it does not match the route set it installed.

Another one of the flaws is that if I add nexthop weight or algorithm
(weighted hash or weighted random) I need to add this to the main rt
node, this seems like an inefficient memory structure, as this needs to
be added to all the siblings as well.

I propose that we attach a nexthop structure to an exclusive route,
similar to what we have for IPv4, where we store the gateway, device and
weight for all nexthops and the algorithm in the route. This would make
the netlink API symmetrical again and fix the n*n inefficiencies when
adding routes (all siblings need to know about all siblings).

What are your thoughts on this?

Regards,

Wilco Baan Hofman


* Re: ECMP ipv6 vs ipv4
  2013-04-15  7:58 ECMP ipv6 vs ipv4 Wilco Baan Hofman
@ 2013-04-15 15:51 ` Nicolas Dichtel
  2013-04-15 16:53   ` Wilco Baan Hofman
  0 siblings, 1 reply; 7+ messages in thread
From: Nicolas Dichtel @ 2013-04-15 15:51 UTC (permalink / raw)
  To: Wilco Baan Hofman; +Cc: netdev

On 15/04/2013 09:58, Wilco Baan Hofman wrote:
> Hi,
>
> I'm working on a patch to implement 'nexthop weight' for multipath ipv6.
> However, the ECMPv6 implementation has a few flaws that are quite
> annoying.
>
> One of the flaws is that the netlink nexthop API is asymmetrical, you
> can add nexthops through the netlink API, but when the result is
> requested it is completely different, resulting in bird6 removing the
> route as it does not match the initial route set.
In fact, there are two ways to add ECMP routes:
$ ip -6 route add 3ffe:304:124:2306::/64 \
	nexthop via fe80::230:1bff:feb4:e05c dev eth0 \
	nexthop via fe80::230:1bff:feb4:dd4f dev eth0
or
$ ip -6 route add 3ffe:304:124:2306::/64 via fe80::230:1bff:feb4:dd4f \
	dev eth0
$ ip -6 route append 3ffe:304:124:2306::/64 via fe80::230:1bff:feb4:e05c \
	dev eth0

Note that the second way matches what is returned by the kernel (i.e. one
entry per nexthop).
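
To illustrate the mismatch: a routing daemon that installs one multipath route but reads back one entry per nexthop has to compare the two as unordered sets before deciding the route is foreign. The sketch below is only an illustration under stated assumptions. The `nh_key` type and `nh_set_equal()` are hypothetical names, nexthops are represented as plain strings, and the installed set is assumed to contain no duplicates. It is not taken from bird or from the kernel.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical nexthop key: gateway address and device name as strings. */
struct nh_key {
	const char *gw;
	const char *dev;
};

static bool nh_key_equal(const struct nh_key *a, const struct nh_key *b)
{
	return strcmp(a->gw, b->gw) == 0 && strcmp(a->dev, b->dev) == 0;
}

/*
 * Return true when the nexthops the daemon installed ("want") and the
 * per-nexthop entries read back from the kernel ("got") describe the same
 * set, regardless of order. Assumes "want" holds no duplicate entries.
 */
bool nh_set_equal(const struct nh_key *want, size_t nwant,
		  const struct nh_key *got, size_t ngot)
{
	if (nwant != ngot)
		return false;

	for (size_t i = 0; i < nwant; i++) {
		bool found = false;

		for (size_t j = 0; j < ngot && !found; j++)
			found = nh_key_equal(&want[i], &got[j]);
		if (!found)
			return false;
	}
	return true;
}
```

With a comparison like this, the two kernel entries for fe80::230:1bff:feb4:dd4f and fe80::230:1bff:feb4:e05c would match a route installed with two `nexthop` clauses, in either order.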

>
> Another one of the flaws is that if I add nexthop weight or algorithm
> (weighted hash or weighted random) I need to add this to the main rt
> node, this seems like an inefficient memory structure, as this needs to
> be added to all the siblings as well.
Nexthop weight (rtnh->rtnh_hops) is not implemented.
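
For context, `rtnh_hops` is the 8-bit field of `struct rtnexthop` that carries the weight for IPv4 multipath routes. The iproute2 convention there is to store a user-visible weight `w` as `w - 1`, so a field value of 0 means weight 1. A minimal sketch of that convention follows; the helper names are hypothetical and nothing here implies IPv6 support.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers; only the "weight - 1" convention comes from iproute2. */
uint8_t weight_to_rtnh_hops(unsigned int weight)
{
	/* Weights 1..256 are the values that fit in the 8-bit field. */
	assert(weight >= 1 && weight <= 256);
	return (uint8_t)(weight - 1);
}

unsigned int rtnh_hops_to_weight(uint8_t hops)
{
	return (unsigned int)hops + 1;
}
```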

>
> I propose that we have a nexthop structure to an exclusive route,
> similar to what we have for IPv4, where we store the gateway, device and
> weight for all nexthops and the algorithm in the route. This would make
> the netlink API symmetrical again and fixes the n*n inefficiencies when
> adding routes (all siblings need to know about all siblings).
>
> What are your thoughts on this?
>
> Regards,
>
> Wilco Baan Hofman
>


* Re: ECMP ipv6 vs ipv4
  2013-04-15 15:51 ` Nicolas Dichtel
@ 2013-04-15 16:53   ` Wilco Baan Hofman
  2013-04-17  9:03     ` Nicolas Dichtel
  0 siblings, 1 reply; 7+ messages in thread
From: Wilco Baan Hofman @ 2013-04-15 16:53 UTC (permalink / raw)
  To: nicolas.dichtel; +Cc: netdev

On Mon, 2013-04-15 at 17:51 +0200, Nicolas Dichtel wrote:
> On 15/04/2013 09:58, Wilco Baan Hofman wrote:
> > Hi,
> >
> > I'm working on a patch to implement 'nexthop weight' for multipath ipv6.
> > However, the ECMPv6 implementation has a few flaws that are quite
> > annoying.
> >
> > One of the flaws is that the netlink nexthop API is asymmetrical, you
> > can add nexthops through the netlink API, but when the result is
> > requested it is completely different, resulting in bird6 removing the
> > route as it does not match the initial route set.
> In fact, there are two ways to add ECMP routes:
> $ ip -6 route add 3ffe:304:124:2306::/64 \
> 	nexthop via fe80::230:1bff:feb4:e05c dev eth0 \
> 	nexthop via fe80::230:1bff:feb4:dd4f dev eth0
> or
> $ ip -6 route add 3ffe:304:124:2306::/64 via fe80::230:1bff:feb4:dd4f \
> 	dev eth0
> $ ip -6 route append 3ffe:304:124:2306::/64 via fe80::230:1bff:feb4:e05c \
> 	dev eth0

> Note that the second way matches what is returned by the kernel (i.e. one
> entry per nexthop).

Sure, but how do we add nexthop weights and algorithm selection (hash,
random) to this API? I personally prefer to have the routing behaviour
of ipv4 and ipv6 to be as similar as possible, as the basics are the
same anyway.

> >
> > Another one of the flaws is that if I add nexthop weight or algorithm
> > (weighted hash or weighted random) I need to add this to the main rt
> > node, this seems like an inefficient memory structure, as this needs to
> > be added to all the siblings as well.
> Nexthop weight (rtnh->rtnh_hops) is not implemented.

Yes it is... in my tree, but I want to extend it to also include support
for algorithm for hash based, etc.. and to keep it as close to the
existing APIs as possible I think the nexthop structure makes the most
sense for this.

> >
> > I propose that we have a nexthop structure to an exclusive route,
> > similar to what we have for IPv4, where we store the gateway, device and
> > weight for all nexthops and the algorithm in the route. This would make
> > the netlink API symmetrical again and fixes the n*n inefficiencies when
> > adding routes (all siblings need to know about all siblings).
> >
> > What are your thoughts on this?
> >
This stands :)

-- Wilco


* Re: ECMP ipv6 vs ipv4
  2013-04-15 16:53   ` Wilco Baan Hofman
@ 2013-04-17  9:03     ` Nicolas Dichtel
  2013-04-17 13:16       ` Wilco Baan Hofman
  0 siblings, 1 reply; 7+ messages in thread
From: Nicolas Dichtel @ 2013-04-17  9:03 UTC (permalink / raw)
  To: Wilco Baan Hofman; +Cc: netdev

On 15/04/2013 18:53, Wilco Baan Hofman wrote:
> On Mon, 2013-04-15 at 17:51 +0200, Nicolas Dichtel wrote:
>> On 15/04/2013 09:58, Wilco Baan Hofman wrote:
>>> Hi,
>>>
>>> I'm working on a patch to implement 'nexthop weight' for multipath ipv6.
>>> However, the ECMPv6 implementation has a few flaws that are quite
>>> annoying.
>>>
>>> One of the flaws is that the netlink nexthop API is asymmetrical, you
>>> can add nexthops through the netlink API, but when the result is
>>> requested it is completely different, resulting in bird6 removing the
>>> route as it does not match the initial route set.
>> In fact, there are two ways to add ECMP routes:
>> $ ip -6 route add 3ffe:304:124:2306::/64 \
>> 	nexthop via fe80::230:1bff:feb4:e05c dev eth0 \
>> 	nexthop via fe80::230:1bff:feb4:dd4f dev eth0
>> or
>> $ ip -6 route add 3ffe:304:124:2306::/64 via fe80::230:1bff:feb4:dd4f \
>> 	dev eth0
>> $ ip -6 route append 3ffe:304:124:2306::/64 via fe80::230:1bff:feb4:e05c \
>> 	dev eth0
>
>> Note that the second way matches what is returned by the kernel (i.e. one
>> entry per nexthop).
>
> Sure, but how do we add nexthop weights and algorithm selection (hash,
> random) to this API? I personally prefer to have the routing behaviour
> of ipv4 and ipv6 to be as similar as possible, as the basics are the
> same anyway.
You can use something like this:

$ ip -6 route add 3ffe:304:124:2306::/64 dev eth0 \
	nexthop via fe80::230:1bff:feb4:dd4f weight 1
$ ip -6 route append 3ffe:304:124:2306::/64 dev eth0 \
	nexthop via fe80::230:1bff:feb4:e05c weight 2

>
>>>
>>> Another one of the flaws is that if I add nexthop weight or algorithm
>>> (weighted hash or weighted random) I need to add this to the main rt
>>> node, this seems like an inefficient memory structure, as this needs to
>>> be added to all the siblings as well.
>> Nexthop weight (rtnh->rtnh_hops) is not implemented.
>
> Yes it is... in my tree, but I want to extend it to also include support
> for algorithm for hash based, etc.. and to keep it as close to the
> existing APIs as possible I think the nexthop structure makes the most
> sense for this.
>
>>>
>>> I propose that we have a nexthop structure to an exclusive route,
>>> similar to what we have for IPv4, where we store the gateway, device and
>>> weight for all nexthops and the algorithm in the route. This would make
>>> the netlink API symmetrical again and fixes the n*n inefficiencies when
>>> adding routes (all siblings need to know about all siblings).
>>>
>>> What are your thoughts on this?
The advantage of the current implementation is that you can add or delete a
nexthop without removing the whole route. You don't need to list all nexthops
again each time you want to modify one.

Nicolas


* Re: ECMP ipv6 vs ipv4
  2013-04-17  9:03     ` Nicolas Dichtel
@ 2013-04-17 13:16       ` Wilco Baan Hofman
  2013-04-17 14:14         ` Nicolas Dichtel
  0 siblings, 1 reply; 7+ messages in thread
From: Wilco Baan Hofman @ 2013-04-17 13:16 UTC (permalink / raw)
  To: nicolas.dichtel; +Cc: netdev

On Wed, 2013-04-17 at 11:03 +0200, Nicolas Dichtel wrote:

> > Sure, but how do we add nexthop weights and algorithm selection (hash,
> > random) to this API? I personally prefer to have the routing behaviour
> > of ipv4 and ipv6 to be as similar as possible, as the basics are the
> > same anyway.
> You can use something like this:
> 
> $ ip -6 route add 3ffe:304:124:2306::/64 dev eth0 \
> 	nexthop via fe80::230:1bff:feb4:dd4f weight 1
> $ ip -6 route append 3ffe:304:124:2306::/64 dev eth0 \
> 	nexthop via fe80::230:1bff:feb4:e05c weight 2
> 
> >
> >>>
> >>> Another one of the flaws is that if I add nexthop weight or algorithm
> >>> (weighted hash or weighted random) I need to add this to the main rt
> >>> node, this seems like an inefficient memory structure, as this needs to
> >>> be added to all the siblings as well.
> >> Nexthop weight (rtnh->rtnh_hops) is not implemented.
> >
> > Yes it is... in my tree, but I want to extend it to also include support
> > for algorithm for hash based, etc.. and to keep it as close to the
> > existing APIs as possible I think the nexthop structure makes the most
> > sense for this.
> >
> >>>
> >>> I propose that we have a nexthop structure to an exclusive route,
> >>> similar to what we have for IPv4, where we store the gateway, device and
> >>> weight for all nexthops and the algorithm in the route. This would make
> >>> the netlink API symmetrical again and fixes the n*n inefficiencies when
> >>> adding routes (all siblings need to know about all siblings).
> >>>
> >>> What are your thoughts on this?
> The advantage of the current implementation is that you can add or delete a
> nexthop without removing the whole route. You don't need to list all nexthops
> again each time you want to modify one.

That would also be possible using ip -6 route change. It would be more
efficient for insertions and more consistent with the IPv4
implementation. Remember that in routing protocol suites most code is
in fact shared between the IPv4 and IPv6 implementations.

For bird it would be much more convenient to have the same API work for
both as the code is shared (with minor differences). 

A memory structure like the one below would make sense, and it can be
extended as well:

struct ip6_nexthop {
	int               flags; /* algorithm per packet or hash, etc */
	struct list_head  *hops; /* nh_via */
};
struct ip6_nh {
	int              ifindex;
	struct in6_addr  rt6i_gateway;
	char             weight;
	int              flags; /* pervasive, onlink */
};
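
As an illustration of what "weighted hash" selection over such a nexthop list could look like, here is a minimal sketch. It is not the patch under discussion: `nh_entry` and `nh_select()` are hypothetical names, the list is flattened into an array for brevity, and the flow hash is assumed to be computed elsewhere. Each nexthop owns a number of slots equal to its weight, and the hash picks a slot.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-nexthop entry, loosely modelled on the ip6_nh sketch above. */
struct nh_entry {
	int     ifindex;
	uint8_t weight;	/* relative share of flows, assumed to be >= 1 */
};

/*
 * Map a flow hash onto a nexthop index so that each nexthop is chosen in
 * proportion to its weight. The same hash always yields the same index,
 * which keeps a flow on one path as long as the nexthop set is unchanged.
 */
size_t nh_select(const struct nh_entry *nh, size_t n, uint32_t flow_hash)
{
	uint32_t total = 0;
	uint32_t slot;
	size_t i;

	assert(n > 0);
	for (i = 0; i < n; i++)
		total += nh[i].weight;
	assert(total > 0);

	/* Walk the weight ranges until the slot falls inside one of them. */
	slot = flow_hash % total;
	for (i = 0; i < n; i++) {
		if (slot < nh[i].weight)
			return i;
		slot -= nh[i].weight;
	}
	return n - 1;	/* not reached when total > 0 */
}
```

With weights 1 and 2, one hash value out of every three consecutive ones maps to the first nexthop and two map to the second. A per-packet (weighted random) algorithm could reuse the same walk with a random number in place of the flow hash.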

I'm not sure how to make this map correctly to the append API.. I think
we need to make sure that all APIs either are consistent and symmetrical
or don't work from day 1.

I am willing to implement this, including algorithm support using the
netlink nexthop API, like the IPv4 implementation.. or change the IPv4
implementation, but either way I feel they need to be consistent.


Regards,

Wilco


* Re: ECMP ipv6 vs ipv4
  2013-04-17 13:16       ` Wilco Baan Hofman
@ 2013-04-17 14:14         ` Nicolas Dichtel
  2013-04-17 15:22           ` Wilco Baan Hofman
  0 siblings, 1 reply; 7+ messages in thread
From: Nicolas Dichtel @ 2013-04-17 14:14 UTC (permalink / raw)
  To: Wilco Baan Hofman; +Cc: netdev

On 17/04/2013 15:16, Wilco Baan Hofman wrote:
> On Wed, 2013-04-17 at 11:03 +0200, Nicolas Dichtel wrote:
>
>>> Sure, but how do we add nexthop weights and algorithm selection (hash,
>>> random) to this API? I personally prefer to have the routing behaviour
>>> of ipv4 and ipv6 to be as similar as possible, as the basics are the
>>> same anyway.
>> You can use something like this:
>>
>> $ ip -6 route add 3ffe:304:124:2306::/64 dev eth0 \
>> 	nexthop via fe80::230:1bff:feb4:dd4f weight 1
>> $ ip -6 route append 3ffe:304:124:2306::/64 dev eth0 \
>> 	nexthop via fe80::230:1bff:feb4:e05c weight 2
>>
>>>
>>>>>
>>>>> Another one of the flaws is that if I add nexthop weight or algorithm
>>>>> (weighted hash or weighted random) I need to add this to the main rt
>>>>> node, this seems like an inefficient memory structure, as this needs to
>>>>> be added to all the siblings as well.
>>>> Nexthop weight (rtnh->rtnh_hops) is not implemented.
>>>
>>> Yes it is... in my tree, but I want to extend it to also include support
>>> for algorithm for hash based, etc.. and to keep it as close to the
>>> existing APIs as possible I think the nexthop structure makes the most
>>> sense for this.
>>>
>>>>>
>>>>> I propose that we have a nexthop structure to an exclusive route,
>>>>> similar to what we have for IPv4, where we store the gateway, device and
>>>>> weight for all nexthops and the algorithm in the route. This would make
>>>>> the netlink API symmetrical again and fixes the n*n inefficiencies when
>>>>> adding routes (all siblings need to know about all siblings).
>>>>>
>>>>> What are your thoughts on this?
>> The advantage of the current implementation is that you can add or delete a
>> nexthop without removing the whole route. You don't need to list all nexthops
>> again each time you want to modify one.
>
> That would also be possible using ip -6 route change, it'll be more
> efficient for insertions and more consistent with the IPv4
> implementation. Remember that most code is in fact shared between IPv4
> and IPv6 implementations for routing protocol suites.
>
> For bird it would be much more convenient to have the same API work for
> both as the code is shared (with minor differences).
>
> The memory structure like below would make sense and you can expand it
> as well:
>
> struct ip6_nexthop {
> 	int               flags; /* algorithm per packet or hash, etc */
> 	struct list_head  *hops; /* nh_via */
> };
> struct ip6_nh {
> 	int              ifindex;
> 	struct in6_addr  rt6i_gateway;
> 	char             weight;
> 	int              flags; /* pervasive, onlink */
> };
>
> I'm not sure how to make this map correctly to the append API.. I think
> we need to make sure that all APIs either are consistent and symmetrical
> or don't work from day 1.
Maybe the mistake was to offer two APIs to insert ECMPv6 routes, but as soon as
there are two APIs, one of them will not be symmetric with what is returned by
the kernel ;-)

>
> I am willing to implement this, including algorithm support using the
> netlink nexthop API, like the IPv4 implementation.. or change the IPv4
> implementation, but either way I feel they need to be consistent.
I'm not sure that this is a major argument. There are already differences
between IPv4 and IPv6 (for example, IPv4 addresses are kept when an interface
goes down but IPv6 addresses are not, and netlink messages are sent when routes
are removed after bringing an interface down in IPv6 but not in IPv4). But I'll
let others comment on this.
What is important is to avoid breaking the existing API.

Regards,
Nicolas


* Re: ECMP ipv6 vs ipv4
  2013-04-17 14:14         ` Nicolas Dichtel
@ 2013-04-17 15:22           ` Wilco Baan Hofman
  0 siblings, 0 replies; 7+ messages in thread
From: Wilco Baan Hofman @ 2013-04-17 15:22 UTC (permalink / raw)
  To: nicolas.dichtel; +Cc: netdev

On Wed, 2013-04-17 at 16:14 +0200, Nicolas Dichtel wrote:
> On 17/04/2013 15:16, Wilco Baan Hofman wrote:
> > On Wed, 2013-04-17 at 11:03 +0200, Nicolas Dichtel wrote:
> >
> >>>>>
> >>>>> I propose that we have a nexthop structure to an exclusive route,
> >>>>> similar to what we have for IPv4, where we store the gateway, device and
> >>>>> weight for all nexthops and the algorithm in the route. This would make
> >>>>> the netlink API symmetrical again and fixes the n*n inefficiencies when
> >>>>> adding routes (all siblings need to know about all siblings).
> >>>>>
> >>>>> What are your thoughts on this?
> >> The advantage of the current implementation is that you can add or delete a
> >> nexthop without removing the whole route. You don't need to list all nexthops
> >> again each time you want to modify one.
> >
> > That would also be possible using ip -6 route change, it'll be more
> > efficient for insertions and more consistent with the IPv4
> > implementation. Remember that most code is in fact shared between IPv4
> > and IPv6 implementations for routing protocol suites.
> >
> > For bird it would be much more convenient to have the same API work for
> > both as the code is shared (with minor differences).
> >
> > The memory structure like below would make sense and you can expand it
> > as well:
> >
> > struct ip6_nexthop {
> > 	int               flags; /* algorithm per packet or hash, etc */
> > 	struct list_head  *hops; /* nh_via */
> > };
> > struct ip6_nh {
> > 	int              ifindex;
> > 	struct in6_addr  rt6i_gateway;
> > 	char             weight;
> > 	int              flags; /* pervasive, onlink */
> > };
> >
> > I'm not sure how to make this map correctly to the append API.. I think
> > we need to make sure that all APIs either are consistent and symmetrical
> > or don't work from day 1.
> Maybe the mistake was to offer two APIs to insert ECMPv6 routes, but as soon as
> there are two APIs, one of them will not be symmetric with what is returned by
> the kernel ;-)

Yeah, I'm not a fan, especially when it doesn't map 1:1 with what's
going on.


> >
> > I am willing to implement this, including algorithm support using the
> > netlink nexthop API, like the IPv4 implementation.. or change the IPv4
> > implementation, but either way I feel they need to be consistent.
> I'm not sure that this is a major argument. There are already differences
> between IPv4 and IPv6 (for example, IPv4 addresses are kept when an interface
> goes down but IPv6 addresses are not, and netlink messages are sent when routes
> are removed after bringing an interface down in IPv6 but not in IPv4). But I'll
> let others comment on this.

I would prefer to have fewer differences between IPv4 and IPv6 handling
instead of more, unless the RFCs demand different behaviour.

> What is important is to avoid breaking the existing API.
> 

I sort of agree, but quagga support is on hold until this is resolved,
and bird does not support it properly until we resolve this. The latter
I intend to fix myself and I am in contact with Quagga developers.
Static via iproute is a slightly different story though.


If no-one else comments, I'll start on writing a patch to support the
netlink nexthop API with weights and per-packet and weighted hash
algorithms on an exclusive route. I'll also see if I can support ip
route append if nexthop is specified to add a nexthop to the list, but
this shall be a different patch and it may not map well.

I would like to hear some more thoughts on this though.


Wilco Baan Hofman

