From: Vivek Venkatraman <vivek@cumulusnetworks.com>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: David Miller <davem@davemloft.net>,
netdev@vger.kernel.org, roopa <roopa@cumulusnetworks.com>,
Stephen Hemminger <stephen@networkplumber.org>,
santiago@crfreenet.org
Subject: Re: [PATCH net-next 8/8] ipmpls: Basic device for injecting packets into an mpls tunnel
Date: Thu, 5 Mar 2015 22:05:24 -0800 [thread overview]
Message-ID: <CAMs_D19uPsshwSDtwP44GCJNO-xeWDVP7Lp03UnC5c7+J2RtPQ@mail.gmail.com> (raw)
In-Reply-To: <871tl39q8v.fsf@x220.int.ebiederm.org>
On Thu, Mar 5, 2015 at 11:52 AM, Eric W. Biederman
<ebiederm@xmission.com> wrote:
> Vivek Venkatraman <vivek@cumulusnetworks.com> writes:
>
>> On Thu, Mar 5, 2015 at 6:00 AM, Eric W. Biederman <ebiederm@xmission.com> wrote:
>>> Vivek Venkatraman <vivek@cumulusnetworks.com> writes:
>>>
>>>> It is great to see an MPLS data plane implementation make it into the
>>>> kernel. I have a couple of questions on this patch.
>>>>
>>>> On Wed, Feb 25, 2015 at 9:18 AM, Eric W. Biederman
>>>> <ebiederm@xmission.com> wrote:
>>>>>
>>>>>
>>>>> Allow creating an mpls tunnel endpoint with
>>>>>
>>>>> ip link add type ipmpls.
>>>>>
>>>>> This tunnel has an mpls label for it's link layer address, and by
>>>>> default sends all ingress packets over loopback to the local MPLS
>>>>> forwarding logic which performs all of the work.
>>>>>
>>>>
>>>> Is it correct that to achieve IPoMPLS, each LSP has to be installed as
>>>> a link/netdevice?
>>>
>>> This is still a bit in flux. The ingress logic is not yet merged. When
>>> I resent the patches I did not resend this one as I am less happy with
>>> it than I am about the others and the problem is orthogonal.
>>>
>>>> If ingress packets loopback with the label associated with the link to
>>>> hit the MPLS forwarding logic, how does it work if each packet has to
>>>> be then forwarded with a different label stack? One use case is a
>>>> common IP/MPLS application such as L3VPNs (RFC 4364) where multiple
>>>> VPNs may reside over the same LSP, each having its own VPN (inner)
>>>> label.
>>>
>>> If we continue using this approach (which I picked because it was simple
>>> for bootstrapping and testing) the way it would work is that you have a
>>> local label that when you forward packets with that label all of the
>>> other needed labels are pushed.
>>>
>>
>> Yes, I can see that this approach is simple for bootstrapping.
>>
>> However, I think the need for a local label is going to be bit of a
>> challenge as well as not intuitive. I say the latter because at an
>> ingress LSP (i.e., the kernel is performing an MPLS LER function), you
>> are only pushing labels just based on normal IP routing (or L2, if
>> implementing a pseudowire), so needing to assign a local label that
>> then gets popped seems convoluted. The challenge is because the local
>> label has to be unique for the label stack that needs to be imposed,
>> it is not just a 1-to-1 mapping with the tunnel.
>
> Agreed.
>
>>> That said I think the approach I chose has a lot going for it.
>>>
>>> Fundamentally I think the ingress to an mpls tunnel fundamentally needs
>>> the same knobs and parameters as struct mpls_route. Aka which machine
>>> do we forward the packets to, and which labels do we push.
>>>
>>> The extra decrement of the hop count on ingress is not my favorite
>>> thing.
>>>
>>> The question in my mind is how do we select which mpls route to use.
>>> Spending a local label for that purpose does not seem particularly
>>> unreasonable.
>>>
>>> Using one network device per tunnel it a bit more questionable. I keep
>>> playing with ideas that would allow a single device to serve multiple
>>> mpls tunnels.
>>>
>>
>> For the scenario I mentioned (L3VPNs) which would be common at the
>> edge, isn't it a network device per "VPN" (or more precisely, per VPN
>> per LSP)? I don't think this scales well.
>
> We need a data structure in the kernel for each
> Forwarding Equivalent Class (aka per VPN per LSP) the only question is
> how expensive that data structure should be.
>
> In big-O notation the scaling is equal. The practical question how large
> are our constant factors and are they a problem. If the L3VPN results
> in enough entries on a machine then it is a scaling problem otherwise
> not so much.
>
>>> For going from normal ip routing to mpls routing somewhere we need the
>>> the destination ip prefix to mpls tunnel mapping. There are a couple of
>>> possible ways this could be solved.
>>> - One ingress network device per mpls tunnel.
>>> - One ingress network device and with with a a configurable routing
>>> prefix to mpls mapping. Possibly loaded on the fly. net/atm/clip.c
>>> does something like this for ATM virtual circuits.
>>> - One ingress network device that looks at IP_ROUTE_CLASSID and
>>> use that to select the mpls labels to use.
>>> - Teach the IP network stack how to insert packets in tunnels without
>>> needing a magic netdevice.
>>>
>>
>> I feel it should be along the lines of "teach the IP network stack how
>> to push labels".
>
> That phrasing sets off alarms bells in my mind of mpls specific hacks in
> the kernel, which most likely will cause performance regression and
> maintenance complications.
>
>> In general, MPLS LSPs can be setup as hop-by-hop
>> routed LSPs (when using a signaling protocol like LDP or BGP) as well
>> as tunnels that may take a different path than normal routing. I feel
>> it is good if the dataplane can support both models. In the former,
>> the IP network stack should push the labels which are just
>> encapsulation and then just transmit on the underlying netdevice that
>> corresponds to the neighbor interface. To achieve this, maybe it is
>> the neighbor (nexthop) that has to reference the mpls_route. In the
>> latter (LSPs are treated as tunnels and/or this is the only model
>> supported), the IP network stack would still need to impose any inner
>> labels (i.e., VPN or pseudowire, later on Entropy or Segment labels)
>> and then transmit over the tunnel netdevice which would impose the
>> tunnel label.
>
> Potentially. This part of the discussion has reached the point where I
> need to see code to carry this part of the discussion any farther.
>
> Eric
I'm in full agreement too that there shouldn't be any mpls-specific
hacks in the kernel.
Thank you for the discussion. We shall take your patches and
brainstorm internally on what we'd add or change. As soon as we have
code to share, I'll come back to seek opinion and continue the
discussion.
Vivek
next prev parent reply other threads:[~2015-03-06 6:05 UTC|newest]
Thread overview: 88+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-02-25 17:09 [PATCH net-next 0/8] Basic MPLS support Eric W. Biederman
2015-02-25 17:13 ` [PATCH net-next 1/8] mpls: Refactor how the mpls module is built Eric W. Biederman
2015-02-26 2:05 ` Simon Horman
2015-02-26 2:15 ` Eric W. Biederman
2015-02-26 2:28 ` Simon Horman
2015-02-25 17:14 ` [PATCH net-next 2/8] mpls: Basic routing support Eric W. Biederman
2015-02-25 17:15 ` [PATCH net-next 3/8] mpls: Add a sysctl to control the size of the mpls label table Eric W. Biederman
2015-02-25 17:16 ` [PATCH net-next 4/8] mpls: Basic support for adding and removing routes Eric W. Biederman
2015-02-25 17:16 ` [PATCH net-next 5/8] mpls: Functions for reading and wrinting mpls labels over netlink Eric W. Biederman
2015-02-25 17:17 ` [PATCH net-next 6/8] mpls: Netlink commands to add, remove, and dump routes Eric W. Biederman
2015-02-25 17:18 ` [PATCH net-next 8/8] ipmpls: Basic device for injecting packets into an mpls tunnel Eric W. Biederman
2015-03-05 9:17 ` Vivek Venkatraman
2015-03-05 14:00 ` Eric W. Biederman
2015-03-05 16:25 ` Vivek Venkatraman
2015-03-05 19:52 ` Eric W. Biederman
2015-03-06 6:05 ` Vivek Venkatraman [this message]
2015-03-07 10:36 ` Robert Shearman
2015-03-07 21:12 ` Eric W. Biederman
2015-02-25 17:19 ` [PATCH net-next 7/8] mpls: Multicast route table change notifications Eric W. Biederman
2015-02-26 7:21 ` roopa
2015-02-26 14:03 ` Eric W. Biederman
2015-02-26 15:12 ` roopa
2015-03-05 1:56 ` Andy Gospodarek
2015-02-25 17:37 ` [PATCH iproute2] mpls: Add basic mpls support to iproute Eric W. Biederman
2015-02-26 6:58 ` [PATCH net-next 0/8] Basic MPLS support roopa
2015-02-27 21:21 ` David Miller
2015-02-28 0:58 ` Eric W. Biederman
2015-03-02 0:05 ` Shrijeet Mukherjee
2015-03-02 4:03 ` David Miller
2015-03-02 5:10 ` Eric W. Biederman
2015-03-02 5:53 ` David Miller
2015-03-02 5:59 ` [PATCH net-next 0/15] Neighbour table and ax25 cleanups Eric W. Biederman
2015-03-02 5:59 ` [PATCH net-next 01/15] ax25: In ax25_rebuild_header add missing kfree_skb Eric W. Biederman
2015-03-02 6:01 ` [PATCH net-next 02/15] rose: Set the destination address in rose_header Eric W. Biederman
2015-03-02 6:02 ` [PATCH net-next 03/15] rose: Transmit packets in rose_xmit not rose_rebuild_header Eric W. Biederman
2015-03-02 6:03 ` [PATCH net-next 04/15] ax25/kiss: Replace ax_header_ops with ax25_header_ops Eric W. Biederman
2015-03-02 6:03 ` [PATCH net-next 05/15] ax25/6pack: Replace sp_header_ops " Eric W. Biederman
2015-03-02 6:04 ` [PATCH net-next 06/15] ax25: Make ax25_header and ax25_rebuild_header static Eric W. Biederman
2015-03-02 6:05 ` [PATCH net-next 07/15] ax25: Refactor to use private neighbour operations Eric W. Biederman
2015-03-02 6:06 ` [PATCH net-next 08/15] arp: Remove special case to give AX25 it's open arp operations Eric W. Biederman
2015-03-02 6:07 ` [PATCH net-next 09/15] neigh: Move neigh_compat_output into ax25_ip.c Eric W. Biederman
2015-03-02 6:08 ` [PATCH net-next 10/15] ax25: Stop calling/abusing dev_rebuild_header Eric W. Biederman
2015-03-02 6:09 ` [PATCH net-next 11/15] ax25: Stop depending on arp_find Eric W. Biederman
2015-03-02 6:11 ` [PATCH net-next 12/15] net: Kill dev_rebuild_header Eric W. Biederman
2015-03-02 6:12 ` [PATCH net-next 13/15] arp: Kill arp_find Eric W. Biederman
2015-03-02 6:13 ` [PATCH net-next 14/15] neigh: Don't require dst in neigh_hh_init Eric W. Biederman
2015-03-02 6:14 ` [PATCH net-next 15/15] neigh: Don't require a dst in neigh_resolve_output Eric W. Biederman
2015-03-02 21:44 ` [PATCH net-next 0/15] Neighbour table and ax25 cleanups David Miller
2015-03-03 15:41 ` [PATCH net-next] ax25: Stop using magic neighbour cache operations Eric W. Biederman
2015-03-03 19:45 ` David Miller
2015-03-03 20:22 ` Eric W. Biederman
2015-03-03 20:33 ` David Miller
2015-03-03 23:09 ` [PATCH net-next 0/2] Neighbour table prep for MPLS Eric W. Biederman
2015-03-03 23:10 ` [PATCH net-next 1/2] neigh: Factor out ___neigh_lookup_noref Eric W. Biederman
2015-03-04 14:53 ` Andy Gospodarek
2015-03-04 15:58 ` Eric W. Biederman
2015-03-04 16:30 ` Andy Gospodarek
2015-03-03 23:11 ` [PATCH net-next 2/2] neigh: Add helper function neigh_xmit Eric W. Biederman
2015-03-04 1:06 ` [PATCH net-next 0/7] Basic MPLS support take 2 Eric W. Biederman
2015-03-04 1:10 ` [PATCH net-next 1/7] mpls: Refactor how the mpls module is built Eric W. Biederman
2015-03-04 1:10 ` [PATCH net-next 2/7] mpls: Basic routing support Eric W. Biederman
2015-03-05 16:36 ` Vivek Venkatraman
2015-03-05 18:42 ` Eric W. Biederman
2015-03-04 1:11 ` [PATCH net-next 3/7] mpls: Add a sysctl to control the size of the mpls label table Eric W. Biederman
2015-03-05 9:45 ` Vivek Venkatraman
2015-03-05 13:22 ` Eric W. Biederman
2015-03-05 14:38 ` Eric W. Biederman
2015-03-05 16:49 ` Vivek Venkatraman
2015-03-04 1:12 ` [PATCH net-next 4/7] mpls: Basic support for adding and removing routes Eric W. Biederman
2015-03-04 8:13 ` roopa
2015-03-04 20:36 ` Eric W. Biederman
2015-03-05 0:30 ` roopa
2015-03-05 2:50 ` Bill Fink
2015-03-05 11:54 ` Eric W. Biederman
2015-03-05 19:10 ` Bill Fink
2015-03-04 1:13 ` [PATCH net-next 5/7] mpls: Functions for reading and wrinting mpls labels over netlink Eric W. Biederman
2015-03-04 1:13 ` [PATCH net-next 6/7] mpls: Netlink commands to add, remove, and dump routes Eric W. Biederman
2015-03-04 1:14 ` [PATCH net-next 7/7] mpls: Multicast route table change notifications Eric W. Biederman
2015-03-04 5:27 ` [PATCH net-next 0/7] Basic MPLS support take 2 David Miller
2015-03-04 6:13 ` Eric W. Biederman
2015-03-04 5:25 ` [PATCH net-next 0/2] Neighbour table prep for MPLS David Miller
2015-03-04 5:53 ` Eric W. Biederman
2015-03-04 14:56 ` Andy Gospodarek
2015-03-04 21:04 ` David Miller
2015-03-05 12:35 ` Eric W. Biederman
2015-03-05 10:14 ` [PATCH net-next] ax25: Stop using magic neighbour cache operations Steven Whitehouse
2015-03-06 20:44 ` Eric W. Biederman
2015-03-14 0:33 ` Steven Whitehouse
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CAMs_D19uPsshwSDtwP44GCJNO-xeWDVP7Lp03UnC5c7+J2RtPQ@mail.gmail.com \
--to=vivek@cumulusnetworks.com \
--cc=davem@davemloft.net \
--cc=ebiederm@xmission.com \
--cc=netdev@vger.kernel.org \
--cc=roopa@cumulusnetworks.com \
--cc=santiago@crfreenet.org \
--cc=stephen@networkplumber.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).