From: ebiederm@xmission.com (Eric W. Biederman)
To: Robert Shearman <rshearma@brocade.com>
Cc: <netdev@vger.kernel.org>, roopa <roopa@cumulusnetworks.com>,
	Thomas Graf <tgraf@suug.ch>
Subject: Re: [RFC net-next 0/3] IP imposition of per-nh MPLS encap
Date: Tue, 02 Jun 2015 16:10:34 -0500
Message-ID: <87d21dolyt.fsf@x220.int.ebiederm.org>
In-Reply-To: <556E18BE.8060505@brocade.com> (Robert Shearman's message of "Tue, 2 Jun 2015 21:57:34 +0100")

Robert Shearman <rshearma@brocade.com> writes:

> On 02/06/15 19:11, Eric W. Biederman wrote:
>> Robert Shearman <rshearma@brocade.com> writes:
>>
>>> In order to be able to function as a Label Edge Router in an MPLS
>>> network, it is necessary to be able to take IP packets and impose an
>>> MPLS encap and forward them out. The traditional approach of setting
>>> up an interface for each "tunnel" endpoint doesn't scale for the
>>> common MPLS use-cases where each IP route tends to be assigned a
>>> different label as encap.
>>>
>>> The solution suggested here for further discussion is to provide the
>>> facility to define encap data on a per-nexthop basis using a new
>>> netlink attribute, RTA_ENCAP, which would be opaque to the IPv4/IPv6
>>> forwarding code, but interpreted by the virtual interface assigned to
>>> the nexthop.
>>>
>>> A new ipmpls interface type is defined to show the use of this
>>> facility to allow IP packets to be imposed with an MPLS
>>> encap. However, the facility is designed to be general enough to be
>>> used by any encapsulation/tunneling mechanism that has similar
>>> requirements of high-scale, high-variation-of-encap.
>>
>> I am still digging into the details, but adding a new network device to
>> make this possible is very undesirable.
>>
>> It is a pain point.  Those network devices get to be a major source of
>> memory consumption when there are 4K network namespaces in existence.
>>
>> It is conceptually wrong.  The network device will never be used as an
>> ordinary network device.  All the network device gives you is the
>> ability to avoid creating an enumeration of different kinds of
>> encapsulation.
>
> This isn't true. The network device also gives some of the things you
> take for granted. Things like fragmentation through specifying the mtu
> on the shared tunnel device, being able to specify rules using the
> shared tunnel output device, IP stats, and the ability to specify a
> different destination namespace.

Granted, you get a few more things.  It is still conceptually wrong, as
the network device will never be used as an ordinary network device.

Fragmentation via the MTU of a shared device is already silly, because
we are talking about multiple tunnels with different properties.  You
need a per-route MTU to handle that case.

Further, I am not saying you don't need an output device (which is what
is needed to specify a different destination namespace); I am saying
that having a funny mpls device is wrong, as far as I can see.
Certainly it is a lot of bloody unnecessary overhead.

If we are going to design for maximum scaling (and 1 million+ routes
sounds like maximum scaling), we should see how far we can go without
dragging in the horrible heaviness of additional network devices.  They
are 35K apiece, last I measured.  Just a small handful of them are
already a scaling issue for network namespaces.

Eric

Thread overview: 32+ messages
2015-06-01 16:46 [RFC net-next 0/3] IP imposition of per-nh MPLS encap Robert Shearman
2015-06-01 16:46 ` [RFC net-next 1/3] net: infra for per-nexthop encap data Robert Shearman
2015-06-02 18:15   ` Eric W. Biederman
2015-06-01 16:46 ` [RFC net-next 2/3] ipv4: storing and retrieval of per-nexthop encap Robert Shearman
2015-06-02 16:01   ` roopa
2015-06-02 16:35     ` Robert Shearman
2015-06-01 16:46 ` [RFC net-next 3/3] mpls: new ipmpls device for encapsulating IP packets as mpls Robert Shearman
2015-06-02 16:15   ` roopa
2015-06-02 16:33     ` Robert Shearman
2015-06-02 18:57       ` roopa
2015-06-02 21:06         ` Robert Shearman
2015-06-03 18:43           ` Vivek Venkatraman
2015-06-04 18:46             ` Robert Shearman
2015-06-04 21:38               ` Vivek Venkatraman
2015-06-02 18:26   ` Eric W. Biederman
2015-06-02 21:37     ` Thomas Graf
2015-06-02 22:48       ` Eric W. Biederman
2015-06-02 23:23       ` Eric W. Biederman
2015-06-03  9:50         ` Thomas Graf
2015-06-02  0:06 ` [RFC net-next 0/3] IP imposition of per-nh MPLS encap Thomas Graf
2015-06-02 13:28   ` Robert Shearman
2015-06-02 21:43     ` Thomas Graf
2015-06-03 13:30       ` Robert Shearman
2015-06-02 15:31 ` roopa
2015-06-02 18:30   ` Eric W. Biederman
2015-06-02 18:39     ` roopa
2015-06-02 18:11 ` Eric W. Biederman
2015-06-02 20:57   ` Robert Shearman
2015-06-02 21:10     ` Eric W. Biederman [this message]
2015-06-02 22:15       ` Robert Shearman
2015-06-02 22:58         ` Eric W. Biederman
2015-06-04 15:12           ` Nicolas Dichtel
