From mboxrd@z Thu Jan 1 00:00:00 1970 From: Robert Shearman Subject: Re: [PATCH net-next] mpls: Infer payload of packet from via address family. Date: Wed, 11 Mar 2015 22:02:06 +0000 Message-ID: <5500BB5E.4070909@brocade.com> References: <1426078702-23246-1-git-send-email-rshearma@brocade.com> <87sidba1ea.fsf@x220.int.ebiederm.org> Mime-Version: 1.0 Content-Type: text/plain; charset="windows-1252"; format=flowed Content-Transfer-Encoding: 7bit Cc: "davem@davemloft.net" , "netdev@vger.kernel.org" To: "Eric W. Biederman" Return-path: Received: from mx0b-000f0801.pphosted.com ([67.231.152.113]:49042 "EHLO mx0b-000f0801.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751488AbbCKWCO (ORCPT ); Wed, 11 Mar 2015 18:02:14 -0400 In-Reply-To: <87sidba1ea.fsf@x220.int.ebiederm.org> Sender: netdev-owner@vger.kernel.org List-ID: On 11/03/15 17:29, Eric W. Biederman wrote: > Robert Shearman writes: > >> This ensures that if a routing protocol incorrectly advertises a label >> for a prefix whose address-family is inconsistent with that of the >> nexthop, then the traffic will be dropped, rather than the issue being >> silently worked around. > > The address family of the next hop need have no particular relationship > to the address families you can send to the next hop. As such I > consider the behavior your are proposing here actively wrong. It > appears to block valid use cases such as using a single mpls label > to carry both ipv4 and ipv6 traffic simply because we use an ipv4 next > hop. > > I am not opposed to adding configurability to force the issue, > especially as unexpected traffic may be a problem but I don't think > that should be the default. Having read RFC 3032 section 2.2, I realise that my original intention was misguided. However, as labels are allocated and advertised locally and so the implementation gets to decide what labels are used for, then I think the statement of this being actively wrong is a bit too strong. Allowing the carrying of IPv4 and IPv6 traffic using the same attached nexthop route means our implementation to won't be able to use a single cached hardware header associated with the neighbour (although admittedly it would require reworking for it to be usable here anyway). In my experience of MPLS (which admittedly lacks TP) there is no MPLS application that will allocate a label to forward traffic to an attached nexthop that will carry both IPv4 and IPv6 traffic. The only exception I can think of is statically configured labels, but there is no reason why you can't use two labels with appropriate nexthops. Since adding such a restriction after this has shipped will have implications for anybody relying on it, we should consider the implications of being too liberal with what the kernel accepts. > I think the default for a tunnel egress should > be assume everyone sticking packets in that tunnel are playing nice so > we should decode as much as possible. Naturally - the entire MPLS core is one big trust domain and the PEs have to be trusted to be doing the right thing. My point was more about restricting what the local control plane can do in terms of allocating labels while we have the freedom to do so. That raises a related issue - as it stands today, with the kernel module loaded any interface can receive and process MPLS traffic. If this implementation is to act as a PE, then it's imperative we come up with a mechanism of ensuring that MPLS packets are only processed on interfaces that the operator has explicitly configured. > >> The accessible skb length should also be validated prior to the IPv4 >> or IPv6 headers being accessed, since only the label header will have >> previously been validated. > > I agree I goofed by not including the appropriate pskb_may_pull checks. > And that needs to be fixed. > >> Rename mpls_egress to mpls_egress_to_ip to make it more obvious that >> the function is used for traffic going out as IP, not for labeled >> traffic (or for the not-yet-implemented pseudo-wires). > > I disagree. > > The name of the function needs to be mpls_egress, and it should be > eventually expanded to handle as many cases are reasonable. With the > default being to start the decode of packets by looking at the first > nibble. > > Without explicit configuration it seems entirely reasonable to assume > that if the first nibble is 4 it is an ipv4 packet. If the first nibble > is 6 it is an ipv6 packet. If the first nibble is 1 it is a generic > association channel. If the first nibble is 0 it has a control word and > it is a pseudo wire where the output tunnel type matches the output > device. The intention of RFC 4385 was that the control word be used to prevent the payload of an MPLS packet being wrongly as interpreted by intermediate routers (e.g. as the RFC states for ECMP, or for ICMP generation). IMHO the intention wasn't to allow the first nibble to be used as a protocol field with 0/1 meaning L2. While I can't think of any concrete concerns, I do feel very uneasy about a payload being sent out directly as L2 without the control plane explicitly specifying this as the intention for the label route. > A handful of pseudo wires do things differently. SONET sets bits in the > first nibble, Ethernet has cases where it does not include the control > word and as such the first nible might not be zero. And then we have > oddball cases that need configuration such as should the ethernet > control words sequence number be honored. Just to clarify, the issue is more that in such cases without a control word the first nibble could be 0, 1, 4 or 6. I've seen PW deployments not using control words due to some devices not supporting them. I agree that we'll have to consider how the configuration of such cases will work. > I admit that supporting ethernet and similiar pseudo wires will require > the arguments to mpls_egress to be changed a little so that we can skip > taking the next hop address link layer address from the mpls_route, but > that does not mean we should just through it under a bus. > > Fundamentally mpls_egress is the function that we call when the bottom > of stack indicator is reached. It should either figure out that the > packet can be forwarded or it should indicate that the packet should be > dropped. I don't want to get too drawn into a discussion on what the code will look like with PW implemented, but I had imagined that mpls_forward would be refactored such that it would be known from the route lookup that this was an PW route with or without a control word, and the parsing of the first nibble wouldn't be necessary. Thanks, Rob