* MPLS outbound packets being dropped
@ 2015-12-06 10:20 Sam Russell
From: Sam Russell @ 2015-12-06 10:20 UTC
To: netdev
tl;dr: mpls_output expects skb->protocol to be set to the correct
ethertype, but it isn't
https://github.com/torvalds/linux/blob/ede2059dbaf9c6557a49d466c8c7778343b208ff/net/mpls/mpls_iptunnel.c#L64
Problem:
I set up two interfaces pointed at each other, and added a static arp
entry to minimise complexity
ifconfig enp0s8 10.0.0.1/24 up
ifconfig enp0s9 up
arp -s 10.0.0.5 00:12:34:56:78:90
I then added an MPLS route
./dev/iproute2/ip/ip route add 192.168.2.0/24 encap mpls 100 via inet 10.0.0.5 dev enp0s8
I then tried to ping an IP in this route but got errors back
ping 192.168.2.1
* PING 192.168.2.1 (192.168.2.1) 56(84) bytes of data.
* ping: sendmsg: Invalid argument
I then recorded calls to kfree_skb
./tools/perf/perf record -e skb:kfree_skb -g -a
This gave me the following call trace:
100.00% 100.00% ping [kernel.kallsyms] [k] kfree_skb
|
---kfree_skb
mpls_output
lwtunnel_output
ip_local_out_sk
ip_send_skb
ip_push_pending_frames
raw_sendmsg
inet_sendmsg
sock_sendmsg
___sys_sendmsg
__sys_sendmsg
sys_sendmsg
entry_SYSCALL_64_fastpath
sendmsg
0
I then went through mpls_output() in mpls_iptunnel.c and put a printk()
at every "goto drop" and found that the drop was being hit after failing
to match skb->protocol
https://github.com/torvalds/linux/blob/ede2059dbaf9c6557a49d466c8c7778343b208ff/net/mpls/mpls_iptunnel.c#L64
My understanding is that skb->protocol is normally set after
dst_output. For example, a ping packet hitting a normal IPv4 route
should follow something like:
raw_sendmsg
ip_push_pending_frames
ip_send_skb
ip_local_out_sk
dst_output
ip_output
ip_output() is the first place where skb->protocol gets set
https://github.com/torvalds/linux/blob/dbd3393c56a8794fe596e7dd20d0efa613b9cf61/net/ipv4/ip_output.c#L356
The path that a packet follows when hitting an MPLS route is as follows:
raw_sendmsg
ip_push_pending_frames
ip_send_skb
ip_local_out_sk
dst_output
lwtunnel_output
mpls_output
lwtunnel_output merely dispatches to the correct output function (mpls_output).
mpls_output expects skb->protocol to be set, but nothing has set it
yet, so it drops the packet!
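The failure mode described above can be modelled in user space. The following is only a minimal sketch, not kernel code: the toy_* names and the struct layout are hypothetical, the protocol field is kept in host byte order (the kernel compares against htons() values), and it assumes that a protocol field nobody has written yet reads as 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins for illustration only; these are NOT the kernel types. */
#define TOY_ETH_P_IP   0x0800u  /* IPv4 ethertype */
#define TOY_ETH_P_IPV6 0x86DDu  /* IPv6 ethertype */

struct toy_skb {
    uint16_t protocol; /* 0 models "nothing has set it yet" (assumption) */
    uint8_t  ttl;      /* TTL or hop limit carried by the packet */
};

/* Models the step in ip_output() that stamps the ethertype on the packet. */
static void toy_ip_output_prepare(struct toy_skb *skb)
{
    assert(skb != NULL);
    skb->protocol = TOY_ETH_P_IP;
}

/*
 * Models the protocol check in mpls_output(): the TTL is only read when
 * the protocol field matches a known ethertype. Returns 0 on success and
 * -1 for the "goto drop" path.
 */
static int toy_mpls_ttl_by_protocol(const struct toy_skb *skb,
                                    unsigned int *ttl)
{
    assert(skb != NULL && ttl != NULL);
    if (skb->protocol == TOY_ETH_P_IP || skb->protocol == TOY_ETH_P_IPV6) {
        *ttl = skb->ttl;
        return 0;
    }
    return -1; /* protocol never set: the packet is dropped */
}
```

In this model a freshly built packet handed straight to toy_mpls_ttl_by_protocol() is rejected, while the same packet passes once toy_ip_output_prepare() has run first, which is the ordering difference between the two call paths listed above.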
Any suggestions on how mpls_output should detect the protocol?
Setup:
Ubuntu 15.10
iproute2 built from head
Kernel 4.3, both ubuntu mainline and home-built:
make menuconfig, add lwtunnel, mpls-router, mpls-gso and mpls-iptunnel
Running virtualbox on OSX
* Re: MPLS outbound packets being dropped
From: Sam Russell @ 2015-12-07 5:56 UTC
To: netdev
I can confirm that MPLS packets are sent correctly if the conditional
at mpls_iptunnel.c:57-65 is removed and just replaced with lines
58-59:
ttl = ip_hdr(skb)->ttl;
rt = (struct rtable *)dst;
Obviously this isn't sufficient for IPv6, but is a workaround that
enables MPLS-encapsulated packets to be sent from Linux in the
meantime
* Re: MPLS outbound packets being dropped
From: Robert Shearman @ 2015-12-07 10:40 UTC
To: Sam Russell, netdev; +Cc: roopa
On 06/12/15 10:20, Sam Russell wrote:
> Any suggestions on how mpls_output should detect the protocol?
Thanks for reporting this and for your analysis.
We could write wrappers to lwtunnel_output for the v4 and v6 cases
that set the protocol accordingly and then call lwtunnel_output, but
since mpls_output relies on the AF-specific type of dst I think the
simpler fix is to just test the type of the dst in mpls_output rather
than skb->protocol.
Thanks,
Rob
* [PATCH net] mpls: fix sending of local encapped packets
From: Robert Shearman @ 2015-12-07 12:53 UTC
To: davem; +Cc: netdev, roopa, ebiederm, Sam Russell, Robert Shearman

Locally generated IPv4 and (probably) IPv6 packets are dropped because
skb->protocol isn't set. We could write wrappers to lwtunnel_output
for IPv4 and IPv6 that set the protocol accordingly and then call
lwtunnel_output, but mpls_output relies on the AF-specific type of dst
anyway to get the via address.

Therefore, make use of dst->dst_ops->family in mpls_output to
determine the type of nexthop and thus protocol of the packet instead
of checking skb->protocol.

Fixes: 61adedf3e3f1 ("route: move lwtunnel state to dst_entry")
Reported-by: Sam Russell <sam.h.russell@gmail.com>
Signed-off-by: Robert Shearman <rshearma@brocade.com>
---
 net/mpls/mpls_iptunnel.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/mpls/mpls_iptunnel.c b/net/mpls/mpls_iptunnel.c
index 67591aef9cae..64afd3d0b144 100644
--- a/net/mpls/mpls_iptunnel.c
+++ b/net/mpls/mpls_iptunnel.c
@@ -54,10 +54,10 @@ int mpls_output(struct net *net, struct sock *sk, struct sk_buff *skb)
 	unsigned int ttl;
 
 	/* Obtain the ttl */
-	if (skb->protocol == htons(ETH_P_IP)) {
+	if (dst->ops->family == AF_INET) {
 		ttl = ip_hdr(skb)->ttl;
 		rt = (struct rtable *)dst;
-	} else if (skb->protocol == htons(ETH_P_IPV6)) {
+	} else if (dst->ops->family == AF_INET6) {
 		ttl = ipv6_hdr(skb)->hop_limit;
 		rt6 = (struct rt6_info *)dst;
 	} else {
-- 
2.1.4
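The idea behind the patch, choosing the header to read from the address family of the route rather than from a per-packet field, can be sketched in user space as follows. This is an illustrative model only: toy_dst, toy_pkt and the TOY_AF_* values are hypothetical stand-ins for dst->ops->family and the real packet headers, not kernel definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; the kernel uses AF_INET / AF_INET6. */
enum toy_family { TOY_AF_UNSPEC = 0, TOY_AF_INET, TOY_AF_INET6 };

/* Models the route entry; family plays the role of dst->ops->family. */
struct toy_dst {
    enum toy_family family;
};

/* Models a packet whose protocol field may never have been set. */
struct toy_pkt {
    uint16_t protocol;  /* deliberately ignored by the lookup below */
    uint8_t  ttl;       /* IPv4 TTL */
    uint8_t  hop_limit; /* IPv6 hop limit */
};

/*
 * Picks the TTL source from the route's address family. Returns 0 on
 * success and -1 when the family is neither IPv4 nor IPv6 (the drop path).
 */
static int toy_mpls_ttl_by_family(const struct toy_dst *dst,
                                  const struct toy_pkt *pkt,
                                  unsigned int *ttl)
{
    assert(dst != NULL && pkt != NULL && ttl != NULL);
    switch (dst->family) {
    case TOY_AF_INET:
        *ttl = pkt->ttl;
        return 0;
    case TOY_AF_INET6:
        *ttl = pkt->hop_limit;
        return 0;
    default:
        return -1;
    }
}
```

Because the route is resolved before the output function runs, the family is available even for locally generated packets whose protocol field is still unset, so the lookup in this model succeeds with protocol left at 0.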
* Re: [PATCH net] mpls: fix sending of local encapped packets
From: Sam Russell @ 2015-12-07 17:53 UTC
To: Robert Shearman; +Cc: davem, netdev, roopa, ebiederm
I can confirm that this patch works for IPv4 and IPv6
modprobe mpls_router
modprobe mpls_gso
modprobe mpls_iptunnel
ifconfig enp0s8 192.168.99.2/24 up
ifconfig enp0s9 up
arp -s 192.168.99.5 00:12:34:56:78:90
ip -6 neigh add fe80::a00:27ff:1234:5678 lladdr 00:12:34:56:78:90 dev enp0s8
./dev/iproute2/ip/ip route add 192.168.2.0/24 encap mpls 100 via inet 192.168.99.5 dev enp0s8
./dev/iproute2/ip/ip -6 route add 2404:1234:5678::/48 encap mpls 150 via inet6 fe80::a00:27ff:1234:5678 dev enp0s8
ping 192.168.2.1
tcpdump: listening on enp0s8, link-type EN10MB (Ethernet), capture size 262144 bytes
06:46:16.780466 08:00:27:39:3c:e0 > 00:12:34:56:78:90, ethertype MPLS unicast (0x8847), length 102: MPLS (label 100, exp 0, [S], ttl 64)
    (tos 0x0, ttl 64, id 24189, offset 0, flags [DF], proto ICMP (1), length 84)
    192.168.99.2 > 192.168.2.1: ICMP echo request, id 962, seq 1, length 64
06:46:17.790616 08:00:27:39:3c:e0 > 00:12:34:56:78:90, ethertype MPLS unicast (0x8847), length 102: MPLS (label 100, exp 0, [S], ttl 64)
    (tos 0x0, ttl 64, id 24297, offset 0, flags [DF], proto ICMP (1), length 84)
    192.168.99.2 > 192.168.2.1: ICMP echo request, id 962, seq 2, length 64
06:46:18.791064 08:00:27:39:3c:e0 > 00:12:34:56:78:90, ethertype MPLS unicast (0x8847), length 102: MPLS (label 100, exp 0, [S], ttl 64)
    (tos 0x0, ttl 64, id 24338, offset 0, flags [DF], proto ICMP (1), length 84)
    192.168.99.2 > 192.168.2.1: ICMP echo request, id 962, seq 3, length 64
ping6 2404:1234:5678::1
tcpdump: listening on enp0s8, link-type EN10MB (Ethernet), capture size 262144 bytes
06:51:15.235286 08:00:27:39:3c:e0 > 00:12:34:56:78:90, ethertype MPLS unicast (0x8847), length 122: MPLS (label 150, exp 0, [S], ttl 64)
    (flowlabel 0xc4081, hlim 64, next-header ICMPv6 (58) payload length: 64)
    fe80::a00:27ff:fe39:3ce0 > 2404:1234:5678::1: [icmp6 sum ok] ICMP6, echo request, seq 1
06:51:16.236016 08:00:27:39:3c:e0 > 00:12:34:56:78:90, ethertype MPLS unicast (0x8847), length 122: MPLS (label 150, exp 0, [S], ttl 64)
    (flowlabel 0xc4081, hlim 64, next-header ICMPv6 (58) payload length: 64)
    fe80::a00:27ff:fe39:3ce0 > 2404:1234:5678::1: [icmp6 sum ok] ICMP6, echo request, seq 2
Thanks for the quick response!
Cheers
Sam
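The "label 100, exp 0, [S], ttl 64" fields that tcpdump prints in the captures above come from a single 32-bit MPLS label stack entry (20-bit label, 3-bit exp/traffic class, 1-bit bottom-of-stack flag, 8-bit TTL, as laid out in RFC 3032). As a rough sketch, with toy_mpls_entry being a hypothetical helper name and the result given in host byte order (on the wire the entry is big-endian), the packing looks like this:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Packs one MPLS label stack entry:
 *   bits 31..12 label, bits 11..9 exp/TC, bit 8 bottom-of-stack, bits 7..0 TTL.
 * The value is returned in host byte order; a sender would convert it to
 * network byte order before placing it in the frame.
 */
static uint32_t toy_mpls_entry(uint32_t label, unsigned int tc,
                               unsigned int bos, unsigned int ttl)
{
    assert(label < (1u << 20)); /* label is 20 bits wide */
    assert(tc < 8u);            /* exp/TC is 3 bits wide */
    assert(bos < 2u);           /* bottom-of-stack is a single bit */
    assert(ttl < 256u);         /* TTL is 8 bits wide */
    return (label << 12) | ((uint32_t)tc << 9) |
           ((uint32_t)bos << 8) | (uint32_t)ttl;
}
```

For the IPv4 capture, toy_mpls_entry(100, 0, 1, 64) gives 0x00064140, and for the IPv6 capture toy_mpls_entry(150, 0, 1, 64) gives 0x00096140. The extra 4 bytes of this entry also account for the reported frame lengths: 14 (Ethernet) + 4 (MPLS) + 84 (IPv4 packet) = 102, and 14 + 4 + 40 (IPv6 header) + 64 (payload) = 122.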
* Re: [PATCH net] mpls: fix sending of local encapped packets
From: David Miller @ 2015-12-07 21:33 UTC
To: rshearma; +Cc: netdev, roopa, ebiederm, sam.h.russell
From: Robert Shearman <rshearma@brocade.com>
Date: Mon, 7 Dec 2015 12:53:15 +0000
> Locally generated IPv4 and (probably) IPv6 packets are dropped because
> skb->protocol isn't set. We could write wrappers to lwtunnel_output
> for IPv4 and IPv6 that set the protocol accordingly and then call
> lwtunnel_output, but mpls_output relies on the AF-specific type of dst
> anyway to get the via address.
>
> Therefore, make use of dst->dst_ops->family in mpls_output to
> determine the type of nexthop and thus protocol of the packet instead
> of checking skb->protocol.
>
> Fixes: 61adedf3e3f1 ("route: move lwtunnel state to dst_entry")
> Reported-by: Sam Russell <sam.h.russell@gmail.com>
> Signed-off-by: Robert Shearman <rshearma@brocade.com>
Applied, thanks.