* IPv6 routing/fragmentation panic
@ 2015-09-15 15:53 David Woodhouse
From: David Woodhouse @ 2015-09-15 15:53 UTC (permalink / raw)
To: netdev
I can repeatably crash my router with 'ping6 -s 2000' to an external
machine:
[ 61.741618] skbuff: skb_under_panic: text:c1277f1e len:1294 put:14 head:dec98000 data:dec97ffc tail:0xdec9850a end:0xdec98f40 dev:br-lan
[ 61.754128] ------------[ cut here ]------------
[ 61.758754] Kernel BUG at c1201b1f [verbose debug info unavailable]
[ 61.764005] invalid opcode: 0000 [#1]
[ 61.764005] Modules linked in: sch_teql 8139cp mii iptable_nat pppoe nf_nat_ipv4 nf_conntrack_ipv6 nf_conntrack_ipv4 ipt_REJECT ipt_MASQUERADE xt_time xt_tcpudp xt_state xt_nat xt_multiport xt_mark xt_mac xt_limit xt_conntrack xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_CT solos_pci pppox ppp_async nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_nat_ftp nf_nat nf_log_ipv4 nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_ftp nf_conntrack iptable_raw iptable_mangle iptable_filter ip_tables crc_ccitt act_skbedit act_mirred em_u32 cls_u32 cls_tcindex cls_flow cls_route cls_fw sch_hfsc sch_ingress ledtrig_heartbeat ledtrig_gpio ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 nf_log_common ip6table_raw ip6table_mangle ip6table_filter ip6_tables x_tables pppoatm ppp_generic slhc br2684 atm geode_aes cbc arc4 aes_i586
[ 61.764005] CPU: 0 PID: 0 Comm: swapper Not tainted 4.2.0+ #2
[ 61.764005] task: c138d540 ti: c1386000 task.ti: c1386000
[ 61.764005] EIP: 0060:[<c1201b1f>] EFLAGS: 00210286 CPU: 0
[ 61.764005] EIP is at skb_panic+0x3b/0x3d
[ 61.764005] EAX: 0000007c EBX: deca3000 ECX: c13a0910 EDX: c139f3c4
[ 61.764005] ESI: dee85d8c EDI: dec9800a EBP: defe3b40 ESP: dec0bd50
[ 61.764005] DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
[ 61.764005] CR0: 8005003b CR2: b7704474 CR3: 1ef0d000 CR4: 00000090
[ 61.764005] Stack:
[ 61.764005] c135e48c c12e1580 c1277f1e 0000050e 0000000e dec98000 dec97ffc dec9850a
[ 61.764005] dec98f40 deca3000 dee85d00 c120337b c12e1580 c1277f1e 00000000 0000000e
[ 61.764005] dee85d7c ff671e02 deca3000 c109afd3 00200282 00001d91 00000028 dec98012
[ 61.764005] Call Trace:
[ 61.764005] [<c1277f1e>] ? ip6_finish_output2+0x196/0x4da
[ 61.764005] [<c120337b>] ? skb_push+0x2c/0x2c
[ 61.764005] [<c1277f1e>] ? ip6_finish_output2+0x196/0x4da
[ 61.764005] [<c109afd3>] ? __kmalloc_track_caller+0x5a/0xd9
[ 61.764005] [<c10840d6>] ? kmemdup+0x15/0x4a
[ 61.764005] [<c1277d88>] ? ip6_forward_finish+0xa/0xa
[ 61.764005] [<c127aad4>] ? ip6_fragment+0x924/0xb49
[ 61.764005] [<c1277d88>] ? ip6_forward_finish+0xa/0xa
[ 61.764005] [<c122fda6>] ? nf_hook_slow+0x50/0x92
[ 61.764005] [<c127ae23>] ? ip6_output+0x85/0xeb
[ 61.764005] [<c127acf9>] ? ip6_fragment+0xb49/0xb49
[ 61.764005] [<c1279fa0>] ? ip6_forward+0x4a9/0x6b9
[ 61.764005] [<c1277d7e>] ? ac6_proc_exit+0xd/0xd
[ 61.764005] [<c127b46c>] ? ip6_make_skb+0x15f/0x15f
[ 61.764005] [<c127b4e6>] ? ip6_rcv_finish+0x7a/0x7e
[ 61.764005] [<e02bb0c4>] ? ipv6_defrag+0xc3/0xc5 [nf_defrag_ipv6]
[ 61.764005] [<c127b46c>] ? ip6_make_skb+0x15f/0x15f
[ 61.764005] [<c122fd4d>] ? nf_iterate+0x5b/0x64
[ 61.764005] [<c122fda6>] ? nf_hook_slow+0x50/0x92
[ 61.764005] [<c127babb>] ? ipv6_rcv+0x305/0x470
[ 61.764005] [<c127b46c>] ? ip6_make_skb+0x15f/0x15f
[ 61.764005] [<c120c04d>] ? __netif_receive_skb_core+0x643/0x836
[ 61.764005] [<c100632a>] ? nommu_map_page+0x2d/0x4d
[ 61.764005] [<e036a153>] ? solos_bh+0x681/0x751 [solos_pci]
[ 61.764005] [<c120dc51>] ? process_backlog+0x45/0x96
[ 61.764005] [<c120d6a9>] ? net_rx_action+0x15b/0x238
[ 61.764005] [<c102dcfc>] ? __do_softirq+0xb4/0x18a
[ 61.764005] [<c102dc48>] ? __hrtimer_tasklet_trampoline+0x12/0x12
[ 61.764005] [<c1002feb>] ? do_softirq_own_stack+0x1b/0x20
[ 61.764005] <IRQ>
[ 61.764005] [<c1002e24>] ? do_IRQ+0x38/0x9a
[ 61.764005] [<c12bbd89>] ? common_interrupt+0x29/0x30
[ 61.764005] [<c1007d2e>] ? default_idle+0x2/0x3
[ 61.764005] [<c10080fc>] ? arch_cpu_idle+0x6/0x7
[ 61.764005] [<c1044cf6>] ? cpu_startup_entry+0xed/0x189
[ 61.764005] [<c13bc9ac>] ? start_kernel+0x2e5/0x2e8
[ 61.764005] Code: ff b0 9c 00 00 00 ff b0 98 00 00 00 ff b0 a4 00 00 00 ff b0 a0 00 00 00 52 ff 70 54 51 ff 74 24 28 68 8c e4 35 c1 e8 9c 73 0b 00 <0f> 0b 89 c1 83 79 58 00 8b 80 98 00 00 00 75 17 53 8d 1c 10 01
[ 61.764005] EIP: [<c1201b1f>] skb_panic+0x3b/0x3d SS:ESP 0068:dec0bd50
[ 62.120408] ---[ end trace 45d5375a04f3aef4 ]---
[ 62.125034] Kernel panic - not syncing: Fatal exception in interrupt
[ 62.130381] Kernel Offset: disabled
[ 62.130381] Rebooting in 3 seconds..<FF>
I can 'fix' it thus (which demonstrates that the issue lies with incoming
packets arriving over PPPoATM and being routed out of the internal
Ethernet interface):
--- drivers/atm/solos-pci.c~	2015-08-31 23:19:23.000000000 +0100
+++ drivers/atm/solos-pci.c	2015-09-15 15:10:42.534125968 +0100
@@ -869,8 +869,9 @@ static void solos_bh(unsigned long card_
 		/* Allocate RX skbs for any ports which need them */
 		if (card->using_dma && card->atmdev[port] &&
 		    !card->rx_skb[port]) {
-			struct sk_buff *skb = alloc_skb(RX_DMA_SIZE, GFP_ATOMIC);
+			struct sk_buff *skb = alloc_skb(RX_DMA_SIZE + 16, GFP_ATOMIC);
 			if (skb) {
+				skb_reserve(skb, 16);
 				SKB_CB(skb)->dma_addr =
 					pci_map_single(card->dev, skb->data,
 						       RX_DMA_SIZE, PCI_DMA_FROMDEVICE);
Now, I probably should have done this a long time ago, because that
lack of headroom probably meant that the machine was always having to
reallocate buffers just to fit the Ethernet header on the front of them
when routing incoming packets. So I might be happy enough with
submitting a variant of the above patch and calling it a performance
improvement.
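(For illustration, the usual idiom for keeping RX allocations cheap on the
forwarding path is to reserve NET_SKB_PAD bytes of headroom up front. A
generic sketch, not the actual solos-pci code; 'size' stands in for
whatever length the device needs:)

	/* Generic RX allocation sketch: reserve NET_SKB_PAD bytes of
	 * headroom so the stack can prepend link-layer headers without
	 * reallocating the whole buffer. */
	struct sk_buff *skb = alloc_skb(size + NET_SKB_PAD, GFP_ATOMIC);

	if (skb)
		skb_reserve(skb, NET_SKB_PAD);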
But should the kernel *panic* without it? If there are requirements on
the headroom I must leave on received packets, where are they
documented? Or is this a bug in the IPv6 fragmentation code, to make
such assumptions?
I'm not entirely sure how to interpret the above stack trace. Is the
incoming IPv6 packet being reassembled for netfilter's benefit, then
re-fragmented for transmission?
--
David Woodhouse Open Source Technology Centre
David.Woodhouse@intel.com Intel Corporation
* Re: IPv6 routing/fragmentation panic
From: Michal Kubecek @ 2015-09-15 16:07 UTC (permalink / raw)
To: David Woodhouse; +Cc: netdev
On Tue, Sep 15, 2015 at 04:53:20PM +0100, David Woodhouse wrote:
> I'm not entirely sure how to interpret the above stack trace. Is the
> incoming IPv6 packet being reassembled for netfilter's benefit, then
> re-fragmented for transmission?
Not refragmented. Both the reassembled packet and the original fragments
are kept. The reassembled packet is used for connection tracking and
(since 3.13) netfilter rule matching; the original fragments are then
forwarded on (if they pass the rules).
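(A minimal sketch of where those kept fragments live: after reassembly by
nf_defrag_ipv6, the original fragment skbs hang off the head skb's
frag_list, reachable via the kernel's skb_walk_frags() helper. The
pr_debug() is purely illustrative:)

	/* Walk the original fragments kept on the reassembled skb's
	 * frag_list and dump their geometry. */
	struct sk_buff *frag;

	if (skb_has_frag_list(skb))
		skb_walk_frags(skb, frag)
			pr_debug("frag len=%u headroom=%u\n",
				 frag->len, skb_headroom(frag));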
Michal Kubecek
* Re: IPv6 routing/fragmentation panic
From: Florian Westphal @ 2015-09-15 23:48 UTC (permalink / raw)
To: David Woodhouse; +Cc: netdev
David Woodhouse <dwmw2@infradead.org> wrote:
> I can repeatably crash my router with 'ping6 -s 2000' to an external
> machine:
> [ 61.741618] skbuff: skb_under_panic: text:c1277f1e len:1294 put:14 head:dec98000 data:dec97ffc tail:0xdec9850a end:0xdec98f40 dev:br-lan
> [...]
> [ 61.764005] [<c1277f1e>] ? ip6_finish_output2+0x196/0x4da
Hmm, unlike the IPv4 path, the IPv6 stack doesn't check the headroom
size before prepending the hardware header.
> But should the kernel *panic* without it? If there are requirements on
> the headroom I must leave on received packets, where are they
> documented? Or is this a bug in the IPv6 fragmentation code, to make
> such assumptions?
I'm not sure the ipv6 (re)fragmentation code is to blame here.
In particular, we could have setups where additional headers need to be
inserted which could also require headroom expansion.
> I'm not entirely sure how to interpret the above stack trace. Is the
> incoming IPv6 packet being reassembled for netfilter's benefit, then
> re-fragmented for transmission?
Yes, ipv6 connection tracking depends on defragmentation.
ip6_fragment should use the frag_list of the (reassembled) skb, so no
refragmentation should be happening; we should just be re-using the
original fragment skbs from that frag_list.
What I don't understand is why you see this with fragmented ipv6 packets only
(and not with all ipv6 forwarded skbs).
Something like this copy-pastry from ip_finish_output2 should fix it:
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -62,6 +62,7 @@ static int ip6_finish_output2(struct sock *sk, struct sk_buff *skb)
 	struct net_device *dev = dst->dev;
 	struct neighbour *neigh;
 	struct in6_addr *nexthop;
+	unsigned int hh_len;
 	int ret;
 
 	skb->protocol = htons(ETH_P_IPV6);
@@ -104,6 +105,21 @@ static int ip6_finish_output2(struct sock *sk, struct sk_buff *skb)
 		}
 	}
 
+	hh_len = LL_RESERVED_SPACE(dev);
+	if (unlikely(skb_headroom(skb) < hh_len && dev->header_ops)) {
+		struct sk_buff *skb2;
+
+		skb2 = skb_realloc_headroom(skb, hh_len);
+		if (!skb2) {
+			kfree_skb(skb);
+			return -ENOMEM;
+		}
+		if (skb->sk)
+			skb_set_owner_w(skb2, skb->sk);
+		consume_skb(skb);
+		skb = skb2;
+	}
+
 	rcu_read_lock_bh();
 	nexthop = rt6_nexthop((struct rt6_info *)dst, &ipv6_hdr(skb)->daddr);
 	neigh = __ipv6_neigh_lookup_noref(dst->dev, nexthop);
* Re: IPv6 routing/fragmentation panic
From: David Woodhouse @ 2015-09-16 10:09 UTC (permalink / raw)
To: Florian Westphal; +Cc: netdev, johannes
On Wed, 2015-09-16 at 01:48 +0200, Florian Westphal wrote:
>
> What I don't understand is why you see this with fragmented ipv6
> packets only (and not with all ipv6 forwarded skbs).
>
> Something like this copy-pastry from ip_finish_output2 should fix it:
That works; thanks.
Tested-by: David Woodhouse <David.Woodhouse@intel.com>
A little extra debugging output shows that the offending fragments were
arriving here with skb_headroom(skb)==10. Which is reasonable, being
the Solos ADSL card's header of 8 bytes followed by 2 bytes of PPP
frame type.
The non-fragmented packets, on the other hand, are arriving with a
headroom of 42 bytes. Could something else already have reallocated
them before they get that far? (Do we have any way to gather statistics
on such reallocations? It seems that might be useful for performance
investigation.)
Johannes and I were talking on IRC yesterday about trying to make this
kind of thing easier to reproduce without odd hardware. We postulated a
skb_torture() function which, when an appropriate debugging option was
enabled, would randomly screw around with the skb in various
interesting ways — shifting the data down so that there's no headroom,
deliberately making it *non-linear*, temporarily cloning it and freeing
the clone a couple of seconds later, etc.
Then we could insert calls to skb_torture() in interesting places like
netif_rx(), ip6_finish_output2() and anywhere else that seems
appropriate (perhaps with flags to indicate *what* kind of torture is
permissible in certain locations). And see what breaks...
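(A rough sketch of what such a helper could look like; skb_torture() is
hypothetical, not an existing kernel API, and this version only handles
the headroom case for linear skbs, ignoring the header offsets stored in
the skb:)

	/* Hypothetical debug helper: on roughly 1 in 8 skbs, move the
	 * data of a linear skb down to the very start of the buffer,
	 * leaving zero headroom, so that callers silently assuming
	 * LL_RESERVED_SPACE headroom blow up early and loudly. */
	static void skb_torture(struct sk_buff *skb)
	{
		unsigned int len = skb_headlen(skb);

		if (!skb_headroom(skb) || (prandom_u32() & 7))
			return;

		memmove(skb->head, skb->data, len);
		skb->data = skb->head;
		skb_set_tail_pointer(skb, len);
	}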
--
David Woodhouse Open Source Technology Centre
David.Woodhouse@intel.com Intel Corporation
* Re: IPv6 routing/fragmentation panic
From: Florian Westphal @ 2015-09-16 13:27 UTC (permalink / raw)
To: David Woodhouse; +Cc: Florian Westphal, netdev, johannes
David Woodhouse <dwmw2@infradead.org> wrote:
> On Wed, 2015-09-16 at 01:48 +0200, Florian Westphal wrote:
> >
> > What I don't understand is why you see this with fragmented ipv6
> > packets only (and not with all ipv6 forwarded skbs).
> >
> > Something like this copy-pastry from ip_finish_output2 should fix it:
>
> That works; thanks.
>
> Tested-by: David Woodhouse <David.Woodhouse@intel.com>
>
> A little extra debugging output shows that the offending fragments were
> arriving here with skb_headroom(skb)==10. Which is reasonable, being
> the Solos ADSL card's header of 8 bytes followed by 2 bytes of PPP
> frame type.
>
> The non-fragmented packets, on the other hand, are arriving with a
> headroom of 42 bytes. Could something else already have reallocated
> them before they get that far?
Yep. I missed the

	if (skb_cow(skb, dst->dev->hard_header_len)) {

call in ip6_forward().
Problem is of course that we only expand the headroom of the skb itself,
and not of the fragment(s) stored in that skb's frag_list.
So we have several options for a fix:

- expand headroom in ip6_finish_output2, like we do for ipv4
- expand headroom in ip6_fragment
- defer to the slow path if the frags don't have enough headroom

The latter is the smallest patch and would not add a test for locally
generated, non-fragmented skbs.
(not even compile tested)
David, could you test this? I'd do an official patch submission then.
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -586,6 +586,7 @@ int ip6_fragment(struct sock *sk, struct sk_buff *skb,
 	frag_id = ipv6_select_ident(net, &ipv6_hdr(skb)->daddr,
 				    &ipv6_hdr(skb)->saddr);
 
+	hroom = LL_RESERVED_SPACE(rt->dst.dev);
 	if (skb_has_frag_list(skb)) {
 		int first_len = skb_pagelen(skb);
 		struct sk_buff *frag2;
@@ -599,7 +600,7 @@ int ip6_fragment(struct sock *sk, struct sk_buff *skb,
 			/* Correct geometry. */
 			if (frag->len > mtu ||
 			    ((frag->len & 7) && frag->next) ||
-			    skb_headroom(frag) < hlen)
+			    skb_headroom(frag) < (hlen + hroom))
 				goto slow_path_clean;
 
 			/* Partially cloned skb? */
@@ -724,7 +725,6 @@ slow_path:
 	 */
 	*prevhdr = NEXTHDR_FRAGMENT;
 
-	hroom = LL_RESERVED_SPACE(rt->dst.dev);
 	troom = rt->dst.dev->needed_tailroom;
 
 	/*
* Re: IPv6 routing/fragmentation panic
From: David Woodhouse @ 2015-09-16 13:34 UTC (permalink / raw)
To: Florian Westphal; +Cc: netdev, johannes
On Wed, 2015-09-16 at 15:27 +0200, Florian Westphal wrote:
>
> David, could you test this? I'd do an official patch submission
> then.
Compiles. Doesn't fix the problem.
--
dwmw2
* Re: IPv6 routing/fragmentation panic
From: David Woodhouse @ 2015-09-16 14:00 UTC (permalink / raw)
To: Florian Westphal; +Cc: netdev, johannes
On Wed, 2015-09-16 at 15:27 +0200, Florian Westphal wrote:
> @@ -599,7 +600,7 @@ int ip6_fragment(struct sock *sk, struct sk_buff *skb,
> 			/* Correct geometry. */
> 			if (frag->len > mtu ||
> 			    ((frag->len & 7) && frag->next) ||
> -			    skb_headroom(frag) < hlen)
> +			    skb_headroom(frag) < (hlen + hroom))
> 				goto slow_path_clean;
>
> 			/* Partially cloned skb? */
My test is 'ping6 -s 2000', and I end up with a fragment of 1280 bytes
followed by a fragment of 776 bytes.
The test cited above is only actually running on the latter fragment
(which for some reason is fine and has headroom of 58 bytes).
The first, larger, fragment isn't being checked. And that's the one
with only 10 bytes of headroom.
[ 62.027984] has frag list
[ 62.030616] line 604 check frag ddc5fcc0 len 776 headroom 58 (hlen 40 hroom 16)
[ 62.036720] line 678 send skb ded050c0 len 1280 headroom 10
[   62.041096] skbuff: skb_under_panic: text:c125f9ca len:1294 put:14 head:dec89000 data:dec88ffc tail:0xdec8950a end:0xdec89f50 dev:br-lan
--
dwmw2
* Re: IPv6 routing/fragmentation panic
From: Florian Westphal @ 2015-09-16 15:30 UTC (permalink / raw)
To: David Woodhouse; +Cc: Florian Westphal, netdev, johannes
David Woodhouse <dwmw2@infradead.org> wrote:
> >			if (frag->len > mtu ||
> >			    ((frag->len & 7) && frag->next) ||
> > -			    skb_headroom(frag) < hlen)
> > +			    skb_headroom(frag) < (hlen + hroom))
> >				goto slow_path_clean;
> >
> >			/* Partially cloned skb? */
>
> My test is 'ping -s 2000', and I end up with a fragment of 1280 bytes
> followed by a fragment of 776 bytes.
>
> The test cited above is only actually running on the latter fragment
> (which for some reason is fine and has headroom of 58 bytes).
>
> The first, larger, fragment isn't being checked. And that's the one
> with only 10 bytes of headroom.
Thanks for this detailed analysis.
I've sent a patch that should address all of these issues.
Turns out that all tests are wrong in your case.
ip6_fragment doesn't expand headroom, since this skb had the ipv6
fragment header pulled, so that part thinks there are 18 bytes
available (we later push the frag header back when sending fragments).
The 'skb_headroom(frag) < hlen' check is wrong since it accounts for
neither the device header length nor the fragment header that we need
to push.
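(Putting the thread's numbers together, and assuming the usual 14-byte
Ethernet header on br-lan, the accounting failure works out as:)

	RX headroom of the big (head) fragment:       10 bytes
	8-byte frag header already pulled, so
	ip6_forward() sees skb_headroom() ==          18 bytes
	skb_cow(skb, hard_header_len=14) passes       (18 >= 14, no expansion)
	frag header pushed back in ip6_fragment():    10 bytes left
	ip6_finish_output2() pushes 14-byte header:   -4 bytes
	=> data ends up 4 bytes below head, matching the original report
	   (head:dec98000 data:dec97ffc, put:14)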
Thread overview: 8+ messages
2015-09-15 15:53 IPv6 routing/fragmentation panic David Woodhouse
2015-09-15 16:07 ` Michal Kubecek
2015-09-15 23:48 ` Florian Westphal
2015-09-16 10:09 ` David Woodhouse
2015-09-16 13:27 ` Florian Westphal
2015-09-16 13:34 ` David Woodhouse
2015-09-16 14:00 ` David Woodhouse
2015-09-16 15:30 ` Florian Westphal