netdev.vger.kernel.org archive mirror
* BUG: MARK in OUTPUT + ip_tunnel causes kernel panic
@ 2013-08-21 14:00 Kristian Evensen
  2013-09-25  8:31 ` Konstantin Kuzov
  0 siblings, 1 reply; 5+ messages in thread
From: Kristian Evensen @ 2013-08-21 14:00 UTC (permalink / raw)
  To: netdev

Hello,

When trying to tunnel traffic originating from the same machine as the
tunnel endpoint, I am experiencing kernel panics for some types of
traffic (ICMP and UDP). TCP seems not to be affected by this, at least
I have not been able to trigger the panic.

I have one tunnel (without an IP address) and use policy routing to
steer some traffic through it. For my example, I use the
following commands to configure the tunnel and the routing:

ip tunnel add tun0 mode ipip remote 10.110.112.2 local 10.110.112.1
ip link set dev tun0 up
ip rule add fwmark 0x1 lookup 101
ip ro add default dev tun0 table 101
iptables -A OUTPUT -t mangle -d 8.8.8.8 -j MARK --set-mark 0x1

The remote address of the tunnel does not matter, as the packets never
get that far. I then run either ping 8.8.8.8 or nc -u 8.8.8.8 9999; both
trigger the panic:

With ICMP:
skbuff: skb_under_panic: text:ffffffff815f9baf len:118 put:14 head:ffff880100cff800 data:ffff880100cff7ee tail:0x64 end:0xc0 dev:eth4

With UDP:
skbuff: skb_under_panic: text:ffffffff815f9baf len:71 put:14 head:ffff880118554c00 data:ffff880118554bee tail:0x35 end:0xc0 dev:eth4
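
A note on reading the two reports above: in skb_under_panic output, put is the number of bytes the failing skb_push() tried to prepend, and head/data are the start of the buffer and the data pointer after that push. The small Python sketch below does the subtraction for both reports. The helper name underrun is made up for illustration, and the interpretation in the comments is a reading of the numbers, not something the report itself states.

```python
import re


def underrun(panic_line):
    """Return (bytes below head after the push, bytes below head before it)."""
    # Collect the key:value pairs such as head:..., data:..., put:...
    fields = dict(re.findall(r"(\w+):(\S+)", panic_line))
    head = int(fields["head"], 16)
    data = int(fields["data"], 16)
    put = int(fields["put"])
    after = head - data      # how far data sits below head once the push is done
    before = after - put     # the same distance just before the push
    return after, before


icmp = ("skbuff: skb_under_panic: text:ffffffff815f9baf len:118 put:14 "
        "head:ffff880100cff800 data:ffff880100cff7ee tail:0x64 end:0xc0 dev:eth4")
udp = ("skbuff: skb_under_panic: text:ffffffff815f9baf len:71 put:14 "
       "head:ffff880118554c00 data:ffff880118554bee tail:0x35 end:0xc0 dev:eth4")

print(underrun(icmp))  # (18, 4)
print(underrun(udp))   # (18, 4)
```

In both cases the data pointer ends up 18 bytes below head, and it was already 4 bytes below head before the 14-byte push that tripped the check. That would be consistent with an earlier, unchecked push having overrun the headroom first.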

Stack trace is the same:
[  304.217036] ------------[ cut here ]------------
[  304.217106] Kernel BUG at ffffffff816b75cc [verbose debug info unavailable]
[  304.217195] invalid opcode: 0000 [#1] SMP
[  304.217278] Modules linked in: iptable_mangle xt_mark ip_tables x_tables ipip tunnel4 ip_tunnel netconsole configfs asix usbnet mii i915 snd_hda_codec_hdmi snd_hda_intel joydev snd_hda_codec hid_generic drm_kms_helper snd_hwdep drm usbhid hid snd_pcm video snd_page_alloc mac_hid snd_timer snd lpc_ich soundcore i2c_algo_bit lp parport e1000e ahci libahci ptp pps_core [last unloaded: netconsole]
[  304.218140] CPU: 3 PID: 1437 Comm: nc Not tainted 3.11.0-rc5 #33
[  304.218221] Hardware name:                  /D33217GKE, BIOS GKPPT10H.86A.0020.2012.0919.2135 09/19/2012
[  304.218325] task: ffff880110555d40 ti: ffff88011ef4c000 task.ti: ffff88011ef4c000
[  304.218411] RIP: 0010:[<ffffffff816b75cc>]  [<ffffffff816b75cc>] skb_panic+0x63/0x65
[  304.218528] RSP: 0018:ffff88011ef4d8d0  EFLAGS: 00010292
[  304.218595] RAX: 0000000000000084 RBX: ffff88011f1e8f00 RCX: 0000000000000006
[  304.218677] RDX: 0000000000000007 RSI: 0000000000000046 RDI: ffff88011f38d490
[  304.218759] RBP: ffff88011ef4d8f0 R08: 0000000002000000 R09: 0000000000000300
[  304.218841] R10: ffff88002ed6b880 R11: 0000000000000000 R12: ffff8801104f72d8
[  304.218923] R13: 000000000000000e R14: ffff8801104f72e8 R15: 0000000000000000
[  304.219005] FS:  00007f49f196d740(0000) GS:ffff88011f380000(0000) knlGS:0000000000000000
[  304.219095] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  304.219164] CR2: 00007f49f10799d0 CR3: 00000001184ed000 CR4: 00000000001407e0
[  304.219245] Stack:
[  304.219273]  ffff880118554bee 0000000000000035 00000000000000c0 ffff8801193d7000
[  304.219411]  ffff88011ef4d900 ffffffff815b183a ffff88011ef4d948 ffffffff815f9baf
[  304.219544]  ffff8801193d7000 02706e0a81d00380 ffff88011f1e8f00 0000000000000000
[  304.219672] Call Trace:
[  304.219724]  [<ffffffff815b183a>] skb_push+0x3a/0x40
[  304.219794]  [<ffffffff815f9baf>] ip_finish_output+0x2af/0x3d0
[  304.219873]  [<ffffffff815fa5d5>] ip_output+0x55/0x90
[  304.219941]  [<ffffffff815f9d85>] ip_local_out+0x25/0x30
[  304.220014]  [<ffffffff8163be77>] iptunnel_xmit+0x1a7/0x1e0
[  304.220095]  [<ffffffffa0284d49>] ip_tunnel_xmit+0x2e9/0x15a0 [ip_tunnel]
[  304.220181]  [<ffffffffa028a6a1>] ipip_tunnel_xmit+0x61/0x80 [ipip]
[  304.220264]  [<ffffffff815c3138>] dev_hard_start_xmit+0x338/0x510
[  304.220342]  [<ffffffffa02a710b>] ? iptable_mangle_hook+0x7b/0x13c [iptable_mangle]
[  304.220435]  [<ffffffff815c363f>] dev_queue_xmit+0x32f/0x490
[  304.220512]  [<ffffffff815c9401>] neigh_direct_output+0x11/0x20
[  304.220589]  [<ffffffff815f9aaf>] ip_finish_output+0x1af/0x3d0
[  304.220665]  [<ffffffff815fa5d5>] ip_output+0x55/0x90
[  304.220735]  [<ffffffff815f9d85>] ip_local_out+0x25/0x30
[  304.220806]  [<ffffffff815fb075>] ip_send_skb+0x15/0x50
[  304.220880]  [<ffffffff816201f7>] udp_send_skb+0x227/0x2b0
[  304.220953]  [<ffffffff815f82d0>] ? ip_copy_metadata+0x1a0/0x1a0
[  304.221034]  [<ffffffff81621c14>] udp_sendmsg+0x2c4/0x9e0
[  304.221108]  [<ffffffff81128f00>] ? __page_cache_alloc+0xc0/0xd0
[  304.221186]  [<ffffffff8112af2d>] ? filemap_fault+0xbd/0x470
[  304.221259]  [<ffffffff81129443>] ? unlock_page+0x23/0x30
[  304.221335]  [<ffffffff8114dc39>] ? __do_fault+0x3a9/0x4c0
[  304.221407]  [<ffffffff8162ce73>] inet_sendmsg+0x63/0xb0
[  304.221481]  [<ffffffff815aa36f>] sock_aio_write+0x13f/0x160
[  304.221555]  [<ffffffff81402e32>] ? n_tty_set_room+0x12/0xc0
[  304.221634]  [<ffffffff8118def0>] do_sync_write+0x80/0xb0
[  304.221707]  [<ffffffff8118edc5>] vfs_write+0x1b5/0x1e0
[  304.221778]  [<ffffffff8118f1c2>] SyS_write+0x52/0xa0
[  304.221850]  [<ffffffff816c4346>] system_call_fastpath+0x1a/0x1f
[  304.221924] Code: 00 00 48 89 44 24 10 8b 87 d0 00 00 00 48 89 44 24 08 48 8b 87 e0 00 00 00 48 c7 c7 58 14 ad 81 48 89 04 24 31 c0 e8 d9 97 ff ff <0f> 0b 0f 1f 44 00 00 55 48 89 e5 41 56 41 55 41 54 53 48 89 fb
[  304.222830] RIP  [<ffffffff816b75cc>] skb_panic+0x63/0x65
[  304.222910]  RSP <ffff88011ef4d8d0>
[  304.222993] ---[ end trace e19b480453293c10 ]---
[  304.223086] Kernel panic - not syncing: Fatal exception in interrupt
[  304.223221] drm_kms_helper: panic occurred, switching back to text console

An interesting thing is that I have seen different kernel panics being
triggered. The other one I have seen has RIP pointing to
e1000_xmit_frame() and the message "protocol 0800 is buggy". However,
the one I have posted is by far the most common.

What puzzles me is this additional skb_push()-call. I have not been
able to trace its origin. As far as I can see, the skb data is not
touched after iptunnel_xmit(). It is clear that some bogus data is
introduced to the head of the packet at some point after
iptunnel_xmit():

[  500.537604] ip_tunnel_core: Len before push: 84 (size of iphdr struct 20)
[  500.541096] ip_tunnel_core: Len after push: 104
[  500.650000] ip_tunnel_core: 4 5 2706e0a 1706e0a (protocol, header length, dst, src ip)
[  500.735787] rawpost: IN= OUT=eth4 SRC=236.168.107.243 DST=138.226.8.0 LEN=0 TOS=0x00 PREC=0x00 TTL=1 ID=180 DF FRAG:7704 PROTO=UDPLITE MARK=0x1

The first three lines are debug output that I added to iptunnel_xmit(),
while the last line is from iptables' LOG target used in rawpost
(combined with a DROP to avoid triggering the panic).
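
The two hexadecimal words in the third debug line can be checked against the tunnel endpoints configured earlier. The sketch below assumes they are the raw 32-bit address fields of the outer header printed as plain hex on this little-endian x86-64 machine; the exact debug printk format is not shown in the mail, so that is an assumption, and decode_le_ipv4 is a hypothetical helper name.

```python
import socket
import struct


def decode_le_ipv4(hex_word):
    """Turn a network-order address, printed as a little-endian integer, into dotted-quad form."""
    # Packing the integer little-endian restores the byte order it had in memory,
    # which is the on-the-wire (network) order of the address.
    return socket.inet_ntoa(struct.pack("<I", int(hex_word, 16)))


print(decode_le_ipv4("2706e0a"))  # 10.110.112.2
print(decode_le_ipv4("1706e0a"))  # 10.110.112.1
```

Under that assumption, the header written in iptunnel_xmit() carries the expected remote (10.110.112.2) and local (10.110.112.1) addresses, unlike the SRC and DST values that the LOG target prints later.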

I do not see this kernel panic when I mangle in PREROUTING, give the
interface an address, or don't mangle at all (for example ping -I tun0
8.8.8.8). The machine this occurs on is an Intel NUC with the Intel
82579 Gigabit Ethernet Controller. I have tested this against the latest
net-next (pulled this morning).

Thanks in advance for any help,
Kristian


* Re: BUG: MARK in OUTPUT + ip_tunnel causes kernel panic
  2013-08-21 14:00 BUG: MARK in OUTPUT + ip_tunnel causes kernel panic Kristian Evensen
@ 2013-09-25  8:31 ` Konstantin Kuzov
  2013-09-25  8:59   ` Steffen Klassert
  0 siblings, 1 reply; 5+ messages in thread
From: Konstantin Kuzov @ 2013-09-25  8:31 UTC (permalink / raw)
  To: netdev

Kristian Evensen <kristian.evensen <at> gmail.com> writes:

> When trying to tunnel traffic originating from the same machine as the
> tunnel endpoint, I am experiencing kernel panics for some types of
> traffic (ICMP and UDP). TCP seems not to be affected by this, at least
> I have not been able to trigger the panic.
> 
> I have one tunnel (without an IP address) and use policy routing to
> steer some traffic through the tunnels.
[...]
> An interesting thing is that I have seen different kernel panics being
> triggered. The other one I have seen has RIP pointing to
> e1000_xmit_frame() and the message "protocol 0800 is buggy". However,
> the one I have posted is by far the most common.

I'm experiencing the same issue on two different machines. It happens on every
kernel since 3.10, when ip_tunnel/ip_tunnel_core were introduced.

In my configuration, however, the tunnel interfaces have addresses and I only
do masquerading in POSTROUTING.

As in your case, it triggers different kernel panics depending on which kernel
modules are involved (NIC, vbox, etc.). Here are some samples:
http://nosferatu.g0x.ru/pub/kerneloops/

The most common one looks like this:

[   81.797190] skbuff: skb_under_panic: text:ffffffff8170307f len:142 put:14 head:ffff88040cd12a00 data:ffff88040cd129ee tail:0x7c end:0xc0 dev:v33
[   81.797240] ------------[ cut here ]------------
[   81.797256] kernel BUG at net/core/skbuff.c:126!
[   81.797272] invalid opcode: 0000 [#1] SMP 
[   81.797291] Modules linked in: ext2
[   81.797309] CPU: 0 PID: 4654 Comm: ffmpeg Not tainted 3.11.1 #3
[   81.797328] Hardware name: Gigabyte Technology Co., Ltd. Z68A-D3H-B3/Z68A-D3H-B3, BIOS F13 03/20/2012
[   81.797356] task: ffff88040cd5bc80 ti: ffff880407c38000 task.ti: ffff880407c38000
[   81.797380] RIP: 0010:[<ffffffff81881a55>]  [<ffffffff81881a55>] skb_panic+0x5e/0x60
[   81.797411] RSP: 0000:ffff88041fa03898  EFLAGS: 00010296
[   81.797428] RAX: 0000000000000084 RBX: ffff88040a2bd300 RCX: 0000000000000000
[   81.797450] RDX: ffff88041fa0eb48 RSI: ffff88041fa0d258 RDI: ffff88041fa0d258
[   81.797472] RBP: ffff88041fa038b8 R08: 0000000000000000 R09: 00000000000003ca
[   81.797494] R10: 0000000000000001 R11: 0000000000aaaaaa R12: ffff88040b90aed8
[   81.797516] R13: 000000000000000e R14: ffff88040b90aee8 R15: 0000000000000000
[   81.797538] FS:  00007ff3002a0740(0000) GS:ffff88041fa00000(0000) knlGS:0000000000000000
[   81.797564] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   81.797582] CR2: 0000000008212000 CR3: 000000040a097000 CR4: 00000000000407f0
[   81.797604] Stack:
[   81.797613]  ffff88040cd129ee 000000000000007c 00000000000000c0 ffff88040c71e000
[   81.797643]  ffff88041fa038c8 ffffffff816971d5 ffff88041fa03928 ffffffff8170307f
[   81.797673]  ffffffff81ee10a0 ffff88040c71e000 ffff88040d79dfc0 fe971fac81ee10a0
[   81.797703] Call Trace:
[   81.797713]  <IRQ> 
[   81.797721] 
[   81.797730]  [<ffffffff816971d5>] skb_push+0x35/0x40
[   81.797745]  [<ffffffff8170307f>] ip_finish_output+0x2af/0x3a0
[   81.797765]  [<ffffffff81703a98>] ip_output+0x88/0x90
[   81.797782]  [<ffffffff81703214>] ip_local_out+0x24/0x30
[   81.797801]  [<ffffffff8174291b>] iptunnel_xmit+0x17b/0x1b0
[   81.797820]  [<ffffffff81744560>] ip_tunnel_xmit+0x2e0/0x7d0
[   81.797839]  [<ffffffff8174a9ec>] ipip_tunnel_xmit+0x5c/0x70
[   81.797859]  [<ffffffff816a8240>] dev_hard_start_xmit+0x300/0x510
[   81.798750]  [<ffffffff81758668>] ? nf_nat_ipv4_out+0x58/0x100
[   81.799648]  [<ffffffff816a8718>] dev_queue_xmit+0x2c8/0x460
[   81.800543]  [<ffffffff816ae14c>] neigh_direct_output+0xc/0x10
[   81.801434]  [<ffffffff81702f7f>] ip_finish_output+0x1af/0x3a0
[   81.802337]  [<ffffffff81703a98>] ip_output+0x88/0x90
[   81.803243]  [<ffffffff81703214>] ip_local_out+0x24/0x30
[   81.804143]  [<ffffffff817044f4>] ip_send_skb+0x14/0x50
[   81.805040]  [<ffffffff81704562>] ip_push_pending_frames+0x32/0x40
[   81.805950]  [<ffffffff8172e1ae>] icmp_push_reply+0xee/0x120
[   81.806851]  [<ffffffff8172e969>] icmp_send+0x419/0x490
[   81.807750]  [<ffffffff816a6082>] ? __netif_receive_skb_core+0x622/0x7f0
[   81.808651]  [<ffffffff81077c00>] ? update_curr+0x10/0x160
[   81.809552]  [<ffffffff816b00e0>] ? neigh_invalidate+0x120/0x120
[   81.810445]  [<ffffffff816f933d>] ipv4_link_failure+0x1d/0x70
[   81.811333]  [<ffffffff8172c50d>] arp_error_report+0x2d/0x40
[   81.812217]  [<ffffffff816b004c>] neigh_invalidate+0x8c/0x120
[   81.813102]  [<ffffffff816b0316>] neigh_timer_handler+0x236/0x2a0
[   81.813989]  [<ffffffff8104fb3a>] call_timer_fn+0x3a/0x110
[   81.814883]  [<ffffffff816b00e0>] ? neigh_invalidate+0x120/0x120
[   81.815787]  [<ffffffff81050ea0>] run_timer_softirq+0x1c0/0x2a0
[   81.816695]  [<ffffffff81048ee9>] __do_softirq+0xe9/0x230
[   81.817605]  [<ffffffff81049185>] irq_exit+0x95/0xa0
[   81.818513]  [<ffffffff8102da75>] smp_apic_timer_interrupt+0x45/0x60
[   81.819431]  [<ffffffff8188feca>] apic_timer_interrupt+0x6a/0x70
[   81.820355]  <EOI> 
[   81.820363] 
[   81.821277]  [<ffffffff8188f312>] ? system_call_fastpath+0x16/0x1b
[   81.822208] Code: 00 00 48 89 44 24 10 8b 87 d0 00 00 00 48 89 44 24 08 48 8b 87 e0 00 00 00 48 c7 c7 80 fa c5 81 48 89 04 24 31 c0 e8 94 95 ff ff <0f> 0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55
[   81.823365] RIP  [<ffffffff81881a55>] skb_panic+0x5e/0x60
[   81.824409]  RSP <ffff88041fa03898>

I also can't trace why this happens and, strangely, I can't reproduce the
issue on VirtualBox. Have you discovered anything more about it in the
past month?


* Re: BUG: MARK in OUTPUT + ip_tunnel causes kernel panic
  2013-09-25  8:31 ` Konstantin Kuzov
@ 2013-09-25  8:59   ` Steffen Klassert
  2013-09-26  2:06     ` Konstantin Kuzov
  2013-09-26 18:27     ` Kristian Evensen
  0 siblings, 2 replies; 5+ messages in thread
From: Steffen Klassert @ 2013-09-25  8:59 UTC (permalink / raw)
  To: Konstantin Kuzov; +Cc: netdev

On Wed, Sep 25, 2013 at 08:31:52AM +0000, Konstantin Kuzov wrote:
> Kristian Evensen <kristian.evensen <at> gmail.com> writes:
> 
> > When trying to tunnel traffic originating from the same machine as the
> > tunnel endpoint, I am experiencing kernel panics for some types of
> > traffic (ICMP and UDP). TCP seems not to be affected by this, at least
> > I have not been able to trigger the panic.
> > 
> > I have one tunnel (without an IP address) and use policy routing to
> > steer some traffic through the tunnels.
> [...]
> > An interesting thing is that I have seen different kernel panics being
> > triggered. The other one I have seen has RIP pointing to
> > e1000_xmit_frame() and the message "protocol 0800 is buggy". However,
> > the one I have posted is by far the most common.
> I'm experiencing the same issue on two different machines. It happens on any 
> kernel starting from 3.10 when ip_tunnel/ip_tunnel_core were introduced.
> 

Can you please try the patch below?
I've posted the same patch already to netdev in the morning.

Subject: [PATCH net 1/2] ip_tunnel: Fix a memory corruption in ip_tunnel_xmit

We might extend the used area of an skb beyond the total
headroom when we install the ipip header. Fix this by
calling skb_cow_head() unconditionally.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/ipv4/ip_tunnel.c |   12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
index ac9fabe..b8ce640 100644
--- a/net/ipv4/ip_tunnel.c
+++ b/net/ipv4/ip_tunnel.c
@@ -641,13 +641,13 @@ void ip_tunnel_xmit(struct sk_buff *skb, struct net_device *dev,
 
 	max_headroom = LL_RESERVED_SPACE(rt->dst.dev) + sizeof(struct iphdr)
 			+ rt->dst.header_len;
-	if (max_headroom > dev->needed_headroom) {
+	if (max_headroom > dev->needed_headroom)
 		dev->needed_headroom = max_headroom;
-		if (skb_cow_head(skb, dev->needed_headroom)) {
-			dev->stats.tx_dropped++;
-			dev_kfree_skb(skb);
-			return;
-		}
+
+	if (skb_cow_head(skb, dev->needed_headroom)) {
+		dev->stats.tx_dropped++;
+		dev_kfree_skb(skb);
+		return;
 	}
 
 	err = iptunnel_xmit(rt, skb, fl4.saddr, fl4.daddr, protocol,
-- 
1.7.9.5
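
To illustrate what the change above does, here is a deliberately simplified userspace model in Python, not kernel code. It tracks only the headroom of a packet and compares the old logic (make room only when the device's cached needed_headroom grows) with the new logic (make room on every transmit). The class and function names are hypothetical, and the starting headroom of 16 bytes and the link-layer reserve of 16 bytes are assumptions chosen for the example; the real skb_cow_head() also handles cloned buffers and allocation failure, which the model ignores.

```python
from dataclasses import dataclass

IPHDR_LEN = 20  # sizeof(struct iphdr) without options
ETH_HLEN = 14   # link-layer header pushed on output (put:14 in the reports)


@dataclass
class Skb:
    headroom: int  # bytes between head and data; negative means data < head

    def cow_head(self, needed):
        # Model of skb_cow_head(): guarantee at least `needed` bytes of headroom.
        if self.headroom < needed:
            self.headroom = needed


@dataclass
class Dev:
    needed_headroom: int = 0


def tunnel_xmit(skb, dev, max_headroom, always_cow):
    """Return the headroom left after the outer IP and Ethernet headers are pushed."""
    grew = max_headroom > dev.needed_headroom
    if grew:
        dev.needed_headroom = max_headroom
    if always_cow or grew:       # old code: only when `grew`; patched code: always
        skb.cow_head(dev.needed_headroom)
    skb.headroom -= IPHDR_LEN    # outer IP header installed by the tunnel
    skb.headroom -= ETH_HLEN     # Ethernet header added on the way out
    return skb.headroom


MAX_HEADROOM = 16 + IPHDR_LEN    # assumed link-layer reserve of 16, header_len of 0


def run(always_cow):
    dev = Dev()
    first = tunnel_xmit(Skb(headroom=16), dev, MAX_HEADROOM, always_cow)
    second = tunnel_xmit(Skb(headroom=16), dev, MAX_HEADROOM, always_cow)
    return first, second


print(run(always_cow=False))  # (2, -18)
print(run(always_cow=True))   # (2, 2)
```

In the model, the first packet is fine either way because it is the one that raises needed_headroom. With the old logic the second packet skips the headroom check and ends up 18 bytes short, the same distance below head that the skb_under_panic reports in this thread show; with the unconditional call both packets keep a positive headroom.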


* Re: BUG: MARK in OUTPUT + ip_tunnel causes kernel panic
  2013-09-25  8:59   ` Steffen Klassert
@ 2013-09-26  2:06     ` Konstantin Kuzov
  2013-09-26 18:27     ` Kristian Evensen
  1 sibling, 0 replies; 5+ messages in thread
From: Konstantin Kuzov @ 2013-09-26  2:06 UTC (permalink / raw)
  To: Steffen Klassert; +Cc: netdev

On Wed, 25 Sep 2013 10:59:47 +0200
Steffen Klassert wrote:

 > > > When trying to tunnel traffic originating from the same machine as the
 > > > tunnel endpoint, I am experiencing kernel panics for some types of
 > > > traffic (ICMP and UDP). TCP seems not to be affected by this, at least
 > > > I have not been able to trigger the panic.
 > > > 
 > > > I have one tunnel (without an IP address) and use policy routing to
 > > > steer some traffic through the tunnels.  
 > > [...]  
 > > > An interesting thing is that I have seen different kernel panics being
 > > > triggered. The other one I have seen has RIP pointing to
 > > > e1000_xmit_frame() and the message "protocol 0800 is buggy". However,
 > > > the one I have posted is by far the most common.  
 > > I'm experiencing the same issue on two different machines. It happens on any 
 > > kernel starting from 3.10 when ip_tunnel/ip_tunnel_core were introduced.
 > >   
 > Can you please try the patch below?
 > I've posted the same patch already to netdev in the morning.
 > 
 > Subject: [PATCH net 1/2] ip_tunnel: Fix a memory corruption in ip_tunnel_xmit  
 
Thank you very much. Everything works fine with that patch applied.


* Re: BUG: MARK in OUTPUT + ip_tunnel causes kernel panic
  2013-09-25  8:59   ` Steffen Klassert
  2013-09-26  2:06     ` Konstantin Kuzov
@ 2013-09-26 18:27     ` Kristian Evensen
  1 sibling, 0 replies; 5+ messages in thread
From: Kristian Evensen @ 2013-09-26 18:27 UTC (permalink / raw)
  To: Steffen Klassert; +Cc: Konstantin Kuzov, Network Development

On Wed, Sep 25, 2013 at 10:59 AM, Steffen Klassert
<steffen.klassert@secunet.com> wrote:
> Can you please try the patch below?
> I've posted the same patch already to netdev in the morning.

Thank you very much for this patch. I am currently travelling, but
will give it a try when I am back in the office next week.

-Kristian

