* destroy_conntrack GPF in 3.7.9
@ 2013-03-06 15:59 Dave Jones
2013-03-06 16:44 ` Dave Jones
2013-03-06 16:54 ` Eric Dumazet
0 siblings, 2 replies; 6+ messages in thread
From: Dave Jones @ 2013-03-06 15:59 UTC (permalink / raw)
To: netdev; +Cc: Fedora Kernel Team
I know 3.7.9 is EOL, but this code doesn't look like it's changed in current.
(unless the cause/fix was in code unrelated to these paths)
A user reported the following GPF..
general protection fault: 0000 [#1] SMP
Modules linked in: ipheth fuse ebtable_nat xt_CHECKSUM bridge stp llc ip6t_REJECT iptable_mangle nf_conntrack(-) ebtable_filter ebtables snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_page_alloc hp_wmi snd_timer coretemp iTCO_wdt tg3 snd sparse_keymap rfkill soundcore iTCO_vendor_support lpc_ich i7core_edac edac_core serio_raw microcode mfd_core vhost_net tun macvtap macvlan kvm_intel kvm binfmt_misc uinput nouveau mxm_wmi crc32c_intel video i2c_algo_bit drm_kms_helper ttm firewire_ohci firewire_core drm crc_itu_t i2c_core wmi [last unloaded: xt_conntrack]
CPU 2
Pid: 25407, comm: qemu-kvm Not tainted 3.7.9-205.fc18.x86_64 #1 Hewlett-Packard HP Z400 Workstation/0B4Ch
RIP: 0010:[<ffffffffa0399bd5>] [<ffffffffa0399bd5>] destroy_conntrack+0x35/0x120 [nf_conntrack]
RSP: 0018:ffff880276913d78 EFLAGS: 00010206
RAX: 50626b6b7876376c RBX: ffff88026e530d68 RCX: ffff88028d158e00
RDX: ffff88026d0d5470 RSI: 0000000000000011 RDI: 0000000000000002
RBP: ffff880276913d88 R08: 0000000000000000 R09: ffff880295002900
R10: 0000000000000000 R11: 0000000000000003 R12: ffffffff81ca3b40
R13: ffffffff8151a8e0 R14: ffff880270875000 R15: 0000000000000002
FS: 00007ff3bce38a00(0000) GS:ffff88029fc40000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007fd1430bd000 CR3: 000000027042b000 CR4: 00000000000027e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process qemu-kvm (pid: 25407, threadinfo ffff880276912000, task ffff88028c369720)
Stack:
ffff880156f59100 ffff880156f59100 ffff880276913d98 ffffffff815534f7
ffff880276913db8 ffffffff8151a74b ffff880270875000 ffff880156f59100
ffff880276913dd8 ffffffff8151a5a6 ffff880276913dd8 ffff88026d0d5470
Call Trace:
[<ffffffff815534f7>] nf_conntrack_destroy+0x17/0x20
[<ffffffff8151a74b>] skb_release_head_state+0x7b/0x100
[<ffffffff8151a5a6>] __kfree_skb+0x16/0xa0
[<ffffffff8151a666>] kfree_skb+0x36/0xa0
[<ffffffff8151a8e0>] skb_queue_purge+0x20/0x40
[<ffffffffa02205f7>] __tun_detach+0x117/0x140 [tun]
[<ffffffffa022184c>] tun_chr_close+0x3c/0xd0 [tun]
[<ffffffff8119669c>] __fput+0xec/0x240
[<ffffffff811967fe>] ____fput+0xe/0x10
[<ffffffff8107eb27>] task_work_run+0xa7/0xe0
[<ffffffff810149e1>] do_notify_resume+0x71/0xb0
[<ffffffff81640152>] int_signal+0x12/0x17
Code: 00 00 04 48 89 e5 41 54 53 48 89 fb 4c 8b a7 e8 00 00 00 0f 85 de 00 00 00 0f b6 73 3e 0f b7 7b 2a e8 10 40 00 00 48 85 c0 74 0e <48> 8b 40 28 48 85 c0 74 05 48 89 df ff d0 48 c7 c7 08 6a 3a a0
RIP [<ffffffffa0399bd5>] destroy_conntrack+0x35/0x120 [nf_conntrack]
RSP <ffff880276913d78>
/* To make sure we don't get any weird locking issues here:
* destroy_conntrack() MUST NOT be called with a write lock
* to nf_conntrack_lock!!! -HW */
rcu_read_lock();
l4proto = __nf_ct_l4proto_find(nf_ct_l3num(ct), nf_ct_protonum(ct));
1378: 0f b6 b3 86 00 00 00 movzbl 0x86(%rbx),%esi
137f: 0f b7 7b 72 movzwl 0x72(%rbx),%edi
1383: e8 00 00 00 00 callq 1388 <destroy_conntrack+0x78>
if (l4proto && l4proto->destroy)
1388: 48 85 c0 test %rax,%rax
138b: 74 0e je 139b <destroy_conntrack+0x8b>
138d: 48 8b 40 28 mov 0x28(%rax),%rax <----- HERE
1391: 48 85 c0 test %rax,%rax
1394: 74 05 je 139b <destroy_conntrack+0x8b>
l4proto->destroy(ct);
1396: 48 89 df mov %rbx,%rdi
1399: ff d0 callq *%rax
l4proto (%rax) is garbage (0x50626b6b7876376c) which looks a little like ascii,
but Pbkkxv7l doesn't mean much to me.
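For anyone wanting to repeat the decode of RAX, here is a minimal userspace sketch (not kernel code; the helper names reg_to_ascii, is_canonical_48 and describe_reg are made up for illustration). Taken most-significant byte first, 0x50626b6b7876376c reads "Pbkkxv7l" (0x62 is 'b'). The sketch also checks whether the value is a canonical x86-64 address, assuming 48-bit virtual addressing: a dereference of a non-canonical address raises a general protection fault rather than a page fault, which fits the "general protection fault" line in the oops.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Write the 8 bytes of v, most significant first, into out (9 bytes
 * including the terminating NUL). Non-printable bytes become '.'. */
void reg_to_ascii(uint64_t v, char out[9])
{
	for (int i = 0; i < 8; i++) {
		unsigned char c = (unsigned char)(v >> (56 - 8 * i));

		out[i] = (c >= 0x20 && c < 0x7f) ? (char)c : '.';
	}
	out[8] = '\0';
}

/* Assumption: 48-bit virtual addresses, so bits 63..47 must all be equal. */
bool is_canonical_48(uint64_t addr)
{
	uint64_t top = addr >> 47;	/* upper 17 bits */

	return top == 0 || top == 0x1ffff;
}

/* Print a one-line summary for a register value copied from an oops. */
void describe_reg(const char *name, uint64_t v)
{
	char text[9];

	reg_to_ascii(v, text);
	printf("%s=%016llx ascii=\"%s\" canonical=%s\n", name,
	       (unsigned long long)v, text,
	       is_canonical_48(v) ? "yes" : "no");
}
```

Calling describe_reg("RAX", 0x50626b6b7876376cULL) reports the value as non-canonical, while RBX (ffff88026e530d68) passes the same check, so only the l4proto pointer looks bogus here.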
https://bugzilla.redhat.com/show_bug.cgi?id=917792 is the original report, but
there aren't any further details yet.
Dave
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: destroy_conntrack GPF in 3.7.9
2013-03-06 15:59 destroy_conntrack GPF in 3.7.9 Dave Jones
@ 2013-03-06 16:44 ` Dave Jones
2013-03-06 16:54 ` Eric Dumazet
1 sibling, 0 replies; 6+ messages in thread
From: Dave Jones @ 2013-03-06 16:44 UTC (permalink / raw)
To: netdev; +Cc: Fedora Kernel Team
On Wed, Mar 06, 2013 at 10:59:55AM -0500, Dave Jones wrote:
> I know 3.7.9 is EOL, but this code doesn't look like it's changed in current.
> (unless the cause/fix was in code unrelated to these paths)
>
> A user reported the following GPF..
>
> /* To make sure we don't get any weird locking issues here:
> * destroy_conntrack() MUST NOT be called with a write lock
> * to nf_conntrack_lock!!! -HW */
> rcu_read_lock();
> l4proto = __nf_ct_l4proto_find(nf_ct_l3num(ct), nf_ct_protonum(ct));
> 1378: 0f b6 b3 86 00 00 00 movzbl 0x86(%rbx),%esi
> 137f: 0f b7 7b 72 movzwl 0x72(%rbx),%edi
> 1383: e8 00 00 00 00 callq 1388 <destroy_conntrack+0x78>
> if (l4proto && l4proto->destroy)
> 1388: 48 85 c0 test %rax,%rax
> 138b: 74 0e je 139b <destroy_conntrack+0x8b>
> 138d: 48 8b 40 28 mov 0x28(%rax),%rax <----- HERE
> 1391: 48 85 c0 test %rax,%rax
> 1394: 74 05 je 139b <destroy_conntrack+0x8b>
> l4proto->destroy(ct);
> 1396: 48 89 df mov %rbx,%rdi
> 1399: ff d0 callq *%rax
>
>
> l4proto (%rax) is garbage (0x50626b6b7876376c) which looks a little like ascii,
> but Pbkkxv7l doesn't mean much to me.
>
> https://bugzilla.redhat.com/show_bug.cgi?id=917792 is the original report, but
> there aren't any further details yet.
I just realised I reported this last September against 3.6
https://bugzilla.redhat.com/show_bug.cgi?id=859346
shows what looks like a use after free. l4proto got freed somewhere.
Dave
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: destroy_conntrack GPF in 3.7.9
2013-03-06 15:59 destroy_conntrack GPF in 3.7.9 Dave Jones
2013-03-06 16:44 ` Dave Jones
@ 2013-03-06 16:54 ` Eric Dumazet
2013-03-06 20:46 ` David Miller
1 sibling, 1 reply; 6+ messages in thread
From: Eric Dumazet @ 2013-03-06 16:54 UTC (permalink / raw)
To: Dave Jones; +Cc: netdev, Fedora Kernel Team
On Wed, 2013-03-06 at 10:59 -0500, Dave Jones wrote:
> I know 3.7.9 is EOL, but this code doesn't look like it's changed in current.
> (unless the cause/fix was in code unrelated to these paths)
>
> A user reported the following GPF..
>
> general protection fault: 0000 [#1] SMP
> Modules linked in: ipheth fuse ebtable_nat xt_CHECKSUM bridge stp llc ip6t_REJECT iptable_mangle nf_conntrack(-) ebtable_filter ebtables snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_page_alloc hp_wmi snd_timer coretemp iTCO_wdt tg3 snd sparse_keymap rfkill soundcore iTCO_vendor_support lpc_ich i7core_edac edac_core serio_raw microcode mfd_core vhost_net tun macvtap macvlan kvm_intel kvm binfmt_misc uinput nouveau mxm_wmi crc32c_intel video i2c_algo_bit drm_kms_helper ttm firewire_ohci firewire_core drm crc_itu_t i2c_core wmi [last unloaded: xt_conntrack]
> CPU 2
> Pid: 25407, comm: qemu-kvm Not tainted 3.7.9-205.fc18.x86_64 #1 Hewlett-Packard HP Z400 Workstation/0B4Ch
> RIP: 0010:[<ffffffffa0399bd5>] [<ffffffffa0399bd5>] destroy_conntrack+0x35/0x120 [nf_conntrack]
> RSP: 0018:ffff880276913d78 EFLAGS: 00010206
> RAX: 50626b6b7876376c RBX: ffff88026e530d68 RCX: ffff88028d158e00
> RDX: ffff88026d0d5470 RSI: 0000000000000011 RDI: 0000000000000002
> RBP: ffff880276913d88 R08: 0000000000000000 R09: ffff880295002900
> R10: 0000000000000000 R11: 0000000000000003 R12: ffffffff81ca3b40
> R13: ffffffff8151a8e0 R14: ffff880270875000 R15: 0000000000000002
> FS: 00007ff3bce38a00(0000) GS:ffff88029fc40000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 00007fd1430bd000 CR3: 000000027042b000 CR4: 00000000000027e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process qemu-kvm (pid: 25407, threadinfo ffff880276912000, task ffff88028c369720)
> Stack:
> ffff880156f59100 ffff880156f59100 ffff880276913d98 ffffffff815534f7
> ffff880276913db8 ffffffff8151a74b ffff880270875000 ffff880156f59100
> ffff880276913dd8 ffffffff8151a5a6 ffff880276913dd8 ffff88026d0d5470
> Call Trace:
> [<ffffffff815534f7>] nf_conntrack_destroy+0x17/0x20
> [<ffffffff8151a74b>] skb_release_head_state+0x7b/0x100
> [<ffffffff8151a5a6>] __kfree_skb+0x16/0xa0
> [<ffffffff8151a666>] kfree_skb+0x36/0xa0
> [<ffffffff8151a8e0>] skb_queue_purge+0x20/0x40
> [<ffffffffa02205f7>] __tun_detach+0x117/0x140 [tun]
> [<ffffffffa022184c>] tun_chr_close+0x3c/0xd0 [tun]
> [<ffffffff8119669c>] __fput+0xec/0x240
> [<ffffffff811967fe>] ____fput+0xe/0x10
> [<ffffffff8107eb27>] task_work_run+0xa7/0xe0
> [<ffffffff810149e1>] do_notify_resume+0x71/0xb0
> [<ffffffff81640152>] int_signal+0x12/0x17
> Code: 00 00 04 48 89 e5 41 54 53 48 89 fb 4c 8b a7 e8 00 00 00 0f 85 de 00 00 00 0f b6 73 3e 0f b7 7b 2a e8 10 40 00 00 48 85 c0 74 0e <48> 8b 40 28 48 85 c0 74 05 48 89 df ff d0 48 c7 c7 08 6a 3a a0
> RIP [<ffffffffa0399bd5>] destroy_conntrack+0x35/0x120 [nf_conntrack]
> RSP <ffff880276913d78>
>
>
>
> /* To make sure we don't get any weird locking issues here:
> * destroy_conntrack() MUST NOT be called with a write lock
> * to nf_conntrack_lock!!! -HW */
> rcu_read_lock();
> l4proto = __nf_ct_l4proto_find(nf_ct_l3num(ct), nf_ct_protonum(ct));
> 1378: 0f b6 b3 86 00 00 00 movzbl 0x86(%rbx),%esi
> 137f: 0f b7 7b 72 movzwl 0x72(%rbx),%edi
> 1383: e8 00 00 00 00 callq 1388 <destroy_conntrack+0x78>
> if (l4proto && l4proto->destroy)
> 1388: 48 85 c0 test %rax,%rax
> 138b: 74 0e je 139b <destroy_conntrack+0x8b>
> 138d: 48 8b 40 28 mov 0x28(%rax),%rax <----- HERE
> 1391: 48 85 c0 test %rax,%rax
> 1394: 74 05 je 139b <destroy_conntrack+0x8b>
> l4proto->destroy(ct);
> 1396: 48 89 df mov %rbx,%rdi
> 1399: ff d0 callq *%rax
>
>
> l4proto (%rax) is garbage (0x50626b6b7876376c) which looks a little like ascii,
> but Pbkkxv7l doesn't mean much to me.
>
> https://bugzilla.redhat.com/show_bug.cgi?id=917792 is the original report, but
> there aren't any further details yet.
>
> Dave
>
tun driver lacks a nf_reset(skb) call
I would try :
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 2c6a22e..b7c457a 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -747,6 +747,8 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
 		goto drop;
 	skb_orphan(skb);
 
+	nf_reset(skb);
+
 	/* Enqueue packet */
 	skb_queue_tail(&tfile->socket.sk->sk_receive_queue, skb);
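To illustrate why dropping the conntrack reference before queueing helps, here is a toy userspace model. It is a sketch only: struct toy_ct, struct toy_skb and the toy_* functions are invented stand-ins, not the kernel's structures or API. The assumption being modelled is that nf_reset() releases the packet's conntrack reference and clears the pointer, so a later free of the queued packet (the skb_queue_purge() path in the trace above) no longer goes through conntrack teardown.

```c
#include <stddef.h>

/* Toy stand-in for a conntrack entry (hypothetical, not the kernel type). */
struct toy_ct {
	int refcnt;
	int destroyed;		/* set when the last reference is dropped */
};

/* Toy stand-in for a packet carrying a conntrack reference. */
struct toy_skb {
	struct toy_ct *nfct;
};

/* Drop one reference; mark the entry destroyed when it reaches zero. */
void toy_ct_put(struct toy_ct *ct)
{
	if (ct && --ct->refcnt == 0)
		ct->destroyed = 1;
}

/* Model of nf_reset(): release the reference and clear the pointer. */
void toy_nf_reset(struct toy_skb *skb)
{
	toy_ct_put(skb->nfct);
	skb->nfct = NULL;
}

/* Model of freeing a queued packet: releases any reference still held. */
void toy_kfree_skb(struct toy_skb *skb)
{
	toy_ct_put(skb->nfct);
	skb->nfct = NULL;
}

/* Returns 1 if freeing this packet would still touch conntrack state. */
int toy_free_touches_conntrack(const struct toy_skb *skb)
{
	return skb->nfct != NULL;
}
```

In this model, a packet that went through toy_nf_reset() before being queued has nfct == NULL, so toy_kfree_skb() at device close time is a no-op as far as conntrack is concerned. Without the reset, the teardown is deferred until the queue is purged, which may be long after the packet was transmitted.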
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: destroy_conntrack GPF in 3.7.9
2013-03-06 16:54 ` Eric Dumazet
@ 2013-03-06 20:46 ` David Miller
2013-03-06 21:02 ` [PATCH] tun: add a missing nf_reset() in tun_net_xmit() Eric Dumazet
0 siblings, 1 reply; 6+ messages in thread
From: David Miller @ 2013-03-06 20:46 UTC (permalink / raw)
To: eric.dumazet; +Cc: davej, netdev, kernel-team
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 06 Mar 2013 08:54:07 -0800
> tun driver lacks a nf_reset(skb) call
>
> I would try :
>
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index 2c6a22e..b7c457a 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -747,6 +747,8 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
> goto drop;
> skb_orphan(skb);
>
> + nf_reset(skb);
> +
We just recently fixed a similar bug in the VXLAN driver too.
Eric please submit the TUN fix formally, thanks.
commit 88c4c066c6b4db26dc4909ee94e6bf377e8e8e81
Author: Zang MingJie <zealot0630@gmail.com>
Date: Mon Mar 4 06:07:34 2013 +0000
reset nf before xmit vxlan encapsulated packet
We should reset nf settings bond to the skb as ipip/ipgre do.
If not, the conntrack/nat info bond to the origin packet may continually
redirect the packet to vxlan interface causing a routing loop.
this is the scenario:
 VETP     VXLAN Gateway
/----\  /---------------\
|    |  |               |
|  vx+--+vx --NAT-> eth0+--> Internet
|    |  |               |
\----/  \---------------/
when there are any packet coming from internet to the vetp, there will be lots
of garbage packets coming out the gateway's vxlan interface, but none actually
sent to the physical interface, because they are redirected back to the vxlan
interface in the postrouting chain of NAT rule, and dmesg complains:
Mar 1 21:52:53 debian kernel: [ 8802.997699] Dead loop on virtual device vxlan0, fix it urgently!
Mar 1 21:52:54 debian kernel: [ 8804.004907] Dead loop on virtual device vxlan0, fix it urgently!
Mar 1 21:52:55 debian kernel: [ 8805.012189] Dead loop on virtual device vxlan0, fix it urgently!
Mar 1 21:52:56 debian kernel: [ 8806.020593] Dead loop on virtual device vxlan0, fix it urgently!
the patch should fix the problem
Signed-off-by: Zang MingJie <zealot0630@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index f10e58a..c3e3d29 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -961,6 +961,8 @@ static netdev_tx_t vxlan_xmit(struct sk_buff *skb, struct net_device *dev)
 	iph->ttl = ttl ? : ip4_dst_hoplimit(&rt->dst);
 	tunnel_ip_select_ident(skb, old_iph, &rt->dst);
 
+	nf_reset(skb);
+
 	vxlan_set_owner(dev, skb);
 
 	/* See iptunnel_xmit() */
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH] tun: add a missing nf_reset() in tun_net_xmit()
2013-03-06 20:46 ` David Miller
@ 2013-03-06 21:02 ` Eric Dumazet
2013-03-06 21:05 ` David Miller
0 siblings, 1 reply; 6+ messages in thread
From: Eric Dumazet @ 2013-03-06 21:02 UTC (permalink / raw)
To: David Miller; +Cc: davej, netdev, kernel-team
From: Eric Dumazet <edumazet@google.com>
Dave reported following crash :
general protection fault: 0000 [#1] SMP
CPU 2
Pid: 25407, comm: qemu-kvm Not tainted 3.7.9-205.fc18.x86_64 #1 Hewlett-Packard HP Z400 Workstation/0B4Ch
RIP: 0010:[<ffffffffa0399bd5>] [<ffffffffa0399bd5>] destroy_conntrack+0x35/0x120 [nf_conntrack]
RSP: 0018:ffff880276913d78 EFLAGS: 00010206
RAX: 50626b6b7876376c RBX: ffff88026e530d68 RCX: ffff88028d158e00
RDX: ffff88026d0d5470 RSI: 0000000000000011 RDI: 0000000000000002
RBP: ffff880276913d88 R08: 0000000000000000 R09: ffff880295002900
R10: 0000000000000000 R11: 0000000000000003 R12: ffffffff81ca3b40
R13: ffffffff8151a8e0 R14: ffff880270875000 R15: 0000000000000002
FS: 00007ff3bce38a00(0000) GS:ffff88029fc40000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007fd1430bd000 CR3: 000000027042b000 CR4: 00000000000027e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process qemu-kvm (pid: 25407, threadinfo ffff880276912000, task ffff88028c369720)
Stack:
ffff880156f59100 ffff880156f59100 ffff880276913d98 ffffffff815534f7
ffff880276913db8 ffffffff8151a74b ffff880270875000 ffff880156f59100
ffff880276913dd8 ffffffff8151a5a6 ffff880276913dd8 ffff88026d0d5470
Call Trace:
[<ffffffff815534f7>] nf_conntrack_destroy+0x17/0x20
[<ffffffff8151a74b>] skb_release_head_state+0x7b/0x100
[<ffffffff8151a5a6>] __kfree_skb+0x16/0xa0
[<ffffffff8151a666>] kfree_skb+0x36/0xa0
[<ffffffff8151a8e0>] skb_queue_purge+0x20/0x40
[<ffffffffa02205f7>] __tun_detach+0x117/0x140 [tun]
[<ffffffffa022184c>] tun_chr_close+0x3c/0xd0 [tun]
[<ffffffff8119669c>] __fput+0xec/0x240
[<ffffffff811967fe>] ____fput+0xe/0x10
[<ffffffff8107eb27>] task_work_run+0xa7/0xe0
[<ffffffff810149e1>] do_notify_resume+0x71/0xb0
[<ffffffff81640152>] int_signal+0x12/0x17
Code: 00 00 04 48 89 e5 41 54 53 48 89 fb 4c 8b a7 e8 00 00 00 0f 85 de 00 00 00 0f b6 73 3e 0f b7 7b 2a e8 10 40 00 00 48 85 c0 74 0e <48> 8b 40 28 48 85 c0 74 05 48 89 df ff d0 48 c7 c7 08 6a 3a a0
RIP [<ffffffffa0399bd5>] destroy_conntrack+0x35/0x120 [nf_conntrack]
RSP <ffff880276913d78>
This is because tun_net_xmit() needs to call nf_reset()
before queuing skb into receive_queue
Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
drivers/net/tun.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 2c6a22e..b7c457a 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -747,6 +747,8 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
 		goto drop;
 	skb_orphan(skb);
 
+	nf_reset(skb);
+
 	/* Enqueue packet */
 	skb_queue_tail(&tfile->socket.sk->sk_receive_queue, skb);
^ permalink raw reply related [flat|nested] 6+ messages in thread