* openconnect triggers soft lockup in __skb_get_rxhash
@ 2012-12-17 0:56 Kirill A. Shutemov
2012-12-17 1:22 ` David Miller
0 siblings, 1 reply; 8+ messages in thread
From: Kirill A. Shutemov @ 2012-12-17 0:56 UTC (permalink / raw)
To: Maxim Krasnyansky, David S. Miller; +Cc: netdev, David Woodhouse
Hi,
In few minutes after starting openconnect it starts consume 100% CPU and I
can see soft lockup report in dmesg:
[ 231.684591] BUG: soft lockup - CPU#3 stuck for 22s! [openconnect:3537]
[ 231.684595] Modules linked in: rfcomm bnep iwldvm btusb iwlwifi bluetooth acpi_cpufreq container mperf thermal battery ac processor
[ 231.684607] CPU 3
[ 231.684610] Pid: 3537, comm: openconnect Not tainted 3.7.0-08585-g2a74dbb #165 Hewlett-Packard HP EliteBook 8440p/172A
[ 231.684612] RIP: 0010:[<ffffffff815f3f62>] [<ffffffff815f3f62>] skb_flow_dissect+0x52/0x3b0
[ 231.684621] RSP: 0018:ffff88012ebbdc48 EFLAGS: 00000246
[ 231.684622] RAX: 0000000000000004 RBX: ffffffff8173b360 RCX: ff040404ff040404
[ 231.684623] RDX: 0000000000000000 RSI: ffff88012ebbdcc8 RDI: ffff880122770e00
[ 231.684624] RBP: ffff88012ebbdcb8 R08: 00000000000004f2 R09: ffff88012faa6000
[ 231.684626] R10: ffff880122770e00 R11: 0000000000000000 R12: ffffffff8173b476
[ 231.684627] R13: 0000000010000002 R14: ffff88012ebbc000 R15: 0000000000000004
[ 231.684628] FS: 00007f31249d9740(0000) GS:ffff880132e00000(0000) knlGS:0000000000000000
[ 231.684630] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 231.684631] CR2: 00007fed86cca000 CR3: 0000000122489000 CR4: 00000000000007e0
[ 231.684632] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 231.684633] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 231.684635] Process openconnect (pid: 3537, threadinfo ffff88012ebbc000, task ffff8801305be4a0)
[ 231.684636] Stack:
[ 231.684637] ffff88012ebbdce8 ffffffff815f0072 ffff8801305be4a0 00000000000004f2
[ 231.684640] 0000000000000000 00000000000004f2 00000000000004f2 0000000000000000
[ 231.684643] ffff88012ebbdce8 00000000748aa9b5 ffff880122770e00 ffff88012ebbddf0
[ 231.684646] Call Trace:
[ 231.684651] [<ffffffff815f0072>] ? memcpy_fromiovecend+0x72/0xc0
[ 231.684654] [<ffffffff815f71fa>] __skb_get_rxhash+0x1a/0xd0
[ 231.684659] [<ffffffff814a04a8>] tun_get_user+0x5b8/0x7b0
[ 231.684662] [<ffffffff8149ea79>] ? __tun_get+0x59/0x80
[ 231.684664] [<ffffffff814a07b1>] tun_chr_aio_write+0x81/0xb0
[ 231.684670] [<ffffffff810e342e>] ? put_lock_stats.isra.15+0xe/0x40
[ 231.684675] [<ffffffff811ac857>] do_sync_write+0xa7/0xe0
[ 231.684678] [<ffffffff811acf7b>] vfs_write+0xab/0x190
[ 231.684681] [<ffffffff811ad2f5>] sys_write+0x55/0xb0
[ 231.684684] [<ffffffff81742906>] system_call_fastpath+0x1a/0x1f
[ 231.684685] Code: 31 c0 44 8b a7 a0 00 00 00 44 0f b7 6f 76 4c 03 a7 b0 00 00 00 44 2b a7 b8 00 00 00 48 c7 06 00 00 00 00 48 c7 46 08 00 00 00 00 <66> 41 81 fd 81 00 0f 84 72 01 00 00 77 70 66 41 83 fd 08 74 29
--
Kirill A. Shutemov
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: openconnect triggers soft lockup in __skb_get_rxhash
2012-12-17 0:56 openconnect triggers soft lockup in __skb_get_rxhash Kirill A. Shutemov
@ 2012-12-17 1:22 ` David Miller
2012-12-17 1:46 ` Kirill A. Shutemov
0 siblings, 1 reply; 8+ messages in thread
From: David Miller @ 2012-12-17 1:22 UTC (permalink / raw)
To: kirill; +Cc: maxk, netdev, dwmw2
Already fixed in Linus's tree by:
>From 499744209b2cbca66c42119226e5470da3bb7040 Mon Sep 17 00:00:00 2001
From: Eric Dumazet <edumazet@google.com>
Date: Wed, 12 Dec 2012 19:22:57 +0000
Subject: [PATCH 18/19] tuntap: dont use skb after netif_rx_ni(skb)
On Wed, 2012-12-12 at 23:16 -0500, Dave Jones wrote:
> Since todays net merge, I see this when I start openvpn..
>
> general protection fault: 0000 [#1] PREEMPT SMP
> Modules linked in: ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_conntrack nf_conntrack ip6table_filter ip6_tables xfs iTCO_wdt iTCO_vendor_support snd_emu10k1 snd_util_mem snd_ac97_codec coretemp ac97_bus microcode snd_hwdep snd_seq pcspkr snd_pcm snd_page_alloc snd_timer lpc_ich i2c_i801 snd_rawmidi mfd_core snd_seq_device snd e1000e soundcore emu10k1_gp gameport i82975x_edac edac_core vhost_net tun macvtap macvlan kvm_intel kvm binfmt_misc nfsd auth_rpcgss nfs_acl lockd sunrpc btrfs libcrc32c zlib_deflate firewire_ohci sata_sil firewire_core crc_itu_t radeon i2c_algo_bit drm_kms_helper ttm drm i2c_core floppy
> CPU 0
> Pid: 1381, comm: openvpn Not tainted 3.7.0+ #14 /D975XBX
> RIP: 0010:[<ffffffff815b54a4>] [<ffffffff815b54a4>] skb_flow_dissect+0x314/0x3e0
> RSP: 0018:ffff88007d0d9c48 EFLAGS: 00010206
> RAX: 000000000000055d RBX: 6b6b6b6b6b6b6b4b RCX: 1471030a0180040a
> RDX: 0000000000000005 RSI: 00000000ffffffe0 RDI: ffff8800ba83fa80
> RBP: ffff88007d0d9cb8 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000101 R12: ffff8800ba83fa80
> R13: 0000000000000008 R14: ffff88007d0d9cc8 R15: ffff8800ba83fa80
> FS: 00007f6637104800(0000) GS:ffff8800bf600000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f563f5b01c4 CR3: 000000007d140000 CR4: 00000000000007f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process openvpn (pid: 1381, threadinfo ffff88007d0d8000, task ffff8800a540cd60)
> Stack:
> ffff8800ba83fa80 0000000000000296 0000000000000000 0000000000000000
> ffff88007d0d9cc8 ffffffff815bcff4 ffff88007d0d9ce8 ffffffff815b1831
> ffff88007d0d9ca8 00000000703f6364 ffff8800ba83fa80 0000000000000000
> Call Trace:
> [<ffffffff815bcff4>] ? netif_rx+0x114/0x4c0
> [<ffffffff815b1831>] ? skb_copy_datagram_from_iovec+0x61/0x290
> [<ffffffff815b672a>] __skb_get_rxhash+0x1a/0xd0
> [<ffffffffa03b9538>] tun_get_user+0x418/0x810 [tun]
> [<ffffffff8135f468>] ? delay_tsc+0x98/0xf0
> [<ffffffff8109605c>] ? __rcu_read_unlock+0x5c/0xa0
> [<ffffffffa03b9a41>] tun_chr_aio_write+0x81/0xb0 [tun]
> [<ffffffff81145011>] ? __buffer_unlock_commit+0x41/0x50
> [<ffffffff811db917>] do_sync_write+0xa7/0xe0
> [<ffffffff811dc01f>] vfs_write+0xaf/0x190
> [<ffffffff811dc375>] sys_write+0x55/0xa0
> [<ffffffff81705540>] tracesys+0xdd/0xe2
> Code: 41 8b 44 24 68 41 2b 44 24 6c 01 de 29 f0 83 f8 03 0f 8e a0 00 00 00 48 63 de 49 03 9c 24 e0 00 00 00 48 85 db 0f 84 72 fe ff ff <8b> 03 41 89 46 08 b8 01 00 00 00 e9 43 fd ff ff 0f 1f 40 00 48
> RIP [<ffffffff815b54a4>] skb_flow_dissect+0x314/0x3e0
> RSP <ffff88007d0d9c48>
> ---[ end trace 6d42c834c72c002e ]---
>
>
> Faulting instruction is
>
> 0: 8b 03 mov (%rbx),%eax
>
> rbx is slab poison (-20) so this looks like a use-after-free here...
>
> flow->ports = *ports;
> 314: 8b 03 mov (%rbx),%eax
> 316: 41 89 46 08 mov %eax,0x8(%r14)
>
> in the inlined skb_header_pointer in skb_flow_dissect
>
> Dave
>
commit 96442e4242 (tuntap: choose the txq based on rxq) added
a use after free.
Cache rxhash in a temp variable before calling netif_rx_ni()
Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jason Wang <jasowang@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
drivers/net/tun.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 2ac2164..40b426e 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -297,13 +297,12 @@ static void tun_flow_cleanup(unsigned long data)
spin_unlock_bh(&tun->lock);
}
-static void tun_flow_update(struct tun_struct *tun, struct sk_buff *skb,
+static void tun_flow_update(struct tun_struct *tun, u32 rxhash,
u16 queue_index)
{
struct hlist_head *head;
struct tun_flow_entry *e;
unsigned long delay = tun->ageing_time;
- u32 rxhash = skb_get_rxhash(skb);
if (!rxhash)
return;
@@ -1010,6 +1009,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
int copylen;
bool zerocopy = false;
int err;
+ u32 rxhash;
if (!(tun->flags & TUN_NO_PI)) {
if ((len -= sizeof(pi)) > total_len)
@@ -1162,12 +1162,13 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
skb_shinfo(skb)->tx_flags |= SKBTX_DEV_ZEROCOPY;
}
+ rxhash = skb_get_rxhash(skb);
netif_rx_ni(skb);
tun->dev->stats.rx_packets++;
tun->dev->stats.rx_bytes += len;
- tun_flow_update(tun, skb, tfile->queue_index);
+ tun_flow_update(tun, rxhash, tfile->queue_index);
return total_len;
}
--
1.7.11.7
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: openconnect triggers soft lockup in __skb_get_rxhash
2012-12-17 1:22 ` David Miller
@ 2012-12-17 1:46 ` Kirill A. Shutemov
2012-12-17 4:46 ` Eric Dumazet
0 siblings, 1 reply; 8+ messages in thread
From: Kirill A. Shutemov @ 2012-12-17 1:46 UTC (permalink / raw)
To: David Miller; +Cc: maxk, netdev, dwmw2
On Sun, Dec 16, 2012 at 05:22:14PM -0800, David Miller wrote:
>
> Already fixed in Linus's tree by:
>
> From 499744209b2cbca66c42119226e5470da3bb7040 Mon Sep 17 00:00:00 2001
No, it's not. I use up-to-date (2a74dbb) Linus tree with the patch in and
still see the issue.
--
Kirill A. Shutemov
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: openconnect triggers soft lockup in __skb_get_rxhash
2012-12-17 1:46 ` Kirill A. Shutemov
@ 2012-12-17 4:46 ` Eric Dumazet
2012-12-17 8:11 ` Kirill A. Shutemov
0 siblings, 1 reply; 8+ messages in thread
From: Eric Dumazet @ 2012-12-17 4:46 UTC (permalink / raw)
To: Kirill A. Shutemov; +Cc: David Miller, maxk, netdev, dwmw2
On Mon, 2012-12-17 at 03:46 +0200, Kirill A. Shutemov wrote:
> On Sun, Dec 16, 2012 at 05:22:14PM -0800, David Miller wrote:
> >
> > Already fixed in Linus's tree by:
> >
> > From 499744209b2cbca66c42119226e5470da3bb7040 Mon Sep 17 00:00:00 2001
>
> No, it's not. I use up-to-date (2a74dbb) Linus tree with the patch in and
> still see the issue.
>
Coud you try the following one liner ?
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 255a9f5..173acf5 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1199,6 +1199,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
skb_shinfo(skb)->tx_flags |= SKBTX_DEV_ZEROCOPY;
}
+ skb_reset_network_header(skb);
rxhash = skb_get_rxhash(skb);
netif_rx_ni(skb);
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: openconnect triggers soft lockup in __skb_get_rxhash
2012-12-17 4:46 ` Eric Dumazet
@ 2012-12-17 8:11 ` Kirill A. Shutemov
2012-12-17 13:38 ` Daniel Borkmann
0 siblings, 1 reply; 8+ messages in thread
From: Kirill A. Shutemov @ 2012-12-17 8:11 UTC (permalink / raw)
To: Eric Dumazet; +Cc: David Miller, maxk, netdev, dwmw2
On Sun, Dec 16, 2012 at 08:46:29PM -0800, Eric Dumazet wrote:
> On Mon, 2012-12-17 at 03:46 +0200, Kirill A. Shutemov wrote:
> > On Sun, Dec 16, 2012 at 05:22:14PM -0800, David Miller wrote:
> > >
> > > Already fixed in Linus's tree by:
> > >
> > > From 499744209b2cbca66c42119226e5470da3bb7040 Mon Sep 17 00:00:00 2001
> >
> > No, it's not. I use up-to-date (2a74dbb) Linus tree with the patch in and
> > still see the issue.
> >
>
> Coud you try the following one liner ?
Works for me. So far no problems.
Reported-and-tested-by: Kirill A. Shutemov <kirill@shutemov.name>
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index 255a9f5..173acf5 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -1199,6 +1199,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
> skb_shinfo(skb)->tx_flags |= SKBTX_DEV_ZEROCOPY;
> }
>
> + skb_reset_network_header(skb);
> rxhash = skb_get_rxhash(skb);
> netif_rx_ni(skb);
>
>
>
--
Kirill A. Shutemov
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: openconnect triggers soft lockup in __skb_get_rxhash
2012-12-17 8:11 ` Kirill A. Shutemov
@ 2012-12-17 13:38 ` Daniel Borkmann
2012-12-17 14:39 ` [PATCH] tuntap: reset network header before calling skb_get_rxhash() Eric Dumazet
0 siblings, 1 reply; 8+ messages in thread
From: Daniel Borkmann @ 2012-12-17 13:38 UTC (permalink / raw)
To: Kirill A. Shutemov; +Cc: Eric Dumazet, David Miller, maxk, netdev, dwmw2
On Mon, Dec 17, 2012 at 9:11 AM, Kirill A. Shutemov
<kirill@shutemov.name> wrote:
> On Sun, Dec 16, 2012 at 08:46:29PM -0800, Eric Dumazet wrote:
>> On Mon, 2012-12-17 at 03:46 +0200, Kirill A. Shutemov wrote:
>> > On Sun, Dec 16, 2012 at 05:22:14PM -0800, David Miller wrote:
>> > >
>> > > Already fixed in Linus's tree by:
>> > >
>> > > From 499744209b2cbca66c42119226e5470da3bb7040 Mon Sep 17 00:00:00 2001
>> >
>> > No, it's not. I use up-to-date (2a74dbb) Linus tree with the patch in and
>> > still see the issue.
>> >
>>
>> Coud you try the following one liner ?
>
> Works for me. So far no problems.
>
> Reported-and-tested-by: Kirill A. Shutemov <kirill@shutemov.name>
I can confirm the same, ran into the same issue when being connected
via VPN and seems stable now.
Tested-by: Daniel Borkmann <daniel.borkmann@tik.ee.ethz.ch>
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH] tuntap: reset network header before calling skb_get_rxhash()
2012-12-17 13:38 ` Daniel Borkmann
@ 2012-12-17 14:39 ` Eric Dumazet
2012-12-17 20:33 ` David Miller
0 siblings, 1 reply; 8+ messages in thread
From: Eric Dumazet @ 2012-12-17 14:39 UTC (permalink / raw)
To: Daniel Borkmann, David Miller; +Cc: Kirill A. Shutemov, maxk, netdev, dwmw2
From: Eric Dumazet <edumazet@google.com>
Commit 499744209b2c (tuntap: dont use skb after netif_rx_ni(skb))
introduced another bug.
skb_get_rxhash() needs to access the network header, and it was
set for us in netif_rx_ni().
We need to reset network header or else skb_flow_dissect() behavior
is out of control.
Reported-and-tested-by: Kirill A. Shutemov <kirill@shutemov.name>
Tested-by: Daniel Borkmann <daniel.borkmann@tik.ee.ethz.ch>
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
drivers/net/tun.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 255a9f5..173acf5 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1199,6 +1199,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
skb_shinfo(skb)->tx_flags |= SKBTX_DEV_ZEROCOPY;
}
+ skb_reset_network_header(skb);
rxhash = skb_get_rxhash(skb);
netif_rx_ni(skb);
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH] tuntap: reset network header before calling skb_get_rxhash()
2012-12-17 14:39 ` [PATCH] tuntap: reset network header before calling skb_get_rxhash() Eric Dumazet
@ 2012-12-17 20:33 ` David Miller
0 siblings, 0 replies; 8+ messages in thread
From: David Miller @ 2012-12-17 20:33 UTC (permalink / raw)
To: eric.dumazet; +Cc: danborkmann, kirill, maxk, netdev, dwmw2
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 17 Dec 2012 06:39:20 -0800
> From: Eric Dumazet <edumazet@google.com>
>
> Commit 499744209b2c (tuntap: dont use skb after netif_rx_ni(skb))
> introduced another bug.
>
> skb_get_rxhash() needs to access the network header, and it was
> set for us in netif_rx_ni().
>
> We need to reset network header or else skb_flow_dissect() behavior
> is out of control.
>
> Reported-and-tested-by: Kirill A. Shutemov <kirill@shutemov.name>
> Tested-by: Daniel Borkmann <daniel.borkmann@tik.ee.ethz.ch>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
Applied, hopefully we've really resolved this fully now :)
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2012-12-17 20:33 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-12-17 0:56 openconnect triggers soft lockup in __skb_get_rxhash Kirill A. Shutemov
2012-12-17 1:22 ` David Miller
2012-12-17 1:46 ` Kirill A. Shutemov
2012-12-17 4:46 ` Eric Dumazet
2012-12-17 8:11 ` Kirill A. Shutemov
2012-12-17 13:38 ` Daniel Borkmann
2012-12-17 14:39 ` [PATCH] tuntap: reset network header before calling skb_get_rxhash() Eric Dumazet
2012-12-17 20:33 ` David Miller
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).