* openconnect triggers soft lockup in __skb_get_rxhash @ 2012-12-17 0:56 Kirill A. Shutemov 2012-12-17 1:22 ` David Miller 0 siblings, 1 reply; 8+ messages in thread From: Kirill A. Shutemov @ 2012-12-17 0:56 UTC (permalink / raw) To: Maxim Krasnyansky, David S. Miller; +Cc: netdev, David Woodhouse Hi, In few minutes after starting openconnect it starts consume 100% CPU and I can see soft lockup report in dmesg: [ 231.684591] BUG: soft lockup - CPU#3 stuck for 22s! [openconnect:3537] [ 231.684595] Modules linked in: rfcomm bnep iwldvm btusb iwlwifi bluetooth acpi_cpufreq container mperf thermal battery ac processor [ 231.684607] CPU 3 [ 231.684610] Pid: 3537, comm: openconnect Not tainted 3.7.0-08585-g2a74dbb #165 Hewlett-Packard HP EliteBook 8440p/172A [ 231.684612] RIP: 0010:[<ffffffff815f3f62>] [<ffffffff815f3f62>] skb_flow_dissect+0x52/0x3b0 [ 231.684621] RSP: 0018:ffff88012ebbdc48 EFLAGS: 00000246 [ 231.684622] RAX: 0000000000000004 RBX: ffffffff8173b360 RCX: ff040404ff040404 [ 231.684623] RDX: 0000000000000000 RSI: ffff88012ebbdcc8 RDI: ffff880122770e00 [ 231.684624] RBP: ffff88012ebbdcb8 R08: 00000000000004f2 R09: ffff88012faa6000 [ 231.684626] R10: ffff880122770e00 R11: 0000000000000000 R12: ffffffff8173b476 [ 231.684627] R13: 0000000010000002 R14: ffff88012ebbc000 R15: 0000000000000004 [ 231.684628] FS: 00007f31249d9740(0000) GS:ffff880132e00000(0000) knlGS:0000000000000000 [ 231.684630] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 231.684631] CR2: 00007fed86cca000 CR3: 0000000122489000 CR4: 00000000000007e0 [ 231.684632] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 231.684633] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 231.684635] Process openconnect (pid: 3537, threadinfo ffff88012ebbc000, task ffff8801305be4a0) [ 231.684636] Stack: [ 231.684637] ffff88012ebbdce8 ffffffff815f0072 ffff8801305be4a0 00000000000004f2 [ 231.684640] 0000000000000000 00000000000004f2 00000000000004f2 0000000000000000 [ 231.684643] ffff88012ebbdce8 00000000748aa9b5 ffff880122770e00 ffff88012ebbddf0 [ 231.684646] Call Trace: [ 231.684651] [<ffffffff815f0072>] ? memcpy_fromiovecend+0x72/0xc0 [ 231.684654] [<ffffffff815f71fa>] __skb_get_rxhash+0x1a/0xd0 [ 231.684659] [<ffffffff814a04a8>] tun_get_user+0x5b8/0x7b0 [ 231.684662] [<ffffffff8149ea79>] ? __tun_get+0x59/0x80 [ 231.684664] [<ffffffff814a07b1>] tun_chr_aio_write+0x81/0xb0 [ 231.684670] [<ffffffff810e342e>] ? put_lock_stats.isra.15+0xe/0x40 [ 231.684675] [<ffffffff811ac857>] do_sync_write+0xa7/0xe0 [ 231.684678] [<ffffffff811acf7b>] vfs_write+0xab/0x190 [ 231.684681] [<ffffffff811ad2f5>] sys_write+0x55/0xb0 [ 231.684684] [<ffffffff81742906>] system_call_fastpath+0x1a/0x1f [ 231.684685] Code: 31 c0 44 8b a7 a0 00 00 00 44 0f b7 6f 76 4c 03 a7 b0 00 00 00 44 2b a7 b8 00 00 00 48 c7 06 00 00 00 00 48 c7 46 08 00 00 00 00 <66> 41 81 fd 81 00 0f 84 72 01 00 00 77 70 66 41 83 fd 08 74 29 -- Kirill A. Shutemov ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: openconnect triggers soft lockup in __skb_get_rxhash 2012-12-17 0:56 openconnect triggers soft lockup in __skb_get_rxhash Kirill A. Shutemov @ 2012-12-17 1:22 ` David Miller 2012-12-17 1:46 ` Kirill A. Shutemov 0 siblings, 1 reply; 8+ messages in thread From: David Miller @ 2012-12-17 1:22 UTC (permalink / raw) To: kirill; +Cc: maxk, netdev, dwmw2 Already fixed in Linus's tree by: >From 499744209b2cbca66c42119226e5470da3bb7040 Mon Sep 17 00:00:00 2001 From: Eric Dumazet <edumazet@google.com> Date: Wed, 12 Dec 2012 19:22:57 +0000 Subject: [PATCH 18/19] tuntap: dont use skb after netif_rx_ni(skb) On Wed, 2012-12-12 at 23:16 -0500, Dave Jones wrote: > Since todays net merge, I see this when I start openvpn.. > > general protection fault: 0000 [#1] PREEMPT SMP > Modules linked in: ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_conntrack nf_conntrack ip6table_filter ip6_tables xfs iTCO_wdt iTCO_vendor_support snd_emu10k1 snd_util_mem snd_ac97_codec coretemp ac97_bus microcode snd_hwdep snd_seq pcspkr snd_pcm snd_page_alloc snd_timer lpc_ich i2c_i801 snd_rawmidi mfd_core snd_seq_device snd e1000e soundcore emu10k1_gp gameport i82975x_edac edac_core vhost_net tun macvtap macvlan kvm_intel kvm binfmt_misc nfsd auth_rpcgss nfs_acl lockd sunrpc btrfs libcrc32c zlib_deflate firewire_ohci sata_sil firewire_core crc_itu_t radeon i2c_algo_bit drm_kms_helper ttm drm i2c_core floppy > CPU 0 > Pid: 1381, comm: openvpn Not tainted 3.7.0+ #14 /D975XBX > RIP: 0010:[<ffffffff815b54a4>] [<ffffffff815b54a4>] skb_flow_dissect+0x314/0x3e0 > RSP: 0018:ffff88007d0d9c48 EFLAGS: 00010206 > RAX: 000000000000055d RBX: 6b6b6b6b6b6b6b4b RCX: 1471030a0180040a > RDX: 0000000000000005 RSI: 00000000ffffffe0 RDI: ffff8800ba83fa80 > RBP: ffff88007d0d9cb8 R08: 0000000000000000 R09: 0000000000000000 > R10: 0000000000000000 R11: 0000000000000101 R12: ffff8800ba83fa80 > R13: 0000000000000008 R14: ffff88007d0d9cc8 R15: ffff8800ba83fa80 > FS: 00007f6637104800(0000) GS:ffff8800bf600000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 00007f563f5b01c4 CR3: 000000007d140000 CR4: 00000000000007f0 > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 > Process openvpn (pid: 1381, threadinfo ffff88007d0d8000, task ffff8800a540cd60) > Stack: > ffff8800ba83fa80 0000000000000296 0000000000000000 0000000000000000 > ffff88007d0d9cc8 ffffffff815bcff4 ffff88007d0d9ce8 ffffffff815b1831 > ffff88007d0d9ca8 00000000703f6364 ffff8800ba83fa80 0000000000000000 > Call Trace: > [<ffffffff815bcff4>] ? netif_rx+0x114/0x4c0 > [<ffffffff815b1831>] ? skb_copy_datagram_from_iovec+0x61/0x290 > [<ffffffff815b672a>] __skb_get_rxhash+0x1a/0xd0 > [<ffffffffa03b9538>] tun_get_user+0x418/0x810 [tun] > [<ffffffff8135f468>] ? delay_tsc+0x98/0xf0 > [<ffffffff8109605c>] ? __rcu_read_unlock+0x5c/0xa0 > [<ffffffffa03b9a41>] tun_chr_aio_write+0x81/0xb0 [tun] > [<ffffffff81145011>] ? __buffer_unlock_commit+0x41/0x50 > [<ffffffff811db917>] do_sync_write+0xa7/0xe0 > [<ffffffff811dc01f>] vfs_write+0xaf/0x190 > [<ffffffff811dc375>] sys_write+0x55/0xa0 > [<ffffffff81705540>] tracesys+0xdd/0xe2 > Code: 41 8b 44 24 68 41 2b 44 24 6c 01 de 29 f0 83 f8 03 0f 8e a0 00 00 00 48 63 de 49 03 9c 24 e0 00 00 00 48 85 db 0f 84 72 fe ff ff <8b> 03 41 89 46 08 b8 01 00 00 00 e9 43 fd ff ff 0f 1f 40 00 48 > RIP [<ffffffff815b54a4>] skb_flow_dissect+0x314/0x3e0 > RSP <ffff88007d0d9c48> > ---[ end trace 6d42c834c72c002e ]--- > > > Faulting instruction is > > 0: 8b 03 mov (%rbx),%eax > > rbx is slab poison (-20) so this looks like a use-after-free here... > > flow->ports = *ports; > 314: 8b 03 mov (%rbx),%eax > 316: 41 89 46 08 mov %eax,0x8(%r14) > > in the inlined skb_header_pointer in skb_flow_dissect > > Dave > commit 96442e4242 (tuntap: choose the txq based on rxq) added a use after free. Cache rxhash in a temp variable before calling netif_rx_ni() Reported-by: Dave Jones <davej@redhat.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jason Wang <jasowang@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> --- drivers/net/tun.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/drivers/net/tun.c b/drivers/net/tun.c index 2ac2164..40b426e 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -297,13 +297,12 @@ static void tun_flow_cleanup(unsigned long data) spin_unlock_bh(&tun->lock); } -static void tun_flow_update(struct tun_struct *tun, struct sk_buff *skb, +static void tun_flow_update(struct tun_struct *tun, u32 rxhash, u16 queue_index) { struct hlist_head *head; struct tun_flow_entry *e; unsigned long delay = tun->ageing_time; - u32 rxhash = skb_get_rxhash(skb); if (!rxhash) return; @@ -1010,6 +1009,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile, int copylen; bool zerocopy = false; int err; + u32 rxhash; if (!(tun->flags & TUN_NO_PI)) { if ((len -= sizeof(pi)) > total_len) @@ -1162,12 +1162,13 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile, skb_shinfo(skb)->tx_flags |= SKBTX_DEV_ZEROCOPY; } + rxhash = skb_get_rxhash(skb); netif_rx_ni(skb); tun->dev->stats.rx_packets++; tun->dev->stats.rx_bytes += len; - tun_flow_update(tun, skb, tfile->queue_index); + tun_flow_update(tun, rxhash, tfile->queue_index); return total_len; } -- 1.7.11.7 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: openconnect triggers soft lockup in __skb_get_rxhash 2012-12-17 1:22 ` David Miller @ 2012-12-17 1:46 ` Kirill A. Shutemov 2012-12-17 4:46 ` Eric Dumazet 0 siblings, 1 reply; 8+ messages in thread From: Kirill A. Shutemov @ 2012-12-17 1:46 UTC (permalink / raw) To: David Miller; +Cc: maxk, netdev, dwmw2 On Sun, Dec 16, 2012 at 05:22:14PM -0800, David Miller wrote: > > Already fixed in Linus's tree by: > > From 499744209b2cbca66c42119226e5470da3bb7040 Mon Sep 17 00:00:00 2001 No, it's not. I use up-to-date (2a74dbb) Linus tree with the patch in and still see the issue. -- Kirill A. Shutemov ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: openconnect triggers soft lockup in __skb_get_rxhash 2012-12-17 1:46 ` Kirill A. Shutemov @ 2012-12-17 4:46 ` Eric Dumazet 2012-12-17 8:11 ` Kirill A. Shutemov 0 siblings, 1 reply; 8+ messages in thread From: Eric Dumazet @ 2012-12-17 4:46 UTC (permalink / raw) To: Kirill A. Shutemov; +Cc: David Miller, maxk, netdev, dwmw2 On Mon, 2012-12-17 at 03:46 +0200, Kirill A. Shutemov wrote: > On Sun, Dec 16, 2012 at 05:22:14PM -0800, David Miller wrote: > > > > Already fixed in Linus's tree by: > > > > From 499744209b2cbca66c42119226e5470da3bb7040 Mon Sep 17 00:00:00 2001 > > No, it's not. I use up-to-date (2a74dbb) Linus tree with the patch in and > still see the issue. > Coud you try the following one liner ? diff --git a/drivers/net/tun.c b/drivers/net/tun.c index 255a9f5..173acf5 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -1199,6 +1199,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile, skb_shinfo(skb)->tx_flags |= SKBTX_DEV_ZEROCOPY; } + skb_reset_network_header(skb); rxhash = skb_get_rxhash(skb); netif_rx_ni(skb); ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: openconnect triggers soft lockup in __skb_get_rxhash 2012-12-17 4:46 ` Eric Dumazet @ 2012-12-17 8:11 ` Kirill A. Shutemov 2012-12-17 13:38 ` Daniel Borkmann 0 siblings, 1 reply; 8+ messages in thread From: Kirill A. Shutemov @ 2012-12-17 8:11 UTC (permalink / raw) To: Eric Dumazet; +Cc: David Miller, maxk, netdev, dwmw2 On Sun, Dec 16, 2012 at 08:46:29PM -0800, Eric Dumazet wrote: > On Mon, 2012-12-17 at 03:46 +0200, Kirill A. Shutemov wrote: > > On Sun, Dec 16, 2012 at 05:22:14PM -0800, David Miller wrote: > > > > > > Already fixed in Linus's tree by: > > > > > > From 499744209b2cbca66c42119226e5470da3bb7040 Mon Sep 17 00:00:00 2001 > > > > No, it's not. I use up-to-date (2a74dbb) Linus tree with the patch in and > > still see the issue. > > > > Coud you try the following one liner ? Works for me. So far no problems. Reported-and-tested-by: Kirill A. Shutemov <kirill@shutemov.name> > diff --git a/drivers/net/tun.c b/drivers/net/tun.c > index 255a9f5..173acf5 100644 > --- a/drivers/net/tun.c > +++ b/drivers/net/tun.c > @@ -1199,6 +1199,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile, > skb_shinfo(skb)->tx_flags |= SKBTX_DEV_ZEROCOPY; > } > > + skb_reset_network_header(skb); > rxhash = skb_get_rxhash(skb); > netif_rx_ni(skb); > > > -- Kirill A. Shutemov ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: openconnect triggers soft lockup in __skb_get_rxhash 2012-12-17 8:11 ` Kirill A. Shutemov @ 2012-12-17 13:38 ` Daniel Borkmann 2012-12-17 14:39 ` [PATCH] tuntap: reset network header before calling skb_get_rxhash() Eric Dumazet 0 siblings, 1 reply; 8+ messages in thread From: Daniel Borkmann @ 2012-12-17 13:38 UTC (permalink / raw) To: Kirill A. Shutemov; +Cc: Eric Dumazet, David Miller, maxk, netdev, dwmw2 On Mon, Dec 17, 2012 at 9:11 AM, Kirill A. Shutemov <kirill@shutemov.name> wrote: > On Sun, Dec 16, 2012 at 08:46:29PM -0800, Eric Dumazet wrote: >> On Mon, 2012-12-17 at 03:46 +0200, Kirill A. Shutemov wrote: >> > On Sun, Dec 16, 2012 at 05:22:14PM -0800, David Miller wrote: >> > > >> > > Already fixed in Linus's tree by: >> > > >> > > From 499744209b2cbca66c42119226e5470da3bb7040 Mon Sep 17 00:00:00 2001 >> > >> > No, it's not. I use up-to-date (2a74dbb) Linus tree with the patch in and >> > still see the issue. >> > >> >> Coud you try the following one liner ? > > Works for me. So far no problems. > > Reported-and-tested-by: Kirill A. Shutemov <kirill@shutemov.name> I can confirm the same, ran into the same issue when being connected via VPN and seems stable now. Tested-by: Daniel Borkmann <daniel.borkmann@tik.ee.ethz.ch> ^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH] tuntap: reset network header before calling skb_get_rxhash() 2012-12-17 13:38 ` Daniel Borkmann @ 2012-12-17 14:39 ` Eric Dumazet 2012-12-17 20:33 ` David Miller 0 siblings, 1 reply; 8+ messages in thread From: Eric Dumazet @ 2012-12-17 14:39 UTC (permalink / raw) To: Daniel Borkmann, David Miller; +Cc: Kirill A. Shutemov, maxk, netdev, dwmw2 From: Eric Dumazet <edumazet@google.com> Commit 499744209b2c (tuntap: dont use skb after netif_rx_ni(skb)) introduced another bug. skb_get_rxhash() needs to access the network header, and it was set for us in netif_rx_ni(). We need to reset network header or else skb_flow_dissect() behavior is out of control. Reported-and-tested-by: Kirill A. Shutemov <kirill@shutemov.name> Tested-by: Daniel Borkmann <daniel.borkmann@tik.ee.ethz.ch> Signed-off-by: Eric Dumazet <edumazet@google.com> --- drivers/net/tun.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/tun.c b/drivers/net/tun.c index 255a9f5..173acf5 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -1199,6 +1199,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile, skb_shinfo(skb)->tx_flags |= SKBTX_DEV_ZEROCOPY; } + skb_reset_network_header(skb); rxhash = skb_get_rxhash(skb); netif_rx_ni(skb); ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH] tuntap: reset network header before calling skb_get_rxhash() 2012-12-17 14:39 ` [PATCH] tuntap: reset network header before calling skb_get_rxhash() Eric Dumazet @ 2012-12-17 20:33 ` David Miller 0 siblings, 0 replies; 8+ messages in thread From: David Miller @ 2012-12-17 20:33 UTC (permalink / raw) To: eric.dumazet; +Cc: danborkmann, kirill, maxk, netdev, dwmw2 From: Eric Dumazet <eric.dumazet@gmail.com> Date: Mon, 17 Dec 2012 06:39:20 -0800 > From: Eric Dumazet <edumazet@google.com> > > Commit 499744209b2c (tuntap: dont use skb after netif_rx_ni(skb)) > introduced another bug. > > skb_get_rxhash() needs to access the network header, and it was > set for us in netif_rx_ni(). > > We need to reset network header or else skb_flow_dissect() behavior > is out of control. > > Reported-and-tested-by: Kirill A. Shutemov <kirill@shutemov.name> > Tested-by: Daniel Borkmann <daniel.borkmann@tik.ee.ethz.ch> > Signed-off-by: Eric Dumazet <edumazet@google.com> Applied, hopefully we've really resolved this fully now :) ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2012-12-17 20:33 UTC | newest] Thread overview: 8+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2012-12-17 0:56 openconnect triggers soft lockup in __skb_get_rxhash Kirill A. Shutemov 2012-12-17 1:22 ` David Miller 2012-12-17 1:46 ` Kirill A. Shutemov 2012-12-17 4:46 ` Eric Dumazet 2012-12-17 8:11 ` Kirill A. Shutemov 2012-12-17 13:38 ` Daniel Borkmann 2012-12-17 14:39 ` [PATCH] tuntap: reset network header before calling skb_get_rxhash() Eric Dumazet 2012-12-17 20:33 ` David Miller
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).