netdev.vger.kernel.org archive mirror

* Linux 4.2-rc6 regression: RIP: e030:[<ffffffff8110fb18>] [<ffffffff8110fb18>] detach_if_pending+0x18/0x80
@ 2015-08-12 19:19 linux
  2015-08-12 20:41 ` Eric Dumazet
  0 siblings, 1 reply; 20+ messages in thread
From: linux @ 2015-08-12 19:19 UTC (permalink / raw)
  To: linux-kernel; +Cc: netdev

Hi,

On my box running Xen with a 4.2-rc6 kernel, I still get this splat in
dom0, which crashes the box.
(I reported a similar splat earlier, at rc4, here:
http://www.spinics.net/lists/netdev/msg337570.html)

I have never seen this one on 4.1, so it seems to be a regression.

--
Sander


[81133.193439] general protection fault: 0000 [#1] SMP
[81133.204284] Modules linked in:
[81133.214934] CPU: 0 PID: 3 Comm: ksoftirqd/0 Not tainted 4.2.0-rc6-20150811-linus-doflr+ #1
[81133.225632] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640)  , BIOS V1.8B1 09/13/2010
[81133.236237] task: ffff880059b91580 ti: ffff880059bb4000 task.ti: ffff880059bb4000
[81133.246808] RIP: e030:[<ffffffff8110fb18>]  [<ffffffff8110fb18>] detach_if_pending+0x18/0x80
[81133.257354] RSP: e02b:ffff880059bb7848  EFLAGS: 00010086
[81133.267749] RAX: ffff88004eddc7f0 RBX: ffff88000e20ae08 RCX: dead000000200200
[81133.278201] RDX: 0000000000000000 RSI: ffff88005f60e600 RDI: ffff88000e20ae08
[81133.288723] RBP: ffff880059bb7848 R08: 0000000000000001 R09: 0000000000000001
[81133.298930] R10: 0000000000000003 R11: ffff88000e20ad68 R12: 0000000000000000
[81133.308875] R13: 0000000101735569 R14: 0000000000015f90 R15: ffff88005f60e600
[81133.318845] FS:  00007f28c6f7c800(0000) GS:ffff88005f600000(0000) knlGS:0000000000000000
[81133.328864] CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[81133.338693] CR2: ffff8000007f6800 CR3: 000000003d55c000 CR4: 0000000000000660
[81133.348462] Stack:
[81133.358005]  ffff880059bb7898 ffffffff8110fe3f ffffffff810fc261 0000000000000200
[81133.367682]  0000000000000003 ffff88000e20ad68 0000000000000000 ffff88005854d400
[81133.377064]  0000000000015f90 0000000000000000 ffff880059bb78c8 ffffffff819b5243
[81133.386374] Call Trace:
[81133.395596]  [<ffffffff8110fe3f>] mod_timer_pending+0x3f/0xe0
[81133.404999]  [<ffffffff810fc261>] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x20
[81133.414255]  [<ffffffff819b5243>] __nf_ct_refresh_acct+0xa3/0xb0
[81133.423137]  [<ffffffff819bbe8b>] tcp_packet+0xb3b/0x1290
[81133.431894]  [<ffffffff810cb8ca>] ? __local_bh_enable_ip+0x2a/0x90
[81133.440622]  [<ffffffff819b4939>] ? __nf_conntrack_find_get+0x129/0x2a0
[81133.449339]  [<ffffffff819b682c>] nf_conntrack_in+0x29c/0x7c0
[81133.457940]  [<ffffffff81a67181>] ipv4_conntrack_in+0x21/0x30
[81133.466296]  [<ffffffff819aea1c>] nf_iterate+0x4c/0x80
[81133.474401]  [<ffffffff819aeab4>] nf_hook_slow+0x64/0xc0
[81133.482615]  [<ffffffff81a211ec>] ip_rcv+0x2ec/0x380
[81133.490781]  [<ffffffff81a209f0>] ? ip_local_deliver_finish+0x130/0x130
[81133.498790]  [<ffffffff8197e140>] __netif_receive_skb_core+0x2a0/0x970
[81133.506714]  [<ffffffff81a56db8>] ? inet_gro_receive+0x1c8/0x200
[81133.514609]  [<ffffffff81980705>] __netif_receive_skb+0x15/0x70
[81133.522333]  [<ffffffff8198077e>] netif_receive_skb_internal+0x1e/0x80
[81133.529840]  [<ffffffff81980f3b>] napi_gro_receive+0x6b/0x90
[81133.537173]  [<ffffffff81740fb6>] rtl8169_poll+0x2e6/0x600
[81133.544444]  [<ffffffff810fc261>] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x20
[81133.551566]  [<ffffffff81981ad7>] net_rx_action+0x1f7/0x300
[81133.558412]  [<ffffffff810cb6c3>] __do_softirq+0x103/0x210
[81133.565353]  [<ffffffff810cb807>] run_ksoftirqd+0x37/0x60
[81133.572359]  [<ffffffff810e4de0>] smpboot_thread_fn+0x130/0x190
[81133.579215]  [<ffffffff810e4cb0>] ? sort_range+0x20/0x20
[81133.586042]  [<ffffffff810e1fae>] kthread+0xee/0x110
[81133.592792]  [<ffffffff810e1ec0>] ? kthread_create_on_node+0x1b0/0x1b0
[81133.599694]  [<ffffffff81af92df>] ret_from_fork+0x3f/0x70
[81133.606662]  [<ffffffff810e1ec0>] ? kthread_create_on_node+0x1b0/0x1b0
[81133.613445] Code: 77 28 5d c3 66 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 48 8b 47 08 55 48 89 e5 48 85 c0 74 6a 48 8b 0f 48 85 c9 48 89 08 74 04 <48> 89 41 08 84 d2 74 08 48 c7 47 08 00 00 00 00 f6 47 2a 10 48
[81133.627196] RIP  [<ffffffff8110fb18>] detach_if_pending+0x18/0x80
[81133.634036]  RSP <ffff880059bb7848>
[81133.640817] ---[ end trace eaf596e1fcf6a591 ]---
[81133.647521] Kernel panic - not syncing: Fatal exception in interrupt


* Re: Linux 4.2-rc6 regression: RIP: e030:[<ffffffff8110fb18>] [<ffffffff8110fb18>] detach_if_pending+0x18/0x80
  2015-08-12 19:19 Linux 4.2-rc6 regression: RIP: e030:[<ffffffff8110fb18>] [<ffffffff8110fb18>] detach_if_pending+0x18/0x80 linux
@ 2015-08-12 20:41 ` Eric Dumazet
  2015-08-12 20:50   ` linux
  0 siblings, 1 reply; 20+ messages in thread
From: Eric Dumazet @ 2015-08-12 20:41 UTC (permalink / raw)
  To: linux; +Cc: linux-kernel, netdev

On Wed, 2015-08-12 at 21:19 +0200, linux@eikelenboom.it wrote:
> Hi,
> 
> On my box running Xen with a 4.2-rc6 kernel i still get this splat in 
> dom0,
> which crashes the box.
> (i reported a similar splat before (at rc4) here, 
> http://www.spinics.net/lists/netdev/msg337570.html)
> 
> Never seen this one on 4.1, so it seems a regression.
> 
> --
> Sander
> 
> 
> [81133.193439] general protection fault: 0000 [#1] SMP
> [81133.204284] Modules linked in:
> [81133.214934] CPU: 0 PID: 3 Comm: ksoftirqd/0 Not tainted 
> 4.2.0-rc6-20150811-linus-doflr+ #1
> [81133.225632] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640)  , BIOS 
> V1.8B1 09/13/2010
> [81133.236237] task: ffff880059b91580 ti: ffff880059bb4000 task.ti: 
> ffff880059bb4000
> [81133.246808] RIP: e030:[<ffffffff8110fb18>]  [<ffffffff8110fb18>] 
> detach_if_pending+0x18/0x80
> [81133.257354] RSP: e02b:ffff880059bb7848  EFLAGS: 00010086
> [81133.267749] RAX: ffff88004eddc7f0 RBX: ffff88000e20ae08 RCX: 
> dead000000200200
> [81133.278201] RDX: 0000000000000000 RSI: ffff88005f60e600 RDI: 
> ffff88000e20ae08
> [81133.288723] RBP: ffff880059bb7848 R08: 0000000000000001 R09: 
> 0000000000000001
> [81133.298930] R10: 0000000000000003 R11: ffff88000e20ad68 R12: 
> 0000000000000000
> [81133.308875] R13: 0000000101735569 R14: 0000000000015f90 R15: 
> ffff88005f60e600
> [81133.318845] FS:  00007f28c6f7c800(0000) GS:ffff88005f600000(0000) 
> knlGS:0000000000000000
> [81133.328864] CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
> [81133.338693] CR2: ffff8000007f6800 CR3: 000000003d55c000 CR4: 
> 0000000000000660
> [81133.348462] Stack:
> [81133.358005]  ffff880059bb7898 ffffffff8110fe3f ffffffff810fc261 
> 0000000000000200
> [81133.367682]  0000000000000003 ffff88000e20ad68 0000000000000000 
> ffff88005854d400
> [81133.377064]  0000000000015f90 0000000000000000 ffff880059bb78c8 
> ffffffff819b5243
> [81133.386374] Call Trace:
> [81133.395596]  [<ffffffff8110fe3f>] mod_timer_pending+0x3f/0xe0
> [81133.404999]  [<ffffffff810fc261>] ? 
> __raw_callee_save___pv_queued_spin_unlock+0x11/0x20
> [81133.414255]  [<ffffffff819b5243>] __nf_ct_refresh_acct+0xa3/0xb0
> [81133.423137]  [<ffffffff819bbe8b>] tcp_packet+0xb3b/0x1290
> [81133.431894]  [<ffffffff810cb8ca>] ? __local_bh_enable_ip+0x2a/0x90
> [81133.440622]  [<ffffffff819b4939>] ? 
> __nf_conntrack_find_get+0x129/0x2a0
> [81133.449339]  [<ffffffff819b682c>] nf_conntrack_in+0x29c/0x7c0
> [81133.457940]  [<ffffffff81a67181>] ipv4_conntrack_in+0x21/0x30
> [81133.466296]  [<ffffffff819aea1c>] nf_iterate+0x4c/0x80
> [81133.474401]  [<ffffffff819aeab4>] nf_hook_slow+0x64/0xc0
> [81133.482615]  [<ffffffff81a211ec>] ip_rcv+0x2ec/0x380
> [81133.490781]  [<ffffffff81a209f0>] ? 
> ip_local_deliver_finish+0x130/0x130
> [81133.498790]  [<ffffffff8197e140>] 
> __netif_receive_skb_core+0x2a0/0x970
> [81133.506714]  [<ffffffff81a56db8>] ? inet_gro_receive+0x1c8/0x200
> [81133.514609]  [<ffffffff81980705>] __netif_receive_skb+0x15/0x70
> [81133.522333]  [<ffffffff8198077e>] 
> netif_receive_skb_internal+0x1e/0x80
> [81133.529840]  [<ffffffff81980f3b>] napi_gro_receive+0x6b/0x90
> [81133.537173]  [<ffffffff81740fb6>] rtl8169_poll+0x2e6/0x600
> [81133.544444]  [<ffffffff810fc261>] ? 
> __raw_callee_save___pv_queued_spin_unlock+0x11/0x20
> [81133.551566]  [<ffffffff81981ad7>] net_rx_action+0x1f7/0x300
> [81133.558412]  [<ffffffff810cb6c3>] __do_softirq+0x103/0x210
> [81133.565353]  [<ffffffff810cb807>] run_ksoftirqd+0x37/0x60
> [81133.572359]  [<ffffffff810e4de0>] smpboot_thread_fn+0x130/0x190
> [81133.579215]  [<ffffffff810e4cb0>] ? sort_range+0x20/0x20
> [81133.586042]  [<ffffffff810e1fae>] kthread+0xee/0x110
> [81133.592792]  [<ffffffff810e1ec0>] ? 
> kthread_create_on_node+0x1b0/0x1b0
> [81133.599694]  [<ffffffff81af92df>] ret_from_fork+0x3f/0x70
> [81133.606662]  [<ffffffff810e1ec0>] ? 
> kthread_create_on_node+0x1b0/0x1b0
> [81133.613445] Code: 77 28 5d c3 66 66 66 66 66 66 2e 0f 1f 84 00 00 00 
> 00 00 48 8b 47 08 55 48 89 e5 48 85 c0 74 6a 48 8b 0f 48 85 c9 48 89 08 
> 74 04 <48> 89 41 08 84 d2 74 08 48 c7 47 08 00 00 00 00 f6 47 2a 10 48
> [81133.627196] RIP  [<ffffffff8110fb18>] detach_if_pending+0x18/0x80
> [81133.634036]  RSP <ffff880059bb7848>
> [81133.640817] ---[ end trace eaf596e1fcf6a591 ]---
> [81133.647521] Kernel panic - not syncing: Fatal exception in interrupt

This looks like the bug fixed in David Miller's net tree:

http://git.kernel.org/cgit/linux/kernel/git/davem/net.git/commit/?id=2235f2ac75fd2501c251b0b699a9632e80239a6d


* Re: Linux 4.2-rc6 regression: RIP: e030:[<ffffffff8110fb18>] [<ffffffff8110fb18>] detach_if_pending+0x18/0x80
  2015-08-12 20:41 ` Eric Dumazet
@ 2015-08-12 20:50   ` linux
  2015-08-12 21:40     ` David Miller
  0 siblings, 1 reply; 20+ messages in thread
From: linux @ 2015-08-12 20:50 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: linux-kernel, netdev

On 2015-08-12 22:41, Eric Dumazet wrote:
> On Wed, 2015-08-12 at 21:19 +0200, linux@eikelenboom.it wrote:
>> Hi,
>> 
>> On my box running Xen with a 4.2-rc6 kernel i still get this splat in
>> dom0,
>> which crashes the box.
>> (i reported a similar splat before (at rc4) here,
>> http://www.spinics.net/lists/netdev/msg337570.html)
>> 
>> Never seen this one on 4.1, so it seems a regression.
>> 
>> --
>> Sander
>> 
>> 
>> [81133.193439] general protection fault: 0000 [#1] SMP
>> [81133.204284] Modules linked in:
>> [81133.214934] CPU: 0 PID: 3 Comm: ksoftirqd/0 Not tainted
>> 4.2.0-rc6-20150811-linus-doflr+ #1
>> [81133.225632] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640)  , 
>> BIOS
>> V1.8B1 09/13/2010
>> [81133.236237] task: ffff880059b91580 ti: ffff880059bb4000 task.ti:
>> ffff880059bb4000
>> [81133.246808] RIP: e030:[<ffffffff8110fb18>]  [<ffffffff8110fb18>]
>> detach_if_pending+0x18/0x80
>> [81133.257354] RSP: e02b:ffff880059bb7848  EFLAGS: 00010086
>> [81133.267749] RAX: ffff88004eddc7f0 RBX: ffff88000e20ae08 RCX:
>> dead000000200200
>> [81133.278201] RDX: 0000000000000000 RSI: ffff88005f60e600 RDI:
>> ffff88000e20ae08
>> [81133.288723] RBP: ffff880059bb7848 R08: 0000000000000001 R09:
>> 0000000000000001
>> [81133.298930] R10: 0000000000000003 R11: ffff88000e20ad68 R12:
>> 0000000000000000
>> [81133.308875] R13: 0000000101735569 R14: 0000000000015f90 R15:
>> ffff88005f60e600
>> [81133.318845] FS:  00007f28c6f7c800(0000) GS:ffff88005f600000(0000)
>> knlGS:0000000000000000
>> [81133.328864] CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
>> [81133.338693] CR2: ffff8000007f6800 CR3: 000000003d55c000 CR4:
>> 0000000000000660
>> [81133.348462] Stack:
>> [81133.358005]  ffff880059bb7898 ffffffff8110fe3f ffffffff810fc261
>> 0000000000000200
>> [81133.367682]  0000000000000003 ffff88000e20ad68 0000000000000000
>> ffff88005854d400
>> [81133.377064]  0000000000015f90 0000000000000000 ffff880059bb78c8
>> ffffffff819b5243
>> [81133.386374] Call Trace:
>> [81133.395596]  [<ffffffff8110fe3f>] mod_timer_pending+0x3f/0xe0
>> [81133.404999]  [<ffffffff810fc261>] ?
>> __raw_callee_save___pv_queued_spin_unlock+0x11/0x20
>> [81133.414255]  [<ffffffff819b5243>] __nf_ct_refresh_acct+0xa3/0xb0
>> [81133.423137]  [<ffffffff819bbe8b>] tcp_packet+0xb3b/0x1290
>> [81133.431894]  [<ffffffff810cb8ca>] ? __local_bh_enable_ip+0x2a/0x90
>> [81133.440622]  [<ffffffff819b4939>] ?
>> __nf_conntrack_find_get+0x129/0x2a0
>> [81133.449339]  [<ffffffff819b682c>] nf_conntrack_in+0x29c/0x7c0
>> [81133.457940]  [<ffffffff81a67181>] ipv4_conntrack_in+0x21/0x30
>> [81133.466296]  [<ffffffff819aea1c>] nf_iterate+0x4c/0x80
>> [81133.474401]  [<ffffffff819aeab4>] nf_hook_slow+0x64/0xc0
>> [81133.482615]  [<ffffffff81a211ec>] ip_rcv+0x2ec/0x380
>> [81133.490781]  [<ffffffff81a209f0>] ?
>> ip_local_deliver_finish+0x130/0x130
>> [81133.498790]  [<ffffffff8197e140>]
>> __netif_receive_skb_core+0x2a0/0x970
>> [81133.506714]  [<ffffffff81a56db8>] ? inet_gro_receive+0x1c8/0x200
>> [81133.514609]  [<ffffffff81980705>] __netif_receive_skb+0x15/0x70
>> [81133.522333]  [<ffffffff8198077e>]
>> netif_receive_skb_internal+0x1e/0x80
>> [81133.529840]  [<ffffffff81980f3b>] napi_gro_receive+0x6b/0x90
>> [81133.537173]  [<ffffffff81740fb6>] rtl8169_poll+0x2e6/0x600
>> [81133.544444]  [<ffffffff810fc261>] ?
>> __raw_callee_save___pv_queued_spin_unlock+0x11/0x20
>> [81133.551566]  [<ffffffff81981ad7>] net_rx_action+0x1f7/0x300
>> [81133.558412]  [<ffffffff810cb6c3>] __do_softirq+0x103/0x210
>> [81133.565353]  [<ffffffff810cb807>] run_ksoftirqd+0x37/0x60
>> [81133.572359]  [<ffffffff810e4de0>] smpboot_thread_fn+0x130/0x190
>> [81133.579215]  [<ffffffff810e4cb0>] ? sort_range+0x20/0x20
>> [81133.586042]  [<ffffffff810e1fae>] kthread+0xee/0x110
>> [81133.592792]  [<ffffffff810e1ec0>] ?
>> kthread_create_on_node+0x1b0/0x1b0
>> [81133.599694]  [<ffffffff81af92df>] ret_from_fork+0x3f/0x70
>> [81133.606662]  [<ffffffff810e1ec0>] ?
>> kthread_create_on_node+0x1b0/0x1b0
>> [81133.613445] Code: 77 28 5d c3 66 66 66 66 66 66 2e 0f 1f 84 00 00 
>> 00
>> 00 00 48 8b 47 08 55 48 89 e5 48 85 c0 74 6a 48 8b 0f 48 85 c9 48 89 
>> 08
>> 74 04 <48> 89 41 08 84 d2 74 08 48 c7 47 08 00 00 00 00 f6 47 2a 10 48
>> [81133.627196] RIP  [<ffffffff8110fb18>] detach_if_pending+0x18/0x80
>> [81133.634036]  RSP <ffff880059bb7848>
>> [81133.640817] ---[ end trace eaf596e1fcf6a591 ]---
>> [81133.647521] Kernel panic - not syncing: Fatal exception in 
>> interrupt
> 
> This looks like the bug fixed in David Miller net tree :
> 
> http://git.kernel.org/cgit/linux/kernel/git/davem/net.git/commit/?id=2235f2ac75fd2501c251b0b699a9632e80239a6d

I will pull in the net tree and re-test.
Since it only seems to crash after a day or two, that will take some
time.

Thanks,

Sander


* Re: Linux 4.2-rc6 regression: RIP: e030:[<ffffffff8110fb18>] [<ffffffff8110fb18>] detach_if_pending+0x18/0x80
  2015-08-12 20:50   ` linux
@ 2015-08-12 21:40     ` David Miller
  2015-08-12 21:46       ` Sander Eikelenboom
  0 siblings, 1 reply; 20+ messages in thread
From: David Miller @ 2015-08-12 21:40 UTC (permalink / raw)
  To: linux; +Cc: eric.dumazet, linux-kernel, netdev

From: linux@eikelenboom.it
Date: Wed, 12 Aug 2015 22:50:42 +0200

> On 2015-08-12 22:41, Eric Dumazet wrote:
>> On Wed, 2015-08-12 at 21:19 +0200, linux@eikelenboom.it wrote:
>>> Hi,
>>> On my box running Xen with a 4.2-rc6 kernel i still get this splat in
>>> dom0,
>>> which crashes the box.
>>> (i reported a similar splat before (at rc4) here,
>>> http://www.spinics.net/lists/netdev/msg337570.html)
>>> Never seen this one on 4.1, so it seems a regression.
>>> --
>>> Sander
>>> [81133.193439] general protection fault: 0000 [#1] SMP
>>> [81133.204284] Modules linked in:
>>> [81133.214934] CPU: 0 PID: 3 Comm: ksoftirqd/0 Not tainted
>>> 4.2.0-rc6-20150811-linus-doflr+ #1
>>> [81133.225632] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640) , BIOS
>>> V1.8B1 09/13/2010
>>> [81133.236237] task: ffff880059b91580 ti: ffff880059bb4000 task.ti:
>>> ffff880059bb4000
>>> [81133.246808] RIP: e030:[<ffffffff8110fb18>]  [<ffffffff8110fb18>]
>>> detach_if_pending+0x18/0x80
>>> [81133.257354] RSP: e02b:ffff880059bb7848  EFLAGS: 00010086
>>> [81133.267749] RAX: ffff88004eddc7f0 RBX: ffff88000e20ae08 RCX:
>>> dead000000200200
>>> [81133.278201] RDX: 0000000000000000 RSI: ffff88005f60e600 RDI:
>>> ffff88000e20ae08
>>> [81133.288723] RBP: ffff880059bb7848 R08: 0000000000000001 R09:
>>> 0000000000000001
>>> [81133.298930] R10: 0000000000000003 R11: ffff88000e20ad68 R12:
>>> 0000000000000000
>>> [81133.308875] R13: 0000000101735569 R14: 0000000000015f90 R15:
>>> ffff88005f60e600
>>> [81133.318845] FS:  00007f28c6f7c800(0000) GS:ffff88005f600000(0000)
>>> knlGS:0000000000000000
>>> [81133.328864] CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
>>> [81133.338693] CR2: ffff8000007f6800 CR3: 000000003d55c000 CR4:
>>> 0000000000000660
>>> [81133.348462] Stack:
>>> [81133.358005]  ffff880059bb7898 ffffffff8110fe3f ffffffff810fc261
>>> 0000000000000200
>>> [81133.367682]  0000000000000003 ffff88000e20ad68 0000000000000000
>>> ffff88005854d400
>>> [81133.377064]  0000000000015f90 0000000000000000 ffff880059bb78c8
>>> ffffffff819b5243
>>> [81133.386374] Call Trace:
>>> [81133.395596]  [<ffffffff8110fe3f>] mod_timer_pending+0x3f/0xe0
>>> [81133.404999]  [<ffffffff810fc261>] ?
>>> __raw_callee_save___pv_queued_spin_unlock+0x11/0x20
>>> [81133.414255]  [<ffffffff819b5243>] __nf_ct_refresh_acct+0xa3/0xb0
>>> [81133.423137]  [<ffffffff819bbe8b>] tcp_packet+0xb3b/0x1290
>>> [81133.431894]  [<ffffffff810cb8ca>] ? __local_bh_enable_ip+0x2a/0x90
>>> [81133.440622]  [<ffffffff819b4939>] ?
>>> __nf_conntrack_find_get+0x129/0x2a0
>>> [81133.449339]  [<ffffffff819b682c>] nf_conntrack_in+0x29c/0x7c0
>>> [81133.457940]  [<ffffffff81a67181>] ipv4_conntrack_in+0x21/0x30
>>> [81133.466296]  [<ffffffff819aea1c>] nf_iterate+0x4c/0x80
>>> [81133.474401]  [<ffffffff819aeab4>] nf_hook_slow+0x64/0xc0
>>> [81133.482615]  [<ffffffff81a211ec>] ip_rcv+0x2ec/0x380
>>> [81133.490781]  [<ffffffff81a209f0>] ?
>>> ip_local_deliver_finish+0x130/0x130
>>> [81133.498790]  [<ffffffff8197e140>]
>>> __netif_receive_skb_core+0x2a0/0x970
>>> [81133.506714]  [<ffffffff81a56db8>] ? inet_gro_receive+0x1c8/0x200
>>> [81133.514609]  [<ffffffff81980705>] __netif_receive_skb+0x15/0x70
>>> [81133.522333]  [<ffffffff8198077e>]
>>> netif_receive_skb_internal+0x1e/0x80
>>> [81133.529840]  [<ffffffff81980f3b>] napi_gro_receive+0x6b/0x90
>>> [81133.537173]  [<ffffffff81740fb6>] rtl8169_poll+0x2e6/0x600
>>> [81133.544444]  [<ffffffff810fc261>] ?
>>> __raw_callee_save___pv_queued_spin_unlock+0x11/0x20
>>> [81133.551566]  [<ffffffff81981ad7>] net_rx_action+0x1f7/0x300
>>> [81133.558412]  [<ffffffff810cb6c3>] __do_softirq+0x103/0x210
>>> [81133.565353]  [<ffffffff810cb807>] run_ksoftirqd+0x37/0x60
>>> [81133.572359]  [<ffffffff810e4de0>] smpboot_thread_fn+0x130/0x190
>>> [81133.579215]  [<ffffffff810e4cb0>] ? sort_range+0x20/0x20
>>> [81133.586042]  [<ffffffff810e1fae>] kthread+0xee/0x110
>>> [81133.592792]  [<ffffffff810e1ec0>] ?
>>> kthread_create_on_node+0x1b0/0x1b0
>>> [81133.599694]  [<ffffffff81af92df>] ret_from_fork+0x3f/0x70
>>> [81133.606662]  [<ffffffff810e1ec0>] ?
>>> kthread_create_on_node+0x1b0/0x1b0
>>> [81133.613445] Code: 77 28 5d c3 66 66 66 66 66 66 2e 0f 1f 84 00 00
>>> 00
>>> 00 00 48 8b 47 08 55 48 89 e5 48 85 c0 74 6a 48 8b 0f 48 85 c9 48 89
>>> 08
>>> 74 04 <48> 89 41 08 84 d2 74 08 48 c7 47 08 00 00 00 00 f6 47 2a 10 48
>>> [81133.627196] RIP  [<ffffffff8110fb18>] detach_if_pending+0x18/0x80
>>> [81133.634036]  RSP <ffff880059bb7848>
>>> [81133.640817] ---[ end trace eaf596e1fcf6a591 ]---
>>> [81133.647521] Kernel panic - not syncing: Fatal exception in
>>> interrupt
>> This looks like the bug fixed in David Miller net tree :
>> http://git.kernel.org/cgit/linux/kernel/git/davem/net.git/commit/?id=2235f2ac75fd2501c251b0b699a9632e80239a6d
> 
> Will pull the net-tree in and re-test.

You should pull the 'net' tree, not 'net-next'.

'net' is not necessarily included in 'net-next'.


* Re: Linux 4.2-rc6 regression: RIP: e030:[<ffffffff8110fb18>] [<ffffffff8110fb18>] detach_if_pending+0x18/0x80
  2015-08-12 21:40     ` David Miller
@ 2015-08-12 21:46       ` Sander Eikelenboom
  2015-08-12 22:41         ` Eric Dumazet
  0 siblings, 1 reply; 20+ messages in thread
From: Sander Eikelenboom @ 2015-08-12 21:46 UTC (permalink / raw)
  To: David Miller; +Cc: eric.dumazet, linux-kernel, netdev

On 2015-08-12 23:40, David Miller wrote:
> From: linux@eikelenboom.it
> Date: Wed, 12 Aug 2015 22:50:42 +0200
> 
>> On 2015-08-12 22:41, Eric Dumazet wrote:
>>> On Wed, 2015-08-12 at 21:19 +0200, linux@eikelenboom.it wrote:
>>>> Hi,
>>>> On my box running Xen with a 4.2-rc6 kernel i still get this splat 
>>>> in
>>>> dom0,
>>>> which crashes the box.
>>>> (i reported a similar splat before (at rc4) here,
>>>> http://www.spinics.net/lists/netdev/msg337570.html)
>>>> Never seen this one on 4.1, so it seems a regression.
>>>> --
>>>> Sander
>>>> [81133.193439] general protection fault: 0000 [#1] SMP
>>>> [81133.204284] Modules linked in:
>>>> [81133.214934] CPU: 0 PID: 3 Comm: ksoftirqd/0 Not tainted
>>>> 4.2.0-rc6-20150811-linus-doflr+ #1
>>>> [81133.225632] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640) , 
>>>> BIOS
>>>> V1.8B1 09/13/2010
>>>> [81133.236237] task: ffff880059b91580 ti: ffff880059bb4000 task.ti:
>>>> ffff880059bb4000
>>>> [81133.246808] RIP: e030:[<ffffffff8110fb18>]  [<ffffffff8110fb18>]
>>>> detach_if_pending+0x18/0x80
>>>> [81133.257354] RSP: e02b:ffff880059bb7848  EFLAGS: 00010086
>>>> [81133.267749] RAX: ffff88004eddc7f0 RBX: ffff88000e20ae08 RCX:
>>>> dead000000200200
>>>> [81133.278201] RDX: 0000000000000000 RSI: ffff88005f60e600 RDI:
>>>> ffff88000e20ae08
>>>> [81133.288723] RBP: ffff880059bb7848 R08: 0000000000000001 R09:
>>>> 0000000000000001
>>>> [81133.298930] R10: 0000000000000003 R11: ffff88000e20ad68 R12:
>>>> 0000000000000000
>>>> [81133.308875] R13: 0000000101735569 R14: 0000000000015f90 R15:
>>>> ffff88005f60e600
>>>> [81133.318845] FS:  00007f28c6f7c800(0000) GS:ffff88005f600000(0000)
>>>> knlGS:0000000000000000
>>>> [81133.328864] CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
>>>> [81133.338693] CR2: ffff8000007f6800 CR3: 000000003d55c000 CR4:
>>>> 0000000000000660
>>>> [81133.348462] Stack:
>>>> [81133.358005]  ffff880059bb7898 ffffffff8110fe3f ffffffff810fc261
>>>> 0000000000000200
>>>> [81133.367682]  0000000000000003 ffff88000e20ad68 0000000000000000
>>>> ffff88005854d400
>>>> [81133.377064]  0000000000015f90 0000000000000000 ffff880059bb78c8
>>>> ffffffff819b5243
>>>> [81133.386374] Call Trace:
>>>> [81133.395596]  [<ffffffff8110fe3f>] mod_timer_pending+0x3f/0xe0
>>>> [81133.404999]  [<ffffffff810fc261>] ?
>>>> __raw_callee_save___pv_queued_spin_unlock+0x11/0x20
>>>> [81133.414255]  [<ffffffff819b5243>] __nf_ct_refresh_acct+0xa3/0xb0
>>>> [81133.423137]  [<ffffffff819bbe8b>] tcp_packet+0xb3b/0x1290
>>>> [81133.431894]  [<ffffffff810cb8ca>] ? 
>>>> __local_bh_enable_ip+0x2a/0x90
>>>> [81133.440622]  [<ffffffff819b4939>] ?
>>>> __nf_conntrack_find_get+0x129/0x2a0
>>>> [81133.449339]  [<ffffffff819b682c>] nf_conntrack_in+0x29c/0x7c0
>>>> [81133.457940]  [<ffffffff81a67181>] ipv4_conntrack_in+0x21/0x30
>>>> [81133.466296]  [<ffffffff819aea1c>] nf_iterate+0x4c/0x80
>>>> [81133.474401]  [<ffffffff819aeab4>] nf_hook_slow+0x64/0xc0
>>>> [81133.482615]  [<ffffffff81a211ec>] ip_rcv+0x2ec/0x380
>>>> [81133.490781]  [<ffffffff81a209f0>] ?
>>>> ip_local_deliver_finish+0x130/0x130
>>>> [81133.498790]  [<ffffffff8197e140>]
>>>> __netif_receive_skb_core+0x2a0/0x970
>>>> [81133.506714]  [<ffffffff81a56db8>] ? inet_gro_receive+0x1c8/0x200
>>>> [81133.514609]  [<ffffffff81980705>] __netif_receive_skb+0x15/0x70
>>>> [81133.522333]  [<ffffffff8198077e>]
>>>> netif_receive_skb_internal+0x1e/0x80
>>>> [81133.529840]  [<ffffffff81980f3b>] napi_gro_receive+0x6b/0x90
>>>> [81133.537173]  [<ffffffff81740fb6>] rtl8169_poll+0x2e6/0x600
>>>> [81133.544444]  [<ffffffff810fc261>] ?
>>>> __raw_callee_save___pv_queued_spin_unlock+0x11/0x20
>>>> [81133.551566]  [<ffffffff81981ad7>] net_rx_action+0x1f7/0x300
>>>> [81133.558412]  [<ffffffff810cb6c3>] __do_softirq+0x103/0x210
>>>> [81133.565353]  [<ffffffff810cb807>] run_ksoftirqd+0x37/0x60
>>>> [81133.572359]  [<ffffffff810e4de0>] smpboot_thread_fn+0x130/0x190
>>>> [81133.579215]  [<ffffffff810e4cb0>] ? sort_range+0x20/0x20
>>>> [81133.586042]  [<ffffffff810e1fae>] kthread+0xee/0x110
>>>> [81133.592792]  [<ffffffff810e1ec0>] ?
>>>> kthread_create_on_node+0x1b0/0x1b0
>>>> [81133.599694]  [<ffffffff81af92df>] ret_from_fork+0x3f/0x70
>>>> [81133.606662]  [<ffffffff810e1ec0>] ?
>>>> kthread_create_on_node+0x1b0/0x1b0
>>>> [81133.613445] Code: 77 28 5d c3 66 66 66 66 66 66 2e 0f 1f 84 00 00
>>>> 00
>>>> 00 00 48 8b 47 08 55 48 89 e5 48 85 c0 74 6a 48 8b 0f 48 85 c9 48 89
>>>> 08
>>>> 74 04 <48> 89 41 08 84 d2 74 08 48 c7 47 08 00 00 00 00 f6 47 2a 10 
>>>> 48
>>>> [81133.627196] RIP  [<ffffffff8110fb18>] detach_if_pending+0x18/0x80
>>>> [81133.634036]  RSP <ffff880059bb7848>
>>>> [81133.640817] ---[ end trace eaf596e1fcf6a591 ]---
>>>> [81133.647521] Kernel panic - not syncing: Fatal exception in
>>>> interrupt
>>> This looks like the bug fixed in David Miller net tree :
>>> http://git.kernel.org/cgit/linux/kernel/git/davem/net.git/commit/?id=2235f2ac75fd2501c251b0b699a9632e80239a6d
>> 
>> Will pull the net-tree in and re-test.
> 
> You should not pull the 'net-next', but rather the 'net' one.
> 
> 'net' is not necessarily included in 'net-next'.

Thanks for the reminder, but luckily I was aware of that;
I have seen enough of your replies asking for patches to be resubmitted
against "the other tree" ;)
The kernel with the patch is currently running, so fingers crossed.

--
Sander


* Re: Linux 4.2-rc6 regression: RIP: e030:[<ffffffff8110fb18>] [<ffffffff8110fb18>] detach_if_pending+0x18/0x80
  2015-08-12 21:46       ` Sander Eikelenboom
@ 2015-08-12 22:41         ` Eric Dumazet
  2015-08-14 22:09           ` Sander Eikelenboom
  0 siblings, 1 reply; 20+ messages in thread
From: Eric Dumazet @ 2015-08-12 22:41 UTC (permalink / raw)
  To: Sander Eikelenboom; +Cc: David Miller, linux-kernel, netdev

On Wed, 2015-08-12 at 23:46 +0200, Sander Eikelenboom wrote:

> Thanks for the reminder, but luckily i was aware of that,
> seen enough of your replies asking for patches to be resubmitted
> against "the other tree" ;)
> Kernel with patch is currently running so fingers crossed.

Thanks for testing. I am definitely interested in knowing your results.


* Re: Linux 4.2-rc6 regression: RIP: e030:[<ffffffff8110fb18>] [<ffffffff8110fb18>] detach_if_pending+0x18/0x80
  2015-08-12 22:41         ` Eric Dumazet
@ 2015-08-14 22:09           ` Sander Eikelenboom
  2015-08-14 22:16             ` Sander Eikelenboom
  2015-08-14 22:39             ` Eric Dumazet
  0 siblings, 2 replies; 20+ messages in thread
From: Sander Eikelenboom @ 2015-08-14 22:09 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, linux-kernel, netdev, xen-devel, david.vrabel

On 2015-08-13 00:41, Eric Dumazet wrote:
> On Wed, 2015-08-12 at 23:46 +0200, Sander Eikelenboom wrote:
> 
>> Thanks for the reminder, but luckily i was aware of that,
>> seen enough of your replies asking for patches to be resubmitted
>> against "the other tree" ;)
>> Kernel with patch is currently running so fingers crossed.
> 
> Thanks for testing. I am definitely interested knowing your results.

Hmm, it now seems that commit 83fccfc3940c4a2db90fd7e7079f5b465cd8c6af
is breaking things (I still have to test whether a revert helps).
I get this in some guests:

NMI watchdog: BUG: soft lockup - CPU#0 stuck for 506s! [swapper/0:0]
[ 6620.282805] Modules linked in:
[ 6620.282805] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.2.0-rc6-20150814-linus-doflr-apicrevert+ #1
[ 6620.282805] task: ffffffff8221a580 ti: ffffffff82200000 task.ti: ffffffff82200000
[ 6620.282805] RIP: e030:[<ffffffff8100122a>]  [<ffffffff8100122a>] xen_hypercall_xen_version+0xa/0x20
[ 6620.282805] RSP: e02b:ffff88000fc03d48  EFLAGS: 00000246
[ 6620.282805] RAX: 0000000000040006 RBX: 0000000000000200 RCX: ffffffff8100122a
[ 6620.282805] RDX: 0000000000000001 RSI: 00000000deadbeef RDI: 00000000deadbeef
[ 6620.282805] RBP: ffff88000fc03d60 R08: ffff88000fc03ee0 R09: 00000000000000ee
[ 6620.282805] R10: ffffffff8220a0c0 R11: 0000000000000246 R12: 00000000ffffffff
[ 6620.282805] R13: 0000000000000001 R14: ffff880003b53054 R15: 0000000000000005
[ 6620.282805] FS:  00007fec747ad800(0000) GS:ffff88000fc00000(0000) knlGS:0000000000000000
[ 6620.282805] CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 6620.282805] CR2: 00007ffcb7a7a6d8 CR3: 0000000003164000 CR4: 0000000000000660
[ 6620.282805] Stack:
[ 6620.282805]  0000000000000068 0000000000000007 ffffffff81008dbd ffff88000fc03dd8
[ 6620.282805]  ffffffff81009592 0000000000000068 ffffffff8220a0c0 00000000000000ee
[ 6620.282805]  ffff88000fc03ee0 0000000000000200 0000000000000200 0000000000000001
[ 6620.282805] Call Trace:
[ 6620.282805]  <IRQ>
[ 6620.282805]  [<ffffffff81008dbd>] ? xen_force_evtchn_callback+0xd/0x10
[ 6620.282805]  [<ffffffff81009592>] check_events+0x12/0x20
[ 6620.282805]  [<ffffffff8100957f>] ? xen_restore_fl_direct_reloc+0x4/0x4
[ 6620.282805]  [<ffffffff81af79a5>] ? _raw_spin_unlock_irqrestore+0x25/0x30
[ 6620.282805]  [<ffffffff8110ed43>] try_to_del_timer_sync+0x43/0x60
[ 6620.282805]  [<ffffffff8110eda7>] del_timer_sync+0x47/0x60
[ 6620.282805]  [<ffffffff81a2b698>] inet_csk_reqsk_queue_drop+0x118/0x1f0
[ 6620.282805]  [<ffffffff81a2b8c6>] reqsk_timer_handler+0x156/0x260
[ 6620.282805]  [<ffffffff81a2b770>] ? inet_csk_reqsk_queue_drop+0x1f0/0x1f0
[ 6620.282805]  [<ffffffff8110f3c7>] call_timer_fn.isra.27+0x17/0x80
[ 6620.282805]  [<ffffffff81a2b770>] ? inet_csk_reqsk_queue_drop+0x1f0/0x1f0
[ 6620.282805]  [<ffffffff8110f55d>] run_timer_softirq+0x12d/0x200
[ 6620.282805]  [<ffffffff810ca6c3>] __do_softirq+0x103/0x210
[ 6620.282805]  [<ffffffff810ca9cb>] irq_exit+0x4b/0xa0
[ 6620.282805]  [<ffffffff814f05d4>] xen_evtchn_do_upcall+0x34/0x50
[ 6620.282805]  [<ffffffff81af932e>] xen_do_hypervisor_callback+0x1e/0x40
[ 6620.282805]  <EOI>
[ 6620.282805]  [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
[ 6620.282805]  [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
[ 6620.282805]  [<ffffffff81008d60>] ? xen_safe_halt+0x10/0x20
[ 6620.282805]  [<ffffffff810188d3>] ? default_idle+0x13/0x20
[ 6620.282805]  [<ffffffff81018e1a>] ? arch_cpu_idle+0xa/0x10
[ 6620.282805]  [<ffffffff810f8e7e>] ? default_idle_call+0x2e/0x50
[ 6620.282805]  [<ffffffff810f9112>] ? cpu_startup_entry+0x272/0x2e0
[ 6620.282805]  [<ffffffff81ae7967>] ? rest_init+0x77/0x80
[ 6620.282805]  [<ffffffff82312f58>] ? start_kernel+0x43b/0x448
[ 6620.282805]  [<ffffffff823124ef>] ? x86_64_start_reservations+0x2a/0x2c
[ 6620.282805]  [<ffffffff82316008>] ? xen_start_kernel+0x550/0x55c
[ 6620.282805] Code: cc 51 41 53 b8 10 00 00 00 0f 05 41 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 51 41 53 b8 11 00 00 00 0f 05 <41> 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc


* Re: Linux 4.2-rc6 regression: RIP: e030:[<ffffffff8110fb18>] [<ffffffff8110fb18>] detach_if_pending+0x18/0x80
  2015-08-14 22:09           ` Sander Eikelenboom
@ 2015-08-14 22:16             ` Sander Eikelenboom
  2015-08-14 22:39             ` Eric Dumazet
  1 sibling, 0 replies; 20+ messages in thread
From: Sander Eikelenboom @ 2015-08-14 22:16 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, linux-kernel, netdev, xen-devel, david.vrabel

On 2015-08-15 00:09, Sander Eikelenboom wrote:
> On 2015-08-13 00:41, Eric Dumazet wrote:
>> On Wed, 2015-08-12 at 23:46 +0200, Sander Eikelenboom wrote:
>> 
>>> Thanks for the reminder, but luckily i was aware of that,
>>> seen enough of your replies asking for patches to be resubmitted
>>> against "the other tree" ;)
>>> Kernel with patch is currently running so fingers crossed.
>> 
>> Thanks for testing. I am definitely interested knowing your results.
> 
> Hmm it seems now commit 83fccfc3940c4a2db90fd7e7079f5b465cd8c6af is
> breaking things
> (have to test if a revert helps) i get this in some guests:

Should have checked that before: that commit wasn't in my tree yet, and it is
likely to fix the issue. Pulled it and compiling now.

--
Sander



> NMI watchdog: BUG: soft lockup - CPU#0 stuck for 506s! [swapper/0:0]
> [ 6620.282805] Modules linked in:
> [ 6620.282805] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.2.0-rc6-20150814-linus-doflr-apicrevert+ #1
> [ 6620.282805] task: ffffffff8221a580 ti: ffffffff82200000 task.ti: ffffffff82200000
> [ 6620.282805] RIP: e030:[<ffffffff8100122a>]  [<ffffffff8100122a>] xen_hypercall_xen_version+0xa/0x20
> [ 6620.282805] RSP: e02b:ffff88000fc03d48  EFLAGS: 00000246
> [ 6620.282805] RAX: 0000000000040006 RBX: 0000000000000200 RCX: ffffffff8100122a
> [ 6620.282805] RDX: 0000000000000001 RSI: 00000000deadbeef RDI: 00000000deadbeef
> [ 6620.282805] RBP: ffff88000fc03d60 R08: ffff88000fc03ee0 R09: 00000000000000ee
> [ 6620.282805] R10: ffffffff8220a0c0 R11: 0000000000000246 R12: 00000000ffffffff
> [ 6620.282805] R13: 0000000000000001 R14: ffff880003b53054 R15: 0000000000000005
> [ 6620.282805] FS:  00007fec747ad800(0000) GS:ffff88000fc00000(0000) knlGS:0000000000000000
> [ 6620.282805] CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
> [ 6620.282805] CR2: 00007ffcb7a7a6d8 CR3: 0000000003164000 CR4: 0000000000000660
> [ 6620.282805] Stack:
> [ 6620.282805]  0000000000000068 0000000000000007 ffffffff81008dbd ffff88000fc03dd8
> [ 6620.282805]  ffffffff81009592 0000000000000068 ffffffff8220a0c0 00000000000000ee
> [ 6620.282805]  ffff88000fc03ee0 0000000000000200 0000000000000200 0000000000000001
> [ 6620.282805] Call Trace:
> [ 6620.282805]  <IRQ>
> [ 6620.282805]  [<ffffffff81008dbd>] ? xen_force_evtchn_callback+0xd/0x10
> [ 6620.282805]  [<ffffffff81009592>] check_events+0x12/0x20
> [ 6620.282805]  [<ffffffff8100957f>] ? xen_restore_fl_direct_reloc+0x4/0x4
> [ 6620.282805]  [<ffffffff81af79a5>] ? _raw_spin_unlock_irqrestore+0x25/0x30
> [ 6620.282805]  [<ffffffff8110ed43>] try_to_del_timer_sync+0x43/0x60
> [ 6620.282805]  [<ffffffff8110eda7>] del_timer_sync+0x47/0x60
> [ 6620.282805]  [<ffffffff81a2b698>] inet_csk_reqsk_queue_drop+0x118/0x1f0
> [ 6620.282805]  [<ffffffff81a2b8c6>] reqsk_timer_handler+0x156/0x260
> [ 6620.282805]  [<ffffffff81a2b770>] ? inet_csk_reqsk_queue_drop+0x1f0/0x1f0
> [ 6620.282805]  [<ffffffff8110f3c7>] call_timer_fn.isra.27+0x17/0x80
> [ 6620.282805]  [<ffffffff81a2b770>] ? inet_csk_reqsk_queue_drop+0x1f0/0x1f0
> [ 6620.282805]  [<ffffffff8110f55d>] run_timer_softirq+0x12d/0x200
> [ 6620.282805]  [<ffffffff810ca6c3>] __do_softirq+0x103/0x210
> [ 6620.282805]  [<ffffffff810ca9cb>] irq_exit+0x4b/0xa0
> [ 6620.282805]  [<ffffffff814f05d4>] xen_evtchn_do_upcall+0x34/0x50
> [ 6620.282805]  [<ffffffff81af932e>] xen_do_hypervisor_callback+0x1e/0x40
> [ 6620.282805]  <EOI>
> [ 6620.282805]  [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
> [ 6620.282805]  [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
> [ 6620.282805]  [<ffffffff81008d60>] ? xen_safe_halt+0x10/0x20
> [ 6620.282805]  [<ffffffff810188d3>] ? default_idle+0x13/0x20
> [ 6620.282805]  [<ffffffff81018e1a>] ? arch_cpu_idle+0xa/0x10
> [ 6620.282805]  [<ffffffff810f8e7e>] ? default_idle_call+0x2e/0x50
> [ 6620.282805]  [<ffffffff810f9112>] ? cpu_startup_entry+0x272/0x2e0
> [ 6620.282805]  [<ffffffff81ae7967>] ? rest_init+0x77/0x80
> [ 6620.282805]  [<ffffffff82312f58>] ? start_kernel+0x43b/0x448
> [ 6620.282805]  [<ffffffff823124ef>] ? x86_64_start_reservations+0x2a/0x2c
> [ 6620.282805]  [<ffffffff82316008>] ? xen_start_kernel+0x550/0x55c
> [ 6620.282805] Code: cc 51 41 53 b8 10 00 00 00 0f 05 41 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 51 41 53 b8 11 00 00 00 0f 05 <41> 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc

* Re: Linux 4.2-rc6 regression: RIP: e030:[<ffffffff8110fb18>] [<ffffffff8110fb18>] detach_if_pending+0x18/0x80
  2015-08-14 22:09           ` Sander Eikelenboom
  2015-08-14 22:16             ` Sander Eikelenboom
@ 2015-08-14 22:39             ` Eric Dumazet
  2015-08-17  9:09               ` Sander Eikelenboom
  1 sibling, 1 reply; 20+ messages in thread
From: Eric Dumazet @ 2015-08-14 22:39 UTC (permalink / raw)
  To: Sander Eikelenboom
  Cc: David Miller, linux-kernel, netdev, xen-devel, david.vrabel

On Sat, 2015-08-15 at 00:09 +0200, Sander Eikelenboom wrote:
> On 2015-08-13 00:41, Eric Dumazet wrote:
> > On Wed, 2015-08-12 at 23:46 +0200, Sander Eikelenboom wrote:
> > 
> >> Thanks for the reminder, but luckily i was aware of that,
> >> seen enough of your replies asking for patches to be resubmitted
> >> against "the other tree" ;)
> >> Kernel with patch is currently running so fingers crossed.
> > 
> > Thanks for testing. I am definitely interested knowing your results.
> 
> Hmm it seems now commit 83fccfc3940c4a2db90fd7e7079f5b465cd8c6af is 
> breaking things
> (have to test if a revert helps) i get this in some guests:


Yes, this was fixed by :
http://git.kernel.org/cgit/linux/kernel/git/davem/net.git/commit/?id=83fccfc3940c4a2db90fd7e7079f5b465cd8c6af

* Re: Linux 4.2-rc6 regression: RIP: e030:[<ffffffff8110fb18>] [<ffffffff8110fb18>] detach_if_pending+0x18/0x80
  2015-08-14 22:39             ` Eric Dumazet
@ 2015-08-17  9:09               ` Sander Eikelenboom
  2015-08-17 13:37                 ` Eric Dumazet
  0 siblings, 1 reply; 20+ messages in thread
From: Sander Eikelenboom @ 2015-08-17  9:09 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, linux-kernel, netdev, xen-devel, david.vrabel


Saturday, August 15, 2015, 12:39:25 AM, you wrote:

> On Sat, 2015-08-15 at 00:09 +0200, Sander Eikelenboom wrote:
>> On 2015-08-13 00:41, Eric Dumazet wrote:
>> > On Wed, 2015-08-12 at 23:46 +0200, Sander Eikelenboom wrote:
>> > 
>> >> Thanks for the reminder, but luckily i was aware of that,
>> >> seen enough of your replies asking for patches to be resubmitted
>> >> against "the other tree" ;)
>> >> Kernel with patch is currently running so fingers crossed.
>> > 
>> > Thanks for testing. I am definitely interested knowing your results.
>> 
>> Hmm it seems now commit 83fccfc3940c4a2db90fd7e7079f5b465cd8c6af is 
>> breaking things
>> (have to test if a revert helps) i get this in some guests:


> Yes, this was fixed by :
> http://git.kernel.org/cgit/linux/kernel/git/davem/net.git/commit/?id=83fccfc3940c4a2db90fd7e7079f5b465cd8c6af


Hi Eric,

With that patch i had a crash again this night, see below.

--
Sander

[177459.188808] general protection fault: 0000 [#1] SMP 
[177459.199746] Modules linked in:
[177459.210540] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.2.0-rc6-20150815-linus-doflr-net+ #1
[177459.221441] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640)  , BIOS V1.8B1 09/13/2010
[177459.232247] task: ffffffff8221a580 ti: ffffffff82200000 task.ti: ffffffff82200000
[177459.242931] RIP: e030:[<ffffffff8110eb58>]  [<ffffffff8110eb58>] detach_if_pending+0x18/0x80
[177459.253503] RSP: e02b:ffff88005f6039d8  EFLAGS: 00010086
[177459.264051] RAX: ffff8800584d6580 RBX: ffff880004901420 RCX: dead000000200200
[177459.274599] RDX: 0000000000000000 RSI: ffff88005f60e5c0 RDI: ffff880004901420
[177459.285122] RBP: ffff88005f6039d8 R08: 0000000000000001 R09: 0000000000000000
[177459.295286] R10: 0000000000000003 R11: ffff880004901394 R12: 0000000000000003
[177459.305388] R13: 000000010ae47040 R14: 0000000007b98a00 R15: ffff88005f60e5c0
[177459.315345] FS:  00007f51317ec700(0000) GS:ffff88005f600000(0000) knlGS:0000000000000000
[177459.325340] CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[177459.335217] CR2: 00000000010f8000 CR3: 000000002a154000 CR4: 0000000000000660
[177459.345129] Stack:
[177459.354783]  ffff88005f603a28 ffffffff8110ee7f ffffffff810fb261 0000000000000200
[177459.364505]  0000000000000003 ffff880004901380 0000000000000003 ffff8800567d0d00
[177459.374064]  0000000007b98a00 0000000000000000 ffff88005f603a58 ffffffff819b3eb3
[177459.383532] Call Trace:
[177459.392878]  <IRQ> 
[177459.392935]  [<ffffffff8110ee7f>] mod_timer_pending+0x3f/0xe0
[177459.411058]  [<ffffffff810fb261>] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x20
[177459.419876]  [<ffffffff819b3eb3>] __nf_ct_refresh_acct+0xa3/0xb0
[177459.428642]  [<ffffffff819baafb>] tcp_packet+0xb3b/0x1290
[177459.437285]  [<ffffffff81a2535e>] ? ip_output+0x5e/0xc0
[177459.445845]  [<ffffffff810ca8ca>] ? __local_bh_enable_ip+0x2a/0x90
[177459.454331]  [<ffffffff819b35a9>] ? __nf_conntrack_find_get+0x129/0x2a0
[177459.462642]  [<ffffffff819b549c>] nf_conntrack_in+0x29c/0x7c0
[177459.470711]  [<ffffffff81a65e9c>] ipv4_conntrack_local+0x4c/0x50
[177459.478753]  [<ffffffff819ad67c>] nf_iterate+0x4c/0x80
[177459.486726]  [<ffffffff81102437>] ? generic_handle_irq+0x27/0x40
[177459.494634]  [<ffffffff819ad714>] nf_hook_slow+0x64/0xc0
[177459.502486]  [<ffffffff81a22d40>] __ip_local_out_sk+0x90/0xa0
[177459.510248]  [<ffffffff81a22c40>] ? ip_forward_options+0x1a0/0x1a0
[177459.517782]  [<ffffffff81a22d66>] ip_local_out_sk+0x16/0x40
[177459.525044]  [<ffffffff81a2343d>] ip_queue_xmit+0x14d/0x350
[177459.532247]  [<ffffffff81a3ae7e>] tcp_transmit_skb+0x48e/0x960
[177459.539413]  [<ffffffff81a3cddb>] tcp_xmit_probe_skb+0xdb/0xf0
[177459.546389]  [<ffffffff81a3dffb>] tcp_write_wakeup+0x5b/0x150
[177459.553061]  [<ffffffff81a3e51b>] tcp_keepalive_timer+0x1fb/0x230
[177459.559761]  [<ffffffff81a3e320>] ? tcp_init_xmit_timers+0x20/0x20
[177459.566447]  [<ffffffff8110f3c7>] call_timer_fn.isra.27+0x17/0x80
[177459.573121]  [<ffffffff81a3e320>] ? tcp_init_xmit_timers+0x20/0x20
[177459.579778]  [<ffffffff8110f55d>] run_timer_softirq+0x12d/0x200
[177459.586448]  [<ffffffff810ca6c3>] __do_softirq+0x103/0x210
[177459.593138]  [<ffffffff810ca9cb>] irq_exit+0x4b/0xa0
[177459.599783]  [<ffffffff814f05d4>] xen_evtchn_do_upcall+0x34/0x50
[177459.606300]  [<ffffffff81af93ae>] xen_do_hypervisor_callback+0x1e/0x40
[177459.612583]  <EOI> 
[177459.612637]  [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
[177459.625010]  [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
[177459.631157]  [<ffffffff81008d60>] ? xen_safe_halt+0x10/0x20
[177459.637158]  [<ffffffff810188d3>] ? default_idle+0x13/0x20
[177459.643072]  [<ffffffff81018e1a>] ? arch_cpu_idle+0xa/0x10
[177459.648809]  [<ffffffff810f8e7e>] ? default_idle_call+0x2e/0x50
[177459.654650]  [<ffffffff810f9112>] ? cpu_startup_entry+0x272/0x2e0
[177459.660488]  [<ffffffff81ae79f7>] ? rest_init+0x77/0x80
[177459.666297]  [<ffffffff82312f58>] ? start_kernel+0x43b/0x448
[177459.672092]  [<ffffffff823124ef>] ? x86_64_start_reservations+0x2a/0x2c
[177459.677800]  [<ffffffff82316008>] ? xen_start_kernel+0x550/0x55c
[177459.683451] Code: 77 28 5d c3 66 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 48 8b 47 08 55 48 89 e5 48 85 c0 74 6a 48 8b 0f 48 85 c9 48 89 08 74 04 <48> 89 41 08 84 d2 74 08 48 c7 47 08 00 00 00 00 f6 47 2a 10 48 
[177459.695332] RIP  [<ffffffff8110eb58>] detach_if_pending+0x18/0x80
[177459.701154]  RSP <ffff88005f6039d8>
(XEN) [2015-08-17 00:11:51.426] Hardware Dom0 crashed: rebooting machine in 5 seconds.

* Re: Linux 4.2-rc6 regression: RIP: e030:[<ffffffff8110fb18>] [<ffffffff8110fb18>] detach_if_pending+0x18/0x80
  2015-08-17  9:09               ` Sander Eikelenboom
@ 2015-08-17 13:37                 ` Eric Dumazet
  2015-08-17 13:48                   ` Sander Eikelenboom
  0 siblings, 1 reply; 20+ messages in thread
From: Eric Dumazet @ 2015-08-17 13:37 UTC (permalink / raw)
  To: Sander Eikelenboom
  Cc: David Miller, linux-kernel, netdev, xen-devel, david.vrabel

On Mon, 2015-08-17 at 11:09 +0200, Sander Eikelenboom wrote:
> Saturday, August 15, 2015, 12:39:25 AM, you wrote:
> 
> > On Sat, 2015-08-15 at 00:09 +0200, Sander Eikelenboom wrote:
> >> On 2015-08-13 00:41, Eric Dumazet wrote:
> >> > On Wed, 2015-08-12 at 23:46 +0200, Sander Eikelenboom wrote:
> >> > 
> >> >> Thanks for the reminder, but luckily i was aware of that,
> >> >> seen enough of your replies asking for patches to be resubmitted
> >> >> against "the other tree" ;)
> >> >> Kernel with patch is currently running so fingers crossed.
> >> > 
> >> > Thanks for testing. I am definitely interested knowing your results.
> >> 
> >> Hmm it seems now commit 83fccfc3940c4a2db90fd7e7079f5b465cd8c6af is 
> >> breaking things
> >> (have to test if a revert helps) i get this in some guests:
> 
> 
> > Yes, this was fixed by :
> > http://git.kernel.org/cgit/linux/kernel/git/davem/net.git/commit/?id=83fccfc3940c4a2db90fd7e7079f5b465cd8c6af
> 
> 
> Hi Eric,
> 
> With that patch i had a crash again this night, see below.
> 
> --
> Sander
> 
> [full oops trace snipped]
> 


This might be conntracking related, then.
You might try:

1) reproduce the issue without conntracking.

2) bisect the bug

Thanks.

* Re: Linux 4.2-rc6 regression: RIP: e030:[<ffffffff8110fb18>] [<ffffffff8110fb18>] detach_if_pending+0x18/0x80
  2015-08-17 13:37                 ` Eric Dumazet
@ 2015-08-17 13:48                   ` Sander Eikelenboom
  2015-08-17 14:02                     ` Jon Christopherson
  0 siblings, 1 reply; 20+ messages in thread
From: Sander Eikelenboom @ 2015-08-17 13:48 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, linux-kernel, netdev, xen-devel, david.vrabel


Monday, August 17, 2015, 3:37:13 PM, you wrote:

> On Mon, 2015-08-17 at 11:09 +0200, Sander Eikelenboom wrote:
>> Saturday, August 15, 2015, 12:39:25 AM, you wrote:
>> 
>> > On Sat, 2015-08-15 at 00:09 +0200, Sander Eikelenboom wrote:
>> >> On 2015-08-13 00:41, Eric Dumazet wrote:
>> >> > On Wed, 2015-08-12 at 23:46 +0200, Sander Eikelenboom wrote:
>> >> > 
>> >> >> Thanks for the reminder, but luckily i was aware of that,
>> >> >> seen enough of your replies asking for patches to be resubmitted
>> >> >> against "the other tree" ;)
>> >> >> Kernel with patch is currently running so fingers crossed.
>> >> > 
>> >> > Thanks for testing. I am definitely interested knowing your results.
>> >> 
>> >> Hmm it seems now commit 83fccfc3940c4a2db90fd7e7079f5b465cd8c6af is 
>> >> breaking things
>> >> (have to test if a revert helps) i get this in some guests:
>> 
>> 
>> > Yes, this was fixed by :
>> > http://git.kernel.org/cgit/linux/kernel/git/davem/net.git/commit/?id=83fccfc3940c4a2db90fd7e7079f5b465cd8c6af
>> 
>> 
>> Hi Eric,
>> 
>> With that patch i had a crash again this night, see below.
>> 
>> --
>> Sander
>> 
>> [full oops trace snipped]
>> 


> might be conntracking related then.
> You might try :

> 1) reproduce the issue without conntracking.
Will see if i can do that.

> 2) bisect the bug
Hmm, that's going to be quite painful, since i don't have an immediate
and reliable testcase (running for "about two days" doesn't qualify),
especially since there are all kinds of other known bugs in between.

> Thanks.

--
Sander

* Re: Linux 4.2-rc6 regression: RIP: e030:[<ffffffff8110fb18>] [<ffffffff8110fb18>] detach_if_pending+0x18/0x80
  2015-08-17 13:48                   ` Sander Eikelenboom
@ 2015-08-17 14:02                     ` Jon Christopherson
  2015-08-17 14:21                       ` Eric Dumazet
  0 siblings, 1 reply; 20+ messages in thread
From: Jon Christopherson @ 2015-08-17 14:02 UTC (permalink / raw)
  To: Sander Eikelenboom
  Cc: David Miller, linux-kernel, netdev, xen-devel, david.vrabel,
	Eric Dumazet

This is very similar to the behavior I am seeing in this bug:

https://bugzilla.kernel.org/show_bug.cgi?id=102911

-Jon

* Re: Linux 4.2-rc6 regression: RIP: e030:[<ffffffff8110fb18>] [<ffffffff8110fb18>] detach_if_pending+0x18/0x80
  2015-08-17 14:02                     ` Jon Christopherson
@ 2015-08-17 14:21                       ` Eric Dumazet
  2015-08-17 14:25                         ` Sander Eikelenboom
  0 siblings, 1 reply; 20+ messages in thread
From: Eric Dumazet @ 2015-08-17 14:21 UTC (permalink / raw)
  To: Jon Christopherson
  Cc: Sander Eikelenboom, David Miller, linux-kernel, netdev, xen-devel,
	david.vrabel

On Mon, 2015-08-17 at 09:02 -0500, Jon Christopherson wrote:
> This is very similar to the behavior I am seeing in this bug:
> 
> https://bugzilla.kernel.org/show_bug.cgi?id=102911

OK, but have you applied the fix ?

http://git.kernel.org/cgit/linux/kernel/git/davem/net.git/commit/?id=83fccfc3940c4a2db90fd7e7079f5b465cd8c6af

It will be part of the net iteration from David Miller to Linus Torvalds.

* Re: Linux 4.2-rc6 regression: RIP: e030:[<ffffffff8110fb18>] [<ffffffff8110fb18>] detach_if_pending+0x18/0x80
  2015-08-17 14:21                       ` Eric Dumazet
@ 2015-08-17 14:25                         ` Sander Eikelenboom
  2015-08-17 15:16                           ` Jon Christopherson
  2015-08-17 17:18                           ` Eric Dumazet
  0 siblings, 2 replies; 20+ messages in thread
From: Sander Eikelenboom @ 2015-08-17 14:25 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Jon Christopherson, David Miller, linux-kernel, netdev, xen-devel,
	david.vrabel


Monday, August 17, 2015, 4:21:47 PM, you wrote:

> On Mon, 2015-08-17 at 09:02 -0500, Jon Christopherson wrote:
>> This is very similar to the behavior I am seeing in this bug:
>> 
>> https://bugzilla.kernel.org/show_bug.cgi?id=102911

> OK, but have you applied the fix ?

> http://git.kernel.org/cgit/linux/kernel/git/davem/net.git/commit/?id=83fccfc3940c4a2db90fd7e7079f5b465cd8c6af

> It will be part of the net iteration from David Miller to Linus Torvalds.


I did have that patch in for my last report.
But i don't think he had (looking at the second part of his oops).
 
--
Sander

* Re: Linux 4.2-rc6 regression: RIP: e030:[<ffffffff8110fb18>] [<ffffffff8110fb18>] detach_if_pending+0x18/0x80
  2015-08-17 14:25                         ` Sander Eikelenboom
@ 2015-08-17 15:16                           ` Jon Christopherson
  2015-08-17 17:18                           ` Eric Dumazet
  1 sibling, 0 replies; 20+ messages in thread
From: Jon Christopherson @ 2015-08-17 15:16 UTC (permalink / raw)
  To: Sander Eikelenboom
  Cc: David Miller, linux-kernel, netdev, xen-devel, david.vrabel,
	Eric Dumazet

On 08/17/2015 09:25 AM, Sander Eikelenboom wrote:
>
> > OK, but have you applied the fix ?
>
> > http://git.kernel.org/cgit/linux/kernel/git/davem/net.git/commit/?id=83fccfc3940c4a2db90fd7e7079f5b465cd8c6af
>
> > It will be part of the net iteration from David Miller to Linus Torvalds.
>
>
I did not have that fix applied, but will apply and test.

Thanks,

Jon

* Re: Linux 4.2-rc6 regression: RIP: e030:[<ffffffff8110fb18>] [<ffffffff8110fb18>] detach_if_pending+0x18/0x80
  2015-08-17 14:25                         ` Sander Eikelenboom
  2015-08-17 15:16                           ` Jon Christopherson
@ 2015-08-17 17:18                           ` Eric Dumazet
  2015-08-17 18:27                             ` Sander Eikelenboom
                                               ` (2 more replies)
  1 sibling, 3 replies; 20+ messages in thread
From: Eric Dumazet @ 2015-08-17 17:18 UTC (permalink / raw)
  To: Sander Eikelenboom, Thomas Gleixner
  Cc: Jon Christopherson, David Miller, linux-kernel, netdev, xen-devel,
	david.vrabel

From: Eric Dumazet <edumazet@google.com>

On Mon, 2015-08-17 at 16:25 +0200, Sander Eikelenboom wrote:
> Monday, August 17, 2015, 4:21:47 PM, you wrote:
> 
> > On Mon, 2015-08-17 at 09:02 -0500, Jon Christopherson wrote:
> >> This is very similar to the behavior I am seeing in this bug:
> >> 
> >> https://bugzilla.kernel.org/show_bug.cgi?id=102911
> 
> > OK, but have you applied the fix ?
> 
> > http://git.kernel.org/cgit/linux/kernel/git/davem/net.git/commit/?id=83fccfc3940c4a2db90fd7e7079f5b465cd8c6af
> 
> > It will be part of the net iteration from David Miller to Linus Torvalds.
> 
> 
> I did have that patch in for my last report.
> But i don't think he had (looking at the second part of his oops).
>  

Then can you try the following fix as well?

Thanks !

[PATCH] timer: fix a race in __mod_timer()

lock_timer_base() cannot catch the following:

CPU1 (in __mod_timer())
timer->flags |= TIMER_MIGRATING;
spin_unlock(&base->lock);
base = new_base;
spin_lock(&base->lock);
timer->flags &= ~TIMER_BASEMASK;
                                  CPU2 (in lock_timer_base())
                                  sees timer base is cpu0 base
                                  spin_lock_irqsave(&base->lock, *flags);
                                  if (timer->flags == tf)
                                       return base; // oops, wrong base
timer->flags |= base->cpu; // too late

We must write timer->flags in one go, otherwise we can fool other CPUs.

Fixes: bc7a34b8b9eb ("timer: Reduce timer migration overhead if disabled")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/time/timer.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 5e097fa9faf7..84190f02b521 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -807,8 +807,8 @@ __mod_timer(struct timer_list *timer, unsigned long expires,
 			spin_unlock(&base->lock);
 			base = new_base;
 			spin_lock(&base->lock);
-			timer->flags &= ~TIMER_BASEMASK;
-			timer->flags |= base->cpu;
+			WRITE_ONCE(timer->flags,
+				   (timer->flags & ~TIMER_BASEMASK) | base->cpu);
 		}
 	}
 

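The race window described in the changelog above can be illustrated with a small userspace sketch. This is not kernel code: the mask values, the helper names (flags_after_first_store(), flags_single_store(), reader_picks_cpu(), check_race_window()) and the CPU numbers are assumptions made for illustration only. It models what a lock_timer_base()-style reader concludes from a snapshot of timer->flags taken between the two stores of the old code, compared with the single store of the patched code.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout of timer->flags for this sketch: the low bits hold the CPU
 * number of the timer's base, one bit marks a migration in progress. */
#define TIMER_CPUMASK   0x0007FFFFu
#define TIMER_MIGRATING 0x00080000u
#define TIMER_BASEMASK  (TIMER_CPUMASK | TIMER_MIGRATING)

/* Value another CPU can observe between the two stores of the old code:
 * the base bits are already cleared, the new CPU number is not yet set. */
static uint32_t flags_after_first_store(uint32_t flags)
{
	return flags & ~TIMER_BASEMASK;
}

/* Value published by the patched code: one store, new CPU already set. */
static uint32_t flags_single_store(uint32_t flags, uint32_t new_cpu)
{
	return (flags & ~TIMER_BASEMASK) | new_cpu;
}

/* What a reader concludes from a flags snapshot: -1 means "migration in
 * progress, retry", otherwise the CPU whose base it would lock. */
static int reader_picks_cpu(uint32_t flags)
{
	if (flags & TIMER_MIGRATING)
		return -1;
	return (int)(flags & TIMER_CPUMASK);
}

/* Hypothetical scenario: a timer on CPU 2 is being migrated to CPU 3. */
void check_race_window(void)
{
	uint32_t f = TIMER_MIGRATING | 2u;

	/* While TIMER_MIGRATING is set, the reader backs off. */
	assert(reader_picks_cpu(f) == -1);
	/* In the window of the old code, the reader is fooled into CPU 0. */
	assert(reader_picks_cpu(flags_after_first_store(f)) == 0);
	/* With a single store, the reader only ever sees the new CPU. */
	assert(reader_picks_cpu(flags_single_store(f, 3u)) == 3);
}
```

In the window, a reader that snapshots the flags, locks the cpu0 base and re-reads an unchanged value would return with the wrong base locked, which is the situation the single WRITE_ONCE() store is meant to rule out.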
* Re: Linux 4.2-rc6 regression: RIP: e030:[<ffffffff8110fb18>] [<ffffffff8110fb18>] detach_if_pending+0x18/0x80
  2015-08-17 17:18                           ` Eric Dumazet
@ 2015-08-17 18:27                             ` Sander Eikelenboom
  2015-08-17 19:13                             ` Thomas Gleixner
  2015-08-18  0:05                             ` Jon Christopherson
  2 siblings, 0 replies; 20+ messages in thread
From: Sander Eikelenboom @ 2015-08-17 18:27 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Thomas Gleixner, Jon Christopherson, David Miller, linux-kernel,
	netdev, xen-devel, david.vrabel

On 2015-08-17 19:18, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
> 
> On Mon, 2015-08-17 at 16:25 +0200, Sander Eikelenboom wrote:
>> Monday, August 17, 2015, 4:21:47 PM, you wrote:
>> 
>> > On Mon, 2015-08-17 at 09:02 -0500, Jon Christopherson wrote:
>> >> This is very similar to the behavior I am seeing in this bug:
>> >>
>> >> https://bugzilla.kernel.org/show_bug.cgi?id=102911
>> 
>> > OK, but have you applied the fix?
>> 
>> > http://git.kernel.org/cgit/linux/kernel/git/davem/net.git/commit/?id=83fccfc3940c4a2db90fd7e7079f5b465cd8c6af
>> 
>> > It will be part of net iteration from David Miller to Linus Torvalds.
>> 
>> 
>> I did have that patch in for my last report.
>> But I don't think he had (looking at the second part of his oops).
>> 
> 
> Then can you try the following fix as well?
> 
> Thanks!

Running now :)


> 
> [PATCH] timer: fix a race in __mod_timer()
> 
> lock_timer_base() cannot catch the following race:
> 
> CPU1 (in __mod_timer())
> timer->flags |= TIMER_MIGRATING;
> spin_unlock(&base->lock);
> base = new_base;
> spin_lock(&base->lock);
> timer->flags &= ~TIMER_BASEMASK;
>                                   CPU2 (in lock_timer_base())
>                                   sees timer base is cpu0 base
>                                   spin_lock_irqsave(&base->lock, *flags);
>                                   if (timer->flags == tf)
>                                        return base; // oops, wrong base
> timer->flags |= base->cpu; // too late
> 
> We must write timer->flags in one go, otherwise we can fool other CPUs.
> 
> Fixes: bc7a34b8b9eb ("timer: Reduce timer migration overhead if disabled")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> ---
>  kernel/time/timer.c |    4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/time/timer.c b/kernel/time/timer.c
> index 5e097fa9faf7..84190f02b521 100644
> --- a/kernel/time/timer.c
> +++ b/kernel/time/timer.c
> @@ -807,8 +807,8 @@ __mod_timer(struct timer_list *timer, unsigned long expires,
>  			spin_unlock(&base->lock);
>  			base = new_base;
>  			spin_lock(&base->lock);
> -			timer->flags &= ~TIMER_BASEMASK;
> -			timer->flags |= base->cpu;
> +			WRITE_ONCE(timer->flags,
> +				   (timer->flags & ~TIMER_BASEMASK) | base->cpu);
>  		}
>  	}


* Re: Linux 4.2-rc6 regression: RIP: e030:[<ffffffff8110fb18>] [<ffffffff8110fb18>] detach_if_pending+0x18/0x80
  2015-08-17 17:18                           ` Eric Dumazet
  2015-08-17 18:27                             ` Sander Eikelenboom
@ 2015-08-17 19:13                             ` Thomas Gleixner
  2015-08-18  0:05                             ` Jon Christopherson
  2 siblings, 0 replies; 20+ messages in thread
From: Thomas Gleixner @ 2015-08-17 19:13 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Sander Eikelenboom, Jon Christopherson, David Miller,
	linux-kernel, netdev, xen-devel, david.vrabel

On Mon, 17 Aug 2015, Eric Dumazet wrote:
> [PATCH] timer: fix a race in __mod_timer()
> 
> lock_timer_base() cannot catch the following race:
> 
> CPU1 (in __mod_timer())
> timer->flags |= TIMER_MIGRATING;
> spin_unlock(&base->lock);
> base = new_base;
> spin_lock(&base->lock);
> timer->flags &= ~TIMER_BASEMASK;
>                                   CPU2 (in lock_timer_base())
>                                   sees timer base is cpu0 base
>                                   spin_lock_irqsave(&base->lock, *flags);
>                                   if (timer->flags == tf)
>                                        return base; // oops, wrong base
> timer->flags |= base->cpu; // too late
> 
> We must write timer->flags in one go, otherwise we can fool other CPUs.
> 
> Fixes: bc7a34b8b9eb ("timer: Reduce timer migration overhead if disabled")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> ---
>  kernel/time/timer.c |    4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/time/timer.c b/kernel/time/timer.c
> index 5e097fa9faf7..84190f02b521 100644
> --- a/kernel/time/timer.c
> +++ b/kernel/time/timer.c
> @@ -807,8 +807,8 @@ __mod_timer(struct timer_list *timer, unsigned long expires,
>  			spin_unlock(&base->lock);
>  			base = new_base;
>  			spin_lock(&base->lock);
> -			timer->flags &= ~TIMER_BASEMASK;
> -			timer->flags |= base->cpu;
> +			WRITE_ONCE(timer->flags,
> +				   (timer->flags & ~TIMER_BASEMASK) | base->cpu);

Duh, yes. Picking it up for timers/urgent.

Thanks for spotting it.

       tglx


* Re: Linux 4.2-rc6 regression: RIP: e030:[<ffffffff8110fb18>] [<ffffffff8110fb18>] detach_if_pending+0x18/0x80
  2015-08-17 17:18                           ` Eric Dumazet
  2015-08-17 18:27                             ` Sander Eikelenboom
  2015-08-17 19:13                             ` Thomas Gleixner
@ 2015-08-18  0:05                             ` Jon Christopherson
  2 siblings, 0 replies; 20+ messages in thread
From: Jon Christopherson @ 2015-08-18  0:05 UTC (permalink / raw)
  To: Eric Dumazet, Sander Eikelenboom, Thomas Gleixner
  Cc: David Miller, linux-kernel, netdev, xen-devel, david.vrabel

On 08/17/2015 12:18 PM, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>

<snip>

>
> Then can you try the following fix as well?
>
> Thanks!
>
> [PATCH] timer: fix a race in __mod_timer()
>

<snip>

I have been running the latest code from git with the 2 patches in this 
thread applied. No issues so far.

-Jon


Thread overview: 20+ messages
2015-08-12 19:19 Linux 4.2-rc6 regression: RIP: e030:[<ffffffff8110fb18>] [<ffffffff8110fb18>] detach_if_pending+0x18/0x80 linux
2015-08-12 20:41 ` Eric Dumazet
2015-08-12 20:50   ` linux
2015-08-12 21:40     ` David Miller
2015-08-12 21:46       ` Sander Eikelenboom
2015-08-12 22:41         ` Eric Dumazet
2015-08-14 22:09           ` Sander Eikelenboom
2015-08-14 22:16             ` Sander Eikelenboom
2015-08-14 22:39             ` Eric Dumazet
2015-08-17  9:09               ` Sander Eikelenboom
2015-08-17 13:37                 ` Eric Dumazet
2015-08-17 13:48                   ` Sander Eikelenboom
2015-08-17 14:02                     ` Jon Christopherson
2015-08-17 14:21                       ` Eric Dumazet
2015-08-17 14:25                         ` Sander Eikelenboom
2015-08-17 15:16                           ` Jon Christopherson
2015-08-17 17:18                           ` Eric Dumazet
2015-08-17 18:27                             ` Sander Eikelenboom
2015-08-17 19:13                             ` Thomas Gleixner
2015-08-18  0:05                             ` Jon Christopherson
