linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* e1000_netpoll() , BUG: sleeping function called from invalid context
@ 2017-02-17 21:20 Gabriel C
  2017-02-17 22:38 ` Cong Wang
  0 siblings, 1 reply; 14+ messages in thread
From: Gabriel C @ 2017-02-17 21:20 UTC (permalink / raw)
  To: lkml; +Cc: Thomas Gleixner, WANG Cong, David S. Miller,
	Peter Zijlstra (Intel)

Hi all,

while poking at a different issue I found the following on my logs :

[85362.132770] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110
[85362.132771] in_atomic(): 1, irqs_disabled(): 1, pid: 1153, name: systemd-journal
[85362.132772] no locks held by systemd-journal/1153.
[85362.132772] irq event stamp: 60088359
[85362.132777] hardirqs last  enabled at (60088359): [<ffffffff810d07c2>] vprintk_emit+0x432/0x470
[85362.132779] hardirqs last disabled at (60088358): [<ffffffff810d03ec>] vprintk_emit+0x5c/0x470
[85362.132782] softirqs last  enabled at (60088258): [<ffffffff810688fd>] __do_softirq+0x22d/0x290
[85362.132784] softirqs last disabled at (60088233): [<ffffffff81068c0a>] irq_exit+0x6a/0xd0
[85362.132784] Preemption disabled at:
[85362.132787] [<ffffffff815203de>] write_msg+0x4e/0xf0
[85362.132790] CPU: 0 PID: 1153 Comm: systemd-journal Tainted: G          I     4.10.0-rc8-debug-00001-ga1015e374d94-dirty #5
[85362.132791] Hardware name: FUJITSU                          PRIMERGY TX200 S5             /D2709, BIOS 6.00 Rev. 1.14.2709              02/04/2013
[85362.132792] Call Trace:
[85362.132796]  dump_stack+0x86/0xc1
[85362.132799]  ___might_sleep+0x213/0x230
[85362.132801]  __might_sleep+0x6b/0x80
[85362.132803]  synchronize_irq+0x33/0x90
[85362.132805]  ? __irq_put_desc_unlock+0x19/0x40
[85362.132807]  ? __disable_irq_nosync+0x4e/0x60
[85362.132808]  disable_irq+0x17/0x20
[85362.132810]  e1000_netpoll+0x3d/0x110
[85362.132813]  netpoll_poll_dev+0xa0/0x170
[85362.132814]  netpoll_send_skb_on_dev+0x1ab/0x2b0
[85362.132816]  netpoll_send_udp+0x417/0x450
[85362.132817]  write_msg+0xdb/0xf0
[85362.132819]  console_unlock+0x2e5/0x430
[85362.132821]  ? __down_trylock_console_sem+0x41/0x60
[85362.132822]  vprintk_emit+0x456/0x470
[85362.132825]  printk_emit+0x2e/0x36
[85362.132827]  ? simple_strtoull+0x2c/0x50
[85362.132829]  devkmsg_write+0x115/0x160
[85362.132831]  do_iter_readv_writev+0xd8/0x100
[85362.132833]  do_readv_writev+0xdf/0x220
[85362.132835]  ? __might_fault+0x37/0x90
[85362.132838]  ? entry_SYSCALL_64_fastpath+0x5/0xc2
[85362.132839]  vfs_writev+0x3a/0x50
[85362.132841]  do_writev+0x4c/0xd0
[85362.132842]  SyS_writev+0xb/0x10
[85362.132843]  entry_SYSCALL_64_fastpath+0x1f/0xc2
[85362.132845] RIP: 0033:0x7fc6d305c46d
[85362.132846] RSP: 002b:00007ffca94161e0 EFLAGS: 00000293 ORIG_RAX: 0000000000000014
[85362.132847] RAX: ffffffffffffffda RBX: ffffffff813afad3 RCX: 00007fc6d305c46d
[85362.132848] RDX: 0000000000000005 RSI: 00007ffca94162f0 RDI: 0000000000000006
[85362.132849] RBP: ffffc9000d4c3f88 R08: 0000000000000000 R09: 0000000000000008
[85362.132850] R10: 0000000000000069 R11: 0000000000000293 R12: 00007ffca94162f0
[85362.132851] R13: 000055768b19c920 R14: 00007ffca9416330 R15: 00007ffca94163c0
[85362.132853]  ? __this_cpu_preempt_check+0x13/0x20


Seems to be introduced by :

commit 311191297125156319be8f86d546ea1c569f7e95
Author: WANG Cong <xiyou.wangcong@gmail.com>
Date:   Sat Dec 10 14:22:42 2016 -0800

     e1000: use disable_hardirq() for e1000_netpoll()

...

The used config can be found there:

http://ftp.frugalware.org/pub/other/people/crazy/kernel/t/config-rcx


Regards,

Gabriel C

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: e1000_netpoll() , BUG: sleeping function called from invalid context
  2017-02-17 21:20 e1000_netpoll() , BUG: sleeping function called from invalid context Gabriel C
@ 2017-02-17 22:38 ` Cong Wang
  2017-02-17 23:16   ` Gabriel C
  0 siblings, 1 reply; 14+ messages in thread
From: Cong Wang @ 2017-02-17 22:38 UTC (permalink / raw)
  To: Gabriel C; +Cc: lkml, Thomas Gleixner, David S. Miller, Peter Zijlstra (Intel)

On Fri, Feb 17, 2017 at 1:20 PM, Gabriel C <nix.or.die@gmail.com> wrote:
> Hi all,
>
> while poking at a different issue I found the following on my logs :
>
> [85362.132770] BUG: sleeping function called from invalid context at
> kernel/irq/manage.c:110
> [85362.132771] in_atomic(): 1, irqs_disabled(): 1, pid: 1153, name:
> systemd-journal
> [85362.132772] no locks held by systemd-journal/1153.
> [85362.132772] irq event stamp: 60088359
> [85362.132777] hardirqs last  enabled at (60088359): [<ffffffff810d07c2>]
> vprintk_emit+0x432/0x470
> [85362.132779] hardirqs last disabled at (60088358): [<ffffffff810d03ec>]
> vprintk_emit+0x5c/0x470
> [85362.132782] softirqs last  enabled at (60088258): [<ffffffff810688fd>]
> __do_softirq+0x22d/0x290
> [85362.132784] softirqs last disabled at (60088233): [<ffffffff81068c0a>]
> irq_exit+0x6a/0xd0
> [85362.132784] Preemption disabled at:
> [85362.132787] [<ffffffff815203de>] write_msg+0x4e/0xf0
> [85362.132790] CPU: 0 PID: 1153 Comm: systemd-journal Tainted: G          I
> 4.10.0-rc8-debug-00001-ga1015e374d94-dirty #5
> [85362.132791] Hardware name: FUJITSU                          PRIMERGY
> TX200 S5             /D2709, BIOS 6.00 Rev. 1.14.2709
> 02/04/2013
> [85362.132792] Call Trace:
> [85362.132796]  dump_stack+0x86/0xc1
> [85362.132799]  ___might_sleep+0x213/0x230
> [85362.132801]  __might_sleep+0x6b/0x80
> [85362.132803]  synchronize_irq+0x33/0x90
> [85362.132805]  ? __irq_put_desc_unlock+0x19/0x40
> [85362.132807]  ? __disable_irq_nosync+0x4e/0x60
> [85362.132808]  disable_irq+0x17/0x20


Hmm, your kernel base version is 4.10.0-rc8 but the symbols here
look like prior to my commit, because with my commit here should
be disable_hardirq() calling synchronize_hardirq().

Did you revert it or make any local changes?


> [85362.132810]  e1000_netpoll+0x3d/0x110
> [85362.132813]  netpoll_poll_dev+0xa0/0x170
> [85362.132814]  netpoll_send_skb_on_dev+0x1ab/0x2b0
> [85362.132816]  netpoll_send_udp+0x417/0x450
> [85362.132817]  write_msg+0xdb/0xf0
> [85362.132819]  console_unlock+0x2e5/0x430
> [85362.132821]  ? __down_trylock_console_sem+0x41/0x60
> [85362.132822]  vprintk_emit+0x456/0x470
> [85362.132825]  printk_emit+0x2e/0x36
> [85362.132827]  ? simple_strtoull+0x2c/0x50
> [85362.132829]  devkmsg_write+0x115/0x160
> [85362.132831]  do_iter_readv_writev+0xd8/0x100
> [85362.132833]  do_readv_writev+0xdf/0x220
> [85362.132835]  ? __might_fault+0x37/0x90
> [85362.132838]  ? entry_SYSCALL_64_fastpath+0x5/0xc2
> [85362.132839]  vfs_writev+0x3a/0x50
> [85362.132841]  do_writev+0x4c/0xd0
> [85362.132842]  SyS_writev+0xb/0x10
> [85362.132843]  entry_SYSCALL_64_fastpath+0x1f/0xc2
> [85362.132845] RIP: 0033:0x7fc6d305c46d
> [85362.132846] RSP: 002b:00007ffca94161e0 EFLAGS: 00000293 ORIG_RAX:
> 0000000000000014
> [85362.132847] RAX: ffffffffffffffda RBX: ffffffff813afad3 RCX:
> 00007fc6d305c46d
> [85362.132848] RDX: 0000000000000005 RSI: 00007ffca94162f0 RDI:
> 0000000000000006
> [85362.132849] RBP: ffffc9000d4c3f88 R08: 0000000000000000 R09:
> 0000000000000008
> [85362.132850] R10: 0000000000000069 R11: 0000000000000293 R12:
> 00007ffca94162f0
> [85362.132851] R13: 000055768b19c920 R14: 00007ffca9416330 R15:
> 00007ffca94163c0
> [85362.132853]  ? __this_cpu_preempt_check+0x13/0x20
>
>
> Seems to be introduced by :
>
> commit 311191297125156319be8f86d546ea1c569f7e95
> Author: WANG Cong <xiyou.wangcong@gmail.com>
> Date:   Sat Dec 10 14:22:42 2016 -0800
>
>     e1000: use disable_hardirq() for e1000_netpoll()
>
> ...
>
> The used config can be found there:
>
> http://ftp.frugalware.org/pub/other/people/crazy/kernel/t/config-rcx
>
>
> Regards,
>
> Gabriel C

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: e1000_netpoll() , BUG: sleeping function called from invalid context
  2017-02-17 22:38 ` Cong Wang
@ 2017-02-17 23:16   ` Gabriel C
  2017-02-17 23:24     ` Gabriel C
  2017-02-17 23:25     ` Cong Wang
  0 siblings, 2 replies; 14+ messages in thread
From: Gabriel C @ 2017-02-17 23:16 UTC (permalink / raw)
  To: Cong Wang; +Cc: lkml, Thomas Gleixner, David S. Miller, Peter Zijlstra (Intel)



On 17.02.2017 23:38, Cong Wang wrote:
> On Fri, Feb 17, 2017 at 1:20 PM, Gabriel C <nix.or.die@gmail.com> wrote:
>> Hi all,
>>
>> while poking at a different issue I found the following on my logs :
>>
>> [85362.132770] BUG: sleeping function called from invalid context at
>> kernel/irq/manage.c:110
>> [85362.132771] in_atomic(): 1, irqs_disabled(): 1, pid: 1153, name:
>> systemd-journal
>> [85362.132772] no locks held by systemd-journal/1153.
>> [85362.132772] irq event stamp: 60088359
>> [85362.132777] hardirqs last  enabled at (60088359): [<ffffffff810d07c2>]
>> vprintk_emit+0x432/0x470
>> [85362.132779] hardirqs last disabled at (60088358): [<ffffffff810d03ec>]
>> vprintk_emit+0x5c/0x470
>> [85362.132782] softirqs last  enabled at (60088258): [<ffffffff810688fd>]
>> __do_softirq+0x22d/0x290
>> [85362.132784] softirqs last disabled at (60088233): [<ffffffff81068c0a>]
>> irq_exit+0x6a/0xd0
>> [85362.132784] Preemption disabled at:
>> [85362.132787] [<ffffffff815203de>] write_msg+0x4e/0xf0
>> [85362.132790] CPU: 0 PID: 1153 Comm: systemd-journal Tainted: G          I
>> 4.10.0-rc8-debug-00001-ga1015e374d94-dirty #5
>> [85362.132791] Hardware name: FUJITSU                          PRIMERGY
>> TX200 S5             /D2709, BIOS 6.00 Rev. 1.14.2709
>> 02/04/2013
>> [85362.132792] Call Trace:
>> [85362.132796]  dump_stack+0x86/0xc1
>> [85362.132799]  ___might_sleep+0x213/0x230
>> [85362.132801]  __might_sleep+0x6b/0x80
>> [85362.132803]  synchronize_irq+0x33/0x90
>> [85362.132805]  ? __irq_put_desc_unlock+0x19/0x40
>> [85362.132807]  ? __disable_irq_nosync+0x4e/0x60
>> [85362.132808]  disable_irq+0x17/0x20
>
>
> Hmm, your kernel base version is 4.10.0-rc8 but the symbols here
> look like prior to my commit, because with my commit here should
> be disable_hardirq() calling synchronize_hardirq().
>
> Did you revert it or make any local changes?

The kernel is -rc8 with reverted d966564fcdc19e13eb6ba1fbe6b8101070339c3d
+ http://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?id=202461e2f3c15dbfb05825d29ace0d20cdf55fa4
+ an debug patch from Thomas to find these goldfish issues.
( http://ftp.frugalware.org/pub/other/people/crazy/kernel/t/goldfish-debug.patch )

No other changes..


>
>
>> [85362.132810]  e1000_netpoll+0x3d/0x110
>> [85362.132813]  netpoll_poll_dev+0xa0/0x170
>> [85362.132814]  netpoll_send_skb_on_dev+0x1ab/0x2b0
>> [85362.132816]  netpoll_send_udp+0x417/0x450
>> [85362.132817]  write_msg+0xdb/0xf0
>> [85362.132819]  console_unlock+0x2e5/0x430
>> [85362.132821]  ? __down_trylock_console_sem+0x41/0x60
>> [85362.132822]  vprintk_emit+0x456/0x470
>> [85362.132825]  printk_emit+0x2e/0x36
>> [85362.132827]  ? simple_strtoull+0x2c/0x50
>> [85362.132829]  devkmsg_write+0x115/0x160
>> [85362.132831]  do_iter_readv_writev+0xd8/0x100
>> [85362.132833]  do_readv_writev+0xdf/0x220
>> [85362.132835]  ? __might_fault+0x37/0x90
>> [85362.132838]  ? entry_SYSCALL_64_fastpath+0x5/0xc2
>> [85362.132839]  vfs_writev+0x3a/0x50
>> [85362.132841]  do_writev+0x4c/0xd0
>> [85362.132842]  SyS_writev+0xb/0x10
>> [85362.132843]  entry_SYSCALL_64_fastpath+0x1f/0xc2
>> [85362.132845] RIP: 0033:0x7fc6d305c46d
>> [85362.132846] RSP: 002b:00007ffca94161e0 EFLAGS: 00000293 ORIG_RAX:
>> 0000000000000014
>> [85362.132847] RAX: ffffffffffffffda RBX: ffffffff813afad3 RCX:
>> 00007fc6d305c46d
>> [85362.132848] RDX: 0000000000000005 RSI: 00007ffca94162f0 RDI:
>> 0000000000000006
>> [85362.132849] RBP: ffffc9000d4c3f88 R08: 0000000000000000 R09:
>> 0000000000000008
>> [85362.132850] R10: 0000000000000069 R11: 0000000000000293 R12:
>> 00007ffca94162f0
>> [85362.132851] R13: 000055768b19c920 R14: 00007ffca9416330 R15:
>> 00007ffca94163c0
>> [85362.132853]  ? __this_cpu_preempt_check+0x13/0x20
>>
>>
>> Seems to be introduced by :
>>
>> commit 311191297125156319be8f86d546ea1c569f7e95
>> Author: WANG Cong <xiyou.wangcong@gmail.com>
>> Date:   Sat Dec 10 14:22:42 2016 -0800
>>
>>     e1000: use disable_hardirq() for e1000_netpoll()
>>
>> ...
>>
>> The used config can be found there:
>>
>> http://ftp.frugalware.org/pub/other/people/crazy/kernel/t/config-rcx
>>
>>
>> Regards,
>>
>> Gabriel C

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: e1000_netpoll() , BUG: sleeping function called from invalid context
  2017-02-17 23:16   ` Gabriel C
@ 2017-02-17 23:24     ` Gabriel C
  2017-02-17 23:25     ` Cong Wang
  1 sibling, 0 replies; 14+ messages in thread
From: Gabriel C @ 2017-02-17 23:24 UTC (permalink / raw)
  To: Cong Wang; +Cc: lkml, Thomas Gleixner, David S. Miller, Peter Zijlstra (Intel)



On 18.02.2017 00:16, Gabriel C wrote:
>
>
> On 17.02.2017 23:38, Cong Wang wrote:
>> On Fri, Feb 17, 2017 at 1:20 PM, Gabriel C <nix.or.die@gmail.com> wrote:
>>> Hi all,
>>>
>>> while poking at a different issue I found the following on my logs :
>>>
>>> [85362.132770] BUG: sleeping function called from invalid context at
>>> kernel/irq/manage.c:110
>>> [85362.132771] in_atomic(): 1, irqs_disabled(): 1, pid: 1153, name:
>>> systemd-journal
>>> [85362.132772] no locks held by systemd-journal/1153.
>>> [85362.132772] irq event stamp: 60088359
>>> [85362.132777] hardirqs last  enabled at (60088359): [<ffffffff810d07c2>]
>>> vprintk_emit+0x432/0x470
>>> [85362.132779] hardirqs last disabled at (60088358): [<ffffffff810d03ec>]
>>> vprintk_emit+0x5c/0x470
>>> [85362.132782] softirqs last  enabled at (60088258): [<ffffffff810688fd>]
>>> __do_softirq+0x22d/0x290
>>> [85362.132784] softirqs last disabled at (60088233): [<ffffffff81068c0a>]
>>> irq_exit+0x6a/0xd0
>>> [85362.132784] Preemption disabled at:
>>> [85362.132787] [<ffffffff815203de>] write_msg+0x4e/0xf0
>>> [85362.132790] CPU: 0 PID: 1153 Comm: systemd-journal Tainted: G          I
>>> 4.10.0-rc8-debug-00001-ga1015e374d94-dirty #5
>>> [85362.132791] Hardware name: FUJITSU                          PRIMERGY
>>> TX200 S5             /D2709, BIOS 6.00 Rev. 1.14.2709
>>> 02/04/2013
>>> [85362.132792] Call Trace:
>>> [85362.132796]  dump_stack+0x86/0xc1
>>> [85362.132799]  ___might_sleep+0x213/0x230
>>> [85362.132801]  __might_sleep+0x6b/0x80
>>> [85362.132803]  synchronize_irq+0x33/0x90
>>> [85362.132805]  ? __irq_put_desc_unlock+0x19/0x40
>>> [85362.132807]  ? __disable_irq_nosync+0x4e/0x60
>>> [85362.132808]  disable_irq+0x17/0x20
>>
>>
>> Hmm, your kernel base version is 4.10.0-rc8 but the symbols here
>> look like prior to my commit, because with my commit here should
>> be disable_hardirq() calling synchronize_hardirq().
>>
>> Did you revert it or make any local changes?
>
> The kernel is -rc8 with reverted d966564fcdc19e13eb6ba1fbe6b8101070339c3d
> + http://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?id=202461e2f3c15dbfb05825d29ace0d20cdf55fa4
> + an debug patch from Thomas to find these goldfish issues.
> ( http://ftp.frugalware.org/pub/other/people/crazy/kernel/t/goldfish-debug.patch )
>
> No other changes..
>

Forgot to tell ,  netconsole is running on that box ..

>
>>
>>
>>> [85362.132810]  e1000_netpoll+0x3d/0x110
>>> [85362.132813]  netpoll_poll_dev+0xa0/0x170
>>> [85362.132814]  netpoll_send_skb_on_dev+0x1ab/0x2b0
>>> [85362.132816]  netpoll_send_udp+0x417/0x450
>>> [85362.132817]  write_msg+0xdb/0xf0
>>> [85362.132819]  console_unlock+0x2e5/0x430
>>> [85362.132821]  ? __down_trylock_console_sem+0x41/0x60
>>> [85362.132822]  vprintk_emit+0x456/0x470
>>> [85362.132825]  printk_emit+0x2e/0x36
>>> [85362.132827]  ? simple_strtoull+0x2c/0x50
>>> [85362.132829]  devkmsg_write+0x115/0x160
>>> [85362.132831]  do_iter_readv_writev+0xd8/0x100
>>> [85362.132833]  do_readv_writev+0xdf/0x220
>>> [85362.132835]  ? __might_fault+0x37/0x90
>>> [85362.132838]  ? entry_SYSCALL_64_fastpath+0x5/0xc2
>>> [85362.132839]  vfs_writev+0x3a/0x50
>>> [85362.132841]  do_writev+0x4c/0xd0
>>> [85362.132842]  SyS_writev+0xb/0x10
>>> [85362.132843]  entry_SYSCALL_64_fastpath+0x1f/0xc2
>>> [85362.132845] RIP: 0033:0x7fc6d305c46d
>>> [85362.132846] RSP: 002b:00007ffca94161e0 EFLAGS: 00000293 ORIG_RAX:
>>> 0000000000000014
>>> [85362.132847] RAX: ffffffffffffffda RBX: ffffffff813afad3 RCX:
>>> 00007fc6d305c46d
>>> [85362.132848] RDX: 0000000000000005 RSI: 00007ffca94162f0 RDI:
>>> 0000000000000006
>>> [85362.132849] RBP: ffffc9000d4c3f88 R08: 0000000000000000 R09:
>>> 0000000000000008
>>> [85362.132850] R10: 0000000000000069 R11: 0000000000000293 R12:
>>> 00007ffca94162f0
>>> [85362.132851] R13: 000055768b19c920 R14: 00007ffca9416330 R15:
>>> 00007ffca94163c0
>>> [85362.132853]  ? __this_cpu_preempt_check+0x13/0x20
>>>
>>>
>>> Seems to be introduced by :
>>>
>>> commit 311191297125156319be8f86d546ea1c569f7e95
>>> Author: WANG Cong <xiyou.wangcong@gmail.com>
>>> Date:   Sat Dec 10 14:22:42 2016 -0800
>>>
>>>     e1000: use disable_hardirq() for e1000_netpoll()
>>>
>>> ...
>>>
>>> The used config can be found there:
>>>
>>> http://ftp.frugalware.org/pub/other/people/crazy/kernel/t/config-rcx
>>>
>>>
>>> Regards,
>>>
>>> Gabriel C

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: e1000_netpoll() , BUG: sleeping function called from invalid context
  2017-02-17 23:16   ` Gabriel C
  2017-02-17 23:24     ` Gabriel C
@ 2017-02-17 23:25     ` Cong Wang
  2017-02-17 23:38       ` Gabriel C
  1 sibling, 1 reply; 14+ messages in thread
From: Cong Wang @ 2017-02-17 23:25 UTC (permalink / raw)
  To: Gabriel C; +Cc: lkml, Thomas Gleixner, David S. Miller, Peter Zijlstra (Intel)

On Fri, Feb 17, 2017 at 3:16 PM, Gabriel C <nix.or.die@gmail.com> wrote:
>
>
> On 17.02.2017 23:38, Cong Wang wrote:
>>
>> On Fri, Feb 17, 2017 at 1:20 PM, Gabriel C <nix.or.die@gmail.com> wrote:
>>>
>>> Hi all,
>>>
>>> while poking at a different issue I found the following on my logs :
>>>
>>> [85362.132770] BUG: sleeping function called from invalid context at
>>> kernel/irq/manage.c:110
>>> [85362.132771] in_atomic(): 1, irqs_disabled(): 1, pid: 1153, name:
>>> systemd-journal
>>> [85362.132772] no locks held by systemd-journal/1153.
>>> [85362.132772] irq event stamp: 60088359
>>> [85362.132777] hardirqs last  enabled at (60088359): [<ffffffff810d07c2>]
>>> vprintk_emit+0x432/0x470
>>> [85362.132779] hardirqs last disabled at (60088358): [<ffffffff810d03ec>]
>>> vprintk_emit+0x5c/0x470
>>> [85362.132782] softirqs last  enabled at (60088258): [<ffffffff810688fd>]
>>> __do_softirq+0x22d/0x290
>>> [85362.132784] softirqs last disabled at (60088233): [<ffffffff81068c0a>]
>>> irq_exit+0x6a/0xd0
>>> [85362.132784] Preemption disabled at:
>>> [85362.132787] [<ffffffff815203de>] write_msg+0x4e/0xf0
>>> [85362.132790] CPU: 0 PID: 1153 Comm: systemd-journal Tainted: G
>>> I
>>> 4.10.0-rc8-debug-00001-ga1015e374d94-dirty #5
>>> [85362.132791] Hardware name: FUJITSU                          PRIMERGY
>>> TX200 S5             /D2709, BIOS 6.00 Rev. 1.14.2709
>>> 02/04/2013
>>> [85362.132792] Call Trace:
>>> [85362.132796]  dump_stack+0x86/0xc1
>>> [85362.132799]  ___might_sleep+0x213/0x230
>>> [85362.132801]  __might_sleep+0x6b/0x80
>>> [85362.132803]  synchronize_irq+0x33/0x90
>>> [85362.132805]  ? __irq_put_desc_unlock+0x19/0x40
>>> [85362.132807]  ? __disable_irq_nosync+0x4e/0x60
>>> [85362.132808]  disable_irq+0x17/0x20
>>
>>
>>
>> Hmm, your kernel base version is 4.10.0-rc8 but the symbols here
>> look like prior to my commit, because with my commit here should
>> be disable_hardirq() calling synchronize_hardirq().
>>
>> Did you revert it or make any local changes?
>
>
> The kernel is -rc8 with reverted d966564fcdc19e13eb6ba1fbe6b8101070339c3d
> +
> http://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?id=202461e2f3c15dbfb05825d29ace0d20cdf55fa4
> + an debug patch from Thomas to find these goldfish issues.
> (
> http://ftp.frugalware.org/pub/other/people/crazy/kernel/t/goldfish-debug.patch
> )
>
> No other changes..

That is weird, the stack trace doesn't match the source code for some reason.
Can you objdump your e1000.ko module to see if that is true?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: e1000_netpoll() , BUG: sleeping function called from invalid context
  2017-02-17 23:25     ` Cong Wang
@ 2017-02-17 23:38       ` Gabriel C
  2017-02-17 23:44         ` Cong Wang
  0 siblings, 1 reply; 14+ messages in thread
From: Gabriel C @ 2017-02-17 23:38 UTC (permalink / raw)
  To: Cong Wang; +Cc: lkml, Thomas Gleixner, David S. Miller, Peter Zijlstra (Intel)



On 18.02.2017 00:25, Cong Wang wrote:
> On Fri, Feb 17, 2017 at 3:16 PM, Gabriel C <nix.or.die@gmail.com> wrote:
>>
>>
>> On 17.02.2017 23:38, Cong Wang wrote:
>>>
>>> On Fri, Feb 17, 2017 at 1:20 PM, Gabriel C <nix.or.die@gmail.com> wrote:
>>>>
>>>> Hi all,
>>>>
>>>> while poking at a different issue I found the following on my logs :
>>>>
>>>> [85362.132770] BUG: sleeping function called from invalid context at
>>>> kernel/irq/manage.c:110
>>>> [85362.132771] in_atomic(): 1, irqs_disabled(): 1, pid: 1153, name:
>>>> systemd-journal
>>>> [85362.132772] no locks held by systemd-journal/1153.
>>>> [85362.132772] irq event stamp: 60088359
>>>> [85362.132777] hardirqs last  enabled at (60088359): [<ffffffff810d07c2>]
>>>> vprintk_emit+0x432/0x470
>>>> [85362.132779] hardirqs last disabled at (60088358): [<ffffffff810d03ec>]
>>>> vprintk_emit+0x5c/0x470
>>>> [85362.132782] softirqs last  enabled at (60088258): [<ffffffff810688fd>]
>>>> __do_softirq+0x22d/0x290
>>>> [85362.132784] softirqs last disabled at (60088233): [<ffffffff81068c0a>]
>>>> irq_exit+0x6a/0xd0
>>>> [85362.132784] Preemption disabled at:
>>>> [85362.132787] [<ffffffff815203de>] write_msg+0x4e/0xf0
>>>> [85362.132790] CPU: 0 PID: 1153 Comm: systemd-journal Tainted: G
>>>> I
>>>> 4.10.0-rc8-debug-00001-ga1015e374d94-dirty #5
>>>> [85362.132791] Hardware name: FUJITSU                          PRIMERGY
>>>> TX200 S5             /D2709, BIOS 6.00 Rev. 1.14.2709
>>>> 02/04/2013
>>>> [85362.132792] Call Trace:
>>>> [85362.132796]  dump_stack+0x86/0xc1
>>>> [85362.132799]  ___might_sleep+0x213/0x230
>>>> [85362.132801]  __might_sleep+0x6b/0x80
>>>> [85362.132803]  synchronize_irq+0x33/0x90
>>>> [85362.132805]  ? __irq_put_desc_unlock+0x19/0x40
>>>> [85362.132807]  ? __disable_irq_nosync+0x4e/0x60
>>>> [85362.132808]  disable_irq+0x17/0x20
>>>
>>>
>>>
>>> Hmm, your kernel base version is 4.10.0-rc8 but the symbols here
>>> look like prior to my commit, because with my commit here should
>>> be disable_hardirq() calling synchronize_hardirq().
>>>
>>> Did you revert it or make any local changes?
>>
>>
>> The kernel is -rc8 with reverted d966564fcdc19e13eb6ba1fbe6b8101070339c3d
>> +
>> http://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?id=202461e2f3c15dbfb05825d29ace0d20cdf55fa4
>> + an debug patch from Thomas to find these goldfish issues.
>> (
>> http://ftp.frugalware.org/pub/other/people/crazy/kernel/t/goldfish-debug.patch
>> )
>>
>> No other changes..
>
> That is weird, the stack trace doesn't match the source code for some reason.
> Can you objdump your e1000.ko module to see if that is true?
>

My card seems to use the e1000e driver which is buit-in..

Anyway here an objdump -x :

http://ftp.frugalware.org/pub/other/people/crazy/kernel/t/objdump-x_e1000.ko.txt

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: e1000_netpoll() , BUG: sleeping function called from invalid context
  2017-02-17 23:38       ` Gabriel C
@ 2017-02-17 23:44         ` Cong Wang
  2017-02-17 23:53           ` Gabriel C
  0 siblings, 1 reply; 14+ messages in thread
From: Cong Wang @ 2017-02-17 23:44 UTC (permalink / raw)
  To: Gabriel C; +Cc: lkml, Thomas Gleixner, David S. Miller, Peter Zijlstra (Intel)

On Fri, Feb 17, 2017 at 3:38 PM, Gabriel C <nix.or.die@gmail.com> wrote:
>
> My card seems to use the e1000e driver which is buit-in..
>
> Anyway here an objdump -x :
>
> http://ftp.frugalware.org/pub/other/people/crazy/kernel/t/objdump-x_e1000.ko.txt
>

Found disable_hardirq() but not disable_irq().

Are you sure the kernel warning was emitted by this binary rather than
some old one?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: e1000_netpoll() , BUG: sleeping function called from invalid context
  2017-02-17 23:44         ` Cong Wang
@ 2017-02-17 23:53           ` Gabriel C
  2017-03-01 17:13             ` Thomas Gleixner
  0 siblings, 1 reply; 14+ messages in thread
From: Gabriel C @ 2017-02-17 23:53 UTC (permalink / raw)
  To: Cong Wang; +Cc: lkml, Thomas Gleixner, David S. Miller, Peter Zijlstra (Intel)



On 18.02.2017 00:44, Cong Wang wrote:
> On Fri, Feb 17, 2017 at 3:38 PM, Gabriel C <nix.or.die@gmail.com> wrote:
>>
>> My card seems to use the e1000e driver which is buit-in..
>>
>> Anyway here an objdump -x :
>>
>> http://ftp.frugalware.org/pub/other/people/crazy/kernel/t/objdump-x_e1000.ko.txt
>>
>
> Found disable_hardirq() but not disable_irq().
>
> Are you sure the kernel warning was emitted by this binary rather than
> some old one?
>

Yes , I use an clean build , remove any other files in /boot && /lib/modules before installing the new build kernel.

I've asked Thomas before I send this to lkml and to you and he said :

..


It's a real issue. netconsole calls disable_irq() which might sleep from an
interrupt and preemption disabled context.

 > [85362.132801]  __might_sleep+0x6b/0x80
 > [85362.132803]  synchronize_irq+0x33/0x90
 > [85362.132805]  ? __irq_put_desc_unlock+0x19/0x40
 > [85362.132807]  ? __disable_irq_nosync+0x4e/0x60
 > [85362.132808]  disable_irq+0x17/0x20
 > [85362.132810]  e1000_netpoll+0x3d/0x110

Though, what's weird is that the e1000_netpoll() does not longer call
disable_irq(). It calls disable_hardirq().

That got changed in commit 3111912971251 which got into Linus tree during
the 4.10 merge window. So it is in 4.10-rc8.

Confused.

...

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: e1000_netpoll() , BUG: sleeping function called from invalid context
  2017-02-17 23:53           ` Gabriel C
@ 2017-03-01 17:13             ` Thomas Gleixner
  2017-03-01 22:32               ` Gabriel C
  0 siblings, 1 reply; 14+ messages in thread
From: Thomas Gleixner @ 2017-03-01 17:13 UTC (permalink / raw)
  To: Gabriel C; +Cc: Cong Wang, lkml, David S. Miller, Peter Zijlstra (Intel)

On Sat, 18 Feb 2017, Gabriel C wrote:
> That got changed in commit 3111912971251 which got into Linus tree during
> the 4.10 merge window. So it is in 4.10-rc8.
> 
> Confused.

I'm still confused. Gabriel, can you please try to reproduce with 4.10
final?

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: e1000_netpoll() , BUG: sleeping function called from invalid context
  2017-03-01 17:13             ` Thomas Gleixner
@ 2017-03-01 22:32               ` Gabriel C
  2017-03-01 22:38                 ` Thomas Gleixner
  0 siblings, 1 reply; 14+ messages in thread
From: Gabriel C @ 2017-03-01 22:32 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Cong Wang, lkml, David S. Miller, Peter Zijlstra (Intel)



On 01.03.2017 18:13, Thomas Gleixner wrote:
> On Sat, 18 Feb 2017, Gabriel C wrote:
>> That got changed in commit 3111912971251 which got into Linus tree during
>> the 4.10 merge window. So it is in 4.10-rc8.
>>
>> Confused.
>
> I'm still confused. Gabriel, can you please try to reproduce with 4.10
> final?
>

Sure , do you want plain 4.10.0 or 4.10.1 ?

Btw is possible to rename e1000_netpoll() from e1000e driver
to e1000e_netpoll() so we know what driver is in use ?

My card uses the e1000e driver..

Regard,

Gabriel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: e1000_netpoll() , BUG: sleeping function called from invalid context
  2017-03-01 22:32               ` Gabriel C
@ 2017-03-01 22:38                 ` Thomas Gleixner
  2017-03-01 22:44                   ` Gabriel C
  0 siblings, 1 reply; 14+ messages in thread
From: Thomas Gleixner @ 2017-03-01 22:38 UTC (permalink / raw)
  To: Gabriel C; +Cc: Cong Wang, lkml, David S. Miller, Peter Zijlstra (Intel)

On Wed, 1 Mar 2017, Gabriel C wrote:
> On 01.03.2017 18:13, Thomas Gleixner wrote:
> > On Sat, 18 Feb 2017, Gabriel C wrote:
> > > That got changed in commit 3111912971251 which got into Linus tree during
> > > the 4.10 merge window. So it is in 4.10-rc8.
> > > 
> > > Confused.
> > 
> > I'm still confused. Gabriel, can you please try to reproduce with 4.10
> > final?
> > 
> 
> Sure , do you want plain 4.10.0 or 4.10.1 ?
> 
> Btw is possible to rename e1000_netpoll() from e1000e driver
> to e1000e_netpoll() so we know what driver is in use ?

Grr. yes. That would be definitely helpful. I was staring into the wrong
one of course.

> My card uses the e1000e driver..

Does the patch below fix it?

Thanks,

	tglx
	
8<--------------

--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -6716,19 +6716,19 @@ static irqreturn_t e1000_intr_msix(int _
 
 		vector = 0;
 		msix_irq = adapter->msix_entries[vector].vector;
-		disable_irq(msix_irq);
+		disable_hardirq(msix_irq);
 		e1000_intr_msix_rx(msix_irq, netdev);
 		enable_irq(msix_irq);
 
 		vector++;
 		msix_irq = adapter->msix_entries[vector].vector;
-		disable_irq(msix_irq);
+		disable_hardirq(msix_irq);
 		e1000_intr_msix_tx(msix_irq, netdev);
 		enable_irq(msix_irq);
 
 		vector++;
 		msix_irq = adapter->msix_entries[vector].vector;
-		disable_irq(msix_irq);
+		disable_hardirq(msix_irq);
 		e1000_msix_other(msix_irq, netdev);
 		enable_irq(msix_irq);
 	}

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: e1000_netpoll() , BUG: sleeping function called from invalid context
  2017-03-01 22:38                 ` Thomas Gleixner
@ 2017-03-01 22:44                   ` Gabriel C
  2017-03-02  0:04                     ` Gabriel C
  0 siblings, 1 reply; 14+ messages in thread
From: Gabriel C @ 2017-03-01 22:44 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Cong Wang, lkml, David S. Miller, Peter Zijlstra (Intel)



On 01.03.2017 23:38, Thomas Gleixner wrote:
> On Wed, 1 Mar 2017, Gabriel C wrote:
>> On 01.03.2017 18:13, Thomas Gleixner wrote:
>>> On Sat, 18 Feb 2017, Gabriel C wrote:
>>>> That got changed in commit 3111912971251 which got into Linus tree during
>>>> the 4.10 merge window. So it is in 4.10-rc8.
>>>>
>>>> Confused.
>>>
>>> I'm still confused. Gabriel, can you please try to reproduce with 4.10
>>> final?
>>>
>>
>> Sure , do you want plain 4.10.0 or 4.10.1 ?
>>
>> Btw is possible to rename e1000_netpoll() from e1000e driver
>> to e1000e_netpoll() so we know what driver is in use ?
>
> Grr. yes. That would be definitely helpful. I was staring into the wrong
> one of course.
>
>> My card uses the e1000e driver..
>
> Does the patch below fix it?

I'll test you patch in a bit and let you know.

>
> Thanks,
>
> 	tglx
> 	
> 8<--------------
>
> --- a/drivers/net/ethernet/intel/e1000e/netdev.c
> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
> @@ -6716,19 +6716,19 @@ static irqreturn_t e1000_intr_msix(int _
>
>  		vector = 0;
>  		msix_irq = adapter->msix_entries[vector].vector;
> -		disable_irq(msix_irq);
> +		disable_hardirq(msix_irq);
>  		e1000_intr_msix_rx(msix_irq, netdev);
>  		enable_irq(msix_irq);
>
>  		vector++;
>  		msix_irq = adapter->msix_entries[vector].vector;
> -		disable_irq(msix_irq);
> +		disable_hardirq(msix_irq);
>  		e1000_intr_msix_tx(msix_irq, netdev);
>  		enable_irq(msix_irq);
>
>  		vector++;
>  		msix_irq = adapter->msix_entries[vector].vector;
> -		disable_irq(msix_irq);
> +		disable_hardirq(msix_irq);
>  		e1000_msix_other(msix_irq, netdev);
>  		enable_irq(msix_irq);
>  	}
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: e1000_netpoll() , BUG: sleeping function called from invalid context
  2017-03-01 22:44                   ` Gabriel C
@ 2017-03-02  0:04                     ` Gabriel C
  2017-03-02 12:22                       ` Gabriel C
  0 siblings, 1 reply; 14+ messages in thread
From: Gabriel C @ 2017-03-02  0:04 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Cong Wang, lkml, David S. Miller, Peter Zijlstra (Intel)



>> Does the patch below fix it?
>
> I'll test you patch in a bit and let you know.
>

It seem to work.

But to be really sure I let the box running with this setup over night.


>> 	
>> 8<--------------
>>
>> --- a/drivers/net/ethernet/intel/e1000e/netdev.c
>> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
>> @@ -6716,19 +6716,19 @@ static irqreturn_t e1000_intr_msix(int _
>>
>>  		vector = 0;
>>  		msix_irq = adapter->msix_entries[vector].vector;
>> -		disable_irq(msix_irq);
>> +		disable_hardirq(msix_irq);
>>  		e1000_intr_msix_rx(msix_irq, netdev);
>>  		enable_irq(msix_irq);
>>
>>  		vector++;
>>  		msix_irq = adapter->msix_entries[vector].vector;
>> -		disable_irq(msix_irq);
>> +		disable_hardirq(msix_irq);
>>  		e1000_intr_msix_tx(msix_irq, netdev);
>>  		enable_irq(msix_irq);
>>
>>  		vector++;
>>  		msix_irq = adapter->msix_entries[vector].vector;
>> -		disable_irq(msix_irq);
>> +		disable_hardirq(msix_irq);
>>  		e1000_msix_other(msix_irq, netdev);
>>  		enable_irq(msix_irq);
>>  	}
>>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: e1000_netpoll() , BUG: sleeping function called from invalid context
  2017-03-02  0:04                     ` Gabriel C
@ 2017-03-02 12:22                       ` Gabriel C
  0 siblings, 0 replies; 14+ messages in thread
From: Gabriel C @ 2017-03-02 12:22 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Cong Wang, lkml, David S. Miller, Peter Zijlstra (Intel)



On 02.03.2017 01:04, Gabriel C wrote:
>
>
>>> Does the patch below fix it?
>>
>> I'll test you patch in a bit and let you know.
>>
>
> It seem to work.
>
> But to be really sure I let the box running with this setup over night.
>

Also with 4.10.1 + your patch all seems fine..

The box is up 13 hours now and I cannot trigger the BUG()..

>>> 	
>>> 8<--------------
>>>
>>> --- a/drivers/net/ethernet/intel/e1000e/netdev.c
>>> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
>>> @@ -6716,19 +6716,19 @@ static irqreturn_t e1000_intr_msix(int _
>>>
>>>  		vector = 0;
>>>  		msix_irq = adapter->msix_entries[vector].vector;
>>> -		disable_irq(msix_irq);
>>> +		disable_hardirq(msix_irq);
>>>  		e1000_intr_msix_rx(msix_irq, netdev);
>>>  		enable_irq(msix_irq);
>>>
>>>  		vector++;
>>>  		msix_irq = adapter->msix_entries[vector].vector;
>>> -		disable_irq(msix_irq);
>>> +		disable_hardirq(msix_irq);
>>>  		e1000_intr_msix_tx(msix_irq, netdev);
>>>  		enable_irq(msix_irq);
>>>
>>>  		vector++;
>>>  		msix_irq = adapter->msix_entries[vector].vector;
>>> -		disable_irq(msix_irq);
>>> +		disable_hardirq(msix_irq);
>>>  		e1000_msix_other(msix_irq, netdev);
>>>  		enable_irq(msix_irq);
>>>  	}
>>>

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2017-03-02 14:51 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-02-17 21:20 e1000_netpoll() , BUG: sleeping function called from invalid context Gabriel C
2017-02-17 22:38 ` Cong Wang
2017-02-17 23:16   ` Gabriel C
2017-02-17 23:24     ` Gabriel C
2017-02-17 23:25     ` Cong Wang
2017-02-17 23:38       ` Gabriel C
2017-02-17 23:44         ` Cong Wang
2017-02-17 23:53           ` Gabriel C
2017-03-01 17:13             ` Thomas Gleixner
2017-03-01 22:32               ` Gabriel C
2017-03-01 22:38                 ` Thomas Gleixner
2017-03-01 22:44                   ` Gabriel C
2017-03-02  0:04                     ` Gabriel C
2017-03-02 12:22                       ` Gabriel C

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).