WARNING: CPU: 0 PID: 0 at net/ipv4/af_inet.c:155 inet_sock

linux-pm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* WARNING: CPU: 0 PID: 0 at net/ipv4/af_inet.c:155 inet_sock_destruct+0x1c4/0x1dc
@ 2016-07-05 13:33 Mason
  2016-07-05 14:50 ` Mason
  0 siblings, 1 reply; 9+ messages in thread
From: Mason @ 2016-07-05 13:33 UTC (permalink / raw)
  To: linux-pm, netdev, LKML; +Cc: Sebastian Frias

Hello,

I was testing suspend/resume sequences where the suspend operation
fails and returns without having suspended the platform.

# echo mem > /sys/power/state
[   90.322264] PM: Syncing filesystems ... done.
[   90.328758] Freezing user space processes ... (elapsed 0.001 seconds) done.
[   90.337092] Double checking all user space processes after OOM killer disable... (elapsed 0.000 seconds)
[   90.346765] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
[   90.355357] Suspending console(s) (use no_console_suspend to debug)
[   90.364590] PM: suspend of devices complete after 2.068 msecs
[   90.365554] PM: late suspend of devices complete after 0.954 msecs
[   90.366223] PM: noirq suspend of devices complete after 0.662 msecs
[   90.366227] Disabling non-boot CPUs ...
[   90.379004] CPU1: shutdown
[   90.412661] Enabling non-boot CPUs ...
[   90.450385] CPU1 is up
[   90.450979] PM: noirq resume of devices complete after 0.584 msecs
[   90.451672] PM: early resume of devices complete after 0.667 msecs
[   90.453149] nb8800 26000.ethernet eth0: Link is Down
[   90.453264] PM: resume of devices complete after 1.583 msecs
[   90.508180] Restarting tasks ... done.
-sh: echo: write error: Input/output error
[   93.860411] nb8800 26000.ethernet eth0: Link is Up - 1Gbps/Full - flow control rx/tx

(The error message is expected, as my suspend routine returns -EIO
on failure.)

I left the system to idle at the prompt; then 5 minutes later,
the system printed the following trace.

[  400.718491] ------------[ cut here ]------------
[  400.723175] WARNING: CPU: 0 PID: 0 at net/ipv4/af_inet.c:155 inet_sock_destruct+0x1c4/0x1dc
[  400.731582] Modules linked in:
[  400.734689] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.7.0-rc6-00010-gd07031bdc433 #1
[  400.742646] Hardware name: Sigma Tango DT
[  400.746671] Backtrace: 
[  400.749141] [<c010b974>] (dump_backtrace) from [<c010bb70>] (show_stack+0x18/0x1c)
[  400.756747]  r7:60000113 r6:c080ea84 r5:00000000 r4:c080ea84
[  400.762454] [<c010bb58>] (show_stack) from [<c02e9fe4>] (dump_stack+0x80/0x94)
[  400.769722] [<c02e9f64>] (dump_stack) from [<c011bfc8>] (__warn+0xec/0x104)
[  400.776717]  r7:00000009 r6:c05e3fbc r5:00000000 r4:00000000
[  400.782417] [<c011bedc>] (__warn) from [<c011c098>] (warn_slowpath_null+0x28/0x30)
[  400.790022]  r9:dfbdd4e0 r8:0000000a r7:c0801de8 r6:df6f9514 r5:df5df144 r4:df5df040
[  400.797825] [<c011c070>] (warn_slowpath_null) from [<c0463c04>] (inet_sock_destruct+0x1c4/0x1dc)
[  400.806661] [<c0463a40>] (inet_sock_destruct) from [<c03e9c60>] (__sk_destruct+0x28/0xe0)
[  400.814878]  r7:c0801de8 r6:df6f9514 r5:df5df040 r4:df5df1ec
[  400.820584] [<c03e9c38>] (__sk_destruct) from [<c016f230>] (rcu_process_callbacks+0x488/0x59c)
[  400.829237]  r5:00000000 r4:00000000
[  400.832836] [<c016eda8>] (rcu_process_callbacks) from [<c01207e4>] (__do_softirq+0x138/0x264)
[  400.841402]  r10:c08020a0 r9:40000001 r8:00000101 r7:c0800000 r6:c08020a4 r5:00000009
[  400.849285]  r4:00000000
[  400.851829] [<c01206ac>] (__do_softirq) from [<c0120c04>] (irq_exit+0xc8/0x104)
[  400.859172]  r10:c0801f10 r9:df402400 r8:00000001 r7:00000000 r6:00000013 r5:00000000
[  400.867053]  r4:c0735428
[  400.869601] [<c0120b3c>] (irq_exit) from [<c0162610>] (__handle_domain_irq+0x88/0xf4)
[  400.877473] [<c0162588>] (__handle_domain_irq) from [<c01014ac>] (gic_handle_irq+0x50/0x94)
[  400.885865]  r10:dfffcdc0 r9:e0803100 r8:e0802100 r7:c0801f10 r6:e080210c r5:c080277c
[  400.893747]  r4:c080eca0 r3:c0801f10
[  400.897342] [<c010145c>] (gic_handle_irq) from [<c010c694>] (__irq_svc+0x54/0x90)
[  400.904861] Exception stack(0xc0801f10 to 0xc0801f58)
[  400.909936] 1f00:                                     00000000 00000000 0000826a c0117c80
[  400.918156] 1f20: c0800000 c08024f8 c0802494 c081e2d6 c05b954c c07268c0 dfffcdc0 c0801f6c
[  400.926376] 1f40: c0801f70 c0801f60 c01086b0 c01086b4 60000013 ffffffff
[  400.933020]  r9:c07268c0 r8:c05b954c r7:c0801f44 r6:ffffffff r5:60000013 r4:c01086b4
[  400.940826] [<c0108674>] (arch_cpu_idle) from [<c0155f54>] (default_idle_call+0x28/0x34)
[  400.948960] [<c0155f2c>] (default_idle_call) from [<c0156088>] (cpu_startup_entry+0x128/0x17c)
[  400.957620] [<c0155f60>] (cpu_startup_entry) from [<c04a3f54>] (rest_init+0x8c/0x90)
[  400.965400]  r7:ffffffff r4:00000002
[  400.969005] [<c04a3ec8>] (rest_init) from [<c0700cb4>] (start_kernel+0x310/0x31c)
[  400.976522]  r5:c081e4c0 r4:00000001
[  400.980121] [<c07009a4>] (start_kernel) from [<8000807c>] (0x8000807c)
[  400.986716] ---[ end trace f8deb50d1b3d3c7a ]---


Did I implement something incorrectly?

Regards.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: WARNING: CPU: 0 PID: 0 at net/ipv4/af_inet.c:155 inet_sock_destruct+0x1c4/0x1dc
  2016-07-05 13:33 WARNING: CPU: 0 PID: 0 at net/ipv4/af_inet.c:155 inet_sock_destruct+0x1c4/0x1dc Mason
@ 2016-07-05 14:50 ` Mason
  2016-07-05 15:28   ` Florian Fainelli
  0 siblings, 1 reply; 9+ messages in thread
From: Mason @ 2016-07-05 14:50 UTC (permalink / raw)
  To: linux-pm, netdev, LKML; +Cc: Sebastian Frias

On 05/07/2016 15:33, Mason wrote:

> I was testing suspend/resume sequences where the suspend operation
> fails and returns without having suspended the platform.
> 
> # echo mem > /sys/power/state
> [   90.322264] PM: Syncing filesystems ... done.
> [   90.328758] Freezing user space processes ... (elapsed 0.001 seconds) done.
> [   90.337092] Double checking all user space processes after OOM killer disable... (elapsed 0.000 seconds)
> [   90.346765] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
> [   90.355357] Suspending console(s) (use no_console_suspend to debug)
> [   90.364590] PM: suspend of devices complete after 2.068 msecs
> [   90.365554] PM: late suspend of devices complete after 0.954 msecs
> [   90.366223] PM: noirq suspend of devices complete after 0.662 msecs
> [   90.366227] Disabling non-boot CPUs ...
> [   90.379004] CPU1: shutdown
> [   90.412661] Enabling non-boot CPUs ...
> [   90.450385] CPU1 is up
> [   90.450979] PM: noirq resume of devices complete after 0.584 msecs
> [   90.451672] PM: early resume of devices complete after 0.667 msecs
> [   90.453149] nb8800 26000.ethernet eth0: Link is Down
> [   90.453264] PM: resume of devices complete after 1.583 msecs
> [   90.508180] Restarting tasks ... done.
> -sh: echo: write error: Input/output error
> [   93.860411] nb8800 26000.ethernet eth0: Link is Up - 1Gbps/Full - flow control rx/tx
> 
> (The error message is expected, as my suspend routine returns -EIO
> on failure.)
> 
> I left the system to idle at the prompt; then 5 minutes later,
> the system printed the following trace.
> 
> [  400.718491] ------------[ cut here ]------------
> [  400.723175] WARNING: CPU: 0 PID: 0 at net/ipv4/af_inet.c:155 inet_sock_destruct+0x1c4/0x1dc
> [  400.731582] Modules linked in:
> [  400.734689] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.7.0-rc6-00010-gd07031bdc433 #1
> [  400.742646] Hardware name: Sigma Tango DT
> [  400.746671] Backtrace: 
> [  400.749141] [<c010b974>] (dump_backtrace) from [<c010bb70>] (show_stack+0x18/0x1c)
> [  400.756747]  r7:60000113 r6:c080ea84 r5:00000000 r4:c080ea84
> [  400.762454] [<c010bb58>] (show_stack) from [<c02e9fe4>] (dump_stack+0x80/0x94)
> [  400.769722] [<c02e9f64>] (dump_stack) from [<c011bfc8>] (__warn+0xec/0x104)
> [  400.776717]  r7:00000009 r6:c05e3fbc r5:00000000 r4:00000000
> [  400.782417] [<c011bedc>] (__warn) from [<c011c098>] (warn_slowpath_null+0x28/0x30)
> [  400.790022]  r9:dfbdd4e0 r8:0000000a r7:c0801de8 r6:df6f9514 r5:df5df144 r4:df5df040
> [  400.797825] [<c011c070>] (warn_slowpath_null) from [<c0463c04>] (inet_sock_destruct+0x1c4/0x1dc)
> [  400.806661] [<c0463a40>] (inet_sock_destruct) from [<c03e9c60>] (__sk_destruct+0x28/0xe0)
> [  400.814878]  r7:c0801de8 r6:df6f9514 r5:df5df040 r4:df5df1ec
> [  400.820584] [<c03e9c38>] (__sk_destruct) from [<c016f230>] (rcu_process_callbacks+0x488/0x59c)
> [  400.829237]  r5:00000000 r4:00000000
> [  400.832836] [<c016eda8>] (rcu_process_callbacks) from [<c01207e4>] (__do_softirq+0x138/0x264)
> [  400.841402]  r10:c08020a0 r9:40000001 r8:00000101 r7:c0800000 r6:c08020a4 r5:00000009
> [  400.849285]  r4:00000000
> [  400.851829] [<c01206ac>] (__do_softirq) from [<c0120c04>] (irq_exit+0xc8/0x104)
> [  400.859172]  r10:c0801f10 r9:df402400 r8:00000001 r7:00000000 r6:00000013 r5:00000000
> [  400.867053]  r4:c0735428
> [  400.869601] [<c0120b3c>] (irq_exit) from [<c0162610>] (__handle_domain_irq+0x88/0xf4)
> [  400.877473] [<c0162588>] (__handle_domain_irq) from [<c01014ac>] (gic_handle_irq+0x50/0x94)
> [  400.885865]  r10:dfffcdc0 r9:e0803100 r8:e0802100 r7:c0801f10 r6:e080210c r5:c080277c
> [  400.893747]  r4:c080eca0 r3:c0801f10
> [  400.897342] [<c010145c>] (gic_handle_irq) from [<c010c694>] (__irq_svc+0x54/0x90)
> [  400.904861] Exception stack(0xc0801f10 to 0xc0801f58)
> [  400.909936] 1f00:                                     00000000 00000000 0000826a c0117c80
> [  400.918156] 1f20: c0800000 c08024f8 c0802494 c081e2d6 c05b954c c07268c0 dfffcdc0 c0801f6c
> [  400.926376] 1f40: c0801f70 c0801f60 c01086b0 c01086b4 60000013 ffffffff
> [  400.933020]  r9:c07268c0 r8:c05b954c r7:c0801f44 r6:ffffffff r5:60000013 r4:c01086b4
> [  400.940826] [<c0108674>] (arch_cpu_idle) from [<c0155f54>] (default_idle_call+0x28/0x34)
> [  400.948960] [<c0155f2c>] (default_idle_call) from [<c0156088>] (cpu_startup_entry+0x128/0x17c)
> [  400.957620] [<c0155f60>] (cpu_startup_entry) from [<c04a3f54>] (rest_init+0x8c/0x90)
> [  400.965400]  r7:ffffffff r4:00000002
> [  400.969005] [<c04a3ec8>] (rest_init) from [<c0700cb4>] (start_kernel+0x310/0x31c)
> [  400.976522]  r5:c081e4c0 r4:00000001
> [  400.980121] [<c07009a4>] (start_kernel) from [<8000807c>] (0x8000807c)
> [  400.986716] ---[ end trace f8deb50d1b3d3c7a ]---


NB: The warning shows up 310 seconds after the suspend attempt.

I rebooted, tried the same operation, and hit the same warning
still 310 seconds later:

# echo mem > /sys/power/state
[   25.665905] PM: Syncing filesystems ... done.
[   25.672102] Freezing user space processes ... (elapsed 0.001 seconds) done.
[   25.680807] Double checking all user space processes after OOM killer disable... (elapsed 0.000 seconds) 
[   25.690494] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
[   25.699118] Suspending console(s) (use no_console_suspend to debug)
[   25.707899] PM: suspend of devices complete after 1.639 msecs
[   25.708796] PM: late suspend of devices complete after 0.887 msecs
[   25.709460] PM: noirq suspend of devices complete after 0.657 msecs
[   25.709465] Disabling non-boot CPUs ...
[   25.729045] CPU1: shutdown
[   25.762704] Enabling non-boot CPUs ...
[   25.800416] CPU1 is up
[   25.801024] PM: noirq resume of devices complete after 0.595 msecs
[   25.801730] PM: early resume of devices complete after 0.678 msecs
[   25.803194] nb8800 26000.ethernet eth0: Link is Down
[   25.803311] PM: resume of devices complete after 1.571 msecs
[   25.858245] Restarting tasks ... done.
-sh: echo: write error: Input/output error
[   29.186902] nb8800 26000.ethernet eth0: Link is Up - 1Gbps/Full - flow control rx/tx

[  335.865192] ------------[ cut here ]------------
[  335.869875] WARNING: CPU: 0 PID: 0 at net/ipv4/af_inet.c:155 inet_sock_destruct+0x1c4/0x1dc
[  335.878284] Modules linked in:
[  335.881366] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.7.0-rc6-00010-gd07031bdc433 #1
[  335.889321] Hardware name: Sigma Tango DT
[  335.893346] Backtrace: 
[  335.895815] [<c010b974>] (dump_backtrace) from [<c010bb70>] (show_stack+0x18/0x1c)
[  335.903420]  r7:60000113 r6:c080ea84 r5:00000000 r4:c080ea84
[  335.909125] [<c010bb58>] (show_stack) from [<c02e9fe4>] (dump_stack+0x80/0x94)
[  335.916391] [<c02e9f64>] (dump_stack) from [<c011bfc8>] (__warn+0xec/0x104)
[  335.923386]  r7:00000009 r6:c05e3fbc r5:00000000 r4:00000000
[  335.929086] [<c011bedc>] (__warn) from [<c011c098>] (warn_slowpath_null+0x28/0x30)
[  335.936691]  r9:dfbdd4e0 r8:0000000a r7:c0801de8 r6:df75ec54 r5:df5af3c4 r4:df5af2c0
[  335.944491] [<c011c070>] (warn_slowpath_null) from [<c0463c04>] (inet_sock_destruct+0x1c4/0x1dc)
[  335.953326] [<c0463a40>] (inet_sock_destruct) from [<c03e9c60>] (__sk_destruct+0x28/0xe0)
[  335.961542]  r7:c0801de8 r6:df75ec54 r5:df5af2c0 r4:df5af46c
[  335.967248] [<c03e9c38>] (__sk_destruct) from [<c016f230>] (rcu_process_callbacks+0x488/0x59c)
[  335.975901]  r5:00000000 r4:00000000
[  335.979499] [<c016eda8>] (rcu_process_callbacks) from [<c01207e4>] (__do_softirq+0x138/0x264)
[  335.988065]  r10:c08020a0 r9:40000001 r8:00000101 r7:c0800000 r6:c08020a4 r5:00000009
[  335.995947]  r4:00000000
[  335.998492] [<c01206ac>] (__do_softirq) from [<c0120c04>] (irq_exit+0xc8/0x104)
[  336.005835]  r10:c0801f10 r9:df402400 r8:00000001 r7:00000000 r6:00000013 r5:00000000
[  336.013716]  r4:c0735428
[  336.016264] [<c0120b3c>] (irq_exit) from [<c0162610>] (__handle_domain_irq+0x88/0xf4)
[  336.024136] [<c0162588>] (__handle_domain_irq) from [<c01014ac>] (gic_handle_irq+0x50/0x94)
[  336.032526]  r10:dfffcdc0 r9:e0803100 r8:e0802100 r7:c0801f10 r6:e080210c r5:c080277c
[  336.040406]  r4:c080eca0 r3:c0801f10
[  336.044001] [<c010145c>] (gic_handle_irq) from [<c010c694>] (__irq_svc+0x54/0x90)
[  336.051520] Exception stack(0xc0801f10 to 0xc0801f58)
[  336.056594] 1f00:                                     00000000 00000000 000079b2 c0117c80
[  336.064815] 1f20: c0800000 c08024f8 c0802494 c081e2d6 c05b954c c07268c0 dfffcdc0 c0801f6c
[  336.073034] 1f40: c0801f70 c0801f60 c01086b0 c01086b4 60000013 ffffffff
[  336.079678]  r9:c07268c0 r8:c05b954c r7:c0801f44 r6:ffffffff r5:60000013 r4:c01086b4
[  336.087483] [<c0108674>] (arch_cpu_idle) from [<c0155f54>] (default_idle_call+0x28/0x34)
[  336.095616] [<c0155f2c>] (default_idle_call) from [<c0156088>] (cpu_startup_entry+0x128/0x17c)
[  336.104277] [<c0155f60>] (cpu_startup_entry) from [<c04a3f54>] (rest_init+0x8c/0x90)
[  336.112057]  r7:ffffffff r4:00000002
[  336.115662] [<c04a3ec8>] (rest_init) from [<c0700cb4>] (start_kernel+0x310/0x31c)
[  336.123180]  r5:c081e4c0 r4:00000001
[  336.126777] [<c07009a4>] (start_kernel) from [<8000807c>] (0x8000807c)
[  336.133349] ---[ end trace d6b09977089e89b4 ]---


Does this 310 second lag ring a bell for anyone?

Regards.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: WARNING: CPU: 0 PID: 0 at net/ipv4/af_inet.c:155 inet_sock_destruct+0x1c4/0x1dc
  2016-07-05 14:50 ` Mason
@ 2016-07-05 15:28   ` Florian Fainelli
  2016-07-05 15:56     ` Mason
  0 siblings, 1 reply; 9+ messages in thread
From: Florian Fainelli @ 2016-07-05 15:28 UTC (permalink / raw)
  To: Mason, linux-pm, netdev, LKML; +Cc: Sebastian Frias

Le 05/07/2016 07:50, Mason a écrit :
> On 05/07/2016 15:33, Mason wrote:
> 
>> I was testing suspend/resume sequences where the suspend operation
>> fails and returns without having suspended the platform.
>>
>> # echo mem > /sys/power/state
>> [   90.322264] PM: Syncing filesystems ... done.
>> [   90.328758] Freezing user space processes ... (elapsed 0.001 seconds) done.
>> [   90.337092] Double checking all user space processes after OOM killer disable... (elapsed 0.000 seconds)
>> [   90.346765] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
>> [   90.355357] Suspending console(s) (use no_console_suspend to debug)
>> [   90.364590] PM: suspend of devices complete after 2.068 msecs
>> [   90.365554] PM: late suspend of devices complete after 0.954 msecs
>> [   90.366223] PM: noirq suspend of devices complete after 0.662 msecs
>> [   90.366227] Disabling non-boot CPUs ...
>> [   90.379004] CPU1: shutdown
>> [   90.412661] Enabling non-boot CPUs ...
>> [   90.450385] CPU1 is up
>> [   90.450979] PM: noirq resume of devices complete after 0.584 msecs
>> [   90.451672] PM: early resume of devices complete after 0.667 msecs
>> [   90.453149] nb8800 26000.ethernet eth0: Link is Down
>> [   90.453264] PM: resume of devices complete after 1.583 msecs
>> [   90.508180] Restarting tasks ... done.
>> -sh: echo: write error: Input/output error
>> [   93.860411] nb8800 26000.ethernet eth0: Link is Up - 1Gbps/Full - flow control rx/tx
>>
>> (The error message is expected, as my suspend routine returns -EIO
>> on failure.)
>>
>> I left the system to idle at the prompt; then 5 minutes later,
>> the system printed the following trace.
>>
>> [  400.718491] ------------[ cut here ]------------
>> [  400.723175] WARNING: CPU: 0 PID: 0 at net/ipv4/af_inet.c:155 inet_sock_destruct+0x1c4/0x1dc
>> [  400.731582] Modules linked in:
>> [  400.734689] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.7.0-rc6-00010-gd07031bdc433 #1
>> [  400.742646] Hardware name: Sigma Tango DT
>> [  400.746671] Backtrace: 
>> [  400.749141] [<c010b974>] (dump_backtrace) from [<c010bb70>] (show_stack+0x18/0x1c)
>> [  400.756747]  r7:60000113 r6:c080ea84 r5:00000000 r4:c080ea84
>> [  400.762454] [<c010bb58>] (show_stack) from [<c02e9fe4>] (dump_stack+0x80/0x94)
>> [  400.769722] [<c02e9f64>] (dump_stack) from [<c011bfc8>] (__warn+0xec/0x104)
>> [  400.776717]  r7:00000009 r6:c05e3fbc r5:00000000 r4:00000000
>> [  400.782417] [<c011bedc>] (__warn) from [<c011c098>] (warn_slowpath_null+0x28/0x30)
>> [  400.790022]  r9:dfbdd4e0 r8:0000000a r7:c0801de8 r6:df6f9514 r5:df5df144 r4:df5df040
>> [  400.797825] [<c011c070>] (warn_slowpath_null) from [<c0463c04>] (inet_sock_destruct+0x1c4/0x1dc)
>> [  400.806661] [<c0463a40>] (inet_sock_destruct) from [<c03e9c60>] (__sk_destruct+0x28/0xe0)
>> [  400.814878]  r7:c0801de8 r6:df6f9514 r5:df5df040 r4:df5df1ec
>> [  400.820584] [<c03e9c38>] (__sk_destruct) from [<c016f230>] (rcu_process_callbacks+0x488/0x59c)
>> [  400.829237]  r5:00000000 r4:00000000
>> [  400.832836] [<c016eda8>] (rcu_process_callbacks) from [<c01207e4>] (__do_softirq+0x138/0x264)
>> [  400.841402]  r10:c08020a0 r9:40000001 r8:00000101 r7:c0800000 r6:c08020a4 r5:00000009
>> [  400.849285]  r4:00000000
>> [  400.851829] [<c01206ac>] (__do_softirq) from [<c0120c04>] (irq_exit+0xc8/0x104)
>> [  400.859172]  r10:c0801f10 r9:df402400 r8:00000001 r7:00000000 r6:00000013 r5:00000000
>> [  400.867053]  r4:c0735428
>> [  400.869601] [<c0120b3c>] (irq_exit) from [<c0162610>] (__handle_domain_irq+0x88/0xf4)
>> [  400.877473] [<c0162588>] (__handle_domain_irq) from [<c01014ac>] (gic_handle_irq+0x50/0x94)
>> [  400.885865]  r10:dfffcdc0 r9:e0803100 r8:e0802100 r7:c0801f10 r6:e080210c r5:c080277c
>> [  400.893747]  r4:c080eca0 r3:c0801f10
>> [  400.897342] [<c010145c>] (gic_handle_irq) from [<c010c694>] (__irq_svc+0x54/0x90)
>> [  400.904861] Exception stack(0xc0801f10 to 0xc0801f58)
>> [  400.909936] 1f00:                                     00000000 00000000 0000826a c0117c80
>> [  400.918156] 1f20: c0800000 c08024f8 c0802494 c081e2d6 c05b954c c07268c0 dfffcdc0 c0801f6c
>> [  400.926376] 1f40: c0801f70 c0801f60 c01086b0 c01086b4 60000013 ffffffff
>> [  400.933020]  r9:c07268c0 r8:c05b954c r7:c0801f44 r6:ffffffff r5:60000013 r4:c01086b4
>> [  400.940826] [<c0108674>] (arch_cpu_idle) from [<c0155f54>] (default_idle_call+0x28/0x34)
>> [  400.948960] [<c0155f2c>] (default_idle_call) from [<c0156088>] (cpu_startup_entry+0x128/0x17c)
>> [  400.957620] [<c0155f60>] (cpu_startup_entry) from [<c04a3f54>] (rest_init+0x8c/0x90)
>> [  400.965400]  r7:ffffffff r4:00000002
>> [  400.969005] [<c04a3ec8>] (rest_init) from [<c0700cb4>] (start_kernel+0x310/0x31c)
>> [  400.976522]  r5:c081e4c0 r4:00000001
>> [  400.980121] [<c07009a4>] (start_kernel) from [<8000807c>] (0x8000807c)
>> [  400.986716] ---[ end trace f8deb50d1b3d3c7a ]---
> 
> 
> NB: The warning shows up 310 seconds after the suspend attempt.
> 
> I rebooted, tried the same operation, and hit the same warning
> still 310 seconds later:
> 
> # echo mem > /sys/power/state
> [   25.665905] PM: Syncing filesystems ... done.
> [   25.672102] Freezing user space processes ... (elapsed 0.001 seconds) done.
> [   25.680807] Double checking all user space processes after OOM killer disable... (elapsed 0.000 seconds) 
> [   25.690494] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
> [   25.699118] Suspending console(s) (use no_console_suspend to debug)
> [   25.707899] PM: suspend of devices complete after 1.639 msecs
> [   25.708796] PM: late suspend of devices complete after 0.887 msecs
> [   25.709460] PM: noirq suspend of devices complete after 0.657 msecs
> [   25.709465] Disabling non-boot CPUs ...
> [   25.729045] CPU1: shutdown
> [   25.762704] Enabling non-boot CPUs ...
> [   25.800416] CPU1 is up
> [   25.801024] PM: noirq resume of devices complete after 0.595 msecs
> [   25.801730] PM: early resume of devices complete after 0.678 msecs
> [   25.803194] nb8800 26000.ethernet eth0: Link is Down
> [   25.803311] PM: resume of devices complete after 1.571 msecs
> [   25.858245] Restarting tasks ... done.
> -sh: echo: write error: Input/output error
> [   29.186902] nb8800 26000.ethernet eth0: Link is Up - 1Gbps/Full - flow control rx/tx

nb8800.c does not currently show suspend/resume hooks implemented, are
you positive that when you suspend, you properly tear down all HW, stop
transmit queues, etc. and do the opposite upon resumption?

Is your system clocksource also correctly saved/restored, or if you go
through a firmware in-between could it be changing the counter values
and make Linux think that more time as elapsed than it really happened?
-- 
Florian

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: WARNING: CPU: 0 PID: 0 at net/ipv4/af_inet.c:155 inet_sock_destruct+0x1c4/0x1dc
  2016-07-05 15:28   ` Florian Fainelli
@ 2016-07-05 15:56     ` Mason
  2016-07-05 16:20       ` Florian Fainelli
  0 siblings, 1 reply; 9+ messages in thread
From: Mason @ 2016-07-05 15:56 UTC (permalink / raw)
  To: Florian Fainelli, linux-pm, netdev, LKML; +Cc: Sebastian Frias

On 05/07/2016 17:28, Florian Fainelli wrote:

> nb8800.c does not currently show suspend/resume hooks implemented, are
> you positive that when you suspend, you properly tear down all HW, stop
> transmit queues, etc. and do the opposite upon resumption?

I am currently testing the error path for my suspend routine.
Firmware is, in fact, denying the suspend request, and immediately
returns control to Linux, without having powered anything down.

I expected not having to save any context in that situation.
Am I mistaken?

You mention "stop transmit queues". Can you say more about this?

> Is your system clocksource also correctly saved/restored, or if you go
> through a firmware in-between could it be changing the counter values
> and make Linux think that more time as elapsed than it really happened?

Thanks for pointing this out, I was not aware I was supposed to save
and restore the tick counter on suspend/resume. (This is not an issue
in this specific situation, as the platform is NOT suspended.)

However, your remark has brought some more confusion to my mind.
Linux is expecting time to stand still when it suspends?
What if the tick counter is in an always-on power domain, and other
processors depend on the counter? I can't just overwrite the reg
when Linux resumes...

Regards.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: WARNING: CPU: 0 PID: 0 at net/ipv4/af_inet.c:155 inet_sock_destruct+0x1c4/0x1dc
  2016-07-05 15:56     ` Mason
@ 2016-07-05 16:20       ` Florian Fainelli
  2016-07-05 20:26         ` Mason
  0 siblings, 1 reply; 9+ messages in thread
From: Florian Fainelli @ 2016-07-05 16:20 UTC (permalink / raw)
  To: Mason, linux-pm, netdev, LKML; +Cc: Sebastian Frias

On 07/05/2016 08:56 AM, Mason wrote:
> On 05/07/2016 17:28, Florian Fainelli wrote:
> 
>> nb8800.c does not currently show suspend/resume hooks implemented, are
>> you positive that when you suspend, you properly tear down all HW, stop
>> transmit queues, etc. and do the opposite upon resumption?
> 
> I am currently testing the error path for my suspend routine.
> Firmware is, in fact, denying the suspend request, and immediately
> returns control to Linux, without having powered anything down.
> 
> I expected not having to save any context in that situation.
> Am I mistaken?

It depends what power state you are going to and resuming from, and how
much of this is platform dependent, on the platforms I work with S2
preserves register states for our On/Off domain, while S3 only keeps an
always-on power island and shuts off the On/Off domain, you therefore
need to have your drivers in the On/Off domain suspend any activity and
preserve important register states, or re-initialize them from scratch
whichever is the most convenient.


> 
> You mention "stop transmit queues". Can you say more about this?

See drivers/net/ethernet/broadcom/genet/bcmgenet.c which is a driver
that takes care of that for instance, look for bcmgenet_{suspend,resume}

> 
>> Is your system clocksource also correctly saved/restored, or if you go
>> through a firmware in-between could it be changing the counter values
>> and make Linux think that more time as elapsed than it really happened?
> 
> Thanks for pointing this out, I was not aware I was supposed to save
> and restore the tick counter on suspend/resume. (This is not an issue
> in this specific situation, as the platform is NOT suspended.)

You don't have to save and restore the clocksource counter, although if
you want proper time accounting to be done across suspend states, you
would want to use a clocksource which is persistent across these suspend
states.

> 
> However, your remark has brought some more confusion to my mind.
> Linux is expecting time to stand still when it suspends?
> What if the tick counter is in an always-on power domain, and other
> processors depend on the counter? I can't just overwrite the reg
> when Linux resumes...

The point is more that if the firmware initializes the timer, or even
re-initializes it, Linux could think that events expired because the
timebase has a big offset compared to where it was. Just pointing out
that this *could* be a problem. If your timer is in the always on domain
and your firmware does not touch it, that should be fine without
anything specific (except adding an "always-on" boolean property to the
timer nodes in DT maybe).
-- 
Florian

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: WARNING: CPU: 0 PID: 0 at net/ipv4/af_inet.c:155 inet_sock_destruct+0x1c4/0x1dc
  2016-07-05 16:20       ` Florian Fainelli
@ 2016-07-05 20:26         ` Mason
  2016-07-05 21:22           ` Florian Fainelli
  0 siblings, 1 reply; 9+ messages in thread
From: Mason @ 2016-07-05 20:26 UTC (permalink / raw)
  To: Florian Fainelli, linux-pm, netdev, LKML; +Cc: Sebastian Frias

On 05/07/2016 18:20, Florian Fainelli wrote:
> On 07/05/2016 08:56 AM, Mason wrote:
>> On 05/07/2016 17:28, Florian Fainelli wrote:
>>
>>> nb8800.c does not currently show suspend/resume hooks implemented, are
>>> you positive that when you suspend, you properly tear down all HW, stop
>>> transmit queues, etc. and do the opposite upon resumption?
>>
>> I am currently testing the error path for my suspend routine.
>> Firmware is, in fact, denying the suspend request, and immediately
>> returns control to Linux, without having powered anything down.
>>
>> I expected not having to save any context in that situation.
>> Am I mistaken?
> 
> It depends what power state you are going to and resuming from, and how
> much of this is platform dependent, on the platforms I work with S2
> preserves register states for our On/Off domain, while S3 only keeps an
> always-on power island and shuts off the On/Off domain, you therefore
> need to have your drivers in the On/Off domain suspend any activity and
> preserve important register states, or re-initialize them from scratch
> whichever is the most convenient.

Thanks for bringing these details to my attention, they will
definitely prove useful when I test an actual suspend/resume
sequence. However, I must stress that the platform did NOT
power down in my test case, because the firmware currently
denies all suspend requests.

Therefore, loss of context cannot possibly explain the
warning I am seeing.

>> You mention "stop transmit queues". Can you say more about this?
> 
> See drivers/net/ethernet/broadcom/genet/bcmgenet.c which is a driver
> that takes care of that for instance, look for bcmgenet_{suspend,resume}

Thanks. I will look into it.

If I understand correctly, something is missing in the
network interface code? (My system is using an NFS root
filesystem, so network is an important subsystem.)

>>> Is your system clocksource also correctly saved/restored, or if you go
>>> through a firmware in-between could it be changing the counter values
>>> and make Linux think that more time as elapsed than it really happened?
>>
>> Thanks for pointing this out, I was not aware I was supposed to save
>> and restore the tick counter on suspend/resume. (This is not an issue
>> in this specific situation, as the platform is NOT suspended.)
> 
> You don't have to save and restore the clocksource counter, although if
> you want proper time accounting to be done across suspend states, you
> would want to use a clocksource which is persistent across these suspend
> states.

The clocksource is a 27 MHz 32-bit tick counter. In other words,
the counter wraps around every 159 seconds. If Linux suspends
for several hours, how can it determine how much time went by?

Regards.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: WARNING: CPU: 0 PID: 0 at net/ipv4/af_inet.c:155 inet_sock_destruct+0x1c4/0x1dc
  2016-07-05 20:26         ` Mason
@ 2016-07-05 21:22           ` Florian Fainelli
  2016-07-05 21:51             ` Mason
  0 siblings, 1 reply; 9+ messages in thread
From: Florian Fainelli @ 2016-07-05 21:22 UTC (permalink / raw)
  To: Mason, linux-pm, netdev, LKML; +Cc: Sebastian Frias

On 07/05/2016 01:26 PM, Mason wrote:
> On 05/07/2016 18:20, Florian Fainelli wrote:
>> On 07/05/2016 08:56 AM, Mason wrote:
>>> On 05/07/2016 17:28, Florian Fainelli wrote:
>>>
>>>> nb8800.c does not currently show suspend/resume hooks implemented, are
>>>> you positive that when you suspend, you properly tear down all HW, stop
>>>> transmit queues, etc. and do the opposite upon resumption?
>>>
>>> I am currently testing the error path for my suspend routine.
>>> Firmware is, in fact, denying the suspend request, and immediately
>>> returns control to Linux, without having powered anything down.
>>>
>>> I expected not having to save any context in that situation.
>>> Am I mistaken?
>>
>> It depends what power state you are going to and resuming from, and how
>> much of this is platform dependent, on the platforms I work with S2
>> preserves register states for our On/Off domain, while S3 only keeps an
>> always-on power island and shuts off the On/Off domain, you therefore
>> need to have your drivers in the On/Off domain suspend any activity and
>> preserve important register states, or re-initialize them from scratch
>> whichever is the most convenient.
> 
> Thanks for bringing these details to my attention, they will
> definitely prove useful when I test an actual suspend/resume
> sequence. However, I must stress that the platform did NOT
> power down in my test case, because the firmware currently
> denies all suspend requests.
> 
> Therefore, loss of context cannot possibly explain the
> warning I am seeing.

No, but if you go all the way down to trying to suspend and the last
step is the firmware failing, anything you have suspended needs to be
unwinded, for your ethernet driver that means that you went through a
successful suspend then resume cycle even if it failed down later when
the platform attempted to suspend.

> 
>>> You mention "stop transmit queues". Can you say more about this?
>>
>> See drivers/net/ethernet/broadcom/genet/bcmgenet.c which is a driver
>> that takes care of that for instance, look for bcmgenet_{suspend,resume}
> 
> Thanks. I will look into it.
> 
> If I understand correctly, something is missing in the
> network interface code? (My system is using an NFS root
> filesystem, so network is an important subsystem.)

The typical things are detaching the network device and stopping
transmit queues, but without knowing what changes you have done to
nb8800.c, hard to tell what else is needed.

> 
>>>> Is your system clocksource also correctly saved/restored, or if you go
>>>> through a firmware in-between could it be changing the counter values
>>>> and make Linux think that more time as elapsed than it really happened?
>>>
>>> Thanks for pointing this out, I was not aware I was supposed to save
>>> and restore the tick counter on suspend/resume. (This is not an issue
>>> in this specific situation, as the platform is NOT suspended.)
>>
>> You don't have to save and restore the clocksource counter, although if
>> you want proper time accounting to be done across suspend states, you
>> would want to use a clocksource which is persistent across these suspend
>> states.
> 
> The clocksource is a 27 MHz 32-bit tick counter. In other words,
> the counter wraps around every 159 seconds. If Linux suspends
> for several hours, how can it determine how much time went by?

Well, that's unfortunate, then you are pretty much either doomed to
accepting to lose time in between and rely on e.g: NTP to resync your
time upon resumption, or, if you had smarter hardware you could have a
prescaler or something that makes this counter wrap far ahead (like
years or days after).
-- 
Florian

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: WARNING: CPU: 0 PID: 0 at net/ipv4/af_inet.c:155 inet_sock_destruct+0x1c4/0x1dc
  2016-07-05 21:22           ` Florian Fainelli
@ 2016-07-05 21:51             ` Mason
  2016-07-05 21:55               ` Florian Fainelli
  0 siblings, 1 reply; 9+ messages in thread
From: Mason @ 2016-07-05 21:51 UTC (permalink / raw)
  To: Florian Fainelli, linux-pm, netdev, LKML; +Cc: Sebastian Frias

On 05/07/2016 23:22, Florian Fainelli wrote:
> On 07/05/2016 01:26 PM, Mason wrote:
>> On 05/07/2016 18:20, Florian Fainelli wrote:
>>> On 07/05/2016 08:56 AM, Mason wrote:
>>>> On 05/07/2016 17:28, Florian Fainelli wrote:
>>>>
>>>>> nb8800.c does not currently show suspend/resume hooks implemented, are
>>>>> you positive that when you suspend, you properly tear down all HW, stop
>>>>> transmit queues, etc. and do the opposite upon resumption?
>>>>
>>>> I am currently testing the error path for my suspend routine.
>>>> Firmware is, in fact, denying the suspend request, and immediately
>>>> returns control to Linux, without having powered anything down.
>>>>
>>>> I expected not having to save any context in that situation.
>>>> Am I mistaken?
>>>
>>> It depends what power state you are going to and resuming from, and how
>>> much of this is platform dependent, on the platforms I work with S2
>>> preserves register states for our On/Off domain, while S3 only keeps an
>>> always-on power island and shuts off the On/Off domain, you therefore
>>> need to have your drivers in the On/Off domain suspend any activity and
>>> preserve important register states, or re-initialize them from scratch
>>> whichever is the most convenient.
>>
>> Thanks for bringing these details to my attention, they will
>> definitely prove useful when I test an actual suspend/resume
>> sequence. However, I must stress that the platform did NOT
>> power down in my test case, because the firmware currently
>> denies all suspend requests.
>>
>> Therefore, loss of context cannot possibly explain the
>> warning I am seeing.
> 
> No, but if you go all the way down to trying to suspend and the last
> step is the firmware failing, anything you have suspended needs to be
> unwinded, for your ethernet driver that means that you went through a
> successful suspend then resume cycle even if it failed down later when
> the platform attempted to suspend.

So it is the driver's responsibility to "shut down" on resume?
(I had the vague impression that the suspend framework would
"disable" the device through the appropriate callback.)

>>> See drivers/net/ethernet/broadcom/genet/bcmgenet.c which is a driver
>>> that takes care of that for instance, look for bcmgenet_{suspend,resume}
>>
>> Thanks. I will look into it.
>>
>> If I understand correctly, something is missing in the
>> network interface code? (My system is using an NFS root
>> filesystem, so network is an important subsystem.)
> 
> The typical things are detaching the network device and stopping
> transmit queues, but without knowing what changes you have done to
> nb8800.c, hard to tell what else is needed.

I'm using the driver unaltered. So I guess I need to figure out
the exact steps required for suspending a network device.
(I'll look at bcmgenet.c tomorrow.)

>>>>> Is your system clocksource also correctly saved/restored, or if you go
>>>>> through a firmware in-between could it be changing the counter values
>>>>> and make Linux think that more time as elapsed than it really happened?
>>>>
>>>> Thanks for pointing this out, I was not aware I was supposed to save
>>>> and restore the tick counter on suspend/resume. (This is not an issue
>>>> in this specific situation, as the platform is NOT suspended.)
>>>
>>> You don't have to save and restore the clocksource counter, although if
>>> you want proper time accounting to be done across suspend states, you
>>> would want to use a clocksource which is persistent across these suspend
>>> states.
>>
>> The clocksource is a 27 MHz 32-bit tick counter. In other words,
>> the counter wraps around every 159 seconds. If Linux suspends
>> for several hours, how can it determine how much time went by?
> 
> Well, that's unfortunate, then you are pretty much either doomed to
> accepting to lose time in between and rely on e.g: NTP to resync your
> time upon resumption, or, if you had smarter hardware you could have a
> prescaler or something that makes this counter wrap far ahead (like
> years or days after).

Maybe the hardware devs thought of that problem, because they
"widened" the counter to 64 bits on newer platforms.

Regards.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: WARNING: CPU: 0 PID: 0 at net/ipv4/af_inet.c:155 inet_sock_destruct+0x1c4/0x1dc
  2016-07-05 21:51             ` Mason
@ 2016-07-05 21:55               ` Florian Fainelli
  0 siblings, 0 replies; 9+ messages in thread
From: Florian Fainelli @ 2016-07-05 21:55 UTC (permalink / raw)
  To: Mason, linux-pm, netdev, LKML; +Cc: Sebastian Frias

On 07/05/2016 02:51 PM, Mason wrote:
>>> Therefore, loss of context cannot possibly explain the
>>> warning I am seeing.
>>
>> No, but if you go all the way down to trying to suspend and the last
>> step is the firmware failing, anything you have suspended needs to be
>> unwinded, for your ethernet driver that means that you went through a
>> successful suspend then resume cycle even if it failed down later when
>> the platform attempted to suspend.
> 
> So it is the driver's responsibility to "shut down" on resume?

It is the driver responsibility to know how to suspend and resume a
device it manages, and it does that by implementing appropriate
suspend/resume callbacks.

> (I had the vague impression that the suspend framework would
> "disable" the device through the appropriate callback.)

The suspend framework knows which drivers implement suspend/resume and
calls them appropriately (based on parenting/bus hierarchy), but it
won't automatigally do anything because there is no such thing as magic
when it comes to suspending hardware, this needs to be a controlled
sequence.
-- 
Florian

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2016-07-05 21:55 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-07-05 13:33 WARNING: CPU: 0 PID: 0 at net/ipv4/af_inet.c:155 inet_sock_destruct+0x1c4/0x1dc Mason
2016-07-05 14:50 ` Mason
2016-07-05 15:28   ` Florian Fainelli
2016-07-05 15:56     ` Mason
2016-07-05 16:20       ` Florian Fainelli
2016-07-05 20:26         ` Mason
2016-07-05 21:22           ` Florian Fainelli
2016-07-05 21:51             ` Mason
2016-07-05 21:55               ` Florian Fainelli

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).