* [4.1.3-rt3] [report][suspend to ram] BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:917
@ 2015-08-12 18:25 Grygorii Strashko
2015-08-16 11:42 ` Sebastian Andrzej Siewior
0 siblings, 1 reply; 6+ messages in thread
From: Grygorii Strashko @ 2015-08-12 18:25 UTC (permalink / raw)
To: linux-arm-kernel
Hi All,
I can constantly see below error report with RT-kernel on TI ARM dra7-evm
if I'm trying to do Suspend to RAM.
Sometimes, It's displayed alone (Log1) and sometimes in combination with
"INFO: suspicious RCU usage" (Log2).
Below error was not reproduced on vanilla kernel (4.2-rc6).
I don't know if there are any relations, but on vanilla kernel (4.2-rc6)
i can see issue listed in log 3 (just in case).
I'm not familiar with affected code parts and would be appreciated for any comments.
===== LOG 1 =====
Disabling non-boot CPUs ...
PM: noirq suspend of devices complete after 7.295 msecs
[ 100.285729] Disabling non-boot CPUs ...
[ 100.287575] BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:917
[ 100.287580] in_atomic(): 1, irqs_disabled(): 128, pid: 18, name: migration/1
[ 100.287583] INFO: lockdep is turned off.
[ 100.287586] irq event stamp: 122
[ 100.287600] hardirqs last enabled at (121): [<c06ac0ac>] _raw_spin_unlock_irqrestore+0x88/0x90
[ 100.287609] hardirqs last disabled at (122): [<c06abed0>] _raw_spin_lock_irq+0x28/0x5c
[ 100.287620] softirqs last enabled at (0): [<c003d294>] copy_process.part.52+0x410/0x19d8
[ 100.287625] softirqs last disabled at (0): [< (null)>] (null)
[ 100.287630] Preemption disabled at:[< (null)>] (null)
[ 100.287631]
[ 100.287639] CPU: 1 PID: 18 Comm: migration/1 Tainted: G W 4.1.4-rt3-01046-g96ac8da #204
[ 100.287642] Hardware name: Generic DRA74X (Flattened Device Tree)
[ 100.287659] [<c0019134>] (unwind_backtrace) from [<c0014774>] (show_stack+0x20/0x24)
[ 100.287671] [<c0014774>] (show_stack) from [<c06a70f4>] (dump_stack+0x88/0xdc)
[ 100.287681] [<c06a70f4>] (dump_stack) from [<c006cab8>] (___might_sleep+0x198/0x2a8)
[ 100.287689] [<c006cab8>] (___might_sleep) from [<c06ac4dc>] (rt_spin_lock+0x30/0x70)
[ 100.287699] [<c06ac4dc>] (rt_spin_lock) from [<c013f790>] (find_lock_task_mm+0x9c/0x174)
[ 100.287710] [<c013f790>] (find_lock_task_mm) from [<c00409ac>] (clear_tasks_mm_cpumask+0xb4/0x1ac)
[ 100.287720] [<c00409ac>] (clear_tasks_mm_cpumask) from [<c00166a4>] (__cpu_disable+0x98/0xbc)
[ 100.287728] [<c00166a4>] (__cpu_disable) from [<c06a2e8c>] (take_cpu_down+0x1c/0x50)
[ 100.287742] [<c06a2e8c>] (take_cpu_down) from [<c00f2600>] (multi_cpu_stop+0x11c/0x158)
[ 100.287754] [<c00f2600>] (multi_cpu_stop) from [<c00f2a9c>] (cpu_stopper_thread+0xc4/0x184)
[ 100.287767] [<c00f2a9c>] (cpu_stopper_thread) from [<c0069058>] (smpboot_thread_fn+0x18c/0x324)
[ 100.287779] [<c0069058>] (smpboot_thread_fn) from [<c00649c4>] (kthread+0xe8/0x104)
[ 100.287791] [<c00649c4>] (kthread) from [<c0010058>] (ret_from_fork+0x14/0x3c)
[ 100.288114] CPU1: shutdown
[ 100.317696] PM: Calling sched_clock_suspend+0x0/0x40
[ 100.317698] PM: Calling timekeeping_suspend+0x0/0x2e0
[ 100.317698] PM: Calling irq_gc_suspend+0x0/0x68
[ 100.317698] PM: Calling fw_suspend+0x0/0x2c
[ 100.317698] PM: Calling cpu_pm_suspend+0x0/0x28
==== LOG 2 ====
[ 156.125122] PM: noirq suspend of devices complete after 16.499 msecs
[ 156.125250] Disabling non-boot CPUs ...
[ 156.153626] NOHZ: local_softirq_pending 02
[ 156.154939] BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:917
[ 156.154943] in_atomic(): 1, irissueqs_disabled(): 128, pid: 18, name: migration/1
[ 156.154948] 2 locks held by migration/1/18:
[ 156.154974] #0: (rcu_read_lock){......}, at: [<c004093c>] clear_tasks_mm_cpumask+0x44/0x1ac
[ 156.154992] #1: (rcu_read_lock){......}, at: [<c013f6f4>] find_lock_task_mm+0x0/0x174
[ 156.154995] irq event stamp: 126
[ 156.155005] hardirqs last enabled at (125): [<c06ac0e8>] _raw_spin_unlock_irq+0x34/0x78
[ 156.155016] hardirqs last disabled at (126): [<c00f25d8>] multi_cpu_stop+0xf4/0x158
[ 156.155025] softirqs last enabled at (0): [<c003d294>] copy_process.part.52+0x410/0x19d8
[ 156.155031] softirqs last disabled at (0): [< (null)>] (null)
[ 156.155036] Preemption disabled at:[< (null)>] (null)
[ 156.155038]
[ 156.155046] CPU: 1 PID: 18 Comm: migration/1 Tainted: G W 4.1.4-rt3-01046-g96ac8da #204
[ 156.155049] Hardware name: Generic DRA74X (Flattened Device Tree)
[ 156.155062] [<c0019134>] (unwind_backtrace) from [<c0014774>] (show_stack+0x20/0x24)
[ 156.155074] [<c0014774>] (show_stack) from [<c06a70f4>] (dump_stack+0x88/0xdc)
[ 156.155084] [<c06a70f4>] (dump_stack) from [<c006cab8>] (___might_sleep+0x198/0x2a8)
[ 156.155091] [<c006cab8>] (___might_sleep) from [<c06ac4dc>] (rt_spin_lock+0x30/0x70)
[ 156.155099] [<c06ac4dc>] (rt_spin_lock) from [<c013f790>] (find_lock_task_mm+0x9c/0x174)
[ 156.155109] [<c013f790>] (find_lock_task_mm) from [<c00409ac>] (clear_tasks_mm_cpumask+0xb4/0x1ac)
[ 156.155118] [<c00409ac>] (clear_tasks_mm_cpumask) from [<c00166a4>] (__cpu_disable+0x98/0xbc)
[ 156.155126] [<c00166a4>] (__cpu_disable) from [<c06a2e8c>] (take_cpu_down+0x1c/0x50)
[ 156.155136] [<c06a2e8c>] (take_cpu_down) from [<c00f2600>] (multi_cpu_stop+0x11c/0x158)
[ 156.155147] [<c00f2600>] (multi_cpu_stop) from [<c00f2a9c>] (cpu_stopper_thread+0xc4/0x184)
[ 156.155158] [<c00f2a9c>] (cpu_stopper_thread) from [<c0069058>] (smpboot_thread_fn+0x18c/0x324)
[ 156.155170] [<c0069058>] (smpboot_thread_fn) from [<c00649c4>] (kthread+0xe8/0x104)
[ 156.155181] [<c00649c4>] (kthread) from [<c0010058>] (ret_from_fork+0x14/0x3c)
[ 156.156643]
[ 156.156646] ===============================
[ 156.156648] [ INFO: suspicious RCU usage. ]
[ 156.156653] 4.1.4-rt3-01046-g96ac8da #204 Tainted: G W
[ 156.156655] -------------------------------
[ 156.156660] include/trace/events/power.h:297 suspicious rcu_dereference_check() usage!
[ 156.156662]
[ 156.156662] other info that might help us debug this:
[ 156.156662]
[ 156.156666]
[ 156.156666] RCU used illegally from offline CPU!
[ 156.156666] rcu_scheduler_active = 1, debug_locks = 0
[ 156.156670] no locks held by swapper/1/0.
[ 156.156671]
[ 156.156671] stack backtrace:
[ 156.156678] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G W 4.1.4-rt3-01046-g96ac8da #204
[ 156.156681] Hardware name: Generic DRA74X (Flattened Device Tree)
[ 156.156692] [<c0019134>] (unwind_backtrace) from [<c0014774>] (show_stack+0x20/0x24)
[ 156.156702] [<c0014774>] (show_stack) from [<c06a70f4>] (dump_stack+0x88/0xdc)
[ 156.156715] [<c06a70f4>] (dump_stack) from [<c0091258>] (lockdep_rcu_suspicious+0xbc/0x11c)
[ 156.156728] [<c0091258>] (lockdep_rcu_suspicious) from [<c00334d0>] (pwrdm_set_next_pwrst+0x134/0x2ec)
[ 156.156740] [<c00334d0>] (pwrdm_set_next_pwrst) from [<c002f8ac>] (omap4_hotplug_cpu+0x68/0xd4)
[ 156.156751] [<c002f8ac>] (omap4_hotplug_cpu) from [<c06a2d48>] (omap4_cpu_die+0x54/0x9c)
[ 156.156759] [<c06a2d48>] (omap4_cpu_die) from [<c06a2cac>] (cpu_die+0x5c/0xa4)
[ 156.156769] [<c06a2cac>] (cpu_die) from [<c0010bdc>] (arch_cpu_idle_dead+0x18/0x1c)
[ 156.156780] [<c0010bdc>] (arch_cpu_idle_dead) from [<c008d7d4>] (cpu_startup_entry+0x22c/0x4f0)
[ 156.156789] [<c008d7d4>] (cpu_startup_entry) from [<c001688c>] (secondary_start_kernel+0x130/0x14c)
[ 156.156792] CPU1: shutdown
[ 156.156799] [<c001688c>] (secondary_start_kernel) from [<8000970c>] (0x8000970c)
[ 156.186995] PM: Calling sched_clock_suspend+0x0/0x40
[ 156.186997] PM: Calling timekeeping_suspend+0x0/0x2e0
[ 156.186997] PM: Calling irq_gc_suspend+0x0/0x68
[ 156.186997] PM: Calling fw_suspend+0x0/0x2c
[ 156.186997] PM: Calling cpu_pm_suspend+0x0/0x28
===== LOG 3 =====
[ 8.099469] PM: suspend of devices complete after 7.968 msecs
[ 8.103121] PM: late suspend of devices complete after 3.642 msecs
[ 8.114611] PM: noirq suspend of devices complete after 11.481 msecs
[ 8.114670] Disabling non-boot CPUs ...
[ 8.119535]
[ 8.119538] ===============================
[ 8.119540] [ INFO: suspicious RCU usage. ]
[ 8.119545] 4.2.0-rc6-00011-g4a7ab45-dirty #50 Not tainted
[ 8.119548] -------------------------------
[ 8.119552] kernel/sched/fair.c:5032 suspicious rcu_dereference_check() usage!
[ 8.119554]
[ 8.119554] other info that might help us debug this:
[ 8.119554]
[ 8.119558]
[ 8.119558] RCU used illegally from offline CPU!
[ 8.119558] rcu_scheduler_active = 1, debug_locks = 0
[ 8.119562] 3 locks held by swapper/1/0:
[ 8.119585] #0: ((cpu_died).wait.lock){......}, at: [<c00837a4>] complete+0x14/0x44
[ 8.119606] #1: (&p->pi_lock){-.-.-.}, at: [<c006ab00>] try_to_wake_up+0x2c/0x410
[ 8.119622] #2: (rcu_read_lock){......}, at: [<c0072c84>] select_task_rq_fair+0x60/0xae0
[ 8.119625]
[ 8.119625] stack backtrace:
[ 8.119631] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.2.0-rc6-00011-g4a7ab45-dirty #50
[ 8.119634] Hardware name: Generic DRA74X (Flattened Device Tree)
[ 8.119646] [<c0016e24>] (unwind_backtrace) from [<c0013338>] (show_stack+0x10/0x14)
[ 8.119656] [<c0013338>] (show_stack) from [<c05f6b2c>] (dump_stack+0x84/0x9c)
[ 8.119665] [<c05f6b2c>] (dump_stack) from [<c0073338>] (select_task_rq_fair+0x714/0xae0)
[ 8.119675] [<c0073338>] (select_task_rq_fair) from [<c006abf0>] (try_to_wake_up+0x11c/0x410)
[ 8.119684] [<c006abf0>] (try_to_wake_up) from [<c0082fe8>] (__wake_up_common+0x4c/0x80)
[ 8.119690] [<c0082fe8>] (__wake_up_common) from [<c0083030>] (__wake_up_locked+0x14/0x1c)
[ 8.119697] [<c0083030>] (__wake_up_locked) from [<c00837c4>] (complete+0x34/0x44)
[ 8.119705] [<c00837c4>] (complete) from [<c05f2d34>] (cpu_die+0x34/0x9c)
[ 8.119713] [<c05f2d34>] (cpu_die) from [<c0083be0>] (cpu_startup_entry+0x204/0x314)
[ 8.119720] [<c0083be0>] (cpu_startup_entry) from [<800095ac>] (0x800095ac)
[ 8.119748] CPU1: shutdown
--
regards,
-grygorii
^ permalink raw reply [flat|nested] 6+ messages in thread
* [4.1.3-rt3] [report][suspend to ram] BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:917
2015-08-12 18:25 [4.1.3-rt3] [report][suspend to ram] BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:917 Grygorii Strashko
@ 2015-08-16 11:42 ` Sebastian Andrzej Siewior
2015-08-17 15:46 ` Grygorii Strashko
0 siblings, 1 reply; 6+ messages in thread
From: Sebastian Andrzej Siewior @ 2015-08-16 11:42 UTC (permalink / raw)
To: linux-arm-kernel
* Grygorii Strashko | 2015-08-12 21:25:50 [+0300]:
>Hi All,
>
>I can constantly see below error report with RT-kernel on TI ARM dra7-evm
>if I'm trying to do Suspend to RAM.
do you see the same problem on x86 with -RT?
> Disabling non-boot CPUs ...
> PM: noirq suspend of devices complete after 7.295 msecs
>[ 100.285729] Disabling non-boot CPUs ...
>[ 100.287575] BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:917
>[ 100.287580] in_atomic(): 1, irqs_disabled(): 128, pid: 18, name: migration/1
>[ 100.287583] INFO: lockdep is turned off.
>[ 100.287586] irq event stamp: 122
>[ 100.287600] hardirqs last enabled at (121): [<c06ac0ac>] _raw_spin_unlock_irqrestore+0x88/0x90
>[ 100.287609] hardirqs last disabled at (122): [<c06abed0>] _raw_spin_lock_irq+0x28/0x5c
>[ 100.287620] softirqs last enabled at (0): [<c003d294>] copy_process.part.52+0x410/0x19d8
>[ 100.287625] softirqs last disabled at (0): [< (null)>] (null)
>[ 100.287630] Preemption disabled at:[< (null)>] (null)
>[ 100.287631]
>[ 100.287639] CPU: 1 PID: 18 Comm: migration/1 Tainted: G W 4.1.4-rt3-01046-g96ac8da #204
>[ 100.287642] Hardware name: Generic DRA74X (Flattened Device Tree)
>[ 100.287659] [<c0019134>] (unwind_backtrace) from [<c0014774>] (show_stack+0x20/0x24)
>[ 100.287671] [<c0014774>] (show_stack) from [<c06a70f4>] (dump_stack+0x88/0xdc)
>[ 100.287681] [<c06a70f4>] (dump_stack) from [<c006cab8>] (___might_sleep+0x198/0x2a8)
>[ 100.287689] [<c006cab8>] (___might_sleep) from [<c06ac4dc>] (rt_spin_lock+0x30/0x70)
>[ 100.287699] [<c06ac4dc>] (rt_spin_lock) from [<c013f790>] (find_lock_task_mm+0x9c/0x174)
this is task_lock() which takes a sleeping lock.
>[ 100.287710] [<c013f790>] (find_lock_task_mm) from [<c00409ac>] (clear_tasks_mm_cpumask+0xb4/0x1ac)
>[ 100.287720] [<c00409ac>] (clear_tasks_mm_cpumask) from [<c00166a4>] (__cpu_disable+0x98/0xbc)
>[ 100.287728] [<c00166a4>] (__cpu_disable) from [<c06a2e8c>] (take_cpu_down+0x1c/0x50)
>[ 100.287742] [<c06a2e8c>] (take_cpu_down) from [<c00f2600>] (multi_cpu_stop+0x11c/0x158)
>[ 100.287754] [<c00f2600>] (multi_cpu_stop) from [<c00f2a9c>] (cpu_stopper_thread+0xc4/0x184)
this function contains local_save_flags().
>[ 100.287767] [<c00f2a9c>] (cpu_stopper_thread) from [<c0069058>] (smpboot_thread_fn+0x18c/0x324)
>[ 100.287779] [<c0069058>] (smpboot_thread_fn) from [<c00649c4>] (kthread+0xe8/0x104)
>[ 100.287791] [<c00649c4>] (kthread) from [<c0010058>] (ret_from_fork+0x14/0x3c)
>[ 100.288114] CPU1: shutdown
The local_save_flags() should be probably replaced with something else.
>regards,
>-grygorii
Sebastian
^ permalink raw reply [flat|nested] 6+ messages in thread
* [4.1.3-rt3] [report][suspend to ram] BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:917
2015-08-16 11:42 ` Sebastian Andrzej Siewior
@ 2015-08-17 15:46 ` Grygorii Strashko
2015-09-04 16:46 ` Grygorii Strashko
0 siblings, 1 reply; 6+ messages in thread
From: Grygorii Strashko @ 2015-08-17 15:46 UTC (permalink / raw)
To: linux-arm-kernel
Hi Sebastian,
On 08/16/2015 02:42 PM, Sebastian Andrzej Siewior wrote:
> * Grygorii Strashko | 2015-08-12 21:25:50 [+0300]:
>>
>> I can constantly see below error report with RT-kernel on TI ARM dra7-evm
>> if I'm trying to do Suspend to RAM.
>
> do you see the same problem on x86 with -RT?
Unfortunately, I'm not working with x86 now, and I'm not sure I'll be able
to try it in the nearest future.
Below, I've tried to analyze involved code path and it seems expected
to call clear_tasks_mm_cpumask() in atomic context, as the whole
take_cpu_down() call chain expected to be atomic.
>
>> Disabling non-boot CPUs ...
>> PM: noirq suspend of devices complete after 7.295 msecs
>> [ 100.285729] Disabling non-boot CPUs ...
>> [ 100.287575] BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:917
>> [ 100.287580] in_atomic(): 1, irqs_disabled(): 128, pid: 18, name: migration/1
>> [ 100.287583] INFO: lockdep is turned off.
>> [ 100.287586] irq event stamp: 122
>> [ 100.287600] hardirqs last enabled at (121): [<c06ac0ac>] _raw_spin_unlock_irqrestore+0x88/0x90
>> [ 100.287609] hardirqs last disabled at (122): [<c06abed0>] _raw_spin_lock_irq+0x28/0x5c
>> [ 100.287620] softirqs last enabled at (0): [<c003d294>] copy_process.part.52+0x410/0x19d8
>> [ 100.287625] softirqs last disabled at (0): [< (null)>] (null)
>> [ 100.287630] Preemption disabled at:[< (null)>] (null)
>> [ 100.287631]
>> [ 100.287639] CPU: 1 PID: 18 Comm: migration/1 Tainted: G W 4.1.4-rt3-01046-g96ac8da #204
>> [ 100.287642] Hardware name: Generic DRA74X (Flattened Device Tree)
>> [ 100.287659] [<c0019134>] (unwind_backtrace) from [<c0014774>] (show_stack+0x20/0x24)
>> [ 100.287671] [<c0014774>] (show_stack) from [<c06a70f4>] (dump_stack+0x88/0xdc)
>> [ 100.287681] [<c06a70f4>] (dump_stack) from [<c006cab8>] (___might_sleep+0x198/0x2a8)
>> [ 100.287689] [<c006cab8>] (___might_sleep) from [<c06ac4dc>] (rt_spin_lock+0x30/0x70)
>> [ 100.287699] [<c06ac4dc>] (rt_spin_lock) from [<c013f790>] (find_lock_task_mm+0x9c/0x174)
>
> this is task_lock() which takes a sleeping lock.
>
>> [ 100.287710] [<c013f790>] (find_lock_task_mm) from [<c00409ac>] (clear_tasks_mm_cpumask+0xb4/0x1ac)
>> [ 100.287720] [<c00409ac>] (clear_tasks_mm_cpumask) from [<c00166a4>] (__cpu_disable+0x98/0xbc)
>> [ 100.287728] [<c00166a4>] (__cpu_disable) from [<c06a2e8c>] (take_cpu_down+0x1c/0x50)
>> [ 100.287742] [<c06a2e8c>] (take_cpu_down) from [<c00f2600>] (multi_cpu_stop+0x11c/0x158)
>> [ 100.287754] [<c00f2600>] (multi_cpu_stop) from [<c00f2a9c>] (cpu_stopper_thread+0xc4/0x184)
>
> this function contains local_save_flags().
multi_cpu_stop:
local_save_flags(flags);
if (!msdata->active_cpus)
is_active = cpu == cpumask_first(cpu_online_mask);
else
is_active = cpumask_test_cpu(cpu, msdata->active_cpus);
/* Simple state machine */
do {
/* Chill out and ensure we re-read multi_stop_state. */
cpu_relax();
if (msdata->state != curstate) {
curstate = msdata->state;
switch (curstate) {
case MULTI_STOP_DISABLE_IRQ:
local_irq_disable();
hard_irq_disable(); <==== Step 2 disable IRQs
break;
case MULTI_STOP_RUN:
if (is_active)
err = msdata->fn(msdata->data); ===> Step 3 take_cpu_down()
break;
default:
break;
}
ack_state(msdata);
}
} while (curstate != MULTI_STOP_EXIT);
local_irq_restore(flags);
>
>> [ 100.287767] [<c00f2a9c>] (cpu_stopper_thread) from [<c0069058>] (smpboot_thread_fn+0x18c/0x324)
cpu_stopper_thread:
preempt_disable();
ret = fn(arg); ===> multi_cpu_stop()
if (ret)
done->ret = ret;
preempt_enable();
>> [ 100.287779] [<c0069058>] (smpboot_thread_fn) from [<c00649c4>] (kthread+0xe8/0x104)
>> [ 100.287791] [<c00649c4>] (kthread) from [<c0010058>] (ret_from_fork+0x14/0x3c)
>> [ 100.288114] CPU1: shutdown
>
> The local_save_flags() should be probably replaced with something else.
>
Below is list of clear_tasks_mm_cpumask() users:
arch/arm/kernel/smp.c
int __cpu_disable(void)
216: clear_tasks_mm_cpumask(cpu);
arch/arm64/kernel/smp.c
int __cpu_disable(void)
238: clear_tasks_mm_cpumask(cpu);
arch/metag/kernel/smp.c
int __cpu_disable(void)
290: clear_tasks_mm_cpumask(cpu);
arch/powerpc/mm/mmu_context_nohash.c
_cpu_down()
+ cpu_notify_nofail(CPU_DEAD | mod, hcpu);
+ mmu_context_cpu_notify()
400: clear_tasks_mm_cpumask(cpu);
arch/sh/kernel/smp.c
int __cpu_disable(void)
155: clear_tasks_mm_cpumask(cpu);
arch/xtensa/kernel/smp.c
int __cpu_disable(void)
279: clear_tasks_mm_cpumask(cpu);
As per above, in all case, except *powerpc*, the clear_tasks_mm_cpumask()
is called from __cpu_disable() (atomic context).
While in powerpc case it's called from _cpu_down() (if I understand right,
in non-atomic context).
Commit which adds clear_tasks_mm_cpumask() call for arm is below:
commit 3eaa73bde2fb475b731a0fde7dd11c3ecfb8679c
Author: Anton Vorontsov <anton.vorontsov@linaro.org>
Date: Thu May 31 16:26:22 2012 -0700
arm: use clear_tasks_mm_cpumask()
Checking for process->mm is not enough because process' main thread may
exit or detach its mm via use_mm(), but other threads may still have a
valid mm.
To fix this we would need to use find_lock_task_mm(), which would walk up
all threads and returns an appropriate task (with task lock held).
clear_tasks_mm_cpumask() has this issue fixed, so let's use it.
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
index b735521..2c7217d 100644
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -109,7 +109,6 @@ static void percpu_timer_stop(void);
int __cpu_disable(void)
{
unsigned int cpu = smp_processor_id();
- struct task_struct *p;
int ret;
ret = platform_cpu_disable(cpu);
@@ -139,12 +138,7 @@ int __cpu_disable(void)
flush_cache_all();
local_flush_tlb_all();
- read_lock(&tasklist_lock);
- for_each_process(p) {
- if (p->mm)
- cpumask_clear_cpu(cpu, mm_cpumask(p->mm));
- }
- read_unlock(&tasklist_lock);
+ clear_tasks_mm_cpumask(cpu);
return 0;
}
Thanks for your reply.
--
regards,
-grygorii
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [4.1.3-rt3] [report][suspend to ram] BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:917
2015-08-17 15:46 ` Grygorii Strashko
@ 2015-09-04 16:46 ` Grygorii Strashko
2015-12-11 16:20 ` Sebastian Andrzej Siewior
0 siblings, 1 reply; 6+ messages in thread
From: Grygorii Strashko @ 2015-09-04 16:46 UTC (permalink / raw)
To: linux-arm-kernel
Hi All,
On 08/17/2015 06:46 PM, Grygorii Strashko wrote:
> On 08/16/2015 02:42 PM, Sebastian Andrzej Siewior wrote:
>> * Grygorii Strashko | 2015-08-12 21:25:50 [+0300]:
>>>
>>> I can constantly see below error report with RT-kernel on TI ARM dra7-evm
>>> if I'm trying to do Suspend to RAM.
>>
>> do you see the same problem on x86 with -RT?
>
> Unfortunately, I'm not working with x86 now, and I'm not sure I'll be able
> to try it in the nearest future.
>
> Below, I've tried to analyze involved code path and it seems expected
> to call clear_tasks_mm_cpumask() in atomic context, as the whole
> take_cpu_down() call chain expected to be atomic.
>
>>
>>> Disabling non-boot CPUs ...
>>> PM: noirq suspend of devices complete after 7.295 msecs
>>> [ 100.285729] Disabling non-boot CPUs ...
>>> [ 100.287575] BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:917
>>> [ 100.287580] in_atomic(): 1, irqs_disabled(): 128, pid: 18, name: migration/1
>>> [ 100.287583] INFO: lockdep is turned off.
>>> [ 100.287586] irq event stamp: 122
>>> [ 100.287600] hardirqs last enabled at (121): [<c06ac0ac>] _raw_spin_unlock_irqrestore+0x88/0x90
>>> [ 100.287609] hardirqs last disabled at (122): [<c06abed0>] _raw_spin_lock_irq+0x28/0x5c
>>> [ 100.287620] softirqs last enabled at (0): [<c003d294>] copy_process.part.52+0x410/0x19d8
>>> [ 100.287625] softirqs last disabled at (0): [< (null)>] (null)
>>> [ 100.287630] Preemption disabled at:[< (null)>] (null)
>>> [ 100.287631]
>>> [ 100.287639] CPU: 1 PID: 18 Comm: migration/1 Tainted: G W 4.1.4-rt3-01046-g96ac8da #204
>>> [ 100.287642] Hardware name: Generic DRA74X (Flattened Device Tree)
>>> [ 100.287659] [<c0019134>] (unwind_backtrace) from [<c0014774>] (show_stack+0x20/0x24)
>>> [ 100.287671] [<c0014774>] (show_stack) from [<c06a70f4>] (dump_stack+0x88/0xdc)
>>> [ 100.287681] [<c06a70f4>] (dump_stack) from [<c006cab8>] (___might_sleep+0x198/0x2a8)
>>> [ 100.287689] [<c006cab8>] (___might_sleep) from [<c06ac4dc>] (rt_spin_lock+0x30/0x70)
>>> [ 100.287699] [<c06ac4dc>] (rt_spin_lock) from [<c013f790>] (find_lock_task_mm+0x9c/0x174)
>>
>> this is task_lock() which takes a sleeping lock.
>>
>>> [ 100.287710] [<c013f790>] (find_lock_task_mm) from [<c00409ac>] (clear_tasks_mm_cpumask+0xb4/0x1ac)
>>> [ 100.287720] [<c00409ac>] (clear_tasks_mm_cpumask) from [<c00166a4>] (__cpu_disable+0x98/0xbc)
>>> [ 100.287728] [<c00166a4>] (__cpu_disable) from [<c06a2e8c>] (take_cpu_down+0x1c/0x50)
>>> [ 100.287742] [<c06a2e8c>] (take_cpu_down) from [<c00f2600>] (multi_cpu_stop+0x11c/0x158)
>>> [ 100.287754] [<c00f2600>] (multi_cpu_stop) from [<c00f2a9c>] (cpu_stopper_thread+0xc4/0x184)
>>
>> this function contains local_save_flags().
>
> multi_cpu_stop:
> local_save_flags(flags);
>
> if (!msdata->active_cpus)
> is_active = cpu == cpumask_first(cpu_online_mask);
> else
> is_active = cpumask_test_cpu(cpu, msdata->active_cpus);
>
> /* Simple state machine */
> do {
> /* Chill out and ensure we re-read multi_stop_state. */
> cpu_relax();
> if (msdata->state != curstate) {
> curstate = msdata->state;
> switch (curstate) {
> case MULTI_STOP_DISABLE_IRQ:
> local_irq_disable();
> hard_irq_disable(); <==== Step 2 disable IRQs
> break;
> case MULTI_STOP_RUN:
> if (is_active)
> err = msdata->fn(msdata->data); ===> Step 3 take_cpu_down()
> break;
> default:
> break;
> }
> ack_state(msdata);
> }
> } while (curstate != MULTI_STOP_EXIT);
>
> local_irq_restore(flags);
>
>>
>>> [ 100.287767] [<c00f2a9c>] (cpu_stopper_thread) from [<c0069058>] (smpboot_thread_fn+0x18c/0x324)
>
> cpu_stopper_thread:
> preempt_disable();
>
> ret = fn(arg); ===> multi_cpu_stop()
> if (ret)
> done->ret = ret;
>
> preempt_enable();
>
>
>>> [ 100.287779] [<c0069058>] (smpboot_thread_fn) from [<c00649c4>] (kthread+0xe8/0x104)
>>> [ 100.287791] [<c00649c4>] (kthread) from [<c0010058>] (ret_from_fork+0x14/0x3c)
>>> [ 100.288114] CPU1: shutdown
>>
>> The local_save_flags() should be probably replaced with something else.
>>
>
> Below is list of clear_tasks_mm_cpumask() users:
> arch/arm/kernel/smp.c
> int __cpu_disable(void)
> 216: clear_tasks_mm_cpumask(cpu);
> arch/arm64/kernel/smp.c
> int __cpu_disable(void)
> 238: clear_tasks_mm_cpumask(cpu);
> arch/metag/kernel/smp.c
> int __cpu_disable(void)
> 290: clear_tasks_mm_cpumask(cpu);
>
> arch/powerpc/mm/mmu_context_nohash.c
> _cpu_down()
> + cpu_notify_nofail(CPU_DEAD | mod, hcpu);
> + mmu_context_cpu_notify()
> 400: clear_tasks_mm_cpumask(cpu);
>
> arch/sh/kernel/smp.c
> int __cpu_disable(void)
> 155: clear_tasks_mm_cpumask(cpu);
> arch/xtensa/kernel/smp.c
> int __cpu_disable(void)
> 279: clear_tasks_mm_cpumask(cpu);
>
> As per above, in all case, except *powerpc*, the clear_tasks_mm_cpumask()
> is called from __cpu_disable() (atomic context).
> While in powerpc case it's called from _cpu_down() (if I understand right,
> in non-atomic context).
>
Below diff fixes issue for me. More over, with this change I can do
CPU1 plug/unplug many times and suspend/resume works much more stable
(before it stuck some times after "Disabling non-boot CPUs ...").
I'd very appreciated for comments.
diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
index 3d6b782..20b5741 100644
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -215,8 +215,6 @@ int __cpu_disable(void)
flush_cache_louis();
local_flush_tlb_all();
- clear_tasks_mm_cpumask(cpu);
-
return 0;
}
@@ -232,6 +230,8 @@ void __cpu_die(unsigned int cpu)
pr_err("CPU%u: cpu didn't die\n", cpu);
return;
}
+ clear_tasks_mm_cpumask(cpu);
+
pr_notice("CPU%u: shutdown\n", cpu);
/*
--
regards,
-grygorii
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [4.1.3-rt3] [report][suspend to ram] BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:917
2015-09-04 16:46 ` Grygorii Strashko
@ 2015-12-11 16:20 ` Sebastian Andrzej Siewior
2015-12-12 8:29 ` Grygorii Strashko
0 siblings, 1 reply; 6+ messages in thread
From: Sebastian Andrzej Siewior @ 2015-12-11 16:20 UTC (permalink / raw)
To: linux-arm-kernel
* Grygorii Strashko | 2015-09-04 19:46:51 [+0300]:
>Hi All,
Hi,
>I'd very appreciated for comments.
did you run into any problems with this or is it working for you?
Sebastian
^ permalink raw reply [flat|nested] 6+ messages in thread
* [4.1.3-rt3] [report][suspend to ram] BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:917
2015-12-11 16:20 ` Sebastian Andrzej Siewior
@ 2015-12-12 8:29 ` Grygorii Strashko
0 siblings, 0 replies; 6+ messages in thread
From: Grygorii Strashko @ 2015-12-12 8:29 UTC (permalink / raw)
To: linux-arm-kernel
On 12/11/2015 06:20 PM, Sebastian Andrzej Siewior wrote:
> * Grygorii Strashko | 2015-09-04 19:46:51 [+0300]:
>
>> Hi All,
> Hi,
>
>> I'd very appreciated for comments.
>
> did you run into any problems with this or is it working for you?
>
No. And corresponding patch was merged in v4.1.7-rt8
9754cf6 ARM: smp: Move clear_tasks_mm_cpumask() call to __cpu_die()
--
regards,
-grygorii
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2015-12-12 8:29 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-08-12 18:25 [4.1.3-rt3] [report][suspend to ram] BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:917 Grygorii Strashko
2015-08-16 11:42 ` Sebastian Andrzej Siewior
2015-08-17 15:46 ` Grygorii Strashko
2015-09-04 16:46 ` Grygorii Strashko
2015-12-11 16:20 ` Sebastian Andrzej Siewior
2015-12-12 8:29 ` Grygorii Strashko
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).