* Fix two sleeping function called in atomic context bug
From: Yang Shi @ 2013-09-16 21:09 UTC
To: linux-rt-users; +Cc: paul.gortmaker, yang.shi

This patch series fixes up two "sleeping function called from invalid
context" bugs triggered by ltp test cases (leapsec_timer and oom) on
3.10 rt.
* [PATCH 1/2] rt: Don't call schedule_work_on in preemption disabled context
From: Yang Shi @ 2013-09-16 21:09 UTC
To: linux-rt-users; +Cc: paul.gortmaker, yang.shi

The following trace is triggered when running the ltp oom test cases:

BUG: sleeping function called from invalid context at kernel/rtmutex.c:659
in_atomic(): 1, irqs_disabled(): 0, pid: 17188, name: oom03
Preemption disabled at:[<ffffffff8112ba70>] mem_cgroup_reclaim+0x90/0xe0

CPU: 2 PID: 17188 Comm: oom03 Not tainted 3.10.10-rt3 #2
Hardware name: Intel Corporation Calpella platform/MATXM-CORE-411-B, BIOS 4.6.3 08/18/2010
 ffff88007684d730 ffff880070df9b58 ffffffff8169918d ffff880070df9b70
 ffffffff8106db31 ffff88007688b4a0 ffff880070df9b88 ffffffff8169d9c0
 ffff88007688b4a0 ffff880070df9bc8 ffffffff81059da1 0000000170df9bb0
Call Trace:
 [<ffffffff8169918d>] dump_stack+0x19/0x1b
 [<ffffffff8106db31>] __might_sleep+0xf1/0x170
 [<ffffffff8169d9c0>] rt_spin_lock+0x20/0x50
 [<ffffffff81059da1>] queue_work_on+0x61/0x100
 [<ffffffff8112b361>] drain_all_stock+0xe1/0x1c0
 [<ffffffff8112ba70>] mem_cgroup_reclaim+0x90/0xe0
 [<ffffffff8112beda>] __mem_cgroup_try_charge+0x41a/0xc40
 [<ffffffff810f1c91>] ? release_pages+0x1b1/0x1f0
 [<ffffffff8106f200>] ? sched_exec+0x40/0xb0
 [<ffffffff8112cc87>] mem_cgroup_charge_common+0x37/0x70
 [<ffffffff8112e2c6>] mem_cgroup_newpage_charge+0x26/0x30
 [<ffffffff8110af68>] handle_pte_fault+0x618/0x840
 [<ffffffff8103ecf6>] ? unpin_current_cpu+0x16/0x70
 [<ffffffff81070f94>] ? migrate_enable+0xd4/0x200
 [<ffffffff8110cde5>] handle_mm_fault+0x145/0x1e0
 [<ffffffff810301e1>] __do_page_fault+0x1a1/0x4c0
 [<ffffffff8169c9eb>] ? preempt_schedule_irq+0x4b/0x70
 [<ffffffff8169e3b7>] ? retint_kernel+0x37/0x40
 [<ffffffff8103053e>] do_page_fault+0xe/0x10
 [<ffffffff8169e4c2>] page_fault+0x22/0x30

So, re-enable preemption before schedule_work_on, then disable preemption
again. See a similar change in commit f5eb5588262cab7232ed1d77cf612b327db50767
("ring-buffer: Do not use schedule_work_on() for current CPU") as a precedent.

Since mem_cgroup_reclaim acquires a mutex before moving forward, and a mutex
can promote the priority of the process which holds it under the PI mechanism,
it is safe to re-enable preemption for a short period of time because the task
won't be preempted by a lower-priority process.

Signed-off-by: Yang Shi <yang.shi@windriver.com>
---
 mm/memcontrol.c | 5 ++++-
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 82a187a..9f7cc0f 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2453,8 +2453,11 @@ static void drain_all_stock(struct mem_cgroup *root_memcg, bool sync)
 		if (!test_and_set_bit(FLUSHING_CACHED_CHARGE, &stock->flags)) {
 			if (cpu == curcpu)
 				drain_local_stock(&stock->work);
-			else
+			else {
+				preempt_enable();
 				schedule_work_on(cpu, &stock->work);
+				preempt_disable();
+			}
 		}
 	}
 	put_cpu();
-- 
1.7.5.4
* Re: [PATCH 1/2] rt: Don't call schedule_work_on in preemption disabled context
From: Sebastian Andrzej Siewior @ 2013-10-04 15:46 UTC
To: Yang Shi; +Cc: linux-rt-users, paul.gortmaker

* Yang Shi | 2013-09-16 14:09:18 [-0700]:

>---
> mm/memcontrol.c | 5 ++++-
> 1 files changed, 4 insertions(+), 1 deletions(-)
>
>diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>index 82a187a..9f7cc0f 100644
>--- a/mm/memcontrol.c
>+++ b/mm/memcontrol.c
>@@ -2453,8 +2453,11 @@ static void drain_all_stock(struct mem_cgroup *root_memcg, bool sync)
> 		if (!test_and_set_bit(FLUSHING_CACHED_CHARGE, &stock->flags)) {
> 			if (cpu == curcpu)
> 				drain_local_stock(&stock->work);
>-			else
>+			else {
>+				preempt_enable();
> 				schedule_work_on(cpu, &stock->work);
>+				preempt_disable();
>+			}
> 		}

What ensures that you don't switch CPUs between preempt_enable() &
preempt_disable() and is curcpu != smp_processor_id() ?

What about removing the get_cpu() & put_cpu() calls (and the shortcut)?

> 	}
> 	put_cpu();

Sebastian
* Re: [PATCH 1/2] rt: Don't call schedule_work_on in preemption disabled context
From: Yang Shi @ 2013-10-04 16:36 UTC
To: Sebastian Andrzej Siewior; +Cc: linux-rt-users, paul.gortmaker

On 10/4/2013 8:46 AM, Sebastian Andrzej Siewior wrote:
> * Yang Shi | 2013-09-16 14:09:18 [-0700]:
>
>> ---
>>  mm/memcontrol.c | 5 ++++-
>>  1 files changed, 4 insertions(+), 1 deletions(-)
>>
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index 82a187a..9f7cc0f 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -2453,8 +2453,11 @@ static void drain_all_stock(struct mem_cgroup *root_memcg, bool sync)
>>  		if (!test_and_set_bit(FLUSHING_CACHED_CHARGE, &stock->flags)) {
>>  			if (cpu == curcpu)
>>  				drain_local_stock(&stock->work);
>> -			else
>> +			else {
>> +				preempt_enable();
>>  				schedule_work_on(cpu, &stock->work);
>> +				preempt_disable();
>> +			}
>>  		}
> What ensures that you don't switch CPUs between preempt_enable() &
> preempt_disable() and is curcpu != smp_processor_id() ?

drain_all_stock is called by drain_all_stock_async or
drain_all_stock_sync, and the call in both is protected by a mutex:

	if (!mutex_trylock(&percpu_charge_mutex))
		return;
	drain_all_stock(root_memcg, false);
	mutex_unlock(&percpu_charge_mutex);

So, I suppose this should be able to protect from migration?

Thanks,
Yang

> What about removing the get_cpu() & put_cpu() calls (and the shortcut)?
>
>>  	}
>>  	put_cpu();
> Sebastian
* Re: [PATCH 1/2] rt: Don't call schedule_work_on in preemption disabled context
From: Sebastian Andrzej Siewior @ 2013-10-04 17:10 UTC
To: Yang Shi; +Cc: linux-rt-users, paul.gortmaker

* Yang Shi | 2013-10-04 09:36:41 [-0700]:

>>>--- a/mm/memcontrol.c
>>>+++ b/mm/memcontrol.c
>>>@@ -2453,8 +2453,11 @@ static void drain_all_stock(struct mem_cgroup *root_memcg, bool sync)
>>> 		if (!test_and_set_bit(FLUSHING_CACHED_CHARGE, &stock->flags)) {
>>> 			if (cpu == curcpu)
>>> 				drain_local_stock(&stock->work);
>>>-			else
>>>+			else {
>>>+				preempt_enable();
>>> 				schedule_work_on(cpu, &stock->work);
>>>+				preempt_disable();
>>>+			}
>>> 		}
>>What ensures that you don't switch CPUs between preempt_enable() &
>>preempt_disable() and is curcpu != smp_processor_id() ?
>
>drain_all_stock is called by drain_all_stock_async or
>drain_all_stock_sync, and the call in both is protected by a mutex:
>
>	if (!mutex_trylock(&percpu_charge_mutex))
>		return;
>	drain_all_stock(root_memcg, false);
>	mutex_unlock(&percpu_charge_mutex);
>
>So, I suppose this should be able to protect from migration?

preempt_disable() ensures that the task executing drain_all_stock() is
not moved from cpu1 to cpu5. Let's say we run on cpu1: on the first
invocation we get moved from cpu1 to cpu5 after preempt_enable(). On the
second run we have (1 == 1) and invoke drain_local_stock(); the argument
is ignored, so we execute drain_local_stock() with the data of cpu5.
Later we schedule work for cpu5 again, but we never did it for cpu1.

The code here is robust enough that nothing bad happens if
drain_local_stock() is invoked twice on one CPU, and the system probably
survives it if one CPU is skipped. However, I would prefer not to have
such an example in the queue where it seems that it is okay to just
enable preemption and invoke schedule_work_on(), because it breaks the
assumptions which are made by get_cpu().

>Thanks,
>Yang

Sebastian
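The interleaving described here can be sketched against the hunk quoted
above. The CPU numbers are only illustrative, and the elided lines stand
for the existing per-cpu checks in drain_all_stock():

	/* illustrative sketch only, based on the hunk quoted above */
	curcpu = get_cpu();		/* task runs on cpu1, curcpu == 1 */
	for_each_online_cpu(cpu) {
		/* ... existing per-cpu stock checks ... */
		if (!test_and_set_bit(FLUSHING_CACHED_CHARGE, &stock->flags)) {
			if (cpu == curcpu)
				/*
				 * later iteration: cpu == 1 == curcpu, but the
				 * task now runs on cpu5; drain_local_stock()
				 * ignores its argument and drains cpu5's stock,
				 * so cpu1's stock is never drained
				 */
				drain_local_stock(&stock->work);
			else {
				preempt_enable();
				/* earlier iteration: task may migrate cpu1 -> cpu5 here */
				schedule_work_on(cpu, &stock->work);
				preempt_disable();
			}
		}
	}
	put_cpu();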
* Re: [PATCH 1/2] rt: Don't call schedule_work_on in preemption disabled context
From: Yang Shi @ 2013-10-04 17:49 UTC
To: Sebastian Andrzej Siewior; +Cc: linux-rt-users, paul.gortmaker

On 10/4/2013 10:10 AM, Sebastian Andrzej Siewior wrote:
> * Yang Shi | 2013-10-04 09:36:41 [-0700]:
>
>>>> --- a/mm/memcontrol.c
>>>> +++ b/mm/memcontrol.c
>>>> @@ -2453,8 +2453,11 @@ static void drain_all_stock(struct mem_cgroup *root_memcg, bool sync)
>>>>  		if (!test_and_set_bit(FLUSHING_CACHED_CHARGE, &stock->flags)) {
>>>>  			if (cpu == curcpu)
>>>>  				drain_local_stock(&stock->work);
>>>> -			else
>>>> +			else {
>>>> +				preempt_enable();
>>>>  				schedule_work_on(cpu, &stock->work);
>>>> +				preempt_disable();
>>>> +			}
>>>>  		}
>>> What ensures that you don't switch CPUs between preempt_enable() &
>>> preempt_disable() and is curcpu != smp_processor_id() ?
>> drain_all_stock is called by drain_all_stock_async or
>> drain_all_stock_sync, and the call in both is protected by a mutex:
>>
>> 	if (!mutex_trylock(&percpu_charge_mutex))
>> 		return;
>> 	drain_all_stock(root_memcg, false);
>> 	mutex_unlock(&percpu_charge_mutex);
>>
>> So, I suppose this should be able to protect from migration?
> preempt_disable() ensures that the task executing drain_all_stock() is
> not moved from cpu1 to cpu5. Let's say we run on cpu1: on the first
> invocation we get moved from cpu1 to cpu5 after preempt_enable(). On the
> second run we have (1 == 1) and invoke drain_local_stock(); the argument
> is ignored, so we execute drain_local_stock() with the data of cpu5.
> Later we schedule work for cpu5 again, but we never did it for cpu1.
>
> The code here is robust enough that nothing bad happens if
> drain_local_stock() is invoked twice on one CPU, and the system probably
> survives it if one CPU is skipped. However, I would prefer not to have
> such an example in the queue where it seems that it is okay to just
> enable preemption and invoke schedule_work_on(), because it breaks the
> assumptions which are made by get_cpu().

Ok, I see. Anyway, we can't call schedule_work_on with preemption disabled.

And, I checked the git commit history for the drain_local_stock call; it
sounds like it is just an optimization to avoid deferring the local stock
drain to the work queue.

So, it sounds safe to remove the get_cpu and the shortcut so that
schedule_work_on can be called safely, as you suggested.

If this sounds fine to you, I'm going to come up with V2.

Thanks,
Yang

>
>> Thanks,
>> Yang
> Sebastian
* Re: [PATCH 1/2] rt: Don't call schedule_work_on in preemption disabled context
From: Sebastian Andrzej Siewior @ 2013-10-04 17:56 UTC
To: Yang Shi; +Cc: linux-rt-users, paul.gortmaker

On 10/04/2013 07:49 PM, Yang Shi wrote:
> And, I checked the git commit history for the drain_local_stock call; it
> sounds like it is just an optimization to avoid deferring the local stock
> drain to the work queue.
>
> So, it sounds safe to remove the get_cpu and the shortcut so that
> schedule_work_on can be called safely, as you suggested.
>
> If this sounds fine to you, I'm going to come up with V2.

This sounds fine to me, thanks.

>
> Thanks,
> Yang

Sebastian
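A minimal sketch of the direction agreed on here, dropping get_cpu()/put_cpu()
and the drain_local_stock() shortcut so that schedule_work_on() is always
called with preemption enabled, might look roughly like the following. The
memcg_stock / memcg_stock_pcp names outside the quoted hunk are assumptions
from the 3.10 source, and this is not the actual V2 patch:

	static void drain_all_stock(struct mem_cgroup *root_memcg, bool sync)
	{
		int cpu;

		/* no curcpu/get_cpu(): migration is harmless without the shortcut */
		for_each_online_cpu(cpu) {
			struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu);

			/* ... existing root_memcg / cached-stock checks ... */
			if (!test_and_set_bit(FLUSHING_CACHED_CHARGE, &stock->flags))
				schedule_work_on(cpu, &stock->work);
		}
		/* no put_cpu(); the sync/flush handling below stays unchanged */
	}

Every CPU, including the local one, is then drained via the workqueue, which
matches the observation above that the local-drain shortcut was only an
optimization.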
* [PATCH 2/2] rt: Move schedule_work call to helper thread
From: Yang Shi @ 2013-09-16 21:09 UTC
To: linux-rt-users; +Cc: paul.gortmaker, yang.shi

When running the ltp leapsec_timer test, the following call trace is caught:

BUG: sleeping function called from invalid context at kernel/rtmutex.c:659
in_atomic(): 1, irqs_disabled(): 1, pid: 0, name: swapper/1
Preemption disabled at:[<ffffffff810857f3>] cpu_startup_entry+0x133/0x310

CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.10.10-rt3 #2
Hardware name: Intel Corporation Calpella platform/MATXM-CORE-411-B, BIOS 4.6.3 08/18/2010
 ffffffff81c2f800 ffff880076843e40 ffffffff8169918d ffff880076843e58
 ffffffff8106db31 ffff88007684b4a0 ffff880076843e70 ffffffff8169d9c0
 ffff88007684b4a0 ffff880076843eb0 ffffffff81059da1 0000001876851200
Call Trace:
 <IRQ>  [<ffffffff8169918d>] dump_stack+0x19/0x1b
 [<ffffffff8106db31>] __might_sleep+0xf1/0x170
 [<ffffffff8169d9c0>] rt_spin_lock+0x20/0x50
 [<ffffffff81059da1>] queue_work_on+0x61/0x100
 [<ffffffff81065aa1>] clock_was_set_delayed+0x21/0x30
 [<ffffffff810883be>] do_timer+0x40e/0x660
 [<ffffffff8108f487>] tick_do_update_jiffies64+0xf7/0x140
 [<ffffffff8108fe42>] tick_check_idle+0x92/0xc0
 [<ffffffff81044327>] irq_enter+0x57/0x70
 [<ffffffff816a040e>] smp_apic_timer_interrupt+0x3e/0x9b
 [<ffffffff8169f80a>] apic_timer_interrupt+0x6a/0x70
 <EOI>  [<ffffffff8155ea1c>] ? cpuidle_enter_state+0x4c/0xc0
 [<ffffffff8155eb68>] cpuidle_idle_call+0xd8/0x2d0
 [<ffffffff8100b59e>] arch_cpu_idle+0xe/0x30
 [<ffffffff8108585e>] cpu_startup_entry+0x19e/0x310
 [<ffffffff8168efa2>] start_secondary+0x1ad/0x1b0

clock_was_set_delayed is called in a hard IRQ handler (the timer interrupt),
and it calls schedule_work. Under PREEMPT_RT_FULL, schedule_work acquires
spinlocks which could sleep, so it's not safe to call schedule_work in
interrupt context.

Reference upstream commit b68d61c705ef02384c0538b8d9374545097899ca
("rt,ntp: Move call to schedule_delayed_work() to helper thread") from
git://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git,
which makes a similar change.

Add a helper thread which does the call to schedule_work, and wake that
thread up instead of calling schedule_work directly.

Signed-off-by: Yang Shi <yang.shi@windriver.com>
---
 kernel/hrtimer.c | 40 ++++++++++++++++++++++++++++++++++++++++
 1 files changed, 40 insertions(+), 0 deletions(-)

diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index bd61c40..a63cfaf 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -47,6 +47,7 @@
 #include <linux/sched/sysctl.h>
 #include <linux/sched/rt.h>
 #include <linux/timer.h>
+#include <linux/kthread.h>
 
 #include <asm/uaccess.h>
 
@@ -740,6 +741,44 @@ static void clock_was_set_work(struct work_struct *work)
 
 static DECLARE_WORK(hrtimer_work, clock_was_set_work);
 
+#ifdef CONFIG_PREEMPT_RT_FULL
+/*
+ * RT can not call schedule_work from real interrupt context.
+ * Need to make a thread to do the real work.
+ */
+static struct task_struct *clock_set_delay_thread;
+static bool do_clock_set_delay;
+
+static int run_clock_set_delay(void *ignore)
+{
+	while (!kthread_should_stop()) {
+		set_current_state(TASK_INTERRUPTIBLE);
+		if (do_clock_set_delay) {
+			do_clock_set_delay = false;
+			schedule_work(&hrtimer_work);
+		}
+		schedule();
+	}
+	__set_current_state(TASK_RUNNING);
+	return 0;
+}
+
+void clock_was_set_delayed(void)
+{
+	do_clock_set_delay = true;
+	/* Make visible before waking up process */
+	smp_wmb();
+	wake_up_process(clock_set_delay_thread);
+}
+
+static __init int create_clock_set_delay_thread(void)
+{
+	clock_set_delay_thread = kthread_run(run_clock_set_delay, NULL, "kclksetdelayd");
+	BUG_ON(!clock_set_delay_thread);
+	return 0;
+}
+early_initcall(create_clock_set_delay_thread);
+#else /* PREEMPT_RT_FULL */
 /*
  * Called from timekeeping and resume code to reprogramm the hrtimer
  * interrupt device on all cpus.
@@ -748,6 +787,7 @@ void clock_was_set_delayed(void)
 {
 	schedule_work(&hrtimer_work);
 }
+#endif
 
 #else
-- 
1.7.5.4
* Re: [PATCH 2/2] rt: Move schedule_work call to helper thread
From: Sebastian Andrzej Siewior @ 2013-10-04 17:11 UTC
To: Yang Shi; +Cc: linux-rt-users, paul.gortmaker

* Yang Shi | 2013-09-16 14:09:19 [-0700]:

>When running the ltp leapsec_timer test, the following call trace is caught:
>
>BUG: sleeping function called from invalid context at kernel/rtmutex.c:659
>in_atomic(): 1, irqs_disabled(): 1, pid: 0, name: swapper/1
>Preemption disabled at:[<ffffffff810857f3>] cpu_startup_entry+0x133/0x310

This is okay and I take this, thanks. In the long run I need something
smarter here because this is the third thread/process for a corner case.

Sebastian