* Fix two sleeping function called in atomic context bug
@ 2013-09-16 21:09 Yang Shi
2013-09-16 21:09 ` [PATCH 1/2] rt: Don't call schedule_work_on in preemption disabled context Yang Shi
2013-09-16 21:09 ` [PATCH 2/2] rt: Move schedule_work call to helper thread Yang Shi
0 siblings, 2 replies; 9+ messages in thread
From: Yang Shi @ 2013-09-16 21:09 UTC (permalink / raw)
To: linux-rt-users; +Cc: paul.gortmaker, yang.shi
This patch series fixes up two "sleeping function called from invalid context"
bugs triggered by ltp test cases (leapsec_timer and oom) on 3.10-rt.
* [PATCH 1/2] rt: Don't call schedule_work_on in preemption disabled context
2013-09-16 21:09 Fix two sleeping function called in atomic context bug Yang Shi
@ 2013-09-16 21:09 ` Yang Shi
2013-10-04 15:46 ` Sebastian Andrzej Siewior
2013-09-16 21:09 ` [PATCH 2/2] rt: Move schedule_work call to helper thread Yang Shi
1 sibling, 1 reply; 9+ messages in thread
From: Yang Shi @ 2013-09-16 21:09 UTC (permalink / raw)
To: linux-rt-users; +Cc: paul.gortmaker, yang.shi
The following trace is triggered when running the ltp oom test cases:
BUG: sleeping function called from invalid context at kernel/rtmutex.c:659
in_atomic(): 1, irqs_disabled(): 0, pid: 17188, name: oom03
Preemption disabled at:[<ffffffff8112ba70>] mem_cgroup_reclaim+0x90/0xe0
CPU: 2 PID: 17188 Comm: oom03 Not tainted 3.10.10-rt3 #2
Hardware name: Intel Corporation Calpella platform/MATXM-CORE-411-B, BIOS 4.6.3 08/18/2010
ffff88007684d730 ffff880070df9b58 ffffffff8169918d ffff880070df9b70
ffffffff8106db31 ffff88007688b4a0 ffff880070df9b88 ffffffff8169d9c0
ffff88007688b4a0 ffff880070df9bc8 ffffffff81059da1 0000000170df9bb0
Call Trace:
[<ffffffff8169918d>] dump_stack+0x19/0x1b
[<ffffffff8106db31>] __might_sleep+0xf1/0x170
[<ffffffff8169d9c0>] rt_spin_lock+0x20/0x50
[<ffffffff81059da1>] queue_work_on+0x61/0x100
[<ffffffff8112b361>] drain_all_stock+0xe1/0x1c0
[<ffffffff8112ba70>] mem_cgroup_reclaim+0x90/0xe0
[<ffffffff8112beda>] __mem_cgroup_try_charge+0x41a/0xc40
[<ffffffff810f1c91>] ? release_pages+0x1b1/0x1f0
[<ffffffff8106f200>] ? sched_exec+0x40/0xb0
[<ffffffff8112cc87>] mem_cgroup_charge_common+0x37/0x70
[<ffffffff8112e2c6>] mem_cgroup_newpage_charge+0x26/0x30
[<ffffffff8110af68>] handle_pte_fault+0x618/0x840
[<ffffffff8103ecf6>] ? unpin_current_cpu+0x16/0x70
[<ffffffff81070f94>] ? migrate_enable+0xd4/0x200
[<ffffffff8110cde5>] handle_mm_fault+0x145/0x1e0
[<ffffffff810301e1>] __do_page_fault+0x1a1/0x4c0
[<ffffffff8169c9eb>] ? preempt_schedule_irq+0x4b/0x70
[<ffffffff8169e3b7>] ? retint_kernel+0x37/0x40
[<ffffffff8103053e>] do_page_fault+0xe/0x10
[<ffffffff8169e4c2>] page_fault+0x22/0x30
So, re-enable preemption before calling schedule_work_on(), then disable it again
afterwards. See a similar change in commit f5eb5588262cab7232ed1d77cf612b327db50767
("ring-buffer: Do not use schedule_work_on() for current CPU") as a precedent.
Since mem_cgroup_reclaim() acquires a mutex before moving forward, and a mutex can
boost the priority of the task that holds it via the PI mechanism, it is safe to
re-enable preemption for a short period of time: the task will not be preempted
by a lower-priority process.
Signed-off-by: Yang Shi <yang.shi@windriver.com>
---
mm/memcontrol.c | 5 ++++-
1 files changed, 4 insertions(+), 1 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 82a187a..9f7cc0f 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2453,8 +2453,11 @@ static void drain_all_stock(struct mem_cgroup *root_memcg, bool sync)
 		if (!test_and_set_bit(FLUSHING_CACHED_CHARGE, &stock->flags)) {
 			if (cpu == curcpu)
 				drain_local_stock(&stock->work);
-			else
+			else {
+				preempt_enable();
 				schedule_work_on(cpu, &stock->work);
+				preempt_disable();
+			}
 		}
 	}
 	put_cpu();
--
1.7.5.4
* [PATCH 2/2] rt: Move schedule_work call to helper thread
2013-09-16 21:09 Fix two sleeping function called in atomic context bug Yang Shi
2013-09-16 21:09 ` [PATCH 1/2] rt: Don't call schedule_work_on in preemption disabled context Yang Shi
@ 2013-09-16 21:09 ` Yang Shi
2013-10-04 17:11 ` Sebastian Andrzej Siewior
1 sibling, 1 reply; 9+ messages in thread
From: Yang Shi @ 2013-09-16 21:09 UTC (permalink / raw)
To: linux-rt-users; +Cc: paul.gortmaker, yang.shi
When running the ltp leapsec_timer test, the following call trace is caught:
BUG: sleeping function called from invalid context at kernel/rtmutex.c:659
in_atomic(): 1, irqs_disabled(): 1, pid: 0, name: swapper/1
Preemption disabled at:[<ffffffff810857f3>] cpu_startup_entry+0x133/0x310
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.10.10-rt3 #2
Hardware name: Intel Corporation Calpella platform/MATXM-CORE-411-B, BIOS 4.6.3 08/18/2010
ffffffff81c2f800 ffff880076843e40 ffffffff8169918d ffff880076843e58
ffffffff8106db31 ffff88007684b4a0 ffff880076843e70 ffffffff8169d9c0
ffff88007684b4a0 ffff880076843eb0 ffffffff81059da1 0000001876851200
Call Trace:
<IRQ> [<ffffffff8169918d>] dump_stack+0x19/0x1b
[<ffffffff8106db31>] __might_sleep+0xf1/0x170
[<ffffffff8169d9c0>] rt_spin_lock+0x20/0x50
[<ffffffff81059da1>] queue_work_on+0x61/0x100
[<ffffffff81065aa1>] clock_was_set_delayed+0x21/0x30
[<ffffffff810883be>] do_timer+0x40e/0x660
[<ffffffff8108f487>] tick_do_update_jiffies64+0xf7/0x140
[<ffffffff8108fe42>] tick_check_idle+0x92/0xc0
[<ffffffff81044327>] irq_enter+0x57/0x70
[<ffffffff816a040e>] smp_apic_timer_interrupt+0x3e/0x9b
[<ffffffff8169f80a>] apic_timer_interrupt+0x6a/0x70
<EOI> [<ffffffff8155ea1c>] ? cpuidle_enter_state+0x4c/0xc0
[<ffffffff8155eb68>] cpuidle_idle_call+0xd8/0x2d0
[<ffffffff8100b59e>] arch_cpu_idle+0xe/0x30
[<ffffffff8108585e>] cpu_startup_entry+0x19e/0x310
[<ffffffff8168efa2>] start_secondary+0x1ad/0x1b0
clock_was_set_delayed() is called from the hard IRQ handler (the timer interrupt)
and in turn calls schedule_work().
Under PREEMPT_RT_FULL, schedule_work() takes spinlocks which can sleep, so it is
not safe to call schedule_work() from interrupt context.
Reference upstream commit b68d61c705ef02384c0538b8d9374545097899ca
("rt,ntp: Move call to schedule_delayed_work() to helper thread")
from git://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git, which
makes a similar change.
Add a helper thread which does the call to schedule_work(), and wake that
thread up instead of calling schedule_work() directly.
Signed-off-by: Yang Shi <yang.shi@windriver.com>
---
kernel/hrtimer.c | 40 ++++++++++++++++++++++++++++++++++++++++
1 files changed, 40 insertions(+), 0 deletions(-)
diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index bd61c40..a63cfaf 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -47,6 +47,7 @@
 #include <linux/sched/sysctl.h>
 #include <linux/sched/rt.h>
 #include <linux/timer.h>
+#include <linux/kthread.h>
 
 #include <asm/uaccess.h>
@@ -740,6 +741,44 @@ static void clock_was_set_work(struct work_struct *work)
 
 static DECLARE_WORK(hrtimer_work, clock_was_set_work);
 
+#ifdef CONFIG_PREEMPT_RT_FULL
+/*
+ * RT can not call schedule_work from real interrupt context.
+ * Need to make a thread to do the real work.
+ */
+static struct task_struct *clock_set_delay_thread;
+static bool do_clock_set_delay;
+
+static int run_clock_set_delay(void *ignore)
+{
+	while (!kthread_should_stop()) {
+		set_current_state(TASK_INTERRUPTIBLE);
+		if (do_clock_set_delay) {
+			do_clock_set_delay = false;
+			schedule_work(&hrtimer_work);
+		}
+		schedule();
+	}
+	__set_current_state(TASK_RUNNING);
+	return 0;
+}
+
+void clock_was_set_delayed(void)
+{
+	do_clock_set_delay = true;
+	/* Make visible before waking up process */
+	smp_wmb();
+	wake_up_process(clock_set_delay_thread);
+}
+
+static __init int create_clock_set_delay_thread(void)
+{
+	clock_set_delay_thread = kthread_run(run_clock_set_delay, NULL, "kclksetdelayd");
+	BUG_ON(!clock_set_delay_thread);
+	return 0;
+}
+early_initcall(create_clock_set_delay_thread);
+#else /* PREEMPT_RT_FULL */
 /*
  * Called from timekeeping and resume code to reprogramm the hrtimer
  * interrupt device on all cpus.
@@ -748,6 +787,7 @@ void clock_was_set_delayed(void)
 {
 	schedule_work(&hrtimer_work);
 }
+#endif
 
 #else
--
1.7.5.4
* Re: [PATCH 1/2] rt: Don't call schedule_work_on in preemption disabled context
2013-09-16 21:09 ` [PATCH 1/2] rt: Don't call schedule_work_on in preemption disabled context Yang Shi
@ 2013-10-04 15:46 ` Sebastian Andrzej Siewior
2013-10-04 16:36 ` Yang Shi
0 siblings, 1 reply; 9+ messages in thread
From: Sebastian Andrzej Siewior @ 2013-10-04 15:46 UTC (permalink / raw)
To: Yang Shi; +Cc: linux-rt-users, paul.gortmaker
* Yang Shi | 2013-09-16 14:09:18 [-0700]:
>---
> mm/memcontrol.c | 5 ++++-
> 1 files changed, 4 insertions(+), 1 deletions(-)
>
>diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>index 82a187a..9f7cc0f 100644
>--- a/mm/memcontrol.c
>+++ b/mm/memcontrol.c
>@@ -2453,8 +2453,11 @@ static void drain_all_stock(struct mem_cgroup *root_memcg, bool sync)
> 		if (!test_and_set_bit(FLUSHING_CACHED_CHARGE, &stock->flags)) {
> 			if (cpu == curcpu)
> 				drain_local_stock(&stock->work);
>-			else
>+			else {
>+				preempt_enable();
> 				schedule_work_on(cpu, &stock->work);
>+				preempt_disable();
>+			}
> 		}
What ensures that you don't switch CPUs between preempt_enable() &
preempt_disable() and is curcpu != smp_processor_id() ?
What about removing the get_cpu() & put_cpu() calls (and the shortcut)?
> 	}
> 	put_cpu();
Sebastian
* Re: [PATCH 1/2] rt: Don't call schedule_work_on in preemption disabled context
2013-10-04 15:46 ` Sebastian Andrzej Siewior
@ 2013-10-04 16:36 ` Yang Shi
2013-10-04 17:10 ` Sebastian Andrzej Siewior
0 siblings, 1 reply; 9+ messages in thread
From: Yang Shi @ 2013-10-04 16:36 UTC (permalink / raw)
To: Sebastian Andrzej Siewior; +Cc: linux-rt-users, paul.gortmaker
On 10/4/2013 8:46 AM, Sebastian Andrzej Siewior wrote:
> * Yang Shi | 2013-09-16 14:09:18 [-0700]:
>
>> ---
>> mm/memcontrol.c | 5 ++++-
>> 1 files changed, 4 insertions(+), 1 deletions(-)
>>
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index 82a187a..9f7cc0f 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -2453,8 +2453,11 @@ static void drain_all_stock(struct mem_cgroup *root_memcg, bool sync)
>>  		if (!test_and_set_bit(FLUSHING_CACHED_CHARGE, &stock->flags)) {
>>  			if (cpu == curcpu)
>>  				drain_local_stock(&stock->work);
>> -			else
>> +			else {
>> +				preempt_enable();
>>  				schedule_work_on(cpu, &stock->work);
>> +				preempt_disable();
>> +			}
>>  		}
> What ensures that you don't switch CPUs between preempt_enable() &
> preempt_disable() and is curcpu != smp_processor_id() ?
drain_all_stock is called by drain_all_stock_async or
drain_all_stock_sync, and the call in both is protected by mutex:
if (!mutex_trylock(&percpu_charge_mutex))
return;
drain_all_stock(root_memcg, false);
mutex_unlock(&percpu_charge_mutex);
So, I suppose this should be able to protect from migration?
Thanks,
Yang
>
> What about removing the get_cpu() & put_cpu() calls (and the shortcut)?
>
>>  	}
>>  	put_cpu();
> Sebastian
* Re: [PATCH 1/2] rt: Don't call schedule_work_on in preemption disabled context
2013-10-04 16:36 ` Yang Shi
@ 2013-10-04 17:10 ` Sebastian Andrzej Siewior
2013-10-04 17:49 ` Yang Shi
0 siblings, 1 reply; 9+ messages in thread
From: Sebastian Andrzej Siewior @ 2013-10-04 17:10 UTC (permalink / raw)
To: Yang Shi; +Cc: linux-rt-users, paul.gortmaker
* Yang Shi | 2013-10-04 09:36:41 [-0700]:
>>>--- a/mm/memcontrol.c
>>>+++ b/mm/memcontrol.c
>>>@@ -2453,8 +2453,11 @@ static void drain_all_stock(struct mem_cgroup *root_memcg, bool sync)
>>> 		if (!test_and_set_bit(FLUSHING_CACHED_CHARGE, &stock->flags)) {
>>> 			if (cpu == curcpu)
>>> 				drain_local_stock(&stock->work);
>>>-			else
>>>+			else {
>>>+				preempt_enable();
>>> 				schedule_work_on(cpu, &stock->work);
>>>+				preempt_disable();
>>>+			}
>>> 		}
>>What ensures that you don't switch CPUs between preempt_enable() &
>>preempt_disable() and is curcpu != smp_processor_id() ?
>
>drain_all_stock is called by drain_all_stock_async or
>drain_all_stock_sync, and the call in both is protected by mutex:
>
>if (!mutex_trylock(&percpu_charge_mutex))
> return;
> drain_all_stock(root_memcg, false);
> mutex_unlock(&percpu_charge_mutex);
>
>
>So, I suppose this should be able to protect from migration?
preempt_disable() ensures that the task executing drain_all_stock() is
not moved from cpu1 to cpu5. Let's say we run on cpu1: on the first invocation
we get moved from cpu1 to cpu5 after preempt_enable(). On the
second run we have (1 == 1) and invoke drain_local_stock(); the argument
is ignored, so we execute drain_local_stock() with the data of cpu5. Later we
schedule work for cpu5 again, but we never did it for cpu1.
The code here is robust enough that nothing bad happens if
drain_local_stock() is invoked twice on one CPU, and the system probably
survives it if one CPU is skipped. However, I would prefer not to have
such an example in the queue where it seems that it is okay to just
enable preemption and invoke schedule_work_on(), because it breaks the
assumptions which are made by get_cpu().
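To make that window concrete, here is a rough, annotated sketch of the patched
loop inside the surrounding get_cpu()/put_cpu() section (simplified from the 3.10
code; the elided lines are the per-cpu stock checks, and the comments and CPU
numbers are only illustrative):

	curcpu = get_cpu();		/* say this returns 1; preemption now disabled */
	for_each_online_cpu(cpu) {
		...
		if (!test_and_set_bit(FLUSHING_CACHED_CHARGE, &stock->flags)) {
			if (cpu == curcpu)	/* curcpu may be stale after a migration */
				drain_local_stock(&stock->work);
			else {
				preempt_enable();	/* task may migrate here, e.g. 1 -> 5 */
				schedule_work_on(cpu, &stock->work);
				preempt_disable();	/* may resume on a different CPU */
			}
		}
	}
	put_cpu();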
>Thanks,
>Yang
Sebastian
* Re: [PATCH 2/2] rt: Move schedule_work call to helper thread
2013-09-16 21:09 ` [PATCH 2/2] rt: Move schedule_work call to helper thread Yang Shi
@ 2013-10-04 17:11 ` Sebastian Andrzej Siewior
0 siblings, 0 replies; 9+ messages in thread
From: Sebastian Andrzej Siewior @ 2013-10-04 17:11 UTC (permalink / raw)
To: Yang Shi; +Cc: linux-rt-users, paul.gortmaker
* Yang Shi | 2013-09-16 14:09:19 [-0700]:
>When running the ltp leapsec_timer test, the following call trace is caught:
>
>BUG: sleeping function called from invalid context at kernel/rtmutex.c:659
>in_atomic(): 1, irqs_disabled(): 1, pid: 0, name: swapper/1
>Preemption disabled at:[<ffffffff810857f3>] cpu_startup_entry+0x133/0x310
This is okay and I take this, thanks.
In the long run I need something smarter here because this is the third
thread/process for a corner case.
Sebastian
* Re: [PATCH 1/2] rt: Don't call schedule_work_on in preemption disabled context
2013-10-04 17:10 ` Sebastian Andrzej Siewior
@ 2013-10-04 17:49 ` Yang Shi
2013-10-04 17:56 ` Sebastian Andrzej Siewior
0 siblings, 1 reply; 9+ messages in thread
From: Yang Shi @ 2013-10-04 17:49 UTC (permalink / raw)
To: Sebastian Andrzej Siewior; +Cc: linux-rt-users, paul.gortmaker
On 10/4/2013 10:10 AM, Sebastian Andrzej Siewior wrote:
> * Yang Shi | 2013-10-04 09:36:41 [-0700]:
>
>>>> --- a/mm/memcontrol.c
>>>> +++ b/mm/memcontrol.c
>>>> @@ -2453,8 +2453,11 @@ static void drain_all_stock(struct mem_cgroup *root_memcg, bool sync)
>>>>  		if (!test_and_set_bit(FLUSHING_CACHED_CHARGE, &stock->flags)) {
>>>>  			if (cpu == curcpu)
>>>>  				drain_local_stock(&stock->work);
>>>> -			else
>>>> +			else {
>>>> +				preempt_enable();
>>>>  				schedule_work_on(cpu, &stock->work);
>>>> +				preempt_disable();
>>>> +			}
>>>>  		}
>>> What ensures that you don't switch CPUs between preempt_enable() &
>>> preempt_disable() and is curcpu != smp_processor_id() ?
>> drain_all_stock is called by drain_all_stock_async or
>> drain_all_stock_sync, and the call in both is protected by mutex:
>>
>> if (!mutex_trylock(&percpu_charge_mutex))
>> return;
>> drain_all_stock(root_memcg, false);
>> mutex_unlock(&percpu_charge_mutex);
>>
>>
>> So, I suppose this should be able to protect from migration?
> preempt_disable() ensures that the task executing drain_all_stock() is
> not moved from cpu1 to cpu5. Let's say we run on cpu1: on the first invocation
> we get moved from cpu1 to cpu5 after preempt_enable(). On the
> second run we have (1 == 1) and invoke drain_local_stock(); the argument
> is ignored, so we execute drain_local_stock() with the data of cpu5. Later we
> schedule work for cpu5 again, but we never did it for cpu1.
>
> The code here is robust enough that nothing bad happens if
> drain_local_stock() is invoked twice on one CPU, and the system probably
> survives it if one CPU is skipped. However, I would prefer not to have
> such an example in the queue where it seems that it is okay to just
> enable preemption and invoke schedule_work_on(), because it breaks the
> assumptions which are made by get_cpu().
Ok, I see. Anyway, we can't call schedule_work_on() with preemption disabled.
And I checked the git commit history for the drain_local_stock() call; it seems
it is just an optimization to avoid deferring the local stock drain to the
workqueue.
So it sounds safe to remove get_cpu() and the shortcut so that schedule_work_on()
is called safely, as you suggested.
If this sounds fine to you, I'm going to come up with a v2.
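As a rough sketch only (an illustration of the direction discussed here, not the
actual v2, which is not part of this thread), dropping get_cpu()/put_cpu() and
the current-CPU shortcut would look roughly like:

-	curcpu = get_cpu();
 	for_each_online_cpu(cpu) {
 		...
 		if (!test_and_set_bit(FLUSHING_CACHED_CHARGE, &stock->flags)) {
-			if (cpu == curcpu)
-				drain_local_stock(&stock->work);
-			else
-				schedule_work_on(cpu, &stock->work);
+			/* always defer to the workqueue; we are preemptible here */
+			schedule_work_on(cpu, &stock->work);
 		}
 	}
-	put_cpu();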
Thanks,
Yang
>
>> Thanks,
>> Yang
> Sebastian
* Re: [PATCH 1/2] rt: Don't call schedule_work_on in preemption disabled context
2013-10-04 17:49 ` Yang Shi
@ 2013-10-04 17:56 ` Sebastian Andrzej Siewior
0 siblings, 0 replies; 9+ messages in thread
From: Sebastian Andrzej Siewior @ 2013-10-04 17:56 UTC (permalink / raw)
To: Yang Shi; +Cc: linux-rt-users, paul.gortmaker
On 10/04/2013 07:49 PM, Yang Shi wrote:
> And I checked the git commit history for the drain_local_stock() call; it seems
> it is just an optimization to avoid deferring the local stock drain to the
> workqueue.
>
> So it sounds safe to remove get_cpu() and the shortcut so that schedule_work_on()
> is called safely, as you suggested.
>
> If this sounds fine to you, I'm going to come up with a v2.
This sounds fine to me, thanks.
>
> Thanks,
> Yang
Sebastian