* Fix two sleeping function called in atomic context bug
From: Yang Shi @ 2013-09-16 21:09 UTC
To: linux-rt-users; +Cc: paul.gortmaker, yang.shi

This patch series fixes up two "sleeping function called from invalid
context" bugs triggered by ltp test cases (leapsec_timer and oom) on
3.10 rt.
* [PATCH 1/2] rt: Don't call schedule_work_on in preemption disabled context
From: Yang Shi @ 2013-09-16 21:09 UTC
To: linux-rt-users; +Cc: paul.gortmaker, yang.shi

The following trace is triggered when running the ltp oom test cases:

BUG: sleeping function called from invalid context at kernel/rtmutex.c:659
in_atomic(): 1, irqs_disabled(): 0, pid: 17188, name: oom03
Preemption disabled at:[<ffffffff8112ba70>] mem_cgroup_reclaim+0x90/0xe0

CPU: 2 PID: 17188 Comm: oom03 Not tainted 3.10.10-rt3 #2
Hardware name: Intel Corporation Calpella platform/MATXM-CORE-411-B, BIOS 4.6.3 08/18/2010
 ffff88007684d730 ffff880070df9b58 ffffffff8169918d ffff880070df9b70
 ffffffff8106db31 ffff88007688b4a0 ffff880070df9b88 ffffffff8169d9c0
 ffff88007688b4a0 ffff880070df9bc8 ffffffff81059da1 0000000170df9bb0
Call Trace:
 [<ffffffff8169918d>] dump_stack+0x19/0x1b
 [<ffffffff8106db31>] __might_sleep+0xf1/0x170
 [<ffffffff8169d9c0>] rt_spin_lock+0x20/0x50
 [<ffffffff81059da1>] queue_work_on+0x61/0x100
 [<ffffffff8112b361>] drain_all_stock+0xe1/0x1c0
 [<ffffffff8112ba70>] mem_cgroup_reclaim+0x90/0xe0
 [<ffffffff8112beda>] __mem_cgroup_try_charge+0x41a/0xc40
 [<ffffffff810f1c91>] ? release_pages+0x1b1/0x1f0
 [<ffffffff8106f200>] ? sched_exec+0x40/0xb0
 [<ffffffff8112cc87>] mem_cgroup_charge_common+0x37/0x70
 [<ffffffff8112e2c6>] mem_cgroup_newpage_charge+0x26/0x30
 [<ffffffff8110af68>] handle_pte_fault+0x618/0x840
 [<ffffffff8103ecf6>] ? unpin_current_cpu+0x16/0x70
 [<ffffffff81070f94>] ? migrate_enable+0xd4/0x200
 [<ffffffff8110cde5>] handle_mm_fault+0x145/0x1e0
 [<ffffffff810301e1>] __do_page_fault+0x1a1/0x4c0
 [<ffffffff8169c9eb>] ? preempt_schedule_irq+0x4b/0x70
 [<ffffffff8169e3b7>] ? retint_kernel+0x37/0x40
 [<ffffffff8103053e>] do_page_fault+0xe/0x10
 [<ffffffff8169e4c2>] page_fault+0x22/0x30

So, re-enable preemption before schedule_work_on, then disable preemption
again. See a similar change in commit f5eb5588262cab7232ed1d77cf612b327db50767
("ring-buffer: Do not use schedule_work_on() for current CPU") as a precedent.

Since mem_cgroup_reclaim acquires a mutex before moving forward, and a mutex
can promote the priority of the process which holds it under the PI mechanism,
it is safe to re-enable preemption for a short period of time because the task
won't be preempted by a lower-priority process.

Signed-off-by: Yang Shi <yang.shi@windriver.com>
---
 mm/memcontrol.c | 5 ++++-
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 82a187a..9f7cc0f 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2453,8 +2453,11 @@ static void drain_all_stock(struct mem_cgroup *root_memcg, bool sync)
 		if (!test_and_set_bit(FLUSHING_CACHED_CHARGE, &stock->flags)) {
 			if (cpu == curcpu)
 				drain_local_stock(&stock->work);
-			else
+			else {
+				preempt_enable();
 				schedule_work_on(cpu, &stock->work);
+				preempt_disable();
+			}
 		}
 	}
 	put_cpu();
-- 
1.7.5.4
* Re: [PATCH 1/2] rt: Don't call schedule_work_on in preemption disabled context
From: Sebastian Andrzej Siewior @ 2013-10-04 15:46 UTC
To: Yang Shi; +Cc: linux-rt-users, paul.gortmaker

* Yang Shi | 2013-09-16 14:09:18 [-0700]:

>---
> mm/memcontrol.c | 5 ++++-
> 1 files changed, 4 insertions(+), 1 deletions(-)
>
>diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>index 82a187a..9f7cc0f 100644
>--- a/mm/memcontrol.c
>+++ b/mm/memcontrol.c
>@@ -2453,8 +2453,11 @@ static void drain_all_stock(struct mem_cgroup *root_memcg, bool sync)
> 		if (!test_and_set_bit(FLUSHING_CACHED_CHARGE, &stock->flags)) {
> 			if (cpu == curcpu)
> 				drain_local_stock(&stock->work);
>-			else
>+			else {
>+				preempt_enable();
> 				schedule_work_on(cpu, &stock->work);
>+				preempt_disable();
>+			}
> 		}

What ensures that you don't switch CPUs between preempt_enable() &
preempt_disable() and is curcpu != smp_processor_id() ?

What about removing the get_cpu() & put_cpu() calls (and the shortcut)?

> 	}
> 	put_cpu();

Sebastian
* Re: [PATCH 1/2] rt: Don't call schedule_work_on in preemption disabled context
From: Yang Shi @ 2013-10-04 16:36 UTC
To: Sebastian Andrzej Siewior; +Cc: linux-rt-users, paul.gortmaker

On 10/4/2013 8:46 AM, Sebastian Andrzej Siewior wrote:
> * Yang Shi | 2013-09-16 14:09:18 [-0700]:
>
>> ---
>>  mm/memcontrol.c | 5 ++++-
>>  1 files changed, 4 insertions(+), 1 deletions(-)
>>
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index 82a187a..9f7cc0f 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -2453,8 +2453,11 @@ static void drain_all_stock(struct mem_cgroup *root_memcg, bool sync)
>>  		if (!test_and_set_bit(FLUSHING_CACHED_CHARGE, &stock->flags)) {
>>  			if (cpu == curcpu)
>>  				drain_local_stock(&stock->work);
>> -			else
>> +			else {
>> +				preempt_enable();
>>  				schedule_work_on(cpu, &stock->work);
>> +				preempt_disable();
>> +			}
>>  		}
> What ensures that you don't switch CPUs between preempt_enable() &
> preempt_disable() and is curcpu != smp_processor_id() ?

drain_all_stock is called by drain_all_stock_async or
drain_all_stock_sync, and the call in both is protected by a mutex:

	if (!mutex_trylock(&percpu_charge_mutex))
		return;
	drain_all_stock(root_memcg, false);
	mutex_unlock(&percpu_charge_mutex);

So, I suppose this should be able to protect from migration?

Thanks,
Yang

> What about removing the get_cpu() & put_cpu() calls (and the shortcut)?
>
>>  	}
>>  	put_cpu();
> Sebastian
* Re: [PATCH 1/2] rt: Don't call schedule_work_on in preemption disabled context
From: Sebastian Andrzej Siewior @ 2013-10-04 17:10 UTC
To: Yang Shi; +Cc: linux-rt-users, paul.gortmaker

* Yang Shi | 2013-10-04 09:36:41 [-0700]:

>>>--- a/mm/memcontrol.c
>>>+++ b/mm/memcontrol.c
>>>@@ -2453,8 +2453,11 @@ static void drain_all_stock(struct mem_cgroup *root_memcg, bool sync)
>>> 		if (!test_and_set_bit(FLUSHING_CACHED_CHARGE, &stock->flags)) {
>>> 			if (cpu == curcpu)
>>> 				drain_local_stock(&stock->work);
>>>-			else
>>>+			else {
>>>+				preempt_enable();
>>> 				schedule_work_on(cpu, &stock->work);
>>>+				preempt_disable();
>>>+			}
>>> 		}
>>What ensures that you don't switch CPUs between preempt_enable() &
>>preempt_disable() and is curcpu != smp_processor_id() ?
>
>drain_all_stock is called by drain_all_stock_async or
>drain_all_stock_sync, and the call in both is protected by a mutex:
>
>	if (!mutex_trylock(&percpu_charge_mutex))
>		return;
>	drain_all_stock(root_memcg, false);
>	mutex_unlock(&percpu_charge_mutex);
>
>So, I suppose this should be able to protect from migration?

preempt_disable() ensures that the task executing drain_all_stock() is
not moved from cpu1 to cpu5. Let's say we run on cpu1: on the first
invocation we get moved from cpu1 to cpu5 after preempt_enable(). On the
second run we have (1 == 1) and invoke drain_local_stock(); the argument
is ignored, so we execute drain_local_stock() with the data of cpu5.
Later we schedule work for cpu5 again, but we never did it for cpu1.

The code here is robust enough that nothing bad happens if
drain_local_stock() is invoked twice on one CPU, and the system probably
survives it if one CPU is skipped. However, I would prefer not to have
such an example in the queue where it seems that it is okay to just
enable preemption and invoke schedule_work_on(), because it breaks the
assumptions which are made by get_cpu().

>Thanks,
>Yang

Sebastian
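The interleaving described here can be sketched against the hunk quoted
above. The CPU numbers are only illustrative, and the elided lines stand
for the existing per-cpu checks in drain_all_stock():

	/* illustrative sketch only, based on the hunk quoted above */
	curcpu = get_cpu();		/* task runs on cpu1, curcpu == 1 */
	for_each_online_cpu(cpu) {
		/* ... existing per-cpu stock checks ... */
		if (!test_and_set_bit(FLUSHING_CACHED_CHARGE, &stock->flags)) {
			if (cpu == curcpu)
				/*
				 * later iteration: cpu == 1 == curcpu, but the
				 * task now runs on cpu5; drain_local_stock()
				 * ignores its argument and drains cpu5's stock,
				 * so cpu1's stock is never drained
				 */
				drain_local_stock(&stock->work);
			else {
				preempt_enable();
				/* earlier iteration: task may migrate cpu1 -> cpu5 here */
				schedule_work_on(cpu, &stock->work);
				preempt_disable();
			}
		}
	}
	put_cpu();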
* Re: [PATCH 1/2] rt: Don't call schedule_work_on in preemption disabled context
From: Yang Shi @ 2013-10-04 17:49 UTC
To: Sebastian Andrzej Siewior; +Cc: linux-rt-users, paul.gortmaker

On 10/4/2013 10:10 AM, Sebastian Andrzej Siewior wrote:
> * Yang Shi | 2013-10-04 09:36:41 [-0700]:
>
>>>> --- a/mm/memcontrol.c
>>>> +++ b/mm/memcontrol.c
>>>> @@ -2453,8 +2453,11 @@ static void drain_all_stock(struct mem_cgroup *root_memcg, bool sync)
>>>>  		if (!test_and_set_bit(FLUSHING_CACHED_CHARGE, &stock->flags)) {
>>>>  			if (cpu == curcpu)
>>>>  				drain_local_stock(&stock->work);
>>>> -			else
>>>> +			else {
>>>> +				preempt_enable();
>>>>  				schedule_work_on(cpu, &stock->work);
>>>> +				preempt_disable();
>>>> +			}
>>>>  		}
>>> What ensures that you don't switch CPUs between preempt_enable() &
>>> preempt_disable() and is curcpu != smp_processor_id() ?
>> drain_all_stock is called by drain_all_stock_async or
>> drain_all_stock_sync, and the call in both is protected by a mutex:
>>
>> 	if (!mutex_trylock(&percpu_charge_mutex))
>> 		return;
>> 	drain_all_stock(root_memcg, false);
>> 	mutex_unlock(&percpu_charge_mutex);
>>
>> So, I suppose this should be able to protect from migration?
> preempt_disable() ensures that the task executing drain_all_stock() is
> not moved from cpu1 to cpu5. Let's say we run on cpu1: on the first
> invocation we get moved from cpu1 to cpu5 after preempt_enable(). On the
> second run we have (1 == 1) and invoke drain_local_stock(); the argument
> is ignored, so we execute drain_local_stock() with the data of cpu5.
> Later we schedule work for cpu5 again, but we never did it for cpu1.
>
> The code here is robust enough that nothing bad happens if
> drain_local_stock() is invoked twice on one CPU, and the system probably
> survives it if one CPU is skipped. However, I would prefer not to have
> such an example in the queue where it seems that it is okay to just
> enable preemption and invoke schedule_work_on(), because it breaks the
> assumptions which are made by get_cpu().

Ok, I see. Anyway, we can't call schedule_work_on with preemption disabled.

And, I checked the git commit history for the drain_local_stock call; it
sounds like it is just an optimization to avoid deferring the local stock
drain to the work queue.

So, it sounds safe to remove the get_cpu and the shortcut so that
schedule_work_on can be called safely, as you suggested.

If this sounds fine to you, I'm going to come up with V2.

Thanks,
Yang

>
>> Thanks,
>> Yang
> Sebastian
* Re: [PATCH 1/2] rt: Don't call schedule_work_on in preemption disabled context
From: Sebastian Andrzej Siewior @ 2013-10-04 17:56 UTC
To: Yang Shi; +Cc: linux-rt-users, paul.gortmaker

On 10/04/2013 07:49 PM, Yang Shi wrote:
> And, I checked the git commit history for the drain_local_stock call; it
> sounds like it is just an optimization to avoid deferring the local stock
> drain to the work queue.
>
> So, it sounds safe to remove the get_cpu and the shortcut so that
> schedule_work_on can be called safely, as you suggested.
>
> If this sounds fine to you, I'm going to come up with V2.

This sounds fine to me, thanks.

>
> Thanks,
> Yang

Sebastian
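A minimal sketch of the direction agreed on here, dropping get_cpu()/put_cpu()
and the drain_local_stock() shortcut so that schedule_work_on() is always
called with preemption enabled, might look roughly like the following. The
memcg_stock / memcg_stock_pcp names outside the quoted hunk are assumptions
from the 3.10 source, and this is not the actual V2 patch:

	static void drain_all_stock(struct mem_cgroup *root_memcg, bool sync)
	{
		int cpu;

		/* no curcpu/get_cpu(): migration is harmless without the shortcut */
		for_each_online_cpu(cpu) {
			struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu);

			/* ... existing root_memcg / cached-stock checks ... */
			if (!test_and_set_bit(FLUSHING_CACHED_CHARGE, &stock->flags))
				schedule_work_on(cpu, &stock->work);
		}
		/* no put_cpu(); the sync/flush handling below stays unchanged */
	}

Every CPU, including the local one, is then drained via the workqueue, which
matches the observation above that the local-drain shortcut was only an
optimization.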
* [PATCH 2/2] rt: Move schedule_work call to helper thread
From: Yang Shi @ 2013-09-16 21:09 UTC
To: linux-rt-users; +Cc: paul.gortmaker, yang.shi

When running the ltp leapsec_timer test, the following call trace is caught:

BUG: sleeping function called from invalid context at kernel/rtmutex.c:659
in_atomic(): 1, irqs_disabled(): 1, pid: 0, name: swapper/1
Preemption disabled at:[<ffffffff810857f3>] cpu_startup_entry+0x133/0x310

CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.10.10-rt3 #2
Hardware name: Intel Corporation Calpella platform/MATXM-CORE-411-B, BIOS 4.6.3 08/18/2010
 ffffffff81c2f800 ffff880076843e40 ffffffff8169918d ffff880076843e58
 ffffffff8106db31 ffff88007684b4a0 ffff880076843e70 ffffffff8169d9c0
 ffff88007684b4a0 ffff880076843eb0 ffffffff81059da1 0000001876851200
Call Trace:
 <IRQ>  [<ffffffff8169918d>] dump_stack+0x19/0x1b
 [<ffffffff8106db31>] __might_sleep+0xf1/0x170
 [<ffffffff8169d9c0>] rt_spin_lock+0x20/0x50
 [<ffffffff81059da1>] queue_work_on+0x61/0x100
 [<ffffffff81065aa1>] clock_was_set_delayed+0x21/0x30
 [<ffffffff810883be>] do_timer+0x40e/0x660
 [<ffffffff8108f487>] tick_do_update_jiffies64+0xf7/0x140
 [<ffffffff8108fe42>] tick_check_idle+0x92/0xc0
 [<ffffffff81044327>] irq_enter+0x57/0x70
 [<ffffffff816a040e>] smp_apic_timer_interrupt+0x3e/0x9b
 [<ffffffff8169f80a>] apic_timer_interrupt+0x6a/0x70
 <EOI>  [<ffffffff8155ea1c>] ? cpuidle_enter_state+0x4c/0xc0
 [<ffffffff8155eb68>] cpuidle_idle_call+0xd8/0x2d0
 [<ffffffff8100b59e>] arch_cpu_idle+0xe/0x30
 [<ffffffff8108585e>] cpu_startup_entry+0x19e/0x310
 [<ffffffff8168efa2>] start_secondary+0x1ad/0x1b0

clock_was_set_delayed is called in a hard IRQ handler (the timer interrupt),
and it calls schedule_work. Under PREEMPT_RT_FULL, schedule_work acquires
spinlocks which could sleep, so it's not safe to call schedule_work in
interrupt context.

Reference upstream commit b68d61c705ef02384c0538b8d9374545097899ca
("rt,ntp: Move call to schedule_delayed_work() to helper thread") from
git://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git,
which makes a similar change.

Add a helper thread which does the call to schedule_work, and wake that
thread up instead of calling schedule_work directly.

Signed-off-by: Yang Shi <yang.shi@windriver.com>
---
 kernel/hrtimer.c | 40 ++++++++++++++++++++++++++++++++++++++++
 1 files changed, 40 insertions(+), 0 deletions(-)

diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index bd61c40..a63cfaf 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -47,6 +47,7 @@
 #include <linux/sched/sysctl.h>
 #include <linux/sched/rt.h>
 #include <linux/timer.h>
+#include <linux/kthread.h>
 
 #include <asm/uaccess.h>
 
@@ -740,6 +741,44 @@ static void clock_was_set_work(struct work_struct *work)
 
 static DECLARE_WORK(hrtimer_work, clock_was_set_work);
 
+#ifdef CONFIG_PREEMPT_RT_FULL
+/*
+ * RT can not call schedule_work from real interrupt context.
+ * Need to make a thread to do the real work.
+ */
+static struct task_struct *clock_set_delay_thread;
+static bool do_clock_set_delay;
+
+static int run_clock_set_delay(void *ignore)
+{
+	while (!kthread_should_stop()) {
+		set_current_state(TASK_INTERRUPTIBLE);
+		if (do_clock_set_delay) {
+			do_clock_set_delay = false;
+			schedule_work(&hrtimer_work);
+		}
+		schedule();
+	}
+	__set_current_state(TASK_RUNNING);
+	return 0;
+}
+
+void clock_was_set_delayed(void)
+{
+	do_clock_set_delay = true;
+	/* Make visible before waking up process */
+	smp_wmb();
+	wake_up_process(clock_set_delay_thread);
+}
+
+static __init int create_clock_set_delay_thread(void)
+{
+	clock_set_delay_thread = kthread_run(run_clock_set_delay, NULL, "kclksetdelayd");
+	BUG_ON(!clock_set_delay_thread);
+	return 0;
+}
+early_initcall(create_clock_set_delay_thread);
+#else /* PREEMPT_RT_FULL */
 /*
  * Called from timekeeping and resume code to reprogramm the hrtimer
  * interrupt device on all cpus.
@@ -748,6 +787,7 @@ void clock_was_set_delayed(void)
 {
 	schedule_work(&hrtimer_work);
 }
+#endif
 
 #else
-- 
1.7.5.4
* Re: [PATCH 2/2] rt: Move schedule_work call to helper thread
From: Sebastian Andrzej Siewior @ 2013-10-04 17:11 UTC
To: Yang Shi; +Cc: linux-rt-users, paul.gortmaker

* Yang Shi | 2013-09-16 14:09:19 [-0700]:

>When running the ltp leapsec_timer test, the following call trace is caught:
>
>BUG: sleeping function called from invalid context at kernel/rtmutex.c:659
>in_atomic(): 1, irqs_disabled(): 1, pid: 0, name: swapper/1
>Preemption disabled at:[<ffffffff810857f3>] cpu_startup_entry+0x133/0x310

This is okay and I take this, thanks. In the long run I need something
smarter here because this is the third thread/process for a corner case.

Sebastian