* [patch] rt,sched,numa: Move task_numa_free() to __put_task_struct(), which -rt offloads @ 2014-02-28 6:23 Mike Galbraith 2014-02-28 11:32 ` Peter Zijlstra 2014-03-11 12:40 ` [tip:sched/core] sched/numa: Move task_numa_free() to __put_task_struct() tip-bot for Mike Galbraith 0 siblings, 2 replies; 11+ messages in thread From: Mike Galbraith @ 2014-02-28 6:23 UTC (permalink / raw) To: Peter Zijlstra; +Cc: Mel Gorman, LKML, RT Bad idea: [ 908.026136] [<ffffffff8150ad6a>] rt_spin_lock_slowlock+0xaa/0x2c0 [ 908.026145] [<ffffffff8108f701>] task_numa_free+0x31/0x130 [ 908.026151] [<ffffffff8108121e>] finish_task_switch+0xce/0x100 [ 908.026156] [<ffffffff81509c0a>] thread_return+0x48/0x4ae [ 908.026160] [<ffffffff8150a095>] schedule+0x25/0xa0 [ 908.026163] [<ffffffff8150ad95>] rt_spin_lock_slowlock+0xd5/0x2c0 [ 908.026170] [<ffffffff810658cf>] get_signal_to_deliver+0xaf/0x680 [ 908.026175] [<ffffffff8100242d>] do_signal+0x3d/0x5b0 [ 908.026179] [<ffffffff81002a30>] do_notify_resume+0x90/0xe0 [ 908.026186] [<ffffffff81513176>] int_signal+0x12/0x17 [ 908.026193] [<00007ff2a388b1d0>] 0x7ff2a388b1cf Signed-off-by: Mike Galbraith <bitbucket@online.de> diff --git a/kernel/fork.c b/kernel/fork.c index a17621c6cd42..332688e5e7b4 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -237,6 +237,7 @@ void __put_task_struct(struct task_struct *tsk) WARN_ON(atomic_read(&tsk->usage)); WARN_ON(tsk == current); + task_numa_free(tsk); security_task_free(tsk); exit_creds(tsk); delayacct_tsk_free(tsk); diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 6edbef296ece..94a963fd040e 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -2149,8 +2149,6 @@ static void finish_task_switch(struct rq *rq, struct task_struct *prev) if (mm) mmdrop(mm); if (unlikely(prev_state == TASK_DEAD)) { - task_numa_free(prev); - if (prev->sched_class->task_dead) prev->sched_class->task_dead(prev); ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [patch] rt,sched,numa: Move task_numa_free() to __put_task_struct(), which -rt offloads 2014-02-28 6:23 [patch] rt,sched,numa: Move task_numa_free() to __put_task_struct(), which -rt offloads Mike Galbraith @ 2014-02-28 11:32 ` Peter Zijlstra 2014-03-11 12:40 ` [tip:sched/core] sched/numa: Move task_numa_free() to __put_task_struct() tip-bot for Mike Galbraith 1 sibling, 0 replies; 11+ messages in thread From: Peter Zijlstra @ 2014-02-28 11:32 UTC (permalink / raw) To: Mike Galbraith; +Cc: Mel Gorman, LKML, RT On Fri, Feb 28, 2014 at 07:23:11AM +0100, Mike Galbraith wrote: > > Bad idea: > [ 908.026136] [<ffffffff8150ad6a>] rt_spin_lock_slowlock+0xaa/0x2c0 > [ 908.026145] [<ffffffff8108f701>] task_numa_free+0x31/0x130 > [ 908.026151] [<ffffffff8108121e>] finish_task_switch+0xce/0x100 > [ 908.026156] [<ffffffff81509c0a>] thread_return+0x48/0x4ae > [ 908.026160] [<ffffffff8150a095>] schedule+0x25/0xa0 > [ 908.026163] [<ffffffff8150ad95>] rt_spin_lock_slowlock+0xd5/0x2c0 > [ 908.026170] [<ffffffff810658cf>] get_signal_to_deliver+0xaf/0x680 > [ 908.026175] [<ffffffff8100242d>] do_signal+0x3d/0x5b0 > [ 908.026179] [<ffffffff81002a30>] do_notify_resume+0x90/0xe0 > [ 908.026186] [<ffffffff81513176>] int_signal+0x12/0x17 > [ 908.026193] [<00007ff2a388b1d0>] 0x7ff2a388b1cf > > Signed-off-by: Mike Galbraith <bitbucket@online.de> Fair enough; thanks! ^ permalink raw reply [flat|nested] 11+ messages in thread
* [tip:sched/core] sched/numa: Move task_numa_free() to __put_task_struct() 2014-02-28 6:23 [patch] rt,sched,numa: Move task_numa_free() to __put_task_struct(), which -rt offloads Mike Galbraith 2014-02-28 11:32 ` Peter Zijlstra @ 2014-03-11 12:40 ` tip-bot for Mike Galbraith 2014-04-06 19:17 ` Sasha Levin 1 sibling, 1 reply; 11+ messages in thread From: tip-bot for Mike Galbraith @ 2014-03-11 12:40 UTC (permalink / raw) To: linux-tip-commits Cc: linux-kernel, hpa, mingo, torvalds, peterz, bitbucket, mgorman, akpm, tglx Commit-ID: 156654f491dd8d52687a5fbe1637f472a52ce75b Gitweb: http://git.kernel.org/tip/156654f491dd8d52687a5fbe1637f472a52ce75b Author: Mike Galbraith <bitbucket@online.de> AuthorDate: Fri, 28 Feb 2014 07:23:11 +0100 Committer: Ingo Molnar <mingo@kernel.org> CommitDate: Tue, 11 Mar 2014 12:05:43 +0100 sched/numa: Move task_numa_free() to __put_task_struct() Bad idea on -rt: [ 908.026136] [<ffffffff8150ad6a>] rt_spin_lock_slowlock+0xaa/0x2c0 [ 908.026145] [<ffffffff8108f701>] task_numa_free+0x31/0x130 [ 908.026151] [<ffffffff8108121e>] finish_task_switch+0xce/0x100 [ 908.026156] [<ffffffff81509c0a>] thread_return+0x48/0x4ae [ 908.026160] [<ffffffff8150a095>] schedule+0x25/0xa0 [ 908.026163] [<ffffffff8150ad95>] rt_spin_lock_slowlock+0xd5/0x2c0 [ 908.026170] [<ffffffff810658cf>] get_signal_to_deliver+0xaf/0x680 [ 908.026175] [<ffffffff8100242d>] do_signal+0x3d/0x5b0 [ 908.026179] [<ffffffff81002a30>] do_notify_resume+0x90/0xe0 [ 908.026186] [<ffffffff81513176>] int_signal+0x12/0x17 [ 908.026193] [<00007ff2a388b1d0>] 0x7ff2a388b1cf and since upstream does not mind where we do this, be a bit nicer ... Signed-off-by: Mike Galbraith <bitbucket@online.de> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Mel Gorman <mgorman@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1393568591.6018.27.camel@marge.simpson.net Signed-off-by: Ingo Molnar <mingo@kernel.org> --- kernel/fork.c | 1 + kernel/sched/core.c | 2 -- 2 files changed, 1 insertion(+), 2 deletions(-) diff --git a/kernel/fork.c b/kernel/fork.c index a17621c..332688e 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -237,6 +237,7 @@ void __put_task_struct(struct task_struct *tsk) WARN_ON(atomic_read(&tsk->usage)); WARN_ON(tsk == current); + task_numa_free(tsk); security_task_free(tsk); exit_creds(tsk); delayacct_tsk_free(tsk); diff --git a/kernel/sched/core.c b/kernel/sched/core.c index dd89c27..9e126a2 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -2151,8 +2151,6 @@ static void finish_task_switch(struct rq *rq, struct task_struct *prev) if (mm) mmdrop(mm); if (unlikely(prev_state == TASK_DEAD)) { - task_numa_free(prev); - if (prev->sched_class->task_dead) prev->sched_class->task_dead(prev); ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [tip:sched/core] sched/numa: Move task_numa_free() to __put_task_struct() 2014-03-11 12:40 ` [tip:sched/core] sched/numa: Move task_numa_free() to __put_task_struct() tip-bot for Mike Galbraith @ 2014-04-06 19:17 ` Sasha Levin 2014-04-07 5:29 ` Mike Galbraith 0 siblings, 1 reply; 11+ messages in thread From: Sasha Levin @ 2014-04-06 19:17 UTC (permalink / raw) To: mingo, hpa, linux-kernel, torvalds, peterz, bitbucket, mgorman, akpm, tglx, linux-tip-commits Cc: Dave Jones On 03/11/2014 08:40 AM, tip-bot for Mike Galbraith wrote: > Commit-ID: 156654f491dd8d52687a5fbe1637f472a52ce75b > Gitweb: http://git.kernel.org/tip/156654f491dd8d52687a5fbe1637f472a52ce75b > Author: Mike Galbraith <bitbucket@online.de> > AuthorDate: Fri, 28 Feb 2014 07:23:11 +0100 > Committer: Ingo Molnar <mingo@kernel.org> > CommitDate: Tue, 11 Mar 2014 12:05:43 +0100 > > sched/numa: Move task_numa_free() to __put_task_struct() > > Bad idea on -rt: > > [ 908.026136] [<ffffffff8150ad6a>] rt_spin_lock_slowlock+0xaa/0x2c0 > [ 908.026145] [<ffffffff8108f701>] task_numa_free+0x31/0x130 > [ 908.026151] [<ffffffff8108121e>] finish_task_switch+0xce/0x100 > [ 908.026156] [<ffffffff81509c0a>] thread_return+0x48/0x4ae > [ 908.026160] [<ffffffff8150a095>] schedule+0x25/0xa0 > [ 908.026163] [<ffffffff8150ad95>] rt_spin_lock_slowlock+0xd5/0x2c0 > [ 908.026170] [<ffffffff810658cf>] get_signal_to_deliver+0xaf/0x680 > [ 908.026175] [<ffffffff8100242d>] do_signal+0x3d/0x5b0 > [ 908.026179] [<ffffffff81002a30>] do_notify_resume+0x90/0xe0 > [ 908.026186] [<ffffffff81513176>] int_signal+0x12/0x17 > [ 908.026193] [<00007ff2a388b1d0>] 0x7ff2a388b1cf > > and since upstream does not mind where we do this, be a bit nicer ... > > Signed-off-by: Mike Galbraith <bitbucket@online.de> > Signed-off-by: Peter Zijlstra <peterz@infradead.org> > Cc: Mel Gorman <mgorman@suse.com> > Cc: Linus Torvalds <torvalds@linux-foundation.org> > Cc: Andrew Morton <akpm@linux-foundation.org> > Cc: Thomas Gleixner <tglx@linutronix.de> > Link: http://lkml.kernel.org/r/1393568591.6018.27.camel@marge.simpson.net > Signed-off-by: Ingo Molnar <mingo@kernel.org> As it seems, upstream does mind: [ 2590.260734] ====================================================== [ 2590.261695] [ INFO: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected ] [ 2590.262748] 3.14.0-next-20140403-sasha-00022-g10224c0 #377 Tainted: G W [ 2590.263846] ------------------------------------------------------ [ 2590.264730] trinity-c244/1210 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire: [ 2590.265783] (&(&grp->lock)->rlock){+.+...}, at: task_numa_free (kernel/sched/fair.c:1714) [ 2590.267179] [ 2590.267179] and this task is already holding: [ 2590.267996] (&(&new_timer->it_lock)->rlock){-.....}, at: exit_itimers (kernel/posix-timers.c:971 kernel/posix-timers.c:998) [ 2590.269381] which would create a new lock dependency: [ 2590.270067] (&(&new_timer->it_lock)->rlock){-.....} -> (&(&grp->lock)->rlock){+.+...} [ 2590.270067] [ 2590.270067] but this new dependency connects a HARDIRQ-irq-safe lock: [ 2590.270067] (&(&new_timer->it_lock)->rlock){-.....} ... which became HARDIRQ-irq-safe at: [ 2590.270067] __lock_acquire (kernel/locking/lockdep.c:2783 kernel/locking/lockdep.c:3138) [ 2590.270067] lock_acquire (arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602) [ 2590.270067] _raw_spin_lock_irqsave (include/linux/spinlock_api_smp.h:117 kernel/locking/spinlock.c:159) [ 2590.270067] posix_timer_fn (kernel/posix-timers.c:437) [ 2590.270067] __run_hrtimer (kernel/hrtimer.c:1245 (discriminator 2)) [ 2590.270067] hrtimer_interrupt (kernel/hrtimer.c:1892) [ 2590.270067] local_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:921) [ 2590.270067] smp_apic_timer_interrupt (arch/x86/include/asm/apic.h:696 arch/x86/kernel/apic/apic.c:945) [ 2590.270067] apic_timer_interrupt (arch/x86/kernel/entry_64.S:1164) [ 2590.270067] default_idle (arch/x86/include/asm/paravirt.h:111 arch/x86/kernel/process.c:310) [ 2590.270067] arch_cpu_idle (arch/x86/kernel/process.c:302) [ 2590.270067] cpu_idle_loop (kernel/sched/idle.c:179 kernel/sched/idle.c:226) [ 2590.270067] cpu_startup_entry (??:?) [ 2590.270067] start_secondary (arch/x86/kernel/smpboot.c:267) [ 2590.270067] [ 2590.270067] to a HARDIRQ-irq-unsafe lock: [ 2590.270067] (&(&grp->lock)->rlock){+.+...} ... which became HARDIRQ-irq-unsafe at: [ 2590.270067] ... __lock_acquire (kernel/locking/lockdep.c:2800 kernel/locking/lockdep.c:3138) [ 2590.270067] lock_acquire (arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602) [ 2590.270067] _raw_spin_lock (include/linux/spinlock_api_smp.h:143 kernel/locking/spinlock.c:151) [ 2590.270067] task_numa_placement (include/linux/nodemask.h:248 kernel/sched/fair.c:1504) [ 2590.270067] task_numa_fault (kernel/sched/fair.c:1794) [ 2590.270067] __handle_mm_fault (mm/memory.c:3796 mm/memory.c:3796 mm/memory.c:3909) [ 2590.270067] handle_mm_fault (include/linux/memcontrol.h:148 mm/memory.c:3935) [ 2590.270067] __do_page_fault (arch/x86/mm/fault.c:1220) [ 2590.270067] do_page_fault (arch/x86/mm/fault.c:1272 include/linux/jump_label.h:105 include/linux/context_tracking_state.h:27 include/linux/context_tracking.h:45 arch/x86/mm/fault.c:1273) [ 2590.270067] do_async_page_fault (arch/x86/kernel/kvm.c:263) [ 2590.270067] async_page_fault (arch/x86/kernel/entry_64.S:1496) [ 2590.270067] [ 2590.270067] other info that might help us debug this: [ 2590.270067] [ 2590.270067] Possible interrupt unsafe locking scenario: [ 2590.270067] [ 2590.270067] CPU0 CPU1 [ 2590.270067] ---- ---- [ 2590.270067] lock(&(&grp->lock)->rlock); [ 2590.270067] local_irq_disable(); [ 2590.270067] lock(&(&new_timer->it_lock)->rlock); [ 2590.270067] lock(&(&grp->lock)->rlock); [ 2590.270067] <Interrupt> [ 2590.270067] lock(&(&new_timer->it_lock)->rlock); [ 2590.270067] [ 2590.270067] *** DEADLOCK *** [ 2590.270067] [ 2590.270067] 1 lock held by trinity-c244/1210: [ 2590.270067] #0: (&(&new_timer->it_lock)->rlock){-.....}, at: exit_itimers (kernel/posix-timers.c:971 kernel/posix-timers.c:998) [ 2590.270067] the dependencies between HARDIRQ-irq-safe lock and the holding lock: [ 2590.270067] -> (&(&new_timer->it_lock)->rlock){-.....} ops: 361 { [ 2590.270067] IN-HARDIRQ-W at: [ 2590.270067] __lock_acquire (kernel/locking/lockdep.c:2783 kernel/locking/lockdep.c:3138) [ 2590.270067] lock_acquire (arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602) [ 2590.270067] _raw_spin_lock_irqsave (include/linux/spinlock_api_smp.h:117 kernel/locking/spinlock.c:159) [ 2590.270067] posix_timer_fn (kernel/posix-timers.c:437) [ 2590.270067] __run_hrtimer (kernel/hrtimer.c:1245 (discriminator 2)) [ 2590.270067] hrtimer_interrupt (kernel/hrtimer.c:1892) [ 2590.270067] local_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:921) [ 2590.270067] smp_apic_timer_interrupt (arch/x86/include/asm/apic.h:696 arch/x86/kernel/apic/apic.c:945) [ 2590.270067] apic_timer_interrupt (arch/x86/kernel/entry_64.S:1164) [ 2590.270067] default_idle (arch/x86/include/asm/paravirt.h:111 arch/x86/kernel/process.c:310) [ 2590.270067] arch_cpu_idle (arch/x86/kernel/process.c:302) [ 2590.270067] cpu_idle_loop (kernel/sched/idle.c:179 kernel/sched/idle.c:226) [ 2590.270067] cpu_startup_entry (??:?) [ 2590.270067] start_secondary (arch/x86/kernel/smpboot.c:267) [ 2590.270067] INITIAL USE at: [ 2590.270067] __lock_acquire (kernel/locking/lockdep.c:3142) [ 2590.270067] lock_acquire (arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602) [ 2590.270067] _raw_spin_lock_irqsave (include/linux/spinlock_api_smp.h:117 kernel/locking/spinlock.c:159) [ 2590.270067] exit_itimers (kernel/posix-timers.c:971 kernel/posix-timers.c:998) [ 2590.270067] do_exit (kernel/exit.c:766) [ 2590.270067] do_group_exit (kernel/exit.c:919) [ 2590.270067] SyS_exit_group (kernel/exit.c:930) [ 2590.270067] tracesys (arch/x86/kernel/entry_64.S:749) [ 2590.270067] } [ 2590.270067] ... key at: __key.33130 (??:?) [ 2590.270067] ... acquired at: [ 2590.270067] check_irq_usage (kernel/locking/lockdep.c:1638) [ 2590.270067] __lock_acquire (kernel/locking/lockdep_states.h:7 kernel/locking/lockdep.c:1844 kernel/locking/lockdep.c:1945 kernel/locking/lockdep.c:2131 kernel/locking/lockdep.c:3182) [ 2590.270067] lock_acquire (arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602) [ 2590.270067] _raw_spin_lock (include/linux/spinlock_api_smp.h:143 kernel/locking/spinlock.c:151) [ 2590.270067] task_numa_free (kernel/sched/fair.c:1714) [ 2590.270067] __put_task_struct (kernel/fork.c:244 (discriminator 2)) [ 2590.270067] posix_cpu_timer_del (include/linux/sched.h:1807 kernel/posix-cpu-timers.c:409) [ 2590.270067] exit_itimers (kernel/posix-timers.c:973 kernel/posix-timers.c:998) [ 2590.270067] do_exit (kernel/exit.c:766) [ 2590.270067] do_group_exit (kernel/exit.c:919) [ 2590.270067] SyS_exit_group (kernel/exit.c:930) [ 2590.270067] tracesys (arch/x86/kernel/entry_64.S:749) [ 2590.270067] [ 2590.270067] the dependencies between the lock to be acquired and HARDIRQ-irq-unsafe lock: [ 2590.270067] -> (&(&grp->lock)->rlock){+.+...} ops: 91 { [ 2590.270067] HARDIRQ-ON-W at: [ 2590.270067] __lock_acquire (kernel/locking/lockdep.c:2800 kernel/locking/lockdep.c:3138) [ 2590.270067] lock_acquire (arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602) [ 2590.270067] _raw_spin_lock (include/linux/spinlock_api_smp.h:143 kernel/locking/spinlock.c:151) [ 2590.270067] task_numa_placement (include/linux/nodemask.h:248 kernel/sched/fair.c:1504) [ 2590.270067] task_numa_fault (kernel/sched/fair.c:1794) [ 2590.270067] __handle_mm_fault (mm/memory.c:3796 mm/memory.c:3796 mm/memory.c:3909) [ 2590.270067] handle_mm_fault (include/linux/memcontrol.h:148 mm/memory.c:3935) [ 2590.270067] __do_page_fault (arch/x86/mm/fault.c:1220) [ 2590.270067] do_page_fault (arch/x86/mm/fault.c:1272 include/linux/jump_label.h:105 include/linux/context_tracking_state.h:27 include/linux/context_tracking.h:45 arch/x86/mm/fault.c:1273) [ 2590.270067] do_async_page_fault (arch/x86/kernel/kvm.c:263) [ 2590.270067] async_page_fault (arch/x86/kernel/entry_64.S:1496) [ 2590.270067] SOFTIRQ-ON-W at: [ 2590.270067] __lock_acquire (kernel/locking/lockdep.c:2804 kernel/locking/lockdep.c:3138) [ 2590.270067] lock_acquire (arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602) [ 2590.270067] _raw_spin_lock (include/linux/spinlock_api_smp.h:143 kernel/locking/spinlock.c:151) [ 2590.270067] task_numa_placement (include/linux/nodemask.h:248 kernel/sched/fair.c:1504) [ 2590.270067] task_numa_fault (kernel/sched/fair.c:1794) [ 2590.270067] __handle_mm_fault (mm/memory.c:3796 mm/memory.c:3796 mm/memory.c:3909) [ 2590.270067] handle_mm_fault (include/linux/memcontrol.h:148 mm/memory.c:3935) [ 2590.270067] __do_page_fault (arch/x86/mm/fault.c:1220) [ 2590.270067] do_page_fault (arch/x86/mm/fault.c:1272 include/linux/jump_label.h:105 include/linux/context_tracking_state.h:27 include/linux/context_tracking.h:45 arch/x86/mm/fault.c:1273) [ 2590.270067] do_async_page_fault (arch/x86/kernel/kvm.c:263) [ 2590.270067] async_page_fault (arch/x86/kernel/entry_64.S:1496) [ 2590.270067] INITIAL USE at: [ 2590.270067] __lock_acquire (kernel/locking/lockdep.c:3142) [ 2590.270067] lock_acquire (arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602) [ 2590.270067] _raw_spin_lock (include/linux/spinlock_api_smp.h:143 kernel/locking/spinlock.c:151) [ 2590.270067] task_numa_placement (include/linux/nodemask.h:248 kernel/sched/fair.c:1504) [ 2590.270067] task_numa_fault (kernel/sched/fair.c:1794) [ 2590.270067] __handle_mm_fault (mm/memory.c:3796 mm/memory.c:3796 mm/memory.c:3909) [ 2590.270067] handle_mm_fault (include/linux/memcontrol.h:148 mm/memory.c:3935) [ 2590.270067] __do_page_fault (arch/x86/mm/fault.c:1220) [ 2590.270067] do_page_fault (arch/x86/mm/fault.c:1272 include/linux/jump_label.h:105 include/linux/context_tracking_state.h:27 include/linux/context_tracking.h:45 arch/x86/mm/fault.c:1273) [ 2590.270067] do_async_page_fault (arch/x86/kernel/kvm.c:263) [ 2590.270067] async_page_fault (arch/x86/kernel/entry_64.S:1496) [ 2590.270067] } [ 2590.270067] ... key at: __key.32449 (??:?) [ 2590.270067] ... acquired at: [ 2590.270067] check_irq_usage (kernel/locking/lockdep.c:1638) [ 2590.270067] __lock_acquire (kernel/locking/lockdep_states.h:7 kernel/locking/lockdep.c:1844 kernel/locking/lockdep.c:1945 kernel/locking/lockdep.c:2131 kernel/locking/lockdep.c:3182) [ 2590.270067] lock_acquire (arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602) [ 2590.270067] _raw_spin_lock (include/linux/spinlock_api_smp.h:143 kernel/locking/spinlock.c:151) [ 2590.270067] task_numa_free (kernel/sched/fair.c:1714) [ 2590.270067] __put_task_struct (kernel/fork.c:244 (discriminator 2)) [ 2590.270067] posix_cpu_timer_del (include/linux/sched.h:1807 kernel/posix-cpu-timers.c:409) [ 2590.270067] exit_itimers (kernel/posix-timers.c:973 kernel/posix-timers.c:998) [ 2590.270067] do_exit (kernel/exit.c:766) [ 2590.270067] do_group_exit (kernel/exit.c:919) [ 2590.270067] SyS_exit_group (kernel/exit.c:930) [ 2590.270067] tracesys (arch/x86/kernel/entry_64.S:749) [ 2590.270067] [ 2590.270067] [ 2590.270067] stack backtrace: [ 2590.270067] CPU: 3 PID: 1210 Comm: trinity-c244 Tainted: G W 3.14.0-next-20140403-sasha-00022-g10224c0 #377 [ 2590.270067] ffffffff87a83b60 ffff880081695ad8 ffffffff844bfb3f 0000000000000000 [ 2590.270067] ffff880081698cf0 ffff880081695be8 ffffffff811c0d05 0000000000000000 [ 2590.270067] ffffffff00000000 ffff880000000001 ffffffff8107aac5 ffff880081695b38 [ 2590.270067] Call Trace: [ 2590.270067] dump_stack (lib/dump_stack.c:52) [ 2590.270067] check_usage (kernel/locking/lockdep.c:1549 kernel/locking/lockdep.c:1580) [ 2590.270067] ? sched_clock (arch/x86/include/asm/paravirt.h:192 arch/x86/kernel/tsc.c:305) [ 2590.270067] ? sched_clock_cpu (kernel/sched/clock.c:311) [ 2590.270067] check_irq_usage (kernel/locking/lockdep.c:1638) [ 2590.270067] __lock_acquire (kernel/locking/lockdep_states.h:7 kernel/locking/lockdep.c:1844 kernel/locking/lockdep.c:1945 kernel/locking/lockdep.c:2131 kernel/locking/lockdep.c:3182) [ 2590.270067] ? __lock_acquire (kernel/locking/lockdep.c:3189) [ 2590.270067] lock_acquire (arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602) [ 2590.270067] ? task_numa_free (kernel/sched/fair.c:1714) [ 2590.270067] _raw_spin_lock (include/linux/spinlock_api_smp.h:143 kernel/locking/spinlock.c:151) [ 2590.270067] ? task_numa_free (kernel/sched/fair.c:1714) [ 2590.270067] ? trace_hardirqs_off_caller (kernel/locking/lockdep.c:2638 (discriminator 2)) [ 2590.270067] task_numa_free (kernel/sched/fair.c:1714) [ 2590.270067] __put_task_struct (kernel/fork.c:244 (discriminator 2)) [ 2590.270067] posix_cpu_timer_del (include/linux/sched.h:1807 kernel/posix-cpu-timers.c:409) [ 2590.270067] exit_itimers (kernel/posix-timers.c:973 kernel/posix-timers.c:998) [ 2590.270067] do_exit (kernel/exit.c:766) [ 2590.270067] ? __this_cpu_preempt_check (lib/smp_processor_id.c:63) [ 2590.270067] ? trace_hardirqs_on_caller (kernel/locking/lockdep.c:2557 kernel/locking/lockdep.c:2599) [ 2590.270067] ? trace_hardirqs_on (kernel/locking/lockdep.c:2607) [ 2590.270067] do_group_exit (kernel/exit.c:919) [ 2590.270067] SyS_exit_group (kernel/exit.c:930) [ 2590.270067] tracesys (arch/x86/kernel/entry_64.S:749) Thanks, Sasha ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [tip:sched/core] sched/numa: Move task_numa_free() to __put_task_struct() 2014-04-06 19:17 ` Sasha Levin @ 2014-04-07 5:29 ` Mike Galbraith 2014-04-07 7:30 ` Mike Galbraith 0 siblings, 1 reply; 11+ messages in thread From: Mike Galbraith @ 2014-04-07 5:29 UTC (permalink / raw) To: Sasha Levin Cc: mingo, hpa, linux-kernel, torvalds, peterz, mgorman, akpm, tglx, linux-tip-commits, Dave Jones On Sun, 2014-04-06 at 15:17 -0400, Sasha Levin wrote: > On 03/11/2014 08:40 AM, tip-bot for Mike Galbraith wrote: > > Commit-ID: 156654f491dd8d52687a5fbe1637f472a52ce75b > > Gitweb: http://git.kernel.org/tip/156654f491dd8d52687a5fbe1637f472a52ce75b > > Author: Mike Galbraith <bitbucket@online.de> > > AuthorDate: Fri, 28 Feb 2014 07:23:11 +0100 > > Committer: Ingo Molnar <mingo@kernel.org> > > CommitDate: Tue, 11 Mar 2014 12:05:43 +0100 > > > > sched/numa: Move task_numa_free() to __put_task_struct() > > > > Bad idea on -rt: > > > > [ 908.026136] [<ffffffff8150ad6a>] rt_spin_lock_slowlock+0xaa/0x2c0 > > [ 908.026145] [<ffffffff8108f701>] task_numa_free+0x31/0x130 > > [ 908.026151] [<ffffffff8108121e>] finish_task_switch+0xce/0x100 > > [ 908.026156] [<ffffffff81509c0a>] thread_return+0x48/0x4ae > > [ 908.026160] [<ffffffff8150a095>] schedule+0x25/0xa0 > > [ 908.026163] [<ffffffff8150ad95>] rt_spin_lock_slowlock+0xd5/0x2c0 > > [ 908.026170] [<ffffffff810658cf>] get_signal_to_deliver+0xaf/0x680 > > [ 908.026175] [<ffffffff8100242d>] do_signal+0x3d/0x5b0 > > [ 908.026179] [<ffffffff81002a30>] do_notify_resume+0x90/0xe0 > > [ 908.026186] [<ffffffff81513176>] int_signal+0x12/0x17 > > [ 908.026193] [<00007ff2a388b1d0>] 0x7ff2a388b1cf > > > > and since upstream does not mind where we do this, be a bit nicer ... > > > > Signed-off-by: Mike Galbraith <bitbucket@online.de> > > Signed-off-by: Peter Zijlstra <peterz@infradead.org> > > Cc: Mel Gorman <mgorman@suse.com> > > Cc: Linus Torvalds <torvalds@linux-foundation.org> > > Cc: Andrew Morton <akpm@linux-foundation.org> > > Cc: Thomas Gleixner <tglx@linutronix.de> > > Link: http://lkml.kernel.org/r/1393568591.6018.27.camel@marge.simpson.net > > Signed-off-by: Ingo Molnar <mingo@kernel.org> > > As it seems, upstream does mind: > > [ 2590.260734] ====================================================== > [ 2590.261695] [ INFO: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected ] > [ 2590.262748] 3.14.0-next-20140403-sasha-00022-g10224c0 #377 Tainted: G W > [ 2590.263846] ------------------------------------------------------ > [ 2590.264730] trinity-c244/1210 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire: > [ 2590.265783] (&(&grp->lock)->rlock){+.+...}, at: task_numa_free (kernel/sched/fair.c:1714) > [ 2590.267179] > [ 2590.267179] and this task is already holding: > [ 2590.267996] (&(&new_timer->it_lock)->rlock){-.....}, at: exit_itimers (kernel/posix-timers.c:971 kernel/posix-timers.c:998) > [ 2590.269381] which would create a new lock dependency: > [ 2590.270067] (&(&new_timer->it_lock)->rlock){-.....} -> (&(&grp->lock)->rlock){+.+...} I'm not getting it. I moved task_numa_free() from one interrupts enabled spot to another. But, with numa=fake=4 and lockdep enabled, not only does lockdep gripe, my little box locks up on splat. Saying spin_lock/unlock_irq() did the expected, just moved lockdep gripe to task_numa_fault(). > [ 2590.270067] Possible interrupt unsafe locking scenario: > [ 2590.270067] > [ 2590.270067] CPU0 CPU1 > [ 2590.270067] ---- ---- > [ 2590.270067] lock(&(&grp->lock)->rlock); > [ 2590.270067] local_irq_disable(); > [ 2590.270067] lock(&(&new_timer->it_lock)->rlock); > [ 2590.270067] lock(&(&grp->lock)->rlock); > [ 2590.270067] <Interrupt> > [ 2590.270067] lock(&(&new_timer->it_lock)->rlock); > [ 2590.270067] > [ 2590.270067] *** DEADLOCK *** Ok, so how did I manage that HARDIRQ-safe -> HARDIRQ-unsafe? -Mike ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [tip:sched/core] sched/numa: Move task_numa_free() to __put_task_struct() 2014-04-07 5:29 ` Mike Galbraith @ 2014-04-07 7:30 ` Mike Galbraith 2014-04-07 8:16 ` Peter Zijlstra 0 siblings, 1 reply; 11+ messages in thread From: Mike Galbraith @ 2014-04-07 7:30 UTC (permalink / raw) To: Sasha Levin Cc: mingo, hpa, linux-kernel, torvalds, peterz, mgorman, akpm, tglx, linux-tip-commits, Dave Jones On Mon, 2014-04-07 at 07:29 +0200, Mike Galbraith wrote: > I'm not getting it. > > I moved task_numa_free() from one interrupts enabled spot to another. > But, with numa=fake=4 and lockdep enabled, not only does lockdep gripe, > my little box locks up on splat. Saying spin_lock/unlock_irq() did the > expected, just moved lockdep gripe to task_numa_fault(). > > > > [ 2590.270067] Possible interrupt unsafe locking scenario: > > [ 2590.270067] > > [ 2590.270067] CPU0 CPU1 > > [ 2590.270067] ---- ---- > > [ 2590.270067] lock(&(&grp->lock)->rlock); > > [ 2590.270067] local_irq_disable(); > > [ 2590.270067] lock(&(&new_timer->it_lock)->rlock); > > [ 2590.270067] lock(&(&grp->lock)->rlock); > > [ 2590.270067] <Interrupt> > > [ 2590.270067] lock(&(&new_timer->it_lock)->rlock); > > [ 2590.270067] > > [ 2590.270067] *** DEADLOCK *** > > Ok, so how did I manage that HARDIRQ-safe -> HARDIRQ-unsafe? Think I'll turn lockdep off, and make context switches take a good long while after finish_lock_switch(), but meanwhile, this made it happy. Sasha reports that lockdep claims 156654f491dd8d52687a5fbe1637f472a52ce75b made numa_group.lock interrupt unsafe. While I don't see how that could be given the commit in question moved task_numa_free() from one irq enabled region to another, the below does make both gripes and lockup upon gripe with numa=fake=4 go away. Reported-by: Sasha Levin <sasha.levin@oracle.com> Not-signed-off-by: Mike Galbraith <bitbucket@online.de> --- kernel/sched/fair.c | 12 +++++++----- kernel/sched/sched.h | 9 +++++++++ 2 files changed, 16 insertions(+), 5 deletions(-) --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -1497,7 +1497,7 @@ static void task_numa_placement(struct t /* If the task is part of a group prevent parallel updates to group stats */ if (p->numa_group) { group_lock = &p->numa_group->lock; - spin_lock(group_lock); + spin_lock_irq(group_lock); } /* Find the node with the highest number of faults */ @@ -1572,7 +1572,7 @@ static void task_numa_placement(struct t } } - spin_unlock(group_lock); + spin_unlock_irq(group_lock); } /* Preferred node as the node with the most faults */ @@ -1677,7 +1677,8 @@ static void task_numa_group(struct task_ if (!join) return; - double_lock(&my_grp->lock, &grp->lock); + BUG_ON(irqs_disabled()); + double_lock_irq(&my_grp->lock, &grp->lock); for (i = 0; i < NR_NUMA_HINT_FAULT_STATS * nr_node_ids; i++) { my_grp->faults[i] -= p->numa_faults_memory[i]; @@ -1692,6 +1693,7 @@ static void task_numa_group(struct task_ spin_unlock(&my_grp->lock); spin_unlock(&grp->lock); + local_irq_enable(); rcu_assign_pointer(p->numa_group, grp); @@ -1710,14 +1712,14 @@ void task_numa_free(struct task_struct * void *numa_faults = p->numa_faults_memory; if (grp) { - spin_lock(&grp->lock); + spin_lock_irq(&grp->lock); for (i = 0; i < NR_NUMA_HINT_FAULT_STATS * nr_node_ids; i++) grp->faults[i] -= p->numa_faults_memory[i]; grp->total_faults -= p->total_numa_faults; list_del(&p->numa_entry); grp->nr_tasks--; - spin_unlock(&grp->lock); + spin_unlock_irq(&grp->lock); rcu_assign_pointer(p->numa_group, NULL); put_numa_group(grp); } --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -1388,6 +1388,15 @@ static inline void double_lock(spinlock_ spin_lock_nested(l2, SINGLE_DEPTH_NESTING); } +static inline void double_lock_irq(spinlock_t *l1, spinlock_t *l2) +{ + if (l1 > l2) + swap(l1, l2); + + spin_lock_irq(l1); + spin_lock_nested(l2, SINGLE_DEPTH_NESTING); +} + static inline void double_raw_lock(raw_spinlock_t *l1, raw_spinlock_t *l2) { if (l1 > l2) ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [tip:sched/core] sched/numa: Move task_numa_free() to __put_task_struct() 2014-04-07 7:30 ` Mike Galbraith @ 2014-04-07 8:16 ` Peter Zijlstra 2014-04-07 8:40 ` Mike Galbraith 2014-04-07 8:55 ` Mike Galbraith 0 siblings, 2 replies; 11+ messages in thread From: Peter Zijlstra @ 2014-04-07 8:16 UTC (permalink / raw) To: Mike Galbraith Cc: Sasha Levin, mingo, hpa, linux-kernel, torvalds, mgorman, akpm, tglx, linux-tip-commits, Dave Jones On Mon, Apr 07, 2014 at 09:30:30AM +0200, Mike Galbraith wrote: > - double_lock(&my_grp->lock, &grp->lock); > + BUG_ON(irqs_disabled()); > + double_lock_irq(&my_grp->lock, &grp->lock); So either make this: local_irq_disable(); double_lock(); or > > for (i = 0; i < NR_NUMA_HINT_FAULT_STATS * nr_node_ids; i++) { > my_grp->faults[i] -= p->numa_faults_memory[i]; > @@ -1692,6 +1693,7 @@ static void task_numa_group(struct task_ > > spin_unlock(&my_grp->lock); > spin_unlock(&grp->lock); > + local_irq_enable(); use: spin_unlock() spin_unlock_irq() or so, but this imbalance is making my itch :-) > > rcu_assign_pointer(p->numa_group, grp); > ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [tip:sched/core] sched/numa: Move task_numa_free() to __put_task_struct() 2014-04-07 8:16 ` Peter Zijlstra @ 2014-04-07 8:40 ` Mike Galbraith 2014-04-07 8:55 ` Mike Galbraith 1 sibling, 0 replies; 11+ messages in thread From: Mike Galbraith @ 2014-04-07 8:40 UTC (permalink / raw) To: Peter Zijlstra Cc: Sasha Levin, mingo, hpa, linux-kernel, torvalds, mgorman, akpm, tglx, linux-tip-commits, Dave Jones On Mon, 2014-04-07 at 10:16 +0200, Peter Zijlstra wrote: > On Mon, Apr 07, 2014 at 09:30:30AM +0200, Mike Galbraith wrote: > > - double_lock(&my_grp->lock, &grp->lock); > > + BUG_ON(irqs_disabled()); > > + double_lock_irq(&my_grp->lock, &grp->lock); > > So either make this: > > local_irq_disable(); > double_lock(); > > or > > > > > for (i = 0; i < NR_NUMA_HINT_FAULT_STATS * nr_node_ids; i++) { > > my_grp->faults[i] -= p->numa_faults_memory[i]; > > @@ -1692,6 +1693,7 @@ static void task_numa_group(struct task_ > > > > spin_unlock(&my_grp->lock); > > spin_unlock(&grp->lock); > > + local_irq_enable(); > > use: > spin_unlock() > spin_unlock_irq() *thwap* Well duh. > or so, but this imbalance is making my itch :-) Yeah, much better. Before I actually sign that off, mind cluing me in as to why I should not be sitting here thinking lockdep smoked its breakfast? -Mike ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [tip:sched/core] sched/numa: Move task_numa_free() to __put_task_struct() 2014-04-07 8:16 ` Peter Zijlstra 2014-04-07 8:40 ` Mike Galbraith @ 2014-04-07 8:55 ` Mike Galbraith 2014-04-13 20:53 ` Govindarajulu Varadarajan 2014-04-14 7:22 ` [tip:sched/urgent] sched/numa: Fix task_numa_free() lockdep splat tip-bot for Mike Galbraith 1 sibling, 2 replies; 11+ messages in thread From: Mike Galbraith @ 2014-04-07 8:55 UTC (permalink / raw) To: Peter Zijlstra Cc: Sasha Levin, mingo, hpa, linux-kernel, torvalds, mgorman, akpm, tglx, linux-tip-commits, Dave Jones On Mon, 2014-04-07 at 10:16 +0200, Peter Zijlstra wrote: > On Mon, Apr 07, 2014 at 09:30:30AM +0200, Mike Galbraith wrote: > > - double_lock(&my_grp->lock, &grp->lock); > > + BUG_ON(irqs_disabled()); > > + double_lock_irq(&my_grp->lock, &grp->lock); > > So either make this: > > local_irq_disable(); > double_lock(); > > or > > > > > for (i = 0; i < NR_NUMA_HINT_FAULT_STATS * nr_node_ids; i++) { > > my_grp->faults[i] -= p->numa_faults_memory[i]; > > @@ -1692,6 +1693,7 @@ static void task_numa_group(struct task_ > > > > spin_unlock(&my_grp->lock); > > spin_unlock(&grp->lock); > > + local_irq_enable(); > > use: > spin_unlock() > spin_unlock_irq() > > or so, but this imbalance is making my itch :-) sched, numa: fix task_numa_free() lockdep splat Sasha reports that lockdep claims 156654f491dd8d52687a5fbe1637f472a52ce75b made numa_group.lock interrupt unsafe. While I don't see how that could be given the commit in question moved task_numa_free() from one irq enabled region to another, the below does make both gripes and lockups upon gripe with numa=fake=4 go away. Reported-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Mike Galbraith <bitbucket@online.de> --- kernel/sched/fair.c | 13 +++++++------ kernel/sched/sched.h | 9 +++++++++ 2 files changed, 16 insertions(+), 6 deletions(-) --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -1497,7 +1497,7 @@ static void task_numa_placement(struct t /* If the task is part of a group prevent parallel updates to group stats */ if (p->numa_group) { group_lock = &p->numa_group->lock; - spin_lock(group_lock); + spin_lock_irq(group_lock); } /* Find the node with the highest number of faults */ @@ -1572,7 +1572,7 @@ static void task_numa_placement(struct t } } - spin_unlock(group_lock); + spin_unlock_irq(group_lock); } /* Preferred node as the node with the most faults */ @@ -1677,7 +1677,8 @@ static void task_numa_group(struct task_ if (!join) return; - double_lock(&my_grp->lock, &grp->lock); + BUG_ON(irqs_disabled()); + double_lock_irq(&my_grp->lock, &grp->lock); for (i = 0; i < NR_NUMA_HINT_FAULT_STATS * nr_node_ids; i++) { my_grp->faults[i] -= p->numa_faults_memory[i]; @@ -1691,7 +1692,7 @@ static void task_numa_group(struct task_ grp->nr_tasks++; spin_unlock(&my_grp->lock); - spin_unlock(&grp->lock); + spin_unlock_irq(&grp->lock); rcu_assign_pointer(p->numa_group, grp); @@ -1710,14 +1711,14 @@ void task_numa_free(struct task_struct * void *numa_faults = p->numa_faults_memory; if (grp) { - spin_lock(&grp->lock); + spin_lock_irq(&grp->lock); for (i = 0; i < NR_NUMA_HINT_FAULT_STATS * nr_node_ids; i++) grp->faults[i] -= p->numa_faults_memory[i]; grp->total_faults -= p->total_numa_faults; list_del(&p->numa_entry); grp->nr_tasks--; - spin_unlock(&grp->lock); + spin_unlock_irq(&grp->lock); rcu_assign_pointer(p->numa_group, NULL); put_numa_group(grp); } --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -1388,6 +1388,15 @@ static inline void double_lock(spinlock_ spin_lock_nested(l2, SINGLE_DEPTH_NESTING); } +static inline void double_lock_irq(spinlock_t *l1, spinlock_t *l2) +{ + if (l1 > l2) + swap(l1, l2); + + spin_lock_irq(l1); + spin_lock_nested(l2, SINGLE_DEPTH_NESTING); +} + static inline void double_raw_lock(raw_spinlock_t *l1, raw_spinlock_t *l2) { if (l1 > l2) ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [tip:sched/core] sched/numa: Move task_numa_free() to __put_task_struct() 2014-04-07 8:55 ` Mike Galbraith @ 2014-04-13 20:53 ` Govindarajulu Varadarajan 2014-04-14 7:22 ` [tip:sched/urgent] sched/numa: Fix task_numa_free() lockdep splat tip-bot for Mike Galbraith 1 sibling, 0 replies; 11+ messages in thread From: Govindarajulu Varadarajan @ 2014-04-13 20:53 UTC (permalink / raw) To: Mike Galbraith Cc: Peter Zijlstra, Sasha Levin, mingo, hpa, linux-kernel, torvalds, mgorman, akpm, tglx, linux-tip-commits, Dave Jones On Mon, 7 Apr 2014, Mike Galbraith wrote: > On Mon, 2014-04-07 at 10:16 +0200, Peter Zijlstra wrote: >> On Mon, Apr 07, 2014 at 09:30:30AM +0200, Mike Galbraith wrote: >>> - double_lock(&my_grp->lock, &grp->lock); >>> + BUG_ON(irqs_disabled()); >>> + double_lock_irq(&my_grp->lock, &grp->lock); >> >> So either make this: >> >> local_irq_disable(); >> double_lock(); >> >> or >> >>> >>> for (i = 0; i < NR_NUMA_HINT_FAULT_STATS * nr_node_ids; i++) { >>> my_grp->faults[i] -= p->numa_faults_memory[i]; >>> @@ -1692,6 +1693,7 @@ static void task_numa_group(struct task_ >>> >>> spin_unlock(&my_grp->lock); >>> spin_unlock(&grp->lock); >>> + local_irq_enable(); >> >> use: >> spin_unlock() >> spin_unlock_irq() >> >> or so, but this imbalance is making my itch :-) > > sched, numa: fix task_numa_free() lockdep splat > > Sasha reports that lockdep claims 156654f491dd8d52687a5fbe1637f472a52ce75b made > numa_group.lock interrupt unsafe. While I don't see how that could be given the > commit in question moved task_numa_free() from one irq enabled region to another, > the below does make both gripes and lockups upon gripe with numa=fake=4 go away. > Hi I Am hitting this bug quite frequently. I do not see the problem after applying this patch. Thanks Tested-by: Govindarajulu Varadarajan <govindx7c6@gmail.com> > Reported-by: Sasha Levin <sasha.levin@oracle.com> > Signed-off-by: Mike Galbraith <bitbucket@online.de> > --- > kernel/sched/fair.c | 13 +++++++------ > kernel/sched/sched.h | 9 +++++++++ > 2 files changed, 16 insertions(+), 6 deletions(-) > > --- a/kernel/sched/fair.c > +++ b/kernel/sched/fair.c > @@ -1497,7 +1497,7 @@ static void task_numa_placement(struct t > /* If the task is part of a group prevent parallel updates to group stats */ > if (p->numa_group) { > group_lock = &p->numa_group->lock; > - spin_lock(group_lock); > + spin_lock_irq(group_lock); > } > > /* Find the node with the highest number of faults */ > @@ -1572,7 +1572,7 @@ static void task_numa_placement(struct t > } > } > > - spin_unlock(group_lock); > + spin_unlock_irq(group_lock); > } > > /* Preferred node as the node with the most faults */ > @@ -1677,7 +1677,8 @@ static void task_numa_group(struct task_ > if (!join) > return; > > - double_lock(&my_grp->lock, &grp->lock); > + BUG_ON(irqs_disabled()); > + double_lock_irq(&my_grp->lock, &grp->lock); > > for (i = 0; i < NR_NUMA_HINT_FAULT_STATS * nr_node_ids; i++) { > my_grp->faults[i] -= p->numa_faults_memory[i]; > @@ -1691,7 +1692,7 @@ static void task_numa_group(struct task_ > grp->nr_tasks++; > > spin_unlock(&my_grp->lock); > - spin_unlock(&grp->lock); > + spin_unlock_irq(&grp->lock); > > rcu_assign_pointer(p->numa_group, grp); > > @@ -1710,14 +1711,14 @@ void task_numa_free(struct task_struct * > void *numa_faults = p->numa_faults_memory; > > if (grp) { > - spin_lock(&grp->lock); > + spin_lock_irq(&grp->lock); > for (i = 0; i < NR_NUMA_HINT_FAULT_STATS * nr_node_ids; i++) > grp->faults[i] -= p->numa_faults_memory[i]; > grp->total_faults -= p->total_numa_faults; > > list_del(&p->numa_entry); > grp->nr_tasks--; > - spin_unlock(&grp->lock); > + spin_unlock_irq(&grp->lock); > rcu_assign_pointer(p->numa_group, NULL); > put_numa_group(grp); > } > --- a/kernel/sched/sched.h > +++ b/kernel/sched/sched.h > @@ -1388,6 +1388,15 @@ static inline void double_lock(spinlock_ > spin_lock_nested(l2, SINGLE_DEPTH_NESTING); > } > > +static inline void double_lock_irq(spinlock_t *l1, spinlock_t *l2) > +{ > + if (l1 > l2) > + swap(l1, l2); > + > + spin_lock_irq(l1); > + spin_lock_nested(l2, SINGLE_DEPTH_NESTING); > +} > + > static inline void double_raw_lock(raw_spinlock_t *l1, raw_spinlock_t *l2) > { > if (l1 > l2) > > > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > ^ permalink raw reply [flat|nested] 11+ messages in thread
* [tip:sched/urgent] sched/numa: Fix task_numa_free() lockdep splat 2014-04-07 8:55 ` Mike Galbraith 2014-04-13 20:53 ` Govindarajulu Varadarajan @ 2014-04-14 7:22 ` tip-bot for Mike Galbraith 1 sibling, 0 replies; 11+ messages in thread From: tip-bot for Mike Galbraith @ 2014-04-14 7:22 UTC (permalink / raw) To: linux-tip-commits Cc: linux-kernel, sasha.levin, hpa, mingo, peterz, bitbucket, davej, tglx Commit-ID: 60e69eed85bb7b5198ef70643b5895c26ad76ef7 Gitweb: http://git.kernel.org/tip/60e69eed85bb7b5198ef70643b5895c26ad76ef7 Author: Mike Galbraith <bitbucket@online.de> AuthorDate: Mon, 7 Apr 2014 10:55:15 +0200 Committer: Ingo Molnar <mingo@kernel.org> CommitDate: Fri, 11 Apr 2014 10:39:15 +0200 sched/numa: Fix task_numa_free() lockdep splat Sasha reported that lockdep claims that the following commit: made numa_group.lock interrupt unsafe: 156654f491dd ("sched/numa: Move task_numa_free() to __put_task_struct()") While I don't see how that could be, given the commit in question moved task_numa_free() from one irq enabled region to another, the below does make both gripes and lockups upon gripe with numa=fake=4 go away. Reported-by: Sasha Levin <sasha.levin@oracle.com> Fixes: 156654f491dd ("sched/numa: Move task_numa_free() to __put_task_struct()") Signed-off-by: Mike Galbraith <bitbucket@online.de> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: torvalds@linux-foundation.org Cc: mgorman@suse.com Cc: akpm@linux-foundation.org Cc: Dave Jones <davej@redhat.com> Link: http://lkml.kernel.org/r/1396860915.5170.5.camel@marge.simpson.net Signed-off-by: Ingo Molnar <mingo@kernel.org> --- kernel/sched/fair.c | 13 +++++++------ kernel/sched/sched.h | 9 +++++++++ 2 files changed, 16 insertions(+), 6 deletions(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 7e9bd0b..4f14a65 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -1497,7 +1497,7 @@ static void task_numa_placement(struct task_struct *p) /* If the task is part of a group prevent parallel updates to group stats */ if (p->numa_group) { group_lock = &p->numa_group->lock; - spin_lock(group_lock); + spin_lock_irq(group_lock); } /* Find the node with the highest number of faults */ @@ -1572,7 +1572,7 @@ static void task_numa_placement(struct task_struct *p) } } - spin_unlock(group_lock); + spin_unlock_irq(group_lock); } /* Preferred node as the node with the most faults */ @@ -1677,7 +1677,8 @@ static void task_numa_group(struct task_struct *p, int cpupid, int flags, if (!join) return; - double_lock(&my_grp->lock, &grp->lock); + BUG_ON(irqs_disabled()); + double_lock_irq(&my_grp->lock, &grp->lock); for (i = 0; i < NR_NUMA_HINT_FAULT_STATS * nr_node_ids; i++) { my_grp->faults[i] -= p->numa_faults_memory[i]; @@ -1691,7 +1692,7 @@ static void task_numa_group(struct task_struct *p, int cpupid, int flags, grp->nr_tasks++; spin_unlock(&my_grp->lock); - spin_unlock(&grp->lock); + spin_unlock_irq(&grp->lock); rcu_assign_pointer(p->numa_group, grp); @@ -1710,14 +1711,14 @@ void task_numa_free(struct task_struct *p) void *numa_faults = p->numa_faults_memory; if (grp) { - spin_lock(&grp->lock); + spin_lock_irq(&grp->lock); for (i = 0; i < NR_NUMA_HINT_FAULT_STATS * nr_node_ids; i++) grp->faults[i] -= p->numa_faults_memory[i]; grp->total_faults -= p->total_numa_faults; list_del(&p->numa_entry); grp->nr_tasks--; - spin_unlock(&grp->lock); + spin_unlock_irq(&grp->lock); rcu_assign_pointer(p->numa_group, NULL); put_numa_group(grp); } diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index c9007f2..456e492 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -1385,6 +1385,15 @@ static inline void double_lock(spinlock_t *l1, spinlock_t *l2) spin_lock_nested(l2, SINGLE_DEPTH_NESTING); } +static inline void double_lock_irq(spinlock_t *l1, spinlock_t *l2) +{ + if (l1 > l2) + swap(l1, l2); + + spin_lock_irq(l1); + spin_lock_nested(l2, SINGLE_DEPTH_NESTING); +} + static inline void double_raw_lock(raw_spinlock_t *l1, raw_spinlock_t *l2) { if (l1 > l2) ^ permalink raw reply related [flat|nested] 11+ messages in thread
end of thread, other threads:[~2014-04-14 7:23 UTC | newest] Thread overview: 11+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2014-02-28 6:23 [patch] rt,sched,numa: Move task_numa_free() to __put_task_struct(), which -rt offloads Mike Galbraith 2014-02-28 11:32 ` Peter Zijlstra 2014-03-11 12:40 ` [tip:sched/core] sched/numa: Move task_numa_free() to __put_task_struct() tip-bot for Mike Galbraith 2014-04-06 19:17 ` Sasha Levin 2014-04-07 5:29 ` Mike Galbraith 2014-04-07 7:30 ` Mike Galbraith 2014-04-07 8:16 ` Peter Zijlstra 2014-04-07 8:40 ` Mike Galbraith 2014-04-07 8:55 ` Mike Galbraith 2014-04-13 20:53 ` Govindarajulu Varadarajan 2014-04-14 7:22 ` [tip:sched/urgent] sched/numa: Fix task_numa_free() lockdep splat tip-bot for Mike Galbraith
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).