linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [patch] rt,sched,numa: Move task_numa_free() to __put_task_struct(), which -rt offloads
@ 2014-02-28  6:23 Mike Galbraith
  2014-02-28 11:32 ` Peter Zijlstra
  2014-03-11 12:40 ` [tip:sched/core] sched/numa: Move task_numa_free() to __put_task_struct() tip-bot for Mike Galbraith
  0 siblings, 2 replies; 11+ messages in thread
From: Mike Galbraith @ 2014-02-28  6:23 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Mel Gorman, LKML, RT


Bad idea:
[  908.026136]  [<ffffffff8150ad6a>] rt_spin_lock_slowlock+0xaa/0x2c0
[  908.026145]  [<ffffffff8108f701>] task_numa_free+0x31/0x130
[  908.026151]  [<ffffffff8108121e>] finish_task_switch+0xce/0x100
[  908.026156]  [<ffffffff81509c0a>] thread_return+0x48/0x4ae
[  908.026160]  [<ffffffff8150a095>] schedule+0x25/0xa0
[  908.026163]  [<ffffffff8150ad95>] rt_spin_lock_slowlock+0xd5/0x2c0
[  908.026170]  [<ffffffff810658cf>] get_signal_to_deliver+0xaf/0x680
[  908.026175]  [<ffffffff8100242d>] do_signal+0x3d/0x5b0
[  908.026179]  [<ffffffff81002a30>] do_notify_resume+0x90/0xe0
[  908.026186]  [<ffffffff81513176>] int_signal+0x12/0x17
[  908.026193]  [<00007ff2a388b1d0>] 0x7ff2a388b1cf

Signed-off-by: Mike Galbraith <bitbucket@online.de>

diff --git a/kernel/fork.c b/kernel/fork.c
index a17621c6cd42..332688e5e7b4 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -237,6 +237,7 @@ void __put_task_struct(struct task_struct *tsk)
 	WARN_ON(atomic_read(&tsk->usage));
 	WARN_ON(tsk == current);
 
+	task_numa_free(tsk);
 	security_task_free(tsk);
 	exit_creds(tsk);
 	delayacct_tsk_free(tsk);
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 6edbef296ece..94a963fd040e 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2149,8 +2149,6 @@ static void finish_task_switch(struct rq *rq, struct task_struct *prev)
 	if (mm)
 		mmdrop(mm);
 	if (unlikely(prev_state == TASK_DEAD)) {
-		task_numa_free(prev);
-
 		if (prev->sched_class->task_dead)
 			prev->sched_class->task_dead(prev);
 



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [patch] rt,sched,numa: Move task_numa_free() to __put_task_struct(), which -rt offloads
  2014-02-28  6:23 [patch] rt,sched,numa: Move task_numa_free() to __put_task_struct(), which -rt offloads Mike Galbraith
@ 2014-02-28 11:32 ` Peter Zijlstra
  2014-03-11 12:40 ` [tip:sched/core] sched/numa: Move task_numa_free() to __put_task_struct() tip-bot for Mike Galbraith
  1 sibling, 0 replies; 11+ messages in thread
From: Peter Zijlstra @ 2014-02-28 11:32 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: Mel Gorman, LKML, RT

On Fri, Feb 28, 2014 at 07:23:11AM +0100, Mike Galbraith wrote:
> 
> Bad idea:
> [  908.026136]  [<ffffffff8150ad6a>] rt_spin_lock_slowlock+0xaa/0x2c0
> [  908.026145]  [<ffffffff8108f701>] task_numa_free+0x31/0x130
> [  908.026151]  [<ffffffff8108121e>] finish_task_switch+0xce/0x100
> [  908.026156]  [<ffffffff81509c0a>] thread_return+0x48/0x4ae
> [  908.026160]  [<ffffffff8150a095>] schedule+0x25/0xa0
> [  908.026163]  [<ffffffff8150ad95>] rt_spin_lock_slowlock+0xd5/0x2c0
> [  908.026170]  [<ffffffff810658cf>] get_signal_to_deliver+0xaf/0x680
> [  908.026175]  [<ffffffff8100242d>] do_signal+0x3d/0x5b0
> [  908.026179]  [<ffffffff81002a30>] do_notify_resume+0x90/0xe0
> [  908.026186]  [<ffffffff81513176>] int_signal+0x12/0x17
> [  908.026193]  [<00007ff2a388b1d0>] 0x7ff2a388b1cf
> 
> Signed-off-by: Mike Galbraith <bitbucket@online.de>

Fair enough; thanks!

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [tip:sched/core] sched/numa: Move task_numa_free() to __put_task_struct()
  2014-02-28  6:23 [patch] rt,sched,numa: Move task_numa_free() to __put_task_struct(), which -rt offloads Mike Galbraith
  2014-02-28 11:32 ` Peter Zijlstra
@ 2014-03-11 12:40 ` tip-bot for Mike Galbraith
  2014-04-06 19:17   ` Sasha Levin
  1 sibling, 1 reply; 11+ messages in thread
From: tip-bot for Mike Galbraith @ 2014-03-11 12:40 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, torvalds, peterz, bitbucket, mgorman,
	akpm, tglx

Commit-ID:  156654f491dd8d52687a5fbe1637f472a52ce75b
Gitweb:     http://git.kernel.org/tip/156654f491dd8d52687a5fbe1637f472a52ce75b
Author:     Mike Galbraith <bitbucket@online.de>
AuthorDate: Fri, 28 Feb 2014 07:23:11 +0100
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 11 Mar 2014 12:05:43 +0100

sched/numa: Move task_numa_free() to __put_task_struct()

Bad idea on -rt:

[  908.026136]  [<ffffffff8150ad6a>] rt_spin_lock_slowlock+0xaa/0x2c0
[  908.026145]  [<ffffffff8108f701>] task_numa_free+0x31/0x130
[  908.026151]  [<ffffffff8108121e>] finish_task_switch+0xce/0x100
[  908.026156]  [<ffffffff81509c0a>] thread_return+0x48/0x4ae
[  908.026160]  [<ffffffff8150a095>] schedule+0x25/0xa0
[  908.026163]  [<ffffffff8150ad95>] rt_spin_lock_slowlock+0xd5/0x2c0
[  908.026170]  [<ffffffff810658cf>] get_signal_to_deliver+0xaf/0x680
[  908.026175]  [<ffffffff8100242d>] do_signal+0x3d/0x5b0
[  908.026179]  [<ffffffff81002a30>] do_notify_resume+0x90/0xe0
[  908.026186]  [<ffffffff81513176>] int_signal+0x12/0x17
[  908.026193]  [<00007ff2a388b1d0>] 0x7ff2a388b1cf

and since upstream does not mind where we do this, be a bit nicer ...

Signed-off-by: Mike Galbraith <bitbucket@online.de>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Mel Gorman <mgorman@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1393568591.6018.27.camel@marge.simpson.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/fork.c       | 1 +
 kernel/sched/core.c | 2 --
 2 files changed, 1 insertion(+), 2 deletions(-)

diff --git a/kernel/fork.c b/kernel/fork.c
index a17621c..332688e 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -237,6 +237,7 @@ void __put_task_struct(struct task_struct *tsk)
 	WARN_ON(atomic_read(&tsk->usage));
 	WARN_ON(tsk == current);
 
+	task_numa_free(tsk);
 	security_task_free(tsk);
 	exit_creds(tsk);
 	delayacct_tsk_free(tsk);
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index dd89c27..9e126a2 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2151,8 +2151,6 @@ static void finish_task_switch(struct rq *rq, struct task_struct *prev)
 	if (mm)
 		mmdrop(mm);
 	if (unlikely(prev_state == TASK_DEAD)) {
-		task_numa_free(prev);
-
 		if (prev->sched_class->task_dead)
 			prev->sched_class->task_dead(prev);
 

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [tip:sched/core] sched/numa: Move task_numa_free() to __put_task_struct()
  2014-03-11 12:40 ` [tip:sched/core] sched/numa: Move task_numa_free() to __put_task_struct() tip-bot for Mike Galbraith
@ 2014-04-06 19:17   ` Sasha Levin
  2014-04-07  5:29     ` Mike Galbraith
  0 siblings, 1 reply; 11+ messages in thread
From: Sasha Levin @ 2014-04-06 19:17 UTC (permalink / raw)
  To: mingo, hpa, linux-kernel, torvalds, peterz, bitbucket, mgorman,
	akpm, tglx, linux-tip-commits
  Cc: Dave Jones

On 03/11/2014 08:40 AM, tip-bot for Mike Galbraith wrote:
> Commit-ID:  156654f491dd8d52687a5fbe1637f472a52ce75b
> Gitweb:     http://git.kernel.org/tip/156654f491dd8d52687a5fbe1637f472a52ce75b
> Author:     Mike Galbraith <bitbucket@online.de>
> AuthorDate: Fri, 28 Feb 2014 07:23:11 +0100
> Committer:  Ingo Molnar <mingo@kernel.org>
> CommitDate: Tue, 11 Mar 2014 12:05:43 +0100
> 
> sched/numa: Move task_numa_free() to __put_task_struct()
> 
> Bad idea on -rt:
> 
> [  908.026136]  [<ffffffff8150ad6a>] rt_spin_lock_slowlock+0xaa/0x2c0
> [  908.026145]  [<ffffffff8108f701>] task_numa_free+0x31/0x130
> [  908.026151]  [<ffffffff8108121e>] finish_task_switch+0xce/0x100
> [  908.026156]  [<ffffffff81509c0a>] thread_return+0x48/0x4ae
> [  908.026160]  [<ffffffff8150a095>] schedule+0x25/0xa0
> [  908.026163]  [<ffffffff8150ad95>] rt_spin_lock_slowlock+0xd5/0x2c0
> [  908.026170]  [<ffffffff810658cf>] get_signal_to_deliver+0xaf/0x680
> [  908.026175]  [<ffffffff8100242d>] do_signal+0x3d/0x5b0
> [  908.026179]  [<ffffffff81002a30>] do_notify_resume+0x90/0xe0
> [  908.026186]  [<ffffffff81513176>] int_signal+0x12/0x17
> [  908.026193]  [<00007ff2a388b1d0>] 0x7ff2a388b1cf
> 
> and since upstream does not mind where we do this, be a bit nicer ...
> 
> Signed-off-by: Mike Galbraith <bitbucket@online.de>
> Signed-off-by: Peter Zijlstra <peterz@infradead.org>
> Cc: Mel Gorman <mgorman@suse.com>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Link: http://lkml.kernel.org/r/1393568591.6018.27.camel@marge.simpson.net
> Signed-off-by: Ingo Molnar <mingo@kernel.org>

As it seems, upstream does mind:

[ 2590.260734] ======================================================
[ 2590.261695] [ INFO: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected ]
[ 2590.262748] 3.14.0-next-20140403-sasha-00022-g10224c0 #377 Tainted: G        W
[ 2590.263846] ------------------------------------------------------
[ 2590.264730] trinity-c244/1210 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
[ 2590.265783] (&(&grp->lock)->rlock){+.+...}, at: task_numa_free (kernel/sched/fair.c:1714)
[ 2590.267179]
[ 2590.267179] and this task is already holding:
[ 2590.267996] (&(&new_timer->it_lock)->rlock){-.....}, at: exit_itimers (kernel/posix-timers.c:971 kernel/posix-timers.c:998)
[ 2590.269381] which would create a new lock dependency:
[ 2590.270067]  (&(&new_timer->it_lock)->rlock){-.....} -> (&(&grp->lock)->rlock){+.+...}
[ 2590.270067]
[ 2590.270067] but this new dependency connects a HARDIRQ-irq-safe lock:
[ 2590.270067]  (&(&new_timer->it_lock)->rlock){-.....}
... which became HARDIRQ-irq-safe at:
[ 2590.270067] __lock_acquire (kernel/locking/lockdep.c:2783 kernel/locking/lockdep.c:3138)
[ 2590.270067] lock_acquire (arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602)
[ 2590.270067] _raw_spin_lock_irqsave (include/linux/spinlock_api_smp.h:117 kernel/locking/spinlock.c:159)
[ 2590.270067] posix_timer_fn (kernel/posix-timers.c:437)
[ 2590.270067] __run_hrtimer (kernel/hrtimer.c:1245 (discriminator 2))
[ 2590.270067] hrtimer_interrupt (kernel/hrtimer.c:1892)
[ 2590.270067] local_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:921)
[ 2590.270067] smp_apic_timer_interrupt (arch/x86/include/asm/apic.h:696 arch/x86/kernel/apic/apic.c:945)
[ 2590.270067] apic_timer_interrupt (arch/x86/kernel/entry_64.S:1164)
[ 2590.270067] default_idle (arch/x86/include/asm/paravirt.h:111 arch/x86/kernel/process.c:310)
[ 2590.270067] arch_cpu_idle (arch/x86/kernel/process.c:302)
[ 2590.270067] cpu_idle_loop (kernel/sched/idle.c:179 kernel/sched/idle.c:226)
[ 2590.270067] cpu_startup_entry (??:?)
[ 2590.270067] start_secondary (arch/x86/kernel/smpboot.c:267)
[ 2590.270067]
[ 2590.270067] to a HARDIRQ-irq-unsafe lock:
[ 2590.270067]  (&(&grp->lock)->rlock){+.+...}
... which became HARDIRQ-irq-unsafe at:
[ 2590.270067] ... __lock_acquire (kernel/locking/lockdep.c:2800 kernel/locking/lockdep.c:3138)
[ 2590.270067] lock_acquire (arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602)
[ 2590.270067] _raw_spin_lock (include/linux/spinlock_api_smp.h:143 kernel/locking/spinlock.c:151)
[ 2590.270067] task_numa_placement (include/linux/nodemask.h:248 kernel/sched/fair.c:1504)
[ 2590.270067] task_numa_fault (kernel/sched/fair.c:1794)
[ 2590.270067] __handle_mm_fault (mm/memory.c:3796 mm/memory.c:3796 mm/memory.c:3909)
[ 2590.270067] handle_mm_fault (include/linux/memcontrol.h:148 mm/memory.c:3935)
[ 2590.270067] __do_page_fault (arch/x86/mm/fault.c:1220)
[ 2590.270067] do_page_fault (arch/x86/mm/fault.c:1272 include/linux/jump_label.h:105 include/linux/context_tracking_state.h:27 include/linux/context_tracking.h:45 arch/x86/mm/fault.c:1273)
[ 2590.270067] do_async_page_fault (arch/x86/kernel/kvm.c:263)
[ 2590.270067] async_page_fault (arch/x86/kernel/entry_64.S:1496)
[ 2590.270067]
[ 2590.270067] other info that might help us debug this:
[ 2590.270067]
[ 2590.270067]  Possible interrupt unsafe locking scenario:
[ 2590.270067]
[ 2590.270067]        CPU0                    CPU1
[ 2590.270067]        ----                    ----
[ 2590.270067]   lock(&(&grp->lock)->rlock);
[ 2590.270067]                                local_irq_disable();
[ 2590.270067]                                lock(&(&new_timer->it_lock)->rlock);
[ 2590.270067]                                lock(&(&grp->lock)->rlock);
[ 2590.270067]   <Interrupt>
[ 2590.270067]     lock(&(&new_timer->it_lock)->rlock);
[ 2590.270067]
[ 2590.270067]  *** DEADLOCK ***
[ 2590.270067]
[ 2590.270067] 1 lock held by trinity-c244/1210:
[ 2590.270067] #0: (&(&new_timer->it_lock)->rlock){-.....}, at: exit_itimers (kernel/posix-timers.c:971 kernel/posix-timers.c:998)
[ 2590.270067]
the dependencies between HARDIRQ-irq-safe lock and the holding lock:
[ 2590.270067] -> (&(&new_timer->it_lock)->rlock){-.....} ops: 361 {
[ 2590.270067]    IN-HARDIRQ-W at:
[ 2590.270067] __lock_acquire (kernel/locking/lockdep.c:2783 kernel/locking/lockdep.c:3138)
[ 2590.270067] lock_acquire (arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602)
[ 2590.270067] _raw_spin_lock_irqsave (include/linux/spinlock_api_smp.h:117 kernel/locking/spinlock.c:159)
[ 2590.270067] posix_timer_fn (kernel/posix-timers.c:437)
[ 2590.270067] __run_hrtimer (kernel/hrtimer.c:1245 (discriminator 2))
[ 2590.270067] hrtimer_interrupt (kernel/hrtimer.c:1892)
[ 2590.270067] local_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:921)
[ 2590.270067] smp_apic_timer_interrupt (arch/x86/include/asm/apic.h:696 arch/x86/kernel/apic/apic.c:945)
[ 2590.270067] apic_timer_interrupt (arch/x86/kernel/entry_64.S:1164)
[ 2590.270067] default_idle (arch/x86/include/asm/paravirt.h:111 arch/x86/kernel/process.c:310)
[ 2590.270067] arch_cpu_idle (arch/x86/kernel/process.c:302)
[ 2590.270067] cpu_idle_loop (kernel/sched/idle.c:179 kernel/sched/idle.c:226)
[ 2590.270067] cpu_startup_entry (??:?)
[ 2590.270067] start_secondary (arch/x86/kernel/smpboot.c:267)
[ 2590.270067]    INITIAL USE at:
[ 2590.270067] __lock_acquire (kernel/locking/lockdep.c:3142)
[ 2590.270067] lock_acquire (arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602)
[ 2590.270067] _raw_spin_lock_irqsave (include/linux/spinlock_api_smp.h:117 kernel/locking/spinlock.c:159)
[ 2590.270067] exit_itimers (kernel/posix-timers.c:971 kernel/posix-timers.c:998)
[ 2590.270067] do_exit (kernel/exit.c:766)
[ 2590.270067] do_group_exit (kernel/exit.c:919)
[ 2590.270067] SyS_exit_group (kernel/exit.c:930)
[ 2590.270067] tracesys (arch/x86/kernel/entry_64.S:749)
[ 2590.270067]  }
[ 2590.270067] ... key at: __key.33130 (??:?)
[ 2590.270067]  ... acquired at:
[ 2590.270067] check_irq_usage (kernel/locking/lockdep.c:1638)
[ 2590.270067] __lock_acquire (kernel/locking/lockdep_states.h:7 kernel/locking/lockdep.c:1844 kernel/locking/lockdep.c:1945 kernel/locking/lockdep.c:2131 kernel/locking/lockdep.c:3182)
[ 2590.270067] lock_acquire (arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602)
[ 2590.270067] _raw_spin_lock (include/linux/spinlock_api_smp.h:143 kernel/locking/spinlock.c:151)
[ 2590.270067] task_numa_free (kernel/sched/fair.c:1714)
[ 2590.270067] __put_task_struct (kernel/fork.c:244 (discriminator 2))
[ 2590.270067] posix_cpu_timer_del (include/linux/sched.h:1807 kernel/posix-cpu-timers.c:409)
[ 2590.270067] exit_itimers (kernel/posix-timers.c:973 kernel/posix-timers.c:998)
[ 2590.270067] do_exit (kernel/exit.c:766)
[ 2590.270067] do_group_exit (kernel/exit.c:919)
[ 2590.270067] SyS_exit_group (kernel/exit.c:930)
[ 2590.270067] tracesys (arch/x86/kernel/entry_64.S:749)
[ 2590.270067]
[ 2590.270067]
the dependencies between the lock to be acquired and HARDIRQ-irq-unsafe lock:
[ 2590.270067] -> (&(&grp->lock)->rlock){+.+...} ops: 91 {
[ 2590.270067]    HARDIRQ-ON-W at:
[ 2590.270067] __lock_acquire (kernel/locking/lockdep.c:2800 kernel/locking/lockdep.c:3138)
[ 2590.270067] lock_acquire (arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602)
[ 2590.270067] _raw_spin_lock (include/linux/spinlock_api_smp.h:143 kernel/locking/spinlock.c:151)
[ 2590.270067] task_numa_placement (include/linux/nodemask.h:248 kernel/sched/fair.c:1504)
[ 2590.270067] task_numa_fault (kernel/sched/fair.c:1794)
[ 2590.270067] __handle_mm_fault (mm/memory.c:3796 mm/memory.c:3796 mm/memory.c:3909)
[ 2590.270067] handle_mm_fault (include/linux/memcontrol.h:148 mm/memory.c:3935)
[ 2590.270067] __do_page_fault (arch/x86/mm/fault.c:1220)
[ 2590.270067] do_page_fault (arch/x86/mm/fault.c:1272 include/linux/jump_label.h:105 include/linux/context_tracking_state.h:27 include/linux/context_tracking.h:45 arch/x86/mm/fault.c:1273)
[ 2590.270067] do_async_page_fault (arch/x86/kernel/kvm.c:263)
[ 2590.270067] async_page_fault (arch/x86/kernel/entry_64.S:1496)
[ 2590.270067]    SOFTIRQ-ON-W at:
[ 2590.270067] __lock_acquire (kernel/locking/lockdep.c:2804 kernel/locking/lockdep.c:3138)
[ 2590.270067] lock_acquire (arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602)
[ 2590.270067] _raw_spin_lock (include/linux/spinlock_api_smp.h:143 kernel/locking/spinlock.c:151)
[ 2590.270067] task_numa_placement (include/linux/nodemask.h:248 kernel/sched/fair.c:1504)
[ 2590.270067] task_numa_fault (kernel/sched/fair.c:1794)
[ 2590.270067] __handle_mm_fault (mm/memory.c:3796 mm/memory.c:3796 mm/memory.c:3909)
[ 2590.270067] handle_mm_fault (include/linux/memcontrol.h:148 mm/memory.c:3935)
[ 2590.270067] __do_page_fault (arch/x86/mm/fault.c:1220)
[ 2590.270067] do_page_fault (arch/x86/mm/fault.c:1272 include/linux/jump_label.h:105 include/linux/context_tracking_state.h:27 include/linux/context_tracking.h:45 arch/x86/mm/fault.c:1273)
[ 2590.270067] do_async_page_fault (arch/x86/kernel/kvm.c:263)
[ 2590.270067] async_page_fault (arch/x86/kernel/entry_64.S:1496)
[ 2590.270067]    INITIAL USE at:
[ 2590.270067] __lock_acquire (kernel/locking/lockdep.c:3142)
[ 2590.270067] lock_acquire (arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602)
[ 2590.270067] _raw_spin_lock (include/linux/spinlock_api_smp.h:143 kernel/locking/spinlock.c:151)
[ 2590.270067] task_numa_placement (include/linux/nodemask.h:248 kernel/sched/fair.c:1504)
[ 2590.270067] task_numa_fault (kernel/sched/fair.c:1794)
[ 2590.270067] __handle_mm_fault (mm/memory.c:3796 mm/memory.c:3796 mm/memory.c:3909)
[ 2590.270067] handle_mm_fault (include/linux/memcontrol.h:148 mm/memory.c:3935)
[ 2590.270067] __do_page_fault (arch/x86/mm/fault.c:1220)
[ 2590.270067] do_page_fault (arch/x86/mm/fault.c:1272 include/linux/jump_label.h:105 include/linux/context_tracking_state.h:27 include/linux/context_tracking.h:45 arch/x86/mm/fault.c:1273)
[ 2590.270067] do_async_page_fault (arch/x86/kernel/kvm.c:263)
[ 2590.270067] async_page_fault (arch/x86/kernel/entry_64.S:1496)
[ 2590.270067]  }
[ 2590.270067] ... key at: __key.32449 (??:?)
[ 2590.270067]  ... acquired at:
[ 2590.270067] check_irq_usage (kernel/locking/lockdep.c:1638)
[ 2590.270067] __lock_acquire (kernel/locking/lockdep_states.h:7 kernel/locking/lockdep.c:1844 kernel/locking/lockdep.c:1945 kernel/locking/lockdep.c:2131 kernel/locking/lockdep.c:3182)
[ 2590.270067] lock_acquire (arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602)
[ 2590.270067] _raw_spin_lock (include/linux/spinlock_api_smp.h:143 kernel/locking/spinlock.c:151)
[ 2590.270067] task_numa_free (kernel/sched/fair.c:1714)
[ 2590.270067] __put_task_struct (kernel/fork.c:244 (discriminator 2))
[ 2590.270067] posix_cpu_timer_del (include/linux/sched.h:1807 kernel/posix-cpu-timers.c:409)
[ 2590.270067] exit_itimers (kernel/posix-timers.c:973 kernel/posix-timers.c:998)
[ 2590.270067] do_exit (kernel/exit.c:766)
[ 2590.270067] do_group_exit (kernel/exit.c:919)
[ 2590.270067] SyS_exit_group (kernel/exit.c:930)
[ 2590.270067] tracesys (arch/x86/kernel/entry_64.S:749)
[ 2590.270067]
[ 2590.270067]
[ 2590.270067] stack backtrace:
[ 2590.270067] CPU: 3 PID: 1210 Comm: trinity-c244 Tainted: G        W     3.14.0-next-20140403-sasha-00022-g10224c0 #377
[ 2590.270067]  ffffffff87a83b60 ffff880081695ad8 ffffffff844bfb3f 0000000000000000
[ 2590.270067]  ffff880081698cf0 ffff880081695be8 ffffffff811c0d05 0000000000000000
[ 2590.270067]  ffffffff00000000 ffff880000000001 ffffffff8107aac5 ffff880081695b38
[ 2590.270067] Call Trace:
[ 2590.270067] dump_stack (lib/dump_stack.c:52)
[ 2590.270067] check_usage (kernel/locking/lockdep.c:1549 kernel/locking/lockdep.c:1580)
[ 2590.270067] ? sched_clock (arch/x86/include/asm/paravirt.h:192 arch/x86/kernel/tsc.c:305)
[ 2590.270067] ? sched_clock_cpu (kernel/sched/clock.c:311)
[ 2590.270067] check_irq_usage (kernel/locking/lockdep.c:1638)
[ 2590.270067] __lock_acquire (kernel/locking/lockdep_states.h:7 kernel/locking/lockdep.c:1844 kernel/locking/lockdep.c:1945 kernel/locking/lockdep.c:2131 kernel/locking/lockdep.c:3182)
[ 2590.270067] ? __lock_acquire (kernel/locking/lockdep.c:3189)
[ 2590.270067] lock_acquire (arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602)
[ 2590.270067] ? task_numa_free (kernel/sched/fair.c:1714)
[ 2590.270067] _raw_spin_lock (include/linux/spinlock_api_smp.h:143 kernel/locking/spinlock.c:151)
[ 2590.270067] ? task_numa_free (kernel/sched/fair.c:1714)
[ 2590.270067] ? trace_hardirqs_off_caller (kernel/locking/lockdep.c:2638 (discriminator 2))
[ 2590.270067] task_numa_free (kernel/sched/fair.c:1714)
[ 2590.270067] __put_task_struct (kernel/fork.c:244 (discriminator 2))
[ 2590.270067] posix_cpu_timer_del (include/linux/sched.h:1807 kernel/posix-cpu-timers.c:409)
[ 2590.270067] exit_itimers (kernel/posix-timers.c:973 kernel/posix-timers.c:998)
[ 2590.270067] do_exit (kernel/exit.c:766)
[ 2590.270067] ? __this_cpu_preempt_check (lib/smp_processor_id.c:63)
[ 2590.270067] ? trace_hardirqs_on_caller (kernel/locking/lockdep.c:2557 kernel/locking/lockdep.c:2599)
[ 2590.270067] ? trace_hardirqs_on (kernel/locking/lockdep.c:2607)
[ 2590.270067] do_group_exit (kernel/exit.c:919)
[ 2590.270067] SyS_exit_group (kernel/exit.c:930)
[ 2590.270067] tracesys (arch/x86/kernel/entry_64.S:749)


Thanks,
Sasha

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [tip:sched/core] sched/numa: Move task_numa_free() to __put_task_struct()
  2014-04-06 19:17   ` Sasha Levin
@ 2014-04-07  5:29     ` Mike Galbraith
  2014-04-07  7:30       ` Mike Galbraith
  0 siblings, 1 reply; 11+ messages in thread
From: Mike Galbraith @ 2014-04-07  5:29 UTC (permalink / raw)
  To: Sasha Levin
  Cc: mingo, hpa, linux-kernel, torvalds, peterz, mgorman, akpm, tglx,
	linux-tip-commits, Dave Jones

On Sun, 2014-04-06 at 15:17 -0400, Sasha Levin wrote: 
> On 03/11/2014 08:40 AM, tip-bot for Mike Galbraith wrote:
> > Commit-ID:  156654f491dd8d52687a5fbe1637f472a52ce75b
> > Gitweb:     http://git.kernel.org/tip/156654f491dd8d52687a5fbe1637f472a52ce75b
> > Author:     Mike Galbraith <bitbucket@online.de>
> > AuthorDate: Fri, 28 Feb 2014 07:23:11 +0100
> > Committer:  Ingo Molnar <mingo@kernel.org>
> > CommitDate: Tue, 11 Mar 2014 12:05:43 +0100
> > 
> > sched/numa: Move task_numa_free() to __put_task_struct()
> > 
> > Bad idea on -rt:
> > 
> > [  908.026136]  [<ffffffff8150ad6a>] rt_spin_lock_slowlock+0xaa/0x2c0
> > [  908.026145]  [<ffffffff8108f701>] task_numa_free+0x31/0x130
> > [  908.026151]  [<ffffffff8108121e>] finish_task_switch+0xce/0x100
> > [  908.026156]  [<ffffffff81509c0a>] thread_return+0x48/0x4ae
> > [  908.026160]  [<ffffffff8150a095>] schedule+0x25/0xa0
> > [  908.026163]  [<ffffffff8150ad95>] rt_spin_lock_slowlock+0xd5/0x2c0
> > [  908.026170]  [<ffffffff810658cf>] get_signal_to_deliver+0xaf/0x680
> > [  908.026175]  [<ffffffff8100242d>] do_signal+0x3d/0x5b0
> > [  908.026179]  [<ffffffff81002a30>] do_notify_resume+0x90/0xe0
> > [  908.026186]  [<ffffffff81513176>] int_signal+0x12/0x17
> > [  908.026193]  [<00007ff2a388b1d0>] 0x7ff2a388b1cf
> > 
> > and since upstream does not mind where we do this, be a bit nicer ...
> > 
> > Signed-off-by: Mike Galbraith <bitbucket@online.de>
> > Signed-off-by: Peter Zijlstra <peterz@infradead.org>
> > Cc: Mel Gorman <mgorman@suse.com>
> > Cc: Linus Torvalds <torvalds@linux-foundation.org>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Link: http://lkml.kernel.org/r/1393568591.6018.27.camel@marge.simpson.net
> > Signed-off-by: Ingo Molnar <mingo@kernel.org>
> 
> As it seems, upstream does mind:
> 
> [ 2590.260734] ======================================================
> [ 2590.261695] [ INFO: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected ]
> [ 2590.262748] 3.14.0-next-20140403-sasha-00022-g10224c0 #377 Tainted: G        W
> [ 2590.263846] ------------------------------------------------------
> [ 2590.264730] trinity-c244/1210 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
> [ 2590.265783] (&(&grp->lock)->rlock){+.+...}, at: task_numa_free (kernel/sched/fair.c:1714)
> [ 2590.267179]
> [ 2590.267179] and this task is already holding:
> [ 2590.267996] (&(&new_timer->it_lock)->rlock){-.....}, at: exit_itimers (kernel/posix-timers.c:971 kernel/posix-timers.c:998)
> [ 2590.269381] which would create a new lock dependency:
> [ 2590.270067]  (&(&new_timer->it_lock)->rlock){-.....} -> (&(&grp->lock)->rlock){+.+...}

I'm not getting it.

I moved task_numa_free() from one interrupts enabled spot to another.
But, with numa=fake=4 and lockdep enabled, not only does lockdep gripe,
my little box locks up on splat.  Saying spin_lock/unlock_irq() did the
expected, just moved lockdep gripe to task_numa_fault().


> [ 2590.270067]  Possible interrupt unsafe locking scenario:
> [ 2590.270067]
> [ 2590.270067]        CPU0                    CPU1
> [ 2590.270067]        ----                    ----
> [ 2590.270067]   lock(&(&grp->lock)->rlock);
> [ 2590.270067]                                local_irq_disable();
> [ 2590.270067]                                lock(&(&new_timer->it_lock)->rlock);
> [ 2590.270067]                                lock(&(&grp->lock)->rlock);
> [ 2590.270067]   <Interrupt>
> [ 2590.270067]     lock(&(&new_timer->it_lock)->rlock);
> [ 2590.270067]
> [ 2590.270067]  *** DEADLOCK ***

Ok, so how did I manage that HARDIRQ-safe -> HARDIRQ-unsafe?

-Mike


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [tip:sched/core] sched/numa: Move task_numa_free() to __put_task_struct()
  2014-04-07  5:29     ` Mike Galbraith
@ 2014-04-07  7:30       ` Mike Galbraith
  2014-04-07  8:16         ` Peter Zijlstra
  0 siblings, 1 reply; 11+ messages in thread
From: Mike Galbraith @ 2014-04-07  7:30 UTC (permalink / raw)
  To: Sasha Levin
  Cc: mingo, hpa, linux-kernel, torvalds, peterz, mgorman, akpm, tglx,
	linux-tip-commits, Dave Jones

On Mon, 2014-04-07 at 07:29 +0200, Mike Galbraith wrote:

> I'm not getting it.
> 
> I moved task_numa_free() from one interrupts enabled spot to another.
> But, with numa=fake=4 and lockdep enabled, not only does lockdep gripe,
> my little box locks up on splat.  Saying spin_lock/unlock_irq() did the
> expected, just moved lockdep gripe to task_numa_fault().
> 
> 
> > [ 2590.270067]  Possible interrupt unsafe locking scenario:
> > [ 2590.270067]
> > [ 2590.270067]        CPU0                    CPU1
> > [ 2590.270067]        ----                    ----
> > [ 2590.270067]   lock(&(&grp->lock)->rlock);
> > [ 2590.270067]                                local_irq_disable();
> > [ 2590.270067]                                lock(&(&new_timer->it_lock)->rlock);
> > [ 2590.270067]                                lock(&(&grp->lock)->rlock);
> > [ 2590.270067]   <Interrupt>
> > [ 2590.270067]     lock(&(&new_timer->it_lock)->rlock);
> > [ 2590.270067]
> > [ 2590.270067]  *** DEADLOCK ***
> 
> Ok, so how did I manage that HARDIRQ-safe -> HARDIRQ-unsafe?

Think I'll turn lockdep off, and make context switches take a good long
while after finish_lock_switch(), but meanwhile, this made it happy.

Sasha reports that lockdep claims 156654f491dd8d52687a5fbe1637f472a52ce75b made
numa_group.lock interrupt unsafe.  While I don't see how that could be given the
commit in question moved task_numa_free() from one irq enabled region to another,
the below does make both gripes and lockup upon gripe with numa=fake=4 go away.

Reported-by: Sasha Levin <sasha.levin@oracle.com>
Not-signed-off-by: Mike Galbraith <bitbucket@online.de>
---
 kernel/sched/fair.c  |   12 +++++++-----
 kernel/sched/sched.h |    9 +++++++++
 2 files changed, 16 insertions(+), 5 deletions(-)

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1497,7 +1497,7 @@ static void task_numa_placement(struct t
 	/* If the task is part of a group prevent parallel updates to group stats */
 	if (p->numa_group) {
 		group_lock = &p->numa_group->lock;
-		spin_lock(group_lock);
+		spin_lock_irq(group_lock);
 	}
 
 	/* Find the node with the highest number of faults */
@@ -1572,7 +1572,7 @@ static void task_numa_placement(struct t
 			}
 		}
 
-		spin_unlock(group_lock);
+		spin_unlock_irq(group_lock);
 	}
 
 	/* Preferred node as the node with the most faults */
@@ -1677,7 +1677,8 @@ static void task_numa_group(struct task_
 	if (!join)
 		return;
 
-	double_lock(&my_grp->lock, &grp->lock);
+	BUG_ON(irqs_disabled());
+	double_lock_irq(&my_grp->lock, &grp->lock);
 
 	for (i = 0; i < NR_NUMA_HINT_FAULT_STATS * nr_node_ids; i++) {
 		my_grp->faults[i] -= p->numa_faults_memory[i];
@@ -1692,6 +1693,7 @@ static void task_numa_group(struct task_
 
 	spin_unlock(&my_grp->lock);
 	spin_unlock(&grp->lock);
+	local_irq_enable();
 
 	rcu_assign_pointer(p->numa_group, grp);
 
@@ -1710,14 +1712,14 @@ void task_numa_free(struct task_struct *
 	void *numa_faults = p->numa_faults_memory;
 
 	if (grp) {
-		spin_lock(&grp->lock);
+		spin_lock_irq(&grp->lock);
 		for (i = 0; i < NR_NUMA_HINT_FAULT_STATS * nr_node_ids; i++)
 			grp->faults[i] -= p->numa_faults_memory[i];
 		grp->total_faults -= p->total_numa_faults;
 
 		list_del(&p->numa_entry);
 		grp->nr_tasks--;
-		spin_unlock(&grp->lock);
+		spin_unlock_irq(&grp->lock);
 		rcu_assign_pointer(p->numa_group, NULL);
 		put_numa_group(grp);
 	}
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1388,6 +1388,15 @@ static inline void double_lock(spinlock_
 	spin_lock_nested(l2, SINGLE_DEPTH_NESTING);
 }
 
+static inline void double_lock_irq(spinlock_t *l1, spinlock_t *l2)
+{
+	if (l1 > l2)
+		swap(l1, l2);
+
+	spin_lock_irq(l1);
+	spin_lock_nested(l2, SINGLE_DEPTH_NESTING);
+}
+
 static inline void double_raw_lock(raw_spinlock_t *l1, raw_spinlock_t *l2)
 {
 	if (l1 > l2)



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [tip:sched/core] sched/numa: Move task_numa_free() to __put_task_struct()
  2014-04-07  7:30       ` Mike Galbraith
@ 2014-04-07  8:16         ` Peter Zijlstra
  2014-04-07  8:40           ` Mike Galbraith
  2014-04-07  8:55           ` Mike Galbraith
  0 siblings, 2 replies; 11+ messages in thread
From: Peter Zijlstra @ 2014-04-07  8:16 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Sasha Levin, mingo, hpa, linux-kernel, torvalds, mgorman, akpm,
	tglx, linux-tip-commits, Dave Jones

On Mon, Apr 07, 2014 at 09:30:30AM +0200, Mike Galbraith wrote:
> -	double_lock(&my_grp->lock, &grp->lock);
> +	BUG_ON(irqs_disabled());
> +	double_lock_irq(&my_grp->lock, &grp->lock);

So either make this:

	local_irq_disable();
	double_lock();

or

>  
>  	for (i = 0; i < NR_NUMA_HINT_FAULT_STATS * nr_node_ids; i++) {
>  		my_grp->faults[i] -= p->numa_faults_memory[i];
> @@ -1692,6 +1693,7 @@ static void task_numa_group(struct task_
>  
>  	spin_unlock(&my_grp->lock);
>  	spin_unlock(&grp->lock);
> +	local_irq_enable();

use:
	spin_unlock()
	spin_unlock_irq()

or so, but this imbalance is making my itch :-)


>  
>  	rcu_assign_pointer(p->numa_group, grp);
>  

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [tip:sched/core] sched/numa: Move task_numa_free() to __put_task_struct()
  2014-04-07  8:16         ` Peter Zijlstra
@ 2014-04-07  8:40           ` Mike Galbraith
  2014-04-07  8:55           ` Mike Galbraith
  1 sibling, 0 replies; 11+ messages in thread
From: Mike Galbraith @ 2014-04-07  8:40 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Sasha Levin, mingo, hpa, linux-kernel, torvalds, mgorman, akpm,
	tglx, linux-tip-commits, Dave Jones

On Mon, 2014-04-07 at 10:16 +0200, Peter Zijlstra wrote: 
> On Mon, Apr 07, 2014 at 09:30:30AM +0200, Mike Galbraith wrote:
> > -	double_lock(&my_grp->lock, &grp->lock);
> > +	BUG_ON(irqs_disabled());
> > +	double_lock_irq(&my_grp->lock, &grp->lock);
> 
> So either make this:
> 
> 	local_irq_disable();
> 	double_lock();
> 
> or
> 
> >  
> >  	for (i = 0; i < NR_NUMA_HINT_FAULT_STATS * nr_node_ids; i++) {
> >  		my_grp->faults[i] -= p->numa_faults_memory[i];
> > @@ -1692,6 +1693,7 @@ static void task_numa_group(struct task_
> >  
> >  	spin_unlock(&my_grp->lock);
> >  	spin_unlock(&grp->lock);
> > +	local_irq_enable();
> 
> use:
> 	spin_unlock()
> 	spin_unlock_irq()

*thwap*  Well duh.

> or so, but this imbalance is making my itch :-)

Yeah, much better.

Before I actually sign that off, mind cluing me in as to why I should
not be sitting here thinking lockdep smoked its breakfast?

-Mike


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [tip:sched/core] sched/numa: Move task_numa_free() to __put_task_struct()
  2014-04-07  8:16         ` Peter Zijlstra
  2014-04-07  8:40           ` Mike Galbraith
@ 2014-04-07  8:55           ` Mike Galbraith
  2014-04-13 20:53             ` Govindarajulu Varadarajan
  2014-04-14  7:22             ` [tip:sched/urgent] sched/numa: Fix task_numa_free() lockdep splat tip-bot for Mike Galbraith
  1 sibling, 2 replies; 11+ messages in thread
From: Mike Galbraith @ 2014-04-07  8:55 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Sasha Levin, mingo, hpa, linux-kernel, torvalds, mgorman, akpm,
	tglx, linux-tip-commits, Dave Jones

On Mon, 2014-04-07 at 10:16 +0200, Peter Zijlstra wrote: 
> On Mon, Apr 07, 2014 at 09:30:30AM +0200, Mike Galbraith wrote:
> > -	double_lock(&my_grp->lock, &grp->lock);
> > +	BUG_ON(irqs_disabled());
> > +	double_lock_irq(&my_grp->lock, &grp->lock);
> 
> So either make this:
> 
> 	local_irq_disable();
> 	double_lock();
> 
> or
> 
> >  
> >  	for (i = 0; i < NR_NUMA_HINT_FAULT_STATS * nr_node_ids; i++) {
> >  		my_grp->faults[i] -= p->numa_faults_memory[i];
> > @@ -1692,6 +1693,7 @@ static void task_numa_group(struct task_
> >  
> >  	spin_unlock(&my_grp->lock);
> >  	spin_unlock(&grp->lock);
> > +	local_irq_enable();
> 
> use:
> 	spin_unlock()
> 	spin_unlock_irq()
> 
> or so, but this imbalance is making my itch :-)

sched, numa: fix task_numa_free() lockdep splat

Sasha reports that lockdep claims 156654f491dd8d52687a5fbe1637f472a52ce75b made
numa_group.lock interrupt unsafe.  While I don't see how that could be given the
commit in question moved task_numa_free() from one irq enabled region to another,
the below does make both gripes and lockups upon gripe with numa=fake=4 go away.

Reported-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Mike Galbraith <bitbucket@online.de>
---
 kernel/sched/fair.c  |   13 +++++++------
 kernel/sched/sched.h |    9 +++++++++
 2 files changed, 16 insertions(+), 6 deletions(-)

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1497,7 +1497,7 @@ static void task_numa_placement(struct t
 	/* If the task is part of a group prevent parallel updates to group stats */
 	if (p->numa_group) {
 		group_lock = &p->numa_group->lock;
-		spin_lock(group_lock);
+		spin_lock_irq(group_lock);
 	}
 
 	/* Find the node with the highest number of faults */
@@ -1572,7 +1572,7 @@ static void task_numa_placement(struct t
 			}
 		}
 
-		spin_unlock(group_lock);
+		spin_unlock_irq(group_lock);
 	}
 
 	/* Preferred node as the node with the most faults */
@@ -1677,7 +1677,8 @@ static void task_numa_group(struct task_
 	if (!join)
 		return;
 
-	double_lock(&my_grp->lock, &grp->lock);
+	BUG_ON(irqs_disabled());
+	double_lock_irq(&my_grp->lock, &grp->lock);
 
 	for (i = 0; i < NR_NUMA_HINT_FAULT_STATS * nr_node_ids; i++) {
 		my_grp->faults[i] -= p->numa_faults_memory[i];
@@ -1691,7 +1692,7 @@ static void task_numa_group(struct task_
 	grp->nr_tasks++;
 
 	spin_unlock(&my_grp->lock);
-	spin_unlock(&grp->lock);
+	spin_unlock_irq(&grp->lock);
 
 	rcu_assign_pointer(p->numa_group, grp);
 
@@ -1710,14 +1711,14 @@ void task_numa_free(struct task_struct *
 	void *numa_faults = p->numa_faults_memory;
 
 	if (grp) {
-		spin_lock(&grp->lock);
+		spin_lock_irq(&grp->lock);
 		for (i = 0; i < NR_NUMA_HINT_FAULT_STATS * nr_node_ids; i++)
 			grp->faults[i] -= p->numa_faults_memory[i];
 		grp->total_faults -= p->total_numa_faults;
 
 		list_del(&p->numa_entry);
 		grp->nr_tasks--;
-		spin_unlock(&grp->lock);
+		spin_unlock_irq(&grp->lock);
 		rcu_assign_pointer(p->numa_group, NULL);
 		put_numa_group(grp);
 	}
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1388,6 +1388,15 @@ static inline void double_lock(spinlock_
 	spin_lock_nested(l2, SINGLE_DEPTH_NESTING);
 }
 
+static inline void double_lock_irq(spinlock_t *l1, spinlock_t *l2)
+{
+	if (l1 > l2)
+		swap(l1, l2);
+
+	spin_lock_irq(l1);
+	spin_lock_nested(l2, SINGLE_DEPTH_NESTING);
+}
+
 static inline void double_raw_lock(raw_spinlock_t *l1, raw_spinlock_t *l2)
 {
 	if (l1 > l2)



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [tip:sched/core] sched/numa: Move task_numa_free() to __put_task_struct()
  2014-04-07  8:55           ` Mike Galbraith
@ 2014-04-13 20:53             ` Govindarajulu Varadarajan
  2014-04-14  7:22             ` [tip:sched/urgent] sched/numa: Fix task_numa_free() lockdep splat tip-bot for Mike Galbraith
  1 sibling, 0 replies; 11+ messages in thread
From: Govindarajulu Varadarajan @ 2014-04-13 20:53 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Peter Zijlstra, Sasha Levin, mingo, hpa, linux-kernel, torvalds,
	mgorman, akpm, tglx, linux-tip-commits, Dave Jones



On Mon, 7 Apr 2014, Mike Galbraith wrote:

> On Mon, 2014-04-07 at 10:16 +0200, Peter Zijlstra wrote:
>> On Mon, Apr 07, 2014 at 09:30:30AM +0200, Mike Galbraith wrote:
>>> -	double_lock(&my_grp->lock, &grp->lock);
>>> +	BUG_ON(irqs_disabled());
>>> +	double_lock_irq(&my_grp->lock, &grp->lock);
>>
>> So either make this:
>>
>> 	local_irq_disable();
>> 	double_lock();
>>
>> or
>>
>>>
>>>  	for (i = 0; i < NR_NUMA_HINT_FAULT_STATS * nr_node_ids; i++) {
>>>  		my_grp->faults[i] -= p->numa_faults_memory[i];
>>> @@ -1692,6 +1693,7 @@ static void task_numa_group(struct task_
>>>
>>>  	spin_unlock(&my_grp->lock);
>>>  	spin_unlock(&grp->lock);
>>> +	local_irq_enable();
>>
>> use:
>> 	spin_unlock()
>> 	spin_unlock_irq()
>>
>> or so, but this imbalance is making my itch :-)
>
> sched, numa: fix task_numa_free() lockdep splat
>
> Sasha reports that lockdep claims 156654f491dd8d52687a5fbe1637f472a52ce75b made
> numa_group.lock interrupt unsafe.  While I don't see how that could be given the
> commit in question moved task_numa_free() from one irq enabled region to another,
> the below does make both gripes and lockups upon gripe with numa=fake=4 go away.
>

Hi

I Am hitting this bug quite frequently. I do not see the problem after applying
this patch.

Thanks

Tested-by: Govindarajulu Varadarajan <govindx7c6@gmail.com>

> Reported-by: Sasha Levin <sasha.levin@oracle.com>
> Signed-off-by: Mike Galbraith <bitbucket@online.de>
> ---
> kernel/sched/fair.c  |   13 +++++++------
> kernel/sched/sched.h |    9 +++++++++
> 2 files changed, 16 insertions(+), 6 deletions(-)
>
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -1497,7 +1497,7 @@ static void task_numa_placement(struct t
> 	/* If the task is part of a group prevent parallel updates to group stats */
> 	if (p->numa_group) {
> 		group_lock = &p->numa_group->lock;
> -		spin_lock(group_lock);
> +		spin_lock_irq(group_lock);
> 	}
>
> 	/* Find the node with the highest number of faults */
> @@ -1572,7 +1572,7 @@ static void task_numa_placement(struct t
> 			}
> 		}
>
> -		spin_unlock(group_lock);
> +		spin_unlock_irq(group_lock);
> 	}
>
> 	/* Preferred node as the node with the most faults */
> @@ -1677,7 +1677,8 @@ static void task_numa_group(struct task_
> 	if (!join)
> 		return;
>
> -	double_lock(&my_grp->lock, &grp->lock);
> +	BUG_ON(irqs_disabled());
> +	double_lock_irq(&my_grp->lock, &grp->lock);
>
> 	for (i = 0; i < NR_NUMA_HINT_FAULT_STATS * nr_node_ids; i++) {
> 		my_grp->faults[i] -= p->numa_faults_memory[i];
> @@ -1691,7 +1692,7 @@ static void task_numa_group(struct task_
> 	grp->nr_tasks++;
>
> 	spin_unlock(&my_grp->lock);
> -	spin_unlock(&grp->lock);
> +	spin_unlock_irq(&grp->lock);
>
> 	rcu_assign_pointer(p->numa_group, grp);
>
> @@ -1710,14 +1711,14 @@ void task_numa_free(struct task_struct *
> 	void *numa_faults = p->numa_faults_memory;
>
> 	if (grp) {
> -		spin_lock(&grp->lock);
> +		spin_lock_irq(&grp->lock);
> 		for (i = 0; i < NR_NUMA_HINT_FAULT_STATS * nr_node_ids; i++)
> 			grp->faults[i] -= p->numa_faults_memory[i];
> 		grp->total_faults -= p->total_numa_faults;
>
> 		list_del(&p->numa_entry);
> 		grp->nr_tasks--;
> -		spin_unlock(&grp->lock);
> +		spin_unlock_irq(&grp->lock);
> 		rcu_assign_pointer(p->numa_group, NULL);
> 		put_numa_group(grp);
> 	}
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -1388,6 +1388,15 @@ static inline void double_lock(spinlock_
> 	spin_lock_nested(l2, SINGLE_DEPTH_NESTING);
> }
>
> +static inline void double_lock_irq(spinlock_t *l1, spinlock_t *l2)
> +{
> +	if (l1 > l2)
> +		swap(l1, l2);
> +
> +	spin_lock_irq(l1);
> +	spin_lock_nested(l2, SINGLE_DEPTH_NESTING);
> +}
> +
> static inline void double_raw_lock(raw_spinlock_t *l1, raw_spinlock_t *l2)
> {
> 	if (l1 > l2)
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [tip:sched/urgent] sched/numa: Fix task_numa_free() lockdep splat
  2014-04-07  8:55           ` Mike Galbraith
  2014-04-13 20:53             ` Govindarajulu Varadarajan
@ 2014-04-14  7:22             ` tip-bot for Mike Galbraith
  1 sibling, 0 replies; 11+ messages in thread
From: tip-bot for Mike Galbraith @ 2014-04-14  7:22 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, sasha.levin, hpa, mingo, peterz, bitbucket, davej,
	tglx

Commit-ID:  60e69eed85bb7b5198ef70643b5895c26ad76ef7
Gitweb:     http://git.kernel.org/tip/60e69eed85bb7b5198ef70643b5895c26ad76ef7
Author:     Mike Galbraith <bitbucket@online.de>
AuthorDate: Mon, 7 Apr 2014 10:55:15 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Fri, 11 Apr 2014 10:39:15 +0200

sched/numa: Fix task_numa_free() lockdep splat

Sasha reported that lockdep claims that the following commit:
made numa_group.lock interrupt unsafe:

  156654f491dd ("sched/numa: Move task_numa_free() to __put_task_struct()")

While I don't see how that could be, given the commit in question moved
task_numa_free() from one irq enabled region to another, the below does
make both gripes and lockups upon gripe with numa=fake=4 go away.

Reported-by: Sasha Levin <sasha.levin@oracle.com>
Fixes: 156654f491dd ("sched/numa: Move task_numa_free() to __put_task_struct()")
Signed-off-by: Mike Galbraith <bitbucket@online.de>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: torvalds@linux-foundation.org
Cc: mgorman@suse.com
Cc: akpm@linux-foundation.org
Cc: Dave Jones <davej@redhat.com>
Link: http://lkml.kernel.org/r/1396860915.5170.5.camel@marge.simpson.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/fair.c  | 13 +++++++------
 kernel/sched/sched.h |  9 +++++++++
 2 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7e9bd0b..4f14a65 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1497,7 +1497,7 @@ static void task_numa_placement(struct task_struct *p)
 	/* If the task is part of a group prevent parallel updates to group stats */
 	if (p->numa_group) {
 		group_lock = &p->numa_group->lock;
-		spin_lock(group_lock);
+		spin_lock_irq(group_lock);
 	}
 
 	/* Find the node with the highest number of faults */
@@ -1572,7 +1572,7 @@ static void task_numa_placement(struct task_struct *p)
 			}
 		}
 
-		spin_unlock(group_lock);
+		spin_unlock_irq(group_lock);
 	}
 
 	/* Preferred node as the node with the most faults */
@@ -1677,7 +1677,8 @@ static void task_numa_group(struct task_struct *p, int cpupid, int flags,
 	if (!join)
 		return;
 
-	double_lock(&my_grp->lock, &grp->lock);
+	BUG_ON(irqs_disabled());
+	double_lock_irq(&my_grp->lock, &grp->lock);
 
 	for (i = 0; i < NR_NUMA_HINT_FAULT_STATS * nr_node_ids; i++) {
 		my_grp->faults[i] -= p->numa_faults_memory[i];
@@ -1691,7 +1692,7 @@ static void task_numa_group(struct task_struct *p, int cpupid, int flags,
 	grp->nr_tasks++;
 
 	spin_unlock(&my_grp->lock);
-	spin_unlock(&grp->lock);
+	spin_unlock_irq(&grp->lock);
 
 	rcu_assign_pointer(p->numa_group, grp);
 
@@ -1710,14 +1711,14 @@ void task_numa_free(struct task_struct *p)
 	void *numa_faults = p->numa_faults_memory;
 
 	if (grp) {
-		spin_lock(&grp->lock);
+		spin_lock_irq(&grp->lock);
 		for (i = 0; i < NR_NUMA_HINT_FAULT_STATS * nr_node_ids; i++)
 			grp->faults[i] -= p->numa_faults_memory[i];
 		grp->total_faults -= p->total_numa_faults;
 
 		list_del(&p->numa_entry);
 		grp->nr_tasks--;
-		spin_unlock(&grp->lock);
+		spin_unlock_irq(&grp->lock);
 		rcu_assign_pointer(p->numa_group, NULL);
 		put_numa_group(grp);
 	}
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index c9007f2..456e492 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1385,6 +1385,15 @@ static inline void double_lock(spinlock_t *l1, spinlock_t *l2)
 	spin_lock_nested(l2, SINGLE_DEPTH_NESTING);
 }
 
+static inline void double_lock_irq(spinlock_t *l1, spinlock_t *l2)
+{
+	if (l1 > l2)
+		swap(l1, l2);
+
+	spin_lock_irq(l1);
+	spin_lock_nested(l2, SINGLE_DEPTH_NESTING);
+}
+
 static inline void double_raw_lock(raw_spinlock_t *l1, raw_spinlock_t *l2)
 {
 	if (l1 > l2)

^ permalink raw reply related	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2014-04-14  7:23 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-02-28  6:23 [patch] rt,sched,numa: Move task_numa_free() to __put_task_struct(), which -rt offloads Mike Galbraith
2014-02-28 11:32 ` Peter Zijlstra
2014-03-11 12:40 ` [tip:sched/core] sched/numa: Move task_numa_free() to __put_task_struct() tip-bot for Mike Galbraith
2014-04-06 19:17   ` Sasha Levin
2014-04-07  5:29     ` Mike Galbraith
2014-04-07  7:30       ` Mike Galbraith
2014-04-07  8:16         ` Peter Zijlstra
2014-04-07  8:40           ` Mike Galbraith
2014-04-07  8:55           ` Mike Galbraith
2014-04-13 20:53             ` Govindarajulu Varadarajan
2014-04-14  7:22             ` [tip:sched/urgent] sched/numa: Fix task_numa_free() lockdep splat tip-bot for Mike Galbraith

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).