From: Sasha Levin <sasha.levin@oracle.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Kirill Tkhai <ktkhai@parallels.com>,
linux-kernel@vger.kernel.org, Oleg Nesterov <oleg@redhat.com>,
Ingo Molnar <mingo@redhat.com>,
Vladimir Davydov <vdavydov@parallels.com>,
Kirill Tkhai <tkhai@yandex.ru>
Subject: Re: [PATCH v4] sched/numa: fix unsafe get_task_struct() in task_numa_assign()
Date: Tue, 18 Nov 2014 12:30:31 -0500 [thread overview]
Message-ID: <546B8237.5000107@oracle.com> (raw)
In-Reply-To: <5466BCAF.4020706@oracle.com>
On 11/14/2014 09:38 PM, Sasha Levin wrote:
> On 11/10/2014 11:03 AM, Peter Zijlstra wrote:
>> > On Fri, Nov 07, 2014 at 10:48:27PM -0500, Sasha Levin wrote:
>>>> >> > [ 829.539183] BUG: spinlock recursion on CPU#10, trinity-c594/11067
>>>> >> > [ 829.539203] lock: 0xffff880631dd6b80, .magic: dead4ead, .owner: trinity-c594/11067, .owner_cpu: 13
>> > Ooh, look at that. CPU#10 vs .owner_cpu: 13 on the _same_ task.
>> >
>> > One of those again :/
> This actually reproduces pretty easily. The stack trace seems to be different
> but the end result is the same as above. Anything we can do to debug it? I'm
> really not sure how just the owner_cpu can be different here.
I've added saving stack traces on raw spinlock lock/unlock. Had to keep it at
8 entries, but it's enough to get the gist of it. The trace I'm seeing is:
[ 1239.788220] BUG: spinlock recursion on CPU#0, trinity-c767/32076
[ 1239.788270] irq event stamp: 9135954
[ 1239.788455] hardirqs last enabled at (9135953): free_pages_prepare (mm/page_alloc.c:770)
[ 1239.788573] hardirqs last disabled at (9135954): _raw_spin_lock_irqsave (include/linux/spinlock_api_smp.h:109 kernel/locking/spinlock.c:159)
[ 1239.788687] softirqs last enabled at (9134166): __do_softirq (./arch/x86/include/asm/preempt.h:22 kernel/softirq.c:296)
[ 1239.788796] softirqs last disabled at (9134161): irq_exit (kernel/softirq.c:346 kernel/softirq.c:387)
[ 1239.791152] lock: 0xffff8805c3dd9100, .magic: dead4ead, .owner: trinity-c767/32076, .owner_cpu: -1
[ 1239.791152] lock trace:
[ 1239.791152] save_stack_trace (arch/x86/kernel/stacktrace.c:64)
[ 1239.791152] do_raw_spin_trylock (kernel/locking/spinlock_debug.c:171)
[ 1239.791152] _raw_spin_lock_irq (include/linux/spinlock_api_smp.h:129 kernel/locking/spinlock.c:167)
[ 1239.791152] __schedule (kernel/sched/core.c:2803)
[ 1239.791152] schedule (kernel/sched/core.c:2870)
[ 1239.791152] p9_client_rpc (net/9p/client.c:756 (discriminator 13))
[ 1239.791152] p9_client_read (net/9p/client.c:1582)
[ 1239.791152] v9fs_fid_readn (fs/9p/vfs_file.c:386)
[ 1239.791152] unlock trace:
[ 1239.791152] save_stack_trace (arch/x86/kernel/stacktrace.c:64)
[ 1239.791152] do_raw_spin_unlock (kernel/locking/spinlock_debug.c:120 include/linux/jump_label.h:114 ./arch/x86/include/asm/spinlock.h:149 kernel/locking/spinlock_debug.c:176)
[ 1239.791152] _raw_spin_unlock_irq (include/linux/spinlock_api_smp.h:168 kernel/locking/spinlock.c:199)
[ 1239.791152] finish_task_switch (./arch/x86/include/asm/current.h:14 kernel/sched/core.c:2251)
[ 1239.791152] __schedule (kernel/sched/core.c:2358 kernel/sched/core.c:2840)
[ 1239.791152] schedule_preempt_disabled (kernel/sched/core.c:2897)
[ 1239.791152] cpu_startup_entry (kernel/sched/idle.c:252 kernel/sched/idle.c:274)
[ 1239.791152] start_secondary (arch/x86/kernel/smpboot.c:238)
[ 1239.791152] CPU: 0 PID: 32076 Comm: trinity-c767 Not tainted 3.18.0-rc4-next-20141117-sasha-00046-g481e3e8-dirty #1471
[ 1239.791152] 0000000000000000 0000000000000000 ffff88089a90b000 ffff880867223728
[ 1239.791152] ffffffff920d1ec1 0000000000000030 ffff8805c3dd9100 ffff880867223768
[ 1239.791152] ffffffff815d0617 ffffffff9eec7730 ffff88089a90bf58 ffff8805c3dd9100
[ 1239.791152] Call Trace:
[ 1239.791152] dump_stack (lib/dump_stack.c:52)
[ 1239.791152] spin_dump (kernel/locking/spinlock_debug.c:81 (discriminator 8))
[ 1239.791152] spin_bug (kernel/locking/spinlock_debug.c:89)
[ 1239.791152] do_raw_spin_lock (kernel/locking/spinlock_debug.c:97 kernel/locking/spinlock_debug.c:152)
[ 1239.791152] ? load_balance (kernel/sched/fair.c:5582 kernel/sched/fair.c:6884)
[ 1239.791152] _raw_spin_lock_irqsave (include/linux/spinlock_api_smp.h:117 kernel/locking/spinlock.c:159)
[ 1239.791152] ? load_balance (kernel/sched/fair.c:5582 kernel/sched/fair.c:6884)
[ 1239.791152] load_balance (kernel/sched/fair.c:5582 kernel/sched/fair.c:6884)
[ 1239.791152] pick_next_task_fair (kernel/sched/fair.c:7154 kernel/sched/fair.c:5093)
[ 1239.791152] ? pick_next_task_fair (kernel/sched/fair.c:7131 kernel/sched/fair.c:5093)
[ 1239.791152] __schedule (kernel/sched/core.c:2716 kernel/sched/core.c:2830)
[ 1239.791152] ? preempt_count_sub (kernel/sched/core.c:2644)
[ 1239.791152] schedule (kernel/sched/core.c:2870)
[ 1239.791152] p9_client_rpc (net/9p/client.c:756 (discriminator 13))
[ 1239.791152] ? __init_waitqueue_head (kernel/sched/wait.c:292)
[ 1239.791152] ? p9_idpool_put (net/9p/util.c:127)
[ 1239.791152] p9_client_read (net/9p/client.c:1582)
[ 1239.791152] ? p9_free_req (net/9p/client.c:410)
[ 1239.791152] ? might_fault (./arch/x86/include/asm/current.h:14 mm/memory.c:3734)
[ 1239.791152] ? might_fault (mm/memory.c:3734)
[ 1239.791152] v9fs_fid_readn (fs/9p/vfs_file.c:386)
[ 1239.791152] v9fs_file_read (fs/9p/vfs_file.c:447)
[ 1239.791152] do_loop_readv_writev (fs/read_write.c:724)
[ 1239.791152] ? v9fs_fid_readn (fs/9p/vfs_file.c:433)
[ 1239.791152] ? v9fs_fid_readn (fs/9p/vfs_file.c:433)
[ 1239.791152] do_readv_writev (fs/read_write.c:856)
[ 1239.791152] ? save_stack_trace (arch/x86/kernel/stacktrace.c:64)
[ 1239.791152] vfs_readv (fs/read_write.c:882)
[ 1239.791152] SyS_readv (fs/read_write.c:908 fs/read_write.c:899)
[ 1239.791152] tracesys_phase2 (arch/x86/kernel/entry_64.S:529)
So I still can't explain why owner_cpu is "-1" rather than something that makes sense,
but the trace *kinda* makes sense.
We locked the rq once in __schedule(), proceeded to finding the same rq as the busiest
one in load_balance() and locking it yet again.
The odd thing here is that load_balance() has a check just for that:
BUG_ON(busiest == env.dst_rq);
But that doesn't get hit, although we lock it only a bit later:
more_balance:
raw_spin_lock_irqsave(&busiest->lock, flags);
And there's no check before the actual lock there, so I've added another BUG_ON
there and will update with results.
Thanks,
Sasha
prev parent reply other threads:[~2014-11-18 17:31 UTC|newest]
Thread overview: 45+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-10-22 7:17 [PATCH v4] sched/numa: fix unsafe get_task_struct() in task_numa_assign() Kirill Tkhai
2014-10-22 21:30 ` introduce task_rcu_dereference? Oleg Nesterov
2014-10-22 22:23 ` Oleg Nesterov
2014-10-23 18:15 ` Oleg Nesterov
2014-10-23 8:10 ` Kirill Tkhai
2014-10-23 18:18 ` Oleg Nesterov
2014-10-24 7:51 ` Kirill Tkhai
2014-10-27 19:53 ` [PATCH 0/3] introduce task_rcu_dereference() Oleg Nesterov
2014-10-27 19:54 ` [PATCH 1/3] probe_kernel_address() can use __probe_kernel_read() Oleg Nesterov
2014-10-27 19:54 ` [PATCH 2/3] introduce probe_slab_address() Oleg Nesterov
2014-10-27 19:21 ` Christoph Lameter
2014-10-28 5:44 ` Kirill Tkhai
2014-10-28 5:48 ` Kirill Tkhai
2014-10-28 15:01 ` Peter Zijlstra
2014-10-28 17:56 ` Kirill Tkhai
2014-10-28 18:00 ` Kirill Tkhai
2014-10-28 19:55 ` Oleg Nesterov
2014-10-28 20:12 ` Oleg Nesterov
2014-10-29 5:10 ` Kirill Tkhai
2014-10-27 19:54 ` [PATCH 3/3] introduce task_rcu_dereference() Oleg Nesterov
2014-10-28 6:22 ` Kirill Tkhai
2016-05-18 17:02 ` Peter Zijlstra
2016-05-18 18:23 ` Oleg Nesterov
2016-05-18 19:10 ` Peter Zijlstra
2016-05-18 19:57 ` Oleg Nesterov
2016-05-26 11:34 ` Peter Zijlstra
2016-06-03 10:49 ` [tip:sched/core] sched/fair: Use task_rcu_dereference() tip-bot for Oleg Nesterov
2016-06-03 10:48 ` [tip:sched/core] sched/api: Introduce task_rcu_dereference() and try_get_task_struct() tip-bot for Oleg Nesterov
2014-10-28 11:02 ` [tip:sched/core] sched/numa: Fix unsafe get_task_struct() in task_numa_assign() tip-bot for Kirill Tkhai
2014-11-08 3:48 ` [PATCH v4] sched/numa: fix " Sasha Levin
2014-11-09 14:07 ` Kirill Tkhai
2014-11-10 10:03 ` Peter Zijlstra
2014-11-10 15:48 ` Sasha Levin
2014-11-10 16:01 ` Peter Zijlstra
2014-11-16 9:50 ` [tip:sched/urgent] sched/numa: Avoid selecting oneself as swap target tip-bot for Peter Zijlstra
2014-11-10 16:03 ` [PATCH v4] sched/numa: fix unsafe get_task_struct() in task_numa_assign() Peter Zijlstra
2014-11-10 16:09 ` Sasha Levin
2014-11-10 16:16 ` Peter Zijlstra
2014-11-10 16:10 ` Kirill Tkhai
2014-11-10 16:36 ` Kirill Tkhai
2014-11-10 16:44 ` Sasha Levin
2014-11-10 20:01 ` Kirill Tkhai
2014-11-12 9:49 ` Kirill Tkhai
2014-11-15 2:38 ` Sasha Levin
2014-11-18 17:30 ` Sasha Levin [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=546B8237.5000107@oracle.com \
--to=sasha.levin@oracle.com \
--cc=ktkhai@parallels.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@redhat.com \
--cc=oleg@redhat.com \
--cc=peterz@infradead.org \
--cc=tkhai@yandex.ru \
--cc=vdavydov@parallels.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.