From: Balbir Singh <bsingharora@gmail.com>
To: Michael Ellerman <mpe@ellerman.id.au>, Tejun Heo <tj@kernel.org>,
torvalds@linux-foundation.org
Cc: linuxppc-dev@lists.ozlabs.org, akpm@linux-foundation.org,
kernel-team@fb.com, jiangshanlai@gmail.com,
linux-kernel@vger.kernel.org
Subject: Re: Oops on Power8 (was Re: [PATCH v2 1/7] workqueue: make workqueue available early during boot)
Date: Mon, 10 Oct 2016 22:17:16 +1100 [thread overview]
Message-ID: <b0622515-61b9-9622-cca3-d475733eb4a6@gmail.com> (raw)
In-Reply-To: <87twck5wqo.fsf@concordia.ellerman.id.au>
On 10/10/16 21:22, Michael Ellerman wrote:
> Hi Tejun,
>
> Tejun Heo <tj@kernel.org> writes:
>> From f85002f627f7fdc7b3cda526863f5c9a8d36b997 Mon Sep 17 00:00:00 2001
>> From: Tejun Heo <tj@kernel.org>
>> Date: Fri, 16 Sep 2016 15:49:32 -0400
>> Subject: [PATCH] workqueue: make workqueue available early during boot
>>
>> Workqueue is currently initialized in an early init call; however,
>> there are cases where early boot code has to be split and reordered to
>> come after workqueue initialization or the same code path which makes
>> use of workqueues is used both before workqueue initailization and
>> after. The latter cases have to gate workqueue usages with
>> keventd_up() tests, which is nasty and easy to get wrong.
>>
>> Workqueue usages have become widespread and it'd be a lot more
>> convenient if it can be used very early from boot. This patch splits
>> workqueue initialization into two steps. workqueue_init_early() which
>> sets up the basic data structures so that workqueues can be created
>> and work items queued, and workqueue_init() which actually brings up
>> workqueues online and starts executing queued work items. The former
>> step can be done very early during boot once memory allocation,
>> cpumasks and idr are initialized. The latter right after kthreads
>> become available.
>>
>> This allows work item queueing and canceling from very early boot
>> which is what most of these use cases want.
>>
>> * As systemd_wq being initialized doesn't indicate that workqueue is
>> fully online anymore, update keventd_up() to test wq_online instead.
>> The follow-up patches will get rid of all its usages and the
>> function itself.
>>
>> * Flushing doesn't make sense before workqueue is fully initialized.
>> The flush functions trigger WARN and return immediately before fully
>> online.
>>
>> * Work items are never in-flight before fully online. Canceling can
>> always succeed by skipping the flush step.
>>
>> * Some code paths can no longer assume to be called with irq enabled
>> as irq is disabled during early boot. Use irqsave/restore
>> operations instead.
>>
>> v2: Watchdog init, which requires timer to be running, moved from
>> workqueue_init_early() to workqueue_init().
>>
>> Signed-off-by: Tejun Heo <tj@kernel.org>
>> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
>> Link: http://lkml.kernel.org/r/CA+55aFx0vPuMuxn00rBSM192n-Du5uxy+4AvKa0SBSOVJeuCGg@mail.gmail.com
>
>
> This patch seems to be causing one of my Power8 boxes not to boot.
>
> Specifically commit 3347fa092821 ("workqueue: make workqueue available
> early during boot") in linux-next.
>
> If I revert this on top of next-20161005 then the machine boots again.
>
> I've attached the oops below. It looks like the cfs_rq of p->se is NULL?
>
> cheers
>
>
> bootconsole [udbg0] disabled
> bootconsole [udbg0] disabled
> mempolicy: Enabling automatic NUMA balancing. Configure with numa_balancing= or the kernel.numa_balancing sysctl
> pid_max: default: 163840 minimum: 1280
> Dentry cache hash table entries: 16777216 (order: 11, 134217728 bytes)
> Inode-cache hash table entries: 8388608 (order: 10, 67108864 bytes)
> Mount-cache hash table entries: 262144 (order: 5, 2097152 bytes)
> Mountpoint-cache hash table entries: 262144 (order: 5, 2097152 bytes)
> Unable to handle kernel paging request for data at address 0x00000038
> Faulting instruction address: 0xc0000000000fc0cc
> Oops: Kernel access of bad area, sig: 11 [#1]
> SMP NR_CPUS=2048 NUMA PowerNV
> Modules linked in:
> CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.8.0-compiler_gcc-6.2.0-next-20161005 #94
> task: c0000007f5400000 task.stack: c000001ffc084000
> NIP: c0000000000fc0cc LR: c0000000000ed928 CTR: c0000000000fbfd0
> REGS: c000001ffc087780 TRAP: 0300 Not tainted (4.8.0-compiler_gcc-6.2.0-next-20161005)
> MSR: 9000000002009033 <SF,HV,VEC,EE,ME,IR,DR,RI,LE> CR: 48000424 XER: 00000000
> CFAR: c0000000000089dc DAR: 0000000000000038 DSISR: 40000000 SOFTE: 0
> GPR00: c0000000000ed928 c000001ffc087a00 c000000000e63200 c000000010d6d600
> GPR04: c0000007f5409200 0000000000000021 000000000748e08c 000000000000001f
> GPR08: 0000000000000000 0000000000000021 000000000748f1f8 0000000000000000
> GPR12: 0000000028000422 c00000000fb80000 c00000000000e0c8 0000000000000000
> GPR16: 0000000000000000 0000000000000000 0000000000000021 0000000000000001
> GPR20: ffffffffafb50401 0000000000000000 c000000010d6d600 000000000000ba7e
> GPR24: 000000000000ba7e c000000000d8bc58 afb504000afb5041 0000000000000001
> GPR28: 0000000000000000 0000000000000004 c0000007f5409280 0000000000000000
> NIP [c0000000000fc0cc] enqueue_task_fair+0xfc/0x18b0
> LR [c0000000000ed928] activate_task+0x78/0xe0
> Call Trace:
> [c000001ffc087a00] [c0000007f5409200] 0xc0000007f5409200 (unreliable)
> [c000001ffc087b10] [c0000000000ed928] activate_task+0x78/0xe0
> [c000001ffc087b50] [c0000000000ede58] ttwu_do_activate+0x68/0xc0
> [c000001ffc087b90] [c0000000000ef1b8] try_to_wake_up+0x208/0x4f0
> [c000001ffc087c10] [c0000000000d3484] create_worker+0x144/0x250
> [c000001ffc087cb0] [c000000000cd72d0] workqueue_init+0x124/0x150
> [c000001ffc087d00] [c000000000cc0e74] kernel_init_freeable+0x158/0x360
> [c000001ffc087dc0] [c00000000000e0e4] kernel_init+0x24/0x160
> [c000001ffc087e30] [c00000000000bfa0] ret_from_kernel_thread+0x5c/0xbc
> Instruction dump:
> 62940401 3b800000 3aa00000 7f17c378 3a600001 3b600001 60000000 60000000
> 60420000 72490021 ebfe0150 2f890001 <ebbf0038> 419e0de0 7fbee840 419e0e58
> ---[ end trace 0000000000000000 ]---
>
>
> c0000000000fbfd0 <enqueue_task_fair>:
> r4 = p
> ...
> c0000000000fc040: 80 00 c4 3b addi r30,r4,128 r30 = r4 + 128 (&p->se)
> ...
> c0000000000fc0c4: 50 01 fe eb ld r31,336(r30) r31 = *(r30 + 336) = se->cfs_rq
> c0000000000fc0c8: 01 00 89 2f cmpwi cr7,r9,1
> c0000000000fc0cc: 38 00 bf eb ld r29,56(r31) r29 = cfs_rq->curr
>
I think there is a race
rest_init()
{
...
kernel_thread(kernel_init, NULL, CLONE_FS);
numa_default_policy();
pid = kernel_thread(kthreadd, NULL, CLONE_FS | CLONE_FILES);
rcu_read_lock();
kthreadd_task = find_task_by_pid_ns(pid, &init_pid_ns);
...
}
create_worker() needs kthreadd, it wakes up kthreadd in kthread_create_on_node,
workqueue_init() is called from kernel_init() , but kthreadd is created after
the call to kernel_init(), so its touch and go
Balbir Singh.
next prev parent reply other threads:[~2016-10-10 11:17 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <1473967821-24363-1-git-send-email-tj@kernel.org>
[not found] ` <1473967821-24363-2-git-send-email-tj@kernel.org>
[not found] ` <20160917172314.GB10771@mtj.duckdns.org>
2016-10-10 10:22 ` Oops on Power8 (was Re: [PATCH v2 1/7] workqueue: make workqueue available early during boot) Michael Ellerman
2016-10-10 11:17 ` Balbir Singh [this message]
2016-10-10 12:53 ` Tejun Heo
2016-10-10 13:22 ` Balbir Singh
2016-10-10 13:02 ` Tejun Heo
2016-10-10 13:14 ` Tejun Heo
2016-10-11 11:22 ` Michael Ellerman
2016-10-11 12:21 ` Balbir Singh
2016-10-14 15:08 ` Tejun Heo
2016-10-15 3:43 ` Balbir Singh
2016-10-14 15:07 ` Tejun Heo
2016-10-15 1:25 ` Balbir Singh
2016-10-15 9:48 ` Michael Ellerman
2016-10-17 18:13 ` Tejun Heo
2016-10-17 12:24 ` Michael Ellerman
2016-10-17 12:51 ` Balbir Singh
2016-10-18 2:35 ` Michael Ellerman
2016-10-17 18:15 ` Tejun Heo
2016-10-17 19:30 ` Tejun Heo
2016-10-18 4:37 ` Michael Ellerman
2016-10-18 18:58 ` Tejun Heo
2016-10-19 11:16 ` Michael Ellerman
2016-10-19 16:15 ` [PATCH wq/for-4.10] workqueue: move wq_numa_init() to workqueue_init() Tejun Heo
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=b0622515-61b9-9622-cca3-d475733eb4a6@gmail.com \
--to=bsingharora@gmail.com \
--cc=akpm@linux-foundation.org \
--cc=jiangshanlai@gmail.com \
--cc=kernel-team@fb.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linuxppc-dev@lists.ozlabs.org \
--cc=mpe@ellerman.id.au \
--cc=tj@kernel.org \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).