linuxppc-dev.lists.ozlabs.org archive mirror
From: Balbir Singh <bsingharora@gmail.com>
To: Michael Ellerman <mpe@ellerman.id.au>, Tejun Heo <tj@kernel.org>,
	torvalds@linux-foundation.org
Cc: linuxppc-dev@lists.ozlabs.org, akpm@linux-foundation.org,
	kernel-team@fb.com, jiangshanlai@gmail.com,
	linux-kernel@vger.kernel.org
Subject: Re: Oops on Power8 (was Re: [PATCH v2 1/7] workqueue: make workqueue available early during boot)
Date: Mon, 10 Oct 2016 22:17:16 +1100
Message-ID: <b0622515-61b9-9622-cca3-d475733eb4a6@gmail.com>
In-Reply-To: <87twck5wqo.fsf@concordia.ellerman.id.au>



On 10/10/16 21:22, Michael Ellerman wrote:
> Hi Tejun,
> 
> Tejun Heo <tj@kernel.org> writes:
>> From f85002f627f7fdc7b3cda526863f5c9a8d36b997 Mon Sep 17 00:00:00 2001
>> From: Tejun Heo <tj@kernel.org>
>> Date: Fri, 16 Sep 2016 15:49:32 -0400
>> Subject: [PATCH] workqueue: make workqueue available early during boot
>>
>> Workqueue is currently initialized in an early init call; however,
>> there are cases where early boot code has to be split and reordered to
>> come after workqueue initialization or the same code path which makes
>> use of workqueues is used both before workqueue initialization and
>> after.  The latter cases have to gate workqueue usages with
>> keventd_up() tests, which is nasty and easy to get wrong.
>>
>> Workqueue usages have become widespread and it'd be a lot more
>> convenient if it can be used very early from boot.  This patch splits
>> workqueue initialization into two steps.  workqueue_init_early() which
>> sets up the basic data structures so that workqueues can be created
>> and work items queued, and workqueue_init() which actually brings up
>> workqueues online and starts executing queued work items.  The former
>> step can be done very early during boot once memory allocation,
>> cpumasks and idr are initialized.  The latter right after kthreads
>> become available.
>>
>> This allows work item queueing and canceling from very early boot
>> which is what most of these use cases want.
>>
>> * As system_wq being initialized doesn't indicate that workqueue is
>>   fully online anymore, update keventd_up() to test wq_online instead.
>>   The follow-up patches will get rid of all its usages and the
>>   function itself.
>>
>> * Flushing doesn't make sense before workqueue is fully initialized.
>>   The flush functions trigger WARN and return immediately before fully
>>   online.
>>
>> * Work items are never in-flight before fully online.  Canceling can
>>   always succeed by skipping the flush step.
>>
>> * Some code paths can no longer assume to be called with irq enabled
>>   as irq is disabled during early boot.  Use irqsave/restore
>>   operations instead.
>>
>> v2: Watchdog init, which requires timer to be running, moved from
>>     workqueue_init_early() to workqueue_init().
>>
>> Signed-off-by: Tejun Heo <tj@kernel.org>
>> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
>> Link: http://lkml.kernel.org/r/CA+55aFx0vPuMuxn00rBSM192n-Du5uxy+4AvKa0SBSOVJeuCGg@mail.gmail.com
> 
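The two-step bring-up described in the commit message above can be sketched as a small userspace model. This is only an illustration under stated assumptions: every name here (`wq_init_early_model`, `queue_work_model`, `wq_init_model`, the fixed-size `pending` array) is hypothetical and not the kernel's API, and running items inline once online is a simplification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are the kernel's. */
typedef void (*work_fn)(void *arg);

struct work_item {
    work_fn fn;
    void *arg;
};

#define MAX_PENDING 16

static struct work_item pending[MAX_PENDING];
static size_t n_pending;
static bool wq_online_model;    /* stands in for the wq_online flag */
static int counter;

/* Phase 1: set up data structures so that queueing becomes legal. */
void wq_init_early_model(void)
{
    n_pending = 0;
    wq_online_model = false;
}

/* Queue a work item. Before phase 2 it is only recorded, never run. */
bool queue_work_model(work_fn fn, void *arg)
{
    assert(fn != NULL);
    if (wq_online_model) {
        fn(arg);                /* simplification: run inline once online */
        return true;
    }
    if (n_pending == MAX_PENDING)
        return false;           /* model limit, not a kernel behaviour */
    pending[n_pending].fn = fn;
    pending[n_pending].arg = arg;
    n_pending++;
    return true;
}

/* Phase 2: go online and execute everything queued so far. */
void wq_init_model(void)
{
    wq_online_model = true;
    for (size_t i = 0; i < n_pending; i++)
        pending[i].fn(pending[i].arg);
    n_pending = 0;
}

static void bump(void *arg)
{
    counter += *(int *)arg;
}

/* Queue two items early, check nothing ran, then bring the queue online. */
int two_phase_demo(void)
{
    static int one = 1, ten = 10;

    counter = 0;
    wq_init_early_model();
    queue_work_model(bump, &one);
    queue_work_model(bump, &ten);
    int before = counter;       /* nothing has executed yet */
    wq_init_model();
    return before == 0 ? counter : -1;
}
```

In this model `two_phase_demo()` only sees the queued items take effect after `wq_init_model()` runs, which mirrors the "queue early, execute once online" split.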
> 
> This patch seems to be causing one of my Power8 boxes not to boot.
> 
> Specifically commit 3347fa092821 ("workqueue: make workqueue available
> early during boot") in linux-next.
> 
> If I revert this on top of next-20161005 then the machine boots again.
> 
> I've attached the oops below. It looks like the cfs_rq of p->se is NULL?
> 
> cheers
> 
> 
> bootconsole [udbg0] disabled
> bootconsole [udbg0] disabled
> mempolicy: Enabling automatic NUMA balancing. Configure with numa_balancing= or the kernel.numa_balancing sysctl
> pid_max: default: 163840 minimum: 1280
> Dentry cache hash table entries: 16777216 (order: 11, 134217728 bytes)
> Inode-cache hash table entries: 8388608 (order: 10, 67108864 bytes)
> Mount-cache hash table entries: 262144 (order: 5, 2097152 bytes)
> Mountpoint-cache hash table entries: 262144 (order: 5, 2097152 bytes)
> Unable to handle kernel paging request for data at address 0x00000038
> Faulting instruction address: 0xc0000000000fc0cc
> Oops: Kernel access of bad area, sig: 11 [#1]
> SMP NR_CPUS=2048 NUMA PowerNV
> Modules linked in:
> CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.8.0-compiler_gcc-6.2.0-next-20161005 #94
> task: c0000007f5400000 task.stack: c000001ffc084000
> NIP: c0000000000fc0cc LR: c0000000000ed928 CTR: c0000000000fbfd0
> REGS: c000001ffc087780 TRAP: 0300   Not tainted  (4.8.0-compiler_gcc-6.2.0-next-20161005)
> MSR: 9000000002009033 <SF,HV,VEC,EE,ME,IR,DR,RI,LE>  CR: 48000424  XER: 00000000
> CFAR: c0000000000089dc DAR: 0000000000000038 DSISR: 40000000 SOFTE: 0 
> GPR00: c0000000000ed928 c000001ffc087a00 c000000000e63200 c000000010d6d600 
> GPR04: c0000007f5409200 0000000000000021 000000000748e08c 000000000000001f 
> GPR08: 0000000000000000 0000000000000021 000000000748f1f8 0000000000000000 
> GPR12: 0000000028000422 c00000000fb80000 c00000000000e0c8 0000000000000000 
> GPR16: 0000000000000000 0000000000000000 0000000000000021 0000000000000001 
> GPR20: ffffffffafb50401 0000000000000000 c000000010d6d600 000000000000ba7e 
> GPR24: 000000000000ba7e c000000000d8bc58 afb504000afb5041 0000000000000001 
> GPR28: 0000000000000000 0000000000000004 c0000007f5409280 0000000000000000 
> NIP [c0000000000fc0cc] enqueue_task_fair+0xfc/0x18b0
> LR [c0000000000ed928] activate_task+0x78/0xe0
> Call Trace:
> [c000001ffc087a00] [c0000007f5409200] 0xc0000007f5409200 (unreliable)
> [c000001ffc087b10] [c0000000000ed928] activate_task+0x78/0xe0
> [c000001ffc087b50] [c0000000000ede58] ttwu_do_activate+0x68/0xc0
> [c000001ffc087b90] [c0000000000ef1b8] try_to_wake_up+0x208/0x4f0
> [c000001ffc087c10] [c0000000000d3484] create_worker+0x144/0x250
> [c000001ffc087cb0] [c000000000cd72d0] workqueue_init+0x124/0x150
> [c000001ffc087d00] [c000000000cc0e74] kernel_init_freeable+0x158/0x360
> [c000001ffc087dc0] [c00000000000e0e4] kernel_init+0x24/0x160
> [c000001ffc087e30] [c00000000000bfa0] ret_from_kernel_thread+0x5c/0xbc
> Instruction dump:
> 62940401 3b800000 3aa00000 7f17c378 3a600001 3b600001 60000000 60000000 
> 60420000 72490021 ebfe0150 2f890001 <ebbf0038> 419e0de0 7fbee840 419e0e58 
> ---[ end trace 0000000000000000 ]---
> 
> 
> c0000000000fbfd0 <enqueue_task_fair>:
> r4 = p
> ...
> c0000000000fc040:	80 00 c4 3b 	addi    r30,r4,128	r30 = r4 + 128	(&p->se)
> ...
> c0000000000fc0c4:	50 01 fe eb 	ld      r31,336(r30)	r31 = *(r30 + 336) = se->cfs_rq
> c0000000000fc0c8:	01 00 89 2f 	cmpwi   cr7,r9,1
> c0000000000fc0cc:	38 00 bf eb 	ld      r29,56(r31)	r29 = cfs_rq->curr
> 
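The fault address in the oops lines up with the annotated disassembly above: the faulting instruction is `ld r29,56(r31)`, GPR31 is 0 in the register dump, and the DAR is 0x38, which is 56. A minimal sketch of that arithmetic follows. The struct is a hypothetical stand-in: only the 56-byte offset of `curr` is taken from the oops, the padding is made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the real structure. Only the offset of
 * `curr` (56 bytes, from `ld r29,56(r31)`) comes from the oops.
 */
struct cfs_rq_model {
    unsigned char pad[56];
    void *curr;
};

/* Compile-time check that the model places `curr` at offset 56. */
static_assert(offsetof(struct cfs_rq_model, curr) == 56,
              "curr must sit at offset 56 in the model");

/* Address the load would touch for a given base pointer value. */
uintptr_t faulting_address(uintptr_t cfs_rq_base)
{
    return cfs_rq_base + offsetof(struct cfs_rq_model, curr);
}
```

With a NULL base, `faulting_address(0)` gives 0x38, matching the reported "data at address 0x00000038".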

I think there is a race here:

rest_init()
{
	...
	kernel_thread(kernel_init, NULL, CLONE_FS);
	numa_default_policy();
	pid = kernel_thread(kthreadd, NULL, CLONE_FS | CLONE_FILES);
	rcu_read_lock();
	kthreadd_task = find_task_by_pid_ns(pid, &init_pid_ns);
	...

}

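create_worker() needs kthreadd: it wakes up kthreadd in
kthread_create_on_node(). workqueue_init() is called from kernel_init()
(via kernel_init_freeable()), but kthreadd is only created after the
kernel_thread() call that spawns kernel_init(), so it's touch and go
whether kthreadd exists by the time workqueue_init() runs.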

Balbir Singh.
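
The ordering concern raised in this message can be written down as a tiny deterministic model of the two possible interleavings. This is a sketch of the hypothesis only, not kernel code: `boot_model`, `run_step` and `boot_ok` are hypothetical names, and the model says nothing about whether the elided parts of rest_init() already enforce an order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the suspected ordering problem; not kernel code. */
enum step { SPAWN_KTHREADD, RUN_WORKQUEUE_INIT };

struct boot_model {
    bool kthreadd_exists;   /* set once the helper thread has been created */
    bool worker_created;    /* set if worker creation found the helper */
};

void run_step(struct boot_model *b, enum step s)
{
    assert(b != NULL);
    switch (s) {
    case SPAWN_KTHREADD:
        b->kthreadd_exists = true;
        break;
    case RUN_WORKQUEUE_INIT:
        /* Worker creation needs the helper to exist already. */
        b->worker_created = b->kthreadd_exists;
        break;
    }
}

/* Returns true if worker creation succeeds for the given interleaving. */
bool boot_ok(enum step first, enum step second)
{
    struct boot_model b = { false, false };

    run_step(&b, first);
    run_step(&b, second);
    return b.worker_created;
}
```

Only the interleaving in which `SPAWN_KTHREADD` runs first yields a worker in this model; the reverse order is the window described above.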
