From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S965230AbcDYVno (ORCPT ); Mon, 25 Apr 2016 17:43:44 -0400
Received: from mail-pa0-f43.google.com ([209.85.220.43]:34153 "EHLO
	mail-pa0-f43.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S965188AbcDYVnm (ORCPT ); Mon, 25 Apr 2016 17:43:42 -0400
Subject: Re: [linux-next PATCH] sched: cgroup: enable interrupt before calling threadgroup_change_begin
To: Peter Zijlstra
References: <1461383788-15102-1-git-send-email-yang.shi@linaro.org>
	<20160423091447.GA3430@twins.programming.kicks-ass.net>
	<571E5571.7050106@linaro.org>
Cc: tj@kernel.org, mingo@redhat.com, lizefan@huawei.com,
	linux-kernel@vger.kernel.org, linux-next@vger.kernel.org,
	linaro-kernel@lists.linaro.org
From: "Shi, Yang"
Message-ID: <571E8F8D.9040701@linaro.org>
Date: Mon, 25 Apr 2016 14:43:41 -0700
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:38.0) Gecko/20100101 Thunderbird/38.7.2
MIME-Version: 1.0
In-Reply-To: <571E5571.7050106@linaro.org>
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 4/25/2016 10:35 AM, Shi, Yang wrote:
> On 4/23/2016 2:14 AM, Peter Zijlstra wrote:
>> On Fri, Apr 22, 2016 at 08:56:28PM -0700, Yang Shi wrote:
>>> When kernel oops happens in some kernel thread, i.e. kcompactd in the
>>> test,
>>> the below bug might be triggered by the oops handler:
>>
>> What are you trying to fix? You already oopsed the thing is wrecked.
>
> Actually, I ran into the below kernel BUG:
>
> BUG: unable to handle kernel NULL pointer dereference at (null)
> IP: [] release_freepages+0x18/0xa0
> PGD 0
> Oops: 0000 [#1] PREEMPT SMP
> Modules linked in:
> CPU: 6 PID: 110 Comm: kcompactd0 Not tainted 4.6.0-rc4-next-20160420 #4
> Hardware name: Intel Corporation S5520HC/S5520HC, BIOS
> S5500.86B.01.10.0025.030220091519 03/02/2009
> task: ffff880361732680 ti: ffff88036173c000 task.ti: ffff88036173c000
> RIP: 0010:[] []
> release_freepages+0x18/0xa0
> RSP: 0018:ffff88036173fcf8 EFLAGS: 00010282
> RAX: 0000000000000000 RBX: ffff88036ffde7c0 RCX: 0000000000000009
> RDX: 0000000000001bf1 RSI: 000000000000000f RDI: ffff88036173fdd0
> RBP: ffff88036173fd20 R08: 0000000000000007 R09: 0000160000000000
> R10: ffff88036ffde7c0 R11: 0000000000000000 R12: 0000000000000000
> R13: ffff88036173fdd0 R14: ffff88036173fdc0 R15: ffff88036173fdb0
> FS: 0000000000000000(0000) GS:ffff880363cc0000(0000)
> knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000000 CR3: 0000000002206000 CR4: 00000000000006e0
> Stack:
> ffff88036ffde7c0 0000000000000000 0000000000001a00 ffff88036173fdc0
> ffff88036173fdb0 ffff88036173fda0 ffffffff8119f13d ffffffff81196239
> 0000000000000000 ffff880361732680 0000000000000001 0000000000100000
> Call Trace:
> [] compact_zone+0x55d/0x9f0
> [] ? fragmentation_index+0x19/0x70
> [] kcompactd_do_work+0x10f/0x230
> [] kcompactd+0x90/0x1e0
> [] ? wait_woken+0xa0/0xa0
> [] ? kcompactd_do_work+0x230/0x230
> [] kthread+0xdd/0x100
> [] ret_from_fork+0x22/0x40
> [] ? kthread_create_on_node+0x180/0x180
> Code: c1 fa 06 31 f6 e8 a9 9b fd ff eb 98 0f 1f 80 00 00 00 00 66 66 66
> 66 90 55 48 89 e5 41 57 41 56 41 55 49 89 fd 41 54 53 48 8b 07 <48> 8b
> 10 48 8d 78 e0 49 39 c5 4c 8d 62 e0 74 70 49 be 00 00 00
> RIP [] release_freepages+0x18/0xa0
> RSP
> CR2: 0000000000000000
> ---[ end trace 2e96d09e0ba6342f ]---
>
> Then the "schedule in atomic context" bug is triggered, which causes the
> system to hang. But the system stays alive without the "schedule in
> atomic context" bug. The previous NULL pointer dereference issue doesn't
> bring the system down; it only kills the kcompactd kthread.

BTW, I don't have "panic on oops" set, so the kernel doesn't panic.

Thanks,
Yang

>
> Thanks,
> Yang
>
>>
>