From: Michael Ellerman
To: Christophe Leroy, Sachin Sant, "Aneesh Kumar K.V"
Cc: linux-next@vger.kernel.org, linuxppc-dev@lists.ozlabs.org
Subject: Re: Kernel OOPS followed by a panic on next20190507 with 4K page size
References: <0414d06e-1c4e-e9ec-e265-fd9662308df8@linux.ibm.com> <4465D9C6-BE89-4215-9730-21CD40ABEA50@linux.vnet.ibm.com>
Date: Tue, 14 May 2019 23:06:25 +1000
Message-ID: <87pnolrxri.fsf@concordia.ellerman.id.au>

Christophe Leroy writes:
> On 14/05/2019 at 10:57, Sachin Sant wrote:
>>> On 14-May-2019, at 7:00 AM, Aneesh Kumar K.V wrote:
>>> On 5/8/19 4:30 PM, Sachin Sant wrote:
>>>> While running LTP tests (specifically futex_wake04) against next-20190507
>>>> build with 4K page size on a POWER8 LPAR the following crash is observed.
>>>> [ 4233.214876] BUG: Kernel NULL pointer dereference at 0x0000001c
>>>> [ 4233.214898] Faulting instruction address: 0xc000000001d1e58c
>>>> [ 4233.214905] Oops: Kernel access of bad area, sig: 11 [#1]
>>>> [ 4233.214911] LE PAGE_SIZE=4K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
>>>> [ 4233.214920] Dumping ftrace buffer:
>>>> [ 4233.214928] (ftrace buffer empty)
>>>> [ 4233.214933] Modules linked in: overlay rpadlpar_io rpaphp iptable_mangle xt_MASQUERADE iptable_nat nf_nat xt_conntrack nf_conntrack nf_defrag_ipv4 ipt_REJECT nf_reject_ipv4 xt_tcpudp tun bridge stp llc kvm iptable_filter pseries_rng rng_core vmx_crypto ip_tables x_tables autofs4 [last unloaded: dummy_del_mod]
>>>> [ 4233.214973] CPU: 3 PID: 4635 Comm: futex_wake04 Tainted: G W O 5.1.0-next-20190507-autotest #1
>>>> [ 4233.214980] NIP: c000000001d1e58c LR: c000000001d1e54c CTR: 0000000000000000
>>>> [ 4233.214987] REGS: c000000004937890 TRAP: 0300 Tainted: G W O (5.1.0-next-20190507-autotest)
>>>> [ 4233.214993] MSR: 8000000000009033 CR: 22424822 XER: 00000000
>>>> [ 4233.215005] CFAR: c00000000183e9e0 DAR: 000000000000001c DSISR: 40000000 IRQMASK: 0
>>>> [ 4233.215005] GPR00: c000000001901a80 c000000004937b20 c000000003938700 0000000000000000
>>>> [ 4233.215005] GPR04: 0000000000400cc0 000000000003efff 000000027966e000 c000000003ba8700
>>>> [ 4233.215005] GPR08: c000000003ba8700 000000000d601125 c000000003ba8700 0000000080000000
>>>> [ 4233.215005] GPR12: 0000000022424822 c00000001ecae280 0000000000000000 0000000000000000
>>>> [ 4233.215005] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>>>> [ 4233.215005] GPR20: 0000000000000018 c0000000039e2d30 c0000000039e2d28 c0000002762da460
>>>> [ 4233.215005] GPR24: 000000000000001c 0000000000000000 0000000000000001 c000000001901a80
>>>> [ 4233.215005] GPR28: 0000000000400cc0 0000000000000000 0000000000000000 0000000000400cc0
>>>> [ 4233.215065] NIP [c000000001d1e58c] kmem_cache_alloc+0xbc/0x5a0
>>>> [ 4233.215071] LR [c000000001d1e54c] kmem_cache_alloc+0x7c/0x5a0
>>>> [ 4233.215075] Call Trace:
>>>> [ 4233.215081] [c000000004937b20] [c000000001c91150] __pud_alloc+0x160/0x200 (unreliable)
>>>> [ 4233.215090] [c000000004937b80] [c000000001901a80] huge_pte_alloc+0x580/0x950
>>>> [ 4233.215098] [c000000004937c00] [c000000001cf7910] hugetlb_fault+0x9a0/0x1250
>>>> [ 4233.215106] [c000000004937ce0] [c000000001c94a80] handle_mm_fault+0x490/0x4a0
>>>> [ 4233.215114] [c000000004937d20] [c0000000018d529c] __do_page_fault+0x77c/0x1f00
>>>> [ 4233.215121] [c000000004937e00] [c0000000018d6a48] do_page_fault+0x28/0x50
>>>> [ 4233.215129] [c000000004937e20] [c00000000183b0d4] handle_page_fault+0x18/0x38
>>>> [ 4233.215135] Instruction dump:
>>>> [ 4233.215139] 39290001 f92ac1b0 419e009c 3ce20027 3ba00000 e927c1f0 39290001 f927c1f0
>>>> [ 4233.215149] 3d420027 e92ac290 39290001 f92ac290 <8359001c> 83390018 60000000 3ce20027
>>>
>>> I did send a patch to the list to handle page allocation failures in this patch. But I guess what we are finding here is get_current() crashing. Any chance to bisect this?
>>>
>>
>> Following commit seems to have introduced this problem.
>>
>> 723f268f19 - powerpc/mm: cleanup ifdef mess in add_huge_page_size()
>>
>> Reverting this patch allows the test case to execute properly without a crash.
>
> Oops ...
>
> Can you check by replacing
>
>     mmu_psize = check_and_get_huge_psize(size);
>
> by
>
>     mmu_psize = check_and_get_huge_psize(shift);
>
> in add_huge_page_size()

Yeah that's it :)

I'm writing a commit, unless you have already?

cheers
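
For readers following the thread, here is a minimal user-space sketch of the mix-up discussed above: a helper that expects a page shift is handed the page size in bytes instead. Everything in it is an assumption made for illustration. The names lookup_huge_psize(), register_huge_size_buggy(), register_huge_size_fixed() and the table of shifts are hypothetical, and this is not the kernel's add_huge_page_size() or check_and_get_huge_psize().

```c
#include <assert.h>

/* Hypothetical page-size indices and their shifts (illustration only). */
enum { PSIZE_64K, PSIZE_16M, PSIZE_16G, PSIZE_COUNT };

static const int psize_shift[PSIZE_COUNT] = { 16, 24, 34 };

/* Position of the lowest set bit; size must be non-zero. */
static int lowest_set_bit(unsigned long long size)
{
    int shift = 0;

    assert(size != 0);
    while ((size & 1ULL) == 0) {
        size >>= 1;
        shift++;
    }
    return shift;
}

/* Expects a SHIFT (log2 of the page size); returns an index or -1. */
static int lookup_huge_psize(int shift)
{
    for (int i = 0; i < PSIZE_COUNT; i++) {
        if (psize_shift[i] == shift)
            return i;
    }
    return -1;
}

/* Buggy variant: passes the byte count where a shift is expected. */
static int register_huge_size_buggy(unsigned long long size)
{
    int shift = lowest_set_bit(size);

    (void)shift;                          /* computed, then ignored */
    return lookup_huge_psize((int)size);  /* wrong argument */
}

/* Fixed variant: passes the shift derived from the size. */
static int register_huge_size_fixed(unsigned long long size)
{
    int shift = lowest_set_bit(size);

    return lookup_huge_psize(shift);
}
```

With a 16M page (1ULL << 24), the fixed variant finds the entry for shift 24, while the buggy variant looks up 16777216 as if it were a shift and finds nothing. How a failed lookup goes on to produce the NULL dereference in the trace is not modelled here.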