From mboxrd@z Thu Jan 1 00:00:00 1970 From: vladimir.murzin@arm.com (Vladimir Murzin) Date: Wed, 20 Apr 2016 08:42:03 +0100 Subject: [BUG linux-next] Kernel panic found with linux-next-20160414 In-Reply-To: <5716C29F.1090205@linaro.org> References: <5716C29F.1090205@linaro.org> Message-ID: <571732CB.8010206@arm.com> To: linux-arm-kernel@lists.infradead.org List-Id: linux-arm-kernel.lists.infradead.org CC LAKML in case somebody hit the same panic there. Vladimir On 20/04/16 00:43, Shi, Yang wrote: > Hi folks, > > When I ran ltp on linux-next-20160414 on my ARM64 machine, I got the > below kernel panic: > > Unable to handle kernel paging request at virtual address ffffffc007846000 > pgd = ffffffc01e21d000 > [ffffffc007846000] *pgd=0000000000000000, *pud=0000000000000000 > Internal error: Oops: 96000047 [#11] PREEMPT SMP > Modules linked in: loop > CPU: 7 PID: 274 Comm: systemd-journal Tainted: G D > 4.6.0-rc3-next-20160414-WR8.0.0.0_standard+ #9 > Hardware name: Freescale Layerscape 2085a RDB Board (DT) > task: ffffffc01e3fcf80 ti: ffffffc01ea8c000 task.ti: ffffffc01ea8c000 > PC is at copy_page+0x38/0x120 > LR is at migrate_page_copy+0x604/0x1660 > pc : [] lr : [] pstate: 20000145 > sp : ffffffc01ea8ecd0 > x29: ffffffc01ea8ecd0 x28: 0000000000000000 > x27: 1ffffff7b80240f8 x26: ffffffc018196f20 > x25: ffffffbdc01e1180 x24: ffffffbdc01e1180 > x23: 0000000000000000 x22: ffffffc01e3fcf80 > x21: ffffffc00481f000 x20: ffffff900a31d000 > x19: ffffffbdc01207c0 x18: 0000000000000f00 > x17: 0000000000000000 x16: 0000000000000000 > x15: 0000000000000000 x14: 0000000000000000 > x13: 0000000000000000 x12: 0000000000000000 > x11: 0000000000000000 x10: 0000000000000000 > x9 : 0000000000000000 x8 : 0000000000000000 > x7 : 0000000000000000 x6 : 0000000000000000 > x5 : 0000000000000000 x4 : 0000000000000000 > x3 : 0000000000000000 x2 : 0000000000000000 > x1 : ffffffc00481f080 x0 : ffffffc007846000 > > Call trace: > Exception stack(0xffffffc021fc2ed0 to 0xffffffc021fc2ff0) > 2ec0: 
ffffffbdc00887c0 ffffff900a31d000 > 2ee0: ffffffc021fc30f0 ffffff9008ff2318 0000000020000145 0000000000000025 > 2f00: ffffffbdc025a280 ffffffc020adc4c0 0000000041b58ab3 ffffff900a085fd0 > 2f20: ffffff9008200658 0000000000000000 0000000000000000 ffffffbdc00887c0 > 2f40: ffffff900b0f1320 ffffffc021fc3078 0000000041b58ab3 ffffff900a0864f8 > 2f60: ffffff9008210010 ffffffc021fb8960 ffffff900867bacc 1ffffff8043f712d > 2f80: ffffffc021fc2fb0 ffffff9008210564 ffffffc021fc3070 ffffffc021fb8940 > 2fa0: 0000000008221f78 ffffff900862f9c8 ffffffc021fc2fe0 ffffff9008215dc8 > 2fc0: 1ffffff8043f8602 ffffffc021fc0000 ffffffc00968a000 ffffffc00221f080 > 2fe0: f9407e11d00001f0 d61f02209103e210 > [] copy_page+0x38/0x120 > [] migrate_page+0x74/0x98 > [] nfs_migrate_page+0x58/0x80 > [] move_to_new_page+0x15c/0x4d8 > [] migrate_pages+0x7c8/0x11f0 > [] compact_zone+0xdfc/0x2570 > [] compact_zone_order+0xe0/0x170 > [] try_to_compact_pages+0x2e8/0x8f8 > [] __alloc_pages_direct_compact+0x100/0x540 > [] __alloc_pages_nodemask+0xc40/0x1c58 > [] khugepaged+0x468/0x19c8 > [] kthread+0x248/0x2c0 > [] ret_from_fork+0x10/0x40 > Code: d281f012 91020021 f1020252 d503201f (a8000c02) > > > I did some initial investigation and found it is caused by > DEBUG_PAGEALLOC and CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT. And, mainline > 4.6-rc3 works well. > > It should be not arch specific although I got it caught on ARM64. I > suspect this might be caused by Hugh's huge tmpfs patches. > > Thanks, > Yang > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo at kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . 
> Don't email: email at kvack.org > > From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pf0-f199.google.com (mail-pf0-f199.google.com [209.85.192.199]) by kanga.kvack.org (Postfix) with ESMTP id 354AC6B007E for ; Tue, 19 Apr 2016 19:43:30 -0400 (EDT) Received: by mail-pf0-f199.google.com with SMTP id u190so56769279pfb.0 for ; Tue, 19 Apr 2016 16:43:30 -0700 (PDT) Received: from mail-pf0-x22c.google.com (mail-pf0-x22c.google.com. [2607:f8b0:400e:c00::22c]) by mx.google.com with ESMTPS id kg11si14630898pab.171.2016.04.19.16.43.29 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 19 Apr 2016 16:43:29 -0700 (PDT) Received: by mail-pf0-x22c.google.com with SMTP id c20so11503659pfc.1 for ; Tue, 19 Apr 2016 16:43:29 -0700 (PDT) From: "Shi, Yang" Subject: [BUG linux-next] Kernel panic found with linux-next-20160414 Message-ID: <5716C29F.1090205@linaro.org> Date: Tue, 19 Apr 2016 16:43:27 -0700 MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Andrew Morton , sfr@canb.auug.org.au, Hugh Dickins Cc: LKML , linux-mm@kvack.org, yang.shi@linaro.org Hi folks, When I ran ltp on linux-next-20160414 on my ARM64 machine, I got the below kernel panic: Unable to handle kernel paging request at virtual address ffffffc007846000 pgd = ffffffc01e21d000 [ffffffc007846000] *pgd=0000000000000000, *pud=0000000000000000 Internal error: Oops: 96000047 [#11] PREEMPT SMP Modules linked in: loop CPU: 7 PID: 274 Comm: systemd-journal Tainted: G D 4.6.0-rc3-next-20160414-WR8.0.0.0_standard+ #9 Hardware name: Freescale Layerscape 2085a RDB Board (DT) task: ffffffc01e3fcf80 ti: ffffffc01ea8c000 task.ti: ffffffc01ea8c000 PC is at copy_page+0x38/0x120 LR is at migrate_page_copy+0x604/0x1660 pc : [] lr : [] pstate: 20000145 sp : ffffffc01ea8ecd0 x29: ffffffc01ea8ecd0 x28: 0000000000000000 x27: 1ffffff7b80240f8 x26: ffffffc018196f20 x25: ffffffbdc01e1180 x24: 
ffffffbdc01e1180 x23: 0000000000000000 x22: ffffffc01e3fcf80 x21: ffffffc00481f000 x20: ffffff900a31d000 x19: ffffffbdc01207c0 x18: 0000000000000f00 x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000 x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000 x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000 x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000000000 x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000 x2 : 0000000000000000 x1 : ffffffc00481f080 x0 : ffffffc007846000 Call trace: Exception stack(0xffffffc021fc2ed0 to 0xffffffc021fc2ff0) 2ec0: ffffffbdc00887c0 ffffff900a31d000 2ee0: ffffffc021fc30f0 ffffff9008ff2318 0000000020000145 0000000000000025 2f00: ffffffbdc025a280 ffffffc020adc4c0 0000000041b58ab3 ffffff900a085fd0 2f20: ffffff9008200658 0000000000000000 0000000000000000 ffffffbdc00887c0 2f40: ffffff900b0f1320 ffffffc021fc3078 0000000041b58ab3 ffffff900a0864f8 2f60: ffffff9008210010 ffffffc021fb8960 ffffff900867bacc 1ffffff8043f712d 2f80: ffffffc021fc2fb0 ffffff9008210564 ffffffc021fc3070 ffffffc021fb8940 2fa0: 0000000008221f78 ffffff900862f9c8 ffffffc021fc2fe0 ffffff9008215dc8 2fc0: 1ffffff8043f8602 ffffffc021fc0000 ffffffc00968a000 ffffffc00221f080 2fe0: f9407e11d00001f0 d61f02209103e210 [] copy_page+0x38/0x120 [] migrate_page+0x74/0x98 [] nfs_migrate_page+0x58/0x80 [] move_to_new_page+0x15c/0x4d8 [] migrate_pages+0x7c8/0x11f0 [] compact_zone+0xdfc/0x2570 [] compact_zone_order+0xe0/0x170 [] try_to_compact_pages+0x2e8/0x8f8 [] __alloc_pages_direct_compact+0x100/0x540 [] __alloc_pages_nodemask+0xc40/0x1c58 [] khugepaged+0x468/0x19c8 [] kthread+0x248/0x2c0 [] ret_from_fork+0x10/0x40 Code: d281f012 91020021 f1020252 d503201f (a8000c02) I did some initial investigation and found it is caused by DEBUG_PAGEALLOC and CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT. And, mainline 4.6-rc3 works well. It should be not arch specific although I got it caught on ARM64. 
I suspect this might be caused by Hugh's huge tmpfs patches. Thanks, Yang -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-vk0-f71.google.com (mail-vk0-f71.google.com [209.85.213.71]) by kanga.kvack.org (Postfix) with ESMTP id 3772C6B0260 for ; Wed, 20 Apr 2016 04:01:20 -0400 (EDT) Received: by mail-vk0-f71.google.com with SMTP id e185so77149999vkb.2 for ; Wed, 20 Apr 2016 01:01:20 -0700 (PDT) Received: from mail-qg0-x236.google.com (mail-qg0-x236.google.com. [2607:f8b0:400d:c04::236]) by mx.google.com with ESMTPS id k66si3264513qhc.64.2016.04.20.01.01.19 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 20 Apr 2016 01:01:19 -0700 (PDT) Received: by mail-qg0-x236.google.com with SMTP id f52so23854573qga.3 for ; Wed, 20 Apr 2016 01:01:19 -0700 (PDT) Date: Wed, 20 Apr 2016 01:01:12 -0700 (PDT) From: Hugh Dickins Subject: Re: [BUG linux-next] Kernel panic found with linux-next-20160414 In-Reply-To: <5716C29F.1090205@linaro.org> Message-ID: References: <5716C29F.1090205@linaro.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-linux-mm@kvack.org List-ID: To: "Shi, Yang" Cc: Andrew Morton , sfr@canb.auug.org.au, Hugh Dickins , Vlastimil Babka , LKML , linux-mm@kvack.org On Tue, 19 Apr 2016, Shi, Yang wrote: > Hi folks, > > When I ran ltp on linux-next-20160414 on my ARM64 machine, I got the below > kernel panic: > > Unable to handle kernel paging request at virtual address ffffffc007846000 > pgd = ffffffc01e21d000 > [ffffffc007846000] *pgd=0000000000000000, *pud=0000000000000000 > Internal error: Oops: 96000047 [#11] PREEMPT SMP > Modules linked in: loop > CPU: 7 PID: 274 Comm: systemd-journal Tainted: G D > 4.6.0-rc3-next-20160414-WR8.0.0.0_standard+ #9 > Hardware name: Freescale Layerscape 2085a RDB Board (DT) 
> task: ffffffc01e3fcf80 ti: ffffffc01ea8c000 task.ti: ffffffc01ea8c000 > PC is at copy_page+0x38/0x120 > LR is at migrate_page_copy+0x604/0x1660 > pc : [] lr : [] pstate: 20000145 > sp : ffffffc01ea8ecd0 > x29: ffffffc01ea8ecd0 x28: 0000000000000000 > x27: 1ffffff7b80240f8 x26: ffffffc018196f20 > x25: ffffffbdc01e1180 x24: ffffffbdc01e1180 > x23: 0000000000000000 x22: ffffffc01e3fcf80 > x21: ffffffc00481f000 x20: ffffff900a31d000 > x19: ffffffbdc01207c0 x18: 0000000000000f00 > x17: 0000000000000000 x16: 0000000000000000 > x15: 0000000000000000 x14: 0000000000000000 > x13: 0000000000000000 x12: 0000000000000000 > x11: 0000000000000000 x10: 0000000000000000 > x9 : 0000000000000000 x8 : 0000000000000000 > x7 : 0000000000000000 x6 : 0000000000000000 > x5 : 0000000000000000 x4 : 0000000000000000 > x3 : 0000000000000000 x2 : 0000000000000000 > x1 : ffffffc00481f080 x0 : ffffffc007846000 > > Call trace: > Exception stack(0xffffffc021fc2ed0 to 0xffffffc021fc2ff0) > 2ec0: ffffffbdc00887c0 ffffff900a31d000 > 2ee0: ffffffc021fc30f0 ffffff9008ff2318 0000000020000145 0000000000000025 > 2f00: ffffffbdc025a280 ffffffc020adc4c0 0000000041b58ab3 ffffff900a085fd0 > 2f20: ffffff9008200658 0000000000000000 0000000000000000 ffffffbdc00887c0 > 2f40: ffffff900b0f1320 ffffffc021fc3078 0000000041b58ab3 ffffff900a0864f8 > 2f60: ffffff9008210010 ffffffc021fb8960 ffffff900867bacc 1ffffff8043f712d > 2f80: ffffffc021fc2fb0 ffffff9008210564 ffffffc021fc3070 ffffffc021fb8940 > 2fa0: 0000000008221f78 ffffff900862f9c8 ffffffc021fc2fe0 ffffff9008215dc8 > 2fc0: 1ffffff8043f8602 ffffffc021fc0000 ffffffc00968a000 ffffffc00221f080 > 2fe0: f9407e11d00001f0 d61f02209103e210 > [] copy_page+0x38/0x120 > [] migrate_page+0x74/0x98 > [] nfs_migrate_page+0x58/0x80 > [] move_to_new_page+0x15c/0x4d8 > [] migrate_pages+0x7c8/0x11f0 > [] compact_zone+0xdfc/0x2570 > [] compact_zone_order+0xe0/0x170 > [] try_to_compact_pages+0x2e8/0x8f8 > [] __alloc_pages_direct_compact+0x100/0x540 > [] 
__alloc_pages_nodemask+0xc40/0x1c58
> [] khugepaged+0x468/0x19c8
> [] kthread+0x248/0x2c0
> [] ret_from_fork+0x10/0x40
> Code: d281f012 91020021 f1020252 d503201f (a8000c02)
>
>
> I did some initial investigation and found it is caused by DEBUG_PAGEALLOC
> and CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT. And, mainline 4.6-rc3 works well.
>
> It should be not arch specific although I got it caught on ARM64. I suspect
> this might be caused by Hugh's huge tmpfs patches.

Thanks for testing. It might be caused by my patches, but I don't think
that's very likely. This is page migration for compaction, in the service
of anon THP's khugepaged; and I wonder if you were even exercising huge
tmpfs when running LTP here (it certainly can be done: I like to mount a
huge tmpfs on /opt/ltp and install there, with shmem_huge 2 so any other
tmpfs mounts are also huge).

There are compaction changes in linux-next too, but I don't see any
reason why they'd cause this. I don't know arm64 traces enough to know
whether it's the source page or the destination page for the copy, but
it looks as if it has been freed (and DEBUG_PAGEALLOC unmapped) before
reaching migration's copy.

Needs more debugging, I'm afraid: is it reproducible?

Hugh

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-pf0-f200.google.com (mail-pf0-f200.google.com [209.85.192.200]) by kanga.kvack.org (Postfix) with ESMTP id 674B06B025E for ; Wed, 20 Apr 2016 03:42:33 -0400 (EDT)
Received: by mail-pf0-f200.google.com with SMTP id e190so72914047pfe.3 for ; Wed, 20 Apr 2016 00:42:33 -0700 (PDT)
Received: from foss.arm.com (foss.arm.com.
[217.140.101.70]) by mx.google.com with ESMTP id l27si18826291pfj.18.2016.04.20.00.42.30 for ; Wed, 20 Apr 2016 00:42:30 -0700 (PDT) Subject: Re: [BUG linux-next] Kernel panic found with linux-next-20160414 References: <5716C29F.1090205@linaro.org> From: Vladimir Murzin Message-ID: <571732CB.8010206@arm.com> Date: Wed, 20 Apr 2016 08:42:03 +0100 MIME-Version: 1.0 In-Reply-To: <5716C29F.1090205@linaro.org> Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: "Shi, Yang" , Andrew Morton , sfr@canb.auug.org.au, Hugh Dickins Cc: LKML , linux-mm@kvack.org, "linux-arm-kernel@lists.infradead.org" CC LAKML in case somebody hit the same panic there. Vladimir On 20/04/16 00:43, Shi, Yang wrote: > Hi folks, > > When I ran ltp on linux-next-20160414 on my ARM64 machine, I got the > below kernel panic: > > Unable to handle kernel paging request at virtual address ffffffc007846000 > pgd = ffffffc01e21d000 > [ffffffc007846000] *pgd=0000000000000000, *pud=0000000000000000 > Internal error: Oops: 96000047 [#11] PREEMPT SMP > Modules linked in: loop > CPU: 7 PID: 274 Comm: systemd-journal Tainted: G D > 4.6.0-rc3-next-20160414-WR8.0.0.0_standard+ #9 > Hardware name: Freescale Layerscape 2085a RDB Board (DT) > task: ffffffc01e3fcf80 ti: ffffffc01ea8c000 task.ti: ffffffc01ea8c000 > PC is at copy_page+0x38/0x120 > LR is at migrate_page_copy+0x604/0x1660 > pc : [] lr : [] pstate: 20000145 > sp : ffffffc01ea8ecd0 > x29: ffffffc01ea8ecd0 x28: 0000000000000000 > x27: 1ffffff7b80240f8 x26: ffffffc018196f20 > x25: ffffffbdc01e1180 x24: ffffffbdc01e1180 > x23: 0000000000000000 x22: ffffffc01e3fcf80 > x21: ffffffc00481f000 x20: ffffff900a31d000 > x19: ffffffbdc01207c0 x18: 0000000000000f00 > x17: 0000000000000000 x16: 0000000000000000 > x15: 0000000000000000 x14: 0000000000000000 > x13: 0000000000000000 x12: 0000000000000000 > x11: 0000000000000000 x10: 0000000000000000 > x9 : 0000000000000000 x8 : 0000000000000000 > x7 : 
0000000000000000 x6 : 0000000000000000 > x5 : 0000000000000000 x4 : 0000000000000000 > x3 : 0000000000000000 x2 : 0000000000000000 > x1 : ffffffc00481f080 x0 : ffffffc007846000 > > Call trace: > Exception stack(0xffffffc021fc2ed0 to 0xffffffc021fc2ff0) > 2ec0: ffffffbdc00887c0 ffffff900a31d000 > 2ee0: ffffffc021fc30f0 ffffff9008ff2318 0000000020000145 0000000000000025 > 2f00: ffffffbdc025a280 ffffffc020adc4c0 0000000041b58ab3 ffffff900a085fd0 > 2f20: ffffff9008200658 0000000000000000 0000000000000000 ffffffbdc00887c0 > 2f40: ffffff900b0f1320 ffffffc021fc3078 0000000041b58ab3 ffffff900a0864f8 > 2f60: ffffff9008210010 ffffffc021fb8960 ffffff900867bacc 1ffffff8043f712d > 2f80: ffffffc021fc2fb0 ffffff9008210564 ffffffc021fc3070 ffffffc021fb8940 > 2fa0: 0000000008221f78 ffffff900862f9c8 ffffffc021fc2fe0 ffffff9008215dc8 > 2fc0: 1ffffff8043f8602 ffffffc021fc0000 ffffffc00968a000 ffffffc00221f080 > 2fe0: f9407e11d00001f0 d61f02209103e210 > [] copy_page+0x38/0x120 > [] migrate_page+0x74/0x98 > [] nfs_migrate_page+0x58/0x80 > [] move_to_new_page+0x15c/0x4d8 > [] migrate_pages+0x7c8/0x11f0 > [] compact_zone+0xdfc/0x2570 > [] compact_zone_order+0xe0/0x170 > [] try_to_compact_pages+0x2e8/0x8f8 > [] __alloc_pages_direct_compact+0x100/0x540 > [] __alloc_pages_nodemask+0xc40/0x1c58 > [] khugepaged+0x468/0x19c8 > [] kthread+0x248/0x2c0 > [] ret_from_fork+0x10/0x40 > Code: d281f012 91020021 f1020252 d503201f (a8000c02) > > > I did some initial investigation and found it is caused by > DEBUG_PAGEALLOC and CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT. And, mainline > 4.6-rc3 works well. > > It should be not arch specific although I got it caught on ARM64. I > suspect this might be caused by Hugh's huge tmpfs patches. > > Thanks, > Yang > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . 
> Don't email: email@kvack.org > > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pa0-f69.google.com (mail-pa0-f69.google.com [209.85.220.69]) by kanga.kvack.org (Postfix) with ESMTP id 5E0A76B027A for ; Wed, 20 Apr 2016 12:11:56 -0400 (EDT) Received: by mail-pa0-f69.google.com with SMTP id zy2so70008999pac.1 for ; Wed, 20 Apr 2016 09:11:56 -0700 (PDT) Received: from mail-pf0-x22f.google.com (mail-pf0-x22f.google.com. [2607:f8b0:400e:c00::22f]) by mx.google.com with ESMTPS id h10si8075420paw.142.2016.04.20.09.11.51 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 20 Apr 2016 09:11:52 -0700 (PDT) Received: by mail-pf0-x22f.google.com with SMTP id 184so19878341pff.0 for ; Wed, 20 Apr 2016 09:11:51 -0700 (PDT) Subject: Re: [BUG linux-next] Kernel panic found with linux-next-20160414 References: <5716C29F.1090205@linaro.org> From: "Shi, Yang" Message-ID: <5717AA46.5020905@linaro.org> Date: Wed, 20 Apr 2016 09:11:50 -0700 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Hugh Dickins Cc: Andrew Morton , sfr@canb.auug.org.au, Vlastimil Babka , LKML , linux-mm@kvack.org On 4/20/2016 1:01 AM, Hugh Dickins wrote: > On Tue, 19 Apr 2016, Shi, Yang wrote: >> Hi folks, >> >> When I ran ltp on linux-next-20160414 on my ARM64 machine, I got the below >> kernel panic: >> >> Unable to handle kernel paging request at virtual address ffffffc007846000 >> pgd = ffffffc01e21d000 >> [ffffffc007846000] *pgd=0000000000000000, *pud=0000000000000000 >> Internal error: Oops: 96000047 [#11] PREEMPT SMP >> Modules linked in: loop >> CPU: 7 PID: 274 Comm: systemd-journal Tainted: G D >> 4.6.0-rc3-next-20160414-WR8.0.0.0_standard+ #9 >> Hardware 
name: Freescale Layerscape 2085a RDB Board (DT) >> task: ffffffc01e3fcf80 ti: ffffffc01ea8c000 task.ti: ffffffc01ea8c000 >> PC is at copy_page+0x38/0x120 >> LR is at migrate_page_copy+0x604/0x1660 >> pc : [] lr : [] pstate: 20000145 >> sp : ffffffc01ea8ecd0 >> x29: ffffffc01ea8ecd0 x28: 0000000000000000 >> x27: 1ffffff7b80240f8 x26: ffffffc018196f20 >> x25: ffffffbdc01e1180 x24: ffffffbdc01e1180 >> x23: 0000000000000000 x22: ffffffc01e3fcf80 >> x21: ffffffc00481f000 x20: ffffff900a31d000 >> x19: ffffffbdc01207c0 x18: 0000000000000f00 >> x17: 0000000000000000 x16: 0000000000000000 >> x15: 0000000000000000 x14: 0000000000000000 >> x13: 0000000000000000 x12: 0000000000000000 >> x11: 0000000000000000 x10: 0000000000000000 >> x9 : 0000000000000000 x8 : 0000000000000000 >> x7 : 0000000000000000 x6 : 0000000000000000 >> x5 : 0000000000000000 x4 : 0000000000000000 >> x3 : 0000000000000000 x2 : 0000000000000000 >> x1 : ffffffc00481f080 x0 : ffffffc007846000 >> >> Call trace: >> Exception stack(0xffffffc021fc2ed0 to 0xffffffc021fc2ff0) >> 2ec0: ffffffbdc00887c0 ffffff900a31d000 >> 2ee0: ffffffc021fc30f0 ffffff9008ff2318 0000000020000145 0000000000000025 >> 2f00: ffffffbdc025a280 ffffffc020adc4c0 0000000041b58ab3 ffffff900a085fd0 >> 2f20: ffffff9008200658 0000000000000000 0000000000000000 ffffffbdc00887c0 >> 2f40: ffffff900b0f1320 ffffffc021fc3078 0000000041b58ab3 ffffff900a0864f8 >> 2f60: ffffff9008210010 ffffffc021fb8960 ffffff900867bacc 1ffffff8043f712d >> 2f80: ffffffc021fc2fb0 ffffff9008210564 ffffffc021fc3070 ffffffc021fb8940 >> 2fa0: 0000000008221f78 ffffff900862f9c8 ffffffc021fc2fe0 ffffff9008215dc8 >> 2fc0: 1ffffff8043f8602 ffffffc021fc0000 ffffffc00968a000 ffffffc00221f080 >> 2fe0: f9407e11d00001f0 d61f02209103e210 >> [] copy_page+0x38/0x120 >> [] migrate_page+0x74/0x98 >> [] nfs_migrate_page+0x58/0x80 >> [] move_to_new_page+0x15c/0x4d8 >> [] migrate_pages+0x7c8/0x11f0 >> [] compact_zone+0xdfc/0x2570 >> [] compact_zone_order+0xe0/0x170 >> [] 
try_to_compact_pages+0x2e8/0x8f8
>> [] __alloc_pages_direct_compact+0x100/0x540
>> [] __alloc_pages_nodemask+0xc40/0x1c58
>> [] khugepaged+0x468/0x19c8
>> [] kthread+0x248/0x2c0
>> [] ret_from_fork+0x10/0x40
>> Code: d281f012 91020021 f1020252 d503201f (a8000c02)
>>
>>
>> I did some initial investigation and found it is caused by DEBUG_PAGEALLOC
>> and CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT. And, mainline 4.6-rc3 works well.
>>
>> It should be not arch specific although I got it caught on ARM64. I suspect
>> this might be caused by Hugh's huge tmpfs patches.
>
> Thanks for testing. It might be caused by my patches, but I don't think
> that's very likely. This is page migration for compaction, in the service
> of anon THP's khugepaged; and I wonder if you were even exercising huge
> tmpfs when running LTP here (it certainly can be done: I like to mount a
> huge tmpfs on /opt/ltp and install there, with shmem_huge 2 so any other
> tmpfs mounts are also huge).

Some further investigation shows I got the panic even though I don't have
tmpfs mounted with huge=1 or set shmem_huge to 2.

>
> There are compaction changes in linux-next too, but I don't see any
> reason why they'd cause this. I don't know arm64 traces enough to know
> whether it's the source page or the destination page for the copy, but
> it looks as if it has been freed (and DEBUG_PAGEALLOC unmapped) before
> reaching migration's copy.

The fault address is passed in x0, which is dest in the implementation of
copy_page, so it is the destination page.

>
> Needs more debugging, I'm afraid: is it reproducible?

Yes, as long as I enable those two PAGEALLOC debug options, I can get the
panic once I run ltp. But it is not caused by any specific ltp test case
directly; the panic happens randomly while ltp is running.

Thanks,
Yang

>
> Hugh
>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-pa0-f69.google.com (mail-pa0-f69.google.com [209.85.220.69]) by kanga.kvack.org (Postfix) with ESMTP id ED5D16B026A for ; Wed, 27 Apr 2016 04:14:22 -0400 (EDT)
Received: by mail-pa0-f69.google.com with SMTP id xm6so50141908pab.3 for ; Wed, 27 Apr 2016 01:14:22 -0700 (PDT)
Received: from mail-pa0-x232.google.com (mail-pa0-x232.google.com. [2607:f8b0:400e:c03::232]) by mx.google.com with ESMTPS id x10si4141632pas.64.2016.04.27.01.14.22 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 27 Apr 2016 01:14:22 -0700 (PDT)
Received: by mail-pa0-x232.google.com with SMTP id bt5so16506446pac.3 for ; Wed, 27 Apr 2016 01:14:22 -0700 (PDT)
Date: Wed, 27 Apr 2016 01:14:13 -0700 (PDT)
From: Hugh Dickins
Subject: Re: [BUG linux-next] Kernel panic found with linux-next-20160414
In-Reply-To: <5717AA46.5020905@linaro.org>
Message-ID:
References: <5716C29F.1090205@linaro.org> <5717AA46.5020905@linaro.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-linux-mm@kvack.org
List-ID:
To: "Shi, Yang"
Cc: Hugh Dickins , Andrew Morton , sfr@canb.auug.org.au, Vlastimil Babka , LKML , linux-mm@kvack.org

On Wed, 20 Apr 2016, Shi, Yang wrote:
> On 4/20/2016 1:01 AM, Hugh Dickins wrote:
> > On Tue, 19 Apr 2016, Shi, Yang wrote:
> > > Hi folks,
> > >
> > > When I ran ltp on linux-next-20160414 on my ARM64 machine, I got the
> > > below
> > > kernel panic:
> > >
> > > Unable to handle kernel paging request at virtual address
> > > ffffffc007846000
> > > pgd = ffffffc01e21d000
> > > [ffffffc007846000] *pgd=0000000000000000, *pud=0000000000000000
> > > Internal error: Oops: 96000047 [#11] PREEMPT SMP
> > > Modules linked in: loop
> > > CPU: 7 PID: 274 Comm: systemd-journal Tainted: G D
> > > 4.6.0-rc3-next-20160414-WR8.0.0.0_standard+ #9
> > > Hardware name: Freescale Layerscape 2085a RDB Board (DT)
> > > task: ffffffc01e3fcf80 ti: ffffffc01ea8c000
task.ti: ffffffc01ea8c000
> > > PC is at copy_page+0x38/0x120
> > > LR is at migrate_page_copy+0x604/0x1660
> > > pc : [] lr : [] pstate: 20000145
> > > sp : ffffffc01ea8ecd0
> > > x29: ffffffc01ea8ecd0 x28: 0000000000000000
> > > x27: 1ffffff7b80240f8 x26: ffffffc018196f20
> > > x25: ffffffbdc01e1180 x24: ffffffbdc01e1180
> > > x23: 0000000000000000 x22: ffffffc01e3fcf80
> > > x21: ffffffc00481f000 x20: ffffff900a31d000
> > > x19: ffffffbdc01207c0 x18: 0000000000000f00
> > > x17: 0000000000000000 x16: 0000000000000000
> > > x15: 0000000000000000 x14: 0000000000000000
> > > x13: 0000000000000000 x12: 0000000000000000
> > > x11: 0000000000000000 x10: 0000000000000000
> > > x9 : 0000000000000000 x8 : 0000000000000000
> > > x7 : 0000000000000000 x6 : 0000000000000000
> > > x5 : 0000000000000000 x4 : 0000000000000000
> > > x3 : 0000000000000000 x2 : 0000000000000000
> > > x1 : ffffffc00481f080 x0 : ffffffc007846000
> > >
> > > Call trace:
> > > Exception stack(0xffffffc021fc2ed0 to 0xffffffc021fc2ff0)
> > > 2ec0: ffffffbdc00887c0 ffffff900a31d000
> > > 2ee0: ffffffc021fc30f0 ffffff9008ff2318 0000000020000145 0000000000000025
> > > 2f00: ffffffbdc025a280 ffffffc020adc4c0 0000000041b58ab3 ffffff900a085fd0
> > > 2f20: ffffff9008200658 0000000000000000 0000000000000000 ffffffbdc00887c0
> > > 2f40: ffffff900b0f1320 ffffffc021fc3078 0000000041b58ab3 ffffff900a0864f8
> > > 2f60: ffffff9008210010 ffffffc021fb8960 ffffff900867bacc 1ffffff8043f712d
> > > 2f80: ffffffc021fc2fb0 ffffff9008210564 ffffffc021fc3070 ffffffc021fb8940
> > > 2fa0: 0000000008221f78 ffffff900862f9c8 ffffffc021fc2fe0 ffffff9008215dc8
> > > 2fc0: 1ffffff8043f8602 ffffffc021fc0000 ffffffc00968a000 ffffffc00221f080
> > > 2fe0: f9407e11d00001f0 d61f02209103e210
> > > [] copy_page+0x38/0x120
> > > [] migrate_page+0x74/0x98
> > > [] nfs_migrate_page+0x58/0x80
> > > [] move_to_new_page+0x15c/0x4d8
> > > [] migrate_pages+0x7c8/0x11f0
> > > [] compact_zone+0xdfc/0x2570
> > > [] compact_zone_order+0xe0/0x170
> > > []
try_to_compact_pages+0x2e8/0x8f8
> > > [] __alloc_pages_direct_compact+0x100/0x540
> > > [] __alloc_pages_nodemask+0xc40/0x1c58
> > > [] khugepaged+0x468/0x19c8
> > > [] kthread+0x248/0x2c0
> > > [] ret_from_fork+0x10/0x40
> > > Code: d281f012 91020021 f1020252 d503201f (a8000c02)
> > >
> > >
> > > I did some initial investigation and found it is caused by
> > > DEBUG_PAGEALLOC
> > > and CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT. And, mainline 4.6-rc3 works
> > > well.
> > >
> > > It should be not arch specific although I got it caught on ARM64. I
> > > suspect
> > > this might be caused by Hugh's huge tmpfs patches.
> >
> > Thanks for testing. It might be caused by my patches, but I don't think
> > that's very likely. This is page migration for compaction, in the service
> > of anon THP's khugepaged; and I wonder if you were even exercising huge
> > tmpfs when running LTP here (it certainly can be done: I like to mount a
> > huge tmpfs on /opt/ltp and install there, with shmem_huge 2 so any other
> > tmpfs mounts are also huge).
>
> Some further investigation shows I got the panic even though I don't have
> tmpfs mounted with huge=1 or set shmem_huge to 2.
>
> >
> > There are compaction changes in linux-next too, but I don't see any
> > reason why they'd cause this. I don't know arm64 traces enough to know
> > whether it's the source page or the destination page for the copy, but
> > it looks as if it has been freed (and DEBUG_PAGEALLOC unmapped) before
> > reaching migration's copy.
>
> The fault address is passed in x0, which is dest in the implementation of
> copy_page, so it is the destination page.
>
> >
> > Needs more debugging, I'm afraid: is it reproducible?
>
> Yes, as long as I enable those two PAGEALLOC debug options, I can get the
> panic once I run ltp. But it is not caused by any specific ltp test case
> directly; the panic happens randomly while ltp is running.

Your ping on the crash in release_freepages() reminded me to take another look at this one.
And found that I only needed to enable DEBUG_PAGEALLOC and run LTP to get
it on x86_64 too, as you suspected.

It's another of those compaction errors, in mmotm and linux-next of a
week or two ago, whose patch has since been withdrawn (but huge tmpfs
has also been withdrawn for now, so you're right to stick with the
older linux-next for testing it).

I believe the patch below fixes it; but I've not done full diligence
on it - if I had more time, I'd want to check that all of the things
that need doing are now being done on this path, and that it's also
okay if the release undoes them even when they didn't get to be done.
But not worth that diligence if the patch is withdrawn already.

It's rather horrible that compaction.c uses functions in page_alloc.c
which skip doing some of the things we expect to be done: the non-debug
preparation tends to get noticed, but the debug options overlooked.
We can expect more problems of this kind in future: someone will add
yet another debug prep line in page_alloc.c, and at first nobody will
notice that it's also needed in compaction.c.

I am hopeful, since the missed map_pages() does KASAN initialization
too, that this might also fix your KASAN use-after-free in
nfs_do_filldir(), which you also reported on April 20th.

But with this patch in, I do get a more interesting crash in
remap_team_by_ptes() from LTP's mmapstress10: there appears to be an
anon THP in a huge tmpfs vma. Maybe I've got the test at the head of
__split_huge_pmd() wrong, but I don't recall seeing this before
rebuilding with DEBUG_PAGEALLOC. Can't spend longer on it now, will
return to it tomorrow.

Hugh

---
 mm/compaction.c |    1 +
 1 file changed, 1 insertion(+)

--- 4.6-rc2-mm1/mm/compaction.c	2016-04-11 11:35:08.000000000 -0700
+++ linux/mm/compaction.c	2016-04-26 22:15:10.954455303 -0700
@@ -1113,6 +1113,7 @@ static void isolate_freepages_direct(str
 	}
 
 	spin_unlock_irqrestore(&cc->zone->lock, flags);
+	map_pages(&cc->freepages);
 }
 
 /*

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-lf0-f72.google.com (mail-lf0-f72.google.com [209.85.215.72]) by kanga.kvack.org (Postfix) with ESMTP id CB03D6B0005 for ; Wed, 27 Apr 2016 04:51:26 -0400 (EDT)
Received: by mail-lf0-f72.google.com with SMTP id k200so33580571lfg.1 for ; Wed, 27 Apr 2016 01:51:26 -0700 (PDT)
Received: from mx2.suse.de (mx2.suse.de. [195.135.220.15]) by mx.google.com with ESMTPS id a1si7977554wmi.12.2016.04.27.01.51.25 for (version=TLS1 cipher=AES128-SHA bits=128/128); Wed, 27 Apr 2016 01:51:25 -0700 (PDT)
Subject: Re: [BUG linux-next] Kernel panic found with linux-next-20160414
References: <5716C29F.1090205@linaro.org> <5717AA46.5020905@linaro.org>
From: Vlastimil Babka
Message-ID: <57207D88.4000508@suse.cz>
Date: Wed, 27 Apr 2016 10:51:20 +0200
MIME-Version: 1.0
In-Reply-To:
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
List-ID:
To: Hugh Dickins , "Shi, Yang"
Cc: Andrew Morton , sfr@canb.auug.org.au, LKML , linux-mm@kvack.org

On 04/27/2016 10:14 AM, Hugh Dickins wrote:
> It's rather horrible that compaction.c uses functions in page_alloc.c
> which skip doing some of the things we expect to be done: the non-debug
> preparation tends to get noticed, but the debug options overlooked.
> We can expect more problems of this kind in future: someone will add
> yet another debug prep line in page_alloc.c, and at first nobody will
> notice that it's also needed in compaction.c.

Point taken, I'll try to come up with a more maintainable solution next
time I attempt the isolate_freepages_direct() approach. Sorry about the
troubles.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-pf0-f200.google.com (mail-pf0-f200.google.com [209.85.192.200])
	by kanga.kvack.org (Postfix) with ESMTP id 895F86B0005
	for ; Thu, 28 Apr 2016 14:21:50 -0400 (EDT)
Received: by mail-pf0-f200.google.com with SMTP id e190so179268775pfe.3
	for ; Thu, 28 Apr 2016 11:21:50 -0700 (PDT)
Received: from mail-pa0-x234.google.com (mail-pa0-x234.google.com. [2607:f8b0:400e:c03::234])
	by mx.google.com with ESMTPS id z25si11411137pfa.82.2016.04.28.11.21.49
	for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
	Thu, 28 Apr 2016 11:21:49 -0700 (PDT)
Received: by mail-pa0-x234.google.com with SMTP id bt5so35010839pac.3
	for ; Thu, 28 Apr 2016 11:21:49 -0700 (PDT)
Subject: Re: [BUG linux-next] Kernel panic found with linux-next-20160414
References: <5716C29F.1090205@linaro.org> <5717AA46.5020905@linaro.org>
From: "Shi, Yang"
Message-ID:
Date: Thu, 28 Apr 2016 11:21:47 -0700
MIME-Version: 1.0
In-Reply-To:
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
List-ID:
To: Hugh Dickins
Cc: Andrew Morton , sfr@canb.auug.org.au, Vlastimil Babka , LKML , linux-mm@kvack.org

On 4/27/2016 1:14 AM, Hugh Dickins wrote:
> On Wed, 20 Apr 2016, Shi, Yang wrote:
>> On 4/20/2016 1:01 AM, Hugh Dickins wrote:
>>> On Tue, 19 Apr 2016, Shi, Yang wrote:
>>>> Hi folks,
>>>>
>>>> When I ran ltp on linux-next-20160414 on my ARM64
machine, I got the >>>> below >>>> kernel panic: >>>> >>>> Unable to handle kernel paging request at virtual address >>>> ffffffc007846000 >>>> pgd = ffffffc01e21d000 >>>> [ffffffc007846000] *pgd=0000000000000000, *pud=0000000000000000 >>>> Internal error: Oops: 96000047 [#11] PREEMPT SMP >>>> Modules linked in: loop >>>> CPU: 7 PID: 274 Comm: systemd-journal Tainted: G D >>>> 4.6.0-rc3-next-20160414-WR8.0.0.0_standard+ #9 >>>> Hardware name: Freescale Layerscape 2085a RDB Board (DT) >>>> task: ffffffc01e3fcf80 ti: ffffffc01ea8c000 task.ti: ffffffc01ea8c000 >>>> PC is at copy_page+0x38/0x120 >>>> LR is at migrate_page_copy+0x604/0x1660 >>>> pc : [] lr : [] pstate: 20000145 >>>> sp : ffffffc01ea8ecd0 >>>> x29: ffffffc01ea8ecd0 x28: 0000000000000000 >>>> x27: 1ffffff7b80240f8 x26: ffffffc018196f20 >>>> x25: ffffffbdc01e1180 x24: ffffffbdc01e1180 >>>> x23: 0000000000000000 x22: ffffffc01e3fcf80 >>>> x21: ffffffc00481f000 x20: ffffff900a31d000 >>>> x19: ffffffbdc01207c0 x18: 0000000000000f00 >>>> x17: 0000000000000000 x16: 0000000000000000 >>>> x15: 0000000000000000 x14: 0000000000000000 >>>> x13: 0000000000000000 x12: 0000000000000000 >>>> x11: 0000000000000000 x10: 0000000000000000 >>>> x9 : 0000000000000000 x8 : 0000000000000000 >>>> x7 : 0000000000000000 x6 : 0000000000000000 >>>> x5 : 0000000000000000 x4 : 0000000000000000 >>>> x3 : 0000000000000000 x2 : 0000000000000000 >>>> x1 : ffffffc00481f080 x0 : ffffffc007846000 >>>> >>>> Call trace: >>>> Exception stack(0xffffffc021fc2ed0 to 0xffffffc021fc2ff0) >>>> 2ec0: ffffffbdc00887c0 ffffff900a31d000 >>>> 2ee0: ffffffc021fc30f0 ffffff9008ff2318 0000000020000145 0000000000000025 >>>> 2f00: ffffffbdc025a280 ffffffc020adc4c0 0000000041b58ab3 ffffff900a085fd0 >>>> 2f20: ffffff9008200658 0000000000000000 0000000000000000 ffffffbdc00887c0 >>>> 2f40: ffffff900b0f1320 ffffffc021fc3078 0000000041b58ab3 ffffff900a0864f8 >>>> 2f60: ffffff9008210010 ffffffc021fb8960 ffffff900867bacc 1ffffff8043f712d >>>> 2f80: ffffffc021fc2fb0 
ffffff9008210564 ffffffc021fc3070 ffffffc021fb8940 >>>> 2fa0: 0000000008221f78 ffffff900862f9c8 ffffffc021fc2fe0 ffffff9008215dc8 >>>> 2fc0: 1ffffff8043f8602 ffffffc021fc0000 ffffffc00968a000 ffffffc00221f080 >>>> 2fe0: f9407e11d00001f0 d61f02209103e210 >>>> [] copy_page+0x38/0x120 >>>> [] migrate_page+0x74/0x98 >>>> [] nfs_migrate_page+0x58/0x80 >>>> [] move_to_new_page+0x15c/0x4d8 >>>> [] migrate_pages+0x7c8/0x11f0 >>>> [] compact_zone+0xdfc/0x2570 >>>> [] compact_zone_order+0xe0/0x170 >>>> [] try_to_compact_pages+0x2e8/0x8f8 >>>> [] __alloc_pages_direct_compact+0x100/0x540 >>>> [] __alloc_pages_nodemask+0xc40/0x1c58 >>>> [] khugepaged+0x468/0x19c8 >>>> [] kthread+0x248/0x2c0 >>>> [] ret_from_fork+0x10/0x40 >>>> Code: d281f012 91020021 f1020252 d503201f (a8000c02) >>>> >>>> >>>> I did some initial investigation and found it is caused by >>>> DEBUG_PAGEALLOC >>>> and CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT. And, mainline 4.6-rc3 works >>>> well. >>>> >>>> It should be not arch specific although I got it caught on ARM64. I >>>> suspect >>>> this might be caused by Hugh's huge tmpfs patches. >>> >>> Thanks for testing. It might be caused by my patches, but I don't think >>> that's very likely. This is page migraton for compaction, in the service >>> of anon THP's khugepaged; and I wonder if you were even exercising huge >>> tmpfs when running LTP here (it certainly can be done: I like to mount a >>> huge tmpfs on /opt/ltp and install there, with shmem_huge 2 so any other >>> tmpfs mounts are also huge). >> >> Some further investigation shows I got the panic even though I don't have >> tmpfs mounted with huge=1 or set shmem_huge to 2. >> >>> >>> There are compaction changes in linux-next too, but I don't see any >>> reason why they'd cause this. I don't know arm64 traces enough to know >>> whether it's the source page or the destination page for the copy, but >>> it looks as if it has been freed (and DEBUG_PAGEALLOC unmapped) before >>> reaching migration's copy. 
>> The fault address is passed by x0, which is dest in the implementation of
>> copy_page, so it is the destination page.
>>
>>>
>>> Needs more debugging, I'm afraid: is it reproducible?
>>
>> Yes, as long as I enable those two PAGEALLOC debug options, I can get the
>> panic once I run ltp. But, it is not caused any specific ltp test case
>> directly, the panic happens randomly during ltp is running.
>
> Your ping on the crash in release_freepages() reminded me to take another
> look at this one. And found that I only needed to enable DEBUG_PAGEALLOC
> and run LTP to get it on x86_64 too, as you suspected.
>
> It's another of those compaction errors, in mmotm and linux-next of a
> week or two ago, whose patch has since been withdrawn (but huge tmpfs
> has also been withdrawn for now, so you're right to stick with the
> older linux-next for testing it).

Yes, I saw the discussion at LSFMM 2016, and the patches are gone from my
latest linux-next update. I will stick to 20160420 for the huge tmpfs
testing.

>
> I believe the patch below fixes it; but I've not done full diligence
> on it - if I had more time, I'd want to check that all of the things
> that need doing are now being done on this path, and that it's also
> okay if the release undoes them even when they didn't get to be done.
> But not worth that diligence if the patch is withdrawn already.
>
> It's rather horrible that compaction.c uses functions in page_alloc.c
> which skip doing some of the things we expect to be done: the non-debug
> preparation tends to get noticed, but the debug options overlooked.
> We can expect more problems of this kind in future: someone will add
> yet another debug prep line in page_alloc.c, and at first nobody will
> notice that it's also needed in compaction.c.
>
> I am hopeful, since the missed map_pages() does KASAN initialization too,
> that this might also fix your KASAN use-after-free in nfs_do_filldir(),
> which you also reported on April 20th.
> > But with this patch in, I do get a more interesting crash in > remap_team_by_ptes() from LTP's mmapstress10: there appears to be an > anon THP in a huge tmpfs vma. Maybe I've got the test at the head of > __split_huge_pmd() wrong, but I don't recall seeing this before > rebuilding with DEBUG_PAGEALLOC. Can't spend longer on it now, > will return to it tomorrow. Thanks for the patch and the patch for another problem. Regards, Yang > > Hugh > --- > mm/compaction.c | 1 + > 1 file changed, 1 insertion(+) > > --- 4.6-rc2-mm1/mm/compaction.c 2016-04-11 11:35:08.000000000 -0700 > +++ linux/mm/compaction.c 2016-04-26 22:15:10.954455303 -0700 > @@ -1113,6 +1113,7 @@ static void isolate_freepages_direct(str > } > > spin_unlock_irqrestore(&cc->zone->lock, flags); > + map_pages(&cc->freepages); > } > > /* > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753522AbcDSXna (ORCPT ); Tue, 19 Apr 2016 19:43:30 -0400 Received: from mail-pf0-f173.google.com ([209.85.192.173]:34987 "EHLO mail-pf0-f173.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752771AbcDSXn3 (ORCPT ); Tue, 19 Apr 2016 19:43:29 -0400 To: Andrew Morton , sfr@canb.auug.org.au, Hugh Dickins Cc: LKML , linux-mm@kvack.org, yang.shi@linaro.org From: "Shi, Yang" Subject: [BUG linux-next] Kernel panic found with linux-next-20160414 Message-ID: <5716C29F.1090205@linaro.org> Date: Tue, 19 Apr 2016 16:43:27 -0700 User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:38.0) Gecko/20100101 Thunderbird/38.7.2 MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi folks, When I ran ltp on 
linux-next-20160414 on my ARM64 machine, I got the below kernel panic: Unable to handle kernel paging request at virtual address ffffffc007846000 pgd = ffffffc01e21d000 [ffffffc007846000] *pgd=0000000000000000, *pud=0000000000000000 Internal error: Oops: 96000047 [#11] PREEMPT SMP Modules linked in: loop CPU: 7 PID: 274 Comm: systemd-journal Tainted: G D 4.6.0-rc3-next-20160414-WR8.0.0.0_standard+ #9 Hardware name: Freescale Layerscape 2085a RDB Board (DT) task: ffffffc01e3fcf80 ti: ffffffc01ea8c000 task.ti: ffffffc01ea8c000 PC is at copy_page+0x38/0x120 LR is at migrate_page_copy+0x604/0x1660 pc : [] lr : [] pstate: 20000145 sp : ffffffc01ea8ecd0 x29: ffffffc01ea8ecd0 x28: 0000000000000000 x27: 1ffffff7b80240f8 x26: ffffffc018196f20 x25: ffffffbdc01e1180 x24: ffffffbdc01e1180 x23: 0000000000000000 x22: ffffffc01e3fcf80 x21: ffffffc00481f000 x20: ffffff900a31d000 x19: ffffffbdc01207c0 x18: 0000000000000f00 x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000 x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000 x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000 x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000000000 x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000 x2 : 0000000000000000 x1 : ffffffc00481f080 x0 : ffffffc007846000 Call trace: Exception stack(0xffffffc021fc2ed0 to 0xffffffc021fc2ff0) 2ec0: ffffffbdc00887c0 ffffff900a31d000 2ee0: ffffffc021fc30f0 ffffff9008ff2318 0000000020000145 0000000000000025 2f00: ffffffbdc025a280 ffffffc020adc4c0 0000000041b58ab3 ffffff900a085fd0 2f20: ffffff9008200658 0000000000000000 0000000000000000 ffffffbdc00887c0 2f40: ffffff900b0f1320 ffffffc021fc3078 0000000041b58ab3 ffffff900a0864f8 2f60: ffffff9008210010 ffffffc021fb8960 ffffff900867bacc 1ffffff8043f712d 2f80: ffffffc021fc2fb0 ffffff9008210564 ffffffc021fc3070 ffffffc021fb8940 2fa0: 0000000008221f78 ffffff900862f9c8 ffffffc021fc2fe0 ffffff9008215dc8 2fc0: 1ffffff8043f8602 ffffffc021fc0000 
ffffffc00968a000 ffffffc00221f080 2fe0: f9407e11d00001f0 d61f02209103e210 [] copy_page+0x38/0x120 [] migrate_page+0x74/0x98 [] nfs_migrate_page+0x58/0x80 [] move_to_new_page+0x15c/0x4d8 [] migrate_pages+0x7c8/0x11f0 [] compact_zone+0xdfc/0x2570 [] compact_zone_order+0xe0/0x170 [] try_to_compact_pages+0x2e8/0x8f8 [] __alloc_pages_direct_compact+0x100/0x540 [] __alloc_pages_nodemask+0xc40/0x1c58 [] khugepaged+0x468/0x19c8 [] kthread+0x248/0x2c0 [] ret_from_fork+0x10/0x40 Code: d281f012 91020021 f1020252 d503201f (a8000c02) I did some initial investigation and found it is caused by DEBUG_PAGEALLOC and CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT. And, mainline 4.6-rc3 works well. It should be not arch specific although I got it caught on ARM64. I suspect this might be caused by Hugh's huge tmpfs patches. Thanks, Yang From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754552AbcDTHmc (ORCPT ); Wed, 20 Apr 2016 03:42:32 -0400 Received: from foss.arm.com ([217.140.101.70]:44876 "EHLO foss.arm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932066AbcDTHma (ORCPT ); Wed, 20 Apr 2016 03:42:30 -0400 Subject: Re: [BUG linux-next] Kernel panic found with linux-next-20160414 To: "Shi, Yang" , Andrew Morton , sfr@canb.auug.org.au, Hugh Dickins References: <5716C29F.1090205@linaro.org> Cc: LKML , linux-mm@kvack.org, "linux-arm-kernel@lists.infradead.org" From: Vladimir Murzin Message-ID: <571732CB.8010206@arm.com> Date: Wed, 20 Apr 2016 08:42:03 +0100 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.5.1 MIME-Version: 1.0 In-Reply-To: <5716C29F.1090205@linaro.org> Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org CC LAKML in case somebody hit the same panic there. 
Vladimir On 20/04/16 00:43, Shi, Yang wrote: > Hi folks, > > When I ran ltp on linux-next-20160414 on my ARM64 machine, I got the > below kernel panic: > > Unable to handle kernel paging request at virtual address ffffffc007846000 > pgd = ffffffc01e21d000 > [ffffffc007846000] *pgd=0000000000000000, *pud=0000000000000000 > Internal error: Oops: 96000047 [#11] PREEMPT SMP > Modules linked in: loop > CPU: 7 PID: 274 Comm: systemd-journal Tainted: G D > 4.6.0-rc3-next-20160414-WR8.0.0.0_standard+ #9 > Hardware name: Freescale Layerscape 2085a RDB Board (DT) > task: ffffffc01e3fcf80 ti: ffffffc01ea8c000 task.ti: ffffffc01ea8c000 > PC is at copy_page+0x38/0x120 > LR is at migrate_page_copy+0x604/0x1660 > pc : [] lr : [] pstate: 20000145 > sp : ffffffc01ea8ecd0 > x29: ffffffc01ea8ecd0 x28: 0000000000000000 > x27: 1ffffff7b80240f8 x26: ffffffc018196f20 > x25: ffffffbdc01e1180 x24: ffffffbdc01e1180 > x23: 0000000000000000 x22: ffffffc01e3fcf80 > x21: ffffffc00481f000 x20: ffffff900a31d000 > x19: ffffffbdc01207c0 x18: 0000000000000f00 > x17: 0000000000000000 x16: 0000000000000000 > x15: 0000000000000000 x14: 0000000000000000 > x13: 0000000000000000 x12: 0000000000000000 > x11: 0000000000000000 x10: 0000000000000000 > x9 : 0000000000000000 x8 : 0000000000000000 > x7 : 0000000000000000 x6 : 0000000000000000 > x5 : 0000000000000000 x4 : 0000000000000000 > x3 : 0000000000000000 x2 : 0000000000000000 > x1 : ffffffc00481f080 x0 : ffffffc007846000 > > Call trace: > Exception stack(0xffffffc021fc2ed0 to 0xffffffc021fc2ff0) > 2ec0: ffffffbdc00887c0 ffffff900a31d000 > 2ee0: ffffffc021fc30f0 ffffff9008ff2318 0000000020000145 0000000000000025 > 2f00: ffffffbdc025a280 ffffffc020adc4c0 0000000041b58ab3 ffffff900a085fd0 > 2f20: ffffff9008200658 0000000000000000 0000000000000000 ffffffbdc00887c0 > 2f40: ffffff900b0f1320 ffffffc021fc3078 0000000041b58ab3 ffffff900a0864f8 > 2f60: ffffff9008210010 ffffffc021fb8960 ffffff900867bacc 1ffffff8043f712d > 2f80: ffffffc021fc2fb0 ffffff9008210564 
ffffffc021fc3070 ffffffc021fb8940 > 2fa0: 0000000008221f78 ffffff900862f9c8 ffffffc021fc2fe0 ffffff9008215dc8 > 2fc0: 1ffffff8043f8602 ffffffc021fc0000 ffffffc00968a000 ffffffc00221f080 > 2fe0: f9407e11d00001f0 d61f02209103e210 > [] copy_page+0x38/0x120 > [] migrate_page+0x74/0x98 > [] nfs_migrate_page+0x58/0x80 > [] move_to_new_page+0x15c/0x4d8 > [] migrate_pages+0x7c8/0x11f0 > [] compact_zone+0xdfc/0x2570 > [] compact_zone_order+0xe0/0x170 > [] try_to_compact_pages+0x2e8/0x8f8 > [] __alloc_pages_direct_compact+0x100/0x540 > [] __alloc_pages_nodemask+0xc40/0x1c58 > [] khugepaged+0x468/0x19c8 > [] kthread+0x248/0x2c0 > [] ret_from_fork+0x10/0x40 > Code: d281f012 91020021 f1020252 d503201f (a8000c02) > > > I did some initial investigation and found it is caused by > DEBUG_PAGEALLOC and CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT. And, mainline > 4.6-rc3 works well. > > It should be not arch specific although I got it caught on ARM64. I > suspect this might be caused by Hugh's huge tmpfs patches. > > Thanks, > Yang > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . 
> Don't email: email@kvack.org > > From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S933043AbcDTIBY (ORCPT ); Wed, 20 Apr 2016 04:01:24 -0400 Received: from mail-qg0-f53.google.com ([209.85.192.53]:34145 "EHLO mail-qg0-f53.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932632AbcDTIBU (ORCPT ); Wed, 20 Apr 2016 04:01:20 -0400 Date: Wed, 20 Apr 2016 01:01:12 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@eggly.anvils To: "Shi, Yang" cc: Andrew Morton , sfr@canb.auug.org.au, Hugh Dickins , Vlastimil Babka , LKML , linux-mm@kvack.org Subject: Re: [BUG linux-next] Kernel panic found with linux-next-20160414 In-Reply-To: <5716C29F.1090205@linaro.org> Message-ID: References: <5716C29F.1090205@linaro.org> User-Agent: Alpine 2.11 (LSU 23 2013-08-11) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, 19 Apr 2016, Shi, Yang wrote: > Hi folks, > > When I ran ltp on linux-next-20160414 on my ARM64 machine, I got the below > kernel panic: > > Unable to handle kernel paging request at virtual address ffffffc007846000 > pgd = ffffffc01e21d000 > [ffffffc007846000] *pgd=0000000000000000, *pud=0000000000000000 > Internal error: Oops: 96000047 [#11] PREEMPT SMP > Modules linked in: loop > CPU: 7 PID: 274 Comm: systemd-journal Tainted: G D > 4.6.0-rc3-next-20160414-WR8.0.0.0_standard+ #9 > Hardware name: Freescale Layerscape 2085a RDB Board (DT) > task: ffffffc01e3fcf80 ti: ffffffc01ea8c000 task.ti: ffffffc01ea8c000 > PC is at copy_page+0x38/0x120 > LR is at migrate_page_copy+0x604/0x1660 > pc : [] lr : [] pstate: 20000145 > sp : ffffffc01ea8ecd0 > x29: ffffffc01ea8ecd0 x28: 0000000000000000 > x27: 1ffffff7b80240f8 x26: ffffffc018196f20 > x25: ffffffbdc01e1180 x24: ffffffbdc01e1180 > x23: 0000000000000000 x22: ffffffc01e3fcf80 > x21: ffffffc00481f000 x20: ffffff900a31d000 
> x19: ffffffbdc01207c0 x18: 0000000000000f00 > x17: 0000000000000000 x16: 0000000000000000 > x15: 0000000000000000 x14: 0000000000000000 > x13: 0000000000000000 x12: 0000000000000000 > x11: 0000000000000000 x10: 0000000000000000 > x9 : 0000000000000000 x8 : 0000000000000000 > x7 : 0000000000000000 x6 : 0000000000000000 > x5 : 0000000000000000 x4 : 0000000000000000 > x3 : 0000000000000000 x2 : 0000000000000000 > x1 : ffffffc00481f080 x0 : ffffffc007846000 > > Call trace: > Exception stack(0xffffffc021fc2ed0 to 0xffffffc021fc2ff0) > 2ec0: ffffffbdc00887c0 ffffff900a31d000 > 2ee0: ffffffc021fc30f0 ffffff9008ff2318 0000000020000145 0000000000000025 > 2f00: ffffffbdc025a280 ffffffc020adc4c0 0000000041b58ab3 ffffff900a085fd0 > 2f20: ffffff9008200658 0000000000000000 0000000000000000 ffffffbdc00887c0 > 2f40: ffffff900b0f1320 ffffffc021fc3078 0000000041b58ab3 ffffff900a0864f8 > 2f60: ffffff9008210010 ffffffc021fb8960 ffffff900867bacc 1ffffff8043f712d > 2f80: ffffffc021fc2fb0 ffffff9008210564 ffffffc021fc3070 ffffffc021fb8940 > 2fa0: 0000000008221f78 ffffff900862f9c8 ffffffc021fc2fe0 ffffff9008215dc8 > 2fc0: 1ffffff8043f8602 ffffffc021fc0000 ffffffc00968a000 ffffffc00221f080 > 2fe0: f9407e11d00001f0 d61f02209103e210 > [] copy_page+0x38/0x120 > [] migrate_page+0x74/0x98 > [] nfs_migrate_page+0x58/0x80 > [] move_to_new_page+0x15c/0x4d8 > [] migrate_pages+0x7c8/0x11f0 > [] compact_zone+0xdfc/0x2570 > [] compact_zone_order+0xe0/0x170 > [] try_to_compact_pages+0x2e8/0x8f8 > [] __alloc_pages_direct_compact+0x100/0x540 > [] __alloc_pages_nodemask+0xc40/0x1c58 > [] khugepaged+0x468/0x19c8 > [] kthread+0x248/0x2c0 > [] ret_from_fork+0x10/0x40 > Code: d281f012 91020021 f1020252 d503201f (a8000c02) > > > I did some initial investigation and found it is caused by DEBUG_PAGEALLOC > and CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT. And, mainline 4.6-rc3 works well. > > It should be not arch specific although I got it caught on ARM64. 
I suspect
> this might be caused by Hugh's huge tmpfs patches.

Thanks for testing. It might be caused by my patches, but I don't think
that's very likely. This is page migration for compaction, in the service
of anon THP's khugepaged; and I wonder if you were even exercising huge
tmpfs when running LTP here (it certainly can be done: I like to mount a
huge tmpfs on /opt/ltp and install there, with shmem_huge 2 so any other
tmpfs mounts are also huge).

There are compaction changes in linux-next too, but I don't see any
reason why they'd cause this. I don't know arm64 traces enough to know
whether it's the source page or the destination page for the copy, but
it looks as if it has been freed (and DEBUG_PAGEALLOC unmapped) before
reaching migration's copy.

Needs more debugging, I'm afraid: is it reproducible?

Hugh

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751889AbcDTQLz (ORCPT ); Wed, 20 Apr 2016 12:11:55 -0400
Received: from mail-pf0-f174.google.com ([209.85.192.174]:35670 "EHLO mail-pf0-f174.google.com" rhost-flags-OK-OK-OK-OK)
	by vger.kernel.org with ESMTP id S1751608AbcDTQLw (ORCPT );
	Wed, 20 Apr 2016 12:11:52 -0400
Subject: Re: [BUG linux-next] Kernel panic found with linux-next-20160414
To: Hugh Dickins
References: <5716C29F.1090205@linaro.org>
Cc: Andrew Morton , sfr@canb.auug.org.au, Vlastimil Babka , LKML , linux-mm@kvack.org
From: "Shi, Yang"
Message-ID: <5717AA46.5020905@linaro.org>
Date: Wed, 20 Apr 2016 09:11:50 -0700
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:38.0) Gecko/20100101 Thunderbird/38.7.2
MIME-Version: 1.0
In-Reply-To:
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 4/20/2016 1:01 AM, Hugh Dickins wrote:
> On Tue, 19 Apr 2016, Shi, Yang wrote:
>> Hi folks,
>>
>> When I ran ltp on linux-next-20160414 on my
ARM64 machine, I got the below >> kernel panic: >> >> Unable to handle kernel paging request at virtual address ffffffc007846000 >> pgd = ffffffc01e21d000 >> [ffffffc007846000] *pgd=0000000000000000, *pud=0000000000000000 >> Internal error: Oops: 96000047 [#11] PREEMPT SMP >> Modules linked in: loop >> CPU: 7 PID: 274 Comm: systemd-journal Tainted: G D >> 4.6.0-rc3-next-20160414-WR8.0.0.0_standard+ #9 >> Hardware name: Freescale Layerscape 2085a RDB Board (DT) >> task: ffffffc01e3fcf80 ti: ffffffc01ea8c000 task.ti: ffffffc01ea8c000 >> PC is at copy_page+0x38/0x120 >> LR is at migrate_page_copy+0x604/0x1660 >> pc : [] lr : [] pstate: 20000145 >> sp : ffffffc01ea8ecd0 >> x29: ffffffc01ea8ecd0 x28: 0000000000000000 >> x27: 1ffffff7b80240f8 x26: ffffffc018196f20 >> x25: ffffffbdc01e1180 x24: ffffffbdc01e1180 >> x23: 0000000000000000 x22: ffffffc01e3fcf80 >> x21: ffffffc00481f000 x20: ffffff900a31d000 >> x19: ffffffbdc01207c0 x18: 0000000000000f00 >> x17: 0000000000000000 x16: 0000000000000000 >> x15: 0000000000000000 x14: 0000000000000000 >> x13: 0000000000000000 x12: 0000000000000000 >> x11: 0000000000000000 x10: 0000000000000000 >> x9 : 0000000000000000 x8 : 0000000000000000 >> x7 : 0000000000000000 x6 : 0000000000000000 >> x5 : 0000000000000000 x4 : 0000000000000000 >> x3 : 0000000000000000 x2 : 0000000000000000 >> x1 : ffffffc00481f080 x0 : ffffffc007846000 >> >> Call trace: >> Exception stack(0xffffffc021fc2ed0 to 0xffffffc021fc2ff0) >> 2ec0: ffffffbdc00887c0 ffffff900a31d000 >> 2ee0: ffffffc021fc30f0 ffffff9008ff2318 0000000020000145 0000000000000025 >> 2f00: ffffffbdc025a280 ffffffc020adc4c0 0000000041b58ab3 ffffff900a085fd0 >> 2f20: ffffff9008200658 0000000000000000 0000000000000000 ffffffbdc00887c0 >> 2f40: ffffff900b0f1320 ffffffc021fc3078 0000000041b58ab3 ffffff900a0864f8 >> 2f60: ffffff9008210010 ffffffc021fb8960 ffffff900867bacc 1ffffff8043f712d >> 2f80: ffffffc021fc2fb0 ffffff9008210564 ffffffc021fc3070 ffffffc021fb8940 >> 2fa0: 0000000008221f78 
ffffff900862f9c8 ffffffc021fc2fe0 ffffff9008215dc8 >> 2fc0: 1ffffff8043f8602 ffffffc021fc0000 ffffffc00968a000 ffffffc00221f080 >> 2fe0: f9407e11d00001f0 d61f02209103e210 >> [] copy_page+0x38/0x120 >> [] migrate_page+0x74/0x98 >> [] nfs_migrate_page+0x58/0x80 >> [] move_to_new_page+0x15c/0x4d8 >> [] migrate_pages+0x7c8/0x11f0 >> [] compact_zone+0xdfc/0x2570 >> [] compact_zone_order+0xe0/0x170 >> [] try_to_compact_pages+0x2e8/0x8f8 >> [] __alloc_pages_direct_compact+0x100/0x540 >> [] __alloc_pages_nodemask+0xc40/0x1c58 >> [] khugepaged+0x468/0x19c8 >> [] kthread+0x248/0x2c0 >> [] ret_from_fork+0x10/0x40 >> Code: d281f012 91020021 f1020252 d503201f (a8000c02) >> >> >> I did some initial investigation and found it is caused by DEBUG_PAGEALLOC >> and CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT. And, mainline 4.6-rc3 works well. >> >> It should be not arch specific although I got it caught on ARM64. I suspect >> this might be caused by Hugh's huge tmpfs patches. > > Thanks for testing. It might be caused by my patches, but I don't think > that's very likely. This is page migraton for compaction, in the service > of anon THP's khugepaged; and I wonder if you were even exercising huge > tmpfs when running LTP here (it certainly can be done: I like to mount a > huge tmpfs on /opt/ltp and install there, with shmem_huge 2 so any other > tmpfs mounts are also huge). Some further investigation shows I got the panic even though I don't have tmpfs mounted with huge=1 or set shmem_huge to 2. > > There are compaction changes in linux-next too, but I don't see any > reason why they'd cause this. I don't know arm64 traces enough to know > whether it's the source page or the destination page for the copy, but > it looks as if it has been freed (and DEBUG_PAGEALLOC unmapped) before > reaching migration's copy. The fault address is passed by x0, which is dest in the implementation of copy_page, so it is the destination page. > > Needs more debugging, I'm afraid: is it reproducible? 
Yes, as long as I enable those two PAGEALLOC debug options, I can get the
panic once I run ltp. But it is not caused by any specific ltp test case
directly; the panic happens randomly while ltp is running.

Thanks,
Yang

>
> Hugh
>

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1753273AbcD0IO1 (ORCPT ); Wed, 27 Apr 2016 04:14:27 -0400
Received: from mail-pa0-f48.google.com ([209.85.220.48]:36318 "EHLO mail-pa0-f48.google.com" rhost-flags-OK-OK-OK-OK)
	by vger.kernel.org with ESMTP id S1752647AbcD0IOW (ORCPT );
	Wed, 27 Apr 2016 04:14:22 -0400
Date: Wed, 27 Apr 2016 01:14:13 -0700 (PDT)
From: Hugh Dickins
X-X-Sender: hugh@eggly.anvils
To: "Shi, Yang"
cc: Hugh Dickins , Andrew Morton , sfr@canb.auug.org.au, Vlastimil Babka , LKML , linux-mm@kvack.org
Subject: Re: [BUG linux-next] Kernel panic found with linux-next-20160414
In-Reply-To: <5717AA46.5020905@linaro.org>
Message-ID:
References: <5716C29F.1090205@linaro.org> <5717AA46.5020905@linaro.org>
User-Agent: Alpine 2.11 (LSU 23 2013-08-11)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, 20 Apr 2016, Shi, Yang wrote:
> On 4/20/2016 1:01 AM, Hugh Dickins wrote:
> > On Tue, 19 Apr 2016, Shi, Yang wrote:
> > > Hi folks,
> > >
> > > When I ran ltp on linux-next-20160414 on my ARM64 machine, I got the
> > > below
> > > kernel panic:
> > >
> > > Unable to handle kernel paging request at virtual address
> > > ffffffc007846000
> > > pgd = ffffffc01e21d000
> > > [ffffffc007846000] *pgd=0000000000000000, *pud=0000000000000000
> > > Internal error: Oops: 96000047 [#11] PREEMPT SMP
> > > Modules linked in: loop
> > > CPU: 7 PID: 274 Comm: systemd-journal Tainted: G D
> > > 4.6.0-rc3-next-20160414-WR8.0.0.0_standard+ #9
> > > Hardware name: Freescale Layerscape 2085a RDB Board (DT)
> > > task: ffffffc01e3fcf80 ti: ffffffc01ea8c000 task.ti:
ffffffc01ea8c000 > > > PC is at copy_page+0x38/0x120 > > > LR is at migrate_page_copy+0x604/0x1660 > > > pc : [] lr : [] pstate: 20000145 > > > sp : ffffffc01ea8ecd0 > > > x29: ffffffc01ea8ecd0 x28: 0000000000000000 > > > x27: 1ffffff7b80240f8 x26: ffffffc018196f20 > > > x25: ffffffbdc01e1180 x24: ffffffbdc01e1180 > > > x23: 0000000000000000 x22: ffffffc01e3fcf80 > > > x21: ffffffc00481f000 x20: ffffff900a31d000 > > > x19: ffffffbdc01207c0 x18: 0000000000000f00 > > > x17: 0000000000000000 x16: 0000000000000000 > > > x15: 0000000000000000 x14: 0000000000000000 > > > x13: 0000000000000000 x12: 0000000000000000 > > > x11: 0000000000000000 x10: 0000000000000000 > > > x9 : 0000000000000000 x8 : 0000000000000000 > > > x7 : 0000000000000000 x6 : 0000000000000000 > > > x5 : 0000000000000000 x4 : 0000000000000000 > > > x3 : 0000000000000000 x2 : 0000000000000000 > > > x1 : ffffffc00481f080 x0 : ffffffc007846000 > > > > > > Call trace: > > > Exception stack(0xffffffc021fc2ed0 to 0xffffffc021fc2ff0) > > > 2ec0: ffffffbdc00887c0 ffffff900a31d000 > > > 2ee0: ffffffc021fc30f0 ffffff9008ff2318 0000000020000145 0000000000000025 > > > 2f00: ffffffbdc025a280 ffffffc020adc4c0 0000000041b58ab3 ffffff900a085fd0 > > > 2f20: ffffff9008200658 0000000000000000 0000000000000000 ffffffbdc00887c0 > > > 2f40: ffffff900b0f1320 ffffffc021fc3078 0000000041b58ab3 ffffff900a0864f8 > > > 2f60: ffffff9008210010 ffffffc021fb8960 ffffff900867bacc 1ffffff8043f712d > > > 2f80: ffffffc021fc2fb0 ffffff9008210564 ffffffc021fc3070 ffffffc021fb8940 > > > 2fa0: 0000000008221f78 ffffff900862f9c8 ffffffc021fc2fe0 ffffff9008215dc8 > > > 2fc0: 1ffffff8043f8602 ffffffc021fc0000 ffffffc00968a000 ffffffc00221f080 > > > 2fe0: f9407e11d00001f0 d61f02209103e210 > > > [] copy_page+0x38/0x120 > > > [] migrate_page+0x74/0x98 > > > [] nfs_migrate_page+0x58/0x80 > > > [] move_to_new_page+0x15c/0x4d8 > > > [] migrate_pages+0x7c8/0x11f0 > > > [] compact_zone+0xdfc/0x2570 > > > [] compact_zone_order+0xe0/0x170 > > > [] 
try_to_compact_pages+0x2e8/0x8f8 > > > [] __alloc_pages_direct_compact+0x100/0x540 > > > [] __alloc_pages_nodemask+0xc40/0x1c58 > > > [] khugepaged+0x468/0x19c8 > > > [] kthread+0x248/0x2c0 > > > [] ret_from_fork+0x10/0x40 > > > Code: d281f012 91020021 f1020252 d503201f (a8000c02) > > > > > > > > > I did some initial investigation and found it is caused by > > > DEBUG_PAGEALLOC > > > and CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT. And, mainline 4.6-rc3 works > > > well. > > > > > > It should be not arch specific although I got it caught on ARM64. I > > > suspect > > > this might be caused by Hugh's huge tmpfs patches. > > > > Thanks for testing. It might be caused by my patches, but I don't think > > that's very likely. This is page migraton for compaction, in the service > > of anon THP's khugepaged; and I wonder if you were even exercising huge > > tmpfs when running LTP here (it certainly can be done: I like to mount a > > huge tmpfs on /opt/ltp and install there, with shmem_huge 2 so any other > > tmpfs mounts are also huge). > > Some further investigation shows I got the panic even though I don't have > tmpfs mounted with huge=1 or set shmem_huge to 2. > > > > > There are compaction changes in linux-next too, but I don't see any > > reason why they'd cause this. I don't know arm64 traces enough to know > > whether it's the source page or the destination page for the copy, but > > it looks as if it has been freed (and DEBUG_PAGEALLOC unmapped) before > > reaching migration's copy. > > The fault address is passed by x0, which is dest in the implementation of > copy_page, so it is the destination page. > > > > > Needs more debugging, I'm afraid: is it reproducible? > > Yes, as long as I enable those two PAGEALLOC debug options, I can get the > panic once I run ltp. But, it is not caused any specific ltp test case > directly, the panic happens randomly during ltp is running. Your ping on the crash in release_freepages() reminded me to take another look at this one. 
And found that I only needed to enable DEBUG_PAGEALLOC
and run LTP to get it on x86_64 too, as you suspected.

It's another of those compaction errors, in mmotm and linux-next of a
week or two ago, whose patch has since been withdrawn (but huge tmpfs
has also been withdrawn for now, so you're right to stick with the
older linux-next for testing it).

I believe the patch below fixes it; but I've not done full diligence
on it - if I had more time, I'd want to check that all of the things
that need doing are now being done on this path, and that it's also
okay if the release undoes them even when they didn't get to be done.
But not worth that diligence if the patch is withdrawn already.

It's rather horrible that compaction.c uses functions in page_alloc.c
which skip doing some of the things we expect to be done: the non-debug
preparation tends to get noticed, but the debug options overlooked.
We can expect more problems of this kind in future: someone will add
yet another debug prep line in page_alloc.c, and at first nobody will
notice that it's also needed in compaction.c.

I am hopeful, since the missed map_pages() does KASAN initialization too,
that this might also fix your KASAN use-after-free in nfs_do_filldir(),
which you also reported on April 20th.

But with this patch in, I do get a more interesting crash in
remap_team_by_ptes() from LTP's mmapstress10: there appears to be an
anon THP in a huge tmpfs vma.  Maybe I've got the test at the head of
__split_huge_pmd() wrong, but I don't recall seeing this before
rebuilding with DEBUG_PAGEALLOC.  Can't spend longer on it now,
will return to it tomorrow.

Hugh
---
 mm/compaction.c |    1 +
 1 file changed, 1 insertion(+)

--- 4.6-rc2-mm1/mm/compaction.c 2016-04-11 11:35:08.000000000 -0700
+++ linux/mm/compaction.c 2016-04-26 22:15:10.954455303 -0700
@@ -1113,6 +1113,7 @@ static void isolate_freepages_direct(str
 	}
 
 	spin_unlock_irqrestore(&cc->zone->lock, flags);
+	map_pages(&cc->freepages);
 }
 
 /*

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752377AbcD0Iv1 (ORCPT ); Wed, 27 Apr 2016 04:51:27 -0400
Received: from mx2.suse.de ([195.135.220.15]:36874 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751241AbcD0IvZ (ORCPT ); Wed, 27 Apr 2016 04:51:25 -0400
Subject: Re: [BUG linux-next] Kernel panic found with linux-next-20160414
To: Hugh Dickins , "Shi, Yang"
References: <5716C29F.1090205@linaro.org> <5717AA46.5020905@linaro.org>
Cc: Andrew Morton , sfr@canb.auug.org.au, LKML , linux-mm@kvack.org
From: Vlastimil Babka
Message-ID: <57207D88.4000508@suse.cz>
Date: Wed, 27 Apr 2016 10:51:20 +0200
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.7.2
MIME-Version: 1.0
In-Reply-To:
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 04/27/2016 10:14 AM, Hugh Dickins wrote:
> It's rather horrible that compaction.c uses functions in page_alloc.c
> which skip doing some of the things we expect to be done: the non-debug
> preparation tends to get noticed, but the debug options overlooked.
> We can expect more problems of this kind in future: someone will add
> yet another debug prep line in page_alloc.c, and at first nobody will
> notice that it's also needed in compaction.c.

Point taken, I'll try to come up with more maintainable solution next
time I attempt the isolate_freepages_direct() approach. Sorry about the
troubles.
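
For readers following the thread, the failure mode Hugh describes (pages
taken off the free lists by compaction without the debug preparation that
the normal allocation path performs) can be modelled in userspace. The
sketch below is only a toy under stated assumptions: every toy_* name is
hypothetical, a boolean flag stands in for the kernel mapping that
DEBUG_PAGEALLOC removes on free, and a -1 return stands in for the paging
fault seen in copy_page(). It is not kernel code and does not reproduce
the real map_pages() implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE 64	/* arbitrary size for the model */

/* Hypothetical stand-in for a page; "mapped" models the DEBUG_PAGEALLOC state. */
struct toy_page {
	bool mapped;
	unsigned char data[TOY_PAGE_SIZE];
};

/* Model: with DEBUG_PAGEALLOC, freeing a page unmaps it. */
static void toy_free_page(struct toy_page *p)
{
	p->mapped = false;
}

/* Model of the debug preparation that the allocation path is expected to do. */
static void toy_map_pages(struct toy_page **pages, size_t n)
{
	for (size_t i = 0; i < n; i++)
		pages[i]->mapped = true;
}

/*
 * Model of compaction isolating free pages directly from the free lists.
 * apply_fix == true corresponds to the one-line patch adding the mapping call.
 */
static void toy_isolate_freepages(struct toy_page **pages, size_t n,
				  bool apply_fix)
{
	/* ... pages would be pulled off the free lists here ... */
	if (apply_fix)
		toy_map_pages(pages, n);
}

/* Returns 0 on success, -1 where the real kernel would fault. */
static int toy_copy_page(struct toy_page *dst, const struct toy_page *src)
{
	if (!dst->mapped || !src->mapped)
		return -1;
	memcpy(dst->data, src->data, TOY_PAGE_SIZE);
	return 0;
}

/* Run one migration into a previously freed destination page. */
static int toy_migrate(bool apply_fix)
{
	struct toy_page src = { .mapped = true };
	struct toy_page dst = { .mapped = true };
	struct toy_page *freelist[] = { &dst };

	memset(src.data, 0xab, sizeof(src.data));
	toy_free_page(&dst);				/* dst goes back to the free list */
	toy_isolate_freepages(freelist, 1, apply_fix);	/* compaction grabs it */
	return toy_copy_page(&dst, &src);		/* migration's copy */
}
```

Calling toy_migrate(false) models the reported panic (the copy hits an
unmapped destination), while toy_migrate(true) models the patched path.
The point of the model is the maintainability concern raised above: any
path that bypasses the usual allocation entry point has to repeat each
preparation step by hand.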
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753198AbcD1SV5 (ORCPT ); Thu, 28 Apr 2016 14:21:57 -0400 Received: from mail-pa0-f50.google.com ([209.85.220.50]:35542 "EHLO mail-pa0-f50.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752634AbcD1SVz (ORCPT ); Thu, 28 Apr 2016 14:21:55 -0400 Subject: Re: [BUG linux-next] Kernel panic found with linux-next-20160414 To: Hugh Dickins References: <5716C29F.1090205@linaro.org> <5717AA46.5020905@linaro.org> Cc: Andrew Morton , sfr@canb.auug.org.au, Vlastimil Babka , LKML , linux-mm@kvack.org From: "Shi, Yang" Message-ID: Date: Thu, 28 Apr 2016 11:21:47 -0700 User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 4/27/2016 1:14 AM, Hugh Dickins wrote: > On Wed, 20 Apr 2016, Shi, Yang wrote: >> On 4/20/2016 1:01 AM, Hugh Dickins wrote: >>> On Tue, 19 Apr 2016, Shi, Yang wrote: >>>> Hi folks, >>>> >>>> When I ran ltp on linux-next-20160414 on my ARM64 machine, I got the >>>> below >>>> kernel panic: >>>> >>>> Unable to handle kernel paging request at virtual address >>>> ffffffc007846000 >>>> pgd = ffffffc01e21d000 >>>> [ffffffc007846000] *pgd=0000000000000000, *pud=0000000000000000 >>>> Internal error: Oops: 96000047 [#11] PREEMPT SMP >>>> Modules linked in: loop >>>> CPU: 7 PID: 274 Comm: systemd-journal Tainted: G D >>>> 4.6.0-rc3-next-20160414-WR8.0.0.0_standard+ #9 >>>> Hardware name: Freescale Layerscape 2085a RDB Board (DT) >>>> task: ffffffc01e3fcf80 ti: ffffffc01ea8c000 task.ti: ffffffc01ea8c000 >>>> PC is at copy_page+0x38/0x120 >>>> LR is at migrate_page_copy+0x604/0x1660 >>>> pc : [] lr : [] pstate: 20000145 >>>> sp : ffffffc01ea8ecd0 >>>> x29: 
ffffffc01ea8ecd0 x28: 0000000000000000 >>>> x27: 1ffffff7b80240f8 x26: ffffffc018196f20 >>>> x25: ffffffbdc01e1180 x24: ffffffbdc01e1180 >>>> x23: 0000000000000000 x22: ffffffc01e3fcf80 >>>> x21: ffffffc00481f000 x20: ffffff900a31d000 >>>> x19: ffffffbdc01207c0 x18: 0000000000000f00 >>>> x17: 0000000000000000 x16: 0000000000000000 >>>> x15: 0000000000000000 x14: 0000000000000000 >>>> x13: 0000000000000000 x12: 0000000000000000 >>>> x11: 0000000000000000 x10: 0000000000000000 >>>> x9 : 0000000000000000 x8 : 0000000000000000 >>>> x7 : 0000000000000000 x6 : 0000000000000000 >>>> x5 : 0000000000000000 x4 : 0000000000000000 >>>> x3 : 0000000000000000 x2 : 0000000000000000 >>>> x1 : ffffffc00481f080 x0 : ffffffc007846000 >>>> >>>> Call trace: >>>> Exception stack(0xffffffc021fc2ed0 to 0xffffffc021fc2ff0) >>>> 2ec0: ffffffbdc00887c0 ffffff900a31d000 >>>> 2ee0: ffffffc021fc30f0 ffffff9008ff2318 0000000020000145 0000000000000025 >>>> 2f00: ffffffbdc025a280 ffffffc020adc4c0 0000000041b58ab3 ffffff900a085fd0 >>>> 2f20: ffffff9008200658 0000000000000000 0000000000000000 ffffffbdc00887c0 >>>> 2f40: ffffff900b0f1320 ffffffc021fc3078 0000000041b58ab3 ffffff900a0864f8 >>>> 2f60: ffffff9008210010 ffffffc021fb8960 ffffff900867bacc 1ffffff8043f712d >>>> 2f80: ffffffc021fc2fb0 ffffff9008210564 ffffffc021fc3070 ffffffc021fb8940 >>>> 2fa0: 0000000008221f78 ffffff900862f9c8 ffffffc021fc2fe0 ffffff9008215dc8 >>>> 2fc0: 1ffffff8043f8602 ffffffc021fc0000 ffffffc00968a000 ffffffc00221f080 >>>> 2fe0: f9407e11d00001f0 d61f02209103e210 >>>> [] copy_page+0x38/0x120 >>>> [] migrate_page+0x74/0x98 >>>> [] nfs_migrate_page+0x58/0x80 >>>> [] move_to_new_page+0x15c/0x4d8 >>>> [] migrate_pages+0x7c8/0x11f0 >>>> [] compact_zone+0xdfc/0x2570 >>>> [] compact_zone_order+0xe0/0x170 >>>> [] try_to_compact_pages+0x2e8/0x8f8 >>>> [] __alloc_pages_direct_compact+0x100/0x540 >>>> [] __alloc_pages_nodemask+0xc40/0x1c58 >>>> [] khugepaged+0x468/0x19c8 >>>> [] kthread+0x248/0x2c0 >>>> [] ret_from_fork+0x10/0x40 
>>>> Code: d281f012 91020021 f1020252 d503201f (a8000c02) >>>> >>>> >>>> I did some initial investigation and found it is caused by >>>> DEBUG_PAGEALLOC >>>> and CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT. And, mainline 4.6-rc3 works >>>> well. >>>> >>>> It should be not arch specific although I got it caught on ARM64. I >>>> suspect >>>> this might be caused by Hugh's huge tmpfs patches. >>> >>> Thanks for testing. It might be caused by my patches, but I don't think >>> that's very likely. This is page migraton for compaction, in the service >>> of anon THP's khugepaged; and I wonder if you were even exercising huge >>> tmpfs when running LTP here (it certainly can be done: I like to mount a >>> huge tmpfs on /opt/ltp and install there, with shmem_huge 2 so any other >>> tmpfs mounts are also huge). >> >> Some further investigation shows I got the panic even though I don't have >> tmpfs mounted with huge=1 or set shmem_huge to 2. >> >>> >>> There are compaction changes in linux-next too, but I don't see any >>> reason why they'd cause this. I don't know arm64 traces enough to know >>> whether it's the source page or the destination page for the copy, but >>> it looks as if it has been freed (and DEBUG_PAGEALLOC unmapped) before >>> reaching migration's copy. >> >> The fault address is passed by x0, which is dest in the implementation of >> copy_page, so it is the destination page. >> >>> >>> Needs more debugging, I'm afraid: is it reproducible? >> >> Yes, as long as I enable those two PAGEALLOC debug options, I can get the >> panic once I run ltp. But, it is not caused any specific ltp test case >> directly, the panic happens randomly during ltp is running. > > Your ping on the crash in release_freepages() reminded me to take another > look at this one. And found that I only needed to enable DEBUG_PAGEALLOC > and run LTP to get it on x86_64 too, as you suspected. 
>
> It's another of those compaction errors, in mmotm and linux-next of a
> week or two ago, whose patch has since been withdrawn (but huge tmpfs
> has also been withdrawn for now, so you're right to stick with the
> older linux-next for testing it).

Yes, I saw the discussion on LSFMM 2016 and the patches have gone in my
latest update from linux-next. I will stick to 20160420 for the huge
tmpfs testing.

>
> I believe the patch below fixes it; but I've not done full diligence
> on it - if I had more time, I'd want to check that all of the things
> that need doing are now being done on this path, and that it's also
> okay if the release undoes them even when they didn't get to be done.
> But not worth that diligence if the patch is withdrawn already.
>
> It's rather horrible that compaction.c uses functions in page_alloc.c
> which skip doing some of the things we expect to be done: the non-debug
> preparation tends to get noticed, but the debug options overlooked.
> We can expect more problems of this kind in future: someone will add
> yet another debug prep line in page_alloc.c, and at first nobody will
> notice that it's also needed in compaction.c.
>
> I am hopeful, since the missed map_pages() does KASAN initialization too,
> that this might also fix your KASAN use-after-free in nfs_do_filldir(),
> which you also reported on April 20th.
>
> But with this patch in, I do get a more interesting crash in
> remap_team_by_ptes() from LTP's mmapstress10: there appears to be an
> anon THP in a huge tmpfs vma. Maybe I've got the test at the head of
> __split_huge_pmd() wrong, but I don't recall seeing this before
> rebuilding with DEBUG_PAGEALLOC. Can't spend longer on it now,
> will return to it tomorrow.

Thanks for the patch and the patch for another problem.

Regards,
Yang

>
> Hugh
> ---
>  mm/compaction.c |    1 +
>  1 file changed, 1 insertion(+)
>
> --- 4.6-rc2-mm1/mm/compaction.c 2016-04-11 11:35:08.000000000 -0700
> +++ linux/mm/compaction.c 2016-04-26 22:15:10.954455303 -0700
> @@ -1113,6 +1113,7 @@ static void isolate_freepages_direct(str
>  	}
>
>  	spin_unlock_irqrestore(&cc->zone->lock, flags);
> +	map_pages(&cc->freepages);
>  }
>
>  /*
>