From: Gerald Schaefer <gerald.schaefer@de.ibm.com>
To: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Sebastian Ott <sebott@linux.vnet.ibm.com>,
Andrea Arcangeli <aarcange@redhat.com>,
Christian Borntraeger <borntraeger@de.ibm.com>,
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>,
Andrew Morton <akpm@linux-foundation.org>,
Linus Torvalds <torvalds@linux-foundation.org>,
Michael Ellerman <mpe@ellerman.id.au>,
Benjamin Herrenschmidt <benh@kernel.crashing.org>,
Paul Mackerras <paulus@samba.org>,
linuxppc-dev@lists.ozlabs.org,
Catalin Marinas <catalin.marinas@arm.com>,
Will Deacon <will.deacon@arm.com>,
linux-arm-kernel@lists.infradead.org,
Martin Schwidefsky <schwidefsky@de.ibm.com>,
Heiko Carstens <heiko.carstens@de.ibm.com>,
linux-s390@vger.kernel.org
Subject: Re: [BUG] random kernel crashes after THP rework on s390 (maybe also on PowerPC and ARM)
Date: Thu, 18 Feb 2016 16:00:37 +0100 [thread overview]
Message-ID: <20160218160037.627cc7ec@thinkpad> (raw)
In-Reply-To: <20160217235808.GA21696@node.shutemov.name>
On Thu, 18 Feb 2016 01:58:08 +0200
"Kirill A. Shutemov" <kirill@shutemov.name> wrote:
> On Wed, Feb 17, 2016 at 08:13:40PM +0100, Gerald Schaefer wrote:
> > On Sat, 13 Feb 2016 12:58:31 +0100 (CET)
> > Sebastian Ott <sebott@linux.vnet.ibm.com> wrote:
> >
> > > [ 59.875935] ------------[ cut here ]------------
> > > [ 59.875937] kernel BUG at mm/huge_memory.c:2884!
> > > [ 59.875979] illegal operation: 0001 ilc:1 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> > > [ 59.875986] Modules linked in: bridge stp llc btrfs xor mlx4_en vxlan ip6_udp_tunnel udp_tunnel mlx4_ib ptp pps_core ib_sa ib_mad ib_core ib_addr ghash_s390 prng raid6_pq ecb aes_s390 des_s390 des_generic sha512_s390 sha256_s390 sha1_s390 mlx4_core sha_common genwqe_card scm_block crc_itu_t vhost_net tun vhost dm_mod macvtap eadm_sch macvlan kvm autofs4
> > > [ 59.876033] CPU: 2 PID: 5402 Comm: git Tainted: G W 4.4.0-07794-ga4eff16-dirty #77
> > > [ 59.876036] task: 00000000d2312948 ti: 00000000cfecc000 task.ti: 00000000cfecc000
> > > [ 59.876039] Krnl PSW : 0704d00180000000 00000000002bf3aa (__split_huge_pmd_locked+0x562/0xa10)
> > > [ 59.876045] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 EA:3
> > > Krnl GPRS: 0000000001a7a1cf 000003d10177c000 0000000000044068 000000005df00215
> > > [ 59.876051] 0000000000000001 0000000000000001 0000000000000000 00000000774e6900
> > > [ 59.876054] 000003ff52000000 000000006d403b10 000000006e1eb800 000003ff51f00000
> > > [ 59.876058] 000003d10177c000 0000000000715190 00000000002bf234 00000000cfecfb58
> > > [ 59.876068] Krnl Code: 00000000002bf39c: d507d010a000 clc 16(8,%%r13),0(%%r10)
> > > 00000000002bf3a2: a7840004 brc 8,2bf3aa
> > > #00000000002bf3a6: a7f40001 brc 15,2bf3a8
> > > >00000000002bf3aa: 91407440 tm 1088(%%r7),64
> > > 00000000002bf3ae: a7840208 brc 8,2bf7be
> > > 00000000002bf3b2: a7f401e9 brc 15,2bf784
> > > 00000000002bf3b6: 9104a006 tm 6(%%r10),4
> > > 00000000002bf3ba: a7740004 brc 7,2bf3c2
> > > [ 59.876089] Call Trace:
> > > [ 59.876092] ([<00000000002bf234>] __split_huge_pmd_locked+0x3ec/0xa10)
> > > [ 59.876095] [<00000000002c4310>] __split_huge_pmd+0x118/0x218
> > > [ 59.876099] [<00000000002810e8>] unmap_single_vma+0x2d8/0xb40
> > > [ 59.876102] [<0000000000282d66>] zap_page_range+0x116/0x318
> > > [ 59.876105] [<000000000029b834>] SyS_madvise+0x23c/0x5e8
> > > [ 59.876108] [<00000000006f9f56>] system_call+0xd6/0x258
> > > [ 59.876111] [<000003ff9bbfd282>] 0x3ff9bbfd282
> > > [ 59.876113] INFO: lockdep is turned off.
> > > [ 59.876115] Last Breaking-Event-Address:
> > > [ 59.876118] [<00000000002bf3a6>] __split_huge_pmd_locked+0x55e/0xa10
> >
> > The BUG at mm/huge_memory.c:2884 is interesting, it's the BUG_ON(!pte_none(*pte))
> > check in __split_huge_pmd_locked(). Obviously we expect the pre-allocated
> > pagetables to be empty, but in collapse_huge_page() we deposit the original
> > pagetable instead of allocating a new (empty) one. This saves an allocation,
> > which is good, but doesn't that mean that if such a collapsed hugepage will
> > ever be split, we will always run into the BUG_ON(!pte_none(*pte)), or one
> > of the two other VM_BUG_ONs in mm/huge_memory.c that check the same?
> >
> > This behavior is not new, it was the same before the THP rework, so I do not
> > assume that it is related to the current problems, maybe with the exception
> > of this specific crash. I never saw the BUG at mm/huge_memory.c:2884 myself,
> > and the other crashes probably cannot be explained with this. Maybe I am
> > also missing something, but I do not see how collapse_huge_page() and the
> > (non-empty) pgtable deposit there can work out with the BUG_ON(!pte_none(*pte))
> > checks. Any thoughts?
>
> I don't think there's a problem: ptes in the pgtable are cleared with
> pte_clear() in __collapse_huge_page_copy().
>
Ah OK, I didn't see that. Still the BUG_ON() tells us that something went
wrong with the pre-allocated pagetable, or at least with the deposit/withdraw
list, or both. Given that on s390 we keep the listheads for the deposit/withdraw
list inside the pre-allocated pgtables, instead of the struct pages, it may
also explain why we see don't the problems on x86.
We already have the list corruption warning in exit_mmap -> zap_huge_pmd ->
withdraw, and from time to time I also hit the BUG_ON(page->pmd_huge_pte)
in exit_mmap -> free_pgtables -> free_pmd_range, which also indicates some
issues with the deposit/withdraw list, see below:
[ 2489.384069] page:000003d101aa6f00 count:1 mapcount:0 mapping: (null) index:0x0
[ 2489.384075] flags: 0x0()
[ 2489.384078] page dumped because: VM_BUG_ON_PAGE(page->pmd_huge_pte)
[ 2489.384086] ------------[ cut here ]------------
[ 2489.384088] kernel BUG at include/linux/mm.h:1700!
[ 2489.384131] illegal operation: 0001 ilc:1 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[ 2489.384137] Modules linked in: bridge stp llc mlx4_ib ib_sa ib_mad mlx4_en ib_core vxlan udp_tunnel ptp pps_core ib_addr ghash_s390 prng ecb mlx4_core aes_s390 des_s390 des_generic sha512_s390 sha256_s390 sha1_s390 sha_common eadm_sch dm_mod vhost_net tun vhost macvtap macvlan kvm autofs4
[ 2489.384173] CPU: 5 PID: 173619 Comm: cc1 Tainted: G B W 4.5.0-rc3-00083-gc05235d #10
[ 2489.384176] task: 00000000c54d0000 ti: 0000000060504000 task.ti: 0000000060504000
[ 2489.384179] Krnl PSW : 0704c00180000000 0000000000283cf4 (free_pgd_range+0x334/0x460)
[ 2489.384184] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 EA:3
Krnl GPRS: 0000000001a161c7 0000000000000000 0000000000000037 0000000000000000
[ 2489.384189] 0000000000283cf0 0000000000000000 000003ff7d980000 0000000060507e18
[ 2489.384192] 000003ff00000000 0000000075e43ff0 000003ff7d97ffff 000003ff7d980000
[ 2489.384195] 000000006a9bc000 00000000006cc390 0000000000283cf0 0000000060507c68
[ 2489.384201] Krnl Code: 0000000000283ce4: c030002e14dd larl %%r3,84669e
0000000000283cea: c0e5ffffd217 brasl %%r14,27e118
#0000000000283cf0: a7f40001 brc 15,283cf2
>0000000000283cf4: c0e5fffffe5a brasl %%r14,2839a8
0000000000283cfa: b9040027 lgr %%r2,%%r7
0000000000283cfe: b904003c lgr %%r3,%%r12
0000000000283d02: c0e5fff509e3 brasl %%r14,1250c8
0000000000283d08: e31070000004 lg %%r1,0(%%r7)
[ 2489.384221] Call Trace:
[ 2489.384224] ([<0000000000283cf0>] free_pgd_range+0x330/0x460)
[ 2489.384227] [<0000000000283f38>] free_pgtables+0x118/0x148
[ 2489.384230] [<000000000028c32e>] exit_mmap+0xd6/0x300
[ 2489.384233] [<0000000000134d70>] mmput+0x90/0x118
[ 2489.384235] [<000000000013a55c>] do_exit+0x41c/0xd18
[ 2489.384238] [<000000000013c3c2>] do_group_exit+0x92/0xd8
[ 2489.384241] [<000000000013c432>] SyS_exit_group+0x2a/0x30
[ 2489.384244] [<00000000006b1a36>] system_call+0xd6/0x258
[ 2489.384246] [<000003ff7d343698>] 0x3ff7d343698
[ 2489.384248] INFO: lockdep is turned off.
[ 2489.384251] Last Breaking-Event-Address:
[ 2489.384253] [<0000000000283cf0>] free_pgd_range+0x330/0x460
[ 2489.384256]
[ 2489.384258] Kernel panic - not syncing: Fatal exception: panic_on_oops
I'll try to add a BUG_ON(pmd_huge(*pmd)) to free_pte_range() and see if that
catches anything, and I'll also check if debug_cow = 1 or use_zero_page = 0
makes any difference.
WARNING: multiple messages have this Message-ID (diff)
From: gerald.schaefer@de.ibm.com (Gerald Schaefer)
To: linux-arm-kernel@lists.infradead.org
Subject: [BUG] random kernel crashes after THP rework on s390 (maybe also on PowerPC and ARM)
Date: Thu, 18 Feb 2016 16:00:37 +0100 [thread overview]
Message-ID: <20160218160037.627cc7ec@thinkpad> (raw)
In-Reply-To: <20160217235808.GA21696@node.shutemov.name>
On Thu, 18 Feb 2016 01:58:08 +0200
"Kirill A. Shutemov" <kirill@shutemov.name> wrote:
> On Wed, Feb 17, 2016 at 08:13:40PM +0100, Gerald Schaefer wrote:
> > On Sat, 13 Feb 2016 12:58:31 +0100 (CET)
> > Sebastian Ott <sebott@linux.vnet.ibm.com> wrote:
> >
> > > [ 59.875935] ------------[ cut here ]------------
> > > [ 59.875937] kernel BUG at mm/huge_memory.c:2884!
> > > [ 59.875979] illegal operation: 0001 ilc:1 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> > > [ 59.875986] Modules linked in: bridge stp llc btrfs xor mlx4_en vxlan ip6_udp_tunnel udp_tunnel mlx4_ib ptp pps_core ib_sa ib_mad ib_core ib_addr ghash_s390 prng raid6_pq ecb aes_s390 des_s390 des_generic sha512_s390 sha256_s390 sha1_s390 mlx4_core sha_common genwqe_card scm_block crc_itu_t vhost_net tun vhost dm_mod macvtap eadm_sch macvlan kvm autofs4
> > > [ 59.876033] CPU: 2 PID: 5402 Comm: git Tainted: G W 4.4.0-07794-ga4eff16-dirty #77
> > > [ 59.876036] task: 00000000d2312948 ti: 00000000cfecc000 task.ti: 00000000cfecc000
> > > [ 59.876039] Krnl PSW : 0704d00180000000 00000000002bf3aa (__split_huge_pmd_locked+0x562/0xa10)
> > > [ 59.876045] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 EA:3
> > > Krnl GPRS: 0000000001a7a1cf 000003d10177c000 0000000000044068 000000005df00215
> > > [ 59.876051] 0000000000000001 0000000000000001 0000000000000000 00000000774e6900
> > > [ 59.876054] 000003ff52000000 000000006d403b10 000000006e1eb800 000003ff51f00000
> > > [ 59.876058] 000003d10177c000 0000000000715190 00000000002bf234 00000000cfecfb58
> > > [ 59.876068] Krnl Code: 00000000002bf39c: d507d010a000 clc 16(8,%%r13),0(%%r10)
> > > 00000000002bf3a2: a7840004 brc 8,2bf3aa
> > > #00000000002bf3a6: a7f40001 brc 15,2bf3a8
> > > >00000000002bf3aa: 91407440 tm 1088(%%r7),64
> > > 00000000002bf3ae: a7840208 brc 8,2bf7be
> > > 00000000002bf3b2: a7f401e9 brc 15,2bf784
> > > 00000000002bf3b6: 9104a006 tm 6(%%r10),4
> > > 00000000002bf3ba: a7740004 brc 7,2bf3c2
> > > [ 59.876089] Call Trace:
> > > [ 59.876092] ([<00000000002bf234>] __split_huge_pmd_locked+0x3ec/0xa10)
> > > [ 59.876095] [<00000000002c4310>] __split_huge_pmd+0x118/0x218
> > > [ 59.876099] [<00000000002810e8>] unmap_single_vma+0x2d8/0xb40
> > > [ 59.876102] [<0000000000282d66>] zap_page_range+0x116/0x318
> > > [ 59.876105] [<000000000029b834>] SyS_madvise+0x23c/0x5e8
> > > [ 59.876108] [<00000000006f9f56>] system_call+0xd6/0x258
> > > [ 59.876111] [<000003ff9bbfd282>] 0x3ff9bbfd282
> > > [ 59.876113] INFO: lockdep is turned off.
> > > [ 59.876115] Last Breaking-Event-Address:
> > > [ 59.876118] [<00000000002bf3a6>] __split_huge_pmd_locked+0x55e/0xa10
> >
> > The BUG at mm/huge_memory.c:2884 is interesting, it's the BUG_ON(!pte_none(*pte))
> > check in __split_huge_pmd_locked(). Obviously we expect the pre-allocated
> > pagetables to be empty, but in collapse_huge_page() we deposit the original
> > pagetable instead of allocating a new (empty) one. This saves an allocation,
> > which is good, but doesn't that mean that if such a collapsed hugepage will
> > ever be split, we will always run into the BUG_ON(!pte_none(*pte)), or one
> > of the two other VM_BUG_ONs in mm/huge_memory.c that check the same?
> >
> > This behavior is not new, it was the same before the THP rework, so I do not
> > assume that it is related to the current problems, maybe with the exception
> > of this specific crash. I never saw the BUG at mm/huge_memory.c:2884 myself,
> > and the other crashes probably cannot be explained with this. Maybe I am
> > also missing something, but I do not see how collapse_huge_page() and the
> > (non-empty) pgtable deposit there can work out with the BUG_ON(!pte_none(*pte))
> > checks. Any thoughts?
>
> I don't think there's a problem: ptes in the pgtable are cleared with
> pte_clear() in __collapse_huge_page_copy().
>
Ah OK, I didn't see that. Still the BUG_ON() tells us that something went
wrong with the pre-allocated pagetable, or at least with the deposit/withdraw
list, or both. Given that on s390 we keep the listheads for the deposit/withdraw
list inside the pre-allocated pgtables, instead of the struct pages, it may
also explain why we see don't the problems on x86.
We already have the list corruption warning in exit_mmap -> zap_huge_pmd ->
withdraw, and from time to time I also hit the BUG_ON(page->pmd_huge_pte)
in exit_mmap -> free_pgtables -> free_pmd_range, which also indicates some
issues with the deposit/withdraw list, see below:
[ 2489.384069] page:000003d101aa6f00 count:1 mapcount:0 mapping: (null) index:0x0
[ 2489.384075] flags: 0x0()
[ 2489.384078] page dumped because: VM_BUG_ON_PAGE(page->pmd_huge_pte)
[ 2489.384086] ------------[ cut here ]------------
[ 2489.384088] kernel BUG at include/linux/mm.h:1700!
[ 2489.384131] illegal operation: 0001 ilc:1 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[ 2489.384137] Modules linked in: bridge stp llc mlx4_ib ib_sa ib_mad mlx4_en ib_core vxlan udp_tunnel ptp pps_core ib_addr ghash_s390 prng ecb mlx4_core aes_s390 des_s390 des_generic sha512_s390 sha256_s390 sha1_s390 sha_common eadm_sch dm_mod vhost_net tun vhost macvtap macvlan kvm autofs4
[ 2489.384173] CPU: 5 PID: 173619 Comm: cc1 Tainted: G B W 4.5.0-rc3-00083-gc05235d #10
[ 2489.384176] task: 00000000c54d0000 ti: 0000000060504000 task.ti: 0000000060504000
[ 2489.384179] Krnl PSW : 0704c00180000000 0000000000283cf4 (free_pgd_range+0x334/0x460)
[ 2489.384184] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 EA:3
Krnl GPRS: 0000000001a161c7 0000000000000000 0000000000000037 0000000000000000
[ 2489.384189] 0000000000283cf0 0000000000000000 000003ff7d980000 0000000060507e18
[ 2489.384192] 000003ff00000000 0000000075e43ff0 000003ff7d97ffff 000003ff7d980000
[ 2489.384195] 000000006a9bc000 00000000006cc390 0000000000283cf0 0000000060507c68
[ 2489.384201] Krnl Code: 0000000000283ce4: c030002e14dd larl %%r3,84669e
0000000000283cea: c0e5ffffd217 brasl %%r14,27e118
#0000000000283cf0: a7f40001 brc 15,283cf2
>0000000000283cf4: c0e5fffffe5a brasl %%r14,2839a8
0000000000283cfa: b9040027 lgr %%r2,%%r7
0000000000283cfe: b904003c lgr %%r3,%%r12
0000000000283d02: c0e5fff509e3 brasl %%r14,1250c8
0000000000283d08: e31070000004 lg %%r1,0(%%r7)
[ 2489.384221] Call Trace:
[ 2489.384224] ([<0000000000283cf0>] free_pgd_range+0x330/0x460)
[ 2489.384227] [<0000000000283f38>] free_pgtables+0x118/0x148
[ 2489.384230] [<000000000028c32e>] exit_mmap+0xd6/0x300
[ 2489.384233] [<0000000000134d70>] mmput+0x90/0x118
[ 2489.384235] [<000000000013a55c>] do_exit+0x41c/0xd18
[ 2489.384238] [<000000000013c3c2>] do_group_exit+0x92/0xd8
[ 2489.384241] [<000000000013c432>] SyS_exit_group+0x2a/0x30
[ 2489.384244] [<00000000006b1a36>] system_call+0xd6/0x258
[ 2489.384246] [<000003ff7d343698>] 0x3ff7d343698
[ 2489.384248] INFO: lockdep is turned off.
[ 2489.384251] Last Breaking-Event-Address:
[ 2489.384253] [<0000000000283cf0>] free_pgd_range+0x330/0x460
[ 2489.384256]
[ 2489.384258] Kernel panic - not syncing: Fatal exception: panic_on_oops
I'll try to add a BUG_ON(pmd_huge(*pmd)) to free_pte_range() and see if that
catches anything, and I'll also check if debug_cow = 1 or use_zero_page = 0
makes any difference.
WARNING: multiple messages have this Message-ID (diff)
From: Gerald Schaefer <gerald.schaefer@de.ibm.com>
To: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Sebastian Ott <sebott@linux.vnet.ibm.com>,
Andrea Arcangeli <aarcange@redhat.com>,
Christian Borntraeger <borntraeger@de.ibm.com>,
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>,
Andrew Morton <akpm@linux-foundation.org>,
Linus Torvalds <torvalds@linux-foundation.org>,
Michael Ellerman <mpe@ellerman.id.au>,
Benjamin Herrenschmidt <benh@kernel.crashing.org>,
Paul Mackerras <paulus@samba.org>,
linuxppc-dev@lists.ozlabs.org,
Catalin Marinas <catalin.marinas@arm.com>,
Will Deacon <will.deacon@arm.com>,
linux-arm-kernel@lists.infradead.org,
Martin Schwidefsky <schwidefsky@de.ibm.com>,
Heiko Carstens <heiko.carstens@de.ibm.com>,
linux-s390@vger.kernel.org
Subject: Re: [BUG] random kernel crashes after THP rework on s390 (maybe also on PowerPC and ARM)
Date: Thu, 18 Feb 2016 16:00:37 +0100 [thread overview]
Message-ID: <20160218160037.627cc7ec@thinkpad> (raw)
In-Reply-To: <20160217235808.GA21696@node.shutemov.name>
On Thu, 18 Feb 2016 01:58:08 +0200
"Kirill A. Shutemov" <kirill@shutemov.name> wrote:
> On Wed, Feb 17, 2016 at 08:13:40PM +0100, Gerald Schaefer wrote:
> > On Sat, 13 Feb 2016 12:58:31 +0100 (CET)
> > Sebastian Ott <sebott@linux.vnet.ibm.com> wrote:
> >
> > > [ 59.875935] ------------[ cut here ]------------
> > > [ 59.875937] kernel BUG at mm/huge_memory.c:2884!
> > > [ 59.875979] illegal operation: 0001 ilc:1 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> > > [ 59.875986] Modules linked in: bridge stp llc btrfs xor mlx4_en vxlan ip6_udp_tunnel udp_tunnel mlx4_ib ptp pps_core ib_sa ib_mad ib_core ib_addr ghash_s390 prng raid6_pq ecb aes_s390 des_s390 des_generic sha512_s390 sha256_s390 sha1_s390 mlx4_core sha_common genwqe_card scm_block crc_itu_t vhost_net tun vhost dm_mod macvtap eadm_sch macvlan kvm autofs4
> > > [ 59.876033] CPU: 2 PID: 5402 Comm: git Tainted: G W 4.4.0-07794-ga4eff16-dirty #77
> > > [ 59.876036] task: 00000000d2312948 ti: 00000000cfecc000 task.ti: 00000000cfecc000
> > > [ 59.876039] Krnl PSW : 0704d00180000000 00000000002bf3aa (__split_huge_pmd_locked+0x562/0xa10)
> > > [ 59.876045] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 EA:3
> > > Krnl GPRS: 0000000001a7a1cf 000003d10177c000 0000000000044068 000000005df00215
> > > [ 59.876051] 0000000000000001 0000000000000001 0000000000000000 00000000774e6900
> > > [ 59.876054] 000003ff52000000 000000006d403b10 000000006e1eb800 000003ff51f00000
> > > [ 59.876058] 000003d10177c000 0000000000715190 00000000002bf234 00000000cfecfb58
> > > [ 59.876068] Krnl Code: 00000000002bf39c: d507d010a000 clc 16(8,%%r13),0(%%r10)
> > > 00000000002bf3a2: a7840004 brc 8,2bf3aa
> > > #00000000002bf3a6: a7f40001 brc 15,2bf3a8
> > > >00000000002bf3aa: 91407440 tm 1088(%%r7),64
> > > 00000000002bf3ae: a7840208 brc 8,2bf7be
> > > 00000000002bf3b2: a7f401e9 brc 15,2bf784
> > > 00000000002bf3b6: 9104a006 tm 6(%%r10),4
> > > 00000000002bf3ba: a7740004 brc 7,2bf3c2
> > > [ 59.876089] Call Trace:
> > > [ 59.876092] ([<00000000002bf234>] __split_huge_pmd_locked+0x3ec/0xa10)
> > > [ 59.876095] [<00000000002c4310>] __split_huge_pmd+0x118/0x218
> > > [ 59.876099] [<00000000002810e8>] unmap_single_vma+0x2d8/0xb40
> > > [ 59.876102] [<0000000000282d66>] zap_page_range+0x116/0x318
> > > [ 59.876105] [<000000000029b834>] SyS_madvise+0x23c/0x5e8
> > > [ 59.876108] [<00000000006f9f56>] system_call+0xd6/0x258
> > > [ 59.876111] [<000003ff9bbfd282>] 0x3ff9bbfd282
> > > [ 59.876113] INFO: lockdep is turned off.
> > > [ 59.876115] Last Breaking-Event-Address:
> > > [ 59.876118] [<00000000002bf3a6>] __split_huge_pmd_locked+0x55e/0xa10
> >
> > The BUG at mm/huge_memory.c:2884 is interesting, it's the BUG_ON(!pte_none(*pte))
> > check in __split_huge_pmd_locked(). Obviously we expect the pre-allocated
> > pagetables to be empty, but in collapse_huge_page() we deposit the original
> > pagetable instead of allocating a new (empty) one. This saves an allocation,
> > which is good, but doesn't that mean that if such a collapsed hugepage will
> > ever be split, we will always run into the BUG_ON(!pte_none(*pte)), or one
> > of the two other VM_BUG_ONs in mm/huge_memory.c that check the same?
> >
> > This behavior is not new, it was the same before the THP rework, so I do not
> > assume that it is related to the current problems, maybe with the exception
> > of this specific crash. I never saw the BUG at mm/huge_memory.c:2884 myself,
> > and the other crashes probably cannot be explained with this. Maybe I am
> > also missing something, but I do not see how collapse_huge_page() and the
> > (non-empty) pgtable deposit there can work out with the BUG_ON(!pte_none(*pte))
> > checks. Any thoughts?
>
> I don't think there's a problem: ptes in the pgtable are cleared with
> pte_clear() in __collapse_huge_page_copy().
>
Ah OK, I didn't see that. Still the BUG_ON() tells us that something went
wrong with the pre-allocated pagetable, or at least with the deposit/withdraw
list, or both. Given that on s390 we keep the listheads for the deposit/withdraw
list inside the pre-allocated pgtables, instead of the struct pages, it may
also explain why we see don't the problems on x86.
We already have the list corruption warning in exit_mmap -> zap_huge_pmd ->
withdraw, and from time to time I also hit the BUG_ON(page->pmd_huge_pte)
in exit_mmap -> free_pgtables -> free_pmd_range, which also indicates some
issues with the deposit/withdraw list, see below:
[ 2489.384069] page:000003d101aa6f00 count:1 mapcount:0 mapping: (null) index:0x0
[ 2489.384075] flags: 0x0()
[ 2489.384078] page dumped because: VM_BUG_ON_PAGE(page->pmd_huge_pte)
[ 2489.384086] ------------[ cut here ]------------
[ 2489.384088] kernel BUG at include/linux/mm.h:1700!
[ 2489.384131] illegal operation: 0001 ilc:1 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[ 2489.384137] Modules linked in: bridge stp llc mlx4_ib ib_sa ib_mad mlx4_en ib_core vxlan udp_tunnel ptp pps_core ib_addr ghash_s390 prng ecb mlx4_core aes_s390 des_s390 des_generic sha512_s390 sha256_s390 sha1_s390 sha_common eadm_sch dm_mod vhost_net tun vhost macvtap macvlan kvm autofs4
[ 2489.384173] CPU: 5 PID: 173619 Comm: cc1 Tainted: G B W 4.5.0-rc3-00083-gc05235d #10
[ 2489.384176] task: 00000000c54d0000 ti: 0000000060504000 task.ti: 0000000060504000
[ 2489.384179] Krnl PSW : 0704c00180000000 0000000000283cf4 (free_pgd_range+0x334/0x460)
[ 2489.384184] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 EA:3
Krnl GPRS: 0000000001a161c7 0000000000000000 0000000000000037 0000000000000000
[ 2489.384189] 0000000000283cf0 0000000000000000 000003ff7d980000 0000000060507e18
[ 2489.384192] 000003ff00000000 0000000075e43ff0 000003ff7d97ffff 000003ff7d980000
[ 2489.384195] 000000006a9bc000 00000000006cc390 0000000000283cf0 0000000060507c68
[ 2489.384201] Krnl Code: 0000000000283ce4: c030002e14dd larl %%r3,84669e
0000000000283cea: c0e5ffffd217 brasl %%r14,27e118
#0000000000283cf0: a7f40001 brc 15,283cf2
>0000000000283cf4: c0e5fffffe5a brasl %%r14,2839a8
0000000000283cfa: b9040027 lgr %%r2,%%r7
0000000000283cfe: b904003c lgr %%r3,%%r12
0000000000283d02: c0e5fff509e3 brasl %%r14,1250c8
0000000000283d08: e31070000004 lg %%r1,0(%%r7)
[ 2489.384221] Call Trace:
[ 2489.384224] ([<0000000000283cf0>] free_pgd_range+0x330/0x460)
[ 2489.384227] [<0000000000283f38>] free_pgtables+0x118/0x148
[ 2489.384230] [<000000000028c32e>] exit_mmap+0xd6/0x300
[ 2489.384233] [<0000000000134d70>] mmput+0x90/0x118
[ 2489.384235] [<000000000013a55c>] do_exit+0x41c/0xd18
[ 2489.384238] [<000000000013c3c2>] do_group_exit+0x92/0xd8
[ 2489.384241] [<000000000013c432>] SyS_exit_group+0x2a/0x30
[ 2489.384244] [<00000000006b1a36>] system_call+0xd6/0x258
[ 2489.384246] [<000003ff7d343698>] 0x3ff7d343698
[ 2489.384248] INFO: lockdep is turned off.
[ 2489.384251] Last Breaking-Event-Address:
[ 2489.384253] [<0000000000283cf0>] free_pgd_range+0x330/0x460
[ 2489.384256]
[ 2489.384258] Kernel panic - not syncing: Fatal exception: panic_on_oops
I'll try to add a BUG_ON(pmd_huge(*pmd)) to free_pte_range() and see if that
catches anything, and I'll also check if debug_cow = 1 or use_zero_page = 0
makes any difference.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2016-02-18 15:00 UTC|newest]
Thread overview: 149+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-02-11 18:22 [BUG] random kernel crashes after THP rework on s390 (maybe also on PowerPC and ARM) Gerald Schaefer
2016-02-11 18:22 ` Gerald Schaefer
2016-02-11 18:22 ` Gerald Schaefer
2016-02-11 19:09 ` Kirill A. Shutemov
2016-02-11 19:09 ` Kirill A. Shutemov
2016-02-11 19:09 ` Kirill A. Shutemov
2016-02-11 19:12 ` Kirill A. Shutemov
2016-02-11 19:12 ` Kirill A. Shutemov
2016-02-11 19:12 ` Kirill A. Shutemov
2016-02-12 12:21 ` Sebastian Ott
2016-02-12 12:21 ` Sebastian Ott
2016-02-12 12:21 ` Sebastian Ott
2016-02-11 19:57 ` Gerald Schaefer
2016-02-11 19:57 ` Gerald Schaefer
2016-02-11 19:57 ` Gerald Schaefer
2016-02-12 4:04 ` Aneesh Kumar K.V
2016-02-12 4:04 ` Aneesh Kumar K.V
2016-02-12 4:04 ` Aneesh Kumar K.V
2016-02-12 11:59 ` Gerald Schaefer
2016-02-12 11:59 ` Gerald Schaefer
2016-02-12 11:59 ` Gerald Schaefer
2016-02-12 16:17 ` Aneesh Kumar K.V
2016-02-12 16:17 ` Aneesh Kumar K.V
2016-02-12 16:17 ` Aneesh Kumar K.V
2016-02-12 10:01 ` Will Deacon
2016-02-12 10:01 ` Will Deacon
2016-02-12 10:01 ` Will Deacon
2016-02-12 10:12 ` Sebastian Ott
2016-02-12 10:12 ` Sebastian Ott
2016-02-12 10:12 ` Sebastian Ott
2016-02-12 15:52 ` Will Deacon
2016-02-12 15:52 ` Will Deacon
2016-02-12 15:52 ` Will Deacon
2016-02-12 15:41 ` Kirill A. Shutemov
2016-02-12 15:41 ` Kirill A. Shutemov
2016-02-12 15:41 ` Kirill A. Shutemov
2016-02-12 15:57 ` Christian Borntraeger
2016-02-12 15:57 ` Christian Borntraeger
2016-02-12 15:57 ` Christian Borntraeger
2016-02-12 17:16 ` Gerald Schaefer
2016-02-12 17:16 ` Gerald Schaefer
2016-02-12 17:16 ` Gerald Schaefer
2016-02-12 23:15 ` Kirill A. Shutemov
2016-02-12 23:15 ` Kirill A. Shutemov
2016-02-12 23:15 ` Kirill A. Shutemov
2016-02-13 11:58 ` Sebastian Ott
2016-02-13 11:58 ` Sebastian Ott
2016-02-13 11:58 ` Sebastian Ott
2016-02-15 11:31 ` Kirill A. Shutemov
2016-02-15 11:31 ` Kirill A. Shutemov
2016-02-15 11:31 ` Kirill A. Shutemov
2016-02-15 16:38 ` Sebastian Ott
2016-02-15 16:38 ` Sebastian Ott
2016-02-15 16:38 ` Sebastian Ott
2016-02-15 18:37 ` Gerald Schaefer
2016-02-15 18:37 ` Gerald Schaefer
2016-02-15 18:37 ` Gerald Schaefer
2016-02-15 18:37 ` Gerald Schaefer
2016-02-15 21:35 ` Kirill A. Shutemov
2016-02-15 21:35 ` Kirill A. Shutemov
2016-02-15 21:35 ` Kirill A. Shutemov
2016-02-16 9:54 ` Sebastian Ott
2016-02-16 9:54 ` Sebastian Ott
2016-02-16 9:54 ` Sebastian Ott
2016-02-16 16:24 ` Gerald Schaefer
2016-02-16 16:24 ` Gerald Schaefer
2016-02-16 16:24 ` Gerald Schaefer
2016-02-16 16:24 ` Gerald Schaefer
2016-02-17 15:04 ` Kirill A. Shutemov
2016-02-17 15:04 ` Kirill A. Shutemov
2016-02-17 15:04 ` Kirill A. Shutemov
2016-02-17 19:04 ` Sebastian Ott
2016-02-17 19:04 ` Sebastian Ott
2016-02-17 19:04 ` Sebastian Ott
2016-02-16 18:46 ` Christian Borntraeger
2016-02-16 18:46 ` Christian Borntraeger
2016-02-16 18:46 ` Christian Borntraeger
2016-02-17 19:13 ` Gerald Schaefer
2016-02-17 19:13 ` Gerald Schaefer
2016-02-17 19:13 ` Gerald Schaefer
2016-02-17 23:58 ` Kirill A. Shutemov
2016-02-17 23:58 ` Kirill A. Shutemov
2016-02-17 23:58 ` Kirill A. Shutemov
2016-02-18 15:00 ` Gerald Schaefer [this message]
2016-02-18 15:00 ` Gerald Schaefer
2016-02-18 15:00 ` Gerald Schaefer
2016-02-18 17:06 ` Kirill A. Shutemov
2016-02-18 17:06 ` Kirill A. Shutemov
2016-02-18 17:06 ` Kirill A. Shutemov
2016-02-19 14:15 ` Sebastian Ott
2016-02-19 14:15 ` Sebastian Ott
2016-02-19 14:15 ` Sebastian Ott
2016-02-15 16:41 ` Gerald Schaefer
2016-02-15 16:41 ` Gerald Schaefer
2016-02-15 16:41 ` Gerald Schaefer
2016-02-23 10:32 ` Kirill A. Shutemov
2016-02-23 10:32 ` Kirill A. Shutemov
2016-02-23 10:32 ` Kirill A. Shutemov
2016-02-23 17:46 ` Linus Torvalds
2016-02-23 17:46 ` Linus Torvalds
2016-02-23 17:46 ` Linus Torvalds
2016-02-23 18:19 ` Gerald Schaefer
2016-02-23 18:19 ` Gerald Schaefer
2016-02-23 18:19 ` Gerald Schaefer
2016-02-23 18:47 ` Will Deacon
2016-02-23 18:47 ` Will Deacon
2016-02-23 18:47 ` Will Deacon
2016-02-25 15:49 ` Steve Capper
2016-02-25 15:49 ` Steve Capper
2016-02-25 15:49 ` Steve Capper
2016-02-25 16:01 ` Kirill A. Shutemov
2016-02-25 16:01 ` Kirill A. Shutemov
2016-02-25 16:01 ` Kirill A. Shutemov
2016-02-25 16:08 ` Steve Capper
2016-02-25 16:08 ` Steve Capper
2016-02-25 16:08 ` Steve Capper
2016-02-23 19:33 ` Kirill A. Shutemov
2016-02-23 19:33 ` Kirill A. Shutemov
2016-02-23 19:33 ` Kirill A. Shutemov
2016-02-23 20:22 ` Will Deacon
2016-02-23 20:22 ` Will Deacon
2016-02-23 20:22 ` Will Deacon
2016-02-24 10:16 ` Christian Borntraeger
2016-02-24 10:16 ` Christian Borntraeger
2016-02-24 10:16 ` Christian Borntraeger
2016-02-24 10:41 ` Will Deacon
2016-02-24 10:41 ` Will Deacon
2016-02-24 10:41 ` Will Deacon
2016-02-24 10:51 ` Christian Borntraeger
2016-02-24 10:51 ` Christian Borntraeger
2016-02-24 10:51 ` Christian Borntraeger
2016-02-24 11:02 ` Will Deacon
2016-02-24 11:02 ` Will Deacon
2016-02-24 11:02 ` Will Deacon
2016-02-24 17:22 ` Aneesh Kumar K.V
2016-02-24 17:22 ` Aneesh Kumar K.V
2016-02-24 17:22 ` Aneesh Kumar K.V
2016-02-24 8:39 ` Martin Schwidefsky
2016-02-24 8:39 ` Martin Schwidefsky
2016-02-24 8:39 ` Martin Schwidefsky
2016-02-24 12:11 ` Sebastian Ott
2016-02-24 12:11 ` Sebastian Ott
2016-02-24 12:11 ` Sebastian Ott
2016-02-24 16:44 ` Gerald Schaefer
2016-02-24 16:44 ` Gerald Schaefer
2016-02-24 16:44 ` Gerald Schaefer
2016-02-24 8:22 ` Martin Schwidefsky
2016-02-24 8:22 ` Martin Schwidefsky
2016-02-24 8:22 ` Martin Schwidefsky
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20160218160037.627cc7ec@thinkpad \
--to=gerald.schaefer@de.ibm.com \
--cc=aarcange@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=aneesh.kumar@linux.vnet.ibm.com \
--cc=benh@kernel.crashing.org \
--cc=borntraeger@de.ibm.com \
--cc=catalin.marinas@arm.com \
--cc=heiko.carstens@de.ibm.com \
--cc=kirill.shutemov@linux.intel.com \
--cc=kirill@shutemov.name \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linux-s390@vger.kernel.org \
--cc=linuxppc-dev@lists.ozlabs.org \
--cc=mpe@ellerman.id.au \
--cc=paulus@samba.org \
--cc=schwidefsky@de.ibm.com \
--cc=sebott@linux.vnet.ibm.com \
--cc=torvalds@linux-foundation.org \
--cc=will.deacon@arm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.