diff for duplicates of <20160215193702.4a15ed5e@thinkpad> diff --git a/a/1.txt b/N1/1.txt index 2ee45e4..4e04dbd 100644 --- a/a/1.txt +++ b/N1/1.txt @@ -2,117 +2,80 @@ On Mon, 15 Feb 2016 13:31:59 +0200 "Kirill A. Shutemov" <kirill@shutemov.name> wrote: > On Sat, Feb 13, 2016 at 12:58:31PM +0100, Sebastian Ott wrote: -> >=20 +> > > > On Sat, 13 Feb 2016, Kirill A. Shutemov wrote: > > > Could you check if revert of fecffad25458 helps? -> >=20 +> > > > I reverted fecffad25458 on top of 721675fcf277cf - it oopsed with: -> >=20 -> > =C2=A2 1851.721062! Unable to handle kernel pointer dereference in virt= -ual kernel address space -> > =C2=A2 1851.721075! failing address: 0000000000000000 TEID: 00000000000= -00483 -> > =C2=A2 1851.721078! Fault in home space mode while using kernel ASCE. -> > =C2=A2 1851.721085! AS:0000000000d5c007 R3:00000000ffff0007 S:00000000f= -fffa800 P:000000000000003d -> > =C2=A2 1851.721128! Oops: 0004 ilc:3 =C2=A2#1! PREEMPT SMP DEBUG_PAGEAL= -LOC -> > =C2=A2 1851.721135! Modules linked in: bridge stp llc btrfs mlx4_ib mlx= -4_en ib_sa ib_mad vxlan xor ip6_udp_tunnel ib_core udp_tunnel ptp pps_core = -ib_addr ghash_s390raid6_pq prng ecb aes_s390 mlx4_core des_s390 des_generic= - genwqe_card sha512_s390 sha256_s390 sha1_s390 sha_common crc_itu_t dm_mod = -scm_block vhost_net tun vhost eadm_sch macvtap macvlan kvm autofs4 -> > =C2=A2 1851.721183! CPU: 7 PID: 256422 Comm: bash Not tainted 4.5.0-rc3= --00058-g07923d7-dirty #178 -> > =C2=A2 1851.721186! task: 000000007fbfd290 ti: 000000008c604000 task.ti= -: 000000008c604000 -> > =C2=A2 1851.721189! Krnl PSW : 0704d00180000000 000000000045d3b8 (__rb_= -erase_color+0x280/0x308) -> > =C2=A2 1851.721200! R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3= - CC:1 PM:0 EA:3 -> > Krnl GPRS: 0000000000000001 0000000000000020 00000000000= -00000 00000000bd07eff1 -> > =C2=A2 1851.721205! 000000000027ca10 0000000000000000 000000= -0083e45898 0000000077b61198 -> > =C2=A2 1851.721207! 
000000007ce1a490 00000000bd07eff0 000000= -007ce1a548 000000000027ca10 -> > =C2=A2 1851.721210! 00000000bd07c350 00000000bd07eff0 000000= -008c607aa8 000000008c607a68 -> > =C2=A2 1851.721221! Krnl Code: 000000000045d3aa: e3c0d0080024 stg= - %%r12,8(%%r13) -> > 000000000045d3b0: b9040039 lgr = -%%r3,%%r9 -> > #000000000045d3b4: a53b0001 oill = -%%r3,1 -> > >000000000045d3b8: e33010000024 stg = -%%r3,0(%%r1) -> > 000000000045d3be: ec28000e007c cgij = -%%r2,0,8,45d3da -> > 000000000045d3c4: e34020000004 lg = -%%r4,0(%%r2) -> > 000000000045d3ca: b904001c lgr = -%%r1,%%r12 -> > 000000000045d3ce: ec143f3f0056 rosbg = -%%r1,%%r4,63,63,0 -> > =C2=A2 1851.721269! Call Trace: -> > =C2=A2 1851.721273! (=C2=A2<0000000083e45898>! 0x83e45898) -> > =C2=A2 1851.721279! =C2=A2<000000000029342a>! unlink_anon_vmas+0x9a/0x= -1d8 -> > =C2=A2 1851.721282! =C2=A2<0000000000283f34>! free_pgtables+0xcc/0x148 -> > =C2=A2 1851.721285! =C2=A2<000000000028c376>! exit_mmap+0xd6/0x300 -> > =C2=A2 1851.721289! =C2=A2<0000000000134db8>! mmput+0x90/0x118 -> > =C2=A2 1851.721294! =C2=A2<00000000002d76bc>! flush_old_exec+0x5d4/0x7= -00 -> > =C2=A2 1851.721298! =C2=A2<00000000003369f4>! load_elf_binary+0x2f4/0x= -13e8 -> > =C2=A2 1851.721301! =C2=A2<00000000002d6e4a>! search_binary_handler+0x= -9a/0x1f8 -> > =C2=A2 1851.721304! =C2=A2<00000000002d8970>! do_execveat_common.isra.= -32+0x668/0x9a0 -> > =C2=A2 1851.721307! =C2=A2<00000000002d8cec>! do_execve+0x44/0x58 -> > =C2=A2 1851.721310! =C2=A2<00000000002d8f92>! SyS_execve+0x3a/0x48 -> > =C2=A2 1851.721315! =C2=A2<00000000006fb096>! system_call+0xd6/0x258 -> > =C2=A2 1851.721317! =C2=A2<000003ff997436d6>! 0x3ff997436d6 -> > =C2=A2 1851.721319! INFO: lockdep is turned off. -> > =C2=A2 1851.721321! Last Breaking-Event-Address: -> > =C2=A2 1851.721323! =C2=A2<000000000045d31a>! __rb_erase_color+0x1e2/0= -x308 -> > =C2=A2 1851.721327! -> > =C2=A2 1851.721329! 
---=C2=A2 end trace 0d80041ac00cfae2 !--- -> >=20 -> >=20 -> > >=20 -> > > And could you share how crashes looks like? I haven't seen backtraces= - yet. -> > >=20 -> >=20 +> > +> > ? 1851.721062! Unable to handle kernel pointer dereference in virtual kernel address space +> > ? 1851.721075! failing address: 0000000000000000 TEID: 0000000000000483 +> > ? 1851.721078! Fault in home space mode while using kernel ASCE. +> > ? 1851.721085! AS:0000000000d5c007 R3:00000000ffff0007 S:00000000ffffa800 P:000000000000003d +> > ? 1851.721128! Oops: 0004 ilc:3 ?#1! PREEMPT SMP DEBUG_PAGEALLOC +> > ? 1851.721135! Modules linked in: bridge stp llc btrfs mlx4_ib mlx4_en ib_sa ib_mad vxlan xor ip6_udp_tunnel ib_core udp_tunnel ptp pps_core ib_addr ghash_s390raid6_pq prng ecb aes_s390 mlx4_core des_s390 des_generic genwqe_card sha512_s390 sha256_s390 sha1_s390 sha_common crc_itu_t dm_mod scm_block vhost_net tun vhost eadm_sch macvtap macvlan kvm autofs4 +> > ? 1851.721183! CPU: 7 PID: 256422 Comm: bash Not tainted 4.5.0-rc3-00058-g07923d7-dirty #178 +> > ? 1851.721186! task: 000000007fbfd290 ti: 000000008c604000 task.ti: 000000008c604000 +> > ? 1851.721189! Krnl PSW : 0704d00180000000 000000000045d3b8 (__rb_erase_color+0x280/0x308) +> > ? 1851.721200! R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 EA:3 +> > Krnl GPRS: 0000000000000001 0000000000000020 0000000000000000 00000000bd07eff1 +> > ? 1851.721205! 000000000027ca10 0000000000000000 0000000083e45898 0000000077b61198 +> > ? 1851.721207! 000000007ce1a490 00000000bd07eff0 000000007ce1a548 000000000027ca10 +> > ? 1851.721210! 00000000bd07c350 00000000bd07eff0 000000008c607aa8 000000008c607a68 +> > ? 1851.721221! 
Krnl Code: 000000000045d3aa: e3c0d0080024 stg %%r12,8(%%r13) +> > 000000000045d3b0: b9040039 lgr %%r3,%%r9 +> > #000000000045d3b4: a53b0001 oill %%r3,1 +> > >000000000045d3b8: e33010000024 stg %%r3,0(%%r1) +> > 000000000045d3be: ec28000e007c cgij %%r2,0,8,45d3da +> > 000000000045d3c4: e34020000004 lg %%r4,0(%%r2) +> > 000000000045d3ca: b904001c lgr %%r1,%%r12 +> > 000000000045d3ce: ec143f3f0056 rosbg %%r1,%%r4,63,63,0 +> > ? 1851.721269! Call Trace: +> > ? 1851.721273! (?<0000000083e45898>! 0x83e45898) +> > ? 1851.721279! ?<000000000029342a>! unlink_anon_vmas+0x9a/0x1d8 +> > ? 1851.721282! ?<0000000000283f34>! free_pgtables+0xcc/0x148 +> > ? 1851.721285! ?<000000000028c376>! exit_mmap+0xd6/0x300 +> > ? 1851.721289! ?<0000000000134db8>! mmput+0x90/0x118 +> > ? 1851.721294! ?<00000000002d76bc>! flush_old_exec+0x5d4/0x700 +> > ? 1851.721298! ?<00000000003369f4>! load_elf_binary+0x2f4/0x13e8 +> > ? 1851.721301! ?<00000000002d6e4a>! search_binary_handler+0x9a/0x1f8 +> > ? 1851.721304! ?<00000000002d8970>! do_execveat_common.isra.32+0x668/0x9a0 +> > ? 1851.721307! ?<00000000002d8cec>! do_execve+0x44/0x58 +> > ? 1851.721310! ?<00000000002d8f92>! SyS_execve+0x3a/0x48 +> > ? 1851.721315! ?<00000000006fb096>! system_call+0xd6/0x258 +> > ? 1851.721317! ?<000003ff997436d6>! 0x3ff997436d6 +> > ? 1851.721319! INFO: lockdep is turned off. +> > ? 1851.721321! Last Breaking-Event-Address: +> > ? 1851.721323! ?<000000000045d31a>! __rb_erase_color+0x1e2/0x308 +> > ? 1851.721327! +> > ? 1851.721329! ---? end trace 0d80041ac00cfae2 !--- +> > +> > +> > > +> > > And could you share how crashes looks like? I haven't seen backtraces yet. +> > > +> > > > Sure. I didn't because they really looked random to me. Most of the time -> > in rcu or list debugging but I thought these have just been the messeng= -er -> > observing a corruption first. 
Anyhow, here is an older one that might l= -ook +> > in rcu or list debugging but I thought these have just been the messenger +> > observing a corruption first. Anyhow, here is an older one that might look > > interesting: -> >=20 -> > [ 59.851421] list_del corruption. next->prev should be 000000006e1eb0= -00, but was 0000000000000400 ->=20 +> > +> > [ 59.851421] list_del corruption. next->prev should be 000000006e1eb000, but was 0000000000000400 +> > This kinda interesting: 0x400 is TAIL_MAPPING.. Hm.. ->=20 +> > Could you check if you see the problem on commit 1c290f642101 and its > immediate parent? ->=20 +> How should the page->mapping poison end up as next->prev in the list of pre-allocated THP splitting page tables? Also, commit 1c290f642101 is before the THP rework, at least the non-bisectable part, so we should expect not to see the problem there. -0x400 is also the value of an empty pte on s390, and the thp_deposit/withdr= -aw -listheads are placed inside the pre-allocated pagetables instead of page->l= -ru, -because we have 2K pagetables on s390 and cannot use struct page =3D=3D pgt= -able_t. +0x400 is also the value of an empty pte on s390, and the thp_deposit/withdraw +listheads are placed inside the pre-allocated pagetables instead of page->lru, +because we have 2K pagetables on s390 and cannot use struct page == pgtable_t. 
So, for example, two concurrent withdraws could produce such a list corruption, because the first withdraw will overwrite the listhead at the diff --git a/a/content_digest b/N1/content_digest index 553d349..7d67fd3 100644 --- a/a/content_digest +++ b/N1/content_digest @@ -7,146 +7,90 @@ "ref\020160212231510.GB15142@node.shutemov.name\0" "ref\0alpine.LFD.2.20.1602131238260.1910@schleppi\0" "ref\020160215113159.GA28832@node.shutemov.name\0" - "From\0Gerald Schaefer <gerald.schaefer@de.ibm.com>\0" - "Subject\0Re: [BUG] random kernel crashes after THP rework on s390 (maybe also on PowerPC and ARM)\0" + "From\0gerald.schaefer@de.ibm.com (Gerald Schaefer)\0" + "Subject\0[BUG] random kernel crashes after THP rework on s390 (maybe also on PowerPC and ARM)\0" "Date\0Mon, 15 Feb 2016 19:37:02 +0100\0" - "To\0Kirill A. Shutemov <kirill@shutemov.name>\0" - "Cc\0Sebastian Ott <sebott@linux.vnet.ibm.com>" - Andrea Arcangeli <aarcange@redhat.com> - Christian Borntraeger <borntraeger@de.ibm.com> - Kirill A. Shutemov <kirill.shutemov@linux.intel.com> - linux-mm@kvack.org - linux-kernel@vger.kernel.org - Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> - Andrew Morton <akpm@linux-foundation.org> - Linus Torvalds <torvalds@linux-foundation.org> - Michael Ellerman <mpe@ellerman.id.au> - Benjamin Herrenschmidt <benh@kernel.crashing.org> - Paul Mackerras <paulus@samba.org> - linuxppc-dev@lists.ozlabs.org - Catalin Marinas <catalin.marinas@arm.com> - Will Deacon <will.deacon@arm.com> - linux-arm-kernel@lists.infradead.org - Martin Schwidefsky <schwidefsky@de.ibm.com> - Heiko Carstens <heiko.carstens@de.ibm.com> - " linux-s390@vger.kernel.org\0" + "To\0linux-arm-kernel@lists.infradead.org\0" "\00:1\0" "b\0" "On Mon, 15 Feb 2016 13:31:59 +0200\n" "\"Kirill A. Shutemov\" <kirill@shutemov.name> wrote:\n" "\n" "> On Sat, Feb 13, 2016 at 12:58:31PM +0100, Sebastian Ott wrote:\n" - "> >=20\n" + "> > \n" "> > On Sat, 13 Feb 2016, Kirill A. 
Shutemov wrote:\n" "> > > Could you check if revert of fecffad25458 helps?\n" - "> >=20\n" + "> > \n" "> > I reverted fecffad25458 on top of 721675fcf277cf - it oopsed with:\n" - "> >=20\n" - "> > =C2=A2 1851.721062! Unable to handle kernel pointer dereference in virt=\n" - "ual kernel address space\n" - "> > =C2=A2 1851.721075! failing address: 0000000000000000 TEID: 00000000000=\n" - "00483\n" - "> > =C2=A2 1851.721078! Fault in home space mode while using kernel ASCE.\n" - "> > =C2=A2 1851.721085! AS:0000000000d5c007 R3:00000000ffff0007 S:00000000f=\n" - "fffa800 P:000000000000003d\n" - "> > =C2=A2 1851.721128! Oops: 0004 ilc:3 =C2=A2#1! PREEMPT SMP DEBUG_PAGEAL=\n" - "LOC\n" - "> > =C2=A2 1851.721135! Modules linked in: bridge stp llc btrfs mlx4_ib mlx=\n" - "4_en ib_sa ib_mad vxlan xor ip6_udp_tunnel ib_core udp_tunnel ptp pps_core =\n" - "ib_addr ghash_s390raid6_pq prng ecb aes_s390 mlx4_core des_s390 des_generic=\n" - " genwqe_card sha512_s390 sha256_s390 sha1_s390 sha_common crc_itu_t dm_mod =\n" - "scm_block vhost_net tun vhost eadm_sch macvtap macvlan kvm autofs4\n" - "> > =C2=A2 1851.721183! CPU: 7 PID: 256422 Comm: bash Not tainted 4.5.0-rc3=\n" - "-00058-g07923d7-dirty #178\n" - "> > =C2=A2 1851.721186! task: 000000007fbfd290 ti: 000000008c604000 task.ti=\n" - ": 000000008c604000\n" - "> > =C2=A2 1851.721189! Krnl PSW : 0704d00180000000 000000000045d3b8 (__rb_=\n" - "erase_color+0x280/0x308)\n" - "> > =C2=A2 1851.721200! R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3=\n" - " CC:1 PM:0 EA:3\n" - "> > Krnl GPRS: 0000000000000001 0000000000000020 00000000000=\n" - "00000 00000000bd07eff1\n" - "> > =C2=A2 1851.721205! 000000000027ca10 0000000000000000 000000=\n" - "0083e45898 0000000077b61198\n" - "> > =C2=A2 1851.721207! 000000007ce1a490 00000000bd07eff0 000000=\n" - "007ce1a548 000000000027ca10\n" - "> > =C2=A2 1851.721210! 00000000bd07c350 00000000bd07eff0 000000=\n" - "008c607aa8 000000008c607a68\n" - "> > =C2=A2 1851.721221! 
Krnl Code: 000000000045d3aa: e3c0d0080024 stg=\n" - " %%r12,8(%%r13)\n" - "> > 000000000045d3b0: b9040039 lgr =\n" - "%%r3,%%r9\n" - "> > #000000000045d3b4: a53b0001 oill =\n" - "%%r3,1\n" - "> > >000000000045d3b8: e33010000024 stg =\n" - "%%r3,0(%%r1)\n" - "> > 000000000045d3be: ec28000e007c cgij =\n" - "%%r2,0,8,45d3da\n" - "> > 000000000045d3c4: e34020000004 lg =\n" - "%%r4,0(%%r2)\n" - "> > 000000000045d3ca: b904001c lgr =\n" - "%%r1,%%r12\n" - "> > 000000000045d3ce: ec143f3f0056 rosbg =\n" - "%%r1,%%r4,63,63,0\n" - "> > =C2=A2 1851.721269! Call Trace:\n" - "> > =C2=A2 1851.721273! (=C2=A2<0000000083e45898>! 0x83e45898)\n" - "> > =C2=A2 1851.721279! =C2=A2<000000000029342a>! unlink_anon_vmas+0x9a/0x=\n" - "1d8\n" - "> > =C2=A2 1851.721282! =C2=A2<0000000000283f34>! free_pgtables+0xcc/0x148\n" - "> > =C2=A2 1851.721285! =C2=A2<000000000028c376>! exit_mmap+0xd6/0x300\n" - "> > =C2=A2 1851.721289! =C2=A2<0000000000134db8>! mmput+0x90/0x118\n" - "> > =C2=A2 1851.721294! =C2=A2<00000000002d76bc>! flush_old_exec+0x5d4/0x7=\n" - "00\n" - "> > =C2=A2 1851.721298! =C2=A2<00000000003369f4>! load_elf_binary+0x2f4/0x=\n" - "13e8\n" - "> > =C2=A2 1851.721301! =C2=A2<00000000002d6e4a>! search_binary_handler+0x=\n" - "9a/0x1f8\n" - "> > =C2=A2 1851.721304! =C2=A2<00000000002d8970>! do_execveat_common.isra.=\n" - "32+0x668/0x9a0\n" - "> > =C2=A2 1851.721307! =C2=A2<00000000002d8cec>! do_execve+0x44/0x58\n" - "> > =C2=A2 1851.721310! =C2=A2<00000000002d8f92>! SyS_execve+0x3a/0x48\n" - "> > =C2=A2 1851.721315! =C2=A2<00000000006fb096>! system_call+0xd6/0x258\n" - "> > =C2=A2 1851.721317! =C2=A2<000003ff997436d6>! 0x3ff997436d6\n" - "> > =C2=A2 1851.721319! INFO: lockdep is turned off.\n" - "> > =C2=A2 1851.721321! Last Breaking-Event-Address:\n" - "> > =C2=A2 1851.721323! =C2=A2<000000000045d31a>! __rb_erase_color+0x1e2/0=\n" - "x308\n" - "> > =C2=A2 1851.721327!\n" - "> > =C2=A2 1851.721329! 
---=C2=A2 end trace 0d80041ac00cfae2 !---\n" - "> >=20\n" - "> >=20\n" - "> > >=20\n" - "> > > And could you share how crashes looks like? I haven't seen backtraces=\n" - " yet.\n" - "> > >=20\n" - "> >=20\n" + "> > \n" + "> > ? 1851.721062! Unable to handle kernel pointer dereference in virtual kernel address space\n" + "> > ? 1851.721075! failing address: 0000000000000000 TEID: 0000000000000483\n" + "> > ? 1851.721078! Fault in home space mode while using kernel ASCE.\n" + "> > ? 1851.721085! AS:0000000000d5c007 R3:00000000ffff0007 S:00000000ffffa800 P:000000000000003d\n" + "> > ? 1851.721128! Oops: 0004 ilc:3 ?#1! PREEMPT SMP DEBUG_PAGEALLOC\n" + "> > ? 1851.721135! Modules linked in: bridge stp llc btrfs mlx4_ib mlx4_en ib_sa ib_mad vxlan xor ip6_udp_tunnel ib_core udp_tunnel ptp pps_core ib_addr ghash_s390raid6_pq prng ecb aes_s390 mlx4_core des_s390 des_generic genwqe_card sha512_s390 sha256_s390 sha1_s390 sha_common crc_itu_t dm_mod scm_block vhost_net tun vhost eadm_sch macvtap macvlan kvm autofs4\n" + "> > ? 1851.721183! CPU: 7 PID: 256422 Comm: bash Not tainted 4.5.0-rc3-00058-g07923d7-dirty #178\n" + "> > ? 1851.721186! task: 000000007fbfd290 ti: 000000008c604000 task.ti: 000000008c604000\n" + "> > ? 1851.721189! Krnl PSW : 0704d00180000000 000000000045d3b8 (__rb_erase_color+0x280/0x308)\n" + "> > ? 1851.721200! R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 EA:3\n" + "> > Krnl GPRS: 0000000000000001 0000000000000020 0000000000000000 00000000bd07eff1\n" + "> > ? 1851.721205! 000000000027ca10 0000000000000000 0000000083e45898 0000000077b61198\n" + "> > ? 1851.721207! 000000007ce1a490 00000000bd07eff0 000000007ce1a548 000000000027ca10\n" + "> > ? 1851.721210! 00000000bd07c350 00000000bd07eff0 000000008c607aa8 000000008c607a68\n" + "> > ? 1851.721221! 
Krnl Code: 000000000045d3aa: e3c0d0080024 stg %%r12,8(%%r13)\n" + "> > 000000000045d3b0: b9040039 lgr %%r3,%%r9\n" + "> > #000000000045d3b4: a53b0001 oill %%r3,1\n" + "> > >000000000045d3b8: e33010000024 stg %%r3,0(%%r1)\n" + "> > 000000000045d3be: ec28000e007c cgij %%r2,0,8,45d3da\n" + "> > 000000000045d3c4: e34020000004 lg %%r4,0(%%r2)\n" + "> > 000000000045d3ca: b904001c lgr %%r1,%%r12\n" + "> > 000000000045d3ce: ec143f3f0056 rosbg %%r1,%%r4,63,63,0\n" + "> > ? 1851.721269! Call Trace:\n" + "> > ? 1851.721273! (?<0000000083e45898>! 0x83e45898)\n" + "> > ? 1851.721279! ?<000000000029342a>! unlink_anon_vmas+0x9a/0x1d8\n" + "> > ? 1851.721282! ?<0000000000283f34>! free_pgtables+0xcc/0x148\n" + "> > ? 1851.721285! ?<000000000028c376>! exit_mmap+0xd6/0x300\n" + "> > ? 1851.721289! ?<0000000000134db8>! mmput+0x90/0x118\n" + "> > ? 1851.721294! ?<00000000002d76bc>! flush_old_exec+0x5d4/0x700\n" + "> > ? 1851.721298! ?<00000000003369f4>! load_elf_binary+0x2f4/0x13e8\n" + "> > ? 1851.721301! ?<00000000002d6e4a>! search_binary_handler+0x9a/0x1f8\n" + "> > ? 1851.721304! ?<00000000002d8970>! do_execveat_common.isra.32+0x668/0x9a0\n" + "> > ? 1851.721307! ?<00000000002d8cec>! do_execve+0x44/0x58\n" + "> > ? 1851.721310! ?<00000000002d8f92>! SyS_execve+0x3a/0x48\n" + "> > ? 1851.721315! ?<00000000006fb096>! system_call+0xd6/0x258\n" + "> > ? 1851.721317! ?<000003ff997436d6>! 0x3ff997436d6\n" + "> > ? 1851.721319! INFO: lockdep is turned off.\n" + "> > ? 1851.721321! Last Breaking-Event-Address:\n" + "> > ? 1851.721323! ?<000000000045d31a>! __rb_erase_color+0x1e2/0x308\n" + "> > ? 1851.721327!\n" + "> > ? 1851.721329! ---? end trace 0d80041ac00cfae2 !---\n" + "> > \n" + "> > \n" + "> > > \n" + "> > > And could you share how crashes looks like? I haven't seen backtraces yet.\n" + "> > > \n" + "> > \n" "> > Sure. I didn't because they really looked random to me. 
Most of the time\n" - "> > in rcu or list debugging but I thought these have just been the messeng=\n" - "er\n" - "> > observing a corruption first. Anyhow, here is an older one that might l=\n" - "ook\n" + "> > in rcu or list debugging but I thought these have just been the messenger\n" + "> > observing a corruption first. Anyhow, here is an older one that might look\n" "> > interesting:\n" - "> >=20\n" - "> > [ 59.851421] list_del corruption. next->prev should be 000000006e1eb0=\n" - "00, but was 0000000000000400\n" - ">=20\n" + "> > \n" + "> > [ 59.851421] list_del corruption. next->prev should be 000000006e1eb000, but was 0000000000000400\n" + "> \n" "> This kinda interesting: 0x400 is TAIL_MAPPING.. Hm..\n" - ">=20\n" + "> \n" "> Could you check if you see the problem on commit 1c290f642101 and its\n" "> immediate parent?\n" - ">=20\n" + "> \n" "\n" "How should the page->mapping poison end up as next->prev in the list of\n" "pre-allocated THP splitting page tables? Also, commit 1c290f642101\n" "is before the THP rework, at least the non-bisectable part, so we should\n" "expect not to see the problem there.\n" "\n" - "0x400 is also the value of an empty pte on s390, and the thp_deposit/withdr=\n" - "aw\n" - "listheads are placed inside the pre-allocated pagetables instead of page->l=\n" - "ru,\n" - "because we have 2K pagetables on s390 and cannot use struct page =3D=3D pgt=\n" - "able_t.\n" + "0x400 is also the value of an empty pte on s390, and the thp_deposit/withdraw\n" + "listheads are placed inside the pre-allocated pagetables instead of page->lru,\n" + "because we have 2K pagetables on s390 and cannot use struct page == pgtable_t.\n" "\n" "So, for example, two concurrent withdraws could produce such a list\n" "corruption, because the first withdraw will overwrite the listhead at the\n" @@ -154,4 +98,4 @@ "\n" Has anything changed regarding the general THP deposit/withdraw logic? 
-44905ef53279bc01c4cb5058bfaf95ae92f69db2d957d357777618fc75cbd096 +e83d422054e460c3a199443fac14b5f9c12563ae1bb0b98f563db0e72a5f5dc6
diff --git a/a/1.txt b/N2/1.txt index 2ee45e4..ac27fb7 100644 --- a/a/1.txt +++ b/N2/1.txt @@ -2,120 +2,89 @@ On Mon, 15 Feb 2016 13:31:59 +0200 "Kirill A. Shutemov" <kirill@shutemov.name> wrote: > On Sat, Feb 13, 2016 at 12:58:31PM +0100, Sebastian Ott wrote: -> >=20 +> > > > On Sat, 13 Feb 2016, Kirill A. Shutemov wrote: > > > Could you check if revert of fecffad25458 helps? -> >=20 +> > > > I reverted fecffad25458 on top of 721675fcf277cf - it oopsed with: -> >=20 -> > =C2=A2 1851.721062! Unable to handle kernel pointer dereference in virt= -ual kernel address space -> > =C2=A2 1851.721075! failing address: 0000000000000000 TEID: 00000000000= -00483 -> > =C2=A2 1851.721078! Fault in home space mode while using kernel ASCE. -> > =C2=A2 1851.721085! AS:0000000000d5c007 R3:00000000ffff0007 S:00000000f= -fffa800 P:000000000000003d -> > =C2=A2 1851.721128! Oops: 0004 ilc:3 =C2=A2#1! PREEMPT SMP DEBUG_PAGEAL= -LOC -> > =C2=A2 1851.721135! Modules linked in: bridge stp llc btrfs mlx4_ib mlx= -4_en ib_sa ib_mad vxlan xor ip6_udp_tunnel ib_core udp_tunnel ptp pps_core = -ib_addr ghash_s390raid6_pq prng ecb aes_s390 mlx4_core des_s390 des_generic= - genwqe_card sha512_s390 sha256_s390 sha1_s390 sha_common crc_itu_t dm_mod = -scm_block vhost_net tun vhost eadm_sch macvtap macvlan kvm autofs4 -> > =C2=A2 1851.721183! CPU: 7 PID: 256422 Comm: bash Not tainted 4.5.0-rc3= --00058-g07923d7-dirty #178 -> > =C2=A2 1851.721186! task: 000000007fbfd290 ti: 000000008c604000 task.ti= -: 000000008c604000 -> > =C2=A2 1851.721189! Krnl PSW : 0704d00180000000 000000000045d3b8 (__rb_= -erase_color+0x280/0x308) -> > =C2=A2 1851.721200! R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3= - CC:1 PM:0 EA:3 -> > Krnl GPRS: 0000000000000001 0000000000000020 00000000000= -00000 00000000bd07eff1 -> > =C2=A2 1851.721205! 000000000027ca10 0000000000000000 000000= -0083e45898 0000000077b61198 -> > =C2=A2 1851.721207! 
000000007ce1a490 00000000bd07eff0 000000= -007ce1a548 000000000027ca10 -> > =C2=A2 1851.721210! 00000000bd07c350 00000000bd07eff0 000000= -008c607aa8 000000008c607a68 -> > =C2=A2 1851.721221! Krnl Code: 000000000045d3aa: e3c0d0080024 stg= - %%r12,8(%%r13) -> > 000000000045d3b0: b9040039 lgr = -%%r3,%%r9 -> > #000000000045d3b4: a53b0001 oill = -%%r3,1 -> > >000000000045d3b8: e33010000024 stg = -%%r3,0(%%r1) -> > 000000000045d3be: ec28000e007c cgij = -%%r2,0,8,45d3da -> > 000000000045d3c4: e34020000004 lg = -%%r4,0(%%r2) -> > 000000000045d3ca: b904001c lgr = -%%r1,%%r12 -> > 000000000045d3ce: ec143f3f0056 rosbg = -%%r1,%%r4,63,63,0 -> > =C2=A2 1851.721269! Call Trace: -> > =C2=A2 1851.721273! (=C2=A2<0000000083e45898>! 0x83e45898) -> > =C2=A2 1851.721279! =C2=A2<000000000029342a>! unlink_anon_vmas+0x9a/0x= -1d8 -> > =C2=A2 1851.721282! =C2=A2<0000000000283f34>! free_pgtables+0xcc/0x148 -> > =C2=A2 1851.721285! =C2=A2<000000000028c376>! exit_mmap+0xd6/0x300 -> > =C2=A2 1851.721289! =C2=A2<0000000000134db8>! mmput+0x90/0x118 -> > =C2=A2 1851.721294! =C2=A2<00000000002d76bc>! flush_old_exec+0x5d4/0x7= -00 -> > =C2=A2 1851.721298! =C2=A2<00000000003369f4>! load_elf_binary+0x2f4/0x= -13e8 -> > =C2=A2 1851.721301! =C2=A2<00000000002d6e4a>! search_binary_handler+0x= -9a/0x1f8 -> > =C2=A2 1851.721304! =C2=A2<00000000002d8970>! do_execveat_common.isra.= -32+0x668/0x9a0 -> > =C2=A2 1851.721307! =C2=A2<00000000002d8cec>! do_execve+0x44/0x58 -> > =C2=A2 1851.721310! =C2=A2<00000000002d8f92>! SyS_execve+0x3a/0x48 -> > =C2=A2 1851.721315! =C2=A2<00000000006fb096>! system_call+0xd6/0x258 -> > =C2=A2 1851.721317! =C2=A2<000003ff997436d6>! 0x3ff997436d6 -> > =C2=A2 1851.721319! INFO: lockdep is turned off. -> > =C2=A2 1851.721321! Last Breaking-Event-Address: -> > =C2=A2 1851.721323! =C2=A2<000000000045d31a>! __rb_erase_color+0x1e2/0= -x308 -> > =C2=A2 1851.721327! -> > =C2=A2 1851.721329! 
---=C2=A2 end trace 0d80041ac00cfae2 !--- -> >=20 -> >=20 -> > >=20 -> > > And could you share how crashes looks like? I haven't seen backtraces= - yet. -> > >=20 -> >=20 +> > +> > ¢ 1851.721062! Unable to handle kernel pointer dereference in virtual kernel address space +> > ¢ 1851.721075! failing address: 0000000000000000 TEID: 0000000000000483 +> > ¢ 1851.721078! Fault in home space mode while using kernel ASCE. +> > ¢ 1851.721085! AS:0000000000d5c007 R3:00000000ffff0007 S:00000000ffffa800 P:000000000000003d +> > ¢ 1851.721128! Oops: 0004 ilc:3 ¢#1! PREEMPT SMP DEBUG_PAGEALLOC +> > ¢ 1851.721135! Modules linked in: bridge stp llc btrfs mlx4_ib mlx4_en ib_sa ib_mad vxlan xor ip6_udp_tunnel ib_core udp_tunnel ptp pps_core ib_addr ghash_s390raid6_pq prng ecb aes_s390 mlx4_core des_s390 des_generic genwqe_card sha512_s390 sha256_s390 sha1_s390 sha_common crc_itu_t dm_mod scm_block vhost_net tun vhost eadm_sch macvtap macvlan kvm autofs4 +> > ¢ 1851.721183! CPU: 7 PID: 256422 Comm: bash Not tainted 4.5.0-rc3-00058-g07923d7-dirty #178 +> > ¢ 1851.721186! task: 000000007fbfd290 ti: 000000008c604000 task.ti: 000000008c604000 +> > ¢ 1851.721189! Krnl PSW : 0704d00180000000 000000000045d3b8 (__rb_erase_color+0x280/0x308) +> > ¢ 1851.721200! R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 EA:3 +> > Krnl GPRS: 0000000000000001 0000000000000020 0000000000000000 00000000bd07eff1 +> > ¢ 1851.721205! 000000000027ca10 0000000000000000 0000000083e45898 0000000077b61198 +> > ¢ 1851.721207! 000000007ce1a490 00000000bd07eff0 000000007ce1a548 000000000027ca10 +> > ¢ 1851.721210! 00000000bd07c350 00000000bd07eff0 000000008c607aa8 000000008c607a68 +> > ¢ 1851.721221! 
Krnl Code: 000000000045d3aa: e3c0d0080024 stg %%r12,8(%%r13) +> > 000000000045d3b0: b9040039 lgr %%r3,%%r9 +> > #000000000045d3b4: a53b0001 oill %%r3,1 +> > >000000000045d3b8: e33010000024 stg %%r3,0(%%r1) +> > 000000000045d3be: ec28000e007c cgij %%r2,0,8,45d3da +> > 000000000045d3c4: e34020000004 lg %%r4,0(%%r2) +> > 000000000045d3ca: b904001c lgr %%r1,%%r12 +> > 000000000045d3ce: ec143f3f0056 rosbg %%r1,%%r4,63,63,0 +> > ¢ 1851.721269! Call Trace: +> > ¢ 1851.721273! (¢<0000000083e45898>! 0x83e45898) +> > ¢ 1851.721279! ¢<000000000029342a>! unlink_anon_vmas+0x9a/0x1d8 +> > ¢ 1851.721282! ¢<0000000000283f34>! free_pgtables+0xcc/0x148 +> > ¢ 1851.721285! ¢<000000000028c376>! exit_mmap+0xd6/0x300 +> > ¢ 1851.721289! ¢<0000000000134db8>! mmput+0x90/0x118 +> > ¢ 1851.721294! ¢<00000000002d76bc>! flush_old_exec+0x5d4/0x700 +> > ¢ 1851.721298! ¢<00000000003369f4>! load_elf_binary+0x2f4/0x13e8 +> > ¢ 1851.721301! ¢<00000000002d6e4a>! search_binary_handler+0x9a/0x1f8 +> > ¢ 1851.721304! ¢<00000000002d8970>! do_execveat_common.isra.32+0x668/0x9a0 +> > ¢ 1851.721307! ¢<00000000002d8cec>! do_execve+0x44/0x58 +> > ¢ 1851.721310! ¢<00000000002d8f92>! SyS_execve+0x3a/0x48 +> > ¢ 1851.721315! ¢<00000000006fb096>! system_call+0xd6/0x258 +> > ¢ 1851.721317! ¢<000003ff997436d6>! 0x3ff997436d6 +> > ¢ 1851.721319! INFO: lockdep is turned off. +> > ¢ 1851.721321! Last Breaking-Event-Address: +> > ¢ 1851.721323! ¢<000000000045d31a>! __rb_erase_color+0x1e2/0x308 +> > ¢ 1851.721327! +> > ¢ 1851.721329! ---¢ end trace 0d80041ac00cfae2 !--- +> > +> > +> > > +> > > And could you share how crashes looks like? I haven't seen backtraces yet. +> > > +> > > > Sure. I didn't because they really looked random to me. Most of the time -> > in rcu or list debugging but I thought these have just been the messeng= -er -> > observing a corruption first. 
Anyhow, here is an older one that might l= -ook +> > in rcu or list debugging but I thought these have just been the messenger +> > observing a corruption first. Anyhow, here is an older one that might look > > interesting: -> >=20 -> > [ 59.851421] list_del corruption. next->prev should be 000000006e1eb0= -00, but was 0000000000000400 ->=20 +> > +> > [ 59.851421] list_del corruption. next->prev should be 000000006e1eb000, but was 0000000000000400 +> > This kinda interesting: 0x400 is TAIL_MAPPING.. Hm.. ->=20 +> > Could you check if you see the problem on commit 1c290f642101 and its > immediate parent? ->=20 +> How should the page->mapping poison end up as next->prev in the list of pre-allocated THP splitting page tables? Also, commit 1c290f642101 is before the THP rework, at least the non-bisectable part, so we should expect not to see the problem there. -0x400 is also the value of an empty pte on s390, and the thp_deposit/withdr= -aw -listheads are placed inside the pre-allocated pagetables instead of page->l= -ru, -because we have 2K pagetables on s390 and cannot use struct page =3D=3D pgt= -able_t. +0x400 is also the value of an empty pte on s390, and the thp_deposit/withdraw +listheads are placed inside the pre-allocated pagetables instead of page->lru, +because we have 2K pagetables on s390 and cannot use struct page == pgtable_t. So, for example, two concurrent withdraws could produce such a list corruption, because the first withdraw will overwrite the listhead at the beginning of the pagetable with 2 empty ptes. Has anything changed regarding the general THP deposit/withdraw logic? + +-- +To unsubscribe, send a message with 'unsubscribe linux-mm' in +the body to majordomo@kvack.org. For more info on Linux MM, +see: http://www.linux-mm.org/ . 
+Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> diff --git a/a/content_digest b/N2/content_digest index 553d349..511b449 100644 --- a/a/content_digest +++ b/N2/content_digest @@ -36,122 +36,91 @@ "\"Kirill A. Shutemov\" <kirill@shutemov.name> wrote:\n" "\n" "> On Sat, Feb 13, 2016 at 12:58:31PM +0100, Sebastian Ott wrote:\n" - "> >=20\n" + "> > \n" "> > On Sat, 13 Feb 2016, Kirill A. Shutemov wrote:\n" "> > > Could you check if revert of fecffad25458 helps?\n" - "> >=20\n" + "> > \n" "> > I reverted fecffad25458 on top of 721675fcf277cf - it oopsed with:\n" - "> >=20\n" - "> > =C2=A2 1851.721062! Unable to handle kernel pointer dereference in virt=\n" - "ual kernel address space\n" - "> > =C2=A2 1851.721075! failing address: 0000000000000000 TEID: 00000000000=\n" - "00483\n" - "> > =C2=A2 1851.721078! Fault in home space mode while using kernel ASCE.\n" - "> > =C2=A2 1851.721085! AS:0000000000d5c007 R3:00000000ffff0007 S:00000000f=\n" - "fffa800 P:000000000000003d\n" - "> > =C2=A2 1851.721128! Oops: 0004 ilc:3 =C2=A2#1! PREEMPT SMP DEBUG_PAGEAL=\n" - "LOC\n" - "> > =C2=A2 1851.721135! Modules linked in: bridge stp llc btrfs mlx4_ib mlx=\n" - "4_en ib_sa ib_mad vxlan xor ip6_udp_tunnel ib_core udp_tunnel ptp pps_core =\n" - "ib_addr ghash_s390raid6_pq prng ecb aes_s390 mlx4_core des_s390 des_generic=\n" - " genwqe_card sha512_s390 sha256_s390 sha1_s390 sha_common crc_itu_t dm_mod =\n" - "scm_block vhost_net tun vhost eadm_sch macvtap macvlan kvm autofs4\n" - "> > =C2=A2 1851.721183! CPU: 7 PID: 256422 Comm: bash Not tainted 4.5.0-rc3=\n" - "-00058-g07923d7-dirty #178\n" - "> > =C2=A2 1851.721186! task: 000000007fbfd290 ti: 000000008c604000 task.ti=\n" - ": 000000008c604000\n" - "> > =C2=A2 1851.721189! Krnl PSW : 0704d00180000000 000000000045d3b8 (__rb_=\n" - "erase_color+0x280/0x308)\n" - "> > =C2=A2 1851.721200! 
R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3=\n" - " CC:1 PM:0 EA:3\n" - "> > Krnl GPRS: 0000000000000001 0000000000000020 00000000000=\n" - "00000 00000000bd07eff1\n" - "> > =C2=A2 1851.721205! 000000000027ca10 0000000000000000 000000=\n" - "0083e45898 0000000077b61198\n" - "> > =C2=A2 1851.721207! 000000007ce1a490 00000000bd07eff0 000000=\n" - "007ce1a548 000000000027ca10\n" - "> > =C2=A2 1851.721210! 00000000bd07c350 00000000bd07eff0 000000=\n" - "008c607aa8 000000008c607a68\n" - "> > =C2=A2 1851.721221! Krnl Code: 000000000045d3aa: e3c0d0080024 stg=\n" - " %%r12,8(%%r13)\n" - "> > 000000000045d3b0: b9040039 lgr =\n" - "%%r3,%%r9\n" - "> > #000000000045d3b4: a53b0001 oill =\n" - "%%r3,1\n" - "> > >000000000045d3b8: e33010000024 stg =\n" - "%%r3,0(%%r1)\n" - "> > 000000000045d3be: ec28000e007c cgij =\n" - "%%r2,0,8,45d3da\n" - "> > 000000000045d3c4: e34020000004 lg =\n" - "%%r4,0(%%r2)\n" - "> > 000000000045d3ca: b904001c lgr =\n" - "%%r1,%%r12\n" - "> > 000000000045d3ce: ec143f3f0056 rosbg =\n" - "%%r1,%%r4,63,63,0\n" - "> > =C2=A2 1851.721269! Call Trace:\n" - "> > =C2=A2 1851.721273! (=C2=A2<0000000083e45898>! 0x83e45898)\n" - "> > =C2=A2 1851.721279! =C2=A2<000000000029342a>! unlink_anon_vmas+0x9a/0x=\n" - "1d8\n" - "> > =C2=A2 1851.721282! =C2=A2<0000000000283f34>! free_pgtables+0xcc/0x148\n" - "> > =C2=A2 1851.721285! =C2=A2<000000000028c376>! exit_mmap+0xd6/0x300\n" - "> > =C2=A2 1851.721289! =C2=A2<0000000000134db8>! mmput+0x90/0x118\n" - "> > =C2=A2 1851.721294! =C2=A2<00000000002d76bc>! flush_old_exec+0x5d4/0x7=\n" - "00\n" - "> > =C2=A2 1851.721298! =C2=A2<00000000003369f4>! load_elf_binary+0x2f4/0x=\n" - "13e8\n" - "> > =C2=A2 1851.721301! =C2=A2<00000000002d6e4a>! search_binary_handler+0x=\n" - "9a/0x1f8\n" - "> > =C2=A2 1851.721304! =C2=A2<00000000002d8970>! do_execveat_common.isra.=\n" - "32+0x668/0x9a0\n" - "> > =C2=A2 1851.721307! =C2=A2<00000000002d8cec>! do_execve+0x44/0x58\n" - "> > =C2=A2 1851.721310! =C2=A2<00000000002d8f92>! 
SyS_execve+0x3a/0x48\n" - "> > =C2=A2 1851.721315! =C2=A2<00000000006fb096>! system_call+0xd6/0x258\n" - "> > =C2=A2 1851.721317! =C2=A2<000003ff997436d6>! 0x3ff997436d6\n" - "> > =C2=A2 1851.721319! INFO: lockdep is turned off.\n" - "> > =C2=A2 1851.721321! Last Breaking-Event-Address:\n" - "> > =C2=A2 1851.721323! =C2=A2<000000000045d31a>! __rb_erase_color+0x1e2/0=\n" - "x308\n" - "> > =C2=A2 1851.721327!\n" - "> > =C2=A2 1851.721329! ---=C2=A2 end trace 0d80041ac00cfae2 !---\n" - "> >=20\n" - "> >=20\n" - "> > >=20\n" - "> > > And could you share how crashes looks like? I haven't seen backtraces=\n" - " yet.\n" - "> > >=20\n" - "> >=20\n" + "> > \n" + "> > \302\242 1851.721062! Unable to handle kernel pointer dereference in virtual kernel address space\n" + "> > \302\242 1851.721075! failing address: 0000000000000000 TEID: 0000000000000483\n" + "> > \302\242 1851.721078! Fault in home space mode while using kernel ASCE.\n" + "> > \302\242 1851.721085! AS:0000000000d5c007 R3:00000000ffff0007 S:00000000ffffa800 P:000000000000003d\n" + "> > \302\242 1851.721128! Oops: 0004 ilc:3 \302\242#1! PREEMPT SMP DEBUG_PAGEALLOC\n" + "> > \302\242 1851.721135! Modules linked in: bridge stp llc btrfs mlx4_ib mlx4_en ib_sa ib_mad vxlan xor ip6_udp_tunnel ib_core udp_tunnel ptp pps_core ib_addr ghash_s390raid6_pq prng ecb aes_s390 mlx4_core des_s390 des_generic genwqe_card sha512_s390 sha256_s390 sha1_s390 sha_common crc_itu_t dm_mod scm_block vhost_net tun vhost eadm_sch macvtap macvlan kvm autofs4\n" + "> > \302\242 1851.721183! CPU: 7 PID: 256422 Comm: bash Not tainted 4.5.0-rc3-00058-g07923d7-dirty #178\n" + "> > \302\242 1851.721186! task: 000000007fbfd290 ti: 000000008c604000 task.ti: 000000008c604000\n" + "> > \302\242 1851.721189! Krnl PSW : 0704d00180000000 000000000045d3b8 (__rb_erase_color+0x280/0x308)\n" + "> > \302\242 1851.721200! 
R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 EA:3\n" + "> > Krnl GPRS: 0000000000000001 0000000000000020 0000000000000000 00000000bd07eff1\n" + "> > \302\242 1851.721205! 000000000027ca10 0000000000000000 0000000083e45898 0000000077b61198\n" + "> > \302\242 1851.721207! 000000007ce1a490 00000000bd07eff0 000000007ce1a548 000000000027ca10\n" + "> > \302\242 1851.721210! 00000000bd07c350 00000000bd07eff0 000000008c607aa8 000000008c607a68\n" + "> > \302\242 1851.721221! Krnl Code: 000000000045d3aa: e3c0d0080024 stg %%r12,8(%%r13)\n" + "> > 000000000045d3b0: b9040039 lgr %%r3,%%r9\n" + "> > #000000000045d3b4: a53b0001 oill %%r3,1\n" + "> > >000000000045d3b8: e33010000024 stg %%r3,0(%%r1)\n" + "> > 000000000045d3be: ec28000e007c cgij %%r2,0,8,45d3da\n" + "> > 000000000045d3c4: e34020000004 lg %%r4,0(%%r2)\n" + "> > 000000000045d3ca: b904001c lgr %%r1,%%r12\n" + "> > 000000000045d3ce: ec143f3f0056 rosbg %%r1,%%r4,63,63,0\n" + "> > \302\242 1851.721269! Call Trace:\n" + "> > \302\242 1851.721273! (\302\242<0000000083e45898>! 0x83e45898)\n" + "> > \302\242 1851.721279! \302\242<000000000029342a>! unlink_anon_vmas+0x9a/0x1d8\n" + "> > \302\242 1851.721282! \302\242<0000000000283f34>! free_pgtables+0xcc/0x148\n" + "> > \302\242 1851.721285! \302\242<000000000028c376>! exit_mmap+0xd6/0x300\n" + "> > \302\242 1851.721289! \302\242<0000000000134db8>! mmput+0x90/0x118\n" + "> > \302\242 1851.721294! \302\242<00000000002d76bc>! flush_old_exec+0x5d4/0x700\n" + "> > \302\242 1851.721298! \302\242<00000000003369f4>! load_elf_binary+0x2f4/0x13e8\n" + "> > \302\242 1851.721301! \302\242<00000000002d6e4a>! search_binary_handler+0x9a/0x1f8\n" + "> > \302\242 1851.721304! \302\242<00000000002d8970>! do_execveat_common.isra.32+0x668/0x9a0\n" + "> > \302\242 1851.721307! \302\242<00000000002d8cec>! do_execve+0x44/0x58\n" + "> > \302\242 1851.721310! \302\242<00000000002d8f92>! SyS_execve+0x3a/0x48\n" + "> > \302\242 1851.721315! \302\242<00000000006fb096>! 
system_call+0xd6/0x258\n" + "> > \302\242 1851.721317! \302\242<000003ff997436d6>! 0x3ff997436d6\n" + "> > \302\242 1851.721319! INFO: lockdep is turned off.\n" + "> > \302\242 1851.721321! Last Breaking-Event-Address:\n" + "> > \302\242 1851.721323! \302\242<000000000045d31a>! __rb_erase_color+0x1e2/0x308\n" + "> > \302\242 1851.721327!\n" + "> > \302\242 1851.721329! ---\302\242 end trace 0d80041ac00cfae2 !---\n" + "> > \n" + "> > \n" + "> > > \n" + "> > > And could you share how crashes looks like? I haven't seen backtraces yet.\n" + "> > > \n" + "> > \n" "> > Sure. I didn't because they really looked random to me. Most of the time\n" - "> > in rcu or list debugging but I thought these have just been the messeng=\n" - "er\n" - "> > observing a corruption first. Anyhow, here is an older one that might l=\n" - "ook\n" + "> > in rcu or list debugging but I thought these have just been the messenger\n" + "> > observing a corruption first. Anyhow, here is an older one that might look\n" "> > interesting:\n" - "> >=20\n" - "> > [ 59.851421] list_del corruption. next->prev should be 000000006e1eb0=\n" - "00, but was 0000000000000400\n" - ">=20\n" + "> > \n" + "> > [ 59.851421] list_del corruption. next->prev should be 000000006e1eb000, but was 0000000000000400\n" + "> \n" "> This kinda interesting: 0x400 is TAIL_MAPPING.. Hm..\n" - ">=20\n" + "> \n" "> Could you check if you see the problem on commit 1c290f642101 and its\n" "> immediate parent?\n" - ">=20\n" + "> \n" "\n" "How should the page->mapping poison end up as next->prev in the list of\n" "pre-allocated THP splitting page tables? 
Also, commit 1c290f642101\n" "is before the THP rework, at least the non-bisectable part, so we should\n" "expect not to see the problem there.\n" "\n" - "0x400 is also the value of an empty pte on s390, and the thp_deposit/withdr=\n" - "aw\n" - "listheads are placed inside the pre-allocated pagetables instead of page->l=\n" - "ru,\n" - "because we have 2K pagetables on s390 and cannot use struct page =3D=3D pgt=\n" - "able_t.\n" + "0x400 is also the value of an empty pte on s390, and the thp_deposit/withdraw\n" + "listheads are placed inside the pre-allocated pagetables instead of page->lru,\n" + "because we have 2K pagetables on s390 and cannot use struct page == pgtable_t.\n" "\n" "So, for example, two concurrent withdraws could produce such a list\n" "corruption, because the first withdraw will overwrite the listhead at the\n" "beginning of the pagetable with 2 empty ptes.\n" "\n" - Has anything changed regarding the general THP deposit/withdraw logic? + "Has anything changed regarding the general THP deposit/withdraw logic?\n" + "\n" + "--\n" + "To unsubscribe, send a message with 'unsubscribe linux-mm' in\n" + "the body to majordomo@kvack.org. For more info on Linux MM,\n" + "see: http://www.linux-mm.org/ .\n" + "Don't email: <a href=mailto:\"dont@kvack.org\"> email@kvack.org </a>" -44905ef53279bc01c4cb5058bfaf95ae92f69db2d957d357777618fc75cbd096 +e20ee89e988b89ff6ded8abcd5696fe7bd7ee6f72ecf4f41391870d83a0fddaa
diff --git a/a/1.txt b/N3/1.txt index 2ee45e4..d3119af 100644 --- a/a/1.txt +++ b/N3/1.txt @@ -2,117 +2,80 @@ On Mon, 15 Feb 2016 13:31:59 +0200 "Kirill A. Shutemov" <kirill@shutemov.name> wrote: > On Sat, Feb 13, 2016 at 12:58:31PM +0100, Sebastian Ott wrote: -> >=20 +> > > > On Sat, 13 Feb 2016, Kirill A. Shutemov wrote: > > > Could you check if revert of fecffad25458 helps? -> >=20 +> > > > I reverted fecffad25458 on top of 721675fcf277cf - it oopsed with: -> >=20 -> > =C2=A2 1851.721062! Unable to handle kernel pointer dereference in virt= -ual kernel address space -> > =C2=A2 1851.721075! failing address: 0000000000000000 TEID: 00000000000= -00483 -> > =C2=A2 1851.721078! Fault in home space mode while using kernel ASCE. -> > =C2=A2 1851.721085! AS:0000000000d5c007 R3:00000000ffff0007 S:00000000f= -fffa800 P:000000000000003d -> > =C2=A2 1851.721128! Oops: 0004 ilc:3 =C2=A2#1! PREEMPT SMP DEBUG_PAGEAL= -LOC -> > =C2=A2 1851.721135! Modules linked in: bridge stp llc btrfs mlx4_ib mlx= -4_en ib_sa ib_mad vxlan xor ip6_udp_tunnel ib_core udp_tunnel ptp pps_core = -ib_addr ghash_s390raid6_pq prng ecb aes_s390 mlx4_core des_s390 des_generic= - genwqe_card sha512_s390 sha256_s390 sha1_s390 sha_common crc_itu_t dm_mod = -scm_block vhost_net tun vhost eadm_sch macvtap macvlan kvm autofs4 -> > =C2=A2 1851.721183! CPU: 7 PID: 256422 Comm: bash Not tainted 4.5.0-rc3= --00058-g07923d7-dirty #178 -> > =C2=A2 1851.721186! task: 000000007fbfd290 ti: 000000008c604000 task.ti= -: 000000008c604000 -> > =C2=A2 1851.721189! Krnl PSW : 0704d00180000000 000000000045d3b8 (__rb_= -erase_color+0x280/0x308) -> > =C2=A2 1851.721200! R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3= - CC:1 PM:0 EA:3 -> > Krnl GPRS: 0000000000000001 0000000000000020 00000000000= -00000 00000000bd07eff1 -> > =C2=A2 1851.721205! 000000000027ca10 0000000000000000 000000= -0083e45898 0000000077b61198 -> > =C2=A2 1851.721207! 
000000007ce1a490 00000000bd07eff0 000000= -007ce1a548 000000000027ca10 -> > =C2=A2 1851.721210! 00000000bd07c350 00000000bd07eff0 000000= -008c607aa8 000000008c607a68 -> > =C2=A2 1851.721221! Krnl Code: 000000000045d3aa: e3c0d0080024 stg= - %%r12,8(%%r13) -> > 000000000045d3b0: b9040039 lgr = -%%r3,%%r9 -> > #000000000045d3b4: a53b0001 oill = -%%r3,1 -> > >000000000045d3b8: e33010000024 stg = -%%r3,0(%%r1) -> > 000000000045d3be: ec28000e007c cgij = -%%r2,0,8,45d3da -> > 000000000045d3c4: e34020000004 lg = -%%r4,0(%%r2) -> > 000000000045d3ca: b904001c lgr = -%%r1,%%r12 -> > 000000000045d3ce: ec143f3f0056 rosbg = -%%r1,%%r4,63,63,0 -> > =C2=A2 1851.721269! Call Trace: -> > =C2=A2 1851.721273! (=C2=A2<0000000083e45898>! 0x83e45898) -> > =C2=A2 1851.721279! =C2=A2<000000000029342a>! unlink_anon_vmas+0x9a/0x= -1d8 -> > =C2=A2 1851.721282! =C2=A2<0000000000283f34>! free_pgtables+0xcc/0x148 -> > =C2=A2 1851.721285! =C2=A2<000000000028c376>! exit_mmap+0xd6/0x300 -> > =C2=A2 1851.721289! =C2=A2<0000000000134db8>! mmput+0x90/0x118 -> > =C2=A2 1851.721294! =C2=A2<00000000002d76bc>! flush_old_exec+0x5d4/0x7= -00 -> > =C2=A2 1851.721298! =C2=A2<00000000003369f4>! load_elf_binary+0x2f4/0x= -13e8 -> > =C2=A2 1851.721301! =C2=A2<00000000002d6e4a>! search_binary_handler+0x= -9a/0x1f8 -> > =C2=A2 1851.721304! =C2=A2<00000000002d8970>! do_execveat_common.isra.= -32+0x668/0x9a0 -> > =C2=A2 1851.721307! =C2=A2<00000000002d8cec>! do_execve+0x44/0x58 -> > =C2=A2 1851.721310! =C2=A2<00000000002d8f92>! SyS_execve+0x3a/0x48 -> > =C2=A2 1851.721315! =C2=A2<00000000006fb096>! system_call+0xd6/0x258 -> > =C2=A2 1851.721317! =C2=A2<000003ff997436d6>! 0x3ff997436d6 -> > =C2=A2 1851.721319! INFO: lockdep is turned off. -> > =C2=A2 1851.721321! Last Breaking-Event-Address: -> > =C2=A2 1851.721323! =C2=A2<000000000045d31a>! __rb_erase_color+0x1e2/0= -x308 -> > =C2=A2 1851.721327! -> > =C2=A2 1851.721329! 
---=C2=A2 end trace 0d80041ac00cfae2 !--- -> >=20 -> >=20 -> > >=20 -> > > And could you share how crashes looks like? I haven't seen backtraces= - yet. -> > >=20 -> >=20 +> > +> > ¢ 1851.721062! Unable to handle kernel pointer dereference in virtual kernel address space +> > ¢ 1851.721075! failing address: 0000000000000000 TEID: 0000000000000483 +> > ¢ 1851.721078! Fault in home space mode while using kernel ASCE. +> > ¢ 1851.721085! AS:0000000000d5c007 R3:00000000ffff0007 S:00000000ffffa800 P:000000000000003d +> > ¢ 1851.721128! Oops: 0004 ilc:3 ¢#1! PREEMPT SMP DEBUG_PAGEALLOC +> > ¢ 1851.721135! Modules linked in: bridge stp llc btrfs mlx4_ib mlx4_en ib_sa ib_mad vxlan xor ip6_udp_tunnel ib_core udp_tunnel ptp pps_core ib_addr ghash_s390raid6_pq prng ecb aes_s390 mlx4_core des_s390 des_generic genwqe_card sha512_s390 sha256_s390 sha1_s390 sha_common crc_itu_t dm_mod scm_block vhost_net tun vhost eadm_sch macvtap macvlan kvm autofs4 +> > ¢ 1851.721183! CPU: 7 PID: 256422 Comm: bash Not tainted 4.5.0-rc3-00058-g07923d7-dirty #178 +> > ¢ 1851.721186! task: 000000007fbfd290 ti: 000000008c604000 task.ti: 000000008c604000 +> > ¢ 1851.721189! Krnl PSW : 0704d00180000000 000000000045d3b8 (__rb_erase_color+0x280/0x308) +> > ¢ 1851.721200! R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 EA:3 +> > Krnl GPRS: 0000000000000001 0000000000000020 0000000000000000 00000000bd07eff1 +> > ¢ 1851.721205! 000000000027ca10 0000000000000000 0000000083e45898 0000000077b61198 +> > ¢ 1851.721207! 000000007ce1a490 00000000bd07eff0 000000007ce1a548 000000000027ca10 +> > ¢ 1851.721210! 00000000bd07c350 00000000bd07eff0 000000008c607aa8 000000008c607a68 +> > ¢ 1851.721221! 
Krnl Code: 000000000045d3aa: e3c0d0080024 stg %%r12,8(%%r13) +> > 000000000045d3b0: b9040039 lgr %%r3,%%r9 +> > #000000000045d3b4: a53b0001 oill %%r3,1 +> > >000000000045d3b8: e33010000024 stg %%r3,0(%%r1) +> > 000000000045d3be: ec28000e007c cgij %%r2,0,8,45d3da +> > 000000000045d3c4: e34020000004 lg %%r4,0(%%r2) +> > 000000000045d3ca: b904001c lgr %%r1,%%r12 +> > 000000000045d3ce: ec143f3f0056 rosbg %%r1,%%r4,63,63,0 +> > ¢ 1851.721269! Call Trace: +> > ¢ 1851.721273! (¢<0000000083e45898>! 0x83e45898) +> > ¢ 1851.721279! ¢<000000000029342a>! unlink_anon_vmas+0x9a/0x1d8 +> > ¢ 1851.721282! ¢<0000000000283f34>! free_pgtables+0xcc/0x148 +> > ¢ 1851.721285! ¢<000000000028c376>! exit_mmap+0xd6/0x300 +> > ¢ 1851.721289! ¢<0000000000134db8>! mmput+0x90/0x118 +> > ¢ 1851.721294! ¢<00000000002d76bc>! flush_old_exec+0x5d4/0x700 +> > ¢ 1851.721298! ¢<00000000003369f4>! load_elf_binary+0x2f4/0x13e8 +> > ¢ 1851.721301! ¢<00000000002d6e4a>! search_binary_handler+0x9a/0x1f8 +> > ¢ 1851.721304! ¢<00000000002d8970>! do_execveat_common.isra.32+0x668/0x9a0 +> > ¢ 1851.721307! ¢<00000000002d8cec>! do_execve+0x44/0x58 +> > ¢ 1851.721310! ¢<00000000002d8f92>! SyS_execve+0x3a/0x48 +> > ¢ 1851.721315! ¢<00000000006fb096>! system_call+0xd6/0x258 +> > ¢ 1851.721317! ¢<000003ff997436d6>! 0x3ff997436d6 +> > ¢ 1851.721319! INFO: lockdep is turned off. +> > ¢ 1851.721321! Last Breaking-Event-Address: +> > ¢ 1851.721323! ¢<000000000045d31a>! __rb_erase_color+0x1e2/0x308 +> > ¢ 1851.721327! +> > ¢ 1851.721329! ---¢ end trace 0d80041ac00cfae2 !--- +> > +> > +> > > +> > > And could you share how crashes looks like? I haven't seen backtraces yet. +> > > +> > > > Sure. I didn't because they really looked random to me. Most of the time -> > in rcu or list debugging but I thought these have just been the messeng= -er -> > observing a corruption first. 
Anyhow, here is an older one that might l= -ook +> > in rcu or list debugging but I thought these have just been the messenger +> > observing a corruption first. Anyhow, here is an older one that might look > > interesting: -> >=20 -> > [ 59.851421] list_del corruption. next->prev should be 000000006e1eb0= -00, but was 0000000000000400 ->=20 +> > +> > [ 59.851421] list_del corruption. next->prev should be 000000006e1eb000, but was 0000000000000400 +> > This kinda interesting: 0x400 is TAIL_MAPPING.. Hm.. ->=20 +> > Could you check if you see the problem on commit 1c290f642101 and its > immediate parent? ->=20 +> How should the page->mapping poison end up as next->prev in the list of pre-allocated THP splitting page tables? Also, commit 1c290f642101 is before the THP rework, at least the non-bisectable part, so we should expect not to see the problem there. -0x400 is also the value of an empty pte on s390, and the thp_deposit/withdr= -aw -listheads are placed inside the pre-allocated pagetables instead of page->l= -ru, -because we have 2K pagetables on s390 and cannot use struct page =3D=3D pgt= -able_t. +0x400 is also the value of an empty pte on s390, and the thp_deposit/withdraw +listheads are placed inside the pre-allocated pagetables instead of page->lru, +because we have 2K pagetables on s390 and cannot use struct page == pgtable_t. So, for example, two concurrent withdraws could produce such a list corruption, because the first withdraw will overwrite the listhead at the diff --git a/a/content_digest b/N3/content_digest index 553d349..e8d51ab 100644 --- a/a/content_digest +++ b/N3/content_digest @@ -36,117 +36,80 @@ "\"Kirill A. Shutemov\" <kirill@shutemov.name> wrote:\n" "\n" "> On Sat, Feb 13, 2016 at 12:58:31PM +0100, Sebastian Ott wrote:\n" - "> >=20\n" + "> > \n" "> > On Sat, 13 Feb 2016, Kirill A. 
Shutemov wrote:\n" "> > > Could you check if revert of fecffad25458 helps?\n" - "> >=20\n" + "> > \n" "> > I reverted fecffad25458 on top of 721675fcf277cf - it oopsed with:\n" - "> >=20\n" - "> > =C2=A2 1851.721062! Unable to handle kernel pointer dereference in virt=\n" - "ual kernel address space\n" - "> > =C2=A2 1851.721075! failing address: 0000000000000000 TEID: 00000000000=\n" - "00483\n" - "> > =C2=A2 1851.721078! Fault in home space mode while using kernel ASCE.\n" - "> > =C2=A2 1851.721085! AS:0000000000d5c007 R3:00000000ffff0007 S:00000000f=\n" - "fffa800 P:000000000000003d\n" - "> > =C2=A2 1851.721128! Oops: 0004 ilc:3 =C2=A2#1! PREEMPT SMP DEBUG_PAGEAL=\n" - "LOC\n" - "> > =C2=A2 1851.721135! Modules linked in: bridge stp llc btrfs mlx4_ib mlx=\n" - "4_en ib_sa ib_mad vxlan xor ip6_udp_tunnel ib_core udp_tunnel ptp pps_core =\n" - "ib_addr ghash_s390raid6_pq prng ecb aes_s390 mlx4_core des_s390 des_generic=\n" - " genwqe_card sha512_s390 sha256_s390 sha1_s390 sha_common crc_itu_t dm_mod =\n" - "scm_block vhost_net tun vhost eadm_sch macvtap macvlan kvm autofs4\n" - "> > =C2=A2 1851.721183! CPU: 7 PID: 256422 Comm: bash Not tainted 4.5.0-rc3=\n" - "-00058-g07923d7-dirty #178\n" - "> > =C2=A2 1851.721186! task: 000000007fbfd290 ti: 000000008c604000 task.ti=\n" - ": 000000008c604000\n" - "> > =C2=A2 1851.721189! Krnl PSW : 0704d00180000000 000000000045d3b8 (__rb_=\n" - "erase_color+0x280/0x308)\n" - "> > =C2=A2 1851.721200! R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3=\n" - " CC:1 PM:0 EA:3\n" - "> > Krnl GPRS: 0000000000000001 0000000000000020 00000000000=\n" - "00000 00000000bd07eff1\n" - "> > =C2=A2 1851.721205! 000000000027ca10 0000000000000000 000000=\n" - "0083e45898 0000000077b61198\n" - "> > =C2=A2 1851.721207! 000000007ce1a490 00000000bd07eff0 000000=\n" - "007ce1a548 000000000027ca10\n" - "> > =C2=A2 1851.721210! 00000000bd07c350 00000000bd07eff0 000000=\n" - "008c607aa8 000000008c607a68\n" - "> > =C2=A2 1851.721221! 
Krnl Code: 000000000045d3aa: e3c0d0080024 stg=\n" - " %%r12,8(%%r13)\n" - "> > 000000000045d3b0: b9040039 lgr =\n" - "%%r3,%%r9\n" - "> > #000000000045d3b4: a53b0001 oill =\n" - "%%r3,1\n" - "> > >000000000045d3b8: e33010000024 stg =\n" - "%%r3,0(%%r1)\n" - "> > 000000000045d3be: ec28000e007c cgij =\n" - "%%r2,0,8,45d3da\n" - "> > 000000000045d3c4: e34020000004 lg =\n" - "%%r4,0(%%r2)\n" - "> > 000000000045d3ca: b904001c lgr =\n" - "%%r1,%%r12\n" - "> > 000000000045d3ce: ec143f3f0056 rosbg =\n" - "%%r1,%%r4,63,63,0\n" - "> > =C2=A2 1851.721269! Call Trace:\n" - "> > =C2=A2 1851.721273! (=C2=A2<0000000083e45898>! 0x83e45898)\n" - "> > =C2=A2 1851.721279! =C2=A2<000000000029342a>! unlink_anon_vmas+0x9a/0x=\n" - "1d8\n" - "> > =C2=A2 1851.721282! =C2=A2<0000000000283f34>! free_pgtables+0xcc/0x148\n" - "> > =C2=A2 1851.721285! =C2=A2<000000000028c376>! exit_mmap+0xd6/0x300\n" - "> > =C2=A2 1851.721289! =C2=A2<0000000000134db8>! mmput+0x90/0x118\n" - "> > =C2=A2 1851.721294! =C2=A2<00000000002d76bc>! flush_old_exec+0x5d4/0x7=\n" - "00\n" - "> > =C2=A2 1851.721298! =C2=A2<00000000003369f4>! load_elf_binary+0x2f4/0x=\n" - "13e8\n" - "> > =C2=A2 1851.721301! =C2=A2<00000000002d6e4a>! search_binary_handler+0x=\n" - "9a/0x1f8\n" - "> > =C2=A2 1851.721304! =C2=A2<00000000002d8970>! do_execveat_common.isra.=\n" - "32+0x668/0x9a0\n" - "> > =C2=A2 1851.721307! =C2=A2<00000000002d8cec>! do_execve+0x44/0x58\n" - "> > =C2=A2 1851.721310! =C2=A2<00000000002d8f92>! SyS_execve+0x3a/0x48\n" - "> > =C2=A2 1851.721315! =C2=A2<00000000006fb096>! system_call+0xd6/0x258\n" - "> > =C2=A2 1851.721317! =C2=A2<000003ff997436d6>! 0x3ff997436d6\n" - "> > =C2=A2 1851.721319! INFO: lockdep is turned off.\n" - "> > =C2=A2 1851.721321! Last Breaking-Event-Address:\n" - "> > =C2=A2 1851.721323! =C2=A2<000000000045d31a>! __rb_erase_color+0x1e2/0=\n" - "x308\n" - "> > =C2=A2 1851.721327!\n" - "> > =C2=A2 1851.721329! 
---=C2=A2 end trace 0d80041ac00cfae2 !---\n" - "> >=20\n" - "> >=20\n" - "> > >=20\n" - "> > > And could you share how crashes looks like? I haven't seen backtraces=\n" - " yet.\n" - "> > >=20\n" - "> >=20\n" + "> > \n" + "> > \302\242 1851.721062! Unable to handle kernel pointer dereference in virtual kernel address space\n" + "> > \302\242 1851.721075! failing address: 0000000000000000 TEID: 0000000000000483\n" + "> > \302\242 1851.721078! Fault in home space mode while using kernel ASCE.\n" + "> > \302\242 1851.721085! AS:0000000000d5c007 R3:00000000ffff0007 S:00000000ffffa800 P:000000000000003d\n" + "> > \302\242 1851.721128! Oops: 0004 ilc:3 \302\242#1! PREEMPT SMP DEBUG_PAGEALLOC\n" + "> > \302\242 1851.721135! Modules linked in: bridge stp llc btrfs mlx4_ib mlx4_en ib_sa ib_mad vxlan xor ip6_udp_tunnel ib_core udp_tunnel ptp pps_core ib_addr ghash_s390raid6_pq prng ecb aes_s390 mlx4_core des_s390 des_generic genwqe_card sha512_s390 sha256_s390 sha1_s390 sha_common crc_itu_t dm_mod scm_block vhost_net tun vhost eadm_sch macvtap macvlan kvm autofs4\n" + "> > \302\242 1851.721183! CPU: 7 PID: 256422 Comm: bash Not tainted 4.5.0-rc3-00058-g07923d7-dirty #178\n" + "> > \302\242 1851.721186! task: 000000007fbfd290 ti: 000000008c604000 task.ti: 000000008c604000\n" + "> > \302\242 1851.721189! Krnl PSW : 0704d00180000000 000000000045d3b8 (__rb_erase_color+0x280/0x308)\n" + "> > \302\242 1851.721200! R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 EA:3\n" + "> > Krnl GPRS: 0000000000000001 0000000000000020 0000000000000000 00000000bd07eff1\n" + "> > \302\242 1851.721205! 000000000027ca10 0000000000000000 0000000083e45898 0000000077b61198\n" + "> > \302\242 1851.721207! 000000007ce1a490 00000000bd07eff0 000000007ce1a548 000000000027ca10\n" + "> > \302\242 1851.721210! 00000000bd07c350 00000000bd07eff0 000000008c607aa8 000000008c607a68\n" + "> > \302\242 1851.721221! 
Krnl Code: 000000000045d3aa: e3c0d0080024 stg %%r12,8(%%r13)\n" + "> > 000000000045d3b0: b9040039 lgr %%r3,%%r9\n" + "> > #000000000045d3b4: a53b0001 oill %%r3,1\n" + "> > >000000000045d3b8: e33010000024 stg %%r3,0(%%r1)\n" + "> > 000000000045d3be: ec28000e007c cgij %%r2,0,8,45d3da\n" + "> > 000000000045d3c4: e34020000004 lg %%r4,0(%%r2)\n" + "> > 000000000045d3ca: b904001c lgr %%r1,%%r12\n" + "> > 000000000045d3ce: ec143f3f0056 rosbg %%r1,%%r4,63,63,0\n" + "> > \302\242 1851.721269! Call Trace:\n" + "> > \302\242 1851.721273! (\302\242<0000000083e45898>! 0x83e45898)\n" + "> > \302\242 1851.721279! \302\242<000000000029342a>! unlink_anon_vmas+0x9a/0x1d8\n" + "> > \302\242 1851.721282! \302\242<0000000000283f34>! free_pgtables+0xcc/0x148\n" + "> > \302\242 1851.721285! \302\242<000000000028c376>! exit_mmap+0xd6/0x300\n" + "> > \302\242 1851.721289! \302\242<0000000000134db8>! mmput+0x90/0x118\n" + "> > \302\242 1851.721294! \302\242<00000000002d76bc>! flush_old_exec+0x5d4/0x700\n" + "> > \302\242 1851.721298! \302\242<00000000003369f4>! load_elf_binary+0x2f4/0x13e8\n" + "> > \302\242 1851.721301! \302\242<00000000002d6e4a>! search_binary_handler+0x9a/0x1f8\n" + "> > \302\242 1851.721304! \302\242<00000000002d8970>! do_execveat_common.isra.32+0x668/0x9a0\n" + "> > \302\242 1851.721307! \302\242<00000000002d8cec>! do_execve+0x44/0x58\n" + "> > \302\242 1851.721310! \302\242<00000000002d8f92>! SyS_execve+0x3a/0x48\n" + "> > \302\242 1851.721315! \302\242<00000000006fb096>! system_call+0xd6/0x258\n" + "> > \302\242 1851.721317! \302\242<000003ff997436d6>! 0x3ff997436d6\n" + "> > \302\242 1851.721319! INFO: lockdep is turned off.\n" + "> > \302\242 1851.721321! Last Breaking-Event-Address:\n" + "> > \302\242 1851.721323! \302\242<000000000045d31a>! __rb_erase_color+0x1e2/0x308\n" + "> > \302\242 1851.721327!\n" + "> > \302\242 1851.721329! 
---\302\242 end trace 0d80041ac00cfae2 !---\n" + "> > \n" + "> > \n" + "> > > \n" + "> > > And could you share how crashes looks like? I haven't seen backtraces yet.\n" + "> > > \n" + "> > \n" "> > Sure. I didn't because they really looked random to me. Most of the time\n" - "> > in rcu or list debugging but I thought these have just been the messeng=\n" - "er\n" - "> > observing a corruption first. Anyhow, here is an older one that might l=\n" - "ook\n" + "> > in rcu or list debugging but I thought these have just been the messenger\n" + "> > observing a corruption first. Anyhow, here is an older one that might look\n" "> > interesting:\n" - "> >=20\n" - "> > [ 59.851421] list_del corruption. next->prev should be 000000006e1eb0=\n" - "00, but was 0000000000000400\n" - ">=20\n" + "> > \n" + "> > [ 59.851421] list_del corruption. next->prev should be 000000006e1eb000, but was 0000000000000400\n" + "> \n" "> This kinda interesting: 0x400 is TAIL_MAPPING.. Hm..\n" - ">=20\n" + "> \n" "> Could you check if you see the problem on commit 1c290f642101 and its\n" "> immediate parent?\n" - ">=20\n" + "> \n" "\n" "How should the page->mapping poison end up as next->prev in the list of\n" "pre-allocated THP splitting page tables? 
Also, commit 1c290f642101\n" "is before the THP rework, at least the non-bisectable part, so we should\n" "expect not to see the problem there.\n" "\n" - "0x400 is also the value of an empty pte on s390, and the thp_deposit/withdr=\n" - "aw\n" - "listheads are placed inside the pre-allocated pagetables instead of page->l=\n" - "ru,\n" - "because we have 2K pagetables on s390 and cannot use struct page =3D=3D pgt=\n" - "able_t.\n" + "0x400 is also the value of an empty pte on s390, and the thp_deposit/withdraw\n" + "listheads are placed inside the pre-allocated pagetables instead of page->lru,\n" + "because we have 2K pagetables on s390 and cannot use struct page == pgtable_t.\n" "\n" "So, for example, two concurrent withdraws could produce such a list\n" "corruption, because the first withdraw will overwrite the listhead at the\n" @@ -154,4 +117,4 @@ "\n" Has anything changed regarding the general THP deposit/withdraw logic? -44905ef53279bc01c4cb5058bfaf95ae92f69db2d957d357777618fc75cbd096 +f454158279aa913eae87256a7764d3177694ab25a22a1146098b368c98cb2ad3