* ARM64: kernel panics in DABT in sys_msync path
@ 2017-09-24 21:36 Yury Norov
2017-09-25 10:53 ` Will Deacon
0 siblings, 1 reply; 12+ messages in thread
From: Yury Norov @ 2017-09-24 21:36 UTC
To: linux-arm-kernel
Hi all,
I found that under qemu 2.10 with the '-smp 4' option, kernels v4.13 and
v4.14-rc1 panic when running the LTP test rwtest03:
rwtest -N rwtest03 -c -q -i 60s -n 2 -f buffered -s mmread,mmwrite -m random -Dv 10%25000:mm-buff-$$
[ 2068.307587] Unable to handle kernel paging request at virtual address ffffffffc0000d68
[ 2068.308195] swapper pgtable: 4k pages, 48-bit VAs, pgd = ffff00000901f000
[ 2068.308387] [ffffffffc0000d68] *pgd=0000000000000000
[ 2068.308643] Internal error: Oops: 96000004 [#1] PREEMPT SMP
[ 2068.308865] Modules linked in:
[ 2068.309013] CPU: 0 PID: 9861 Comm: doio Not tainted 4.13.0-00027-g2fdc18baa2ae #196
[ 2068.309205] Hardware name: linux,dummy-virt (DT)
[ 2068.309331] task: ffff80000300d400 task.stack: ffff80003d28c000
[ 2068.309728] PC is at check_pte+0x8/0x130
[ 2068.309848] LR is at page_vma_mapped_walk+0x240/0x498
[ 2068.309995] pc : [<ffff0000081c5268>] lr : [<ffff0000081c55d0>] pstate: 00000145
[...]
[ 2068.338791] [<ffff0000081c5268>] check_pte+0x8/0x130
[ 2068.339070] [<ffff0000081c66c0>] page_mkclean_one+0xa0/0x258
[ 2068.339209] [<ffff0000081c6a70>] rmap_walk_file+0xe8/0x238
[ 2068.339331] [<ffff0000081c88c8>] rmap_walk+0x48/0x70
[ 2068.339436] [<ffff0000081c8ae8>] page_mkclean+0x80/0x98
[ 2068.339592] [<ffff00000819178c>] clear_page_dirty_for_io+0xac/0x298
[ 2068.339770] [<ffff0000082a36cc>] mpage_submit_page+0x2c/0x90
[ 2068.340004] [<ffff0000082a3864>] mpage_process_page_bufs+0x134/0x140
[ 2068.340261] [<ffff0000082a398c>] mpage_prepare_extent_to_map+0x11c/0x270
[ 2068.340438] [<ffff0000082a9058>] ext4_writepages+0x2f0/0xb30
[ 2068.340600] [<ffff000008193b78>] do_writepages+0x60/0x90
[ 2068.340742] [<ffff000008185a44>] __filemap_fdatawrite_range+0xa4/0xf0
[ 2068.340908] [<ffff0000081861c8>] file_write_and_wait_range+0x50/0xb8
[ 2068.341071] [<ffff000008299b40>] ext4_sync_file+0x80/0x340
[ 2068.341222] [<ffff00000823f668>] vfs_fsync_range+0x48/0xc8
[ 2068.341425] [<ffff0000081c51f4>] SyS_msync+0x1bc/0x228
[ 2068.341572] [<ffff00000808375c>] el0_svc_naked+0x20/0x24
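For readers decoding the trace: the "Internal error: Oops: 96000004" line carries the raw exception syndrome (ESR) value. The following is a minimal sketch of how that number splits into the fields that newer kernels print under "Mem abort info". The field layout follows the ARMv8 architecture; the function name and the lookup tables are made up for this illustration and cover only the cases relevant here.

```python
# Sketch only: decode the ESR value printed in the oops line.
# EC = bits [31:26], IL = bit 25, ISS = bits [24:0].
# For data aborts, ISS bit 6 is WnR and ISS bits [5:0] are the fault status (DFSC).

EXCEPTION_CLASSES = {  # partial table, assumption: only data aborts matter here
    0x24: "DABT (lower EL)",
    0x25: "DABT (current EL)",
}

FAULT_KINDS = {  # DFSC bits [5:2] for the four "simple" fault groups
    0: "address size fault",
    1: "translation fault",
    2: "access flag fault",
    3: "permission fault",
}


def decode_esr(esr):
    """Return a dict describing a data-abort ESR value."""
    ec = (esr >> 26) & 0x3F
    il = (esr >> 25) & 0x1
    iss = esr & 0x1FFFFFF
    dfsc = iss & 0x3F
    if dfsc < 0x10:
        fault = f"{FAULT_KINDS[dfsc >> 2]}, level {dfsc & 0x3}"
    else:
        fault = f"other (DFSC=0x{dfsc:02x})"
    return {
        "ec": ec,
        "class": EXCEPTION_CLASSES.get(ec, f"unknown (0x{ec:02x})"),
        "il_32bit": bool(il),
        "iss": iss,
        "write": bool((iss >> 6) & 0x1),
        "fault": fault,
    }


if __name__ == "__main__":
    for key, value in decode_esr(0x96000004).items():
        print(f"{key}: {value}")
```

Under these assumptions, 0x96000004 decodes as a read data abort taken from the current EL with a level 0 translation fault, which is consistent with the "*pgd=0000000000000000" line in the oops.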
The bug is reproducible with both ilp32 and lp64 binaries. With kernel
4.12, or with any kernel when '-smp 1' is passed to qemu, everything
works fine. If nobody has ideas, I think I can bisect it.
Some logs are attached.
Yury
-------------- next part --------------
[ 2037.551071] LTP: starting rwtest03 (export LTPROOT; rwtest -N rwtest03 -c -q -i 60s -n 2 -f buffered -s mmread,mmwrite -m random -Dv 10%25000:mm-buff-$$)
[ 2068.307587] Unable to handle kernel paging request at virtual address ffffffffc0000d68
[ 2068.308195] swapper pgtable: 4k pages, 48-bit VAs, pgd = ffff00000901f000
[ 2068.308387] [ffffffffc0000d68] *pgd=0000000000000000
[ 2068.308643] Internal error: Oops: 96000004 [#1] PREEMPT SMP
[ 2068.308865] Modules linked in:
[ 2068.309013] CPU: 0 PID: 9861 Comm: doio Not tainted 4.13.0-00027-g2fdc18baa2ae #196
[ 2068.309205] Hardware name: linux,dummy-virt (DT)
[ 2068.309331] task: ffff80000300d400 task.stack: ffff80003d28c000
[ 2068.309728] PC is at check_pte+0x8/0x130
[ 2068.309848] LR is at page_vma_mapped_walk+0x240/0x498
[ 2068.309995] pc : [<ffff0000081c5268>] lr : [<ffff0000081c55d0>] pstate: 00000145
[ 2068.310184] sp : ffff80003d28f880
[ 2068.310280] x29: ffff80003d28f880 x28: 0000000000000000
[ 2068.310437] x27: 00000000f73ad000 x26: 0008000000000080
[ 2068.310589] x25: 00000000000001ad x24: 0040000000000041
[ 2068.310742] x23: ffff000008d2e000 x22: ffff80003c5a3100
[ 2068.310922] x21: ffff7e0000951ac0 x20: 0400000000000001
[ 2068.311095] x19: ffff80003d28f948 x18: 0000000000000000
[ 2068.311251] x17: 00000000f7f90d70 x16: ffff0000081c5038
[ 2068.311410] x15: 000000000000001c x14: 000001e190b946b0
[ 2068.311570] x13: 0000000000000000 x12: ffff8000093bc600
[ 2068.311728] x11: 0000000000000001 x10: 0000000000000a00
[ 2068.311923] x9 : ffff80003d28f860 x8 : ffff80000300de60
[ 2068.312169] x7 : fffffffffff981d7 x6 : 0000001e1970e2d6
[ 2068.312325] x5 : ffffffffffffffff x4 : ffff0000081c6620
[ 2068.312478] x3 : 0000000000000000 x2 : 0000000000000001
[ 2068.312630] x1 : ffffffffc0000d68 x0 : ffff80003d28f948
[ 2068.312798] Process doio (pid: 9861, stack limit = 0xffff80003d28c000)
[ 2068.313012] Stack: (0xffff80003d28f880 to 0xffff80003d290000)
[ 2068.313241] f880: ffff80003d28f8d0 ffff0000081c66c0 00000000f73ad000 ffff80003d9e52e0
[ 2068.313448] f8a0: ffff80003d28fa14 00000000f73ae000 0040000000000001 ffff80001c4a9750
[ 2068.313650] f8c0: 0400000000000001 ffff80001c4a9750 ffff80003d28f980 ffff0000081c6a70
[ 2068.313851] f8e0: 00000000f73ad000 ffff7e0000951ac0 ffff80003d28fa18 00000000000001b0
[ 2068.314048] f900: 00000000000001b0 ffff80001c4a9750 0000000000000000 ffff80003d9e52e0
[ 2068.314248] f920: ffff80001c4a9778 0000000000000000 0000000000000001 00000000f73ae000
[ 2068.314449] f940: dead000000000100 ffff7e0000951ac0 ffff80003d9e52e0 00000000f73ad000
[ 2068.314648] f960: ffff80000939fdc8 ffffffffc0000d68 ffff7e000024f370 0000000000000001
[ 2068.314864] f980: ffff80003d28f9e0 ffff0000081c88c8 ffff7e0000951ac0 ffff80001c4a9750
[ 2068.315065] f9a0: 0000000000000001 ffff80001c4a95d8 ffff80003d28fbc0 0000000000000c34
[ 2068.315264] f9c0: 0000000000000000 ffff80001c4a9750 0000000000000002 ffff0000081c8ab8
[ 2068.315465] f9e0: ffff80003d28f9f0 ffff0000081c8ae8 ffff80003d28fa40 ffff00000819178c
[ 2068.315664] fa00: ffff7e0000951ac0 ffff7e0000c231c0 000000013d28fa80 ffff80003d28fa14
[ 2068.315904] fa20: ffff0000081c6620 0000000000000000 0000000000000000 ffff0000081c6350
[ 2068.316236] fa40: ffff80003d28fa80 ffff0000082a36cc ffff80003d28fc98 ffff7e0000951ac0
[ 2068.316562] fa60: ffff80003d28fba0 7ffffffffffffff6 ffff80003d28fbc0 0000000000000c34
[ 2068.316822] fa80: ffff80003d28faa0 ffff0000082a3864 0000000000000c35 ffff80003d28fc98
[ 2068.317033] faa0: ffff80003d28fad0 ffff0000082a398c ffff7e0000951ac0 ffff0000082a3910
[ 2068.318221] fac0: ffff80001c4a9750 000001b1081911e0 ffff80003d28fbc0 ffff0000082a9058
[ 2068.318419] fae0: ffff80001c4a9750 ffff80001c4a95d8 ffff800009166000 ffff80003d28fd40
[ 2068.318611] fb00: ffff800009166580 ffff80003c5a0ae8 00000000014000c0 0000000000000000
[ 2068.318803] fb20: ffff0000089ba1b8 ffff000008242548 ffff80003d28fb50 00000000000001b5
[ 2068.319005] fb40: 000000000000000e 0000000000000000 ffff7e00002831c0 ffff7e0000219a80
[ 2068.319198] fb60: ffff7e0000219ac0 ffff7e0000207980 ffff7e00002079c0 ffff7e000054fd80
[ 2068.319391] fb80: ffff7e000054fdc0 ffff7e0000c23180 ffff7e0000c231c0 ffff7e0000951ac0
[ 2068.319584] fba0: ffff7e0000a6c880 ffff7e0000a6c8c0 ffff7e0000040e00 ffff7e0000040e40
[ 2068.319775] fbc0: ffff80003d28fd00 ffff000008193b78 ffff80003d28fd40 ffff80001c4a9750
[ 2068.320070] fbe0: ffff80003daad000 0000000000000000 0000000000c34fff ffff80003c5a0ae8
[ 2068.320328] fc00: ffff80003c5a0a80 0000000000000000 0000000000000000 0000000000000000
[ 2068.320479] fc20: ffff0000089ba548 00000000082a3580 ffff80003d28fc50 ffff000008192f20
[ 2068.320633] fc40: ffff80003d0d7c00 0000000000000000 ffff800000000000 0000000000000001
[ 2068.320804] fc60: ffff7e0000e57bc0 ffff80003d28fc68 ffff80003d28fc68 ffff80003d28fc78
[ 2068.320974] fc80: ffff80003d28fc78 ffff80003d28fc88 ffff80003d28fc88 ffff80001c4a95d8
[ 2068.321148] fca0: ffff80003d28fd40 00000000000001b0 00000000000001b1 0000000000000c34
[ 2068.321317] fcc0: 0000000000000055 00000000f73c3000 ffff800008c0c7e8 ffff80003d28fd40
[ 2068.321486] fce0: ffff80003b660f00 ffff80003d473f40 000000000005a9b0 ffff00000823acc8
[ 2068.321737] fd00: ffff80003d28fd20 ffff000008185a44 ffff80001c4a9750 ffff80001c4a95d8
[ 2068.322020] fd20: ffff80003d28fda0 ffff0000081861c8 0000000000000000 ffff80001c4a9750
[ 2068.322270] fd40: 7ffffffffffffff6 0000000000000000 0000000000000000 0000000000c34fff
[ 2068.322496] fd60: 0000000000000001 ffff80003d0d7c58 ffff80001c4a95d8 0000000100000001
[ 2068.322747] fd80: 0000000000000000 0000000000009000 0000000000000000 0000000000000000
[ 2068.323038] fda0: ffff80003d28fde0 ffff000008299b40 0000000000000000 ffff80001c4a95d8
[ 2068.323269] fdc0: 0000000000000001 ffff800009166800 ffff80003daad000 0000000000000024
[ 2068.323508] fde0: ffff80003d28fe10 ffff00000823f668 ffff80003daad000 0000000000000000
[ 2068.323757] fe00: 00000000f7e32000 0000000000000004 ffff80003d28fe50 ffff0000081c51f4
[ 2068.324018] fe20: 00000000f71fd000 00000000f7e32000 00000000f71fd000 00000000f7e32000
[ 2068.324296] fe40: ffff80003d28fe50 ffff0000081c51d0 0000000000000000 ffff00000808375c
[ 2068.324566] fe60: 0000000000800000 00008000360fa000 ffffffffffffffff 00000000f7f90d98
[ 2068.324811] fe80: 0000000080000000 0000000000000015 0000000000000124 00000000000000e3
[ 2068.325056] fea0: ffff000008eb7000 ffff80000300d400 000000000dab929f ffffffff08e03000
[ 2068.325307] fec0: 00000000f71fd000 0000000000c35000 0000000000000004 00000000f7fa9000
[ 2068.325545] fee0: 00000000004614b7 00000000f73c3df3 61636f6c3a313638 6f643a74736f686c
[ 2068.325800] ff00: 00000000000000e3 686c61636f6c3a31 6f696f643a74736f 3a313638393a582a
[ 2068.326085] ff20: 6c3a313638393a58 74736f686c61636f 000000000000000c 000000000000001c
[ 2068.326334] ff40: 000000000041c10c 00000000f7f90d70 0000000000000000 00000000f71fd000
[ 2068.326569] ff60: 0000000000442180 00000000fffef368 0000000000442180 000000000043e208
[ 2068.326808] ff80: 0000000000000001 00000000f71fd000 0000000000442190 0000000000000005
[ 2068.334287] ffa0: 0000000000000002 00000000fffeeda0 0000000000403ba0 00000000fffeeda0
[ 2068.334607] ffc0: 00000000f7f90d98 0000000080000000 00000000f71fd000 00000000000000e3
[ 2068.335056] ffe0: 0000000000000000 0000000000000000 ffff80003d28fff0 ffff80003d28fff0
[ 2068.335565] Call trace:
[ 2068.335840] Exception stack(0xffff80003d28f6b0 to 0xffff80003d28f7e0)
[ 2068.336110] f6a0: ffff80003d28f948 0001000000000000
[ 2068.336420] f6c0: ffff80003d28f880 ffff0000081c5268 ffff80003d28f6f0 ffff00000810a3d4
[ 2068.336723] f6e0: 000000000006ef10 ffff00000900b000 ffff80003d28f720 ffff0000080f5a30
[ 2068.337036] f700: ffff80000300d400 000000000006ef10 ffff80003d28f760 ffff000008187b48
[ 2068.337380] f720: ffff80003d0ea500 0000000001091220 0000000001091220 0000000000000000
[ 2068.337675] f740: ffff000008105e38 0000000000000000 ffff80003d28f948 ffffffffc0000d68
[ 2068.337875] f760: 0000000000000001 0000000000000000 ffff0000081c6620 ffffffffffffffff
[ 2068.338068] f780: 0000001e1970e2d6 fffffffffff981d7 ffff80000300de60 ffff80003d28f860
[ 2068.338259] f7a0: 0000000000000a00 0000000000000001 ffff8000093bc600 0000000000000000
[ 2068.338456] f7c0: 000001e190b946b0 000000000000001c ffff0000081c5038 00000000f7f90d70
[ 2068.338791] [<ffff0000081c5268>] check_pte+0x8/0x130
[ 2068.339070] [<ffff0000081c66c0>] page_mkclean_one+0xa0/0x258
[ 2068.339209] [<ffff0000081c6a70>] rmap_walk_file+0xe8/0x238
[ 2068.339331] [<ffff0000081c88c8>] rmap_walk+0x48/0x70
[ 2068.339436] [<ffff0000081c8ae8>] page_mkclean+0x80/0x98
[ 2068.339592] [<ffff00000819178c>] clear_page_dirty_for_io+0xac/0x298
[ 2068.339770] [<ffff0000082a36cc>] mpage_submit_page+0x2c/0x90
[ 2068.340004] [<ffff0000082a3864>] mpage_process_page_bufs+0x134/0x140
[ 2068.340261] [<ffff0000082a398c>] mpage_prepare_extent_to_map+0x11c/0x270
[ 2068.340438] [<ffff0000082a9058>] ext4_writepages+0x2f0/0xb30
[ 2068.340600] [<ffff000008193b78>] do_writepages+0x60/0x90
[ 2068.340742] [<ffff000008185a44>] __filemap_fdatawrite_range+0xa4/0xf0
[ 2068.340908] [<ffff0000081861c8>] file_write_and_wait_range+0x50/0xb8
[ 2068.341071] [<ffff000008299b40>] ext4_sync_file+0x80/0x340
[ 2068.341222] [<ffff00000823f668>] vfs_fsync_range+0x48/0xc8
[ 2068.341425] [<ffff0000081c51f4>] SyS_msync+0x1bc/0x228
[ 2068.341572] [<ffff00000808375c>] el0_svc_naked+0x20/0x24
[ 2068.341820] Code: a943e7b8 17ffffc2 f9401001 b9403002 (f9400021)
[ 2068.342295] ---[ end trace 3f4dd4eaf4bfa846 ]---
[ 2068.342911] note: doio[9861] exited with preempt_count 1
[ 2523.805469] LTP: starting rwtest03 (export LTPROOT; rwtest -N rwtest03 -c -q -i 60s -n 2 -f buffered -s mmread,mmwrite -m random -Dv 10%25000:mm-buff-$$)
[ 2565.145947] Unable to handle kernel paging request at virtual address ffffffffc0000408
[ 2565.146363] Mem abort info:
[ 2565.146477] Exception class = DABT (current EL), IL = 32 bits
[ 2565.146639] SET = 0, FnV = 0
[ 2565.146735] EA = 0, S1PTW = 0
[ 2565.146851] Data abort info:
[ 2565.146947] ISV = 0, ISS = 0x00000004
[ 2565.147058] CM = 0, WnR = 0
[ 2565.147201] swapper pgtable: 4k pages, 48-bit VAs, pgd = ffff000009068000
[ 2565.147380] [ffffffffc0000408] *pgd=0000000000000000
[ 2565.147628] Internal error: Oops: 96000004 [#1] PREEMPT SMP
[ 2565.147864] Modules linked in:
[ 2565.148015] CPU: 2 PID: 9879 Comm: doio Not tainted 4.14.0-rc1-00355-ga00c252d5d98 #199
[ 2565.148305] Hardware name: linux,dummy-virt (DT)
[ 2565.148490] task: ffff80003d3eba00 task.stack: ffff0000117c8000
[ 2565.149000] PC is at check_pte+0x8/0x150
[ 2565.149132] LR is at page_vma_mapped_walk+0x294/0x4c8
[ 2565.149285] pc : [<ffff0000081cb4c0>] lr : [<ffff0000081cb89c>] pstate: 00000145
[ 2565.149497] sp : ffff0000117cb880
[ 2565.149599] x29: ffff0000117cb880 x28: 0000000000000000
[ 2565.149770] x27: 00000000f7c81000 x26: 0008000000000080
[ 2565.149957] x25: 0000000000000081 x24: 0040000000000041
[ 2565.150117] x23: ffff000008d64000 x22: ffff80003a2b0e00
[ 2565.150265] x21: ffff7e0000043040 x20: 0400000000000001
[ 2565.150416] x19: ffff0000117cb948 x18: 0000000000000000
[ 2565.150568] x17: 00000000f7f90d70 x16: ffff0000081cb290
[ 2565.150680] x15: 000000000000001c x14: ffff80003b9858f8
[ 2565.150822] x13: 0000000000000000 x12: 000002553e979a20
[ 2565.150966] x11: ffff7e0000043000 x10: 0000000000000001
[ 2565.151111] x9 : 0000000000000001 x8 : 0000000000001200
[ 2565.151215] x7 : 0000000000001200 x6 : 0000000000000000
[ 2565.151319] x5 : ffffffffffffffff x4 : ffff0000081cc900
[ 2565.151421] x3 : 0000000000000000 x2 : 0000000000000001
[ 2565.151523] x1 : ffffffffc0000408 x0 : ffff0000117cb948
[ 2565.151639] Process doio (pid: 9879, stack limit = 0xffff0000117c8000)
[ 2565.151823] Call trace:
[ 2565.151939] Exception stack(0xffff0000117cb740 to 0xffff0000117cb880)
[ 2565.152166] b740: ffff0000117cb948 ffffffffc0000408 0000000000000001 0000000000000000
[ 2565.152318] b760: ffff0000081cc900 ffffffffffffffff 0000000000000000 0000000000001200
[ 2565.152473] b780: 0000000000001200 0000000000000001 0000000000000001 ffff7e0000043000
[ 2565.152643] b7a0: 000002553e979a20 0000000000000000 ffff80003b9858f8 000000000000001c
[ 2565.152778] b7c0: ffff0000081cb290 00000000f7f90d70 0000000000000000 ffff0000117cb948
[ 2565.152950] b7e0: 0400000000000001 ffff7e0000043040 ffff80003a2b0e00 ffff000008d64000
[ 2565.153222] b800: 0040000000000041 0000000000000081 0008000000000080 00000000f7c81000
[ 2565.153454] b820: 0000000000000000 ffff0000117cb880 ffff0000081cb89c ffff0000117cb880
[ 2565.153638] b840: ffff0000081cb4c0 0000000000000145 ffff80003efc5500 ffff80003da13a00
[ 2565.153838] b860: 0001000000000000 ffff80003d3eba00 ffff0000117cb880 ffff0000081cb4c0
[ 2565.154136] [<ffff0000081cb4c0>] check_pte+0x8/0x150
[ 2565.154282] [<ffff0000081cc9a0>] page_mkclean_one+0xa0/0x270
[ 2565.154433] [<ffff0000081ccd68>] rmap_walk_file+0xe8/0x238
[ 2565.154575] [<ffff0000081ceb70>] rmap_walk+0x48/0x70
[ 2565.154702] [<ffff0000081ced90>] page_mkclean+0x80/0x98
[ 2565.154838] [<ffff000008196c14>] clear_page_dirty_for_io+0xac/0x298
[ 2565.154997] [<ffff0000082aabc4>] mpage_submit_page+0x2c/0x90
[ 2565.155143] [<ffff0000082aad5c>] mpage_process_page_bufs+0x134/0x140
[ 2565.155304] [<ffff0000082aae94>] mpage_prepare_extent_to_map+0x12c/0x2b0
[ 2565.155453] [<ffff0000082b0560>] ext4_writepages+0x2f0/0xb30
[ 2565.155600] [<ffff000008199078>] do_writepages+0x60/0x90
[ 2565.155741] [<ffff00000818b014>] __filemap_fdatawrite_range+0xa4/0xf0
[ 2565.155904] [<ffff00000818b228>] file_write_and_wait_range+0x38/0xb8
[ 2565.156134] [<ffff0000082a0ec8>] ext4_sync_file+0x80/0x340
[ 2565.156350] [<ffff000008245eb8>] vfs_fsync_range+0x48/0xc8
[ 2565.156531] [<ffff0000081cb44c>] SyS_msync+0x1bc/0x228
[ 2565.156666] Exception stack(0xffff0000117cbec0 to 0xffff0000117cc000)
[ 2565.156842] bec0: 00000000f71fd000 0000000000c35000 0000000000000004 00000000f7fa9000
[ 2565.157038] bee0: 000000000045e48f 00000000f7c84ef0 6f696f643a74736f 3a393738393a452a
[ 2565.157232] bf00: 00000000000000e3 452a6f696f643a74 6f6c3a393738393a 3a74736f686c6163
[ 2565.157428] bf20: 6f696f643a74736f 3a393738393a452a 0000000000000003 000000000000001c
[ 2565.157623] bf40: 000000000041c10c 00000000f7f90d70 0000000000000000 00000000f71fd000
[ 2565.157817] bf60: 0000000000454990 00000000fffef368 0000000000454990 000000000043e208
[ 2565.158039] bf80: 0000000000000001 00000000f71fd000 00000000004549a2 0000000000000005
[ 2565.158234] bfa0: 0000000000000002 00000000fffeeda0 0000000000403ba0 00000000fffeeda0
[ 2565.162032] bfc0: 00000000f7f90d98 0000000080000000 00000000f71fd000 00000000000000e3
[ 2565.162460] bfe0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 2565.162887] [<ffff0000080837dc>] el0_svc_naked+0x20/0x24
[ 2565.163331] Code: a943e7b8 17ffffc2 f9401001 b9403002 (f9400021)
[ 2565.163991] ---[ end trace 7fdb561fda1604d0 ]---
[ 2565.164578] note: doio[9879] exited with preempt_count 1
export LTPROOT=/home/yury/ilp32; ./rwtest -N rwtest03 -c -q -i 60s -n 2 -f buffered -s mmread,mmwrite -m random -Dv 10%25000:mm-buff-$$
[ 664.916027] Unable to handle kernel paging request at virtual address ffffffffc0000d48
[ 664.916337] Mem abort info:
[ 664.916483] Exception class = DABT (current EL), IL = 32 bits
[ 664.916959] SET = 0, FnV = 0
[ 664.917152] EA = 0, S1PTW = 0
[ 664.917355] Data abort info:
[ 664.917519] ISV = 0, ISS = 0x00000004
[ 664.917750] CM = 0, WnR = 0
[ 664.918064] swapper pgtable: 4k pages, 48-bit VAs, pgd = ffff000009068000
[ 664.918598] [ffffffffc0000d48] *pgd=0000000000000000
[ 664.919835] Internal error: Oops: 96000004 [#1] PREEMPT SMP
[ 664.920194] Modules linked in:
[ 664.920621] CPU: 3 PID: 2332 Comm: doio Not tainted 4.14.0-rc1-00355-ga00c252d5d98 #199
[ 664.920858] Hardware name: linux,dummy-virt (DT)
[ 664.921091] task: ffff80003dbcba00 task.stack: ffff00000e3f0000
[ 664.921643] PC is at check_pte+0x8/0x150
[ 664.921778] LR is at page_vma_mapped_walk+0x294/0x4c8
[ 664.921920] pc : [<ffff0000081cb4c0>] lr : [<ffff0000081cb89c>] pstate: 00000145
[ 664.922104] sp : ffff00000e3f3880
[ 664.922211] x29: ffff00000e3f3880 x28: 0000000000000000
[ 664.922389] x27: 00000000f7ba9000 x26: 0008000000000080
[ 664.922593] x25: 00000000000001a9 x24: 0040000000000041
[ 664.922768] x23: ffff000008d64000 x22: ffff8000090f5c00
[ 664.922936] x21: ffff7e0000396300 x20: 0400000000000001
[ 664.923102] x19: ffff00000e3f3948 x18: 0000000000000000
[ 664.923284] x17: 00000000f7f90d70 x16: ffff0000081cb290
[ 664.923453] x15: 000000000000001b x14: ffff8000111f8900
[ 664.923621] x13: 0000000000000040 x12: 0000000000000238
[ 664.923789] x11: ffff7e00003962c0 x10: 0000000000000001
[ 664.923956] x9 : 0000000000000001 x8 : 0000000000001200
[ 664.924134] x7 : 0000000000001200 x6 : 0000000000000000
[ 664.924297] x5 : ffffffffffffffff x4 : ffff0000081cc900
[ 664.924458] x3 : 0000000000000000 x2 : 0000000000000001
[ 664.924687] x1 : ffffffffc0000d48 x0 : ffff00000e3f3948
[ 664.924950] Process doio (pid: 2332, stack limit = 0xffff00000e3f0000)
[ 664.925426] Call trace:
[ 664.925603] Exception stack(0xffff00000e3f3740 to 0xffff00000e3f3880)
[ 664.925876] 3740: ffff00000e3f3948 ffffffffc0000d48 0000000000000001 0000000000000000
[ 664.926112] 3760: ffff0000081cc900 ffffffffffffffff 0000000000000000 0000000000001200
[ 664.926318] 3780: 0000000000001200 0000000000000001 0000000000000001 ffff7e00003962c0
[ 664.926585] 37a0: 0000000000000238 0000000000000040 ffff8000111f8900 000000000000001b
[ 664.926810] 37c0: ffff0000081cb290 00000000f7f90d70 0000000000000000 ffff00000e3f3948
[ 664.927005] 37e0: 0400000000000001 ffff7e0000396300 ffff8000090f5c00 ffff000008d64000
[ 664.927205] 3800: 0040000000000041 00000000000001a9 0008000000000080 00000000f7ba9000
[ 664.927405] 3820: 0000000000000000 ffff00000e3f3880 ffff0000081cb89c ffff00000e3f3880
[ 664.927608] 3840: ffff0000081cb4c0 0000000000000145 ffff00000e3f38e0 ffff0000085dece8
[ 664.927808] 3860: 0001000000000000 ffff0000080f3b0c ffff00000e3f3880 ffff0000081cb4c0
[ 664.928161] [<ffff0000081cb4c0>] check_pte+0x8/0x150
[ 664.928316] [<ffff0000081cc9a0>] page_mkclean_one+0xa0/0x270
[ 664.928442] [<ffff0000081ccd68>] rmap_walk_file+0xe8/0x238
[ 664.928667] [<ffff0000081ceb70>] rmap_walk+0x48/0x70
[ 664.928852] [<ffff0000081ced90>] page_mkclean+0x80/0x98
[ 664.929034] [<ffff000008196c14>] clear_page_dirty_for_io+0xac/0x298
[ 664.929370] [<ffff0000082aabc4>] mpage_submit_page+0x2c/0x90
[ 664.929588] [<ffff0000082aad5c>] mpage_process_page_bufs+0x134/0x140
[ 664.929876] [<ffff0000082aae94>] mpage_prepare_extent_to_map+0x12c/0x2b0
[ 664.930086] [<ffff0000082b0560>] ext4_writepages+0x2f0/0xb30
[ 664.930273] [<ffff000008199078>] do_writepages+0x60/0x90
[ 664.930486] [<ffff00000818b014>] __filemap_fdatawrite_range+0xa4/0xf0
[ 664.930676] [<ffff00000818b228>] file_write_and_wait_range+0x38/0xb8
[ 664.930877] [<ffff0000082a0ec8>] ext4_sync_file+0x80/0x340
[ 664.931077] [<ffff000008245eb8>] vfs_fsync_range+0x48/0xc8
[ 664.931319] [<ffff0000081cb44c>] SyS_msync+0x1bc/0x228
[ 664.931485] Exception stack(0xffff00000e3f3ec0 to 0xffff00000e3f4000)
[ 664.931681] 3ec0: 00000000f71fd000 0000000000c35000 0000000000000004 00000000f7fa9000
[ 664.931894] 3ee0: 00000000004652f9 00000000f7babc55 3a74736f686c6163 323a572a6f696f64
[ 664.932113] 3f00: 00000000000000e3 6f643a74736f686c 3333323a572a6f69 686c61636f6c3a32
[ 664.932289] 3f20: 6c61636f6c3a3233 696f643a74736f68 000000000000000e 000000000000001b
[ 664.932446] 3f40: 000000000041c10c 00000000f7f90d70 0000000000000000 00000000f71fd000
[ 664.932678] 3f60: 000000000044df00 00000000fffef278 000000000044df00 000000000043e170
[ 664.932847] 3f80: 0000000000000001 00000000f71fd000 000000000044df12 0000000000000003
[ 664.933012] 3fa0: 0000000000000002 00000000fffeecb0 0000000000403ba0 00000000fffeecb0
[ 664.933173] 3fc0: 00000000f7f90d98 0000000080000000 00000000f71fd000 00000000000000e3
[ 664.933454] 3fe0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 664.933774] [<ffff0000080837dc>] el0_svc_naked+0x20/0x24
[ 664.934265] Code: a943e7b8 17ffffc2 f9401001 b9403002 (f9400021)
[ 664.934854] ---[ end trace 02dc23c2662a0380 ]---
[ 664.935477] note: doio[2332] exited with preempt_count 1
[ 109.569993] Unable to handle kernel paging request at virtual address ffffffffc00001a0
[ 109.570748] Mem abort info:
[ 109.571051] Exception class = DABT (current EL), IL = 32 bits
[ 109.571284] SET = 0, FnV = 0
[ 109.571417] EA = 0, S1PTW = 0
[ 109.571527] Data abort info:
[ 109.571647] ISV = 0, ISS = 0x00000004
[ 109.572703] CM = 0, WnR = 0
[ 109.572888] swapper pgtable: 4k pages, 48-bit VAs, pgd = ffff000009068000
[ 109.573060] [ffffffffc00001a0] *pgd=0000000000000000
[ 109.573400] Internal error: Oops: 96000004 [#1] PREEMPT SMP
[ 109.573550] Modules linked in:
[ 109.573789] CPU: 1 PID: 1894 Comm: doio Not tainted 4.14.0-rc1-00355-ga00c252d5d98 #199
[ 109.573928] Hardware name: linux,dummy-virt (DT)
[ 109.574050] task: ffff800009142b80 task.stack: ffff00000e020000
[ 109.574416] PC is at check_pte+0x8/0x150
[ 109.574541] LR is at page_vma_mapped_walk+0x294/0x4c8
[ 109.574640] pc : [<ffff0000081cb4c0>] lr : [<ffff0000081cb89c>] pstate: 00000145
[ 109.574767] sp : ffff00000e023880
[ 109.574880] x29: ffff00000e023880 x28: 0000000000000000
[ 109.575049] x27: 00000000f7234000 x26: 0008000000000080
[ 109.575150] x25: 0000000000000034 x24: 0040000000000041
[ 109.575267] x23: ffff000008d64000 x22: ffff800008c8df80
[ 109.575369] x21: ffff7e0000ddefc0 x20: 0400000000000001
[ 109.575473] x19: ffff00000e023948 x18: 0000000000000000
[ 109.575581] x17: 00000000f7f90d70 x16: ffff0000081cb290
[ 109.575684] x15: 000000000000001c x14: 000000000000000c
[ 109.575786] x13: 0000000000000000 x12: 0000001982daa520
[ 109.575887] x11: ffff7e0000ddef80 x10: 0000000000000001
[ 109.575989] x9 : 0000000000000001 x8 : 0000000000001200
[ 109.576089] x7 : 0000000000001200 x6 : 0000000000000000
[ 109.576196] x5 : ffffffffffffffff x4 : ffff0000081cc900
[ 109.576304] x3 : 0000000000000000 x2 : 0000000000000001
[ 109.576404] x1 : ffffffffc00001a0 x0 : ffff00000e023948
[ 109.576528] Process doio (pid: 1894, stack limit = 0xffff00000e020000)
[ 109.576708] Call trace:
[ 109.576857] Exception stack(0xffff00000e023740 to 0xffff00000e023880)
[ 109.577032] 3740: ffff00000e023948 ffffffffc00001a0 0000000000000001 0000000000000000
[ 109.577169] 3760: ffff0000081cc900 ffffffffffffffff 0000000000000000 0000000000001200
[ 109.577302] 3780: 0000000000001200 0000000000000001 0000000000000001 ffff7e0000ddef80
[ 109.577431] 37a0: 0000001982daa520 0000000000000000 000000000000000c 000000000000001c
[ 109.577563] 37c0: ffff0000081cb290 00000000f7f90d70 0000000000000000 ffff00000e023948
[ 109.577693] 37e0: 0400000000000001 ffff7e0000ddefc0 ffff800008c8df80 ffff000008d64000
[ 109.577822] 3800: 0040000000000041 0000000000000034 0008000000000080 00000000f7234000
[ 109.577950] 3820: 0000000000000000 ffff00000e023880 ffff0000081cb89c ffff00000e023880
[ 109.578079] 3840: ffff0000081cb4c0 0000000000000145 ffff80003efb0500 ffff80003da12b80
[ 109.578217] 3860: 0001000000000000 ffff800009142b80 ffff00000e023880 ffff0000081cb4c0
[ 109.578437] [<ffff0000081cb4c0>] check_pte+0x8/0x150
[ 109.578558] [<ffff0000081cc9a0>] page_mkclean_one+0xa0/0x270
[ 109.578660] [<ffff0000081ccd68>] rmap_walk_file+0xe8/0x238
[ 109.578806] [<ffff0000081ceb70>] rmap_walk+0x48/0x70
[ 109.578972] [<ffff0000081ced90>] page_mkclean+0x80/0x98
[ 109.579148] [<ffff000008196c14>] clear_page_dirty_for_io+0xac/0x298
[ 109.579273] [<ffff0000082aabc4>] mpage_submit_page+0x2c/0x90
[ 109.579441] [<ffff0000082aad5c>] mpage_process_page_bufs+0x134/0x140
[ 109.579582] [<ffff0000082aae94>] mpage_prepare_extent_to_map+0x12c/0x2b0
[ 109.579797] [<ffff0000082b0560>] ext4_writepages+0x2f0/0xb30
[ 109.579959] [<ffff000008199078>] do_writepages+0x60/0x90
[ 109.580110] [<ffff00000818b014>] __filemap_fdatawrite_range+0xa4/0xf0
[ 109.580288] [<ffff00000818b228>] file_write_and_wait_range+0x38/0xb8
[ 109.580462] [<ffff0000082a0ec8>] ext4_sync_file+0x80/0x340
[ 109.580631] [<ffff000008245eb8>] vfs_fsync_range+0x48/0xc8
[ 109.580837] [<ffff0000081cb44c>] SyS_msync+0x1bc/0x228
[ 109.580977] Exception stack(0xffff00000e023ec0 to 0xffff00000e024000)
[ 109.581160] 3ec0: 00000000f71fd000 0000000000c35000 0000000000000004 00000000f7fa9000
[ 109.581367] 3ee0: 0000000000463d9b 00000000f723f497 3a343938313a4d2a 736f686c61636f6c
[ 109.581550] 3f00: 00000000000000e3 6f6c3a343938313a 3a74736f686c6163 313a4d2a6f696f64
[ 109.581750] 3f20: 74736f686c61636f 3a4d2a6f696f643a 000000000000000c 000000000000001c
[ 109.581964] 3f40: 000000000041c10c 00000000f7f90d70 0000000000000000 00000000f71fd000
[ 109.582178] 3f60: 000000000044a8f0 00000000fffef278 000000000044a8f0 000000000043e208
[ 109.582394] 3f80: 0000000000000001 00000000f71fd000 000000000044a900 0000000000000004
[ 109.582649] 3fa0: 0000000000000002 00000000fffeecb0 0000000000403ba0 00000000fffeecb0
[ 109.582867] 3fc0: 00000000f7f90d98 0000000080000000 00000000f71fd000 00000000000000e3
[ 109.583074] 3fe0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 109.583242] [<ffff0000080837dc>] el0_svc_naked+0x20/0x24
[ 109.583553] Code: a943e7b8 17ffffc2 f9401001 b9403002 (f9400021)
[ 109.583981] ---[ end trace 0fb20762cb6e4457 ]---
[ 109.584508] note: doio[1894] exited with preempt_count 1
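The four oopses above fault on different addresses, but the addresses look closely related. A small sketch, assuming the "4k pages, 48-bit VAs" layout reported in the logs (four translation levels of 9 bits each), splits each address into its page-table indices. The helper name is made up for this illustration.

```python
# Sketch only: split a virtual address into page-table indices for
# 4 KiB pages and 48-bit VAs (pgd/pud/pmd/pte, 9 bits per level).

LEVEL_SHIFTS = (("pgd", 39), ("pud", 30), ("pmd", 21), ("pte", 12))


def table_indices(va):
    """Return the per-level table index and the in-page offset for va."""
    parts = {name: (va >> shift) & 0x1FF for name, shift in LEVEL_SHIFTS}
    parts["offset"] = va & 0xFFF
    return parts


if __name__ == "__main__":
    # Faulting addresses copied from the four oopses in the logs.
    for va in (0xFFFFFFFFC0000D68, 0xFFFFFFFFC0000408,
               0xFFFFFFFFC0000D48, 0xFFFFFFFFC00001A0):
        print(f"{va:016x}: {table_indices(va)}")
```

Under that assumption all four addresses share the same indices (pgd 511, pud 511, pmd 0, pte 0) and differ only in the offset, so they fall in the same 4 KiB page at ffffffffc0000000. The empty pgd entry reported as "*pgd=0000000000000000" corresponds to index 511.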
* ARM64: kernel panics in DABT in sys_msync path
2017-09-24 21:36 ARM64: kernel panics in DABT in sys_msync path Yury Norov
@ 2017-09-25 10:53 ` Will Deacon
2017-09-25 14:02 ` Yury Norov
0 siblings, 1 reply; 12+ messages in thread
From: Will Deacon @ 2017-09-25 10:53 UTC
To: linux-arm-kernel
Hi Yury,
Thanks for the report.
On Mon, Sep 25, 2017 at 12:36:22AM +0300, Yury Norov wrote:
> Hi all,
>
> I found that under qemu 2.10 with the '-smp 4' option, kernels v4.13 and
> v4.14-rc1 panic when running the LTP test rwtest03:
> rwtest -N rwtest03 -c -q -i 60s -n 2 -f buffered -s mmread,mmwrite -m random -Dv 10%25000:mm-buff-$$
> [ 2068.307587] Unable to handle kernel paging request at virtual address ffffffffc0000d68
> [ 2068.308195] swapper pgtable: 4k pages, 48-bit VAs, pgd = ffff00000901f000
> [ 2068.308387] [ffffffffc0000d68] *pgd=0000000000000000
> [ 2068.308643] Internal error: Oops: 96000004 [#1] PREEMPT SMP
> [ 2068.308865] Modules linked in:
> [ 2068.309013] CPU: 0 PID: 9861 Comm: doio Not tainted 4.13.0-00027-g2fdc18baa2ae #196
> [ 2068.309205] Hardware name: linux,dummy-virt (DT)
> [ 2068.309331] task: ffff80000300d400 task.stack: ffff80003d28c000
> [ 2068.309728] PC is at check_pte+0x8/0x130
> [ 2068.309848] LR is at page_vma_mapped_walk+0x240/0x498
> [ 2068.309995] pc : [<ffff0000081c5268>] lr : [<ffff0000081c55d0>] pstate: 00000145
>
> [...]
>
> [ 2068.338791] [<ffff0000081c5268>] check_pte+0x8/0x130
> [ 2068.339070] [<ffff0000081c66c0>] page_mkclean_one+0xa0/0x258
> [ 2068.339209] [<ffff0000081c6a70>] rmap_walk_file+0xe8/0x238
> [ 2068.339331] [<ffff0000081c88c8>] rmap_walk+0x48/0x70
> [ 2068.339436] [<ffff0000081c8ae8>] page_mkclean+0x80/0x98
> [ 2068.339592] [<ffff00000819178c>] clear_page_dirty_for_io+0xac/0x298
> [ 2068.339770] [<ffff0000082a36cc>] mpage_submit_page+0x2c/0x90
> [ 2068.340004] [<ffff0000082a3864>] mpage_process_page_bufs+0x134/0x140
> [ 2068.340261] [<ffff0000082a398c>] mpage_prepare_extent_to_map+0x11c/0x270
> [ 2068.340438] [<ffff0000082a9058>] ext4_writepages+0x2f0/0xb30
> [ 2068.340600] [<ffff000008193b78>] do_writepages+0x60/0x90
> [ 2068.340742] [<ffff000008185a44>] __filemap_fdatawrite_range+0xa4/0xf0
> [ 2068.340908] [<ffff0000081861c8>] file_write_and_wait_range+0x50/0xb8
> [ 2068.341071] [<ffff000008299b40>] ext4_sync_file+0x80/0x340
> [ 2068.341222] [<ffff00000823f668>] vfs_fsync_range+0x48/0xc8
> [ 2068.341425] [<ffff0000081c51f4>] SyS_msync+0x1bc/0x228
> [ 2068.341572] [<ffff00000808375c>] el0_svc_naked+0x20/0x24
>
> The bug is reproducible with both ilp32 and lp64 binaries. With kernel
> 4.12, or with any kernel when '-smp 1' is passed to qemu, everything
> works fine. If nobody has ideas, I think I can bisect it.
I tried to reproduce this on hardware, but failed to do so. Our nightly
tests are also coming back fine for rwtest03. I just built Qemu v2.10.0
and that also passes the test with -smp 4 for me, so I'm a bit stuck.
Could you share:
* Your kernel .config
* Your QEMU command line
* Details of your userspace
please?
Thanks,
Will
* ARM64: kernel panics in DABT in sys_msync path
2017-09-25 10:53 ` Will Deacon
@ 2017-09-25 14:02 ` Yury Norov
2017-09-25 19:04 ` Yury Norov
0 siblings, 1 reply; 12+ messages in thread
From: Yury Norov @ 2017-09-25 14:02 UTC
To: linux-arm-kernel
Hi Will,
> > The bug is reproducible with both ilp32 and lp64 binaries. With kernel
> > 4.12, or with any kernel when '-smp 1' is passed to qemu, everything
> > works fine. If nobody has ideas, I think I can bisect it.
>
> I tried to reproduce this on hardware, but failed to do so. Our nightly
> tests are also coming back fine for rwtest03. I just built Qemu v2.10.0
> and that also passes the test with -smp 4 for me, so I'm a bit stuck.
I also see the test pass sometimes. I run it in an endless loop and
leave it for a while; 5-10 iterations are usually enough to hit the panic.
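A loop of this kind can be sketched as follows. This is only an illustration, not the script actually used: it assumes the test command exits with a non-zero status once the oops kills doio, and the function name and the iteration cap are hypothetical.

```python
# Sketch only: rerun a test command until it fails or a cap is reached.
import subprocess
import sys


def run_until_failure(cmd, max_iterations=50):
    """Run cmd repeatedly; return the 1-based iteration that failed,
    or None if every iteration exited with status 0."""
    for iteration in range(1, max_iterations + 1):
        result = subprocess.run(cmd)
        if result.returncode != 0:
            return iteration
    return None


if __name__ == "__main__" and len(sys.argv) > 1:
    # The command to repeat is taken from the command line.
    failed_at = run_until_failure(sys.argv[1:])
    if failed_at is None:
        print("no failure seen")
    else:
        print(f"failed on iteration {failed_at}")
```

Inside the guest, the rwtest command line from the report would be passed as the arguments. Since that command line contains `$$`, it would need to go through a shell (for example `sh -c '...'`) for the PID to be expanded.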
> Could you share:
>
> * Your kernel .config
> * Your QEMU command line
> * Details of your userspace
Qemu configure command:
./configure --target-list=aarch64-softmmu --enable-fdt --enable-vhost-net --enable-kvm
And run command:
/home/yury/work/qemu-2.10.0/aarch64-softmmu/qemu-system-aarch64 \
-machine virtualization=true -machine gic-version=3 \
-machine virt -cpu cortex-a57 -nographic -smp 4 -m 1024 \
-global virtio-blk-device.scsi=off -device virtio-scsi-device,id=scsi \
-drive file=img/ubuntu-core-14.04.1-core-arm64.img,id=coreimg,cache=unsafe,if=none -device scsi-hd,drive=coreimg \
-kernel /home/yury/work/linux/arch/arm64/boot/Image \
--append "console=ttyAMA0 root=/dev/sda" \
-initrd initrd.img-3.13.0-62-generic \
$NETWORK \
-redir tcp:2222::22 \
-s \
$@
My userspace is Ubuntu 14. I build lp64 tests with default Ubuntu
toolchain, and ilp32 tests with Linaro cross-toolchain.
The config is attached, and the branch is vanilla 4.13 kernel, or this
one:
https://github.com/norov/linux/tree/ilp32-4.13
Later today I will share the whole qemu environment I use.
Yury
-------------- next part --------------
A non-text attachment was scrubbed...
Name: config.gz
Type: application/gzip
Size: 34964 bytes
Desc: not available
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20170925/4c50da58/attachment-0001.gz>
* ARM64: kernel panics in DABT in sys_msync path
2017-09-25 14:02 ` Yury Norov
@ 2017-09-25 19:04 ` Yury Norov
2017-09-25 20:52 ` Ruigrok, Richard
[not found] ` <a3b714ae-97a5-8458-9bf7-140eeeebc4b9@codeaurora.org>
0 siblings, 2 replies; 12+ messages in thread
From: Yury Norov @ 2017-09-25 19:04 UTC (permalink / raw)
To: linux-arm-kernel
On Mon, Sep 25, 2017 at 05:02:40PM +0300, Yury Norov wrote:
> Hi Will,
>
> > > The bug is reproducible for ilp32 and lp64 binaries. For kernel 4.12
> > > and for all kernels if '-smp 1' is passed to qemu, everything works
> > > fine. If no ideas, I think I'm able bisect it.
> >
> > I tried to reproduce this on hardware, but failed to do so. Our nightly
> > tests are also coming back fine for rwtest03. I just built Qemu v2.10.0
> > and that also passes the test with -smp 4 for me, so I'm a bit stuck.
>
> I also see the test passed sometimes. I run it in endless cycle and
> leave for a while. 5-10 iterations are usually enough.
>
> > Could you share:
> >
> > * Your kernel .config
> > * Your QEMU command line
> > * Details of your userspace
>
> Qemu configure command:
> ./configure --target-list=aarch64-softmmu --enable-fdt --enable-vhost-net --enable-kvm
>
> And run command:
> /home/yury/work/qemu-2.10.0/aarch64-softmmu/qemu-system-aarch64 \
> -machine virtualization=true -machine gic-version=3 \
> -machine virt -cpu cortex-a57 -nographic -smp 4 -m 1024 \
> -global virtio-blk-device.scsi=off -device virtio-scsi-device,id=scsi \
> -drive file=img/ubuntu-core-14.04.1-core-arm64.img,id=coreimg,cache=unsafe,if=none -device scsi-hd,drive=coreimg \
> -kernel /home/yury/work/linux/arch/arm64/boot/Image \
> --append "console=ttyAMA0 root=/dev/sda" \
> -initrd initrd.img-3.13.0-62-generic \
> $NETWORK \
> -redir tcp:2222::22 \
> -s \
> $@
>
> My userspace is Ubuntu 14. I build lp64 tests with default Ubuntu
> toolchain, and ilp32 tests with Linaro cross-toolchain.
>
> The config is attached, and the branch is vanilla 4.13 kernel, or this
> one:
> https://github.com/norov/linux/tree/ilp32-4.13
>
> Later today I will share the whole qemu environment I use.
https://drive.google.com/file/d/0B07VUB3kjLD8Mm5XN21qTTBfbnc/view
> Yury
* ARM64: kernel panics in DABT in sys_msync path
2017-09-25 19:04 ` Yury Norov
@ 2017-09-25 20:52 ` Ruigrok, Richard
[not found] ` <a3b714ae-97a5-8458-9bf7-140eeeebc4b9@codeaurora.org>
1 sibling, 0 replies; 12+ messages in thread
From: Ruigrok, Richard @ 2017-09-25 20:52 UTC (permalink / raw)
To: linux-arm-kernel
On 9/25/2017 1:04 PM, Yury Norov wrote:
> On Mon, Sep 25, 2017 at 05:02:40PM +0300, Yury Norov wrote:
>> Hi Will,
>>
>>>> The bug is reproducible for ilp32 and lp64 binaries. For kernel 4.12
>>>> and for all kernels if '-smp 1' is passed to qemu, everything works
>>>> fine. If no ideas, I think I'm able bisect it.
>>> I tried to reproduce this on hardware, but failed to do so. Our nightly
>>> tests are also coming back fine for rwtest03. I just built Qemu v2.10.0
>>> and that also passes the test with -smp 4 for me, so I'm a bit stuck.
Hi Will,
I also found this issue with kernels from 4.11 through 4.13, reproduced on hardware. In my tests, it reproduces only with 4K pages and Transparent Huge Pages; with 64K pages I was not able to reproduce it. RH also reported it here: https://bugzilla.redhat.com/show_bug.cgi?id=1491504 and Linaro reported it on the 4.12-based RPK kernel:
https://bugs.linaro.org/show_bug.cgi?id=3191
https://bugs.linaro.org/show_bug.cgi?id=3068
I was able to bisect down to a specific commit: f27176cfc363 mm: convert page_mkclean_one() to use page_vma_mapped_walk()
Ran with LTP 20170516 release, rwtest: ./runltp -p -f fs -s rwtest
To validate bisecting (good points), I ran 30 iterations. Usually it reproduces in 5-10 iterations.
LMK if you have any suggestions for instrumentation; when running with ftrace I could not repro. I can run tests on 4.13 or on 4.11 at the above bisect point.
I have not tried any 4.14-rc yet.
>> I also see the test passed sometimes. I run it in endless cycle and
>> leave for a while. 5-10 iterations are usually enough.
>>
>>> Could you share:
>>>
>>> * Your kernel .config
>>> * Your QEMU command line
>>> * Details of your userspace
>> Qemu configure command:
>> ./configure --target-list=aarch64-softmmu --enable-fdt --enable-vhost-net --enable-kvm
>>
>> And run command:
>> /home/yury/work/qemu-2.10.0/aarch64-softmmu/qemu-system-aarch64 \
>> -machine virtualization=true -machine gic-version=3 \
>> -machine virt -cpu cortex-a57 -nographic -smp 4 -m 1024 \
>> -global virtio-blk-device.scsi=off -device virtio-scsi-device,id=scsi \
>> -drive file=img/ubuntu-core-14.04.1-core-arm64.img,id=coreimg,cache=unsafe,if=none -device scsi-hd,drive=coreimg \
>> -kernel /home/yury/work/linux/arch/arm64/boot/Image \
>> --append "console=ttyAMA0 root=/dev/sda" \
>> -initrd initrd.img-3.13.0-62-generic \
>> $NETWORK \
>> -redir tcp:2222::22 \
>> -s \
>> $@
>>
>> My userspace is Ubuntu 14. I build lp64 tests with default Ubuntu
>> toolchain, and ilp32 tests with Linaro cross-toolchain.
>>
>> The config is attached, and the branch is vanilla 4.13 kernel, or this
>> one:
>> https://github.com/norov/linux/tree/ilp32-4.13
>>
>> Later today I will share the whole qemu environment I use.
> https://drive.google.com/file/d/0B07VUB3kjLD8Mm5XN21qTTBfbnc/view
>
>> Yury
--
Qualcomm Datacenter Technologies as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.
* ARM64: kernel panics in DABT in sys_msync path
[not found] ` <a3b714ae-97a5-8458-9bf7-140eeeebc4b9@codeaurora.org>
@ 2017-09-26 10:23 ` Will Deacon
2017-09-26 11:54 ` Yury Norov
2017-09-26 14:23 ` Ruigrok, Richard
0 siblings, 2 replies; 12+ messages in thread
From: Will Deacon @ 2017-09-26 10:23 UTC (permalink / raw)
To: linux-arm-kernel
On Mon, Sep 25, 2017 at 01:54:57PM -0600, Ruigrok, Richard wrote:
> I also found this issue with kernels from 4.11 through 4.13. In my tests, I
> found that it reproduces only with 4K page and Transparent Huge Pages. With 64K
> page I was not able to reproduce. RH also reported it here: https://
> bugzilla.redhat.com/show_bug.cgi?id=1491504 Linaro reported on the RPK kernel
> (4.12) on Centriq2400 and ThunderX
>
>
> https://bugs.linaro.org/show_bug.cgi?id=3191
>
> https://bugs.linaro.org/show_bug.cgi?id=3068.
These two aren't the same bug (that's a forward progress issue that we're
currently working on). I don't have permission to look at the redhat one,
but is it just an RCU stall or actually the Oops reported by Yury?
> I was able to bisect down to a specific commit.
I think we're chasing two different things here, so not sure I trust the
bisect!
Will
> First bad commit is:
> commit f27176cfc363d395eea8dc5c4a26e5d6d7d65eaf
> Author: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Date: Fri Feb 24 14:57:57 2017 -0800
>
> mm: convert page_mkclean_one() to use page_vma_mapped_walk()
>
> For consistency, it worth converting all page_check_address() to
> page_vma_mapped_walk(), so we could drop the former.
>
> PMD handling here is future-proofing, we don't have users yet. ext4
> with huge pages will be the first.
>
> I did not use virtualization, simply booting kernel and running the LTP
> rwtest: ./runltp -p -f fs -s rwtest
> To validate bisecting (good points), I ran 30 iterations. Usually it
> reproduces in 5-10 iterations.
>
> If you have any suggestions for instrumentation I can run tests, we can work
> with 4.13 or on 4.11 at the above bisect point.
> I have not tried the 4.14-rc's yet.
* ARM64: kernel panics in DABT in sys_msync path
2017-09-26 10:23 ` Will Deacon
@ 2017-09-26 11:54 ` Yury Norov
2017-09-26 14:23 ` Ruigrok, Richard
1 sibling, 0 replies; 12+ messages in thread
From: Yury Norov @ 2017-09-26 11:54 UTC (permalink / raw)
To: linux-arm-kernel
On Tue, Sep 26, 2017 at 11:23:24AM +0100, Will Deacon wrote:
> On Mon, Sep 25, 2017 at 01:54:57PM -0600, Ruigrok, Richard wrote:
> > I also found this issue with kernels from 4.11 through 4.13. In my tests, I
> > found that it reproduces only with 4K page and Transparent Huge Pages. With 64K
> > page I was not able to reproduce. RH also reported it here: https://
> > bugzilla.redhat.com/show_bug.cgi?id=1491504 Linaro reported on the RPK kernel
> > (4.12) on Centriq2400 and ThunderX
> >
> >
> > https://bugs.linaro.org/show_bug.cgi?id=3191
> >
> > https://bugs.linaro.org/show_bug.cgi?id=3068.
>
> These two aren't the same bug (that's a forward progress issue that we're
> currently working on). I don't have permission to look at the redhat one,
> but is it just an RCU stall or actually the Oops reported by Yury?
>
> > I was able to bisect down to a specific commit.
>
> I think we're chasing two different things here, so not sure I trust the
> bisect!
>
> Will
I ran the test 30 times on a 4.14-rc2 kernel with 64K pages, and no panics
happened. So it may be the same bug after all, or somehow related? I'll do
some bisecting and report the results here.
Yury
> > First bad commit is:
> > commit f27176cfc363d395eea8dc5c4a26e5d6d7d65eaf
> > Author: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > Date: Fri Feb 24 14:57:57 2017 -0800
> >
> > mm: convert page_mkclean_one() to use page_vma_mapped_walk()
> >
> > For consistency, it worth converting all page_check_address() to
> > page_vma_mapped_walk(), so we could drop the former.
> >
> > PMD handling here is future-proofing, we don't have users yet. ext4
> > with huge pages will be the first.
> >
> > I did not use virtualization, simply booting kernel and running the LTP
> > rwtest: ./runltp -p -f fs -s rwtest
> > To validate bisecting (good points), I ran 30 iterations. Usually it
> > reproduces in 5-10 iterations.
> >
> > If you have any suggestions for instrumentation I can run tests, we can work
> > with 4.13 or on 4.11 at the above bisect point.
> > I have not tried the 4.14-rc's yet.
* ARM64: kernel panics in DABT in sys_msync path
2017-09-26 10:23 ` Will Deacon
2017-09-26 11:54 ` Yury Norov
@ 2017-09-26 14:23 ` Ruigrok, Richard
2017-09-26 17:31 ` Will Deacon
1 sibling, 1 reply; 12+ messages in thread
From: Ruigrok, Richard @ 2017-09-26 14:23 UTC (permalink / raw)
To: linux-arm-kernel
On 9/26/2017 4:23 AM, Will Deacon wrote:
> On Mon, Sep 25, 2017 at 01:54:57PM -0600, Ruigrok, Richard wrote:
>> I also found this issue with kernels from 4.11 through 4.13. In my tests, I
>> found that it reproduces only with 4K page and Transparent Huge Pages. With 64K
>> page I was not able to reproduce. RH also reported it here: https://
>> bugzilla.redhat.com/show_bug.cgi?id=1491504 Linaro reported on the RPK kernel
>> (4.12) on Centriq2400 and ThunderX
>>
>>
>> https://bugs.linaro.org/show_bug.cgi?id=3191
>>
>> https://bugs.linaro.org/show_bug.cgi?id=3068.
> These two aren't the same bug (that's a forward progress issue that we're
> currently working on). I don't have permission to look at the redhat one,
> but is it just an RCU stall or actually the Oops reported by Yury?
>
>> I was able to bisect down to a specific commit.
> I think we're chasing two different things here, so not sure I trust the
> bisect!
>
> Will
The RCU stall is a side effect. The issue I'm seeing has the same stack trace and the same stimulus (rwtest). Details follow.
I agree the bisect needs to be verified. Yury, could you test commits before and at the bisect point I provided? I did extensive testing on our platform and the bisect converged consistently on the same commit.
Details:
When running an ARM64 kernel configured with THP enabled:
CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE=y
CONFIG_TRANSPARENT_HUGEPAGE=y
CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y
and 4K pages (CONFIG_ARM64_4K_PAGES=y),
running LTP release 20170516-182-g738dbdb rwtest: runltp -p -f fs -s rwtest
An unhandled page fault occurs in the mm code when the PC hits this line in mm/page_vma_mapped.c:
http://elixir.free-electrons.com/linux/v4.13/source/mm/page_vma_mapped.c#L163
When check_pte is called with a pvmw whose pte pointer is invalid, in addition to the unhandled page fault, the entire system is brought down, since the core on which the page fault occurs halts while holding the spinlock taken by: spin_lock(pvmw->ptl);
All other cores will show: NMI watchdog: BUG: soft lockup - CPU#<n> stuck for 22s! [doio:4152]
(gdb) list *(0xffff0000081b9210 +0x70)
0xffff0000081b9280 is in page_mkclean_one (mm/rmap.c:1028).
1023                    .address = address,
1024                    .flags = PVMW_SYNC,
1025            };
1026            int *cleaned = arg;
1027
1028            while (page_vma_mapped_walk(&pvmw)) {
1029                    int ret = 0;
1030                    address = pvmw.address;
1031                    if (pvmw.pte) {
1032                            pte_t entry;
(gdb)
Dump of assembler code for function check_pte:
   0xffff0000081b80c0 <+0>:     ldr     w1, [x0,#48]
(gdb) list *(0xffff0000081b80c0 + 0x68)
0xffff0000081b8128 is in check_pte (mm/page_vma_mapped.c:63).
58                              return false;
59      #else
60                      WARN_ON_ONCE(1);
61      #endif
62              } else {
63                      if (!pte_present(*pvmw->pte))
64                              return false;
65
66                      /* THP can be referenced by any subpage */
67                      if (pte_page(*pvmw->pte) - pvmw->page >=
[  544.799399] Unable to handle kernel paging request at virtual address ffff800000000c10
[  544.806371] pgd = ffff8007d4d7b000
[  544.809753] [ffff800000000c10] *pgd=0000000000000000
[  544.814695] Internal error: Oops: 96000006 [#1] PREEMPT SMP
[  544.820248] Modules linked in:
[  544.823287] CPU: 2 PID: 4153 Comm: doio Not tainted 4.10.0-dev-0907-t64-09623-g726c7c0 #93
[  544.831526] Hardware name: Qualcomm Qualcomm Centriq(TM) 2400 Development Platform/ABW|SYS|CVR,1DPC|V3           , BIOS XBL.DF.2.0.R1-00542 QDF2400_REL CR
[  544.845328] task: ffff8007d8428d00 task.stack: ffff8007db4ac000
[  544.851248] PC is at check_pte+0x68/0x150
[  544.855231] LR is at page_vma_mapped_walk+0x260/0x3d8
[  544.860259] pc : [<ffff0000081b8128>] lr : [<ffff0000081b8470>] pstate: 00400145
[  544.867637] sp : ffff8007db4af8a0
[  544.870942] x29: ffff8007db4af8a0 x28: 0000000000000714
[  544.876231] x27: 0088000000000000 x26: ff77ffffffffffff
[  544.881526] x25: 0400000000000001 x24: 0040000000000041
[  544.886821] x23: ffff8007d77f7000 x22: ffff8007db4afa34
[  544.892116] x21: ffff000009276000 x20: ffff7e001f292600
[  544.897411] x19: ffff8007db4af958 x18: 0000000000000a03
[  544.902706] x17: 0000ffff945fb1a0 x16: ffff0000081b7ee8
[  544.908001] x15: ffff8007bd6a6b48 x14: 0000000000000040
[  544.913297] x13: 0000000000000000 x12: 0000000000000002
[  544.918592] x11: 0000000000000230 x10: 0000000000001200
[  544.923887] x9 : ffff7e001f2925c0 x8 : 0000000000001200
[  544.929182] x7 : 0000000000000001 x6 : 0000000000000c35
[  544.934477] x5 : 0000000000000001 x4 : 0000000000000182
[  544.939772] x3 : 0400000000000001 x2 : ffff800000000c10
[  544.945067] x1 : 0000000000000000 x0 : ffff8007db4af958
[  545.425022] Call trace:
[  545.427453] Exception stack(0xffff8007db4af6d0 to 0xffff8007db4af800)
[  545.433870] f6c0:                                   ffff8007db4af958 0001000000000000
[  545.441683] f6e0: ffff8007db4af8a0 ffff0000081b8128 ffff8007db4af710 ffff0000081dc514
[  545.449495] f700: 0000000000000000 ffff0000091ef000 ffff8007db4af770 ffff0000087f0444
[  545.457308] f720: ffff8007d9f1e148 ffff0000095ad000 ffff8007d80eb000 0000000001011200
[  545.465120] f740: ffff8007db4af7a0 ffff00000817cf40 0000000000000000 ffff8007d8e7f700
[  545.472933] f760: 0000000001091220 ffff0000080fd998 ffff8007db4af958 0000000000000000
[  545.480745] f780: ffff800000000c10 0400000000000001 0000000000000182 0000000000000001
[  545.488558] f7a0: 0000000000000c35 0000000000000001 0000000000001200 ffff7e001f2925c0
[  545.496370] f7c0: 0000000000001200 0000000000000230 0000000000000002 0000000000000000
[  545.504183] f7e0: 0000000000000040 ffff8007bd6a6b48 ffff0000081b7ee8 0000ffff945fb1a0
[  545.512008] [<ffff0000081b8128>] check_pte+0x68/0x150
[  545.517043] [<ffff0000081b9280>] page_mkclean_one+0x70/0x1a0
[  545.522672] [<ffff0000081b94dc>] rmap_walk_file+0xe4/0x290
[  545.528141] [<ffff0000081bb788>] rmap_walk+0x48/0x70
[  545.533089] [<ffff0000081bb9a8>] page_mkclean+0x88/0xa0
[  545.538313] [<ffff0000081866dc>] clear_page_dirty_for_io+0x9c/0x200
[  545.544564] [<ffff000008280a20>] mpage_submit_page+0x48/0x98
[  545.550190] [<ffff000008280bb8>] mpage_process_page_bufs+0x148/0x158
[  545.556526] [<ffff000008280d0c>] mpage_prepare_extent_to_map+0x144/0x270
[  545.563217] [<ffff000008284f20>] ext4_writepages+0x3b0/0xa00
[  545.568853] [<ffff000008188ccc>] do_writepages+0x24/0x48
[  545.574161] [<ffff00000817b454>] __filemap_fdatawrite_range+0x9c/0xe8
[  545.580571] [<ffff00000817b5b4>] filemap_write_and_wait_range+0x2c/0x88
[  545.587175] [<ffff00000827c540>] ext4_sync_file+0x58/0x300
[  545.592652] [<ffff00000822b46c>] vfs_fsync_range+0x44/0xc0
[  545.598107] [<ffff0000081b806c>] SyS_msync+0x184/0x1d8
[  545.603242] [<ffff000008082f30>] el0_svc_naked+0x24/0x28
[  545.608530] Code: f9401002 d2800023 f2e08003 52800001 (f9400042)
[  545.614630] ---[ end trace 065a200dac27fe87 ]---
[  545.619213] note: doio[4153] exited with preempt_count 1
[  569.734898] NMI watchdog: BUG: soft lockup - CPU#27 stuck for 22s! [doio:4152]
[  569.741155] Modules linked in:
[  569.744193]
[  569.745671] CPU: 27 PID: 4152 Comm: doio Tainted: G      D         4.10.0-dev-0907-t64-09623-g726c7c0 #93
[  569.755218] Hardware name: Qualcomm Qualcomm Centriq(TM) 2400 Development Platform/ABW|SYS|CVR,1DPC|V3           , BIOS XBL.DF.2.0.R1-00542 QDF2400_REL CR
[  569.769020] task: ffff8007d842ce00 task.stack: ffff8007d8280000
[  569.774938] PC is at _raw_spin_lock+0x34/0x48
[  569.779279] LR is at alloc_set_pte+0x438/0x560
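As an aid to reading the oops above, here is a small sketch that splits the reported ESR value (96000006) into its fields and checks the low bits of the faulting address. The field positions follow the ARMv8-A ESR_ELx layout; the struct and function names are made up for illustration, and treating x4 (0x182) as a PTE index is an inference from the register dump, not something stated in this thread.

```c
#include <stdint.h>

/* Hypothetical container for the ESR_ELx fields of interest. */
struct esr_fields {
	unsigned int ec;   /* exception class, bits [31:26] */
	unsigned int il;   /* instruction length bit, bit 25 */
	unsigned int wnr;  /* data abort: 1 = write, 0 = read, bit 6 */
	unsigned int dfsc; /* data fault status code, bits [5:0] */
};

/* Split a 32-bit ESR value, as printed after "Oops:", into its fields. */
static struct esr_fields decode_esr(uint32_t esr)
{
	struct esr_fields f;

	f.ec = (esr >> 26) & 0x3f;
	f.il = (esr >> 25) & 0x1;
	f.wnr = (esr >> 6) & 0x1;
	f.dfsc = esr & 0x3f;
	return f;
}

/* DFSC 0x04..0x07 encode a translation fault at level 0..3; -1 otherwise. */
static int translation_fault_level(unsigned int dfsc)
{
	return (dfsc >= 0x04 && dfsc <= 0x07) ? (int)(dfsc - 0x04) : -1;
}

/* Byte offset of entry `index` in a table of 8-byte descriptors. */
static uint64_t pte_byte_offset(uint64_t index)
{
	return index * 8;
}
```

For 0x96000006 this gives EC 0x25 (data abort taken without a change in exception level), a read access, and a level 2 translation fault. 0x182 * 8 is 0xc10, which matches the low 12 bits of the faulting address ffff800000000c10; that would be consistent with the PTE table base having been read as zero, though nothing in the thread confirms it.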
Thanks,
Richard.
>> First bad commit is:
>> commit f27176cfc363d395eea8dc5c4a26e5d6d7d65eaf
>> Author: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
>> Date: Fri Feb 24 14:57:57 2017 -0800
>>
>> mm: convert page_mkclean_one() to use page_vma_mapped_walk()
>>
>> For consistency, it worth converting all page_check_address() to
>> page_vma_mapped_walk(), so we could drop the former.
>>
>> PMD handling here is future-proofing, we don't have users yet. ext4
>> with huge pages will be the first.
>>
>> I did not use virtualization, simply booting kernel and running the LTP
>> rwtest: ./runltp -p -f fs -s rwtest
>> To validate bisecting (good points), I ran 30 iterations. Usually it
>> reproduces in 5-10 iterations.
>>
>> If you have any suggestions for instrumentation I can run tests, we can work
>> with 4.13 or on 4.11 at the above bisect point.
>> I have not tried the 4.14-rc's yet.
* ARM64: kernel panics in DABT in sys_msync path
2017-09-26 14:23 ` Ruigrok, Richard
@ 2017-09-26 17:31 ` Will Deacon
2017-09-27 15:50 ` Will Deacon
0 siblings, 1 reply; 12+ messages in thread
From: Will Deacon @ 2017-09-26 17:31 UTC (permalink / raw)
To: linux-arm-kernel
Yury, Richard,
On Tue, Sep 26, 2017 at 08:23:35AM -0600, Ruigrok, Richard wrote:
> On 9/26/2017 4:23 AM, Will Deacon wrote:
> > On Mon, Sep 25, 2017 at 01:54:57PM -0600, Ruigrok, Richard wrote:
> >> I also found this issue with kernels from 4.11 through 4.13. In my tests, I
> >> found that it reproduces only with 4K page and Transparent Huge Pages. With 64K
> >> page I was not able to reproduce. RH also reported it here: https://
> >> bugzilla.redhat.com/show_bug.cgi?id=1491504 Linaro reported on the RPK kernel
> >> (4.12) on Centriq2400 and ThunderX
> >>
> >>
> >> https://bugs.linaro.org/show_bug.cgi?id=3191
> >>
> >> https://bugs.linaro.org/show_bug.cgi?id=3068.
> > These two aren't the same bug (that's a forward progress issue that we're
> > currently working on). I don't have permission to look at the redhat one,
> > but is it just an RCU stall or actually the Oops reported by Yury?
> >
> >> I was able to bisect down to a specific commit.
> > I think we're chasing two different things here, so not sure I trust the
> > bisect!
> >
> The RCU stall is side effect. The issue I'm seeing has the same stack
> trace and same stimulus (rwtest). Following are the details.
FWIW, I think I've worked out what's going on here and I should have a patch
tomorrow.
Will
* ARM64: kernel panics in DABT in sys_msync path
2017-09-26 17:31 ` Will Deacon
@ 2017-09-27 15:50 ` Will Deacon
2017-09-27 18:00 ` Richard Ruigrok
0 siblings, 1 reply; 12+ messages in thread
From: Will Deacon @ 2017-09-27 15:50 UTC (permalink / raw)
To: linux-arm-kernel
On Tue, Sep 26, 2017 at 06:31:12PM +0100, Will Deacon wrote:
> On Tue, Sep 26, 2017 at 08:23:35AM -0600, Ruigrok, Richard wrote:
> > On 9/26/2017 4:23 AM, Will Deacon wrote:
> > > On Mon, Sep 25, 2017 at 01:54:57PM -0600, Ruigrok, Richard wrote:
> > >> I also found this issue with kernels from 4.11 through 4.13. In my tests, I
> > >> found that it reproduces only with 4K page and Transparent Huge Pages. With 64K
> > >> page I was not able to reproduce. RH also reported it here: https://
> > >> bugzilla.redhat.com/show_bug.cgi?id=1491504 Linaro reported on the RPK kernel
> > >> (4.12) on Centriq2400 and ThunderX
> > >>
> > >>
> > >> https://bugs.linaro.org/show_bug.cgi?id=3191
> > >>
> > >> https://bugs.linaro.org/show_bug.cgi?id=3068.
> > > These two aren't the same bug (that's a forward progress issue that we're
> > > currently working on). I don't have permission to look at the redhat one,
> > > but is it just an RCU stall or actually the Oops reported by Yury?
> > >
> > >> I was able to bisect down to a specific commit.
> > > I think we're chasing two different things here, so not sure I trust the
> > > bisect!
> > >
> > The RCU stall is side effect. The issue I'm seeing has the same stack
> > trace and same stimulus (rwtest). Following are the details.
>
> FWIW, I think I've worked out what's going on here and I should have a patch
> tomorrow.
Diff below. I'm going to follow up with a separate thread about this,
because the proper fix is going to be invasive. I'll keep you on cc.
Out of curiosity: what version of GCC are you using to compile the kernel?
Will
--->8
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index bc4e92337d16..b46e54c2399b 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -401,7 +401,7 @@ static inline phys_addr_t pmd_page_paddr(pmd_t pmd)
 /* Find an entry in the third-level page table. */
 #define pte_index(addr) (((addr) >> PAGE_SHIFT) & (PTRS_PER_PTE - 1))
 
-#define pte_offset_phys(dir,addr) (pmd_page_paddr(*(dir)) + pte_index(addr) * sizeof(pte_t))
+#define pte_offset_phys(dir,addr) (pmd_page_paddr(READ_ONCE(*(dir))) + pte_index(addr) * sizeof(pte_t))
 #define pte_offset_kernel(dir,addr) ((pte_t *)__va(pte_offset_phys((dir), (addr))))
 
 #define pte_offset_map(dir,addr) pte_offset_kernel((dir), (addr))
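For readers following the diff above, here is a minimal userspace sketch of what a READ_ONCE()-style access provides. It is not the kernel's implementation: `READ_ONCE_SKETCH`, `pmd_sketch_t`, `TABLE_ADDR_MASK` and `pte_table_paddr()` are hypothetical names, `__typeof__` is a GCC/Clang extension, and the mask is illustrative rather than arm64's real one. The idea is that the entry is loaded once through a volatile access, and both the validity check and the address computation use that one snapshot, so a concurrent update cannot make them disagree.

```c
#include <stdint.h>

/* Hypothetical stand-in for a page-table entry type. */
typedef uint64_t pmd_sketch_t;

/* Illustrative mask selecting the next-level table address bits. */
#define TABLE_ADDR_MASK 0x0000fffffffff000ULL

/*
 * Simplified stand-in for READ_ONCE(): the volatile qualifier makes the
 * compiler emit this access as written instead of merging or repeating it.
 */
#define READ_ONCE_SKETCH(x) (*(const volatile __typeof__(x) *)&(x))

/*
 * Returns 1 and stores the table's physical address in *paddr when the
 * entry is non-empty, 0 otherwise. The entry is loaded a single time, so
 * the emptiness check and the address both come from the same value.
 */
static int pte_table_paddr(pmd_sketch_t *pmdp, uint64_t *paddr)
{
	pmd_sketch_t pmd = READ_ONCE_SKETCH(*pmdp); /* single snapshot */

	if (pmd == 0)
		return 0;
	*paddr = pmd & TABLE_ADDR_MASK;
	return 1;
}
```

The diff itself only adds READ_ONCE() inside pte_offset_phys(); taking one snapshot per walk, as in this sketch, is one possible shape for a broader change and is not something the thread specifies.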
* ARM64: kernel panics in DABT in sys_msync path
2017-09-27 15:50 ` Will Deacon
@ 2017-09-27 18:00 ` Richard Ruigrok
2017-09-28 3:31 ` Richard Ruigrok
0 siblings, 1 reply; 12+ messages in thread
From: Richard Ruigrok @ 2017-09-27 18:00 UTC (permalink / raw)
To: linux-arm-kernel
On 9/27/2017 9:50 AM, Will Deacon wrote:
> On Tue, Sep 26, 2017 at 06:31:12PM +0100, Will Deacon wrote:
>> On Tue, Sep 26, 2017 at 08:23:35AM -0600, Ruigrok, Richard wrote:
>>> On 9/26/2017 4:23 AM, Will Deacon wrote:
>>>> On Mon, Sep 25, 2017 at 01:54:57PM -0600, Ruigrok, Richard wrote:
>>>>> I also found this issue with kernels from 4.11 through 4.13. In my tests, I
>>>>> found that it reproduces only with 4K page and Transparent Huge Pages. With 64K
>>>>> page I was not able to reproduce. RH also reported it here: https://
>>>>> bugzilla.redhat.com/show_bug.cgi?id=1491504 Linaro reported on the RPK kernel
>>>>> (4.12) on Centriq2400 and ThunderX
>>>>>
>>>>>
>>>>> https://bugs.linaro.org/show_bug.cgi?id=3191
>>>>>
>>>>> https://bugs.linaro.org/show_bug.cgi?id=3068.
>>>> These two aren't the same bug (that's a forward progress issue that we're
>>>> currently working on). I don't have permission to look at the redhat one,
>>>> but is it just an RCU stall or actually the Oops reported by Yury?
>>>>
>>>>> I was able to bisect down to a specific commit.
>>>> I think we're chasing two different things here, so not sure I trust the
>>>> bisect!
>>>>
>>> The RCU stall is side effect. The issue I'm seeing has the same stack
>>> trace and same stimulus (rwtest). Following are the details.
>> FWIW, I think I've worked out what's going on here and I should have a patch
>> tomorrow.
> Diff below. I'm going to follow up with a separate thread about this,
> because the proper fix is going to be invasive. I'll keep you on cc.
>
> Out of curiosity: what version of GCC are you using to compile the kernel?
I'm using gcc-linaro-6.3.1-2017.02-x86_64_aarch64-linux-gnu
Thanks for the patch, test results to follow.
Richard
>
> Will
>
> --->8
>
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index bc4e92337d16..b46e54c2399b 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -401,7 +401,7 @@ static inline phys_addr_t pmd_page_paddr(pmd_t pmd)
> /* Find an entry in the third-level page table. */
> #define pte_index(addr) (((addr) >> PAGE_SHIFT) & (PTRS_PER_PTE - 1))
>
> -#define pte_offset_phys(dir,addr) (pmd_page_paddr(*(dir)) + pte_index(addr) * sizeof(pte_t))
> +#define pte_offset_phys(dir,addr) (pmd_page_paddr(READ_ONCE(*(dir))) + pte_index(addr) * sizeof(pte_t))
> #define pte_offset_kernel(dir,addr) ((pte_t *)__va(pte_offset_phys((dir), (addr))))
>
> #define pte_offset_map(dir,addr) pte_offset_kernel((dir), (addr))
* ARM64: kernel panics in DABT in sys_msync path
2017-09-27 18:00 ` Richard Ruigrok
@ 2017-09-28 3:31 ` Richard Ruigrok
0 siblings, 0 replies; 12+ messages in thread
From: Richard Ruigrok @ 2017-09-28 3:31 UTC (permalink / raw)
To: linux-arm-kernel
On 9/27/2017 12:00 PM, Richard Ruigrok wrote:
>
> On 9/27/2017 9:50 AM, Will Deacon wrote:
>> On Tue, Sep 26, 2017 at 06:31:12PM +0100, Will Deacon wrote:
>>> On Tue, Sep 26, 2017 at 08:23:35AM -0600, Ruigrok, Richard wrote:
>>>> On 9/26/2017 4:23 AM, Will Deacon wrote:
>>>>> On Mon, Sep 25, 2017 at 01:54:57PM -0600, Ruigrok, Richard wrote:
>>>>>> I also found this issue with kernels from 4.11 through 4.13. In my tests, I
>>>>>> found that it reproduces only with 4K page and Transparent Huge Pages. With 64K
>>>>>> page I was not able to reproduce. RH also reported it here: https://
>>>>>> bugzilla.redhat.com/show_bug.cgi?id=1491504 Linaro reported on the RPK kernel
>>>>>> (4.12) on Centriq2400 and ThunderX
>>>>>>
>>>>>>
>>>>>> https://bugs.linaro.org/show_bug.cgi?id=3191
>>>>>>
>>>>>> https://bugs.linaro.org/show_bug.cgi?id=3068.
>>>>> These two aren't the same bug (that's a forward progress issue that we're
>>>>> currently working on). I don't have permission to look at the redhat one,
>>>>> but is it just an RCU stall or actually the Oops reported by Yury?
>>>>>
>>>>>> I was able to bisect down to a specific commit.
>>>>> I think we're chasing two different things here, so not sure I trust the
>>>>> bisect!
>>>>>
>>>> The RCU stall is side effect. The issue I'm seeing has the same stack
>>>> trace and same stimulus (rwtest). Following are the details.
>>> FWIW, I think I've worked out what's going on here and I should have a patch
>>> tomorrow.
>> Diff below. I'm going to follow up with a separate thread about this,
>> because the proper fix is going to be invasive. I'll keep you on cc.
>>
>> Out of curiosity: what version of GCC are you using to compile the kernel?
> I'm using gcc-linaro-6.3.1-2017.02-x86_64_aarch64-linux-gnu
> Thanks for the patch, test results to follow.
> Richard
With this change applied on v4.13, the LTP rwtest passed 50 iterations; it appears to solve the issue I was seeing.
This kernel was built with GCC 5.2.1; I've also started using 6.3.1. If you think it makes a difference, I can also test with 6.3.1.
Linux version 4.13.0-00002-g8540910-dirty (rruigrok at rruigrok-lnx) (gcc version 5.2.1 20151005 (Linaro GCC 5.2-2015.11-1)) #55 SMP PREEMPT Wed Sep 27 13:37:25 MDT 2017
Richard
>> Will
>>
>> --->8
>>
>> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
>> index bc4e92337d16..b46e54c2399b 100644
>> --- a/arch/arm64/include/asm/pgtable.h
>> +++ b/arch/arm64/include/asm/pgtable.h
>> @@ -401,7 +401,7 @@ static inline phys_addr_t pmd_page_paddr(pmd_t pmd)
>> /* Find an entry in the third-level page table. */
>> #define pte_index(addr) (((addr) >> PAGE_SHIFT) & (PTRS_PER_PTE - 1))
>>
>> -#define pte_offset_phys(dir,addr) (pmd_page_paddr(*(dir)) + pte_index(addr) * sizeof(pte_t))
>> +#define pte_offset_phys(dir,addr) (pmd_page_paddr(READ_ONCE(*(dir))) + pte_index(addr) * sizeof(pte_t))
>> #define pte_offset_kernel(dir,addr) ((pte_t *)__va(pte_offset_phys((dir), (addr))))
>>
>> #define pte_offset_map(dir,addr) pte_offset_kernel((dir), (addr))