From: Laura Abbott
Subject: Hangs in balance_dirty_pages with arm-32 LPAE + highmem
Date: Fri, 23 Feb 2018 11:51:41 -0800
To: Linux-MM, linux-block@vger.kernel.org

Hi,

The Fedora arm-32 build VMs have a somewhat long-standing problem of hanging when running mkfs.ext4 with a bunch of processes stuck in D state.
This has been seen as far back as 4.13 and is still present on 4.14:

sysrq: SysRq : Show Blocked State
  task PC stack pid father
auditd D 0 377 1 0x00000020
[] (__schedule) from [] (schedule+0x98/0xbc)
[] (schedule) from [] (schedule_timeout+0x328/0x3ac)
[] (schedule_timeout) from [] (io_schedule_timeout+0x24/0x38)
[] (io_schedule_timeout) from [] (balance_dirty_pages.constprop.6+0xac8/0xc5c)
[] (balance_dirty_pages.constprop.6) from [] (balance_dirty_pages_ratelimited+0x2b8/0x43c)
[] (balance_dirty_pages_ratelimited) from [] (generic_perform_write+0x174/0x1a4)
[] (generic_perform_write) from [] (__generic_file_write_iter+0x16c/0x198)
[] (__generic_file_write_iter) from [] (ext4_file_write_iter+0x314/0x414)
[] (ext4_file_write_iter) from [] (__vfs_write+0x100/0x128)
[] (__vfs_write) from [] (vfs_write+0xc0/0x194)
[] (vfs_write) from [] (SyS_write+0x44/0x7c)
[] (SyS_write) from [] (__sys_trace_return+0x0/0x10)
rs:main Q:Reg D 0 441 1 0x00000000
[] (__schedule) from [] (schedule+0x98/0xbc)
[] (schedule) from [] (schedule_timeout+0x328/0x3ac)
[] (schedule_timeout) from [] (io_schedule_timeout+0x24/0x38)
[] (io_schedule_timeout) from [] (balance_dirty_pages.constprop.6+0xac8/0xc5c)
[] (balance_dirty_pages.constprop.6) from [] (balance_dirty_pages_ratelimited+0x2b8/0x43c)
[] (balance_dirty_pages_ratelimited) from [] (generic_perform_write+0x174/0x1a4)
[] (generic_perform_write) from [] (__generic_file_write_iter+0x16c/0x198)
[] (__generic_file_write_iter) from [] (ext4_file_write_iter+0x314/0x414)
[] (ext4_file_write_iter) from [] (__vfs_write+0x100/0x128)
[] (__vfs_write) from [] (vfs_write+0xc0/0x194)
[] (vfs_write) from [] (SyS_write+0x44/0x7c)
[] (SyS_write) from [] (ret_fast_syscall+0x0/0x4c)
ntpd D 0 1453 1 0x00000001
[] (__schedule) from [] (schedule+0x98/0xbc)
[] (schedule) from [] (schedule_timeout+0x328/0x3ac)
[] (schedule_timeout) from [] (io_schedule_timeout+0x24/0x38)
[] (io_schedule_timeout) from [] (balance_dirty_pages.constprop.6+0xac8/0xc5c)
[] (balance_dirty_pages.constprop.6) from [] (balance_dirty_pages_ratelimited+0x2b8/0x43c)
[] (balance_dirty_pages_ratelimited) from [] (generic_perform_write+0x174/0x1a4)
[] (generic_perform_write) from [] (__generic_file_write_iter+0x16c/0x198)
[] (__generic_file_write_iter) from [] (ext4_file_write_iter+0x314/0x414)
[] (ext4_file_write_iter) from [] (__vfs_write+0x100/0x128)
[] (__vfs_write) from [] (vfs_write+0xc0/0x194)
[] (vfs_write) from [] (SyS_write+0x44/0x7c)
[] (SyS_write) from [] (ret_fast_syscall+0x0/0x4c)
kojid D 0 4616 1 0x00000000
[] (__schedule) from [] (schedule+0x98/0xbc)
[] (schedule) from [] (schedule_timeout+0x328/0x3ac)
[] (schedule_timeout) from [] (io_schedule_timeout+0x24/0x38)
[] (io_schedule_timeout) from [] (balance_dirty_pages.constprop.6+0xac8/0xc5c)
[] (balance_dirty_pages.constprop.6) from [] (balance_dirty_pages_ratelimited+0x2b8/0x43c)
[] (balance_dirty_pages_ratelimited) from [] (generic_perform_write+0x174/0x1a4)
[] (generic_perform_write) from [] (__generic_file_write_iter+0x16c/0x198)
[] (__generic_file_write_iter) from [] (ext4_file_write_iter+0x314/0x414)
[] (ext4_file_write_iter) from [] (__vfs_write+0x100/0x128)
[] (__vfs_write) from [] (vfs_write+0xc0/0x194)
[] (vfs_write) from [] (SyS_write+0x44/0x7c)
[] (SyS_write) from [] (ret_fast_syscall+0x0/0x4c)
kworker/u8:0 D 0 28525 2 0x00000000
Workqueue: writeback wb_workfn (flush-7:0)
[] (__schedule) from [] (schedule+0x98/0xbc)
[] (schedule) from [] (io_schedule+0x1c/0x2c)
[] (io_schedule) from [] (wbt_wait+0x21c/0x300)
[] (wbt_wait) from [] (blk_mq_make_request+0xac/0x560)
[] (blk_mq_make_request) from [] (generic_make_request+0xd0/0x214)
[] (generic_make_request) from [] (submit_bio+0x114/0x16c)
[] (submit_bio) from [] (submit_bh_wbc+0x190/0x1a0)
[] (submit_bh_wbc) from [] (__block_write_full_page+0x2e8/0x43c)
[] (__block_write_full_page) from [] (block_write_full_page+0x80/0xec)
[] (block_write_full_page) from [] (__writepage+0x1c/0x4c)
[] (__writepage) from [] (write_cache_pages+0x350/0x3f0)
[] (write_cache_pages) from [] (generic_writepages+0x44/0x60)
[] (generic_writepages) from [] (do_writepages+0x3c/0x74)
[] (do_writepages) from [] (__writeback_single_inode+0xb4/0x404)
[] (__writeback_single_inode) from [] (writeback_sb_inodes+0x258/0x438)
[] (writeback_sb_inodes) from [] (__writeback_inodes_wb+0x6c/0xa8)
[] (__writeback_inodes_wb) from [] (wb_writeback+0x1c4/0x30c)
[] (wb_writeback) from [] (wb_workfn+0x130/0x450)
[] (wb_workfn) from [] (process_one_work+0x254/0x42c)
[] (process_one_work) from [] (worker_thread+0x2d0/0x450)
[] (worker_thread) from [] (kthread+0x13c/0x154)
[] (kthread) from [] (ret_from_fork+0x14/0x3c)
kworker/u8:1 D 0 16594 2 0x00000000
Workqueue: writeback wb_workfn (flush-252:0)
[] (__schedule) from [] (schedule+0x98/0xbc)
[] (schedule) from [] (io_schedule+0x1c/0x2c)
[] (io_schedule) from [] (wbt_wait+0x21c/0x300)
[] (wbt_wait) from [] (blk_mq_make_request+0xac/0x560)
[] (blk_mq_make_request) from [] (generic_make_request+0xd0/0x214)
[] (generic_make_request) from [] (submit_bio+0x114/0x16c)
[] (submit_bio) from [] (submit_bh_wbc+0x190/0x1a0)
[] (submit_bh_wbc) from [] (__block_write_full_page+0x2e8/0x43c)
[] (__block_write_full_page) from [] (block_write_full_page+0x80/0xec)
[] (block_write_full_page) from [] (__writepage+0x1c/0x4c)
[] (__writepage) from [] (write_cache_pages+0x350/0x3f0)
[] (write_cache_pages) from [] (generic_writepages+0x44/0x60)
[] (generic_writepages) from [] (do_writepages+0x3c/0x74)
[] (do_writepages) from [] (__writeback_single_inode+0xb4/0x404)
[] (__writeback_single_inode) from [] (writeback_sb_inodes+0x258/0x438)
[] (writeback_sb_inodes) from [] (__writeback_inodes_wb+0x6c/0xa8)
[] (__writeback_inodes_wb) from [] (wb_writeback+0x1c4/0x30c)
[] (wb_writeback) from [] (wb_workfn+0x130/0x450)
[] (wb_workfn) from [] (process_one_work+0x254/0x42c)
[] (process_one_work) from [] (worker_thread+0x2d0/0x450)
[] (worker_thread) from [] (kthread+0x13c/0x154)
[] (kthread) from [] (ret_from_fork+0x14/0x3c)
loop0 D 0 9138 2 0x00000000
[] (__schedule) from [] (schedule+0x98/0xbc)
[] (schedule) from [] (schedule_timeout+0x328/0x3ac)
[] (schedule_timeout) from [] (io_schedule_timeout+0x24/0x38)
[] (io_schedule_timeout) from [] (balance_dirty_pages.constprop.6+0xac8/0xc5c)
[] (balance_dirty_pages.constprop.6) from [] (balance_dirty_pages_ratelimited+0x2b8/0x43c)
[] (balance_dirty_pages_ratelimited) from [] (generic_perform_write+0x174/0x1a4)
[] (generic_perform_write) from [] (__generic_file_write_iter+0x16c/0x198)
[] (__generic_file_write_iter) from [] (ext4_file_write_iter+0x314/0x414)
[] (ext4_file_write_iter) from [] (do_iter_readv_writev+0x118/0x140)
[] (do_iter_readv_writev) from [] (do_iter_write+0x84/0xf8)
[] (do_iter_write) from [] (lo_write_bvec+0x70/0xec [loop])
[] (lo_write_bvec [loop]) from [] (loop_queue_work+0x3b4/0x92c [loop])
[] (loop_queue_work [loop]) from [] (kthread_worker_fn+0x114/0x1c8)
[] (kthread_worker_fn) from [] (kthread+0x13c/0x154)
[] (kthread) from [] (ret_from_fork+0x14/0x3c)
mkfs.ext4 D 0 9142 1535 0x00000000
[] (__schedule) from [] (schedule+0x98/0xbc)
[] (schedule) from [] (io_schedule+0x1c/0x2c)
[] (io_schedule) from [] (__lock_page+0x10c/0x144)
[] (__lock_page) from [] (write_cache_pages+0x1d8/0x3f0)
[] (write_cache_pages) from [] (generic_writepages+0x44/0x60)
[] (generic_writepages) from [] (do_writepages+0x3c/0x74)
[] (do_writepages) from [] (__filemap_fdatawrite_range+0xc0/0xe0)
[] (__filemap_fdatawrite_range) from [] (file_write_and_wait_range+0x40/0x78)
[] (file_write_and_wait_range) from [] (blkdev_fsync+0x20/0x50)
[] (blkdev_fsync) from [] (vfs_fsync+0x28/0x30)
[] (vfs_fsync) from [] (do_fsync+0x30/0x4c)
[] (do_fsync) from [] (ret_fast_syscall+0x0/0x4c)
python D 0 9167 9165 0x00000000
[] (__schedule) from [] (schedule+0x98/0xbc)
[] (schedule) from [] (schedule_timeout+0x328/0x3ac)
[] (schedule_timeout) from [] (io_schedule_timeout+0x24/0x38)
[] (io_schedule_timeout) from [] (balance_dirty_pages.constprop.6+0xac8/0xc5c)
[] (balance_dirty_pages.constprop.6) from [] (balance_dirty_pages_ratelimited+0x2b8/0x43c)
[] (balance_dirty_pages_ratelimited) from [] (fault_dirty_shared_page+0x9c/0xb4)
[] (fault_dirty_shared_page) from [] (do_wp_page+0x628/0x688)
[] (do_wp_page) from [] (handle_mm_fault+0xd5c/0xe08)
[] (handle_mm_fault) from [] (do_page_fault+0x1f0/0x360)
[] (do_page_fault) from [] (do_DataAbort+0x34/0xb4)
[] (do_DataAbort) from [] (__dabt_usr+0x3c/0x40)
Exception stack(0xc69dbfb0 to 0xc69dbff8)
bfa0: 019fec50 00000002 cc684d00 cc684d00
bfc0: 00000001 b4214cf4 019fec50 00000002 000587b8 00000000 00000001 b4214cf0
bfe0: b4f39584 bec32990 b4eee238 b4e799a0 600f0010 ffffffff
python D 0 9313 9304 0x00000000
[] (__schedule) from [] (schedule+0x98/0xbc)
[] (schedule) from [] (schedule_timeout+0x328/0x3ac)
[] (schedule_timeout) from [] (io_schedule_timeout+0x24/0x38)
[] (io_schedule_timeout) from [] (balance_dirty_pages.constprop.6+0xac8/0xc5c)
[] (balance_dirty_pages.constprop.6) from [] (balance_dirty_pages_ratelimited+0x2b8/0x43c)
[] (balance_dirty_pages_ratelimited) from [] (generic_perform_write+0x174/0x1a4)
[] (generic_perform_write) from [] (__generic_file_write_iter+0x16c/0x198)
[] (__generic_file_write_iter) from [] (ext4_file_write_iter+0x314/0x414)
[] (ext4_file_write_iter) from [] (__vfs_write+0x100/0x128)
[] (__vfs_write) from [] (vfs_write+0xc0/0x194)
[] (vfs_write) from [] (SyS_write+0x44/0x7c)
[] (SyS_write) from [] (ret_fast_syscall+0x0/0x4c)
python D 0 9326 9317 0x00000000
[] (__schedule) from [] (schedule+0x98/0xbc)
[] (schedule) from [] (schedule_timeout+0x328/0x3ac)
[] (schedule_timeout) from [] (io_schedule_timeout+0x24/0x38)
[] (io_schedule_timeout) from [] (balance_dirty_pages.constprop.6+0xac8/0xc5c)
[] (balance_dirty_pages.constprop.6) from [] (balance_dirty_pages_ratelimited+0x2b8/0x43c)
[] (balance_dirty_pages_ratelimited) from [] (generic_perform_write+0x174/0x1a4)
[] (generic_perform_write) from [] (__generic_file_write_iter+0x16c/0x198)
[] (__generic_file_write_iter) from [] (ext4_file_write_iter+0x314/0x414)
[] (ext4_file_write_iter) from [] (__vfs_write+0x100/0x128)
[] (__vfs_write) from [] (vfs_write+0xc0/0x194)
[] (vfs_write) from [] (SyS_write+0x44/0x7c)
[] (SyS_write) from [] (ret_fast_syscall+0x0/0x4c)
python D 0 9351 9342 0x00000000
[] (__schedule) from [] (schedule+0x98/0xbc)
[] (schedule) from [] (schedule_timeout+0x328/0x3ac)
[] (schedule_timeout) from [] (io_schedule_timeout+0x24/0x38)
[] (io_schedule_timeout) from [] (balance_dirty_pages.constprop.6+0xac8/0xc5c)
[] (balance_dirty_pages.constprop.6) from [] (balance_dirty_pages_ratelimited+0x2b8/0x43c)
[] (balance_dirty_pages_ratelimited) from [] (generic_perform_write+0x174/0x1a4)
[] (generic_perform_write) from [] (__generic_file_write_iter+0x16c/0x198)
[] (__generic_file_write_iter) from [] (ext4_file_write_iter+0x314/0x414)
[] (ext4_file_write_iter) from [] (__vfs_write+0x100/0x128)
[] (__vfs_write) from [] (vfs_write+0xc0/0x194)
[] (vfs_write) from [] (SyS_write+0x44/0x7c)
[] (SyS_write) from [] (ret_fast_syscall+0x0/0x4c)
python D 0 9361 9352 0x00000000
[] (__schedule) from [] (schedule+0x98/0xbc)
[] (schedule) from [] (schedule_timeout+0x328/0x3ac)
[] (schedule_timeout) from [] (io_schedule_timeout+0x24/0x38)
[] (io_schedule_timeout) from [] (balance_dirty_pages.constprop.6+0xac8/0xc5c)
[] (balance_dirty_pages.constprop.6) from [] (balance_dirty_pages_ratelimited+0x2b8/0x43c)
[] (balance_dirty_pages_ratelimited) from [] (generic_perform_write+0x174/0x1a4)
[] (generic_perform_write) from [] (__generic_file_write_iter+0x16c/0x198)
[] (__generic_file_write_iter) from [] (ext4_file_write_iter+0x314/0x414)
[] (ext4_file_write_iter) from [] (__vfs_write+0x100/0x128)
[] (__vfs_write) from [] (vfs_write+0xc0/0x194)
[] (vfs_write) from [] (SyS_write+0x44/0x7c)
[] (SyS_write) from [] (ret_fast_syscall+0x0/0x4c)
python D 0 9374 9365 0x00000000
[] (__schedule) from [] (schedule+0x98/0xbc)
[] (schedule) from [] (schedule_timeout+0x328/0x3ac)
[] (schedule_timeout) from [] (io_schedule_timeout+0x24/0x38)
[] (io_schedule_timeout) from [] (balance_dirty_pages.constprop.6+0xac8/0xc5c)
[] (balance_dirty_pages.constprop.6) from [] (balance_dirty_pages_ratelimited+0x2b8/0x43c)
[] (balance_dirty_pages_ratelimited) from [] (generic_perform_write+0x174/0x1a4)
[] (generic_perform_write) from [] (__generic_file_write_iter+0x16c/0x198)
[] (__generic_file_write_iter) from [] (ext4_file_write_iter+0x314/0x414)
[] (ext4_file_write_iter) from [] (__vfs_write+0x100/0x128)
[] (__vfs_write) from [] (vfs_write+0xc0/0x194)
[] (vfs_write) from [] (SyS_write+0x44/0x7c)
[] (SyS_write) from [] (ret_fast_syscall+0x0/0x4c)
python D 0 9385 9376 0x00000000
[] (__schedule) from [] (schedule+0x98/0xbc)
[] (schedule) from [] (schedule_timeout+0x328/0x3ac)
[] (schedule_timeout) from [] (io_schedule_timeout+0x24/0x38)
[] (io_schedule_timeout) from [] (balance_dirty_pages.constprop.6+0xac8/0xc5c)
[] (balance_dirty_pages.constprop.6) from [] (balance_dirty_pages_ratelimited+0x2b8/0x43c)
[] (balance_dirty_pages_ratelimited) from [] (generic_perform_write+0x174/0x1a4)
[] (generic_perform_write) from [] (__generic_file_write_iter+0x16c/0x198)
[] (__generic_file_write_iter) from [] (ext4_file_write_iter+0x314/0x414)
[] (ext4_file_write_iter) from [] (__vfs_write+0x100/0x128)
[] (__vfs_write) from [] (vfs_write+0xc0/0x194)
[] (vfs_write) from [] (SyS_write+0x44/0x7c)
[] (SyS_write) from [] (ret_fast_syscall+0x0/0x4c)
systemd-journal D 0 9678 1 0x00000080
[] (__schedule) from [] (schedule+0x98/0xbc)
[] (schedule) from [] (schedule_timeout+0x328/0x3ac)
[] (schedule_timeout) from [] (io_schedule_timeout+0x24/0x38)
[] (io_schedule_timeout) from [] (balance_dirty_pages.constprop.6+0xac8/0xc5c)
[] (balance_dirty_pages.constprop.6) from [] (balance_dirty_pages_ratelimited+0x2b8/0x43c)
[] (balance_dirty_pages_ratelimited) from [] (fault_dirty_shared_page+0x9c/0xb4)
[] (fault_dirty_shared_page) from [] (handle_mm_fault+0xc84/0xe08)
[] (handle_mm_fault) from [] (do_page_fault+0x1f0/0x360)
[] (do_page_fault) from [] (do_DataAbort+0x34/0xb4)
[] (do_DataAbort) from [] (__dabt_usr+0x3c/0x40)
Exception stack(0xc40d9fb0 to 0xc40d9ff8)
9fa0: b5e605e0 00000000 001c2b28 b5e64000
9fc0: 01b60ff0 00000000 be94e444 be94e448 001c6550 00000000 be94e660 be94e450
9fe0: 00000000 be94e400 b6cfffcc b6e94a50 20000010 ffffffff

This looks like everything is blocked waiting for writeback to complete, but the writeback itself has been throttled. According to the infra team, this problem is _not_ seen without LPAE (i.e. with only 4G of RAM). I did see https://patchwork.kernel.org/patch/10201593/, but that doesn't quite seem to match, since here everything appears to be completely stuck.

Any suggestions to narrow the problem down?

Thanks,
Laura