From mboxrd@z Thu Jan 1 00:00:00 1970
From: Matthew Wilcox
Subject: Livelock when running xfstests generic/127 on ext4 with 3.15
Date: Fri, 20 Jun 2014 13:53:22 -0400
Message-ID: <20140620175322.GO12025@linux.intel.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: linux-fsdevel@vger.kernel.org
To: linux-ext4@vger.kernel.org
Return-path:
Received: from mga02.intel.com ([134.134.136.20]:22199 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752743AbaFTRxl (ORCPT ); Fri, 20 Jun 2014 13:53:41 -0400
Content-Disposition: inline
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID:

I didn't see this with 3.14, but I'm not sure what's changed.

When running generic/127, fsx ends up taking 30% CPU time with a kthread
taking 70% CPU time for hours.  It might be making forward progress,
but if it is, it's incredibly slow.

I can usually catch fsx waiting for the kthread:

# ./check generic/127
FSTYP         -- ext4
PLATFORM      -- Linux/x86_64 walter 3.15.0
MKFS_OPTIONS  -- /dev/ram1
MOUNT_OPTIONS -- -o acl,user_xattr /dev/ram1 /mnt/ram1

generic/127 19s ...

$ sudo cat /proc/4795/stack
[] writeback_inodes_sb_nr+0xa9/0xe0
[] try_to_writeback_inodes_sb_nr+0x5e/0x80
[] try_to_writeback_inodes_sb+0x25/0x30
[] ext4_nonda_switch+0x8a/0x90 [ext4]
[] ext4_page_mkwrite+0x265/0x440 [ext4]
[] do_page_mkwrite+0x3d/0x70
[] do_wp_page+0x627/0x770
[] handle_mm_fault+0x781/0xf00
[] __do_page_fault+0x186/0x570
[] do_page_fault+0x22/0x30
[] page_fault+0x28/0x30
[] 0xffffffffffffffff

My setup is a 1GB ram disk:

modprobe brd rd_size=1048576 rd_nr=2

local.config:

TEST_DEV=/dev/ram0
TEST_DIR=/mnt/ram0
SCRATCH_DEV=/dev/ram1
SCRATCH_MNT=/mnt/ram1

Hardware is an Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz with 4GB RAM,
in case it matters.  But I think what matters is that I'm running it on a
"tiny" 1GB filesystem, since this code is only invoked whenever the number
of dirty clusters is large relative to the number of free clusters.
df shows:

/dev/ram1         999320    1284    929224   1% /mnt/ram1
/dev/ram0         999320  646088    284420  70% /mnt/ram0

So it's not *unreasonably* full.
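
For readers who want the "dirty clusters relative to free clusters" condition spelled out, here is a minimal userspace sketch of the kind of check ext4_nonda_switch() appears to make before calling try_to_writeback_inodes_sb(). It is modelled on a recollection of fs/ext4/inode.c from around 3.15, not copied from the tree, so the exact thresholds (half of free clusters dirty, 150% of dirty clusters) should be treated as assumptions and checked against the source. The function names and the watermark parameter are hypothetical; the watermark stands in for the kernel's EXT4_FREECLUSTERS_WATERMARK.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Sketch only: returns true when the write-fault path would ask for
 * writeback of the whole superblock. Assumed rule: some clusters are
 * dirty and free clusters have dropped below twice the dirty count.
 */
static bool sketch_starts_writeback(int64_t free_clusters,
                                    int64_t dirty_clusters)
{
    return dirty_clusters > 0 && free_clusters < 2 * dirty_clusters;
}

/*
 * Sketch only: returns true when ext4 would fall back from delayed
 * allocation. Assumed rule: free clusters are below 150% of dirty
 * clusters, or below dirty clusters plus a watermark. The watermark
 * argument is a hypothetical stand-in for a kernel-internal value.
 */
static bool sketch_nonda_switch(int64_t free_clusters,
                                int64_t dirty_clusters,
                                int64_t watermark)
{
    return 2 * free_clusters < 3 * dirty_clusters ||
           free_clusters < dirty_clusters + watermark;
}
```

If the real check looks like this, every write fault taken while the first condition holds would request another writeback pass, which would fit the stack trace above: fsx sits in writeback_inodes_sb_nr() waiting for the flusher kthread.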