From: Amit Sahrawat
Subject: Ext4: deadlock occurs when running fsstress and ENOSPC errors are seen.
Date: Tue, 15 Apr 2014 13:24:26 +0530
To: Jan Kara, "Theodore Ts'o", darrick.wong@oracle.com, linux-fsdevel@vger.kernel.org, linux-ext4@vger.kernel.org, LKML, Namjae Jeon

Initially, in the normal write path, when the disk was almost full we got a hang in 'sync', because the flusher (busy in writepages) was not responding. Before the hung task, we also found logs like:

EXT4-fs error (device sda1): ext4_mb_generate_buddy:742: group 1493, 0 clusters in bitmap, 58339 in gd
EXT4-fs error (device sda1): ext4_mb_generate_buddy:742: group 1000, 0 clusters in bitmap, 3 in gd
EXT4-fs error (device sda1): ext4_mb_generate_buddy:742: group 1425, 0 clusters in bitmap, 1 in gd
JBD2: Spotted dirty metadata buffer (dev = sda1, blocknr = 0). There's a risk of filesystem corruption in case of system crash.
JBD2: Spotted dirty metadata buffer (dev = sda1, blocknr = 0). There's a risk of filesystem corruption in case of system crash.
EXT4-fs (sda1): error count: 58
EXT4-fs (sda1): initial error at 607: ext4_mb_generate_buddy:742
EXT4-fs (sda1): last error at 58: ext4_mb_generate_buddy:742

When we analysed the problem, it turned out to come from the writepages path in ext4, caused by a mismatch between the number of free blocks reported by the cluster bitmaps and the number reported by the group descriptors.
During ext4_fill_super, ext4 calculates the number of free blocks by reading all the group descriptors in ext4_count_free_clusters, and stores the result in the percpu counter s_freeclusters_counter:

	desc_count = 0;
	for (i = 0; i < ngroups; i++) {
		gdp = ext4_get_group_desc(sb, i, NULL);
		if (!gdp)
			continue;
		desc_count += ext4_free_group_clusters(sb, gdp);
	}
	return desc_count;

During the write_begin call, ext4 checks this s_freeclusters_counter to decide whether free blocks are available. When the group descriptors report more free blocks than the bitmaps actually contain, a call to write_begin can still succeed even though the bitmaps show 0 free blocks. This is exactly the situation in the logs above:

EXT4-fs error (device sda1): ext4_mb_generate_buddy:742: group 1493, 0 clusters in bitmap, 58339 in gd
EXT4-fs error (device sda1): ext4_mb_generate_buddy:742: group 1000, 0 clusters in bitmap, 3 in gd
EXT4-fs error (device sda1): ext4_mb_generate_buddy:742: group 1425, 0 clusters in bitmap, 1 in gd

In this situation, block allocation fails during writeback because there really are no free blocks, and ext4 spins in an unbreakable loop in ext4_da_writepages. Because of this loop, the sync thread, which waits for completion of the flusher thread that initiated the writeback, never finishes, and a hung task is reported for 'sync'.

While searching for known problems in this path, we found the patch-set from 'Darrick' which revolves around this problem:
ext4: error out if verifying the block bitmap fails
ext4: fix type declaration of ext4_validate_block_bitmap
ext4: mark block group as corrupt on block bitmap error
ext4: mark block group as corrupt on inode bitmap error
ext4: mark group corrupt on group descriptor checksum
ext4: don't count free clusters from a corrupt block group

After applying the patch-set and running verification on a similar setup, we ran 'fsstress'. But now it hangs at different points. In the current logs we got:

EXT4-fs error (device sdb1): ext4_mb_generate_buddy:743: group 1, 20480 clusters in bitmap, 25443 in gd; block bitmap corrupt.
JBD2: Spotted dirty metadata buffer (dev = sdb1, blocknr = 0). There's a risk of filesystem corruption in case of system crash.

hinting at block bitmap corruption in the filesystem, and then logs like the following (error -28 is -ENOSPC):

EXT4-fs (sdb1): delayed block allocation failed for inode 241 at logical offset 17 with max blocks 22 with error -28
EXT4-fs (sdb1): This should not happen!! Data will be lost
EXT4-fs (sdb1): Total free blocks count 0
EXT4-fs (sdb1): Free/Dirty block details
EXT4-fs (sdb1): free_blocks=25443
EXT4-fs (sdb1): dirty_blocks=7847
EXT4-fs (sdb1): Block reservation details
EXT4-fs (sdb1): i_reserved_data_blocks=22
EXT4-fs (sdb1): i_reserved_meta_blocks=1
EXT4-fs (sdb1): delayed block allocation failed for inode 218 at logical offset 82 with max blocks 17 with error -28
EXT4-fs (sdb1): This should not happen!! Data will be lost
EXT4-fs (sdb1): Total free blocks count 0
EXT4-fs (sdb1): Free/Dirty block details
EXT4-fs (sdb1): delayed block allocation failed for inode 196 at logical offset 527 with max blocks 7 with error -28
EXT4-fs (sdb1): This should not happen!!
Data will be lost
EXT4-fs (sdb1): Total free blocks count 0
EXT4-fs (sdb1): Free/Dirty block details
EXT4-fs (sdb1): free_blocks=25443
EXT4-fs (sdb1): dirty_blocks=7874
EXT4-fs (sdb1): Block reservation details
EXT4-fs (sdb1): i_reserved_data_blocks=106
EXT4-fs (sdb1): i_reserved_meta_blocks=5
EXT4-fs (sdb1): free_blocks=25443
EXT4-fs (sdb1): dirty_blocks=7874
EXT4-fs (sdb1): Block reservation details
EXT4-fs (sdb1): i_reserved_data_blocks=17
EXT4-fs (sdb1): i_reserved_meta_blocks=1
EXT4-fs (sdb1): delayed block allocation failed for inode 251 at logical offset 197 with max blocks 14 with error -28
EXT4-fs (sdb1): This should not happen!! Data will be lost
EXT4-fs (sdb1): Total free blocks count 0
EXT4-fs (sdb1): Free/Dirty block details
EXT4-fs (sdb1): free_blocks=25443
EXT4-fs (sdb1): dirty_blocks=7874
EXT4-fs (sdb1): Block reservation details
EXT4-fs (sdb1): i_reserved_data_blocks=44
EXT4-fs (sdb1): i_reserved_meta_blocks=2
EXT4-fs (sdb1): delayed block allocation failed for inode 251 at logical offset 364 with max blocks 30 with error -28
EXT4-fs (sdb1): This should not happen!! Data will be lost
EXT4-fs (sdb1): Total free blocks count 0
EXT4-fs (sdb1): Free/Dirty block details
EXT4-fs (sdb1): free_blocks=25443
EXT4-fs (sdb1): dirty_blocks=7874
EXT4-fs (sdb1): Block reservation details
EXT4-fs (sdb1): i_reserved_data_blocks=44
EXT4-fs (sdb1): i_reserved_meta_blocks=2
EXT4-fs (sdb1): delayed block allocation failed for inode 243 at logical offset 23 with max blocks 14 with error -28
EXT4-fs (sdb1): This should not happen!!
Data will be lost
EXT4-fs (sdb1): Total free blocks count 0
EXT4-fs (sdb1): Free/Dirty block details
EXT4-fs (sdb1): free_blocks=25443
EXT4-fs (sdb1): dirty_blocks=7874
EXT4-fs (sdb1): Block reservation details
EXT4-fs (sdb1): i_reserved_data_blocks=14
EXT4-fs (sdb1): i_reserved_meta_blocks=1
EXT4-fs (sdb1): delayed block allocation failed for inode 249 at logical offset 62 with max blocks 15 with error -28
EXT4-fs (sdb1): This should not happen!! Data will be lost
EXT4-fs (sdb1): Total free blocks count 0
EXT4-fs (sdb1): Free/Dirty block details
EXT4-fs (sdb1): free_blocks=25443
EXT4-fs (sdb1): dirty_blocks=7874
EXT4-fs (sdb1): Block reservation details
EXT4-fs (sdb1): i_reserved_data_blocks=48
EXT4-fs (sdb1): i_reserved_meta_blocks=3
EXT4-fs (sdb1): delayed block allocation failed for inode 249 at logical offset 155 with max blocks 6 with error -28
EXT4-fs (sdb1): This should not happen!! Data will be lost
EXT4-fs (sdb1): Total free blocks count 0
EXT4-fs (sdb1): Free/Dirty block details
EXT4-fs (sdb1): free_blocks=25443
EXT4-fs (sdb1): dirty_blocks=7874
EXT4-fs (sdb1): Block reservation details
EXT4-fs (sdb1): i_reserved_data_blocks=48
EXT4-fs (sdb1): i_reserved_meta_blocks=3
EXT4-fs (sdb1): delayed block allocation failed for inode 249 at logical offset 225 with max blocks 27 with error -28
EXT4-fs (sdb1): This should not happen!! Data will be lost
EXT4-fs (sdb1): Total free blocks count 12
EXT4-fs (sdb1): Free/Dirty block details
EXT4-fs (sdb1): free_blocks=25455
EXT4-fs (sdb1): dirty_blocks=7874
EXT4-fs (sdb1): Block reservation details
EXT4-fs (sdb1): i_reserved_data_blocks=48
EXT4-fs (sdb1): i_reserved_meta_blocks=3
EXT4-fs (sdb1): delayed block allocation failed for inode 221 at logical offset 1 with max blocks 2 with error -28
EXT4-fs (sdb1): This should not happen!!
Data will be lost
EXT4-fs (sdb1): Total free blocks count 0
EXT4-fs (sdb1): Free/Dirty block details
EXT4-fs (sdb1): free_blocks=25455
EXT4-fs (sdb1): dirty_blocks=7901
EXT4-fs (sdb1): Block reservation details
EXT4-fs (sdb1): i_reserved_data_blocks=2
EXT4-fs (sdb1): i_reserved_meta_blocks=1
EXT4-fs (sdb1): delayed block allocation failed for inode 250 at logical offset 140 with max blocks 13 with error -28
EXT4-fs (sdb1): This should not happen!! Data will be lost
EXT4-fs (sdb1): Total free blocks count 0
EXT4-fs (sdb1): Free/Dirty block details
EXT4-fs (sdb1): free_blocks=25455
EXT4-fs (sdb1): dirty_blocks=7901
EXT4-fs (sdb1): Block reservation details
EXT4-fs (sdb1): i_reserved_data_blocks=13
EXT4-fs (sdb1): i_reserved_meta_blocks=1

Followed by the hung task traces:

Pid: 662, comm: jbd2/sdb1-8
jbd2/sdb1-8 [2] D [e5354840] c03e910c 0 662 0x00000000 2 663 635 (kernel thread)
Backtrace:
[] (__schedule+0x0/0x5f4) from [] (schedule+0x40/0x80)
[] (schedule+0x0/0x80) from [] (io_schedule+0x74/0xa0)
[] (io_schedule+0x0/0xa0) from [] (sleep_on_buffer+0x18/0x20)
 r6:e5921dd4 r5:002ecac6 r4:e5921dcc r3:00000002
[] (sleep_on_buffer+0x0/0x20) from [] (__wait_on_bit+0x8c/0xc4)
[] (__wait_on_bit+0x0/0xc4) from [] (out_of_line_wait_on_bit+0x78/0x80)
[] (out_of_line_wait_on_bit+0x0/0x80) from [] (__wait_on_buffer+0x30/0x38)
 r8:e5343e68 r7:e5343c00 r6:00000000 r5:e5677680 r4:00000000
[] (__wait_on_buffer+0x0/0x38) from [] (jbd2_journal_commit_transaction+0xbe8/0x1830)
[] (jbd2_journal_commit_transaction+0x0/0x1830) from [] (kjournald2+0xb8/0x24c)
[] (kjournald2+0x0/0x24c) from [] (kthread+0xb4/0xc0)
[] (kthread+0x0/0xc0) from [] (ret_from_fork+0x14/0x20)
 r7:00000000 r6:00000000 r5:c005c64c r4:e4cf5c04

Pid: 635, comm: flush-8:16
flush-8:16 [1] D [e5357b40] c03e910c 0 635 0x00000000 2 662 628 (kernel thread)
Backtrace:
[] (__schedule+0x0/0x5f4) from [] (schedule+0x40/0x80)
[] (schedule+0x0/0x80) from [] (jbd2_log_wait_commit+0xa8/0x120)
[] (jbd2_log_wait_commit+0x0/0x120) from [] (jbd2_journal_force_commit_nested+0x70/0xb4)
[] (jbd2_journal_force_commit_nested+0x0/0xb4) from [] (ext4_da_writepages+0x450/0x5ec)
 r6:e588e000 r5:e5433660 r4:00000001 r3:ffffffe4
[] (ext4_da_writepages+0x0/0x5ec) from [] (do_writepages+0x34/0x48)
[] (do_writepages+0x0/0x48) from [] (__writeback_single_inode+0x3c/0x17c)
[] (__writeback_single_inode+0x0/0x17c) from [] (writeback_sb_inodes+0x1fc/0x39c)
[] (writeback_sb_inodes+0x0/0x39c) from [] (wb_writeback+0xf8/0x334)
[] (wb_writeback+0x0/0x334) from [] (wb_do_writeback+0xb0/0x280)
[] (wb_do_writeback+0x0/0x280) from [] (bdi_writeback_thread+0x80/0x27c)
[] (bdi_writeback_thread+0x0/0x27c) from [] (kthread+0xb4/0xc0)
[] (kthread+0x0/0xc0) from [] (ret_from_fork+0x14/0x20)
 r7:00000000 r6:00000000 r5:c005c64c r4:e5923e9c

Kernel Version: 3.8
Test command: fsstress -p 10 -n 100 -l 100 -d /mnt/test_dir