* Intentionally corrupted ext4s causing two different kernel panics at umount
From: Sami Liedes @ 2014-10-05 0:12 UTC
To: linux-ext4

[-- Attachment #1: Type: text/plain, Size: 12462 bytes --]

Hi!

I ran some fuzz tests on an ext4 filesystem on 3.16.3 and on 3.17-rc7
and found some filesystems that differ from a pristine filesystem by
one bit and cause a kernel panic at unmount time.

The set of operations I run for each filesystem is this:

mount $TARGET_DEV /mnt -t $FSTYPE -o errors=continue
cd /mnt
timeout 30 cp -r doc doc2 >&/dev/null
timeout 30 find -xdev >&/dev/null
timeout 30 find -xdev -print0 2>/dev/null |xargs -0 touch -- >&/dev/null
timeout 30 mkdir tmp >&/dev/null
timeout 30 echo whoah >tmp/filu >&/dev/null
timeout 30 rm -rf /mnt/* >&/dev/null
cd /
umount /mnt

I got two distinct backtraces, and for both of them I have two test
images that differ from a clean ext4 filesystem by a single bit.

You can get the pristine filesystem from
http://www.niksula.hut.fi/~sliedes/ext4/testimg.ext4.pristine.bz2

For the rest of the files, see
http://www.niksula.hut.fi/~sliedes/ext4/

1. Crash in ext4_put_super
==========================

Test filesystems and diffs to the pristine image:

http://www.niksula.hut.fi/~sliedes/ext4/ext4_put_super/testimg.ext4.20942.min.bz2

--- /dev/fd/63 2014-10-05 02:22:36.822155073 +0300
+++ /dev/fd/62 2014-10-05 02:22:36.822155073 +0300
@@ -32572,7 +32572,7 @@
 001795a0 2d 70 63 73 70 6b 72 2d 65 76 65 6e 74 2d 73 70 |-pcspkr-event-sp|
 001795b0 6b 72 0c 00 e1 01 00 00 20 00 18 02 62 75 73 5c |kr...... ...bus\|
 001795c0 78 32 66 75 73 62 5c 78 32 66 30 30 38 5c 78 32 |x2fusb\x2f008\x2|
-001795d0 66 30 30 31 05 02 00 00 18 00 0e 02 75 73 62 64 |f001........usbd|
+001795d0 66 30 30 31 05 00 00 00 18 00 0e 02 75 73 62 64 |f001........usbd|
 001795e0 65 76 37 2e 31 5f 65 70 38 31 10 00 1f 02 00 00 |ev7.1_ep81......|
 001795f0 18 00 0e 02 75 73 62 64 65 76 31 2e 31 5f 65 70 |....usbdev1.1_ep|
 00179600 30 30 04 02 25 02 00 00 18 00 0e 02 75 73 62 64 |00..%.......usbd|

http://www.niksula.hut.fi/~sliedes/ext4/ext4_put_super/testimg.ext4.106360.min.bz2

--- /dev/fd/63 2014-10-05 02:22:36.501155217 +0300
+++ /dev/fd/62 2014-10-05 02:22:36.501155217 +0300
@@ -36271,7 +36271,7 @@
 *
 001b8400 03 04 00 00 0c 00 01 02 2e 00 00 00 0c 00 00 00 |................|
 001b8410 0c 00 02 02 2e 2e 00 00 04 04 00 00 0c 00 04 04 |................|
-001b8420 73 64 65 33 05 04 00 00 14 00 0c 04 72 6f 6f 74 |sde3........root|
+001b8420 73 64 65 33 05 00 00 00 14 00 0c 04 72 6f 6f 74 |sde3........root|
 001b8430 2d 63 72 79 70 74 65 64 06 04 00 00 24 00 1b 04 |-crypted....$...|
 001b8440 6c 76 6d 32 7c 6d 79 5f 63 6f 6e 74 61 69 6e 65 |lvm2|my_containe|
 001b8450 72 7c 6d 79 5f 72 65 67 69 6f 6e 00 07 04 00 00 |r|my_region.....|

The backtrace, trimmed from
http://www.niksula.hut.fi/~sliedes/ext4/ext4_put_super/testimg.ext4.20942.min.log

[ 1.034753] EXT4-fs (vdb): mounted filesystem with ordered data mode. Opts: errors=continue
[ 1.353376] EXT4-fs warning (device vdb): ext4_unlink:2820: Deleting nonexistent file (5), 0
[ 1.354480] EXT4-fs (vdb): Inode 5 (ffff8800048a0e10): orphan list check failed!
[ 1.355433] ffff8800048a0e10: 00000000 00000000 00000000 00000000 ................
[...]
[ 1.437175] ffff8800048a1500: 00000081 0000007f 00000000 00000000 ................
[ 1.437769] CPU: 0 PID: 207 Comm: rm Not tainted 3.16.3 #3
[ 1.438195] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
[ 1.438979] ffff8800048a0e10 ffff880000647dd0 ffffffff81850b5c ffff8800048a0f80
[ 1.439592] ffff880000647e00 ffffffff812615bd 0000000000000700 ffff880000000001
[ 1.440217] ffff8800048a0f80 ffff8800048a1000 ffff880000647e18 ffffffff8116d723
[ 1.440837] Call Trace:
[ 1.441035] [<ffffffff81850b5c>] dump_stack+0x45/0x56
[ 1.441437] [<ffffffff812615bd>] ext4_destroy_inode+0x9d/0xa0
[ 1.441894] [<ffffffff8116d723>] destroy_inode+0x33/0x70
[ 1.442313] [<ffffffff8116dd72>] evict+0x112/0x1a0
[ 1.442696] [<ffffffff8116eacd>] iput+0xed/0x190
[ 1.443063] [<ffffffff81162cd7>] do_unlinkat+0x197/0x2c0
[ 1.443484] [<ffffffff81063485>] ? sys32_fstatat+0x15/0x30
[ 1.443920] [<ffffffff81162e16>] SyS_unlinkat+0x16/0x40
[ 1.444343] [<ffffffff81859aa8>] sysenter_dispatch+0x7/0x25
[ 1.447553] tsc: Refined TSC clocksource calibration: 3400.019 MHz
[ 1.455218] EXT4-fs warning (device vdb): ext4_rmdir:2760: empty directory has too many links (3)
[ 1.570473] EXT4-fs (vdb): sb orphan head is 5
[ 1.571220] sb_info orphan list:
[ 1.571645] inode vdb:5 at ffff8800048a0f80: mode 100000, nlink 0, next 0
[ 1.572569] ------------[ cut here ]------------
[ 1.573168] kernel BUG at fs/ext4/super.c:836!
[ 1.573745] invalid opcode: 0000 [#1] SMP
[ 1.574308] CPU: 0 PID: 209 Comm: umount Not tainted 3.16.3 #3
[ 1.575060] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
[ 1.576354] task: ffff880005e5c100 ti: ffff880005e34000 task.ti: ffff880005e34000
[ 1.576549] RIP: 0010:[<ffffffff81261516>] [<ffffffff81261516>] ext4_put_super+0x366/0x370
[ 1.576549] RSP: 0018:ffff880005e37e70 EFLAGS: 00010202
[ 1.576549] RAX: 000000000000003f RBX: ffff880005e31800 RCX: 0000000000000006
[ 1.576549] RDX: 0000000000000007 RSI: 0000000000000001 RDI: 0000000000000246
[ 1.576549] RBP: ffff880005e37ea0 R08: 0000000000000001 R09: 0000000000000000
[ 1.576549] R10: 0000000000000000 R11: 0000000000000219 R12: ffff880005e31b28
[ 1.576549] R13: ffff880005e31000 R14: ffff880005e31a88 R15: ffff880005e31b28
[ 1.576549] FS: 0000000000000000(0000) GS:ffff880007c00000(0063) knlGS:00000000f746a780
[ 1.576549] CS: 0010 DS: 002b ES: 002b CR0: 000000008005003b
[ 1.576549] CR2: 0000000008d05014 CR3: 0000000005c2b000 CR4: 00000000000006b0
[ 1.576549] Stack:
[ 1.576549] ffff880000000000 ffff880005e31000 ffff880005e310f8 ffffffff81a32840
[ 1.576549] 0000000000000000 0000000000000000 ffff880005e37ec8 ffffffff811547dd
[ 1.576549] 0000000000000083 ffff880006c0e100 0000000000000000 ffff880005e37ee8
[ 1.576549] Call Trace:
[ 1.576549] [<ffffffff811547dd>] generic_shutdown_super+0x6d/0xf0
[ 1.576549] [<ffffffff81155a12>] kill_block_super+0x22/0x70
[ 1.576549] [<ffffffff811544fc>] deactivate_locked_super+0x3c/0x60
[ 1.576549] [<ffffffff8115457c>] deactivate_super+0x5c/0x60
[ 1.576549] [<ffffffff811728c1>] mntput_no_expire+0x171/0x260
[ 1.576549] [<ffffffff811744aa>] ? SyS_oldumount+0x7a/0xe0
[ 1.576549] [<ffffffff811744aa>] SyS_oldumount+0x7a/0xe0
[ 1.576549] [<ffffffff81859aa8>] sysenter_dispatch+0x7/0x25
[ 1.576549] Code: b0 90 05 00 00 41 8b 87 64 ff ff ff 89 04 24 31 c0 e8 ab c1 5e 00 4d 8b 3f 4d 39 fc 75 b5 4c 3b a3 28 03 00 00 0f 84 af fe ff ff <0f> 0b 0f 1f 84 00 00 00 00 00 55 48 89 e5 41 54 4c 8d a7 90 fe
[ 1.576549] RIP [<ffffffff81261516>] ext4_put_super+0x366/0x370
[ 1.576549] RSP <ffff880005e37e70>
[ 1.596184] ---[ end trace e2c3a1b45e3598c1 ]---
[ 1.596551] Kernel panic - not syncing: Fatal exception
[ 1.597076] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
[ 1.597870] Rebooting in 1 seconds..

2. Crash in start_this_handle
=============================

Test filesystems and diffs to the pristine image:

http://www.niksula.hut.fi/~sliedes/ext4/start_this_handle/testimg.ext4.8473.min.bz2

--- /dev/fd/63 2014-10-05 02:22:37.396154814 +0300
+++ /dev/fd/62 2014-10-05 02:22:37.395154815 +0300
@@ -164,7 +164,7 @@
 *
 0000b000 02 00 00 00 0c 00 01 02 2e 00 00 00 02 00 00 00 |................|
 0000b010 0c 00 02 02 2e 2e 00 00 0b 00 00 00 14 00 0a 02 |................|
-0000b020 6c 6f 73 74 2b 66 6f 75 6e 64 00 00 0c 00 00 00 |lost+found......|
+0000b020 6c 6f 73 74 2b 66 6f 75 6e 64 00 00 08 00 00 00 |lost+found......|
 0000b030 0c 00 03 02 64 65 76 00 ff 04 00 00 c8 03 03 02 |....dev.........|
 0000b040 64 6f 63 00 00 00 00 00 00 00 00 00 00 00 00 00 |doc.............|
 0000b050 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|

http://www.niksula.hut.fi/~sliedes/ext4/start_this_handle/testimg.ext4.610085.min.bz2

--- /dev/fd/63 2014-10-05 02:22:37.100154947 +0300
+++ /dev/fd/62 2014-10-05 02:22:37.100154947 +0300
@@ -36276,7 +36276,7 @@
 001b8440 6c 76 6d 32 7c 6d 79 5f 63 6f 6e 74 61 69 6e 65 |lvm2|my_containe|
 001b8450 72 7c 6d 79 5f 72 65 67 69 6f 6e 00 07 04 00 00 |r|my_region.....|
 001b8460 18 00 0f 04 6d 79 76 67 2d 72 6f 6f 74 5f 63 72 |....myvg-root_cr|
-001b8470 79 70 74 00 08 04 00 00 28 00 1f 04 6c 76 6d 32 |ypt.....(...lvm2|
+001b8470 79 70 74 00 08 00 00 00 28 00 1f 04 6c 76 6d 32 |ypt.....(...lvm2|
 001b8480 7c 6d 79 5f 63 6f 6e 74 61 69 6e 65 72 7c 73 77 ||my_container|sw|
 001b8490 61 70 30 2d 63 72 79 70 74 65 64 00 09 04 00 00 |ap0-crypted.....|
 001b84a0 0c 00 04 04 73 64 64 32 0a 04 00 00 14 00 09 04 |....sdd2........|

The backtrace, trimmed from
http://www.niksula.hut.fi/~sliedes/ext4/start_this_handle/testimg.ext4.8473.min.log

[ 1.025503] EXT4-fs (vdb): mounted filesystem with ordered data mode. Opts: errors=continue
[ 1.275936] ------------[ cut here ]------------
[ 1.276860] kernel BUG at fs/jbd2/transaction.c:307!
[ 1.277789] invalid opcode: 0000 [#1] SMP
[ 1.278622] CPU: 0 PID: 208 Comm: umount Not tainted 3.16.3 #3
[ 1.279721] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
[ 1.279862] task: ffff880005db5140 ti: ffff88000042c000 task.ti: ffff88000042c000
[ 1.279862] RIP: 0010:[<ffffffff81293e60>] [<ffffffff81293e60>] start_this_handle+0x330/0x760
[ 1.279862] RSP: 0018:ffff88000042fc60 EFLAGS: 00010202
[ 1.279862] RAX: 0000000000000039 RBX: ffff880005e06828 RCX: 0000000000000002
[ 1.279862] RDX: 000000000000000a RSI: 0000000000000001 RDI: ffff880005e06828
[ 1.279862] RBP: ffff88000042fd00 R08: 0000000000000000 R09: 0000000000000000
[ 1.279862] R10: ffff880005e06840 R11: 0000000000000002 R12: ffff880005e06800
[ 1.279862] R13: ffff8800067fc000 R14: ffff880005e06800 R15: 0000000000000000
[ 1.279862] FS: 0000000000000000(0000) GS:ffff880007c00000(0063) knlGS:00000000f7424780
[ 1.279862] CS: 0010 DS: 002b ES: 002b CR0: 000000008005003b
[ 1.279862] CR2: 0000000009ae8014 CR3: 0000000005d53000 CR4: 00000000000006b0
[ 1.279862] Stack:
[ 1.279862] 0000000000000286 ffff880005db5810 ffff8800049102b9 ffff880005e06df8
[ 1.279862] 0000000000000000 00000000fffedc46 ffff88000042fcc8 ffff8800067f9000
[ 1.279862] 0000005b00000050 ffffffff0000005b ffffffff81293a1b ffff8800067fc000
[ 1.279862] Call Trace:
[ 1.279862] [<ffffffff81293a1b>] ? new_handle+0x1b/0x50
[ 1.279862] [<ffffffff8129451b>] jbd2__journal_start+0xcb/0x1a0
[ 1.279862] [<ffffffff8124a45d>] ? ext4_evict_inode+0x17d/0x500
[ 1.279862] [<ffffffff81272635>] __ext4_journal_start_sb+0x65/0xd0
[ 1.279862] [<ffffffff8124a45d>] ext4_evict_inode+0x17d/0x500
[ 1.279862] [<ffffffff8116dd0f>] evict+0xaf/0x1a0
[ 1.279862] [<ffffffff8116eacd>] iput+0xed/0x190
[ 1.279862] [<ffffffff8129f418>] jbd2_journal_destroy+0x1a8/0x240
[ 1.279862] [<ffffffff810a7710>] ? __wake_up_common+0x90/0x90
[ 1.279862] [<ffffffff8126120f>] ext4_put_super+0x5f/0x370
[ 1.279862] [<ffffffff811547dd>] generic_shutdown_super+0x6d/0xf0
[ 1.279862] [<ffffffff81155a12>] kill_block_super+0x22/0x70
[ 1.279862] [<ffffffff811544fc>] deactivate_locked_super+0x3c/0x60
[ 1.279862] [<ffffffff8115457c>] deactivate_super+0x5c/0x60
[ 1.279862] [<ffffffff811728c1>] mntput_no_expire+0x171/0x260
[ 1.279862] [<ffffffff811744aa>] ? SyS_oldumount+0x7a/0xe0
[ 1.279862] [<ffffffff811744aa>] SyS_oldumount+0x7a/0xe0
[ 1.279862] [<ffffffff81859aa8>] sysenter_dispatch+0x7/0x25
[ 1.279862] Code: 1f 40 00 8b 45 a8 3e 29 82 cc 00 00 00 4c 89 e7 e8 06 fc ff ff 48 89 df e8 fe 32 5c 00 49 8b 04 24 a8 01 0f 84 a7 fd ff ff 66 90 <0f> 0b 66 0f 1f 44 00 00 8b 45 a8 3e 41 29 00 48 89 df e8 19 34
[ 1.279862] RIP [<ffffffff81293e60>] start_this_handle+0x330/0x760
[ 1.279862] RSP <ffff88000042fc60>
[ 1.301916] ---[ end trace 52c6387c01b65be9 ]---
[ 1.302279] Kernel panic - not syncing: Fatal exception
[ 1.302792] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
[ 1.303577] Rebooting in 1 seconds..

	Sami

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]
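All four single-bit flips in this report land in the inode field of a directory entry. The sketch below decodes the fixed 8-byte header of an on-disk struct ext4_dir_entry_2 (32-bit inode, 16-bit rec_len, 8-bit name_len, 8-bit file_type, all little-endian) for two of the entries, before and after the flip. It is a userspace illustration, not kernel code: the helper names (decode_dirent_hdr, bit_distance) are made up for this example, and the byte arrays are copied by hand from the hexdump diffs. Under those assumptions, "usbdev7.1_ep81" goes from inode 517 to inode 5 (the boot loader inode) and "dev" goes from inode 12 to inode 8 (the journal inode). The other two images follow the same pattern (0x405 becomes 5, 0x408 becomes 8).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reserved inode numbers, as defined in fs/ext4/ext4.h. */
#define EXT4_BOOT_LOADER_INO 5
#define EXT4_JOURNAL_INO     8

/* Fixed 8-byte header of an on-disk struct ext4_dir_entry_2. */
struct dirent_hdr {
	uint32_t inode;     /* inode number the entry points at */
	uint16_t rec_len;   /* length of this record in bytes */
	uint8_t  name_len;  /* length of the name that follows */
	uint8_t  file_type; /* 2 = directory, 3 = char device, ... */
};

/* Decode the little-endian header fields from raw directory block bytes. */
static struct dirent_hdr decode_dirent_hdr(const uint8_t *p)
{
	struct dirent_hdr d;

	assert(p != NULL);
	d.inode = (uint32_t)p[0] | (uint32_t)p[1] << 8 |
		  (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
	d.rec_len = (uint16_t)(p[4] | p[5] << 8);
	d.name_len = p[6];
	d.file_type = p[7];
	return d;
}

/* Count the bits that differ between two buffers of length n. */
static int bit_distance(const uint8_t *a, const uint8_t *b, size_t n)
{
	int bits = 0;

	for (size_t i = 0; i < n; i++)
		for (unsigned x = (unsigned)(a[i] ^ b[i]); x != 0; x &= x - 1)
			bits++; /* x &= x - 1 clears the lowest set bit */
	return bits;
}

/* Entry headers copied from the diffs: "usbdev7.1_ep81" (image 20942)
 * and "dev" (image 8473), before and after the bit flip. */
static const uint8_t usbdev_orig[8] = { 0x05, 0x02, 0x00, 0x00, 0x18, 0x00, 0x0e, 0x02 };
static const uint8_t usbdev_flip[8] = { 0x05, 0x00, 0x00, 0x00, 0x18, 0x00, 0x0e, 0x02 };
static const uint8_t dev_orig[8]    = { 0x0c, 0x00, 0x00, 0x00, 0x0c, 0x00, 0x03, 0x02 };
static const uint8_t dev_flip[8]    = { 0x08, 0x00, 0x00, 0x00, 0x0c, 0x00, 0x03, 0x02 };
```

Note that rec_len, name_len and file_type are untouched in every case, so the corrupted entries still parse as well-formed directory entries; only the inode they point at changes.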
* [PATCH 1/2] ext4: don't orphan or truncate the boot loader inode
From: Theodore Ts'o @ 2014-10-06 2:48 UTC
To: Ext4 Developers List; +Cc: Theodore Ts'o, stable

The boot loader inode (inode #5) should never be visible in the
directory hierarchy, but if the file system is corrupted, a directory
entry may point at inode #5.  In order to avoid accidentally trashing
it, when such an inode is opened through a directory entry, it is
marked as a bad inode, so that it is not possible to modify (or read)
it from userspace.

Unfortunately, when we unlink this (invalid/illegal) directory entry,
we put the bad inode on the orphan list, and then, when we try to
unlink the directory, we don't actually remove the bad inode from the
orphan list before freeing the in-memory inode structure.  This means
the in-memory orphan list is corrupted, leading to a kernel oops.

In addition, avoid truncating a bad inode in ext4_evict_inode(), since
truncating the boot loader inode is not a smart thing to do.
Reported-by: Sami Liedes <sami.liedes@iki.fi>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
---
 fs/ext4/inode.c | 7 +++----
 fs/ext4/namei.c | 2 +-
 2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 41c4f97..59983b2 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -224,16 +224,15 @@ void ext4_evict_inode(struct inode *inode)
 		goto no_delete;
 	}
 
-	if (!is_bad_inode(inode))
-		dquot_initialize(inode);
+	if (is_bad_inode(inode))
+		goto no_delete;
+	dquot_initialize(inode);
 
 	if (ext4_should_order_data(inode))
 		ext4_begin_ordered_truncate(inode, 0);
 	truncate_inode_pages_final(&inode->i_data);
 
 	WARN_ON(atomic_read(&EXT4_I(inode)->i_ioend_count));
-	if (is_bad_inode(inode))
-		goto no_delete;
 
 	/*
 	 * Protect us against freezing - iput() caller didn't have to have any
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 51705f8..a2a9d40 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -2544,7 +2544,7 @@ int ext4_orphan_add(handle_t *handle, struct inode *inode)
 	int err = 0, rc;
 	bool dirty = false;
 
-	if (!sbi->s_journal)
+	if (!sbi->s_journal || is_bad_inode(inode))
 		return 0;
 
 	WARN_ON_ONCE(!(inode->i_state & (I_NEW | I_FREEING)) &&
-- 
2.1.0
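A toy userspace model of the orphan-list problem this patch addresses, assuming the simplified flow described in the commit message: unlinking the last link adds the inode to the superblock's orphan list, and evicting a bad inode jumps to no_delete without taking it off that list. All types and functions here (toy_inode, toy_sb, toy_orphan_add, toy_evict) are hypothetical stand-ins, not kernel APIs; the skip_bad_orphans flag switches between the old behaviour and the patched early return in ext4_orphan_add().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins; none of these types or functions exist in the kernel. */
struct toy_inode {
	unsigned long ino;
	bool bad;                  /* models is_bad_inode() */
	struct toy_inode *next;    /* link in the superblock orphan list */
};

struct toy_sb {
	bool has_journal;          /* models sbi->s_journal != NULL */
	bool skip_bad_orphans;     /* true = behaviour after PATCH 1/2 */
	struct toy_inode *orphans; /* models sbi->s_orphan */
};

/* Models ext4_orphan_add(), called when the last link is removed. */
static void toy_orphan_add(struct toy_sb *sb, struct toy_inode *inode)
{
	assert(sb != NULL && inode != NULL);
	if (!sb->has_journal || (sb->skip_bad_orphans && inode->bad))
		return;
	inode->next = sb->orphans;
	sb->orphans = inode;
}

/* Models ext4_evict_inode(): a bad inode jumps to no_delete and is never
 * removed from the orphan list; a normal inode is unlinked from it. */
static void toy_evict(struct toy_sb *sb, struct toy_inode *inode)
{
	struct toy_inode **pp;

	if (inode->bad)
		return;
	for (pp = &sb->orphans; *pp != NULL; pp = &(*pp)->next) {
		if (*pp == inode) {
			*pp = inode->next;
			break;
		}
	}
}

/* Models the unmount-time requirement that the orphan list be empty. */
static bool toy_orphan_list_empty(const struct toy_sb *sb)
{
	return sb->orphans == NULL;
}
```

In the model the stale entry simply stays on the list. In the kernel the inode memory is freed while still linked, which appears consistent with the "orphan list check failed" message from ext4_destroy_inode() and the non-empty orphan list dumped just before the BUG in ext4_put_super in the first report.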
* [PATCH 2/2] ext4: add ext4_iget_normal() which is to be used for dir tree lookups
From: Theodore Ts'o @ 2014-10-06 2:48 UTC
To: Ext4 Developers List; +Cc: Theodore Ts'o

If there is a corrupted file system which has directory entries that
point at reserved, metadata inodes, prohibit them from being used by
treating them the same way we treat Boot Loader inodes --- that is,
mark them to be bad inodes.  This prohibits them from being opened,
deleted, or modified via chmod, chown, utimes, etc.

In particular, this prevents a corrupted file system which has a
directory entry which points at the journal inode from being deleted
and being released, after which point Much Hilarity Ensues.
Reported-by: Sami Liedes <sami.liedes@iki.fi>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
---
 fs/ext4/ext4.h  |  1 +
 fs/ext4/inode.c | 10 ++++++++++
 fs/ext4/namei.c |  4 ++--
 fs/ext4/super.c |  2 +-
 4 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 1eb5b7b..012e89b 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -2109,6 +2109,7 @@ int do_journal_get_write_access(handle_t *handle,
 #define CONVERT_INLINE_DATA 2
 
 extern struct inode *ext4_iget(struct super_block *, unsigned long);
+extern struct inode *ext4_iget_normal(struct super_block *, unsigned long);
 extern int ext4_write_inode(struct inode *, struct writeback_control *);
 extern int ext4_setattr(struct dentry *, struct iattr *);
 extern int ext4_getattr(struct vfsmount *mnt, struct dentry *dentry,
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 59983b2..437622c 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4104,6 +4104,16 @@ bad_inode:
 	return ERR_PTR(ret);
 }
 
+struct inode *ext4_iget_normal(struct super_block *sb, unsigned long ino)
+{
+	struct inode *ret_inode = ext4_iget(sb, ino);
+
+	if (ret_inode && !IS_ERR(ret_inode) &&
+	    ino < EXT4_FIRST_INO(sb) && ino != EXT4_ROOT_INO)
+		make_bad_inode(ret_inode);
+	return ret_inode;
+}
+
 static int ext4_inode_blocks_set(handle_t *handle,
 				struct ext4_inode *raw_inode,
 				struct ext4_inode_info *ei)
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index a2a9d40..7037ecf 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -1417,7 +1417,7 @@ static struct dentry *ext4_lookup(struct inode *dir, struct dentry *dentry, unsi
 					 dentry);
 			return ERR_PTR(-EIO);
 		}
-		inode = ext4_iget(dir->i_sb, ino);
+		inode = ext4_iget_normal(dir->i_sb, ino);
 		if (inode == ERR_PTR(-ESTALE)) {
 			EXT4_ERROR_INODE(dir,
 					 "deleted inode referenced: %u",
@@ -1450,7 +1450,7 @@ struct dentry *ext4_get_parent(struct dentry *child)
 		return ERR_PTR(-EIO);
 	}
 
-	return d_obtain_alias(ext4_iget(child->d_inode->i_sb, ino));
+	return d_obtain_alias(ext4_iget_normal(child->d_inode->i_sb, ino));
 }
 
 /*
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 1070d6e..a0811cc 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1001,7 +1001,7 @@ static struct inode *ext4_nfs_get_inode(struct super_block *sb,
 	 * Currently we don't know the generation for parent directory, so
 	 * a generation of 0 means "accept any"
 	 */
-	inode = ext4_iget(sb, ino);
+	inode = ext4_iget_normal(sb, ino);
 	if (IS_ERR(inode))
 		return ERR_CAST(inode);
 	if (generation && inode->i_generation != generation) {
-- 
2.1.0
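The condition added in ext4_iget_normal() refuses every inode number below EXT4_FIRST_INO(sb) except the root directory. The userspace sketch below restates that check and names the reserved inodes it covers. Passing first_ino as a parameter (instead of reading it from a superblock) and the helper names dir_lookup_refused() and reserved_ino_name() are assumptions made for illustration; the constant values come from fs/ext4/ext4.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Constants as defined in fs/ext4/ext4.h. */
#define EXT4_ROOT_INO           2UL
#define EXT4_GOOD_OLD_FIRST_INO 11UL

/* Same condition as ext4_iget_normal(): true if a directory-tree or
 * NFS-handle lookup of `ino` should be refused. first_ino stands in
 * for EXT4_FIRST_INO(sb). */
static bool dir_lookup_refused(unsigned long ino, unsigned long first_ino)
{
	assert(first_ino > EXT4_ROOT_INO); /* sanity assumption of this sketch */
	return ino < first_ino && ino != EXT4_ROOT_INO;
}

/* Reserved inode numbers 1-8 as named in fs/ext4/ext4.h; NULL otherwise. */
static const char *reserved_ino_name(unsigned long ino)
{
	static const char *const names[] = {
		NULL,                   /* 0 is never a valid inode number */
		"EXT4_BAD_INO",
		"EXT4_ROOT_INO",
		"EXT4_USR_QUOTA_INO",
		"EXT4_GRP_QUOTA_INO",
		"EXT4_BOOT_LOADER_INO",
		"EXT4_UNDEL_DIR_INO",
		"EXT4_RESIZE_INO",
		"EXT4_JOURNAL_INO",
	};

	if (ino < sizeof(names) / sizeof(names[0]))
		return names[ino];
	return NULL;
}
```

With the default first inode of 11, both the boot loader inode (5) and the journal inode (8) from the fuzzed images are refused, while the root directory (2) and lost+found (11) are not.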
* Re: [PATCH 2/2] ext4: add ext4_iget_normal() which is to be used for dir tree lookups
From: Andreas Dilger @ 2014-10-06 2:52 UTC
To: Theodore Ts'o; +Cc: Ext4 Developers List

[-- Attachment #1: Type: text/plain, Size: 3941 bytes --]

On Oct 5, 2014, at 8:48 PM, Theodore Ts'o <tytso@mit.edu> wrote:
> If there is a corrupted file system which has directory entries that
> point at reserved, metadata inodes, prohibit them from being used by
> treating them the same way we treat Boot Loader inodes --- that is,
> mark them to be bad inodes.  This prohibits them from being opened,
> deleted, or modified via chmod, chown, utimes, etc.
>
> In particular, this prevents a corrupted file system which has a
> directory entry which points at the journal inode from being deleted
> and being released, after which point Much Hilarity Ensues.

Wouldn't it be safer to change "ext4_iget()" to have these checks,
and add an "ext4_iget_special()" or "ext4_iget_reserved()" for use
in the few places that are opening reserved inodes?  That would
probably be safer for the future.
Cheers, Andreas

[-- Attachment #2: Message signed with OpenPGP using GPGMail --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
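Andreas's alternative inverts the default: the ordinary lookup refuses reserved inodes, and the few internal callers that need them opt in explicitly. A minimal sketch of that shape, assuming a mode argument rather than a second function; ext4_iget_special() and ext4_iget_reserved() were only proposed names, and the enum and toy_iget_check() below are hypothetical, not part of the posted patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define EXT4_ROOT_INO 2UL /* value from fs/ext4/ext4.h */

/* Hypothetical lookup modes; nothing like this exists in the posted patch. */
enum toy_iget_mode {
	TOY_IGET_NORMAL,  /* dcache and NFS file-handle lookups */
	TOY_IGET_SPECIAL, /* explicit opt-in for reserved metadata inodes */
};

/* Returns 0 if the lookup may proceed, -EIO if it must be refused.
 * first_ino stands in for EXT4_FIRST_INO(sb). */
static int toy_iget_check(unsigned long ino, unsigned long first_ino,
			  enum toy_iget_mode mode)
{
	bool reserved;

	assert(first_ino > EXT4_ROOT_INO);
	reserved = ino < first_ino && ino != EXT4_ROOT_INO;
	if (reserved && mode != TOY_IGET_SPECIAL)
		return -EIO;
	return 0;
}
```

The appeal of this shape is that a new caller which forgets to opt in fails closed with -EIO instead of silently getting a metadata inode; the trade-off is that every internal user of a reserved inode has to be converted.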
* Re: [PATCH 2/2] ext4: add ext4_iget_normal() which is to be used for dir tree lookups
From: Theodore Ts'o @ 2014-10-06 3:16 UTC
To: Andreas Dilger; +Cc: Ext4 Developers List

On Sun, Oct 05, 2014 at 08:52:38PM -0600, Andreas Dilger wrote:
> On Oct 5, 2014, at 8:48 PM, Theodore Ts'o <tytso@mit.edu> wrote:
> > If there is a corrupted file system which has directory entries that
> > point at reserved, metadata inodes, prohibit them from being used by
> > treating them the same way we treat Boot Loader inodes --- that is,
> > mark them to be bad inodes.  This prohibits them from being opened,
> > deleted, or modified via chmod, chown, utimes, etc.
> >
> > In particular, this prevents a corrupted file system which has a
> > directory entry which points at the journal inode from being deleted
> > and being released, after which point Much Hilarity Ensues.
>
> Wouldn't it be safer to change "ext4_iget()" to have these checks,
> and add an "ext4_iget_special()" or "ext4_iget_reserved()" for use
> in the few places that are opening reserved inodes?  That would
> probably be safer for the future.

There is actually a much larger set of places where we iget reserved
inodes -- in fact, double the number of places where we return inodes
back up to the VFS: 3 for the latter, and 6 for the former.

As for future additions, it's much more likely that we would be adding
new code paths to read reserved inodes.  New VFS functionality tends
to go through the dcache layer, so I don't see the likelihood of
needing to add a new call to ext4_iget_normal() any time soon.

Cheers,

					- Ted
* Re: [PATCH 2/2] ext4: add ext4_iget_normal() which is to be used for dir tree lookups
From: Jan Kara @ 2014-10-06 15:09 UTC
To: Theodore Ts'o; +Cc: Ext4 Developers List

On Sun 05-10-14 22:48:02, Ted Tso wrote:
> +struct inode *ext4_iget_normal(struct super_block *sb, unsigned long ino)
> +{
> +	struct inode *ret_inode = ext4_iget(sb, ino);
> +
> +	if (ret_inode && !IS_ERR(ret_inode) &&
> +	    ino < EXT4_FIRST_INO(sb) && ino != EXT4_ROOT_INO)
> +		make_bad_inode(ret_inode);
> +	return ret_inode;
  Hum, why don't we just return an error (like EIO) when invalid inode
number is passed?

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR
* Re: [PATCH 2/2] ext4: add ext4_iget_normal() which is to be used for dir tree lookups
From: Theodore Ts'o @ 2014-10-06 18:55 UTC
To: Jan Kara; +Cc: Ext4 Developers List

On Mon, Oct 06, 2014 at 05:09:03PM +0200, Jan Kara wrote:
> > +	if (ret_inode && !IS_ERR(ret_inode) &&
> > +	    ino < EXT4_FIRST_INO(sb) && ino != EXT4_ROOT_INO)
> > +		make_bad_inode(ret_inode);
> > +	return ret_inode;
>   Hum, why don't we just return an error (like EIO) when invalid inode
> number is passed?

Yeah, I guess we can do that.  We need to support make_bad_inode() for
the sake of EXT4_IOC_SWAP_BOOT, but that code path doesn't need to use
ext4_iget_normal().  So in the case of ext4_iget_normal(), we should be
able to just return -EIO and let userspace fail fast with open(2)
instead of later on with read(2), write(2), or truncate(2).

					- Ted
* Re: [PATCH 1/2] ext4: don't orphan or truncate the boot loader inode
From: Jan Kara @ 2014-10-06 15:06 UTC
To: Theodore Ts'o; +Cc: Ext4 Developers List, stable

On Sun 05-10-14 22:48:01, Ted Tso wrote:
> Reported-by: Sami Liedes <sami.liedes@iki.fi>
> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
> Cc: stable@vger.kernel.org
  The patch looks good. You can add:
Reviewed-by: Jan Kara <jack@suse.cz>

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR
* One more corrupted fs crash in ext4_put_super
From: Sami Liedes @ 2014-10-07 20:56 UTC
To: linux-ext4; +Cc: Theodore Ts'o

Hi,

Here's one more filesystem that causes a crash in ext4_put_super on
3.17 both with and without the two patches from this thread applied.

Interestingly this one does not seem to crash on 3.16.4, with or
without the patches from this thread. Even on 3.17 I *think* I've seen
it not crash, but the reproducibility seems to be well over 95%.

Crashing image:

http://www.niksula.hut.fi/~sliedes/ext4/ext4_put_super/testimg.ext4.112041.min.bz2

Pristine image:

http://www.niksula.hut.fi/~sliedes/ext4/testimg.ext4.pristine.bz2

Diff:

--- /dev/fd/63 2014-10-07 23:52:33.397018880 +0300
+++ /dev/fd/62 2014-10-07 23:52:33.398018880 +0300
@@ -36771,7 +36771,7 @@
 001bd040 65 76 65 6e 74 30 00 00 b8 04 00 00 10 00 05 02 |event0..........|
 001bd050 62 79 2d 69 64 00 00 00 bc 04 00 00 10 00 07 02 |by-id...........|
 001bd060 62 79 2d 70 61 74 68 00 c2 04 00 00 10 00 06 03 |by-path.........|
-001bd070 65 76 65 6e 74 35 00 00 c3 04 00 00 0c 00 04 03 |event5..........|
+001bd070 65 76 65 6e 74 35 00 00 c3 00 00 00 0c 00 04 03 |event5..........|
 001bd080 6d 69 63 65 c4 04 00 00 10 00 06 03 65 76 65 6e |mice........even|
 001bd090 74 32 00 00 c5 04 00 00 10 00 06 03 65 76 65 6e |t2..........even|
 001bd0a0 74 33 00 00 c6 04 00 00 5c 03 06 03 65 76 65 6e |t3......\...even|

Backtrace:

[ 1.936509] EXT4-fs (vdb): sb orphan head is 195
[ 1.936889] sb_info orphan list:
[ 1.937145] inode vdb:195 at ffff880006675d90: mode 40755, nlink 0, next 0
[ 1.937699] ------------[ cut here ]------------
[ 1.938057] kernel BUG at fs/ext4/super.c:836!
[ 1.938419] invalid opcode: 0000 [#1] SMP
[ 1.938788] CPU: 0 PID: 1041 Comm: umount Not tainted 3.17.0+ #32
[ 1.939278] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
[ 1.940059] task: ffff8800060bd2d0 ti: ffff88000639c000 task.ti: ffff88000639c000
[ 1.940299] RIP: 0010:[<ffffffff812753e6>] [<ffffffff812753e6>] ext4_put_super+0x366/0x370
[ 1.940299] RSP: 0018:ffff88000639fe70 EFLAGS: 00010287
[ 1.940299] RAX: 0000000000000040 RBX: ffff8800063b6800 RCX: 0000000000006665
[ 1.940299] RDX: 0000000000000040 RSI: 0000000000000001 RDI: 0000000000000286
[ 1.940299] RBP: ffff88000639fea0 R08: 0000000000000001 R09: 0000000000000000
[ 1.940299] R10: 0000000000000000 R11: 0000000000000001 R12: ffff8800063b6b28
[ 1.940299] R13: ffff8800063b6000 R14: ffff8800063b6a88 R15: ffff8800063b6b28
[ 1.940299] FS: 0000000000000000(0000) GS:ffff880007c00000(0063) knlGS:00000000f7549780
[ 1.940299] CS: 0010 DS: 002b ES: 002b CR0: 000000008005003b
[ 1.940299] CR2: 000000000a02e004 CR3: 000000000635f000 CR4: 00000000000006b0
[ 1.940299] Stack:
[ 1.940299] ffff880000000000 ffff8800063b6000 ffff8800063b60f8 ffffffff81a33e00
[ 1.940299] 0000000000000000 0000000000000000 ffff88000639fec8 ffffffff81164ebd
[ 1.940299] 0000000000000083 ffff880006c0d600 ffff8800063a2780 ffff88000639fee8
[ 1.940299] Call Trace:
[ 1.940299] [<ffffffff81164ebd>] generic_shutdown_super+0x6d/0xf0
[ 1.940299] [<ffffffff81166122>] kill_block_super+0x22/0x70
[ 1.940299] [<ffffffff81164bdc>] deactivate_locked_super+0x3c/0x60
[ 1.940299] [<ffffffff81164c5c>] deactivate_super+0x5c/0x60
[ 1.940299] [<ffffffff81183cd0>] mntput_no_expire+0x180/0x210
[ 1.940299] [<ffffffff81185757>] ? SyS_umount+0x87/0x100
[ 1.940299] [<ffffffff81185757>] SyS_umount+0x87/0x100
[ 1.940299] [<ffffffff8188e888>] sysenter_dispatch+0x7/0x2a
[ 1.940299] [<ffffffff8165e9cb>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[ 1.940299] Code: b0 10 05 00 00 41 8b 87 64 ff ff ff 89 04 24 31 c0 e8 f7 ae 60 00 4d 8b 3f 4d 39 fc 75 b5 4c 3b a3 28 03 00 00 0f 84 af fe ff ff <0f> 0b 0f 1f 84 00 00 00 00 00 55 48 89 e5 41 54 4c 8d a7 90 fe
[ 1.940299] RIP [<ffffffff812753e6>] ext4_put_super+0x366/0x370
[ 1.940299] RSP <ffff88000639fe70>
[ 1.958649] ---[ end trace 6419dd181c457894 ]---
[ 1.959008] Kernel panic - not syncing: Fatal exception
[ 1.959568] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
[ 1.960337] Rebooting in 1 seconds..
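The single changed byte in the diff above sits in the inode field of a directory entry. The sketch below decodes the 12 bytes starting at image offset 0x1bd078, assuming the standard ext4 linear directory entry layout (ext4_dir_entry_2: 32-bit little-endian inode, 16-bit rec_len, 8-bit name_len, 8-bit file_type, then the name). The entry named "mice" goes from inode 0x4c3 (1219) to 0xc3 (195), the inode the log reports as the orphan list head and as a directory (mode 40755), while the entry's file_type byte still says character device. struct dirent_view and decode_dirent() are illustrative names, not kernel code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Decoded view of one on-disk directory entry, assuming the
 * ext4_dir_entry_2 layout: __le32 inode, __le16 rec_len,
 * __u8 name_len, __u8 file_type, then name_len bytes of name. */
struct dirent_view {
	uint32_t inode;
	uint16_t rec_len;
	uint8_t name_len;
	uint8_t file_type;	/* 2 = directory, 3 = character device */
	char name[256];		/* NUL-terminated copy of the name */
};

static uint32_t get_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static uint16_t get_le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | p[1] << 8);
}

/* p must point at the start of an entry with at least
 * 8 + name_len readable bytes. */
static void decode_dirent(const uint8_t *p, struct dirent_view *d)
{
	d->inode = get_le32(p);
	d->rec_len = get_le16(p + 4);
	d->name_len = p[6];
	d->file_type = p[7];
	memcpy(d->name, p + 8, d->name_len);
	d->name[d->name_len] = '\0';
}
```

Fed the pristine bytes (c3 04 00 00 0c 00 04 03 "mice"), decode_dirent() yields inode 1219; fed the corrupted bytes (c3 00 00 00 ...), it yields 195, with rec_len 12, name "mice" and file_type 3 unchanged. The two inode numbers differ only in bit 10 (0x400).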
* Re: One more corrupted fs crash in ext4_put_super
From: Darrick J. Wong @ 2014-10-07 21:57 UTC
To: Sami Liedes, linux-ext4, Theodore Ts'o

On Tue, Oct 07, 2014 at 11:56:43PM +0300, Sami Liedes wrote:
> Hi,
>
> Here's one more filesystem that causes a crash in ext4_put_super on
> 3.17 both with and without the two patches from this thread applied.
>
> Interestingly this one does not seem to crash on 3.16.4, with or
> without the patches from this thread. Even on 3.17 I *think* I've seen
> it not crash, but the reproducibility seems to be well over 95%.

Oh, I got it to crash on 3.17. :)

Does mounting with -o block_validity eliminate the backtrace, at least? With
that option, I get this instead:

EXT4-fs error (device loop0): ext4_map_blocks:559: inode #8: block 139: comm jbd2/loop0-8: lblock 15 mapped to illegal pblock (length 1)
jbd2_journal_bmap: journal block not found at offset 15 on loop0-8

...and a journal abort. Not nice, but at least the kernel doesn't blow up.
--D > > Crashing image: > > http://www.niksula.hut.fi/~sliedes/ext4/ext4_put_super/testimg.ext4.112041.min.bz2 > > Pristine image: > > http://www.niksula.hut.fi/~sliedes/ext4/testimg.ext4.pristine.bz2 > > Diff: > > --- /dev/fd/63 2014-10-07 23:52:33.397018880 +0300 > +++ /dev/fd/62 2014-10-07 23:52:33.398018880 +0300 > @@ -36771,7 +36771,7 @@ > 001bd040 65 76 65 6e 74 30 00 00 b8 04 00 00 10 00 05 02 |event0..........| > 001bd050 62 79 2d 69 64 00 00 00 bc 04 00 00 10 00 07 02 |by-id...........| > 001bd060 62 79 2d 70 61 74 68 00 c2 04 00 00 10 00 06 03 |by-path.........| > -001bd070 65 76 65 6e 74 35 00 00 c3 04 00 00 0c 00 04 03 |event5..........| > +001bd070 65 76 65 6e 74 35 00 00 c3 00 00 00 0c 00 04 03 |event5..........| > 001bd080 6d 69 63 65 c4 04 00 00 10 00 06 03 65 76 65 6e |mice........even| > 001bd090 74 32 00 00 c5 04 00 00 10 00 06 03 65 76 65 6e |t2..........even| > 001bd0a0 74 33 00 00 c6 04 00 00 5c 03 06 03 65 76 65 6e |t3......\...even| > > Backtrace: > > [ 1.936509] EXT4-fs (vdb): sb orphan head is 195 > [ 1.936889] sb_info orphan list: > [ 1.937145] inode vdb:195 at ffff880006675d90: mode 40755, nlink 0, next 0 > [ 1.937699] ------------[ cut here ]------------ > [ 1.938057] kernel BUG at fs/ext4/super.c:836! 
> [ 1.938419] invalid opcode: 0000 [#1] SMP > [ 1.938788] CPU: 0 PID: 1041 Comm: umount Not tainted 3.17.0+ #32 > [ 1.939278] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014 > [ 1.940059] task: ffff8800060bd2d0 ti: ffff88000639c000 task.ti: ffff88000639c000 > [ 1.940299] RIP: 0010:[<ffffffff812753e6>] [<ffffffff812753e6>] ext4_put_super+0x366/0x370 > [ 1.940299] RSP: 0018:ffff88000639fe70 EFLAGS: 00010287 > [ 1.940299] RAX: 0000000000000040 RBX: ffff8800063b6800 RCX: 0000000000006665 > [ 1.940299] RDX: 0000000000000040 RSI: 0000000000000001 RDI: 0000000000000286 > [ 1.940299] RBP: ffff88000639fea0 R08: 0000000000000001 R09: 0000000000000000 > [ 1.940299] R10: 0000000000000000 R11: 0000000000000001 R12: ffff8800063b6b28 > [ 1.940299] R13: ffff8800063b6000 R14: ffff8800063b6a88 R15: ffff8800063b6b28 > [ 1.940299] FS: 0000000000000000(0000) GS:ffff880007c00000(0063) knlGS:00000000f7549780 > [ 1.940299] CS: 0010 DS: 002b ES: 002b CR0: 000000008005003b > [ 1.940299] CR2: 000000000a02e004 CR3: 000000000635f000 CR4: 00000000000006b0 > [ 1.940299] Stack: > [ 1.940299] ffff880000000000 ffff8800063b6000 ffff8800063b60f8 ffffffff81a33e00 > [ 1.940299] 0000000000000000 0000000000000000 ffff88000639fec8 ffffffff81164ebd > [ 1.940299] 0000000000000083 ffff880006c0d600 ffff8800063a2780 ffff88000639fee8 > [ 1.940299] Call Trace: > [ 1.940299] [<ffffffff81164ebd>] generic_shutdown_super+0x6d/0xf0 > [ 1.940299] [<ffffffff81166122>] kill_block_super+0x22/0x70 > [ 1.940299] [<ffffffff81164bdc>] deactivate_locked_super+0x3c/0x60 > [ 1.940299] [<ffffffff81164c5c>] deactivate_super+0x5c/0x60 > [ 1.940299] [<ffffffff81183cd0>] mntput_no_expire+0x180/0x210 > [ 1.940299] [<ffffffff81185757>] ? SyS_umount+0x87/0x100 > [ 1.940299] [<ffffffff81185757>] SyS_umount+0x87/0x100 > [ 1.940299] [<ffffffff8188e888>] sysenter_dispatch+0x7/0x2a > [ 1.940299] [<ffffffff8165e9cb>] ? 
trace_hardirqs_on_thunk+0x3a/0x3f
> [ 1.940299] Code: b0 10 05 00 00 41 8b 87 64 ff ff ff 89 04 24 31 c0 e8 f7 ae 60 00 4d 8b 3f 4d 39 fc 75 b5 4c 3b a3 28 03 00 00 0f 84 af fe ff ff <0f> 0b 0f 1f 84 00 00 00 00 00 55 48 89 e5 41 54 4c 8d a7 90 fe
> [ 1.940299] RIP [<ffffffff812753e6>] ext4_put_super+0x366/0x370
> [ 1.940299] RSP <ffff88000639fe70>
> [ 1.958649] ---[ end trace 6419dd181c457894 ]---
> [ 1.959008] Kernel panic - not syncing: Fatal exception
> [ 1.959568] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
> [ 1.960337] Rebooting in 1 seconds..
* Re: One more corrupted fs crash in ext4_put_super
From: Darrick J. Wong @ 2014-10-07 22:22 UTC
To: Sami Liedes, linux-ext4, Theodore Ts'o

On Tue, Oct 07, 2014 at 02:57:40PM -0700, Darrick J. Wong wrote:
> On Tue, Oct 07, 2014 at 11:56:43PM +0300, Sami Liedes wrote:
> > Hi,
> >
> > Here's one more filesystem that causes a crash in ext4_put_super on
> > 3.17 both with and without the two patches from this thread applied.
> >
> > Interestingly this one does not seem to crash on 3.16.4, with or
> > without the patches from this thread. Even on 3.17 I *think* I've seen
> > it not crash, but the reproducibility seems to be well over 95%.
>
> Oh, I got it to crash on 3.17. :)
>
> Does mounting with -o block_validity eliminate the backtrace, at least? With
> that option, I get this instead:
>
> EXT4-fs error (device loop0): ext4_map_blocks:559: inode #8: block 139: comm jbd2/loop0-8: lblock 15 mapped to illegal pblock (length 1)
> jbd2_journal_bmap: journal block not found at offset 15 on loop0-8
>
> ...and a journal abort. Not nice, but at least the kernel doesn't blow up.

Rats, replied to the wrong crash report. All of what I said applies to the
jbd2_commit_transaction crash, not this.
--D > > --D > > > > > Crashing image: > > > > http://www.niksula.hut.fi/~sliedes/ext4/ext4_put_super/testimg.ext4.112041.min.bz2 > > > > Pristine image: > > > > http://www.niksula.hut.fi/~sliedes/ext4/testimg.ext4.pristine.bz2 > > > > Diff: > > > > --- /dev/fd/63 2014-10-07 23:52:33.397018880 +0300 > > +++ /dev/fd/62 2014-10-07 23:52:33.398018880 +0300 > > @@ -36771,7 +36771,7 @@ > > 001bd040 65 76 65 6e 74 30 00 00 b8 04 00 00 10 00 05 02 |event0..........| > > 001bd050 62 79 2d 69 64 00 00 00 bc 04 00 00 10 00 07 02 |by-id...........| > > 001bd060 62 79 2d 70 61 74 68 00 c2 04 00 00 10 00 06 03 |by-path.........| > > -001bd070 65 76 65 6e 74 35 00 00 c3 04 00 00 0c 00 04 03 |event5..........| > > +001bd070 65 76 65 6e 74 35 00 00 c3 00 00 00 0c 00 04 03 |event5..........| > > 001bd080 6d 69 63 65 c4 04 00 00 10 00 06 03 65 76 65 6e |mice........even| > > 001bd090 74 32 00 00 c5 04 00 00 10 00 06 03 65 76 65 6e |t2..........even| > > 001bd0a0 74 33 00 00 c6 04 00 00 5c 03 06 03 65 76 65 6e |t3......\...even| > > > > Backtrace: > > > > [ 1.936509] EXT4-fs (vdb): sb orphan head is 195 > > [ 1.936889] sb_info orphan list: > > [ 1.937145] inode vdb:195 at ffff880006675d90: mode 40755, nlink 0, next 0 > > [ 1.937699] ------------[ cut here ]------------ > > [ 1.938057] kernel BUG at fs/ext4/super.c:836! 
> > [ 1.938419] invalid opcode: 0000 [#1] SMP > > [ 1.938788] CPU: 0 PID: 1041 Comm: umount Not tainted 3.17.0+ #32 > > [ 1.939278] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014 > > [ 1.940059] task: ffff8800060bd2d0 ti: ffff88000639c000 task.ti: ffff88000639c000 > > [ 1.940299] RIP: 0010:[<ffffffff812753e6>] [<ffffffff812753e6>] ext4_put_super+0x366/0x370 > > [ 1.940299] RSP: 0018:ffff88000639fe70 EFLAGS: 00010287 > > [ 1.940299] RAX: 0000000000000040 RBX: ffff8800063b6800 RCX: 0000000000006665 > > [ 1.940299] RDX: 0000000000000040 RSI: 0000000000000001 RDI: 0000000000000286 > > [ 1.940299] RBP: ffff88000639fea0 R08: 0000000000000001 R09: 0000000000000000 > > [ 1.940299] R10: 0000000000000000 R11: 0000000000000001 R12: ffff8800063b6b28 > > [ 1.940299] R13: ffff8800063b6000 R14: ffff8800063b6a88 R15: ffff8800063b6b28 > > [ 1.940299] FS: 0000000000000000(0000) GS:ffff880007c00000(0063) knlGS:00000000f7549780 > > [ 1.940299] CS: 0010 DS: 002b ES: 002b CR0: 000000008005003b > > [ 1.940299] CR2: 000000000a02e004 CR3: 000000000635f000 CR4: 00000000000006b0 > > [ 1.940299] Stack: > > [ 1.940299] ffff880000000000 ffff8800063b6000 ffff8800063b60f8 ffffffff81a33e00 > > [ 1.940299] 0000000000000000 0000000000000000 ffff88000639fec8 ffffffff81164ebd > > [ 1.940299] 0000000000000083 ffff880006c0d600 ffff8800063a2780 ffff88000639fee8 > > [ 1.940299] Call Trace: > > [ 1.940299] [<ffffffff81164ebd>] generic_shutdown_super+0x6d/0xf0 > > [ 1.940299] [<ffffffff81166122>] kill_block_super+0x22/0x70 > > [ 1.940299] [<ffffffff81164bdc>] deactivate_locked_super+0x3c/0x60 > > [ 1.940299] [<ffffffff81164c5c>] deactivate_super+0x5c/0x60 > > [ 1.940299] [<ffffffff81183cd0>] mntput_no_expire+0x180/0x210 > > [ 1.940299] [<ffffffff81185757>] ? SyS_umount+0x87/0x100 > > [ 1.940299] [<ffffffff81185757>] SyS_umount+0x87/0x100 > > [ 1.940299] [<ffffffff8188e888>] sysenter_dispatch+0x7/0x2a > > [ 1.940299] [<ffffffff8165e9cb>] ? 
trace_hardirqs_on_thunk+0x3a/0x3f
> > [ 1.940299] Code: b0 10 05 00 00 41 8b 87 64 ff ff ff 89 04 24 31 c0 e8 f7 ae 60 00 4d 8b 3f 4d 39 fc 75 b5 4c 3b a3 28 03 00 00 0f 84 af fe ff ff <0f> 0b 0f 1f 84 00 00 00 00 00 55 48 89 e5 41 54 4c 8d a7 90 fe
> > [ 1.940299] RIP [<ffffffff812753e6>] ext4_put_super+0x366/0x370
> > [ 1.940299] RSP <ffff88000639fe70>
> > [ 1.958649] ---[ end trace 6419dd181c457894 ]---
> > [ 1.959008] Kernel panic - not syncing: Fatal exception
> > [ 1.959568] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
> > [ 1.960337] Rebooting in 1 seconds..
* Re: One more corrupted fs crash in ext4_put_super
From: Sami Liedes @ 2014-10-09 20:15 UTC
To: linux-ext4

On Tue, Oct 07, 2014 at 11:56:43PM +0300, Sami Liedes wrote:
> Here's one more filesystem that causes a crash in ext4_put_super on
> 3.17 both with and without the two patches from this thread applied.

Ok, I bisected a bit. FWIW.

No crash on 3.16.4 + these two patches:

1c8944cbe1b ext4: add ext4_iget_normal() which is to be used for dir tree lookups
b65ad45743c ext4: don't orphan or truncate the boot loader inode

Crash on 3.17 + the above two patches.

The first commit that crashes on this test with the above patches:

# first bad commit: [908790fa3b779d37365e6b28e3aa0f6e833020c3] dcache: d_splice_alias mustn't create directory aliases

commit 908790fa3b779d37365e6b28e3aa0f6e833020c3
Author: J. Bruce Fields <bfields@redhat.com>
Date: Mon Feb 17 17:58:42 2014 -0500

    dcache: d_splice_alias mustn't create directory aliases

    Currently if d_splice_alias finds a directory with an alias that is not
    IS_ROOT or not DCACHE_DISCONNECTED, it creates a duplicate directory.

    Duplicate directory dentries are unacceptable; it is better just to
    error out.

    (In the case of a local filesystem the most likely case is filesystem
    corruption: for example, perhaps two directories point to the same child
    directory, and the other parent has already been found and cached.)

    Note that distributed filesystems may encounter this case in normal
    operation if a remote host moves a directory to a location different
    from the one we last cached in the dcache. For that reason, such
    filesystems should instead use d_materialise_unique, which tries to move
    the old directory alias to the right place instead of erroring out.

    Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

-- 
	Sami
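The bisected commit changes one decision in d_splice_alias(). Below is a toy model of that decision exactly as the commit message describes it. toy_alias, toy_splice_dir and the enum values are hypothetical names; the real function also handles non-directory inodes, locking and reference counting, none of which is modeled. The message only says the new code "errors out", so SPLICE_ERROR stands in for whatever error the kernel actually returns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an existing dentry alias of a directory. */
struct toy_alias {
	bool is_root;		/* models IS_ROOT(alias) */
	bool disconnected;	/* models DCACHE_DISCONNECTED */
};

enum splice_result {
	SPLICE_NEW_DENTRY,	/* no alias yet: use the new dentry */
	SPLICE_REUSE_ALIAS,	/* move the disconnected root alias into place */
	SPLICE_DUPLICATE,	/* before 908790fa: second alias for a directory */
	SPLICE_ERROR		/* after 908790fa: refuse and return an error */
};

/* alias == NULL means the directory inode has no cached dentry yet. */
static enum splice_result toy_splice_dir(const struct toy_alias *alias,
					 bool after_908790fa)
{
	if (alias == NULL)
		return SPLICE_NEW_DENTRY;
	if (alias->is_root && alias->disconnected)
		return SPLICE_REUSE_ALIAS;
	return after_908790fa ? SPLICE_ERROR : SPLICE_DUPLICATE;
}
```

A directory whose existing alias is already connected (is_root and disconnected both false) yields SPLICE_DUPLICATE under the old rule and SPLICE_ERROR under the new one. That is the local-filesystem corruption case the commit message itself calls out: two parents pointing at one child directory, with the other parent already cached.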
* Re: One more corrupted fs crash in ext4_put_super
From: Darrick J. Wong @ 2014-10-09 20:49 UTC
To: Sami Liedes, linux-ext4

On Thu, Oct 09, 2014 at 11:15:41PM +0300, Sami Liedes wrote:
> On Tue, Oct 07, 2014 at 11:56:43PM +0300, Sami Liedes wrote:
> > Here's one more filesystem that causes a crash in ext4_put_super on
> > 3.17 both with and without the two patches from this thread applied.
>
> Ok, I bisected a bit. FWIW.
>
> No crash on 3.16.4 + these two patches:
>
> 1c8944cbe1b ext4: add ext4_iget_normal() which is to be used for dir tree lookups
> b65ad45743c ext4: don't orphan or truncate the boot loader inode
>
> Crash on 3.17 + the above two patches.
>
> The first commit that crashes on this test with the above patches:

Yeah. There's a directory that's linked twice (inode 195). The subsequent FS
walk loads the inode into memory twice (== i_count > 2). When you delete
everything on the FS, the inode gets put on the in-memory orphan list but for
whatever reason doesn't seem to get released via iput or something. This means
it's still on the orphan list at umount time, which triggers the BUG. Worse
yet, i_nlink is now 0...

...not clear what the appropriate course of action is here. The FS is corrupt
and we need to scrape the mess off the machine. I guess you could -EIO earlier
when you notice i_count > i_nlink?

--D

>
> # first bad commit: [908790fa3b779d37365e6b28e3aa0f6e833020c3] dcache: d_splice_alias mustn't create directory aliases
>
> commit 908790fa3b779d37365e6b28e3aa0f6e833020c3
> Author: J. Bruce Fields <bfields@redhat.com>
> Date: Mon Feb 17 17:58:42 2014 -0500
>
>     dcache: d_splice_alias mustn't create directory aliases
>
>     Currently if d_splice_alias finds a directory with an alias that is not
>     IS_ROOT or not DCACHE_DISCONNECTED, it creates a duplicate directory.
>
>     Duplicate directory dentries are unacceptable; it is better just to
>     error out.
>
>     (In the case of a local filesystem the most likely case is filesystem
>     corruption: for example, perhaps two directories point to the same child
>     directory, and the other parent has already been found and cached.)
>
>     Note that distributed filesystems may encounter this case in normal
>     operation if a remote host moves a directory to a location different
>     from the one we last cached in the dcache. For that reason, such
>     filesystems should instead use d_materialise_unique, which tries to move
>     the old directory alias to the right place instead of erroring out.
>
>     Signed-off-by: J. Bruce Fields <bfields@redhat.com>
>     Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
>
> --
>
> Sami
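Darrick's diagnosis is that one directory inode (195) is reachable through two directory entries. One way to spot that condition is to count, for every directory inode, the entries other than "." and ".." that point at it during a walk; a count above one means the directory has two parents. The sketch below is a hypothetical offline check over a flat in-memory list of entries, not ext4 or e2fsck code, and not the i_count > i_nlink heuristic floated in the message. Whether a target is a directory is taken from the target inode itself rather than from the entry's file_type byte, on the assumption that a corrupted entry's type byte cannot be trusted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* One directory entry in a hypothetical, flattened directory tree. */
struct toy_dirent {
	unsigned parent;	/* inode number of the containing directory */
	unsigned child;		/* inode number the entry points at */
	const char *name;
	bool child_is_dir;	/* from the target inode's mode */
};

/* Returns the first directory inode referenced by more than one entry
 * (ignoring "." and ".."), or 0 if there is none. parents[] must hold
 * max_ino + 1 zero-initialized counters. */
static unsigned find_multiply_linked_dir(const struct toy_dirent *ents,
					 size_t n, unsigned *parents,
					 unsigned max_ino)
{
	for (size_t i = 0; i < n; i++) {
		const struct toy_dirent *e = &ents[i];

		if (!e->child_is_dir || e->child > max_ino)
			continue;
		if (strcmp(e->name, ".") == 0 || strcmp(e->name, "..") == 0)
			continue;
		if (++parents[e->child] > 1)
			return e->child;
	}
	return 0;
}
```

In a usage example, only inode 195 and the entry name "mice" come from the thread; the other inode numbers and names are made up. A tree where "mice" points at a non-directory inode returns 0, and the same tree with "mice" redirected to directory 195 returns 195.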
* A very similar crash on ext2
From: Sami Liedes @ 2014-10-09 21:28 UTC
To: Darrick J. Wong; +Cc: linux-ext4

On Thu, Oct 09, 2014 at 01:49:13PM -0700, Darrick J. Wong wrote:
> Yeah. There's a directory that's linked twice (inode 195). The subsequent FS
> walk loads the inode into memory twice (== i_count > 2). When you delete
> everything on the FS, the inode gets put on the in-memory orphan list but for
> whatever reason doesn't seem to get released via iput or something. This means
> it's still on the orphan list at umount time, which triggers the BUG. Worse
> yet, i_nlink is now 0...
>
> ...not clear what the appropriate course of action is here. The FS is corrupt
> and we need to scrape the mess off the machine. I guess you could -EIO earlier
> when you notice i_count > i_nlink?

I don't know if this is exactly the same bug, but I'm also seeing a
similar crash on ext2 which also bisected to this exact same commit
(908790fa3b). The symptoms are a bit different, though; first a VFS
warning about busy inodes after unmount, then shortly after that a
crash.

Pristine fs: http://www.niksula.hut.fi/~sliedes/ext2/testimg.ext2.bz2

Broken fs: http://www.niksula.hut.fi/~sliedes/ext2/testimg.ext2.449.min.bz2

Diff:

--- /dev/fd/63 2014-10-10 00:20:59.562913594 +0300
+++ /dev/fd/62 2014-10-10 00:20:59.562913594 +0300
@@ -9785,6 +9785,8 @@
 0080a8f0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 80 |................|
 0080a900 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
 *
+0080ac20 ff ff ff ff ff ff ff ff ff ff ff fd ff ff ff ff |................|
+0080ac30 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
 0080ac40 ff ff 01 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
 0080ac50 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
 *

Backtrace:

[ 1.422976] VFS: Busy inodes after unmount of vdb. Self-destruct in 5 seconds. Have a nice day...
[ 1.857020] BUG: unable to handle kernel NULL pointer dereference at 0000000000000197
[ 1.858178] IP: [<ffffffff810a0859>] __lock_acquire.isra.31+0x199/0xd70
[ 1.859047] PGD 633a067 PUD 5171067 PMD 0
[ 1.859524] Oops: 0002 [#1] SMP
[ 1.859842] CPU: 0 PID: 59 Comm: kworker/u2:1 Not tainted 3.16.0+ #94
[ 1.860068] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
[ 1.860068] Workqueue: writeback bdi_writeback_workfn (flush-254:16)
[ 1.860068] task: ffff8800060f2060 ti: ffff880006104000 task.ti: ffff880006104000
[ 1.860068] RIP: 0010:[<ffffffff810a0859>] [<ffffffff810a0859>] __lock_acquire.isra.31+0x199/0xd70
[ 1.860068] RSP: 0018:ffff880006107b28 EFLAGS: 00010086
[ 1.860068] RAX: 0000000000000000 RBX: ffff8800060f2060 RCX: 0000000000000001
[ 1.860068] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff8800051cb0c8
[ 1.860068] RBP: ffff880006107b90 R08: 0000000000000000 R09: 0000000000000000
[ 1.860068] R10: ffff8800051cb0c8 R11: 0000000000000003 R12: 0000000000000001
[ 1.860068] R13: 0000000000000001 R14: ffffffffffffffff R15: 0000000000000000
[ 1.860068] FS: 0000000000000000(0000) GS:ffff880007c00000(0000) knlGS:0000000000000000
[ 1.860068] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1.860068] CR2: 0000000000000197 CR3: 000000000517c000 CR4: 00000000000006b0
[ 1.860068] Stack:
[ 1.860068] ffff880006107b88 ffff8800060f2770 ffffffff81170027 0000000000000096
[ 1.860068] 0000000000000000 0000000000000000 ffff8800060f2770 000000000000003d
[ 1.860068] 0000000000000286 0000000000000000 0000000000000001 0000000000000001
[ 1.860068] Call Trace:
[ 1.860068] [<ffffffff81170027>] ? SyS_sysfs+0xf7/0x1e0
[ 1.860068] [<ffffffff810a1c46>] lock_acquire+0x96/0x130
[ 1.860068] [<ffffffff81152aaf>] ? grab_super_passive+0x3f/0x90
[ 1.860068] [<ffffffff8109e079>] down_read_trylock+0x59/0x60
[ 1.860068] [<ffffffff81152aaf>] ? grab_super_passive+0x3f/0x90
[ 1.860068] [<ffffffff81152aaf>] grab_super_passive+0x3f/0x90
[ 1.860068] [<ffffffff8117c837>] __writeback_inodes_wb+0x57/0xd0
[ 1.860068] [<ffffffff8117caeb>] wb_writeback+0x23b/0x320
[ 1.860068] [<ffffffff8117ceed>] bdi_writeback_workfn+0x1cd/0x470
[ 1.860068] [<ffffffff8107bf90>] process_one_work+0x1c0/0x580
[ 1.860068] [<ffffffff8107bf27>] ? process_one_work+0x157/0x580
[ 1.860068] [<ffffffff8107c3b3>] worker_thread+0x63/0x540
[ 1.860068] [<ffffffff8107c350>] ? process_one_work+0x580/0x580
[ 1.860068] [<ffffffff81081b81>] kthread+0xf1/0x110
[ 1.860068] [<ffffffff81081a90>] ? __kthread_parkme+0x70/0x70
[ 1.860068] [<ffffffff81850f2c>] ret_from_fork+0x7c/0xb0
[ 1.860068] [<ffffffff81081a90>] ? __kthread_parkme+0x70/0x70
[ 1.860068] Code: 0b 00 00 48 c7 c7 25 cd c8 81 31 c0 e8 31 4a fc ff eb a7 0f 1f 80 00 00 00 00 44 89 f8 4d 8b 74 c2 08 4d 85 f6 0f 84 c2 fe ff ff <3e> 41 ff 86 98 01 00 00 8b 05 f1 57 96 01 44 8b bb 90 06 00 00
[ 1.860068] RIP [<ffffffff810a0859>] __lock_acquire.isra.31+0x199/0xd70
[ 1.860068] RSP <ffff880006107b28>
[ 1.860068] CR2: 0000000000000197
[ 1.860068] ---[ end trace 3d3d835bcb59d5fe ]---
[ 1.860068] Kernel panic - not syncing: Fatal exception
[ 1.860068] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
[ 1.860068] Rebooting in 1 seconds..

	Sami

> >
> > # first bad commit: [908790fa3b779d37365e6b28e3aa0f6e833020c3] dcache: d_splice_alias mustn't create directory aliases
> >
> > commit 908790fa3b779d37365e6b28e3aa0f6e833020c3
> > Author: J. Bruce Fields <bfields@redhat.com>
> > Date: Mon Feb 17 17:58:42 2014 -0500
> >
> >     dcache: d_splice_alias mustn't create directory aliases
> >
> >     Currently if d_splice_alias finds a directory with an alias that is not
> >     IS_ROOT or not DCACHE_DISCONNECTED, it creates a duplicate directory.
> >
> >     Duplicate directory dentries are unacceptable; it is better just to
> >     error out.
> >
> >     (In the case of a local filesystem the most likely case is filesystem
> >     corruption: for example, perhaps two directories point to the same child
> >     directory, and the other parent has already been found and cached.)
> >
> >     Note that distributed filesystems may encounter this case in normal
> >     operation if a remote host moves a directory to a location different
> >     from the one we last cached in the dcache. For that reason, such
> >     filesystems should instead use d_materialise_unique, which tries to move
> >     the old directory alias to the right place instead of erroring out.
> >
> >     Signed-off-by: J. Bruce Fields <bfields@redhat.com>
> >     Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> >
> > --
> >
> > Sami
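Each broken image in this thread differs from its pristine counterpart in a single bit. A small helper such as the one below (hypothetical, not from the thread) reports the byte offset and bit index of every differing bit between two buffers. Applied to the 16-byte row at 0x80ac20 in the ext2 diff above, it finds one difference: byte 0x80ac2b goes from 0xff to 0xfd, i.e. bit 1 is cleared. Which on-disk structure that byte belongs to is not stated in the thread, so the sketch does not interpret it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct bit_diff {
	size_t offset;		/* image offset of the differing byte */
	unsigned bit;		/* 0 = least significant bit */
	int now_set;		/* 1 if the bit is set in the modified buffer */
};

/* Compares len bytes, records up to max differences in out[], and
 * returns the total number of differing bits (which may exceed max).
 * base is added to each offset so results read as image offsets. */
static size_t diff_bits(const uint8_t *orig, const uint8_t *mod, size_t len,
			size_t base, struct bit_diff *out, size_t max)
{
	size_t count = 0;

	for (size_t i = 0; i < len; i++) {
		uint8_t x = orig[i] ^ mod[i];

		for (unsigned b = 0; b < 8; b++) {
			if (!(x & (1u << b)))
				continue;
			if (count < max) {
				out[count].offset = base + i;
				out[count].bit = b;
				out[count].now_set = (mod[i] >> b) & 1;
			}
			count++;
		}
	}
	return count;
}
```

Reporting the total count separately from the filled entries lets a caller confirm the "single bit" claim (a return value of 1) even when out[] is small.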
* Re: A very similar crash on ext2
From: Darrick J. Wong @ 2014-10-21 0:28 UTC
To: Sami Liedes, linux-ext4

On Fri, Oct 10, 2014 at 12:28:02AM +0300, Sami Liedes wrote:
> On Thu, Oct 09, 2014 at 01:49:13PM -0700, Darrick J. Wong wrote:
> > Yeah. There's a directory that's linked twice (inode 195). The subsequent FS
> > walk loads the inode into memory twice (== i_count > 2). When you delete
> > everything on the FS, the inode gets put on the in-memory orphan list but for
> > whatever reason doesn't seem to get released via iput or something. This means
> > it's still on the orphan list at umount time, which triggers the BUG. Worse
> > yet, i_nlink is now 0...
> >
> > ...not clear what the appropriate course of action is here. The FS is corrupt
> > and we need to scrape the mess off the machine. I guess you could -EIO earlier
> > when you notice i_count > i_nlink?
>
> I don't know if this is exactly the same bug, but I'm also seeing a
> similar crash on ext2 which also bisected to this exact same commit
> (908790fa3b). The symptoms are a bit different, though; first a VFS
> warning about busy inodes after unmount, then shortly after that a
> crash.

ext4 spits up that crash message on umount because it thinks the orphan list
is messed up... but seems to avoid blowing up. ext2 doesn't know what an
orphan list is, so it goes straight to the VFS warning and then blows up
later, probably because it tries to do something with the (now torn down)
ext2 sb.

<shrug> I had a patch that would detect rmdir of multiply linked dirs, but I
think we ought to catch that sooner, if possible.
--D

> Pristine fs: http://www.niksula.hut.fi/~sliedes/ext2/testimg.ext2.bz2
>
> Broken fs: http://www.niksula.hut.fi/~sliedes/ext2/testimg.ext2.449.min.bz2
>
> Diff:
>
> --- /dev/fd/63 2014-10-10 00:20:59.562913594 +0300
> +++ /dev/fd/62 2014-10-10 00:20:59.562913594 +0300
> @@ -9785,6 +9785,8 @@
>  0080a8f0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 80 |................|
>  0080a900 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
>  *
> +0080ac20 ff ff ff ff ff ff ff ff ff ff ff fd ff ff ff ff |................|
> +0080ac30 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
>  0080ac40 ff ff 01 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
>  0080ac50 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
>  *
>
> Backtrace:
>
> [ 1.422976] VFS: Busy inodes after unmount of vdb. Self-destruct in 5 seconds. Have a nice day...
> [ 1.857020] BUG: unable to handle kernel NULL pointer dereference at 0000000000000197
> [ 1.858178] IP: [<ffffffff810a0859>] __lock_acquire.isra.31+0x199/0xd70
> [ 1.859047] PGD 633a067 PUD 5171067 PMD 0
> [ 1.859524] Oops: 0002 [#1] SMP
> [ 1.859842] CPU: 0 PID: 59 Comm: kworker/u2:1 Not tainted 3.16.0+ #94
> [ 1.860068] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
> [ 1.860068] Workqueue: writeback bdi_writeback_workfn (flush-254:16)
> [ 1.860068] task: ffff8800060f2060 ti: ffff880006104000 task.ti: ffff880006104000
> [ 1.860068] RIP: 0010:[<ffffffff810a0859>] [<ffffffff810a0859>] __lock_acquire.isra.31+0x199/0xd70
> [ 1.860068] RSP: 0018:ffff880006107b28 EFLAGS: 00010086
> [ 1.860068] RAX: 0000000000000000 RBX: ffff8800060f2060 RCX: 0000000000000001
> [ 1.860068] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff8800051cb0c8
> [ 1.860068] RBP: ffff880006107b90 R08: 0000000000000000 R09: 0000000000000000
> [ 1.860068] R10: ffff8800051cb0c8 R11: 0000000000000003 R12: 0000000000000001
> [ 1.860068] R13: 0000000000000001 R14: ffffffffffffffff R15: 0000000000000000
> [ 1.860068] FS: 0000000000000000(0000) GS:ffff880007c00000(0000) knlGS:0000000000000000
> [ 1.860068] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [ 1.860068] CR2: 0000000000000197 CR3: 000000000517c000 CR4: 00000000000006b0
> [ 1.860068] Stack:
> [ 1.860068] ffff880006107b88 ffff8800060f2770 ffffffff81170027 0000000000000096
> [ 1.860068] 0000000000000000 0000000000000000 ffff8800060f2770 000000000000003d
> [ 1.860068] 0000000000000286 0000000000000000 0000000000000001 0000000000000001
> [ 1.860068] Call Trace:
> [ 1.860068] [<ffffffff81170027>] ? SyS_sysfs+0xf7/0x1e0
> [ 1.860068] [<ffffffff810a1c46>] lock_acquire+0x96/0x130
> [ 1.860068] [<ffffffff81152aaf>] ? grab_super_passive+0x3f/0x90
> [ 1.860068] [<ffffffff8109e079>] down_read_trylock+0x59/0x60
> [ 1.860068] [<ffffffff81152aaf>] ? grab_super_passive+0x3f/0x90
> [ 1.860068] [<ffffffff81152aaf>] grab_super_passive+0x3f/0x90
> [ 1.860068] [<ffffffff8117c837>] __writeback_inodes_wb+0x57/0xd0
> [ 1.860068] [<ffffffff8117caeb>] wb_writeback+0x23b/0x320
> [ 1.860068] [<ffffffff8117ceed>] bdi_writeback_workfn+0x1cd/0x470
> [ 1.860068] [<ffffffff8107bf90>] process_one_work+0x1c0/0x580
> [ 1.860068] [<ffffffff8107bf27>] ? process_one_work+0x157/0x580
> [ 1.860068] [<ffffffff8107c3b3>] worker_thread+0x63/0x540
> [ 1.860068] [<ffffffff8107c350>] ? process_one_work+0x580/0x580
> [ 1.860068] [<ffffffff81081b81>] kthread+0xf1/0x110
> [ 1.860068] [<ffffffff81081a90>] ? __kthread_parkme+0x70/0x70
> [ 1.860068] [<ffffffff81850f2c>] ret_from_fork+0x7c/0xb0
> [ 1.860068] [<ffffffff81081a90>] ? __kthread_parkme+0x70/0x70
> [ 1.860068] Code: 0b 00 00 48 c7 c7 25 cd c8 81 31 c0 e8 31 4a fc ff eb a7 0f 1f 80 00 00 00 00 44 89 f8 4d 8b 74 c2 08 4d 85 f6 0f 84 c2 fe ff ff <3e> 41 ff 86 98 01 00 00 8b 05 f1 57 96 01 44 8b bb 90 06 00 00
> [ 1.860068] RIP [<ffffffff810a0859>] __lock_acquire.isra.31+0x199/0xd70
> [ 1.860068] RSP <ffff880006107b28>
> [ 1.860068] CR2: 0000000000000197
> [ 1.860068] ---[ end trace 3d3d835bcb59d5fe ]---
> [ 1.860068] Kernel panic - not syncing: Fatal exception
> [ 1.860068] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
> [ 1.860068] Rebooting in 1 seconds..
>
> Sami
>
> >
> > > # first bad commit: [908790fa3b779d37365e6b28e3aa0f6e833020c3] dcache: d_splice_alias mustn't create directory aliases
> > >
> > > commit 908790fa3b779d37365e6b28e3aa0f6e833020c3
> > > Author: J. Bruce Fields <bfields@redhat.com>
> > > Date: Mon Feb 17 17:58:42 2014 -0500
> > >
> > >     dcache: d_splice_alias mustn't create directory aliases
> > >
> > >     Currently if d_splice_alias finds a directory with an alias that is not
> > >     IS_ROOT or not DCACHE_DISCONNECTED, it creates a duplicate directory.
> > >
> > >     Duplicate directory dentries are unacceptable; it is better just to
> > >     error out.
> > >
> > >     (In the case of a local filesystem the most likely case is filesystem
> > >     corruption: for example, perhaps two directories point to the same child
> > >     directory, and the other parent has already been found and cached.)
> > >
> > >     Note that distributed filesystems may encounter this case in normal
> > >     operation if a remote host moves a directory to a location different
> > >     from the one we last cached in the dcache. For that reason, such
> > >     filesystems should instead use d_materialise_unique, which tries to move
> > >     the old directory alias to the right place instead of erroring out.
> > >
> > >     Signed-off-by: J. Bruce Fields <bfields@redhat.com>
> > >     Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> > >
> > > --
> > >
> > > Sami
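The ext2 diff quoted above looks like a two-row change, but hexdump collapses runs of identical 16-byte rows into "*", so the all-ff row at 0080ac30 only shows up because the modified row at 0080ac20 broke that run. A minimal Python sketch for pinpointing the flipped bit, assuming the pristine row at 0x0080ac20 was all 0xff (inferred from the "*" run, not shown directly in the diff); flipped_bits is a hypothetical helper written for this illustration:

```python
# Locate the flipped bit(s) in the ext2 diff quoted above.
# Assumption: the pristine row at 0x0080ac20 was all 0xff, because it was
# hidden inside hexdump's "*" run of identical all-0xff rows.
ROW_OFFSET = 0x0080AC20
PRISTINE_ROW = bytes([0xFF] * 16)
BROKEN_ROW = bytes.fromhex("ff ff ff ff ff ff ff ff ff ff ff fd ff ff ff ff")


def flipped_bits(offset, old, new):
    """Return (byte offset, bit index, "set"/"cleared") for every differing bit.

    Hypothetical helper for illustration; bit 0 is the least significant bit.
    """
    if len(old) != len(new):
        raise ValueError("rows must be the same length")
    changes = []
    for i, (a, b) in enumerate(zip(old, new)):
        diff = a ^ b  # 1-bits mark positions that changed
        for bit in range(8):
            if diff & (1 << bit):
                state = "set" if b & (1 << bit) else "cleared"
                changes.append((offset + i, bit, state))
    return changes


if __name__ == "__main__":
    for off, bit, state in flipped_bits(ROW_OFFSET, PRISTINE_ROW, BROKEN_ROW):
        print(f"byte 0x{off:08x}: bit {bit} {state}")
```

Run against those rows, it reports a single change: bit 1 of byte 0x0080ac2b cleared (0xff became 0xfd), i.e. the broken image differs from the pristine one by exactly one bit.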
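The rule in the bisected commit quoted above can be illustrated with a toy model. This is not the kernel's d_splice_alias: the Dentry and ToyDcache classes, the DirectoryAliasError exception and the one-alias-per-directory-inode map are hypothetical simplifications, and the real function's return values, locking and d_materialise_unique path are not modeled. It only shows the decision the commit message describes: a cached alias that is both IS_ROOT and DCACHE_DISCONNECTED may be reused, anything else now errors out instead of producing a duplicate directory dentry (for example when a corrupted image has two directories pointing at the same child directory).

```python
from dataclasses import dataclass
from typing import Dict, Optional


class DirectoryAliasError(Exception):
    """Hypothetical stand-in for the error returned instead of a duplicate alias."""


@dataclass
class Dentry:
    name: str
    parent: Optional["Dentry"]
    is_root: bool = False       # models IS_ROOT(): an anonymous, unattached dentry
    disconnected: bool = False  # models the DCACHE_DISCONNECTED flag


class ToyDcache:
    """Toy model: at most one cached dentry (alias) per directory inode."""

    def __init__(self) -> None:
        self.dir_alias: Dict[int, Dentry] = {}

    def splice_dir(self, ino: int, name: str, parent: Dentry) -> Dentry:
        alias = self.dir_alias.get(ino)
        if alias is None:
            # First lookup of this directory inode: cache a fresh dentry.
            dentry = Dentry(name, parent)
            self.dir_alias[ino] = dentry
            return dentry
        if alias.is_root and alias.disconnected:
            # A disconnected root alias may be moved into place and reused.
            alias.name, alias.parent = name, parent
            alias.is_root = alias.disconnected = False
            return alias
        # Already attached elsewhere: before the commit a duplicate directory
        # dentry was created here; after it, the lookup errors out.
        raise DirectoryAliasError(
            f"inode {ino} already cached as {alias.name!r}; refusing alias {name!r}"
        )


if __name__ == "__main__":
    root = Dentry("/", None)
    cache = ToyDcache()
    # Corrupted image: directories "a" and "b" both contain inode 12.
    cache.splice_dir(12, "child", Dentry("a", root))
    try:
        cache.splice_dir(12, "child", Dentry("b", root))
    except DirectoryAliasError as err:
        print("second lookup failed:", err)
```

The model deliberately keeps only the connected/disconnected distinction from the commit message; everything else about dentry lifetime is left out.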
Thread overview: 15+ messages
2014-10-05 0:12 Intentionally corrupted ext4s causing two different kernel panics at umount Sami Liedes
2014-10-06 2:48 ` [PATCH 1/2] ext4: don't orphan or truncate the boot loader inode Theodore Ts'o
2014-10-06 2:48 ` [PATCH 2/2] ext4: add ext4_iget_normal() which is to be used for dir tree lookups Theodore Ts'o
2014-10-06 2:52 ` Andreas Dilger
2014-10-06 3:16 ` Theodore Ts'o
2014-10-06 15:09 ` Jan Kara
2014-10-06 18:55 ` Theodore Ts'o
2014-10-06 15:06 ` [PATCH 1/2] ext4: don't orphan or truncate the boot loader inode Jan Kara
2014-10-07 20:56 ` One more corrupted fs crash in ext4_put_super Sami Liedes
2014-10-07 21:57 ` Darrick J. Wong
2014-10-07 22:22 ` Darrick J. Wong
2014-10-09 20:15 ` Sami Liedes
2014-10-09 20:49 ` Darrick J. Wong
2014-10-09 21:28 ` A very similar crash on ext2 Sami Liedes
2014-10-21 0:28 ` Darrick J. Wong