* 2.6.35-r5 ext3 corruptions
@ 2010-07-15 10:57 Dave Chinner
  2010-07-15 11:26 ` Dave Chinner
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Dave Chinner @ 2010-07-15 10:57 UTC
To: linux-ext4; +Cc: linux-kernel

Upgrading my test vms from 2.6.35-rc3 to 2.6.35-rc5 is resulting in
repeated errors on the root drive of a test VM:

[ 1532.368808] EXT3-fs error (device sda1): ext3_lookup: deleted inode referenced: 211043
[ 1532.370859] Aborting journal on device sda1.
[ 1532.376957] EXT3-fs (sda1):
[ 1532.376976] EXT3-fs (sda1): error: ext3_journal_start_sb: Detected aborted journal
[ 1532.376980] EXT3-fs (sda1): error: remounting filesystem read-only
[ 1532.420361] error: remounting filesystem read-only
[ 1532.621209] EXT3-fs error (device sda1): ext3_lookup: deleted inode referenced: 211043

The filesystem is a mess when checked on reboot - lots of illegal
references to blocks, multiply linked blocks, etc, but repairs.
Files are lost, truncated, etc, so there is visible filesystem
damage.

I did lots of testing on 2.6.35-rc3 and came across no problems;
problems only seemed to start with 2.6.35-rc5, and I've reproduced
the problem on a vanilla 2.6.35-rc4.

The problem seems to occur randomly - sometimes during boot or when
idle after boot, sometimes a while after boot. I haven't done any
digging at all for the cause - all I've done so far is confirm that
it is reproducible and it's not my code causing the problem.

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: 2.6.35-r5 ext3 corruptions
  2010-07-15 10:57 2.6.35-r5 ext3 corruptions Dave Chinner
@ 2010-07-15 11:26 ` Dave Chinner
  2010-07-15 18:23 ` Josef Bacik
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 13+ messages in thread
From: Dave Chinner @ 2010-07-15 11:26 UTC
To: linux-ext4; +Cc: linux-kernel

On Thu, Jul 15, 2010 at 08:57:45PM +1000, Dave Chinner wrote:
> Upgrading my test vms from 2.6.35-rc3 to 2.6.35-rc5 is resulting in
> repeated errors on the root drive of a test VM:
>
> [ 1532.368808] EXT3-fs error (device sda1): ext3_lookup: deleted inode referenced: 211043
> [ 1532.370859] Aborting journal on device sda1.
> [ 1532.376957] EXT3-fs (sda1):
> [ 1532.376976] EXT3-fs (sda1): error: ext3_journal_start_sb: Detected aborted journal
> [ 1532.376980] EXT3-fs (sda1): error: remounting filesystem read-only
> [ 1532.420361] error: remounting filesystem read-only
> [ 1532.621209] EXT3-fs error (device sda1): ext3_lookup: deleted inode referenced: 211043
>
> The filesystem is a mess when checked on reboot - lots of illegal
> references to blocks, multiply linked blocks, etc, but repairs.
> Files are lost, truncated, etc, so there is visible filesystem
> damage.
>
> I did lots of testing on 2.6.35-rc3 and came across no problems;
> problems only seemed to start with 2.6.35-rc5, and I've reproduced
> the problem on a vanilla 2.6.35-rc4.
>
> The problem seems to occur randomly - sometimes during boot or when
> idle after boot, sometimes a while after boot. I haven't done any
> digging at all for the cause - all I've done so far is confirm that
> it is reproducible and it's not my code causing the problem.

FWIW, a warning is triggering a few seconds after an error occurs:

[ 1025.201140] EXT3-fs error (device sda1): ext3_lookup: deleted inode referenced: 211043
[ 1025.203062] Aborting journal on device sda1.
[ 1025.217894] EXT3-fs (sda1): error: remounting filesystem read-only
[ 1025.271198] EXT3-fs error (device sda1): ext3_lookup: deleted inode referenced: 211043
[ 1039.116558] ------------[ cut here ]------------
[ 1039.117192] WARNING: at fs/ext3/inode.c:1534 ext3_ordered_writepage+0x213/0x230()
[ 1039.120544] Hardware name: Bochs
[ 1039.121036] Modules linked in: [last unloaded: scsi_wait_scan]
[ 1039.122103] Pid: 1838, comm: flush-8:0 Not tainted 2.6.35-rc5-dgc+ #34
[ 1039.122837] Call Trace:
[ 1039.123320]  [<ffffffff8107de0f>] warn_slowpath_common+0x7f/0xc0
[ 1039.123892]  [<ffffffff8107de6a>] warn_slowpath_null+0x1a/0x20
[ 1039.124461]  [<ffffffff811dc4d3>] ext3_ordered_writepage+0x213/0x230
[ 1039.125088]  [<ffffffff81114c6a>] __writepage+0x1a/0x50
[ 1039.125652]  [<ffffffff81115a47>] write_cache_pages+0x1f7/0x410
[ 1039.126233]  [<ffffffff81114c50>] ? __writepage+0x0/0x50
[ 1039.126796]  [<ffffffff8107303b>] ? cpuacct_charge+0x9b/0xb0
[ 1039.127371]  [<ffffffff81072fc2>] ? cpuacct_charge+0x22/0xb0
[ 1039.127947]  [<ffffffff8105ed38>] ? pvclock_clocksource_read+0x58/0xd0
[ 1039.128574]  [<ffffffff81115c87>] generic_writepages+0x27/0x30
[ 1039.129146]  [<ffffffff81115cc5>] do_writepages+0x35/0x40
[ 1039.129709]  [<ffffffff81171704>] writeback_single_inode+0xe4/0x3e0
[ 1039.130290]  [<ffffffff81171f29>] writeback_sb_inodes+0x199/0x2a0
[ 1039.130869]  [<ffffffff81172756>] writeback_inodes_wb+0x76/0x1a0
[ 1039.131444]  [<ffffffff81172acb>] wb_writeback+0x24b/0x2b0
[ 1039.132001]  [<ffffffff81172cad>] wb_do_writeback+0x17d/0x190
[ 1039.132597]  [<ffffffff81172d17>] bdi_writeback_task+0x57/0x160
[ 1039.133200]  [<ffffffff8109d1a7>] ? bit_waitqueue+0x17/0xc0
[ 1039.133771]  [<ffffffff81125200>] ? bdi_start_fn+0x0/0x100
[ 1039.134327]  [<ffffffff81125286>] bdi_start_fn+0x86/0x100
[ 1039.134876]  [<ffffffff81125200>] ? bdi_start_fn+0x0/0x100
[ 1039.135435]  [<ffffffff8109cdb6>] kthread+0x96/0xa0
[ 1039.135970]  [<ffffffff81035de4>] kernel_thread_helper+0x4/0x10
[ 1039.136575]  [<ffffffff817a5a90>] ? restore_args+0x0/0x30
[ 1039.137128]  [<ffffffff8109cd20>] ? kthread+0x0/0xa0
[ 1039.137701]  [<ffffffff81035de0>] ? kernel_thread_helper+0x0/0x10
[ 1039.138272] ---[ end trace 689f32ae8f9a7104 ]---

Of interest is that it is the same inode number that it tripped
over. It's always been inode numbers in the ~211000 range that have
been reported.

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com
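
[Editorial sketch, not part of the original message.] The observation that the same inode number keeps being reported can be checked mechanically from a saved kernel log. The following is a minimal Python sketch under stated assumptions: the function name count_deleted_inode_refs and the SAMPLE_LOG constant are hypothetical, and the regular expression only assumes the exact message wording quoted in the log excerpts above.

```python
import re
from collections import Counter

# Matches lines such as:
# [ 1532.368808] EXT3-fs error (device sda1): ext3_lookup: deleted inode referenced: 211043
# The pattern is an assumption based only on the wording shown in this thread.
DELETED_INODE_RE = re.compile(
    r"EXT3-fs error \(device (?P<dev>[^)]+)\): ext3_lookup: "
    r"deleted inode referenced: (?P<ino>\d+)"
)


def count_deleted_inode_refs(log_text):
    """Count "deleted inode referenced" errors per (device, inode) pair."""
    counts = Counter()
    for line in log_text.splitlines():
        match = DELETED_INODE_RE.search(line)
        if match:
            counts[(match.group("dev"), int(match.group("ino")))] += 1
    return counts


# Log lines copied from the message above.
SAMPLE_LOG = """\
[ 1025.201140] EXT3-fs error (device sda1): ext3_lookup: deleted inode referenced: 211043
[ 1025.203062] Aborting journal on device sda1.
[ 1025.217894] EXT3-fs (sda1): error: remounting filesystem read-only
[ 1025.271198] EXT3-fs error (device sda1): ext3_lookup: deleted inode referenced: 211043
"""

if __name__ == "__main__":
    for (dev, ino), n in sorted(count_deleted_inode_refs(SAMPLE_LOG).items()):
        print(f"{dev}: inode {ino} reported {n} time(s)")
```

Feeding it the output of several failing boots would show whether the reports really cluster on one inode or on a small range, which is the pattern described above.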
* Re: 2.6.35-r5 ext3 corruptions
  2010-07-15 10:57 2.6.35-r5 ext3 corruptions Dave Chinner
  2010-07-15 11:26 ` Dave Chinner
@ 2010-07-15 18:23 ` Josef Bacik
  2010-07-15 20:14 ` Johannes Hirte
  2010-07-19 22:45 ` Dave Chinner
  3 siblings, 0 replies; 13+ messages in thread
From: Josef Bacik @ 2010-07-15 18:23 UTC
To: Dave Chinner; +Cc: linux-ext4, linux-kernel

On Thu, Jul 15, 2010 at 08:57:45PM +1000, Dave Chinner wrote:
> Upgrading my test vms from 2.6.35-rc3 to 2.6.35-rc5 is resulting in
> repeated errors on the root drive of a test VM:
>
> [ 1532.368808] EXT3-fs error (device sda1): ext3_lookup: deleted inode referenced: 211043
> [ 1532.370859] Aborting journal on device sda1.
> [ 1532.376957] EXT3-fs (sda1):
> [ 1532.376976] EXT3-fs (sda1): error: ext3_journal_start_sb: Detected aborted journal
> [ 1532.376980] EXT3-fs (sda1): error: remounting filesystem read-only
> [ 1532.420361] error: remounting filesystem read-only
> [ 1532.621209] EXT3-fs error (device sda1): ext3_lookup: deleted inode referenced: 211043
>
> The filesystem is a mess when checked on reboot - lots of illegal
> references to blocks, multiply linked blocks, etc, but repairs.
> Files are lost, truncated, etc, so there is visible filesystem
> damage.
>
> I did lots of testing on 2.6.35-rc3 and came across no problems;
> problems only seemed to start with 2.6.35-rc5, and I've reproduced
> the problem on a vanilla 2.6.35-rc4.
>
> The problem seems to occur randomly - sometimes during boot or when
> idle after boot, sometimes a while after boot. I haven't done any
> digging at all for the cause - all I've done so far is confirm that
> it is reproducible and it's not my code causing the problem.
>

All I see from 2.6.35-rc4 that's changed is some writeback cleanups,
nothing that jumps out at me as being horribly broken. Could you drop a
dump_stack() in that "deleted inode referenced" message so I can see how
we're getting here? The other stack trace is just because writeback
started on a readonly fs, so it doesn't necessarily have anything to do
with the original problem. Thanks,

Josef
* Re: 2.6.35-r5 ext3 corruptions
  2010-07-15 10:57 2.6.35-r5 ext3 corruptions Dave Chinner
  2010-07-15 11:26 ` Dave Chinner
  2010-07-15 18:23 ` Josef Bacik
@ 2010-07-15 20:14 ` Johannes Hirte
  2010-07-19 22:45 ` Dave Chinner
  3 siblings, 0 replies; 13+ messages in thread
From: Johannes Hirte @ 2010-07-15 20:14 UTC
To: Dave Chinner; +Cc: linux-ext4, linux-kernel

On Thursday 15 July 2010, 12:57:45, Dave Chinner wrote:
> Upgrading my test vms from 2.6.35-rc3 to 2.6.35-rc5 is resulting in
> repeated errors on the root drive of a test VM:
>
> [ 1532.368808] EXT3-fs error (device sda1): ext3_lookup: deleted inode referenced: 211043
> [ 1532.370859] Aborting journal on device sda1.
> [ 1532.376957] EXT3-fs (sda1):
> [ 1532.376976] EXT3-fs (sda1): error: ext3_journal_start_sb: Detected aborted journal
> [ 1532.376980] EXT3-fs (sda1): error: remounting filesystem read-only
> [ 1532.420361] error: remounting filesystem read-only
> [ 1532.621209] EXT3-fs error (device sda1): ext3_lookup: deleted inode referenced: 211043
>
> The filesystem is a mess when checked on reboot - lots of illegal
> references to blocks, multiply linked blocks, etc, but repairs.
> Files are lost, truncated, etc, so there is visible filesystem
> damage.
>
> I did lots of testing on 2.6.35-rc3 and came across no problems;
> problems only seemed to start with 2.6.35-rc5, and I've reproduced
> the problem on a vanilla 2.6.35-rc4.
>
> The problem seems to occur randomly - sometimes during boot or when
> idle after boot, sometimes a while after boot. I haven't done any
> digging at all for the cause - all I've done so far is confirm that
> it is reproducible and it's not my code causing the problem.

This sounds like the errors I've encountered with btrfs and XFS:

http://lkml.org/lkml/2010/7/8/181

I'm not sure, but it's quite possible that this started with the change
from 2.6.35-rc3 to 2.6.35-rc4.

regards,
Johannes
* Re: 2.6.35-r5 ext3 corruptions
  2010-07-15 10:57 2.6.35-r5 ext3 corruptions Dave Chinner
                   ` (2 preceding siblings ...)
  2010-07-15 20:14 ` Johannes Hirte
@ 2010-07-19 22:45 ` Dave Chinner
  2010-07-21  6:32   ` Dave Chinner
  3 siblings, 1 reply; 13+ messages in thread
From: Dave Chinner @ 2010-07-19 22:45 UTC
To: linux-ext4; +Cc: linux-kernel

On Thu, Jul 15, 2010 at 08:57:45PM +1000, Dave Chinner wrote:
> Upgrading my test vms from 2.6.35-rc3 to 2.6.35-rc5 is resulting in
> repeated errors on the root drive of a test VM:
>
> [ 1532.368808] EXT3-fs error (device sda1): ext3_lookup: deleted inode referenced: 211043
> [ 1532.370859] Aborting journal on device sda1.
> [ 1532.376957] EXT3-fs (sda1):
> [ 1532.376976] EXT3-fs (sda1): error: ext3_journal_start_sb: Detected aborted journal
> [ 1532.376980] EXT3-fs (sda1): error: remounting filesystem read-only
> [ 1532.420361] error: remounting filesystem read-only
> [ 1532.621209] EXT3-fs error (device sda1): ext3_lookup: deleted inode referenced: 211043
>
> The filesystem is a mess when checked on reboot - lots of illegal
> references to blocks, multiply linked blocks, etc, but repairs.
> Files are lost, truncated, etc, so there is visible filesystem
> damage.
>
> I did lots of testing on 2.6.35-rc3 and came across no problems;
> problems only seemed to start with 2.6.35-rc5, and I've reproduced
> the problem on a vanilla 2.6.35-rc4.
>
> The problem seems to occur randomly - sometimes during boot or when
> idle after boot, sometimes a while after boot. I haven't done any
> digging at all for the cause - all I've done so far is confirm that
> it is reproducible and it's not my code causing the problem.

Looks like this problem was isolated to a single VM and root
filesystem. I could not reproduce it on anything other than the
one filesystem that was failing.

Unfortunately, I had a fat-fingered moment and backed up the wrong
filesystem image at the outset. So after I smashed the original
filesystem into oblivion (one failure led to half the filesystem
in lost+found), I had nothing to restore from to continue testing.

So I re-imaged the root filesystem and the problem has not occurred
despite trying for more than a day. When it was bad, it didn't take
more than a few minutes of activity to reproduce. Hence I can only
conclude there was something wrong with the filesystem itself that
wasn't being detected, not some more generic problem....

I'll go add this to the bugzilla and close it down.

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com
* Re: 2.6.35-r5 ext3 corruptions
  2010-07-19 22:45 ` Dave Chinner
@ 2010-07-21  6:32   ` Dave Chinner
  2010-07-21 11:07     ` Török Edwin
                       ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Dave Chinner @ 2010-07-21 6:32 UTC
To: linux-ext4; +Cc: linux-kernel

On Tue, Jul 20, 2010 at 08:45:12AM +1000, Dave Chinner wrote:
> On Thu, Jul 15, 2010 at 08:57:45PM +1000, Dave Chinner wrote:
> > Upgrading my test vms from 2.6.35-rc3 to 2.6.35-rc5 is resulting in
> > repeated errors on the root drive of a test VM:
> >
> > [ 1532.368808] EXT3-fs error (device sda1): ext3_lookup: deleted inode referenced: 211043
> > [ 1532.370859] Aborting journal on device sda1.
> > [ 1532.376957] EXT3-fs (sda1):
> > [ 1532.376976] EXT3-fs (sda1): error: ext3_journal_start_sb: Detected aborted journal
> > [ 1532.376980] EXT3-fs (sda1): error: remounting filesystem read-only
> > [ 1532.420361] error: remounting filesystem read-only
> > [ 1532.621209] EXT3-fs error (device sda1): ext3_lookup: deleted inode referenced: 211043
> >
> > The filesystem is a mess when checked on reboot - lots of illegal
> > references to blocks, multiply linked blocks, etc, but repairs.
> > Files are lost, truncated, etc, so there is visible filesystem
> > damage.
> >
> > I did lots of testing on 2.6.35-rc3 and came across no problems;
> > problems only seemed to start with 2.6.35-rc5, and I've reproduced
> > the problem on a vanilla 2.6.35-rc4.
> >
> > The problem seems to occur randomly - sometimes during boot or when
> > idle after boot, sometimes a while after boot. I haven't done any
> > digging at all for the cause - all I've done so far is confirm that
> > it is reproducible and it's not my code causing the problem.
>
> Looks like this problem was isolated to a single VM and root
> filesystem. I could not reproduce it on anything other than the
> one filesystem that was failing.

Ok, so now I know *why* that one filesystem got busted - I built a
kernel without CONFIG_EXT3_DEFAULTS_TO_ORDERED set and it got a
forced reboot (echo b > /proc/sysrq-trigger). That'll teach me for
trying to reproduce bugs Andrew is tripping over with his config
files.

Quite frankly, data=writeback mode for ext3 is a dangerous,
dangerous configuration to run by default. IMO, it shouldn't be the
default. Patch below.

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com

ext3: default to ordered mode

From: Dave Chinner <dchinner@redhat.com>

data=writeback mode is dangerous and leads to filesystem
corruption, data loss and stale data exposure when systems crash. It
should not be the default, especially when all major distros ensure
their ext3 filesystems default to ordered mode. Change the default
mode to the safer data=ordered mode, because we should be caring
far more about avoiding corruption than performance.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/ext3/Kconfig |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/fs/ext3/Kconfig b/fs/ext3/Kconfig
index 522b154..e8c6ba0 100644
--- a/fs/ext3/Kconfig
+++ b/fs/ext3/Kconfig
@@ -31,6 +31,7 @@ config EXT3_FS
 config EXT3_DEFAULTS_TO_ORDERED
 	bool "Default to 'data=ordered' in ext3"
 	depends on EXT3_FS
+	default y
 	help
 	  The journal mode options for ext3 have different tradeoffs
 	  between when data is guaranteed to be on disk and
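
[Editorial sketch, not part of the original message.] Whether a given kernel build would default ext3 to data=writeback can be read off its .config. The Python sketch below is an illustration under assumptions: the function name ext3_default_data_mode is hypothetical, and the rule it encodes (ext3 built without CONFIG_EXT3_DEFAULTS_TO_ORDERED means a writeback default) is taken only from the description in this thread, not from an inspection of the kernel source. It says nothing about a data= option given at mount time, which overrides the compiled-in default.

```python
def ext3_default_data_mode(config_text):
    """Return the compiled-in ext3 default data mode implied by a .config.

    Returns "ordered", "writeback", or None when the ext3 driver
    (CONFIG_EXT3_FS) is not enabled at all.
    """
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        # Disabled options appear as "# CONFIG_FOO is not set"; skip them
        # along with any other comment or malformed line.
        if line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        if value in ("y", "m"):
            enabled.add(name)
    if "CONFIG_EXT3_FS" not in enabled:
        return None
    if "CONFIG_EXT3_DEFAULTS_TO_ORDERED" in enabled:
        return "ordered"
    # Per this thread: ext3 without the option set defaults to writeback.
    return "writeback"


if __name__ == "__main__":
    sample = "CONFIG_EXT3_FS=y\n# CONFIG_EXT3_DEFAULTS_TO_ORDERED is not set\n"
    print(ext3_default_data_mode(sample))
```

Returning None when CONFIG_EXT3_FS is absent keeps the helper from guessing about configurations where another driver handles ext3 filesystems.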
* Re: 2.6.35-r5 ext3 corruptions
  2010-07-21  6:32   ` Dave Chinner
@ 2010-07-21 11:07     ` Török Edwin
  2010-07-21 14:01       ` Eric Sandeen
  2010-07-21 13:55     ` Eric Sandeen
  2010-07-23 10:43     ` Jan Kara
  2 siblings, 1 reply; 13+ messages in thread
From: Török Edwin @ 2010-07-21 11:07 UTC
To: Dave Chinner; +Cc: linux-ext4, linux-kernel

On Wed, 21 Jul 2010 16:32:22 +1000
Dave Chinner <david@fromorbit.com> wrote:

> On Tue, Jul 20, 2010 at 08:45:12AM +1000, Dave Chinner wrote:
> > On Thu, Jul 15, 2010 at 08:57:45PM +1000, Dave Chinner wrote:
> > > Upgrading my test vms from 2.6.35-rc3 to 2.6.35-rc5 is resulting
> > > in repeated errors on the root drive of a test VM:
> > >
> > > [ 1532.368808] EXT3-fs error (device sda1): ext3_lookup: deleted inode referenced: 211043
> > > [ 1532.370859] Aborting journal on device sda1.
> > > [ 1532.376957] EXT3-fs (sda1):
> > > [ 1532.376976] EXT3-fs (sda1): error: ext3_journal_start_sb: Detected aborted journal
> > > [ 1532.376980] EXT3-fs (sda1): error: remounting filesystem read-only
> > > [ 1532.420361] error: remounting filesystem read-only
> > > [ 1532.621209] EXT3-fs error (device sda1): ext3_lookup: deleted inode referenced: 211043
> > >
> > > The filesystem is a mess when checked on reboot - lots of illegal
> > > references to blocks, multiply linked blocks, etc, but repairs.
> > > Files are lost, truncated, etc, so there is visible filesystem
> > > damage.
> > >
> > > I did lots of testing on 2.6.35-rc3 and came across no problems;
> > > problems only seemed to start with 2.6.35-rc5, and I've reproduced
> > > the problem on a vanilla 2.6.35-rc4.
> > >
> > > The problem seems to occur randomly - sometimes during boot or
> > > when idle after boot, sometimes a while after boot. I haven't
> > > done any digging at all for the cause - all I've done so far is
> > > confirm that it is reproducible and it's not my code causing the
> > > problem.
> >
> > Looks like this problem was isolated to a single VM and root
> > filesystem. I could not reproduce it on anything other than the
> > one filesystem that was failing.
>
> Ok, so now I know *why* that one filesystem got busted - I built a
> kernel without CONFIG_EXT3_DEFAULTS_TO_ORDERED set and it got a
> forced reboot (echo b > /proc/sysrq-trigger). That'll teach me for
> trying to reproduce bugs Andrew is tripping over with his config
> files.
>
> Quite frankly, data=writeback mode for ext3 is a dangerous,
> dangerous configuration to run by default. IMO, it shouldn't be the
> default. Patch below.

Hi,

I don't see CONFIG_EXT3_DEFAULTS_TO_ORDERED in my .config at all.
What I have in my .config is:
CONFIG_EXT4_FS=y
CONFIG_EXT4_USE_FOR_EXT23=y
CONFIG_EXT4_FS_XATTR=y

So what is the equivalent of that config option for ext4 used as ext3
driver?

Best regards,
--Edwin
* Re: 2.6.35-r5 ext3 corruptions
  2010-07-21 11:07     ` Török Edwin
@ 2010-07-21 14:01       ` Eric Sandeen
  0 siblings, 0 replies; 13+ messages in thread
From: Eric Sandeen @ 2010-07-21 14:01 UTC
To: Török Edwin; +Cc: Dave Chinner, linux-ext4, linux-kernel

Török Edwin wrote:
> On Wed, 21 Jul 2010 16:32:22 +1000
> Dave Chinner <david@fromorbit.com> wrote:

...

>> Quite frankly, data=writeback mode for ext3 is a dangerous,
>> dangerous configuration to run by default. IMO, it shouldn't be the
>> default. Patch below.
>
> Hi,
>
> I don't see CONFIG_EXT3_DEFAULTS_TO_ORDERED in my .config at all.
> What I have in my .config is:
> CONFIG_EXT4_FS=y
> CONFIG_EXT4_USE_FOR_EXT23=y
> CONFIG_EXT4_FS_XATTR=y
>
> So what is the equivalent of that config option for ext4 used as ext3
> driver?

There is none AFAICT, just an artifact of the twisting option-paths
of extN, I guess. :)

Good news is you should get something sane, and -not- default
to writeback with your config.

-Eric

> Best regards,
> --Edwin
* Re: 2.6.35-r5 ext3 corruptions
  2010-07-21  6:32   ` Dave Chinner
  2010-07-21 11:07     ` Török Edwin
@ 2010-07-21 13:55     ` Eric Sandeen
  2010-07-23 10:43     ` Jan Kara
  2 siblings, 0 replies; 13+ messages in thread
From: Eric Sandeen @ 2010-07-21 13:55 UTC
To: Dave Chinner; +Cc: linux-ext4, linux-kernel, Jan Kara

Dave Chinner wrote:

> Ok, so now I know *why* that one filesystem got busted - I built a
> kernel without CONFIG_EXT3_DEFAULTS_TO_ORDERED set and it got a
> forced reboot (echo b > /proc/sysrq-trigger). That'll teach me for
> trying to reproduce bugs Andrew is tripping over with his config
> files.
>
> Quite frankly, data=writeback mode for ext3 is a dangerous,
> dangerous configuration to run by default. IMO, it shouldn't be the
> default. Patch below.

I agree, though I might just remove the config option altogether,
it just obfuscates what's going on, IMHO.

Still, as far as it goes, you can add:

Acked-by: Eric Sandeen <sandeen@redhat.com>

to the patch.

-Eric

> ext3: default to ordered mode
>
> From: Dave Chinner <dchinner@redhat.com>
>
> data=writeback mode is dangerous and leads to filesystem
> corruption, data loss and stale data exposure when systems crash. It
> should not be the default, especially when all major distros ensure
> their ext3 filesystems default to ordered mode. Change the default
> mode to the safer data=ordered mode, because we should be caring
> far more about avoiding corruption than performance.
>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/ext3/Kconfig |    1 +
>  1 files changed, 1 insertions(+), 0 deletions(-)
>
> diff --git a/fs/ext3/Kconfig b/fs/ext3/Kconfig
> index 522b154..e8c6ba0 100644
> --- a/fs/ext3/Kconfig
> +++ b/fs/ext3/Kconfig
> @@ -31,6 +31,7 @@ config EXT3_FS
>  config EXT3_DEFAULTS_TO_ORDERED
>  	bool "Default to 'data=ordered' in ext3"
>  	depends on EXT3_FS
> +	default y
>  	help
>  	  The journal mode options for ext3 have different tradeoffs
>  	  between when data is guaranteed to be on disk and
* Re: 2.6.35-r5 ext3 corruptions
  2010-07-21  6:32   ` Dave Chinner
  2010-07-21 11:07     ` Török Edwin
  2010-07-21 13:55     ` Eric Sandeen
@ 2010-07-23 10:43     ` Jan Kara
  2010-07-23 10:53       ` Jan Kara
  2010-07-23 10:58       ` Christoph Hellwig
  2 siblings, 2 replies; 13+ messages in thread
From: Jan Kara @ 2010-07-23 10:43 UTC
To: Dave Chinner; +Cc: linux-ext4, linux-kernel

> On Tue, Jul 20, 2010 at 08:45:12AM +1000, Dave Chinner wrote:
> > On Thu, Jul 15, 2010 at 08:57:45PM +1000, Dave Chinner wrote:
> > > Upgrading my test vms from 2.6.35-rc3 to 2.6.35-rc5 is resulting in
> > > repeated errors on the root drive of a test VM:
> > >
> > > [ 1532.368808] EXT3-fs error (device sda1): ext3_lookup: deleted inode referenced: 211043
> > > [ 1532.370859] Aborting journal on device sda1.
> > > [ 1532.376957] EXT3-fs (sda1):
> > > [ 1532.376976] EXT3-fs (sda1): error: ext3_journal_start_sb: Detected aborted journal
> > > [ 1532.376980] EXT3-fs (sda1): error: remounting filesystem read-only
> > > [ 1532.420361] error: remounting filesystem read-only
> > > [ 1532.621209] EXT3-fs error (device sda1): ext3_lookup: deleted inode referenced: 211043
> > >
> > > The filesystem is a mess when checked on reboot - lots of illegal
> > > references to blocks, multiply linked blocks, etc, but repairs.
> > > Files are lost, truncated, etc, so there is visible filesystem
> > > damage.
> > >
> > > I did lots of testing on 2.6.35-rc3 and came across no problems;
> > > problems only seemed to start with 2.6.35-rc5, and I've reproduced
> > > the problem on a vanilla 2.6.35-rc4.
> > >
> > > The problem seems to occur randomly - sometimes during boot or when
> > > idle after boot, sometimes a while after boot. I haven't done any
> > > digging at all for the cause - all I've done so far is confirm that
> > > it is reproducible and it's not my code causing the problem.
> >
> > Looks like this problem was isolated to a single VM and root
> > filesystem. I could not reproduce it on anything other than the
> > one filesystem that was failing.
>
> Ok, so now I know *why* that one filesystem got busted - I built a
> kernel without CONFIG_EXT3_DEFAULTS_TO_ORDERED set and it got a
> forced reboot (echo b > /proc/sysrq-trigger). That'll teach me for
> trying to reproduce bugs Andrew is tripping over with his config
> files.
>
> Quite frankly, data=writeback mode for ext3 is a dangerous,
> dangerous configuration to run by default. IMO, it shouldn't be the
> default. Patch below.

data=writeback can cause larger data loss and stale data exposure but it
actually shouldn't cause filesystem corruption about which you write in the
changelog below. I'd much rather attribute the metadata corruption to a missing
barrier option or barrier support in the virtualization stack. But I guess it's
hard to tell now.
Anyways, I agree with you that data=ordered is a saner default so I'll
push your change.

								Honza

> --
> Dave Chinner
> david@fromorbit.com
>
> ext3: default to ordered mode
>
> From: Dave Chinner <dchinner@redhat.com>
>
> data=writeback mode is dangerous and leads to filesystem
> corruption, data loss and stale data exposure when systems crash. It
> should not be the default, especially when all major distros ensure
> their ext3 filesystems default to ordered mode. Change the default
> mode to the safer data=ordered mode, because we should be caring
> far more about avoiding corruption than performance.
>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/ext3/Kconfig |    1 +
>  1 files changed, 1 insertions(+), 0 deletions(-)
>
> diff --git a/fs/ext3/Kconfig b/fs/ext3/Kconfig
> index 522b154..e8c6ba0 100644
> --- a/fs/ext3/Kconfig
> +++ b/fs/ext3/Kconfig
> @@ -31,6 +31,7 @@ config EXT3_FS
>  config EXT3_DEFAULTS_TO_ORDERED
>  	bool "Default to 'data=ordered' in ext3"
>  	depends on EXT3_FS
> +	default y
>  	help
>  	  The journal mode options for ext3 have different tradeoffs
>  	  between when data is guaranteed to be on disk and
--
Jan Kara <jack@suse.cz>
SuSE CR Labs
* Re: 2.6.35-r5 ext3 corruptions
  2010-07-23 10:43     ` Jan Kara
@ 2010-07-23 10:53       ` Jan Kara
  2010-07-23 10:55         ` Dave Chinner
  2010-07-23 10:58       ` Christoph Hellwig
  1 sibling, 1 reply; 13+ messages in thread
From: Jan Kara @ 2010-07-23 10:53 UTC
To: Dave Chinner; +Cc: linux-ext4, linux-kernel

> > On Tue, Jul 20, 2010 at 08:45:12AM +1000, Dave Chinner wrote:
> > > On Thu, Jul 15, 2010 at 08:57:45PM +1000, Dave Chinner wrote:
> > > > Upgrading my test vms from 2.6.35-rc3 to 2.6.35-rc5 is resulting in
> > > > repeated errors on the root drive of a test VM:
> > > >
> > > > [ 1532.368808] EXT3-fs error (device sda1): ext3_lookup: deleted inode referenced: 211043
> > > > [ 1532.370859] Aborting journal on device sda1.
> > > > [ 1532.376957] EXT3-fs (sda1):
> > > > [ 1532.376976] EXT3-fs (sda1): error: ext3_journal_start_sb: Detected aborted journal
> > > > [ 1532.376980] EXT3-fs (sda1): error: remounting filesystem read-only
> > > > [ 1532.420361] error: remounting filesystem read-only
> > > > [ 1532.621209] EXT3-fs error (device sda1): ext3_lookup: deleted inode referenced: 211043
> > > >
> > > > The filesystem is a mess when checked on reboot - lots of illegal
> > > > references to blocks, multiply linked blocks, etc, but repairs.
> > > > Files are lost, truncated, etc, so there is visible filesystem
> > > > damage.
> > > >
> > > > I did lots of testing on 2.6.35-rc3 and came across no problems;
> > > > problems only seemed to start with 2.6.35-rc5, and I've reproduced
> > > > the problem on a vanilla 2.6.35-rc4.
> > > >
> > > > The problem seems to occur randomly - sometimes during boot or when
> > > > idle after boot, sometimes a while after boot. I haven't done any
> > > > digging at all for the cause - all I've done so far is confirm that
> > > > it is reproducible and it's not my code causing the problem.
> > >
> > > Looks like this problem was isolated to a single VM and root
> > > filesystem. I could not reproduce it on anything other than the
> > > one filesystem that was failing.
> >
> > Ok, so now I know *why* that one filesystem got busted - I built a
> > kernel without CONFIG_EXT3_DEFAULTS_TO_ORDERED set and it got a
> > forced reboot (echo b > /proc/sysrq-trigger). That'll teach me for
> > trying to reproduce bugs Andrew is tripping over with his config
> > files.
> >
> > Quite frankly, data=writeback mode for ext3 is a dangerous,
> > dangerous configuration to run by default. IMO, it shouldn't be the
> > default. Patch below.
> data=writeback can cause larger data loss and stale data exposure but it
> actually shouldn't cause filesystem corruption about which you write in the
> changelog below. I'd much rather attribute the metadata corruption to a missing
> barrier option or barrier support in the virtualization stack. But I guess it's
> hard to tell now.
> Anyways, I agree with you that data=ordered is a saner default so I'll
> push your change.

I've taken the change just updating the changelog to:
ext3: default to ordered mode

data=writeback mode is dangerous as it leads to higher data loss and stale data
exposure when systems crash. It should not be the default, especially when all
major distros ensure their ext3 filesystems default to ordered mode. Change the
default mode to the safer data=ordered mode, because we should be caring far
more about avoiding stale data exposure than performance.

Do you agree Dave?

								Honza
--
Jan Kara <jack@suse.cz>
SuSE CR Labs
* Re: 2.6.35-r5 ext3 corruptions
  2010-07-23 10:53       ` Jan Kara
@ 2010-07-23 10:55         ` Dave Chinner
  0 siblings, 0 replies; 13+ messages in thread
From: Dave Chinner @ 2010-07-23 10:55 UTC
To: Jan Kara; +Cc: linux-ext4, linux-kernel

On Fri, Jul 23, 2010 at 12:53:26PM +0200, Jan Kara wrote:
> I've taken the change just updating the changelog to:
> ext3: default to ordered mode
>
> data=writeback mode is dangerous as it leads to higher data loss and stale data
> exposure when systems crash. It should not be the default, especially when all
> major distros ensure their ext3 filesystems default to ordered mode. Change the
> default mode to the safer data=ordered mode, because we should be caring far
> more about avoiding stale data exposure than performance.
>
> Do you agree Dave?

Fine by me ;)

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com
* Re: 2.6.35-r5 ext3 corruptions
  2010-07-23 10:43     ` Jan Kara
  2010-07-23 10:53       ` Jan Kara
@ 2010-07-23 10:58       ` Christoph Hellwig
  1 sibling, 0 replies; 13+ messages in thread
From: Christoph Hellwig @ 2010-07-23 10:58 UTC
To: Jan Kara; +Cc: Dave Chinner, linux-ext4, linux-kernel

On Fri, Jul 23, 2010 at 12:43:14PM +0200, Jan Kara wrote:
> changelog below. I'd much rather attribute the metadata corruption to a missing
> barrier option or barrier support in the virtualization stack. But I guess it's
> hard to tell now.

Any recent qemu/kvm stack has perfectly working barrier support. Xen
is quite broken in that respect, but I hope no one is using that anyway.

But yes, with large write caches ext3 is rather broken due to the lack
of barriers. Fortunately enough at least the enterprise enable it
anyway these days.
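
[Editorial sketch, not part of the original message.] Both settings discussed in this thread, the journalling mode (data=) and barriers (barrier=), are mount options, so one way to see what a running system uses is to read them from a mounts table such as /proc/mounts. The Python sketch below rests on assumptions: the function name ext3_mount_report and the SAMPLE_MOUNTS text are hypothetical, and whether a given kernel lists data= or barrier= in /proc/mounts at all depends on the kernel version, so a missing option is reported as None rather than interpreted.

```python
def ext3_mount_report(mounts_text):
    """Summarise the data= and barrier= options of each ext3 mount.

    Expects text in /proc/mounts layout:
    device mountpoint fstype options dump pass
    """
    report = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[2] != "ext3":
            continue
        device, mountpoint, _fstype, options = fields[:4]
        opts = {}
        for opt in options.split(","):
            # "rw" has no value; "data=ordered" splits into key and value.
            key, _, value = opt.partition("=")
            opts[key] = value
        report.append({
            "device": device,
            "mountpoint": mountpoint,
            "data": opts.get("data"),        # None if not listed
            "barrier": opts.get("barrier"),  # None if not listed
        })
    return report


# Hypothetical mounts table used only for illustration.
SAMPLE_MOUNTS = """\
/dev/sda1 / ext3 rw,errors=remount-ro,data=writeback 0 0
/dev/sdb1 /home ext3 rw,barrier=1,data=ordered 0 0
proc /proc proc rw 0 0
"""

if __name__ == "__main__":
    for entry in ext3_mount_report(SAMPLE_MOUNTS):
        print(entry)
```

On a real system the input would be the contents of /proc/mounts; the sample table here exists only so the sketch runs on its own.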