* [4.3-rc1, regression] ext2 vs ext3/ext4 fs probing issues
From: Dave Chinner @ 2015-09-21  7:07 UTC
To: linux-ext4; +Cc: linux-fsdevel

Hi folks,

The first of my test VMs that I upgraded to 4.3-rc1 from 4.2 has
been behaving rather strangely w.r.t. boot hangs and ext3
filesystems.

On the first cold boot of a new kernel, the boot appears to hang.
What I've discovered (which took a long time thanks to the shitpile
that is systemd) is that it appears to be doing an e2fsck on the root
device, and that is failing, resulting in systemd outputting:

[FAILED] Failed to start File System Check on Root Device.

systemd then goes to shutting down the system and rebooting, but that
fails because it's still starting other stuff up and simultaneously
shutting shit down that the stuff starting up depends on. It fucks up
badly, with the last console entries looking like it has hung
waiting for lvm/md devices to appear.

I can't tell you what the e2fsck failure is, because systemd oh so
helpfully overwrites the console output from fsck with all the other
shit that it is doing concurrently. It also fails to log it anywhere
because this happens a) before the console logging has been
started and b) while the root fs is still only mounted RO.

Hence I have absolutely zero output apart from seeing the "checking
xxx%" update on the bottom line of the screen occasionally before it
is immediately overwritten by systemd logging something else.

So, warm reboot the VM (via system_reset in the qemu console), and
the system comes up. No fsck is run, but the filesystem is mounted
as ext2, not ext3. What I see is this in dmesg:

[    2.322798] EXT2-fs (sda1): warning: mounting ext3 filesystem as ext2
[    2.325123] VFS: Mounted root (ext2 filesystem) readonly on device 8:1.

It's definitely an ext3 filesystem, but the interesting point is
that it has a clean journal so can be mounted as ext2:

$ sudo blkid /dev/sda1
/dev/sda1: UUID="b21615e5-fe8a-4ffc-ab80-c24cdc8b740a" SEC_TYPE="ext2" TYPE="ext3" PARTUUID="000efa91-01"
$ sudo dumpe2fs -h /dev/sda1 | grep -i feature
dumpe2fs 1.42.13 (17-May-2015)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype sparse_super large_file
Journal features:         journal_incompat_revoke
$

So, I change the kernel config from:

CONFIG_EXT2_FS=y
# CONFIG_EXT4_USE_FOR_EXT2 is not set

to

# CONFIG_EXT2_FS is not set
CONFIG_EXT4_USE_FOR_EXT2=y

And what I see is this:

[    2.228894] EXT4-fs (sda1): mounting ext3 file system using the ext4 subsystem
[    2.238832] EXT4-fs (sda1): mounted filesystem with ordered data mode. Opts: (null)

Which tells me that there's an fstype probe ordering regression
w.r.t. ext2 and ext3 as a result of removing the ext3 module. It
also doesn't fail fsck checks now, so boots successfully every time.
I suspect the "boot hang" problem is that e2fsck sees a dirty
journal, fixes everything and then asks for a reboot, which fails.

So, suspecting this, I switched back to the original
CONFIG_EXT2_FS=y build:

[    2.254213] EXT2-fs (sda1): error: couldn't mount because of unsupported optional features (4)
[    2.257076] EXT4-fs (sda1): mounting ext3 file system using the ext4 subsystem
[    2.259712] EXT4-fs (sda1): INFO: recovery required on readonly filesystem
[    2.261740] EXT4-fs (sda1): write access will be enabled during recovery
[    2.336798] EXT4-fs (sda1): orphan cleanup on readonly fs
[    2.338492] EXT4-fs (sda1): 2 orphan inodes deleted
[    2.339922] EXT4-fs (sda1): recovery complete
[    2.346063] EXT4-fs (sda1): mounted filesystem with ordered data mode. Opts: (null)
[    2.348349] VFS: Mounted root (ext3 filesystem) readonly on device 8:1.

Which makes it pretty clear that if the journal was clean it would
have mounted as an ext2 filesystem, not ext3. Basically, we need to
ensure that the ext4 module probes filesystems before the ext2
module when CONFIG_EXT2_FS=y is set so that ext3 filesystems are
correctly mounted....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
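
The first-match behaviour described in the report above can be
sketched in a few lines of C. This is only an illustration, not the
kernel's code: the struct, the helper names and the two-entry probe
lists are invented here, and the ext2 check is reduced to the single
incompat bit (4, the "journal needs recovery" flag) that appears in
the dmesg output. The two feature values are the ext3 on-disk ones.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* ext3 on-disk feature bits; both happen to have the value 0x0004. */
#define COMPAT_HAS_JOURNAL 0x0004u /* filesystem has a journal */
#define INCOMPAT_RECOVER   0x0004u /* journal must be replayed */

/* Hypothetical, heavily reduced view of a superblock. */
struct sb_features {
    unsigned compat;   /* compat bits: unknown ones may be ignored */
    unsigned incompat; /* incompat bits: unknown ones forbid a mount */
};

/* Simplified ext2 rule: a journal alone is tolerated (compat bit),
 * but a journal that needs recovery is an unsupported incompat bit. */
static int ext2_can_mount(const struct sb_features *sb)
{
    return (sb->incompat & INCOMPAT_RECOVER) == 0;
}

/* ext4 handles ext3 filesystems whether or not the journal is dirty. */
static int ext4_can_mount(const struct sb_features *sb)
{
    (void)sb;
    return 1;
}

/* Try each driver name in `order`; the first one that accepts the
 * superblock wins. Returns NULL if nobody accepts it. */
static const char *probe_root(const char *const *order, size_t n,
                              const struct sb_features *sb)
{
    assert(order != NULL && sb != NULL);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(order[i], "ext2") == 0 && ext2_can_mount(sb))
            return "ext2";
        if (strcmp(order[i], "ext4") == 0 && ext4_can_mount(sb))
            return "ext4";
    }
    return NULL;
}
```

With the order { "ext2", "ext4" }, probe_root() picks "ext2" for an
ext3 superblock whose journal is clean and "ext4" for one with the
recovery bit set, which matches the two dmesg traces in the report.
With "ext4" listed first it picks "ext4" in both cases.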
* Re: [4.3-rc1, regression] ext2 vs ext3/ext4 fs probing issues
From: Dave Chinner @ 2015-09-25 23:51 UTC
To: linux-ext4; +Cc: linux-fsdevel

Ping?

On Mon, Sep 21, 2015 at 05:07:15PM +1000, Dave Chinner wrote:
> Hi folks,
>
> The first of my test VMs that I upgraded to 4.3-rc1 from 4.2 has
> been behaving rather strangely w.r.t. boot hangs and ext3
> filesystems.
[snip]

-- 
Dave Chinner
david@fromorbit.com
* Re: [4.3-rc1, regression] ext2 vs ext3/ext4 fs probing issues
From: Theodore Ts'o @ 2015-09-27 23:14 UTC
To: Dave Chinner; +Cc: linux-ext4, linux-fsdevel

On Sat, Sep 26, 2015 at 09:51:26AM +1000, Dave Chinner wrote:
> Ping?

Sorry, I somehow missed this the first time you posted it.

> > Which tells me that there's a problem with fstype probe ordering
> > regressions w.r.t ext2 and ext3 as a result of removing the ext3
> > module. It also doesn't fail fsck checks now, so boots successfully
> > every time. I suspect the "boot hang" problem is that e2fsck sees a
> > dirty journal, fixes everything and then asks for a reboot, which
> > fails.

The original probe ordering was: ext3, ext2, ext4.  As you've
correctly pointed out, this allowed us to preferentially use ext3 over
ext2 even if the file system did not need a journal replay.  We
put ext4 at the end so that if there was an ext2-only root file
system, we would use ext2 in preference over ext4.  This was useful
for backwards compatibility for certain ancient enterprise
distributions, and/or when ext4 was still under development and we
wanted to make sure for an ext2-only root, we would use ext2 if
possible.

When ext3 was removed, we were left with the boot order: ext2,
ext4.  We could swap this around, but in that case, ext4 would
*always* be used in preference to ext2, which is not necessarily the
best thing.  Especially since in most cases most people will be using
distro initrd's, which don't use the brute-force probe order in
init/do_mounts.c.  The main people who still use the kernel boot order
tend to be us dyed-in-the-wool developers who don't believe in
initrd's (and probably wouldn't be using systemd unless our distro was
forcing us to).

Something which I'm thinking about doing --- but which is an awful
hack --- is to switch the order so we probe ext4 first, but also add
code in ext4's mount function which checks to see if (a) the pid is 1
in the root pid namespace, (b) the file system feature set is one that
can be supported by the ext2 driver, and (c) the ext2 driver is
available.  In that case, we could fail the mount, so that in the case
where we are doing the initial boot time probing, and the root file
system is an ext2 file system, we properly use the ext2 driver if
it's available.

This should do what we want in all circumstances, but the question is
whether I'd respect myself in the morning.....  :-)

					- Ted

P.S.  Regarding the problem which triggered your investigation of the
boot order:

> > On the first cold boot of a new kernel, the boot appears to hang.
> > What I've discovered (which took a long time thanks to the shitpile
> > that is systemd) is that it appears to be doing an e2fsck on the root
> > device, and that is failing, resulting in systemd outputting:
> >
> > [FAILED] Failed to start File System Check on Root Device.

Clearly both you and I don't have the same refined tastes as Lennart
Poettering.  :-)

After all, instead of grepping through shell scripts, Lennart clearly
prefers people to have to go diving through C source code to figure
out what is going on....

I could be wrong, since the systemd sources are a twisty maze of C
code, all different, but I *believe* that error is caused by the fact
that for some reason, systemd wasn't able to start the executable
/lib/systemd/systemd-fsck.  (The string "File System Check on Root
Device" comes from the file
/lib/systemd/system/systemd-fsck-root.service, and I'm pretty sure
this means it wasn't able to start the ExecStart program,
/lib/systemd/systemd-fsck.)

If /lib/systemd/systemd-fsck (a C program, why use a shell script when
you can force hard-working programmers to have to comprehend someone
else's C code) had managed to start fsck.ext[234] and it returned an
error, you should have seen an explicit message about fsck failing
with a specific error code and/or signal:

        if (status.si_code != CLD_EXITED || (status.si_status & ~1)) {

                if (status.si_code == CLD_KILLED || status.si_code == CLD_DUMPED)
                        log_error("fsck terminated by signal %s.", signal_to_string(status.si_status));
                else if (status.si_code == CLD_EXITED)
                        log_error("fsck failed with error code %i.", status.si_status);
                else
                        log_error("fsck failed due to unknown reason.");

                if (status.si_code == CLD_EXITED && (status.si_status & 2) && root_directory)
                        /* System should be rebooted. */
                        start_target(SPECIAL_REBOOT_TARGET);
                else if (status.si_code == CLD_EXITED && (status.si_status & 6))
                        /* Some other problem */
                        start_target(SPECIAL_EMERGENCY_TARGET);
                else {
                        r = EXIT_SUCCESS;
                        log_warning("Ignoring error.");
                }

And it *does* appear (at least from the version of the systemd sources
I examined) that if fsck had modified the root file system and
requested a reboot, systemd should have immediately started rebooting
the system, having printed the exit code (probably 3, i.e.
FSCK_NONDESTRUCT | FSCK_REBOOT).

So I think something else was going on here, but given the inability
of systemd to save logs in this instance, I'm not sure we'll be able
to figure out what was going on.
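
To make the exit-code reasoning in the message above easier to follow,
here is a small standalone C sketch of the same branch selection. It
is not systemd code: the enum, the function name and the is_root
parameter (standing in for root_directory in the excerpt) are invented
for illustration, and it only covers the case where fsck exited
normally (si_code == CLD_EXITED). The bit values are the exit status
bits documented in fsck(8).

```c
#include <assert.h>

/* Exit status bits documented in fsck(8). */
#define FSCK_NONDESTRUCT 1 /* filesystem errors corrected */
#define FSCK_REBOOT      2 /* system should be rebooted */
#define FSCK_UNCORRECTED 4 /* filesystem errors left uncorrected */

/* Hypothetical labels for the four outcomes of the quoted excerpt. */
enum fsck_action {
    ACT_NONE,      /* outer condition false: nothing is logged */
    ACT_REBOOT,    /* start_target(SPECIAL_REBOOT_TARGET) */
    ACT_EMERGENCY, /* start_target(SPECIAL_EMERGENCY_TARGET) */
    ACT_IGNORE     /* "Ignoring error." */
};

/* Which branch of the excerpt is taken for a normally exited fsck. */
static enum fsck_action action_for_exit_status(int status, int is_root)
{
    assert(status >= 0);
    if ((status & ~FSCK_NONDESTRUCT) == 0)
        return ACT_NONE;      /* status 0 or 1 */
    if ((status & FSCK_REBOOT) && is_root)
        return ACT_REBOOT;    /* reboot requested for the root fs */
    if (status & (FSCK_REBOOT | FSCK_UNCORRECTED))
        return ACT_EMERGENCY; /* same mask as (si_status & 6) */
    return ACT_IGNORE;
}
```

For the exit code 3 mentioned above, action_for_exit_status(3, 1)
selects the reboot branch, while the same status on a non-root
filesystem falls through to the emergency target.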
* Re: [4.3-rc1, regression] ext2 vs ext3/ext4 fs probing issues
From: Dave Chinner @ 2015-09-28  0:04 UTC
To: Theodore Ts'o; +Cc: linux-ext4, linux-fsdevel

On Sun, Sep 27, 2015 at 07:14:58PM -0400, Theodore Ts'o wrote:
> On Sat, Sep 26, 2015 at 09:51:26AM +1000, Dave Chinner wrote:
> > Ping?
>
> Sorry, I somehow missed this the first time you posted it.
>
> > > Which tells me that there's a problem with fstype probe ordering
> > > regressions w.r.t ext2 and ext3 as a result of removing the ext3
> > > module. It also doesn't fail fsck checks now, so boots successfully
> > > every time. I suspect the "boot hang" problem is that e2fsck sees a
> > > dirty journal, fixes everything and then asks for a reboot, which
> > > fails.
>
> The original probe ordering was: ext3, ext2, ext4.  As you've
> correctly pointed out, this allowed us to preferentially use ext3 over
> ext2 even if the file system did not need a journal replay.  We
> put ext4 at the end so that if there was an ext2-only root file
> system, we would use ext2 in preference over ext4.  This was useful
> for backwards compatibility for certain ancient enterprise
> distributions, and/or when ext4 was still under development and we
> wanted to make sure for an ext2-only root, we would use ext2 if
> possible.
>
> When ext3 was removed, we were left with the boot order: ext2,
> ext4.

Yup, that's what I thought, given the way the ext4 code explicitly
registers ext3/ext2/ext4 in that order when ext4 is being used for
all of them.

> P.S.  Regarding the problem which triggered your investigation of the
> boot order:
>
> > > On the first cold boot of a new kernel, the boot appears to hang.
> > > What I've discovered (which took a long time thanks to the shitpile
> > > that is systemd) is that it appears to be doing an e2fsck on the root
> > > device, and that is failing, resulting in systemd outputting:
> > >
> > > [FAILED] Failed to start File System Check on Root Device.
>
> Clearly both you and I don't have the same refined tastes as Lennart
> Poettering.  :-)
>
> After all, instead of grepping through shell scripts, Lennart clearly
> prefers people to have to go diving through C source code to figure
> out what is going on....

That's the single biggest problem systemd has - you need another
machine with source code on it to debug a machine that is failing to
boot.

> I could be wrong, since the systemd sources are a twisty maze of C
> code, all different, but I *believe* that error is caused by the fact
[snip]

I didn't feel like doing this when it became clear that I could just
work around systemd's issues by using a different kernel config.

> And it *does* appear (at least from the version of the systemd sources
> I examined) that if fsck had modified the root file system and
> requested a reboot, systemd should have immediately started rebooting
> the system, having printed the exit code (probably 3, i.e.
> FSCK_NONDESTRUCT | FSCK_REBOOT).
>
> So I think something else was going on here, but given the inability
> of systemd to save logs in this instance, I'm not sure we'll be able
> to figure out what was going on.

That something else is probably the fact that ever since systemd
took over the init scripts, reboot/halt operations fail to actually
reboot/halt my VMs - they just hang instead, which in most cases is
just fine so I've never bothered to track down what magic kernel
config option I need to set to make it work again.

The thing here is that systemd output is not making it clear that it
is rebooting, and the fact that e2fsck output like "**** REBOOT
IMMEDIATELY ****" is swallowed and never appears on the console means
that it's pretty much impossible to determine what went wrong. Add
to that the console log being a massive confusion of things still
starting and some things stopping, and it ends up being completely
indecipherable as to what is going on when it finally stops doing
stuff....

Brave new world, eh?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
* Re: [4.3-rc1, regression] ext2 vs ext3/ext4 fs probing issues
From: Darrick J. Wong @ 2015-09-28 17:36 UTC
To: Theodore Ts'o; +Cc: Dave Chinner, linux-ext4, linux-fsdevel

On Sun, Sep 27, 2015 at 07:14:58PM -0400, Theodore Ts'o wrote:
> On Sat, Sep 26, 2015 at 09:51:26AM +1000, Dave Chinner wrote:
> > Ping?
>
> Sorry, I somehow missed this the first time you posted it.
>
> > > Which tells me that there's a problem with fstype probe ordering
> > > regressions w.r.t ext2 and ext3 as a result of removing the ext3
> > > module. It also doesn't fail fsck checks now, so boots successfully
> > > every time. I suspect the "boot hang" problem is that e2fsck sees a
> > > dirty journal, fixes everything and then asks for a reboot, which
> > > fails.
>
> The original probe ordering was: ext3, ext2, ext4.  As you've
> correctly pointed out, this allowed us to preferentially use ext3 over
> ext2 even if the file system did not need a journal replay.  We
> put ext4 at the end so that if there was an ext2-only root file
> system, we would use ext2 in preference over ext4.  This was useful
> for backwards compatibility for certain ancient enterprise
> distributions, and/or when ext4 was still under development and we
> wanted to make sure for an ext2-only root, we would use ext2 if
> possible.
>
> When ext3 was removed, we were left with the boot order: ext2,
> ext4.  We could swap this around, but in that case, ext4 would
> *always* be used in preference to ext2, which is not necessarily the
> best thing.  Especially since in most cases most people will be using
> distro initrd's, which don't use the brute-force probe order in
> init/do_mounts.c.  The main people who still use the kernel boot order
> tend to be us dyed-in-the-wool developers who don't believe in
> initrd's (and probably wouldn't be using systemd unless our distro was
> forcing us to).
>
> Something which I'm thinking about doing --- but which is an awful
> hack --- is to switch the order so we probe ext4 first, but also add
> code in ext4's mount function which checks to see if (a) the pid is 1
> in the root pid namespace, (b) the file system feature set is one that
> can be supported by the ext2 driver, and (c) the ext2 driver is
> available.  In that case, we could fail the mount, so that in the case
> where we are doing the initial boot time probing, and the root file
> system is an ext2 file system, we properly use the ext2 driver if
> it's available.
>
> This should do what we want in all circumstances, but the question is
> whether I'd respect myself in the morning.....  :-)

How about teaching ext2 to check for compat features that only ext4 handles
(i.e. the journal) and fail the mount so that ext4 will pick it up?  I figure
that most users aren't going to want to mount an ext3 fs with ext2 when ext4 is
available.

(But this discussion turns into systemd, so I'm also warily backing away...)

--D

>
> 					- Ted
>
> P.S.  Regarding the problem which triggered your investigation of the
> boot order:
>
> > > On the first cold boot of a new kernel, the boot appears to hang.
> > > What I've discovered (which took a long time thanks to the shitpile
> > > that is systemd) is that it appears to be doing an e2fsck on the root
> > > device, and that is failing, resulting in systemd outputting:
> > >
> > > [FAILED] Failed to start File System Check on Root Device.
>
> Clearly both you and I don't have the same refined tastes as Lennart
> Poettering.  :-)
>
> After all, instead of grepping through shell scripts, Lennart clearly
> prefers people to have to go diving through C source code to figure
> out what is going on....
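
The alternative suggested in the reply above (have ext2 decline a
filesystem that carries a journal so that ext4 picks it up) might look
roughly like the following C sketch. This is an assumption-laden
illustration, not a patch: struct ext_sb, ext2_should_decline() and
the ext4_available flag are names invented here, and only the
has_journal compat bit (0x0004 in the ext3 on-disk format) is
examined.

```c
#include <assert.h>
#include <stddef.h>

/* ext3 on-disk compat feature bit for "filesystem has a journal". */
#define EXT_COMPAT_HAS_JOURNAL 0x0004u

/* Hypothetical, reduced superblock view used only for this sketch. */
struct ext_sb {
    unsigned compat;
    unsigned incompat;
};

/* Returns nonzero when ext2 should fail the mount: the filesystem has
 * a journal and an ext4 driver is around to handle it properly. With
 * no ext4 driver, ext2 keeps its old behaviour and mounts it. */
static int ext2_should_decline(const struct ext_sb *sb, int ext4_available)
{
    assert(sb != NULL);
    return ext4_available && (sb->compat & EXT_COMPAT_HAS_JOURNAL) != 0;
}
```

Under this policy the probe order can stay ext2, ext4: a journal-less
ext2 root is still claimed by ext2, while an ext3 root is passed on to
ext4 even when its journal is clean.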