* Re: Xfs_repair and journalling
2013-03-17 11:42 ` Subranshu Patel
@ 2013-03-17 14:50 ` Stan Hoeppner
2013-03-17 15:18 ` Matthias Schniedermeyer
` (4 subsequent siblings)
5 siblings, 0 replies; 23+ messages in thread
From: Stan Hoeppner @ 2013-03-17 14:50 UTC (permalink / raw)
To: Subranshu Patel; +Cc: xfs
On 3/17/2013 6:42 AM, Subranshu Patel wrote:
> What I understand is that, XFS being a journalling filesystem makes running
> xfs_repair after an unclean unmount unnecessary.
>
> After a system crash or force power down, one can mount the filesystem
> which causes the journal to be replayed and handles the half finished
> writes. This is one part. But there can be other file corruption as well
> and these can be handled by xfs_repair.
> So the crux is to mount the filesystem so that journal will be replayed,
> and then unmount the filesystem and run xfs_repair. (Assuming xfs_repair
> may not be a mandatory step always)
>
> In case of EXT4, journal will not be replayed on performing mount. One needs
> to invoke fsck which performs journal playback and then other corruption
> checks/recovery.
> Correct me if I am wrong.
Enough with the foreplay. Get to your point please. I'm assuming you
think you have an exception, a bug, to report.
--
Stan
>
> On Sun, Mar 17, 2013 at 10:56 AM, Stan Hoeppner <stan@hardwarefreak.com>
> wrote:
>>
>> On 3/16/2013 10:56 AM, Subranshu Patel wrote:
>>
>>> This is not observed in EXT4, fsck successfully recovers without
>>> mounting the filesystem.
>>
>> And this is the real problem. You're *assuming* XFS should behave in
>> the same manner as EXT4. Why would you assume a Ferrari should behave
>> like a Tata Nano?
>>
>> XFS is far more sophisticated than EXT4 in many, many ways, including
>> recovery after unclean shutdown. XFS kernel code performs journal
>> playback/recovery automatically when the filesystem is mounted.
>> xfs_repair is a tool for fixing filesystems that are broken, not simply
>> in need of journal playback. Thus xfs_repair has no code to perform
>> journal recovery.
>>
>> EXT4 (and EXT3) lacks this sophistication and must call a user space
>> tool, e2fsck, to perform journal playback/recovery.
>>
>> XFS is the Ferrari of Linux filesystems and EXT is the Tata. Keep that
>> in mind as you discover many of the other differences in the future.
>>
>> --
>> Stan
>>
>
>
>
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs
>
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
* Re: Xfs_repair and journalling
2013-03-17 11:42 ` Subranshu Patel
2013-03-17 14:50 ` Stan Hoeppner
@ 2013-03-17 15:18 ` Matthias Schniedermeyer
2013-03-17 23:20 ` Dave Chinner
` (3 subsequent siblings)
5 siblings, 0 replies; 23+ messages in thread
From: Matthias Schniedermeyer @ 2013-03-17 15:18 UTC (permalink / raw)
To: Subranshu Patel; +Cc: stan, xfs
On 17.03.2013 17:12, Subranshu Patel wrote:
>
> In case of EXT4, journal will not be replayed on performing mount. One needs
> to invoke fsck which performs journal playback and then other corruption
> checks/recovery.
> Correct me if I am wrong.
Wrong.
ALL journaling filesystems automatically replay the log upon mounting, or at
least I don't know of any that don't.
Only if there is additional corruption does the corresponding fsck come into
play.
For XFS I have needed xfs_repair only once in over 10 years (and
countless power failures, spontaneous resets, forgotten umounts
(USB drives) ...), and that was caused by a bug in XFS that was fixed
recently; otherwise my count would still be 0.
In most cases, after an unclean shutdown, the journal is replayed and
everything is fine. Otherwise you either hit a bug (that needs fixing),
or you have hardware problems.
--
Matthias
* Re: Xfs_repair and journalling
2013-03-17 11:42 ` Subranshu Patel
2013-03-17 14:50 ` Stan Hoeppner
2013-03-17 15:18 ` Matthias Schniedermeyer
@ 2013-03-17 23:20 ` Dave Chinner
2013-03-18 18:22 ` Ben Myers
` (2 subsequent siblings)
5 siblings, 0 replies; 23+ messages in thread
From: Dave Chinner @ 2013-03-17 23:20 UTC (permalink / raw)
To: Subranshu Patel; +Cc: stan, xfs
On Sun, Mar 17, 2013 at 05:12:36PM +0530, Subranshu Patel wrote:
> What I understand is that, XFS being a journalling filesystem makes running
> xfs_repair after an unclean unmount unnecessary.
Correct.
> After a system crash or force power down, one can mount the filesystem
> which causes the journal to be replayed and handles the half finished
> writes.
What "half finished writes"? The journal replays only completely
written checkpoints. It tosses away half written writes because they
are not complete and so replaying them will result in filesystem
corruption....
> This is one part. But there can be other file corruption as well
> and these can be handled by xfs_repair.
I think you don't understand how OS level writeback caching works.
You will *lose* data on a power failure unless the application
specifically writes it to disk with fdatasync/fsync(). This has
nothing to do with the filesystem, nor journal replay. If the
application uses fsync, then the data on disk is consistent with
what is in the journal, so after journal replay, the data is there
on disk.
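A minimal sketch of the fsync point above, for scripts rather than applications: GNU dd's conv=fsync calls fsync() on the output file before dd exits, so the data has been handed to the device rather than left in the page cache. The helper name durable_copy and the temporary paths are hypothetical, and conv=fsync and status=none assume GNU coreutils. Whether the data then survives a power cut still depends on the drive honoring cache flushes.

```shell
# durable_copy SRC DST: copy SRC to DST and return only after dd has
# called fsync() on DST (assumes GNU dd).
durable_copy() {
    dd if="$1" of="$2" conv=fsync status=none
}

# Example with throwaway files; the paths are placeholders.
tmpdir=$(mktemp -d)
printf 'important record\n' > "$tmpdir/src"
durable_copy "$tmpdir/src" "$tmpdir/dst" && echo "synced"
```

Without conv=fsync (or an explicit sync), the same copy can sit in the OS writeback cache for some time, which is the window in which a power failure loses it.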
xfs_repair is not full of magic pixie dust that miraculously
recovers data that was never written to disk....
> In case of EXT4, journal will not be replayed on performing mount. One needs
> to invoke fsck which performs journal playback and then other corruption
> checks/recovery.
> Correct me if I am wrong.
ext4 behaves like this, but your basic assumption that fsck.ext4
performs data recovery/repair is wrong, same as for your assumption
that xfs_repair does this for XFS.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: Xfs_repair and journalling
2013-03-17 11:42 ` Subranshu Patel
` (2 preceding siblings ...)
2013-03-17 23:20 ` Dave Chinner
@ 2013-03-18 18:22 ` Ben Myers
2013-03-18 20:58 ` Martin Steigerwald
2013-03-18 20:50 ` Martin Steigerwald
2013-03-19 4:02 ` Eric Sandeen
5 siblings, 1 reply; 23+ messages in thread
From: Ben Myers @ 2013-03-18 18:22 UTC (permalink / raw)
To: Subranshu Patel; +Cc: stan, xfs
Hi Subranshu,
On Sun, Mar 17, 2013 at 05:12:36PM +0530, Subranshu Patel wrote:
> What I understand is that, XFS being a journalling filesystem makes running
> xfs_repair after an unclean unmount unnecessary.
You are correct.
> After a system crash or force power down, one can mount the filesystem
> which causes the journal to be replayed and handles the half finished
> writes. This is one part. But there can be other file corruption as well
> and these can be handled by xfs_repair.
As Dave mentioned, XFS only journals metadata. Unwritten cached file contents
will not be recovered in this situation.
> So the crux is to mount the filesystem so that journal will be replayed,
> and then unmount the filesystem and run xfs_repair. (Assuming xfs_repair
> may not be a mandatory step always)
If you are set up correctly, (e.g. write caches are turned off on your disk)
you shouldn't even need to unmount the filesystem and run xfs_repair.
See this section of the xfs faq for more about write caches:
http://www.xfs.org/index.php/XFS_FAQ#Q:_What_is_the_problem_with_the_write_cache_on_journaled_filesystems.3F
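As a rough illustration of checking the write cache setting: newer Linux kernels (newer than the ones discussed in this thread) expose a per-device queue/write_cache file in sysfs that reads either "write back" or "write through". The helper name cache_mode is hypothetical, and the example below reads a temporary file standing in for the sysfs entry so that it runs anywhere.

```shell
# cache_mode FILE: read a sysfs-style write_cache file and report
# whether the device advertises a volatile write-back cache.
cache_mode() {
    case $(cat "$1") in
        "write back")    echo "volatile write cache enabled" ;;
        "write through") echo "no volatile write cache reported" ;;
        *)               echo "unknown" ;;
    esac
}

# Stand-in for /sys/block/sdX/queue/write_cache (sdX is a placeholder).
fake=$(mktemp)
printf 'write back\n' > "$fake"
cache_mode "$fake"
```

On a real system the argument would be the sysfs file for the disk in question. On drives that support it, hdparm -W 0 is one way to turn the cache off.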
> In case of EXT4, journal will not be replayed on performing mount. One needs
> to invoke fsck which performs journal playback and then other corruption
> checks/recovery.
I can't speak for ext4. I do think that your expectation that fsck/repair be
able to replay a journal is pretty reasonable. That's just not how it is
implemented here. In xfs, log recovery is in the kernel and xfs_repair knows
just enough about the log to avoid clobbering your precious metadata when it
needs to be recovered. There was some discussion in the past about making
xfs_repair able to recover the log but I wouldn't expect that any time soon.
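For cases where one does want to verify a filesystem after a crash, the sequence discussed in this thread (mount so the kernel replays the log, unmount, then check) can be sketched as below. The function name, device and mount point are hypothetical placeholders. By default the sketch only prints the commands; xfs_repair -n is the no-modify mode, so even a real run changes nothing on disk.

```shell
# xfs_check_after_crash DEV MNT [RUNNER]
# RUNNER defaults to "echo", so the commands are only printed. Pass a
# runner such as "sudo" to execute them for real.
xfs_check_after_crash() {
    dev=$1
    mnt=$2
    run=${3:-echo}
    # Mounting lets the kernel replay the log.
    $run mount -t xfs "$dev" "$mnt" || return 1
    $run umount "$mnt" || return 1
    # -n is no-modify mode: report problems, change nothing.
    $run xfs_repair -n "$dev"
}

xfs_check_after_crash /dev/sdX1 /mnt/scratch
```

Running xfs_repair against a filesystem whose log is still dirty makes it stop and ask for a mount first, which is why the mount step comes first here.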
Regards,
Ben
* Re: Xfs_repair and journalling
2013-03-18 18:22 ` Ben Myers
@ 2013-03-18 20:58 ` Martin Steigerwald
0 siblings, 0 replies; 23+ messages in thread
From: Martin Steigerwald @ 2013-03-18 20:58 UTC (permalink / raw)
To: xfs; +Cc: Ben Myers, stan, Subranshu Patel
On Monday, 18 March 2013, Ben Myers wrote:
> Hi Subranshu,
[…]
> > In case of EXT4, journal will not be replayed on performing mount. One
> > needs to invoke fsck which performs journal playback and then other
> > corruption checks/recovery.
>
> I can't speak for ext4. I do think that your expectation that
> fsck/repair be able to replay a journal is pretty reasonable. That's
> just not how it is implemented here. In xfs, log recovery is in the
> kernel and xfs_repair knows just enough about the log to avoid
> clobbering your precious metadata when it needs to be recovered. There
> was some discussion in the past about making xfs_repair able to recover
> the log but I wouldn't expect that any time soon.
Yes, fsck.ext4 is able to replay the journal, and also able to *just* replay
the journal and do nothing else. I just checked its manpage.
Thanks for not speaking for Ext4 unless you really think you know it.
Being able to replay in userspace might help when the filesystem cannot be
mounted correctly for reasons lying outside the journal, because then one
can attempt to replay the journal in userspace and then run a repair on the
filesystem.
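A small sketch of how one might tell whether an ext4 journal still needs replay before deciding on a userspace replay: dumpe2fs -h lists needs_recovery among the filesystem features while the journal is dirty. The helper name journal_dirty and the sample feature line are illustrative only. The journal-only replay is presumably e2fsck's journal_only extended option (e2fsck -E journal_only), which is worth confirming against the installed manpage.

```shell
# journal_dirty: read `dumpe2fs -h` output on stdin and succeed if the
# feature list contains needs_recovery.
journal_dirty() {
    grep '^Filesystem features:' | grep -qw 'needs_recovery'
}

# Illustrative feature line, not taken from a real device.
sample='Filesystem features:      has_journal ext_attr needs_recovery extent'
if printf '%s\n' "$sample" | journal_dirty; then
    echo "journal needs replay"
fi
```

Against a real device the input would come from something like dumpe2fs -h /dev/sdX1, with sdX1 as a placeholder.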
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
* Re: Xfs_repair and journalling
2013-03-17 11:42 ` Subranshu Patel
` (3 preceding siblings ...)
2013-03-18 18:22 ` Ben Myers
@ 2013-03-18 20:50 ` Martin Steigerwald
2013-03-19 4:02 ` Eric Sandeen
5 siblings, 0 replies; 23+ messages in thread
From: Martin Steigerwald @ 2013-03-18 20:50 UTC (permalink / raw)
To: xfs
On Sunday, 17 March 2013, Subranshu Patel wrote:
> In case of EXT4, journal will not be replayed on performing mount. One
> needs to invoke fsck which performs journal playback and then other
> corruption checks/recovery.
> Correct me if I am wrong.
USB stick with Ext4 without any mkfs options:
merkaba:~> mount /dev/sdb /mnt/zeit
merkaba:~> rsync -a /etc /mnt/zeit
(was too short)
merkaba:~> rsync -a /usr/bin /mnt/zeit
Unplug during write activity:
merkaba:~> grep "journal" /var/log/syslog | tail -5
Mar 18 21:43:22 merkaba kernel: [29938.038298] journal commit I/O error
Mar 18 21:43:22 merkaba kernel: [29938.038300] journal commit I/O error
Mar 18 21:43:22 merkaba kernel: [29938.038302] journal commit I/O error
Mar 18 21:43:22 merkaba kernel: [29938.038305] journal commit I/O error
Mar 18 21:43:22 merkaba kernel: [29938.038308] journal commit I/O error
merkaba:~> grep -c "journal commit" /var/log/syslog
1915
Wait until KDE SC stops replaying these notifications, then restart
plasma-desktop to get rid of them, because that takes too long.
Plug USB stick back in. And mount it:
merkaba:~> mount /dev/sdc /mnt/zeit
merkaba:~> ls -l /mnt/zeit
insgesamt 40
drwxr-xr-x 2 root root 12288 Mär 18 21:43 bin
drwxr-xr-x 188 root root 12288 Mär 18 13:31 etc
drwx------ 2 root root 16384 Mär 18 21:42 lost+found
Mount took about 5 seconds.
merkaba:~> dmesg | tail -2
[30217.601896] EXT4-fs (sdc): recovery complete
[30217.603287] EXT4-fs (sdc): mounted filesystem with ordered data mode. Opts: (null)
So can we get past any guesswork that Ext4 might not be able to replay
its journal after an unclean shutdown?
It can and it does. And not just since this 3.9-rc3 kernel.
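The check done by eye above can be scripted. This is only a sketch: the helper name replayed_devices is made up, and it assumes the kernel message wording shown in the dmesg output in this mail, which may differ between kernel versions.

```shell
# replayed_devices: read kernel log text on stdin and print each device
# for which ext4 reported a completed journal recovery.
replayed_devices() {
    sed -n 's/.*EXT4-fs (\([^)]*\)): recovery complete.*/\1/p'
}

# The two log lines quoted in this mail.
printf '%s\n' \
    '[30217.601896] EXT4-fs (sdc): recovery complete' \
    '[30217.603287] EXT4-fs (sdc): mounted filesystem with ordered data mode.' |
    replayed_devices
```

On a live system the input would be dmesg itself, for example dmesg | replayed_devices.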
Thanks,
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
* Re: Xfs_repair and journalling
2013-03-17 11:42 ` Subranshu Patel
` (4 preceding siblings ...)
2013-03-18 20:50 ` Martin Steigerwald
@ 2013-03-19 4:02 ` Eric Sandeen
2013-03-19 6:19 ` Stan Hoeppner
5 siblings, 1 reply; 23+ messages in thread
From: Eric Sandeen @ 2013-03-19 4:02 UTC (permalink / raw)
To: Subranshu Patel; +Cc: stan, xfs
On 3/17/13 6:42 AM, Subranshu Patel wrote:
>
> In case of EXT4, journal will not be replayed on performing mount.
> One needs to invoke fsck which performs journal playback and then
> other corruption checks/recovery. Correct me if I am wrong.
You are wrong, I'm afraid.
Simple tests, or reading the code, will show you that ext4
replays a dirty log at mount time.
-Eric
* Re: Xfs_repair and journalling
2013-03-19 4:02 ` Eric Sandeen
@ 2013-03-19 6:19 ` Stan Hoeppner
2013-03-19 8:24 ` Martin Steigerwald
0 siblings, 1 reply; 23+ messages in thread
From: Stan Hoeppner @ 2013-03-19 6:19 UTC (permalink / raw)
To: Eric Sandeen; +Cc: Subranshu Patel, xfs
On 3/18/2013 11:02 PM, Eric Sandeen wrote:
> On 3/17/13 6:42 AM, Subranshu Patel wrote:
>>
>> In case of EXT4, journal will not be replayed on performing mount.
>> One needs to invoke fsck which performs journal playback and then
>> other corruption checks/recovery. Correct me if I am wrong.
>
> You are wrong, I'm afraid.
Eric, he misunderstood what I stated previously, and mangled repeating it.
> Simple tests, or reading the code, will show you that ext4
> replays a dirty log at mount time.
What I stated was that EXT4 makes a call to user space e2fsck to execute
the journal replay routine. If this code is duplicated in the EXT4
kernel driver, then my apologies for spreading misinformation due to
being misinformed. I'm not a programmer and have not read the code, and
wouldn't understand it if I did.
I've never used EXT3/4. Thus I Googled extensively for "ext4 journal
recovery" and the like before making my statement specifically to avoid
a misstatement of fact. 30+ minutes wasted apparently... Everything I
found indicated that the mechanism for journal playback was the journal
code in e2fsck. And not finding any mention of journal recovery code in
the kernel in these hits, I thought I had correct information.
I use rolled kernels that don't have module support, and they don't have
the EXT4 driver built in, only XFS. So even if I'd thought of it I
couldn't perform the USB test Martin mentioned and verify if the user
space call was made or not. And after what Google told me, I simply
didn't consider the information to be incorrect or incomplete, so I
didn't think to build a rig and test it.
Martin, I didn't state that ext4 cannot perform journal recovery, which
you previously misunderstood. As mentioned above I stated it made a
call to e2fsck to perform the task. And, again, apparently this is not
the case. If you want to excoriate me for getting this wrong, that's
fine. But don't do it in a way that suggests it was intentional, or
that I made no effort to verify the information before I stated it. I
spent at least 30 minutes Googling trying to track down documents
explaining the ext4 journal recovery code in the kernel. I simply
didn't find any. The only thing I found were descriptions of e2fsck
based journal recovery.
If someone has a link to a document describing the ext4 journal recovery
code I'd love to read it, so I can speak more intelligently about it in
the future.
--
Stan
* Re: Xfs_repair and journalling
2013-03-19 6:19 ` Stan Hoeppner
@ 2013-03-19 8:24 ` Martin Steigerwald
2013-03-19 10:14 ` Stan Hoeppner
0 siblings, 1 reply; 23+ messages in thread
From: Martin Steigerwald @ 2013-03-19 8:24 UTC (permalink / raw)
To: stan; +Cc: Eric Sandeen, Subranshu Patel, xfs
On Tuesday, 19 March 2013, Stan Hoeppner wrote:
> Martin, I didn't state that ext4 cannot perform journal recovery, which
> you previously misunderstood. As mentioned above I stated it made a
> call to e2fsck to perform the task. And, again, apparently this is not
> the case. If you want to excoriate me for getting this wrong, that's
> fine. But don't do it in a way that suggests it was intentional, or
> that I made no effort to verify the information before I stated it. I
> spent at least 30 minutes Googling trying to track down documents
> explaining the ext4 journal recovery code in the kernel. I simply
> didn't find any. The only thing I found were descriptions of e2fsck
> based journal recovery.
Stan, you are still an XFS expert, you are still a hardware expert, and I
love reading your posts at debian-user, I sometimes even search for those,
you still know a lot and heck you are still Stan and as such without any
achievement or knowledge at all a precious being.
Just like anyone else on this list (and elsewhere) is a precious being just
as they are.
So what's so difficult about admitting that what you wrote about Ext4 and
journal replay was at least misleading?
Heck, even I was confused at first. Cause the manpage of fsck.ext4 IMHO is
not really clear about that topic to say the least. I tested it out for a
reason.
I am concerned about the tendency I perceive in open source, heck, in computer
communities in general, to bind one's own value to being right on a topic. There is
no, absolutely no connection at all. You and everyone else is valuable and
precious without any prerequisite at all.
I also take something to learn from this myself: I was obsessed with
being right myself and bound my value to it as well. I have overdone my
previous mails. Sorry for that.
Thanks,
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
* Re: Xfs_repair and journalling
2013-03-19 8:24 ` Martin Steigerwald
@ 2013-03-19 10:14 ` Stan Hoeppner
2013-03-30 12:49 ` Xfs_repair and journalling -- EXT4 journal replay discussion Stan Hoeppner
0 siblings, 1 reply; 23+ messages in thread
From: Stan Hoeppner @ 2013-03-19 10:14 UTC (permalink / raw)
To: Martin Steigerwald; +Cc: Eric Sandeen, Subranshu Patel, xfs
On 3/19/2013 3:24 AM, Martin Steigerwald wrote:
> On Tuesday, 19 March 2013, Stan Hoeppner wrote:
>> Martin, I didn't state that ext4 cannot perform journal recovery, which
>> you previously misunderstood. As mentioned above I stated it made a
>> call to e2fsck to perform the task. And, again, apparently this is not
>> the case. If you want to excoriate me for getting this wrong, that's
>> fine. But don't do it in a way that suggests it was intentional, or
>> that I made no effort to verify the information before I stated it. I
>> spent at least 30 minutes Googling trying to track down documents
>> explaining the ext4 journal recovery code in the kernel. I simply
>> didn't find any. The only thing I found were descriptions of e2fsck
>> based journal recovery.
>
> Stan, you are still an XFS expert, you are still a hardware expert, and I
> love reading your posts at debian-user, I sometimes even search for those,
> you still know a lot and heck you are still Stan and as such without any
> achievement or knowledge at all a precious being.
Thank you for the kind words. I'm far from an XFS expert. I may
understand parts of it better than some other users, but my knowledge of
it is laughably tiny and woefully incomplete compared to any of the
developers, who are the subject matter experts. Which is why I
regularly ask Dave to elaborate on things of which I don't yet have
knowledge or understanding. He explains things in a way that I can
easily digest. Kudos to Dave for being a good teacher.
> Just like anyone else on this list (and elsewhere) is a precious being just
> as they are.
>
> So what's so difficult about admitting that what you wrote about Ext4 and
> journal replay was at least misleading?
I thought I did in my last reply, 3 paragraphs above the one you pasted
above. I said:
"...if this code is duplicated in the EXT4 kernel driver, then my
apologies for spreading misinformation due to being misinformed...."
> Heck, even I was confused at first. Cause the manpage of fsck.ext4 IMHO is
> not really clear about that topic to say the least. I tested it out for a
> reason.
I already contacted Ted off list hoping he can point me to the relevant
kernel documentation, so I don't make such a mistake again with EXT.
> I am concerned about the tendency I perceive in open source, heck, in computer
> communities in general, to bind one's own value to being right on a topic. There is
> no, absolutely no connection at all. You and everyone else is valuable and
> precious without any prerequisite at all.
I'm less concerned about "being right" than "getting it right". When I
make a mistake like this it's rather embarrassing. So I do my best to
correct the mistake, by learning the relevant information, and not
making the mistake again.
> I also take something to learn from this myself: I was obsessed with
> being right myself and bound my value to it as well. I have overdone my
> previous mails. Sorry for that.
The part you're referring to wasn't about being right, but merely
clarifying what occurred in the thread. I stated incorrect information,
then the OP repeated it but in a way that made what I said even more
incorrect. In other words, I was telling Eric this was my fault. I.e.
taking responsibility for the mistake, that then snowballed. I wouldn't
call that "being obsessed with being right". I was wrong and clearly
stated so.
Anyway, too much text/time/bandwidth has been wasted on this already.
Let's move on to something else. If I'm able to get hold of some good
ext4 kernel documentation describing the journal handling I'll gladly share.
--
Stan
* Re: Xfs_repair and journalling -- EXT4 journal replay discussion
2013-03-19 10:14 ` Stan Hoeppner
@ 2013-03-30 12:49 ` Stan Hoeppner
2013-03-30 17:40 ` Eric Sandeen
2013-03-31 1:35 ` Dave Chinner
0 siblings, 2 replies; 23+ messages in thread
From: Stan Hoeppner @ 2013-03-30 12:49 UTC (permalink / raw)
To: stan; +Cc: Eric Sandeen, Subranshu Patel, xfs
On 3/19/2013 5:14 AM, Stan Hoeppner wrote:
> On 3/19/2013 3:24 AM, Martin Steigerwald wrote:
...
>> Heck, even I was confused at first. Cause the manpage of fsck.ext4 IMHO is
>> not really clear about that topic to say the least. I tested it out for a
>> reason.
>
> I already contacted Ted off list hoping he can point me to the relevant
> kernel documentation, so I don't make such a mistake again with EXT.
Ok, so here's the skinny on the source of our confusion WRT how/when
EXT4 replays journals, and it's rather interesting. Ted Ts'o explained
the following.
The EXT4 kernel module does have code to perform journal replay, but it
is rarely executed. The reasons for this are:
1. EXT4 journal replay can take a lot of time (whereas XFS is instant)
2. EXT4 systems tend to have multiple filesystems, often one per drive
(whereas XFS systems tend to have few filesystems)
3. Linux mounts filesystems serially during startup
To prevent potentially lengthy boot times, the init scripts run e2fsck
to replay all EXT4 filesystem journals in parallel, well before the
mount stage. Thus the only case where the EXT4 kernel module performs
journal replay is when doing a mount while the system is running, e.g.
USB hard drive.
There are other reasons e2fsck was chosen to perform journal replay at
boot in addition to the speed issue, but as I understood Ted this is the
main reason.
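On traditional init systems the boot-time fsck run described here is driven by the sixth field (the pass number) of /etc/fstab: fsck -A skips entries whose pass number is 0 and may check entries with the same pass number in parallel. The following sketch lists the entries such a run would consider. The helper name fsck_candidates and the sample fstab are made up for illustration, and systemd-based systems schedule these checks differently.

```shell
# fsck_candidates: read fstab-format text on stdin and print
# "passno device fstype" for every entry with a pass number above zero,
# ordered by pass number.
fsck_candidates() {
    awk '$1 !~ /^#/ && NF >= 6 && $6 > 0 { print $6, $1, $3 }' | sort -n
}

# Hypothetical fstab contents.
fsck_candidates <<'EOF'
# device   mountpoint  type  options   dump pass
/dev/sda1  /           ext4  defaults  0    1
/dev/sdb1  /home       ext4  defaults  0    2
/dev/sdc1  /srv        ext4  defaults  0    2
/dev/sdd1  /data       xfs   defaults  0    0
EOF
```

In this sample the two pass-2 ext4 filesystems sit on different disks, so fsck -A would be free to check them at the same time, while the XFS entry is skipped entirely.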
--
Stan
* Re: Xfs_repair and journalling -- EXT4 journal replay discussion
2013-03-30 12:49 ` Xfs_repair and journalling -- EXT4 journal replay discussion Stan Hoeppner
@ 2013-03-30 17:40 ` Eric Sandeen
2013-03-30 18:52 ` Stan Hoeppner
2013-03-31 1:35 ` Dave Chinner
1 sibling, 1 reply; 23+ messages in thread
From: Eric Sandeen @ 2013-03-30 17:40 UTC (permalink / raw)
To: stan; +Cc: Subranshu Patel, xfs
On 3/30/13 7:49 AM, Stan Hoeppner wrote:
> On 3/19/2013 5:14 AM, Stan Hoeppner wrote:
>> On 3/19/2013 3:24 AM, Martin Steigerwald wrote:
> ...
>>> Heck, even I was confused at first. Cause the manpage of fsck.ext4 IMHO is
>>> not really clear about that topic to say the least. I tested it out for a
>>> reason.
>>
>> I already contacted Ted off list hoping he can point me to the relevant
>> kernel documentation, so I don't make such a mistake again with EXT.
>
> Ok, so here's the skinny on the source of our confusion WRT how/when
> EXT4 replays journals, and it's rather interesting. Ted Ts'o explained
> the following.
Where was this, out of curiosity?
> The EXT4 kernel module does have code to perform journal replay, but it
> is rarely executed. The reasons for this are:
>
> 1. EXT4 journal replay can take a lot of time (whereas XFS is instant)
> 2. EXT4 systems tend to have multiple filesystems, often one per drive
> (whereas XFS systems tend to have few filesystems)
Those are, I think, gross generalizations. Journal replay takes as
long as it takes to replay all the IO required, which can vary greatly.
And TBH I have no idea where the notion came from that systems have many
ext4 filesystems but few xfs filesystems.
> 3. Linux mounts filesystems serially during startup
I think that is correct.
> To prevent potentially lengthy boot times, the init scripts run e2fsck
> to replay all EXT4 filesystem journals in parallel, well before the
> mount stage.
I'd never heard this rationale before, but I could believe that maybe
parallel log replays from userspace are faster, although it probably
depends a lot on how many spindles are available to do the work - fsck
avoids running in parallel for filesystems on the same physical disk,
at least according to the manpage.
> Thus the only case where the EXT4 kernel module performs
> journal replay is when doing a mount while the system is running, e.g.
> USB hard drive.
Or when running xfstests ;) Technically, it does replay when the kernel
mount code finds a dirty log. That's interesting, though, I hadn't thought
about how most systems probably don't get a ton of coverage of kernelspace
ext[34] log replay.
> There are other reasons e2fsck was chosen to perform journal replay at
> boot in addition to the speed issue, but as I understood Ted this is the
> main reason.
Ok, I can see some rationale to parallel userspace log replays; it'd be
interesting to actually measure that result, though.
-Eric
* Re: Xfs_repair and journalling -- EXT4 journal replay discussion
2013-03-30 17:40 ` Eric Sandeen
@ 2013-03-30 18:52 ` Stan Hoeppner
2013-03-30 20:21 ` Eric Sandeen
2013-03-31 2:03 ` Dave Chinner
0 siblings, 2 replies; 23+ messages in thread
From: Stan Hoeppner @ 2013-03-30 18:52 UTC (permalink / raw)
To: Eric Sandeen; +Cc: Subranshu Patel, xfs
On 3/30/2013 12:40 PM, Eric Sandeen wrote:
> On 3/30/13 7:49 AM, Stan Hoeppner wrote:
>> Ok, so here's the skinny on the source of our confusion WRT how/when
>> EXT4 replays journals, and it's rather interesting. Ted Ts'o explained
>> the following.
>
> Where was this, out of curiosity?
Private email exchange with Ted. I thought it was best from an
etiquette standpoint not to wholesale paste his two private emails to
the XFS list, but to summarize. Maybe it would be ok if I put it on my
web server for a couple of days then remove it.
http://www.hardwarefreak.com/ext4-journaling.txt
--
Stan
* Re: Xfs_repair and journalling -- EXT4 journal replay discussion
2013-03-30 18:52 ` Stan Hoeppner
@ 2013-03-30 20:21 ` Eric Sandeen
2013-03-31 11:24 ` Stan Hoeppner
2013-03-31 2:03 ` Dave Chinner
1 sibling, 1 reply; 23+ messages in thread
From: Eric Sandeen @ 2013-03-30 20:21 UTC (permalink / raw)
To: stan; +Cc: Subranshu Patel, xfs
On 3/30/13 1:52 PM, Stan Hoeppner wrote:
> On 3/30/2013 12:40 PM, Eric Sandeen wrote:
>> On 3/30/13 7:49 AM, Stan Hoeppner wrote:
>
>>> Ok, so here's the skinny on the source of our confusion WRT how/when
>>> EXT4 replays journals, and it's rather interesting. Ted Ts'o explained
>>> the following.
>>
>> Where was this, out of curiosity?
>
> Private email exchange with Ted. I thought it was best from an
> etiquette standpoint not to wholesale paste his two private emails to
> the XFS list, but to summarize. Maybe it would be ok if I put it on my
> web server for a couple of days then remove it.
I didn't need it verbatim, just wondered. I agree, private emails
probably should stay that way, w/o permission otherwise.
Thanks,
-Eric
> http://www.hardwarefreak.com/ext4-journaling.txt
>
* Re: Xfs_repair and journalling -- EXT4 journal replay discussion
2013-03-30 20:21 ` Eric Sandeen
@ 2013-03-31 11:24 ` Stan Hoeppner
0 siblings, 0 replies; 23+ messages in thread
From: Stan Hoeppner @ 2013-03-31 11:24 UTC (permalink / raw)
To: Eric Sandeen; +Cc: Subranshu Patel, xfs
On 3/30/2013 3:21 PM, Eric Sandeen wrote:
> I didn't need it verbatim, just wondered. I agree, private emails
> probably should stay that way, w/o permission otherwise.
The link is dead. I did this simply to show I wasn't pulling things
outta thin air and that my info was now correct, and because Ted
provided a lot of insight I didn't parrot in my post. As long as nobody
copy/pasted that content anywhere it should only be 'public' in the gray
matter of a few folks on this list, as it shouldn't have been archived
by any bots, even though the dead link has been.
--
Stan
* Re: Xfs_repair and journalling -- EXT4 journal replay discussion
2013-03-30 18:52 ` Stan Hoeppner
2013-03-30 20:21 ` Eric Sandeen
@ 2013-03-31 2:03 ` Dave Chinner
1 sibling, 0 replies; 23+ messages in thread
From: Dave Chinner @ 2013-03-31 2:03 UTC (permalink / raw)
To: Stan Hoeppner; +Cc: Eric Sandeen, Subranshu Patel, xfs
On Sat, Mar 30, 2013 at 01:52:17PM -0500, Stan Hoeppner wrote:
> On 3/30/2013 12:40 PM, Eric Sandeen wrote:
> > On 3/30/13 7:49 AM, Stan Hoeppner wrote:
>
> >> Ok, so here's the skinny on the source of our confusion WRT how/when
> >> EXT4 replays journals, and it's rather interesting. Ted Ts'o explained
> >> the following.
> >
> > Where was this, out of curiosity?
>
> Private email exchange with Ted. I thought it was best from an
> etiquette standpoint not to wholesale paste his two private emails to
> the XFS list, but to summarize. Maybe it would be ok if I put it on my
> web server for a couple of days then remove it.
>
> http://www.hardwarefreak.com/ext4-journaling.txt
That again? I'll just point to this post from Ted in 2004:
http://zork.net/~nick/mail/why-reiserfs-is-teh-sukc
It's the same FUD about XFS journalling and power failures that Ted
has been claiming for the past 10+ years. It's been rebutted so many
times I can now do it in three words: volatile drive caches.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: Xfs_repair and journalling -- EXT4 journal replay discussion
2013-03-30 12:49 ` Xfs_repair and journalling -- EXT4 journal replay discussion Stan Hoeppner
2013-03-30 17:40 ` Eric Sandeen
@ 2013-03-31 1:35 ` Dave Chinner
1 sibling, 0 replies; 23+ messages in thread
From: Dave Chinner @ 2013-03-31 1:35 UTC (permalink / raw)
To: Stan Hoeppner; +Cc: Eric Sandeen, Subranshu Patel, xfs
On Sat, Mar 30, 2013 at 07:49:54AM -0500, Stan Hoeppner wrote:
> On 3/19/2013 5:14 AM, Stan Hoeppner wrote:
> > On 3/19/2013 3:24 AM, Martin Steigerwald wrote:
> ...
> >> Heck, even I was confused at first. Cause the manpage of fsck.ext4 IMHO is
> >> not really clear about that topic to say the least. I tested it out for a
> >> reason.
> >
> > I already contacted Ted off list hoping he can point me to the relevant
> > kernel documentation, so I don't make such a mistake again with EXT.
>
> Ok, so here's the skinny on the source of our confusion WRT how/when
> EXT4 replays journals, and it's rather interesting. Ted Ts'o explained
> the following.
>
> The EXT4 kernel module does have code to perform journal replay, but it
> is rarely executed. The reasons for this are:
>
> 1. EXT4 journal replay can take a lot of time (whereas XFS is instant)
19 minutes is my current record for XFS journal replay. 2GB log,
filled full of inode creates, required about 300,000 IOs to complete
recovery.....
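To put those two figures in proportion, a quick back-of-the-envelope calculation using only the numbers quoted above (19 minutes, about 300,000 IOs) gives the average IO rate during that recovery. The variable names are arbitrary.

```shell
# Figures from the message above: 19 minutes, about 300,000 IOs.
minutes=19
ios=300000
echo "$(( ios / (minutes * 60) )) IOs per second (integer division)"
```

That is a few hundred IOs per second on average, so replay time in this case was bounded by how fast the storage could service the recovery IO, not by the log format.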
> 2. EXT4 systems tend to have multiple filesystems, often one per drive
> (whereas XFS systems tend to have few filesystems)
[Citation needed]
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com