Re: Ext3 behavior on power failure

From: Ric Wheeler <ric@emc.com>
To: Jan Kara <jack@suse.cz>
Cc: csar@stanford.edu, linux-ext4@vger.kernel.org, ext3-users@redhat.com
Subject: Re: Ext3 behavior on power failure
Date: Wed, 28 Mar 2007 19:00:54 -0400
Message-ID: <460AF3A6.403@emc.com>
In-Reply-To: <20070328124015.GG14935@atrey.karlin.mff.cuni.cz>



Jan Kara wrote:
>> armangau_philippe@emc.com wrote:
>>> Hi all,
>>>
>>> We are building a new system which is going to use the ext3 FS. We would
>>> like to know more about the behavior of ext3 in the case of failure. But
>>> before I proceed, I would like to share more information about our future
>>> system.
>>> * Our application always does an fsync on files.
>>> * When symbolic links (more specifically, fast symlinks) are created,
>>>   the host directory is also fsync'ed.
>>> * Our application is also going to front an EMC disk array configured
>>>   using RAID5 or RAID6.
>>> * We will be using multipathing so that we can assume that no disk
>>>   errors will be reported.
>>> In this context, we would like to know the following for recovery after a
>>> power outage:
>>>
>>> 1. When will an fsck have to be run (not counting the scheduled fsck
>>>    every N mounts)?
>>> 2.	In the case of a crash, are the fsync-ed file contents and symbolic 
>>> links safe no matter what?
>>>
>>> Thanks,
>> This is an interesting twist on some of the discussion that we have had 
>> at the recent workshop and in other forums on hardening file systems in 
>> order to prevent the need to fsck.
>>
>> The twist is that we have a disk that will not lose power without being 
>> able to write to platter all of the data that has been sent - this is 
>> the case for most mid-range or higher disk arrays.
>>
>> If the application can precisely use fsync() on files, directories and 
>> symlinks, it wants to know that all objects are safe on disk that have 
>> completed a successful fsync. It also wants to know that the file system 
>> will not need any recovery beyond replaying transactions after a power 
>> outage/reboot - simply mount, let the transactions get replayed and you 
>> should be good to go without the fsck.
>>
>> The hard part of the question is to understand when and how often we 
>> will fail to deliver this easy case. Also, does any of the hardening in 
>> ext4 help here?
>   I'm probably misunderstanding something because the answer seems to be
> too obvious to me :) But anyway I'll write it so that you can correct
> me:
>   Due to journalling guarantees you should get a consistent FS whenever
> you replay the log (unless there are some software bugs or hardware
> problems, which is why fsck is run once per several mounts anyway).
>   If you fsync() your data, you are also guaranteed that your data are
> safely on disk when fsync returns. So what is the question here?
> 
> 								Honza

I think that the real question here is how often this really holds true in 
practice. And when it fails, how long does it take to recover the file system?
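
For concreteness, the application pattern under discussion (fsync every file, 
and fsync the host directory after creating a symlink) could be sketched 
roughly as below. This is only an illustration: the helper names 
(fsync_dir, write_durably, symlink_durably) are made up for the example and 
are not taken from the application in question, and it assumes a Linux-style 
system where a directory can be opened read-only and fsync'ed. It shows what 
the application asks for, not what the storage stack actually delivers when 
something underneath misbehaves.

```python
import os
import tempfile


def fsync_dir(dirpath):
    """fsync a directory so that entries created in it reach stable storage."""
    fd = os.open(dirpath, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def write_durably(path, data):
    """Write data to path, fsync the file, then fsync its parent directory."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while len(view) > 0:
            # os.write may write fewer bytes than requested; loop until done.
            view = view[os.write(fd, view):]
        os.fsync(fd)
    finally:
        os.close(fd)
    fsync_dir(os.path.dirname(os.path.abspath(path)))


def symlink_durably(target, linkpath):
    """Create a symlink, then fsync the directory that holds it."""
    os.symlink(target, linkpath)
    fsync_dir(os.path.dirname(os.path.abspath(linkpath)))


if __name__ == "__main__":
    # Hypothetical file names, used only for the demonstration.
    with tempfile.TemporaryDirectory() as workdir:
        write_durably(os.path.join(workdir, "data.bin"), b"payload")
        symlink_durably("data.bin", os.path.join(workdir, "current"))
        print(sorted(os.listdir(workdir)))
```

Once each call returns without raising, the application has done everything 
it can from user space; whether the data then survives a power cut is exactly 
the part that depends on the journal, the firmware and the hardware behaving.
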

There are a lot of odd errors that can happen when you monitor a large enough 
number of file systems. In my experience, I would guess that disk errors are 
clearly the leading cause of issues, followed by software bugs (file system, 
firmware, etc.) and then a group of errors caused by various occasional things 
(bad DRAM in the server/HBA/disk, bad cables, etc.). Note that using a high-end 
array does not eliminate errors, it just reduces the rate (hopefully by a large 
amount).

What is really hard to predict is the rate of the failures that require fsck 
with our current file system (say for a specific hardware setup) and how changes 
like the checksumming in ext4 can help us ride through these errors without 
needing a full fsck.

This rate has a direct impact on how much pain an fsck will inflict and how 
important redundancy is to avoid having the file system be a single point of 
failure.

ric
