public inbox for linux-ext4@vger.kernel.org
* e2fsck bogus error report on orphan-list
@ 2007-07-19 15:39 Ryoichi.Kato
  2007-07-19 16:55 ` Theodore Tso
  0 siblings, 1 reply; 6+ messages in thread
From: Ryoichi.Kato @ 2007-07-19 15:39 UTC (permalink / raw)
  To: linux-ext4, tytso; +Cc: sct, akpm, adilger, Tim Bird

Hi,
I hit a problem with orphan-list handling in ext3/e2fsck.

The following sequence produces a bogus e2fsck error report:
"/dev/XXX: Inodes that were part of a corrupted orphan linked list found."

   1. Delete a file in an ext3 filesystem in early 1970
   2. Set RTC to 2007, and then mount/write the filesystem.
   3. Run e2fsck (with -f)

This is because the i_dtime (deletion time) field is also used as the
next-pointer of the orphan list (where it stores an inode number rather
than a time), and e2fsck handles it improperly.
You will have the same problem if you run e2fsck on an ext3
filesystem with 1.2+ billion files in it. (Is that possible?)
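
The ambiguity can be illustrated with a minimal sketch. It is not e2fsck
code: the function name classify_dtime, the inode counts, and the sample
timestamps are hypothetical, and the only rule modeled is that a value
smaller than the inode count could be read either as an inode number or
as a time.

```python
from datetime import datetime, timezone


def classify_dtime(i_dtime: int, inodes_count: int) -> str:
    """Illustrative only: how the raw i_dtime value could be interpreted."""
    if i_dtime < inodes_count:
        # Small enough to be a valid inode number, so it could be an
        # orphan-list link OR a deletion time shortly after the epoch.
        return "ambiguous"
    return "deletion time"


# Hypothetical values chosen for illustration.
small_fs_inodes = 2_000_000
huge_fs_inodes = 1_200_000_000          # the "1.2+ billion files" case
early_1970 = 3600                       # one hour after the epoch
july_2007 = int(datetime(2007, 7, 19, tzinfo=timezone.utc).timestamp())

print(classify_dtime(early_1970, small_fs_inodes))
print(classify_dtime(july_2007, small_fs_inodes))
print(classify_dtime(july_2007, huge_fs_inodes))
```

A file deleted in early 1970 gets a tiny i_dtime that fits inside the
inode-number range, and a filesystem with more than about 1.2 billion
inodes pushes that range past a 2007 timestamp.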

For more detail, please take a look at a document I wrote:
 - http://tree.celinuxforum.org/CelfPubWiki/Ext3OrphanedInodeProblem
 - http://tree.celinuxforum.org/CelfPubWiki/JapanTechnicalJamboree15?action=AttachFile&do=get&target=ext3orphaned-inode.ppt (Sorry for .PPT)


So, my questions are:

 * Is this really a bug (or design defect)?

 * Which of ext3 or e2fsck is responsible for the problem?
    - I feel that e2fsck is, but it needs help from ext3 to solve it elegantly.

 * How should I (we) deal with this problem?
    - As a workaround, it is avoidable by just setting the RTC
      to 2007 or so before doing any ext3 operation.

Thank you.
--
Ryoichi KATO <Ryoichi.Kato@jp.sony.com>
    System Design Dept. No4
    Audio Development & Engineering Div.
    Sony Corporation Audio Business Group
    Tel +81-3-3599-3862 / Fax +81-3-3599-3859


* Re: e2fsck bogus error report on orphan-list
  2007-07-19 15:39 e2fsck bogus error report on orphan-list Ryoichi.Kato
@ 2007-07-19 16:55 ` Theodore Tso
  2007-07-19 21:05   ` Tim Bird
  2007-07-19 23:20   ` Ryoichi KATO
  0 siblings, 2 replies; 6+ messages in thread
From: Theodore Tso @ 2007-07-19 16:55 UTC (permalink / raw)
  To: Ryoichi.Kato; +Cc: linux-ext4, sct, akpm, adilger, Tim Bird

On Fri, Jul 20, 2007 at 12:39:19AM +0900, Ryoichi.Kato@jp.sony.com wrote:
> Hi,
> I hit a problem of ext3/e2fsck on orphan-list handling.

Wow, I'm rather impressed that this was sufficient for a presentation
at a conference.  You could have just sent me e-mail.  :-)

> 
> The following sequence produces bogus e2fsck error report:
> "/dev/XXX: Inodes that were part of a corrupted orphan linked list found."
> 
>    1. Delete a file in an ext3 filesystem in early 1970

Dare I ask *why* the system clock was set in the 1970's?  Umm... don't
do that.

>    2. Set RTC to 2007, and then mount/write the filesystem.

There is code that detects when the time is set back in the 1970's
(normally due to a bad clock battery) and thus disables this
particular check.  So it only triggers when the clock was previously
bad, and is now good.

> This is because the i_dtime (deletion time) field is also used as the
> next-pointer of the orphan list (where it stores an inode number rather
> than a time), and e2fsck handles it improperly.
> You will have the same problem if you run e2fsck on an ext3
> filesystem with 1.2+ billion files in it. (Is that possible?)

It's *possible* but in practice no one does it, because the fsck times
if the filesystem had that many inodes would be pretty scary --- and
there will always be times when you must run fsck --- for example, if
you have hardware induced corruption and you need to salvage the
filesystem because your backups had failed (or you weren't doing
backups :-).


The net is that the check is basically a sanity check to make sure any
bugs in the orphaned list handling would be discovered, although it can
also trigger if there is block device corruption where part of the
inode table is corrupted.  I had added heuristics that for most people
meant that it never triggered, so I'm surprised that it actually did
in your environment.  Still, if it did, the easiest thing to do is to
just turn it off.

We haven't had bugs in that area of the code for a long time, and if
it's actually causing you trouble, the simplest thing to do is to just
comment out the check.  That, or just make sure that the time is
correct, which is generally a good idea anyway.  Hmm, maybe I should
add an e2fsck configuration parameter:

[options]
	unreliable_system_clock = 1

which would disable the various heuristics that assume that the system
clock can be trusted.

	       	  	      	   		 - Ted


* Re: e2fsck bogus error report on orphan-list
  2007-07-19 16:55 ` Theodore Tso
@ 2007-07-19 21:05   ` Tim Bird
  2007-07-19 23:20   ` Ryoichi KATO
  1 sibling, 0 replies; 6+ messages in thread
From: Tim Bird @ 2007-07-19 21:05 UTC (permalink / raw)
  To: Theodore Tso; +Cc: Ryoichi.Kato, linux-ext4, sct, akpm, adilger

Theodore Tso wrote:
> On Fri, Jul 20, 2007 at 12:39:19AM +0900, Ryoichi.Kato@jp.sony.com wrote:
>> The following sequence produces bogus e2fsck error report:
>> "/dev/XXX: Inodes that were part of a corrupted orphan linked list found."
>>
>>    1. Delete a file in an ext3 filesystem in early 1970
> 
> Dare I ask *why* the system clock was set in the 1970's?  Umm... don't
> do that.

It is not uncommon for embedded boards to omit battery backing
on the RTC, so they always boot with a bogus (start-of-epoch) time.

 -- Tim

=============================
Tim Bird
Architecture Group Chair, CE Linux Forum
Senior Staff Engineer, Sony Corporation of America
=============================


* Re: e2fsck bogus error report on orphan-list
  2007-07-19 16:55 ` Theodore Tso
  2007-07-19 21:05   ` Tim Bird
@ 2007-07-19 23:20   ` Ryoichi KATO
  2007-07-20  4:10     ` Theodore Tso
  1 sibling, 1 reply; 6+ messages in thread
From: Ryoichi KATO @ 2007-07-19 23:20 UTC (permalink / raw)
  To: tytso; +Cc: linux-ext4, sct, akpm, adilger, tim.bird

At Thu, 19 Jul 2007 12:55:10 -0400,
Theodore Tso wrote:
> On Fri, Jul 20, 2007 at 12:39:19AM +0900, Ryoichi.Kato@jp.sony.com wrote:
> > Hi,
> > I hit a problem of ext3/e2fsck on orphan-list handling.
> 
> Wow, I'm rather impressed that this was sufficient for a presentation
> at a conference.  You could have just sent me e-mail.  :-)

I know it's a rare case for most people, and I'm not sure
it is a 'bug', but I thought it might happen more often for CE people.
So I asked for the opinions of CE people in a lightning session of the
"CELF Technical Jamboree."


> >    1. Delete a file in an ext3 filesystem in early 1970
> 
> Dare I ask *why* the system clock was set in the 1970's?  Umm... don't
> do that.

As Tim pointed out, embedded devices often omit the RTC battery.


> >    2. Set RTC to 2007, and then mount/write the filesystem.
> 
> There is code that detects when the time is set back in the 1970's
> (normally due to a bad clock battery) and thus disables this
> particular check.  So it only triggers when the clock was previously
> bad, and is now good.

Actually, it's a *real* problem that happened with my car navigation product.
Until a GPS signal is available, its clock reads 1970.
And for servers and PCs, it's possible that the RTC backup battery runs out,
and the clock then gets set correctly afterward by, say, NTP.


> The net is that the check is basically a sanity check to make sure any
> bugs in the orphaned list handling would be discovered, although it can
> also trigger if there is block device corruption where part of the
> inode table is corrupted.  I had added heuristics that for most people
> meant that it never triggered, so I'm surprised that it actually did
> in your environment.  Still, if it did, the easiest thing to do is to
> just turn it off.

Now that the cause behind the problem is known, it's easy.
But let me point out that:

 * It is very difficult to relate the RTC to the problem.
   There is no clue without digging into the e2fsck source code.

 * The -p (preen) option of e2fsck doesn't fix it automatically.
   I'm not sure, but maybe it's safe to correct the
   problem automatically?


Actually, it took me several weeks to solve, because it is rare.
My system only resets the RTC on a hardware reset or when the main
battery runs out, but not on a software reset. But it can happen.


Thank you.
--
Ryoichi KATO <Ryoichi.Kato@jp.sony.com>
    System Design Dept. No4
    Audio Development & Engineering Div.
    Sony Corporation Audio Business Group
    Tel +81-3-3599-3862 / Fax +81-3-3599-3859


* Re: e2fsck bogus error report on orphan-list
  2007-07-19 23:20   ` Ryoichi KATO
@ 2007-07-20  4:10     ` Theodore Tso
  2007-07-20  9:45       ` Ryoichi KATO
  0 siblings, 1 reply; 6+ messages in thread
From: Theodore Tso @ 2007-07-20  4:10 UTC (permalink / raw)
  To: Ryoichi KATO; +Cc: linux-ext4, sct, akpm, adilger, tim.bird

On Fri, Jul 20, 2007 at 08:20:26AM +0900, Ryoichi KATO wrote:
> > >    1. Delete a file in an ext3 filesystem in early 1970
> > 
> > Dare I ask *why* the system clock was set in the 1970's?  Umm... don't
> > do that.
> 
> As Tim pointed out, embedded devices often omit the RTC battery.

Yes, I added the busted_fs_clock specifically to handle this.

> > There is code that detects when the time is set back in the 1970's
> > (normally due to a bad clock battery) and thus disables this
> > particular check.  So it only triggers when the clock was previously
> > bad, and is now good.
> 
> Actually, it's a *real* problem that happened with my car navigation product.
> Until a GPS signal is available, its clock reads 1970.
> And for servers and PCs, it's possible that the RTC backup battery runs out,
> and the clock then gets set correctly afterward by, say, NTP.

Sure, but we have checks to detect if the last superblock write *or*
last mount time is in the 1970's, and if so, we declare the filesystem
as having a busted/insane system clock, and we disable the
dtime/orphaned inode checks.  This in practice is plenty since it
means that the mount-time is in the 1970's, and then when NTP sets the
time, we're fine, since that's generally after the e2fsck and the
mount of the filesystem.
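
A minimal sketch of the heuristic described above, for illustration
only. It is not the e2fsck source: the Superblock class, the function
names, and the SANE_TIME_FLOOR cutoff (1980-01-01 UTC) are assumptions
standing in for whatever test e2fsck actually uses to decide that a
timestamp falls "in the 1970's".

```python
from dataclasses import dataclass

# Hypothetical cutoff: anything earlier than 1980-01-01 UTC is treated
# as evidence that the clock was not set. The real test may differ.
SANE_TIME_FLOOR = 315_532_800


@dataclass
class Superblock:
    s_mtime: int         # last mount time
    s_wtime: int         # last superblock write time
    s_inodes_count: int


def clock_looks_busted(sb: Superblock) -> bool:
    """True if either superblock timestamp looks like a 1970's value."""
    return sb.s_mtime < SANE_TIME_FLOOR or sb.s_wtime < SANE_TIME_FLOOR


def low_dtime_reported(sb: Superblock, i_dtime: int) -> bool:
    """Would the dtime/orphan sanity check complain about this inode?"""
    if clock_looks_busted(sb):
        return False     # check disabled: the clock cannot be trusted
    return 0 < i_dtime < sb.s_inodes_count


T_2007 = 1_184_800_000   # roughly July 2007
bad_clock = Superblock(s_mtime=600, s_wtime=900, s_inodes_count=100_000)
good_clock = Superblock(s_mtime=T_2007, s_wtime=T_2007, s_inodes_count=100_000)

print(low_dtime_reported(bad_clock, 700))
print(low_dtime_reported(good_clock, 700))
print(low_dtime_reported(good_clock, T_2007))
```

With both superblock timestamps sane, an inode whose i_dtime was written
while the clock read 1970 is the only case that gets reported.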

So for it to trigger it requires a very strange set of modulations of
the time.  You need to have time be correct at the time of the mount
(so s_mtime is sane, implying that the RTC backup battery is not
dead), and *then* reset to the 1970's, delete some files, then be
correct when the filesystem is unmounted (so s_wtime is sane).  That's
pretty hard to accomplish, I would submit, even on embedded
systems.  The system clock must be crazily warping back and forth
between correct time and 1970's/insane time in order for this to be an
issue.

This has been true since e2fsprogs 1.38, released June 30, 2005;
before that point we only checked s_wtime for sanity, and we did have
a few cases slip through, but ever since I added the s_mtime check, I
haven't had anyone report a problem until now.  (Although if people
don't e-mail me, and just do conference presentations, I'd have no way
of finding out unless I was lucky enough to attend the conference.  :-)

>  * It is very difficult to relate RTC to the problem.
>    No clue without digging into e2fsck source code.

Yes.  As I said, it might be a good idea to add an
unreliable_system_time config parameter to e2fsck in the future to
catch this case.  That would also document the issue and keep other
people from running into this in the future.

>  * The -p (preen) option of e2fsck doesn't fix it automatically.
>    I'm not sure, but maybe it's safe to correct the
>    problem automatically?

Yes, but this was deliberate; if there was a bug in the kernel's
orphan handling code, I really wanted to know about it, and if it was
fixed under -p, most folks would never know.  (Although if there were
orphan list handling bugs, some truncates might not be reliably
replayed, which could cause even **harder** to diagnose bugs.
Life is always full of tradeoffs.)

							- Ted


* Re: e2fsck bogus error report on orphan-list
  2007-07-20  4:10     ` Theodore Tso
@ 2007-07-20  9:45       ` Ryoichi KATO
  0 siblings, 0 replies; 6+ messages in thread
From: Ryoichi KATO @ 2007-07-20  9:45 UTC (permalink / raw)
  To: tytso; +Cc: linux-ext4, sct, akpm, adilger, tim.bird

At Fri, 20 Jul 2007 00:10:52 -0400,
Theodore Tso wrote:
> So for it to trigger it requires a very strange set of modulations of
> the time.  You need to have time be correct at the time of the mount
> (so s_mtime is sane, implying that the RTC backup battery is not
> dead), and *then* reset to the 1970's, delete some files, then be
> correct when the filesystem is unmounted (so s_wtime is sane).  That's
> pretty hard to accomplish, I would submit, even on embedded
> systems.  The system clock must be crazily warping back and forth
> between correct time and 1970's/insane time in order for this to be an
> issue.

If I'm understanding correctly, once you have deleted a file in 1970,
its inode might stay in the filesystem for a certain period of time,
like a time bomb.
Then you don't need the clock to jump back and forth.
It seems to me that even a typical PC can show the symptom
after two reboots, like this:

  1.  RTC backup battery runs out
  2.  hardware reboot; RTC is set to 1970.
  3.  mount, delete a file (in 1970)
  4.  umount
  5.  Set clock to 2007 (manually, or by NTP)
  - - - -
  6.  reboot (software reset which doesn't reset the RTC, or replace battery.)
  7.  e2fsck (no problem this time)
  8.  mount (in 2007)
  9.  write (in 2007)
  10. umount
  - - - -
  11. reboot
  12. e2fsck, hit the problem.

There is no way to notice the real cause (the RTC) if the system is a
server that only reboots once a year.
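
The sequence can be replayed in a small simulation, assuming the
deletion in step 3 happens while the clock still reads 1970. This is a
sketch under stated assumptions, not e2fsck or kernel code: the Fs
class, the helper names, the SANE_TIME_FLOOR cutoff, and the rule that
mount, write, and umount all refresh s_wtime are simplifications made
for illustration.

```python
from dataclasses import dataclass, field

T_1970 = 600                   # ten minutes after the epoch
T_2007 = 1_184_800_000         # roughly July 2007
SANE_TIME_FLOOR = 315_532_800  # hypothetical "1970's" cutoff (1980-01-01 UTC)


@dataclass
class Fs:
    inodes_count: int = 100_000
    s_mtime: int = T_2007
    s_wtime: int = T_2007
    dtimes: list = field(default_factory=list)  # i_dtime of deleted inodes


def mount(fs: Fs, now: int) -> None:
    fs.s_mtime = now
    fs.s_wtime = now


def delete_file(fs: Fs, now: int) -> None:
    fs.dtimes.append(now)      # deletion time is stamped from the clock
    fs.s_wtime = now


def write_or_umount(fs: Fs, now: int) -> None:
    fs.s_wtime = now


def fsck_reports_problem(fs: Fs) -> bool:
    # Check is skipped when either superblock time looks like the 1970's.
    if fs.s_mtime < SANE_TIME_FLOOR or fs.s_wtime < SANE_TIME_FLOOR:
        return False
    return any(0 < d < fs.inodes_count for d in fs.dtimes)


fs = Fs()
mount(fs, T_1970)                       # steps 2-3: clock reads 1970
delete_file(fs, T_1970)
write_or_umount(fs, T_1970)             # step 4
first_fsck = fsck_reports_problem(fs)   # step 7: superblock times are 1970
mount(fs, T_2007)                       # step 8
write_or_umount(fs, T_2007)             # steps 9-10
second_fsck = fsck_reports_problem(fs)  # step 12: stale i_dtime is exposed

print(first_fsck, second_fsck)
```

The first check is suppressed because the superblock still carries 1970
timestamps; the second is not, because the mount in 2007 overwrote them
while the old i_dtime stayed behind.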

 
> >  * It is very difficult to relate RTC to the problem.
> >    No clue without digging into e2fsck source code.
> 
> Yes.  As I said, it might be a good idea to add an
> unreliable_system_time config parameter to e2fsck in the future to
> catch this case.  That would also document the issue to avoid future
> people from running into this.
And might it also be very helpful to have some hint in the e2fsck message?


> >  * The -p (preen) option of e2fsck doesn't fix it automatically.
> >    I'm not sure, but maybe it's safe to correct the
> >    problem automatically?
> 
> Yes, but this was deliberate; if there was a bug in the kernel's
> orphan handling code, I really wanted to know about it, and if it was
> fixed under -p, most folks would never know.  (Although if there were
> orphan list handling bugs, some truncates might not be reliably
> replayed, which could cause even **harder** to diagnose bugs.
> Life is always full of tradeoffs.)
OK, I agree.  You have at least one example of such person here :-)


Regards,
--
Ryoichi KATO <Ryoichi.Kato@jp.sony.com>
    Audio Development & Engineering Div.
    Sony Corporation Audio Business Group
    Tel +81-3-3599-3862 / Fax +81-3-3599-3859

