ext3-2.4-0.9.6

public inbox for linux-kernel@vger.kernel.org

* ext3-2.4-0.9.6
@ 2001-08-12  1:40 Andrew Morton
  2001-08-12  1:46 ` ext3-2.4-0.9.6 Tom Rini
  2001-08-12  2:38 ` ext3-2.4-0.9.6 Ralf Baechle
  0 siblings, 2 replies; 11+ messages in thread
From: Andrew Morton @ 2001-08-12  1:40 UTC
  To: ext3-users@redhat.com, lkml

Patch against linux-2.4.8 is at

	http://www.uow.edu.au/~andrewm/linux/ext3/

The only changes here are merging up to 2.4.8 and the bigendian
fix.

linux-2.4.8-ac1 currently has ext3-0.9.3 which has no known
crash-worthy bugs, but is old.  I'm about to send Alan a diff
which takes -ac up to 0.9.6.  The changes between 0.9.3 and
0.9.6 may be summarised as:

- Simplify the handling of synchronous operations (O_SYNC, fsync(),
  chattr +S, etc).

- Fix a couple of places where we're not syncing writes when we should.

- Implement batching of synchronous operations: when multiple threads
  want to perform synchronous writes we allow the threads to block together
  and all their writes happen in the same transaction.   Speeds things up
  muchly.

- Implement support for external journal devices.  This is experimental
  at this stage.  It works fine, but the operational interfaces will change.
  At present the external journal device is not "mounted" when we're using
  it and it really should be.

- ext3 has for a long time had developer code which allows the target device
  to be turned read-only at the disk device driver level a certain number
  of jiffies after the fs was mounted.  This is to allow scripted testing
  of crash recovery.  This facility has been extended to support two devices;
  one for the filesystem and one for the external journal device.

- Accelerate an O(N^2) algorithm in log_do_checkpoint().

- Accelerate an O(N^2) algorithm in journal_commit_transaction().

- Rate-limit some error messages which can come out when we're
  hopelessly out of memory.

- Honour __GFP_WAIT in journal_try_to_free_buffers().  The fs is supposed
  to perform synchronous writeout on the second pass of page_launder() and
  we weren't doing that - we were starting all IO async.  The net effect of
  this change is to decrease throughput with dbench by 10-20%, but system
  CPU time goes from 60% to 30%.  It's the right thing to do...
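The batching of synchronous operations described in the list above is a form of group commit. Below is a minimal Python sketch of the idea, not the JBD code. `GroupCommitJournal` is a hypothetical class, and the `flush` callback stands in for writing a transaction's blocks and commit record to disk. The first thread that asks for a sync becomes the committer. It waits briefly so that other synchronous writers can join, and then one flush covers every write queued in that transaction.

```python
import threading
import time


class GroupCommitJournal:
    """Toy group commit: concurrent synchronous writers share one flush.

    Hypothetical illustration only; `flush` stands in for real disk I/O.
    """

    def __init__(self, flush, batch_window=0.01):
        self._flush = flush              # called once per committed batch
        self._window = batch_window      # how long the committer waits for joiners
        self._cond = threading.Condition()
        self._pending = []               # writes in the running transaction
        self._tid = 0                    # id of the running transaction
        self._committed_tid = -1         # highest transaction id flushed so far
        self._committing = False         # is some thread acting as committer?

    def sync_write(self, data):
        """Queue `data` and block until a flush containing it has completed."""
        with self._cond:
            self._pending.append(data)
            my_tid = self._tid
            if not self._committing:
                # This thread becomes the committer for transaction my_tid.
                self._committing = True
                self._cond.release()     # let other writers join meanwhile
                time.sleep(self._window)
                self._cond.acquire()
                batch, self._pending = self._pending, []
                self._tid += 1           # later writers start a new transaction
                self._flush(batch)       # one flush for the whole batch
                self._committed_tid = my_tid
                self._committing = False
                self._cond.notify_all()
            while self._committed_tid < my_tid:
                self._cond.wait()        # joiners sleep until their batch is durable
```

When eight threads call `sync_write` at roughly the same moment, they typically finish after one or two calls to `flush` instead of eight. In this sketch, a lone synchronous writer always pays the full `batch_window` delay. That delay is the price of having fewer flushes when many threads sync at once.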

* Re: ext3-2.4-0.9.6
  2001-08-12  1:40 ext3-2.4-0.9.6 Andrew Morton
@ 2001-08-12  1:46 ` Tom Rini
  2001-08-12  1:55   ` ext3-2.4-0.9.6 Andrew Morton
  2001-08-12  2:38 ` ext3-2.4-0.9.6 Ralf Baechle
  1 sibling, 1 reply; 11+ messages in thread
From: Tom Rini @ 2001-08-12  1:46 UTC
  To: ext3-users, Andrew Morton; +Cc: lkml

On Sat, Aug 11, 2001 at 06:40:22PM -0700, Andrew Morton wrote:

> Patch against linux-2.4.8 is at
> 
> 	http://www.uow.edu.au/~andrewm/linux/ext3/
> 
> The only changes here are merging up to 2.4.8 and the bigendian
> fix.

Gack.  I think about when you wrote this, I managed to crash again.  I
was running 2.4.8-pre8 + fsync_dev -> fsync_no_super + first fix.  It
was at transaction.c:1184, but the logs didn't make it to disk.  On a
related note, what does ext3 do to the disk when this happens?  I
think I need to point the yaboot author at it since it couldn't
load a kernel (which was fun, let me tell you.. :))

-- 
Tom Rini (TR1265)
http://gate.crashing.org/~trini/

* Re: ext3-2.4-0.9.6
  2001-08-12  1:46 ` ext3-2.4-0.9.6 Tom Rini
@ 2001-08-12  1:55   ` Andrew Morton
  2001-08-12  2:15     ` ext3-2.4-0.9.6 Tom Rini
  0 siblings, 1 reply; 11+ messages in thread
From: Andrew Morton @ 2001-08-12  1:55 UTC
  To: Tom Rini; +Cc: ext3-users, lkml

Tom Rini wrote:
> 
> On Sat, Aug 11, 2001 at 06:40:22PM -0700, Andrew Morton wrote:
> 
> > Patch against linux-2.4.8 is at
> >
> >       http://www.uow.edu.au/~andrewm/linux/ext3/
> >
> > The only changes here are merging up to 2.4.8 and the bigendian
> > fix.
> 
> Gack.  I think about when you wrote this, I managed to crash again.  I
> was running 2.4.8-pre8 + fsync_dev -> fsync_no_super + first fix.  It
> was at transaction.c:1184, but the logs didn't make it to disk.

I'd assumed that this was related to the endianness fix.  You're
sure you were running with that in place?  If you can capture
a buffer trace that'd be great.

>  On a
> related note, what does ext3 do to the disk when this happens?  I
> think I need to point the yaboot author at it since it couldn't
> load a kernel (which was fun, let me tell you.. :))

ext3 is designed to nicely crash the machine if it thinks something
may be wrong with the fs - it's very defensive of your data.

If yaboot is open firmware's native ext2 capability then presumably
it refuses to read an ext3 partition which needs recovery.  ext3
is designed to not be compatible with ext2 when it's in the
needs-recovery state.

Probably the simplest way to avoid this is to make the boot partition
ext2.

-

* Re: ext3-2.4-0.9.6
  2001-08-12  1:55   ` ext3-2.4-0.9.6 Andrew Morton
@ 2001-08-12  2:15     ` Tom Rini
  2001-08-12  2:28       ` ext3-2.4-0.9.6 Andrew Morton
  0 siblings, 1 reply; 11+ messages in thread
From: Tom Rini @ 2001-08-12  2:15 UTC
  To: Andrew Morton; +Cc: ext3-users, lkml

On Sat, Aug 11, 2001 at 06:55:05PM -0700, Andrew Morton wrote:
> Tom Rini wrote:
> > 
> > On Sat, Aug 11, 2001 at 06:40:22PM -0700, Andrew Morton wrote:
> > 
> > > Patch against linux-2.4.8 is at
> > >
> > >       http://www.uow.edu.au/~andrewm/linux/ext3/
> > >
> > > The only changes here are merging up to 2.4.8 and the bigendian
> > > fix.
> > 
> > Gack.  I think about when you wrote this, I managed to crash again.  I
> > was running 2.4.8-pre8 + fsync_dev -> fsync_no_super + first fix.  It
> > was at transaction.c:1184, but the logs didn't make it to disk.
> 
> I'd assumed that this was related to the endianness fix.  You're
> sure you were running with that in place?  If you can capture
> a buffer trace that'd be great.

I'm sure I had the fix in.  I re-ran the original test I had a few times
and it was good.  I'll try and capture the buffer trace if it happens
again, but last time I'm guessing it happened on my root fs, so the log
couldn't go to disk.

> >  On a
> > related note, what does ext3 do to the disk when this happens?  I
> > think I need to point the yaboot author at it since it couldn't
> > load a kernel (which was fun, let me tell you.. :))
> 
> ext3 is designed to nicely crash the machine if it thinks something
> may be wrong with the fs - it's very defensive of your data.
> 
> If yaboot is open firmware's native ext2 capability then presumably
> it refuses to read an ext3 partition which needs recovery.  ext3
> is designed to not be compatible with ext2 when it's in the
> needs-recovery state.

It's the linux bootloader that OF runs.  Is there any 'safe' way to read
data off of an unclean ext3 partition?  I'm thinking grub might run into
this problem too..

-- 
Tom Rini (TR1265)
http://gate.crashing.org/~trini/

* Re: ext3-2.4-0.9.6
  2001-08-12  2:15     ` ext3-2.4-0.9.6 Tom Rini
@ 2001-08-12  2:28       ` Andrew Morton
  2001-08-12  2:47         ` ext3-2.4-0.9.6 Tom Rini
  2001-08-13 18:15         ` ext3-2.4-0.9.6 Andreas Dilger
  0 siblings, 2 replies; 11+ messages in thread
From: Andrew Morton @ 2001-08-12  2:28 UTC
  To: Tom Rini; +Cc: ext3-users, lkml

Tom Rini wrote:
> 
> ...
> > I'd assumed that this was related to the endianness fix.  You're
> > sure you were running with that in place?  If you can capture
> > a buffer trace that'd be great.
> 
> I'm sure I had the fix in.  I re-ran the original test I had a few times
> and it was good.  I'll try and capture the buffer trace if it happens
> again, but last time I'm guessing it happened on my root fs, so the log
> couldn't go to disk.

OK, thanks.

> > >  On a
> > > related note, what does ext3 do to the disk when this happens?  I
> > > think I need to point the yaboot author at it since it couldn't
> > > load a kernel (which was fun, let me tell you.. :))
> >
> > ext3 is designed to nicely crash the machine if it thinks something
> > may be wrong with the fs - it's very defensive of your data.
> >
> > If yaboot is open firmware's native ext2 capability then presumably
> > it refuses to read an ext3 partition which needs recovery.  ext3
> > is designed to not be compatible with ext2 when it's in the
> > needs-recovery state.
> 
> It's the linux bootloader that OF runs.  Is there any 'safe' way to read
> data off of an unclean ext3 partition?  I'm thinking grub might run into
> this problem too..

Well, LILO works OK with an unclean ext3 FS because it goes straight to
the underlying blocks.  If both grub and OF parse the superblock compatibility
bits then they could fail in this manner.

I *think* that at present an unrecovered ext3 filesystem is "incompatible"
with ext2.  If, however, we were to make it "read-only compatible" then
ext2-aware loaders would still be able to read the fs and boot from it.
But this stuff makes my head hurt - let's see what Andreas and Stephen
have to say.

-

* Re: ext3-2.4-0.9.6
  2001-08-12  1:40 ext3-2.4-0.9.6 Andrew Morton
  2001-08-12  1:46 ` ext3-2.4-0.9.6 Tom Rini
@ 2001-08-12  2:38 ` Ralf Baechle
  2001-08-12  3:10   ` ext3-2.4-0.9.6 Andrew Morton
  2001-08-13 17:56   ` ext3-2.4-0.9.6 Stephen C. Tweedie
  1 sibling, 2 replies; 11+ messages in thread
From: Ralf Baechle @ 2001-08-12  2:38 UTC
  To: Andrew Morton; +Cc: ext3-users@redhat.com, lkml

On Sat, Aug 11, 2001 at 06:40:22PM -0700, Andrew Morton wrote:

> - ext3 has for a long time had developer code which allows the target device
>   to be turned read-only at the disk device driver level a certain number
>   of jiffies after the fs was mounted.  This is to allow scripted testing
>   of crash recovery.  This facility has been extended to support two devices;
>   one for the filesystem and one for the external journal device.

Would this facility also be able to deal with parts of a device becoming
read-only unexpectedly?  Some of the disks I have in RAIDs have the
nice habit of disabling write access when overheating.  That's an
interesting failure scenario in a RAID system.

  Ralf

* Re: ext3-2.4-0.9.6
  2001-08-12  2:28       ` ext3-2.4-0.9.6 Andrew Morton
@ 2001-08-12  2:47         ` Tom Rini
  2001-08-12  4:58           ` ext3-2.4-0.9.6 Ben Collins
  2001-08-13 18:15         ` ext3-2.4-0.9.6 Andreas Dilger
  1 sibling, 1 reply; 11+ messages in thread
From: Tom Rini @ 2001-08-12  2:47 UTC
  To: Andrew Morton; +Cc: ext3-users, lkml

On Sat, Aug 11, 2001 at 07:28:51PM -0700, Andrew Morton wrote:
> Tom Rini wrote:
> > > >  On a
> > > > related note, what does ext3 do to the disk when this happens?  I
> > > > think I need to point the yaboot author at it since it couldn't
> > > > load a kernel (which was fun, let me tell you.. :))
> > >
> > > ext3 is designed to nicely crash the machine if it thinks something
> > > may be wrong with the fs - it's very defensive of your data.
> > >
> > > If yaboot is open firmware's native ext2 capability then presumably
> > > it refuses to read an ext3 partition which needs recovery.  ext3
> > > is designed to not be compatible with ext2 when it's in the
> > > needs-recovery state.
> > 
> > It's the linux bootloader that OF runs.  Is there any 'safe' way to read
> > data off of an unclean ext3 partition?  I'm thinking grub might run into
> > this problem too..
> 
> Well, LILO works OK with an unclean ext3 FS because it goes straight to
> the underlying blocks.  If both grub and OF parse the superblock compatibility
> bits then they could fail in this manner.

Both GRUB and yaboot can read directly from the fs.  It's possible to boot
a kernel right out of OF from an HFS partition (which I had to do to get
the box up again).  It might be worth documenting this someplace.

> I *think* that at present an unrecovered ext3 filesystem is "incompatible"
> with ext2.  If, however, we were to make it "read-only compatible" then
> ext2-aware loaders would still be able to read the fs and boot from it.

That'd be nice.  There are lots of PPC boxes which don't have a separate
/boot.  But ext2 should be able to read a clean ext3, yes?  I never get
a chance to check things... :)

-- 
Tom Rini (TR1265)
http://gate.crashing.org/~trini/

* Re: ext3-2.4-0.9.6
  2001-08-12  2:38 ` ext3-2.4-0.9.6 Ralf Baechle
@ 2001-08-12  3:10   ` Andrew Morton
  2001-08-13 17:56   ` ext3-2.4-0.9.6 Stephen C. Tweedie
  1 sibling, 0 replies; 11+ messages in thread
From: Andrew Morton @ 2001-08-12  3:10 UTC
  To: Ralf Baechle; +Cc: ext3-users@redhat.com, lkml

Ralf Baechle wrote:
> 
> On Sat, Aug 11, 2001 at 06:40:22PM -0700, Andrew Morton wrote:
> 
> > - ext3 has for a long time had developer code which allows the target device
> >   to be turned read-only at the disk device driver level a certain number
> >   of jiffies after the fs was mounted.  This is to allow scripted testing
> >   of crash recovery.  This facility has been extended to support two devices;
> >   one for the filesystem and one for the external journal device.
> 
> Would this facility also be able to deal with parts of a device becoming
> read-only unexpectedly?  Some of the disks I have in RAIDs have the
> nice habit of disabling write access when overheating.  That's an
> interesting failure scenario in a RAID system.

Well, that facility is purely for development purposes.  The obvious
way of testing recovery is to hit the reset button at strategic
times, which rather sucks.  So what the above IDE driver trick does
is add a new mount option `ro-after=3000'.  When this is provided,
a kernel timer fires 3000 jiffies (30 seconds at HZ=100) after mount and the IDE driver starts
silently ignoring writes to the underlying device.  It also provides
a special ioctl() which blocks the caller until the timer has fired.
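The read-only-after-N-jiffies trick can be modelled in user space. Below is a minimal Python sketch, not the IDE driver code. `RoAfterDevice` is a hypothetical in-memory device, and wall-clock seconds replace jiffies. A `threading.Timer` flips the device to read-only. After that, writes are silently dropped while still appearing to succeed. `wait_until_readonly()` plays the role of the special ioctl() that blocks until the timer fires.

```python
import threading


class RoAfterDevice:
    """In-memory block device that silently ignores writes after a delay.

    Hypothetical model of the ext3 test hook; not the real driver code.
    """

    def __init__(self, nblocks, ro_after_seconds):
        self.blocks = [b""] * nblocks
        self.dropped = 0                     # writes ignored after the timer fired
        self._readonly = threading.Event()
        self._timer = threading.Timer(ro_after_seconds, self._readonly.set)
        self._timer.daemon = True
        self._timer.start()

    def write(self, blockno, data):
        if self._readonly.is_set():
            self.dropped += 1                # report success, persist nothing
            return
        self.blocks[blockno] = data

    def read(self, blockno):
        return self.blocks[blockno]

    def wait_until_readonly(self, timeout=None):
        """Block the caller until the timer has fired (cf. the special ioctl)."""
        return self._readonly.wait(timeout)

    def cancel(self):
        self._timer.cancel()
```

Because dropped writes look successful to the caller, the filesystem above the device keeps running as if nothing happened. That is the behavior the crash-recovery scripts rely on.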

So we have scripts which do:

1: mount fs, set to go read-only in 30 seconds
2: start some filesystem activity
3: block on the timer
4: wake up, kill off the filesystem activity
5: unmount the fs
6: mount the fs (this will run recovery)
7: unmount the fs
8: fsck it
9: repeat with a different read-only interval.
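The nine steps above amount to a loop over read-only intervals. Below is a self-contained Python model of that loop, not the actual ext3 test scripts. The "disk" is a dict. The read-only point is counted in block writes rather than jiffies. `ToyJournalFS` is a hypothetical journal that writes each transaction's blocks, then a commit record, and only then checkpoints the blocks into place. The final check stands in for fsck: after recovery, the filesystem must equal the state after some whole number of transactions. All names here are hypothetical.

```python
class Disk:
    """Dict-backed disk that silently drops every write after `ro_after` writes."""

    def __init__(self, ro_after):
        self.data = {}
        self.writes = 0
        self.ro_after = ro_after

    def write(self, key, value):
        self.writes += 1
        if self.writes <= self.ro_after:
            self.data[key] = value


class ToyJournalFS:
    """Write-ahead journal: journal blocks, then commit record, then checkpoint."""

    def __init__(self, disk):
        self.disk = disk

    def commit(self, tid, updates):
        for i, (blk, val) in enumerate(updates.items()):
            self.disk.write(("journal", tid, i), (blk, val))
        self.disk.write(("commit", tid), len(updates))  # durable once this lands
        for blk, val in updates.items():                 # checkpoint into place
            self.disk.write(("fs", blk), val)


def recover(disk):
    """Replay, in order, every transaction whose commit record reached disk."""
    tids = sorted(k[1] for k in disk.data if k[0] == "commit")
    for tid in tids:
        for i in range(disk.data[("commit", tid)]):
            blk, val = disk.data[("journal", tid, i)]
            disk.data[("fs", blk)] = val


def fs_view(disk):
    return {k[1]: v for k, v in disk.data.items() if k[0] == "fs"}


def workload(n):
    """Deterministic transactions; each one rewrites two of five blocks."""
    return [{t % 5: f"t{t}a", (t + 2) % 5: f"t{t}b"} for t in range(n)]


def states_after_prefixes(txns):
    state, states = {}, [{}]
    for upd in txns:
        state = {**state, **upd}
        states.append(state)
    return states


def run_crash_tests(n_txns=6, max_ro_after=40):
    txns = workload(n_txns)
    valid = states_after_prefixes(txns)
    for ro_after in range(max_ro_after):     # step 9: vary the read-only point
        disk = Disk(ro_after)                 # step 1: "mount", arm read-only point
        fs = ToyJournalFS(disk)
        for tid, upd in enumerate(txns):      # steps 2-4: activity; late writes vanish
            fs.commit(tid, upd)
        recover(disk)                         # steps 5-7: remount runs recovery
        assert fs_view(disk) in valid, ro_after  # step 8: "fsck"
    return max_ro_after
```

In this toy, the device drops every write after a fixed point, so the writes that survive are always a prefix of the write stream. The commit-before-checkpoint ordering then guarantees that replaying the surviving commits yields a consistent state.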

I also have a hacked-on version of dbench which writes
known-but-variable info into the files, so we can check that
the contents of whatever files survived the "crash" are correct.
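One way to produce "known-but-variable" contents is to derive every byte from the file name and a write-generation counter. A checker can then recompute what any surviving file ought to hold. Below is a minimal Python sketch of that idea. `expected_contents` and `check_file` are hypothetical helpers, not code from the modified dbench. In this sketch, a surviving file must match one complete generation exactly. A real checker would also need rules for truncated or partially written files.

```python
import hashlib


def expected_contents(name, generation, size=4096):
    """Deterministic pseudo-random bytes for (file name, write generation)."""
    out = bytearray()
    counter = 0
    while len(out) < size:
        h = hashlib.sha256(f"{name}:{generation}:{counter}".encode())
        out += h.digest()
        counter += 1
    return bytes(out[:size])


def check_file(name, contents, max_generation, size=4096):
    """Return the generation that `contents` matches, or None if it is corrupt."""
    for gen in range(max_generation + 1):
        if contents == expected_contents(name, gen, size):
            return gen
    return None
```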

This setup has allowed me to run crash+recovery many thousands
of times with varying workloads - I'm pretty confident about recovery
because of this.  The one thing it doesn't cover is the effects
of disk write caching.

As for the RAID problem: if the filesystem has magically turned
read-only then all you need to do is to unmount it (often hard
to do, if it's in use), then make it writable and then mount it or
run fsck against it.  ext3 will perform recovery and all should
be peachy, until next time...

-

* Re: ext3-2.4-0.9.6
  2001-08-12  2:47         ` ext3-2.4-0.9.6 Tom Rini
@ 2001-08-12  4:58           ` Ben Collins
  0 siblings, 0 replies; 11+ messages in thread
From: Ben Collins @ 2001-08-12  4:58 UTC
  To: Tom Rini; +Cc: Andrew Morton, ext3-users, lkml

On Sat, Aug 11, 2001 at 07:47:44PM -0700, Tom Rini wrote:
> On Sat, Aug 11, 2001 at 07:28:51PM -0700, Andrew Morton wrote:
> > Well, LILO works OK with an unclean ext3 FS because it goes straight to
> > the underlying blocks.  If both grub and OF parse the superblock compatibility
> > bits then they could fail in this manner.
> 
> Both GRUB and yaboot can read directly from the fs.  It's possible to boot
> a kernel right out of OF from an HFS partition (which I had to do to get
> the box up again).  It might be worth documenting this someplace.

The same goes for SILO. Not sure if SILO even works with ext3 right now
(uses libext2fs, so I assume so).

Ben

-- 
 .----------=======-=-======-=========-----------=====------------=-=-----.
/  Ben Collins  --  ...on that fantastic voyage...  --  Debian GNU/Linux   \
`  bcollins@debian.org  --  bcollins@openldap.org  --  bcollins@linux.com  '
 `---=========------=======-------------=-=-----=-===-======-------=--=---'

* Re: ext3-2.4-0.9.6
  2001-08-12  2:38 ` ext3-2.4-0.9.6 Ralf Baechle
  2001-08-12  3:10   ` ext3-2.4-0.9.6 Andrew Morton
@ 2001-08-13 17:56   ` Stephen C. Tweedie
  1 sibling, 0 replies; 11+ messages in thread
From: Stephen C. Tweedie @ 2001-08-13 17:56 UTC
  To: ext3-users; +Cc: Andrew Morton, lkml

Hi,

On Sun, Aug 12, 2001 at 04:38:41AM +0200, Ralf Baechle wrote:
> On Sat, Aug 11, 2001 at 06:40:22PM -0700, Andrew Morton wrote:
> 
> > - ext3 has for a long time had developer code which allows the target device
> >   to be turned read-only at the disk device driver level a certain number
> >   of jiffies after the fs was mounted.  This is to allow scripted testing
> >   of crash recovery.  This facility has been extended to support two devices;
> >   one for the filesystem and one for the external journal device.
> 
> Would this facility also be able to deal with parts of a device becoming
> read-only unexpectedly?  Some of the disks I have in RAIDs have the
> nice habit of disabling write access when overheating.  That's an
> interesting failure scenario in a RAID system.

No, that particular part of the ext3 patch is only there for testing
--- it forces a single device readonly to simulate a crash.  It adds
no ability to deal cleanly with a device unexpectedly going readonly
on its own. 

Cheers,
 Stephen

* Re: ext3-2.4-0.9.6
  2001-08-12  2:28       ` ext3-2.4-0.9.6 Andrew Morton
  2001-08-12  2:47         ` ext3-2.4-0.9.6 Tom Rini
@ 2001-08-13 18:15         ` Andreas Dilger
  1 sibling, 0 replies; 11+ messages in thread
From: Andreas Dilger @ 2001-08-13 18:15 UTC
  To: Andrew Morton; +Cc: Tom Rini, ext3-users, lkml

Andrew writes:
> I *think* that at present an unrecovered ext3 filesystem is "incompatible"
> with ext2.  If, however, we were to make it "read-only compatible" then
> ext2-aware loaders would still be able to read the fs and boot from it.
> But this stuff makes my head hurt - let's see what Andreas and Stephen
> have to say.

I advocated changing the compat flag to be RO_COMPAT at one time as well.
Technically, an unrecovered ext3 filesystem is as "compatible" as an ext2
filesystem that was not fscked before mount.  We ro mount unchecked root
filesystems all the time, so there shouldn't be a _huge_ issue for ro
mounting unrecovered ext3 filesystems.

Stephen and Ted disagree, because with the ext3 journal it is possible
to have a large number of pending changes in the journal at the time
of a crash.  The Linux VFS doesn't easily allow flushing all of the
cached inodes between recovery and remount-rw, so this may cause filesystem
corruption if the in-kernel inode data does not match the on-disk inode
data after recovery.

The only time this becomes an issue is with the root filesystem, generally.

The "solution" for the problem at hand would probably be to make
GRUB et al. recognize the INCOMPAT_RECOVER flag and still read the
kernel/initrd images from disk.  They will generally be static, so the
contents of the journal will not affect them, and can be ignored.
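Andreas's suggestion amounts to a loader masking out the one INCOMPAT bit it can afford to ignore when reading static files. Below is a minimal Python sketch that assumes a raw filesystem image held in `bytes`. It uses the ext2 superblock layout: the superblock starts at byte 1024, `s_magic` is at offset 56, and `s_feature_compat`, `s_feature_incompat` and `s_feature_ro_compat` are at offsets 92, 96 and 100, all little-endian. The function name and the loader's supported-feature mask are hypothetical. The loader still refuses any other unknown INCOMPAT feature. It never replays the journal, so any recently rewritten file could be read stale.

```python
import struct

SUPERBLOCK_OFFSET = 1024
EXT2_SUPER_MAGIC = 0xEF53
EXT2_FEATURE_INCOMPAT_FILETYPE = 0x0002
EXT3_FEATURE_INCOMPAT_RECOVER = 0x0004

# Hypothetical loader: understands FILETYPE directory entries, nothing else.
LOADER_INCOMPAT = EXT2_FEATURE_INCOMPAT_FILETYPE


def loader_can_read(image, tolerate_recover=True):
    """Decide whether a read-only boot loader may read this ext2/ext3 image."""
    sb = image[SUPERBLOCK_OFFSET:SUPERBLOCK_OFFSET + 1024]
    if len(sb) < 104:
        return False
    (magic,) = struct.unpack_from("<H", sb, 56)
    if magic != EXT2_SUPER_MAGIC:
        return False
    _compat, incompat, _ro_compat = struct.unpack_from("<III", sb, 92)
    if tolerate_recover:
        # Journal replay would only change recently written blocks; static
        # kernel/initrd images are assumed unaffected, so ignore the flag.
        incompat &= ~EXT3_FEATURE_INCOMPAT_RECOVER
    # Read-only access: COMPAT and RO_COMPAT bits do not matter here.
    return (incompat & ~LOADER_INCOMPAT) == 0
```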

Cheers, Andreas
-- 
Andreas Dilger  \ "If a man ate a pound of pasta and a pound of antipasto,
                 \  would they cancel out, leaving him still hungry?"
http://www-mddsp.enel.ucalgary.ca/People/adilger/               -- Dogbert

end of thread

Thread overview: 11+ messages
2001-08-12  1:40 ext3-2.4-0.9.6 Andrew Morton
2001-08-12  1:46 ` ext3-2.4-0.9.6 Tom Rini
2001-08-12  1:55   ` ext3-2.4-0.9.6 Andrew Morton
2001-08-12  2:15     ` ext3-2.4-0.9.6 Tom Rini
2001-08-12  2:28       ` ext3-2.4-0.9.6 Andrew Morton
2001-08-12  2:47         ` ext3-2.4-0.9.6 Tom Rini
2001-08-12  4:58           ` ext3-2.4-0.9.6 Ben Collins
2001-08-13 18:15         ` ext3-2.4-0.9.6 Andreas Dilger
2001-08-12  2:38 ` ext3-2.4-0.9.6 Ralf Baechle
2001-08-12  3:10   ` ext3-2.4-0.9.6 Andrew Morton
2001-08-13 17:56   ` ext3-2.4-0.9.6 Stephen C. Tweedie
