ReiserFS problems


* ReiserFS problems
@ 2003-08-06 16:20 Rogier Wolff
  2003-08-06 16:43 ` Hans Reiser
                   ` (2 more replies)
  0 siblings, 3 replies; 47+ messages in thread
From: Rogier Wolff @ 2003-08-06 16:20 UTC (permalink / raw)
  To: reiserfs-list; +Cc: copy

Hi,

We're using reiserfs on a large (640GB) RAID disk.

Reiserfs messed up our filesystem again (one file gives us "permission
denied" when we try to remove it). So when we had the system in
single-user mode because of a hardware change (another 600G raid
partition), we decided to run reiserfsck...

Because I seem to remember that without the --rebuild-tree this
problem doesn't go away, we decided to run with the --rebuild-tree
option immediately. After some "pondering" it mentioned it was going
to read some 137 million blocks. We were impressed with the speed at
which it was doing that: 130Mbytes per second. But then we realized
that was going to take quite a while. I just did the math, and it's
going to take another 2 hours. (which is optimistic as the disk
doesn't do 130M per second at the end, but only just over 70Mbytes per
second)
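
The estimate above can be checked with a little arithmetic. This is a minimal sketch, assuming 4 KiB blocks (the block size is not stated at this point in the thread) and decimal megabytes; the function name is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Estimated seconds needed to read `blocks` blocks of `block_size` bytes
 * at a sustained throughput of `bytes_per_sec`.
 * Assumption: throughput is constant over the whole scan. */
double scan_seconds(uint64_t blocks, uint32_t block_size, double bytes_per_sec)
{
    assert(bytes_per_sec > 0.0);
    return (double)blocks * (double)block_size / bytes_per_sec;
}
```

Under those assumptions, scan_seconds(137000000, 4096, 130e6) comes to roughly 72 minutes, and the same call with 70e6 comes to about 2.2 hours, which is in line with the "another 2 hours" figure once the slower end of the disk is taken into account.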

A "surface scan" needs to read all the datablocks. But an fsck
doesn't. At least that's the normal case.

As we were not going to be here in 2 hours, and we still have some
work to do, we decided that we would be able to live with the "non
removable file" for some more time, and that we'd run the fsck
later. So we hit control-C on the fsck.

But now mounting the filesystem gives us: 

ReiserFS version 3.6.25
reiserfs: checking transaction log (device 09:00) ...
is_tree_node: node level 0 does not match to the expected one 65534
vs-5150: search_by_key: invalid format found in block 0. Fsck?
vs-13070: reiserfs_read_inode2: i/o failure occurred trying to find stat data of [1 2 0x0 SD]
Using r5 hash to sort names
is_tree_node: node level 0 does not match to the expected one 65534
vs-5150: search_by_key: invalid format found in block 0. Fsck?
vs-2140: finish_unfinished: search_by_key returned -2

and fsck without --rebuild-tree gives us that an unfinished
--rebuild-tree was in progress. So we've restarted the tree-rebuild.

Question: If it is reading all datablocks, I'm guessing that it is
looking for the magics that build up the filesystem. We're a
datarecovery company. We probably don't have any current
datarecoveries of people with Reiserfs on their disk. But if we had a
disk-image with a valid (or not) Reiserfs on it, would it link that
into our filesystem?

Anyway, when I first started out with Reiserfs, it didn't support > 2G
files (or was it 4G?). I had to patch the kernel and (irreversibly!)
upgrade the on-disk format. 

We've noticed horrible slowdowns when the filesystem is > 90% full. It
turns out that when a block group is more than 90% full reiserfs will
prefer a different block group. i.e. it is ALWAYS switching block
groups when the whole disk is > 90% full. Something like that. When we
report something like that it's always: "Ah, yes, that's an old bug,
we've fixed it. Use patch....."

Well, FYI, this is the last incident we have with Reiserfs, and we'll
move on to something that's a bit more mature. Feel free to continue
to work on your toy filesystem, but we're no longer available for
testing it. I'm sorry. 

Good luck!

			Roger. 

-- 
+-- Rogier Wolff -- www.harddisk-recovery.nl -- 0800 220 20 20 --
| Files foetsie, bestanden kwijt, alle data weg?!
| Blijf kalm en neem contact op met Harddisk-recovery.nl!


* Re: ReiserFS problems
  2003-08-06 16:20 ReiserFS problems Rogier Wolff
@ 2003-08-06 16:43 ` Hans Reiser
  2003-08-06 18:41   ` Jeff Mahoney
  2003-08-06 20:48   ` Bernd Schubert
  2003-08-06 16:48 ` Oleg Drokin
  2003-08-06 16:52 ` Andreas Dilger
  2 siblings, 2 replies; 47+ messages in thread
From: Hans Reiser @ 2003-08-06 16:43 UTC (permalink / raw)
  To: Rogier Wolff; +Cc: reiserfs-list, copy, Jeff Mahoney

Vitaly, when --rebuild-tree is used, warn the user "rebuilding tree, 
once this is started it must be finished or the filesystem will be 
unusable.  If you kill it or the machine crashes before it finishes, run 
it again.".

Also explain this in the doc for --rebuild-tree.

Hans

>
>
>Question: If it is reading all datablocks, I'm guessing that it is
>looking for the magics that build up the filesystem.
>
not sure what a magic is.  Perhaps you mean formatted nodes.

> We're a
>datarecovery company. We probably don't have any current
>datarecoveries of people with Reiserfs on their disk. But if we had a
>disk-image with a valid (or not) Reiserfs on it, would it link that
>into our filesystem?
>
If you store a copy of reiserfs on reiserfs, it totally screws fsck for 
V3.  V4 has some features added to the node format specifically to avoid 
that.

>
>We've noticed horrible slowdowns when the filesystem is > 90% full. It
>turns out that when a block group is more than 90% full reiserfs will
>prefer a different block group. i.e. it is ALWAYS switching block
>groups when the whole disk is > 90% full. Something like that. When we
>report something like that it's always: Ah, yes, that's an old bug
>we've fixed it. Use patch.....
>
I don't think you reported that to me.....

Jeff, give me an opinion on this....

For V4 we have a repacker.

-- 
Hans




* Re: ReiserFS problems
  2003-08-06 16:20 ReiserFS problems Rogier Wolff
  2003-08-06 16:43 ` Hans Reiser
@ 2003-08-06 16:48 ` Oleg Drokin
  2003-08-06 17:18   ` Rogier Wolff
                     ` (2 more replies)
  2003-08-06 16:52 ` Andreas Dilger
  2 siblings, 3 replies; 47+ messages in thread
From: Oleg Drokin @ 2003-08-06 16:48 UTC (permalink / raw)
  To: Rogier Wolff; +Cc: reiserfs-list, copy

Hello!

On Wed, Aug 06, 2003 at 06:20:55PM +0200, Rogier Wolff wrote:

> Reiserfs messed up our filesystem again (one file gives us "permission

And you use what kernel with what patches on what hardware?

> A "surface scan" needs to read all the datablocks. But an fsck
> doesn't. At least that's the normal case.

reiserfsck --rebuild-tree is special, it actually reads in all
the blocks on the device that are marked as used, to find metadata blocks and
connect them to the tree (even if they were previously unconnected).
Unlike many other filesystems out there, reiserfs does not have fixed metadata locations,
hence we absolutely need this scan.
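
The scan described above can be pictured roughly as follows. This is only an illustrative sketch, not reiserfsprogs code: the LSB-first bitmap layout, the probe callback, and all names are assumptions made for the example. The probe stands in for "read the block and check whether it looks like a formatted node".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical probe: nonzero if the block's contents look like metadata. */
typedef int (*probe_fn)(uint64_t blocknr, void *ctx);

/* Nonzero if `blocknr` is marked used (assumed layout: one bit per block,
 * least significant bit first within each byte). */
static int block_in_use(const uint8_t *bitmap, uint64_t blocknr)
{
    return (bitmap[blocknr / 8] >> (blocknr % 8)) & 1;
}

/* Toy probe for demonstration: ctx is a per-block flag array. */
int probe_from_table(uint64_t blocknr, void *ctx)
{
    return ((const uint8_t *)ctx)[blocknr] != 0;
}

/* Visit every block marked used and count those the probe accepts.
 * Free blocks are skipped without being examined. */
uint64_t scan_used_blocks(const uint8_t *bitmap, uint64_t nblocks,
                          probe_fn probe, void *ctx)
{
    uint64_t found = 0;

    assert(bitmap != NULL && probe != NULL);
    for (uint64_t b = 0; b < nblocks; b++) {
        if (!block_in_use(bitmap, b))
            continue;
        if (probe(b, ctx))
            found++;   /* a real tool would queue the block for re-linking */
    }
    return found;
}
```

The cost is proportional to the number of used blocks rather than to the amount of metadata, which is why a nearly full device takes so long.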

> later. So we hit control-C on the fsck.

That was a big mistake.

> But now mounting the filesystem gives us: 
> ReiserFS version 3.6.25
> reiserfs: checking transaction log (device 09:00) ...
> is_tree_node: node level 0 does not match to the expected one 65534
> vs-5150: search_by_key: invalid format found in block 0. Fsck?
> vs-13070: reiserfs_read_inode2: i/o failure occurred trying to find stat data of [1 2 0x0 SD]
> Using r5 hash to sort names
> is_tree_node: node level 0 does not match to the expected one 65534
> vs-5150: search_by_key: invalid format found in block 0. Fsck?
> vs-2140: finish_unfinished: search_by_key returned -2
> and fsck without --rebuild-tree gives us that an unfinished
> --rebuild-tree was in progress. So we've restarted the tree-rebuild.

Yes. Once you run tree-rebuild, you must wait until it is completed.
(Documentation update is scheduled just now. But in fact we mention this in our FAQ).

> Question: If it is reading all datablocks, I'm guessing that it is

All the ones that are marked as occupied in the bitmaps.

> looking for the magics that build up the filesystem. We're a

Yes.

> datarecovery company. We probably don't have any current
> datarecoveries of people with Reiserfs on their disk. But if we had a
> disk-image with a valid (or not) Reiserfs on it, would it link that
> into our filesystem?

Yes, it will.
So, basically, you do not want to run a rebuild-tree operation on an
FS that contains files with reiserfs metadata embedded in them in the clear.
This is also explained in our FAQ.

> Anyway, when I first started out with Reiserfs, it didn't support > 2G
> files (or was it 4G?). I had to patch the kernel and (irreversibly!)
> upgrade the on-disk format. 

Yes. Linux itself did not support files over 2G some time ago, and people used patches
and changed their on-disk formats even for other filesystems out there.

> We've noticed horrible slowdowns when the filesystem is > 90% full. It
> turns out that when a block group is more than 90% full reiserfs will
> prefer a different block group. i.e. it is ALWAYS switching block
> groups when the whole disk is > 90% full. Something like that. When we
> report something like that it's always: Ah, yes, that's an old bug
> we've fixed it. Use patch.....

In fact this is not exactly true; it only switches to another "block group" if
you are creating a new file. Why do you think this is a problem?
(of course I am speaking of 2.4.20+ kernels).

Bye,
    Oleg


* Re: ReiserFS problems
  2003-08-06 16:20 ReiserFS problems Rogier Wolff
  2003-08-06 16:43 ` Hans Reiser
  2003-08-06 16:48 ` Oleg Drokin
@ 2003-08-06 16:52 ` Andreas Dilger
  2 siblings, 0 replies; 47+ messages in thread
From: Andreas Dilger @ 2003-08-06 16:52 UTC (permalink / raw)
  To: Rogier Wolff; +Cc: reiserfs-list, copy

On Aug 06, 2003  18:20 +0200, Rogier Wolff wrote:
> Question: If it is reading all datablocks, I'm guessing that it is
> looking for the magics that build up the filesystem. We're a
> datarecovery company. We probably don't have any current
> datarecoveries of people with Reiserfs on their disk. But if we had a
> disk-image with a valid (or not) Reiserfs on it, would it link that
> into our filesystem?

Correct.  I think it is mentioned somewhere in the reiserfsck docs
not to try this with an image of a reiserfs disk in the filesystem.

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/



* Re: ReiserFS problems
  2003-08-06 16:48 ` Oleg Drokin
@ 2003-08-06 17:18   ` Rogier Wolff
  2003-08-06 17:28     ` Oleg Drokin
                       ` (2 more replies)
  2003-08-06 17:22   ` Rogier Wolff
  2003-08-07 12:58   ` Hans Reiser
  2 siblings, 3 replies; 47+ messages in thread
From: Rogier Wolff @ 2003-08-06 17:18 UTC (permalink / raw)
  To: Oleg Drokin; +Cc: Rogier Wolff, reiserfs-list, copy

On Wed, Aug 06, 2003 at 08:48:52PM +0400, Oleg Drokin wrote:
> Hello!
> 
> On Wed, Aug 06, 2003 at 06:20:55PM +0200, Rogier Wolff wrote:
> 
> > Reiserfs messed up our filesystem again (one file gives us "permission
> 
> And you use what kernel with what patches on what hardware?

Linux version 2.4.20-rmap15i (root@obelix) (gcc version 2.95.3 20010315 (SuSE)) #1 SMP Fri May 23 15:08:55 CEST 2003

Dual athlon 2000. 

> > A "surface scan" needs to read all the datablocks. But an fsck
> > doesn't. At least that's the normal case.
 
> reiserfsck --rebuild-tree is special, it actually reads in all the
> blocks on the device that are marked as used, to find metadata
> blocks and connect them to the tree (even if they were previously
> unconnected).  Unlike many other filesystems out there, reiserfs
> does not have fixed metadata locations, hence we absolutely need
> this scan.

I'm working on an XFS recovery. It's got its inodes all over the
place as well. 

> > later. So we hit control-C on the fsck.
> 
> That was a big mistake.

It was only a couple of percent done. All we have to do now is run it
again, and let it continue.

> > But now mounting the filesystem gives us: 
> > ReiserFS version 3.6.25
> > reiserfs: checking transaction log (device 09:00) ...
> > is_tree_node: node level 0 does not match to the expected one 65534
> > vs-5150: search_by_key: invalid format found in block 0. Fsck?
> > vs-13070: reiserfs_read_inode2: i/o failure occurred trying to find stat data of [1 2 0x0 SD]
> > Using r5 hash to sort names
> > is_tree_node: node level 0 does not match to the expected one 65534
> > vs-5150: search_by_key: invalid format found in block 0. Fsck?
> > vs-2140: finish_unfinished: search_by_key returned -2
> > and fsck without --rebuild-tree gives us that an unfinished
> > --rebuild-tree was in progress. So we've restarted the tree-rebuild.
> 
> Yes. Once you run tree-rebuild, you must wait until it is completed.
> (Documentation update is scheduled just now. But in fact we mention this in our FAQ).
> 
> > Question: If it is reading all datablocks, I'm guessing that it is
> 
> All the ones that are marked as occupied in the bitmaps.

Well, we cleared the old 240G partition by copying over the data to
our reiserfs partition. That's filled her up to almost 90%..... 

> > looking for the magics that build up the filesystem. We're a
> 
> Yes.
> 
> > datarecovery company. We probably don't have any current
> > datarecoveries of people with Reiserfs on their disk. But if we had a
> > disk-image with a valid (or not) Reiserfs on it, would it link that
> > into our filesystem?
> 
> yes it will.
> So basically speaking you do not want to run rebuild-tree operation on the 
> FS that contains files with reiserfs metadata embedded in them in clear.
> This is also explained in our FAQ.

Oh, great. It provably corrupts our filesystem, which is only fixed by
running a --rebuild-tree, but if we have certain data (which we actually
are likely to have!) then we simply can't.

WOW it's documented. So it's not a bug. OK. Fine. 

> > Anyway, when I first started out with Reiserfs, it didn't support > 2G
> > files (or was it 4G?). I had to patch the kernel and (irreversibly!)
> > upgrade the on-disk format. 
 
> Yes. Linux by itself was not supporting 2G some time ago and people
> used patches and changed their on-disk formats even for other
> filesystems out there.

> > We've noticed horrible slowdowns when the filesystem is > 90% full. It
> > turns out that when a block group is more than 90% full reiserfs will
> > prefer a different block group. i.e. it is ALWAYS switching block
> > groups when the whole disk is > 90% full. Something like that. When we
> > report something like that it's always: Ah, yes, that's an old bug
> > we've fixed it. Use patch.....
 
> In fact this is not exactly true, it only switches to other "block
> group" if you are creating new file. Why do you think this is a
> problem?  (of course I am speaking of 2.4.20+ kernels).

Well we were recovering data into 1G files, but performance of adding
a new block was horrible. It was doing this for every block. Either it
was doing a fruitless search on every block-add or it was actually
adding the block to another block group. Anyway, performance dropped
-=*A LOT*=- when this happened.

I think you're describing the way it should be, or "is now", but there
was a bug that caused it to behave differently.

	Roger. 


-- 
+-- Rogier Wolff -- www.harddisk-recovery.nl -- 0800 220 20 20 --
| Files foetsie, bestanden kwijt, alle data weg?!
| Blijf kalm en neem contact op met Harddisk-recovery.nl!


* Re: ReiserFS problems
  2003-08-06 16:48 ` Oleg Drokin
  2003-08-06 17:18   ` Rogier Wolff
@ 2003-08-06 17:22   ` Rogier Wolff
  2003-08-06 18:01     ` Vitaly Fertman
  2003-08-07 12:58   ` Hans Reiser
  2 siblings, 1 reply; 47+ messages in thread
From: Rogier Wolff @ 2003-08-06 17:22 UTC (permalink / raw)
  To: Oleg Drokin; +Cc: Rogier Wolff, reiserfs-list, copy

On Wed, Aug 06, 2003 at 08:48:52PM +0400, Oleg Drokin wrote:

> reiserfsck --rebuild-tree is special, it actually reads in all the
> blocks on the device that are marked as used, to find metadata
> blocks and connect them to the tree (even if they were previously
> unconnected).  Unlike many other filesystems out there, reiserfs
> does not have fixed metadata locations, hence we absolutely need
> this scan.
> > later. So we hit control-C on the fsck.
> 
> That was a big mistake.

Well, I forgot to say this in my last Email: but it's a big mistake
for "--rebuild-tree" to START by writing to the filesystem. As for a
large (and full) filesystem it is going to take HOURS to read the
whole thing, the first write to the filesystem should happen AFTER
reading all the datablocks.

			Roger. 

-- 
+-- Rogier Wolff -- www.harddisk-recovery.nl -- 0800 220 20 20 --
| Files foetsie, bestanden kwijt, alle data weg?!
| Blijf kalm en neem contact op met Harddisk-recovery.nl!


* Re: ReiserFS problems
  2003-08-06 17:18   ` Rogier Wolff
@ 2003-08-06 17:28     ` Oleg Drokin
  2003-08-06 17:49       ` Rogier Wolff
  2003-08-07 13:22       ` Hans Reiser
  2003-08-06 17:43     ` Andreas Dilger
  2003-08-07 13:03     ` Hans Reiser
  2 siblings, 2 replies; 47+ messages in thread
From: Oleg Drokin @ 2003-08-06 17:28 UTC (permalink / raw)
  To: Rogier Wolff; +Cc: reiserfs-list, copy

Hello!

On Wed, Aug 06, 2003 at 07:18:06PM +0200, Rogier Wolff wrote:
> > > Reiserfs messed up our filesystem again (one file gives us "permission
> > And you use what kernel with what patches on what hardware?
> Linux version 2.4.20-rmap15i (root@obelix) (gcc version 2.95.3 20010315 (SuSE)) #1 SMP Fri May 23 15:08:55 CEST 2003
> Dual athlon 2000. 

Hm, there was a bug fixed after 2.4.20 was out that might have led to directory entries pointing to nowhere
(visible to you as an I/O error when trying to access some file).

> > > A "surface scan" needs to read all the datablocks. But an fsck
> > > doesn't. At least that's the normal case.
> > reiserfsck --rebuild-tree is special, it actually reads in all the
> > blocks on the device that are marked as used, to find metadata
> > blocks and connect them to the tree (even if they were previously
> > unconnected).  Unlike many other filesystems out there, reiserfs
> > does not have fixed metadata locations, hence we absolutely need
> > this scan.
> I'm working on an XFS recovery. It's got its inodes all over the
> place as well. 

And how do they find all of them when they are not sure that all of the inodes are
properly referenced? Do they have separate bitmaps for metadata or something else?

> > > later. So we hit control-C on the fsck.
> > That was a big mistake.
> It was only a couple of percent done. All we have to do now is run it
> again, and let it continue.

Yes, you need to wait for it to finish.

> > > Question: If it is reading all datablocks, I'm guessing that it is
> > All the ones that are marked as occupied in the bitmaps.
> Well, we cleared the old 240G partition by copying over the data to
> our reiserfs partition. That's filled her up to almost 90%..... 

Well. As of now we do not have any better way of finding all of our metadata other than
reading all occupied blocks.

> > > datarecovery company. We probably don't have any current
> > > datarecoveries of people with Reiserfs on their disk. But if we had a
> > > disk-image with a valid (or not) Reiserfs on it, would it link that
> > > into our filesystem?
> > yes it will.
> > So basically speaking you do not want to run rebuild-tree operation on the 
> > FS that contains files with reiserfs metadata embedded in them in clear.
> > This is also explained in our FAQ.
> Oh, great. It provably corrupts our filesystem which is only fixed by
> running a --rebuild-tree, but if we have certain data (which we actually
> are likely to have!) then we simply can't. 

Well. This is actually unfortunate, I agree. In such a case you'd better
move your reiserfs images to some other place for the time of reiserfsck --rebuild-tree run.

> WOW it's documented. So it's not a bug. OK. Fine. 

This does not make it less annoying, though.
But we cannot do much about it. Really.

> > > We've noticed horrible slowdowns when the filesystem is > 90% full. It
> > > turns out that when a block group is more than 90% full reiserfs will
> > > prefer a different block group. i.e. it is ALWAYS switching block
> > > groups when the whole disk is > 90% full. Something like that. When we
> > > report something like that it's always: Ah, yes, that's an old bug
> > > we've fixed it. Use patch.....
> > In fact this is not exactly true, it only switches to other "block
> > group" if you are creating new file. Why do you think this is a
> > problem?  (of course I am speaking of 2.4.20+ kernels).
> Well we were recovering data into 1G files, but performance of adding
> a new block was horrible. It was doing this for every block. Either it

This is really strange. Unless you are having horrible fragmentation, that should
not happen.

> was doing a fruitless search on every block-add or it was actually
> adding the block to another block group. Anyway, performance dropped
> -=*A LOT*=- when this happened.

Can we ask for a metadata snapshot?
(debugreiserfs -d /dev/whatever_is_your_device | bzip2 -9c >metadata.bz2)
If you still have that FS, of course. It does not even need to be fully correct
for this to work.

> I think you're describing the way it should be, or "is now", but there
> was a bug that caused it to behave differently.

Or maybe you just have some horrible fragmentation (for unknown reason).
I cannot tell without seeing what's on your fs.

Bye,
    Oleg


* Re: ReiserFS problems
  2003-08-06 17:18   ` Rogier Wolff
  2003-08-06 17:28     ` Oleg Drokin
@ 2003-08-06 17:43     ` Andreas Dilger
  2003-08-06 17:52       ` Rogier Wolff
  2003-08-07 13:03     ` Hans Reiser
  2 siblings, 1 reply; 47+ messages in thread
From: Andreas Dilger @ 2003-08-06 17:43 UTC (permalink / raw)
  To: Rogier Wolff; +Cc: Oleg Drokin, reiserfs-list, copy

On Aug 06, 2003  19:18 +0200, Rogier Wolff wrote:
> > > later. So we hit control-C on the fsck.
> > 
> > That was a big mistake.
> 
> It was only a couple of percent done. All we have to do now is run it
> again, and let it continue.

From a user-safety point of view, you should use "isatty()" to see if the program
is running interactively, and then trap CTRL-C and have it print a warning in
the signal handler that pressing CTRL-C again in the next second will kill it.
All you need then is to call "time()" and save it in a static, and if the
signal handler is called more than once in the same second only then exit.
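
The double-CTRL-C idea above might look something like the following. It is a minimal sketch under stated assumptions, not a patch for reiserfsck: the function names, the one-second window, and the message text are all invented for illustration, and the POSIX isatty() call is used for the interactivity check. The decision logic is kept in a separate function so it can be exercised without sending real signals.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

/* Returns nonzero when this press confirms an earlier one made at most
 * `window_secs` seconds ago; always records `now` as the latest press. */
int confirm_interrupt(time_t now, time_t *last_press, int window_secs)
{
    int confirmed;

    assert(last_press != NULL);
    confirmed = (*last_press != 0) && (now - *last_press <= window_secs);
    *last_press = now;
    return confirmed;
}

static time_t prev_sigint;   /* touched only from the handler */

static void on_sigint(int signo)
{
    static const char msg[] =
        "\nInterrupted. Press CTRL-C again within a second to abort.\n";

    (void)signo;
    if (confirm_interrupt(time(NULL), &prev_sigint, 1))
        _exit(EXIT_FAILURE);
    /* write() is async-signal-safe, unlike printf() */
    if (write(STDERR_FILENO, msg, sizeof msg - 1) < 0) {
        /* nothing useful can be done about a failed warning */
    }
}

/* Installs the guard only when stdin is a terminal.
 * Returns 1 if installed, 0 otherwise. */
int install_interrupt_guard(void)
{
    struct sigaction sa;

    if (!isatty(STDIN_FILENO))
        return 0;
    sa.sa_handler = on_sigint;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGINT, &sa, NULL) == 0;
}
```

A long-running tool would call install_interrupt_guard() once at startup; when run from a script (no terminal), the default CTRL-C behaviour is left alone.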

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/


* Re: ReiserFS problems
  2003-08-06 17:28     ` Oleg Drokin
@ 2003-08-06 17:49       ` Rogier Wolff
  2003-08-06 18:10         ` Vitaly Fertman
  2003-08-07 13:22       ` Hans Reiser
  1 sibling, 1 reply; 47+ messages in thread
From: Rogier Wolff @ 2003-08-06 17:49 UTC (permalink / raw)
  To: Oleg Drokin; +Cc: Rogier Wolff, reiserfs-list, copy

On Wed, Aug 06, 2003 at 09:28:34PM +0400, Oleg Drokin wrote:
> On Wed, Aug 06, 2003 at 07:18:06PM +0200, Rogier Wolff wrote:
> > Linux version 2.4.20-rmap15i (root@obelix) (gcc version 2.95.3 20010315 (SuSE)) #1 SMP Fri May 23 15:08:55 CEST 2003
> > Dual athlon 2000. 

> Hm, there was a bug fixed after 2.4.20 was out, that might have led
> to directory entries pointing to nowhere (visible to you as I/O
> error when trying to access some file).

We saw "permission denied" when trying to remove a file, even as
"root". Whether we saw an I/O error in the logs I don't remember.

> > > > A "surface scan" needs to read all the datablocks. But an fsck
> > > > doesn't. At least that's the normal case.
> > > reiserfsck --rebuild-tree is special, it actually reads in all the
> > > blocks on the device that are marked as used, to find metadata
> > > blocks and connect them to the tree (even if they were previously
> > > unconnected).  Unlike many other filesystems out there, reiserfs
> > > does not have fixed metadata locations, hence we absolutely need
> > > this scan.

> > I'm working on an XFS recovery. It's got its inodes all over the
> > place as well. 

> And how do they find all of them when they are not sure that all of
> the inodes are properly referenced? Do they have separate bitmaps
> for metadata or something else?

Well, something went wrong. That's for sure. The mess I'm seeing has
required the help of the "xfs_repair" tool. (i.e. it's now more
properly messed up than it was before running that tool).

I'm not currently interested in how things happened. I have found the
directory that my client is looking for, and I'm going to find the
inodes that are referenced there. I'm going to recover those files,
and be done with it.

> > WOW it's documented. So it's not a bug. OK. Fine. 
> 
> This does not make it less annoying, though.
> But we cannot do much about it. Really.

If I think about it for 5 seconds I can find a solution. When mkfs-ing
a new partition, make up a random FS-ID. Store that ID in every block
that rebuild-tree needs to find.

If you use 32 bits out of every 4k block (0.1%) for this and if we
happen to have 4 different reiserfs images on our disk, then we'll
have a one in a billion chance of messing up our filesystem by running
--rebuild tree. 
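
The proposal above could be sketched as follows. Everything here is hypothetical: the ID is assumed to sit in the first four bytes of each block in little-endian order, and the names are made up for the example. The chance estimate is the simple n / 2^32 bound for n foreign images with independent random IDs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FS_ID_OFFSET 0u   /* hypothetical position of the ID within a block */

/* Write the filesystem ID into a metadata block (little-endian). */
void stamp_block(uint8_t *block, uint32_t fs_id)
{
    assert(block != NULL);
    block[FS_ID_OFFSET + 0] = (uint8_t)(fs_id & 0xff);
    block[FS_ID_OFFSET + 1] = (uint8_t)((fs_id >> 8) & 0xff);
    block[FS_ID_OFFSET + 2] = (uint8_t)((fs_id >> 16) & 0xff);
    block[FS_ID_OFFSET + 3] = (uint8_t)((fs_id >> 24) & 0xff);
}

/* Nonzero if the block carries this filesystem's ID; a rebuild would
 * ignore any metadata-looking block for which this returns zero. */
int block_belongs_to(const uint8_t *block, uint32_t fs_id)
{
    uint32_t stored;

    assert(block != NULL);
    stored = (uint32_t)block[FS_ID_OFFSET + 0]
           | ((uint32_t)block[FS_ID_OFFSET + 1] << 8)
           | ((uint32_t)block[FS_ID_OFFSET + 2] << 16)
           | ((uint32_t)block[FS_ID_OFFSET + 3] << 24);
    return stored == fs_id;
}

/* Approximate chance that at least one of `images` foreign filesystems
 * happens to have the same random 32-bit ID. */
double foreign_match_chance(unsigned images)
{
    return (double)images / 4294967296.0;
}
```

For four embedded images the estimate is a little under 1e-9, which matches the "one in a billion" figure, and 4 bytes out of a 4096-byte block is about 0.1% overhead.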

> > > > We've noticed horrible slowdowns when the filesystem is > 90% full. It
> > > > turns out that when a block group is more than 90% full reiserfs will
> > > > prefer a different block group. i.e. it is ALWAYS switching block
> > > > groups when the whole disk is > 90% full. Something like that. When we
> > > > report something like that it's always: Ah, yes, that's an old bug
> > > > we've fixed it. Use patch.....
> > > In fact this is not exactly true, it only switches to other "block
> > > group" if you are creating new file. Why do you think this is a
> > > problem?  (of course I am speaking of 2.4.20+ kernels).

> > Well we were recovering data into 1G files, but performance of
> > adding a new block was horrible. It was doing this for every
> > block. Either it

> This is really strange. Unless you are having horrible
> fragmentation, that should not happen.

It was happening. It was a bug. We reported it, and it was a known bug
by then. It had been fixed. We just needed to do a kernel-rebuild on
our main fileserver. We'd rather not. We might have eventually. Don't
worry about this. It got fixed. Apparently. For us we now consider
this a "known bug" and when we see the fill-percentage go above 90% we
know we have to clean up again. It's a shame we can't use the last
60GB of our disk, but what the heck...

Currently 80% done reading the 0.98 million leaf nodes.....

See http://www.bitwizard.nl/io_throughput.gif for the plot of the IO
throughput that is achieved during this fsck. (red, green = read, blue
= write).  The peaks (near 1800s) are when I tried reading from our
other 600G partition (which got counted as well). The important drops
are not caused by other activity on the system. Probably some areas on
the disk that are less populated than the rest of the disk.  (I
removed 120G of data (in about 5 million files, 1 million directories)
this morning).

			Roger. 

-- 
+-- Rogier Wolff -- www.harddisk-recovery.nl -- 0800 220 20 20 --
| Files foetsie, bestanden kwijt, alle data weg?!
| Blijf kalm en neem contact op met Harddisk-recovery.nl!


* Re: ReiserFS problems
  2003-08-06 17:43     ` Andreas Dilger
@ 2003-08-06 17:52       ` Rogier Wolff
  2003-08-07 13:27         ` Hans Reiser
  0 siblings, 1 reply; 47+ messages in thread
From: Rogier Wolff @ 2003-08-06 17:52 UTC (permalink / raw)
  To: Rogier Wolff, Oleg Drokin, reiserfs-list, copy

On Wed, Aug 06, 2003 at 11:43:31AM -0600, Andreas Dilger wrote:
> On Aug 06, 2003  19:18 +0200, Rogier Wolff wrote:
> > > > later. So we hit control-C on the fsck.
> > > 
> > > That was a big mistake.
> > 
> > It was only a couple of percent done. All we have to do now is run it
> > again, and let it continue.
 
> From a user-safety point of view, you should use "isatty()" to see if
> the program is running interactively, and then trap CTRL-C and
> have it print a warning in the signal handler that pressing
> CTRL-C again in the next second will kill it.  All you need then
> is to call "time()" and save it in a static, and if the signal
> handler is called more than once in the same second only then exit.

No. The warning should not be that pressing control-C again will kill
the program, but that interrupting a rebuild-tree will make your
filesystem unmountable, and that pressing control-C again will
interrupt the running rebuild-tree. 

			Roger. 

-- 
+-- Rogier Wolff -- www.harddisk-recovery.nl -- 0800 220 20 20 --
| Files foetsie, bestanden kwijt, alle data weg?!
| Blijf kalm en neem contact op met Harddisk-recovery.nl!


* Re: ReiserFS problems
  2003-08-06 17:22   ` Rogier Wolff
@ 2003-08-06 18:01     ` Vitaly Fertman
  2003-08-06 18:14       ` Rogier Wolff
  0 siblings, 1 reply; 47+ messages in thread
From: Vitaly Fertman @ 2003-08-06 18:01 UTC (permalink / raw)
  To: Rogier Wolff, Oleg Drokin; +Cc: Rogier Wolff, reiserfs-list, copy

Hi Rogier,

> Well, I forgot to say this in my last Email: but it's a big mistake
> for "--rebuild-tree" to START by writing to the filesystem. As for a
> large (and full) filesystem it is going to take HOURS to read the
> whole thing, the first write to the filesystem should happen AFTER
> reading all the datablocks.

Pass0 does not just read blocks and gather information; it fixes leaves
of the internal reiserfs tree (storage tree), although it does not change
the internal tree itself or the upper nodes. As it is run after --check
has reported that there are fatal corruptions on the fs and that rebuild-tree
is needed, it is assumed that mounting that fs may lead to further
corruption. So the fs becomes unmountable and rebuild-tree must run
to completion.

-- 
Thanks,
Vitaly Fertman


* Re: ReiserFS problems
  2003-08-06 17:49       ` Rogier Wolff
@ 2003-08-06 18:10         ` Vitaly Fertman
  0 siblings, 0 replies; 47+ messages in thread
From: Vitaly Fertman @ 2003-08-06 18:10 UTC (permalink / raw)
  To: Rogier Wolff, Oleg Drokin; +Cc: Rogier Wolff, reiserfs-list, copy

> If I think about it for 5 seconds I can find a solution. When mkfs-ing
> a new partition, make up a random FS-ID. Store that ID in every block
> that rebuild-tree needs to find.
>
> If you use 32 bits out of every 4k block (0.1%) for this and if we
> happen to have 4 different reiserfs images on our disk, then we'll
> have a one in a billion chance of messing up our filesystem by running
> --rebuild tree.

It is done for v4, but a disk format change was not accepted for v3.

-- 
Thanks,
Vitaly Fertman


* Re: ReiserFS problems
  2003-08-06 18:01     ` Vitaly Fertman
@ 2003-08-06 18:14       ` Rogier Wolff
  2003-08-06 18:22         ` Rogier Wolff
  2003-08-06 18:52         ` Vitaly Fertman
  0 siblings, 2 replies; 47+ messages in thread
From: Rogier Wolff @ 2003-08-06 18:14 UTC (permalink / raw)
  To: Vitaly Fertman; +Cc: Oleg Drokin, reiserfs-list, copy

On Wed, Aug 06, 2003 at 10:01:15PM +0400, Vitaly Fertman wrote:
> Hi Rogier,
> 
> > Well, I forgot to say this in my last Email: but it's a big mistake
> > for "--rebuild-tree" to START by writing to the filesystem. As for a
> > large (and full) filesystem it is going to take HOURS to read the
> > whole thing, the first write to the filesystem should happen AFTER
> > reading all the datablocks.
> 
> Pass0 does not just read blocks and gather information; it fixes leaves
> of the internal reiserfs tree (storage tree), although it does not change
> the internal tree itself or the upper nodes. Because it runs after --check
> has reported fatal corruptions on the fs and that rebuild-tree is needed,
> it is assumed that mounting that fs may lead to further corruption. So
> the fs becomes unmountable and rebuild-tree must run to completion.

You can verify on 

	http://www.bitwizard.nl/io_throughput.gif

that (almost) no data was written throughout the first pass.

You can do something like:

#define write my_write
#define llseek my_llseek

before the pass0 code, and do:

struct write_chain_s {
  struct write_chain_s *next;
  long long pos;
  size_t size;
  void *data;
};

static long long cur_pos;
static struct write_chain_s *write_chain;

my_llseek (...)
{
  cur_pos = ... ;
}

my_write (fd, buf, size)
{
  struct write_chain_s *t;

  t = malloc (sizeof (struct write_chain_s));
  t->data = malloc (size);
  memcpy (t->data, buf, size);
  t->pos = cur_pos;
  t->size = size;
  cur_pos += size;
  t->next = write_chain;
  write_chain = t;
}

Add a flush_postponed_writes to the end of pass0. 

Add some checks that you don't do this using up too much
memory. etc. etc.
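
A slightly fuller sketch of the same idea, in case it helps. This is
not reiserfsprogs code: postpone_write, flush_postponed_writes and the
64 MiB PW_LIMIT are names and values made up for illustration. It
queues writes in FIFO order, so that two writes to the same offset are
replayed in the order they were issued, and it refuses to queue more
once the memory cap is reached.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* One postponed write: target offset plus a private copy of the data. */
struct postponed_write {
    struct postponed_write *next;
    off_t pos;
    size_t size;
    unsigned char data[];          /* flexible array member */
};

static struct postponed_write *pw_head, *pw_tail;
static size_t pw_bytes;                          /* bytes currently queued */
static const size_t PW_LIMIT = (size_t)64 << 20; /* hypothetical 64 MiB cap */

/* Queue a write instead of performing it.
 * Returns 0 on success, -1 if the cap would be exceeded or malloc fails. */
int postpone_write(off_t pos, const void *buf, size_t size)
{
    struct postponed_write *w;

    assert(buf != NULL);
    if (size > PW_LIMIT - pw_bytes)
        return -1;
    w = malloc(sizeof *w + size);
    if (w == NULL)
        return -1;
    memcpy(w->data, buf, size);
    w->pos = pos;
    w->size = size;
    w->next = NULL;
    if (pw_tail != NULL)
        pw_tail->next = w;
    else
        pw_head = w;
    pw_tail = w;
    pw_bytes += size;
    return 0;
}

/* Replay the queued writes oldest first, so a later write to the same
 * offset still wins. Returns 0 on success, -1 on the first failed write. */
int flush_postponed_writes(int fd)
{
    while (pw_head != NULL) {
        struct postponed_write *w = pw_head;

        if (pwrite(fd, w->data, w->size, w->pos) != (ssize_t)w->size)
            return -1;
        pw_head = w->next;
        pw_bytes -= w->size;
        free(w);
    }
    pw_tail = NULL;
    return 0;
}
```

When postpone_write returns -1 the caller has to decide whether to
flush early or to abort. Note also that anything read back from the
device before the flush will still see the old contents, so this only
works if the pass does not re-read blocks it has "written".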

				Roger. 

-- 
+-- Rogier Wolff -- www.harddisk-recovery.nl -- 0800 220 20 20 --
| Files vanished, files lost, all data gone?!
| Stay calm and contact Harddisk-recovery.nl!

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: ReiserFS problems
  2003-08-06 18:14       ` Rogier Wolff
@ 2003-08-06 18:22         ` Rogier Wolff
  2003-08-06 19:03           ` Oleg Drokin
                             ` (2 more replies)
  2003-08-06 18:52         ` Vitaly Fertman
  1 sibling, 3 replies; 47+ messages in thread
From: Rogier Wolff @ 2003-08-06 18:22 UTC (permalink / raw)
  Cc: Vitaly Fertman, Oleg Drokin, reiserfs-list, copy

On Wed, Aug 06, 2003 at 08:14:05PM +0200, Rogier Wolff wrote:
> You can verify on 
> 
> 	http://www.bitwizard.nl/io_throughput.gif
> 
> that (almost) no data was written throughout the first pass.

It seems to have dropped into a different pass again: throughput has
dropped to just over 300k per second.

Should I feel comfortable if it's finding formatting errors all over
the place?:

  entry [x y] "ZZZ" in directory [a b] is not formatted properly -- fixed. 

(usually x == 2, a,b == 1,2 .)

Tip: 

Only list the file/directory that's being worked upon when explicitly
requested. When not explicitly requested, set an alarm handler to
print it every second (or so). Lots of time is now spent in writing to
the screen. (It's consumed over an hour of CPU time by now...)
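
A minimal sketch of what I mean, assuming a POSIX system. The names
progress_init and progress_report are made up for illustration and have
nothing to do with the actual reiserfsck sources. The signal handler
only sets a flag; the printing happens in the normal code path, at most
once per timer tick.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/time.h>

/* Set by the SIGALRM handler; starts at 1 so the first item is shown. */
static volatile sig_atomic_t progress_due = 1;

static void on_alarm(int signo)
{
    (void)signo;
    progress_due = 1;
}

/* Arm a repeating one-second SIGALRM. Returns 0 on success, -1 on error. */
int progress_init(void)
{
    struct sigaction sa;
    struct itimerval tv;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;      /* do not break slow read()/write() calls */
    if (sigaction(SIGALRM, &sa, NULL) != 0)
        return -1;

    tv.it_interval.tv_sec = 1;
    tv.it_interval.tv_usec = 0;
    tv.it_value = tv.it_interval;
    return setitimer(ITIMER_REAL, &tv, NULL);
}

/* Call this for every file or directory visited. It prints only when a
 * timer tick has elapsed since the last print. Returns 1 if it printed. */
int progress_report(FILE *out, const char *path)
{
    if (!progress_due)
        return 0;
    progress_due = 0;
    fprintf(out, "\r%s", path);
    fflush(out);
    return 1;
}
```

The per-item cost is then one flag test instead of one write to the
terminal.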

		Roger. 

-- 
+-- Rogier Wolff -- www.harddisk-recovery.nl -- 0800 220 20 20 --
| Files vanished, files lost, all data gone?!
| Stay calm and contact Harddisk-recovery.nl!

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: ReiserFS problems
  2003-08-06 16:43 ` Hans Reiser
@ 2003-08-06 18:41   ` Jeff Mahoney
  2003-08-06 19:21     ` Rogier Wolff
  2003-08-07 15:05     ` Hans Reiser
  2003-08-06 20:48   ` Bernd Schubert
  1 sibling, 2 replies; 47+ messages in thread
From: Jeff Mahoney @ 2003-08-06 18:41 UTC (permalink / raw)
  To: Hans Reiser; +Cc: Rogier Wolff, reiserfs-list, copy

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

|> We've noticed horrible slowdowns when the filesystem is > 90% full. It
|> turns out that when a block group is more than 90% full reiserfs will
|> prefer a different block group. i.e. it is ALWAYS switching block
|> groups when the whole disk is > 90% full. Something like that. When we
|> report something like that it's always: Ah, yes, that's an old bug
|> we've fixed it. Use patch.....
|>
| I don't think you reported that to me.....
|
| Jeff, give me an opinion on this....

The skip_busy algorithm works like so:

If the filesystem is less than 95% full, the allocator tries to be a bit
smarter and leaves 10% of the bitmap free for future allocations to
avoid fragmentation. If the bitmap being examined has 10% or less free
space, it's skipped. *UNLESS* the file doing the allocation already has
an interest in that bitmap, as determined by the allocator getting
passed a non-zero offset into the bitmap.

If it finds no bitmaps that are more than 10% free, or the filesystem is
more than 95% full, it restarts the search at the initial hint and
ignores the 10% rule.

In short:
1) Find a block in the current bitmap if the file's last block was
allocated there.
2) If there aren't any, or there is no stake, search until a bitmap that
is > 10% free is found, but only from the initial search point to the
end of the disk - without wrapping around.
3) If there aren't any, try to find any free block from the initial
search point until the end of the disk
4) If there aren't any, start at the beginning of the disk and search up
to the initial search point.

The idea is that allocations that can be kept contiguous should be. Once
the allocation ends up being outside of the local bitmap, then the disk
is already seeking, so it doesn't matter if it seeks a bit more if it
can find another chunk where it can find contiguous allocation.

All these searches are streamlined by making find_*_zero_bit do as
little work as possible. For each bitmap, the offset of the first zero
bit is kept as well as how many free bits there are. This makes it
trivial to skip bitmaps that have < 10% free, as well as not force the
allocator to scan entire bitmaps to find that the last bit is the zero bit.

So, yes, when the filesystem approaches 95% full, and there are only new
files being created, the *initial* allocations will scatter themselves.
This is by design so that the subsequent allocations for each of those
files will be able to be contiguous with the original allocation.
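
The four-step search order above can be sketched roughly as follows.
This is an illustration only, not the kernel code: struct bitmap_info,
is_roomy and pick_bitmap are hypothetical names, and the per-bitmap
free count stands in for the cached metadata mentioned above.

```c
#include <assert.h>
#include <stddef.h>

/* Cached per-bitmap summary (hypothetical layout). */
struct bitmap_info {
    unsigned total;   /* blocks covered by this bitmap */
    unsigned free;    /* free blocks among them */
};

/* True if strictly more than 10% of the bitmap is free. */
static int is_roomy(const struct bitmap_info *b)
{
    return (unsigned long long)b->free * 10 > b->total;
}

/* Pick the bitmap to allocate from, starting at index `hint`.
 * has_stake: the file's last block was allocated in bm[hint].
 * fs_nearly_full: the filesystem is more than 95% full.
 * Returns the chosen index, or -1 if no bitmap has a free block. */
long pick_bitmap(const struct bitmap_info *bm, size_t n, size_t hint,
                 int has_stake, int fs_nearly_full)
{
    size_t i;

    assert(bm != NULL && hint < n);

    /* 1) Stay in the current bitmap if the file already lives there. */
    if (has_stake && bm[hint].free > 0)
        return (long)hint;

    /* 2) Otherwise look for a bitmap with more than 10% free, from the
     *    hint to the end of the disk, without wrapping around. */
    if (!fs_nearly_full)
        for (i = hint; i < n; i++)
            if (is_roomy(&bm[i]))
                return (long)i;

    /* 3) Ignore the 10% rule: any free block from the hint to the end. */
    for (i = hint; i < n; i++)
        if (bm[i].free > 0)
            return (long)i;

    /* 4) Wrap around: from the start of the disk up to the hint. */
    for (i = 0; i < hint; i++)
        if (bm[i].free > 0)
            return (long)i;

    return -1;
}
```

With no stake and a hint pointing at a bitmap that is 95% used, step 2
skips that bitmap, which is the scattering of initial allocations
described above.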

What is the workload that is producing the horrible slowdowns?

- -Jeff

- --
jeffm@suse.com
jeffm@csh.rit.edu
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.2-rc1-SuSE (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQE/MUvoLPWxlyuTD7IRArpkAJ9MThFNeVzmEIONDDlypsALv70dTACgj7xo
EJVh5oDQWqfsG9RH9lcFtO4=
=ZcoM
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: ReiserFS problems
  2003-08-06 18:14       ` Rogier Wolff
  2003-08-06 18:22         ` Rogier Wolff
@ 2003-08-06 18:52         ` Vitaly Fertman
  1 sibling, 0 replies; 47+ messages in thread
From: Vitaly Fertman @ 2003-08-06 18:52 UTC (permalink / raw)
  To: Rogier Wolff; +Cc: reiserfs-list, copy

> You can verify on
>
> 	http://www.bitwizard.nl/io_throughput.gif
>
> that (almost) no data was written throughout the first pass.
>
> You can do something like:
>
> #define write my_write
> #define llseek my_llseek
>
....
>
> Add a flush_postponed_writes to the end of pass0.

OK, thank you for the suggestion. Actually, there is already a list of
buffers in progs: when the number of buffers exceeds some limit, some of
the dirty ones (those that were accessed earliest) are flushed to disk.
All of them are flushed at the end of each pass.

-- 
Thanks,
Vitaly Fertman


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: ReiserFS problems
  2003-08-06 18:22         ` Rogier Wolff
@ 2003-08-06 19:03           ` Oleg Drokin
  2003-08-06 19:04           ` Vitaly Fertman
  2003-08-07 13:35           ` Hans Reiser
  2 siblings, 0 replies; 47+ messages in thread
From: Oleg Drokin @ 2003-08-06 19:03 UTC (permalink / raw)
  To: Rogier Wolff; +Cc: Vitaly Fertman, reiserfs-list, copy

Hello!

On Wed, Aug 06, 2003 at 08:22:52PM +0200, Rogier Wolff wrote:

> Only list the file/directory that's being worked upon when explicitly
> requested. When not explicitly requested, set an alarm handler to
> print it every second (or so). Lots of time is now spent in writing to

I think we already do something like this.
Vitaly should know exact details.

Bye,
    Oleg

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: ReiserFS problems
  2003-08-06 18:22         ` Rogier Wolff
  2003-08-06 19:03           ` Oleg Drokin
@ 2003-08-06 19:04           ` Vitaly Fertman
  2003-08-07 13:35           ` Hans Reiser
  2 siblings, 0 replies; 47+ messages in thread
From: Vitaly Fertman @ 2003-08-06 19:04 UTC (permalink / raw)
  To: Rogier Wolff; +Cc: Oleg Drokin, reiserfs-list, copy

> Should I feel comfortable if it's finding formatting errors all over
> the place?:
>
>   entry [x y] "ZZZ" in directory [a b] is not formatted properly -- fixed.
>
> (usually x == 2, a,b == 1,2 .)

Direntries have different formats in reiserfs v3.5 and v3.6. These messages 
are probably caused by the 3.5-format reiserfs image you have on your 
3.6-format fs.

> Tip:
>
> Only list the file/directory that's being worked upon when explicitly
> requested. When not explicitly requested, set an alarm handler to
> print it every second (or so). Lots of time is now spent in writing to
> the screen. (It's consumed over an hour of CPU time by now...)
>
> 		Roger.

I do not understand exactly what you mean by "explicitly", but during the
semantic traversal the name of the last file in the path is not printed at
all, only the directory it was reached from. Thank you again for the tip.

-- 
Thanks,
Vitaly Fertman

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: ReiserFS problems
  2003-08-06 18:41   ` Jeff Mahoney
@ 2003-08-06 19:21     ` Rogier Wolff
  2003-08-06 19:36       ` Rogier Wolff
  2003-08-06 19:40       ` Vitaly Fertman
  2003-08-07 15:05     ` Hans Reiser
  1 sibling, 2 replies; 47+ messages in thread
From: Rogier Wolff @ 2003-08-06 19:21 UTC (permalink / raw)
  To: Jeff Mahoney; +Cc: Hans Reiser, Rogier Wolff, reiserfs-list, copy

On Wed, Aug 06, 2003 at 02:41:44PM -0400, Jeff Mahoney wrote:
> What is the workload that is producing the horrible slowdowns?

write at 20Mb per second into files of 1Gb each. 

It was reported, and you guys told me it was a bug that had 
been fixed.  We refused to upgrade to the recommended version
back then. But we might have gotten an upgrade due to a general
kernel upgrade since then. But we try to keep the filesystem
at < 90% full anyway. 

Recommendation: Make that "10%" a variable, settable by the
"reserved-for-root" parameter...

Is there a way that allows me to see the fill-percentage of
our block groups?

	Roger. 

-- 
+-- Rogier Wolff -- www.harddisk-recovery.nl -- 0800 220 20 20 --
| Files vanished, files lost, all data gone?!
| Stay calm and contact Harddisk-recovery.nl!

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: ReiserFS problems
  2003-08-06 19:21     ` Rogier Wolff
@ 2003-08-06 19:36       ` Rogier Wolff
  2003-08-06 22:08         ` Mike Fedyk
  2003-08-06 19:40       ` Vitaly Fertman
  1 sibling, 1 reply; 47+ messages in thread
From: Rogier Wolff @ 2003-08-06 19:36 UTC (permalink / raw)
  To: Rogier Wolff; +Cc: Jeff Mahoney, Hans Reiser, reiserfs-list, copy

On Wed, Aug 06, 2003 at 09:21:07PM +0200, Rogier Wolff wrote:
> On Wed, Aug 06, 2003 at 02:41:44PM -0400, Jeff Mahoney wrote:
> > What is the workload that is producing the horrible slowdowns?
> 
> write at 20Mb per second into files of 1Gb each. 

P.S. We also sometimes do: 

	while (1)
	   seek to random position (in randomly chosen 1G file)
	   write 1k

we might benefit from a call that tells the filesystem:
Pretend that this file WILL grow to 1Gb, but leave it sparse for
now.

So, the block to allocate if we seek to 500Mb past the beginning of the
file would be 500Mb further along the disk. That way the eventual image
will be linear. But leaving the intermediate blocks unallocated will 
allow us to  eventually use up that free space if we DO NOT end up
writing that area. 

To get this behaviour we could

	dd if=/dev/zero of=disk.img24 bs=1024k count=1024

at the beginning, but that would use up 1Gb of disk space
right from the beginning, which in some percentage of the 
cases we don't end up using after all.... Having things go
well automatically is very important. 
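
For reference, the sparse half of this already works with plain POSIX
calls. A minimal sketch, with made-up helper names
(create_sparse_image, write_chunk, allocated_bytes): set the final size
up front with ftruncate() and write chunks wherever they belong. It
says nothing about WHERE the filesystem puts those blocks on disk,
which is the part that would need the new call.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Create (or truncate) `path` as a sparse file of `final_size` bytes.
 * Returns an open file descriptor, or -1 on error. */
int create_sparse_image(const char *path, off_t final_size)
{
    int fd;

    assert(path != NULL && final_size >= 0);
    fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, final_size) != 0) {   /* sets the size, writes no data */
        close(fd);
        return -1;
    }
    return fd;
}

/* Write one chunk at an arbitrary offset; only the blocks that are
 * touched get allocated. Returns 0 on success, -1 on error. */
int write_chunk(int fd, off_t pos, const void *buf, size_t len)
{
    return pwrite(fd, buf, len, pos) == (ssize_t)len ? 0 : -1;
}

/* Bytes actually allocated on disk, or -1 on error.
 * Assumes st_blocks counts 512-byte units, as it does on Linux. */
off_t allocated_bytes(int fd)
{
    struct stat st;

    if (fstat(fd, &st) != 0)
        return -1;
    return (off_t)st.st_blocks * 512;
}
```

After create_sparse_image("disk.img24", (off_t)1 << 30) and a single
1k write_chunk() at the 500Mb mark, the file reports a size of 1Gb
while allocated_bytes() stays far below that on filesystems that
support holes.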

			Roger. 

-- 
+-- Rogier Wolff -- www.harddisk-recovery.nl -- 0800 220 20 20 --
| Files vanished, files lost, all data gone?!
| Stay calm and contact Harddisk-recovery.nl!

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: ReiserFS problems
  2003-08-06 19:21     ` Rogier Wolff
  2003-08-06 19:36       ` Rogier Wolff
@ 2003-08-06 19:40       ` Vitaly Fertman
  1 sibling, 0 replies; 47+ messages in thread
From: Vitaly Fertman @ 2003-08-06 19:40 UTC (permalink / raw)
  To: Rogier Wolff, Jeff Mahoney; +Cc: Hans Reiser, Rogier Wolff, reiserfs-list, copy

On Wednesday 06 August 2003 23:21, Rogier Wolff wrote:
> On Wed, Aug 06, 2003 at 02:41:44PM -0400, Jeff Mahoney wrote:
> > What is the workload that is producing the horrible slowdowns?
>
> write at 20Mb per second into files of 1Gb each.
>
> It was reported, and you guys told me it was a bug that had
> been fixed.  we refused to upgrade to the recommended version
> back then. But we might have gotten an upgrade due to a general
> kernel upgrade since then. But we try to keep the filesystem
> at < 90% full anyway.
>
> Recommendation: Make that "10%" a variable, settable by the
> "reserved-for-root" parameter...
>
> Is there a way that allows me to see the fill-percentage of
> our block groups?

debugreiserfs -m prints out bitmap blocks, so
debugreiserfs -m /dev/xxx | grep "^used" would do so.

-- 
Thanks,
Vitaly Fertman

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: ReiserFS problems
  2003-08-06 16:43 ` Hans Reiser
  2003-08-06 18:41   ` Jeff Mahoney
@ 2003-08-06 20:48   ` Bernd Schubert
  1 sibling, 0 replies; 47+ messages in thread
From: Bernd Schubert @ 2003-08-06 20:48 UTC (permalink / raw)
  To: reiserfs-list

On Wednesday 06 August 2003 18:43, Hans Reiser wrote:
> Vitaly, when --rebuild-tree is used, warn the user "rebuilding tree,
> once this is started it must be finished or the filesystem will be
> unusable.  If you kill it or the machine crashes before it finishes, run
> it again.".
>

Wouldn't it be useful if 'ctrl+c' and 'kill pid' were disabled on 
entering those sensitive functions?


Regards,
	Bernd


-- 
Bernd Schubert
Physikalisch Chemisches Institut / Theoretische Chemie
Universität Heidelberg
INF 229
69120 Heidelberg
e-mail: bernd.schubert@pci.uni-heidelberg.de

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: ReiserFS problems
  2003-08-06 19:36       ` Rogier Wolff
@ 2003-08-06 22:08         ` Mike Fedyk
  2003-08-07  4:40           ` Rogier Wolff
  0 siblings, 1 reply; 47+ messages in thread
From: Mike Fedyk @ 2003-08-06 22:08 UTC (permalink / raw)
  To: Rogier Wolff; +Cc: Jeff Mahoney, Hans Reiser, reiserfs-list, copy

On Wed, Aug 06, 2003 at 09:36:23PM +0200, Rogier Wolff wrote:
> P.S. We also sometimes do: 
> 
> 	while (1)
> 	   seek to random position (in randomly chosen 1G file)
> 	   write 1k
> 
> we might benefit from a call that tells the filesystem:
> Pretend that this file WILL grow to 1Gb, but leave it sparse for
> now.
> 
> So, the block to allocate if we seek to 500Mb past the beginning of the
> file would be 500Mb further along the disk. That way the eventual image
> will be linear. But leaving the intermediate blocks unallocated will 
> allow us to  eventually use up that free space if we DO NOT end up
> writing that area. 

Ahh, but sparse files are not handled efficiently at all with reiserfs.  It
is fixed properly in reiser4 though.  There was a thread about this issue
on the reiserfs list within the last week.

If you use reiserfs, avoid large sparse files like the plague.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: ReiserFS problems
  2003-08-06 22:08         ` Mike Fedyk
@ 2003-08-07  4:40           ` Rogier Wolff
  0 siblings, 0 replies; 47+ messages in thread
From: Rogier Wolff @ 2003-08-07  4:40 UTC (permalink / raw)
  To: Mike Fedyk; +Cc: Rogier Wolff, Jeff Mahoney, Hans Reiser, reiserfs-list, copy

On Wed, Aug 06, 2003 at 03:08:17PM -0700, Mike Fedyk wrote:
> If you use reiserfs, avoid large sparse files like the plague.

Well, we can't avoid large sparse files. So we'll avoid Reiserfs
like the plague as soon as we get a chance. Thanks for the tip. 

		Roger. 

-- 
+-- Rogier Wolff -- www.harddisk-recovery.nl -- 0800 220 20 20 --
| Files vanished, files lost, all data gone?!
| Stay calm and contact Harddisk-recovery.nl!

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: ReiserFS problems
  2003-08-06 16:48 ` Oleg Drokin
  2003-08-06 17:18   ` Rogier Wolff
  2003-08-06 17:22   ` Rogier Wolff
@ 2003-08-07 12:58   ` Hans Reiser
  2003-08-07 13:24     ` Russell Coker
  2 siblings, 1 reply; 47+ messages in thread
From: Hans Reiser @ 2003-08-07 12:58 UTC (permalink / raw)
  To: Oleg Drokin; +Cc: Rogier Wolff, reiserfs-list, copy

Oleg Drokin wrote:

>So basically speaking you do not want to run rebuild-tree operation on the 
>FS that contains files with reiserfs metadata embedded in them in clear.
>This is also explained in our FAQ.
>
if you compress these files, then you will be okay.  You would almost 
always want to compress them.....


-- 
Hans



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: ReiserFS problems
  2003-08-06 17:18   ` Rogier Wolff
  2003-08-06 17:28     ` Oleg Drokin
  2003-08-06 17:43     ` Andreas Dilger
@ 2003-08-07 13:03     ` Hans Reiser
  2003-08-07 13:41       ` Rogier Wolff
  2 siblings, 1 reply; 47+ messages in thread
From: Hans Reiser @ 2003-08-07 13:03 UTC (permalink / raw)
  To: Rogier Wolff; +Cc: Oleg Drokin, reiserfs-list, copy

Rogier Wolff wrote:

>>In fact this is not exactly true, it only switches to other "block
>>group" if you are creating new file. Why do you think this is a
>>problem?  (of course I am speaking of 2.4.20+ kernels).
>>    
>>
>
>Well we were recovering data into 1G files, but performance of adding
>a new block was horrible. It was doing this for every block. Either it
>was doing a fruitless search on every block-add or it was actually
>adding the block to another block group. Anyway, performance dropped
>-=*A LOT*=- when this happened.
>
>I think you're describing the way it should be, or "is now", but there
>was a bug that caused it to behave differently.
>
>	Roger. 
>
>
>  
>

Can you help Oleg investigate this more closely by providing an exact 
account of what to do to replicate it?  Oleg, replicate this and observe 
what happens.

-- 
Hans



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: ReiserFS problems
  2003-08-06 17:28     ` Oleg Drokin
  2003-08-06 17:49       ` Rogier Wolff
@ 2003-08-07 13:22       ` Hans Reiser
  2003-08-07 18:12         ` Mike Fedyk
  1 sibling, 1 reply; 47+ messages in thread
From: Hans Reiser @ 2003-08-07 13:22 UTC (permalink / raw)
  To: Oleg Drokin; +Cc: Rogier Wolff, reiserfs-list, copy

Oleg Drokin wrote:

>  
>
>>>>datarecovery company. We probably don't have any current
>>>>datarecoveries of people with Reiserfs on their disk. But if we had a
>>>>disk-image with a valid (or not) Reiserfs on it, would it link that
>>>>into our filesytem?
>>>>        
>>>>
>>>yes it will.
>>>So basically speaking you do not want to run rebuild-tree operation on the 
>>>FS that contains files with reiserfs metadata embedded in them in clear.
>>>This is also explained in our FAQ.
>>>      
>>>
>>Oh, great. It provably corrupts our filesystem which is only fixed by
>>running a rebuilt-tree, but if we have certain data (which we actually
>>are likely to have!) then we simply can't. 
>>    
>>
>
>Well. This is actually unfortunate, I agree. In such a case you'd better
>move your reiserfs images to some other place for the time of reiserfsck --rebuild-tree run.
>
or compress them.

>
>  
>
>>WOW it's documented. So it's not a bug. OK. Fine. 
>>    
>>
>
>This does not make it less annoying, though.
>But we cannot do much about it. Really.
>
we fixed it in v4.....

>
>  
>
>>>>We've noticed horrible slowdowns when the filesystem is > 90% full. It
>>>>turns out that when a block group is more than 90% full reiserfs will
>>>>prefer a different block group. i.e. it is ALWAYS switching block
>>>>groups when the whole disk is > 90% full. Something like that. When we
>>>>report something like that it's always: Ah, yes, that's an old bug
>>>>we've fixed it. Use patch.....
>>>>        
>>>>
>>>In fact this is not exactly true, it only switches to other "block
>>>group" if you are creating new file. Why do you think this is a
>>>problem?  (of course I am speaking of 2.4.20+ kernels).
>>>      
>>>
>>Well we were recovering data into 1G files, but performance of adding
>>a new block was horrible. It was doing this for every block. Either it
>>    
>>
>
>This is really strange. Unless you are having horrible fragmentation, that should
>not happen.
>
try replicating it instead of telling him it cannot happen.


-- 
Hans



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: ReiserFS problems
  2003-08-07 12:58   ` Hans Reiser
@ 2003-08-07 13:24     ` Russell Coker
  2003-08-07 14:41       ` Hans Reiser
  0 siblings, 1 reply; 47+ messages in thread
From: Russell Coker @ 2003-08-07 13:24 UTC (permalink / raw)
  To: Hans Reiser; +Cc: Rogier Wolff, reiserfs-list

On Thu, 7 Aug 2003 22:58, Hans Reiser wrote:
> Oleg Drokin wrote:
> >So basically speaking you do not want to run rebuild-tree operation on the
> >FS that contains files with reiserfs metadata embedded in them in clear.
> >This is also explained in our FAQ.
>
> if you compress these files, then you will be okay.  You would almost
> always want to compress them.....

Moderately fast desktop machines can gzip compress data at 10MB/s, the fastest 
available machines could probably manage almost 20MB/s.

4 years ago IDE drives were routinely delivering 30MB/s; now they can do 
40MB/s or more, the fastest SCSI drives can do 70MB/s, and RAID arrays can do 
more.

Compressing data will significantly decrease linear write speed.

gzip decompression can proceed at ~100MB/s on a moderately fast desktop 
machine, so provided that you are not running out of CPU power and you are 
not using a high-end RAID setup it won't be much of a bottleneck, and could 
actually increase read speed for linear access.

Compressing data with gzip prevents non-linear access (i.e. running fsck-type 
programs).

If you want to maintain an image for ghost-installs of new machines then gzip 
compressed file system images will probably be useful.  For every other case 
where having a file system image is desired you would not want compression.

The problem with fsck being too aggressive has a long history.  The HPFS DLL 
for CHKDSK.EXE in OS/2 when run in level 3 was known for recovering files 
from a recently formatted file system.  It seems that the IBM file-system 
people weren't smart enough to come up with the type of ideas that Rogier can 
come up with in seconds.  ;)

-- 
http://www.coker.com.au/selinux/   My NSA Security Enhanced Linux packages
http://www.coker.com.au/bonnie++/  Bonnie++ hard drive benchmark
http://www.coker.com.au/postal/    Postal SMTP/POP benchmark
http://www.coker.com.au/~russell/  My home page

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: ReiserFS problems
  2003-08-06 17:52       ` Rogier Wolff
@ 2003-08-07 13:27         ` Hans Reiser
  0 siblings, 0 replies; 47+ messages in thread
From: Hans Reiser @ 2003-08-07 13:27 UTC (permalink / raw)
  To: Rogier Wolff; +Cc: Oleg Drokin, reiserfs-list, copy

Rogier Wolff wrote:

>On Wed, Aug 06, 2003 at 11:43:31AM -0600, Andreas Dilger wrote:
>  
>
>>On Aug 06, 2003  19:18 +0200, Rogier Wolff wrote:
>>    
>>
>>>>>later. So we hit control-C on the fsck.
>>>>>          
>>>>>
>>>>That was big mistake.
>>>>        
>>>>
>>>It was only a couple of percent done. All we have to do now is run it
>>>again, and let it continue.
>>>      
>>>
> 
>  
>
>> From a user-safety point-of-view, you should use "tty()" to see if
>> the program is running interactively, and then trap CTRL-C and
>> have it print a warning in the signal handler that pressing
>> CTRL-C again in the next second will kill it.  All you need then
>> is to call "time()" and save it in a static, and if the signal
>> handler is called more than once in the same second only then exit.
>>    
>>
>
>No. The warning should not be that pressing control-C again will kill
>the program, but that interrupting a rebuild-tree will make your
>filesystem unmountable, and that pressing control-C again will
>interrupt the running rebuild-tree. 
>
>			Roger. 
>
>  
>
roger is right.
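
A minimal sketch of such a handler, assuming POSIX signals. The names
install_interrupt_guard and interrupt_requested are hypothetical and
the wording of the warning is only an example. The first Ctrl-C prints
the warning; a second one within a second sets a flag that the main
loop can check at a safe point.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

static volatile sig_atomic_t interrupt_confirmed;
static time_t last_sigint;          /* only touched inside the handler */

static void on_sigint(int signo)
{
    static const char msg[] =
        "\nInterrupting --rebuild-tree leaves the filesystem unmountable\n"
        "until it is run again to completion.\n"
        "Press Ctrl-C again within one second to interrupt anyway.\n";
    time_t now = time(NULL);        /* time() is async-signal-safe */

    (void)signo;
    if (last_sigint != 0 && now - last_sigint <= 1) {
        interrupt_confirmed = 1;    /* second press: honour it */
        return;
    }
    last_sigint = now;
    if (write(STDERR_FILENO, msg, sizeof msg - 1) < 0) {
        /* nothing useful can be done about this inside a handler */
    }
}

/* Install the guard. Returns 0 on success, -1 on error. */
int install_interrupt_guard(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigint;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(SIGINT, &sa, NULL);
}

/* Nonzero once the user has confirmed the interrupt. */
int interrupt_requested(void)
{
    return interrupt_confirmed;
}
```

A caller would install this only when isatty(STDIN_FILENO) is true, so
that non-interactive runs keep the default SIGINT behaviour.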

-- 
Hans



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: ReiserFS problems
  2003-08-06 18:22         ` Rogier Wolff
  2003-08-06 19:03           ` Oleg Drokin
  2003-08-06 19:04           ` Vitaly Fertman
@ 2003-08-07 13:35           ` Hans Reiser
  2003-08-07 13:46             ` Rogier Wolff
  2 siblings, 1 reply; 47+ messages in thread
From: Hans Reiser @ 2003-08-07 13:35 UTC (permalink / raw)
  To: Rogier Wolff; +Cc: Vitaly Fertman, Oleg Drokin, reiserfs-list, copy

Rogier Wolff wrote:

>On Wed, Aug 06, 2003 at 08:14:05PM +0200, Rogier Wolff wrote:
>  
>
>>You can verify on 
>>
>>	http://www.bitwizard.nl/io_throughput.gif
>>
>>that (almost) no data was written throughout the first pass.
>>    
>>
>
>It seems to have dropped into a different pass again: throughput has
>dropped to just over 300k per second.
>
>Should I feel comfortable if it's finding formatting errors all over
>the place?:
>
>  entry [x y] "ZZZ" in directory [a b] is not formatted properly -- fixed. 
>
>(usually x == 2, a,b == 1,2 .)
>
>Tip: 
>
>Only list the file/directory that's being worked upon when explicitly
>requested. When not explicitly requested, set an alarm handler to
>print it every second (or so). Lots of time is now spent in writing to
>the screen. (It's consumed over an hour of CPU time by now...)
>
>		Roger. 
>
>
>  
>
good point.  Vitaly, do this unless you have a reason not to.

-- 
Hans



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: ReiserFS problems
  2003-08-07 13:03     ` Hans Reiser
@ 2003-08-07 13:41       ` Rogier Wolff
  2003-08-07 18:44         ` Mike Fedyk
  0 siblings, 1 reply; 47+ messages in thread
From: Rogier Wolff @ 2003-08-07 13:41 UTC (permalink / raw)
  To: Hans Reiser; +Cc: Oleg Drokin, reiserfs-list, copy

On Thu, Aug 07, 2003 at 05:03:02PM +0400, Hans Reiser wrote:
> Rogier Wolff wrote:
> 
> >>In fact this is not exactly true, it only switches to other "block
> >>group" if you are creating new file. Why do you think this is a
> >>problem?  (of course I am speaking of 2.4.20+ kernels).
> >>   
> >>
> >
> >Well we were recovering data into 1G files, but performance of adding
> >a new block was horrible. It was doing this for every block. Either it
> >was doing a fruitless search on every block-add or it was actually
> >adding the block to another block group. Anyway, performance dropped
> >-=*A LOT*=- when this happened.
> >
> >I think you're describing the way it should be, or "is now", but there
> >was a bug that caused it to behave differently.

> Can you help Oleg investigate this more closely by providing an exact 
> account of what to do to replicate it?  Oleg, replicate this and observe 
> what happens.

What part of: "we reported it a while back, and you told us it was
fixed" don't you understand?

			Roger. 

-- 
+-- Rogier Wolff -- www.harddisk-recovery.nl -- 0800 220 20 20 --
| Files vanished, files lost, all data gone?!
| Stay calm and contact Harddisk-recovery.nl!

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: ReiserFS problems
  2003-08-07 13:35           ` Hans Reiser
@ 2003-08-07 13:46             ` Rogier Wolff
  2003-08-07 14:11               ` Vitaly Fertman
  0 siblings, 1 reply; 47+ messages in thread
From: Rogier Wolff @ 2003-08-07 13:46 UTC (permalink / raw)
  To: Hans Reiser; +Cc: Vitaly Fertman, Oleg Drokin, reiserfs-list, copy

On Thu, Aug 07, 2003 at 05:35:19PM +0400, Hans Reiser wrote:
> Rogier Wolff wrote:
> >Only list the file/directory that's being worked upon when explicitly
> >requested. When not explicitly requested, set an alarm handler to
> >print it every second (or so). Lots of time is now spent in writing to
> >the screen. (It's consumed over an hour of CPU time by now...)

> good point.  Vitaly, do this unless you have a reason not to.

FYI: A "reason not to" would be if you "frequently" get crashes in the
program that get the location misreported because the last line it
printed ends up being quite wrong.

			Roger.

-- 
+-- Rogier Wolff -- www.harddisk-recovery.nl -- 0800 220 20 20 --
| Files vanished, files lost, all data gone?!
| Stay calm and contact Harddisk-recovery.nl!

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: ReiserFS problems
  2003-08-07 13:46             ` Rogier Wolff
@ 2003-08-07 14:11               ` Vitaly Fertman
  0 siblings, 0 replies; 47+ messages in thread
From: Vitaly Fertman @ 2003-08-07 14:11 UTC (permalink / raw)
  To: Rogier Wolff, Hans Reiser; +Cc: reiserfs-list, copy

On Thursday 07 August 2003 17:46, Rogier Wolff wrote:
> On Thu, Aug 07, 2003 at 05:35:19PM +0400, Hans Reiser wrote:
> > Rogier Wolff wrote:
> > >Only list the file/directory that's being worked upon when explicitly
> > >requested. When not explicitly requested, set an alarm handler to
> > >print it every second (or so). Lots of time is now spent in writing to
> > >the screen. (It's consumed over an hour of CPU time by now...)
> >
> > good point.  Vitaly, do this unless you have a reason not to.

:) Hans, I answered this and described how it is done; did you see 
my answer?

> FYI: A "reason not to" would be if you "frequently" get crashes in the
> program that get the location misreported because the last line it
> printed ends up being quite wrong.
>
> 			Roger.

As I already wrote, the path to the directory the file was reached from is 
printed, without the name of the file being checked. This reduces the 
amount of output, and as there are no alarms involved, at the time of a 
crash the path to the parent directory has always been printed. 

-- 
Thanks,
Vitaly Fertman

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: ReiserFS problems
  2003-08-07 13:24     ` Russell Coker
@ 2003-08-07 14:41       ` Hans Reiser
  0 siblings, 0 replies; 47+ messages in thread
From: Hans Reiser @ 2003-08-07 14:41 UTC (permalink / raw)
  To: russell; +Cc: Rogier Wolff, reiserfs-list

Russell Coker wrote:

>
>The problem with fsck being too aggressive has a long history.  The HPFS DLL 
>for CHKDSK.EXE in OS/2 when run in level 3 was known for recovering files 
>from a recently formatted file system.  It seems that the IBM file-system 
>people weren't smart enough to come up with the type of ideas that Rogier can 
>come up with in seconds.  ;)
>
>  
>
I think it is more like, when you write the first version of fsck you 
don't think about people storing copies of reiserfs on reiserfs, and 
then on the next chance you get you change the disk format.  
Unfortunately, changing disk formats is not done often so there is a lag 
(though now that we have the new plugin infrastructure....).

-- 
Hans



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: ReiserFS problems
  2003-08-06 18:41   ` Jeff Mahoney
  2003-08-06 19:21     ` Rogier Wolff
@ 2003-08-07 15:05     ` Hans Reiser
  2003-08-07 15:53       ` Jeff Mahoney
  1 sibling, 1 reply; 47+ messages in thread
From: Hans Reiser @ 2003-08-07 15:05 UTC (permalink / raw)
  To: Jeff Mahoney; +Cc: Rogier Wolff, reiserfs-list, copy

Jeff Mahoney wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> |> We've noticed horrible slowdowns when the filesystem is > 90% full. It
> |> turns out that when a block group is more than 90% full reiserfs will
> |> prefer a different block group. i.e. it is ALWAYS switching block
> |> groups when the whole disk is > 90% full. Something like that. When we
> |> report something like that it's always: Ah, yes, that's an old bug
> |> we've fixed it. Use patch.....
> |>
> | I don't think you reported that to me.....
> |
> | Jeff, give me an opinion on this....
>
> The skip_busy algorithm works like so:
>
> If the filesystem is less than 95% full, the allocator tries to be a bit
> smarter and leaves 10% of the bitmap free for future allocations to
> avoid fragmentation. 


> If the bitmap being examined has 10% or less free
> space, it's skipped. *UNLESS* the file doing the allocation already has
> an interest in that bitmap, as determined by the allocator getting
> passed a non-zero offset into the bitmap. 

Define this unless clause more fully please.

>
>
> If it finds no bitmaps that are more than 10% free or the filesystem is
> more than 95% full, it restarts the search at the initial hint and ignores the
> 10% rule.
>
> In short:
> 1) Find a block in the current bitmap if the file's last block was
> allocated there.
> 2) If there aren't any, or there is no stake

stake?

> , search until a bitmap that
> is > 10% free is found, but only from the initial search point to the
> end of the disk - without wrapping around.
> 3) If there aren't any, try to find any free block from the initial
> search point until the end of the disk
> 4) If there aren't any, start at the beginning of the disk and search up
> to the initial search point.
>
> The idea is that allocations that can be kept contiguous should be. Once
> the allocation ends up being outside of the local bitmap, then the disk
> is already seeking, so it doesn't matter if it seeks a bit more if it
> can find another chunk where it can find contiguous allocation.
>
> All these searches are streamlined by making find_*_zero_bit do as
> little work as possible. For each bitmap, the offset of the first zero
> bit is kept as well as how many free bits there are. This makes it
> trivial to skip bitmaps that have < 10% free, as well as not force the
> allocator to scan entire bitmaps to find that the last bit is the zero 
> bit.
>
> So, yes, when the filesystem approaches 95% full, and there are only new
> files being created, the *initial* allocations will scatter themselves. 

it would be better to scatter at the directory rather than the file 
level, but that is probably harder to code.

>
> This is by design so that the subsequent allocations for each of those
> files will be able to be contiguous with the original allocation.
>
> What is the workload that is producing the horrible slowdowns?
>
> - -Jeff
>
> - --
> jeffm@suse.com
> jeffm@csh.rit.edu
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.2.2-rc1-SuSE (GNU/Linux)
> Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org
>
> iD8DBQE/MUvoLPWxlyuTD7IRArpkAJ9MThFNeVzmEIONDDlypsALv70dTACgj7xo
> EJVh5oDQWqfsG9RH9lcFtO4=
> =ZcoM
> -----END PGP SIGNATURE-----
>
>
>


-- 
Hans




* Re: ReiserFS problems
  2003-08-07 15:05     ` Hans Reiser
@ 2003-08-07 15:53       ` Jeff Mahoney
  2003-08-08 13:07         ` Hans Reiser
  0 siblings, 1 reply; 47+ messages in thread
From: Jeff Mahoney @ 2003-08-07 15:53 UTC (permalink / raw)
  To: Hans Reiser; +Cc: Rogier Wolff, reiserfs-list, copy

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hans Reiser wrote:
| Jeff Mahoney wrote:
|
|> -----BEGIN PGP SIGNED MESSAGE-----
|> Hash: SHA1
|>
|> |> We've noticed horrible slowdowns when the filesystem is > 90% full. It
|> |> turns out that when a block group is more than 90% full reiserfs will
|> |> prefer a different block group. i.e. it is ALWAYS switching block
|> |> groups when the whole disk is > 90% full. Something like that. When we
|> |> report something like that it's always: Ah, yes, that's an old bug
|> |> we've fixed it. Use patch.....
|> |>
|> | I don't think you reported that to me.....
|> |
|> | Jeff, give me an opinion on this....
|>
|> The skip_busy algorithm works like so:
|>
|> If the filesystem is less than 95% full, the allocator tries to be a bit
|> smarter and leaves 10% of the bitmap free for future allocations to
|> avoid fragmentation.
|
|
|
|> If the bitmap being examined has 10% or less free
|> space, it's skipped. *UNLESS* the file doing the allocation already has
|> an interest in that bitmap, as determined by the allocator getting
|> passed a non-zero offset into the bitmap.
|
|
| Define this unless clause more fully please.
|
|>
|>
|> If it finds no bitmaps that are more than 10% free or the filesystem is
|> more than 95% full, it restarts the search at the initial hint and ignores the
|> 10% rule.
|>
|> In short;
|> 1) Find a block in the current bitmap if the file's last block was
|> allocated there.
|> 2) If there aren't any, or there is no stake
|
|
| stake?

The "unless" and "stake" mean the same thing. When the block allocator
is given a hint for a file, it's the last block allocated already for
that file. So, the search starts at the bitmap and offset specified by
the hint. If there is a block available after that position, but still
in that bitmap, it's used; regardless of the 10% rule.

Once we move out of that bitmap, the skip busy algorithm is applied,
which aims to keep bitmaps at least 10% free when possible so that
future allocations may have blocks local to the last allocation available.

The algorithm is only as good as the hint passed to it. It doesn't try
to be smart about placement other than implementing the above algorithm.
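
To make the search order above concrete, here is a minimal sketch in
Python. It is not the kernel code: the names Bitmap and allocate are
made up for illustration, bitmaps are plain lists of booleans, and the
free counts are recomputed on every call, whereas the real allocator
caches a free count and a first-zero offset per bitmap. A hint offset of
zero is taken to mean "no stake", as described above.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Bitmap:
    """One allocation bitmap; True means the block is in use (illustrative)."""
    used: List[bool]

    def free_count(self) -> int:
        return self.used.count(False)

    def is_busy(self) -> bool:
        # "10% or less free space" counts as busy and is skipped in step 2.
        return self.free_count() * 10 <= len(self.used)

    def first_free(self, start: int, stop: Optional[int] = None) -> Optional[int]:
        stop = len(self.used) if stop is None else stop
        for off in range(start, stop):
            if not self.used[off]:
                return off
        return None


def allocate(bitmaps: List[Bitmap], hint_bm: int,
             hint_off: int) -> Optional[Tuple[int, int]]:
    """Return (bitmap index, offset) of the block taken, or None if full."""
    total = sum(len(b.used) for b in bitmaps)
    free = sum(b.free_count() for b in bitmaps)
    nearly_full = (total - free) * 100 >= total * 95

    def claim(bm: int, off: Optional[int]) -> Optional[Tuple[int, int]]:
        if off is None:
            return None
        bitmaps[bm].used[off] = True
        return (bm, off)

    # 1) The file has a stake in the hinted bitmap: take the next free block
    #    after the hint there, regardless of the 10% rule.
    if hint_off != 0:
        hit = claim(hint_bm, bitmaps[hint_bm].first_free(hint_off))
        if hit is not None:
            return hit

    # 2) Hint to end of disk, skipping busy bitmaps (only if < 95% full).
    # 3) Hint to end of disk again, taking any free block.
    for skip_busy in ([False] if nearly_full else [True, False]):
        for bm in range(hint_bm, len(bitmaps)):
            if skip_busy and bitmaps[bm].is_busy():
                continue
            start = hint_off if bm == hint_bm else 0
            hit = claim(bm, bitmaps[bm].first_free(start))
            if hit is not None:
                return hit

    # 4) Wrap around: start of disk up to the initial search point.
    for bm in range(0, hint_bm + 1):
        stop = hint_off if bm == hint_bm else None
        hit = claim(bm, bitmaps[bm].first_free(0, stop))
        if hit is not None:
            return hit
    return None
```

With a nearly full first bitmap, a file whose hint points into that
bitmap keeps allocating there, while a new file (offset zero) is sent on
to the first bitmap that is more than 10% free.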

- -Jeff

- --
jeffm@suse.com
jeffm@csh.rit.edu
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.2-rc1-SuSE (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQE/MnYGLPWxlyuTD7IRAriBAKCBi4j1YvWmndTrQsqDAZex/HFSMACdFMrV
octG4Hi4ipGEKXUxoiWkFwo=
=J/Pn
-----END PGP SIGNATURE-----


* Re: ReiserFS problems
  2003-08-07 13:22       ` Hans Reiser
@ 2003-08-07 18:12         ` Mike Fedyk
  2003-08-08  0:18           ` Russell Coker
  2003-08-08  9:56           ` Oleg Drokin
  0 siblings, 2 replies; 47+ messages in thread
From: Mike Fedyk @ 2003-08-07 18:12 UTC (permalink / raw)
  To: Hans Reiser; +Cc: Oleg Drokin, Rogier Wolff, reiserfs-list, copy

On Thu, Aug 07, 2003 at 05:22:44PM +0400, Hans Reiser wrote:
> Oleg Drokin wrote:
> >Well. This is actually unfortunate, I agree. In such a case you'd better
> >move your reiserfs images to some other place for the time of reiserfsck 
> >--rebuild-tree run.
> >
> or compress them.

But if there was at any time an uncompressed reiserfs image within the outer
reiserfs filesystem you're fscking, won't that screw it up too?

So you can compress it, but if you uncompress it to work with it, it still
fscks fsck...  Right? :-/


* Re: ReiserFS problems
  2003-08-07 13:41       ` Rogier Wolff
@ 2003-08-07 18:44         ` Mike Fedyk
  0 siblings, 0 replies; 47+ messages in thread
From: Mike Fedyk @ 2003-08-07 18:44 UTC (permalink / raw)
  To: Rogier Wolff; +Cc: Hans Reiser, Oleg Drokin, reiserfs-list, copy

On Thu, Aug 07, 2003 at 03:41:08PM +0200, Rogier Wolff wrote:
> On Thu, Aug 07, 2003 at 05:03:02PM +0400, Hans Reiser wrote:
> > Can you help Oleg investigate this more closely by providing an exact 
> > account of what to do to replicate it?  Oleg, replicate this and observe 
> > what happens.
> 
> What part of: "we reported it a while back, and you told us it was
> fixed" don't you understand?

OK Rogier, please search through the archive, and post the URL of your
previous report.  That will be most helpful.

You are complaining about legitimate problems, and they are doing what they
can to get them identified and fixed.  With that URL they will at least be
able to know what they already fixed...


* Re: ReiserFS problems
  2003-08-07 18:12         ` Mike Fedyk
@ 2003-08-08  0:18           ` Russell Coker
  2003-08-08 11:29             ` [OT] " Christian Kujau
  2003-08-08  9:56           ` Oleg Drokin
  1 sibling, 1 reply; 47+ messages in thread
From: Russell Coker @ 2003-08-08  0:18 UTC (permalink / raw)
  To: Mike Fedyk; +Cc: reiserfs-list

On Fri, 8 Aug 2003 04:12, Mike Fedyk wrote:
> But if there was at any time an uncompressed reiserfs image within the
> outer reiserfs filesystem you're fscking, won't that screw it up too?
>
> So you can compress it, but if you uncompress it to work with it, it still
> fscks fsck...  Right? :-/

Fortunately the crypto support in the kernel is getting good now.  So you 
could just use a crypto-loopback device from a file on the ReiserFS file 
system for storing a file-system image.  If you use an XOR encryption method 
it won't even hurt performance either.  :-#

-- 
http://www.coker.com.au/selinux/   My NSA Security Enhanced Linux packages
http://www.coker.com.au/bonnie++/  Bonnie++ hard drive benchmark
http://www.coker.com.au/postal/    Postal SMTP/POP benchmark
http://www.coker.com.au/~russell/  My home page



* Re: ReiserFS problems
  2003-08-07 18:12         ` Mike Fedyk
  2003-08-08  0:18           ` Russell Coker
@ 2003-08-08  9:56           ` Oleg Drokin
  1 sibling, 0 replies; 47+ messages in thread
From: Oleg Drokin @ 2003-08-08  9:56 UTC (permalink / raw)
  To: Mike Fedyk; +Cc: Hans Reiser, Rogier Wolff, reiserfs-list, copy

Hello!

On Thu, Aug 07, 2003 at 11:12:27AM -0700, Mike Fedyk wrote:
> > >Well. This is actually unfortunate, I agree. In such a case you'd better
> > >move your reiserfs images to some other place for the time of reiserfsck 
> > >--rebuild-tree run.
> > or compress them.
> But if there was at any time an uncompressed reiserfs image within the outer
> reiserfs filesystem you're fscking, won't that screw it up too?

Yes.
The fs in the file will be completely destroyed.
Some of its contents may appear in the outer fs (possibly in lost+found;
no actual file data, just the names and directory structure).

> So you can compress it, but if you uncompress it to work with it, it still
> fscks fsck...  Right? :-/

Yes.

Bye,
    Oleg


* [OT] Re: ReiserFS problems
  2003-08-08  0:18           ` Russell Coker
@ 2003-08-08 11:29             ` Christian Kujau
  2003-08-08 12:40               ` Nikita Danilov
  2003-08-08 12:59               ` Russell Coker
  0 siblings, 2 replies; 47+ messages in thread
From: Christian Kujau @ 2003-08-08 11:29 UTC (permalink / raw)
  To: reiserfs-list

Russell Coker wrote:
> system for storing a file-system image.  If you use an XOR encryption method 
> it won't even hurt performance either.  :-#

sorry to hop in here, but i don't understand why an algorithm like "XOR" 
is named "encryption" at all. isn't it that another XOR operation just 
delivers the cleartext again?

a     b             a

0 XOR 1 = 1 XOR b = 0
1 XOR 0 = 1 XOR b = 1
1 XOR 1 = 0 XOR b = 1
0 XOR 0 = 0 XOR b = 0


but i don't have to explain that to you...

thanks,
Christian.

-- 
BOFH excuse #179:

multicasts on broken packets



* Re: [OT] Re: ReiserFS problems
  2003-08-08 11:29             ` [OT] " Christian Kujau
@ 2003-08-08 12:40               ` Nikita Danilov
  2003-08-08 13:06                 ` Carl-Daniel Hailfinger
  2003-08-08 12:59               ` Russell Coker
  1 sibling, 1 reply; 47+ messages in thread
From: Nikita Danilov @ 2003-08-08 12:40 UTC (permalink / raw)
  To: Christian Kujau; +Cc: reiserfs-list

Christian Kujau writes:
 > Russell Coker wrote:
 > > system for storing a file-system image.  If you use an XOR encryption method 
 > > it won't even hurt performance either.  :-#
 > 
 > sorry to hop in here, but i don't understand why an algorithm like "XOR" 
 > is named "encryption" at all. isn't it that another XOR operation just 
 > delivers the cleartext again?

"XOR encryption" XORs consecutive bytes of the data being encrypted (in our
case, blocks of the loop device) with consecutive bytes of a user-supplied key
(password). For all reasonable key sizes, this is surely only
marginally safer than no encryption at all because, for instance,
block devices usually contain a lot of zero-filled blocks, and XORing
the key with zeroes gives you the key.
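
A small Python illustration of both points (applying XOR twice returns
the cleartext, and a zero-filled block gives away the key). This is a
sketch of the general repeating-key scheme only; the helper name
xor_with_key, the key and the sample data are made up, and it is not
the loop driver's actual code.

```python
from itertools import cycle


def xor_with_key(data: bytes, key: bytes) -> bytes:
    """XOR each data byte with the key, repeating the key as needed."""
    return bytes(d ^ k for d, k in zip(data, cycle(key)))


key = b"secret"                      # hypothetical user-supplied password
plaintext = b"reiserfs image data"
ciphertext = xor_with_key(plaintext, key)

# XOR is its own inverse: the same operation decrypts.
assert xor_with_key(ciphertext, key) == plaintext

# A zero-filled block encrypts to the key itself, repeated.
zero_block = bytes(12)
leaked = xor_with_key(zero_block, key)
print(leaked)  # b'secretsecret'
```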

 > 
 > a     b             a
 > 
 > 0 XOR 1 = 1 XOR b = 0
 > 1 XOR 0 = 1 XOR b = 1
 > 1 XOR 1 = 0 XOR b = 1
 > 0 XOR 0 = 0 XOR b = 0
 > 
 > 
 > but i don't have to explain that to you...
 > 
 > thanks,
 > Christian.

Nikita.

 > 
 > -- 
 > BOFH excuse #179:
 > 
 > multicasts on broken packets
 > 


* Re: [OT] Re: ReiserFS problems
  2003-08-08 11:29             ` [OT] " Christian Kujau
  2003-08-08 12:40               ` Nikita Danilov
@ 2003-08-08 12:59               ` Russell Coker
  2003-08-08 15:39                 ` Christian Kujau
  2003-08-09  0:45                 ` The Amazing Dragon
  1 sibling, 2 replies; 47+ messages in thread
From: Russell Coker @ 2003-08-08 12:59 UTC (permalink / raw)
  To: Christian Kujau, reiserfs-list

On Fri, 8 Aug 2003 21:29, Christian Kujau wrote:
> Russell Coker wrote:
> > system for storing a file-system image.  If you use an XOR encryption
> > method it won't even hurt performance either.  :-#
>
> sorry to hop in here, but i don't understand why an algorithm like "XOR"
> is named "encryption" at all. isn't it that another XOR operation just
> delivers the cleartext again?

It is encryption just as is the "Caesar cipher" (of which ROT-13 is a 
sub-set).  It is not very good encryption, but it's enough to stop reiserfsck 
from doing the wrong thing as it will defeat magic number detection.

Of course a magic number can be hit by chance given a sufficiently large 
amount of input data.  I imagine that a gzip file could have the ReiserFS 
magic data if given the right/wrong input and thus could be munged through a 
fsck.
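
As a rough illustration of why even a one-byte XOR is enough to hide an
image from a scanner, here is a Python sketch. The detector is a naive
stand-in that only looks for the 3.6 superblock magic string "ReIsEr2Fs";
what reiserfsck --rebuild-tree really matches on while scanning blocks is
not reproduced here. The function names and the mask byte are assumptions
made for the example.

```python
from itertools import cycle

MAGIC = b"ReIsEr2Fs"  # superblock magic string of the ReiserFS 3.6 format


def xor_with_key(data: bytes, key: bytes) -> bytes:
    """XOR each data byte with the key, repeating the key as needed."""
    return bytes(d ^ k for d, k in zip(data, cycle(key)))


def looks_like_reiserfs(blob: bytes) -> bool:
    """Naive detector: does the magic string occur anywhere in the data?"""
    return MAGIC in blob


# A fake "image": zero padding around the magic string.
image = bytes(64) + MAGIC + bytes(64)
masked = xor_with_key(image, b"\x5a")   # hypothetical one-byte key

assert looks_like_reiserfs(image)        # a plain image is recognised
assert not looks_like_reiserfs(masked)   # the masked image is not
assert xor_with_key(masked, b"\x5a") == image  # and nothing is lost
```

The same detector would of course also fire on any unrelated file that
happens to contain those nine bytes, which is the chance-hit case
mentioned above.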

-- 
http://www.coker.com.au/selinux/   My NSA Security Enhanced Linux packages
http://www.coker.com.au/bonnie++/  Bonnie++ hard drive benchmark
http://www.coker.com.au/postal/    Postal SMTP/POP benchmark
http://www.coker.com.au/~russell/  My home page



* Re: [OT] Re: ReiserFS problems
  2003-08-08 12:40               ` Nikita Danilov
@ 2003-08-08 13:06                 ` Carl-Daniel Hailfinger
  0 siblings, 0 replies; 47+ messages in thread
From: Carl-Daniel Hailfinger @ 2003-08-08 13:06 UTC (permalink / raw)
  To: Nikita Danilov; +Cc: Christian Kujau, reiserfs-list

Nikita Danilov wrote:
> Christian Kujau writes:
>  > Russell Coker wrote:
>  > > system for storing a file-system image.  If you use an XOR encryption method 
>  > > it won't even hurt performance either.  :-#
>  > 
>  > sorry to hop in here, but i don't understand why an algorithm like "XOR" 
>  > is named "encryption" at all. isn't it that another XOR operation just 
>  > delivers the cleartext again?
> 
> "XOR encryption" XORs consecutive bytes of the data being encrypted (in our
> case, blocks of the loop device) with consecutive bytes of a user-supplied key
> (password). For all reasonable key sizes, this is surely only
> marginally safer than no encryption at all because, for instance,
> block devices usually contain a lot of zero-filled blocks, and XORing
> the key with zeroes gives you the key.

Yeah, but we don't know if we will say something similar about AES in 50
years. Besides that, if you look at different cryptoloop implementations,
you will notice that some of them (AFAIK the affected versions are no
longer used) use the same IV for every block they encrypt, thus giving
identical ciphertext blocks for identical plaintext blocks. That doesn't
necessarily give you the key when you look at ciphertext blocks where the
plaintext is supposed to be zero-filled (it NEVER should if your algorithm
is worth anything) but still gives you strong hints about the filesystem
type which was encrypted. Knowing the filesystem type, you know more parts
of the plaintext and thus can start cryptanalysis. However, all current
algorithms are designed to withstand this type of attack.
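
The fixed-IV effect can be shown with a toy in Python. This is not any
cryptoloop implementation and not a real cipher: the "cipher" below just
XORs each block with SHA-256(key + IV), which is enough to show the
property in question and nothing more. All names and the 32-byte block
size are assumptions for the example.

```python
import hashlib
from typing import List

BLOCK = 32  # toy block size, equal to the SHA-256 digest length


def encrypt_block(key: bytes, iv: bytes, block: bytes) -> bytes:
    """Toy encryption: XOR the block with a pad derived from key and IV."""
    if len(block) != BLOCK:
        raise ValueError("block must be exactly BLOCK bytes")
    pad = hashlib.sha256(key + iv).digest()
    return bytes(p ^ k for p, k in zip(block, pad))


def encrypt_device(key: bytes, blocks: List[bytes],
                   per_block_iv: bool) -> List[bytes]:
    """Encrypt every block, with either a constant IV or the block number."""
    out = []
    for n, block in enumerate(blocks):
        iv = n.to_bytes(8, "little") if per_block_iv else bytes(8)
        out.append(encrypt_block(key, iv, block))
    return out


blocks = [bytes(BLOCK), b"A" * BLOCK, bytes(BLOCK)]  # blocks 0 and 2 are equal
fixed = encrypt_device(b"key", blocks, per_block_iv=False)
varied = encrypt_device(b"key", blocks, per_block_iv=True)

assert fixed[0] == fixed[2]     # same IV: equal plaintext shows through
assert varied[0] != varied[2]   # per-block IV: the repetition is hidden
```

With the constant IV an observer can tell which blocks hold identical
data without knowing the key, which is the kind of hint about the
filesystem layout described above.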

> 
>  > 
>  > a     b             a
>  > 
>  > 0 XOR 1 = 1 XOR b = 0
>  > 1 XOR 0 = 1 XOR b = 1
>  > 1 XOR 1 = 0 XOR b = 1
>  > 0 XOR 0 = 0 XOR b = 0
>  > 
>  > 
>  > but i don't have to explain that to you...
>  > 
>  > thanks,
>  > Christian.
> 
> Nikita.

Regards,
Carl-Daniel

-- 
Usual disclaimers apply. Satisfaction guaranteed: You would get your money
back if you had paid me for writing this.


* Re: ReiserFS problems
  2003-08-07 15:53       ` Jeff Mahoney
@ 2003-08-08 13:07         ` Hans Reiser
  0 siblings, 0 replies; 47+ messages in thread
From: Hans Reiser @ 2003-08-08 13:07 UTC (permalink / raw)
  To: Jeff Mahoney; +Cc: Rogier Wolff, reiserfs-list, copy

Jeff Mahoney wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Hans Reiser wrote:
> | Jeff Mahoney wrote:
> |
> |> -----BEGIN PGP SIGNED MESSAGE-----
> |> Hash: SHA1
> |>
> |> |> We've noticed horrible slowdowns when the filesystem is > 90% full. It
> |> |> turns out that when a block group is more than 90% full reiserfs will
> |> |> prefer a different block group. i.e. it is ALWAYS switching block
> |> |> groups when the whole disk is > 90% full. Something like that. When we
> |> |> report something like that it's always: Ah, yes, that's an old bug
> |> |> we've fixed it. Use patch.....
> |> |>
> |> | I don't think you reported that to me.....
> |> |
> |> | Jeff, give me an opinion on this....
> |>
> |> The skip_busy algorithm works like so:
> |>
> |> If the filesystem is less than 95% full, the allocator tries to be a bit
> |> smarter and leaves 10% of the bitmap free for future allocations to
> |> avoid fragmentation.
> |
> |
> |
> |> If the bitmap being examined has 10% or less free
> |> space, it's skipped. *UNLESS* the file doing the allocation already has
> |> an interest in that bitmap, as determined by the allocator getting
> |> passed a non-zero offset into the bitmap.
> |
> |
> | Define this unless clause more fully please.
> |
> |>
> |>
> |> If it finds no bitmaps that are more than 10% free or the filesystem is
> |> more than 95% full, it restarts the search at the initial hint and ignores the
> |> 10% rule.
> |>
> |> In short;
> |> 1) Find a block in the current bitmap if the file's last block was
> |> allocated there.
> |> 2) If there aren't any, or there is no stake
> |
> |
> | stake?
>
> The "unless" and "stake" mean the same thing. When the block allocator
> is given a hint for a file, it's the last block allocated already for
> that file

and when it is a new file?  surely we still use the left neighbor in the 
tree as the hint....?


It seems that Rogier's problems are due to not using this code that you 
wrote at all.....

> . So, the search starts at the bitmap and offset specified by
> the hint. If there is a block available after that position, but still
> in that bitmap, it's used; regardless of the 10% rule.
>
> Once we move out of that bitmap, the skip busy algorithm is applied,
> which aims to keep bitmaps at least 10% free when possible so that
> future allocations may have blocks local to the last allocation available.
>
> The algorithm is only as good as the hint passed to it. It doesn't try
> to be smart about placement other than implementing the above algorithm.
>
> - -Jeff
>
> - --
> jeffm@suse.com
> jeffm@csh.rit.edu
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.2.2-rc1-SuSE (GNU/Linux)
> Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org
>
> iD8DBQE/MnYGLPWxlyuTD7IRAriBAKCBi4j1YvWmndTrQsqDAZex/HFSMACdFMrV
> octG4Hi4ipGEKXUxoiWkFwo=
> =J/Pn
> -----END PGP SIGNATURE-----
>
>
>


-- 
Hans




* Re: [OT] Re: ReiserFS problems
  2003-08-08 12:59               ` Russell Coker
@ 2003-08-08 15:39                 ` Christian Kujau
  2003-08-09  0:45                 ` The Amazing Dragon
  1 sibling, 0 replies; 47+ messages in thread
From: Christian Kujau @ 2003-08-08 15:39 UTC (permalink / raw)
  To: reiserfs-list

Russell Coker wrote:
> sub-set).  It is not very good encryption, but it's enough to stop reiserfsck 
> from doing the wrong thing as it will defeat magic number detection.

oh, i see, it was about an fsck issue here (that must be the reason why 
XOR comes up on reiserfs-list :-))

Thank you all for your comments,
Christian.

-- 
BOFH excuse #133:

It's not plugged in.



* Re: [OT] Re: ReiserFS problems
  2003-08-08 12:59               ` Russell Coker
  2003-08-08 15:39                 ` Christian Kujau
@ 2003-08-09  0:45                 ` The Amazing Dragon
  1 sibling, 0 replies; 47+ messages in thread
From: The Amazing Dragon @ 2003-08-09  0:45 UTC (permalink / raw)
  To: russell; +Cc: Christian Kujau, reiserfs-list

> From: Russell Coker <russell@coker.com.au>
> On Fri, 8 Aug 2003 21:29, Christian Kujau wrote:
> > Russell Coker wrote:
> > > system for storing a file-system image.  If you use an XOR encryption
> > > method it won't even hurt performance either.  :-#
> >
> > sorry to hop in here, but i don't understand why an algorithm like "XOR"
> > is named "encryption" at all. isn't it that another XOR operation just
> > delivers the cleartext again?
> 
> It is encryption just as is the "Caesar cipher" (of which ROT-13 is a 
> sub-set).  It is not very good encryption, but it's enough to stop reiserfsck 
> from doing the wrong thing as it will defeat magic number detection.

It depends on what is meant. If you've got a short key, say less
than 1MB, then it is horrible encryption; take a couple of minutes to find a
cracker and your data will be out in the open pretty quickly.

If the key is large, then it is a very different matter. If you've got a
keystream that is at least as large as the data being sent, and it is
*never* reused, then this is the "one time pad". The "one time pad" is
not merely the best cryptosystem out there, but it is the only
cryptosystem that is *provably* secure. The problem is that the keystream
can *never* be reused, and must be kept absolutely secure (this really needs an
incredible amount of emphasis). If you're sending a little bit of super
sensitive data around, then it is worth trying to deal with the key
distribution problem; otherwise, use a conventional cryptosystem.
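
A short Python sketch of the difference, using the standard secrets
module for the random pad. The messages and the helper name xor_bytes
are made up; the point is only that a pad as long as the message
decrypts correctly, and that reusing it lets the pad cancel out.

```python
import secrets


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings of equal length."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


message = b"attack at dawn"
pad = secrets.token_bytes(len(message))   # random, as long as the message
ciphertext = xor_bytes(message, pad)
assert xor_bytes(ciphertext, pad) == message

# Reusing the pad for a second message of the same length:
other = b"retreat at ten"
ciphertext2 = xor_bytes(other, pad)

# XORing the two ciphertexts removes the pad entirely and leaves the XOR
# of the two plaintexts, with no key needed.
assert xor_bytes(ciphertext, ciphertext2) == xor_bytes(message, other)
```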

> Of course a magic number can be hit by chance given a sufficiently large 
> amount of input data.  I imagine that a gzip file could have the ReiserFS 
> magic data if given the right/wrong input and thus could be munged through a 
> fsck.

Which, as already noted, is what really matters here.  :-)  Just a little
bit of obfuscation, so that fsck fails to identify the data as part of
a filesystem, treats it as mere data, and leaves it alone.

Shouldn't fsck be able to deal with this anyway? Just end up with the
file being merged into the larger filesystem? Could this be used by a
cracker to attack a system? The random per-FS cookie approach seems
like a good idea.
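
To show what a per-FS cookie could look like, here is a purely
hypothetical Python sketch. Nothing in it reflects the real ReiserFS
disk format: the 16-byte cookie, its position at the start of each
block, and all function names are invented for illustration.

```python
import secrets
from typing import List

COOKIE_LEN = 16  # hypothetical cookie size


def make_fs_cookie() -> bytes:
    """Pick a random cookie once, at mkfs time (hypothetical)."""
    return secrets.token_bytes(COOKIE_LEN)


def make_node(cookie: bytes, payload: bytes) -> bytes:
    """Stamp a metadata block with the cookie of the filesystem it belongs to."""
    return cookie + payload


def scan_for_nodes(blocks: List[bytes], cookie: bytes) -> List[bytes]:
    """Rebuild-style scan: adopt only blocks stamped with our own cookie."""
    return [b[COOKIE_LEN:] for b in blocks if b[:COOKIE_LEN] == cookie]


outer = make_fs_cookie()   # the filesystem being checked
inner = make_fs_cookie()   # an image stored as a file inside it

disk = [
    make_node(outer, b"outer leaf"),
    make_node(inner, b"inner leaf"),   # looks like a node, wrong cookie
    b"ordinary file contents, no cookie at all",
]

assert scan_for_nodes(disk, outer) == [b"outer leaf"]
```

A scan keyed on the outer cookie ignores the blocks of the stored image.
Whether this also helps against a deliberate attack would depend on the
cookie not being readable by the attacker, which this sketch does not
address.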

-- 
(\___(\___(\______          --=> 8-) EHM <=--          ______/)___/)___/)
 \   (    |         EHeM@cs.pdx.edu      PGP 8881EF59         |    )   /
  \_  \   |  _____  -O #include <stddisclaimer.h> O-   _____  |   /  _/
    \___\_|_/82 04 A1 3C C7 B1 37 2A*E3 6E 84 DA 97 4C 40 E6\_|_/___/


