* Reoccuring corruption problem
@ 2004-08-19 18:46 Dan Nilsson
  2004-08-19 19:42 ` Sander
  0 siblings, 1 reply; 6+ messages in thread
From: Dan Nilsson @ 2004-08-19 18:46 UTC (permalink / raw)
  To: reiserfs-list

Hi!

I've been trying to cure a recurring file corruption problem
I have on my Gentoo box running Linux 2.6.8.1-mm1 (upgraded
from 2.6.7, which made no difference).

I get corrupted directory entries behaving like this:

bash-2.05b# rm -rf dev-perl/
rm: cannot remove directory `dev-perl/': Directory not empty
bash-2.05b# cd dev-perl/  
bash-2.05b# ls
Curses-uI
bash-2.05b# ls -al
ls: Curses-uI: No such file or directory
total 0
drwxr-xr-x    3 root     root           80 Aug 19 20:32 .
drwxrwxrwt    8 root     root          304 Aug 19 20:32 ..
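
The listing above shows a name that the directory still returns but that
can no longer be looked up. A minimal Python sketch for finding such
entries across a whole tree follows. The function name
find_dangling_entries is hypothetical and not from this thread, and the
sketch only reports entries; it does not repair anything.

```python
import os
import stat


def find_dangling_entries(root):
    """Return paths that a directory lists but lstat() cannot find.

    Hypothetical helper for illustration: on a healthy filesystem the
    result should be empty, because every listed name can be stat'ed.
    """
    dangling = []
    stack = [root]
    while stack:
        directory = stack.pop()
        try:
            names = os.listdir(directory)
        except OSError:
            # Unreadable directory: skip it rather than abort the scan.
            continue
        for name in names:
            path = os.path.join(directory, name)
            try:
                info = os.lstat(path)
            except FileNotFoundError:
                # Listed by the directory, yet the lookup fails.
                dangling.append(path)
                continue
            except OSError:
                continue
            # lstat() does not follow symlinks, so only real
            # directories are descended into.
            if stat.S_ISDIR(info.st_mode):
                stack.append(path)
    return dangling
```

Calling find_dangling_entries("/usr/portage") after a sync would list any
entries in the state shown above.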

Note that "Curses-uI" is a corrupted file name. The corruption
usually hits the same set of files in my /usr/portage.
Typical corruptions I get:

"ruby-progressbar" becomes "ruby-pr?gressbar"
"ruby-zlib"        becomes "ruby-zlyb"

If I reboot and use a rescue CD to run reiserfsck --rebuild-tree,
the corrupted directories are deleted. However, if I delete my
/usr/portage and try to restore it with "emerge sync", the same
set of files (usually) gets corrupted in exactly the same way.

This is how I can reproduce the problem:

1. Reboot to rescue CD.
2. Run reiserfsck --rebuild-tree until it reports no errors.
3. Reboot computer (no CD).
4. Remove /usr/portage.
5. Run emerge sync (it will fetch /usr/portage).

emerge will fail and I will be left with some corrupted directories
in the Portage tree, usually the ruby directories.

If I want to be really sure to get more corruption, I just unpack
a kernel source tarball. It creates corrupted files in the same way
(and usually the same files every time).

From my 2.6.7 configuration:

CONFIG_REISERFS_FS=y
# CONFIG_REISERFS_CHECK is not set
CONFIG_REISERFS_PROC_INFO=y
CONFIG_REISERFS_FS_XATTR=y
CONFIG_REISERFS_FS_POSIX_ACL=y
CONFIG_REISERFS_FS_SECURITY=y

I am now running 2.6.8.1-mm1 with SECURITY turned off and some other settings
activated. I do not have the configuration file at hand. A friend built
that kernel for me, since I cannot build one myself: unpacking the source
produces corrupted files.

I have run badblocks and memtest and they have come up with nothing.

A very strange problem. I have not read about anything similar and I
have no clue what might be causing it.

I wrote a test program that creates 100000 directory entries and ran it.
No problem deleting those directories. Argh.
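
The test program itself is not included in this thread. A rough Python
sketch of that kind of test, under assumed details, might look like the
following. The function name stress_directory, the "entry-NNNNNN" naming
scheme, and the extra verification step are all hypothetical.

```python
import os
import tempfile


def stress_directory(parent, count=100_000):
    """Create `count` subdirectories, compare the listing, delete them.

    Returns three sorted lists: names that went missing, names that
    appeared unexpectedly (e.g. a corrupted spelling), and names that
    could not be removed. All three are empty on a healthy system.
    """
    # Work in a fresh scratch directory so existing files are untouched.
    workdir = tempfile.mkdtemp(prefix="dirstress-", dir=parent)
    expected = {f"entry-{i:06d}" for i in range(count)}
    for name in expected:
        os.mkdir(os.path.join(workdir, name))

    listed = set(os.listdir(workdir))
    missing = sorted(expected - listed)
    unexpected = sorted(listed - expected)

    undeletable = []
    for name in listed:
        try:
            os.rmdir(os.path.join(workdir, name))
        except OSError:
            undeletable.append(name)

    # Only remove the scratch directory if it is really empty again.
    if not undeletable:
        os.rmdir(workdir)
    return missing, unexpected, sorted(undeletable)
```

Running stress_directory on the affected partition with the default
count mirrors the 100000 entries mentioned above. Comparing the listing
against the expected names also catches a changed name, not only a
failed delete.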

Any ideas?

Thanks,
Dan Nilsson

* Re: Reoccuring corruption problem
  2004-08-19 18:46 Reoccuring corruption problem Dan Nilsson
@ 2004-08-19 19:42 ` Sander
  2004-08-21  5:43   ` Dan Nilsson
  0 siblings, 1 reply; 6+ messages in thread
From: Sander @ 2004-08-19 19:42 UTC (permalink / raw)
  To: Dan Nilsson; +Cc: reiserfs-list

Dan Nilsson wrote (ao):
> Typical corruptions I get:
> 
> "ruby-progressbar" becomes "ruby-pr?gressbar"
> "ruby-zlib"        becomes "ruby-zlyb"

> I have run badblocks and memtest and they have come up with nothing.

Sounds like bad memory nevertheless. Can you try memtest86 for at least
24 hours?
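
One way to see why these renames look like memory errors is to XOR the
expected and observed names byte by byte. The Python sketch below does
that for two of the reported names. It assumes "Curses-UI" is the
intended spelling of "Curses-uI" (the thread only shows the corrupted
form), and the helper names bit_flips and is_single_bit are hypothetical.
"ruby-pr?gressbar" is left out because the "?" printed by ls hides the
actual byte value.

```python
def bit_flips(expected, observed):
    """List (index, expected_char, observed_char, xor_mask) per differing byte."""
    a = expected.encode("latin-1")
    b = observed.encode("latin-1")
    if len(a) != len(b):
        raise ValueError("names must have the same length")
    return [
        (i, chr(x), chr(y), x ^ y)
        for i, (x, y) in enumerate(zip(a, b))
        if x != y
    ]


def is_single_bit(mask):
    """True if exactly one bit is set in mask."""
    return mask != 0 and mask & (mask - 1) == 0


# The second pair relies on the assumed original name "Curses-UI".
pairs = [("ruby-zlib", "ruby-zlyb"), ("Curses-UI", "Curses-uI")]
for expected, observed in pairs:
    for index, want, got, mask in bit_flips(expected, observed):
        print(
            f"{expected}[{index}]: {want!r} -> {got!r}, "
            f"xor=0x{mask:02x}, single bit: {is_single_bit(mask)}"
        )
```

In both pairs exactly one byte differs, by 0x10 ('i' to 'y') and by 0x20
('U' to 'u'), so each rename is a single flipped bit. That pattern fits
faulty RAM, though it does not prove it.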

* Re: Reoccuring corruption problem
  2004-08-19 19:42 ` Sander
@ 2004-08-21  5:43   ` Dan Nilsson
  2004-08-21  7:59     ` mjt
  2004-08-21  8:16     ` Sander
  0 siblings, 2 replies; 6+ messages in thread
From: Dan Nilsson @ 2004-08-21  5:43 UTC (permalink / raw)
  To: sander; +Cc: reiserfs-list

> Sounds like bad memory nevertheless. Can you try memtest86 for at least
> 24 hours?

Yes,
I started memtest86 on the computer after getting your email and it did in
fact pass all tests (I aborted the test after ~24 hrs).

I fairly recently upgraded the computer by adding a memory module, which I
suspected would show errors (if any).

After running the test I fsck:ed to remove the corruptions, then proceeded by
removing my /usr/portage tree (where corruptions seem to easily occur) and
reinstalling all the files (with rsync). The same type of corruption occurred,
but now in a different directory than "usual".

I have absolutely no idea what causes it. Nothing shows up in syslog and there
don't seem to be any problems with the hard drive (which, by the way, is a
2 GB drive with the corruptions occurring on a 1.6 GB partition).

Thanks,
Dan Nilsson

* Re: Reoccuring corruption problem
  2004-08-21  5:43   ` Dan Nilsson
@ 2004-08-21  7:59     ` mjt
  2004-08-21  8:16     ` Sander
  1 sibling, 0 replies; 6+ messages in thread
From: mjt @ 2004-08-21  7:59 UTC (permalink / raw)
  To: Dan Nilsson; +Cc: sander, reiserfs-list

On Sat, Aug 21, 2004 at 07:43:41AM +0200, Dan Nilsson wrote:
>
>After running the test I fsck:ed to remove the corruptions, then proceeded by
>removing my /usr/portage tree (where corruptions seem to easily occur) and
>reinstalling all the files (with rsync). The same type of corruption occurred,
>but now in a different directory than "usual".

I think you should provide the metadata dump of the filesystem with and
without breakage to Namesys, so they can look at it when their new network
is up and running. This has usually been the next logical step :)

Did you have any debugging options turned on?

-- 
mjt


* Re: Reoccuring corruption problem
  2004-08-21  5:43   ` Dan Nilsson
  2004-08-21  7:59     ` mjt
@ 2004-08-21  8:16     ` Sander
  2004-08-24  8:21       ` Dan Nilsson
  1 sibling, 1 reply; 6+ messages in thread
From: Sander @ 2004-08-21  8:16 UTC (permalink / raw)
  To: Dan Nilsson; +Cc: reiserfs-list

Dan Nilsson wrote (ao):
> > Sounds like bad memory nevertheless. Can you try memtest86 for at least
> > 24 hours?
> 
> Yes,
> I started memtest86 on the computer after getting your email and it
> did in fact pass all tests (I aborted the test after ~24 hrs).
>
> I fairly recently upgraded the computer by adding a memory module,
> which I suspected would show errors (if any).
>
> After running the test I fsck:ed to remove the corruptions, then
> proceeded by removing my /usr/portage tree (where corruptions seem to
> easily occur) and reinstalling all the files (with rsync). The same
> type of corruption occurred, but now in a different directory than
> "usual".

You seem to be able to trigger it fast and easy. Can you try again with
the added memory module removed? I'm a bit stubborn .. ;-)

It must be something with your hardware, as you are the only one, and
also the kind of corruption is common with broken hardware.

* Re: Reoccuring corruption problem
  2004-08-21  8:16     ` Sander
@ 2004-08-24  8:21       ` Dan Nilsson
  0 siblings, 0 replies; 6+ messages in thread
From: Dan Nilsson @ 2004-08-24  8:21 UTC (permalink / raw)
  To: sander; +Cc: reiserfs-list

On Saturday 21 August 2004 10.16, Sander wrote:
> You seem to be able to trigger it fast and easy. Can you try again with
> the added memory module removed? I'm a bit stubborn .. ;-)
>
> It must be something with your hardware, as you are the only one, and
> also the kind of corruption is common with broken hardware.

OK. I removed the module and put in a new one, plus an extra HD.

Performed the standard procedure... and no corruptions! Some further
tests (unpacking kernels etc) produced no errors. Seems like - uh -
the corruption problems are gone! :)

Well, what can I say, I am surprised :) 

Never thought hardware problems could.. behave in this way, at least
not with such consistency and precision, hehe.

Anyway, thanks a lot for your time+help. 

Dan Nilsson
