* reiser4 crash
@ 2004-07-24 16:27 Francesco Biscani
  2004-07-25  7:41 ` mjt
  2004-08-01 11:37 ` reiser4 crash [solved?] Francesco Biscani
  0 siblings, 2 replies; 6+ messages in thread
From: Francesco Biscani @ 2004-07-24 16:27 UTC (permalink / raw)
  To: reiserfs-list

Hi,

I had reiser4 crash pretty badly. Here's the story.

My distribution is Gentoo. As you probably know, its packaging tool is called 
"emerge", which installs applications by following installation scripts 
called "ebuilds". Packages are usually compiled from source, but not 
necessarily, since ebuilds can contain arbitrary instructions.

I decided to install the pre-compiled binary version of OpenOffice 1.1.2, 
which under Gentoo is known as "openoffice-bin". The installation proceeded 
normally, but near the end everything seemed to hang in the "Registering 
components" phase, with no CPU or HD activity. After a while, suspecting a 
bug in the ebuild, I went to the Gentoo forums and found these posts:

http://forums.gentoo.org/viewtopic.php?t=201410&highlight=openofficebin
http://forums.gentoo.org/viewtopic.php?t=184798&highlight=openofficebin+reiser4

These people also report problems installing openoffice on reiser4. Meanwhile 
the installation of openoffice-bin was still hanging, but suddenly CPU usage 
went to 100%, all of it "system" activity and no "user" activity. Top showed 
that the installation process was eating all my CPU. The system was still 
working, but "sync" hung. Quite worried, with the CPU still at 100%, I tried 
to reboot, but the system was not able to do that. I tried to kill the 
offending process, with no luck. I had no choice but to push the power button.

fsck 0.5.6 revealed these errors:

FSCK: Directory [ccb2c:6d703300000000:10b195] (dir40), node [790184], item 
[0], unit [55]: entry has wrong offset
[10b195:0(NAME):14d69636861656c:2e4275626ce92e4d:14942a136fe7bf]. Should be
[10b195:0(NAME):14d69636861656c:2e4275626ce92e4d:14942a136f370f].
FSCK: Directory [209045:1536f6e6e792052:2e987a] (dir40), node [3593262], item 
[0], unit [5]: entry has wrong offset
[2e987a:0(NAME):1536f6e6e792052:6f6c6c696e73202d:2bd0cd03e55f727a]. Should be
[2e987a:0(NAME):1536f6e6e792052:6f6c6c696e73202d:2bd0cd03bde670ca].
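
In both reports the found key and the expected key differ only in their last 
colon-separated field. A minimal Python sketch for pulling that out of pasted 
fsck output follows. The message layout is assumed from the two reports above, 
not taken from the reiser4progs source, and the name wrong_offsets is 
hypothetical.

```python
import re

# Pattern assumed from the two reports quoted above, not from reiser4progs.
# Mail line wraps are normalised to single spaces before matching.
REPORT = re.compile(
    r"FSCK: Directory \[(?P<dir>[^\]]+)\] \((?P<plugin>\w+)\), "
    r"node \[(?P<node>\d+)\], item \[(?P<item>\d+)\], unit \[(?P<unit>\d+)\]: "
    r"entry has wrong offset \[(?P<found>[^\]]+)\]\. "
    r"Should be \[(?P<expected>[^\]]+)\]"
)


def wrong_offsets(text):
    """Return one dict per 'entry has wrong offset' report found in text."""
    flat = " ".join(text.split())  # undo line wrapping
    reports = []
    for m in REPORT.finditer(flat):
        found = m["found"].split(":")
        expected = m["expected"].split(":")
        # Indices of the colon-separated key fields that differ.
        diff = [i for i, (a, b) in enumerate(zip(found, expected)) if a != b]
        reports.append({
            "dir": m["dir"],
            "node": int(m["node"]),
            "unit": int(m["unit"]),
            "found": m["found"],
            "expected": m["expected"],
            "differing_fields": diff,
        })
    return reports
```

Feeding it the saved fsck output gives one entry per damaged directory unit, 
which makes it easy to compare a check-only run with a later --build-fs run.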

I had to issue a --build-fs, which led to:

FSCK: No 'lost+found' entry found. Building a new object with the key 
2a:0:ffff.
FSCK: Failed to recognize the plugin for the directory [2a:0:ffff].
FSCK: Trying to recover the directory [2a:0:ffff] with the default 
plugin--dir40.
FSCK: The file [2a:0:ffff] does not have a StatData item. Creating a new one. 
Plugin dir40.
FSCK: Directory [2a:0:ffff]: The entry "." is not found. Insert a new one. 
Plugin (dir40).
FSCK: Node (460152), item (2), [2a:0:ffff] (stat40): wrong size (0), Fixed to 
(1).
FSCK: Node (460152), item (2), [2a:0:ffff] (stat40): wrong bytes (0), Fixed to 
(50).
FSCK: Directory [ccb2c:6d703300000000:10b195] (dir40), node [790184], item 
[0], unit [55]: entry has wrong offset
[10b195:0(NAME):14d69636861656c:2e4275626ce92e4d:14942a136fe7bf]. Should be
[10b195:0(NAME):14d69636861656c:2e4275626ce92e4d:14942a136f370f]. Removed.
FSCK: Node (2917509), item (11), [ccb2c:6d703300000000:10b195] (stat40): wrong 
size (62), Fixed to (61).
FSCK: Node (2917509), item (11), [ccb2c:6d703300000000:10b195] (stat40): wrong 
bytes (4090), Fixed to (4012).
FSCK: Directory [209045:1536f6e6e792052:2e987a] (dir40), node [3593262], item 
[0], unit [5]: entry has wrong offset
[2e987a:0(NAME):1536f6e6e792052:6f6c6c696e73202d:2bd0cd03e55f727a]. Should be
[2e987a:0(NAME):1536f6e6e792052:6f6c6c696e73202d:2bd0cd03bde670ca]. Removed.
FSCK: Node (3688550), item (22), [209045:1536f6e6e792052:2e987a] (stat40): 
wrong size (13), Fixed to (12).
FSCK: Node (3688550), item (22), [209045:1536f6e6e792052:2e987a] (stat40): 
wrong bytes (1154), Fixed to (1052).

After that the fs was consistent. In lost+found I found some files from the 
web-browser's cache and some temporary files from the installation of 
openoffice. So it probably stopped committing changes to the fs when the 
installation hung.

System logs did not record anything. Fortunately it seems like nothing is 
missing from my fs. Should I be worried about something? fsck no longer finds 
any errors.

Using auto-snapshot from 20 July against 2.6.7-mm7.

Hope this is useful. I'll be glad to give more details if asked. 
Regards,

  Francesco


* Re: reiser4 crash
  2004-07-24 16:27 reiser4 crash Francesco Biscani
@ 2004-07-25  7:41 ` mjt
  2004-07-25 19:37   ` Francesco Biscani
  2004-08-01 11:37 ` reiser4 crash [solved?] Francesco Biscani
  1 sibling, 1 reply; 6+ messages in thread
From: mjt @ 2004-07-25  7:41 UTC (permalink / raw)
  To: Francesco Biscani; +Cc: reiserfs-list

On Sat, Jul 24, 2004 at 06:27:54PM +0200, Francesco Biscani wrote:
>
>Hope this is useful. I'll be glad to give more details is asked to. 
>Regards,

Try patching in
http://mjt.nysv.org/reiser/log-write-readpage-releasepage-2.diff.gz

Then recompile the kernel with debugging and assertions (printing was iirc
not required) turned on and try to reproduce it.

Note that this may cause your system to oops and go haywire big time, so
if you have a netconsole or some other way to log remotely, that's great.
Another method is to run cat /proc/kmsg > foo and scp foo elsewhere before
the computer goes down.
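
A minimal Python sketch of that capture idea, for anyone who wants each line
forced to disk as it arrives. The function name capture and the per-line
fsync are assumptions for illustration, not part of the patch or of any
Namesys tooling.

```python
import os


def capture(src, dst_path):
    """Append src to dst_path line by line, forcing each line to disk.

    src is any binary file-like object; returns the number of lines copied.
    """
    count = 0
    with open(dst_path, "ab") as dst:
        for line in src:
            dst.write(line)
            dst.flush()             # push Python's buffer to the OS
            os.fsync(dst.fileno())  # ask the OS to write it to the device
            count += 1
    return count
```

Run as root, it could be pointed at the kernel log with something like
capture(open("/proc/kmsg", "rb"), "foo"), which blocks while waiting for new
messages. The destination should sit on a filesystem other than the one under
test, and the file still has to be copied elsewhere before the machine dies.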

The patch above is from Namesys, but I don't think it's in any of the
auto-snapshots (should it be?) and it may or may not give more info
on what's going on, but if it does, the output is some 512 extra lines
of log.

-- 
mjt



* Re: reiser4 crash
  2004-07-25  7:41 ` mjt
@ 2004-07-25 19:37   ` Francesco Biscani
  2004-07-26 15:03     ` Francesco Biscani
  0 siblings, 1 reply; 6+ messages in thread
From: Francesco Biscani @ 2004-07-25 19:37 UTC (permalink / raw)
  To: Markus Törnqvist; +Cc: reiserfs-list

On Sunday 25 July 2004 09:41, Markus Törnqvist wrote:
> On Sat, Jul 24, 2004 at 06:27:54PM +0200, Francesco Biscani wrote:
> >Hope this is useful. I'll be glad to give more details is asked to.
> >Regards,
>
> Try patching in
> http://mjt.nysv.org/reiser/log-write-readpage-releasepage-2.diff.gz
>
> Then recompile the kernel with debugging and assertions (printing was iirc
> not required) turned on and try to reproduce it.
>

Well, I'll try to do something but it'll be difficult. On the laptop I have 
reiser4 on /, and I cannot afford to break it. I could try on the workstation 
at home where I have a test partition for reiser4, but I'll be away until the 
next weekend. Maybe some other Gentoo user could help (Redeeman are you 
listening? :))

> Note, that this may cause your system to oops and go haywire big time, so
> if you have a netconsole or something to log, it's great.
> One other method is cat /proc/kmsg > foo and scping foo elsewhere before
> the computer goes down.
>

Ok.

An update: I have found a dir called 

"lost_name_<insert garbage here>"

I've fixed the name, which obviously was lost during --build-fs. I'm getting a 
bit paranoid about this, but is there anything I can do to make sure 
everything is alright? I've searched for other lost names but found nothing. 
The system is working as normal. Should I expect that something was lost at 
all? It is a bit strange because the dir with the garbled name was not open 
for writing when the crash happened. Should I expect random corruption to 
have happened?

Thanks very much.


* Re: reiser4 crash
  2004-07-25 19:37   ` Francesco Biscani
@ 2004-07-26 15:03     ` Francesco Biscani
  0 siblings, 0 replies; 6+ messages in thread
From: Francesco Biscani @ 2004-07-26 15:03 UTC (permalink / raw)
  To: reiserfs-list; +Cc: Markus Törnqvist

On Sunday 25 July 2004 21:37, Francesco Biscani wrote:
> On Sunday 25 July 2004 09:41, Markus Törnqvist wrote:
> > On Sat, Jul 24, 2004 at 06:27:54PM +0200, Francesco Biscani wrote:
> > >Hope this is useful. I'll be glad to give more details is asked to.
> > >Regards,
> >
> > Try patching in
> > http://mjt.nysv.org/reiser/log-write-readpage-releasepage-2.diff.gz
> >
> > Then recompile the kernel with debugging and assertions (printing was
> > iirc not required) turned on and try to reproduce it.
>
> Well, I'll try to do something but it'll be difficult. On the laptop I have
> reiser4 on /, and I cannot afford to break it. I could try on the
> workstation at home where I have a test partition for reiser4, but I'll be
> away until the next weekend. Maybe some other Gentoo user could help
> (Redeeman are you listening? :))
>


Mmmhh.. I was thinking that the bug could also be triggered just by installing 
the binary version of openoffice. I did not see any strange command in the 
ebuild and I've never had problems with emerge+reiser4 before. Is anyone brave 
enough to try that?


* Re: reiser4 crash [solved?]
  2004-07-24 16:27 reiser4 crash Francesco Biscani
  2004-07-25  7:41 ` mjt
@ 2004-08-01 11:37 ` Francesco Biscani
  2004-08-01 11:38   ` mjt
  1 sibling, 1 reply; 6+ messages in thread
From: Francesco Biscani @ 2004-08-01 11:37 UTC (permalink / raw)
  To: reiserfs-list

On Saturday 24 July 2004 18:27, Francesco Biscani wrote:
> Hi,
>
> I had reiser4 crash pretty badly. Here's the story.
>
>[...]

I tried to reproduce the crash on my workstation and on my laptop (the machine 
on which the crash first showed up). In both cases the problem did not 
appear. I was using the 30/07 snapshot on both machines with all debug options 
enabled. Nothing appears in the logs. Bug solved?

Regards,
  Francesco


* Re: reiser4 crash [solved?]
  2004-08-01 11:37 ` reiser4 crash [solved?] Francesco Biscani
@ 2004-08-01 11:38   ` mjt
  0 siblings, 0 replies; 6+ messages in thread
From: mjt @ 2004-08-01 11:38 UTC (permalink / raw)
  To: Francesco Biscani; +Cc: reiserfs-list

On Sun, Aug 01, 2004 at 01:37:37PM +0200, Francesco Biscani wrote:
>I tried to reproduce the crash on my workstation and laptop (the machine on 
>which the crash showed for the 1st time). In both cases the problem did not 
>appear. Using 30/07 snapshot on both machines with all debug options enabled. 
>Nothing appears in logs. Bug solved?

I got a patch from them that fixed these issues, but as it was only
copy-pasted to irc, I did not publish it. It was just deleting some lines.

Anyway, I think that got merged in 2004.07.30, or thereabouts, but the only
other source tree I have besides 2004.07.27 with the manual patch is
the aforementioned snapshot, and I can't remember the location of the
removals anymore :)

But hey, they fixed the bug for me, thanks again for all that! :)

-- 
mjt


