* reiser4 crash
@ 2004-07-24 16:27 Francesco Biscani
2004-07-25 7:41 ` mjt
2004-08-01 11:37 ` reiser4 crash [solved?] Francesco Biscani
0 siblings, 2 replies; 6+ messages in thread
From: Francesco Biscani @ 2004-07-24 16:27 UTC
To: reiserfs-list
Hi,
I had reiser4 crash pretty badly. Here's the story.
My distribution is Gentoo. As you probably know, its packaging system is a
tool called "emerge", which basically installs applications by following
installation scripts called "ebuilds". Packages are usually compiled from
source, but not necessarily, since ebuilds can contain totally arbitrary
instructions.
I decided to install the pre-compiled binary version of OpenOffice 1.1.2,
which under Gentoo is known as "openoffice-bin". The installation proceeded
normally, but near the end everything seemed to hang in the "Registering
components" phase. There was no CPU or HD activity. After a while, suspecting
a bug in the ebuild, I went over to the Gentoo forums and found these posts:
http://forums.gentoo.org/viewtopic.php?t=201410&highlight=openofficebin
http://forums.gentoo.org/viewtopic.php?t=184798&highlight=openofficebin+reiser4
These people also report problems installing openoffice on reiser4. In the
meantime the openoffice-bin installation was still hanging, but suddenly the
CPU went to 100%. It was all "system" activity, no "user" activity. Top
showed that the installation process was eating all my CPU. The system was
still working, but "sync" was not (it hung). Quite worried, with the CPU still
at 100%, I tried to reboot, but the system was not able to do that. I tried
to kill the offending process, with no luck. I had no choice but to push the
power button.
fsck 0.5.6 revealed these errors:
FSCK: Directory [ccb2c:6d703300000000:10b195] (dir40), node [790184], item
[0], unit [55]: entry has wrong offset
[10b195:0(NAME):14d69636861656c:2e4275626ce92e4d:14942a136fe7bf]. Should be
[10b195:0(NAME):14d69636861656c:2e4275626ce92e4d:14942a136f370f].
FSCK: Directory [209045:1536f6e6e792052:2e987a] (dir40), node [3593262], item
[0], unit [5]: entry has wrong offset
[2e987a:0(NAME):1536f6e6e792052:6f6c6c696e73202d:2bd0cd03e55f727a]. Should be
[2e987a:0(NAME):1536f6e6e792052:6f6c6c696e73202d:2bd0cd03bde670ca].
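As a side note, the "found" and "should be" keys in each message can be
compared field by field to see where they differ. Below is a minimal Python
sketch, assuming the keys printed by fsck can simply be split on ':'. The
helper name differing_fields is hypothetical, and nothing here relies on
knowing what each field means.

```python
def differing_fields(found: str, expected: str) -> list[int]:
    """Return the indices of the colon-separated fields that differ."""
    found_fields = found.strip("[]").split(":")
    expected_fields = expected.strip("[]").split(":")
    if len(found_fields) != len(expected_fields):
        raise ValueError("keys have a different number of fields")
    return [
        i
        for i, (a, b) in enumerate(zip(found_fields, expected_fields))
        if a != b
    ]


# Key pairs copied verbatim from the fsck messages above.
pairs = [
    ("[10b195:0(NAME):14d69636861656c:2e4275626ce92e4d:14942a136fe7bf]",
     "[10b195:0(NAME):14d69636861656c:2e4275626ce92e4d:14942a136f370f]"),
    ("[2e987a:0(NAME):1536f6e6e792052:6f6c6c696e73202d:2bd0cd03e55f727a]",
     "[2e987a:0(NAME):1536f6e6e792052:6f6c6c696e73202d:2bd0cd03bde670ca]"),
]

for found, expected in pairs:
    print(differing_fields(found, expected))
```

For both entries only the last field differs, which is presumably the offset
that fsck complains about.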
I had to issue a --build-fs, which led to:
FSCK: No 'lost+found' entry found. Building a new object with the key
2a:0:ffff.
FSCK: Failed to recognize the plugin for the directory [2a:0:ffff].
FSCK: Trying to recover the directory [2a:0:ffff] with the default
plugin--dir40.
FSCK: The file [2a:0:ffff] does not have a StatData item. Creating a new one.
Plugin dir40.
FSCK: Directory [2a:0:ffff]: The entry "." is not found. Insert a new one.
Plugin (dir40).
FSCK: Node (460152), item (2), [2a:0:ffff] (stat40): wrong size (0), Fixed to
(1).
FSCK: Node (460152), item (2), [2a:0:ffff] (stat40): wrong bytes (0), Fixed to
(50).
FSCK: Directory [ccb2c:6d703300000000:10b195] (dir40), node [790184], item
[0], unit [55]: entry has wrong offset
[10b195:0(NAME):14d69636861656c:2e4275626ce92e4d:14942a136fe7bf]. Should be
[10b195:0(NAME):14d69636861656c:2e4275626ce92e4d:14942a136f370f]. Removed.
FSCK: Node (2917509), item (11), [ccb2c:6d703300000000:10b195] (stat40): wrong
size (62), Fixed to (61).
FSCK: Node (2917509), item (11), [ccb2c:6d703300000000:10b195] (stat40): wrong
bytes (4090), Fixed to (4012).
FSCK: Directory [209045:1536f6e6e792052:2e987a] (dir40), node [3593262], item
[0], unit [5]: entry has wrong offset
[2e987a:0(NAME):1536f6e6e792052:6f6c6c696e73202d:2bd0cd03e55f727a]. Should be
[2e987a:0(NAME):1536f6e6e792052:6f6c6c696e73202d:2bd0cd03bde670ca]. Removed.
FSCK: Node (3688550), item (22), [209045:1536f6e6e792052:2e987a] (stat40):
wrong size (13), Fixed to (12).
FSCK: Node (3688550), item (22), [209045:1536f6e6e792052:2e987a] (stat40):
wrong bytes (1154), Fixed to (1052).
After that the fs was consistent. In lost+found I found some files from the
web browser's cache and some temporary files from the installation of
openoffice, so it probably stopped committing changes to the fs when the
installation hung.
The system logs did not record anything. Fortunately it seems that nothing is
missing from my fs. Should I be worried about something? fsck no longer finds
any errors.
I am using the auto-snapshot from 20 July against 2.6.7-mm7.
Hope this is useful. I'll be glad to give more details if asked to.
Regards,
Francesco
* Re: reiser4 crash
From: mjt @ 2004-07-25 7:41 UTC
To: Francesco Biscani; +Cc: reiserfs-list
On Sat, Jul 24, 2004 at 06:27:54PM +0200, Francesco Biscani wrote:
>
>Hope this is useful. I'll be glad to give more details if asked to.
>Regards,
Try patching in
http://mjt.nysv.org/reiser/log-write-readpage-releasepage-2.diff.gz
Then recompile the kernel with debugging and assertions (printing was iirc
not required) turned on and try to reproduce it.
Note that this may cause your system to oops and go haywire big time, so
if you have a netconsole or something else to log to, that's great.
Another method is to cat /proc/kmsg > foo and scp foo elsewhere before
the computer goes down.
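The capture step can be sketched in Python as follows. This is only an
illustration of the "cat /proc/kmsg > foo" idea with each line forced to disk
as it arrives. The function name capture_log and its parameters are
hypothetical, the scp step is left out, and it assumes the script runs as root
and that the output file lives on a filesystem other than the reiser4 one
under test (otherwise the fsync could hang together with the filesystem).

```python
import os


def capture_log(src_path="/proc/kmsg", dst_path="foo", max_lines=None):
    """Copy lines from src_path to dst_path, forcing each one to disk.

    Returns the number of lines copied. With max_lines=None it keeps
    reading until the source ends (for /proc/kmsg: until interrupted).
    """
    copied = 0
    with open(src_path, "rb") as src, open(dst_path, "ab") as dst:
        for line in src:
            dst.write(line)
            dst.flush()             # push Python's buffer to the kernel
            os.fsync(dst.fileno())  # ask the kernel to write it to disk
            copied += 1
            if max_lines is not None and copied >= max_lines:
                break
    return copied


# Typical use (blocks while waiting for new kernel messages):
#   capture_log("/proc/kmsg", "/mnt/other-fs/foo")
```

Syncing after every line is slow, but the point here is to keep as much of
the log as possible if the machine goes down mid-write.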
The patch above is from Namesys, but I don't think it's in any of the
auto-snapshots (should it be?) and it may or may not give more info
on what's going on, but if it does, the output is some 512 extra lines
of log.
--
mjt
* Re: reiser4 crash
From: Francesco Biscani @ 2004-07-25 19:37 UTC
To: Markus Törnqvist; +Cc: reiserfs-list
On Sunday 25 July 2004 09:41, Markus Törnqvist wrote:
> On Sat, Jul 24, 2004 at 06:27:54PM +0200, Francesco Biscani wrote:
> >Hope this is useful. I'll be glad to give more details if asked to.
> >Regards,
>
> Try patching in
> http://mjt.nysv.org/reiser/log-write-readpage-releasepage-2.diff.gz
>
> Then recompile the kernel with debugging and assertions (printing was iirc
> not required) turned on and try to reproduce it.
>
Well, I'll try to do something but it'll be difficult. On the laptop I have
reiser4 on /, and I cannot afford to break it. I could try on the workstation
at home where I have a test partition for reiser4, but I'll be away until the
next weekend. Maybe some other Gentoo user could help (Redeeman are you
listening? :))
> Note that this may cause your system to oops and go haywire big time, so
> if you have a netconsole or something else to log to, that's great.
> Another method is to cat /proc/kmsg > foo and scp foo elsewhere before
> the computer goes down.
>
Ok.
An update: I have found a dir called
"lost_name_<insert garbage here>"
I've fixed the name, which was obviously lost during --build-fs. I'm getting a
bit paranoid about this, but is there anything I can do to make sure
everything is alright? I've searched for other lost names but found nothing.
The system is working as normal. Should I expect that anything was lost at
all? It is a bit strange, because the dir with the garbled name was not open
in write mode when the crash happened. Should I expect random corruption to
have happened?
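For reference, the search for other lost names can be sketched in Python like
this. The "lost_name_" prefix is taken from the directory name quoted above;
whether fsck uses that prefix for every recovered entry is an assumption, and
find_lost_names is a hypothetical helper name.

```python
import os


def find_lost_names(root, prefix="lost_name_"):
    """Return sorted paths under root whose last component starts with prefix."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Check directories as well as files: the entry found above was a dir.
        for name in dirnames + filenames:
            if name.startswith(prefix):
                hits.append(os.path.join(dirpath, name))
    return sorted(hits)


# Example: find_lost_names("/") walks the whole tree, including other mounted
# filesystems, so pointing it at a narrower root is usually faster.
```

An empty result only says that no entry carries that prefix. It says nothing
about the contents of the files themselves.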
Thanks very much.
* Re: reiser4 crash
From: Francesco Biscani @ 2004-07-26 15:03 UTC
To: reiserfs-list; +Cc: Markus Törnqvist
On Sunday 25 July 2004 21:37, Francesco Biscani wrote:
> On Sunday 25 July 2004 09:41, Markus Törnqvist wrote:
> > On Sat, Jul 24, 2004 at 06:27:54PM +0200, Francesco Biscani wrote:
> > >Hope this is useful. I'll be glad to give more details if asked to.
> > >Regards,
> >
> > Try patching in
> > http://mjt.nysv.org/reiser/log-write-readpage-releasepage-2.diff.gz
> >
> > Then recompile the kernel with debugging and assertions (printing was
> > iirc not required) turned on and try to reproduce it.
>
> Well, I'll try to do something but it'll be difficult. On the laptop I have
> reiser4 on /, and I cannot afford to break it. I could try on the
> workstation at home where I have a test partition for reiser4, but I'll be
> away until the next weekend. Maybe some other Gentoo user could help
> (Redeeman are you listening? :))
>
Mmmhh.. I was thinking that the bug could also pop up just by installing the
binary version of openoffice. I did not see any strange commands in the ebuild
and I've never had problems with emerge+reiser4 before. Is anyone brave
enough to try that?
* Re: reiser4 crash [solved?]
From: Francesco Biscani @ 2004-08-01 11:37 UTC
To: reiserfs-list
On Saturday 24 July 2004 18:27, Francesco Biscani wrote:
> Hi,
>
> I had reiser4 crash pretty badly. Here's the story.
>
>[...]
I tried to reproduce the crash on my workstation and laptop (the machine on
which the crash first showed up). In both cases the problem did not
appear. I used the 30/07 snapshot on both machines with all debug options
enabled. Nothing appears in the logs. Bug solved?
Regards,
Francesco
* Re: reiser4 crash [solved?]
From: mjt @ 2004-08-01 11:38 UTC
To: Francesco Biscani; +Cc: reiserfs-list
On Sun, Aug 01, 2004 at 01:37:37PM +0200, Francesco Biscani wrote:
>I tried to reproduce the crash on my workstation and laptop (the machine on
>which the crash first showed up). In both cases the problem did not
>appear. I used the 30/07 snapshot on both machines with all debug options
>enabled. Nothing appears in the logs. Bug solved?
I got a patch from them that fixed these issues, but as it was only
copy-pasted to IRC, I did not publish it. It just deleted some lines.
Anyway, I think that got merged in 2004.07.30 or thereabouts, but the only
other source tree I have besides 2004.07.27 with the manual patch is
the aforementioned snapshot, and I can't remember the location of the
removals anymore :)
But hey, they fixed the bug for me, thanks again for all that! :)
--
mjt