From: Jeff Mahoney <jeffm@suse.com>
To: Adam Nielsen <a.nielsen@optushome.com.au>
Cc: reiserfs-list@namesys.com
Subject: Re: BUG: Replaying ReiserFS log causes hard lockup
Date: Fri, 26 Mar 2004 10:46:40 -0500
Message-ID: <40645060.7050707@suse.com>
In-Reply-To: <20040326223443.2e5011fa.a.nielsen@optushome.com.au>
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Adam Nielsen wrote:
|>Then it locks. Once it printed the message about INIT starting after
|>this, so I guess it's something around this point that's causing the
|>problem. However since it seems to be a *really* bad lock, I don't
|>know how to proceed :-(
|
|
| Ok, I've done some more fiddling and I think I've narrowed the problem
| down to the SATA driver. Since the kernel used to boot, then it
| started locking up during booting, I was trying to work out what could
| cause this. I tried wiggling the SATA cable again, and that fixed
| it - I could boot Linux again.
|
| Wiggling the SATA cable while Linux was loaded didn't seem to cause a
| problem, so I started dd reading off a ton of data and started fiddling
| with the cable. I couldn't seem to fault anything, so I tried
| unplugging the cable altogether (as SATA is hot swappable.) Obviously
| everything accessing the drive paused, and when I plugged the cable back
| into the drive it seemed as though everything resumed again (because the
| drive started reading off a ton of data due to dd running.) However,
| after a few seconds the drive stopped reading, and dd wouldn't
| terminate. I tried switching to another VT and logging in, but after I
| typed in my login name it froze too. It seems that if the drive gets
| disconnected and reconnected, everything accessing it stops (I'm
| guessing due to a driver problem.)
|
| I'm still not sure why it would lock up during boot, but perhaps when
| the SATA hardware is initialised the drive disconnects and reconnects,
| and if it hasn't reconnected by the time something (e.g. ReiserFS) wants
| to access the disk, that causes the lockup - only since the lock happens
| in the kernel, it completely locks everything up. Of course, this is
| just a wild guess, but given the behaviour it does seem like a
| possibility. Especially if hotswapping is a feature not yet implemented
| in the driver (which is quite likely, as it is still only in the testing
| stage.)
|
| Anyway, thanks for all your help, and I might try to find the SI3512
| driver people and see what they think. Wish me luck ;-)

It may be a driver problem, but it could also be a shortcoming in the
way ReiserFS currently handles write errors in the journal.

Even when all you're doing is reading from the disk, you're still
writing a little bit (unless you've mounted with -o noatime,nodiratime).
ReiserFS can't deal at all with the device going away while it's
performing journal operations. Since updating the atime for a file
requires altering the metadata, it uses the journal. If I had to guess,
I'd say that you'd see a ReiserFS panic in your logs, with something along
the lines of "journal-###: buffer write failed". A panic will make all
further access to the filesystem hang.
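
To see whether a given mount is still taking atime writes, you can read
the mount options back from /proc/mounts. This is only a minimal sketch
in Python, assuming the usual whitespace-separated /proc/mounts layout
(device, mount point, fs type, options, ...). The helper name
atime_options is made up for illustration.

```python
def atime_options(mounts_text, mount_point):
    """Return the atime-suppressing options set on mount_point.

    mounts_text is the contents of /proc/mounts: one mount per line,
    with fields device, mount point, fs type, options, dump, pass.
    Returns None if mount_point does not appear at all.
    """
    found = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[1] != mount_point:
            continue
        options = fields[3].split(",")
        # If the mount point is listed more than once, the last line wins.
        found = {opt for opt in options if opt in ("noatime", "nodiratime")}
    return found
```

Pass it the text of /proc/mounts and the mount point you care about,
e.g. atime_options(open("/proc/mounts").read(), "/"). An empty set means
plain reads on that filesystem still go through the journal to update
atime.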

I'm in the final stages of a patch that will allow ReiserFS to handle
journal I/O errors more gracefully. Rather than panicking the system on
a failed journal write, the filesystem will be forced read-only and all
active transactions will be aborted and released. You'll then be able to
umount the filesystem, and on re-mount it will look much as it would
after a power failure. However, since it did abort on an I/O error, I'd
recommend running reiserfsck on the aborted partition.
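
As a rough illustration of that behaviour (this is a toy model in
Python, not the patch and not ReiserFS code), the idea is that a failed
journal write flips the filesystem to read-only and drops every open
transaction instead of panicking. The names ToyJournal, JournalAborted
and write_block are all hypothetical.

```python
class JournalAborted(Exception):
    """Raised once the toy journal has been aborted."""


class ToyJournal:
    def __init__(self, write_block):
        # write_block is a callable that raises OSError on an I/O failure.
        self.write_block = write_block
        self.read_only = False
        self.active = []  # transactions that are open but not yet committed

    def begin(self, blocks):
        if self.read_only:
            raise JournalAborted("filesystem is read-only")
        txn = list(blocks)
        self.active.append(txn)
        return txn

    def commit(self, txn):
        if self.read_only:
            raise JournalAborted("filesystem is read-only")
        try:
            for block in txn:
                self.write_block(block)
        except OSError:
            # Instead of panicking: abort everything and go read-only.
            self.abort()
            raise JournalAborted("journal write failed") from None
        self.active = [t for t in self.active if t is not txn]

    def abort(self):
        self.read_only = True
        self.active.clear()  # release all active transactions
```

After an abort, every later begin() or commit() fails fast rather than
hanging, which is the difference from the panic case described above.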
- -Jeff
- --
Jeff Mahoney
SuSE Labs
jeffm@suse.com
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.2 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org
iD8DBQFAZFBgLPWxlyuTD7IRAgOYAJ4m9a2QRsUH7BuB7igHOWZf3P3j4ACfWml/
0aYNPCreOG3UQbI4/YNJTSw=
=Bqo5
-----END PGP SIGNATURE-----