From: Kalin KOZHUHAROV <kalin@thinrope.net>
To: linux-kernel@vger.kernel.org
Subject: Re: Help track down a freezing machine, libata or hardware
Date: Mon, 09 Jan 2006 01:28:17 +0900 [thread overview]
Message-ID: <dprej1$qmb$1@sea.gmane.org> (raw)
In-Reply-To: <dor1is$9bo$1@sea.gmane.org>
Kalin KOZHUHAROV wrote:
> Kalin KOZHUHAROV wrote:
>
>>Alex Riesen wrote:
>>
>>
>>>On 12/14/05, Kalin KOZHUHAROV <kalin@thinrope.net> wrote:
>>>
>>>
>>>
>>>>Now that I get a repetitive freeze, is there anything to debug the problem?
>>>>I guess, the point when kernel is still responsive to keyboard, but I cannot login.
>>>
>>>
>>>try to connect a serial console to it and press Alt+SysRq+t
>>
>>
>>Thank you for the suggestio, Alex.
>>
>>I was always trying to avoid the serial console till now (it just seems difficult, and I DO know it
>>is not), and didn't even bother with the netconsole...
>
>
> It is, I as have to go and buy a null modem cable... will do it.
>
>
>>So until now, here is an oops, the first I saw in a few months, captured by my camera and then
>>digitally enhanced: http://linux.tar.bz/reports/oopses/char/2.6.14.3-K01_char__oops1.jpg
>>
>>The OCRed/handwritten text ( http://linux.tar.bz/reports/oopses/char/2.6.14.3-K01_char__oops1.txt )
>>says:
>>
>>Call trace:
>>SCSI device sda: 145226112 512-byte hdwr sectors (74356 MB)
>>SCSI device sda: drive cache: write back
>>[<c01ec22f>] kobject_put+0x1f/0x30
>>[<c028c8fd>] scsi_end_request+0xdd/0xf0
>>[<c028ccae>] scsi_io_completion+0x26e/0x570
>>[<c011b623>] load_balance_newidle+0x43/0x110
>>[<c028d255>] scsi_generic_done+0x35/0x50
>>[<c02873ee>] scsi_finish_command+0x8e/0xd0
>>[<c0318dea>] schedule+0x4da/0xd50
>>[<c0318e1d>] schedule+0x50d/0xd50
>>[<c028728f>] scsi_sortirq+0xdf/0x160
>>[<c0125836>] __do_softirq+0xd6/0xf0
>>[<c0125885>] do_softirq+0x35/0x40
>>[<c0125e35>] ksoftirqd+0x95/0xe0
>>[<c0125da0>] ksoftirqd+0x0/0xe0
>>[<c0135b9a>] kthread+0xba/0xc0
>>[<c0135ae0>] kthread+0x0/0xc0
>>[<c0101245>] kernel_thread_helper+0x5/0x10
>>Code: e1 08 00 89 44 24 04 89 1c 24 e8 27 b0 ff ff eb a5 90 8d 74 26 00 55 57 56
>> 53 83 ec 08 8b 44 24 1c 89 44 24 04 8b 80 ec 00 00 00 <8b> 38 f6 80 79 01 00 00
>> 80 0f 85 98 00 00 00 8b 47 2c 8d 6f 20
>><0>Kernel panic - not syncing: Fatal exception in interrupt
>>
>>Unfortunately everything was frozen (KBD too), so I couldn't scroll up to see the beginning. As you
>>may guess, it was not written to the disk.
>>
>>The oops happened on boot (after a hard power-off) and is probbably related to the SATA system.
>
>
> All right, the above started to be reproducible, about once every 3 boots: the system freezes when
> tries to initialize the ata sybsystem. (still don't have cable for the serial console, sorry)
>
>
>>The .config is available at http://linux.tar.bz/reports/oopses/char/2.6.14.3-K01_char.config
>
>
> Now this is 2.6.14.4 and the new config is:
> http://linux.tar.bz/reports/oopses/char/2.6.14.4-K01_char.config
>
> I added another 250GB SATA HDD and changed the PSU, but it does not seem to be related for that bug.
> Will try to tweak the IDE parameters in BIOS.
OK, now this looks like hardware failure... I run 2.6.15 the other day and I was happy to see 2d
uptime :-) However... everything was borked, root was mounted RO, and fs was generally screwed.
I don't have physical acces to the box right now, so I will post more details tomorrow, but this is
what I got as dmesg:
ata1: port reset, p_is 40000001 is 1 pis 0 cmd c017 tf 471 ss 113 se 0
ata1: translated ATA stat/err 0x71/04 to SCSI SK/ASC/ASCQ 0xb/00/00
ata1: status=0x71 { DriveReady DeviceFault SeekComplete Error }
ata1: error=0x04 { DriveStatusError }
ata1: port reset, p_is 40000001 is 1 pis 0 cmd c017 tf 471 ss 113 se 0
ata1: translated ATA stat/err 0x71/04 to SCSI SK/ASC/ASCQ 0xb/00/00
ata1: status=0x71 { DriveReady DeviceFault SeekComplete Error }
ata1: error=0x04 { DriveStatusError }
ata1: port reset, p_is 40000001 is 1 pis 0 cmd c017 tf 471 ss 113 se 0
ata1: translated ATA stat/err 0x71/04 to SCSI SK/ASC/ASCQ 0xb/00/00
ata1: status=0x71 { DriveReady DeviceFault SeekComplete Error }
ata1: error=0x04 { DriveStatusError }
ata1: port reset, p_is 40000001 is 1 pis 0 cmd c017 tf 471 ss 113 se 0
ata1: translated ATA stat/err 0x71/04 to SCSI SK/ASC/ASCQ 0xb/00/00
ata1: status=0x71 { DriveReady DeviceFault SeekComplete Error }
ata1: error=0x04 { DriveStatusError }
ata1: port reset, p_is 40000001 is 1 pis 0 cmd c017 tf 471 ss 113 se 0
ata1: translated ATA stat/err 0x71/04 to SCSI SK/ASC/ASCQ 0xb/00/00
ata1: status=0x71 { DriveReady DeviceFault SeekComplete Error }
ata1: error=0x04 { DriveStatusError }
sd 0:0:0:0: SCSI error: return code = 0x8000002
sda: Current: sense key=0xb
ASC=0x0 ASCQ=0x0
end_request: I/O error, dev sda, sector 10803956
Buffer I/O error on device sda3, logical block 362497
The above was repeated many times (buffer was full, so I don't know how many), cannot tell the time,
as trying to `cat /var/log/everything` resulted in IO error.
I also got a bunch of these:
ReiserFS: sda3: warning: clm-6006: writing inode 56216 on readonly FS
ReiserFS: sda3: warning: clm-6006: writing inode 56216 on readonly FS
ReiserFS: sda3: warning: vs-13070: reiserfs_read_locked_inode: i/o failure occurred trying to find
stat data of [25971 27180 0x0 SD]
ReiserFS: sda3: warning: vs-13070: reiserfs_read_locked_inode: i/o failure occurred trying to find
stat data of [9064 25871 0x0 SD]
ReiserFS: sda3: warning: vs-13070: reiserfs_read_locked_inode: i/o failure occurred trying to find
stat data of [25971 27180 0x0 SD]
Do you see this as hardware problem?
The drive in question is a WD740GD (SATA, 10k RPM) and I will run the WD tools on it in a day or two
to check if it is a hardware failure.
Kalin.
--
|[ ~~~~~~~~~~~~~~~~~~~~~~ ]|
+-> http://ThinRope.net/ <-+
|[ ______________________ ]|
next prev parent reply other threads:[~2006-01-08 16:28 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2005-12-14 12:55 Help track down a freezing machine Kalin KOZHUHAROV
2005-12-14 14:25 ` Alex Riesen
2005-12-16 17:52 ` Kalin KOZHUHAROV
2005-12-27 9:30 ` Help track down a freezing machine, libata? Kalin KOZHUHAROV
2006-01-08 16:28 ` Kalin KOZHUHAROV [this message]
2006-01-11 2:00 ` Help track down a freezing machine, libata or hardware Esben Stien
2005-12-14 20:15 ` Help track down a freezing machine Nigel Cunningham
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='dprej1$qmb$1@sea.gmane.org' \
--to=kalin@thinrope.net \
--cc=linux-kernel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox