All of lore.kernel.org
 help / color / mirror / Atom feed
From: Kalin KOZHUHAROV <kalin@thinrope.net>
To: linux-kernel@vger.kernel.org
Subject: Re: Help track down a freezing machine, libata or hardware
Date: Mon, 09 Jan 2006 01:28:17 +0900	[thread overview]
Message-ID: <dprej1$qmb$1@sea.gmane.org> (raw)
In-Reply-To: <dor1is$9bo$1@sea.gmane.org>

Kalin KOZHUHAROV wrote:
> Kalin KOZHUHAROV wrote:
> 
>>Alex Riesen wrote:
>>
>>
>>>On 12/14/05, Kalin KOZHUHAROV <kalin@thinrope.net> wrote:
>>>
>>>
>>>
>>>>Now that I get a repetitive freeze, is there anything to debug the problem?
>>>>I guess, the point when kernel is still responsive to keyboard, but I cannot login.
>>>
>>>
>>>try to connect a serial console to it and press Alt+SysRq+t
>>
>>
>>Thank you for the suggestio, Alex.
>>
>>I was always trying to avoid the serial console till now (it just seems difficult, and I DO know it
>>is not), and didn't even bother with the netconsole...
> 
> 
> It is, I as have to go and buy a null modem cable... will do it.
> 
> 
>>So until now, here is an oops, the first I saw in a few months, captured by my camera and then
>>digitally enhanced: http://linux.tar.bz/reports/oopses/char/2.6.14.3-K01_char__oops1.jpg
>>
>>The OCRed/handwritten text ( http://linux.tar.bz/reports/oopses/char/2.6.14.3-K01_char__oops1.txt )
>>says:
>>
>>Call trace:
>>SCSI device sda: 145226112 512-byte hdwr sectors (74356 MB)
>>SCSI device sda: drive cache: write back
>>[<c01ec22f>] kobject_put+0x1f/0x30
>>[<c028c8fd>] scsi_end_request+0xdd/0xf0
>>[<c028ccae>] scsi_io_completion+0x26e/0x570
>>[<c011b623>] load_balance_newidle+0x43/0x110
>>[<c028d255>] scsi_generic_done+0x35/0x50
>>[<c02873ee>] scsi_finish_command+0x8e/0xd0
>>[<c0318dea>] schedule+0x4da/0xd50
>>[<c0318e1d>] schedule+0x50d/0xd50
>>[<c028728f>] scsi_sortirq+0xdf/0x160
>>[<c0125836>] __do_softirq+0xd6/0xf0
>>[<c0125885>] do_softirq+0x35/0x40
>>[<c0125e35>] ksoftirqd+0x95/0xe0
>>[<c0125da0>] ksoftirqd+0x0/0xe0
>>[<c0135b9a>] kthread+0xba/0xc0
>>[<c0135ae0>] kthread+0x0/0xc0
>>[<c0101245>] kernel_thread_helper+0x5/0x10
>>Code: e1 08 00 89 44 24 04 89 1c 24 e8 27 b0 ff ff eb a5 90 8d 74 26 00 55 57 56
>> 53 83 ec 08 8b 44 24 1c 89 44 24 04 8b 80 ec 00 00 00 <8b> 38 f6 80 79 01 00 00
>> 80 0f 85 98 00 00 00 8b 47 2c 8d 6f 20
>><0>Kernel panic - not syncing: Fatal exception in interrupt
>>
>>Unfortunately everything was frozen (KBD too), so I couldn't scroll up to see the beginning. As you
>>may guess, it was not written to the disk.
>>
>>The oops happened on boot (after a hard power-off) and is probbably related to the SATA system.
> 
> 
> All right, the above started to be reproducible, about once every 3 boots: the system freezes when
> tries to initialize the ata sybsystem. (still don't have cable for the serial console, sorry)
> 
> 
>>The .config is available at http://linux.tar.bz/reports/oopses/char/2.6.14.3-K01_char.config
> 
> 
> Now this is 2.6.14.4 and the new config is:
> http://linux.tar.bz/reports/oopses/char/2.6.14.4-K01_char.config
> 
> I added another 250GB SATA HDD and changed the PSU, but it does not seem to be related for that bug.
> Will try to tweak the IDE parameters in BIOS.

OK, now this looks like hardware failure... I run 2.6.15 the other day and I was happy to see 2d
uptime :-) However... everything was borked, root was mounted RO, and fs was generally screwed.

I don't have physical acces to the box right now, so I will post more details tomorrow, but this is
what I got as dmesg:


ata1: port reset, p_is 40000001 is 1 pis 0 cmd c017 tf 471 ss 113 se 0
ata1: translated ATA stat/err 0x71/04 to SCSI SK/ASC/ASCQ 0xb/00/00
ata1: status=0x71 { DriveReady DeviceFault SeekComplete Error }
ata1: error=0x04 { DriveStatusError }
ata1: port reset, p_is 40000001 is 1 pis 0 cmd c017 tf 471 ss 113 se 0
ata1: translated ATA stat/err 0x71/04 to SCSI SK/ASC/ASCQ 0xb/00/00
ata1: status=0x71 { DriveReady DeviceFault SeekComplete Error }
ata1: error=0x04 { DriveStatusError }
ata1: port reset, p_is 40000001 is 1 pis 0 cmd c017 tf 471 ss 113 se 0
ata1: translated ATA stat/err 0x71/04 to SCSI SK/ASC/ASCQ 0xb/00/00
ata1: status=0x71 { DriveReady DeviceFault SeekComplete Error }
ata1: error=0x04 { DriveStatusError }
ata1: port reset, p_is 40000001 is 1 pis 0 cmd c017 tf 471 ss 113 se 0
ata1: translated ATA stat/err 0x71/04 to SCSI SK/ASC/ASCQ 0xb/00/00
ata1: status=0x71 { DriveReady DeviceFault SeekComplete Error }
ata1: error=0x04 { DriveStatusError }
ata1: port reset, p_is 40000001 is 1 pis 0 cmd c017 tf 471 ss 113 se 0
ata1: translated ATA stat/err 0x71/04 to SCSI SK/ASC/ASCQ 0xb/00/00
ata1: status=0x71 { DriveReady DeviceFault SeekComplete Error }
ata1: error=0x04 { DriveStatusError }
sd 0:0:0:0: SCSI error: return code = 0x8000002
sda: Current: sense key=0xb
    ASC=0x0 ASCQ=0x0
end_request: I/O error, dev sda, sector 10803956
Buffer I/O error on device sda3, logical block 362497


The above was repeated many times (buffer was full, so I don't know how many), cannot tell the time,
as trying to `cat /var/log/everything` resulted in IO error.
I also got a bunch of these:

ReiserFS: sda3: warning: clm-6006: writing inode 56216 on readonly FS
ReiserFS: sda3: warning: clm-6006: writing inode 56216 on readonly FS
ReiserFS: sda3: warning: vs-13070: reiserfs_read_locked_inode: i/o failure occurred trying to find
stat data of [25971 27180 0x0 SD]
ReiserFS: sda3: warning: vs-13070: reiserfs_read_locked_inode: i/o failure occurred trying to find
stat data of [9064 25871 0x0 SD]
ReiserFS: sda3: warning: vs-13070: reiserfs_read_locked_inode: i/o failure occurred trying to find
stat data of [25971 27180 0x0 SD]

Do you see this as hardware problem?

The drive in question is a WD740GD (SATA, 10k RPM) and I will run the WD tools on it in a day or two
to check if it is a hardware failure.

Kalin.

-- 
|[ ~~~~~~~~~~~~~~~~~~~~~~ ]|
+-> http://ThinRope.net/ <-+
|[ ______________________ ]|


  reply	other threads:[~2006-01-08 16:28 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2005-12-14 12:55 Help track down a freezing machine Kalin KOZHUHAROV
2005-12-14 14:25 ` Alex Riesen
2005-12-16 17:52   ` Kalin KOZHUHAROV
2005-12-27  9:30     ` Help track down a freezing machine, libata? Kalin KOZHUHAROV
2006-01-08 16:28       ` Kalin KOZHUHAROV [this message]
2006-01-11  2:00         ` Help track down a freezing machine, libata or hardware Esben Stien
2005-12-14 20:15 ` Help track down a freezing machine Nigel Cunningham

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='dprej1$qmb$1@sea.gmane.org' \
    --to=kalin@thinrope.net \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.