public inbox for linux-kernel@vger.kernel.org
From: Gene Heskett <gene.heskett@gmail.com>
To: Zan Lynx <zlynx@acm.org>
Cc: Calvin Walton <calvin.walton@gmail.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Linux ide Mailing list <linux-ide@vger.kernel.org>
Subject: Re: Problem with ata layer in 2.6.24
Date: Mon, 28 Jan 2008 12:44:10 -0500
Message-ID: <200801281244.10662.gene.heskett@gmail.com>
In-Reply-To: <200801281230.32910.gene.heskett@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 3056 bytes --]

On Monday 28 January 2008, Gene Heskett wrote:
>On Monday 28 January 2008, Zan Lynx wrote:
>>On Mon, 2008-01-28 at 11:50 -0500, Calvin Walton wrote:
>>> On Mon, 2008-01-28 at 11:35 -0500, Gene Heskett wrote:
>>> > On Monday 28 January 2008, Mikael Pettersson wrote:
>>> > >Unfortunately we also see:
>>> > > > [   48.285456] nvidia: module license 'NVIDIA' taints kernel.
>>> > > > [   48.549725] ACPI: PCI Interrupt 0000:02:00.0[A] -> Link [APC4] -> GSI 19 (level, high) -> IRQ 20
>>> > > > [   48.550149] NVRM: loading NVIDIA UNIX x86 Kernel Module  169.07  Thu Dec 13 18:42:56 PST 2007
>>> > >
>>> > >We have no way of debugging that module, so please try 2.6.24 without
>>> > > it.
>>> >
>>> > Sorry, I can't do this and have a working machine.  The nv driver has
>>> > suffered bit rot or something since the FC2 days when it COULD run a
>>> > 19" crt at 1600x1200, and will not drive this 20" wide screen lcd
>>> > 1680x1050 monitor at more than 800x600, which is absolutely butt ugly
>>> > fuzzy, looking like a jpg compressed to 10%.  The system is not usable
>>> > on a day-to-day basis without the nvidia driver.
>>>
>>> You should probably give the nouveau[1] driver a try, if only for
>>> testing purposes; if you are running an NV4x (G6x or G7x) card in
>>> particular, it works a lot better than the nv driver for 2d support.
>>>
>>> 1. http://nouveau.freedesktop.org/wiki/InstallNouveau
>>
>>But nouveau is much less stable than nv.  For testing purposes, go with
>>stable.
>
>I believe at this point, it's moot.  I captured quite a few instances of that
>error message while rebooting the last time, all of which occurred long
>before I logged in and did a startx (I boot to runlevel 3 here), so the
>kernel was NOT tainted at that point.  That dmesg has been posted and some
>questions asked.
>
>As this has gone on for a while, it seems to me that with 14,800 google hits
>on this problem, Linus should call a halt until this is found and fixed. 
> But I'm not Linus.  I'm also locking up for 30 at a time, & probably ready
> for reboot #7 today.
>
>>I'm not sure why it won't run his screen though.  I can use nv to run a
>>1920x1200 laptop LCD.  It *is* dog slow (although nouveau was not any
>>better with a NV17 / 440-Go -- render support for AA fonts seems to be
>>missing), but it does work.

I've been trying to run a long selftest on that drive, but the constant
reboots keep fscking that up.  I have attached the last smartctl -a output,
which indicates that the test was aborted, probably by all the resets that
are being issued.  The last one froze me for around 5 minutes, but I haven't
rebooted yet.  Can anyone see if there is actually anything wrong with the
drive?  If a boot lasts long enough for the -t long to complete, it passes
with no errors, but this run has now been interrupted for the 3rd time.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
Well begun is half done.
		-- Aristotle

[-- Attachment #2: smart.log --]
[-- Type: text/x-log, Size: 9050 bytes --]

smartctl version 5.37 [i386-redhat-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar SE family
Device Model:     WDC WD2000JB-00EVA0
Serial Number:    WD-WMAEH2782398
Firmware Version: 15.05R15
User Capacity:    200,049,647,616 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   6
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Mon Jan 28 12:39:08 2008 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x85)	Offline data collection activity
					was aborted by an interrupting command from host.
					Auto Offline Data Collection: Enabled.
Self-test execution status:      ( 249)	Self-test routine in progress...
					90% of test remaining.
Total time to complete Offline 
data collection: 		 (6942) seconds.
Offline data collection
capabilities: 			 (0x79) SMART execute Offline immediate.
					No Auto Offline data collection support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					No General Purpose Logging support.
Short self-test routine 
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 (  88) minutes.
Conveyance self-test routine
recommended polling time: 	 (   5) minutes.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   194   194   051    Pre-fail  Always       -       12
  3 Spin_Up_Time            0x0007   180   124   021    Pre-fail  Always       -       1500
  4 Start_Stop_Count        0x0032   100   100   040    Old_age   Always       -       182
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   200   200   051    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   062   062   000    Old_age   Always       -       27779
 10 Spin_Retry_Count        0x0013   100   100   051    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x0013   100   100   051    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       178
194 Temperature_Celsius     0x0022   120   253   000    Old_age   Always       -       30
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0012   200   200   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x000a   200   253   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0009   200   155   051    Pre-fail  Offline      -       0

SMART Error Log Version: 1
ATA Error Count: 12 (device log contains only the most recent five errors)
	CR = Command Register [HEX]
	FR = Features Register [HEX]
	SC = Sector Count Register [HEX]
	SN = Sector Number Register [HEX]
	CL = Cylinder Low Register [HEX]
	CH = Cylinder High Register [HEX]
	DH = Device/Head Register [HEX]
	DC = Device Command Register [HEX]
	ER = Error register [HEX]
	ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 12 occurred at disk power-on lifetime: 472 hours (19 days + 16 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 58 8b 3d 07 e0  Error: 

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  00 00 c8 00 00 58 00 00      00:03:35.550  NOP [Abort queued commands]
  00 00 00 00 00 00 00 00      00:03:35.550  NOP [Abort queued commands]
  00 00 ef 00 00 45 00 00      00:03:35.550  NOP [Abort queued commands]
  00 00 27 00 00 00 00 00      00:03:35.550  NOP [Abort queued commands]
  00 00 07 00 00 89 3d 00      00:03:35.550  NOP [Abort queued commands]

Error 11 occurred at disk power-on lifetime: 472 hours (19 days + 16 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 58 8b 3d 07 e0  Error: 

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  00 00 c8 00 00 58 00 00      00:03:33.500  NOP [Abort queued commands]
  00 00 00 00 00 00 00 00      00:03:33.500  NOP [Abort queued commands]
  00 00 ef 00 00 45 00 00      00:03:33.500  NOP [Abort queued commands]
  00 00 00 00 00 00 00 00      00:03:33.500  NOP [Abort queued commands]
  00 00 c8 00 00 58 00 00      00:03:33.500  NOP [Abort queued commands]

Error 10 occurred at disk power-on lifetime: 472 hours (19 days + 16 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 58 8b 3d 07 e0  Error: 

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  00 00 c8 00 00 58 00 00      00:03:31.400  NOP [Abort queued commands]
  00 00 ec 00 00 00 00 00      00:03:31.400  NOP [Abort queued commands]
  00 00 00 00 00 00 00 00      00:03:31.400  NOP [Abort queued commands]
  00 00 ec 00 00 00 00 00      00:03:31.400  NOP [Abort queued commands]
  00 00 00 00 00 00 00 00      00:03:31.400  NOP [Abort queued commands]

Error 9 occurred at disk power-on lifetime: 472 hours (19 days + 16 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 58 8b 3d 07 e0  Error: 

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  00 00 c8 00 00 58 00 00      00:03:29.350  NOP [Abort queued commands]
  00 00 00 00 00 00 00 00      00:03:29.350  NOP [Abort queued commands]
  00 00 ef 00 00 45 00 00      00:03:29.350  NOP [Abort queued commands]
  00 00 00 00 00 00 00 00      00:03:29.350  NOP [Abort queued commands]
  00 00 c8 00 00 58 00 00      00:03:29.350  NOP [Abort queued commands]

Error 8 occurred at disk power-on lifetime: 472 hours (19 days + 16 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 58 8b 3d 07 e0  Error: 

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  00 00 c8 00 00 58 00 00      00:03:27.150  NOP [Abort queued commands]
  00 00 00 00 00 00 00 00      00:03:27.150  NOP [Abort queued commands]
  00 00 ef 00 00 45 00 00      00:03:27.150  NOP [Abort queued commands]
  00 00 00 00 00 00 00 00      00:03:27.150  NOP [Abort queued commands]
  00 00 c8 00 00 58 00 00      00:03:27.150  NOP [Abort queued commands]

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%       460         -
# 2  Extended offline    Completed without error       00%       452         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
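
For anyone reading the error log in the attachment: smartctl left the
"Error:" text blank for errors 8 through 12, so the register bytes have to
be decoded by hand. Below is a minimal sketch of that decoding, assuming the
classic ATA Status/Error register bit layout and 28-bit LBA addressing. The
helper names (decode_bits, lba28) are made up for illustration and are not
part of smartctl.

```python
# Bit names for the ATA Status and Error registers (classic ATA layout;
# several of these bits are marked obsolete in later spec revisions).
STATUS_BITS = {0x80: "BSY", 0x40: "DRDY", 0x20: "DF", 0x10: "DSC",
               0x08: "DRQ", 0x04: "CORR", 0x02: "IDX", 0x01: "ERR"}
ERROR_BITS = {0x80: "ICRC", 0x40: "UNC", 0x20: "MC", 0x10: "IDNF",
              0x08: "MCR", 0x04: "ABRT", 0x02: "TK0NF", 0x01: "AMNF"}


def decode_bits(value, names):
    """Return the names of all bits set in value, highest bit first."""
    return [name for mask, name in sorted(names.items(), reverse=True)
            if value & mask]


def lba28(sn, cl, ch, dh):
    """Combine SN, CL, CH and the low nibble of DH into a 28-bit LBA."""
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


# Register line shared by errors 8 through 12: ER ST SC SN CL CH DH
er, st, sc, sn, cl, ch, dh = (int(x, 16)
                              for x in "40 51 58 8b 3d 07 e0".split())

print(decode_bits(er, ERROR_BITS))    # ['UNC']
print(decode_bits(st, STATUS_BITS))   # ['DRDY', 'DSC', 'ERR']
print(hex(lba28(sn, cl, ch, dh)))     # 0x73d8b
```

If the layout assumption holds, all five logged errors read as an
uncorrectable-data (UNC) error at the same LBA, 0x73d8b.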

