* Re: 2.6.16-rc1-mm2
[not found] ` <200601211014.44041.edt@aei.ca>
@ 2006-01-21 16:39 ` Ed Tomlinson
2006-01-21 18:45 ` 2.6.16-rc1-mm2 Barry K. Nathan
0 siblings, 1 reply; 8+ messages in thread
From: Ed Tomlinson @ 2006-01-21 16:39 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel, reiserfs-dev, jgarzik, linux-scsi
On Saturday 21 January 2006 10:14, Ed Tomlinson wrote:
> Hi,
>
> >From my perspective 2.6.16-rc1-mm2 still needs work. I did not try 15-mm1 or mm2. Both
> mm3 and mm4 had X problems in that the system would lock but the keyboard was still
> active for Sysrq. The lockups took days to occur on both mm3 and mm4. The reiser3 problem
> made it impossible to test rc1-mm1, rc2-mm2 locked hard sometime in the first 4 hours of
> use - this time sysrq was dead too.
>
> The system is a amd64 using x86_64 from the unofficial debian build. The box is stable using
> 15-rc5-mm3 which has had uptimes of over two weeks.
>
> If anyone has ideas on what to backout let me know. Failing that I will boot with a serial console
> active and see that it reports.
>
> Ideas,
> Ed Tomlinson
Serial console shows that its an I/O error triggering a reiserfs4 kernel panic
[ 559.544404] end_request: I/O error, dev sda, sector 19856555
[ 559.554791] reiser4 panicked cowardly: reiser4[wget(6000)]: commit_current_atom (fs/reiser4/txnmgr.c:1130)[zam-597]:
[ 559.554794] write log failed (-5)
[ 559.554795]
[ 559.582807] Kernel panic - not syncing: reiser4[wget(6000)]: commit_current_atom (fs/reiser4/txnmgr.c:1130)[zam-597]:
[ 559.582809] write log failed (-5)
Have some new errors started to be passed back thru the scsi / libata stack? I have had no problems
with 15-rc5-mm3 though it may just be masking an issue...
some more info on this:
lspci -vvv from 2.6.15-rc5-mm3
0000:00:0a.0 IDE interface: nVidia Corporation CK8S Serial ATA Controller (v2.5) (rev a2) (prog-if 85 [Master SecO PriO])
Subsystem: Micro-Star International Co., Ltd.: Unknown device 0300
Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 0 (750ns min, 250ns max)
Interrupt: pin A routed to IRQ 17
Region 0: I/O ports at 09f0 [size=8]
Region 1: I/O ports at 0bf0 [size=4]
Region 2: I/O ports at 0970 [size=8]
Region 3: I/O ports at 0b70 [size=4]
Region 4: I/O ports at e000 [size=16]
Region 5: I/O ports at e400 [size=128]
Capabilities: [44] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
Extract from the boot of 2.6.16-rc1-mm2
Jan 21 10:50:46 grover kernel: [ 26.457745] SCSI subsystem initialized
Jan 21 10:50:46 grover kernel: [ 26.495088] libata version 1.20 loaded.
Jan 21 10:50:46 grover kernel: [ 26.498691] sata_nv 0000:00:09.0: version 0.8
Jan 21 10:50:46 grover kernel: [ 26.499255] ACPI: PCI Interrupt Link [APSI] enabled at IRQ 23
Jan 21 10:50:46 grover kernel: [ 26.518542] ACPI: PCI Interrupt 0000:00:09.0[A] -> Link [APSI] -> GSI 23 (level, high) -> IRQ 16
Jan 21 10:50:46 grover kernel: [ 26.547990] PCI: Setting latency timer of device 0000:00:09.0 to 64
Jan 21 10:50:46 grover kernel: [ 26.548042] ata1: SATA max UDMA/133 cmd 0x9E0 ctl 0xBE2 bmdma 0xC800 irq 16
Jan 21 10:50:46 grover kernel: [ 26.570996] ata2: SATA max UDMA/133 cmd 0x960 ctl 0xB62 bmdma 0xC808 irq 16
Jan 21 10:50:46 grover kernel: [ 26.669114] usb 2-4.1: new low speed USB device using ohci_hcd and address3
Jan 21 10:50:46 grover kernel: [ 26.819069] usb 2-4.1: configuration #1 chosen from 1 choice
Jan 21 10:50:46 grover kernel: [ 26.843314] ata1: SATA link up 1.5 Gbps (SStatus 113)
Jan 21 10:50:46 grover kernel: [ 27.017383] nv_sata: Primary device added
Jan 21 10:50:46 grover kernel: [ 27.030590] nv_sata: Primary device removed
Jan 21 10:50:46 grover kernel: [ 27.044365] nv_sata: Secondary device removed
Jan 21 10:50:46 grover kernel: [ 27.058744] ata1: dev 0 cfg 49:2f00 82:7c6b 83:7f09 84:4673 85:7c69 86:3e2187:4663 88:407f
Jan 21 10:50:46 grover kernel: [ 27.058749] ata1: dev 0 ATA-7, max UDMA/133, 490234752 sectors: LBA48
Jan 21 10:50:46 grover kernel: [ 27.101015] usb 2-4.2: new full speed USB device using ohci_hcd and address 4
Jan 21 10:50:46 grover kernel: [ 27.125640] ata1: dev 0 configured for UDMA/133
Jan 21 10:50:46 grover kernel: [ 27.151867] scsi0 : sata_nv
Jan 21 10:50:46 grover kernel: [ 27.239985] usb 2-4.2: not running at top speed; connect to a high speed hub
Jan 21 10:50:46 grover kernel: [ 27.276984] usb 2-4.2: configuration #1 chosen from 1 choice
Jan 21 10:50:46 grover kernel: [ 27.298013] hub 2-4.2:1.0: USB hub found
Jan 21 10:50:46 grover kernel: [ 27.311966] hub 2-4.2:1.0: 4 ports detected
Jan 21 10:50:46 grover kernel: [ 27.395432] ata2: SATA link down (SStatus 0)
Jan 21 10:50:46 grover kernel: [ 27.409957] scsi1 : sata_nv
Jan 21 10:50:46 grover kernel: [ 27.423656] Vendor: ATA Model: Maxtor 6L250S0 Rev: BACE
Jan 21 10:50:46 grover kernel: [ 27.445172] Type: Direct-Access ANSI SCSI revision: 05
Jan 21 10:50:46 grover kernel: [ 27.469990] ACPI: PCI Interrupt Link [APSJ] enabled at IRQ 22
Jan 21 10:50:46 grover kernel: [ 27.488973] ACPI: PCI Interrupt 0000:00:0a.0[A] -> Link [APSJ] -> GSI 22 (level, high) -> IRQ 17
Jan 21 10:50:46 grover kernel: [ 27.518040] PCI: Setting latency timer of device 0000:00:0a.0 to 64
Jan 21 10:50:46 grover kernel: [ 27.518079] ata3: SATA max UDMA/133 cmd 0x9F0 ctl 0xBF2 bmdma 0xE000 irq 17
Jan 21 10:50:46 grover kernel: [ 27.541022] ata4: SATA max UDMA/133 cmd 0x970 ctl 0xB72 bmdma 0xE008 irq 17
Jan 21 10:50:46 grover kernel: [ 27.768102] ata3: SATA link down (SStatus 0)
Jan 21 10:50:46 grover kernel: [ 27.782177] scsi2 : sata_nv
Jan 21 10:50:46 grover kernel: [ 27.793873] usb 2-4.2.1: new full speed USB device using ohci_hcd and address 5
Jan 21 10:50:46 grover kernel: [ 27.968838] usb 2-4.2.1: configuration #1 chosen from 1 choice
Jan 21 10:50:46 grover kernel: [ 28.007301] ata4: SATA link down (SStatus 0)
Jan 21 10:50:46 grover kernel: [ 28.021888] scsi3 : sata_nv
Jan 21 10:50:46 grover kernel: [ 28.031774] ACPI: PCI Interrupt Link [APC4] enabled at IRQ 19
Jan 21 10:50:46 grover kernel: [ 28.050702] GSI 20 sharing vector 0xD1 and IRQ 20
Jan 21 10:50:46 grover kernel: [ 28.066570] ACPI: PCI Interrupt Link [APCJ] enabled at IRQ 21
Jan 21 10:50:46 grover kernel: [ 28.085505] ACPI: PCI Interrupt 0000:00:06.0[A] -> <6>ACPI: PCI Interrupt 0
000:02:0c.0[A] -> Link [APC4] -> Link [APCJ] -> GSI 21 (level, high) -> IRQ 18
Jan 21 10:50:46 grover kernel: [ 28.132680] GSI 19 (level, low) -> IRQ 20
Jan 21 10:50:46 grover kernel: [ 28.145889] PCI: Via IRQ fixup for 0000:02:0c.0, from 10 to 4
Jan 21 10:50:46 grover kernel: [ 28.164961] PCI: Setting latency timer of device 0000:00:06.0 to 64
Jan 21 10:50:46 grover kernel: [ 28.219223] ohci1394: fw-host0: OHCI-1394 1.0 (PCI): IRQ=[20] MMIO=[ea000000-ea0007ff] Max Packet=[2048] IR/IT contexts=[4/8]
Jan 21 10:50:46 grover kernel: [ 28.579166] intel8x0_measure_ac97_clock: measured 56004 usecs
Jan 21 10:50:46 grover kernel: [ 28.598090] intel8x0: clocking to 48728
Jan 21 10:50:46 grover kernel: [ 29.555763] ieee1394: Host added: ID:BUS[0-00:1023] GUID[0010dc00006b07c2]
Jan 21 10:50:46 grover kernel: [ 32.498302] SCSI device sda: 490234752 512-byte hdwr sectors (251000 MB)
Jan 21 10:50:46 grover kernel: [ 32.583207] sda: Write Protect is off
Jan 21 10:50:46 grover kernel: [ 32.595993] sda: Mode Sense: 00 3a 00 10
Jan 21 10:50:46 grover kernel: [ 32.599979] SCSI device sda: drive cache: write back w/ FUA
Jan 21 10:50:46 grover kernel: [ 32.672741] SCSI device sda: 490234752 512-byte hdwr sectors (251000 MB)
Jan 21 10:50:46 grover kernel: [ 32.694879] sda: Write Protect is off
Jan 21 10:50:46 grover kernel: [ 32.707334] sda: Mode Sense: 00 3a 00 10
Jan 21 10:50:46 grover kernel: [ 32.707585] SCSI device sda: drive cache: write back w/ FUA
Jan 21 10:50:46 grover kernel: [ 32.726082] sda: sda1 sda2 sda3 sda4 < sda5 >
Jan 21 10:50:46 grover kernel: [ 32.873995] sd 0:0:0:0: Attached scsi disk sda
Jan 21 10:50:46 grover kernel: [ 33.541752] eth1394: eth1: IEEE-1394 IPv4 over 1394 Ethernet (fw-host0)
Jan 21 10:50:46 grover kernel: [ 34.409131] usbcore: registered new driver hiddev
Jan 21 10:50:46 grover kernel: [ 34.569628] input: Microsoft Microsoft IntelliMouse® Optical as /class/input/input1
Jan 21 10:50:46 grover kernel: [ 34.594902] input: USB HID v1.00 Mouse [Microsoft Microsoft IntelliMouse® Optical] on usb-0000:00:02.0-4.1
Jan 21 10:50:46 grover kernel: [ 34.627294] usbcore: registered new driver usbhid
Jan 21 10:50:46 grover kernel: [ 34.642789] drivers/usb/input/hid-core.c: v2.6:USB HID core driver
Jan 21 10:50:46 grover kernel: [ 34.816697] Bluetooth: Core ver 2.8
Jan 21 10:50:46 grover kernel: [ 34.828192] NET: Registered protocol family 31
Jan 21 10:50:46 grover kernel: [ 34.842990] Bluetooth: HCI device and connection manager initialized
Jan 21 10:50:46 grover kernel: [ 34.863934] Bluetooth: HCI socket layer initialized
Jan 21 10:50:46 grover kernel: [ 34.920716] Bluetooth: HCI USB driver ver 2.9
Jan 21 10:50:46 grover kernel: [ 34.941494] usbcore: registered new driver hci_usb
Jan 21 10:50:46 grover kernel: [ 36.538804] Adding 979956k swap on /dev/hda2. Priority:-1 extents:1 across:979956k
Jan 21 10:50:46 grover kernel: [ 36.564768] Adding 1020116k swap on /dev/sda2. Priority:-2 extents:1 across:1020116k
Jan 21 10:50:46 grover kernel: [ 38.361425] ieee1394: sbp2: Driver forced to serialize I/O (serialize_io=1)
Jan 21 10:50:46 grover kernel: [ 38.384356] ieee1394: sbp2: Try serialize_io=0 for better performance
Jan 21 10:50:46 grover kernel: [ 38.552288] Driver 'w83627hf' needs updating - please use bus_type methods
Jan 21 10:50:46 grover kernel: [ 38.579591] w83627hf 9191-0290: Reading VID from GPIO5
Jan 21 10:50:46 grover kernel: [ 38.703929] powernow-k8: Found 1 AMD Athlon 64 / Opteron processors (version 1.60.0)
Jan 21 10:50:46 grover kernel: [ 38.733242] powernow-k8: 0 : fid 0xa (1800 MHz), vid 0x2 (1500 mV)
Jan 21 10:50:46 grover kernel: [ 38.754467] powernow-k8: 1 : fid 0x2 (1000 MHz), vid 0x12 (1100 mV)
Jan 21 10:50:46 grover kernel: [ 38.776181] cpu_init done, current fid 0xa, vid 0x2
Jan 21 10:50:46 grover kernel: [ 38.826859] video1394: Installed video1394 module
Jan 21 10:50:46 grover kernel: [ 38.863707] mice: PS/2 mouse device common for all mice
Jan 21 10:50:46 grover kernel: [ 48.737454] kjournald starting. Commit interval 5 seconds
Jan 21 10:50:46 grover kernel: [ 48.766804] EXT3 FS on hda1, internal journal
Jan 21 10:50:46 grover kernel: [ 48.781215] EXT3-fs: mounted filesystem with ordered data mode.
Jan 21 10:50:46 grover kernel: [ 48.844247] ReiserFS: hda5: found reiserfs format "3.6" with standard journal
Jan 21 10:50:46 grover kernel: [ 56.740820] ReiserFS: hda5: using ordered data mode
Jan 21 10:50:46 grover kernel: [ 56.787279] ReiserFS: hda5: journal params: device hda5, size 8192, journal first block 18, max trans len 1024, max batch 900, max commit age 30, max trans age 30
Jan 21 10:50:46 grover kernel: [ 56.837125] ReiserFS: hda5: checking transaction log (hda5)
Jan 21 10:50:46 grover kernel: [ 56.902508] ReiserFS: hda5: Using r5 hash to sort names
Jan 21 10:50:46 grover kernel: [ 57.252649] Loading Reiser4. See www.namesys.com for a description of Reiser4
grover:/var/log# sdparm -i /dev/sda
/dev/sda: ATA Maxtor 6L250S0 BACE
Device identification VPD page:
Addressed logical unit:
id_type: vendor specific [0x0], code_set: ASCII
00 4c 69 6e 75 78 20 41 54 41 2d 53 43 53 49 20 73 Linux ATA-SCSI s
10 69 6d 75 6c 61 74 6f 72 imulator
grover:/var/log# smartctl -i -d ata /dev/sda
smartctl version 5.34 [x86_64-unknown-linux-gnu] Copyright (C) 2002-5 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF INFORMATION SECTION ===
Device Model: Maxtor 6L250S0
Serial Number: L50QDF3H
Firmware Version: BACE1G10
User Capacity: 251,000,193,024 bytes
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: 7
ATA Standard is: ATA/ATAPI-7 T13 1532D revision 0
Local Time is: Sat Jan 21 11:27:21 2006 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
grover:/var/log# smartctl -H -d ata /dev/sda
smartctl version 5.34 [x86_64-unknown-linux-gnu] Copyright (C) 2002-5 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
---
Hope this helps and that I found the correct places to copy the info.
Ed Tomlinson
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: 2.6.16-rc1-mm2
2006-01-21 16:39 ` 2.6.16-rc1-mm2 Ed Tomlinson
@ 2006-01-21 18:45 ` Barry K. Nathan
2006-01-21 21:36 ` 2.6.16-rc1-mm2 Ed Tomlinson
0 siblings, 1 reply; 8+ messages in thread
From: Barry K. Nathan @ 2006-01-21 18:45 UTC (permalink / raw)
To: Ed Tomlinson
Cc: Andrew Morton, linux-kernel, reiserfs-dev, jgarzik, linux-scsi
On 1/21/06, Ed Tomlinson <edt@aei.ca> wrote:
> grover:/var/log# smartctl -i -d ata /dev/sda
[snip]
> grover:/var/log# smartctl -H -d ata /dev/sda
> smartctl version 5.34 [x86_64-unknown-linux-gnu] Copyright (C) 2002-5 Bruce Allen
> Home page is http://smartmontools.sourceforge.net/
>
> === START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED
>
> ---
>
> Hope this helps and that I found the correct places to copy the info.
How about:
smartctl -a -d ata /dev/sda
or, if that produces too much output, then at least the following two:
smartctl -A -d ata /dev/sda
smartctl -l error -d ata /dev/sda
That way we might be able to figure out whether the disk
coincidentally started going bad after you updated the kernel.
--
-Barry K. Nathan <barryn@pobox.com>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: 2.6.16-rc1-mm2
2006-01-21 18:45 ` 2.6.16-rc1-mm2 Barry K. Nathan
@ 2006-01-21 21:36 ` Ed Tomlinson
2006-01-21 23:57 ` 2.6.16-rc1-mm2 Barry K. Nathan
2006-01-23 12:39 ` 2.6.16-rc1-mm2 Ed Tomlinson
0 siblings, 2 replies; 8+ messages in thread
From: Ed Tomlinson @ 2006-01-21 21:36 UTC (permalink / raw)
To: Barry K. Nathan
Cc: Andrew Morton, linux-kernel, reiserfs-dev, jgarzik, linux-scsi
On Saturday 21 January 2006 13:45, Barry K. Nathan wrote:
> On 1/21/06, Ed Tomlinson <edt@aei.ca> wrote:
> > grover:/var/log# smartctl -i -d ata /dev/sda
> [snip]
> > grover:/var/log# smartctl -H -d ata /dev/sda
> > smartctl version 5.34 [x86_64-unknown-linux-gnu] Copyright (C) 2002-5 Bruce Allen
> > Home page is http://smartmontools.sourceforge.net/
> >
> > === START OF READ SMART DATA SECTION ===
> > SMART overall-health self-assessment test result: PASSED
> >
> > ---
> >
> > Hope this helps and that I found the correct places to copy the info.
>
> How about:
> smartctl -a -d ata /dev/sdagrover:/poola/home/ed# smartctl -a -d ata /dev/sda
smartctl version 5.34 [x86_64-unknown-linux-gnu] Copyright (C) 2002-5 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF INFORMATION SECTION ===
Device Model: Maxtor 6L250S0
Serial Number: L50QDF3H
Firmware Version: BACE1G10
User Capacity: 251,000,193,024 bytes
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: 7
ATA Standard is: ATA/ATAPI-7 T13 1532D revision 0
Local Time is: Sat Jan 21 16:34:26 2006 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x80) Offline data collection activity
was never started.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: (1922) seconds.
Offline data collection
capabilities: (0x5b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 99) minutes.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
3 Spin_Up_Time 0x0027 252 252 063 Pre-fail Always - 571
4 Start_Stop_Count 0x0032 253 253 000 Old_age Always - 2
5 Reallocated_Sector_Ct 0x0033 253 253 063 Pre-fail Always - 0
6 Read_Channel_Margin 0x0001 253 253 100 Pre-fail Offline - 0
7 Seek_Error_Rate 0x000a 253 252 000 Old_age Always - 0
8 Seek_Time_Performance 0x0027 250 240 187 Pre-fail Always - 49844
9 Power_On_Hours 0x0032 251 251 000 Old_age Always - 49644
10 Spin_Retry_Count 0x002b 252 252 157 Pre-fail Always - 0
11 Calibration_Retry_Count 0x002b 252 252 223 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 253 253 000 Old_age Always - 4
192 Power-Off_Retract_Count 0x0032 253 253 000 Old_age Always - 0
193 Load_Cycle_Count 0x0032 253 253 000 Old_age Always - 0
194 Temperature_Celsius 0x0032 028 253 000 Old_age Always - 29
195 Hardware_ECC_Recovered 0x000a 253 252 000 Old_age Always - 8656
196 Reallocated_Event_Count 0x0008 253 253 000 Old_age Offline - 0
197 Current_Pending_Sector 0x0008 253 253 000 Old_age Offline - 0
198 Offline_Uncorrectable 0x0008 253 253 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0008 199 199 000 Old_age Offline - 0
200 Multi_Zone_Error_Rate 0x000a 253 252 000 Old_age Always - 0
201 Soft_Read_Error_Rate 0x000a 253 252 000 Old_age Always - 1
202 TA_Increase_Count 0x000a 253 252 000 Old_age Always - 0
203 Run_Out_Cancel 0x000b 253 252 180 Pre-fail Always - 0
204 Shock_Count_Write_Opern 0x000a 253 252 000 Old_age Always - 0
205 Shock_Rate_Write_Opern 0x000a 253 252 000 Old_age Always - 0
207 Spin_High_Current 0x002a 252 252 000 Old_age Always - 0
208 Spin_Buzz 0x002a 252 252 000 Old_age Always - 0
209 Offline_Seek_Performnce 0x0024 242 242 000 Old_age Offline - 143
210 Unknown_Attribute 0x0032 253 252 000 Old_age Always - 0
211 Unknown_Attribute 0x0032 253 252 000 Old_age Always - 0
212 Unknown_Attribute 0x0032 253 252 000 Old_age Always - 0
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
> or, if that produces too much output, then at least the following two:
> smartctl -A -d ata /dev/sda
> smartctl -l error -d ata /dev/sda
grover:/poola/home/ed# smartctl -l error -d ata /dev/sda
smartctl version 5.34 [x86_64-unknown-linux-gnu] Copyright (C) 2002-5 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF READ SMART DATA SECTION ===
SMART Error Log Version: 1
No Errors Logged
> That way we might be able to figure out whether the disk
> coincidentally started going bad after you updated the kernel.
I suspect the newer kernel (or kernels) since when I revert to 15-rc5-mm3 all is well.
Thanks,
Ed Tomlinson
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: 2.6.16-rc1-mm2
2006-01-21 21:36 ` 2.6.16-rc1-mm2 Ed Tomlinson
@ 2006-01-21 23:57 ` Barry K. Nathan
2006-01-23 12:39 ` 2.6.16-rc1-mm2 Ed Tomlinson
1 sibling, 0 replies; 8+ messages in thread
From: Barry K. Nathan @ 2006-01-21 23:57 UTC (permalink / raw)
To: Ed Tomlinson
Cc: Andrew Morton, linux-kernel, reiserfs-dev, jgarzik, linux-scsi
On 1/21/06, Ed Tomlinson <edt@aei.ca> wrote:
[snip]
> smartctl version 5.34 [x86_64-unknown-linux-gnu] Copyright (C) 2002-5 Bruce Allen
> Home page is http://smartmontools.sourceforge.net/
[snip]
> Error logging capability: (0x01) Error logging supported.
> General Purpose Logging supported.
[snip]
>
> SMART Error Log Version: 1
> No Errors Logged
[snip]
> I suspect the newer kernel (or kernels) since when I revert to 15-rc5-mm3 all is well.
That's what it looks like to me, too. Weird...
--
-Barry K. Nathan <barryn@pobox.com>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: 2.6.16-rc1-mm2
2006-01-21 21:36 ` 2.6.16-rc1-mm2 Ed Tomlinson
2006-01-21 23:57 ` 2.6.16-rc1-mm2 Barry K. Nathan
@ 2006-01-23 12:39 ` Ed Tomlinson
2006-01-27 14:53 ` 2.6.16-rc1-mm2 Jeff Garzik
1 sibling, 1 reply; 8+ messages in thread
From: Ed Tomlinson @ 2006-01-23 12:39 UTC (permalink / raw)
To: Barry K. Nathan
Cc: Andrew Morton, linux-kernel, reiserfs-dev, jgarzik, linux-scsi
Summarizing all this. There are two problems here.
1. reserifs4 panics when it gets io errors - I remember this was an issue that
needed to be fixed in the R4 code before it moves to mainline...
2. Why does a drive which is fine with 2.6.15-rc5-mm3, return a -5 with 2.6.16-mm3
and above? Smart reports no problems with the drive hardware. What has changed
in the libata/scsi stacks?
Thanks,
Ed Tomlinson
On Saturday 21 January 2006 16:36, Ed Tomlinson wrote:
> On Saturday 21 January 2006 13:45, Barry K. Nathan wrote:
> > On 1/21/06, Ed Tomlinson <edt@aei.ca> wrote:
> > > grover:/var/log# smartctl -i -d ata /dev/sda
> > [snip]
> > > grover:/var/log# smartctl -H -d ata /dev/sda
> > > smartctl version 5.34 [x86_64-unknown-linux-gnu] Copyright (C) 2002-5 Bruce Allen
> > > Home page is http://smartmontools.sourceforge.net/
> > >
> > > === START OF READ SMART DATA SECTION ===
> > > SMART overall-health self-assessment test result: PASSED
> > >
> > > ---
> > >
> > > Hope this helps and that I found the correct places to copy the info.
> >
> > How about:
> > smartctl -a -d ata /dev/sdagrover:/poola/home/ed# smartctl -a -d ata /dev/sda
> smartctl version 5.34 [x86_64-unknown-linux-gnu] Copyright (C) 2002-5 Bruce Allen
> Home page is http://smartmontools.sourceforge.net/
>
> === START OF INFORMATION SECTION ===
> Device Model: Maxtor 6L250S0
> Serial Number: L50QDF3H
> Firmware Version: BACE1G10
> User Capacity: 251,000,193,024 bytes
> Device is: Not in smartctl database [for details use: -P showall]
> ATA Version is: 7
> ATA Standard is: ATA/ATAPI-7 T13 1532D revision 0
> Local Time is: Sat Jan 21 16:34:26 2006 EST
> SMART support is: Available - device has SMART capability.
> SMART support is: Enabled
>
> === START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED
>
> General SMART Values:
> Offline data collection status: (0x80) Offline data collection activity
> was never started.
> Auto Offline Data Collection: Enabled.
> Self-test execution status: ( 0) The previous self-test routine completed
> without error or no self-test has ever
> been run.
> Total time to complete Offline
> data collection: (1922) seconds.
> Offline data collection
> capabilities: (0x5b) SMART execute Offline immediate.
> Auto Offline data collection on/off support.
> Suspend Offline collection upon new
> command.
> Offline surface scan supported.
> Self-test supported.
> No Conveyance Self-test supported.
> Selective Self-test supported.
> SMART capabilities: (0x0003) Saves SMART data before entering
> power-saving mode.
> Supports SMART auto save timer.
> Error logging capability: (0x01) Error logging supported.
> General Purpose Logging supported.
> Short self-test routine
> recommended polling time: ( 2) minutes.
> Extended self-test routine
> recommended polling time: ( 99) minutes.
>
> SMART Attributes Data Structure revision number: 16
> Vendor Specific SMART Attributes with Thresholds:
> ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
> 3 Spin_Up_Time 0x0027 252 252 063 Pre-fail Always - 571
> 4 Start_Stop_Count 0x0032 253 253 000 Old_age Always - 2
> 5 Reallocated_Sector_Ct 0x0033 253 253 063 Pre-fail Always - 0
> 6 Read_Channel_Margin 0x0001 253 253 100 Pre-fail Offline - 0
> 7 Seek_Error_Rate 0x000a 253 252 000 Old_age Always - 0
> 8 Seek_Time_Performance 0x0027 250 240 187 Pre-fail Always - 49844
> 9 Power_On_Hours 0x0032 251 251 000 Old_age Always - 49644
> 10 Spin_Retry_Count 0x002b 252 252 157 Pre-fail Always - 0
> 11 Calibration_Retry_Count 0x002b 252 252 223 Pre-fail Always - 0
> 12 Power_Cycle_Count 0x0032 253 253 000 Old_age Always - 4
> 192 Power-Off_Retract_Count 0x0032 253 253 000 Old_age Always - 0
> 193 Load_Cycle_Count 0x0032 253 253 000 Old_age Always - 0
> 194 Temperature_Celsius 0x0032 028 253 000 Old_age Always - 29
> 195 Hardware_ECC_Recovered 0x000a 253 252 000 Old_age Always - 8656
> 196 Reallocated_Event_Count 0x0008 253 253 000 Old_age Offline - 0
> 197 Current_Pending_Sector 0x0008 253 253 000 Old_age Offline - 0
> 198 Offline_Uncorrectable 0x0008 253 253 000 Old_age Offline - 0
> 199 UDMA_CRC_Error_Count 0x0008 199 199 000 Old_age Offline - 0
> 200 Multi_Zone_Error_Rate 0x000a 253 252 000 Old_age Always - 0
> 201 Soft_Read_Error_Rate 0x000a 253 252 000 Old_age Always - 1
> 202 TA_Increase_Count 0x000a 253 252 000 Old_age Always - 0
> 203 Run_Out_Cancel 0x000b 253 252 180 Pre-fail Always - 0
> 204 Shock_Count_Write_Opern 0x000a 253 252 000 Old_age Always - 0
> 205 Shock_Rate_Write_Opern 0x000a 253 252 000 Old_age Always - 0
> 207 Spin_High_Current 0x002a 252 252 000 Old_age Always - 0
> 208 Spin_Buzz 0x002a 252 252 000 Old_age Always - 0
> 209 Offline_Seek_Performnce 0x0024 242 242 000 Old_age Offline - 143
> 210 Unknown_Attribute 0x0032 253 252 000 Old_age Always - 0
> 211 Unknown_Attribute 0x0032 253 252 000 Old_age Always - 0
> 212 Unknown_Attribute 0x0032 253 252 000 Old_age Always - 0
>
> SMART Error Log Version: 1
> No Errors Logged
>
> SMART Self-test log structure revision number 1
> No self-tests have been logged. [To run self-tests, use: smartctl -t]
>
>
> SMART Selective self-test log data structure revision number 1
> SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
> 1 0 0 Not_testing
> 2 0 0 Not_testing
> 3 0 0 Not_testing
> 4 0 0 Not_testing
> 5 0 0 Not_testing
> Selective self-test flags (0x0):
> After scanning selected spans, do NOT read-scan remainder of disk.
> If Selective self-test is pending on power-up, resume after 0 minute delay.
>
>
> > or, if that produces too much output, then at least the following two:
> > smartctl -A -d ata /dev/sda
> > smartctl -l error -d ata /dev/sda
> grover:/poola/home/ed# smartctl -l error -d ata /dev/sda
> smartctl version 5.34 [x86_64-unknown-linux-gnu] Copyright (C) 2002-5 Bruce Allen
> Home page is http://smartmontools.sourceforge.net/
>
> === START OF READ SMART DATA SECTION ===
> SMART Error Log Version: 1
> No Errors Logged
>
>
> > That way we might be able to figure out whether the disk
> > coincidentally started going bad after you updated the kernel.
>
> I suspect the newer kernel (or kernels) since when I revert to 15-rc5-mm3 all is well.
>
> Thanks,
> Ed Tomlinson
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: 2.6.16-rc1-mm2
2006-01-23 12:39 ` 2.6.16-rc1-mm2 Ed Tomlinson
@ 2006-01-27 14:53 ` Jeff Garzik
[not found] ` <200601280846.23279.edt@aei.ca>
0 siblings, 1 reply; 8+ messages in thread
From: Jeff Garzik @ 2006-01-27 14:53 UTC (permalink / raw)
To: Ed Tomlinson
Cc: Barry K. Nathan, Andrew Morton, linux-kernel, reiserfs-dev,
linux-scsi
Ed Tomlinson wrote:
> Summarizing all this. There are two problems here.
>
> 1. reserifs4 panics when it gets io errors - I remember this was an issue that
> needed to be fixed in the R4 code before it moves to mainline...
>
> 2. Why does a drive which is fine with 2.6.15-rc5-mm3, return a -5 with 2.6.16-mm3
> and above? Smart reports no problems with the drive hardware. What has changed
> in the libata/scsi stacks?
That's a long answer. Could you assist in narrowing down the versions
which are affected?
It would also be useful if you could try vanilla kernels, and help us
discover whether problems surfaces in 2.6.15, 2.6.15-git[1234],
2.6.16-rc1, etc.
Jeff
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: 2.6.16-rc1-mm2 (mm5 too) panics
[not found] ` <200601280846.23279.edt@aei.ca>
@ 2006-02-04 23:43 ` Ed Tomlinson
2006-02-24 1:57 ` 2.6.16-rc4-mm1 & " Ed Tomlinson
0 siblings, 1 reply; 8+ messages in thread
From: Ed Tomlinson @ 2006-02-04 23:43 UTC (permalink / raw)
To: Andrew Morton, linux-kernel, reiserfs-dev, linux-scsi,
Jeff Garzik
Hi,
I need some help figuring this one out. I cannot use git bisect and applying the 2.6.15-1 reiserfs4
patch does not work with newer 2.6.16 levels - looks like the mutex conversion hits it hard. Looking
in mm there are about 50 reiser4 patches...
To reproduce this problem I need reiser4 though the issue maybe in libata or scsi. Does anyone
have an idea how I can proceed to find what makes newer kernels get io errors which trigger a reiser4
panic? The panic is a zam-597 (fs/reiser4/txnmgr.c) with an rc=-5
Anyone have a git tree tracking linus with reiser4 patches applied staring before 2.6.15 - if so git
bisect could be used to find the change causing the problem.
One datapoint. In 2.6.15 + reiser4 2.6.15-1 works fine - the resiser4 filesystem is heavily used with
no errors. Smart report no problems with the drive being used. The libata driver used is sata_nv.
Please reply to my email - I am only subscribed to lkml.
Help!
Ed Tomlinson
ret = reiser4_write_logs(nr_submitted);
if (ret < 0)
reiser4_panic("zam-597", "write log failed (%ld)\n", ret);
On Saturday 28 January 2006 08:46, Ed Tomlinson wrote:
> On Friday 27 January 2006 09:53, you wrote:
> > Ed Tomlinson wrote:
> > > Summarizing all this. There are two problems here.
> > >
> > > 1. reserifs4 panics when it gets io errors - I remember this was an issue that
> > > needed to be fixed in the R4 code before it moves to mainline...
> > >
> > > 2. Why does a drive which is fine with 2.6.15-rc5-mm3, return a -5 with 2.6.16-mm3
> > > and above? Smart reports no problems with the drive hardware. What has changed
> > > in the libata/scsi stacks?
> >
> > That's a long answer. Could you assist in narrowing down the versions
> > which are affected?
> >
> > It would also be useful if you could try vanilla kernels, and help us
> > discover whether problems surfaces in 2.6.15, 2.6.15-git[1234],
> > 2.6.16-rc1, etc.
>
> Jeff,
>
> I'll see what I can do with git bisect. Given that reserifs4 is in the picture this may be
> fun... I expect it will be a slow process (kernels take 40min to build here).
>
> Thanks
> Ed Tomlinson
>
>
^ permalink raw reply [flat|nested] 8+ messages in thread
* 2.6.16-rc4-mm1 & 2.6.16-rc1-mm2 (mm5 too) panics
2006-02-04 23:43 ` 2.6.16-rc1-mm2 (mm5 too) panics Ed Tomlinson
@ 2006-02-24 1:57 ` Ed Tomlinson
0 siblings, 0 replies; 8+ messages in thread
From: Ed Tomlinson @ 2006-02-24 1:57 UTC (permalink / raw)
To: linux-kernel
Cc: Andrew Morton, reiserfs-dev, linux-scsi, Jeff Garzik, Hans Reiser
Hi,
2.6.16-rc4-mm1 panics with a zam-597 (fs/reiser4/txnmgr.c) with an rc=-5
This has been happening since 2.6.16-rc1-mm2. It makes testing with reiserfs4 impossible here.
I have not been able to figure out how to use bisect etc to find out what is triggering this. I had
hoped that reversing "jbd: split checkpoint lists" as per the fix for ocfs2 might help (its in rc4-mm1)
but no such luck.
Does anyone have a suggestion on how I can help find what is broken here?
TIA,
Ed Tomlinson
On Saturday 04 February 2006 18:43, Ed Tomlinson wrote:
> I need some help figuring this one out. I cannot use git bisect and applying the 2.6.15-1 reiserfs4
> patch does not work with newer 2.6.16 levels - looks like the mutex conversion hits it hard. Looking
> in mm there are about 50 reiser4 patches...
>
> To reproduce this problem I need reiser4 though the issue maybe in libata or scsi. Does anyone
> have an idea how I can proceed to find what makes newer kernels get io errors which trigger a reiser4
> panic? The panic is a zam-597 (fs/reiser4/txnmgr.c) with an rc=-5
>
> Anyone have a git tree tracking linus with reiser4 patches applied staring before 2.6.15 - if so git
> bisect could be used to find the change causing the problem.
>
> One datapoint. In 2.6.15 + reiser4 2.6.15-1 works fine - the resiser4 filesystem is heavily used with
> no errors. Smart report no problems with the drive being used. The libata driver used is sata_nv.
>
> Please reply to my email - I am only subscribed to lkml.
>
> Help!
> Ed Tomlinson
>
> ret = reiser4_write_logs(nr_submitted);
> if (ret < 0)
> reiser4_panic("zam-597", "write log failed (%ld)\n", ret);
>
> On Saturday 28 January 2006 08:46, Ed Tomlinson wrote:
> > On Friday 27 January 2006 09:53, you wrote:
> > > Ed Tomlinson wrote:
> > > > Summarizing all this. There are two problems here.
> > > >
> > > > 1. reserifs4 panics when it gets io errors - I remember this was an issue that
> > > > needed to be fixed in the R4 code before it moves to mainline...
> > > >
> > > > 2. Why does a drive which is fine with 2.6.15-rc5-mm3, return a -5 with 2.6.16-mm3
> > > > and above? Smart reports no problems with the drive hardware. What has changed
> > > > in the libata/scsi stacks?
> > >
> > > That's a long answer. Could you assist in narrowing down the versions
> > > which are affected?
> > >
> > > It would also be useful if you could try vanilla kernels, and help us
> > > discover whether problems surfaces in 2.6.15, 2.6.15-git[1234],
> > > 2.6.16-rc1, etc.
> >
> > Jeff,
> >
> > I'll see what I can do with git bisect. Given that reserifs4 is in the picture this may be
> > fun... I expect it will be a slow process (kernels take 40min to build here).
> >
> > Thanks
> > Ed Tomlinson
> >
> >
>
>
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2006-02-24 1:58 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20060120031555.7b6d65b7.akpm@osdl.org>
[not found] ` <43D170CB.8080802@reub.net>
[not found] ` <200601211014.44041.edt@aei.ca>
2006-01-21 16:39 ` 2.6.16-rc1-mm2 Ed Tomlinson
2006-01-21 18:45 ` 2.6.16-rc1-mm2 Barry K. Nathan
2006-01-21 21:36 ` 2.6.16-rc1-mm2 Ed Tomlinson
2006-01-21 23:57 ` 2.6.16-rc1-mm2 Barry K. Nathan
2006-01-23 12:39 ` 2.6.16-rc1-mm2 Ed Tomlinson
2006-01-27 14:53 ` 2.6.16-rc1-mm2 Jeff Garzik
[not found] ` <200601280846.23279.edt@aei.ca>
2006-02-04 23:43 ` 2.6.16-rc1-mm2 (mm5 too) panics Ed Tomlinson
2006-02-24 1:57 ` 2.6.16-rc4-mm1 & " Ed Tomlinson
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).