* [2.6.18,19] SATA boot problems (ICH6/ICH6W) @ 2006-12-11 18:03 Kovid Goyal 2006-12-20 0:44 ` Tejun Heo 2007-01-23 21:49 ` danieljzhang 0 siblings, 2 replies; 29+ messages in thread From: Kovid Goyal @ 2006-12-11 18:03 UTC (permalink / raw) To: linux-ide Hi, I'm having problems booting from a SATA disk with 2.6.19. Grub loads fine, but when the kernel boots, it *sometimes* ends up with VFS: Cannot open root device "sda5" or unknown-block(0,0). This issue appears to be identical with the one reported in http://lkml.org/lkml/2006/10/19/327 except that 2.6.19 did not fix it for me. In fact, it made it worse. The timeouts reported in that thread are still present and with 2.6.19 the fraction of unsuccessful boots has increased from 30% to 80%. I often have to reboot my machine 3 times in a row before it succeeds. I have tried both vanilla 2.6.19 and gentoo-sources-2.6.19-r1 To summarize: My machine stalls for nearly a minute on all boots and the root filesystem fails to mount on 80% of the boots. This problem first manifested itself when I switched to 2.6.18 and has become worse with 2.6.19. The controller: 00:1f.2 IDE interface: Intel Corporation 82801FB/FW (ICH6/ICH6W) SATA Controller (rev 03) I don't know how to get the logs from an unsuccessful boot. Here is an extract from the logs of a successful boot (in which the timeout is present): Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2 ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx Probing IDE interface ide0... hda: PHILIPS DVD+/-RW DVD8631, ATAPI CD/DVD-ROM drive input: ImExPS/2 Logitech MX Mouse as /class/input/input2 Probing IDE interface ide1... ide0 at 0x1f0-0x1f7,0x3f6 on irq 14 hda: ATAPI 40X DVD-ROM DVD-R CD-R/RW drive, 2048kB Cache Uniform CD-ROM driver Revision: 3.20 ata_piix 0000:00:1f.1: version 2.00ac6 ACPI: PCI Interrupt 0000:00:1f.1[A] -> GSI 16 (level, low) -> IRQ 16 ata: 0x1F0 IDE port busy ata: conflict with ide0 PCI: Setting latency timer of device 0000:00:1f.1 to 64 ata1: DUMMY ata2: PATA max UDMA/100 cmd 0x170 ctl 0x376 bmdma 0xFFA8 irq 15 scsi0 : ata_piix scsi1 : ata_piix ata2: port disabled. ignoring. ATA: abnormal status 0xFF on port 0x177 ata_piix 0000:00:1f.2: MAP [ P0 P2 P1 P3 ] ACPI: PCI Interrupt 0000:00:1f.2[C] -> GSI 20 (level, low) -> IRQ 17 PCI: Setting latency timer of device 0000:00:1f.2 to 64 ata3: SATA max UDMA/133 cmd 0xFE00 ctl 0xFE12 bmdma 0xFEA0 irq 17 ata4: SATA max UDMA/133 cmd 0xFE20 ctl 0xFE32 bmdma 0xFEA8 irq 17 scsi2 : ata_piix ata3.00: ATA-6, max UDMA/133, 156250000 sectors: LBA48 ata3.00: ata3: dev 0 multi count 8 ata3.01: ATA-6, max UDMA/133, 156301488 sectors: LBA48 ata3.01: ata3: dev 1 multi count 8 ata3.00: configured for UDMA/133 ata3.01: configured for UDMA/133 scsi3 : ata_piix ata4: port is slow to respond, please be patient (Status 0xff) ata4: port failed to respond (30 secs, Status 0xff) ata4: SRST failed (status 0xFF) ata4: SRST failed (err_mask=0x100) ata4: softreset failed, retrying in 5 secs ata4: SRST failed (status 0xFF) ata4: SRST failed (err_mask=0x100) ata4: softreset failed, retrying in 5 secs ata4: SRST failed (status 0xFF) ata4: SRST failed (err_mask=0x100) ata4: reset failed, giving up scsi 2:0:0:0: Direct-Access ATA ST380013AS 8.05 PQ: 0 ANSI: 5 SCSI device sda: 156250000 512-byte hdwr sectors (80000 MB) sda: Write Protect is off sda: Mode Sense: 00 3a 00 00 SCSI device sda: drive cache: write back SCSI device sda: 156250000 512-byte hdwr sectors (80000 MB) sda: Write Protect is off sda: Mode Sense: 00 3a 00 00 SCSI device sda: drive cache: write back sda: sda1 sda2 sda3 sda4 < sda5 sda6 sda7 > sd 2:0:0:0: Attached scsi disk sda scsi 2:0:1:0: Direct-Access ATA ST380013AS 3.18 PQ: 0 ANSI: 5 SCSI device sdb: 156301488 512-byte hdwr sectors (80026 MB) sdb: Write Protect is off sdb: Mode Sense: 00 3a 00 00 SCSI device sdb: drive cache: write back SCSI device sdb: 156301488 512-byte hdwr sectors (80026 MB) sdb: Write Protect is off sdb: Mode Sense: 00 3a 00 00 SCSI device sdb: drive cache: write back sdb: sdb1 sdb2 sdb3 sd 2:0:1:0: Attached scsi disk sdb The relevant sections from my kernel configuration: # CONFIG_BLK_DEV_IDE_SATA is not set # CONFIG_SATA_AHCI is not set # CONFIG_SATA_SVW is not set # CONFIG_SATA_MV is not set # CONFIG_SATA_NV is not set # CONFIG_SATA_QSTOR is not set # CONFIG_SATA_PROMISE is not set # CONFIG_SATA_SX4 is not set # CONFIG_SATA_SIL is not set # CONFIG_SATA_SIL24 is not set # CONFIG_SATA_SIS is not set # CONFIG_SATA_ULI is not set # CONFIG_SATA_VIA is not set # CONFIG_SATA_VITESSE is not set CONFIG_SATA_INTEL_COMBINED=y CONFIG_IDE=y CONFIG_BLK_DEV_IDE=y # CONFIG_BLK_DEV_IDE_SATA is not set # CONFIG_BLK_DEV_HD_IDE is not set CONFIG_BLK_DEV_IDEDISK=y # CONFIG_IDEDISK_MULTI_MODE is not set CONFIG_BLK_DEV_IDECD=y # CONFIG_BLK_DEV_IDETAPE is not set # CONFIG_BLK_DEV_IDEFLOPPY is not set # CONFIG_BLK_DEV_IDESCSI is not set # CONFIG_IDE_TASK_IOCTL is not set CONFIG_IDE_GENERIC=y # CONFIG_BLK_DEV_IDEPNP is not set # CONFIG_BLK_DEV_IDEPCI is not set # CONFIG_IDE_ARM is not set # CONFIG_IDE_CHIPSETS is not set # CONFIG_BLK_DEV_IDEDMA is not set # CONFIG_IDEDMA_AUTO is not set # CONFIG_CD_NO_IDESCSI is not set Thanks, Kovid. -- _____________________________________ Kovid Goyal MC 452-48 California Institute of Technology 1200 E California Blvd Pasadena, CA 91125 home : +01 626 390 8699 office: +01 626 395 6595 (449 Lauritsen) email : kovid@theory.caltech.edu web : http://www.kovidgoyal.net _____________________________________ ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [2.6.18,19] SATA boot problems (ICH6/ICH6W) 2006-12-11 18:03 [2.6.18,19] SATA boot problems (ICH6/ICH6W) Kovid Goyal @ 2006-12-20 0:44 ` Tejun Heo 2006-12-20 2:00 ` Kovid Goyal 2006-12-20 3:29 ` Gary Hade 2007-01-23 21:49 ` danieljzhang 1 sibling, 2 replies; 29+ messages in thread From: Tejun Heo @ 2006-12-20 0:44 UTC (permalink / raw) To: Kovid Goyal; +Cc: linux-ide Kovid Goyal wrote: > Hi, > > I'm having problems booting from a SATA disk with 2.6.19. > Grub loads fine, but when the kernel boots, it *sometimes* ends up with > VFS: Cannot open root device "sda5" or unknown-block(0,0). > > This issue appears to be identical with the one reported in > http://lkml.org/lkml/2006/10/19/327 except that 2.6.19 did not fix it for me. > In fact, it made it worse. The timeouts reported in that thread are still > present and with 2.6.19 the fraction of unsuccessful boots has increased from > 30% to 80%. I often have to reboot my machine 3 times in a row before it > succeeds. I have tried both vanilla 2.6.19 and gentoo-sources-2.6.19-r1 > > To summarize: My machine stalls for nearly a minute on all boots and the root > filesystem fails to mount on 80% of the boots. This problem first manifested > itself when I switched to 2.6.18 and has become worse with 2.6.19. > > The controller: > 00:1f.2 IDE interface: Intel Corporation 82801FB/FW (ICH6/ICH6W) SATA > Controller (rev 03) > > I don't know how to get the logs from an unsuccessful boot. Here is > an extract from the logs of a successful boot (in which the timeout is > present): Does goving back to 2.6.17 fix the problem? Also, please report the result of 'smartctl -d ata -a /dev/sdX' where sdX is the problematic drive. -- tejun ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [2.6.18,19] SATA boot problems (ICH6/ICH6W) 2006-12-20 0:44 ` Tejun Heo @ 2006-12-20 2:00 ` Kovid Goyal 2006-12-20 2:13 ` Tejun Heo 2006-12-20 3:29 ` Gary Hade 1 sibling, 1 reply; 29+ messages in thread From: Kovid Goyal @ 2006-12-20 2:00 UTC (permalink / raw) To: Tejun Heo, linux-ide Yes going back to 2.6.17 does fix it. Also windows and various LiveCDs boot without problems. Here is the output smartctl -d ata -a /dev/sda smartctl version 5.36 [i686-pc-linux-gnu] Copyright (C) 2002-6 Bruce Allen Home page is http://smartmontools.sourceforge.net/ === START OF INFORMATION SECTION === Model Family: Seagate Barracuda 7200.7 and 7200.7 Plus family Device Model: ST380013AS Serial Number: 5JVFSQQW Firmware Version: 8.05 User Capacity: 80,000,000,000 bytes Device is: In smartctl database [for details use: -P show] ATA Version is: 6 ATA Standard is: ATA/ATAPI-6 T13 1410D revision 2 Local Time is: Tue Dec 19 17:56:07 2006 PST SMART support is: Available - device has SMART capability. SMART support is: Enabled === START OF READ SMART DATA SECTION === SMART overall-health self-assessment test result: PASSED General SMART Values: Offline data collection status: (0x82) Offline data collection activity was completed without error. Auto Offline Data Collection: Enabled. Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run. Total time to complete Offline data collection: ( 430) seconds. Offline data collection capabilities: (0x5b) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. Offline surface scan supported. Self-test supported. No Conveyance Self-test supported. Selective Self-test supported. SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer. Error logging capability: (0x01) Error logging supported. No General Purpose Logging support. Short self-test routine recommended polling time: ( 1) minutes. Extended self-test routine recommended polling time: ( 58) minutes. SMART Attributes Data Structure revision number: 10 Vendor Specific SMART Attributes with Thresholds: ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE 1 Raw_Read_Error_Rate 0x000f 060 057 006 Pre-fail Always - 38101080 3 Spin_Up_Time 0x0003 098 098 000 Pre-fail Always - 0 4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 8 5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 0 7 Seek_Error_Rate 0x000f 086 060 030 Pre-fail Always - 424904923 9 Power_On_Hours 0x0032 087 087 000 Old_age Always - 11741 10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0 12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 311 194 Temperature_Celsius 0x0022 060 072 000 Old_age Always - 60 195 Hardware_ECC_Recovered 0x001a 060 057 000 Old_age Always - 38101080 197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0 198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0 199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0 200 Multi_Zone_Error_Rate 0x0000 100 253 000 Old_age Offline - 0 202 TA_Increase_Count 0x0032 100 253 000 Old_age Always - 0 SMART Error Log Version: 1 No Errors Logged SMART Self-test log structure revision number 1 Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error # 1 Short offline Completed without error 00% 11726 - # 2 Short offline Completed without error 00% 11702 - # 3 Short offline Completed without error 00% 11679 - # 4 Extended offline Completed without error 00% 11658 - # 5 Short offline Completed without error 00% 11655 - # 6 Short offline Completed without error 00% 11632 - # 7 Short offline Completed without error 00% 11608 - # 8 Short offline Completed without error 00% 11585 - # 9 Short offline Completed without error 00% 11562 - #10 Short offline Completed without error 00% 11538 - #11 Extended offline Completed without error 00% 11500 - #12 Short offline Completed without error 00% 11498 - #13 Short offline Completed without error 00% 11474 - #14 Short offline Completed without error 00% 11452 - #15 Short offline Completed without error 00% 11429 - #16 Short offline Completed without error 00% 11405 - #17 Short offline Completed without error 00% 11381 - #18 Short offline Completed without error 00% 11358 - #19 Extended offline Completed without error 00% 11338 - #20 Short offline Completed without error 00% 11334 - #21 Short offline Completed without error 00% 11311 - SMART Selective self-test log data structure revision number 1 SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS 1 0 0 Not_testing 2 0 0 Not_testing 3 0 0 Not_testing 4 0 0 Not_testing 5 0 0 Not_testing Selective self-test flags (0x0): After scanning selected spans, do NOT read-scan remainder of disk. If Selective self-test is pending on power-up, resume after 0 minute delay. Thanks, Kovid. On Tuesday 19 December 2006 16:44, you wrote: > Does goving back to 2.6.17 fix the problem? Also, please report the > result of 'smartctl -d ata -a /dev/sdX' where sdX is the problematic drive. -- _____________________________________ Kovid Goyal MC 452-48 California Institute of Technology 1200 E California Blvd Pasadena, CA 91125 home : +01 626 390 8699 office: +01 626 395 6595 (449 Lauritsen) email : kovid@theory.caltech.edu web : http://www.kovidgoyal.net _____________________________________ ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [2.6.18,19] SATA boot problems (ICH6/ICH6W) 2006-12-20 2:00 ` Kovid Goyal @ 2006-12-20 2:13 ` Tejun Heo 2006-12-20 4:56 ` Kovid Goyal 2007-01-11 23:32 ` Kovid Goyal 0 siblings, 2 replies; 29+ messages in thread From: Tejun Heo @ 2006-12-20 2:13 UTC (permalink / raw) To: Kovid Goyal; +Cc: linux-ide Kovid Goyal wrote: > Yes going back to 2.6.17 does fix it. Also windows and various LiveCDs boot > without problems. > > Here is the output > smartctl -d ata -a /dev/sda > smartctl version 5.36 [i686-pc-linux-gnu] Copyright (C) 2002-6 Bruce Allen > Home page is http://smartmontools.sourceforge.net/ Thanks, I couldn't find anything indicating hardware problem from the smartctl output. Ergh... The 0xff timing out is fixed in the devel tree. The problem is why sda sometimes doesn't get detected. Can you post the result of failing dmesg? If you've got a second computer, doing netconsole (Documentation/networking/netconsole.txt) is the best way. If not, just take a picture of the screen while ata1 detection is in progress. -- tejun ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [2.6.18,19] SATA boot problems (ICH6/ICH6W) 2006-12-20 2:13 ` Tejun Heo @ 2006-12-20 4:56 ` Kovid Goyal 2007-01-11 23:32 ` Kovid Goyal 1 sibling, 0 replies; 29+ messages in thread From: Kovid Goyal @ 2006-12-20 4:56 UTC (permalink / raw) To: Tejun Heo, linux-ide Unfortunately I only have ssh access to the box for the next two weeks, but after I return I will send the netconsole logs. Thanks for looking into this. Kovid. On Tuesday 19 December 2006 18:13, you wrote: > Kovid Goyal wrote: > > Yes going back to 2.6.17 does fix it. Also windows and various LiveCDs > > boot without problems. > > > > Here is the output > > smartctl -d ata -a /dev/sda > > smartctl version 5.36 [i686-pc-linux-gnu] Copyright (C) 2002-6 Bruce > > Allen Home page is http://smartmontools.sourceforge.net/ > > Thanks, I couldn't find anything indicating hardware problem from the > smartctl output. Ergh... The 0xff timing out is fixed in the devel > tree. The problem is why sda sometimes doesn't get detected. Can you > post the result of failing dmesg? If you've got a second computer, > doing netconsole (Documentation/networking/netconsole.txt) is the best > way. If not, just take a picture of the screen while ata1 detection is > in progress. -- _____________________________________ Kovid Goyal MC 452-48 California Institute of Technology 1200 E California Blvd Pasadena, CA 91125 home : +01 626 390 8699 office: +01 626 395 6595 (449 Lauritsen) email : kovid@theory.caltech.edu web : http://www.kovidgoyal.net _____________________________________ ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [2.6.18,19] SATA boot problems (ICH6/ICH6W) 2006-12-20 2:13 ` Tejun Heo 2006-12-20 4:56 ` Kovid Goyal @ 2007-01-11 23:32 ` Kovid Goyal 2007-01-13 2:19 ` Tejun Heo 1 sibling, 1 reply; 29+ messages in thread From: Kovid Goyal @ 2007-01-11 23:32 UTC (permalink / raw) To: linux-ide, Tejun Heo [-- Attachment #1: Type: text/plain, Size: 1261 bytes --] Hi, I was finally able to get the netconsole logs for a failing boot. Attached as log-panic. Sorry for the delay. Thanks, Kovid. On Tuesday 19 December 2006 18:13, you wrote: > Kovid Goyal wrote: > > Yes going back to 2.6.17 does fix it. Also windows and various LiveCDs > > boot without problems. > > > > Here is the output > > smartctl -d ata -a /dev/sda > > smartctl version 5.36 [i686-pc-linux-gnu] Copyright (C) 2002-6 Bruce > > Allen Home page is http://smartmontools.sourceforge.net/ > > Thanks, I couldn't find anything indicating hardware problem from the > smartctl output. Ergh... The 0xff timing out is fixed in the devel > tree. The problem is why sda sometimes doesn't get detected. Can you > post the result of failing dmesg? If you've got a second computer, > doing netconsole (Documentation/networking/netconsole.txt) is the best > way. If not, just take a picture of the screen while ata1 detection is > in progress. -- _____________________________________ Kovid Goyal MC 452-48 California Institute of Technology 1200 E California Blvd Pasadena, CA 91125 home : +01 626 390 8699 office: +01 626 395 6595 (449 Lauritsen) email : kovid@theory.caltech.edu web : http://www.kovidgoyal.net _____________________________________ [-- Attachment #2: log-panic --] [-- Type: text/plain, Size: 12888 bytes --] Linux version 2.6.19-gentoo-r3 (root@giskard) (gcc version 4.1.1 (Gentoo 4.1.1-r3)) #1 SMP Sat Jan 6 09:35:46 PST 2007 BIOS-provided physical RAM map: BIOS-e820: 0000000000000000 - 00000000000a0000 (usable) BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved) BIOS-e820: 0000000000100000 - 000000003fe88c00 (usable) BIOS-e820: 000000003fe88c00 - 000000003fe8ac00 (ACPI NVS) BIOS-e820: 000000003fe8ac00 - 000000003fe8cc00 (ACPI data) BIOS-e820: 000000003fe8cc00 - 0000000040000000 (reserved) BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved) BIOS-e820: 00000000fec00000 - 00000000fed00400 (reserved) BIOS-e820: 00000000fed20000 - 00000000feda0000 (reserved) BIOS-e820: 00000000fee00000 - 00000000fef00000 (reserved) BIOS-e820: 00000000ffb00000 - 0000000100000000 (reserved) 126MB HIGHMEM available. 896MB LOWMEM available. found SMP MP-table at 000fe710 Zone PFN ranges: DMA 0 -> 4096 Normal 4096 -> 229376 HighMem 229376 -> 261768 early_node_map[1] active PFN ranges 0: 0 -> 261768 DMI 2.3 present. ACPI: PM-Timer IO Port: 0x808 ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled) Processor #0 15:3 APIC version 20 ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled) Processor #1 15:3 APIC version 20 ACPI: LAPIC (acpi_id[0x03] lapic_id[0x06] disabled) ACPI: LAPIC (acpi_id[0x04] lapic_id[0x02] disabled) ACPI: LAPIC (acpi_id[0x05] lapic_id[0x04] disabled) ACPI: LAPIC (acpi_id[0x06] lapic_id[0x07] disabled) ACPI: LAPIC (acpi_id[0x07] lapic_id[0x03] disabled) ACPI: LAPIC (acpi_id[0x08] lapic_id[0x05] disabled) ACPI: LAPIC_NMI (acpi_id[0xff] high level lint[0x1]) ACPI: IOAPIC (id[0x08] address[0xfec00000] gsi_base[0]) IOAPIC[0]: apic_id 8, version 32, address 0xfec00000, GSI 0-23 ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl) ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level) Enabling APIC mode: Flat. Using 1 I/O APICs ACPI: HPET id: 0x8086a201 base: 0xfed00000 Using ACPI (MADT) for SMP configuration information Allocating PCI resources starting at 50000000 (gap: 40000000:a0000000) Detected 3192.174 MHz processor. Built 1 zonelists. Total pages: 259723 Kernel command line: root=/dev/sda5 console=tty1 netconsole=4444@192.168.1.100/eth0,6666@192.168.1.103/00:09:5B:0E:58:EE netconsole: local port 4444 netconsole: local IP 192.168.1.100 netconsole: interface eth0 netconsole: remote port 6666 netconsole: remote IP 192.168.1.103 netconsole: remote ethernet address 00:09:5b:0e:58:ee Enabling fast FPU save and restore... done. Enabling unmasked SIMD FPU exception support... done. Initializing CPU#0 PID hash table entries: 4096 (order: 12, 16384 bytes) Console: colour VGA+ 80x25 Dentry cache hash table entries: 131072 (order: 7, 524288 bytes) Inode-cache hash table entries: 65536 (order: 6, 262144 bytes) Memory: 1033260k/1047072k available (2672k kernel code, 13256k reserved, 872k data, 228k init, 129568k highmem) virtual kernel memory layout: fixmap : 0xfff4f000 - 0xfffff000 ( 704 kB) pkmap : 0xff800000 - 0xffc00000 (4096 kB) vmalloc : 0xf8800000 - 0xff7fe000 ( 111 MB) lowmem : 0xc0000000 - 0xf8000000 ( 896 MB) .init : 0xc047f000 - 0xc04b8000 ( 228 kB) .data : 0xc039c162 - 0xc047618c ( 872 kB) .text : 0xc0100000 - 0xc039c162 (2672 kB) Checking if this processor honours the WP bit even in supervisor mode... Ok. hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0 hpet0: 3 64-bit timers, 14318180 Hz Using HPET for base-timer Calibrating delay using timer specific routine.. 6388.08 BogoMIPS (lpj=3194042) Mount-cache hash table entries: 512 monitor/mwait feature present. using mwait in idle threads. CPU: Trace cache: 12K uops, L1 D cache: 16K CPU: L2 cache: 1024K CPU: Physical Processor ID: 0 Intel machine check architecture supported. Intel machine check reporting enabled on CPU#0. CPU0: Intel P4/Xeon Extended MCE MSRs (12) available CPU0: Thermal monitoring enabled Checking 'hlt' instruction... OK. Freeing SMP alternatives: 20k freed ACPI: Core revision 20060707 CPU0: Intel(R) Pentium(R) 4 CPU 3.20GHz stepping 04 Booting processor 1/1 eip 2000 Initializing CPU#1 Calibrating delay using timer specific routine.. 6383.99 BogoMIPS (lpj=3191999) monitor/mwait feature present. CPU: Trace cache: 12K uops, L1 D cache: 16K CPU: L2 cache: 1024K CPU: Physical Processor ID: 0 Intel machine check architecture supported. Intel machine check reporting enabled on CPU#1. CPU1: Intel P4/Xeon Extended MCE MSRs (12) available CPU1: Thermal monitoring enabled CPU1: Intel(R) Pentium(R) 4 CPU 3.20GHz stepping 04 Total of 2 processors activated (12772.08 BogoMIPS). ENABLING IO-APIC IRQs ..TIMER: vector=0x31 apic1=0 pin1=2 apic2=-1 pin2=-1 checking TSC synchronization across 2 CPUs: passed. Brought up 2 CPUs migration_cost=296 NET: Registered protocol family 16 ACPI: bus type pci registered PCI: Using MMCONFIG Setting up standard PCI resources ACPI: Interpreter enabled ACPI: Using IOAPIC for interrupt routing ACPI: PCI Root Bridge [PCI0] (0000:00) ACPI: Assume root bridge [\_SB_.PCI0] bus is 0 PCI quirk: region 0800-087f claimed by ICH6 ACPI/GPIO/TCO PCI quirk: region 0880-08bf claimed by ICH6 GPIO PCI: Ignoring BAR0-3 of IDE controller 0000:00:1f.1 PCI: Firmware left 0000:03:08.0 e100 interrupts enabled, disabling PCI: Transparent bridge - 0000:00:1e.0 ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 5 6 7 9 10 *11 12 15) ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 *5 6 7 9 10 11 12 15) ACPI: PCI Interrupt Link [LNKC] (IRQs 3 4 *5 6 7 9 10 11 12 15) ACPI: PCI Interrupt Link [LNKD] (IRQs 3 4 5 6 7 9 10 11 12 15) *0, disabled. ACPI: PCI Interrupt Link [LNKE] (IRQs 3 4 5 6 7 *9 10 11 12 15) ACPI: PCI Interrupt Link [LNKF] (IRQs 3 4 5 6 7 9 *10 11 12 15) ACPI: PCI Interrupt Link [LNKG] (IRQs 3 4 5 6 7 *9 10 11 12 15) ACPI: PCI Interrupt Link [LNKH] (IRQs *3 4 5 6 7 9 10 11 12 15) Linux Plug and Play Support v0.97 (c) Adam Belay pnp: PnP ACPI init pnp: PnP ACPI: found 13 devices SCSI subsystem initialized usbcore: registered new interface driver usbfs usbcore: registered new interface driver hub usbcore: registered new device driver usb PCI: Using ACPI for IRQ routing PCI: If a device doesn't work, try "pci=routeirq". If it helps, post a report pnp: 00:01: ioport range 0x800-0x85f could not be reserved pnp: 00:01: ioport range 0xc00-0xc7f has been reserved pnp: 00:01: ioport range 0x860-0x8ff could not be reserved pnp: 00:0b: ioport range 0x100-0x1fe has been reserved pnp: 00:0b: ioport range 0x200-0x277 has been reserved pnp: 00:0b: ioport range 0x280-0x2e7 has been reserved pnp: 00:0b: ioport range 0x2e8-0x2ef has been reserved pnp: 00:0b: ioport range 0x2f0-0x2f7 has been reserved pnp: 00:0b: ioport range 0x2f8-0x2ff has been reserved pnp: 00:0b: ioport range 0x300-0x377 has been reserved pnp: 00:0b: ioport range 0x380-0x3bb has been reserved PCI: Bridge: 0000:00:01.0 IO window: d000-dfff MEM window: dfe00000-dfffffff PREFETCH window: c0000000-cfffffff PCI: Bridge: 0000:00:1c.0 IO window: disabled. MEM window: dfd00000-dfdfffff PREFETCH window: disabled. PCI: Bridge: 0000:00:1e.0 IO window: c000-cfff MEM window: dfc00000-dfcfffff PREFETCH window: disabled. ACPI: PCI Interrupt 0000:00:01.0[A] -> GSI 16 (level, low) -> IRQ 16 ACPI: PCI Interrupt 0000:00:1c.0[A] -> GSI 16 (level, low) -> IRQ 16 NET: Registered protocol family 2 IP route cache hash table entries: 32768 (order: 5, 131072 bytes) hda: PHILIPS DVD+/-RW DVD8631, ATAPI CD/DVD-ROM drive ide0 at 0x1f0-0x1f7,0x3f6 on irq 14 hda: ATAPI 40X DVD-ROM DVD-R CD-R/RW drive, 2048kB Cache Uniform CD-ROM driver Revision: 3.20 ACPI: PCI Interrupt 0000:00:1f.1[A] -> GSI 16 (level, low) -> IRQ 16 ata: 0x1F0 IDE port busy ata: conflict with ide0 ata1: DUMMY ata2: PATA max UDMA/100 cmd 0x170 ctl 0x376 bmdma 0xFFA8 irq 15 scsi0 : ata_piix scsi1 : ata_piix ata2: port disabled. ignoring. ata_piix 0000:00:1f.2: MAP [ P0 P2 P1 P3 ] ACPI: PCI Interrupt 0000:00:1f.2[C] -> GSI 20 (level, low) -> IRQ 17 ata3: SATA max UDMA/133 cmd 0xFE00 ctl 0xFE12 bmdma 0xFEA0 irq 17 ata4: SATA max UDMA/133 cmd 0xFE20 ctl 0xFE32 bmdma 0xFEA8 irq 17 scsi2 : ata_piix scsi3 : ata_piix ACPI: PCI Interrupt 0000:00:1d.7[A] -> GSI 21 (level, low) -> IRQ 18 ehci_hcd 0000:00:1d.7: EHCI Host Controller ehci_hcd 0000:00:1d.7: new USB bus registered, assigned bus number 1 ehci_hcd 0000:00:1d.7: debug port 1 ehci_hcd 0000:00:1d.7: irq 18, io mem 0xffa80800 ehci_hcd 0000:00:1d.7: USB 2.0 started, EHCI 1.00, driver 10 Dec 2004 usb usb1: configuration #1 chosen from 1 choice hub 1-0:1.0: USB hub found hub 1-0:1.0: 8 ports detected USB Universal Host Controller Interface driver v3.0 ACPI: PCI Interrupt 0000:00:1d.0[A] -> GSI 21 (level, low) -> IRQ 18 uhci_hcd 0000:00:1d.0: UHCI Host Controller uhci_hcd 0000:00:1d.0: new USB bus registered, assigned bus number 2 uhci_hcd 0000:00:1d.0: irq 18, io base 0x0000ff80 usb usb2: configuration #1 chosen from 1 choice hub 2-0:1.0: USB hub found hub 2-0:1.0: 2 ports detected ACPI: PCI Interrupt 0000:00:1d.1[B] -> GSI 22 (level, low) -> IRQ 19 uhci_hcd 0000:00:1d.1: UHCI Host Controller uhci_hcd 0000:00:1d.1: new USB bus registered, assigned bus number 3 uhci_hcd 0000:00:1d.1: irq 19, io base 0x0000ff60 usb usb3: configuration #1 chosen from 1 choice hub 3-0:1.0: USB hub found hub 3-0:1.0: 2 ports detected ACPI: PCI Interrupt 0000:00:1d.2[C] -> GSI 18 (level, low) -> IRQ 20 uhci_hcd 0000:00:1d.2: UHCI Host Controller uhci_hcd 0000:00:1d.2: new USB bus registered, assigned bus number 4 uhci_hcd 0000:00:1d.2: irq 20, io base 0x0000ff40 usb usb4: configuration #1 chosen from 1 choice hub 4-0:1.0: USB hub found hub 4-0:1.0: 2 ports detected ACPI: PCI Interrupt 0000:00:1d.3[D] -> GSI 23 (level, low) -> IRQ 21 uhci_hcd 0000:00:1d.3: UHCI Host Controller uhci_hcd 0000:00:1d.3: new USB bus registered, assigned bus number 5 uhci_hcd 0000:00:1d.3: irq 21, io base 0x0000ff20 usb usb5: configuration #1 chosen from 1 choice hub 5-0:1.0: USB hub found hub 5-0:1.0: 2 ports detected usb 2-1: new full speed USB device using uhci_hcd and address 2 usb 2-1: configuration #1 chosen from 1 choice usb 2-2: new full speed USB device using uhci_hcd and address 3 usb 2-2: configuration #1 chosen from 1 choice usb 3-1: new full speed USB device using uhci_hcd and address 2 usb 3-1: configuration #1 chosen from 1 choice usb 3-2: new low speed USB device using uhci_hcd and address 3 usb 3-2: configuration #1 chosen from 1 choice usb 4-1: new low speed USB device using uhci_hcd and address 2 usb 4-1: configuration #1 chosen from 1 choice usb 4-2: new full speed USB device using uhci_hcd and address 3 usb 4-2: configuration #1 chosen from 1 choice usb 5-1: new full speed USB device using uhci_hcd and address 2 usb 5-1: configuration #1 chosen from 2 choices drivers/usb/class/usblp.c: usblp0: USB Bidirectional printer dev 3 if 1 alt 0 proto 2 vid 0x03F0 pid 0x3A11 usbcore: registered new interface driver usblp drivers/usb/class/usblp.c: v0.13: USB Printer Device Class driver Initializing USB Mass Storage driver... usbcore: registered new interface driver usb-storage USB Mass Storage support registered. input: Microsoft Natural® Ergonomic Keyboard 4000 as /class/input/input0 input: USB HID v1.11 Keyboard [Microsoft Natural® Ergonomic Keyboard 4000] on usb-0000:00:1d.2-1 input: Microsoft Natural® Ergonomic Keyboard 4000 as /class/input/input1 input: USB HID v1.11 Device [Microsoft Natural® Ergonomic Keyboard 4000] on usb-0000:00:1d.2-1 usbcore: registered new interface driver usbhid drivers/usb/input/hid-core.c: v2.6:USB HID core driver input: Microsoft Natural® Ergonomic Keyboard 4000 as /class/input/input2 The simple driver 'usbnek4k' attach to the device 'Microsoft Natural® Ergonomic Keyboard 4000' PNP: PS/2 Controller [PNP0303:KBD,PNP0f13:MOU] at 0x60,0x64 irq 1,12 serio: i8042 KBD port at 0x60,0x64 irq 1 serio: i8042 AUX port at 0x60,0x64 irq 12 mice: PS/2 mouse device common for all mice input: PC Speaker as /class/input/input3 input: AT Translated Set 2 keyboard as /class/input/input4 TCP cubic registered Initializing XFRM netlink socket NET: Registered protocol family 1 NET: Registered protocol family 10 IPv6 over IPv4 tunneling driver NET: Registered protocol family 17 Starting balanced_irq Using IPI Shortcut mode Time: tsc clocksource has been installed. input: ImExPS/2 Logitech MX Mouse as /class/input/input5 Root-NFS: No NFS server available, giving up. VFS: Unable to mount root fs via NFS, trying floppy. VFS: Cannot open root device "sda5" or unknown-block(2,0) Please append a correct "root=" boot option Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(2,0) ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [2.6.18,19] SATA boot problems (ICH6/ICH6W) 2007-01-11 23:32 ` Kovid Goyal @ 2007-01-13 2:19 ` Tejun Heo 0 siblings, 0 replies; 29+ messages in thread From: Tejun Heo @ 2007-01-13 2:19 UTC (permalink / raw) To: Kovid Goyal; +Cc: linux-ide Kovid Goyal wrote: > ata3: SATA max UDMA/133 cmd 0xFE00 ctl 0xFE12 bmdma 0xFEA0 irq 17 > ata4: SATA max UDMA/133 cmd 0xFE20 ctl 0xFE32 bmdma 0xFEA8 irq 17 > scsi2 : ata_piix > scsi3 : ata_piix Hmmm... ata3.00 was /dev/sda1. Presence testing seems to have failed. Please give a shot at 2.6.20-rc4. We dropped whole presence detection thing and went back to polling IDENTIFY and that fixed a lot of detection bugs. I think your problem is one of those. Thanks. -- tejun ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [2.6.18,19] SATA boot problems (ICH6/ICH6W) 2006-12-20 0:44 ` Tejun Heo 2006-12-20 2:00 ` Kovid Goyal @ 2006-12-20 3:29 ` Gary Hade 2006-12-20 3:53 ` Tejun Heo 1 sibling, 1 reply; 29+ messages in thread From: Gary Hade @ 2006-12-20 3:29 UTC (permalink / raw) To: Tejun Heo; +Cc: Kovid Goyal, linux-ide, lcm On Wed, Dec 20, 2006 at 09:44:45AM +0900, Tejun Heo wrote: > Kovid Goyal wrote: > > Hi, > > > > I'm having problems booting from a SATA disk with 2.6.19. > > Grub loads fine, but when the kernel boots, it *sometimes* ends up with > > VFS: Cannot open root device "sda5" or unknown-block(0,0). > > > > This issue appears to be identical with the one reported in > > http://lkml.org/lkml/2006/10/19/327 except that 2.6.19 did not fix it for me. > > In fact, it made it worse. The timeouts reported in that thread are still > > present and with 2.6.19 the fraction of unsuccessful boots has increased from > > 30% to 80%. I often have to reboot my machine 3 times in a row before it > > succeeds. I have tried both vanilla 2.6.19 and gentoo-sources-2.6.19-r1 > > > > To summarize: My machine stalls for nearly a minute on all boots and the root > > filesystem fails to mount on 80% of the boots. This problem first manifested > > itself when I switched to 2.6.18 and has become worse with 2.6.19. > > > > The controller: > > 00:1f.2 IDE interface: Intel Corporation 82801FB/FW (ICH6/ICH6W) SATA > > Controller (rev 03) > > > > I don't know how to get the logs from an unsuccessful boot. Here is > > an extract from the logs of a successful boot (in which the timeout is > > present): > > Does goving back to 2.6.17 fix the problem? Also, please report the > result of 'smartctl -d ata -a /dev/sdX' where sdX is the problematic drive. We have been seeing what might be the same problem on an IBM Intellistation M30 Pro which contains the same SATA controller. I have been debugging and trying to better understand the problem before posting my findings and possible fix but after noticing this message today and then seeing the "SRST failed (status 0xFF)" messages included in Kovid's original message I think I should share what I have found so far. In our case the SRST failures are associated with a Quantum GoVault removable hard drive device. We have also noticed that failure is dependent on cable placement e.g. after encoutering the failure a reversal of the ports to which the hard drive and GoVault cables are connected will cause the problem to disappear. The above SRST failure message was from this code in libata-core.c. /* Before we perform post reset processing we want to see if * the bus shows 0xFF because the odd clown forgets the D7 * pulldown resistor. */ if (ata_check_status(ap) == 0xFF) { ata_port_printk(ap, KERN_ERR, "SRST failed (status 0xFF)\n"); return AC_ERR_OTHER; } I noticed that Tejun recently provided a "libata: handle 0xff status properly" patch that is now in mainline that improves this code re: http://marc.theaimsgroup.com/?l=linux-ide&m=116038642105802&w=2 but I found that the check still failed but more silently and with no retries. I decided to try increasing the delay that preceeds the above check [ msleep(150); ] and found that a change from 150ms to 1000ms caused the problem to disappear. I then replaced the msleep(150); with: { int i, ms = 5; msleep(ms); ata_port_printk(ap, KERN_INFO, "status @ %d ms: 0x%x\n", ms, ata_check_status(ap)); for (i = 1; i <= 20; i++) { ms += 50; msleep(50); ata_port_printk(ap, KERN_INFO, "status @ %d ms: 0x%x\n", ms, ata_check_status(ap)); } } Output for two cable placement configurations (0xFF check failure and 0xFF check success) are included below. Note that there are cable placement configurations for both the hard drive and GoVault where the initial status is 0xff. i.e. both transition from 0xff to 0x7f when BSY bit is cleared but it is taking MUCH longer for the GoVault (600-700ms for GoVault and <5ms for hard drive). It does not appear that the 0xff starting status is device specific. So, it appears that we have a situation with this SATA controller where a 0xFF status is not an accurate indication that there is no device. Although the 150ms to 1000ms delay increase works for the GoVault device I am not sure if it is the best long term fix for the problem. Gary -- Gary Hade IBM Linux Technology Center 503-578-4503 IBM T/L: 775-4503 garyhade@us.ibm.com http://www.ibm.com/linux/ltc ================================================ Falied 0xFF check cable placement configuration ================================================ libata version 2.00 loaded. ata_piix 0000:00:1f.2: version 2.00 ata_piix 0000:00:1f.2: MAP [ P0 P2 P1 P3 ] ACPI: PCI Interrupt 0000:00:1f.2[B] -> GSI 19 (level, low) -> IRQ 225 PCI: Setting latency timer of device 0000:00:1f.2 to 64 ata1: SATA max UDMA/133 cmd 0x20C0 ctl 0x20BA bmdma 0x2090 irq 225 ata2: SATA max UDMA/133 cmd 0x20B0 ctl 0x20A6 bmdma 0x2098 irq 225 scsi0 : ata_piix ata1: status @ 5 ms: 0xff ata1: status @ 55 ms: 0xff ata1: status @ 105 ms: 0xff ata1: status @ 155 ms: 0xff ata1: status @ 205 ms: 0xff ata1: status @ 255 ms: 0xff ata1: status @ 305 ms: 0xff ata1: status @ 355 ms: 0xff ata1: status @ 405 ms: 0xff ata1: status @ 455 ms: 0xff ata1: status @ 505 ms: 0xff ata1: status @ 555 ms: 0xff ata1: status @ 605 ms: 0xff ata1: status @ 655 ms: 0x7f ata1: status @ 705 ms: 0x7f ata1: status @ 755 ms: 0x7f ata1: status @ 805 ms: 0x7f ata1: status @ 855 ms: 0x7f ata1: status @ 905 ms: 0x7f ata1: status @ 955 ms: 0x7f ata1: status @ 1005 ms: 0x7f ATA: abnormal status 0x7F on port 0x20C7 ATA: abnormal status 0x7F on port 0x20C7 ata1.01: ATAPI, max UDMA/66 ata1.01: configured for UDMA/66 scsi1 : ata_piix ata2: status @ 5 ms: 0x50 ata2: status @ 55 ms: 0x50 ata2: status @ 105 ms: 0x50 ata2: status @ 155 ms: 0x50 ata2: status @ 205 ms: 0x50 ata2: status @ 255 ms: 0x50 ata2: status @ 305 ms: 0x50 ata2: status @ 355 ms: 0x50 ata2: status @ 405 ms: 0x50 ata2: status @ 455 ms: 0x50 ata2: status @ 505 ms: 0x50 ata2: status @ 555 ms: 0x50 ata2: status @ 605 ms: 0x50 ata2: status @ 655 ms: 0x50 ata2: status @ 705 ms: 0x50 ata2: status @ 755 ms: 0x50 ata2: status @ 805 ms: 0x50 ata2: status @ 855 ms: 0x50 ata2: status @ 905 ms: 0x50 ata2: status @ 955 ms: 0x50 ata2: status @ 1005 ms: 0x50 ata2.00: ATA-7, max UDMA/133, 156312576 sectors: LBA ata2.00: ata2: dev 0 multi count 16 ata2.00: configured for UDMA/133 Vendor: IBM Model: GoVault Rev: 008F Type: Direct-Access ANSI SCSI revision: 05 SCSI device sda: 156300464 512-byte hdwr sectors (80026 MB) sda: Write Protect is off sda: Mode Sense: 00 74 00 00 sda: cache data unavailable sda: assuming drive cache: write through SCSI device sda: 156300464 512-byte hdwr sectors (80026 MB) sda: Write Protect is off sda: Mode Sense: 00 74 00 00 sda: cache data unavailable sda: assuming drive cache: write through sda: sda1 sda2 sda3 sd 0:0:1:0: Attached scsi removable disk sda Vendor: ATA Model: Maxtor 6Y080M0 Rev: YAR5 Type: Direct-Access ANSI SCSI revision: 05 SCSI device sdb: 156312576 512-byte hdwr sectors (80032 MB) sdb: Write Protect is off sdb: Mode Sense: 00 3a 00 00 SCSI device sdb: drive cache: write back SCSI device sdb: 156312576 512-byte hdwr sectors (80032 MB) sdb: Write Protect is off sdb: Mode Sense: 00 3a 00 00 SCSI device sdb: drive cache: write back sdb: sdb1 sdb2 sdb3 sdb4 < sdb5 sdb6 sdb7 sdb8 sdb9 sdb10 sdb11 sdb12 > sd 1:0:0:0: Attached scsi disk sdb ========================================================== =================================================== Successful 0xFF check cable placement configuration =================================================== libata version 2.00 loaded. ata_piix 0000:00:1f.2: version 2.00 ata_piix 0000:00:1f.2: MAP [ P0 P2 P1 P3 ] ACPI: PCI Interrupt 0000:00:1f.2[B] -> GSI 19 (level, low) -> IRQ 225 PCI: Setting latency timer of device 0000:00:1f.2 to 64 ata1: SATA max UDMA/133 cmd 0x20C0 ctl 0x20BA bmdma 0x2090 irq 225 ata2: SATA max UDMA/133 cmd 0x20B0 ctl 0x20A6 bmdma 0x2098 irq 225 scsi0 : ata_piix ata1: status @ 5 ms: 0x7f ata1: status @ 55 ms: 0x7f ata1: status @ 105 ms: 0x7f ata1: status @ 155 ms: 0x7f ata1: status @ 205 ms: 0x7f ata1: status @ 255 ms: 0x7f ata1: status @ 305 ms: 0x7f ata1: status @ 355 ms: 0x7f ata1: status @ 405 ms: 0x7f ata1: status @ 455 ms: 0x7f ata1: status @ 505 ms: 0x7f ata1: status @ 555 ms: 0x7f ata1: status @ 605 ms: 0x7f ata1: status @ 655 ms: 0x7f ata1: status @ 705 ms: 0x7f ata1: status @ 755 ms: 0x7f ata1: status @ 805 ms: 0x7f ata1: status @ 855 ms: 0x7f ata1: status @ 905 ms: 0x7f ata1: status @ 955 ms: 0x7f ata1: status @ 1005 ms: 0x7f ATA: abnormal status 0x7F on port 0x20C7 ATA: abnormal status 0x7F on port 0x20C7 ata1.01: ATA-7, max UDMA/133, 156312576 sectors: LBA ata1.01: ata1: dev 1 multi count 16 ata1.01: configured for UDMA/133 scsi1 : ata_piix ata2: status @ 5 ms: 0xd0 ata2: status @ 55 ms: 0xd0 ata2: status @ 105 ms: 0xd0 ata2: status @ 155 ms: 0xd0 ata2: status @ 205 ms: 0xd0 ata2: status @ 255 ms: 0xd0 ata2: status @ 305 ms: 0xd0 ata2: status @ 355 ms: 0xd0 ata2: status @ 405 ms: 0xd0 ata2: status @ 455 ms: 0xd0 ata2: status @ 505 ms: 0xd0 ata2: status @ 555 ms: 0xd0 ata2: status @ 605 ms: 0xd0 ata2: status @ 655 ms: 0x0 ata2: status @ 705 ms: 0x0 ata2: status @ 755 ms: 0x0 ata2: status @ 805 ms: 0x0 ata2: status @ 855 ms: 0x0 ata2: status @ 905 ms: 0x0 ata2: status @ 955 ms: 0x0 ata2: status @ 1005 ms: 0x0 ata2.00: ATAPI, max UDMA/66 ata2.00: configured for UDMA/66 Vendor: ATA Model: Maxtor 6Y080M0 Rev: YAR5 Type: Direct-Access ANSI SCSI revision: 05 SCSI device sda: 156312576 512-byte hdwr sectors (80032 MB) sda: Write Protect is off sda: Mode Sense: 00 3a 00 00 SCSI device sda: drive cache: write back SCSI device sda: 156312576 512-byte hdwr sectors (80032 MB) sda: Write Protect is off sda: Mode Sense: 00 3a 00 00 SCSI device sda: drive cache: write back sda: sda1 sda2 sda3 sda4 < sda5 sda6 sda7 sda8 sda9 sda10 sda11 sda12 > sd 0:0:1:0: Attached scsi disk sda Vendor: IBM Model: GoVault Rev: 008F Type: Direct-Access ANSI SCSI revision: 05 SCSI device sdb: 156300464 512-byte hdwr sectors (80026 MB) sdb: Write Protect is off sdb: Mode Sense: 00 74 00 00 sdb: cache data unavailable sdb: assuming drive cache: write through SCSI device sdb: 156300464 512-byte hdwr sectors (80026 MB) sdb: Write Protect is off sdb: Mode Sense: 00 74 00 00 sdb: cache data unavailable sdb: assuming drive cache: write through sdb: sdb1 sdb2 sdb3 sd 1:0:0:0: Attached scsi removable disk sdb ========================================================== ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [2.6.18,19] SATA boot problems (ICH6/ICH6W) 2006-12-20 3:29 ` Gary Hade @ 2006-12-20 3:53 ` Tejun Heo 2006-12-20 4:30 ` Tejun Heo 2006-12-21 17:10 ` Gary Hade 0 siblings, 2 replies; 29+ messages in thread From: Tejun Heo @ 2006-12-20 3:53 UTC (permalink / raw) To: Gary Hade; +Cc: Kovid Goyal, linux-ide, lcm, Jeff Garzik Howdy, Gary Hade wrote: > I noticed that Tejun recently provided a "libata: handle 0xff status > properly" patch that is now in mainline that improves this code > re: http://marc.theaimsgroup.com/?l=linux-ide&m=116038642105802&w=2 > but I found that the check still failed but more silently and with no > retries. > > I decided to try increasing the delay that preceeds the above > check [ msleep(150); ] and found that a change from 150ms to > 1000ms caused the problem to disappear. Aieeee, 150ms not enough for the device to send the first FIS after SRST? > I then replaced the msleep(150); with: > { > int i, ms = 5; > msleep(ms); > ata_port_printk(ap, KERN_INFO, "status @ %d ms: 0x%x\n", > ms, ata_check_status(ap)); > for (i = 1; i <= 20; i++) { > ms += 50; > msleep(50); > ata_port_printk(ap, KERN_INFO, "status @ %d ms: 0x%x\n", > ms, ata_check_status(ap)); > } > } > > Output for two cable placement configurations (0xFF check failure > and 0xFF check success) are included below. Note that there are > cable placement configurations for both the hard drive and > GoVault where the initial status is 0xff. i.e. both transition > from 0xff to 0x7f when BSY bit is cleared but it is taking MUCH > longer for the GoVault (600-700ms for GoVault and <5ms for > hard drive). It does not appear that the 0xff starting status > is device specific. > > So, it appears that we have a situation with this SATA controller > where a 0xFF status is not an accurate indication that there is > no device. > > Although the 150ms to 1000ms delay increase works for the GoVault > device I am not sure if it is the best long term fix for the problem. I would be surprised if Kovid's sda not detected case is caused by this. For GoVault (that's SATAPI right?), yeah, maybe. For an ATA disk, no way (hopefully). Can you consult with quantum about it? If they verify your fix (ie, GoVault sometimes take more than 150ms to transmit the first D2H Reg FIs after SRST), I'll push similar patch upstream. Hmm.. or do we have to wait !BSY here as old IDE did? -- tejun ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [2.6.18,19] SATA boot problems (ICH6/ICH6W) 2006-12-20 3:53 ` Tejun Heo @ 2006-12-20 4:30 ` Tejun Heo 2006-12-21 17:10 ` Gary Hade 1 sibling, 0 replies; 29+ messages in thread From: Tejun Heo @ 2006-12-20 4:30 UTC (permalink / raw) Cc: Gary Hade, Kovid Goyal, linux-ide, lcm, Jeff Garzik Tejun Heo wrote: > I would be surprised if Kovid's sda not detected case is caused by this. > For GoVault (that's SATAPI right?), yeah, maybe. For an ATA disk, no > way (hopefully). > > Can you consult with quantum about it? If they verify your fix (ie, > GoVault sometimes take more than 150ms to transmit the first D2H Reg FIs > after SRST), I'll push similar patch upstream. > > Hmm.. or do we have to wait !BSY here as old IDE did? More thoughts. * IIRC, ata_piix reports 0xff if SATA link is broken. 0x80 or something similar if link established && waiting for the first FIS. Not sure whether this always holds but it tends to work that way. So, 0xff status means phy link broken, which shouldn't happen after SRST. Hmmm.. * Recently in ata_bus_softreset() and waiting functions, we've added the assumption status==0xff indicates empty port. This is taken from the IDE layer and proven to work for PATA devices. I'm not sure we can safely use this to SATA devices tho. If PHY is broken or the first FIS isn't received, arbitrary value (sometimes accidentally) chosen by the controller is used which may as well be 0xff. But, then again, many controllers do mean that there is no devices by 0xff status including ata_piix w/ its now infamous phantom device problems. I think we should honor 0xff iff SCR on/offline status is not available. How does that sound? -- tejun ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [2.6.18,19] SATA boot problems (ICH6/ICH6W) 2006-12-20 3:53 ` Tejun Heo 2006-12-20 4:30 ` Tejun Heo @ 2006-12-21 17:10 ` Gary Hade 2007-01-30 1:55 ` Gary Hade 1 sibling, 1 reply; 29+ messages in thread From: Gary Hade @ 2006-12-21 17:10 UTC (permalink / raw) To: Tejun Heo; +Cc: Gary Hade, Kovid Goyal, linux-ide, lcm, Jeff Garzik On Wed, Dec 20, 2006 at 12:53:57PM +0900, Tejun Heo wrote: > Howdy, > > Gary Hade wrote: > > I noticed that Tejun recently provided a "libata: handle 0xff status > > properly" patch that is now in mainline that improves this code > > re: http://marc.theaimsgroup.com/?l=linux-ide&m=116038642105802&w=2 > > but I found that the check still failed but more silently and with no > > retries. > > > > I decided to try increasing the delay that preceeds the above > > check [ msleep(150); ] and found that a change from 150ms to > > 1000ms caused the problem to disappear. > > Aieeee, 150ms not enough for the device to send the first FIS after SRST? Yea, it appears so. :) GoVault access via 'ahci' is also fails in some cable placement configurations with: kernel: scsi1 : ahci kernel: ata2: softreset failed (1st FIS failed) kernel: ata2: softreset failed, retrying in 5 secs kernel: ata2: port is slow to respond, please be patient kernel: ata2: port failed to respond (30 secs) kernel: ata2: COMRESET failed (device not ready) kernel: ata2: hardreset failed, retrying in 5 secs kernel: ata2: port is slow to respond, please be patient kernel: ata2: port failed to respond (30 secs) kdump: kexec: failed to load kdump kernel kernel: ata2: COMRESET failed (device not ready) kernel: ata2: reset failed, giving up This problem also disappears after reversing ports to which the hard drive and GoVault cables are connected. The following timeout increase appears to correct the 'ahci' problem: --- ./linux-2.6.18.i386/drivers/scsi/ahci.c.orig 2006-12-19 09:07:58.000000000 -0800 +++ ./linux-2.6.18.i386/drivers/scsi/ahci.c 2006-12-19 13:30:29.000000000 -0800 @@ -788,7 +788,7 @@ static int ahci_softreset(struct ata_por writel(1, port_mmio + PORT_CMD_ISSUE); - tmp = ata_wait_register(port_mmio + PORT_CMD_ISSUE, 0x1, 0x1, 1, 500); + tmp = ata_wait_register(port_mmio + PORT_CMD_ISSUE, 0x1, 0x1, 1, 2500); if (tmp & 0x1) { rc = -EIO; reason = "1st FIS failed"; 1000ms, 1500ms, 1750ms, and 1900ms didn't work. 2000ms worked so 2500ms includes some extra to be safe. This experience seems to be more representative of the 1 to 2 second time (with RDC present) mentioned by Quantum (see below) than the 'ata_piix' 600-700ms experience. > > > I then replaced the msleep(150); with: > > { > > int i, ms = 5; > > msleep(ms); > > ata_port_printk(ap, KERN_INFO, "status @ %d ms: 0x%x\n", > > ms, ata_check_status(ap)); > > for (i = 1; i <= 20; i++) { > > ms += 50; > > msleep(50); > > ata_port_printk(ap, KERN_INFO, "status @ %d ms: 0x%x\n", > > ms, ata_check_status(ap)); > > } > > } > > > > Output for two cable placement configurations (0xFF check failure > > and 0xFF check success) are included below. Note that there are > > cable placement configurations for both the hard drive and > > GoVault where the initial status is 0xff. i.e. both transition > > from 0xff to 0x7f when BSY bit is cleared but it is taking MUCH > > longer for the GoVault (600-700ms for GoVault and <5ms for > > hard drive). It does not appear that the 0xff starting status > > is device specific. > > > > So, it appears that we have a situation with this SATA controller > > where a 0xFF status is not an accurate indication that there is > > no device. > > > > Although the 150ms to 1000ms delay increase works for the GoVault > > device I am not sure if it is the best long term fix for the problem. > > I would be surprised if Kovid's sda not detected case is caused by this. > For GoVault (that's SATAPI right?), yeah, maybe. Yes, the GoVault is an ATAPI device. > For an ATA disk, no way (hopefully). Yes, probably true that Kovid got the same errors but for a different reason. > > Can you consult with quantum about it? I checked with Quantum about this and they said: --- "We confirmed that if there's an RDC present when the soft reset is received, then it can take between one and two seconds to complete the reset. Issuing a SET FEATURES command to the RDC is the longest part of it. Even without an RDC, we've measured time on the order of 170 milliseconds. " --- The RDC has been present for almost all of my testing. Here are comparison traces with and without the RDC which definitely confirms the RDC factor. It also confirms the order of 170ms without RDC time that Quantum mentioned. ======== With RDC ======== kernel: ata1: status @ 5 ms: 0xff kernel: ata1: status @ 55 ms: 0xff kernel: ata1: status @ 105 ms: 0xff kernel: ata1: status @ 155 ms: 0xff kernel: ata1: status @ 205 ms: 0xff kernel: ata1: status @ 255 ms: 0xff kernel: ata1: status @ 305 ms: 0xff kernel: ata1: status @ 355 ms: 0xff kernel: ata1: status @ 405 ms: 0xff kernel: ata1: status @ 455 ms: 0xff kernel: ata1: status @ 505 ms: 0xff kernel: ata1: status @ 555 ms: 0xff kernel: ata1: status @ 605 ms: 0xff kernel: ata1: status @ 655 ms: 0x7f kernel: ata1: status @ 705 ms: 0x7f kernel: ata1: status @ 755 ms: 0x7f kernel: ata1: status @ 805 ms: 0x7f kernel: ata1: status @ 855 ms: 0x7f kernel: ata1: status @ 905 ms: 0x7f kernel: ata1: status @ 955 ms: 0x7f kernel: ata1: status @ 1005 ms: 0x7f =========== Without RDC =========== kernel: ata1: status @ 5 ms: 0xff kernel: ata1: status @ 55 ms: 0xff kernel: ata1: status @ 105 ms: 0xff kernel: ata1: status @ 155 ms: 0xff kernel: ata1: status @ 205 ms: 0x7f kernel: ata1: status @ 255 ms: 0x7f kernel: ata1: status @ 305 ms: 0x7f kernel: ata1: status @ 355 ms: 0x7f kernel: ata1: status @ 405 ms: 0x7f kernel: ata1: status @ 455 ms: 0x7f kernel: ata1: status @ 505 ms: 0x7f kernel: ata1: status @ 555 ms: 0x7f kernel: ata1: status @ 605 ms: 0x7f kernel: ata1: status @ 655 ms: 0x7f kernel: ata1: status @ 705 ms: 0x7f kernel: ata1: status @ 755 ms: 0x7f kernel: ata1: status @ 805 ms: 0x7f kernel: ata1: status @ 855 ms: 0x7f kernel: ata1: status @ 905 ms: 0x7f kernel: ata1: status @ 955 ms: 0x7f kernel: ata1: status @ 1005 ms: 0x7f > If they verify your fix (ie, > GoVault sometimes take more than 150ms to transmit the first D2H Reg FIs > after SRST), I'll push similar patch upstream. Thanks. If you think that changes to increase the delays are the way to go (at least until we can find a better solution) I can provide patches. > > Hmm.. or do we have to wait !BSY here as old IDE did? Not sure. I'm fairly new to this stuff. Thanks! Gary -- Gary Hade IBM Linux Technology Center 503-578-4503 IBM T/L: 775-4503 garyhade@us.ibm.com http://www.ibm.com/linux/ltc ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [2.6.18,19] SATA boot problems (ICH6/ICH6W) 2006-12-21 17:10 ` Gary Hade @ 2007-01-30 1:55 ` Gary Hade 2007-01-30 7:32 ` Tejun Heo 0 siblings, 1 reply; 29+ messages in thread From: Gary Hade @ 2007-01-30 1:55 UTC (permalink / raw) To: Gary Hade; +Cc: Tejun Heo, Kovid Goyal, linux-ide, lcm, Jeff Garzik, konradr On Thu, Dec 21, 2006 at 09:10:35AM -0800, Gary Hade wrote: > On Wed, Dec 20, 2006 at 12:53:57PM +0900, Tejun Heo wrote: > > Howdy, > > > > Gary Hade wrote: > > > I noticed that Tejun recently provided a "libata: handle 0xff status > > > properly" patch that is now in mainline that improves this code > > > re: http://marc.theaimsgroup.com/?l=linux-ide&m=116038642105802&w=2 > > > but I found that the check still failed but more silently and with no > > > retries. > > > > > > I decided to try increasing the delay that preceeds the above > > > check [ msleep(150); ] and found that a change from 150ms to > > > 1000ms caused the problem to disappear. > > > > Aieeee, 150ms not enough for the device to send the first FIS after SRST? > > Yea, it appears so. :) > > GoVault access via 'ahci' is also fails in some cable placement > configurations with: > kernel: scsi1 : ahci > kernel: ata2: softreset failed (1st FIS failed) > kernel: ata2: softreset failed, retrying in 5 secs > kernel: ata2: port is slow to respond, please be patient > kernel: ata2: port failed to respond (30 secs) > kernel: ata2: COMRESET failed (device not ready) > kernel: ata2: hardreset failed, retrying in 5 secs > kernel: ata2: port is slow to respond, please be patient > kernel: ata2: port failed to respond (30 secs) > kdump: kexec: failed to load kdump kernel > kernel: ata2: COMRESET failed (device not ready) > kernel: ata2: reset failed, giving up > > This problem also disappears after reversing ports to which > the hard drive and GoVault cables are connected. > > The following timeout increase appears to correct the 'ahci' problem: > > --- ./linux-2.6.18.i386/drivers/scsi/ahci.c.orig 2006-12-19 09:07:58.000000000 -0800 > +++ ./linux-2.6.18.i386/drivers/scsi/ahci.c 2006-12-19 13:30:29.000000000 -0800 > @@ -788,7 +788,7 @@ static int ahci_softreset(struct ata_por > > writel(1, port_mmio + PORT_CMD_ISSUE); > > - tmp = ata_wait_register(port_mmio + PORT_CMD_ISSUE, 0x1, 0x1, 1, 500); > + tmp = ata_wait_register(port_mmio + PORT_CMD_ISSUE, 0x1, 0x1, 1, 2500); > if (tmp & 0x1) { > rc = -EIO; > reason = "1st FIS failed"; > > 1000ms, 1500ms, 1750ms, and 1900ms didn't work. 2000ms worked so > 2500ms includes some extra to be safe. This experience seems to > be more representative of the 1 to 2 second time (with RDC present) > mentioned by Quantum (see below) than the 'ata_piix' 600-700ms > experience. > > > > > > I then replaced the msleep(150); with: > > > { > > > int i, ms = 5; > > > msleep(ms); > > > ata_port_printk(ap, KERN_INFO, "status @ %d ms: 0x%x\n", > > > ms, ata_check_status(ap)); > > > for (i = 1; i <= 20; i++) { > > > ms += 50; > > > msleep(50); > > > ata_port_printk(ap, KERN_INFO, "status @ %d ms: 0x%x\n", > > > ms, ata_check_status(ap)); > > > } > > > } > > > > > > Output for two cable placement configurations (0xFF check failure > > > and 0xFF check success) are included below. Note that there are > > > cable placement configurations for both the hard drive and > > > GoVault where the initial status is 0xff. i.e. both transition > > > from 0xff to 0x7f when BSY bit is cleared but it is taking MUCH > > > longer for the GoVault (600-700ms for GoVault and <5ms for > > > hard drive). It does not appear that the 0xff starting status > > > is device specific. > > > > > > So, it appears that we have a situation with this SATA controller > > > where a 0xFF status is not an accurate indication that there is > > > no device. > > > > > > Although the 150ms to 1000ms delay increase works for the GoVault > > > device I am not sure if it is the best long term fix for the problem. > > > > I would be surprised if Kovid's sda not detected case is caused by this. > > For GoVault (that's SATAPI right?), yeah, maybe. > > Yes, the GoVault is an ATAPI device. > > > For an ATA disk, no way (hopefully). > > Yes, probably true that Kovid got the same errors but for a > different reason. > > > > > Can you consult with quantum about it? > > I checked with Quantum about this and they said: > --- > "We confirmed that if there's an RDC present when the soft reset is > received, then it can take between one and two seconds to complete the > reset. Issuing a SET FEATURES command to the RDC is the longest part of > it. > > Even without an RDC, we've measured time on the order of 170 > milliseconds. " > --- > > The RDC has been present for almost all of my testing. Here > are comparison traces with and without the RDC which definitely > confirms the RDC factor. It also confirms the order of 170ms > without RDC time that Quantum mentioned. > > ======== > With RDC > ======== > kernel: ata1: status @ 5 ms: 0xff > kernel: ata1: status @ 55 ms: 0xff > kernel: ata1: status @ 105 ms: 0xff > kernel: ata1: status @ 155 ms: 0xff > kernel: ata1: status @ 205 ms: 0xff > kernel: ata1: status @ 255 ms: 0xff > kernel: ata1: status @ 305 ms: 0xff > kernel: ata1: status @ 355 ms: 0xff > kernel: ata1: status @ 405 ms: 0xff > kernel: ata1: status @ 455 ms: 0xff > kernel: ata1: status @ 505 ms: 0xff > kernel: ata1: status @ 555 ms: 0xff > kernel: ata1: status @ 605 ms: 0xff > kernel: ata1: status @ 655 ms: 0x7f > kernel: ata1: status @ 705 ms: 0x7f > kernel: ata1: status @ 755 ms: 0x7f > kernel: ata1: status @ 805 ms: 0x7f > kernel: ata1: status @ 855 ms: 0x7f > kernel: ata1: status @ 905 ms: 0x7f > kernel: ata1: status @ 955 ms: 0x7f > kernel: ata1: status @ 1005 ms: 0x7f > > =========== > Without RDC > =========== > kernel: ata1: status @ 5 ms: 0xff > kernel: ata1: status @ 55 ms: 0xff > kernel: ata1: status @ 105 ms: 0xff > kernel: ata1: status @ 155 ms: 0xff > kernel: ata1: status @ 205 ms: 0x7f > kernel: ata1: status @ 255 ms: 0x7f > kernel: ata1: status @ 305 ms: 0x7f > kernel: ata1: status @ 355 ms: 0x7f > kernel: ata1: status @ 405 ms: 0x7f > kernel: ata1: status @ 455 ms: 0x7f > kernel: ata1: status @ 505 ms: 0x7f > kernel: ata1: status @ 555 ms: 0x7f > kernel: ata1: status @ 605 ms: 0x7f > kernel: ata1: status @ 655 ms: 0x7f > kernel: ata1: status @ 705 ms: 0x7f > kernel: ata1: status @ 755 ms: 0x7f > kernel: ata1: status @ 805 ms: 0x7f > kernel: ata1: status @ 855 ms: 0x7f > kernel: ata1: status @ 905 ms: 0x7f > kernel: ata1: status @ 955 ms: 0x7f > kernel: ata1: status @ 1005 ms: 0x7f > > > If they verify your fix (ie, > > GoVault sometimes take more than 150ms to transmit the first D2H Reg FIs > > after SRST), I'll push similar patch upstream. > > Thanks. If you think that changes to increase the delays are > the way to go (at least until we can find a better solution) > I can provide patches. Tejun, I haven't heard anything from you on this so I'm including a delay increase patch against 2.6.20-rc6 for the 'ata-piix' case below. I hope that you, Jeff, and others find this acceptable. With respect to the 'ahci' case w/2.6.20-rc6 the GoVault device is useable following boot although the below messages are being logged during initialization. Please let me know if you have any thoughts on this. scsi1 : ahci ata2: softreset failed (port busy but CLO unavailable) ata2: softreset failed, retrying in 5 secs ata2: port is slow to respond, please be patient (Status 0x80) ata2: port failed to respond (30 secs, Status 0x80) ata2: COMRESET failed (device not ready) ata2: hardreset failed, retrying in 5 secs ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300) ata2.00: ATAPI, max UDMA/66 ata2.00: configured for UDMA/66 Thanks, Gary -- Gary Hade System x Enablement IBM Linux Technology Center 503-578-4503 IBM T/L: 775-4503 garyhade@us.ibm.com http://www.ibm.com/linux/ltc Controllers such as the ICH6R/ICH6RW may set the status to 0xFF when software reset is initiated even when the device is present. Since some removable media devices can take longer than 150ms to complete the 0xFF status check can fail even when the device is present. For example, a software reset for the Quantum GoVault removable hard can take as long as 2 seconds to complete. This patch eliminates incorrect software reset failures for slower than normal software reset responders by adding an additional 2 second wait when a 0xFF status is detected following the current 150ms wait. Signed-off-by: Gary Hade <garyhade@us.ibm.com> --- linux-2.6.20-rc6/drivers/ata/libata-core.c.orig 2007-01-24 18:19:28.000000000 -0800 +++ linux-2.6.20-rc6/drivers/ata/libata-core.c 2007-01-29 16:39:34.000000000 -0800 @@ -2683,6 +2683,13 @@ static unsigned int ata_bus_softreset(st */ msleep(150); + /* For those controllers where the status could start out at + * 0xFF even though the device is present we may need to wait + * a little longer for slower removable media devices to respond. + */ + if (ata_check_status(ap) == 0xFF) + msleep(2000); + /* Before we perform post reset processing we want to see if * the bus shows 0xFF because the odd clown forgets the D7 * pulldown resistor. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [2.6.18,19] SATA boot problems (ICH6/ICH6W) 2007-01-30 1:55 ` Gary Hade @ 2007-01-30 7:32 ` Tejun Heo 2007-01-30 23:37 ` Gary Hade 0 siblings, 1 reply; 29+ messages in thread From: Tejun Heo @ 2007-01-30 7:32 UTC (permalink / raw) To: Gary Hade; +Cc: Kovid Goyal, linux-ide, lcm, Jeff Garzik, konradr Hello, Gary. Gary Hade wrote: >>> If they verify your fix (ie, >>> GoVault sometimes take more than 150ms to transmit the first D2H Reg FIs >>> after SRST), I'll push similar patch upstream. >> Thanks. If you think that changes to increase the delays are >> the way to go (at least until we can find a better solution) >> I can provide patches. > > Tejun, > I haven't heard anything from you on this so I'm including a delay > increase patch against 2.6.20-rc6 for the 'ata-piix' case below. > I hope that you, Jeff, and others find this acceptable. Sorry about being unresponsive. The thing is that the change adds unnecessary 2 secs of delay to a lot of other normal device-not-present cases, so I was hesitant to ack the patch. I'll give it more thoughts (and respond timely this time :-) > With respect to the 'ahci' case w/2.6.20-rc6 the GoVault device is > useable following boot although the below messages are being logged > during initialization. Please let me know if you have any thoughts > on this. > scsi1 : ahci > ata2: softreset failed (port busy but CLO unavailable) > ata2: softreset failed, retrying in 5 secs > ata2: port is slow to respond, please be patient (Status 0x80) > ata2: port failed to respond (30 secs, Status 0x80) > ata2: COMRESET failed (device not ready) > ata2: hardreset failed, retrying in 5 secs > ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300) > ata2.00: ATAPI, max UDMA/66 > ata2.00: configured for UDMA/66 The above should have been fixed in 2.6.20-rc6. Please test it. It was caused by the ahci driver incorrectly clearing ahci CAP register and fixed recently. Thanks. -- tejun ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [2.6.18,19] SATA boot problems (ICH6/ICH6W) 2007-01-30 7:32 ` Tejun Heo @ 2007-01-30 23:37 ` Gary Hade 2007-01-31 0:54 ` Jeff Garzik ` (2 more replies) 0 siblings, 3 replies; 29+ messages in thread From: Gary Hade @ 2007-01-30 23:37 UTC (permalink / raw) To: Tejun Heo; +Cc: Gary Hade, Kovid Goyal, linux-ide, lcm, Jeff Garzik, konradr On Tue, Jan 30, 2007 at 04:32:34PM +0900, Tejun Heo wrote: > Hello, Gary. > > Gary Hade wrote: > >>> If they verify your fix (ie, > >>> GoVault sometimes take more than 150ms to transmit the first D2H Reg FIs > >>> after SRST), I'll push similar patch upstream. > >> Thanks. If you think that changes to increase the delays are > >> the way to go (at least until we can find a better solution) > >> I can provide patches. > > > > Tejun, > > I haven't heard anything from you on this so I'm including a delay > > increase patch against 2.6.20-rc6 for the 'ata-piix' case below. > > I hope that you, Jeff, and others find this acceptable. > > Sorry about being unresponsive. The thing is that the change adds > unnecessary 2 secs of delay to a lot of other normal device-not-present > cases, so I was hesitant to ack the patch. I'll give it more thoughts > (and respond timely this time :-) Thanks! My followup was untimely so we're even. :-) Some of my random thoughts: There does appear to be this invalid assumption that 0xFF status always implies device-not-present. The status register access restrictions in ATA/ATAPI-7 V1 5.14.2 include the statement "The contents of this register, except for BSY, shall be ignored when BSY is set to one." which the code does not honor. There is apparently past experience that 0xFF status implies device-not-present for some controllers (the odd clowns :) but I have no idea how common these are. We obviously can't get rid of the check but since we cannot clear the read-only status register and there appears to be no specification dictated upper limit on how long it should take for a software reset to complete it just seems like we need to wait long enough to support the slowest known device which may be the GoVault. > > > With respect to the 'ahci' case w/2.6.20-rc6 the GoVault device is > > useable following boot although the below messages are being logged > > during initialization. Please let me know if you have any thoughts > > on this. > > scsi1 : ahci > > ata2: softreset failed (port busy but CLO unavailable) > > ata2: softreset failed, retrying in 5 secs > > ata2: port is slow to respond, please be patient (Status 0x80) > > ata2: port failed to respond (30 secs, Status 0x80) > > ata2: COMRESET failed (device not ready) > > ata2: hardreset failed, retrying in 5 secs > > ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300) > > ata2.00: ATAPI, max UDMA/66 > > ata2.00: configured for UDMA/66 > > The above should have been fixed in 2.6.20-rc6. Please test it. It was > caused by the ahci driver incorrectly clearing ahci CAP register and > fixed recently. I'm clearly seeing this with 2.6.20-rc6 but unlike the ata-piix issue it does not appear to be dependent on the port to which the device is attached. I've been playing around with this today and found that it could be solved by inserting a delay between the ahci_stop_engine() call and BSY/DRQ check. This change: --- linux-2.6.20-rc6/drivers/ata/ahci.c.orig 2007-01-30 11:01:20.000000000 -0800 +++ linux-2.6.20-rc6/drivers/ata/ahci.c 2007-01-30 12:59:38.000000000 -0800 @@ -804,6 +804,19 @@ static int ahci_softreset(struct ata_por goto fail_restart; } + { + int delay; + u8 stat; + for (delay = 0; delay < 2000; delay+=100) { + if (!(ahci_check_status(ap) & (ATA_BUSY | ATA_DRQ))) + break; + msleep(100); + stat = ahci_check_status(ap); + ata_port_printk(ap, KERN_INFO, "delay=%d BSY=%d DRQ=%d\n", + delay, (stat & ATA_BUSY)?1:0, (stat & ATA_DRQ)?1:0); + } + } + /* check BUSY/DRQ, perform Command List Override if necessary */ if (ahci_check_status(ap) & (ATA_BUSY | ATA_DRQ)) { rc = ahci_clo(ap); Yielded this output both with and without the RDC inserted: scsi1 : ahci ata2: delay=0 BSY=1 DRQ=0 ata2: delay=100 BSY=1 DRQ=0 ata2: delay=200 BSY=1 DRQ=0 ata2: delay=300 BSY=1 DRQ=0 ata2: delay=400 BSY=1 DRQ=0 ata2: delay=500 BSY=1 DRQ=0 ata2: delay=600 BSY=1 DRQ=0 ata2: delay=700 BSY=1 DRQ=0 ata2: delay=800 BSY=1 DRQ=0 ata2: delay=900 BSY=1 DRQ=0 ata2: delay=1000 BSY=1 DRQ=0 ata2: delay=1100 BSY=1 DRQ=0 ata2: delay=1200 BSY=0 DRQ=0 ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300) ata2.00: ATAPI, max UDMA/66 ata2.00: configured for UDMA/66 So it appears that we may also have a similar device slowness issue with this driver. Gary -- Gary Hade System x Enablement IBM Linux Technology Center 503-578-4503 IBM T/L: 775-4503 garyhade@us.ibm.com http://www.ibm.com/linux/ltc ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [2.6.18,19] SATA boot problems (ICH6/ICH6W) 2007-01-30 23:37 ` Gary Hade @ 2007-01-31 0:54 ` Jeff Garzik 2007-01-31 11:00 ` Tejun Heo 2007-01-31 10:44 ` Tejun Heo 2007-02-17 0:34 ` Gary Hade 2 siblings, 1 reply; 29+ messages in thread From: Jeff Garzik @ 2007-01-31 0:54 UTC (permalink / raw) To: Gary Hade; +Cc: Tejun Heo, Kovid Goyal, linux-ide, lcm, konradr Gary Hade wrote: > Some of my random thoughts: > There does appear to be this invalid assumption that 0xFF status > always implies device-not-present. The status register access > restrictions in ATA/ATAPI-7 V1 5.14.2 include the statement "The > contents of this register, except for BSY, shall be ignored when > BSY is set to one." which the code does not honor. There is apparently > past experience that 0xFF status implies device-not-present for some > controllers (the odd clowns :) but I have no idea how common these are. > We obviously can't get rid of the check but since we cannot clear > the read-only status register and there appears to be no specification > dictated upper limit on how long it should take for a software reset to > complete it just seems like we need to wait long enough to support the > slowest known device which may be the GoVault. SRST is specified to take no longer than 31 seconds from the clearing to the SRST bit to the clearing of the BSY bit. Look through the software reset protocol documentation (ATA/ATAPI-7 volume 2), mainly the device state machines. Jeff ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [2.6.18,19] SATA boot problems (ICH6/ICH6W) 2007-01-31 0:54 ` Jeff Garzik @ 2007-01-31 11:00 ` Tejun Heo 2007-01-31 12:20 ` Alan 0 siblings, 1 reply; 29+ messages in thread From: Tejun Heo @ 2007-01-31 11:00 UTC (permalink / raw) To: Jeff Garzik; +Cc: Gary Hade, Kovid Goyal, linux-ide, lcm, konradr Jeff Garzik wrote: > SRST is specified to take no longer than 31 seconds from the clearing to > the SRST bit to the clearing of the BSY bit. > > Look through the software reset protocol documentation (ATA/ATAPI-7 > volume 2), mainly the device state machines. Jeff, this is different delay. We're short circuiting the 31sec wait when the reported status is 0xff to avoid wasting half a minute on port which has no device but is showing 0xff because bus datalines are not pulled down (PATA) or the specific controller is made to emulate 0xff when link isn't established yet (SATA). This never matters for PATA devices because no sane device reports 0xff as the status register value regardless of its state - 0xff is the special value indicating no device attached and data bus is floating, and 150ms is more than enough to allow the device to report its status register value (the spec says >= 2ms). Some SATA controllers use 0xff to indicate empty port. This seldomly matters as we have the almighty SStatus register to check device presence (there is a bug regarding this, patch pending). This GoVault drive fails because ata_piix doesn't have SCR while using 0xff to indicate port not ready (dunno exact which state causes 0xff status tho) while the GoVault drive fails to clear that state in 150ms (not 30s). The libata sees 0xff after SRST if GoVault drive is attached and thinks that the port is empty. So, I'm afraid there is no easy way out here but to wait longer for 0xff to clear. As Kovid suggested just a few secs should be enough. Thanks. -- tejun ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [2.6.18,19] SATA boot problems (ICH6/ICH6W) 2007-01-31 11:00 ` Tejun Heo @ 2007-01-31 12:20 ` Alan 2007-01-31 13:16 ` Tejun Heo 0 siblings, 1 reply; 29+ messages in thread From: Alan @ 2007-01-31 12:20 UTC (permalink / raw) To: Tejun Heo; +Cc: Jeff Garzik, Gary Hade, Kovid Goyal, linux-ide, lcm, konradr > Some SATA controllers use 0xff to indicate empty port. This seldomly > matters as we have the almighty SStatus register to check device > presence (there is a bug regarding this, patch pending). > > This GoVault drive fails because ata_piix doesn't have SCR while using > 0xff to indicate port not ready (dunno exact which state causes 0xff > status tho) while the GoVault drive fails to clear that state in 150ms > (not 30s). The libata sees 0xff after SRST if GoVault drive is attached So we can also cut this down by only doing the extra polling on a device which is SATA and lacks SCR ? ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [2.6.18,19] SATA boot problems (ICH6/ICH6W) 2007-01-31 12:20 ` Alan @ 2007-01-31 13:16 ` Tejun Heo 2007-01-31 15:24 ` Jeff Garzik 0 siblings, 1 reply; 29+ messages in thread From: Tejun Heo @ 2007-01-31 13:16 UTC (permalink / raw) To: Alan; +Cc: Jeff Garzik, Gary Hade, Kovid Goyal, linux-ide, lcm, konradr Alan wrote: >> Some SATA controllers use 0xff to indicate empty port. This seldomly >> matters as we have the almighty SStatus register to check device >> presence (there is a bug regarding this, patch pending). >> >> This GoVault drive fails because ata_piix doesn't have SCR while using >> 0xff to indicate port not ready (dunno exact which state causes 0xff >> status tho) while the GoVault drive fails to clear that state in 150ms >> (not 30s). The libata sees 0xff after SRST if GoVault drive is attached > > So we can also cut this down by only doing the extra polling on a device > which is SATA and lacks SCR ? That's true but the offending one is ata_piix, so the cutting down is not as effective. If we can live with the extra two secs per empty port for some of ata_piix for the time being (maybe two or three releases), the delay can be added now. One more thing to consider is GoVault is the only known device to show this behavior till now. Hmm... What do you (Alan and Jeff) think? -- tejun ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [2.6.18,19] SATA boot problems (ICH6/ICH6W) 2007-01-31 13:16 ` Tejun Heo @ 2007-01-31 15:24 ` Jeff Garzik 2007-01-31 15:30 ` Mark Lord 0 siblings, 1 reply; 29+ messages in thread From: Jeff Garzik @ 2007-01-31 15:24 UTC (permalink / raw) To: Tejun Heo; +Cc: Alan, Gary Hade, Kovid Goyal, linux-ide, lcm, konradr Tejun Heo wrote: > Alan wrote: >>> Some SATA controllers use 0xff to indicate empty port. This seldomly >>> matters as we have the almighty SStatus register to check device >>> presence (there is a bug regarding this, patch pending). >>> >>> This GoVault drive fails because ata_piix doesn't have SCR while using >>> 0xff to indicate port not ready (dunno exact which state causes 0xff >>> status tho) while the GoVault drive fails to clear that state in 150ms >>> (not 30s). The libata sees 0xff after SRST if GoVault drive is attached >> So we can also cut this down by only doing the extra polling on a device >> which is SATA and lacks SCR ? > > That's true but the offending one is ata_piix, so the cutting down is > not as effective. If we can live with the extra two secs per empty port > for some of ata_piix for the time being (maybe two or three releases), > the delay can be added now. One more thing to consider is GoVault is > the only known device to show this behavior till now. > > Hmm... What do you (Alan and Jeff) think? That last factor weighs on my mind. While I don't mind making changes for this device, and taking into consideration Alan's recent comments that some ATAPI workarounds are still yet to appear for libata, I still dislike making changes for one specific device with non-standard behavior. Jeff ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [2.6.18,19] SATA boot problems (ICH6/ICH6W) 2007-01-31 15:24 ` Jeff Garzik @ 2007-01-31 15:30 ` Mark Lord 0 siblings, 0 replies; 29+ messages in thread From: Mark Lord @ 2007-01-31 15:30 UTC (permalink / raw) To: Jeff Garzik Cc: Tejun Heo, Alan, Gary Hade, Kovid Goyal, linux-ide, lcm, konradr Jeff Garzik wrote: > Tejun Heo wrote: >> Alan wrote: >>>> Some SATA controllers use 0xff to indicate empty port. This seldomly >>>> matters as we have the almighty SStatus register to check device >>>> presence (there is a bug regarding this, patch pending). >>>> >>>> This GoVault drive fails because ata_piix doesn't have SCR while using >>>> 0xff to indicate port not ready (dunno exact which state causes 0xff >>>> status tho) while the GoVault drive fails to clear that state in 150ms >>>> (not 30s). The libata sees 0xff after SRST if GoVault drive is >>>> attached >>> So we can also cut this down by only doing the extra polling on a device >>> which is SATA and lacks SCR ? >> >> That's true but the offending one is ata_piix, so the cutting down is >> not as effective. If we can live with the extra two secs per empty port .. > While I don't mind making changes for this device, and taking into > consideration Alan's recent comments that some ATAPI workarounds are > still yet to appear for libata, I still dislike making changes for one > specific device with non-standard behavior. How does drivers/ide manage with this device, or does it? It would be useful here to patch the PCI ID into drivers/ide for that PIIX variant, and try it.. just to see what the behaviour is. There are other possible ways to avoid a 2-second per-port delay at boot. Eg. by attempting write+readback on some of the other registers. It all sounds very messy, but that's the ATA/ATAPI world. Cheers ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [2.6.18,19] SATA boot problems (ICH6/ICH6W) 2007-01-30 23:37 ` Gary Hade 2007-01-31 0:54 ` Jeff Garzik @ 2007-01-31 10:44 ` Tejun Heo 2007-01-31 10:47 ` Jeff Garzik 2007-02-01 0:49 ` Gary Hade 2007-02-17 0:34 ` Gary Hade 2 siblings, 2 replies; 29+ messages in thread From: Tejun Heo @ 2007-01-31 10:44 UTC (permalink / raw) To: Gary Hade; +Cc: Kovid Goyal, linux-ide, lcm, Jeff Garzik, konradr Gary Hade wrote: > Some of my random thoughts: > There does appear to be this invalid assumption that 0xFF status > always implies device-not-present. The status register access > restrictions in ATA/ATAPI-7 V1 5.14.2 include the statement "The > contents of this register, except for BSY, shall be ignored when > BSY is set to one." which the code does not honor. There is apparently > past experience that 0xFF status implies device-not-present for some > controllers (the odd clowns :) but I have no idea how common these are. The 0xff is the value we get when there is no device present and the motherboard manufacturer forgot to pull down the ATA bus. It's not very uncommon in cheap PATA world and, following the weird tradition, some SATA controllers choose to emulate 0xff if there is no device attached (link not established). Not sure how many of them does it but intel's SATA chipset is one of them, so we're pretty much stuck with it. ie. In many P/SATA setups, your patch would add 2 extra secs of waiting for empty ports. > We obviously can't get rid of the check but since we cannot clear > the read-only status register and there appears to be no specification > dictated upper limit on how long it should take for a software reset to > complete it just seems like we need to wait long enough to support the > slowest known device which may be the GoVault. Agreed but still hesitant to ack the patch. :-) I'm gonna work on parallel probing for libata. I think we can easily hide extra 2 secs of waiting with parallel probing. It will take some time but that seems to be the 'right' thing to do especially considering the fact that 150ms sleep has been enough for gazillions of ATA devices during last decade except for this GoVault drive. I'll leave this thread in my to-do folder and apply your patch after parallel probing is in place (optimistic ETA 1 month). How does that sound? Thanks. -- tejun ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [2.6.18,19] SATA boot problems (ICH6/ICH6W) 2007-01-31 10:44 ` Tejun Heo @ 2007-01-31 10:47 ` Jeff Garzik 2007-01-31 11:00 ` Tejun Heo 2007-02-01 0:49 ` Gary Hade 1 sibling, 1 reply; 29+ messages in thread From: Jeff Garzik @ 2007-01-31 10:47 UTC (permalink / raw) To: Tejun Heo; +Cc: Gary Hade, Kovid Goyal, linux-ide, lcm, konradr Tejun Heo wrote: > I'm gonna work on parallel probing for libata. I think we can easily > hide extra 2 secs of waiting with parallel probing. It will take some > time but that seems to be the 'right' thing to do especially considering > the fact that 150ms sleep has been enough for gazillions of ATA devices > during last decade except for this GoVault drive. FWIW I don't mind polling for a /maximum/ of $N seconds, but I want to stress that any patch that blatantly adds stuff like if (status == 0xff) mdelay(2000) will be rejected. Jeff ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [2.6.18,19] SATA boot problems (ICH6/ICH6W) 2007-01-31 10:47 ` Jeff Garzik @ 2007-01-31 11:00 ` Tejun Heo 0 siblings, 0 replies; 29+ messages in thread From: Tejun Heo @ 2007-01-31 11:00 UTC (permalink / raw) To: Jeff Garzik; +Cc: Gary Hade, Kovid Goyal, linux-ide, lcm, konradr Jeff Garzik wrote: > Tejun Heo wrote: >> I'm gonna work on parallel probing for libata. I think we can easily >> hide extra 2 secs of waiting with parallel probing. It will take some >> time but that seems to be the 'right' thing to do especially considering >> the fact that 150ms sleep has been enough for gazillions of ATA devices >> during last decade except for this GoVault drive. > > > FWIW I don't mind polling for a /maximum/ of $N seconds, but I want to > stress that any patch that blatantly adds stuff like > > if (status == 0xff) > mdelay(2000) > > will be rejected. Aye aye captain. I wouldn't do that myself either. -- tejun ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [2.6.18,19] SATA boot problems (ICH6/ICH6W) 2007-01-31 10:44 ` Tejun Heo 2007-01-31 10:47 ` Jeff Garzik @ 2007-02-01 0:49 ` Gary Hade 1 sibling, 0 replies; 29+ messages in thread From: Gary Hade @ 2007-02-01 0:49 UTC (permalink / raw) To: Tejun Heo; +Cc: Gary Hade, Kovid Goyal, linux-ide, lcm, Jeff Garzik, konradr On Wed, Jan 31, 2007 at 07:44:43PM +0900, Tejun Heo wrote: > Gary Hade wrote: > > Some of my random thoughts: > > There does appear to be this invalid assumption that 0xFF status > > always implies device-not-present. The status register access > > restrictions in ATA/ATAPI-7 V1 5.14.2 include the statement "The > > contents of this register, except for BSY, shall be ignored when > > BSY is set to one." which the code does not honor. There is apparently > > past experience that 0xFF status implies device-not-present for some > > controllers (the odd clowns :) but I have no idea how common these are. > > The 0xff is the value we get when there is no device present and the > motherboard manufacturer forgot to pull down the ATA bus. It's not very > uncommon in cheap PATA world and, following the weird tradition, some > SATA controllers choose to emulate 0xff if there is no device attached > (link not established). Not sure how many of them does it but intel's > SATA chipset is one of them, so we're pretty much stuck with it. > > ie. In many P/SATA setups, your patch would add 2 extra secs of waiting > for empty ports. > > > We obviously can't get rid of the check but since we cannot clear > > the read-only status register and there appears to be no specification > > dictated upper limit on how long it should take for a software reset to > > complete it just seems like we need to wait long enough to support the > > slowest known device which may be the GoVault. > > Agreed but still hesitant to ack the patch. :-) > > I'm gonna work on parallel probing for libata. I think we can easily > hide extra 2 secs of waiting with parallel probing. It will take some > time but that seems to be the 'right' thing to do especially considering > the fact that 150ms sleep has been enough for gazillions of ATA devices > during last decade except for this GoVault drive. > > I'll leave this thread in my to-do folder and apply your patch after > parallel probing is in place (optimistic ETA 1 month). How does that sound? Thanks. Solution sounds great. ETA may be an issue since we need to get the device enabled in a near-term release. I'll let you know. Polling version included below. GoVault appeared to be the only currently known device that would benefit from polling so I didn't think the extra code was necessary. After considering Jeff's comment, it makes sense prepare for the possibility of those future slow but not quite so slow devices. Gary -- Gary Hade System x Enablement IBM Linux Technology Center 503-578-4503 IBM T/L: 775-4503 garyhade@us.ibm.com http://www.ibm.com/linux/ltc Controllers such as the ICH6R/ICH6RW may set the status to 0xFF when software reset is initiated even when the device is present. Since some removable media devices can take longer than 150ms to complete the 0xFF status check can fail even when the device is present. For example, a software reset for the Quantum GoVault removable hard can take as long as 2 seconds to complete. This patch eliminates incorrect software reset failures for slower than normal software reset responders by adding an additional 2 second wait when a 0xFF status is detected following the current 150ms wait. Signed-off-by: Gary Hade <garyhade@us.ibm.com> --- linux-2.6.20-rc6/drivers/ata/libata-core.c.orig 2007-01-24 18:19:28.000000000 -0800 +++ linux-2.6.20-rc6/drivers/ata/libata-core.c 2007-01-31 12:00:52.000000000 -0800 @@ -2653,6 +2653,8 @@ static unsigned int ata_bus_softreset(st unsigned int devmask) { struct ata_ioports *ioaddr = &ap->ioaddr; + u8 status; + unsigned long timeout; DPRINTK("ata%u: bus reset via SRST\n", ap->id); @@ -2683,11 +2685,22 @@ static unsigned int ata_bus_softreset(st */ msleep(150); + /* For those controllers where the status could start out at + * 0xFF even though the device is present we wait up to 2 seconds + * longer for slower removable media devices to respond. + */ + status = ata_check_status(ap); + timeout = jiffies + 2*HZ; + while (status == 0xFF && time_before(jiffies, timeout)) { + msleep(50); + status = ata_check_status(ap); + } + /* Before we perform post reset processing we want to see if * the bus shows 0xFF because the odd clown forgets the D7 * pulldown resistor. */ - if (ata_check_status(ap) == 0xFF) + if (status == 0xFF) return 0; ata_bus_post_reset(ap, devmask); ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [2.6.18,19] SATA boot problems (ICH6/ICH6W) 2007-01-30 23:37 ` Gary Hade 2007-01-31 0:54 ` Jeff Garzik 2007-01-31 10:44 ` Tejun Heo @ 2007-02-17 0:34 ` Gary Hade 2007-02-21 12:40 ` Tejun Heo 2 siblings, 1 reply; 29+ messages in thread From: Gary Hade @ 2007-02-17 0:34 UTC (permalink / raw) To: Gary Hade; +Cc: Tejun Heo, Kovid Goyal, linux-ide, lcm, Jeff Garzik, konradr On Tue, Jan 30, 2007 at 03:37:36PM -0800, Gary Hade wrote: > On Tue, Jan 30, 2007 at 04:32:34PM +0900, Tejun Heo wrote: > > Hello, Gary. > > > > Gary Hade wrote: > > >>> If they verify your fix (ie, > > >>> GoVault sometimes take more than 150ms to transmit the first D2H Reg FIs > > >>> after SRST), I'll push similar patch upstream. > > >> Thanks. If you think that changes to increase the delays are > > >> the way to go (at least until we can find a better solution) > > >> I can provide patches. > > > > > > Tejun, > > > I haven't heard anything from you on this so I'm including a delay > > > increase patch against 2.6.20-rc6 for the 'ata-piix' case below. > > > I hope that you, Jeff, and others find this acceptable. > > > > Sorry about being unresponsive. The thing is that the change adds > > unnecessary 2 secs of delay to a lot of other normal device-not-present > > cases, so I was hesitant to ack the patch. I'll give it more thoughts > > (and respond timely this time :-) > > Thanks! My followup was untimely so we're even. :-) > > Some of my random thoughts: > There does appear to be this invalid assumption that 0xFF status > always implies device-not-present. The status register access > restrictions in ATA/ATAPI-7 V1 5.14.2 include the statement "The > contents of this register, except for BSY, shall be ignored when > BSY is set to one." which the code does not honor. There is apparently > past experience that 0xFF status implies device-not-present for some > controllers (the odd clowns :) but I have no idea how common these are. > We obviously can't get rid of the check but since we cannot clear > the read-only status register and there appears to be no specification > dictated upper limit on how long it should take for a software reset to > complete it just seems like we need to wait long enough to support the > slowest known device which may be the GoVault. > > > > > > With respect to the 'ahci' case w/2.6.20-rc6 the GoVault device is > > > useable following boot although the below messages are being logged > > > during initialization. Please let me know if you have any thoughts > > > on this. > > > scsi1 : ahci > > > ata2: softreset failed (port busy but CLO unavailable) > > > ata2: softreset failed, retrying in 5 secs > > > ata2: port is slow to respond, please be patient (Status 0x80) > > > ata2: port failed to respond (30 secs, Status 0x80) > > > ata2: COMRESET failed (device not ready) > > > ata2: hardreset failed, retrying in 5 secs > > > ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300) > > > ata2.00: ATAPI, max UDMA/66 > > > ata2.00: configured for UDMA/66 > > > > The above should have been fixed in 2.6.20-rc6. Please test it. It was > > caused by the ahci driver incorrectly clearing ahci CAP register and > > fixed recently. > > I'm clearly seeing this with 2.6.20-rc6 but unlike the ata-piix > issue it does not appear to be dependent on the port to which the > device is attached. I've been playing around with this today and > found that it could be solved by inserting a delay between the > ahci_stop_engine() call and BSY/DRQ check. > > This change: > --- linux-2.6.20-rc6/drivers/ata/ahci.c.orig 2007-01-30 11:01:20.000000000 -0800 > +++ linux-2.6.20-rc6/drivers/ata/ahci.c 2007-01-30 12:59:38.000000000 -0800 > @@ -804,6 +804,19 @@ static int ahci_softreset(struct ata_por > goto fail_restart; > } > > + { > + int delay; > + u8 stat; > + for (delay = 0; delay < 2000; delay+=100) { > + if (!(ahci_check_status(ap) & (ATA_BUSY | ATA_DRQ))) > + break; > + msleep(100); > + stat = ahci_check_status(ap); > + ata_port_printk(ap, KERN_INFO, "delay=%d BSY=%d DRQ=%d\n", > + delay, (stat & ATA_BUSY)?1:0, (stat & ATA_DRQ)?1:0); > + } > + } > + > /* check BUSY/DRQ, perform Command List Override if necessary */ > if (ahci_check_status(ap) & (ATA_BUSY | ATA_DRQ)) { > rc = ahci_clo(ap); > > Yielded this output both with and without the RDC inserted: > scsi1 : ahci > ata2: delay=0 BSY=1 DRQ=0 > ata2: delay=100 BSY=1 DRQ=0 > ata2: delay=200 BSY=1 DRQ=0 > ata2: delay=300 BSY=1 DRQ=0 > ata2: delay=400 BSY=1 DRQ=0 > ata2: delay=500 BSY=1 DRQ=0 > ata2: delay=600 BSY=1 DRQ=0 > ata2: delay=700 BSY=1 DRQ=0 > ata2: delay=800 BSY=1 DRQ=0 > ata2: delay=900 BSY=1 DRQ=0 > ata2: delay=1000 BSY=1 DRQ=0 > ata2: delay=1100 BSY=1 DRQ=0 > ata2: delay=1200 BSY=0 DRQ=0 > ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300) > ata2.00: ATAPI, max UDMA/66 > ata2.00: configured for UDMA/66 > > So it appears that we may also have a similar device slowness issue > with this driver. Tejun, I instrumented the code and found that for the SATA hard drive BSY was set just before the call to ahci_init_port() from ahci_port_start() and clear after the return from ahci_init_port(). For the GoVault BSY was still set after the return from ahci_init_port() and remained set for almost 2 seconds. The below patch which gives BSY some extra time to clear repairs the problem. Unlike the extra delay for ata-piix needed by GoVault I believe this delay will only be seen for attached devices that need it. Please let me know what you think. Thanks. Gary -- Gary Hade System x Enablement IBM Linux Technology Center 503-578-4503 IBM T/L: 775-4503 garyhade@us.ibm.com http://www.ibm.com/linux/ltc We encountered a problem where the BSY status bit is still set on entry to the 'ahci' error handler during initialization of the Quantum GoVault when attached to an ICH6R/ICH6RW controller. This caused a software reset failure due to failed BSY/DRQ check forcing a hard reset with the following messages logged. ata1: softreset failed (port busy but CLO unavailable) ata1: softreset failed, retrying in 5 secs ata1: port is slow to respond, please be patient (Status 0x80) ata1: port failed to respond (30 secs, Status 0x80) ata1: COMRESET failed (device not ready) ata1: hardreset failed, retrying in 5 secs It was taking almost 2 seconds for BSY to clear following the return from ahci_init_port() in ahci_port_start() so this patch gives BSY up to 3 seconds extra time to clear eliminating the problem. Signed-off-by: Gary Hade <garyhade@us.ibm.com> --- linux-2.6.20-rc7/drivers/ata/ahci.c.orig 2007-02-16 10:11:21.000000000 -0800 +++ linux-2.6.20-rc7/drivers/ata/ahci.c 2007-02-16 13:23:04.000000000 -0800 @@ -1423,6 +1423,8 @@ static int ahci_port_start(struct ata_po void *mem; dma_addr_t mem_dma; int rc; + u8 status; + unsigned long timeout; pp = kmalloc(sizeof(*pp), GFP_KERNEL); if (!pp) @@ -1477,6 +1479,17 @@ static int ahci_port_start(struct ata_po /* initialize port */ ahci_init_port(port_mmio, hpriv->cap, pp->cmd_slot_dma, pp->rx_fis_dma); + status = ahci_check_status(ap); + + /* for some devices we need to delay to allow BSY to clear */ + if (status & ATA_BUSY) { + timeout = jiffies + 3*HZ; + while ((status & ATA_BUSY) && time_before(jiffies, timeout)) { + msleep(50); + status = ahci_check_status(ap); + } + } + return 0; } ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [2.6.18,19] SATA boot problems (ICH6/ICH6W) 2007-02-17 0:34 ` Gary Hade @ 2007-02-21 12:40 ` Tejun Heo 2007-02-22 0:41 ` Gary Hade 2007-02-23 0:32 ` Gary Hade 0 siblings, 2 replies; 29+ messages in thread From: Tejun Heo @ 2007-02-21 12:40 UTC (permalink / raw) To: Gary Hade; +Cc: Kovid Goyal, linux-ide, lcm, Jeff Garzik, konradr Hello, Gary. Gary Hade wrote: > I instrumented the code and found that for the SATA hard drive BSY was set > just before the call to ahci_init_port() from ahci_port_start() and clear > after the return from ahci_init_port(). For the GoVault BSY was still set > after the return from ahci_init_port() and remained set for almost 2 seconds. > > The below patch which gives BSY some extra time to clear repairs the problem. > Unlike the extra delay for ata-piix needed by GoVault I believe this delay > will only be seen for attached devices that need it. Please let me know > what you think. Can you post full dmesg without the patch? And which controller are you using (lspci -nn please)? How come it doesn't support CLO? I don't think the patch is a good idea. !BSY wait before reset is supposed to be done in ->prereset() if possible. ahci skips that step because waiting for !BSY isn't reliable after hotplug and it often wastes 30secs for no good reason after an hotplug event. Even if you need to add that specific shorter wait, the correct place would be ->prereset() not at the end of ->port_start(). I don't think fixing here and there for GoVault drive is a good idea. It's not gonna be tested widely and GoValut would end up working on some controllers while broken on others. I think recently posted EH improvements and faster reset patches should help here. The first reset will fail timely and all in all the drive should be detected in slightly over ten secs, which isn't that bad and actually is okay if parallel probing is implemented. I dunno. Maybe the correct thing to do is wait briefly for !BSY in std_prereset() when SKIP_D2H_BSY is set. What status value does the controller report in that state - 0x80 or 0xff? Thanks. -- tejun ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [2.6.18,19] SATA boot problems (ICH6/ICH6W) 2007-02-21 12:40 ` Tejun Heo @ 2007-02-22 0:41 ` Gary Hade 2007-02-23 0:32 ` Gary Hade 1 sibling, 0 replies; 29+ messages in thread From: Gary Hade @ 2007-02-22 0:41 UTC (permalink / raw) To: Tejun Heo; +Cc: Gary Hade, Kovid Goyal, linux-ide, lcm, Jeff Garzik, konradr Hi Tejun, Thanks for the help. On Wed, Feb 21, 2007 at 09:40:10PM +0900, Tejun Heo wrote: > Hello, Gary. > > Gary Hade wrote: > > I instrumented the code and found that for the SATA hard drive BSY was set > > just before the call to ahci_init_port() from ahci_port_start() and clear > > after the return from ahci_init_port(). For the GoVault BSY was still set > > after the return from ahci_init_port() and remained set for almost 2 seconds. > > > > The below patch which gives BSY some extra time to clear repairs the problem. > > Unlike the extra delay for ata-piix needed by GoVault I believe this delay > > will only be seen for attached devices that need it. Please let me know > > what you think. > > Can you post full dmesg without the patch? Included below. > And which controller are you using (lspci -nn please)? [root@elm3a114 ~]# lspci -s 00:1f.2 00:1f.2 IDE interface: Intel Corporation 82801FR/FRW (ICH6R/ICH6RW) SATA Controller (rev 03) [root@elm3a114 ~]# lspci -s 00:1f.2 -nn 00:1f.2 0101: 8086:2652 (rev 03) [root@elm3a114 ~]# lspci -s 00:1f.2 -nvv 00:1f.2 0101: 8086:2652 (rev 03) (prog-if 8f) Subsystem: 1014:2652 Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- Latency: 0 Interrupt: pin B routed to IRQ 19 Region 0: I/O ports at 20c0 [size=8] Region 1: I/O ports at 20b8 [size=4] Region 2: I/O ports at 20b0 [size=8] Region 3: I/O ports at 20a4 [size=4] Region 4: I/O ports at 2090 [size=16] Region 5: Memory at d8500c00 (32-bit, non-prefetchable) [size=1K] Capabilities: [70] Power Management version 2 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot+,D3cold-) Status: D0 PME-Enable- DSel=0 DScale=0 PME- > How come it doesn't support CLO? I don't know. > > I don't think the patch is a good idea. No problem. I asked for your opinion. :) > !BSY wait before reset is > supposed to be done in ->prereset() if possible. ahci skips that step > because waiting for !BSY isn't reliable after hotplug and it often > wastes 30secs for no good reason after an hotplug event. > > Even if you need to add that specific shorter wait, the correct place > would be ->prereset() not at the end of ->port_start(). Yea, I did see that wait for !BSY in ata_std_prereset() and remember wondering why it was turned off for ahci where it would have nicely solved the problem. I actually considered inserting the short wait there (i.e. the change you suggest below) but since the problem appeared to happen only during initialization I thought it might be better to put it in a less frequently visited ahci-only place. > > I don't think fixing here and there for GoVault drive is a good idea. > It's not gonna be tested widely and GoValut would end up working on some > controllers while broken on others. I think recently posted EH > improvements and faster reset patches should help here. The first reset > will fail timely and all in all the drive should be detected in slightly > over ten secs, which isn't that bad and actually is okay if parallel > probing is implemented. I could give them a try but it seems unlikely that they will show up in all of the soon to be released Distro updates we're worried about. > > I dunno. Maybe the correct thing to do is wait briefly for !BSY in > std_prereset() when SKIP_D2H_BSY is set. This definitely works for me if the other information I included in this message doesn't change your mind. > What status value does the > controller report in that state - 0x80 or 0xff? 0x80 I believe I tried with GoVault attached to each of the 4 ports and always saw 0x80. Below is my ugly 2.6.20-rc6 debug output (hard drive on ata2, GoVault on ata3) that may be of interest. Gary -- Gary Hade System x Enablement IBM Linux Technology Center 503-578-4503 IBM T/L: 775-4503 garyhade@us.ibm.com http://www.ibm.com/linux/ltc ======================== 2.6.20-rc7 dmesg - start ======================== Linux version 2.6.20-rc7 (root@elm3a114.beaverton.ibm.com) (gcc version 4.1.1 20070105 (Red Hat 4.1.1-52)) #1 SMP Fri Feb 16 09:54:30 PST 2007 Command line: ro root=LABEL=/ BIOS-provided physical RAM map: BIOS-e820: 0000000000000000 - 000000000009b800 (usable) BIOS-e820: 000000000009b800 - 00000000000a0000 (reserved) BIOS-e820: 00000000000e4000 - 0000000000100000 (reserved) BIOS-e820: 0000000000100000 - 000000003fef0000 (usable) BIOS-e820: 000000003fef0000 - 000000003fef6000 (ACPI data) BIOS-e820: 000000003fef6000 - 000000003ff00000 (ACPI NVS) BIOS-e820: 000000003ff00000 - 0000000040000000 (reserved) BIOS-e820: 00000000fec00000 - 00000000fec10000 (reserved) BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved) BIOS-e820: 00000000ff000000 - 0000000100000000 (reserved) Entering add_active_range(0, 0, 155) 0 entries of 3200 used Entering add_active_range(0, 256, 261872) 1 entries of 3200 used end_pfn_map = 1048576 DMI present. ACPI: RSDP (v000 PTLTD ) @ 0x00000000000f6e60 ACPI: RSDT (v001 PTLTD RSDT 0x06040000 LTP 0x00000000) @ 0x000000003fef024b ACPI: FADT (v001 INTEL 0x06040000 PTL 0x00000003) @ 0x000000003fef5de7 ACPI: SPCR (v001 PTLTD $UCRTBL$ 0x06040000 PTL 0x00000001) @ 0x000000003fef5e5b ACPI: MADT (v001 PTLTD APIC 0x06040000 LTP 0x00000000) @ 0x000000003fef5eab ACPI: BOOT (v001 PTLTD $SBFTBL$ 0x06040000 LTP 0x00000001) @ 0x000000003fef5f05 ACPI: ASF! (v032 MBI 0x06040000 0x00000000) @ 0x000000003fef5f2d ACPI: MCFG (v001 PTLTD MCFG 0x06040000 LTP 0x00000000) @ 0x000000003fef5fc4 ACPI: DSDT (v001 IBM FIREBIRD 0x06040000 MSFT 0x0100000e) @ 0x0000000000000000 No NUMA configuration found Faking a node at 0000000000000000-000000003fef0000 Entering add_active_range(0, 0, 155) 0 entries of 3200 used Entering add_active_range(0, 256, 261872) 1 entries of 3200 used Bootmem setup node 0 0000000000000000-000000003fef0000 Zone PFN ranges: DMA 0 -> 4096 DMA32 4096 -> 1048576 Normal 1048576 -> 1048576 early_node_map[2] active PFN ranges 0: 0 -> 155 0: 256 -> 261872 On node 0 totalpages: 261771 DMA zone: 56 pages used for memmap DMA zone: 9 pages reserved DMA zone: 3930 pages, LIFO batch:0 DMA32 zone: 3524 pages used for memmap DMA32 zone: 254252 pages, LIFO batch:31 Normal zone: 0 pages used for memmap ACPI: PM-Timer IO Port: 0x1008 ACPI: Local APIC address 0xfee00000 ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled) Processor #0 (Bootup-CPU) ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1]) ACPI: IOAPIC (id[0x01] address[0xfec00000] gsi_base[0]) IOAPIC[0]: apic_id 1, address 0xfec00000, GSI 0-23 ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 high edge) ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level) ACPI: IRQ0 used by override. ACPI: IRQ2 used by override. ACPI: IRQ9 used by override. Setting APIC routing to physical flat Using ACPI (MADT) for SMP configuration information Nosave address range: 000000000009b000 - 000000000009c000 Nosave address range: 000000000009c000 - 00000000000a0000 Nosave address range: 00000000000a0000 - 00000000000e4000 Nosave address range: 00000000000e4000 - 0000000000100000 Allocating PCI resources starting at 50000000 (gap: 40000000:bec00000) SMP: Allowing 1 CPUs, 0 hotplug CPUs PERCPU: Allocating 67840 bytes of per cpu data Built 1 zonelists. Total pages: 258182 Kernel command line: ro root=LABEL=/ Initializing CPU#0 PID hash table entries: 4096 (order: 12, 32768 bytes) Console: colour VGA+ 80x25 Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes) Inode-cache hash table entries: 65536 (order: 7, 524288 bytes) Checking aperture... Memory: 1021828k/1047488k available (2377k kernel code, 25256k reserved, 1284k data, 332k init) Calibrating delay using timer specific routine.. 7185.52 BogoMIPS (lpj=3592761) Security Framework v1.0.0 initialized SELinux: Initializing. SELinux: Starting in permissive mode selinux_register_security: Registering secondary module capability Capability LSM initialized as secondary Mount-cache hash table entries: 256 CPU: Trace cache: 12K uops, L1 D cache: 16K CPU: L2 cache: 1024K CPU 0/0 -> Node 0 using mwait in idle threads. CPU: Hyper-Threading is disabled CPU0: Thermal monitoring handled by SMI SMP alternatives: switching to UP code Freeing SMP alternatives: 32k freed ACPI: Core revision 20060707 Using local APIC timer interrupts. result 12468941 Detected 12.468 MHz APIC timer. Brought up 1 CPUs testing NMI watchdog ... OK. Disabling vsyscall due to use of PM timer time.c: Using 3.579545 MHz WALL PM GTOD PM timer. time.c: Detected 3591.068 MHz processor. NET: Registered protocol family 16 No dock devices found. ACPI: bus type pci registered PCI: BIOS Bug: MCFG area at e0000000 is not E820-reserved PCI: Not using MMCONFIG. PCI: Using configuration type 1 ACPI: Interpreter enabled ACPI: Using IOAPIC for interrupt routing ACPI: PCI Root Bridge [PCI0] (0000:00) PCI: Probing PCI hardware (bus 00) PCI quirk: region 1000-107f claimed by ICH6 ACPI/GPIO/TCO PCI quirk: region 1180-11bf claimed by ICH6 GPIO Boot video device is 0000:01:00.0 PCI: Transparent bridge - 0000:00:1e.0 ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PEG_._PRT] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.EXP0._PRT] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PCIB._PRT] ACPI: PCI Interrupt Link [LNKA] (IRQs 3 10 *11 14 15) ACPI: PCI Interrupt Link [LNKB] (IRQs 3 10 *11 14 15) ACPI: PCI Interrupt Link [LNKC] (IRQs 3 10 11 14 15) *5 ACPI: PCI Interrupt Link [LNKD] (IRQs 3 *10 11 14 15) ACPI: PCI Interrupt Link [LNKE] (IRQs 3 10 11 14 15) *0, disabled. ACPI: PCI Interrupt Link [LNKF] (IRQs 3 10 11 14 15) *0, disabled. ACPI: PCI Interrupt Link [LNKG] (IRQs 3 10 11 14 15) *0, disabled. ACPI: PCI Interrupt Link [LNKH] (IRQs 3 *10 11 14 15) Linux Plug and Play Support v0.97 (c) Adam Belay pnp: PnP ACPI init pnp: PnP ACPI: found 13 devices usbcore: registered new interface driver usbfs usbcore: registered new interface driver hub usbcore: registered new device driver usb PCI: Using ACPI for IRQ routing PCI: If a device doesn't work, try "pci=routeirq". If it helps, post a report NetLabel: Initializing NetLabel: domain hash size = 128 NetLabel: protocols = UNLABELED CIPSOv4 NetLabel: unlabeled traffic allowed by default PCI-GART: No AMD northbridge found. PCI: Bridge: 0000:00:01.0 IO window: 3000-3fff MEM window: d8100000-d81fffff PREFETCH window: d0000000-d7ffffff PCI: Bridge: 0000:00:1c.0 IO window: disabled. MEM window: d8000000-d80fffff PREFETCH window: disabled. PCI: Bridge: 0000:00:1e.0 IO window: disabled. MEM window: d8200000-d82fffff PREFETCH window: disabled. ACPI: PCI Interrupt 0000:00:01.0[A] -> GSI 16 (level, low) -> IRQ 16 PCI: Setting latency timer of device 0000:00:01.0 to 64 ACPI: PCI Interrupt 0000:00:1c.0[A] -> GSI 17 (level, low) -> IRQ 17 PCI: Setting latency timer of device 0000:00:1c.0 to 64 PCI: Enabling device 0000:00:1e.0 (0144 -> 0146) PCI: Setting latency timer of device 0000:00:1e.0 to 64 NET: Registered protocol family 2 IP route cache hash table entries: 32768 (order: 6, 262144 bytes) TCP established hash table entries: 131072 (order: 9, 2097152 bytes) TCP bind hash table entries: 65536 (order: 8, 1048576 bytes) TCP: Hash tables configured (established 131072 bind 65536) TCP reno registered checking if image is initramfs... it is Freeing initrd memory: 1538k freed Simple Boot Flag at 0x36 set to 0x1 audit: initializing netlink socket (disabled) audit(1172091506.301:1): initialized Total HugeTLB memory allocated, 0 VFS: Disk quotas dquot_6.5.1 Dquot-cache hash table entries: 512 (order 0, 4096 bytes) SELinux: Registering netfilter hooks io scheduler noop registered io scheduler anticipatory registered io scheduler deadline registered io scheduler cfq registered (default) PCI: Setting latency timer of device 0000:00:01.0 to 64 assign_interrupt_mode Found MSI capability Allocate Port Service[0000:00:01.0:pcie00] PCI: Setting latency timer of device 0000:00:1c.0 to 64 assign_interrupt_mode Found MSI capability Allocate Port Service[0000:00:1c.0:pcie00] Allocate Port Service[0000:00:1c.0:pcie02] pci_hotplug: PCI Hot Plug PCI Core version: 0.5 ACPI: CPU0 (power states: C1[C1] C3[C3]) ACPI: Processor [CPU0] (supports 8 throttling states) Real Time Clock Driver v1.12ac Non-volatile memory driver v1.2 Linux agpgart interface v0.101 (c) Dave Jones Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing enabled serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A serial8250: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A 00:09: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A 00:0a: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A RAMDISK driver initialized: 16 RAM disks of 16384K size 4096 blocksize Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2 ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx ICH6: IDE controller at PCI slot 0000:00:1f.1 ACPI: PCI Interrupt 0000:00:1f.1[A] -> GSI 18 (level, low) -> IRQ 18 ICH6: chipset revision 3 ICH6: not 100% native mode: will probe irqs later ide0: BM-DMA at 0x2080-0x2087, BIOS settings: hda:DMA, hdb:pio Probing IDE interface ide0... hda: HL-DT-ST GCE-8483B, ATAPI CD/DVD-ROM drive ide0 at 0x1f0-0x1f7,0x3f6 on irq 14 Probing IDE interface ide1... ide-floppy driver 0.99.newide usbcore: registered new interface driver hiddev usbcore: registered new interface driver usbhid drivers/usb/input/hid-core.c: v2.6:USB HID core driver PNP: PS/2 Controller [PNP0303:KBC,PNP0f13:PS2M] at 0x60,0x64 irq 1,12 serio: i8042 KBD port at 0x60,0x64 irq 1 serio: i8042 AUX port at 0x60,0x64 irq 12 mice: PS/2 mouse device common for all mice TCP bic registered Initializing XFRM netlink socket NET: Registered protocol family 1 NET: Registered protocol family 17 ACPI: (supports S0 S3 S4 S5) Freeing unused kernel memory: 332k freed Write protecting the kernel read-only data: 723k USB Universal Host Controller Interface driver v3.0 ACPI: PCI Interrupt 0000:00:1d.0[A] -> GSI 23 (level, low) -> IRQ 23 PCI: Setting latency timer of device 0000:00:1d.0 to 64 uhci_hcd 0000:00:1d.0: UHCI Host Controller uhci_hcd 0000:00:1d.0: new USB bus registered, assigned bus number 1 uhci_hcd 0000:00:1d.0: irq 23, io base 0x00001800 usb usb1: configuration #1 chosen from 1 choice hub 1-0:1.0: USB hub found hub 1-0:1.0: 2 ports detected ACPI: PCI Interrupt 0000:00:1d.1[B] -> GSI 19 (level, low) -> IRQ 19 PCI: Setting latency timer of device 0000:00:1d.1 to 64 uhci_hcd 0000:00:1d.1: UHCI Host Controller uhci_hcd 0000:00:1d.1: new USB bus registered, assigned bus number 2 uhci_hcd 0000:00:1d.1: irq 19, io base 0x00002000 usb usb2: configuration #1 chosen from 1 choice hub 2-0:1.0: USB hub found hub 2-0:1.0: 2 ports detected ACPI: PCI Interrupt 0000:00:1d.2[C] -> GSI 18 (level, low) -> IRQ 18 PCI: Setting latency timer of device 0000:00:1d.2 to 64 uhci_hcd 0000:00:1d.2: UHCI Host Controller uhci_hcd 0000:00:1d.2: new USB bus registered, assigned bus number 3 uhci_hcd 0000:00:1d.2: irq 18, io base 0x00002020 usb usb3: configuration #1 chosen from 1 choice hub 3-0:1.0: USB hub found hub 3-0:1.0: 2 ports detected ACPI: PCI Interrupt 0000:00:1d.3[D] -> GSI 16 (level, low) -> IRQ 16 PCI: Setting latency timer of device 0000:00:1d.3 to 64 uhci_hcd 0000:00:1d.3: UHCI Host Controller uhci_hcd 0000:00:1d.3: new USB bus registered, assigned bus number 4 uhci_hcd 0000:00:1d.3: irq 16, io base 0x00002040 usb usb4: configuration #1 chosen from 1 choice hub 4-0:1.0: USB hub found hub 4-0:1.0: 2 ports detected ohci_hcd: 2006 August 04 USB 1.1 'Open' Host Controller (OHCI) Driver (PCI) ACPI: PCI Interrupt 0000:00:1d.7[A] -> GSI 23 (level, low) -> IRQ 23 PCI: Setting latency timer of device 0000:00:1d.7 to 64 ehci_hcd 0000:00:1d.7: EHCI Host Controller ehci_hcd 0000:00:1d.7: new USB bus registered, assigned bus number 5 ehci_hcd 0000:00:1d.7: debug port 1 PCI: cache line size of 128 is not supported by device 0000:00:1d.7 ehci_hcd 0000:00:1d.7: irq 23, io mem 0xd8500000 ehci_hcd 0000:00:1d.7: USB 2.0 started, EHCI 1.00, driver 10 Dec 2004 usb usb5: configuration #1 chosen from 1 choice hub 5-0:1.0: USB hub found hub 5-0:1.0: 8 ports detected input: PS/2 Generic Mouse as /class/input/input0 input: AT Translated Set 2 keyboard as /class/input/input1 SCSI subsystem initialized libata version 2.00 loaded. ahci 0000:00:1f.2: version 2.0 ACPI: PCI Interrupt 0000:00:1f.2[B] -> GSI 19 (level, low) -> IRQ 19 ahci 0000:00:1f.2: PORTS_IMPL is zero, forcing 0xf PCI: Setting latency timer of device 0000:00:1f.2 to 64 ahci 0000:00:1f.2: AHCI 0001.0000 32 slots 4 ports 1.5 Gbps 0xf impl IDE mode ahci 0000:00:1f.2: flags: 64bit ncq pm led pmp slum part ata1: SATA max UDMA/133 cmd 0xFFFFC2000001CD00 ctl 0x0 bmdma 0x0 irq 19 ata2: SATA max UDMA/133 cmd 0xFFFFC2000001CD80 ctl 0x0 bmdma 0x0 irq 19 ata3: SATA max UDMA/133 cmd 0xFFFFC2000001CE00 ctl 0x0 bmdma 0x0 irq 19 ata4: SATA max UDMA/133 cmd 0xFFFFC2000001CE80 ctl 0x0 bmdma 0x0 irq 19 scsi0 : ahci ata1: SATA link down (SStatus 0 SControl 300) scsi1 : ahci ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300) ata2.00: ATA-7, max UDMA/133, 156312576 sectors: LBA ata2.00: ata2: dev 0 multi count 0 ata2.00: configured for UDMA/133 scsi2 : ahci ata3: softreset failed (port busy but CLO unavailable) ata3: softreset failed, retrying in 5 secs ata3: port is slow to respond, please be patient (Status 0x80) ata3: port failed to respond (30 secs, Status 0x80) ata3: COMRESET failed (device not ready) ata3: hardreset failed, retrying in 5 secs ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300) ata3.00: ATAPI, max UDMA/66 ata3.00: configured for UDMA/66 scsi3 : ahci ata4: SATA link down (SStatus 0 SControl 300) scsi 1:0:0:0: Direct-Access ATA Maxtor 6Y080M0 YAR5 PQ: 0 ANSI: 5 SCSI device sda: 156312576 512-byte hdwr sectors (80032 MB) sda: Write Protect is off sda: Mode Sense: 00 3a 00 00 SCSI device sda: write cache: enabled, read cache: enabled, doesn't support DPO or FUA SCSI device sda: 156312576 512-byte hdwr sectors (80032 MB) sda: Write Protect is off sda: Mode Sense: 00 3a 00 00 SCSI device sda: write cache: enabled, read cache: enabled, doesn't support DPO or FUA sda: sda1 sda2 sda3 sda4 < sda5 sda6 sda7 sda8 sda9 sda10 sda11 sda12 sda13 > sd 1:0:0:0: Attached scsi disk sda scsi 2:0:0:0: Direct-Access IBM GoVault 0096 PQ: 0 ANSI: 5 SCSI device sdb: 156300464 512-byte hdwr sectors (80026 MB) sdb: Write Protect is off sdb: Mode Sense: 00 74 00 00 sdb: cache data unavailable sdb: assuming drive cache: write through SCSI device sdb: 156300464 512-byte hdwr sectors (80026 MB) sdb: Write Protect is off sdb: Mode Sense: 00 74 00 00 sdb: cache data unavailable sdb: assuming drive cache: write through sdb: sdb1 sdb2 sdb3 sdb4 < sdb5 sdb6 > sd 2:0:0:0: Attached scsi removable disk sdb kjournald starting. Commit interval 5 seconds EXT3-fs: mounted filesystem with ordered data mode. audit(1172091564.004:2): enforcing=1 old_enforcing=0 auid=4294967295 security: 3 users, 6 roles, 1587 types, 172 bools, 1 sens, 1024 cats security: 59 classes, 49762 rules security: class dccp_socket not defined in policy security: permission dccp_recv in class node not defined in policy security: permission dccp_send in class node not defined in policy security: permission dccp_recv in class netif not defined in policy security: permission dccp_send in class netif not defined in policy SELinux: Completing initialization. SELinux: Setting up existing superblocks. SELinux: initialized (dev sda8, type ext3), uses xattr SELinux: initialized (dev usbfs, type usbfs), uses genfs_contexts SELinux: initialized (dev tmpfs, type tmpfs), uses transition SIDs SELinux: initialized (dev debugfs, type debugfs), uses genfs_contexts SELinux: initialized (dev selinuxfs, type selinuxfs), uses genfs_contexts SELinux: initialized (dev mqueue, type mqueue), uses transition SIDs SELinux: initialized (dev hugetlbfs, type hugetlbfs), uses genfs_contexts SELinux: initialized (dev devpts, type devpts), uses transition SIDs SELinux: initialized (dev eventpollfs, type eventpollfs), uses task SIDs SELinux: initialized (dev inotifyfs, type inotifyfs), uses genfs_contexts SELinux: initialized (dev tmpfs, type tmpfs), uses transition SIDs SELinux: initialized (dev futexfs, type futexfs), uses genfs_contexts SELinux: initialized (dev pipefs, type pipefs), uses task SIDs SELinux: initialized (dev sockfs, type sockfs), uses task SIDs SELinux: initialized (dev cpuset, type cpuset), not configured for labeling SELinux: initialized (dev proc, type proc), uses genfs_contexts SELinux: initialized (dev bdev, type bdev), uses genfs_contexts SELinux: initialized (dev rootfs, type rootfs), uses genfs_contexts SELinux: initialized (dev sysfs, type sysfs), uses genfs_contexts audit(1172091564.636:3): policy loaded auid=4294967295 tg3.c:v3.72 (January 8, 2007) ACPI: PCI Interrupt 0000:02:00.0[A] -> GSI 16 (level, low) -> IRQ 16 PCI: Setting latency timer of device 0000:02:00.0 to 64 eth0: Tigon3 [partno(BCM95721) rev 4100 PHY(5750)] (PCI Express) 10/100/1000Base-T Ethernet 00:0d:60:b7:76:b3 eth0: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] Split[0] WireSpeed[1] TSOcap[1] eth0: dma_rwctrl[76180000] dma_mask[64-bit] ACPI: PCI Interrupt 0000:00:1f.3[B] -> GSI 19 (level, low) -> IRQ 19 input: PC Speaker as /class/input/input2 parport: PnPBIOS parport detected. parport0: PC-style at 0x378 (0x778), irq 7 [PCSPP,TRISTATE,EPP] intel_rng: FWH not detected hda: ATAPI 48X CD-ROM CD-R/RW drive, 2048kB Cache, UDMA(33) Uniform CD-ROM driver Revision: 3.20 shpchp: Standard Hot Plug PCI Controller Driver version: 0.4 sd 1:0:0:0: Attached scsi generic sg0 type 0 sd 2:0:0:0: Attached scsi generic sg1 type 0 PCI: Enabling device 0000:00:1e.2 (0005 -> 0007) ACPI: PCI Interrupt 0000:00:1e.2[A] -> GSI 17 (level, low) -> IRQ 17 PCI: Setting latency timer of device 0000:00:1e.2 to 64 intel8x0_measure_ac97_clock: measured 50307 usecs intel8x0: clocking to 48000 FDC 0 is a National Semiconductor PC87306 lp0: using parport0 (interrupt-driven). lp0: console ready input: Power Button (FF) as /class/input/input3 ACPI: Power Button (FF) [PWRF] input: Power Button (CM) as /class/input/input4 ACPI: Power Button (CM) [PWRB] ibm_acpi: ec object not found md: Autodetecting RAID arrays. md: autorun ... md: ... autorun DONE. device-mapper: ioctl: 4.11.0-ioctl (2006-10-12) initialised: dm-devel@redhat.com EXT3 FS on sda8, internal journal SELinux: initialized (dev tmpfs, type tmpfs), uses transition SIDs SELinux: initialized (dev sda1, type ext2), uses xattr kjournald starting. Commit interval 5 seconds EXT3 FS on sda2, internal journal EXT3-fs: mounted filesystem with ordered data mode. SELinux: initialized (dev sda2, type ext3), uses xattr Adding 1028120k swap on /dev/sda12. Priority:-1 extents:1 across:1028120k SELinux: initialized (dev binfmt_misc, type binfmt_misc), uses genfs_contexts IA-32 Microcode Update Driver: v1.14a <tigran@aivazian.fsnet.co.uk> NET: Registered protocol family 10 lo: Disabled Privacy Extensions process `sysctl' is using deprecated sysctl (syscall) net.ipv6.neigh.lo.retrans_time; Use net.ipv6.neigh.lo.retrans_time_ms instead. ADDRCONF(NETDEV_UP): eth0: link is not ready tg3: eth0: Link is up at 1000 Mbps, full duplex. tg3: eth0: Flow control is on for TX and on for RX. ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready audit(1172091587.001:4): audit_pid=2126 old=0 by auid=4294967295 subj=system_u:system_r:auditd_t:s0 SELinux: initialized (dev rpc_pipefs, type rpc_pipefs), uses genfs_contexts Bluetooth: Core ver 2.11 NET: Registered protocol family 31 Bluetooth: HCI device and connection manager initialized Bluetooth: HCI socket layer initialized Bluetooth: L2CAP ver 2.8 Bluetooth: L2CAP socket layer initialized Bluetooth: RFCOMM socket layer initialized Bluetooth: RFCOMM TTY layer initialized Bluetooth: RFCOMM ver 1.8 Bluetooth: HIDP (Human Interface Emulation) ver 1.1 SELinux: initialized (dev autofs, type autofs), uses genfs_contexts SELinux: initialized (dev autofs, type autofs), uses genfs_contexts SELinux: initialized (dev autofs, type autofs), uses genfs_contexts eth0: no IPv6 routers present [drm] Initialized drm 1.1.0 20060810 ACPI: PCI Interrupt 0000:01:00.0[A] -> GSI 16 (level, low) -> IRQ 16 [drm] Initialized radeon 1.25.0 20060524 on minor 0 [drm] Setting GART location based on new memory map [drm] Loading R300 Microcode [drm] writeback test succeeded in 1 usecs ======================== 2.6.20-rc7 dmesg - end ======================== =============================== 2.6.20-rc6 debug output - start =============================== ... libata version 2.00 loaded. ahci 0000:00:1f.2: version 2.0 ACPI: PCI Interrupt 0000:00:1f.2[B] -> GSI 19 (level, low) -> IRQ 19 ahci 0000:00:1f.2: PORTS_IMPL is zero, forcing 0xf XXX - ~PORT_CMD_FIS_RX in ahci_stop_fix_rx XXX - ~PORT_CMD_FIS_RX in ahci_stop_fix_rx XXX - ~PORT_CMD_FIS_RX in ahci_stop_fix_rx XXX - ~PORT_CMD_FIS_RX in ahci_stop_fix_rx PCI: Setting latency timer of device 0000:00:1f.2 to 64 ahci 0000:00:1f.2: AHCI 0001.0000 32 slots 4 ports 1.5 Gbps 0xf impl IDE mode ahci 0000:00:1f.2: flags: 64bit ncq pm led pmp slum part ata1: XXX - in ata_device_add before port_start ata1: XXX entering ahci_port_start status=0x7f BSY=0 DRQ=1 CR=0 FR=0 ICC=0 CCS=0x0, CI=0x0 ata1: XXX in ahci_port_start before ahci_power_up status=0x7f BSY=0 DRQ=1 CR=0 FR=0 ICC=0 CCS=0x0, CI=0x0 XXX - PORT_CMD_ICC_ACTIVE in ahci_power_up ata1: XXX in ahci_port_start after ahci_power_up status=0x7f BSY=0 DRQ=1 CR=0 FR=0 ICC=0 CCS=0x0, CI=0x0 XXX - PORT_CMD_FIS_RX in ahci_start_fis_rx ata1: XXX in ahci_init_port after ahci_start_fis_rx status=0x7f BSY=0 DRQ=1 CR=0 FR=1 ICC=0 CCS=0x0, CI=0x0 XXX - PORT_CMD_START in ahci_start_engine ata1: XXX in ahci_init_port after ahci_start_engine status=0x7f BSY=0 DRQ=1 CR=1 FR=1 ICC=0 CCS=0x0, CI=0x0 ata1: XXX in ahci_port_start after ahci_init_port status=0x7f BSY=0 DRQ=1 CR=1 FR=1 ICC=0 CCS=0x0, CI=0x0 ata1: XXX - in ata_device_add after port_start ata1: SATA max UDMA/133 cmd 0xFFFFC2000001CD00 ctl 0x0 bmdma 0x0 irq 19 ata1: XXX in ahci_freeze before turn IRQ off status=0x7f BSY=0 DRQ=1 CR=1 FR=1 ICC=0 CCS=0x0, CI=0x0 ata1: XXX in ahci_freeze after turn IRQ off status=0x7f BSY=0 DRQ=1 CR=1 FR=1 ICC=0 CCS=0x0, CI=0x0 ata1: XXX ata_device_add after ata_eh_freeze_port: status=0x7f BSY=0 DRQ=1 ata2: XXX - in ata_device_add before port_start ata2: XXX entering ahci_port_start status=0x80 BSY=1 DRQ=0 CR=0 FR=0 ICC=0 CCS=0x0, CI=0x0 ata2: XXX in ahci_port_start before ahci_power_up status=0x80 BSY=1 DRQ=0 CR=0 FR=0 ICC=0 CCS=0x0, CI=0x0 XXX - PORT_CMD_ICC_ACTIVE in ahci_power_up ata2: XXX in ahci_port_start after ahci_power_up status=0x80 BSY=1 DRQ=0 CR=0 FR=0 ICC=0 CCS=0x0, CI=0x0 XXX - PORT_CMD_FIS_RX in ahci_start_fis_rx ata2: XXX in ahci_init_port after ahci_start_fis_rx status=0x50 BSY=0 DRQ=0 CR=0 FR=1 ICC=0 CCS=0x0, CI=0x0 XXX - PORT_CMD_START in ahci_start_engine ata2: XXX in ahci_init_port after ahci_start_engine status=0x50 BSY=0 DRQ=0 CR=1 FR=1 ICC=0 CCS=0x0, CI=0x0 ata2: XXX in ahci_port_start after ahci_init_port status=0x50 BSY=0 DRQ=0 CR=1 FR=1 ICC=0 CCS=0x0, CI=0x0 ata2: XXX - in ata_device_add after port_start ata2: SATA max UDMA/133 cmd 0xFFFFC2000001CD80 ctl 0x0 bmdma 0x0 irq 19 ata2: XXX in ahci_freeze before turn IRQ off status=0x50 BSY=0 DRQ=0 CR=1 FR=1 ICC=0 CCS=0x0, CI=0x0 ata2: XXX in ahci_freeze after turn IRQ off status=0x50 BSY=0 DRQ=0 CR=1 FR=1 ICC=0 CCS=0x0, CI=0x0 ata2: XXX ata_device_add after ata_eh_freeze_port: status=0x50 BSY=0 DRQ=0 ata3: XXX - in ata_device_add before port_start ata3: XXX entering ahci_port_start status=0x80 BSY=1 DRQ=0 CR=0 FR=0 ICC=0 CCS=0x0, CI=0x0 ata3: XXX in ahci_port_start before ahci_power_up status=0x80 BSY=1 DRQ=0 CR=0 FR=0 ICC=0 CCS=0x0, CI=0x0 XXX - PORT_CMD_ICC_ACTIVE in ahci_power_up ata3: XXX in ahci_port_start after ahci_power_up status=0x80 BSY=1 DRQ=0 CR=0 FR=0 ICC=0 CCS=0x0, CI=0x0 XXX - PORT_CMD_FIS_RX in ahci_start_fis_rx ata3: XXX in ahci_init_port after ahci_start_fis_rx status=0x80 BSY=1 DRQ=0 CR=0 FR=1 ICC=0 CCS=0x0, CI=0x0 XXX - PORT_CMD_START in ahci_start_engine ata3: XXX in ahci_init_port after ahci_start_engine status=0x80 BSY=1 DRQ=0 CR=1 FR=1 ICC=0 CCS=0x0, CI=0x0 ata3: XXX in ahci_port_start after ahci_init_port status=0x80 BSY=1 DRQ=0 CR=1 FR=1 ICC=0 CCS=0x0, CI=0x0 ata3: XXX - in ata_device_add after port_start ata3: SATA max UDMA/133 cmd 0xFFFFC2000001CE00 ctl 0x0 bmdma 0x0 irq 19 ata3: XXX in ahci_freeze before turn IRQ off status=0x80 BSY=1 DRQ=0 CR=1 FR=1 ICC=0 CCS=0x0, CI=0x0 ata3: XXX in ahci_freeze after turn IRQ off status=0x80 BSY=1 DRQ=0 CR=1 FR=1 ICC=0 CCS=0x0, CI=0x0 ata3: XXX ata_device_add after ata_eh_freeze_port: status=0x80 BSY=1 DRQ=0 ata4: XXX - in ata_device_add before port_start ata4: XXX entering ahci_port_start status=0x7f BSY=0 DRQ=1 CR=0 FR=0 ICC=0 CCS=0x0, CI=0x0 ata4: XXX in ahci_port_start before ahci_power_up status=0x7f BSY=0 DRQ=1 CR=0 FR=0 ICC=0 CCS=0x0, CI=0x0 XXX - PORT_CMD_ICC_ACTIVE in ahci_power_up ata4: XXX in ahci_port_start after ahci_power_up status=0x7f BSY=0 DRQ=1 CR=0 FR=0 ICC=0 CCS=0x0, CI=0x0 XXX - PORT_CMD_FIS_RX in ahci_start_fis_rx ata4: XXX in ahci_init_port after ahci_start_fis_rx status=0x7f BSY=0 DRQ=1 CR=0 FR=1 ICC=0 CCS=0x0, CI=0x0 XXX - PORT_CMD_START in ahci_start_engine ata4: XXX in ahci_init_port after ahci_start_engine status=0x7f BSY=0 DRQ=1 CR=1 FR=1 ICC=0 CCS=0x0, CI=0x0 ata4: XXX in ahci_port_start after ahci_init_port status=0x7f BSY=0 DRQ=1 CR=1 FR=1 ICC=0 CCS=0x0, CI=0x0 ata4: XXX - in ata_device_add after port_start ata4: SATA max UDMA/133 cmd 0xFFFFC2000001CE80 ctl 0x0 bmdma 0x0 irq 19 ata4: XXX in ahci_freeze before turn IRQ off status=0x7f BSY=0 DRQ=1 CR=1 FR=1 ICC=0 CCS=0x0, CI=0x0 ata4: XXX in ahci_freeze after turn IRQ off status=0x7f BSY=0 DRQ=1 CR=1 FR=1 ICC=0 CCS=0x0, CI=0x0 ata4: XXX ata_device_add after ata_eh_freeze_port: status=0x7f BSY=0 DRQ=1 ata1: XXX ata_device_add before sata_scr_read: status=0x7f BSY=0 DRQ=1 ata1: XXX ata_device_add before scsi_add_host: status=0x7f BSY=0 DRQ=1 scsi0 : ahci ata1: XXX ata_device_add before scheduleing eh: status=0x7f BSY=0 DRQ=1 ata1: XXX entering ahci_error_handler: status=0x7f BSY=0 DRQ=1 CR=1 FR=1 ICC=0 CCS=0x0, CI=0x0 ata1: XXX in ahci_freeze before turn IRQ off status=0x7f BSY=0 DRQ=1 CR=1 FR=1 ICC=0 CCS=0x0, CI=0x0 ata1: XXX in ahci_freeze after turn IRQ off status=0x7f BSY=0 DRQ=1 CR=1 FR=1 ICC=0 CCS=0x0, CI=0x0 ata1: XXX before ata_port_offline status=0x7f BSY=0 DRQ=1 CR=1 FR=1 ICC=0 CCS=0x0, CI=0x0 ata1: SATA link down (SStatus 0 SControl 300) XXX - ~PORT_CMD_ATAPI in ahci_postreset ata2: XXX ata_device_add before sata_scr_read: status=0x50 BSY=0 DRQ=0 ata2: XXX ata_device_add before scsi_add_host: status=0x50 BSY=0 DRQ=0 scsi1 : ahci ata2: XXX ata_device_add before scheduleing eh: status=0x50 BSY=0 DRQ=0 ata2: XXX entering ahci_error_handler: status=0x50 BSY=0 DRQ=0 CR=1 FR=1 ICC=0 CCS=0x0, CI=0x0 ata2: XXX in ahci_freeze before turn IRQ off status=0x50 BSY=0 DRQ=0 CR=1 FR=1 ICC=0 CCS=0x0, CI=0x0 ata2: XXX in ahci_freeze after turn IRQ off status=0x50 BSY=0 DRQ=0 CR=1 FR=1 ICC=0 CCS=0x0, CI=0x0 ata2: XXX before ata_port_offline status=0x50 BSY=0 DRQ=0 CR=1 FR=1 ICC=0 CCS=0x0, CI=0x0 ata2: XXX before stop_engine status=0x50 BSY=0 DRQ=0 CR=1 FR=1 ICC=0 CCS=0x0, CI=0x0 XXX - ~PORT_CMD_START in ahci_stop_engine ata2: XXX before BSY/DRQ check status=0x50 BSY=0 DRQ=0 CR=0 FR=1 ICC=0 CCS=0x0, CI=0x0 XXX - PORT_CMD_START in ahci_start_engine ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300) XXX - ~PORT_CMD_ATAPI in ahci_postreset ata2.00: ATA-7, max UDMA/133, 156312576 sectors: LBA ata2.00: ata2: dev 0 multi count 0 ata2.00: configured for UDMA/133 ata3: XXX ata_device_add before sata_scr_read: status=0x80 BSY=1 DRQ=0 ata3: XXX ata_device_add before scsi_add_host: status=0x80 BSY=1 DRQ=0 scsi2 : ahci ata3: XXX ata_device_add before scheduleing eh: status=0x80 BSY=1 DRQ=0 ata3: XXX entering ahci_error_handler: status=0x80 BSY=1 DRQ=0 CR=1 FR=1 ICC=0 CCS=0x0, CI=0x0 ata3: XXX in ahci_freeze before turn IRQ off status=0x80 BSY=1 DRQ=0 CR=1 FR=1 ICC=0 CCS=0x0, CI=0x0 ata3: XXX in ahci_freeze after turn IRQ off status=0x80 BSY=1 DRQ=0 CR=1 FR=1 ICC=0 CCS=0x0, CI=0x0 ata3: XXX before ata_port_offline status=0x80 BSY=1 DRQ=0 CR=1 FR=1 ICC=0 CCS=0x0, CI=0x0 ata3: XXX before stop_engine status=0x80 BSY=1 DRQ=0 CR=1 FR=1 ICC=0 CCS=0x0, CI=0x0 XXX - ~PORT_CMD_START in ahci_stop_engine ata3: XXX before BSY/DRQ check status=0x80 BSY=1 DRQ=0 CR=0 FR=1 ICC=0 CCS=0x0, CI=0x0 XXX - PORT_CMD_START in ahci_start_engine ata3: softreset failed (port busy but CLO unavailable) ata3: softreset failed, retrying in 5 secs XXX - ~PORT_CMD_START in ahci_stop_engine ata3: port is slow to respond, please be patient (Status 0x80) ata3: port failed to respond (30 secs, Status 0x80) ata3: COMRESET failed (device not ready) XXX - PORT_CMD_START in ahci_start_engine ata3: hardreset failed, retrying in 5 secs XXX - ~PORT_CMD_START in ahci_stop_engine XXX - PORT_CMD_START in ahci_start_engine ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300) XXX - PORT_CMD_ATAPI in ahci_postreset ata3.00: ATAPI, max UDMA/66 ata3.00: configured for UDMA/66 ata4: XXX ata_device_add before sata_scr_read: status=0x7f BSY=0 DRQ=1 ata4: XXX ata_device_add before scsi_add_host: status=0x7f BSY=0 DRQ=1 scsi3 : ahci ata4: XXX ata_device_add before scheduleing eh: status=0x7f BSY=0 DRQ=1 ata4: XXX entering ahci_error_handler: status=0x7f BSY=0 DRQ=1 CR=1 FR=1 ICC=0 CCS=0x0, CI=0x0 ata4: XXX in ahci_freeze before turn IRQ off status=0x7f BSY=0 DRQ=1 CR=1 FR=1 ICC=0 CCS=0x0, CI=0x0 ata4: XXX in ahci_freeze after turn IRQ off status=0x7f BSY=0 DRQ=1 CR=1 FR=1 ICC=0 CCS=0x0, CI=0x0 ata4: XXX before ata_port_offline status=0x7f BSY=0 DRQ=1 CR=1 FR=1 ICC=0 CCS=0x0, CI=0x0 ata4: SATA link down (SStatus 0 SControl 300) XXX - ~PORT_CMD_ATAPI in ahci_postreset scsi 1:0:0:0: Direct-Access ATA Maxtor 6Y080M0 YAR5 PQ: 0 ANSI: 5 SCSI device sda: 156312576 512-byte hdwr sectors (80032 MB) sda: Write Protect is off sda: Mode Sense: 00 3a 00 00 SCSI device sda: write cache: enabled, read cache: enabled, doesn't support DPO or FUA SCSI device sda: 156312576 512-byte hdwr sectors (80032 MB) sda: Write Protect is off sda: Mode Sense: 00 3a 00 00 SCSI device sda: write cache: enabled, read cache: enabled, doesn't support DPO or FUA sda: sda1 sda2 sda3 sda4 < sda5 sda6 sda7 sda8 sda9 sda10 sda11 sda12 sda13 > sd 1:0:0:0: Attached scsi disk sda scsi 2:0:0:0: Direct-Access IBM GoVault 0096 PQ: 0 ANSI: 5 ata3: XXX entering ahci_error_handler: status=0x51 BSY=0 DRQ=0 CR=1 FR=1 ICC=0 CCS=0x0, CI=0x1 XXX - ~PORT_CMD_START in ahci_stop_engine ata3: XXX in ahci_error_handler after ahci_stop_engine status=0x51 BSY=0 DRQ=0 CR=0 FR=1 ICC=0 CCS=0x0, CI=0x0 XXX - PORT_CMD_START in ahci_start_engine ata3: XXX in ahci_error_handler after ahci_start_engine status=0x51 BSY=0 DRQ=0 CR=1 FR=1 ICC=0 CCS=0x0, CI=0x0 ata3: XXX entering ahci_error_handler: status=0x51 BSY=0 DRQ=0 CR=1 FR=1 ICC=0 CCS=0x0, CI=0x1 XXX - ~PORT_CMD_START in ahci_stop_engine ata3: XXX in ahci_error_handler after ahci_stop_engine status=0x51 BSY=0 DRQ=0 CR=0 FR=1 ICC=0 CCS=0x0, CI=0x0 XXX - PORT_CMD_START in ahci_start_engine ata3: XXX in ahci_error_handler after ahci_start_engine status=0x51 BSY=0 DRQ=0 CR=1 FR=1 ICC=0 CCS=0x0, CI=0x0 SCSI device sdb: 156300464 512-byte hdwr sectors (80026 MB) sdb: Write Protect is off sdb: Mode Sense: 00 74 00 00 ata3: XXX entering ahci_error_handler: status=0x51 BSY=0 DRQ=0 CR=1 FR=1 ICC=0 CCS=0x0, CI=0x1 XXX - ~PORT_CMD_START in ahci_stop_engine ata3: XXX in ahci_error_handler after ahci_stop_engine status=0x51 BSY=0 DRQ=0 CR=0 FR=1 ICC=0 CCS=0x0, CI=0x0 XXX - PORT_CMD_START in ahci_start_engine ata3: XXX in ahci_error_handler after ahci_start_engine status=0x51 BSY=0 DRQ=0 CR=1 FR=1 ICC=0 CCS=0x0, CI=0x0 sdb: cache data unavailable sdb: assuming drive cache: write through SCSI device sdb: 156300464 512-byte hdwr sectors (80026 MB) sdb: Write Protect is off sdb: Mode Sense: 00 74 00 00 ata3: XXX entering ahci_error_handler: status=0x51 BSY=0 DRQ=0 CR=1 FR=1 ICC=0 CCS=0x0, CI=0x1 XXX - ~PORT_CMD_START in ahci_stop_engine ata3: XXX in ahci_error_handler after ahci_stop_engine status=0x51 BSY=0 DRQ=0 CR=0 FR=1 ICC=0 CCS=0x0, CI=0x0 XXX - PORT_CMD_START in ahci_start_engine ata3: XXX in ahci_error_handler after ahci_start_engine status=0x51 BSY=0 DRQ=0 CR=1 FR=1 ICC=0 CCS=0x0, CI=0x0 sdb: cache data unavailable sdb: assuming drive cache: write through sdb: sdb1 sdb2 sdb3 sdb4 < sdb5 sdb6 > sd 2:0:0:0: Attached scsi removable disk sdb ... =============================== 2.6.20-rc6 debug output - end =============================== ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [2.6.18,19] SATA boot problems (ICH6/ICH6W) 2007-02-21 12:40 ` Tejun Heo 2007-02-22 0:41 ` Gary Hade @ 2007-02-23 0:32 ` Gary Hade 1 sibling, 0 replies; 29+ messages in thread From: Gary Hade @ 2007-02-23 0:32 UTC (permalink / raw) To: Tejun Heo; +Cc: Gary Hade, Kovid Goyal, linux-ide, lcm, Jeff Garzik, konradr On Wed, Feb 21, 2007 at 09:40:10PM +0900, Tejun Heo wrote: > Hello, Gary. > > Gary Hade wrote: > > I instrumented the code and found that for the SATA hard drive BSY was set > > just before the call to ahci_init_port() from ahci_port_start() and clear > > after the return from ahci_init_port(). For the GoVault BSY was still set > > after the return from ahci_init_port() and remained set for almost 2 seconds. > > > > The below patch which gives BSY some extra time to clear repairs the problem. > > Unlike the extra delay for ata-piix needed by GoVault I believe this delay > > will only be seen for attached devices that need it. Please let me know > > what you think. > > Can you post full dmesg without the patch? And which controller are you > using (lspci -nn please)? How come it doesn't support CLO? > > I don't think the patch is a good idea. !BSY wait before reset is > supposed to be done in ->prereset() if possible. ahci skips that step > because waiting for !BSY isn't reliable after hotplug and it often > wastes 30secs for no good reason after an hotplug event. > > Even if you need to add that specific shorter wait, the correct place > would be ->prereset() not at the end of ->port_start(). > > I don't think fixing here and there for GoVault drive is a good idea. > It's not gonna be tested widely and GoValut would end up working on some > controllers while broken on others. I think recently posted EH > improvements and faster reset patches should help here. The first reset > will fail timely and all in all the drive should be detected in slightly > over ten secs, which isn't that bad and actually is okay if parallel > probing is implemented. > > I dunno. Maybe the correct thing to do is wait briefly for !BSY in > std_prereset() when SKIP_D2H_BSY is set. Tejun, I will be on vacation for a few days (returning 3/1) so I thought I would go ahead and provide the below patch before you have had a chance to review the information I sent you yesterday and actually confirm that it still looks like a viable solution. Thanks, Gary -- Gary Hade System x Enablement IBM Linux Technology Center 503-578-4503 IBM T/L: 775-4503 garyhade@us.ibm.com http://www.ibm.com/linux/ltc We encountered a problem where the BSY status bit is still set on entry to the 'ahci' error handler during initialization of the Quantum GoVault when attached to an ICH6R/ICH6RW controller. This caused a software reset failure due to failed BSY/DRQ check in ahci_softreset() forcing a hard reset with the following messages logged. ata1: softreset failed (port busy but CLO unavailable) ata1: softreset failed, retrying in 5 secs ata1: port is slow to respond, please be patient (Status 0x80) ata1: port failed to respond (30 secs, Status 0x80) ata1: COMRESET failed (device not ready) ata1: hardreset failed, retrying in 5 secs It was taking almost 2 seconds for BSY to clear following the return from ahci_init_port() in ahci_port_start() so this patch gives BSY up to 3 seconds extra time to clear during prereset eliminating the problem. Signed-off-by: Gary Hade <garyhade@us.ibm.com> --- linux-2.6.20-rc7/drivers/ata/libata-core.c.orig 2007-02-22 11:52:34.000000000 -0800 +++ linux-2.6.20-rc7/drivers/ata/libata-core.c 2007-02-22 14:59:44.000000000 -0800 @@ -2915,6 +2915,8 @@ int ata_std_prereset(struct ata_port *ap struct ata_eh_context *ehc = &ap->eh_context; const unsigned long *timing = sata_ehc_deb_timing(ehc); int rc; + u8 status; + unsigned long timeout; /* handle link resume & hotplug spinup */ if ((ehc->i.flags & ATA_EHI_RESUME_LINK) && @@ -2942,9 +2944,19 @@ int ata_std_prereset(struct ata_port *ap /* Wait for !BSY if the controller can wait for the first D2H * Reg FIS and we don't know that no device is attached. + * For other controllers a brief wait (up to 3 secs) may be + * needed for some devices. */ if (!(ap->flags & ATA_FLAG_SKIP_D2H_BSY) && !ata_port_offline(ap)) ata_busy_sleep(ap, ATA_TMOUT_BOOT_QUICK, ATA_TMOUT_BOOT); + else { + timeout = jiffies + 3*HZ; + status = ata_busy_wait(ap, ATA_BUSY, 300); + while ((status & ATA_BUSY) && time_before(jiffies, timeout)) { + msleep(50); + status = ata_busy_wait(ap, ATA_BUSY, 3); + } + } return 0; } ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [2.6.18,19] SATA boot problems (ICH6/ICH6W) 2006-12-11 18:03 [2.6.18,19] SATA boot problems (ICH6/ICH6W) Kovid Goyal 2006-12-20 0:44 ` Tejun Heo @ 2007-01-23 21:49 ` danieljzhang 1 sibling, 0 replies; 29+ messages in thread From: danieljzhang @ 2007-01-23 21:49 UTC (permalink / raw) To: linux-ide Need help to rebuild new kernel: I dont know if this is the same problem. Please do help. I am trying to build kernel linux-2.6.20-rc2 with patch gregkh-all-2.6.20-rc2.patch But system hanged when boot. The original working kernel is: 2.6.13-15.13-smp. The distribution is SUSE. According to YaST2, Disk is ST380013AS, Device Name is /dev/sda Driver is ata_piix. I used the original .config but add sata as module, usb as kernel build-in. In /etc/sysconfig/kernel, I add ahci module. Right now INITRD_MODULES="ahci piix ata_piix reiserfs processor thremal fan" When booting, there is no FATAL error for loading module. I built in the USB module as kernel. I still can get keyboard (USB keyboard). Then it shows: "Waiting for device /dev/sda2 to appear: ........ resume device /dev/sda2 not found (ignoring) Loading reiserfs Waiting for device /dev/sda3 to appear: ..........not found -- exiting to /bin/sh I got shell promt "$" anyway. But just pwd cd works. But I can not find solution after googling and bothering linux guy around. Please help! I was told that there is a new SATA driver that I need to build and load. Anyone can kindly send me a link? Help, please! -- View this message in context: http://www.nabble.com/-2.6.18%2C19--SATA-boot-problems-%28ICH6-ICH6W%29-tf2795723.html#a8550219 Sent from the linux-ide mailing list archive at Nabble.com. ^ permalink raw reply [flat|nested] 29+ messages in thread
end of thread, other threads:[~2007-02-23 0:32 UTC | newest] Thread overview: 29+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2006-12-11 18:03 [2.6.18,19] SATA boot problems (ICH6/ICH6W) Kovid Goyal 2006-12-20 0:44 ` Tejun Heo 2006-12-20 2:00 ` Kovid Goyal 2006-12-20 2:13 ` Tejun Heo 2006-12-20 4:56 ` Kovid Goyal 2007-01-11 23:32 ` Kovid Goyal 2007-01-13 2:19 ` Tejun Heo 2006-12-20 3:29 ` Gary Hade 2006-12-20 3:53 ` Tejun Heo 2006-12-20 4:30 ` Tejun Heo 2006-12-21 17:10 ` Gary Hade 2007-01-30 1:55 ` Gary Hade 2007-01-30 7:32 ` Tejun Heo 2007-01-30 23:37 ` Gary Hade 2007-01-31 0:54 ` Jeff Garzik 2007-01-31 11:00 ` Tejun Heo 2007-01-31 12:20 ` Alan 2007-01-31 13:16 ` Tejun Heo 2007-01-31 15:24 ` Jeff Garzik 2007-01-31 15:30 ` Mark Lord 2007-01-31 10:44 ` Tejun Heo 2007-01-31 10:47 ` Jeff Garzik 2007-01-31 11:00 ` Tejun Heo 2007-02-01 0:49 ` Gary Hade 2007-02-17 0:34 ` Gary Hade 2007-02-21 12:40 ` Tejun Heo 2007-02-22 0:41 ` Gary Hade 2007-02-23 0:32 ` Gary Hade 2007-01-23 21:49 ` danieljzhang
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).