* [libata/sata_sil] Error on startup [not found] <c58a7a270809120334r16164e3ej9f8ab769a09742ed@mail.gmail.com> @ 2008-09-12 11:27 ` Alex Gonzalez 2008-09-13 12:55 ` Sergei Shtylyov 0 siblings, 1 reply; 12+ messages in thread From: Alex Gonzalez @ 2008-09-12 11:27 UTC (permalink / raw) To: linux-ide Hi, I am seeing a problem with libata and the sata_sil driver. I was redirected here from the lkml so I hope it's the correct list for this. I am running a 2.6.21 SMP mips kernel and on some occasions the kernel halts on startup. This only happens at boot time, that is, once the system is running it never exhibits the problem. The problem occurs when trying to read the partitions from the SATA drive, on the first call to read_dev_sector(). As I understand it, the code finishes doing a sync_page() which starts an asynchronous I/O operation and sleeps until awaken by an interrupt generated by the SATA device. I have followed the code through until the call to ata_bmdma_start(), where it does: iowrite8(dmactl | ATA_DMA_START, ap->ioaddr.bmdma_addr + ATA_DMA_CMD); On a working case, the device interrupts and the flow continues. On a non working case, there is a delay of 5 or 6 seconds, and then a series of calls to sil_freeze(), ata_bmdma_error_handler() and ata_bmdma_stop(). I am about to delve into the ATA specs to try to understand this interaction, but has anybody seen this behavior before? Please CC me on replies. Regards, Alex ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [libata/sata_sil] Error on startup 2008-09-12 11:27 ` [libata/sata_sil] Error on startup Alex Gonzalez @ 2008-09-13 12:55 ` Sergei Shtylyov 2008-09-13 17:23 ` Alex Gonzalez 0 siblings, 1 reply; 12+ messages in thread From: Sergei Shtylyov @ 2008-09-13 12:55 UTC (permalink / raw) To: Alex Gonzalez; +Cc: linux-ide Hello. Alex Gonzalez wrote: > I am seeing a problem with libata and the sata_sil driver. I was > redirected here from the lkml so I hope it's the correct list for > this. > > I am running a 2.6.21 SMP mips kernel and on some occasions the kernel > halts on startup. This only happens at boot time, that is, once the > system is running it never exhibits the problem. > > The problem occurs when trying to read the partitions from the SATA > drive, on the first call to read_dev_sector(). As I understand it, the > code finishes doing a sync_page() which starts an asynchronous I/O > operation and sleeps until awaken by an interrupt generated by the > SATA device. > > I have followed the code through until the call to ata_bmdma_start(), > where it does: > > iowrite8(dmactl | ATA_DMA_START, ap->ioaddr.bmdma_addr + ATA_DMA_CMD); > > On a working case, the device interrupts and the flow continues. On a > non working case, there is a delay of 5 or 6 seconds, and then a > series of calls to sil_freeze(), ata_bmdma_error_handler() and ata_bmdma_stop(). > > I am about to delve into the ATA specs to try to understand this > ATA spec. itself won't avail here, it doesn't cover host's DMA support. You probably want to read the part concerning "PCI Compatibility and PCI-Native Mode Bus Master Adapters" pf this document: http://www.t13.org/Documents/UploadedDocuments/project/d1510r1-Host-Adapter.pdf > interaction, but has anybody seen this behavior before? Please CC me > on replies. > Generally, DMA timeouts are seen quite frequently... MBR, Sergei ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [libata/sata_sil] Error on startup 2008-09-13 12:55 ` Sergei Shtylyov @ 2008-09-13 17:23 ` Alex Gonzalez 2008-09-15 8:57 ` Tejun Heo 0 siblings, 1 reply; 12+ messages in thread From: Alex Gonzalez @ 2008-09-13 17:23 UTC (permalink / raw) To: Sergei Shtylyov; +Cc: linux-ide Thanks for the tip Sergei. The timeout in question lasts several seconds and is unrecoverable, finishing with a bus error. I could be looking at a hardware problem, but I have tried different drives, cables and I even see it on different units. And most of the times everything works fine. Regards, Alex On Sat, Sep 13, 2008 at 1:55 PM, Sergei Shtylyov <sshtylyov@ru.mvista.com> wrote: > Hello. > > Alex Gonzalez wrote: >> >> I am seeing a problem with libata and the sata_sil driver. I was >> redirected here from the lkml so I hope it's the correct list for >> this. >> >> I am running a 2.6.21 SMP mips kernel and on some occasions the kernel >> halts on startup. This only happens at boot time, that is, once the >> system is running it never exhibits the problem. >> >> The problem occurs when trying to read the partitions from the SATA >> drive, on the first call to read_dev_sector(). As I understand it, the >> code finishes doing a sync_page() which starts an asynchronous I/O >> operation and sleeps until awaken by an interrupt generated by the >> SATA device. >> >> I have followed the code through until the call to ata_bmdma_start(), >> where it does: >> >> iowrite8(dmactl | ATA_DMA_START, ap->ioaddr.bmdma_addr + ATA_DMA_CMD); >> >> On a working case, the device interrupts and the flow continues. On a >> non working case, there is a delay of 5 or 6 seconds, and then a >> series of calls to sil_freeze(), ata_bmdma_error_handler() and >> ata_bmdma_stop(). >> >> I am about to delve into the ATA specs to try to understand this >> > > ATA spec. itself won't avail here, it doesn't cover host's DMA support. You > probably want to read the part concerning "PCI Compatibility and PCI-Native > Mode Bus Master Adapters" pf this document: > > http://www.t13.org/Documents/UploadedDocuments/project/d1510r1-Host-Adapter.pdf > >> interaction, but has anybody seen this behavior before? Please CC me >> on replies. >> > > Generally, DMA timeouts are seen quite frequently... > > MBR, Sergei > > > ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [libata/sata_sil] Error on startup 2008-09-13 17:23 ` Alex Gonzalez @ 2008-09-15 8:57 ` Tejun Heo 2008-09-15 9:49 ` Alex Gonzalez 0 siblings, 1 reply; 12+ messages in thread From: Tejun Heo @ 2008-09-15 8:57 UTC (permalink / raw) To: Alex Gonzalez; +Cc: Sergei Shtylyov, linux-ide Alex Gonzalez wrote: > Thanks for the tip Sergei. The timeout in question lasts several > seconds and is unrecoverable, finishing with a bus error. I could be > looking at a hardware problem, but I have tried different drives, > cables and I even see it on different units. > > And most of the times everything works fine. Care to post the kernel log, preferably with timestamps? -- tejun ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [libata/sata_sil] Error on startup 2008-09-15 8:57 ` Tejun Heo @ 2008-09-15 9:49 ` Alex Gonzalez 2008-09-15 10:10 ` Tejun Heo 0 siblings, 1 reply; 12+ messages in thread From: Alex Gonzalez @ 2008-09-15 9:49 UTC (permalink / raw) To: Tejun Heo; +Cc: Sergei Shtylyov, linux-ide Of course, An extract of the working kernel log, [4294670.795000] 0:<6>Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2 [4294670.796000] 0:<6>ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx [4294670.797000] 0:Initializing Phoenix PCMCIA IDE... [4294670.798000] 0:Phoenix PCMCIA configured as IDE interface = 0 [4294670.799000] 0:<7>Probing IDE interface ide0... [4294671.058000] 0:hda: CF CARD 2GB, 0:CFA DISK drive [4294671.672000] 0:ide0 at 0xbd0001f1-0xbd0001f8,0xbd0003f7 on irq 210: [4294671.674000] 0:<6>hda: max request size: 128KiB [4294671.675000] 0:<6>hda: 3964464 sectors (2029 MB)0: w/1KiB Cache0:, CHS=3933/16/630: [4294671.679000] 0:<6> hda:0: hda10: [4294671.721000] 0:<7>sata_sil 0000:00:01.0: version 2.1 [4294671.722000] 0:<6>sata_sil 0000:00:01.0: Applying R_ERR on DMA activate FIS errata fix [4294671.723000] 0:<6>ata1: SATA max UDMA/100 cmd 0xc0000080 ctl 0xc000008a bmdma 0xc0000000 irq 24 [4294671.724000] 0:<6>ata2: SATA max UDMA/100 cmd 0xc00000c0 ctl 0xc00000ca bmdma 0xc0000008 irq 24 [4294671.725000] 0:<6>scsi0 : sata_sil [4294672.180000] 3:<6>ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310) [4294672.205000] 3:<6>ata1.00: ATA-7: ST3160815AS, 4.AAB, max UDMA/133 [4294672.223000] 3:<6>ata1.00: 312581808 sectors, multi 0: LBA48 NCQ (depth 0/32) [4294672.248000] 3:<6>ata1.00: configured for UDMA/100 [4294672.262000] 0:<6>scsi1 : sata_sil [4294672.567000] 3:<6>ata2: SATA link down (SStatus 0 SControl 310) [4294672.585000] 0:<5>scsi 0:0:0:0: Direct-Access ATA ST3160815AS 4.AA PQ: 0 ANSI: 5 [4294672.586000] 0:<5>SCSI device sda: 312581808 512-byte hdwr sectors (160042 MB) [4294672.587000] 0:<5>sda: Write Protect is off [4294672.588000] 0:<7>sda: Mode Sense: 00 3a 00 00 [4294672.589000] 0:<5>SCSI device sda: write cache: enabled, read cache: enabled, doesn't support DPO or FUA [4294672.590000] 0:<5>SCSI device sda: 312581808 512-byte hdwr sectors (160042 MB) [4294672.591000] 0:<5>sda: Write Protect is off [4294672.592000] 0:<7>sda: Mode Sense: 00 3a 00 00 [4294672.593000] 0:<5>SCSI device sda: write cache: enabled, read cache: enabled, doesn't support DPO or FUA [4294672.594000] 0:<6> sda:0: sda10: sda20: sda30: sda40: [4294672.624000] 0:<5>sd 0:0:0:0: Attached scsi disk sda And the non working log, [4294670.790000] 0:<6>Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2 [4294670.791000] 0:<6>ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx [4294670.792000] 0:Initializing Phoenix PCMCIA IDE... [4294670.793000] 0:Phoenix PCMCIA configured as IDE interface = 0 [4294670.794000] 0:<7>Probing IDE interface ide0... [4294671.053000] 0:hda: CF CARD 2GB, 0:CFA DISK drive [4294671.667000] 0:ide0 at 0xbd0001f1-0xbd0001f8,0xbd0003f7 on irq 210: [4294671.669000] 0:<6>hda: max request size: 128KiB [4294671.670000] 0:<6>hda: 3964464 sectors (2029 MB)0: w/1KiB Cache0:, CHS=3933/16/630: [4294671.674000] 0:<6> hda:0: hda10: [4294671.680000] 0:<7>sata_sil 0000:00:01.0: version 2.1 [4294671.681000] 0:<6>sata_sil 0000:00:01.0: Applying R_ERR on DMA activate FIS errata fix [4294671.682000] 0:<6>ata1: SATA max UDMA/100 cmd 0xc0000080 ctl 0xc000008a bmdma 0xc0000000 irq 24 [4294671.683000] 0:<6>ata2: SATA max UDMA/100 cmd 0xc00000c0 ctl 0xc00000ca bmdma 0xc0000008 irq 24 [4294671.684000] 0:<6>scsi0 : sata_sil [4294672.142000] 2:<6>ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310) [4294672.165000] 2:<6>ata1.00: ATA-7: ST3160815AS, 4.AAB, max UDMA/133 [4294672.183000] 2:<6>ata1.00: 312581808 sectors, multi 0: LBA48 NCQ (depth 0/32) [4294672.208000] 2:<6>ata1.00: configured for UDMA/100 [4294672.222000] 0:<6>scsi1 : sata_sil [4294672.526000] 2:<6>ata2: SATA link down (SStatus 0 SControl 310) [4294672.544000] 0:<5>scsi 0:0:0:0: Direct-Access ATA ST3160815AS 4.AA PQ: 0 ANSI: 5 [4294672.545000] 0:<5>SCSI device sda: 312581808 512-byte hdwr sectors (160042 MB) [4294672.546000] 0:<5>sda: Write Protect is off [4294672.547000] 0:<7>sda: Mode Sense: 00 3a 00 00 [4294672.548000] 0:<5>SCSI device sda: write cache: enabled, read cache: enabled, doesn't support DPO or FUA [4294672.549000] 0:<5>SCSI device sda: 312581808 512-byte hdwr sectors (160042 MB) [4294672.550000] 0:<5>sda: Write Protect is off [4294672.551000] 0:<7>sda: Mode Sense: 00 3a 00 00 [4294672.552000] 0:<5>SCSI device sda: write cache: enabled, read cache: enabled, doesn't support DPO or FUA [4294672.553000] 0:<6> sda: < Added: Big pause around 10 secs> ********************************************* cpu_0 received a bus/cache error ********************************************* Bridge: Phys Addr = 0x0000000000, Device_AERR = 0x00000000 Bridge: The devices reporting AERR are: CPU: (XLR specific) Cache Error log = 0x0000007800004601, Phy Addr = 0x00f0000088 CPU: epc = 0x8340e324, errorepc = 0x835bccd4, cacheerr = 0x00000000 Can not handle bus/cache error - Halting cpu With some printks to follow the action: :<6> sda:0:msdos_partition: sector_size=1 0:read_dev_sector: mapping = 87946cac 0:__read_cache_page:cached_page=83a15580 0:read_cache_page: page=83a15580 0:lock_page 0:sync_page on mm/filemap.c 0:block_sync_page 0:blk_run_backing_dev() 0:blk_backing_dev_unplug() 0:scsi_request_fn 0:scsi_dispatch_cmd 0:ata_scsi_queuecmd 0:ata_scsi_translate 0:***ata_qc_prep*** 0:***ata_check_status*** 0:***ata_std_dev_select*** 0:***ata_check_status*** 0:***ata_check_status*** 0:***ata_tf_load 0:***ata_bmdma_setup*** 0:***ata_exec_command*** 0:***ata_bmdma_start*** 0:Dispatched cmd 8b64eba0 with rth 0 <PAUSE> 0:***sil_freeze()*** 0:***ata_bmdma_error_handler*** 0:***ata_bmdma_status*** 0:***ata_bmdma_stop*** ********************************************* cpu_0 received a bus/cache error ********************************************* Bridge: Phys Addr = 0x0000000000, Device_AERR = 0x00000000 Bridge: The devices reporting AERR are: CPU: (XLR specific) Cache Error log = 0x0000007800004601, Phy Addr = 0x00f0000088 CPU: epc = 0x83435dfc, errorepc = 0x835bf054, cacheerr = 0x00000000 Can not handle bus/cache error - Halting cpu Please tell me if a full trace would be better than these extracts, Regards, Alex On Mon, Sep 15, 2008 at 9:57 AM, Tejun Heo <tj@kernel.org> wrote: > Alex Gonzalez wrote: >> Thanks for the tip Sergei. The timeout in question lasts several >> seconds and is unrecoverable, finishing with a bus error. I could be >> looking at a hardware problem, but I have tried different drives, >> cables and I even see it on different units. >> >> And most of the times everything works fine. > > Care to post the kernel log, preferably with timestamps? > > -- > tejun > ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [libata/sata_sil] Error on startup 2008-09-15 9:49 ` Alex Gonzalez @ 2008-09-15 10:10 ` Tejun Heo 2008-09-15 10:20 ` Alex Gonzalez 0 siblings, 1 reply; 12+ messages in thread From: Tejun Heo @ 2008-09-15 10:10 UTC (permalink / raw) To: Alex Gonzalez; +Cc: Sergei Shtylyov, linux-ide Hello, Alex Gonzalez wrote: > :<6> sda:0:msdos_partition: sector_size=1 > 0:read_dev_sector: mapping = 87946cac > 0:__read_cache_page:cached_page=83a15580 > 0:read_cache_page: page=83a15580 > 0:lock_page > 0:sync_page on mm/filemap.c > 0:block_sync_page > 0:blk_run_backing_dev() > 0:blk_backing_dev_unplug() > 0:scsi_request_fn > 0:scsi_dispatch_cmd > 0:ata_scsi_queuecmd > 0:ata_scsi_translate > 0:***ata_qc_prep*** > 0:***ata_check_status*** > 0:***ata_std_dev_select*** > 0:***ata_check_status*** > 0:***ata_check_status*** > 0:***ata_tf_load > 0:***ata_bmdma_setup*** > 0:***ata_exec_command*** > 0:***ata_bmdma_start*** > 0:Dispatched cmd 8b64eba0 with rth 0 > > <PAUSE> > > 0:***sil_freeze()*** > 0:***ata_bmdma_error_handler*** This is libata error handling kicking in after detecting command timeout. > 0:***ata_bmdma_status*** > 0:***ata_bmdma_stop*** And EH tries to stop the bmdma engine. > ********************************************* > cpu_0 received a bus/cache error > ********************************************* > Bridge: Phys Addr = 0x0000000000, Device_AERR = 0x00000000 > Bridge: The devices reporting AERR are: > CPU: (XLR specific) Cache Error log = 0x0000007800004601, Phy Addr = > 0x00f0000088 > CPU: epc = 0x83435dfc, errorepc = 0x835bf054, cacheerr = 0x00000000 > Can not handle bus/cache error - Halting cpu Which triggers bus/cache error. Hmmm... When the booting works, does the drive keep working after boot? ie. if you put the drive under stress test, does the system survive. Also, error handling shouldn't trigger bus/cache error. Do you know what the bus/cache error means? Thanks. -- tejun ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [libata/sata_sil] Error on startup 2008-09-15 10:10 ` Tejun Heo @ 2008-09-15 10:20 ` Alex Gonzalez 2008-09-15 10:24 ` Tejun Heo 0 siblings, 1 reply; 12+ messages in thread From: Alex Gonzalez @ 2008-09-15 10:20 UTC (permalink / raw) To: Tejun Heo; +Cc: Sergei Shtylyov, linux-ide Hi Tejun, I haven't placed the drive under stress test, but it does work OK under normal conditions without errors or timeouts in the log. >From the cache error, I know that the physical region iomapped in 0xf0000000 - 0xf0000200 is the PCI memory space, so the exception is being caused by trying to access this area. I haven't yet looked deeply into it, I was assuming the action of stopping the bmdma engine might have triggered the exception. If not, why not receive an immediate exception without the 10secs timeout? Regards, Alex On Mon, Sep 15, 2008 at 11:10 AM, Tejun Heo <tj@kernel.org> wrote: > Hello, > > Alex Gonzalez wrote: >> :<6> sda:0:msdos_partition: sector_size=1 >> 0:read_dev_sector: mapping = 87946cac >> 0:__read_cache_page:cached_page=83a15580 >> 0:read_cache_page: page=83a15580 >> 0:lock_page >> 0:sync_page on mm/filemap.c >> 0:block_sync_page >> 0:blk_run_backing_dev() >> 0:blk_backing_dev_unplug() >> 0:scsi_request_fn >> 0:scsi_dispatch_cmd >> 0:ata_scsi_queuecmd >> 0:ata_scsi_translate >> 0:***ata_qc_prep*** >> 0:***ata_check_status*** >> 0:***ata_std_dev_select*** >> 0:***ata_check_status*** >> 0:***ata_check_status*** >> 0:***ata_tf_load >> 0:***ata_bmdma_setup*** >> 0:***ata_exec_command*** >> 0:***ata_bmdma_start*** >> 0:Dispatched cmd 8b64eba0 with rth 0 >> >> <PAUSE> >> >> 0:***sil_freeze()*** >> 0:***ata_bmdma_error_handler*** > > This is libata error handling kicking in after detecting command > timeout. > >> 0:***ata_bmdma_status*** >> 0:***ata_bmdma_stop*** > > And EH tries to stop the bmdma engine. > >> ********************************************* >> cpu_0 received a bus/cache error >> ********************************************* >> Bridge: Phys Addr = 0x0000000000, Device_AERR = 0x00000000 >> Bridge: The devices reporting AERR are: >> CPU: (XLR specific) Cache Error log = 0x0000007800004601, Phy Addr = >> 0x00f0000088 >> CPU: epc = 0x83435dfc, errorepc = 0x835bf054, cacheerr = 0x00000000 >> Can not handle bus/cache error - Halting cpu > > Which triggers bus/cache error. Hmmm... When the booting works, does > the drive keep working after boot? ie. if you put the drive under > stress test, does the system survive. Also, error handling shouldn't > trigger bus/cache error. Do you know what the bus/cache error means? > > Thanks. > > -- > tejun > ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [libata/sata_sil] Error on startup 2008-09-15 10:20 ` Alex Gonzalez @ 2008-09-15 10:24 ` Tejun Heo 2008-09-15 15:53 ` Alex Gonzalez 0 siblings, 1 reply; 12+ messages in thread From: Tejun Heo @ 2008-09-15 10:24 UTC (permalink / raw) To: Alex Gonzalez; +Cc: Sergei Shtylyov, linux-ide Hello, Alex Gonzalez wrote: > I haven't placed the drive under stress test, but it does work OK > under normal conditions without errors or timeouts in the log. Hmm... Okay. >>From the cache error, I know that the physical region iomapped in > 0xf0000000 - 0xf0000200 is the PCI memory space, so the exception is > being caused by trying to access this area. > > I haven't yet looked deeply into it, I was assuming the action of > stopping the bmdma engine might have triggered the exception. If not, > why not receive an immediate exception without the 10secs timeout? Yeah, it's caused by EH trying to stop the BMDMA engine but it shouldn't. It works well on other platforms. Can you try to diagnose the bus failure? Working EH will be able to tell us more about the problem. -- tejun ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [libata/sata_sil] Error on startup 2008-09-15 10:24 ` Tejun Heo @ 2008-09-15 15:53 ` Alex Gonzalez 2008-09-16 10:13 ` Sergei Shtylyov 0 siblings, 1 reply; 12+ messages in thread From: Alex Gonzalez @ 2008-09-15 15:53 UTC (permalink / raw) To: Tejun Heo; +Cc: Sergei Shtylyov, linux-ide Hi all, I got some more information regarding this issue. After reading "PCI Compatibility and PCI-Native Mode Bus Master Adapters" as pointed by Sergei, I printed the Bus Master ATA status register (offset 0x2), the PCI status/command (offset 0x4) register and the SATA ATA status (offset 0x104), control (offset 0x100) and error (offset 0x108) registers looking for clues. :***sil_freeze()*** 0:***ata_bmdma_status=00000002 0:PCI status=02b00016 0:SATA status 00000113 ctl 00000310 err 00000000 Everything looks fine to me, except maybe bits 1 and 4 of the PCI status. The Silicon Image Sil3512 datasheet specifies that bit 4 should be hardcoded to zero, and bit 0 (the IO space enable) is strangely zero, maybe it's set dynamically when needed? Following on Tejun's advise, I found out that just the delay of a printk in ata_bmdma_stop(), right after the iowrite8, gets rid of the cache error and it's replaced by this: [4294671.719000] 0:<7>sata_sil 0000:00:01.0: version 2.1 [4294671.720000] 0:<6>sata_sil 0000:00:01.0: Applying R_ERR on DMA activate FIS errata fix [4294671.722000] 0:<6>ata1: SATA max UDMA/100 cmd 0xc0000080 ctl 0xc000008a bmdma 0xc0000000 irq 24 [4294671.724000] 0:<6>ata2: SATA max UDMA/100 cmd 0xc00000c0 ctl 0xc00000ca bmdma 0xc0000008 irq 24 [4294671.725000] 0:<6>scsi0 : sata_sil [4294672.182000] 2:<6>ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310) [4294672.205000] 2:drivers/ata/libata-sff.c:368: Delay [4294672.219000] 2:<6>ata1.00: ATA-7: ST3160815AS, 4.AAB, max UDMA/133 [4294672.238000] 2:<6>ata1.00: 312581808 sectors, multi 0: LBA48 NCQ (depth 0/32) [4294672.259000] 2:drivers/ata/libata-sff.c:368: Delay [4294672.277000] 2:drivers/ata/libata-sff.c:368: Delay [4294672.291000] 2:<6>ata1.00: configured for UDMA/100 [4294672.306000] 0:<6>scsi1 : sata_sil [4294672.610000] 2:<6>ata2: SATA link down (SStatus 0 SControl 310) [4294672.628000] 0:<5>scsi 0:0:0:0: Direct-Access ATA ST3160815AS 4.AA PQ: 0 ANSI: 5 [4294672.630000] 0:<5>SCSI device sda: 312581808 512-byte hdwr sectors (160042 MB) [4294672.631000] 0:<5>sda: Write Protect is off [4294672.632000] 0:<7>sda: Mode Sense: 00 3a 00 00 [4294672.633000] 0:<5>SCSI device sda: write cache: enabled, read cache: enabled, doesn't support DPO or FUA [4294672.634000] 0:<5>SCSI device sda: 312581808 512-byte hdwr sectors (160042 MB) [4294672.635000] 0:<5>sda: Write Protect is off [4294672.636000] 0:<7>sda: Mode Sense: 00 3a 00 00 [4294672.637000] 0:<5>SCSI device sda: write cache: enabled, read cache: enabled, doesn't support DPO or FUA [4294672.638000] 0:<6> sda:0:drivers/ata/libata-sff.c:368: Delay [4294702.640000] 0:<3>ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0 [4294702.641000] 0:<3>ata1.00: cmd c8/00:08:00:00:00/00:00:00:00:00/e0 tag 0 cdb 0x0 data 4096 in [4294702.641000] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x20 (host bus error) [4294702.646000] 0:<4>ATA: abnormal status 0x58 on port 0xc0000087 [4294702.651000] 0:<4>ATA: abnormal status 0x58 on port 0xc0000087 [4294702.652000] 0:<4>ATA: abnormal status 0x58 on port 0xc0000087 [4294702.652000] 0:<4>ATA: abnormal status 0x58 on port 0xc0000087 [4294702.652000] 0:<4>ATA: abnormal status 0x58 on port 0xc0000087 [4294702.652000] 0:<4>ATA: abnormal status 0x58 on port 0xc0000087 and never comes back. I'll try to decode the ATA error see where it takes me next. Thanks for your continued help and patience, Alex On Mon, Sep 15, 2008 at 11:24 AM, Tejun Heo <tj@kernel.org> wrote: > Hello, > > Alex Gonzalez wrote: >> I haven't placed the drive under stress test, but it does work OK >> under normal conditions without errors or timeouts in the log. > > Hmm... Okay. > >>>From the cache error, I know that the physical region iomapped in >> 0xf0000000 - 0xf0000200 is the PCI memory space, so the exception is >> being caused by trying to access this area. >> >> I haven't yet looked deeply into it, I was assuming the action of >> stopping the bmdma engine might have triggered the exception. If not, >> why not receive an immediate exception without the 10secs timeout? > > Yeah, it's caused by EH trying to stop the BMDMA engine but it > shouldn't. It works well on other platforms. Can you try to diagnose > the bus failure? Working EH will be able to tell us more about the > problem. > > -- > tejun > ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [libata/sata_sil] Error on startup 2008-09-15 15:53 ` Alex Gonzalez @ 2008-09-16 10:13 ` Sergei Shtylyov 2008-09-18 10:30 ` Alex Gonzalez 0 siblings, 1 reply; 12+ messages in thread From: Sergei Shtylyov @ 2008-09-16 10:13 UTC (permalink / raw) To: Alex Gonzalez; +Cc: Tejun Heo, linux-ide Hello. Alex Gonzalez wrote: > I got some more information regarding this issue. > > After reading "PCI Compatibility and PCI-Native Mode Bus Master > Adapters" as pointed by Sergei, I printed the Bus Master ATA status > register (offset 0x2), the PCI status/command (offset 0x4) register > and the SATA ATA status (offset 0x104), control (offset 0x100) and > error (offset 0x108) registers looking for clues. > > :***sil_freeze()*** > 0:***ata_bmdma_status=00000002 > This status means that transfer ended due to a PCI error. > 0:PCI status=02b00016 > 0:SATA status 00000113 ctl 00000310 err 00000000 > > Everything looks fine to me, except maybe bits 1 and 4 of the PCI > status. The Silicon Image Sil3512 datasheet specifies that bit 4 > should be hardcoded to zero, and bit 0 (the IO space enable) is > strangely zero, maybe it's set dynamically when needed? > No, it's setn when enabling the dvice if it has BARs in I/O space. Or at least it should be... > Following on Tejun's advise, I found out that just the delay of a > printk in ata_bmdma_stop(), right after the iowrite8, gets rid of the > cache error and it's replaced by this: > Hm, maybe it was the cache error was what stopped the DMA transfer? > [4294671.719000] 0:<7>sata_sil 0000:00:01.0: version 2.1 > [4294671.720000] 0:<6>sata_sil 0000:00:01.0: Applying R_ERR on DMA > activate FIS errata fix > [4294671.722000] 0:<6>ata1: SATA max UDMA/100 cmd 0xc0000080 ctl > 0xc000008a bmdma 0xc0000000 irq 24 > [4294671.724000] 0:<6>ata2: SATA max UDMA/100 cmd 0xc00000c0 ctl > 0xc00000ca bmdma 0xc0000008 irq 24 > Memory mapped registers are used by the driver. So, probably I/O space was not enabled on purpose -- can't find the implementation of pci_enable_device() right now... > [4294671.725000] 0:<6>scsi0 : sata_sil > [4294672.182000] 2:<6>ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310) > [4294672.205000] 2:drivers/ata/libata-sff.c:368: Delay > [4294672.219000] 2:<6>ata1.00: ATA-7: ST3160815AS, 4.AAB, max UDMA/133 > [4294672.238000] 2:<6>ata1.00: 312581808 sectors, multi 0: LBA48 NCQ > (depth 0/32) > [4294672.259000] 2:drivers/ata/libata-sff.c:368: Delay > [4294672.277000] 2:drivers/ata/libata-sff.c:368: Delay > [4294672.291000] 2:<6>ata1.00: configured for UDMA/100 > [4294672.306000] 0:<6>scsi1 : sata_sil > [4294672.610000] 2:<6>ata2: SATA link down (SStatus 0 SControl 310) > [4294672.628000] 0:<5>scsi 0:0:0:0: Direct-Access ATA > ST3160815AS 4.AA PQ: 0 ANSI: 5 > [4294672.630000] 0:<5>SCSI device sda: 312581808 512-byte hdwr sectors > (160042 MB) > [4294672.631000] 0:<5>sda: Write Protect is off > [4294672.632000] 0:<7>sda: Mode Sense: 00 3a 00 00 > [4294672.633000] 0:<5>SCSI device sda: write cache: enabled, read > cache: enabled, doesn't support DPO or FUA > [4294672.634000] 0:<5>SCSI device sda: 312581808 512-byte hdwr sectors > (160042 MB) > [4294672.635000] 0:<5>sda: Write Protect is off > [4294672.636000] 0:<7>sda: Mode Sense: 00 3a 00 00 > [4294672.637000] 0:<5>SCSI device sda: write cache: enabled, read > cache: enabled, doesn't support DPO or FUA > [4294672.638000] 0:<6> sda:0:drivers/ata/libata-sff.c:368: Delay > [4294702.640000] 0:<3>ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0 > [4294702.641000] 0:<3>ata1.00: cmd c8/00:08:00:00:00/00:00:00:00:00/e0 > tag 0 cdb 0x0 data 4096 in > The command failed was READ DMA. > [4294702.641000] res 40/00:00:00:00:00/00:00:00:00:00/00 > Emask 0x20 (host bus error) > Hum, the output IDE taskfile looks strange... status of 0x40 and all 0s in other regs... > [4294702.646000] 0:<4>ATA: abnormal status 0x58 on port 0xc0000087 > [4294702.651000] 0:<4>ATA: abnormal status 0x58 on port 0xc0000087 > [4294702.652000] 0:<4>ATA: abnormal status 0x58 on port 0xc0000087 > [4294702.652000] 0:<4>ATA: abnormal status 0x58 on port 0xc0000087 > [4294702.652000] 0:<4>ATA: abnormal status 0x58 on port 0xc0000087 > [4294702.652000] 0:<4>ATA: abnormal status 0x58 on port 0xc0000087 > Hum, it's a status characterisitc for an unfinished PIO transfer. > and never comes back. > > I'll try to decode the ATA error see where it takes me next. > Did that for you. :-) BTW, the tables at http://ata.wiki.kernel.org/index.php/Libata_error_messages lack the explanation of a number after the last slash... > Thanks for your continued help and patience, > Alex > MBR, Sergei ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [libata/sata_sil] Error on startup 2008-09-16 10:13 ` Sergei Shtylyov @ 2008-09-18 10:30 ` Alex Gonzalez 2008-11-10 10:02 ` Alex Gonzalez 0 siblings, 1 reply; 12+ messages in thread From: Alex Gonzalez @ 2008-09-18 10:30 UTC (permalink / raw) To: Sergei Shtylyov; +Cc: Tejun Heo, linux-ide Thanks for that Sergei. To this point there is little more information we can get from a software perspective. I am handing this over to hardware design. I'll make sure I'll update the thread when a solution is found, if only to assert it as a hardware design issue. Thanks for all your help, Alex On Tue, Sep 16, 2008 at 11:13 AM, Sergei Shtylyov <sshtylyov@ru.mvista.com> wrote: > Hello. > > Alex Gonzalez wrote: > >> I got some more information regarding this issue. >> >> After reading "PCI Compatibility and PCI-Native Mode Bus Master >> Adapters" as pointed by Sergei, I printed the Bus Master ATA status >> register (offset 0x2), the PCI status/command (offset 0x4) register >> and the SATA ATA status (offset 0x104), control (offset 0x100) and >> error (offset 0x108) registers looking for clues. >> >> :***sil_freeze()*** >> 0:***ata_bmdma_status=00000002 >> > > This status means that transfer ended due to a PCI error. > >> 0:PCI status=02b00016 >> 0:SATA status 00000113 ctl 00000310 err 00000000 >> >> Everything looks fine to me, except maybe bits 1 and 4 of the PCI >> status. The Silicon Image Sil3512 datasheet specifies that bit 4 >> should be hardcoded to zero, and bit 0 (the IO space enable) is >> strangely zero, maybe it's set dynamically when needed? >> > > No, it's setn when enabling the dvice if it has BARs in I/O space. Or at > least it should be... > >> Following on Tejun's advise, I found out that just the delay of a >> printk in ata_bmdma_stop(), right after the iowrite8, gets rid of the >> cache error and it's replaced by this: >> > > Hm, maybe it was the cache error was what stopped the DMA transfer? > >> [4294671.719000] 0:<7>sata_sil 0000:00:01.0: version 2.1 >> [4294671.720000] 0:<6>sata_sil 0000:00:01.0: Applying R_ERR on DMA >> activate FIS errata fix >> [4294671.722000] 0:<6>ata1: SATA max UDMA/100 cmd 0xc0000080 ctl >> 0xc000008a bmdma 0xc0000000 irq 24 >> [4294671.724000] 0:<6>ata2: SATA max UDMA/100 cmd 0xc00000c0 ctl >> 0xc00000ca bmdma 0xc0000008 irq 24 >> > > Memory mapped registers are used by the driver. So, probably I/O space was > not enabled on purpose -- can't find the implementation of > pci_enable_device() right now... > >> [4294671.725000] 0:<6>scsi0 : sata_sil >> [4294672.182000] 2:<6>ata1: SATA link up 1.5 Gbps (SStatus 113 SControl >> 310) >> [4294672.205000] 2:drivers/ata/libata-sff.c:368: Delay >> [4294672.219000] 2:<6>ata1.00: ATA-7: ST3160815AS, 4.AAB, max UDMA/133 >> [4294672.238000] 2:<6>ata1.00: 312581808 sectors, multi 0: LBA48 NCQ >> (depth 0/32) >> [4294672.259000] 2:drivers/ata/libata-sff.c:368: Delay >> [4294672.277000] 2:drivers/ata/libata-sff.c:368: Delay >> [4294672.291000] 2:<6>ata1.00: configured for UDMA/100 >> [4294672.306000] 0:<6>scsi1 : sata_sil >> [4294672.610000] 2:<6>ata2: SATA link down (SStatus 0 SControl 310) >> [4294672.628000] 0:<5>scsi 0:0:0:0: Direct-Access ATA >> ST3160815AS 4.AA PQ: 0 ANSI: 5 >> [4294672.630000] 0:<5>SCSI device sda: 312581808 512-byte hdwr sectors >> (160042 MB) >> [4294672.631000] 0:<5>sda: Write Protect is off >> [4294672.632000] 0:<7>sda: Mode Sense: 00 3a 00 00 >> [4294672.633000] 0:<5>SCSI device sda: write cache: enabled, read >> cache: enabled, doesn't support DPO or FUA >> [4294672.634000] 0:<5>SCSI device sda: 312581808 512-byte hdwr sectors >> (160042 MB) >> [4294672.635000] 0:<5>sda: Write Protect is off >> [4294672.636000] 0:<7>sda: Mode Sense: 00 3a 00 00 >> [4294672.637000] 0:<5>SCSI device sda: write cache: enabled, read >> cache: enabled, doesn't support DPO or FUA >> [4294672.638000] 0:<6> sda:0:drivers/ata/libata-sff.c:368: Delay >> [4294702.640000] 0:<3>ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 >> action 0x0 >> [4294702.641000] 0:<3>ata1.00: cmd c8/00:08:00:00:00/00:00:00:00:00/e0 >> tag 0 cdb 0x0 data 4096 in >> > > The command failed was READ DMA. > >> [4294702.641000] res 40/00:00:00:00:00/00:00:00:00:00/00 >> Emask 0x20 (host bus error) >> > > Hum, the output IDE taskfile looks strange... status of 0x40 and all 0s in > other regs... > >> [4294702.646000] 0:<4>ATA: abnormal status 0x58 on port 0xc0000087 >> [4294702.651000] 0:<4>ATA: abnormal status 0x58 on port 0xc0000087 >> [4294702.652000] 0:<4>ATA: abnormal status 0x58 on port 0xc0000087 >> [4294702.652000] 0:<4>ATA: abnormal status 0x58 on port 0xc0000087 >> [4294702.652000] 0:<4>ATA: abnormal status 0x58 on port 0xc0000087 >> [4294702.652000] 0:<4>ATA: abnormal status 0x58 on port 0xc0000087 >> > > Hum, it's a status characterisitc for an unfinished PIO transfer. > >> and never comes back. >> >> I'll try to decode the ATA error see where it takes me next. >> > > Did that for you. :-) > BTW, the tables at > http://ata.wiki.kernel.org/index.php/Libata_error_messages lack the > explanation of a number after the last slash... > >> Thanks for your continued help and patience, >> Alex >> > > MBR, Sergei > > > ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [libata/sata_sil] Error on startup 2008-09-18 10:30 ` Alex Gonzalez @ 2008-11-10 10:02 ` Alex Gonzalez 0 siblings, 0 replies; 12+ messages in thread From: Alex Gonzalez @ 2008-11-10 10:02 UTC (permalink / raw) To: Sergei Shtylyov; +Cc: Tejun Heo, linux-ide Hi, I finally have an update on this issue. It all boiled down to an errata on the XLR CPU (PA1041.1 - PCI-X host mode can only use BAR0 in 32-bit mode ). The bootloader had a workaround applied for it, but it had a bug when applying the workaround to specific CPU revisions, and unfortunately the revision I was using needed the fix but didn't have it applied. The result was the PCI subblock sometimes not recognizing the correct addressing. Thanks for your help, really appreciated. Alex On Thu, Sep 18, 2008 at 10:30 AM, Alex Gonzalez <langabe@gmail.com> wrote: > Thanks for that Sergei. > > To this point there is little more information we can get from a > software perspective. I am handing this over to hardware design. I'll > make sure I'll update the thread when a solution is found, if only to > assert it as a hardware design issue. > > Thanks for all your help, > Alex > > On Tue, Sep 16, 2008 at 11:13 AM, Sergei Shtylyov > <sshtylyov@ru.mvista.com> wrote: >> Hello. >> >> Alex Gonzalez wrote: >> >>> I got some more information regarding this issue. >>> >>> After reading "PCI Compatibility and PCI-Native Mode Bus Master >>> Adapters" as pointed by Sergei, I printed the Bus Master ATA status >>> register (offset 0x2), the PCI status/command (offset 0x4) register >>> and the SATA ATA status (offset 0x104), control (offset 0x100) and >>> error (offset 0x108) registers looking for clues. >>> >>> :***sil_freeze()*** >>> 0:***ata_bmdma_status=00000002 >>> >> >> This status means that transfer ended due to a PCI error. >> >>> 0:PCI status=02b00016 >>> 0:SATA status 00000113 ctl 00000310 err 00000000 >>> >>> Everything looks fine to me, except maybe bits 1 and 4 of the PCI >>> status. The Silicon Image Sil3512 datasheet specifies that bit 4 >>> should be hardcoded to zero, and bit 0 (the IO space enable) is >>> strangely zero, maybe it's set dynamically when needed? >>> >> >> No, it's setn when enabling the dvice if it has BARs in I/O space. Or at >> least it should be... >> >>> Following on Tejun's advise, I found out that just the delay of a >>> printk in ata_bmdma_stop(), right after the iowrite8, gets rid of the >>> cache error and it's replaced by this: >>> >> >> Hm, maybe it was the cache error was what stopped the DMA transfer? >> >>> [4294671.719000] 0:<7>sata_sil 0000:00:01.0: version 2.1 >>> [4294671.720000] 0:<6>sata_sil 0000:00:01.0: Applying R_ERR on DMA >>> activate FIS errata fix >>> [4294671.722000] 0:<6>ata1: SATA max UDMA/100 cmd 0xc0000080 ctl >>> 0xc000008a bmdma 0xc0000000 irq 24 >>> [4294671.724000] 0:<6>ata2: SATA max UDMA/100 cmd 0xc00000c0 ctl >>> 0xc00000ca bmdma 0xc0000008 irq 24 >>> >> >> Memory mapped registers are used by the driver. So, probably I/O space was >> not enabled on purpose -- can't find the implementation of >> pci_enable_device() right now... >> >>> [4294671.725000] 0:<6>scsi0 : sata_sil >>> [4294672.182000] 2:<6>ata1: SATA link up 1.5 Gbps (SStatus 113 SControl >>> 310) >>> [4294672.205000] 2:drivers/ata/libata-sff.c:368: Delay >>> [4294672.219000] 2:<6>ata1.00: ATA-7: ST3160815AS, 4.AAB, max UDMA/133 >>> [4294672.238000] 2:<6>ata1.00: 312581808 sectors, multi 0: LBA48 NCQ >>> (depth 0/32) >>> [4294672.259000] 2:drivers/ata/libata-sff.c:368: Delay >>> [4294672.277000] 2:drivers/ata/libata-sff.c:368: Delay >>> [4294672.291000] 2:<6>ata1.00: configured for UDMA/100 >>> [4294672.306000] 0:<6>scsi1 : sata_sil >>> [4294672.610000] 2:<6>ata2: SATA link down (SStatus 0 SControl 310) >>> [4294672.628000] 0:<5>scsi 0:0:0:0: Direct-Access ATA >>> ST3160815AS 4.AA PQ: 0 ANSI: 5 >>> [4294672.630000] 0:<5>SCSI device sda: 312581808 512-byte hdwr sectors >>> (160042 MB) >>> [4294672.631000] 0:<5>sda: Write Protect is off >>> [4294672.632000] 0:<7>sda: Mode Sense: 00 3a 00 00 >>> [4294672.633000] 0:<5>SCSI device sda: write cache: enabled, read >>> cache: enabled, doesn't support DPO or FUA >>> [4294672.634000] 0:<5>SCSI device sda: 312581808 512-byte hdwr sectors >>> (160042 MB) >>> [4294672.635000] 0:<5>sda: Write Protect is off >>> [4294672.636000] 0:<7>sda: Mode Sense: 00 3a 00 00 >>> [4294672.637000] 0:<5>SCSI device sda: write cache: enabled, read >>> cache: enabled, doesn't support DPO or FUA >>> [4294672.638000] 0:<6> sda:0:drivers/ata/libata-sff.c:368: Delay >>> [4294702.640000] 0:<3>ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 >>> action 0x0 >>> [4294702.641000] 0:<3>ata1.00: cmd c8/00:08:00:00:00/00:00:00:00:00/e0 >>> tag 0 cdb 0x0 data 4096 in >>> >> >> The command failed was READ DMA. >> >>> [4294702.641000] res 40/00:00:00:00:00/00:00:00:00:00/00 >>> Emask 0x20 (host bus error) >>> >> >> Hum, the output IDE taskfile looks strange... status of 0x40 and all 0s in >> other regs... >> >>> [4294702.646000] 0:<4>ATA: abnormal status 0x58 on port 0xc0000087 >>> [4294702.651000] 0:<4>ATA: abnormal status 0x58 on port 0xc0000087 >>> [4294702.652000] 0:<4>ATA: abnormal status 0x58 on port 0xc0000087 >>> [4294702.652000] 0:<4>ATA: abnormal status 0x58 on port 0xc0000087 >>> [4294702.652000] 0:<4>ATA: abnormal status 0x58 on port 0xc0000087 >>> [4294702.652000] 0:<4>ATA: abnormal status 0x58 on port 0xc0000087 >>> >> >> Hum, it's a status characterisitc for an unfinished PIO transfer. >> >>> and never comes back. >>> >>> I'll try to decode the ATA error see where it takes me next. >>> >> >> Did that for you. :-) >> BTW, the tables at >> http://ata.wiki.kernel.org/index.php/Libata_error_messages lack the >> explanation of a number after the last slash... >> >>> Thanks for your continued help and patience, >>> Alex >>> >> >> MBR, Sergei >> >> >> > ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2008-11-10 10:02 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <c58a7a270809120334r16164e3ej9f8ab769a09742ed@mail.gmail.com>
2008-09-12 11:27 ` [libata/sata_sil] Error on startup Alex Gonzalez
2008-09-13 12:55 ` Sergei Shtylyov
2008-09-13 17:23 ` Alex Gonzalez
2008-09-15 8:57 ` Tejun Heo
2008-09-15 9:49 ` Alex Gonzalez
2008-09-15 10:10 ` Tejun Heo
2008-09-15 10:20 ` Alex Gonzalez
2008-09-15 10:24 ` Tejun Heo
2008-09-15 15:53 ` Alex Gonzalez
2008-09-16 10:13 ` Sergei Shtylyov
2008-09-18 10:30 ` Alex Gonzalez
2008-11-10 10:02 ` Alex Gonzalez
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).