public inbox for linux-kernel@vger.kernel.org
* VIA SATA Raid needs a long time to recover from suspend
@ 2005-11-16  3:37 Phillip Susi
  2005-11-17  1:06 ` Andrew Morton
  0 siblings, 1 reply; 5+ messages in thread
From: Phillip Susi @ 2005-11-16  3:37 UTC (permalink / raw)
  To: linux-kernel

I have been debugging a power management problem for a few days now, and 
I believe I have finally solved the problem.  Because it involved 
patching the kernel, I felt I should share the fix here in hopes that it 
can be improved and/or integrated into future kernels.  Right now I am 
running 2.6.14.2 on amd64, compiled myself, with the ubuntu breezy amd64 
distribution. 

First I'll state the fix.  It involved changing two lines in 
include/linux/libata.h:

static inline u8 ata_busy_wait(struct ata_port *ap, unsigned int bits,
                   unsigned int max)
{
    u8 status;

    do {
        udelay(100);    /* changed to 100 from 10 */
        status = ata_chk_status(ap);
        max--;
    } while ((status & bits) && (max > 0));

    return status;
}

and:

static inline u8 ata_wait_idle(struct ata_port *ap)
{
    u8 status = ata_busy_wait(ap, ATA_BUSY | ATA_DRQ,
                  10000);    /* changed to 10,000 from 1,000 */

    if (status & (ATA_BUSY | ATA_DRQ)) {
        unsigned long l = ap->ioaddr.status_addr;
        printk(KERN_WARNING
               "ATA: abnormal status 0x%X on port 0x%lX\n",
               status, l);
    }

    return status;
}

The problem seems to be that my VIA SATA raid controller requires more 
time to recover from being suspended.  It looks like the code in 
sata_via.c restores the task file after a resume, then calls 
ata_wait_idle to wait for the busy bit to clear.  The problem was that 
this function timed out before the busy bit cleared, resulting in 
messages like this:

ATA: abnormal status 0x80 on port 0xE007

Then if there was an IO request made immediately after resuming, it 
would time out and fail, because it was issued before the hardware was 
ready.  Increasing the timeout resolved this.  I tried increasing it via 
both the udelay and the ata_busy_wait lines, and it did not seem to 
matter which I changed, as long as the total timeout was increased by a 
factor of 100. 

Since increasing the maximum timeout, suspend and hibernate work great 
for me.  This bug may have exposed another bug, which I will mention now 
in passing.  As I said before, if an IO request was made immediately 
after a resume ( before the busy bit finally did clear ), it would time 
out and fail.  The kernel seemed to fill the buffer cache for the 
requested block with garbage rather than retry the read.  It seems to me 
that at some point the read should have been retried.  The symptoms of 
this were:

1) When suspend.sh called resume.sh immediately after the echo mem > 
/sys/power/state line, then on resume, a read would fail in a block of 
the reiserfs tree that was required to look up the resume.sh file.  This 
caused reiserfs to complain about errors in the node, and the script 
failed to execute.  Further attempts to access the script, even with ls 
-al /etc/acpi/resume.sh, failed with EPERM.  I would think that at worst 
this should fail with EIO or something, not EPERM. 

2) At one point I tried running echo mem > /sys/power/state ; df.  After 
the resume, the IO read failed when trying to load df, and I got an 
error message saying the kernel could not execute the binary file.  
Further attempts to run df failed also.  Other IO at this point was fine. 

This leads me to think that when the IO failed, rather than inform the 
calling code of the failure, for example, with an EIO status, the buffer 
cache got filled with junk, and this should not happen.  Either the 
operation should succeed, and the correct data be returned, or it should 
fail, and the caller should be informed of the failure, and not given 
incorrect data. 

When the first IO immediately following the suspend failed, I got these 
messages:

[   32.013538] ata1: command 0x35 timeout, stat 0x50 host_stat 0x1
[   32.045510] ata2: command 0x35 timeout, stat 0x50 host_stat 0x1

As long as no IO was immediately requested after the resume ( i.e. if I 
echo mem > /sys/power/state on an otherwise idle system, rather than 
using suspend.sh ) then these errors did not happen, only the abnormal 
status messages did. 

For reference, my system is configured as follows:

Motherboard: Asus K8V Deluxe
CPU: AMD Athlon 64 3200+
RAM: 1 GB of Corsair low latency pc3200 ddr sdram
Video: ATI Radeon 9800 Pro with a Samsung 930B 19 inch LCD display
Disks: 2 WD 36 gig SATA 10,000 rpm raptors in a raid0 configuration on 
the via sata raid controller
Partitions:

/dev/mapper/via_hfciifae1: 40 gig winxp NTFS partition
/dev/mapper/via_hfciifae3: 10 gig experimental partition
/dev/mapper/via_hfciifae5: 50 meg ext2 /boot partition
/dev/mapper/via_hfciifae6: 1 gig swap partition
/dev/mapper/via_hfciifae7: 22 gig reiserfs root partition

If anyone has any suggestions of further tests I can perform to narrow 
down the problem, or a better solution for it, you have my full 
cooperation.  If this fix seems acceptable, then I hope it can be merged 
in the next kernel release. 

PS> Please CC me on any replies, as I am not subscribed to this list





^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: VIA SATA Raid needs a long time to recover from suspend
  2005-11-16  3:37 VIA SATA Raid needs a long time to recover from suspend Phillip Susi
@ 2005-11-17  1:06 ` Andrew Morton
  2005-11-17  3:55   ` Phillip Susi
  0 siblings, 1 reply; 5+ messages in thread
From: Andrew Morton @ 2005-11-17  1:06 UTC (permalink / raw)
  To: Phillip Susi; +Cc: linux-kernel, Jeff Garzik

Phillip Susi <psusi@cfl.rr.com> wrote:
>
> I have been debugging a power management problem for a few days now, and 
> I believe I have finally solved the problem.  Because it involved 
> patching the kernel, I felt I should share the fix here in hopes that it 
> can be improved and/or integrated into future kernels.  Right now I am 
> running 2.6.14.2 on amd64, compiled myself, with the ubuntu breezy amd64 
> distribution. 
> 
> First I'll state the fix.  It involved changing two lines in 
> include/linux/libata.h:
> 
> static inline u8 ata_busy_wait(struct ata_port *ap, unsigned int bits,
>                    unsigned int max)
> {
>     u8 status;
> 
>     do {
>         udelay(100);    /* changed to 100 from 10 */
>         status = ata_chk_status(ap);
>         max--;
>     } while ((status & bits) && (max > 0));
> 
>     return status;
> }
> 
> and:
> 
> static inline u8 ata_wait_idle(struct ata_port *ap)
> {
>     u8 status = ata_busy_wait(ap, ATA_BUSY | ATA_DRQ,
>                   10000);    /* changed to 10,000 from 1,000 */
> 
>     if (status & (ATA_BUSY | ATA_DRQ)) {
>         unsigned long l = ap->ioaddr.status_addr;
>         printk(KERN_WARNING
>                "ATA: abnormal status 0x%X on port 0x%lX\n",
>                status, l);
>     }
> 
>     return status;
> }

This change will increase the minimum delay in both ata_wait_idle() and
ata_busy_wait() from 10 usec to 100 usec, which is not a good change.

It would be less damaging to increase the delay in ata_wait_idle() from
1000 to 100,000.  A one second spin is a bit sad, but the hardware's bust,
so that's the least of the user's worries.

The best fix would be to identify the specific _callers_ of these functions
which need their delay increased.   You can do that by adding:

	if (max == 0)
		dump_stack();

at the end of ata_busy_wait().  The resulting stack dumps will tell you
where the offending callsites are.  With luck, it's just one.  If we know
which one, that might point us at some chipset-level delay which we're
forgetting to do, or something like that.

> The problem seems to be that my VIA SATA raid controller requires more 
> time to recover from being suspended.  It looks like the code in 
> sata_via.c restores the task file after a resume, then calls 
> ata_wait_idle to wait for the busy bit to clear.  The problem was that 
> this function timed out before the busy bit cleared, resulting in 
> messages like this:
> 
> ATA: abnormal status 0x80 on port 0xE007
> 
> Then if there was an IO request made immediately after resuming, it 
> would timeout and fail, because it was issued before the hardware was 
> ready.  Changing the timeout resolved this.  I tried changing both the 
> udelay and ata_busy_wait lines to increase the timeout, and it did not 
> seem to matter which I changed, as long as the total timeout was 
> increased by a factor of 100. 
> 
> Since increasing the maximum timeout, suspend and hibernate work great 
> for me.  While experiencing this bug, it may have exposed another bug, 
> which I will mention now in passing.  As I said before, after a resume, 
> if there was an IO request made immediately ( before the busy bit 
> finally did clear ) it would timeout and fail.  It seemed the kernel 
> filled the buffer cache for the requested block with garbage rather than 
> retry the read.  It seems to me that at some point, the read should have 
> been retried.

Or should have returned an IO error.

Yes, this might be a driver bug.  Again, if we know precisely which
callsite is experiencing the timeout then we're better situated to fix it.



* Re: VIA SATA Raid needs a long time to recover from suspend
  2005-11-17  1:06 ` Andrew Morton
@ 2005-11-17  3:55   ` Phillip Susi
  2005-11-18  0:08     ` Mark Lord
  0 siblings, 1 reply; 5+ messages in thread
From: Phillip Susi @ 2005-11-17  3:55 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, Jeff Garzik

Andrew Morton wrote:

>This change will increase the minimum delay in both ata_wait_idle() and
>ata_busy_wait() from 10 usec to 100 usec, which is not a good change.
>
>It would be less damaging to increase the delay in ata_wait_idle() from
>1000 to 100,000.  A one second spin is a bit sad, but the hardware's bust,
>so that's the least of the user's worries.
>
>The best fix would be to identify the specific _callers_ of these functions
>which need their delay increased.   You can do that by adding:
>
>	if (max == 0)
>		dump_stack();
>
>at the end of ata_busy_wait().  The resulting stack dumps will tell you
>where the offending callsites are.  With luck, it's just one.  If we know
>which one, that might point us at some chipset-level delay which we're
>forgetting to do, or something like that.
>

Right, changing ata_wait_idle's maximum to 100,000 would be a better fix, 
as it would not affect the minimum wait time.  Thanks for the 
dump_stack() tip; I put that in, and this is the call stack dumped for 
the two timeouts following a resume ( though without the stack dump, 
there were 6 timeouts reported, 3 on each channel ):

[    2.650217] Call Trace:<ffffffff88048782>{:libata:ata_dev_select+102} 
<ffffffff880488fa>{:libata:ata_qc_issue_prot+31}
[    2.650244]        <ffffffff8804ba48>{:libata:ata_scsi_translate+172} 
<ffffffff880244bd>{:scsi_mod:scsi_done+0}
[    2.650269]        <ffffffff880244bd>{:scsi_mod:scsi_done+0} 
<ffffffff8804bd06>{:libata:ata_scsi_queuecmd+219}
[    2.650291]        <ffffffff88024686>{:scsi_mod:scsi_dispatch_cmd+433}
[    2.650305]        <ffffffff8802978e>{:scsi_mod:scsi_request_fn+611} 
<ffffffff80148ea3>{sync_page+0}
[    2.650326]        <ffffffff80224858>{generic_unplug_device+7} 
<ffffffff80224851>{generic_unplug_device+0}
[    2.650334]        <ffffffff8807c118>{:dm_mod:dm_table_unplug_all+49} 
<ffffffff8807a0f1>{:dm_mod:dm_unplug_all+29}
[    2.650354]        <ffffffff8807c118>{:dm_mod:dm_table_unplug_all+49} 
<ffffffff8807a0f1>{:dm_mod:dm_unplug_all+29}
[    2.650372]        <ffffffff80164a08>{block_sync_page+58} 
<ffffffff80148ed6>{sync_page+51}
[    2.650381]        <ffffffff8028eb11>{__wait_on_bit_lock+55} 
<ffffffff801490fd>{__lock_page+94}
[    2.650392]        <ffffffff8013d37f>{wake_bit_function+0} 
<ffffffff80149374>{find_get_page+12}
[    2.650402]        <ffffffff8014a4de>{do_generic_mapping_read+472} 
<ffffffff8014947f>{file_read_actor+0}
[    2.650410]        <ffffffff8014af0a>{__generic_file_aio_read+333} 
<ffffffff8014b0c0>{generic_file_read+187}
[    2.650433]        <ffffffff8013d274>{autoremove_wake_function+0} 
<ffffffff80163898>{vfs_read+200}
[    2.650453]        <ffffffff8016ba1f>{kernel_read+65} 
<ffffffff8016d4c7>{do_execve+291}
[    2.650464]        <ffffffff8010cc31>{sys_execve+51} 
<ffffffff80139bd8>{worker_thread+0}
[    2.650475]        <ffffffff8010ef34>{execve+100} 
<ffffffff80139bd8>{worker_thread+0}
[    2.650482]        <ffffffff801397bb>{__call_usermodehelper+0} 
<ffffffff80139abc>{____call_usermodehelper+134}
[    2.650496]        <ffffffff8010eec6>{child_rip+8} 
<ffffffff80139bd8>{worker_thread+0}
[    2.650504]        <ffffffff801397bb>{__call_usermodehelper+0} 
<ffffffff80139a36>{____call_usermodehelper+0}
[    2.650514]        <ffffffff8010eebe>{child_rip+0}
[    2.693624] ATA: abnormal status 0x80 on port 0xE007
[    2.693626]
[    2.693627] Call 
Trace:<ffffffff880488fa>{:libata:ata_qc_issue_prot+31} 
<ffffffff8804ba48>{:libata:ata_scsi_translate+172}
[    2.693646]        <ffffffff880244bd>{:scsi_mod:scsi_done+0} 
<ffffffff880244bd>{:scsi_mod:scsi_done+0}
[    2.693670]        <ffffffff8804bd06>{:libata:ata_scsi_queuecmd+219} 
<ffffffff88024686>{:scsi_mod:scsi_dispatch_cmd+433}
[    2.693694]        <ffffffff8802978e>{:scsi_mod:scsi_request_fn+611} 
<ffffffff80148ea3>{sync_page+0}
[    2.693712]        <ffffffff80224858>{generic_unplug_device+7} 
<ffffffff80224851>{generic_unplug_device+0}
[    2.693720]        <ffffffff8807c118>{:dm_mod:dm_table_unplug_all+49} 
<ffffffff8807a0f1>{:dm_mod:dm_unplug_all+29}
[    2.693738]        <ffffffff8807c118>{:dm_mod:dm_table_unplug_all+49} 
<ffffffff8807a0f1>{:dm_mod:dm_unplug_all+29}
[    2.693756]        <ffffffff80164a08>{block_sync_page+58} 
<ffffffff80148ed6>{sync_page+51}
[    2.693765]        <ffffffff8028eb11>{__wait_on_bit_lock+55} 
<ffffffff801490fd>{__lock_page+94}
[    2.693775]        <ffffffff8013d37f>{wake_bit_function+0} 
<ffffffff80149374>{find_get_page+12}
[    2.693785]        <ffffffff8014a4de>{do_generic_mapping_read+472} 
<ffffffff8014947f>{file_read_actor+0}
[    2.693793]        <ffffffff8014af0a>{__generic_file_aio_read+333} 
<ffffffff8014b0c0>{generic_file_read+187}
[    2.693816]        <ffffffff8013d274>{autoremove_wake_function+0} 
<ffffffff80163898>{vfs_read+200}
[    2.693835]        <ffffffff8016ba1f>{kernel_read+65} 
<ffffffff8016d4c7>{do_execve+291}
[    2.693846]        <ffffffff8010cc31>{sys_execve+51} 
<ffffffff80139bd8>{worker_thread+0}
[    2.693857]        <ffffffff8010ef34>{execve+100} 
<ffffffff80139bd8>{worker_thread+0}
[    2.693864]        <ffffffff801397bb>{__call_usermodehelper+0} 
<ffffffff80139abc>{____call_usermodehelper+134}
[    2.693878]        <ffffffff8010eec6>{child_rip+8} 
<ffffffff80139bd8>{worker_thread+0}
[    2.693885]        <ffffffff801397bb>{__call_usermodehelper+0} 
<ffffffff80139a36>{____call_usermodehelper+0}
[    2.693896]        <ffffffff8010eebe>{child_rip+0}


Maybe a better fix would be to insert a delay after the resume but 
before the driver even tries to access the hardware.  Maybe somewhere in 
a method in the DSDT?
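The idea above could also be prototyped in the driver itself rather than the DSDT.  The sketch below is hypothetical and untested: sata_via has no such hook as written, and the function name and the 500 ms figure are made up purely for illustration:

```c
/* Hypothetical sketch, not actual sata_via code: delay at the top of
 * the resume path so the controller has time to come out of suspend
 * before the taskfile is restored and ata_wait_idle() is called. */
static int svia_resume_delay(void)
{
	msleep(500);	/* 500 ms is a guess; the right value needs measuring */
	return 0;	/* then proceed with the normal taskfile restore */
}
```

If that made the abnormal-status messages disappear with the stock timeouts, it would confirm the problem is purely controller recovery time.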

>Or should have returned an IO error.
>
>Yes, this might be a driver bug.  Again, if we know precisely which
>callsite is experiencing the timeout then we're better situated to fix it.
>
I made the timeouts happen again by returning the timeouts to the way 
they were before, and issuing the command:

echo mem > /sys/power/state ; df

After the abnormal status messages, around 30 seconds went by, then this:

[   14.596710] ata1: command 0x25 timeout, stat 0x50 host_stat 0x1
[   14.596715]
[   14.596716] Call 
Trace:<ffffffff880498e2>{:libata:ata_eng_timeout+496} 
<ffffffff88027690>{:scsi_mod:scsi_error_handler+0}
[   14.596752]        <ffffffff8804af99>{:libata:ata_scsi_error+21} 
<ffffffff88027706>{:scsi_mod:scsi_error_handler+118}
[   14.596775]        <ffffffff88027690>{:scsi_mod:scsi_error_handler+0} 
<ffffffff88027690>{:scsi_mod:scsi_error_handler+0}
[   14.596803]        <ffffffff8013d0d8>{kthread+191} 
<ffffffff80139bd8>{worker_thread+0}
[   14.596814]        <ffffffff8010eec6>{child_rip+8} 
<ffffffff80139bd8>{worker_thread+0}
[   14.596822]        <ffffffff8013cf04>{keventd_create_kthread+0} 
<ffffffff8013d019>{kthread+0}
[   14.596833]        <ffffffff8010eebe>{child_rip+0}
[   14.597403] ReiserFS: warning: is_tree_node: node level 2295 does not 
match to the expected one 1
[   14.597408] ReiserFS: dm-5: warning: vs-5150: search_by_key: invalid 
format found in block 32828. Fsck?
[   14.597414] ReiserFS: dm-5: warning: vs-13070: 
reiserfs_read_locked_inode: i/o failure occurred trying to find stat 
data of [2 1455 0x0 SD]

Bash reports "command not found" when trying to run df, even now as I 
write this. 

I also noticed that during the 30 seconds or so that the system was 
trying to load df before libata timed out the request, the cpu was 
pegged at 100% utilization. 




* Re: VIA SATA Raid needs a long time to recover from suspend
  2005-11-17  3:55   ` Phillip Susi
@ 2005-11-18  0:08     ` Mark Lord
  2005-11-19 23:32       ` Pavel Machek
  0 siblings, 1 reply; 5+ messages in thread
From: Mark Lord @ 2005-11-18  0:08 UTC (permalink / raw)
  To: Phillip Susi; +Cc: Andrew Morton, linux-kernel, Jeff Garzik

Phillip Susi wrote:
> Andrew Morton wrote:
> 
>> This change will increase the minimum delay in both ata_wait_idle() and
>> ata_busy_wait() from 10 usec to 100 usec, which is not a good change.
>>
>> It would be less damaging to increase the delay in ata_wait_idle() from
>> 1000 to 100,000.  A one second spin is a bit sad, but the hardware's 
>> bust,

I wonder if this is the same problem that prevents resume-from-ram
from working on my system when I use an older hard drive,
rather than the newer model that came installed (notebook).

Whenever resume fails, the hard drive light is on solid
and the system is unresponsive.  And the backlight is off so no
debug info available (no serial ports, either).

Could be.. could be.

Cheers


* Re: VIA SATA Raid needs a long time to recover from suspend
  2005-11-18  0:08     ` Mark Lord
@ 2005-11-19 23:32       ` Pavel Machek
  0 siblings, 0 replies; 5+ messages in thread
From: Pavel Machek @ 2005-11-19 23:32 UTC (permalink / raw)
  To: Mark Lord; +Cc: Phillip Susi, Andrew Morton, linux-kernel, Jeff Garzik

Hi!
> >>This change will increase the minimum delay in both ata_wait_idle() and
> >>ata_busy_wait() from 10 usec to 100 usec, which is not a good change.
> >>
> >>It would be less damaging to increase the delay in ata_wait_idle() from
> >>1000 to 100,000.  A one second spin is a bit sad, but the hardware's 
> >>bust,
> 
> I wonder if this is the same problem that prevents resume-from-ram
> from working on my system when I use an older hard drive,
> rather than the newer model that came installed (notebook).
> 
> Whenever resume fails, the hard drive light is on solid
> and the system is unresponsive.  And the backlight is off so no
> debug info available (no serial ports, either).

Debugging this sucks, sorry.  Useful methods are the keyboard LEDs and
the system beeper; printk-over-morse comes to mind :-).
-- 
64 bytes from 195.113.31.123: icmp_seq=28 ttl=51 time=448769.1 ms         


