From: Phillip Susi <psusi@cfl.rr.com>
To: Andrew Morton <akpm@osdl.org>
Cc: linux-kernel@vger.kernel.org, Jeff Garzik <jgarzik@pobox.com>
Subject: Re: VIA SATA Raid needs a long time to recover from suspend
Date: Wed, 16 Nov 2005 22:55:54 -0500 [thread overview]
Message-ID: <437BFF4A.4060402@cfl.rr.com> (raw)
In-Reply-To: <20051116170642.313aeada.akpm@osdl.org>
Andrew Morton wrote:
>This change will increase the minimum delay in both ata_wait_idle() and
>ata_busy_wait() from 10 usec to 100 usec, which is not a good change.
>
>It would be less damaging to increase the delay in ata_wait_idle() from
>1000 to 100,000. A one second spin is a bit sad, but the hardware's bust,
>so that's the least of the user's worries.
>
>The best fix would be to identify the specific _callers_ of these functions
>which need their delay increased. You can do that by adding:
>
> if (max == 0)
> dump_stack();
>
>at the end of ata_busy_wait(). The resulting stack dumps will tell you
>where the offending callsites are. With luck, it's just one. If we know
>which one, that might point us at some chipset-level delay which we're
>forgetting to do, or something like that.
>
>
>
Right, changing ata_wait_idle's maximum to 100,000 would be a better fix
as it would not effect the minimum wait time. Thanks for the
dump_stack() tip, I put that in and this is the call stack dumped for
the two timeouts following a resume ( though without the stack dump,
there were 6 timeouts reported, 3 on each channel ):
[ 2.650217] Call Trace:<ffffffff88048782>{:libata:ata_dev_select+102}
<ffffffff880488fa>{:libata:ata_qc_issue_prot+31}
[ 2.650244] <ffffffff8804ba48>{:libata:ata_scsi_translate+172}
<ffffffff880244bd>{:scsi_mod:scsi_done+0}
[ 2.650269] <ffffffff880244bd>{:scsi_mod:scsi_done+0}
<ffffffff8804bd06>{:libata:ata_scsi_queuecmd+219}
[ 2.650291] <ffffffff88024686>{:scsi_mod:scsi_dispatch_cmd+433}
[ 2.650305] <ffffffff8802978e>{:scsi_mod:scsi_request_fn+611}
<ffffffff80148ea3>{sync_page+0}
[ 2.650326] <ffffffff80224858>{generic_unplug_device+7}
<ffffffff80224851>{generic_unplug_device+0}
[ 2.650334] <ffffffff8807c118>{:dm_mod:dm_table_unplug_all+49}
<ffffffff8807a0f1>{:dm_mod:dm_unplug_all+29}
[ 2.650354] <ffffffff8807c118>{:dm_mod:dm_table_unplug_all+49}
<ffffffff8807a0f1>{:dm_mod:dm_unplug_all+29}
[ 2.650372] <ffffffff80164a08>{block_sync_page+58}
<ffffffff80148ed6>{sync_page+51}
[ 2.650381] <ffffffff8028eb11>{__wait_on_bit_lock+55}
<ffffffff801490fd>{__lock_page+94}
[ 2.650392] <ffffffff8013d37f>{wake_bit_function+0}
<ffffffff80149374>{find_get_page+12}
[ 2.650402] <ffffffff8014a4de>{do_generic_mapping_read+472}
<ffffffff8014947f>{file_read_actor+0}
[ 2.650410] <ffffffff8014af0a>{__generic_file_aio_read+333}
<ffffffff8014b0c0>{generic_file_read+187}
[ 2.650433] <ffffffff8013d274>{autoremove_wake_function+0}
<ffffffff80163898>{vfs_read+200}
[ 2.650453] <ffffffff8016ba1f>{kernel_read+65}
<ffffffff8016d4c7>{do_execve+291}
[ 2.650464] <ffffffff8010cc31>{sys_execve+51}
<ffffffff80139bd8>{worker_thread+0}
[ 2.650475] <ffffffff8010ef34>{execve+100}
<ffffffff80139bd8>{worker_thread+0}
[ 2.650482] <ffffffff801397bb>{__call_usermodehelper+0}
<ffffffff80139abc>{____call_usermodehelper+134}
[ 2.650496] <ffffffff8010eec6>{child_rip+8}
<ffffffff80139bd8>{worker_thread+0}
[ 2.650504] <ffffffff801397bb>{__call_usermodehelper+0}
<ffffffff80139a36>{____call_usermodehelper+0}
[ 2.650514] <ffffffff8010eebe>{child_rip+0}
[ 2.693624] ATA: abnormal status 0x80 on port 0xE007
[ 2.693626]
[ 2.693627] Call
Trace:<ffffffff880488fa>{:libata:ata_qc_issue_prot+31}
<ffffffff8804ba48>{:libata:ata_scsi_translate+172}
[ 2.693646] <ffffffff880244bd>{:scsi_mod:scsi_done+0}
<ffffffff880244bd>{:scsi_mod:scsi_done+0}
[ 2.693670] <ffffffff8804bd06>{:libata:ata_scsi_queuecmd+219}
<ffffffff88024686>{:scsi_mod:scsi_dispatch_cmd+433}
[ 2.693694] <ffffffff8802978e>{:scsi_mod:scsi_request_fn+611}
<ffffffff80148ea3>{sync_page+0}
[ 2.693712] <ffffffff80224858>{generic_unplug_device+7}
<ffffffff80224851>{generic_unplug_device+0}
[ 2.693720] <ffffffff8807c118>{:dm_mod:dm_table_unplug_all+49}
<ffffffff8807a0f1>{:dm_mod:dm_unplug_all+29}
[ 2.693738] <ffffffff8807c118>{:dm_mod:dm_table_unplug_all+49}
<ffffffff8807a0f1>{:dm_mod:dm_unplug_all+29}
[ 2.693756] <ffffffff80164a08>{block_sync_page+58}
<ffffffff80148ed6>{sync_page+51}
[ 2.693765] <ffffffff8028eb11>{__wait_on_bit_lock+55}
<ffffffff801490fd>{__lock_page+94}
[ 2.693775] <ffffffff8013d37f>{wake_bit_function+0}
<ffffffff80149374>{find_get_page+12}
[ 2.693785] <ffffffff8014a4de>{do_generic_mapping_read+472}
<ffffffff8014947f>{file_read_actor+0}
[ 2.693793] <ffffffff8014af0a>{__generic_file_aio_read+333}
<ffffffff8014b0c0>{generic_file_read+187}
[ 2.693816] <ffffffff8013d274>{autoremove_wake_function+0}
<ffffffff80163898>{vfs_read+200}
[ 2.693835] <ffffffff8016ba1f>{kernel_read+65}
<ffffffff8016d4c7>{do_execve+291}
[ 2.693846] <ffffffff8010cc31>{sys_execve+51}
<ffffffff80139bd8>{worker_thread+0}
[ 2.693857] <ffffffff8010ef34>{execve+100}
<ffffffff80139bd8>{worker_thread+0}
[ 2.693864] <ffffffff801397bb>{__call_usermodehelper+0}
<ffffffff80139abc>{____call_usermodehelper+134}
[ 2.693878] <ffffffff8010eec6>{child_rip+8}
<ffffffff80139bd8>{worker_thread+0}
[ 2.693885] <ffffffff801397bb>{__call_usermodehelper+0}
<ffffffff80139a36>{____call_usermodehelper+0}
[ 2.693896] <ffffffff8010eebe>{child_rip+0}
Maybe a better fix would be to insert a delay after the resume but
before the driver even tries to access the hardware. Maybe somewhere in
a method in the DSDT?
>Or should have returned an IO error.
>
>Yes, this might be a driver bug. Again, if we know precisely which
>callsite is experiencing the timeout then we're better situated to fix it.
>
>
>
>
I made the timeouts happen again by returning the timeouts to the way
they were before, and issuing the command:
echo mem > /sys/power/state ; df
After the abnormal status messages, around 30 seconds went by, then this:
[ 14.596710] ata1: command 0x25 timeout, stat 0x50 host_stat 0x1
[ 14.596715]
[ 14.596716] Call
Trace:<ffffffff880498e2>{:libata:ata_eng_timeout+496}
<ffffffff88027690>{:scsi_mod:scsi_error_handler+0}
[ 14.596752] <ffffffff8804af99>{:libata:ata_scsi_error+21}
<ffffffff88027706>{:scsi_mod:scsi_error_handler+118}
[ 14.596775] <ffffffff88027690>{:scsi_mod:scsi_error_handler+0}
<ffffffff88027690>{:scsi_mod:scsi_error_handler+0}
[ 14.596803] <ffffffff8013d0d8>{kthread+191}
<ffffffff80139bd8>{worker_thread+0}
[ 14.596814] <ffffffff8010eec6>{child_rip+8}
<ffffffff80139bd8>{worker_thread+0}
[ 14.596822] <ffffffff8013cf04>{keventd_create_kthread+0}
<ffffffff8013d019>{kthread+0}
[ 14.596833] <ffffffff8010eebe>{child_rip+0}
[ 14.597403] ReiserFS: warning: is_tree_node: node level 2295 does not
match to the expected one 1
[ 14.597408] ReiserFS: dm-5: warning: vs-5150: search_by_key: invalid
format found in block 32828. Fsck?
[ 14.597414] ReiserFS: dm-5: warning: vs-13070:
reiserfs_read_locked_inode: i/o failure occurred trying to find stat
data of [2 1455 0x0 SD]
Bash reports command not found when trying to run df even now as I write
this.
I also noticed that during the 30 seconds or so that the system was
trying to load df before libata timed out the request, the cpu was
pegged at 100% utilization.
next prev parent reply other threads:[~2005-11-17 3:56 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2005-11-16 3:37 VIA SATA Raid needs a long time to recover from suspend Phillip Susi
2005-11-17 1:06 ` Andrew Morton
2005-11-17 3:55 ` Phillip Susi [this message]
2005-11-18 0:08 ` Mark Lord
2005-11-19 23:32 ` Pavel Machek
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=437BFF4A.4060402@cfl.rr.com \
--to=psusi@cfl.rr.com \
--cc=akpm@osdl.org \
--cc=jgarzik@pobox.com \
--cc=linux-kernel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox