Re: VIA SATA Raid needs a long time to recover from suspend

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: Andrew Morton <akpm@osdl.org>
To: Phillip Susi <psusi@cfl.rr.com>
Cc: linux-kernel@vger.kernel.org, Jeff Garzik <jgarzik@pobox.com>
Subject: Re: VIA SATA Raid needs a long time to recover from suspend
Date: Wed, 16 Nov 2005 17:06:42 -0800	[thread overview]
Message-ID: <20051116170642.313aeada.akpm@osdl.org> (raw)
In-Reply-To: <437AA996.9080505@cfl.rr.com>

Phillip Susi <psusi@cfl.rr.com> wrote:
>
> I have been debugging a power management problem for a few days now, and 
> I believe I have finally solved the problem.  Because it involved 
> patching the kernel, I felt I should share the fix here in hopes that it 
> can be improved and/or integrated into future kernels.  Right now I am 
> running 2.6.14.2 on amd64, compiled myself, with the ubuntu breezy amd64 
> distribution. 
> 
> First I'll state the fix.  It involved changing two lines in 
> include/linux/libata.h:
> 
> static inline u8 ata_busy_wait(struct ata_port *ap, unsigned int bits,
>                    unsigned int max)
> {
>     u8 status;
> 
>     do {
>         udelay(100);                                 <-- changed to 100 
> from 10
>         status = ata_chk_status(ap);
>         max--;
>     } while ((status & bits) && (max > 0));
> 
>     return status;
> }
> 
> and:
> 
> static inline u8 ata_wait_idle(struct ata_port *ap)
> {
>     u8 status = ata_busy_wait(ap, ATA_BUSY | ATA_DRQ, 
> 10000);             <-- changed to 10,000 from 1,000
> 
>     if (status & (ATA_BUSY | ATA_DRQ)) {
>         unsigned long l = ap->ioaddr.status_addr;
>         printk(KERN_WARNING
>                "ATA: abnormal status 0x%X on port 0x%lX\n",
>                status, l);
>     }
> 
>     return status;
> }

This change will increase the minimum delay in both ata_wait_idle() and
ata_busy_wait() from 10 usec to 100 usec, which is not a good change.

It would be less damaging to increase the delay in ata_wait_idle() from
1000 to 100,000.  A one second spin is a bit sad, but the hardware's bust,
so that's the least of the user's worries.

The best fix would be to identify the specific _callers_ of these functions
which need their delay increased.   You can do that by adding:

	if (max == 0)
		dump_stack();

at the end of ata_busy_wait().  The resulting stack dumps will tell you
where the offending callsites are.  With luck, it's just one.  If we know
which one, that might point us at some chipset-level delay which we're
forgetting to do, or something like that.

> The problem seems to be that my VIA SATA raid controller requires more 
> time to recover from being suspended.  It looks like the code in 
> sata_via.c restores the task file after a resume, then calls 
> ata_wait_idle to wait for the busy bit to clear.  The problem was that 
> this function timed out before the busy bit cleared, resulting in 
> messages like this:
> 
> ATA: abnormal status 0x80 on port 0xE007
> 
> Then if there was an IO request made immediately after resuming, it 
> would timeout and fail, because it was issued before the hardware was 
> ready.  Changing the timeout resolved this.  I tried changing both the 
> udelay and ata_busy_wait lines to increase the timeout, and it did not 
> seem to matter which I changed, as long as the total timeout was 
> increased by a factor of 100. 
> 
> Since increasing the maximum timeout, suspend and hibernate work great 
> for me.  While experiencing this bug, it may have exposed another bug, 
> which I will mention now in passing.  As I said before, after a resume, 
> if there was an IO request made immediately ( before the busy bit 
> finally did clear ) it would timeout and fail.  It seemed the kernel 
> filled the buffer cache for the requested block with garbage rather than 
> retry the read.  It seems to me that at some point, the read should have 
> been retried.

Or should have returned an IO error.

Yes, this might be a driver bug.  Again, if we know precisely which
callsite is experiencing the timeout then we're better situated to fix it.

next prev parent reply	other threads:[~2005-11-17  1:07 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2005-11-16  3:37 VIA SATA Raid needs a long time to recover from suspend Phillip Susi
2005-11-17  1:06 ` Andrew Morton [this message]
2005-11-17  3:55   ` Phillip Susi
2005-11-18  0:08     ` Mark Lord
2005-11-19 23:32       ` Pavel Machek

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20051116170642.313aeada.akpm@osdl.org \
    --to=akpm@osdl.org \
    --cc=jgarzik@pobox.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=psusi@cfl.rr.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox