From: Sergei Shtylyov <sshtylyov@ru.mvista.com>
To: Linas Vepstas <linas@austin.ibm.com>
Cc: linux-ide@vger.kernel.org
Subject: Re: [RFT] hpt366: reset DMA state machine on timeouts
Date: Fri, 22 Jun 2007 19:32:44 +0400
Message-ID: <467BEB9C.1070407@ru.mvista.com>
In-Reply-To: <20070622151359.GD8840@austin.ibm.com>

Hello.

Linas Vepstas wrote:

>>Reset HPT36x's DMA state machine on a DMA timeout the way it's done for HPT370.

>>Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>

>>---
>>Linas, here's what I've come up with -- this should apply against 2.6.21.y.
>>Compile-tested only, not for merging.

>> drivers/ide/pci/hpt366.c |   24 +++++++++++++++++++++++-

> This worked great!  The patch is good. But it raises another interesting
> issue, one of those akpm ZFS "violates boundaries" issues.

> However.. When raid goes to reconstruct the partition, I get one
> of the Drive Ready Seek Complete etc. messages.  Your handler recovers 

    I hope you meant those messages were preceded by DMA timeouts (otherwise 
this code wouldn't come into action).

> from it (I put in a printk to verify this).

    You mean into my ide_dma_timeout() method?

> And so these printk's
> try to get logged into /var/log/messages ... which trigger more 
> errors. At a very high rate ... sometimes hundreds a second, sometimes
> less.  The system remains usable, but at one point, it hit 60% cpu usage
> spewing these messages to the screen.  

    Hm...

> I'd like to see several things.

> 1) This patch should go in.  It converts a system that hangs into
>    one that doesn't hang.

    What's strange is that it never seemed to be necessary before your great 
new drive... ;-)
    So, providing its data certainly wouldn't hurt -- perhaps we should just 
blacklist it instead -- maybe there's a UDMA speed at which this wouldn't 
happen, and we could just limit the drive to it.

> 2) There needs to be a way of failing the disk when there's a high
>    number of errors. e.g. if there are more than 100 errors per minute
>    then the disk needs to be marked "failed" in the raid array.

>    Note it should be stopped only if the rate is high: if there is 
>    only 1 error per minute, this might be very annoying, but acceptable,
>    esp. if one is just trying to copy data off the disk.

>    I'm not sure what to do if this had been the only disk in the system.
>    Maybe if the error rate exceeds 100/minute, then DMA is turned off 
>    permanently?

    In fact, it should be turned off after 3 DMA errors (causing PIO retries).
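
    To make the two policies concrete, here is a rough user-space sketch, 
not the actual IDE core or md code: every name in it (struct drive_errors, 
drive_dma_error() and the three constants) is hypothetical. It assumes a 
simple fixed 60-second window rather than a true sliding one, drops to PIO 
after 3 DMA errors, and marks the disk failed only once more than 100 errors 
land in a single window.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names throughout; this is not the kernel's IDE code. */
#define DMA_ERROR_LIMIT   3    /* DMA errors before falling back to PIO */
#define FAIL_RATE_LIMIT   100  /* errors allowed per window before failing */
#define RATE_WINDOW_SECS  60   /* length of the fixed counting window */

struct drive_errors {
	unsigned int dma_errors;    /* DMA errors seen while DMA was enabled */
	bool using_dma;             /* false once we have fallen back to PIO */
	long window_start;          /* start of the current window, in seconds */
	unsigned int window_errors; /* errors counted in the current window */
	bool failed;                /* true once the rate limit was exceeded */
};

static void drive_errors_init(struct drive_errors *d, long now)
{
	assert(d != NULL);
	d->dma_errors = 0;
	d->using_dma = true;
	d->window_start = now;
	d->window_errors = 0;
	d->failed = false;
}

/* Record one error at time 'now' (seconds) and apply both policies. */
static void drive_dma_error(struct drive_errors *d, long now)
{
	assert(d != NULL);

	/* Policy 1: give up on DMA after a small fixed number of errors. */
	if (d->using_dma && ++d->dma_errors >= DMA_ERROR_LIMIT)
		d->using_dma = false;

	/* Policy 2: fail the disk only if the error *rate* is high. */
	if (now - d->window_start >= RATE_WINDOW_SECS) {
		d->window_start = now;
		d->window_errors = 0;
	}
	if (++d->window_errors > FAIL_RATE_LIMIT)
		d->failed = true;
}
```

    With these assumed thresholds, a drive erroring once a minute stays in 
the array indefinitely (in PIO mode after the third error), while a burst of 
101 errors within one window marks it failed.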

> --linas

MBR, Sergei
