linux-ide.vger.kernel.org archive mirror
From: Robert Hancock <hancockrwd@gmail.com>
To: levex@linux.com
Cc: "linux-ide@vger.kernel.org" <linux-ide@vger.kernel.org>
Subject: Re: [PATCH] BIOS SATA legacy mode failure
Date: Sat, 21 Sep 2013 11:04:12 -0600
Message-ID: <CADLC3L3WCMWc4kuJ1-_GbFinEyCABuuh3Fonh641SptsfYDaeA@mail.gmail.com>
In-Reply-To: <523D4C4C.5070400@linux.com>

On Sat, Sep 21, 2013 at 1:35 AM, Levente Kurusa <levex@linux.com> wrote:
>>>>>>> The following dmesg is stuck in an infinite loop.
>>>>>>> dmesg:
>>>>>>> ata3: lost interrupt (Status 0x50)
>>>>>>> ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
>>>>>>> ata3.00: failed command: READ DMA
>>>>>>> ata3.00: cmd c8/00:08:00:00:00/00:00:00:00:00/e0 tag 0 dma 4096 in
>>>>>>>                  res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4
>>>>>>> (timeout)
>>>>>>> ata3.00: status: { DRDY }
>>>>>>> ata3: soft resetting link
>>>>>>> ata3.00: configured for UDMA/33 (no error)
>>>>>>> ata3.00: device reported invalid CHS sector 0
>>>>>>> ata3: EH complete
>>>>>>>
>>>>>>> Patch that fixes the infinite loop:
>>>>>>> diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
>>>>>>> index f9476fb..eeedf80 100644
>>>>>>> --- a/drivers/ata/libata-eh.c
>>>>>>> +++ b/drivers/ata/libata-eh.c
>>>>>>> @@ -2437,6 +2437,14 @@ static void ata_eh_link_report(struct ata_link *link)
>>>>>>>                                ehc->i.action, frozen, tries_buf);
>>>>>>>                    if (desc)
>>>>>>>                            ata_dev_err(ehc->i.dev, "%s\n", desc);
>>>>>>> +               ehc->i.dev->exce_cnt ++;
>>>>>>> +               ata_dev_warn(ehc->i.dev, "Number of exceptions: %d\n",
>>>>>>> +                            ehc->i.dev->exce_cnt);
>>>>>>> +               /**
>>>>>>> +                  * The device is failing terribly,
>>>>>>> +                 * disable it to prevent damage.
>>>>>>> +                 */
>>>>>>> +               if(ehc->i.dev->exce_cnt > 2)
>>>>>>> +                       ata_dev_disable(ehc->i.dev);
>>>>>>>            } else {
>>>>>>>                    ata_link_err(link, "exception Emask 0x%x "
>>>>>>>                                 "SAct 0x%x SErr 0x%x action 0x%x%s%s\n",
>>>>>>> diff --git a/include/linux/libata.h b/include/linux/libata.h
>>>>>>> index eae7a05..fa52ee6 100644
>>>>>>> --- a/include/linux/libata.h
>>>>>>> +++ b/include/linux/libata.h
>>>>>>> @@ -660,7 +660,8 @@ struct ata_device {
>>>>>>>            u8                      devslp_timing[ATA_LOG_DEVSLP_SIZE];
>>>>>>>
>>>>>>>            /* error history */
>>>>>>> -       int                     spdn_cnt;
>>>>>>> +       int                     spdn_cnt; /* Number of speed_downs */
>>>>>>> +       int                     exce_cnt; /* Number of exceptions that happened */
>>>>>>>            /* ering is CLEAR_END, read comment above CLEAR_END */
>>>>>>>            struct ata_ering        ering;
>>>>>>>     };
>>>>>>>
>>>>>>
>>>>>> This doesn't seem like a very good fix. It may prevent the apparent
>>>>>> infinite loop but will just prevent that device from functioning at
>>>>>> all. It would be better if we could figure out what was actually going
>>>>>> wrong.
>>>>>>
>>>>>>
>>>>> I have tested the problem with three different computers, all switched
>>>>> to legacy/IDE/compatibility mode, and they didn't have this problem. Of
>>>>> course, they could have been set to AHCI mode, and there the kernel
>>>>> would boot normally. Feels strange, but so far I was only able to
>>>>> reproduce the problem with a Toshiba MK8052GSX. On the topic of my
>>>>> patch, I still don't see why a device which fails so terribly that it
>>>>> reports 3 exceptions shouldn't be disabled. Like in this case, it could
>>>>> cause infinite loops.
>>>>
>>>>
>>>>
>>>> The problem is that this could happen in some cases when you wouldn't
>>>> want to disable the device, like an error that just happens
>>>> sporadically and works on retry, or a device you're trying to recover
>>>> data from.
>>>>
>>> What do you think if I edit the patch in a way, that when an operation
>>> successfully completes, it resets exce_cnt to zero. Might as well add a
>>> module_param, which can set the maximum value of exce_cnt, while having
>>> zero as an option to never disable the device. Please don't think me
>>> wrong, I don't want to force this patch, I just want to learn how all
>>> this works, and in the process try to make it better. :-)
>>
>>
>> That would be better, but I think you're still going to have an issue
>> with what magic number to pick to avoid disabling devices
>> inappropriately.
>>
>> Conceptually, disabling the device doesn't really make sense anyway.
>> If someone in userspace wants to keep trying to read from that device,
>> why would you stop them because of some arbitrary judgement? The
>> kernel itself isn't "locked up" during this process, anything not
>> blocked on I/O to that device should be able to continue running, so
>> that process is only hurting itself. If the system fails to boot from
>> another device due to this, this would likely point out some kind of
>> problem in userspace or the distro boot process being overly
>> serialized.
>>
>
> I have been booting up with the initramfs from ubuntu 13.04,
> and I have also tried to boot with the ubuntu install cd. They couldn't
> continue the boot process. I'm gonna spend the weekend trying to figure
> out where and why the interrupts don't happen. Whether it be a routing
> or a hardware issue, which I highly doubt due to the fact that Windows
> XP SP2 was able to boot up without errors.

Are you able to capture the full dmesg output from a boot attempt, along
with the contents of /proc/interrupts?
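
As an aid to reading the quoted log, here is a minimal sketch that decodes
the status bytes and the transfer size from the failed READ DMA. It is not
libata code: the names decode_status() and transfer_bytes() are hypothetical,
and the bit names follow the classic ATA status register layout (bit 4 is
historically DSC; later specification revisions redefine it).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* ATA status register bits (classic layout; names are illustrative). */
enum {
	ST_ERR  = 0x01,	/* error */
	ST_DRQ  = 0x08,	/* data request */
	ST_DSC  = 0x10,	/* device seek complete (historical meaning) */
	ST_DF   = 0x20,	/* device fault */
	ST_DRDY = 0x40,	/* device ready */
	ST_BSY  = 0x80	/* busy */
};

/* Write the names of the set bits into buf, separated by spaces.
 * len must be at least 1. */
static void decode_status(uint8_t status, char *buf, size_t len)
{
	static const struct { uint8_t bit; const char *name; } bits[] = {
		{ ST_BSY, "BSY" }, { ST_DRDY, "DRDY" }, { ST_DF, "DF" },
		{ ST_DSC, "DSC" }, { ST_DRQ, "DRQ" }, { ST_ERR, "ERR" },
	};
	size_t i;

	buf[0] = '\0';
	for (i = 0; i < sizeof(bits) / sizeof(bits[0]); i++) {
		if (!(status & bits[i].bit))
			continue;
		if (buf[0] != '\0')
			strncat(buf, " ", len - strlen(buf) - 1);
		strncat(buf, bits[i].name, len - strlen(buf) - 1);
	}
}

/* Bytes moved by a 28-bit READ DMA: the sector count field holds the
 * number of 512-byte sectors, and a count of 0 means 256 sectors. */
static unsigned int transfer_bytes(uint8_t nsect)
{
	unsigned int sectors = nsect ? nsect : 256;

	return sectors * 512u;
}
```

With these definitions, the "Status 0x50" from the lost-interrupt line
decodes to DRDY plus bit 4, the result status 0x40 decodes to DRDY alone
(matching the "status: { DRDY }" line), and the sector count 0x08 in the
"cmd c8/00:08:..." line gives 4096 bytes, matching "dma 4096 in".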

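To make the counter policy discussed in the quoted text concrete, here is a
small standalone sketch of the proposed variant: count consecutive
exceptions, reset on a successful operation, and treat a limit of zero as
"never disable". It is only an illustration of the idea, not a patch.
struct dev_err_state, max_exceptions, note_success() and note_exception()
are hypothetical names; only exce_cnt comes from the patch above.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-device state touched by the patch;
 * this is not the real struct ata_device. */
struct dev_err_state {
	int exce_cnt;	/* consecutive exceptions since the last success */
	bool disabled;	/* set once the limit has been reached */
};

/* Hypothetical tunable (the thread suggests a module_param).
 * 0 means the device is never disabled. */
static int max_exceptions = 3;

/* A command completed successfully: forget earlier failures. */
static void note_success(struct dev_err_state *s)
{
	s->exce_cnt = 0;
}

/* An exception was reported: count it and apply the limit, if any. */
static void note_exception(struct dev_err_state *s)
{
	s->exce_cnt++;
	if (max_exceptions > 0 && s->exce_cnt >= max_exceptions)
		s->disabled = true;
}
```

With the limit at 3 this disables the device on the third consecutive
exception, like the "exce_cnt > 2" check in the patch, while a sporadic
error followed by a successful retry never accumulates. It does not answer
the objection about which limit to pick; it only moves that choice to
whoever sets the tunable.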