From: Levente Kurusa <levex@linux.com>
To: Robert Hancock <hancockrwd@gmail.com>
Cc: "linux-ide@vger.kernel.org" <linux-ide@vger.kernel.org>
Subject: Re: [PATCH] BIOS SATA legacy mode failure
Date: Tue, 17 Sep 2013 18:47:56 +0200
Message-ID: <523887BC.50704@linux.com>
In-Reply-To: <CADLC3L3tGG4yGZKir2pPzMYeddPjSFuD77u87C=YYEqtVn908Q@mail.gmail.com>
On 2013-09-16 06:37, Robert Hancock wrote:
> On Sat, Sep 14, 2013 at 9:09 AM, Levente Kurusa <levex@linux.com> wrote:
>> On 2013-09-10 06:01, Robert Hancock wrote:
>>
>>> On 09/08/2013 12:35 AM, Levente Kurusa wrote:
>>>>
>>>> Hi,
>>>>
>>>> I have been testing the Linux kernel on my two-year-old Toshiba NB100
>>>> netbook. However, when I enabled SATA compatibility/legacy mode
>>>> instead of AHCI mode in the BIOS, the kernel got stuck. I have pasted
>>>> the relevant dmesg excerpt along with a patch that works around it.
>>>> I suspect the cause is that the BIOS puts the device into IDE mode
>>>> but it still reports itself as a SATA device, so libata sends ATA
>>>> commands to it, which obviously makes it go bad. The patch
>>>
>>> No, the commands are the same whichever mode the controller is in. The
>>> problem is presumably something else, like maybe some kind of interrupt
>>> routing problem when the controller is in legacy mode.
>>>
>> Yes, I see now.
>>
>>
>>>> fixes it by adding a new field to ata_device called exce_cnt, which
>>>> counts how many exceptions have occurred. After three exceptions, it
>>>> automatically disables the device. Also, please note this is my first
>>>> ever patch for the kernel :-)
>>>>
>>>> The following dmesg output repeats in an infinite loop:
>>>> ata3: lost interrupt (Status 0x50)
>>>> ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
>>>> ata3.00: failed command: READ DMA
>>>> ata3.00: cmd c8/00:08:00:00:00/00:00:00:00:00/e0 tag 0 dma 4096 in
>>>> res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
>>>> ata3.00: status: { DRDY }
>>>> ata3: soft resetting link
>>>> ata3.00: configured for UDMA/33 (no error)
>>>> ata3.00: device reported invalid CHS sector 0
>>>> ata3: EH complete
>>>>
>>>> Patch that fixes the infinite loop:
>>>> diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
>>>> index f9476fb..eeedf80 100644
>>>> --- a/drivers/ata/libata-eh.c
>>>> +++ b/drivers/ata/libata-eh.c
>>>> @@ -2437,6 +2437,14 @@ static void ata_eh_link_report(struct ata_link *link)
>>>> ehc->i.action, frozen, tries_buf);
>>>> if (desc)
>>>> ata_dev_err(ehc->i.dev, "%s\n", desc);
>>>> + ehc->i.dev->exce_cnt++;
>>>> + ata_dev_warn(ehc->i.dev, "Number of exceptions: %d\n", ehc->i.dev->exce_cnt);
>>>> + /*
>>>> + * The device is failing terribly,
>>>> + * disable it to prevent damage.
>>>> + */
>>>> + if (ehc->i.dev->exce_cnt > 2)
>>>> + ata_dev_disable(ehc->i.dev);
>>>> } else {
>>>> ata_link_err(link, "exception Emask 0x%x "
>>>> "SAct 0x%x SErr 0x%x action 0x%x%s%s\n",
>>>> diff --git a/include/linux/libata.h b/include/linux/libata.h
>>>> index eae7a05..fa52ee6 100644
>>>> --- a/include/linux/libata.h
>>>> +++ b/include/linux/libata.h
>>>> @@ -660,7 +660,8 @@ struct ata_device {
>>>> u8 devslp_timing[ATA_LOG_DEVSLP_SIZE];
>>>>
>>>> /* error history */
>>>> - int spdn_cnt;
>>>> + int spdn_cnt; /* Number of speed_downs */
>>>> + int exce_cnt; /* Number of exceptions that happened */
>>>> /* ering is CLEAR_END, read comment above CLEAR_END */
>>>> struct ata_ering ering;
>>>> };
>>>>
>>>
>>> This doesn't seem like a very good fix. It may prevent the apparent
>>> infinite loop but will just prevent that device from functioning at all.
>>> It would be better if we could figure out what was actually going wrong.
>>>
>>>
>> I have tested the problem with three different computers, all switched
>> to legacy/IDE/compatibility mode, and none of them had this problem
>> (and, of course, the kernel also boots normally on them in AHCI mode).
>> It feels strange, but so far I have only been able to reproduce the
>> problem with a Toshiba MK8052GSX. As for my patch, I still don't see
>> why a device that fails so badly that it reports three exceptions
>> shouldn't be disabled. As in this case, it could otherwise cause
>> infinite loops.
>
> The problem is that this could happen in some cases when you wouldn't
> want to disable the device, like an error that just happens
> sporadically and works on retry, or a device you're trying to recover
> data from.
>
What do you think about changing the patch so that exce_cnt is reset to
zero whenever an operation completes successfully? I could also add a
module_param that sets the maximum value of exce_cnt, with zero meaning
the device is never disabled. Please don't get me wrong: I don't want to
force this patch. I just want to learn how all this works, and in the
process try to make it better. :-)
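
A minimal userspace sketch of that policy, for illustration only (it is
not libata code). struct fake_dev, on_exception(), on_success() and the
max_exceptions variable are hypothetical names; max_exceptions stands in
for the suggested module_param, where 0 means "never disable". The
default of 3 with a >= comparison mirrors the original patch's
"exce_cnt > 2" check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for the proposed module_param:
 * number of consecutive exceptions before disabling, 0 = never disable.
 */
static unsigned int max_exceptions = 3;

/* Hypothetical model of the relevant ata_device state. */
struct fake_dev {
	unsigned int exce_cnt; /* consecutive exceptions since last success */
	bool disabled;         /* once set, stays set (no re-enable here) */
};

/* Called when a command completes successfully: forget earlier errors. */
static void on_success(struct fake_dev *dev)
{
	assert(dev != NULL);
	dev->exce_cnt = 0;
}

/* Called from error handling; returns true if the device is now disabled. */
static bool on_exception(struct fake_dev *dev)
{
	assert(dev != NULL);
	dev->exce_cnt++;
	if (max_exceptions != 0 && dev->exce_cnt >= max_exceptions)
		dev->disabled = true;
	return dev->disabled;
}
```

Because the counter is reset on every successful completion, only
consecutive failures can disable the device, so a sporadic error that
succeeds on retry never accumulates; setting the threshold to 0 would
cover the data-recovery case.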
--
Regards,
Levente Kurusa