From: Jeff Garzik <jeff@garzik.org>
To: Tejun Heo <htejun@gmail.com>
Cc: ric@emc.com, linux-ide@vger.kernel.org,
Mark Lord <mlord@pobox.com>, Jens Axboe <axboe@suse.de>
Subject: Re: [RFT] major libata update
Date: Mon, 22 May 2006 03:19:19 -0400
Message-ID: <447165F7.1070905@garzik.org>
In-Reply-To: <446A7BF6.3090103@gmail.com>
Tejun Heo wrote:
> Jeff Garzik wrote:
>> Tejun Heo wrote:
>>> Jeff Garzik wrote:
>>>> Jeff Garzik wrote:
>>>>> Tejun Heo wrote:
>>>>>> Hmmm.. The drive is issuing SDB FIS which completes already
>>>>>> completed tags. This could be dangerous. Depending on timing, it
>>>>>> might end up finishing a command which occupied the slot which
>>>>>> hasn't been processed yet. If a drive does this, NCQ shouldn't be
>>>>>> enabled for it. Can you post full boot dmesg?
>>>>>
>>>>> I'm not sure the data supports that conclusion? PORT_IRQ_SDB_FIS
>>>>> is quite normal and expected during NCQ operation, if that
>>>>> interrupt is enabled. Just normal SDB:Entry and SDB:SetIntr states.
>>>>
>>>> Strike that last part: PORT_IRQ_SDB_FIS will appear, as with other
>>>> status bits, even if the enable bit is not set.
>>>>
>>>> So, you'll see that whenever you get an SDB FIS during normal
>>>> operation.
>>>
>>> The problem is with the second dword. Here are some of spurious SDB
>>> FISes Ric's AHCI was receiving.
>>>
>>> 004040a1:10000000
>>> 004040a1:00000020
>>> 004040a1:00000080
>>>
>>> If the second dword were all zero, it's simply SDB FIS turning on IRQ
>>> (bit 14 of the first dword) and there's nothing to worry about.
>>> However, all those spurious SDBs have one bit set in the second dword
>>> - meaning the SDB completes the corresponding tag, but the tag isn't
>>> active when those SDBs are received.
>>>
>>> This is okay as long as the controller thinks the tags are unoccupied
>>> when those SDBs are received, but it's not something which can be
>>> guaranteed. NCQ command synchronization depends on devices not
>>> completing the same commands more than once.
>>>
>>> The duplicate completions might be okay if the drive guarantees it
>>> doesn't send it if it loses to command issuance. e.g.
>>>
>>> 1. drive sends completion for tag x
>>> 2. drive shortly schedules another completion for tag x (spurious)
>>> 3. ahci/driver complete tag x
>>> 4. ahci/driver issues tag x
>>> 5. drive receives command for tag x before sending the spurious
>>> completion and determines not to send the spurious completion. (not
>>> very likely)
>>>
>>> If above is true, the drive might be okay, but nobody can guarantee
>>> how various controllers react. It depends on how controllers manage
>>> SActive (when to turn bits on). At any rate, it's dangerous IMHO.
>>
>> If the silicon is screwing up SActive bits, then we have bigger
>> problems than spurious interrupts.
>>
>> So, the typical policy of Internet servers applies here: "be liberal
>> in what you accept." For smart controllers like AHCI, we will simply
>> set the desired IRQ mask, then happily receive and ack events anytime
>> the controller decides to raise them. If the controller decides to
>> send us a no-op, don't worry about it. This is particularly true when
>> we turn on Command Coalescing, where we'll have a run of work
>> initiated [sometimes] by an internal timer, rather than an actual FIS
>> reception.
>
> I wish I could explain it better. This is a clear protocol violation
> from the drive. Depending on specific implementation of the drive and
> the controller, it can result in completion of command which is not
> processed yet (data corruption!).

I quite understand the implications. My argument comes from a different
angle: I don't feel we should be adding tons of code that essentially
validates the silicon. There are plenty of chances for the hardware to
fuck up in a way that corrupts data, and is also difficult to detect.
Pre-production BIOS have even done silly things like turn off data
verification (checksum) by default. Talk about subtle corruption...

So I feel the best path is to use the hardware programming sequences
described in the spec, because that's what the chip designers and Q/A
engineers validate with (read: the Windows driver).

Once we have deployed drivers with the standard programming sequences,
_then_ we can consider looking into proper spurious interrupt
accounting. The current AHCI interrupt accounting stuff is not nearly
as accurate as it should be, which implies that the code simply should
not exist at the present time.

	Jeff
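
For readers following the quoted discussion, the SDB FIS values Tejun
lists (e.g. 004040a1:00000020) can be picked apart with a few lines of C.
This is only a minimal sketch, not libata or ahci code: it assumes the
pairs are printed as dword0:dword1, that the low byte of dword 0 is the
FIS type (0xA1 for Set Device Bits), that bit 14 of dword 0 is the
interrupt bit as stated in the thread, and that dword 1 is the mask of
tags being completed. All helper names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define FIS_TYPE_SDB 0xA1u        /* Set Device Bits FIS type (low byte of dword 0) */
#define SDB_IRQ_BIT  (1u << 14)   /* interrupt bit, bit 14 of dword 0 per the thread */
#define NCQ_MAX_TAGS 32u          /* NCQ tags map onto a 32-bit mask */

/* Hypothetical helper: bit mask for a single NCQ tag. */
static uint32_t tag_mask(unsigned int tag)
{
	assert(tag < NCQ_MAX_TAGS);
	return (uint32_t)1 << tag;
}

/* The harmless case: an SDB FIS that only raises the IRQ and completes nothing. */
static int sdb_is_irq_only(uint32_t dw0, uint32_t dw1)
{
	return (dw0 & 0xFFu) == FIS_TYPE_SDB && (dw0 & SDB_IRQ_BIT) && dw1 == 0;
}

/*
 * Apply the completion mask in dw1 to the driver's view of outstanding
 * tags (*active). Returns the tags the FIS claims to complete that were
 * not outstanding, i.e. the spurious completions.
 */
static uint32_t sdb_complete(uint32_t *active, uint32_t dw1)
{
	uint32_t spurious = dw1 & ~*active;

	*active &= ~dw1;
	return spurious;
}
```

With no tags outstanding, the three quoted FISes report tags 28, 5 and 7
as spurious. The sketch also shows why the race in steps 1-5 is worrying:
if tag x is reissued before the duplicate completion arrives, the mask
check sees an active tag, reports nothing, and the new command is marked
complete.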