linux-ide.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Tejun Heo <htejun@gmail.com>
To: Jeff Garzik <jeff@garzik.org>
Cc: ric@emc.com, linux-ide@vger.kernel.org,
	Mark Lord <mlord@pobox.com>, Jens Axboe <axboe@suse.de>
Subject: Re: [RFT] major libata update
Date: Wed, 17 May 2006 12:05:01 +0900	[thread overview]
Message-ID: <446A92DD.4040201@gmail.com> (raw)
In-Reply-To: <446A89D3.60002@garzik.org>

Jeff Garzik wrote:
> A spurious SDB FIS updating SActive is bad news, yes.  But with a busy 
> AHCI controller, perhaps sharing PCI interrupts, I think there is 
> distinct potential to be flagged as a spurious interrupt, when not.

You are talking about cases where the controller raises SDB IRQ without 
actually receiving one, right?  That is certainly possible and if this 
is the case, I'm very okay with eating the IRQ.

> But I'm taking a higher level view than that, from two angles:
> 
> 1) I think its a waste of time to even worry about this.  We should just 
> program AHCI to spec, and  let the controller and devices talk to each 
> other as they will.  If there are spurious completions, that should show 
> up elsewhere via tag poisoning and/or tag rotation.  Or data corruption, 
> if nothing else.  We'll know, even without the potentially-questionable 
> spurious interrupt detection code.
> 
> 2) Given the factors mentioned above -- shared irq and busy 
> multi-controller controller -- highly asynchronous conditions combine to 
> create a higher probability of seeing events arrive while you're 
> processing other events.  Overall, I feel that trying to accurately 
> account for everything going on in silicon leads to madness and 
> complexity :)  Just hand things to silicon, and trust that it either 
> accurately accounts SDB FISs, or will be quite obviously broken under 
> stress.

I see your point.  But I'm quite worried about this one, because

* This won't occur often and even when it occurs at the hardware level, 
it might be masked by processing delays in the kernel.  But when it 
blows, it will corrupt data silently.  We might not see such a report 
for months and even when it strikes pinpointing the origin will be 
extremely difficult.

* The danger of the race condition is not theoretical.  I think I can 
trigger it with AHCI.  AHCI delegates management of SActive register to 
the driver.

Remember the lost command completion bug in very early AHCI NCQ 
implementation?  It was caused by setting SActive after issuing the 
command.  Command completion SDB was received before SActive was cleared 
and when the IRQ handler kicks in afterward, it saw the bit which was 
set after completion and thus couldn't complete the command resulting in 
timeout.

This problem is similar except that it's the other way around.  From 
ahci_qc_issue()...

	if (qc->tf.protocol == ATA_PROT_NCQ)
		writel(1 << qc->tag, port_mmio + PORT_SCR_ACT);
	writel(1 << qc->tag, port_mmio + PORT_CMD_ISSUE);

Let's say we insert mdelay(10) between SActive setting and actual 
command issuance.  It will drastically increase the chance of receiving 
the spurious SDB FISes inbetween.  The following sequence is likely when 
such event occurs.

1. driver sets SActive bit for tag x
2. disk issues spurious SDB completion for tag x
3. ahci receives the SDB FIS, clears the corresponding SActive bit and 
raises interrupt
4. driver issues command for tag x and complete command issuance
5. IRQ handler kicks in sees that SActive for tag x is clear and 
completes the corresponding qc.
6. we now have data corruption.

I want to emphasize that SActive manipulation correctness is highly 
critical for NCQ integrity.  It's the thin thread the whole NCQ protocol 
hangs on.  Spurious SDB FIS amounts to random command completion without 
synchronization.  This is a serious problem which requires handling and 
I'm pretty sure I'm not being paranoid here.

I think detection of this problem proves the necessity for spurious 
interrupt reporting, at least until NCQ driver and hardware mature.  We 
still don't know how drives act with NCQ commands and definitely need to 
learn what sorts of culprits they have and how to work around them.  The 
spurious interrupt handling allows us to detect and diagnose such 
problems much better.

Am I making any sense?  I really wish I could write better.  Please give 
it some thoughts even if my writing sucks.  :-)

-- 
tejun

  reply	other threads:[~2006-05-17  3:05 UTC|newest]

Thread overview: 115+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-05-15 17:00 [RFT] major libata update Jeff Garzik
2006-05-15 17:18 ` Andrew Morton
2006-05-15 18:06   ` Jeff Garzik
2006-05-15 19:06     ` Arkadiusz Miskiewicz
2006-05-15 20:45       ` Jeff Garzik
2006-05-15 19:33     ` Mark Lord
2006-05-15 22:52       ` Tejun Heo
2006-05-15 18:15   ` Jeff Garzik
2006-05-15 18:27     ` Andrew Morton
2006-05-15 18:44       ` Jeff Garzik
2006-05-15 18:37     ` Alan Cox
2006-05-15 17:19 ` Alan Cox
2006-05-15 17:13   ` Jeff Garzik
2006-05-15 18:29 ` Tomasz Torcz
2006-05-15 18:43   ` Jeff Garzik
2006-05-15 23:32     ` Tejun Heo
2006-05-15 23:49       ` Jeff Garzik
2006-05-16  0:04         ` Tejun Heo
2006-05-16  2:15           ` Tejun Heo
2006-05-15 19:15 ` Jeff Garzik
2006-05-15 23:02 ` Wakko Warner
2006-05-15 23:00   ` Jeff Garzik
2006-05-15 23:13     ` Wakko Warner
2006-05-15 23:19       ` Jeff Garzik
2006-05-15 23:40     ` Alan Cox
2006-05-15 23:50       ` Wakko Warner
2006-05-15 23:38   ` Alan Cox
2006-05-15 23:47     ` Wakko Warner
2006-05-15 23:45       ` Jeff Garzik
2006-05-15 23:30 ` Avuton Olrich
2006-05-15 23:36   ` Tejun Heo
2006-05-15 23:54   ` Jeff Garzik
2006-05-16  0:08     ` Avuton Olrich
2006-05-16  3:36     ` Avuton Olrich
2006-05-16  3:51       ` Jeff Garzik
2006-05-16  4:33         ` Avuton Olrich
2006-05-16 14:57           ` Linus Torvalds
2006-05-17 15:25             ` OGAWA Hirofumi
2006-05-17 23:40               ` Linus Torvalds
2006-05-17 23:48                 ` Jeff Garzik
2006-05-18  1:48                   ` Alan Cox
2006-05-17 23:49                 ` Linus Torvalds
2006-05-16 15:02           ` Jeff Garzik
2006-05-16  3:55       ` Tejun Heo
2006-05-16  4:37         ` Avuton Olrich
2006-05-16 11:36 ` Ric Wheeler
2006-05-16 14:25   ` Jeff Garzik
2006-05-16 15:24     ` Tejun Heo
2006-05-16 18:29       ` Ric Wheeler
2006-05-16 21:41         ` Ric Wheeler
2006-05-16 22:02           ` Jeff Garzik
2006-05-16 23:11             ` Eric D. Mudama
2006-05-17  2:13               ` Ric Wheeler
2006-05-16 23:23             ` Tejun Heo
2006-05-17  2:09               ` Ric Wheeler
2006-05-16 23:44         ` Tejun Heo
2006-05-16 23:53           ` Jeff Garzik
2006-05-17  0:00             ` Jeff Garzik
2006-05-17  0:29               ` Tejun Heo
2006-05-17  1:08                 ` Jeff Garzik
2006-05-17  1:27                   ` Tejun Heo
2006-05-17  2:26                     ` Jeff Garzik
2006-05-17  3:05                       ` Tejun Heo [this message]
2006-05-22  7:19                     ` Jeff Garzik
2006-05-23 13:59                       ` Tejun Heo
2006-05-17  0:31               ` Jeff Garzik
2006-05-17  0:50                 ` Tejun Heo
2006-05-17  0:57                   ` Tejun Heo
2006-05-17  2:22                     ` Ric Wheeler
2006-05-17  1:37                       ` Tejun Heo
2006-05-17  3:57                         ` Ric Wheeler
2006-05-17  4:44                           ` Tejun Heo
2006-05-17 11:30                             ` Ric Wheeler
2006-05-17 20:45                             ` Ric Wheeler
2006-05-17 21:01                             ` Mark Lord
2006-05-17 21:04                               ` Jeff Garzik
2006-05-17 21:50                                 ` Tejun Heo
2006-05-17 21:56                                   ` Mark Lord
2006-05-17 22:00                                     ` Jeff Garzik
2006-05-17 22:03                                       ` Mark Lord
2006-05-17 22:13                                         ` Jeff Garzik
2006-05-18  3:33                                     ` Ric Wheeler
2006-05-18  3:26                                       ` Tejun Heo
2006-05-18 11:58                                         ` Ric Wheeler
2006-05-18 12:52                                           ` Mark Lord
2006-05-18 13:22                                             ` Ric Wheeler
2006-05-18 13:37                                               ` Jens Axboe
2006-05-17  1:13                   ` Jeff Garzik
2006-05-17  1:14                   ` Jeff Garzik
2006-05-17  2:16           ` Ric Wheeler
2006-05-16 23:34       ` Jeff Garzik
2006-05-16 23:53         ` Tejun Heo
2006-05-17  2:05 ` Andrew Morton
2006-05-17  4:49   ` Tejun Heo
2006-05-17  4:56     ` Andrew Morton
2006-05-17  5:14       ` Tejun Heo
2006-05-17  6:35         ` Tejun Heo
2006-05-18 11:24           ` Albert Lee
2006-05-18 11:33             ` Tejun Heo
2006-05-19 10:37               ` Albert Lee
2006-05-19 11:03                 ` Tejun Heo
2006-05-22  3:51                   ` [PATCH 1/1] libata: use polling pio for identify device Albert Lee
2006-05-22  6:24                     ` Jeff Garzik
2006-05-23  2:27                       ` Albert Lee
2006-05-18 23:07           ` [RFT] major libata update Andrew Morton
2006-05-19  1:14             ` Tejun Heo
2006-05-19  2:06               ` Jeff Garzik
2006-05-19  2:16                 ` Tejun Heo
2006-05-22  7:22           ` Jeff Garzik
2006-05-21 23:51 ` Michael Sterrett -Mr. Bones.-
2006-05-22  2:42   ` Tejun Heo
2006-05-22  3:42     ` Michael Sterrett -Mr. Bones.-
2006-05-22  6:23     ` Michael Sterrett -Mr. Bones.-
  -- strict thread matches above, loose matches on Subject: below --
2006-05-17  7:35 Matthieu CASTET
2006-05-18  0:36 Brown, Len

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=446A92DD.4040201@gmail.com \
    --to=htejun@gmail.com \
    --cc=axboe@suse.de \
    --cc=jeff@garzik.org \
    --cc=linux-ide@vger.kernel.org \
    --cc=mlord@pobox.com \
    --cc=ric@emc.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).