linux-ide.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Tejun Heo <tj@kernel.org>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jeff Garzik <jeff@garzik.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-ide@vger.kernel.org, LKML <linux-kernel@vger.kernel.org>
Subject: Re: [git patches] libata fixes for .26
Date: Sat, 05 Jul 2008 12:13:14 +0900	[thread overview]
Message-ID: <486EE6CA.2040908@kernel.org> (raw)
In-Reply-To: <alpine.LFD.1.10.0807040955100.2815@woody.linux-foundation.org>

Hello, Linus.

Linus Torvalds wrote:
> Looking at the AHCI change, I think it's still potentially buggy.
> 
> I think it is potentially buggy for two separate reasons:
> 
>  - if the interrupt happens because of some port that we don't handle, we 
>    should still ACK it, in order to get rid of it. I don't think Tejun's 
>    patch fixed anything at all, since it still did a binary 'and' with 
>    hpriv->port_map on the bits, so it would never ACK anything that didn't 
>    have a bit set, and the (bogus) interrupt would keep screaming.

Strange.  Yeah, the AND should go.  The original patch was...

http://htj.dyndns.org/export/testing/head-x86_64-bug390937_dbg0/0001-ahci-interrupt-debug.patch

And the reporter verified debug option 2 (the posted patch) and 3
(always write 0xffffffff) fixed the problem.  The report was complete
w/ boot log showing which debug option was active.  Ah... weird.  The
debug option 2 shouldn't have made any difference.  :-(

Anyways, will send a patch to drop the AND.

>  - I also wonder if / suspect that the IRQ ACK should happen _before_ we 
>    handle the source of the interrupt, because otherwise if one port ends 
>    up having two events in close succession (can this happen? I think so), 
>    then we end up perhaps getting the irq for the first one, and handle 
>    that first event, but then the second event happens immediately 
>    afterwards, and before we do the writel() to ACK it, so now the 
>    _hardware_ thinks we have handled both of them, even though we never 
>    actually reacted to the second event.
> 
> So I think the appended (TOTALLY UNTESTED!) patch - based on top of the 
> pull that I already did - might be a good idea.
> 
> NOTE! I _really_ didn't test it. I do not know how AHCI works at a low 
> level, and maybe there is some reason why the IRQ ACK writel() actually 
> has to happen after you've handled the event (to avoid getting a new 
> interrupt immediately. But based on other controllers I've worked with, 
> this is the correct way to not lose irq events.
> 
> [ And yes, the race for the irq ack issue is small. And yes, the 
>   likelihood of a bogus interrupt triggering is probably small too. And 
>   see above about my lack of specific knowledge about AHCI.
> 
>   So I'm sure as heck not going to commit this patch, I'm just sending it 
>   out as a "Are you sure you shouldn't do it like this?" RFC patch.. ]

The current code is correct.  From 10.6.2.1 of ahci 1.1,

  In order to clear an interrupt, software must first clear the event
  from the PxIS register, then clear the interrupt event from the IS
  register. If software clears IS register only, leaving the level of
  the virtual wire from the PxIS register set, the resulting level of
  1 shall cause the IS register bit to be re-set.

so, it basically behaves like level triggered IRQ latch for ports,
which also creates an easy-to-miss requirement for MSI implementation
where the controller should generate a new MSI message when the driver
clears some of the IRQ pending bits but not all.

I do agree it's strange.  Whenever I come back to ahci_interrupt()
after enough time has passed, that clearing code always makes me look
up the spec.  I guess it's about time to add some comment there.

Thanks.

-- 
tejun


  reply	other threads:[~2008-07-05  3:13 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-07-04 13:10 [git patches] libata fixes for .26 Jeff Garzik
2008-07-04 17:09 ` Linus Torvalds
2008-07-05  3:13   ` Tejun Heo [this message]
2008-07-05  3:18     ` Tejun Heo
2008-07-05  4:10       ` [PATCH #upstream-fixes] ahci: give another shot at clearing all bits in irq_stat Tejun Heo
2008-07-06 13:45         ` Jeff Garzik
  -- strict thread matches above, loose matches on Subject: below --
2008-07-11 14:10 [git patches] libata fixes for .26 Jeff Garzik

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=486EE6CA.2040908@kernel.org \
    --to=tj@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=jeff@garzik.org \
    --cc=linux-ide@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).