From: Oliver Hartkopp <socketcan@hartkopp.net>
To: Austin Schuh <austin@peloton-tech.com>
Cc: Wolfgang Grandegger <wg@grandegger.com>, linux-can@vger.kernel.org
Subject: Re: sja1000 interrupt problem
Date: Wed, 13 Nov 2013 07:58:59 +0100
Message-ID: <52832333.9080908@hartkopp.net>
In-Reply-To: <CANGgnMaCvb=B2r997e+H9UjVquX66HJ+OtftL0EyGP7MKcy0tQ@mail.gmail.com>
Hi Austin,

sorry for checking my mails in sequential order :-)
I would have been able to shorten the last mail.

Thanks for your interesting investigation.

I wonder why this problem has not shown up before, then. Shared
interrupts should be a common thing.

This kind of race condition should not be there at all. Do you have a second
peak_pci hardware? It could be an idea to split the IRQs so that you have
two IRQs for two cards, and then connect can0 to can2.

You would still get RX/TX interrupts following each other quickly, but
without interrupt sharing ...

Best regards,
Oliver

On 13.11.2013 04:41, Austin Schuh wrote:
> On Tue, Nov 12, 2013 at 3:22 PM, Austin Schuh <austin@peloton-tech.com> wrote:
>> On Tue, Nov 12, 2013 at 1:26 PM, Oliver Hartkopp <socketcan@hartkopp.net> wrote:
>>> On 12.11.2013 03:59, Austin Schuh wrote:
>>>
>>>>> From the trace it is pretty hard to know which CAN interface is in charge.
>>>>> (2) Can you please add the output of dev->ifindex in the pr_info() calls?
>>>>
>>>> Gladly. See the updated logs.
>>>>
>>>> [ 556.019246] peak_pci 0000:05:00.0 can1: Got an sja1000 interrupt.
>>>> [ 556.019268] Unhandled IRQ 18... stop tracing...
>>>> [ 556.019280] peak_pci 0000:05:00.0 can0: Got an sja1000 interrupt.
>>>> [ 556.019289] peak_pci 0000:05:00.0 can1: Received packet.
>>>> [ 556.019299] peak_pci 0000:05:00.0 can1: sja1000_rx
>>>> [ 556.019307] peak_pci 0000:05:00.0 can0: TX complete.
>>>> [ 556.019318] peak_pci 0000:05:00.0 can0: Returning IRQ_HANDLED
>>>> [ 556.019362] peak_pci 0000:05:00.0 can1: Returning IRQ_HANDLED
>>>>
>>>
>>> This looks pretty broken regarding the IRQ handling.
>>> Maybe the IRQ thread handling has a real problem in the -rt kernel ?!?
>>
>> Sounds pretty plausible right now.
>
> Ok, I spent a good chunk of today reading the IRQ handling code in the
> kernel, and I think I get what is happening and have a plausible
> explanation for why the interrupt is getting disabled. Not sure how
> to test it.
>
> Here is what it looks like is happening. The hardware triggers an
> interrupt. The handler is called, and then the registered action for
> each of the devices is to notify their threads that an IRQ occurred,
> and to have them handle it. Each of the handling threads then calls
> the sja1000_interrupt function, or the equivalent ata_generic
> interrupt function. Two of the three interrupt functions then return
> IRQ_NONE, and one of them returns IRQ_HANDLED. note_interrupt is then
> called in each of the threads (instead of being called once in the
> non-rt case), resulting in 2 unhandled calls and 1 handled call. So
> far, so good. The kernel operates as expected, since less than 99.9 %
> of the interrupts are unhandled. (There is a note_interrupt call in the
> handler, but since the threaded handlers are notified, this doesn't
> get counted.)
>
> Since the IRQ handlers are now all in threads, if the thread that
> actually receives data doesn't process the interrupts, either because
> something goes wrong or because it doesn't get scheduled, there will
> be a bunch of unhandled interrupts noted, and no handled interrupts
> noted. This will cause the IRQ to be disabled.
>
> I guess the next interesting thing to do is to trigger when it
> disables the IRQ and take a look at what is happening. I have a test
> running on one machine with tracing enabled which will disable tracing
> when the IRQ is disabled. That should provide some interesting
> results. I think I also know how to bypass it for now by setting
> "noirqdebug", but I'd like to fix it for real as well.
>
> Austin
>
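
The accounting Austin describes in the quoted mail can be sketched as a
small user-space model. This is only an illustration, not kernel code: the
names IrqDesc, note_interrupt and simulate are stand-ins, the window of
100000 calls and the limit of 99900 unhandled calls are assumed from the
spurious-IRQ detector in kernel/irq/spurious.c, and the kernel's time-based
reset of the unhandled counter is left out. It assumes three threaded
handlers on one shared line, of which only one ever returns IRQ_HANDLED.

```python
from dataclasses import dataclass

# Assumed thresholds (see lead-in): evaluate every WINDOW accounted calls,
# disable the line if more than UNHANDLED_LIMIT of them were unhandled.
WINDOW = 100_000
UNHANDLED_LIMIT = 99_900

IRQ_NONE, IRQ_HANDLED = 0, 1


@dataclass
class IrqDesc:
    """Hypothetical stand-in for the per-IRQ bookkeeping."""
    irq_count: int = 0
    irqs_unhandled: int = 0
    disabled: bool = False


def note_interrupt(desc: IrqDesc, action_ret: int) -> None:
    """Account one handler result; disable the line after a bad window."""
    if desc.disabled:
        return
    if action_ret == IRQ_NONE:
        desc.irqs_unhandled += 1
    desc.irq_count += 1
    if desc.irq_count < WINDOW:
        return
    # End of window: evaluate, then start a fresh window.
    desc.irq_count = 0
    if desc.irqs_unhandled > UNHANDLED_LIMIT:
        desc.disabled = True
    desc.irqs_unhandled = 0


def simulate(hw_interrupts: int, owner_runs: bool) -> bool:
    """Return True if the shared IRQ line ends up disabled.

    Each hardware interrupt wakes three handler threads. Two of them do not
    own the event and report IRQ_NONE. The owning thread reports IRQ_HANDLED
    only if it actually gets to run (owner_runs).
    """
    desc = IrqDesc()
    for _ in range(hw_interrupts):
        note_interrupt(desc, IRQ_NONE)
        note_interrupt(desc, IRQ_NONE)
        if owner_runs:
            note_interrupt(desc, IRQ_HANDLED)
        if desc.disabled:
            break
    return desc.disabled


if __name__ == "__main__":
    print("owner scheduled:", simulate(100_000, owner_runs=True))
    print("owner starved:  ", simulate(100_000, owner_runs=False))
```

In this model roughly two thirds of the accounted calls are unhandled while
the owning thread keeps up, which stays well below the limit. Once the
owning thread stops reporting, every accounted call is unhandled and the
line is disabled as soon as one full window has been counted.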