From: Wolfgang Grandegger <wg@grandegger.com>
To: Oliver Hartkopp <socketcan@hartkopp.net>
Cc: Austin Schuh <austin@peloton-tech.com>,
Pavel Pisa <pisa@cmp.felk.cvut.cz>,
linux-can@vger.kernel.org
Subject: Re: sja1000 interrupt problem
Date: Thu, 12 Dec 2013 23:56:22 +0100 [thread overview]
Message-ID: <52AA3F16.3070309@grandegger.com> (raw)
In-Reply-To: <52A9F491.3060406@hartkopp.net>
Hi Oliver,
I started to analyse Austin's function trace...
On 12/12/2013 06:38 PM, Oliver Hartkopp wrote:
> On 12.12.2013 07:13, Oliver Hartkopp wrote:
>> On 11.12.2013 20:27, Wolfgang Grandegger wrote:
>
>>> BTW, does the problem show up if only one can is active? Austin tests
>>> did just RX on can1, but can0 was obviously up (but no traffic).
>>
>> There was a case that the IRQ line from can9, can10, can11, can12 failed with
>> an error messaged pointing to can10. All four interfaces where up but only
>> can9 had traffic at that time.
>
> Hi Wolfgang,
>
> here's my latest investigation result.
>
> The setup still has (only) can9 with traffic and after a modification of
> linux/kernel/irq/spurious.c (patch below) I got this:
>
> [ 1117.957651] irq 17: nobody cared (try booting with the "irqpoll" option)
> [ 1117.959910] CPU: 0 PID: 3498 Comm: irq/17-can9 Not tainted 3.10.11-rt7-can #6
> [ 1117.959913] Hardware name: xxxxxx
> [ 1117.959916] 00000000 c1089114 f4e44840 00000001 00000011 c1089490 ee84e780 f4e44840
> [ 1117.959924] ee84e780 ed876fa0 c1087cf3 c10884a9 ee84e7a0 ed876fa0 16edf7d9 00000000
> [ 1117.959932] 00000000 00000000 00000000 c108857f eea53de4 ee84e780 c1088416 ef0a1f90
> [ 1117.959939] Call Trace:
> [ 1117.959949] [<c1089114>] ? __report_bad_irq+0x18/0xbe
> [ 1117.959953] [<c1089490>] ? note_interrupt+0x118/0x194
> [ 1117.959957] [<c1087cf3>] ? irq_thread_fn+0x21/0x21
> [ 1117.959960] [<c10884a9>] ? irq_thread+0x93/0x169
> [ 1117.959964] [<c108857f>] ? irq_thread+0x169/0x169
> [ 1117.959968] [<c1088416>] ? wake_threads_waitq+0x31/0x31
> [ 1117.959973] [<c104a79e>] ? kthread+0x68/0x6d
> [ 1117.959979] [<c13143b7>] ? ret_from_kernel_thread+0x1b/0x28
> [ 1117.959982] [<c104a736>] ? __kthread_parkme+0x50/0x50
> [ 1117.959986] handlers:
> [ 1117.962184] [<c1087bdb>] irq_default_primary_handler threaded [<f86b169b>] sja1000_interrupt [sja1000]device can8 PITA 0x0001
> [ 1117.962190] [<c1087bdb>] irq_default_primary_handler threaded [<f86b169b>] sja1000_interrupt [sja1000]device can10 PITA 0x0001
> [ 1117.962196] [<c1087bdb>] irq_default_primary_handler threaded [<f86b169b>] sja1000_interrupt [sja1000]device can11 PITA 0x0001
> [ 1117.962201] [<c1087bdb>] irq_default_primary_handler threaded [<f86b169b>] sja1000_interrupt [sja1000]device can9 PITA 0x0001
> [ 1117.962202] Disabling IRQ #17
...ending here as well.
> The value in the PITA register is 0x0001 which is from can9 according to
> peak_pci_icr_masks[] (2nd element).
>
> So obviously the flag was set and not consumed correctly ...
Well, it could also be set after the bug above triggered. If I look to
Austins trace, I see at the end (with "grep note_interrupt"):
irq/18-ata_gene-219 [003] ....... 360026.315878: note_interrupt <-irq_thread
irq/18-can1-1890 [001] ....... 360026.315984: note_interrupt <-irq_thread
irq/18-can0-1863 [002] ....... 360026.316122: note_interrupt <-irq_thread
irq/18-ata_gene-219 [003] ....... 360026.316140: note_interrupt <-irq_thread
irq/18-can1-1890 [001] ....... 360026.316184: note_interrupt <-irq_thread
irq/18-can0-1863 [002] ....... 360026.316358: note_interrupt <-irq_thread
irq/18-ata_gene-219 [003] ....... 360026.316361: note_interrupt <-irq_thread
irq/18-ata_gene-219 [003] ....... 360026.316361: __report_bad_irq <-note_interrupt
irq/18-can1-1890 [001] ....... 360026.316437: note_interrupt <-irq_thread
irq/18-can0-1863 [002] ....... 360026.316608: note_interrupt <-irq_thread
irq/18-can1-1890 [001] ....... 360026.316714: note_interrupt <-irq_thread
irq/18-ata_gene-219 [003] d...1.. 360026.317050: _raw_spin_unlock_irqrestore <-note_interrupt
irq/18-ata_gene-219 [003] ....... 360026.317051: printk <-note_interrupt
irq/18-ata_gene-219 [003] ....... 360026.317541: irq_disable <-note_interrupt
irq/18-ata_gene-219 [003] ....... 360026.317541: mod_timer <-note_interrupt
In Austin's case can1 is doing RX and can0 is up but inactive. The last activity
in "sja1000_interrupt" is:
irq/18-can1-1890 [001] .....11 360026.316619: sja1000_interrupt <-irq_forced_thread_fn
irq/18-can1-1890 [001] .....11 360026.316619: peak_pci_read_reg <-sja1000_interrupt
irq/18-can1-1890 [001] .....11 360026.316621: peak_pci_read_reg <-sja1000_interrupt
irq/18-can1-1890 [001] .....11 360026.316623: peak_pci_read_reg <-sja1000_interrupt
irq/18-can1-1890 [001] .....11 360026.316626: alloc_can_skb <-sja1000_interrupt
irq/18-can1-1890 [001] .....11 360026.316631: peak_pci_read_reg <-sja1000_interrupt
irq/18-can1-1890 [001] .....11 360026.316633: peak_pci_read_reg <-sja1000_interrupt
irq/18-can1-1890 [001] .....11 360026.316637: peak_pci_read_reg <-sja1000_interrupt
irq/18-can1-1890 [001] .....11 360026.316640: peak_pci_read_reg <-sja1000_interrupt
irq/18-can1-1890 [001] .....11 360026.316643: peak_pci_read_reg <-sja1000_interrupt
irq/18-can1-1890 [001] .....11 360026.316645: peak_pci_read_reg <-sja1000_interrupt
irq/18-can1-1890 [001] .....11 360026.316648: peak_pci_read_reg <-sja1000_interrupt
irq/18-can1-1890 [001] .....11 360026.316651: peak_pci_read_reg <-sja1000_interrupt
irq/18-can1-1890 [001] .....11 360026.316654: peak_pci_read_reg <-sja1000_interrupt
irq/18-can1-1890 [001] .....11 360026.316656: peak_pci_read_reg <-sja1000_interrupt
irq/18-can1-1890 [001] .....11 360026.316659: peak_pci_read_reg <-sja1000_interrupt
irq/18-can1-1890 [001] .....11 360026.316662: sja1000_write_cmdreg <-sja1000_interrupt
irq/18-can1-1890 [001] .....12 360026.316666: migrate_enable <-sja1000_interrupt
irq/18-can1-1890 [001] .....11 360026.316667: netif_rx <-sja1000_interrupt
irq/18-can1-1890 [001] .....11 360026.316671: peak_pci_read_reg <-sja1000_interrupt
irq/18-can1-1890 [001] .....11 360026.316673: peak_pci_read_reg <-sja1000_interrupt
irq/18-can1-1890 [001] .....11 360026.316675: peak_pci_post_irq <-sja1000_interrupt
Which ends with "peak_pci_post_irq" before the interrupt gets disabled.
Pretty normal! I'm puzzled why the irqs_unhandled counter of the
interrupt reaches 99900 here:
http://lxr.free-electrons.com/source/kernel/irq/spurious.c#L30
My impression is that the problem is with counting "irqs_unhandled" and "irqs_count",
which might not be done atomically. Actually three threads call "note_interrupt".
Does that make sense? Hope to find some time tomorrow to use atomic_set and friends
to handle these counters.
Wolfgang.
next prev parent reply other threads:[~2013-12-12 22:56 UTC|newest]
Thread overview: 66+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-10-08 0:47 sja1000 interrupt problem Austin Schuh
2013-10-08 6:32 ` Wolfgang Grandegger
2013-10-08 6:58 ` Oliver Hartkopp
2013-10-08 18:48 ` Austin Schuh
2013-10-08 19:44 ` Wolfgang Grandegger
2013-10-08 20:47 ` Austin Schuh
2013-10-09 6:21 ` Wolfgang Grandegger
2013-10-09 6:31 ` Wolfgang Grandegger
2013-10-09 6:47 ` Wolfgang Grandegger
[not found] ` <CANGgnMZpPGctUWGcg7Lp-QFPc7d6A5GeL9KQYnpeYMR8WukgdA@mail.gmail.com>
2013-11-07 8:15 ` Wolfgang Grandegger
2013-11-07 23:43 ` Austin Schuh
2013-11-09 14:21 ` Oliver Hartkopp
2013-11-12 2:59 ` Austin Schuh
2013-11-12 21:26 ` Oliver Hartkopp
2013-11-12 23:22 ` Austin Schuh
2013-11-13 3:41 ` Austin Schuh
2013-11-13 6:58 ` Oliver Hartkopp
2013-11-13 9:48 ` Kurt Van Dijck
2013-11-13 6:44 ` Oliver Hartkopp
2013-11-13 8:11 ` Wolfgang Grandegger
2013-11-13 9:08 ` Pavel Pisa
2013-11-13 9:52 ` Wolfgang Grandegger
2013-11-13 18:41 ` Oliver Hartkopp
2013-11-13 19:29 ` Wolfgang Grandegger
2013-11-13 22:00 ` Oliver Hartkopp
2013-11-13 11:02 ` Kurt Van Dijck
2013-11-16 21:42 ` Oliver Hartkopp
2013-11-17 8:18 ` Wolfgang Grandegger
2013-11-17 14:27 ` Oliver Hartkopp
2013-11-17 17:23 ` Wolfgang Grandegger
2013-11-17 20:46 ` Wolfgang Grandegger
2013-11-18 17:08 ` Austin Schuh
2013-12-09 21:54 ` Austin Schuh
2013-12-09 21:54 ` Austin Schuh
2013-12-10 7:49 ` Wolfgang Grandegger
2013-12-10 8:05 ` Austin Schuh
2013-12-10 9:32 ` Wolfgang Grandegger
2013-12-10 13:47 ` Oliver Hartkopp
2013-12-10 14:23 ` Oliver Hartkopp
2013-12-10 14:41 ` Wolfgang Grandegger
2013-12-10 16:05 ` Oliver Hartkopp
2013-12-10 21:12 ` Wolfgang Grandegger
2013-12-11 16:59 ` Oliver Hartkopp
2013-12-11 19:27 ` Wolfgang Grandegger
2013-12-12 6:13 ` Oliver Hartkopp
2013-12-12 17:38 ` Oliver Hartkopp
2013-12-12 22:56 ` Wolfgang Grandegger [this message]
2013-12-13 0:07 ` Austin Schuh
2013-12-13 16:16 ` Oliver Hartkopp
2013-12-13 9:38 ` Oliver Hartkopp
2013-12-13 10:04 ` Wolfgang Grandegger
2013-12-13 10:09 ` Wolfgang Grandegger
2013-12-13 16:25 ` Oliver Hartkopp
2013-12-13 17:33 ` Wolfgang Grandegger
2013-12-13 10:07 ` Marc Kleine-Budde
2013-12-13 16:22 ` Oliver Hartkopp
2013-12-13 17:14 ` Oliver Hartkopp
2013-12-13 21:14 ` Oliver Hartkopp
2013-12-14 9:51 ` Oliver Hartkopp
2013-12-20 23:13 ` Austin Schuh
2013-12-21 8:29 ` Wolfgang Grandegger
2013-12-21 13:12 ` Oliver Hartkopp
2013-12-21 12:55 ` Oliver Hartkopp
2013-12-23 15:58 ` Oliver Hartkopp
2013-11-09 19:42 ` Wolfgang Grandegger
[not found] ` <CANGgnMbb+VResUC6h+cK6Hfe5PLJx9R9ao6bMdJM2e5BPaDamw@mail.gmail.com>
2013-11-12 22:15 ` Wolfgang Grandegger
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=52AA3F16.3070309@grandegger.com \
--to=wg@grandegger.com \
--cc=austin@peloton-tech.com \
--cc=linux-can@vger.kernel.org \
--cc=pisa@cmp.felk.cvut.cz \
--cc=socketcan@hartkopp.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).