From: Wolfgang Grandegger <wg@grandegger.com>
To: Oliver Hartkopp <socketcan@hartkopp.net>
Cc: Austin Schuh <austin@peloton-tech.com>,
Pavel Pisa <pisa@cmp.felk.cvut.cz>,
linux-can@vger.kernel.org
Subject: Re: sja1000 interrupt problem
Date: Thu, 12 Dec 2013 23:56:22 +0100 [thread overview]
Message-ID: <52AA3F16.3070309@grandegger.com> (raw)
In-Reply-To: <52A9F491.3060406@hartkopp.net>
Hi Oliver,
I started to analyse Austin's function trace...
On 12/12/2013 06:38 PM, Oliver Hartkopp wrote:
> On 12.12.2013 07:13, Oliver Hartkopp wrote:
>> On 11.12.2013 20:27, Wolfgang Grandegger wrote:
>
>>> BTW, does the problem show up if only one can is active? Austin tests
>>> did just RX on can1, but can0 was obviously up (but no traffic).
>>
>> There was a case that the IRQ line from can9, can10, can11, can12 failed with
>> an error messaged pointing to can10. All four interfaces where up but only
>> can9 had traffic at that time.
>
> Hi Wolfgang,
>
> here's my latest investigation result.
>
> The setup still has (only) can9 with traffic and after a modification of
> linux/kernel/irq/spurious.c (patch below) I got this:
>
> [ 1117.957651] irq 17: nobody cared (try booting with the "irqpoll" option)
> [ 1117.959910] CPU: 0 PID: 3498 Comm: irq/17-can9 Not tainted 3.10.11-rt7-can #6
> [ 1117.959913] Hardware name: xxxxxx
> [ 1117.959916] 00000000 c1089114 f4e44840 00000001 00000011 c1089490 ee84e780 f4e44840
> [ 1117.959924] ee84e780 ed876fa0 c1087cf3 c10884a9 ee84e7a0 ed876fa0 16edf7d9 00000000
> [ 1117.959932] 00000000 00000000 00000000 c108857f eea53de4 ee84e780 c1088416 ef0a1f90
> [ 1117.959939] Call Trace:
> [ 1117.959949] [<c1089114>] ? __report_bad_irq+0x18/0xbe
> [ 1117.959953] [<c1089490>] ? note_interrupt+0x118/0x194
> [ 1117.959957] [<c1087cf3>] ? irq_thread_fn+0x21/0x21
> [ 1117.959960] [<c10884a9>] ? irq_thread+0x93/0x169
> [ 1117.959964] [<c108857f>] ? irq_thread+0x169/0x169
> [ 1117.959968] [<c1088416>] ? wake_threads_waitq+0x31/0x31
> [ 1117.959973] [<c104a79e>] ? kthread+0x68/0x6d
> [ 1117.959979] [<c13143b7>] ? ret_from_kernel_thread+0x1b/0x28
> [ 1117.959982] [<c104a736>] ? __kthread_parkme+0x50/0x50
> [ 1117.959986] handlers:
> [ 1117.962184] [<c1087bdb>] irq_default_primary_handler threaded [<f86b169b>] sja1000_interrupt [sja1000]device can8 PITA 0x0001
> [ 1117.962190] [<c1087bdb>] irq_default_primary_handler threaded [<f86b169b>] sja1000_interrupt [sja1000]device can10 PITA 0x0001
> [ 1117.962196] [<c1087bdb>] irq_default_primary_handler threaded [<f86b169b>] sja1000_interrupt [sja1000]device can11 PITA 0x0001
> [ 1117.962201] [<c1087bdb>] irq_default_primary_handler threaded [<f86b169b>] sja1000_interrupt [sja1000]device can9 PITA 0x0001
> [ 1117.962202] Disabling IRQ #17
...ending here as well.
> The value in the PITA register is 0x0001 which is from can9 according to
> peak_pci_icr_masks[] (2nd element).
>
> So obviously the flag was set and not consumed correctly ...
Well, it could also be set after the bug above triggered. If I look to
Austins trace, I see at the end (with "grep note_interrupt"):
irq/18-ata_gene-219 [003] ....... 360026.315878: note_interrupt <-irq_thread
irq/18-can1-1890 [001] ....... 360026.315984: note_interrupt <-irq_thread
irq/18-can0-1863 [002] ....... 360026.316122: note_interrupt <-irq_thread
irq/18-ata_gene-219 [003] ....... 360026.316140: note_interrupt <-irq_thread
irq/18-can1-1890 [001] ....... 360026.316184: note_interrupt <-irq_thread
irq/18-can0-1863 [002] ....... 360026.316358: note_interrupt <-irq_thread
irq/18-ata_gene-219 [003] ....... 360026.316361: note_interrupt <-irq_thread
irq/18-ata_gene-219 [003] ....... 360026.316361: __report_bad_irq <-note_interrupt
irq/18-can1-1890 [001] ....... 360026.316437: note_interrupt <-irq_thread
irq/18-can0-1863 [002] ....... 360026.316608: note_interrupt <-irq_thread
irq/18-can1-1890 [001] ....... 360026.316714: note_interrupt <-irq_thread
irq/18-ata_gene-219 [003] d...1.. 360026.317050: _raw_spin_unlock_irqrestore <-note_interrupt
irq/18-ata_gene-219 [003] ....... 360026.317051: printk <-note_interrupt
irq/18-ata_gene-219 [003] ....... 360026.317541: irq_disable <-note_interrupt
irq/18-ata_gene-219 [003] ....... 360026.317541: mod_timer <-note_interrupt
In Austin's case can1 is doing RX and can0 is up but inactive. The last activity
in "sja1000_interrupt" is:
irq/18-can1-1890 [001] .....11 360026.316619: sja1000_interrupt <-irq_forced_thread_fn
irq/18-can1-1890 [001] .....11 360026.316619: peak_pci_read_reg <-sja1000_interrupt
irq/18-can1-1890 [001] .....11 360026.316621: peak_pci_read_reg <-sja1000_interrupt
irq/18-can1-1890 [001] .....11 360026.316623: peak_pci_read_reg <-sja1000_interrupt
irq/18-can1-1890 [001] .....11 360026.316626: alloc_can_skb <-sja1000_interrupt
irq/18-can1-1890 [001] .....11 360026.316631: peak_pci_read_reg <-sja1000_interrupt
irq/18-can1-1890 [001] .....11 360026.316633: peak_pci_read_reg <-sja1000_interrupt
irq/18-can1-1890 [001] .....11 360026.316637: peak_pci_read_reg <-sja1000_interrupt
irq/18-can1-1890 [001] .....11 360026.316640: peak_pci_read_reg <-sja1000_interrupt
irq/18-can1-1890 [001] .....11 360026.316643: peak_pci_read_reg <-sja1000_interrupt
irq/18-can1-1890 [001] .....11 360026.316645: peak_pci_read_reg <-sja1000_interrupt
irq/18-can1-1890 [001] .....11 360026.316648: peak_pci_read_reg <-sja1000_interrupt
irq/18-can1-1890 [001] .....11 360026.316651: peak_pci_read_reg <-sja1000_interrupt
irq/18-can1-1890 [001] .....11 360026.316654: peak_pci_read_reg <-sja1000_interrupt
irq/18-can1-1890 [001] .....11 360026.316656: peak_pci_read_reg <-sja1000_interrupt
irq/18-can1-1890 [001] .....11 360026.316659: peak_pci_read_reg <-sja1000_interrupt
irq/18-can1-1890 [001] .....11 360026.316662: sja1000_write_cmdreg <-sja1000_interrupt
irq/18-can1-1890 [001] .....12 360026.316666: migrate_enable <-sja1000_interrupt
irq/18-can1-1890 [001] .....11 360026.316667: netif_rx <-sja1000_interrupt
irq/18-can1-1890 [001] .....11 360026.316671: peak_pci_read_reg <-sja1000_interrupt
irq/18-can1-1890 [001] .....11 360026.316673: peak_pci_read_reg <-sja1000_interrupt
irq/18-can1-1890 [001] .....11 360026.316675: peak_pci_post_irq <-sja1000_interrupt
Which ends with "peak_pci_post_irq" before the interrupt gets disabled.
Pretty normal! I'm puzzled why the irqs_unhandled counter of the
interrupt reaches 99900 here:
http://lxr.free-electrons.com/source/kernel/irq/spurious.c#L30
My impression is that the problem is with counting "irqs_unhandled" and "irqs_count",
which might not be done atomically. Actually three threads call "note_interrupt".
Does that make sense? Hope to find some time tomorrow to use atomic_set and friends
to handle these counters.
Wolfgang.
next prev parent reply other threads:[~2013-12-12 22:56 UTC|newest]
Thread overview: 66+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-10-08 0:47 sja1000 interrupt problem Austin Schuh
2013-10-08 6:32 ` Wolfgang Grandegger
2013-10-08 6:58 ` Oliver Hartkopp
2013-10-08 18:48 ` Austin Schuh
2013-10-08 19:44 ` Wolfgang Grandegger
2013-10-08 20:47 ` Austin Schuh
2013-10-09 6:21 ` Wolfgang Grandegger
2013-10-09 6:31 ` Wolfgang Grandegger
2013-10-09 6:47 ` Wolfgang Grandegger
[not found] ` <CANGgnMZpPGctUWGcg7Lp-QFPc7d6A5GeL9KQYnpeYMR8WukgdA@mail.gmail.com>
2013-11-07 8:15 ` Wolfgang Grandegger
2013-11-07 23:43 ` Austin Schuh
2013-11-09 14:21 ` Oliver Hartkopp
2013-11-12 2:59 ` Austin Schuh
2013-11-12 21:26 ` Oliver Hartkopp
2013-11-12 23:22 ` Austin Schuh
2013-11-13 3:41 ` Austin Schuh
2013-11-13 6:58 ` Oliver Hartkopp
2013-11-13 9:48 ` Kurt Van Dijck
2013-11-13 6:44 ` Oliver Hartkopp
2013-11-13 8:11 ` Wolfgang Grandegger
2013-11-13 9:08 ` Pavel Pisa
2013-11-13 9:52 ` Wolfgang Grandegger
2013-11-13 18:41 ` Oliver Hartkopp
2013-11-13 19:29 ` Wolfgang Grandegger
2013-11-13 22:00 ` Oliver Hartkopp
2013-11-13 11:02 ` Kurt Van Dijck
2013-11-16 21:42 ` Oliver Hartkopp
2013-11-17 8:18 ` Wolfgang Grandegger
2013-11-17 14:27 ` Oliver Hartkopp
2013-11-17 17:23 ` Wolfgang Grandegger
2013-11-17 20:46 ` Wolfgang Grandegger
2013-11-18 17:08 ` Austin Schuh
2013-12-09 21:54 ` Austin Schuh
2013-12-09 21:54 ` Austin Schuh
2013-12-10 7:49 ` Wolfgang Grandegger
2013-12-10 8:05 ` Austin Schuh
2013-12-10 9:32 ` Wolfgang Grandegger
2013-12-10 13:47 ` Oliver Hartkopp
2013-12-10 14:23 ` Oliver Hartkopp
2013-12-10 14:41 ` Wolfgang Grandegger
2013-12-10 16:05 ` Oliver Hartkopp
2013-12-10 21:12 ` Wolfgang Grandegger
2013-12-11 16:59 ` Oliver Hartkopp
2013-12-11 19:27 ` Wolfgang Grandegger
2013-12-12 6:13 ` Oliver Hartkopp
2013-12-12 17:38 ` Oliver Hartkopp
2013-12-12 22:56 ` Wolfgang Grandegger [this message]
2013-12-13 0:07 ` Austin Schuh
2013-12-13 16:16 ` Oliver Hartkopp
2013-12-13 9:38 ` Oliver Hartkopp
2013-12-13 10:04 ` Wolfgang Grandegger
2013-12-13 10:09 ` Wolfgang Grandegger
2013-12-13 16:25 ` Oliver Hartkopp
2013-12-13 17:33 ` Wolfgang Grandegger
2013-12-13 10:07 ` Marc Kleine-Budde
2013-12-13 16:22 ` Oliver Hartkopp
2013-12-13 17:14 ` Oliver Hartkopp
2013-12-13 21:14 ` Oliver Hartkopp
2013-12-14 9:51 ` Oliver Hartkopp
2013-12-20 23:13 ` Austin Schuh
2013-12-21 8:29 ` Wolfgang Grandegger
2013-12-21 13:12 ` Oliver Hartkopp
2013-12-21 12:55 ` Oliver Hartkopp
2013-12-23 15:58 ` Oliver Hartkopp
2013-11-09 19:42 ` Wolfgang Grandegger
[not found] ` <CANGgnMbb+VResUC6h+cK6Hfe5PLJx9R9ao6bMdJM2e5BPaDamw@mail.gmail.com>
2013-11-12 22:15 ` Wolfgang Grandegger
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=52AA3F16.3070309@grandegger.com \
--to=wg@grandegger.com \
--cc=austin@peloton-tech.com \
--cc=linux-can@vger.kernel.org \
--cc=pisa@cmp.felk.cvut.cz \
--cc=socketcan@hartkopp.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.