linux-can.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Wolfgang Grandegger <wg@grandegger.com>
To: Austin Schuh <austin@peloton-tech.com>
Cc: Oliver Hartkopp <socketcan@hartkopp.net>, linux-can@vger.kernel.org
Subject: Re: sja1000 interrupt problem
Date: Tue, 12 Nov 2013 23:15:16 +0100	[thread overview]
Message-ID: <5282A874.5050809@grandegger.com> (raw)
In-Reply-To: <CANGgnMbb+VResUC6h+cK6Hfe5PLJx9R9ao6bMdJM2e5BPaDamw@mail.gmail.com>

Hi Austin,

On 11/12/2013 03:59 AM, Austin Schuh wrote:
> Hi Wolfgang,
> 
> On Sat, Nov 9, 2013 at 11:42 AM, Wolfgang Grandegger <wg@grandegger.com> wrote:
>> Hi Austin,
>>
>> On 11/08/2013 12:43 AM, Austin Schuh wrote:
>>>>>
>>>>> The dump is attached.
>>>>
>>>> I do not see any sja1000_rx() calls. Either they never happen or more
>>>> likely the trace is not long enough. Could you try with a larger buffer
>>>> using "echo 20000 > buffer_size_kb"? I also do not see some pr_info()
>>>> related functions at the end of the trace. Are you sure is has stopped
>>>> (cat tracing_on or message in dmesg)?
>>>>
>>>> Also please do an "echo 0 > trace" to clear the trace content.
>>>>
>>>> Wolfgang.
>>>
>>> Hi Wolfgang,
>>>
>>> I'm pretty certain that the trace is long enough.  I tried again with
>>> echo -e -n 100000 > buffer_size_kb and I still don't see any calls to
>>> sja1000_rx.
>>>
>>> I added some pr_info prints at the front of sja1000_rx and
>>> sja1000_interrupt.  For each packet sent and then received, I see the
>>> following.  The following lines are from me sending 4 packets.
>>
>> sja1000_interrupt() is normally called for each SJA1000 device (shared
>> interrupt).
>>
>>> Nov  7 15:35:52 vpc5 kernel: [   75.136107] Got an sja1000 interrupt.
>>> Nov  7 15:35:52 vpc5 kernel: [   75.136123] Unhandled IRQ 18... stop tracing...
>>
>> I'm confused. Why is the IRQ "unhandled" without calling
>> sja1000_interrupt() twice. Ah, this is due to threaded interrupt
>> handling, where the spurious interrupt check is called for each
>> handler/device. Therefore the trigger is simply bad.
>>
>>> Nov  7 15:35:52 vpc5 kernel: [   75.136130] Got an sja1000 interrupt.
>>> Nov  7 15:35:52 vpc5 kernel: [   75.136139] Received packet.
>>> Nov  7 15:35:52 vpc5 kernel: [   75.136146] sja1000_rx
>>> Nov  7 15:35:52 vpc5 kernel: [   75.136155] TX complete.
>>> Nov  7 15:35:52 vpc5 kernel: [   75.136174] Returning IRQ_HANDLED
>>> Nov  7 15:35:52 vpc5 kernel: [   75.136207] Returning IRQ_HANDLED
>> ...
>>
>> This is the working case. How does it look like when the device stops
>> receiving messages. You should also label the device.
> 
> I haven't been able to run the code long enough with all the debug
> turned on because my syslog grows way too big.  This only happens on
> the PC connected to the highly loaded CAN bus.  I have another PC
> connected to a lightly loaded bus, and I don't see nearly as many
> problems with that PC.  I just figured out how to not fill my syslog
> up, so I'll get back to you when I have something interesting to show.
> 
>>> I didn't rerun any traces, since my analysis of the syslog is that it
>>> won't give you the information you are looking for without moving the
>>> tracing_off call somewhere else.
>>
>> Well, we know that messages are received properly for some time. We need
>> to trigger the malfunctioning. I understood that it happens just once a
>> day that the interrupts get stuck, right?
>>
>> Wolfgang.
> 
> I now have a number of datapoints.  These are the timestamps from when
> it disabled the IRQ.  After each of these, I rebooted the machine to
> be safe, though it works to instead take the interface down and bring
> it back up again.  I'm seeing anywhere from 13 hours - 2 hours.
> 
> [47430.043798] Disabling IRQ #18
> [21439.136781] Disabling IRQ #18
> [5641.564900] Disabling IRQ #18
> [45129.357272] Disabling IRQ #18

Be aware that it prints these messages not before having detected more
than 99000 unhandled IRQs in sequence. See:

  http://lxr.linux.no/#linux+v3.12/kernel/irq/spurious.c#L308

Either the SJA1000 is stuck or the interrupt does not go away. Maybe we
have a race with interrupt acknowledgement.

> I moved the trace disable to the end of sja1000_interrupt when I get
> an interrupt from can1, since that is the last event in the trace.
> Attached is the new trace.  It should have more interesting
> information in it.

This does disable the trace after the first interrupt served for CAN1.
Or do I miss something? What do you want to trigger?

>>From the syslog:
> Nov 11 17:56:14 vpc5 kernel: [   79.069060] Unhandled IRQ 18...
> irqs_unhandled: 1, irq_count: 27
> Nov 11 17:56:14 vpc5 kernel: [   79.069097] peak_pci 0000:05:00.0
> can0: Got an sja1000 interrupt.
> Nov 11 17:56:14 vpc5 kernel: [   79.069113] peak_pci 0000:05:00.0
> can1: Got an sja1000 interrupt.
> Nov 11 17:56:14 vpc5 kernel: [   79.069132] peak_pci 0000:05:00.0
> can1: Received packet.
> Nov 11 17:56:14 vpc5 kernel: [   79.069157] peak_pci 0000:05:00.0
> can0: TX complete.
> Nov 11 17:56:14 vpc5 kernel: [   79.069161] peak_pci 0000:05:00.0
> can1: sja1000_rx
> Nov 11 17:56:14 vpc5 kernel: [   79.069199] peak_pci 0000:05:00.0
> can0: Returning IRQ_HANDLED
> Nov 11 17:56:14 vpc5 kernel: [   79.069242] peak_pci 0000:05:00.0
> can1: Returning IRQ_HANDLED
> Nov 11 17:56:14 vpc5 kernel: [   79.069269] peak_pci 0000:05:00.0
> can1: Found can1, disabling tracing.

If you use FTRACE, please remove these prinkt's.

> Instead of sending packets between the CAN interfaces, I grabbed
> another CAN device and am using it to send packets on the bus.  This
> is the configuration that I have been seeing the real failures in.  I
> now have 2 machines with the CAN device attached to see if I can catch
> the kprintfs when it dies.  Hopefully one of them will have died by
> tomorrow morning and I'll be able to find something interesting.

It's good to make the test case as simple as possible. If I understand
correctly, just CAN1 does then receive messages.

Concerning the trigger. Maybe it's already sufficient to stop the trace
after 10 (or even less) unhandled IRQ 18 by adding:

	if (unlikely(desc->irqs_unhandled == 10)) {
		tracing_off();
		printk("%s: tracing stopped\n", __func__);
	}


in note_interrupt here:

  http://lxr.linux.no/#linux+v3.12/kernel/irq/spurious.c#L306.

Other things to check:

- do the following checks in sja1000_interrupt ever return IRQ_NONE?

  /* check for absent controller due to hw unplug */
        if (status == 0xFF && sja1000_is_absent(priv))
               return IRQ_NONE;

- What is the value of the PITA_ICR register when the device is stuck
  (IRQ 18 disabled)? See:

  http://lxr.linux.no/#linux/drivers/net/can/sja1000/peak_pci.c#L539

Hope it helps to find the problem.

Wolfgang.


      parent reply	other threads:[~2013-11-12 22:15 UTC|newest]

Thread overview: 66+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-10-08  0:47 sja1000 interrupt problem Austin Schuh
2013-10-08  6:32 ` Wolfgang Grandegger
2013-10-08  6:58   ` Oliver Hartkopp
2013-10-08 18:48     ` Austin Schuh
2013-10-08 19:44       ` Wolfgang Grandegger
2013-10-08 20:47         ` Austin Schuh
2013-10-09  6:21           ` Wolfgang Grandegger
2013-10-09  6:31           ` Wolfgang Grandegger
2013-10-09  6:47           ` Wolfgang Grandegger
     [not found]             ` <CANGgnMZpPGctUWGcg7Lp-QFPc7d6A5GeL9KQYnpeYMR8WukgdA@mail.gmail.com>
2013-11-07  8:15               ` Wolfgang Grandegger
2013-11-07 23:43                 ` Austin Schuh
2013-11-09 14:21                   ` Oliver Hartkopp
2013-11-12  2:59                     ` Austin Schuh
2013-11-12 21:26                       ` Oliver Hartkopp
2013-11-12 23:22                         ` Austin Schuh
2013-11-13  3:41                           ` Austin Schuh
2013-11-13  6:58                             ` Oliver Hartkopp
2013-11-13  9:48                               ` Kurt Van Dijck
2013-11-13  6:44                           ` Oliver Hartkopp
2013-11-13  8:11                             ` Wolfgang Grandegger
2013-11-13  9:08                               ` Pavel Pisa
2013-11-13  9:52                                 ` Wolfgang Grandegger
2013-11-13 18:41                                   ` Oliver Hartkopp
2013-11-13 19:29                                     ` Wolfgang Grandegger
2013-11-13 22:00                                       ` Oliver Hartkopp
2013-11-13 11:02                                 ` Kurt Van Dijck
2013-11-16 21:42                                 ` Oliver Hartkopp
2013-11-17  8:18                                   ` Wolfgang Grandegger
2013-11-17 14:27                                     ` Oliver Hartkopp
2013-11-17 17:23                                       ` Wolfgang Grandegger
2013-11-17 20:46                                         ` Wolfgang Grandegger
2013-11-18 17:08                                           ` Austin Schuh
2013-12-09 21:54                                             ` Austin Schuh
2013-12-09 21:54                                               ` Austin Schuh
2013-12-10  7:49                                               ` Wolfgang Grandegger
2013-12-10  8:05                                                 ` Austin Schuh
2013-12-10  9:32                                                   ` Wolfgang Grandegger
2013-12-10 13:47                                                     ` Oliver Hartkopp
2013-12-10 14:23                                                       ` Oliver Hartkopp
2013-12-10 14:41                                                       ` Wolfgang Grandegger
2013-12-10 16:05                                                         ` Oliver Hartkopp
2013-12-10 21:12                                                           ` Wolfgang Grandegger
2013-12-11 16:59                                                             ` Oliver Hartkopp
2013-12-11 19:27                                                               ` Wolfgang Grandegger
2013-12-12  6:13                                                                 ` Oliver Hartkopp
2013-12-12 17:38                                                                   ` Oliver Hartkopp
2013-12-12 22:56                                                                     ` Wolfgang Grandegger
2013-12-13  0:07                                                                       ` Austin Schuh
2013-12-13 16:16                                                                         ` Oliver Hartkopp
2013-12-13  9:38                                                                       ` Oliver Hartkopp
2013-12-13 10:04                                                                         ` Wolfgang Grandegger
2013-12-13 10:09                                                                           ` Wolfgang Grandegger
2013-12-13 16:25                                                                             ` Oliver Hartkopp
2013-12-13 17:33                                                                               ` Wolfgang Grandegger
2013-12-13 10:07                                                                         ` Marc Kleine-Budde
2013-12-13 16:22                                                                           ` Oliver Hartkopp
2013-12-13 17:14                                                                             ` Oliver Hartkopp
2013-12-13 21:14                                                                               ` Oliver Hartkopp
2013-12-14  9:51                                                                                 ` Oliver Hartkopp
2013-12-20 23:13                                                                                   ` Austin Schuh
2013-12-21  8:29                                                                                     ` Wolfgang Grandegger
2013-12-21 13:12                                                                                       ` Oliver Hartkopp
2013-12-21 12:55                                                                                     ` Oliver Hartkopp
2013-12-23 15:58                                                                                       ` Oliver Hartkopp
2013-11-09 19:42                   ` Wolfgang Grandegger
     [not found]                     ` <CANGgnMbb+VResUC6h+cK6Hfe5PLJx9R9ao6bMdJM2e5BPaDamw@mail.gmail.com>
2013-11-12 22:15                       ` Wolfgang Grandegger [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5282A874.5050809@grandegger.com \
    --to=wg@grandegger.com \
    --cc=austin@peloton-tech.com \
    --cc=linux-can@vger.kernel.org \
    --cc=socketcan@hartkopp.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).