linux-can.vger.kernel.org archive mirror
From: Oliver Hartkopp <socketcan@hartkopp.net>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: Austin Schuh <austin@peloton-tech.com>,
	Wolfgang Grandegger <wg@grandegger.com>,
	Pavel Pisa <pisa@cmp.felk.cvut.cz>,
	Marc Kleine-Budde <mkl@pengutronix.de>,
	linux-can@vger.kernel.org
Subject: Re: [PATCH] genirq: Sanitize spurious interrupt detection of threaded irqs
Date: Mon, 06 Jan 2014 14:32:15 +0100
Message-ID: <52CAB05F.4010303@hartkopp.net>
In-Reply-To: <CANGgnMbszHzYe9pF2C6wag4MY_PfBG2qrMCC=rMmQnb-jyXXXw@mail.gmail.com>

Hi Thomas,

I just wanted to add my

Tested-by: Oliver Hartkopp <socketcan@hartkopp.net>

In my setup (Core i7 with 20 SJA1000 CAN buses on PCIe), the problem
disappeared on the -rt kernel with the discussed patch applied.

The system ran at full CAN bus load over the weekend, for more than 72
hours, without problems:
 
           CPU0       CPU1       CPU2       CPU3       
  0:         40          0          0          0   IO-APIC-edge      timer
  1:          1          0          0          0   IO-APIC-edge      i8042
  8:          0          0          1          0   IO-APIC-edge      rtc0
  9:         42         45         45         42   IO-APIC-fasteoi   acpi
 16:          9          8          8          8   IO-APIC-fasteoi   ahci, ehci_hcd:usb1, can4, can5, can6, can7
 17:  441468642  443275488  443609061  441436145   IO-APIC-fasteoi   can8, can10, can11, can9
 18:  441975412  438811422  437317802  441209092   IO-APIC-fasteoi   can12, can13, can14, can15
 19:  427310388  428661677  429813687  428095739   IO-APIC-fasteoi   can0, can1, can2, can3, can16, can17, can18, can19
(..)
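
For a rough sense of scale, the per-CPU counts of one line can be summed and
turned into an average rate. The sketch below is stand-alone C, not kernel
code; the names irq17, irq_total and hours_until_wrap31 are hypothetical. The
72 hour uptime is only the lower bound stated above, and the code assumes one
handled event per hardware interrupt, so the wrap time it gives for a 31 bit
counter (the width discussed in Austin's quoted mail below) is an estimate
at best.

```c
#include <stddef.h>
#include <stdint.h>

/* Per-CPU counts for IRQ 17, copied from the table above. */
static const uint32_t irq17[4] = {
    441468642u, 443275488u, 443609061u, 441436145u
};

/* Sum the per-CPU columns of one /proc/interrupts row. */
static uint64_t irq_total(const uint32_t *per_cpu, size_t ncpus)
{
    uint64_t total = 0;

    for (size_t i = 0; i < ncpus; i++)
        total += per_cpu[i];
    return total;
}

/*
 * Hours until a 31 bit event counter wraps, assuming the average rate
 * total / uptime_hours stays constant (an assumption, not a measurement).
 */
static double hours_until_wrap31(uint64_t total, double uptime_hours)
{
    double per_hour = (double)total / uptime_hours;

    return 2147483648.0 / per_hour; /* 2^31 events */
}
```

With the IRQ 17 row and an assumed uptime of 72 hours, this gives a total of
1769789336 interrupts and a wrap time of roughly 87 hours.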

Without the patch, on Linux 3.10.11-rt (Debian
linux-image-3.10-0.bpo.3-rt-686-pae), it took between 1 minute and 1.5 hours
(usually ~3 minutes) until the IRQ was disabled by the spurious interrupt
detection.

I have also been testing the patch on several recent 3.13-rc5+ (non-rt)
kernels for two weeks now without problems.

If you want me to test an improved version (as Austin suggested below) please
send a patch.

Best regards,
Oliver

On 23.12.2013 20:25, Austin Schuh wrote:
> Hi Thomas,
> 
> Did anything happen with your patch to note_interrupt, originally
> posted on May 8th of 2013?  (https://lkml.org/lkml/2013/3/7/222)
> 
> I am seeing an issue on a machine right now running a
> config-preempt-rt kernel and a SJA1000 CAN card from PEAK.  It works
> for ~1 day, and then proceeds to die with a "Disabling IRQ #18"
> message.  I posted on the Linux CAN mailing list, and Oliver Hartkopp
> was able to reproduce the issue only on a realtime kernel.  A function
> trace ending when the IRQ was disabled shows that note_interrupt is
> being called regularly from the IRQ handler threads, and one of the
> threads is doing work (and therefore calling note_interrupt with
> IRQ_HANDLED).
> 
> Oliver Hartkopp and I ran tests over the weekend on numerous machines
> and verified that the patch that you proposed fixes the problem.  We
> think that the race condition that Till reported is causing the
> problem here.
> 
> In reply to the comment about using the upper bit of
> threads_handled_last for holding the SPURIOUS_DEFERRED flag, while
> that may still be an over-optimization, the code should still work.
> All comparisons are done with the bit set, which just makes it a 31
> bit counter.  It will take 8 more days for the counter to overflow on
> my machine, so I won't know for certain until then.
> 
> My only concern is that there may still be a small race condition with
> this new code.  If the interrupt handler thread is running at a
> realtime priority, but lower than another task, it may not get run
> until a large number of IRQs get triggered, and then process them
> quickly.  With your new handler code, this would be counted as one
> single handled interrupt.  With the current constants, this is only a
> problem if more than 1000 calls to the handler happen between IRQs.  I
> starved my card's irq threads by running 4 tasks at a higher realtime
> priority than the handler threads, and saw the number of unhandled
> IRQs jump from 1/100000 to 3/100000, so that problem may not show up
> in practice.
> 
> Austin Schuh
> 
> Tested-by: Austin Schuh <austin@peloton-tech.com>
> 
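
The point above about keeping the SPURIOUS_DEFERRED flag in the upper bit of
threads_handled_last can be illustrated with a small user-space model. This
is a sketch under assumptions, not the code from the patch: struct irq_model
and thread_made_progress() are hypothetical names, and the flag value
0x80000000 is assumed from "upper bit" of a 32 bit field. Only
SPURIOUS_DEFERRED and threads_handled_last are names taken from the mail.

```c
#include <stdbool.h>
#include <stdint.h>

#define SPURIOUS_DEFERRED 0x80000000u /* assumed: top bit of a 32 bit word */

struct irq_model {
    uint32_t threads_handled;      /* bumped by the handler thread */
    uint32_t threads_handled_last; /* last snapshot, top bit is the flag */
};

/*
 * Compare the current count against the last snapshot with the flag bit
 * forced on in both values. Only the low 31 bits can differ, so the
 * counter effectively has 31 bits.
 */
static bool thread_made_progress(struct irq_model *m)
{
    uint32_t handled = m->threads_handled | SPURIOUS_DEFERRED;

    if (handled != m->threads_handled_last) {
        m->threads_handled_last = handled;
        return true;  /* at least one event handled since last check */
    }
    return false;     /* no visible progress since last check */
}
```

In this model a burst of 1000 handled events between two checks registers as
a single "progress" result, which is the effect Austin describes, and an
advance of exactly 2^31 events between two checks would look like no
progress at all.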

Thread overview: 9+ messages
2013-12-23 19:25 [PATCH] genirq: Sanitize spurious interrupt detection of threaded irqs Austin Schuh
2014-01-06 13:32 ` Oliver Hartkopp [this message]
2014-04-07 18:38   ` Austin Schuh
2014-04-07 18:41     ` Thomas Gleixner
2014-04-07 20:05       ` Austin Schuh
2014-04-07 20:07         ` Thomas Gleixner
2014-04-07 20:08           ` Austin Schuh
2014-04-28 20:20             ` Austin Schuh
2014-04-28 20:44               ` Thomas Gleixner
