From: Oliver Hartkopp <socketcan@hartkopp.net>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: Austin Schuh <austin@peloton-tech.com>,
	Wolfgang Grandegger <wg@grandegger.com>,
	Pavel Pisa <pisa@cmp.felk.cvut.cz>,
	Marc Kleine-Budde <mkl@pengutronix.de>,
	linux-can@vger.kernel.org
Subject: Re: [PATCH] genirq: Sanitize spurious interrupt detection of threaded irqs
Date: Mon, 06 Jan 2014 14:32:15 +0100
Message-ID: <52CAB05F.4010303@hartkopp.net>
In-Reply-To: <CANGgnMbszHzYe9pF2C6wag4MY_PfBG2qrMCC=rMmQnb-jyXXXw@mail.gmail.com>

Hi Thomas,

I just wanted to add my

Tested-by: Oliver Hartkopp <socketcan@hartkopp.net>

In my setup (a Core i7 with 20 SJA1000 CAN buses on PCIe), the problem
disappeared on the -rt kernel once the discussed patch was applied.

The system ran at full CAN bus load over the weekend, more than 72 hours of
operation, without problems:
 
           CPU0       CPU1       CPU2       CPU3       
  0:         40          0          0          0   IO-APIC-edge      timer
  1:          1          0          0          0   IO-APIC-edge      i8042
  8:          0          0          1          0   IO-APIC-edge      rtc0
  9:         42         45         45         42   IO-APIC-fasteoi   acpi
 16:          9          8          8          8   IO-APIC-fasteoi   ahci, ehci_hcd:usb1, can4, can5, can6, can7
 17:  441468642  443275488  443609061  441436145   IO-APIC-fasteoi   can8, can10, can11, can9
 18:  441975412  438811422  437317802  441209092   IO-APIC-fasteoi   can12, can13, can14, can15
 19:  427310388  428661677  429813687  428095739   IO-APIC-fasteoi   can0, can1, can2, can3, can16, can17, can18, can19
(..)

Before applying the patch, it took between 1 minute and 1.5 hours (usually ~3
minutes) until the IRQ was disabled by the spurious interrupt detection on
Linux 3.10.11-rt (Debian linux-image-3.10-0.bpo.3-rt-686-pae).

I have also tested the patch on several recent 3.13-rc5+ (non-rt) kernels for
two weeks now without problems.

If you want me to test an improved version (as Austin suggested below) please
send a patch.

Best regards,
Oliver

On 23.12.2013 20:25, Austin Schuh wrote:
> Hi Thomas,
> 
> Did anything happen with your patch to note_interrupt, originally
> posted on March 7th of 2013?  (https://lkml.org/lkml/2013/3/7/222)
> 
> I am seeing an issue on a machine right now running a
> config-preempt-rt kernel and a SJA1000 CAN card from PEAK.  It works
> for ~1 day, and then proceeds to die with a "Disabling IRQ #18"
> message.  I posted on the Linux CAN mailing list, and Oliver Hartkopp
> was able to reproduce the issue only on a realtime kernel.  A function
> trace ending when the IRQ was disabled shows that note_interrupt is
> being called regularly from the IRQ handler threads, and one of the
> threads is doing work (and therefore calling note_interrupt with
> IRQ_HANDLED).
> 
> Oliver Hartkopp and I ran tests over the weekend on numerous machines
> and verified that the patch that you proposed fixes the problem.  We
> think that the race condition that Till reported is causing the
> problem here.
> 
> In reply to the comment about using the upper bit of
> threads_handled_last for holding the SPURIOUS_DEFERRED flag, while
> that may still be an over-optimization, the code should still work.
> All comparisons are done with the bit set, which just makes it a 31
> bit counter.  It will take 8 more days for the counter to overflow on
> my machine, so I won't know for certain until then.
> 
> My only concern is that there may still be a small race condition with
> this new code.  If the interrupt handler thread is running at a
> realtime priority, but lower than another task, it may not get run
> until a large number of IRQs get triggered, and then process them
> quickly.  With your new handler code, this would be counted as one
> single handled interrupt.  With the current constants, this is only a
> problem if more than 1000 calls to the handler happen between IRQs.  I
> starved my card's irq threads by running 4 tasks at a higher realtime
> priority than the handler threads, and saw the number of unhandled
> IRQs jump from 1/100000 to 3/100000, so that problem may not show up
> in practice.
> 
> Austin Schuh
> 
> Tested-by: Austin Schuh <austin@peloton-tech.com>
> 
