From: Oliver Hartkopp <socketcan@hartkopp.net>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: Austin Schuh <austin@peloton-tech.com>,
Wolfgang Grandegger <wg@grandegger.com>,
Pavel Pisa <pisa@cmp.felk.cvut.cz>,
Marc Kleine-Budde <mkl@pengutronix.de>,
linux-can@vger.kernel.org
Subject: Re: [PATCH] genirq: Sanitize spurious interrupt detection of threaded irqs
Date: Mon, 06 Jan 2014 14:32:15 +0100
Message-ID: <52CAB05F.4010303@hartkopp.net>
In-Reply-To: <CANGgnMbszHzYe9pF2C6wag4MY_PfBG2qrMCC=rMmQnb-jyXXXw@mail.gmail.com>
Hi Thomas,
I just wanted to add my
Tested-by: Oliver Hartkopp <socketcan@hartkopp.net>
In my setup with a Core i7 and 20 SJA1000 PCIe CAN buses, the problem
disappeared on the -rt kernel with the discussed patch applied.
The system ran at full CAN bus load over the weekend, for more than 72
hours of operation, without problems:
           CPU0       CPU1       CPU2       CPU3
  0:         40          0          0          0  IO-APIC-edge     timer
  1:          1          0          0          0  IO-APIC-edge     i8042
  8:          0          0          1          0  IO-APIC-edge     rtc0
  9:         42         45         45         42  IO-APIC-fasteoi  acpi
 16:          9          8          8          8  IO-APIC-fasteoi  ahci, ehci_hcd:usb1, can4, can5, can6, can7
 17:  441468642  443275488  443609061  441436145  IO-APIC-fasteoi  can8, can10, can11, can9
 18:  441975412  438811422  437317802  441209092  IO-APIC-fasteoi  can12, can13, can14, can15
 19:  427310388  428661677  429813687  428095739  IO-APIC-fasteoi  can0, can1, can2, can3, can16, can17, can18, can19
(..)
Before applying the patch, it took 1 minute to 1.5 hours (usually ~3
minutes) until the irq was killed by the spurious interrupt detection, using
Linux 3.10.11-rt (Debian linux-image-3.10-0.bpo.3-rt-686-pae).
I have also been testing the patch on several recent 3.13-rc5+ (non-rt)
kernels for two weeks now without problems.
If you want me to test an improved version (as Austin suggested below) please
send a patch.
Best regards,
Oliver
On 23.12.2013 20:25, Austin Schuh wrote:
> Hi Thomas,
>
> Did anything happen with your patch to note_interrupt, originally
> posted on March 7th of 2013? (https://lkml.org/lkml/2013/3/7/222)
>
> I am seeing an issue on a machine right now running a
> config-preempt-rt kernel and a SJA1000 CAN card from PEAK. It works
> for ~1 day, and then proceeds to die with a "Disabling IRQ #18"
> message. I posted on the Linux CAN mailing list, and Oliver Hartkopp
> was able to reproduce the issue only on a realtime kernel. A function
> trace ending when the IRQ was disabled shows that note_interrupt is
> being called regularly from the IRQ handler threads, and one of the
> threads is doing work (and therefore calling note_interrupt with
> IRQ_HANDLED).
>
> Oliver Hartkopp and I ran tests over the weekend on numerous machines
> and verified that the patch that you proposed fixes the problem. We
> think that the race condition that Till reported is causing the
> problem here.
>
> In reply to the comment about using the upper bit of
> threads_handled_last for holding the SPURIOUS_DEFERRED flag, while
> that may still be an over-optimization, the code should still work.
> All comparisons are done with the bit set, which just makes it a 31
> bit counter. It will take 8 more days for the counter to overflow on
> my machine, so I won't know for certain until then.
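
To make the 31-bit counter argument concrete, here is a minimal userspace
sketch of the comparison scheme described above. It is not the kernel code
from the patch: only the names threads_handled_last and SPURIOUS_DEFERRED
come from this thread. The struct, the enum, the function sketch_check() and
the value chosen for the flag bit are assumptions made for illustration.

```c
#include <stdint.h>

/* Assumed value: the top bit of a 32-bit word is used as the flag. */
#define SPURIOUS_DEFERRED 0x80000000u

/* Hypothetical stand-in for the relevant per-IRQ state. */
struct irq_sketch {
	uint32_t threads_handled;      /* bumped by handler threads on IRQ_HANDLED */
	uint32_t threads_handled_last; /* last snapshot, with the flag bit set */
};

enum sketch_result { SKETCH_DEFERRED, SKETCH_HANDLED, SKETCH_NONE };

/*
 * Called from the hard-irq side each time the handler thread is woken.
 * The first call only arms the flag and defers the decision; later calls
 * compare the current thread counter against the last snapshot.
 */
static enum sketch_result sketch_check(struct irq_sketch *d)
{
	uint32_t handled;

	if (!(d->threads_handled_last & SPURIOUS_DEFERRED)) {
		d->threads_handled_last |= SPURIOUS_DEFERRED;
		return SKETCH_DEFERRED;
	}

	/* Both sides of the comparison carry the flag bit. */
	handled = d->threads_handled | SPURIOUS_DEFERRED;
	if (handled != d->threads_handled_last) {
		d->threads_handled_last = handled;
		return SKETCH_HANDLED; /* threads made progress since last check */
	}
	return SKETCH_NONE; /* no progress observed: counts as unhandled */
}
```

Because the flag bit is ORed into both operands, two counter values that
differ only in bit 31 compare as equal. In this sketch that is the only
effect of sharing the word: the counter effectively wraps after 2^31
handled interrupts instead of 2^32.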
>
> My only concern is that there may still be a small race condition with
> this new code. If the interrupt handler thread is running at a
> realtime priority, but lower than another task, it may not get run
> until a large number of IRQs get triggered, and then process them
> quickly. With your new handler code, this would be counted as one
> single handled interrupt. With the current constants, this is only a
> problem if more than 1000 calls to the handler happen between IRQs. I
> starved my card's irq threads by running 4 tasks at a higher realtime
> priority than the handler threads, and saw the number of unhandled
> IRQs jump from 1/100000 to 3/100000, so that problem may not show up
> in practice.
>
> Austin Schuh
>
> Tested-by: Austin Schuh <austin@peloton-tech.com>
>
Thread overview:
2013-12-23 19:25 [PATCH] genirq: Sanitize spurious interrupt detection of threaded irqs Austin Schuh
2014-01-06 13:32 ` Oliver Hartkopp [this message]
2014-04-07 18:38 ` Austin Schuh
2014-04-07 18:41 ` Thomas Gleixner
2014-04-07 20:05 ` Austin Schuh
2014-04-07 20:07 ` Thomas Gleixner
2014-04-07 20:08 ` Austin Schuh
2014-04-28 20:20 ` Austin Schuh
2014-04-28 20:44 ` Thomas Gleixner
-- strict thread matches above, loose matches on Subject: below --
2013-03-07 13:53 Thomas Gleixner
2013-03-08 14:47 ` Till Straumann
2013-03-08 16:12 ` Thomas Gleixner
2013-03-08 17:19 ` Till Straumann
2013-03-08 19:41 ` Thomas Gleixner
2013-03-12 13:22 ` Till Straumann