From mboxrd@z Thu Jan 1 00:00:00 1970
From: Oliver Hartkopp
Subject: Re: sja1000 interrupt problem
Date: Tue, 10 Dec 2013 14:47:24 +0100
Message-ID: <52A71B6C.3050600@hartkopp.net>
References: <52831FC7.3040509@hartkopp.net> <201311131008.55018.pisa@cmp.felk.cvut.cz> <5287E6B2.8020709@hartkopp.net> <85256584a266750b1330cfae8bebd55c@grandegger.com> <5288D236.403@hartkopp.net> <5288FB91.9050703@grandegger.com> <52892B21.9000501@grandegger.com> <333c0fd4238558062478212eb0704b04@grandegger.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Return-path:
Received: from mo-p00-ob.rzone.de ([81.169.146.160]:42825 "EHLO mo-p00-ob.rzone.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753182Ab3LJNra (ORCPT ); Tue, 10 Dec 2013 08:47:30 -0500
In-Reply-To:
Sender: linux-can-owner@vger.kernel.org
List-ID:
To: Wolfgang Grandegger, Austin Schuh, Pavel Pisa
Cc: linux-can@vger.kernel.org

Hey all,

as I have a similar setup here (Core i7, 5x PEAK cPCI = 20 CAN interfaces) I
downloaded the linux-image-3.10-0.bpo.3-rt-686-pae kernel including the
sources from

http://packages.debian.org/de/wheezy-backports/kernel/

and was able to see Austin's problem with the -rt kernel.

My interrupt lines are mostly dedicated to the CAN interfaces, so I was able
to select interrupts (17 & 19) that _only_ deal with sja1000 irq handlers:

 16:          7          7         10          9   IO-APIC-fasteoi   ehci_hcd:usb1, ahci, can4, can5, can6, can7
 17:    6328236    6330659    6328557    6330266   IO-APIC-fasteoi   can8, can10, can9
 18:          0          0          0          0   IO-APIC-fasteoi   can12, can13, can14, can15
 19:    1446093    1443817    1445833    1444230   IO-APIC-fasteoi   can2, can16, can17, can18, can19, can3, can1, can0

can0/can2 are linked together (500 kbit/s)
can1/can3 are linked together (500 kbit/s)
can9 is linked to a 1Mbit/s CAN traffic source

All interfaces get a full bus load from the outside.
Additionally can0 and can1 get a 'cangen -g0 -i ' from the local host.

The funny thing was that one time IRQ #19 got disabled twice(?!?) :

Message from syslogd@xxxxx at Dec 10 11:25:37 ...
 kernel:[  967.213174] Disabling IRQ #19

Message from syslogd@xxxxx at Dec 10 12:06:13 ...
 kernel:[ 3401.523019] Disabling IRQ #17

Message from syslogd@xxxxx at Dec 10 12:49:08 ...
 kernel:[ 5975.113373] Disabling IRQ #19

I don't know where the last message could come from, as the 8 CAN interfaces
on this interrupt line had already been dead for more than an hour.

The disabling of the interrupt seems to be reproducible, although (as Austin
already mentioned) it happens after different run times.

My assumption was that we run into a problem with the PITA chip when
consuming the interface specific interrupt line in peak_pci_post_irq(), see:

static void peak_pci_post_irq(const struct sja1000_priv *priv)
{
	struct peak_pci_chan *chan = priv->priv;
	u16 icr;

	/* Select and clear in PITA stored interrupt */
	icr = readw(chan->cfg_base + PITA_ICR);
	if (icr & chan->icr_mask)
		writew(chan->icr_mask, chan->cfg_base + PITA_ICR);
}

With the writew() only the corresponding SJA1000 line is consumed.

My quick hack was to clear all bits in the PITA each time:

--- peak_pci.c~	2013-09-08 07:10:14.000000000 +0200
+++ peak_pci.c	2013-12-10 13:26:48.315166478 +0100
@@ -542,9 +542,13 @@
 	u16 icr;
 
 	/* Select and clear in PITA stored interrupt */
+#if 0
 	icr = readw(chan->cfg_base + PITA_ICR);
 	if (icr & chan->icr_mask)
 		writew(chan->icr_mask, chan->cfg_base + PITA_ICR);
+#else
+	writew(0x00C3, chan->cfg_base + PITA_ICR);
+#endif
 }
 
 static int peak_pci_probe(struct pci_dev *pdev, const struct pci_device_id *ent)

The 0x00C3 comes from OR'ing the values from

static const u16 peak_pci_icr_masks[PEAK_PCI_CHAN_MAX]

I have now been running the setup for more than one hour without any
problems. But I assume that this is a really bad hack, and I did not check
whether any CAN frames got lost.

Btw. the performance increased from 90% busload to 95% busload with that
patch when creating only local traffic on the host.
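
For illustration, the 0x00C3 value and the difference between the two
acknowledge strategies can be reproduced in a small user-space sketch. The
four mask values below are an assumption (they are what I believe the
driver's peak_pci_icr_masks[] table contains; this mail only states the
OR'ed result), and icr_after_write() is a hypothetical model that assumes
PITA_ICR behaves as a write-1-to-clear register. It is not driver code.

```c
#include <stddef.h>
#include <stdint.h>

#define PEAK_PCI_CHAN_MAX 4

/* Per-channel ICR bits. ASSUMPTION: values as in the driver's table. */
static const uint16_t peak_pci_icr_masks[PEAK_PCI_CHAN_MAX] = {
	0x02, 0x01, 0x40, 0x80
};

/* OR all per-channel bits together; this is the value the hack writes. */
static uint16_t pita_icr_all_mask(void)
{
	uint16_t mask = 0;
	size_t i;

	for (i = 0; i < PEAK_PCI_CHAN_MAX; i++)
		mask |= peak_pci_icr_masks[i];

	return mask;
}

/*
 * HYPOTHETICAL model of the register: every bit written as 1 is cleared,
 * all other pending bits stay set (write-1-to-clear is an assumption).
 */
static uint16_t icr_after_write(uint16_t icr, uint16_t written)
{
	return icr & (uint16_t)~written;
}
```

In this model, with two channels pending (icr = 0x0003), writing only one
channel's mask (0x0002) leaves 0x0001 pending for the next handler, while
writing pita_icr_all_mask() clears everything at once. That is why the hack
may hide interrupts of the other channels on the same PITA.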

Any idea how to proceed?

Regards,
Oliver