From mboxrd@z Thu Jan 1 00:00:00 1970 From: Clark Williams Subject: Re: Kernel ISR Latency Date: Tue, 11 Apr 2017 17:06:56 -0500 Message-ID: <20170411170656.2eda2c21@tagon> References: <20170410195536.GK16052@jcartwri.amer.corp.natinst.com> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha512; boundary="Sig_/yusHpP7WdOP8/M6Kr18fVxd"; protocol="application/pgp-signature" Cc: Julia Cartwright , linux-rt-users@vger.kernel.org To: Brian Wrenn Return-path: Received: from mx1.redhat.com ([209.132.183.28]:34788 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753228AbdDKWHE (ORCPT ); Tue, 11 Apr 2017 18:07:04 -0400 In-Reply-To: Sender: linux-rt-users-owner@vger.kernel.org List-ID: --Sig_/yusHpP7WdOP8/M6Kr18fVxd Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: quoted-printable On Mon, 10 Apr 2017 17:31:52 -0400 Brian Wrenn wrote: > I imagine that it would be a bad idea to prioritize my ISR ahead of > any one of these? >=20 > 12 FF 99 139 - [posixcputmr/0] > 14 FF 99 139 - [migration/0] > 15 FF 99 139 - [watchdog/0] > 16 FF 99 139 - [watchdog/1] > 17 FF 99 139 - [migration/1] > 21 FF 99 139 - [posixcputmr/1] > 24 FF 99 139 - [watchdog/2] > 25 FF 99 139 - [migration/2] > 29 FF 99 139 - [posixcputmr/2] > 32 FF 99 139 - [watchdog/3] > 33 FF 99 139 - [migration/3] > 37 FF 99 139 - [posixcputmr/3] >=20 > At this stage, that's all that prioritized higher. Yeah, it's never a good idea to use FIFO:99 and compete with these guys.=20 >=20 > On Mon, Apr 10, 2017 at 4:48 PM, Brian Wrenn wrote: > > On Mon, Apr 10, 2017 at 3:55 PM, Julia Cartwright wrote:= =20 > >> On Mon, Apr 10, 2017 at 03:12:28PM -0400, Brian Wrenn wrote: =20 > >>> Hi, I'm trying to track down the source of some ISR latency that I'm > >>> observing on a logic analyzer. > >>> > >>> Just for some background, my setup is the following. I'm running > >>> Linaro 4.4.9 with PREEMPT RT Patch 17 on a Variscite SD410 SOM devkit. > >>> My kernel has the following PREEMPT RT options. =20 > >> [..] =20 > >>> > >>> I have a micro-controller attached to a GPIO pin and two separate SPI > >>> ports. Every 10 milliseconds the micro-controller writes 18 Kbytes of > >>> data to each of 2 SPI ports (total of 36 Kbytes between the two) and > >>> issues an interrupt on the GPIO. A kernel module has an ISR > >>> registered to the GPIO pin. When the ISR executes, it copies 18 > >>> Kbytes of data from each SPI. There are two RT threads executing in > >>> user space as SCHED_RR with identical priorities less than (i.e. lower > >>> than) 50, where as I understand the kernel runs at priority 50. =20 > >> > >> It's a bit more complicated than "the kernel runs at priority 50". =20 > > > > Yes, better stated, my kernel space IRQ gets a priority of 50. (Most > > of the others seem to have 50 as well.) The RT kernel assigns all threaded IRQs default priority of fifo:50.=20 > > =20 > >> > >> Perhaps what you're getting at is that by default, with irq forced > >> threading enabled (which is the default with -rt build kernels) the > >> irqthreads have the default scheduling policy FIFO with priority 50. > >> It's up to the administrator/system integrator to appropriately > >> prioritize these irqthreads to suit the application. > >> =20 > > > > Pursuant to that, I'm reviewing this to gain a better understanding of > > such things. Please forgive me; I hadn't found this link prior to > > messaging the listserve. > > > > http://subversion.ffado.org/wiki/IrqPriorities > > =20 > >> For your usecase, this means that likely _all_ of the peripheral > >> interrupts, including the low priority stuff you might not care about > >> for your application (like networking or uart or whatever) are being > >> prioritized right alongside your latency sensitive gpio handler. > >> =20 > >>> Each of those user space threads copies data from some shared memory > >>> within the kernel module and performs some basic data integrity > >>> checks, i.e. a SHA1. =20 > >> > >> Just to better understand: how are your RT threads signalled by your > >> gpio handler? =20 > > > > The user space RT threads have mmap'ed memory shared with the kernel > > module. The kernel module writes to a designated memory location the > > current read offset so that the user space RT threads know where to > > read from iteration to iteration. > > =20 > >> =20 > >>> The vast majority of interrupts incur a delay in range of tens of > >>> MICROseconds, meaning the ISR executes within such amount of time of > >>> the micro-controller raising the GPIO pin of the ISR. However, > >>> occasionally, more than six MILLIseconds passes between the time of > >>> the micro-controller issuing the GPIO interrupt and the ISR beginning > >>> to execute. =20 > >> > >> Also: how are you measuring this latency? In particular, where is your > >> test point for the "ISR beginning"? =20 > >>> =20 > > > > Upon detection of a bad CRC or bad sequence number, the user space > > threads raise another GPIO, not the GPIO of the ISR, another one that > > triggers a logic analyzer to dump a forward captured buffer. Then, I > > got back through the capture and identify the high latency ISR > > execution. (See attached image.) > > =20 > >>> I'm trying to track down the cause of the 6+ MILLIsecond latency, > >>> which I would not expect to happen. Are there any well practiced > >>> techniques for determining what factor(s) may be introducing this > >>> latency? I looked at the description for RT Watchdog, but I don't > >>> think it would help me much because the only two RT threads other than > >>> the kernel have a lower priority. =20 > >> > >> I don't think the RT watchdog will help you here, however, I would > >> recommend that you get acquainted with using ftrace/trace-cmd and the > >> irq and sched traceevents. > >> > >> Julia =20 > > > > It sounds like to me that maybe you've confirmed that I'm looking in > > the right direction w.r.t. modifying the kernel ISRs with something > > like 'chrt' and 'rtiq'. Yes? But I also will look into > > ftrace/trace-cmd and irq and sched trace events. I'd suggest using chrt to boost your irq(s) up above the default 50 to say = 51. That would insure that they're scheduled in before the other interrupt = threads are processed.=20 Of course that just means your interrupt gets serviced first and does nothi= ng to help if some other interrupt (or other entity) is delaying the schedu= ling of your user-space threads. As Julia said, you probably want to look a= t ftrace and run your application with some tracepoints enabled. That will = let you look at the timestamps when various events are hit and hopefully gi= ve you an idea of what's happening when your latency spike hits.=20 If you want to get fancy, look at what cyclictest (from the rt-tests packag= e) does when the --breaktrace option is used. It takes a threshold value an= d stops processing when the threshold is exceeded. In addition it stops any= active trace and writes a message to the trace buffers, so it's quite obvi= ous when you've hit a spike and you get a nice event trace leading up to th= e spike.=20 Clark --=20 The United States Coast Guard Ruining Natural Selection since 1790 --Sig_/yusHpP7WdOP8/M6Kr18fVxd Content-Type: application/pgp-signature Content-Description: OpenPGP digital signature -----BEGIN PGP SIGNATURE----- iQIcBAEBCgAGBQJY7VOAAAoJEBO1XdB8U7hRjA8P/17orxL/stO80YK3eR6bRNcZ i+t1sL/LV5iz78jLWWayiiNQ+VpdddrS7RQRkNPpWofCQRdlVWUQnsTz4z5ctYaU R+gQ6CMtdg4CydwPCie5YoGnmet2wXuw4gtKyiL7iLYRClyBO+t9bYZwSYe+Es6X iuu3HX44uUx8LsXm9njiNd0nPR50gRlXVVZ7O6nZ9SjFRlh1UaTWBFygKdWLbBWn fMvz6bLZduF034uoTLVsuaKfsMRSM/RSOn0rLqyIYGGq8N8Lka4fM2k/eufG3EAA RrDuuUERbIqVr3OW8TTN35GNyMxBrH8YHFhkrWlhW7oJX8kZM1j2GTSpAi+6DQPz O8aObKrpCK6Ix7z5/Y0Bma47ZnvTjiB7fpkjKhSRi3u1fXgdqXwScsBrznADaFE2 CnO9ZlIlHNzL1r8pCDykg/WCkLYlUPriAQ/MedwnDFbd0yTTrRG6yig0rLgZpKqZ rKuQS8jFGnzS+91g0NSqhRNUv9QoU11pP2dHjZL+AHcj+BBYLyV4q8kGM2mwirU3 V66tbkRI5Swa+mDLkV39YZWdQW1fTGthtGm5+XGBkQjrnHCJfHt4DPnbIMlagZEe IEaE6gHsger40+fME6bdERkw2ScbwUdyXuQbwlwMlKZmXPI9omWc5zEUQBeJEn0d 85uLXl6nbM3A0hYZb9C6 =Kv8c -----END PGP SIGNATURE----- --Sig_/yusHpP7WdOP8/M6Kr18fVxd--