From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753118AbXISASL (ORCPT ); Tue, 18 Sep 2007 20:18:11 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1751370AbXISAR6 (ORCPT ); Tue, 18 Sep 2007 20:17:58 -0400 Received: from node3.inaccessnetworks.com ([212.205.200.118]:43425 "EHLO inaccessnetworks.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1751361AbXISAR5 (ORCPT ); Tue, 18 Sep 2007 20:17:57 -0400 X-Greylist: delayed 1256 seconds by postgrey-1.27 at vger.kernel.org; Tue, 18 Sep 2007 20:17:56 EDT Date: Wed, 19 Sep 2007 02:56:48 +0300 To: linux-kernel@vger.kernel.org Subject: Bogus high interrupt load? Message-ID: <20070918235648.GA29746@inaccessnetworks.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.5.13 (2006-08-11) From: amanous@inaccessnetworks.com (Aggelos Manousarides) Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org I am using kernel 2.6.13 on a PXA. The system has a DSP driver which triggers an interrupt every 5ms with a free running clock, unrelated to the PXA clock. During the debugging of the DSP driver a strange load peak was detected with a very precise period. Normally the total load figure was about 1%-2%, but every 4:40 minutes, the interrupt load was almost 30% of the system for about 10 seconds. All tools which are based on the /proc/stat numbers reported this (top, a custom top build on /proc/stat, snmp agent). This system supports multiple slots on a custom bus and another DSP card was used (which obviously had a different crystal). The same phenomenon was repeated but with a different period. The symptoms suggest that this a clock drift related problem. Various debugging methods where used (including the oprofile facility) and the possibility of a bug in the DSP driver has been eliminated. Also, the logic analyzer did not reveal any prolonged hardware cycles during this period in which there was load. Then, we came across a bug in 2.6.21, related to the load average and the calc_load function. The load I am referring is not based on the load average figures but the ones reported in /proc/stat, so this is probably not the same issue. But this gave us the idea that the load might not actually be real! So the following experiment was performed. A silly C program (about 5 lines long) sets a counter to zero and increases the counter in a busy loop until it reaches a certain bit boundary. The boundary was set to a number which made the loop last about 3.4 seconds. The following script was executed: while true; do time ./test; done Obviously this produces a 100% load on the system. When no interrupt load is present, time reported 3,4 seconds real, 3,4 seconds user. When actual interrupt load was introduced (high ethernet traffic) time reported 3,4 seconds user and a bigger number for real. Everything looks normal. However, during the supposed load produced by the DSP driver, the following amazing thing happened. Time reported that the real time was again 3,4 seconds (and it was, there was no delay) and that 1,7 seconds was user load and 1,7 was system load! How can a process that does "i = 0; while (i++) ;" produce system load? There is obviously something happening to the load reporting facility of the kernel when a periodic interrupt source is present. Has anyone seen something like this? I am not using any real time patches, this is a vanilla kernel patched only to support the platform at hand. -- Angelos Manousaridis