From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1752992Ab0I2H33 (ORCPT ); Wed, 29 Sep 2010 03:29:29 -0400
Received: from ist.d-labs.de ([213.239.218.44]:39402 "EHLO mx01.d-labs.de"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1751412Ab0I2H32 (ORCPT ); Wed, 29 Sep 2010 03:29:28 -0400
Date: Wed, 29 Sep 2010 09:29:24 +0200
From: Florian Mickler
To: tmhikaru@gmail.com
Cc: Greg KH , linux-kernel@vger.kernel.org
Subject: Re: Linux 2.6.35.6
Message-ID: <20100929092924.2090f19e@schatten.dmk.lab>
In-Reply-To: <20100928190358.GA24303@roll>
References: <20100927003608.GA20395@kroah.com>
	<20100927163208.GA4892@roll>
	<20100927215135.3d11d587@schatten.dmk.lab>
	<20100927233956.GA15705@roll>
	<20100928083505.0a808ffd@schatten.dmk.lab>
	<20100928190358.GA24303@roll>
X-Mailer: Claws Mail 3.7.6cvs31 (GTK+ 2.20.1; x86_64-unknown-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, 28 Sep 2010 15:03:58 -0400
tmhikaru@gmail.com wrote:

> On Tue, Sep 28, 2010 at 08:35:05AM +0200, Florian Mickler wrote:
> > >
> > > Here's a graphical example of just how wacky this is:
> > >
> > > http://yfrog.com/6lloadbp
> > >
> > > In this image, the dip down to less than 0.5 after the 18'th is due to me
> > > experimenting using the slackware distribution kernel (2.6.33.4) after I
> > > finally noticed something was amiss. The sharp rise afterwards is due to me
> > > first, building 2.6.35.5, and then afterwards, using it. To be perfectly
> > > clear, I've previously used 2.6.34.2 and did not experience the problem
> > > there either, nor is it in 2.6.33.4.
> >
> > What load figure are you basing your observations on? The 15 minutes
> > average should be the most interesting (sampled at a 7 minutes
> > interval...)
>
> my observations are based on letting the machine idle immediately after
> bootup. I monitor the state of the machine using a program called conky,
> which I have configured to show disk I/O, cpu use, swap I/O and among other
> things, the load average. Immediately after booting my loadaverage tends to
> peak at about 2.5 to 3.0; on a working kernel this eventually settles down
> to 0.00 to 0.05 in about ten minutes. On kernels that exhibit this problem,
> it doesn't settle lower than 0.3 and is much more likely to hang anywhere
> from 0.8 to 1.2. In fact, if I give it enough time it'll raise and lower
> itself constantly without any (visible) work being done. So basically I boot
> the machine and go get a drink, come back, and if it's been ten minutes,
> there's been no disk IO, cpu use, or any other activity recorded and it's
> still above 0.3 something's not working right.

Do you know which load average conky is showing you? If I type 'uptime'
on a console, I get three load numbers: the 1-minute, 5-minute and
15-minute averages. If there is a systematic bias, it should be visible
in the 15-minute average. If there are only bursts of 'load', they
should be visible in the 1-minute average.

But it doesn't really matter for now what kind of load disturbance you
are seeing, because you actually have a better way to distinguish a
good kernel from a bad one:

On Mon, 27 Sep 2010 12:32:08 -0400
tmhikaru@gmail.com wrote:

> *Something* is wrong beyond the
> mere loadaverage numbers going crazy however, since timed runs of kernel
> compiles done with my distro's kernel and 2.6.35.5 show that while there is
> no *apparent* use of cpu or disk showing in vmstat while the machine is
> idle, the compiles on the newer kernel are taking approximately twice as
> long as before.

> If you're talking about the graph,
> I merely posted it to show that I've been having this problem for over a
> month, and it's demonstrably causing very inconsistent load averages. (Which
> is why the graph isn't anything close to a line, it's a mess!) the graph
> takes a reading every five minutes, if you were wondering about the sample
> rate.

Yes, the sample rate was one of the things I wanted to know, but I also
wanted to know which of the 3 load figures you were graphing.

> In other news, I'm in the process of bisection but keep having to skip
> bisects that have compile errors. sigh. still at 12 hops, somewhere around
> five thousands commits to check.

Good.

>
> Tim McGrath
>

Regards,
Flo