From: Max Müller
Subject: Re: Tweak Latency on Intel ATOM
Date: Mon, 15 Feb 2010 10:32:54 +0100
Message-ID: <4B7914C6.8080005@gmx.net>
References: <20100210163818.7f54ec3a@torg> <4B73B7B5.4070509@gmx.net> <20100211093447.3c0a97cd@torg>
In-Reply-To: <20100211093447.3c0a97cd@torg>
To: Clark Williams
Cc: linux-rt-users@vger.kernel.org

Clark Williams wrote:
> On Thu, 11 Feb 2010 08:54:29 +0100
> Max Müller wrote:
>
>> Clark Williams wrote:
>>> On Tue, 9 Feb 2010 07:41:58 +0000 (UTC)
>>> Max Miller wrote:
>>>
>>>> Hello,
>>>>
>>>> I am using the PREEMPT-RT patch on Linux 2.6.29.6. It runs on an MSI965GSE
>>>> industrial board with Intel ATOM CPU (N270, 1.6 GHz) and i945GSE Northbridge.
>>>>
>>>> I got about 45µs as maximum and 13µs as average latency when hyperthreading is
>>>> disabled. With hyperthreading enabled, the maximum latency increases to about
>>>> 100µs. I measured the latency with cyclictest.
>>>>
>>>> What can I do to get better maximum latency? Can I do something in the kernel
>>>> configuration, or are there some kernel boot options? Or is it still impossible
>>>> to get better results with this CPU?
>>>>
>>>> Thanks in advance,
>>>> Max Miller
>>>>
>>> Make sure you turn off any power management settings in the BIOS and
>>> turn off the irqbalance and cpuspeed services on the Linux side.
>>>
>>> What cyclictest command are you using to measure latency?
>>>
>>> Clark
>>>
>> I run cyclictest as follows:
>>
>> cyclictest -n -t3 -p99
>>
>
> You might want to try the new cyclictest option --smp (which is really
> the options -t, -a -n) and I'd back the priority down to -p95 just to
> keep out of the way of watchdog and migration threads. In general, when
> I run it on a multi-core box I use:
>
> $ cyclictest --smp -m -p95 -d0
>
> Lately on AMD boxes I use --numa, which makes calls into libnuma to
> allocate memory on local nodes for the measurement threads.
>
> If you want to get fancy and look at the history of the run, you can
> use the -h option to keep a histogram of buckets (1 bucket == 1
> microsecond).
>
>> For generating additional system load I run (one to several instances):
>>
>> while true; do echo "blah" > /dev/null; done &
>>
>> Then I watch the max. latency from the thread with the highest priority.
>> Sometimes I add the parameter '-h' to generate a history. In this
>> history I can see that most latency times are under 20µs; only about
>> 5ppm are worse than 30µs.
>> Am I doing this correctly?
>>
>
> You're seeing some nice numbers there (any max latency under 100us is
> pretty good).
>
> I have a python program I've been developing named 'rteval' which kicks
> off a kernel compile and a scheduler benchmark called 'hackbench', then
> runs cyclictest with the histogram option. After the run it generates a
> report on how well cyclictest did with the loads in place. If you're
> interested, you can get rteval from my kernel.org git repo:
>
> $ git clone git://git.kernel.org/pub/scm/linux/kernel/git/clrkwllms/rteval.git
>
> It's not 100% complete, but it's getting there.
>
>> The only powersave setting in the BIOS is "Intel speedstep", which I
>> disabled.
>>
>> I will check with the irqbalance and cpuspeed services disabled
>> and will report later.
>>
>> What should the adequate max.
>> latency be on this system?
>>
>
> I'd say you're doing pretty good keeping under 50us. You might want to
> try it under a heavier load than the shell script you've been running.
> If you don't want to fool with rteval, try kicking off a kernel compile
> in another window like this:
>
> $ while true; do make -j4 clean bzImage modules; done
>
> and then run cyclictest. A kernel compile with parallel jobs (-j) is a
> good overall load of computation and I/O.
>

I have now tested as you suggested, with the irqbalance and cpuspeed
services disabled. I hope I disabled irqbalance the right way: I used the
kernel parameter acpi_no_irqbalance. Is this correct? Unfortunately, the
results were nearly the same as before.

To measure latency I now did the following:

- compile a kernel (for high system load)
- run cyclictest -n -m -t3 -p94 (to have some high-priority threads running)
- run cyclictest -n -m -h80 -p95 -l6000000 (for the latency measurement)

I will also test your Python program in the next few days.

I now have about 50µs worst-case latency and about 15µs average latency.

Greetings,
Max
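
The worst-case, average and ppm figures discussed in this thread can be
derived from a cyclictest histogram with a few lines of Python. The
following is only a minimal sketch, not part of cyclictest or rteval. It
assumes the histogram was saved as text with one "bucket count" pair per
line for a single measurement thread and with '#' starting comment lines;
the function names and the sample numbers are made up for illustration.
Samples that fall beyond the histogram size (80 buckets with -h80) are not
covered by it.

```python
# Sketch only: summarise a cyclictest-style latency histogram.
# Assumed input (not taken from the thread): lines of "bucket_us count"
# for one thread; lines starting with '#' are comments.

def parse_histogram(text):
    """Return {latency_us: sample_count}, skipping empty buckets."""
    hist = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        bucket, count = int(fields[0]), int(fields[1])
        if count:
            hist[bucket] = hist.get(bucket, 0) + count
    return hist


def summarise(hist, threshold_us):
    """Compute sample count, max, average and ppm above threshold_us."""
    total = sum(hist.values())
    if total == 0:
        raise ValueError("histogram contains no samples")
    above = sum(c for b, c in hist.items() if b > threshold_us)
    return {
        "samples": total,
        "max_us": max(hist),
        "avg_us": sum(b * c for b, c in hist.items()) / total,
        "ppm_above": above * 1e6 / total,
    }


if __name__ == "__main__":
    # Made-up sample data, only to show the call sequence.
    sample = """# hypothetical histogram
000012 000500
000015 000400
000045 000001
"""
    print(summarise(parse_histogram(sample), threshold_us=30))
```

The threshold is a parameter, so the same summary can be rerun for 20µs,
30µs or any other limit of interest.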