From: Eric Dumazet
To: Ingo Molnar
Cc: Christoph Lameter, linux kernel, Andi Kleen, David Miller,
	jesse.brandeburg@intel.com, netdev@vger.kernel.org, haoki@redhat.com,
	mchan@broadcom.com, davidel@xmailserver.org
Subject: Re: [PATCH] poll: Avoid extra wakeups in select/poll
Date: Thu, 30 Apr 2009 12:49:00 +0200
Message-ID: <49F9821C.5010802@cosmosbay.com>
In-Reply-To: <20090429091130.GA27857@elte.hu>
References: <49F43B8F.2050907@cosmosbay.com> <87ab60rh8t.fsf@basil.nowhere.org>
	<49F71B63.8010503@cosmosbay.com> <49F76174.6060009@cosmosbay.com>
	<49F767FD.2040205@cosmosbay.com> <49F76F6C.80005@cosmosbay.com>
	<49F77108.7060509@cosmosbay.com> <20090429091130.GA27857@elte.hu>

Ingo Molnar wrote:
> * Eric Dumazet wrote:
>
>> On udpping, I had prior to the patch about 49000 wakeups per
>> second, and after the patch about 26000 wakeups per second (which
>> matches the number of incoming udp messages per second).
>
> very nice. It might not show up as a real performance difference if
> the CPUs are not fully saturated during the test - but it could show
> up as a decrease in CPU utilization.
>
> Also, if you run the test via 'perf stat -a ./test.sh' you should
> see a reduction in instructions executed:
>
> aldebaran:~/linux/linux> perf stat -a sleep 1
>
>  Performance counter stats for 'sleep':
>
>     16128.045994  task clock ticks     (msecs)
>            12876  context switches     (events)
>              219  CPU migrations       (events)
>           186144  pagefaults           (events)
>      20911802763  CPU cycles           (events)
>      19309416815  instructions         (events)
>        199608554  cache references     (events)
>         19990754  cache misses         (events)
>
>  Wall-clock time elapsed:  1008.882282 msecs
>
> With -a it's measured system-wide, from start of test to end of test
> - the results will be a lot more stable (and relevant) statistically
> than wall-clock time or CPU usage measurements. (both of which are
> rather imprecise in general)

I tried this perf stuff and got strange results on a cpu-burning bench,
saturating my 8 cpus with a "while (1) ;" loop:

# perf stat -a sleep 10

 Performance counter stats for 'sleep':

    80334.709038  task clock ticks     (msecs)
           80638  context switches     (events)
               4  CPU migrations       (events)
             468  pagefaults           (events)
    160694681969  CPU cycles           (events)
    160127154810  instructions         (events)
          686393  cache references     (events)
          230117  cache misses         (events)

 Wall-clock time elapsed: 10041.531644 msecs

So it's about 16069468196 cycles per second for the 8 cpus.
Divide by 8 to get 2008683524 cycles per second per cpu, which is not
3000000000 (E5450 @ 3.00GHz).

It seems strange that a "jmp myself" loop uses one unhalted cycle per
instruction and 0.5 halted cycle... (about 2e9 instructions per second
per cpu on a 3.00GHz part means each 1-cycle jmp takes roughly 1.5 core
clocks of wall time, of which only 1 is counted as unhalted).

Also, after using "perf stat", tbench results are 1778 MB/s instead of
2610 MB/s, even when no perf stat is running.
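
For reference, the cpu-burning bench is easy to reproduce with a small
harness along these lines (a sketch, not the exact program used for the
numbers above; it just forks one spinning loop per online cpu, with
perf stat run from another shell):

/* Sketch: saturate every online cpu with a "while (1) ;" loop.
 * Hypothetical harness, not the original test program.
 */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	long i, ncpus = sysconf(_SC_NPROCESSORS_ONLN); /* 8 on this E5450 box */

	for (i = 0; i < ncpus; i++) {
		if (fork() == 0)
			for (;;)        /* the "jmp myself" loop */
				;
	}
	printf("spinning on %ld cpus; now run: perf stat -a sleep 10\n", ncpus);
	pause();                        /* parent just sits here */
	return 0;
}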
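
And for readers who want the gist of the patch being discussed: the
point is that once a sleeping poller has already been marked as woken,
further ready events should not issue additional wakeups. A minimal
userspace analogue of that pattern, built with pthreads (the names
'triggered' and event_ready() are made up for this sketch; it
illustrates the idea, it is not the kernel code):

/* The waker sets a flag and only signals the sleeper if the flag was
 * not already set, so many ready events cost a single wakeup.
 */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
static int triggered;

static void event_ready(void)
{
	pthread_mutex_lock(&lock);
	if (!triggered) {               /* only the first event signals */
		triggered = 1;
		pthread_cond_signal(&cond);
	}
	pthread_mutex_unlock(&lock);
}

static void *producer(void *arg)
{
	int i;

	(void)arg;
	for (i = 0; i < 1000; i++)      /* 1000 "ready" events ... */
		event_ready();
	return NULL;
}

int main(void)
{
	pthread_t t;

	pthread_create(&t, NULL, producer, NULL);

	pthread_mutex_lock(&lock);      /* ... but at most one wakeup */
	while (!triggered)
		pthread_cond_wait(&cond, &lock);
	triggered = 0;                  /* rearm for the next poll round */
	pthread_mutex_unlock(&lock);

	pthread_join(t, NULL);
	printf("1000 events, one wakeup\n");
	return 0;
}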