From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1754191Ab1LBT2a (ORCPT );
	Fri, 2 Dec 2011 14:28:30 -0500
Received: from mx0b-00082601.pphosted.com ([67.231.153.30]:56452 "EHLO
	mx0a-00082601.pphosted.com" rhost-flags-OK-OK-OK-FAIL)
	by vger.kernel.org with ESMTP id S1753866Ab1LBT22 (ORCPT );
	Fri, 2 Dec 2011 14:28:28 -0500
Message-ID: <4ED9267F.10106@fb.com>
Date: Fri, 2 Dec 2011 11:26:55 -0800
From: Arun Sharma
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:8.0) Gecko/20111105 Thunderbird/8.0
MIME-Version: 1.0
To: Peter Zijlstra
CC: , William Cohen , Stephane Eranian , Vince Weaver ,
Subject: Re: [RFC][PATCH 0/6] perf: x86 RDPMC and RDTSC support
References: <20111121145114.049265181@chello.nl>
In-Reply-To: <20111121145114.049265181@chello.nl>
Content-Type: text/plain; charset="ISO-8859-1"; format=flowed
Content-Transfer-Encoding: 7bit
X-Originating-IP: [192.168.18.252]
X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:5.5.7110,1.0.211,0.0.0000
	definitions=2011-12-02_05:2011-12-02,2011-12-02,1970-01-01 signatures=0
X-Proofpoint-Spam-Reason: safe
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 11/21/11 6:51 AM, Peter Zijlstra wrote:
> These few patches implement x86 RDPMC support and add an extention to the self
> monitoring data to also allow additional time updates using userspace TSC reads.
>
> There's a few loose ends, but it mostly seems to work.

I haven't had a chance to test this out yet, but low-overhead, always-on
perf counters are something we're very interested in. Thanks for
implementing it.

However, I suspect the major cost of leaving the perf counters always on
is the hit on context switches, rather than the cost of reading the
counters themselves.

For example:

Baseline:

    (for i in `seq 1 10`; do numactl --cpunodebind 1 ./lat_ctx -P1 -s32k 4; done) 2>&1 | tee lmbench1.log

1 event:

    (for i in `seq 1 10`; do numactl --cpunodebind 1 perf stat -e instructions ./lat_ctx -P1 -s32k 4; done) 2>&1 | tee lmbench2.log

2 events:

    (for i in `seq 1 10`; do numactl --cpunodebind 1 perf stat -e cycles,instructions ./lat_ctx -P1 -s32k 4; done) 2>&1 | tee lmbench3.log

Results:

    Baseline:    2.2us
    One event:   6.8us
    Two events:  7.2us

The added cost seems to be roughly 5us (I measured both 2.6.38 and
3.2-rc2). I'll dig a bit more into what may be going on here.

 -Arun
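
For reference, the "roughly 5us" figure can be reproduced from the three latencies quoted above. The following is only a minimal sketch: the `overhead` helper and the variable names are made up for illustration and are not part of lmbench or perf, and the numbers are copied by hand from the results rather than parsed from the log files.

```shell
#!/bin/sh
# Latencies in microseconds, copied by hand from the results quoted above.
baseline=2.2
one_event=6.8
two_events=7.2

# overhead BASE MEASURED: print MEASURED - BASE with one decimal place.
# (Hypothetical helper for this illustration only.)
overhead() {
    LC_ALL=C awk -v base="$1" -v meas="$2" 'BEGIN { printf "%.1f\n", meas - base }'
}

echo "1 event:   +$(overhead "$baseline" "$one_event")us over baseline"
echo "2 events:  +$(overhead "$baseline" "$two_events")us over baseline"
echo "2nd event: +$(overhead "$one_event" "$two_events")us over 1 event"
```

Going by these numbers, enabling the first event accounts for about 4.6us and the second adds only about 0.4us, which suggests (but does not prove) that most of the cost is a fixed per-switch overhead rather than a per-counter one.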