From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1161584AbdEWUxo (ORCPT ); Tue, 23 May 2017 16:53:44 -0400
Received: from mga06.intel.com ([134.134.136.31]:35408 "EHLO mga06.intel.com"
        rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S1032788AbdEWUxj (ORCPT ); Tue, 23 May 2017 16:53:39 -0400
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.38,383,1491289200"; d="scan'208";a="972251348"
From: Andi Kleen
To: Michael Edwards
Cc: linux-kernel@vger.kernel.org, peterz@infradead.org, linux-perf-users@vger.kernel.org
Subject: Re: perf/x86/intel: Collecting CPU-local performance counters from all cores in parallel
References:
Date: Tue, 23 May 2017 13:53:32 -0700
In-Reply-To: (Michael Edwards's message of "Mon, 22 May 2017 22:42:29 -0700")
Message-ID: <87o9uj47n7.fsf@firstfloor.org>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.2 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

Michael Edwards writes:
>
> Am I going about this wrong?

It seems like a reasonable optimization, but it's likely a lot of work.

> Is there some better way to pursue the
> high-level goal of gathering PMC-based statistics frequently and
> efficiently from all cores, without breaking everything else that uses
> perf_events?

If you can drive the collection from a performance counter (e.g. reference
cycles) you could use leader sampling, and let the PMIs log the values to the
mmap'ed ring buffer. This should be vastly more efficient than pulling
everything.

This works today, however there are some scaling problems with many groups
still.

perf record -F frequency -e '{cpu/ref-cycles/,}:S,... more groups like this ... -a ...

-Andi
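
For illustration only, here is a rough sketch of how the "more groups like this" argument could be assembled. It is not part of the original message. The helper name build_groups, the member events (instructions, branches, cache-references, cache-misses), the 1000 Hz frequency and the "sleep 10" workload are all assumptions picked for the example. The script only prints the resulting command line and does not run perf.

```shell
#!/bin/sh
# Sketch: build a perf -e argument made of leader-sampled groups.
# Every group is led by cpu/ref-cycles/ and carries the :S modifier, so the
# leader's sampling interrupt records the values of all group members.

# build_groups MEMBERS...
# Each argument is a comma-separated list of member events for one group.
# (Hypothetical helper; the event names are supplied by the caller.)
build_groups() {
    spec=""
    for members in "$@"; do
        group="{cpu/ref-cycles/,${members}}:S"
        if [ -z "$spec" ]; then
            spec="$group"
        else
            spec="$spec,$group"
        fi
    done
    printf '%s\n' "$spec"
}

# Example with two groups; the member events are placeholders.
EVENTS=$(build_groups "instructions,branches" "cache-references,cache-misses")

# Print the command instead of running it (system-wide, 1000 samples/sec).
echo "perf record -F 1000 -a -e '$EVENTS' -- sleep 10"
```

Each group has to be schedulable on the PMU as a unit, so in practice the number of members per group is bounded by the counters available on the core. The recorded member values can then be inspected with the usual perf report or perf script tooling.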