From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1161584AbdEWUxo (ORCPT <rfc822;w@1wt.eu>);
        Tue, 23 May 2017 16:53:44 -0400
Received: from mga06.intel.com ([134.134.136.31]:35408 "EHLO mga06.intel.com"
        rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S1032788AbdEWUxj (ORCPT <rfc822;linux-kernel@vger.kernel.org>);
        Tue, 23 May 2017 16:53:39 -0400
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.38,383,1491289200"; 
   d="scan'208";a="972251348"
From: Andi Kleen <andi@firstfloor.org>
To: Michael Edwards <michael@tensyr.com>
Cc: linux-kernel@vger.kernel.org, peterz@infradead.org,
        linux-perf-users@vger.kernel.org
Subject: Re: perf/x86/intel: Collecting CPU-local performance counters from all cores in parallel
References: <CACFdaOz-ox4XSu-q8S-Op8xPTDwoT6FAN-yhi0988NJiazpm0Q@mail.gmail.com>
Date: Tue, 23 May 2017 13:53:32 -0700
In-Reply-To: <CACFdaOz-ox4XSu-q8S-Op8xPTDwoT6FAN-yhi0988NJiazpm0Q@mail.gmail.com>
        (Michael Edwards's message of "Mon, 22 May 2017 22:42:29 -0700")
Message-ID: <87o9uj47n7.fsf@firstfloor.org>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.2 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

Michael Edwards <michael@tensyr.com> writes:
>
> Am I going about this wrong?

It seems like a reasonable optimization, but it's likely a lot of work.

> Is there some better way to pursue the
> high-level goal of gathering PMC-based statistics frequently and
> efficiently from all cores, without breaking everything else that uses
> perf_events?

If you can drive the collection from a performance counter
(e.g. reference cycles) you could use leader sampling, and let the
PMIs log the values to the mmap'ed ring buffer. This should
be vastly more efficient than pulling everything. This works today,
although there are still some scaling problems with many groups.

perf record -F frequency -e '{cpu/ref-cycles/,<three other
events to collect>}:S,... more groups like this ...' -a ...
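
To make the shape of that command line concrete, here is a rough sketch
that assembles the -e argument from a list of groups. The helper names,
the 1000 Hz rate and the three member events (instructions, cache-misses,
branch-misses) are only illustrative assumptions; substitute whatever
events you actually want to collect.

```python
import shlex


def leader_sampling_group(leader, members):
    """Return one perf event group using the :S (sample read) modifier.

    The leader drives the sampling; the members' values are read and
    logged whenever the leader overflows.
    """
    return "{" + ",".join([leader, *members]) + "}:S"


def perf_record_argv(freq_hz, groups):
    """Build an argv list for a system-wide `perf record` run.

    `groups` is a list of (leader, [member, ...]) tuples. All groups are
    joined with commas into a single -e argument.
    """
    events = ",".join(leader_sampling_group(l, m) for l, m in groups)
    return ["perf", "record", "-F", str(freq_hz), "-e", events, "-a"]


# Illustrative group only: the member event names are assumptions.
groups = [
    ("cpu/ref-cycles/", ["instructions", "cache-misses", "branch-misses"]),
]

argv = perf_record_argv(1000, groups)

# Print a shell-quoted command line; pass argv to subprocess.run() to
# actually start the recording (needs suitable perf permissions).
print(shlex.join(argv))
```

Building an argv list rather than a shell string avoids quoting mistakes
around the braces when more groups are added.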

-Andi