From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751380AbdE2JP1 (ORCPT ); Mon, 29 May 2017 05:15:27 -0400
Received: from mga02.intel.com ([134.134.136.20]:53406 "EHLO mga02.intel.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1751215AbdE2JPZ (ORCPT ); Mon, 29 May 2017 05:15:25 -0400
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.38,413,1491289200"; d="scan'208";a="107809912"
Subject: Re: [PATCH v2]: perf/core: addressing 4x slowdown during per-process,
 profiling of STREAM benchmark on Intel Xeon Phi
To: Peter Zijlstra
Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Alexander Shishkin, Andi Kleen,
 Kan Liang, Dmitri Prokhorov, Valery Cherepennikov, David Carrillo-Cisneros,
 Stephane Eranian, Mark Rutland, linux-kernel@vger.kernel.org
References: <1e962b59-3e39-e0d6-515d-c4fd3502edae@linux.intel.com>
 <20170529074636.tjftcdtcg6op74i3@hirez.programming.kicks-ass.net>
From: Alexey Budankov
Organization: Intel Corp.
Message-ID: <75f031d8-68ec-4cd6-752f-1fbecaa86026@linux.intel.com>
Date: Mon, 29 May 2017 12:15:14 +0300
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101 Thunderbird/52.1.1
MIME-Version: 1.0
In-Reply-To: <20170529074636.tjftcdtcg6op74i3@hirez.programming.kicks-ass.net>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Language: en-US
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 29.05.2017 10:46, Peter Zijlstra wrote:
> On Sat, May 27, 2017 at 02:19:51PM +0300, Alexey Budankov wrote:
>> @@ -571,6 +587,27 @@ struct perf_event {
>>  	 * either sufficies for read.
>>  	 */
>>  	struct list_head group_entry;
>> +	/*
>> +	 * Node on the pinned or flexible tree located at the event context;
>> +	 * the node may be empty in case its event is not directly attached
>> +	 * to the tree but to group_list list of the event directly
>> +	 * attached to the tree;
>> +	 */
>> +	struct rb_node group_node;
>> +	/*
>> +	 * List keeps groups allocated for the same cpu;
>> +	 * the list may be empty in case its event is not directly
>> +	 * attached to the tree but to group_list list of the event directly
>> +	 * attached to the tree;
>> +	 */
>> +	struct list_head group_list;
>> +	/*
>> +	 * Entry into the group_list list above;
>> +	 * the entry may be attached to the self group_list list above
>> +	 * in case the event is directly attached to the pinned or
>> +	 * flexible tree;
>> +	 */
>> +	struct list_head group_list_entry;
>>  	struct list_head sibling_list;
>>
>>  	/*
>
>> @@ -742,7 +772,17 @@ struct perf_event_context {
>>
>>  	struct list_head active_ctx_list;
>>  	struct list_head pinned_groups;
>> +	/*
>> +	 * Cpu tree for pinned groups; keeps event's group_node nodes
>> +	 * of attached flexible groups;
>> +	 */
>> +	struct rb_root pinned_tree;
>>  	struct list_head flexible_groups;
>> +	/*
>> +	 * Cpu tree for flexible groups; keeps event's group_node nodes
>> +	 * of attached flexible groups;
>> +	 */
>> +	struct rb_root flexible_tree;
>>  	struct list_head event_list;
>>  	int nr_events;
>>  	int nr_active;
>> @@ -758,6 +798,7 @@ struct perf_event_context {
>>  	 */
>>  	u64 time;
>>  	u64 timestamp;
>> +	struct perf_event_tstamp tstamp_data;
>>
>>  	/*
>>  	 * These fields let us detect when two contexts have both
>
>
> So why do we now have a list _and_ a tree for the same entries?

We need the groups list to iterate through all groups configured for
collection, and we need the tree to quickly iterate through only the
groups allocated for a particular CPU.

>
>

-Alexey
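
To illustrate the list-plus-tree answer above, here is a minimal user-space
sketch. It is not the patch code: struct group, struct context, ctx_add(),
count_all() and count_for_cpu() are hypothetical names, and a plain unbalanced
binary search tree stands in for the kernel rb-tree. The sketch assumes, as
the quoted comments describe, that the first group for a CPU is attached to
the tree directly and later groups for the same CPU are chained behind it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one event group; not the kernel's struct perf_event. */
struct group {
	int cpu;                      /* CPU the group is allocated for */
	int id;                       /* illustrative identifier */
	struct group *next_all;       /* link in the list of all groups */
	struct group *left, *right;   /* tree links, used only by the tree node */
	struct group *next_same_cpu;  /* chain of groups sharing this CPU */
};

/* Hypothetical model of the context: one list plus one CPU-keyed tree. */
struct context {
	struct group *all_head;       /* every configured group */
	struct group *tree_root;      /* plain BST keyed by cpu (assumption) */
};

/* Add a group to both the full list and the per-CPU tree. */
static void ctx_add(struct context *ctx, struct group *g)
{
	struct group **link;

	assert(ctx != NULL && g != NULL);
	g->left = g->right = g->next_same_cpu = NULL;

	/* The list sees every group, regardless of CPU. */
	g->next_all = ctx->all_head;
	ctx->all_head = g;

	link = &ctx->tree_root;
	while (*link != NULL) {
		struct group *n = *link;

		if (g->cpu == n->cpu) {
			/* CPU already has a tree node: chain behind it. */
			g->next_same_cpu = n->next_same_cpu;
			n->next_same_cpu = g;
			return;
		}
		link = (g->cpu < n->cpu) ? &n->left : &n->right;
	}
	*link = g;                    /* first group for this CPU */
}

/* Walk the full list: visits every configured group. */
static int count_all(const struct context *ctx)
{
	int n = 0;

	for (const struct group *g = ctx->all_head; g != NULL; g = g->next_all)
		n++;
	return n;
}

/* Descend the tree to one CPU, then walk only that CPU's groups. */
static int count_for_cpu(const struct context *ctx, int cpu)
{
	const struct group *n = ctx->tree_root;
	int count = 0;

	while (n != NULL && n->cpu != cpu)
		n = (cpu < n->cpu) ? n->left : n->right;
	for (; n != NULL; n = n->next_same_cpu)
		count++;
	return count;
}
```

In this model, count_all() touches every group, while count_for_cpu() touches
only the tree nodes on the search path plus the groups of the requested CPU,
which is the point of keeping both structures.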