From: Alexey Budankov
Subject: Re: [PATCH v2]: perf/core: addressing 4x slowdown during per-process profiling of STREAM benchmark on Intel Xeon Phi
To: Peter Zijlstra
Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Alexander Shishkin, Andi Kleen, Kan Liang, Dmitri Prokhorov, Valery Cherepennikov, David Carrillo-Cisneros, Stephane Eranian, Mark Rutland, linux-kernel@vger.kernel.org
Organization: Intel Corp.
Date: Fri, 30 Jun 2017 13:22:09 +0300
In-Reply-To: <2cc7195d-9354-0a2a-d97f-0bc8dc2545b0@linux.intel.com>

Hi Peter,

On 21.06.2017 18:39, Alexey Budankov wrote:
>
> Hi,
>
> On 15.06.2017 20:42, Alexey Budankov wrote:
>> On 29.05.2017 14:45, Alexey Budankov wrote:
>>> On 29.05.2017 14:23, Peter Zijlstra wrote:
>>>> On Mon, May 29, 2017 at 01:56:05PM +0300, Alexey Budankov wrote:
>>>>> On 29.05.2017 13:43, Peter Zijlstra wrote:
>>>>>>
>>>>>> Why can't the tree do both?
>>>>>
>>>>> Well, indeed, the tree provides that capability too. However,
>>>>> switching to full tree iteration in the places that currently walk
>>>>> the _groups lists will enlarge the patch, which is probably not a
>>>>> big deal. Do you think the switch is worth implementing?
>>>>
>>>> Do it as a series of patches, where patch 1 introduces the tree,
>>>> patches 2 through n convert the list users into tree users, and
>>>> patch n+1 removes the list.
>>>
>>> Well, OK, let's do that additionally, but please expect a delay in
>>> delivery (I am OOO till Jun 14).
>>
>> Addressed in v3.
>>
>>>> I think it's good to not have duplicate data structures if we can
>>>> avoid it.
>>>
>>> Yeah, makes sense.
>
> After a straightforward switch from struct list_head to struct rb_tree
> for flexible_groups, I now get dmesg dumps with rb-tree corruptions.
> They happen when iterating through the tree instead of through the
> list. No additional synchronization for the tree access was added. It
> looks like the implementation itself makes some assumptions about the
> list_head type.
>
> Are there any ideas on why those corruptions may happen?
>
> I still suggest isolating event groups into a separate object (please
> see patch v4 1/4):
>
> struct perf_event_groups {
>         struct rb_root   tree;
>         struct list_head list;
> };
>
> struct perf_event_context {
>         ...
>         struct perf_event_groups pinned_groups;
>         struct perf_event_groups flexible_groups;
>         ...
> };
>
> and implementing a new API for the object:
>
> perf_event_groups_empty()
> perf_event_groups_init()
> perf_event_groups_insert()
> perf_event_groups_delete()
> perf_event_groups_rotate(..., int cpu)
> perf_event_groups_iterate_cpu(..., int cpu)
> perf_event_groups_iterate()
>
> so that perf_event_groups_iterate() would go through the list, leaving
> iteration through the tree for a separate patch, because a complete
> transition to rb trees may incur synchronization overhead at runtime.

Completely got rid of the list/tree duplication in patch v5 4/4. Please see here:

[PATCH v5 4/4] perf/core: addressing 4x slowdown during per-process profiling of STREAM benchmark on Intel Xeon Phi

>
> Thanks,
> Alexey

Thanks,
Alexey