From: Alexey Budankov
Subject: Re: [PATCH v2]: perf/core: addressing 4x slowdown during per-process profiling of STREAM benchmark on Intel Xeon Phi
To: Peter Zijlstra
Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Alexander Shishkin, Andi Kleen, Kan Liang, Dmitri Prokhorov, Valery Cherepennikov, David Carrillo-Cisneros, Stephane Eranian, Mark Rutland, linux-kernel@vger.kernel.org
Organization: Intel Corp.
Date: Fri, 30 Jun 2017 13:22:09 +0300
In-Reply-To: <2cc7195d-9354-0a2a-d97f-0bc8dc2545b0@linux.intel.com>

Hi Peter,

On 21.06.2017 18:39, Alexey Budankov wrote:
>
> Hi,
>
> On 15.06.2017 20:42, Alexey Budankov wrote:
>> On 29.05.2017 14:45, Alexey Budankov wrote:
>>> On 29.05.2017 14:23, Peter Zijlstra wrote:
>>>> On Mon, May 29, 2017 at 01:56:05PM +0300, Alexey Budankov wrote:
>>>>> On 29.05.2017 13:43, Peter Zijlstra wrote:
>>>>>>
>>>>>> Why can't the tree do both?
>>>>>
>>>>> Well, indeed, the tree provides that capability too. However,
>>>>> switching to full tree iteration in the places that currently walk
>>>>> the _groups lists will enlarge the patch, which is probably not a
>>>>> big deal. Do you think the switch is worth implementing?
>>>>
>>>> Do it as a series of patches, where patch 1 introduces the tree,
>>>> patches 2 through n convert the list users into tree users, and
>>>> patch n+1 removes the list.
>>>
>>> Well, OK, let's do that additionally, but please expect a delay in
>>> delivery (I am OOO till Jun 14).
>>
>> Addressed in v3.
>>
>>>> I think it's good to not have duplicate data structures if we can
>>>> avoid it.
>>>
>>> Yeah, makes sense.
>
> After a straightforward switch from struct list_head to struct rb_tree
> for flexible_groups, I now get dmesg dumps with rb-tree corruptions.
> They happen when iterating through the tree instead of through the
> list. No additional synchronization for the tree access was added. It
> looks like the implementation itself makes some assumptions about the
> list_head type.
>
> Are there any ideas on why those corruptions may happen?
>
> I still suggest isolating event groups into a separate object (please
> see patch v4 1/4):
>
> struct perf_event_groups {
>         struct rb_root   tree;
>         struct list_head list;
> };
>
> struct perf_event_context {
>         ...
>         struct perf_event_groups pinned_groups;
>         struct perf_event_groups flexible_groups;
>         ...
> };
>
> and implementing a new API for the object:
>
> perf_event_groups_empty()
> perf_event_groups_init()
> perf_event_groups_insert()
> perf_event_groups_delete()
> perf_event_groups_rotate(..., int cpu)
> perf_event_groups_iterate_cpu(..., int cpu)
> perf_event_groups_iterate()
>
> so that perf_event_groups_iterate() would go through the list, leaving
> iteration through the tree for a separate patch, because a complete
> transition to rb trees may incur synchronization overhead at runtime.

Completely got rid of the list/tree duplication in patch v5 4/4. Please see here:

[PATCH v5 4/4] perf/core: addressing 4x slowdown during per-process profiling of STREAM benchmark on Intel Xeon Phi

>
> Thanks,
> Alexey

Thanks,
Alexey