From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1752006AbdFTPMP (ORCPT ); Tue, 20 Jun 2017 11:12:15 -0400
Received: from mga06.intel.com ([134.134.136.31]:64521 "EHLO mga06.intel.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1750993AbdFTPMN (ORCPT ); Tue, 20 Jun 2017 11:12:13 -0400
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.39,364,1493708400"; d="scan'208";a="115364707"
Subject: Re: [RESEND PATCH v3 1/n] perf/core: addressing 4x slowdown during
 per-process profiling of STREAM benchmark on Intel Xeon Phi
To: Mark Rutland
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
 Alexander Shishkin, Andi Kleen, Kan Liang, Dmitri Prokhorov,
 Valery Cherepennikov, David Carrillo-Cisneros, Stephane Eranian,
 linux-kernel@vger.kernel.org
References: <20170620134442.GG28157@leverpostej>
From: Alexey Budankov
Organization: Intel Corp.
Message-ID: <48e77d47-20cc-67b4-577e-00489767b263@linux.intel.com>
Date: Tue, 20 Jun 2017 18:12:08 +0300
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101
 Thunderbird/52.1.1
MIME-Version: 1.0
In-Reply-To: <20170620134442.GG28157@leverpostej>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Language: en-US
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 20.06.2017 16:44, Mark Rutland wrote:
> On Fri, Jun 16, 2017 at 02:03:58AM +0300, Alexey Budankov wrote:
>> perf/core: use rb trees for pinned/flexible groups
>>
>> By default, the userspace perf tool opens per-cpu task-bound events
>> when sampling, so for N logical events requested by the user, the tool
>> will open N * NR_CPUS events.
>>
>> In the kernel, we mux events with a hrtimer, periodically rotating the
>> flexible group list and trying to schedule each group in turn. We
>> skip groups whose cpu filter doesn't match.
>> So when we get unlucky,
>> we can walk N * (NR_CPUS - 1) groups pointlessly for each hrtimer
>> invocation.
>>
>> This has been observed to result in significant overhead when running
>> the STREAM benchmark on 272-core Xeon Phi systems.
>>
>> One way to avoid this is to place our events into an rb tree sorted by
>> CPU filter, so that our hrtimer can skip to the current CPU's
>> list and ignore everything else.
>>
>> As a step towards that, this patch replaces event group lists with rb
>> trees.
>>
>> Signed-off-by: Alexey Budankov
>> ---
>>  include/linux/perf_event.h |  18 ++-
>>  kernel/events/core.c       | 393 +++++++++++++++++++++++++++++++++------------
>>  2 files changed, 307 insertions(+), 104 deletions(-)
>>
>> Addressed Mark Rutland's comments from the previous patch version.
>
> ... then this should be v4, no?
>
> Which comments? Could you please write a changelog in future?

Changes are:
1. changed the type of pinned_groups/flexible_groups to an rb tree;
2. removed group_list_entry and reused group_entry for that purpose;
3. added add_to_groups()/del_from_groups() helper functions.

>
> In future, please send patches as a series, with a known upper bound
> rather than N. It's really painful to find them when they're sent
> separately, without a known upper bound.

Accepted.

>
> [...]
>
>> +/*
>> + * Delete a group from a tree. If the group is directly attached to the tree
>> + * it also detaches all groups on the group's group_list list.
>> + */
>> +static void
>> +perf_cpu_tree_delete(struct rb_root *tree, struct perf_event *event)
>> +{
>> +	WARN_ON_ONCE(!tree || !event);
>> +
>> +	if (RB_EMPTY_NODE(&event->group_node)) {
>> +		list_del_init(&event->group_entry);
>> +	} else {
>> +		struct perf_event *list_event, *list_next;
>> +
>> +		rb_erase(&event->group_node, tree);
>> +		list_for_each_entry_safe(list_event, list_next,
>> +					 &event->group_list, group_entry)
>> +			list_del_init(&list_event->group_entry);
>> +	}
>> +}
>
> As I commented on the last version, this means that all groups which
> were (incidentally) hanging off of this one are removed, and can
> no longer be reached via the tree.
>
> Surely one of the remaining groups should be added to the tree?

Aww, I see. That needs to be implemented. Thanks.

>
> I don't believe that is correct.
>
> I believe it would be simpler to reason about a threaded rb-tree here,
> since that special case wouldn't exist.
>
> Thanks,
> Mark.
>
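For reference, one possible shape of the fix Mark asks for — promoting a remaining list member into the tree instead of detaching everything — might look like the sketch below. This is an untested illustration, not the author's actual follow-up: it assumes the group_node/group_list fields introduced by the quoted patch, and uses only existing kernel helpers (rb_replace_node(), list_first_entry(), list_splice_init()).

```
/*
 * Sketch only: delete @event from @tree without losing the groups
 * that hang off its group_list. Assumes the group_node/group_list
 * layout from the patch quoted above.
 */
static void
perf_cpu_tree_delete(struct rb_root *tree, struct perf_event *event)
{
	WARN_ON_ONCE(!tree || !event);

	if (RB_EMPTY_NODE(&event->group_node)) {
		/* Event hangs off another group's group_list. */
		list_del_init(&event->group_entry);
	} else if (list_empty(&event->group_list)) {
		/* Sole event for this CPU filter: just erase it. */
		rb_erase(&event->group_node, tree);
	} else {
		/*
		 * Other groups share this CPU filter: promote the first
		 * of them into the tree so the rest stay reachable.
		 */
		struct perf_event *next =
			list_first_entry(&event->group_list,
					 struct perf_event, group_entry);

		list_del_init(&next->group_entry);
		list_splice_init(&event->group_list, &next->group_list);
		rb_replace_node(&event->group_node, &next->group_node, tree);
		RB_CLEAR_NODE(&event->group_node);
	}
}
```

rb_replace_node() keeps the tree shape intact because the promoted node has the same CPU-filter key as the erased one, so no rebalancing is needed; as Mark notes, a threaded rb-tree would make this special case disappear entirely.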