From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1760254AbdAJKZj (ORCPT ); Tue, 10 Jan 2017 05:25:39 -0500
Received: from mail-pf0-f174.google.com ([209.85.192.174]:35284 "EHLO
        mail-pf0-f174.google.com" rhost-flags-OK-OK-OK-OK)
        by vger.kernel.org with ESMTP id S1750924AbdAJKZh (ORCPT );
        Tue, 10 Jan 2017 05:25:37 -0500
From: David Carrillo-Cisneros
To: linux-kernel@vger.kernel.org
Cc: "x86@kernel.org", Ingo Molnar, Thomas Gleixner, Andi Kleen,
        Kan Liang, Peter Zijlstra, Borislav Petkov, Srinivas Pandruvada,
        Dave Hansen, Vikas Shivappa, Mark Rutland,
        Arnaldo Carvalho de Melo, Vince Weaver, Paul Turner,
        Stephane Eranian, David Carrillo-Cisneros
Subject: [RFC 0/6] optimize ctx switch with rb-tree
Date: Tue, 10 Jan 2017 02:24:56 -0800
Message-Id: <20170110102502.106187-1-davidcc@google.com>
X-Mailer: git-send-email 2.11.0.390.gc69c2f50cf-goog
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

Following the discussion in:
https://patchwork.kernel.org/patch/9420035/

This is an early version of a series of perf context switch
optimizations.

The main idea is to create and maintain a list of inactive events
sorted by timestamp, and an rb-tree to index it. The rb-tree's keys are
{cpu,flexible,stamp} for task contexts and {cgroup,flexible,stamp} for
CPU contexts.

The rb-tree provides functions to find intervals in the inactive event
list so that ctx_sched_in only has to visit the events that can
potentially be scheduled (i.e. it avoids iterating over events bound to
CPUs or cgroups that are not current).

Since the inactive list is sorted by timestamp, rotation can be done by
simply scheduling the events out and back in. This implies that, on
each timer interrupt, the events will rotate by q events (where q is
the number of hardware counters). This changes the current behavior of
rotation.

Feedback welcome!

I haven't profiled the new approach.
I am only assuming it will be superior when the number of per-cpu or
distinct cgroup events is large.

The last patch shows how perf_iterate_ctx can use the new rb-tree index
to reduce the number of visited events. I haven't looked carefully at
whether locking and other things are correct.

If these changes are in the right direction, a next version could
remove some existing code; specifically, the lists ctx->pinned_groups
and ctx->flexible_groups could be removed. Also, event_filter_match
could be simplified when called on event groups filtered using the
rb-tree, since both perform similar checks.

David Carrillo-Cisneros (6):
  perf/core: create active and inactive event groups
  perf/core: add a rb-tree index to inactive_groups
  perf/core: use rb-tree to sched in event groups
  perf/core: avoid rb-tree traversal when no inactive events
  perf/core: rotation no longer neccesary. Behavior has changed. Beware
  perf/core: use rb-tree index to optimize filtered perf_iterate_ctx

 include/linux/perf_event.h |  13 ++
 kernel/events/core.c       | 466 +++++++++++++++++++++++++++++++++++++++------
 2 files changed, 426 insertions(+), 53 deletions(-)

--
2.11.0.390.gc69c2f50cf-goog
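
To illustrate the interval lookup described in this cover letter, here
is a minimal userspace sketch. It is not code from the series: a sorted
array with binary search stands in for the rb-tree, and the names
ev_key, ev_key_cmp, ev_lower_bound and ev_find_interval are
hypothetical. It only shows how ordering by {cpu,flexible,stamp} lets
all inactive events for one (cpu, flexible) pair be found as a single
contiguous range, already in timestamp order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical key modelled on the {cpu,flexible,stamp} triple. */
struct ev_key {
	int cpu;         /* CPU the event is bound to */
	int flexible;    /* 0 = pinned, 1 = flexible */
	uint64_t stamp;  /* timestamp used to order the inactive list */
};

/* Lexicographic order: cpu first, then flexible, then stamp. */
int ev_key_cmp(const struct ev_key *a, const struct ev_key *b)
{
	if (a->cpu != b->cpu)
		return a->cpu < b->cpu ? -1 : 1;
	if (a->flexible != b->flexible)
		return a->flexible < b->flexible ? -1 : 1;
	if (a->stamp != b->stamp)
		return a->stamp < b->stamp ? -1 : 1;
	return 0;
}

/* Index of the first entry in sorted keys[0..n) that is >= *k. */
size_t ev_lower_bound(const struct ev_key *keys, size_t n,
		      const struct ev_key *k)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (ev_key_cmp(&keys[mid], k) < 0)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo;
}

/*
 * Find the half-open range [*first, *last) of entries that match
 * (cpu, flexible). Entries inside the range are sorted by stamp.
 */
void ev_find_interval(const struct ev_key *keys, size_t n,
		      int cpu, int flexible,
		      size_t *first, size_t *last)
{
	/* Smallest possible key for this (cpu, flexible) pair. */
	struct ev_key lo_key = { cpu, flexible, 0 };
	/* Smallest key of the next pair: an exclusive upper bound. */
	struct ev_key hi_key = { cpu, flexible + 1, 0 };

	assert(flexible == 0 || flexible == 1);

	*first = ev_lower_bound(keys, n, &lo_key);
	*last = ev_lower_bound(keys, n, &hi_key);
}
```

A scheduler-side caller would walk only the entries in
[first, last) for the current CPU instead of the whole inactive list.
The same shape would apply with a cgroup identifier in place of cpu
for CPU contexts.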