From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751490AbdAMK0E (ORCPT ); Fri, 13 Jan 2017 05:26:04 -0500 Received: from foss.arm.com ([217.140.101.70]:38672 "EHLO foss.arm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751351AbdAMK0C (ORCPT ); Fri, 13 Jan 2017 05:26:02 -0500 Date: Fri, 13 Jan 2017 10:24:59 +0000 From: Mark Rutland To: David Carrillo-Cisneros Cc: Peter Zijlstra , Kan Liang , linux-kernel , "x86@kernel.org" , Ingo Molnar , Thomas Gleixner , Andi Kleen , Borislav Petkov , Srinivas Pandruvada , Dave Hansen , Vikas Shivappa , Arnaldo Carvalho de Melo , Vince Weaver , Paul Turner , Stephane Eranian Subject: Re: [RFC 3/6] perf/core: use rb-tree to sched in event groups Message-ID: <20170113102458.GA26120@leverpostej> References: <20170110102502.106187-1-davidcc@google.com> <20170110102502.106187-4-davidcc@google.com> <20170110163850.GA24036@leverpostej> <20170112121432.GD10615@leverpostej> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, Jan 13, 2017 at 12:01:03AM -0800, David Carrillo-Cisneros wrote: > On Thu, Jan 12, 2017 at 4:14 AM, Mark Rutland wrote: > > On Tue, Jan 10, 2017 at 12:51:58PM -0800, David Carrillo-Cisneros wrote: > >> On Tue, Jan 10, 2017 at 8:38 AM, Mark Rutland wrote: > >> > That's a fair point. Sorting by CPU before runtime we'll get subtrees we > >> > won't get fairness unless we sort the events solely by runtime at > >> > sched_in time. If we sort by with runtime before CPU we'll have to skip > >> > events not targeting the current CPU when scheduling task events in. I > >> > note the latter is true today anyhow. > >> > >> That's were ctx->inactive_groups comes in. That list is sorted by runtime > >> and the rb-tree is used to skip to the part of the list that has the events > >> that matter. > > > > Is the list only sorted by runtime and not {cpu,runtime}? If it's the > > latter, I'm not sure I follow. If it's the former, sorry for the noise! > > The former. List only sorted by runtime. Ah, sorry. I had missed that. > > The case I'm worried about is a set of task-bound events that have CPU > > filters. For example, if the user opens a set of task-bound events for > > any CPU: > > > > perf_event_open(attr1, pid, -1, -1, 0); > > perf_event_open(attr2, pid, -1, -1, 0); > > > > ... and also some for the same task, but limited to a specific CPU: > > > > perf_event_open(attr3, pid, 1, -1, 0); > > perf_event_open(attr4, pid, 1, -1, 0); > > > > ... if CPU is before runtime in the sort, one of these groups will > > always be considered first when scheduling, and may starve the other > > group. > > Yes, that case is the reason to have the sorted inactive_list and > the tree. I tried to explain this in the change log of this patch. Please > see new attempt below. That's mostly a reading comprehension failure on my behalf, the commit log does accurately describe this. It might be a little clearer if we say the inactive list is sorted *solely* by timestamp, but nothing more than that should be necessary. > >> > In Peter's original suggestion we didn't sort by cgroup. IIRC there was > >> > some email thread where the cgroup was considered for the sort (maybe > >> > that was *only* for cpu contexts? I'm not too familiar with cgroups), > >> > though I can't find the relevant mail, if it existed. :/ > >> > >> FWIW, in this approach, we only sort by cgroup in CPU contexts, since cgroup > >> events are only installed in CPU contexts. > > > > Sure. However, I think a similar issue to the above applies when > > scheduling events where some are bound to a specific cgroup, and others > > are not. > > Yes, it's addressed in the same way. I see that now. Many thanks for the explanation, and apologies for the noise. Thanks, Mark.