From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1750999AbdE2Kqp (ORCPT ); Mon, 29 May 2017 06:46:45 -0400 Received: from mga09.intel.com ([134.134.136.24]:34514 "EHLO mga09.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750871AbdE2Kqo (ORCPT ); Mon, 29 May 2017 06:46:44 -0400 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.38,413,1491289200"; d="scan'208";a="1175659917" Subject: Re: [PATCH v2]: perf/core: addressing 4x slowdown during per-process, profiling of STREAM benchmark on Intel Xeon Phi To: Peter Zijlstra Cc: Ingo Molnar , Arnaldo Carvalho de Melo , Alexander Shishkin , Andi Kleen , Kan Liang , Dmitri Prokhorov , Valery Cherepennikov , David Carrillo-Cisneros , Stephane Eranian , Mark Rutland , linux-kernel@vger.kernel.org References: <1e962b59-3e39-e0d6-515d-c4fd3502edae@linux.intel.com> <20170529074506.brvayqdp6xcxbhs7@hirez.programming.kicks-ass.net> <20170529103343.rqy37cxnp6zsbnap@hirez.programming.kicks-ass.net> From: Alexey Budankov Organization: Intel Corp. Message-ID: Date: Mon, 29 May 2017 13:46:38 +0300 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101 Thunderbird/52.1.1 MIME-Version: 1.0 In-Reply-To: <20170529103343.rqy37cxnp6zsbnap@hirez.programming.kicks-ass.net> Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 29.05.2017 13:33, Peter Zijlstra wrote: > On Mon, May 29, 2017 at 12:24:53PM +0300, Alexey Budankov wrote: >> On 29.05.2017 10:45, Peter Zijlstra wrote: >>> On Sat, May 27, 2017 at 02:19:51PM +0300, Alexey Budankov wrote: >>>> Solution: >>>> >>>> cpu indexed trees for perf_event_context::pinned_groups and >>>> perf_event_context::flexible_groups lists are introduced. Every tree node >>>> keeps a list of groups allocated for the same cpu. A tree references only >>>> groups located at the appropriate group list. The tree provides capability >>>> to iterate over groups allocated for a specific cpu only, what is exactly >>>> required by multiplexing timer interrupt handler. The handler runs per-cpu >>>> and enables/disables groups using group_sched_in()/group_sched_out() that >>>> call event_filter_match() function filtering out groups allocated for cpus >>>> different from the one executing the handler. Additionally for every >>>> filtered out group group_sched_out() updates tstamps values to the current >>>> interrupt time. This updating work is now done only once by >>>> update_context_time() called by ctx_sched_out() before cpu groups >>>> iteration. For this trick to work it is required that tstamps of filtered >>>> out events would point to perf_event_context::tstamp_data object instead >>>> of perf_event::tstamp_data ones, as it is initialized from an event >>>> allocation. tstamps references are switched by >>>> group_sched_in()/group_sched_out() every time a group is checked for its >>>> suitability for currently running cpu. When a thread enters some cpu on >>>> a context switch a long run through pinned and flexible groups is >>>> performed by perf_event_sched_in(, mux=0) with new parameter mux set to 0 >>>> and filtered out groups tstamps are switched to >>>> perf_event_context::tstamp_data object. Then a series of multiplexing >>>> interrupts happens and the handler rotates the flexible groups calling >>>> ctx_sched_out(,mux=1)/perf_event_sched_in(,mux=1) iterating over the cpu >>>> tree lists only and avoiding long runs through the complete group lists. >>>> This is where speedup comes from. Eventually when the thread leaves the cpu >>>> ctx_sched_out(,mux=0) is called restoring tstamps pointers to the events' >>>> perf_event::tstamp_data objects. >>> >>> This is unreadable.. Please use whitespace. >> >> Do you mean do NOT use whitespaces? Could you explain in more detail what >> you mean? > > No, add _more_ whitespace. Use things like paragraphs and such. Reading > a massive blob of text like that is painful. Also use full and complete > sentences. For example, the very first sentence: > > 'cpu indexed trees for perf_event_context::pinned_groups and > perf_event_context::flexible_groups lists are introduced.' > > feels incomplete and leaves one wondering what for etc.. And it only > gets worse. > Makes sense. Will do. Thanks. > >>> Yeah, this doesn't go into a changelog. Have you _ever_ seen a changelog >>> with such crud in? >>> >> >> I have not. Could you please advise how to format this information to be >> suitable for changelog, if it's required at all? > > Don't include it. It should be fairly obvious from the diff itself what > changed after all. > Ok. Clear.