From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 24 Oct 2017 10:08:49 -0300
From: Arnaldo Carvalho de Melo
To: Ingo Molnar
Cc: Jiri Olsa, "Liang, Kan", "mingo@redhat.com",
	"linux-kernel@vger.kernel.org", "peterz@infradead.org",
	"jolsa@kernel.org", "wangnan0@huawei.com", "hekuang@huawei.com",
	"namhyung@kernel.org", "alexander.shishkin@linux.intel.com",
	"Hunter, Adrian", "ak@linux.intel.com"
Subject: Re: [PATCH V3 0/6] event synthesization multithreading for perf record
Message-ID: <20171024130849.GD7045@kernel.org>
References: <1508529934-369393-1-git-send-email-kan.liang@intel.com>
	<20171023114822.ijbixdkhysinlwqv@gmail.com>
	<37D7C6CF3E00A74B8858931C1DB2F077537D874E@SHSMSX103.ccr.corp.intel.com>
	<20171024092200.wef6b66ecmhrvaja@gmail.com>
	<20171024114755.GA2716@krava>
	<20171024125944.uswroptykcqrgjox@gmail.com>
In-Reply-To: <20171024125944.uswroptykcqrgjox@gmail.com>
X-Url: http://acmel.wordpress.com
User-Agent: Mutt/1.9.1 (2017-09-22)
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Oct 24, 2017 at 02:59:44PM +0200, Ingo Molnar wrote:
> 
> * Jiri Olsa wrote:
> 
> > I recently made some changes to threaded record, based on Namhyung's
> > time* API, which is needed to read/sort the data afterwards.
> >
> > But I wasn't able to get any substantial and consistent reduction of
> > LOST events, and then I got sidetracked and did not finish. It's in
> > here:
> 
> So, in the context of system-wide profiling, the way that I think would
> work best is the following:
> 
>   thread #0 binds itself to CPU#0 (via sched_setaffinity) and creates a per-CPU event on CPU#0
>   thread #1 binds itself to CPU#1 (via sched_setaffinity) and creates a per-CPU event on CPU#1
>   thread #2 binds itself to CPU#2 (via sched_setaffinity) and creates a per-CPU event on CPU#2

Right, that is how I think it should be done as well. Those threads
would just dump to separate files in a per-session directory, with an
extra file for the session details, i.e. what is now the header.

Later, the same thing happens at processing time: there we'll have
contention on the global thread state, plus the need for rounds of
PERF_SAMPLE_TIME based ordering, like what we have now in the
tools/perf/util/ordered-events.[ch] code, etc.

This works for 'report', 'script', 'top', 'trace', etc., as it is
basically the model we already have. All the work that was done on
refcounting the thread, map, etc. objects, as well as on locking those
rbtrees, would finally be taken full advantage of.

- Arnaldo

> etc.
> 
> Is this how you implemented it?
> 
> If the threads in the thread pool are just free-running, then the
> scheduler might not migrate them to the 'right' CPU that is streaming
> the perf events, and there will be a lot of cross-talk between CPUs.
> 
> Inherited events (the default 'perf record') are tougher.
> 
> Thanks,
> 
> 	Ingo