Date: Wed, 11 Oct 2017 11:45:06 -0300
From: Arnaldo Carvalho de Melo <acme@kernel.org>
To: "Liang, Kan" <kan.liang@intel.com>
Cc: peterz@infradead.org, mingo@redhat.com, linux-kernel@vger.kernel.org,
	jolsa@kernel.org, wangnan0@huawei.com, hekuang@huawei.com,
	namhyung@kernel.org, alexander.shishkin@linux.intel.com,
	"Hunter, Adrian", ak@linux.intel.com
Subject: Re: [PATCH 01/10] perf record: new interfaces to read ring buffer to file
Message-ID: <20171011144506.GA3503@kernel.org>
References: <1507656023-177125-1-git-send-email-kan.liang@intel.com>
	<1507656023-177125-2-git-send-email-kan.liang@intel.com>
	<20171010183016.GJ28623@kernel.org>
	<37D7C6CF3E00A74B8858931C1DB2F077537D1E3D@SHSMSX103.ccr.corp.intel.com>
In-Reply-To: <37D7C6CF3E00A74B8858931C1DB2F077537D1E3D@SHSMSX103.ccr.corp.intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Em Wed, Oct 11, 2017 at 04:12:42AM +0000, Liang, Kan escreveu:
> > > /* When check_messup is true, 'end' must point to a good entry */
> > > static union perf_event *
> > > perf_mmap__read(struct perf_mmap *md, bool check_messup, u64 start,
> > >
> > > diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
> > > index b1c14f1..1ce4857 100644
> > > --- a/tools/perf/util/evlist.h
> > > +++ b/tools/perf/util/evlist.h
> > > @@ -39,6 +39,16 @@ struct perf_mmap {
> > >  	char event_copy[PERF_SAMPLE_MAX_SIZE] __aligned(8);
> > >  };
> > >
> > > +struct perf_mmap_read {
> > > +	struct perf_mmap *md;
> > > +	u64 head;
> > > +	u64 start;
> > > +	u64 end;
> >
> > So there will always be a one-to-one association of 'struct
> > perf_mmap_read' and 'struct perf_mmap', so why not go on adding more
> > fields to 'struct perf_mmap' as we need them?
>
> The fields in 'struct perf_mmap' need to be recalculated before each
> read, so I put them in a new struct.

Ok, but I still think that if there is a one-to-one relationship of
perf_mmap_read with perf_mmap, then we should just extend the struct we
already have for per-mmap operations, i.e. 'struct perf_mmap'. I'll try
to provide a patch on top of my perf/core branch to see how it looks.

> > But let's not do it all at once (backward, snapshotting, overwrite,
> > etc.); first do the simple part, making the most basic modes:
> >
> >   perf record -a
> >
> >   perf top
> >
> > work multithreaded, leaving the other, more complicated modes falling
> > back to the old format. Then, when we have it solid, go on to the
> > other features.
>
> Agree.
> When I did the perf top optimization, I also tried Namhyung's perf top
> multi-thread patch:
> https://lwn.net/Articles/667469/
> I think it may be a good starting point.

I have to read that to understand why we need those indexes :-\

> I didn't work on his patch, because the root cause of the bad perf top
> performance is non-overwrite mode, which generates lots of samples in a
> short time. That exceeds KNL's computational capability, and
> multi-threading doesn't help much in that case.
> So I tried to use overwrite mode instead.

Right, work on the problem you have at hand, but all these efforts
should be considered to move forward.
> > In the end, having the two formats supported will be needed anyway,
> > and we can as well ask for processing with both perf.data file
> > formats to compare results while we straighten out the new code.
> >
> > I just think we should do this in a more fine-grained way, to avoid
> > too much code churn, as well as to keep a fallback to the old code,
> > which, albeit non-scalable, is what we have been using and can help
> > in certifying that the new one works well, by comparing their
> > outputs.
>
> I already extended the multithreading support for event synthesis in
> perf record:
> https://github.com/kliang2/perf.git perf_record_opt
> I will send it out for review shortly, after rebasing on the latest
> perf/core.
>
> In the patch series, I realloc a buffer for each thread to temporarily
> keep the processing results, and write them to the perf.data file at
> the end of event synthesis. The amount of synthesized data is not big
> (hundreds of kilobytes), so I think it should be OK to do that.

Ok, one thing I noticed was that with the snapshotting code we
synthesize events multiple times, once for each new perf.data file. I
haven't tested that with the multithreaded synthesizing code we
recently merged -- have you?

- Arnaldo