From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751812AbdFSRfe (ORCPT ); Mon, 19 Jun 2017 13:35:34 -0400 Received: from mail.kernel.org ([198.145.29.99]:41082 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751017AbdFSRfd (ORCPT ); Mon, 19 Jun 2017 13:35:33 -0400 DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 1E28D239B8 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=kernel.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=acme@kernel.org Date: Mon, 19 Jun 2017 14:35:29 -0300 From: Arnaldo Carvalho de Melo To: Jin Yao Cc: Jiri Olsa , Peter Zijlstra , mingo@infradead.org, alexander.shishkin@linux.intel.com, linux-kernel@vger.kernel.org, ak@linux.intel.com, kan.liang@intel.com, yao.jin@intel.com Subject: Re: [PATCH v2 3/3] perf report: Implement visual marker for macro fusion in annotate Message-ID: <20170619173529.GM3645@kernel.org> References: <1497840958-4759-1-git-send-email-yao.jin@linux.intel.com> <1497840958-4759-4-git-send-email-yao.jin@linux.intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <1497840958-4759-4-git-send-email-yao.jin@linux.intel.com> X-Url: http://acmel.wordpress.com User-Agent: Mutt/1.8.0 (2017-02-23) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Em Mon, Jun 19, 2017 at 10:55:58AM +0800, Jin Yao escreveu: > For marking the fused instructions clearly, This patch adds a > line before the first instruction of pair and joins it with the > arrow of the jump. > > For example, when je is selected in annotate view, the line > before cmpl is displayed and joins the arrow of je. > > │ ┌──cmpl $0x0,argp_program_version_hook > 81.93 │ │──je 20 > │ │ lock cmpxchg %esi,0x38a9a4(%rip) > │ │↓ jne 29 > │ │↓ jmp 43 > 11.47 │20:└─→cmpxch %esi,0x38a999(%rip) Ok, thanks for making this per-arch! Some comments: I think we should have this marked permanently, i.e. not just when we go to the jump line, something like this (testing here in a t450s broadwell, function hc_find_func, /usr/lib64/liblzma.so.5.2.2): It is like this now, when we are not on the jne jump line: 0.71 │ mov %r14d,%r10d ▒ │ movzbl (%rdx,%r10,1),%ebp ▒ 1.06 │ 70: mov (%r9,%rcx,4),%ecx ◆ 77.98 │ 74: cmp %bpl,(%rbx,%r10,1) ▒ │ ↑ jne 70 ▒ 0.85 │ movzbl (%rdx),%r10d ▒ 0.99 │ cmp %r10b,(%rbx) ▒ I think it should be augmented to: 0.71 │ mov %r14d,%r10d ▒ │ movzbl (%rdx,%r10,1),%ebp ▒ 1.06 │ 70: ┌─mov (%r9,%rcx,4),%ecx ◆ 77.98 │ 74: └─cmp %bpl,(%rbx,%r10,1) ▒ │ ↑ jne 70 ▒ 0.85 │ movzbl (%rdx),%r10d ▒ 0.99 │ cmp %r10b,(%rbx) ▒ I.e. no arrow, the two instructions that end up as one micro-op being connected. And then this: │ ┌──cmpl $0x0,argp_program_version_hook 81.93 │ │──je 20 │ │ lock cmpxchg %esi,0x38a9a4(%rip) │ │↓ jne 29 │ │↓ jmp 43 11.47 │20:└─→cmpxch %esi,0x38a999(%rip) Would look better as: │ ┌──cmpl $0x0,argp_program_version_hook 81.93 │ ├──je 20 │ │ lock cmpxchg %esi,0x38a9a4(%rip) │ │↓ jne 29 │ │↓ jmp 43 11.47 │20:└─→cmpxch %esi,0x38a999(%rip) Patch below, please test/ack :-) This was the low hanging fruit, having the: 1.06 │ 70: ┌─mov (%r9,%rcx,4),%ecx ◆ 77.98 │ 74: └─cmp %bpl,(%rbx,%r10,1) ▒ Marker always there, not just when we have the cursor on top of one of those lines remains to be coded. But you state: ------------ Macro fusion merges two instructions to a single micro-op. Intel core platform performs this hardware optimization under limited circumstances. ------------ "Intel core", what about older arches, etc, don't you have to look at: # cpudesc : Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz # cpuid : GenuineIntel,6,61,4 present in the perf.data header (or in the running system, for things like 'perf top') to make sure that this is a machine where such "macro fusion" takes place? - Arnaldo diff --git a/tools/perf/ui/browser.c b/tools/perf/ui/browser.c index acba636bd165..9ef7677ae14f 100644 --- a/tools/perf/ui/browser.c +++ b/tools/perf/ui/browser.c @@ -756,8 +756,10 @@ void ui_browser__mark_fused(struct ui_browser *browser, unsigned int column, ui_browser__gotorc(browser, end_row, column); SLsmg_draw_hline(2); ui_browser__gotorc(browser, end_row + 1, column - 1); - SLsmg_draw_vline(1); + SLsmg_write_char(SLSMG_LTEE_CHAR); } else { + ui_browser__gotorc(browser, end_row, column - 1); + SLsmg_write_char(SLSMG_LTEE_CHAR); ui_browser__gotorc(browser, end_row, column); SLsmg_draw_hline(2); }