Re: [PATCH v2 3/3] perf report: Implement visual marker for macro fusion in annotate

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: "Jin, Yao" <yao.jin@linux.intel.com>
To: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	mingo@infradead.org, alexander.shishkin@linux.intel.com,
	linux-kernel@vger.kernel.org, ak@linux.intel.com,
	kan.liang@intel.com, yao.jin@intel.com
Subject: Re: [PATCH v2 3/3] perf report: Implement visual marker for macro fusion in annotate
Date: Tue, 20 Jun 2017 09:25:35 +0800	[thread overview]
Message-ID: <13d15de3-419c-22b1-1699-fdcc766117eb@linux.intel.com> (raw)
In-Reply-To: <20170619173529.GM3645@kernel.org>


> Ok, thanks for making this per-arch! Some comments:
>
> I think we should have this marked permanently, i.e. not just when we go
> to the jump line, something like this (testing here in a t450s
> broadwell, function hc_find_func, /usr/lib64/liblzma.so.5.2.2):
>
> It is like this now, when we are not on the jne jump line:
>
>    0.71 │       mov    %r14d,%r10d                                                                                                                            ▒
>         │       movzbl (%rdx,%r10,1),%ebp                                                                                                                     ▒
>    1.06 │ 70:   mov    (%r9,%rcx,4),%ecx                                                                                                                      ◆
>   77.98 │ 74:   cmp    %bpl,(%rbx,%r10,1)                                                                                                                     ▒
>         │     ↑ jne    70                                                                                                                                     ▒
>    0.85 │       movzbl (%rdx),%r10d                                                                                                                           ▒
>    0.99 │       cmp    %r10b,(%rbx)                                                                                                                           ▒
>
> I think it should be augmented to:
>
>    0.71 │       mov    %r14d,%r10d                                                                                                                            ▒
>         │       movzbl (%rdx,%r10,1),%ebp                                                                                                                     ▒
>    1.06 │ 70: ┌─mov    (%r9,%rcx,4),%ecx                                                                                                                      ◆
>   77.98 │ 74: └─cmp    %bpl,(%rbx,%r10,1)                                                                                                                     ▒
>         │     ↑ jne    70                                                                                                                                     ▒
>    0.85 │       movzbl (%rdx),%r10d                                                                                                                           ▒
>    0.99 │       cmp    %r10b,(%rbx)                                                                                                                           ▒
>
> I.e. no arrow, the two instructions that end up as one micro-op being
> connected.

The fused instruction pairs are:
cmp + jcc
test + jcc
add + jcc
sub + jcc
and + jcc
inc + jcc
dec + jcc

Mov and cmp are not the fused instruction pair. So we don't need to 
connect mov and cmp. I guess what Arnaldo wants is to connect two fused 
instructions even we don't go to the jcc line. For example: a line is 
connected between cmp and jne in above case.

I have thought about that. While the visualization may be not very good 
because the original arrow before jne would be overwritten. So now I 
just implement a way that joins the jump arrow when we go to the jcc 
line. Another consideration is the fused instruction pairs are very 
common instructions in code, if we mark them all, there may be too much.

> And then this:
>
>          │   ┌──cmpl   $0x0,argp_program_version_hook
>    81.93 │   │──je     20
>          │   │  lock   cmpxchg %esi,0x38a9a4(%rip)
>          │   │↓ jne    29
>          │   │↓ jmp    43
>    11.47 │20:└─→cmpxch %esi,0x38a999(%rip)
>
> Would look better as:
>
>          │   ┌──cmpl   $0x0,argp_program_version_hook
>    81.93 │   ├──je     20
>          │   │  lock   cmpxchg %esi,0x38a9a4(%rip)
>          │   │↓ jne    29
>          │   │↓ jmp    43
>    11.47 │20:└─→cmpxch %esi,0x38a999(%rip)
>
> Patch below, please test/ack :-)

I have tested. It's better! There is no space in the line. Thanks!

> This was the low hanging fruit, having the:
>
>    1.06 │ 70: ┌─mov    (%r9,%rcx,4),%ecx                                                                                                                      ◆
>   77.98 │ 74: └─cmp    %bpl,(%rbx,%r10,1)                                                                                                                     ▒
>
> Marker always there, not just when we have the cursor on top of one of
> those lines remains to be coded.

My comment is as above.

> But you state:
>
>   ------------
>      Macro fusion merges two instructions to a single micro-op. Intel core
>      platform performs this hardware optimization under limited
>      circumstances.
>   ------------
>
> "Intel core", what about older arches, etc, don't you have to look at:
>
> # cpudesc : Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz
> # cpuid : GenuineIntel,6,61,4
>
> present in the perf.data header (or in the running system, for things
> like 'perf top') to make sure that this is a machine where such "macro
> fusion" takes place?
>
> - Arnaldo

Reference for macro fusion is the optimization guide,
http://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-optimization-manual.html
2.3.2.1
— In Intel microarchitecture code name Nehalem: CMP, TEST.
— In Intel microarchitecture code name Sandy Bridge: CMP, TEST, ADD, 
SUB, AND, INC, DEC
— These instructions can fuse if The first source / destination operand 
is a register.

The second source operand (if exists) is one of: immediate, register, or 
non RIP-relative memory.
The second instruction of the macro-fusable pair is a conditional branch.

We probably don't need the full rules, just a simple test for 
CMP/TEST/ADD/SUB/AND/INC/DEC and second instruction a Jcc condition 
branch. Also I don't think we need to distinguish Nehalem/Sandy Bridge 
and other core platforms. A simple test may be acceptable.

Thanks!
Jin Yao

> diff --git a/tools/perf/ui/browser.c b/tools/perf/ui/browser.c
> index acba636bd165..9ef7677ae14f 100644
> --- a/tools/perf/ui/browser.c
> +++ b/tools/perf/ui/browser.c
> @@ -756,8 +756,10 @@ void ui_browser__mark_fused(struct ui_browser *browser, unsigned int column,
>   		ui_browser__gotorc(browser, end_row, column);
>   		SLsmg_draw_hline(2);
>   		ui_browser__gotorc(browser, end_row + 1, column - 1);
> -		SLsmg_draw_vline(1);
> +		SLsmg_write_char(SLSMG_LTEE_CHAR);
>   	} else {
> +		ui_browser__gotorc(browser, end_row, column - 1);
> +		SLsmg_write_char(SLSMG_LTEE_CHAR);
>   		ui_browser__gotorc(browser, end_row, column);
>   		SLsmg_draw_hline(2);
>   	}

next prev parent reply	other threads:[~2017-06-20  1:25 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-06-19  2:55 [PATCH v2 0/3] perf report: Implement visual marker for macro fusion in annotate Jin Yao
2017-06-19  2:55 ` [PATCH v2 1/3] perf util: Return arch from symbol__disassemble and save it in browser Jin Yao
2017-06-20  9:02   ` [tip:perf/core] perf annotate: Return arch from symbol__disassemble() " tip-bot for Jin Yao
2017-06-19  2:55 ` [PATCH v2 2/3] perf util: Check for fused instruction Jin Yao
2017-06-19  2:55 ` [PATCH v2 3/3] perf report: Implement visual marker for macro fusion in annotate Jin Yao
2017-06-19 17:35   ` Arnaldo Carvalho de Melo
2017-06-19 19:13     ` Arnaldo Carvalho de Melo
2017-06-20  1:25     ` Jin, Yao [this message]
2017-06-20  1:37       ` Arnaldo Carvalho de Melo
2017-06-20  1:54         ` Jin, Yao

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=13d15de3-419c-22b1-1699-fdcc766117eb@linux.intel.com \
    --to=yao.jin@linux.intel.com \
    --cc=acme@kernel.org \
    --cc=ak@linux.intel.com \
    --cc=alexander.shishkin@linux.intel.com \
    --cc=jolsa@kernel.org \
    --cc=kan.liang@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@infradead.org \
    --cc=peterz@infradead.org \
    --cc=yao.jin@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox