public inbox for linux-kernel@vger.kernel.org
From: Jeremy Fitzhardinge <jeremy@goop.org>
To: Ingo Molnar <mingo@elte.hu>
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>,
	Steven Rostedt <rostedt@goodmis.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Subject: Re: [PATCH 1/4] tracing: move __DO_TRACE out of line
Date: Fri, 17 Apr 2009 12:31:43 -0700
Message-ID: <49E8D91F.1060005@goop.org>
In-Reply-To: <20090417162326.GG8253@elte.hu>

Ingo Molnar wrote:
> I meant to suggest to Jeremy to measure the effect of this 
> out-of-lining, in terms of instruction count in the hotpath.
>   

OK, here's a comparison for trace_sched_switch, comparing inline and out 
of line tracing functions, with CONFIG_PREEMPT enabled:

The inline __DO_TRACE version of trace_sched_switch inserts 20 
instructions, assembling to 114 bytes of code in the hot path:

        cmpl    $0, __tracepoint_sched_switch+8(%rip)   #, __tracepoint_sched_switch.state
        je      .L1582  #,
        movq %gs:per_cpu__kernel_stack,%rax     # per_cpu__kernel_stack, ret__
        incl    -8124(%rax)     # <variable>.preempt_count
        movq    __tracepoint_sched_switch+16(%rip), %r12        #, it_func
        testq   %r12, %r12      # it_func
.L1603:
        je      .L1583  #,
        movq    -136(%rbp), %rdx        # next,
        movq    -144(%rbp), %rsi        # prev,
        movq    %rbx, %rdi      # rq,
        call    *(%r12) #* it_func
        addq    $8, %r12        #, it_func
        cmpq    $0, (%r12)      #,* it_func
        jmp     .L1603  #
.L1583:
        movq %gs:per_cpu__kernel_stack,%rax     # per_cpu__kernel_stack, ret__
        decl    -8124(%rax)     # <variable>.preempt_count
        movq %gs:per_cpu__kernel_stack,%rax     # per_cpu__kernel_stack, ret__
        testb   $8, -8136(%rax) #,
        je      .L1582  #,
        call    preempt_schedule        #
.L1582:


Taking __do_trace_sched_switch out of line inserts this into the hot 
path (6 instructions, 31 bytes):

        cmpl    $0, __tracepoint_sched_switch+8(%rip)   #, __tracepoint_sched_switch.state
        je      .L1748  #,
        movq    -136(%rbp), %rdx        # next,
        movq    -144(%rbp), %rsi        # prev,
        movq    %rbx, %rdi      # rq,
        call    __do_trace_sched_switch #
.L1748:


__do_trace_sched_switch is a fair bit larger, mostly due to function 
preamble frame and reg save/restore, and some unfortunate and 
unnecessary register thrashing (why not keep rdi,rsi,rdx where they 
are?).  But it isn't that much larger than the inline version: 34 
instructions, 118 bytes.  This code will also be shared among all 
instances of the tracepoint (not in this case, because sched_switch is 
unique, but other tracepoints have multiple users).

__do_trace_sched_switch:
        pushq   %rbp    #
        movq    %rsp, %rbp      #,
        pushq   %r14    #
        movq    %rdi, %r14      # rq, rq
        pushq   %r13    #
        movq    %rsi, %r13      # prev, prev
        pushq   %r12    #
        movq    %rdx, %r12      # next, next
        pushq   %rbx    #
        movq %gs:per_cpu__kernel_stack,%rax     # per_cpu__kernel_stack, ret__
        incl    -8124(%rax)     # <variable>.preempt_count
        movq    __tracepoint_sched_switch+16(%rip), %rax        #, _________p1
        testq   %rax, %rax      # _________p1
        je      .L2403  #,
        movq    %rax, %rbx      # _________p1, it_func
.L2404:
        movq    %r12, %rdx      # next,
        movq    %r13, %rsi      # prev,
        movq    %r14, %rdi      # rq,
        call    *(%rbx) #* it_func
        addq    $8, %rbx        #, it_func
        cmpq    $0, (%rbx)      #,* it_func
        jne     .L2404  #,
.L2403:
        movq %gs:per_cpu__kernel_stack,%rax     # per_cpu__kernel_stack, ret__
        decl    -8124(%rax)     # <variable>.preempt_count
        movq %gs:per_cpu__kernel_stack,%rax     # per_cpu__kernel_stack, ret__
        testb   $8, -8136(%rax) #,
        je      .L2406  #,
        call    preempt_schedule        #
.L2406:
        popq    %rbx    #
        popq    %r12    #
        popq    %r13    #
        popq    %r14    #
        leave
        ret
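
For reference, the C shape behind the two listings can be sketched
roughly as below. This is a userspace approximation under stated
assumptions, not the patch itself: struct tracepoint_sketch,
tp_sched_switch, do_trace_sched_switch and count_probe are illustrative
names, struct rq and struct task are opaque stand-ins, and the
preempt_count handling around the probe loop is left out.

```c
#include <stddef.h>

/* Opaque stand-ins for the kernel types; hypothetical, never dereferenced. */
struct rq;
struct task;

typedef void (*sched_switch_probe)(struct rq *rq, struct task *prev,
                                   struct task *next);

/* Simplified tracepoint: an enable flag plus a NULL-terminated probe array. */
struct tracepoint_sketch {
    int state;                  /* nonzero once a probe is registered */
    sched_switch_probe *funcs;  /* NULL, or probes followed by a NULL entry */
};

struct tracepoint_sketch tp_sched_switch;

/* Example probe, used only to observe that the loop runs. */
static int probe_hits;
static void count_probe(struct rq *rq, struct task *prev, struct task *next)
{
    (void)rq; (void)prev; (void)next;
    probe_hits++;
}

/*
 * Out-of-line slow path: one copy, shared by every call site.
 * It refers to its tracepoint directly rather than taking a pointer.
 * The kernel version also disables and re-enables preemption here.
 */
__attribute__((noinline))
void do_trace_sched_switch(struct rq *rq, struct task *prev,
                           struct task *next)
{
    sched_switch_probe *it = tp_sched_switch.funcs;

    if (it) {
        do {
            (*it)(rq, prev, next);
        } while (*(++it));
    }
}

/* Inline fast path: all that lands in the caller is a test and a call. */
static inline void trace_sched_switch(struct rq *rq, struct task *prev,
                                      struct task *next)
{
    if (tp_sched_switch.state)
        do_trace_sched_switch(rq, prev, next);
}
```

With state left at zero, trace_sched_switch() costs only the compare and
branch; pointing funcs at an array such as { count_probe, count_probe,
NULL } and setting state to 1 makes each call run both probes.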


So, conclusion: putting the tracepoint code out of line significantly 
reduces the hot-path code size at each tracepoint (114 bytes down to 31 
in this case, 27% of the size).  This should reduce the overhead of 
having tracing configured but not enabled.  The saving won't be as large 
for tracepoints with fewer arguments or without CONFIG_PREEMPT, but I 
chose this example because it is realistic and undeniably a hot path.  
And with pvops tracing adding 80 new events with hundreds of callsites 
around the kernel, this is really going to add up.
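
To make the arithmetic concrete, here is a small C sketch using the byte
counts measured above. It assumes one shared out-of-line function per
tracepoint and, purely for illustration, that every call site has the
same sizes as the sched_switch one; the helper names are made up.

```c
#include <assert.h>

/* Byte counts taken from the sched_switch listings in this mail. */
enum {
    INLINE_SITE_BYTES = 114,  /* inline __DO_TRACE at the call site */
    OOL_SITE_BYTES    = 31,   /* test + call left at the call site */
    OOL_FUNC_BYTES    = 118   /* the shared __do_trace_sched_switch */
};

/* Out-of-line call-site size as a percentage of the inline one,
 * rounded to the nearest integer. */
int site_size_percent(void)
{
    return (OOL_SITE_BYTES * 100 + INLINE_SITE_BYTES / 2)
           / INLINE_SITE_BYTES;
}

/* Net change in total text size, in bytes, for a tracepoint with
 * `sites` call sites; a negative result means the kernel got smaller. */
long net_text_delta(long sites)
{
    assert(sites >= 1);
    return sites * (OOL_SITE_BYTES - INLINE_SITE_BYTES) + OOL_FUNC_BYTES;
}
```

Under those assumptions a single-site tracepoint grows total text by 35
bytes (while still shrinking the hot path by 83), and the total is
already smaller from the second call site onward.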

The tradeoff is that the actual tracing function is a little larger, but 
not dramatically so.  I would expect some performance hit when the 
tracepoint is actually enabled.  This may be mitigated by increased 
icache hits when a tracepoint has multiple sites.

(BTW, I realized that we don't need to pass &__tracepoint_FOO to 
__do_trace_FOO(), since it's always going to be the same; this simplifies 
the calling convention at the callsite, and it also makes void 
tracepoints work again.)
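
A rough illustration of that last point, as a userspace sketch only:
DEFINE_TRACE_SKETCH, PROTO, ARGS, struct tp_sketch and the probes are
hypothetical names, not the kernel's macros. Because the generated
out-of-line function names its own tracepoint, its parameter list is
exactly the tracepoint's prototype, so a (void) prototype needs no
special casing.

```c
#include <stddef.h>

typedef void (*generic_fn)(void);

/* Simplified tracepoint: enable flag plus NULL-terminated probe array. */
struct tp_sketch {
    int state;
    generic_fn *funcs;
};

/* Wrappers so a comma-separated list travels as one macro argument. */
#define PROTO(...) __VA_ARGS__
#define ARGS(...)  __VA_ARGS__

/*
 * Generates the tracepoint, its out-of-line probe loop and the inline
 * test-and-call wrapper. do_trace_<name>() takes only the tracepoint's
 * own arguments; the tracepoint itself is referenced by name.
 */
#define DEFINE_TRACE_SKETCH(name, proto, args)                  \
    struct tp_sketch tp_##name;                                 \
    __attribute__((noinline)) void do_trace_##name(proto)       \
    {                                                           \
        generic_fn *it = tp_##name.funcs;                       \
        if (it) {                                               \
            do {                                                \
                ((void (*)(proto))(*it))(args);                 \
            } while (*(++it));                                  \
        }                                                       \
    }                                                           \
    static inline void trace_##name(proto)                      \
    {                                                           \
        if (tp_##name.state)                                    \
            do_trace_##name(args);                              \
    }

/* Example probes, one without arguments and one with two. */
static int idle_hits;
static void idle_probe(void) { idle_hits++; }

static int sum;
static void add_probe(int a, int b) { sum += a + b; }

DEFINE_TRACE_SKETCH(idle, void, ARGS())
DEFINE_TRACE_SKETCH(add, PROTO(int a, int b), ARGS(a, b))
```

Registering a probe here just means pointing tp_idle.funcs at an array
like { (generic_fn)idle_probe, NULL } and setting tp_idle.state; after
that trace_idle() and trace_add(2, 3) both go through the same generated
code path.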

    J
