linuxppc-dev.lists.ozlabs.org archive mirror
From: Anju T Sudhakar <anju@linux.vnet.ibm.com>
To: Balbir Singh <bsingharora@gmail.com>,
	linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org
Cc: ananth@in.ibm.com, mahesh@linux.vnet.ibm.com, paulus@samba.org,
	mhiramat@kernel.org, naveen.n.rao@linux.vnet.ibm.com,
	srikar@linux.vnet.ibm.com
Subject: Re: [PATCH V2 0/4] OPTPROBES for powerpc
Date: Fri, 16 Dec 2016 22:50:51 +0530
Message-ID: <5454f661-f33a-9d0c-6e18-deaf7687db0b@linux.vnet.ibm.com>
In-Reply-To: <fd268944-5f8e-9815-6fef-e7d9d5191044@gmail.com>


Hi Balbir,



On Friday 16 December 2016 08:16 PM, Balbir Singh wrote:
>
> On 15/12/16 03:18, Anju T Sudhakar wrote:
>> This is the V2 patchset of the kprobes jump optimization
>> (a.k.a. OPTPROBES) for powerpc. Kprobes is an indispensable tool
>> for kernel developers, so improving its performance is
>> important.
>>
>> Currently kprobes inserts a trap instruction to probe a running kernel.
>> Jump optimization allows kprobes to replace the trap with a branch,
>> reducing the probe overhead drastically.
>>
>> In this series, conditional branch instructions are not considered for
>> optimization as they have to be assessed carefully in SMP systems.
>>
>> The kprobe placed on kretprobe_trampoline at boot time is also
>> optimized in this series. Patch 4/4 implements this.
>>
>> The first two patches can go independently of the series. The helper
>> functions in these patches are invoked in patch 3/4.
>>
>> Performance:
>> ============
>> An optimized kprobe on powerpc is 1.05 to 4.7 times faster than an
>> unoptimized kprobe.
>>   
>> Example:
>>   
>> Placed a probe at offset 0x50 in _do_fork().
>> *Time Diff is the difference in time between just before hitting the
>> probe and just after the probed instruction. mftb() is employed in
>> kernel/fork.c for this purpose.
>>   
>> # echo 0 > /proc/sys/debug/kprobes-optimization
>> Kprobes globally unoptimized
>>   [  233.607120] Time Diff = 0x1f0
>>   [  233.608273] Time Diff = 0x1ee
>>   [  233.609228] Time Diff = 0x203
>>   [  233.610400] Time Diff = 0x1ec
>>   [  233.611335] Time Diff = 0x200
>>   [  233.612552] Time Diff = 0x1f0
>>   [  233.613386] Time Diff = 0x1ee
>>   [  233.614547] Time Diff = 0x212
>>   [  233.615570] Time Diff = 0x206
>>   [  233.616819] Time Diff = 0x1f3
>>   [  233.617773] Time Diff = 0x1ec
>>   [  233.618944] Time Diff = 0x1fb
>>   [  233.619879] Time Diff = 0x1f0
>>   [  233.621066] Time Diff = 0x1f9
>>   [  233.621999] Time Diff = 0x283
>>   [  233.623281] Time Diff = 0x24d
>>   [  233.624172] Time Diff = 0x1ea
>>   [  233.625381] Time Diff = 0x1f0
>>   [  233.626358] Time Diff = 0x200
>>   [  233.627572] Time Diff = 0x1ed
>>   
>> # echo 1 > /proc/sys/debug/kprobes-optimization
>> Kprobes globally optimized
>>   [   70.797075] Time Diff = 0x103
>>   [   70.799102] Time Diff = 0x181
>>   [   70.801861] Time Diff = 0x15e
>>   [   70.803466] Time Diff = 0xf0
>>   [   70.804348] Time Diff = 0xd0
>>   [   70.805653] Time Diff = 0xad
>>   [   70.806477] Time Diff = 0xe0
>>   [   70.807725] Time Diff = 0xbe
>>   [   70.808541] Time Diff = 0xc3
>>   [   70.810191] Time Diff = 0xc7
>>   [   70.811007] Time Diff = 0xc0
>>   [   70.812629] Time Diff = 0xc0
>>   [   70.813640] Time Diff = 0xda
>>   [   70.814915] Time Diff = 0xbb
>>   [   70.815726] Time Diff = 0xc4
>>   [   70.816955] Time Diff = 0xc0
>>   [   70.817778] Time Diff = 0xcd
>>   [   70.818999] Time Diff = 0xcd
>>   [   70.820099] Time Diff = 0xcb
>>   [   70.821333] Time Diff = 0xf0
>>
>> Implementation:
>> ===================
>>   
>> The trap instruction is replaced by a branch to a detour buffer. To address
>> the limited reach of the branch instruction in the Power architecture, the
>> detour buffer slot is allocated from a reserved area. This ensures that the
>> branch is within the ± 32 MB range. The current kprobes insn caches allocate
>> memory for insn slots with module_alloc(). This will always be beyond the
>> ± 32 MB range.
>>   
> The paragraph is a little confusing. We need the detour buffer to be within
> +-32 MB, but then you say we always get memory from module_alloc() beyond
> 32MB.

The last two lines of the paragraph describe the *current* method that
regular kprobes use to allocate instruction slots. In our case we can't
use module_alloc(), since there is no guarantee that the slot it
allocates will be within the +/- 32MB range.
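
As a rough illustration of the range constraint, here is a userspace
sketch (not the code from this series; both function names are made up
for the example). It assumes the usual powerpc I-form relative branch,
whose 24-bit word offset is sign-extended, so the reachable byte offsets
are word-aligned and lie in [-0x2000000, 0x1fffffc]:

```c
#include <stdbool.h>

/*
 * Sketch: can a powerpc relative branch (b/bl) encode this byte offset?
 * The offset must be a multiple of 4 and fit in the signed 26-bit range
 * [-0x2000000, 0x1fffffc], i.e. roughly +/- 32MB.
 */
static bool offset_in_rel_branch_range(long offset)
{
	return offset >= -0x2000000L &&
	       offset <= 0x1fffffcL &&
	       !(offset & 0x3);
}

/* Hypothetical helper: is 'target' reachable from 'from' with one branch? */
static bool branch_reaches(unsigned long from, unsigned long target)
{
	return offset_in_rel_branch_range((long)(target - from));
}
```

A slot handed out by a general-purpose allocator can land anywhere
relative to the probed address, so a check like branch_reaches() may
fail for it; a slot taken from a reserved area close to kernel text is
what makes the check pass.
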
>> The detour buffer contains a call to optimized_callback(), which in turn
>> calls the pre_handler(). Once the pre-handler has run, the original
>> instruction is emulated from the detour buffer itself. The detour
>> buffer is also equipped with a branch back to the normal flow after the
>> probed instruction is emulated.
> Does the branch itself use registers that need to be saved? I presume
> we are going to rely on the +-32MB, what are the guarantees of success
> of such a mechanism?

To branch back to the next instruction after the kprobe's pre-handler
has run, we place the branch instruction in the detour buffer itself.
Hence we don't have to clobber any registers after restoring them.
Before optimizing the kprobe, we make sure that both the 'branch to
detour buffer' and the 'branch back from detour buffer' are within the
+/- 32MB range. This ensures that the optimized kprobe works.
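
To make the two checks concrete, a minimal userspace sketch follows. It
is only an illustration: can_optimize(), its parameters, and INSN_SIZE
are hypothetical names, and the range test assumes the usual powerpc
relative branch limits of [-0x2000000, 0x1fffffc] bytes, word-aligned.

```c
#include <stdbool.h>

#define INSN_SIZE 4UL	/* powerpc instructions are 4 bytes wide */

/* Can a relative branch encode this byte offset (about +/- 32MB)? */
static bool in_branch_range(long offset)
{
	return offset >= -0x2000000L &&
	       offset <= 0x1fffffcL &&
	       !(offset & 0x3);
}

/*
 * probe_addr: address of the probed instruction
 * buf_addr:   start of the detour buffer slot
 * ret_off:    byte offset, inside the slot, of the branch-back instruction
 *
 * Both branches are checked before the probe is optimized; if either
 * one is out of range, the probe stays a regular trap-based kprobe.
 */
static bool can_optimize(unsigned long probe_addr, unsigned long buf_addr,
			 unsigned long ret_off)
{
	/* branch from the probe point into the detour buffer */
	long to_buf = (long)(buf_addr - probe_addr);
	/* branch from the end of the buffer back to the next instruction */
	long back = (long)((probe_addr + INSN_SIZE) - (buf_addr + ret_off));

	return in_branch_range(to_buf) && in_branch_range(back);
}
```

Since the branch back is an ordinary relative branch whose target is
encoded in the instruction, it needs no scratch register.
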


Thanks,
Anju

>
> Balbir Singh.
>


