public inbox for linux-kernel@vger.kernel.org
From: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
To: "Jon Medhurst (Tixy)" <tixy@linaro.org>
Cc: Wang Nan <wangnan0@huawei.com>,
	linux@arm.linux.org.uk, will.deacon@arm.com,
	taras.kondratiuk@linaro.org, ben.dooks@codethink.co.uk,
	cl@linux.com, rabin@rab.in, davem@davemloft.net,
	lizefan@huawei.com, linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org
Subject: Re: Re: [PATCH v10 2/2] ARM: kprobes: enable OPTPROBES for ARM 32
Date: Fri, 28 Nov 2014 12:12:42 +0900
Message-ID: <5477E82A.3020208@hitachi.com>
In-Reply-To: <1417099007.2041.6.camel@linaro.org>

(2014/11/27 23:36), Jon Medhurst (Tixy) wrote:
> On Fri, 2014-11-21 at 14:35 +0800, Wang Nan wrote:
>> This patch introduces kprobeopt for ARM 32.
> 
> If I've understood things correctly, this is a feature which inserts
> probes by using a branch instruction to some trampoline code rather than
> using an undefined instruction as a breakpoint. That way we avoid the
> overhead of processing the exception and it is this performance
> improvement which is the main/only reason for implementing it?
> 
> If so, I thought it good to see what kind of improvement we get by
> running the micro benchmarks in the kprobes test code. On an A7/A15
> big.LITTLE vexpress board the approximate figures I get are 0.3us for
> optimised probe, 1us for un-optimised, so a three times performance
> improvement. This is with an empty probe pre-handler and no post
> handler, so with a more realistic usecase, the relative improvement we
> get from optimisation would be less.

Indeed. I think we should measure performance with ftrace, since that
is the most realistic use case. On x86 we see similar numbers, and
ftrace itself takes 0.3-0.4us to record an event, so I guess an
optimized probe ends up about 2 times faster. (Of course this depends
on the SoC, because memory bandwidth is the key to event-recording
performance.)
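
To make the "about 2 times" estimate concrete, here is a rough sketch of
the arithmetic. It assumes the probe cost and the event-recording cost
simply add up per hit, which is a simplification; probe_speedup() is a
hypothetical helper written for this illustration, not kernel code.

```c
#include <stdio.h>

/*
 * Speed-up of an optimized probe over an unoptimized one when every hit
 * also pays a fixed handler cost. All times are in microseconds.
 * Assumption: the two costs are independent and add linearly.
 */
static double probe_speedup(double unopt_us, double opt_us, double handler_us)
{
	return (unopt_us + handler_us) / (opt_us + handler_us);
}

/* Print the ratio for one handler cost, using the figures from this thread. */
static void print_speedup(double handler_us)
{
	printf("handler %.1fus: %.2fx\n", handler_us,
	       probe_speedup(1.0, 0.3, handler_us));
}
```

Calling print_speedup() with 0.0 reproduces the empty-handler case (a bit
over 3 times), while 0.3 and 0.4 give ratios close to 2.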


> I thought it good to see what sort of benefits this code achieves,
> especially as it could grow quite complex over time, and the cost of
> that versus the benefit should be considered.

I don't think it's so complex. It's actually cleanly separated.
However, the ARM tree should have an arch/arm/kernel/kprobe/ directory,
since there are too many kprobe-related files under arch/arm/kernel/ ...


>>
>> Limitations:
>>  - Currently only kernel compiled with ARM ISA is supported.
> 
> Supporting Thumb will be very difficult because I don't believe that
> putting a branch into an IT block could be made to work, and you can't
> feasibly know if an instruction is in an IT block other than by first
> using something like the breakpoint probe method and then when that is
> hit examine the IT flags to see if they're set. If they aren't you could
> then change the probe to an optimised probe. Is transforming the probe
> type like that currently supported by the generic kprobes code?

The optprobe framework optimizes probes transparently. If a probe cannot
be optimized, the framework simply leaves it as it is.


> Also, the Thumb branch instruction can only jump half as far as the ARM
> mode one. And being 32-bits when a lot of instructions people will want
> to probe are 16-bits will be an additional problem, similar as
> identified below for ARM instructions...
> 
> 
>>
>>  - Offset between probe point and optinsn slot must not be larger than
>>    32MiB.
> 
> 
> I see that elsewhere [1] people are working on supporting loading kernel
> modules at locations that are out of the range of a branch instruction,
> I guess because with multi-platform kernels and general code bloat
> kernels are getting too big. The same reasons would impact the usability
> of optimized kprobes as well if they're restricted to the range of a
> single branch instruction.
> 
> [1] http://lists.infradead.org/pipermail/linux-arm-kernel/2014-November/305539.html
> 
> 
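
For reference, the 32MiB figure follows from the ARM 'b' encoding: a
signed 24-bit word offset, taken relative to the instruction address
plus 8. Below is a minimal sketch of the range check and the encoding.
The names arm_branch_in_range() and arm_encode_b() are made up for this
illustration and are not the helpers used by the patch; returning 0 for
"out of range" is likewise just a convention of the sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* In ARM (A32) state, PC reads as the instruction address plus 8. */
#define ARM_PC_BIAS 8

/* True if an unconditional 'b' placed at 'from' can reach 'to'. */
static bool arm_branch_in_range(uint32_t from, uint32_t to)
{
	int64_t off = (int64_t)to - ((int64_t)from + ARM_PC_BIAS);

	/* imm24 is a signed word offset: -2^25 .. 2^25 - 4 bytes. */
	return (off & 3) == 0 &&
	       off >= -(INT64_C(1) << 25) &&
	       off <= (INT64_C(1) << 25) - 4;
}

/* Encode 'b to' for an instruction at 'from'; 0 means out of range. */
static uint32_t arm_encode_b(uint32_t from, uint32_t to)
{
	int64_t off = (int64_t)to - ((int64_t)from + ARM_PC_BIAS);

	if (!arm_branch_in_range(from, to))
		return 0;

	/* cond = AL (0xE), opcode bits 101, L = 0, then imm24 = off / 4. */
	return 0xEA000000u | (((uint32_t)off >> 2) & 0x00FFFFFFu);
}
```

A probe whose optinsn slot fails arm_branch_in_range() could not be
reached with a single branch, which is the restriction discussed above.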
>>  Masami Hiramatsu suggests replacing 2 words, it will make
>>    things complex. Further patch can make such optimization.
> 
> I'm wondering how can we replace 2 words if we can't determine if the
> second word is the target of a branch instruction?

On x86, we already have an instruction decoder for finding branch
targets :). But yes, it can be impossible on other architectures if
the code makes heavy use of indirect branches.

> E.g. if we had
> 
> 		b	after_probe
> 		...
> probe_me:	mov	r2, #0
> after_probe:	ldr	r0, [r1]
> 
> and we inserted a two word probe at probe_me, then the branch to
> after_probe would be to the second half of that 2 word probe. Guess that
> could be worked around by ensuring the 2nd word is an invalid
> instruction and trapping that case then emulating after_probe like we do
> unoptimised probes. This assumes that we can come up with an
> encoding for a 2 word 'long branch' that was suitable. (For Thumb, I
> suspect that we would need at least 3 16-bit instructions to achieve
> that).
> 
> As the commit message says "will make things complex" and I begin to
> wonder if the extra complexity would be worth the benefits. (Considering
> that the resulting optimised probe would only be around twice as fast.)
> 
> 
>>
>> Kprobe opt on ARM is relatively simpler than kprobe opt on x86 because
>> ARM instruction is always 4 bytes aligned and 4 bytes long. This patch
>> replace probed instruction by a 'b', branch to trampoline code and then
>> calls optimized_callback(). optimized_callback() calls opt_pre_handler()
>> to execute kprobe handler. It also emulate/simulate replaced instruction.
>>
>> When unregistering kprobe, the deferred manner of unoptimizer may leave
>> branch instruction before optimizer is called. Different from x86_64,
>> which only copy the probed insn after optprobe_template_end and
>> reexecute them, this patch call singlestep to emulate/simulate the insn
>> directly. Further patch can optimize this behavior.
>>
>> Signed-off-by: Wang Nan <wangnan0@huawei.com>
>> Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
>> Cc: Jon Medhurst (Tixy) <tixy@linaro.org>
>> Cc: Russell King - ARM Linux <linux@arm.linux.org.uk>
>> Cc: Will Deacon <will.deacon@arm.com>
>>
>> ---
> 
> I initially had some trouble testing this. I tried running the kprobes
> test code with some printf's added to the code and it seems that only
> very rarely are optimised probes actually executed. This turned out to
> be due to the optimization being run as a background task after a delay.
> So I ended up hacking kernel/kprobes.c to force some calls to
> wait_for_kprobe_optimizer(). It would be nice to have the test code to
> robustly cover both optimised and unoptimised cases but that would need
> some new exported functions from the generic kprobes code, not sure what
> people think of that idea?

Hm, did you use ftrace's kprobe events?
You can add kprobes via /sys/kernel/debug/tracing/kprobe_events and
see which kprobes are optimized via /sys/kernel/debug/kprobes/list.

For more information, please refer to
 Documentation/trace/kprobetrace.txt
 Documentation/kprobes.txt
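
As a rough illustration of checking that list from a test program, the
sketch below counts entries marked as optimized. It assumes each line of
/sys/kernel/debug/kprobes/list describes one probe and ends with status
tags such as [OPTIMIZED], as described in Documentation/kprobes.txt;
both function names are hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* True if one line of the kprobes list carries the [OPTIMIZED] tag. */
static bool kprobe_line_is_optimized(const char *line)
{
	return strstr(line, "[OPTIMIZED]") != NULL;
}

/*
 * Count optimized probes in an already opened list file.
 * '*total' receives the number of lines read. Assumption: every
 * line is shorter than the 256-byte buffer.
 */
static int count_optimized(FILE *fp, int *total)
{
	char line[256];
	int optimized = 0;

	*total = 0;
	while (fgets(line, sizeof(line), fp)) {
		(*total)++;
		if (kprobe_line_is_optimized(line))
			optimized++;
	}
	return optimized;
}
```

A caller would fopen() the list file (debugfs must be mounted, and
reading it normally needs root) and pass the handle to count_optimized().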

Thank you,



-- 
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Research Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com



