From: Yunlong Song <yunlong.song@huawei.com>
To: <a.p.zijlstra@chello.nl>, <paulus@samba.org>, <mingo@redhat.com>,
"Arnaldo Carvalho de Melo" <acme@kernel.org>
Cc: <linux-kernel@vger.kernel.org>, <wangnan0@huawei.com>,
<dsahern@gmail.com>
Subject: Re: [Question] How does perf still record the stack of a specified pid even when that process is interrupted and CPU is scheduled to other process
Date: Sat, 25 Apr 2015 22:03:44 +0800 [thread overview]
Message-ID: <553B9EC0.9060001@huawei.com> (raw)
In-Reply-To: <553A49CF.1020905@huawei.com>
On 2015/4/24 21:49, Yunlong Song wrote:
> On 2015/4/24 21:31, Yunlong Song wrote:
>> [Profiling Background]
>> Now we are profiling the performance of ext4 and f2fs on an eMMC card with iozone,
>> we find a case that ext4 is better than f2fs in random write under the test of
>> "iozone -s 262144 -r 64 -i 0 -i 2". We want to analyze the I/O delay of the two
>> file systems. We have got a conclusion that 1% of sys_write takes up 60% time of
>> the overall sys_write (262144/64=4096). We want to find out the call stack during
>> this specific 1% sys_write. Our idea is to record the stack in a certain time period
>> and since the specific 1% case takes up 60% time, the total number of records of its
>> stack should also takes up 60% of the total records, then we can recognize those stacks
>> and figure out what the I/O stack of f2fs is doing in the 1% case.
>>
>> [Profiling Problem]
>>
>> Although perf can record the events (with call stack) of a specified pid, e.g. using
>> "perf record -g iozone -s 262144 -r 64 -i 0 -i 2". But we find iozone is interrupted
>> and the CPU is scheduled to other process. As a result, perf will not record any events
>> of iozone until iozone's context is recovered and the CPU is scheduled to continue
>> processing the sys_write of iozone.
>
>> This obeys our initial idea which is described in [Profiling Background],
>
> This "disobeys" our initial idea which is described in [Profiling Background],
>
> since we cannot recognize the call stacks of the specific 1% case
>> by using the ratio of the record number.
>>
>> [Alternative Solution without Perf]
>> We can use /proc/#pid/stack to record the stack in a certain period (e.g. 1ms) of iozone,
>> no matter whether iozone is interrupted or not. However, we have not taken a deep sight
>> into this, since we want to use perf to do this kind of thing.
>>
>> [Question about Perf]
>> So we have a question that "How does perf still record the stack of a specified pid even
>> when that process is interrupted and CPU is scheduled to other process?"
>>
>
>
Here is another example:
$ cat test.c
void func_loop(void)
{
int i, sum = 0;
for (i = 0; i < 100000; i++)
sum += 1;
}
void func_sleep(void)
{
sleep(1);
}
int main(void)
{
while (1) {
func_loop();
func_sleep();
}
return 0;
}
$ gcc test.c -o test -O0
$ ./test
$ ps aux | grep test
Yunlong+ 133798 0.0 0.0 4056 348 pts/7 S+ 21:52 0:00 ./test
$ perf record -g -p 133798 sleep 5
$ perf report -g --stdio
# To display the perf.data header info, please use --header/--header-only options.
#
# Samples: 16 of event 'cycles'
# Event count (approx.): 2439331
#
# Children Self Command Shared Object Symbol
# ........ ........ ....... ................. ...............................
#
79.46% 76.48% test test [.] func_loop
|
---func_loop
|
|--97.07%-- main
| __libc_start_main
--2.93%-- [...]
77.14% 0.00% test libc-2.19.so [.] __libc_start_main
|
---__libc_start_main
77.14% 0.00% test test [.] main
|
---main
__libc_start_main
20.54% 0.00% test [unknown] [.] 0000000000000000
|
---0
20.54% 0.00% test libc-2.19.so [.] __GI___libc_nanosleep
|
---__GI___libc_nanosleep
0
20.54% 0.00% test [unknown] [k] 0x00007f2fa07ef6b0
|
---0x7f2fa07ef6b0
__GI___libc_nanosleep
0
20.54% 0.00% test [kernel.kallsyms] [k] system_call_fastpath
|
---system_call_fastpath
0x7f2fa07ef6b0
__GI___libc_nanosleep
As shown above, the func_loop is regarded as the main operation and its call stack
is showed on the top of perf report. However, we know that func_sleep is really the
main operation from the view of time duration indeed. And if we want to its actual
call stack in the real time, we can see the result as following:
$ while true ; do cat /proc/133798/stack ; sleep 0.1 ; done
[<ffffffff8107aed3>] hrtimer_nanosleep+0xa3/0x160
[<ffffffff8107afed>] SyS_nanosleep+0x5d/0x80
[<ffffffff81519329>] system_call_fastpath+0x16/0x1b
[<00007f2fa07ef6b0>] 0x7f2fa07ef6b0
[<ffffffffffffffff>] 0xffffffffffffffff
[<ffffffff8107aed3>] hrtimer_nanosleep+0xa3/0x160
[<ffffffff8107afed>] SyS_nanosleep+0x5d/0x80
[<ffffffff81519329>] system_call_fastpath+0x16/0x1b
[<00007f2fa07ef6b0>] 0x7f2fa07ef6b0
[<ffffffffffffffff>] 0xffffffffffffffff
[<ffffffff8107aed3>] hrtimer_nanosleep+0xa3/0x160
[<ffffffff8107afed>] SyS_nanosleep+0x5d/0x80
[<ffffffff81519329>] system_call_fastpath+0x16/0x1b
[<00007f2fa07ef6b0>] 0x7f2fa07ef6b0
[<ffffffffffffffff>] 0xffffffffffffffff
[<ffffffff8107aed3>] hrtimer_nanosleep+0xa3/0x160
[<ffffffff8107afed>] SyS_nanosleep+0x5d/0x80
[<ffffffff81519329>] system_call_fastpath+0x16/0x1b
[<00007f2fa07ef6b0>] 0x7f2fa07ef6b0
[<ffffffffffffffff>] 0xffffffffffffffff
[<ffffffff8107aed3>] hrtimer_nanosleep+0xa3/0x160
[<ffffffff8107afed>] SyS_nanosleep+0x5d/0x80
[<ffffffff81519329>] system_call_fastpath+0x16/0x1b
[<00007f2fa07ef6b0>] 0x7f2fa07ef6b0
[<ffffffffffffffff>] 0xffffffffffffffff
[<ffffffff8107aed3>] hrtimer_nanosleep+0xa3/0x160
[<ffffffff8107afed>] SyS_nanosleep+0x5d/0x80
[<ffffffff81519329>] system_call_fastpath+0x16/0x1b
[<00007f2fa07ef6b0>] 0x7f2fa07ef6b0
[<ffffffffffffffff>] 0xffffffffffffffff
[<ffffffff8107aed3>] hrtimer_nanosleep+0xa3/0x160
[<ffffffff8107afed>] SyS_nanosleep+0x5d/0x80
[<ffffffff81519329>] system_call_fastpath+0x16/0x1b
[<00007f2fa07ef6b0>] 0x7f2fa07ef6b0
[<ffffffffffffffff>] 0xffffffffffffffff
[<ffffffff8107aed3>] hrtimer_nanosleep+0xa3/0x160
[<ffffffff8107afed>] SyS_nanosleep+0x5d/0x80
[<ffffffff81519329>] system_call_fastpath+0x16/0x1b
[<00007f2fa07ef6b0>] 0x7f2fa07ef6b0
[<ffffffffffffffff>] 0xffffffffffffffff
As shown above, func_sleep is regarded as the main operation from the view of time and
its call stack is printed in real time.
--
Thanks,
Yunlong Song
next prev parent reply other threads:[~2015-04-25 14:04 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-04-24 13:31 [Question] How does perf still record the stack of a specified pid even when that process is interrupted and CPU is scheduled to other process Yunlong Song
2015-04-24 13:49 ` Yunlong Song
2015-04-25 14:03 ` Yunlong Song [this message]
2015-04-24 13:49 ` David Ahern
2015-04-24 13:56 ` Yunlong Song
2015-04-24 13:58 ` David Ahern
2015-04-25 14:05 ` Yunlong Song
2015-04-25 15:53 ` David Ahern
2015-05-05 21:53 ` Rabin Vincent
2015-05-05 22:24 ` David Ahern
2015-05-06 4:13 ` Yunlong Song
2015-05-06 4:10 ` Yunlong Song
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=553B9EC0.9060001@huawei.com \
--to=yunlong.song@huawei.com \
--cc=a.p.zijlstra@chello.nl \
--cc=acme@kernel.org \
--cc=dsahern@gmail.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@redhat.com \
--cc=paulus@samba.org \
--cc=wangnan0@huawei.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox