From: Cong Wang <xiyou.wangcong@gmail.com>
To: Yanmin Zhang <yanmin_zhang@linux.intel.com>
Cc: "Tu, Xiaobing" <xiaobing.tu@intel.com>,
	Lin Ming <mlin@ss.pku.edu.cn>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"mingo@elte.hu" <mingo@elte.hu>,
	"rusty@rustcorp.com.au" <rusty@rustcorp.com.au>,
	"a.p.zijlstra@chello.nl" <a.p.zijlstra@chello.nl>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"rostedt@goodmis.org" <rostedt@goodmis.org>,
	"Zuo, Jiao" <jiao.zuo@intel.com>
Subject: Re: [RFC 1/2] kernel patch for dump user space stack tool
Date: Thu, 19 Apr 2012 14:13:19 +0800
Message-ID: <4F8FACFF.9070107@gmail.com>
In-Reply-To: <1334812653.14538.29.camel@ymzhang.sh.intel.com>

On 04/19/2012 01:17 PM, Yanmin Zhang wrote:
> On Thu, 2012-04-19 at 11:50 +0800, Cong Wang wrote:
>> On 04/17/2012 10:37 PM, Tu, Xiaobing wrote:
>>> Resending the patch because the log was too long on a single line.
>>>
>>> From: xiaobing tu<xiaobing.tu@intel.com>
>>>
>>> Here is the kernel patch for this tool. The idea is to output the user-space stack call chain from
>>> /proc/xxx/stack. Currently, /proc/xxx/stack only outputs the kernel stack call chain. We extend
>>> it to output the user-space call chain in hex format.
>>>
>>
>> Can you explain why we still need this when we already have pstack?
> Cong,
>
> Sorry for replying so late. Xiaobing told me you sent him email and I
> didn't receive the 1st one you sent out.


Judging by the length of your reply compared with the patch description,
a lot of information is missing from the patch description.

>
> I tried pstack and it does work. It means developers have wanted
> such a tool for a long time.
>
> Although I have not checked the source code of pstack (sorry, I'm busy debugging
> many critical issues), I think pstack is based on the ptrace interface, which means:
> 1) It needs to trap into the kernel many times to collect the call frames of one
> task.
> 2) It needs to send a signal to the ptraced process to stop it. Such behavior
> might have some impact if the ptraced process also processes many signals.
> 3) The data parsing to get symbols might not be split from data collection.
> I mean, it collects the call frames of one process, then parses them; then collects the 2nd
> task's. If there are many processes, it cannot collect all the data at the monitored
> point in time.


Yet another one who wants to "fix" ptrace. ;-)

>
> Why did we develop the tool? The original requirement comes from real work.
> We are enabling Android on Medfield. One typical error on Android is ANR.
> When a process doesn't respond within 5 seconds, Android reports an ANR error
> and dumps the Java call stack. However, it cannot dump userspace libs (such as
> bionic, written in C or C++). In addition, Android only dumps the stack of
> the non-responding process. It doesn't dump the stacks of others. As binder is the basic
> framework in Android, processes communicate through binder in a client/server model.
> When one process is not responding quickly, maybe another process is blocking it. We
> need to dump that process's status.
>
> Many teams complained that it's hard to debug such ANR issues, especially the ones that
> are triggered during MTBF testing. Sometimes, an ANR happens after MTBF testing has run
> for one week. Developers have asked us to implement such a tool over and over again.
>
> Besides ANR, sometimes the system might not respond to any user operation. Usually,
> the kernel or firmware would reset the system. At that time, we also need to get the call
> chains of all the user-space processes before the system is reset.


I am not familiar with Android at all, so a quick question: if this is
only for Android, why do you introduce it for everyone? IOW, why not provide a
Kconfig option?

BTW, I am sure you need to put the above paragraphs into your patch 
description, to make it clear why the patch is needed.

>
> With our tool,
> 1) We can collect the hex-format call chain data and /proc/XXX/maps
> of all the processes quickly, then parse them either after rebooting or
> after the issue is reported. It can capture the scene just at the point in time
> when the error happens. Our experiments show the tool can collect the data
> of all processes within 200ms.
> 2) The new tool won't stop the processes and has less impact on them.
> Considering a scenario of performance bottleneck investigation, statistics collection
> shouldn't have a big impact on running processes.
> 3) It supports both i386 and x86-64. I tried pstack and it doesn't work
> with x86-64.
> 4) It follows the /proc/XXX/stack interface and is easy to use.
>
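
The offline parsing step in point 1) could be sketched roughly as below. This is only an illustration, not the code from the patch: it assumes the saved call chain is a list of hex addresses and that the saved maps file uses the standard /proc/PID/maps layout. The names Mapping, parse_maps and resolve are hypothetical.

```python
import bisect
from typing import List, NamedTuple, Optional, Tuple


class Mapping(NamedTuple):
    start: int   # first address of the mapping
    end: int     # one past the last address (exclusive)
    perms: str   # e.g. "r-xp"
    offset: int  # file offset the mapping starts at
    path: str    # backing file, "[stack]", or "" for anonymous memory


def parse_maps(text: str) -> List[Mapping]:
    """Parse the saved contents of a /proc/PID/maps file."""
    mappings = []
    for line in text.splitlines():
        # Fields: address range, perms, offset, dev, inode, optional path.
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue
        start_s, end_s = fields[0].split("-")
        path = fields[5].strip() if len(fields) > 5 else ""
        mappings.append(
            Mapping(int(start_s, 16), int(end_s, 16), fields[1],
                    int(fields[2], 16), path)
        )
    mappings.sort()
    return mappings


def resolve(addr: int, mappings: List[Mapping]) -> Optional[Tuple[str, int]]:
    """Return (path, file offset) for addr, or None if it is unmapped."""
    starts = [m.start for m in mappings]
    i = bisect.bisect_right(starts, addr) - 1
    if i < 0:
        return None
    m = mappings[i]
    if addr >= m.end:
        return None
    return m.path, addr - m.start + m.offset


if __name__ == "__main__":
    # Hypothetical sample data; a real dump would come from the target device.
    sample_maps = (
        "00400000-0040b000 r-xp 00000000 08:01 1234 /bin/cat\n"
        "7f0000001000-7f0000005000 r-xp 00002000 08:01 5678 /lib/libc.so\n"
    )
    maps = parse_maps(sample_maps)
    for hex_addr in ["400123", "7f0000001010", "10"]:
        print(hex_addr, resolve(int(hex_addr, 16), maps))
```

The resulting (path, file offset) pairs could then be passed to a symbolizer such as addr2line on a machine that has the matching binaries. That split between collection and parsing is what would allow the parsing to happen after a reboot.
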
> Besides this tool, we are considering extending it to collect the user-space
> call chain of the current process from the kernel when the kernel detects some other
> abnormal behavior.
>

In my previous reply, I ran 'pstack' on my x86-64 machine, so I don't
understand why you said it doesn't work with x86-64. I guess pstack
supports more than just x86, as ptrace is available on other architectures too.

Thanks.
