* Re: [merged] proctxt-update-kernel-filesystem-proctxt-documentation.patch removed from -mm tree
2009-06-24 7:35 ` Eric W. Biederman
@ 2009-06-24 9:33 ` Stefani Seibold
2009-06-24 15:30 ` Andrew Morton
2009-06-24 12:03 ` [patch 2/2] procfs: provide stack information for threads V0.9 Stefani Seibold
` (2 subsequent siblings)
3 siblings, 1 reply; 16+ messages in thread
From: Stefani Seibold @ 2009-06-24 9:33 UTC (permalink / raw)
To: Eric W. Biederman, Andrew Morton
Cc: Alexey Dobriyan, linux-kernel, Peter Zijlstra, Ingo Molnar
On Wednesday, 24.06.2009 at 00:35 -0700, Eric W. Biederman wrote:
> Andrew Morton <akpm@linux-foundation.org> writes:
>
> > On Wed, 24 Jun 2009 08:45:03 +0200 Stefani Seibold <stefani@seibold.net> wrote:
> >
> >> On Tuesday, 23.06.2009 at 23:32 -0700, Andrew Morton wrote:
> >> > On Wed, 24 Jun 2009 08:20:44 +0200 Stefani Seibold <stefani@seibold.net> wrote:
> >> >
> >> > > what is with the associated
> >> > > procfs-provide-stack-information-for-threads-v08.patch
> >> > > patch?
> >> > >
> >> > > There was no real objections against this patch, so why not merge it for
> >> > > 2.6.31?
> >> >
> >> > Alexey pointed out that it doesn't actually work.
> >>
> >> That is not true... it works. With my patch the kernel does exactly know
> >> where the thread stack is and therefor it is easy to determinate the
> >> associated map.
>
> Usually yes, but not in all cases.
Which cases? The only way I know of is to set the stack pointer to an
arbitrary place in user space, and that is not a common use case.
>
>
> > On Tue, 16 Jun 2009 02:33:33 +0400 Alexey Dobriyan <adobriyan@gmail.com> wrote:
> >
> >> On Mon, Jun 15, 2009 at 03:02:05PM -0700, akpm@linux-foundation.org wrote:
> >> > procfs-provide-stack-information-for-threads-v08.patch
> >> > --- a/fs/proc/array.c~procfs-provide-stack-information-for-threads-v08
> >>
> >> > +++ a/fs/proc/array.c
> >> > @@ -321,6 +321,54 @@ static inline void task_context_switch_c
> >> > p->nivcsw);
> >> > }
> >> >
> >> > +static inline unsigned long get_stack_usage_in_bytes(struct vm_area_struct *vma,
> >> > + struct task_struct *p)
> >> > +{
> >> > + unsigned long i;
> >> > + struct page *page;
> >> > + unsigned long stkpage;
> >> > +
> >> > + stkpage = KSTK_ESP(p) & PAGE_MASK;
> >> > +
> >> > +#ifdef CONFIG_STACK_GROWSUP
> >> > + for (i = vma->vm_end; i-PAGE_SIZE > stkpage; i -= PAGE_SIZE) {
> >> > +
> >> > + page = follow_page(vma, i-PAGE_SIZE, 0);
> >>
> >> How can this work?
> >>
I replied with a message proposing a solution to this problem, but I got no answer.
>
> >> If stack page got swapped out, you'll get smaller than actual result.
> >
> > Alexey's point is that follow_page() will return NULL if it hits a
> > swapped-out stack page and the loop will exit, leading to an incorrect
> > (ie: short) return value from get_stack_usage_in_bytes().
> >
> > Is this claim wrong?
>
No.
I dug through the kernel source, and the only solution I found is to use
walk_page_range(), as show_smap() does in fs/proc/task_mmu.c.
Maybe there is an easier way, but I don't know of one.
So I would implement a function similar to smaps_pte_range() in
fs/proc/task_mmu.c to detect the stack usage high water mark.
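The difference between the two approaches can be sketched in a small userspace model. This is only an illustration under stated assumptions: the enum, the MODEL_PAGE_SIZE constant and both function names are hypothetical and are not kernel interfaces. Index 0 stands for the page holding the stack start, and higher indices lie further in the direction of stack growth.

```c
#include <stddef.h>

/* Hypothetical page states for a userspace model; not kernel types. */
enum page_state { PAGE_NONE, PAGE_RESIDENT, PAGE_SWAPPED };

#define MODEL_PAGE_SIZE 4096UL

/*
 * Loosely mimics the behaviour criticised above: the scan ends at the
 * first page that is not resident, so a swapped-out page cuts it short.
 */
unsigned long usage_stop_at_nonresident(const enum page_state *pages, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (pages[i] != PAGE_RESIDENT)
            break;
    return i * MODEL_PAGE_SIZE;
}

/*
 * Counts swapped-out pages as used: the furthest page that was ever
 * touched (resident or swapped) defines the high water mark.
 */
unsigned long usage_high_water(const enum page_state *pages, size_t n)
{
    unsigned long usage = 0;
    size_t i;

    for (i = 0; i < n; i++)
        if (pages[i] != PAGE_NONE)
            usage = (i + 1) * MODEL_PAGE_SIZE;
    return usage;
}
```

With the states resident, swapped, resident, none, the first function reports one page while the second reports three, which is the short result Alexey described.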
>
> Add to that the code is unnecessarily complicated.
>
I don't like statements like that without an explanation.
> The patch mixes several different changes together. It deserves being
> broken up into at least two patches.
>
Everybody tells me a different way to do a patch. Which one is the right
way: Ingo's, Andrew's or yours?
And it is a question of time if you are a hacker girl who is not a
full-time Linux kernel developer.
> I am concerned about the performance. Glibc opens /proc/self/maps in
> practically every application so doing something like following page
> tables requires testing and verifying the performance.
>
I understand your concern. That is the reason why I display the stack
usage high water mark only in /proc/<pid>/status, which is normally a
human interface.
/proc/<pid>/maps and smaps will only show where the thread stack
resides and the maximum stack size, which is only a simple
subtraction.
The reason for displaying the maximum size is that the stack start isn't
equal to the map start address.
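As a rough sketch of that subtraction, assuming the mapping bounds and the thread's stack start are already known, the value shown in maps could be computed as below. The function names and the example addresses are hypothetical; the two directions mirror the CONFIG_STACK_GROWSUP distinction in the patch.

```c
/* Maximum stack size when the stack grows towards lower addresses. */
unsigned long thread_stack_max_down(unsigned long vm_start,
                                    unsigned long stack_start)
{
    return stack_start - vm_start;
}

/* Maximum stack size when the stack grows towards higher addresses. */
unsigned long thread_stack_max_up(unsigned long vm_end,
                                  unsigned long stack_start)
{
    return vm_end - stack_start;
}
```

For a mapping starting at 0xa7d13000 and an assumed stack start of 0xa7f124b4, thread_stack_max_down() yields 0x1ff4b4, slightly less than the 0x200000 bytes of the mapping itself.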
> Eric
Stefani
Write a patch: 16 hours
To get a patch into the kernel: 16 days
Overhead: 800 percent
* Re: [merged] proctxt-update-kernel-filesystem-proctxt-documentation.patch removed from -mm tree
2009-06-24 9:33 ` Stefani Seibold
@ 2009-06-24 15:30 ` Andrew Morton
2009-06-24 15:57 ` Stefani Seibold
0 siblings, 1 reply; 16+ messages in thread
From: Andrew Morton @ 2009-06-24 15:30 UTC (permalink / raw)
To: Stefani Seibold
Cc: Eric W. Biederman, Alexey Dobriyan, linux-kernel, Peter Zijlstra,
Ingo Molnar
On Wed, 24 Jun 2009 11:33:25 +0200 Stefani Seibold <stefani@seibold.net> wrote:
> > > Alexey's point is that follow_page() will return NULL if it hits a
> > > swapped-out stack page and the loop will exit, leading to an incorrect
> > > (ie: short) return value from get_stack_usage_in_bytes().
> > >
> > > Is this claim wrong?
> >
>
> No.
>
> I digged in the kernel source and the only solution i found is to use
> the walk_page_range() like show_smap() in proc/fs/task_mmu.c.
>
> Maybe there is an easier way, but i dont know.
>
> So i would implement a similar function like smaps_pte_range() in
> proc/fs/task_mmu.c to detected the high water usage.
Perhaps we could enhance follow_page() so that it can tell the caller
when the target page is "virtually there", but swapped out. Add a new
FOLL_SWAP, I guess.
How to communicate this back to the caller? Perhaps add another
argument to follow_page(), perhaps return some magic value such as
#define FOLLOW_PAGE_SWAPPED_PAGE ((struct page *)1)
Adding the additional argument would be nicer.
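The two options can be compared in a small userspace model. Everything here is a stand-in invented for illustration (struct model_page, the enum and both lookup functions); it does not reflect the real follow_page() signature or any existing FOLL_ flag.

```c
#include <stddef.h>

/* Stand-in for the kernel's struct page; purely illustrative. */
struct model_page { int id; };

enum model_pte { MODEL_NONE, MODEL_PRESENT, MODEL_SWAPPED };

/* Variant A: a magic pointer value signals "swapped out". */
#define MODEL_SWAPPED_PAGE ((struct model_page *)1)

struct model_page *lookup_magic(enum model_pte pte, struct model_page *backing)
{
    if (pte == MODEL_PRESENT)
        return backing;
    if (pte == MODEL_SWAPPED)
        return MODEL_SWAPPED_PAGE;  /* caller must never dereference this */
    return NULL;
}

/* Variant B: an extra out-argument carries the "swapped out" state. */
struct model_page *lookup_outarg(enum model_pte pte, struct model_page *backing,
                                 int *swapped)
{
    *swapped = (pte == MODEL_SWAPPED);
    return pte == MODEL_PRESENT ? backing : NULL;
}
```

Variant B keeps the return value either NULL or a valid pointer, so existing callers that only test for NULL keep working, which may be why the extra argument is described as nicer.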
* Re: [merged] proctxt-update-kernel-filesystem-proctxt-documentation.patch removed from -mm tree
2009-06-24 15:30 ` Andrew Morton
@ 2009-06-24 15:57 ` Stefani Seibold
0 siblings, 0 replies; 16+ messages in thread
From: Stefani Seibold @ 2009-06-24 15:57 UTC (permalink / raw)
To: Andrew Morton
Cc: Eric W. Biederman, Alexey Dobriyan, linux-kernel, Peter Zijlstra,
Ingo Molnar
On Wednesday, 24.06.2009 at 08:30 -0700, Andrew Morton wrote:
> On Wed, 24 Jun 2009 11:33:25 +0200 Stefani Seibold <stefani@seibold.net> wrote:
>
> > > > Alexey's point is that follow_page() will return NULL if it hits a
> > > > swapped-out stack page and the loop will exit, leading to an incorrect
> > > > (ie: short) return value from get_stack_usage_in_bytes().
> > > >
> > > > Is this claim wrong?
> > >
> >
> > No.
> >
> > I digged in the kernel source and the only solution i found is to use
> > the walk_page_range() like show_smap() in proc/fs/task_mmu.c.
> >
> > Maybe there is an easier way, but i dont know.
> >
> > So i would implement a similar function like smaps_pte_range() in
> > proc/fs/task_mmu.c to detected the high water usage.
>
> Perhaps we could enhance follow_page() so that it can tell the caller
> when the target page is "virtually there", but swapped out. Add a new
> FOLL_SWAP, I guess.
>
I have already fixed it by using walk_page_range(). I think this is quite a
good solution. But if you like, I can do it that way in a future version.
> How to communicate this back to the caller? Perhaps add another
> argument to follow_page(), perhaps return some magic value such as
>
> #define FOLLOW_PAGE_SWAPPED_PAGE ((struct page *)1)
>
> Adding the additional argument would be nicer.
IMHO it would be best to add a new FOLL_NOTIFY_SWAP flag and, if this
flag is passed and the page is swapped out, return the
FOLLOW_PAGE_SWAPPED_PAGE magic value.
* [patch 2/2] procfs: provide stack information for threads V0.9
2009-06-24 7:35 ` Eric W. Biederman
2009-06-24 9:33 ` Stefani Seibold
@ 2009-06-24 12:03 ` Stefani Seibold
2009-06-24 14:33 ` [patch 2/2] procfs: provide stack information for threads V0.10 Stefani Seibold
2009-06-24 16:28 ` [patch 2/2] procfs: provide stack information for threads V0.11 Stefani Seibold
3 siblings, 0 replies; 16+ messages in thread
From: Stefani Seibold @ 2009-06-24 12:03 UTC (permalink / raw)
To: Andrew Morton, linux-kernel, Eric W. Biederman
Cc: Alexey Dobriyan, Peter Zijlstra, Ingo Molnar
Hi,
this is the newest version of the patch formerly named "detailed stack info",
which gives you a better overview of userland application stack
usage, especially for embedded Linux.
Currently you are only able to dump the main process/thread stack usage,
which is shown in /proc/<pid>/status by the "VmStk" value. But you get no
information about the stack memory consumed by the threads.
This patch enhances /proc/<pid>/{task/*,}/*maps so that it
marks the VM mapping where the thread stack pointer resides with "[thread
stack: xxxxxxxx]", where xxxxxxxx is the maximum size of the stack. This is
valuable information, because libpthread doesn't set the start of the stack
to the top of the mapped area, depending on the pthread usage.
A sample output of /proc/<pid>/task/<tid>/maps looks like:
08048000-08049000 r-xp 00000000 03:00 8312 /opt/z
08049000-0804a000 rw-p 00001000 03:00 8312 /opt/z
0804a000-0806b000 rw-p 00000000 00:00 0 [heap]
a7d12000-a7d13000 ---p 00000000 00:00 0
a7d13000-a7f13000 rw-p 00000000 00:00 0 [thread stack: 001ff4b4]
a7f13000-a7f14000 ---p 00000000 00:00 0
a7f14000-a7f36000 rw-p 00000000 00:00 0
a7f36000-a8069000 r-xp 00000000 03:00 4222 /lib/libc.so.6
a8069000-a806b000 r--p 00133000 03:00 4222 /lib/libc.so.6
a806b000-a806c000 rw-p 00135000 03:00 4222 /lib/libc.so.6
a806c000-a806f000 rw-p 00000000 00:00 0
a806f000-a8083000 r-xp 00000000 03:00 14462 /lib/libpthread.so.0
a8083000-a8084000 r--p 00013000 03:00 14462 /lib/libpthread.so.0
a8084000-a8085000 rw-p 00014000 03:00 14462 /lib/libpthread.so.0
a8085000-a8088000 rw-p 00000000 00:00 0
a8088000-a80a4000 r-xp 00000000 03:00 8317 /lib/ld-linux.so.2
a80a4000-a80a5000 r--p 0001b000 03:00 8317 /lib/ld-linux.so.2
a80a5000-a80a6000 rw-p 0001c000 03:00 8317 /lib/ld-linux.so.2
afaf5000-afb0a000 rw-p 00000000 00:00 0 [stack]
ffffe000-fffff000 r-xp 00000000 00:00 0 [vdso]
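A userspace tool could pick the annotation out of a maps line roughly as follows. This is a minimal sketch that assumes the "[thread stack: xxxxxxxx]" format shown in the sample above; the function name is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/*
 * Returns 1 and fills start, end and size if the line carries a
 * "[thread stack: ...]" annotation, 0 otherwise.
 */
int parse_thread_stack_line(const char *line, unsigned long *start,
                            unsigned long *end, unsigned long *size)
{
    const char *tag = strstr(line, "[thread stack: ");

    if (!tag)
        return 0;
    /* Mapping bounds are the two hex numbers at the start of the line. */
    if (sscanf(line, "%lx-%lx", start, end) != 2)
        return 0;
    /* The annotation carries the maximum stack size as a hex number. */
    if (sscanf(tag, "[thread stack: %lx]", size) != 1)
        return 0;
    return 1;
}
```

For the sample line, this yields a mapping of 0x200000 bytes and a maximum stack size of 0x1ff4b4, so the stack start lies 0xb4c bytes below the top of the mapping.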
There is also a new entry "Stack usage" in /proc/<pid>/{task/*,}/status,
which gives you the current stack usage in kB.
A sample output of /proc/self/status looks like:
Name: cat
State: R (running)
Tgid: 507
Pid: 507
.
.
.
CapBnd: fffffffffffffeff
voluntary_ctxt_switches: 0
nonvoluntary_ctxt_switches: 0
Stack usage: 12 kB
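Reading the new field from userspace could look roughly like this. It is a sketch that assumes the "Stack usage:\t%lu kB" line format used by the patch's seq_printf() call; the function name is hypothetical.

```c
#include <stdio.h>

/*
 * Returns 1 and stores the value in kB if the line is a "Stack usage:"
 * line from /proc/<pid>/status, 0 otherwise.
 */
int parse_stack_usage_kb(const char *line, unsigned long *kb)
{
    /* Whitespace in the format also matches the tab after the colon. */
    return sscanf(line, "Stack usage: %lu kB", kb) == 1;
}
```

A caller would typically feed it each line read with fgets() from the status file and stop at the first match.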
I also changed the stack base address in /proc/<pid>/{task/*,}/stat to be the
base address of the associated thread stack and not the one of the main
process. This makes more sense.
Changes since last posting:
- use walk_page_range() to determine the stack usage high water mark
- include swapped pages in the stack usage high water mark
The patch is against 2.6.30-rc7 and has been tested on Intel and PPC
architectures.
ChangeLog:
20. Jan 2009 V0.1
- First Version for Kernel 2.6.28.1
31. Mar 2009 V0.2
- Ported to Kernel 2.6.29
03. Jun 2009 V0.3
- Ported to Kernel 2.6.30
- Redesigned what was suggested by Ingo Molnar
- the thread watch monitor is gone
- the /proc/stackmon entry is also gone
- slim down
04. Jun 2009 V0.4
- Redesigned everything that was suggested by Andrew Morton
- slim down
04. Jun 2009 V0.5
- Code cleanup
06. Jun 2009 V0.6
- Fix missing mm->mmap_sem locking in function task_show_stack_usage()
- Code cleanup
10. Jun 2009 V0.7
- update Documentation/filesystems/proc.txt
10. Jun 2009 V0.8
- change maps/smaps output, displays now the max. stack size
Documentation/filesystems/proc.txt | 5 +-
fs/exec.c | 2
fs/proc/array.c | 85 ++++++++++++++++++++++++++++++++++++-
fs/proc/task_mmu.c | 19 ++++++++
include/linux/sched.h | 1
kernel/fork.c | 2
6 files changed, 112 insertions(+), 2 deletions(-)
Signed-off-by: Stefani Seibold <stefani@seibold.net>
diff -u -N -r linux-2.6.30.orig/Documentation/filesystems/proc.txt linux-2.6.30/Documentation/filesystems/proc.txt
--- linux-2.6.30.orig/Documentation/filesystems/proc.txt 2009-06-10 09:09:27.000000000 +0200
+++ linux-2.6.30/Documentation/filesystems/proc.txt 2009-06-10 09:07:46.000000000 +0200
@@ -176,6 +176,7 @@
CapBnd: ffffffffffffffff
voluntary_ctxt_switches: 0
nonvoluntary_ctxt_switches: 1
+ Stack usage: 12 kB
This shows you nearly the same information you would get if you viewed it with
the ps command. In fact, ps uses the proc file system to obtain its
@@ -229,6 +230,7 @@
Mems_allowed_list Same as previous, but in "list format"
voluntary_ctxt_switches number of voluntary context switches
nonvoluntary_ctxt_switches number of non voluntary context switches
+ Stack usage: stack usage high water mark (round up to page size)
..............................................................................
Table 1-3: Contents of the statm files (as of 2.6.8-rc3)
@@ -307,7 +309,7 @@
08049000-0804a000 rw-p 00001000 03:00 8312 /opt/test
0804a000-0806b000 rw-p 00000000 00:00 0 [heap]
a7cb1000-a7cb2000 ---p 00000000 00:00 0
-a7cb2000-a7eb2000 rw-p 00000000 00:00 0
+a7cb2000-a7eb2000 rw-p 00000000 00:00 0 [thread stack: 001ff4b4]
a7eb2000-a7eb3000 ---p 00000000 00:00 0
a7eb3000-a7ed5000 rw-p 00000000 00:00 0
a7ed5000-a8008000 r-xp 00000000 03:00 4222 /lib/libc.so.6
@@ -343,6 +345,7 @@
[stack] = the stack of the main process
[vdso] = the "virtual dynamic shared object",
the kernel system call handler
+ [thread stack: xxxxxxxx] = the stack of the thread, xxxxxxxx is the stack size
or if empty, the mapping is anonymous.
diff -u -N -r linux-2.6.30.orig/fs/exec.c linux-2.6.30/fs/exec.c
--- linux-2.6.30.orig/fs/exec.c 2009-06-04 09:29:47.000000000 +0200
+++ linux-2.6.30/fs/exec.c 2009-06-04 09:32:35.000000000 +0200
@@ -1328,6 +1328,8 @@
if (retval < 0)
goto out;
+ current->stack_start = current->mm->start_stack;
+
/* execve succeeded */
current->fs->in_exec = 0;
current->in_execve = 0;
diff -u -N -r linux-2.6.30.orig/fs/proc/array.c linux-2.6.30/fs/proc/array.c
--- linux-2.6.30.orig/fs/proc/array.c 2009-06-04 09:29:47.000000000 +0200
+++ linux-2.6.30/fs/proc/array.c 2009-06-24 13:53:27.000000000 +0200
@@ -83,6 +83,8 @@
#include <linux/ptrace.h>
#include <linux/tracehook.h>
+#include <linux/swapops.h>
+
#include <asm/pgtable.h>
#include <asm/processor.h>
#include "internal.h"
@@ -321,6 +323,86 @@
p->nivcsw);
}
+struct stack_stats {
+ struct vm_area_struct *vma;
+ unsigned long startpage;
+ unsigned long usage;
+};
+
+static int stack_usage_pte_range(pmd_t *pmd, unsigned long addr,
+ unsigned long end, struct mm_walk *walk)
+{
+ struct stack_stats *ss = walk->private;
+ struct vm_area_struct *vma = ss->vma;
+ pte_t *pte, ptent;
+ spinlock_t *ptl;
+ int ret = 0;
+
+ pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
+ for (; addr != end; pte++, addr += PAGE_SIZE) {
+ ptent = *pte;
+
+#ifdef CONFIG_STACK_GROWSUP
+ if (pte_present(ptent) || is_swap_pte(ptent))
+ ss->usage = addr - ss->startpage + PAGE_SIZE;
+#else
+ if (pte_present(ptent) || is_swap_pte(ptent)) {
+ ss->usage = ss->startpage - addr + PAGE_SIZE;
+ ret = 1;
+ break;
+ }
+#endif
+ }
+ pte_unmap_unlock(pte - 1, ptl);
+ cond_resched();
+ return ret;
+}
+
+static inline unsigned long get_stack_usage_in_bytes(struct vm_area_struct *vma,
+ struct task_struct *task)
+{
+ struct stack_stats ss;
+ struct mm_walk stack_walk = {
+ .pmd_entry = stack_usage_pte_range,
+ .mm = vma->vm_mm,
+ .private = &ss,
+ };
+
+ if (!vma->vm_mm || is_vm_hugetlb_page(vma))
+ return 0;
+
+ ss.vma = vma;
+ ss.startpage = task->stack_start & PAGE_MASK;
+ ss.usage = 0;
+
+#ifdef CONFIG_STACK_GROWSUP
+ walk_page_range(KSTK_ESP(task) & PAGE_MASK, vma->vm_end,
+ &stack_walk);
+#else
+ walk_page_range(vma->vm_start, (KSTK_ESP(task) & PAGE_MASK) + PAGE_SIZE,
+ &stack_walk);
+#endif
+ return ss.usage;
+}
+
+static inline void task_show_stack_usage(struct seq_file *m,
+ struct task_struct *task)
+{
+ struct vm_area_struct *vma;
+ struct mm_struct *mm = get_task_mm(task);
+
+ if (mm) {
+ down_read(&mm->mmap_sem);
+ vma = find_vma(mm, task->stack_start);
+ if (vma)
+ seq_printf(m, "Stack usage:\t%lu kB\n",
+ get_stack_usage_in_bytes(vma, task) >> 10);
+
+ up_read(&mm->mmap_sem);
+ mmput(mm);
+ }
+}
+
int proc_pid_status(struct seq_file *m, struct pid_namespace *ns,
struct pid *pid, struct task_struct *task)
{
@@ -340,6 +422,7 @@
task_show_regs(m, task);
#endif
task_context_switch_counts(m, task);
+ task_show_stack_usage(m, task);
return 0;
}
@@ -481,7 +564,7 @@
rsslim,
mm ? mm->start_code : 0,
mm ? mm->end_code : 0,
- (permitted && mm) ? mm->start_stack : 0,
+ (permitted) ? task->stack_start : 0,
esp,
eip,
/* The signal information here is obsolete.
diff -u -N -r linux-2.6.30.orig/fs/proc/task_mmu.c linux-2.6.30/fs/proc/task_mmu.c
--- linux-2.6.30.orig/fs/proc/task_mmu.c 2009-06-04 09:29:47.000000000 +0200
+++ linux-2.6.30/fs/proc/task_mmu.c 2009-06-10 09:02:40.000000000 +0200
@@ -242,6 +242,25 @@
} else if (vma->vm_start <= mm->start_stack &&
vma->vm_end >= mm->start_stack) {
name = "[stack]";
+ } else {
+ unsigned long stack_start;
+ struct proc_maps_private *pmp;
+
+ pmp = m->private;
+ stack_start = pmp->task->stack_start;
+
+ if (vma->vm_start <= stack_start &&
+ vma->vm_end >= stack_start) {
+ pad_len_spaces(m, len);
+ seq_printf(m,
+ "[thread stack: %08lx]",
+#ifdef CONFIG_STACK_GROWSUP
+ vma->vm_end - stack_start
+#else
+ stack_start - vma->vm_start
+#endif
+ );
+ }
}
} else {
name = "[vdso]";
diff -u -N -r linux-2.6.30.orig/include/linux/sched.h linux-2.6.30/include/linux/sched.h
--- linux-2.6.30.orig/include/linux/sched.h 2009-06-04 09:29:47.000000000 +0200
+++ linux-2.6.30/include/linux/sched.h 2009-06-04 09:32:35.000000000 +0200
@@ -1429,6 +1429,7 @@
/* state flags for use by tracers */
unsigned long trace;
#endif
+ unsigned long stack_start;
};
/* Future-safe accessor for struct task_struct's cpus_allowed. */
diff -u -N -r linux-2.6.30.orig/kernel/fork.c linux-2.6.30/kernel/fork.c
--- linux-2.6.30.orig/kernel/fork.c 2009-06-04 09:29:47.000000000 +0200
+++ linux-2.6.30/kernel/fork.c 2009-06-04 13:15:35.000000000 +0200
@@ -1092,6 +1092,8 @@
if (unlikely(current->ptrace))
ptrace_fork(p, clone_flags);
+ p->stack_start = stack_start;
+
/* Perform scheduler related setup. Assign this task to a CPU. */
sched_fork(p, clone_flags);
* [patch 2/2] procfs: provide stack information for threads V0.10
2009-06-24 7:35 ` Eric W. Biederman
2009-06-24 9:33 ` Stefani Seibold
2009-06-24 12:03 ` [patch 2/2] procfs: provide stack information for threads V0.9 Stefani Seibold
@ 2009-06-24 14:33 ` Stefani Seibold
2009-06-24 15:20 ` Ingo Molnar
2009-06-24 16:28 ` [patch 2/2] procfs: provide stack information for threads V0.11 Stefani Seibold
3 siblings, 1 reply; 16+ messages in thread
From: Stefani Seibold @ 2009-06-24 14:33 UTC (permalink / raw)
To: Andrew Morton, linux-kernel, Eric W. Biederman
Cc: Alexey Dobriyan, Peter Zijlstra, Ingo Molnar
Hi,
this is the newest version of the patch formerly named "detailed stack info",
which gives you a better overview of userland application stack
usage, especially for embedded Linux.
Currently you are only able to dump the main process/thread stack usage,
which is shown in /proc/<pid>/status by the "VmStk" value. But you get no
information about the stack memory consumed by the threads.
This patch enhances /proc/<pid>/{task/*,}/*maps so that it
marks the VM mapping where the thread stack pointer resides with "[thread
stack: xxxxxxxx]", where xxxxxxxx is the maximum size of the stack. This is
valuable information, because libpthread doesn't set the start of the stack
to the top of the mapped area, depending on the pthread usage.
A sample output of /proc/<pid>/task/<tid>/maps looks like:
08048000-08049000 r-xp 00000000 03:00 8312 /opt/z
08049000-0804a000 rw-p 00001000 03:00 8312 /opt/z
0804a000-0806b000 rw-p 00000000 00:00 0 [heap]
a7d12000-a7d13000 ---p 00000000 00:00 0
a7d13000-a7f13000 rw-p 00000000 00:00 0 [thread stack: 001ff4b4]
a7f13000-a7f14000 ---p 00000000 00:00 0
a7f14000-a7f36000 rw-p 00000000 00:00 0
a7f36000-a8069000 r-xp 00000000 03:00 4222 /lib/libc.so.6
a8069000-a806b000 r--p 00133000 03:00 4222 /lib/libc.so.6
a806b000-a806c000 rw-p 00135000 03:00 4222 /lib/libc.so.6
a806c000-a806f000 rw-p 00000000 00:00 0
a806f000-a8083000 r-xp 00000000 03:00 14462 /lib/libpthread.so.0
a8083000-a8084000 r--p 00013000 03:00 14462 /lib/libpthread.so.0
a8084000-a8085000 rw-p 00014000 03:00 14462 /lib/libpthread.so.0
a8085000-a8088000 rw-p 00000000 00:00 0
a8088000-a80a4000 r-xp 00000000 03:00 8317 /lib/ld-linux.so.2
a80a4000-a80a5000 r--p 0001b000 03:00 8317 /lib/ld-linux.so.2
a80a5000-a80a6000 rw-p 0001c000 03:00 8317 /lib/ld-linux.so.2
afaf5000-afb0a000 rw-p 00000000 00:00 0 [stack]
ffffe000-fffff000 r-xp 00000000 00:00 0 [vdso]
There is also a new entry "Stack usage" in /proc/<pid>/{task/*,}/status,
which gives you the current stack usage in kB.
A sample output of /proc/self/status looks like:
Name: cat
State: R (running)
Tgid: 507
Pid: 507
.
.
.
CapBnd: fffffffffffffeff
voluntary_ctxt_switches: 0
nonvoluntary_ctxt_switches: 0
Stack usage: 12 kB
I also changed the stack base address in /proc/<pid>/{task/*,}/stat to be the
base address of the associated thread stack and not the one of the main
process. This makes more sense.
Changes since last posting:
- fix off-by-one bug
- cleanup
The patch is against 2.6.30 and has been tested on Intel and PPC architectures.
ChangeLog:
20. Jan 2009 V0.1
- First Version for Kernel 2.6.28.1
31. Mar 2009 V0.2
- Ported to Kernel 2.6.29
03. Jun 2009 V0.3
- Ported to Kernel 2.6.30
- Redesigned what was suggested by Ingo Molnar
- the thread watch monitor is gone
- the /proc/stackmon entry is also gone
- slim down
04. Jun 2009 V0.4
- Redesigned everything that was suggested by Andrew Morton
- slim down
04. Jun 2009 V0.5
- Code cleanup
06. Jun 2009 V0.6
- Fix missing mm->mmap_sem locking in function task_show_stack_usage()
- Code cleanup
10. Jun 2009 V0.7
- update Documentation/filesystems/proc.txt
10. Jun 2009 V0.8
- change maps/smaps output, displays now the max. stack size
24. Jun 2009 V0.9
- use walk_page_range() to determine the stack usage high water mark
- include swapped pages in the stack usage high water mark count
Documentation/filesystems/proc.txt | 5 +-
fs/exec.c | 2
fs/proc/array.c | 85 ++++++++++++++++++++++++++++++++++++-
fs/proc/task_mmu.c | 19 ++++++++
include/linux/sched.h | 1
kernel/fork.c | 2
6 files changed, 112 insertions(+), 2 deletions(-)
Signed-off-by: Stefani Seibold <stefani@seibold.net>
diff -u -N -r linux-2.6.30.orig/Documentation/filesystems/proc.txt linux-2.6.30/Documentation/filesystems/proc.txt
--- linux-2.6.30.orig/Documentation/filesystems/proc.txt 2009-06-24 16:21:46.000000000 +0200
+++ linux-2.6.30/Documentation/filesystems/proc.txt 2009-06-24 16:22:11.000000000 +0200
@@ -176,6 +176,7 @@
CapBnd: ffffffffffffffff
voluntary_ctxt_switches: 0
nonvoluntary_ctxt_switches: 1
+ Stack usage: 12 kB
This shows you nearly the same information you would get if you viewed it with
the ps command. In fact, ps uses the proc file system to obtain its
@@ -229,6 +230,7 @@
Mems_allowed_list Same as previous, but in "list format"
voluntary_ctxt_switches number of voluntary context switches
nonvoluntary_ctxt_switches number of non voluntary context switches
+ Stack usage: stack usage high water mark (round up to page size)
..............................................................................
Table 1-3: Contents of the statm files (as of 2.6.8-rc3)
@@ -307,7 +309,7 @@
08049000-0804a000 rw-p 00001000 03:00 8312 /opt/test
0804a000-0806b000 rw-p 00000000 00:00 0 [heap]
a7cb1000-a7cb2000 ---p 00000000 00:00 0
-a7cb2000-a7eb2000 rw-p 00000000 00:00 0
+a7cb2000-a7eb2000 rw-p 00000000 00:00 0 [thread stack: 001ff4b4]
a7eb2000-a7eb3000 ---p 00000000 00:00 0
a7eb3000-a7ed5000 rw-p 00000000 00:00 0
a7ed5000-a8008000 r-xp 00000000 03:00 4222 /lib/libc.so.6
@@ -343,6 +345,7 @@
[stack] = the stack of the main process
[vdso] = the "virtual dynamic shared object",
the kernel system call handler
+ [thread stack: xxxxxxxx] = the stack of the thread, xxxxxxxx is the stack size
or if empty, the mapping is anonymous.
diff -u -N -r linux-2.6.30.orig/fs/exec.c linux-2.6.30/fs/exec.c
--- linux-2.6.30.orig/fs/exec.c 2009-06-10 05:05:27.000000000 +0200
+++ linux-2.6.30/fs/exec.c 2009-06-24 16:22:11.000000000 +0200
@@ -1328,6 +1328,8 @@
if (retval < 0)
goto out;
+ current->stack_start = current->mm->start_stack;
+
/* execve succeeded */
current->fs->in_exec = 0;
current->in_execve = 0;
diff -u -N -r linux-2.6.30.orig/fs/proc/array.c linux-2.6.30/fs/proc/array.c
--- linux-2.6.30.orig/fs/proc/array.c 2009-06-10 05:05:27.000000000 +0200
+++ linux-2.6.30/fs/proc/array.c 2009-06-24 16:24:59.000000000 +0200
@@ -82,6 +82,7 @@
#include <linux/pid_namespace.h>
#include <linux/ptrace.h>
#include <linux/tracehook.h>
+#include <linux/swapops.h>
#include <asm/pgtable.h>
#include <asm/processor.h>
@@ -321,6 +322,87 @@
p->nivcsw);
}
+struct stack_stats {
+ struct vm_area_struct *vma;
+ unsigned long startpage;
+ unsigned long usage;
+};
+
+static int stack_usage_pte_range(pmd_t *pmd, unsigned long addr,
+ unsigned long end, struct mm_walk *walk)
+{
+ struct stack_stats *ss = walk->private;
+ struct vm_area_struct *vma = ss->vma;
+ pte_t *pte, ptent;
+ spinlock_t *ptl;
+ int ret = 0;
+
+ pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
+ for (; addr != end; pte++, addr += PAGE_SIZE) {
+ ptent = *pte;
+
+#ifdef CONFIG_STACK_GROWSUP
+ if (pte_present(ptent) || is_swap_pte(ptent))
+ ss->usage = addr - ss->startpage + PAGE_SIZE;
+#else
+ if (pte_present(ptent) || is_swap_pte(ptent)) {
+ ss->usage = ss->startpage - addr + PAGE_SIZE;
+ pte++;
+ ret = 1;
+ break;
+ }
+#endif
+ }
+ pte_unmap_unlock(pte - 1, ptl);
+ cond_resched();
+ return ret;
+}
+
+static inline unsigned long get_stack_usage_in_bytes(struct vm_area_struct *vma,
+ struct task_struct *task)
+{
+ struct stack_stats ss;
+ struct mm_walk stack_walk = {
+ .pmd_entry = stack_usage_pte_range,
+ .mm = vma->vm_mm,
+ .private = &ss,
+ };
+
+ if (!vma->vm_mm || is_vm_hugetlb_page(vma))
+ return 0;
+
+ ss.vma = vma;
+ ss.startpage = task->stack_start & PAGE_MASK;
+ ss.usage = 0;
+
+#ifdef CONFIG_STACK_GROWSUP
+ walk_page_range(KSTK_ESP(task) & PAGE_MASK, vma->vm_end,
+ &stack_walk);
+#else
+ walk_page_range(vma->vm_start, (KSTK_ESP(task) & PAGE_MASK) + PAGE_SIZE,
+ &stack_walk);
+#endif
+ return ss.usage;
+}
+
+static inline void task_show_stack_usage(struct seq_file *m,
+ struct task_struct *task)
+{
+ struct vm_area_struct *vma;
+ struct mm_struct *mm = get_task_mm(task);
+
+ if (mm) {
+ down_read(&mm->mmap_sem);
+ vma = find_vma(mm, task->stack_start);
+ if (vma)
+ seq_printf(m, "Stack usage:\t%lu kB\n",
+ get_stack_usage_in_bytes(vma, task) >> 10);
+
+ up_read(&mm->mmap_sem);
+ mmput(mm);
+ }
+}
+
int proc_pid_status(struct seq_file *m, struct pid_namespace *ns,
struct pid *pid, struct task_struct *task)
{
@@ -340,6 +422,7 @@
task_show_regs(m, task);
#endif
task_context_switch_counts(m, task);
+ task_show_stack_usage(m, task);
return 0;
}
@@ -481,7 +564,7 @@
rsslim,
mm ? mm->start_code : 0,
mm ? mm->end_code : 0,
- (permitted && mm) ? mm->start_stack : 0,
+ (permitted) ? task->stack_start : 0,
esp,
eip,
/* The signal information here is obsolete.
diff -u -N -r linux-2.6.30.orig/fs/proc/task_mmu.c linux-2.6.30/fs/proc/task_mmu.c
--- linux-2.6.30.orig/fs/proc/task_mmu.c 2009-06-10 05:05:27.000000000 +0200
+++ linux-2.6.30/fs/proc/task_mmu.c 2009-06-24 16:22:11.000000000 +0200
@@ -242,6 +242,25 @@
} else if (vma->vm_start <= mm->start_stack &&
vma->vm_end >= mm->start_stack) {
name = "[stack]";
+ } else {
+ unsigned long stack_start;
+ struct proc_maps_private *pmp;
+
+ pmp = m->private;
+ stack_start = pmp->task->stack_start;
+
+ if (vma->vm_start <= stack_start &&
+ vma->vm_end >= stack_start) {
+ pad_len_spaces(m, len);
+ seq_printf(m,
+ "[thread stack: %08lx]",
+#ifdef CONFIG_STACK_GROWSUP
+ vma->vm_end - stack_start
+#else
+ stack_start - vma->vm_start
+#endif
+ );
+ }
}
} else {
name = "[vdso]";
diff -u -N -r linux-2.6.30.orig/include/linux/sched.h linux-2.6.30/include/linux/sched.h
--- linux-2.6.30.orig/include/linux/sched.h 2009-06-10 05:05:27.000000000 +0200
+++ linux-2.6.30/include/linux/sched.h 2009-06-24 16:22:11.000000000 +0200
@@ -1429,6 +1429,7 @@
/* state flags for use by tracers */
unsigned long trace;
#endif
+ unsigned long stack_start;
};
/* Future-safe accessor for struct task_struct's cpus_allowed. */
diff -u -N -r linux-2.6.30.orig/kernel/fork.c linux-2.6.30/kernel/fork.c
--- linux-2.6.30.orig/kernel/fork.c 2009-06-10 05:05:27.000000000 +0200
+++ linux-2.6.30/kernel/fork.c 2009-06-24 16:22:11.000000000 +0200
@@ -1092,6 +1092,8 @@
if (unlikely(current->ptrace))
ptrace_fork(p, clone_flags);
+ p->stack_start = stack_start;
+
/* Perform scheduler related setup. Assign this task to a CPU. */
sched_fork(p, clone_flags);
* Re: [patch 2/2] procfs: provide stack information for threads V0.10
2009-06-24 14:33 ` [patch 2/2] procfs: provide stack information for threads V0.10 Stefani Seibold
@ 2009-06-24 15:20 ` Ingo Molnar
2009-06-24 15:49 ` Stefani Seibold
0 siblings, 1 reply; 16+ messages in thread
From: Ingo Molnar @ 2009-06-24 15:20 UTC (permalink / raw)
To: Stefani Seibold
Cc: Andrew Morton, linux-kernel, Eric W. Biederman, Alexey Dobriyan,
Peter Zijlstra
* Stefani Seibold <stefani@seibold.net> wrote:
> Hi,
>
> this is the newest version of the formaly named "detailed stack info"
> patch which give you a better overview of the userland application stack
> usage, especially for embedded linux.
>
> Currently you are only able to dump the main process/thread stack usage
> which is showed in /proc/pid/status by the "VmStk" Value. But you get no
> information about the consumed stack memory of the the threads.
>
> There is an enhancement in the /proc/<pid>/{task/*,}/*maps and which
> marks the vm mapping where the thread stack pointer reside with "[thread
> stack xxxxxxxx]". xxxxxxxx is the maximum size of stack. This is a
> value information, because libpthread doesn't set the start of the stack
> to the top of the mapped area, depending of the pthread usage.
>
> A sample output of /proc/<pid>/task/<tid>/maps looks like:
>
> 08048000-08049000 r-xp 00000000 03:00 8312 /opt/z
> 08049000-0804a000 rw-p 00001000 03:00 8312 /opt/z
> 0804a000-0806b000 rw-p 00000000 00:00 0 [heap]
> a7d12000-a7d13000 ---p 00000000 00:00 0
> a7d13000-a7f13000 rw-p 00000000 00:00 0 [thread stack: 001ff4b4]
I have the same question as before: have you checked the use of that
field in tools/perf/builtin-record.c, and how your change will
impact that?
Ingo
* Re: [patch 2/2] procfs: provide stack information for threads V0.10
2009-06-24 15:20 ` Ingo Molnar
@ 2009-06-24 15:49 ` Stefani Seibold
2009-06-24 17:40 ` Johannes Weiner
0 siblings, 1 reply; 16+ messages in thread
From: Stefani Seibold @ 2009-06-24 15:49 UTC (permalink / raw)
To: Ingo Molnar
Cc: Andrew Morton, linux-kernel, Eric W. Biederman, Alexey Dobriyan,
Peter Zijlstra
On Wednesday, 24.06.2009 at 17:20 +0200, Ingo Molnar wrote:
> * Stefani Seibold <stefani@seibold.net> wrote:
>
> > Hi,
> >
> > this is the newest version of the formaly named "detailed stack info"
> > patch which give you a better overview of the userland application stack
> > usage, especially for embedded linux.
> >
> > Currently you are only able to dump the main process/thread stack usage
> > which is showed in /proc/pid/status by the "VmStk" Value. But you get no
> > information about the consumed stack memory of the the threads.
> >
> > There is an enhancement in the /proc/<pid>/{task/*,}/*maps and which
> > marks the vm mapping where the thread stack pointer reside with "[thread
> > stack xxxxxxxx]". xxxxxxxx is the maximum size of stack. This is a
> > value information, because libpthread doesn't set the start of the stack
> > to the top of the mapped area, depending of the pthread usage.
> >
> > A sample output of /proc/<pid>/task/<tid>/maps looks like:
> >
> > 08048000-08049000 r-xp 00000000 03:00 8312 /opt/z
> > 08049000-0804a000 rw-p 00001000 03:00 8312 /opt/z
> > 0804a000-0806b000 rw-p 00000000 00:00 0 [heap]
> > a7d12000-a7d13000 ---p 00000000 00:00 0
> > a7d13000-a7f13000 rw-p 00000000 00:00 0 [thread stack: 001ff4b4]
>
> I have the same question as before: have you checked the use of that
> field in tools/perf/builtin-record.c, and how your change will
> impact that?
>
Good question... I have another one: what is tools/perf/builtin-record.c
and where can I find it? Then I could check it.
> Ingo
* Re: [patch 2/2] procfs: provide stack information for threads V0.10
2009-06-24 15:49 ` Stefani Seibold
@ 2009-06-24 17:40 ` Johannes Weiner
2009-06-24 17:46 ` Ingo Molnar
0 siblings, 1 reply; 16+ messages in thread
From: Johannes Weiner @ 2009-06-24 17:40 UTC (permalink / raw)
To: Stefani Seibold
Cc: Ingo Molnar, Andrew Morton, linux-kernel, Eric W. Biederman,
Alexey Dobriyan, Peter Zijlstra
On Wed, Jun 24, 2009 at 05:49:50PM +0200, Stefani Seibold wrote:
> On Wednesday, 24.06.2009 at 17:20 +0200, Ingo Molnar wrote:
> > * Stefani Seibold <stefani@seibold.net> wrote:
> >
> > > Hi,
> > >
> > > this is the newest version of the formaly named "detailed stack info"
> > > patch which give you a better overview of the userland application stack
> > > usage, especially for embedded linux.
> > >
> > > Currently you are only able to dump the main process/thread stack usage
> > > which is showed in /proc/pid/status by the "VmStk" Value. But you get no
> > > information about the consumed stack memory of the the threads.
> > >
> > > There is an enhancement in the /proc/<pid>/{task/*,}/*maps and which
> > > marks the vm mapping where the thread stack pointer reside with "[thread
> > > stack xxxxxxxx]". xxxxxxxx is the maximum size of stack. This is a
> > > value information, because libpthread doesn't set the start of the stack
> > > to the top of the mapped area, depending of the pthread usage.
> > >
> > > A sample output of /proc/<pid>/task/<tid>/maps looks like:
> > >
> > > 08048000-08049000 r-xp 00000000 03:00 8312 /opt/z
> > > 08049000-0804a000 rw-p 00001000 03:00 8312 /opt/z
> > > 0804a000-0806b000 rw-p 00000000 00:00 0 [heap]
> > > a7d12000-a7d13000 ---p 00000000 00:00 0
> > > a7d13000-a7f13000 rw-p 00000000 00:00 0 [thread stack: 001ff4b4]
> >
> > I have the same question as before: have you checked the use of that
> > field in tools/perf/builtin-record.c, and how your change will
> > impact that?
> >
>
> Good question... i have another one: What is tools/perf/builtin-record.c
> and where can i find it? Then i could check it.
You can find it in a recent git tree from Linus.
On the original question: builtin-record.c is unaffected by this patch
as this exact field will only be parsed if the mapping is executable.
* Re: [patch 2/2] procfs: provide stack information for threads V0.10
2009-06-24 17:40 ` Johannes Weiner
@ 2009-06-24 17:46 ` Ingo Molnar
2009-06-24 19:08 ` Johannes Weiner
0 siblings, 1 reply; 16+ messages in thread
From: Ingo Molnar @ 2009-06-24 17:46 UTC (permalink / raw)
To: Johannes Weiner
Cc: Stefani Seibold, Andrew Morton, linux-kernel, Eric W. Biederman,
Alexey Dobriyan, Peter Zijlstra
* Johannes Weiner <hannes@cmpxchg.org> wrote:
> On Wed, Jun 24, 2009 at 05:49:50PM +0200, Stefani Seibold wrote:
> > On Wednesday, 24.06.2009 at 17:20 +0200, Ingo Molnar wrote:
> > > * Stefani Seibold <stefani@seibold.net> wrote:
> > >
> > > > Hi,
> > > >
> > > > this is the newest version of the formaly named "detailed stack info"
> > > > patch which give you a better overview of the userland application stack
> > > > usage, especially for embedded linux.
> > > >
> > > > Currently you are only able to dump the main process/thread stack usage
> > > > which is showed in /proc/pid/status by the "VmStk" Value. But you get no
> > > > information about the consumed stack memory of the the threads.
> > > >
> > > > There is an enhancement in the /proc/<pid>/{task/*,}/*maps and which
> > > > marks the vm mapping where the thread stack pointer reside with "[thread
> > > > stack xxxxxxxx]". xxxxxxxx is the maximum size of stack. This is a
> > > > value information, because libpthread doesn't set the start of the stack
> > > > to the top of the mapped area, depending of the pthread usage.
> > > >
> > > > A sample output of /proc/<pid>/task/<tid>/maps looks like:
> > > >
> > > > 08048000-08049000 r-xp 00000000 03:00 8312 /opt/z
> > > > 08049000-0804a000 rw-p 00001000 03:00 8312 /opt/z
> > > > 0804a000-0806b000 rw-p 00000000 00:00 0 [heap]
> > > > a7d12000-a7d13000 ---p 00000000 00:00 0
> > > > a7d13000-a7f13000 rw-p 00000000 00:00 0 [thread stack: 001ff4b4]
> > >
> > > I have the same question as before: have you checked the use of that
> > > field in tools/perf/builtin-record.c, and how your change will
> > > impact that?
> > >
> >
> > Good question... i have another one: What is
> > tools/perf/builtin-record.c and where can i find it? Then i
> > could check it.
>
> You can find it in a recent git tree from Linus.
>
> On the original question: builtin-record.c is unaffected by this
> patch as this exact field will only be parsed if the mapping is
> executable.
A stack can be executable too. It is not common, but possible.
Ingo
* Re: [patch 2/2] procfs: provide stack information for threads V0.10
2009-06-24 17:46 ` Ingo Molnar
@ 2009-06-24 19:08 ` Johannes Weiner
2009-06-25 9:36 ` Ingo Molnar
2009-06-25 10:09 ` [tip:perfcounters/urgent] perf record: Fix filemap pathname parsing in /proc/pid/maps tip-bot for Johannes Weiner
0 siblings, 2 replies; 16+ messages in thread
From: Johannes Weiner @ 2009-06-24 19:08 UTC (permalink / raw)
To: Ingo Molnar
Cc: Stefani Seibold, Andrew Morton, linux-kernel, Eric W. Biederman,
Alexey Dobriyan, Peter Zijlstra
On Wed, Jun 24, 2009 at 07:46:37PM +0200, Ingo Molnar wrote:
>
> * Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> > On Wed, Jun 24, 2009 at 05:49:50PM +0200, Stefani Seibold wrote:
> > > On Wednesday, 24.06.2009 at 17:20 +0200, Ingo Molnar wrote:
> > > > * Stefani Seibold <stefani@seibold.net> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > this is the newest version of the formaly named "detailed stack info"
> > > > > patch which give you a better overview of the userland application stack
> > > > > usage, especially for embedded linux.
> > > > >
> > > > > Currently you are only able to dump the main process/thread stack usage
> > > > > which is showed in /proc/pid/status by the "VmStk" Value. But you get no
> > > > > information about the consumed stack memory of the the threads.
> > > > >
> > > > > There is an enhancement in the /proc/<pid>/{task/*,}/*maps and which
> > > > > marks the vm mapping where the thread stack pointer reside with "[thread
> > > > > stack xxxxxxxx]". xxxxxxxx is the maximum size of stack. This is a
> > > > > value information, because libpthread doesn't set the start of the stack
> > > > > to the top of the mapped area, depending of the pthread usage.
> > > > >
> > > > > A sample output of /proc/<pid>/task/<tid>/maps looks like:
> > > > >
> > > > > 08048000-08049000 r-xp 00000000 03:00 8312 /opt/z
> > > > > 08049000-0804a000 rw-p 00001000 03:00 8312 /opt/z
> > > > > 0804a000-0806b000 rw-p 00000000 00:00 0 [heap]
> > > > > a7d12000-a7d13000 ---p 00000000 00:00 0
> > > > > a7d13000-a7f13000 rw-p 00000000 00:00 0 [thread stack: 001ff4b4]
> > > >
> > > > I have the same question as before: have you checked the use of that
> > > > field in tools/perf/builtin-record.c, and how your change will
> > > > impact that?
> > > >
> > >
> > > Good question... i have another one: What is
> > > tools/perf/builtin-record.c and where can i find it? Then i
> > > could check it.
> >
> > You can find it in a recent git tree from Linus.
> >
> > On the original question: builtin-record.c is unaffected by this
> > patch as this exact field will only be parsed if the mapping is
> > executable.
>
> A stack can be executable too. It is not common, but possible.
It also ignores the field if it doesn't start with a slash, so it's
even safe for executable stacks.
On a different note, I think that parser is not working for file
mappings with paths containing spaces. Not common, but possible :)
The below, sorry: untested, should fix this up. I think we don't
expect a slash in those lines except in a pathname, so looking for the
first slash should be okay. What do you think?
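To illustrate the difference between the two lookups, here is a small
stand-alone sketch. It is not the perf code itself: the helper names
path_by_last_space() and path_by_first_slash() are made up for this
example, and the sketch assumes, as the mail above does, that a slash
appears in a maps line only inside an absolute pathname.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Old lookup (hypothetical helper name): take the last space in the line
 * and accept the field only if it is followed by a slash.
 */
static const char *path_by_last_space(const char *line)
{
	const char *sp;

	assert(line != NULL);
	sp = strrchr(line, ' ');
	if (sp == NULL || sp[1] != '/')
		return NULL;
	return sp + 1;
}

/*
 * New lookup (hypothetical helper name): the first slash in the line is
 * taken as the start of the pathname column.
 */
static const char *path_by_first_slash(const char *line)
{
	assert(line != NULL);
	return strchr(line, '/');
}
```

With a line such as "... 8312 /opt/my app/z", the first helper returns
NULL because the last space sits inside the pathname, while the second
one returns the whole path. A line ending in a bracketed name such as
"[heap]" or a thread stack marker contains no slash, so both helpers
skip it.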
---
From: Johannes Weiner <hannes@cmpxchg.org>
Subject: tools/perf: fix filemap pathname parsing in /proc/pid/maps
Looking backward for the first space from the end of a line in
/proc/pid/maps does not find the start of the pathname of the mapped
file if it contains a space.
Since the only slashes we have in this file occur in the (absolute!)
pathname column of file mappings, looking for the first slash in a
line is a safe method to find the name.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index d7ebbd7..9b899ba 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -306,12 +306,11 @@ static void pid_synthesize_mmap_samples(pid_t pid)
continue;
pbf += n + 3;
if (*pbf == 'x') { /* vm_exec */
- char *execname = strrchr(bf, ' ');
+ char *execname = strchr(bf, '/');
- if (execname == NULL || execname[1] != '/')
+ if (execname == NULL)
continue;
- execname += 1;
size = strlen(execname);
execname[size - 1] = '\0'; /* Remove \n */
memcpy(mmap_ev.filename, execname, size);
* Re: [patch 2/2] procfs: provide stack information for threads V0.10
2009-06-24 19:08 ` Johannes Weiner
@ 2009-06-25 9:36 ` Ingo Molnar
2009-06-25 10:09 ` [tip:perfcounters/urgent] perf record: Fix filemap pathname parsing in /proc/pid/maps tip-bot for Johannes Weiner
1 sibling, 0 replies; 16+ messages in thread
From: Ingo Molnar @ 2009-06-25 9:36 UTC (permalink / raw)
To: Johannes Weiner
Cc: Stefani Seibold, Andrew Morton, linux-kernel, Eric W. Biederman,
Alexey Dobriyan, Peter Zijlstra
* Johannes Weiner <hannes@cmpxchg.org> wrote:
> On Wed, Jun 24, 2009 at 07:46:37PM +0200, Ingo Molnar wrote:
> >
> > * Johannes Weiner <hannes@cmpxchg.org> wrote:
> >
> > > On Wed, Jun 24, 2009 at 05:49:50PM +0200, Stefani Seibold wrote:
> > > > On Wednesday, 24.06.2009 at 17:20 +0200, Ingo Molnar wrote:
> > > > > * Stefani Seibold <stefani@seibold.net> wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > this is the newest version of the formaly named "detailed stack info"
> > > > > > patch which give you a better overview of the userland application stack
> > > > > > usage, especially for embedded linux.
> > > > > >
> > > > > > Currently you are only able to dump the main process/thread stack usage
> > > > > > which is showed in /proc/pid/status by the "VmStk" Value. But you get no
> > > > > > information about the consumed stack memory of the the threads.
> > > > > >
> > > > > > There is an enhancement in the /proc/<pid>/{task/*,}/*maps and which
> > > > > > marks the vm mapping where the thread stack pointer reside with "[thread
> > > > > > stack xxxxxxxx]". xxxxxxxx is the maximum size of stack. This is a
> > > > > > value information, because libpthread doesn't set the start of the stack
> > > > > > to the top of the mapped area, depending of the pthread usage.
> > > > > >
> > > > > > A sample output of /proc/<pid>/task/<tid>/maps looks like:
> > > > > >
> > > > > > 08048000-08049000 r-xp 00000000 03:00 8312 /opt/z
> > > > > > 08049000-0804a000 rw-p 00001000 03:00 8312 /opt/z
> > > > > > 0804a000-0806b000 rw-p 00000000 00:00 0 [heap]
> > > > > > a7d12000-a7d13000 ---p 00000000 00:00 0
> > > > > > a7d13000-a7f13000 rw-p 00000000 00:00 0 [thread stack: 001ff4b4]
> > > > >
> > > > > I have the same question as before: have you checked the use of that
> > > > > field in tools/perf/builtin-record.c, and how your change will
> > > > > impact that?
> > > > >
> > > >
> > > > Good question... i have another one: What is
> > > > tools/perf/builtin-record.c and where can i find it? Then i
> > > > could check it.
> > >
> > > You can find it in a recent git tree from Linus.
> > >
> > > On the original question: builtin-record.c is unaffected by this
> > > patch as this exact field will only be parsed if the mapping is
> > > executable.
> >
> > A stack can be executable too. It is not common, but possible.
>
> It also ignores the field if it doesn't start with a slash, so
> it's even safe for executable stacks.
>
> On a different note, I think that parser is not working for file
> mappings with paths containing spaces. Not common, but possible
> :)
>
> The below, sorry: untested, should fix this up. I think we don't
> expect a slash in those lines except in a pathname, so looking for
> the first slash should be okay. What do you think?
heh - good one - applied, thanks Johannes!
Ingo
* [tip:perfcounters/urgent] perf record: Fix filemap pathname parsing in /proc/pid/maps
2009-06-24 19:08 ` Johannes Weiner
2009-06-25 9:36 ` Ingo Molnar
@ 2009-06-25 10:09 ` tip-bot for Johannes Weiner
1 sibling, 0 replies; 16+ messages in thread
From: tip-bot for Johannes Weiner @ 2009-06-25 10:09 UTC (permalink / raw)
To: linux-tip-commits
Cc: linux-kernel, hpa, mingo, stefani, a.p.zijlstra, hannes, ebiederm,
akpm, tglx, mingo, adobriyan
Commit-ID: 76c64c5e4c47b6d28deb3cae8dfa07a93c2229dc
Gitweb: http://git.kernel.org/tip/76c64c5e4c47b6d28deb3cae8dfa07a93c2229dc
Author: Johannes Weiner <hannes@cmpxchg.org>
AuthorDate: Wed, 24 Jun 2009 21:08:36 +0200
Committer: Ingo Molnar <mingo@elte.hu>
CommitDate: Thu, 25 Jun 2009 11:35:58 +0200
perf record: Fix filemap pathname parsing in /proc/pid/maps
Looking backward for the first space from the end of a line in
/proc/pid/maps does not find the start of the pathname of the mapped
file if it contains a space.
Since the only slashes we have in this file occur in the (absolute!)
pathname column of file mappings, looking for the first slash in a
line is a safe method to find the name.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Stefani Seibold <stefani@seibold.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20090624190835.GA25548@cmpxchg.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
tools/perf/builtin-record.c | 5 ++---
1 files changed, 2 insertions(+), 3 deletions(-)
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index d7ebbd7..9b899ba 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -306,12 +306,11 @@ static void pid_synthesize_mmap_samples(pid_t pid)
continue;
pbf += n + 3;
if (*pbf == 'x') { /* vm_exec */
- char *execname = strrchr(bf, ' ');
+ char *execname = strchr(bf, '/');
- if (execname == NULL || execname[1] != '/')
+ if (execname == NULL)
continue;
- execname += 1;
size = strlen(execname);
execname[size - 1] = '\0'; /* Remove \n */
memcpy(mmap_ev.filename, execname, size);
* [patch 2/2] procfs: provide stack information for threads V0.11
2009-06-24 7:35 ` Eric W. Biederman
` (2 preceding siblings ...)
2009-06-24 14:33 ` [patch 2/2] procfs: provide stack information for threads V0.10 Stefani Seibold
@ 2009-06-24 16:28 ` Stefani Seibold
3 siblings, 0 replies; 16+ messages in thread
From: Stefani Seibold @ 2009-06-24 16:28 UTC (permalink / raw)
To: Andrew Morton, linux-kernel, Eric W. Biederman
Cc: Alexey Dobriyan, Peter Zijlstra, Ingo Molnar
Hi,

this is the newest version of the patch formerly named "detailed stack
info". It gives you a better overview of the stack usage of userland
applications, especially on embedded Linux.

Currently you can only dump the stack usage of the main process/thread,
which is shown in /proc/pid/status by the "VmStk" value. You get no
information about the stack memory consumed by the other threads.

The patch enhances /proc/<pid>/{task/*,}/*maps: the vm mapping in which
the thread stack pointer resides is marked with "[threadstack:xxxxxxxx]",
where xxxxxxxx is the maximum size of the stack. This is valuable
information, because depending on how pthreads are used, libpthread
doesn't set the start of the stack to the top of the mapped area.

A sample output of /proc/<pid>/task/<tid>/maps looks like:

08048000-08049000 r-xp 00000000 03:00 8312 /opt/z
08049000-0804a000 rw-p 00001000 03:00 8312 /opt/z
0804a000-0806b000 rw-p 00000000 00:00 0 [heap]
a7d12000-a7d13000 ---p 00000000 00:00 0
a7d13000-a7f13000 rw-p 00000000 00:00 0 [threadstack:001ff4b4]
a7f13000-a7f14000 ---p 00000000 00:00 0
a7f14000-a7f36000 rw-p 00000000 00:00 0
a7f36000-a8069000 r-xp 00000000 03:00 4222 /lib/libc.so.6
a8069000-a806b000 r--p 00133000 03:00 4222 /lib/libc.so.6
a806b000-a806c000 rw-p 00135000 03:00 4222 /lib/libc.so.6
a806c000-a806f000 rw-p 00000000 00:00 0
a806f000-a8083000 r-xp 00000000 03:00 14462 /lib/libpthread.so.0
a8083000-a8084000 r--p 00013000 03:00 14462 /lib/libpthread.so.0
a8084000-a8085000 rw-p 00014000 03:00 14462 /lib/libpthread.so.0
a8085000-a8088000 rw-p 00000000 00:00 0
a8088000-a80a4000 r-xp 00000000 03:00 8317 /lib/ld-linux.so.2
a80a4000-a80a5000 r--p 0001b000 03:00 8317 /lib/ld-linux.so.2
a80a5000-a80a6000 rw-p 0001c000 03:00 8317 /lib/ld-linux.so.2
afaf5000-afb0a000 rw-p 00000000 00:00 0 [stack]
ffffe000-fffff000 r-xp 00000000 00:00 0 [vdso]
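
A user-space reader could pick the marker out of a maps line roughly as
sketched below. This is only an illustration: parse_threadstack() is a
made-up helper name, and the sketch assumes the "[threadstack:%08lx]"
format that the fs/proc/task_mmu.c hunk of this patch prints. For a
stack that grows down, the patch computes the value as stack_start minus
vm_start, so adding it to the start of the mapping gives back the
thread's stack start address.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: returns 1 and fills the outputs if `line` carries
 * a "[threadstack:xxxxxxxx]" marker, 0 otherwise.
 */
static int parse_threadstack(const char *line, unsigned long *vm_start,
			     unsigned long *vm_end, unsigned long *size)
{
	const char *tag;

	assert(line && vm_start && vm_end && size);

	/* The address range is always the first column of a maps line. */
	if (sscanf(line, "%lx-%lx", vm_start, vm_end) != 2)
		return 0;

	tag = strstr(line, "[threadstack:");
	if (tag == NULL)
		return 0;

	/* The marker carries the maximum stack size as a hex number. */
	return sscanf(tag, "[threadstack:%lx]", size) == 1;
}
```

For the sample mapping a7d13000-a7f13000 with the value 001ff4b4, this
yields a stack start of 0xa7d13000 + 0x1ff4b4 = 0xa7f124b4, slightly
below the top of the 2 MB mapping.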
There is also a new entry "Stack usage" in /proc/<pid>/{task/*,}/status,
which gives you the stack usage in kB (the high-water mark, rounded up to
the page size).

A sample output of /proc/self/status looks like:
Name: cat
State: R (running)
Tgid: 507
Pid: 507
.
.
.
CapBnd: fffffffffffffeff
voluntary_ctxt_switches: 0
nonvoluntary_ctxt_switches: 0
Stack usage: 12 kB
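
Reading the new field back from user space could look like the sketch
below. It works on the text of a status file that has already been read
into memory; stack_usage_kb() is a made-up helper name, and the field
name and "kB" suffix are assumed to be exactly as in the sample above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: returns the value of the "Stack usage:" line in
 * kB, or -1 if the field is missing or malformed.
 */
static long stack_usage_kb(const char *status_text)
{
	static const char key[] = "Stack usage:";
	const char *p = status_text;
	long kb;

	assert(status_text != NULL);

	while (*p != '\0') {
		if (strncmp(p, key, sizeof(key) - 1) == 0) {
			/* %ld skips the tab that follows the colon. */
			if (sscanf(p + sizeof(key) - 1, "%ld kB", &kb) == 1)
				return kb;
			return -1;
		}
		/* Advance to the start of the next line, if any. */
		p = strchr(p, '\n');
		if (p == NULL)
			break;
		p++;
	}
	return -1;
}
```

On a kernel without this patch the field is simply absent, which is why
the sketch reports -1 instead of treating the missing line as zero usage.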
I also changed the stack base address reported in
/proc/<pid>/{task/*,}/stat: it is now the base address of the associated
thread's stack rather than that of the main process, which makes more
sense.

Changes since last posting:

- fix compatibility with tools/perf/builtin-record.c in the upstream kernel

The patch is against 2.6.30 and has been tested on Intel and PPC
architectures.

ChangeLog:
20. Jan 2009 V0.1
- First Version for Kernel 2.6.28.1
31. Mar 2009 V0.2
- Ported to Kernel 2.6.29
03. Jun 2009 V0.3
- Ported to Kernel 2.6.30
- Redesigned what was suggested by Ingo Molnar
- the thread watch monitor is gone
- the /proc/stackmon entry is also gone
- slim down
04. Jun 2009 V0.4
- Redesigned everything that was suggested by Andrew Morton
- slim down
04. Jun 2009 V0.5
- Code cleanup
06. Jun 2009 V0.6
- Fix missing mm->mmap_sem locking in function task_show_stack_usage()
- Code cleanup
10. Jun 2009 V0.7
- update Documentation/filesystems/proc.txt
10. Jun 2009 V0.8
- change maps/smaps output, displays now the max. stack size
24. Jun 2009 V0.9
- use walk_page_range() to determine the stack usage high-water mark
- include swapped-out pages in the stack usage high-water mark count
24. Jun 2009 V0.10
- fix off by one bug
- cleanup
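
The high-water mark calculation for a stack that grows down can be
modelled in user space as sketched below. This is a simplified
illustration, not the kernel code: stack_usage_bytes(), the touched[]
array and SKETCH_PAGE_SIZE are made up for the example. The array stands
in for the "page present or swapped out" test that the patch performs on
each PTE, and the sketch stops at the page holding the stack start,
whereas the patch bounds the walk by the page of the current stack
pointer.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumed page size for the example */

/*
 * touched[i] is nonzero if the page at vm_start + i * SKETCH_PAGE_SIZE
 * has ever been populated (present or swapped out).
 */
static unsigned long stack_usage_bytes(const unsigned char *touched,
				       size_t npages,
				       unsigned long vm_start,
				       unsigned long stack_start)
{
	unsigned long startpage = stack_start & ~(SKETCH_PAGE_SIZE - 1);
	size_t i;

	assert(touched != NULL);

	/* Walk upward from the bottom of the mapping. */
	for (i = 0; i < npages; i++) {
		unsigned long addr = vm_start + i * SKETCH_PAGE_SIZE;

		if (addr > startpage)
			break;
		/* The lowest populated page marks the high-water mark. */
		if (touched[i])
			return startpage - addr + SKETCH_PAGE_SIZE;
	}
	return 0;
}
```

With the sample mapping starting at 0xa7d13000, a stack start of
0xa7f124b4 and the three topmost pages populated, the result is
0xa7f12000 - 0xa7f10000 + 0x1000 = 0x3000 bytes, which shifted right by
10 gives 12 kB.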
Documentation/filesystems/proc.txt | 5 +-
fs/exec.c | 2
fs/proc/array.c | 85 ++++++++++++++++++++++++++++++++++++-
fs/proc/task_mmu.c | 19 ++++++++
include/linux/sched.h | 1
kernel/fork.c | 2
6 files changed, 112 insertions(+), 2 deletions(-)
Signed-off-by: Stefani Seibold <stefani@seibold.net>
diff -u -N -r linux-2.6.30.orig/Documentation/filesystems/proc.txt linux-2.6.30/Documentation/filesystems/proc.txt
--- linux-2.6.30.orig/Documentation/filesystems/proc.txt 2009-06-24 16:21:46.000000000 +0200
+++ linux-2.6.30/Documentation/filesystems/proc.txt 2009-06-24 16:22:11.000000000 +0200
@@ -176,6 +176,7 @@
CapBnd: ffffffffffffffff
voluntary_ctxt_switches: 0
nonvoluntary_ctxt_switches: 1
+ Stack usage: 12 kB
This shows you nearly the same information you would get if you viewed it with
the ps command. In fact, ps uses the proc file system to obtain its
@@ -229,6 +230,7 @@
Mems_allowed_list Same as previous, but in "list format"
voluntary_ctxt_switches number of voluntary context switches
nonvoluntary_ctxt_switches number of non voluntary context switches
+ Stack usage: stack usage high water mark (round up to page size)
..............................................................................
Table 1-3: Contents of the statm files (as of 2.6.8-rc3)
@@ -307,7 +309,7 @@
08049000-0804a000 rw-p 00001000 03:00 8312 /opt/test
0804a000-0806b000 rw-p 00000000 00:00 0 [heap]
a7cb1000-a7cb2000 ---p 00000000 00:00 0
-a7cb2000-a7eb2000 rw-p 00000000 00:00 0
+a7cb2000-a7eb2000 rw-p 00000000 00:00 0 [threadstack:001ff4b4]
a7eb2000-a7eb3000 ---p 00000000 00:00 0
a7eb3000-a7ed5000 rw-p 00000000 00:00 0
a7ed5000-a8008000 r-xp 00000000 03:00 4222 /lib/libc.so.6
@@ -343,6 +345,7 @@
[stack] = the stack of the main process
[vdso] = the "virtual dynamic shared object",
the kernel system call handler
+ [threadstack:xxxxxxxx] = the stack of the thread, xxxxxxxx is the stack size
or if empty, the mapping is anonymous.
diff -u -N -r linux-2.6.30.orig/fs/exec.c linux-2.6.30/fs/exec.c
--- linux-2.6.30.orig/fs/exec.c 2009-06-10 05:05:27.000000000 +0200
+++ linux-2.6.30/fs/exec.c 2009-06-24 16:22:11.000000000 +0200
@@ -1328,6 +1328,8 @@
if (retval < 0)
goto out;
+ current->stack_start = current->mm->start_stack;
+
/* execve succeeded */
current->fs->in_exec = 0;
current->in_execve = 0;
diff -u -N -r linux-2.6.30.orig/fs/proc/array.c linux-2.6.30/fs/proc/array.c
--- linux-2.6.30.orig/fs/proc/array.c 2009-06-10 05:05:27.000000000 +0200
+++ linux-2.6.30/fs/proc/array.c 2009-06-24 16:24:59.000000000 +0200
@@ -82,6 +82,7 @@
#include <linux/pid_namespace.h>
#include <linux/ptrace.h>
#include <linux/tracehook.h>
+#include <linux/swapops.h>
#include <asm/pgtable.h>
#include <asm/processor.h>
@@ -321,6 +322,87 @@
p->nivcsw);
}
+struct stack_stats {
+ struct vm_area_struct *vma;
+ unsigned long startpage;
+ unsigned long usage;
+};
+
+static int stack_usage_pte_range(pmd_t *pmd, unsigned long addr,
+ unsigned long end, struct mm_walk *walk)
+{
+ struct stack_stats *ss = walk->private;
+ struct vm_area_struct *vma = ss->vma;
+ pte_t *pte, ptent;
+ spinlock_t *ptl;
+ int ret = 0;
+
+ pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
+ for (; addr != end; pte++, addr += PAGE_SIZE) {
+ ptent = *pte;
+
+#ifdef CONFIG_STACK_GROWSUP
+ if (pte_present(ptent) || is_swap_pte(ptent))
+ ss->usage = addr - ss->startpage + PAGE_SIZE;
+#else
+ if (pte_present(ptent) || is_swap_pte(ptent)) {
+ ss->usage = ss->startpage - addr + PAGE_SIZE;
+ pte++;
+ ret = 1;
+ break;
+ }
+#endif
+ }
+ pte_unmap_unlock(pte - 1, ptl);
+ cond_resched();
+ return ret;
+}
+
+static inline unsigned long get_stack_usage_in_bytes(struct vm_area_struct *vma,
+ struct task_struct *task)
+{
+ struct stack_stats ss;
+ struct mm_walk stack_walk = {
+ .pmd_entry = stack_usage_pte_range,
+ .mm = vma->vm_mm,
+ .private = &ss,
+ };
+
+ if (!vma->vm_mm || is_vm_hugetlb_page(vma))
+ return 0;
+
+ ss.vma = vma;
+ ss.startpage = task->stack_start & PAGE_MASK;
+ ss.usage = 0;
+
+#ifdef CONFIG_STACK_GROWSUP
+ walk_page_range(KSTK_ESP(task) & PAGE_MASK, vma->vm_end,
+ &stack_walk);
+#else
+ walk_page_range(vma->vm_start, (KSTK_ESP(task) & PAGE_MASK) + PAGE_SIZE,
+ &stack_walk);
+#endif
+ return ss.usage;
+}
+
+static inline void task_show_stack_usage(struct seq_file *m,
+ struct task_struct *task)
+{
+ struct vm_area_struct *vma;
+ struct mm_struct *mm = get_task_mm(task);
+
+ if (mm) {
+ down_read(&mm->mmap_sem);
+ vma = find_vma(mm, task->stack_start);
+ if (vma)
+ seq_printf(m, "Stack usage:\t%lu kB\n",
+ get_stack_usage_in_bytes(vma, task) >> 10);
+
+ up_read(&mm->mmap_sem);
+ mmput(mm);
+ }
+}
+
int proc_pid_status(struct seq_file *m, struct pid_namespace *ns,
struct pid *pid, struct task_struct *task)
{
@@ -340,6 +422,7 @@
task_show_regs(m, task);
#endif
task_context_switch_counts(m, task);
+ task_show_stack_usage(m, task);
return 0;
}
@@ -481,7 +564,7 @@
rsslim,
mm ? mm->start_code : 0,
mm ? mm->end_code : 0,
- (permitted && mm) ? mm->start_stack : 0,
+ (permitted) ? task->stack_start : 0,
esp,
eip,
/* The signal information here is obsolete.
diff -u -N -r linux-2.6.30.orig/fs/proc/task_mmu.c linux-2.6.30/fs/proc/task_mmu.c
--- linux-2.6.30.orig/fs/proc/task_mmu.c 2009-06-10 05:05:27.000000000 +0200
+++ linux-2.6.30/fs/proc/task_mmu.c 2009-06-24 16:22:11.000000000 +0200
@@ -242,6 +242,25 @@
} else if (vma->vm_start <= mm->start_stack &&
vma->vm_end >= mm->start_stack) {
name = "[stack]";
+ } else {
+ unsigned long stack_start;
+ struct proc_maps_private *pmp;
+
+ pmp = m->private;
+ stack_start = pmp->task->stack_start;
+
+ if (vma->vm_start <= stack_start &&
+ vma->vm_end >= stack_start) {
+ pad_len_spaces(m, len);
+ seq_printf(m,
+ "[threadstack:%08lx]",
+#ifdef CONFIG_STACK_GROWSUP
+ vma->vm_end - stack_start
+#else
+ stack_start - vma->vm_start
+#endif
+ );
+ }
}
} else {
name = "[vdso]";
diff -u -N -r linux-2.6.30.orig/include/linux/sched.h linux-2.6.30/include/linux/sched.h
--- linux-2.6.30.orig/include/linux/sched.h 2009-06-10 05:05:27.000000000 +0200
+++ linux-2.6.30/include/linux/sched.h 2009-06-24 16:22:11.000000000 +0200
@@ -1429,6 +1429,7 @@
/* state flags for use by tracers */
unsigned long trace;
#endif
+ unsigned long stack_start;
};
/* Future-safe accessor for struct task_struct's cpus_allowed. */
diff -u -N -r linux-2.6.30.orig/kernel/fork.c linux-2.6.30/kernel/fork.c
--- linux-2.6.30.orig/kernel/fork.c 2009-06-10 05:05:27.000000000 +0200
+++ linux-2.6.30/kernel/fork.c 2009-06-24 16:22:11.000000000 +0200
@@ -1092,6 +1092,8 @@
if (unlikely(current->ptrace))
ptrace_fork(p, clone_flags);
+ p->stack_start = stack_start;
+
/* Perform scheduler related setup. Assign this task to a CPU. */
sched_fork(p, clone_flags);