* Weirdness in /proc/<pid>/maps and /proc/<pid>/stat.
@ 2010-04-27 18:53 Robin Holt
2010-04-27 21:18 ` Stefani Seibold
2010-04-28 3:19 ` KOSAKI Motohiro
0 siblings, 2 replies; 6+ messages in thread
From: Robin Holt @ 2010-04-27 18:53 UTC (permalink / raw)
To: Stefani Seibold, KOSAKI Motohiro, Linus Torvalds; +Cc: linux-kernel
With commit d899bf7b, the behavior of field 28 of /proc/<pid>/stat
was changed as was /proc/<pid>/maps. I don't know if that change was
correct, but its resulting behavior is much more difficult to explain.
I was wondering if we could determine what the "correct" behavior is
before I spend much more time trying to give it the wrong behavior.
My test program is attached below. Essentially:
fork() -> pthread_create() -> fork()
x86_64 2.6.32 stable kernel:
Step stat-28 maps-threadstack
p (parent) 0x7fff5607ddc0 N/A
c (child) 0x7fff55c7dc50 N/A
ppthread 0x7f2cf5c9bff0 0x7f2cf5c9d000:007feff0
ppthread+fork 0x7f2cf589be30 0x7f2cf5c9d000:003fee30
cpthread 0x7f2cf589be30 0x7f2cf5c9d000:007feff0
cpthread+fork 0x7f2cf589be30 0x7f2cf5c9d000:003fee30
Note: For all of the above, the /proc/<pids>/task/*/maps files had the
stack line like:
7fff55c7d000-7fff56081000 rw-p 00000000 00:00 0 [stack]
The problems I see are:
1) In the fork() case, we are using the current userland stack
pointer for task->stack_start. This appears wrong as the
function which called fork() may be returned to and may
further return to higher level callers, finding sp now
beyond the value reported in /proc/self/stat. Additionally,
the value reported for the child of the fork has no relationship
to the stack size rlimit any longer.
2) In the pthread + fork case, in addition to the problem
above, the size information in /proc/self/maps
is incorrect as it does not take into consideration
the same return paths.
The problem I am running into is coming up with any way to
make the task->stack_start value usable.
Thanks,
Robin Holt
------------------------------- Example --------------------------------
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>
#include <asm-generic/unistd.h>
#include <sys/types.h>
pid_t
my_fork(void)
{
char test_array[2097152];
pid_t child;
printf("pid %d my_fork array at %p\n", getpid(), test_array);
child = fork();
return child;
}
void
*as_pthread(void *ignore)
{
char test_array[2097152];
printf("pid %d pthread array at %p\n", getpid(), test_array);
my_fork();
sleep(600);
return NULL;
}
int
main (int argc, char **argv)
{
char test_array[2097152];
pid_t child;
pthread_t pt;
printf("pid %d is parent 0x%p\n", getpid(), test_array);
child = my_fork();
if (child)
printf("child of main is %d\n", child);
else
sleep(1);
pthread_create(&pt, NULL, as_pthread, NULL);
sleep(600);
return 0;
}
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Weirdness in /proc/<pid>/maps and /proc/<pid>/stat.
2010-04-27 18:53 Weirdness in /proc/<pid>/maps and /proc/<pid>/stat Robin Holt
@ 2010-04-27 21:18 ` Stefani Seibold
2010-04-27 23:22 ` Robin Holt
2010-04-28 3:19 ` KOSAKI Motohiro
1 sibling, 1 reply; 6+ messages in thread
From: Stefani Seibold @ 2010-04-27 21:18 UTC (permalink / raw)
To: Robin Holt; +Cc: KOSAKI Motohiro, Linus Torvalds, linux-kernel
Am Dienstag, den 27.04.2010, 13:53 -0500 schrieb Robin Holt:
> With commit d899bf7b, the behavior of field 28 of /proc/<pid>/stat
> was changed as was /proc/<pid>/maps. I don't know if that change was
> correct, but its resulting behavior is much more difficult to explain.
> I was wondering if we could determine what the "correct" behavior is
> before I spend much more time trying to give it the wrong behavior.
>
> My test program is attached below. Essentially:
> fork() -> pthread_create() -> fork()
>
> x86_64 2.6.32 stable kernel:
> Step stat-28 maps-threadstack
> p (parent) 0x7fff5607ddc0 N/A
> c (child) 0x7fff55c7dc50 N/A
> ppthread 0x7f2cf5c9bff0 0x7f2cf5c9d000:007feff0
> ppthread+fork 0x7f2cf589be30 0x7f2cf5c9d000:003fee30
> cpthread 0x7f2cf589be30 0x7f2cf5c9d000:007feff0
> cpthread+fork 0x7f2cf589be30 0x7f2cf5c9d000:003fee30
I must guess, because the output of your example code does not fit to
your description. But the stat-28 make sense for me. A thread gets a new
stack, a fork inherits the stack of the process.
> Note: For all of the above, the /proc/<pids>/task/*/maps files had the
> stack line like:
> 7fff55c7d000-7fff56081000 rw-p 00000000 00:00 0 [stack]
>
This should be okay, because this is the stack of the main thread. The
stack of the thread is marked as [thread stack: xxxxxxxx]. Where
xxxxxxxx is the max size of the stack, because the thread stack could
start at a arbitrary position inside the VMA.
> The problems I see are:
> 1) In the fork() case, we are using the current userland stack
> pointer for task->stack_start. This appears wrong as the
> function which called fork() may be returned to and may
> further return to higher level callers, finding sp now
> beyond the value reported in /proc/self/stat. Additionally,
> the value reported for the child of the fork has no relationship
> to the stack size rlimit any longer.
>
> 2) In the pthread + fork case, in addition to the problem
> above, the size information in /proc/self/maps
> is incorrect as it does not take into consideration
> the same return paths.
>
> The problem I am running into is coming up with any way to
> make the task->stack_start value usable.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Weirdness in /proc/<pid>/maps and /proc/<pid>/stat.
2010-04-27 21:18 ` Stefani Seibold
@ 2010-04-27 23:22 ` Robin Holt
0 siblings, 0 replies; 6+ messages in thread
From: Robin Holt @ 2010-04-27 23:22 UTC (permalink / raw)
To: Stefani Seibold; +Cc: Robin Holt, KOSAKI Motohiro, Linus Torvalds, linux-kernel
On Tue, Apr 27, 2010 at 11:18:49PM +0200, Stefani Seibold wrote:
> Am Dienstag, den 27.04.2010, 13:53 -0500 schrieb Robin Holt:
> > With commit d899bf7b, the behavior of field 28 of /proc/<pid>/stat
> > was changed as was /proc/<pid>/maps. I don't know if that change was
> > correct, but its resulting behavior is much more difficult to explain.
> > I was wondering if we could determine what the "correct" behavior is
> > before I spend much more time trying to give it the wrong behavior.
> >
> > My test program is attached below. Essentially:
> > fork() -> pthread_create() -> fork()
> >
> > x86_64 2.6.32 stable kernel:
> > Step stat-28 maps-threadstack
> > p (parent) 0x7fff5607ddc0 N/A
> > c (child) 0x7fff55c7dc50 N/A
> > ppthread 0x7f2cf5c9bff0 0x7f2cf5c9d000:007feff0
> > ppthread+fork 0x7f2cf589be30 0x7f2cf5c9d000:003fee30
> > cpthread 0x7f2cf589be30 0x7f2cf5c9d000:007feff0
> > cpthread+fork 0x7f2cf589be30 0x7f2cf5c9d000:003fee30
>
> I must guess, because the output of your example code does not fit to
> your description. But the stat-28 make sense for me. A thread gets a new
> stack, a fork inherits the stack of the process.
In the fork case, the child's stack is not limited to its location at
the time of the fork call, but rather the top which the parent used.
The first two lines illustrate this. Additionally, as the child returns
from the my_fork() function, its sp is much closer to the stat-28 field
of the parent above and much (in my test code 2MB) higher than what
field 28 is reporting as the start of the stack.
My point was field 28 is currently a nearly random value due to the
setting task->stack_start at the time fork is called. The current stack
pointer at that time has a very limited relationship to the start of
the stack.
This point further factors into the max stack size calculation. In the
case of the pthread'd tasks which did the fork, their max stack size
is really the same as their parents. Just because part of the stack is
consumed at the time of the fork call does not mean that portion of the
stack is not available to the task, merely that it was used at the time
of the system call.
>
> > Note: For all of the above, the /proc/<pids>/task/*/maps files had the
> > stack line like:
> > 7fff55c7d000-7fff56081000 rw-p 00000000 00:00 0 [stack]
> >
>
> This should be okay, because this is the stack of the main thread. The
> stack of the thread is marked as [thread stack: xxxxxxxx]. Where
> xxxxxxxx is the max size of the stack, because the thread stack could
> start at a arbitrary position inside the VMA.
>
> > The problems I see are:
> > 1) In the fork() case, we are using the current userland stack
> > pointer for task->stack_start. This appears wrong as the
> > function which called fork() may be returned to and may
> > further return to higher level callers, finding sp now
> > beyond the value reported in /proc/self/stat. Additionally,
> > the value reported for the child of the fork has no relationship
> > to the stack size rlimit any longer.
> >
> > 2) In the pthread + fork case, in addition to the problem
> > above, the size information in /proc/self/maps
> > is incorrect as it does not take into consideration
> > the same return paths.
> >
> > The problem I am running into is coming up with any way to
> > make the task->stack_start value usable.
Thanks,
Robin
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Weirdness in /proc/<pid>/maps and /proc/<pid>/stat.
2010-04-27 18:53 Weirdness in /proc/<pid>/maps and /proc/<pid>/stat Robin Holt
2010-04-27 21:18 ` Stefani Seibold
@ 2010-04-28 3:19 ` KOSAKI Motohiro
2010-04-28 9:45 ` Robin Holt
1 sibling, 1 reply; 6+ messages in thread
From: KOSAKI Motohiro @ 2010-04-28 3:19 UTC (permalink / raw)
To: Robin Holt
Cc: kosaki.motohiro, Stefani Seibold, Linus Torvalds, linux-kernel,
Andrew Morton
>
> With commit d899bf7b, the behavior of field 28 of /proc/<pid>/stat
> was changed as was /proc/<pid>/maps. I don't know if that change was
> correct, but its resulting behavior is much more difficult to explain.
> I was wondering if we could determine what the "correct" behavior is
> before I spend much more time trying to give it the wrong behavior.
>
> My test program is attached below. Essentially:
> fork() -> pthread_create() -> fork()
>
> x86_64 2.6.32 stable kernel:
> Step stat-28 maps-threadstack
> p (parent) 0x7fff5607ddc0 N/A
> c (child) 0x7fff55c7dc50 N/A
> ppthread 0x7f2cf5c9bff0 0x7f2cf5c9d000:007feff0
> ppthread+fork 0x7f2cf589be30 0x7f2cf5c9d000:003fee30
> cpthread 0x7f2cf589be30 0x7f2cf5c9d000:007feff0
> cpthread+fork 0x7f2cf589be30 0x7f2cf5c9d000:003fee30
> Note: For all of the above, the /proc/<pids>/task/*/maps files had the
> stack line like:
> 7fff55c7d000-7fff56081000 rw-p 00000000 00:00 0 [stack]
>
> The problems I see are:
> 1) In the fork() case, we are using the current userland stack
> pointer for task->stack_start. This appears wrong as the
> function which called fork() may be returned to and may
> further return to higher level callers, finding sp now
> beyond the value reported in /proc/self/stat. Additionally,
> the value reported for the child of the fork has no relationship
> to the stack size rlimit any longer.
BUG.
> 2) In the pthread + fork case, in addition to the problem
> above, the size information in /proc/self/maps
> is incorrect as it does not take into consideration
> the same return paths.
BUG.
Robin, do you really need this feature? if not, I'll revert this one.
sidenote: if anyone really need to know thread stack range, I think we need to
prevent vma consoliation of thread stack, iow need to implement MAP_STACK.
>
> The problem I am running into is coming up with any way to
> make the task->stack_start value usable.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Weirdness in /proc/<pid>/maps and /proc/<pid>/stat.
2010-04-28 3:19 ` KOSAKI Motohiro
@ 2010-04-28 9:45 ` Robin Holt
2010-04-28 15:21 ` [Patch] Revert commit d899bf7b and its fixup commits -V1 Robin Holt
0 siblings, 1 reply; 6+ messages in thread
From: Robin Holt @ 2010-04-28 9:45 UTC (permalink / raw)
To: KOSAKI Motohiro
Cc: Robin Holt, Stefani Seibold, Linus Torvalds, linux-kernel,
Andrew Morton
On Wed, Apr 28, 2010 at 12:19:17PM +0900, KOSAKI Motohiro wrote:
> >
> > With commit d899bf7b, the behavior of field 28 of /proc/<pid>/stat
> > was changed as was /proc/<pid>/maps. I don't know if that change was
> > correct, but its resulting behavior is much more difficult to explain.
> > I was wondering if we could determine what the "correct" behavior is
> > before I spend much more time trying to give it the wrong behavior.
> >
> > My test program is attached below. Essentially:
> > fork() -> pthread_create() -> fork()
> >
> > x86_64 2.6.32 stable kernel:
> > Step stat-28 maps-threadstack
> > p (parent) 0x7fff5607ddc0 N/A
> > c (child) 0x7fff55c7dc50 N/A
> > ppthread 0x7f2cf5c9bff0 0x7f2cf5c9d000:007feff0
> > ppthread+fork 0x7f2cf589be30 0x7f2cf5c9d000:003fee30
> > cpthread 0x7f2cf589be30 0x7f2cf5c9d000:007feff0
> > cpthread+fork 0x7f2cf589be30 0x7f2cf5c9d000:003fee30
> > Note: For all of the above, the /proc/<pids>/task/*/maps files had the
> > stack line like:
> > 7fff55c7d000-7fff56081000 rw-p 00000000 00:00 0 [stack]
> >
> > The problems I see are:
> > 1) In the fork() case, we are using the current userland stack
> > pointer for task->stack_start. This appears wrong as the
> > function which called fork() may be returned to and may
> > further return to higher level callers, finding sp now
> > beyond the value reported in /proc/self/stat. Additionally,
> > the value reported for the child of the fork has no relationship
> > to the stack size rlimit any longer.
>
> BUG.
>
>
> > 2) In the pthread + fork case, in addition to the problem
> > above, the size information in /proc/self/maps
> > is incorrect as it does not take into consideration
> > the same return paths.
>
> BUG.
>
> Robin, do you really need this feature? if not, I'll revert this one.
The person who reported the problem does not need the threadstack feature.
They do need a predictable way to find the start_stack value.
I do not see any way to make it consistent without finding the vma for
the stack and seeing if it covers both the task->stack_start and the
stack_start and if not, splitting the vma to fit the stack_start and
stack_size range.
As noted, that means we need to make these vma's non-mergable and also
means a heavily pthreaded application will have considerably more vmas.
Also, as pthreads exit, we need to ensure that userland takes care
of cleaning up the vmas or we are likely to have trouble when a
pthread_exit() is followed by another pthread_create().
My head is spinning. I really feel like a revert and then clearly scoping
the intended behavior and finding a way to implement that behavior is
the best way forward. I can be convinced otherwise, but I hope to not
do the work ;)
Thanks,
Robin
^ permalink raw reply [flat|nested] 6+ messages in thread
* [Patch] Revert commit d899bf7b and its fixup commits -V1.
2010-04-28 9:45 ` Robin Holt
@ 2010-04-28 15:21 ` Robin Holt
0 siblings, 0 replies; 6+ messages in thread
From: Robin Holt @ 2010-04-28 15:21 UTC (permalink / raw)
To: KOSAKI Motohiro
Cc: Robin Holt, Stefani Seibold, linux-kernel, Andrew Morton,
Michal Simek, Ingo Molnar
Originally, commit d899bf7b attempted to introduce a new feature for
showing where the threadstack was located and how many pages are being
utilized by the stack.
Commit c44972f1 was applied to fix the NO_MMU case.
Commit 89240ba0 was applied to fix a bug in ia32 executables being loaded.
Commit 9ebd4eba7 was applied to fix a bug which had kernel threads
printing a userland stack address.
Commit 1306d603f was then applied to revert the stack pages being used
to solve a significant performance regression.
This patch nearly undoes the effect of all these patches.
The reason for reverting these is it provides an unusable value in
field 28. For x86_64, a fork will result in the task->stack_start
value being updated to the current user top of stack and not the stack
start address. This unpredictability of the stack_start value makes
it worthless. That includes the intended use of showing how much stack
space a thread has.
Other architectures will get different values. As an example, ia64
gets 0. The do_fork() and copy_process() functions appear to treat the
stack_start and stack_size parameters as architecture specific.
I only partially reverted c44972f1. If I had completely reverted it,
I would have had to change mm/Makefile only build pagewalk.o when
CONFIG_PROC_PAGE_MONITOR is configured. Since I could not test the
builds without significant effort, I decided to not change mm/Makefile.
I only partially reverted 89240ba0. I left the KSTK_ESP() change in
place as that seemed worthwhile.
Signed-off-by: Robin Holt <holt@sgi.com>
Cc: Stefani Seibold <stefani@seibold.net>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Andrew Morton <akpm@linux-foundation.org>
---
Documentation/filesystems/proc.txt | 3 +--
fs/compat.c | 2 --
fs/exec.c | 2 --
fs/proc/array.c | 3 +--
fs/proc/task_mmu.c | 19 -------------------
include/linux/sched.h | 1 -
kernel/fork.c | 2 --
7 files changed, 2 insertions(+), 30 deletions(-)
Index: revert_stack_start_V1/Documentation/filesystems/proc.txt
===================================================================
--- revert_stack_start_V1.orig/Documentation/filesystems/proc.txt 2010-04-28 09:45:35.711399569 -0500
+++ revert_stack_start_V1/Documentation/filesystems/proc.txt 2010-04-28 09:45:39.543654029 -0500
@@ -316,7 +316,7 @@ address perms offset dev in
08049000-0804a000 rw-p 00001000 03:00 8312 /opt/test
0804a000-0806b000 rw-p 00000000 00:00 0 [heap]
a7cb1000-a7cb2000 ---p 00000000 00:00 0
-a7cb2000-a7eb2000 rw-p 00000000 00:00 0 [threadstack:001ff4b4]
+a7cb2000-a7eb2000 rw-p 00000000 00:00 0
a7eb2000-a7eb3000 ---p 00000000 00:00 0
a7eb3000-a7ed5000 rw-p 00000000 00:00 0
a7ed5000-a8008000 r-xp 00000000 03:00 4222 /lib/libc.so.6
@@ -352,7 +352,6 @@ is not associated with a file:
[stack] = the stack of the main process
[vdso] = the "virtual dynamic shared object",
the kernel system call handler
- [threadstack:xxxxxxxx] = the stack of the thread, xxxxxxxx is the stack size
or if empty, the mapping is anonymous.
Index: revert_stack_start_V1/fs/compat.c
===================================================================
--- revert_stack_start_V1.orig/fs/compat.c 2010-04-28 09:45:35.711399569 -0500
+++ revert_stack_start_V1/fs/compat.c 2010-04-28 09:45:39.547653405 -0500
@@ -1531,8 +1531,6 @@ int compat_do_execve(char * filename,
if (retval < 0)
goto out;
- current->stack_start = current->mm->start_stack;
-
/* execve succeeded */
current->fs->in_exec = 0;
current->in_execve = 0;
Index: revert_stack_start_V1/fs/exec.c
===================================================================
--- revert_stack_start_V1.orig/fs/exec.c 2010-04-28 09:45:35.711399569 -0500
+++ revert_stack_start_V1/fs/exec.c 2010-04-28 09:45:39.575649084 -0500
@@ -1387,8 +1387,6 @@ int do_execve(char * filename,
if (retval < 0)
goto out;
- current->stack_start = current->mm->start_stack;
-
/* execve succeeded */
current->fs->in_exec = 0;
current->in_execve = 0;
Index: revert_stack_start_V1/fs/proc/array.c
===================================================================
--- revert_stack_start_V1.orig/fs/proc/array.c 2010-04-28 09:45:35.711399569 -0500
+++ revert_stack_start_V1/fs/proc/array.c 2010-04-28 09:45:39.611650788 -0500
@@ -81,7 +81,6 @@
#include <linux/pid_namespace.h>
#include <linux/ptrace.h>
#include <linux/tracehook.h>
-#include <linux/swapops.h>
#include <asm/pgtable.h>
#include <asm/processor.h>
@@ -495,7 +494,7 @@ static int do_task_stat(struct seq_file
rsslim,
mm ? mm->start_code : 0,
mm ? mm->end_code : 0,
- (permitted && mm) ? task->stack_start : 0,
+ (permitted && mm) ? mm->start_stack : 0,
esp,
eip,
/* The signal information here is obsolete.
Index: revert_stack_start_V1/fs/proc/task_mmu.c
===================================================================
--- revert_stack_start_V1.orig/fs/proc/task_mmu.c 2010-04-28 09:45:35.711399569 -0500
+++ revert_stack_start_V1/fs/proc/task_mmu.c 2010-04-28 09:45:39.643649777 -0500
@@ -247,25 +247,6 @@ static void show_map_vma(struct seq_file
} else if (vma->vm_start <= mm->start_stack &&
vma->vm_end >= mm->start_stack) {
name = "[stack]";
- } else {
- unsigned long stack_start;
- struct proc_maps_private *pmp;
-
- pmp = m->private;
- stack_start = pmp->task->stack_start;
-
- if (vma->vm_start <= stack_start &&
- vma->vm_end >= stack_start) {
- pad_len_spaces(m, len);
- seq_printf(m,
- "[threadstack:%08lx]",
-#ifdef CONFIG_STACK_GROWSUP
- vma->vm_end - stack_start
-#else
- stack_start - vma->vm_start
-#endif
- );
- }
}
} else {
name = "[vdso]";
Index: revert_stack_start_V1/include/linux/sched.h
===================================================================
--- revert_stack_start_V1.orig/include/linux/sched.h 2010-04-28 09:45:35.711399569 -0500
+++ revert_stack_start_V1/include/linux/sched.h 2010-04-28 09:45:39.691648704 -0500
@@ -1497,7 +1497,6 @@ struct task_struct {
/* bitmask of trace recursion */
unsigned long trace_recursion;
#endif /* CONFIG_TRACING */
- unsigned long stack_start;
#ifdef CONFIG_CGROUP_MEM_RES_CTLR /* memcg uses this to do batch job */
struct memcg_batch_info {
int do_batch; /* incremented when batch uncharge started */
Index: revert_stack_start_V1/kernel/fork.c
===================================================================
--- revert_stack_start_V1.orig/kernel/fork.c 2010-04-28 09:45:35.711399569 -0500
+++ revert_stack_start_V1/kernel/fork.c 2010-04-28 09:45:39.715648886 -0500
@@ -1114,8 +1114,6 @@ static struct task_struct *copy_process(
p->bts = NULL;
- p->stack_start = stack_start;
-
/* Perform scheduler related setup. Assign this task to a CPU. */
sched_fork(p, clone_flags);
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2010-04-28 15:21 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-04-27 18:53 Weirdness in /proc/<pid>/maps and /proc/<pid>/stat Robin Holt
2010-04-27 21:18 ` Stefani Seibold
2010-04-27 23:22 ` Robin Holt
2010-04-28 3:19 ` KOSAKI Motohiro
2010-04-28 9:45 ` Robin Holt
2010-04-28 15:21 ` [Patch] Revert commit d899bf7b and its fixup commits -V1 Robin Holt
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox