public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] getrusage: fill ru_maxrss value
@ 2008-11-24 16:02 Jiri Pirko
  2008-11-24 18:00 ` Oleg Nesterov
  0 siblings, 1 reply; 6+ messages in thread
From: Jiri Pirko @ 2008-11-24 16:02 UTC (permalink / raw)
  To: linux-kernel; +Cc: Andrew Morton, Oleg Nesterov

Based on the patch previously posted by Frank Mayhar:
http://kerneltrap.org/mailarchive/linux-kernel/2007/9/20/264772

Slightly changed to not panic and to set the value properly.
This patch enables "time" application to show maxresident value correctly.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
---
 include/linux/sched.h |    3 +++
 kernel/exit.c         |   21 +++++++++++++++++++++
 kernel/fork.c         |    2 ++
 kernel/sys.c          |    8 ++++++++
 4 files changed, 34 insertions(+), 0 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 644ffbd..b88c538 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -560,6 +560,7 @@ struct signal_struct {
 	unsigned long min_flt, maj_flt, cmin_flt, cmaj_flt;
 	unsigned long inblock, oublock, cinblock, coublock;
 	struct task_io_accounting ioac;
+	unsigned long cmaxrss;
 
 	/*
 	 * We don't bother to synchronize most readers of this at all,
@@ -1177,6 +1178,8 @@ struct task_struct {
 	struct timespec real_start_time;	/* boot based time */
 /* mm fault and swap info: this can arguably be seen as either mm-specific or thread-specific */
 	unsigned long min_flt, maj_flt;
+/* maxrss gets the hiwater_rss in do_exit() */
+	unsigned long maxrss;
 
 	struct task_cputime cputime_expires;
 	struct list_head cpu_timers[3];
diff --git a/kernel/exit.c b/kernel/exit.c
index 2d8be7e..e0f5fd5 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -120,6 +120,8 @@ static void __exit_signal(struct task_struct *tsk)
 		sig->inblock += task_io_get_inblock(tsk);
 		sig->oublock += task_io_get_oublock(tsk);
 		task_io_accounting_add(&sig->ioac, &tsk->ioac);
+		if (tsk->maxrss > sig->cmaxrss)
+			sig->cmaxrss = tsk->maxrss;
 		sig = NULL; /* Marker for below. */
 	}
 
@@ -1051,6 +1053,14 @@ NORET_TYPE void do_exit(long code)
 	if (tsk->mm) {
 		update_hiwater_rss(tsk->mm);
 		update_hiwater_vm(tsk->mm);
+		BUG_ON(tsk->maxrss != 0);
+		/*
+		 * Store the hiwater_rss in a task field, since when we need
+		 * it in __exit_signal() the mm structure is gone; we can't
+		 * stuff it in the signal structure since then we would be
+		 * racing with wait_task_zombie().
+		 */
+		tsk->maxrss = tsk->mm->hiwater_rss;
 	}
 	group_dead = atomic_dec_and_test(&tsk->signal->live);
 	if (group_dead) {
@@ -1354,6 +1364,17 @@ static int wait_task_zombie(struct task_struct *p, int options,
 			sig->oublock + sig->coublock;
 		task_io_accounting_add(&psig->ioac, &p->ioac);
 		task_io_accounting_add(&psig->ioac, &sig->ioac);
+		/*
+		 * Save the maximum RSS of this task and all its terminated,
+		 * waited-for children.  Don't accumulate the RSS since, one,
+		 * other operating systems (FreeBSD, AIX) do it this way and,
+		 * two, the cumulative RSS of a long-lived process with lots
+		 * of sequential child process would be meaningless.
+		 */
+		if (sig->cmaxrss > psig->cmaxrss)
+			psig->cmaxrss = sig->cmaxrss;
+		if (p->maxrss > psig->cmaxrss)
+			psig->cmaxrss = p->maxrss;
 		spin_unlock_irq(&p->parent->sighand->siglock);
 	}
 
diff --git a/kernel/fork.c b/kernel/fork.c
index 2a372a0..a853be4 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -622,6 +622,7 @@ static int copy_mm(unsigned long clone_flags, struct task_struct * tsk)
 
 	tsk->min_flt = tsk->maj_flt = 0;
 	tsk->nvcsw = tsk->nivcsw = 0;
+	tsk->maxrss = 0;
 
 	tsk->mm = NULL;
 	tsk->active_mm = NULL;
@@ -847,6 +848,7 @@ static int copy_signal(unsigned long clone_flags, struct task_struct *tsk)
 	sig->min_flt = sig->maj_flt = sig->cmin_flt = sig->cmaj_flt = 0;
 	sig->inblock = sig->oublock = sig->cinblock = sig->coublock = 0;
 	task_io_accounting_init(&sig->ioac);
+	sig->cmaxrss = 0;
 	taskstats_tgid_init(sig);
 
 	task_lock(current->group_leader);
diff --git a/kernel/sys.c b/kernel/sys.c
index 31deba8..4f3855b 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1569,6 +1569,7 @@ static void k_getrusage(struct task_struct *p, int who, struct rusage *r)
 			r->ru_majflt = p->signal->cmaj_flt;
 			r->ru_inblock = p->signal->cinblock;
 			r->ru_oublock = p->signal->coublock;
+			r->ru_maxrss = p->signal->cmaxrss;
 
 			if (who == RUSAGE_CHILDREN)
 				break;
@@ -1588,6 +1589,13 @@ static void k_getrusage(struct task_struct *p, int who, struct rusage *r)
 				accumulate_thread_rusage(t, r);
 				t = next_thread(t);
 			} while (t != p);
+			if (p->mm) {
+				update_hiwater_rss(p->mm);
+				if (r->ru_maxrss < p->mm->hiwater_rss)
+					r->ru_maxrss = p->mm->hiwater_rss;
+			} else {
+				r->ru_maxrss = p->maxrss;
+			}
 			break;
 
 		default:
-- 
1.5.6.5


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] getrusage: fill ru_maxrss value
  2008-11-24 16:02 [PATCH] getrusage: fill ru_maxrss value Jiri Pirko
@ 2008-11-24 18:00 ` Oleg Nesterov
  0 siblings, 0 replies; 6+ messages in thread
From: Oleg Nesterov @ 2008-11-24 18:00 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: linux-kernel, Andrew Morton

On 11/24, Jiri Pirko wrote:
>
> @@ -1588,6 +1589,13 @@ static void k_getrusage(struct task_struct *p, int who, struct rusage *r)
>  				accumulate_thread_rusage(t, r);
>  				t = next_thread(t);
>  			} while (t != p);
> +			if (p->mm) {
> +				update_hiwater_rss(p->mm);
> +				if (r->ru_maxrss < p->mm->hiwater_rss)
> +					r->ru_maxrss = p->mm->hiwater_rss;

The task can free its ->mm right after the check. You need get_task_mm(),
but you can't do this under ->siglock because it takes task_lock().

I think you can move this code outside of lock_task_sighand().

Oleg.


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH] getrusage: fill ru_maxrss value
@ 2008-12-02 12:18 Jiri Pirko
  2008-12-02 16:05 ` Oleg Nesterov
  0 siblings, 1 reply; 6+ messages in thread
From: Jiri Pirko @ 2008-12-02 12:18 UTC (permalink / raw)
  To: linux-kernel; +Cc: Andrew Morton, Oleg Nesterov

Based on the patch previously posted by Frank Mayhar:
http://kerneltrap.org/mailarchive/linux-kernel/2007/9/20/264772

Changed to do not panic and to set the value properly. Also made some
modifications as Oleg suggested. maxrss value is set to pages as "time"
application converts it co KBs.

This patch enables "time" application to show maxresident value
correctly.

Note that even without this patch applied there is a race between two
parallel update_hiwater_rss() callers.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
---
 include/linux/sched.h |    1 +
 kernel/exit.c         |    4 ++++
 kernel/fork.c         |    1 +
 kernel/sys.c          |   17 +++++++++++++++++
 4 files changed, 23 insertions(+), 0 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 644ffbd..818c37d 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -560,6 +560,7 @@ struct signal_struct {
 	unsigned long min_flt, maj_flt, cmin_flt, cmaj_flt;
 	unsigned long inblock, oublock, cinblock, coublock;
 	struct task_io_accounting ioac;
+	unsigned long maxrss, cmaxrss;
 
 	/*
 	 * We don't bother to synchronize most readers of this at all,
diff --git a/kernel/exit.c b/kernel/exit.c
index 2d8be7e..f09d4e7 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -1051,6 +1051,8 @@ NORET_TYPE void do_exit(long code)
 	if (tsk->mm) {
 		update_hiwater_rss(tsk->mm);
 		update_hiwater_vm(tsk->mm);
+
+		tsk->signal->maxrss = tsk->mm->hiwater_rss;
 	}
 	group_dead = atomic_dec_and_test(&tsk->signal->live);
 	if (group_dead) {
@@ -1354,6 +1356,8 @@ static int wait_task_zombie(struct task_struct *p, int options,
 			sig->oublock + sig->coublock;
 		task_io_accounting_add(&psig->ioac, &p->ioac);
 		task_io_accounting_add(&psig->ioac, &sig->ioac);
+		if (psig->cmaxrss < sig->maxrss)
+			psig->cmaxrss = sig->maxrss;
 		spin_unlock_irq(&p->parent->sighand->siglock);
 	}
 
diff --git a/kernel/fork.c b/kernel/fork.c
index 2a372a0..bb881aa 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -847,6 +847,7 @@ static int copy_signal(unsigned long clone_flags, struct task_struct *tsk)
 	sig->min_flt = sig->maj_flt = sig->cmin_flt = sig->cmaj_flt = 0;
 	sig->inblock = sig->oublock = sig->cinblock = sig->coublock = 0;
 	task_io_accounting_init(&sig->ioac);
+	sig->maxrss = sig->cmaxrss = 0;
 	taskstats_tgid_init(sig);
 
 	task_lock(current->group_leader);
diff --git a/kernel/sys.c b/kernel/sys.c
index 31deba8..f604b05 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1546,6 +1546,7 @@ static void k_getrusage(struct task_struct *p, int who, struct rusage *r)
 	unsigned long flags;
 	cputime_t utime, stime;
 	struct task_cputime cputime;
+	unsigned long maxrss = 0;
 
 	memset((char *) r, 0, sizeof *r);
 	utime = stime = cputime_zero;
@@ -1555,6 +1556,17 @@ static void k_getrusage(struct task_struct *p, int who, struct rusage *r)
 		goto out;
 	}
 
+	if (who == RUSAGE_SELF || who == RUSAGE_BOTH) {
+		struct mm_struct *mm;
+
+		mm = get_task_mm(p);
+		if (mm) {
+			update_hiwater_rss(mm);
+			maxrss = mm->hiwater_rss;
+			mmput(mm);
+		}
+	}
+
 	if (!lock_task_sighand(p, &flags))
 		return;
 
@@ -1569,6 +1581,7 @@ static void k_getrusage(struct task_struct *p, int who, struct rusage *r)
 			r->ru_majflt = p->signal->cmaj_flt;
 			r->ru_inblock = p->signal->cinblock;
 			r->ru_oublock = p->signal->coublock;
+			r->ru_maxrss = p->signal->cmaxrss;
 
 			if (who == RUSAGE_CHILDREN)
 				break;
@@ -1588,6 +1601,10 @@ static void k_getrusage(struct task_struct *p, int who, struct rusage *r)
 				accumulate_thread_rusage(t, r);
 				t = next_thread(t);
 			} while (t != p);
+			if (r->ru_maxrss < maxrss)
+				r->ru_maxrss = maxrss;
+			if (r->ru_maxrss < p->signal->maxrss)
+				r->ru_maxrss = p->signal->maxrss;
 			break;
 
 		default:
-- 
1.5.6.5


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] getrusage: fill ru_maxrss value
  2008-12-02 12:18 Jiri Pirko
@ 2008-12-02 16:05 ` Oleg Nesterov
  2008-12-02 16:50   ` Michael Kerrisk
  2008-12-02 16:53   ` Jiri Pirko
  0 siblings, 2 replies; 6+ messages in thread
From: Oleg Nesterov @ 2008-12-02 16:05 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: linux-kernel, Andrew Morton, Michael Kerrisk

On 12/02, Jiri Pirko wrote:
>
> Based on the patch previously posted by Frank Mayhar:
> http://kerneltrap.org/mailarchive/linux-kernel/2007/9/20/264772
>
> Changed to do not panic and to set the value properly. Also made some
> modifications as Oleg suggested. maxrss value is set to pages as "time"
> application converts it co KBs.
>
> This patch enables "time" application to show maxresident value
> correctly.

I believe the changelog could be a bit more descriptive ;)
And I think the patch needs more CCs. (Hugh and Michael cc'ed).

What about exec? Should we preserve signal_struct->maxrss, or should
we reset it? I don't know what is the right behaviour, but imho it
should be consistent with task_mem()/xacct_add_tsk(). And since
bprm_mm_init() does not copy ->hiwater_xxx, perhaps we should reset.
Or we should change the exec_mmap() path to preserve ->hiwater_xxx,
I dunno.

> Note that even without this patch applied there is a race between two
> parallel update_hiwater_rss() callers.
>
> [...snip...]
> @@ -1051,6 +1051,8 @@ NORET_TYPE void do_exit(long code)
>  	if (tsk->mm) {
>  		update_hiwater_rss(tsk->mm);
>  		update_hiwater_vm(tsk->mm);
> +
> +		tsk->signal->maxrss = tsk->mm->hiwater_rss;

Yes. But still it is not good the patch adds new racy calls
(k_getrusage() too). And in fact I think this code in do_exit()
should die, and we can update signal->maxrss in exit_mm().

Unless I missed something, the "if (tsk->mm) {}" code in
do_exit() is not needed because xacct_add_tsk() is not right
anyway. What we imho need is get_mm_hiwater_xxx(), can be
also used by task_mm().

I'll send the patch a bit later, then we can tweak this patch.

Do you agree?

(just in case, I don't mean that mm/ uses update_hiwater_xxx()
 "safely", afaics try_to_unmap can race with unmap_region
 for example, but still).

Oleg.


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] getrusage: fill ru_maxrss value
  2008-12-02 16:05 ` Oleg Nesterov
@ 2008-12-02 16:50   ` Michael Kerrisk
  2008-12-02 16:53   ` Jiri Pirko
  1 sibling, 0 replies; 6+ messages in thread
From: Michael Kerrisk @ 2008-12-02 16:50 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Jiri Pirko, linux-kernel, Andrew Morton, linux-api

On Tue, Dec 2, 2008 at 11:05 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> On 12/02, Jiri Pirko wrote:
>>
>> Based on the patch previously posted by Frank Mayhar:
>> http://kerneltrap.org/mailarchive/linux-kernel/2007/9/20/264772
>>
>> Changed to do not panic and to set the value properly. Also made some
>> modifications as Oleg suggested. maxrss value is set to pages as "time"
>> application converts it co KBs.
>>
>> This patch enables "time" application to show maxresident value
>> correctly.
>
> I believe the changelog could be a bit more descriptive ;)
> And I think the patch needs more CCs. (Hugh and Michael cc'ed).

Thanks Oleg.

Jiri: linux-api@ should also be CCed on API changes; see
Documentation/SubmitChecklist and
http://thread.gmane.org/gmane.linux.ltp/5658

> What about exec? Should we preserve signal_struct->maxrss, or should
> we reset it? I don't know what is the right behaviour, but imho it
> should be consistent with task_mem()/xacct_add_tsk(). And since
> bprm_mm_init() does not copy ->hiwater_xxx, perhaps we should reset.
> Or we should change the exec_mmap() path to preserve ->hiwater_xxx,
> I dunno.
>
>> Note that even without this patch applied there is a race between two
>> parallel update_hiwater_rss() callers.
>>
>> [...snip...]
>> @@ -1051,6 +1051,8 @@ NORET_TYPE void do_exit(long code)
>>       if (tsk->mm) {
>>               update_hiwater_rss(tsk->mm);
>>               update_hiwater_vm(tsk->mm);
>> +
>> +             tsk->signal->maxrss = tsk->mm->hiwater_rss;
>
> Yes. But still it is not good the patch adds new racy calls
> (k_getrusage() too). And in fact I think this code in do_exit()
> should die, and we can update signal->maxrss in exit_mm().
>
> Unless I missed something, the "if (tsk->mm) {}" code in
> do_exit() is not needed because xacct_add_tsk() is not right
> anyway. What we imho need is get_mm_hiwater_xxx(), can be
> also used by task_mm().
>
> I'll send the patch a bit later, then we can tweak this patch.
>
> Do you agree?
>
> (just in case, I don't mean that mm/ uses update_hiwater_xxx()
>  "safely", afaics try_to_unmap can race with unmap_region
>  for example, but still).
>
> Oleg.
>
>



-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
git://git.kernel.org/pub/scm/docs/man-pages/man-pages.git
man-pages online: http://www.kernel.org/doc/man-pages/online_pages.html
Found a bug? http://www.kernel.org/doc/man-pages/reporting_bugs.html

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] getrusage: fill ru_maxrss value
  2008-12-02 16:05 ` Oleg Nesterov
  2008-12-02 16:50   ` Michael Kerrisk
@ 2008-12-02 16:53   ` Jiri Pirko
  1 sibling, 0 replies; 6+ messages in thread
From: Jiri Pirko @ 2008-12-02 16:53 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: linux-kernel, Andrew Morton, Michael Kerrisk

On Tue, 2 Dec 2008 17:05:07 +0100
Oleg Nesterov <oleg@redhat.com> wrote:

> On 12/02, Jiri Pirko wrote:
> >
> > Based on the patch previously posted by Frank Mayhar:
> > http://kerneltrap.org/mailarchive/linux-kernel/2007/9/20/264772
> >
> > Changed to do not panic and to set the value properly. Also made some
> > modifications as Oleg suggested. maxrss value is set to pages as "time"
> > application converts it co KBs.
> >
> > This patch enables "time" application to show maxresident value
> > correctly.
> 
> I believe the changelog could be a bit more descriptive ;)
> And I think the patch needs more CCs. (Hugh and Michael cc'ed).
> 
> What about exec? Should we preserve signal_struct->maxrss, or should
> we reset it? I don't know what is the right behaviour, but imho it
> should be consistent with task_mem()/xacct_add_tsk(). And since
> bprm_mm_init() does not copy ->hiwater_xxx, perhaps we should reset.
> Or we should change the exec_mmap() path to preserve ->hiwater_xxx,
> I dunno.
> 
> > Note that even without this patch applied there is a race between two
> > parallel update_hiwater_rss() callers.
> >
> > [...snip...]
> > @@ -1051,6 +1051,8 @@ NORET_TYPE void do_exit(long code)
> >  	if (tsk->mm) {
> >  		update_hiwater_rss(tsk->mm);
> >  		update_hiwater_vm(tsk->mm);
> > +
> > +		tsk->signal->maxrss = tsk->mm->hiwater_rss;
> 
> Yes. But still it is not good the patch adds new racy calls
> (k_getrusage() too). And in fact I think this code in do_exit()
> should die, and we can update signal->maxrss in exit_mm().
> 
> Unless I missed something, the "if (tsk->mm) {}" code in
> do_exit() is not needed because xacct_add_tsk() is not right
> anyway. What we imho need is get_mm_hiwater_xxx(), can be
> also used by task_mm().
> 
> I'll send the patch a bit later, then we can tweak this patch.
> 
> Do you agree?
Okay.

Jirka
> 
> (just in case, I don't mean that mm/ uses update_hiwater_xxx()
>  "safely", afaics try_to_unmap can race with unmap_region
>  for example, but still).
> 
> Oleg.
> 

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2008-12-02 16:53 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-11-24 16:02 [PATCH] getrusage: fill ru_maxrss value Jiri Pirko
2008-11-24 18:00 ` Oleg Nesterov
  -- strict thread matches above, loose matches on Subject: below --
2008-12-02 12:18 Jiri Pirko
2008-12-02 16:05 ` Oleg Nesterov
2008-12-02 16:50   ` Michael Kerrisk
2008-12-02 16:53   ` Jiri Pirko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox