public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* fix text reporting in O(1) proc_pid_statm()
  2004-08-24  7:55     ` O(1) proc_pid_statm() William Lee Irwin III
@ 2004-08-24 17:05       ` William Lee Irwin III
  0 siblings, 0 replies; 8+ messages in thread
From: William Lee Irwin III @ 2004-08-24 17:05 UTC (permalink / raw)
  To: Andrew Morton, linux-kernel

On Tue, Aug 24, 2004 at 12:55:39AM -0700, William Lee Irwin III wrote:
> Merely removing down_read(&mm->mmap_sem) from task_vsize() is too
> half-assed to let stand. The following patch removes the vma iteration
> as well as the down_read(&mm->mmap_sem) from both task_mem() and
> task_statm() and callers for the CONFIG_MMU=y case in favor of
> accounting the various stats reported at the times of vma creation,
> destruction, and modification. Unlike the 2.4.x patches of the same
> name, this has no per-pte-modification overhead whatsoever.
> This patch quashes end user complaints of top(1) being slow as well as
> kernel hacker complaints of per-pte accounting overhead simultaneously.
> Incremental atop the task_vsize() de-mmap_sem-ification of 2.6.8.1-mm4:

Some kind of brainfart happened here, though it's not visible on the
default display from top(1) etc. This patch fixes up the gibberish I
mistakenly put down for text with the proper text size, and subtracts
it from data as per the O(vmas) code beforehand.


Index: mm4-2.6.8.1/fs/proc/task_mmu.c
===================================================================
--- mm4-2.6.8.1.orig/fs/proc/task_mmu.c	2004-08-23 18:29:33.000000000 -0700
+++ mm4-2.6.8.1/fs/proc/task_mmu.c	2004-08-24 10:00:21.530755896 -0700
@@ -36,8 +36,8 @@
 	       int *data, int *resident)
 {
 	*shared = mm->shared_vm;
-	*text = mm->exec_vm - ((mm->end_code - mm->start_code) >> PAGE_SHIFT);
-	*data = mm->total_vm - mm->shared_vm;
+	*text = (mm->end_code - mm->start_code) >> PAGE_SHIFT;
+	*data = mm->total_vm - mm->shared_vm - *text;
 	*resident = mm->rss;
 	return mm->total_vm;
 }

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: fix text reporting in O(1) proc_pid_statm()
@ 2004-08-24 23:06 Albert Cahalan
  2004-08-24 23:12 ` William Lee Irwin III
  0 siblings, 1 reply; 8+ messages in thread
From: Albert Cahalan @ 2004-08-24 23:06 UTC (permalink / raw)
  To: wli; +Cc: linux-kernel mailing list

> - *text = mm->exec_vm - ((mm->end_code - mm->start_code) >> PAGE_SHIFT);
> - *data = mm->total_vm - mm->shared_vm;
> + *text = (mm->end_code - mm->start_code) >> PAGE_SHIFT;
> + *data = mm->total_vm - mm->shared_vm - *text;

It's actually still wrong. This has been broken for
a very long time. It you can fix it, great. Otherwise
this is a useless value, since /proc/*/stat provides
start_code and end_code already.

The statm file is supposed to contain a field known
as "trs" or "trss". This is like rss, but text-only.
Likewise, statm also contains "drs" or "drss" for data.
When you subtract start_code from end_code, you're
generating a value known as "tsiz" (the text size).
The statm file is supposed to supply trs, not tsiz.

Back in the days of a.out, statm also contained lrs
for libraries. ELF broke this.

The statm VM size is supposed to count IO mappings.
So if your X server maps 64 MB of video RAM, then
the statm file should have a value 64 MB larger than
the status file has.



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: fix text reporting in O(1) proc_pid_statm()
  2004-08-24 23:06 fix text reporting in O(1) proc_pid_statm() Albert Cahalan
@ 2004-08-24 23:12 ` William Lee Irwin III
  2004-08-24 23:18   ` William Lee Irwin III
  2004-08-25  2:52   ` Albert Cahalan
  0 siblings, 2 replies; 8+ messages in thread
From: William Lee Irwin III @ 2004-08-24 23:12 UTC (permalink / raw)
  To: Albert Cahalan; +Cc: linux-kernel mailing list

At some point in the past, I wrote:
>> - *text = mm->exec_vm - ((mm->end_code - mm->start_code) >> PAGE_SHIFT);
>> - *data = mm->total_vm - mm->shared_vm;
>> + *text = (mm->end_code - mm->start_code) >> PAGE_SHIFT;
>> + *data = mm->total_vm - mm->shared_vm - *text;

On Tue, Aug 24, 2004 at 07:06:56PM -0400, Albert Cahalan wrote:
> It's actually still wrong. This has been broken for
> a very long time. It you can fix it, great. Otherwise
> this is a useless value, since /proc/*/stat provides
> start_code and end_code already.
> The statm file is supposed to contain a field known
> as "trs" or "trss". This is like rss, but text-only.
> Likewise, statm also contains "drs" or "drss" for data.
> When you subtract start_code from end_code, you're
> generating a value known as "tsiz" (the text size).
> The statm file is supposed to supply trs, not tsiz.

The current 2.6 semantics are purely virtual, so this merely
reimplements those semantics more efficiently. The scheme you
describe would require accounting at the time of pte modification
to implement in a like fashion, which has been rejected.


On Tue, Aug 24, 2004 at 07:06:56PM -0400, Albert Cahalan wrote:
> Back in the days of a.out, statm also contained lrs
> for libraries. ELF broke this.
> The statm VM size is supposed to count IO mappings.
> So if your X server maps 64 MB of video RAM, then
> the statm file should have a value 64 MB larger than
> the status file has.

This would not be difficult to perform additional accounting for.
I'll follow up with that shortly.


-- wli

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: fix text reporting in O(1) proc_pid_statm()
  2004-08-24 23:12 ` William Lee Irwin III
@ 2004-08-24 23:18   ` William Lee Irwin III
  2004-08-24 23:24     ` William Lee Irwin III
  2004-08-25  2:52   ` Albert Cahalan
  1 sibling, 1 reply; 8+ messages in thread
From: William Lee Irwin III @ 2004-08-24 23:18 UTC (permalink / raw)
  To: Albert Cahalan; +Cc: linux-kernel mailing list

On Tue, Aug 24, 2004 at 07:06:56PM -0400, Albert Cahalan wrote:
>> Back in the days of a.out, statm also contained lrs
>> for libraries. ELF broke this.
>> The statm VM size is supposed to count IO mappings.
>> So if your X server maps 64 MB of video RAM, then
>> the statm file should have a value 64 MB larger than
>> the status file has.

On Tue, Aug 24, 2004 at 04:12:36PM -0700, William Lee Irwin III wrote:
> This would not be difficult to perform additional accounting for.
> I'll follow up with that shortly.

Account reserved memory properly as per acahalan's sepecified semantics.


Index: mm4-2.6.8.1/fs/proc/task_mmu.c
===================================================================
--- mm4-2.6.8.1.orig/fs/proc/task_mmu.c	2004-08-24 10:00:21.530755896 -0700
+++ mm4-2.6.8.1/fs/proc/task_mmu.c	2004-08-24 16:14:48.802211472 -0700
@@ -19,7 +19,7 @@
 		"VmStk:\t%8lu kB\n"
 		"VmExe:\t%8lu kB\n"
 		"VmLib:\t%8lu kB\n",
-		mm->total_vm << (PAGE_SHIFT-10),
+		(mm->total_vm - mm->reserved_vm) << (PAGE_SHIFT-10),
 		mm->locked_vm << (PAGE_SHIFT-10),
 		mm->rss << (PAGE_SHIFT-10),
 		data << (PAGE_SHIFT-10),
Index: mm4-2.6.8.1/include/linux/sched.h
===================================================================
--- mm4-2.6.8.1.orig/include/linux/sched.h	2004-08-23 18:29:33.000000000 -0700
+++ mm4-2.6.8.1/include/linux/sched.h	2004-08-24 16:11:15.251676088 -0700
@@ -226,7 +226,7 @@
 	unsigned long start_brk, brk, start_stack;
 	unsigned long arg_start, arg_end, env_start, env_end;
 	unsigned long rlimit_rss, rss, total_vm, locked_vm, shared_vm;
-	unsigned long exec_vm, stack_vm, def_flags;
+	unsigned long exec_vm, stack_vm, reserved_vm, def_flags;
 
 	unsigned long saved_auxv[40]; /* for /proc/PID/auxv */
 
Index: mm4-2.6.8.1/mm/mmap.c
===================================================================
--- mm4-2.6.8.1.orig/mm/mmap.c	2004-08-23 23:33:42.000000000 -0700
+++ mm4-2.6.8.1/mm/mmap.c	2004-08-24 16:13:14.139602376 -0700
@@ -749,6 +749,8 @@
 		mm->stack_vm += pages;
 	if (flags & VM_EXEC)
 		mm->exec_vm += pages;
+	if (flags & (VM_RESERVED|VM_IO))
+		mm->reserved_vm += pages;
 }
 
 /*

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: fix text reporting in O(1) proc_pid_statm()
  2004-08-24 23:18   ` William Lee Irwin III
@ 2004-08-24 23:24     ` William Lee Irwin III
  2004-08-25  0:01       ` William Lee Irwin III
  0 siblings, 1 reply; 8+ messages in thread
From: William Lee Irwin III @ 2004-08-24 23:24 UTC (permalink / raw)
  To: Albert Cahalan; +Cc: linux-kernel mailing list

On Tue, Aug 24, 2004 at 04:12:36PM -0700, William Lee Irwin III wrote:
>> This would not be difficult to perform additional accounting for.
>> I'll follow up with that shortly.

On Tue, Aug 24, 2004 at 04:18:41PM -0700, William Lee Irwin III wrote:
> Account reserved memory properly as per acahalan's sepecified semantics.

Unrelated fix. Unaccount VM_DONTCOPY vmas properly; the child inherits
the whole of the parent's virtual accounting from the memcpy() in
copy_mm(), but the VM_DONTCOPY check here is where a decision is made
for the child not to inherit the vmas corresponding to some accounted
memory usages. Hence, unaccount them when skipping over them here.


-- wli

Index: mm4-2.6.8.1/kernel/fork.c
===================================================================
--- mm4-2.6.8.1.orig/kernel/fork.c	2004-08-23 16:19:50.000000000 -0700
+++ mm4-2.6.8.1/kernel/fork.c	2004-08-24 16:19:45.404121128 -0700
@@ -391,8 +391,11 @@
 	for (mpnt = current->mm->mmap ; mpnt ; mpnt = mpnt->vm_next) {
 		struct file *file;
 
-		if(mpnt->vm_flags & VM_DONTCOPY)
+		if (mpnt->vm_flags & VM_DONTCOPY) {
+			__vm_stat_account(mm, mpnt->vm_flags, mpnt->vm_file,
+							-vma_pages(mpnt));
 			continue;
+		}
 		charge = 0;
 		if (mpnt->vm_flags & VM_ACCOUNT) {
 			unsigned int len = (mpnt->vm_end - mpnt->vm_start) >> PAGE_SHIFT;

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: fix text reporting in O(1) proc_pid_statm()
  2004-08-24 23:24     ` William Lee Irwin III
@ 2004-08-25  0:01       ` William Lee Irwin III
  0 siblings, 0 replies; 8+ messages in thread
From: William Lee Irwin III @ 2004-08-25  0:01 UTC (permalink / raw)
  To: Albert Cahalan, linux-kernel mailing list

On Tue, Aug 24, 2004 at 04:18:41PM -0700, William Lee Irwin III wrote:
>> Account reserved memory properly as per acahalan's sepecified semantics.

On Tue, Aug 24, 2004 at 04:24:24PM -0700, William Lee Irwin III wrote:
> Unrelated fix. Unaccount VM_DONTCOPY vmas properly; the child inherits
> the whole of the parent's virtual accounting from the memcpy() in
> copy_mm(), but the VM_DONTCOPY check here is where a decision is made
> for the child not to inherit the vmas corresponding to some accounted
> memory usages. Hence, unaccount them when skipping over them here.

Unrelated improvement (not mandatory or a fix for a bug). Remove the
accounting overhead when CONFIG_PROC_FS is not defined.


Index: mm4-2.6.8.1/include/linux/mm.h
===================================================================
--- mm4-2.6.8.1.orig/include/linux/mm.h	2004-08-23 18:29:33.000000000 -0700
+++ mm4-2.6.8.1/include/linux/mm.h	2004-08-24 16:57:36.114920616 -0700
@@ -754,7 +754,15 @@
 		int write);
 extern int remap_page_range(struct vm_area_struct *vma, unsigned long from,
 		unsigned long to, unsigned long size, pgprot_t prot);
+
+#ifdef CONFIG_PROC_FS
 void __vm_stat_account(struct mm_struct *, unsigned long, struct file *, long);
+#else
+static inline void __vm_stat_account(struct mm_struct *mm,
+			unsigned long flags, struct file *file, long pages)
+{
+}
+#endif /* CONFIG_PROC_FS */
 
 static inline void vm_stat_account(struct vm_area_struct *vma)
 {
Index: mm4-2.6.8.1/mm/mmap.c
===================================================================
--- mm4-2.6.8.1.orig/mm/mmap.c	2004-08-24 16:13:14.139602376 -0700
+++ mm4-2.6.8.1/mm/mmap.c	2004-08-24 16:57:23.943770912 -0700
@@ -729,6 +729,7 @@
 	return NULL;
 }
 
+#ifdef CONFIG_PROC_FS
 void __vm_stat_account(struct mm_struct *mm, unsigned long flags,
 						struct file *file, long pages)
 {
@@ -752,6 +753,7 @@
 	if (flags & (VM_RESERVED|VM_IO))
 		mm->reserved_vm += pages;
 }
+#endif /* CONFIG_PROC_FS */
 
 /*
  * The caller must hold down_write(current->mm->mmap_sem).

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: fix text reporting in O(1) proc_pid_statm()
  2004-08-24 23:12 ` William Lee Irwin III
  2004-08-24 23:18   ` William Lee Irwin III
@ 2004-08-25  2:52   ` Albert Cahalan
  2004-08-25  3:04     ` William Lee Irwin III
  1 sibling, 1 reply; 8+ messages in thread
From: Albert Cahalan @ 2004-08-25  2:52 UTC (permalink / raw)
  To: William Lee Irwin III; +Cc: linux-kernel mailing list

On Tue, 2004-08-24 at 19:12, William Lee Irwin III wrote:
> At some point in the past, I wrote:
> >> - *text = mm->exec_vm - ((mm->end_code - mm->start_code) >> PAGE_SHIFT);
> >> - *data = mm->total_vm - mm->shared_vm;
> >> + *text = (mm->end_code - mm->start_code) >> PAGE_SHIFT;
> >> + *data = mm->total_vm - mm->shared_vm - *text;
> 
> On Tue, Aug 24, 2004 at 07:06:56PM -0400, Albert Cahalan wrote:
> > It's actually still wrong. This has been broken for
> > a very long time. It you can fix it, great. Otherwise
> > this is a useless value, since /proc/*/stat provides
> > start_code and end_code already.
> > The statm file is supposed to contain a field known
> > as "trs" or "trss". This is like rss, but text-only.
> > Likewise, statm also contains "drs" or "drss" for data.
> > When you subtract start_code from end_code, you're
> > generating a value known as "tsiz" (the text size).
> > The statm file is supposed to supply trs, not tsiz.
> 
> The current 2.6 semantics are purely virtual, so this merely
> reimplements those semantics more efficiently. The scheme you
> describe would require accounting at the time of pte modification
> to implement in a like fashion, which has been rejected.

Hmmm, why not reject RSS too then? It's the same thing.

If trs and drs and so on were kept, then rss would
just be the sum of them. Per-permission tracking
seems about right:

rss[perms] += change;

(where "perms" is rwx plus shared/private and dirty/clean)

At some point, it would be good to have per-vma rss.
This would be displayed by the pmap command's -x option.
(added to /proc/*/maps right before the filename)



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: fix text reporting in O(1) proc_pid_statm()
  2004-08-25  2:52   ` Albert Cahalan
@ 2004-08-25  3:04     ` William Lee Irwin III
  0 siblings, 0 replies; 8+ messages in thread
From: William Lee Irwin III @ 2004-08-25  3:04 UTC (permalink / raw)
  To: Albert Cahalan; +Cc: linux-kernel mailing list

On Tue, 2004-08-24 at 19:12, William Lee Irwin III wrote:
>> The current 2.6 semantics are purely virtual, so this merely
>> reimplements those semantics more efficiently. The scheme you
>> describe would require accounting at the time of pte modification
>> to implement in a like fashion, which has been rejected.

On Tue, Aug 24, 2004 at 10:52:31PM -0400, Albert Cahalan wrote:
> Hmmm, why not reject RSS too then? It's the same thing.
> If trs and drs and so on were kept, then rss would
> just be the sum of them. Per-permission tracking
> seems about right:
> rss[perms] += change;
> (where "perms" is rwx plus shared/private and dirty/clean)
> At some point, it would be good to have per-vma rss.
> This would be displayed by the pmap command's -x option.
> (added to /proc/*/maps right before the filename)

Generally the way these patches have operated is accounting
only the interesting combinations of those with respect to
reporting in the interest of conserving space in the mm. These
have also been combined with various checks on e.g. address
ranges and some special casing of hugetlb. So it's not been
quite as simple as all that.

The code from my 2.6.0-test7-bk1 forward port of the Red Hat patches
for this was as follows:

static inline void vm_account(struct vm_area_struct *vma, pte_t pte,
					unsigned long addr, long adjustment)
{
	struct mm_struct *mm = vma->vm_mm;
	unsigned long pfn;
	struct page *page;

	if (!pte_present(pte))
		return;

	pfn = pte_pfn(pte);
	if (!pfn_valid(pfn))
		goto out;

	page = pfn_to_page(pfn);
	if (PageReserved(page))
		goto out;

	if (vma->vm_flags & VM_EXECUTABLE)
		mm->text += adjustment;
	else if (vma->vm_flags & (VM_STACK_FLAGS & (VM_GROWSUP | VM_GROWSDOWN))) {
		mm->data += adjustment;
		mm->stack += adjustment;
	} else if (addr >= TASK_UNMAPPED_BASE)
		mm->lib += adjustment;
	else
		mm->data += adjustment;

	if (page_mapping(page))
		mm->shared += adjustment;

out:
	if (pte_write(pte))
		mm->dirty += adjustment;
}


-- wli

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2004-08-25  3:04 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2004-08-24 23:06 fix text reporting in O(1) proc_pid_statm() Albert Cahalan
2004-08-24 23:12 ` William Lee Irwin III
2004-08-24 23:18   ` William Lee Irwin III
2004-08-24 23:24     ` William Lee Irwin III
2004-08-25  0:01       ` William Lee Irwin III
2004-08-25  2:52   ` Albert Cahalan
2004-08-25  3:04     ` William Lee Irwin III
  -- strict thread matches above, loose matches on Subject: below --
2004-08-22  8:34 2.6.8.1-mm4 Andrew Morton
2004-08-23 20:21 ` 2.6.8.1-mm4 wli
2004-08-24  6:14   ` 2.6.8.1-mm4 Andrew Morton
2004-08-24  7:55     ` O(1) proc_pid_statm() William Lee Irwin III
2004-08-24 17:05       ` fix text reporting in " William Lee Irwin III

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox