* [PATCH V3] nommu: add anonymous page memcg accounting
@ 2010-10-21 12:28 Steven J. Magnani
2010-10-22 3:20 ` KAMEZAWA Hiroyuki
2010-10-22 3:53 ` Balbir Singh
0 siblings, 2 replies; 6+ messages in thread
From: Steven J. Magnani @ 2010-10-21 12:28 UTC (permalink / raw)
To: linux-mm; +Cc: balbir, dhowells, linux-kernel, kamezawa.hiroyu,
Steven J. Magnani
Add the necessary calls to track VM anonymous page usage (only).
V3 changes:
* Use vma->vm_mm instead of current->mm when charging pages, for clarity
* Document that reclaim is not possible with only anonymous page accounting
so the OOM-killer is invoked when a limit is exceeded
* Add TODO to implement file cache (reclaim) support or optimize away
page_cgroup->lru
V2 changes:
* Added update of memory cgroup documentation
* Clarify use of 'file' to distinguish anonymous mappings
Signed-off-by: Steven J. Magnani <steve@digidescorp.com>
---
diff -uprN a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt
--- a/Documentation/cgroups/memory.txt 2010-10-05 09:14:36.000000000 -0500
+++ b/Documentation/cgroups/memory.txt 2010-10-21 07:25:24.000000000 -0500
@@ -34,6 +34,7 @@ Current Status: linux-2.6.34-mmotm(devel
Features:
- accounting anonymous pages, file caches, swap caches usage and limiting them.
+ NOTE: On NOMMU systems, only anonymous pages are accounted.
- private LRU and reclaim routine. (system's global LRU and private LRU
work independently from each other)
- optionally, memory+swap usage can be accounted and limited.
@@ -640,13 +641,41 @@ At reading, current status of OOM is sho
under_oom 0 or 1 (if 1, the memory cgroup is under OOM, tasks may
be stopped.)
-11. TODO
+11. NOMMU Support
+
+Systems without a Memory Management Unit do not support virtual memory,
+swapping, page faults, or migration, and are therefore limited to operating
+entirely within the system's RAM. On such systems, maintaining an ability to
+allocate sufficiently large blocks of contiguous memory is usually a challenge.
+This makes the overhead involved in memory cgroup support more of a concern,
+particularly when the memory page size is small.
+
+Typically, embedded systems are comparatively simple and deterministic, and are
+required to remain stable over long periods. Invocation of the OOM-killer, were
+it to occur in an uncontrolled manner, would likely destabilize such systems.
+
+Even a well-designed system may be presented with external stimuli that could
+lead to OOM conditions. One example is a system that is required to check a
+user-supplied removable FAT filesystem. As there is no way to bound the size
+or coherence of the user's filesystem, the memory required to run dosfsck on
+it may exceed the system's capacity. Running dosfsck in a memory cgroup
+can preserve system stability even in the face of excessive memory demands.
+
+At the present time, only anonymous pages are included in NOMMU memory cgroup
+accounting. As anonymous pages are not reclaimable, when a memory cgroup
+exceeds its limit, reclaim will fail and the OOM-killer will be invoked.
+See the Reclaim section of this document.
+
+12. TODO
1. Add support for accounting huge pages (as a separate controller)
2. Make per-cgroup scanner reclaim not-shared pages first
3. Teach controller to account for shared-pages
4. Start reclamation in the background when the limit is
not yet hit but the usage is getting closer
+5. NOMMU: implement file cache accounting (which would support reclaim)
+ or optimize away page_cgroup->lru, which is just per-page overhead when
+ reclaim is not supported.
Summary
diff -uprN a/mm/nommu.c b/mm/nommu.c
--- a/mm/nommu.c 2010-10-13 08:20:38.000000000 -0500
+++ b/mm/nommu.c 2010-10-20 07:34:11.000000000 -0500
@@ -524,8 +524,10 @@ static void delete_nommu_region(struct v
/*
* free a contiguous series of pages
*/
-static void free_page_series(unsigned long from, unsigned long to)
+static void free_page_series(unsigned long from, unsigned long to,
+ const struct file *file)
{
+ mem_cgroup_uncharge_start();
for (; from < to; from += PAGE_SIZE) {
struct page *page = virt_to_page(from);
@@ -534,8 +536,13 @@ static void free_page_series(unsigned lo
if (page_count(page) != 1)
kdebug("free page %p: refcount not one: %d",
page, page_count(page));
+ /* Only anonymous pages are charged, currently */
+ if (!file)
+ mem_cgroup_uncharge_page(page);
+
put_page(page);
}
+ mem_cgroup_uncharge_end();
}
/*
@@ -563,7 +570,8 @@ static void __put_nommu_region(struct vm
* from ramfs/tmpfs mustn't be released here */
if (region->vm_flags & VM_MAPPED_COPY) {
kdebug("free series");
- free_page_series(region->vm_start, region->vm_top);
+ free_page_series(region->vm_start, region->vm_top,
+ region->vm_file);
}
kmem_cache_free(vm_region_jar, region);
} else {
@@ -1117,9 +1125,27 @@ static int do_mmap_private(struct vm_are
set_page_refcounted(&pages[point]);
base = page_address(pages);
- region->vm_flags = vma->vm_flags |= VM_MAPPED_COPY;
+
region->vm_start = (unsigned long) base;
region->vm_end = region->vm_start + rlen;
+
+ /* Only anonymous pages are charged, currently */
+ if (!vma->vm_file) {
+ for (point = 0; point < total; point++) {
+ int charge_failed =
+ mem_cgroup_newpage_charge(&pages[point],
+ vma->vm_mm,
+ GFP_KERNEL);
+ if (charge_failed) {
+ free_page_series(region->vm_start,
+ region->vm_end, NULL);
+ region->vm_start = region->vm_end = 0;
+ goto enomem;
+ }
+ }
+ }
+
+ region->vm_flags = vma->vm_flags |= VM_MAPPED_COPY;
region->vm_top = region->vm_start + (total << PAGE_SHIFT);
vma->vm_start = region->vm_start;
@@ -1150,7 +1176,7 @@ static int do_mmap_private(struct vm_are
return 0;
error_free:
- free_page_series(region->vm_start, region->vm_end);
+ free_page_series(region->vm_start, region->vm_end, vma->vm_file);
region->vm_start = vma->vm_start = 0;
region->vm_end = vma->vm_end = 0;
region->vm_top = 0;
@@ -1213,16 +1239,15 @@ unsigned long do_mmap_pgoff(struct file
INIT_LIST_HEAD(&vma->anon_vma_chain);
vma->vm_flags = vm_flags;
vma->vm_pgoff = pgoff;
+ vma->vm_mm = current->mm;
if (file) {
region->vm_file = file;
get_file(file);
vma->vm_file = file;
get_file(file);
- if (vm_flags & VM_EXECUTABLE) {
+ if (vm_flags & VM_EXECUTABLE)
added_exe_file_vma(current->mm);
- vma->vm_mm = current->mm;
- }
}
down_write(&nommu_region_sem);
@@ -1555,7 +1580,7 @@ static int shrink_vma(struct mm_struct *
add_nommu_region(region);
up_write(&nommu_region_sem);
- free_page_series(from, to);
+ free_page_series(from, to, vma->vm_file);
return 0;
}
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
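
The charging loop added to do_mmap_private() follows a charge-then-roll-back pattern: every page of the new anonymous mapping is charged, and the mapping is abandoned if any single charge fails. A minimal userspace sketch of that pattern follows. It is a toy model, not kernel code: struct toy_memcg and the toy_* helpers are hypothetical names invented for illustration. Unlike the patch, which hands the whole range to free_page_series(), the model undoes only the charges it actually made.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a memory cgroup: a page counter with a hard limit. */
struct toy_memcg {
	size_t usage;	/* pages currently charged */
	size_t limit;	/* maximum number of pages that may be charged */
};

/* Charge one page; returns false if the limit would be exceeded. */
static bool toy_charge_page(struct toy_memcg *cg)
{
	if (cg->usage >= cg->limit)
		return false;
	cg->usage++;
	return true;
}

/* Uncharge one page. */
static void toy_uncharge_page(struct toy_memcg *cg)
{
	if (cg->usage > 0)
		cg->usage--;
}

/*
 * Charge 'total' pages as a unit. If any charge fails, undo the charges
 * already made so the counter is left as it was found.
 * Returns 0 on success, -1 on failure.
 */
static int toy_charge_series(struct toy_memcg *cg, size_t total)
{
	size_t charged;

	for (charged = 0; charged < total; charged++) {
		if (!toy_charge_page(cg))
			goto rollback;
	}
	return 0;

rollback:
	while (charged-- > 0)
		toy_uncharge_page(cg);
	return -1;
}
```

With a limit of 4 pages, charging 3 pages succeeds; a further request for 2 pages fails and leaves the usage at 3.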
* Re: [PATCH V3] nommu: add anonymous page memcg accounting
2010-10-21 12:28 [PATCH V3] nommu: add anonymous page memcg accounting Steven J. Magnani
@ 2010-10-22 3:20 ` KAMEZAWA Hiroyuki
2010-10-22 13:26 ` Steven J. Magnani
2010-10-22 3:53 ` Balbir Singh
1 sibling, 1 reply; 6+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-10-22 3:20 UTC (permalink / raw)
To: Steven J. Magnani; +Cc: linux-mm, balbir, dhowells, linux-kernel
On Thu, 21 Oct 2010 07:28:08 -0500
"Steven J. Magnani" <steve@digidescorp.com> wrote:
> Add the necessary calls to track VM anonymous page usage (only).
>
> V3 changes:
> * Use vma->vm_mm instead of current->mm when charging pages, for clarity
> * Document that reclaim is not possible with only anonymous page accounting
> so the OOM-killer is invoked when a limit is exceeded
> * Add TODO to implement file cache (reclaim) support or optimize away
> page_cgroup->lru
>
> V2 changes:
> * Added update of memory cgroup documentation
> * Clarify use of 'file' to distinguish anonymous mappings
>
> Signed-off-by: Steven J. Magnani <steve@digidescorp.com>
Thanks,
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
BTW, have you tried oom_notifier+NOMMU memory limit oom-killer ?
It may be a chance to implement a custom OOM-Killer in userland on
embedded systems.
Thanks,
-Kame
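
For reference, a userland OOM handler of the kind suggested above would normally start by registering an eventfd against the cgroup's memory.oom_control file through cgroup.event_control, as described in the OOM Control section of memory.txt. The sketch below assumes the cgroup-v1 memory controller is mounted and that this notification path behaves the same on NOMMU, which nobody in this thread has tested. The function names register_oom_notifier() and wait_for_oom() are hypothetical.

```c
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Register for OOM notifications on the memory cgroup at 'cgroup_dir'.
 * Returns an eventfd that becomes readable when the cgroup hits OOM,
 * or -1 on error.
 */
static int register_oom_notifier(const char *cgroup_dir)
{
	char path[4096], line[64];
	int efd, oom_fd, ctl_fd, len;
	int ret = -1;

	efd = eventfd(0, 0);
	if (efd < 0)
		return -1;

	snprintf(path, sizeof(path), "%s/memory.oom_control", cgroup_dir);
	oom_fd = open(path, O_RDONLY);
	snprintf(path, sizeof(path), "%s/cgroup.event_control", cgroup_dir);
	ctl_fd = open(path, O_WRONLY);

	if (oom_fd >= 0 && ctl_fd >= 0) {
		/* Registration format: "<event_fd> <fd of memory.oom_control>" */
		len = snprintf(line, sizeof(line), "%d %d", efd, oom_fd);
		if (write(ctl_fd, line, len) == len)
			ret = efd;
	}

	if (ctl_fd >= 0)
		close(ctl_fd);
	if (ret < 0) {
		/* Registration failed: release everything. */
		if (oom_fd >= 0)
			close(oom_fd);
		close(efd);
	}
	/* On success oom_fd is deliberately left open while listening. */
	return ret;
}

/* Block until the cgroup reports OOM; returns 0 on an event, -1 on error. */
static int wait_for_oom(int efd)
{
	uint64_t count;

	return read(efd, &count, sizeof(count)) == (ssize_t)sizeof(count) ? 0 : -1;
}
```

A handler would call register_oom_notifier() once, loop on wait_for_oom(), and apply its own policy (for example, killing a chosen task in the cgroup) each time it returns 0.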
* Re: [PATCH V3] nommu: add anonymous page memcg accounting
2010-10-22 3:20 ` KAMEZAWA Hiroyuki
@ 2010-10-22 13:26 ` Steven J. Magnani
2010-10-25 0:13 ` KAMEZAWA Hiroyuki
0 siblings, 1 reply; 6+ messages in thread
From: Steven J. Magnani @ 2010-10-22 13:26 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki; +Cc: linux-mm, balbir, dhowells, linux-kernel
On Fri, 2010-10-22 at 12:20 +0900, KAMEZAWA Hiroyuki wrote:
> BTW, have you tried oom_notifier+NOMMU memory limit oom-killer ?
> It may be a chance to implement a custom OOM-Killer in userland on
> embedded systems.
No - for what I need (simple sandboxing) just running my 'problem'
process in a memory cgroup is sufficient. I might even be able to get
away with oom_kill_allocating_task and no cgroup, but since that would
allow dosfsck to run the system completely out of memory there's no
guarantee that it would be the one that pushes the system over the edge.
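
As a rough illustration of this kind of sandboxing, the sketch below sets a limit on an existing memory cgroup and moves the calling process into it by writing to the cgroup-v1 files memory.limit_in_bytes and tasks. The helper names write_cgroup_value() and confine_self() are hypothetical, and the cgroup directory is assumed to have been created beforehand.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Write a decimal value to <dir>/<name>; returns 0 on success, -1 on error. */
static int write_cgroup_value(const char *dir, const char *name,
			      unsigned long long value)
{
	char path[4096], buf[32];
	int fd, len, ok;

	snprintf(path, sizeof(path), "%s/%s", dir, name);
	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -1;

	len = snprintf(buf, sizeof(buf), "%llu", value);
	ok = (write(fd, buf, len) == len);
	close(fd);
	return ok ? 0 : -1;
}

/*
 * Set the limit of an existing memory cgroup, then move the calling
 * process into it. Returns 0 on success, -1 on error.
 */
static int confine_self(const char *cgroup_dir, unsigned long long limit_bytes)
{
	if (write_cgroup_value(cgroup_dir, "memory.limit_in_bytes", limit_bytes))
		return -1;
	return write_cgroup_value(cgroup_dir, "tasks",
				  (unsigned long long)getpid());
}
```

A launcher would typically call confine_self() just before exec()ing the program to be sandboxed (dosfsck in this thread), so that the new program image runs inside the cgroup from its first allocation.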
What do you mean by "NOMMU memory limit"? (Is there some other way to
achieve the same functionality?)
I looked into David's initial suggestion of using ulimit to create a
sandbox but it seems that nommu.c doesn't respect RLIMIT_AS. When I can
find some time I'll try to cook up a patch for that.
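
For comparison, the ulimit-style sandbox would look roughly like the sketch below, which lowers the soft RLIMIT_AS of the calling process with the standard getrlimit()/setrlimit() calls. As noted above, nommu.c does not currently respect RLIMIT_AS, so on NOMMU this has no effect until that is fixed. The helper name limit_address_space() is hypothetical.

```c
#include <sys/resource.h>

/*
 * Lower the soft RLIMIT_AS of the calling process to 'bytes', capped at
 * the current hard limit. Returns 0 on success, -1 on error.
 */
static int limit_address_space(rlim_t bytes)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_AS, &rl) != 0)
		return -1;

	/* An unprivileged process may not raise the soft limit past the hard one. */
	if (rl.rlim_max != RLIM_INFINITY && bytes > rl.rlim_max)
		bytes = rl.rlim_max;

	rl.rlim_cur = bytes;
	return setrlimit(RLIMIT_AS, &rl);
}
```

As with the cgroup approach, the limit would be applied just before exec()ing the program to be confined, since resource limits are preserved across exec().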
Also it seems that nommu.c doesn't ever decrement mm->total_vm, which if
I'm reading the code correctly (before the 2.6.36 OOM-killer rewrite)
could throw off badness calculations for processes that do lots of
malloc/free operations. In 2.6.36 it doesn't look to me like this would
have any ill effects.
Thanks for all the feedback. I fully agree that maintenance should be a
strong consideration when merging new code.
Regards,
------------------------------------------------------------------------
Steven J. Magnani "I claim this network for MARS!
www.digidescorp.com Earthling, return my space modulator!"
#include <standard.disclaimer>
* Re: [PATCH V3] nommu: add anonymous page memcg accounting
2010-10-22 13:26 ` Steven J. Magnani
@ 2010-10-25 0:13 ` KAMEZAWA Hiroyuki
0 siblings, 0 replies; 6+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-10-25 0:13 UTC (permalink / raw)
To: steve; +Cc: linux-mm, balbir, dhowells, linux-kernel
On Fri, 22 Oct 2010 08:26:08 -0500
"Steven J. Magnani" <steve@digidescorp.com> wrote:
> On Fri, 2010-10-22 at 12:20 +0900, KAMEZAWA Hiroyuki wrote:
> > BTW, have you tried oom_notifier+NOMMU memory limit oom-killer ?
> > It may be a chance to implement a custom OOM-Killer in userland on
> > embedded systems.
>
> No - for what I need (simple sandboxing) just running my 'problem'
> process in a memory cgroup is sufficient. I might even be able to get
> away with oom_kill_allocating_task and no cgroup, but since that would
> allow dosfsck to run the system completely out of memory there's no
> guarantee that it would be the one that pushes the system over the edge.
>
> What do you mean by "NOMMU memory limit"? (Is there some other way to
> achieve the same functionality?)
>
I just meant memory cgroup for NOMMU.
> I looked into David's initial suggestion of using ulimit to create a
> sandbox but it seems that nommu.c doesn't respect RLIMIT_AS. When I can
> find some time I'll try to cook up a patch for that.
Hmm. I think fixing RLIMIT_AS is better. (but no nack to this patch.)
Using memcg for _a_ program sounds like overkill...
Thanks,
-Kame
* Re: [PATCH V3] nommu: add anonymous page memcg accounting
2010-10-21 12:28 [PATCH V3] nommu: add anonymous page memcg accounting Steven J. Magnani
2010-10-22 3:20 ` KAMEZAWA Hiroyuki
@ 2010-10-22 3:53 ` Balbir Singh
2010-10-22 4:34 ` KAMEZAWA Hiroyuki
1 sibling, 1 reply; 6+ messages in thread
From: Balbir Singh @ 2010-10-22 3:53 UTC (permalink / raw)
To: Steven J. Magnani; +Cc: linux-mm, dhowells, linux-kernel, kamezawa.hiroyu
* Steven J. Magnani <steve@digidescorp.com> [2010-10-21 07:28:08]:
> Add the necessary calls to track VM anonymous page usage (only).
>
> V3 changes:
> * Use vma->vm_mm instead of current->mm when charging pages, for clarity
> * Document that reclaim is not possible with only anonymous page accounting
> so the OOM-killer is invoked when a limit is exceeded
> * Add TODO to implement file cache (reclaim) support or optimize away
> page_cgroup->lru
>
> V2 changes:
> * Added update of memory cgroup documentation
> * Clarify use of 'file' to distinguish anonymous mappings
>
> Signed-off-by: Steven J. Magnani <steve@digidescorp.com>
Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
BTW, I have no way of testing this; we need to rely on the NOMMU
community to test it.
--
Three Cheers,
Balbir
* Re: [PATCH V3] nommu: add anonymous page memcg accounting
2010-10-22 3:53 ` Balbir Singh
@ 2010-10-22 4:34 ` KAMEZAWA Hiroyuki
0 siblings, 0 replies; 6+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-10-22 4:34 UTC (permalink / raw)
To: balbir; +Cc: Steven J. Magnani, linux-mm, dhowells, linux-kernel
On Fri, 22 Oct 2010 09:23:03 +0530
Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> * Steven J. Magnani <steve@digidescorp.com> [2010-10-21 07:28:08]:
>
> > Add the necessary calls to track VM anonymous page usage (only).
> >
> > V3 changes:
> > * Use vma->vm_mm instead of current->mm when charging pages, for clarity
> > * Document that reclaim is not possible with only anonymous page accounting
> > so the OOM-killer is invoked when a limit is exceeded
> > * Add TODO to implement file cache (reclaim) support or optimize away
> > page_cgroup->lru
> >
> > V2 changes:
> > * Added update of memory cgroup documentation
> > * Clarify use of 'file' to distinguish anonymous mappings
> >
> > Signed-off-by: Steven J. Magnani <steve@digidescorp.com>
>
> Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
>
> BTW, I have no way of testing this; we need to rely on the NOMMU
> community to test it.
>
Yes, that's the biggest problem.
-Kame
Thread overview (6 messages):
2010-10-21 12:28 [PATCH V3] nommu: add anonymous page memcg accounting Steven J. Magnani
2010-10-22 3:20 ` KAMEZAWA Hiroyuki
2010-10-22 13:26 ` Steven J. Magnani
2010-10-25 0:13 ` KAMEZAWA Hiroyuki
2010-10-22 3:53 ` Balbir Singh
2010-10-22 4:34 ` KAMEZAWA Hiroyuki