From: Paul Jackson <pj@sgi.com>
To: Dinakar Guniguntala <dino@in.ibm.com>,
Simon Derr <Simon.Derr@bull.net>,
Erich Focht <efocht@hpce.nec.com>
Cc: linux-mm@kvack.org, Paul Jackson <pj@sgi.com>
Subject: [PATCH 1/4] cpusets oom_kill and page_alloc tweaks
Date: Sun, 10 Jul 2005 18:58:42 -0700 (PDT) [thread overview]
Message-ID: <20050711015842.23183.53450.sendpatchset@tomahawk.engr.sgi.com> (raw)
In-Reply-To: <20050711015835.23183.40213.sendpatchset@tomahawk.engr.sgi.com>
This patch applies a few comment and code cleanups, to mm/oom_kill.c
and mm/page_alloc.c, prior to applying a patch set to improve
cpuset management of memory placement.
The first comment changed in oom_kill.c, and the first one and only
one changed in page_alloc.c, were seriously misleading. The code
layout change in select_bad_process() makes room for adding another
condition on which a process can be spared the oom killer (see the
subsequent cpuset_nodes_overlap patch for this addition).
Also a couple typos and spellos that bugged me, while I was here.
And add documentation for cpuset flag notify_on_release.
This patch should have no material affect.
Signed-off-by: Paul Jackson <pj@sgi.com>
Index: linux-2.6-mem_exclusive/mm/oom_kill.c
===================================================================
--- linux-2.6-mem_exclusive.orig/mm/oom_kill.c 2005-06-29 23:11:41.000000000 -0700
+++ linux-2.6-mem_exclusive/mm/oom_kill.c 2005-07-02 17:40:03.000000000 -0700
@@ -6,8 +6,8 @@
* for goading me into coding this file...
*
* The routines in this file are used to kill a process when
- * we're seriously out of memory. This gets called from kswapd()
- * in linux/mm/vmscan.c when we really run out of memory.
+ * we're seriously out of memory. This gets called from __alloc_pages()
+ * in mm/page_alloc.c when we really run out of memory.
*
* Since we won't call these routines often (on a well-configured
* machine) this file will double as a 'coding guide' and a signpost
@@ -26,7 +26,7 @@
/**
* oom_badness - calculate a numeric value for how bad this task has been
* @p: task struct of which task we should calculate
- * @p: current uptime in seconds
+ * @uptime: current uptime in seconds
*
* The formula used is relatively simple and documented inline in the
* function. The main rationale is that we want to select a good task
@@ -57,9 +57,9 @@ unsigned long badness(struct task_struct
/*
* Processes which fork a lot of child processes are likely
- * a good choice. We add the vmsize of the childs if they
+ * a good choice. We add the vmsize of the children if they
* have an own mm. This prevents forking servers to flood the
- * machine with an endless amount of childs
+ * machine with an endless amount of children
*/
list_for_each(tsk, &p->children) {
struct task_struct *chld;
@@ -143,28 +143,32 @@ static struct task_struct * select_bad_p
struct timespec uptime;
do_posix_clock_monotonic_gettime(&uptime);
- do_each_thread(g, p)
- /* skip the init task with pid == 1 */
- if (p->pid > 1 && p->oomkilladj != OOM_DISABLE) {
- unsigned long points;
+ do_each_thread(g, p) {
+ unsigned long points;
+ int releasing;
- /*
- * This is in the process of releasing memory so wait it
- * to finish before killing some other task by mistake.
- */
- if ((unlikely(test_tsk_thread_flag(p, TIF_MEMDIE)) || (p->flags & PF_EXITING)) &&
- !(p->flags & PF_DEAD))
- return ERR_PTR(-1UL);
- if (p->flags & PF_SWAPOFF)
- return p;
-
- points = badness(p, uptime.tv_sec);
- if (points > maxpoints || !chosen) {
- chosen = p;
- maxpoints = points;
- }
+ /* skip the init task with pid == 1 */
+ if (p->pid == 1)
+ continue;
+ if (p->oomkilladj == OOM_DISABLE)
+ continue;
+ /*
+ * This is in the process of releasing memory so for wait it
+ * to finish before killing some other task by mistake.
+ */
+ releasing = test_tsk_thread_flag(p, TIF_MEMDIE) ||
+ p->flags & PF_EXITING;
+ if (releasing && !(p->flags & PF_DEAD))
+ return ERR_PTR(-1UL);
+ if (p->flags & PF_SWAPOFF)
+ return p;
+
+ points = badness(p, uptime.tv_sec);
+ if (points > maxpoints || !chosen) {
+ chosen = p;
+ maxpoints = points;
}
- while_each_thread(g, p);
+ } while_each_thread(g, p);
return chosen;
}
@@ -189,7 +193,8 @@ static void __oom_kill_task(task_t *p)
return;
}
task_unlock(p);
- printk(KERN_ERR "Out of Memory: Killed process %d (%s).\n", p->pid, p->comm);
+ printk(KERN_ERR "Out of Memory: Killed process %d (%s).\n",
+ p->pid, p->comm);
/*
* We give our sacrificial lamb high priority and access to
Index: linux-2.6-mem_exclusive/mm/page_alloc.c
===================================================================
--- linux-2.6-mem_exclusive.orig/mm/page_alloc.c 2005-06-29 23:11:41.000000000 -0700
+++ linux-2.6-mem_exclusive/mm/page_alloc.c 2005-07-02 17:40:04.000000000 -0700
@@ -898,10 +898,9 @@ rebalance:
if (likely(did_some_progress)) {
/*
- * Go through the zonelist yet one more time, keep
- * very high watermark here, this is only to catch
- * a parallel oom killing, we must fail if we're still
- * under heavy pressure.
+ * Go through the zone list yet one more time, with
+ * min watermark and trying harder. Since try_to_free_pages
+ * made some progress, there might be something free.
*/
for (i = 0; (z = zones[i]) != NULL; i++) {
if (!zone_watermark_ok(z, order, z->pages_min,
Index: linux-2.6-mem_exclusive/Documentation/cpusets.txt
===================================================================
--- linux-2.6-mem_exclusive.orig/Documentation/cpusets.txt 2005-07-02 17:40:04.000000000 -0700
+++ linux-2.6-mem_exclusive/Documentation/cpusets.txt 2005-07-02 17:40:54.000000000 -0700
@@ -143,7 +143,7 @@ Cpusets extends these two mechanisms as
The implementation of cpusets requires a few, simple hooks
into the rest of the kernel, none in performance critical paths:
- - in main/init.c, to initialize the root cpuset at system boot.
+ - in init/main.c, to initialize the root cpuset at system boot.
- in fork and exit, to attach and detach a task from its cpuset.
- in sched_setaffinity, to mask the requested CPUs by what's
allowed in that tasks cpuset.
@@ -154,7 +154,7 @@ into the rest of the kernel, none in per
and related changes in both sched.c and arch/ia64/kernel/domain.c
- in the mbind and set_mempolicy system calls, to mask the requested
Memory Nodes by what's allowed in that tasks cpuset.
- - in page_alloc, to restrict memory to allowed nodes.
+ - in page_alloc.c, to restrict memory to allowed nodes.
- in vmscan.c, to restrict page recovery to the current cpuset.
In addition a new file system, of type "cpuset" may be mounted,
@@ -182,6 +182,7 @@ containing the following files describin
- mems: list of Memory Nodes in that cpuset
- cpu_exclusive flag: is cpu placement exclusive?
- mem_exclusive flag: is memory placement exclusive?
+ - notify_on_release: call /sbin/cpuset_release_agent on exit if set
- tasks: list of tasks (by pid) attached to that cpuset
New cpusets are created using the mkdir system call or shell
@@ -356,7 +357,8 @@ Now you want to do something with this c
In this directory you can find several files:
# ls
-cpus cpu_exclusive mems mem_exclusive tasks
+cpu_exclusive mem_exclusive notify_on_release
+cpus mems tasks
Reading them will give you information about the state of this cpuset:
the CPUs and Memory Nodes it can use, the processes that are using
Index: linux-2.6-mem_exclusive/include/linux/gfp.h
===================================================================
--- linux-2.6-mem_exclusive.orig/include/linux/gfp.h 2005-07-02 17:40:04.000000000 -0700
+++ linux-2.6-mem_exclusive/include/linux/gfp.h 2005-07-02 17:42:02.000000000 -0700
@@ -39,7 +39,7 @@ struct vm_area_struct;
#define __GFP_COMP 0x4000u /* Add compound page metadata */
#define __GFP_ZERO 0x8000u /* Return zeroed page on success */
#define __GFP_NOMEMALLOC 0x10000u /* Don't use emergency reserves */
-#define __GFP_NORECLAIM 0x20000u /* No realy zone reclaim during allocation */
+#define __GFP_NORECLAIM 0x20000u /* No zone reclaim during page_cache_alloc */
#define __GFP_BITS_SHIFT 20 /* Room for 20 __GFP_FOO bits */
#define __GFP_BITS_MASK ((1 << __GFP_BITS_SHIFT) - 1)
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.650.933.1373
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>
next prev parent reply other threads:[~2005-07-11 1:58 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2005-07-11 1:58 [PATCH 0/4] cpusets mems_allowed and oom Paul Jackson
2005-07-11 1:58 ` Paul Jackson [this message]
2005-07-11 1:58 ` [PATCH 2/4] cpusets new __GFP_HARDWALL flag Paul Jackson
2005-07-11 1:58 ` [PATCH 3/4] cpusets formalize intermediate GFP_KERNEL containment Paul Jackson
2005-07-11 1:59 ` [PATCH 4/4] cpusets confine oom_killer to mem_exclusive cpuset Paul Jackson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20050711015842.23183.53450.sendpatchset@tomahawk.engr.sgi.com \
--to=pj@sgi.com \
--cc=Simon.Derr@bull.net \
--cc=dino@in.ibm.com \
--cc=efocht@hpce.nec.com \
--cc=linux-mm@kvack.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox