public inbox for linux-kernel@vger.kernel.org
* Terminate process that fails on a constrained allocation
@ 2006-02-08 18:05 Christoph Lameter
From: Christoph Lameter @ 2006-02-08 18:05 UTC
  To: akpm; +Cc: pj, ak, linux-kernel

Some allocations are restricted to a limited set of nodes (due to memory
policies or cpuset constraints). If the page allocator is not able to find
enough memory then that does not mean that overall system memory is low.

In particular, going postal and more or less randomly shooting at processes
is unlikely to help the situation and may just lead to suicide (the
whole system coming down).

It is better to signal to the process that no memory exists given the
constraints that the process (or the configuration of the process) has
placed on the allocation behavior. The process may be killed but then the
sysadmin or developer can investigate the situation. The solution is similar
to what we do when we run out of hugepages.

This patch adds a check before the out-of-memory killer is invoked. At that
point performance considerations do not matter much, so we just scan the
zonelist and reconstruct the set of nodes it covers. If that set does not
contain all online nodes, then this is a constrained allocation and we should
not call the OOM killer.

Signed-off-by: Christoph Lameter <clameter@sgi.com>

Index: linux-2.6.16-rc2/mm/page_alloc.c
===================================================================
--- linux-2.6.16-rc2.orig/mm/page_alloc.c	2006-02-02 22:03:08.000000000 -0800
+++ linux-2.6.16-rc2/mm/page_alloc.c	2006-02-08 09:55:21.000000000 -0800
@@ -1011,6 +1011,28 @@ rebalance:
 		if (page)
 			goto got_pg;
 
+#ifdef CONFIG_NUMA
+		/*
+		 * In the NUMA case we may have gotten here because the
+		 * memory policies or cpusets have restricted the allocation.
+		 */
+		{
+			nodemask_t nodes;
+
+			nodes_clear(nodes);
+			for (z = zonelist->zones; *z; z++)
+				if (cpuset_zone_allowed(*z, gfp_mask))
+					node_set((*z)->zone_pgdat->node_id,
+							nodes);
+			/*
+			 * If we were only allowed to allocate from
+			 * a subset of nodes then terminate the process.
+			 */
+			if (!nodes_subset(node_online_map, nodes))
+				return NULL;
+		}
+#endif
+
 		out_of_memory(gfp_mask, order);
 		goto restart;
 	}
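
The check added by the hunk above can be modeled in user space. The following
is only a sketch under stated assumptions, not kernel code: `nodemask` is a
plain 64-bit stand-in for nodemask_t, `mask_subset()` and
`allocation_is_constrained()` are hypothetical names, and the
cpuset_zone_allowed() filter is assumed to have been applied already, so the
caller passes only the node ids of zones it may allocate from.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for nodemask_t: one bit per NUMA node (max 64). */
typedef uint64_t nodemask;

/* True if every node set in a is also set in b, i.e. a is a subset of b. */
static bool mask_subset(nodemask a, nodemask b)
{
	return (a & ~b) == 0;
}

/*
 * Rebuild the set of nodes reachable through the zonelist and report
 * whether the allocation is constrained, i.e. whether at least one
 * online node is missing from that set.
 *
 * zone_nodes[] holds the node id of each allowed zone (assumption:
 * ids are in the range 0..63).
 */
static bool allocation_is_constrained(const int *zone_nodes, size_t nr_zones,
				      nodemask online)
{
	nodemask allowed = 0;	/* start from an empty mask */
	size_t i;

	for (i = 0; i < nr_zones; i++)
		allowed |= (nodemask)1 << zone_nodes[i];

	return !mask_subset(online, allowed);
}
```

With nodes 0-3 online (mask 0xF), a zonelist covering only nodes 1 and 2 is
reported as constrained, while a zonelist covering nodes 0 through 3 is not.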

* Re: Terminate process that fails on a constrained allocation
@ 2006-02-08 20:14 linux
From: linux @ 2006-02-08 20:14 UTC
  To: clameter; +Cc: linux-kernel

Would perhaps a less drastic solution, that at least supports the common
partitioned-system configuration, be to limit the oom killer to processes
whose nodes are a *subset* of ours?

That way, a limited process won't kill any unlimited processes, but it
can fight with other processes with the same limits.  However, an
unlimited process discovering oom can kill anything on the system if
necessary.

(This requires a modified version of cpuset_excl_nodes_overlap that
calls nodes_subset() instead of nodes_intersects(), but it's pretty
straightforward.)
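
To make the difference between the two tests concrete, here is a small
user-space sketch. It is an illustration only: `nodemask`, `mask_subset()`,
`mask_intersects()` and `may_kill()` are hypothetical names, and the real
cpuset_excl_nodes_overlap() looks at cpuset state that this model ignores.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for nodemask_t: one bit per NUMA node. */
typedef uint64_t nodemask;

/* True if the two masks share at least one node. */
static bool mask_intersects(nodemask a, nodemask b)
{
	return (a & b) != 0;
}

/* True if every node set in a is also set in b. */
static bool mask_subset(nodemask a, nodemask b)
{
	return (a & ~b) == 0;
}

/*
 * Proposed rule: a candidate is eligible as an OOM victim only if the
 * nodes it may use are a subset of the nodes the allocating task
 * ("ours") may use.
 */
static bool may_kill(nodemask candidate, nodemask ours)
{
	return mask_subset(candidate, ours);
}
```

For a task limited to nodes 0-1 (mask 0x3), an unlimited candidate (mask 0xF)
intersects with it but is not a subset, so `may_kill(0xF, 0x3)` is false and
the unlimited process is spared. A candidate with the same limits (0x3) is
eligible, and an unlimited allocating task (0xF) may pick any candidate.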

