public inbox for linux-kernel@vger.kernel.org
* Re: [patch] mempool-2.5.1-D2
@ 2001-12-15 22:17 Ingo Molnar
  2001-12-17 16:19 ` Stephan von Krawczynski
From: Ingo Molnar @ 2001-12-15 22:17 UTC (permalink / raw)
  To: Stephan von Krawczynski; +Cc: bcrl, linux-kernel


On Sat, 15 Dec 2001, Stephan von Krawczynski wrote:

> >  - mempool_alloc(), if called from a process context, never fails. This
> >    simplifies lowlevel IO code (which often must not fail) visibly.
>
> Uh, do you trust your own word? This already sounds like an upcoming
> deadlock to me _now_. [...]

please check out how it works. It's not done by 'loop forever until
some allocation succeeds'. It's done by FIFO queueing for pool elements
that are guaranteed to be freed after some reasonable timeout. (and there
is no other freeing path that might leak the elements.)
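The guarantee described above can be modeled in a short userspace sketch (illustrative names, not the kernel code): the allocator loops, and each pass either takes a preallocated element or blocks until some holder frees one back into the pool. Because every element drawn from the reserve is freed after a bounded operation, the sleeper always makes progress.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MIN_NR 2
static void *reserve[MIN_NR];
static int curr_nr;
static int pending_frees;   /* models in-flight I/Os that will free elements */

static void wait_for_free(void)      /* stands in for sleeping on pool->wait */
{
    assert(pending_frees > 0);       /* the guarantee: a free is coming */
    pending_frees--;
    reserve[curr_nr++] = malloc(16); /* the freed element re-enters the pool */
}

static void *mempool_alloc_model(void)
{
    for (;;) {
        if (curr_nr > 0)
            return reserve[--curr_nr];  /* take a preallocated element */
        wait_for_free();                /* process context: block, then retry */
    }
}
```

The loop has no failure exit in process context; the only way it terminates is by obtaining an element, which the bounded-freeing invariant makes certain.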

> [...] I saw a lot of try-and-error during the last month regarding
> exactly this point. There have been VM-days where allocs didn't really
> fail (set with right flags), but didn't come back either. [...]

hm, iirc, the code was just re-trying the allocation infinitely (while
sleeping on kswapd_wait).

> [...] And exactly this was the reason why the stuff was _broken_.
> Obviously no process can wait an indefinitely long time to get its
> alloc fulfilled. And there are conditions under heavy load where this
> cannot be met, and you will see complete stall.

this is the problem with doing this in the (current) page allocator:
allocation and freeing of pages is done by every process, so the processes
that really need those pages for deadlock avoidance are starved. Identifying
reserved pools and creating closed circuits of allocation/freeing
relations solves this problem - 'outsiders' cannot 'steal' from the
reserve. In addition, creating pools of composite structures helps as well
in cases where multiple allocations are needed to start a guaranteed
freeing operation.
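Both points - the closed circuit and the composite structures - can be sketched in plain C (a userspace model with illustrative names, not the kernel API). One pooled element bundles everything a guaranteed-freeing operation needs, so a single reserve hit suffices, and only that circuit's own free path refills the reserve:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* A composite element: the request header and its data buffer are pooled
 * together, so one reserve hit covers the whole operation instead of two
 * allocations that could each fail independently. */
struct io_req {
    char header[32];
    char data[512];
};

#define MIN_NR 2
static struct io_req *reserve[MIN_NR];
static int curr_nr;

static struct io_req *req_alloc(int upstream_ok)
{
    if (upstream_ok)
        return malloc(sizeof(struct io_req));
    /* only this pool's users reach the reserve - 'outsiders' going
     * through the normal allocator never see it */
    return curr_nr > 0 ? reserve[--curr_nr] : NULL;
}

static void req_free(struct io_req *r)
{
    if (curr_nr < MIN_NR)
        reserve[curr_nr++] = r;   /* the closed circuit refills itself */
    else
        free(r);
}
```

An outsider hammering `malloc()` can exhaust general memory, but it cannot drain `reserve[]`; only `req_alloc()`/`req_free()` touch it.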

mempool moves deadlock avoidance to a different, and explicit level. If
everything uses mempools then the normal allocators (the page allocator)
can remove all their reserved pools and deadlock-avoidance code.

> [...] Looking at your mempool-ideas one cannot fight the impression
> that you try to "patch" around a deficiency of the current code. This
> cannot be the right thing to do.

to the contrary - i'm not 'patching around' any deficiency, i'm removing
the need to put deadlock avoidance into the page allocator. But in this
transitional period the 'old code' still stays around for a while.
If you look at Ben's patch you'll see the same kind of duality - until a
mechanism is fully used, things like that are unavoidable.

	Ingo


* [patch] mempool-2.5.1-D1
@ 2001-12-14 18:14 Ingo Molnar
  2001-12-14 19:13 ` [patch] mempool-2.5.1-D2 Ingo Molnar
From: Ingo Molnar @ 2001-12-14 18:14 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Jens Axboe, linux-kernel, Suparna Bhattacharya

[-- Attachment #1: Type: TEXT/PLAIN, Size: 288 bytes --]


there is another thinko in the mempool code, reported by Suparna
Bhattacharya. If mempool_alloc() is called from an IRQ context then we
return too early. The correct behavior is to allocate GFP_ATOMIC, if that
fails then we look at the pool and return an element, or return NULL.

	Ingo

[-- Attachment #2: Type: TEXT/PLAIN, Size: 1883 bytes --]

--- linux/mm/mempool.c.orig	Fri Dec 14 16:55:12 2001
+++ linux/mm/mempool.c	Fri Dec 14 17:03:52 2001
@@ -176,7 +176,8 @@
  *
  * this function only sleeps if the alloc_fn function sleeps or
  * returns NULL. Note that due to preallocation, this function
- * *never* fails.
+ * *never* fails when called from process contexts. (it might
+ * fail if called from an IRQ context.)
  */
 void * mempool_alloc(mempool_t *pool, int gfp_mask)
 {
@@ -185,7 +186,7 @@
 	struct list_head *tmp;
 	int curr_nr;
 	DECLARE_WAITQUEUE(wait, current);
-	int gfp_nowait = gfp_mask & ~__GFP_WAIT;
+	int gfp_nowait = gfp_mask & ~(__GFP_WAIT | __GFP_IO);
 
 repeat_alloc:
 	element = pool->alloc(gfp_nowait, pool->pool_data);
@@ -196,15 +197,11 @@
 	 * If the pool is less than 50% full then try harder
 	 * to allocate an element:
 	 */
-	if (gfp_mask != gfp_nowait) {
-		if (pool->curr_nr <= pool->min_nr/2) {
-			element = pool->alloc(gfp_mask, pool->pool_data);
-			if (likely(element != NULL))
-				return element;
-		}
-	} else
-		/* we must not sleep */
-		return NULL;
+	if ((gfp_mask != gfp_nowait) && (pool->curr_nr <= pool->min_nr/2)) {
+		element = pool->alloc(gfp_mask, pool->pool_data);
+		if (likely(element != NULL))
+			return element;
+	}
 
 	/*
 	 * Kick the VM at this point.
@@ -217,10 +214,12 @@
 		list_del(tmp);
 		element = tmp;
 		pool->curr_nr--;
-		spin_unlock_irqrestore(&pool->lock, flags);
-
-		return element;
+		goto out_unlock;
 	}
+	/* We must not sleep in the GFP_ATOMIC case */
+	if (gfp_mask == gfp_nowait)
+		goto out_unlock;
+
 	add_wait_queue_exclusive(&pool->wait, &wait);
 	set_task_state(current, TASK_UNINTERRUPTIBLE);
 
@@ -236,6 +235,9 @@
 	remove_wait_queue(&pool->wait, &wait);
 
 	goto repeat_alloc;
+out_unlock:
+	spin_unlock_irqrestore(&pool->lock, flags);
+	return element;
 }
 
 /**
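The allocation order the patch establishes can be sketched in a userspace model (illustrative names, not the kernel API): first a non-sleeping allocation, then the reserve, and only then either NULL (IRQ/GFP_ATOMIC context) or a sleep-and-retry (process context, elided here).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MIN_NR 2
static void *reserve[MIN_NR];
static int curr_nr;
static int memory_tight;   /* models the normal allocator failing */

static void *alloc_nowait(void)       /* ~ pool->alloc(gfp_nowait, ...) */
{
    return memory_tight ? NULL : malloc(16);
}

static void *mempool_alloc_sketch(int can_sleep)
{
    void *elem = alloc_nowait();       /* 1. try a non-sleeping allocation */
    if (elem)
        return elem;
    if (curr_nr > 0)
        return reserve[--curr_nr];     /* 2. hand out a reserved element */
    if (!can_sleep)
        return NULL;                   /* 3. the IRQ/GFP_ATOMIC case */
    /* process context would sleep on pool->wait here and retry */
    return NULL;
}
```

This is exactly the ordering the patch fixes: before it, the `gfp_mask == gfp_nowait` caller returned NULL before ever looking at the reserve.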



Thread overview: 13+ messages
2001-12-15 22:17 [patch] mempool-2.5.1-D2 Ingo Molnar
2001-12-17 16:19 ` Stephan von Krawczynski
2001-12-17 20:56   ` Ingo Molnar
2001-12-17 20:44     ` Benjamin LaHaise
2001-12-17 23:57     ` Stephan von Krawczynski
2001-12-18 16:43       ` Ingo Molnar
2001-12-18 15:36         ` Stephan von Krawczynski
  -- strict thread matches above, loose matches on Subject: below --
2001-12-14 18:14 [patch] mempool-2.5.1-D1 Ingo Molnar
2001-12-14 19:13 ` [patch] mempool-2.5.1-D2 Ingo Molnar
2001-12-14 22:27   ` Benjamin LaHaise
2001-12-15  6:41     ` Ingo Molnar
2001-12-15  5:29       ` Benjamin LaHaise
2001-12-15 17:50       ` Stephan von Krawczynski
2001-12-18  0:46       ` Pavel Machek
