public inbox for linux-kernel@vger.kernel.org
* [patch] mempool-2.5.1-D0
@ 2001-12-14 13:49 Ingo Molnar
  2001-12-14 18:14 ` [patch] mempool-2.5.1-D1 Ingo Molnar
  0 siblings, 1 reply; 15+ messages in thread
From: Ingo Molnar @ 2001-12-14 13:49 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Jens Axboe, linux-kernel


the attached patch against -pre11 fixes a possible deadlock pointed out by
Arjan: gfp_nowait needs to exclude __GFP_IO as well, to avoid some of the
deeper deadlocks where the first ->alloc() would generate IO.

	Ingo

--- linux/mm/mempool.c.orig	Fri Dec 14 12:34:08 2001
+++ linux/mm/mempool.c	Fri Dec 14 12:35:53 2001
@@ -185,7 +185,7 @@
 	struct list_head *tmp;
 	int curr_nr;
 	DECLARE_WAITQUEUE(wait, current);
-	int gfp_nowait = gfp_mask & ~__GFP_WAIT;
+	int gfp_nowait = gfp_mask & ~(__GFP_WAIT | __GFP_IO);

 repeat_alloc:
 	element = pool->alloc(gfp_nowait, pool->pool_data);
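The one-line change above is pure bitmask stripping. A stand-alone sketch of the before/after masks, with made-up flag values (the real __GFP_* constants live in the kernel's <linux/gfp.h> and are different numbers):

```c
#include <assert.h>

/* Illustrative flag values only; the real __GFP_* constants are
 * defined in the kernel's <linux/gfp.h>. */
enum {
    DEMO_GFP_WAIT = 0x01,   /* allocator may sleep */
    DEMO_GFP_IO   = 0x02,   /* allocator may start IO to free memory */
};

/* Pre-patch: strip only the sleep bit for the first ->alloc() try. */
static int nowait_old(int gfp_mask)
{
    return gfp_mask & ~DEMO_GFP_WAIT;
}

/* Post-patch: strip the sleep bit and the IO bit, so the first,
 * supposedly cheap ->alloc() attempt can neither block nor recurse
 * into IO (the deadlock Arjan pointed out). */
static int nowait_new(int gfp_mask)
{
    return gfp_mask & ~(DEMO_GFP_WAIT | DEMO_GFP_IO);
}
```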



^ permalink raw reply	[flat|nested] 15+ messages in thread

* [patch] mempool-2.5.1-D1
  2001-12-14 13:49 [patch] mempool-2.5.1-D0 Ingo Molnar
@ 2001-12-14 18:14 ` Ingo Molnar
  2001-12-14 19:13   ` [patch] mempool-2.5.1-D2 Ingo Molnar
  0 siblings, 1 reply; 15+ messages in thread
From: Ingo Molnar @ 2001-12-14 18:14 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Jens Axboe, linux-kernel, Suparna Bhattacharya

[-- Attachment #1: Type: TEXT/PLAIN, Size: 288 bytes --]


there is another thinko in the mempool code, reported by Suparna
Bhattacharya. If mempool_alloc() is called from an IRQ context then we
return too early. The correct behavior is to allocate GFP_ATOMIC; if that
fails, we take an element from the pool, or return NULL if the pool is
empty as well.

	Ingo

[-- Attachment #2: Type: TEXT/PLAIN, Size: 1883 bytes --]

--- linux/mm/mempool.c.orig	Fri Dec 14 16:55:12 2001
+++ linux/mm/mempool.c	Fri Dec 14 17:03:52 2001
@@ -176,7 +176,8 @@
  *
  * this function only sleeps if the alloc_fn function sleeps or
  * returns NULL. Note that due to preallocation, this function
- * *never* fails.
+ * *never* fails when called from process contexts. (it might
+ * fail if called from an IRQ context.)
  */
 void * mempool_alloc(mempool_t *pool, int gfp_mask)
 {
@@ -185,7 +186,7 @@
 	struct list_head *tmp;
 	int curr_nr;
 	DECLARE_WAITQUEUE(wait, current);
-	int gfp_nowait = gfp_mask & ~__GFP_WAIT;
+	int gfp_nowait = gfp_mask & ~(__GFP_WAIT | __GFP_IO);
 
 repeat_alloc:
 	element = pool->alloc(gfp_nowait, pool->pool_data);
@@ -196,15 +197,11 @@
 	 * If the pool is less than 50% full then try harder
 	 * to allocate an element:
 	 */
-	if (gfp_mask != gfp_nowait) {
-		if (pool->curr_nr <= pool->min_nr/2) {
-			element = pool->alloc(gfp_mask, pool->pool_data);
-			if (likely(element != NULL))
-				return element;
-		}
-	} else
-		/* we must not sleep */
-		return NULL;
+	if ((gfp_mask != gfp_nowait) && (pool->curr_nr <= pool->min_nr/2)) {
+		element = pool->alloc(gfp_mask, pool->pool_data);
+		if (likely(element != NULL))
+			return element;
+	}
 
 	/*
 	 * Kick the VM at this point.
@@ -217,10 +214,12 @@
 		list_del(tmp);
 		element = tmp;
 		pool->curr_nr--;
-		spin_unlock_irqrestore(&pool->lock, flags);
-
-		return element;
+		goto out_unlock;
 	}
+	/* We must not sleep in the GFP_ATOMIC case */
+	if (gfp_mask == gfp_nowait)
+		goto out_unlock;
+
 	add_wait_queue_exclusive(&pool->wait, &wait);
 	set_task_state(current, TASK_UNINTERRUPTIBLE);
 
@@ -236,6 +235,9 @@
 	remove_wait_queue(&pool->wait, &wait);
 
 	goto repeat_alloc;
+out_unlock:
+	spin_unlock_irqrestore(&pool->lock, flags);
+	return element;
 }
 
 /**


* [patch] mempool-2.5.1-D2
  2001-12-14 18:14 ` [patch] mempool-2.5.1-D1 Ingo Molnar
@ 2001-12-14 19:13   ` Ingo Molnar
  2001-12-14 22:27     ` Benjamin LaHaise
  0 siblings, 1 reply; 15+ messages in thread
From: Ingo Molnar @ 2001-12-14 19:13 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Jens Axboe, linux-kernel, Andrew Morton

[-- Attachment #1: Type: TEXT/PLAIN, Size: 135 bytes --]


Andrew Morton noticed another bug: run_task_queue() must not be called
while the task state is TASK_UNINTERRUPTIBLE. The attached patch fixes this.

	Ingo

[-- Attachment #2: Type: TEXT/PLAIN, Size: 2026 bytes --]

--- linux/mm/mempool.c.orig	Fri Dec 14 16:55:12 2001
+++ linux/mm/mempool.c	Fri Dec 14 18:03:07 2001
@@ -176,7 +176,8 @@
  *
  * this function only sleeps if the alloc_fn function sleeps or
  * returns NULL. Note that due to preallocation, this function
- * *never* fails.
+ * *never* fails when called from process contexts. (it might
+ * fail if called from an IRQ context.)
  */
 void * mempool_alloc(mempool_t *pool, int gfp_mask)
 {
@@ -185,7 +186,7 @@
 	struct list_head *tmp;
 	int curr_nr;
 	DECLARE_WAITQUEUE(wait, current);
-	int gfp_nowait = gfp_mask & ~__GFP_WAIT;
+	int gfp_nowait = gfp_mask & ~(__GFP_WAIT | __GFP_IO);
 
 repeat_alloc:
 	element = pool->alloc(gfp_nowait, pool->pool_data);
@@ -196,15 +197,11 @@
 	 * If the pool is less than 50% full then try harder
 	 * to allocate an element:
 	 */
-	if (gfp_mask != gfp_nowait) {
-		if (pool->curr_nr <= pool->min_nr/2) {
-			element = pool->alloc(gfp_mask, pool->pool_data);
-			if (likely(element != NULL))
-				return element;
-		}
-	} else
-		/* we must not sleep */
-		return NULL;
+	if ((gfp_mask != gfp_nowait) && (pool->curr_nr <= pool->min_nr/2)) {
+		element = pool->alloc(gfp_mask, pool->pool_data);
+		if (likely(element != NULL))
+			return element;
+	}
 
 	/*
 	 * Kick the VM at this point.
@@ -218,19 +215,25 @@
 		element = tmp;
 		pool->curr_nr--;
 		spin_unlock_irqrestore(&pool->lock, flags);
-
 		return element;
 	}
+	spin_unlock_irqrestore(&pool->lock, flags);
+
+	/* We must not sleep in the GFP_ATOMIC case */
+	if (gfp_mask == gfp_nowait)
+		return NULL;
+
+	run_task_queue(&tq_disk);
+
 	add_wait_queue_exclusive(&pool->wait, &wait);
 	set_task_state(current, TASK_UNINTERRUPTIBLE);
 
+	spin_lock_irqsave(&pool->lock, flags);
 	curr_nr = pool->curr_nr;
 	spin_unlock_irqrestore(&pool->lock, flags);
 
-	if (!curr_nr) {
-		run_task_queue(&tq_disk);
+	if (!curr_nr)
 		schedule();
-	}
 
 	current->state = TASK_RUNNING;
 	remove_wait_queue(&pool->wait, &wait);
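The bug fixed here is an instance of the classic sleep/wakeup ordering rule: anything that may itself schedule (here run_task_queue()) must run before the task sets TASK_UNINTERRUPTIBLE, and the wakeup condition must be re-checked after registering as a waiter. A user-space analogue of the same discipline, using pthreads instead of kernel wait queues (a sketch, not kernel code):

```c
#include <pthread.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
static int curr_nr;                 /* stand-in for pool->curr_nr */

/* Consumer side: the analogue of the patched wait path.  We commit to
 * sleeping only while holding the lock, and re-check the condition
 * before every block, so a wakeup between "check" and "sleep" cannot
 * be lost.  Anything that may itself block (the analogue of
 * run_task_queue()) belongs before this function is entered. */
static void wait_for_element(void)
{
    pthread_mutex_lock(&lock);
    while (curr_nr == 0)             /* re-check under the lock */
        pthread_cond_wait(&cond, &lock);
    curr_nr--;                       /* take the element */
    pthread_mutex_unlock(&lock);
}

/* Producer side: the analogue of mempool_free() waking pool->wait. */
static void free_element(void)
{
    pthread_mutex_lock(&lock);
    curr_nr++;
    pthread_cond_signal(&cond);
    pthread_mutex_unlock(&lock);
}
```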


* Re: [patch] mempool-2.5.1-D2
  2001-12-14 19:13   ` [patch] mempool-2.5.1-D2 Ingo Molnar
@ 2001-12-14 22:27     ` Benjamin LaHaise
  2001-12-15  6:41       ` Ingo Molnar
  0 siblings, 1 reply; 15+ messages in thread
From: Benjamin LaHaise @ 2001-12-14 22:27 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel

On Fri, Dec 14, 2001 at 08:13:49PM +0100, Ingo Molnar wrote:
> 
> Andrew Morton noticed another bug: run_task_queue() must not be called
> while the task state is TASK_UNINTERRUPTIBLE. The attached patch fixes this.

Btw, wouldn't reservation result in the same effect as these mempools for 
significantly less code?

		-ben


* Re: [patch] mempool-2.5.1-D2
  2001-12-15  6:41       ` Ingo Molnar
@ 2001-12-15  5:29         ` Benjamin LaHaise
  2001-12-15 17:50         ` Stephan von Krawczynski
  2001-12-18  0:46         ` Pavel Machek
  2 siblings, 0 replies; 15+ messages in thread
From: Benjamin LaHaise @ 2001-12-15  5:29 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel

On Sat, Dec 15, 2001 at 07:41:12AM +0100, Ingo Molnar wrote:
> exactly what kind of SLAB-based reservation system do you have in mind?
> (what interface, how would it work, etc.) Take a look at how bio.c,
> highmem.c and raid1.c use the mempool mechanism; the main properties of
> mempool cannot be expressed via SLAB reservation:
> 
>  - mempool allows the use of non-SLAB allocators as the underlying
>    allocator. (eg. the highmem.c mempool uses the page allocator to alloc
>    lowmem pages. raid1.c uses 4 allocators: kmalloc(), page_alloc(),
>    bio_alloc() and mempool_alloc() of a different pool.)

That's of dubious value.  Personally, I think there should be two 
allocators: slab and page allocs.  Anything beyond that seems to be 
duplicating functionality.

>  - mempool_alloc(), if called from a process context, never fails. This
>    simplifies lowlevel IO code (which often must not fail) visibly.

Arguably, the same should be possible with normal allocations.  Btw, if 
you looked at the page-level reservations patch, it did just that, but 
still left the details of how that memory is reserved up to the vm.  The 
idea behind that is to reserve memory, but still allow clean and 
immediately reclaimable pages to populate the reservation until it is 
allocated (many reservations will never be touched, as they're worst-case 
journal/whatnot protection).  Plus, with a reservation in hand, an 
allocation from that reservation will never fail.

>  - mempool handles allocation in a more deadlock-avoidance-aware way than
>    a normal allocator would do:
> 
>         - first it ->alloc()'s atomically

Great.  Function calls through pointers are really not a good idea on 
modern cpus.

>         - then it tries to take from the pool if the pool is at least
>           half full
>         - then it ->alloc()'s non-atomically
>         - then it takes from the pool if it's non-empty
>         - then it waits for pool elements to be freed

Oh dear.  Another set of vm logic that has to be kept in sync with the 
behaviour of the slab, alloc_pages and try_to_free_pages.  We're already 
failing to keep alloc_pages deadlock free; how can you be certain that 
this arbitrary "half full pool" condition is not going to cause deadlocks 
for $random_arbitrary_driver?

>    this makes for five different levels of allocation, ordered for
>    performance and blocking-avoidance, while still kicking the VM and
>    trying as hard as possible if there is a resource squeeze. In the
>    normal case we never touch the mempool spinlocks, we just call
>    ->alloc() and if the core allocator does per-CPU caching then we'll
>    have the exact same high level of scalability as the underlying
>    allocator.

Again, this is duplicating functionality that doesn't need to be.  The 
one additional branch for the uncommon case that reservations add is 
far, far cheaper and easier to understand.

>  - mempool adds reservation without increasing the complexity of the
>    underlying allocators.

This is where my basic disagreement with the approach comes from.  As 
I see it, all of the logic that mempools adds is already present in the 
current system (or at the very least should be).  To give you some insight 
to how I think reservations should work and how they can simplify code in 
the current allocators, take the case of an ordinary memory allocation of 
a single page.  Quite simply, if there are no immediately free pages, we 
need to wait for another page to be returned to the free pool (this is 
identical to the logic you added in mempool that prevents a pool from 
failing an allocation).  Right now, memory allocations can fail because 
we allow ourselves to grossly overcommit memory usage.  That you're adding 
mempool to patch over that behaviour is *wrong*, imo.  The correct way to 
fix this is to make the underlying allocator behave properly: the system 
has enough information at the time of the initial allocation to 
deterministically say "yes, the vm will be able to allocate this page" or 
"no, i have to wait until another user frees up memory".  Yes, you can 
argue that we don't currently keep all the necessary statistics on hand 
to make this determination, but that's a small matter of programming.

The above reads as a bit more of a rant than I'd meant to write, but 
I think the current allocator is broken and in need of fixing, and once 
fixed there should be no need for yet another layer on top of it.

		-ben
-- 
Fish.


* Re: [patch] mempool-2.5.1-D2
  2001-12-14 22:27     ` Benjamin LaHaise
@ 2001-12-15  6:41       ` Ingo Molnar
  2001-12-15  5:29         ` Benjamin LaHaise
                           ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Ingo Molnar @ 2001-12-15  6:41 UTC (permalink / raw)
  To: Benjamin LaHaise; +Cc: linux-kernel


On Fri, 14 Dec 2001, Benjamin LaHaise wrote:

> Btw, wouldn't reservation result in the same effect as these mempools
> for significantly less code?

exactly what kind of SLAB-based reservation system do you have in mind?
(what interface, how would it work, etc.) Take a look at how bio.c,
highmem.c and raid1.c use the mempool mechanism; the main properties of
mempool cannot be expressed via SLAB reservation:

 - mempool allows the use of non-SLAB allocators as the underlying
   allocator. (eg. the highmem.c mempool uses the page allocator to alloc
   lowmem pages. raid1.c uses 4 allocators: kmalloc(), page_alloc(),
   bio_alloc() and mempool_alloc() of a different pool.)

 - mempool_alloc(), if called from a process context, never fails. This
   simplifies lowlevel IO code (which often must not fail) visibly.

 - mempool allows the pooling of arbitrarily complex memory buffers, not
   just a single SLAB buffer. (eg. the raid1.c resync pool uses a
   combination of alloc_mempool(), bio_alloc() and multiple page_alloc()
   buffers. This is also a performance enhancement for raid1.c.)

 - mempool handles allocation in a more deadlock-avoidance-aware way than
   a normal allocator would do:

        - first it ->alloc()'s atomically
        - then it tries to take from the pool if the pool is at least
          half full
        - then it ->alloc()'s non-atomically
        - then it takes from the pool if it's non-empty
        - then it waits for pool elements to be freed

   this makes for five different levels of allocation, ordered for
   performance and blocking-avoidance, while still kicking the VM and
   trying as hard as possible if there is a resource squeeze. In the
   normal case we never touch the mempool spinlocks, we just call
   ->alloc() and if the core allocator does per-CPU caching then we'll
   have the exact same high level of scalability as the underlying
   allocator.

 - mempool adds reservation without increasing the complexity of the
   underlying allocators.
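The five-level ordering above can be sketched as plain user-space C. This is a greatly simplified stand-in, not the kernel implementation: the pool here is a malloc()-backed array, locking and the wait queue are omitted, and names like pool_alloc_sketch() are illustrative only:

```c
#include <stdlib.h>

/* User-space stand-in for a mempool; alloc_fn models the underlying
 * allocator (->alloc() in mempool.c). */
typedef void *(*alloc_fn)(int can_block);

typedef struct {
    void *elements[8];
    int curr_nr;        /* elements currently sitting in the pool */
    int min_nr;         /* preallocated pool size */
    alloc_fn alloc;
} pool_t;

static void *pool_take(pool_t *p)
{
    return p->curr_nr ? p->elements[--p->curr_nr] : NULL;
}

/* The five levels, in the order listed above.  Level 5 (sleep until
 * someone frees an element back) is represented by returning NULL,
 * since a single-threaded demo has nobody to wake us up. */
static void *pool_alloc_sketch(pool_t *p)
{
    void *e;

    if ((e = p->alloc(0)))                     /* 1: atomic ->alloc()   */
        return e;
    if (p->curr_nr >= p->min_nr / 2 &&         /* 2: pool at least      */
        (e = pool_take(p)))                    /*    half full          */
        return e;
    if ((e = p->alloc(1)))                     /* 3: blocking ->alloc() */
        return e;
    if ((e = pool_take(p)))                    /* 4: pool non-empty     */
        return e;
    return NULL;                               /* 5: would sleep        */
}

/* Demo allocator: succeeds or fails on command, simulating memory
 * pressure in the underlying allocator. */
static int allow_success;
static void *demo_alloc(int can_block)
{
    (void)can_block;
    return allow_success ? malloc(16) : NULL;
}
```

In the common (no-pressure) case only level 1 runs, which is why the pool's lock is never touched and the scalability of the underlying allocator is preserved.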

	Ingo



* Re: [patch] mempool-2.5.1-D2
  2001-12-15  6:41       ` Ingo Molnar
  2001-12-15  5:29         ` Benjamin LaHaise
@ 2001-12-15 17:50         ` Stephan von Krawczynski
  2001-12-18  0:46         ` Pavel Machek
  2 siblings, 0 replies; 15+ messages in thread
From: Stephan von Krawczynski @ 2001-12-15 17:50 UTC (permalink / raw)
  To: mingo; +Cc: bcrl, linux-kernel

On Sat, 15 Dec 2001 07:41:12 +0100 (CET)
Ingo Molnar <mingo@elte.hu> wrote:

>  - mempool_alloc(), if called from a process context, never fails. This
>    simplifies lowlevel IO code (which often must not fail) visibly.

Uh, do you trust your own word? This already sounds like an upcoming deadlock
to me _now_. I saw a lot of trial-and-error during the last months regarding
exactly this point. There have been VM days where allocs didn't really fail
(set with the right flags), but didn't come back either. And exactly this was
the reason why the stuff was _broken_. Obviously no process can wait an
indefinitely long time to get its alloc fulfilled. And there are conditions
under heavy load where this cannot be met, and then you will see a complete
stall.

In fact I pretty much agree with Ben's thesis that the current allocator has a
problem. I would not call it broken, but it cannot give the ad-hoc answer to
one (_the_) important question: what is the correct cache page to drop _now_,
when resources get low and I have to successfully return an allocation?
This is _the_ central issue that must be solved in a VM with such tremendous
page caching going on as we have now. And really important is the fact that
the answer must be presentable ad hoc. If you have to loop around, wait for
I/O or whatever, then the basic design is already sub-optimal.
Looking at your mempool ideas, one cannot escape the impression that you are
trying to "patch" around a deficiency of the current code. This cannot be the
right thing to do.

Regards,
Stephan




* Re: [patch] mempool-2.5.1-D2
@ 2001-12-15 22:17 Ingo Molnar
  2001-12-17 16:19 ` Stephan von Krawczynski
  0 siblings, 1 reply; 15+ messages in thread
From: Ingo Molnar @ 2001-12-15 22:17 UTC (permalink / raw)
  To: Stephan von Krawczynski; +Cc: bcrl, linux-kernel


On Sat, 15 Dec 2001, Stephan von Krawczynski wrote:

> >  - mempool_alloc(), if called from a process context, never fails. This
> >    simplifies lowlevel IO code (which often must not fail) visibly.
>
> Uh, do you trust your own word? This already sounds like an upcoming
> deadlock to me _now_. [...]

please check out how it works. It's not done by 'loop forever until
some allocation succeeds'. It's done by FIFO queueing for pool elements
that are guaranteed to be freed after some reasonable timeout. (and there
is no other freeing path that might leak the elements.)

> [...] I saw a lot of try-and-error during the last month regarding
> exactly this point. There have been VM-days where allocs didn't really
> fail (set with right flags), but didn't come back either. [...]

hm, iirc, the code was just re-trying the allocation infinitely (while
sleeping on kswapd_wait).

> [...] And exactly this was the reason why the stuff was _broken_.
> Obviously no process can wait a indefinitely long time to get its
> alloc fulfilled. And there are conditions under heavy load where this
> cannot be met, and you will see complete stall.

this is the problem with doing this in the (current) page allocator:
allocation and freeing of pages is done by every process, so the real ones
that need those pages for deadlock avoidance are starved. Identifying
reserved pools and creating closed circuits of allocation/freeing
relations solves this problem - 'outsiders' cannot 'steal' from the
reserve. In addition, creating pools of composite structures helps as well
in cases where multiple allocations are needed to start a guaranteed
freeing operation.

mempool moves deadlock avoidance to a different, and explicit level. If
everything uses mempools then the normal allocators (the page allocator)
can remove all their reserved pools and deadlock-avoidance code.

> [...] Looking at your mempool-ideas one cannot fight the impression
> that you try to "patch" around a deficiency of the current code. This
> cannot be the right thing to do.

to the contrary - i'm not 'patching around' any deficiency, i'm removing
the need to put deadlock avoidance into the page allocator. But in this
transitional period of time the 'old code' still stays around for a while.
If you look at Ben's patch you'll see the same kind of duality - until a
mechanism is fully used, things like that are unavoidable.

	Ingo



* Re: [patch] mempool-2.5.1-D2
  2001-12-15 22:17 Ingo Molnar
@ 2001-12-17 16:19 ` Stephan von Krawczynski
  2001-12-17 20:56   ` Ingo Molnar
  0 siblings, 1 reply; 15+ messages in thread
From: Stephan von Krawczynski @ 2001-12-17 16:19 UTC (permalink / raw)
  To: mingo; +Cc: bcrl, linux-kernel

On Sat, 15 Dec 2001 23:17:56 +0100 (CET)
Ingo Molnar <mingo@elte.hu> wrote:

> 
> On Sat, 15 Dec 2001, Stephan von Krawczynski wrote:
> 
> > >  - mempool_alloc(), if called from a process context, never fails. This
> > >    simplifies lowlevel IO code (which often must not fail) visibly.
> >
> > Uh, do you trust your own word? This already sounds like an upcoming
> > deadlock to me _now_. [...]
> 
> please check it out how it works. It's not done by 'loop forever until
> some allocation succeeds'. It's done by FIFO queueing for pool elements
> that are guaranteed to be freed after some reasonable timeout. (and there
> is no other freeing path that might leak the elements.)

This is like solving a problem by not looking at it. You will obviously _not_
shoot down allocated and still-used bios, no matter how long they are going to
take. So your fixed-size pool will run out in certain (maybe weird) conditions.
If you cannot resize (alloc additional mem from standard VM) you are just dead.

Look at it from a different point of view: it's basically all the same. Standard
VM has a limited resource and tries to give it away in an intelligent way.
Mempool does the same thing - for a smaller-sized environment. But that is per
se no gain. And just as Andrea pointed out, the not-used part of the resources
is just plain wasted - though he thinks this is _good_ because it is simpler in
design and implementation.

But on the other hand, you could just do it vice versa: don't make the
mempools, make a cache-pool. VM handles memory, and we use a fixed-size (but
resizeable) mem block as a pure pool for the page cache. Every page that is
somehow locked down (iow _used_, and not simply cached) is pulled out of the
cache-pool. The cache-pool ages (hello rik :-), but stays the same size. You
end up with _lots_ of _free_ mem under normal loads and acceptable performance.
This is not a good design, but it doesn't need to answer the question of which
pages to expel under pressure, because by definition in this design there is
nothing to expel/drop. When mem gets low it really _is_ low, because your
applications ate it all up.

The current design cannot answer this question correctly, because I must not be
able to see allocation failures in a box with 1 GB of RAM, very few running
applications - and a huge page cache. But they are there. So there is a
problem, probably in the implementation of a working design. The answer
"drivers must be able to cope with failing allocs" is WRONG WRONG WRONG. They
should not oops, ok, but they cannot stand such a situation; you will always
lose something (probably data).

Your good points in mempool usage all come down to the simple fact that there
is a memory reserve that is not touched by the page cache. There are about 29
ways to achieve this same goal - and most of them are a lot more
straightforward and require fewer changes in the rest of the kernel.

Please _solve_ the problem, do not _spread_ it.

Regards,
Stephan



* Re: [patch] mempool-2.5.1-D2
  2001-12-17 20:56   ` Ingo Molnar
@ 2001-12-17 20:44     ` Benjamin LaHaise
  2001-12-17 23:57     ` Stephan von Krawczynski
  1 sibling, 0 replies; 15+ messages in thread
From: Benjamin LaHaise @ 2001-12-17 20:44 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Stephan von Krawczynski, linux-kernel

On Mon, Dec 17, 2001 at 09:56:07PM +0100, Ingo Molnar wrote:
> sure, the pool will run out under heavy VM load. Will it stay empty
> forever? Nope, because all mempool users are *required* to deallocate the
> buffer after some (reasonable) timeout. (such as IO latency.) This is
> pretty much by definition. (Sure there might be weird cases like IO
> failure timeouts, but sooner or later the buffer will be returned, and it
> will be reused.)

loop.  deadlock.  kmap.  deadlock.  You've got a lot of code to fix before 
this statement is remotely true.

> (by the way, this is true for every other reservation solution as well,
> just look at the patches. You wont resize on the fly whenever there is
> shortage - thats the problem with shortages, there just wont be more RAM.
> If anyone uses reserved pools and doesnt release those buffers then we are
> deadlocked. Memory reserves *must not* be used as a kmalloc pool. Doing
> that can be considered an advanced form of a 'memory leak'.)

Absolutely.  That's why I think we should at least do some work on design 
of the code so that we have an idea of what the pitfalls are, plus 
documentation before putting it into the kernel.

		-ben
-- 
Fish.


* Re: [patch] mempool-2.5.1-D2
  2001-12-17 16:19 ` Stephan von Krawczynski
@ 2001-12-17 20:56   ` Ingo Molnar
  2001-12-17 20:44     ` Benjamin LaHaise
  2001-12-17 23:57     ` Stephan von Krawczynski
  0 siblings, 2 replies; 15+ messages in thread
From: Ingo Molnar @ 2001-12-17 20:56 UTC (permalink / raw)
  To: Stephan von Krawczynski; +Cc: bcrl, linux-kernel


On Mon, 17 Dec 2001, Stephan von Krawczynski wrote:

> [...] You will obviously _not_ shoot down allocated and still used
> bios, no matter how long they are going to take. So your fixed size
> pool will run out in certain (maybe weird) conditions. If you cannot
> resize (alloc additional mem from standard VM) you are just dead.

sure, the pool will run out under heavy VM load. Will it stay empty
forever? Nope, because all mempool users are *required* to deallocate the
buffer after some (reasonable) timeout. (such as IO latency.) This is
pretty much by definition. (Sure there might be weird cases like IO
failure timeouts, but sooner or later the buffer will be returned, and it
will be reused.)

(by the way, this is true for every other reservation solution as well,
just look at the patches. You won't resize on the fly whenever there is a
shortage - that's the problem with shortages, there just won't be more RAM.
If anyone uses reserved pools and doesn't release those buffers then we are
deadlocked. Memory reserves *must not* be used as a kmalloc pool. Doing
that can be considered an advanced form of a 'memory leak'.)

(and there is mempool_resize() if some aspect of the device is changed.)

	Ingo



* Re: [patch] mempool-2.5.1-D2
  2001-12-17 20:56   ` Ingo Molnar
  2001-12-17 20:44     ` Benjamin LaHaise
@ 2001-12-17 23:57     ` Stephan von Krawczynski
  2001-12-18 16:43       ` Ingo Molnar
  1 sibling, 1 reply; 15+ messages in thread
From: Stephan von Krawczynski @ 2001-12-17 23:57 UTC (permalink / raw)
  To: mingo; +Cc: bcrl, linux-kernel

> On Mon, 17 Dec 2001, Stephan von Krawczynski wrote:
>
> > [...] You will obviously _not_ shoot down allocated and still used
> > bios, no matter how long they are going to take. So your fixed size
> > pool will run out in certain (maybe weird) conditions. If you cannot
> > resize (alloc additional mem from standard VM) you are just dead.
>
> sure, the pool will run out under heavy VM load. Will it stay empty
> forever? Nope, because all mempool users are *required* to deallocate the
> buffer after some (reasonable) timeout. (such as IO latency.) This is
> pretty much by definition. (Sure there might be weird cases like IO
> failure timeouts, but sooner or later the buffer will be returned, and it
> will be reused.)

Hm, and where is the real-world difference to standard VM? I mean today
your bad-ass application gets shot down by L's oom-killer and your VM will
"refill". So you're not going to die for long in the current situation
either.

I have yet to see the brilliance in mempools. I mean, for sure I can
imagine systems that are going to like it (e.g. embedded) a _lot_. But
these are far off the "standard" system profile.

I asked this several times now, and I will continue to: where is the VM
_design_ guru that explains the designed short path to drop page caches
when in need of allocable mem, regarding a system with aggressive caching
like 2.4? This _must_ exist. If it does not, the whole issue is broken,
and it is obvious that nobody will ever find an acceptable implementation.

I turned this problem about a hundred times round now, and as far as I can
see everything comes down to the simple fact that the VM has to _know_ the
difference between an only-cached page and a _really-used_ one. And I do
agree with Rik that the only-cached pages need an aging algorithm,
probably a most-simple approach (could be list ordering). This should
answer the question: who's dropped next? On the other hand you have aging
in the used pages for finding out who's swapped out next. BUT I would say
that swapping should only happen when only-cached pages are down to a
minimum level (like 5% of memtotal).

Forgive my simplistic approach - where are the guys to shoot me? And where
the hell is the need for mempool in this rough design idea?

Regards,
Stephan



* Re: [patch] mempool-2.5.1-D2
  2001-12-15  6:41       ` Ingo Molnar
  2001-12-15  5:29         ` Benjamin LaHaise
  2001-12-15 17:50         ` Stephan von Krawczynski
@ 2001-12-18  0:46         ` Pavel Machek
  2 siblings, 0 replies; 15+ messages in thread
From: Pavel Machek @ 2001-12-18  0:46 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Benjamin LaHaise, linux-kernel

Hi!

>  - mempool_alloc(), if called from a process context, never fails. This
>    simplifies lowlevel IO code (which often must not fail) visibly.

Really? I do not see how you can guarantee this on a machine with a finite
amount of memory.
								Pavel
-- 
Philips Velo 1: 1"x4"x8", 300gram, 60, 12MB, 40bogomips, linux, mutt,
details at http://atrey.karlin.mff.cuni.cz/~pavel/velo/index.html.



* Re: [patch] mempool-2.5.1-D2
  2001-12-18 16:43       ` Ingo Molnar
@ 2001-12-18 15:36         ` Stephan von Krawczynski
  0 siblings, 0 replies; 15+ messages in thread
From: Stephan von Krawczynski @ 2001-12-18 15:36 UTC (permalink / raw)
  To: mingo; +Cc: bcrl, linux-kernel

On Tue, 18 Dec 2001 17:43:01 +0100 (CET)
Ingo Molnar <mingo@elte.hu> wrote:

> 
> On Tue, 18 Dec 2001, Stephan von Krawczynski wrote:
> 
> > Hm, and where is the real-world-difference to standard VM? I mean
> > today your bad-ass application gets shot down by L's oom-killer and
> > your VM will "refill". So you're not going to die for long in the
> > current situation either. [...]
> 
> Think of the following trivial case: 'the whole system is full of dirty
> pagecache pages, the rest is kmalloc()ed somewhere'. Nothing to oom,
> nothing to kill, plenty of swap left and no RAM. And besides, in this
> situation, oom is the worst possible answer, the application getting
> oom-ed is not at fault in this case.

You are right that this is a broken situation. Now your answer is a _specific_
patch: you say "let the nice people using mempools survive (and fuck the rest
(implicit))". You do not solve the problem, you drive _around_ it for _certain_
VM users (the mempool guys). This is _not_ the correct answer to the situation.
In my eyes "correct" would mean asking: what is the reason the (dirty)
pagecache is able to eat up all my mem (which may be _plenty_)? Something is
wrong with the design then. Remember the basics (very meta, this one :-):
object vm {
	pool with free mem
	pool with cached-only mem
	pool with dirty-cache mem (your naming)
	...
	func cached_to_dirty_mem
	func dirty_to_cached_mem
	func drop_cached_mem
	func alloc_cached_mem
	...
}

Your problem obviously is that the function moving pages from dirty to cached
(meaning not dirty) is either not existing or not working, and that's why your
situation is broken (there is nothing that can be dropped, meaning the pool
with cached-only mem is empty and cannot be refilled from the dirty cache). If
it does not exist, then the basic design relies on _external_ undefineds (VM
users dropping the dirty pages themselves at undefined time and order) and is
therefore provably incomplete and broken for all management cases with limited
resources (like mem). You cannot really intend to save the broken state by
saying "let it break, but not for mempool". As long as you cannot create a
closed circle in VM you will break.

But even inside mempools you fight the same problem. You have to rely on
mempool users giving back the resources early enough to handle the new
requests. If the timed circle doesn't work out as expected you increase your
mempool (resize). It's all the same. This is exactly like a VM situation
without (or without aggressive) page cache: you upgrade RAM if you run out.
This analogy brings up the real and simple cure inside your design: you made
the page cache go away from the mempools. This would obviously cure VM, too.
But it would seriously hit performance, so it's a no-no.

The one really working example of limited-resource management inside linux is
the scheduler. There you have "users" (processes) that may or may not have work
to do, and when there is "no work" (e.g. at idle) you may very well run niced
processes (in simplification, _one_) that eat up the "rest" of the resources to
make something of them. But if a "real" user comes in and wants resources, the
nice one has to go away. It is a complete design.

In the VM, the page-cache should be such a special-case "nice" user: it may use
all available resources, but it has to vanish when someone really needs them.
This is currently _not_ solved, the design is incomplete, and it therefore
contains big black holes like the situation you described.

Regards,
Stephan


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [patch] mempool-2.5.1-D2
  2001-12-17 23:57     ` Stephan von Krawczynski
@ 2001-12-18 16:43       ` Ingo Molnar
  2001-12-18 15:36         ` Stephan von Krawczynski
  0 siblings, 1 reply; 15+ messages in thread
From: Ingo Molnar @ 2001-12-18 16:43 UTC (permalink / raw)
  To: Stephan von Krawczynski; +Cc: bcrl, linux-kernel


On Tue, 18 Dec 2001, Stephan von Krawczynski wrote:

> Hm, and where is the real-world-difference to standard VM? I mean
> today your bad-ass application gets shot down by L's oom-killer and
> your VM will "refill". So you're not going to die for long in the
> current situation either. [...]

Think of the following trivial case: 'the whole system is full of dirty
pagecache pages, the rest is kmalloc()ed somewhere'. Nothing to oom,
nothing to kill, plenty of swap left and no RAM. And besides, in this
situation, oom is the worst possible answer, the application getting
oom-ed is not at fault in this case.

	Ingo


^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2001-12-21 19:15 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2001-12-14 13:49 [patch] mempool-2.5.1-D0 Ingo Molnar
2001-12-14 18:14 ` [patch] mempool-2.5.1-D1 Ingo Molnar
2001-12-14 19:13   ` [patch] mempool-2.5.1-D2 Ingo Molnar
2001-12-14 22:27     ` Benjamin LaHaise
2001-12-15  6:41       ` Ingo Molnar
2001-12-15  5:29         ` Benjamin LaHaise
2001-12-15 17:50         ` Stephan von Krawczynski
2001-12-18  0:46         ` Pavel Machek
  -- strict thread matches above, loose matches on Subject: below --
2001-12-15 22:17 Ingo Molnar
2001-12-17 16:19 ` Stephan von Krawczynski
2001-12-17 20:56   ` Ingo Molnar
2001-12-17 20:44     ` Benjamin LaHaise
2001-12-17 23:57     ` Stephan von Krawczynski
2001-12-18 16:43       ` Ingo Molnar
2001-12-18 15:36         ` Stephan von Krawczynski

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox