From: Matthew Dobson <colpatch@us.ibm.com>
To: Christoph Lameter <clameter@engr.sgi.com>
Cc: linux-kernel@vger.kernel.org, sri@us.ibm.com, andrea@suse.de,
	pavel@suse.cz, linux-mm@kvack.org
Subject: Re: [patch 3/9] mempool - Make mempools NUMA aware
Date: Thu, 26 Jan 2006 16:15:47 -0800
Message-ID: <43D96633.4080900@us.ibm.com>
In-Reply-To: <Pine.LNX.4.62.0601261525570.18810@schroedinger.engr.sgi.com>

Christoph Lameter wrote:
> On Thu, 26 Jan 2006, Matthew Dobson wrote:
> 
> 
>>alloc_pages_node() does not guarantee allocation on a specific node, but
>>calling __alloc_pages() with a specific nodelist would.
> 
> 
> True but you have emergency *_node function that do not take nodelists.

Agreed.


>>>There is no way that you would need this patch.
>>
>>My goal was to not change the behavior of the slab allocator when inserting
>>a mempool-backed allocator "under" it.  Without support for at least
>>*requesting* allocations from a specific node when allocating from a
>>mempool, this would change how the slab allocator works.  That would be
>>bad.  The slab allocator now does not guarantee that, for example, a
>>kmalloc_node() request is satisfied by memory from the requested node, but
>>it does at least TRY.  Without adding mempool_alloc_node() then I would
>>never be able to even TRY to satisfy a mempool-backed kmalloc_node()
>>request from the correct node.  I believe that would constitute an
>>unacceptable breakage from normal, documented behavior.  So, I *do* need
>>this patch.
> 
> 
> If you get to the emergency lists then you are already in a tight memory 
> situation. In that situation it does not make sense to worry about the 
> node number the memory is coming from. kmalloc_node is just a kmalloc with 
> an indication of a preference of where the memory should be coming from. 
> The node locality only influences performance and not correctness.
> 
> There is no change to the way the slab allocator works. Just drop the 
> *_node variants.

If you look more carefully at how the emergency mempools are used, I think
you'll better understand why I did this:

Look at patch 9/9, specifically the changes to kmem_getpages():

-	page = alloc_pages_node(nodeid, flags, cachep->gfporder);
+	/*
+	 * If this allocation request isn't backed by a memory pool, or if that
+	 * memory pool's gfporder is not the same as the cache's gfporder, fall
+	 * back to alloc_pages_node().
+	 */
+	if (!pool || cachep->gfporder != (int)pool->pool_data)
+		page = alloc_pages_node(nodeid, flags, cachep->gfporder);
+	else
+		page = mempool_alloc_node(pool, flags, nodeid);

Allocations backed by a mempool must always be allocated via
mempool_alloc() (or mempool_alloc_node() in this case).  What that means
is, without a mempool_alloc_node() function, NO mempool-backed allocations
will be able to request a specific node, even when the system has PLENTY of
memory!  This, IMO, is unacceptable.  Adding more NUMA-awareness to the
mempool system allows us to keep the same slab behavior as before, as well
as leaving us free to ignore the node requests when memory is low.
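
To make the argument above concrete, here is a rough userspace sketch of the
behavior I'm after. It is NOT the patch and NOT kernel code: every name in it
(toy_system, toy_pool, toy_alloc_node, toy_mempool_alloc_node, and so on) is
made up for illustration, and the "page allocator" is just a per-node counter.
The only things it assumes are what is said above: a node request is a
preference rather than a guarantee, and the emergency reserve is used only
once the normal allocation fails, at which point the node is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_NODES 2      /* hypothetical two-node machine */
#define ANY_NODE (-1)   /* "no node preference" / "node unknown" */

/* Toy stand-in for the page allocator: free pages left on each node. */
struct toy_system {
	int free_pages[NR_NODES];
};

/* Toy stand-in for a mempool: a count of pre-allocated reserve elements. */
struct toy_pool {
	int reserve;
};

struct toy_result {
	bool ok;            /* did the request succeed at all? */
	int node;           /* node that satisfied it, ANY_NODE if from reserve */
	bool from_reserve;  /* true if the emergency reserve was used */
};

/*
 * Models a node *preference*: try the requested node first, then any
 * other node with free pages.  Leaves *got untouched on failure.
 */
bool toy_alloc_node(struct toy_system *sys, int nid, int *got)
{
	if (nid >= 0 && nid < NR_NODES && sys->free_pages[nid] > 0) {
		sys->free_pages[nid]--;
		*got = nid;
		return true;
	}
	for (int n = 0; n < NR_NODES; n++) {
		if (sys->free_pages[n] > 0) {
			sys->free_pages[n]--;
			*got = n;
			return true;
		}
	}
	return false;
}

/*
 * Node-aware pool allocation: honor the node hint while memory is
 * available, and fall back to the reserve (ignoring the node) only
 * when the normal allocation fails.
 */
struct toy_result toy_mempool_alloc_node(struct toy_system *sys,
					 struct toy_pool *pool, int nid)
{
	struct toy_result r = { false, ANY_NODE, false };

	assert(sys != NULL && pool != NULL);

	if (toy_alloc_node(sys, nid, &r.node)) {
		r.ok = true;
		return r;
	}
	if (pool->reserve > 0) {
		pool->reserve--;
		r.ok = true;
		r.from_reserve = true;
	}
	return r;
}

/* Without a _node variant, the caller's node hint never gets this far. */
struct toy_result toy_mempool_alloc(struct toy_system *sys,
				    struct toy_pool *pool)
{
	return toy_mempool_alloc_node(sys, pool, ANY_NODE);
}
```

In this toy model, a request for node 1 on a system with free pages on both
nodes lands on node 1 through toy_mempool_alloc_node(), but lands on node 0
through toy_mempool_alloc(), because the hint is dropped before the
allocator ever sees it. Once both nodes are exhausted, both calls behave
identically and draw from the reserve.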

-Matt
