* [PATCH] MM : alloc_large_system_hash() can free some memory for non power-of-two bucketsize
From: Eric Dumazet @ 2007-05-18 9:54 UTC
To: Andrew Morton, linux-mm@kvack.org; +Cc: linux kernel, David Miller
alloc_large_system_hash() is called at boot time to allocate space for several large hash tables.
Recently, the TCP hash table was changed and its bucket size is no longer a power of two.
On most setups, alloc_large_system_hash() allocates one big page (order > 0) with __get_free_pages(GFP_ATOMIC, order). This single high-order page has a power-of-two size, which is bigger than the size the table needs.
We can free all the pages that won't be used by the hash table.
On a 1GB i386 machine, this patch saves 128 KB of LOWMEM memory.
TCP established hash table entries: 32768 (order: 6, 393216 bytes)
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
---
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index ae96dd8..2e0ba08 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3350,6 +3350,20 @@ void *__init alloc_large_system_hash(const char *tablename,
 			for (order = 0; ((1UL << order) << PAGE_SHIFT) < size; order++)
 				;
 			table = (void*) __get_free_pages(GFP_ATOMIC, order);
+			/*
+			 * If bucketsize is not a power-of-two, we may free
+			 * some pages at the end of hash table.
+			 */
+			if (table) {
+				unsigned long alloc_end = (unsigned long)table +
+						(PAGE_SIZE << order);
+				unsigned long used = (unsigned long)table +
+						PAGE_ALIGN(size);
+				while (used < alloc_end) {
+					free_page(used);
+					used += PAGE_SIZE;
+				}
+			}
 		}
 	} while (!table && size > PAGE_SIZE && --log2qty);
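
As a cross-check of the 128 KB figure, here is a minimal userspace sketch of the sizing arithmetic, assuming 4 KiB pages (a PAGE_SHIFT of 12, as on i386). The sketch_* names are made up for illustration and are not kernel symbols; only the order loop is copied from the patch. For the 393216-byte table in the boot message, the loop picks order 7 (512 KiB), the page-aligned table ends after 96 pages, and the 32 pages past it account for the 128 KB. The "order: 6" printed in the boot message seems to be computed separately from this loop, so it need not match.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values for a 32-bit x86 build with 4 KiB pages. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (1UL << SKETCH_PAGE_SHIFT)

/* Round a byte count up to a whole number of pages (mirrors PAGE_ALIGN). */
unsigned long sketch_page_align(unsigned long size)
{
	return (size + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);
}

/* Smallest order whose block covers size; same loop as in the patch. */
unsigned int sketch_order_for(unsigned long size)
{
	unsigned int order;

	for (order = 0; ((1UL << order) << SKETCH_PAGE_SHIFT) < size; order++)
		;
	return order;
}

/* Bytes between the page-aligned end of the table and the end of the block. */
unsigned long sketch_bytes_freed(unsigned long size)
{
	unsigned long allocated = SKETCH_PAGE_SIZE << sketch_order_for(size);

	return allocated - sketch_page_align(size);
}
```

With these helpers, sketch_order_for(393216) gives 7 and sketch_bytes_freed(393216) gives 131072. A table whose size is already a power of two, such as 262144 bytes, frees nothing.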
* Re: [PATCH] MM : alloc_large_system_hash() can free some memory for non power-of-two bucketsize
From: Christoph Lameter @ 2007-05-18 18:21 UTC
To: Eric Dumazet
Cc: Andrew Morton, linux-mm@kvack.org, linux kernel, David Miller
On Fri, 18 May 2007, Eric Dumazet wrote:
>  			table = (void*) __get_free_pages(GFP_ATOMIC, order);
ATOMIC? Is there some reason why we need atomic here?
> +			/*
> +			 * If bucketsize is not a power-of-two, we may free
> +			 * some pages at the end of hash table.
> +			 */
> +			if (table) {
> +				unsigned long alloc_end = (unsigned long)table +
> +						(PAGE_SIZE << order);
> +				unsigned long used = (unsigned long)table +
> +						PAGE_ALIGN(size);
> +				while (used < alloc_end) {
> +					free_page(used);
Isn't this going to interfere with the kernel_map_pages debug stuff?
* Re: [PATCH] MM : alloc_large_system_hash() can free some memory for non power-of-two bucketsize
From: Andrew Morton @ 2007-05-19 8:37 UTC
To: Eric Dumazet; +Cc: linux-mm@kvack.org, linux kernel, David Miller
On Fri, 18 May 2007 11:54:54 +0200 Eric Dumazet <dada1@cosmosbay.com> wrote:
> alloc_large_system_hash() is called at boot time to allocate space for several large hash tables.
>
> Recently, the TCP hash table was changed and its bucket size is no longer a power of two.
>
> On most setups, alloc_large_system_hash() allocates one big page (order > 0) with __get_free_pages(GFP_ATOMIC, order). This single high-order page has a power-of-two size, which is bigger than the size the table needs.
Watch the 200-column text, please.
> We can free all the pages that won't be used by the hash table.
>
> On a 1GB i386 machine, this patch saves 128 KB of LOWMEM memory.
>
> TCP established hash table entries: 32768 (order: 6, 393216 bytes)
>
> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
> ---
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index ae96dd8..2e0ba08 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3350,6 +3350,20 @@ void *__init alloc_large_system_hash(const char *tablename,
>  			for (order = 0; ((1UL << order) << PAGE_SHIFT) < size; order++)
>  				;
>  			table = (void*) __get_free_pages(GFP_ATOMIC, order);
> +			/*
> +			 * If bucketsize is not a power-of-two, we may free
> +			 * some pages at the end of hash table.
> +			 */
> +			if (table) {
> +				unsigned long alloc_end = (unsigned long)table +
> +						(PAGE_SIZE << order);
> +				unsigned long used = (unsigned long)table +
> +						PAGE_ALIGN(size);
> +				while (used < alloc_end) {
> +					free_page(used);
> +					used += PAGE_SIZE;
> +				}
> +			}
>  		}
>  	} while (!table && size > PAGE_SIZE && --log2qty);
>
It went BUG.
static inline int put_page_testzero(struct page *page)
{
	VM_BUG_ON(atomic_read(&page->_count) == 0);
	return atomic_dec_and_test(&page->_count);
}
http://userweb.kernel.org/~akpm/s5000523.jpg
http://userweb.kernel.org/~akpm/config-vmm.txt
* Re: [PATCH] MM : alloc_large_system_hash() can free some memory for non power-of-two bucketsize
From: Eric Dumazet @ 2007-05-19 18:07 UTC
To: Andrew Morton, David Howells
Cc: linux-mm@kvack.org, linux kernel, David Miller
Andrew Morton wrote:
> On Fri, 18 May 2007 11:54:54 +0200 Eric Dumazet <dada1@cosmosbay.com> wrote:
>
>> alloc_large_system_hash() is called at boot time to allocate space for several large hash tables.
>>
>> Recently, the TCP hash table was changed and its bucket size is no longer a power of two.
>>
>> On most setups, alloc_large_system_hash() allocates one big page (order > 0) with __get_free_pages(GFP_ATOMIC, order). This single high-order page has a power-of-two size, which is bigger than the size the table needs.
>
> Watch the 200-column text, please.
>
>> We can free all the pages that won't be used by the hash table.
>>
>> On a 1GB i386 machine, this patch saves 128 KB of LOWMEM memory.
>>
>> TCP established hash table entries: 32768 (order: 6, 393216 bytes)
>>
>> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
>> ---
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index ae96dd8..2e0ba08 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -3350,6 +3350,20 @@ void *__init alloc_large_system_hash(const char *tablename,
>>  			for (order = 0; ((1UL << order) << PAGE_SHIFT) < size; order++)
>>  				;
>>  			table = (void*) __get_free_pages(GFP_ATOMIC, order);
>> +			/*
>> +			 * If bucketsize is not a power-of-two, we may free
>> +			 * some pages at the end of hash table.
>> +			 */
>> +			if (table) {
>> +				unsigned long alloc_end = (unsigned long)table +
>> +						(PAGE_SIZE << order);
>> +				unsigned long used = (unsigned long)table +
>> +						PAGE_ALIGN(size);
>> +				while (used < alloc_end) {
>> +					free_page(used);
>> +					used += PAGE_SIZE;
>> +				}
>> +			}
>>  		}
>>  	} while (!table && size > PAGE_SIZE && --log2qty);
>>
>
> It went BUG.
>
> static inline int put_page_testzero(struct page *page)
> {
> 	VM_BUG_ON(atomic_read(&page->_count) == 0);
> 	return atomic_dec_and_test(&page->_count);
> }
>
> http://userweb.kernel.org/~akpm/s5000523.jpg
> http://userweb.kernel.org/~akpm/config-vmm.txt
I see :(
Maybe David has an idea of how this can be done properly?
Ref: http://marc.info/?l=linux-netdev&m=117706074825048&w=2
* Re: [PATCH] MM : alloc_large_system_hash() can free some memory for non power-of-two bucketsize
From: William Lee Irwin III @ 2007-05-19 18:21 UTC
To: Eric Dumazet
Cc: Andrew Morton, linux-mm@kvack.org, linux kernel, David Miller
On Fri, May 18, 2007 at 11:54:54AM +0200, Eric Dumazet wrote:
> alloc_large_system_hash() is called at boot time to allocate space
> for several large hash tables.
> Recently, the TCP hash table was changed and its bucket size is no
> longer a power of two.
> On most setups, alloc_large_system_hash() allocates one big page
> (order > 0) with __get_free_pages(GFP_ATOMIC, order). This single
> high-order page has a power-of-two size, which is bigger than the
> size the table needs.
> We can free all the pages that won't be used by the hash table.
> On a 1GB i386 machine, this patch saves 128 KB of LOWMEM memory.
> TCP established hash table entries: 32768 (order: 6, 393216 bytes)
The proper way to do this is to convert the large system hashtable
users to use some data structure / algorithm other than hashing by
separate chaining.
-- wli
* Re: [PATCH] MM : alloc_large_system_hash() can free some memory for non power-of-two bucketsize
From: Eric Dumazet @ 2007-05-19 18:41 UTC
To: William Lee Irwin III
Cc: Andrew Morton, linux-mm@kvack.org, linux kernel, David Miller
William Lee Irwin III wrote:
> On Fri, May 18, 2007 at 11:54:54AM +0200, Eric Dumazet wrote:
>> alloc_large_system_hash() is called at boot time to allocate space
>> for several large hash tables.
>> Recently, the TCP hash table was changed and its bucket size is no
>> longer a power of two.
>> On most setups, alloc_large_system_hash() allocates one big page
>> (order > 0) with __get_free_pages(GFP_ATOMIC, order). This single
>> high-order page has a power-of-two size, which is bigger than the
>> size the table needs.
>> We can free all the pages that won't be used by the hash table.
>> On a 1GB i386 machine, this patch saves 128 KB of LOWMEM memory.
>> TCP established hash table entries: 32768 (order: 6, 393216 bytes)
>
> The proper way to do this is to convert the large system hashtable
> users to use some data structure / algorithm other than hashing by
> separate chaining.
No thanks. This was already discussed to death on netdev. To date, hash tables
are a good compromise.
I don't mind losing some memory; I prefer to keep good performance when
handling 1,000,000 or more TCP sessions.
* Re: [PATCH] MM : alloc_large_system_hash() can free some memory for non power-of-two bucketsize
From: David Miller @ 2007-05-19 18:54 UTC
To: dada1; +Cc: akpm, dhowells, linux-mm, linux-kernel
From: Eric Dumazet <dada1@cosmosbay.com>
Date: Sat, 19 May 2007 20:07:11 +0200
> Maybe David has an idea of how this can be done properly?
>
> Ref: http://marc.info/?l=linux-netdev&m=117706074825048&w=2
You need to use __GFP_COMP or similar to make this splitting+freeing
thing work.
Otherwise the individual pages don't have page references, only
the head page of the high-order page will.
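
David's point can be illustrated with a toy userspace model. This is a sketch under stated assumptions, not kernel code: struct model_page, model_alloc_block() and model_can_put() are hypothetical names. The model assumes that a plain (non-compound) high-order allocation leaves a reference on the head page only, which is what makes free_page() on a tail page trip the VM_BUG_ON() in put_page_testzero() quoted earlier in the thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct page: only the reference count is modelled. */
struct model_page {
	int count;
};

/*
 * Model of a non-compound high-order allocation: the head page gets one
 * reference and the tail pages are left at zero. The caller must supply
 * an array of at least 2^order entries.
 */
void model_alloc_block(struct model_page *pages, unsigned int order)
{
	unsigned int i, nr = 1U << order;

	for (i = 0; i < nr; i++)
		pages[i].count = 0;
	pages[0].count = 1;
}

/*
 * The precondition that put_page_testzero() asserts with VM_BUG_ON():
 * a page may only be "put" while it still holds a reference.
 */
bool model_can_put(const struct model_page *page)
{
	return page->count != 0;
}
```

In this model, after model_alloc_block(pages, 7) the head page passes model_can_put(), while any tail page, for example pages[96], does not.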
* Re: [PATCH] MM : alloc_large_system_hash() can free some memory for non power-of-two bucketsize
From: Eric Dumazet @ 2007-05-19 20:36 UTC
To: David Miller, akpm; +Cc: dhowells, linux-mm, linux-kernel
David Miller wrote:
> From: Eric Dumazet <dada1@cosmosbay.com>
> Date: Sat, 19 May 2007 20:07:11 +0200
>
>> Maybe David has an idea of how this can be done properly?
>>
>> Ref: http://marc.info/?l=linux-netdev&m=117706074825048&w=2
>
> You need to use __GFP_COMP or similar to make this splitting+freeing
> thing work.
>
> Otherwise the individual pages don't have page references, only
> the head page of the high-order page will.
>
Oh, thanks for the hint, David.
I added a split_page() call and it seems to work now.
[PATCH] MM : alloc_large_system_hash() can free some memory for non
power-of-two bucketsize
alloc_large_system_hash() is called at boot time to allocate space for several
large hash tables.
Recently, the TCP hash table was changed and its bucket size is no longer a
power of two.
On most setups, alloc_large_system_hash() allocates one big page (order > 0)
with __get_free_pages(GFP_ATOMIC, order). This single high-order page has a
power-of-two size, which is bigger than the size the table needs.
We can free all the pages that won't be used by the hash table.
On a 1GB i386 machine, this patch saves 128 KB of LOWMEM memory.
TCP established hash table entries: 32768 (order: 6, 393216 bytes)
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
[-- Attachment #2: alloc_large.patch --]
[-- Type: text/plain, Size: 823 bytes --]
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index ae96dd8..7c219eb 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3350,6 +3350,21 @@ void *__init alloc_large_system_hash(const char *tablename,
 			for (order = 0; ((1UL << order) << PAGE_SHIFT) < size; order++)
 				;
 			table = (void*) __get_free_pages(GFP_ATOMIC, order);
+			/*
+			 * If bucketsize is not a power-of-two, we may free
+			 * some pages at the end of hash table.
+			 */
+			if (table) {
+				unsigned long alloc_end = (unsigned long)table +
+						(PAGE_SIZE << order);
+				unsigned long used = (unsigned long)table +
+						PAGE_ALIGN(size);
+				split_page(virt_to_page(table), order);
+				while (used < alloc_end) {
+					free_page(used);
+					used += PAGE_SIZE;
+				}
+			}
 		}
 	} while (!table && size > PAGE_SIZE && --log2qty);
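
The fixed flow can be modelled in userspace as follows. This is a rough sketch, not the kernel implementation: struct trim_page and the trim_* functions are invented for illustration, and the model only tracks reference counts. trim_split() mimics what split_page() is understood to do here, namely give every tail page its own reference so that each page can be freed individually.

```c
#include <assert.h>

/* Toy stand-in for struct page: only the reference count is modelled. */
struct trim_page {
	int count;
};

/* Model of split_page(): give each tail page of the block its own reference. */
void trim_split(struct trim_page *pages, unsigned int order)
{
	unsigned int i;

	for (i = 1; i < (1U << order); i++)
		pages[i].count = 1;
}

/*
 * Model of free_page(): drop one reference and report whether the page
 * became free. The assert plays the role of the VM_BUG_ON() check.
 */
int trim_free_page(struct trim_page *page)
{
	assert(page->count != 0);
	return --page->count == 0;
}

/*
 * Allocate a 2^order block, split it, then free every page past the first
 * used_pages pages. Returns the number of pages given back.
 * The caller must supply an array of at least 2^order entries.
 */
unsigned int trim_alloc_exact(struct trim_page *pages, unsigned int order,
			      unsigned int used_pages)
{
	unsigned int i, nr = 1U << order, freed = 0;

	for (i = 0; i < nr; i++)
		pages[i].count = 0;
	pages[0].count = 1;	/* head page reference from the allocator */

	trim_split(pages, order);
	for (i = used_pages; i < nr; i++)
		freed += trim_free_page(&pages[i]);
	return freed;
}
```

For the numbers in this thread, trim_alloc_exact(pages, 7, 96) on a 128-entry array returns 32 and leaves the first 96 pages with one reference each. Later kernels offer alloc_pages_exact(), which wraps the same allocate, split and trim pattern.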
* Re: [PATCH] MM : alloc_large_system_hash() can free some memory for non power-of-two bucketsize
From: William Lee Irwin III @ 2007-05-21 8:11 UTC
To: Eric Dumazet
Cc: Andrew Morton, linux-mm@kvack.org, linux kernel, David Miller
William Lee Irwin III wrote:
>> The proper way to do this is to convert the large system hashtable
>> users to use some data structure / algorithm other than hashing by
>> separate chaining.
On Sat, May 19, 2007 at 08:41:01PM +0200, Eric Dumazet wrote:
> No thanks. This was already discussed to death on netdev. To date, hash
> tables are a good compromise.
> I don't mind losing some memory; I prefer to keep good performance when
> handling 1,000,000 or more TCP sessions.
The data structures perform well enough, but I suppose it's not worth
pushing the issue this way.
-- wli