Re: [PATCH] Avoiding external fragmentation with a placement policy Version 10

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Joel Schopp <jschopp@austin.ibm.com>
To: Mel Gorman <mel@csn.ul.ie>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@osdl.org
Subject: Re: [PATCH] Avoiding external fragmentation with a placement policy Version 10
Date: Tue, 03 May 2005 14:30:07 -0500	[thread overview]
Message-ID: <4277D13F.3050804@austin.ibm.com> (raw)
In-Reply-To: <20050503164452.A3AB9E592@skynet.csn.ul.ie>

Comments inline below.

> o Tightened what pools are used for fallbacks, less likely to fragment
> o Many micro-optimisations to have the same performance as the standard 
>   allocator. Modified allocator now faster than standard allocator using
>   gcc 3.3.5

Nice.

> o Increased the size of reserve for fallbacks from 10% to 12.5%. 

This just screams out for a tunable.  Systems with different workloads 
and different amounts of memory will behave better with different 
values.  It would be even better if it would self tune, but that might 
prove difficult.

> Difference in performance operations report generated by diff-aim9.sh from VMRegress 0.14
> N  Test          Standard   MBuddy V10  Diff     % diff Test description
>                  Ops/sec    Ops/sec     Ops/sec
> -- ----------    ---------  ----------  -------- ------ ----------------
>  1 add_double    460569.72  465222.46    4652.74  1.01% Thousand Double Precision Additions/second
>  2 add_float     460523.25  465322.45    4799.20  1.04% Thousand Single Precision Additions/second
>  3 add_long     1421763.04 1436042.64   14279.60  1.00% Thousand Long Integer Additions/second
>  4 add_int      1421763.04 1436042.64   14279.60  1.00% Thousand Integer Additions/second
>  5 add_short    1421363.11 1435760.71   14397.60  1.01% Thousand Short Integer Additions/second
>  7 page_test     121048.16  123059.49    2011.33  1.66% System Allocations & Pages/second
>  8 brk_test      445743.79  452407.93    6664.14  1.50% System Memory Allocations/second
>  9 jmp_test     4158416.67 4232083.33   73666.66  1.77% Non-local gotos/second
> 10 signal_test    94417.60   94584.24     166.64  0.18% Signal Traps/second
> 11 exec_test         65.04      66.69       1.65  2.54% Program Loads/second
> 12 fork_test       1537.82    1730.51     192.69 12.53% Task Creations/second
> 13 link_test       6411.28    6477.45      66.17  1.03% Link/Unlink Pairs/second
> 
> The aim9 results show that there are consistent improvements for common
> page-related operations. The results are compiler dependant and there are
> variances of 1-2% between versions.

Any explanation for why fork_test shows markedly better improvement 
compared to the others?

> -#define __GFP_BITS_SHIFT 16	/* Room for 16 __GFP_FOO bits */
> +#define __GFP_BITS_SHIFT 18	/* Room for 16 __GFP_FOO bits */

Comment should have the new 18, not the old 16.

> +#ifdef CONFIG_ALLOCSTATS
> +	/*
> +	 * These are beancounters that track how the placement policy
> +	 * of the buddy allocator is performing
> +	 */
> +	unsigned long fallback_count[ALLOC_TYPES];
> +	unsigned long alloc_count[ALLOC_TYPES];
> +	unsigned long reserve_count[ALLOC_TYPES];
> +	unsigned long kernnorclm_full_steal;
> +	unsigned long kernnorclm_partial_steal;
> +	unsigned long bulk_requests[MAX_ORDER];
> +	unsigned long bulk_alloced[MAX_ORDER];
> +#endif

It would be nice if all of the CONFIG_ALLOCSTATS stuff was broken out as 
a second patch.  It would make this patch much smaller and more readable.

> +int fallback_allocs[ALLOC_TYPES][ALLOC_TYPES] = { 
> +	{ALLOC_KERNNORCLM, ALLOC_FALLBACK,   ALLOC_KERNRCLM,   ALLOC_USERRCLM},
> +	{ALLOC_KERNRCLM,   ALLOC_FALLBACK,   ALLOC_KERNNORCLM, ALLOC_USERRCLM},

I would have thought that KernRclm would want to choose USERRCLM over 
KERNNOCRLM.

> +	{ALLOC_USERRCLM,   ALLOC_FALLBACK,   ALLOC_KERNNORCLM, ALLOC_KERNRCLM},

I'm almost certain the UserRclm type should prefer KERNRCLM over KERNNORCLM.


> +	 * Here, the alloc type lists has been depleted as well as the global
> +	 * pool, so fallback. When falling back, the largest possible block
> +	 * will be taken to keep the fallbacks clustered if possible
> +	 */

I was curious if you had tried taking the smallest possible block.  I 
would think that it would reduce the amount of fallback needed, and thus 
increase the amount available for the 3 allocation types.  I would 
expect a net win, despite not clustering fallbacks particularly well.

> +		alloctype = fallback_list[retry_count];
> +
> +		/* Find a block to allocate */
> +		area = zone->free_area_lists[alloctype] + (MAX_ORDER-1);
> +		current_order=MAX_ORDER;
> +		do {
> +			current_order--;
> +			if (list_empty(&area->free_list)) {
> +				area--;
> +				continue;
> +			}
> +
> +			goto remove_page;
> +		} while (current_order != order);
> +	}

This loop is a bit hard to understand.  I think it would be easier to 
understand if it looked something like this (totally untested):

+		current_order=MAX_ORDER - 1 ;
+		do {
+			if (!list_empty(&area->free_list)) {
+				goto remove_page;
+			}
+
+			area--;
+			current_order--;
+		} while (current_order >= order);

WARNING: multiple messages have this Message-ID (diff)

From: Joel Schopp <jschopp@austin.ibm.com>
To: Mel Gorman <mel@csn.ul.ie>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@osdl.org
Subject: Re: [PATCH] Avoiding external fragmentation with a placement policy Version 10
Date: Tue, 03 May 2005 14:30:07 -0500	[thread overview]
Message-ID: <4277D13F.3050804@austin.ibm.com> (raw)
In-Reply-To: <20050503164452.A3AB9E592@skynet.csn.ul.ie>

Comments inline below.

> o Tightened what pools are used for fallbacks, less likely to fragment
> o Many micro-optimisations to have the same performance as the standard 
>   allocator. Modified allocator now faster than standard allocator using
>   gcc 3.3.5

Nice.

> o Increased the size of reserve for fallbacks from 10% to 12.5%. 

This just screams out for a tunable.  Systems with different workloads 
and different amounts of memory will behave better with different 
values.  It would be even better if it would self tune, but that might 
prove difficult.

> Difference in performance operations report generated by diff-aim9.sh from VMRegress 0.14
> N  Test          Standard   MBuddy V10  Diff     % diff Test description
>                  Ops/sec    Ops/sec     Ops/sec
> -- ----------    ---------  ----------  -------- ------ ----------------
>  1 add_double    460569.72  465222.46    4652.74  1.01% Thousand Double Precision Additions/second
>  2 add_float     460523.25  465322.45    4799.20  1.04% Thousand Single Precision Additions/second
>  3 add_long     1421763.04 1436042.64   14279.60  1.00% Thousand Long Integer Additions/second
>  4 add_int      1421763.04 1436042.64   14279.60  1.00% Thousand Integer Additions/second
>  5 add_short    1421363.11 1435760.71   14397.60  1.01% Thousand Short Integer Additions/second
>  7 page_test     121048.16  123059.49    2011.33  1.66% System Allocations & Pages/second
>  8 brk_test      445743.79  452407.93    6664.14  1.50% System Memory Allocations/second
>  9 jmp_test     4158416.67 4232083.33   73666.66  1.77% Non-local gotos/second
> 10 signal_test    94417.60   94584.24     166.64  0.18% Signal Traps/second
> 11 exec_test         65.04      66.69       1.65  2.54% Program Loads/second
> 12 fork_test       1537.82    1730.51     192.69 12.53% Task Creations/second
> 13 link_test       6411.28    6477.45      66.17  1.03% Link/Unlink Pairs/second
> 
> The aim9 results show that there are consistent improvements for common
> page-related operations. The results are compiler dependant and there are
> variances of 1-2% between versions.

Any explanation for why fork_test shows markedly better improvement 
compared to the others?

> -#define __GFP_BITS_SHIFT 16	/* Room for 16 __GFP_FOO bits */
> +#define __GFP_BITS_SHIFT 18	/* Room for 16 __GFP_FOO bits */

Comment should have the new 18, not the old 16.

> +#ifdef CONFIG_ALLOCSTATS
> +	/*
> +	 * These are beancounters that track how the placement policy
> +	 * of the buddy allocator is performing
> +	 */
> +	unsigned long fallback_count[ALLOC_TYPES];
> +	unsigned long alloc_count[ALLOC_TYPES];
> +	unsigned long reserve_count[ALLOC_TYPES];
> +	unsigned long kernnorclm_full_steal;
> +	unsigned long kernnorclm_partial_steal;
> +	unsigned long bulk_requests[MAX_ORDER];
> +	unsigned long bulk_alloced[MAX_ORDER];
> +#endif

It would be nice if all of the CONFIG_ALLOCSTATS stuff was broken out as 
a second patch.  It would make this patch much smaller and more readable.

> +int fallback_allocs[ALLOC_TYPES][ALLOC_TYPES] = { 
> +	{ALLOC_KERNNORCLM, ALLOC_FALLBACK,   ALLOC_KERNRCLM,   ALLOC_USERRCLM},
> +	{ALLOC_KERNRCLM,   ALLOC_FALLBACK,   ALLOC_KERNNORCLM, ALLOC_USERRCLM},

I would have thought that KernRclm would want to choose USERRCLM over 
KERNNOCRLM.

> +	{ALLOC_USERRCLM,   ALLOC_FALLBACK,   ALLOC_KERNNORCLM, ALLOC_KERNRCLM},

I'm almost certain the UserRclm type should prefer KERNRCLM over KERNNORCLM.


> +	 * Here, the alloc type lists has been depleted as well as the global
> +	 * pool, so fallback. When falling back, the largest possible block
> +	 * will be taken to keep the fallbacks clustered if possible
> +	 */

I was curious if you had tried taking the smallest possible block.  I 
would think that it would reduce the amount of fallback needed, and thus 
increase the amount available for the 3 allocation types.  I would 
expect a net win, despite not clustering fallbacks particularly well.

> +		alloctype = fallback_list[retry_count];
> +
> +		/* Find a block to allocate */
> +		area = zone->free_area_lists[alloctype] + (MAX_ORDER-1);
> +		current_order=MAX_ORDER;
> +		do {
> +			current_order--;
> +			if (list_empty(&area->free_list)) {
> +				area--;
> +				continue;
> +			}
> +
> +			goto remove_page;
> +		} while (current_order != order);
> +	}

This loop is a bit hard to understand.  I think it would be easier to 
understand if it looked something like this (totally untested):

+		current_order=MAX_ORDER - 1 ;
+		do {
+			if (!list_empty(&area->free_list)) {
+				goto remove_page;
+			}
+
+			area--;
+			current_order--;
+		} while (current_order >= order);



--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>

next prev parent reply	other threads:[~2005-05-03 19:30 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2005-05-03 16:44 [PATCH] Avoiding external fragmentation with a placement policy Version 10 Mel Gorman
2005-05-03 16:44 ` Mel Gorman
2005-05-03 19:30 ` Joel Schopp [this message]
2005-05-03 19:30   ` Joel Schopp
2005-05-04  8:20   ` Mel Gorman
2005-05-04  8:20     ` Mel Gorman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4277D13F.3050804@austin.ibm.com \
    --to=jschopp@austin.ibm.com \
    --cc=akpm@osdl.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mel@csn.ul.ie \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.