Re: [RFC PATCH 3/3] mm/mempolicy: implement a partial-interleave mempolicy


From: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
To: Gregory Price <gourry.memverge@gmail.com>
Cc: <linux-mm@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
	<linux-arch@vger.kernel.org>, <linux-api@vger.kernel.org>,
	<linux-cxl@vger.kernel.org>, <luto@kernel.org>,
	<tglx@linutronix.de>, <mingo@redhat.com>, <bp@alien8.de>,
	<dave.hansen@linux.intel.com>, <hpa@zytor.com>, <arnd@arndb.de>,
	<akpm@linux-foundation.org>, <x86@kernel.org>,
	Gregory Price <gregory.price@memverge.com>
Subject: Re: [RFC PATCH 3/3] mm/mempolicy: implement a partial-interleave mempolicy
Date: Mon, 2 Oct 2023 14:40:35 +0100
Message-ID: <20231002144035.00000b36@Huawei.com>
In-Reply-To: <20230914235457.482710-4-gregory.price@memverge.com>

On Thu, 14 Sep 2023 19:54:57 -0400
Gregory Price <gourry.memverge@gmail.com> wrote:

> The partial-interleave mempolicy implements interleave on an

I'm not sure 'partial' really conveys what is going on here.
Weighted, or uneven-interleave maybe?

> allocation interval. The default node is the local node, for
> which N pages will be allocated before an interleave pass occurs.
> 
> For example:
>   nodes=0,1,2
>   interval=3
>   cpunode=0
> 
> Over 10 consecutive allocations, the following nodes will be selected:
> [0,0,0,1,2,0,0,0,1,2]
> 
> In this example, there is a 60%/20%/20% distribution of memory.
> 
> Using this mechanism, it becomes possible to define an approximate
> distribution percentage of memory across a set of nodes:
> 
> local_node% : interval/((nr_nodes-1)+interval)
> other_node% : (1-local_node%)/(nr_nodes-1)

I'd like to see more discussion here of why you would do this...


A few trivial bits inline,

Jonathan

...

> +static unsigned long alloc_pages_bulk_array_partial_interleave(gfp_t gfp,
> +		struct mempolicy *pol, unsigned long nr_pages,
> +		struct page **page_array)
> +{
> +	nodemask_t nodemask = pol->nodes;
> +	unsigned long nr_pages_main;
> +	unsigned long nr_pages_other;
> +	unsigned long total_cycle;
> +	unsigned long delta;
> +	unsigned long interval;
> +	int allocated = 0;
> +	int start_nid;
> +	int nnodes;
> +	int prev, next;
> +	int i;
> +
> +	/* This stabilizes nodes on the stack in case pol->nodes changes */
> +	barrier();
> +
> +	nnodes = nodes_weight(nodemask);
> +	start_nid = numa_node_id();
> +
> +	if (!node_isset(start_nid, nodemask))
> +		start_nid = first_node(nodemask);
> +
> +	if (nnodes == 1) {
> +		allocated = __alloc_pages_bulk(gfp, start_nid,
> +					       NULL, nr_pages,
> +					       NULL, page_array);
> +		return allocated;
		return __alloc_pages_bulk(...)

> +	}
> +	/* We don't want to double-count the main node in calculations */
> +	nnodes--;
> +
> +	interval = pol->part_int.interval;
> +	total_cycle = (interval + nnodes);

excess brackets. Same in various other places.


> +	/* Number of pages on main node: (cycles*interval + up to interval) */
> +	nr_pages_main = ((nr_pages / total_cycle) * interval);
> +	nr_pages_main += min(nr_pages % total_cycle, interval);


> +	/* Number of pages on others: (remaining/nodes) + 1 page if delta  */
> +	nr_pages_other = (nr_pages - nr_pages_main) / nnodes;
> +	/* Delta is number of pages beyond interval up to full cycle */
> +	delta = nr_pages - (nr_pages_main + (nr_pages_other * nnodes));
> +
> +	/* start by allocating for the main node, then interleave rest */
> +	prev = start_nid;
> +	allocated = __alloc_pages_bulk(gfp, start_nid, NULL, nr_pages_main,
> +				       NULL, page_array);
> +	for (i = 0; i < nnodes; i++) {
> +		int pages = nr_pages_other + (i < delta ? 1 : 0);
> +
> +		next = next_node_in(prev, nodemask);
> +		if (next < MAX_NUMNODES)
> +			prev = next;
> +		allocated += __alloc_pages_bulk(gfp, next, NULL, pages,
> +						NULL, page_array + allocated);
> +	}
> +
> +	return allocated;
> +}
> +


Thread overview:
2023-09-14 23:54 [RFC PATCH 0/3] mm/mempolicy: set/get_mempolicy2 Gregory Price
2023-09-14 23:54 ` [RFC PATCH 1/3] mm/mempolicy: refactor do_set_mempolicy for code re-use Gregory Price
2023-10-02 11:03   ` Jonathan Cameron
2023-09-14 23:54 ` [RFC PATCH 2/3] mm/mempolicy: Implement set_mempolicy2 and get_mempolicy2 syscalls Gregory Price
2023-09-15  1:29   ` kernel test robot
2023-09-15  2:12   ` kernel test robot
2023-09-15  4:21   ` kernel test robot
2023-10-02 13:30   ` Jonathan Cameron
2023-10-02 15:30     ` Gregory Price
2023-10-02 18:03     ` Gregory Price
2023-09-14 23:54 ` [RFC PATCH 3/3] mm/mempolicy: implement a partial-interleave mempolicy Gregory Price
2023-10-02 13:40   ` Jonathan Cameron [this message]
2023-10-02 16:10     ` Gregory Price
