From: Gregory Price <gregory.price@memverge.com>
To: "Huang, Ying" <ying.huang@intel.com>
Cc: Gregory Price <gourry.memverge@gmail.com>,
	linux-mm@kvack.org, linux-doc@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-api@vger.kernel.org, x86@kernel.org,
	akpm@linux-foundation.org, arnd@arndb.de, tglx@linutronix.de,
	luto@kernel.org, mingo@redhat.com, bp@alien8.de,
	dave.hansen@linux.intel.com, hpa@zytor.com, mhocko@kernel.org,
	tj@kernel.org, corbet@lwn.net, rakie.kim@sk.com,
	hyeongtak.ji@sk.com, honggyu.kim@sk.com, vtavarespetr@micron.com,
	peterz@infradead.org, jgroves@micron.com,
	ravis.opensrc@micron.com, sthanneeru@micron.com,
	emirakhur@micron.com, Hasan.Maruf@amd.com,
	seungjun.ha@samsung.com,
	Srinivasulu Thanneeru <sthanneeru.opensrc@micron.com>
Subject: Re: [PATCH v5 02/11] mm/mempolicy: introduce MPOL_WEIGHTED_INTERLEAVE for weighted interleaving
Date: Thu, 4 Jan 2024 13:59:35 -0500
Message-ID: <ZZcAF4zIpsVN3dLd@memverge.com>
In-Reply-To: <875y09d5d8.fsf@yhuang6-desk2.ccr.corp.intel.com>

On Thu, Jan 04, 2024 at 01:39:31PM +0800, Huang, Ying wrote:
> Gregory Price <gregory.price@memverge.com> writes:
> 
> > On Wed, Jan 03, 2024 at 01:46:56PM +0800, Huang, Ying wrote:
> >> Gregory Price <gregory.price@memverge.com> writes:
> >> > I'm specifically concerned about:
> >> > 	weighted_interleave_nid
> >> > 	alloc_pages_bulk_array_weighted_interleave
> >> >
> >> > I'm unsure whether kmalloc/kfree is safe (and non-offensive) in those
> >> > contexts. If kmalloc/kfree is safe, fine - this problem is trivial.
> >> >
> >> > If not, there is no good solution to this without pre-allocating a
> >> > scratch area per-task.
> >> 
> >> You need to audit whether it's safe for all callers.  I guess that you
> >> need to allocate pages after calling, so you can use the same GFP flags
> >> here.
> >> 
> >
> > After picking away at it, I realized that this code is usually going to
> > get called during page fault handling - duh.  So kmalloc is almost never
> > safe (or can fail), and it's nasty to try to handle those errors.
> 
> Why not just OOM for allocation failure?
>

2 notes:

1) callers of weighted_interleave_nid do not expect OOM conditions; they
   expect a node selection.  On error, we would simply return the local
   NUMA node without any indication of failure.

2) callers of alloc_pages_bulk_array_weighted_interleave receive the
   total number of pages allocated; they are expected to detect when
   pages allocated != pages requested and then decide whether to
   OOM or simply retry (allocation may fail for a variety of reasons).

If we introduce an allocation into this area and it fails, we would
essentially need to squash the failure silently and return either the
local node (weighted_interleave_nid) or 0 (bulk allocator), and allow
the surrounding allocation logic to handle any subsequent OOM condition.

That felt less desirable than just allocating a scratch space up front
in the mempolicy and avoiding the issue altogether.
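
To illustrate, the on-the-fly version of weighted_interleave_nid would
have to do something like the following.  This is just a sketch (the
surrounding weight/offset logic is elided), not the actual patch code:

	u8 *weights;
	unsigned int nid = numa_node_id();

	/* allocation in the page fault path - can fail */
	weights = kmalloc(nr_node_ids, GFP_KERNEL);
	if (!weights)
		return nid;	/* squash the failure, return the local node */

	/* ... snapshot the weights and compute the interleave target ... */

	kfree(weights);
	return nid;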

> > Instead of doing that, I simply chose to implement the scratch space
> > in the mempolicy structure
> >
> > mempolicy->wil.scratch_weights[MAX_NUMNODES].
> >
> > We eat an extra 1kb of memory in the mempolicy, but it gives us a safe
> > scratch space we can use any time the task is allocating memory, and
> > prevents the need for any fancy error handling.  That seems like a
> > perfectly reasonable tradeoff.
> 
> I don't think that this is a good idea.  The weight array is temporary.
> 

It's temporary, but it's also only used in the context of the task while
the alloc lock is held.
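
For reference, the scratch-space version boils down to something like
this (sketch only - the containing struct name is just my shorthand
here, the field is the one named above):

	struct weighted_interleave_state {
		u8 scratch_weights[MAX_NUMNODES];	/* ~1KB */
	};

	/* embedded in struct mempolicy as ->wil, so at allocation time: */
	u8 *weights = pol->wil.scratch_weights;	/* no allocation, cannot fail */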

If you think it's fine to introduce another potential OOM-generating
spot, then I'll just go ahead and allocate the memory on the fly.

I do want to point out, though, that weighted_interleave_nid is called
once per allocated page.  So now we're not just collecting weights to
calculate the offset; we're also doing an allocation (that can fail) for
every page allocated in that region.

The bulk allocator amortizes the cost of this allocation by doing it
once while allocating a chunk of pages - but the weighted_interleave_nid
function is called per-page.
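
In other words, very roughly (the helper names below are made up, just
to show the shape of the two paths):

	/* bulk path: one scratch setup amortized over the whole request */
	setup_scratch_weights(scratch);				/* hypothetical */
	while (allocated < nr_pages)
		allocated += alloc_batch_on_next_node(scratch);	/* hypothetical */

	/* fault path: weighted_interleave_nid() - and its kmalloc - runs per page */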

By comparison, the memory cost to just allocate a static scratch area in
the mempolicy struct is only incurred by tasks with a mempolicy.


So we're talking about ~1MB total for 1024 threads with mempolicies, in
exchange for avoiding error conditions mid-page-allocation and reducing
the cost of applying weighted interleave.
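
(Back of the envelope, assuming MAX_NUMNODES=1024 and one byte per node
weight: 1024 * sizeof(u8) = ~1KB of scratch per mempolicy, times ~1024
tasks with mempolicies = ~1MB.)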

~Gregory
