From: Mike Kravetz <mike.kravetz@oracle.com>
To: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
Lorenzo Stoakes <lstoakes@gmail.com>
Subject: Re: mbind MPOL_INTERLEAVE existing pages
Date: Tue, 2 May 2023 09:34:42 -0700
Message-ID: <20230502163442.GA3873@monkey>
In-Reply-To: <ZFEMMg7gP7hJzIvl@dhcp22.suse.cz>
On 05/02/23 15:12, Michal Hocko wrote:
> On Tue 02-05-23 09:45:40, Vlastimil Babka wrote:
> > On 5/1/23 20:58, Mike Kravetz wrote:
> > > I received a question from a customer that was trying to move pages via
> > > the mbind system call. In this specific case, the system had two nodes
> > > and all pages in the range were already present on node 0. They then
> > > called mbind with mode MPOL_INTERLEAVE and the MPOL_MF_MOVE_ALL flag. Their
> > > expectation was that half the pages in the range would be moved to node 1
> > > in an interleaved pattern.
> > >
> > > In the above situation, no pages actually get moved. This is because mbind
> > > creates a list of pages to be moved via:
> > >
> > >     ret = queue_pages_range(mm, start, end, nmask,
> > >                             flags | MPOL_MF_INVERT, &pagelist);
> > >
> > > No page will be added to the list as queue_folio_required is called for each
> > > page to determine if it resides within the set of nodes. And, all pages are
> > > within the set.
> > >
> > > I have reread the mbind man page several times and agree that one might
> > > expect MPOL_INTERLEAVE with MPOL_MF_MOVE_ALL to move pages and create an
> > > interleaved pattern. My question is should we:
> > > - Change mbind so that pages are moved to an interleaved pattern?
> >
> > I guess it could be worth trying, if there's a use case. And hope nobody
> > else is depending on the current behavior and will complain afterwards :)
>
> I am not sure this is worth it wrt. complexity. Essentially it would
> require building up the distribution for the whole range first, so 2
> passes. Also it could become more tricky if the final node mask has
> nodes of different distances (it would be a reasonable expectation to
> distribute with the minimum total distance, right ;)).
Yes, I was worried about the complexity of such a change. At a high
level, interleave sounds easy. But, like most things, the details
could add a bunch of complexity.
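To make the gap concrete, here is a rough userspace model, not kernel code.
It mimics the queueing check quoted above (with the invert flag set, a page
is queued only if its node is outside the mask) and compares it with what an
interleave-aware pass would have to move. MODEL_INVERT, page_queued(),
count_queued() and count_interleave_moves() are hypothetical names, and the
round-robin-by-page-index placement is an assumption made for illustration;
it is not necessarily how the kernel picks interleave nodes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_INVERT 0x1u	/* stand-in for the kernel-internal MPOL_MF_INVERT */
#define MODEL_MAX_NODES ((int)(8 * sizeof(unsigned long)))

/*
 * Simplified model of the check: with MODEL_INVERT set, a page is
 * queued only if its node is NOT in nmask.
 */
static bool page_queued(int nid, unsigned long nmask, unsigned int flags)
{
	assert(nid >= 0 && nid < MODEL_MAX_NODES);
	bool in_mask = (nmask >> nid) & 1UL;
	return in_mask == !(flags & MODEL_INVERT);
}

/* Count the pages the existing queueing pass would select for migration. */
static size_t count_queued(const int *page_nid, size_t npages,
			   unsigned long nmask)
{
	size_t queued = 0;

	for (size_t i = 0; i < npages; i++)
		if (page_queued(page_nid[i], nmask, MODEL_INVERT))
			queued++;
	return queued;
}

/*
 * Count the pages a hypothetical interleave-aware pass would move,
 * ASSUMING plain round-robin by page index over the nodes in nmask.
 */
static size_t count_interleave_moves(const int *page_nid, size_t npages,
				     unsigned long nmask)
{
	int nodes[8 * sizeof(unsigned long)];
	size_t nnodes = 0;
	size_t moves = 0;

	/* First pass: collect the nodes present in the mask. */
	for (int nid = 0; nid < MODEL_MAX_NODES; nid++)
		if ((nmask >> nid) & 1UL)
			nodes[nnodes++] = nid;
	assert(nnodes > 0);

	/* Second pass: compare each page with its round-robin target. */
	for (size_t i = 0; i < npages; i++)
		if (page_nid[i] != nodes[i % nnodes])
			moves++;
	return moves;
}
```

With eight pages all on node 0 and a mask of nodes 0 and 1, count_queued()
reports nothing to move, while count_interleave_moves() reports four pages.
That matches the customer's expectation versus what they observed, and it
ignores node distances entirely.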
> > > - Update the documentation to be more explicit?
>
> Yes, please. While this sounds like a neat feature, I think the
> additional complexity is likely not worth it. A strong use case
> might make a difference though.
Well, this user has a workaround. They simply make sure to set the
policy of this area (a shared memory segment) before populating. And,
I don't think they would really be happy with the cost of potentially
migrating hundreds of GB of data.
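For reference, a minimal sketch of that ordering, not the customer's actual
code. map_interleaved() is a hypothetical helper, a shared anonymous mapping
stands in for their shared memory segment, and the raw syscall is used only
to avoid depending on libnuma's <numaif.h>. The MPOL_INTERLEAVE fallback
value is the one from the uapi <linux/mempolicy.h>.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef MPOL_INTERLEAVE
#define MPOL_INTERLEAVE 3	/* value from the uapi <linux/mempolicy.h> */
#endif

/*
 * Hypothetical helper: map a shared anonymous region, set
 * MPOL_INTERLEAVE over the nodes in nodemask while the range is still
 * empty, then populate it. Returns the mapping, or NULL if mmap() or
 * mbind() fails (for example on a kernel without NUMA support).
 */
static void *map_interleaved(size_t len, unsigned long nodemask)
{
	void *addr;

	assert(len > 0 && nodemask != 0);

	addr = mmap(NULL, len, PROT_READ | PROT_WRITE,
		    MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	if (addr == MAP_FAILED)
		return NULL;

	/* Set the policy first; no pages are present yet, so none need moving. */
	if (syscall(SYS_mbind, addr, len, MPOL_INTERLEAVE, &nodemask,
		    8 * sizeof(nodemask), 0U) != 0) {
		munmap(addr, len);
		return NULL;
	}

	/* First touch allocates each page according to the policy just set. */
	memset(addr, 0, len);
	return addr;
}
```

Something like map_interleaved(len, 0x3UL) would request nodes 0 and 1 on a
two node system. Because the policy is in place before the first touch, no
MPOL_MF_MOVE* flag and no migration is involved.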
I'll send out a documentation update.
--
Mike Kravetz