From: "Li Xinhai" <lixinhai.lxh@gmail.com>
To: yang.shi <yang.shi@linux.alibaba.com>,
"Mike Kravetz" <mike.kravetz@oracle.com>,
"linux-mm@kvack.org" <linux-mm@kvack.org>
Cc: akpm <akpm@linux-foundation.org>, mhocko <mhocko@suse.com>,
n-horiguchi <n-horiguchi@ah.jp.nec.com>
Subject: Re: [PATCH 2/2] mm/mempolicy: Skip walking HUGETLB vma if MPOL_MF_STRICT is specified alone
Date: Wed, 15 Jan 2020 15:36:27 +0800
Message-ID: <2020011515362520135446@gmail.com>
In-Reply-To: 253e9110-4ffd-e9ba-feec-48ce899af057@linux.alibaba.com
On 2020-01-15 at 13:23 Yang Shi wrote:
>
>
>On 1/14/20 8:28 PM, Mike Kravetz wrote:
>> On 1/14/20 5:24 PM, Yang Shi wrote:
>>>
>>> On 1/14/20 5:07 PM, Mike Kravetz wrote:
>>>> On 1/14/20 6:09 AM, Li Xinhai wrote:
>>>>> Add cc to
>>>>> Yang Shi <yang.shi@linux.alibaba.com>
>>>>> Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
>>>>> who have worked on this part.
>>>>>
>>>>> On 2020-01-14 at 17:16 Li Xinhai wrote:
>>>>>> Checking MPOL_MF_STRICT is ignored for HUGETLB vma according to mbind man
>>>>>> page:
>>>>>>
>>>>>> Notes
>>>>>> MPOL_MF_STRICT is ignored on huge page mappings.
>>>>>>
>>>>>> If MPOL_MF_STRICT is specified alone without any MOVE flag, we should
>>>>>> indicate, from test_walk, that walking this vma should be skipped even if
>>>>>> there are misplaced pages.
>>>>>>
>>>>>> Signed-off-by: Li Xinhai <lixinhai.lxh@gmail.com>
>>>>>> Cc: Michal Hocko <mhocko@suse.com>
>>>>>> Cc: Mike Kravetz <mike.kravetz@oracle.com>
>>>> I do not necessarily disagree with the change. However, this has made me
>>>> question a couple things:
>>>> 1) Why does the man page say MPOL_MF_STRICT is ignored on huge page mappings?
>>>> - Is that leftover from the days when huge page migration was not
>>>> supported?
>>>> - Is it just because huge page migration is more likely to fail than
>>>> base page migration?
>>>> 2) Does the mbind code function properly when it is unable to migrate a
>>>> huge page and MPOL_MF_STRICT is set? A quick look at the code suggests
>>>> it returns EIO.
For question (2): looking at queue_pages_hugetlb(), a misplaced page does not
cause -EIO to be reported, either when STRICT is set alone or when STRICT is
set together with MOVE*; that means STRICT is effectively ignored during the
isolation phase. In the unmap and move phase, -EIO is reported if a page fails
to move and STRICT is set.
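
As a rough illustration of the two phases described above, here is a small
user-space model. It is not kernel code: the function names
model_isolate_hugetlb() and model_move_phase() and the MODEL_MF_* flag bits
are made up for this sketch, and only the control flow follows the
description above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Placeholder flag bits for this model only; not the real uapi values. */
#define MODEL_MF_STRICT (1u << 0)
#define MODEL_MF_MOVE   (1u << 1)

/*
 * Isolation phase for one hugetlb page. A misplaced page is queued for
 * migration only when a MOVE flag is set. The return value is 0 either
 * way, so STRICT has no effect in this phase.
 */
int model_isolate_hugetlb(unsigned int flags, bool misplaced, int *nr_queued)
{
	assert(nr_queued != NULL);

	if (misplaced && (flags & MODEL_MF_MOVE))
		(*nr_queued)++;

	return 0;
}

/*
 * Unmap and move phase. nr_failed is the number of queued pages that
 * could not be migrated; -EIO is reported only when STRICT is set.
 */
int model_move_phase(unsigned int flags, int nr_failed)
{
	assert(nr_failed >= 0);

	if (nr_failed && (flags & MODEL_MF_STRICT))
		return -EIO;

	return 0;
}
```

In this model, model_isolate_hugetlb() returns 0 for a misplaced page whether
or not MODEL_MF_STRICT is set, while model_move_phase() returns -EIO as soon
as one page fails to move and MODEL_MF_STRICT is set.
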
>>> I don't know the answer to question #1; I didn't dig into the history.
>>> queue_pages_hugetlb() returns 0 unconditionally, and I think this is what
>>> "MPOL_MF_STRICT is ignored on huge page mappings" means in code.
>>>
>>> It would return -EIO for base pages or THP, as the man page describes.
>>>
>> I was thinking about a migration failure after isolation. I mean this block
>> of code in do_mbind(), after queue_pages_range() and mbind_range():
>>
>>	if (!err) {
>>		int nr_failed = 0;
>>
>>		if (!list_empty(&pagelist)) {
>>			WARN_ON_ONCE(flags & MPOL_MF_LAZY);
>>			nr_failed = migrate_pages(&pagelist, new_page, NULL,
>>				start, MIGRATE_SYNC, MR_MEMPOLICY_MBIND);
>>			if (nr_failed)
>>				putback_movable_pages(&pagelist);
>>		}
>>
>>		if ((ret > 0) || (nr_failed && (flags & MPOL_MF_STRICT)))
>>			err = -EIO;
>
>Hmm.. I agree this part of the man page does look ambiguous. May we assume
>"MPOL_MF_STRICT is ignored on huge page mappings." refers to the case where
>MPOL_MF_STRICT is specified alone? If a MOVE flag is specified, it should
>return -EIO if some pages could not be moved, as the man page describes.
>
It looks to me that the current code has no feasible way to ignore the STRICT
flag for hugetlb pages when a failure happens in the unmap and move phase,
because mbind may handle multiple vmas (for example, a hugetlb vma mixed with
other vmas) in one call.
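
A simplified user-space sketch of why that is, assuming (as in the do_mbind()
excerpt quoted above) that pages from every vma end up on one shared list and
only a total failure count comes back from migration. The struct sketch_vma
type, sketch_mbind() and the SKETCH_MF_* bits are hypothetical names used
only here.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Placeholder flag bits for this sketch only; not the real uapi values. */
#define SKETCH_MF_STRICT (1u << 0)
#define SKETCH_MF_MOVE   (1u << 1)

/* Hypothetical per-vma summary used only by this sketch. */
struct sketch_vma {
	bool is_hugetlb;   /* true for a hugetlb mapping */
	int nr_unmovable;  /* queued pages that will fail to migrate */
};

/*
 * Models one mbind call covering several vmas. All queued pages share a
 * single list, so after migration only the total nr_failed is known.
 * Note that is_hugetlb is never consulted when deciding on -EIO: at that
 * point there is nothing left that says which vma a failed page came from.
 */
int sketch_mbind(const struct sketch_vma *vmas, size_t nr_vmas,
		 unsigned int flags)
{
	int nr_failed = 0;
	size_t i;

	if (flags & SKETCH_MF_MOVE) {
		for (i = 0; i < nr_vmas; i++)
			nr_failed += vmas[i].nr_unmovable;
	}

	if (nr_failed && (flags & SKETCH_MF_STRICT))
		return -EIO;

	return 0;
}
```

With STRICT and MOVE both set, a range whose only failing page sits in the
hugetlb vma and a range whose only failing page sits in a normal vma both
produce -EIO in this sketch; the two cases cannot be told apart from the
count alone.
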
>I don't know what the intention was in the first place. We may have to
>dig into the history.
>
>>
>
>
Thread overview: 30+ messages
2020-01-14 9:16 [PATCH 1/2] mm/mempolicy: Checking hstate for hugetlbfs page in vma_migratable Li Xinhai
2020-01-14 9:16 ` [PATCH 2/2] mm/mempolicy: Skip walking HUGETLB vma if MPOL_MF_STRICT is specified alone Li Xinhai
2020-01-14 14:09 ` Li Xinhai
2020-01-14 18:27 ` Yang Shi
2020-01-15 1:07 ` Mike Kravetz
2020-01-15 1:24 ` Yang Shi
2020-01-15 4:28 ` Mike Kravetz
2020-01-15 5:23 ` Yang Shi
2020-01-15 7:36 ` Li Xinhai [this message]
2020-01-15 17:16 ` Yang Shi
2020-01-15 21:07 ` Mike Kravetz
2020-01-15 21:30 ` Yang Shi
2020-01-15 21:45 ` Mike Kravetz
2020-01-15 21:59 ` Yang Shi
2020-01-16 8:07 ` HORIGUCHI NAOYA(堀口 直也)
2020-01-16 15:32 ` Li Xinhai
2020-01-16 7:59 ` Michal Hocko
2020-01-16 19:22 ` Mike Kravetz
2020-01-17 2:32 ` Yang Shi
2020-01-17 2:38 ` Li Xinhai
2020-01-17 7:57 ` Michal Hocko
2020-01-17 12:05 ` Li Xinhai
2020-01-17 15:20 ` Michal Hocko
2020-01-17 15:46 ` Li Xinhai
2020-01-20 12:45 ` Michal Hocko
2020-01-21 14:15 ` Li Xinhai
2020-01-21 14:53 ` Michal Hocko
2020-01-22 13:55 ` Li Xinhai
2020-01-14 19:12 ` [PATCH 1/2] mm/mempolicy: Checking hstate for hugetlbfs page in vma_migratable Mike Kravetz
2020-01-15 1:25 ` Andrew Morton