From: "lixinhai.lxh@gmail.com" <lixinhai.lxh@gmail.com>
To: "Hugh Dickins" <hughd@google.com>,
xinhai.li <xinhai.li@outlook.com>,
lixinhai_lxh <lixinhai_lxh@126.com>
Cc: "Vlastimil Babka" <vbabka@suse.cz>,
"Michal Hocko" <mhocko@kernel.org>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"Linux API" <linux-api@vger.kernel.org>
Subject: Re: [PATCH] mm: allow unmapped hole at head side of mbind range
Date: Mon, 28 Oct 2019 16:12:59 +0800 [thread overview]
Message-ID: <2019102816125759600417@gmail.com> (raw)
In-Reply-To: alpine.LSU.2.11.1910241900070.1096@eggly.anvils
On 2019-10-28 at 15:14:51 Hugh Dickins wrote:
>On Thu, 24 Oct 2019, Vlastimil Babka wrote:
>
>> + linux-api
>>
>> On 10/24/19 9:35 AM, Li Xinhai wrote:
>> > From: Li Xinhai <xinhai.li@outlook.com>
>> >
>> > mbind_range silently ignore unmapped hole at middle and tail of the
>> > specified range, but report EFAULT if hole at head side.
>>
>>
>> Hmm that's unfortunate. mbind() manpage says:
>>
>> EFAULT Part or all of the memory range specified by nodemask and maxnode
>> points outside your accessible address space. Or, there was an unmapped
>> hole in the specified memory range specified by addr and len.
>>
>> That sounds like any hole inside the specified range should return
>> EFAULT.
>
>Yes (though an exception is allowed when restoring to default).
>
>> But perhaps it can be also interpreted as you suggest, that the
>> whole range is an unmapped hole. There's some risk of breaking existing
>> userspace if we change it either way.
>>
>> > It is more reasonable to support silently ignore holes at any part of
>> > the range, only report EFAULT if the whole range is in hole.
>> >
>> > Signed-off-by: Li Xinhai <xinhai.li@outlook.com>
>
>Xinhai, I'm sceptical about this patch: is it something you found
>by code inspection, or something you found when using mbind()?
>
I encountered issue when using mbind (my issue was about using nodemask
parameter), and then found this special range checking in mbind_range().
>I've not looked long enough to be certain, nor experimented, but:
>
>mbind_range() is only one stage of the mbind() syscall implementation,
>and is preceded by queue_pages_range(): look what queue_pages_test_walk()
>does when MPOL_MF_DISCONTIG_OK not set.
>
>My impression is that mbind_range() is merely correcting an omission
>from the checks already made my queue_pages_test_walk() (an odd way
>to proceed, I admit: would be better to check initially than later).
>
>I do think that you should not make this change without considering
>MPOL_MF_DISCONTIG_OK and its intention.
>
>Hugh
>
A program was used to reveal issues as below.
#include <stddef.h>
#include <sys/mman.h>
#include <numaif.h>
int main(int argc, char *argv[])
{
void *mapAddr;
unsigned long nodemask;
mapAddr = mmap(NULL, 6*(1<<12), PROT_READ|PROT_WRITE, MAP_PRIVATE|
MAP_ANONYMOUS, -1, 0);
// BIND and leave 2 pages as hole in the middle
nodemask = 0x1;
mbind(mapAddr, 6*(1<<12), MPOL_BIND, &nodemask, 2, 0);
munmap(mapAddr+2*(1<<12), 2*(1<<12));
// part 1
mbind(mapAddr-1*(1<<12), 2*(1<<12), MPOL_DEFAULT, NULL, 0, 0);
mbind(mapAddr, 3*(1<<12), MPOL_DEFAULT, NULL, 0, 0);
// part 2
nodemask = 0x2;
mbind(mapAddr+3*(1<<12), 2*(1<<12), MPOL_BIND, &nodemask, 3, 0);
mbind(mapAddr+4*(1<<12), 3*(1<<12), MPOL_BIND, &nodemask, 3, 0);
mbind(mapAddr+3*(1<<12), 1*(1<<12), MPOL_BIND, &nodemask, 3, 0);
mbind(mapAddr+4*(1<<12), 2*(1<<12), MPOL_BIND, &nodemask, 3, 0);
return 0;
}
syscall results:
83 mmap(NULL, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbd24e13000
84 mbind(0x7fbd24e13000, 24576, MPOL_BIND, [0x0000000000000001], 2, 0) = 0
85 munmap(0x7fbd24e15000, 8192) = 0
// part 1
86 mbind(0x7fbd24e12000, 8192, MPOL_DEFAULT, NULL, 0, 0) = -1 EFAULT (Bad address)
87 mbind(0x7fbd24e13000, 12288, MPOL_DEFAULT, NULL, 0, 0) = 0
// part 2
88 mbind(0x7fbd24e16000, 8192, MPOL_BIND, [0x0000000000000002], 3, 0) = -1 EFAULT (Bad address)
89 mbind(0x7fbd24e17000, 12288, MPOL_BIND, [0x0000000000000002], 3, 0) = 0
90 mbind(0x7fbd24e16000, 4096, MPOL_BIND, [0x0000000000000002], 3, 0) = -1 EFAULT (Bad address)
91 mbind(0x7fbd24e17000, 8192, MPOL_BIND, [0x0000000000000002], 3, 0) = 0
The results on line 86 and line 89 were not correct (other lines were expected):
line 86: hole at head side of range was reported as error; this should
sucess for MPOL_DEFAULT;
line 89: hole at tail side of range was reported as success; this should
fail for !MPOL_DEFAULT cases;
My patch only corrected line 86 case, but didn't handle line 89 case. It
is better to detect valid or invalid hole for MPOL_DEFAULT and
!MPOL_DEFAULT cases in queue_pages_range phase.
New patch will be prepared, and fullfill the linux API description:
1. for MPOL_DEFAULT, hole at any part of specified range is allowed;
2. for !MPOL_DEFAULT, hole at any part of specified range is not allowed.
Xinhai
(BTW, I am adding two more mail accounts of mine to check which is best for
this mailling list...)
>> > ---
>> >
>> > mm/mempolicy.c | 2 +-
>> > 1 file changed, 1 insertion(+), 1 deletion(-)
>> >
>> >
>> > diff --git a/mm/mempolicy.c b/mm/mempolicy.c
>> > index 4ae967bcf954..ae160d9936d9 100644
>> > --- a/mm/mempolicy.c
>> > +++ b/mm/mempolicy.c
>> > @@ -738,7 +738,7 @@ static int mbind_range(struct mm_struct *mm, unsigned long start,
>> > unsigned long vmend;
>> >
>> > vma = find_vma(mm, start);
>> > - if (!vma || vma->vm_start > start)
>> > + if (!vma || vma->vm_start >= end)
>> > return -EFAULT;
>> >
>> > prev = vma->vm_prev;
>> >
>
next prev parent reply other threads:[~2019-10-28 8:13 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-10-24 7:35 [PATCH] mm: allow unmapped hole at head side of mbind range Li Xinhai
2019-10-24 10:48 ` Vlastimil Babka
2019-10-25 2:32 ` Hugh Dickins
2019-10-28 8:12 ` lixinhai.lxh [this message]
2019-10-24 12:25 ` Michal Hocko
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=2019102816125759600417@gmail.com \
--to=lixinhai.lxh@gmail.com \
--cc=hughd@google.com \
--cc=linux-api@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=lixinhai_lxh@126.com \
--cc=mhocko@kernel.org \
--cc=vbabka@suse.cz \
--cc=xinhai.li@outlook.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.