* Re: POWER: Unexpected fault when writing to brk-allocated memory
2017-11-07 11:15 ` Kirill A. Shutemov
@ 2017-11-07 11:15 ` Kirill A. Shutemov
2017-11-07 11:26 ` Florian Weimer
2017-11-07 11:56 ` Nicholas Piggin
2 siblings, 0 replies; 34+ messages in thread
From: Kirill A. Shutemov @ 2017-11-07 11:15 UTC (permalink / raw)
To: Nicholas Piggin
Cc: Florian Weimer, Aneesh Kumar K.V, Kirill A. Shutemov,
linuxppc-dev, linux-mm, Andrew Morton, Andy Lutomirski,
Dave Hansen, Linus Torvalds, Peter Zijlstra, Thomas Gleixner,
linux-arch, Ingo Molnar, Linux Kernel Mailing List
On Tue, Nov 07, 2017 at 04:07:05PM +1100, Nicholas Piggin wrote:
> C'ing everyone who was on the x86 56-bit user virtual address patch.
>
> I think we need more time to discuss this behaviour, in light of the
> regression Florian uncovered. I would propose we turn off the 56-bit
> user virtual address support for x86 for 4.14, and powerpc would
> follow and turn off its 512T support until we can get a better handle
> on the problems. (Actually Florian initially hit a couple of bugs in
> powerpc implementation, but pulling that string uncovers a whole lot
> of difficulties.)
>
> The bi-modal behavior switched based on a combination of mmap address
> hint and MAP_FIXED just sucks. It's segregating our VA space with
> some non-standard heuristics, and it doesn't seem to work very well.
>
> What are we trying to do? Allow SAP HANA etc use huge address spaces
> by coding to these specific mmap heuristics we're going to add,
> rather than solving it properly in a way that requires adding a new
> syscall or personality or prctl or sysctl. Okay, but the cost is that
> despite best efforts, it still changes ABI behaviour for existing
> applications and these heuristics will become baked into the ABI that
> we will have to support. Not a good tradeoff IMO.
>
> First of all, using addr and MAP_FIXED to develop our heuristic can
> never really give unchanged ABI. It's an in-band signal. brk() is a
> good example that steadily keeps incrementing address, so depending
> on malloc usage and address space randomization, you will get a brk()
> that ends exactly at 128T, then the next one will be >
> DEFAULT_MAP_WINDOW, and it will switch you to 56 bit address space.
No, it won't. You will hit stack first.
> Second, the kernel can never completely solve the problem this way.
> How do we know a malloc library will not ask for > 128TB addresses
> and pass them to an unknowing application?
The idea is that an application can provide hint (mallopt() ?) to malloc
implementation that it's ready to full address space. In this case, malloc
can use mmap((void *) -1,...) for its allocations and get full address
space this way.
> And lastly, there are a fair few bugs and places where description
> in changelogs and mailing lists does not match code. You don't want
> to know the mess in powerpc, but even x86 has two I can see:
> MAP_FIXED succeeds even when crossing 128TB addresses (where changelog
> indicated it should not),
Hm. I don't see where the changelog indicated that MAP_FIXED across 128TB
shouldn't work. My intention was that it should, although I haven't stated
it in the changelog.
The idea was we shouldn't allow to slip above 47-bits by accidentally.
Correctly functioning program would never request addr+len above 47-bit
with MAP_FIXED, unless it's ready to handle such addresses. Otherwise the
request would simply fail on machine that doesn't support large VA.
In contrast, addr+len above 47-bit without MAP_FIXED will not fail on
machine that doesn't support large VA, kernel will find another place
under 47-bit. And I can imagine a reasonable application that does
something like this.
So we cannot rely that application is ready to handle large
addresses if we see addr+len without MAP_FIXED.
> arch_get_unmapped_area_topdown() with an address hint is checking
> against TASK_SIZE rather than the limited 128TB address, so it looks
> like it won't follow the heuristics.
You are right. This is broken. If user would request mapping above vdso,
but below DEFAULT_MAP_WINDOW it will succeed.
I'll send patch to fix this. But it doesn't look as a show-stopper to me.
Re-checking things for this reply I found actual bug, see:
http://lkml.kernel.org/r/20171107103804.47341-1-kirill.shutemov@linux.intel.com
> So unless everyone else thinks I'm crazy and disagrees, I'd ask for
> a bit more time to make sure we get this interface right. I would
> hope for something like prctl PR_SET_MM which can be used to set
> our user virtual address bits on a fine grained basis. Maybe a
> sysctl, maybe a personality. Something out-of-band. I don't wan to
> get too far into that discussion yet. First we need to agree whether
> or not the code in the tree today is a problem.
Well, we've discussed before all options you are proposing.
Linus wanted a minimalistic interface, so we took this path for now.
We can always add more ways to get access to full address space later.
--
Kirill A. Shutemov
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: POWER: Unexpected fault when writing to brk-allocated memory
2017-11-07 11:15 ` Kirill A. Shutemov
2017-11-07 11:15 ` Kirill A. Shutemov
@ 2017-11-07 11:26 ` Florian Weimer
2017-11-07 11:26 ` Florian Weimer
2017-11-07 11:44 ` Kirill A. Shutemov
2017-11-07 11:56 ` Nicholas Piggin
2 siblings, 2 replies; 34+ messages in thread
From: Florian Weimer @ 2017-11-07 11:26 UTC (permalink / raw)
To: Kirill A. Shutemov, Nicholas Piggin
Cc: Aneesh Kumar K.V, Kirill A. Shutemov, linuxppc-dev, linux-mm,
Andrew Morton, Andy Lutomirski, Dave Hansen, Linus Torvalds,
Peter Zijlstra, Thomas Gleixner, linux-arch, Ingo Molnar,
Linux Kernel Mailing List
On 11/07/2017 12:15 PM, Kirill A. Shutemov wrote:
>> First of all, using addr and MAP_FIXED to develop our heuristic can
>> never really give unchanged ABI. It's an in-band signal. brk() is a
>> good example that steadily keeps incrementing address, so depending
>> on malloc usage and address space randomization, you will get a brk()
>> that ends exactly at 128T, then the next one will be >
>> DEFAULT_MAP_WINDOW, and it will switch you to 56 bit address space.
>
> No, it won't. You will hit stack first.
That's not actually true on POWER in some cases. See the process maps I
posted here:
<https://marc.info/?l=linuxppc-embedded&m=150988538106263&w=2>
Thanks,
Florian
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: POWER: Unexpected fault when writing to brk-allocated memory
2017-11-07 11:26 ` Florian Weimer
@ 2017-11-07 11:26 ` Florian Weimer
2017-11-07 11:44 ` Kirill A. Shutemov
1 sibling, 0 replies; 34+ messages in thread
From: Florian Weimer @ 2017-11-07 11:26 UTC (permalink / raw)
To: Kirill A. Shutemov, Nicholas Piggin
Cc: Aneesh Kumar K.V, Kirill A. Shutemov, linuxppc-dev, linux-mm,
Andrew Morton, Andy Lutomirski, Dave Hansen, Linus Torvalds,
Peter Zijlstra, Thomas Gleixner, linux-arch, Ingo Molnar,
Linux Kernel Mailing List
On 11/07/2017 12:15 PM, Kirill A. Shutemov wrote:
>> First of all, using addr and MAP_FIXED to develop our heuristic can
>> never really give unchanged ABI. It's an in-band signal. brk() is a
>> good example that steadily keeps incrementing address, so depending
>> on malloc usage and address space randomization, you will get a brk()
>> that ends exactly at 128T, then the next one will be >
>> DEFAULT_MAP_WINDOW, and it will switch you to 56 bit address space.
>
> No, it won't. You will hit stack first.
That's not actually true on POWER in some cases. See the process maps I
posted here:
<https://marc.info/?l=linuxppc-embedded&m=150988538106263&w=2>
Thanks,
Florian
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: POWER: Unexpected fault when writing to brk-allocated memory
2017-11-07 11:26 ` Florian Weimer
2017-11-07 11:26 ` Florian Weimer
@ 2017-11-07 11:44 ` Kirill A. Shutemov
2017-11-07 11:44 ` Kirill A. Shutemov
2017-11-07 13:05 ` Florian Weimer
1 sibling, 2 replies; 34+ messages in thread
From: Kirill A. Shutemov @ 2017-11-07 11:44 UTC (permalink / raw)
To: Florian Weimer
Cc: Nicholas Piggin, Aneesh Kumar K.V, Kirill A. Shutemov,
linuxppc-dev, linux-mm, Andrew Morton, Andy Lutomirski,
Dave Hansen, Linus Torvalds, Peter Zijlstra, Thomas Gleixner,
linux-arch, Ingo Molnar, Linux Kernel Mailing List
On Tue, Nov 07, 2017 at 12:26:12PM +0100, Florian Weimer wrote:
> On 11/07/2017 12:15 PM, Kirill A. Shutemov wrote:
>
> > > First of all, using addr and MAP_FIXED to develop our heuristic can
> > > never really give unchanged ABI. It's an in-band signal. brk() is a
> > > good example that steadily keeps incrementing address, so depending
> > > on malloc usage and address space randomization, you will get a brk()
> > > that ends exactly at 128T, then the next one will be >
> > > DEFAULT_MAP_WINDOW, and it will switch you to 56 bit address space.
> >
> > No, it won't. You will hit stack first.
>
> That's not actually true on POWER in some cases. See the process maps I
> posted here:
>
> <https://marc.info/?l=linuxppc-embedded&m=150988538106263&w=2>
Hm? I see that in all three cases the [stack] is the last mapping.
Do I miss something?
--
Kirill A. Shutemov
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: POWER: Unexpected fault when writing to brk-allocated memory
2017-11-07 11:44 ` Kirill A. Shutemov
@ 2017-11-07 11:44 ` Kirill A. Shutemov
2017-11-07 13:05 ` Florian Weimer
1 sibling, 0 replies; 34+ messages in thread
From: Kirill A. Shutemov @ 2017-11-07 11:44 UTC (permalink / raw)
To: Florian Weimer
Cc: Nicholas Piggin, Aneesh Kumar K.V, Kirill A. Shutemov,
linuxppc-dev, linux-mm, Andrew Morton, Andy Lutomirski,
Dave Hansen, Linus Torvalds, Peter Zijlstra, Thomas Gleixner,
linux-arch, Ingo Molnar, Linux Kernel Mailing List
On Tue, Nov 07, 2017 at 12:26:12PM +0100, Florian Weimer wrote:
> On 11/07/2017 12:15 PM, Kirill A. Shutemov wrote:
>
> > > First of all, using addr and MAP_FIXED to develop our heuristic can
> > > never really give unchanged ABI. It's an in-band signal. brk() is a
> > > good example that steadily keeps incrementing address, so depending
> > > on malloc usage and address space randomization, you will get a brk()
> > > that ends exactly at 128T, then the next one will be >
> > > DEFAULT_MAP_WINDOW, and it will switch you to 56 bit address space.
> >
> > No, it won't. You will hit stack first.
>
> That's not actually true on POWER in some cases. See the process maps I
> posted here:
>
> <https://marc.info/?l=linuxppc-embedded&m=150988538106263&w=2>
Hm? I see that in all three cases the [stack] is the last mapping.
Do I miss something?
--
Kirill A. Shutemov
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: POWER: Unexpected fault when writing to brk-allocated memory
2017-11-07 11:44 ` Kirill A. Shutemov
2017-11-07 11:44 ` Kirill A. Shutemov
@ 2017-11-07 13:05 ` Florian Weimer
2017-11-07 13:16 ` Kirill A. Shutemov
1 sibling, 1 reply; 34+ messages in thread
From: Florian Weimer @ 2017-11-07 13:05 UTC (permalink / raw)
To: Kirill A. Shutemov
Cc: Nicholas Piggin, Aneesh Kumar K.V, Kirill A. Shutemov,
linuxppc-dev, linux-mm, Andrew Morton, Andy Lutomirski,
Dave Hansen, Linus Torvalds, Peter Zijlstra, Thomas Gleixner,
linux-arch, Ingo Molnar, Linux Kernel Mailing List
[-- Attachment #1: Type: text/plain, Size: 1083 bytes --]
On 11/07/2017 12:44 PM, Kirill A. Shutemov wrote:
> On Tue, Nov 07, 2017 at 12:26:12PM +0100, Florian Weimer wrote:
>> On 11/07/2017 12:15 PM, Kirill A. Shutemov wrote:
>>
>>>> First of all, using addr and MAP_FIXED to develop our heuristic can
>>>> never really give unchanged ABI. It's an in-band signal. brk() is a
>>>> good example that steadily keeps incrementing address, so depending
>>>> on malloc usage and address space randomization, you will get a brk()
>>>> that ends exactly at 128T, then the next one will be >
>>>> DEFAULT_MAP_WINDOW, and it will switch you to 56 bit address space.
>>>
>>> No, it won't. You will hit stack first.
>>
>> That's not actually true on POWER in some cases. See the process maps I
>> posted here:
>>
>> <https://marc.info/?l=linuxppc-embedded&m=150988538106263&w=2>
>
> Hm? I see that in all three cases the [stack] is the last mapping.
> Do I miss something?
Hah, I had not noticed. Occasionally, the order of heap and stack is
reversed. This happens in approximately 15% of the runs.
See the attached example.
Thanks,
Florian
[-- Attachment #2: maps.txt --]
[-- Type: text/plain, Size: 2684 bytes --]
7fffacc50000-7fffacc90000 rw-p 00000000 00:00 0
7fffacc90000-7fffaccf0000 r--p 00000000 fd:00 25167925 /usr/lib/locale/en_US.utf8/LC_CTYPE
7fffaccf0000-7fffacd00000 r--p 00000000 fd:00 25167928 /usr/lib/locale/en_US.utf8/LC_NUMERIC
7fffacd00000-7fffacd10000 r--p 00000000 fd:00 16798929 /usr/lib/locale/en_US.utf8/LC_TIME
7fffacd10000-7ffface40000 r--p 00000000 fd:00 25167924 /usr/lib/locale/en_US.utf8/LC_COLLATE
7ffface40000-7ffface50000 r--p 00000000 fd:00 16798927 /usr/lib/locale/en_US.utf8/LC_MONETARY
7ffface50000-7ffface60000 r--p 00000000 fd:00 2511 /usr/lib/locale/en_US.utf8/LC_MESSAGES/SYS_LC_MESSAGES
7ffface60000-7ffface70000 r--p 00000000 fd:00 16798942 /usr/lib/locale/en_US.utf8/LC_PAPER
7ffface70000-7ffface80000 r--p 00000000 fd:00 25167927 /usr/lib/locale/en_US.utf8/LC_NAME
7ffface80000-7ffface90000 r--p 00000000 fd:00 16798924 /usr/lib/locale/en_US.utf8/LC_ADDRESS
7ffface90000-7fffacea0000 r--p 00000000 fd:00 16798928 /usr/lib/locale/en_US.utf8/LC_TELEPHONE
7fffacea0000-7fffaceb0000 r--p 00000000 fd:00 16798926 /usr/lib/locale/en_US.utf8/LC_MEASUREMENT
7fffaceb0000-7fffacec0000 r--s 00000000 fd:00 8390657 /usr/lib64/gconv/gconv-modules.cache
7fffacec0000-7fffad0d0000 r-xp 00000000 fd:00 8390335 /usr/lib64/libc-2.25.so
7fffad0d0000-7fffad0e0000 ---p 00210000 fd:00 8390335 /usr/lib64/libc-2.25.so
7fffad0e0000-7fffad0f0000 r--p 00210000 fd:00 8390335 /usr/lib64/libc-2.25.so
7fffad0f0000-7fffad100000 rw-p 00220000 fd:00 8390335 /usr/lib64/libc-2.25.so
7fffad100000-7fffad110000 r--p 00000000 fd:00 16798925 /usr/lib/locale/en_US.utf8/LC_IDENTIFICATION
7fffad110000-7fffad120000 r-xp 00000000 fd:00 63543 /usr/bin/cat
7fffad120000-7fffad130000 r--p 00000000 fd:00 63543 /usr/bin/cat
7fffad130000-7fffad140000 rw-p 00010000 fd:00 63543 /usr/bin/cat
7fffad140000-7fffad160000 r-xp 00000000 00:00 0 [vdso]
7fffad160000-7fffad1a0000 r-xp 00000000 fd:00 8390328 /usr/lib64/ld-2.25.so
7fffad1a0000-7fffad1b0000 r--p 00030000 fd:00 8390328 /usr/lib64/ld-2.25.so
7fffad1b0000-7fffad1c0000 rw-p 00040000 fd:00 8390328 /usr/lib64/ld-2.25.so
7fffc2cf0000-7fffc2d20000 rw-p 00000000 00:00 0 [stack]
7fffc8c10000-7fffc8c40000 rw-p 00000000 00:00 0 [heap]
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: POWER: Unexpected fault when writing to brk-allocated memory
2017-11-07 13:05 ` Florian Weimer
@ 2017-11-07 13:16 ` Kirill A. Shutemov
2017-11-08 6:08 ` Michael Ellerman
0 siblings, 1 reply; 34+ messages in thread
From: Kirill A. Shutemov @ 2017-11-07 13:16 UTC (permalink / raw)
To: Florian Weimer
Cc: Nicholas Piggin, Aneesh Kumar K.V, Kirill A. Shutemov,
linuxppc-dev, linux-mm, Andrew Morton, Andy Lutomirski,
Dave Hansen, Linus Torvalds, Peter Zijlstra, Thomas Gleixner,
linux-arch, Ingo Molnar, Linux Kernel Mailing List
On Tue, Nov 07, 2017 at 02:05:42PM +0100, Florian Weimer wrote:
> On 11/07/2017 12:44 PM, Kirill A. Shutemov wrote:
> > On Tue, Nov 07, 2017 at 12:26:12PM +0100, Florian Weimer wrote:
> > > On 11/07/2017 12:15 PM, Kirill A. Shutemov wrote:
> > >
> > > > > First of all, using addr and MAP_FIXED to develop our heuristic can
> > > > > never really give unchanged ABI. It's an in-band signal. brk() is a
> > > > > good example that steadily keeps incrementing address, so depending
> > > > > on malloc usage and address space randomization, you will get a brk()
> > > > > that ends exactly at 128T, then the next one will be >
> > > > > DEFAULT_MAP_WINDOW, and it will switch you to 56 bit address space.
> > > >
> > > > No, it won't. You will hit stack first.
> > >
> > > That's not actually true on POWER in some cases. See the process maps I
> > > posted here:
> > >
> > > <https://marc.info/?l=linuxppc-embedded&m=150988538106263&w=2>
> >
> > Hm? I see that in all three cases the [stack] is the last mapping.
> > Do I miss something?
>
> Hah, I had not noticed. Occasionally, the order of heap and stack is
> reversed. This happens in approximately 15% of the runs.
Heh. I guess ASLR on Power is too fancy :)
That's strange layout. It doesn't give that much (relatively speaking)
virtual address space for both stack and heap to grow.
--
Kirill A. Shutemov
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: POWER: Unexpected fault when writing to brk-allocated memory
2017-11-07 13:16 ` Kirill A. Shutemov
@ 2017-11-08 6:08 ` Michael Ellerman
2017-11-08 6:18 ` Florian Weimer
0 siblings, 1 reply; 34+ messages in thread
From: Michael Ellerman @ 2017-11-08 6:08 UTC (permalink / raw)
To: Kirill A. Shutemov, Florian Weimer, Kees Cook
Cc: linux-arch, Dave Hansen, Peter Zijlstra, Linus Torvalds,
Ingo Molnar, Linux Kernel Mailing List, Nicholas Piggin,
Andy Lutomirski, linux-mm, Aneesh Kumar K.V, Andrew Morton,
linuxppc-dev, Thomas Gleixner, Kirill A. Shutemov
"Kirill A. Shutemov" <kirill@shutemov.name> writes:
> On Tue, Nov 07, 2017 at 02:05:42PM +0100, Florian Weimer wrote:
>> On 11/07/2017 12:44 PM, Kirill A. Shutemov wrote:
>> > On Tue, Nov 07, 2017 at 12:26:12PM +0100, Florian Weimer wrote:
>> > > On 11/07/2017 12:15 PM, Kirill A. Shutemov wrote:
>> > >
>> > > > > First of all, using addr and MAP_FIXED to develop our heuristic can
>> > > > > never really give unchanged ABI. It's an in-band signal. brk() is a
>> > > > > good example that steadily keeps incrementing address, so depending
>> > > > > on malloc usage and address space randomization, you will get a brk()
>> > > > > that ends exactly at 128T, then the next one will be >
>> > > > > DEFAULT_MAP_WINDOW, and it will switch you to 56 bit address space.
>> > > >
>> > > > No, it won't. You will hit stack first.
>> > >
>> > > That's not actually true on POWER in some cases. See the process maps I
>> > > posted here:
>> > >
>> > > <https://marc.info/?l=linuxppc-embedded&m=150988538106263&w=2>
>> >
>> > Hm? I see that in all three cases the [stack] is the last mapping.
>> > Do I miss something?
>>
>> Hah, I had not noticed. Occasionally, the order of heap and stack is
>> reversed. This happens in approximately 15% of the runs.
>
> Heh. I guess ASLR on Power is too fancy :)
Fancy implies we're doing it on purpose :P
> That's strange layout. It doesn't give that much (relatively speaking)
> virtual address space for both stack and heap to grow.
I'm pretty sure it only happens when you're running an ELF interpreter
directly, because of Kees patch which changed the logic to load ELF
interpreters in the mmap region, vs PIE binaries which go to
ELF_ET_DYN_BASE. (eab09532d400 ("binfmt_elf: use ELF_ET_DYN_BASE only
for PIE"))
It only happens with ASLR enabled. Presumably it's because our brk_rnd()
is overly aggressive in this case, it randomises up to 1GB, and the heap
jumps over the stack.
cheers
^ permalink raw reply [flat|nested] 34+ messages in thread* Re: POWER: Unexpected fault when writing to brk-allocated memory
2017-11-08 6:08 ` Michael Ellerman
@ 2017-11-08 6:18 ` Florian Weimer
0 siblings, 0 replies; 34+ messages in thread
From: Florian Weimer @ 2017-11-08 6:18 UTC (permalink / raw)
To: Michael Ellerman, Kirill A. Shutemov, Kees Cook
Cc: linux-arch, Dave Hansen, Peter Zijlstra, Linus Torvalds,
Ingo Molnar, Linux Kernel Mailing List, Nicholas Piggin,
Andy Lutomirski, linux-mm, Aneesh Kumar K.V, Andrew Morton,
linuxppc-dev, Thomas Gleixner, Kirill A. Shutemov
On 11/08/2017 07:08 AM, Michael Ellerman wrote:
> "Kirill A. Shutemov" <kirill@shutemov.name> writes:
>
>> On Tue, Nov 07, 2017 at 02:05:42PM +0100, Florian Weimer wrote:
>>> On 11/07/2017 12:44 PM, Kirill A. Shutemov wrote:
>>>> On Tue, Nov 07, 2017 at 12:26:12PM +0100, Florian Weimer wrote:
>>>>> On 11/07/2017 12:15 PM, Kirill A. Shutemov wrote:
>>>>>
>>>>>>> First of all, using addr and MAP_FIXED to develop our heuristic can
>>>>>>> never really give unchanged ABI. It's an in-band signal. brk() is a
>>>>>>> good example that steadily keeps incrementing address, so depending
>>>>>>> on malloc usage and address space randomization, you will get a brk()
>>>>>>> that ends exactly at 128T, then the next one will be >
>>>>>>> DEFAULT_MAP_WINDOW, and it will switch you to 56 bit address space.
>>>>>>
>>>>>> No, it won't. You will hit stack first.
>>>>>
>>>>> That's not actually true on POWER in some cases. See the process maps I
>>>>> posted here:
>>>>>
>>>>> <https://marc.info/?l=linuxppc-embedded&m=150988538106263&w=2>
>>>>
>>>> Hm? I see that in all three cases the [stack] is the last mapping.
>>>> Do I miss something?
>>>
>>> Hah, I had not noticed. Occasionally, the order of heap and stack is
>>> reversed. This happens in approximately 15% of the runs.
>>
>> Heh. I guess ASLR on Power is too fancy :)
>
> Fancy implies we're doing it on purpose :P
>
>> That's strange layout. It doesn't give that much (relatively speaking)
>> virtual address space for both stack and heap to grow.
>
> I'm pretty sure it only happens when you're running an ELF interpreter
> directly, because of Kees patch which changed the logic to load ELF
> interpreters in the mmap region, vs PIE binaries which go to
> ELF_ET_DYN_BASE. (eab09532d400 ("binfmt_elf: use ELF_ET_DYN_BASE only
> for PIE"))
From that commit:
+ * There are effectively two types of ET_DYN
+ * binaries: programs (i.e. PIE: ET_DYN with INTERP)
+ * and loaders (ET_DYN without INTERP, since they
+ * _are_ the ELF interpreter). The loaders must
Note that the comment is a bit misleading: statically linked PIE
binaries are ET_DYN without INTERP, too.
So any oddity which is observable today with an explicitly ld.so
invocation only will gain more relevance once we get static PIE support
in user space because it will then affect regular applications, too.
(Well, statically linked ones.) In this sense, process layouts which
cause premature brk failure or insufficient stack allocations are real bugs.
Thanks,
Florian
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: POWER: Unexpected fault when writing to brk-allocated memory
2017-11-07 11:15 ` Kirill A. Shutemov
2017-11-07 11:15 ` Kirill A. Shutemov
2017-11-07 11:26 ` Florian Weimer
@ 2017-11-07 11:56 ` Nicholas Piggin
2017-11-07 12:28 ` Kirill A. Shutemov
2 siblings, 1 reply; 34+ messages in thread
From: Nicholas Piggin @ 2017-11-07 11:56 UTC (permalink / raw)
To: Kirill A. Shutemov
Cc: Florian Weimer, Aneesh Kumar K.V, Kirill A. Shutemov,
linuxppc-dev, linux-mm, Andrew Morton, Andy Lutomirski,
Dave Hansen, Linus Torvalds, Peter Zijlstra, Thomas Gleixner,
linux-arch, Ingo Molnar, Linux Kernel Mailing List
On Tue, 7 Nov 2017 14:15:43 +0300
"Kirill A. Shutemov" <kirill@shutemov.name> wrote:
> On Tue, Nov 07, 2017 at 04:07:05PM +1100, Nicholas Piggin wrote:
> > C'ing everyone who was on the x86 56-bit user virtual address patch.
> >
> > I think we need more time to discuss this behaviour, in light of the
> > regression Florian uncovered. I would propose we turn off the 56-bit
> > user virtual address support for x86 for 4.14, and powerpc would
> > follow and turn off its 512T support until we can get a better handle
> > on the problems. (Actually Florian initially hit a couple of bugs in
> > powerpc implementation, but pulling that string uncovers a whole lot
> > of difficulties.)
> >
> > The bi-modal behavior switched based on a combination of mmap address
> > hint and MAP_FIXED just sucks. It's segregating our VA space with
> > some non-standard heuristics, and it doesn't seem to work very well.
> >
> > What are we trying to do? Allow SAP HANA etc use huge address spaces
> > by coding to these specific mmap heuristics we're going to add,
> > rather than solving it properly in a way that requires adding a new
> > syscall or personality or prctl or sysctl. Okay, but the cost is that
> > despite best efforts, it still changes ABI behaviour for existing
> > applications and these heuristics will become baked into the ABI that
> > we will have to support. Not a good tradeoff IMO.
> >
> > First of all, using addr and MAP_FIXED to develop our heuristic can
> > never really give unchanged ABI. It's an in-band signal. brk() is a
> > good example that steadily keeps incrementing address, so depending
> > on malloc usage and address space randomization, you will get a brk()
> > that ends exactly at 128T, then the next one will be >
> > DEFAULT_MAP_WINDOW, and it will switch you to 56 bit address space.
>
> No, it won't. You will hit stack first.
I guess so. Florian's bug didn't crash there for some reason, okay
but I suppose my point about brk is not exactly where the standard
heap is, but the pattern of allocations. An allocator that uses
mmap for managing its address space might do the same thing, e.g.,
incrementally expand existing mmaps as necessary.
> > Second, the kernel can never completely solve the problem this way.
> > How do we know a malloc library will not ask for > 128TB addresses
> > and pass them to an unknowing application?
>
> The idea is that an application can provide hint (mallopt() ?) to malloc
> implementation that it's ready to full address space. In this case, malloc
> can use mmap((void *) -1,...) for its allocations and get full address
> space this way.
Point is, there's nothing stopping an allocator library or runtime
from asking for mmap anywhere and returning it to userspace.
Do > 128TB pointers matter so much that we should add this heuristic
to prevent breakage, but little enough that we can accept some rare
cases getting through? Genuine question.
>
> > And lastly, there are a fair few bugs and places where description
> > in changelogs and mailing lists does not match code. You don't want
> > to know the mess in powerpc, but even x86 has two I can see:
> > MAP_FIXED succeeds even when crossing 128TB addresses (where changelog
> > indicated it should not),
>
> Hm. I don't see where the changelog indicated that MAP_FIXED across 128TB
> shouldn't work. My intention was that it should, although I haven't stated
> it in the changelog.
To mitigate this, we are not going to allocate virtual address space
above 47-bit by default.
But userspace can ask for allocation from full address space by
specifying hint address (with or without MAP_FIXED) above 47-bits.
Yet we got 48 bit address with 47 bit address (with MAP_FIXED).
>
> The idea was we shouldn't allow to slip above 47-bits by accidentally.
>
> Correctly functioning program would never request addr+len above 47-bit
> with MAP_FIXED, unless it's ready to handle such addresses. Otherwise the
> request would simply fail on machine that doesn't support large VA.
>
> In contrast, addr+len above 47-bit without MAP_FIXED will not fail on
> machine that doesn't support large VA, kernel will find another place
> under 47-bit. And I can imagine a reasonable application that does
> something like this.
>
> So we cannot rely that application is ready to handle large
> addresses if we see addr+len without MAP_FIXED.
By the same logic, a request for addr > 128TB without MAP_FIXED will
not fail, therefore we can't rely on that either.
Or an app that links to a library that attempts MAP_FIXED allocation
of addr + len above 128TB might use high bits of pointer returned by
that library because those are never satisfied today and the library
would fall back.
>
> > arch_get_unmapped_area_topdown() with an address hint is checking
> > against TASK_SIZE rather than the limited 128TB address, so it looks
> > like it won't follow the heuristics.
>
> You are right. This is broken. If user would request mapping above vdso,
> but below DEFAULT_MAP_WINDOW it will succeed.
>
> I'll send patch to fix this. But it doesn't look as a show-stopper to me.
>
> Re-checking things for this reply I found actual bug, see:
>
> http://lkml.kernel.org/r/20171107103804.47341-1-kirill.shutemov@linux.intel.com
It's not a show stopper per se, and of course I don't expect code to be
bug free, it's just concerning that we haven't had any kind of regression
testing or observations that have checked for this.
>
> > So unless everyone else thinks I'm crazy and disagrees, I'd ask for
> > a bit more time to make sure we get this interface right. I would
> > hope for something like prctl PR_SET_MM which can be used to set
> > our user virtual address bits on a fine grained basis. Maybe a
> > sysctl, maybe a personality. Something out-of-band. I don't wan to
> > get too far into that discussion yet. First we need to agree whether
> > or not the code in the tree today is a problem.
>
> Well, we've discussed before all options you are proposing.
> Linus wanted a minimalistic interface, so we took this path for now.
> We can always add more ways to get access to full address space later.
>
Right, I'm just raising it again with additional justifications. I don't
think it's the right way to go or has to be rushed now.
Thanks,
Nick
^ permalink raw reply [flat|nested] 34+ messages in thread* Re: POWER: Unexpected fault when writing to brk-allocated memory
2017-11-07 11:56 ` Nicholas Piggin
@ 2017-11-07 12:28 ` Kirill A. Shutemov
2017-11-07 12:28 ` Kirill A. Shutemov
2017-11-07 13:33 ` Nicholas Piggin
0 siblings, 2 replies; 34+ messages in thread
From: Kirill A. Shutemov @ 2017-11-07 12:28 UTC (permalink / raw)
To: Nicholas Piggin
Cc: Florian Weimer, Aneesh Kumar K.V, Kirill A. Shutemov,
linuxppc-dev, linux-mm, Andrew Morton, Andy Lutomirski,
Dave Hansen, Linus Torvalds, Peter Zijlstra, Thomas Gleixner,
linux-arch, Ingo Molnar, Linux Kernel Mailing List
On Tue, Nov 07, 2017 at 10:56:36PM +1100, Nicholas Piggin wrote:
> > No, it won't. You will hit stack first.
>
> I guess so. Florian's bug didn't crash there for some reason, okay
> but I suppose my point about brk is not exactly where the standard
> heap is, but the pattern of allocations. An allocator that uses
> mmap for managing its address space might do the same thing, e.g.,
> incrementally expand existing mmaps as necessary.
With MAP_FIXED? I don't think so.
> > > Second, the kernel can never completely solve the problem this way.
> > > How do we know a malloc library will not ask for > 128TB addresses
> > > and pass them to an unknowing application?
> >
> > The idea is that an application can provide hint (mallopt() ?) to malloc
> > implementation that it's ready to full address space. In this case, malloc
> > can use mmap((void *) -1,...) for its allocations and get full address
> > space this way.
>
> Point is, there's nothing stopping an allocator library or runtime
> from asking for mmap anywhere and returning it to userspace.
Right. Nobody would stop it from doing stupid things. There are many
things that a library may do that application would not be happy about.
> Do > 128TB pointers matter so much that we should add this heuristic
> to prevent breakage, but little enough that we can accept some rare
> cases getting through? Genuine question.
At the end of the day what matters is if heuristic helps prevent breakage
of existing userspace and doesn't stay in the way of legitimate use of
full address space.
So far, it looks okay to me.
> > The idea was we shouldn't allow to slip above 47-bits by accidentally.
> >
> > Correctly functioning program would never request addr+len above 47-bit
> > with MAP_FIXED, unless it's ready to handle such addresses. Otherwise the
> > request would simply fail on machine that doesn't support large VA.
> >
> > In contrast, addr+len above 47-bit without MAP_FIXED will not fail on
> > machine that doesn't support large VA, kernel will find another place
> > under 47-bit. And I can imagine a reasonable application that does
> > something like this.
> >
> > So we cannot rely that application is ready to handle large
> > addresses if we see addr+len without MAP_FIXED.
>
> By the same logic, a request for addr > 128TB without MAP_FIXED will
> not fail, therefore we can't rely on that either.
>
> Or an app that links to a library that attempts MAP_FIXED allocation
> of addr + len above 128TB might use high bits of pointer returned by
> that library because those are never satisfied today and the library
> would fall back.
If you want to point that it's ABI break, yes it is.
But we allow ABI break as long as nobody notices. I think it's reasonable
to expect that nobody relies on such corner cases.
If we would find any piece of software affect by the change we would need
to reconsider.
--
Kirill A. Shutemov
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: POWER: Unexpected fault when writing to brk-allocated memory
2017-11-07 12:28 ` Kirill A. Shutemov
@ 2017-11-07 12:28 ` Kirill A. Shutemov
2017-11-07 13:33 ` Nicholas Piggin
1 sibling, 0 replies; 34+ messages in thread
From: Kirill A. Shutemov @ 2017-11-07 12:28 UTC (permalink / raw)
To: Nicholas Piggin
Cc: Florian Weimer, Aneesh Kumar K.V, Kirill A. Shutemov,
linuxppc-dev, linux-mm, Andrew Morton, Andy Lutomirski,
Dave Hansen, Linus Torvalds, Peter Zijlstra, Thomas Gleixner,
linux-arch, Ingo Molnar, Linux Kernel Mailing List
On Tue, Nov 07, 2017 at 10:56:36PM +1100, Nicholas Piggin wrote:
> > No, it won't. You will hit stack first.
>
> I guess so. Florian's bug didn't crash there for some reason, okay
> but I suppose my point about brk is not exactly where the standard
> heap is, but the pattern of allocations. An allocator that uses
> mmap for managing its address space might do the same thing, e.g.,
> incrementally expand existing mmaps as necessary.
With MAP_FIXED? I don't think so.
> > > Second, the kernel can never completely solve the problem this way.
> > > How do we know a malloc library will not ask for > 128TB addresses
> > > and pass them to an unknowing application?
> >
> > The idea is that an application can provide hint (mallopt() ?) to malloc
> > implementation that it's ready to full address space. In this case, malloc
> > can use mmap((void *) -1,...) for its allocations and get full address
> > space this way.
>
> Point is, there's nothing stopping an allocator library or runtime
> from asking for mmap anywhere and returning it to userspace.
Right. Nobody would stop it from doing stupid things. There are many
things that a library may do that application would not be happy about.
> Do > 128TB pointers matter so much that we should add this heuristic
> to prevent breakage, but little enough that we can accept some rare
> cases getting through? Genuine question.
At the end of the day what matters is if heuristic helps prevent breakage
of existing userspace and doesn't stay in the way of legitimate use of
full address space.
So far, it looks okay to me.
> > The idea was we shouldn't allow to slip above 47-bits by accidentally.
> >
> > Correctly functioning program would never request addr+len above 47-bit
> > with MAP_FIXED, unless it's ready to handle such addresses. Otherwise the
> > request would simply fail on machine that doesn't support large VA.
> >
> > In contrast, addr+len above 47-bit without MAP_FIXED will not fail on
> > machine that doesn't support large VA, kernel will find another place
> > under 47-bit. And I can imagine a reasonable application that does
> > something like this.
> >
> > So we cannot rely that application is ready to handle large
> > addresses if we see addr+len without MAP_FIXED.
>
> By the same logic, a request for addr > 128TB without MAP_FIXED will
> not fail, therefore we can't rely on that either.
>
> Or an app that links to a library that attempts MAP_FIXED allocation
> of addr + len above 128TB might use high bits of pointer returned by
> that library because those are never satisfied today and the library
> would fall back.
If you want to point that it's ABI break, yes it is.
But we allow ABI break as long as nobody notices. I think it's reasonable
to expect that nobody relies on such corner cases.
If we would find any piece of software affect by the change we would need
to reconsider.
--
Kirill A. Shutemov
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: POWER: Unexpected fault when writing to brk-allocated memory
2017-11-07 12:28 ` Kirill A. Shutemov
2017-11-07 12:28 ` Kirill A. Shutemov
@ 2017-11-07 13:33 ` Nicholas Piggin
2017-11-07 13:33 ` Nicholas Piggin
2017-11-07 13:45 ` Aneesh Kumar K.V
1 sibling, 2 replies; 34+ messages in thread
From: Nicholas Piggin @ 2017-11-07 13:33 UTC (permalink / raw)
To: Kirill A. Shutemov
Cc: Florian Weimer, Aneesh Kumar K.V, Kirill A. Shutemov,
linuxppc-dev, linux-mm, Andrew Morton, Andy Lutomirski,
Dave Hansen, Linus Torvalds, Peter Zijlstra, Thomas Gleixner,
linux-arch, Ingo Molnar, Linux Kernel Mailing List
On Tue, 7 Nov 2017 15:28:25 +0300
"Kirill A. Shutemov" <kirill@shutemov.name> wrote:
> On Tue, Nov 07, 2017 at 10:56:36PM +1100, Nicholas Piggin wrote:
> > > No, it won't. You will hit stack first.
> >
> > I guess so. Florian's bug didn't crash there for some reason, okay
> > but I suppose my point about brk is not exactly where the standard
> > heap is, but the pattern of allocations. An allocator that uses
> > mmap for managing its address space might do the same thing, e.g.,
> > incrementally expand existing mmaps as necessary.
>
> With MAP_FIXED? I don't think so.
brk() based allocator effectively usees MAP_FIXED. If you know where
your addresses are, using MAP_FIXED can be used.
But okay let's ignore MAP_FIXED as a corner case. Then what happens
with !MAP_FIXED when an allocation ends exactly at 128TB and then next
one begins at 128TB? Won't that expand the address space? Should we
ignore that corner case too?
>
> > > > Second, the kernel can never completely solve the problem this way.
> > > > How do we know a malloc library will not ask for > 128TB addresses
> > > > and pass them to an unknowing application?
> > >
> > > The idea is that an application can provide hint (mallopt() ?) to malloc
> > > implementation that it's ready to full address space. In this case, malloc
> > > can use mmap((void *) -1,...) for its allocations and get full address
> > > space this way.
> >
> > Point is, there's nothing stopping an allocator library or runtime
> > from asking for mmap anywhere and returning it to userspace.
>
> Right. Nobody would stop it from doing stupid things. There are many
> things that a library may do that application would not be happy about.
Indeed.
>
> > Do > 128TB pointers matter so much that we should add this heuristic
> > to prevent breakage, but little enough that we can accept some rare
> > cases getting through? Genuine question.
>
> At the end of the day what matters is if heuristic helps prevent breakage
> of existing userspace and doesn't stay in the way of legitimate use of
> full address space.
>
> So far, it looks okay to me.
Well that wasn't really the point of my question. Yes of course that
is important. But the question is how are these heuristics chosen and
evaluated? Why is this a good change to make?
We've decided there is some benefit from preventing 128TB pointers, but
also not enough that we have to completely prevent them accidentally
being returned. Yet it's important enough to make mmap behaviour diverge
in about 5 ways around 128TB depending on what combination of address
and length and MAP_FIXED you specify, and introducing this new way to
use the interface to get an expanded address space?
And we're doing this because we don't want to add a prctl or personality
or whatever for SAP HANA, because it was decided that would be too complex?
>
> > > The idea was we shouldn't allow to slip above 47-bits by accidentally.
> > >
> > > Correctly functioning program would never request addr+len above 47-bit
> > > with MAP_FIXED, unless it's ready to handle such addresses. Otherwise the
> > > request would simply fail on machine that doesn't support large VA.
> > >
> > > In contrast, addr+len above 47-bit without MAP_FIXED will not fail on
> > > machine that doesn't support large VA, kernel will find another place
> > > under 47-bit. And I can imagine a reasonable application that does
> > > something like this.
> > >
> > > So we cannot rely that application is ready to handle large
> > > addresses if we see addr+len without MAP_FIXED.
> >
> > By the same logic, a request for addr > 128TB without MAP_FIXED will
> > not fail, therefore we can't rely on that either.
> >
> > Or an app that links to a library that attempts MAP_FIXED allocation
> > of addr + len above 128TB might use high bits of pointer returned by
> > that library because those are never satisfied today and the library
> > would fall back.
>
> If you want to point that it's ABI break, yes it is.
>
> But we allow ABI break as long as nobody notices. I think it's reasonable
> to expect that nobody relies on such corner cases.
I accept the point, but I do worry mmap syscall is very fundamental,
and that it is used in a lot of ways that are difficult to foresee,
so there's a lot of room for unintended consequences. Plus the state
of the code (that's not to pick on x86, powerpc is worse) is worrying.
>
> If we would find any piece of software affect by the change we would need
> to reconsider.
>
Problem is that if there was an issue caused by this, it's unlikely
to be found until a long time later. By that time, hopefully there
aren't too many other apps that are now rely on the behaviour if it
has to be changed.
I just don't see why that's a risk worth taking at all. I can't see how
the cost benefit is there. I have to confess I could have been more
involved in discussion, but is any other interface honestly too complex
to add?
If it is decided to keep these kind of heuristics, can we get just a
small but reasonably precise description of each change to the
interface and ways for using the new functionality, such that would be
suitable for the man page? I couldn't fix powerpc because nothing
matches and even Aneesh and you differ on some details (MAP_FIXED
behaviour).
Thanks,
Nick
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: POWER: Unexpected fault when writing to brk-allocated memory
2017-11-07 13:33 ` Nicholas Piggin
@ 2017-11-07 13:33 ` Nicholas Piggin
2017-11-07 13:45 ` Aneesh Kumar K.V
1 sibling, 0 replies; 34+ messages in thread
From: Nicholas Piggin @ 2017-11-07 13:33 UTC (permalink / raw)
To: Kirill A. Shutemov
Cc: Florian Weimer, Aneesh Kumar K.V, Kirill A. Shutemov,
linuxppc-dev, linux-mm, Andrew Morton, Andy Lutomirski,
Dave Hansen, Linus Torvalds, Peter Zijlstra, Thomas Gleixner,
linux-arch, Ingo Molnar, Linux Kernel Mailing List
On Tue, 7 Nov 2017 15:28:25 +0300
"Kirill A. Shutemov" <kirill@shutemov.name> wrote:
> On Tue, Nov 07, 2017 at 10:56:36PM +1100, Nicholas Piggin wrote:
> > > No, it won't. You will hit stack first.
> >
> > I guess so. Florian's bug didn't crash there for some reason, okay
> > but I suppose my point about brk is not exactly where the standard
> > heap is, but the pattern of allocations. An allocator that uses
> > mmap for managing its address space might do the same thing, e.g.,
> > incrementally expand existing mmaps as necessary.
>
> With MAP_FIXED? I don't think so.
brk() based allocator effectively usees MAP_FIXED. If you know where
your addresses are, using MAP_FIXED can be used.
But okay let's ignore MAP_FIXED as a corner case. Then what happens
with !MAP_FIXED when an allocation ends exactly at 128TB and then next
one begins at 128TB? Won't that expand the address space? Should we
ignore that corner case too?
>
> > > > Second, the kernel can never completely solve the problem this way.
> > > > How do we know a malloc library will not ask for > 128TB addresses
> > > > and pass them to an unknowing application?
> > >
> > > The idea is that an application can provide hint (mallopt() ?) to malloc
> > > implementation that it's ready to full address space. In this case, malloc
> > > can use mmap((void *) -1,...) for its allocations and get full address
> > > space this way.
> >
> > Point is, there's nothing stopping an allocator library or runtime
> > from asking for mmap anywhere and returning it to userspace.
>
> Right. Nobody would stop it from doing stupid things. There are many
> things that a library may do that application would not be happy about.
Indeed.
>
> > Do > 128TB pointers matter so much that we should add this heuristic
> > to prevent breakage, but little enough that we can accept some rare
> > cases getting through? Genuine question.
>
> At the end of the day what matters is if heuristic helps prevent breakage
> of existing userspace and doesn't stay in the way of legitimate use of
> full address space.
>
> So far, it looks okay to me.
Well that wasn't really the point of my question. Yes of course that
is important. But the question is how are these heuristics chosen and
evaluated? Why is this a good change to make?
We've decided there is some benefit from preventing 128TB pointers, but
also not enough that we have to completely prevent them accidentally
being returned. Yet it's important enough to make mmap behaviour diverge
in about 5 ways around 128TB depending on what combination of address
and length and MAP_FIXED you specify, and introducing this new way to
use the interface to get an expanded address space?
And we're doing this because we don't want to add a prctl or personality
or whatever for SAP HANA, because it was decided that would be too complex?
>
> > > The idea was we shouldn't allow to slip above 47-bits by accidentally.
> > >
> > > Correctly functioning program would never request addr+len above 47-bit
> > > with MAP_FIXED, unless it's ready to handle such addresses. Otherwise the
> > > request would simply fail on machine that doesn't support large VA.
> > >
> > > In contrast, addr+len above 47-bit without MAP_FIXED will not fail on
> > > machine that doesn't support large VA, kernel will find another place
> > > under 47-bit. And I can imagine a reasonable application that does
> > > something like this.
> > >
> > > So we cannot rely that application is ready to handle large
> > > addresses if we see addr+len without MAP_FIXED.
> >
> > By the same logic, a request for addr > 128TB without MAP_FIXED will
> > not fail, therefore we can't rely on that either.
> >
> > Or an app that links to a library that attempts MAP_FIXED allocation
> > of addr + len above 128TB might use high bits of pointer returned by
> > that library because those are never satisfied today and the library
> > would fall back.
>
> If you want to point that it's ABI break, yes it is.
>
> But we allow ABI break as long as nobody notices. I think it's reasonable
> to expect that nobody relies on such corner cases.
I accept the point, but I do worry mmap syscall is very fundamental,
and that it is used in a lot of ways that are difficult to foresee,
so there's a lot of room for unintended consequences. Plus the state
of the code (that's not to pick on x86, powerpc is worse) is worrying.
>
> If we would find any piece of software affect by the change we would need
> to reconsider.
>
Problem is that if there was an issue caused by this, it's unlikely
to be found until a long time later. By that time, hopefully there
aren't too many other apps that are now rely on the behaviour if it
has to be changed.
I just don't see why that's a risk worth taking at all. I can't see how
the cost benefit is there. I have to confess I could have been more
involved in discussion, but is any other interface honestly too complex
to add?
If it is decided to keep these kind of heuristics, can we get just a
small but reasonably precise description of each change to the
interface and ways for using the new functionality, such that would be
suitable for the man page? I couldn't fix powerpc because nothing
matches and even Aneesh and you differ on some details (MAP_FIXED
behaviour).
Thanks,
Nick
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: POWER: Unexpected fault when writing to brk-allocated memory
2017-11-07 13:33 ` Nicholas Piggin
2017-11-07 13:33 ` Nicholas Piggin
@ 2017-11-07 13:45 ` Aneesh Kumar K.V
2017-11-07 14:01 ` Kirill A. Shutemov
2017-11-08 4:56 ` Michael Ellerman
1 sibling, 2 replies; 34+ messages in thread
From: Aneesh Kumar K.V @ 2017-11-07 13:45 UTC (permalink / raw)
To: Nicholas Piggin, Kirill A. Shutemov
Cc: Florian Weimer, Kirill A. Shutemov, linuxppc-dev, linux-mm,
Andrew Morton, Andy Lutomirski, Dave Hansen, Linus Torvalds,
Peter Zijlstra, Thomas Gleixner, linux-arch, Ingo Molnar,
Linux Kernel Mailing List
>
> If it is decided to keep these kind of heuristics, can we get just a
> small but reasonably precise description of each change to the
> interface and ways for using the new functionality, such that would be
> suitable for the man page? I couldn't fix powerpc because nothing
> matches and even Aneesh and you differ on some details (MAP_FIXED
> behaviour).
I would consider MAP_FIXED as my mistake. We never discussed this
explicitly and I kind of assumed it to behave the same way. ie, we
search in lower address space (128TB) if the hint addr is below 128TB.
IIUC we agree on the below.
1) MAP_FIXED allow the addr to be used, even if hint addr is below 128TB
but hint_addr + len is > 128TB.
2) For everything else we search in < 128TB space if hint addr is below
128TB
3) We don't switch to large address space if hint_addr + len > 128TB.
The decision to switch to large address space is primarily based on hint
addr
Is there any other rule we need to outline? Or is any of the above not
correct?
-aneesh
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: POWER: Unexpected fault when writing to brk-allocated memory
2017-11-07 13:45 ` Aneesh Kumar K.V
@ 2017-11-07 14:01 ` Kirill A. Shutemov
2017-11-07 14:01 ` Kirill A. Shutemov
2017-11-09 17:15 ` Nicholas Piggin
2017-11-08 4:56 ` Michael Ellerman
1 sibling, 2 replies; 34+ messages in thread
From: Kirill A. Shutemov @ 2017-11-07 14:01 UTC (permalink / raw)
To: Aneesh Kumar K.V
Cc: Nicholas Piggin, Florian Weimer, Kirill A. Shutemov, linuxppc-dev,
linux-mm, Andrew Morton, Andy Lutomirski, Dave Hansen,
Linus Torvalds, Peter Zijlstra, Thomas Gleixner, linux-arch,
Ingo Molnar, Linux Kernel Mailing List
On Tue, Nov 07, 2017 at 07:15:58PM +0530, Aneesh Kumar K.V wrote:
>
> >
> > If it is decided to keep these kind of heuristics, can we get just a
> > small but reasonably precise description of each change to the
> > interface and ways for using the new functionality, such that would be
> > suitable for the man page? I couldn't fix powerpc because nothing
> > matches and even Aneesh and you differ on some details (MAP_FIXED
> > behaviour).
>
>
> I would consider MAP_FIXED as my mistake. We never discussed this explicitly
> and I kind of assumed it to behave the same way. ie, we search in lower
> address space (128TB) if the hint addr is below 128TB.
>
> IIUC we agree on the below.
>
> 1) MAP_FIXED allow the addr to be used, even if hint addr is below 128TB but
> hint_addr + len is > 128TB.
>
> 2) For everything else we search in < 128TB space if hint addr is below
> 128TB
>
> 3) We don't switch to large address space if hint_addr + len > 128TB. The
> decision to switch to large address space is primarily based on hint addr
>
> Is there any other rule we need to outline? Or is any of the above not
> correct?
That's correct.
--
Kirill A. Shutemov
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: POWER: Unexpected fault when writing to brk-allocated memory
2017-11-07 14:01 ` Kirill A. Shutemov
@ 2017-11-07 14:01 ` Kirill A. Shutemov
2017-11-09 17:15 ` Nicholas Piggin
1 sibling, 0 replies; 34+ messages in thread
From: Kirill A. Shutemov @ 2017-11-07 14:01 UTC (permalink / raw)
To: Aneesh Kumar K.V
Cc: Nicholas Piggin, Florian Weimer, Kirill A. Shutemov, linuxppc-dev,
linux-mm, Andrew Morton, Andy Lutomirski, Dave Hansen,
Linus Torvalds, Peter Zijlstra, Thomas Gleixner, linux-arch,
Ingo Molnar, Linux Kernel Mailing List
On Tue, Nov 07, 2017 at 07:15:58PM +0530, Aneesh Kumar K.V wrote:
>
> >
> > If it is decided to keep these kind of heuristics, can we get just a
> > small but reasonably precise description of each change to the
> > interface and ways for using the new functionality, such that would be
> > suitable for the man page? I couldn't fix powerpc because nothing
> > matches and even Aneesh and you differ on some details (MAP_FIXED
> > behaviour).
>
>
> I would consider MAP_FIXED as my mistake. We never discussed this explicitly
> and I kind of assumed it to behave the same way. ie, we search in lower
> address space (128TB) if the hint addr is below 128TB.
>
> IIUC we agree on the below.
>
> 1) MAP_FIXED allow the addr to be used, even if hint addr is below 128TB but
> hint_addr + len is > 128TB.
>
> 2) For everything else we search in < 128TB space if hint addr is below
> 128TB
>
> 3) We don't switch to large address space if hint_addr + len > 128TB. The
> decision to switch to large address space is primarily based on hint addr
>
> Is there any other rule we need to outline? Or is any of the above not
> correct?
That's correct.
--
Kirill A. Shutemov
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: POWER: Unexpected fault when writing to brk-allocated memory
2017-11-07 14:01 ` Kirill A. Shutemov
2017-11-07 14:01 ` Kirill A. Shutemov
@ 2017-11-09 17:15 ` Nicholas Piggin
2017-11-09 17:15 ` Nicholas Piggin
2017-11-09 19:44 ` Matthew Wilcox
1 sibling, 2 replies; 34+ messages in thread
From: Nicholas Piggin @ 2017-11-09 17:15 UTC (permalink / raw)
To: Kirill A. Shutemov
Cc: Aneesh Kumar K.V, Florian Weimer, Kirill A. Shutemov,
linuxppc-dev, linux-mm, Andrew Morton, Andy Lutomirski,
Dave Hansen, Linus Torvalds, Peter Zijlstra, Thomas Gleixner,
linux-arch, Ingo Molnar, Linux Kernel Mailing List
On Tue, 7 Nov 2017 17:01:58 +0300
"Kirill A. Shutemov" <kirill@shutemov.name> wrote:
> On Tue, Nov 07, 2017 at 07:15:58PM +0530, Aneesh Kumar K.V wrote:
> >
> > >
> > > If it is decided to keep these kind of heuristics, can we get just a
> > > small but reasonably precise description of each change to the
> > > interface and ways for using the new functionality, such that would be
> > > suitable for the man page? I couldn't fix powerpc because nothing
> > > matches and even Aneesh and you differ on some details (MAP_FIXED
> > > behaviour).
> >
> >
> > I would consider MAP_FIXED as my mistake. We never discussed this explicitly
> > and I kind of assumed it to behave the same way. ie, we search in lower
> > address space (128TB) if the hint addr is below 128TB.
> >
> > IIUC we agree on the below.
> >
> > 1) MAP_FIXED allow the addr to be used, even if hint addr is below 128TB but
> > hint_addr + len is > 128TB.
> >
> > 2) For everything else we search in < 128TB space if hint addr is below
> > 128TB
> >
> > 3) We don't switch to large address space if hint_addr + len > 128TB. The
> > decision to switch to large address space is primarily based on hint addr
> >
> > Is there any other rule we need to outline? Or is any of the above not
> > correct?
>
> That's correct.
>
Thanks guys, I'll send out some powerpc patches to match -- it deviates in
its MAP_FIXED handling (treats it the same as !MAP_FIXED).
So these semantics are what we're going with? Anything that does mmap() is
guaranteed of getting a 47-bit pointer and it can use the top 17 bits for
itself? Is intended to be cross-platform or just x86 and power specific?
Also, this may follow from deduction from 1-3, but for explicit
specification in man page:
4) To get an unspecified allocation with the largest possible address range,
we pass in -1 for mmap hint.
Are we allowing 8 bits bits of unused address in this case, or must the
app not assume anything about number of bits used?
Thanks,
Nick
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: POWER: Unexpected fault when writing to brk-allocated memory
2017-11-09 17:15 ` Nicholas Piggin
@ 2017-11-09 17:15 ` Nicholas Piggin
2017-11-09 19:44 ` Matthew Wilcox
1 sibling, 0 replies; 34+ messages in thread
From: Nicholas Piggin @ 2017-11-09 17:15 UTC (permalink / raw)
To: Kirill A. Shutemov
Cc: Aneesh Kumar K.V, Florian Weimer, Kirill A. Shutemov,
linuxppc-dev, linux-mm, Andrew Morton, Andy Lutomirski,
Dave Hansen, Linus Torvalds, Peter Zijlstra, Thomas Gleixner,
linux-arch, Ingo Molnar, Linux Kernel Mailing List
On Tue, 7 Nov 2017 17:01:58 +0300
"Kirill A. Shutemov" <kirill@shutemov.name> wrote:
> On Tue, Nov 07, 2017 at 07:15:58PM +0530, Aneesh Kumar K.V wrote:
> >
> > >
> > > If it is decided to keep these kind of heuristics, can we get just a
> > > small but reasonably precise description of each change to the
> > > interface and ways for using the new functionality, such that would be
> > > suitable for the man page? I couldn't fix powerpc because nothing
> > > matches and even Aneesh and you differ on some details (MAP_FIXED
> > > behaviour).
> >
> >
> > I would consider MAP_FIXED as my mistake. We never discussed this explicitly
> > and I kind of assumed it to behave the same way. ie, we search in lower
> > address space (128TB) if the hint addr is below 128TB.
> >
> > IIUC we agree on the below.
> >
> > 1) MAP_FIXED allow the addr to be used, even if hint addr is below 128TB but
> > hint_addr + len is > 128TB.
> >
> > 2) For everything else we search in < 128TB space if hint addr is below
> > 128TB
> >
> > 3) We don't switch to large address space if hint_addr + len > 128TB. The
> > decision to switch to large address space is primarily based on hint addr
> >
> > Is there any other rule we need to outline? Or is any of the above not
> > correct?
>
> That's correct.
>
Thanks guys, I'll send out some powerpc patches to match -- it deviates in
its MAP_FIXED handling (treats it the same as !MAP_FIXED).
So these semantics are what we're going with? Anything that does mmap() is
guaranteed of getting a 47-bit pointer and it can use the top 17 bits for
itself? Is intended to be cross-platform or just x86 and power specific?
Also, this may follow from deduction from 1-3, but for explicit
specification in man page:
4) To get an unspecified allocation with the largest possible address range,
we pass in -1 for mmap hint.
Are we allowing 8 bits bits of unused address in this case, or must the
app not assume anything about number of bits used?
Thanks,
Nick
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: POWER: Unexpected fault when writing to brk-allocated memory
2017-11-09 17:15 ` Nicholas Piggin
2017-11-09 17:15 ` Nicholas Piggin
@ 2017-11-09 19:44 ` Matthew Wilcox
2017-11-09 19:44 ` Matthew Wilcox
` (2 more replies)
1 sibling, 3 replies; 34+ messages in thread
From: Matthew Wilcox @ 2017-11-09 19:44 UTC (permalink / raw)
To: Nicholas Piggin
Cc: Kirill A. Shutemov, Aneesh Kumar K.V, Florian Weimer,
Kirill A. Shutemov, linuxppc-dev, linux-mm, Andrew Morton,
Andy Lutomirski, Dave Hansen, Linus Torvalds, Peter Zijlstra,
Thomas Gleixner, linux-arch, Ingo Molnar,
Linux Kernel Mailing List
On Fri, Nov 10, 2017 at 04:15:26AM +1100, Nicholas Piggin wrote:
> So these semantics are what we're going with? Anything that does mmap() is
> guaranteed of getting a 47-bit pointer and it can use the top 17 bits for
> itself? Is intended to be cross-platform or just x86 and power specific?
It is x86 and powerpc specific. The arm64 people have apparently stumbled
across apps that expect to be able to use bit 48 for their own purposes.
And their address space is 48 bit by default. Oops.
> Also, this may follow from deduction from 1-3, but for explicit
> specification in man page:
>
> 4) To get an unspecified allocation with the largest possible address range,
> we pass in -1 for mmap hint.
>
> Are we allowing 8 bits bits of unused address in this case, or must the
> app not assume anything about number of bits used?
Maybe document it as: "If the app wants to use the top N bits of addresses
for its own purposes, pass in (~0UL >> N) as the mmap hint." ?
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 34+ messages in thread* Re: POWER: Unexpected fault when writing to brk-allocated memory
2017-11-09 19:44 ` Matthew Wilcox
@ 2017-11-09 19:44 ` Matthew Wilcox
2017-11-10 1:26 ` Nicholas Piggin
2017-11-10 12:08 ` David Laight
2 siblings, 0 replies; 34+ messages in thread
From: Matthew Wilcox @ 2017-11-09 19:44 UTC (permalink / raw)
To: Nicholas Piggin
Cc: Kirill A. Shutemov, Aneesh Kumar K.V, Florian Weimer,
Kirill A. Shutemov, linuxppc-dev, linux-mm, Andrew Morton,
Andy Lutomirski, Dave Hansen, Linus Torvalds, Peter Zijlstra,
Thomas Gleixner, linux-arch, Ingo Molnar,
Linux Kernel Mailing List
On Fri, Nov 10, 2017 at 04:15:26AM +1100, Nicholas Piggin wrote:
> So these semantics are what we're going with? Anything that does mmap() is
> guaranteed of getting a 47-bit pointer and it can use the top 17 bits for
> itself? Is intended to be cross-platform or just x86 and power specific?
It is x86 and powerpc specific. The arm64 people have apparently stumbled
across apps that expect to be able to use bit 48 for their own purposes.
And their address space is 48 bit by default. Oops.
> Also, this may follow from deduction from 1-3, but for explicit
> specification in man page:
>
> 4) To get an unspecified allocation with the largest possible address range,
> we pass in -1 for mmap hint.
>
> Are we allowing 8 bits bits of unused address in this case, or must the
> app not assume anything about number of bits used?
Maybe document it as: "If the app wants to use the top N bits of addresses
for its own purposes, pass in (~0UL >> N) as the mmap hint." ?
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: POWER: Unexpected fault when writing to brk-allocated memory
2017-11-09 19:44 ` Matthew Wilcox
2017-11-09 19:44 ` Matthew Wilcox
@ 2017-11-10 1:26 ` Nicholas Piggin
2017-11-10 12:08 ` David Laight
2 siblings, 0 replies; 34+ messages in thread
From: Nicholas Piggin @ 2017-11-10 1:26 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Kirill A. Shutemov, Aneesh Kumar K.V, Florian Weimer,
Kirill A. Shutemov, linuxppc-dev, linux-mm, Andrew Morton,
Andy Lutomirski, Dave Hansen, Linus Torvalds, Peter Zijlstra,
Thomas Gleixner, linux-arch, Ingo Molnar,
Linux Kernel Mailing List
On Thu, 9 Nov 2017 11:44:21 -0800
Matthew Wilcox <willy@infradead.org> wrote:
> On Fri, Nov 10, 2017 at 04:15:26AM +1100, Nicholas Piggin wrote:
> > So these semantics are what we're going with? Anything that does mmap() is
> > guaranteed of getting a 47-bit pointer and it can use the top 17 bits for
> > itself? Is intended to be cross-platform or just x86 and power specific?
>
> It is x86 and powerpc specific. The arm64 people have apparently stumbled
> across apps that expect to be able to use bit 48 for their own purposes.
> And their address space is 48 bit by default. Oops.
Okay, so it's something we should make into an "official" API?
>
> > Also, this may follow from deduction from 1-3, but for explicit
> > specification in man page:
> >
> > 4) To get an unspecified allocation with the largest possible address range,
> > we pass in -1 for mmap hint.
> >
> > Are we allowing 8 bits bits of unused address in this case, or must the
> > app not assume anything about number of bits used?
>
> Maybe document it as: "If the app wants to use the top N bits of addresses
> for its own purposes, pass in (~0UL >> N) as the mmap hint." ?
Well we don't have code for that yet, but the problem would also be that
it succeeds, and actually it probably goes over the limit. So you would
have to map a dummy page there so subsequent hints to fail and fall back.
Not sure... it would be nice to be able to specify number of bits, but I
think this gets a bit hairy. -1 to use all bits seems a bit easier.
Thanks,
Nick
^ permalink raw reply [flat|nested] 34+ messages in thread
* RE: POWER: Unexpected fault when writing to brk-allocated memory
2017-11-09 19:44 ` Matthew Wilcox
2017-11-09 19:44 ` Matthew Wilcox
2017-11-10 1:26 ` Nicholas Piggin
@ 2017-11-10 12:08 ` David Laight
2017-11-11 10:30 ` Nicholas Piggin
2 siblings, 1 reply; 34+ messages in thread
From: David Laight @ 2017-11-10 12:08 UTC (permalink / raw)
To: 'Matthew Wilcox', Nicholas Piggin
Cc: Florian Weimer, linux-arch@vger.kernel.org, linux-mm,
Peter Zijlstra, linuxppc-dev@lists.ozlabs.org,
Linux Kernel Mailing List, Andy Lutomirski, Dave Hansen,
Thomas Gleixner, Aneesh Kumar K.V, Kirill A. Shutemov,
Andrew Morton, Linus Torvalds, Ingo Molnar, Kirill A. Shutemov
From: Matthew Wilcox
> Sent: 09 November 2017 19:44
>
> On Fri, Nov 10, 2017 at 04:15:26AM +1100, Nicholas Piggin wrote:
> > So these semantics are what we're going with? Anything that does mmap() is
> > guaranteed of getting a 47-bit pointer and it can use the top 17 bits for
> > itself? Is intended to be cross-platform or just x86 and power specific?
>
> It is x86 and powerpc specific. The arm64 people have apparently stumbled
> across apps that expect to be able to use bit 48 for their own purposes.
> And their address space is 48 bit by default. Oops.
(Do you mean 49bit?)
Aren't such apps just doomed to be broken?
ISTR there is something on (IIRC) sparc64 that does a 'match'
on the high address bits to make it much harder to overrun
one area into another.
David
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: POWER: Unexpected fault when writing to brk-allocated memory
2017-11-10 12:08 ` David Laight
@ 2017-11-11 10:30 ` Nicholas Piggin
0 siblings, 0 replies; 34+ messages in thread
From: Nicholas Piggin @ 2017-11-11 10:30 UTC (permalink / raw)
To: David Laight
Cc: 'Matthew Wilcox', Florian Weimer,
linux-arch@vger.kernel.org, Dave Hansen, Peter Zijlstra,
Linus Torvalds, Ingo Molnar, Linux Kernel Mailing List,
Andy Lutomirski, linux-mm, Aneesh Kumar K.V, Kirill A. Shutemov,
Andrew Morton, linuxppc-dev@lists.ozlabs.org, Thomas Gleixner,
Kirill A. Shutemov
On Fri, 10 Nov 2017 12:08:35 +0000
David Laight <David.Laight@ACULAB.COM> wrote:
> From: Matthew Wilcox
> > Sent: 09 November 2017 19:44
> >
> > On Fri, Nov 10, 2017 at 04:15:26AM +1100, Nicholas Piggin wrote:
> > > So these semantics are what we're going with? Anything that does mmap() is
> > > guaranteed of getting a 47-bit pointer and it can use the top 17 bits for
> > > itself? Is intended to be cross-platform or just x86 and power specific?
> >
> > It is x86 and powerpc specific. The arm64 people have apparently stumbled
> > across apps that expect to be able to use bit 48 for their own purposes.
> > And their address space is 48 bit by default. Oops.
>
> (Do you mean 49bit?)
I think he meant bit 47, which makes sense because they were probably
ported from x86-64 with 47 bit address. That seems to be why x86-64
5-level and powerpc decided to limit to a 47 bit address space by
default.
>
> Aren't such apps just doomed to be broken?
Well they're not portable but they are not broken if virtual address
is limited.
>
> ISTR there is something on (IIRC) sparc64 that does a 'match'
> on the high address bits to make it much harder to overrun
> one area into another.
I'm not sure about that but I think the problem would be the app
masking out bits from the pointer for its own use before ever
dereferencing it.
Thanks,
Nick
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: POWER: Unexpected fault when writing to brk-allocated memory
2017-11-07 13:45 ` Aneesh Kumar K.V
2017-11-07 14:01 ` Kirill A. Shutemov
@ 2017-11-08 4:56 ` Michael Ellerman
2017-11-08 8:30 ` Kirill A. Shutemov
1 sibling, 1 reply; 34+ messages in thread
From: Michael Ellerman @ 2017-11-08 4:56 UTC (permalink / raw)
To: Aneesh Kumar K.V, Nicholas Piggin, Kirill A. Shutemov
Cc: Florian Weimer, linux-arch, Dave Hansen, Peter Zijlstra,
Linus Torvalds, Ingo Molnar, Linux Kernel Mailing List,
Andy Lutomirski, linux-mm, Andrew Morton, linuxppc-dev,
Thomas Gleixner, Kirill A. Shutemov
"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> writes:
>>
>> If it is decided to keep these kind of heuristics, can we get just a
>> small but reasonably precise description of each change to the
>> interface and ways for using the new functionality, such that would be
>> suitable for the man page? I couldn't fix powerpc because nothing
>> matches and even Aneesh and you differ on some details (MAP_FIXED
>> behaviour).
>
>
> I would consider MAP_FIXED as my mistake. We never discussed this
> explicitly and I kind of assumed it to behave the same way. ie, we
> search in lower address space (128TB) if the hint addr is below 128TB.
>
> IIUC we agree on the below.
>
> 1) MAP_FIXED allow the addr to be used, even if hint addr is below 128TB
> but hint_addr + len is > 128TB.
So:
mmap(0x7ffffffff000, 0x2000, ..., MAP_FIXED ...) = 0x7ffffffff000
> 2) For everything else we search in < 128TB space if hint addr is below
> 128TB
mmap((x < 128T), 0x1000, ...) = (y < 128T)
...
mmap(0x7ffffffff000, 0x1000, ...) = 0x7ffffffff000
mmap(0x800000000000, 0x1000, ...) = 0x800000000000
...
mmap((x >= 128T), 0x1000, ...) = (y >= 128T)
> 3) We don't switch to large address space if hint_addr + len > 128TB.
> The decision to switch to large address space is primarily based on hint
> addr
But does the mmap succeed in that case or not?
ie: mmap(0x7ffffffff000, 0x2000, ...) = ?
cheers
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: POWER: Unexpected fault when writing to brk-allocated memory
2017-11-08 4:56 ` Michael Ellerman
@ 2017-11-08 8:30 ` Kirill A. Shutemov
2017-11-08 8:30 ` Kirill A. Shutemov
0 siblings, 1 reply; 34+ messages in thread
From: Kirill A. Shutemov @ 2017-11-08 8:30 UTC (permalink / raw)
To: Michael Ellerman
Cc: Aneesh Kumar K.V, Nicholas Piggin, Florian Weimer, linux-arch,
Dave Hansen, Peter Zijlstra, Linus Torvalds, Ingo Molnar,
Linux Kernel Mailing List, Andy Lutomirski, linux-mm,
Andrew Morton, linuxppc-dev, Thomas Gleixner, Kirill A. Shutemov
On Wed, Nov 08, 2017 at 03:56:06PM +1100, Michael Ellerman wrote:
> "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> writes:
>
> >>
> >> If it is decided to keep these kind of heuristics, can we get just a
> >> small but reasonably precise description of each change to the
> >> interface and ways for using the new functionality, such that would be
> >> suitable for the man page? I couldn't fix powerpc because nothing
> >> matches and even Aneesh and you differ on some details (MAP_FIXED
> >> behaviour).
> >
> >
> > I would consider MAP_FIXED as my mistake. We never discussed this
> > explicitly and I kind of assumed it to behave the same way. ie, we
> > search in lower address space (128TB) if the hint addr is below 128TB.
> >
> > IIUC we agree on the below.
> >
> > 1) MAP_FIXED allow the addr to be used, even if hint addr is below 128TB
> > but hint_addr + len is > 128TB.
>
> So:
> mmap(0x7ffffffff000, 0x2000, ..., MAP_FIXED ...) = 0x7ffffffff000
>
> > 2) For everything else we search in < 128TB space if hint addr is below
> > 128TB
>
> mmap((x < 128T), 0x1000, ...) = (y < 128T)
> ...
> mmap(0x7ffffffff000, 0x1000, ...) = 0x7ffffffff000
> mmap(0x800000000000, 0x1000, ...) = 0x800000000000
> ...
> mmap((x >= 128T), 0x1000, ...) = (y >= 128T)
>
> > 3) We don't switch to large address space if hint_addr + len > 128TB.
> > The decision to switch to large address space is primarily based on hint
> > addr
>
> But does the mmap succeed in that case or not?
>
> ie: mmap(0x7ffffffff000, 0x2000, ...) = ?
It does, but resulting address doesn't match the hint. It's somewhere
below 47-bit border.
--
Kirill A. Shutemov
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: POWER: Unexpected fault when writing to brk-allocated memory
2017-11-08 8:30 ` Kirill A. Shutemov
@ 2017-11-08 8:30 ` Kirill A. Shutemov
0 siblings, 0 replies; 34+ messages in thread
From: Kirill A. Shutemov @ 2017-11-08 8:30 UTC (permalink / raw)
To: Michael Ellerman
Cc: Aneesh Kumar K.V, Nicholas Piggin, Florian Weimer, linux-arch,
Dave Hansen, Peter Zijlstra, Linus Torvalds, Ingo Molnar,
Linux Kernel Mailing List, Andy Lutomirski, linux-mm,
Andrew Morton, linuxppc-dev, Thomas Gleixner, Kirill A. Shutemov
On Wed, Nov 08, 2017 at 03:56:06PM +1100, Michael Ellerman wrote:
> "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> writes:
>
> >>
> >> If it is decided to keep these kind of heuristics, can we get just a
> >> small but reasonably precise description of each change to the
> >> interface and ways for using the new functionality, such that would be
> >> suitable for the man page? I couldn't fix powerpc because nothing
> >> matches and even Aneesh and you differ on some details (MAP_FIXED
> >> behaviour).
> >
> >
> > I would consider MAP_FIXED as my mistake. We never discussed this
> > explicitly and I kind of assumed it to behave the same way. ie, we
> > search in lower address space (128TB) if the hint addr is below 128TB.
> >
> > IIUC we agree on the below.
> >
> > 1) MAP_FIXED allow the addr to be used, even if hint addr is below 128TB
> > but hint_addr + len is > 128TB.
>
> So:
> mmap(0x7ffffffff000, 0x2000, ..., MAP_FIXED ...) = 0x7ffffffff000
>
> > 2) For everything else we search in < 128TB space if hint addr is below
> > 128TB
>
> mmap((x < 128T), 0x1000, ...) = (y < 128T)
> ...
> mmap(0x7ffffffff000, 0x1000, ...) = 0x7ffffffff000
> mmap(0x800000000000, 0x1000, ...) = 0x800000000000
> ...
> mmap((x >= 128T), 0x1000, ...) = (y >= 128T)
>
> > 3) We don't switch to large address space if hint_addr + len > 128TB.
> > The decision to switch to large address space is primarily based on hint
> > addr
>
> But does the mmap succeed in that case or not?
>
> ie: mmap(0x7ffffffff000, 0x2000, ...) = ?
It does, but resulting address doesn't match the hint. It's somewhere
below 47-bit border.
--
Kirill A. Shutemov
^ permalink raw reply [flat|nested] 34+ messages in thread