[RFC, PATCHv2 29/29] mm, x86: introduce RLIMIT

linux-arm-kernel.lists.infradead.org archive mirror

* [RFC, PATCHv2 29/29] mm, x86: introduce RLIMIT_VADDR
       [not found] ` <20161227015413.187403-30-kirill.shutemov@linux.intel.com>
@ 2017-01-02  8:44   ` Arnd Bergmann
  2017-01-03  6:08     ` Andy Lutomirski
  0 siblings, 1 reply; 11+ messages in thread
From: Arnd Bergmann @ 2017-01-02  8:44 UTC
  To: linux-arm-kernel

On Tuesday, December 27, 2016 4:54:13 AM CET Kirill A. Shutemov wrote:
> This patch introduces a new rlimit resource to manage the maximum virtual
> address that userspace can map.
> 
> On x86, 5-level paging enables a 56-bit userspace virtual address space.
> Not all user space is ready to handle wide addresses. It's known that
> at least some JIT compilers use the high bits of pointers to encode their
> own information. That encoding collides with valid pointers under 5-level
> paging and leads to crashes.
> 
> The patch aims to address this compatibility issue.
> 
> MM would use min(RLIMIT_VADDR, TASK_SIZE) as the upper limit of the virtual
> addresses that userspace can map.
> 
> The default hard limit will be RLIM_INFINITY, which basically means that
> TASK_SIZE limits the available address space.
> 
> The soft limit will also be RLIM_INFINITY everywhere except on machines
> with 5-level paging enabled. In that case, the soft limit would be
> (1UL << 47) - PAGE_SIZE. That is the current x86-64 TASK_SIZE_MAX with
> 4-level paging, which is known to be safe.
> 
> The new rlimit resource would follow the usual semantics with regard to
> inheritance: it is preserved across fork(2) and exec(2). This has the
> potential to break applications if limits are set too wide or too narrow,
> but this is not uncommon for other resources (consider RLIMIT_DATA or
> RLIMIT_AS).
> 
> As with other resources, you can set the limit lower than current usage.
> It would affect only future virtual address space allocations.
> 
> Use-cases for the new rlimit:
> 
>   - Bumping the soft limit to RLIM_INFINITY allows the current process and
>     all its children to use addresses above 47-bits.
> 
>   - Bumping the soft limit to RLIM_INFINITY after fork(2), but before
>     exec(2), allows the child to use addresses above 47-bits.
> 
>   - Lowering the hard limit to 47-bits would prevent the current process
>     and all its children from using addresses above 47-bits, unless a
>     process has CAP_SYS_RESOURCE.
> 
>   - It can also be handy to lower the hard or soft limit to an arbitrary
>     address. User-mode emulation in QEMU may lower the limit to 32-bit
>     to emulate a 32-bit machine on a 64-bit host.
> 
> TODO:
>   - port to non-x86;
> 
> Not-yet-signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Cc: linux-api at vger.kernel.org
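
For illustration, the limit computation described in the patch text could be
sketched as follows. This is a minimal sketch and not code from the patch:
the helper names, the 4 KiB page size and the stand-in for RLIM_INFINITY are
assumptions made for the example.

```c
#include <stdint.h>

/* Assumed values for illustration; the real kernel constants may differ. */
#define SKETCH_PAGE_SIZE     4096ULL
#define SKETCH_RLIM_INFINITY UINT64_MAX

/* Upper bound for new userspace mappings: min(RLIMIT_VADDR, TASK_SIZE). */
static uint64_t vaddr_upper_limit(uint64_t rlimit_vaddr, uint64_t task_size)
{
	return rlimit_vaddr < task_size ? rlimit_vaddr : task_size;
}

/* Default soft limit: unlimited, except when 5-level paging is enabled. */
static uint64_t default_soft_limit(int five_level_paging)
{
	if (five_level_paging)
		return (1ULL << 47) - SKETCH_PAGE_SIZE;
	return SKETCH_RLIM_INFINITY;
}
```

With a hypothetical 56-bit TASK_SIZE, vaddr_upper_limit(default_soft_limit(1),
(1ULL << 56) - 4096) stays at the 47-bit boundary until the soft limit is
raised.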

This seems to nicely address the same problem on arm64, which has
run into the same issue due to the various page table formats
that can currently be chosen at compile time.

I don't see how this interacts with the existing
PER_LINUX32/PER_LINUX32_3GB personality flags, but I assume you have
either already thought of that, or we can come up with a good way
to define what happens when conflicting settings are applied.

The two reasonable ways I can think of are to either use the
minimum of the two limits, or to make the personality syscall
set the soft rlimit and use whatever limit was last set.
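
The two options can be compared with a small model. This is only a sketch:
the struct, its field names and the helper functions are hypothetical and do
not correspond to existing kernel code.

```c
#include <stdint.h>

/* Hypothetical per-task state; the field names are assumptions. */
struct addr_limits {
	uint64_t personality_limit; /* limit implied by the personality flags */
	uint64_t rlimit_soft;       /* soft RLIMIT_VADDR */
};

/* Option 1: the effective limit is the minimum of the two limits. */
static uint64_t effective_limit_min(const struct addr_limits *l)
{
	return l->personality_limit < l->rlimit_soft ?
	       l->personality_limit : l->rlimit_soft;
}

/* Option 2: personality(2) also overwrites the soft rlimit ... */
static void set_personality_limit(struct addr_limits *l, uint64_t limit)
{
	l->personality_limit = limit;
	l->rlimit_soft = limit;
}

/* ... and a later setrlimit(2) overrides it again (last writer wins). */
static void set_soft_rlimit(struct addr_limits *l, uint64_t limit)
{
	l->rlimit_soft = limit;
}

static uint64_t effective_limit_last_set(const struct addr_limits *l)
{
	return l->rlimit_soft;
}
```

Under option 1 a 3 GiB personality limit keeps winning even after the rlimit
is raised; under option 2 raising the rlimit afterwards lifts the restriction.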

	Arnd

* [RFC, PATCHv2 29/29] mm, x86: introduce RLIMIT_VADDR
  2017-01-02  8:44   ` [RFC, PATCHv2 29/29] mm, x86: introduce RLIMIT_VADDR Arnd Bergmann
@ 2017-01-03  6:08     ` Andy Lutomirski
  2017-01-03 13:18       ` Arnd Bergmann
  2017-01-03 16:04       ` Kirill A. Shutemov
  0 siblings, 2 replies; 11+ messages in thread
From: Andy Lutomirski @ 2017-01-03  6:08 UTC
  To: linux-arm-kernel

On Mon, Jan 2, 2017 at 12:44 AM, Arnd Bergmann <arnd@arndb.de> wrote:
> On Tuesday, December 27, 2016 4:54:13 AM CET Kirill A. Shutemov wrote:
>> As with other resources you can set the limit lower than current usage.
>> It would affect only future virtual address space allocations.

I still don't buy all these use cases:

>>
>> Use-cases for new rlimit:
>>
>>   - Bumping the soft limit to RLIM_INFINITY allows the current process and
>>     all its children to use addresses above 47-bits.

OK, I get this, but only as a workaround for programs that make
assumptions about the address space and don't use some mechanism (to
be designed?) to work correctly in spite of a larger address space.

>>
>>   - Bumping the soft limit to RLIM_INFINITY after fork(2), but before
>>     exec(2) allows the child to use addresses above 47-bits.

Ditto.

>>
>>   - Lowering the hard limit to 47-bits would prevent the current process
>>     and all its children from using addresses above 47-bits, unless a
>>     process has CAP_SYS_RESOURCE.

I've tried and I can't imagine any reason to do this.

>>
>>   - It can also be handy to lower the hard or soft limit to an arbitrary
>>     address. User-mode emulation in QEMU may lower the limit to 32-bit
>>     to emulate a 32-bit machine on a 64-bit host.

I don't understand.  QEMU user-mode emulation intercepts all syscalls.
What QEMU would *actually* want is a way to say "allocate me some
memory with the high N bits clear".  mmap-via-int80 on x86 should be
fixed to do this, but a new syscall with an explicit parameter would
work, as would a prctl changing the current limit.
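
Without such an interface, userspace can only approximate "memory with the
high N bits clear" by mapping first and checking afterwards. A minimal
sketch, using only mmap(2) and munmap(2); the helper names are made up for
this example, and a real allocator would need a retry or hint strategy that
is not shown here.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1 /* for MAP_ANONYMOUS on glibc in strict modes */
#endif
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Returns 1 if the top n bits of a 64-bit address are zero (0 < n <= 64). */
static int high_bits_clear(uint64_t addr, unsigned n)
{
	return (addr >> (64 - n)) == 0;
}

/*
 * Map anonymous memory wherever the kernel chooses, then reject the
 * mapping if its first or last byte has any of the top n bits set.
 * Returns NULL on failure or when the constraint is violated.
 */
static void *mmap_high_bits_clear(size_t len, unsigned n)
{
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return NULL;

	uint64_t start = (uint64_t)(uintptr_t)p;
	if (!high_bits_clear(start, n) ||
	    !high_bits_clear(start + len - 1, n)) {
		munmap(p, len);
		return NULL;
	}
	return p;
}
```

The check-and-reject step is exactly the guesswork that an explicit limit,
whether passed per call or set per process, would make unnecessary.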

>>
>> TODO:
>>   - port to non-x86;
>>
>> Not-yet-signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
>> Cc: linux-api at vger.kernel.org
>
> This seems to nicely address the same problem on arm64, which has
> run into the same issue due to the various page table formats
> that can currently be chosen at compile time.

On further reflection, I think this has very little to do with paging
formats except insofar as paging formats make us notice the problem.
The issue is that user code wants to be able to assume an upper limit
on an address, and it gets an upper limit right now that depends on
architecture due to paging formats.  But someone really might want to
write a *portable* 64-bit program that allocates memory with the high
16 bits clear.  So let's add such a mechanism directly.
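
To make the dependency concrete, here is a sketch of the kind of pointer
tagging that relies on the high 16 bits being clear. The 48-bit split and the
function names are assumptions chosen for the example, not taken from any
particular JIT.

```c
#include <stdint.h>

#define TAG_SHIFT 48
#define ADDR_MASK ((1ULL << TAG_SHIFT) - 1)

/* Pack a 16-bit tag into the (assumed unused) top bits of an address. */
static uint64_t tag_pointer(uint64_t addr, uint16_t tag)
{
	return (addr & ADDR_MASK) | ((uint64_t)tag << TAG_SHIFT);
}

/* Recover the address; correct only if its top 16 bits were zero. */
static uint64_t untag_pointer(uint64_t tagged)
{
	return tagged & ADDR_MASK;
}

/* Recover the tag stored in the top 16 bits. */
static uint16_t pointer_tag(uint64_t tagged)
{
	return (uint16_t)(tagged >> TAG_SHIFT);
}
```

An address below 2^48 survives the round trip; an address with, say, bit 50
set is silently truncated by untag_pointer(), which is the failure mode the
thread is trying to avoid.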

As a thought experiment, what if x86_64 simply never allocated "high"
(above 2^47-1) addresses unless a new mmap-with-explicit-limit syscall
were used?  Old glibc would continue working.  Old VMs would work.
New programs that want to use ginormous mappings would have to use the
new syscall.  This would be totally stateless and would have no issues
with CRIU.

If necessary, we could also have a prctl that changes a
"personality-like" limit that is in effect when the old mmap was used.
I say "personality-like" because it would reset under exactly the same
conditions that personality resets itself.

Thoughts?

* [RFC, PATCHv2 29/29] mm, x86: introduce RLIMIT_VADDR
  2017-01-03  6:08     ` Andy Lutomirski
@ 2017-01-03 13:18       ` Arnd Bergmann
  2017-01-03 18:29         ` Andy Lutomirski
  2017-01-03 16:04       ` Kirill A. Shutemov
  1 sibling, 1 reply; 11+ messages in thread
From: Arnd Bergmann @ 2017-01-03 13:18 UTC
  To: linux-arm-kernel

On Monday, January 2, 2017 10:08:28 PM CET Andy Lutomirski wrote:
> 
> > This seems to nicely address the same problem on arm64, which has
> > run into the same issue due to the various page table formats
> > that can currently be chosen at compile time.
> 
> On further reflection, I think this has very little to do with paging
> formats except insofar as paging formats make us notice the problem.
> The issue is that user code wants to be able to assume an upper limit
> on an address, and it gets an upper limit right now that depends on
> architecture due to paging formats.  But someone really might want to
> write a *portable* 64-bit program that allocates memory with the high
> 16 bits clear.  So let's add such a mechanism directly.
> 
> As a thought experiment, what if x86_64 simply never allocated "high"
> (above 2^47-1) addresses unless a new mmap-with-explicit-limit syscall
> were used?  Old glibc would continue working.  Old VMs would work.
> New programs that want to use ginormous mappings would have to use the
> new syscall.  This would be totally stateless and would have no issues
> with CRIU.

I can see this working well for the 47-bit addressing default, but
what about applications that actually rely on 39-bit addressing
(I'd have to double-check, but I think this was the limit that
people were most interested in for arm64)?

39 bits seems a little small to make that the default for everyone
who doesn't pass the extra flag. Having to pass another flag to
limit the addresses introduces other problems (e.g. mmap from
library call that doesn't pass that flag).

> If necessary, we could also have a prctl that changes a
> "personality-like" limit that is in effect when the old mmap was used.
> I say "personality-like" because it would reset under exactly the same
> conditions that personality resets itself.

For "personality-like", it would still have to interact
with the existing PER_LINUX32 and PER_LINUX32_3GB flags that
do the exact same thing, so actually using personality might
be better.

We still have a few bits in the personality arguments, and
we could combine them with the existing ADDR_LIMIT_3GB
and ADDR_LIMIT_32BIT flags that are mutually exclusive by
definition, such as

        ADDR_LIMIT_32BIT =      0x0800000, /* existing */
        ADDR_LIMIT_3GB   =      0x8000000, /* existing */
        ADDR_LIMIT_39BIT =      0x0010000, /* next free bit */
        ADDR_LIMIT_42BIT =      0x8010000,
        ADDR_LIMIT_47BIT =      0x0810000,
        ADDR_LIMIT_48BIT =      0x8810000,

This would probably take only one or two personality bits for the
limits that are interesting in practice.
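
A sketch of how such a combined encoding could be decoded. The SK_ names are
made up for this example, and treating ADDR_LIMIT_32BIT as 4 GiB and
ADDR_LIMIT_3GB as 3 GiB is a simplification, since the exact values are
architecture-specific.

```c
#include <stdint.h>

#define SK_ADDR_LIMIT_32BIT 0x0800000u /* existing flag value */
#define SK_ADDR_LIMIT_3GB   0x8000000u /* existing flag value */
#define SK_ADDR_LIMIT_NEW   0x0010000u /* "next free bit"; hypothetical */

#define SK_LIMIT_MASK \
	(SK_ADDR_LIMIT_32BIT | SK_ADDR_LIMIT_3GB | SK_ADDR_LIMIT_NEW)

/*
 * Returns the address limit in bytes, or 0 when no limit flag is set
 * or the combination is not assigned in the table above.
 */
static uint64_t addr_limit_from_personality(uint32_t persona)
{
	switch (persona & SK_LIMIT_MASK) {
	case SK_ADDR_LIMIT_32BIT:
		return 1ULL << 32;
	case SK_ADDR_LIMIT_3GB:
		return 3ULL << 30;
	case SK_ADDR_LIMIT_NEW:
		return 1ULL << 39;
	case SK_ADDR_LIMIT_NEW | SK_ADDR_LIMIT_3GB:
		return 1ULL << 42;
	case SK_ADDR_LIMIT_NEW | SK_ADDR_LIMIT_32BIT:
		return 1ULL << 47;
	case SK_ADDR_LIMIT_NEW | SK_ADDR_LIMIT_32BIT | SK_ADDR_LIMIT_3GB:
		return 1ULL << 48;
	default:
		return 0;
	}
}
```

Because the two existing flags are mutually exclusive, one new bit is enough
to name four additional limits.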

	Arnd

* [RFC, PATCHv2 29/29] mm, x86: introduce RLIMIT_VADDR
  2017-01-03 13:18       ` Arnd Bergmann
@ 2017-01-03 18:29         ` Andy Lutomirski
  2017-01-03 22:07           ` Arnd Bergmann
  0 siblings, 1 reply; 11+ messages in thread
From: Andy Lutomirski @ 2017-01-03 18:29 UTC
  To: linux-arm-kernel

On Tue, Jan 3, 2017 at 5:18 AM, Arnd Bergmann <arnd@arndb.de> wrote:
> On Monday, January 2, 2017 10:08:28 PM CET Andy Lutomirski wrote:
>>
>> > This seems to nicely address the same problem on arm64, which has
>> > run into the same issue due to the various page table formats
>> > that can currently be chosen at compile time.
>>
>> On further reflection, I think this has very little to do with paging
>> formats except insofar as paging formats make us notice the problem.
>> The issue is that user code wants to be able to assume an upper limit
>> on an address, and it gets an upper limit right now that depends on
>> architecture due to paging formats.  But someone really might want to
>> write a *portable* 64-bit program that allocates memory with the high
>> 16 bits clear.  So let's add such a mechanism directly.
>>
>> As a thought experiment, what if x86_64 simply never allocated "high"
>> (above 2^47-1) addresses unless a new mmap-with-explicit-limit syscall
>> were used?  Old glibc would continue working.  Old VMs would work.
>> New programs that want to use ginormous mappings would have to use the
>> new syscall.  This would be totally stateless and would have no issues
>> with CRIU.
>
> I can see this working well for the 47-bit addressing default, but
> what about applications that actually rely on 39-bit addressing
> (I'd have to double-check, but I think this was the limit that
> people were most interested in for arm64)?
>
> 39 bits seems a little small to make that the default for everyone
> who doesn't pass the extra flag. Having to pass another flag to
> limit the addresses introduces other problems (e.g. mmap from
> library call that doesn't pass that flag).

That's a fair point.  Maybe my straw man isn't so good.

>
>> If necessary, we could also have a prctl that changes a
>> "personality-like" limit that is in effect when the old mmap was used.
>> I say "personality-like" because it would reset under exactly the same
>> conditions that personality resets itself.
>
> For "personality-like", it would still have to interact
> with the existing PER_LINUX32 and PER_LINUX32_3GB flags that
> do the exact same thing, so actually using personality might
> be better.
>
> We still have a few bits in the personality arguments, and
> we could combine them with the existing ADDR_LIMIT_3GB
> and ADDR_LIMIT_32BIT flags that are mutually exclusive by
> definition, such as
>
>         ADDR_LIMIT_32BIT =      0x0800000, /* existing */
>         ADDR_LIMIT_3GB   =      0x8000000, /* existing */
>         ADDR_LIMIT_39BIT =      0x0010000, /* next free bit */
>         ADDR_LIMIT_42BIT =      0x8010000,
>         ADDR_LIMIT_47BIT =      0x0810000,
>         ADDR_LIMIT_48BIT =      0x8810000,
>
> This would probably take only one or two personality bits for the
> limits that are interesting in practice.

Hmm.  What if we approached this a bit differently?  We could add a
single new personality bit ADDR_LIMIT_EXPLICIT.  Setting this bit would
cause PER_LINUX32_3GB etc to be automatically cleared.  When
ADDR_LIMIT_EXPLICIT is in effect, prctl can set a 64-bit numeric
limit.  If ADDR_LIMIT_EXPLICIT is cleared, the prctl value stops being
settable and reading it via prctl returns whatever is implied by the
other personality bits.
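
A rough model of these semantics, as a sketch only. The struct, the function
names and the bit chosen for ADDR_LIMIT_EXPLICIT are hypothetical; the model
assumes that "PER_LINUX32_3GB etc" refers to the ADDR_LIMIT_32BIT and
ADDR_LIMIT_3GB flags, and that the explicit limit starts out unlimited.

```c
#include <errno.h>
#include <stdint.h>

#define SK_ADDR_LIMIT_32BIT    0x0800000u /* existing flag value */
#define SK_ADDR_LIMIT_3GB      0x8000000u /* existing flag value */
#define SK_ADDR_LIMIT_EXPLICIT 0x0010000u /* hypothetical bit assignment */

struct task_sketch {
	uint32_t personality;
	uint64_t explicit_limit; /* used only with ADDR_LIMIT_EXPLICIT */
};

/* Setting ADDR_LIMIT_EXPLICIT clears the legacy limit flags. */
static void sketch_set_personality(struct task_sketch *t, uint32_t persona)
{
	if (persona & SK_ADDR_LIMIT_EXPLICIT) {
		persona &= ~(SK_ADDR_LIMIT_32BIT | SK_ADDR_LIMIT_3GB);
		/* Assumption: a freshly enabled explicit limit is unlimited. */
		if (!(t->personality & SK_ADDR_LIMIT_EXPLICIT))
			t->explicit_limit = UINT64_MAX;
	}
	t->personality = persona;
}

/* The numeric limit is settable only while ADDR_LIMIT_EXPLICIT is set. */
static int sketch_prctl_set_limit(struct task_sketch *t, uint64_t limit)
{
	if (!(t->personality & SK_ADDR_LIMIT_EXPLICIT))
		return -EINVAL;
	t->explicit_limit = limit;
	return 0;
}

/* Otherwise, reading returns what the other personality bits imply. */
static uint64_t sketch_prctl_get_limit(const struct task_sketch *t)
{
	if (t->personality & SK_ADDR_LIMIT_EXPLICIT)
		return t->explicit_limit;
	if (t->personality & SK_ADDR_LIMIT_3GB)
		return 3ULL << 30;
	if (t->personality & SK_ADDR_LIMIT_32BIT)
		return 1ULL << 32;
	return UINT64_MAX;
}
```

In this model there is never a state in which both a legacy flag and a
numeric limit apply, so the result of the getter is defined in every case.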

--Andy

* [RFC, PATCHv2 29/29] mm, x86: introduce RLIMIT_VADDR
  2017-01-03 18:29         ` Andy Lutomirski
@ 2017-01-03 22:07           ` Arnd Bergmann
  2017-01-03 22:09             ` Andy Lutomirski
  0 siblings, 1 reply; 11+ messages in thread
From: Arnd Bergmann @ 2017-01-03 22:07 UTC
  To: linux-arm-kernel

On Tuesday, January 3, 2017 10:29:33 AM CET Andy Lutomirski wrote:
> 
> Hmm.  What if we approached this a bit differently?  We could add a
> single new personality bit ADDR_LIMIT_EXPLICIT.  Setting this bit would
> cause PER_LINUX32_3GB etc to be automatically cleared.

Both the ADDR_LIMIT_32BIT and ADDR_LIMIT_3GB flags I guess?

> When
> ADDR_LIMIT_EXPLICIT is in effect, prctl can set a 64-bit numeric
> limit.  If ADDR_LIMIT_EXPLICIT is cleared, the prctl value stops being
> settable and reading it via prctl returns whatever is implied by the
> other personality bits.

I don't see anything wrong with it, but I'm a bit confused now
what this would be good for, compared to using just prctl.

Is this about setuid clearing the personality but not the prctl,
or something else?

	Arnd

* [RFC, PATCHv2 29/29] mm, x86: introduce RLIMIT_VADDR
  2017-01-03 22:07           ` Arnd Bergmann
@ 2017-01-03 22:09             ` Andy Lutomirski
  2017-01-04 13:55               ` Arnd Bergmann
  0 siblings, 1 reply; 11+ messages in thread
From: Andy Lutomirski @ 2017-01-03 22:09 UTC
  To: linux-arm-kernel

On Tue, Jan 3, 2017 at 2:07 PM, Arnd Bergmann <arnd@arndb.de> wrote:
> On Tuesday, January 3, 2017 10:29:33 AM CET Andy Lutomirski wrote:
>>
>> Hmm.  What if we approached this a bit differently?  We could add a
>> single new personality bit ADDR_LIMIT_EXPLICIT.  Setting this bit would
>> cause PER_LINUX32_3GB etc to be automatically cleared.
>
> Both the ADDR_LIMIT_32BIT and ADDR_LIMIT_3GB flags I guess?

Yes.

>
>> When
>> ADDR_LIMIT_EXPLICIT is in effect, prctl can set a 64-bit numeric
>> limit.  If ADDR_LIMIT_EXPLICIT is cleared, the prctl value stops being
>> settable and reading it via prctl returns whatever is implied by the
>> other personality bits.
>
> I don't see anything wrong with it, but I'm a bit confused now
> what this would be good for, compared to using just prctl.
>
> Is this about setuid clearing the personality but not the prctl,
> or something else?

It's to avoid ambiguity as to what happens if you set ADDR_LIMIT_32BIT
and use the prctl.  ISTM it would be nice for the semantics to be
fully defined in all cases.

--Andy

* [RFC, PATCHv2 29/29] mm, x86: introduce RLIMIT_VADDR
  2017-01-03 22:09             ` Andy Lutomirski
@ 2017-01-04 13:55               ` Arnd Bergmann
  0 siblings, 0 replies; 11+ messages in thread
From: Arnd Bergmann @ 2017-01-04 13:55 UTC
  To: linux-arm-kernel

On Tuesday, January 3, 2017 2:09:16 PM CET Andy Lutomirski wrote:
> >
> >> When
> >> ADDR_LIMIT_EXPLICIT is in effect, prctl can set a 64-bit numeric
> >> limit.  If ADDR_LIMIT_EXPLICIT is cleared, the prctl value stops being
> >> settable and reading it via prctl returns whatever is implied by the
> >> other personality bits.
> >
> > I don't see anything wrong with it, but I'm a bit confused now
> > what this would be good for, compared to using just prctl.
> >
> > Is this about setuid clearing the personality but not the prctl,
> > or something else?
> 
> It's to avoid ambiguity as to what happens if you set ADDR_LIMIT_32BIT
> and use the prctl.  ISTM it would be nice for the semantics to be
> fully defined in all cases.
> 

Ok, got it.

	Arnd

* [RFC, PATCHv2 29/29] mm, x86: introduce RLIMIT_VADDR
  2017-01-03  6:08     ` Andy Lutomirski
  2017-01-03 13:18       ` Arnd Bergmann
@ 2017-01-03 16:04       ` Kirill A. Shutemov
  2017-01-03 18:27         ` Andy Lutomirski
  1 sibling, 1 reply; 11+ messages in thread
From: Kirill A. Shutemov @ 2017-01-03 16:04 UTC
  To: linux-arm-kernel

On Mon, Jan 02, 2017 at 10:08:28PM -0800, Andy Lutomirski wrote:
> On Mon, Jan 2, 2017 at 12:44 AM, Arnd Bergmann <arnd@arndb.de> wrote:
> > On Tuesday, December 27, 2016 4:54:13 AM CET Kirill A. Shutemov wrote:
> >> As with other resources you can set the limit lower than current usage.
> >> It would affect only future virtual address space allocations.
> 
> I still don't buy all these use cases:
> 
> >>
> >> Use-cases for new rlimit:
> >>
> >>   - Bumping the soft limit to RLIM_INFINITY allows the current process and
> >>     all its children to use addresses above 47-bits.
> 
> OK, I get this, but only as a workaround for programs that make
> assumptions about the address space and don't use some mechanism (to
> be designed?) to work correctly in spite of a larger address space.

I guess you've misread the case. It's opt-in for the large address space,
not the other way around.

I believe 47-bit VA by default is the right way to go to make the transition
without breaking userspace.

> >>   - Bumping the soft limit to RLIM_INFINITY after fork(2), but before
> >>     exec(2) allows the child to use addresses above 47-bits.
> 
> Ditto.
> 
> >>
> >>   - Lowering the hard limit to 47-bits would prevent the current process
> >>     and all its children from using addresses above 47-bits, unless a
> >>     process has CAP_SYS_RESOURCE.
> 
> I've tried and I can't imagine any reason to do this.

That's just if something went wrong and we want to stop an application
from using addresses above 47-bit.

> >>   - It can also be handy to lower the hard or soft limit to an arbitrary
> >>     address. User-mode emulation in QEMU may lower the limit to 32-bit
> >>     to emulate a 32-bit machine on a 64-bit host.
> 
> I don't understand.  QEMU user-mode emulation intercepts all syscalls.
> What QEMU would *actually* want is a way to say "allocate me some
> memory with the high N bits clear".  mmap-via-int80 on x86 should be
> fixed to do this, but a new syscall with an explicit parameter would
> work, as would a prctl changing the current limit.

Look at the mess in mmap_find_vma(). QEMU has to guess where free virtual
memory is. That's unnecessarily complex.

prctl would work for this too. new-mmap would *not*: there are more ways
to allocate virtual address space: shmat(), mremap(). Changing all of them
just for this is stupid.

> >>
> >> TODO:
> >>   - port to non-x86;
> >>
> >> Not-yet-signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> >> Cc: linux-api at vger.kernel.org
> >
> > This seems to nicely address the same problem on arm64, which has
> > run into the same issue due to the various page table formats
> > that can currently be chosen at compile time.
> 
> On further reflection, I think this has very little to do with paging
> formats except insofar as paging formats make us notice the problem.
> The issue is that user code wants to be able to assume an upper limit
> on an address, and it gets an upper limit right now that depends on
> architecture due to paging formats.  But someone really might want to
> write a *portable* 64-bit program that allocates memory with the high
> 16 bits clear.  So let's add such a mechanism directly.
> 
> As a thought experiment, what if x86_64 simply never allocated "high"
> (above 2^47-1) addresses unless a new mmap-with-explicit-limit syscall
> were used?  Old glibc would continue working.  Old VMs would work.
> New programs that want to use ginormous mappings would have to use the
> new syscall.  This would be totally stateless and would have no issues
> with CRIU.

Except, we need more than mmap as I mentioned.

And what about stack? I'm not sure that everybody would be happy with
stack in the middle of address space.

> If necessary, we could also have a prctl that changes a
> "personality-like" limit that is in effect when the old mmap was used.
> I say "personality-like" because it would reset under exactly the same
> conditions that personality resets itself.
> 
> Thoughts?
> 

-- 
 Kirill A. Shutemov

* [RFC, PATCHv2 29/29] mm, x86: introduce RLIMIT_VADDR
  2017-01-03 16:04       ` Kirill A. Shutemov
@ 2017-01-03 18:27         ` Andy Lutomirski
  2017-01-04 14:19           ` Kirill A. Shutemov
  0 siblings, 1 reply; 11+ messages in thread
From: Andy Lutomirski @ 2017-01-03 18:27 UTC
  To: linux-arm-kernel

On Tue, Jan 3, 2017 at 8:04 AM, Kirill A. Shutemov <kirill@shutemov.name> wrote:
> On Mon, Jan 02, 2017 at 10:08:28PM -0800, Andy Lutomirski wrote:
>> On Mon, Jan 2, 2017 at 12:44 AM, Arnd Bergmann <arnd@arndb.de> wrote:
>> > On Tuesday, December 27, 2016 4:54:13 AM CET Kirill A. Shutemov wrote:
>> >> As with other resources you can set the limit lower than current usage.
>> >> It would affect only future virtual address space allocations.
>>
>> I still don't buy all these use cases:
>>
>> >>
>> >> Use-cases for new rlimit:
>> >>
>> >>   - Bumping the soft limit to RLIM_INFINITY allows the current process and
>> >>     all its children to use addresses above 47-bits.
>>
>> OK, I get this, but only as a workaround for programs that make
>> assumptions about the address space and don't use some mechanism (to
>> be designed?) to work correctly in spite of a larger address space.
>
> I guess you've misread the case. It's opt-in for the large address space,
> not the other way around.
>
> I believe 47-bit VA by default is the right way to go to make the transition
> without breaking userspace.

What I meant was: setting the rlimit to anything other than -1ULL is a
workaround, but otherwise I agree.  This still makes little sense if
set by PAM or other conventional rlimit tools.

>> >>
>> >>   - Lowering the hard limit to 47-bits would prevent the current process
>> >>     and all its children from using addresses above 47-bits, unless a
>> >>     process has CAP_SYS_RESOURCE.
>>
>> I've tried and I can't imagine any reason to do this.
>
> That's just if something went wrong and we want to stop an application
> from using addresses above 47-bit.

But CAP_SYS_RESOURCE still makes no sense in this context.

>
>> >>   - It can also be handy to lower the hard or soft limit to an arbitrary
>> >>     address. User-mode emulation in QEMU may lower the limit to 32-bit
>> >>     to emulate a 32-bit machine on a 64-bit host.
>>
>> I don't understand.  QEMU user-mode emulation intercepts all syscalls.
>> What QEMU would *actually* want is a way to say "allocate me some
>> memory with the high N bits clear".  mmap-via-int80 on x86 should be
>> fixed to do this, but a new syscall with an explicit parameter would
>> work, as would a prctl changing the current limit.
>
> Look at the mess in mmap_find_vma(). QEMU has to guess where free virtual
> memory is. That's unnecessarily complex.
>
> prctl would work for this too. new-mmap would *not*: there are more ways
> to allocate virtual address space: shmat(), mremap(). Changing all of them
> just for this is stupid.

Fair enough.

Except that mmap-via-int80, shmat-via-int80, etc should still work (if
I understand what qemu needs correctly), as would the prctl.

>
>> >>
>> >> TODO:
>> >>   - port to non-x86;
>> >>
>> >> Not-yet-signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
>> >> Cc: linux-api at vger.kernel.org
>> >
>> > This seems to nicely address the same problem on arm64, which has
>> > run into the same issue due to the various page table formats
>> > that can currently be chosen at compile time.
>>
>> On further reflection, I think this has very little to do with paging
>> formats except insofar as paging formats make us notice the problem.
>> The issue is that user code wants to be able to assume an upper limit
>> on an address, and it gets an upper limit right now that depends on
>> architecture due to paging formats.  But someone really might want to
>> write a *portable* 64-bit program that allocates memory with the high
>> 16 bits clear.  So let's add such a mechanism directly.
>>
>> As a thought experiment, what if x86_64 simply never allocated "high"
>> (above 2^47-1) addresses unless a new mmap-with-explicit-limit syscall
>> were used?  Old glibc would continue working.  Old VMs would work.
>> New programs that want to use ginormous mappings would have to use the
>> new syscall.  This would be totally stateless and would have no issues
>> with CRIU.
>
> Except, we need more than mmap as I mentioned.
>
> And what about stack? I'm not sure that everybody would be happy with
> stack in the middle of address space.

I would, personally.  I think that, for very large address spaces, we
should allocate a large block of stack and get rid of the "stack grows
down forever" legacy idea.  Then we would never need to worry about
the stack eventually hitting some other allocation.  And 2^57 bytes is
hilariously large for a default stack.

* [RFC, PATCHv2 29/29] mm, x86: introduce RLIMIT_VADDR
  2017-01-03 18:27         ` Andy Lutomirski
@ 2017-01-04 14:19           ` Kirill A. Shutemov
  2017-01-05 17:53             ` Andy Lutomirski
  0 siblings, 1 reply; 11+ messages in thread
From: Kirill A. Shutemov @ 2017-01-04 14:19 UTC
  To: linux-arm-kernel

On Tue, Jan 03, 2017 at 10:27:22AM -0800, Andy Lutomirski wrote:
> On Tue, Jan 3, 2017 at 8:04 AM, Kirill A. Shutemov <kirill@shutemov.name> wrote:
> > And what about stack? I'm not sure that everybody would be happy with
> > stack in the middle of address space.
> 
> I would, personally.  I think that, for very large address spaces, we
> should allocate a large block of stack and get rid of the "stack grows
> down forever" legacy idea.  Then we would never need to worry about
> the stack eventually hitting some other allocation.  And 2^57 bytes is
> hilariously large for a default stack.

The stack in the middle of the address space can prevent creating other
huuuge contiguous mappings. Databases may want this.

-- 
 Kirill A. Shutemov

* [RFC, PATCHv2 29/29] mm, x86: introduce RLIMIT_VADDR
  2017-01-04 14:19           ` Kirill A. Shutemov
@ 2017-01-05 17:53             ` Andy Lutomirski
  0 siblings, 0 replies; 11+ messages in thread
From: Andy Lutomirski @ 2017-01-05 17:53 UTC
  To: linux-arm-kernel

On Wed, Jan 4, 2017 at 6:19 AM, Kirill A. Shutemov <kirill@shutemov.name> wrote:
> On Tue, Jan 03, 2017 at 10:27:22AM -0800, Andy Lutomirski wrote:
>> On Tue, Jan 3, 2017 at 8:04 AM, Kirill A. Shutemov <kirill@shutemov.name> wrote:
>> > And what about stack? I'm not sure that everybody would be happy with
>> > stack in the middle of address space.
>>
>> I would, personally.  I think that, for very large address spaces, we
>> should allocate a large block of stack and get rid of the "stack grows
>> down forever" legacy idea.  Then we would never need to worry about
>> the stack eventually hitting some other allocation.  And 2^57 bytes is
>> hilariously large for a default stack.
>
> The stack in the middle of the address space can prevent creating other
> huuuge contiguous mappings. Databases may want this.

Fair enough.  OTOH, 2^47 is nowhere near the middle if we were to put
it near the top of the legacy address space.

end of thread, other threads:[~2017-01-05 17:53 UTC | newest]

Thread overview: 11+ messages
     [not found] <20161227015413.187403-1-kirill.shutemov@linux.intel.com>
     [not found] ` <20161227015413.187403-30-kirill.shutemov@linux.intel.com>
2017-01-02  8:44   ` [RFC, PATCHv2 29/29] mm, x86: introduce RLIMIT_VADDR Arnd Bergmann
2017-01-03  6:08     ` Andy Lutomirski
2017-01-03 13:18       ` Arnd Bergmann
2017-01-03 18:29         ` Andy Lutomirski
2017-01-03 22:07           ` Arnd Bergmann
2017-01-03 22:09             ` Andy Lutomirski
2017-01-04 13:55               ` Arnd Bergmann
2017-01-03 16:04       ` Kirill A. Shutemov
2017-01-03 18:27         ` Andy Lutomirski
2017-01-04 14:19           ` Kirill A. Shutemov
2017-01-05 17:53             ` Andy Lutomirski
