From: "Kirill A. Shutemov" <kirill@shutemov.name>
To: Andy Lutomirski <luto@amacapital.net>
Cc: Dmitry Safonov <dsafonov@virtuozzo.com>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	linux-arch <linux-arch@vger.kernel.org>,
	Linux API <linux-api@vger.kernel.org>,
	Will Deacon <will.deacon@arm.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	"linux-s390@vger.kernel.org" <linux-s390@vger.kernel.org>,
	"linux-arm-kernel@lists.infradead.org"
	<linux-arm-kernel@lists.infradead.org>,
	Carlos O'Donell <carlos@redhat.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Dmitry Safonov <0x7f454c46@gmail.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, "H. Peter Anvin" <hpa@zytor.com>,
	Andy Lutomirski <luto@kernel.org>, X86 ML <x86@kernel.org>
Subject: Re: How should we handle variable address space sizes (Re: [RFC 3/4] x86/mm: define TASK_SIZE as current->mm->task_size)
Date: Mon, 2 Jan 2017 12:49:07 +0300
Message-ID: <20170102094907.GC30735@node.shutemov.name>
In-Reply-To: <CALCETrXMCVOmVcQYxF_ghPdEjLuNNqbcnoRKRVpJegsQ=SPEFQ@mail.gmail.com>

On Fri, Dec 30, 2016 at 06:11:05PM -0800, Andy Lutomirski wrote:
> On Fri, Dec 30, 2016 at 7:56 AM, Dmitry Safonov <dsafonov@virtuozzo.com> wrote:
> > Keep task's virtual address space size as mm_struct field which
> > exists for a long time - it's initialized in setup_new_exec()
> > depending on the new task's personality.
> > This way TASK_SIZE will always be the same as current->mm->task_size.
> > Previously, there could be an issue about different values of
> > TASK_SIZE and current->mm->task_size: e.g, a 32-bit process can unset
> > ADDR_LIMIT_3GB personality (with personality syscall) and
> > so TASK_SIZE will be 4Gb, which is larger than mm->task_size = 3Gb.
> > As TASK_SIZE *and* current->mm->task_size are used both in code
> > frequently, this difference creates a subtle situations, for example:
> > one can mmap addresses > 3Gb, but they will be hidden in
> > /proc/pid/pagemap as it checks mm->task_size.
> > I've moved initialization of mm->task_size earlier in setup_new_exec()
> > as arch_pick_mmap_layout() initializes mmap_legacy_base with
> > TASK_UNMAPPED_BASE, which depends on TASK_SIZE.
> 
> I don't like this patch so much because I think that we should figure
> out how this will all work in the long run first.  I've added some
> more people to the thread because other arches have similar issues and
> because x86 is about to get considerably more complicated (choices
> include 3GB, 4GB, 47-bit, and 56-bit (the latter IIRC)).
> 
> Here are a few of my thoughts on the matter.  This isn't all that well
> thought out:
> 
> The address space limit, especially if CRIU is in play, isn't really a
> hard limit.  For example, you could allocate high memory then lower
> the limit.  Similarly, I see no reason that an x32 program should be
> forbidden from mapping some high addresses or, similarly, that an i386
> program can't (if it really wanted to) do a 64-bit mmap() and get a
> high address.
> 
> On that note, can we just *delete* the task_size check from pagemap?
> It's been there since the very beginning:
> 
> commit 85863e475e59afb027b0113290e3796ee6020b7d
> Author: Matt Mackall <mpm@selenic.com>
> Date:   Mon Feb 4 22:29:04 2008 -0800
> 
>     maps4: add /proc/pid/pagemap interface
> 
> and there's no explanation for why it's needed.
> 
> So maybe we should have a *number* (not a bit) that indicates the
> maximum address that mmap() will return unless an override is in use.
> Since common practice seems to be to stick this in the personality
> field, we may need some fancy encoding.  Executing a setuid binary
> needs to reset to the default, and personality handles that.

If we want to be able to specify an arbitrary address as the maximum, a
fancy encoding would need to claim 51 bits (63 VA bits - 12 in-page
address bits) on x86 from the personality field.
To me, that's stretching the personality interface too far.

Maybe it's easier to reset the rlimit for suid binaries?

-- 
 Kirill A. Shutemov
