All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ingo Molnar <mingo@kernel.org>
To: Andy Lutomirski <luto@kernel.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>,
	"H. Peter Anvin" <hpa@zytor.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	LKML <linux-kernel@vger.kernel.org>, X86 ML <x86@kernel.org>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Al Viro <viro@zeniv.linux.org.uk>
Subject: Re: [PATCH] x86_64: uninline TASK_SIZE
Date: Tue, 23 Apr 2019 13:12:58 +0200	[thread overview]
Message-ID: <20190423111258.GA23410@gmail.com> (raw)
In-Reply-To: <CALCETrWqPStQqv2xCJZLzsqTE-9L9U08CMApWyKD1zwjrrH4ow@mail.gmail.com>


* Andy Lutomirski <luto@kernel.org> wrote:

> > Saving 2KB on a defconfig is quite a lot.
> 
> Saving 2kB of text by adding 8 bytes to thread_info seems rather
> dubious to me.  You only need 256 tasks before you lose.  My
> not-particularly-loaded laptop has 865 tasks right now.

I was suggesting current->task_size or thread_info->task_size as a way to 
100% avoid the function call overhead. Worth a tiny amount of RAM - even 
with 1 million tasks it's only 4MB of RAM. ;-)

Some TASK_SIZE users are prominent syscalls: mmap(), 

> As a general principle, the mere existence of TIF_ADDR32 is a bug. The 
> value of that flag is *wrong* under the 32-bit variant of CRIU. How 
> about instead making some more progress toward getting rid of dubious 
> TASK_SIZE users?  I'm working on a little series to get rid of most of 
> them.  Meanwhile: it sure looks like a large fraction of the users are 
> confused as to whether TASK_SIZE is the highest user address or the 
> lowest non-user address.

I really like that, replacing TASK_SIZE with *nothing* would be even 
faster.

In fact instead of just reducing its usage I'd suggest removing TASK_SIZE 
altogether and renaming TASK_SIZE_MAX back to TASK_SIZE, or something 
like that - the confusion from the deceptively macro-ish naming of 
TASK_SIZE is real.

The original commit of making TASK_SIZE dynamic on the task's compat flag 
was done in:

  84929801e14d: [PATCH] x86_64: TASK_SIZE fixes for compatibility mode processes

Here's the justification given:

    Appended patch will setup compatibility mode TASK_SIZE properly.  This will
    fix atleast three known bugs that can be encountered while running
    compatibility mode apps.
    
    a) A malicious 32bit app can have an elf section at 0xffffe000.  During
       exec of this app, we will have a memory leak as insert_vm_struct() is
       not checking for return value in syscall32_setup_pages() and thus not
       freeing the vma allocated for the vsyscall page.  And instead of exec
       failing (as it has addresses > TASK_SIZE), we were allowing it to
       succeed previously.
    
    b) With a 32bit app, hugetlb_get_unmapped_area/arch_get_unmapped_area
       may return addresses beyond 32bits, ultimately causing corruption
       because of wrap-around and resulting in SEGFAULT, instead of returning
       ENOMEM.
    
    c) 32bit app doing this below mmap will now fail.
    
      mmap((void *)(0xFFFFE000UL), 0x10000UL, PROT_READ|PROT_WRITE,
            MAP_FIXED|MAP_PRIVATE|MAP_ANON, 0, 0);

I believe a) is addressed by getting rid of the vsyscall page - but it 
might also not be a current problem because the vsyscall page has its own 
gate-vma now.

b) shouldn't be an issue if the mmap allocator correctly treats the 
compat bit - this doesn't require generic TASK_SIZE variations though, as 
the mmap allocator is already specific to arch/x86/.

c) is a variant of a) I believe, which should be fixed by now.

I just looked through some of the current TASK_SIZE users, and *ALL* of 
them seem dubious to me, with the exception of the mmap allocators. In 
fact some of them seem to be active bugs:

kernel/:

 - PR_SET_MM_MAP, PR_SET_MM_MAP_SIZE, prctl_set_mm() et al. Ugh, what a 
   nasty prctl()! But besides that, the TASK_SIZE restriction to the ABI 
   looks questionable: if we offer this CRIU functionality then why 
   should it be impossible for a 32-bit CRIU task to switch to 64-bit?

 - kernel/events/core.c: TASK_SIZE_MAX should be a fine filter here, in 
   fact it's probably *wrong* to restrict the profiling data here just 
   because the task happens to be in 32-bit compat mode.

 - kernel/rseq.c: is this TASK_SIZE restriction even required, wouldn't 
   TASK_SIZE_MAX be sufficient?

mm/:

 - GUP's get_get_area() et al looks really weird - why do we special-case 
   vsyscalls:

      - Can we get rid of the vsyscall page in modern kernels?

      - I don't think anyone runs those ancient glibc versions with a 
        fresh kernel anymore - can we start generating a WARN()ing 
        perhaps to see whether there's any complaints?

      - Or at least pretend it doesn't exist in terms of a GUP target page?

 - mm/kasan/generic_report.c:get_wild_bug_type() - this can use 
   TASK_SIZE_MAX just fine IMHO.

 - mm/mempolicy.c:mpol_shared_policy_init() - unsure, but I suspect we 
   can just create the pseudo-vma with a TASK_SIZE_MAX vm_end just fine.

 - mm/mlock.c:mlockall() - I believe it could be considered an outright 
   *bug* if there any pages outside the 32-bit area and don't get mlocked 
   by mlockall, just because this is a compat task. Especially with the 
   CRIU prctl() having 64-bit vmas outside the 32-bit mappings is a real 
   possibility, right? I.e. TASK_SIZE_MAX would be the right solution 
   here.

To turn the argument around: beyond the memory allocators, which includes 
the mmap and huge-mmap variants plus the SysV shmem allocator, can we 
list all the places that absolutely *rely* on TASK_SIZE being TIF_ADDR32 
restricted on compat tasks? I couldn't find any.

So I concur 100% that most TASK_SIZE uses are questionable. In fact think 
84929801e14d was a mistake, and we should effectively revert it 
carefully, by:

 - First by moving almost all TASK_SIZE users over to TASK_SIZE_MAX, 
   analyzing and justifying the impact case by case.

 - Then making the mmap allocators compat compatible (ha) without relying 
   on TASK_SIZE.

 - Renaming TASK_SIZE back to TASK_SIZE_MAX and getting rid of the 
   TASK_SIZE and TASK_SIZE_MAX differentiation.

Or am I missing some complication?

Thanks,

	Ingo

  reply	other threads:[~2019-04-23 11:13 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-04-21 16:06 [PATCH] x86_64: uninline TASK_SIZE Alexey Dobriyan
2019-04-21 18:28 ` Ingo Molnar
2019-04-21 20:07   ` hpa
2019-04-21 21:10     ` Alexey Dobriyan
2019-04-22 10:34       ` Ingo Molnar
2019-04-22 14:30         ` Andy Lutomirski
2019-04-22 22:09           ` Alexey Dobriyan
2019-04-23  0:54             ` Andy Lutomirski
2019-04-23 11:12               ` Ingo Molnar [this message]
2019-04-23 17:21                 ` Andy Lutomirski
2019-04-24 10:38                   ` Ingo Molnar
2019-04-22 22:04         ` Alexey Dobriyan
2019-04-21 22:07   ` [PATCH v2] " Alexey Dobriyan
2019-04-22 22:40 ` [PATCH] " Linus Torvalds

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190423111258.GA23410@gmail.com \
    --to=mingo@kernel.org \
    --cc=a.p.zijlstra@chello.nl \
    --cc=adobriyan@gmail.com \
    --cc=bp@alien8.de \
    --cc=hpa@zytor.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@kernel.org \
    --cc=mingo@redhat.com \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=viro@zeniv.linux.org.uk \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.