From: Ingo Molnar <mingo@kernel.org>
To: Andy Lutomirski <luto@kernel.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>,
"H. Peter Anvin" <hpa@zytor.com>,
Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
LKML <linux-kernel@vger.kernel.org>, X86 ML <x86@kernel.org>,
Peter Zijlstra <a.p.zijlstra@chello.nl>,
Linus Torvalds <torvalds@linux-foundation.org>,
Al Viro <viro@zeniv.linux.org.uk>
Subject: Re: [PATCH] x86_64: uninline TASK_SIZE
Date: Tue, 23 Apr 2019 13:12:58 +0200 [thread overview]
Message-ID: <20190423111258.GA23410@gmail.com> (raw)
In-Reply-To: <CALCETrWqPStQqv2xCJZLzsqTE-9L9U08CMApWyKD1zwjrrH4ow@mail.gmail.com>
* Andy Lutomirski <luto@kernel.org> wrote:
> > Saving 2KB on a defconfig is quite a lot.
>
> Saving 2kB of text by adding 8 bytes to thread_info seems rather
> dubious to me. You only need 256 tasks before you lose. My
> not-particularly-loaded laptop has 865 tasks right now.
I was suggesting current->task_size or thread_info->task_size as a way to
100% avoid the function call overhead. Worth a tiny amount of RAM - even
with 1 million tasks it's only 4MB of RAM. ;-)
Some TASK_SIZE users are prominent syscalls: mmap(),
> As a general principle, the mere existence of TIF_ADDR32 is a bug. The
> value of that flag is *wrong* under the 32-bit variant of CRIU. How
> about instead making some more progress toward getting rid of dubious
> TASK_SIZE users? I'm working on a little series to get rid of most of
> them. Meanwhile: it sure looks like a large fraction of the users are
> confused as to whether TASK_SIZE is the highest user address or the
> lowest non-user address.
I really like that, replacing TASK_SIZE with *nothing* would be even
faster.
In fact instead of just reducing its usage I'd suggest removing TASK_SIZE
altogether and renaming TASK_SIZE_MAX back to TASK_SIZE, or something
like that - the confusion from the deceptively macro-ish naming of
TASK_SIZE is real.
The original commit of making TASK_SIZE dynamic on the task's compat flag
was done in:
84929801e14d: [PATCH] x86_64: TASK_SIZE fixes for compatibility mode processes
Here's the justification given:
Appended patch will setup compatibility mode TASK_SIZE properly. This will
fix atleast three known bugs that can be encountered while running
compatibility mode apps.
a) A malicious 32bit app can have an elf section at 0xffffe000. During
exec of this app, we will have a memory leak as insert_vm_struct() is
not checking for return value in syscall32_setup_pages() and thus not
freeing the vma allocated for the vsyscall page. And instead of exec
failing (as it has addresses > TASK_SIZE), we were allowing it to
succeed previously.
b) With a 32bit app, hugetlb_get_unmapped_area/arch_get_unmapped_area
may return addresses beyond 32bits, ultimately causing corruption
because of wrap-around and resulting in SEGFAULT, instead of returning
ENOMEM.
c) 32bit app doing this below mmap will now fail.
mmap((void *)(0xFFFFE000UL), 0x10000UL, PROT_READ|PROT_WRITE,
MAP_FIXED|MAP_PRIVATE|MAP_ANON, 0, 0);
I believe a) is addressed by getting rid of the vsyscall page - but it
might also not be a current problem because the vsyscall page has its own
gate-vma now.
b) shouldn't be an issue if the mmap allocator correctly treats the
compat bit - this doesn't require generic TASK_SIZE variations though, as
the mmap allocator is already specific to arch/x86/.
c) is a variant of a) I believe, which should be fixed by now.
I just looked through some of the current TASK_SIZE users, and *ALL* of
them seem dubious to me, with the exception of the mmap allocators. In
fact some of them seem to be active bugs:
kernel/:
- PR_SET_MM_MAP, PR_SET_MM_MAP_SIZE, prctl_set_mm() et al. Ugh, what a
nasty prctl()! But besides that, the TASK_SIZE restriction to the ABI
looks questionable: if we offer this CRIU functionality then why
should it be impossible for a 32-bit CRIU task to switch to 64-bit?
- kernel/events/core.c: TASK_SIZE_MAX should be a fine filter here, in
fact it's probably *wrong* to restrict the profiling data here just
because the task happens to be in 32-bit compat mode.
- kernel/rseq.c: is this TASK_SIZE restriction even required, wouldn't
TASK_SIZE_MAX be sufficient?
mm/:
- GUP's get_get_area() et al looks really weird - why do we special-case
vsyscalls:
- Can we get rid of the vsyscall page in modern kernels?
- I don't think anyone runs those ancient glibc versions with a
fresh kernel anymore - can we start generating a WARN()ing
perhaps to see whether there's any complaints?
- Or at least pretend it doesn't exist in terms of a GUP target page?
- mm/kasan/generic_report.c:get_wild_bug_type() - this can use
TASK_SIZE_MAX just fine IMHO.
- mm/mempolicy.c:mpol_shared_policy_init() - unsure, but I suspect we
can just create the pseudo-vma with a TASK_SIZE_MAX vm_end just fine.
- mm/mlock.c:mlockall() - I believe it could be considered an outright
*bug* if there any pages outside the 32-bit area and don't get mlocked
by mlockall, just because this is a compat task. Especially with the
CRIU prctl() having 64-bit vmas outside the 32-bit mappings is a real
possibility, right? I.e. TASK_SIZE_MAX would be the right solution
here.
To turn the argument around: beyond the memory allocators, which includes
the mmap and huge-mmap variants plus the SysV shmem allocator, can we
list all the places that absolutely *rely* on TASK_SIZE being TIF_ADDR32
restricted on compat tasks? I couldn't find any.
So I concur 100% that most TASK_SIZE uses are questionable. In fact think
84929801e14d was a mistake, and we should effectively revert it
carefully, by:
- First by moving almost all TASK_SIZE users over to TASK_SIZE_MAX,
analyzing and justifying the impact case by case.
- Then making the mmap allocators compat compatible (ha) without relying
on TASK_SIZE.
- Renaming TASK_SIZE back to TASK_SIZE_MAX and getting rid of the
TASK_SIZE and TASK_SIZE_MAX differentiation.
Or am I missing some complication?
Thanks,
Ingo
next prev parent reply other threads:[~2019-04-23 11:13 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-04-21 16:06 [PATCH] x86_64: uninline TASK_SIZE Alexey Dobriyan
2019-04-21 18:28 ` Ingo Molnar
2019-04-21 20:07 ` hpa
2019-04-21 21:10 ` Alexey Dobriyan
2019-04-22 10:34 ` Ingo Molnar
2019-04-22 14:30 ` Andy Lutomirski
2019-04-22 22:09 ` Alexey Dobriyan
2019-04-23 0:54 ` Andy Lutomirski
2019-04-23 11:12 ` Ingo Molnar [this message]
2019-04-23 17:21 ` Andy Lutomirski
2019-04-24 10:38 ` Ingo Molnar
2019-04-22 22:04 ` Alexey Dobriyan
2019-04-21 22:07 ` [PATCH v2] " Alexey Dobriyan
2019-04-22 22:40 ` [PATCH] " Linus Torvalds
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20190423111258.GA23410@gmail.com \
--to=mingo@kernel.org \
--cc=a.p.zijlstra@chello.nl \
--cc=adobriyan@gmail.com \
--cc=bp@alien8.de \
--cc=hpa@zytor.com \
--cc=linux-kernel@vger.kernel.org \
--cc=luto@kernel.org \
--cc=mingo@redhat.com \
--cc=tglx@linutronix.de \
--cc=torvalds@linux-foundation.org \
--cc=viro@zeniv.linux.org.uk \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.