public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Dmitry Safonov <dsafonov@virtuozzo.com>
To: Andy Lutomirski <luto@amacapital.net>
Cc: Thomas Gleixner <tglx@linutronix.de>,
	Dmitry Safonov <0x7f454c46@gmail.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Ingo Molnar <mingo@redhat.com>,
	Shuah Khan <shuahkh@osg.samsung.com>,
	Borislav Petkov <bp@alien8.de>, X86 ML <x86@kernel.org>,
	<khorenko@virtuozzo.com>,
	Andrew Morton <akpm@linux-foundation.org>, <xemul@virtuozzo.com>,
	<linux-kselftest@vger.kernel.org>,
	Cyrill Gorcunov <gorcunov@openvz.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"H. Peter Anvin" <hpa@zytor.com>
Subject: Re: [PATCH 1/2] x86/arch_prctl: add ARCH_SET_{COMPAT,NATIVE} to change compatible mode
Date: Thu, 7 Apr 2016 18:18:39 +0300	[thread overview]
Message-ID: <57067A4F.9090101@virtuozzo.com> (raw)
In-Reply-To: <CALCETrVNTC9DmKbrYvEDLiAa3s+ZQNyCChk7Y1F6cw6Qv3OuJg@mail.gmail.com>

On 04/07/2016 05:39 PM, Andy Lutomirski wrote:
> On Apr 7, 2016 5:12 AM, "Dmitry Safonov" <dsafonov@virtuozzo.com> wrote:
>> On 04/06/2016 09:04 PM, Andy Lutomirski wrote:
>>> [cc Dave Hansen for MPX]
>>>
>>> On Apr 6, 2016 9:30 AM, "Dmitry Safonov" <dsafonov@virtuozzo.com> wrote:
>>>> Now each process that runs natively on x86_64 may execute 32-bit code
>>>> by proper setting it's CS selector: either from LDT or reuse Linux's
>>>> USER32_CS. The vice-versa is also valid: running 64-bit code in
>>>> compatible task is also possible by choosing USER_CS.
>>>> So we may switch between 32 and 64 bit code execution in any process.
>>>> Linux will choose the right syscall numbers in entries for those
>>>> processes. But it still will consider them native/compat by the
>>>> personality, that elf loader set on launch. This affects i.e., ptrace
>>>> syscall on those tasks: PTRACE_GETREGSET will return 64/32-bit regset
>>>> according to process's mode (that's how strace detect task's
>>>> personality from 4.8 version).
>>>>
>>>> This patch adds arch_prctl calls for x86 that make possible to tell
>>>> Linux kernel in which mode the application is running currently.
>>>> Mainly, this is needed for CRIU: restoring compatible & native
>>>> applications both from 64-bit restorer. By that reason I wrapped all
>>>> the code in CONFIG_CHECKPOINT_RESTORE.
>>>> This patch solves also a problem for running 64-bit code in 32-bit elf
>>>> (and reverse), that you have only 32-bit elf vdso for fast syscalls.
>>>> When switching between native <-> compat mode by arch_prctl, it will
>>>> remap needed vdso binary blob for target mode.
>>> General comments first:
>> Thanks for your comments.
>>> You forgot about x32.
>> Will add x32 support for v2.
>>
>>> I think that you should separate vdso remapping from "personality".
>>> vdso remapping should be available even on native 32-bit builds, which
>>> means that either you can't use arch_prctl for it or you'll have to
>>> wire up arch_prctl as a 32-bit syscall.
>> I cant say, I got your point. Do you mean by vdso remapping
>> mremap for vdso/vvar pages? I think, it should work now.
> For 32-bit, the vdso *must* exist in memory at the address that the
> kernel thinks it's at.  Even if you had a pure 32-bit restore stub,
> you would still need vdso remap, because there's a chance the vdso
> could land at an unusable address, say one page off from where you
> want it.  You couldn't map a wrapper because there wouldn't be any
> space for it without moving the real vdso out of the way.
>
> Remember, you *cannot* mremap() the 32-bit vdso because you will
> crash.  It works by luck for 64-bit, but it's plausible that we'd want
> to change that some day.  (I have awful patches that speed a bunch of
> things up at the cost of a vdso trampoline for 64-bit code and a bunch
> of other hacks.  Those patches will never go in for real, but
> something else might want the ability to use 64-bit vdso trampolines.)
Thanks for the elaboration, now I see. Signals and fast syscalls
expect mm->context.vdso to be correct.
>
>> I did remapping for vdso as blob for native x86_64 task differs
>> to compatible task. So it's just changing blobs, address value
>> is there for convenience - I may omit it and just remap
>> different vdso blob at the same place where was previous vdso.
>> I'm not sure, why do we need possibility to map 64-bit vdso blob
>> on native 32-bit builds?
> That would fail, but I think the API should exist.  But a native
> 32-bit program should be able to remap the 32-bit vdso.
>
> IOW, I think you should be able to do, roughly:
>
> map_new_vdso(VDSO_32BIT, addr);
>
> on any kernel.
>
> Am I making sense?
Yes. I will rework it for some API.
>
>>> For "personality", someone needs to enumerate all of the various thigs
>>> that try to track bitness and see how many of them even make sense.
>>> On brief inspection:
>>>
>>>    - TIF_IA32: affects signal format and does something to ptrace.  I
>>> suspect that whatever it does to ptrace is nonsensical, and I don't
>>> know whether we're stuck with it.
>>>
>>>    - TIF_ADDR32 affects TASK_SIZE and mmap behavior (and the latter
>>> isn't even done in a sensible way).
>>>
>>>    - is_64bit_mm affects MPX and uprobes.
>>>
>>> On even more brief inspection:
>>>
>>>    - uprobes using is_64bit_mm is buggy.
>>>
>>>    - I doubt that having TASK_SIZE vary serves any purpose.  Does anyone
>>> know why TASK_SIZE is different for different tasks?  It would save
>>> code size and speed things up if TASK_SIZE were always TASK_SIZE_MAX.
>>>    - Using TIF_IA32 for signal processing is IMO suboptimal.  Instead,
>>> we should record which syscall installed the signal handler and use
>>> the corresponding frame format.
>> Oh, I like it, will do.
>>
>>>    - Using TIF_IA32 of the *target* for ptrace is nonsense.  Having
>>> strace figure out syscall type using that is actively buggy, and I ran
>>> into that bug a few days ago and cursed at it.  strace should inspect
>>> TS_COMPAT (I don't know how, but that's what should happen).  We may
>>> be stuck with this for ABI reasons.
>> ptrace may check seg_32bit for code selector, what do you think?
> Not sure.  I have never fully wrapped my had around ptrace.
Hm, I guess, it's better to check TS_COMPAT, after some thinking:
It's set up on compatible syscall enter, so there is no need to
check seg_32bit anyway.

Huge thanks, will work on v2 according to your comments.

  reply	other threads:[~2016-04-07 15:19 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-04-06 16:29 [PATCH 0/2] x86: add arch_prctl to switch between native/compat modes Dmitry Safonov
2016-04-06 16:29 ` [PATCH 1/2] x86/arch_prctl: add ARCH_SET_{COMPAT,NATIVE} to change compatible mode Dmitry Safonov
2016-04-06 18:04   ` Andy Lutomirski
2016-04-06 18:49     ` Andy Lutomirski
2016-04-07 12:11     ` Dmitry Safonov
2016-04-07 12:21       ` Cyrill Gorcunov
2016-04-07 12:35         ` Dmitry Safonov
2016-04-07 14:39       ` Andy Lutomirski
2016-04-07 15:18         ` Dmitry Safonov [this message]
2016-04-08 13:50         ` Dmitry Safonov
2016-04-08 15:56           ` Andy Lutomirski
2016-04-08 16:18             ` Dmitry Safonov
2016-04-08 20:44               ` Andy Lutomirski
2016-04-09  8:06                 ` Dmitry Safonov
2016-04-13 16:55                 ` Dmitry Safonov
2016-04-14 18:27                   ` Andy Lutomirski
2016-04-20 11:04                     ` Peter Zijlstra
2016-04-20 15:40                       ` Andy Lutomirski
2016-04-20 19:05                         ` Peter Zijlstra
2016-04-21 19:39                           ` Andy Lutomirski
2016-04-21 20:12                             ` Peter Zijlstra
2016-04-21 23:27                               ` Andy Lutomirski
2016-04-21 23:46                                 ` Andy Lutomirski
2016-04-25 15:16                                 ` Peter Zijlstra
2016-04-25 16:50                                   ` Andy Lutomirski
2016-04-06 16:29 ` [PATCH 2/2] x86/tools/testing: add test for ARCH_SET_COMPAT Dmitry Safonov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=57067A4F.9090101@virtuozzo.com \
    --to=dsafonov@virtuozzo.com \
    --cc=0x7f454c46@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=bp@alien8.de \
    --cc=dave.hansen@linux.intel.com \
    --cc=gorcunov@openvz.org \
    --cc=hpa@zytor.com \
    --cc=khorenko@virtuozzo.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=luto@amacapital.net \
    --cc=mingo@redhat.com \
    --cc=shuahkh@osg.samsung.com \
    --cc=tglx@linutronix.de \
    --cc=x86@kernel.org \
    --cc=xemul@virtuozzo.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox