From: Pavel Emelyanov <xemul@parallels.com>
To: Andy Lutomirski <luto@amacapital.net>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>,
	Cyrill Gorcunov <gorcunov@gmail.com>,
	LKML <linux-kernel@vger.kernel.org>,
	Sasha Levin <sasha.levin@oracle.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Dave Jones <davej@redhat.com>
Subject: Re: mm: NULL ptr deref handling mmaping of special mappings
Date: Mon, 19 May 2014 12:27:29 +0400
Message-ID: <5379C071.4090100@parallels.com>
In-Reply-To: <CALCETrWw7tS2Lpnb1OxgZpBwHvOSbDk2zBVtUTJEp5eooYUyhA@mail.gmail.com>

On 05/15/2014 11:42 PM, Andy Lutomirski wrote:
> On May 14, 2014 8:36 PM, "Pavel Emelyanov" <xemul@parallels.com> wrote:
>>
>> On 05/15/2014 02:23 AM, Andy Lutomirski wrote:
>>> On Wed, May 14, 2014 at 3:11 PM, Cyrill Gorcunov <gorcunov@gmail.com> wrote:
>>>> On Wed, May 14, 2014 at 02:33:54PM -0700, Andy Lutomirski wrote:
>>>>> On Wed, May 14, 2014 at 2:31 PM, Andrew Morton
>>>>> <akpm@linux-foundation.org> wrote:
>>>>>> On Wed, 14 May 2014 17:11:00 -0400 Sasha Levin <sasha.levin@oracle.com> wrote:
>>>>>>
>>>>>>>> In my linux-next all that code got deleted by Andy's "x86, vdso:
>>>>>>>> Reimplement vdso.so preparation in build-time C" anyway.  What kernel
>>>>>>>> were you looking at?
>>>>>>>
>>>>>>> Deleted? It appears in today's -next. arch/x86/vdso/vma.c:124 .
>>>>>>>
>>>>>>> I don't see Andy's patch removing that code either.
>>>>>>
>>>>>> ah, OK, it got moved from arch/x86/vdso/vdso32-setup.c into
>>>>>> arch/x86/vdso/vma.c.
>>>>>>
>>>>>> Maybe you managed to take a fault against the symbol area between the
>>>>>> _install_special_mapping() and the remap_pfn_range() call, but mmap_sem
>>>>>> should prevent that.
>>>>>>
>>>>>> Or the remap_pfn_range() call never happened.  Should map_vdso() be
>>>>>> running _install_special_mapping() at all if
>>>>>> image->sym_vvar_page==NULL?
>>>>>
>>>>> I'm confused: are we talking about 3.15-rcsomething or linux-next?
>>>>> That code changed.
>>>>>
>>>>> Would this all make more sense if there were just a single vma in
>>>>> here?  cc: Pavel and Cyrill, who might have to deal with this stuff in
>>>>> CRIU
>>>>
>>>> Well, for criu we've not modified any vdso kernel's code (except
>>>> setting VM_SOFTDIRTY for this vdso VMA in _install_special_mapping).
>>>> And never experienced problems Sasha points. Looks like indeed in
>>>> -next code is pretty different from mainline one. To figure out
>>>> why I need to fetch -next branch and get some research. I would
>>>> try to do that tomorrow (still hoping someone more experienced
>>>> in mm system would beat me on that).
>>>
>>> I can summarize:
>>>
>>> On 3.14 and before, the vdso is just a bunch of ELF headers and
>>> executable data.  When executed by 64-bit binaries, it reads from the
>>> fixmap to do its thing.  That is, it reads from kernel addresses that
>>> don't have vmas.  When executed by 32-bit binaries, it doesn't read
>>> anything, since there was no 32-bit timing code.
>>>
>>> On 3.15, the x86_64 vdso is unchanged.  The 32-bit vdso is preceded by
>>> a separate vma containing two pages worth of time-varying read-only
>>> data.  The vdso reads those pages using PIC references.
>>>
>>> On linux-next, all vdsos work the same way.  There are two vmas.  The
>>> first vma is executable text, which can be poked at by ptrace, etc
>>> normally.  The second vma contains time-varying state, should not
>>> allow poking, and is accessed by PIC references.
>>
>> Is this 2nd vma seen in /proc/pid/maps? And if so, is it marked somehow?
> 
> It is in maps, and it's not marked.  I can write a patch to change
> that.  I imagine it shouldn't be called [vdso], though.

That would be great.

>>
>>> What does CRIU do to restore the vdso?  Will 3.15 and/or linux-next
>>> need to make some concession for CRIU?
>>
>> We detect the vdso by "[vdso]" mark in proc at dump time and mark it in
>> the images. At restore time we check that vdso symbols layout hasn't changed
>> and just remap it in proper location.
>>
>> If this remains the same in -next, then we're fine :)
> 
> If you just remap the vdso, you'll crash.
> 
> This is the case in 3.15, too, for 32-bit apps, anyway.
> 
> What happens if you try to checkpoint a program that's in the vdso or,
> worse, in a signal frame with the vdso on the stack?

Nothing good, unfortunately :( And this is one of the things we're investigating.
Cyrill can shed more light on it, as he's the one in charge.

> --Andy
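
For reference, the dump-time detection described above (looking for the
"[vdso]" mark in /proc/pid/maps) can be sketched roughly as follows. This
is only an illustration, not CRIU's actual code: the names Mapping,
parse_maps, find_named_mappings and the SAMPLE text are all hypothetical,
and the sample addresses are made up. It also shows why an unmarked second
vma is a problem: a lookup by name cannot find it.

```python
from typing import List, NamedTuple


class Mapping(NamedTuple):
    """One line of /proc/<pid>/maps (hypothetical helper type)."""
    start: int
    end: int
    perms: str
    name: str  # empty string when the kernel prints no name


def parse_maps(text: str) -> List[Mapping]:
    """Parse the text of a maps file into Mapping records."""
    mappings = []
    for line in text.splitlines():
        # Fields: address range, perms, offset, dev, inode, [optional name]
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        start_s, end_s = fields[0].split("-")
        name = fields[5].strip() if len(fields) == 6 else ""
        mappings.append(Mapping(int(start_s, 16), int(end_s, 16), fields[1], name))
    return mappings


def find_named_mappings(text: str, name: str = "[vdso]") -> List[Mapping]:
    """Return every mapping whose name column equals `name`."""
    return [m for m in parse_maps(text) if m.name == name]


# Made-up sample: an unnamed data vma followed by the named vdso text vma.
SAMPLE = (
    "7ffd1c5fe000-7ffd1c600000 r--p 00000000 00:00 0\n"
    "7ffd1c600000-7ffd1c602000 r-xp 00000000 00:00 0                          [vdso]\n"
)

if __name__ == "__main__":
    for m in find_named_mappings(SAMPLE):
        print(f"{m.name}: {m.start:#x}-{m.end:#x} {m.perms}")
```

On a Linux system the same function can be pointed at a live process by
passing open("/proc/self/maps").read() instead of SAMPLE. In the sample,
only the named vma is returned; the unnamed data vma in front of it is
invisible to a name-based search.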

