public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* Possible race in copy of fpu->state in copy_process against the exeve'ing parent?
@ 2016-04-13  3:11 Jianyu Zhan
  2016-04-13  3:19 ` Jianyu Zhan
  2016-04-13  6:09 ` Ingo Molnar
  0 siblings, 2 replies; 5+ messages in thread
From: Jianyu Zhan @ 2016-04-13  3:11 UTC (permalink / raw)
  To: mingo, H. Peter Anvin, suresh.b.siddha; +Cc: x86, LKML

Hi,

I encountered a panic on a Linux-3.2 kernel on a x86_64 machine, and
suspect it is a race condition.  And I checked the current mainline
and found it was fixed unintendedly.

So I hope x86/fpu maintainer help verify this.  Thanks verfy much.


The panic stack trace :

 #0 [ffff88529d33f990] try_crashdump at ffffffff8105b8ca
 #1 [ffff88529d33f9a0] dump_on_panic at ffffffff8105b965
 #2 [ffff88529d33fa60] notifier_call_chain at ffffffff8139f784
 #3 [ffff88529d33fac0] atomic_notifier_call_chain at ffffffff8139f81d
 #4 [ffff88529d33fad0] panic at ffffffff8139971c
 #5 [ffff88529d33fb50] oops_end at ffffffff8139d34a
 #6 [ffff88529d33fb80] no_context at ffffffff81021569
 #7 [ffff88529d33fbd0] __bad_area_nosemaphore at ffffffff81021730
 #8 [ffff88529d33fc20] bad_area at ffffffff810217ac
 #9 [ffff88529d33fc50] do_page_fault at ffffffff8139f509
#10 [ffff88529d33fd70] page_fault at ffffffff8139caef
    [exception RIP: prepare_to_copy+35]
<------------------ PANIC !!!
    RIP: ffffffff810013f4  RSP: ffff88529d33fe20  RFLAGS: 00010286
    RAX: 00000000ffffffff  RBX: 0000000001200011  RCX: ffff884fe73f6320
    RDX: 00000000ffffffff  RSI: 00007fff07d36bd0  RDI: 0000000000000000
    RBP: ffff88529d33fe20   R8: 00007f5c4a209770   R9: 0000000000000000
    R10: 00007f5c4a209770  R11: 0000000000000202  R12: 0000000000000000
    R13: 0000000000000000  R14: ffff884fe73f6320  R15: 0000000000000001
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
#11 [ffff88529d33fe28] copy_process at ffffffff81038211
#12 [ffff88529d33fea8] do_fork at ffffffff810393ec
#13 [ffff88529d33ff38] sys_clone at ffffffff81009118
#14 [ffff88529d33ff48] stub_clone at ffffffff813a31d3


crash7> dis -r prepare_to_copy+35
0xffffffff810013d1 <prepare_to_copy>:   push   %rbp
0xffffffff810013d2 <prepare_to_copy+1>: mov    %rsp,%rbp
0xffffffff810013d5 <prepare_to_copy+4>: nopl   0x0(%rax,%rax,1)
0xffffffff810013da <prepare_to_copy+9>: mov    %rdi,%rcx
0xffffffff810013dd <prepare_to_copy+12>:        cmpl   $0x0,0x4d8(%rdi)
0xffffffff810013e4 <prepare_to_copy+19>:        je
0xffffffff8100142e <prepare_to_copy+93>
0xffffffff810013e6 <prepare_to_copy+21>:        mov    0x4e0(%rdi),%rdi
0xffffffff810013ed <prepare_to_copy+28>:        xchg   %ax,%ax
0xffffffff810013ef <prepare_to_copy+30>:        or     $0xffffffff,%eax
0xffffffff810013f2 <prepare_to_copy+33>:        mov    %eax,%edx
0xffffffff810013f4 <prepare_to_copy+35>:        xsaveopt64 (%rdi)
<---- PANIC HERE

when panic the %rdi is 0x0000000000000000, which is fpu->state.



So  I suspect there is a possible race:


   Parent:

       sys_execve
         do_execve
           do_execve_common
             search_binary_handler
                load_elf_binary
                  start_thread
                    start_thread_common
                       free_thread_xstate(current)
                         fpu_free
                            fpu->state = NULL


    Child:

        sys_clone
          do_fork
             copy_process
                dup_task_struct
                   prepare_to_copy
                      unlazy_fpu
                         __save_init_fpu
                           fpu_save_init
                             fpu_xsave(fpu)   <---- fpu->sate is NULL,
so cause a
                                                                NULL
dereference.

Scenario:  Parent is still exeve'ing,  and just set fpu->state to NULL,
and the a concurrent clone() forks a Child and in which  fpu_xsave()
tries to fpu_xsave, when fpu->state is NULL.

The race window seems quite small, and I have checked the Parent's
'sum_exec_runtime' is 536920255(~0.53s).

I checked the mainline, and found commit 304bceda6a18(" x86, fpu: use
non-lazy fpu restore for processors supporting xsave") seems
unintendedly fix this?





Regards,
Jianyu Zhan

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Possible race in copy of fpu->state in copy_process against the exeve'ing parent?
  2016-04-13  3:11 Possible race in copy of fpu->state in copy_process against the exeve'ing parent? Jianyu Zhan
@ 2016-04-13  3:19 ` Jianyu Zhan
  2016-04-13  6:09 ` Ingo Molnar
  1 sibling, 0 replies; 5+ messages in thread
From: Jianyu Zhan @ 2016-04-13  3:19 UTC (permalink / raw)
  To: mingo, H. Peter Anvin, suresh.b.siddha; +Cc: x86, LKML

On Wed, Apr 13, 2016 at 11:11 AM, Jianyu Zhan <nasa4836@gmail.com> wrote:
>
> So  I suspect there is a possible race:
>
>
>    Parent:
>
>        sys_execve
>          do_execve
>            do_execve_common
>              search_binary_handler
>                 load_elf_binary
>                   start_thread
>                     start_thread_common
>                        free_thread_xstate(current)
>                          fpu_free
>                             fpu->state = NULL
>
>
>     Child:
>
>         sys_clone
>           do_fork
>              copy_process
>                 dup_task_struct
>                    prepare_to_copy
>                       unlazy_fpu
>                          __save_init_fpu
>                            fpu_save_init
>                              fpu_xsave(fpu)   <---- fpu->sate is NULL,
> so cause a
>                                                                 NULL
> dereference.
>

Hmm,  I am wrong, it is not Parent vs Child.

It is :   Parent  executes  sys_execuve,  and then right after that,
executes sys_clone.


 Regards,
 Jianyu Zhan

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Possible race in copy of fpu->state in copy_process against the exeve'ing parent?
  2016-04-13  3:11 Possible race in copy of fpu->state in copy_process against the exeve'ing parent? Jianyu Zhan
  2016-04-13  3:19 ` Jianyu Zhan
@ 2016-04-13  6:09 ` Ingo Molnar
  2016-04-13  7:23   ` Jianyu Zhan
  1 sibling, 1 reply; 5+ messages in thread
From: Ingo Molnar @ 2016-04-13  6:09 UTC (permalink / raw)
  To: Jianyu Zhan
  Cc: mingo, H. Peter Anvin, suresh.b.siddha, x86, LKML,
	Andy Lutomirski, Borislav Petkov, Thomas Gleixner, Oleg Nesterov,
	Dave Hansen


* Jianyu Zhan <nasa4836@gmail.com> wrote:

> Hi,
> 
> I encountered a panic on a Linux-3.2 kernel on a x86_64 machine, and
> suspect it is a race condition.  And I checked the current mainline
> and found it was fixed unintendedly.
> 
> So I hope x86/fpu maintainer help verify this.  Thanks verfy much.
> 
> 
> The panic stack trace :
> 
>  #0 [ffff88529d33f990] try_crashdump at ffffffff8105b8ca
>  #1 [ffff88529d33f9a0] dump_on_panic at ffffffff8105b965
>  #2 [ffff88529d33fa60] notifier_call_chain at ffffffff8139f784
>  #3 [ffff88529d33fac0] atomic_notifier_call_chain at ffffffff8139f81d
>  #4 [ffff88529d33fad0] panic at ffffffff8139971c
>  #5 [ffff88529d33fb50] oops_end at ffffffff8139d34a
>  #6 [ffff88529d33fb80] no_context at ffffffff81021569
>  #7 [ffff88529d33fbd0] __bad_area_nosemaphore at ffffffff81021730
>  #8 [ffff88529d33fc20] bad_area at ffffffff810217ac
>  #9 [ffff88529d33fc50] do_page_fault at ffffffff8139f509
> #10 [ffff88529d33fd70] page_fault at ffffffff8139caef
>     [exception RIP: prepare_to_copy+35]
> <------------------ PANIC !!!
>     RIP: ffffffff810013f4  RSP: ffff88529d33fe20  RFLAGS: 00010286
>     RAX: 00000000ffffffff  RBX: 0000000001200011  RCX: ffff884fe73f6320
>     RDX: 00000000ffffffff  RSI: 00007fff07d36bd0  RDI: 0000000000000000
>     RBP: ffff88529d33fe20   R8: 00007f5c4a209770   R9: 0000000000000000
>     R10: 00007f5c4a209770  R11: 0000000000000202  R12: 0000000000000000
>     R13: 0000000000000000  R14: ffff884fe73f6320  R15: 0000000000000001
>     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
> #11 [ffff88529d33fe28] copy_process at ffffffff81038211
> #12 [ffff88529d33fea8] do_fork at ffffffff810393ec
> #13 [ffff88529d33ff38] sys_clone at ffffffff81009118
> #14 [ffff88529d33ff48] stub_clone at ffffffff813a31d3
> 
> 
> crash7> dis -r prepare_to_copy+35
> 0xffffffff810013d1 <prepare_to_copy>:   push   %rbp
> 0xffffffff810013d2 <prepare_to_copy+1>: mov    %rsp,%rbp
> 0xffffffff810013d5 <prepare_to_copy+4>: nopl   0x0(%rax,%rax,1)
> 0xffffffff810013da <prepare_to_copy+9>: mov    %rdi,%rcx
> 0xffffffff810013dd <prepare_to_copy+12>:        cmpl   $0x0,0x4d8(%rdi)
> 0xffffffff810013e4 <prepare_to_copy+19>:        je
> 0xffffffff8100142e <prepare_to_copy+93>
> 0xffffffff810013e6 <prepare_to_copy+21>:        mov    0x4e0(%rdi),%rdi
> 0xffffffff810013ed <prepare_to_copy+28>:        xchg   %ax,%ax
> 0xffffffff810013ef <prepare_to_copy+30>:        or     $0xffffffff,%eax
> 0xffffffff810013f2 <prepare_to_copy+33>:        mov    %eax,%edx
> 0xffffffff810013f4 <prepare_to_copy+35>:        xsaveopt64 (%rdi)
> <---- PANIC HERE
> 
> when panic the %rdi is 0x0000000000000000, which is fpu->state.
> 
> 
> 
> So  I suspect there is a possible race:
> 
> 
>    Parent:
> 
>        sys_execve
>          do_execve
>            do_execve_common
>              search_binary_handler
>                 load_elf_binary
>                   start_thread
>                     start_thread_common
>                        free_thread_xstate(current)
>                          fpu_free
>                             fpu->state = NULL
> 
> 
>     Child:
> 
>         sys_clone
>           do_fork
>              copy_process
>                 dup_task_struct
>                    prepare_to_copy
>                       unlazy_fpu
>                          __save_init_fpu
>                            fpu_save_init
>                              fpu_xsave(fpu)   <---- fpu->sate is NULL,
> so cause a
>                                                                 NULL
> dereference.
> 
> Scenario:  Parent is still exeve'ing,  and just set fpu->state to NULL,
> and the a concurrent clone() forks a Child and in which  fpu_xsave()
> tries to fpu_xsave, when fpu->state is NULL.
> 
> The race window seems quite small, and I have checked the Parent's
> 'sum_exec_runtime' is 536920255(~0.53s).
> 
> I checked the mainline, and found commit 304bceda6a18(" x86, fpu: use
> non-lazy fpu restore for processors supporting xsave") seems
> unintendedly fix this?

So I'm not sure I understand the suggested race. Separate tasks have separate 
fpu->state states, so a parallel execve() and clone() has no effect on each other. 
There's no FPU state sharing.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Possible race in copy of fpu->state in copy_process against the exeve'ing parent?
  2016-04-13  6:09 ` Ingo Molnar
@ 2016-04-13  7:23   ` Jianyu Zhan
  2016-04-13 10:25     ` Oleg Nesterov
  0 siblings, 1 reply; 5+ messages in thread
From: Jianyu Zhan @ 2016-04-13  7:23 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: mingo, H. Peter Anvin, suresh.b.siddha, x86, LKML,
	Andy Lutomirski, Borislav Petkov, Thomas Gleixner, Oleg Nesterov,
	Dave Hansen

On Wed, Apr 13, 2016 at 2:09 PM, Ingo Molnar <mingo@kernel.org> wrote:
> So I'm not sure I understand the suggested race. Separate tasks have separate
> fpu->state states, so a parallel execve() and clone() has no effect on each other.
> There's no FPU state sharing.


Hi, Ingo, thans for reply.

Let me  try describe the situation clearly :


>From the panic stack trace, we can infer the call path before panic:


        sys_clone
          do_fork
             copy_process
                dup_task_struct(current)
                   prepare_to_copy(current)
                      unlazy_fpu(current)
                         __save_init_fpu(current)
                           fpu_save_init(current)
                             fpu_xsave(&current->thread.fpu) <---- PANIC


In this case , &thread.fpu.state is NULL, so it caused a write to
NULL address fault,
which of course is invalid and resulted in a panic.

After reviewing the code, I found only one place in kernel(v3.2.33)  that
could make the fpu.state NULL is
from this call path:

      sys_execve
         do_execve
           do_execve_common
             search_binary_handler
                load_elf_binary
                  start_thread
                    start_thread_common
                       free_thread_xstate(current)
                         fpu_free(&current->thread.fpu)
                            fpu->state = NULL


And I also learned that after the first time fpu is used, init_fpu will be
called to allocate a new fpu->state.


So I suspect if this is a problem : we call sys_clone right after
sys_execve, but right before init_fpu()
is called for the first time to allocate a struct fpu for current,
so a NULL fpu->state is seen.


And  commit 304bceda6a18(" x86, fpu: use non-lazy fpu restore for
processors supporting xsave") seems
unintendedly fix this ?


Regards,
Jianyu Zhan

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Possible race in copy of fpu->state in copy_process against the exeve'ing parent?
  2016-04-13  7:23   ` Jianyu Zhan
@ 2016-04-13 10:25     ` Oleg Nesterov
  0 siblings, 0 replies; 5+ messages in thread
From: Oleg Nesterov @ 2016-04-13 10:25 UTC (permalink / raw)
  To: Jianyu Zhan
  Cc: Ingo Molnar, mingo, H. Peter Anvin, suresh.b.siddha, x86, LKML,
	Andy Lutomirski, Borislav Petkov, Thomas Gleixner, Dave Hansen

On 04/13, Jianyu Zhan wrote:
>
> From the panic stack trace, we can infer the call path before panic:
>
>
>         sys_clone
>           do_fork
>              copy_process
>                 dup_task_struct(current)
>                    prepare_to_copy(current)
>                       unlazy_fpu(current)
>                          __save_init_fpu(current)
>                            fpu_save_init(current)
>                              fpu_xsave(&current->thread.fpu) <---- PANIC
>
>
> In this case , &thread.fpu.state is NULL, so it caused a write to
> NULL address fault,

Yes, but iirc fpu.state == NULL is not a problem. The problem is that
TS_USEDFPU is set while it should not.

>       sys_execve
>          do_execve
>            do_execve_common
>              search_binary_handler
>                 load_elf_binary
>                   start_thread
>                     start_thread_common
>                        free_thread_xstate(current)
>                          fpu_free(&current->thread.fpu)
>                             fpu->state = NULL

Yes, but note that exec path also calls flush_thread() which clears TS_USEDFPU.

Yes, this is confusing, and we had a lot bugs in this area. To be honest I didn't
even try to recall how this (very old) code works, sorry... So I can't say what
exactly could explain the wrong TS_USEDFPU.

Oleg.

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2016-04-13 11:27 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-04-13  3:11 Possible race in copy of fpu->state in copy_process against the exeve'ing parent? Jianyu Zhan
2016-04-13  3:19 ` Jianyu Zhan
2016-04-13  6:09 ` Ingo Molnar
2016-04-13  7:23   ` Jianyu Zhan
2016-04-13 10:25     ` Oleg Nesterov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox