Date: Wed, 13 Apr 2016 08:09:43 +0200
From: Ingo Molnar
To: Jianyu Zhan
Cc: mingo@redhat.com, "H. Peter Anvin", suresh.b.siddha@intel.com,
	x86@kernel.org, LKML, Andy Lutomirski, Borislav Petkov,
	Thomas Gleixner, Oleg Nesterov, Dave Hansen
Subject: Re: Possible race in copy of fpu->state in copy_process against the exeve'ing parent?
Message-ID: <20160413060943.GA4705@gmail.com>

* Jianyu Zhan wrote:

> Hi,
>
> I encountered a panic on a Linux-3.2 kernel on an x86_64 machine, and
> suspect it is a race condition. I checked the current mainline
> and found it seems to have been fixed unintentionally.
>
> So I hope the x86/fpu maintainers can help verify this. Thanks very much.
>
>
> The panic stack trace:
>
> #0 [ffff88529d33f990] try_crashdump at ffffffff8105b8ca
> #1 [ffff88529d33f9a0] dump_on_panic at ffffffff8105b965
> #2 [ffff88529d33fa60] notifier_call_chain at ffffffff8139f784
> #3 [ffff88529d33fac0] atomic_notifier_call_chain at ffffffff8139f81d
> #4 [ffff88529d33fad0] panic at ffffffff8139971c
> #5 [ffff88529d33fb50] oops_end at ffffffff8139d34a
> #6 [ffff88529d33fb80] no_context at ffffffff81021569
> #7 [ffff88529d33fbd0] __bad_area_nosemaphore at ffffffff81021730
> #8 [ffff88529d33fc20] bad_area at ffffffff810217ac
> #9 [ffff88529d33fc50] do_page_fault at ffffffff8139f509
> #10 [ffff88529d33fd70] page_fault at ffffffff8139caef
>     [exception RIP: prepare_to_copy+35]    <------------------ PANIC !!!
>     RIP: ffffffff810013f4 RSP: ffff88529d33fe20 RFLAGS: 00010286
>     RAX: 00000000ffffffff RBX: 0000000001200011 RCX: ffff884fe73f6320
>     RDX: 00000000ffffffff RSI: 00007fff07d36bd0 RDI: 0000000000000000
>     RBP: ffff88529d33fe20 R8: 00007f5c4a209770 R9: 0000000000000000
>     R10: 00007f5c4a209770 R11: 0000000000000202 R12: 0000000000000000
>     R13: 0000000000000000 R14: ffff884fe73f6320 R15: 0000000000000001
>     ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
> #11 [ffff88529d33fe28] copy_process at ffffffff81038211
> #12 [ffff88529d33fea8] do_fork at ffffffff810393ec
> #13 [ffff88529d33ff38] sys_clone at ffffffff81009118
> #14 [ffff88529d33ff48] stub_clone at ffffffff813a31d3
>
>
> crash7> dis -r prepare_to_copy+35
> 0xffffffff810013d1:  push %rbp
> 0xffffffff810013d2:  mov %rsp,%rbp
> 0xffffffff810013d5:  nopl 0x0(%rax,%rax,1)
> 0xffffffff810013da:  mov %rdi,%rcx
> 0xffffffff810013dd:  cmpl $0x0,0x4d8(%rdi)
> 0xffffffff810013e4:  je 0xffffffff8100142e
> 0xffffffff810013e6:  mov 0x4e0(%rdi),%rdi
> 0xffffffff810013ed:  xchg %ax,%ax
> 0xffffffff810013ef:  or $0xffffffff,%eax
> 0xffffffff810013f2:  mov %eax,%edx
> 0xffffffff810013f4:  xsaveopt64 (%rdi)    <---- PANIC HERE
>
> At the time of the panic, %rdi is 0x0000000000000000, which is fpu->state.
>
>
>
> So I suspect there is a possible race:
>
>
> Parent:
>
> sys_execve
>   do_execve
>     do_execve_common
>       search_binary_handler
>         load_elf_binary
>           start_thread
>             start_thread_common
>               free_thread_xstate(current)
>                 fpu_free
>                   fpu->state = NULL
>
>
> Child:
>
> sys_clone
>   do_fork
>     copy_process
>       dup_task_struct
>         prepare_to_copy
>           unlazy_fpu
>             __save_init_fpu
>               fpu_save_init
>                 fpu_xsave(fpu)    <---- fpu->state is NULL, so this
>                                         causes a NULL dereference.
>
> Scenario: Parent is still execve'ing and has just set fpu->state to NULL,
> and then a concurrent clone() forks a Child, in which fpu_xsave() runs
> while fpu->state is NULL.
>
> The race window seems quite small, and I have checked that the Parent's
> 'sum_exec_runtime' is 536920255 (~0.53s).
>
> I checked the mainline, and found that commit 304bceda6a18 ("x86, fpu: use
> non-lazy fpu restore for processors supporting xsave") seems to have
> unintentionally fixed this?

So I'm not sure I understand the suggested race. Separate tasks have
separate fpu->state states, so a parallel execve() and clone() have no
effect on each other. There's no FPU state sharing.

Thanks,

	Ingo
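
To make the point about separate per-task state concrete, here is a minimal user-space sketch. It is not kernel code: struct task_model, struct fpu_model and the model_*() helpers are hypothetical stand-ins invented for illustration, and the 512-byte buffer size is an arbitrary assumption. It only shows that when every task owns its own state pointer, clearing one task's pointer (the execve() side) leaves another task's save (the clone() side) untouched.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
#define MODEL_XSTATE_SIZE 512	/* arbitrary size, for illustration only */

struct fpu_model {
	unsigned char *state;	/* per-task save area, like fpu->state */
};

struct task_model {
	struct fpu_model fpu;	/* each task embeds its own fpu_model */
};

/* Models allocating a task's save area; returns 0 on success. */
static int model_alloc_xstate(struct task_model *tsk)
{
	tsk->fpu.state = calloc(1, MODEL_XSTATE_SIZE);
	return tsk->fpu.state ? 0 : -1;
}

/* Models the execve() side: frees and clears only this task's pointer. */
static void model_free_xstate(struct task_model *tsk)
{
	free(tsk->fpu.state);
	tsk->fpu.state = NULL;
}

/* Models the clone() side: saves into the given task's own buffer. */
static int model_save_fpu(struct task_model *tsk)
{
	if (!tsk->fpu.state)
		return -1;	/* would be the NULL dereference */
	memset(tsk->fpu.state, 0xff, MODEL_XSTATE_SIZE);
	return 0;
}

/* Task A "execs" while task B "clones"; returns 1 if B is unaffected. */
static int model_exec_vs_clone(void)
{
	struct task_model a = { .fpu = { .state = NULL } };
	struct task_model b = { .fpu = { .state = NULL } };
	int ok;

	if (model_alloc_xstate(&a) || model_alloc_xstate(&b)) {
		free(a.fpu.state);
		free(b.fpu.state);
		return 0;
	}

	model_free_xstate(&a);		/* A's pointer is now NULL */
	ok = (a.fpu.state == NULL) && (model_save_fpu(&b) == 0);

	model_free_xstate(&b);
	return ok;
}
```

In this model the save can only fail if the same task that cleared its pointer also performs the save, which is a different situation from the two-task scenario described in the report.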