From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: [Qemu-devel] [BUG] I/O thread segfault for QEMU on s390x
From: Farhan Ali <alifm@linux.vnet.ibm.com>
Date: Wed, 7 Mar 2018 07:52:31 -0500
To: Martin Schwidefsky, Christian Borntraeger
Cc: linux-s390, Thomas Huth, Cornelia Huck, famz@redhat.com, Heiko Carstens,
 QEMU Developers, mreitz@redhat.com, qemu-s390x@nongnu.org,
 Hendrik Brueckner, Stefan Hajnoczi, Paolo Bonzini
Message-Id: <55bacbf2-8b30-1082-ff25-0174470c24b8@linux.vnet.ibm.com>
In-Reply-To: <20180306073458.24118b01@mschwideX1>
References: <079a5da7-6586-b974-6b99-e5de055b1bd1@linux.vnet.ibm.com>
 <20180302092318.GA6026@stefanha-x1.localdomain>
 <6a3461c2-368d-1aa1-5b86-a6a602251829@linux.vnet.ibm.com>
 <20180305110356.GF7910@stefanha-x1.localdomain>
 <12e1269c-6eae-a400-cc00-2c5c8e4bb8f9@linux.vnet.ibm.com>
 <20180306073458.24118b01@mschwideX1>

On 03/06/2018 01:34 AM, Martin Schwidefsky wrote:
> On Mon, 5 Mar 2018 20:08:45 +0100
> Christian Borntraeger wrote:
>
>> Do you happen to run with a recent host kernel that has
>>
>> commit 7041d28115e91f2144f811ffe8a195c696b1e1d0
>> s390: scrub registers on kernel entry and KVM exit
>>
>> Can you run with this on top?
>>
>> diff --git a/arch/s390/kernel/entry.S b/arch/s390/kernel/entry.S
>> index 13a133a6015c..d6dc0e5e8f74 100644
>> --- a/arch/s390/kernel/entry.S
>> +++ b/arch/s390/kernel/entry.S
>> @@ -426,13 +426,13 @@ ENTRY(system_call)
>>  	UPDATE_VTIME %r8,%r9,__LC_SYNC_ENTER_TIMER
>>  	BPENTER __TI_flags(%r12),_TIF_ISOLATE_BP
>>  	stmg %r0,%r7,__PT_R0(%r11)
>> -	# clear user controlled register to prevent speculative use
>> -	xgr %r0,%r0
>>  	mvc __PT_R8(64,%r11),__LC_SAVE_AREA_SYNC
>>  	mvc __PT_PSW(16,%r11),__LC_SVC_OLD_PSW
>>  	mvc __PT_INT_CODE(4,%r11),__LC_SVC_ILC
>>  	stg %r14,__PT_FLAGS(%r11)
>>  .Lsysc_do_svc:
>> +	# clear user controlled register to prevent speculative use
>> +	xgr %r0,%r0
>>  	# load address of system call table
>>  	lg %r10,__THREAD_sysc_table(%r13,%r12)
>>  	llgh %r8,__PT_INT_CODE+2(%r11)
>>
>> To me it looks like the critical section cleanup (interrupt during system
>> call entry) might save the registers again into pt_regs, but we have
>> already zeroed out %r0. This patch moves the clearing of %r0 after
>> .Lsysc_do_svc, which should fix the critical section cleanup.
>>
>> Adding Martin and Heiko. Will spin a patch.
>
> Argh, yes. Thanks Christian, this is it. I have been searching for this bug
> for days now. The point is that if the system call handler is interrupted
> after the xgr but before .Lsysc_do_svc, the code at .Lcleanup_system_call
> repeats the stmg for %r0-%r7, but by then %r0 is already zero.
>
> Please commit a patch for this and I will queue it up immediately.

This patch does fix the QEMU crash. I haven't seen the crash after running
the test case for more than a day.

Thanks to everyone for taking a look at this problem :)

Thanks
Farhan
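The failure mode Martin describes can be modeled outside the kernel: an interrupt landing between the xgr and .Lsysc_do_svc makes the critical-section cleanup re-execute the stmg, storing the already-zeroed %r0 into the saved register frame. A minimal Python sketch of the before/after instruction ordering (hypothetical helper names, not kernel code — registers and pt_regs are modeled as plain dicts):

```python
# Toy model of the s390 system-call entry race. "Saving" registers is
# modeled as copying the live register dict into a pt_regs dict; the
# critical-section cleanup is modeled as replaying that save.

def entry_buggy(regs, interrupted):
    ptregs = dict(regs)      # stmg %r0,%r7,__PT_R0(%r11)
    regs["r0"] = 0           # xgr %r0,%r0 before .Lsysc_do_svc
    if interrupted:          # cleanup replays the save after the clear:
        ptregs = dict(regs)  # user's %r0 is lost, zero is stored instead
    return ptregs

def entry_fixed(regs, interrupted):
    ptregs = dict(regs)      # stmg %r0,%r7,__PT_R0(%r11)
    if interrupted:          # any replay now happens before the clear,
        ptregs = dict(regs)  # so the user value is re-saved intact
    regs["r0"] = 0           # xgr moved after .Lsysc_do_svc
    return ptregs

user = {"r0": 0x1234}
assert entry_buggy(dict(user), interrupted=True)["r0"] == 0       # corrupted
assert entry_fixed(dict(user), interrupted=True)["r0"] == 0x1234  # preserved
```

The model only captures the ordering argument: without an interrupt both variants save the correct value, which is why the corruption was so hard to reproduce and only showed up under I/O-thread load.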