Date: Thu, 15 Sep 2016 21:21:42 +0200
From: David Hildenbrand
In-Reply-To: <33773797-04ec-413f-7ba2-4bb7a4350a44@de.ibm.com>
References: <33773797-04ec-413f-7ba2-4bb7a4350a44@de.ibm.com>
Message-Id: <20160915212142.5fd5048e@thinkpad-w530>
Subject: Re: [Qemu-devel] [s390] possible deadlock in handle_sigp?
To: Christian Borntraeger
Cc: Paolo Bonzini, KVM list, Cornelia Huck, qemu-devel

> On 09/12/2016 08:03 PM, Paolo Bonzini wrote:
> >
> > On 12/09/2016 19:37, Christian Borntraeger wrote:
> >> On 09/12/2016 06:44 PM, Paolo Bonzini wrote:
> >>> I think that two CPUs doing reciprocal SIGPs could in principle end up
> >>> waiting on each other to complete their run_on_cpu. If the SIGP has to
> >>> be synchronous the fix is not trivial (you'd have to put the CPU in a
> >>> state similar to cpu->halted = 1), otherwise it's enough to replace
> >>> run_on_cpu with async_run_on_cpu.
> >>
> >> IIRC the sigps are supposed to be serialized by the big QEMU lock. Will
> >> have a look.
> >
> > Yes, but run_on_cpu drops it when it waits on the qemu_work_cond
> > condition variable.
> > (Related: I stumbled upon it because I wanted to
> > remove the BQL from run_on_cpu work items).
>
> Yes, seems you are right. If both CPUs have just exited from KVM doing a
> crossover sigp, they will do the arch_exit handling before the run_on_cpu
> stuff which might result in this hang. (luckily it seems quite unlikely
> but still we need to fix it).
> We cannot simply use async as the callbacks also provide the condition
> code for the initiator, so this requires some rework.
>

Smells like having to provide a lock per CPU. Trylock that lock; if that's not
possible, cc=busy. SIGP SET ARCHITECTURE has to lock all CPUs.

That was the initial design, until I realized that this was all protected by
the BQL.

David
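
For illustration only, the per-CPU trylock idea could look roughly like the
sketch below. This is not QEMU code: it uses plain POSIX mutexes instead of
QEMU's own primitives, and NUM_CPUS, sigp_lock, sigp_single, sigp_all and
count_order are hypothetical names made up for the example. The condition
code values (0 = order accepted, 2 = busy) are the s390 SIGP ones. Because
nobody ever blocks on a lock, two CPUs sending orders to each other cannot
end up waiting on one another; the loser simply sees cc=busy and retries.

```c
#include <assert.h>
#include <pthread.h>

#define NUM_CPUS 4                      /* hypothetical CPU count */
#define SIGP_CC_ORDER_CODE_ACCEPTED 0   /* s390 SIGP condition codes */
#define SIGP_CC_BUSY 2

/* Hypothetical per-CPU locks, one per possible SIGP destination. */
static pthread_mutex_t sigp_lock[NUM_CPUS];

/* Example order that only counts how often it was executed. */
static int orders_run;
static void count_order(int cpu)
{
    (void)cpu;
    orders_run++;
}

static void sigp_locks_init(void)
{
    for (int i = 0; i < NUM_CPUS; i++) {
        pthread_mutex_init(&sigp_lock[i], NULL);
    }
}

/* Order aimed at a single CPU: never block, report busy instead. */
static int sigp_single(int dst, void (*order)(int))
{
    assert(dst >= 0 && dst < NUM_CPUS);
    if (pthread_mutex_trylock(&sigp_lock[dst]) != 0) {
        return SIGP_CC_BUSY;
    }
    order(dst);
    pthread_mutex_unlock(&sigp_lock[dst]);
    return SIGP_CC_ORDER_CODE_ACCEPTED;
}

/* Order affecting every CPU (the SET ARCHITECTURE case): trylock all
 * locks, and if any one is held, release what was taken and report busy. */
static int sigp_all(void (*order)(int))
{
    int taken;

    for (taken = 0; taken < NUM_CPUS; taken++) {
        if (pthread_mutex_trylock(&sigp_lock[taken]) != 0) {
            break;
        }
    }
    if (taken < NUM_CPUS) {
        /* Roll back: unlock indices taken-1 down to 0. */
        while (taken-- > 0) {
            pthread_mutex_unlock(&sigp_lock[taken]);
        }
        return SIGP_CC_BUSY;
    }
    for (int i = 0; i < NUM_CPUS; i++) {
        order(i);
        pthread_mutex_unlock(&sigp_lock[i]);
    }
    return SIGP_CC_ORDER_CODE_ACCEPTED;
}
```

In this sketch the initiator gets its condition code synchronously, which is
the property the quoted mail says a plain async_run_on_cpu conversion would
lose. Whether an approach like this is needed at all depends on how much of
the SIGP path stays under the BQL.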