From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:55878)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <pbonzini@redhat.com>) id 1YYDkZ-0003BL-Rd
	for qemu-devel@nongnu.org; Wed, 18 Mar 2015 09:11:12 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <pbonzini@redhat.com>) id 1YYDkW-0008ED-3k
	for qemu-devel@nongnu.org; Wed, 18 Mar 2015 09:11:07 -0400
Received: from mx1.redhat.com ([209.132.183.28]:37950)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <pbonzini@redhat.com>) id 1YYDkV-0008E7-Sq
	for qemu-devel@nongnu.org; Wed, 18 Mar 2015 09:11:04 -0400
Message-ID: <55097962.8030202@redhat.com>
Date: Wed, 18 Mar 2015 14:10:58 +0100
From: Paolo Bonzini <pbonzini@redhat.com>
MIME-Version: 1.0
References: <1426683232-8847-1-git-send-email-peter.crosthwaite@xilinx.com>
In-Reply-To: <1426683232-8847-1-git-send-email-peter.crosthwaite@xilinx.com>
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [RFC PATCH] cpus: Don't kick un-inited cpus.
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Peter Crosthwaite <peter.crosthwaite@xilinx.com>, qemu-devel@nongnu.org
Cc: alistair.francis@xilinx.com


On 18/03/2015 13:53, Peter Crosthwaite wrote:
> following a464982499b2f637f6699e3d03e0a9d2e0b5288b, it's now possible for
> there to be attempts to take the BQL before CPUs have been realized in
> cases where a machine model inits peripherals before the first CPU.
> 
> BQL lock aquisition kicks the first_cpu, leading to a segfault if this
> happens pre-realize. Guard the CPU kick routine to perform no action for
> a CPU that doesn't exist or doesn't have a thread yet.
> 
> Signed-off-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
> ---
>  cpus.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/cpus.c b/cpus.c
> index 1ce90a1..c90dfa8 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -1046,6 +1046,10 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
>  
>  static void qemu_cpu_kick_thread(CPUState *cpu)
>  {
> +    if (!cpu || !cpu->thread) {
> +        return;
> +    }
> +
>  #ifndef _WIN32
>      int err;
>  
> 

That's been fixed already for a couple of weeks. :)

commit 21618b3e55ad2c6fede0bffcaea466091811ce59
Author: Paolo Bonzini <pbonzini@redhat.com>
Date:   Fri Feb 27 20:01:03 2015 +0100

    cpus: be more paranoid in avoiding deadlocks

    For good measure, ensure that the following sequence:

       thread 1 calls qemu_mutex_lock_iothread
       thread 2 calls qemu_mutex_lock_iothread
       VCPU thread are created
       VCPU thread enters execution loop

    results in the VCPU threads letting the other two threads run
    and obeying iothread_requesting_mutex even if the VCPUs are
    not halted.  To do this, check iothread_requesting_mutex
    before execution starts.

    Tested-by: Leon Alrae <leon.alrae@imgtec.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

commit 6b49809c597331803ea941eadda813e5bb4e8fe2
Author: Paolo Bonzini <pbonzini@redhat.com>
Date:   Fri Feb 27 19:58:23 2015 +0100

    cpus: fix deadlock and segfault in qemu_mutex_lock_iothread

    When two threads (other than the low-priority TCG VCPU thread)
    are competing for the iothread lock, a deadlock can happen.  This
    is because iothread_requesting_mutex is set to false by the first
    thread that gets the mutex, and then the VCPU thread might never
    yield from the execution loop.  If iothread_requesting_mutex is
    changed from a bool to a counter, the deadlock is fixed.

    However, there is another bug in qemu_mutex_lock_iothread that
    can be triggered by the new call_rcu thread.  The bug happens
    if qemu_mutex_lock_iothread is called before the CPUs are
    created.  In that case, first_cpu is NULL and the caller
    segfaults in qemu_mutex_lock_iothread.  To fix this, just
    do not do the kick if first_cpu is NULL.

    Reported-by: Leon Alrae <leon.alrae@imgtec.com>
    Reported-by: Andreas Gustafsson <gson@gson.org>
    Tested-by: Leon Alrae <leon.alrae@imgtec.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>