Date: Wed, 30 Nov 2016 10:05:34 +0100
From: Andrew Jones
Subject: Re: [Qemu-devel] Linux kernel polling for QEMU
Message-ID: <20161130090534.jor25j4hnec7dlbp@kamzik.brq.redhat.com>
To: Peter Maydell
Cc: Fam Zheng, Eliezer Tamir, "Michael S. Tsirkin", QEMU Developers,
    Jens Axboe, Christian Borntraeger, Stefan Hajnoczi, Paolo Bonzini,
    Davide Libenzi, Christoph Hellwig

On Wed, Nov 30, 2016 at 07:19:12AM +0000, Peter Maydell wrote:
> On 29 November 2016 at 19:38, Andrew Jones wrote:
> > Thanks for making me look, I was simply assuming we were in the
> > while loops above.
> >
> > I couldn't get the problem to reproduce with access to the monitor,
> > but by adding '-d exec' I was able to see cpu0 was on the wfe in
> > smp_boot_secondary. It should only stay there until cpu1 executes
> > the sev in secondary_cinit, but it looks like TCG doesn't yet
> > implement sev
> >
> > $ grep SEV target-arm/translate.c
> >     /* TODO: Implement SEV, SEVL and WFE. May help SMP performance.
>
> Yes, we currently NOP SEV. We only implement WFE as "yield back
> to TCG top level loop", though, so this is fine. The idea is
> that WFE gets used in busy loops so it's a helpful hint to
> try running some other TCG vCPU instead of just spinning in
> the guest on this one. Implementing SEV as a NOP and WFE as
> a more-or-less NOP is architecturally permitted (guest code
> is required to cope with WFE returning "early"). If something
> is not working correctly then it's either buggy guest code
> or a problem with the generic TCG scheduling of CPUs.

The problem is indeed with the scheduling. The way it currently works
is to depend on the iothread to kick a reschedule once in a while, or
on a cpu to issue an instruction that does so (wfe/wfi). However, if
there's no I/O and a cpu never issues a scheduling instruction, then a
reschedule never happens. We either need a sched tick, or to never
allow the iothread's ppoll an infinite timeout (basically using the
ppoll timeout as a tick).
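To make the starvation concrete, here's a toy model of how I picture
the single-threaded TCG round-robin (illustrative only; the types and
names below are made up, not the real QEMU ones):

#include <stdbool.h>

typedef struct CPU {
    bool exit_request;          /* set by an iothread kick or wfe/wfi */
    void (*step)(struct CPU *); /* execute one translated block */
} CPU;

/* One host thread runs all guest cpus round-robin, moving on to the
 * next cpu only when the current one requests an exit. */
static void run_all_cpus(CPU *cpus, int ncpus)
{
    for (;;) {
        for (int i = 0; i < ncpus; i++) {
            /* With no I/O nothing ever sets exit_request, so a cpu
             * stuck in "while (1);" keeps stepping forever and the
             * cpus after it starve. */
            while (!cpus[i].exit_request) {
                cpus[i].step(&cpus[i]);
            }
            cpus[i].exit_request = false;
        }
    }
}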
As for it being buggy guest code, I don't think so. Here's another
unit test that illustrates the issue with wfe/sev taken out of the
picture.

#include <libcflat.h>
#include <asm/smp.h>

void secondary(void)
{
    printf("secondary running\n");
    asm("yield");
    /* A "real" guest cpu shouldn't do this, but even if it
     * does, that shouldn't stop other cpus from running. */
    while (1);
}

int main(void)
{
    smp_boot_secondary(1, secondary);
    printf("primary running\n");
    asm("yield");
    return 0;
}

With that test we get the two print statements, but it never exits.

Now that I understand the problem much better, I think I may be coming
full circle and advocating that the iothread's ppoll never be allowed
an infinite timeout again, but now only for TCG. Something like

    if (timeout < 0 && tcg_enabled()) timeout = TCG_SCHED_TICK;

Thanks,
drew
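P.S. To be concrete, an untested sketch of where that clamp could go
(TCG_SCHED_TICK is a placeholder name and value, and I haven't
verified this is exactly the right spot in main-loop.c):

/* Wake the main loop periodically even when no I/O is pending; its
 * re-acquisition of the iothread lock after the poll is what kicks
 * the TCG thread and lets the round-robin advance. */
#define TCG_SCHED_TICK (10 * SCALE_MS) /* placeholder: 10 ms tick */

Then, in os_host_main_loop_wait(), just before the poll:

    if (timeout < 0 && tcg_enabled()) {
        timeout = TCG_SCHED_TICK;
    }
    ret = qemu_poll_ns((GPollFD *)gpollfds->data, gpollfds->len, timeout);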