Date: Wed, 30 Nov 2016 10:05:34 +0100
From: Andrew Jones
Subject: Re: [Qemu-devel] Linux kernel polling for QEMU
Message-ID: <20161130090534.jor25j4hnec7dlbp@kamzik.brq.redhat.com>
To: Peter Maydell
Cc: Fam Zheng, Eliezer Tamir, "Michael S. Tsirkin", QEMU Developers,
    Jens Axboe, Christian Borntraeger, Stefan Hajnoczi, Paolo Bonzini,
    Davide Libenzi, Christoph Hellwig

On Wed, Nov 30, 2016 at 07:19:12AM +0000, Peter Maydell wrote:
> On 29 November 2016 at 19:38, Andrew Jones wrote:
> > Thanks for making me look, I was simply assuming we were in the
> > while loops above.
> >
> > I couldn't get the problem to reproduce with access to the monitor,
> > but by adding '-d exec' I was able to see cpu0 was on the wfe in
> > smp_boot_secondary. It should only stay there until cpu1 executes
> > the sev in secondary_cinit, but it looks like TCG doesn't yet
> > implement sev
> >
> > $ grep SEV target-arm/translate.c
> >     /* TODO: Implement SEV, SEVL and WFE. May help SMP performance.
>
> Yes, we currently NOP SEV. We only implement WFE as "yield back
> to TCG top level loop", though, so this is fine. The idea is
> that WFE gets used in busy loops so it's a helpful hint to
> try running some other TCG vCPU instead of just spinning in
> the guest on this one. Implementing SEV as a NOP and WFE as
> a more-or-less NOP is architecturally permitted (guest code
> is required to cope with WFE returning "early"). If something
> is not working correctly then it's either buggy guest code
> or a problem with the generic TCG scheduling of CPUs.

The problem is indeed with the scheduling. The way it currently works
is to depend on the iothread to kick a reschedule once in a while, or
on a cpu to issue an instruction that does so (wfe/wfi). However, if
there's no I/O and a cpu never issues a scheduling instruction, then a
reschedule never happens. We either need a sched tick, or to never
allow the iothread's ppoll an infinite timeout (basically using the
ppoll timeout as a tick).
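To make the starvation concrete, here's a toy model of how I picture
the single-threaded TCG round-robin (illustrative only; the types and
names below are made up, not the real QEMU ones):

#include <stdbool.h>

typedef struct CPU {
    bool exit_request;          /* set by an iothread kick or wfe/wfi */
    void (*step)(struct CPU *); /* execute one translated block */
} CPU;

/* One host thread runs all guest cpus round-robin, moving on to the
 * next cpu only when the current one requests an exit. */
static void run_all_cpus(CPU *cpus, int ncpus)
{
    for (;;) {
        for (int i = 0; i < ncpus; i++) {
            /* With no I/O nothing ever sets exit_request, so a cpu
             * stuck in "while (1);" keeps stepping forever and the
             * cpus after it starve. */
            while (!cpus[i].exit_request) {
                cpus[i].step(&cpus[i]);
            }
            cpus[i].exit_request = false;
        }
    }
}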
As for it being buggy guest code, I don't think so. Here's another
unit test that illustrates the issue with wfe/sev taken out of the
picture.

#include <libcflat.h>
#include <asm/smp.h>

void secondary(void)
{
    printf("secondary running\n");
    asm("yield");
    /* A "real" guest cpu shouldn't do this, but even if it
     * does, that shouldn't stop other cpus from running. */
    while (1);
}

int main(void)
{
    smp_boot_secondary(1, secondary);
    printf("primary running\n");
    asm("yield");
    return 0;
}

With that test we get the two print statements, but it never exits.

Now that I understand the problem much better, I think I may be coming
full circle and advocating that the iothread's ppoll never be allowed
an infinite timeout again, but now only for TCG. Something like

    if (timeout < 0 && tcg_enabled()) timeout = TCG_SCHED_TICK;

Thanks,
drew
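P.S. To be concrete, an untested sketch of where that clamp could go
(TCG_SCHED_TICK is a placeholder name and value, and I haven't
verified this is exactly the right spot in main-loop.c):

/* Wake the main loop periodically even when no I/O is pending; its
 * re-acquisition of the iothread lock after the poll is what kicks
 * the TCG thread and lets the round-robin advance. */
#define TCG_SCHED_TICK (10 * SCALE_MS) /* placeholder: 10 ms tick */

Then, in os_host_main_loop_wait(), just before the poll:

    if (timeout < 0 && tcg_enabled()) {
        timeout = TCG_SCHED_TICK;
    }
    ret = qemu_poll_ns((GPollFD *)gpollfds->data, gpollfds->len, timeout);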