From: Alex Bennée
Date: Fri, 15 Jan 2016 14:46:23 +0000
Subject: Re: [Qemu-devel] Status of my hacks on the MTTCG WIP branch
Message-ID: <87a8o6dhog.fsf@linaro.org>
In-Reply-To: <56990295.1050307@greensocs.com>
References: <87oacqd7v9.fsf@linaro.org> <87bn8nc5kb.fsf@linaro.org> <56990295.1050307@greensocs.com>
To: KONRAD Frederic
Cc: MTTCG Devel, Paolo Bonzini, Pranith Kumar, QEMU Developers, alvise rigo

KONRAD Frederic writes:

> On 15/01/2016 15:24, Pranith Kumar wrote:
>> Hi Alex,
>>
>> On Fri, Jan 15, 2016 at 8:53 AM, Alex Bennée wrote:
>> > Can you try this branch:
>> >
>> > https://github.com/stsquad/qemu/tree/mttcg/multi_tcg_v8_wip_ajb_fix_locks-r1
>> >
>> > I think I've caught all the things likely to screw up addressing.
>> >
>>
>> I tried this branch and the boot hangs as follows:
>>
>> [    2.001083] random: systemd-udevd urandom read with 1 bits of entropy available
>> main-loop: WARNING: I/O thread spun for 1000 iterations
>> [   23.778970] INFO: rcu_sched detected stalls on CPUs/tasks: {} (detected by 0, t=2102 jiffies, g=-165, c=-166, q=83)
>> [   23.780265] All QSes seen, last rcu_sched kthread activity 2101 (4294939656-4294937555), jiffies_till_next_fqs=1, root ->qsmask 0x0
>> [   23.781228] swapper/0       R  running task        0     0      0 0x00000080
>> [   23.781977] Call trace:
>> [   23.782375] [] dump_backtrace+0x0/0x170
>> [   23.782852] [] show_stack+0x20/0x2c
>> [   23.783279] [] sched_show_task+0x9c/0xf0
>> [   23.783746] [] rcu_check_callbacks+0x7b8/0x828
>> [   23.784230] [] update_process_times+0x40/0x74
>> [   23.784723] [] tick_sched_handle.isra.15+0x38/0x7c
>> [   23.785247] [] tick_sched_timer+0x48/0x84
>> [   23.785705] [] __run_hrtimer+0x90/0x200
>> [   23.786148] [] hrtimer_interrupt+0xec/0x268
>> [   23.786612] [] arch_timer_handler_virt+0x38/0x48
>> [   23.787120] [] handle_percpu_devid_irq+0x90/0x12c
>> [   23.787621] [] generic_handle_irq+0x38/0x54
>> [   23.788093] [] __handle_domain_irq+0x68/0xc4
>> [   23.788578] [] gic_handle_irq+0x38/0x84
>> [   23.789035] Exception stack(0xffffffc00073bde0 to 0xffffffc00073bf00)
>> [   23.789650] bde0: 00738000 ffffffc0 0073e71c ffffffc0 0073bf20 ffffffc0 00086948 ffffffc0
>> [   23.790356] be00: 000d848c ffffffc0 00000000 00000000 3ffcdb0c ffffffc0 00000000 01000000
>> [   23.791030] be20: 38b97100 ffffffc0 0073bea0 ffffffc0 67f6e000 00000005 567f1c33 00000000
>> [   23.791744] be40: 00748cf0 ffffffc0 0073be70 ffffffc0 c1e2e4a0 ffffffbd 3a801148 ffffffc0
>> [   23.792406] be60: 00000000 00000040 0073e000 ffffffc0 3a801168 ffffffc0 97bbb588 0000007f
>> [   23.793055] be80: 0021d7e8 ffffffc0 97b3d6ec 0000007f c37184d0 0000007f 00738000 ffffffc0
>> [   23.793720] bea0: 0073e71c ffffffc0 006ff7e8 ffffffc0 007c8000 ffffffc0 0073e680 ffffffc0
>> [   23.794373] bec0: 0072fac0 ffffffc0 00000001 00000000 0073bf30 ffffffc0 0050e9e8 ffffffc0
>> [   23.795025] bee0: 00000000 00000000 0073bf20 ffffffc0 00086944 ffffffc0 0073bf20 ffffffc0
>> [   23.795721] [] el1_irq+0x64/0xc0
>> [   23.796131] [] cpu_startup_entry+0x130/0x204
>> [   23.796605] [] rest_init+0x78/0x84
>> [   23.797028] [] start_kernel+0x3a0/0x3b8
>> [   23.797528] rcu_sched kthread starved for 2101 jiffies!
>>
>> I will try to debug and see where it is hanging.
>>
>> Thanks!
>> --
>> Pranith
>
> Hi Pranith,
>
> I don't have time today to look into that.
>
> But I missed a tb_find_physical which happens while tb_lock is not held.
> This hack should fix that (and probably slow things down):
>
> diff --git a/cpu-exec.c b/cpu-exec.c
> index 903126f..25a005a 100644
> --- a/cpu-exec.c
> +++ b/cpu-exec.c
> @@ -252,9 +252,9 @@ static TranslationBlock *tb_find_physical(CPUState *cpu,
>      }
>
>      /* Move the TB to the head of the list */
> -    *ptb1 = tb->phys_hash_next;
> -    tb->phys_hash_next = tcg_ctx.tb_ctx.tb_phys_hash[h];
> -    tcg_ctx.tb_ctx.tb_phys_hash[h] = tb;
> +//  *ptb1 = tb->phys_hash_next;
> +//  tb->phys_hash_next = tcg_ctx.tb_ctx.tb_phys_hash[h];
> +//  tcg_ctx.tb_ctx.tb_phys_hash[h] = tb;
>      return tb;
>  }

Hmm, not in my build. cpu_exec() has:

    ...
    tb_lock();
    tb = tb_find_fast(cpu);
    ...

which I think is right. I can see that if the lock were not held, breakage
could occur when the lookup manipulates the list, but I think we should keep
the lock there and, if it proves to be a performance hit, come up with a safe
optimisation. I think Paolo talked about using RCU-type locks.

--
Alex Bennée
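
As an illustration of the RCU-style direction mentioned above, here is a
minimal sketch. It is not QEMU code: the types and names (tb_hash_bucket,
tb_entry, tb_lookup, tb_insert) are invented for the example, and a real
conversion would also need RCU's deferred reclamation so an entry is never
freed while a reader may still hold a pointer to it. The shape of the scheme
is the point: lookups walk the hash chain with acquire loads and take no
tb_lock, while insertions stay serialised behind a lock and are published with
a release store.

#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

struct tb_entry {
    uint64_t phys_pc;                     /* lookup key */
    _Atomic(struct tb_entry *) next;      /* hash-chain link */
};

struct tb_hash_bucket {
    _Atomic(struct tb_entry *) head;
    pthread_mutex_t write_lock;           /* stands in for tb_lock */
};

/* Reader: takes no lock.  Safe only because writers publish fully
 * initialised entries with release stores and never mutate or free a
 * chained entry while readers may still see it (which is what RCU's
 * grace periods would guarantee). */
static struct tb_entry *tb_lookup(struct tb_hash_bucket *b, uint64_t phys_pc)
{
    struct tb_entry *tb = atomic_load_explicit(&b->head, memory_order_acquire);

    while (tb != NULL) {
        if (tb->phys_pc == phys_pc) {
            return tb;
        }
        tb = atomic_load_explicit(&tb->next, memory_order_acquire);
    }
    return NULL;
}

/* Writer: insertion is still serialised by the lock; the release store on
 * head is what makes the new entry visible to the lock-free readers. */
static void tb_insert(struct tb_hash_bucket *b, struct tb_entry *tb)
{
    pthread_mutex_lock(&b->write_lock);
    struct tb_entry *old = atomic_load_explicit(&b->head, memory_order_relaxed);
    atomic_store_explicit(&tb->next, old, memory_order_relaxed);
    atomic_store_explicit(&b->head, tb, memory_order_release);
    pthread_mutex_unlock(&b->write_lock);
}

Note that the "move the TB to the head of the list" optimisation commented out
in the hack above is exactly the kind of chain rewrite such lock-free readers
would need to be protected from, which is why it either stays under the lock
or is dropped from the lookup path.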