From: KONRAD Frederic
Message-ID: <56990712.2050107@greensocs.com>
Date: Fri, 15 Jan 2016 15:49:54 +0100
In-Reply-To: <87a8o6dhog.fsf@linaro.org>
References: <87oacqd7v9.fsf@linaro.org> <87bn8nc5kb.fsf@linaro.org> <56990295.1050307@greensocs.com> <87a8o6dhog.fsf@linaro.org>
Subject: Re: [Qemu-devel] Status of my hacks on the MTTCG WIP branch
To: Alex Bennée
Cc: MTTCG Devel, Paolo Bonzini, Pranith Kumar, QEMU Developers, alvise rigo

On 15/01/2016 15:46, Alex Bennée wrote:
> KONRAD Frederic writes:
>
>> On 15/01/2016 15:24, Pranith Kumar wrote:
>>> Hi Alex,
>>>
>>> On Fri, Jan 15, 2016 at 8:53 AM, Alex Bennée wrote:
>>>> Can you try this branch:
>>>>
>>>> https://github.com/stsquad/qemu/tree/mttcg/multi_tcg_v8_wip_ajb_fix_locks-r1
>>>>
>>>> I think I've caught all the things likely to screw up addressing.
>>>>
>>> I tried this branch and the boot hangs as follows:
>>>
>>> [    2.001083] random: systemd-udevd urandom read with 1 bits of entropy available
>>> main-loop: WARNING: I/O thread spun for 1000 iterations
>>> [   23.778970] INFO: rcu_sched detected stalls on CPUs/tasks: {} (detected by 0, t=2102 jiffies, g=-165, c=-166, q=83)
>>> [   23.780265] All QSes seen, last rcu_sched kthread activity 2101 (4294939656-4294937555), jiffies_till_next_fqs=1, root ->qsmask 0x0
>>> [   23.781228] swapper/0 R running task 0 0 0 0x00000080
>>> [   23.781977] Call trace:
>>> [   23.782375] [] dump_backtrace+0x0/0x170
>>> [   23.782852] [] show_stack+0x20/0x2c
>>> [   23.783279] [] sched_show_task+0x9c/0xf0
>>> [   23.783746] [] rcu_check_callbacks+0x7b8/0x828
>>> [   23.784230] [] update_process_times+0x40/0x74
>>> [   23.784723] [] tick_sched_handle.isra.15+0x38/0x7c
>>> [   23.785247] [] tick_sched_timer+0x48/0x84
>>> [   23.785705] [] __run_hrtimer+0x90/0x200
>>> [   23.786148] [] hrtimer_interrupt+0xec/0x268
>>> [   23.786612] [] arch_timer_handler_virt+0x38/0x48
>>> [   23.787120] [] handle_percpu_devid_irq+0x90/0x12c
>>> [   23.787621] [] generic_handle_irq+0x38/0x54
>>> [   23.788093] [] __handle_domain_irq+0x68/0xc4
>>> [   23.788578] [] gic_handle_irq+0x38/0x84
>>> [   23.789035] Exception stack(0xffffffc00073bde0 to 0xffffffc00073bf00)
>>> [   23.789650] bde0: 00738000 ffffffc0 0073e71c ffffffc0 0073bf20 ffffffc0 00086948 ffffffc0
>>> [   23.790356] be00: 000d848c ffffffc0 00000000 00000000 3ffcdb0c ffffffc0 00000000 01000000
>>> [   23.791030] be20: 38b97100 ffffffc0 0073bea0 ffffffc0 67f6e000 00000005 567f1c33 00000000
>>> [   23.791744] be40: 00748cf0 ffffffc0 0073be70 ffffffc0 c1e2e4a0 ffffffbd 3a801148 ffffffc0
>>> [   23.792406] be60: 00000000 00000040 0073e000 ffffffc0 3a801168 ffffffc0 97bbb588 0000007f
>>> [   23.793055] be80: 0021d7e8 ffffffc0 97b3d6ec 0000007f c37184d0 0000007f 00738000 ffffffc0
>>> [   23.793720] bea0: 0073e71c ffffffc0 006ff7e8 ffffffc0 007c8000 ffffffc0 0073e680 ffffffc0
>>> [   23.794373] bec0: 0072fac0 ffffffc0 00000001 00000000 0073bf30 ffffffc0 0050e9e8 ffffffc0
>>> [   23.795025] bee0: 00000000 00000000 0073bf20 ffffffc0 00086944 ffffffc0 0073bf20 ffffffc0
>>> [   23.795721] [] el1_irq+0x64/0xc0
>>> [   23.796131] [] cpu_startup_entry+0x130/0x204
>>> [   23.796605] [] rest_init+0x78/0x84
>>> [   23.797028] [] start_kernel+0x3a0/0x3b8
>>> [   23.797528] rcu_sched kthread starved for 2101 jiffies!
>>>
>>> I will try to debug and see where it is hanging.
>>>
>>> Thanks!
>>> --
>>> Pranith
>> Hi Pranith,
>>
>> I don't have time today to look into that.
>>
>> But I missed a tb_find_physical which happens while tb_lock is not held.
>> This hack should fix that (and probably slow things down):
>>
>> diff --git a/cpu-exec.c b/cpu-exec.c
>> index 903126f..25a005a 100644
>> --- a/cpu-exec.c
>> +++ b/cpu-exec.c
>> @@ -252,9 +252,9 @@ static TranslationBlock *tb_find_physical(CPUState *cpu,
>>       }
>>
>>       /* Move the TB to the head of the list */
>> -    *ptb1 = tb->phys_hash_next;
>> -    tb->phys_hash_next = tcg_ctx.tb_ctx.tb_phys_hash[h];
>> -    tcg_ctx.tb_ctx.tb_phys_hash[h] = tb;
>> +//  *ptb1 = tb->phys_hash_next;
>> +//  tb->phys_hash_next = tcg_ctx.tb_ctx.tb_phys_hash[h];
>> +//  tcg_ctx.tb_ctx.tb_phys_hash[h] = tb;
>>       return tb;
>>   }
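[A standalone sketch of the locking constraint under discussion. This is
illustrative, not the actual QEMU code: the types and names below only
mirror tb_phys_hash/phys_hash_next, and tb_find_physical_locked is a
hypothetical simplification. The chain walk itself is read-only, but the
move-to-front rewrites three shared pointers, so a concurrent walker can
observe the chain mid-rewrite; that is why the move must stay under
tb_lock, or be dropped entirely as in the hack above.]

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

/* Simplified stand-ins for QEMU's TranslationBlock hash chain. */
typedef struct TB {
    unsigned long phys_pc;
    struct TB *phys_hash_next;
} TB;

#define HASH_SIZE 1024
static TB *tb_phys_hash[HASH_SIZE];
static pthread_mutex_t tb_lock = PTHREAD_MUTEX_INITIALIZER;

/* Lookup plus move-to-front: the three pointer stores temporarily
 * unlink the found TB, so the whole function must run under tb_lock.
 * Dropping the move (as in the hack above) makes the walk read-only. */
static TB *tb_find_physical_locked(unsigned long phys_pc)
{
    unsigned h = phys_pc % HASH_SIZE;
    TB **ptb1, *tb;

    pthread_mutex_lock(&tb_lock);
    for (ptb1 = &tb_phys_hash[h]; (tb = *ptb1); ptb1 = &tb->phys_hash_next) {
        if (tb->phys_pc == phys_pc) {
            /* Move the TB to the head of the list */
            *ptb1 = tb->phys_hash_next;
            tb->phys_hash_next = tb_phys_hash[h];
            tb_phys_hash[h] = tb;
            break;
        }
    }
    pthread_mutex_unlock(&tb_lock);
    return tb;
}

int main(void)
{
    TB *a = calloc(1, sizeof(*a));
    TB *b = calloc(1, sizeof(*b));

    a->phys_pc = 1;
    b->phys_pc = 1 + HASH_SIZE;          /* hashes into the same bucket */
    a->phys_hash_next = b;
    tb_phys_hash[1] = a;

    printf("found: %lu\n", tb_find_physical_locked(b->phys_pc)->phys_pc);
    printf("new head: %lu\n", tb_phys_hash[1]->phys_pc); /* b, after MRU move */
    return 0;
}

[Built with gcc -std=c11 -pthread, this finds the second TB in the bucket
and shows it promoted to the bucket head afterwards.]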
> Hmm, not in my build's cpu_exec:
>
> ...
>     tb_lock();
>     tb = tb_find_fast(cpu);
> ...
>
> Which I think is right. I can see that if the lock weren't held,
> breakage could occur when you manipulate the lookup, but I think we
> should keep the lock there and, if it proves to be a performance hit,
> come up with a safe optimisation. I think Paolo talked about using
> RCU-type locks.

That's definitely a performance hit. OK, we should talk about that on
Monday.

Fred

>
> --
> Alex Bennée
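[A rough sketch of the RCU-style direction mentioned above; this is one
possible reading of the suggestion, not an actual QEMU patch, and the
names are again illustrative. Readers traverse the chain lock-free with
acquire loads and never modify it, so the MRU move is simply dropped on
that path, matching the hack earlier in the thread; writers serialize on
tb_lock and publish fully initialized nodes with release stores.
Reclamation is omitted: freeing a TB would additionally need a grace
period, which QEMU's existing RCU support (include/qemu/rcu.h) provides.]

#include <pthread.h>
#include <stdatomic.h>

/* Simplified stand-ins for the QEMU structures; names are illustrative. */
typedef struct TB {
    unsigned long phys_pc;
    struct TB *_Atomic phys_hash_next;
} TB;

#define HASH_SIZE 1024
static TB *_Atomic tb_phys_hash[HASH_SIZE];
static pthread_mutex_t tb_lock = PTHREAD_MUTEX_INITIALIZER;

/* Read side: no lock and no stores to shared state, so the MRU
 * move-to-front cannot live here. */
static TB *tb_find_physical_rcu(unsigned long phys_pc)
{
    TB *tb = atomic_load_explicit(&tb_phys_hash[phys_pc % HASH_SIZE],
                                  memory_order_acquire);
    while (tb && tb->phys_pc != phys_pc) {
        tb = atomic_load_explicit(&tb->phys_hash_next, memory_order_acquire);
    }
    return tb;
}

/* Write side: writers serialize on tb_lock; the release store publishes
 * a fully initialized node, so a concurrent reader sees either the old
 * head or the new one, never a half-built chain.  Removing a TB would
 * additionally need a grace period before it could be freed. */
static void tb_phys_hash_insert(TB *tb)
{
    unsigned h = tb->phys_pc % HASH_SIZE;

    pthread_mutex_lock(&tb_lock);
    atomic_store_explicit(&tb->phys_hash_next,
                          atomic_load_explicit(&tb_phys_hash[h],
                                               memory_order_relaxed),
                          memory_order_relaxed);
    atomic_store_explicit(&tb_phys_hash[h], tb, memory_order_release);
    pthread_mutex_unlock(&tb_lock);
}

int main(void)
{
    static TB tb1 = { .phys_pc = 42 };

    tb_phys_hash_insert(&tb1);
    return tb_find_physical_rcu(42) == &tb1 ? 0 : 1;
}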