From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:34293) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1aK5Ow-0006MI-Vb for qemu-devel@nongnu.org; Fri, 15 Jan 2016 09:30:59 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1aK5Os-0002Pj-4R for qemu-devel@nongnu.org; Fri, 15 Jan 2016 09:30:54 -0500
Received: from greensocs.com ([193.104.36.180]:49191) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1aK5Or-0002Me-N5 for qemu-devel@nongnu.org; Fri, 15 Jan 2016 09:30:50 -0500
References: <87oacqd7v9.fsf@linaro.org> <87bn8nc5kb.fsf@linaro.org>
From: KONRAD Frederic
Message-ID: <56990295.1050307@greensocs.com>
Date: Fri, 15 Jan 2016 15:30:45 +0100
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="------------010304060406000305030403"
Subject: Re: [Qemu-devel] Status of my hacks on the MTTCG WIP branch
To: Pranith Kumar, Alex Bennée
Cc: MTTCG Devel, Paolo Bonzini, QEMU Developers, alvise rigo

This is a multi-part message in MIME format.
--------------010304060406000305030403
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit

On 15/01/2016 15:24, Pranith Kumar wrote:
> Hi Alex,
>
> On Fri, Jan 15, 2016 at 8:53 AM, Alex Bennée <alex.bennee@linaro.org> wrote:
> > Can you try this branch:
> >
> > https://github.com/stsquad/qemu/tree/mttcg/multi_tcg_v8_wip_ajb_fix_locks-r1
> >
> > I think I've caught all the things likely to screw up addressing.
> >
>
> I tried this branch and the boot hangs as follows:
>
> [    2.001083] random: systemd-udevd urandom read with 1 bits of entropy available
> main-loop: WARNING: I/O thread spun for 1000 iterations
> [   23.778970] INFO: rcu_sched detected stalls on CPUs/tasks: {} (detected by 0, t=2102 jiffies, g=-165, c=-166, q=83)
> [   23.780265] All QSes seen, last rcu_sched kthread activity 2101 (4294939656-4294937555), jiffies_till_next_fqs=1, root ->qsmask 0x0
> [   23.781228] swapper/0       R  running task        0     0      0 0x00000080
> [   23.781977] Call trace:
> [   23.782375] [<ffffffc00008a4cc>] dump_backtrace+0x0/0x170
> [   23.782852] [<ffffffc00008a65c>] show_stack+0x20/0x2c
> [   23.783279] [<ffffffc0000c6ba0>] sched_show_task+0x9c/0xf0
> [   23.783746] [<ffffffc0000f244c>] rcu_check_callbacks+0x7b8/0x828
> [   23.784230] [<ffffffc0000f75c4>] update_process_times+0x40/0x74
> [   23.784723] [<ffffffc000107a60>] tick_sched_handle.isra.15+0x38/0x7c
> [   23.785247] [<ffffffc000107aec>] tick_sched_timer+0x48/0x84
> [   23.785705] [<ffffffc0000f7bb0>] __run_hrtimer+0x90/0x200
> [   23.786148] [<ffffffc0000f874c>] hrtimer_interrupt+0xec/0x268
> [   23.786612] [<ffffffc0003d9304>] arch_timer_handler_virt+0x38/0x48
> [   23.787120] [<ffffffc0000e9ac4>] handle_percpu_devid_irq+0x90/0x12c
> [   23.787621] [<ffffffc0000e53fc>] generic_handle_irq+0x38/0x54
> [   23.788093] [<ffffffc0000e5744>] __handle_domain_irq+0x68/0xc4
> [   23.788578] [<ffffffc000082478>] gic_handle_irq+0x38/0x84
> [   23.789035] Exception stack(0xffffffc00073bde0 to 0xffffffc00073bf00)
> [   23.789650] bde0: 00738000 ffffffc0 0073e71c ffffffc0 0073bf20 ffffffc0 00086948 ffffffc0
> [   23.790356] be00: 000d848c ffffffc0 00000000 00000000 3ffcdb0c ffffffc0 00000000 01000000
> [   23.791030] be20: 38b97100 ffffffc0 0073bea0 ffffffc0 67f6e000 00000005 567f1c33 00000000
> [   23.791744] be40: 00748cf0 ffffffc0 0073be70 ffffffc0 c1e2e4a0 ffffffbd 3a801148 ffffffc0
> [   23.792406] be60: 00000000 00000040 0073e000 ffffffc0 3a801168 ffffffc0 97bbb588 0000007f
> [   23.793055] be80: 0021d7e8 ffffffc0 97b3d6ec 0000007f c37184d0 0000007f 00738000 ffffffc0
> [   23.793720] bea0: 0073e71c ffffffc0 006ff7e8 ffffffc0 007c8000 ffffffc0 0073e680 ffffffc0
> [   23.794373] bec0: 0072fac0 ffffffc0 00000001 00000000 0073bf30 ffffffc0 0050e9e8 ffffffc0
> [   23.795025] bee0: 00000000 00000000 0073bf20 ffffffc0 00086944 ffffffc0 0073bf20 ffffffc0
> [   23.795721] [<ffffffc0000855a4>] el1_irq+0x64/0xc0
> [   23.796131] [<ffffffc0000d8488>] cpu_startup_entry+0x130/0x204
> [   23.796605] [<ffffffc0004fba38>] rest_init+0x78/0x84
> [   23.797028] [<ffffffc0006ca99c>] start_kernel+0x3a0/0x3b8
> [   23.797528] rcu_sched kthread starved for 2101 jiffies!
>
>
> I will try to debug and see where it is hanging.
>
> Thanks!
> --
> Pranith

Hi Pranith,

I don't have time today to look into that.

But I missed a tb_find_physical call which happens while tb_lock is not held.
This hack should fix that (and probably slow things down):

diff --git a/cpu-exec.c b/cpu-exec.c
index 903126f..25a005a 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -252,9 +252,9 @@ static TranslationBlock *tb_find_physical(CPUState *cpu,
     }
 
     /* Move the TB to the head of the list */
-    *ptb1 = tb->phys_hash_next;
-    tb->phys_hash_next = tcg_ctx.tb_ctx.tb_phys_hash[h];
-    tcg_ctx.tb_ctx.tb_phys_hash[h] = tb;
+//    *ptb1 = tb->phys_hash_next;
+//    tb->phys_hash_next = tcg_ctx.tb_ctx.tb_phys_hash[h];
+//    tcg_ctx.tb_ctx.tb_phys_hash[h] = tb;
     return tb;
 }
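
For illustration only (this is not QEMU code): a minimal C sketch of the idea behind the hack, assuming a hypothetical chained hash table. The names Node, buckets, table_lock, find_readonly, find_move_to_front, insert and lookup are all invented here; they only stand in for TranslationBlock, tb_phys_hash and tb_lock. Moving a hit to the head of its chain rewrites three pointers, so the sketch confines that step to the path that holds the lock, while the path that may run without the lock only reads the chain. The sketch does not by itself make the unlocked walk safe against concurrent inserts; that would still need atomic or RCU-style pointer publication, which is left out.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NBUCKETS 8                 /* hypothetical table size */

typedef struct Node {
    unsigned long key;             /* stands in for the TB's physical PC */
    struct Node *next;             /* stands in for phys_hash_next */
} Node;

static Node *buckets[NBUCKETS];    /* stands in for tb_phys_hash */
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER; /* stands in for tb_lock */

static unsigned hash(unsigned long key)
{
    return (unsigned)(key % NBUCKETS);
}

/* Read-only walk: never writes to the chain, so it cannot corrupt the
 * list when called without table_lock. (Assumption: concurrent writers
 * publish pointers safely; the sketch does not show that part.) */
static Node *find_readonly(unsigned long key)
{
    for (Node *n = buckets[hash(key)]; n != NULL; n = n->next) {
        if (n->key == key) {
            return n;
        }
    }
    return NULL;
}

/* Lookup with move-to-front. Caller must hold table_lock, because a
 * hit unlinks the node and relinks it at the head of its chain. */
static Node *find_move_to_front(unsigned long key)
{
    unsigned h = hash(key);
    Node **pn = &buckets[h];

    for (Node *n = *pn; n != NULL; pn = &n->next, n = *pn) {
        if (n->key == key) {
            *pn = n->next;         /* unlink from current position */
            n->next = buckets[h];  /* old head becomes the second entry */
            buckets[h] = n;        /* node becomes the new head */
            return n;
        }
    }
    return NULL;
}

/* Insert at the head of the chain, under the lock. */
static void insert(Node *n)
{
    unsigned h = hash(n->key);

    assert(n != NULL);
    pthread_mutex_lock(&table_lock);
    n->next = buckets[h];
    buckets[h] = n;
    pthread_mutex_unlock(&table_lock);
}

/* Locked lookup: the only place where move-to-front is allowed. */
static Node *lookup(unsigned long key)
{
    pthread_mutex_lock(&table_lock);
    Node *n = find_move_to_front(key);
    pthread_mutex_unlock(&table_lock);
    return n;
}
```

A caller on a fast path would use find_readonly() and fall back to lookup() on a miss. The cost matches the "probably slow things down" remark above: without move-to-front on the unlocked path, frequently used entries no longer migrate to the head of their chain.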

Fred
--------------010304060406000305030403--