From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:48007) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1aYrkv-0007M9-BL for qemu-devel@nongnu.org; Thu, 25 Feb 2016 03:58:42 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1aYrkq-0001CV-8I for qemu-devel@nongnu.org; Thu, 25 Feb 2016 03:58:41 -0500 Received: from greensocs.com ([193.104.36.180]:45244) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1aYrkp-0001BZ-Sq for qemu-devel@nongnu.org; Thu, 25 Feb 2016 03:58:36 -0500 Message-ID: <56CEC235.3090603@greensocs.com> Date: Thu, 25 Feb 2016 09:58:29 +0100 From: Frederic Konrad MIME-Version: 1.0 References: <874mcy818t.fsf@linaro.org> In-Reply-To: <874mcy818t.fsf@linaro.org> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Subject: Re: [Qemu-devel] Making all TB invalidation asynchronous (MTTCG safe_work)? List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: =?UTF-8?B?QWxleCBCZW5uw6ll?= , MTTCG Devel , "qemu-devel@nongnu.org" Cc: Paolo Bonzini , Sergey Fedorov Hi Alex, We decided in Seattle to make this flag per tb (eg move it to the tb struct). On 24/02/2016 18:30, Alex Benn=C3=A9e wrote: > Hi, > > So I've been working on reducing MTTCG tb_lock contention and currently > have a tb_lock around the following code (in my cpu_exec): > > /* Note: we do it here to avoid a gcc bug on Mac OS X when > doing it in tb_find_slow */ > tb_lock(); > if (tcg_ctx.tb_ctx.tb_invalidated_flag) { > /* as some TB could have been invalidated because > of memory exceptions while generating the code, we > must recompute the hash index here */ > next_tb =3D 0; > tcg_ctx.tb_ctx.tb_invalidated_flag =3D 0; > } > /* see if we can patch the calling TB. When the TB > spans two pages, we cannot safely do a direct > jump. */ > if (next_tb !=3D 0 && tb->page_addr[1] =3D=3D -1 > && !qemu_loglevel_mask(CPU_LOG_TB_NOCHAIN)) { > tb_add_jump((TranslationBlock *)(next_tb & ~TB_EXIT_MASK), > next_tb & TB_EXIT_MASK, tb); > } > tb_unlock(); > > And this started me down the rabbit hole of the meaning of > tcg_ctx.tb_ctx.tb_invalidated_flag. So as far as I follow there are two > places this is set: > > * We've run out of translation memory and we are throwing everything > away (tb_alloc =3D=3D NULL) > * We've invalidated the physical pages of some TranslationBlocks > > The first case there is a slightly convoluted buffer overflow handing > code (tb_gen_code): > > if (unlikely(!tb)) { > buffer_overflow: > /* flush must be done */ > tb_flush_safe(cpu); > /* Don't forget to invalidate previous TB info. */ > tcg_ctx.tb_ctx.tb_invalidated_flag =3D 1; > tb_unlock(); > cpu_loop_exit(cpu); > } > > Which I'm sure could be more simply handled by just queuing the safe > tb_flush and returning a NULL tb and letting the execution loop unwind > before resetting the translation buffers. > > The second case has been partially asynced by Fred: > > /* invalidate one TB */ > void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_a= ddr) > { > CPUState *cpu; > PageDesc *p; > unsigned int h; > tb_page_addr_t phys_pc; > struct CPUDiscardTBParams *params; > > assert_tb_lock(); /* added by me because of bellow */ > > /* remove the TB from the hash list */ > phys_pc =3D tb->page_addr[0] + (tb->pc & ~TARGET_PAGE_MASK); > h =3D tb_phys_hash_func(phys_pc); > tb_hash_remove(&tcg_ctx.tb_ctx.tb_phys_hash[h], tb); > > /* remove the TB from the page list */ > if (tb->page_addr[0] !=3D page_addr) { > p =3D page_find(tb->page_addr[0] >> TARGET_PAGE_BITS); > tb_page_remove(&p->first_tb, tb); > invalidate_page_bitmap(p); > } > if (tb->page_addr[1] !=3D -1 && tb->page_addr[1] !=3D page_addr= ) { > p =3D page_find(tb->page_addr[1] >> TARGET_PAGE_BITS); > tb_page_remove(&p->first_tb, tb); > invalidate_page_bitmap(p); > } > > tcg_ctx.tb_ctx.tb_invalidated_flag =3D 1; > > CPU_FOREACH(cpu) { > params =3D g_malloc(sizeof(struct CPUDiscardTBParams)); > params->cpu =3D cpu; > params->tb =3D tb; > async_run_on_cpu(cpu, cpu_discard_tb_from_jmp_cache, params= ); > } > async_run_safe_work_on_cpu(first_cpu, tb_invalidate_jmp_remove,= tb); > > tcg_ctx.tb_ctx.tb_phys_invalidate_count++; > } > > But I'm wondering why we can't defer all the page invalidation to safe > work? > > I don't think it matters to the invalidating vCPU as it has to get > to the end of its block anyway. For other vCPUs as there is no strict > synchronisation can we not pretend what ever the operation was that > triggered the invalidation happened just as the block ended? > > The final case I don't quite follow is the avoiding invalidation of > tb_next in cpu_exec_nocache() if we have already caused a tb > invalidation event: > > tb->orig_tb =3D tcg_ctx.tb_ctx.tb_invalidated_flag ? NULL : orig_tb= ; > > Which is later (in cpu_io_recompile): > > if (tb->cflags & CF_NOCACHE) { > if (tb->orig_tb) { > /* Invalidate original TB if this TB was generated in > * cpu_exec_nocache() */ > tb_phys_invalidate(tb->orig_tb, -1); > } > tb_free(tb); > } > > My aim in all of this is to see if we can remove another flag from > tb_ctx (one less thing to mutex access to) and make the code flow easie= r > to follow. So remaining question: > > * Are there cases where not immediately invalidating the tb_page > structures would cause problems for the emulation? Is that the same issue we might have with the memory barriers? Fred > > Thanks in advance for any elucidation ;-) > > -- > Alex Benn=C3=A9e >