From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:59984) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1aYyiM-0006pI-Rc for qemu-devel@nongnu.org; Thu, 25 Feb 2016 11:24:31 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1aYyiJ-0004qo-M3 for qemu-devel@nongnu.org; Thu, 25 Feb 2016 11:24:30 -0500 Received: from mail-wm0-x22f.google.com ([2a00:1450:400c:c09::22f]:34429) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1aYyiJ-0004qc-9u for qemu-devel@nongnu.org; Thu, 25 Feb 2016 11:24:27 -0500 Received: by mail-wm0-x22f.google.com with SMTP id b205so38687043wmb.1 for ; Thu, 25 Feb 2016 08:24:27 -0800 (PST) References: <874mcy818t.fsf@linaro.org> <56CEC235.3090603@greensocs.com> From: Alex =?utf-8?Q?Benn=C3=A9e?= In-reply-to: <56CEC235.3090603@greensocs.com> Date: Thu, 25 Feb 2016 16:24:24 +0000 Message-ID: <87y4a87o7b.fsf@linaro.org> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit Subject: Re: [Qemu-devel] Making all TB invalidation asynchronous (MTTCG safe_work)? List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Frederic Konrad Cc: MTTCG Devel , Paolo Bonzini , "Emilio G. Cota" , "qemu-devel@nongnu.org" , Sergey Fedorov Frederic Konrad writes: > Hi Alex, > > We decided in Seattle to make this flag per tb (eg move it to the tb > struct). Indeed and looking at the state of Emilo's branch I see he did exactly that and solved a lot of the lock contention problems. > > On 24/02/2016 18:30, Alex Bennée wrote: >> Hi, >> >> So I've been working on reducing MTTCG tb_lock contention and currently >> have a tb_lock around the following code (in my cpu_exec): >> >> /* Note: we do it here to avoid a gcc bug on Mac OS X when >> doing it in tb_find_slow */ >> tb_lock(); >> if (tcg_ctx.tb_ctx.tb_invalidated_flag) { >> /* as some TB could have been invalidated because >> of memory exceptions while generating the code, we >> must recompute the hash index here */ >> next_tb = 0; >> tcg_ctx.tb_ctx.tb_invalidated_flag = 0; >> } >> /* see if we can patch the calling TB. When the TB >> spans two pages, we cannot safely do a direct >> jump. */ >> if (next_tb != 0 && tb->page_addr[1] == -1 >> && !qemu_loglevel_mask(CPU_LOG_TB_NOCHAIN)) { >> tb_add_jump((TranslationBlock *)(next_tb & ~TB_EXIT_MASK), >> next_tb & TB_EXIT_MASK, tb); >> } >> tb_unlock(); >> >> And this started me down the rabbit hole of the meaning of >> tcg_ctx.tb_ctx.tb_invalidated_flag. So as far as I follow there are two >> places this is set: >> >> * We've run out of translation memory and we are throwing everything >> away (tb_alloc == NULL) >> * We've invalidated the physical pages of some TranslationBlocks >> >> The first case there is a slightly convoluted buffer overflow handing >> code (tb_gen_code): >> >> if (unlikely(!tb)) { >> buffer_overflow: >> /* flush must be done */ >> tb_flush_safe(cpu); >> /* Don't forget to invalidate previous TB info. */ >> tcg_ctx.tb_ctx.tb_invalidated_flag = 1; >> tb_unlock(); >> cpu_loop_exit(cpu); >> } >> >> Which I'm sure could be more simply handled by just queuing the safe >> tb_flush and returning a NULL tb and letting the execution loop unwind >> before resetting the translation buffers. >> >> The second case has been partially asynced by Fred: >> >> /* invalidate one TB */ >> void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr) >> { >> CPUState *cpu; >> PageDesc *p; >> unsigned int h; >> tb_page_addr_t phys_pc; >> struct CPUDiscardTBParams *params; >> >> assert_tb_lock(); /* added by me because of bellow */ >> >> /* remove the TB from the hash list */ >> phys_pc = tb->page_addr[0] + (tb->pc & ~TARGET_PAGE_MASK); >> h = tb_phys_hash_func(phys_pc); >> tb_hash_remove(&tcg_ctx.tb_ctx.tb_phys_hash[h], tb); >> >> /* remove the TB from the page list */ >> if (tb->page_addr[0] != page_addr) { >> p = page_find(tb->page_addr[0] >> TARGET_PAGE_BITS); >> tb_page_remove(&p->first_tb, tb); >> invalidate_page_bitmap(p); >> } >> if (tb->page_addr[1] != -1 && tb->page_addr[1] != page_addr) { >> p = page_find(tb->page_addr[1] >> TARGET_PAGE_BITS); >> tb_page_remove(&p->first_tb, tb); >> invalidate_page_bitmap(p); >> } >> >> tcg_ctx.tb_ctx.tb_invalidated_flag = 1; >> >> CPU_FOREACH(cpu) { >> params = g_malloc(sizeof(struct CPUDiscardTBParams)); >> params->cpu = cpu; >> params->tb = tb; >> async_run_on_cpu(cpu, cpu_discard_tb_from_jmp_cache, params); >> } >> async_run_safe_work_on_cpu(first_cpu, tb_invalidate_jmp_remove, tb); >> >> tcg_ctx.tb_ctx.tb_phys_invalidate_count++; >> } >> >> But I'm wondering why we can't defer all the page invalidation to safe >> work? >> >> I don't think it matters to the invalidating vCPU as it has to get >> to the end of its block anyway. For other vCPUs as there is no strict >> synchronisation can we not pretend what ever the operation was that >> triggered the invalidation happened just as the block ended? >> >> The final case I don't quite follow is the avoiding invalidation of >> tb_next in cpu_exec_nocache() if we have already caused a tb >> invalidation event: >> >> tb->orig_tb = tcg_ctx.tb_ctx.tb_invalidated_flag ? NULL : orig_tb; >> >> Which is later (in cpu_io_recompile): >> >> if (tb->cflags & CF_NOCACHE) { >> if (tb->orig_tb) { >> /* Invalidate original TB if this TB was generated in >> * cpu_exec_nocache() */ >> tb_phys_invalidate(tb->orig_tb, -1); >> } >> tb_free(tb); >> } >> >> My aim in all of this is to see if we can remove another flag from >> tb_ctx (one less thing to mutex access to) and make the code flow easier >> to follow. So remaining question: >> >> * Are there cases where not immediately invalidating the tb_page >> structures would cause problems for the emulation? > > Is that the same issue we might have with the memory barriers? > > Fred >> >> Thanks in advance for any elucidation ;-) >> >> -- >> Alex Bennée >> -- Alex Bennée