From mboxrd@z Thu Jan 1 00:00:00 1970 From: Yoshiaki Tamura Subject: Re: [PATCH 2/6] qemu-kvm: Modify and introduce wrapper functions to access phys_ram_dirty. Date: Tue, 16 Mar 2010 22:49:19 +0900 Message-ID: <4B9F8C5F.5040108@lab.ntt.co.jp> References: <1268736839-27371-1-git-send-email-tamura.yoshiaki@lab.ntt.co.jp> <1268736839-27371-3-git-send-email-tamura.yoshiaki@lab.ntt.co.jp> <4B9F7D78.5090201@redhat.com> <4B9F8502.3070108@lab.ntt.co.jp> <4B9F87A9.3070509@redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: kvm@vger.kernel.org, qemu-devel@nongnu.org, anthony@codemonkey.ws, ohmura.kei@lab.ntt.co.jp To: Avi Kivity Return-path: Received: from tama500.ecl.ntt.co.jp ([129.60.39.148]:61261 "EHLO tama500.ecl.ntt.co.jp" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S933053Ab0CPNth (ORCPT ); Tue, 16 Mar 2010 09:49:37 -0400 In-Reply-To: <4B9F87A9.3070509@redhat.com> Sender: kvm-owner@vger.kernel.org List-ID: Avi Kivity wrote: > On 03/16/2010 03:17 PM, Yoshiaki Tamura wrote: >> Avi Kivity wrote: >>> On 03/16/2010 12:53 PM, Yoshiaki Tamura wrote: >>>> Modifies wrapper functions for byte-based phys_ram_dirty bitmap to >>>> bit-based phys_ram_dirty bitmap, and adds more wrapper functions to >>>> prevent >>>> direct access to the phys_ram_dirty bitmap. >>> >>>> + >>>> +static inline int cpu_physical_memory_get_dirty_flags(ram_addr_t addr) >>>> +{ >>>> + unsigned long mask; >>>> + int index = (addr>> TARGET_PAGE_BITS) / HOST_LONG_BITS; >>>> + int offset = (addr>> TARGET_PAGE_BITS)& (HOST_LONG_BITS - 1); >>>> + int ret = 0; >>>> + >>>> + mask = 1UL<< offset; >>>> + if (phys_ram_vga_dirty[index]& mask) >>>> + ret |= VGA_DIRTY_FLAG; >>>> + if (phys_ram_code_dirty[index]& mask) >>>> + ret |= CODE_DIRTY_FLAG; >>>> + if (phys_ram_migration_dirty[index]& mask) >>>> + ret |= MIGRATION_DIRTY_FLAG; >>>> + >>>> + return ret; >>>> } >>>> >>>> static inline int cpu_physical_memory_get_dirty(ram_addr_t addr, >>>> int dirty_flags) >>>> { >>>> - return phys_ram_dirty[addr>> TARGET_PAGE_BITS]& dirty_flags; >>>> + return cpu_physical_memory_get_dirty_flags(addr)& dirty_flags; >>>> } >>> >>> This turns one cacheline access into three. If the dirty bitmaps were in >>> an array, you could do >>> >>> return dirty_bitmaps[dirty_index][addr >> (TARGET_PAGE_BITS + >>> BITS_IN_LONG)] & mask; >>> >>> with one cacheline access. >> >> If I'm understanding the existing code correctly, >> int dirty_flags can be combined, like VGA + MIGRATION. >> If we only have to worry about a single dirty flag, I agree with your >> idea. > > From a quick grep it seems flags are not combined, except for something > strange with CODE_DIRTY_FLAG: Thanks for checking out. But the CODE_DIRTY_FLAG makes me really nervous... >> static void notdirty_mem_writel(void *opaque, target_phys_addr_t >> ram_addr, >> uint32_t val) >> { >> int dirty_flags; >> dirty_flags = phys_ram_dirty[ram_addr >> TARGET_PAGE_BITS]; >> if (!(dirty_flags & CODE_DIRTY_FLAG)) { >> #if !defined(CONFIG_USER_ONLY) >> tb_invalidate_phys_page_fast(ram_addr, 4); >> dirty_flags = phys_ram_dirty[ram_addr >> TARGET_PAGE_BITS]; >> #endif >> } >> stl_p(qemu_get_ram_ptr(ram_addr), val); >> dirty_flags |= (0xff & ~CODE_DIRTY_FLAG); >> phys_ram_dirty[ram_addr >> TARGET_PAGE_BITS] = dirty_flags; >> /* we remove the notdirty callback only if the code has been >> flushed */ >> if (dirty_flags == 0xff) >> tlb_set_dirty(cpu_single_env, cpu_single_env->mem_io_vaddr); >> } > > I can't say I understand what it does. Me neither. This the reason I had to take naive approach... >> On the other hand, qemu seems to require getting combined dirty flags. >> If we introduce dirty bitmaps for each type, we need to access each >> bitmap to get combined flags. I wasn't sure how to make this more >> efficient... >> >>>> static inline void cpu_physical_memory_set_dirty(ram_addr_t addr) >>>> { >>>> - phys_ram_dirty[addr>> TARGET_PAGE_BITS] = 0xff; >>>> + unsigned long mask; >>>> + int index = (addr>> TARGET_PAGE_BITS) / HOST_LONG_BITS; >>>> + int offset = (addr>> TARGET_PAGE_BITS)& (HOST_LONG_BITS - 1); >>>> + >>>> + mask = 1UL<< offset; >>>> + phys_ram_vga_dirty[index] |= mask; >>>> + phys_ram_code_dirty[index] |= mask; >>>> + phys_ram_migration_dirty[index] |= mask; >>>> +} >>> >>> This is also three cacheline accesses. I think we should have a master >>> bitmap which is updated by set_dirty(), and which is or'ed into the >>> other bitmaps when they are accessed. At least the vga and migration >>> bitmaps are only read periodically, not randomly, so this would be very >>> fast. In a way, this is similar to how the qemu bitmap is updated from >>> the kvm bitmap today. >> >> Sounds good to me. >> So we're going to introduce 4 (VGA, CODE, MIGRATION, master) bit-based >> bitmaps in total. >> > > Yeah, except CODE doesn't behave like the others. Would be best to > understand what it's requirements are before making the change. Maybe > CODE will need separate handling (so master will only feed VGA and > MIGRATION). After implementing this patch set, I thought separating the wrapper functions for each dirty flag type might be an option. Unifying everything makes inefficient here. But anyway, do you know somebody who has a strong insight on this CODE_DIRTY_FLAG?