From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43) id 1MUKLg-0005Dy-DA for qemu-devel@nongnu.org; Fri, 24 Jul 2009 08:58:08 -0400 Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43) id 1MUKLc-0005C7-Rb for qemu-devel@nongnu.org; Fri, 24 Jul 2009 08:58:08 -0400 Received: from [199.232.76.173] (port=44344 helo=monty-python.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1MUKLc-0005C1-Hm for qemu-devel@nongnu.org; Fri, 24 Jul 2009 08:58:04 -0400 Received: from goliath.siemens.de ([192.35.17.28]:21957) by monty-python.gnu.org with esmtps (TLS-1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.60) (envelope-from ) id 1MUKLb-0005XJ-JP for qemu-devel@nongnu.org; Fri, 24 Jul 2009 08:58:04 -0400 Message-ID: <4A69AFD7.3050703@siemens.com> Date: Fri, 24 Jul 2009 14:57:59 +0200 From: Jan Kiszka MIME-Version: 1.0 References: <1248384704-47824-1-git-send-email-agraf@suse.de> <1248384704-47824-2-git-send-email-agraf@suse.de> <1248384704-47824-3-git-send-email-agraf@suse.de> <4A699405.9010703@siemens.com> <9F9055AA-6842-4938-AE7E-62A515E01025@suse.de> <4A69985B.70308@siemens.com> <75AD4198-F91C-45D0-89E5-E9F27FF2B90A@suse.de> <4A69A044.6080502@siemens.com> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Subject: [Qemu-devel] Re: [PATCH 2/3] Assume PPC64 host on PPC32 KVM List-Id: qemu-devel.nongnu.org List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Alexander Graf Cc: qemu-devel@nongnu.org Alexander Graf wrote: > > On 24.07.2009, at 13:51, Jan Kiszka wrote: > >> Alexander Graf wrote: >>> >>> On 24.07.2009, at 13:17, Jan Kiszka wrote: >>> >>>> Alexander Graf wrote: >>>>> >>>>> On 24.07.2009, at 12:59, Jan Kiszka wrote: >>>>> >>>>>> Alexander Graf wrote: >>>>>>> When talking to the kernel about dirty maps, we need to find out >>>>>>> which >>>>>>> bits were actually set. This is done by set_bit and test_bit like >>>>>>> functiontality which uses the "long" variable type. >>>>>>> >>>>>>> Now, with PPC32 userspace and PPC64 kernel space (which is pretty >>>>>>> common), >>>>>>> we can't interpret the bits properly anymore, because we think >>>>>>> long is >>>>>>> 32 bits wide. >>>>>>> >>>>>>> So for PPC dirty bitmap analysis, let's just assume we're always >>>>>>> running >>>>>>> on a PPC64 host. Currently there is no dirty bitmap >>>>>>> implementation for >>>>>>> PPC32 / PPCEMB anyways. >>>>>>> >>>>>>> Unbreaks dirty logging on PPC. >>>>>>> >>>>>>> Signed-off-by: Alexander Graf >>>>>>> --- >>>>>>> kvm-all.c | 6 ++++++ >>>>>>> 1 files changed, 6 insertions(+), 0 deletions(-) >>>>>>> >>>>>>> diff --git a/kvm-all.c b/kvm-all.c >>>>>>> index 824bb4c..bfaa623 100644 >>>>>>> --- a/kvm-all.c >>>>>>> +++ b/kvm-all.c >>>>>>> @@ -357,7 +357,13 @@ int >>>>>>> kvm_physical_sync_dirty_bitmap(target_phys_addr_t start_addr, >>>>>>> for (phys_addr = mem->start_addr, addr = mem->phys_offset; >>>>>>> phys_addr < mem->start_addr + mem->memory_size; >>>>>>> phys_addr += TARGET_PAGE_SIZE, addr += >>>>>>> TARGET_PAGE_SIZE) { >>>>>>> +#ifdef HOST_PPC >>>>>>> + /* Big endian keeps us from having different long >>>>>>> sizes >>>>>>> in user and >>>>>>> + * kernel space, so assume we're always on ppc64. */ >>>>>>> + uint64_t *bitmap = (uint64_t *)d.dirty_bitmap; >>>>>>> +#else >>>>>>> unsigned long *bitmap = (unsigned long *)d.dirty_bitmap; >>>>>>> +#endif >>>>>>> unsigned nr = (phys_addr - mem->start_addr) >> >>>>>>> TARGET_PAGE_BITS; >>>>>>> unsigned word = nr / (sizeof(*bitmap) * 8); >>>>>>> unsigned bit = nr % (sizeof(*bitmap) * 8); >>>>>> >>>>>> This rather screams for a generic fix. Current code assumes >>>>>> sizeof(unsigned long) == 8. That should already break on 32-bit x86 >>>>>> hosts. So either do (sizeof(*bitmap) * sizeof(unsigned long)) or >>>>>> switch >>>>>> to uint64_t - but for ALL hosts. >>>>> >>>>> I don't see where that would break. The kernel treats the array as >>>>> ulong*, userspace treats it as ulong* and set_bit in kernel does >>>>> bitmap[word] |= (1 << bit). So as long as userspace long and kernel >>>>> long >>>>> are the same, it works. >>>>> >>>>> In fact - it should even work out with little endian and different >>>>> ulong >>>>> sizes. It just breaks on BE. >>>> >>>> Err, yes, forget it. >>>> >>>> But let's help me understanding the actual problem: Do you have >>>> different ulong sizes in your scenario? Why? Is it a compat issue of >>>> 32-bit userland on 64-bit kernel? >>> >>> 32-bit userland on 64-bit kernel. >> >> OK. So this is an issue due to an underspecified KVM ABI, right? > > Well it's a design decision in the (generic KVM) ABI. I wouldn't call it a "decision" :). I think it happened to be ulong as KVM grew up on x86. > >>> kernel: sizeof(ulong) = 8 >>> userspace: sizeof(ulong) = 4 >>> >>> now, with big endian, a "1" is on the rightmost byte - which means >>> looking at the bytes it's >>> >>> kernel: byte[7] >>> userspace: byte[3] >>> >>> So if you set bit nr "1" with the current logic, the kernel would set >>> bit "1" (in the first 8 bytes), userspace would read bit "1" in the >>> second byte, thus 32 + 1. >>> >>> On little endian, the lower word is on the first 4 bytes, so it would >>> still be bit "1" in the first byte. >>> >> >> Big endian machines require us to agree on the word size of the bitmap >> so that 32-on-64-bit works - and 32-on-32 doesn't break. I think the >> latter would be the case with your patch, no? Or don't we have 32-bit >> KVM PowerPC kernels? > > There are no 32-bit PowerPC KVM kernels that can do dirty logging. By design or by lack of implementation? And what if some other arch with support for both pops up? #ifdef HOST_PPC is no long-term solutions. And I see no need to install a temporary workaround given the currently existing kernel support. > >> In any case, I suggest to pin down the word size and use it for all >> hosts. > > That would break backwards compatibility. x86 and ia64 are little endian for which is doesn't matter, PowerPC and s390 don't support dirty logging so far. Or what would break? Jan -- Siemens AG, Corporate Technology, CT SE 2 Corporate Competence Center Embedded Linux