From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43)
	id 1MUJJN-0005J2-10
	for qemu-devel@nongnu.org; Fri, 24 Jul 2009 07:51:41 -0400
Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43)
	id 1MUJJI-0005Do-9O
	for qemu-devel@nongnu.org; Fri, 24 Jul 2009 07:51:40 -0400
Received: from [199.232.76.173] (port=58407 helo=monty-python.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1MUJJH-0005Dc-V0
	for qemu-devel@nongnu.org; Fri, 24 Jul 2009 07:51:36 -0400
Received: from gecko.sbs.de ([194.138.37.40]:15080)
	by monty-python.gnu.org with esmtps
	(TLS-1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.60)
	(envelope-from <jan.kiszka@siemens.com>) id 1MUJJH-0007tp-0P
	for qemu-devel@nongnu.org; Fri, 24 Jul 2009 07:51:35 -0400
Message-ID: <4A69A044.6080502@siemens.com>
Date: Fri, 24 Jul 2009 13:51:32 +0200
From: Jan Kiszka <jan.kiszka@siemens.com>
MIME-Version: 1.0
References: <1248384704-47824-1-git-send-email-agraf@suse.de>	<1248384704-47824-2-git-send-email-agraf@suse.de>
	<1248384704-47824-3-git-send-email-agraf@suse.de>
	<4A699405.9010703@siemens.com>
	<9F9055AA-6842-4938-AE7E-62A515E01025@suse.de>
	<4A69985B.70308@siemens.com>
	<75AD4198-F91C-45D0-89E5-E9F27FF2B90A@suse.de>
In-Reply-To: <75AD4198-F91C-45D0-89E5-E9F27FF2B90A@suse.de>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Subject: [Qemu-devel] Re: [PATCH 2/3] Assume PPC64 host on PPC32 KVM
List-Id: qemu-devel.nongnu.org
List-Unsubscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/pipermail/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Alexander Graf <agraf@suse.de>
Cc: qemu-devel@nongnu.org

Alexander Graf wrote:
> 
> On 24.07.2009, at 13:17, Jan Kiszka wrote:
> 
>> Alexander Graf wrote:
>>>
>>> On 24.07.2009, at 12:59, Jan Kiszka wrote:
>>>
>>>> Alexander Graf wrote:
>>>>> When talking to the kernel about dirty maps, we need to find out which
>>>>> bits were actually set. This is done by set_bit and test_bit like
>>>>> functiontality which uses the "long" variable type.
>>>>>
>>>>> Now, with PPC32 userspace and PPC64 kernel space (which is pretty
>>>>> common),
>>>>> we can't interpret the bits properly anymore, because we think long is
>>>>> 32 bits wide.
>>>>>
>>>>> So for PPC dirty bitmap analysis, let's just assume we're always
>>>>> running
>>>>> on a PPC64 host. Currently there is no dirty bitmap implementation for
>>>>> PPC32 / PPCEMB anyways.
>>>>>
>>>>> Unbreaks dirty logging on PPC.
>>>>>
>>>>> Signed-off-by: Alexander Graf <agraf@suse.de>
>>>>> ---
>>>>> kvm-all.c |    6 ++++++
>>>>> 1 files changed, 6 insertions(+), 0 deletions(-)
>>>>>
>>>>> diff --git a/kvm-all.c b/kvm-all.c
>>>>> index 824bb4c..bfaa623 100644
>>>>> --- a/kvm-all.c
>>>>> +++ b/kvm-all.c
>>>>> @@ -357,7 +357,13 @@ int
>>>>> kvm_physical_sync_dirty_bitmap(target_phys_addr_t start_addr,
>>>>>        for (phys_addr = mem->start_addr, addr = mem->phys_offset;
>>>>>             phys_addr < mem->start_addr + mem->memory_size;
>>>>>             phys_addr += TARGET_PAGE_SIZE, addr += TARGET_PAGE_SIZE) {
>>>>> +#ifdef HOST_PPC
>>>>> +             /* Big endian keeps us from having different long sizes
>>>>> in user and
>>>>> +              * kernel space, so assume we're always on ppc64. */
>>>>> +            uint64_t *bitmap = (uint64_t *)d.dirty_bitmap;
>>>>> +#else
>>>>>            unsigned long *bitmap = (unsigned long *)d.dirty_bitmap;
>>>>> +#endif
>>>>>            unsigned nr = (phys_addr - mem->start_addr) >>
>>>>> TARGET_PAGE_BITS;
>>>>>            unsigned word = nr / (sizeof(*bitmap) * 8);
>>>>>            unsigned bit = nr % (sizeof(*bitmap) * 8);
>>>>
>>>> This rather screams for a generic fix. Current code assumes
>>>> sizeof(unsigned long) == 8. That should already break on 32-bit x86
>>>> hosts. So either do (sizeof(*bitmap) * sizeof(unsigned long)) or switch
>>>> to uint64_t - but for ALL hosts.
>>>
>>> I don't see where that would break. The kernel treats the array as
>>> ulong*, userspace treats it as ulong* and set_bit in kernel does
>>> bitmap[word] |= (1 << bit). So as long as userspace long and kernel long
>>> are the same, it works.
>>>
>>> In fact - it should even work out with little endian and different ulong
>>> sizes. It just breaks on BE.
>>
>> Err, yes, forget it.
>>
>> But let's help me understanding the actual problem: Do you have
>> different ulong sizes in your scenario? Why? Is it a compat issue of
>> 32-bit userland on 64-bit kernel?
> 
> 32-bit userland on 64-bit kernel.

OK. So this is an issue due to an underspecified KVM ABI, right?

> 
> kernel: sizeof(ulong) = 8
> userspace: sizeof(ulong) = 4
> 
> now, with big endian, a "1" is on the rightmost byte - which means
> looking at the bytes it's
> 
> kernel: byte[7]
> userspace: byte[3]
> 
> So if you set bit nr "1" with the current logic, the kernel would set
> bit "1" (in the first 8 bytes), userspace would read bit "1" in the
> second byte, thus 32 + 1.
> 
> On little endian, the lower word is on the first 4 bytes, so it would
> still be bit "1" in the first byte.
> 

Big endian machines require us to agree on the word size of the bitmap
so that 32-on-64-bit works - and 32-on-32 doesn't break. I think the
latter would be the case with your patch, no? Or don't we have 32-bit
KVM PowerPC kernels?

In any case, I suggest to pin down the word size and use it for all hosts.

Jan

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux