From mboxrd@z Thu Jan  1 00:00:00 1970
From: Alexander Graf <agraf@suse.de>
Subject: Re: [PATCH] kvm: Fix memory slot page alignment logic
Date: Mon, 10 Nov 2014 15:47:49 +0100
Message-ID: <5460D015.10900@suse.de>
References: <1415395125-18926-1-git-send-email-agraf@suse.de>	<20141110133101.194f5ea0@nial.usersys.redhat.com>	<5460BACA.9000702@suse.de> <20141110145511.173f48ec@nial.usersys.redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
Cc: qemu-ppc@nongnu.org, stuart.yoder@freescale.com,
	qemu-devel@nongnu.org, pbonzini@redhat.com, kvm@vger.kernel.org,
	qemu-stable@nongnu.org
To: Igor Mammedov <imammedo@redhat.com>
Return-path: <kvm-owner@vger.kernel.org>
Received: from cantor2.suse.de ([195.135.220.15]:41632 "EHLO mx2.suse.de"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1753063AbaKJOrv (ORCPT <rfc822;kvm@vger.kernel.org>);
	Mon, 10 Nov 2014 09:47:51 -0500
In-Reply-To: <20141110145511.173f48ec@nial.usersys.redhat.com>
Sender: kvm-owner@vger.kernel.org
List-ID: <kvm.vger.kernel.org>


On 10.11.14 14:55, Igor Mammedov wrote:
> On Mon, 10 Nov 2014 14:16:58 +0100
> Alexander Graf <agraf@suse.de> wrote:
> 
>>
>>
>> On 10.11.14 13:31, Igor Mammedov wrote:
>>> On Fri,  7 Nov 2014 22:18:45 +0100
>>> Alexander Graf <agraf@suse.de> wrote:
>>>
>>>> Memory slots have to be page aligned to get entered into KVM. There
>>>> is existing logic that tries to ensure that we pad memory slots that
>>>> are not page aligned to the biggest region that would still fit in the
>>>> alignment requirements.
>>>>
>>>> Unfortunately, that logic is broken. It tries to calculate the start
>>>> offset based on the region size.
>>>>
>>>> Fix up the logic to do the thing it was intended to do and document it
>>>> properly in the comment above it.
>>>>
>>>> With this patch applied, I can successfully run an e500 guest with more
>>>> than 3GB RAM (at which point RAM starts overlapping subpage memory regions).
>>>>
>>>> Cc: qemu-stable@nongnu.org
>>>> Signed-off-by: Alexander Graf <agraf@suse.de>
>>>> ---
>>>>  kvm-all.c | 6 ++++--
>>>>  1 file changed, 4 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/kvm-all.c b/kvm-all.c
>>>> index 44a5e72..596e7ce 100644
>>>> --- a/kvm-all.c
>>>> +++ b/kvm-all.c
>>>> @@ -634,8 +634,10 @@ static void kvm_set_phys_mem(MemoryRegionSection *section, bool add)
>>>>      unsigned delta;
>>>>  
>>>>      /* kvm works in page size chunks, but the function may be called
>>>> -       with sub-page size and unaligned start address. */
>>>> -    delta = TARGET_PAGE_ALIGN(size) - size;
>>>> +       with sub-page size and unaligned start address. Pad the start
>>>> +       address to next and truncate size to previous page boundary. */
>>> I'm a bit confused how it works at all.
>>> Lets assume that there is no mapped pages that include start_addr,
>>> then if start_addr were padded to next page, kvm would map it from there
>>> but the rest of QEMU would still use unaligned start_addr for MemoryRegion
>>> that isn't even mapped.
>>
>> Sorry, I don't understand this paragraph. Memory slots in general are
>> accelerations for memory access - for MMIO (RAM is usually aligned), KVM
>> can always exit to QEMU and just do a manual MMIO exit.
>>
>>> It would seem that instead of padding up to the next page, start_addr
>>> should be moved to the start of the page that includes it to make page
>>> with original start_addr available to guest.
>>
>> No, because in that case you would map something as RAM that really
>> isn't RAM.
>>
>> Imagine you have the following memory layout:
>>
>> 0x1000 page size
>>
>> 1) 0x00000 - 0x10000 RAM
>> 2) 0x10000 - 0x10100 MMIO
>> 3) 0x10100 - 0x20000 RAM
>>
>> Then you want to map 1) as memory slot and 4) from 0x11000 onwards as
>> memory slot.
> so every access to RAM 0x10100-0x11000 which is not represented as memory
> slot would cause VMEXIT?

Yes, there's no other way. Otherwise we wouldn't be able to trap on the
exits from 0x10000 - 0x10100. Hardware only gives us page granularity.

Usually this isn't an issue because overlapping MMIO regions are pretty
large chunks of power-of-2 size - if you see any overlapping at all. On
e500 this bites us though, because we end up with small MSI-X windows
inside our address space (which in turn might also be a bug, but that
doesn't mean that the slot mapping logic should be left as broken as it is).


Alex