From mboxrd@z Thu Jan 1 00:00:00 1970 From: Alexander Graf Subject: Re: [PATCH] kvm: Fix memory slot page alignment logic Date: Mon, 10 Nov 2014 15:47:49 +0100 Message-ID: <5460D015.10900@suse.de> References: <1415395125-18926-1-git-send-email-agraf@suse.de> <20141110133101.194f5ea0@nial.usersys.redhat.com> <5460BACA.9000702@suse.de> <20141110145511.173f48ec@nial.usersys.redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit Cc: qemu-ppc@nongnu.org, stuart.yoder@freescale.com, qemu-devel@nongnu.org, pbonzini@redhat.com, kvm@vger.kernel.org, qemu-stable@nongnu.org To: Igor Mammedov Return-path: Received: from cantor2.suse.de ([195.135.220.15]:41632 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753063AbaKJOrv (ORCPT ); Mon, 10 Nov 2014 09:47:51 -0500 In-Reply-To: <20141110145511.173f48ec@nial.usersys.redhat.com> Sender: kvm-owner@vger.kernel.org List-ID: On 10.11.14 14:55, Igor Mammedov wrote: > On Mon, 10 Nov 2014 14:16:58 +0100 > Alexander Graf wrote: > >> >> >> On 10.11.14 13:31, Igor Mammedov wrote: >>> On Fri, 7 Nov 2014 22:18:45 +0100 >>> Alexander Graf wrote: >>> >>>> Memory slots have to be page aligned to get entered into KVM. There >>>> is existing logic that tries to ensure that we pad memory slots that >>>> are not page aligned to the biggest region that would still fit in the >>>> alignment requirements. >>>> >>>> Unfortunately, that logic is broken. It tries to calculate the start >>>> offset based on the region size. >>>> >>>> Fix up the logic to do the thing it was intended to do and document it >>>> properly in the comment above it. >>>> >>>> With this patch applied, I can successfully run an e500 guest with more >>>> than 3GB RAM (at which point RAM starts overlapping subpage memory regions). >>>> >>>> Cc: qemu-stable@nongnu.org >>>> Signed-off-by: Alexander Graf >>>> --- >>>> kvm-all.c | 6 ++++-- >>>> 1 file changed, 4 insertions(+), 2 deletions(-) >>>> >>>> diff --git a/kvm-all.c b/kvm-all.c >>>> index 44a5e72..596e7ce 100644 >>>> --- a/kvm-all.c >>>> +++ b/kvm-all.c >>>> @@ -634,8 +634,10 @@ static void kvm_set_phys_mem(MemoryRegionSection *section, bool add) >>>> unsigned delta; >>>> >>>> /* kvm works in page size chunks, but the function may be called >>>> - with sub-page size and unaligned start address. */ >>>> - delta = TARGET_PAGE_ALIGN(size) - size; >>>> + with sub-page size and unaligned start address. Pad the start >>>> + address to next and truncate size to previous page boundary. */ >>> I'm a bit confused how it works at all. >>> Lets assume that there is no mapped pages that include start_addr, >>> then if start_addr were padded to next page, kvm would map it from there >>> but the rest of QEMU would still use unaligned start_addr for MemoryRegion >>> that isn't even mapped. >> >> Sorry, I don't understand this paragraph. Memory slots in general are >> accelerations for memory access - for MMIO (RAM is usually aligned), KVM >> can always exit to QEMU and just do a manual MMIO exit. >> >>> It would seem that instead of padding up to the next page, start_addr >>> should be moved to the start of the page that includes it to make page >>> with original start_addr available to guest. >> >> No, because in that case you would map something as RAM that really >> isn't RAM. >> >> Imagine you have the following memory layout: >> >> 0x1000 page size >> >> 1) 0x00000 - 0x10000 RAM >> 2) 0x10000 - 0x10100 MMIO >> 3) 0x10100 - 0x20000 RAM >> >> Then you want to map 1) as memory slot and 4) from 0x11000 onwards as >> memory slot. > so every access to RAM 0x10100-0x11000 which is not represented as memory > slot would cause VMEXIT? Yes, there's no other way. Otherwise we wouldn't be able to trap on the exits from 0x10000 - 0x10100. Hardware only gives us page granularity. Usually this isn't an issue because overlapping MMIO regions are pretty large chunks of power-of-2 size - if you see any overlapping at all. On e500 this bites us though, because we end up with small MSI-X windows inside our address space (which in turn might also be a bug, but that doesn't mean that the slot mapping logic should be left as broken as it is). Alex