From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:45325) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1XnqG7-00005i-DC for qemu-devel@nongnu.org; Mon, 10 Nov 2014 09:48:04 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1XnqFz-0005O0-FS for qemu-devel@nongnu.org; Mon, 10 Nov 2014 09:47:59 -0500 Message-ID: <5460D015.10900@suse.de> Date: Mon, 10 Nov 2014 15:47:49 +0100 From: Alexander Graf MIME-Version: 1.0 References: <1415395125-18926-1-git-send-email-agraf@suse.de> <20141110133101.194f5ea0@nial.usersys.redhat.com> <5460BACA.9000702@suse.de> <20141110145511.173f48ec@nial.usersys.redhat.com> In-Reply-To: <20141110145511.173f48ec@nial.usersys.redhat.com> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] [PATCH] kvm: Fix memory slot page alignment logic List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Igor Mammedov Cc: kvm@vger.kernel.org, qemu-devel@nongnu.org, qemu-stable@nongnu.org, stuart.yoder@freescale.com, qemu-ppc@nongnu.org, pbonzini@redhat.com On 10.11.14 14:55, Igor Mammedov wrote: > On Mon, 10 Nov 2014 14:16:58 +0100 > Alexander Graf wrote: > >> >> >> On 10.11.14 13:31, Igor Mammedov wrote: >>> On Fri, 7 Nov 2014 22:18:45 +0100 >>> Alexander Graf wrote: >>> >>>> Memory slots have to be page aligned to get entered into KVM. There >>>> is existing logic that tries to ensure that we pad memory slots that >>>> are not page aligned to the biggest region that would still fit in the >>>> alignment requirements. >>>> >>>> Unfortunately, that logic is broken. It tries to calculate the start >>>> offset based on the region size. >>>> >>>> Fix up the logic to do the thing it was intended to do and document it >>>> properly in the comment above it. >>>> >>>> With this patch applied, I can successfully run an e500 guest with more >>>> than 3GB RAM (at which point RAM starts overlapping subpage memory regions). >>>> >>>> Cc: qemu-stable@nongnu.org >>>> Signed-off-by: Alexander Graf >>>> --- >>>> kvm-all.c | 6 ++++-- >>>> 1 file changed, 4 insertions(+), 2 deletions(-) >>>> >>>> diff --git a/kvm-all.c b/kvm-all.c >>>> index 44a5e72..596e7ce 100644 >>>> --- a/kvm-all.c >>>> +++ b/kvm-all.c >>>> @@ -634,8 +634,10 @@ static void kvm_set_phys_mem(MemoryRegionSection *section, bool add) >>>> unsigned delta; >>>> >>>> /* kvm works in page size chunks, but the function may be called >>>> - with sub-page size and unaligned start address. */ >>>> - delta = TARGET_PAGE_ALIGN(size) - size; >>>> + with sub-page size and unaligned start address. Pad the start >>>> + address to next and truncate size to previous page boundary. */ >>> I'm a bit confused how it works at all. >>> Lets assume that there is no mapped pages that include start_addr, >>> then if start_addr were padded to next page, kvm would map it from there >>> but the rest of QEMU would still use unaligned start_addr for MemoryRegion >>> that isn't even mapped. >> >> Sorry, I don't understand this paragraph. Memory slots in general are >> accelerations for memory access - for MMIO (RAM is usually aligned), KVM >> can always exit to QEMU and just do a manual MMIO exit. >> >>> It would seem that instead of padding up to the next page, start_addr >>> should be moved to the start of the page that includes it to make page >>> with original start_addr available to guest. >> >> No, because in that case you would map something as RAM that really >> isn't RAM. >> >> Imagine you have the following memory layout: >> >> 0x1000 page size >> >> 1) 0x00000 - 0x10000 RAM >> 2) 0x10000 - 0x10100 MMIO >> 3) 0x10100 - 0x20000 RAM >> >> Then you want to map 1) as memory slot and 4) from 0x11000 onwards as >> memory slot. > so every access to RAM 0x10100-0x11000 which is not represented as memory > slot would cause VMEXIT? Yes, there's no other way. Otherwise we wouldn't be able to trap on the exits from 0x10000 - 0x10100. Hardware only gives us page granularity. Usually this isn't an issue because overlapping MMIO regions are pretty large chunks of power-of-2 size - if you see any overlapping at all. On e500 this bites us though, because we end up with small MSI-X windows inside our address space (which in turn might also be a bug, but that doesn't mean that the slot mapping logic should be left as broken as it is). Alex