From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:45325)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <agraf@suse.de>) id 1XnqG7-00005i-DC
	for qemu-devel@nongnu.org; Mon, 10 Nov 2014 09:48:04 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <agraf@suse.de>) id 1XnqFz-0005O0-FS
	for qemu-devel@nongnu.org; Mon, 10 Nov 2014 09:47:59 -0500
Message-ID: <5460D015.10900@suse.de>
Date: Mon, 10 Nov 2014 15:47:49 +0100
From: Alexander Graf <agraf@suse.de>
MIME-Version: 1.0
References: <1415395125-18926-1-git-send-email-agraf@suse.de>	<20141110133101.194f5ea0@nial.usersys.redhat.com>	<5460BACA.9000702@suse.de>
	<20141110145511.173f48ec@nial.usersys.redhat.com>
In-Reply-To: <20141110145511.173f48ec@nial.usersys.redhat.com>
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH] kvm: Fix memory slot page alignment logic
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Igor Mammedov <imammedo@redhat.com>
Cc: kvm@vger.kernel.org, qemu-devel@nongnu.org, qemu-stable@nongnu.org, stuart.yoder@freescale.com, qemu-ppc@nongnu.org, pbonzini@redhat.com


On 10.11.14 14:55, Igor Mammedov wrote:
> On Mon, 10 Nov 2014 14:16:58 +0100
> Alexander Graf <agraf@suse.de> wrote:
> 
>>
>>
>> On 10.11.14 13:31, Igor Mammedov wrote:
>>> On Fri,  7 Nov 2014 22:18:45 +0100
>>> Alexander Graf <agraf@suse.de> wrote:
>>>
>>>> Memory slots have to be page aligned to get entered into KVM. There
>>>> is existing logic that tries to ensure that we pad memory slots that
>>>> are not page aligned to the biggest region that would still fit in the
>>>> alignment requirements.
>>>>
>>>> Unfortunately, that logic is broken. It tries to calculate the start
>>>> offset based on the region size.
>>>>
>>>> Fix up the logic to do the thing it was intended to do and document it
>>>> properly in the comment above it.
>>>>
>>>> With this patch applied, I can successfully run an e500 guest with more
>>>> than 3GB RAM (at which point RAM starts overlapping subpage memory regions).
>>>>
>>>> Cc: qemu-stable@nongnu.org
>>>> Signed-off-by: Alexander Graf <agraf@suse.de>
>>>> ---
>>>>  kvm-all.c | 6 ++++--
>>>>  1 file changed, 4 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/kvm-all.c b/kvm-all.c
>>>> index 44a5e72..596e7ce 100644
>>>> --- a/kvm-all.c
>>>> +++ b/kvm-all.c
>>>> @@ -634,8 +634,10 @@ static void kvm_set_phys_mem(MemoryRegionSection *section, bool add)
>>>>      unsigned delta;
>>>>  
>>>>      /* kvm works in page size chunks, but the function may be called
>>>> -       with sub-page size and unaligned start address. */
>>>> -    delta = TARGET_PAGE_ALIGN(size) - size;
>>>> +       with sub-page size and unaligned start address. Pad the start
>>>> +       address to next and truncate size to previous page boundary. */
>>> I'm a bit confused how it works at all.
>>> Lets assume that there is no mapped pages that include start_addr,
>>> then if start_addr were padded to next page, kvm would map it from there
>>> but the rest of QEMU would still use unaligned start_addr for MemoryRegion
>>> that isn't even mapped.
>>
>> Sorry, I don't understand this paragraph. Memory slots in general are
>> accelerations for memory access - for MMIO (RAM is usually aligned), KVM
>> can always exit to QEMU and just do a manual MMIO exit.
>>
>>> It would seem that instead of padding up to the next page, start_addr
>>> should be moved to the start of the page that includes it to make page
>>> with original start_addr available to guest.
>>
>> No, because in that case you would map something as RAM that really
>> isn't RAM.
>>
>> Imagine you have the following memory layout:
>>
>> 0x1000 page size
>>
>> 1) 0x00000 - 0x10000 RAM
>> 2) 0x10000 - 0x10100 MMIO
>> 3) 0x10100 - 0x20000 RAM
>>
>> Then you want to map 1) as memory slot and 4) from 0x11000 onwards as
>> memory slot.
> so every access to RAM 0x10100-0x11000 which is not represented as memory
> slot would cause VMEXIT?

Yes, there's no other way. Otherwise we wouldn't be able to trap on the
exits from 0x10000 - 0x10100. Hardware only gives us page granularity.

Usually this isn't an issue because overlapping MMIO regions are pretty
large chunks of power-of-2 size - if you see any overlapping at all. On
e500 this bites us though, because we end up with small MSI-X windows
inside our address space (which in turn might also be a bug, but that
doesn't mean that the slot mapping logic should be left as broken as it is).


Alex