From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:47354)
	by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1Vq5Jf-0001m2-3I for qemu-devel@nongnu.org;
	Mon, 09 Dec 2013 13:12:29 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from ) id 1Vq5JZ-0006Eq-37 for qemu-devel@nongnu.org;
	Mon, 09 Dec 2013 13:12:23 -0500
Received: from mx1.redhat.com ([209.132.183.28]:6665)
	by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1Vq5JY-0006ER-RC for qemu-devel@nongnu.org;
	Mon, 09 Dec 2013 13:12:17 -0500
Date: Mon, 9 Dec 2013 16:10:33 -0200
From: Marcelo Tosatti
Message-ID: <20131209181032.GA8315@amt.cnet>
References: <1386143939-19142-1-git-send-email-gaowanlong@cn.fujitsu.com>
	<52A1939D.1080709@redhat.com>
	<20131206184936.GA10903@amt.cnet>
	<52A5FEF5.1010504@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <52A5FEF5.1010504@redhat.com>
Subject: Re: [Qemu-devel] [PATCH V17 00/11] Add support for binding guest
	numa nodes to host numa nodes
To: Paolo Bonzini
Cc: drjones@redhat.com, ehabkost@redhat.com, lersek@redhat.com,
	qemu-devel@nongnu.org, lcapitulino@redhat.com, bsd@redhat.com,
	anthony@codemonkey.ws, hutao@cn.fujitsu.com, y-goto@jp.fujitsu.com,
	peter.huangpeng@huawei.com, afaerber@suse.de, Wanlong Gao

On Mon, Dec 09, 2013 at 06:33:41PM +0100, Paolo Bonzini wrote:
> On 06/12/2013 19:49, Marcelo Tosatti wrote:
> >> > You'll have with your patches (without them it's worse of course):
> >> >
> >> >   RAM offset      physical address    node
> >> >   0-3840M         0-3840M             host node 0
> >> >   4096M-4352M     4096M-4352M         host node 0
> >> >   4352M-8192M     4352M-8192M         host node 1
> >> >   3840M-4096M     8192M-8448M         host node 1
> >> >
> >> > So only 0-3G and 5-8G are aligned, 3G-5G and 8G-8.25G cannot use
> >> > gigabyte pages because they are split across host nodes.
> >
> > AFAIK the TLB caches virt->phys translations, so why should the
> > specifics of a given phys address be a factor in TLB caching?
>
> The problem is that "-numa mem" receives memory sizes and these do not
> take into account the hole below 4G.
>
> Thus, two adjacent host-physical addresses (two adjacent ram_addr_t-s)
> map to guest-physical addresses that are far apart, are assigned to
> different guest nodes, and from there to different host nodes. In the
> above example this happens for 3G-5G.

The physical address, which is what the TLB uses, does not take node
information into account.

> On second thought, this is not particularly important, or at least not
> yet. It's not really possible to control the NUMA policy for
> hugetlbfs-allocated memory, right?

It is possible. I don't know what happens if conflicting NUMA policies
are specified for different virtual address ranges that map to a single
huge page, but however the kernel resolves that, it is not relevant
here: the TLB caches virt->phys translations, not
virt->{phys, node info} translations.
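To set the policy, one can mbind() the hugetlbfs-backed mapping before
it is first touched; the policy is then applied as each huge page is
faulted in. A minimal, untested sketch (the helper name is made up and
this is not QEMU's actual memory backend code; link with -lnuma):

    /* Untested sketch: back a region with hugetlbfs and bind it to one
     * host node.  Illustrative only, not QEMU's memory backend code.
     * size must be a multiple of the huge page size. */
    #include <fcntl.h>
    #include <numaif.h>      /* mbind(), MPOL_BIND -- link with -lnuma */
    #include <sys/mman.h>
    #include <unistd.h>

    static void *alloc_huge_on_node(const char *hugetlbfs_path,
                                    size_t size, int node)
    {
        int fd = open(hugetlbfs_path, O_RDWR | O_CREAT, 0600);
        if (fd < 0)
            return NULL;
        if (ftruncate(fd, size) < 0) {
            close(fd);
            return NULL;
        }
        void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                       MAP_SHARED, fd, 0);
        close(fd);
        if (p == MAP_FAILED)
            return NULL;

        unsigned long nodemask = 1UL << node;
        /* Bind before first touch: each huge page is then placed on the
         * given node when it is faulted in. */
        if (mbind(p, size, MPOL_BIND, &nodemask,
                  sizeof(nodemask) * 8, 0) < 0) {
            munmap(p, size);
            return NULL;
        }
        return p;
    }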
> >> > So rather than your patches, it seems simpler to just widen the
> >> > PCI hole to 1G for i440FX and 2G for q35.
> >> >
> >> > What do you think?
> >
> > Problem is it's a guest-visible change. To get 1GB TLB entries with
> > "legacy guest-visible machine types" (which require new machine types
> > on the host side, but are invisible to the guest), that won't work.
> > Windows registration invalidation, etc.
>
> Yeah, that's a tradeoff to make.

Perhaps increasing the PCI hole size should be done for other reasons?
Note that dropping the 1GB-alignment piix.c patch requires the hole
size + start to be 1GB aligned.
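That is, since the i440FX hole always ends at 4GB, keeping the hole
start on a 1GB boundary keeps both the start and the size 1GB multiples,
so the RAM below it stays mappable with 1GB pages. Roughly (illustrative
only, made-up helper name, not the actual piix.c code):

    /* Illustrative only -- not the actual piix.c code.  One reading of
     * the constraint: the PCI hole runs from the end of below-4G RAM up
     * to 4GiB, so rounding the hole start down to a 1GiB boundary makes
     * both the hole start and size 1GiB multiples. */
    #include <stdint.h>
    #include <stdio.h>

    #define GIB (1ULL << 30)

    static uint64_t hole_start_aligned(uint64_t below_4g_ram)
    {
        return below_4g_ram & ~(GIB - 1);   /* round down to 1GiB */
    }

    int main(void)
    {
        /* 3840M of RAM below 4G -> hole start rounds down to 3072M;
         * the displaced 768M would be mapped above 4G instead. */
        uint64_t start = hole_start_aligned(3840ULL << 20);
        printf("hole start: %llu MiB\n",
               (unsigned long long)(start >> 20));
        return 0;
    }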