From mboxrd@z Thu Jan 1 00:00:00 1970
From: Ryan Harper
Subject: Re: [PATCH 0/6] xen,xend,tools: NUMA support for Xen
Date: Tue, 11 Jul 2006 20:23:13 -0500
Message-ID: <20060712012313.GO1694@us.ibm.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Ian Pratt
Cc: xen-devel@lists.xensource.com
List-Id: xen-devel@lists.xenproject.org

* Ian Pratt [2006-07-11 16:28]:
> > > What sort of box are these numbers taken from? If it's not a NUMA
> > > system then the slowdowns are rather poor. We're particularly
> > > interested in not slowing down non-NUMA and small-NUMA (e.g., AMD K8)
> > > x86 systems. They are what we really want to see measurements from.
> >
> > The measurements are taken from a two-way Opteron 248 (2.1Ghz),
> > small-NUMA. I agree that there is significant overhead, however, we
> > aren't talking about fast path here; correct me if I'm wrong. We
> > are only adding overhead during domain startup. The end result
> > being we pay for local memory allocation at creation time while
> > benefiting from local memory access for the lifetime of the domain.
> >
> > I'm going to gather some oprofile data to see if I missed something
> > obvious, but in general I think that having local memory is of greater
> > benefit for the lifetime of a domain than the cost we incur during its
> > creation.
>
> What do the numbers look like on a 1 node system?

For K8 small numa, I don't have a 1 node system available.

>
> The shadow mode code potentially churns the page allocator a fair bit.
> It'll be disappointing if we have to add complexity of quicklists etc.

Yeah, I forgot about shadow mode; good point.

>
> It does kind of surprise me that the overhead is as high as you've
> measured. In the case where there's memory available in the favoured
> node I'd expect allocation performance to be very similar. 4 times
> slower and worsening for large allocations seems odd -- 0.3 microseconds
> a page is a bit more than I'd expect during back-to-back allocations.
> It's certainly worth trying to understand the overhead a bit more.

I agree. I'm a little mystified by the overhead as well. On the larger
system, ballooning up to 23G had something like 11% overhead, which was
more reasonable, though the domain creation tests showed more than 11%
on that system as well. I'll get the oprofile data and take a look.

-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
(512) 838-9253 T/L: 678-9253
ryanh@us.ibm.com