From: Andre Przywara
Subject: NUMA guest: best-fit-nodes algorithm (was Re: [PATCH 00/11] PV NUMA Guests)
Date: Fri, 23 Apr 2010 14:45:58 +0200
Message-ID: <4BD19686.1050602@amd.com>
To: Dulloor, "Cui, Dexuan"
Cc: xen-devel, "Nakajima, Jun"
List-Id: xen-devel@lists.xenproject.org

Dulloor wrote:
> Cui, Dexuan wrote:
>> xc_select_best_fit_nodes() decides the "min-set" of host nodes that
>> will be used for the guest. It only considers the current memory
>> usage of the system. Maybe we should also consider the cpu load? And
>> the number of the nodes must be 2^^n? And how to handle the case
>> #vcpu is < #vnode?
>> And looks your patches only consider the guest's memory requirement
>> -- guest's vcpu requirement is neglected? e.g., a guest may not need
>> a very large amount of memory while it needs many vcpus.
>> xc_select_best_fit_nodes() should consider this when
>> determining the number of vnode.
> I agree with you. I was planning to consider vcpu load as the next
> step. Also, I am looking for a good heuristic. I looked at the
> nodeload heuristic (currently in xen), but found it too naive.
> But, if you/Andre think it is a good heuristic, I will add the
> support. Actually, I think in future we should do away with strict
> vcpu-affinities and rely more on a scheduler with necessary NUMA
> support to complement our placement strategies.
>
> As of now, we don't SPLIT, if #vcpu < #vnode. We use STRIPING in that
> case.

Determining the current load of a node is quite a hard thing to do
currently in Xen.
If guests are pinned to nodes (which I'd consider necessary with the
current credit scheduler), then using this affinity is a good heuristic
to find good nodes, at least the best I can think of. So until we have a
NUMA aware scheduler, we should go with this solution.
Of course it only measures the theoretical load of a node and doesn't
distinguish between idle and loaded guests. One would need something
like a permanently running xm top to gather statistics about the guest's
load, but that is something for a future patch. (Or is there a guest
load metric already measured in Xen?)

Regards,
Andre.

-- 
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany
Tel: +49 351 448-3567-12
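
The affinity-based heuristic discussed in this mail could be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in Xen or in the patch series: the function names, the `cpu_to_node` map, and the rule of splitting a VCPU's weight across the nodes its affinity spans are all hypothetical. As noted above, such a count reflects only the theoretical load and cannot tell idle guests from busy ones.

```python
from collections import Counter


def node_load_from_affinity(cpu_to_node, vcpu_affinities):
    """Estimate per-node load from VCPU pinning alone.

    cpu_to_node: dict mapping physical CPU id -> NUMA node id (assumed input).
    vcpu_affinities: iterable of sets of physical CPU ids, one set per VCPU.

    Each VCPU contributes a total weight of 1.0, split across nodes in
    proportion to how many of its allowed CPUs live on each node.
    """
    load = {node: 0.0 for node in set(cpu_to_node.values())}
    for cpus in vcpu_affinities:
        if not cpus:
            continue  # no affinity information, nothing to attribute
        per_node = Counter(cpu_to_node[cpu] for cpu in cpus)
        total = sum(per_node.values())
        for node, count in per_node.items():
            load[node] += count / total
    return load


def pick_least_loaded_nodes(load, count):
    """Return `count` node ids with the lowest estimated load (ties by id)."""
    return sorted(load, key=lambda node: (load[node], node))[:count]


# Hypothetical host: CPUs 0-1 on node 0, CPUs 2-3 on node 1.
cpu_to_node = {0: 0, 1: 0, 2: 1, 3: 1}
# Two VCPUs pinned to node 0, one to node 1, one unpinned across both.
affinities = [{0, 1}, {0, 1}, {2, 3}, {0, 1, 2, 3}]

load = node_load_from_affinity(cpu_to_node, affinities)
best = pick_least_loaded_nodes(load, 1)
```

A real placement decision would also have to weigh free memory per node, which is what xc_select_best_fit_nodes() is described as doing in the quoted text; the sketch covers only the CPU side.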