From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S265920AbUEUR2l (ORCPT ); Fri, 21 May 2004 13:28:41 -0400
Received: (majordomo@vger.kernel.org) by vger.kernel.org
	id S265872AbUEUR2k (ORCPT ); Fri, 21 May 2004 13:28:40 -0400
Received: from mail.fastclick.com ([205.180.85.17]:3246 "EHLO mail.fastclick.net")
	by vger.kernel.org with ESMTP id S265920AbUEUR2j (ORCPT );
	Fri, 21 May 2004 13:28:39 -0400
Message-ID: <40AE3BF5.5080804@fastclick.com>
Date: Fri, 21 May 2004 10:27:17 -0700
From: "Brett E."
Reply-To: brettspamacct@fastclick.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.4) Gecko/20030624
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: "Martin J. Bligh"
CC: linux-kernel mailing list , jbarnes@engr.sgi.com
Subject: Re: How can I optimize a process on a NUMA architecture(x86-64 specifically)?
References: <40AD52A4.3060607@fastclick.com> <273180000.1085121453@[10.10.2.4]>
In-Reply-To: <273180000.1085121453@[10.10.2.4]>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

Martin J. Bligh wrote:
>>Say you have a bunch of single-threaded processes on a NUMA machine.
>>Does the kernel make sure to prefer allocations using a certain CPU's
>>memory, preferring to run a given process on the CPU which contains
>>its memory? Or should I use the NUMA API(libnuma) to spell this out
>>to the kernel? Does the kernel do the right thing in this case?
>
>
> The kernel will generally do the right thing (process local alloc) by
> default. In 99% of cases, you don't want to muck with it - unless you're
> running one single app dominating the whole system, and nothing else is
> going on, you probably don't want to specify anything explicitly.
>
> M.
>

Let's say I have a 2 way opteron and want to run 4 long-lived processes.
I fork and exec to create the first process. It chooses to run on processor 0 since processor 1 is overloaded at that time, so its homenode is processor 0. I fork and exec another, and it also chooses processor 0 since processor 1 is overloaded at that time, and so on.

Let's say an uneven distribution is chosen for all 4 processes, with all processes mapped to processor 0. So they allocate on node 0, yet the scheduler will spread them across both processors since CPU load should be balanced. In this case, the processes running on the second processor will have to fetch their memory from the other processor's memory.

So a better solution would be to use numactl to set the homenodes explicitly, choosing processor 0 for 2 processes and processor 1 for the other 2. Is this incorrect?
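
To make the proposed placement concrete, here is a minimal sketch of the 2-and-2 split described above. It only plans the assignment and builds the numactl command lines; it does not launch anything. The helper names and the "./worker" program are hypothetical, and it assumes a numactl that accepts --cpunodebind and --membind and that the two Opterons show up as nodes 0 and 1.

```python
import shlex


def plan_node_assignment(num_procs, num_nodes):
    """Spread processes across NUMA nodes round-robin (hypothetical helper)."""
    if num_nodes < 1:
        raise ValueError("need at least one NUMA node")
    return [i % num_nodes for i in range(num_procs)]


def numactl_command(node, argv):
    """Build an argv list that binds both CPU and memory to a single node."""
    return ["numactl", f"--cpunodebind={node}", f"--membind={node}", *argv]


if __name__ == "__main__":
    worker = ["./worker"]  # hypothetical program name, not from the thread
    for proc_index, node in enumerate(plan_node_assignment(4, 2)):
        # Print the command each process would be started with.
        print(proc_index, shlex.join(numactl_command(node, worker)))
```

With 4 processes and 2 nodes the plan alternates between node 0 and node 1, so each node ends up with two processes and their memory. Whether this beats the kernel's default local allocation is exactly the open question in this mail.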