From mboxrd@z Thu Jan 1 00:00:00 1970
From: tgh
Subject: Re: Re: NUMA and SMP
Date: Tue, 20 Mar 2007 21:10:10 +0800
Message-ID: <45FFDD32.8030607@ncic.ac.cn>
References: <907625E08839C4409CE5768403633E0B018E1879@sefsexmb1.amd.com> <8790346913e7b2e96fdc58199e039895@xensource.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: quoted-printable
In-Reply-To: <8790346913e7b2e96fdc58199e039895@xensource.com>
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Emmanuel Ackaouy
Cc: Ryan Harper, "Petersson, Mats", xen-devel, David Pilger, Anthony Liguori
List-Id: xen-devel@lists.xenproject.org

I am puzzled: what is page migration?

Thank you in advance.

Emmanuel Ackaouy wrote:
> On the topic of NUMA:
>
> I'd like to dispute the assumption that a NUMA-aware OS can actually
> make good decisions about the initial placement of memory in a
> reasonable hardware ccNUMA system.
>
> How does the OS know on which node a particular chunk of memory
> will be most accessed? The truth is that unless the application or
> person running the application is herself NUMA-aware and can provide
> placement hints or directives, the OS will seldom beat a round-robin /
> interleave or random placement strategy.
>
> To illustrate, consider an app which lays out a bunch of data in memory
> in a single thread and then spawns worker threads to process it.
>
> Is the OS to place memory close to the initial thread? How can it
> possibly know how many threads will eventually process the data?
>
> Even if the OS knew how many threads will eventually crunch the data,
> it cannot possibly know at placement time if each thread will work on
> an assigned data subset (and if so, which one) or if it will act as a
> pipeline stage with all the data being passed from one thread to the
> next.
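
The worker-thread scenario in the quoted paragraphs can be sketched as a toy model. This is only an illustration under assumed parameters: the node count, the page count, the one-worker-per-node layout, and all function names are hypothetical and do not come from Xen or any real allocator. It compares placing every page on the initial thread's node with round-robin interleaving, for both access patterns mentioned above (assigned subsets and pipeline stages).

```python
from collections import Counter

NUM_NODES = 4      # hypothetical number of NUMA nodes
NUM_PAGES = 1024   # hypothetical data set size, in pages


def first_touch(num_pages, toucher_node=0):
    """Place every page on the node of the thread that laid out the data."""
    return [toucher_node] * num_pages


def interleave(num_pages, num_nodes):
    """Place pages round-robin across all nodes."""
    return [page % num_nodes for page in range(num_pages)]


def partitioned_accesses(num_pages, num_nodes):
    """One worker per node; each worker reads one contiguous slice.

    Assumes num_pages is divisible by num_nodes.
    """
    per_worker = num_pages // num_nodes
    return [(page // per_worker, page) for page in range(num_pages)]


def pipeline_accesses(num_pages, num_nodes):
    """One stage per node; every stage reads every page."""
    return [(node, page) for node in range(num_nodes)
            for page in range(num_pages)]


def summarize(placement, accesses):
    """Return (fraction of local accesses, share served by the busiest node)."""
    local = sum(1 for node, page in accesses if placement[page] == node)
    served = Counter(placement[page] for _node, page in accesses)
    return local / len(accesses), max(served.values()) / len(accesses)


if __name__ == "__main__":
    placements = {
        "first-touch": first_touch(NUM_PAGES),
        "interleave": interleave(NUM_PAGES, NUM_NODES),
    }
    patterns = {
        "partitioned": partitioned_accesses(NUM_PAGES, NUM_NODES),
        "pipeline": pipeline_accesses(NUM_PAGES, NUM_NODES),
    }
    for pname, placement in placements.items():
        for aname, accesses in patterns.items():
            local, busiest = summarize(placement, accesses)
            print(f"{pname:12s} {aname:12s} "
                  f"local={local:.2f} busiest-node-share={busiest:.2f}")
```

In this toy model the fraction of local accesses is 0.25 in all four cases, so placing everything near the initial thread buys no locality. What differs is where the traffic lands: with first-touch one node serves every access, while with interleaving each node serves a quarter of them.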
>
> If you go beyond initial memory placement or start considering memory
> migration, then it's even harder to win because you have to pay copy
> and stall penalties during migrations. So you have to be real smart
> about predicting the future to do better than your ~10-40% memory
> bandwidth and latency hit associated with doing simple memory
> interleaving on a modern hardware-ccNUMA system.
>
> And it gets worse for you when your app is successfully taking
> advantage of the memory cache hierarchy because its performance is
> less impacted by raw memory latency and bandwidth.
>
> Things also get more difficult on a time-sharing host with competing
> apps.
>
> There is a strong argument for making hypervisors and OSes NUMA
> aware in the sense that:
> 1- They know about system topology
> 2- They can export this information up the stack to applications and
>    users
> 3- They can take in directives from users and applications to
>    partition the host and place some threads and memory in specific
>    partitions.
> 4- They use an interleaved (or random) initial memory placement
>    strategy by default.
>
> The argument that the OS on its own -- without user or application
> directives -- can make better placement decisions than round-robin or
> random placement is -- in my opinion -- flawed.
>
> I also am skeptical that the complexity associated with page migration
> strategies would be worthwhile: If you got it wrong the first time,
> what makes you think you'll do better this time?
>
> Emmanuel.
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel
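
The copy-and-stall argument about migration in the quoted message can be written as a small break-even check. This is a rough sketch, not a model of any real kernel or hypervisor: the function names and every number below are hypothetical, and it assumes each remaining access to the page saves a fixed remote-access penalty once the page has been moved.

```python
import math


def migration_pays_off(remaining_accesses, remote_penalty_ns,
                       copy_cost_ns, stall_cost_ns):
    """True if the latency saved after moving a page exceeds the one-off
    cost of copying it and stalling the threads that use it."""
    saved_ns = remaining_accesses * remote_penalty_ns
    return saved_ns > copy_cost_ns + stall_cost_ns


def break_even_accesses(remote_penalty_ns, copy_cost_ns, stall_cost_ns):
    """Number of future remote accesses at which savings equal the cost."""
    return math.ceil((copy_cost_ns + stall_cost_ns) / remote_penalty_ns)


if __name__ == "__main__":
    # All values are made-up placeholders, chosen only to show the shape
    # of the trade-off.
    penalty, copy, stall = 50, 20_000, 5_000
    print(break_even_accesses(penalty, copy, stall))
    print(migration_pays_off(1_000, penalty, copy, stall))
    print(migration_pays_off(100, penalty, copy, stall))
```

The difficulty raised in the message is that `remaining_accesses` is a prediction about the future, and accesses that hit in cache do not pay the remote penalty at all, which shrinks the saving further.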