Date: Fri, 24 May 2002 14:43:01 +0530
From: Dipankar Sarma
To: BALBIR SINGH
Cc: linux-kernel@vger.kernel.org, Rusty Russell, Paul McKenney, lse-tech@lists.sourceforge.net
Subject: Re: [Lse-tech] Re: [RFC] Dynamic percpu data allocator
Message-ID: <20020524144301.D11249@in.ibm.com>
Reply-To: dipankar@in.ibm.com
In-Reply-To: <20020524114318.A11249@in.ibm.com>

On Fri, May 24, 2002 at 02:08:50PM +0530, BALBIR SINGH wrote:
>
> Sure, I understand what you are talking about now. That makes a lot
> of sense, I will go through your document once more and read it.
> I was thinking of the two combined (allocating CPU local memory
> for certain data structs also includes allocating one copy per CPU).
> Is there a reason to delay the implementation of CPU local memory,
> or are we waiting for NUMA guys to do it? Is it not useful in an
> SMP system to allocate CPU local memory?

In an SMP system, all of memory is equidistant from the CPUs. So any
memory that is accessed exclusively by one CPU is CPU-local. On a
NUMA machine, however, that isn't true, so you need special schemes.

The thing about the one-copy-per-cpu allocator that I describe is that
it interleaves per-cpu data to save space. That is, if you allocate
per-cpu ints i1 and i2, they will be laid out in memory like this -

     CPU #0          CPU #1

    ---------       ---------       Start of cache line
       i1              i1
       i2              i2
       .               .
       .               .
       .               .
       .               .
       .               .
    ---------       ---------       End of cache line

The per-cpu copies of i1 and i2 for CPU #0 and CPU #1 are allocated
from different cache lines of memory, but the copies of i1 and i2 for
CPU #0 are in the same cache line. This interleaving saves space by
avoiding the need to pad small data structures to cache line size.
This is essentially how the static per-cpu data area in the 2.5 kernel
is laid out in memory. Since the copies of the same variable for
CPU #0 and CPU #1 are on different cache lines, code that accesses
only "this" CPU's copy will not cause cache line bouncing.

On an SMP machine, I can allocate the cache lines for different CPUs,
where the interleaved data structures are laid out, using the slab
allocator. On a NUMA machine, however, I would want to make sure that
the cache line allocated for this purpose for CPU #N is as close as
possible to CPU #N.

Thanks
-- 
Dipankar Sarma http://lse.sourceforge.net
Linux Technology Center, IBM Software Lab, Bangalore, India.
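
For illustration, the interleaved layout described in the message above
can be sketched in user-space C roughly as follows. This is not the
kernel code or the proposed allocator; the names percpu_pool,
percpu_alloc and percpu_ptr, and the NR_CPUS and CACHE_LINE_SIZE values,
are assumptions made up for the example. It handles only a single cache
line per CPU and says nothing about NUMA placement.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_CPUS          2   /* assumption: two CPUs, as in the diagram */
#define CACHE_LINE_SIZE 64   /* assumption: 64-byte cache lines */

/*
 * Hypothetical pool: one cache-line-sized, cache-line-aligned block per
 * CPU. An allocated object lives at the same offset in every block.
 */
struct percpu_pool {
    _Alignas(CACHE_LINE_SIZE) unsigned char line[NR_CPUS][CACHE_LINE_SIZE];
    size_t next_off;         /* next free offset, shared by all CPUs */
};

/*
 * Reserve `size` bytes at a common offset in every CPU's block.
 * `align` must be a power of two. Returns the offset, or (size_t)-1
 * if the cache line has no room left.
 */
static size_t percpu_alloc(struct percpu_pool *p, size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);

    /* Round the next free offset up to the requested alignment. */
    size_t off = (p->next_off + align - 1) & ~(align - 1);

    if (off + size > CACHE_LINE_SIZE)
        return (size_t)-1;

    p->next_off = off + size;
    return off;
}

/* Address of the copy at offset `off` that belongs to CPU `cpu`. */
static void *percpu_ptr(struct percpu_pool *p, size_t off, int cpu)
{
    return &p->line[cpu][off];
}
```

Allocating i1 and i2 from such a pool packs both of CPU #0's copies into
one cache line without padding, while CPU #1's copies sit at the same
offsets in a separate line, so each CPU touches only its own line.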