From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id ; Thu, 30 May 2002 13:51:51 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id ; Thu, 30 May 2002 13:51:50 -0400 Received: from e21.nc.us.ibm.com ([32.97.136.227]:19084 "EHLO e21.nc.us.ibm.com") by vger.kernel.org with ESMTP id ; Thu, 30 May 2002 13:51:49 -0400 Date: Thu, 30 May 2002 23:25:13 +0530 From: Dipankar Sarma To: Mala Anand Cc: BALBIR SINGH , linux-kernel@vger.kernel.org, lse-tech@lists.sourceforge.net, lse-tech-admin@lists.sourceforge.net, Paul McKenney , Rusty Russell Subject: Re: [Lse-tech] Re: [RFC] Dynamic percpu data allocator Message-ID: <20020530232513.C3575@in.ibm.com> Reply-To: dipankar@in.ibm.com In-Reply-To: Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org On Thu, May 30, 2002 at 08:56:36AM -0500, Mala Anand wrote: > > dipankar@beaverton.ibm.co > m To: BALBIR SINGH > > >The per-cpu data allocator allocates one copy for *each* CPU. > >It uses the slab allocator underneath. Eventually, when/if we have > >per-cpu/numa-node slab allocation, the per-cpu data allocator > >can allocate every CPU's copy from memory closest to it. > > Does this mean that memory allocation will happen in "each" CPU? > Do slab allocator allocate the memory in each cpu? Your per-cpu > data allocator sounds like the hot list skbs that are in the tcpip stack > in the sense it is one level above the slab allocator and the list is > kept per cpu. If slab allocator is fixed for per cpu, do you still > need this per-cpu data allocator? Actually I don't know for sure what plans are afoot to fix the slab allocator for per-cpu. One plan I heard about was allocating from per-cpu pools rather than per-cpu copies. My requirements are similar to the hot list skbs. I want to do this - int *ctrp1, *ctrp2; ctrp1 = kmalloc_percpu(sizeof(*ctrp1), GFP_ATOMIC); if (ctrp1 == NULL) { /* recover */ } ctrp2 = kmalloc_percpu(sizeof(*ctrp2), GFP_ATOMIC); if (ctrp2 == NULL) { /* recover */ } *per_cpu_ptr(ctrp1, smp_processor_id())++; this_cpu_ptr(ctrp2)++; Now I can allocate by making ctrp1/ctrp2 point to an array of NR_CPUS and kmalloc() memory for each CPU's copy of the int. This is simple and will work. void **ptrs = kmalloc(sizeof(*ptrs) * NR_CPUS, flags); if (!ptrs) return NULL; for (i = 0; i < NR_CPUS; i++) { ptrs[i] = kmalloc(size, flags); if (!ptrs[i]) goto unwind_oom; } However I would like to use kmalloc_percpu() for allocating very small objects - typlically integer counters or small structures to be used as per-cpu counters for things like dst entries and dentries. kmalloc will waste the rest of the cache line for such small objects. The alternative is to use a layer of code to interleave small objects and save on space. CPU #0 CPU#1 --------- --------- Start of cache line *ctrp1 *ctrp1 *ctrp2 *ctrp2 . . . . . . . . . . --------- ---------- End of cache line I have an allocator that interleaves objects like this if they can be fitted into size that is a factor of SMP_CACHE_BYTES. I hope someone can tell me that I don't even have to do this. Otherwise I will go ahead and do my thing. Thanks -- Dipankar Sarma http://lse.sourceforge.net Linux Technology Center, IBM Software Lab, Bangalore, India.