From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
    id S261599AbULTSHX (ORCPT ); Mon, 20 Dec 2004 13:07:23 -0500
Received: (majordomo@vger.kernel.org) by vger.kernel.org
    id S261593AbULTSHM (ORCPT ); Mon, 20 Dec 2004 13:07:12 -0500
Received: from e6.ny.us.ibm.com ([32.97.182.146]:3545 "EHLO e6.ny.us.ibm.com")
    by vger.kernel.org with ESMTP id S261599AbULTSGN (ORCPT );
    Mon, 20 Dec 2004 13:06:13 -0500
Date: Mon, 20 Dec 2004 23:50:57 +0530
From: Ravikiran G Thirumalai
To: Manfred Spraul
Cc: Linux Kernel Mailing List
Subject: Re: [RFC] Reimplementation of linux dynamic percpu memory allocator
Message-ID: <20041220182057.GA16859@in.ibm.com>
References: <41C35DD6.1050804@colorfullife.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <41C35DD6.1050804@colorfullife.com>
User-Agent: Mutt/1.4i
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Dec 17, 2004 at 11:29:42PM +0100, Manfred Spraul wrote:
>
> Probably the right approach. slab should use per-cpu for it's internal
> head arrays, but I've never converted the slab code due to
> chicken-and-egg problems and due to the additional pointer dereference.
>
> >+ * Allocator is slow -- expected to be called during module/subsytem
> >+ * init. alloc_percpu can block.
> >
> How slow is slow?

I haven't measured it, but the allocator is not designed for speed. Once
the block to allocate from is identified, the allocator builds a map of
the objects in it and sorts the map in ascending order of size, so that
we allocate from the smallest chunk. The goal is to improve
memory/cacheline utilization and reduce fragmentation rather than to be
fast. The allocator is not expected to be used from the fastpath.

> I think the block subsystem uses alloc_percpu for some statistics
> counters, i.e. one alloc during creation of a new disk. The slab
> implementation was really slow and that caused problems with LVM (or
> something like that) stress tests.

Hmmm... I knew from some earlier experiments that access to per-cpu
versions of memory was slow with the slab-based implementation, which
this patch addresses, but I didn't know the allocs themselves were slow.
Creation of a disk should not be a fast path, no?

> >+    /* Map pages for each cpu by splitting vm_struct for each cpu */
> >+    for (i = 0; i < NR_CPUS; i++) {
> >+        if (cpu_possible(i)) {
> >+            tmppage = &blkp->pages[i*cpu_pages];
> >+            tmp.addr = area->addr + i * PCPU_BLKSIZE;
> >+            /* map_vm_area assumes a guard page of size PAGE_SIZE */
> >+            tmp.size = PCPU_BLKSIZE + PAGE_SIZE;
> >+            if (map_vm_area(&tmp, PAGE_KERNEL, &tmppage))
> >+                goto fail_map;
> >
> That means no large pte entries for the per-cpu allocations, right?
> I think that's a bad idea for non-numa systems. What about a fallback to
> simple getfreepages() for non-numa systems?

Can we have large pte entries with PAGE_SIZE-sized pages?

> >+ * This allocator is slow as we assume allocs to come
> >+ * by during boot/module init.
> >+ * Should not be called from interrupt context
> >
> "Must not" - it contains down() and thus can sleep.

:D Yes, I will replace 'should not' with 'must not' in my next iteration.

Thanks for the comments and feedback.

Kiran
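
For reference, the "sort the map ascending, then allocate from the smallest chunk" step described above can be sketched in user-space C roughly as follows. This is only an illustration under stated assumptions, not code from the patch: struct free_chunk, cmp_chunk_size() and best_fit_alloc() are hypothetical names, and the real allocator's map layout and locking are not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical free-chunk descriptor; not the patch's actual structure. */
struct free_chunk {
    size_t offset;  /* start of the chunk within the block */
    size_t size;    /* length of the chunk in bytes */
};

/* qsort() comparator: ascending by size, ties broken by offset. */
int cmp_chunk_size(const void *a, const void *b)
{
    const struct free_chunk *x = a;
    const struct free_chunk *y = b;

    if (x->size != y->size)
        return x->size < y->size ? -1 : 1;
    if (x->offset != y->offset)
        return x->offset < y->offset ? -1 : 1;
    return 0;
}

/*
 * Sort the free map in ascending order of size, then carve 'want' bytes
 * out of the first (i.e. smallest) chunk that is large enough.
 * Returns the offset of the allocation, or (size_t)-1 if nothing fits.
 */
size_t best_fit_alloc(struct free_chunk *map, size_t n, size_t want)
{
    size_t i;

    assert(want > 0);
    qsort(map, n, sizeof(*map), cmp_chunk_size);

    for (i = 0; i < n; i++) {
        if (map[i].size >= want) {
            size_t off = map[i].offset;

            /* Shrink the chunk from the front; the remainder stays free. */
            map[i].offset += want;
            map[i].size -= want;
            return off;
        }
    }
    return (size_t)-1;
}
```

Re-sorting on every call is what makes this slow, but picking the smallest fitting chunk leaves the large chunks intact for later requests, which is the fragmentation trade-off mentioned above.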