From: Ravikiran G Thirumalai <kiran@in.ibm.com>
To: Manfred Spraul <manfred@colorfullife.com>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [RFC] Reimplementation of linux dynamic percpu memory allocator
Date: Mon, 20 Dec 2004 23:50:57 +0530
Message-ID: <20041220182057.GA16859@in.ibm.com>
In-Reply-To: <41C35DD6.1050804@colorfullife.com>
On Fri, Dec 17, 2004 at 11:29:42PM +0100, Manfred Spraul wrote:
> >
> Probably the right approach. slab should use per-cpu for it's internal
> head arrays, but I've never converted the slab code due to
> chicken-and-egg problems and due to the additional pointer dereference.
>
> >+ * Allocator is slow -- expected to be called during module/subsytem
> >+ * init. alloc_percpu can block.
> >
> >
> How slow is slow?
I haven't measured it, but the allocator is not designed for speed.
Once the block to allocate from is identified, the allocator builds
and sorts a map of objects in ascending order so that we allocate
from the smallest chunk. The goal is to improve memory/cacheline
utilization and reduce fragmentation rather than speed. The allocator
is not expected to be used from the fastpath.
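To make the "sort the map, allocate from the smallest chunk" idea concrete, here is a rough user-space sketch. It is not the code from the patch: struct free_chunk, cmp_chunk_size() and best_fit_alloc() are hypothetical names, and the sketch assumes the map is ordered by chunk size and that "smallest chunk" means the smallest free chunk that still fits the request.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical free-chunk descriptor; not the structure used in the patch. */
struct free_chunk {
	size_t offset;	/* start of the free chunk within the block */
	size_t size;	/* length of the free chunk in bytes */
};

/* qsort comparator: ascending by size, ties broken by offset. */
static int cmp_chunk_size(const void *a, const void *b)
{
	const struct free_chunk *x = a, *y = b;

	if (x->size != y->size)
		return x->size < y->size ? -1 : 1;
	if (x->offset != y->offset)
		return x->offset < y->offset ? -1 : 1;
	return 0;
}

/*
 * Sort the map in ascending size order, then carve the request out of the
 * first (i.e. smallest) chunk that is large enough. Returns the offset of
 * the allocation, or (size_t)-1 if no chunk fits. The chosen chunk is
 * shrunk in place; a fully consumed chunk is left behind with size 0.
 */
static size_t best_fit_alloc(struct free_chunk *map, size_t n, size_t want)
{
	size_t i;

	assert(want > 0);
	qsort(map, n, sizeof(*map), cmp_chunk_size);
	for (i = 0; i < n; i++) {
		if (map[i].size >= want) {
			size_t off = map[i].offset;

			map[i].offset += want;
			map[i].size -= want;
			return off;
		}
	}
	return (size_t)-1;
}
```

The sort on every call is what makes this slow, which is acceptable only because allocations are expected at init time. Picking the tightest fit leaves the larger chunks intact for later, larger requests.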
> I think the block subsystem uses alloc_percpu for some statistics
> counters, i.e. one alloc during creation of a new disk. The slab
> implementation was really slow and that caused problems with LVM (or
> something like that) stress tests.
Hmm, I knew from some earlier experiments that access to the per-cpu
copies of memory was slow with the slab-based implementation, which this
patch addresses, but I didn't know the allocs themselves were slow.
Creation of a disk should not be a fast path, no?
>
> >+ /* Map pages for each cpu by splitting vm_struct for each cpu */
> >+ for (i = 0; i < NR_CPUS; i++) {
> >+ if (cpu_possible(i)) {
> >+ tmppage = &blkp->pages[i*cpu_pages];
> >+ tmp.addr = area->addr + i * PCPU_BLKSIZE;
> >+ /* map_vm_area assumes a guard page of size PAGE_SIZE */
> >+ tmp.size = PCPU_BLKSIZE + PAGE_SIZE;
> >+ if (map_vm_area(&tmp, PAGE_KERNEL, &tmppage))
> >+ goto fail_map;
> >
> >
> That means no large pte entries for the per-cpu allocations, right?
> I think that's a bad idea for non-numa systems. What about a fallback to
> simple getfreepages() for non-numa systems?
Can we have large pte entries with PAGE_SIZE-sized pages?
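For reference, the address layout set up by the quoted loop (each cpu's region starting at area->addr + i * PCPU_BLKSIZE) can be sketched in user space roughly as below. SKETCH_BLKSIZE and percpu_addr() are made-up names for illustration, the 4096 value is an arbitrary stand-in for PCPU_BLKSIZE, and the sketch assumes an object sits at the same byte offset inside every cpu's region.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed stand-in for PCPU_BLKSIZE; the real value is not given here. */
#define SKETCH_BLKSIZE 4096u

/*
 * Address of a given cpu's copy of an object that lives at byte offset
 * obj_off inside the per-cpu block starting at base (cpu 0's region).
 */
static uintptr_t percpu_addr(uintptr_t base, unsigned int cpu, size_t obj_off)
{
	assert(obj_off < SKETCH_BLKSIZE);
	return base + (uintptr_t)cpu * SKETCH_BLKSIZE + obj_off;
}
```

With a layout like this, getting from one cpu's copy to another's is plain arithmetic on the address, with no extra pointer to load.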
> >+ * This allocator is slow as we assume allocs to come
> >+ * by during boot/module init.
> >+ * Should not be called from interrupt context
> >
> >
> "Must not" - it contains down() and thus can sleep.
>
:D Yes, I will replace 'should not' with 'must not' in my next iteration.
Thanks for the comments and feedback.
Kiran