From: Ravikiran G Thirumalai <kiran@in.ibm.com>
To: Manfred Spraul <manfred@colorfullife.com>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [RFC] Reimplementation of linux dynamic percpu memory allocator
Date: Mon, 20 Dec 2004 23:50:57 +0530 [thread overview]
Message-ID: <20041220182057.GA16859@in.ibm.com> (raw)
In-Reply-To: <41C35DD6.1050804@colorfullife.com>
On Fri, Dec 17, 2004 at 11:29:42PM +0100, Manfred Spraul wrote:
> >
> Probably the right approach. slab should use per-cpu for its internal
> head arrays, but I've never converted the slab code due to
> chicken-and-egg problems and due to the additional pointer dereference.
>
> >+ * Allocator is slow -- expected to be called during module/subsytem
> >+ * init. alloc_percpu can block.
> >
> >
> How slow is slow?
Haven't measured it, but the allocator is not designed for speed.
Once the block to be allocated from is identified, the allocator builds
and sorts a map of free objects in ascending order of size so that we
allocate from the smallest chunk that fits. The goal is to improve
memory/cacheline utilization and reduce fragmentation rather than to be
fast. The allocator is not expected to be used from the fastpath.
> I think the block subsystem uses alloc_percpu for some statistics
> counters, i.e. one alloc during creation of a new disk. The slab
> implementation was really slow and that caused problems with LVM (or
> something like that) stress tests.
Hmmm... I knew from some earlier experiments that access to per-cpu
versions of memory was slow with the slab-based implementation -- which
this patch addresses -- but I didn't know allocs themselves were slow...
Creation of a disk should not be a fast path, no?
>
> >+	/* Map pages for each cpu by splitting vm_struct for each cpu */
> >+	for (i = 0; i < NR_CPUS; i++) {
> >+		if (cpu_possible(i)) {
> >+			tmppage = &blkp->pages[i*cpu_pages];
> >+			tmp.addr = area->addr + i * PCPU_BLKSIZE;
> >+			/* map_vm_area assumes a guard page of size PAGE_SIZE */
> >+			tmp.size = PCPU_BLKSIZE + PAGE_SIZE;
> >+			if (map_vm_area(&tmp, PAGE_KERNEL, &tmppage))
> >+				goto fail_map;
> >
> >
> That means no large pte entries for the per-cpu allocations, right?
> I think that's a bad idea for non-numa systems. What about a fallback to
> simple getfreepages() for non-numa systems?
Can we have large pte entries with PAGE_SIZEd pages?
> >+ * This allocator is slow as we assume allocs to come
> >+ * by during boot/module init.
> >+ * Should not be called from interrupt context
> >
> >
> "Must not" - it contains down() and thus can sleep.
>
:D Yes, I will replace 'should not' with 'must not' in the next iteration.
Thanks for the comments and feedback.
Kiran