From: ebiederm@xmission.com (Eric W. Biederman)
To: <mingo@elte.hu>
Cc: <linux-kernel@vger.kernel.org>, <linux-mm@kvack.org>
Subject: Re: [patch] arbitrary size memory allocator, memarea-2.4.15-D6
Date: 17 Nov 2001 11:00:38 -0700
Message-ID: <m14rnt9t15.fsf@frodo.biederman.org>
In-Reply-To: <Pine.LNX.4.33.0111121714100.14093-200000@localhost.localdomain>
Ingo Molnar <mingo@elte.hu> writes:
> in the past couple of years the buddy allocator has started to show
> limitations that are hurting performance and flexibility.
>
> eg. one of the main reasons why we keep MAX_ORDER at an almost obscenely
> high level is the fact that we occasionally have to allocate big,
> physically contiguous memory areas. We do not realistically expect to be
> able to allocate such high-order pages after bootup, still every page
> allocation carries the cost of it. And even with MAX_ORDER at 10, large
> RAM boxes have hit this limit and are hurting visibly - as witnessed by
> Anton. Falling back to vmalloc() is not a high-quality option, due to the
> TLB-miss overhead.
And additionally vmalloc space is nearly as subject to fragmentation
as physically contiguous memory is. And on some machines the amount
of address space dedicated to vmalloc is comparatively small, 128M or so.
> If we had an allocator that could handle large, rare but
> performance-insensitive allocations, then we could decrease MAX_ORDER back
> to 5 or 6, which would result in less cache-footprint and faster operation
> of the page allocator.
It definitely sounds reasonable. A special allocator for a hard and
different case.
> Obviously, alloc_memarea() can be pretty slow if RAM is getting full, nor
> does it guarantee allocation, so for non-boot allocations other backup
> mechanisms have to be used, such as vmalloc(). It is not a replacement for
> the buddy allocator - it's not intended for frequent use.
If we can make this allocator work well enough that you don't need a
backup allocator, so that when it fails you can pretty much figure the
allocation simply wasn't possible, then it has a much better chance of
being useful.
> alloc_memarea() tries to optimize away as much as possible from linear
> scanning of zone mem-maps, but the worst-case scenario is that it has to
> iterate over all pages - which can be ~256K iterations if eg. we search on
> a 1 GB box.
Hmm. Can't you assume that buddies are coalesced?
> possible future improvements:
>
> - alloc_memarea() could zap clean pagecache pages as well.
>
> - if/once reverse pte mappings are added, alloc_memarea() could also
> initiate the swapout of anonymous & dirty pages. These modifications
> would make it pretty likely to succeed if the allocation size is
> realistic.
Except for anonymous pages we have perfectly serviceable reverse
mappings. They are slow, but this is a performance-insensitive
allocator, so it shouldn't be a big deal to walk page->mapping->i_mmap.
But I suspect you could get farther by generating a zone on the fly
for the area you want to free up, and using the normal mechanisms,
or a slight variation on them, to free up all the pages in that
area.
> - possibly add 'alignment' and 'offset' to the __alloc_memarea()
> arguments, to possibly create a given alignment for the memarea, to
> handle really broken hardware and possibly result in better page
> coloring as well.
>
> - if we extended the buddy allocator to have a page-granularity bitmap as
> well, then alloc_memarea() could search for physically contiguous page
> areas *much* faster. But this creates a real runtime (and cache
> footprint) overhead in the buddy allocator.
I don't see the need to make this allocator especially fast so I doubt
that would really help.
> i've tested the patch pretty thoroughly on big and small RAM boxes. The
> patch is against 2.4.15-pre3.
>
> Reports, comments, suggestions welcome,
See above.
Eric