Re: [RFC][PATCH] flexible array implementation

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: Dave Hansen <dave@linux.vnet.ibm.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: containers@lists.linux-foundation.org, bblum@google.com,
	linux-kernel@vger.kernel.org, menage@google.com
Subject: Re: [RFC][PATCH] flexible array implementation
Date: Tue, 21 Jul 2009 14:56:51 -0700	[thread overview]
Message-ID: <1248213411.13249.5669.camel@nimitz> (raw)
In-Reply-To: <20090721131839.27f3a5aa.akpm@linux-foundation.org>

On Tue, 2009-07-21 at 13:18 -0700, Andrew Morton wrote:
> On Tue, 21 Jul 2009 09:03:33 -0700
> Dave Hansen <dave@linux.vnet.ibm.com> wrote:
> > Once a structure goes over PAGE_SIZE*2, we see occasional
> > allocation failures.  Some people have chosen to switch
> > over to things like vmalloc() that will let them keep
> > array-like access to such a large structures.  But,
> > vmalloc() has plenty of downsides.
> 
> Thanks for looking into this.
> 
> > Here's an alternative.  I think it's what Andrew was
> > suggesting  here:
> > 
> > 	http://lkml.org/lkml/2009/7/2/518 
> > 
> > I call it a flexible array.  It does all of its work in
> > PAGE_SIZE bits, so never does an order>0 allocation.
> > The base level has PAGE_SIZE-2*sizeof(int) bytes of
> > storage for pointers to the second level.  So, with a
> > 32-bit arch, you get about 4MB (4183112 bytes) of total
> > storage when the objects pack nicely into a page.  It
> > is half that on 64-bit because the pointers are twice
> > the size.
> > 
> > The interface is dirt simple.  4 functions:
> > 	alloc_flex_array()
> > 	free_flex_array()
> 
> flex_array_alloc() and flex_array_free(), please.

Will do.

> The interface is rather unexpected.  Some callers will want
> random-access writes and will have sparse data sets.  Can we make it
> more array-like?

I changed the flex_array_put() function to take an index and added a
flex_array_append() that just calls flex_array_put() along with the
->nr_elements variable manipulations.

> What are the constraints of this implementation?  Maximum index, maximum
> number of elements, etc?
> 
> What are the locking rules?  Caller-provided, obviously (and
> correctly).  If the caller wants to use a spinlock the caller must use
> GFP_ATOMIC and handle the fallout when that fails.  (But they'd need to
> handle the fallout with mutex/GFP_KERNEL too).

I've added some comments in the kerneldoc for the (newly renamed) alloc
function:

 * The maximum number of elements is currently the number of elements
 * that can be stored in a page times the number of page pointers
 * that we can fit in the base structure or (using integer math):
 *
 *      (PAGE_SIZE/element_size) * (PAGE_SIZE-8)/sizeof(void *)
 *
 * Here's a table showing example capacities.  Note that the maximum
 * index that the get/put() functions is just nr_objects-1.
 *
 * Element size | Objects  | Objects |
 * PAGE_SIZE=4k |  32-bit  |  64-bit |
 * ----------------------------------|
 *      1 byte  |  4186112 | 2093056 |
 *      2 bytes |  2093056 | 1046528 |
 *      3 bytes |  1395030 |  697515 |
 *      4 bytes |  1046528 |  523264 |
 *     32 bytes |   130816 |   65408 |
 *     33 bytes |   126728 |   63364 |
 *   2048 bytes |     2044 |   10228 |
 *   2049 bytes |     1022 |     511 |
 *       void * |  1046528 |  261632 |
 *
 * Since 64-bit pointers are twice the size, we lose half the
 * capacity in the base structure.  Also note that no effort is made
 * to efficiently pack objects across page boundaries.

I figure it's less likely to get lost there than in the patch
description.

> > We could also potentially just pass the "element_size"
> > into each of the API functions instead of storing it
> > internally.  That would get us one more base pointer
> > on 32-bit.
> > 
> > The last improvement that I thought about was letting
> > the individual array members span pages.  In this
> > implementation, if you have a 2049-byte object, it
> > will only pack one of them into each "part" with
> > no attempt to pack them.  At this point, I don't think
> > the added complexity would be worth it.
> 
> I expect the majority of callers will be storing plain old pointers in here.
> 
> In fact the facility would be quite useful if it explicitly stored and
> returned void*'s, like radix-tree and IDR.

I think that would be just as easy as sticking some helpers around it

int flex_array_put_ptr(struct flex_array *fa, int element_nr,
		       void *ptr, gfp_t flags)
{
	return flex_array_put(fa, element_nr, &ptr, flags);
}
void *flex_array_get_ptr(struct flex_array *fa, int element_nr)
{
	return *(void **)flex_array_get(fa, element_nr);
}

Or, as you say, we may not want the store-by-value semantics at all.  In
that case, we can just do this by default.

> Do we know of any potential callers which would want flex_array to
> store elements by value in this manner?

The original user I was thinking of was the one that spawned this idea,
the pid_t:

>         if (PIDLIST_REALLOC_DIFFERENCE(length, dest)) {
> -               newlist = kmalloc(dest * sizeof(pid_t), GFP_KERNEL);
> +               newlist = pidlist_allocate(dest);
>                 if (newlist) {
>                         memcpy(newlist, list, dest * sizeof(pid_t));
> -                       kfree(list);
> +                       pidlist_free(list);
>                         *p = newlist;
>                 }

I think they wanted elements by value.

> > +struct flex_array *alloc_flex_array(int element_size, int total, gfp_t flags)
> > +{
> > +	struct flex_array *ret;
> > +	int max_size = __nr_part_ptrs() * __elements_per_part(element_size);
> > +
> > +	/* max_size will end up 0 if element_size > PAGE_SIZE */
> > +	if (total > max_size)
> > +		return NULL;
> > +	ret = kzalloc(sizeof(struct flex_array), flags);
> > +	if (!ret)
> > +		return NULL;
> > +	ret->element_size = element_size;
> > +	return ret;
> > +}
> 
> I expect that a decent proportion of users of this facility will only
> ever want a single flex_array.  So they'll want to be able to define and
> initialise their flex_array at compile-time.
> 
> That looks pretty easy to do?

Initializing the base structure should be pretty easy.  I've included a
FLEX_ARRAY_INIT() function to do it.

New patch on the way...

-- Dave

next prev parent reply	other threads:[~2009-07-21 21:58 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-07-21 16:03 [RFC][PATCH] flexible array implementation Dave Hansen
2009-07-21 16:47 ` Mike Waychison
2009-07-21 17:22   ` Dave Hansen
2009-07-21 16:57 ` Denys Vlasenko
2009-07-21 17:25   ` Dave Hansen
2009-07-21 20:18 ` Andrew Morton
2009-07-21 21:56   ` Dave Hansen [this message]
2009-07-21 22:33     ` Andrew Morton
2009-07-21 20:28 ` Paul Menage

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1248213411.13249.5669.camel@nimitz \
    --to=dave@linux.vnet.ibm.com \
    --cc=akpm@linux-foundation.org \
    --cc=bblum@google.com \
    --cc=containers@lists.linux-foundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=menage@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox