All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andrew Morton <akpm@digeo.com>
To: george anzinger <george@mvista.com>
Cc: Linus Torvalds <torvalds@transmeta.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 3/3] High-res-timers part 3 (posix to hrposix) take 20
Date: Mon, 09 Dec 2002 00:04:40 -0800	[thread overview]
Message-ID: <3DF44E98.DD173EE8@digeo.com> (raw)
In-Reply-To: 3DF4487C.67FD90EF@mvista.com

george anzinger wrote:
> 
> Andrew Morton wrote:
> >
> > george anzinger wrote:
> > >
> > > --- linux-2.5.50-bk7-kb/include/linux/id_reuse.h        Wed Dec 31 16:00:00 1969
> > > +++ linux/include/linux/id_reuse.h      Sat Dec  7 21:37:58 2002
> >
> > Maybe I'm thick, but this whole id_resue layer seems rather obscure.
> >
> > As it is being positioned as a general-purpose utility it needs
> > API documentation as well as a general description.
> 
> Hm... This whole thing came up to solve and issue related to
> having a finite number of timers.  The ID layer is just a
> way of saving a pointer to a given "thing" (a timer
> structure in this case) in a way that it can be recovered
> quickly.  It is really just a tree structure with 32
> branches (or is it sizeof long branches) at each node.
> There is a bit map to indicate if any free slots are
> available and if so under which branch.  This makes
> allocation of a new ID quite fast.  The "reuse" thing is
> there to separate it from the original code which
> "attempted" to not reuse and ID for some time.

Sounds a bit like the pid allocator?

Is the "don't reuse an ID for some time" requirement still there?

I think you can use radix trees for this.  Just put the pointer
to your "thing" direct into the tree.  The space overhead will
be about the same.

radix-trees do not currently have a "find next empty slot from this
offset" function but that is quite straightforward.  Not quite
as fast, unless an occupancy bitmap is added to the radix-tree
node.  That's something whcih I have done before - in fact it was
an array of occupancy maps so I could do an efficient in-order
gang lookup of "all dirty pages from this offset" and "all locked
pages from this offset".  It was before its time, and mouldered.

> ...
> > A lot of the functions in this header are too large to be inlined.
> 
> Hm...  What is "too large", i.e. how much code.

A few lines, I suspect.

>  Also, is it used more than once?

Don't trust the compiler too much ;)  Uninlining mpage_writepage()
saved a couple of hundred bytes of code, even though it has only
one call site.

> ...
> > Please, just open-code the locking.  This simply makes it harder to follow the
> > main code.
> 
> But makes it easy to change the lock method, to, for
> example, use irq or irqsave or "shudder" RCU.

A diligent programmer would visit all sites as part of that conversion
anyway.

> >
> > > +
> > > +static struct idr_layer *id_free;
> > > +static int id_free_cnt;
> >
> > hm.  We seem to have a global private freelist here.  Is the more SMP-friendly
> > slab not suitable?
> 
> There is a short local free list to avoid calling slab with
> a spinlock held.  Only enough entries are kept to allocate a
> new node at each branch from the root to leaf, and only for
> this reason.

Fair enough. There are similar requirements elsewhere and the plan
there is to create a page reservation API, so you can ensure that
the page allocator will be able to provide at least N pages.  Then
take the lock and go for it.

I have code for that which is about to bite the bit bucket.   But the
new version should be in place soon.   Other users will be radix tree
nodes, pte_chains and mm_chains (shared pagetable patch).

> ...
> >
> > Recursion!
> 
> Yes, it is a tree after all.

lib/radix_tree.c does everything iteratively.

> >
> > > +void idr_init(struct idr *idp)
> >
> > Please tell us a bit about this id layer: what problems it solves, how it
> > solves them, why it is needed and why existing kernel facilities are
> > unsuitable.
> >
> The prior version of the code had a CONFIG option to set the
> maximum number of timers.  This caused enough memory to be
> "compiled" in to keep pointers to this many timers.  The ID
> layer was invented (by Jim Houston, by the way) to eliminate
> this CONFIG thing.  If I were to ask for a capability from
> slab that would eliminate the need for this it would be the
> ability to, given an address and a slab pool, to validate
> that the address was "live" and from that pool.  I.e. that
> the address is a pointer to currently allocated block from
> that memory pool.  With this, I could just pass the address
> to the user as the timer_id.

That might cause problems with 64-bit kernel/32-bit userspace.
Passing out kernel addresses in this way may have other problems..

>  As it is, I need a way to give
> the user a handle that he can pass back that will allow me
> to quickly find his timer and, along the way, validate that
> he was not spoofing, or just plain confused.
> 
> So what the ID layer does is pass back an available <id>
> (which I can pass to the user) while storing a pointer to
> the timer which is <id>ed.  Later, given the <id>, it passes
> back the pointer, or NULL if the id is not in use.

OK.
 
> As I said above, the pointers are kept in "nodes" of 32
> along with a few bits of overhead, and these are arranged in
> a dynamic tree which grows as the number of allocated timers
> increases.  The depth of the tree is 1 for up to 32 , 2 for
> up to 1024, and so on.  The depth can never get beyond 5, by
> which time the system will, long since, be out of memory.
> At this time the leaf nodes are release when empty but the
> branch nodes are not.  (This is an enhancement saved for
> later, if it seems useful.)
> 
> I am open to a better method that solves the problem...

It seems reasonable.  It would be nice to be able to use radix trees,
but that's a lot of work if the patch isn't going anywhere.

If radix trees are unsuitable then yes, dressing this up as a
new core kernel capability (documentation!  separate patch!)
would be appropriate.

But I suspect the radix-tree _will_ suit, and it would be nice to grow
the usefulness of radix-trees rather than creating similar-but-different
trees.  We can do whizzy things with radix-trees; more than at present.

Of course, that was only a teeny part of your patch. I just happened
to spy it as it flew past.  Given that you're at rev 20, perhaps a
splitup and more accessible presentation would help.

  reply	other threads:[~2002-12-09 22:21 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2002-10-25 20:01 [PATCH 2/3] High-res-timers part 2 (x86 platform code) take 7 george anzinger
2002-10-25 21:47 ` Nicholas Wourms
2002-10-25 23:04   ` [PATCH 2/3] ac3 fix " george anzinger
2002-10-25 22:00 ` [PATCH 3/3] High-res-timers part 3 (posix to hrposix) " george anzinger
2002-10-29 19:37 ` [PATCH 3/3] High-res-timers part 3 (posix to hrposix) take 8 george anzinger
2002-10-29 19:58 ` george anzinger
2002-10-30 19:42 ` george anzinger
2002-10-30 19:59 ` george anzinger
2002-10-31 10:53 ` [PATCH 3/3] High-res-timers part 3 (posix to hrposix) take 10 george anzinger
2002-10-31 17:57 ` george anzinger
2002-11-04 21:12 ` [PATCH 3/3] High-res-timers part 3 (posix to hrposix) take 11 george anzinger
2002-11-05 10:58 ` [PATCH 3/3] High-res-timers part 3 (posix to hrposix) take 12 george anzinger
2002-11-06  6:32 ` [PATCH 3/3] High-res-timers part 3 (posix to hrposix) take 13 george anzinger
2002-11-13 18:37 ` [PATCH 3/3] High-res-timers part 3 (posix to hrposix) take 14 george anzinger
2002-11-18 21:56 ` [PATCH 3/3] High-res-timers part 3 (posix to hrposix) take 15 george anzinger
2002-11-21 10:29 ` [PATCH 3/3] High-res-timers part 3 (posix to hrposix) take 16 george anzinger
2002-11-25 20:17 ` [PATCH 3/3] High-res-timers part 3 (posix to hrposix) take 17 george anzinger
2002-11-28  0:43 ` [PATCH 3/3] High-res-timers part 3 (posix to hrposix) take 18 george anzinger
2002-12-06  9:32 ` [PATCH 3/3] High-res-timers part 3 (posix to hrposix) take 19 george anzinger
2002-12-08  7:48 ` [PATCH 3/3] High-res-timers part 3 (posix to hrposix) take 20 george anzinger
2002-12-08 23:34   ` Andrew Morton
2002-12-09  7:38     ` george anzinger
2002-12-09  8:04       ` Andrew Morton [this message]
2002-12-10  8:30         ` george anzinger
2002-12-10  9:24           ` Andrew Morton
2002-12-10  9:51             ` William Lee Irwin III
2002-12-10 23:39             ` george anzinger
2002-12-10 15:14           ` Joe Korty
2002-12-10 22:57             ` george anzinger
2002-12-09 12:34       ` george anzinger
2002-12-09 19:40       ` Andrew Morton
2002-12-09  9:48 ` [PATCH 3/3] High-res-timers part 3 (posix to hrposix) take 20.1 george anzinger
2002-12-20  9:52 ` [PATCH 3/3] High-res-timers part 3 (posix to hrposix) take 21 george anzinger
2002-12-30 23:51 ` [PATCH 3/3] High-res-timers part 3 (posix to hrposix) take 22 george anzinger
2003-01-04  0:29 ` [PATCH 3/3] High-res-timers part 3 (posix to hrposix) take 23 george anzinger
2003-01-08 23:12 ` [PATCH 3/3] High-res-timers part 3 (posix to hrposix) take 24 george anzinger
  -- strict thread matches above, loose matches on Subject: below --
2002-12-09 15:24 [PATCH 3/3] High-res-timers part 3 (posix to hrposix) take 20 Jim Houston
2002-12-09 22:35 ` William Lee Irwin III
2002-12-10  1:55   ` Jim Houston
2002-12-10  2:11     ` William Lee Irwin III
2002-12-10 16:41       ` Jim Houston

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3DF44E98.DD173EE8@digeo.com \
    --to=akpm@digeo.com \
    --cc=george@mvista.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=torvalds@transmeta.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.