All of lore.kernel.org
 help / color / mirror / Atom feed
From: george anzinger <george@mvista.com>
To: Andrew Morton <akpm@digeo.com>
Cc: Linus Torvalds <torvalds@transmeta.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 3/3] High-res-timers part 3 (posix to hrposix) take 20
Date: Tue, 10 Dec 2002 00:30:36 -0800	[thread overview]
Message-ID: <3DF5A62C.242E171@mvista.com> (raw)
In-Reply-To: 3DF44E98.DD173EE8@digeo.com

Andrew Morton wrote:
> 
> george anzinger wrote:
> >
> > Andrew Morton wrote:
> > >
> > > george anzinger wrote:
> > > >
> > > > --- linux-2.5.50-bk7-kb/include/linux/id_reuse.h        Wed Dec 31 16:00:00 1969
> > > > +++ linux/include/linux/id_reuse.h      Sat Dec  7 21:37:58 2002
> > >
> > > Maybe I'm thick, but this whole id_resue layer seems rather obscure.
> > >
> > > As it is being positioned as a general-purpose utility it needs
> > > API documentation as well as a general description.
> >
> > Hm... This whole thing came up to solve and issue related to
> > having a finite number of timers.  The ID layer is just a
> > way of saving a pointer to a given "thing" (a timer
> > structure in this case) in a way that it can be recovered
> > quickly.  It is really just a tree structure with 32
> > branches (or is it sizeof long branches) at each node.
> > There is a bit map to indicate if any free slots are
> > available and if so under which branch.  This makes
> > allocation of a new ID quite fast.  The "reuse" thing is
> > there to separate it from the original code which
> > "attempted" to not reuse and ID for some time.
> 
> Sounds a bit like the pid allocator?
> 
> Is the "don't reuse an ID for some time" requirement still there?

I don't see the need for the "don't reuse an ID for some
time" thing and it looked like what Jim had messed up the
book keeping AND it also looked like it failed to actually
work.  All of this convinced me that the added complexity
was just not worth it.
> 
> I think you can use radix trees for this.  Just put the pointer
> to your "thing" direct into the tree.  The space overhead will
> be about the same.
> 
> radix-trees do not currently have a "find next empty slot from this
> offset" function but that is quite straightforward.  Not quite
> as fast, unless an occupancy bitmap is added to the radix-tree
> node.  That's something whcih I have done before - in fact it was
> an array of occupancy maps so I could do an efficient in-order
> gang lookup of "all dirty pages from this offset" and "all locked
> pages from this offset".  It was before its time, and mouldered.

Gosh, I think this is what I have.  Is it already in the
kernel tree somewhere?  Oh, I found it.  I will look at
this, tomorrow...

-g
> 
> > ...
> > > A lot of the functions in this header are too large to be inlined.
> >
> > Hm...  What is "too large", i.e. how much code.
> 
> A few lines, I suspect.
> 
> >  Also, is it used more than once?
> 
> Don't trust the compiler too much ;)  Uninlining mpage_writepage()
> saved a couple of hundred bytes of code, even though it has only
> one call site.
> 
> > ...
> > > Please, just open-code the locking.  This simply makes it harder to follow the
> > > main code.
> >
> > But makes it easy to change the lock method, to, for
> > example, use irq or irqsave or "shudder" RCU.
> 
> A diligent programmer would visit all sites as part of that conversion
> anyway.
> 
> > >
> > > > +
> > > > +static struct idr_layer *id_free;
> > > > +static int id_free_cnt;
> > >
> > > hm.  We seem to have a global private freelist here.  Is the more SMP-friendly
> > > slab not suitable?
> >
> > There is a short local free list to avoid calling slab with
> > a spinlock held.  Only enough entries are kept to allocate a
> > new node at each branch from the root to leaf, and only for
> > this reason.
> 
> Fair enough. There are similar requirements elsewhere and the plan
> there is to create a page reservation API, so you can ensure that
> the page allocator will be able to provide at least N pages.  Then
> take the lock and go for it.
> 
> I have code for that which is about to bite the bit bucket.   But the
> new version should be in place soon.   Other users will be radix tree
> nodes, pte_chains and mm_chains (shared pagetable patch).
> 
> > ...
> > >
> > > Recursion!
> >
> > Yes, it is a tree after all.
> 
> lib/radix_tree.c does everything iteratively.
> 
> > >
> > > > +void idr_init(struct idr *idp)
> > >
> > > Please tell us a bit about this id layer: what problems it solves, how it
> > > solves them, why it is needed and why existing kernel facilities are
> > > unsuitable.
> > >
> > The prior version of the code had a CONFIG option to set the
> > maximum number of timers.  This caused enough memory to be
> > "compiled" in to keep pointers to this many timers.  The ID
> > layer was invented (by Jim Houston, by the way) to eliminate
> > this CONFIG thing.  If I were to ask for a capability from
> > slab that would eliminate the need for this it would be the
> > ability to, given an address and a slab pool, to validate
> > that the address was "live" and from that pool.  I.e. that
> > the address is a pointer to currently allocated block from
> > that memory pool.  With this, I could just pass the address
> > to the user as the timer_id.
> 
> That might cause problems with 64-bit kernel/32-bit userspace.
> Passing out kernel addresses in this way may have other problems..
> 
> >  As it is, I need a way to give
> > the user a handle that he can pass back that will allow me
> > to quickly find his timer and, along the way, validate that
> > he was not spoofing, or just plain confused.
> >
> > So what the ID layer does is pass back an available <id>
> > (which I can pass to the user) while storing a pointer to
> > the timer which is <id>ed.  Later, given the <id>, it passes
> > back the pointer, or NULL if the id is not in use.
> 
> OK.
> 
> > As I said above, the pointers are kept in "nodes" of 32
> > along with a few bits of overhead, and these are arranged in
> > a dynamic tree which grows as the number of allocated timers
> > increases.  The depth of the tree is 1 for up to 32 , 2 for
> > up to 1024, and so on.  The depth can never get beyond 5, by
> > which time the system will, long since, be out of memory.
> > At this time the leaf nodes are release when empty but the
> > branch nodes are not.  (This is an enhancement saved for
> > later, if it seems useful.)
> >
> > I am open to a better method that solves the problem...
> 
> It seems reasonable.  It would be nice to be able to use radix trees,
> but that's a lot of work if the patch isn't going anywhere.
> 
> If radix trees are unsuitable then yes, dressing this up as a
> new core kernel capability (documentation!  separate patch!)
> would be appropriate.
> 
> But I suspect the radix-tree _will_ suit, and it would be nice to grow
> the usefulness of radix-trees rather than creating similar-but-different
> trees.  We can do whizzy things with radix-trees; more than at present.
> 
> Of course, that was only a teeny part of your patch. I just happened
> to spy it as it flew past.  Given that you're at rev 20, perhaps a
> splitup and more accessible presentation would help.
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

-- 
George Anzinger   george@mvista.com
High-res-timers: 
http://sourceforge.net/projects/high-res-timers/
Preemption patch:
http://www.kernel.org/pub/linux/kernel/people/rml

  reply	other threads:[~2002-12-10  8:23 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2002-10-25 20:01 [PATCH 2/3] High-res-timers part 2 (x86 platform code) take 7 george anzinger
2002-10-25 21:47 ` Nicholas Wourms
2002-10-25 23:04   ` [PATCH 2/3] ac3 fix " george anzinger
2002-10-25 22:00 ` [PATCH 3/3] High-res-timers part 3 (posix to hrposix) " george anzinger
2002-10-29 19:37 ` [PATCH 3/3] High-res-timers part 3 (posix to hrposix) take 8 george anzinger
2002-10-29 19:58 ` george anzinger
2002-10-30 19:42 ` george anzinger
2002-10-30 19:59 ` george anzinger
2002-10-31 10:53 ` [PATCH 3/3] High-res-timers part 3 (posix to hrposix) take 10 george anzinger
2002-10-31 17:57 ` george anzinger
2002-11-04 21:12 ` [PATCH 3/3] High-res-timers part 3 (posix to hrposix) take 11 george anzinger
2002-11-05 10:58 ` [PATCH 3/3] High-res-timers part 3 (posix to hrposix) take 12 george anzinger
2002-11-06  6:32 ` [PATCH 3/3] High-res-timers part 3 (posix to hrposix) take 13 george anzinger
2002-11-13 18:37 ` [PATCH 3/3] High-res-timers part 3 (posix to hrposix) take 14 george anzinger
2002-11-18 21:56 ` [PATCH 3/3] High-res-timers part 3 (posix to hrposix) take 15 george anzinger
2002-11-21 10:29 ` [PATCH 3/3] High-res-timers part 3 (posix to hrposix) take 16 george anzinger
2002-11-25 20:17 ` [PATCH 3/3] High-res-timers part 3 (posix to hrposix) take 17 george anzinger
2002-11-28  0:43 ` [PATCH 3/3] High-res-timers part 3 (posix to hrposix) take 18 george anzinger
2002-12-06  9:32 ` [PATCH 3/3] High-res-timers part 3 (posix to hrposix) take 19 george anzinger
2002-12-08  7:48 ` [PATCH 3/3] High-res-timers part 3 (posix to hrposix) take 20 george anzinger
2002-12-08 23:34   ` Andrew Morton
2002-12-09  7:38     ` george anzinger
2002-12-09  8:04       ` Andrew Morton
2002-12-10  8:30         ` george anzinger [this message]
2002-12-10  9:24           ` Andrew Morton
2002-12-10  9:51             ` William Lee Irwin III
2002-12-10 23:39             ` george anzinger
2002-12-10 15:14           ` Joe Korty
2002-12-10 22:57             ` george anzinger
2002-12-09 12:34       ` george anzinger
2002-12-09 19:40       ` Andrew Morton
2002-12-09  9:48 ` [PATCH 3/3] High-res-timers part 3 (posix to hrposix) take 20.1 george anzinger
2002-12-20  9:52 ` [PATCH 3/3] High-res-timers part 3 (posix to hrposix) take 21 george anzinger
2002-12-30 23:51 ` [PATCH 3/3] High-res-timers part 3 (posix to hrposix) take 22 george anzinger
2003-01-04  0:29 ` [PATCH 3/3] High-res-timers part 3 (posix to hrposix) take 23 george anzinger
2003-01-08 23:12 ` [PATCH 3/3] High-res-timers part 3 (posix to hrposix) take 24 george anzinger
  -- strict thread matches above, loose matches on Subject: below --
2002-12-09 15:24 [PATCH 3/3] High-res-timers part 3 (posix to hrposix) take 20 Jim Houston
2002-12-09 22:35 ` William Lee Irwin III
2002-12-10  1:55   ` Jim Houston
2002-12-10  2:11     ` William Lee Irwin III
2002-12-10 16:41       ` Jim Houston

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3DF5A62C.242E171@mvista.com \
    --to=george@mvista.com \
    --cc=akpm@digeo.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=torvalds@transmeta.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.