* Common hash table implementation
@ 2001-07-18 0:57 Brian J. Watson
2001-07-18 1:34 ` Larry McVoy
2001-07-18 9:48 ` Richard Guenther
0 siblings, 2 replies; 16+ messages in thread
From: Brian J. Watson @ 2001-07-18 0:57 UTC (permalink / raw)
To: Linux Kernel, schoebel
A couple of days ago, I was thinking about a common hash table
implementation, ala include/linux/list.h. Then I came across
include/linux/ghash.h, and thought that someone's already done it.
After that I noticed the copyright line said 1997, and a quick check
in cscope showed that nobody's including it.
Does anyone know if this file is worth studying and working with? I
have to wonder if nobody's using it after four years.
Does anyone see a problem with a common hash table implementation?
I've implemented a few hash tables from scratch for our clustering
work, and it's starting to get a little old. Something easy to use
like list.h would be a lot nicer.
--
Brian Watson | "The common people of England... so
Linux Kernel Developer | jealous of their liberty, but like the
Open SSI Clustering Lab | common people of most other countries
Compaq Computer Corp | never rightly considering wherein it
Los Angeles, CA | consists..."
| -Adam Smith, Wealth of Nations, 1776
mailto:Brian.J.Watson@compaq.com
http://opensource.compaq.com/
^ permalink raw reply [flat|nested] 16+ messages in thread* Re: Common hash table implementation 2001-07-18 0:57 Common hash table implementation Brian J. Watson @ 2001-07-18 1:34 ` Larry McVoy 2001-07-18 13:46 ` Daniel Phillips 2001-07-22 2:23 ` Rusty Russell 2001-07-18 9:48 ` Richard Guenther 1 sibling, 2 replies; 16+ messages in thread From: Larry McVoy @ 2001-07-18 1:34 UTC (permalink / raw) To: Brian J. Watson; +Cc: Linux Kernel, schoebel We've got a fairly nice hash table interface in BitKeeper that we'd be happy to provide under the GPL. I've always thought it would be cool to have it in the kernel, we use it everywhere. http://bitmover.com:8888//home/bk/bugfixes/src/src/mdbm will let you browse it. The general interface is gdbm() like and there are both file backed and memory backed versions. It was designed to be useful in small and large configs, you can get a hash into 128 bytes if I recall correctly. On Tue, Jul 17, 2001 at 05:57:25PM -0700, Brian J. Watson wrote: > A couple of days ago, I was thinking about a common hash table > implementation, ala include/linux/list.h. Then I came across > include/linux/ghash.h, and thought that someone's already done it. > After that I noticed the copyright line said 1997, and a quick check > in cscope showed that nobody's including it. > > Does anyone know if this file is worth studying and working with? I > have to wonder if nobody's using it after four years. > > Does anyone see a problem with a common hash table implementation? > I've implemented a few hash tables from scratch for our clustering > work, and it's starting to get a little old. Something easy to use > like list.h would be a lot nicer. > > -- > Brian Watson | "The common people of England... so > Linux Kernel Developer | jealous of their liberty, but like the > Open SSI Clustering Lab | common people of most other countries > Compaq Computer Corp | never rightly considering wherein it > Los Angeles, CA | consists..." > | -Adam Smith, Wealth of Nations, 1776 > > mailto:Brian.J.Watson@compaq.com > http://opensource.compaq.com/ > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ -- --- Larry McVoy lm at bitmover.com http://www.bitmover.com/lm ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Common hash table implementation 2001-07-18 1:34 ` Larry McVoy @ 2001-07-18 13:46 ` Daniel Phillips 2001-07-21 0:24 ` Brian J. Watson 2001-07-22 2:23 ` Rusty Russell 1 sibling, 1 reply; 16+ messages in thread From: Daniel Phillips @ 2001-07-18 13:46 UTC (permalink / raw) To: Larry McVoy, Brian J. Watson; +Cc: Linux Kernel, schoebel On Wednesday 18 July 2001 03:34, Larry McVoy wrote: > We've got a fairly nice hash table interface in BitKeeper that we'd > be happy to provide under the GPL. I've always thought it would be > cool to have it in the kernel, we use it everywhere. > > http://bitmover.com:8888//home/bk/bugfixes/src/src/mdbm Oh goodie, lots of new hash functions to test :-) I'll pass the interesting ones on to the guys with the serious hash-testing equipment. I think the original poster was thinking more along the lines of a generic insertion, deletion and lookup interface, which we are now doing in an almost-generic way in a few places. Once place that is distinctly un-generic is the buffer hash, for no good reason that I can see. This would be a good starting point for a demonstration. -- Daniel ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Common hash table implementation 2001-07-18 13:46 ` Daniel Phillips @ 2001-07-21 0:24 ` Brian J. Watson 2001-07-21 20:25 ` Daniel Phillips 0 siblings, 1 reply; 16+ messages in thread From: Brian J. Watson @ 2001-07-21 0:24 UTC (permalink / raw) To: Daniel Phillips, Larry McVoy; +Cc: Linux Kernel Daniel Phillips wrote: > > On Wednesday 18 July 2001 03:34, Larry McVoy wrote: > > We've got a fairly nice hash table interface in BitKeeper that we'd > > be happy to provide under the GPL. I've always thought it would be > > cool to have it in the kernel, we use it everywhere. > > > > http://bitmover.com:8888//home/bk/bugfixes/src/src/mdbm Thanks, Larry. Your hashing functions are much more sophisticated than the simple modulo operator I've been using for hashing by inode number. > I think the original poster was thinking more along the lines of a > generic insertion, deletion and lookup interface, which we are now > doing in an almost-generic way in a few places. Once place that is > distinctly un-generic is the buffer hash, for no good reason that I > can see. This would be a good starting point for a demonstration. > Daniel's correct. I'm hashing function agnostic, but would like some common code to simplify the management of a hash table. Richard Guenther sent the following link to his own common hashing code, which makes nice use of pseudo-templates: http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/~checkout~/glame/glame/src/include/hash.h?rev=1.5&content-type=text/plain A few things I would consider changing are: - ditching the pprev pointer - encapsulating the next pointer inside a struct hash_head_##FOOBAR - stripping out the hard-coded hashing function, and allowing the user to provide their own All the backslashes offend my aesthetic sensibility, but the preprocessor provides no alternative. ;) -- Brian Watson | "The common people of England... so Linux Kernel Developer | jealous of their liberty, but like the Open SSI Clustering Project | common people of most other countries Compaq Computer Corp | never rightly considering wherein it Los Angeles, CA | consists..." | -Adam Smith, Wealth of Nations, 1776 mailto:Brian.J.Watson@compaq.com http://opensource.compaq.com/ ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Common hash table implementation 2001-07-21 0:24 ` Brian J. Watson @ 2001-07-21 20:25 ` Daniel Phillips 2001-07-22 10:18 ` Richard Guenther ` (2 more replies) 0 siblings, 3 replies; 16+ messages in thread From: Daniel Phillips @ 2001-07-21 20:25 UTC (permalink / raw) To: Brian J. Watson, Larry McVoy; +Cc: Linux Kernel On Saturday 21 July 2001 02:24, Brian J. Watson wrote: > Daniel Phillips wrote: > > On Wednesday 18 July 2001 03:34, Larry McVoy wrote: > > > We've got a fairly nice hash table interface in BitKeeper that > > > we'd be happy to provide under the GPL. I've always thought it > > > would be cool to have it in the kernel, we use it everywhere. > > > > > > http://bitmover.com:8888//home/bk/bugfixes/src/src/mdbm > > Thanks, Larry. Your hashing functions are much more sophisticated > than the simple modulo operator I've been using for hashing by inode > number. Yes, I tested almost all of them to see how well they worked my directory index application. There are really only two criterea: 1) How random is the hash 2) How efficient is it My testing was hardly what you would call rigorous. Basically, what I do is hash a lot of very unrandom strings and see how uniform the resulting hash bucket distribution is. The *only* function from Larry's set that did well on the randomness side is the linear congruential hash - it did nearly as well as my dx_hack_hash. Surprisingly, at least to me, the CRC32 turned in an extremely variable performance. With a small number of buckets (say 100) it did ok, but with a larger numbers it showed a very lumpy distribution. Yes, this is way too imprecise a way of describing what happened and I should take a closer look at it. I don't have the mathematical background to be really sure about this, but I suspect CRC32 isn't optimized at all for randomness - it's optimized for detecting bit errors and has good properties with respect to neighbouring bits, properties that are no use at all to a randomizing funciton. Anyway, I wasn't all that unhappy to see CRC32 turn in a poor performance for two reasons: a) the 1K xor table would represent a 25% increase of the indexing code and b) hashing through the table eats an extra 1K of precious L1 cache. The linear congruential hash from Larry's set and my dx_hack_hash share a common characteristic: they both munge each character against a pseudorandom sequence. In Larry's hash it's a linear congruential sequence, and in my case it's a feedback shift register. In addition, I use a multiply to spread the effect of each character over a broader range of bits. Larry's hash doesn't do this and you can see right away that strings that vary only in the last character aren't going to be distributed very randomly. It might work a little better with the hashing step spelled this way: - ((h) = 0x63c63cd9*(h) + 0x9c39c33d + (c)) + ((h) = 0x63c63cd9*(h + (c)) + 0x9c39c33d) I haven't tried this, but I will. There are people out there who know a lot more about analyzing hash functions than I do, and I have their names somewhere in my mailbox. I'll go look them up soon and submit for proper testing the whole batch of functions that have been suggested to me over the last few months. By the way, in case you haven't already deduced this, this stuff is really time consuming. [...] > Richard Guenther sent the following link to his own common hashing > code, which makes nice use of pseudo-templates: > > http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/~checkout~/glame/glame >/src/include/hash.h?rev=1.5&content-type=text/plain > > A few things I would consider changing are: > > - ditching the pprev pointer Yes, you want to use the generic list macros there. > - encapsulating the next pointer inside a struct hash_head_##FOOBAR I think the generic list macros give you that for free. > - stripping out the hard-coded hashing function, and allowing the > user to provide their own Naturally. And trying to reduce the size of the macros. It's not that easy to get stuff that has dozens of lines ending with "\" into the kernel. You might have better luck just generalizing a few short sets of common operations used in hashes, and showing examples of how you'd use them to rewrite some of the existing hash code. Obviously, the new, improved approach has to be no less efficient than the current way of doing things. > All the backslashes offend my aesthetic sensibility, but the > preprocessor provides no alternative. ;) It's hard to argue against using inlines there. It's true that there are a lot of generalizations you just can't do with inlines, but so what? What matters is how efficient the generated code is and to a lesser extent, how readable the source is. You could make that source quite a bit more readable with a few *small* macros and some inline functions. Suggestion: express the bucket probe as an inline, compute the hash outside and pass it in. Then you can wrap the whole thing up in a really short macro. -- Daniel ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Common hash table implementation 2001-07-21 20:25 ` Daniel Phillips @ 2001-07-22 10:18 ` Richard Guenther 2001-07-23 14:36 ` Daniel Phillips 2001-07-22 16:37 ` Larry McVoy 2001-07-22 23:34 ` Eyal Lebedinsky 2 siblings, 1 reply; 16+ messages in thread From: Richard Guenther @ 2001-07-22 10:18 UTC (permalink / raw) To: Daniel Phillips; +Cc: Brian J. Watson, Larry McVoy, Linux Kernel On Sat, 21 Jul 2001, Daniel Phillips wrote: > On Saturday 21 July 2001 02:24, Brian J. Watson wrote: > > Daniel Phillips wrote: > > Richard Guenther sent the following link to his own common hashing > > code, which makes nice use of pseudo-templates: > > > > http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/~checkout~/glame/glame > >/src/include/hash.h?rev=1.5&content-type=text/plain > > > > A few things I would consider changing are: > > > > - ditching the pprev pointer > > Yes, you want to use the generic list macros there. You get one-pointer size hash table entries and generic deletion from it. > > - encapsulating the next pointer inside a struct hash_head_##FOOBAR > > I think the generic list macros give you that for free. Umm, if you use such, you get lists.h style type-casting stuff which doesnt have a nice interface as my_type *hash_find_my() instead you'd get hash_dead_my *hash_find_my() which you'll have to cast with something like the list_entry() macro. > > - stripping out the hard-coded hashing function, and allowing the > > user to provide their own Ok, if the two expressions are not generic enough. > Naturally. And trying to reduce the size of the macros. It's not that > easy to get stuff that has dozens of lines ending with "\" into the > kernel. You might have better luck just generalizing a few short sets > of common operations used in hashes, and showing examples of how you'd > use them to rewrite some of the existing hash code. Obviously, the > new, improved approach has to be no less efficient than the current way > of doing things. All those \s are to encapsulate the whole thing into the template-style macro - not that I like this, but I cannot see an alternative. > > All the backslashes offend my aesthetic sensibility, but the > > preprocessor provides no alternative. ;) > > It's hard to argue against using inlines there. It's true that there > are a lot of generalizations you just can't do with inlines, but so > what? What matters is how efficient the generated code is and to a > lesser extent, how readable the source is. You could make that source > quite a bit more readable with a few *small* macros and some inline > functions. Suggestion: express the bucket probe as an inline, compute > the hash outside and pass it in. Then you can wrap the whole thing up > in a really short macro. Ok, with an approach like the list.h one (struct hash_head) you'd get there, but of course without automatic type conversion (and safety). Of course, if you can tidy up the macro without changing to use a hash_head structure, I'd be glad to see how :) Richard. -- Richard Guenther <richard.guenther@uni-tuebingen.de> WWW: http://www.tat.physik.uni-tuebingen.de/~rguenth/ The GLAME Project: http://www.glame.de/ ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Common hash table implementation 2001-07-22 10:18 ` Richard Guenther @ 2001-07-23 14:36 ` Daniel Phillips 0 siblings, 0 replies; 16+ messages in thread From: Daniel Phillips @ 2001-07-23 14:36 UTC (permalink / raw) To: Richard Guenther; +Cc: Brian J. Watson, Larry McVoy, Linux Kernel On Sunday 22 July 2001 12:18, Richard Guenther wrote: > On Sat, 21 Jul 2001, Daniel Phillips wrote: > > On Saturday 21 July 2001 02:24, Brian J. Watson wrote: > > > Daniel Phillips wrote: > > > Richard Guenther sent the following link to his own common > > > hashing code, which makes nice use of pseudo-templates: > > > > > > http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/~checkout~/glame/g > > >lame /src/include/hash.h?rev=1.5&content-type=text/plain > > > > > > A few things I would consider changing are: > > > > > > - ditching the pprev pointer > > > > Yes, you want to use the generic list macros there. > > You get one-pointer size hash table entries and generic deletion from > it. Having thought about it a little more, I don't see why the symmetric double link (i.e., like the page hash, not like the buffer hash) couldn't be used with single-pointer tables, using similar tests for NULL in insertion and deletion. > > > - encapsulating the next pointer inside a struct > > > hash_head_##FOOBAR > > > > I think the generic list macros give you that for free. > > Umm, if you use such, you get lists.h style type-casting stuff which > doesnt have a nice interface as You could wrap the final product in inlines with internal typecasts. As you pointed out, C just doesn't have templates. > > Naturally. And trying to reduce the size of the macros. It's not > > that easy to get stuff that has dozens of lines ending with "\" > > into the kernel. You might have better luck just generalizing a > > few short sets of common operations used in hashes, and showing > > examples of how you'd use them to rewrite some of the existing hash > > code. Obviously, the new, improved approach has to be no less > > efficient than the current way of doing things. > > All those \s are to encapsulate the whole thing into the > template-style macro - not that I like this, but I cannot see an > alternative. An obvious alternative is to leave things the way they are. I've noticed that this is a stance Linus typically adopts for improvements that aren't clearly better in every way ;-) -- Daniel ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Common hash table implementation 2001-07-21 20:25 ` Daniel Phillips 2001-07-22 10:18 ` Richard Guenther @ 2001-07-22 16:37 ` Larry McVoy 2001-07-23 14:24 ` Daniel Phillips 2001-07-22 23:34 ` Eyal Lebedinsky 2 siblings, 1 reply; 16+ messages in thread From: Larry McVoy @ 2001-07-22 16:37 UTC (permalink / raw) To: Daniel Phillips; +Cc: Brian J. Watson, Larry McVoy, Linux Kernel On Sat, Jul 21, 2001 at 10:25:51PM +0200, Daniel Phillips wrote: > 1) How random is the hash > 2) How efficient is it The hash is not the only part to consider for performance. The rest of the code is important as well. The code I pointed you to has been really carefully tuned for performance. And it can be made to be MP safe, SGI did that and managed to get 455,000 random fetches/second on an 8 way R4400 (each of these is about the same as the original Pentium at 150Mhz). -- --- Larry McVoy lm at bitmover.com http://www.bitmover.com/lm ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Common hash table implementation 2001-07-22 16:37 ` Larry McVoy @ 2001-07-23 14:24 ` Daniel Phillips 0 siblings, 0 replies; 16+ messages in thread From: Daniel Phillips @ 2001-07-23 14:24 UTC (permalink / raw) To: Larry McVoy; +Cc: Brian J. Watson, Larry McVoy, Linux Kernel On Sunday 22 July 2001 18:37, Larry McVoy wrote: > On Sat, Jul 21, 2001 at 10:25:51PM +0200, Daniel Phillips wrote: > > 1) How random is the hash > > 2) How efficient is it > > The hash is not the only part to consider for performance. The rest > of the code is important as well. The code I pointed you to has been > really carefully tuned for performance. Yes, I can see that. The linear congruential hash will be faster than the CRC32 on most modern machines, where we have fast multiplies vs multi-cycle table access. If it's true that the CRC32 is actually less random as well, I'd consider dropping the others and just going with the linear congruential hash. > And it can be made to be MP > safe, SGI did that and managed to get 455,000 random fetches/second > on an 8 way R4400 (each of these is about the same as the original > Pentium at 150Mhz). Did I mention that your linear congruential hash rated among the best of all I've tested? It's possible it might be further improved along the lines I suggested. I'll try this pretty soon. -- Daniel ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Common hash table implementation 2001-07-21 20:25 ` Daniel Phillips 2001-07-22 10:18 ` Richard Guenther 2001-07-22 16:37 ` Larry McVoy @ 2001-07-22 23:34 ` Eyal Lebedinsky 2001-07-24 12:57 ` Daniel Phillips 2 siblings, 1 reply; 16+ messages in thread From: Eyal Lebedinsky @ 2001-07-22 23:34 UTC (permalink / raw) To: Linux Kernel Daniel Phillips wrote: > Yes, I tested almost all of them to see how well they worked my > directory index application. There are really only two criterea: > > 1) How random is the hash > 2) How efficient is it > > My testing was hardly what you would call rigorous. Basically, what I > do is hash a lot of very unrandom strings and see how uniform the Actually, to measure the randomness you need to measure the randomness of the output in the face of non-random input. Most well constructed hash functions perform well when the strings are random, however real world data (e.g. directory contntent) is not random at all. Efficiency should measure both space and time resources. If it should work in a multithreaded situation then another level of complexity is added. -- Eyal Lebedinsky (eyal@eyal.emu.id.au) <http://samba.anu.edu.au/eyal/> ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Common hash table implementation 2001-07-22 23:34 ` Eyal Lebedinsky @ 2001-07-24 12:57 ` Daniel Phillips 0 siblings, 0 replies; 16+ messages in thread From: Daniel Phillips @ 2001-07-24 12:57 UTC (permalink / raw) To: Eyal Lebedinsky, Linux Kernel On Monday 23 July 2001 01:34, Eyal Lebedinsky wrote: > Daniel Phillips wrote: > > Yes, I tested almost all of them to see how well they worked my > > directory index application. There are really only two criterea: > > > > 1) How random is the hash > > 2) How efficient is it > > > > My testing was hardly what you would call rigorous. Basically, > > what I do is hash a lot of very unrandom strings and see how > > uniform the > > Actually, to measure the randomness you need to measure the > randomness of the output in the face of non-random input. This is exactly what I do. > Most well constructed > hash functions perform well when the strings are random, however real > world data (e.g. directory contntent) is not random at all. I think you meant to say there, "even many poorly constructed hash functions perform well when..." > Efficiency should measure both space and time resources. If it should > work in a multithreaded situation then another level of complexity is > added. Sure, I could have added "how big is it". For me, that's just another kind of efficiency. Writing the code so it's reentrant is just good practice. There is no excuse whatsoever for not doing that for something simple like a hash function, even if you yourself never expect to run two copies concurrently. -- Daniel ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Common hash table implementation 2001-07-18 1:34 ` Larry McVoy 2001-07-18 13:46 ` Daniel Phillips @ 2001-07-22 2:23 ` Rusty Russell 2001-07-24 12:28 ` Daniel Phillips 1 sibling, 1 reply; 16+ messages in thread From: Rusty Russell @ 2001-07-22 2:23 UTC (permalink / raw) To: Larry McVoy; +Cc: Linux Kernel In message <20010717183410.S29668@work.bitmover.com> you write: > We've got a fairly nice hash table interface in BitKeeper that we'd be > happy to provide under the GPL. I've always thought it would be cool > to have it in the kernel, we use it everywhere. > > http://bitmover.com:8888//home/bk/bugfixes/src/src/mdbm Hmmm.... cf. tdb on sourceforge. Although having code to be mmap or read/write backed is overkill in the kernel. Interestingly, there's an unused, undocumented hash table interface in include/linux/ghash.h. Cheers, Rusty. -- Premature optmztion is rt of all evl. --DK ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Common hash table implementation 2001-07-22 2:23 ` Rusty Russell @ 2001-07-24 12:28 ` Daniel Phillips 0 siblings, 0 replies; 16+ messages in thread From: Daniel Phillips @ 2001-07-24 12:28 UTC (permalink / raw) To: Rusty Russell, Larry McVoy; +Cc: Linux Kernel On Sunday 22 July 2001 04:23, Rusty Russell wrote: > Interestingly, there's an unused, undocumented hash table interface > in include/linux/ghash.h. Yikes: #define DEF_HASH(LINKAGE,NAME,HASHSIZE,TYPE,PTRS,KEYTYPE,KEY,KEYCMP,KEYEQ,HASHFN)\ -- Daniel ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Common hash table implementation 2001-07-18 0:57 Common hash table implementation Brian J. Watson 2001-07-18 1:34 ` Larry McVoy @ 2001-07-18 9:48 ` Richard Guenther 1 sibling, 0 replies; 16+ messages in thread From: Richard Guenther @ 2001-07-18 9:48 UTC (permalink / raw) To: Brian J. Watson; +Cc: Linux Kernel, schoebel On Tue, 17 Jul 2001, Brian J. Watson wrote: > A couple of days ago, I was thinking about a common hash table > implementation, ala include/linux/list.h. Then I came across > include/linux/ghash.h, and thought that someone's already done it. > After that I noticed the copyright line said 1997, and a quick check > in cscope showed that nobody's including it. > > Does anyone know if this file is worth studying and working with? I > have to wonder if nobody's using it after four years. > > Does anyone see a problem with a common hash table implementation? > I've implemented a few hash tables from scratch for our clustering > work, and it's starting to get a little old. Something easy to use > like list.h would be a lot nicer. You may look at http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/~checkout~/glame/glame/src/include/hash.h?rev=1.5&content-type=text/plain which is essentially a automatic generator of code for static hash tables like those from linux/mm/ Richard. -- Richard Guenther <richard.guenther@uni-tuebingen.de> WWW: http://www.tat.physik.uni-tuebingen.de/~rguenth/ The GLAME Project: http://www.glame.de/ ^ permalink raw reply [flat|nested] 16+ messages in thread
[parent not found: <oupitgqjxoi.fsf@pigdrop.muc.suse.de>]
* Re: Common hash table implementation [not found] <oupitgqjxoi.fsf@pigdrop.muc.suse.de> @ 2001-07-20 22:32 ` Brian J. Watson 2001-07-21 22:57 ` Daniel Phillips 0 siblings, 1 reply; 16+ messages in thread From: Brian J. Watson @ 2001-07-20 22:32 UTC (permalink / raw) To: Andi Kleen; +Cc: Linux Kernel Andi Kleen wrote: > It's a "fuzzy hash", which is a bit different from the normal hash and > probably not always appropiate. > > It was at one point used in the dcache but then later ripped out again > when the data structures changed. > Ahh. I'm not familiar with fuzzy hashes, but I suspected they might not be what I was interested in. > I would like to see a generic hash abstraction in the spirit of list.h > Especially if it would NOT use normal list_heads as open hash table > heads but instead single pointers for the head [currently some important > > hash tables like the inode hash are twice as big as needed because > each bucket is two pointers instead of one] > I agree. Hash tables such as inode_hashtable and dentry_hashtable are half as efficient under stress as they would otherwise be, because they use an array of list_heads. OTOH, I have no objections to using list_heads in other applications where a singly-linked list is all that's needed. Common code is a Good Thing. I'm just commenting specifically on hash table implementations, which tend to be used for really _big_ data structures. -- Brian Watson | "The common people of England... so Linux Kernel Developer | jealous of their liberty, but like the Open SSI Clustering Project | common people of most other countries Compaq Computer Corp | never rightly considering wherein it Los Angeles, CA | consists..." | -Adam Smith, Wealth of Nations, 1776 mailto:Brian.J.Watson@compaq.com http://opensource.compaq.com/ ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Common hash table implementation 2001-07-20 22:32 ` Brian J. Watson @ 2001-07-21 22:57 ` Daniel Phillips 0 siblings, 0 replies; 16+ messages in thread From: Daniel Phillips @ 2001-07-21 22:57 UTC (permalink / raw) To: Brian J. Watson, Andi Kleen; +Cc: Linux Kernel On Saturday 21 July 2001 00:32, Brian J. Watson wrote: > Andi Kleen wrote: > > hash tables like the inode hash are twice as big as needed because > > each bucket is two pointers instead of one] > > I agree. Hash tables such as inode_hashtable and dentry_hashtable are > half as efficient under stress as they would otherwise be, because > they use an array of list_heads. Whoops, yes. The buffer_head pprev strategy gives both the double linked list and a table vector of single links, at the expense of a few extra tests. For a one-size-fits-all hash strategy the question is, double-linked of singled-linked? The advantage of a double linked list for a hash is quick, generic deletion. Using the pprev strategy the disadvantage of double links in the table vector can be eliminated. The main advantage of the single link is space savings both in the table and in the structures. The disadvantage is having to do an extra bucket search on deletion, though this is partially offset by fewer pointer operations to perform insertion or deletion. It's hard to say which is fastest. It might turn out that the single linked strategy is actually faster, it really depends on typical depth of the buckets. A double-linked list deletion needs to follow two pointers and set two links, the single-linked list only one of each. There's a similar difference for insertion. So the extra overhead of insertion and deletion can be set off against say, three or four iterations of the bucket search loop. If buckets average six to eight elements deep, it's a tie. A more subtle cost of the double-link approach is the slight extra cache pressure due to the enlarged objects. This cost is incurred continously as the objects are used, it might very well add up to more than the cost of the final list traversal for the delete in the single-linked case. How about implementing both strategies, using your generic interface to make them look the same? And then seeing which is fastest in practice. -- Daniel ^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2001-07-24 13:57 UTC | newest]
Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2001-07-18 0:57 Common hash table implementation Brian J. Watson
2001-07-18 1:34 ` Larry McVoy
2001-07-18 13:46 ` Daniel Phillips
2001-07-21 0:24 ` Brian J. Watson
2001-07-21 20:25 ` Daniel Phillips
2001-07-22 10:18 ` Richard Guenther
2001-07-23 14:36 ` Daniel Phillips
2001-07-22 16:37 ` Larry McVoy
2001-07-23 14:24 ` Daniel Phillips
2001-07-22 23:34 ` Eyal Lebedinsky
2001-07-24 12:57 ` Daniel Phillips
2001-07-22 2:23 ` Rusty Russell
2001-07-24 12:28 ` Daniel Phillips
2001-07-18 9:48 ` Richard Guenther
[not found] <oupitgqjxoi.fsf@pigdrop.muc.suse.de>
2001-07-20 22:32 ` Brian J. Watson
2001-07-21 22:57 ` Daniel Phillips
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox