From mboxrd@z Thu Jan 1 00:00:00 1970 From: Linus Torvalds Subject: Re: Fix object re-hashing Date: Sun, 12 Feb 2006 11:10:11 -0800 (PST) Message-ID: References: <7vaccwbf6v.fsf@assigned-by-dhcp.cox.net> Mime-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Cc: git@vger.kernel.org X-From: git-owner@vger.kernel.org Sun Feb 12 20:10:38 2006 Return-path: Envelope-to: gcvg-git@gmane.org Received: from vger.kernel.org ([209.132.176.167]) by ciao.gmane.org with esmtp (Exim 4.43) id 1F8Mc6-0005c6-4h for gcvg-git@gmane.org; Sun, 12 Feb 2006 20:10:26 +0100 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751415AbWBLTKQ (ORCPT ); Sun, 12 Feb 2006 14:10:16 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1751412AbWBLTKQ (ORCPT ); Sun, 12 Feb 2006 14:10:16 -0500 Received: from smtp.osdl.org ([65.172.181.4]:645 "EHLO smtp.osdl.org") by vger.kernel.org with ESMTP id S1751415AbWBLTKP (ORCPT ); Sun, 12 Feb 2006 14:10:15 -0500 Received: from shell0.pdx.osdl.net (fw.osdl.org [65.172.181.6]) by smtp.osdl.org (8.12.8/8.12.8) with ESMTP id k1CJABDZ017729 (version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO); Sun, 12 Feb 2006 11:10:11 -0800 Received: from localhost (shell0.pdx.osdl.net [10.9.0.31]) by shell0.pdx.osdl.net (8.13.1/8.11.6) with ESMTP id k1CJABQp016264; Sun, 12 Feb 2006 11:10:11 -0800 To: Junio C Hamano In-Reply-To: X-Spam-Status: No, hits=0 required=5 tests= X-Spam-Checker-Version: SpamAssassin 2.63-osdl_revision__1.68__ X-MIMEDefang-Filter: osdl$Revision: 1.129 $ X-Scanned-By: MIMEDefang 2.36 Sender: git-owner@vger.kernel.org Precedence: bulk X-Mailing-List: git@vger.kernel.org Archived-At: On Sun, 12 Feb 2006, Linus Torvalds wrote: > > - the "overflow of the overflow": when the linear probing itself > overflows the size of the hash queue, it will "change direction" by > overflowing back to index zero. > > Happily, the re-hashing does not need to care about this case, because > the new hash is bigger: the rule we have when doing the re-hashing is > that as we re-hash, the "i" entries we have already re-hashed are all > valid in the new hash, so even if overflow occurs, it will occur the > right way (and if it overflows all the way past the current "i", we'll > re-hash the already re-hashed entry anyway). Btw, this is only always true if the new hash is at least twice the size of the old hash, I think. Otherwise a re-hash can fill up the new entries and overflow entirely before we've actually even re-hashed all the old entries, and then we'd need to re-hash even the overflowed entries (which are now below "i"). If the new size is at least twice the old size, the "upper area" cannot overflow completely (there has to be empty room), and we cannot be in the situation that we need to move even the overflowed entries when we remove an old hash entry. Anyway, if all this makes you nervous, the conceptually much simpler way to do the re-sizing is to not do the in-place re-hashing. Instead of doing the xrealloc(), just do a "xmalloc()" of the new area, do the re-hashing (which now _must_ re-hash in just the "0..oldcount-1" old area) into the new area, and then free the old area after rehashing. That would make things more obviously correct, and perhaps simpler. Johannes, do you want to try that? Btw, as it currently stands, I worry a tiny tiny bit about the obj_allocs = (obj_allocs < 32 ? 32 : 2 * obj_allocs) thing, because I think that second "32" needs to be a "64" to be really safe (ie guarantee that the new obj_allocs value is always at least twice the old one). Anyway, I'm pretty sure people smarter than me have already codified exactly what needs to be done for a in-place rehash of a linear probe hash overflow algorithm. This must all be in some "hashing 101" book. I had to think it through from first principles rather than "knowing" what the right answer was (which probably means that I slept through some fundamental algorithms class in University ;) Linus