From mboxrd@z Thu Jan 1 00:00:00 1970 From: "George Spelvin" Subject: Re: Git commit generation numbers Date: 17 Jul 2011 14:27:43 -0400 Message-ID: <20110717182743.14423.qmail@science.horizon.com> Cc: linux@horizon.com, torvalds@linux-foundation.org To: git@vger.kernel.org X-From: git-owner@vger.kernel.org Sun Jul 17 20:30:20 2011 Return-path: Envelope-to: gcvg-git-2@lo.gmane.org Received: from vger.kernel.org ([209.132.180.67]) by lo.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1QiW6e-0002Fr-9X for gcvg-git-2@lo.gmane.org; Sun, 17 Jul 2011 20:30:20 +0200 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756064Ab1GQS1p (ORCPT ); Sun, 17 Jul 2011 14:27:45 -0400 Received: from science.horizon.com ([71.41.210.146]:50488 "HELO science.horizon.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with SMTP id S1754577Ab1GQS1p (ORCPT ); Sun, 17 Jul 2011 14:27:45 -0400 Received: (qmail 14424 invoked by uid 1000); 17 Jul 2011 14:27:43 -0400 Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Archived-At: > The thing I hate about it is very fundamental: I think it's a hack around a basic git > design mistake. And it's a mistake we have known about for a long time. > > Now, I don't think it's a *fatal* mistake, but I do find it very broken to basically > say "we made a mistake in the original commit design, and instead of fixing it we > create a separate workaround for it". > > THAT I find distasteful. My reaction is that if we're going to add generation > numbers, then were should just do it the way we should have done them originally, > rather than as some separate hack. There are a few design mistakes in git. The way the object type and size are prefixed to the data for hasing purposes, which prevents aligned fetching from memory-mapped data in the hashing code, isn't too pretty either. But git has generally preferred to avoid storing information that can be recomputed. File renames are the big example. given this, why the heck store generation numbers? They *can* be computed on demand, so arguably they *should*. Cacheing is then an optimization, just like packs, pack indexes, the hashed object storage directories, and all that. I'm in the "make it a cache" camp, honestly. For example, here's a different possible generation number scheme. By making the generation number a cache, it becomes a valid alternative to experiment with. Simply store a topologically sorted list of commits. Each commit's position can serve as a generation number, and is greater than the positions of all ancestors. But by using the offset within the list, the number is stored implicitly. Generation numbers don't have to be consecutive as long as they're correctly ordered, so you could, e.g. choose to make them unique. I don't think this is actually worth it; I'm just using it as a not-completely-insane example of a different design that nonetheless achieves the same goal. Why freeze this in the object format?