From mboxrd@z Thu Jan 1 00:00:00 1970 From: Jeff King Subject: Re: [PATCH 0/3] Using a bit more decoration Date: Mon, 8 Apr 2013 23:51:26 -0400 Message-ID: <20130409035126.GA17319@sigill.intra.peff.net> References: <20130408210903.GC9649@sigill.intra.peff.net> <1365462114-8630-1-git-send-email-gitster@pobox.com> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Cc: git@vger.kernel.org To: Junio C Hamano X-From: git-owner@vger.kernel.org Tue Apr 09 05:51:36 2013 Return-path: Envelope-to: gcvg-git-2@plane.gmane.org Received: from vger.kernel.org ([209.132.180.67]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1UPPap-0001nI-Lf for gcvg-git-2@plane.gmane.org; Tue, 09 Apr 2013 05:51:35 +0200 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S935675Ab3DIDvb (ORCPT ); Mon, 8 Apr 2013 23:51:31 -0400 Received: from 75-15-5-89.uvs.iplsin.sbcglobal.net ([75.15.5.89]:34667 "EHLO peff.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S935659Ab3DIDvb (ORCPT ); Mon, 8 Apr 2013 23:51:31 -0400 Received: (qmail 19276 invoked by uid 107); 9 Apr 2013 03:53:22 -0000 Received: from sigill.intra.peff.net (HELO sigill.intra.peff.net) (10.0.0.7) (smtp-auth username relayok, mechanism cram-md5) by peff.net (qpsmtpd/0.84) with ESMTPA; Mon, 08 Apr 2013 23:53:22 -0400 Received: by sigill.intra.peff.net (sSMTP sendmail emulation); Mon, 08 Apr 2013 23:51:26 -0400 Content-Disposition: inline In-Reply-To: <1365462114-8630-1-git-send-email-gitster@pobox.com> Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Archived-At: On Mon, Apr 08, 2013 at 04:01:51PM -0700, Junio C Hamano wrote: > This comes on top of the two "decorate" patches that introduced the > clear_decoration() function. > [...] > Junio C Hamano (3): > commit: shrink "indegree" field > commit: rename parse_commit_date() > commit: add get_commit_encoding() Neat. Reading through, I didn't notice anything obviously wrong with any of the patches (though there is one gcc warning, which I'll respond to separately). It does make me a little nervous to have code that almost never gets exercised (i.e., when indegree is really high, or a large number of encodings). It seems like a bug waiting to happen when somebody does hit that condition. But the decoration lookups do look fairly simple and easy to get right. > I am not sure if pre-parsing and sharing the encoding values in-core > would affect performance very much, but now we have 2 bytes (or 6 > bytes, if you are on 64-bit archs) free to use in "struct commit" > for later use (e.g. extra object flags that are relevant only for > commit objects, without having to widen "struct object"). I wasn't able to measure any speedup, but I'm not surprised; something like "git log" already spends way more effort extracting and pretty-printing the commit objects. I was actually thinking of something related to this recently. It would be nice to be able to allocate a slab of very efficient fixed-size per-commit or per-object storage. That would eliminate the need to pay the cost for "indegree" all the time; the topo-sort could allocate a slab of indegree integers, then throw it away when the sort was done. Similarly, a traversal would not have to worry about pollution of the object flags, nor about using an arbitrarily large number of flags[1], if it could allocate a separate flag structure. Using a separate hash would be too slow. I'm thinking more like giving each commit an integer "id" as it is parsed, and then using that id to index into a temporary slab (you'd grow the slab as you see more commits but that's OK, since we're always indexing it, not using a pointer). The slab management would get amortized, so the main cost would be the extra indirect memory access (i.e., hitting "flags[commit->id]" instead of "commit->id"), and the extra memory used by the id field. I dunno. Maybe those costs would make it too slow. I haven't actually coded or measured anything yet. -Peff [1] The thing that made me think about this was making a version of paint_down_to_common that could keep a separate color for each source, rather than only PARENT1 and PARENT2. That would let us do the "--contains" traversal in a breadth-first way, but calculate the answer simultaneously for all sources (i.e., avoid the depth-first "go to roots" problem that the current "tag --contains" has if you do not use timestamp cutoffs).