git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Andreas Ericsson <ae@op5.se>
To: "Shawn O. Pearce" <spearce@spearce.org>
Cc: Pierre Habouzit <madcoder@debian.org>,
	git@vger.kernel.org, Scott Chacon <schacon@gmail.com>
Subject: Re: libgit2 - a true git library
Date: Mon, 03 Nov 2008 11:17:38 +0100	[thread overview]
Message-ID: <490ECFC2.1070901@op5.se> (raw)
In-Reply-To: <20081102015041.GG15463@spearce.org>

Shawn O. Pearce wrote:
> Andreas Ericsson <ae@op5.se> wrote:
>> Shawn O. Pearce wrote:
>>> Eh, I disagree here.  In git.git today "struct commit" exposes its
>>> buffer with the canonical commit encoding.  Having that visible
>>> wrecks what Nico and I were thinking about doing with pack v4 and
>>> encoding commits in a non-canonical format when stored in packs.
>>> Ditto with trees.
>> Err... isn't that backwards?
> 
> No.
> 
>> Surely you want to store stuff in the
>> canonical format so you're forced to do as few translations as
>> possible?
> 
> No.  We suspect that canonical format is harder to decompress and
> parse during revision traversal.  Other encodings in the pack file
> may produce much faster runtime performance, and reduce page faults
> (due to smaller pack sizes).
> 
> We hardly ever use the canonical format for actual output; most
> output rips the canonical format apart and then formats the data
> the way it was requested.  If we have the data *already* parsed in
> the pack its much faster to output.
> 

I'll have to look into the pack v4 stuff, as I can't get what you're
saying to make sense to me. Not canonicalizing the data when storing
it means you'll have to have conversion routines from all the various
encodings, unless you first canonicalize it and then encode it in the
way you need it in the pack way. Never using canonical format means
it has no potential for combinatorial explosion, and every converter
needs to know about every other format.


>> Or are you trying to speed up packing by skipping the
>> canonicalization part?
> 
> Wrong; we're trying to speed up reading.  Packing may go slower,
> especially during the first conversion of v2->v4 for any given
> repository, but packing is infrequent so the minor (if any) drop
> in performance here is probably worth the reading performance gains.
> 
>> Well, if macro usage is adhered to one wouldn't have to worry,
>> since the macro can just be rewritten with a function later (if,
>> for example, translation or some such happens to be required).
>> Older code linking to a newer library would work (assuming the
>> size of the commit object doesn't change anyway),
> 
> You are assuming too much magic.  If the older ABI used a macro
> and the newer one (which supports pack v4) organized struct commit
> differently and the user upgrades libgit2.so the older applications
> just broke, horribly.
> 

Naturally. Re-sizing objects that weren't previously protected by
accessor functions will always break the ABI unless the layout of
the previously existing items in the object doesn't change, which
is exactly what I said.

> We know we want to do pack v4 in the near future.  Or at least
> experiment with it and see if it works.  If it does, we don't
> want to have to cause a major ABI breakage across all those newly
> installed libgit2s... yikes.
> 

That's not necessary, although it requires a bit more thought when
changing the objects.

> I'm really in favor of accessor functions for the first version of
> the library.  They can always be converted to macros once someone
> shows that their git visualizer program saves 10 ms on a 8,000 ms
> render operation by avoiding accessor functions.  I'd rather spend
> our brain cycles optimizing the runtime and the in-core data so
> we spend less time in our tight revision traversal loops.
> 

I agree that for early versions they should definitely be functions,
but since we've been talking about "final design" earlier in the
thread, that's what I was referring to here to.

> Seriously.  We make at least 10 or 11 function calls *per commit*
> that comes out of get_revision().  If the formatting application is
> really suffering from its 4 or 5 accessor function calls in order
> to get that returned data, we probably should also be looking at
> how we can avoid function cals in the library.
> 
> Oh, and even with 4 or 5 accessor functions per commit in the
> application that is *still* better than the 10 or so calls the
> application probably makes today scraping "git log --format=raw"
> off a pipe and segment it into the different fields it needs.
> 

Right.

> Unless pipes in Linux somehow allow negative time warping with
> CPU counters.  Though on dual-core systems they might, since the
> two processes can run on different cores.  But oh, you didn't want
> to worry about threading support too much in libgit2, so I guess
> you also don't want to use multi-core systems.
> 

All comps where I use git today are multi-core systems, so I'm very
much in favour of parallell processing. Otoh, I also don't think it
makes sense to jump through hoops to make sure one can, fe, read and
parse all tags in multiple threads simultaneously. In other words, I
don't think we need to bother adding thread-safety for stuff that 
really only should happen if the application is poorly designed
(unless it's straightforward to do so, ofcourse).

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

  reply	other threads:[~2008-11-03 10:19 UTC|newest]

Thread overview: 83+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-10-31 17:07 libgit2 - a true git library Shawn O. Pearce
2008-10-31 17:28 ` Pieter de Bie
2008-10-31 17:29   ` Pieter de Bie
2008-10-31 17:47 ` Pierre Habouzit
2008-10-31 18:41   ` Shawn O. Pearce
2008-10-31 18:54     ` Pierre Habouzit
2008-10-31 19:57       ` Shawn O. Pearce
2008-10-31 20:12         ` Pierre Habouzit
2008-10-31 20:05     ` Junio C Hamano
2008-10-31 21:58       ` Shawn O. Pearce
2008-11-01 17:30     ` Pierre Habouzit
2008-11-01 18:44       ` Andreas Ericsson
2008-11-01 18:48         ` Pierre Habouzit
2008-11-01 20:29         ` Shawn O. Pearce
2008-11-01 21:58           ` Andreas Ericsson
2008-11-02  1:50             ` Shawn O. Pearce
2008-11-03 10:17               ` Andreas Ericsson [this message]
2008-11-02  1:56       ` Shawn O. Pearce
2008-11-02  9:25         ` Pierre Habouzit
2008-10-31 20:24   ` Nicolas Pitre
2008-10-31 20:29     ` david
2008-10-31 20:56       ` Nicolas Pitre
2008-10-31 21:43         ` Shawn O. Pearce
2008-10-31 21:50           ` Shawn O. Pearce
2008-10-31 21:51           ` Pierre Habouzit
2008-10-31 21:31     ` Pierre Habouzit
2008-10-31 22:10       ` Nicolas Pitre
2008-11-01 10:52         ` Andreas Ericsson
2008-10-31 23:24       ` Pieter de Bie
2008-10-31 23:28         ` Shawn O. Pearce
2008-10-31 23:49           ` Junio C Hamano
2008-11-01  0:02             ` Pierre Habouzit
2008-11-01  0:19               ` Shawn O. Pearce
2008-11-01  1:02                 ` Pierre Habouzit
2008-11-01  0:13             ` Shawn O. Pearce
2008-11-01  1:15               ` Nicolas Pitre
2008-11-01  1:19                 ` Shawn O. Pearce
2008-11-01  1:45                   ` Nicolas Pitre
2008-11-01  1:52                     ` Shawn O. Pearce
2008-11-01  2:26                       ` Johannes Schindelin
2008-11-01 11:01                         ` Pierre Habouzit
2008-11-01 13:50                           ` Nicolas Pitre
2008-11-01 17:01                             ` Pierre Habouzit
2008-11-01 20:26                             ` Johannes Schindelin
2008-10-31 23:14     ` Junio C Hamano
2008-10-31 23:33       ` Pierre Habouzit
2008-10-31 23:41       ` Shawn O. Pearce
2008-10-31 23:56         ` Jakub Narebski
2008-11-01  0:41         ` david
2008-11-01  1:00           ` Shawn O. Pearce
2008-11-01  1:04             ` david
2008-11-01  1:08               ` Pierre Habouzit
2008-11-01  1:33                 ` Nicolas Pitre
2008-11-01  1:38                   ` Pierre Habouzit
2008-11-01  1:49                     ` Nicolas Pitre
2008-11-01  1:43                   ` Shawn O. Pearce
2008-11-01  1:53                     ` Nicolas Pitre
2008-11-01 22:57                       ` Shawn O. Pearce
2008-11-02  0:26                         ` Scott Chacon
2008-11-02  1:07                           ` Scott Chacon
2008-11-02  1:36                             ` Shawn O. Pearce
2008-11-02  5:09                             ` David Brown
2008-11-03 16:20                               ` Shawn O. Pearce
2008-11-01  1:06           ` Pierre Habouzit
2008-11-01  1:36             ` david
2008-10-31 20:24   ` Brian Gernhardt
2008-10-31 21:59   ` Andreas Ericsson
2008-10-31 22:01     ` Shawn O. Pearce
2008-10-31 22:51       ` Junio C Hamano
2008-11-01 11:17       ` Andreas Ericsson
2008-10-31 23:22     ` Johannes Schindelin
2008-10-31 23:18 ` Bruno Santos
2008-10-31 23:25   ` Shawn O. Pearce
2008-11-01 19:18 ` Andreas Ericsson
2008-11-01 20:42   ` Shawn O. Pearce
2008-11-02  2:30     ` Johannes Schindelin
2008-11-02  9:19     ` Pierre Habouzit
2008-11-03 13:08     ` Andreas Ericsson
2008-11-08 13:26 ` Steve Frécinaux
2008-11-08 14:35   ` Andreas Ericsson
2008-11-08 17:27     ` Pierre Habouzit
2008-11-09 10:17       ` Andreas Ericsson
2008-11-09 21:02         ` Shawn O. Pearce

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=490ECFC2.1070901@op5.se \
    --to=ae@op5.se \
    --cc=git@vger.kernel.org \
    --cc=madcoder@debian.org \
    --cc=schacon@gmail.com \
    --cc=spearce@spearce.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).