From: Jakub Narebski <jnareb@gmail.com>
To: Jeff King <peff@peff.net>
Cc: "Ted Ts'o" <tytso@mit.edu>,
	"Jonathan Nieder" <jrnieder@gmail.com>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"Clemens Buchacher" <drizzd@aon.at>,
	git@vger.kernel.org, "Junio C Hamano" <gitster@pobox.com>
Subject: Re: generation numbers (was: [PATCH 0/4] Speed up git tag --contains)
Date: Wed, 6 Jul 2011 20:46:42 +0200
Message-ID: <201107062046.43820.jnareb@gmail.com>
In-Reply-To: <20110706181200.GD17978@sigill.intra.peff.net>

On Wed, 6 Jul 2011, Jeff King wrote:
> On Wed, Jul 06, 2011 at 11:01:03AM -0400, Ted Ts'o wrote:
> 
> > Is it worth it to try to replicate this information across repositories?
> 
> Probably not. I suggested notes-cache just because the amount of code is
> very trivial.

Well, generation numbers are universal and would help everybody.  For
new commits with a 'generation' header they would always be replicated;
for old commits, the 'generation' notes / notes-cache can be replicated
as well.
 
> One problem with notes storage is that it's not well optimized for tiny
> pieces of data like this (e.g., the generation number should fit in a
> 32-bit unsigned int, as its max is the size of the longest single path
> in the history graph). But notes are much more general; we will actually
> map each commit to a blob object containing the generation number, which
> is pretty wasteful.

Wasn't textconv-cache using commit-less notes?  The same can be done
for a generation notes-cache.  Though it is still wasteful...  By the
way, would we be using a text representation (like in the 'generation'
commit header), a 32-bit integer binary representation in some byte
order, or a variable-length integer (I think git uses them somewhere)?
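
To make the three candidate representations concrete, here is a minimal
sketch in Python.  The function names are hypothetical, the byte order
of the fixed-width form is an assumption (big-endian), and the
variable-length form is a generic LEB128-style encoding chosen for
illustration; it is not claimed to match whatever encoding git uses
internally.

```python
import struct


def encode_text(gen: int) -> bytes:
    # Decimal text, as a hypothetical 'generation' header value might carry it.
    return str(gen).encode("ascii")


def encode_fixed32(gen: int) -> bytes:
    # 32-bit unsigned integer, big-endian (assumed byte order).
    # struct.pack raises struct.error if gen does not fit in 32 bits.
    return struct.pack(">I", gen)


def encode_varint(gen: int) -> bytes:
    # Generic LEB128-style varint: 7 payload bits per byte, least
    # significant group first, high bit set on every byte except the last.
    if gen < 0:
        raise ValueError("generation numbers are non-negative")
    out = bytearray()
    while True:
        byte = gen & 0x7F
        gen >>= 7
        if gen:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data: bytes) -> int:
    # Inverse of encode_varint; stops at the first byte without the high bit.
    result = 0
    shift = 0
    for b in data:
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            return result
        shift += 7
    raise ValueError("truncated varint")
```

The text form is the most readable but varies in length, the fixed form
always costs 4 bytes and allows constant-size records, and the varint
form costs a single byte for values below 128.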

Nb. I wonder if a 32-bit unsigned int would always be enough, for
example for the Linux kernel plus its history.
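
As a rough check on the 32-bit question, the sketch below computes
generation numbers under one common convention, which is an assumption
here and not something this thread has settled: a root commit gets 1,
and every other commit gets one more than the largest generation among
its parents.  Under that convention the largest value equals the number
of commits on the longest single path, so it can never exceed the total
number of commits.  The function name and the `parents` mapping are
hypothetical.

```python
def generation_numbers(parents):
    """Return {commit: generation} for an acyclic history.

    parents maps each commit id to a list of its parent ids; every
    parent must itself be a key.  Assumed convention: roots get 1,
    other commits get 1 + max(generation of parents).
    """
    gen = {}
    for start in parents:
        if start in gen:
            continue
        # Explicit stack instead of recursion: real histories can be
        # far deeper than Python's recursion limit.
        stack = [start]
        while stack:
            commit = stack[-1]
            if commit in gen:
                stack.pop()
                continue
            missing = [p for p in parents[commit] if p not in gen]
            if missing:
                # Compute the parents first, then revisit this commit.
                stack.extend(missing)
            else:
                gen[commit] = 1 + max(
                    (gen[p] for p in parents[commit]), default=0
                )
                stack.pop()
    return gen


history = {
    "a": [],            # root
    "b": ["a"],
    "c": ["a"],
    "d": ["b", "c"],    # merge
    "e": ["d", "a"],    # merge with a much older parent
}
gens = generation_numbers(history)
fits_in_32_bits = max(gens.values()) < 2**32
```

Overflowing an unsigned 32-bit field would therefore take a single
chain of more than 2**32 - 1 commits.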

> > Why not just simply have a cache file in the git directory which is
> > managed somewhat like gitk.cache; call it generation.cache?
> 
> Yeah, that would be fine. With a sorted list of binary sha1s and 32-bit
> generation numbers, you're talking about 24 bytes per commit. Or a 6
> megabyte cache for linux-2.6.
> 
> You'd probably want to be a little clever with updates. If I have
> calculated the generation number of every commit, and then do "git
> commit; git tag --contains HEAD", you probably don't want to rewrite the
> entire cache. You could probably journal a fixed number of entries in an
> unsorted file (or even in a parallel directory structure to loose
> objects), and then periodically write out the whole sorted list when the
> journal gets too big. Or choose a more clever data structure that can do
> in-place updates.

And that is the difference between gitk.cache (generated _once_ when
starting gitk, and regenerated on request) and the idea of
generation.cache, which would need to be updated incrementally.
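
The layout quoted above (a sorted list of 20-byte binary sha1s, each
followed by a 32-bit generation number, plus a small journal for new
entries) can be sketched as follows.  This is an in-memory illustration
only: the class name, the `journal_limit` parameter and the big-endian
byte order are assumptions, and a bytes object stands in for the file
on disk.  At 24 bytes per record, a 6 megabyte cache corresponds to
roughly 250,000 commits.

```python
import struct

# 20-byte binary SHA-1 followed by a 32-bit unsigned generation number.
# ">" selects standard sizes with no padding, so a record is 24 bytes.
RECORD = struct.Struct(">20sI")


class GenerationCache:
    def __init__(self, journal_limit=4):
        self.sorted_blob = b""    # stands in for the sorted cache file
        self.journal = {}         # stands in for the small unsorted journal
        self.journal_limit = journal_limit

    def _lookup_sorted(self, sha1):
        # Binary search over fixed-width records.
        lo, hi = 0, len(self.sorted_blob) // RECORD.size
        while lo < hi:
            mid = (lo + hi) // 2
            key, gen = RECORD.unpack_from(self.sorted_blob, mid * RECORD.size)
            if key == sha1:
                return gen
            if key < sha1:
                lo = mid + 1
            else:
                hi = mid
        return None

    def get(self, sha1):
        # Recent entries live in the journal; older ones in the sorted list.
        if sha1 in self.journal:
            return self.journal[sha1]
        return self._lookup_sorted(sha1)

    def put(self, sha1, gen):
        # Cheap append; the sorted list is rewritten only when the
        # journal grows past its limit.
        self.journal[sha1] = gen
        if len(self.journal) > self.journal_limit:
            self.compact()

    def compact(self):
        # Merge the journal into the sorted list and rewrite it in full.
        entries = {}
        for off in range(0, len(self.sorted_blob), RECORD.size):
            key, gen = RECORD.unpack_from(self.sorted_blob, off)
            entries[key] = gen
        entries.update(self.journal)
        self.sorted_blob = b"".join(
            RECORD.pack(key, gen) for key, gen in sorted(entries.items())
        )
        self.journal.clear()
```

With this split, "git commit" followed by a lookup touches only the
journal, and the cost of rewriting the whole sorted list is paid once
per `journal_limit` new commits.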

I think it would be simpler to use generation header + generation notes.
Or start with generation notes only.

-- 
Jakub Narebski
Poland
