From: "Jakub Narębski" <jnareb@gmail.com>
To: Jeff King <peff@peff.net>
Cc: Junio C Hamano <gitster@pobox.com>,
	Stefan Beller <sbeller@google.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Marc Strapetz <marc.strapetz@syntevo.com>,
	Git Mailing List <git@vger.kernel.org>
Subject: Re: topological index field for commit objects
Date: Fri, 1 Jul 2016 11:59:28 +0200
Message-ID: <57763F00.4070409@gmail.com>
In-Reply-To: <20160701065452.GE5358@sigill.intra.peff.net>

On 2016-07-01 at 08:54, Jeff King wrote:
> On Thu, Jun 30, 2016 at 12:30:31PM +0200, Jakub Narębski wrote:
> 
>>> This is one of the open questions. My older patches turned them off when
>>> replacements and grafts are in effect.
>>
>> Well, if you store the cache of generation numbers in the packfile, or in
>> the index of the packfile, or in the bitmap file, or in separate bitmap-like
>> file, generating them on repack, then of course any grafts or replacements
>> invalidate them... though for low level commands (like object counting)
>> replacements are transparent -- or rather they are (and can be) treated as
>> any other ref for reachability analysis.
>>
>> Well, if there are no grafts, you could still use them for doing
>> "git --no-replace-objects log ...", isn't it?
> 
> Yes, replace refs don't invalidate the concept of a cache. It just
> means that you invalidate the invariants of the cache for a specific
> view, so you need a cache which matches that view.
> 
> It has been several years, but I remember at one point having patches
> that summarized the graft/replace state as a single hash, and only used
> the cache if it matched that state. So you could actually keep a cache
> for some set of replace-refs that you have, as well as a cache for the
> case that you've turned them off, etc.
> 
> I don't think that level of complexity is really worth it, though.

Well, you could always update the reachability-helpers cache when running
the `git replace` command, and when fetching into the 'refs/replace' namespace...

...but this wouldn't take into account the fact that you can change
replace refs "by hand", and that the grafts file^{1} can only be edited by
hand. So at query time Git would need to check (e.g. via a hash of the
grafts file concatenated with a hash of the refs/replace namespace in
packed-refs) that said cache is still valid for the replace-respecting
view, and perhaps update said cache.
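
To make the query-time check concrete, here is a minimal sketch in
Python (not Git code). The names replace_state_key and cache_usable are
hypothetical, and the grafts contents and the refs/replace mapping are
assumed to have been read already; it uses a single digest over both
inputs rather than two concatenated hashes.

```python
import hashlib
from typing import Mapping


def replace_state_key(grafts: bytes, replace_refs: Mapping[str, str]) -> str:
    """Summarize the graft/replace state as one hex digest (hypothetical helper)."""
    h = hashlib.sha1()
    # Length-prefix the grafts contents so they cannot run into the ref list.
    h.update(len(grafts).to_bytes(8, "big"))
    h.update(grafts)
    # Sort by refname so the key does not depend on iteration order.
    for name in sorted(replace_refs):
        h.update(f"{replace_refs[name]} {name}\n".encode())
    return h.hexdigest()


def cache_usable(stored_key: str, grafts: bytes,
                 replace_refs: Mapping[str, str]) -> bool:
    """True if a cache built under stored_key matches the current state."""
    return stored_key == replace_state_key(grafts, replace_refs)
```

A cache written together with its key would then be consulted only while
cache_usable() holds, and rebuilt (or ignored) otherwise.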

Though if we limit ourselves to the replacements mechanism, we could
have a configuration variable saying "I will manipulate replacements
only using git-replace, and I want faster reachability", couldn't we?


1.) Can we deprecate and remove the grafts mechanism now that we have a
superior solution and a migration mechanism?
 
>>>>> I have patches that generate and store the numbers at pack time, similar
>>>>> to the way we do the reachability bitmaps.
>>
>> Ah, so those cached generation numbers are generated and stored at pack
>> time. Where you store them: is it a separate file? Bitmap file? Packfile?
> 
> There were a few iterations of the concept over the years, but the
> pack-time one uses a separate file with the same name prefix as a pack
> (similar to the way bitmaps are stored). The big advantage there is that
> we can piggy-back on the pack .idx to avoid having to write each sha1
> again (20 bytes per commit, whereas the actual data we're caching is
> only 4 bytes).

Does it use any lightweight compression mechanism, or is it not needed?
What does the format of this file look like?
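
To illustrate what I am asking about, here is a guess at one possible
uncompressed layout in Python; it is an assumption, not the format the
patches actually use. It stores one 4-byte big-endian number per entry,
in the same order as a sorted list of object ids, which stands in for
the ordering the pack .idx already provides (and is assumed here to
hold only commits).

```python
import bisect
import struct
from typing import Dict, List, Optional


def write_generations(sorted_ids: List[bytes], gen_of: Dict[bytes, int]) -> bytes:
    """Serialize one uint32 per id, in the same order as sorted_ids."""
    return b"".join(struct.pack(">I", gen_of[oid]) for oid in sorted_ids)


def lookup_generation(sorted_ids: List[bytes], data: bytes,
                      oid: bytes) -> Optional[int]:
    """Binary-search the id list, then read the number at the same position."""
    pos = bisect.bisect_left(sorted_ids, oid)
    if pos == len(sorted_ids) or sorted_ids[pos] != oid:
        return None  # id is not covered by this file
    return struct.unpack_from(">I", data, 4 * pos)[0]
```

In such a layout the ids are never repeated in the cache file; only the
position in the sorted order ties a number to its commit.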
 
>>> At GitHub we are using them for --contains analysis, along with mass
>>> ahead/behind (e.g., as in https://github.com/gitster/git/branches). My
>>> plan is to send patches upstream, but they need some cleanup first.
>>
>> That would be nice to have, please.
>>
>> Er, is mass ahead/behind something that can be plugged into Git
>> (e.g. for "git branch -v -v"), or is it something GitHub-specific?
> 
> We have a custom command, "git ahead-behind", where you can specify
> arbitrary pairs of commits on stdin. But it's all backed by a function
> which, yes, could be plugged into "branch -v -v". It caches any bitmaps
> it needs, so if you are doing 100 ahead/behind comparisons against
> "master", for example, it only has to find the bitmap for "master" once
> (remember that we sometimes have to traverse to complete a bitmap when
> a branch has been updated since the last repack).

That would be nice to have (perhaps invoked only if the number of branches
is high enough; that excludes using it for the ahead/behind information
that `git checkout` prints).
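
As I understand the idea of reusing the base's bitmap, it could be
sketched like this in Python. Plain sets stand in for bitmaps,
make_ahead_behind is a hypothetical name, and the commit graph is
assumed to be a mapping from each commit to its parents; this is not the
GitHub implementation.

```python
from functools import lru_cache
from typing import Callable, Dict, FrozenSet, List, Tuple


def make_ahead_behind(
    parents: Dict[str, List[str]]
) -> Callable[[str, str], Tuple[int, int]]:
    """Return an ahead_behind(tip, base) function with cached reachability."""

    @lru_cache(maxsize=None)
    def reachable(tip: str) -> FrozenSet[str]:
        # Iterative walk over parents; the result is cached per tip, so
        # "master" is traversed once however many branches are compared.
        seen = set()
        stack = [tip]
        while stack:
            commit = stack.pop()
            if commit in seen:
                continue
            seen.add(commit)
            stack.extend(parents.get(commit, []))
        return frozenset(seen)

    def ahead_behind(tip: str, base: str) -> Tuple[int, int]:
        t, b = reachable(tip), reachable(base)
        # ahead: commits only on tip; behind: commits only on base
        return len(t - b), len(b - t)

    return ahead_behind
```

With many branches compared against the same base, only the per-branch
walk is new work, which is why it would pay off mainly for large branch
counts.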

-- 
Jakub Narębski

