git.vger.kernel.org archive mirror
From: Shawn Pearce <spearce@spearce.org>
To: Johannes Sixt <j.sixt@viscovery.net>
Cc: git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>,
	Nicolas Pitre <nico@fluxnic.net>,
	John Hawley <warthog19@eaglescrag.net>
Subject: Re: [RFC] Add --create-cache to repack
Date: Fri, 28 Jan 2011 06:37:22 -0800
Message-ID: <AANLkTim+AUY9SdeAFfkny2_a3qQ9SCDLUHR3s9Q3M98u@mail.gmail.com>
In-Reply-To: <4D42878E.2020502@viscovery.net>

On Fri, Jan 28, 2011 at 01:08, Johannes Sixt <j.sixt@viscovery.net> wrote:
> Am 1/28/2011 9:06, schrieb Shawn O. Pearce:
>> A cache pack is all objects reachable from a single commit that is
>> part of the project's stable history and won't disappear, and is
>> accessible to all readers of the repository.  By containing only that
>> commit and its contents, if the commit is reached from a reference we
>> know immediately that the entire pack is also reachable.  To help
>> ensure this is true, the --create-cache flag looks for a commit along
>> refs/heads and refs/tags that is at least 1 month old, working under
>> the assumption that a commit this old won't be rebased or pruned.
>
> In one of my repositories, I have two stable branches and a good score of
> topic branches of various ages (a few hours up to two years 8). The topic
> branches will either be dropped eventually, or rebased.
>
> What are the odds that this choice of a tip commit picks one that is in a
> topic branch? Or is there no point in using --create-cache in a repository
> like this?

Argh, you are right.  It's quite likely this would pick a topic
branch... and that isn't really what is desired.

My original concept here was for distribution point repositories,
which are less likely to have these topic branches that will rebase
and disappear.  Though git.git has one called "pu".  *sigh*

A simple fix is to use --heads --tags by default like I do here, but
make the actual parameters we feed to rev-list configurable.  A
repository owner could select only the master branch as input to
rev-list, making it less likely the topic branches would be
considered.  Unfortunately that requires direct access to the
repository.  It fails for a site like GitHub, where you don't manage
the repository at all.
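
To make the failure mode and the proposed fix concrete, here is a
minimal sketch in Python.  It is not the code from the RFC patch.
It assumes a toy in-memory history (the names pick_cache_tip, refs,
commits and patterns are all hypothetical) and picks the newest
commit that is at least a month old among the refs matched by a
configurable list of patterns.

```python
import fnmatch

MONTH_SECONDS = 30 * 24 * 60 * 60  # assumed meaning of "1 month"


def pick_cache_tip(refs, commits, now,
                   patterns=("refs/heads/*", "refs/tags/*"),
                   min_age=MONTH_SECONDS):
    """Pick a candidate tip commit for a cache pack.

    refs:     dict of ref name -> commit id
    commits:  dict of commit id -> (timestamp, [parent ids])
    patterns: which refs to feed to the walk (hypothetical config knob)
    """
    # Only refs matching the configured patterns are used as input.
    stack = [cid for name, cid in refs.items()
             if any(fnmatch.fnmatchcase(name, p) for p in patterns)]
    seen = set()
    best = None
    while stack:
        cid = stack.pop()
        if cid in seen:
            continue
        seen.add(cid)
        timestamp, parents = commits[cid]
        if now - timestamp >= min_age:
            # Old enough: keep the newest such commit and do not
            # descend further along this line of history.
            if best is None or timestamp > commits[best][0]:
                best = cid
            continue
        stack.extend(parents)
    return best
```

With the default patterns a 50 day old topic branch tip beats the
100 day old commit on master, which is exactly the problem you
describe.  Restricting patterns to ("refs/heads/master",) avoids it.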

git.git also is problematic because of the man, html and todo
branches.  Branches that are disconnected from the main history but
are very small (e.g. todo) might be selected instead and create a
nearly useless cache file.  Fortunately disconnected branches could
each have their own cache file (with only the inode overhead of having
an additional 3 files per disconnected branch), and pack-objects could
concat all of those packs together when sending.  It's just a challenge
to identify these branches and keep them from being used for that main
project pack.
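
One possible way to identify disconnected branches, sketched in
Python under the assumption that two refs belong to the same history
when they share at least one root (parentless) commit.  This is only
an illustration, not something the RFC patch does; root_commits,
group_by_history and the parents map are hypothetical names.

```python
def root_commits(tip, parents):
    """Return the set of parentless commits reachable from tip."""
    roots, seen, stack = set(), set(), [tip]
    while stack:
        cid = stack.pop()
        if cid in seen:
            continue
        seen.add(cid)
        if not parents[cid]:
            roots.add(cid)
        stack.extend(parents[cid])
    return roots


def group_by_history(refs, parents):
    """Group ref names whose histories share a root commit.

    refs:    dict of ref name -> commit id
    parents: dict of commit id -> [parent ids]
    Returns a list of sorted lists of ref names, one per history.
    """
    groups = []  # list of (set of roots, [ref names])
    for name, tip in sorted(refs.items()):
        roots = root_commits(tip, parents)
        names = [name]
        remaining = []
        for g_roots, g_names in groups:
            if g_roots & roots:
                # Shares a root with this ref: merge the two groups.
                roots |= g_roots
                names = g_names + names
            else:
                remaining.append((g_roots, g_names))
        remaining.append((roots, names))
        groups = remaining
    return [sorted(names) for _, names in groups]
```

Each resulting group could then get its own cache file, and a small
group like todo would never compete with the main project history.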


This started because I was looking for a way to speed up clones coming
from a JGit server.  Cloning the linux-2.6 repository is painful; it
takes a long time to enumerate the 1.8 million objects.  So I tried
adding a cached list of objects reachable from a given commit, which
speeds up the enumeration phase, but JGit still needs to allocate all
of the working set to track those objects, then go find them in packs
and slice out each compressed form and reformat the headers on the
wire.  It's a lot of redundant work when your kernel repository has
360MB of data that you know a client needs if they have asked for your
master branch with no "have" set.
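
The case above (a client asking for master with no "have" set) can
be sketched as a simple check, again in Python over a toy history.
This assumes the cache is only used for the no-"have" clone case; it
is not how JGit or pack-objects is implemented, and
can_send_cache_pack is a hypothetical name.

```python
def can_send_cache_pack(wants, haves, cache_tip, parents):
    """Decide whether the cached data can be sent as-is.

    wants:     commit ids the client asked for
    haves:     commit ids the client says it already has
    cache_tip: the single commit the cache was built from
    parents:   dict of commit id -> [parent ids]
    """
    if haves:
        # Assumption of this sketch: only handle the initial clone.
        return False
    seen, stack = set(), list(wants)
    while stack:
        cid = stack.pop()
        if cid == cache_tip:
            # The tip is reachable from a wanted commit, so everything
            # reachable from the tip is needed by the client too.
            return True
        if cid in seen:
            continue
        seen.add(cid)
        stack.extend(parents[cid])
    return False
```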

Later I realized we can get rid of that cached list of objects and
just use the pack itself.  It's far cleaner, as there is no redundant
cache.  But either way (object list or pack) it's a bit of a challenge
to automatically identify the right starting points to use.  Linus
Torvalds' linux-2.6 repository is the perfect case for the RFC I
posted: it's one branch with all of the history, and it never rewinds.
But maybe Linus is just very unique in this world.  :-)

-- 
Shawn.
