From: Jakub Narebski <jnareb@gmail.com>
To: Asger Ottar Alstrup <asger@area9.dk>
Cc: Avery Pennarun <apenwarr@gmail.com>,
git@vger.kernel.org, Alexander Gavrilov <angavrilov@gmail.com>
Subject: Re: git subtree as a solution to partial cloning?
Date: Mon, 25 May 2009 16:26:00 -0700 (PDT)
Message-ID: <m3bppgdan2.fsf@localhost.localdomain>
In-Reply-To: <8873ae500905251128h1921895dp6ef227e0e0bbec49@mail.gmail.com>
Asger Ottar Alstrup <asger@area9.dk> writes:
> On Mon, May 25, 2009 at 7:54 PM, Avery Pennarun <apenwarr@gmail.com> wrote:
>> On Mon, May 25, 2009 at 1:35 PM, Asger Ottar Alstrup <asger@area9.dk> wrote:
>>> So a poor man's system could work like this:
>>>
>>> - A reduced repository is defined by a list of paths in a file, I
>>> guess with a format similar to .gitignore
>>
>> Are you sure you want to define the list with exclusions instead of
>> inclusions? I don't really know your use case.
>
> Since the .gitignore format supports !, I believe that should not make
> much of a difference.
>
>> Anyway, if you're using git filter-branch, it'll be up to you to fix
>> the index to contain the list of files you want. (See man
>> git-filter-branch)
>
> Yes, sure, and that is why I asked whether there is some tool in git
> that can give a list of concrete files surviving a .gitignore list of
> patterns.
I think you would want to use git-ls-files with the --exclude-from=<file>
option, and perhaps also -i/--ignored, to create a list of files to be
removed (using git-update-index) instead of a list of files to be kept.
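
A minimal sketch of that idea, assuming a throwaway repository and a
hypothetical pattern file (the file names and the `*.png` pattern are
made up for illustration). Recent git versions insist on -c or -o
together with -i, so -c is spelled out here:

```shell
#!/bin/sh
set -eu

# Throwaway repository with a few tracked files (names are illustrative).
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "you@example.com"
git config user.name "Example User"
mkdir docs img
echo "text" > docs/a.txt
echo "image" > img/big.png
echo "readme" > README
git add .
git commit -q -m "initial"

# Hypothetical pattern file in .gitignore syntax: what to drop.
patterns=$(mktemp)
printf '%s\n' '*.png' > "$patterns"

# Tracked (-c) files that match the patterns (-i).
to_remove=$(git ls-files -c -i --exclude-from="$patterns")
printf 'would remove:\n%s\n' "$to_remove"

# Drop them from the index only; -z keeps unusual file names intact.
git ls-files -c -i -z --exclude-from="$patterns" |
    git update-index --force-remove -z --stdin

remaining=$(git ls-files)
printf 'still in index:\n%s\n' "$remaining"
```

Inside git-filter-branch the last pipeline is roughly what an
--index-filter would run for every commit.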
>>> - To extract: A copy of the original repository is made. This copy is
>>> reduced using git filter-branch. Is there some way of turning a
>>> .gitignore syntax file into a concrete list of files? Also, can this
>>> entire step be done in one step without the copy? Having to copy the
>>> entire project first seems excessive. Will filter-branch preserve
>>> and/or prune pack files intelligently?
>>
>> You probably need to read about the differences between git trees,
>> blobs, and commits. You're not actually "copying" anything; you're
>> just creating some new directory structures that contain the
>> *existing* blobs. And of course the existing blobs are in your
>> existing packs.
>
> Thanks. OK, I see now that filter-branch will not destroy the original
> repository. That is not at all obvious from reading the man page, when
> the very first sentence says that it will rewrite history.
What git-filter-branch does is write _new_ history and move the old
history to the refs/original/* namespace (that might have changed; anyway,
the old history should be available via the reflog). The visible effect
is that history got rewritten.
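
To see this for yourself, here is a small sketch on a throwaway
repository. The file names are invented, and the
FILTER_BRANCH_SQUELCH_WARNING variable is an assumption about newer git
versions (older ones should simply ignore it):

```shell
#!/bin/sh
set -eu

# Assumption: newer git honours this to skip the filter-branch warning.
export FILTER_BRANCH_SQUELCH_WARNING=1

repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "you@example.com"
git config user.name "Example User"
echo "text" > keep.txt
echo "image" > big.png
git add .
git commit -q -m "initial"

branch=$(git symbolic-ref --short HEAD)
old=$(git rev-parse HEAD)

# Rewrite the branch so that big.png never existed in it.
git filter-branch --index-filter \
    'git rm -q --cached --ignore-unmatch big.png' HEAD >/dev/null 2>&1

new=$(git rev-parse HEAD)
backup=$(git rev-parse "refs/original/refs/heads/$branch")
files=$(git ls-tree -r --name-only HEAD)

echo "old tip:    $old"
echo "new tip:    $new"
echo "backup ref: $backup"
echo "files now:  $files"
```

The backup ref should point at the old tip, so the original commits and
blobs are still in the object store after the rewrite.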
> But the
> main point of this exercise is to reduce the size of the reduced
> repository so that it can be transferred effectively. So after
> filter-branch, I guess I would run clone afterwards to make the new,
> smaller repository, and then the question becomes: Will clone reuse
> and prune packs intelligently?
Yes, it would... well, you have to take into account that an ordinary
clone over the local filesystem hardlinks packfiles, so you need to
use the file:// trick to force a repack; you might also want to use
--reference to set up alternates.
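
A rough sketch of the difference, using temporary directories and an
unreferenced blob that stands in for objects only the old history needs
(everything here is illustrative, not taken from your setup):

```shell
#!/bin/sh
set -eu

src=$(mktemp -d)
work=$(mktemp -d)
cd "$src"
git init -q .
git config user.email "you@example.com"
git config user.name "Example User"
echo "text" > keep.txt
git add keep.txt
git commit -q -m "initial"

# A blob that no commit references.
stray=$(echo "no longer referenced" | git hash-object -w --stdin)

# Plain path: the object store is hardlinked (or copied) as it is.
git clone -q "$src" "$work/plain"

# file:// URL: objects go through the normal transport, so only
# reachable objects are packed and sent.
git clone -q "file://$src" "$work/packed"

if git -C "$work/plain" cat-file -e "$stray" 2>/dev/null
then plain=yes; else plain=no; fi
if git -C "$work/packed" cat-file -e "$stray" 2>/dev/null
then packed=yes; else packed=no; fi

echo "stray blob in plain clone:   $plain"
echo "stray blob in file:// clone: $packed"
```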
But that is not necessary: if you effectively want to push a _subset_
of branches, you can define the remote info appropriately, and push
would intelligently transfer only the needed objects.
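
Something along these lines, as a sketch only: the remote name
"reduced-remote", the branch name "reduced" and the file names are all
hypothetical, and the bare repository is a stand-in for wherever the
reduced repository would live.

```shell
#!/bin/sh
set -eu

src=$(mktemp -d)
dst=$(mktemp -d)
git init -q --bare "$dst"

cd "$src"
git init -q .
git config user.email "you@example.com"
git config user.name "Example User"
echo "text" > keep.txt
echo "image" > big.png
git add .
git commit -q -m "full"
full=$(git symbolic-ref --short HEAD)
big_blob=$(git rev-parse "$full:big.png")

# A separate branch that does not contain big.png.
git checkout -q --orphan reduced
git rm -q --cached big.png
git commit -q -m "reduced"

# Remote that by default pushes only the reduced branch.
git remote add reduced-remote "$dst"
git config remote.reduced-remote.push refs/heads/reduced:refs/heads/reduced
git push -q reduced-remote >/dev/null 2>&1

refs=$(git -C "$dst" for-each-ref --format='%(refname)')
if git -C "$dst" cat-file -e "$big_blob" 2>/dev/null
then has_big=yes; else has_big=no; fi

echo "refs on the remote: $refs"
echo "big.png blob sent:  $has_big"
```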
[...]
> However, there is a large group of users that do not need this, but
> they DO need the entire history of the files they are interested in.
> Subversion does not provide this. Also, Subversion is simply too slow
> to handle the kind of files we need to work with. Also, we have run
> tests on the kind of files we have, and the delta compression that git
> uses is very effective for compressing the pdf and openoffice
> documents we use. The big files we have are primarily image files, and
> obviously they do not compress very well. Fortunately, they do not
> change much either.
You might want to turn off deltification for binary files via the `delta`
gitattribute; it might help (it might not).
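
For example, a sketch assuming your images are *.png and *.jpg (adjust
the patterns to whatever you actually store); git-check-attr shows how
the attribute resolves for a given path:

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q .

# Unset the delta attribute for image files (patterns are illustrative).
printf '%s\n' '*.png -delta' '*.jpg -delta' > .gitattributes

# The paths need not exist; check-attr only evaluates the patterns.
png_attr=$(git check-attr delta -- img/photo.png)
pdf_attr=$(git check-attr delta -- docs/paper.pdf)

echo "$png_attr"
echo "$pdf_attr"
```

Paths reported as "unset" are skipped when git looks for deltas; the
pdf and openoffice files stay "unspecified" and are still deltified.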
--
Jakub Narebski
Poland
ShadeHawk on #git