From: Jakub Narebski <jnareb@gmail.com>
To: Asger Ottar Alstrup <asger@area9.dk>
Cc: Avery Pennarun <apenwarr@gmail.com>,
	git@vger.kernel.org, Alexander Gavrilov <angavrilov@gmail.com>
Subject: Re: git subtree as a solution to partial cloning?
Date: Mon, 25 May 2009 16:26:00 -0700 (PDT)
Message-ID: <m3bppgdan2.fsf@localhost.localdomain>
In-Reply-To: <8873ae500905251128h1921895dp6ef227e0e0bbec49@mail.gmail.com>

Asger Ottar Alstrup <asger@area9.dk> writes:

> On Mon, May 25, 2009 at 7:54 PM, Avery Pennarun <apenwarr@gmail.com> wrote:
>> On Mon, May 25, 2009 at 1:35 PM, Asger Ottar Alstrup <asger@area9.dk> wrote:
>>> So a poor man's system could work like this:
>>>
>>> - A reduced repository is defined by a list of paths in a file, I
>>> guess with a format similar to .gitignore
>>
>> Are you sure you want to define the list with exclusions instead of
>> inclusions?  I don't really know your use case.
> 
> Since the .gitignore format supports !, I believe that should not make
> much of a difference.
> 
>> Anyway, if you're using git filter-branch, it'll be up to you to fix
>> the index to contain the list of files you want. (See man
>> git-filter-branch)
> 
> Yes, sure, and that is why I asked whether there is some tool in git
> that can give a list of concrete files surviving a .gitignore list of
> patterns.

I think you would want to use git-ls-files with the --exclude-from=<file>
option, and perhaps also -i/--ignored, to create a list of files to be
removed (using git-update-index) instead of a list of files to be kept.
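
A rough sketch of the listing step, not a tested recipe for your
repository: the file names and the pattern file are made up for
illustration, and on recent git versions -i has to be combined with
-c/--cached (or -o/--others), so it is spelled out here.

```shell
#!/bin/sh
set -eu

# Throwaway repository with a few tracked files (all names are hypothetical).
repo=$(mktemp -d)
cd "$repo"
git init -q .
mkdir docs images
echo manual > docs/manual.txt
echo logo   > images/logo.png
echo photo  > images/photo.png
echo readme > README
git add .

# Hypothetical pattern file in .gitignore syntax, naming what the
# reduced repository should drop.
patterns=$(mktemp)
printf '%s\n' '*.png' > "$patterns"

# Tracked files (--cached) that match the patterns (--ignored):
# this is the list of files to be removed.
to_remove=$(git ls-files --cached --ignored --exclude-from="$patterns")
printf '%s\n' "$to_remove"
```

Adding -z to git-ls-files gives NUL-terminated output that can be piped
into "git update-index --force-remove -z --stdin".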
 
>>> - To extract: A copy of the original repository is made. This copy is
>>> reduced using git filter-branch. Is there some way of turning a
>>> .gitignore syntax file into a concrete list of files? Also, can this
>>> entire step be done in one step without the copy? Having to copy the
>>> entire project first seems excessive. Will filter-branch preserve
>>> and/or prune pack files intelligently?
>>
>> You probably need to read about the differences between git trees,
>> blobs, and commits.  You're not actually "copying" anything; you're
>> just creating some new directory structures that contain the
>> *existing* blobs.  And of course the existing blobs are in your
>> existing packs.
> 
> Thanks. OK, I see now that filter-branch will not destroy the original
> repository. That is not at all obvious from reading the man page, when
> the very first sentence says that it will rewrite history. 

What git-filter-branch does is write _new_ history and move the old
history to the refs/original/* namespace (that might have changed; anyway
the old history should be available via the reflog).  The visible effect
is that history got rewritten.
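
To illustrate, here is a small sketch under assumptions: a throwaway
repository, a hypothetical pattern file, and an index filter built from
git-ls-files and git-update-index. It only shows that the old tip stays
reachable under refs/original/ after the rewrite.

```shell
#!/bin/sh
set -eu

# Identity for the demo commit; silence the filter-branch notice that
# newer git versions print.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
export FILTER_BRANCH_SQUELCH_WARNING=1

repo=$(mktemp -d)
cd "$repo"
git init -q .
mkdir docs images
echo manual > docs/manual.txt
echo logo   > images/logo.png
git add .
git commit -q -m 'initial import'

branch=$(git symbolic-ref --short HEAD)
old_head=$(git rev-parse HEAD)

# Hypothetical pattern file, kept outside the work tree so the tree stays clean.
patterns=$(mktemp)
printf '%s\n' '*.png' > "$patterns"

# In every commit, drop the index entries that match the patterns.
git filter-branch --index-filter \
    "git ls-files -z --cached --ignored --exclude-from='$patterns' |
     git update-index --force-remove -z --stdin" -- "$branch" >/dev/null

new_head=$(git rev-parse HEAD)
backup=$(git rev-parse "refs/original/refs/heads/$branch")
remaining=$(git ls-tree -r --name-only HEAD)
printf 'old tip kept at refs/original: %s\n' "$backup"
printf 'files in rewritten tip: %s\n' "$remaining"
```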

> But the
> main point of this exercise is to reduce the size of the reduced
> repository so that it can be transferred effectively. So after
> filter-branch, I guess I would run clone afterwards to make the new,
> smaller repository, and then the question becomes: Will clone reuse
> and prune packs intelligently?

Yes, it would... well, you have to take into account that an ordinary
clone over the local filesystem hardlinks packfiles, so you need to use
the file:// trick to force a repack; you might also want to use
--reference to set up alternates.
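
A small sketch of the difference, with made-up paths. The stray blob
stands in for data that is no longer reachable from any ref, which is
an assumption about how the leftover objects would look in your case.

```shell
#!/bin/sh
set -eu

export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

src=$(mktemp -d)
git init -q "$src"
cd "$src"
echo manual > manual.txt
git add manual.txt
git commit -q -m 'initial import'

# Write a blob directly into the object store without committing it,
# so that no ref reaches it.
stray=$(echo 'dropped image data' | git hash-object -w --stdin)

work=$(mktemp -d)
# Plain path: the object store is hardlinked (or copied) wholesale.
git clone -q "$src" "$work/hardlinked"
# file:// URL: goes through the regular pack transfer instead.
git clone -q "file://$src" "$work/repacked"

if git -C "$work/hardlinked" cat-file -e "$stray" 2>/dev/null
then in_hardlinked=yes; else in_hardlinked=no; fi
if git -C "$work/repacked" cat-file -e "$stray" 2>/dev/null
then in_repacked=yes; else in_repacked=no; fi

echo "stray blob in plain clone:   $in_hardlinked"
echo "stray blob in file:// clone: $in_repacked"
```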

But that is not necessary: if you want to push just a _subset_ of
branches efficiently, you can define the remote info appropriately, and
push would intelligently transfer only the needed objects.
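
Something along these lines, as a sketch only: the branch names "full"
and "reduced", the remote name "subset", and the bare destination are
all invented for the example.

```shell
#!/bin/sh
set -eu

export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

repo=$(mktemp -d)
cd "$repo"
git init -q .
echo manual > manual.txt
git add manual.txt
git commit -q -m 'documents'
git branch reduced            # the subset branch stops here

echo 'big image data' > image.bin
git add image.bin
git commit -q -m 'images'
git branch -m full            # the current branch carries everything

# Hypothetical destination repository.
dest=$(mktemp -d)
git init -q --bare "$dest"

# Remote whose default push refspec names only the subset branch.
git remote add subset "$dest"
git config remote.subset.push refs/heads/reduced:refs/heads/reduced
git push -q subset

pushed_refs=$(git -C "$dest" for-each-ref --format='%(refname)')
image_blob=$(git rev-parse full:image.bin)
if git -C "$dest" cat-file -e "$image_blob" 2>/dev/null
then image_sent=yes; else image_sent=no; fi

echo "refs on the other side: $pushed_refs"
echo "image blob transferred: $image_sent"
```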

[...]
> However, there is a large group of users that do not need this, but
> they DO need the entire history of the files they are interested in.
> Subversion does not provide this. Also, Subversion is simply too slow
> to handle the kind of files we need to work with. Also, we have run
> tests on the kind of files we have, and the delta compression that git
> uses is very effective for compressing the pdf and openoffice
> documents we use. The big files we have are primarily image files, and
> obviously they do not compress very well. Fortunately, they do not
> change much either.

You might want to turn off deltaification for binary files via the
`delta` gitattribute (unset it for those paths); it might help (it
might not).
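
For example (the extensions are only guesses at what your image files
look like), the attribute can be unset in .gitattributes and checked
with git-check-attr:

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q .

# Unset the delta attribute for image types that do not delta well.
printf '%s\n' '*.png -delta' '*.jpg -delta' > .gitattributes

# The paths need not exist; check-attr only evaluates the patterns.
attrs=$(git check-attr delta -- images/logo.png docs/report.pdf)
printf '%s\n' "$attrs"
```

This should report "unset" for the image path and "unspecified" for the
pdf, so only the images are excluded from delta compression.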

-- 
Jakub Narebski
Poland
ShadeHawk on #git
