From: Rogan Dawes <discard@dawes.za.net>
To: Junio C Hamano <junkio@cox.net>
Cc: git@vger.kernel.org
Subject: Re: Figured out how to get Mozilla into git
Date: Sat, 10 Jun 2006 16:47:38 +0200
Message-ID: <448ADB8A.3070506@dawes.za.net>
In-Reply-To: <7vzmglgyz0.fsf@assigned-by-dhcp.cox.net>

Junio C Hamano wrote:
> Rogan Dawes <lists@dawes.za.net> writes:
> 
>> Here's an idea. How about separating trees and commits from the actual
>> blobs (e.g. in separate packs)?
> 
> If I remember my numbers correctly, trees for any project with a
> size that matters contribute nonnegligible amount of the total
> pack weight.  Perhaps 10-25%.

Out of curiosity, do you think that it may be possible for tree objects 
to compress more/better if they are packed together? Or does the 
existing pack compression logic already do the diff against similar tree 
objects?

>> In this way, the user has a history that will show all of the commit
>> messages, and would be able to see _which_ files have changed over
>> time e.g. gitk would still work - except for the actual file level
>> diff, "git log" should also still work, etc
> 
> I suspect it would make a very unpleasant system to use.
> Sometimes "git diff -p" would show diffs, and other times it
> mysteriously complain saying that it lacks necessary blobs to do
> its job.  You cannot even run fsck and tell from its output
> which missing objects are OK (because you chose to create such a
> sparse repository) and which are real corruption.

The fsck problem could be worked around by maintaining a list of objects 
that are explicitly not expected to be present. As missing objects are 
retrieved (perhaps as diffs are performed, other parts of the blob 
history are fetched, etc.), the list will get shorter until we have a 
complete clone of the original tree.
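
To make the fsck idea concrete, here is a minimal sketch in Python. It is 
purely illustrative: `classify_missing`, `mark_fetched` and the set-based 
"expected missing" list are hypothetical names, not anything that exists 
in git. It only shows how such a list would let a checker tell deliberate 
gaps from real corruption.

```python
def classify_missing(referenced, present, expected_missing):
    """Split absent objects into tolerated gaps and real corruption.

    referenced:       ids reachable from the refs (what fsck expects to find)
    present:          ids actually stored locally
    expected_missing: ids deliberately left out of this sparse repository
    All three are assumed to be plain collections of object-id strings.
    """
    absent = set(referenced) - set(present)
    tolerated = absent & set(expected_missing)  # missing, but on the list
    corrupt = absent - tolerated                # missing and NOT on the list
    return tolerated, corrupt


def mark_fetched(expected_missing, object_id):
    """Drop a retrieved object from the list.

    Returns True once the list is empty, i.e. the clone is complete.
    """
    expected_missing.discard(object_id)
    return not expected_missing
```

Only the second set returned by `classify_missing` would need to be 
reported as an error.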

Of course diffs against a version further back in the history would 
fail. But if you start with a checkout of a complete tree, any changes 
made since that point would at least have one version to compare against.

In effect, what we would have is a caching repository (or as Jakub said, 
a lazy clone). An initial checkout would effectively be pre-seeding the 
cache. One does not necessarily even need to get the complete set of 
commit and tree objects, either. The bare minimum would probably be to 
get the HEAD commit, and the tree objects that correspond to that commit.

At that point, one could populate the "uncached objects" list with the 
parent commits. With only that much fetched, one would not be in a 
position to see any history at all, of course.

As the user performs various operations, e.g. git log, git could either 
go and fetch the necessary objects (updating the uncached list as it 
goes), or fail with a message such as "Cannot perform the requested 
operation - required objects are not available". (We may require another 
utility that would list the objects required for an operation, and 
compare it against the list of "uncached objects", printing out a list 
of which are not yet available locally. I realise that this may be 
expensive. Maybe a repo configuration option "cached" to enable or 
disable this.)
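
A rough sketch of that fetch-or-fail behaviour, again in Python and again 
with made-up names (`LazyStore`, `ObjectsUnavailable`, and commits 
modelled as dicts with a "parents" list are all assumptions for 
illustration, not git internals):

```python
class ObjectsUnavailable(Exception):
    """Raised when a needed object is on the uncached list and we are offline."""


class LazyStore:
    def __init__(self, objects, uncached, fetch=None):
        self.objects = dict(objects)   # id -> object held locally
        self.uncached = set(uncached)  # ids knowingly left out
        self.fetch = fetch             # callable(id) -> object; None when offline

    def get(self, oid):
        if oid in self.objects:
            return self.objects[oid]
        if oid not in self.uncached:
            # Neither present nor expected to be missing: real corruption.
            raise KeyError(oid)
        if self.fetch is None:
            raise ObjectsUnavailable(
                "Cannot perform the requested operation - "
                "required objects are not available")
        obj = self.fetch(oid)
        self.objects[oid] = obj
        self.uncached.discard(oid)
        # Update the uncached list as we go: the parents of a newly
        # fetched commit become the next known-missing objects.
        for parent in obj.get("parents", []):
            if parent not in self.objects:
                self.uncached.add(parent)
        return obj


def log(store, head):
    """Walk the history from head, fetching commits on demand."""
    seen, order, todo = set(), [], [head]
    while todo:
        oid = todo.pop()
        if oid in seen:
            continue
        seen.add(oid)
        commit = store.get(oid)
        order.append(oid)
        todo.extend(commit.get("parents", []))
    return order
```

With a `fetch` callable supplied, `log` pulls in history as it walks and 
the uncached list drains to empty; with `fetch=None` it stops at the 
first missing commit with the message quoted above.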

As Jakub suggested, it would be necessary to configure the location of 
the source for any missing objects, but that is probably in the repo 
config anyway.

> A shallow clone with explicit cauterization in grafts file at
> least would not have that problem. Although the user will still
> not see the exact same result as what would happen in a full
> repository, at least we can say "your git log ends at that
> commit because your copy of the history does not go back beyond
> that" and the user would understand.

Or we could say "perform the operation while you are online and can 
access the necessary objects". If the user has explicitly chosen to make 
a lazy clone, then they should expect that at some point, whatever they 
do may require them to be online to access items that they have not yet 
cloned.

Rogan
