Re: Change set based shallow clone

git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Junio C Hamano <junkio@cox.net>
To: Linus Torvalds <torvalds@osdl.org>
Cc: Paul Mackerras <paulus@samba.org>,
	linux@horizon.com, jonsmirl@gmail.com, git@vger.kernel.org
Subject: Re: Change set based shallow clone
Date: Mon, 11 Sep 2006 11:54:40 -0700	[thread overview]
Message-ID: <7v8xkqgrnz.fsf@assigned-by-dhcp.cox.net> (raw)
In-Reply-To: <Pine.LNX.4.64.0609102002300.27779@g5.osdl.org> (Linus Torvalds's message of "Sun, 10 Sep 2006 20:18:49 -0700 (PDT)")

Linus Torvalds <torvalds@osdl.org> writes:

> On Sun, 10 Sep 2006, Linus Torvalds wrote:
>> 
>> If we did the same pack-file approach that we do for objects, the problem 
>> ends up being that _updating_ things is really hard. What we could do (and 
>> might work) is that a "git repack" would create a "packed representation 
>> of the heads too".
>
> To clarify: I'm _not_ suggesting actually using the "pack-file" 
> representation itself for the references.
>
> I'm saying that we could have something like a
>
> 	.git/refs-packed
>
> file, which could be (for example) just a plain linear text-file, of the 
> form
>
> 	19ed368b17abfb9ad5c7467ea74fd8a045d96b43	refs/heads/html
> 	60a6bf5f53635005f4f68d8b8a33172309193623	refs/heads/maint
> 	2d32e76f893b2ac432201ce7596c8bab691691e6	refs/heads/man
>...
> 	9165ec17fde255a1770886189359897dbb541012	refs/tags/v0.99.7c
> 	02b2acff8bafb6d73c6513469cdda0c6c18c4138	refs/tags/v0.99.7d
> 	...
>
> ie it would contain just a linear file with the "<hex></tab><refname>"
> format.

This depends on how expensive it is to read a file backwards,
but it might make sense to have a single append-only file.

${GIT_DIR}/refs-packed file would be a sequence of records, each
record is of form:

	type SP timestamp LF payload LF length LF

where length and timestamp are ASCII integers, the former is the
length of this record and you would use to compute the file
offset you give lseek() to seek back to the beginning of this
record.  The latter is the creation time of this record.  The
type of the record is either full or incremental (perhaps 'F'
and 'I').

A full record describes in the above "object-name refname" tuple
format.  An incremental record has entries only for the subset
of refs, with 0{40} object name denoting deletion of a ref as in
your suggestion.

A few nice things about this representation:

 - It gives reflog for free (we need to add 'author' and
   'reason' encoded after timestamp, though); it also gives
   reflog to tags which I sometimes miss.

 - Garbage collection is straightforward, although it requires
   locking the whole refs.

 - You can now have 'foo' and 'foo/a', 'foo/b', branches at the
   same time.  It is natural to organize a topic branch 'foo'
   which consists of a merge of subtopics 'foo/a', 'foo/b', ...,
   but this cannot be expressed with the current directory-file
   based scheme.

Some minor details:

 - After every N (I do not think we would need to make this
   configurable, but we need to decide on a sane value) updates,
   we would want to write a full record so that we do not have
   to traverse too long a chain.

 - An incremental record could be delta since the last full,
   instead of delta since the last record, as the above
   introduction hints.  If we start from A, B, and C, and then A
   and B are modified in turn, we might have:

	Full: A B C
        Incr: A'
        Incr: B'

   but we could have instead:

	Full: A B C
        Incr: A'
        Incr: A' B'

   For this to make sense, the record header needs to record a
   back-pointer to the last full record before the current
   incremental record.  With that, we would need to read at most
   two records to find out the current status.

Locking between writers and readers can become problematic.  The
reader would read the last line to find the record length, seek
that much back to find the beginning of the "then-latest"
record, but when there is a simultanous writer, there is no
guarantee that what at the last line of the file _is_ the length
of the last record.  The writer may be in the middle of
appending something.  We may be able to work it around by:

 - Add hash of the record at the end;

 - The reader assumes that there is usually no collision, reads
   the hash and the length, seeks back and reads the record and
   validate the hash -- if the hash does not match, it is likely
   that it detected the writer-reader collision so it should
   wait until the file grows and stops growing and then retry.

But writer-writer contention obviously needs a file lock.  Maybe
we can use flock/lockf/fcntl(F_SETLKW)?  If we are going to do
that then protecting the reader-writer side with the same
mechanism may be simpler and cleaner.

> NOTE! It's important that whatever sceme used gets locking right. The 
> above suggestion gets it right simply because it doesn't really _change_ 
> anything. Any new or modified ref ends up using the old code, and using a 
> ".lock" file and renaming it automatically does the same thing it ever 
> did.

The above changes things quite a bit, so locking aspect of it
needs to be thought through thouroughly.

next prev parent reply	other threads:[~2006-09-11 18:53 UTC|newest]

Thread overview: 101+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-09-07 19:52 Change set based shallow clone Jon Smirl
2006-09-07 20:21 ` Jakub Narebski
2006-09-07 20:41   ` Jon Smirl
2006-09-07 21:33     ` Jeff King
2006-09-07 21:51       ` Jakub Narebski
2006-09-07 21:37     ` Jakub Narebski
2006-09-07 22:14     ` Junio C Hamano
2006-09-07 23:09       ` Jon Smirl
2006-09-10 23:20         ` Anand Kumria
2006-09-08  8:48     ` Andreas Ericsson
2006-09-07 22:07 ` Junio C Hamano
2006-09-07 22:40   ` Jakub Narebski
2006-09-08  3:54   ` Martin Langhoff
2006-09-08  5:30     ` Junio C Hamano
2006-09-08  7:15       ` Martin Langhoff
2006-09-08  8:33         ` Junio C Hamano
2006-09-08 17:18         ` A Large Angry SCM
2006-09-08 14:20       ` Jon Smirl
2006-09-08 15:50         ` Jakub Narebski
2006-09-09  3:13           ` Petr Baudis
2006-09-09  8:39             ` Jakub Narebski
2006-09-08  5:05   ` Aneesh Kumar K.V
2006-09-08  1:01 ` linux
2006-09-08  2:23   ` Jon Smirl
2006-09-08  8:36     ` Jakub Narebski
2006-09-08  8:39       ` Junio C Hamano
2006-09-08 18:42     ` linux
2006-09-08 21:13       ` Jon Smirl
2006-09-08 22:27         ` Jakub Narebski
2006-09-08 23:09         ` Linus Torvalds
2006-09-08 23:28           ` Jon Smirl
2006-09-08 23:45             ` Paul Mackerras
2006-09-09  1:45               ` Jon Smirl
2006-09-10 12:41             ` Paul Mackerras
2006-09-10 14:56               ` Jon Smirl
2006-09-10 16:10                 ` linux
2006-09-10 18:00                   ` Jon Smirl
2006-09-10 19:03                     ` linux
2006-09-10 20:00                       ` Linus Torvalds
2006-09-10 21:00                         ` Jon Smirl
2006-09-11  2:49                           ` Linus Torvalds
2006-09-10 22:41                         ` Paul Mackerras
2006-09-11  2:55                           ` Linus Torvalds
2006-09-11  3:18                             ` Linus Torvalds
2006-09-11  6:35                               ` Junio C Hamano
2006-09-11 18:54                               ` Junio C Hamano [this message]
2006-09-11  8:36                             ` Paul Mackerras
2006-09-11 14:26                               ` linux
2006-09-11 15:01                                 ` Jon Smirl
2006-09-11 16:47                                 ` Junio C Hamano
2006-09-11 21:52                                   ` Paul Mackerras
2006-09-11 23:47                                     ` Junio C Hamano
2006-09-12  0:06                                       ` Jakub Narebski
2006-09-12  0:18                                         ` Junio C Hamano
2006-09-12  0:25                                           ` Jakub Narebski
2006-09-11  9:04                             ` Jakub Narebski
2006-09-10 18:51                 ` Junio C Hamano
2006-09-11  0:04                   ` Shawn Pearce
2006-09-11  0:42                     ` Junio C Hamano
2006-09-11  0:03               ` Shawn Pearce
2006-09-11  0:41                 ` Junio C Hamano
2006-09-11  1:04                   ` Jakub Narebski
2006-09-11  2:44                     ` Shawn Pearce
2006-09-11  5:27                       ` Junio C Hamano
2006-09-11  6:08                         ` Shawn Pearce
2006-09-11  7:11                           ` Junio C Hamano
2006-09-11 17:52                             ` Shawn Pearce
2006-09-11  2:11                   ` Jon Smirl
2006-09-09  1:05           ` Paul Mackerras
2006-09-09  2:56             ` Linus Torvalds
2006-09-09  3:23               ` Junio C Hamano
2006-09-09  3:31               ` Paul Mackerras
2006-09-09  4:04                 ` Linus Torvalds
2006-09-09  8:47                   ` Marco Costalba
2006-09-09 17:33                     ` Linus Torvalds
2006-09-09 18:04                       ` Marco Costalba
2006-09-09 18:44                         ` linux
2006-09-09 19:17                           ` Marco Costalba
2006-09-09 20:05                         ` Linus Torvalds
2006-09-09 20:43                           ` Jeff King
2006-09-09 21:11                             ` Junio C Hamano
2006-09-09 21:14                               ` Jeff King
2006-09-09 21:40                             ` Linus Torvalds
2006-09-09 22:54                               ` Jon Smirl
2006-09-10  0:18                                 ` Linus Torvalds
2006-09-10  1:22                                   ` Junio C Hamano
2006-09-10  3:49                           ` Marco Costalba
2006-09-10  4:13                             ` Junio C Hamano
2006-09-10  4:23                               ` Marco Costalba
2006-09-10  4:46                                 ` Marco Costalba
2006-09-10  4:54                                 ` Junio C Hamano
2006-09-10  5:14                                   ` Marco Costalba
2006-09-10  5:46                                     ` Junio C Hamano
2006-09-10 15:21                                     ` linux
2006-09-10 18:32                                       ` Marco Costalba
2006-09-11  9:56                                       ` Paul Mackerras
2006-09-11 12:39                                         ` linux
2006-09-10  9:49                                   ` Jakub Narebski
2006-09-10 10:28                                   ` Josef Weidendorfer
  -- strict thread matches above, loose matches on Subject: below --
2006-09-09 10:31 linux
2006-09-09 13:00 ` Marco Costalba

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7v8xkqgrnz.fsf@assigned-by-dhcp.cox.net \
    --to=junkio@cox.net \
    --cc=git@vger.kernel.org \
    --cc=jonsmirl@gmail.com \
    --cc=linux@horizon.com \
    --cc=paulus@samba.org \
    --cc=torvalds@osdl.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).