Git development
 help / color / mirror / Atom feed
* Re: GTP/0.1 terminology 101: commit reels and references
       [not found] <488D42B6.4030701@gmail.com>
@ 2008-07-28  7:02 ` Sam Vilain
  2008-07-28  7:24   ` Junio C Hamano
  0 siblings, 1 reply; 7+ messages in thread
From: Sam Vilain @ 2008-07-28  7:02 UTC (permalink / raw)
  To: Joshua Roys; +Cc: gittorrent, git, Jonas Fonseca

Josh, I'm cc:'ing the main list with this one as well, as I think it
falls into the scope of discussions of wider interest than the rfc- and
implementation-specific discussions that (I envisaged would) happen on
the gittorrent list.

On Sun, 2008-07-27 at 23:53 -0400, Joshua Roys wrote:
> Here's some terms, as best as I understand them.
> 
> The *Commit Reel* is a sequence of objects between two reference points, 
> sorted in a deterministic fashion.  The two reference points are 
> *Reference* objects, or a signed tag containing a collection of git refs 
> (similar to the output from `git show-ref`).

Yes, that's right.  We have the two slightly ambiguous terms, "refs",
which mean the same thing as the rest of git, and "References", which
are more like .git/packed-refs.  Perhaps this is a good time to pick a
better name.  "Slice" would be an accurate term, though it's certainly
tempting to pick another term from the tape taxonomy, such as "Splice"¹
or "Mark"².

>   Commit reels can also, and generally do, include the objects required
>  for a specific commit.

Yes.  The only times where they wouldn't contain all the objects
required for the commits within the reel, is when those objects happened
to be contained by a previous reel.

This is one of the design decisions which I think may be a mistake; a
less expensive to calculate definition of a reel would be *all* objects
between the starting and ending Reference objects.  The current
definition requires a hash index of objects in each reel, which must be
consulted once for all objects when creating the reel index.

> As an example, a commit reel could be the set of objects between the 
> v2.6.25 and v2.6.26 tags of the Linux kernel.  The only thing we would 
> need added to those tags is the list of heads (and tags) at the time of 
> the tagging.

Correct.  It represents all of the changes in a given repository over a
period of time.

> Those two structures form the basis for two *Peer Wire Protocol* (PWP) 
> messages, Reels and References.
> 
> Any questions, corrections, or rotten vegetables?

No, I think that's all good so far...

Cheers,
Sam.

¹ - see
http://en.wikipedia.org/wiki/Reel-to-reel_audio_tape_recording#Description
² - as used by the old unix "mt", see eg
http://docs.sun.com/app/docs/doc/816-0210/6m6nb7mf3?a=view

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: GTP/0.1 terminology 101: commit reels and references
  2008-07-28  7:02 ` GTP/0.1 terminology 101: commit reels and references Sam Vilain
@ 2008-07-28  7:24   ` Junio C Hamano
  2008-07-28 10:03     ` Sam Vilain
  0 siblings, 1 reply; 7+ messages in thread
From: Junio C Hamano @ 2008-07-28  7:24 UTC (permalink / raw)
  To: Sam Vilain; +Cc: Joshua Roys, gittorrent, git, Jonas Fonseca

Sam Vilain <sam@vilain.net> writes:

>>   Commit reels can also, and generally do, include the objects required
>>  for a specific commit.
>
> Yes.  The only times where they wouldn't contain all the objects
> required for the commits within the reel, is when those objects happened
> to be contained by a previous reel.

What do you mean by "previous" reel?  It is not quite defined in your
message but perhaps defined elsewhere?

How is this different from a bundle?  Does a reel, unlike a bundle,
contain the full tree for the bottom commits? 

> This is one of the design decisions which I think may be a mistake; a
> less expensive to calculate definition of a reel would be *all* objects
> between the starting and ending Reference objects.

Do you mean all such objects and nothing else?  That would imply that a
reel is quite similar to a bundle (but neither rev-list --objects-edge
nor bundle guarantees that the result is minimal).

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: GTP/0.1 terminology 101: commit reels and references
  2008-07-28  7:24   ` Junio C Hamano
@ 2008-07-28 10:03     ` Sam Vilain
  2008-07-28 12:01       ` Johannes Schindelin
  0 siblings, 1 reply; 7+ messages in thread
From: Sam Vilain @ 2008-07-28 10:03 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Joshua Roys, gittorrent, git, Jonas Fonseca

On Mon, 2008-07-28 at 00:24 -0700, Junio C Hamano wrote:
> >>   Commit reels can also, and generally do, include the objects required
> >>  for a specific commit.
> >
> > Yes.  The only times where they wouldn't contain all the objects
> > required for the commits within the reel, is when those objects happened
> > to be contained by a previous reel.
> 
> What do you mean by "previous" reel?  It is not quite defined in your
> message but perhaps defined elsewhere?
> 
> How is this different from a bundle?  Does a reel, unlike a bundle,
> contain the full tree for the bottom commits? 

They are almost identical, both being defined by a set of starting and
ending refs.  And now that you mention it, I feel slightly embarrassed
for not spotting the connection before.  I only really compared reels to
packs, which is what the original specification tried to chop up bitwise
and distribute chunk by chunk.

The differences are:

  - the reel has a defined object order (which as I hoped to demonstrate
    in the test cases, is just a refinement of rev-list --date-order)

  - deltas always point in one direction, to objects "earlier" on
    the reel, so that slices of the reel sent on the network can be made
    thin without resulting in unresolvable deltas (which should be
    possible to do on commit boundaries using rev-list --objects-edge)

  - the behaviour at the beginning of the reel is precisely defined
    (although as I said, I think that the decision might be worth
    revisiting - perhaps getting just the latest reel is a useful
    'shallow clone')

> > This is one of the design decisions which I think may be a mistake; a
> > less expensive to calculate definition of a reel would be *all* objects
> > between the starting and ending Reference objects.
> 
> Do you mean all such objects and nothing else?  That would imply that a
> reel is quite similar to a bundle (but neither rev-list --objects-edge
> nor bundle guarantees that the result is minimal).

It's the lack of guarantees which is the issue, really.  In order to
take the download work of the entire pack and distribute it over
multiple peers, you need a way to carve the bundle up.  This has to
happen in such a way that the fragments that you get back will actually
fit together at the end, and also in such a way that you don't lose the
benefits of delta compression.

The way I thought would be best to do that would be to line up all the
objects in an exactly defined way - hence, the "reel" concept - and then
chop that up.

If a pack is already arranged to line up with the commit reel's
structure, then it's possible that the amount of work required to answer
a "play" request is as little as looking up in the reel index the local
on-disk location within the local pack, and copying that to the network.

I've certainly wondered how much baggage could be removed from this
whole thing, like replacing the "tracker" with a simple git-daemon
message that holds a register of mirrors/peers, possibly layering things
over git:// instead of the bittorrent-like protocol, dividing up blocks
by the commit graph and not all objects, etc.  But I think that it would
be best to defer that kind of design change until the conclusion of this
prototype experiment.

That being said, anything which does shortcut the distance to the finish
line and can be agreed on wouldn't go amiss.

Sam

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: GTP/0.1 terminology 101: commit reels and references
  2008-07-28 10:03     ` Sam Vilain
@ 2008-07-28 12:01       ` Johannes Schindelin
  2008-07-28 19:26         ` Sam Vilain
  0 siblings, 1 reply; 7+ messages in thread
From: Johannes Schindelin @ 2008-07-28 12:01 UTC (permalink / raw)
  To: Sam Vilain; +Cc: Junio C Hamano, Joshua Roys, gittorrent, git, Jonas Fonseca

Hi,

On Mon, 28 Jul 2008, Sam Vilain wrote:

> On Mon, 2008-07-28 at 00:24 -0700, Junio C Hamano wrote:
>
> >
> > >
> > >> Commit reels can also, and generally do, include the objects 
> > >> required for a specific commit.
> > >
> > > Yes.  The only times where they wouldn't contain all the objects 
> > > required for the commits within the reel, is when those objects 
> > > happened to be contained by a previous reel.
> > 
> > What do you mean by "previous" reel?  It is not quite defined in your 
> > message but perhaps defined elsewhere?
> > 
> > How is this different from a bundle?  Does a reel, unlike a bundle, 
> > contain the full tree for the bottom commits?

AFAICT no, the reel should not contain the full tree for the bottom 
commit.

> They are almost identical, both being defined by a set of starting and 
> ending refs.  And now that you mention it, I feel slightly embarrassed 
> for not spotting the connection before.  I only really compared reels to 
> packs, which is what the original specification tried to chop up bitwise 
> and distribute chunk by chunk.
> 
> The differences are:
> 
>   - the reel has a defined object order (which as I hoped to demonstrate
>     in the test cases, is just a refinement of rev-list --date-order)

Do you mean that the commit reel is a list pointing to bundles that can be 
sorted topologically by their contained commits?

>   - deltas always point in one direction, to objects "earlier" on
>     the reel, so that slices of the reel sent on the network can be made
>     thin without resulting in unresolvable deltas (which should be
>     possible to do on commit boundaries using rev-list --objects-edge)

That is exactly what bundles do.  They are thin, as they assume that a few 
"preconditions", i.e. refs, are present.

>   - the behaviour at the beginning of the reel is precisely defined
>     (although as I said, I think that the decision might be worth
>     revisiting - perhaps getting just the latest reel is a useful
>     'shallow clone')

If you want to allow shallow clones, you must make the bundles non-thin.  
That would be a major bandwidth penalty.

I'd rather not allow shallow clones with Gitorrent.

> > > This is one of the design decisions which I think may be a mistake; 
> > > a less expensive to calculate definition of a reel would be *all* 
> > > objects between the starting and ending Reference objects.
> > 
> > Do you mean all such objects and nothing else?  That would imply that 
> > a reel is quite similar to a bundle (but neither rev-list 
> > --objects-edge nor bundle guarantees that the result is minimal).
> 
> It's the lack of guarantees which is the issue, really.

It should not be too difficult to provide a rev-list option (which is 
inherited by git-bundle, then) to pay an extra time to make sure that the 
bundle is minimal.

BTW this is a good example how communication on the Git mailing list can 
help a GSoC project.

> In order to take the download work of the entire pack and distribute it 
> over multiple peers, you need a way to carve the bundle up.  This has to 
> happen in such a way that the fragments that you get back will actually 
> fit together at the end, and also in such a way that you don't lose the 
> benefits of delta compression.

That should be relatively easy.

> The way I thought would be best to do that would be to line up all the 
> objects in an exactly defined way - hence, the "reel" concept - and then 
> chop that up.

What exactly is that exact definition?

Is it the output of "rev-list --all --objects", chopped into equal chunks 
at commit boundaries?  If so, it should probably be equal in terms of 
size, right?

The tricky thing, of course, is to make that thing incremental, i.e. 
replace only a minimal amount of items in the "commit reel" (if I 
understood correctly, and the commit reel refers to a list of sets of 
commits with their objects) when a branch was modified.

Hmm.  Maybe it would be time for you to draw a tiny diagram for all the 
people too lazy like me, which shows roughly how the communication between 
the peers should look like, and how the reel fits in.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: GTP/0.1 terminology 101: commit reels and references
  2008-07-28 12:01       ` Johannes Schindelin
@ 2008-07-28 19:26         ` Sam Vilain
  2008-07-28 22:30           ` Johannes Schindelin
  0 siblings, 1 reply; 7+ messages in thread
From: Sam Vilain @ 2008-07-28 19:26 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Junio C Hamano, Joshua Roys, gittorrent, git, Jonas Fonseca

On Mon, 2008-07-28 at 14:01 +0200, Johannes Schindelin wrote:
> >   - the reel has a defined object order (which as I hoped to demonstrate
> >     in the test cases, is just a refinement of rev-list --date-order)
> 
> Do you mean that the commit reel is a list pointing to bundles that can be 
> sorted topologically by their contained commits?

Yes, but it is more defined than that.  There are still ambiguities with
topological sort, so the gittorrent spec specified exactly how all ties
are broken.  They happen to be a further refinement of --date-order,
with respect to the ordering of commits.

> >   - deltas always point in one direction, to objects "earlier" on
> >     the reel, so that slices of the reel sent on the network can be made
> >     thin without resulting in unresolvable deltas (which should be
> >     possible to do on commit boundaries using rev-list --objects-edge)
> That is exactly what bundles do.  They are thin, as they assume that a few 
> "preconditions", i.e. refs, are present.

Ok.  I think there are also some other trivial differences such as
bundles containing refs (which in the context of gittorrent will be
useless).

> >   - the behaviour at the beginning of the reel is precisely defined
> >     (although as I said, I think that the decision might be worth
> >     revisiting - perhaps getting just the latest reel is a useful
> >     'shallow clone')
> 
> If you want to allow shallow clones, you must make the bundles non-thin.  
> That would be a major bandwidth penalty.
> 
> I'd rather not allow shallow clones with Gitorrent.

By "Shallow" I think I mean a different thing to you.  I mean something
akin to just the last pack's worth of commits.

> > It's the lack of guarantees which is the issue, really.
> 
> It should not be too difficult to provide a rev-list option (which is 
> inherited by git-bundle, then) to pay an extra time to make sure that the 
> bundle is minimal.

Ok.  But from the current implementation's perspective, this is not yet
needed, we are just using the existing API.

Actually what would be useful would be for the thin pack generation to
also allow any object to be specified as its input list, not just
commits... then we wouldn't have to break blocks on commit boundaries
(see http://gittorrent.utsl.gen.nz/rfc.html#org-blocks).

> > In order to take the download work of the entire pack and distribute it 
> > over multiple peers, you need a way to carve the bundle up.  This has to 
> > happen in such a way that the fragments that you get back will actually 
> > fit together at the end, and also in such a way that you don't lose the 
> > benefits of delta compression.
> 
> That should be relatively easy.
> 
> > The way I thought would be best to do that would be to line up all the 
> > objects in an exactly defined way - hence, the "reel" concept - and then 
> > chop that up.
> 
> What exactly is that exact definition?

http://gittorrent.utsl.gen.nz/rfc.html#org-reels

> Is it the output of "rev-list --all --objects", chopped into equal chunks 
> at commit boundaries?  If so, it should probably be equal in terms of 
> size, right?

No.  It's chopped by uncompressed size.

http://gittorrent.utsl.gen.nz/rfc.html#org-blocks

> The tricky thing, of course, is to make that thing incremental, i.e. 
> replace only a minimal amount of items in the "commit reel" (if I 
> understood correctly, and the commit reel refers to a list of sets of 
> commits with their objects) when a branch was modified.

You would make a new reel to cover a new bunch of updates.  It's
important that the reels don't change too often for reasons I describe
in the RFC.

> Hmm.  Maybe it would be time for you to draw a tiny diagram for all the 
> people too lazy like me, which shows roughly how the communication between 
> the peers should look like, and how the reel fits in.

As I said in my recent message to the list, I wrote another top level
overview here:

http://gittorrent.utsl.gen.nz/rfc.html#org-blocks

Cheers,
Sam.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: GTP/0.1 terminology 101: commit reels and references
  2008-07-28 19:26         ` Sam Vilain
@ 2008-07-28 22:30           ` Johannes Schindelin
  2008-07-29  7:00             ` Sam Vilain
  0 siblings, 1 reply; 7+ messages in thread
From: Johannes Schindelin @ 2008-07-28 22:30 UTC (permalink / raw)
  To: Sam Vilain; +Cc: Junio C Hamano, Joshua Roys, gittorrent, git, Jonas Fonseca

Hi,

On Tue, 29 Jul 2008, Sam Vilain wrote:

> On Mon, 2008-07-28 at 14:01 +0200, Johannes Schindelin wrote:
> > >   - the reel has a defined object order (which as I hoped to 
> > >     demonstrate in the test cases, is just a refinement of rev-list 
> > >     --date-order)
> > 
> > Do you mean that the commit reel is a list pointing to bundles that 
> > can be sorted topologically by their contained commits?
> 
> Yes, but it is more defined than that.  There are still ambiguities with 
> topological sort, so the gittorrent spec specified exactly how all ties 
> are broken.  They happen to be a further refinement of --date-order, 
> with respect to the ordering of commits.

But does that not mean that any new ref branching off of an ancient commit 
changes all the pack boundaries?

I'd rather have an intelligent incremental updater, and keep most of the 
existing bundles immutable.  That way, a new ref, or a changed one, can be 
mostly served from peers, not exclusively from the seeders.

> > >   - deltas always point in one direction, to objects "earlier" on 
> > >     the reel, so that slices of the reel sent on the network can be 
> > >     made thin without resulting in unresolvable deltas (which should 
> > >     be possible to do on commit boundaries using rev-list 
> > >     --objects-edge)
> >
> > That is exactly what bundles do.  They are thin, as they assume that a 
> > few "preconditions", i.e. refs, are present.
> 
> Ok.  I think there are also some other trivial differences such as 
> bundles containing refs (which in the context of gittorrent will be 
> useless).

Yeah, I think that bundles themselves are pretty useless in gitorrent.  
But what they _contain_ is pretty much what you need as blocks.

> > >   - the behaviour at the beginning of the reel is precisely defined 
> > >     (although as I said, I think that the decision might be worth 
> > >     revisiting - perhaps getting just the latest reel is a useful 
> > >     'shallow clone')
> > 
> > If you want to allow shallow clones, you must make the bundles 
> > non-thin.  That would be a major bandwidth penalty.
> > 
> > I'd rather not allow shallow clones with Gitorrent.
> 
> By "Shallow" I think I mean a different thing to you.  I mean something 
> akin to just the last pack's worth of commits.

That _is_ a shallow clone.  And that is exactly what I meant.  If you want 
to have all objects of the commits in the same pack, then you are 
basically making fat packs.  Which come with a hefty bandwidth penalty.

That is why I would suggest not allowing shallow clones; if you want to 
allow them, I have to ask myself why bother with a torrent at all...  It 
is not like the shallow clones are large, or that the people fetching them 
will stay around long to seed anything, and the packs would change 
frequently, making the whole torrent business pretty inefficient.

> > > It's the lack of guarantees which is the issue, really.
> > 
> > It should not be too difficult to provide a rev-list option (which is 
> > inherited by git-bundle, then) to pay an extra time to make sure that 
> > the bundle is minimal.
> 
> Ok.  But from the current implementation's perspective, this is not yet 
> needed, we are just using the existing API.

Why make it hard?  We have a lively community with brilliant people, and 
they frequently have fun solving puzzles like this: what is the best 
strategy to make equally sized, rarely (or maybe never?) changing packs 
from a set of given refs.

> Actually what would be useful would be for the thin pack generation to 
> also allow any object to be specified as its input list, not just 
> commits... then we wouldn't have to break blocks on commit boundaries 
> (see http://gittorrent.utsl.gen.nz/rfc.html#org-blocks).

That should be easy, but I think that it would be _even better_ if we ask 
pack-objects to generate several packs from the needed objects.  Ooops.  
That already exists: 

	$ git pack-objects --max-pack-size=<n>

Storing the packs in a second GIT_OBJECT_DIRECTORY that has the 
original as an alternate, together with the --local flag, should help even 
further: You can mark the last pack (which does not reach max-pack-size, 
most likely), remove it and just rerun the packing.

Of course, this needs some thought when large chunks of the object 
database become stale when a long branch was just deleted.  Not a major 
obstacle, though.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: GTP/0.1 terminology 101: commit reels and references
  2008-07-28 22:30           ` Johannes Schindelin
@ 2008-07-29  7:00             ` Sam Vilain
  0 siblings, 0 replies; 7+ messages in thread
From: Sam Vilain @ 2008-07-29  7:00 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Junio C Hamano, Joshua Roys, gittorrent, git, Jonas Fonseca

On Tue, 2008-07-29 at 00:30 +0200, Johannes Schindelin wrote:
> > Yes, but it is more defined than that.  There are still ambiguities with 
> > topological sort, so the gittorrent spec specified exactly how all ties 
> > are broken.  They happen to be a further refinement of --date-order, 
> > with respect to the ordering of commits.
> 
> But does that not mean that any new ref branching off of an ancient commit 
> changes all the pack boundaries?

No.  A "References" object is a snapshot of all refs at a particular
time.  If you want to make a new ref you make a new "References" object.
*all* of the new objects are contained in the new reel, and the new
reels do not affect the old reels.

> That should be easy, but I think that it would be _even better_ if we ask 
> pack-objects to generate several packs from the needed objects.  Ooops.  
> That already exists: 
> 
> 	$ git pack-objects --max-pack-size=<n>

This does not deterministically generate the same pack for a given set of
refs across all git versions.

Your ideas would have been excellent earlier on, perhaps if developed
they might have resulted in something quite a bit simpler with all of
the features the current protocol has - but given we are in the second
half of a GSoC project of which the end is in sight then I think we
should shelve them until the project finishes.  There has certainly been
a lot of useful things come out of them!

Cheers,
Sam.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2008-07-29  7:01 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <488D42B6.4030701@gmail.com>
2008-07-28  7:02 ` GTP/0.1 terminology 101: commit reels and references Sam Vilain
2008-07-28  7:24   ` Junio C Hamano
2008-07-28 10:03     ` Sam Vilain
2008-07-28 12:01       ` Johannes Schindelin
2008-07-28 19:26         ` Sam Vilain
2008-07-28 22:30           ` Johannes Schindelin
2008-07-29  7:00             ` Sam Vilain

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox