git.vger.kernel.org archive mirror
* git bundle vs git rev-list
@ 2014-12-05 22:36 Jesse Hopkins
  2014-12-05 23:01 ` Junio C Hamano
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Jesse Hopkins @ 2014-12-05 22:36 UTC (permalink / raw)
  To: git

Hello all,

I am working on a wrapper around git bundle to synchronize git repos
via sneakernet from network ‘a’ to network ‘b’ on a fairly frequent
basis (daily to weekly).  Network ‘b’ has a gatekeeper who is
persnickety about what content might end up on his network.  The
gatekeeper wants to know about the content being transferred.

I’ve come up with a scheme to list the final form of all files
included in the bundle in whole or in part; see the pseudocode below:



# BEGIN PSEUDOCODE

# Create the bundle
git bundle create out.bundle --all "--since=<last_bundle_date>"

# Get the list of commits
included_commits = git rev-list --all "--since=<last_bundle_date>"

# For each commit, look at its immediate parent(s) and find blobs in its
# tree that are not in the parent's tree
foreach commit in included_commits:
    # Get all blobs in this commit's tree, map blob to file name
    CommitBlobsMapToFilename = Process(git ls-tree -r commit)

    # Now find the parent commit(s); the first field printed is the
    # commit itself, so drop it
    ParentCommits = Rest(git rev-list --parents -n 1 commit)

    foreach parent in ParentCommits:
        # Get all blobs in the parent's tree
        ParentBlobsMapToFilename = Process(git ls-tree -r parent)

        # Find blobs in this commit's tree that are not in the
        # parent's tree
        NewBlobs = setdiff(CommitBlobsMapToFilename, ParentBlobsMapToFilename)

        # Write each new blob's contents to a unique filename
        foreach blob in NewBlobs:
            filename = CommitBlobsMapToFilename(blob)
            filename = makeUnique(filename)
            git show blob > filename

# END PSEUDOCODE
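
For reference, the set-difference step of the pseudocode can be sketched
in POSIX shell roughly as follows.  The function names new_blobs and
report_new_blobs are hypothetical, not git commands.  The sketch compares
blob ids only, as the pseudocode does, and so reports nothing for root
commits, which have no parent to compare against.

```shell
# Print "<blob id><TAB><path>" for every blob that is in the tree of
# commit $2 but not anywhere in the tree of commit $1.
new_blobs() {
    { git ls-tree -r "$1"; echo "--"; git ls-tree -r "$2"; } |
    awk -F'\t' '
        $0 == "--" { second = 1; next }      # separator between the two listings
        { split($1, meta, " ") }             # meta: mode, type, object id
        !second { seen[meta[3]] = 1; next }  # remember everything in the parent
        meta[2] == "blob" && !(meta[3] in seen) { print meta[3] "\t" $2 }'
}

# Walk the given revision range and report new blobs for each
# (parent, commit) pair.  "$commit^@" names all parents of the commit
# and expands to nothing for a root commit.
report_new_blobs() {
    git rev-list "$@" | while read -r commit; do
        for parent in $(git rev-parse "$commit^@"); do
            new_blobs "$parent" "$commit"
        done
    done
}
```

Called as report_new_blobs --all "--since=<last_bundle_date>", this would
print one line per new blob; "git show <blob id>" then yields the content
to write out.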


This scheme has worked well, but this approach is predicated on the
assumption that

git bundle create --all --since=<last_bundle>

uses the same commits that are returned by

git rev-list --all --since=<last_bundle>

However, I’ve noticed a scenario where that is not the case.  I create
a bundle using --since=yesterday, where no activity has been made
within the past few days.  As expected, 'git rev-list --all
--since=yesterday' returns 0 commits.  However, the command 'git
bundle create --all --since=yesterday' creates a bundle containing the
full history.

Tags seem to be the culprit, but I don’t know why. I do notice in the
output of git bundle that it mentions “skipping ref …” and “skipping
tag …”, and sure enough all branches and most tags are shown as being
skipped.  However there are a few tags that are missing from that
list.

If I use --branches rather than --all as the limiter, then all is
well.  In that case, git rev-list still returns 0 commits, and git
bundle reports that it is refusing to make an empty bundle, as
expected.

So after all that, I have two questions:

1. Any thoughts on why a tag would be included by 'git bundle', when
'git rev-list' with the same arguments returns empty?

2. Is there a way to list commits contained in the bundle file itself?
 This seems like it would be more robust than trying to re-create the
commit list via 'git rev-list'.

Thanks,

Jesse


* Re: git bundle vs git rev-list
  2014-12-05 22:36 git bundle vs git rev-list Jesse Hopkins
@ 2014-12-05 23:01 ` Junio C Hamano
  2014-12-05 23:13 ` brian m. carlson
  2014-12-06  5:16 ` Jeff King
  2 siblings, 0 replies; 6+ messages in thread
From: Junio C Hamano @ 2014-12-05 23:01 UTC (permalink / raw)
  To: Jesse Hopkins; +Cc: git

Jesse Hopkins <jesse.hops@gmail.com> writes:

> 2. Is there a way to list commits contained in the bundle file itself?
>  This seems like it would be more robust than trying to re-create the
> commit list via 'git rev-list'.

"git bundle list-heads o.bndl" shows the positive endpoints, but
there is no corresponding "git bundle list-prereq" that shows the
prerequisite commits.

Running "git bundle verify o.bndl" in an empty directory will show
the negative endpoints that are required to be in the receiving
repository in its error message, e.g.

    $ git bundle verify ~/w/git.git/o.bndl
    error: Repository lacks these prerequisite commits:
    error: bf404025edf1d7f5a69aa07cbaa88622e9d528df
    error: 15ab2081fff5b234ec5705a8645d39c1fdcf204c
    ...

so collecting them would be one way to substitute "list-prereq".

Once you have learned the positive and negative endpoints, running
"git rev-list --objects $positive_ones --not $negative_ones" should
list all the objects contained in the bundle.
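
One way to collect both kinds of endpoints without scraping the error
message is to read them from the bundle header directly.  The sketch
below assumes the version 2 bundle format, in which the header lists
prerequisites as lines starting with "-" and tips as "<object id>
<refname>" lines, and ends at the first empty line.  The function names
are hypothetical, not git subcommands.

```shell
# Print the prerequisite (negative) object ids recorded in a bundle header.
bundle_prereqs() {
    # The header ends at the first empty line; prerequisites start with "-".
    awk '/^$/ { exit } /^-/ { print substr($1, 2) }' "$1"
}

# Print the tip (positive) object ids recorded in a bundle header.
bundle_tips() {
    awk '/^$/ { exit } /^[0-9a-f]+ / { print $1 }' "$1"
}

# List the objects the bundle should contain.  Must be run inside a
# repository that already has those objects (e.g. the sending side).
bundle_objects() {
    git rev-list --objects $(bundle_tips "$1") --not $(bundle_prereqs "$1")
}
```

In the sending repository, "bundle_objects o.bndl" would then run the
rev-list command described above with the endpoints filled in.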


* Re: git bundle vs git rev-list
  2014-12-05 22:36 git bundle vs git rev-list Jesse Hopkins
  2014-12-05 23:01 ` Junio C Hamano
@ 2014-12-05 23:13 ` brian m. carlson
  2014-12-05 23:40   ` Junio C Hamano
  2014-12-06  5:16 ` Jeff King
  2 siblings, 1 reply; 6+ messages in thread
From: brian m. carlson @ 2014-12-05 23:13 UTC (permalink / raw)
  To: Jesse Hopkins; +Cc: git


On Fri, Dec 05, 2014 at 03:36:18PM -0700, Jesse Hopkins wrote:
> 1. Any thoughts on why a tag would be included by 'git bundle', when
> 'git rev-list' with the same arguments returns empty?

I think the answer to this is found in the git rev-list manpage:

  List commits that are reachable by following the parent links from the
  given commit(s), but exclude commits that are reachable from the
  one(s) given with a ^ in front of them.

The operative word here is "commits".  A bundle might include one or
more tag objects, or unannotated tags, even though no new commits were
available within the time frame.
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | http://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: RSA v4 4096b: 88AC E9B2 9196 305B A994 7552 F1BA 225C 0223 B187



* Re: git bundle vs git rev-list
  2014-12-05 23:13 ` brian m. carlson
@ 2014-12-05 23:40   ` Junio C Hamano
  2014-12-05 23:42     ` brian m. carlson
  0 siblings, 1 reply; 6+ messages in thread
From: Junio C Hamano @ 2014-12-05 23:40 UTC (permalink / raw)
  To: brian m. carlson; +Cc: Jesse Hopkins, git

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> On Fri, Dec 05, 2014 at 03:36:18PM -0700, Jesse Hopkins wrote:
>> 1. Any thoughts on why a tag would be included by 'git bundle', when
>> 'git rev-list' with the same arguments returns empty?
>
> I think the answer to this is found in the git rev-list manpage:
>
>   List commits that are reachable by following the parent links from the
>   given commit(s), but exclude commits that are reachable from the
>   one(s) given with a ^ in front of them.
>
> The operative word here is "commits".  A bundle might include one or
> more tag objects, or unannotated tags, even though no new commits were
> available within the time frame.

Is this what a recent "git bundle create" change in 2.1.1 and 2.2
fixed?  The Release Notes to them seem to have this entry:

 * "git bundle create" with date-range specification were meant to
   exclude tags outside the range, but it did not work correctly.


* Re: git bundle vs git rev-list
  2014-12-05 23:40   ` Junio C Hamano
@ 2014-12-05 23:42     ` brian m. carlson
  0 siblings, 0 replies; 6+ messages in thread
From: brian m. carlson @ 2014-12-05 23:42 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jesse Hopkins, git


On Fri, Dec 05, 2014 at 03:40:06PM -0800, Junio C Hamano wrote:
> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
> 
> > On Fri, Dec 05, 2014 at 03:36:18PM -0700, Jesse Hopkins wrote:
> >> 1. Any thoughts on why a tag would be included by 'git bundle', when
> >> 'git rev-list' with the same arguments returns empty?
> >
> > I think the answer to this is found in the git rev-list manpage:
> >
> >   List commits that are reachable by following the parent links from the
> >   given commit(s), but exclude commits that are reachable from the
> >   one(s) given with a ^ in front of them.
> >
> > The operative word here is "commits".  A bundle might include one or
> > more tag objects, or unannotated tags, even though no new commits were
> > available within the time frame.
> 
> Is this what a recent "git bundle create" change in 2.1.1 and 2.2
> fixed?  The Release Notes to them seem to have this entry:
> 
>  * "git bundle create" with date-range specification were meant to
>    exclude tags outside the range, but it did not work correctly.

That certainly could be the case.  I was thinking that perhaps someone
had created a tag recently, but your explanation is more likely.
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | http://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: RSA v4 4096b: 88AC E9B2 9196 305B A994 7552 F1BA 225C 0223 B187



* Re: git bundle vs git rev-list
  2014-12-05 22:36 git bundle vs git rev-list Jesse Hopkins
  2014-12-05 23:01 ` Junio C Hamano
  2014-12-05 23:13 ` brian m. carlson
@ 2014-12-06  5:16 ` Jeff King
  2 siblings, 0 replies; 6+ messages in thread
From: Jeff King @ 2014-12-06  5:16 UTC (permalink / raw)
  To: Jesse Hopkins; +Cc: git

On Fri, Dec 05, 2014 at 03:36:18PM -0700, Jesse Hopkins wrote:

> #Create the bundle
> git bundle create out.bundle --all "--since=<last_bundle_date>"

Others pointed out that a bug in the handling of --since may be the
culprit here. However, I'd encourage you to use actual sha1s, as they
are going to be more robust (especially in the face of any clock skew in
the commit timestamps).

You should be able to follow a procedure like:

  1. On day 1, create a bundle from scratch:

       git bundle create out.bundle --all

  2. Before you send it out, record its tips in the local repository
     for later reference:

       git fetch out.bundle '+refs/*:refs/remotes/bundle/*'

  3. On day 2, create a bundle from the previously recorded tips:

       git bundle create out.bundle --all --not --remotes=bundle

  4. Update your tips in the same way:

       git fetch out.bundle '+refs/*:refs/remotes/bundle/*'

and so on for day 3 and onward.

Note that this is not the only way to store those tips (I just did it
using git refs because it's simple to manipulate). You could also just
store it in a file:

      # checkpoint
      git ls-remote out.bundle | cut -f1 | sort -u >tips

      # make incremental bundle
      git bundle create out.bundle --all --not $(cat tips)

This also makes it easy to recover if the other side ever gets out of
sync (say you create and checkpoint a bundle on the sending side, but it
never makes it to the remote; how do you know where to start from?). You
can always get the latest set of tips from the remote by running:

      git ls-remote . | cut -f1 | sort -u >tips

on it and then sneaker-netting the tips file back to the sender.

-Peff
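
The tips-file variant of this procedure could be wrapped up roughly as
follows.  This is a sketch under stated assumptions, not a tested tool:
make_bundle is a hypothetical name, the tips are read with "git bundle
list-heads" rather than ls-remote, and new tips are merged into the
tips file instead of overwriting it, so that refs skipped in one bundle
stay excluded in the next.

```shell
# Hypothetical helper: write an incremental bundle and checkpoint its tips.
#   $1 = bundle file to write
#   $2 = tips file holding object ids the receiver already has
make_bundle() {
    bundle=$1 tips=$2
    if [ -s "$tips" ]; then
        # Exclude everything reachable from the recorded tips.
        git bundle create "$bundle" --all --not $(cat "$tips") || return 1
    else
        # First run: nothing has been sent yet, so bundle everything.
        git bundle create "$bundle" --all || return 1
    fi
    # Checkpoint: merge the tips of the new bundle into the tips file.
    # Merging rather than overwriting is an assumption of this sketch.
    touch "$tips"
    git bundle list-heads "$bundle" | cut -d' ' -f1 |
        sort -u - "$tips" >"$tips.new" && mv "$tips.new" "$tips"
}
```

When there is nothing new to send, "git bundle create" refuses to write
an empty bundle, the function returns non-zero, and the tips file is
left untouched.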


