git.vger.kernel.org archive mirror
From: Jesse Hopkins <jesse.hops@gmail.com>
To: git@vger.kernel.org
Subject: git bundle vs git rev-list
Date: Fri, 5 Dec 2014 15:36:18 -0700
Message-ID: <CAL3By--xYnXFUdDP3MDxAxvfeBT3ArFrdAV=apzdWg6_kiD2Yg@mail.gmail.com>

Hello all,

I am working on a wrapper around git bundle to synchronize git repos
via sneakernet from network ‘a’ to network ‘b’ on a fairly frequent
basis (daily to weekly). Network ‘b’ has a gatekeeper who is
persnickety about what content might end up on his network. The
gatekeeper wants to know about the content being transferred.

I’ve come up with a scheme to list the final form of all files
included in the bundle in whole or in part; see the pseudocode below:

# BEGIN PSEUDOCODE

# Create the bundle
git bundle create out.bundle --all "--since=<last_bundle_date>"

# Get the list of commits
included_commits = git rev-list --all "--since=<last_bundle_date>"

# For each commit, get its immediate parent(s) and find the blobs in
# its tree that are not in the parent's tree
foreach commit in included_commits:
    # Get all blobs in this commit's tree, map blob to file name
    CommitBlobsMapToFilename = Process(git ls-tree -r commit)

    # Now find the parent commit(s); the first field of the output is
    # the commit itself, so skip it
    ParentCommits = git rev-list --parents -n 1 commit

    foreach parent in ParentCommits:
        # Get all blobs in the parent's tree
        ParentBlobsMapToFilename = Process(git ls-tree -r parent)

        # Find blobs in this commit's tree that are not in the
        # parent's tree
        NewBlobs = setdiff(CommitBlobsMapToFilename, ParentBlobsMapToFilename)

        # Write each new blob's contents to a unique filename
        foreach blob in NewBlobs:
            filename = CommitBlobsMapToFilename(blob)
            filename = makeUnique(filename)
            git show blob > filename

# END PSEUDOCODE
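
For readers who want something closer to runnable, the per-commit step of
the pseudocode could look roughly like the Python sketch below. It is only
an illustration, not the actual wrapper: the function names
(parse_ls_tree, new_blobs, new_blobs_for_commit) are made up for this
example, it assumes git is on PATH, and it treats every blob of a root
commit as new, a case the pseudocode does not cover. Paths containing
unusual characters are quoted by 'git ls-tree' unless -z is used, which
this sketch does not handle.

```python
import subprocess


def parse_ls_tree(output):
    """Map blob SHA -> list of paths, from `git ls-tree -r <commit>` output.

    Each line has the form "<mode> <type> <sha>\t<path>". Non-blob
    entries (for example submodule links of type "commit") are skipped.
    """
    blobs = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        meta, path = line.split("\t", 1)
        _mode, objtype, sha = meta.split()
        if objtype == "blob":
            blobs.setdefault(sha, []).append(path)
    return blobs


def new_blobs(commit_blobs, parent_blobs):
    """Return the blobs present in the commit's tree but not in the parent's."""
    return {sha: paths for sha, paths in commit_blobs.items()
            if sha not in parent_blobs}


def git(*args, cwd="."):
    """Run a git command and return its stdout as text."""
    result = subprocess.run(["git", *args], cwd=cwd, check=True,
                            capture_output=True, text=True)
    return result.stdout


def new_blobs_for_commit(commit, cwd="."):
    """Union of blobs that are new relative to at least one parent."""
    # `rev-list --parents -n 1` prints the commit first, then its parents.
    fields = git("rev-list", "--parents", "-n", "1", commit, cwd=cwd).split()
    parents = fields[1:]
    commit_blobs = parse_ls_tree(git("ls-tree", "-r", commit, cwd=cwd))
    if not parents:
        # Assumption: for a root commit, report every blob as new.
        return commit_blobs
    found = {}
    for parent in parents:
        parent_blobs = parse_ls_tree(git("ls-tree", "-r", parent, cwd=cwd))
        found.update(new_blobs(commit_blobs, parent_blobs))
    return found
```

Keeping the parsing and the set difference separate from the git calls
makes those two pieces easy to check against canned 'git ls-tree' output.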


This scheme has worked well, but this approach is predicated on the
assumption that

git bundle create --all --since=<last_bundle>

uses the same commits that are returned by

git rev-list --all --since=<last_bundle>

However, I’ve noticed a scenario where that is not the case.  I create
a bundle using --since=yesterday in a repository that has seen no
activity within the past few days.  As expected, 'git rev-list --all
--since=yesterday' returns 0 commits.  However, the command 'git
bundle create --all --since=yesterday' creates a bundle containing the
full history.

Tags seem to be the culprit, but I don’t know why. I do notice in the
output of git bundle that it mentions “skipping ref …” and “skipping
tag …”, and sure enough all branches and most tags are shown as being
skipped.  However there are a few tags that are missing from that
list.

If I use --branches rather than --all as the limiter, then all is
well.  In that case, git rev-list still returns 0 commits, and git
bundle reports that it is refusing to make an empty bundle, as
expected.
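
As a stopgap, the --branches observation above could be turned into a
guard in the wrapper. The Python sketch below is one hypothetical way to
do that: it builds both command lines from a single limiter so they cannot
drift apart, and it only bundles when rev-list reports something. The
helper names are invented for this example. It works around the
behaviour rather than explaining it, and using --branches means tags are
not transferred at all.

```python
def limiter_args(since, ref_selector="--branches"):
    """Arguments shared by `git rev-list` and `git bundle create`."""
    return [ref_selector, f"--since={since}"]


def rev_list_cmd(since, ref_selector="--branches"):
    """Command line that lists the commits expected in the bundle."""
    return ["git", "rev-list", *limiter_args(since, ref_selector)]


def bundle_cmd(bundle_path, since, ref_selector="--branches"):
    """Command line that creates the bundle with the same limiter."""
    return ["git", "bundle", "create", bundle_path,
            *limiter_args(since, ref_selector)]


def has_new_commits(rev_list_output):
    """True if the rev-list output names at least one commit."""
    return any(line.strip() for line in rev_list_output.splitlines())
```

The wrapper would run rev_list_cmd(...) first and call bundle_cmd(...)
only when has_new_commits(...) returns True.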

So after all that, I have two questions:

1. Any thoughts on why a tag would be included by 'git bundle', when
'git rev-list' with the same arguments returns empty?

2. Is there a way to list commits contained in the bundle file itself?
 This seems like it would be more robust than trying to re-create the
commit list via 'git rev-list'.
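
For question 2, one possible direction is to read the bundle's own header
rather than re-deriving the limiter. 'git bundle list-heads <file>' prints
the refs a bundle carries, and 'git bundle verify <file>' also reports the
prerequisite commits it requires. A version 2 bundle file starts with the
line "# v2 git bundle", followed by "-<sha> <comment>" prerequisite lines
and "<sha> <refname>" lines, then a blank line and the pack data. The
Python sketch below parses that header and builds a rev-list command from
it. The function names are made up for this example, and the resulting
commit list is an assumption: it should approximate what the pack holds
when run in a repository that already has those objects, but it has not
been checked against the case described above.

```python
import io


def parse_bundle_header(stream):
    """Parse a v2 bundle header from a binary file object.

    Returns (prerequisites, refs): a list of prerequisite SHAs and a
    dict mapping ref name -> SHA. Reading stops at the blank line that
    separates the header from the pack data.
    """
    if stream.readline() != b"# v2 git bundle\n":
        raise ValueError("not a v2 git bundle")
    prerequisites, refs = [], {}
    for raw in iter(stream.readline, b""):
        if raw == b"\n":
            break
        line = raw.rstrip(b"\n").decode("utf-8", "replace")
        if line.startswith("-"):
            # Prerequisite line: "-<sha> <comment>"
            sha, _, _comment = line[1:].partition(" ")
            prerequisites.append(sha)
        else:
            # Reference line: "<sha> <refname>"
            sha, _, name = line.partition(" ")
            refs[name] = sha
    return prerequisites, refs


def rev_list_args(prerequisites, refs):
    """Build a rev-list command: reachable from tips, not from prerequisites."""
    tips = sorted(set(refs.values()))
    cmd = ["git", "rev-list", *tips]
    if prerequisites:
        cmd += ["--not", *prerequisites]
    return cmd


if __name__ == "__main__":
    # Tiny made-up header, only to show the calling convention.
    demo = io.BytesIO(b"# v2 git bundle\n-1111 old tip\n2222 refs/heads/master\n\n")
    print(rev_list_args(*parse_bundle_header(demo)))
```

With a real file, the stream would come from open("out.bundle", "rb").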

Thanks,

Jesse
