git.vger.kernel.org archive mirror
* Some feedback on 'git clone create'
From: Sainan @ 2024-12-04  5:28 UTC
  To: git@vger.kernel.org

Hi, I hope this email finds you well.

I think Git bundles/packfiles are an exceptional compression format, but I find there are some rough edges with the tool to create them:

1. There is no way to specify that you want a shallow bundle; instead, you are only able to a) pack the entire tree at a given head or b) pack new/updated objects in a specified range. Anecdotally, this could store data in ~67% of the size of an equivalent .zip file.

2. It seems that when specifying a commit hash, it raises an error:
$ git bundle create repo.bundle $(git rev-list HEAD | head -n 1)
> fatal: Refusing to create empty bundle.
This confuses me slightly because I thought a commit hash should also be a valid head _pointer_. 'git rev-list' also seems to agree with me on this.

-- Sainan

* Re: Some feedback on 'git clone create'
From: Justin Tobler @ 2024-12-04 23:19 UTC
  To: Sainan; +Cc: git@vger.kernel.org

On 24/12/04 05:28AM, Sainan wrote:
> Hi, I hope this email finds you well.
> 
> I think Git bundles/packfiles are an exceptional compression format, but I find there are some rough edges with the tool to create them:
> 
> 1. There is no way to specify that you want a shallow bundle; instead, you are only able to a) pack the entire tree at a given head or b) pack new/updated objects in a specified range. Anecdotally, this could store data in ~67% of the size of an equivalent .zip file.

You can create an incremental bundle covering a specified range.
Something like the following example might help you achieve what you are
looking for:

  $ git bundle create inc.bundle main~10..main

> 
> 2. It seems that when specifying a commit hash, it raises an error:
> $ git bundle create repo.bundle $(git rev-list HEAD | head -n 1)
> > fatal: Refusing to create empty bundle.
> This confuses me slightly because I thought a commit hash should also be a valid head _pointer_. 'git rev-list' also seems to agree with me on this.

A bundle is essentially a pack file with a header indicating the
references contained within the bundle. If no reference is provided, the
bundle is considered empty and git refuses to create it. I think this
makes sense in the context of unbundling as you probably would not want
to add new objects without updating references in the target repository.
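The header layout described above can be inspected directly. This is a minimal sketch, assuming a POSIX shell, Git 2.28 or newer (for `git init -b`), and a throwaway repository with a hypothetical `main` branch, `Demo` identity and file names. Because the header is plain text terminated by an empty line, `sed` can print it without touching the binary pack data that follows.

```shell
#!/bin/sh
# Sketch: print the plain-text header of a bundle built in a throwaway repository.
set -e
# Hypothetical identity so that "git commit" works without any user config.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
git init -q -b main "$tmp/repo"
cd "$tmp/repo"
for n in 1 2 3; do
    echo "$n" >file.txt
    git add file.txt
    git commit -q -m "commit $n"
done

git bundle create -q repo.bundle main
# The header runs up to the first empty line: a signature line
# ("# v2 git bundle" for a default SHA-1 repository), then one
# "<object-id> <refname>" line per reference. Pack data follows the empty line.
sed '/^$/q' repo.bundle
```

A full bundle like this one has no prerequisite lines; the ref lines are the only description of what it delivers.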

From the git-bundle(1) docs for "create", the usage does say it accepts
<git-rev-list-args> which may be a bit misleading because
git-rev-list(1) does consider the commit hash as valid. Maybe that
should be updated to indicate that proper references are expected.
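The refusal, and one possible workaround, can be reproduced in a throwaway repository. This is a sketch, not an official recipe, assuming a POSIX shell and Git 2.28 or newer; the branch names `main` and `snapshot`, the identity and the file names are hypothetical. Giving the commit a short-lived branch name supplies the reference that the header needs.

```shell
#!/bin/sh
set -e
# Throwaway repository with three commits on a hypothetical "main" branch.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
git init -q -b main "$tmp/repo"
cd "$tmp/repo"
for n in 1 2 3; do
    echo "$n" >file.txt
    git add file.txt
    git commit -q -m "commit $n"
done
commit=$(git rev-parse main~1)

# A bare object name resolves to no reference, so there is nothing to advertise.
if git bundle create -q hash.bundle "$commit" 2>/dev/null; then
    echo "unexpected: bundle was created"
else
    echo "refused: $commit is not a reference"
fi

# Workaround: name the commit with a temporary branch, bundle that branch,
# then delete the branch. The bundle still records refs/heads/snapshot.
git branch snapshot "$commit"
git bundle create -q hash.bundle snapshot
git branch -D -q snapshot
git bundle list-heads hash.bundle
```

The recipient then sees `refs/heads/snapshot` in the bundle and can fetch it by that name.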

-Justin

* Re: Some feedback on 'git clone create'
From: Sainan @ 2024-12-05  7:03 UTC
  To: Justin Tobler; +Cc: git@vger.kernel.org

> You can create an incremental bundle covering a specified range.

Yeah, I know that you can bundle a delta, but I mean an analogue to a shallow clone, which is kind of like traversing the tree in a different direction.

(Sorry for the double send; I forgot to CC git@vger.kernel.org on the reply.)

-- Sainan

* Re: Some feedback on 'git clone create'
From: Patrick Steinhardt @ 2024-12-05  8:38 UTC
  To: Justin Tobler; +Cc: Sainan, git@vger.kernel.org

On Wed, Dec 04, 2024 at 05:19:53PM -0600, Justin Tobler wrote:
> On 24/12/04 05:28AM, Sainan wrote:
> > Hi, I hope this email finds you well.
> > 
> > I think Git bundles/packfiles are an exceptional compression format, but I find there are some rough edges with the tool to create them:
> > 
> > 1. There is no way to specify that you want a shallow bundle; instead, you are only able to a) pack the entire tree at a given head or b) pack new/updated objects in a specified range. Anecdotally, this could store data in ~67% of the size of an equivalent .zip file.
> 
> You can create an incremental bundle covering a specified range.
> Something like the following example might help you achieve what you are
> looking for:
> 
>   $ git bundle create inc.bundle main~10..main

Yup. The thing that might not be immediately obvious is that
git-bundle(1) accepts git-rev-list(1) arguments, so you can influence
what is and isn't included via that. You can for example even generate
partial bundles without blobs:

    $ git bundle create partial.bundle main~10..main \
        --filter=blob:none

What you can do with the resulting bundle might be a different story.

> > 2. It seems that when specifying a commit hash, it raises an error:
> > $ git bundle create repo.bundle $(git rev-list HEAD | head -n 1)
> > > fatal: Refusing to create empty bundle.
> > This confuses me slightly because I thought a commit hash should also be a valid head _pointer_. 'git rev-list' also seems to agree with me on this.
> 
> A bundle is essentially a pack file with a header indicating the
> references contained within the bundle. If no reference is provided, the
> bundle is considered empty and git refuses to create it. I think this
> makes sense in the context of unbundling as you probably would not want
> to add new objects without updating references in the target repository.
> 
> From the git-bundle(1) docs for "create", the usage does say it accepts
> <git-rev-list-args> which may be a bit misleading because
> git-rev-list(1) does consider the commit hash as valid. Maybe that
> should be updated to indicate that proper references are expected.

That's somewhat weird indeed. I don't see a strong reason why the first
of the following commands works while the second one doesn't:

    $ git bundle create inc.bundle master~..master
    $ git bundle create inc.bundle $(git rev-parse master~)..$(git rev-parse master)

It's not like the bundle has "master" in its header in the first command
anyway, it only lists HEAD in there. So I'd claim that we could do the
same for the second command, as well.
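The asymmetry is easy to reproduce. A sketch, assuming a POSIX shell and Git 2.28 or newer, in a throwaway repository whose only branch is named `main` rather than `master` (identity and file names are hypothetical):

```shell
#!/bin/sh
set -e
# Throwaway repository with three commits on a hypothetical "main" branch.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
git init -q -b main "$tmp/repo"
cd "$tmp/repo"
for n in 1 2 3; do
    echo "$n" >file.txt
    git add file.txt
    git commit -q -m "commit $n"
done

# Range whose right-hand side is a branch name: accepted.
git bundle create -q by-name.bundle main~1..main

# The same range spelled with raw object names: refused ("empty bundle").
if git bundle create -q by-hash.bundle \
        "$(git rev-parse main~1)..$(git rev-parse main)" 2>/dev/null; then
    hash_result=created
else
    hash_result=refused
fi
echo "by name: created, by hash: $hash_result"
```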

Patrick

* Re: Some feedback on 'git clone create'
From: Justin Tobler @ 2024-12-05 15:27 UTC
  To: Patrick Steinhardt; +Cc: Sainan, git@vger.kernel.org

On 24/12/05 09:38AM, Patrick Steinhardt wrote:
> On Wed, Dec 04, 2024 at 05:19:53PM -0600, Justin Tobler wrote:
> > On 24/12/04 05:28AM, Sainan wrote:
> > > Hi, I hope this email finds you well.
> > > 
> > > I think Git bundles/packfiles are an exceptional compression format, but I find there are some rough edges with the tool to create them:
> > > 
> > > 1. There is no way to specify that you want a shallow bundle; instead, you are only able to a) pack the entire tree at a given head or b) pack new/updated objects in a specified range. Anecdotally, this could store data in ~67% of the size of an equivalent .zip file.
> > 
> > You can create an incremental bundle covering a specified range.
> > Something like the following example might help you achieve what you are
> > looking for:
> > 
> >   $ git bundle create inc.bundle main~10..main
> 
> Yup. The thing that might not be immediately obvious is that
> git-bundle(1) accepts git-rev-list(1) arguments, so you can influence
> what is and isn't included via that. You can for example even generate
> partial bundles without blobs:
> 
>     $ git bundle create partial.bundle main~10..main \
>         --filter=blob:none
> 
> What you can do with the resulting bundle might be a different story.

When trying to unbundle an incremental bundle into a repository that
lacks the prerequisite objects, Git fails. These prerequisite objects
are also listed in the bundle header. Maybe it would be nice if we were
able to create a shallow repository from this bundle.
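A sketch of that failure, assuming a POSIX shell and Git 2.28 or newer, with hypothetical repository, branch and file names. It uses `git bundle verify`, which performs the prerequisite check that unbundling depends on, so it reports the missing commit in an empty repository but succeeds in the repository the bundle came from.

```shell
#!/bin/sh
set -e
# Throwaway repository with three commits on a hypothetical "main" branch.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
git init -q -b main "$tmp/repo"
cd "$tmp/repo"
for n in 1 2 3; do
    echo "$n" >file.txt
    git add file.txt
    git commit -q -m "commit $n"
done
git bundle create -q inc.bundle main~1..main

# An empty repository has none of the history, including the prerequisite main~1.
git init -q -b main "$tmp/empty"
if git -C "$tmp/empty" bundle verify "$PWD/inc.bundle" >/dev/null 2>&1; then
    empty_result=ok
else
    empty_result=lacks-prerequisites
fi

# The source repository has the full history, so the same check passes there.
if git bundle verify inc.bundle >/dev/null 2>&1; then
    source_result=ok
else
    source_result=lacks-prerequisites
fi
echo "empty repo: $empty_result, source repo: $source_result"
```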

> 
> > > 2. It seems that when specifying a commit hash, it raises an error:
> > > $ git bundle create repo.bundle $(git rev-list HEAD | head -n 1)
> > > > fatal: Refusing to create empty bundle.
> > > This confuses me slightly because I thought a commit hash should also be a valid head _pointer_. 'git rev-list' also seems to agree with me on this.
> > 
> > A bundle is essentially a pack file with a header indicating the
> > references contained within the bundle. If no reference is provided, the
> > bundle is considered empty and git refuses to create it. I think this
> > makes sense in the context of unbundling as you probably would not want
> > to add new objects without updating references in the target repository.
> > 
> > From the git-bundle(1) docs for "create", the usage does say it accepts
> > <git-rev-list-args> which may be a bit misleading because
> > git-rev-list(1) does consider the commit hash as valid. Maybe that
> > should be updated to indicate that proper references are expected.
> 
> That's somewhat weird indeed. I don't see a strong reason why the first
> of the following commands works while the second one doesn't:
> 
>     $ git bundle create inc.bundle master~..master
>     $ git bundle create inc.bundle $(git rev-parse master~)..$(git rev-parse master)
> 
> It's not like the bundle has "master" in its header in the first command
> anyway, it only lists HEAD in there. So I'd claim that we could do the
> same for the second command, as well.

I'm not quite sure I follow. According to gitformat-bundle(5), we should
see "obj-id SP refname LF" in the header. Inspecting the header of a
bundle created from `git bundle create inc.bundle master~..master` also
shows "refs/heads/master" in the header.
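That observation can be checked with `git bundle list-heads`, which prints one `<object-id> <refname>` line per reference in the header. A sketch, assuming a POSIX shell and Git 2.28 or newer, in a throwaway repository whose branch is deliberately named `master` to match the command above (identity and file names are hypothetical):

```shell
#!/bin/sh
set -e
# Throwaway repository with three commits on a "master" branch.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
git init -q -b master "$tmp/repo"
cd "$tmp/repo"
for n in 1 2 3; do
    echo "$n" >file.txt
    git add file.txt
    git commit -q -m "commit $n"
done

git bundle create -q inc.bundle master~..master
# Lists the full refname refs/heads/master; HEAD does not appear.
git bundle list-heads inc.bundle
```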

Looking at git-bundle(1) docs more closely it also does mention this
limitation:

       A revision name or a range whose right-hand-side cannot be
       resolved to a reference is not accepted:

           $ git bundle create HEAD.bundle $(git rev-parse HEAD)
           fatal: Refusing to create empty bundle.
           $ git bundle create master-yesterday.bundle master~10..master~5
           fatal: Refusing to create empty bundle.

It looks like when creating a bundle in `bundle.c:create_bundle()`, if
the call to `write_bundle_refs()` returns a reference count of 0, git
dies with the error seen. When a commit hash is used as the rev-list
argument, no reference can be determined from it.

-Justin

* Re: Some feedback on 'git clone create'
From: Junio C Hamano @ 2024-12-05 18:45 UTC
  To: Justin Tobler; +Cc: Patrick Steinhardt, Sainan, git@vger.kernel.org

Justin Tobler <jltobler@gmail.com> writes:

> When trying to unbundle an incremental bundle into a repository that
> lacks the prerequisite objects, Git fails. These prerequisite objects
> are also listed in the bundle header. Maybe it would be nice if we were
> able to create a shallow repository from this bundle.

The prerequisite objects are required for two reasons when unpacking
a bundle.  One is to ensure that the resulting history is complete,
as the central idea of a bundle was to "freeze the over-the-wire data
transferred during a fetch", and bundles predate "shallow clone",
which allows a clone to lack history beyond a certain point in the
topology.  The other is that the pack data stream recorded in a
bundle is allowed to be a "thin pack", with objects represented as a
delta against other objects that do not exist in the same pack; the
recipient is expected to supply these delta base objects so that the
deltified objects can be reconstituted, which is where the
"prerequisite" comes from.
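In the header, each prerequisite is a line that starts with `-` followed by the object name (see gitformat-bundle(5)). A sketch, assuming a POSIX shell and Git 2.28 or newer, with hypothetical repository, branch and file names, that extracts the prerequisite object names of an incremental bundle and compares them with the boundary commit of the range:

```shell
#!/bin/sh
set -e
# Throwaway repository with three commits on a hypothetical "main" branch.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
git init -q -b main "$tmp/repo"
cd "$tmp/repo"
for n in 1 2 3; do
    echo "$n" >file.txt
    git add file.txt
    git commit -q -m "commit $n"
done
git bundle create -q inc.bundle main~1..main

# Keep only the header, then pull the object name out of each "-" line.
prereqs=$(sed '/^$/q' inc.bundle | sed -n 's/^-\([0-9a-f]*\).*/\1/p')
echo "prerequisites: $prereqs"
echo "boundary commit (main~1): $(git rev-parse main~1)"
```

With linear history, the only prerequisite is `main~1`: the commit the recipient must already have so that the thin pack's deltas can be resolved.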

In order to allow creating a shallow clone out of a bundle when
using revision exclusions (see "git bundle --help"), we'd need to
be able to create a bundle with a "thick pack", which we do not
currently do.  More importantly, the recipient must be able to
identify such a "thick pack": as the format is currently defined, a
bundle with "prerequisites" is by definition "thin" and cannot be
used to create a shallow clone.  Which means we'd need an updated
bundle file format.

> I'm not quite sure I follow. According to gitformat-bundle(5), we should
> see "obj-id SP refname LF" in the header. Inspecting the header of a
> bundle created from `git bundle create inc.bundle master~..master` also
> shows "refs/heads/master" in the header.

Yes.  If there is one thing I regret in the way the bundle header was
designed, and wish I could fix, it is that we lack "HEAD" unless we
force it to be included, and "git clone" out of a bundle sometimes
fails because of it (you can "git fetch" out of the bundle, naming the
refs you find in the bundle header instead).
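A sketch of both points, assuming a POSIX shell and Git 2.28 or newer, with a hypothetical `main` branch and a hypothetical `refs/remotes/bundle/` namespace on the receiving side. A bundle made from `main` alone advertises no `HEAD`, the branch can still be fetched by naming it, and listing `HEAD` explicitly on the command line puts it into the header.

```shell
#!/bin/sh
set -e
# Throwaway repository with three commits on a hypothetical "main" branch.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
git init -q -b main "$tmp/repo"
cd "$tmp/repo"
for n in 1 2 3; do
    echo "$n" >file.txt
    git add file.txt
    git commit -q -m "commit $n"
done

git bundle create -q branch-only.bundle main
git bundle list-heads branch-only.bundle      # only refs/heads/main, no HEAD

# Fetch into a fresh repository by naming the ref found in the header.
git init -q -b main "$tmp/dst"
git -C "$tmp/dst" fetch -q "$PWD/branch-only.bundle" \
    refs/heads/main:refs/remotes/bundle/main

# Naming HEAD explicitly makes the header advertise it as well.
git bundle create -q with-head.bundle HEAD main
git bundle list-heads with-head.bundle
```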

> It looks like when creating a bundle in `bundle.c:create_bundle()`, if
> the call to `write_bundle_refs()` returns a reference count of 0, git
> dies with the error seen. When a commit hash is used as the rev-list
> argument, no reference can be determined from it.

Yes.  It is not fundamental, though.  There are occasions where "git
fetch" that does not transfer any object data is still useful (e.g.,
when they create a new branch that points at a commit that is already
included in the history of another branch, you only need to learn
what that new ref is pointing at, since you already have all the
object data).  As the central idea of "bundle" is to freeze the data
that goes over the wire for a fetch (to allow you to sneaker-net
instead of running "git fetch"), a bundle that says "you are expected
to have the history reachable from these commits; now be aware that
this and that ref point at this and that object"
should be possible.  Such a bundle would have the prerequisite
section and the ref advertisement section, but there is no need for
any pack data.

