* Some feedback on 'git clone create'
From: Sainan @ 2024-12-04 5:28 UTC
To: git@vger.kernel.org
Hi, I hope this email finds you well.
I think Git bundles/packfiles are an exceptional compression format, but I find there are some rough edges with the tool to create them:
1. There is no way to specify that you want a shallow bundle; instead you are only able to a) pack the entire tree at a given head or b) pack new/updated objects in a specified range. Anecdotally, this could store data in ~67% of the size of an equivalent .zip file.
2. It seems that when specifying a commit hash, it raises an error:
$ git bundle create repo.bundle $(git rev-list HEAD | head -n 1)
> fatal: Refusing to create empty bundle.
This confuses me slightly because I thought a commit hash should also be a valid head _pointer_. 'git rev-list' also seems to agree with me on this.
-- Sainan

* Re: Some feedback on 'git clone create'
From: Justin Tobler @ 2024-12-04 23:19 UTC
To: Sainan; +Cc: git@vger.kernel.org

On 24/12/04 05:28AM, Sainan wrote:
> Hi, I hope this email finds you well.
>
> I think Git bundles/packfiles are an exceptional compression format, but I find there are some rough edges with the tool to create them:
>
> 1. There is no way to specify that you want a shallow bundle; instead you are only able to a) pack the entire tree at a given head or b) pack new/updated objects in a specified range. Anecdotally, this could store data in ~67% of the size of an equivalent .zip file.

You can create an incremental bundle covering a specified range.
Something like the following example might help you achieve what you are
looking for:

$ git bundle create inc.bundle main~10..main

> 2. It seems that when specifying a commit hash, it raises an error:
> $ git bundle create repo.bundle $(git rev-list HEAD | head -n 1)
> > fatal: Refusing to create empty bundle.
> This confuses me slightly because I thought a commit hash should also be a valid head _pointer_. 'git rev-list' also seems to agree with me on this.

A bundle is essentially a pack file with a header indicating the
references contained within the bundle. If no reference is provided, the
bundle is considered empty and git refuses to create it. I think this
makes sense in the context of unbundling as you probably would not want
to add new objects without updating references in the target repository.

From the git-bundle(1) docs for "create", the usage does say it accepts
<git-rev-list-args> which may be a bit misleading because
git-rev-list(1) does consider the commit hash as valid. Maybe that
should be updated to indicate that proper references are expected.

-Justin
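
The rule described above (a bundle needs at least one reference for its header) can be tried in a scratch repository. This is a minimal sketch, assuming a POSIX shell with git on the PATH; the temporary directory, the file name, the branch name "master", and the identity passed with -c are illustrative choices, not taken from the thread.

```shell
# Throwaway repository with a single commit (all names are illustrative).
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
echo "hello" > file.txt
git add file.txt
git -c user.name=Example -c user.email=example@example.com \
    commit -q -m "initial"

# A bare commit hash selects objects but names no reference,
# so git has nothing to record in the bundle header and refuses.
if git bundle create hash.bundle "$(git rev-parse HEAD)"; then
    hash_result=created
else
    hash_result=refused
fi
echo "bundle from bare hash: $hash_result"

# Naming a branch gives the header a reference to record.
git bundle create ref.bundle master
heads=$(git bundle list-heads ref.bundle)
echo "$heads"
```

"git bundle list-heads" prints the references recorded in the header, which is a quick way to see what a receiving repository would be offered.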

* Re: Some feedback on 'git clone create'
From: Sainan @ 2024-12-05 7:03 UTC
To: Justin Tobler; +Cc: git@vger.kernel.org

> You can create an incremental bundle covering a specified range.

Yeah, I know that you can bundle a delta, but I mean an analogue to a shallow clone, which is kind of like traversing the tree in a different direction.

(Sorry for double-send, forgot to CC git@vger.kernel.org on the reply.)

-- Sainan

* Re: Some feedback on 'git clone create'
From: Patrick Steinhardt @ 2024-12-05 8:38 UTC
To: Justin Tobler; +Cc: Sainan, git@vger.kernel.org

On Wed, Dec 04, 2024 at 05:19:53PM -0600, Justin Tobler wrote:
> On 24/12/04 05:28AM, Sainan wrote:
> > Hi, I hope this email finds you well.
> >
> > I think Git bundles/packfiles are an exceptional compression format, but I find there are some rough edges with the tool to create them:
> >
> > 1. There is no way to specify that you want a shallow bundle; instead you are only able to a) pack the entire tree at a given head or b) pack new/updated objects in a specified range. Anecdotally, this could store data in ~67% of the size of an equivalent .zip file.
>
> You can create an incremental bundle covering a specified range.
> Something like the following example might help you achieve what you are
> looking for:
>
> $ git bundle create inc.bundle main~10..main

Yup. The thing that might not be immediately obvious is that
git-bundle(1) accepts git-rev-list(1) arguments, so you can influence
what is and isn't included via that. You can for example even generate
partial bundles without blobs:

$ git bundle create partial.bundle main~10..main \
    --filter=blob:none

What you can do with the resulting bundle might be a different story.

> > 2. It seems that when specifying a commit hash, it raises an error:
> > $ git bundle create repo.bundle $(git rev-list HEAD | head -n 1)
> > > fatal: Refusing to create empty bundle.
> > This confuses me slightly because I thought a commit hash should also be a valid head _pointer_. 'git rev-list' also seems to agree with me on this.
>
> A bundle is essentially a pack file with a header indicating the
> references contained within the bundle. If no reference is provided, the
> bundle is considered empty and git refuses to create it. I think this
> makes sense in the context of unbundling as you probably would not want
> to add new objects without updating references in the target repository.
>
> From the git-bundle(1) docs for "create", the usage does say it accepts
> <git-rev-list-args> which may be a bit misleading because
> git-rev-list(1) does consider the commit hash as valid. Maybe that
> should be updated to indicate that proper references are expected.

That's somewhat weird indeed. I don't see a strong reason why the first
of the following commands works while the second one doesn't:

$ git bundle create inc.bundle master~..master
$ git bundle create inc.bundle $(git rev-parse master~)..$(git rev-parse master)

It's not like the bundle has "master" in its header in the first command
anyway, it only lists HEAD in there. So I'd claim that we could do the
same for the second command, as well.

Patrick

* Re: Some feedback on 'git clone create'
From: Justin Tobler @ 2024-12-05 15:27 UTC
To: Patrick Steinhardt; +Cc: Sainan, git@vger.kernel.org

On 24/12/05 09:38AM, Patrick Steinhardt wrote:
> On Wed, Dec 04, 2024 at 05:19:53PM -0600, Justin Tobler wrote:
> > On 24/12/04 05:28AM, Sainan wrote:
> > > Hi, I hope this email finds you well.
> > >
> > > I think Git bundles/packfiles are an exceptional compression format, but I find there are some rough edges with the tool to create them:
> > >
> > > 1. There is no way to specify that you want a shallow bundle; instead you are only able to a) pack the entire tree at a given head or b) pack new/updated objects in a specified range. Anecdotally, this could store data in ~67% of the size of an equivalent .zip file.
> >
> > You can create an incremental bundle covering a specified range.
> > Something like the following example might help you achieve what you are
> > looking for:
> >
> > $ git bundle create inc.bundle main~10..main
>
> Yup. The thing that might not be immediately obvious is that
> git-bundle(1) accepts git-rev-list(1) arguments, so you can influence
> what is and isn't included via that. You can for example even generate
> partial bundles without blobs:
>
> $ git bundle create partial.bundle main~10..main \
>     --filter=blob:none
>
> What you can do with the resulting bundle might be a different story.

When trying to unbundle an incremental bundle into a repository that
lacks the prerequisite objects, Git fails. These prerequisite objects
are also listed in the bundle header. Maybe it would be nice if we were
able to create a shallow repository from this bundle.

> > > 2. It seems that when specifying a commit hash, it raises an error:
> > > $ git bundle create repo.bundle $(git rev-list HEAD | head -n 1)
> > > > fatal: Refusing to create empty bundle.
> > > This confuses me slightly because I thought a commit hash should also be a valid head _pointer_. 'git rev-list' also seems to agree with me on this.
> >
> > A bundle is essentially a pack file with a header indicating the
> > references contained within the bundle. If no reference is provided, the
> > bundle is considered empty and git refuses to create it. I think this
> > makes sense in the context of unbundling as you probably would not want
> > to add new objects without updating references in the target repository.
> >
> > From the git-bundle(1) docs for "create", the usage does say it accepts
> > <git-rev-list-args> which may be a bit misleading because
> > git-rev-list(1) does consider the commit hash as valid. Maybe that
> > should be updated to indicate that proper references are expected.
>
> That's somewhat weird indeed. I don't see a strong reason why the first
> of the following commands works while the second one doesn't:
>
> $ git bundle create inc.bundle master~..master
> $ git bundle create inc.bundle $(git rev-parse master~)..$(git rev-parse master)
>
> It's not like the bundle has "master" in its header in the first command
> anyway, it only lists HEAD in there. So I'd claim that we could do the
> same for the second command, as well.

I'm not quite sure I follow. According to gitformat-bundle(5), we should
see "obj-id SP refname LF" in the header. Inspecting the header of a
bundle created from `git bundle create inc.bundle master~..master` also
shows "refs/heads/master" in the header.

Looking at the git-bundle(1) docs more closely, they do also mention this
limitation:

    A revision name or a range whose right-hand-side cannot be resolved
    to a reference is not accepted:

    $ git bundle create HEAD.bundle $(git rev-parse HEAD)
    fatal: Refusing to create empty bundle.
    $ git bundle create master-yesterday.bundle master~10..master~5
    fatal: Refusing to create empty bundle.

It looks like when creating a bundle in `bundle.c:create_bundle()`, if
the call to `write_bundle_refs()` returns a reference count of 0, git
dies with the error seen. When a commit hash is used for the
rev-list-arg, it is not able to determine a reference from it.

-Justin
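
Both observations in this message (the reference name in the header, and the failure when prerequisites are missing) can be reproduced by hand. The following is a minimal sketch, assuming a POSIX shell with git and sed available; the temporary directories, file name, commit messages, and identity are illustrative. It relies on the layout given in gitformat-bundle(5), where the textual header ends at the first empty line and the pack data follows.

```shell
# Scratch repository with two commits on "master" (names are illustrative).
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
for n in 1 2; do
    echo "revision $n" > file.txt
    git add file.txt
    git -c user.name=Example -c user.email=example@example.com \
        commit -q -m "commit $n"
done

# Incremental bundle: only what master has on top of master~.
git bundle create inc.bundle master~..master

# Print the textual header only; sed stops at the first empty line,
# before the binary pack data begins.
header=$(sed '/^$/q' inc.bundle)
echo "$header"

# An empty repository lacks the prerequisite commit, so verification fails.
empty=$(mktemp -d)
git init -q "$empty"
if git -C "$empty" bundle verify "$repo/inc.bundle" 2>/dev/null; then
    verify_result=ok
else
    verify_result=missing-prerequisites
fi
echo "$verify_result"
```

In the printed header, the line starting with "-" is the prerequisite commit and the line ending in "refs/heads/master" is the reference the bundle offers.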

* Re: Some feedback on 'git clone create'
From: Junio C Hamano @ 2024-12-05 18:45 UTC
To: Justin Tobler; +Cc: Patrick Steinhardt, Sainan, git@vger.kernel.org

Justin Tobler <jltobler@gmail.com> writes:

> When trying to unbundle an incremental bundle into a repository that
> lacks the prerequisite objects, Git fails. These prerequisite objects
> are also listed in the bundle header. Maybe it would be nice if we were
> able to create a shallow repository from this bundle.

The prerequisite objects are required for two reasons when unpacking a
bundle.

One is to ensure that the resulting history is complete, as the
central idea of bundle was to "freeze the over-the-wire data
transferred during a fetch" but it predates "shallow clone", which
allows a clone to be lacking history beyond a certain point in the
topology.

The other is that the pack data stream recorded in a bundle is
allowed to be a "thin pack", with objects represented as a delta
against other objects that do not exist in the same pack. The
recipient is expected to supply these delta base objects to ensure
that these deltified objects can be reconstituted, which is where
the "prerequisite" comes from.

In order to allow creating a shallow clone out of a bundle when
using revision exclusions (see "git bundle --help"), we'd need to
update the bundle format to allow us to create a bundle using a
"thick pack", which we do not currently do. And more importantly,
the recipient must be able to identify such a "thick pack": as the
format is currently defined, a bundle with a "prerequisite" is by
definition "thin" and cannot be used to create a shallow clone.
Which means we'd need an updated bundle file format.

> I'm not quite sure I follow. According to gitformat-bundle(5), we should
> see "obj-id SP refname LF" in the header. Inspecting the header of a
> bundle created from `git bundle create inc.bundle master~..master` also
> shows "refs/heads/master" in the header.

Yes, if there is one thing I regret in the way the bundle header was
designed and wish I could fix, it is that we lack "HEAD" unless we
force it to be included, and "git clone" out of a bundle sometimes
fails because of it (you can "git fetch" out of the bundle, naming
the refs you find in the bundle header instead).

> It looks like when creating a bundle in `bundle.c:create_bundle()`, if
> the call to `write_bundle_refs()` returns a reference count of 0, git
> dies with the error seen. When a commit hash is used for the
> rev-list-arg, it is not able to determine a reference from it.

Yes. It is not fundamental, though. There are occasions where "git
fetch" that does not transfer any object data is still useful (e.g.,
when they create a new branch that points at a commit that already
is included in the history of another branch, you only need to learn
what that new ref is pointing at, but you already have all the
object data).

As the central idea of "bundle" is to freeze the data that goes over
the wire for a fetch (to allow you to sneaker-net instead of "git
fetch"), a bundle that says "you are expected to have the history
reachable from these commits, now be aware that this and that ref
point at this and that object" should be possible. Such a bundle
would have the prerequisite section and the ref advertisement
section, but there is no need for any pack data.
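
The workaround mentioned above (fetching from a bundle by naming a ref from its header when the bundle carries no "HEAD") might look like the following. This is a minimal sketch, assuming a POSIX shell with git on the PATH; the temporary directories, the bundle file name, the identity, and the destination ref "refs/remotes/bundle/master" are illustrative choices, not anything prescribed in the thread.

```shell
# Source repository with one commit on "master" (names are illustrative).
src=$(mktemp -d)
cd "$src"
git init -q .
git symbolic-ref HEAD refs/heads/master
echo "hello" > file.txt
git add file.txt
git -c user.name=Example -c user.email=example@example.com \
    commit -q -m "initial"

# Only the branch is named, so the header records refs/heads/master
# and does not record HEAD.
git bundle create "$src/master.bundle" master
git bundle list-heads "$src/master.bundle"

# In a fresh repository, fetch from the bundle by naming that ref.
dest=$(mktemp -d)
git init -q "$dest"
git -C "$dest" fetch -q "$src/master.bundle" \
    refs/heads/master:refs/remotes/bundle/master

fetched=$(git -C "$dest" rev-parse refs/remotes/bundle/master)
original=$(git -C "$src" rev-parse master)
echo "$fetched"
```

Fetching into a remote-tracking style ref keeps the example independent of whichever branch the fresh repository has checked out.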