* Doing blobless clone by default; switching between blobless, treeless and full clones by a command
@ 2025-09-04 9:33 Дилян Палаузов
2025-09-04 9:41 ` Kristoffer Haugsbakk
` (2 more replies)
0 siblings, 3 replies; 9+ messages in thread
From: Дилян Палаузов @ 2025-09-04 9:33 UTC (permalink / raw)
To: git
Hello,
The most common way to view and change the files of a repository is to run git clone. With --filter=blob:none this can be sped up. Blobless and full clones each have their trade-offs.
I think these additions to git would be useful:
• add a config option that makes a plain git clone URL do a blobless clone by default.
• add a git command to download all locally missing history, including for treeless clones and blobless clones.
• add a git command to convert a repository to a pure treeless or pure blobless clone (that is, delete some locally stored objects) to save disk space.
The git command that downloads all locally missing history should show its progress as a percentage. If it is interrupted with Ctrl+C, reissuing the command should resume downloading the remaining data.
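The requested progress and resume behaviour can be illustrated with a small Python sketch. It is not how Git implements anything: `backfill`, `fetch_batch` and the object names are hypothetical, and objects are modelled as plain strings. The point it shows is that resuming needs no extra bookkeeping if each run recomputes what is still missing.

```python
def backfill(wanted, have, fetch_batch, batch_size=2, report=print):
    """Download every object in `wanted` that is not yet in `have`.

    `have` is a set that is updated in place after each batch, so an
    interrupted run keeps what it already downloaded. `fetch_batch` is a
    hypothetical callable that takes a list of object names and returns
    the names it delivered.
    """
    missing = [oid for oid in wanted if oid not in have]
    total = len(wanted)
    for start in range(0, len(missing), batch_size):
        batch = missing[start:start + batch_size]
        have.update(fetch_batch(batch))  # Ctrl+C here loses only this batch
        done = len(have & set(wanted))
        report(f"{100 * done // total}% complete")


if __name__ == "__main__":
    remote = ["blob1", "blob2", "blob3", "blob4", "blob5"]  # made-up names
    local = {"blob1"}  # left over from an earlier, interrupted run
    backfill(remote, local, fetch_batch=lambda batch: batch)
```

Calling `backfill` again after an interruption requests only the objects that are still absent, which is the resume behaviour asked for above.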
Rationale:
People rarely run git clone in order to run git log or git annotate immediately afterwards. They clone to (try changing something and then) build the software, provided that git manages source code. Doing a reduced download by default would speed up the initial clone, save bytes in transit and reduce server load. In fact, I think that by default (without extra configuration) git clone should do a reduced (blobless) download, and git should download everything else when asked to do so. This default should ideally be controlled by an option in the global gitconfig. Searching https://git-scm.com/docs/git-config for “filter”, I do not find anything relevant.
For the latter, https://stackoverflow.com/questions/76770003/is-there-a-way-to-configure-git-to-clone-with-filter-blobnone-by-default suggests setting a variable GITFLAGS='--filter=blob:none'.
There may already be commands to switch a repository between full download, blobless clone and treeless clone, but I do not know of them. In any case, if it is possible to switch easily between a full and a blobless repository, in both directions, then to me it only makes sense for downloads to be blobless by default.
Greetings
Дилян
* Re: Doing blobless clone by default; switching between blobless, treeless and full clones by a command
From: Kristoffer Haugsbakk @ 2025-09-04 9:41 UTC (permalink / raw)
To: Дилян Палаузов,
git
On Thu, Sep 4, 2025, at 11:33, Дилян Палаузов wrote:
> • add a git command to download all locally missing history, including
> for treeless clones and blobless clones
This sounds like git-backfill(1).
I’ve never used blob/treeless.
--
Kristoffer Haugsbakk
* Re: Doing blobless clone by default; switching between blobless, treeless and full clones by a command
From: Derrick Stolee @ 2025-09-04 12:19 UTC (permalink / raw)
To: Kristoffer Haugsbakk,
Дилян Палаузов,
git
On 9/4/2025 5:41 AM, Kristoffer Haugsbakk wrote:
> On Thu, Sep 4, 2025, at 11:33, Дилян Палаузов wrote:
>> • add a git command to download all locally missing history, including
>> for treeless clones and blobless clones
>
> This sounds like git-backfill(1).
Indeed, 'git backfill' is intended to assist with downloading the blobs
that were not selected in a blobless partial clone.
> I’ve never used blob/treeless.
I don't believe that 'git backfill' is optimized for treeless clones.
Treeless clones are not intended for "refilling" as downloading missing
trees is particularly expensive.
...
And regarding the original thought for "we should have an option for
doing blobless clones by default" the current way to do that is to use
'scalar clone' which is shipped with Git already.
When Scalar was originally contributed to Git, it was partly to enable
users to opt-in to a version of 'git clone' that changes behavior with
best practices and advanced features as they are developed. This is in
contrast to 'git clone' which needs to remain backwards compatible, so
any new features need to be selected with an option or config setting.
But there was always the possibility that the feedback from having
'scalar clone' available could lead to a future builtin of the form
'git big-clone' that adds similar optimizations for large repos. (My
opinion is that 'scalar clone' _is_ this 'git big-clone' but maybe it
is not discoverable enough.)
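
For reference, the commands mentioned so far in the thread can be written out as argument lists. This is a hedged Python sketch: the helper names and the placeholder URL are made up for illustration, while `--filter=blob:none`, `--filter=tree:0`, 'scalar clone' and 'git backfill' are the existing Git interfaces. 'git backfill' only exists in recent Git releases, and 'scalar clone' also sets up a sparse-checkout unless told otherwise.

```python
import shlex

# Filter specs understood by `git clone --filter=<spec>`.
FILTERS = {
    "full": None,             # no filter: every object is downloaded
    "blobless": "blob:none",  # commits and trees up front, blobs on demand
    "treeless": "tree:0",     # commits up front, trees and blobs on demand
}


def clone_command(url, kind="blobless"):
    """Return the argv of a plain `git clone` of the requested kind."""
    spec = FILTERS[kind]
    argv = ["git", "clone"]
    if spec is not None:
        argv.append(f"--filter={spec}")
    argv.append(url)
    return argv


def scalar_clone_command(url):
    """Return the argv of the opt-in `scalar clone` alternative."""
    return ["scalar", "clone", url]


def backfill_command(repo_dir):
    """Return the argv that downloads missing blobs in a blobless clone."""
    return ["git", "-C", repo_dir, "backfill"]


if __name__ == "__main__":
    url = "https://example.com/project.git"  # placeholder, not a real repo
    for kind in FILTERS:
        print(shlex.join(clone_command(url, kind)))
    print(shlex.join(scalar_clone_command(url)))
    print(shlex.join(backfill_command("project")))
```

Each list can be passed to `subprocess.run` as is; the sketch only prints the commands so that it has no side effects.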
Thanks,
-Stolee
* Re: Doing blobless clone by default; switching between blobless, treeless and full clones by a command
From: Junio C Hamano @ 2025-09-04 16:47 UTC (permalink / raw)
To: Derrick Stolee
Cc: Kristoffer Haugsbakk,
Дилян Палаузов,
git
Derrick Stolee <stolee@gmail.com> writes:
> But there was always the possibility that the feedback from having
> 'scalar clone' available could lead to a future builtin of the form
> 'git big-clone' that adds similar optimizations for large repos. (My
> opinion is that 'scalar clone' _is_ this 'git big-clone' but maybe it
> is not discoverable enough.)
A separate command like "big-clone" would not be discoverable
enough, either. When a new feature matures in the playground, it
would be a welcome change for "git clone" to borrow it as a new
option, or even better yet, automatically enable it depending on the
size of the thing, with end-user consent.
Thanks.
* Re: Doing blobless clone by default; switching between blobless, treeless and full clones by a command
From: Patrick Steinhardt @ 2025-09-05 12:23 UTC (permalink / raw)
To: Derrick Stolee
Cc: Kristoffer Haugsbakk,
Дилян Палаузов,
git
On Thu, Sep 04, 2025 at 08:19:59AM -0400, Derrick Stolee wrote:
> On 9/4/2025 5:41 AM, Kristoffer Haugsbakk wrote:
> > On Thu, Sep 4, 2025, at 11:33, Дилян Палаузов wrote:
> >> • add a git command to download all locally missing history, including
> >> for treeless clones and blobless clones
> >
> > This sounds like git-backfill(1).
>
> Indeed, 'git backfill' is intended to assist with downloading the blobs
> that were not selected in a blobless partial clone.
> > I’ve never used blob/treeless.
>
> I don't believe that 'git backfill' is optimized for treeless clones.
> Treeless clones are not intended for "refilling" as downloading missing
> trees is particularly expensive.
Yeah, indeed. I guess we can tweak the way we backfill trees by batching
by depth. E.g. we:
1. Collect all root trees and fetch them in a batch.
2. For each fetched tree, figure out all missing transitive trees and
fetch that level.
3. Repeat for the next-deeper level.
But that batching is definitely not ideal, and there's going to be cases
where it performs _way_ worse compared to backfilling blobs. That's
nothing we can really avoid though.
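
The three steps above can be sketched in Python as follows. This is a toy model and not Git code: trees are plain strings, `fetch_batch` is a hypothetical stand-in for one round trip to the remote, and each tree maps to the list of its subtrees (blobs are ignored).

```python
def backfill_trees_by_depth(root_trees, local_trees, fetch_batch):
    """Fetch all missing trees level by level.

    `local_trees` maps a tree name to the list of its subtrees and is
    filled in place. `fetch_batch` takes a list of tree names and returns
    such a mapping for them. Returns the number of batches requested.
    """
    rounds = 0
    seen = set()
    frontier = set(root_trees)  # step 1: all root trees
    while frontier:
        seen |= frontier
        missing = sorted(t for t in frontier if t not in local_trees)
        if missing:
            local_trees.update(fetch_batch(missing))
            rounds += 1
        # Steps 2 and 3: the subtrees of this level form the next level.
        # Trees already visited are skipped so shared subtrees are
        # walked only once.
        frontier = {sub for t in frontier for sub in local_trees[t]} - seen
    return rounds


if __name__ == "__main__":
    # Made-up remote with two root trees that share the "src" subtree.
    remote = {
        "rootA": ["src", "docs"],
        "rootB": ["src", "docs2"],
        "src": ["src/lib"],
        "docs": [],
        "docs2": [],
        "src/lib": [],
    }
    local = {}
    rounds = backfill_trees_by_depth(
        ["rootA", "rootB"], local, lambda names: {n: remote[n] for n in names}
    )
    print(rounds, sorted(local))
```

In this toy example the loop needs three round trips, one per directory depth. The next level cannot be requested before the previous one has arrived, which is why this tends to be slower than backfilling blobs, where every needed object is already known from the local trees.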
One idea would be that the remote tells us about all the trees we may
have to fetch. But that information alone is not helpful unless we also
have the links between trees, and as soon as we have that the info is
basically interchangeable with having the actual trees in the first
place.
So in general, the recommendation I typically give is to not use
treeless clones at all.
Patrick
* Re: Doing blobless clone by default; switching between blobless, treeless and full clones by a command
From: Derrick Stolee @ 2025-09-05 13:40 UTC (permalink / raw)
To: Patrick Steinhardt
Cc: Kristoffer Haugsbakk,
Дилян Палаузов,
git
On 9/5/2025 8:23 AM, Patrick Steinhardt wrote:
> On Thu, Sep 04, 2025 at 08:19:59AM -0400, Derrick Stolee wrote:
> So in general, the recommendation I typically give is to not use
> treeless clones at all.
Rather, I'd say that treeless clones are useful if you want the
speed of a shallow clone with some need to analyze commit history
(with no path history) for an ephemeral scenario like a CI build.
Treeless clones are not a good approach for doing ongoing work as
a human. They are a tool for a very narrow case, so don't use them
unless you understand how to avoid their pitfalls.
Thanks,
-Stolee
* Re: Doing blobless clone by default; switching between blobless, treeless and full clones by a command
From: Patrick Steinhardt @ 2025-09-05 14:30 UTC (permalink / raw)
To: Derrick Stolee
Cc: Kristoffer Haugsbakk,
Дилян Палаузов,
git
On Fri, Sep 05, 2025 at 09:40:49AM -0400, Derrick Stolee wrote:
> On 9/5/2025 8:23 AM, Patrick Steinhardt wrote:
> > On Thu, Sep 04, 2025 at 08:19:59AM -0400, Derrick Stolee wrote:
>
> > So in general, the recommendation I typically give is to not use
> > treeless clones at all.
>
> Rather, I'd say that treeless clones are useful if you want the
> speed of a shallow clone with some need to analyze commit history
> (with no path history) for an ephemeral scenario like a CI build.
>
> Treeless clones are not a good approach for doing ongoing work as
> a human. They are a tool for a very narrow case, so don't use them
> unless you understand how to avoid their pitfalls.
Ah, yes. I should've qualified my statement a bit more carefully.
Thanks for adding in this angle.
Patrick
* Re: Doing blobless clone by default; switching between blobless, treeless and full clones by a command
From: Konstantin Ryabitsev @ 2025-09-05 14:49 UTC (permalink / raw)
To: Дилян Палаузов
Cc: git
On Thu, Sep 04, 2025 at 12:33:08PM +0300, Дилян Палаузов wrote:
> Rationale:
>
> The reason people execute git clone is hardly to issue immediately
> afterwards git log or git annotate. The reason for git clone is to (try
> changing something and then) build the software. (Provided that git manages
> source code.) Doing by default a reduced data download with git clone
> will speed up the initialization, it will save bytes in transit and reduce
> server load. In fact I think that by default (without extra configuration)
> git clone should do a reduced download (blobless) and git should download
> the other things, when asked to do so. This default download preference
> should be ideally managed by an option in global gitconfig . When looking
> at https://git-scm.com/docs/git-config for “filter” I do not recognize
> anything relevant.
As a counter-rationale, shallow clones generate a lot more load on the server
side, because there are no packs available for this operation. Making this the
default behaviour will likely result in slower clones for everyone and more
unavailable servers due to high load.
-K
* Re: Doing blobless clone by default; switching between blobless, treeless and full clones by a command
From: Ben Knoble @ 2025-09-07 19:42 UTC (permalink / raw)
To: Дилян Палаузов
Cc: git
> On 4 Sep 2025, at 05:37, Дилян Палаузов <dilyan.palauzov@aegee.org> wrote:
>
> Hello,
>
> Rationale:
>
> The reason people execute git clone is hardly to issue immediately afterwards git log or git annotate.
Maybe for you, and maybe in many contexts, but I also frequently clone things to run various history spelunking searches on them.
> The reason for git clone is to (try changing something and then) build the software. (Provided that git manages source code.) Doing by default a reduced data download with git clone will speed up the initialization, it will save bytes in transit and reduce server load. In fact I think that by default (without extra configuration) git clone should do a reduced download (blobless) and git should download the other things, when asked to do so.
Absolutely not (in my opinion, :p). Not having the entire repository available except when connected to a network defeats the tremendous advantage of distributed version control. Namely, privileged forks are given status by social agreement, not technical requirement. I want the whole repository available independently.
> This default download preference should be ideally managed by an option in global gitconfig . When looking at https://git-scm.com/docs/git-config for “filter” I do not recognize anything relevant.
This seems more moderate and achievable. If you would prefer to clone less by default (I would not), go for it. I mostly don’t work with repos where this matters, though, or where git-maintenance doesn’t do most of the job I need after a one-time setup cost.
I do chastise folks for mis-managing large binary files in history that create large blobs and clone times, though :)