public inbox for git@vger.kernel.org
* [GSoC] [Proposal]: Implement promisor remote fetch ordering
@ 2026-02-28 23:27 Abraham Samuel Adekunle
  2026-03-03  9:27 ` Christian Couder
  0 siblings, 1 reply; 7+ messages in thread
From: Abraham Samuel Adekunle @ 2026-02-28 23:27 UTC (permalink / raw)
  To: git
  Cc: Christian Couder, Karthik Nayak, Justin Tobler, Siddharth Asthana,
	Ayush Chandekar, Lucas Seiki Oshiro, Junio C Hamano,
	Patrick Steinhardt, Phillip Wood

Hello,
This is my proposal for the project
"Implement promisor remote fetch ordering" for the 2026 GSoC programme.

Personal Bio:
=============
Full Name:  Abraham Samuel Adekunle
Email: abrahamadekunle50@gmail.com
GitHub: https://github.com/devdekunle
Pronouns: he/him

About Me:
=========
My name is Abraham Samuel Adekunle. I love to code, read and I am a
harworker. In my free time I love to play games and listen to soothing
music and well, also shifting into diffuse thinking to gain a new
perspective of whatever challenge I am trying to solve.

I am very curious, so I really love to learn, as it's a never-ending
journey, and I believe in the power of "yet".
I can understand anything, it is only a matter of time and effort.
I love to figure out things and be part of a community
where we can share experiences and support each other in growth.

Past Experience with Git:
=========================
I first learnt about Git during my ALX Software Engineering days in
2022. It proved challenging at first to understand what was going on,
and a git merge conflict was always a scary experience.
Now I feel elated actually contributing to this renowned project.

Contributions to the Git Community:
====================================
My first contribution to the Git community was during the December
2024 Outreachy contribution phase, where I first learned to send
patches and had my first interactions with the Git code base. I did
not make it through then, but it was an opportunity to try again.

Contributions to other Communities:
===================================
I have contributed very sparingly to the Systemd project and also
the Linux Kernel.

Microproject:
=============
Link: https://lore.kernel.org/git/aV_IGCld5T_dBxTs@Adekunles-MacBook-Air.local/
Branch: aa/add-p-previous-decisions
Status: Merged to master
Commit ID: 8cafc305e22a59efb92472d4132616e24d3184c6
Description:  "git add -p" and friends notes what the current status
	       of the hunk being shown is

Other Contributions:
====================
1.
Link: https://lore.kernel.org/git/cover.1771066252.git.abrahamadekunle50@gmail.com/
Branch: aa/add-p-no-auto-advance
Status: Merged to next
Description: "git add -p" learned a new mode that allows the user to
	      revisit a file that was already dealt with

2.
Link: https://lore.kernel.org/git/aWZkEYHhcIhdAjkh@Adekunles-MacBook-Air.local/
Status: Stalled
Description: the patch attempts to remove the use of the_repository
	     global variable in some builtins

Project Overview and Objective:
===============================
I have always wondered what happens in the background when I see these
details on my screen in a "git fetch" process.

	remote: Enumerating objects: 57, done.
	remote: Counting objects: 100% (57/57), done.
	remote: Compressing objects: 100% (12/12), done.
	Receiving objects: 100% (57/57), 48.3 KiB | 512.00 KiB/s, done.
	Resolving deltas: 100% (21/21), done.
	remote: Total 57 (delta 21), reused 13 (delta 5), pack-reused 30
	From https://example.com/me/repo
	1a2b3c4..5d6e7f8  feature/xyz -> origin/feature/xyz

And when I saw this project on the list of proposed projects,
I was drawn to it as it is an opportunity to work in an area of the
Git code base that will satisfy my curiousity while also being
mentored by some of the very best and most experienced engineers there are.

When a Git repository is configured with multiple promisor remotes,
there is currently no mechanism to specify or optimize the order in
which these remotes should be queried when fetching missing objects.
Different remotes may have different characteristics, such as
performance, cost, or reliability, which makes the fetching order an
important consideration.

The project aims to implement a fetch ordering mechanism for multiple
promisor remotes by designing a flexible system that allows a server
to dictate their preferred order to the client to ensure performance
and cost management.

Review of Previous Work:
========================
The project is part of the Large Object Promisor "LOP" effort
documented in Documentatio/technical/large-object-promisor.adoc.

In a bid to better handle large objects, the promisor-remote
capability was added to the Git protocol v2, as documented in
the promisor-remote section of Documentation/gitprotocol-v2.adoc.
It enables a protocol negotiation in which the server can advertise
one or more promisor remotes, and the client and server can discuss
whether the client could directly use a promisor remote the server
is advertising. If an agreement is reached, the client is able to
fetch missing large blobs directly from the promisor remote, without
the server acting as a relay between the client and the promisor
remote.

The groundwork for adding this capability to the v2 protocol was
started by Christian Couder in [1], where if the "promisor.advertise"
config is set to true, the server can then propagate its promisor remote
configurations to the client over the v2 protocol during the negotiation
in the form

	"promisor-remote=name=prom1,url=url_encoded_value1;name=prom2,url=url_encoded_value2"

The client can then choose to accept some promisor remotes the server
is advertising using the "All", "None", "KnownName" or "KnownUrl"
configurations as values for the "promisor.acceptFromServer" config option.
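To make the shape of that advertisement concrete, here is a minimal,
illustrative sketch in Python (Git itself is written in C; this helper
name is hypothetical and not part of Git) of how such a string could be
split into per-remote fields, assuming values are URL-encoded as in the
form shown above:

```python
from urllib.parse import unquote

def parse_promisor_advertisement(ad):
    """Split a promisor-remote advertisement into one dict per remote.

    Remotes are separated by ';', fields within a remote by ',',
    and each field is a 'key=value' pair with a URL-encoded value.
    """
    remotes = []
    for entry in ad.split(";"):
        fields = {}
        for field in entry.split(","):
            key, _, value = field.partition("=")
            fields[key] = unquote(value)
        remotes.append(fields)
    return remotes

# An advertisement in the form shown above, with URL-encoded URLs.
ad = "name=prom1,url=https%3A%2F%2Fprom1.com;name=prom2,url=https%3A%2F%2Fprom2.com"
for remote in parse_promisor_advertisement(ad):
    print(remote["name"], remote["url"])
```

The client would then match each parsed remote against its
"promisor.acceptFromServer" policy.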

In [2], Christian added the option for a server to advertise more
fields after the "name" and "url", such as "token" and
"partialCloneFilter" for the client to use this additional information
in deciding the remotes to use as its promisor remotes by comparing it
with its local config information.

This was implemented by adding the "promisor.sendFields" and "promisor.checkFields"
config values to the server and client respectively.
For example, if "promisor.sendFields" is set to "partialCloneFilter", and the
server has the remote configured like so:
[remote "foo"]
	url = https://pr.test
	partialCloneFilter = blob:none
	token = "fake"
then
"name=foo,url=https://pr.test,partialCloneFilter=blob:none,token=fake"
will be advertised by the server to the client who can then decide,
using the "promisor.checkFields" setting, to check if the passed field
matches certain conditions before deciding to use it.
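As a rough sketch of that client-side check (a hypothetical Python
helper, not Git's actual "promisor.checkFields" implementation), the
comparison against the locally expected values could look like:

```python
def fields_match(advertised, checks):
    """Return True if every checked field equals the locally expected value.

    `checks` maps a field name to the value the client expects, standing
    in for the comparisons done under "promisor.checkFields".
    """
    return all(advertised.get(key) == want for key, want in checks.items())

# The advertisement from the example above, as parsed fields.
advertised = {"name": "foo", "url": "https://pr.test",
              "partialCloneFilter": "blob:none", "token": "fake"}
print(fields_match(advertised, {"partialCloneFilter": "blob:none"}))
print(fields_match(advertised, {"partialCloneFilter": "blob:limit=1k"}))
```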

This work by Christian is crucial to this project, as I will build on
it to enable the advertisement of a "priority" field that the server
can use to recommend a fetch order, which the client can then decide
whether or not to use.

In [3], Christian also implemented the "promisor.storeFields" option,
which allows advertised field values to be saved in the client's
configuration file for use at a later time.
As above, this option will also prove important when the server
advertises the "priority" field, as it will allow the client to decide
to store it in its config settings for that promisor remote, for later
use when fetching the remaining blobs from the promisor remotes.

High Level Approach to Project Execution:
=========================================
1. Server Side Advertisement:
-----------------------------
As the server knows about the promisor remotes which hold the
large object blobs, it could recommend the order in which these remotes
could be queried by the client using a "priority=<value>" field of the
promisor-remote capability in the Git v2 protocol, where <value> could
be an integer between 1 and 65535, with the smallest integer indicating
the highest priority.

This will be an optional feature which will be enabled by the server
if it wants to recommend ordered fetching to the client via
the "promisor.sendFields=priority" config option.

Hence if the server advertises promisor remotes prom1 and prom2,
it could be of the form
	"promisor-remote=name=prom1,url=https://prom1.com,priority=10;name=prom2,url=https://prom2.com,priority=20",
if the server is configured as:
[remote "prom1"]
	url = https://prom1.com
	priority = 10
[remote "prom2"]
	url = https://prom2.com
	priority = 20

If the "promisor.sendFields" value does not include the "priority"
field in its comma- or space-separated options, the field will not be
advertised in the promisor-remote capability.
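The server-side filtering described above could be sketched as follows
(illustrative Python with a hypothetical helper name; the real
implementation lives in Git's C code and also URL-encodes field values,
which this sketch skips):

```python
def build_advertisement(remotes, send_fields):
    """Build the promisor-remote advertisement value.

    'name' and 'url' are always sent; any extra field (such as
    'priority') is sent only if listed in send_fields, mirroring the
    proposed "promisor.sendFields" behavior.
    """
    entries = []
    for remote in remotes:
        fields = [f"name={remote['name']}", f"url={remote['url']}"]
        for key in send_fields:
            if key in remote:
                fields.append(f"{key}={remote[key]}")
        entries.append(",".join(fields))
    return ";".join(entries)

remotes = [
    {"name": "prom1", "url": "https://prom1.com", "priority": "10"},
    {"name": "prom2", "url": "https://prom2.com", "priority": "20"},
]
# With "priority" in promisor.sendFields, and without it.
print(build_advertisement(remotes, ["priority"]))
print(build_advertisement(remotes, []))
```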

2. Client Side Parsing:
-----------------------
The client can already use the "promisor.acceptFromServer" option to
decide which promisor remotes it will accept, so this new field
"priority" might not be significant at all in the deciding phase but when
fetching missing blobs from the accepted promisor remotes.

Instead, if the client wants to use the server recommended "priority"
later when fetching the missing blob from the accepted promisor remotes,
the "priority" field will be added to the "promisor.storeFields" config
options so that the passed value can be saved to the client config.
If the client does not enable this option in the config, the "priority"
field will not be saved in the local config and the fetching order will
default to the local config order.

A new config "promisor.honorServerFetchOrder" will be implemented
on the client side to determine if the client will use the recommended
server advertised promisor remote fetching order or not.
This config can only be enabled if "promisor.acceptFromServer" is not
"None".

The options for this config value will be [true|false|local-first] where
"false" (default) ignores server priority and will rely on the current
config order.
"true" sorts candidate advertised remotes by priority in ascending
order (smallest tried first).
"local-first" will try remotes in the local .git/config first, in the
order the promisors are placed in the config file, and then the
server-advertised ones ordered by priority, if the object has not been
found by now. This last values makes me feel somehow as all objects
could have been fetched already, but I am just stating my thought process.
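To illustrate the three options, here is a small Python sketch of the
ordering logic (the function name and data shapes are my own, and it
simplifies the current default order, which tries the
extensions.partialClone remote last):

```python
def fetch_order(local, advertised, honor_server_fetch_order):
    """Return the order of promisor remote names to try.

    local: remote names in .git/config order.
    advertised: dict mapping remote name -> server priority (int);
                smaller numbers mean higher priority.
    honor_server_fetch_order: "false", "true", or "local-first", as in
                the proposed promisor.honorServerFetchOrder option.
    """
    if honor_server_fetch_order == "false":
        return list(local)
    by_priority = sorted(advertised, key=lambda name: advertised[name])
    if honor_server_fetch_order == "true":
        return by_priority
    # "local-first": local config order, then any remaining
    # advertised remotes ordered by priority.
    return list(local) + [n for n in by_priority if n not in local]

print(fetch_order(["origin"], {"prom1": 10, "prom2": 20}, "true"))
print(fetch_order(["origin"], {"prom1": 10, "prom2": 20}, "local-first"))
```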

Proposed Project Execution Timeline:
====================================

1. Study code base to understand promisor-remote and fetch mechanism (May 1 - 14, 2026):
   -------------------------------------------------------------------------------------
- Study the code base to understand how the client and server
  communicate using the protocol when client contacts the server.
- Study how Git currently handles fetching from multiple remotes.
- Set up blog for posting once a week

2. Community Bonding (May 1 - 24, 2026):
----------------------------------------
- Discuss design details with community and mentors
- Understand safety and security constraints and design considerations
  when implementing fetch ordering.
- Read in depth the documentation for promisor-remote, gitprotocol-v2,
  and other necessary documentation.
- Post updates on my blog

3. Review Existing Patches (May 25 - June 14, 2026):
----------------------------------------------------
- Study Christian's patches in-depth to understand how a new field is
  added to the promisor remote of server, what conditions
  are used to ensure the data is of the right format, correctly passed from
  server to client, and correctly parsed and stored by client.
- Understand the tests to see how these new features are tested
- Post updates on my blog

4. Allow a server to add the "priority" field to the promisor-remote capability (June 14 - June 21, 2026):
----------------------------------------------------------------------------------------------------------
- Discuss with mentors on the suggested approach
- Allow the server to add the field "priority" to the promisor-remote
  capability when it is enabled in "promisor.sendFields".
- Write tests to ensure proper implementation
- Update documentation in Documentation/config/promisor.adoc
- Submit patch to mailing list for discussions and address reviews
- Post updates on my blog

5. Allow Client to Decide to use the field (June 21 - 30, 2026):
-----------------------------------------------------------------
- Discuss strategy with mentors
- Allow the client to store the "priority" in its .git/config if it
  accepts the promisor remotes and it is included in "promisor.storeFields"
- Write unit tests to ensure proper implementations
- Update documentation in Documentation/config/promisor.adoc
- Submit patches to mailing list for reviews and address feedbacks
- Post updates on my blog

6. Implement setting to decide fetching order (July 1 - July 14, 2026):
------------------------------------------------------------------------
- Discuss with mentors on the approach and considerations for fetch order
- Implement the option "promisor.honorServerFetchOrder", which will
  decide the order of fetching missing objects from the remotes
- Write unit tests to test the implementation
- Update documentation in Documentation/config/promisor.adoc
- Submit patches for review and address reviews
- Post updates on my blog

7. Implement fetching based on the selected order (July 15 - August 15, 2026):
------------------------------------------------------------------------------
- Implement the ordered fetching for missing objects based on the client's
  configuration in 6 above.
- Write unit tests to ensure the order was properly followed
- Submit to mailing list and be involved in the review process
- Post updates on my blog

8. Final Report on Project (August 15 - 24, 2026):
--------------------------------------------------
- Document any final report in my blog with details of my experience
- Finalize any pending tasks

Availability:
=============
I will be able to give 30 hours a week to make the project a success.

Post GSoC
=========

Though this is not my first contribution to Git, as I have contributed
lightly to the codebase before, I am committed to continuing to
contribute to Git and to becoming part of the next set of contributors
championing its continued development.

Appreciation
============
To Junio C Hamano, Phillip Wood, and everyone who helped with my
patches: I really appreciate your guidance, patience, and direction
while reviewing my patches.

Thanks

References
===========
1. https://lore.kernel.org/git/20250218113204.2847463-1-christian.couder@gmail.com/
2. https://lore.kernel.org/git/20250908053056.956907-1-christian.couder@gmail.com/
3. https://lore.kernel.org/git/20260216132317.15894-1-christian.couder@gmail.com/


* Re: [GSoC] [Proposal]: Implement promisor remote fetch ordering
  2026-02-28 23:27 [GSoC] [Proposal]: " Abraham Samuel Adekunle
@ 2026-03-03  9:27 ` Christian Couder
  2026-03-03 12:08   ` Samuel Abraham
  2026-03-10 15:11   ` Samuel Abraham
  0 siblings, 2 replies; 7+ messages in thread
From: Christian Couder @ 2026-03-03  9:27 UTC (permalink / raw)
  To: Abraham Samuel Adekunle
  Cc: git, Karthik Nayak, Justin Tobler, Siddharth Asthana,
	Ayush Chandekar, Lucas Seiki Oshiro, Junio C Hamano,
	Patrick Steinhardt, Phillip Wood

Hi,

On Sun, Mar 1, 2026 at 12:27 AM Abraham Samuel Adekunle
<abrahamadekunle50@gmail.com> wrote:
>
> Hello,
> This is my proposal for the project
> "Implement promisor remote fetch ordering" for the 2026 GSoC programme.

Thanks for being interested in Git and this project in particular.

> Personal Bio:
> =============
> Full Name:  Abraham Samuel Adekunle
> Email: abrahamadekunle50@gmail.com
> GitHub: https://github.com/devdekunle
> Pronouns: he/him
>
> About Me:
> =========
> My name is Abraham Samuel Adekunle. I love to code, read and I am a
> harworker. In my free time I love to play games and listen to soothing

I guess: s/harworker/hardworker/

> music and well, also shifting into diffuse thinking to gain a new
> perspective of whatever challenge I am trying to solve.

[...]

> Contributions to the Git Community:
> ====================================
> My first contribution to the Git community was during the contribution
> phase of the December 2024 Outreachy contribution phase where I first
> learned to send patches and had my first interactions with the Git code
> base. I did not make it through then but it was an opportunity to try
> again.

Nice that you are trying again.

> Contributions to other Communities:
> ===================================
> I have contributed very sparingly to the Systemd project and also
> the Linux Kernel.
>
> Microproject:
> =============
> Link: https://lore.kernel.org/git/aV_IGCld5T_dBxTs@Adekunles-MacBook-Air.local/
> Branch: aa/add-p-previous-decisions
> Status: Merged to master
> Commit ID: 8cafc305e22a59efb92472d4132616e24d3184c6
> Description:  "git add -p" and friends notes what the current status
>                of the hunk being shown is
>
> Other Contributions:
> ====================
> 1.
> Link: https://lore.kernel.org/git/cover.1771066252.git.abrahamadekunle50@gmail.com/
> Branch: aa/add-p-no-auto-advance
> Status: Merged to next
> Description: "git add -p" learned a new mode that allows the user to
>               revisit a file that was already dealt with
>
> 2.
> Link: https://lore.kernel.org/git/aWZkEYHhcIhdAjkh@Adekunles-MacBook-Air.local/
> Status: Stalled
> Description: the patch attempts to remove the use of the_repository
>              global variable in some builtins

It looks like you also have 2 contributions merged from October 2024
(when you applied for Outreachy). You can mention them too.

> Project Overview and Objective:
> ===============================
> I have always wondered what happens in the background when I see these
> details on my screen in a "git fetch" process.
>
>         remote: Enumerating objects: 57, done.
>         remote: Counting objects: 100% (57/57), done.
>         remote: Compressing objects: 100% (12/12), done.
>         Receiving objects: 100% (57/57), 48.3 KiB | 512.00 KiB/s, done.
>         Resolving deltas: 100% (21/21), done.
>         remote: Total 57 (delta 21), reused 13 (delta 5), pack-reused 30
>         From https://example.com/me/repo
>         1a2b3c4..5d6e7f8  feature/xyz -> origin/feature/xyz
>
> And when I saw this project from the list of projects listed,
> I was endeared to it as it is an opportunity to work in an area of the
> that Git code base that will satisfy my curiousity while also being

s/curiousity/curiosity/

> mentored by very best and most experienced Engineers there is.
>
> When a Git repository is configured with multiple promisor remotes,
> there is currently no mechanism to specify or optimize the order in
> which these remotes should be queried when fetching missing objects.
> Different remotes may have different performance characteristics
> such as characteristics, cost, or reliability which makes the
> fetching order an important consideration.

In which order are they currently queried?

> The project aims to implement a fetch ordering mechanism for multiple
> promisor remotes by designing a flexible system that allows a server
> to dictate their preferred order to the client to ensure performance
> and cost management.

A part of the whole system that allows servers to advertise
information already exists and should be reused.

We use "advertise" instead of "dictate" because the client should be
able to decide.

> Review of Previous Work:
> ========================
> The project is part of the Large Object Promisor "LOP" effort
> documented in Documentatio/technical/large-object-promisor.adoc.

s/Documentatio/Documentation/
s/large-object-promisor/large-object-promisors/

> In a bid to better handle large objects, the promisor-remote
> capability was added to the Git protocol v2, as documented in
> the promisor-remote section of Documentation/gitprotocol-v2.adoc,
> which enables a protocol negotiation so that the server can advertise
> one or more promisor remotes and so that the client and server can
> discuss if the client could directly use a promisor remote the server
> is advertising and if an agreement is reached, the client would be
> able to get the large blobs directly from the promisor remote without
> the server acting as a relay between the client and the promisor remote when
> fetching missing large blobs.
>
> The ground work for adding this capability to the v2 protocol was
> started by Christian Couder in [1], where if the "promisor.advertise"
> config is set to true, the server can then propagate its promisor remote
> configurations to the client over the v2 protocol during the negotiation
> in the form
>
>         "promisor-remote=name=prom1,url=url_encoded_value1;name=prom2,url=url_encoded_value2"
>
> The client can then choose to accept some promisor remotes the server
> is advertising using the "All", "None", "KnownName" or "KnownUrl"
> configurations as values for the "promisor.acceptfromServer" config option.
>
> In [2], Christian added the option for a server to advertise more
> fields after the "name" and "url", such as "token" and
> "partialCloneFilter" for the client to use this additional information
> in deciding the remotes to use as its promisor remotes by comparing it
> with its local config information.
>
> This was implemented by adding the "promisor.sendFields" and "promisor.checkFields"
> config values to the server and client respectively.
> For example, if "promisor.sendFields" is set to "partialCloneFilter", and the
> server has the remote configured like so:
> [remote "foo"]
>         url = https://pr.test
>         partialCloneFilter = blob:none
>         token = "fake"
> then
> "name=foo,url=https://pr.test,partialCloneFilter=blob:none,token=fake"
> will be advertised by the server to the client who can then decide,
> using the "promisor.checkFields" setting, to check if the passed field
> matches certain conditions before deciding to use it.
>
> This work by Christian is very crucial to this project as I will take
> advantage of this and enable the advertisement of a "priority" field
> that the server can use to communicate with the client in deciding to
> use the server recommended fetch order or not.
>
> in [3] Christian also implemented the option "promisor.storeFields" which
> allowed the value of the configuration to be saved in the client's
> configuration file for use at a later time.
> As above, this option will also prove important when the server advertises
> the "priority" field as it will allow the client decided to store it in its
> config settings for that promisor remote, for later use when fetching
> the remaining blobs from the promisor remotes.

Yeah, this is about allowing the server to advertise priority
information, and the client to accept it or not, but this doesn't talk
much about how this information will be used to actually change the
fetch order.

It would be nice if this could talk about which order is currently
used. You might want to take a look at
Documentation/technical/partial-clone.adoc, especially the "Using many
promisor remotes" section.

> High Level Approach to Project Execution:
> =========================================
> 1. Server Side Advertisement:
> -----------------------------
> As the server knows about the promisor remotes which hold the
> large object blobs,

First I would say "large blob objects" or just "large blobs" instead
of "large object blobs" if I wanted to talk about them.

Then it's true that the "promisor-remote" capability in protocol v2
was developed especially to help with large blobs and the LOP effort,
but this GSoC project could be useful for any partial clone that uses
multiple promisor remotes. So you could talk about "objects", not just
"large blobs".

> it could recommend the order in which these remotes
> could be queried by the client using a "priority=<value>" field of the
> promisor-remote capability in the Git v2 protocol, where <value> could
> be an integer between 1 and 65535, where the smallest integer indicates
> highest priority.
>
> This will be an optional feature which will be enabled by the server
> if it wants to recommend ordered fetching to the client via
> the "promisor.sendFields=priority" config option.
>
> Hence if the server advertises promisor remotes prom1 and prom2,
> it could be of the form
>         "promisor-remote=name=prom1,url=https://prom1.com,priority=10;name=prom2,url=https://prom2.com,priority=20",
> if the server is configured as:
> [remote "prom1"]
>          url = https://prom1.com
>          priority = 10
> [remote "prom2"]
>         url = https://prom2.com
>         priority = 20
>
> If the "promisor.sendFields" values does not include the "priority"
> field in its comma or space separated options, the field will not be
> advertised in the promisor-remote capability.

The issue is that right now "priority = 10" or "priority = 20" if they
were configured would change nothing in the order used to fetch from
promisor remotes. So the first thing to do (before having the server
send that and the client use it or not) is to actually introduce the
`remote.<name>.priority` config option and make it change the fetch
order. When that works, it makes sense to allow the server to
advertise it, and the client to accept it or not from the server.

> 2. Client Side Parsing:
> -----------------------
> The client can already use the "promisor.acceptFromServer" option to
> decide which promisor remotes it will accept, so this new field
> "priority" might not be significant at all in the deciding phase but when
> fetching missing blobs from the accepted promisor remotes.

If that's what you mean, I agree that the priority advertised by a
server for a promisor remote is not likely to be a (good) criterion on
the client side to help decide if the client accepts to use the
promisor remote or not. You might want to reword the above paragraph
though as it's not easy to understand.

> Instead, if the client wants to use the server recommended "priority"
> later when fetching the missing blob from the accepted promisor remotes,
> the "priority" field will be added to the "promisor.storeFields" config
> options so that the passed value can be saved to the client config.

Yeah, that's the most likely way the client would use it.

> If the client does not enable this option in the config, the "priority"
> field will not be saved in the local config and the fetching order will
> default to the local config order.

Right.

> A new config "promisor.honorServerFetchOrder" will be implemented
> on the client side to determine if the client will use the recommended
> server advertised promisor remote fetching order or not.

I don't think this is necessary. If the client doesn't want to use the
priority advertised by the server, it just needs to not add "priority"
to the "promisor.storeFields" config variable.

> This config can only be enabled if "promsior.acceptFromServer" is not
> "None".
>
> The options for this config value will be [true|false|local-first] where
> "false" (default) ignores server priority and will rely on the current
> config order.
> "true" sorts candidate advertised remotes by priority in ascending
> order (smallest tried first).
> "local-first" will try remotes in local .git/config first in the order
> the promisors are placed in the config file  and then
> server advertised ones ordered by priority, if the object has not been
> found by now. This last values makes me feel somehow as all objects

s/values/value/

> could have been fetched already but I am just stating my thought process.

I think we will likely not need something like this. The 3 different
possibilities could be configured this way:

- to rely on the order advertised by the server: just add "priority"
to "promisor.storeFields"
- to rely on local "priority" config: just add "priority = XXX" to
some/all "remote.<name>"
- to rely on the default order: add nothing

> Proposed Project Execution Timeline:
> ====================================

This needs to take into account that the first step should be to
actually introduce the `remote.<name>.priority` config option and make
it change the fetch order.

Thanks.


* Re: [GSoC] [Proposal]: Implement promisor remote fetch ordering
  2026-03-03  9:27 ` Christian Couder
@ 2026-03-03 12:08   ` Samuel Abraham
  2026-03-10 15:11   ` Samuel Abraham
  1 sibling, 0 replies; 7+ messages in thread
From: Samuel Abraham @ 2026-03-03 12:08 UTC (permalink / raw)
  To: Christian Couder
  Cc: git, Karthik Nayak, Justin Tobler, Siddharth Asthana,
	Ayush Chandekar, Lucas Seiki Oshiro, Junio C Hamano,
	Patrick Steinhardt, Phillip Wood

On Tue, Mar 3, 2026 at 10:27 AM Christian Couder
<christian.couder@gmail.com> wrote:
>
> Hi,
>
> On Sun, Mar 1, 2026 at 12:27 AM Abraham Samuel Adekunle
> <abrahamadekunle50@gmail.com> wrote:
> >
> > Hello,
> > This is my proposal for the project
> > "Implement promisor remote fetch ordering" for the 2026 GSoC programme.
>
> Thanks for being interested in Git and this project in particular.

Thank you.

>
> > Personal Bio:
> > =============
> > Full Name:  Abraham Samuel Adekunle
> > Email: abrahamadekunle50@gmail.com
> > GitHub: https://github.com/devdekunle
> > Pronouns: he/him
> >
> > About Me:
> > =========
> > My name is Abraham Samuel Adekunle. I love to code, read and I am a
> > harworker. In my free time I love to play games and listen to soothing
>
> I guess: s/harworker/hardworker/

Okay I will fix it

>
> > music and well, also shifting into diffuse thinking to gain a new
> > perspective of whatever challenge I am trying to solve.
>
> [...]
>
> > Contributions to the Git Community:
> > ====================================
> > My first contribution to the Git community was during the contribution
> > phase of the December 2024 Outreachy contribution phase where I first
> > learned to send patches and had my first interactions with the Git code
> > base. I did not make it through then but it was an opportunity to try
> > again.
>
> Nice that you are trying again.

Thank you :)

>
> > Contributions to other Communities:
> > ===================================
> > I have contributed very sparingly to the Systemd project and also
> > the Linux Kernel.
> >
> > Microproject:
> > =============
> > Link: https://lore.kernel.org/git/aV_IGCld5T_dBxTs@Adekunles-MacBook-Air.local/
> > Branch: aa/add-p-previous-decisions
> > Status: Merged to master
> > Commit ID: 8cafc305e22a59efb92472d4132616e24d3184c6
> > Description:  "git add -p" and friends notes what the current status
> >                of the hunk being shown is
> >
> > Other Contributions:
> > ====================
> > 1.
> > Link: https://lore.kernel.org/git/cover.1771066252.git.abrahamadekunle50@gmail.com/
> > Branch: aa/add-p-no-auto-advance
> > Status: Merged to next
> > Description: "git add -p" learned a new mode that allows the user to
> >               revisit a file that was already dealt with
> >
> > 2.
> > Link: https://lore.kernel.org/git/aWZkEYHhcIhdAjkh@Adekunles-MacBook-Air.local/
> > Status: Stalled
> > Description: the patch attempts to remove the use of the_repository
> >              global variable in some builtins
>
> It looks like you also have 2 contributions merged from October 2024
> (when you applied for Outreachy). You can mention them too.

Okay I will do that.

>
> > Project Overview and Objective:
> > ===============================
> > I have always wondered what happens in the background when I see these
> > details on my screen in a "git fetch" process.
> >
> >         remote: Enumerating objects: 57, done.
> >         remote: Counting objects: 100% (57/57), done.
> >         remote: Compressing objects: 100% (12/12), done.
> >         Receiving objects: 100% (57/57), 48.3 KiB | 512.00 KiB/s, done.
> >         Resolving deltas: 100% (21/21), done.
> >         remote: Total 57 (delta 21), reused 13 (delta 5), pack-reused 30
> >         From https://example.com/me/repo
> >         1a2b3c4..5d6e7f8  feature/xyz -> origin/feature/xyz
> >
> > And when I saw this project from the list of projects listed,
> > I was endeared to it as it is an opportunity to work in an area of
> > the Git code base that will satisfy my curiousity while also being
>
> s/curiousity/curiosity/

Thanks

>
> > mentored by the very best and most experienced engineers there are.
> >
> > When a Git repository is configured with multiple promisor remotes,
> > there is currently no mechanism to specify or optimize the order in
> > which these remotes should be queried when fetching missing objects.
> > Different remotes may have different characteristics such as
> > performance, cost, or reliability, which makes the fetching order
> > an important consideration.
>
> In which order are they currently queried?

In the order they appear in the config file, with the promisor remote
configured via the
extensions.partialClone (most likely "origin") being the last one tried.
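
For illustration, here is a minimal config sketch (the remote names and
URLs are hypothetical) in which the fetch order would be prom1, then
prom2, with origin tried last:

        [remote "prom1"]
                url = https://example.com/prom1.git
                promisor = true
        [remote "prom2"]
                url = https://example.com/prom2.git
                promisor = true
        [extensions]
                partialClone = origin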

>
> > The project aims to implement a fetch ordering mechanism for multiple
> > promisor remotes by designing a flexible system that allows a server
> > to dictate their preferred order to the client to ensure performance
> > and cost management.
>
> A part of the whole system that allows servers to advertise
> information already exists and should be reused.
>
> We use "advertise" instead of "dictate" because the client should be
> able to decide.

I will reword it. Thanks

>
> > Review of Previous Work:
> > ========================
> > The project is part of the Large Object Promisor "LOP" effort
> > documented in Documentatio/technical/large-object-promisor.adoc.
>
> s/Documentatio/Documentation/
> s/large-object-promisor/large-object-promisors/

Thank you

>
> > In a bid to better handle large objects, the promisor-remote
> > capability was added to the Git protocol v2, as documented in
> > the promisor-remote section of Documentation/gitprotocol-v2.adoc,
> > which enables a protocol negotiation: the server can advertise
> > one or more promisor remotes, and the client and server can
> > discuss whether the client could directly use a promisor remote
> > the server is advertising. If an agreement is reached, the client
> > would be able to get the large blobs directly from the promisor
> > remote, without the server acting as a relay between the client
> > and the promisor remote when fetching missing large blobs.
> >
> > The ground work for adding this capability to the v2 protocol was
> > started by Christian Couder in [1], where if the "promisor.advertise"
> > config is set to true, the server can then propagate its promisor remote
> > configurations to the client over the v2 protocol during the negotiation
> > in the form
> >
> >         "promisor-remote=name=prom1,url=url_encoded_value1;name=prom2,url=url_encoded_value2"
> >
> > The client can then choose to accept some promisor remotes the server
> > is advertising using the "All", "None", "KnownName" or "KnownUrl"
> > configurations as values for the "promisor.acceptFromServer" config option.
> >
> > In [2], Christian added the option for a server to advertise more
> > fields after the "name" and "url", such as "token" and
> > "partialCloneFilter" for the client to use this additional information
> > in deciding the remotes to use as its promisor remotes by comparing it
> > with its local config information.
> >
> > This was implemented by adding the "promisor.sendFields" and "promisor.checkFields"
> > config values to the server and client respectively.
> > For example, if "promisor.sendFields" is set to "partialCloneFilter", and the
> > server has the remote configured like so:
> > [remote "foo"]
> >         url = https://pr.test
> >         partialCloneFilter = blob:none
> >         token = "fake"
> > then
> > "name=foo,url=https://pr.test,partialCloneFilter=blob:none,token=fake"
> > will be advertised by the server to the client who can then decide,
> > using the "promisor.checkFields" setting, to check if the passed field
> > matches certain conditions before deciding to use it.
> >
> > This work by Christian is very crucial to this project as I will take
> > advantage of this and enable the advertisement of a "priority" field
> > that the server can use to communicate with the client in deciding to
> > use the server recommended fetch order or not.
> >
> > In [3], Christian also implemented the "promisor.storeFields" option,
> > which allows advertised field values to be saved in the client's
> > configuration file for use at a later time.
> > As above, this option will also prove important when the server advertises
> > the "priority" field, as it will allow the client to decide to store it in
> > its config settings for that promisor remote, for later use when fetching
> > the remaining blobs from the promisor remotes.
>
> Yeah, this is about allowing the server to advertise priority
> information, and the client to accept it or not, but this doesn't talk
> much about how this information will be used to actually change the
> fetch order.
>
> It would be nice if this could talk about which order is currently
> used. You might want to take a look at
> Documentation/technical/partial-clone.adoc, especially the "Using many
> promisor remotes" section.

Okay thank you. I will add that to the v2.

>
> > High Level Approach to Project Execution:
> > =========================================
> > 1. Server Side Advertisement:
> > -----------------------------
> > As the server knows about the promisor remotes which hold the
> > large object blobs,
>
> First I would say "large blob objects" or just "large blobs" instead
> of "large object blobs" if I wanted to talk about them.

Okay thank you

>
> Then it's true that the "promisor-remote" capability in protocol v2
> was developed especially to help with large blobs and the LOP effort,
> but this GSoC project could be useful for any partial clone that uses
> multiple promisor remotes. So you could talk about "objects", not just
> "large blobs".

Okay Noted

>
> > it could recommend the order in which these remotes
> > could be queried by the client using a "priority=<value>" field of the
> > promisor-remote capability in the Git v2 protocol, where <value> could
> > be an integer between 1 and 65535, where the smallest integer indicates
> > highest priority.
> >
> > This will be an optional feature which will be enabled by the server
> > if it wants to recommend ordered fetching to the client via
> > the "promisor.sendFields=priority" config option.
> >
> > Hence if the server advertises promisor remotes prom1 and prom2,
> > it could be of the form
> >         "promisor-remote=name=prom1,url=https://prom1.com,priority=10;name=prom2,url=https://prom2.com,priority=20",
> > if the server is configured as:
> > [remote "prom1"]
> >          url = https://prom1.com
> >          priority = 10
> > [remote "prom2"]
> >         url = https://prom2.com
> >         priority = 20
> >
> > If the "promisor.sendFields" values does not include the "priority"
> > field in its comma or space separated options, the field will not be
> > advertised in the promisor-remote capability.
>
> The issue is that right now "priority = 10" or "priority = 20" if they
> were configured would change nothing in the order used to fetch from
> promisor remotes. So the first thing to do (before having the server
> send that and the client use it or not) is to actually introduce the
> `remote.<name>.priority` config option and make it change the fetch
> order. When that works, it makes sense to allow the server to
> advertise it, and the client to accept it or not from the server.

Yes thank you.
I will fix this in the v2

>
> > 2. Client Side Parsing:
> > -----------------------
> > The client can already use the "promisor.acceptFromServer" option to
> > decide which promisor remotes it will accept, so this new "priority"
> > field might not be significant in that deciding phase, but rather
> > later, when fetching missing blobs from the accepted promisor remotes.
>
> If that's what you mean, I agree that the priority advertised by a
> server for a promisor remote is not likely to be a (good) criteria on
> the client side to help decide if the client accepts to use the
> promisor remote or not. You might want to reword the above paragraph
> though as it's not easy to understand.

Okay

>
> > Instead, if the client wants to use the server recommended "priority"
> > later when fetching the missing blob from the accepted promisor remotes,
> > the "priority" field will be added to the "promisor.storeFields" config
> > options so that the passed value can be saved to the client config.
>
> Yeah, that's the most likely way the client would use it.
>
> > If the client does not enable this option in the config, the "priority"
> > field will not be saved in the local config and the fetching order will
> > default to the local config order.
>
> Right.
>
> > A new config "promisor.honorServerFetchOrder" will be implemented
> > on the client side to determine if the client will use the recommended
> > server advertised promisor remote fetching order or not.
>
> I don't think this is necessary. If the client doesn't want to use the
> priority advertised by the server, it just needs to not add "priority"
> to the "promisor.storeFields" config variable.

Okay thank you

>
> > This config can only be enabled if "promisor.acceptFromServer" is not
> > "None".
> >
> > The options for this config value will be [true|false|local-first] where
> > "false" (default) ignores server priority and will rely on the current
> > config order.
> > "true" sorts candidate advertised remotes by priority in ascending
> > order (smallest tried first).
> > "local-first" will try remotes in local .git/config first in the order
> > the promisors are placed in the config file  and then
> > server advertised ones ordered by priority, if the object has not been
> > found by now. This last values makes me feel somehow as all objects
>
> s/values/value/
>
> > could have been fetched already but I am just stating my thought process.
>
> I think we will likely not need something like this. The 3 different
> possibilities could be configured this way:
>
> - to rely on the order advertised by the server: just add "priority"
> to "promisor.storeFields"
> - to rely on local "priority" config: just add "priority = XXX" to
> some/all "remote.<name>"
> - to rely on the default order: add nothing

Thank you for the guidance. I will fix all the changes in the v2

>
> > Proposed Project Execution Timeline:
> > ====================================
>
> This needs to take into account that the first step should be to
> actually introduce the `remote.<name>.priority` config option and make
> it change the fetch order.

Yes
Thank you for the review.

Abraham

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [GSoC] [Proposal]: Implement promisor remote fetch ordering
  2026-03-03  9:27 ` Christian Couder
  2026-03-03 12:08   ` Samuel Abraham
@ 2026-03-10 15:11   ` Samuel Abraham
  1 sibling, 0 replies; 7+ messages in thread
From: Samuel Abraham @ 2026-03-10 15:11 UTC (permalink / raw)
  To: Christian Couder
  Cc: git, Karthik Nayak, Justin Tobler, Siddharth Asthana,
	Ayush Chandekar, Lucas Seiki Oshiro, Junio C Hamano,
	Patrick Steinhardt, Phillip Wood

On Tue, Mar 3, 2026 at 10:27 AM Christian Couder
<christian.couder@gmail.com> wrote:
>
> Hi,
>
> On Sun, Mar 1, 2026 at 12:27 AM Abraham Samuel Adekunle
> <abrahamadekunle50@gmail.com> wrote:
> >
> > Hello,
> > This is my proposal for the project
> > "Implement promisor remote fetch ordering" for the 2026 GSoC programme.
>
> Thanks for being interested in Git and this project in particular.
>

Hello Christian.
Thank you for taking the time to review my proposal.
I have made your recommended changes and sent a v2.

Thanks
Abraham

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [GSoC Proposal] Implement promisor remote fetch ordering
@ 2026-03-10 18:25 Lorenzo Pegorari
  2026-03-14 17:30 ` Christian Couder
  0 siblings, 1 reply; 7+ messages in thread
From: Lorenzo Pegorari @ 2026-03-10 18:25 UTC (permalink / raw)
  To: git
  Cc: Christian Couder, Karthik Nayak, Justin Tobler, Siddharth Asthana,
	Ayush Chandekar, Junio C Hamano

The following is my proposal for the GSoC'26 for the project "Implement
promisor remote fetch ordering".

As soon as the contributor application period begins, I will submit
the proposal in PDF format to the official GSoC website.

I have dedicated a large section (about 40%) of the proposal to
explaining the current situation and the tests that I have done to gain a
lot of hands-on experience. I consider this section important, but if it
is too long-winded, please let me know.

Thank you so much to everyone that is going to spend their time reading
this proposal and giving me their feedback.


==============================

"Implement promisor remote fetch ordering"

==============================


# Personal information

Name: Lorenzo Pegorari
Pronouns: he/him
Location: Cremona, Lombardy, Italy
Timezone: CET (UTC+1)
Email: lorenzo.pegorari2002@gmail.com
GitHub: https://github.com/LorenzoPegorari
LinkedIn: https://www.linkedin.com/in/lorenzopegorari/

------------------------------

# Background

## General

Hi Git team! My name is Lorenzo Pegorari and I am a 23-year-old student
from Italy.

Throughout my undergraduate studies, I constantly tried to differentiate
myself by taking part in as many interesting experiences as possible, to
better define my future professional path. And so, at the end of 2024, I
decided to join the FOSS world in order to improve my software
engineering skills and to contribute to projects that I find meaningful.

My goal is to join the broader Linux community, mostly focusing on the
Linux kernel and Git, and to prove to myself that I am capable of
participating in prestigious (and, in the context of the GSoC,
competitive) organizations. My dream is to one day become a cornerstone
in one of these open-source communities.

## Education

I am currently in the final year of my BSc in Computer Science and
Engineering at Politecnico di Milano (Milan, Italy).

## Previous Open-Source Experience

I am fairly new to contributing to open-source projects, which is why I
am applying for the Google Summer of Code in the first place.

My first step in the open-source world happened on October 16th 2024,
when I released my first project Simply Colorful [1], a free theme for
the note-taking application Obsidian that has now gathered close to 10000
downloads since its release.

Last year, in the summer of 2025, I also had the honor of participating
in the Google Summer of Code 2025 with the organization BRL-CAD, where I
successfully completed my proposed project "Developing a MOOSE-based
Console for arbalest: a first step to merge arbalest and qged" [2].

The general goal of the project was to take the initial step in merging
BRL-CAD's two in-development GUIs: "arbalest" and "qged". The primary
objective was to transfer qged's sophisticated GED (Geometry EDiting)
console, which works via low-level calls to BRL-CAD's core libraries,
into arbalest, while preserving the application's distinctive clean and
easy-to-scale architecture. To support this endeavor, I also expanded
BRL-CAD's new lightweight, modular, object-oriented API, known as
"MOOSE". In addition to these core tasks, I also fixed some compatibility
issues related to arbalest's Qt6 widgets to ensure proper display across
different OSs, and resolved various GUI-related bugs.

More information regarding my previous GSoC participation can be found in
my final project report [3] and in my GSoC'25 daily notes [4]. I am also
extremely happy to say that my work was much appreciated by the BRL-CAD
community, with the organization admin Christopher Sean Morrison stating
that my final project report was "outstanding", and my amazing mentor,
Dr. Daniel Rossberg, noting that my performance was "awesome" and that I
was "a pleasure to mentor".

I would also like to state that, even though I have been quite busy
lately, I still consider myself a part of the BRL-CAD community, having
done some small fixes after the GSoC'25 ended, and now having personally
helped some new developers to (hopefully) join BRL-CAD for the GSoC'26.

## Git Experience

I joined the Git community at the beginning of 2026, but I have been
interested in this project since 2024. In fact, last year, when I was
deciding which organization to join the GSoC'25 with, I seriously
considered Git, but in the end I discarded it because I felt not skilled
enough to take part in such a complex organization. This time, though, I
feel much more confident, and so here I am!

Since the start of this year, I have explored the codebase as much as
possible, focusing on the GSoC project ideas regarding partial clones,
which were the ones that I personally found most interesting and valuable.

So far, I have made the following contributions to Git:

 * [GSoC PATCH v2] diff: improve scaling of filenames in diffstat to handle UTF-8 chars
   * Link: https://lore.kernel.org/git/cover.1768520441.git.lorenzo.pegorari2002@gmail.com
   * Description: The computation of column width made by `git diff --stat`
                  was confused when pathnames contained non-ASCII chars.
		  This issue was reported by a `NEEDSWORK` comment.
   * Status: Merged to `master`
 
 * [GSoC PATCH v3] diff: handle ANSI escape codes in prefix when calculating diffstat width
   * Link: https://lore.kernel.org/git/cover.1772226209.git.lorenzo.pegorari2002@gmail.com
   * Description: Fixed `git log --graph --stat` not correctly counting
                  the display width of colored graph part of its own
		  output. This issue was reported by a `NEEDSWORK` comment.
   * Status: Merged to `master`.
    
 * [GSoC PATCH v3] doc: improve gitprotocol-pack
   * Link: https://lore.kernel.org/git/cover.1772502209.git.lorenzo.pegorari2002@gmail.com
   * Description: Improved the `gitprotocol-pack` documentation.
   * Status: Will merge to `master`.

## Experience With C

C is my primary language. I used it throughout my university courses, for
most of my personal projects, and during my GSoC'25 project with BRL-CAD,
where, although my main tasks involved using C++ and Qt6, I had to
constantly interface with BRL-CAD's core libraries, which are written in C.

------------------------------

# Current Situation & Testing

## Partial Clones

The "partial clone" feature was introduced to better handle extremely
large repositories, particularly those that contain large binary files.
The problem is clearly shown with the following example, which
illustrates how quickly the size of a Git repository can grow when just
one single 1 MB binary file is frequently committed:

```
#!/bin/bash
git init size_check
# Create 1MB file of random data, to simulate a compressed binary
head -c 1M </dev/urandom >size_check/foo 
git -C size_check add foo
git -C size_check commit -m "foo"
du -hs size_check/.git/objects  # .git/objects size after 1 foo commit = 1.1 MB
for i in {1..50}; do
    head -c 1M </dev/urandom >size_check/foo  # Change the 1MB file
    git -C size_check commit -a -m "foo"
done
du -hs size_check/.git/objects  # .git/objects size after 51 foo commits = 53 MB
```

Partial clones avoid this issue during `clone` and `fetch` operations by
passing all the objects to download through a `--filter=<filter-spec>`
specified by the user, which will limit the number of blobs and trees
that actually get downloaded. The `<filter-spec>` can, for example, be:
 * `blob:none`, which will filter out all blobs.
 * `tree:0`, which will filter out all trees.
 * `blob:limit=5k`, which will filter out all blobs whose size is greater
   than 5 kB.

The filtered-out objects will be lazily downloaded when the user runs a
command that requires the missing data.

This mechanism works with the following steps:
 * When the client wants to fetch some objects from the server using a
   filter, the client, after sending a list of capabilities it wants to
   be in effect, sends the `filter: <filter-spec>` capability, followed
   by a request for the objects that the client wants to retrieve. The
   following is an example of a request (extracted using
   `GIT_TRACE_PACKET=1`) made by a client to a server to fetch 1 object
   using the `<filter-spec>=blob:none`:

   ```
   [...]
   pkt-line.c:85           packet:        fetch< 0000  # "flush-pkt"
   pkt-line.c:85           packet:        fetch> command=fetch  # Execute fetch
   pkt-line.c:85           packet:        fetch> agent=git/2.43.0
   pkt-line.c:85           packet:        fetch> object-format=sha1
   pkt-line.c:85           packet:        fetch> 0001  # "delim-pkt"
   pkt-line.c:85           packet:        fetch> thin-pack  # Capability
   pkt-line.c:85           packet:        fetch> no-progress  # Capability
   pkt-line.c:85           packet:        fetch> ofs-delta  # Capability
   pkt-line.c:85           packet:        fetch> filter blob:none  # Filter capability
   # OID of the object the client wants to retrieve
   pkt-line.c:85           packet:        fetch> want 394ca7a7b5e75a57e736040480f685c8b71844eb  
   pkt-line.c:85           packet:        fetch> done  # End fetch
   pkt-line.c:85           packet:        fetch> 0000  # "flush-pkt"
   [...]
   ```

 * The server will apply the requested `<filter-spec>` as it creates the
   "promisor packfile" of the requested objects. A packfile is a binary
   file that is used to compress many "loose objects", and it does so by
   containing the most recent versions of the stored objects and deltas
   of the previous versions of those objects. A promisor packfile is a
   filtered packfile, where the unwanted objects are not present. The
   promisor packfile is sent to the client.

 * When the client receives the promisor packfile, it can operate
   normally by knowing that the "promisor objects" (the filtered-out
   objects) will be dynamically fetched when needed from "promisor
   remotes" (remotes that have "promised" that they have the missing
   objects). The promisor remotes are defined using the
   `remote.<name>.promisor` and `remote.<name>.partialCloneFilter`
   configuration variables.
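
As a concrete sketch, a client treating a hypothetical remote `lop` as a
promisor remote for blobs larger than 5 kB could have something like the
following in its `.git/config` (the remote name and URL are made up):

```
[remote "lop"]
        url = https://example.com/lop.git
        fetch = +refs/heads/*:refs/remotes/lop/*
        promisor = true
        partialCloneFilter = blob:limit=5k
```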

## Multiple Promisor Remotes & Testing

Focusing on promisor remotes, currently multiple of them can be
configured and used. This feature gives users the flexibility to use a
specific promisor remote when convenient (e.g., a remote that is
closer/faster for some kind of object). This is particularly useful when
working with extremely large repositories (100+ GB) that contain many
large binary files. These types of repositories can greatly benefit from
having multiple promisor remotes: a common example is setting them up so
that one promisor remote can act as a "Large Object Promisor" (LOP),
meaning a remote that is used only to store large blobs, while the other
one will be the main remote, used to store everything else.

I created a minimal example setup, mostly based on the test
`t/t5710-promisor-remote-capability` added by `4602676` ("Add
'promisor-remote' capability to protocol v2", 2025-02-18), to experiment
with multiple promisor remotes, in order to not simply rely on the
documentation, but to actually get hands-on experience. The example setup
creates a `server`, a `lopm` ("Large Object Promisor medium") for blobs
larger than 5kB, a `lopl` ("Large Object Promisor large") for blobs
larger than 50kB, and a `client` that interfaces with all of these
remotes. It is created in the following way:

 * Initially, a very simple Git repository `template` is created, which
   contains just 3 commits, 3 trees, and 3 blobs of different sizes (as
   shown by using `git verify-pack -v`):

   ```
   # "git -C template verify-pack -v *.pack" output:
   <OID-commit-large> commit 217 157 12
   <OID-commit-medium> commit 218 159 296
   <OID-commit-small> commit 169 127 169
   <OID-tree-large> tree   102 105 455
   <OID-tree-medium> tree   69 76 606
   <OID-tree-small> tree   35 46 560
   <OID-blob-large> blob   102400 102444 682  # 100kB blob
   <OID-blob-medium> blob   10240 10254 103126  # 10kB blob
   <OID-blob-small> blob   6 15 113380  # 6 bytes blob
   non delta: 9 objects
   <*.pack>: ok
   ```

 * The bare `server`, based on the `template`, and the bare and empty
   `lopm` and `lopl`, are generated using, respectively, `git clone
   --bare --no-local template server` and `git init --bare lop[m|l]`.
 
 * The objects inside the `server` are unpacked, with all blobs larger
   than 5kB copied inside `lopm` and all blobs larger than 50kB copied
   inside `lopl`. The `server` is then repacked using the command `git
   repack -a -d --filter=blob:limit=5k` (to remove blobs larger than
   5kB), and finally a ".promisor" file is created with the same name as
   the ".pack" file, to tell Git that all missing objects from the pack
   can be found in the configured promisor remotes.

 * The `server` configuration is modified to support `lopm` and `lopl` as
   promisor remotes. The configurations are also modified so that, inside
   `server`, `lopm`, and `lopl`, `upload-pack` will support partial clone
   and partial fetch object filtering (using `uploadpack.allowFilter`),
   and will accept a fetch request that asks for any object at all (using
   `uploadpack.allowAnySHA1InWant`):

   ```
   git -C server remote add lopm "file://$(pwd)/lopm"  # Add lopm remote to server
   git -C server config remote.lopm.promisor true  # Make lopm a promisor remote
   git -C server remote add lopl "file://$(pwd)/lopl"  # Add lopl remote to server
   git -C server config remote.lopl.promisor true  # Make lopl a promisor remote
   git -C server config uploadpack.allowFilter true
   git -C server config uploadpack.allowAnySHA1InWant true
   git -C lopm config uploadpack.allowFilter true
   git -C lopm config uploadpack.allowAnySHA1InWant true
   git -C lopl config uploadpack.allowFilter true
   git -C lopl config uploadpack.allowAnySHA1InWant true
   ```

 * The `client` is created by doing a partial clone of the `server`, and
   adding `lopl` and `lopm` as promisor remotes: 

   ```
   GIT_TRACE=$(pwd)/trace \  # Env var to trace general messages
   GIT_TRACE_PACKET=$(pwd)/packet \  # Env var to trace messages for in/out packets 
   GIT_NO_LAZY_FETCH=0 \  # Env var to enable lazily fetch missing objects on demand
   git clone \
       -c remote.lopl.url="file://$(pwd)/lopl" \  # Add remote lopl
       -c remote.lopl.fetch="+refs/heads/*:refs/remotes/lopl/*" \
       -c remote.lopl.promisor=true \  # Make lopl a promisor remote
       -c remote.lopm.url="file://$(pwd)/lopm" \  # Add remote lopm
       -c remote.lopm.fetch="+refs/heads/*:refs/remotes/lopm/*" \
       -c remote.lopm.promisor=true \  # Make lopm a promisor remote
       --no-local --filter="blob:limit=5k" server client
   ```

Now, with this setup, by slightly tweaking the configurations of each
repository, it is possible to deeply test how multiple promisor remotes
are handled in various situations, and actually see what is described in
the documentation.

## Testing Promisor Remotes Advertisement

An important thing to test is the promisor remotes advertisement feature.
This feature is dependent on 2 main configuration options: the
server-side option `promisor.advertise`, which enables the server to
advertise the promisor remotes it is using to the client, and the
client-side option `promisor.acceptFromServer`, which describes how the
client should handle the promisor remotes advertised:

 * If `promisor.advertise=false`, when the `client` wants to fetch an
   object that the `server` does not have, the `server` will not
   advertise the `promisor-remote` capability, and so it has no other
   choice than to first fetch the object from `lopl` and/or `lopm`, and
   then give it to the `client`. This can be checked by doing `git -C
   server rev-list --objects --all --missing=print`, and seeing that the
   previously missing large blobs are now present inside the `server`, or
   by directly looking into the `GIT_TRACE_PACKET` output, and seeing
   that there is no reference to the `promisor-remote` capability.

 * If `promisor.advertise=true`, when the `client` wants to fetch an
   object that the `server` does not have, the `server` will advertise
   its promisor remotes, as seen by the `GIT_TRACE_PACKET` output, which
   will contain:
    
   ```
   [...]
   packet: upload-pack> promisor-remote= \
       name=lopl,url=file://$(pwd)/lopl; \  # Adv lopl
       name=lopm,url=file://$(pwd)/lopm  # Adv lopm
   [...]
   ```

    The `client` can control which advertised promisor remotes to accept
    with the following options:

    * If `promisor.acceptFromServer=All`, the `client` will accept all
      advertised promisor remotes. This can be seen by looking at the
      `GIT_TRACE_PACKET` output, which will contain:
      
      ```
      [...]
      packet: clone> promisor-remote=lopl;lopm  # Accept lopl and lopm
      [...]
      ```
    
    * If `promisor.acceptFromServer=KnownName`, the `client` will accept
      promisor remotes which are already configured and have the same
      name. This can be seen by changing the `lopl` name in the `client`
      configuration, and looking at the `GIT_TRACE_PACKET` output, which
      will contain:
    
      ```
      [...]
      packet: clone> promisor-remote=lopm  # Accept lopm (no reference to lopl!)
      [...]
      ```
        
    * If `promisor.acceptFromServer=KnownUrl`, the `client` will accept
      promisor remotes which are already configured and have the same
      name and URL. This can be seen by changing the `lopl` URL in the
      `client` configuration, and looking at the `GIT_TRACE_PACKET`
      output, which will contain:
    
      ```
      [...]
      packet: clone> promisor-remote=lopm  # Accept lopm (no reference to lopl!)
      [...]
      ```

    * If `promisor.acceptFromServer=None`, the `client` won't accept any
      advertised promisor remotes.
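
The advertisement behaviour tested above boils down to two configuration
knobs, one per side. A minimal sketch (the chosen values are just
examples):

```
# Server side: advertise the configured promisor remotes.
[promisor]
        advertise = true

# Client side: accept advertised remotes only when a remote with the
# same name is already configured locally.
[promisor]
        acceptFromServer = KnownName
```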

Additional pieces of information can be sent by the server when
advertising its promisor remotes to the client. These pieces of
information are configured in the server-side configuration variable
`promisor.sendFields`, and currently can be:

 * `partialCloneFilter`, which contains the partial clone filter used for
   the remote.

 * `token`, which contains an authentication token for the remote.

The client-side configuration variable `promisor.checkFields` can be used
by the client to check if the values transmitted by a server correspond
to the values in its own configuration, and accept the promisor remote if
they are the same.

A simple test can be done by adding the `remote.lopl.partialCloneFilter`,
`remote.lopl.token`, and the `promisor.sendFields` variables to the
`server` configuration. The output of `GIT_TRACE_PACKET` will contain:

```
[...]
packet:  upload-pack> promisor-remote=name=lopl, \  # name field will always be sent
    url=file://$(pwd)/lopl, \  # url field will always be sent
    partialCloneFilter=blob:none, \  # partialCloneFilter field of lopl
    token=value; \  # token field of lopl
    [...]
```
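
The server configuration producing this advertisement could look like
the following sketch (the token value is a placeholder, and the
comma-separated list syntax for `promisor.sendFields` is assumed):

```
[remote "lopl"]
        partialCloneFilter = blob:none
        token = value
[promisor]
        sendFields = partialCloneFilter,token
```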

Recently, with the patch series "Implement `promisor.storeFields` and
`--filter=auto`" [5], the new client-side configuration variable
`promisor.storeFields` was added. It contains a list of field names
(`partialCloneFilter` and/or `token`), and the values of these fields,
when transmitted by the server, will be stored in the local configuration
on the client.

## Testing Multiple Promisor Remotes Fetch Order

Finally, the last mechanism that is fundamental to understand is the
fetch order when multiple promisor remotes are defined:

 * When multiple remotes are configured, they are tried one after the
   other in the order in which they appear in the configuration, until
   all objects are fetched. This can be easily seen from the output of
   `GIT_TRACE`, which initially tries to fetch the objects from `lopl`,
   and then from `lopm`:

   ```
   [...]
   trace: built-in: git fetch lopl [...] --filter=blob:none [...]
   [...]
   trace: built-in: git fetch lopm [...] --filter=blob:none [...]
   [...]
   ```

   If instead we first define `lopm` in the `client` configuration,
   then `lopm` will initially be used to fetch the objects, and `lopl`
   will not be used at all (because `lopm` contains all the required
   objects):

   ```
   [...]
   trace: built-in: git fetch lopm [...] --filter=blob:none [...]
   [...]
   ```

 * If the configuration option `extensions.partialClone` is present, the
   promisor remote that it specifies will always be the last one tried
   when fetching objects.
    
------------------------------

# "Implement promisor remote fetch ordering"

## Project Goal

This project aims to improve Git by implementing a fetch ordering
mechanism for multiple promisor remotes that can be:

 * Configured locally by the client.
 * Advertised by servers through the `promisor-remote` protocol.

## Approach

The bulk of the project will be the creation of a system that allows
defining the order in which the promisor remotes will be tried when
fetching an object.

The first goal will be the creation of a `remote.<name>.promisorPriority`
configuration option, which will hold a number between 1 and `UCHAR_MAX`,
and which defines the priority of that promisor remote in the fetch
order. This means that the promisor remotes will be tried in the
following order:

 * All promisor remotes that have a valid `remote.<name>.promisorPriority`,
   starting from the one with the highest priority (the lowest
   `promisorPriority` value). If two or more promisor remotes have the
   same priority, they will be tried in the order in which they appear
   in the configuration file.

 * All promisor remotes that don't have a `remote.<name>.promisorPriority`
   configuration option, or have an invalid one. If two or more promisor
   remotes define no priority, or an invalid one, they will be tried in
   the order in which they appear in the configuration file.

 * The promisor remote defined by `extensions.partialClone`, no matter
   its priority (which will be ignored if present). This is necessary
   for backward compatibility.
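
As an illustration of these rules, assume a `client` configuration like
the following (`promisorPriority` is the proposed option, and the URLs
are placeholders):

```
[remote "lopl"]
	url = file:///tmp/lopl
	promisor = true
	promisorPriority = 1
[remote "lopm"]
	url = file:///tmp/lopm
	promisor = true
	promisorPriority = 2
[remote "lopx"]
	url = file:///tmp/lopx
	promisor = true
[remote "origin"]
	url = file:///tmp/server
	promisor = true
[extensions]
	partialClone = origin
```

Here the fetch order would be `lopl` (priority 1), then `lopm`
(priority 2), then `lopx` (no priority, configuration order), and
finally `origin`, because the `extensions.partialClone` remote is always
tried last.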
    
Having already taken a look at the code, I have a general idea of the
major steps to take to actually introduce the
`remote.<name>.promisorPriority` configuration option:

 * Modify the `promisor_remote` linked list (inside `promisor-remote.h`)
   to introduce the new member `promisor_priority`, and the
   `promisor_remote_config()` function (inside `promisor-remote.c`) to
   correctly fill the `promisor_priority` of all promisor remotes read
   from the configuration file.

 * Modify the `promisor_remote_get_direct()` function (defined inside
   `promisor-remote.c`), which fetches all requested objects from all
   promisor remotes, trying them one at a time until all objects are
   fetched, to make it follow the previously defined promisor remote order.

When the first goal is achieved, the client-side-only fetch ordering
mechanism for multiple promisor remotes, controllable locally from the
client configuration, will be complete.

The second goal will be the introduction of the new `promisorPriority`
field for the `promisor.sendFields`, `promisor.checkFields`, and
`promisor.storeFields` configuration variables. With this new field, the
server will be able to tell the priorities of the promisor remotes that
it advertises to the client, and the client will be able to either check
or store these suggested priorities.
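
On the server side, opting in would then be a one-line configuration
change; a sketch (`promisorPriority` is the proposed field name, not yet
recognized by Git):

```
git init -q server
git -C server config promisor.advertise true
# Prospective: also advertise each promisor remote's priority.
git -C server config promisor.sendFields "partialCloneFilter,token,promisorPriority"
```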

My general plan to implement the `promisorPriority` field is the following:

 * Create the `static const char promisor_field_priority` variable inside
   `promisor-remote.c`, and add this variable inside the `known_fields` array.

 * Introduce the new member `priority` to the `promisor_info` struct, a
   structure for promisor remotes involved in the `promisor-remote`
   protocol capability, and the new member `store_priority` to the
   `store_info` struct, a structure used in the "store fields" mechanism.

 * Create the new `valid_priority()` function, which has to parse the
   value inside the `promisorPriority` field, and check if it is valid.

 * Modify many functions inside of the `promisor-remote.c` file to
   support the new field. Some of these functions are:

    * `promisor_remote_info()`
    * `set_one_field()`
    * `match_field_against_config()`
    * `all_fields_match()`
    * `parse_one_advertised_remote()`
    * `store_info_new()`
    * `promisor_store_advertised_fields()`

When the second goal is achieved, the mechanism for servers and clients
to, respectively, advertise and check/store the promisor remote fetch
order will be complete.

# Possible Issues

From my understanding, the project as it is proposed will handle all
possible cases, except for one. Let's imagine the following situation:

 * `server1` and `server2` both use the promisor remotes `lop1` and `lop2`.
 * `client` has both `server1` and `server2` as remotes.

In this situation, the `client` has no way to specifically say that when
fetching from `server1`, it wants to first try `lop1` and then `lop2`, while
when fetching from `server2`, it wants to first try `lop2` and then `lop1`.

One way to solve this very specific (and maybe unusual) issue is to
introduce a way to associate a `promisorPriority` to a specific remote. 

## Development Schedule

Project size: large (350 hours).

Timeline:

 * May 01 - May 24 (Community Bonding Period):
    * Discuss with the mentor(s) the best plan to implement the new features.
    * Get familiar with the Git components that are required to implement the new features.
 * May 25 - June 14
    * Add the `remote.<name>.promisorPriority` configuration option.
    * Write tests for the new feature.
    * Update the documentation.
 * June 15 - June 28
    * Implement all the suggestions made by the mentor(s)/community.
    * Refine the patch series
 * June 29 - July 10
    * Complete all remaining work.
    * Submit the midterm project report for evaluation.
 * July 11 - August 02
    * Add support for the `promisorPriority` field.
    * Write tests for the new feature.
    * Update the documentation.
 * August 03 - August 16
    * Implement all the suggestions made by the mentor(s)/community.
    * Refine the patch series
 * August 17 - August 24
    * Wrap up everything that is still pending.
    * Submit the final project report for evaluation.

This development schedule can be subject to changes/corrections during
the "Community Bonding Period".

## Time Availability

I plan to spend 5-6 hours a day from Monday to Saturday on this project,
so roughly 30-36 hours a week.

I intend to keep a daily log of what I do, similar to what I have done
during the GSoC'25.

------------------------------

# Possible questions

## Am I eligible for the GSoC?

Yes. It is possible to participate for a second GSoC term as long as the
contributor is still a student.

## Will I use AI?

Mostly no. Most studies right now show that LLM-assisted coding is
detrimental for junior developers in many ways: they spend more time on
tasks, produce worse code, and learn less during the process.

Considering that my very first goal as a GSoC contributor is to use this
experience to learn as much as possible, I will not use AI for coding.

I will exclusively use AI to check for grammatical and/or syntactical
errors in sentences I have written. I will never use AI to generate text,
but only to double check it.

## What is my reasoning behind proposing a new feature?

As clearly stated in the "General Applicant Information" in the "Git
Developer Pages" [6], contributors suggesting new features should
carefully consider the many potential issues that may arise, and see if
they can be mitigated before the project is submitted.

My reasoning behind the proposal of this new feature is the following:

 * I think that in my proposal I have shown that I have considered
   thoroughly all possible cases regarding the introduction of the
   "promisor remote fetch ordering" feature, and so I feel that the
   necessary discussion to define the details of the project will be
   very quick.

 * I think the proposed new feature is not prone to long naming or user
   interface discussions.

 * I think that the "promisor remote fetch ordering" feature is a
   necessary step to fully support multiple promisor remotes, and to
   fully support the partial clone mechanism.

 * I think the proposed project is not too complex or too difficult for
   me to handle. In fact, although I was interested, I discarded the
   "enhance promisor-remote protocol for better-connected remotes" project
   idea, precisely because it seemed far too big and complex a feature
   to handle for a GSoC project.

## Why Git?

As I have said already, I have been interested in contributing to Git
since 2024.

The sheer amount of people all across the globe actively using Git and/or
engaging with software that was produced also thanks to Git, makes this
FOSS project, to me, one of the most interesting ones in the world.

Being responsible for the maintenance and development of software with
this number of users is extremely challenging, but also really rewarding.
Furthermore, the developers in this community are some of the best in the
industry, and working with them is an amazing opportunity that cannot be
missed.

Finally, simply put, joining the broader Linux community is a dream of
mine, particularly to work on Git and the Linux kernel. In fact, during
February 2026, I didn't work as much on Git, because I was focused on
applying for the LFX "Linux kernel Spring 2026" mentorship to fix bugs in
the Linux kernel.

## Why me?

I hope the amount of time and effort that I have put into this proposal
is evident. I intend to give my absolute best to make this project a
success.

Also, having already participated last year in the Google Summer of Code,
I am already very familiar with this online program: I know the dos and
don'ts, and will apply what I learned last year during this term.

Finally, I also intend to continue contributing to Git, particularly to
continue to expand and improve the partial clone feature, which I find
particularly fascinating.

------------------------------

# Links

[1]: https://github.com/LorenzoPegorari/SimplyColorful
[2]: https://summerofcode.withgoogle.com/archive/2025/projects/25f08iuM
[3]: https://lorenzopegorari.github.io/GSoC25-report/
[4]: https://lorenzopegorari.github.io/GSoC25-report/logs
[5]: https://lore.kernel.org/git/20260216132317.15894-1-christian.couder@gmail.com/
[6]: https://git.github.io/General-Application-Information/


==============================


Thanks,

Lorenzo

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [GSoC Proposal] Implement promisor remote fetch ordering
  2026-03-10 18:25 [GSoC Proposal] Implement promisor remote fetch ordering Lorenzo Pegorari
@ 2026-03-14 17:30 ` Christian Couder
  2026-03-18 16:29   ` Lorenzo Pegorari
  0 siblings, 1 reply; 7+ messages in thread
From: Christian Couder @ 2026-03-14 17:30 UTC (permalink / raw)
  To: Lorenzo Pegorari
  Cc: git, Karthik Nayak, Justin Tobler, Siddharth Asthana,
	Ayush Chandekar, Junio C Hamano

On Tue, Mar 10, 2026 at 7:25 PM Lorenzo Pegorari
<lorenzo.pegorari2002@gmail.com> wrote:
>
> The following is my proposal for the GSoC'26 for the project "Implement
> promisor remote fetch ordering".

Thank you for your interest in Git and this project.

> As soon as the the contributor application period begins, I will submit
> the proposal in PDF format to the official GSoC website.

Good idea.

> I have dedicated a large section (about 40%) of the proposal to
> explaining the current situation and the tests that I have done to gain a
> lot of hands-on experience. I consider this section important, but if it
> too long-winded, please let me know.

[...]

> So far, I have made the following contributions to Git:
>
>  * [GSoC PATCH v2] diff: improve scaling of filenames in diffstat to handle UTF-8 chars
>    * Link: https://lore.kernel.org/git/cover.1768520441.git.lorenzo.pegorari2002@gmail.com
>    * Description: The computation of column width made by `git diff --stat`
>                   was confused when pathnames contained non-ASCII chars.
>                   This issue was reported by a `NEEDSWORK` comment.
>    * Status: Merged to `master`
>
>  * [GSoC PATCH v3] diff: handle ANSI escape codes in prefix when calculating diffstat width
>    * Link: https://lore.kernel.org/git/cover.1772226209.git.lorenzo.pegorari2002@gmail.com
>    * Description: Fixed `git log --graph --stat` not correctly counting
>                   the display width of colored graph part of its own
>                   output. This issue was reported by a `NEEDSWORK` comment.
>    * Status: Merged to `master`.

For the patches that are merged to master, it could help if you could
give the object ID of the merge commit that merged your commits into
master, or alternatively the object ID of all your commits.

>  * [GSoC PATCH v3] doc: improve gitprotocol-pack
>    * Link: https://lore.kernel.org/git/cover.1772502209.git.lorenzo.pegorari2002@gmail.com
>    * Description: Improved the `gitprotocol-pack` documentation.
>    * Status: Will merge to `master`.

Yeah, this has been merged to master after your email.

[...]

> Partial clones avoid this issue during `clone` and `fetch` operations by
> passing all the objects to download through a `--filter=<filter-spec>`
> specified by the user, which will limit the number of blobs and trees
> that actually get downloaded. The `<filter-spec>`, can, for example, be:
>  * `blob:none`, which will filter out all blobs.
>  * `tree:0`, which will filter out all trees.
>  * `blob:limit=5k`, which will filter out all blobs whose size is greater
>    than $5$kB.

Why are there '$' signs above?

> The filtered out objects will be lazily downloaded when the user runs a
> command that requires those missing data.
>
> This mechanism works with the following steps:
>  * When the client wants to fetch some objects from the server using a
>    filter, the client, after sending a list of capabilities it wants to
>    be in effect, sends the `filter: <filter-spec>` capability, followed
>    by a request for the objects that the client wants to retrieve. The
>    following is an example of a request (extracted using
>    `GIT_TRACE_PACKET=1`) made by a client to a server to fetch 1 object
>    using the `<filter-spec>=blob:none`:
>
>    ```
>    [...]
>    pkt-line.c:85           packet:        fetch< 0000  # "flush-pkt"
>    pkt-line.c:85           packet:        fetch> command=fetch  # Execute fetch
>    pkt-line.c:85           packet:        fetch> agent=git/2.43.0
>    pkt-line.c:85           packet:        fetch> object-format=sha1
>    pkt-line.c:85           packet:        fetch> 0001  # "delim-pkt"
>    pkt-line.c:85           packet:        fetch> thin-pack  # Capability
>    pkt-line.c:85           packet:        fetch> no-progress  # Capability
>    pkt-line.c:85           packet:        fetch> ofs-delta  # Capability
>    pkt-line.c:85           packet:        fetch> filter blob:none  # Filter capability
>    # OID of the object the client wants to retrieve
>    pkt-line.c:85           packet:        fetch> want 394ca7a7b5e75a57e736040480f685c8b71844eb
>    pkt-line.c:85           packet:        fetch> done  # End fetch
>    pkt-line.c:85           packet:        fetch> 0000  # "flush-pkt"
>    [...]
>    ```

I think when lazy fetching like this, the filter is always blob:none.
It's not really used anyway because the objects that the client wants
are specified explicitly.

The filter is important when initially cloning or fetching from the
server to specify which objects are initially excluded, even if some
of these  objects will be lazy fetched soon. For example the checkout
part of a clone might need objects that were initially excluded, so it
might lazy fetch some.

>  * The server will apply the requested `<filter-spec>` as it creates the
>    "promisor packfile" of the requested objects.

This is important during an initial clone or fetch, not when lazy fetching.

> A packfile is a binary
>    file that is used to compress many "loose objects", and it does so by
>    containing the most recent versions of the stored objects and deltas
>    of the previous versions of those objects. A promisor packfile is a
>    filtered packfile, where the unwanted objects are not present. The
>    promisor packfile is sent to the client.


> I created a minimal example setup, mostly based on the test
> `t/t5710-promisor-remote-capability` added by `4602676` ("Add
> 'promisor-remote' capability to protocol v2", 2025-02-18), to experiment
> with multiple promisor remotes, in order to not simply rely on the
> documentation, but to actually get hands-on experience. The example setup
> creates a `server`, a 'lopm' ("Large Object Promisor medium") for blobs
> larger than 5kB, a `lopl` ("Large Object Promisor large") for blobs
> larger than 50kB, and a `client` that interfaces with all of these
> remotes. It is created in the following way:

[...]

> Now, with this setup, by slightly tweaking the configurations of each
> repository, it is possible to deeply test how multiple promisor remotes
> are handled in various situations, and actually see what is described in
> the documentation.

Yeah, it's quite complex to set up.

> ## Testing Promisor Remotes Advertisement
>
> An important thing to test is the promisor remotes advertisement feature.
> This feature is dependent on 2 main configuration options: the
> server-side option `promisor.advertise`, which enables the server to
> advertise the promisor remotes it is using to the client, and the
> client-side option `promisor.acceptFromServer`, which describes how the
> client should handle the promisor remotes advertised:
>
>  * If `promisor.advertise=false`, when the `client` wants to fetch an
>    object that the `server` does not have,

I don't think it depends on the client fetching an object the server
does not have. It depends on the client using a filter because the
promisor-remote capability only makes sense in the case of partial
clones (or fetches).

> the `server` will not
>    advertise the `promisor-remote` capability, and so it has no other
>    choice than to first fetch the object from `lopl` and/or `lopm`, and
>    then give it to the `client`. This can be checked by doing `git -C
>    server rev-list --objects --all --missing=print`, and seeing that the
>    previously missing large blobs are now present inside the `server`, or
>    by directly looking into the `GIT_TRACE_PACKET` output, and seeing
>    that there is no reference to the `promisor-remote` capability.
>
>  * If `promisor.advertise=true`, when the `client` wants to fetch an
>    object that the `server` does not have,

Same as above, it doesn't depend on the client fetching an object the
server does not have. It depends on the client using a filter because
the promisor-remote capability only makes sense in the case of partial
clones (or fetches).

> the `server` will advertise
>    its promisor remotes, as seen by the `GIT_TRACE_PACKET` output, which
>    will contain:
>
>    ```
>    [...]
>    packet: upload-pack> promisor-remote= \
>        name=lopl,url=file://$(pwd)/lopl; \  # Adv lopl
>        name=lopm,url=file://$(pwd)/lopm  # Adv lopm
>    [...]
>    ```

[...]

> Recently, with the patch series "Implement `promisor.storeFields` and
> `--filter=auto`" [5], the new client-side configuration variable
> `promisor.storeFields` was added. It contains a list of field names
> `partialCloneFilter` and/or `token`), and the values of these fields,
> when transmitted by the server, will be stored in the local configuration
> on the client.
>
> ## Testing Multiple Promisor Remotes Fetch Order

Yeah, I think this is the most relevant for the project.

> Finally, the last mechanism that is fundamental to understand is the
> fetch order when multiple promisor remotes are defined:
>
>  * When multiple remotes are configured, they are tried one after the
>    other in the order in which they appear in the configuration, until
>    all objects are fetched.

Right, but there is the exception of a remote configured with
`extensions.partialClone` that will be tried last. You mention it
later though.

> This can be easily seen from the output of
>    `GIT_TRACE`, which initially tries to fetch the objects from `lopl`,
>    and then from `lopm`:
>
>    ```
>    [...]
>    trace: built-in: git fetch lopl [...] --filter=blob:none [...]
>    [...]
>    trace: built-in: git fetch lopm [...] --filter=blob:none [...]
>    [...]
>    ```
>
>    While, if we make it so that we first define `lopm` in the `client`
>    configuration, then initially `lopm` will be used to fetch the
>    objects, and `lopl` will not be used at all (because `lopm` contains
>    all required objects:
>
>    ```
>    [...]
>    trace: built-in: git fetch lopm [...] --filter=blob:none [...]
>    [...]
>    ```

Yeah, when all the needed objects have been lazy fetched, there is no
point in further fetching from any remote.

>  * If the configuration option `extensions.partialClone` is present, the
>    promisor remote that it specifies will always be the last one tried
>    when fetching objects.
>
> ------------------------------
>
> # "Implement promisor remote fetch ordering"
>
> ## Project Goal
>
> This project aims to improve Git by implementing a fetch ordering
> mechanism for multiple promisor remotes, that can be:
>
>  * Configured locally by the client.
>  * Advertised by servers through the `promisor-remote` protocol.
>
> ## Approach
>
> The bulk of the project will be the creation of a system that allows to
> define the order with which the promisor remotes will be tried when
> fetching an object.
>
> The first goal will be the creation of a `remote.<name>.promisorPriority`

Yeah, or just `remote.<name>.priority`. The name is to be discussed.

> configuration option, which will hold a number between 1 and 'UCHAR_MAX',

UCHAR_MAX could be system dependent. It might be better to have
configurations work in the same way on all machines though. So perhaps
a fixed range like 1 to 100 would be better. Or are there other ranges
of values used for similar things in Git or other well known software
that could be reused?

> and which defines the priority of that promisor remote in the fetch
> order. This means that the order in which the promisor are tried will be
> the following:
>
>  * All promisor remotes that have a valid `remote.<name>.promisorPriority`,
>    starting from the one with higher priority (the lower `promisorPriority`
>    value). If 2 or more promisor remotes have the same priority, they will be
>    tried following the order in which they appear in the configuration file.
>
>  * All promisor remotes that don't have or have an invalid
>    `remote.<name>.promisorPriority` configuration option. If 2 or more
>    promisor remotes don't define any priority, or have an invalid priority,
>    they will be tried following the order in which they appear in the
>    configuration file.
>
>  * The promisor remote defined inside the `extensions.partialClone`, no
>    matter their priority (which will be ignored if present). This is
>    necessary for backward compatibility.

Yeah, I think something like what you describe makes sense.

> Having already taken a look at the code, I have a general idea of th

s/of th/of the/

> major steps to take to actually introduce the
> `remote.<name>.promisorPriority` configuration option:

[...]

> # Possible Issues
>
> From my understanding, the project as it is proposed will handle all
> possible cases, except for one. Let's imagine the following situation:
>
>  * `server1` and `server2` both use the promisor remotes `lop1` and `lop2`.
>  * `client` has both `server1` and `server2` as remotes.
>
> In this situation, the `client` has no way to specifically say that when
> fetching from `server1`, it wants to first try `lop1` and then `lop2`, while
> when fetching from `server2`, it wants to first try `lop2` and then `lop1`.

Right, but lazy fetching does not only happen as part of a clone or
fetch from a server. It happens when for some reason (like a git show
or a git blame for example) the user needs some objects it doesn't
have locally, and when that happens, this is not related to a single
server.

So global priorities are likely the most useful ones to have.

> One way to solve this very specific (and maybe unusual) issue is to
> introduce a way to associate a `promisorPriority` to a specific remote.

Yeah, but I don't think it would be used a lot. We can perhaps think
of some cases where it could be useful, but in practice it is likely
that if there is an optimal order for one server, it will be optimal
for all other servers too.

[...]

Thanks!

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [GSoC Proposal] Implement promisor remote fetch ordering
  2026-03-14 17:30 ` Christian Couder
@ 2026-03-18 16:29   ` Lorenzo Pegorari
  0 siblings, 0 replies; 7+ messages in thread
From: Lorenzo Pegorari @ 2026-03-18 16:29 UTC (permalink / raw)
  To: Christian Couder
  Cc: git, Karthik Nayak, Justin Tobler, Siddharth Asthana,
	Ayush Chandekar, Junio C Hamano

On Sat, Mar 14, 2026 at 06:30:57PM +0100, Christian Couder wrote:
> On Tue, Mar 10, 2026 at 7:25 PM Lorenzo Pegorari
> <lorenzo.pegorari2002@gmail.com> wrote:
> >
> > The following is my proposal for the GSoC'26 for the project "Implement
> > promisor remote fetch ordering".
> 
> Thank you for your interest in Git and this project.

Thank you for reading and giving me feedback on my proposal!

> > As soon as the the contributor application period begins, I will submit
> > the proposal in PDF format to the official GSoC website.
> 
> Good idea.

I will send v2 and upload it pretty soon.

> For the patches that are merged to master, it could help if you could
> give the object ID of the merge commit that merged your commits into
> master, or alternatively the object ID of all your commits.

Ack.

> >  * [GSoC PATCH v3] doc: improve gitprotocol-pack
> >    * Link: https://lore.kernel.org/git/cover.1772502209.git.lorenzo.pegorari2002@gmail.com
> >    * Description: Improved the `gitprotocol-pack` documentation.
> >    * Status: Will merge to `master`.
> 
> Yeah, this has been merged to master after your email.

Ack.

> > Partial clones avoid this issue during `clone` and `fetch` operations by
> > passing all the objects to download through a `--filter=<filter-spec>`
> > specified by the user, which will limit the number of blobs and trees
> > that actually get downloaded. The `<filter-spec>`, can, for example, be:
> >  * `blob:none`, which will filter out all blobs.
> >  * `tree:0`, which will filter out all trees.
> >  * `blob:limit=5k`, which will filter out all blobs whose size is greater
> >    than $5$kB.
> 
> Why are there '$' signs above?

Oops. I wrote the proposal in Markdown with LaTeX support. Text between
"$" signs is treated as LaTeX. I forgot to delete them when sending the
email. My fault.

> > The filtered out objects will be lazily downloaded when the user runs a
> > command that requires those missing data.
> >
> > This mechanism works with the following steps:
> >  * When the client wants to fetch some objects from the server using a
> >    filter, the client, after sending a list of capabilities it wants to
> >    be in effect, sends the `filter: <filter-spec>` capability, followed
> >    by a request for the objects that the client wants to retrieve. The
> >    following is an example of a request (extracted using
> >    `GIT_TRACE_PACKET=1`) made by a client to a server to fetch 1 object
> >    using the `<filter-spec>=blob:none`:
> >
> >    ```
> >    [...]
> >    pkt-line.c:85           packet:        fetch< 0000  # "flush-pkt"
> >    pkt-line.c:85           packet:        fetch> command=fetch  # Execute fetch
> >    pkt-line.c:85           packet:        fetch> agent=git/2.43.0
> >    pkt-line.c:85           packet:        fetch> object-format=sha1
> >    pkt-line.c:85           packet:        fetch> 0001  # "delim-pkt"
> >    pkt-line.c:85           packet:        fetch> thin-pack  # Capability
> >    pkt-line.c:85           packet:        fetch> no-progress  # Capability
> >    pkt-line.c:85           packet:        fetch> ofs-delta  # Capability
> >    pkt-line.c:85           packet:        fetch> filter blob:none  # Filter capability
> >    # OID of the object the client wants to retrieve
> >    pkt-line.c:85           packet:        fetch> want 394ca7a7b5e75a57e736040480f685c8b71844eb
> >    pkt-line.c:85           packet:        fetch> done  # End fetch
> >    pkt-line.c:85           packet:        fetch> 0000  # "flush-pkt"
> >    [...]
> >    ```
> 
> I think when lazy fetching like this, the filter is always blob:none.
> It's not really used anyway because the objects that the client wants
> are specified explicitly.

Oh, I didn't know that. Makes sense.

> The filter is important when initially cloning or fetching from the
> server to specify which objects are initially excluded, even if some
> of these  objects will be lazy fetched soon. For example the checkout
> part of a clone might need objects that were initially excluded, so it
> might lazy fetch some.

Ooh ok, with this comment I actually fully understand now. Looking back
at the `GIT_TRACE_PACKET` output, I actually understand almost all of
it. So the partial clone fetches (usually) the `HEAD`, excluding the
filtered out objects, while the lazy fetching directly asks for the
missing objects when they are needed, so the filter is not used. Got it!

> >  * The server will apply the requested `<filter-spec>` as it creates the
> >    "promisor packfile" of the requested objects.
> 
> This is important during an initial clone or fetch, not when lazy fetching.

Got it. I will revisit all the instances where I confused lazy fetching
with initial cloning/fetching. Thank you so much for your explanation,
Christian!

> > A packfile is a binary
> >    file that is used to compress many "loose objects", and it does so by
> >    containing the most recent versions of the stored objects and deltas
> >    of the previous versions of those objects. A promisor packfile is a
> >    filtered packfile, where the unwanted objects are not present. The
> >    promisor packfile is sent to the client.
> 
> 
> > I created a minimal example setup, mostly based on the test
> > `t/t5710-promisor-remote-capability` added by `4602676` ("Add
> > 'promisor-remote' capability to protocol v2", 2025-02-18), to experiment
> > with multiple promisor remotes, in order to not simply rely on the
> > documentation, but to actually get hands-on experience. The example setup
> > creates a `server`, a 'lopm' ("Large Object Promisor medium") for blobs
> > larger than 5kB, a `lopl` ("Large Object Promisor large") for blobs
> > larger than 50kB, and a `client` that interfaces with all of these
> > remotes. It is created in the following way:
> 
> [...]
> 
> > Now, with this setup, by slightly tweaking the configurations of each
> > repository, it is possible to deeply test how multiple promisor remotes
> > are handled in various situations, and actually see what is described in
> > the documentation.
> 
> Yeah, it's quite complex to set up.

Yep. The complexity of the tests is the reason behind my decision to
describe them in depth in the proposal.

> > ## Testing Promisor Remotes Advertisement
> >
> > An important thing to test is the promisor remotes advertisement feature.
> > This feature is dependent on 2 main configuration options: the
> > server-side option `promisor.advertise`, which enables the server to
> > advertise the promisor remotes it is using to the client, and the
> > client-side option `promisor.acceptFromServer`, which describes how the
> > client should handle the promisor remotes advertised:
> >
> >  * If `promisor.advertise=false`, when the `client` wants to fetch an
> >    object that the `server` does not have,
> 
> I don't think it depends on the client fetching an object the server
> does not have. It depends on the client using a filter because the
> promisor-remote capability only makes sense in the case of partial
> clones (or fetches).

Ok yeah, I should have explained this better. Of course this depends on
the client using a filter. Thanks for the feedback.
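As a quick reference, the two options could be set like this (a
sketch; `KnownName` is one of the accepted values of
`promisor.acceptFromServer`):

```ini
# Server side: advertise the promisor remotes this repository uses
# through the "promisor-remote" protocol v2 capability.
[promisor]
	advertise = true
```

```ini
# Client side: only accept advertised remotes whose name matches one
# already configured locally. Other accepted values are "All",
# "KnownUrl" and "None" (the default).
[promisor]
	acceptFromServer = KnownName
```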

> > the `server` will not
> >    advertise the `promisor-remote` capability, and so it has no other
> >    choice than to first fetch the object from `lopl` and/or `lopm`, and
> >    then give it to the `client`. This can be checked by doing `git -C
> >    server rev-list --objects --all --missing=print`, and seeing that the
> >    previously missing large blobs are now present inside the `server`, or
> >    by directly looking into the `GIT_TRACE_PACKET` output, and seeing
> >    that there is no reference to the `promisor-remote` capability.
> >
> >  * If `promisor.advertise=true`, when the `client` wants to fetch an
> >    object that the `server` does not have,
> 
> Same as above, it doesn't depend on the client fetching an object the
> server does not have. It depends on the client using a filter because
> the promisor-remote capability only makes sense in the case of partial
> clones (or fetches).

Ack. Same as above.

> > the `server` will advertise
> >    its promisor remotes, as seen by the `GIT_TRACE_PACKET` output, which
> >    will contain:
> >
> >    ```
> >    [...]
> >    packet: upload-pack> promisor-remote= \
> >        name=lopl,url=file://$(pwd)/lopl; \  # Adv lopl
> >        name=lopm,url=file://$(pwd)/lopm  # Adv lopm
> >    [...]
> >    ```
> 
> [...]
> 
> > Recently, with the patch series "Implement `promisor.storeFields` and
> > `--filter=auto`" [5], the new client-side configuration variable
> > `promisor.storeFields` was added. It contains a list of field names
> > (`partialCloneFilter` and/or `token`), and the values of these fields,
> > when transmitted by the server, will be stored in the local configuration
> > on the client.
> >
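If I understand the series correctly, this would be configured roughly
like so (a sketch; the exact value syntax is my assumption based on
the description above):

```ini
# Client side (sketch): store the listed fields, when transmitted by
# the server, into the local configuration of the accepted remote.
[promisor]
	acceptFromServer = All
	storeFields = partialCloneFilter,token
```
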
> > ## Testing Multiple Promisor Remotes Fetch Order
> 
> Yeah, I think this is the most relevant for the project.

Agreed.

> > Finally, the last mechanism that is fundamental to understand is the
> > fetch order when multiple promisor remotes are defined:
> >
> >  * When multiple remotes are configured, they are tried one after the
> >    other in the order in which they appear in the configuration, until
> >    all objects are fetched.
> 
> Right, but there is the exception of a remote configured with
> `extensions.partialClone` that will be tried last. You mention it
> later though.

Yep, will mention it also here.

> > This can be easily seen from the output of
> >    `GIT_TRACE`, which initially tries to fetch the objects from `lopl`,
> >    and then from `lopm`:
> >
> >    ```
> >    [...]
> >    trace: built-in: git fetch lopl [...] --filter=blob:none [...]
> >    [...]
> >    trace: built-in: git fetch lopm [...] --filter=blob:none [...]
> >    [...]
> >    ```
> >
> >    While, if we make it so that we first define `lopm` in the `client`
> >    configuration, then initially `lopm` will be used to fetch the
> >    objects, and `lopl` will not be used at all (because `lopm` contains
> >    all required objects):
> >
> >    ```
> >    [...]
> >    trace: built-in: git fetch lopm [...] --filter=blob:none [...]
> >    [...]
> >    ```
> 
> Yeah, when all the needed objects have been lazy fetched, there is no
> point in further fetching from any remote.

Yeah, and so `lopl` is not tried at all.

> >  * If the configuration option `extensions.partialClone` is present, the
> >    promisor remote that it specifies will always be the last one tried
> >    when fetching objects.
> >
> > ------------------------------
> >
> > # "Implement promisor remote fetch ordering"
> >
> > ## Project Goal
> >
> > This project aims to improve Git by implementing a fetch ordering
> > mechanism for multiple promisor remotes, that can be:
> >
> >  * Configured locally by the client.
> >  * Advertised by servers through the `promisor-remote` protocol.
> >
> > ## Approach
> >
> > The bulk of the project will be the creation of a system that allows
> > defining the order in which the promisor remotes will be tried when
> > fetching an object.
> >
> > The first goal will be the creation of a `remote.<name>.promisorPriority`
> 
> Yeah, or just `remote.<name>.priority`. The name is to be discussed.

Ack.

> > configuration option, which will hold a number between 1 and 'UCHAR_MAX',
> 
> UCHAR_MAX could be system dependent. It might be better to have
> configurations work in the same way on all machines though. So perhaps
> a fixed range like 1 to 100 would be better. Or are there other ranges
> of values used for similar things in Git or other well known software
> that could be reused?

Mmh true. A fixed range might be better, I agree.
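A tiny sketch of how such a fixed-range validation could look (the
names and the 1-100 range are hypothetical; the real implementation
would use Git's own config-parsing helpers in C):

```python
PRIORITY_MIN = 1
PRIORITY_MAX = 100  # fixed range instead of the system-dependent UCHAR_MAX

def parse_priority(value):
    """Return the priority as an int, or None if it is invalid.

    An invalid or missing priority does not abort anything; the remote
    simply falls back to configuration-file order after the remotes
    that do have a valid priority.
    """
    try:
        priority = int(value)
    except (TypeError, ValueError):
        return None
    if PRIORITY_MIN <= priority <= PRIORITY_MAX:
        return priority
    return None

print(parse_priority("10"))    # valid -> 10
print(parse_priority("1000"))  # out of range -> None
print(parse_priority("high"))  # not a number -> None
```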

> > and which defines the priority of that promisor remote in the fetch
> > order. This means that the order in which the promisor remotes are tried
> > the following:
> >
> >  * All promisor remotes that have a valid `remote.<name>.promisorPriority`,
> >    starting from the one with higher priority (the lower `promisorPriority`
> >    value). If 2 or more promisor remotes have the same priority, they will be
> >    tried following the order in which they appear in the configuration file.
> >
> >  * All promisor remotes that don't have or have an invalid
> >    `remote.<name>.promisorPriority` configuration option. If 2 or more
> >    promisor remotes don't define any priority, or have an invalid priority,
> >    they will be tried following the order in which they appear in the
> >    configuration file.
> >
> >  * The promisor remote defined by `extensions.partialClone`, no
> >    matter its priority (which will be ignored if present). This is
> >    necessary for backward compatibility.
> 
> Yeah, I think something like what you describe makes sense.

Nice! :-)
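The three rules quoted above boil down to a stable sort. Here is a
small Python sketch of the intended ordering (the function name and
data layout are hypothetical; the real implementation would live in
Git's C code):

```python
def promisor_fetch_order(remotes, partial_clone_name=None):
    """Order promisor remotes according to the proposed rules.

    `remotes` is a list of (name, priority) tuples in
    configuration-file order, with priority=None standing for a
    missing or invalid value. The remote named by
    extensions.partialClone is always tried last, ignoring any
    priority it may have.
    """
    last = [name for name, _ in remotes if name == partial_clone_name]
    rest = [(name, prio) for name, prio in remotes
            if name != partial_clone_name]
    # list.sort() is stable, so remotes with equal (or missing)
    # priority keep their configuration-file order. The (is-None,
    # value) key makes missing priorities sort after all valid ones,
    # and lower values sort first (higher priority).
    rest.sort(key=lambda r: (r[1] is None, r[1] if r[1] is not None else 0))
    return [name for name, _ in rest] + last

order = promisor_fetch_order(
    [("origin", None), ("lopm", 2), ("lopl", 1)],
    partial_clone_name="origin")
print(order)  # ['lopl', 'lopm', 'origin']
```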

> > Having already taken a look at the code, I have a general idea of th
> 
> s/of th/of the/

Ack.

> > major steps to take to actually introduce the
> > `remote.<name>.promisorPriority` configuration option:
> 
> [...]
> 
> > # Possible Issues
> >
> > From my understanding, the project as it is proposed will handle all
> > possible cases, except for one. Let's imagine the following situation:
> >
> >  * `server1` and `server2` both use the promisor remotes `lop1` and `lop2`.
> >  * `client` has both `server1` and `server2` as remotes.
> >
> > In this situation, the `client` has no way to specifically say that when
> > fetching from `server1`, it wants to first try `lop1` and then `lop2`, while
> > when fetching from `server2`, it wants to first try `lop2` and then `lop1`.
> 
> Right, but lazy fetching does not only happen as part of a clone or
> fetch from a server. It happens when for some reason (like a git show
> or a git blame for example) the user needs some objects it doesn't
> have locally, and when that happens, this is not related to a single
> server.
> 
> So global priorities are likely the most useful ones to have.
> 
> > One way to solve this very specific (and maybe unusual) issue is to
> > introduce a way to associate a `promisorPriority` to a specific remote.
> 
> Yeah, but I don't think it would be used a lot. We can perhaps think
> of some cases where it could be useful, but in practice it is likely
> that if there is an optimal order for one server, it will be optimal
> for all other servers too.

I agree. I should have pointed out clearly that, to me, this unusual
situation doesn't seem worth the effort.

> [...]
> 
> Thanks!

Thank you Christian!

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2026-03-18 16:29 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-03-10 18:25 [GSoC Proposal] Implement promisor remote fetch ordering Lorenzo Pegorari
2026-03-14 17:30 ` Christian Couder
2026-03-18 16:29   ` Lorenzo Pegorari
  -- strict thread matches above, loose matches on Subject: below --
2026-02-28 23:27 [GSoC] [Proposal]: " Abraham Samuel Adekunle
2026-03-03  9:27 ` Christian Couder
2026-03-03 12:08   ` Samuel Abraham
2026-03-10 15:11   ` Samuel Abraham
