* [GSOC Proposal] Complete and extend the remote-object-info command for git cat-file
@ 2026-03-05 20:48 SoutrikDas
2026-03-15 10:11 ` SoutrikDas
` (3 more replies)
0 siblings, 4 replies; 7+ messages in thread
From: SoutrikDas @ 2026-03-05 20:48 UTC (permalink / raw)
To: git
Cc: christian.couder, karthik.188, jltobler, ayu.chandekar,
siddharthasthana31, chandrapratap3519
Hi!
This is my project proposal for GSOC 2026.
I am interested in the project idea : "Complete and extend the
remote-object-info command for git cat-file"
# Complete and extend the remote-object-info command for git cat-file
## Contact
- Name: Soutrik Das
- E-mail: valusoutrik@gmail.com
- Github: https://github.com/SoutrikDas
- LinkedIn: https://www.linkedin.com/in/soutrik-das/
## About Me
My name is Soutrik Das, I am a developer and CS bachelor from Indian
Institute of Technology, Dhanbad. Currently I am pursuing a master's
degree in AI from Indian Institute of Technology, Bhubaneswar.
I don't really have much experience in contributing to something as
large as git, but I would love to learn anything and everything I can
gain from this experience. I have experience in C/C++ from my
B.Tech coursework and from participating in Codeforces contests.
## Pre GSOC
I started exploring Git's codebase around February 2026 and sent my first patch
as a docfix, followed by a microproject of modernizing tests
- [PATCH] doc: fix repo_config documentation reference [1]
status: merged to master
Merge Commit: 94336d77bcbf4360b67a9454d8bf2e84b3d88ae7
Description: Replace the path for the repo_config() documentation
from 'Documentation/technical/api-config.h' to 'config.h'.
- [GSOC PATCH] t7003: modernize path existence checks using test helpers [2]
status: merged to master
Merge Commit: 11294bb0fa540d214d071b32cf74b1ed37b3bbbd
Description: Replace direct uses of 'test -f' and 'test -d' with
git's helper functions 'test_path_is_file' ,'test_path_is_missing'
and 'test_path_is_dir'
I have read through most of Eric Ju's [4] work and some of Calvin Wan's [5]
work. I am still finding more things to understand from each thread, but
I feel I have grasped the basics.
My work in this project would be focused on implementing the changes
suggested at the end of Eric Ju's [Patch v11].
I wouldn't say I understand every bit of discussion from that thread,
but in general my understanding is :
Calvin Wan and Eric Ju has already implemented a client side command
called get_remote_info but its designed for being batched to reduce
multiple network trips to get a single object's data.
I applied Eric Ju's patch series on top of an old master commit (2d2a71ce85),
since I could not find a base commit for the series. The patches applied
cleanly, and I also experimented by adding a very rough but working
"%(objecttype)" implementation, i.e. it now prints output like this:
29658341f39210201ff7f72a4be83937cf2288c5 14 blob
## Project : Complete and extend the remote-object-info command for git cat-file
Currently in the case of a partial clone, the user cannot retrieve all
object data without fetching the object beforehand. To solve this problem
Calvin Wan and Eric Ju had designed a patch sreies that can solve that,
by utilising protocolv2 servers capabilities.
This was done in the form of "remote-object-info".
But only the %(objectsize) was implemented, and that patch was not merged.
This project has three goals:
1: To Rebase and finalize Calvin Wan and Eric Ju's Work by addressing
the feedback on Eric Ju's Patch v11
2: To add support for objecttype in remote-object-info
3: To discuss other information type like objectsize:disk and deltabase.
Project Duration : approx. 12 weeks
## Timeline
Mar 6-31 : Refine Proposal
If possible I would like to submit small patches... but first I will
have to rebase Eric Ju's Patches ... I am not sure if I can do this
before GSOC...
If not, I plan to contribute to git in other areas.
May 1-24 : Community Bonding
1-7 : Understand relevant underlying/ helper functions
8-24 : Ask about any design related problems/decisions
May 25 - Jun 14 : Start a Patch Series to rebase Calvin Wan and Eric Ju's work
and keep refining
Jun 15 - Aug 15 : Start and keep refining Patch Series to add support for
object type information
Aug 16 - Aug 24 : Discuss and Implement other object information if possible
Concurrently I shall make a report for all the work done.
## Availability
My current semester is ending in the first week of April, so I will be
able to contribute 7-8 hours per day, totalling around 35-40 hrs a week
on the project.
Total weeks = 12 , total hours = 35*12 = 420
This leaves plenty of room to accommodate any unforeseen circumstances
that may arise during the project.
## RFC
I have a few ideas but do not know if they are worth pursuing, so I will
leave them here in the first draft
- Addition of a remote-object-info outside of batchmode :
Yes, it is best used in batch mode, but if a user wants only one
object's size or type, should they be able to just run
`git cat-file -r origin <oid>`
and get the size and type? Or something similar; I am not sure if
the way I have depicted it conforms to git's design.
- Addition of commands for common user behaviour :
I don't know if it's going to be common user behaviour, but what about
`git cat-file -r --all-absent`
Or inside "git cat-file --batch-command="<format> remote-object-info
--all-absent --type=tree <remote>"
which would basically fill in remote-object-info with all the blobs
that are currently absent from the worktree ?
No need to fill them if it's for a common enough use case.
- Sort according to size :
Maybe a user would want to check what's the largest file they don't
have yet.
- Get total missing blob size :
Use case would be when someone wants to know how much exactly there
is to download, before starting the download.
Thank you for your time in reviewing my proposal as well as considering
my application. I am excited to learn everything I can from git.
Thanks and Regards,
Soutrik
[1] : pull.2187.git.git.1770293021383.gitgitgadget@gmail.com
[2] : 20260209172445.39536-1-valusoutrik@gmail.com
[3] : 20260225190306.39358-1-valusoutrik@gmail.com
[4] : 20240628190503.67389-1-eric.peijian@gmail.com
[5] : 20220728230210.2952731-1-calvinwan@google.com
* Re: [GSOC Proposal] Complete and extend the remote-object-info command for git cat-file
From: SoutrikDas @ 2026-03-15 10:11 UTC (permalink / raw)
To: valusoutrik
Cc: ayu.chandekar, chandrapratap3519, christian.couder, git, jltobler,
karthik.188, siddharthasthana31
Hi I was wondering If I could get some feedback on this.
Thanks.
* Re: [GSOC Proposal] Complete and extend the remote-object-info command for git cat-file
From: Christian Couder @ 2026-03-16 12:08 UTC (permalink / raw)
To: SoutrikDas
Cc: git, karthik.188, jltobler, ayu.chandekar, siddharthasthana31,
chandrapratap3519
Hi,
Sorry for the late feedback.
On Thu, Mar 5, 2026 at 9:48 PM SoutrikDas <valusoutrik@gmail.com> wrote:
> I have read through most of Eric Ju's [4] work and some of Calvin Wan's [5]
> work. I am still finding more things to understand from each thread, but
> I feel I have grasped the basics.
>
> My work in this project would be focused on implementing the changes
> suggested at the end of Eric Ju's [Patch v11].
>
> I wouldn't say I understand every bit of discussion from that thread,
> but in general my understanding is :
>
> Calvin Wan and Eric Ju has already implemented a client side command
s/has/have/
> called get_remote_info but its designed for being batched to reduce
s/its/it's/
> multiple network trips to get a single object's data.
The `git cat-file` command has a `--batch-command[=<format>]` option
to enter a command mode. In this command mode some special commands
and arguments can be passed via stdin to `git cat-file` to request
information.
[...]
> ## Project : Complete and extend the remote-object-info command for git cat-file
>
> Currently in the case of a partial clone, the user cannot retrieve all
> object data without fetching the object beforehand. To solve this problem
> Calvin Wan and Eric Ju had designed a patch sreies that can solve that,
s/sreies/series/
> by utilising protocolv2 servers capabilities.
>
> This was done in the form of "remote-object-info".
>
> But only the %(objectsize) was implemented, and that patch was not merged.
> This project has two goals
>
> 1: To Rebase and finalize Calvin Wan and Eric Ju's Work by addressing
> the feedback on Eric Ju's Patch v11
>
> 2: To add support for objecttype in remote-object-info
>
> 3: To discuss other information type like objectsize:disk and deltabase.
s/type/types/
But anyway I think "information type" is not a good wording for these
things, because we already talk about "type" for Git object types.
Please try to find a better wording.
> ## Timeline
>
> Mar 6-31 : Refine Proposal
>
> If possible I would like to submit small patches... but first I will
> have to rebase Eric Ju's Patches ... I am not sure if I can do this
> before GSOC...
You can try a rebase to see which issues would need to be resolved to
complete a rebase, and talk a bit about these issues in your proposal,
but otherwise applicants shouldn't start working on a project before
they have been accepted.
> If not, I plan to contribute to git in other areas.
>
> May 1-24 : Community Bonding
> 1-7 : Understand relevant underlying/ helper functions
> 8-24 : Ask about any design related problems/decisions
>
> May 25 - Jun 14 : Start a Patch Series to rebase Calvin Wan and Eric Ju's work
> and keep refining
>
> Jun 15 - Aug 15 : Start and keep refining Patch Series to add support for
> object type information
Would you implement both the client and the server side in the same
patch series or do it separately?
> Aug 16 - Aug 24 : Discuss and Implement other object information if possible
> Concurrently I shall make a report for all the work done.
>
> ## Availability
>
> My current semester is ending in the first week of April, so I will be
> able to contribute 7-8 hours per day, totalling around 35-40 hrs a week
> on the project.
Do you have another semester starting after the current one?
> Total weeks = 12 , total hours = 35*12 = 420
> It leaves with a lot more room to accomodate any unforeseen circumstances
> that may arise during the project.
>
> ## RFC
>
> I have a few ideas but do not know if they are worth pursuing, so I will
> leave them here in the first draft
>
> - Addition of a remote-object-info outside of batchmode :
> Yes it should be optimally used in batch mode .. but if user wants
> only one objects size or type then should they be able to just
> `git cat-file -r origin <oid>`
> and get the size and type ? or something similar , I am not sure if
> the way I have depicted it conforms to git's design.
Not sure if that would be very useful first. Also that might be better
in a different command than `cat-file`.
> - Addition of commands for common user behaviour :
> I dont know if its going to be a common user behaviour but what about
> `git cat-file -r --all-absent`
> Or inside "git cat-file --batch-command="<format> remote-object-info
> --all-absent --type=tree <remote>"
> which would basically fill in remote-object-info with all the blobs
> that are currently absent from the worktree ?
There are other ways to do this, like using:
git rev-list --objects --all --missing=print
Thanks for your proposal.
Best.
* Re: [GSOC Proposal] Complete and extend the remote-object-info command for git cat-file
From: Karthik Nayak @ 2026-03-16 20:46 UTC (permalink / raw)
To: SoutrikDas, git
Cc: christian.couder, jltobler, ayu.chandekar, siddharthasthana31,
chandrapratap3519
SoutrikDas <valusoutrik@gmail.com> writes:
Hello,
[snip]
> ## Pre GSOC
>
> I started exploring Git's codebase around February 2026 and sent my first patch
> as a docfix, followed by a microproject of modernizing tests
>
> - [PATCH] doc: fix repo_config documentation reference [1]
> status: merged to master
> Merge Commit: 94336d77bcbf4360b67a9454d8bf2e84b3d88ae7
> Description: Replace the path for the repo_config() documentation
> from 'Documentation/technical/api-config.h' to 'config.h'.
>
> - [GSOC PATCH] t7003: modernize path existence checks using test helpers [2]
> status: merged to master
> Merge Commit: 11294bb0fa540d214d071b32cf74b1ed37b3bbbd
> Description: Replace direct uses of 'test -f' and 'test -d' with
> git's helper functions 'test_path_is_file' ,'test_path_is_missing'
> and 'test_path_is_dir'
>
>
> I have read through most of Eric Ju's [4] work and some of Calvin Wan's [5]
> work. I am still finding more things to understand from each thread, but
> I feel I have grasped the basics.
>
> My work in this project would be focused on implementing the changes
> suggested at the end of Eric Ju's [Patch v11].
>
> I wouldn't say I understand every bit of discussion from that thread,
> but in general my understanding is :
>
I do agree that there is a lot to unpack there.
> Calvin Wan and Eric Ju has already implemented a client side command
> called get_remote_info but its designed for being batched to reduce
> multiple network trips to get a single object's data.
>
As far as I can recall, the command allowed users to enter multiple OIDs
in a single line to reduce the to-fro with the server. But you could
still fetch single OID info.
> I have added Eric Ju's patch series to an old master commit (2d2a71ce85)
> since I could not find a base commit for Eric's patch series. The patch
> was properly applied and I also played around and added a very rough
> but workin "%(objecttype)" code , ie now it prints like this :
>
> 29658341f39210201ff7f72a4be83937cf2288c5 14 blob
>
Nice, have you tried with a more recent 'master'? I assume there are
merge conflicts?
>
> ## Project : Complete and extend the remote-object-info command for git cat-file
>
> Currently in the case of a partial clone, the user cannot retrieve all
> object data without fetching the object beforehand. To solve this problem
> Calvin Wan and Eric Ju had designed a patch sreies that can solve that,
> by utilising protocolv2 servers capabilities.
>
> This was done in the form of "remote-object-info".
>
> But only the %(objectsize) was implemented, and that patch was not merged.
> This project has two goals
>
> 1: To Rebase and finalize Calvin Wan and Eric Ju's Work by addressing
> the feedback on Eric Ju's Patch v11
>
Any idea how much work is left post v11?
> 2: To add support for objecttype in remote-object-info
>
> 3: To discuss other information type like objectsize:disk and deltabase.
>
> Project Duration : 12 week approx
>
> ## Timeline
>
> Mar 6-31 : Refine Proposal
>
> If possible I would like to submit small patches... but first I will
> have to rebase Eric Ju's Patches ... I am not sure if I can do this
> before GSOC...
>
As per the guidelines, it says
Any work done on the Project prior to acceptance of the Project
Proposal will not be considered for Evaluations.
> If not, I plan to contribute to git in other areas.
>
> May 1-24 : Community Bonding
> 1-7 : Understand relevant underlying/ helper functions
> 8-24 : Ask about any design related problems/decisions
>
> May 25 - Jun 14 : Start a Patch Series to rebase Calvin Wan and Eric Ju's work
> and keep refining
>
> Jun 15 - Aug 15 : Start and keep refining Patch Series to add support for
> object type information
>
> Aug 16 - Aug 24 : Discuss and Implement other object information if possible
> Concurrently I shall make a report for all the work done.
How will you manage reviews, considering generally they take a long
time?
>
> ## Availability
>
> My current semester is ending in the first week of April, so I will be
> able to contribute 7-8 hours per day, totalling around 35-40 hrs a week
> on the project.
>
> Total weeks = 12 , total hours = 35*12 = 420
> It leaves with a lot more room to accomodate any unforeseen circumstances
> that may arise during the project.
>
> ## RFC
>
> I have a few ideas but do not know if they are worth pursuing, so I will
> leave them here in the first draft
>
> - Addition of a remote-object-info outside of batchmode :
> Yes it should be optimally used in batch mode .. but if user wants
> only one objects size or type then should they be able to just
> `git cat-file -r origin <oid>`
> and get the size and type ? or something similar , I am not sure if
> the way I have depicted it conforms to git's design.
>
I do agree that something like that would be useful indeed, I'm not sure
of what that design looks like though.
> - Addition of commands for common user behaviour :
> I dont know if its going to be a common user behaviour but what about
> `git cat-file -r --all-absent`
> Or inside "git cat-file --batch-command="<format> remote-object-info
> --all-absent --type=tree <remote>"
> which would basically fill in remote-object-info with all the blobs
> that are currently absent from the worktree ?
> No need to fill them if its for a common enough use case.
I do see benefits of this too. But I do wonder if 'git rev-list' is a
better command for something like this.
> - Sort according to size :
> Maybe a user would want to check whats the largest file they dont
> have yet.
>
Same here.
> - Get total missing blob size :
> Use case would be when someone wants to know how much exactly there
> is to download, before starting the download.
>
This could probably go into 'git backfill' ? Interesting ideas
nevertheless!
> Thank you for your time in revewing my proposal as well as considering
> my application. I am excited to learn everything I can from git.
>
> Thanks and Regards,
> Soutrik
>
What I missed from the proposal:
1. Where did the work from Eric and Calvin stop, and what review comments
need to be addressed.
2. How do you plan to handle reviews and iterations taking time.
Regards,
Karthik
>
> [1] : pull.2187.git.git.1770293021383.gitgitgadget@gmail.com
> [2] : 20260209172445.39536-1-valusoutrik@gmail.com
> [3] : 20260225190306.39358-1-valusoutrik@gmail.com
> [4] : 20240628190503.67389-1-eric.peijian@gmail.com
> [5] : 20220728230210.2952731-1-calvinwan@google.com
* Re: [GSOC Proposal] Complete and extend the remote-object-info command for git cat-file
From: SoutrikDas @ 2026-03-17 13:06 UTC (permalink / raw)
To: christian.couder
Cc: ayu.chandekar, chandrapratap3519, git, jltobler, karthik.188,
siddharthasthana31, valusoutrik
Hi there,
> s/has/have/
> s/its/it's/
> s/sreies/series/
> s/type/types/
I will correct all the spelling mistakes.
> > multiple network trips to get a single object's data.
>
> The `git cat-file` command has a `--batch-command[=<format>]` option
> to enter a command mode. In this command mode some special commands
> and arguments can be passed via stdin to `git cat-file` to request
> information.
Will correct that.
> But anyway I think "information type" is not a good wording for these
> things, because we already talk about "type" for Git object types.
> Please try to find a better wording.
How about object property or object attribute or object field?
I feel like object fields may be a bit more technically correct.
> You can try a rebase to see which issues would need to be resolved to
> complete a rebase, and talk a bit about these issues in your proposal,
> but otherwise applicants shouldn't start working on a project before
> they have been accepted.
I tried a rebase onto the current master, and there were indeed conflicts.
I will include this part in my v2.
> Would you implement both the client and the server side in the same
> patch series or do it separately?
I am not sure, actually, since Eric Ju did everything in one patch series.
Personally I feel that doing one series for the server side first and another
for the client side would be a bit more focused. But I am not sure if it would
cost more time for everyone involved, for example in giving feedback.
> > My current semester is ending in the first week of April, so I will be
> > able to contribute 7-8 hours per day, totalling around 35-40 hrs a week
> > on the project.
>
> Do you have another semester starting after the current one?
Actually I made a mistake, it's ending in the first week of May. But no,
after this semester we have a summer break. I will update this part.
> Not sure if that would be very useful first. Also that might be better
> in a different command than `cat-file`.
Alright. I will ask that as a question before my final GSoC proposal
submission, so that if it's approved, I can add it to my GSoC tasks.
> There are other ways to do this, like using:
>
> git rev-list --objects --all --missing=print
I did not know that, but that's great! I will remove this from the proposal.
Thanks for the feedback.
* Re: [GSOC Proposal] Complete and extend the remote-object-info command for git cat-file
From: SoutrikDas @ 2026-03-17 15:13 UTC (permalink / raw)
To: karthik.188
Cc: ayu.chandekar, chandrapratap3519, christian.couder, git, jltobler,
siddharthasthana31, valusoutrik
Hi there,
> As far as I can recall, the command allowed users to enter multiple OIDs
> in a single line to reduce the to-fro with the server. But you could
> still fetch single OID info.
Yeah, that was what I meant, but from Christian Couder's feedback, I realized
that cat-file is not a good home for such a subcommand.
> Nice, have you tried with a more recent 'master'? I assume there are
> merge conflicts?
Yup, I will add these issues in my proposal v2.
> Any idea how much work is left post v11?
From the v11 thread
- a lot of style fixes, like comment alignment and blank lines
- the max remote obj info logic is a bit wrong, as Junio pointed out [1]
- one test case for the max obj limit
- use of size_t for looping
- the placeholder check, i.e. even with only objectsize supported, the
checking of the format string is a bit incorrect [2]
- implementing an allow list for placeholders
- print an empty string for unsupported placeholders, i.e. those not on the
allow list
- remove usage of split_cmdline since neither url nor oid will have spaces
in them, so a strchr would suffice, I think?
The above is just for part 1, i.e. getting Eric Ju's patch series accepted.
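To make the last point concrete, here is a minimal sketch in C of splitting a
"<remote> <oid> <oid> ..." line with strchr alone. The function name
`sketch_next_token` is hypothetical and this is not code from the v11 series;
it assumes tokens are separated by single spaces and contain no spaces
themselves.

```c
#include <string.h>

/*
 * Hypothetical sketch: return the next space-separated token from
 * *cursor, NUL-terminating it in place, and advance *cursor past it.
 * Returns NULL when nothing is left.
 */
char *sketch_next_token(char **cursor)
{
	char *tok = *cursor;
	char *sp;

	if (!tok || !*tok)
		return NULL;

	sp = strchr(tok, ' ');
	if (sp) {
		*sp = '\0';        /* cut the token off at the separator */
		*cursor = sp + 1;  /* the next call resumes after it */
	} else {
		*cursor = NULL;    /* this was the last token */
	}
	return tok;
}
```

The sketch does not collapse repeated spaces (they yield empty tokens), so a
real implementation would still need to decide how to treat such input.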
> As per the guidelines, it says
>
> Any work done on the Project prior to acceptance of the Project
> Proposal will not be considered for Evaluations.
I meant during the May 1-24 period, which is after the acceptance
of the project (April 30) but before coding officially begins (May 25).
This is the timeline on the GSoC page [3]:
> April 30 - 18:00 UTC
> Accepted GSoC contributor projects announced
> May 1 - 24
> Community Bonding Period | GSoC contributors get to know mentors,
> read documentation, get up to speed to begin working on their projects
> May 25
> Coding officially begins!
I was planning to also ask design questions in this period.
> How will you manage reviews, considering generally they take a long
> time?
I will adjust the timeline to give more time to rebasing the previously done
work. I cannot start on part 2, i.e. adding support for more object fields,
without first integrating the old work. So perhaps about 50% of the time would
go to rebasing, 30% to adding new fields, and 20% would be kept as a buffer
for emergencies or mishaps?
> I do agree that something like that would be useful indeed, I'm not sure
> of what that design looks like though.
> I do see benefits of this too. But I do wonder if 'git rev-list' is a
> better command for something like this.
I will clarify these questions at the beginning of the GSoC period.
> What I missed from the proposal:
> 1. Where did the work from Eric and Calvin stop at, what review comments
> need to be addressed.
> 2. How do you plan to handle reviews and iterations taking time.
Will update the timeline as well as mention the current outstanding tasks,
as far as I have understood them.
Thank you for your feedback.
[1] : xmqqo6yr3wc4.fsf@gitster.g/
[2] : 20250224234720.GC729825@coredump.intra.peff.net/
[3] : https://developers.google.com/open-source/gsoc/timeline
* [GSoC Proposal v2] Complete and extend the remote-object-info command for git cat-file
From: SoutrikDas @ 2026-03-20 13:12 UTC (permalink / raw)
To: valusoutrik
Cc: ayu.chandekar, chandrapratap3519, christian.couder, git, jltobler,
karthik.188, siddharthasthana31
Hi everyone,
Thank you for the feedback, Christian and Karthik.
I have not made a doc version of this yet. I will link it from v3.
I understand that in this proposal I have not explained my own plans very
thoroughly. I am working on this for v3.
Changes from v1 :
- Correct spelling mistakes
- Address how much work is remaining after Eric Ju's Patch v11
- Increase Time in Timeline for Reviews
- Add a section for rebasing problems
---
This is the second version of my project proposal for GSoC 2026.
I am interested in the project idea : "Complete and extend the
remote-object-info command for git cat-file"
# Complete and extend the remote-object-info command for git cat-file
## Contact
- Name: Soutrik Das
- E-mail: valusoutrik@gmail.com
- Github: https://github.com/SoutrikDas
- LinkedIn: https://www.linkedin.com/in/soutrik-das/
## About Me
My name is Soutrik Das, and I am a developer. I did my B.Tech in CS at
IIT Dhanbad. Currently I am pursuing an M.Tech degree in AI at IIT
Bhubaneswar.
I don't really have much experience in contributing to something as
large as git, but I would like to learn as much as possible from this
experience. I have experience in C/C++ from my B.Tech coursework and
from participating in Codeforces contests.
## Pre GSoC
I started exploring Git's codebase around February 2026 and sent my first patch
as a docfix, followed by a microproject of modernizing tests
- [PATCH] doc: fix repo_config documentation reference [1]
status: merged to master
Merge Commit: 94336d77bcbf4360b67a9454d8bf2e84b3d88ae7
Merge Date : 13 Feb 2026
Description: Replace the path for the repo_config() documentation
from 'Documentation/technical/api-config.h' to 'config.h'.
- [GSoC PATCH] t7003: modernize path existence checks using test helpers [2]
status: merged to master
Merge Commit: 11294bb0fa540d214d071b32cf74b1ed37b3bbbd
Merge Date : 17 Feb 2026
Description: Replace direct uses of 'test -f' and 'test -d' with
git's helper functions 'test_path_is_file', 'test_path_is_missing'
and 'test_path_is_dir'
## Eric Ju and Calvin Wan's work
In this section I want to talk about the work already done and the
feedback the community gave on the last patch series sent, i.e. v11.
This is my understanding of the patch series:
Patch 1/8 : git-compat-util: add strtoul_ul()
Helper function addition
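As an illustration of what such a helper could look like, here is a minimal
sketch in C based on the usual "strtoul plus validation" idiom. The name
`sketch_strtoul_ul` and its exact behaviour are assumptions made for this
proposal; it is not the code from the v11 patch.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical sketch (not the v11 code): parse all of `s` as an
 * unsigned long in the given base. Returns 0 and stores the value in
 * *result on success, -1 on any parse error.
 */
int sketch_strtoul_ul(const char *s, int base, unsigned long *result)
{
	unsigned long ul;
	char *end;

	/* strtoul() silently accepts a leading '-', so reject it up front. */
	if (strchr(s, '-'))
		return -1;

	errno = 0;
	ul = strtoul(s, &end, base);
	/* Fail on overflow, trailing garbage, or an empty number. */
	if (errno || *end || end == s)
		return -1;

	*result = ul;
	return 0;
}
```

A caller would use it as `if (sketch_strtoul_ul(str, 10, &size) < 0)` and
report a parse error instead of silently accepting a bogus size.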
Patch 2/8 : cat-file: add declaration of variable i inside for loop
Small refactoring
Patch 3/8 : t1006: split test utility functions into new "lib-cat-file.sh"
Moves the `echo_without_newline`, `echo_without_newline_nul` and
`strlen` functions from `t1006-cat-file.sh` to `lib-cat-file.sh` so
that they can be reused in the future.
When I rebased the patch series onto a recent master (March 5,
795c338de725e13bd361214c6b768019fc45a2c1), only one other
file (t1007-hash-object.sh) had a duplicate definition.
Patch 4/8 : fetch-pack: refactor packet writing
Generalized write_command_and_capabilities so that it now takes in
a command instead of hardcoding "fetch". It was also moved from
`fetch-pack.c` to `connect.c`
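To illustrate the idea of passing the command in as a parameter, here is a
minimal sketch in C that formats the first pkt-line of a protocol v2 request
for an arbitrary command. `sketch_format_command_pkt` is a hypothetical helper,
not the actual `write_command_and_capabilities`; it relies only on the pkt-line
framing (four hex digits giving the total length, including the four length
bytes themselves).

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical sketch: write the pkt-line "command=<command>\n" into
 * buf. Returns the number of bytes written (excluding the trailing
 * NUL), or -1 if the line is too long or buf is too small.
 */
int sketch_format_command_pkt(char *buf, size_t buflen, const char *command)
{
	/* Payload is "command=" + name + "\n"; the length prefix counts itself. */
	size_t payload = strlen("command=") + strlen(command) + 1;
	size_t total = payload + 4;

	/* 65520 is the maximum pkt-line length; also leave room for the NUL. */
	if (total > 65520 || total + 1 > buflen)
		return -1;

	return snprintf(buf, buflen, "%04x" "command=%s\n",
			(unsigned int)total, command);
}
```

With this shape, the same routine serves both "fetch" and "object-info";
only the command string differs between the two callers.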
Patch 5/8 : fetch-pack: move fetch initialization
Before this patch, the state machine of do_fetch_pack_v2() assumed
that the starting state is FETCH_CHECK_LOCAL, so it initialized
certain variables like `use_sideband=2` inside the FETCH_CHECK_LOCAL
case. This patch moves that initialization out of the case, because
for remote-object-info we do not want to go through the extra steps:
we enter the state machine directly at FETCH_SEND_REQUEST. We don't
need to figure out what to fetch, because the user/machine gives it
explicitly.
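The effect of moving the initialization can be shown with a toy state machine
in C. Everything here (the `sketch_` names, the fields of `struct sketch_ctx`)
is hypothetical and only mirrors the idea described above; it is not the code
of do_fetch_pack_v2().

```c
/* Hypothetical state names mirroring the ones mentioned above. */
enum sketch_state {
	SKETCH_CHECK_LOCAL,
	SKETCH_SEND_REQUEST,
	SKETCH_DONE
};

struct sketch_ctx {
	int use_sideband;   /* must be set before any request is sent */
	int checked_local;  /* records whether the local check ran */
	int requests_sent;
};

/*
 * Run the toy state machine from `start`. Shared initialization happens
 * before the loop, so it no longer depends on which state we enter at.
 */
void sketch_run(struct sketch_ctx *ctx, enum sketch_state start)
{
	enum sketch_state state = start;

	ctx->use_sideband = 2; /* hoisted out of the SKETCH_CHECK_LOCAL case */
	ctx->checked_local = 0;
	ctx->requests_sent = 0;

	while (state != SKETCH_DONE) {
		switch (state) {
		case SKETCH_CHECK_LOCAL:
			ctx->checked_local = 1;
			state = SKETCH_SEND_REQUEST;
			break;
		case SKETCH_SEND_REQUEST:
			ctx->requests_sent++;
			state = SKETCH_DONE;
			break;
		case SKETCH_DONE:
			break;
		}
	}
}
```

Entering at SKETCH_SEND_REQUEST skips the local check but still sees
`use_sideband` set, which is the property the patch needs.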
Patch 6/8 : serve: advertise object-info feature
Makes the server advertise that it supports the "size" feature of
the object-info command.
Patch 7/8 : transport: add client support for object-info
Adds `fetch_object_info`, which checks that the protocol is v2
and then sends the object-info request. After receiving the response,
it parses the output.
Also sets `state=FETCH_SEND_REQUEST` when object-info is used.
Not part of this patch, but on the server side this request is
caught by serve.c and then handled by cap_object_info in protocol-caps.c.
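As a rough illustration of the parsing step, here is a minimal sketch in C.
It assumes each reply line has the form "<oid> <size>" with the trailing
newline already stripped; the function name `sketch_parse_obj_info` is
hypothetical and this is not the parsing code from the patch.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical sketch: split one "<oid> <size>" reply line. Copies the
 * object id into oid_out and the size into *size. Returns 0 on success,
 * -1 if the line is malformed or oid_out is too small.
 */
int sketch_parse_obj_info(const char *line, char *oid_out, size_t oid_len,
			  unsigned long *size)
{
	const char *sp = strchr(line, ' ');
	unsigned long val;
	char *end;
	size_t n;

	if (!sp || sp == line)
		return -1;
	n = (size_t)(sp - line);
	if (n + 1 > oid_len)
		return -1;

	/* The size must be a plain decimal number with nothing after it. */
	if (sp[1] < '0' || sp[1] > '9')
		return -1;
	errno = 0;
	val = strtoul(sp + 1, &end, 10);
	if (errno || *end)
		return -1;

	memcpy(oid_out, line, n);
	oid_out[n] = '\0';
	*size = val;
	return 0;
}
```

Adding a new object info field later would mean accepting more
space-separated values per line, which is one reason to keep this step
strict about what it accepts.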
Patch 8/8 : cat-file: add remote-object-info to batch-command
Adds the subcommands and relevant tests.
To summarize, this patch series has added the subcommand, and all of
the needed functions to make one object info field work. But a few problems
were left to be addressed. Once those are addressed, adding new object
info fields will be much easier.
## Problems faced during rebasing
I applied the patches onto an old master (2d2a71ce85) and then rebased
onto a recent master (795c338de7).
Patch 1/8: Auto / No Merge Conflict
Patch 2/8: Auto / No Merge Conflict
Patch 3/8: add/add conflict
Patch 4/8: Confirming movement of function `write_command_and_capabilities`
Patch 5/8: Auto / No Merge Conflict
Patch 6/8: Auto / No Merge Conflict
Patch 7/8: Makefile merge conflict, but when opened in VS Code it shows
0 conflicts.
Patch 8/8: add/add conflict for object-store.c and modify/delete
conflict for object-store-ll.h
According to 68cd492a3e
> object-store: merge "object-store-ll.h" and "object-store.h"
And according to 8f49151763
> object-store: rename files to "odb.{c,h}"
Therefore I have added the function signature that was supposed to go to
object-store-ll.h to odb.h
## Work remaining to get v11 patch accepted
Almost all of it is focused on patch 8
- Fix multi-line comment formatting - closing */ on own line
- Add blank lines between macro definitions
- Split overly-long MAX_REMOTE_OBJ_INFO_LINE definition across lines
- Change loop variable from size_t i to int i (since argc is int)
- Rearrange if/else to put smaller body first: if (!gtransport->smart_options)
before else
- Fix the logic of maximum line size for the remote-object-info.
- Adding an allow list of object info fields
- Handling what happens if an unsupported object info field is given in
the format string.
In this case we send the request as if that object info field were
not there, and when printing the result we simply print an empty
string on the client side. No extra payload on the network.
- Add tests.
- Update Documentation
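
The allow list and the unsupported-field handling from the list above could look roughly like the following. This is only a standalone sketch, not code from the v11 series: the names `allowed_fields`, `field_is_allowed`, `filter_fields` and `printable_value` are hypothetical, and it assumes `size` is the only field on the allow list.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical allow list; assumes "size" is the only supported field. */
static const char *const allowed_fields[] = { "size" };

#define N_ALLOWED (sizeof(allowed_fields) / sizeof(allowed_fields[0]))

/* Return 1 if `field` may be sent in the request, 0 otherwise. */
static int field_is_allowed(const char *field)
{
	size_t i;

	for (i = 0; i < N_ALLOWED; i++)
		if (!strcmp(allowed_fields[i], field))
			return 1;
	return 0;
}

/*
 * Copy only the allowed fields from `requested` into `to_send`, so that
 * unsupported fields add no payload to the request. Returns the number
 * of fields kept. `to_send` must have room for `nr` entries.
 */
static size_t filter_fields(const char *const *requested, size_t nr,
			    const char **to_send)
{
	size_t i, kept = 0;

	assert(requested && to_send);
	for (i = 0; i < nr; i++)
		if (field_is_allowed(requested[i]))
			to_send[kept++] = requested[i];
	return kept;
}

/*
 * Value to print for `field`: the server's answer if the field is on
 * the allow list, or an empty string for an unsupported field.
 */
static const char *printable_value(const char *field, const char *answer)
{
	return (field_is_allowed(field) && answer) ? answer : "";
}
```

With this shape, a format string that asks for both a supported and an unsupported field would result in a request carrying only the supported one, and the unsupported placeholder would expand to an empty string when the result is printed.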
## Project : Complete and extend the remote-object-info command for git cat-file
Currently, in the case of a partial clone, the user cannot retrieve
information about a missing object (such as its size) without fetching the
object beforehand. To solve this problem, Calvin Wan and Eric Ju designed a
patch series that utilises protocol v2 server capabilities.
This was done in the form of "remote-object-info".
But only %(objectsize) was implemented, and that patch series was not merged.
This project has two goals:
1: To rebase and finalize Calvin Wan and Eric Ju's work by addressing
the feedback on Eric Ju's patch v11. Work for this part is discussed
in the section above.
2: To discuss with the community and add support for other relevant
object info fields in `remote-object-info`, such as `objecttype`,
`objectsize:disk` and `deltabase`.
Project Duration : approx. 13 weeks
## Timeline
### Phase 1 :
May 1-24 : Community Bonding + start design discussions on:
- Logic of the allow list implementation
- Logic of the maximum line size for the remote-object-info command
- Which object info fields should be supported
Week 1 (May 25 - 31) :
Open Patch Series 1 for Eric Ju's patch, after
solving all remaining problems, using the ideas/solutions agreed on in the
discussions above. Both client and server side work would be in the same
patch series. This is just rebasing previous work, so I have to address
the changes suggested after v11.
Week 2 (June 1 - 7) : Continue discussion, review feedback and refine.
Week 3 (June 8 - 14) : Review feedback and refine
Week 4 (June 15 - 21) : Review feedback and refine + Update Documentation
and Tests
Week 5 (June 22 - 28) : By now all tasks regarding merging Eric Ju's
patch should be finished. But since reviewing may take more time,
I am adding buffer weeks.
Week 6 (June 29 - July 5) : Polish everything + Midterm report
Week 7 (July 6 - 12) : Midterm evaluation ( July 7-11)
Week 8 (July 13 - 19) : Start Patch Series 2 for adding other object info
fields as per the discussion started in Week 1.
Week 9 (July 20 - 26) : Review feedback and refine.
Week 10 (July 27 - August 2) : Review feedback and refine.
Week 11 (August 3 - 9) : Finalize all tests and Doc changes.
Week 12 (August 10 - 16) : Prepare Final report.
Week 13 (August 17 - 23) : Final Evaluation ( Aug 18-24 )
## Availability
My current semester ends in the first week of May, so I will be
able to contribute 7-8 hours per day, totalling around 35-40 hrs a week
on the project.
Total weeks = 13, total hours = 35*13 = 455
This leaves plenty of room to accommodate any unforeseen circumstances
that may arise during the project.
## RFC
Hi Christian and Karthik !
I still feel that getting remote info for a single object might be useful,
and I think this might be where I can add this functionality:
When someone runs `GIT_NO_LAZY_FETCH=0 git cat-file -s <oid>`
and the oid is of a blob that is not available locally, git simply fetches
the blob and reruns git cat-file -s.
But if someone runs `GIT_NO_LAZY_FETCH=1 git cat-file -s <oid>`
and the blob is not available locally, then the lazy fetch is refused by
the following check and the command fails:
> 	if (git_env_bool(NO_LAZY_FETCH_ENVIRONMENT, 0)) {
> 		static int warning_shown;
> 		if (!warning_shown) {
> 			warning_shown = 1;
> 			warning(_("lazy fetching disabled; some objects may not be available"));
> 		}
> 		return -1;
> 	}
Would it be useful behaviour if, instead of failing with an error, git sent
a remote-object-info request for that single object?
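
To make the question concrete, the decision being asked about can be written down as a small standalone sketch. Nothing here is existing Git code: the enum, the function `choose_action` and its three boolean parameters are hypothetical names, and the sketch assumes the fallback would only apply when the caller needs metadata (as with `-s`) and the remote supports object-info.

```c
#include <assert.h>

/* Hypothetical outcomes for a lookup of an object that is missing locally. */
enum missing_object_action {
	ACTION_LAZY_FETCH,  /* fetch the object, then retry locally */
	ACTION_REMOTE_INFO, /* proposed: ask the remote for metadata only */
	ACTION_FAIL         /* current behaviour when lazy fetch is disabled */
};

/*
 * Decide what to do for a missing object.
 * - no_lazy_fetch: non-zero when GIT_NO_LAZY_FETCH=1
 * - metadata_only: non-zero when only object info (e.g. size) is needed
 * - remote_has_object_info: non-zero when the remote supports object-info
 */
static enum missing_object_action
choose_action(int no_lazy_fetch, int metadata_only, int remote_has_object_info)
{
	if (!no_lazy_fetch)
		return ACTION_LAZY_FETCH;
	if (metadata_only && remote_has_object_info)
		return ACTION_REMOTE_INFO;
	return ACTION_FAIL;
}

/* Usage example: the three outcomes described above. */
void demo_choose_action(void)
{
	assert(choose_action(0, 1, 1) == ACTION_LAZY_FETCH);
	assert(choose_action(1, 1, 1) == ACTION_REMOTE_INFO);
	assert(choose_action(1, 0, 1) == ACTION_FAIL);
}
```

Only the middle branch would be new behaviour; the other two correspond to what happens today with `GIT_NO_LAZY_FETCH=0` and `GIT_NO_LAZY_FETCH=1`.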
Thank you for your time in reviewing my proposal as well as considering
my application. I am excited to learn everything I can from git.
Thanks and Regards,
Soutrik
[1] : pull.2187.git.git.1770293021383.gitgitgadget@gmail.com
[2] : 20260209172445.39536-1-valusoutrik@gmail.com
[3] : 20260225190306.39358-1-valusoutrik@gmail.com
[4] : 20240628190503.67389-1-eric.peijian@gmail.com
[5] : 20220728230210.2952731-1-calvinwan@google.com