From: SoutrikDas <valusoutrik@gmail.com>
To: valusoutrik@gmail.com
Cc: ayu.chandekar@gmail.com, chandrapratap3519@gmail.com,
	christian.couder@gmail.com, git@vger.kernel.org,
	jltobler@gmail.com, karthik.188@gmail.com,
	siddharthasthana31@gmail.com
Subject: [GSoC Proposal v2] Complete and extend the remote-object-info command for git cat-file
Date: Fri, 20 Mar 2026 18:42:00 +0530
Message-ID: <20260320131200.3615-1-valusoutrik@gmail.com>
In-Reply-To: <20260305204809.54927-1-valusoutrik@gmail.com>


Hi everyone,
Thank you for the feedback, Christian and Karthik.
I have not made a doc version of this yet; I will link it from v3.

I understand that this proposal does not yet explain my own plans
thoroughly. I am working on that for v3.

Changes from v1 : 
- Correct spelling mistakes
- Address how much work is remaining after Eric Ju's Patch v11
- Increase Time in Timeline for Reviews
- Add a section for rebasing problems

---

This is the second version of my project proposal for GSoC 2026.

I am interested in the project idea: "Complete and extend the
remote-object-info command for git cat-file".


# Complete and extend the remote-object-info command for git cat-file

## Contact

- Name: Soutrik Das
- E-mail: valusoutrik@gmail.com
- Github: https://github.com/SoutrikDas
- LinkedIn: https://www.linkedin.com/in/soutrik-das/

## About Me

My name is Soutrik Das and I am a developer. I did my B.Tech in CS at
IIT Dhanbad. Currently I am pursuing an M.Tech degree in AI at IIT
Bhubaneswar.

I do not have much experience contributing to a project as large as
Git, but I would like to learn as much as possible from this
experience. I have experience in C/C++ from my B.Tech coursework and
from participating in Codeforces contests.


## Pre GSoC

I started exploring Git's codebase around February 2026 and sent my first patch
as a doc fix, followed by a microproject on modernizing tests:

- [PATCH] doc: fix repo_config documentation reference [1]
    status: merged to master 
    Merge Commit: 94336d77bcbf4360b67a9454d8bf2e84b3d88ae7
    Merge Date : 13 Feb 2026
    Description: Replace the path for the repo_config() documentation 
    from 'Documentation/technical/api-config.h' to 'config.h'.

- [GSoC PATCH] t7003: modernize path existence checks using test helpers [2]
    status: merged to master 
    Merge Commit: 11294bb0fa540d214d071b32cf74b1ed37b3bbbd
    Merge Date : 17 Feb 2026
    Description: Replace direct uses of 'test -f' and 'test -d' with
    Git's helper functions 'test_path_is_file', 'test_path_is_missing'
    and 'test_path_is_dir'.


## Eric Ju and Calvin Wan's work

In this section I want to talk about the work already done and the
feedback the community gave on the last patch series sent, i.e. v11.

This is my understanding of the patch series:

Patch 1/8 : git-compat-util: add strtoul_ul()
    Helper function addition
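
To illustrate what such a helper might look like, here is a minimal sketch,
assuming it follows the same pattern as the existing strtoul_ui() in
git-compat-util.h. This is not the code from the patch; the name
`sketch_strtoul_ul` is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical sketch, not the code from the patch: parse an unsigned
 * long strictly. Returns 0 on success and -1 on any malformed input.
 */
static int sketch_strtoul_ul(const char *s, int base, unsigned long *result)
{
	unsigned long ul;
	char *end;

	assert(s && result);

	/* strtoul() silently accepts a leading '-', so reject it up front. */
	if (strchr(s, '-'))
		return -1;

	errno = 0;
	ul = strtoul(s, &end, base);
	/* Fail on overflow (ERANGE), trailing garbage, or an empty number. */
	if (errno || *end || end == s)
		return -1;

	*result = ul;
	return 0;
}
```

A caller would check the return value before trusting `*result`, which is
left untouched on failure.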

Patch 2/8 : cat-file: add declaration of variable i inside for loop
    Small refactoring

Patch 3/8 : t1006: split test utility functions into new "lib-cat-file.sh"
    Moves the `echo_without_newline`, `echo_without_newline_nul` and
    `strlen` functions from `t1006-cat-file.sh` to `lib-cat-file.sh` so
    that they can be reused in the future.
    When I rebased the patch series onto a recent master (March 5,
    795c338de725e13bd361214c6b768019fc45a2c1), there was only one other
    file (t1007-hash-object.sh) that had a duplicate definition.

Patch 4/8 : fetch-pack: refactor packet writing
    Generalized write_command_and_capabilities so that it now takes in
    a command instead of hardcoding "fetch". It was also moved from 
    `fetch-pack.c` to `connect.c`
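
The idea of passing the command name as a parameter can be sketched as
follows. This is only an illustration of the pkt-line framing used by
protocol v2 (four hex digits of total length, then the payload); the
function name `sketch_format_command_pkt` is hypothetical and is not the
signature of write_command_and_capabilities.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_PKT_MAX 65520 /* largest total pkt-line length allowed */

/*
 * Hypothetical helper (not Git's API): format "command=<name>\n" as a
 * pkt-line into buf. Returns the number of bytes written, or -1 if the
 * line is too long or buf is too small.
 */
static int sketch_format_command_pkt(char *buf, size_t buflen,
				     const char *command)
{
	size_t total;
	int n;

	assert(buf && command);
	/* 4 length digits + "command=" + name + trailing LF */
	total = 4 + strlen("command=") + strlen(command) + 1;
	if (total > SKETCH_PKT_MAX || total + 1 > buflen)
		return -1;

	n = snprintf(buf, buflen, "%04zx" "command=%s\n", total, command);
	return n == (int)total ? n : -1;
}
```

With this shape, the same code path serves both "fetch" and "object-info";
only the string passed in changes.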

Patch 5/8 : fetch-pack: move fetch initialization
    Before this patch, the state machine of do_fetch_pack_v2() assumed
    that the starting state is FETCH_CHECK_LOCAL, so it initialized
    certain variables (such as `use_sideband=2`) inside the
    FETCH_CHECK_LOCAL case. For remote-object-info we do not want to go
    through those extra steps: we enter the state machine directly at
    FETCH_SEND_REQUEST, because we do not need to figure out what to
    fetch; the user/machine gives it explicitly. As the patch title
    says, the initialization is moved so that it no longer depends on
    starting in FETCH_CHECK_LOCAL.
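
A simplified sketch of why the initialization has to move: if it lives
inside the first case, entering at a later state skips it. The enum,
struct and function names below are hypothetical and only mirror the two
states mentioned above; the real do_fetch_pack_v2() does much more.

```c
#include <assert.h>

enum sketch_state {
	SKETCH_CHECK_LOCAL,   /* mirrors FETCH_CHECK_LOCAL */
	SKETCH_SEND_REQUEST,  /* mirrors FETCH_SEND_REQUEST */
	SKETCH_DONE           /* assumed terminal state */
};

struct sketch_fetch_ctx {
	int use_sideband;   /* must be set no matter where we enter */
	int checked_local;  /* records whether the local check ran */
};

static void sketch_run(struct sketch_fetch_ctx *ctx, enum sketch_state state)
{
	assert(ctx);

	/* Shared initialization, hoisted out of the CHECK_LOCAL case. */
	ctx->use_sideband = 2;
	ctx->checked_local = 0;

	while (state != SKETCH_DONE) {
		switch (state) {
		case SKETCH_CHECK_LOCAL:
			ctx->checked_local = 1;
			state = SKETCH_SEND_REQUEST;
			break;
		case SKETCH_SEND_REQUEST:
			/* Sending the request is elided in this sketch. */
			state = SKETCH_DONE;
			break;
		default:
			state = SKETCH_DONE;
			break;
		}
	}
}
```

Entering at SKETCH_SEND_REQUEST skips the local check but still gets
`use_sideband` set, which is the property remote-object-info relies on.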

Patch 6/8 : serve: advertise object-info feature
    Makes the server advertise that it supports the "size" feature of
    the object-info command.

Patch 7/8 : transport: add client support for object-info
    Adds `fetch_object_info`, which checks that the protocol is v2
    and then sends the object-info request. After getting the result,
    it parses the output.

    Also sets `state=FETCH_SEND_REQUEST` when object-info is used.

Not part of the above patch, but on the server side this request is
caught by serve.c and then handled by cap_object_info in protocol-caps.c.
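
The parsing step can be sketched as below, assuming each response line
carries "<oid> <size>" as described for object-info in the protocol v2
documentation, and assuming the pkt-line framing and trailing LF have
already been stripped. The struct and function names are hypothetical;
this is not the parser from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_MAX_HEXSZ 64 /* long enough for a SHA-256 hex object ID */

struct sketch_obj_info {
	char oid_hex[SKETCH_MAX_HEXSZ + 1];
	unsigned long size;
};

/* Parse one "<oid> <size>" payload. Returns 0 on success, -1 on bad input. */
static int sketch_parse_obj_info(const char *line, struct sketch_obj_info *out)
{
	const char *sp;
	char *end;
	size_t hexlen;
	unsigned long size;

	assert(line && out);

	sp = strchr(line, ' ');
	if (!sp)
		return -1;
	hexlen = (size_t)(sp - line);
	if (hexlen == 0 || hexlen > SKETCH_MAX_HEXSZ)
		return -1;
	/* Everything before the space must be lowercase hex. */
	if (strspn(line, "0123456789abcdef") != hexlen)
		return -1;
	/* The size must start with a digit (no sign, no whitespace). */
	if (sp[1] < '0' || sp[1] > '9')
		return -1;

	errno = 0;
	size = strtoul(sp + 1, &end, 10);
	if (errno || *end)
		return -1;

	memcpy(out->oid_hex, line, hexlen);
	out->oid_hex[hexlen] = '\0';
	out->size = size;
	return 0;
}
```

The output struct is only written once the whole line has been validated,
so a malformed line leaves it unchanged.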

Patch 8/8 : cat-file: add remote-object-info to batch-command
    Adds the subcommands and relevant tests.

To summarize, this patch series has added the subcommand, and all of
the needed functions to make one object info field work. But a few problems
were left to be addressed. Once those are addressed, adding new object
info fields will be much easier. 

## Problems faced during rebasing

I applied the patches onto an old master (2d2a71ce85)  and then rebased
to a recent master (795c338de7) 

Patch 1/8: Auto / No Merge Conflict

Patch 2/8: Auto / No Merge Conflict

Patch 3/8: add/add conflict

Patch 4/8: Confirming movement of function `write_command_and_capabilities`

Patch 5/8: Auto / No Merge Conflict

Patch 6/8: Auto / No Merge Conflict

Patch 7/8: Makefile merge conflict, but when opened in VS Code it shows
0 conflicts.

Patch 8/8: add/add conflict for object-store.c and modify/delete 
conflict for object-store-ll.h
According to 68cd492a3e 

> object-store: merge "object-store-ll.h" and "object-store.h"

And according to 8f49151763

> object-store: rename files to "odb.{c,h}"

Therefore I have added the function signature that was supposed to go to
object-store-ll.h to odb.h


## Work remaining to get v11 patch accepted

Almost all of it is focused on patch 8:

- Fix multi-line comment formatting - closing */ on own line
- Add blank lines between macro definitions
- Split overly-long MAX_REMOTE_OBJ_INFO_LINE definition across lines
- Change loop variable from size_t i to int i (since argc is int)
- Rearrange if/else to put smaller body first: if (!gtransport->smart_options)
    before else

- Fix the logic of the maximum line size for remote-object-info.
- Add an allow list of object info fields.
- Handle the case where an unsupported object info field is given in
    the format string.
    In this case we send the request as if that object info field were
    not there at all, and when printing the result we simply print an
    empty string on the client side. No extra payload on the network.
 
- Add tests.
- Update Documentation 
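
The allow-list behaviour described in the list above could be sketched
like this. It is a minimal illustration under my own assumptions: the
allow list contains only "size" (the one field the series supports), and
all names are hypothetical rather than taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed allow list: only "size" is served by the series so far. */
static const char *const sketch_allowed[] = { "size" };

static int sketch_is_allowed(const char *field)
{
	size_t i;

	assert(field);
	for (i = 0; i < sizeof(sketch_allowed) / sizeof(sketch_allowed[0]); i++)
		if (!strcmp(field, sketch_allowed[i]))
			return 1;
	return 0;
}

/*
 * Copy only the supported fields into `request` and return how many were
 * kept. Unsupported fields are dropped, so they add nothing on the wire.
 */
static size_t sketch_filter_fields(const char *const *wanted, size_t nr,
				   const char **request)
{
	size_t i, kept = 0;

	for (i = 0; i < nr; i++)
		if (sketch_is_allowed(wanted[i]))
			request[kept++] = wanted[i];
	return kept;
}

/* What to print for a field: the server's value, or "" if unsupported. */
static const char *sketch_expand(const char *field, const char *server_value)
{
	return sketch_is_allowed(field) ? server_value : "";
}
```

Adding a new field later would then mostly mean extending the allow list
and teaching the server side to answer it.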
    


## Project : Complete and extend the remote-object-info command for git cat-file

Currently, in a partial clone, the user cannot retrieve information about
an object without fetching the object beforehand. To solve this problem,
Calvin Wan and Eric Ju designed a patch series that uses the
capabilities of protocol v2 servers.

This was done in the form of "remote-object-info".

However, only %(objectsize) was implemented, and that patch series was not
merged. This project has two goals:

1: To rebase and finalize Calvin Wan and Eric Ju's work by addressing
    the feedback on Eric Ju's patch v11. The work for this part is
    discussed in the section above.

2: To discuss with the community and add support to `remote-object-info`
    for other relevant object info fields like `objecttype`,
    `objectsize:disk` and `deltabase`.

Project duration: approximately 13 weeks

## Timeline 


### Phase 1 : 

May 1-24 : Community Bonding + Start Design discussions on 
            Logic of allow list implementation
            Logic of maximum size of the remote-object-info command
            Which object info fields should be supported

Week 1 (May 25 - 31) : 
    Open Patch Series 1 for Eric Ju's patch, after
    solving all remaining problems, using the ideas/solutions from the
    discussion above. Both client and server side work would be in the
    same patch series. This is just rebasing previous work, so I have
    to address the changes suggested after v11.

Week 2 (June 1 - 7) : Continue discussion, review feedback and refine.

Week 3 (June 8 - 14) : Review feedback and refine

Week 4 (June 15 - 21) : Review feedback and refine + Update Documentation
    and Tests

Week 5 (June 22 - 28) : By now all tasks regarding merging Eric Ju's
    patch should be finished. But since reviewing may take more time,
    I am adding a buffer week.

Week 6 (June 29 - July 5) : Polish everything + Midterm report

Week 7 (July 6 - 12) : Midterm evaluation ( July 7-11)

Week 8 (July 13 - 19) : Start Patch Series 2 for adding other object info
    fields as per the discussion started in Week 1.

Week 9 (July 20 - 26) : Review feedback and refine.

Week 10 (July 27 - August 2) : Review feedback and refine. 

Week 11 (August 3 - 9) : Finalize all tests and Doc changes.

Week 12 (August 10 - 16) : Prepare Final report.

Week 13 (August 17 - 23) : Final Evaluation ( Aug 18-24 )



## Availability

My current semester ends in the first week of May, so I will be
able to contribute 7-8 hours per day, totalling around 35-40 hours a week
on the project.

Total weeks = 13, total hours = 35*13 = 455.
This leaves plenty of room to accommodate any unforeseen circumstances
that may arise during the project.

## RFC 


Hi Christian and Karthik !

I still feel that getting remote info for a single object might be useful,
and I think this might be where I can add this functionality:

When someone runs `GIT_NO_LAZY_FETCH=0 git cat-file -s <oid>`
and the oid is of a blob that is not available locally, Git simply fetches
the blob and reruns git cat-file -s.

But if someone runs `GIT_NO_LAZY_FETCH=1 git cat-file -s <oid>`
and the blob is not available locally, it exits with an error because of
the following check:

>	if (git_env_bool(NO_LAZY_FETCH_ENVIRONMENT, 0)) {
>		static int warning_shown;
>		if (!warning_shown) {
>			warning_shown = 1;
>			warning(_("lazy fetching disabled; some objects may not be available"));
>		}
>		return -1;
>	}

Would it be useful behaviour if, instead of exiting with an error, it sent
a remote-object-info request for that single object?
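
To make the question concrete, here is a rough sketch of the decision I
have in mind. Everything in it is hypothetical, including the
`remote_has_object_info` flag, which assumes we already know whether the
remote advertises object-info; it is not existing Git code.

```c
#include <assert.h>

enum sketch_action {
	SKETCH_USE_LOCAL,    /* object is present locally */
	SKETCH_LAZY_FETCH,   /* fetch the object, then answer (current behaviour) */
	SKETCH_REMOTE_INFO,  /* proposed: ask the remote for the size only */
	SKETCH_FAIL          /* nothing else to try */
};

struct sketch_query {
	int have_local;              /* object exists in the local object store */
	int no_lazy_fetch;           /* GIT_NO_LAZY_FETCH is set to true */
	int remote_has_object_info;  /* assumption: remote advertises object-info */
};

static enum sketch_action sketch_size_lookup(const struct sketch_query *q)
{
	assert(q);
	if (q->have_local)
		return SKETCH_USE_LOCAL;
	if (!q->no_lazy_fetch)
		return SKETCH_LAZY_FETCH;
	/* Lazy fetching is disabled: fall back to a metadata-only request. */
	return q->remote_has_object_info ? SKETCH_REMOTE_INFO : SKETCH_FAIL;
}
```

Only the third branch is new; the other outcomes match what happens today.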


Thank you for your time in reviewing my proposal as well as considering
my application. I am excited to learn everything I can from git.


Thanks and Regards,
Soutrik

[1] : pull.2187.git.git.1770293021383.gitgitgadget@gmail.com
[2] : 20260209172445.39536-1-valusoutrik@gmail.com
[3] : 20260225190306.39358-1-valusoutrik@gmail.com
[4] : 20240628190503.67389-1-eric.peijian@gmail.com
[5] : 20220728230210.2952731-1-calvinwan@google.com
