From: SoutrikDas <valusoutrik@gmail.com>
To: git@vger.kernel.org
Cc: christian.couder@gmail.com, karthik.188@gmail.com,
	jltobler@gmail.com, ayu.chandekar@gmail.com,
	siddharthasthana31@gmail.com, chandrapratap3519@gmail.com
Subject: [GSOC Proposal] Complete and extend the remote-object-info command for git cat-file
Date: Fri,  6 Mar 2026 02:18:09 +0530
Message-ID: <20260305204809.54927-1-valusoutrik@gmail.com>


Hi!

This is my project proposal for GSOC 2026.

I am interested in the project idea : "Complete and extend the 
remote-object-info command for git cat-file"


# Complete and extend the remote-object-info command for git cat-file

## Contact

- Name: Soutrik Das
- E-mail: valusoutrik@gmail.com
- Github: https://github.com/SoutrikDas
- LinkedIn: https://www.linkedin.com/in/soutrik-das/

## About Me

My name is Soutrik Das. I am a developer with a bachelor's degree in CS
from the Indian Institute of Technology, Dhanbad. Currently I am pursuing
a master's degree in AI at the Indian Institute of Technology, Bhubaneswar.

I don't have much experience contributing to something as large as git,
but I would love to learn anything and everything I can from this
experience. I have experience in C/C++ from my BTech coursework and from
participating in Codeforces contests.


## Pre GSOC

I started exploring Git's codebase around February 2026 and sent my first patch
as a docfix, followed by a microproject modernizing tests:

- [PATCH] doc: fix repo_config documentation reference [1]
    status: merged to master 
    Merge Commit: 94336d77bcbf4360b67a9454d8bf2e84b3d88ae7
    Description: Replace the path for the repo_config() documentation 
    from 'Documentation/technical/api-config.h' to 'config.h'.

- [GSOC PATCH] t7003: modernize path existence checks using test helpers [2]
    status: merged to master 
    Merge Commit: 11294bb0fa540d214d071b32cf74b1ed37b3bbbd
    Description: Replace direct uses of 'test -f' and 'test -d' with
    git's helper functions 'test_path_is_file', 'test_path_is_missing'
    and 'test_path_is_dir'.


I have read through most of Eric Ju's [4] work and some of Calvin Wan's [5]
work. I am still finding more things to understand from each thread, but 
I feel I have grasped the basics.

My work in this project would be focused on implementing the changes
suggested at the end of Eric Ju's [Patch v11].

I wouldn't say I understand every bit of discussion from that thread,
but in general my understanding is :

Calvin Wan and Eric Ju have already implemented a client-side command
called get_remote_info, but it is designed to be batched, so that
getting each object's data does not cost multiple network trips.

I have applied Eric Ju's patch series on top of an old master commit
(2d2a71ce85), since I could not find a base commit for the series. The
patches applied cleanly, and I also played around and added very rough
but working "%(objecttype)" code, i.e. it now prints like this:

29658341f39210201ff7f72a4be83937cf2288c5 14 blob
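
A minimal sketch, in Python, of how a script could consume a line like the
one above. It assumes the "<oid> <size> <type>" layout of that single sample
line, which comes from my rough local change and is not a merged git format;
the names RemoteObjectInfo and parse_info_line are hypothetical.

```python
from typing import NamedTuple


class RemoteObjectInfo(NamedTuple):
    """One parsed output line (hypothetical helper type)."""
    oid: str
    size: int
    objtype: str


def parse_info_line(line: str) -> RemoteObjectInfo:
    # Assumption: fields are whitespace-separated, in the order
    # "<oid> <size> <type>", as in the sample line above.
    fields = line.split()
    if len(fields) != 3:
        raise ValueError(f"expected 3 fields, got {len(fields)}: {line!r}")
    oid, size, objtype = fields
    return RemoteObjectInfo(oid, int(size), objtype)


info = parse_info_line("29658341f39210201ff7f72a4be83937cf2288c5 14 blob")
print(info.objtype, info.size)  # prints "blob 14"
```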


## Project : Complete and extend the remote-object-info command for git cat-file

Currently, in the case of a partial clone, the user cannot retrieve all
object data without fetching the object beforehand. To solve this problem,
Calvin Wan and Eric Ju designed a patch series that utilises the
capabilities of protocol v2 servers.

This was done in the form of "remote-object-info".

But only %(objectsize) was implemented, and that patch series was not merged.
This project has three goals:

1: To rebase and finalize Calvin Wan and Eric Ju's work by addressing
    the feedback on Eric Ju's Patch v11

2: To add support for objecttype in remote-object-info

3: To discuss other information types like objectsize:disk and deltabase.

Project Duration : approx. 12 weeks

## Timeline 

Mar 6-31 : Refine Proposal

    If possible, I would like to submit small patches, but first I will
    have to rebase Eric Ju's patches. I am not sure if I can do this
    before GSOC.

    If not, I plan to contribute to git in other areas.

May 1-24 : Community Bonding 
    1-7  : Understand relevant underlying/ helper functions
    8-24 : Ask about any design related problems/decisions

May 25 - Jun 14 : Start a Patch Series to rebase Calvin Wan and Eric Ju's work
    and keep refining

Jun 15 - Aug 15 : Start and keep refining Patch Series to add support for
    object type information

Aug 16 - Aug 24 : Discuss and Implement other object information if possible
    Concurrently I shall make a report for all the work done.

## Availability

My current semester is ending in the first week of April, so I will be
able to contribute 7-8 hours per day, totalling around 35-40 hrs a week
on the project.

Total weeks = 12, total hours = 35*12 = 420.
This leaves plenty of room to accommodate any unforeseen circumstances
that may arise during the project.

## RFC 

I have a few ideas but do not know if they are worth pursuing, so I will
leave them here in the first draft:

- Addition of a remote-object-info outside of batch mode :
    Yes, it is best used in batch mode, but if a user wants only one
    object's size or type, should they be able to just run
    `git cat-file -r origin <oid>`
    and get the size and type? Or something similar; I am not sure if
    the way I have depicted it conforms to git's design.

- Addition of commands for common user behaviour :
    I don't know if it's going to be a common user behaviour, but what about
    `git cat-file -r --all-absent`
    Or inside "git cat-file --batch-command="<format> remote-object-info 
    --all-absent --type=tree <remote>"
    which would basically fill in remote-object-info with all the blobs
    that are currently absent from the worktree?
    The user would not need to fill them in by hand if this is a common
    enough use case.
    No need to fill them if its for a common enough use case.

- Sort according to size :
    Maybe a user would want to check what the largest file they don't
    have yet is.

- Get total missing blob size :
    Use case would be when someone wants to know how much exactly there
    is to download, before starting the download.
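
The last two ideas can be sketched outside git as post-processing of
remote-object-info output. This is only a rough Python illustration: it
assumes output lines in a "<oid> <size> <type>" layout (as printed by my
rough %(objecttype) change), the object IDs and sizes below are made up, and
the function names are hypothetical.

```python
def parse_blob_sizes(lines):
    """Return (oid, size) pairs for blobs from '<oid> <size> <type>' lines."""
    pairs = []
    for line in lines:
        oid, size, objtype = line.split()
        if objtype == "blob":
            pairs.append((oid, int(size)))
    return pairs


def largest_first(pairs):
    """Sort (oid, size) pairs by size, biggest object first."""
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


def total_size(pairs):
    """Sum the sizes, i.e. how many bytes are still missing locally."""
    return sum(size for _, size in pairs)


# Made-up sample output; not taken from a real repository.
sample = [
    "1" * 40 + " 14 blob",
    "2" * 40 + " 2048 blob",
    "3" * 40 + " 96 tree",
]

blobs = parse_blob_sizes(sample)
print(largest_first(blobs)[0][1])  # prints 2048
print(total_size(blobs))           # prints 2062
```

Whether this belongs in git itself or is better left to scripts like the
one above is part of what I would like feedback on.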
    
Thank you for your time in reviewing my proposal as well as considering
my application. I am excited to learn everything I can from git.

Thanks and Regards,
Soutrik


[1] : pull.2187.git.git.1770293021383.gitgitgadget@gmail.com
[2] : 20260209172445.39536-1-valusoutrik@gmail.com
[3] : 20260225190306.39358-1-valusoutrik@gmail.com
[4] : 20240628190503.67389-1-eric.peijian@gmail.com
[5] : 20220728230210.2952731-1-calvinwan@google.com
