From: Jialong Wang <jerrywang183@yahoo.com>
To: git@vger.kernel.org
Cc: karthik.188@gmail.com, jltobler@gmail.com,
ayu.chandekar@gmail.com, siddharthasthana31@gmail.com,
jerrywang183@yahoo.com
Subject: [GSoC proposal v3][RFC] Improve the new git repo command
Date: Wed, 18 Mar 2026 16:08:16 -0400
Message-ID: <20260318200816.31430-1-jerrywang183@yahoo.com>
In-Reply-To: <CAOLa=ZQ7AMUb72N-0Z-h09KneE+ASuXt=BUOmO9Bzp4y6w6XyQ@mail.gmail.com>
Hi all,
This is v3 of my proposal draft for the "Improve the new git repo
command" project. I am including the full draft inline below for
convenience.
In this revision, I tried to make the scope more realistic and better
aligned with the current public discussion around `git repo info`. In
particular, I:
- revised the proposal so it does not assume that path-related `git repo
info` work is starting from scratch
- reframed the project around integration, testing, repository-aware
cleanup, and any still-open metadata gaps
- added my more recent Git contributions
- added a short "immediate next steps" section describing the kind of
`git repo` patch I want to work on next before the coding period
I would appreciate any feedback from mentors and reviewers on whether
this revised framing is closer to the right direction.
Thanks,
Jialong
Improve the git repo command
Name
Jialong Wang
Email
jerrywang183@yahoo.com
Preferred project size
175 hours
About me
My name is Jialong Wang, and I plan to apply to Git for GSoC 2026.
I have been getting familiar with Git's development workflow by building
Git from source, reading the contribution documents, and working on
patches through the mailing list. My initial microproject focused on
improving corrupt patch location reporting in `git apply` and `git am`.
That work went through mailing-list review, including comments from
Karthik Nayak and Junio C Hamano, and gave me direct experience with
rerolling patches, updating tests, and using CI to catch gaps I had
missed locally.
Since then, I have continued contributing small Git patches and
follow-up work instead of stopping after the microproject. My recent
contributions include:
1. an initial patch series to report the location of corrupt patches
more clearly
2. a follow-up patch to report input locations in header parsing errors
in `apply.c`
3. a follow-up patch to report input locations in binary and garbage
patch error paths in `apply.c`
4. `t2203: avoid suppressing git status exit code`
5. `object-name: turn INTERPRET_BRANCH_* constants into enum values`
This has helped me get comfortable with Git's normal workflow of
starting with a small change, responding to review, rerolling
appropriately, and then continuing with logically related follow-up
work.
Project summary
I would like to work on improving the new `git repo` command, with a
primary focus on `git repo info`.
The `git repo` command was introduced to provide a cleaner interface for
querying repository metadata. Path-related values are a natural part of
that goal, but the public discussion this year has already shown that
this topic is not starting from zero: there is ongoing work around
path-related fields, category-aware key naming, and path-format
behavior.
Because of that, I do not want to frame this proposal as "I will newly
add repository path metadata" in isolation. Instead, my proposal is to
improve `git repo info` by building on the direction already taking
shape upstream, focusing on integration, testing, repository-aware
cleanup, and any remaining path-related or adjacent metadata work that
is still useful and unimplemented by the time GSoC begins.
The goal is not to replace `git rev-parse`, but to make `git repo info`
a more coherent and better-tested structured interface for repository
metadata.
Motivation
Today, scripts and tools still often rely on commands such as:
- `git rev-parse --git-dir`
- `git rev-parse --show-toplevel`
- `git rev-parse --git-path <path>`
These commands are useful, but they were not primarily designed as a
structured repository metadata interface.
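For illustration, these lookups look roughly like this in a script
today. This is a minimal sketch in a throwaway repository; the directory
name `demo` and the `info/exclude` argument are arbitrary choices for
the example, not part of the proposal.

```shell
#!/bin/sh
# Sketch only: query repository layout with today's rev-parse options.
# "demo" and "info/exclude" are arbitrary example names.
tmp=$(mktemp -d) || exit 1
git init -q "$tmp/demo" || exit 1
cd "$tmp/demo" || exit 1

git_dir=$(git rev-parse --git-dir)                # ".git" at the top level
toplevel=$(git rev-parse --show-toplevel)         # absolute working tree path
exclude=$(git rev-parse --git-path info/exclude)  # path resolved inside the git dir

printf 'git-dir:  %s\n' "$git_dir"
printf 'toplevel: %s\n' "$toplevel"
printf 'git-path: %s\n' "$exclude"
```

Each value needs its own option and comes back as bare text without a
key, so the caller has to remember which invocation produced which line.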
Since `git repo info` already exists for this purpose, extending and
refining it would make repository layout information easier to query in
a cleaner and more consistent way. However, given the current public
work in progress, I think the most useful contribution is not to
duplicate existing series, but to help move this area toward a better
integrated and upstream-ready state.
Current context
I am aware that work on path-related `git repo info` fields has already
started. There have already been patch series and proposal discussions
for path keys, category requests, path formatting, and nearby
`git repo structure` ideas.
Because of that, one of my first goals during the bonding period would
be to review the current state of those discussions carefully, identify
what remains open, and refine the exact project scope based on mentor
feedback. I would rather build on the current direction than duplicate
work that is already in progress.
At this point, the direction that seems most realistic to me is:
1. first align with the upstream direction that is already emerging for
`git repo info`
2. improve the command's internal consistency and test coverage
3. implement remaining path-related or adjacent metadata work only where
it is still clearly useful and not already being covered elsewhere
Immediate next steps
Before the coding period, I want to keep contributing in this area
through small reviewable patches instead of waiting until GSoC starts.
My immediate plan is:
1. review the latest upstream state of the path-related and
category-related `git repo info` work
2. identify one small `git repo` patch that does not duplicate an
in-flight series
3. start either with a repository-aware cleanup in `builtin/repo.c` or
with stronger tests in `t/t1900-repo-info.sh`, depending on which
direction is still open and useful
4. use that first patch series to validate the project direction with
the mailing list before committing to a larger implementation batch
Proposed work
The main objective of this project is to improve `git repo info` as a
structured repository metadata interface while avoiding duplication of
public in-flight work.
I expect the work to proceed in four connected parts:
1. review the current implementation and ongoing mailing-list
discussions, then narrow the initial scope to a first small batch of
cleanup, tests, or still-open metadata work
2. discuss design details on the mailing list, especially where there
are open questions about path naming, path formatting, or the
relationship with existing `git rev-parse` behavior
3. implement the agreed functionality through small patch series, with
each patch or small patch group carrying its own tests and
documentation updates for the user-visible behavior
4. if the first batch is in good shape, extend support to a second
agreed batch of improvements, whether that means remaining
path-related fields, repository-aware cleanup, or nearby metadata
work that still appears useful
Initial scope
At the beginning of the project, I would prefer to keep the first
practical batch conservative.
Rather than assuming that the first implementation work should directly
add a large number of path keys, I would prefer to start from one of
these two realistic entry points, depending on the state of upstream
work:
1. a small batch of still-unimplemented layout-related fields with clear
`rev-parse` equivalents, if those remain open
2. repository-aware cleanups and stronger tests around `git repo info`,
if the path-field direction is already substantially covered by
existing series
If path-related values are still a good first target by the beginning of
the coding period, the most likely initial candidates would be a small
set of high-value layout paths such as:
- `git-dir`
- `common-dir`
- `toplevel`
- `superproject-working-tree`
If those are already substantially addressed, I would instead prioritize
cleanups and tests that help the command mature, for example:
- reducing unnecessary reliance on global repository state inside
`builtin/repo.c`
- strengthening coverage in `t/t1900-repo-info.sh`
- covering edge cases such as linked worktrees and `--separate-git-dir`
Technical approach
The implementation of `git repo` is primarily in `builtin/repo.c`. The
first step would be to understand how `git repo info` currently collects
and prints repository metadata, and how that existing structure can be
extended or cleaned up without making the interface inconsistent.
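As a reference point for that reading, the default `keyvalue` output of
`git repo info` is one `key=value` pair per line. The sketch below shows
how a script might consume it. The helper name `repo_info_get` is
hypothetical, and the sample lines are illustrative stand-ins for real
output: the keys are ones documented in git-repo(1), the values are made
up. Values containing unusual characters can be quoted in this format,
which the sketch does not handle.

```shell
#!/bin/sh
# Hypothetical helper: print the value for one key from "key=value" lines
# read on stdin. Splits on the first "=" only, so values may contain "=".
# Does not unquote values that Git quoted for unusual characters.
repo_info_get () {
	while IFS= read -r line
	do
		case "$line" in
		"$1="*)
			printf '%s\n' "${line#*=}"
			return 0
			;;
		esac
	done
	return 1
}

# Illustrative sample, standing in for real `git repo info` output.
sample='layout.bare=false
layout.shallow=false
object.format=sha1
references.format=files'

object_format=$(printf '%s\n' "$sample" | repo_info_get object.format)
ref_format=$(printf '%s\n' "$sample" | repo_info_get references.format)
printf 'object format: %s, ref format: %s\n' "$object_format" "$ref_format"
```

Because every value carries its key, a script can request several fields
in one invocation and pick them apart afterwards, which is the property
any new path-related keys would need to preserve.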
Many relevant repository values are already available internally through
helpers such as:
- `repo_get_git_dir()`
- `repo_get_common_dir()`
- `repo_get_work_tree()`
Similarly, `git rev-parse` and `git rev-parse --git-path` already rely
on existing path resolution logic. So the work is not about inventing
these values from scratch, but about exposing or integrating a selected
subset of them through `git repo info` in a way that fits its current
design.
Patch strategy
I expect the implementation to be divided into small patches so that
each change can be reviewed independently.
A likely patch strategy would be:
1. a small preparatory cleanup if needed
2. a first small batch of `git repo` improvements, together with the
tests and documentation updates needed for those changes
3. a second batch that extends the same direction once the first one is
reviewed
I do not want to treat tests and documentation as a final cleanup
stage. Since these are user-visible changes to `git repo info`, I think
they should evolve with each patch batch so that the mailing list can
review the interface and its description at the same time as the
implementation.
Tests
Tests would be added alongside the new behavior rather than at the very
end.
Depending on the exact scope agreed on, test cases may include:
- ordinary repositories
- linked worktrees
- superproject and submodule cases
- repositories created with `--separate-git-dir`
- cases where path values differ from simple defaults
Where possible, I would compare new `git repo info` behavior against
existing `git rev-parse` behavior when the semantics are intentionally
close. I would also look for opportunities to reuse or mirror repository
layouts and edge cases that are already important elsewhere.
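As a rough picture of the layouts involved, the sketch below builds a
linked worktree and a `--separate-git-dir` repository and records what
`git rev-parse` reports for each. It deliberately does not call
`git repo info`, since which path keys exist depends on the upstream
work still in progress; these are only the reference values a future
comparison could be checked against. The directory names are arbitrary,
and real tests would be written with Git's test framework in
`t/t1900-repo-info.sh` rather than as a standalone script.

```shell
#!/bin/sh
# Sketch only: build two non-default layouts and record what rev-parse
# reports. Directory names (main, linked, sep, store.git) are arbitrary.
tmp=$(mktemp -d) || exit 1

# Layout 1: a linked worktree. An initial commit gives it something to
# check out.
git init -q "$tmp/main" || exit 1
git -C "$tmp/main" -c user.name=Example -c user.email=example@example.com \
	-c commit.gpgsign=false commit -q --allow-empty -m initial || exit 1
git -C "$tmp/main" worktree add "$tmp/linked" >/dev/null 2>&1 || exit 1

wt_git_dir=$(git -C "$tmp/linked" rev-parse --git-dir)
wt_common_dir=$(git -C "$tmp/linked" rev-parse --git-common-dir)

# Layout 2: the git directory lives outside the working tree.
git init -q --separate-git-dir "$tmp/store.git" "$tmp/sep" || exit 1
sep_git_dir=$(git -C "$tmp/sep" rev-parse --git-dir)

printf 'linked worktree git dir:    %s\n' "$wt_git_dir"
printf 'linked worktree common dir: %s\n' "$wt_common_dir"
printf 'separate git dir:           %s\n' "$sep_git_dir"
```

In the linked worktree the git directory and the common directory
differ, and in the second layout `.git` in the working tree is a file
pointing elsewhere; both are cases where a path value is not the simple
default.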
What I will not try to do
To keep the project realistic, I do not plan to:
- redesign all of `git repo`
- fully replace `git rev-parse`
- reimplement path-related work that is already being actively reviewed
- work on both `git repo info` and `git repo structure` at full scope in
the same project
Expected deliverables
By the end of the project, I expect to deliver:
- a useful set of `git repo info` improvements that are still clearly
open and upstream-relevant
- tests covering the new functionality and relevant repository layouts
- documentation updates for the new fields or behavior
- one or more patch series discussed and refined on the Git mailing list
Success criteria
I would consider the project successful if, by the end of the GSoC
period, the following are true:
1. a first useful batch of `git repo` improvements has been implemented
and is in good shape on the mailing list, ideally merged or close to
merge-ready
2. the new or refined behavior is covered by tests that clearly
exercise the agreed repository layouts and semantics
3. the documentation has been updated together with the implementation
4. if review and scope permit, at least one further agreed batch of
improvements has also been implemented or is well advanced
Timeline
Community bonding period
- study `builtin/repo.c` and the current `git repo info` implementation
- review recent and ongoing mailing-list discussions related to
`git repo`
- compare current `git repo info` behavior with related
`git rev-parse` behavior
- refine the exact scope with mentors and mailing-list feedback
- identify the first small batch of work that looks realistic for an
initial patch series
Weeks 1-3
- confirm the exact first batch of work to target
- prepare and send an initial patch series for that batch
- include tests and documentation updates in that first series
- address review comments and reroll as needed
Weeks 4-6
- continue strengthening semantics and coverage
- add tests for edge cases such as linked worktrees and
`--separate-git-dir`
- resolve small interface inconsistencies discovered during the early
cleanup work
Weeks 7-9
- finish or polish any remaining path/category/path-format work that
still needs implementation or integration
- coordinate patch scope with the latest upstream discussion
- update documentation to match settled behavior
Weeks 10-12
- implement one or more remaining metadata or interface improvements
that are still clearly useful and unclaimed
- focus on review-driven cleanup, additional tests, and documentation
polish
- prepare final report and project summary
Risks and mitigation
The main risk is overlap with parallel upstream work. I plan to mitigate
that by treating the project as integration-oriented from the beginning,
keeping patch series small, and adjusting scope based on the latest
public discussion and mentor guidance.
A second risk is that some of the path-related work may be largely
settled before the coding period starts. If that happens, I would shift
effort toward repository-aware cleanups, stronger test coverage,
documentation alignment, and other still-open `git repo` improvements
rather than forcing redundant feature work.
Why I think I am a good fit
I have already invested time in learning Git's contribution process
through actual submissions rather than only private experimentation.
That includes building the project, reading tests, sending patches,
rerolling in response to feedback, and adjusting patch structure when
maintainers asked for it.
I believe that experience is directly relevant here. The main challenge
of this project is not only writing code, but also moving an evolving
command forward in an upstream-friendly way without duplicating parallel
work. My recent contributions have helped me understand that process
much better, and I believe they put me in a stronger position to carry
this project through successfully.
Relevant links
SoC 2026 idea page
https://git.github.io/SoC-2026-Ideas/
General application information
https://git.github.io/General-Application-Information/
git repo documentation
https://git-scm.com/docs/git-repo
git rev-parse documentation
https://git-scm.com/docs/git-rev-parse
Recent patch series adding path-related support to git repo
https://public-inbox.org/git/20260228224252.72788-1-lucasseikioshiro@gmail.com/
Recent work-in-progress series for category and path keys and
`--path-format`
https://public-inbox.org/git/pull.2208.v6.git.git.1772428548.gitgitgadget@gmail.com/
Recent GSoC proposal thread on improving the new git repo command
https://public-inbox.org/git/20260303140732.16886-1-pushkarkumarsingh1970@gmail.com/
Another recent GSoC proposal thread on improving and extending git repo
https://public-inbox.org/git/CA+rGoLd-1Mb5JG1H1PvE-kyjdznrLVFjwQiMLHtd2ETQ-igmXg@mail.gmail.com/
Recent proposal thread focused on the same SoC idea
https://public-inbox.org/git/CAO_P5U3g_+RpnDUmEv_qX-3GVhpxLV97eMxP1apERc0KU_95tQ@mail.gmail.com/
Microproject discussion thread
https://public-inbox.org/git/CAKWWG_nGhD6vqhAS1mkEwBQPrg_YX0+C3-xW=Q3ifFDw4dDviw@mail.gmail.com/
Microproject patch thread
https://public-inbox.org/git/20260315231538.68586-1-jerrywang183@yahoo.com/
Follow-up patch for header parsing errors
https://public-inbox.org/git/20260316195847.92386-1-jerrywang183@yahoo.com/