From: JAYATHEERTH K <jayatheerthkulkarni2005@gmail.com>
To: Patrick Steinhardt <ps@pks.im>
Cc: git@vger.kernel.org, karthik nayak <karthik.188@gmail.com>,
Ghanshyam Thakkar <shyamthakkar001@gmail.com>
Subject: Re: [GSOC] [Proposal v1] Machine-Readable Repository Information Query Tool
Date: Thu, 3 Apr 2025 20:05:45 +0530
Message-ID: <CA+rGoLeRXUQu8ZbDtaLp2_YbVGA5D1DeA2vSEcLf74qXjB5U2A@mail.gmail.com>
In-Reply-To: <CA+rGoLfCTzNTcGXG5py6oHQazeE8Vj0fLsR4KUTJ6rSRFnT_Vw@mail.gmail.com>
# GSoC 2025 Proposal for Git
**Refactoring `git rev-parse`: A Dedicated Command for Repository Information**
## Contact Details
* **Name**: K Jayatheerth
* **Email**: jayatheerthkulkarni2005@gmail.com
* **Blog**: [Blog](https://jayatheerthkulkarni.github.io/gsoc_blog/index.html)
* **GitHub**: [GitHub](https://github.com/jayatheerthkulkarni)
## Prerequisites & Experience
As part of the GSoC application prerequisites, I have engaged with the
Git community through a microproject involving documentation changes.
This provided valuable experience with Git's codebase, contribution
workflow (patch submission, feedback cycles), and communication via
the mailing list.
* **Microproject Patch Series:** [Main mail
thread](https://lore.kernel.org/git/xmqqa59evffd.fsf@gitster.g/T/#t)
(Link to the most relevant thread demonstrating interaction and
successful patch refinement)
* **Initial Patch:** [First
Patch](https://lore.kernel.org/git/20250312081534.75536-1-jayatheerthkulkarni2005@gmail.com/t/#u)
* **Mailing List Introduction:** [First
Mail](https://lore.kernel.org/git/CA+rGoLc69R8qgbkYQiKoc2uweDwD10mxZXYFSY8xFs5eKSRVkA@mail.gmail.com/t/#u)
* **Blog:** My GSoC blog details these interactions:
[Blog](https://jayatheerthkulkarni.github.io/gsoc_blog/index.html)
## **Synopsis**
This project focuses on **refactoring Git by creating a dedicated
command (tentatively named `git repo-info`) to house the low-level
repository, path, and format-related query options currently misplaced
under the "OPTIONS FOR FILES" section of `git-rev-parse(1)`**. This
new command will provide a more logical and maintainable location for
this functionality, allowing `git rev-parse` to better focus on its
core purpose of parsing revisions, thus improving Git's internal
organization and command structure clarity.
## **Benefits to the Community**
### **1. Improves `git rev-parse` Clarity and Maintainability**
- `git rev-parse` has accumulated various options unrelated to its
primary purpose of parsing revisions, particularly those for querying
low-level repository state and paths.
- This project **directly addresses this issue** by migrating these
options to a dedicated command, making `git rev-parse` cleaner and
easier to understand and maintain.
- Provides a **clearer separation of concerns** within Git's command suite.
### **2. Provides Reliable Access for Automation and Scripting**
- Scripts often need fundamental repository information like the
top-level directory path (`--show-toplevel`), the `.git` directory
location (`--git-dir`), or repository state (`--is-bare-repository`).
- Currently, scripts rely on `git rev-parse` for this, mixing
low-level repo queries with revision parsing calls.
- The new `git repo-info` command will offer a **stable, dedicated
interface** for retrieving this specific low-level information, making
scripts **cleaner and more robust** by calling the command designed
explicitly for these tasks.
- The default output will mimic the **existing, simple text format**
of the `rev-parse` options, ensuring compatibility for scripts
migrating to the new command.
### **3. Enhances CI/CD Pipeline Foundations**
- CI/CD pipelines frequently need to establish context by determining
the repository root or `.git` directory location early in their
execution.
- Using the dedicated `git repo-info` command for these foundational
queries **simplifies the initial setup steps** in pipeline scripts
compared to using the overloaded `git rev-parse`.
## Deliverables
Since the project scope is focused on refactoring `git rev-parse`,
this project will introduce a new Git command, tentatively named
`git repo-info`, to serve as the designated home for specific
low-level query options.
The key deliverables for this GSoC project include:
1. **New Core Command: `git repo-info`**
* A new `builtin/repo-info.c` command integrated into the Git source code.
* Implementation primarily in C, leveraging existing internal Git APIs
and logic currently within `builtin/rev-parse.c` to implement the
relocated options.
2. **Relocated `rev-parse` Options:**
* Implementation of the core functionality behind the following
options from `git-rev-parse(1)`'s "OPTIONS FOR FILES" section within
the new `git repo-info` command:
* **Path Queries:** `--show-cdup`, `--show-prefix`, `--show-toplevel`,
`--show-superproject-working-tree`
* **Directory Queries:** `--git-dir`, `--git-common-dir`,
`--resolve-git-dir <path>`
* **State/Format Queries:** `--is-inside-git-dir`,
`--is-inside-work-tree`, `--is-bare-repository`,
`--is-shallow-repository`
* **Index File Query:** `--shared-index-path`
3. **Default Output Format (Text-Based):**
* The command's default output for each implemented option will
**match the current plain text output** produced by `git rev-parse`
for that same option, ensuring backward compatibility for scripts
migrating to the new command. Output will primarily be via standard C
functions like `printf` or `puts`.
4. **Comprehensive Documentation:**
* A clear man page (`git-repo-info.adoc`) explaining the new command's
purpose and detailing the usage and output of each implemented option.
* Updates to `git-rev-parse.adoc` to clearly **deprecate** the
relocated options (or mark them as aliases for compatibility) and
point users to the new `git repo-info` command.
5. **Robust Test Suite:**
* A new test script (`t/tXXXX-repo-info.sh`) using Git's test
framework (`test-lib.sh`).
* Tests specifically validating the output of `git repo-info --option`
against the output of `git rev-parse --option` across various
repository states (standard repo, bare repo, inside `.git`, inside a
worktree, submodules, shallow clones, etc.) to ensure functional parity.
6. **(Stretch Goal / Potential Future Work): Structured Output**
* If time permits after successfully implementing, documenting, and
testing the core text-based functionality, investigate adding a
`--format=json` option to provide a structured JSON output containing
the results of the requested queries. This is explicitly a secondary
goal, contingent on completing the primary refactoring task.
**Out of Scope for GSoC (Based on Refined Goal):**
* Querying high-level metadata like the current branch name, HEAD
commit details, remote URLs, tags, or arbitrary configuration values
(`--is-shallow-repository` is the only history-related query in scope).
* Complex status reporting (worktree dirtiness).
* Real-time monitoring or comparing metadata between revisions.
* Implementing JSON output as the *primary* feature.
## Technical Details
This section outlines the proposed technical approach for implementing
the `git repo-info` command and relocating the specified options:
1. **Core `git repo-info` Command Implementation:**
* **Entry Point:** Create `builtin/repo-info.c` with
`cmd_repo_info(...)` function. Parse options using Git's
`parse-options` API.
* **Repository Context:** Use the `struct repository` instance and
`startup_info` provided by Git's infrastructure. Set up the repository
context the way `cmd_rev_parse` does when needed (e.g., using
`setup_git_directory_gently`).
* **Reusing Logic:** Analyze the implementation of the target options
within `builtin/rev-parse.c`. Extract and adapt the relevant C
functions and logic (path handling with helpers such as
`relative_path()` and `strbuf_realpath()`; repository state checks
such as `is_bare_repository()`, `is_inside_git_dir()` and
`is_inside_work_tree()`; access to `startup_info`, `git_path()`, etc.)
into `builtin/repo-info.c`, or into shared helper functions where
appropriate.
* **Specific Option Implementation:**
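* `--show-toplevel`, `--show-cdup`, `--show-prefix`: Rely on the
`prefix` computed during setup (printed as-is by `--show-prefix`, and
turned into one `../` per directory level by `--show-cdup`) and on the
work tree path for `--show-toplevel`.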
* `--git-dir`, `--git-common-dir`: Access `repo->gitdir` and
`repo->commondir`, or the accessor functions `rev-parse` already uses
for these options. `--resolve-git-dir` will resolve the provided path
(which may be a `.git` directory or a `.git` file pointing elsewhere)
the same way `rev-parse` does via `resolve_gitdir()`.
* `--is-*` flags: Call existing helper functions such as
`is_bare_repository()`, `is_inside_git_dir()` and
`is_inside_work_tree()`. `--is-shallow-repository` involves calling
`is_repository_shallow(repo)`.
* `--shared-index-path`: Access path information related to split
indexes if enabled.
* **Output Generation:** Use standard C `printf("%s\n", ...)` or
`puts(...)` to print the resulting string (a path, or the literal word
`true`/`false`) to standard output, matching `rev-parse`'s current
behavior. The `--is-*` options always print `true` or `false` and exit
with status `0`; they do not report the answer through the exit code,
and this behavior should be preserved.
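To make the `--show-prefix`/`--show-cdup` relationship and the boolean
output convention concrete, here is a small standalone C sketch. It
deliberately avoids Git's internal APIs; `cdup_from_prefix()` and
`bool_word()` are hypothetical helper names, and the sketch assumes the
prefix has the shape Git's setup code produces (relative to the top of
the work tree, ending in `/`, or `NULL` at the top level).

```c
#include <string.h>

/*
 * Hypothetical helper, not Git's code: derive the --show-cdup answer
 * from a setup-style prefix such as "src/lib/" (NULL at the top level).
 * Every '/' closes one directory level, so emit one "../" per '/'.
 * Returns 0 on success, or -1 if out cannot hold the result.
 */
static int cdup_from_prefix(const char *prefix, char *out, size_t outsz)
{
	size_t n = 0;

	if (!outsz)
		return -1;
	out[0] = '\0';
	for (; prefix && *prefix; prefix++) {
		if (*prefix != '/')
			continue;
		if (n + 3 >= outsz) /* need room for "../" plus the NUL */
			return -1;
		memcpy(out + n, "../", 3);
		n += 3;
		out[n] = '\0';
	}
	return 0;
}

/*
 * The --is-* queries print the word "true" or "false" on stdout and
 * exit 0 either way, so the answer is the text, not the exit status.
 */
static const char *bool_word(int value)
{
	return value ? "true" : "false";
}
```

Given the prefix `src/lib/`, the helper yields `../../`; at the top
level it yields an empty string, which `rev-parse` prints as an empty
line. In-tree code would print these strings with `puts()` or
`printf("%s\n", ...)` as described above.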
2. **Documentation:**
* Create `Documentation/git-repo-info.adoc` using AsciiDoc format,
modeling it after existing man pages. Detail each option, its purpose,
and expected output.
* Modify `Documentation/git-rev-parse.adoc`, adding notes to the
relevant options indicating they are better handled by `git repo-info`
and potentially marking them for deprecation in a future Git version.
3. **Testing:**
* Create `t/tXXXX-repo-info.sh` using `test-lib.sh`.
* Structure tests using `test_expect_success` blocks.
* Create repositories with `git init` (or `test_create_repo`), run
commands inside subshells such as `(cd repo && ...)`, and use
`test_cmp` to compare the output of `git repo-info --option` directly
against `git rev-parse --option`. This also covers the boolean flags,
since they print `true` or `false` rather than signaling through exit
codes.
* Cover edge cases like running outside a repository, in a bare
repository, deep within a worktree, within the `.git` directory, and
in repositories with submodules or worktrees.
4. **(Stretch Goal) JSON Output Implementation:**
* If attempted, add a `--format=json` option using `parse-options`.
* Collect results from the requested options internally.
* Use Git's existing `json-writer` API (`json-writer.h`, already used by
Trace2) rather than a new third-party dependency to construct a JSON
object mapping option names (or descriptive keys) to their
corresponding values. Print the final JSON string to standard output.
Add specific tests for JSON output validation.
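If the stretch goal is attempted, in-tree code should go through
`json-writer`; the standalone sketch below only illustrates the shape
of the output and the escaping it requires. The `struct kv` type, the
`emit_json()` helper and key names such as `toplevel` and `bare` are
hypothetical, chosen for illustration rather than as a settled output
schema. Strings are escaped as RFC 8259 requires (`\"`, `\\`, and
`\u00XX` for control characters), and boolean answers are emitted as
bare JSON `true`/`false` rather than quoted strings.

```c
#include <stdio.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-size output buffer; flags overflow instead of overrunning. */
struct jbuf {
	char *buf;
	size_t len;
	size_t cap;
	int overflow;
};

static void jb_putc(struct jbuf *b, char c)
{
	if (b->len + 1 >= b->cap) {
		b->overflow = 1;
		return;
	}
	b->buf[b->len++] = c;
	b->buf[b->len] = '\0';
}

static void jb_puts(struct jbuf *b, const char *s)
{
	while (*s)
		jb_putc(b, *s++);
}

/* Append s as a quoted JSON string, escaping per RFC 8259. */
static void jb_string(struct jbuf *b, const char *s)
{
	jb_putc(b, '"');
	for (; *s; s++) {
		unsigned char c = (unsigned char)*s;

		if (c == '"' || c == '\\') {
			jb_putc(b, '\\');
			jb_putc(b, (char)c);
		} else if (c < 0x20) {
			char esc[7];

			snprintf(esc, sizeof(esc), "\\u%04x", (unsigned)c);
			jb_puts(b, esc);
		} else {
			jb_putc(b, (char)c);
		}
	}
	jb_putc(b, '"');
}

enum kv_type { KV_STRING, KV_BOOL };

/* One query result: a path-like string or a boolean answer. */
struct kv {
	const char *key;
	enum kv_type type;
	const char *str; /* used when type == KV_STRING; must not be NULL */
	int flag;        /* used when type == KV_BOOL */
};

/*
 * Write {"key":value,...} into out. Returns the JSON length, or -1 if
 * out is too small (out then holds a truncated, invalid prefix).
 */
static int emit_json(char *out, size_t cap, const struct kv *kvs, size_t n)
{
	struct jbuf b = { out, 0, cap, 0 };
	size_t i;

	if (!cap)
		return -1;
	out[0] = '\0';
	jb_putc(&b, '{');
	for (i = 0; i < n; i++) {
		if (i)
			jb_putc(&b, ',');
		jb_string(&b, kvs[i].key);
		jb_putc(&b, ':');
		if (kvs[i].type == KV_BOOL)
			jb_puts(&b, kvs[i].flag ? "true" : "false");
		else
			jb_string(&b, kvs[i].str);
	}
	jb_putc(&b, '}');
	return b.overflow ? -1 : (int)b.len;
}
```

For example, a `toplevel` of `/tmp/a "b"` and a false `bare` answer
produce `{"toplevel":"/tmp/a \"b\"","bare":false}`. Emitting booleans
unquoted keeps the JSON typed, while the plain-text mode keeps printing
the words `true`/`false` for compatibility.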
## Detailed Project Timeline
**Phase 0: Pre-Acceptance Preparation (April 9 - May 7, 2025)**
* **Focus:** Demonstrate continued interest and deepen understanding
*specifically of `rev-parse`'s internals* while awaiting results.
* **Activities:**
* **(April 9 - April 21):** Deep dive into `builtin/rev-parse.c`,
identifying the exact code blocks implementing the "OPTIONS FOR
FILES". Trace how they use `startup_info`, `prefix`, path functions,
and repository flags.
* **(April 22 - May 7):** Continue monitoring the mailing list. Refine
understanding of Git's testing framework, specifically focusing on
tests for `rev-parse` options (e.g., `t1006-cat-file.sh`,
`t5601-clone.sh` might use some flags). Review contribution
guidelines.
**Phase 1: Final Planning (May 8 - May 26, 2025 Approx.)**
* **Focus:** Formal introductions, confirm final scope & plan, setup.
* **Activities:**
* **(Week 1: May 8 - May 12):** Introduction with mentor(s). Confirm
the exact list of `rev-parse` options to be migrated. Discuss the
preferred approach for handling deprecation in `rev-parse` docs/code.
Discuss potential for shared helper functions vs. direct code
migration.
* **(Week 2: May 13 - May 19):** Set up dev environment. Deep dive
into the agreed-upon functions/code blocks within `rev-parse.c`.
Outline the basic structure for `builtin/repo-info.c` and the test
script `t/tXXXX-repo-info.sh`.
* **(Week 3: May 20 - May 26):** Implement the basic `cmd_repo_info`
skeleton, option parsing setup, and repository setup boilerplate.
Write initial "no-op" tests. Post first blog update.
**Phase 2: Implementation in Batches (Coding Weeks 1-8: May 27 - July
21, 2025 Approx.)**
* **Focus:** Implement options in logical groups, test thoroughly,
submit patches early and often.
* **GSoC Milestone:** Midterm Evaluations occur around Week 8.
* **Activities:**
* **(Batch 1 / Weeks 1-2: May 27 - June 9):** Implement basic path
queries: `--show-toplevel`, `--show-prefix`, `--show-cdup`. Add tests
comparing output with `rev-parse`. **Submit Patch Series 1**.
* **(Batch 2 / Weeks 3-4: June 10 - June 23):** Implement directory
queries: `--git-dir`, `--git-common-dir`, `--resolve-git-dir <path>`.
Add tests. **Submit Patch Series 2**. Write blog post update.
* **(Batch 3 / Weeks 5-6: June 24 - July 7):** Implement boolean state
queries: `--is-bare-repository`, `--is-inside-git-dir`,
`--is-inside-work-tree`. Add tests checking the printed `true`/`false`
output in various locations. **Submit Patch Series 3**.
* **(Batch 4 / Weeks 7-8: July 8 - July 21):** Implement remaining
queries: `--is-shallow-repository`, `--shared-index-path`,
`--show-superproject-working-tree`. Add comprehensive tests covering
interactions (e.g., in submodules, shallow clones). **Submit Patch
Series 4**. Prepare for Midterm evaluation; ensure submitted batches
demonstrate core progress. Write blog post update.
**Phase 3: Documentation & Final Polish (Coding Weeks 9-12: July 22 -
Aug 18, 2025 Approx.)**
* **Focus:** Create documentation, address feedback on all patches,
refine implementation, potentially attempt stretch goal.
* **Activities:**
* **(Week 9: July 22 - July 28):** Write the first complete draft of
the man page for `git-repo-info`. Draft the necessary updates for
`git-rev-parse.adoc` (deprecation notices). **Submit Patch Series 5
(Documentation)**.
* **(Week 10: July 29 - Aug 4):** Focus on addressing review comments
on **all** previous patch series. Refactor code based on feedback.
Ensure test suite is robust and covers feedback points.
* **(Week 11: Aug 5 - Aug 11):** *Stretch Goal (Conditional):* If core
functionality and docs are stable and reviewed positively, begin
investigating/implementing `--format=json`. Add specific JSON tests if
implemented. Otherwise, focus on further code cleanup and test
hardening.
* **(Week 12: Aug 12 - Aug 18):** Prepare and submit final versions of
all patch series, incorporating all feedback. Final testing pass.
Write blog post update summarizing progress and final state. Code
freeze for final evaluation.
**Phase 4: Final Evaluation & Wrap-up (Aug 19 - Nov 19, 2025)**
* **Focus:** Final submissions, respond to late feedback, ensure
project completion.
* **Official GSoC Milestone:** November 19, 2025 - Program End Date.
* **Activities:**
* **(Late Aug - Sept):** Submit final GSoC evaluations. Actively
respond to any further comments on submitted patches from the
community/maintainers, aiming for merge readiness.
* **(Oct - Nov 19):** Monitor mailing list for patch status. Write
final GSoC project summary blog post. Continue engaging with the
community if interested in further contributions beyond GSoC.
Thank You,
Jayatheerth