public inbox for git@vger.kernel.org
* [GSoC Proposal] Refactoring in order to reduce Git's global state
@ 2026-03-17 17:54 Francesco Paparatto
  2026-03-21 13:36 ` Christian Couder
  2026-03-24 19:31 ` [GSoC Proposal v2] " Francesco Paparatto
  0 siblings, 2 replies; 8+ messages in thread
From: Francesco Paparatto @ 2026-03-17 17:54 UTC
  To: git
  Cc: christian.couder, Ayush Chandekar, jltobler, Siddharth Asthana,
	karthik nayak

Refactoring in order to reduce Git's global state

Personal Information
--------------------
Name: Francesco Paparatto
Pronouns: he/him
Location: Milan, Italy
Time Zone: CET (UTC+1)
Email: francescopaparatto@gmail.com
GitHub: https://github.com/frapaparatto
LinkedIn: https://www.linkedin.com/in/francesco-paparatto/

About Me
--------
I am Francesco Paparatto, a self-taught programmer who dropped out
of a degree in Management to dedicate myself full-time to software
engineering.

My goal is to work as a Backend/Infrastructure Engineer,
and to reach that goal I am balancing CS fundamentals through
theoretical courses with challenging projects that help me develop
strong engineering skills, not only from a code perspective but also
from a system thinking point of view. I also like building
fundamental things from scratch in order to understand how they work.

This is my first time contributing to open source, and I am
fascinated by this world. I hope to become a cornerstone of an open
source community.

Git Experience and Contributions
---------------------------------
I started learning Git in depth at the beginning of 2026, when I
began working on my cgit project [1], a small reimplementation of
Git's core plumbing commands. I built it to understand how those
commands really work under the hood, and to practice reading real
codebases and designing and structuring code properly.

So far, I have made the following contributions:

* [GSoC PATCH v2] t3310: replace test -f/-d with
  test_path_is_file/test_path_is_dir
  Link: https://lore.kernel.org/git/20260228005939.9012-1-francescopaparatto@gmail.com/
  Status: Graduated to 'master'.

* [PATCH v4] t3310: avoid hiding failures from rev-parse in
  command substitutions
  Link: https://lore.kernel.org/git/20260307103631.89829-1-francescopaparatto@gmail.com/
  Status: Will merge to 'master'.

Overview
--------
Git's internal functions rely heavily on global state stored in
environment.c. Configuration values like trust_executable_bit,
editor_program, and git_commit_encoding are declared as file-scope
globals and populated at startup through git_default_config() and
its sub-handlers like git_default_core_config().

This design assumes a single repository per process. When Git is
used as a library (libification) or needs to handle multiple
repositories in the same process, globals from one repository
overwrite values from another. For example, two threads formatting
commits for repositories with different i18n.commitEncoding settings
would race on the same git_commit_encoding pointer.

The goal of this project is to move these global variables into
per-repository structures within struct repository, following the
pattern established by Olamide Bello's Outreachy work with struct
repo_config_values [2].
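
As a rough illustration of the problem, the sketch below contrasts a
single process-wide slot with a per-repository field. Every name in it
(toy_repo, global_commit_encoding, load_encoding_global,
load_encoding_repo) is a hypothetical stand-in, not Git's actual code:

```c
/* Hypothetical stand-in for a file-scope global such as
 * git_commit_encoding. */
static const char *global_commit_encoding;

/* Hypothetical, heavily simplified stand-in for struct repository. */
struct toy_repo {
	const char *commit_encoding; /* per-repository copy of the setting */
};

/* Global version: every repository's config load writes the same slot. */
static void load_encoding_global(const char *encoding)
{
	global_commit_encoding = encoding;
}

/* Per-repository version: each repository keeps its own value. */
static void load_encoding_repo(struct toy_repo *repo, const char *encoding)
{
	repo->commit_encoding = encoding;
}
```

Loading two repositories through load_encoding_global() leaves only the
last value visible, while load_encoding_repo() keeps both.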

Context and Prior Work
-----------------------
Not all config variables can be treated in the same way. There is
a fundamental distinction between eagerly and lazily parsed
variables, and conflating the two causes regressions.

Variables set in git_default_core_config() are eagerly parsed. They
are read at startup, and if a value is invalid, Git calls die()
immediately with a clear error before doing any real work. The user
gets early feedback and can fix their config.

Variables in struct repo_settings are lazily parsed. They are
populated on first access via prepare_repo_settings(). If an eagerly
parsed variable is naively moved into this struct, invalid config
that used to crash at startup now crashes mid-operation — the user
may have already started work that is now lost.
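
A minimal sketch of the difference, using hypothetical toy_ names and
returning -1 where Git would call die(). It only models the timing of
the error, not Git's real parsing code:

```c
#include <string.h>

/* Returns 1 or 0 for "true"/"false", -1 for anything else. */
static int toy_parse_bool(const char *raw)
{
	if (!strcmp(raw, "true"))
		return 1;
	if (!strcmp(raw, "false"))
		return 0;
	return -1;
}

struct toy_eager { int filemode; };
struct toy_lazy { const char *raw; int parsed; int filemode; };

/* Eager: validate while loading config, before any real work starts. */
static int toy_eager_load(struct toy_eager *cfg, const char *raw)
{
	int value = toy_parse_bool(raw);

	if (value < 0)
		return -1; /* Git would die() here, at startup */
	cfg->filemode = value;
	return 0;
}

/* Lazy: loading only remembers the raw string and cannot fail. */
static void toy_lazy_load(struct toy_lazy *s, const char *raw)
{
	s->raw = raw;
	s->parsed = 0;
}

/* Lazy: the error only surfaces at the first read, possibly
 * mid-operation. */
static int toy_lazy_get(struct toy_lazy *s, int *out)
{
	if (!s->parsed) {
		int value = toy_parse_bool(s->raw);

		if (value < 0)
			return -1;
		s->filemode = value;
		s->parsed = 1;
	}
	*out = s->filemode;
	return 0;
}
```

With an invalid value, toy_eager_load() fails immediately, whereas
toy_lazy_load() succeeds and the failure is deferred to toy_lazy_get().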

During GSoC 2025, Ayush Chandekar moved several global configuration
variables into repository-scoped structures [3]. Through this work
and subsequent review discussions, the eager/lazy problem became
visible [4].

Ayush's work also surfaced the getter/setter debate. When he
introduced getter and setter functions for repo_settings fields,
reviewers pointed out they added no value without calling
prepare_repo_settings() internally. From this discussion, Junio
suggested two approaches for repo_settings variables that must
not be mixed [5]:

- Common variables: populated in prepare_repo_settings(), accessed
  directly via repo->settings.foo. No getter, no setter.
- Rare variables: prepare_repo_settings() does not touch the field.
  A lazy getter checks a sentinel value (e.g. -1), reads from
  config on first access, and caches the result.

The appropriate pattern for each variable will require reasoning
and discussion on the mailing list.
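
The two patterns can be sketched as follows. The struct layout, the
config keys, and toy_config_get_int() are assumptions made for this
example. Only the shape (direct field access versus a sentinel-guarded
getter) comes from the discussion above:

```c
#include <string.h>

struct toy_settings {
	int initialized;
	int common_flag; /* "common": filled by toy_prepare_settings() */
	int rare_limit;  /* "rare": -1 means "not read from config yet" */
};

struct toy_repo {
	struct toy_settings settings;
	int config_reads; /* counts simulated config lookups */
};

/* Simulated config lookup with fixed answers for the two toy keys. */
static int toy_config_get_int(struct toy_repo *repo, const char *key)
{
	repo->config_reads++;
	return !strcmp(key, "toy.commonflag") ? 1 : 42;
}

static void toy_repo_init(struct toy_repo *repo)
{
	memset(repo, 0, sizeof(*repo));
	repo->settings.rare_limit = -1;
}

/* Common pattern: populate once, then read the field directly. */
static void toy_prepare_settings(struct toy_repo *repo)
{
	if (repo->settings.initialized)
		return;
	repo->settings.common_flag =
		toy_config_get_int(repo, "toy.commonflag");
	repo->settings.initialized = 1;
}

/* Rare pattern: untouched by prepare; read on first use and cached. */
static int toy_get_rare_limit(struct toy_repo *repo)
{
	if (repo->settings.rare_limit == -1)
		repo->settings.rare_limit =
			toy_config_get_int(repo, "toy.rarelimit");
	return repo->settings.rare_limit;
}
```

A caller reads repo->settings.common_flag after toy_prepare_settings(),
and goes through toy_get_rare_limit() for the rarely used value, which
touches the config at most once.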

Phillip Wood suggested a third approach: passing a
repository pointer through git_default_config() via the void *cb
callback data parameter, so handlers can populate per-repo structs
without touching globals [6].
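
A sketch of that idea under simplifying assumptions: the callback
signature below is a reduced, hypothetical one (Git's real config
callback type takes more parameters), and toy_repo, toy_entry, and
toy_for_each_config() exist only for this example:

```c
#include <stddef.h>
#include <string.h>

struct toy_repo { int trust_executable_bit; };
struct toy_entry { const char *key; const char *value; };

typedef int (*toy_config_fn)(const char *key, const char *value, void *cb);

/* The handler learns which repository to fill from cb, not from a
 * global. */
static int toy_default_config(const char *key, const char *value, void *cb)
{
	struct toy_repo *repo = cb;

	if (!strcmp(key, "core.filemode"))
		repo->trust_executable_bit = !strcmp(value, "true");
	return 0;
}

/* Feed every entry to fn, forwarding cb untouched. */
static int toy_for_each_config(const struct toy_entry *entries, size_t nr,
			       toy_config_fn fn, void *cb)
{
	size_t i;

	for (i = 0; i < nr; i++)
		if (fn(entries[i].key, entries[i].value, cb) < 0)
			return -1;
	return 0;
}
```

Calling toy_for_each_config() once per repository, each time with that
repository as cb, fills two independent structs from two different
configurations.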

Building on these lessons, Olamide Bello during the Outreachy
program introduced struct repo_config_values [2], a structure
linked to struct repository that stores eagerly parsed configuration
values while preserving their startup-time error detection. An
accessor function repo_config_values() enforces safety by preventing
access from uninitialized repositories and guarding against access
from secondary repository instances that do not yet have their
config populated.
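
The guarding idea can be sketched like this. The names are
hypothetical, and returning NULL on a rejected access is an assumption
of this sketch. How the real accessor reacts is not described here:

```c
#include <stddef.h>

struct toy_config_values { int trust_executable_bit; };

struct toy_repo {
	int initialized;
	struct toy_config_values config_values;
};

/* Hypothetical: the one repository whose config was loaded at startup. */
static struct toy_repo *toy_primary_repo;

/* Returns the values, or NULL when they cannot be trusted yet. */
static struct toy_config_values *toy_repo_config_values(struct toy_repo *repo)
{
	if (!repo || !repo->initialized)
		return NULL; /* uninitialized repository */
	if (repo != toy_primary_repo)
		return NULL; /* secondary instance, config not populated */
	return &repo->config_values;
}
```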

So we now have two structs living inside struct repository:
repo_settings for lazily parsed variables, and repo_config_values
for eagerly parsed variables.

Approach
--------
I will follow the pattern established in Olamide Bello's approved
patch series [2], which provides the concrete workflow for each
variable:

1. Add a new field to struct repo_config_values in environment.h.
2. Initialize the field in repo_config_values_init().
3. Update the config callback: get cfg via
   repo_config_values(the_repository), write to cfg->field instead
   of the global.
4. Update all call sites: replace the global with cfg->field.
5. Remove the global from environment.c and the extern from
   environment.h.
6. Run tests and check fuzz targets.
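
Steps 1 to 4 can be sketched on a toy model as shown below. All toy_
names are hypothetical, the callback signature is simplified, and the
default value of 1 is chosen only for this example:

```c
#include <string.h>

/* Step 1: the new field lives in the per-repository struct. */
struct toy_config_values { int trust_executable_bit; };

struct toy_repo { struct toy_config_values config_values; };

/* Stand-in for the_repository. */
static struct toy_repo toy_the_repository;

static struct toy_config_values *toy_repo_config_values(struct toy_repo *repo)
{
	return &repo->config_values;
}

/* Step 2: give the field its default value. */
static void toy_config_values_init(struct toy_config_values *cfg)
{
	cfg->trust_executable_bit = 1;
}

/* Step 3: the config callback writes cfg->field instead of a global. */
static int toy_core_config(const char *key, const char *value)
{
	struct toy_config_values *cfg =
		toy_repo_config_values(&toy_the_repository);

	if (!strcmp(key, "core.filemode"))
		cfg->trust_executable_bit = !strcmp(value, "true");
	return 0;
}

/* Step 4: a call site reads cfg->field instead of the global. */
static int toy_mode_matters(void)
{
	return toy_repo_config_values(&toy_the_repository)->trust_executable_bit;
}
```

Step 5 then amounts to deleting the old global and its extern
declaration, which the compiler verifies: any call site missed in step
4 stops building.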

This workflow is not purely mechanical. Each variable requires
case-by-case analysis:

- Is the variable per-repository? Some variables like
  editor_program are user preferences. As Phillip Wood asked [7]:
  "Why would I want to use different editors for different
  repositories in the same process?" Variables where per-repo
  scoping does not make semantic sense may be better handled by
  localizing them to their subsystem.
- How deep is the call chain? As preparation for this proposal, I
  traced askpass_program end-to-end. It has a single reader in
  prompt.c, which looks simple. But git_prompt() is called from
  two paths: the credential system and the bisect system. The
  difficulty of a variable is not about reader count — it is
  about call chain depth.
- Are there initialization ordering constraints? Some variables
  like is_bare_repository_cfg are set during .git directory
  discovery, before struct repository is fully initialized.
  Moving them into the repository struct creates a chicken-and-egg
  problem that requires design discussion on the mailing list.

The macro #define USE_THE_REPOSITORY_VARIABLE, introduced by
Patrick Steinhardt [8], controls access to the global
the_repository. The macro serves both as a migration indicator and a
technical gate. When all globals in a file have been migrated
and all functions receive struct repository * explicitly,
the macro can be removed.
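
A single-file sketch of the gating mechanism, with a hypothetical
TOY_USE_THE_REPOSITORY_VARIABLE macro and toy_ names. In a real tree
the guarded declaration would sit in a shared header and the #define
in each .c file that still needs the global:

```c
struct toy_repo { unsigned hash_len; };

/* A file that still needs the global opts in explicitly. */
#define TOY_USE_THE_REPOSITORY_VARIABLE

/* What the shared header would contain: the global is only visible
 * to files that defined the macro. */
#ifdef TOY_USE_THE_REPOSITORY_VARIABLE
static struct toy_repo toy_repo_storage = { 20 };
static struct toy_repo *toy_the_repository = &toy_repo_storage;
#endif

/* Unmigrated function: deleting the #define above turns this into a
 * compile error, which is what makes the macro a technical gate. */
static unsigned toy_hash_len_legacy(void)
{
	return toy_the_repository->hash_len;
}

/* Migrated function: builds with or without the macro. */
static unsigned toy_hash_len(const struct toy_repo *repo)
{
	return repo->hash_len;
}
```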

Following Stolee's two-step migration model [9], I will first
move variables into repo_config_values using the_repository
(Step 1: safe, mechanical, no behavior change). For selected
variables with shallow call chains, I will also thread struct
repository *repo through callers to begin replacing direct
the_repository usage (Step 2).
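
The two steps, sketched on a hypothetical helper. The toy_ names and
the function itself are invented for illustration and do not
correspond to a specific Git function:

```c
struct toy_config_values { int trust_executable_bit; };
struct toy_repo { struct toy_config_values config_values; };

/* Stand-in for the_repository, with the setting enabled. */
static struct toy_repo toy_the_repository = { { 1 } };

/* Step 1: the value lives in the struct, but the function still
 * reaches for the global repository, so callers are unchanged. */
static int toy_mode_changed_step1(unsigned old_mode, unsigned new_mode)
{
	if (!toy_the_repository.config_values.trust_executable_bit)
		return 0;
	return old_mode != new_mode;
}

/* Step 2: the repository is threaded through as a parameter, so the
 * function no longer depends on the global at all. */
static int toy_mode_changed_step2(const struct toy_repo *repo,
				  unsigned old_mode, unsigned new_mode)
{
	if (!repo->config_values.trust_executable_bit)
		return 0;
	return old_mode != new_mode;
}
```

Step 1 keeps every caller's signature intact. Step 2 changes
signatures, which is why it is limited to shallow call chains.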

I propose a dual approach for organizing the work:

- Variable-focused migration: move environment.c globals into
  repo_config_values following Bello's pattern. This is the
  primary track. For each variable, I classify it, trace readers,
  migrate it, and remove the global.
- File-focused cleanup: for files where only a few the_repository
  usages remain after variable migration, complete the cleanup
  and remove USE_THE_REPOSITORY_VARIABLE entirely. This is a
  natural side effect of the first track.

Some variables may need a hybrid approach: when a variable is
used across many files but heavily concentrated in one subsystem,
it may make sense to migrate it alongside other globals in that
subsystem rather than in isolation.

The two tracks reinforce each other: migrating a variable often
removes the last reason a file needs the macro.

Timeline
--------
Project size: 175 hours.

Community Bonding (May 1 - May 25):
- Discuss project direction and design approaches with mentors.
- Study Olamide Bello's and Ayush Chandekar's patches in depth.
  Review remaining repo_config_values work and identify
  unfinished tasks.
- Identify and prioritize two main areas of work:
  + Variables in environment.c to migrate into repo_config_values.
  + Files where USE_THE_REPOSITORY_VARIABLE can be removed.
- Submit an RFC patch following Bello's pattern to validate
  the workflow before the coding period begins.

Coding Period (May 26 - August 16):
- Start with straightforward variables: those with few readers,
  clear per-repository semantics, and simple parsing logic
  (e.g., boolean flags and integer configs).
- Progressively move to more involved variables with deeper call
  chains, string-type values, or dependencies on other variables.
- Apply the dual approach described above:
  + Variable-focused migration: classify, trace, migrate, and
    remove globals following Bello's pattern.
  + File-focused cleanup: where variable migration removes the
    last global dependency in a file, complete the cleanup and
    remove USE_THE_REPOSITORY_VARIABLE.
- Submit small patch series (3-5 patches each) frequently to
  respect reviewers' time and maintain steady velocity.
- Maintain two parallel series: one in review and one being
  written, to account for review cycle delays.
- Continuously iterate: incorporate mailing list feedback,
  reroll patches (v2/v3), and refine the approach based on
  community input.
- Publish weekly or biweekly blog updates documenting progress
  and design decisions.

Final Period (August 17 - August 24):
- Address any remaining tasks or pending patches.
- Run full test suite with AddressSanitizer to verify no
  memory issues were introduced.
- Update internal documentation.
- Receive final feedback from mentors and reviewers.
- Prepare and submit the final project report.

A 30% buffer is built into the schedule to account for
unexpected review delays and design discussions.

Blogging
--------
I believe blogging is an important part of growing as a developer
and an effective way to learn, because writing forces you to
truly understand what you are working on.

I plan to publish weekly updates documenting my journey through this
project: progress, design decisions, challenges, and lessons
learned. I also want these posts to serve as a valuable resource
for anyone who, like me today, will look for guidance on
contributing to Git or to open source projects in general.

Availability
------------
Git will be my top priority. I have no other commitments
scheduled during the GSoC period, so I will be able to work on
this full-time. In fact, I plan to devote 35–40+ hours per week
to the Git project. My preferred working window is 9:00-18:00 CET.

Post-GSoC
---------
Contributing to Git has been an invaluable experience, not only on a
personal level, because it pushed me out of my comfort zone and
challenged me, but also, and above all, on a professional level. The
feeling of working on code used by millions of developers and
companies around the world is incredibly rewarding.

This iterative process of discussing, writing code, and receiving
feedback helps you grow tremendously, and quickly, as a developer.

Being exposed to a codebase like Git’s forces you to think much more
deeply, to understand how everything works and how it connects
to the rest of the program. For these reasons, I intend to continue
working on Git even after GSoC by contributing patches, participating
in discussions, and reviewing new members’ code.

Furthermore, this refactoring process is a long-term effort,
and I’d like to keep working on it.

References
----------
[1] https://github.com/frapaparatto/cgit
[2] https://lore.kernel.org/git/cover.1768217572.git.belkid98@gmail.com/
[3] https://lore.kernel.org/git/20250603131806.14915-1-ayu.chandekar@gmail.com/
[4] https://lore.kernel.org/git/17b7f51c-0c3d-4d63-a501-47ce829f7345@gmail.com/
[5] https://lore.kernel.org/git/xmqqbjquge0c.fsf@gitster.g/
[6] https://lore.kernel.org/git/d61c966b-61ae-4ba9-b983-c8dab6e2c292@gmail.com/
[7] https://lore.kernel.org/git/8e657184-ee0b-453a-9f2d-a98080d3582e@gmail.com/
[8] https://lore.kernel.org/git/cover.1718347699.git.ps@pks.im/
[9] https://lore.kernel.org/git/47d09c43-6d27-40ff-8dbc-22cc4a5949ed@gmail.com/

* [GSOC][PROPOSAL]: Refactoring in order to reduce Git’s global state
@ 2026-03-06 14:57 Shreyansh Paliwal
  2026-03-07 10:33 ` Christian Couder
  0 siblings, 1 reply; 8+ messages in thread
From: Shreyansh Paliwal @ 2026-03-06 14:57 UTC
  To: git
  Cc: christian.couder, karthik.188, jltobler, ayu.chandekar,
	siddharthasthana31

Hello all,

This is the first draft of my GSoC 2026 proposal for the project
'Refactoring in order to reduce Git’s global state'.

Doc version can be read at:
https://docs.google.com/document/d/16MRNUv6dJi6vtNvI5Ro0WmHf20dRRBHjFLpmhAuaUOA/edit?usp=sharing

Any feedback or suggestions would be greatly appreciated.

Thanks for reading.
---

Refactoring in order to reduce Git's global state

Personal Information:
---------------------

Name: Shreyansh Paliwal
Email: Shreyanshpaliwalcmsmn@gmail.com
Alternate Email: Shreyansh.01014803123@it.mait.ac.in
Mobile No.: +91-9335120023

Education: GGSIPU, New Delhi, India
Year: III / IV
Degree: Bachelor of Technology in Information Technology

Github: https://github.com/shreyp135
Time-zone: UTC +5:30 (IST)

About Me:
---------

I am Shreyansh Paliwal, a pre-final year undergraduate student at Guru
Gobind Singh Indraprastha University, New Delhi, India. I am a technology
enthusiast, who began programming in 2018 with Java as my first language
and later transitioned to C/C++ in 2023 as my primary focus. I enjoy
exploring new technologies and programming languages, and I have developed
solid experience building applications using TypeScript, React.js, Node.js,
and AWS. I actively participate in technical events and have organized
multiple hackathons, tech-fests, and related activities at my college as
the SIG-Head of IOSD, a tech-focused student community.

I started using Git in 2023, which is also when I made my first open-source
contribution to the Git project. I was a winner of Augtoberfest 2024, an
open-source competition organized by C4GT India. Over the past several
months, I have been involved with the Git project, studying the codebase,
submitting patches, and incorporating review feedback. I am motivated to
improve the experience of Git for end users, and this project is an
excellent opportunity to continue that work.

Overview:
---------

Git relies heavily on global state for managing environment variables and
configuration data. In particular, many parts of the codebase depend on the
global struct repository instance, the_repository, which represents the
currently active repository. Instead of passing a repository instance
explicitly, several internal functions implicitly rely on this global
object. Additionally, various configuration derived values and
environment-related variables such as the_hash_algo, default_abbrev, and
comment_line_str are stored globally, most of them defined in
environment.c.

This design assumes that only one repository is active within a process at
a time. As a result, the repository state becomes shared across the entire
process, weakening isolation and making behavior implicitly dependent on
global context. Such global dependencies make the code harder to reason
about, test, and maintain, and can introduce subtle bugs when operations
interact with multiple repositories. They also limit long-term goals such
as safely supporting multiple repositories within a single process and
continuing Git’s ongoing libification efforts.

To address these issues, global environment and configuration state should
be refactored into better-scoped contexts. Repository-specific data can be
moved into struct repository or related structures, while
subsystem-specific state should be localized appropriately. Passing
repository instances explicitly through function interfaces will improve
modularity, reduce hidden dependencies, and make the codebase easier to
maintain while moving Git closer to supporting multiple repositories safely
within a single process.

The difficulty of this project is medium, and it is estimated to take 175
to 350 hours.

Pre-GSOC:
---------

I first explored the Git codebase in December 2023, when I submitted a
small patch fixing the wording of an error message that I noticed while
browsing the source code. At that time I had recently started using Git and
GitHub for version control in my projects, which sparked my curiosity about
how Git works internally.

A few months ago, when I had some free time from college, I decided to
start contributing to Git more actively. I built Git from source, read
parts of the documentation, and familiarized myself with the mailing list
workflow. While going through the documentation, I noticed a few
inconsistencies in the MyFirstContribution page and submitted patches to
fix them. I also completed a microproject involving a test cleanup, and
later worked on adding a warning for a quiet fallback.

During this process, I attempted to remove the usage of the_repository from
a file. However, after discussion on the mailing list, Phillip pointed out
that the change was not particularly useful in that context and could
introduce segfaults that would not justify the effort for builtin code.
Based on this feedback, I dropped that attempt and instead focused on
understanding the broader global state refactoring effort. To better
understand the project area, I studied previous patches and blog posts by
Ayush Chandekar and Olamide Bello, followed discussions on the mailing
list, and explored parts of the codebase such as the wt-status and worktree
subsystems. This helped me understand the ongoing effort to reduce Git’s
reliance on global state and motivated me to work further in this area.

The following is a list of my contributions, ordered from earliest to most
recent:

Patches for Git:
----------------

* test-lib-functions.sh: fix test_grep fail message wording
        Status: Merged into master
        Mailing List: https://lore.kernel.org/git/20231203171956.771-1-shreyanshpaliwalcmsmn@gmail.com/
        Merge Commit: 37e8d795bed7b93d3f12bcdd3fbb86dfe57921e6
        Log: This was my first patch to Git in 2023. While browsing the
                 source code and past issues, I noticed that even after
                 the test_i18ngrep function was deprecated, an error message
                 still referring to test_i18ngrep was left behind. I updated
                 the wording to correctly reference test_grep.

* doc: MyFirstContribution: fix missing dependencies and clarify build steps
        Status: Merged into master
        Mailing List: https://lore.kernel.org/git/20260112195625.391821-1-shreyanshpaliwalcmsmn@gmail.com/
        Merge Commit: 81021871eaa8b16a892b9c8791a0c905ab26e342
        Log: While getting familiar with the codebase, I followed the
                 MyFirstContribution documentation and encountered a few
                 issues. Some include headers were missing, the synopsis
                 format was incorrect, and the explanation for -j$(nproc)
                 was absent. I submitted fixes to improve the clarity and
                 correctness of the documentation.

* t5500: simplify test implementation and fix git exit code suppression (Microproject)
        Status: Merged into master
        Mailing List: https://lore.kernel.org/git/20260121130012.888299-1-shreyanshpaliwalcmsmn@gmail.com/
        Merge Commit: a824421d3644f39bfa8dfc75876db8ed1c7bcdbf
        Log: This was completed as a microproject for GSoC. Instead of 
                constructing the pack protocol using a complex combination
                of here-docs and echo commands, the patch captures command
                outputs beforehand and uses the test-tool pkt-line pack
                helper to construct the protocol input in a temporary file
                before feeding it to git upload-pack.

* show-index: add warning and wrap error messages with gettext
        Status: Merged into master
        Mailing List: https://lore.kernel.org/git/20260130153603.290196-1-shreyanshpaliwalcmsmn@gmail.com/
        Merge Commit: ea39808a22714b8f61b9472de7ef467ced15efea,
                227e2cc4e1415c4aeadceef527dd33e478ad5ec3
        Log: While exploring the code, I noticed a TODO comment suggesting
                automatic hash detection. After discussion on the mailing
                list, it was concluded that there was no future-proof
                approach to implement this until a new index file format
                came into use. Instead, an explicit warning was added rather
                than silently falling back to SHA-1. Additionally, several
                error messages were missing gettext wrapping, which was also
                fixed.

* wt-status: reduce reliance on global state
        Status: Merged into seen
        Mailing List: https://lore.kernel.org/git/20260218175654.66004-1-shreyanshpaliwalcmsmn@gmail.com/
        Merge Commit: a7cd24de0b3b679c16ae3ee8215af06aeea1e6a3,
                9d0d2ba217f3ceefb0315b556f012edb598b9724,
                4631e22f925fa2af8d8548af97ee2215be101409
        Log: This has been the most significant patch series in my journey
                so far. It began with a suggestion from Phillip to clean up
                some the_repository usages in wt-status.c. I extended the
                effort to remove all usages of the_repository and
                the_hash_algo from the file. During review discussions, it
                was suggested that some worktree API cleanup should happen
                first, particularly regarding the representation of worktrees
                as NULL. Some related changes were later moved to a separate
                series, after which this refactoring proceeded.

* worktree: change representation and usage of primary worktree
        Status: Continued by Phillip Wood [1]
        Mailing List: https://lore.kernel.org/git/20260213120529.15475-1-shreyanshpaliwalcmsmn@gmail.com/
        Log: This worktree API cleanup series started while I was working
                on wt-status. The intention was to modify the representation
                of the current worktree so that struct worktree would not be
                NULL. During discussion, Phillip clarified that NULL actually
                represents the current worktree rather than the primary
                worktree. Since Phillip already had a patch based on the right
                logic, he continued the series and it was eventually merged
                into master.

* tree-diff: remove the usage of the_hash_algo global
        Status: Merged into master
        Mailing List: https://lore.kernel.org/git/20260220175331.1250726-1-shreyanshpaliwalcmsmn@gmail.com/
        Merge Commit: 1e50d839f8592daf364778298a61670c4b998654
        Log: This was a straightforward patch that removed the remaining
                usages of the global the_hash_algo in tree-diff.c by using the
                repository’s local instance instead.

* send-email: UTF-8 encoding in subject line
        Status: Merged into seen
        Mailing List: https://lore.kernel.org/git/20260228112210.270273-1-shreyanshpaliwalcmsmn@gmail.com/
        Merge Commit: c52f085a477c8eece87821c5bbc035e5a900eb12
        Log: This patch was motivated by an issue I personally encountered
                while sending a GSoC discussion email [2]. Initially the
                change only modified the wording of the prompt, but after
                discussion on the mailing list it was extended to include
                proper validation to prevent invalid charset encodings from
                being used in git send-email and to reduce confusion.

* Remove global state from editor.c
        Status: Waiting for further feedback
        Mailing List: https://lore.kernel.org/git/20260301105228.1738388-1-shreyanshpaliwalcmsmn@gmail.com/
        Log: This was based on my question about localizing editor_program
                in editor.c [2]. The patch received mixed feedback from
                contributors and is currently awaiting additional guidance
                from the mentors and/or the maintainer regarding the
                appropriate direction.

Patches for git.github.io:
--------------------------

* SoC-2026-ideas: Remove an extra backtick
        Status: Merged into master
        PR Link: https://github.com/git/git.github.io/pull/831
        Merge Commit: c1e4aa87a54430953eaa7355061139fdf1ff6796
        Log: Minor typo fix.

* rn-132: fixed 2 typos
        Status: Merged into master
        PR Link: https://github.com/git/git.github.io/pull/832
        Merge Commit: 92876114d855d472ce2e0e5337e72a4b97b81681
        Log: Fixed typos in Git Rev News Edition 132.

I have also been involved in additional discussions on the Git mailing
list [3][4][5][6].

History / Background:
--------------------

Efforts to reduce Git’s reliance on global state started when several Git
subsystems began moving toward libification, where Git’s internal
functionality could be reused as a library. Early examples of this
direction include major patch series such as the libification of git
mailinfo by Junio [7] and git apply by Christian [8]. These large patch
series exposed the limitations of relying on process-wide global state and
highlighted the need for better encapsulation of repository-related data.

One important step in this direction was the introduction of struct
repository, through refactoring work by Stefan Beller [9] and Brandon
Williams [10]. The motivation behind this structure was to centralize
repository-related state instead of relying on scattered global variables.
This change improved code clarity and made it easier to reason about Git’s
internal behavior. It also laid the groundwork for future improvements such
as safer multithreading and the possibility of handling submodules within
the same process. Later, additional refactoring work by Patrick further
removed reliance on the global the_repository in config [11] and path [12]
subsystems. As part of this work, several variables were consolidated into
environment.c from config.c so that environment-related state could be
managed in a single location [13]. The macro #define
USE_THE_REPOSITORY_VARIABLE was also introduced to help transition code
away from implicit global repository access [14].

This project area was further explored during GSoC 2025 by Ayush Chandekar
[15], who continued removing usages of the_repository across different parts
of the codebase and relocated several global configuration variables (such as
core_preload_index and merge_log_config) into repository-scoped structures.
More recently, Olamide Bello, during the Outreachy program, made significant
progress in improving how configuration values are stored [16] [17]. His work
introduced a new structure, repo_config_values, which stores repository
specific configuration values, linked to struct repository. This allows
configuration values to be associated with a specific repository instance
rather than stored globally. Along with this, a private structure
config_values_private was added to support initialization and internal
handling of these values. During discussions around these changes, an
important design consideration also emerged: moving global variables directly
into repository structures or introducing lazy loading helpers can lead to
user experience regressions if configuration errors are detected later.

These efforts collectively form the foundation of the ongoing work to
gradually remove Git’s reliance on global state and move toward a more
modular, repository-scoped architecture.

Proposed Plan:
-------------

I started exploring the codebase by browsing relevant files and identifying
global variables by temporarily removing the USE_THE_REPOSITORY_VARIABLE
macro. My primary focus was on core library files rather than builtin code
[18]. Through this exploration, I observed that a large number of files still
depend on the_repository.

To tackle this project systematically, I propose classifying these files into
two categories:

1. Files using the_repository or the_hash_algo where a repository instance
   already exists: These files rely on global variables even though a
   struct repository instance is available somewhere in the call stack. In
   such cases, the refactor primarily involves passing the repository
   instance through the function call stack and replacing the global
   usages. In some cases, a repository instance may not be directly
   available in the file itself. In those situations, I will trace the
   callers and propagate repository instances from higher levels in the call
   hierarchy. Examples of such files include alias.c, archive*.c,
   walker.c, and xdiff-interface.c. These cases generally require
   localized refactoring and are good candidates for incremental patches.

2. Files relying on other global variables defined in environment.c: Some
   files rely on additional global variables which are parsed and accessed
   through environment.c. In these cases, there is no existing
   repository-scoped instance, which makes refactoring more involved.
   Examples include wt-status.c (default_abbrev, comment_line_str) and
   apply.c (has_symlinks, ignore_case, trust_executable_bit,
   apply_default_whitespace, apply_default_ignorewhitespace). For such
   variables, I plan to evaluate whether they should be moved into a
   repository-scoped structure (e.g., repo_settings, repo_config_values)
   or instead be localized and passed explicitly where needed. The
   appropriate approach will depend on how widely the variable is used and
   whether per-repository scoping makes sense for it from a
   multi-repository standpoint.
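
A minimal sketch of the first category, using hypothetical toy_ names
rather than Git's real structures. It shows a leaf helper that reads a
process-wide hash algorithm pointer, and the same helper after the
repository has been passed down through its caller:

```c
#include <stddef.h>

struct toy_hash_algo { size_t rawsz; };
struct toy_repo { const struct toy_hash_algo *hash_algo; };

static const struct toy_hash_algo toy_sha1 = { 20 };
static const struct toy_hash_algo toy_sha256 = { 32 };

/* Before: the leaf helper reads a process-wide pointer. */
static const struct toy_hash_algo *toy_the_hash_algo = &toy_sha1;

static size_t toy_entry_size_global(size_t name_len)
{
	return toy_the_hash_algo->rawsz + name_len;
}

/* After: the leaf takes the repository explicitly ... */
static size_t toy_entry_size(const struct toy_repo *repo, size_t name_len)
{
	return repo->hash_algo->rawsz + name_len;
}

/* ... and its caller passes the instance down the call chain. */
static size_t toy_tree_size(const struct toy_repo *repo,
			    const size_t *name_lens, size_t nr)
{
	size_t total = 0, i;

	for (i = 0; i < nr; i++)
		total += toy_entry_size(repo, name_lens[i]);
	return total;
}
```

With the parameter in place, a SHA-1 and a SHA-256 repository can be
handled in the same process without either one changing a shared
pointer.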

I plan to begin with the first category, addressing straightforward
refactors file by file. In parallel, I will analyze and work on specific
groups of global variables from the second category, designing appropriate
repository-scoped replacements.

The end goal is to remove reliance on global state and eventually eliminate
the USE_THE_REPOSITORY_VARIABLE macro from these files.

Project Timeline:
----------------

* Community Bonding (Until May 24):
        - Discuss the project direction and design approaches with mentors.
        - Identify and prioritize two main areas of work:
                + files that rely on the_repository.
                + global variables defined in environment.c.
        - Study the previous patches by Olamide Bello and Ayush in depth and
                 discuss their approaches and challenges with them.
        - Interact with all the people involved in this work to better
                 understand design decisions and potential pitfalls.
        - Experiment with small RFC patches, if needed, to validate approaches.

* Coding period (May 25 - August 16):
        - Review the work done by Olamide Bello on moving values parsed by
                 git_default_config() into the repo_config_values structure and
                 identify any remaining tasks.
        - Complete remaining cleanup or refactoring related to the worktree API,
                 if any is left [19].
        - Identify straightforward refactors to remove usages of the_repository
                 in files such as xdiff-interface.c, archive*.c, fsmonitor*.c etc.
        - Work file by file with the goal of eliminating
                 #define USE_THE_REPOSITORY_VARIABLE by replacing global usages
                 with explicit repository instances.
        - Concurrently maintain at least two parallel patch series:
                + Small / straightforward refactors that replace globals such
                         as the_hash_algo or the_repository.
                + Larger structural refactors involving globals such as
                         DEFAULT_ABBREV, comment_line_str etc.
        - Publish weekly or biweekly blog updates documenting progress and design
                 decisions.

* Final week (August 17 - August 24):
        - Address any remaining tasks or pending patches.
        - Receive final feedback from mentors and reviewers.
        - Prepare a detailed report summarizing the work completed during the project.

Blogging:
---------

I believe blogging is an important part of any open-source project. It
helps others understand the ongoing work and also enables the contributor
to develop a deeper understanding and keep better track of their own
progress. I experienced this firsthand: early in my journey I was unsure
about various aspects, but reading the blogs of Ayush and Olamide Bello
gave me valuable insight into the contributor perspective and their overall
work.

With the goal of helping future contributors in a similar way, I plan to
document my journey and project progress through regular blog posts. I will
publish updates on a weekly or biweekly basis, depending on the amount of
meaningful progress made. I have set up my blogging area on Medium, and my
posts will be available at [20].


Availability:
-------------

The main coding period runs from June to August. Most of June and July
coincide with my summer vacation, which allows me to dedicate significant
time to the project. My final exams are scheduled for May and will last
approximately one week, but they will be completed before the coding period
begins and should not affect my availability.

During June and July, I will be able to dedicate around 40 hours per week to
the project. In August, when my regular semester resumes, I expect to
contribute approximately 25–30 hours per week.

I do not have any other exams, internships, or planned vacations during the
coding period. Apart from this project, I have no other major commitments
for the summer.

I will keep the community regularly updated on my progress throughout the
project. My primary mode of communication will be email, and I will also be
available for calls or meetings if/when required. My preferred availability
window is 13:00–19:00 UTC.

Post GSoC:
----------

Being part of the Git community and contributing to the codebase has been a
very valuable experience for me. The process of understanding Git’s internals,
submitting patches, and receiving feedback on the mailing list has helped me
grow significantly as a developer. The feeling of working on code that is used
by millions of developers and companies around the world is very rewarding.

I plan to remain involved with the Git community even after GSoC by continuing
to contribute patches, review code, and participate in discussions to help make
Git better for end users. The work on refactoring Git’s global state is part of
a long-term effort, and I would love to continue working on it beyond the GSoC
timeline.

I would also be happy to mentor, co-mentor, or volunteer in the future to help
new and upcoming contributors whenever I get the chance. I see GSoC as the
starting point of a long-term relationship with the Git community.

Closing & Appreciation:
-----------------------

I would like to thank the Git community for the excellent documentation and the
welcoming environment. I am also grateful for the patience and guidance shown
in the feedback and discussions on the mailing list by Junio, Phillip, Karthik,
Ben, and others, which have helped me improve my understanding and contributions.

I also read blogs and proposals by Ayush, Lucas, Kousik Sanagavarapu, and Olamide
Bello, which provided valuable insights and helped shape my approach to contributing.

Thank you for reviewing my proposal :)

References:
-----------

[1]- https://lore.kernel.org/git/cover.1771511192.git.phillip.wood@dunelm.org.uk/

[2]- https://lore.kernel.org/git/20260304145823.189440-1-shreyanshpaliwalcmsmn@gmail.com/T/#m65b9b4547036991a7b7f3c861b9663428891f588

[3]- https://lore.kernel.org/git/20260114143238.536312-1-shreyanshpaliwalcmsmn@gmail.com/

[4]- https://lore.kernel.org/git/20260115211609.17420-1-shreyanshpaliwalcmsmn@gmail.com/

[5]- https://lore.kernel.org/git/20260204111343.71975-1-shreyanshpaliwalcmsmn@gmail.com/

[6]- https://lore.kernel.org/git/20260205131132.44282-1-shreyanshpaliwalcmsmn@gmail.com/

[7]- https://lore.kernel.org/git/1444778207-859-1-git-send-email-gitster@pobox.com/

[8]- https://lore.kernel.org/git/20160511131745.2914-1-chriscool@tuxfamily.org/

[9]- https://lore.kernel.org/git/20180205235508.216277-1-sbeller@google.com/

[10]- https://lore.kernel.org/git/20170531214417.38857-1-bmwill@google.com/

[11]- https://lore.kernel.org/git/cover.1715339393.git.ps@pks.im/

[12]- https://lore.kernel.org/git/20250206-b4-pks-path-drop-the-repository-v1-16-4e77f0313206@pks.im/

[13]- https://lore.kernel.org/git/20250717-pks-config-wo-the-repository-v1-20-d888e4a17de1@pks.im/

[14]- https://lore.kernel.org/git/cover.1718347699.git.ps@pks.im/

[15]- https://ayu-ch.github.io/2025/08/29/gsoc-final-report.html

[16]- https://cloobtech.hashnode.dev/week-5-and-6-design-reviews-rfcs-and-refining-the-path-forward

[17]- https://lore.kernel.org/all/cover.1771258573.git.belkid98@gmail.com/

[18]- https://lore.kernel.org/git/7b5dd0c4-0ca0-458e-89db-621a70dac9ae@gmail.com/

[19]- https://lore.kernel.org/git/20260217163909.55094-1-shreyanshpaliwalcmsmn@gmail.com/

[20]- https://medium.com/@shreyanshpaliwal18
