From: Tian Yuchen <a3205153416@gmail.com>
To: git@vger.kernel.org
Cc: Christian Couder <christian.couder@gmail.com>,
	Karthik Nayak <karthik.188@gmail.com>,
	Justin Tobler <jltobler@gmail.com>,
	Ayush Chandekar <ayu.chandekar@gmail.com>,
	Siddharth Asthana <siddharthasthana31@gmail.com>
Subject: [GSoC][Draft Proposal V2] Refactoring in order to reduce Git's global state
Date: Mon, 23 Feb 2026 09:07:23 +0800
Message-ID: <1bbafedb-b87b-4f1c-bce3-59089ac1ff8b@gmail.com>
In-Reply-To: <ab45758c-fbcf-42b2-96df-030eef8526c3@gmail.com>

Hello everyone,

I'm Tian Yuchen, and I'm planning to apply for a GSoC project this year.
I hope you can take the time to review my proposal.

Please feel free to leave feedback!

Google Docs link:

https://docs.google.com/document/d/1t2sznOvnPz-9tOzVMH--pLxzRqYSJCFzqVWBVfL_NP8/edit?tab=t.0#heading=h.c3c40ftj1ilv




Refactoring in order to reduce Git's global state
=================================================

PERSONAL INFORMATION
--------------------
Name: Tian Yuchen
E-mail: a3205153416@gmail.com
Phone number: +65 98740318
Time-zone: UTC + 08:00
Github: https://github.com/malon7782

Education: NTU, Singapore
Year: Year 1 semester 2
Degree: Electrical and Electronic Engineering (EEE)


PRE GSOC
--------
I have always held a deep passion for the open-source community.
Although I am not a computer science major, I tinkered with open-source
projects long before college. I have solid hands-on experience in C
programming and system-level debugging.

I use Ubuntu 24.04 on a daily basis, so I am proficient in using the 
Linux command line and CLI tools.

I have contributed to the Git community by sending patches. Since my
first commit (17 January 2026), I have contributed almost daily.
Here is the list of contributions I have made:

* [PATCH v1] t1005: modernize "! test -f" to "test_path_is_missing"
   https://lore.kernel.org/git/20260117062515.319664-1-a3205153416@gmail.com/
   This patch is my microproject, the first contribution I made to the
   codebase.
   [Graduated to 'master']

* [PATCH v2] t2203: avoid masking exit codes in git status
   https://lore.kernel.org/git/20260118043537.338769-1-a3205153416@gmail.com/#t

* [PATCH v2] symlinks: use unsigned int for flags
   https://lore.kernel.org/git/20260120152219.398999-1-a3205153416@gmail.com/
   [Will merge to 'next']

* [PATCH v4] t/perf/p3400: speed up setup using fast-import
   https://lore.kernel.org/git/20260130170123.642344-1-a3205153416@gmail.com/
   [Will merge to 'master']

* Re: [PATCH] [RFC] attr: use local repository state in read_attr
   https://lore.kernel.org/git/cc2f400e-49c2-4de0-9c51-9a5c0294735e@gmail.com/
   Code review. To verify the performance loss, I wrote a test script to
   measure the time difference before and after the modification.

* Re: Bug: git add :!x . exits with error when x is in .gitignore
   https://lore.kernel.org/git/1d560aa1-d452-47f5-aaf2-4cb1ccdab100@gmail.com/
   Code review. Pointed out a logical error.

* [PATCH v10] setup: allow cwd/.git to be a symlink to a directory
   https://lore.kernel.org/git/20260220164512.216901-1-a3205153416@gmail.com/
   In progress.
   After over half a month of discussions, repeated refactoring, and code
   reviews, I delved deep into setup.c. I gained insights into Git's
   design philosophy and learned the art of striking a balance in
   developer communication. It took a great deal of time and effort to
   thoroughly understand every line of the code, and I often found myself
   poring over the call chain of a single function well into the night.
   But I persevered until the end, and I believe my patience will see me
   through even larger projects.


ABOUT THE PROJECT
-----------------

-- Synopsis

As far as I know, the Git community is actively working towards 
'libification' - making Git's internal machinery reusable as a C 
library. The extensive reliance on global state is a major roadblock to 
this goal.

Many core functions implicitly read environment variables and store them 
in global static variables. This can cause several issues:

   1. Global variables prevent Git's core functions from being executed
      safely in multi-threaded contexts.
   2. When Git is called multiple times within the same process, global
      state can lead to memory leaks or incorrect behavior.
   3. Unit testing becomes difficult because the environment must be
      artificially manipulated before calling functions.

Take a look at this example from environment.c:

     const char *get_commit_output_encoding(void)
     {
         return git_commit_encoding ? git_commit_encoding : "UTF-8";
     }

Suppose Git is used as a C library by a multi-threaded server:
- Thread A formats a commit for Repo A (using GBK);
- Thread B concurrently formats a commit for Repo B (using UTF-8).

Both threads then read and overwrite the same global
`git_commit_encoding` pointer, so one thread can end up using the other
repository's encoding. Therefore, we have to refactor these environment
variables by moving them from global scope into a well-defined and
encapsulated context.
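
To make the hazard concrete without threads, here is a minimal
single-threaded sketch. It is not Git's actual code: `struct repo_sketch`
and every function name below are hypothetical stand-ins, and the static
pointer only mimics how `git_commit_encoding` is shared. It contrasts a
process-wide slot with a value carried by the caller's context.

```c
#include <assert.h>
#include <stddef.h>

/* Mimics the current design: one slot shared by the whole process. */
static const char *global_commit_encoding;

void global_set_commit_encoding(const char *name)
{
	global_commit_encoding = name;
}

const char *global_commit_output_encoding(void)
{
	return global_commit_encoding ? global_commit_encoding : "UTF-8";
}

/* Hypothetical, simplified stand-in for `struct repository`. */
struct repo_sketch {
	const char *commit_encoding; /* NULL means "not configured" */
};

/* Context-passing variant: the answer depends only on the argument. */
const char *repo_commit_output_encoding(const struct repo_sketch *repo)
{
	assert(repo != NULL);
	return repo->commit_encoding ? repo->commit_encoding : "UTF-8";
}
```

With the global version, loading Repo A's setting (GBK) and then Repo B's
(unset) means a later lookup on behalf of Repo A falls back to "UTF-8".
With the context version, each `struct repo_sketch` keeps its own answer.
Real threads add a data race on top of this ordering problem.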


-- Approach

The task at hand can be summed up in one sentence: repackage the global
variables into the `struct repository` structure. In other words:

     [ Current ]
     Core functions --------reads-------> Global variables (via getenv)
                                          [Thread unsafe]

     [ Target ]
     Core functions ----passes context--> struct repository
                                                 | owns
                                                 v
                                          struct git_env

Although the principle is simple, the scope of changes is extensive. The
following three-step approach can serve as a guide:

   1. Identify isolated environment variables currently residing in the
      global scope. Introduce a dedicated structure to hold these states,
      e.g. `struct git_env` within the `struct repository`.
   2. Modify the function signatures within the call chain to accept the
      context, e.g., `struct repository *repo`, instead of relying on
      implicit globals. External callers of the functions must be
      carefully audited to prevent regressions.
   3. Safely remove the old global variables and macro definitions. Tools
      such as AddressSanitizer can be helpful to ensure that the new
      struct-based lifecycle introduces zero memory leaks.
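
The three steps can be sketched in miniature as follows. This is an
illustration under assumptions, not a proposed patch: the
`struct repository` here is a stripped-down stand-in for Git's real one,
the single field in `struct git_env` is only an example, and all
`repo_env_*` and `repo_get_*` function names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Step 1: a per-repository home for state that used to be global. */
struct git_env {
	char *commit_encoding; /* owned copy; NULL when unset */
};

/* Simplified stand-in; the real struct repository has many more fields. */
struct repository {
	struct git_env env;
};

void repo_env_init(struct repository *repo)
{
	assert(repo != NULL);
	repo->env.commit_encoding = NULL;
}

/* Stores a private copy of `name`. Returns 0 on success, -1 on failure. */
int repo_env_set_commit_encoding(struct repository *repo, const char *name)
{
	size_t len;
	char *copy;

	assert(repo != NULL && name != NULL);
	len = strlen(name) + 1;
	copy = malloc(len);
	if (!copy)
		return -1;
	memcpy(copy, name, len);
	free(repo->env.commit_encoding); /* drop any previous value */
	repo->env.commit_encoding = copy;
	return 0;
}

/* Step 2: the accessor receives its context explicitly. */
const char *repo_get_commit_output_encoding(const struct repository *repo)
{
	assert(repo != NULL);
	return repo->env.commit_encoding ? repo->env.commit_encoding : "UTF-8";
}

/* Step 3: the state is released together with its owner. */
void repo_env_clear(struct repository *repo)
{
	assert(repo != NULL);
	free(repo->env.commit_encoding);
	repo->env.commit_encoding = NULL;
}
```

Because every allocation is tied to one `struct repository`, pairing
`repo_env_init()` with `repo_env_clear()` gives a leak checker a clear
lifecycle to verify, and two repositories in one process no longer share
any encoding state.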

Additionally, given the anticipated high volume of commits, each patch
must be independent and atomic, so that no intermediate state of the
codebase contains a bug that is hard to trace or explain.


AVAILABILITY
------------
Fortunately, my summer vacation coincides with the GSoC work period.
I will treat this project as my primary focus, dedicating a minimum of
35 hours per week. If needed, I can work a 9-to-5 schedule.

I will have a significant head start to draft RFC patches before the
official coding period even begins. Having this buffer period allows me
to go through the rigorous code review process within the Git community
with greater ease.


TIMELINE & MILESTONES
---------------------
This project differs from the others on the idea list, so rather than
hoarding massive changes, I will frequently submit short series of 3 to
5 patches to respect reviewers' time and maintain a steady velocity.

Below is the tentative schedule I have prepared for myself:

* Community Bonding (May 1 - May 25): Planning & RFC
   - May 1 - May 7: Wrap up university finals. Discuss and finalize the
     prioritized list of subsystems with my mentor.
   - May 8 - May 25: Define the core context container. Draft and submit
     the initial RFC patch series for this new data structure.

* Phase 1 (May 26 - July 10): Foundation
   - Weeks 1-2: Plumb the context pointer (`struct repository *repo`)
     through call chains for simple variables (e.g., boolean flags or
     integer configs).
   - Weeks 3-4: Audit and update external callers to use the new API.
   - Weeks 5-6: Submit the first major refactoring patch series. Address
     mailing list feedback and resolve merge conflicts. (Midterm Evaluation)

* Phase 2 (July 11 - August 18): Complex Migration & Cleanup
   - Weeks 7-8: Refactor higher-complexity variables (e.g., path-related
     globals).
   - Weeks 9-10: Compile the codebase with AddressSanitizer and run the
     full test suite to execute strict memory leak checks.
   - Weeks 11-12: Remove unused global macro definitions and static
     variables. Update internal documentation and write the final GSoC
     report.

(The above is for reference only. Personally, I always finish tasks 
faster than planned ;)


~$ git checkout HEAD@{postGSoC}
-------------------------------
This past month since joining the Git community has been the most 
enjoyable month of my programming journey. To quote a close friend of 
mine (who is applying for the Neovim GSoC project):

   "Only fools chase trends; open source is the game for the brave."

The words may be blunt, but the logic holds true. This statement surely
resonates with me (and maybe many other GSoC contributors): our passion
for code and open-source drives us forward.

Even if I don't make the cut, so what? ~$ git reset --hard...
Just kidding. The Git codebase is far too interesting to abandon now.

-------------------------------------------------------------------------
Changes since V1:

  - Transferred the text from Google Docs into this email.



Regards,

Yuchen
