From: Phillip Wood <phillip.wood123@gmail.com>
To: Tian Yuchen <a3205153416@gmail.com>, git@vger.kernel.org
Cc: Christian Couder <christian.couder@gmail.com>,
Karthik Nayak <karthik.188@gmail.com>,
Justin Tobler <jltobler@gmail.com>,
Ayush Chandekar <ayu.chandekar@gmail.com>,
Siddharth Asthana <siddharthasthana31@gmail.com>,
phillip.wood@dunelm.org.uk
Subject: Re: [GSoC][Draft Proposal v4] Refactoring in order to reduce Git's global state
Date: Sun, 1 Mar 2026 16:43:03 +0000
Message-ID: <eecd6531-a7b5-4f0e-8e4d-3807f47d1f9d@gmail.com>
In-Reply-To: <0a944142-7c51-4143-af00-2a5798ea68af@gmail.com>
Hi Tian
On 27/02/2026 16:58, Tian Yuchen wrote:
>
>
> 3. Unit testing becomes difficult because the environment must be
> artificially manipulated before calling functions.
>
> Take a look at this example from environment.c:
>
> const char *get_commit_output_encoding(void)
> {
> 	return git_commit_encoding ? git_commit_encoding : "UTF-8";
> }
>
> If Git is invoked as a C library by a multi-threaded server:
> - Thread A formats a commit for Repo A (using GBK);
> - Thread B concurrently formats a commit for Repo B (using UTF-8);
The encoding config is really a user preference that lets the user
compose commit messages in their preferred encoding while allowing git
to store the message encoded as UTF-8. I'm struggling to see why two
threads would be using different encodings as it implies that the user
is using different encodings in different repositories.
Below you say
> Variables parsed at startup (e.g., editor_program)
> must not be moved to lazily parsed structs to ensure that
> invalid configurations can trigger early failures before
> execution proceeds too far, which is also for the sake of user
> experience.
i18n.commitEncoding is another such setting, as it is currently parsed
eagerly, so I'm surprised to see it being converted to lazy parsing in
https://lore.kernel.org/20260228190201.3684705-1-a3205153416@gmail.com

I'm afraid that the suggestion on the project webpage is not very
helpful. Most config variables are unsuited to a conversion based on
repository_settings; it would be better to look at the approach
implemented in
https://lore.kernel.org/48821a3848bef25c13038be8377ad73e7c17a924.1771258573.git.belkid98@gmail.com
that is discussed in https://lore.kernel.org/xmqqwm1vk83a.fsf@gitster.g
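
To make the eager/lazy distinction concrete, here is a minimal sketch in
C. Every name in it (demo_settings, demo_init_eager, demo_init_lazy,
demo_get_encoding_lazy) is hypothetical, and the validator is a toy that
only accepts two names; none of this is git's actual config code. It
only shows where a bad value gets noticed under each scheme.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical settings holder; NOT git's struct repo_settings. */
struct demo_settings {
	const char *raw_encoding; /* value as read from config, may be NULL */
	const char *encoding;     /* validated value, NULL until parsed */
};

/* Toy validator (assumption): accepts only these two names. */
static int demo_encoding_is_valid(const char *name)
{
	return !strcmp(name, "UTF-8") || !strcmp(name, "GBK");
}

/* Shared parse step: default to UTF-8, reject unknown names. */
static int demo_parse(struct demo_settings *s)
{
	const char *name = s->raw_encoding ? s->raw_encoding : "UTF-8";

	if (!demo_encoding_is_valid(name))
		return -1;
	s->encoding = name;
	return 0;
}

/* Eager: runs once at startup, so a bad value fails before any work. */
static int demo_init_eager(struct demo_settings *s, const char *raw)
{
	assert(s != NULL);
	s->raw_encoding = raw;
	s->encoding = NULL;
	return demo_parse(s);
}

/* Lazy: startup cannot fail; nothing is checked yet. */
static void demo_init_lazy(struct demo_settings *s, const char *raw)
{
	assert(s != NULL);
	s->raw_encoding = raw;
	s->encoding = NULL;
}

/* Lazy getter: a bad value is only discovered here, at first use. */
static const char *demo_get_encoding_lazy(struct demo_settings *s)
{
	assert(s != NULL);
	if (!s->encoding && demo_parse(s) < 0)
		return NULL;
	return s->encoding;
}
```

With the eager variant the caller can refuse to start when
demo_init_eager() returns -1. With the lazy variant the same bad value
goes unnoticed until demo_get_encoding_lazy() returns NULL, which may be
well after the command has started doing work.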
Thanks
Phillip
> Then they will race to read and overwrite the exact same global
> `git_commit_encoding` pointer, which is not what we expect. Therefore,
> we have to refactor these environment variables by moving them from
> global scope into a well-defined and encapsulated context.
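
For readers following the quoted argument, the difference between a
process-wide pointer and a per-caller field can be sketched as below.
This is only an illustration under assumptions: demo_global_encoding,
struct demo_context and both getters are made-up names that mirror the
shape of get_commit_output_encoding() quoted above, not a proposed API.

```c
#include <assert.h>
#include <stddef.h>

/* Process-wide state: every caller sees whatever was written last. */
static const char *demo_global_encoding;

static const char *demo_global_output_encoding(void)
{
	return demo_global_encoding ? demo_global_encoding : "UTF-8";
}

/* Hypothetical per-caller context; the name is illustrative only. */
struct demo_context {
	const char *commit_encoding; /* NULL means "use the default" */
};

/* Same fallback logic, but the value is read from the caller's context. */
static const char *demo_ctx_output_encoding(const struct demo_context *ctx)
{
	assert(ctx != NULL);
	return ctx->commit_encoding ? ctx->commit_encoding : "UTF-8";
}
```

Two contexts with different values can coexist without affecting each
other, whereas the global getter's result depends on the most recent
assignment made anywhere in the process.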
>
>
> -- Approach
>
> The task at hand goes beyond simply repackaging the global variables
> into the struct repository structure. Based on my recent experience
> refactoring setup.c, I realized that libification requires careful
> management of variable lifecycles and API boundaries:
>
> [ Current ]
> Core functions --------reads-------> Global variables (via getenv)
> [Thread unsafe]
>
> [ Target ]
> Core functions ----passes context--> struct repository
> | owns
> v
> struct repo_settings
>
> other domain-specific structs
>
> Although the principle is simple, the scope of changes is extensive. The
> following insights can serve as a guiding principle for it:
>
> 1. Identify isolated environment variables currently residing in the
> global scope. Conduct a case-by-case analysis to map each variable
> to its most appropriate existing home based on their lifecycles:
>
> Variables that are only parsed when needed will be safely mapped
> to struct repo_settings.
>
> Variables parsed at startup (e.g., editor_program)
> must not be moved to lazily parsed structs to ensure that
> invalid configurations can trigger early failures before
> execution proceeds too far, which is also for the sake of user
> experience.
>
> 2. Instead of blindly passing struct repository *repo down into every
> single low-level library function, bubbling the dependency up is
> the true goal. External callers of the functions must be carefully
> audited to prevent regressions.
>
> 3. Safely remove the old global variables and macro definitions. Make
> full use of Git's existing GitLab/GitHub CI and utilize local
> Meson builds with AddressSanitizer enabled to ensure that the new
> lifecycle introduces zero memory leaks.
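
One way to read "bubbling the dependency up" in point 2 is sketched
below, assuming hypothetical names throughout (struct demo_caller_ctx,
demo_format_encoding_header, demo_write_header). The low-level helper
takes only the value it needs, and the higher-level caller, which
already owns a context, resolves the setting and passes it down.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical context owned by a high-level caller. */
struct demo_caller_ctx {
	const char *commit_encoding; /* NULL means "use the default" */
};

/* Low-level helper: needs only the value, so it takes only the value. */
static int demo_format_encoding_header(char *buf, size_t len,
				       const char *encoding)
{
	return snprintf(buf, len, "encoding %s\n", encoding);
}

/* High-level caller: resolves the setting once, then passes it down. */
static int demo_write_header(const struct demo_caller_ctx *ctx,
			     char *buf, size_t len)
{
	const char *enc;

	assert(ctx != NULL);
	enc = ctx->commit_encoding ? ctx->commit_encoding : "UTF-8";
	return demo_format_encoding_header(buf, len, enc);
}
```

Because the helper never sees the context, it can be unit tested with a
plain string and stays independent of how the setting is stored.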
>
>
> Additionally, given the anticipated high volume of commits, we must
> ensure each patch is independent and atomic, preventing any user-
> untraceable or unexplainable bugs from occurring in the codebase at any
> state.
>
>
> AVAILABILITY
> ------------
> Fortunately, my summer vacation coincides with the GSoC work period.
> I will treat this project as my primary focus, dedicating a minimum of
> 35 hours per week. If needed, I can work a 9-to-5 schedule.
>
> I will have a significant head start to draft RFC patches before the
> official coding period even begins. Having this buffer period allows me
> to go through the rigorous code review process within the Git community
> with greater ease.
>
>
> TIMELINE & MILESTONES
> ---------------------
> Considering the differences between this project and other projects on
> the idea list, rather than hoarding massive changes, I will submit 3-
> to-5-patch series frequently to respect reviewers' time and maintain a
> steady velocity.
>
> Below is the tentative schedule I have prepared for myself:
>
> * Community Bonding (May 1 - May 25): Planning & RFC
> - May 1 - May 7: Wrap up university finals. Discuss and finalize the
> prioritized list of subsystems with my mentor.
> - May 8 - May 25: Categorize the targeted global variables and map out
> their intended destinations (e.g., repo_settings). Draft and submit
> the initial RFC patch series.
>
> * Phase 1 (May 26 - July 10): Foundation
> - Weeks 1-2: Plumb the context pointer ('struct repository *repo')
> through call chains for simple variables (e.g., boolean flags or integer
> configs).
> - Weeks 3-4: Audit and update external callers to use the new API.
> - Weeks 5-6: Submit the first major refactoring patch series. Address
> mailing list feedback and resolve merge conflicts. (Midterm
> Evaluation)
>
> * Phase 2 (July 11 - August 18): Complex Migration & Cleanup
> - Weeks 7-8: Refactor higher-complexity variables (e.g., path-related
> globals).
> - Weeks 9-10: Compile the codebase with AddressSanitizer and run the
> full test suite to execute strict memory leak checks.
> - Weeks 11-12: Remove unused global macro definitions and static
> variables. Update internal documentation and write the final GSoC report.
>
> (The above is for reference only. Personally, I always finish tasks
> faster than planned 😉)
>
>
> ~$ git checkout HEAD@{postGSoC}
> -------------------------------
> This past month since joining the Git community has been the most
> enjoyable month of my programming journey. To quote a close friend of
> mine (who is applying for the Neovim GSoC project):
>
> "Only fools chase trends; open source is the game for the brave."
>
> The words may be blunt, but the logic holds true. This statement surely
> resonates with me (and maybe many other GSoC contributors): our passion
> for code and open-source drives us forward.
>
> Even if I didn't make the cut, so what? ~$ git reset --hard...
> Just kidding. The Git codebase is far too interesting to abandon now.
>
> -------------------------------------------------------------------------
> Changes since V4:
>
> - “Treating variables or functions differently based on their
> lifecycle” has been added to the Approach section.
>
> - Fixed a typo below the diagram.
>
> Regards,
>
> Yuchen
>