public inbox for git@vger.kernel.org
From: Phillip Wood <phillip.wood123@gmail.com>
To: Tian Yuchen <a3205153416@gmail.com>, git@vger.kernel.org
Cc: Christian Couder <christian.couder@gmail.com>,
	Karthik Nayak <karthik.188@gmail.com>,
	Justin Tobler <jltobler@gmail.com>,
	Ayush Chandekar <ayu.chandekar@gmail.com>,
	Siddharth Asthana <siddharthasthana31@gmail.com>,
	phillip.wood@dunelm.org.uk
Subject: Re: [GSoC][Draft Proposal v4] Refactoring in order to reduce Git's global state
Date: Sun, 1 Mar 2026 16:43:03 +0000
Message-ID: <eecd6531-a7b5-4f0e-8e4d-3807f47d1f9d@gmail.com>
In-Reply-To: <0a944142-7c51-4143-af00-2a5798ea68af@gmail.com>

Hi Tian

On 27/02/2026 16:58, Tian Yuchen wrote:
> 
> 
>    3. Unit testing becomes difficult because the environment must be
>       artificially manipulated before calling functions.
> 
> Take a look at this example from environment.c:
> 
>      const char *get_commit_output_encoding(void)
>      {
>          return git_commit_encoding ? git_commit_encoding : "UTF-8";
>      }
> 
> If Git is invoked as a C library by a multi-threaded server:
> - Thread A formats a commit for Repo A (using GBK);
> - Thread B concurrently formats a commit for Repo B (using UTF-8);

The encoding config is really a user preference that lets the user 
compose commit messages in their preferred encoding while allowing git 
to store the message encoded as UTF-8. I'm struggling to see why two 
threads would be using different encodings as it implies that the user 
is using different encodings in different repositories.
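
To make the scenario concrete: the race described above only matters
if the value is looked up per repository. A minimal sketch of that
shape follows. The names `example_repo` and
`example_commit_output_encoding` are hypothetical and are not part of
Git; only the fallback logic is taken from the quoted function.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for a per-repository context.
 * This is NOT Git's struct repository; it exists only for illustration.
 */
struct example_repo {
	const char *commit_encoding; /* NULL means "not configured" */
};

/*
 * Same fallback as the quoted get_commit_output_encoding(), but the
 * value is read from the caller-supplied context instead of a global.
 */
static const char *example_commit_output_encoding(const struct example_repo *repo)
{
	assert(repo != NULL);
	return repo->commit_encoding ? repo->commit_encoding : "UTF-8";
}
```

With this shape two callers holding different contexts never touch
shared state, but that only helps if the encoding really does differ
between repositories.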

Below you say

 >      Variables parsed at startup (e.g., editor_program)
 >      must not be moved to lazily parsed structs to ensure that
 >      invalid configurations can trigger early failures before
 >      execution proceeds too far, which is also for the sake of user
 >      experience.

i18n.commitEncoding is another such setting, as it is currently parsed
eagerly, so I'm surprised to see it being converted to lazy parsing in
https://lore.kernel.org/20260228190201.3684705-1-a3205153416@gmail.com
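
The difference between the two lifecycles can be sketched as follows.
This is a toy illustration, not Git's config machinery: every name
here (`example_parse_bool`, `example_load_eager`,
`example_lazy_setting`, `example_get_lazy`) is hypothetical, and a
boolean is used only because it is the simplest value that can be
invalid.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 0 and stores the value, or -1 if raw is not a boolean. */
static int example_parse_bool(const char *raw, int *out)
{
	assert(raw != NULL && out != NULL);
	if (!strcmp(raw, "true")) {
		*out = 1;
		return 0;
	}
	if (!strcmp(raw, "false")) {
		*out = 0;
		return 0;
	}
	return -1;
}

/* Hypothetical lazily parsed setting: the raw text is kept until first use. */
struct example_lazy_setting {
	const char *raw;
	int parsed; /* 0 until the first successful lookup */
	int value;
};

/* Eager: called once at startup, so a bad value is reported immediately. */
static int example_load_eager(const char *raw, int *value)
{
	return example_parse_bool(raw, value);
}

/* Lazy: a bad value is only noticed when some code path first asks for it. */
static int example_get_lazy(struct example_lazy_setting *s, int *value)
{
	assert(s != NULL && value != NULL);
	if (!s->parsed) {
		if (example_parse_bool(s->raw, &s->value) < 0)
			return -1;
		s->parsed = 1;
	}
	*value = s->value;
	return 0;
}
```

In the eager variant an invalid value is rejected before any work is
done. In the lazy variant the same invalid value sits unnoticed until
the first lookup, which may be well into the command.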

I'm afraid that the suggestion on the project webpage is not very
helpful. Most config variables are unsuited to a conversion based on
struct repo_settings. It would be better to look at the approach
implemented in
https://lore.kernel.org/48821a3848bef25c13038be8377ad73e7c17a924.1771258573.git.belkid98@gmail.com
that is discussed in https://lore.kernel.org/xmqqwm1vk83a.fsf@gitster.g

Thanks

Phillip

> Then they will race to read and overwrite the exact same global
> `git_commit_encoding` pointer, which is not what we expect. Therefore,
> we have to refactor these environment variables by moving them from
> global scope into a well-defined and encapsulated context.
> 
> 
> -- Approach
> 
> The task at hand goes beyond simply repackaging the global variables 
> into the struct repository structure. Based on my recent experience 
> refactoring setup.c, I realized that libification requires careful 
> management of variable lifecycles and API boundaries:
> 
>      [ Current ]
>      Core functions --------reads-------> Global variables (via getenv)
>                                           [Thread unsafe]
> 
>      [ Target ]
>      Core functions ----passes context--> struct repository
>                                                  | owns
>                                                  v
>                                           struct repo_settings
> 
>                                          other domain-specific structs
> 
> Although the principle is simple, the scope of changes is extensive. The 
> following insights can serve as a guiding principle for it:
> 
>    1. Identify isolated environment variables currently residing in the
>       global scope. Conduct a case-by-case analysis to map each variable
>       to its most appropriate existing home based on their lifecycles:
> 
>      Variables that are only parsed when needed will be safely mapped
>      to struct repo_settings.
> 
>      Variables parsed at startup (e.g., editor_program)
>      must not be moved to lazily parsed structs to ensure that
>      invalid configurations can trigger early failures before
>      execution proceeds too far, which is also for the sake of user
>      experience.
> 
>    2. Instead of blindly passing struct repository *repo down into every
>       single low-level library function, bubbling the dependency up is
>       the true goal. External callers of the functions must be carefully
>       audited to prevent regressions.
> 
>    3. Safely remove the old global variables and macro definitions. Make
>       full use of Git's existing GitLab/GitHub CI and utilize local
>       Meson builds with AddressSanitizer enabled to ensure that the new
>       lifecycle introduces zero memory leaks.
> 
> 
> Additionally, given the anticipated high volume of commits, we must
> ensure each patch is independent and atomic, preventing any
> user-untraceable or unexplainable bugs from occurring in the codebase
> in any state.
> 
> 
> AVAILABILITY
> ------------
> Fortunately, my summer vacation coincides with the GSoC work period.
> I will treat this project as my primary focus, dedicating a minimum of
> 35 hours per week. If needed, I can work a 9-to-5 schedule.
> 
> I will have a significant head start to draft RFC patches before the
> official coding period even begins. Having this buffer period allows me
> to go through the rigorous code review process within the Git community
> with greater ease.
> 
> 
> TIMELINE & MILESTONES
> ---------------------
> Considering the differences between this project and other projects on
> the idea list, rather than hoarding massive changes, I will submit
> 3-to-5-patch series frequently to respect reviewers' time and maintain
> a steady velocity.
> 
> Below is the tentative schedule I have prepared for myself:
> 
> * Community Bonding (May 1 - May 25): Planning & RFC
>    - May 1 - May 7: Wrap up university finals. Discuss and finalize the
>      prioritized list of subsystems with my mentor.
>    - May 8 - May 25: Categorize the targeted global variables and map out
>      their intended destinations (e.g., repo_settings). Draft and submit
>      the initial RFC patch series.
> 
> * Phase 1 (May 26 - July 10): Foundation
>    - Weeks 1-2: Plumb the context pointer ('struct repository *repo')
>      through call chains for simple variables (e.g., boolean flags or
>      integer configs).
>    - Weeks 3-4: Audit and update external callers to use the new API.
>    - Weeks 5-6: Submit the first major refactoring patch series. Address
>      mailing list feedback and resolve merge conflicts. (Midterm
>      Evaluation)
>
> * Phase 2 (July 11 - August 18): Complex Migration & Cleanup
>    - Weeks 7-8: Refactor higher-complexity variables (e.g., path-related
>      globals).
>    - Weeks 9-10: Compile the codebase with AddressSanitizer and run the
>      full test suite to execute strict memory leak checks.
>    - Weeks 11-12: Remove unused global macro definitions and static
>      variables. Update internal documentation and write the final GSoC
>      report.
> 
> (The above is for reference only. Personally, I always finish tasks 
> faster than planned 😉)
> 
> 
> ~$ git checkout HEAD@{postGSoC}
> -------------------------------
> This past month since joining the Git community has been the most 
> enjoyable month of my programming journey. To quote a close friend of 
> mine (who is applying for the Neovim GSoC project):
> 
>    "Only fools chase trends; open source is the game for the brave."
> 
> The words may be blunt, but the logic holds true. This statement surely
> resonates with me (and maybe many other GSoC contributors): our passion
> for code and open-source drives us forward.
> 
> Even if I didn't make the cut, so what? ~$ git reset --hard...
> Just kidding. The Git codebase is far too interesting to abandon now.
> 
> -------------------------------------------------------------------------
> Changes since V4:
> 
>   - “Treating variables or functions differently based on their
>     lifecycle” has been added to the Approach section.
> 
>   - Fixed a typo below the diagram.
> 
> Regards,
> 
> Yuchen
> 

