* [GSoC][Draft Proposal V2] Refactoring in order to reduce Git's global state
2026-02-22 17:59 [GSoC][Draft Proposal] " Tian Yuchen
@ 2026-02-23 1:07 ` Tian Yuchen
0 siblings, 0 replies; 5+ messages in thread
From: Tian Yuchen @ 2026-02-23 1:07 UTC
To: git
Cc: Christian Couder, Karthik Nayak, Justin Tobler, Ayush Chandekar,
Siddharth Asthana
Hello everyone,
I'm Tian Yuchen and I'm planning to apply for a GSoC project this year.
I hope you can take the time to review my proposal.
Please feel free to leave feedback!
Google Docs link:
https://docs.google.com/document/d/1t2sznOvnPz-9tOzVMH--pLxzRqYSJCFzqVWBVfL_NP8/edit?tab=t.0#heading=h.c3c40ftj1ilv
Refactoring in order to reduce Git's global state
=================================================
PERSONAL INFORMATION
--------------------
Name: Tian Yuchen
E-mail: a3205153416@gmail.com
Phone number: +65 98740318
Time-zone: UTC + 08:00
Github: https://github.com/malon7782
Education: NTU, Singapore
Year: Year 1 semester 2
Degree: Electrical and Electronic Engineering (EEE)
PRE GSOC
--------
I have always held a deep passion for the open-source community.
Although I am not a computer science major, I was tinkering with
open-source projects long before college. I have solid hands-on
experience in C programming and system-level debugging.
I use Ubuntu 24.04 on a daily basis, so I am proficient in using the
Linux command line and CLI tools.
I have contributed to the Git community by sending patches. Since my
first commit (17/1/2026), I have contributed almost daily.
Here is the list of contributions I have made:
* [PATCH v1] t1005: modernize "! test -f" to "test_path_is_missing"
https://lore.kernel.org/git/20260117062515.319664-1-a3205153416@gmail.com/
This patch is my microproject, the first contribution I made to the
codebase.
[Graduated to 'master']
* [PATCH v2] t2203: avoid masking exit codes in git status
https://lore.kernel.org/git/20260118043537.338769-1-a3205153416@gmail.com/#t
* [PATCH v2] symlinks: use unsigned int for flags
https://lore.kernel.org/git/20260120152219.398999-1-a3205153416@gmail.com/
[Will merge to 'next']
* [PATCH v4] t/perf/p3400: speed up setup using fast-import
https://lore.kernel.org/git/20260130170123.642344-1-a3205153416@gmail.com/
[Will merge to 'master']
* Re: [PATCH] [RFC] attr: use local repository state in read_attr
https://lore.kernel.org/git/cc2f400e-49c2-4de0-9c51-9a5c0294735e@gmail.com/
Code review. To verify the performance loss, I wrote a test script to
measure the time difference before and after the modification.
* Re: Bug: git add :!x . exits with error when x is in .gitignore
https://lore.kernel.org/git/1d560aa1-d452-47f5-aaf2-4cb1ccdab100@gmail.com/
Code review. Pointed out logical error.
* [PATCH v10] setup: allow cwd/.git to be a symlink to a directory
https://lore.kernel.org/git/20260220164512.216901-1-a3205153416@gmail.com/
In progress.
After over half a month of discussions, repeated refactoring, and code
reviews, I delved deep into setup.c. I gained insights into Git's
design philosophy, and learned the art of striking a balance in
developer communication. It took me a large amount of time and effort to
thoroughly understand every line of the code. I often found myself
poring over the call chain of a single function well into the night....
But I persevered until the end, and I believe my patience will see me
through even larger projects.
ABOUT THE PROJECT
-----------------
-- Synopsis
As far as I know, the Git community is actively working towards
'libification' - making Git's internal machinery reusable as a C
library. The extensive reliance on global state is a major roadblock to
this goal.
Many core functions implicitly read environment variables and store them
in global static variables. This can cause several issues:
1. Global variables prevent Git's core functions from being executed
safely in multi-threaded contexts.
2. When Git is called multiple times within the same process, global
states can lead to memory leaks or incorrect behaviors.
3. Unit testing becomes difficult because the environment must be
artificially manipulated before calling functions.
Take a look at this example from environment.c:
    const char *get_commit_output_encoding(void)
    {
            return git_commit_encoding ? git_commit_encoding : "UTF-8";
    }
If Git is invoked as a C library by a multi-threaded server:
- Thread A formats a commit for Repo A (using GBK);
- Thread B concurrently formats a commit for Repo B (using UTF-8);
Then they will race to read and overwrite the exact same global
`git_commit_encoding` pointer, which is not what we expect. Therefore,
we have to refactor these environment variables by moving them from
global scope into a well-defined and encapsulated context.
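
As a minimal sketch of where this is heading, the getter could take the
repository as an argument and read only state that the repository owns.
The types below are simplified stand-ins written for illustration:
Git's real `struct repository` is much larger, `struct git_env` is the
name suggested in this proposal rather than an existing structure, and
`repo_commit_output_encoding()` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins, not Git's real definitions. */
struct git_env {
	const char *commit_encoding; /* NULL means "not configured" */
};

struct repository {
	struct git_env env;
};

/* Context-aware variant: reads only state owned by `repo`. */
static const char *repo_commit_output_encoding(const struct repository *repo)
{
	assert(repo);
	return repo->env.commit_encoding ? repo->env.commit_encoding : "UTF-8";
}
```

With this shape, two threads that each hold their own repository
(one configured with "GBK", one left unset) no longer share a pointer,
so neither can observe the other's encoding.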
-- Approach
The task at hand can be summed up in one sentence: repackage the global
variables into the `struct repository` structure. In other words:
[ Current ]
Core functions --------reads-------> Global variables (via getenv)
                                     [Thread unsafe]

[ Target ]
Core functions ----passes context--> struct repository
                                           | owns
                                           v
                                     struct git_env
Although the principle is simple, the scope of changes is extensive. The
following three-step approach can serve as a guiding principle for it:
1. Identify isolated environment variables currently residing in the
global scope. Introduce a dedicated structure to hold these states,
e.g. `struct git_env` within the `struct repository`.
2. Modify the function signatures within the call chain to accept the
context, e.g., `struct repository *repo`, instead of relying on
implicit globals. External callers of the functions must be
carefully audited to prevent regressions.
3. Safely remove the old global variables and macro definitions. Tools
such as AddressSanitizer can be helpful to ensure that the new
struct-based lifecycle introduces zero memory leaks.
Additionally, given the anticipated high volume of commits, we must
keep each patch independent and atomic, so that the codebase builds and
passes its tests at every commit and any regression can be traced back
to a single change.
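
To make step 3 more concrete, here is a minimal sketch of a leak-free
lifecycle for such a structure, assuming it owns heap copies of its
strings. All names (`git_env_init()`, `git_env_set_commit_encoding()`,
`git_env_clear()`) are hypothetical and only illustrate the ownership
rules that a sanitizer build (for example one compiled with
`-fsanitize=address`) would be able to check.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container; the field owns its string. */
struct git_env {
	char *commit_encoding; /* heap copy, or NULL when unset */
};

static void git_env_init(struct git_env *env)
{
	assert(env);
	env->commit_encoding = NULL;
}

/*
 * Replace the stored value with a copy of `value` (NULL unsets it).
 * Returns 0 on success and -1 if the copy cannot be allocated, in
 * which case the previous value is left untouched.
 */
static int git_env_set_commit_encoding(struct git_env *env, const char *value)
{
	char *copy = NULL;

	assert(env);
	if (value) {
		size_t len = strlen(value) + 1;

		copy = malloc(len);
		if (!copy)
			return -1;
		memcpy(copy, value, len);
	}
	free(env->commit_encoding); /* release the old value first */
	env->commit_encoding = copy;
	return 0;
}

/* Release everything the structure owns; safe to call repeatedly. */
static void git_env_clear(struct git_env *env)
{
	assert(env);
	free(env->commit_encoding);
	env->commit_encoding = NULL;
}
```

Pairing every init with a clear at the point where the owning
repository is released is what lets the old global be deleted without
leaving a dangling owner behind.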
AVAILABILITY
------------
Fortunately, my summer vacation coincides with the GSoC work period.
I will treat this project as my primary focus, dedicating a minimum of
35 hours per week. If needed, I can work a 9-to-5 schedule.
I will have a significant head start to draft RFC patches before the
official coding period even begins. Having this buffer period allows me
to go through the rigorous code review process within the Git community
with greater ease.
TIMELINE & MILESTONES
---------------------
Considering the differences between this project and other projects on
the idea list, rather than hoarding massive changes, I will submit
3-to-5-patch series frequently to respect reviewers' time and maintain a
steady velocity.
Below is the tentative schedule I have prepared for myself:
* Community Bonding (May 1 - May 25): Planning & RFC
- May 1 - May 7: Wrap up university finals. Discuss and finalize the
prioritized list of subsystems with my mentor.
- May 8 - May 25: Define the core context container. Draft and submit
the initial RFC patch series for this new data structure.
* Phase 1 (May 26 - July 10): Foundation
- Weeks 1-2: Plumb the context pointer (`struct repository *repo`)
through call chains for simple variables (e.g., boolean flags or integer
configs).
- Weeks 3-4: Audit and update external callers to use the new API.
- Weeks 5-6: Submit the first major refactoring patch series. Address
mailing list feedback and resolve merge conflicts. (Midterm Evaluation)
* Phase 2 (July 11 - August 18): Complex Migration & Cleanup
- Weeks 7-8: Refactor higher-complexity variables (e.g., path-related
globals).
- Weeks 9-10: Compile the codebase with AddressSanitizer and run the
full test suite to execute strict memory leak checks.
- Weeks 11-12: Remove unused global macro definitions and static
variables. Update internal documentation and write the final GSoC report.
(The above is for reference only. Personally, I always finish tasks
faster than planned ;)
~$ git checkout HEAD@{postGSoC}
-------------------------------
This past month since joining the Git community has been the most
enjoyable month of my programming journey. To quote a close friend of
mine (who is applying for the Neovim GSoC project):
"Only fools chase trends; open source is the game for the brave."
The words may be blunt, but the logic holds true. This statement surely
resonates with me (and maybe many other GSoC contributors): our passion
for code and open-source drives us forward.
Even if I didn't make the cut, so what? ~$ git reset --hard...
Just kidding. The Git codebase is far too interesting to abandon now.
-------------------------------------------------------------------------
Changes since V1:
- Transfer the text from Google Docs to here.
Regards,
Yuchen
* [GSoC Draft Proposal] Refactoring in order to reduce Git's global state
@ 2026-03-08 11:40 Burak Kaan Karaçay
2026-03-09 15:17 ` Christian Couder
2026-03-15 9:52 ` [GSoC Draft Proposal v2] " Burak Kaan Karaçay
0 siblings, 2 replies; 5+ messages in thread
From: Burak Kaan Karaçay @ 2026-03-08 11:40 UTC
To: git
Cc: christian.couder, karthik.188, jltobler, ayu.chandekar,
siddharthasthana31
=================================================
Refactoring in order to reduce Git’s global state
=================================================
Personal Info:
--------------
Name: Burak Kaan Karaçay (he/him)
Email: bkkaracay@gmail.com
Education: UG Sophomore, Marmara University
GitHub: https://github.com/bkkaracay
Timezone: UTC+3 (Istanbul, Turkey)
My Patches:
-----------
+ (Microproject) t2003: modernize path existence checks using test
helpers
- Thread:
https://lore.kernel.org/git/20260208202809.270523-1-bkkaracay@gmail.com/T/
- Thread v2:
https://lore.kernel.org/git/20260209112444.1268765-1-bkkaracay@gmail.com/T/
- Status: Merged to master
- Commit Hash: 168d575719d944759964e004d17a3282b0f883d5
+ [PATCH 0/2] mailmap: reduce global state
- Thread:
https://lore.kernel.org/git/20260219125954.3539324-1-bkkaracay@gmail.com/T/
- Status: Merged to master
- Commit Hash: 2d843a2d3d6c2d5e7861e6aa99743d15d36746b9
Relevant Experience:
--------------------
I am currently developing my own programming language as a hobby
project, writing a zero-dependency interpreter for it in C. While it is
still a work in progress, I have completed the core front-end pipeline.
Building this project has given me practical experience with C
programming, data structures and modular software architecture.
+ To support potential future multithreading, I avoided global variables
in my own project. Instead, I pass state via local contexts.
+ I implemented an arena allocator (memory pool) to reduce malloc call
overhead, limit memory fragmentation and improve cache locality.
+ I used techniques like string interning and Pratt parsing.
My project is available on my GitHub profile [1]. If you would like to
take a look at the code, 'src/main.c' is a good starting point.
Project Abstract:
-----------------
Git was originally designed as a short-lived CLI tool, where relying on
global variables was highly practical. Over time, the need to embed Git
into other projects and applications emerged. Today, these global
variables are a huge roadblock to the libification of git, as they make
it impossible to properly handle multiple repositories within a single
process or safely support multi-threading.
This project aims to reduce this reliance by migrating global variables
from 'environment.c' into appropriate locations. This effort will
support the libification goal and modernize Git's internal structure.
Technical Approach:
-------------------
The core challenge of this project is less about relocating globals than
about choosing the correct parsing strategy. The codebase currently
offers two migration strategies for global state removal.
Currently, globals are loaded eagerly via 'repo_config()'. The modern
'repo_config_values()' API provides a safe and straightforward way to
eagerly load variables and reduce global count. However, eager-loading
parses all configurations upfront, including unnecessary ones. Users may
encounter fatal configuration errors that are entirely unrelated to the
command they are executing [2].
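
The failure mode described above can be illustrated with a small,
self-contained sketch. Everything in it is hypothetical (the
`demo_cfg_values` structure, the `demo.*` keys and the helper names are
invented for this example and are not Git's API); it only shows how an
eager loader rejects the whole configuration because of a key the
caller never needed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins; these are not Git's real types or keys. */
struct config_entry {
	const char *key;
	const char *value;
};

struct demo_cfg_values {
	int trust_ctime;
	int ignore_case;
};

/* Returns 0 and stores the boolean on success, -1 on a malformed value. */
static int demo_parse_bool(const char *value, int *out)
{
	if (!strcmp(value, "true"))
		*out = 1;
	else if (!strcmp(value, "false"))
		*out = 0;
	else
		return -1;
	return 0;
}

/*
 * Eager load: walk the whole config once and parse every known key,
 * whether or not the current command will ever read it.
 */
static int demo_load_eagerly(const struct config_entry *entries, size_t nr,
			     struct demo_cfg_values *out)
{
	size_t i;

	assert(entries && out);
	out->trust_ctime = 1; /* defaults used when a key is absent */
	out->ignore_case = 0;
	for (i = 0; i < nr; i++) {
		int *field;

		if (!strcmp(entries[i].key, "demo.trustctime"))
			field = &out->trust_ctime;
		else if (!strcmp(entries[i].key, "demo.ignorecase"))
			field = &out->ignore_case;
		else
			continue; /* unknown keys are ignored */
		if (demo_parse_bool(entries[i].value, field) < 0)
			return -1; /* one bad key fails the whole load */
	}
	return 0;
}
```

A caller that only cares about `trust_ctime` still gets -1 when
`demo.ignorecase` holds a malformed value, which mirrors the unrelated
fatal errors mentioned above.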
In contrast, lazy-loading postpones the parsing process until the
variable is strictly required, preventing unrelated configuration
errors. However, it is significantly trickier to migrate. If a
misformatted configuration triggers a 'die()' in the middle of the
execution, it risks causing data corruption. Moreover, lazy-loading
changes the timing of error reporting and struggles to replicate
eager-loading behavior when multiple configuration keys affect a single
variable [3].
If lazy-loading is considered safe for a variable, Git provides two APIs
depending on the performance requirements:
* The 'repo_config_get*' function set is suitable for variables
  accessed infrequently because of underlying string hashing costs. It
  is important to use this API to not bloat the 'struct repo_settings'
  [2].
* For frequently accessed variables, caching them within 'struct
  repo_settings' is preferred, as it amortizes hash costs and provides
  direct memory access speed.
There is no silver bullet solution for migrating globals. Because
transitioning these variables requires a deep understanding of the
codebase, communication with mentors and the community is essential.
About Gentle Reading:
---------------------
Current config readers rely on 'die()' to handle error cases. While
pragmatic for cli-tools, fatal exits are unacceptable for a library, as
they will crash the host process. Building upon Derrick Stolee's recent
introduction of gentle parsing functions [4], I propose implementing
'_maybe' variants for core configuration readers. Since removing all
'die()' calls is inevitable for libification, sooner or later config
readers will be purged of 'die()' calls. Utilizing the gentle
functions for newly migrated global variables will reduce the future
amount of work.
Applying this gentle API to widely used functions risks creating
unreviewable patches and merge conflicts. To solve this, I plan to use a
function wrapper approach, similar to the strategy used in early
the_repository migrations [5]. However, the_repository changes are more
mechanical work compared to the gentle transition. In complex call
stacks, a gentle transition risks causing regressions or scope creep.
Utilizing the "normal" config helpers will be helpful in these
conditions.
Another possible roadblock in the transition is the magic numbers in
error reporting. Some of the functions in Git use -1 and 1 to inform
callers about two different error cases or situations. Introducing a
third hard-coded number to tell callers to stop the Git process for a
misformatted config would be a poor design choice. Furthermore, adopting
a standardized error structure like enum git_error_code is a step toward
git's ongoing libification efforts, as it enables external callers
consuming the API to handle errors programmatically.
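
As a rough illustration of replacing magic numbers with named results,
the sketch below uses a small enum scoped to a single hypothetical
lookup function. The names (`enum demo_lookup_result`,
`demo_lookup_bool()`, `demo_lookup_strerror()`) are invented for this
example; whether such codes should be shared across the codebase or
kept per function is a separate design question.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes for one lookup function. */
enum demo_lookup_result {
	DEMO_LOOKUP_OK = 0,
	DEMO_LOOKUP_MISSING,
	DEMO_LOOKUP_MALFORMED
};

/*
 * Parse "true"/"false" into *out. A NULL value means the key is
 * absent; *out is left untouched unless the result is DEMO_LOOKUP_OK.
 */
static enum demo_lookup_result demo_lookup_bool(const char *value, int *out)
{
	assert(out);
	if (!value)
		return DEMO_LOOKUP_MISSING;
	if (!strcmp(value, "true"))
		*out = 1;
	else if (!strcmp(value, "false"))
		*out = 0;
	else
		return DEMO_LOOKUP_MALFORMED;
	return DEMO_LOOKUP_OK;
}

/* Human-readable description, so callers need not hard-code numbers. */
static const char *demo_lookup_strerror(enum demo_lookup_result res)
{
	switch (res) {
	case DEMO_LOOKUP_OK:
		return "ok";
	case DEMO_LOOKUP_MISSING:
		return "key not set";
	case DEMO_LOOKUP_MALFORMED:
		return "malformed value";
	}
	return "unknown";
}
```

A caller can then switch on the result and decide for itself whether a
malformed value is fatal, instead of inferring that from -1 versus 1.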
Availability:
-------------
I plan to dedicate 40+ hours per week to this project during my active
coding period. However, I want to be completely transparent about my
university's academic calendar to set realistic expectations.
In Turkey, the university summer break begins in July and ends in late
September. During May and June, my schedule will be heavily occupied by
final exams and major group project deadlines. For this reason, my
availability during these two months will be limited to around 10-15
hours per week. I will use this time to stay active on the mailing list,
participate in architectural discussions and submit smaller, preparatory
patches.
To ensure the highest quality of work, I propose utilizing GSoC's
officially supported flexible timeline. I am completely free during
July, August, and September (with no summer school or internships).
During these three months, I will dedicate 40+ hours per week entirely
to git.
Community Bonding (May 1 - May 24):
- Analyze environment.c and create a detailed mitigation plan for each
variable.
- Discuss the plan with mentors to identify potential roadblocks or edge
cases.
- Submit a patch about 'enum git_error_code' to start community
discussion.
- Set up a blog to share bi-weekly updates throughout the project.
Phase 1 (May 25 - June 28):
- Introduce the '_maybe' versions of the config readers and write tests
for them.
- Begin mitigating "low-hanging" globals. To avoid wasting time while
waiting for reviews, start drafting subsequent patches concurrently.
- Publish the first progress reports on the blog.
Phase 2 (June 29 - September 15):
- Discuss globals with mentors where mitigations might cause behavioral
changes.
- Shift focus to the more complex cases, specifically those involving
eager-lazy or '_maybe' transitions.
- Continue publishing regular blog updates.
Phase 3 (September 16 - September 30):
- Act as a buffer period to respond to final feedback on patches
currently under review.
- Complete the final project report and publish it on the blog.
References:
-----------
[1] https://github.com/bkkaracay/caret
[2] https://lore.kernel.org/git/xmqq1pk3lmu3.fsf@gitster.g/
[3] https://lore.kernel.org/git/23428022-ab13-4a3e-90ed-ff91ef93f051@gmail.com/
[4] https://lore.kernel.org/all/pull.2044.v3.git.1771849615.gitgitgadget@gmail.com/
[5] https://lore.kernel.org/git/20260109213021.2546-2-l.s.r@web.de/
---
Thanks to everyone for their time and guidance. I'm really excited about
the possibility of working on this project, and any feedback to make
this proposal better is deeply appreciated.
* Re: [GSoC Draft Proposal] Refactoring in order to reduce Git's global state
2026-03-08 11:40 [GSoC Draft Proposal] Refactoring in order to reduce Git's global state Burak Kaan Karaçay
@ 2026-03-09 15:17 ` Christian Couder
2026-03-11 18:34 ` Burak Kaan Karaçay
2026-03-15 9:52 ` [GSoC Draft Proposal v2] " Burak Kaan Karaçay
1 sibling, 1 reply; 5+ messages in thread
From: Christian Couder @ 2026-03-09 15:17 UTC
To: Burak Kaan Karaçay
Cc: git, karthik.188, jltobler, ayu.chandekar, siddharthasthana31
On Sun, Mar 8, 2026 at 12:40 PM Burak Kaan Karaçay <bkkaracay@gmail.com> wrote:
[...]
> My Patches:
> -----------
>
> + (Microproject) t2003: modernize path existence checks using test
> helpers
> - Thread:
> https://lore.kernel.org/git/20260208202809.270523-1-bkkaracay@gmail.com/T/
> - Thread v2:
> https://lore.kernel.org/git/20260209112444.1268765-1-bkkaracay@gmail.com/T/
> - Status: Merged to master
> - Commit Hash: 168d575719d944759964e004d17a3282b0f883d5
Here you gave the commit that was merged.
> + [PATCH 0/2] mailmap: reduce global state
> - Thread:
> https://lore.kernel.org/git/20260219125954.3539324-1-bkkaracay@gmail.com/T/
> - Status: Merged to master
> - Commit Hash: 2d843a2d3d6c2d5e7861e6aa99743d15d36746b9
Here this is the merge commit. Two commits were actually merged.
[...]
> Technical Approach:
> -------------------
Before discussing that I think you might want to summarize what has
already been done for this project, especially the recent work by
Olamide Bello.
[...]
> Availability:
> -------------
>
> I plan to dedicate 40+ hours per week to this project during my active
> coding period. However, I want to be completely transparent about my
> university's academic calendar to set realistic expectations.
>
> In Turkey, the university summer break begins in July and ends in late
> September. During May and June, my schedule will be heavily occupied by
> final exams and major group project deadlines. For this reason, my
> availability during these two months will be limited to around 10-15
> hours per week. I will use this time to stay active on the mailing list,
> participate in architectural discussions and submit smaller, preparatory
> patches.
>
> To ensure the highest quality of work, I propose utilizing GSoC's
> officially supported flexible timeline. I am completely free during
> July, August, and September (with no summer school or internships).
> During these three months, I will dedicate 40+ hours per week entirely
> to git.
Yeah, I think it could work. Thanks for suggesting this.
> Community Bonding (May 1 - May 24):
> - Analyze environment.c and create a detailed mitigation plan for each
> variable.
> - Discuss the plan with mentors to identify potential roadblocks or edge
> cases.
> - Submit a patch about 'enum git_error_code' to start community
> discussion.
I didn't talk about it earlier, but I am not sure using 'enum
git_error_code' all over the codebase would be a good idea. Perhaps a
few functions would benefit from that, but then the enum could be
specific for these functions.
Thanks.
* Re: [GSoC Draft Proposal] Refactoring in order to reduce Git's global state
2026-03-09 15:17 ` Christian Couder
@ 2026-03-11 18:34 ` Burak Kaan Karaçay
0 siblings, 0 replies; 5+ messages in thread
From: Burak Kaan Karaçay @ 2026-03-11 18:34 UTC
To: Christian Couder
Cc: git, karthik.188, jltobler, ayu.chandekar, siddharthasthana31
On Mon, Mar 09, 2026 at 04:17:20PM +0100, Christian Couder wrote:
>I didn't talk about it earlier, but I am not sure using 'enum
>git_error_code' all over the codebase would be a good idea. Perhaps a
>few functions would benefit from that, but then the enum could be
>specific for these functions.
I understand your point. I have also noticed the use of
function-specific enums for error returns in the codebase, which makes a
lot of sense. Because of this, I plan to remove the part about
introducing a global git_error_code from the proposal. Thanks for
pointing this out.
I have also noted your other suggestions regarding Olamide's work and
the commit hashes. I will apply them in the next version of the
proposal.
Thanks for your time and guidance, it is really helpful.
* [GSoC Draft Proposal v2] Refactoring in order to reduce Git's global state
2026-03-08 11:40 [GSoC Draft Proposal] Refactoring in order to reduce Git's global state Burak Kaan Karaçay
2026-03-09 15:17 ` Christian Couder
@ 2026-03-15 9:52 ` Burak Kaan Karaçay
1 sibling, 0 replies; 5+ messages in thread
From: Burak Kaan Karaçay @ 2026-03-15 9:52 UTC
To: Burak Kaan Karaçay, git
Cc: christian.couder, karthik.188, jltobler, ayu.chandekar,
siddharthasthana31
Changes in v2:
- Clarified the difference between merge commit and commit hashes.
- Added 'Project Background' section.
- Refined the part about Olamide's API in 'Technical Approach'.
- Removed 'enum git_error_code' proposal.
Thanks for your time and guidance.
---
=================================================
Refactoring in order to reduce Git’s global state
=================================================
Personal Info:
--------------
Name: Burak Kaan Karaçay (he/him)
Email: bkkaracay@gmail.com
Education: UG Sophomore, Marmara University
GitHub: https://github.com/bkkaracay
Timezone: UTC+3 (Istanbul, Turkey)
My Patches:
-----------
+ (Microproject) t2003: modernize path existence checks using test
helpers
- Thread:
https://lore.kernel.org/git/20260208202809.270523-1-bkkaracay@gmail.com/T/
- Thread v2:
https://lore.kernel.org/git/20260209112444.1268765-1-bkkaracay@gmail.com/T/
- Status: Merged to master
- Merge Commit Hash: 70d3916a7db5233ce01f2f3f36ee04d57c0f9252
+ [PATCH v2 0/2] mailmap: reduce global state
- Thread:
https://lore.kernel.org/git/20260219125954.3539324-1-bkkaracay@gmail.com/T/
- Status: Merged to master
- Merge Commit Hash: 2d843a2d3d6c2d5e7861e6aa99743d15d36746b9
+ [PATCH v3 0/2] run-command: stop using the_repository
- Thread:
https://lore.kernel.org/git/20260311151923.4178655-1-bkkaracay@gmail.com/T/
- Status: Will merge to master
- Merge Commit Hash (next): 61ffe62b75cf89af469af53b15f3fdc6639d217a
Relevant Experience:
--------------------
I am currently developing my own programming language as a hobby
project, writing a zero-dependency interpreter for it in C. While it is
still a work in progress, I have completed the core front-end pipeline.
Building this project has given me practical experience with C
programming, data structures and modular software architecture.
+ To support potential future multithreading, I avoided global variables
in my own project. Instead, I pass state via local contexts.
+ I implemented an arena allocator (memory pool) to reduce malloc call
overhead, limit memory fragmentation and improve cache locality.
+ I used techniques like string interning and Pratt parsing.
My project is available on my GitHub profile [1]. If you would like to
take a look at the code, 'src/main.c' is a good starting point.
Project Abstract:
-----------------
Git was originally designed as a short-lived CLI tool, where relying on
global variables was highly practical. Over time, the need to embed Git
into other projects and applications emerged. Today, these global
variables are a huge roadblock to the libification of git, as they make
it impossible to properly handle multiple repositories within a single
process or safely support multi-threading.
This project aims to reduce this reliance by migrating global variables
from 'environment.c' into appropriate locations. This effort will
support the libification goal and modernize Git's internal structure.
Project Background:
-------------------
Discussions surrounding the "libification" of git date back as early as
2005 [2]. However, efforts to isolate global state in environment.c
accelerated following Patrick Steinhardt's groundwork in 2024.
Once the environment.c cleanup became an official GSoC project, the
patch series from the first intern in this area, Ayush Chandekar,
provided valuable lessons on best practices and potential pitfalls.
During the later stages of Ayush's internship, the limitations and
safety risks of lazy-parsing became apparent. To solve this bottleneck,
Phillip Wood proposed a new eager-loading API, which was successfully
implemented by Outreachy intern Olamide Caleb Bello. Although this API
is currently functional, to avoid invasive changes across the codebase,
it can currently only read config values from 'the_repository' [3].
Technical Approach:
-------------------
The core challenge of this project is less about relocating globals than
about choosing the correct parsing strategy. The codebase currently
offers two migration strategies for global state removal.
Currently, globals are loaded eagerly via 'repo_config()'. Olamide's
'struct config_values' API provides a modern way to load these globals
eagerly by parsing them into fields in 'repo->cfg_values'. However,
eager-loading parses all configurations upfront, including unnecessary
ones. Users may encounter fatal configuration errors that are entirely
unrelated to the command they are executing [4].
In contrast, lazy-loading postpones the parsing process until the
variable is strictly required, preventing unrelated configuration
errors. However, it is significantly trickier to migrate. If a
misformatted configuration triggers a 'die()' in the middle of the
execution, it risks causing data corruption. Moreover, lazy-loading
changes the timing of error reporting and struggles to replicate
eager-loading behavior when multiple configuration keys affect a single
variable [5].
If lazy-loading is considered safe for a variable, Git provides two APIs
depending on the performance requirements:
* The 'repo_config_get*' function set is suitable for variables
accessed infrequently because of underlying string hashing costs. It
is important to use this API to not bloat the 'struct repo_settings'
[4].
* For frequently accessed variables, caching them within 'struct
repo_settings' is preferred, as it amortizes hash costs and provides
direct memory access speed.
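
The trade-off between the two options can be sketched as follows. This
is only an illustration under simplifying assumptions: the `struct repo`
here is a hypothetical stand-in, the store is searched linearly rather
than through a hash table, the `demo.depth` key is invented, and the
cached value is assumed to be non-negative so that -1 can mark "not
parsed yet".

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for a config store and a settings cache. */
struct config_entry {
	const char *key;
	const char *value;
};

struct repo {
	const struct config_entry *config;
	size_t config_nr;
	int lookups;      /* counts how often the store is searched */
	int cached_depth; /* -1 means "not parsed yet" */
};

/* Uncached access: every call searches the store and parses the value. */
static int config_get_int(struct repo *r, const char *key, int fallback)
{
	size_t i;

	assert(r && key);
	r->lookups++;
	for (i = 0; i < r->config_nr; i++)
		if (!strcmp(r->config[i].key, key))
			return atoi(r->config[i].value);
	return fallback;
}

/* Cached access: parse on first use, then serve from the struct field. */
static int settings_get_depth(struct repo *r)
{
	assert(r);
	if (r->cached_depth < 0)
		r->cached_depth = config_get_int(r, "demo.depth", 7);
	return r->cached_depth;
}
```

Calling `settings_get_depth()` in a hot loop touches the store once,
while calling `config_get_int()` directly pays the lookup every time;
the price of the cache is one more field in the settings structure.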
There is no silver bullet solution for migrating globals. Because
transitioning these variables requires a deep understanding of the
codebase, communication with mentors and the community is essential.
About Gentle Reading:
---------------------
Current config readers rely on 'die()' to handle error cases. While
pragmatic for cli-tools, fatal exits are unacceptable for a library, as
they will crash the host process. Building upon Derrick Stolee's recent
introduction of gentle parsing functions [6], I propose implementing
'_maybe' variants for core configuration readers. Since removing all
'die()' calls is inevitable for libification, sooner or later config
readers will be purged of 'die()' calls. Utilizing the gentle
functions for newly migrated global variables will reduce the future
amount of work.
Applying this gentle API to widely used functions risks creating
unreviewable patches and merge conflicts. To solve this, I plan to use a
function wrapper approach, similar to the strategy used in early
the_repository migrations [7]. However, the_repository changes are more
mechanical work compared to the gentle transition. In complex call
stacks, a gentle transition risks causing regressions or scope creep.
Utilizing the "normal" config helpers will be helpful in these
conditions.
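
A minimal sketch of the wrapper idea, assuming a plain decimal integer
parser: the gentle function reports failure to its caller, and a thin
wrapper preserves the old fatal behaviour for call sites that have not
been converted yet. The names `parse_int_maybe()` and
`parse_int_or_die()` are hypothetical, and the wrapper prints and exits
directly instead of calling Git's 'die()' so that the example stays
self-contained.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical gentle parser: returns 0 and stores the result on
 * success, -1 on malformed or out-of-range input. It never terminates
 * the process and leaves *out untouched on failure.
 */
static int parse_int_maybe(const char *value, int *out)
{
	char *end;
	long parsed;

	assert(out);
	if (!value || !*value)
		return -1;
	errno = 0;
	parsed = strtol(value, &end, 10);
	if (errno || *end || parsed < INT_MIN || parsed > INT_MAX)
		return -1;
	*out = (int)parsed;
	return 0;
}

/* Thin wrapper that keeps the fatal behaviour for existing callers. */
static int parse_int_or_die(const char *key, const char *value)
{
	int result;

	if (parse_int_maybe(value, &result) < 0) {
		fprintf(stderr, "fatal: bad numeric config value '%s' for '%s'\n",
			value ? value : "(null)", key);
		exit(128);
	}
	return result;
}
```

Existing call sites can keep using the dying wrapper unchanged, while
newly migrated code calls the gentle variant and decides locally how to
react, which keeps each patch small and reviewable.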
Availability:
-------------
I plan to dedicate 40+ hours per week to this project during my active
coding period. However, I want to be completely transparent about my
university's academic calendar to set realistic expectations.
In Turkey, the university summer break begins in July and ends in late
September. During May and June, my schedule will be heavily occupied by
final exams and major group project deadlines. For this reason, my
availability during these two months will be limited to around 10-15
hours per week. I will use this time to stay active on the mailing list,
participate in architectural discussions and submit smaller, preparatory
patches.
To ensure the highest quality of work, I propose utilizing GSoC's
officially supported flexible timeline. I am completely free during
July, August, and September (with no summer school or internships).
During these three months, I will dedicate 40+ hours per week entirely
to git.
Community Bonding (May 1 - May 24):
- Analyze environment.c and create a detailed mitigation plan for each
variable.
- Discuss the plan with mentors to identify potential roadblocks or edge
cases.
- Set up a blog to share bi-weekly updates throughout the project.
Phase 1 (May 25 - June 28):
- Introduce the '_maybe' versions of the config readers and write tests
for them.
- Begin mitigating "low-hanging" globals. To avoid wasting time while
waiting for reviews, start drafting the next patches.
- Publish the first progress reports on the blog.
Phase 2 (June 29 - September 15):
- Discuss globals with mentors where mitigations might cause behavioral
changes.
- Shift focus to the more complex cases, specifically those involving
eager-lazy or '_maybe' transitions.
- Continue publishing regular blog updates.
Phase 3 (September 16 - September 30):
- Act as a buffer period to respond to final feedback on patches
currently under review.
- Complete the final project report and publish it on the blog.
References:
-----------
[1] https://github.com/bkkaracay/caret
[2] https://lore.kernel.org/git/7vpsr6ymg3.fsf_-_@assigned-by-dhcp.cox.net/
[3] https://cloobtech.hashnode.dev/week-5-and-6-design-reviews-rfcs-and-refining-the-path-forward
[4] https://lore.kernel.org/git/xmqq1pk3lmu3.fsf@gitster.g/
[5] https://lore.kernel.org/git/23428022-ab13-4a3e-90ed-ff91ef93f051@gmail.com/
[6] https://lore.kernel.org/all/pull.2044.v3.git.1771849615.gitgitgadget@gmail.com/
[7] https://lore.kernel.org/git/20260109213021.2546-2-l.s.r@web.de/