From: "Burak Kaan Karaçay" <bkkaracay@gmail.com>
To: git@vger.kernel.org
Cc: christian.couder@gmail.com, karthik.188@gmail.com,
jltobler@gmail.com, ayu.chandekar@gmail.com,
siddharthasthana31@gmail.com
Subject: [GSoC Draft Proposal] Refactoring in order to reduce Git's global state
Date: Sun, 8 Mar 2026 14:40:35 +0300
Message-ID: <aa1cn0_ATfh-uRE4@gmail.com>
=================================================
Refactoring in order to reduce Git’s global state
=================================================
Personal Info:
--------------
Name: Burak Kaan Karaçay (he/him)
Email: bkkaracay@gmail.com
Education: UG Sophomore, Marmara University
GitHub: https://github.com/bkkaracay
Timezone: UTC+3 (Istanbul, Turkey)
My Patches:
-----------
+ (Microproject) t2003: modernize path existence checks using test
helpers
- Thread:
https://lore.kernel.org/git/20260208202809.270523-1-bkkaracay@gmail.com/T/
- Thread v2:
https://lore.kernel.org/git/20260209112444.1268765-1-bkkaracay@gmail.com/T/
- Status: Merged to master
- Commit Hash: 168d575719d944759964e004d17a3282b0f883d5
+ [PATCH 0/2] mailmap: reduce global state
- Thread:
https://lore.kernel.org/git/20260219125954.3539324-1-bkkaracay@gmail.com/T/
- Status: Merged to master
- Commit Hash: 2d843a2d3d6c2d5e7861e6aa99743d15d36746b9
Relevant Experience:
--------------------
I am currently developing my own programming language as a hobby
project, writing a zero-dependency interpreter for it in C. While it is
still a work in progress, I have completed the core front-end pipeline.
Building this project has given me practical experience with C
programming, data structures and modular software architecture.
+ To support potential future multithreading, I avoided global variables
in my own project. Instead, I pass state via local contexts.
+ I implemented an arena allocator (memory pool) to reduce the overhead
of repeated malloc calls, limit memory fragmentation and improve cache
locality.
+ I used techniques like string interning and Pratt parsing.
My project is available on my GitHub profile [1]. If you would like to
take a look at the code, 'src/main.c' is a good starting point.
Project Abstract:
-----------------
Git was originally designed as a short-lived CLI tool, where relying on
global variables was highly practical. Over time, the need to embed Git
into other projects and applications emerged. Today, these global
variables are a huge roadblock to the libification of Git, as they make
it impossible to properly handle multiple repositories within a single
process or safely support multi-threading.
This project aims to reduce this reliance by migrating global variables
from 'environment.c' into appropriate locations. This effort will
support the libification goal and modernize Git's internal structure.
Technical Approach:
-------------------
The core challenge of this project is less about relocating globals
than about choosing the correct parsing strategy. The codebase currently
offers two migration strategies for global state removal.
Currently, globals are loaded eagerly via 'repo_config()'. The modern
'repo_config_values()' API provides a safe and straightforward way to
eagerly load variables and reduce the number of globals. However,
eager loading parses all configuration upfront, including values the
current command never uses. As a result, users may encounter fatal
configuration errors that are entirely unrelated to the command they are
executing [2].
In contrast, lazy loading postpones parsing until the variable is
actually required, preventing unrelated configuration errors. However,
migrating to it is significantly trickier. If a misformatted
configuration triggers a 'die()' in the middle of an operation, it risks
causing data corruption. Moreover, lazy loading changes the timing of
error reporting and struggles to replicate eager-loading behavior when
multiple configuration keys affect a single variable [3].
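The trade-off between the two strategies can be illustrated with a
minimal, self-contained sketch. It does not use Git's real config
machinery: 'struct toy_config', 'toy_parse_bool()', 'toy_load_eager()'
and 'toy_get_lazy()' are hypothetical names invented for this example,
and errors are returned instead of passed to 'die()' so that the
behavior is easy to observe.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy config: a fixed list of key/value strings. */
struct toy_entry {
	const char *key;
	const char *value;
};

struct toy_config {
	const struct toy_entry *entries;
	size_t nr;
};

/* Parse "true"/"false"; return 0 on success, -1 on a malformed value. */
static int toy_parse_bool(const char *value, int *out)
{
	assert(out != NULL);
	if (!strcmp(value, "true")) {
		*out = 1;
		return 0;
	}
	if (!strcmp(value, "false")) {
		*out = 0;
		return 0;
	}
	return -1;
}

/*
 * Eager: parse every entry up front. One malformed value fails the
 * whole load, even if the running command never reads that key.
 */
static int toy_load_eager(const struct toy_config *cfg)
{
	size_t i;
	int ignored;

	for (i = 0; i < cfg->nr; i++)
		if (toy_parse_bool(cfg->entries[i].value, &ignored) < 0)
			return -1;
	return 0;
}

/*
 * Lazy: parse only the requested key, at the time it is needed.
 * Returns 0 on success, -1 on a malformed value, 1 if the key is unset.
 */
static int toy_get_lazy(const struct toy_config *cfg, const char *key,
			int *out)
{
	size_t i;

	for (i = 0; i < cfg->nr; i++)
		if (!strcmp(cfg->entries[i].key, key))
			return toy_parse_bool(cfg->entries[i].value, out);
	return 1;
}
```

With a config that contains one valid key and one unrelated malformed
key, 'toy_load_eager()' fails as a whole, while 'toy_get_lazy()' still
succeeds for the valid key and reports the error only when the malformed
key is actually requested. That later error is exactly the timing change
described above.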
If lazy loading is considered safe for a variable, Git provides two APIs
depending on the performance requirements:
* The 'repo_config_get*' function set is suitable for variables
  accessed infrequently, because of the underlying string hashing
  costs. Using this API for such variables avoids bloating 'struct
  repo_settings' [2].
* For frequently accessed variables, caching them within 'struct
  repo_settings' is preferred, as it amortizes hash costs and provides
  direct memory access speed.
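The caching idea behind the second option can be sketched as follows.
This is a simplified illustration and not Git's actual 'struct
repo_settings': 'struct toy_config', 'toy_lookup_bool()', 'struct
toy_settings' and 'toy_settings_example()' are hypothetical, and the
lookup counter exists only to make the amortization visible.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical config handle that counts keyed lookups. */
struct toy_config {
	unsigned lookups; /* how often the keyed lookup ran */
};

/* Stand-in for a hashed lookup by key; only "core.example" is set. */
static int toy_lookup_bool(struct toy_config *cfg, const char *key)
{
	assert(cfg != NULL && key != NULL);
	cfg->lookups++;
	return !strcmp(key, "core.example");
}

/* Hypothetical per-repository cache for frequently read values. */
struct toy_settings {
	struct toy_config *cfg;
	int initialized;
	int example;
};

/*
 * Hot-path accessor: the keyed lookup runs on the first call only.
 * Every later call is a flag check and a plain memory read.
 */
static int toy_settings_example(struct toy_settings *s)
{
	if (!s->initialized) {
		s->example = toy_lookup_bool(s->cfg, "core.example");
		s->initialized = 1;
	}
	return s->example;
}
```

Calling 'toy_settings_example()' a thousand times performs a single
keyed lookup, whereas calling 'toy_lookup_bool()' directly pays the
lookup cost on every call. The price is one extra field per cached
variable, which is why rarely read variables are better served by the
direct lookup.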
There is no silver bullet for migrating globals. Because transitioning
these variables requires a deep understanding of the codebase,
communication with mentors and the community is essential.
About Gentle Reading:
---------------------
Current config readers rely on 'die()' to handle error cases. While
pragmatic for CLI tools, fatal exits are unacceptable for a library, as
they terminate the host process. Building upon Derrick Stolee's recent
introduction of gentle parsing functions [4], I propose implementing
'_maybe' variants for core configuration readers. Since libification
requires removing all 'die()' calls, the config readers will have to
lose theirs sooner or later. Using the gentle functions for newly
migrated global variables now will reduce that future work.
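As a rough illustration of what a gentle reader looks like, here is a
minimal sketch of an integer parser that reports failure to its caller
instead of exiting. The names 'toy_parse_int_maybe()' and
'toy_int_or_default()' are hypothetical and the parsing is deliberately
simplified; this is not the API introduced in [4].

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical gentle reader: returns 0 and stores the result in *out,
 * or returns -1 and leaves *out untouched. It never exits the process.
 */
static int toy_parse_int_maybe(const char *value, int *out)
{
	char *end;
	long n;

	assert(out != NULL);
	if (!value || !*value)
		return -1;
	errno = 0;
	n = strtol(value, &end, 10);
	/* Reject overflow, trailing garbage and values outside int range. */
	if (errno || *end || n < INT_MIN || n > INT_MAX)
		return -1;
	*out = (int)n;
	return 0;
}

/* A library-style caller decides for itself how to react to bad input. */
static int toy_int_or_default(const char *value, int fallback)
{
	int n;

	return toy_parse_int_maybe(value, &n) < 0 ? fallback : n;
}
```

The point of the sketch is the control flow: the decision to abort, warn
or fall back to a default moves from the parser to the caller, which is
what an embedding application needs.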
Applying this gentle API to widely used functions risks creating
unreviewable patches and merge conflicts. To avoid this, I plan to use a
function wrapper approach, similar to the strategy used in early
the_repository migrations [5]. However, the_repository changes are
largely mechanical, whereas the gentle transition is not. In complex
call stacks, a gentle transition risks causing regressions or scope
creep. In those cases, keeping the "normal" (non-gentle) config helpers
is the more practical choice.
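One possible reading of the wrapper approach is sketched below, under
the assumption that the existing function keeps its name and fatal
behavior while the new logic lives in a gentle variant. Both
'toy_config_bool_maybe()' and 'toy_config_bool()' are hypothetical
names, and the 'fprintf()' plus 'exit(128)' pair only imitates what
'die()' does.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* New gentle variant (hypothetical): reports failure to the caller. */
static int toy_config_bool_maybe(const char *key, const char *value,
				 int *out)
{
	assert(key != NULL && out != NULL);
	if (value && !strcmp(value, "true")) {
		*out = 1;
		return 0;
	}
	if (value && !strcmp(value, "false")) {
		*out = 0;
		return 0;
	}
	return -1;
}

/*
 * Old entry point, kept under its existing name so current callers
 * need no changes. It is now a thin wrapper that preserves the fatal
 * behavior on top of the gentle variant.
 */
static int toy_config_bool(const char *key, const char *value)
{
	int result;

	if (toy_config_bool_maybe(key, value, &result) < 0) {
		fprintf(stderr, "fatal: bad boolean value '%s' for '%s'\n",
			value ? value : "(null)", key);
		exit(128);
	}
	return result;
}
```

With this shape, a patch can convert one caller at a time from
'toy_config_bool()' to 'toy_config_bool_maybe()', and callers that are
too risky to convert simply stay on the wrapper.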
Another possible roadblock in the transition is the use of magic
numbers in error reporting. Some functions in Git return -1 and 1 to
tell callers about two different error cases or situations. Introducing
a third hard-coded number to tell callers to stop the Git process for a
misformatted config would be a poor design choice. Adopting a
standardized error type such as 'enum git_error_code' avoids this and is
also a step toward Git's ongoing libification efforts, as it enables
external callers consuming the API to handle errors programmatically.
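To make the contrast with magic numbers concrete, here is a small
sketch of an enum-based error type. The proposal only names the type
'enum git_error_code'; the type 'enum toy_error_code', its enumerators
and the helper functions below are assumptions made up for this
example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical error codes; each situation gets a distinct name. */
enum toy_error_code {
	TOY_OK = 0,
	TOY_ERR_NOT_FOUND,  /* key is unset; caller may use a default */
	TOY_ERR_BAD_CONFIG  /* value is malformed; caller should stop */
};

/* Read a boolean; a NULL value stands for "key not set". */
static enum toy_error_code toy_read_bool(const char *value, int *out)
{
	assert(out != NULL);
	if (!value)
		return TOY_ERR_NOT_FOUND;
	if (!strcmp(value, "true")) {
		*out = 1;
		return TOY_OK;
	}
	if (!strcmp(value, "false")) {
		*out = 0;
		return TOY_OK;
	}
	return TOY_ERR_BAD_CONFIG;
}

/* Callers can branch on names instead of remembering what 1 or -1 mean. */
static const char *toy_error_name(enum toy_error_code code)
{
	switch (code) {
	case TOY_OK:
		return "ok";
	case TOY_ERR_NOT_FOUND:
		return "not found";
	case TOY_ERR_BAD_CONFIG:
		return "bad config";
	}
	return "unknown";
}
```

Adding a further error case later means adding one enumerator, and a
'switch' over the enum lets the compiler warn about callers that do not
handle it yet.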
Availability:
-------------
I plan to dedicate 40+ hours per week to this project during my active
coding period. However, I want to be completely transparent about my
university's academic calendar to set realistic expectations.
In Turkey, the university summer break begins in July and ends in late
September. During May and June, my schedule will be heavily occupied by
final exams and major group project deadlines. For this reason, my
availability during these two months will be limited to around 10-15
hours per week. I will use this time to stay active on the mailing list,
participate in architectural discussions and submit smaller, preparatory
patches.
To ensure the highest quality of work, I propose utilizing GSoC's
officially supported flexible timeline. I am completely free during
July, August, and September (with no summer school or internships).
During these three months, I will dedicate 40+ hours per week entirely
to Git.
Community Bonding (May 1 - May 24):
- Analyze environment.c and create a detailed migration plan for each
variable.
- Discuss the plan with mentors to identify potential roadblocks or edge
cases.
- Submit a patch about 'enum git_error_code' to start community
discussion.
- Set up a blog to share bi-weekly updates throughout the project.
Phase 1 (May 25 - June 28):
- Introduce the '_maybe' versions of the config readers and write tests
for them.
- Begin migrating "low-hanging" globals. To avoid wasting time while
waiting for reviews, start drafting subsequent patches concurrently.
- Publish the first progress reports on the blog.
Phase 2 (June 29 - September 15):
- Discuss globals with mentors where migration might cause behavioral
changes.
- Shift focus to the more complex cases, specifically those involving
eager-lazy or '_maybe' transitions.
- Continue publishing regular blog updates.
Phase 3 (September 16 - September 30):
- Act as a buffer period to respond to final feedback on patches
currently under review.
- Complete the final project report and publish it on the blog.
References:
-----------
[1] https://github.com/bkkaracay/caret
[2] https://lore.kernel.org/git/xmqq1pk3lmu3.fsf@gitster.g/
[3] https://lore.kernel.org/git/23428022-ab13-4a3e-90ed-ff91ef93f051@gmail.com/
[4] https://lore.kernel.org/all/pull.2044.v3.git.1771849615.gitgitgadget@gmail.com/
[5] https://lore.kernel.org/git/20260109213021.2546-2-l.s.r@web.de/
---
Thanks to everyone for their time and guidance. I'm really excited about
the possibility of working on this project, and any feedback to make
this proposal better is deeply appreciated.