public inbox for git@vger.kernel.org
* [GSoC][Draft Proposal] Refactoring in order to reduce Git's global state
@ 2026-02-22 17:59 Tian Yuchen
  2026-02-22 18:34 ` Usman Akinyemi
                   ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Tian Yuchen @ 2026-02-22 17:59 UTC (permalink / raw)
  To: git
  Cc: Christian Couder, Karthik Nayak, Justin Tobler, Ayush Chandekar,
	Siddharth Asthana

Hi everyone,

I'm Tian Yuchen and I'm planning to apply for GSoC this year!

Instead of pasting a giant wall of text into this email, I have
drafted my proposal in a Google Doc. I thought it might be easier for
everyone to leave inline comments and suggestions there. (Of course, if
you're more accustomed to email replies, you can also quote the content
from the doc in your response. Thank you.)

Here is the link:

https://docs.google.com/document/d/1t2sznOvnPz-9tOzVMH--pLxzRqYSJCFzqVWBVfL_NP8/edit?tab=t.0#heading=h.c3c40ftj1ilv

Feel free to provide feedback!

Regards,

Yuchen

* [GSoC Draft Proposal] Refactoring in order to reduce Git's global state
@ 2026-03-08 11:40 Burak Kaan Karaçay
  2026-03-09 15:17 ` Christian Couder
  0 siblings, 1 reply; 21+ messages in thread
From: Burak Kaan Karaçay @ 2026-03-08 11:40 UTC (permalink / raw)
  To: git
  Cc: christian.couder, karthik.188, jltobler, ayu.chandekar,
	siddharthasthana31

=================================================
Refactoring in order to reduce Git’s global state
=================================================

Personal Info:
--------------

Name: Burak Kaan Karaçay (he/him)
Email: bkkaracay@gmail.com 
Education: UG Sophomore, Marmara University
GitHub: https://github.com/bkkaracay
Timezone: UTC+3 (Istanbul, Turkey)


My Patches:
-----------

+ (Microproject) t2003: modernize path existence checks using test
helpers
   - Thread:
     https://lore.kernel.org/git/20260208202809.270523-1-bkkaracay@gmail.com/T/
   - Thread v2:
     https://lore.kernel.org/git/20260209112444.1268765-1-bkkaracay@gmail.com/T/
   - Status: Merged to master
   - Commit Hash: 168d575719d944759964e004d17a3282b0f883d5

+ [PATCH 0/2] mailmap: reduce global state
   - Thread:
     https://lore.kernel.org/git/20260219125954.3539324-1-bkkaracay@gmail.com/T/
   - Status: Merged to master
   - Commit Hash: 2d843a2d3d6c2d5e7861e6aa99743d15d36746b9


Relevant Experience:
--------------------

I am currently developing my own programming language as a hobby
project, writing a zero-dependency interpreter for it in C. While it is
still a work in progress, I have completed the core front-end pipeline.
Building this project has given me practical experience with C
programming, data structures and modular software architecture.

+ To support potential future multithreading, I avoided global variables
in my own project. Instead, I pass state via local contexts.

+ I implemented an arena allocator (memory pool) to reduce the overhead
of frequent malloc calls, limit memory fragmentation and improve cache
locality.

+ I used techniques like string interning and Pratt parsing.

My project is available on my GitHub profile [1]. If you would like to
take a look at the code, 'src/main.c' is a good starting point.


Project Abstract:
-----------------

Git was originally designed as a short-lived CLI tool, where relying on
global variables was highly practical. Over time, the need to embed Git
into other projects and applications emerged. Today, these global
variables are a huge roadblock to the libification of Git, as they make
it impossible to properly handle multiple repositories within a single
process or safely support multi-threading.

This project aims to reduce this reliance by migrating global variables
from 'environment.c' into appropriate locations. This effort will
support the libification goal and modernize Git's internal structure.


Technical Approach:
-------------------

The core challenge of this project is less about relocating globals
than about choosing the correct parsing strategy for each of them. The
codebase currently offers two migration strategies for removing global
state: eager loading and lazy loading.

Currently, globals are loaded eagerly via 'repo_config()'. The modern
'repo_config_values()' API provides a safe and straightforward way to
keep loading variables eagerly while reducing the number of globals.
However, eager loading parses all configuration upfront, including
values the current command never uses. As a result, users may encounter
fatal configuration errors that are entirely unrelated to the command
they are executing [2].

In contrast, lazy loading postpones parsing until the variable is
actually required, which prevents unrelated configuration errors.
However, migrating to it is significantly trickier. If a misformatted
configuration triggers a 'die()' in the middle of execution, it risks
causing data corruption. Moreover, lazy loading changes the timing of
error reporting and struggles to replicate eager-loading behavior when
multiple configuration keys affect a single variable [3].
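
The trade-off between the two strategies can be illustrated with a toy
model. This is only a sketch and does not use Git's real config API:
every name in it ('toy_entry', 'toy_config', 'load_eager', 'get_lazy',
the 'demo.*' keys) is hypothetical, and errors are returned to the
caller instead of calling 'die()' so that the behavior is easy to
observe.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy model; none of these names exist in Git. */
struct toy_entry {
	const char *key;
	const char *value;
};

struct toy_config {
	const struct toy_entry *entries;
	size_t nr;
};

/* Parse a decimal int; returns 0 on success, -1 on malformed input. */
static int parse_int(const char *s, int *out)
{
	char *end;
	long v;

	errno = 0;
	v = strtol(s, &end, 10);
	if (errno || end == s || *end || v < INT_MIN || v > INT_MAX)
		return -1;
	*out = (int)v;
	return 0;
}

/*
 * Eager: parse every entry up front. A single malformed value makes
 * the whole load fail, even if the caller never needs that key.
 */
int load_eager(const struct toy_config *cfg, int *values)
{
	for (size_t i = 0; i < cfg->nr; i++)
		if (parse_int(cfg->entries[i].value, &values[i]) < 0)
			return -1;
	return 0;
}

/*
 * Lazy: parse only the requested key, at the time it is needed.
 * Returns 0 on success, 1 if the key is unset, -1 if it is malformed.
 */
int get_lazy(const struct toy_config *cfg, const char *key, int *out)
{
	for (size_t i = 0; i < cfg->nr; i++)
		if (!strcmp(cfg->entries[i].key, key))
			return parse_int(cfg->entries[i].value, out);
	return 1;
}
```

With one valid and one malformed entry, 'load_eager()' fails as a
whole, while 'get_lazy()' still succeeds for the valid key and reports
the error only when the malformed key is actually requested. That
later error is the one that may arrive in the middle of an operation.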

If lazy loading is considered safe for a variable, Git provides two APIs
depending on the performance requirements:

   * The 'repo_config_get*' family of functions is suitable for
     variables that are accessed infrequently, because each call incurs
     string hashing costs. Using this API for such variables avoids
     bloating 'struct repo_settings' [2].

   * For frequently accessed variables, caching them within 'struct
     repo_settings' is preferred, as it amortizes hash costs and provides
     direct memory access speed.
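
The difference between the two access patterns can be sketched with a
toy repository structure. The names below ('toy_repo',
'toy_config_get_int', 'toy_settings_get_value', 'demo.value') are
hypothetical stand-ins, not Git's real structures or functions. The
'lookups' counter only simulates the cost of a keyed lookup.

```c
#include <string.h>

/* Hypothetical stand-in for a repository with a single config entry. */
struct toy_repo {
	const char *key;
	int value;
	unsigned lookups;	/* number of simulated keyed lookups */

	/* Cached setting, in the spirit of 'struct repo_settings'. */
	int cached_value;
	int cached_valid;
};

/* Keyed lookup: every call pays the lookup cost again. */
int toy_config_get_int(struct toy_repo *r, const char *key, int *out)
{
	r->lookups++;
	if (strcmp(r->key, key))
		return 1;	/* key not set */
	*out = r->value;
	return 0;
}

/* Cached accessor: the keyed lookup runs at most once per repository. */
int toy_settings_get_value(struct toy_repo *r)
{
	if (!r->cached_valid) {
		if (toy_config_get_int(r, "demo.value", &r->cached_value))
			r->cached_value = 0;	/* hypothetical default */
		r->cached_valid = 1;
	}
	return r->cached_value;
}
```

Calling the keyed lookup three times increments the counter three
times, whereas any number of calls to the cached accessor adds only
one lookup. The price is one more pair of fields in the structure,
which is why the cached form is worth it only for hot variables.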

There is no silver bullet for migrating globals. Because transitioning
these variables requires a deep understanding of the codebase,
communication with mentors and the community is essential.


About Gentle Reading:
---------------------

Current config readers rely on 'die()' to handle error cases. While
pragmatic for CLI tools, fatal exits are unacceptable for a library, as
they terminate the host process. Building upon Derrick Stolee's recent
introduction of gentle parsing functions [4], I propose implementing
'_maybe' variants for core configuration readers. Since libification
requires removing every 'die()' call, config readers will have to be
purged of them sooner or later. Using the gentle functions for newly
migrated global variables now will reduce that future work.
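
As a minimal sketch of what a gentle reader looks like, the function
below reports a malformed value through its return code and never
exits the process. The name 'toy_parse_bool_maybe' is hypothetical and
is not one of the functions from [4]. For brevity it accepts only a
small, case-sensitive set of spellings.

```c
#include <string.h>

/*
 * Hypothetical gentle parser. Returns 0 and stores the result in
 * '*out' on success; returns -1 and leaves '*out' untouched when the
 * value is missing or malformed. It never terminates the process.
 */
int toy_parse_bool_maybe(const char *value, int *out)
{
	if (!value)
		return -1;
	if (!strcmp(value, "true") || !strcmp(value, "yes") ||
	    !strcmp(value, "on")) {
		*out = 1;
		return 0;
	}
	if (!strcmp(value, "false") || !strcmp(value, "no") ||
	    !strcmp(value, "off")) {
		*out = 0;
		return 0;
	}
	return -1;
}
```

A library caller can then decide what to do with the failure, for
example fall back to a default or surface the error to its own user,
rather than having the decision made for it.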

Applying this gentle API to widely used functions risks creating
unreviewable patches and merge conflicts. To solve this, I plan to use a
function wrapper approach, similar to the strategy used in early
the_repository migrations [5]. However, the_repository changes are more
mechanical than the gentle transition. In complex call stacks, a gentle
transition risks causing regressions or scope creep. In such cases,
continuing to use the "normal" config helpers is the safer choice.
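
The wrapper idea can be sketched as follows, assuming a hypothetical
pair of functions: the parsing logic lives in a gentle core
('toy_parse_nonneg_maybe'), and the old dying entry point
('toy_parse_nonneg') becomes a thin wrapper around it. Neither name
exists in Git, and the exit code and message are illustrative only.

```c
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Gentle core: reports a malformed value to the caller with -1. */
int toy_parse_nonneg_maybe(const char *value, int *out)
{
	char *end;
	long v;

	if (!value)
		return -1;
	errno = 0;
	v = strtol(value, &end, 10);
	if (errno || end == value || *end || v < 0 || v > INT_MAX)
		return -1;
	*out = (int)v;
	return 0;
}

/*
 * Dying wrapper: keeps the old "return the value or exit" contract,
 * so existing callers do not have to change in the same patch.
 */
int toy_parse_nonneg(const char *value)
{
	int out;

	if (toy_parse_nonneg_maybe(value, &out) < 0) {
		fprintf(stderr, "fatal: bad numeric value '%s'\n",
			value ? value : "(null)");
		exit(128);
	}
	return out;
}
```

Callers can then be converted to the gentle variant one at a time, in
small patches, while untouched call sites keep their current behavior.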

Another possible roadblock in the transition is the use of magic numbers
in error reporting. Some functions in Git use -1 and 1 to inform callers
about two different error cases or situations. Introducing a third
hard-coded number to tell callers to stop the Git process because of a
misformatted config would be a poor design choice. Furthermore, adopting
a standardized error type such as 'enum git_error_code' supports Git's
ongoing libification efforts, as it enables external callers consuming
the API to handle errors programmatically.
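
To make the contrast with magic numbers concrete, here is a small
sketch of named error codes. The enum is called 'toy_error_code' on
purpose: its members and the helper functions are hypothetical and say
nothing about what 'enum git_error_code' would actually contain.

```c
#include <string.h>

/* Hypothetical error codes; the real enum members are undecided. */
enum toy_error_code {
	TOY_OK = 0,
	TOY_ERR_NOT_FOUND,	/* key is not set */
	TOY_ERR_BAD_CONFIG,	/* key is set but malformed */
};

/* Reads a boolean and names each failure mode explicitly. */
enum toy_error_code toy_read_bool(const char *value, int *out)
{
	if (!value)
		return TOY_ERR_NOT_FOUND;
	if (!strcmp(value, "true")) {
		*out = 1;
		return TOY_OK;
	}
	if (!strcmp(value, "false")) {
		*out = 0;
		return TOY_OK;
	}
	return TOY_ERR_BAD_CONFIG;
}

/* Maps a code to a message so callers need not hard-code numbers. */
const char *toy_strerror(enum toy_error_code code)
{
	switch (code) {
	case TOY_OK:
		return "success";
	case TOY_ERR_NOT_FOUND:
		return "key not set";
	case TOY_ERR_BAD_CONFIG:
		return "malformed config value";
	}
	return "unknown error";
}
```

A caller can switch on the returned code and, for instance, apply a
default on 'TOY_ERR_NOT_FOUND' but abort its own operation on
'TOY_ERR_BAD_CONFIG', without memorizing what -1 or 1 means for each
function.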


Availability:
-------------

I plan to dedicate 40+ hours per week to this project during my active
coding period. However, I want to be completely transparent about my
university's academic calendar to set realistic expectations.

In Turkey, the university summer break begins in July and ends in late
September. During May and June, my schedule will be heavily occupied by
final exams and major group project deadlines. For this reason, my
availability during these two months will be limited to around 10-15
hours per week. I will use this time to stay active on the mailing list,
participate in architectural discussions and submit smaller, preparatory
patches.

To ensure the highest quality of work, I propose utilizing GSoC's
officially supported flexible timeline. I am completely free during
July, August, and September (with no summer school or internships).
During these three months, I will dedicate 40+ hours per week entirely
to Git.


Community Bonding (May 1 - May 24):
- Analyze environment.c and create a detailed migration plan for each
   variable.
- Discuss the plan with mentors to identify potential roadblocks or edge
   cases.
- Submit a patch about 'enum git_error_code' to start community
   discussion.
- Set up a blog to share bi-weekly updates throughout the project.

Phase 1 (May 25 - June 28):
- Introduce the '_maybe' versions of the config readers and write tests
   for them.
- Begin migrating "low-hanging" globals. To avoid wasting time while
   waiting for reviews, start drafting subsequent patches concurrently.
- Publish the first progress reports on the blog.

Phase 2 (June 29 - September 15):
- Discuss globals with mentors where migrations might cause behavioral
   changes.
- Shift focus to the more complex cases, specifically those involving
   eager-lazy or '_maybe' transitions.
- Continue publishing regular blog updates.

Phase 3 (September 16 - September 30):
- Use this period as a buffer to respond to final feedback on patches
   currently under review.
- Complete the final project report and publish it on the blog.

References:
-----------

[1] https://github.com/bkkaracay/caret
[2] https://lore.kernel.org/git/xmqq1pk3lmu3.fsf@gitster.g/
[3] https://lore.kernel.org/git/23428022-ab13-4a3e-90ed-ff91ef93f051@gmail.com/
[4] https://lore.kernel.org/all/pull.2044.v3.git.1771849615.gitgitgadget@gmail.com/
[5] https://lore.kernel.org/git/20260109213021.2546-2-l.s.r@web.de/

---

Thanks to everyone for their time and guidance. I'm really excited about
the possibility of working on this project, and any feedback to make
this proposal better is deeply appreciated.


end of thread, other threads:[~2026-03-14 17:57 UTC | newest]

Thread overview: 21+ messages
2026-02-22 17:59 [GSoC][Draft Proposal] Refactoring in order to reduce Git's global state Tian Yuchen
2026-02-22 18:34 ` Usman Akinyemi
2026-02-23  0:57   ` Tian Yuchen
2026-02-23  1:07 ` [GSoC][Draft Proposal V2] " Tian Yuchen
2026-02-25 17:11 ` [GSoC][Draft Proposal v3] " Tian Yuchen
2026-02-26  9:27   ` Karthik Nayak
2026-02-26 14:03     ` Tian Yuchen
2026-02-26 14:16     ` Tian Yuchen
2026-02-26 17:02   ` [GSoC][Draft Proposal v4] " Tian Yuchen
2026-02-27  9:03     ` Phillip Wood
2026-02-27 15:07       ` Tian Yuchen
2026-02-27 16:58     ` Tian Yuchen
2026-03-01 16:43       ` Phillip Wood
2026-03-01 16:58         ` Tian Yuchen
2026-03-02 19:06         ` Junio C Hamano
2026-03-03 12:11       ` [GSoC][Draft Proposal v6] " Tian Yuchen
2026-03-08 17:38         ` [GSoC][Draft Proposal v7] " Tian Yuchen
2026-03-14 17:57           ` Tian Yuchen
  -- strict thread matches above, loose matches on Subject: below --
2026-03-08 11:40 [GSoC Draft Proposal] " Burak Kaan Karaçay
2026-03-09 15:17 ` Christian Couder
2026-03-11 18:34   ` Burak Kaan Karaçay
