public inbox for git@vger.kernel.org
From: "Burak Kaan Karaçay" <bkkaracay@gmail.com>
To: "Burak Kaan Karaçay" <bkkaracay@gmail.com>, git@vger.kernel.org
Cc: <christian.couder@gmail.com>, <karthik.188@gmail.com>,
	<jltobler@gmail.com>, <ayu.chandekar@gmail.com>,
	<siddharthasthana31@gmail.com>
Subject: [GSoC Draft Proposal v2] Refactoring in order to reduce Git's global state
Date: Sun, 15 Mar 2026 12:52:57 +0300
Message-ID: <DH39IOSGA9U9.K4GIHJXAPHX7@gmail.com>
In-Reply-To: <aa1cn0_ATfh-uRE4@gmail.com>

Changes in v2:
- Clarified the difference between merge commit hashes and commit hashes.
- Added 'Project Background' section.
- Refined the part about Olamide's API in 'Technical Approach'.
- Removed 'enum git_error_code' proposal.

Thanks for your time and guidance.

---

=================================================
Refactoring in order to reduce Git’s global state
=================================================

Personal Info:
--------------

Name: Burak Kaan Karaçay (he/him)
Email: bkkaracay@gmail.com 
Education: UG Sophomore, Marmara University
GitHub: https://github.com/bkkaracay
Timezone: UTC+3 (Istanbul, Turkey)


My Patches:
-----------

+ (Microproject) t2003: modernize path existence checks using test
helpers
   - Thread:
     https://lore.kernel.org/git/20260208202809.270523-1-bkkaracay@gmail.com/T/
   - Thread v2:
     https://lore.kernel.org/git/20260209112444.1268765-1-bkkaracay@gmail.com/T/
   - Status: Merged to master
   - Merge Commit Hash: 70d3916a7db5233ce01f2f3f36ee04d57c0f9252

+ [PATCH v2 0/2] mailmap: reduce global state
   - Thread:
     https://lore.kernel.org/git/20260219125954.3539324-1-bkkaracay@gmail.com/T/
   - Status: Merged to master
   - Merge Commit Hash: 2d843a2d3d6c2d5e7861e6aa99743d15d36746b9
   
+ [PATCH v3 0/2] run-command: stop using the_repository
   - Thread:
     https://lore.kernel.org/git/20260311151923.4178655-1-bkkaracay@gmail.com/T/
   - Status: Will merge to master
   - Merge Commit Hash (next): 61ffe62b75cf89af469af53b15f3fdc6639d217a


Relevant Experience:
--------------------

I am currently developing my own programming language as a hobby
project, writing a zero-dependency interpreter for it in C. While it is
still a work in progress, I have completed the core front-end pipeline.
Building this project has given me practical experience with C
programming, data structures and modular software architecture.

+ To support potential future multithreading, I avoided global variables
in my own project. Instead, I pass state via local contexts.

+ I implemented an arena allocator (memory pool) to reduce the overhead
of frequent malloc calls, limit memory fragmentation and improve cache
locality.

+ I used techniques like string interning and Pratt parsing.

My project is available on my GitHub profile [1]. If you would like to
take a look at the code, 'src/main.c' is a good starting point.


Project Abstract:
-----------------

Git was originally designed as a short-lived CLI tool, where relying on
global variables was highly practical. Over time, the need to embed Git
into other projects and applications emerged. Today, these global
variables are a major roadblock to the libification of Git, as they make
it impossible to properly handle multiple repositories within a single
process or to safely support multi-threading.

This project aims to reduce this reliance by migrating global variables
from 'environment.c' into appropriate locations. This effort will
support the libification goal and modernize Git's internal structure.


Project Background:
-------------------

Discussions surrounding the "libification" of Git date back to 2005 [2].
However, efforts to isolate global state in environment.c accelerated
following Patrick Steinhardt's groundwork in 2024.

Once the environment.c cleanup became an official GSoC project, the
patch series from the first intern in this area, Ayush Chandekar,
provided valuable lessons on best practices and potential pitfalls.
During the later stages of Ayush's internship, the limitations and
safety risks of lazy-parsing became apparent. To solve this bottleneck,
Phillip Wood proposed a new eager-loading API, which was successfully
implemented by Outreachy intern Olamide Caleb Bello. Although this API
is functional, it currently reads config values only from
'the_repository', a restriction chosen to avoid invasive changes across
the codebase [3].


Technical Approach:
-------------------

The core challenge of this project is less about relocating globals than
about choosing the correct parsing strategy for each of them. The
codebase currently offers two migration strategies for global state
removal: eager-loading and lazy-loading.

Currently, globals are loaded eagerly via 'repo_config()'. Olamide's
'struct config_values' API provides a modern way to load these globals
eagerly by parsing them into fields in 'repo->cfg_values'. However,
eager-loading parses all configuration values upfront, including ones
the current command never uses. As a result, users may encounter fatal
configuration errors that are entirely unrelated to the command they are
executing [4].
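
To make the trade-off concrete, here is a minimal, self-contained sketch
of eager-loading. It is not Git's code: 'struct demo_entry', 'struct
demo_cfg_values' and 'demo_load_eagerly()' are hypothetical stand-ins,
the two key names are used only as labels, and the defaults are
illustrative. Every known key is parsed up front, so a malformed value
is reported even if the caller never reads that field.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one "key = value" config entry. */
struct demo_entry {
	const char *key;
	const char *value;
};

/* Hypothetical stand-in for eagerly filled per-repository values. */
struct demo_cfg_values {
	int trust_ctime;
	int abbrev_len;
};

/* Parse "true"/"false"; return 0 on success, -1 on a malformed value. */
static int demo_parse_bool(const char *s, int *out)
{
	if (!strcmp(s, "true")) {
		*out = 1;
		return 0;
	}
	if (!strcmp(s, "false")) {
		*out = 0;
		return 0;
	}
	return -1;
}

/*
 * Eager loading: every known key is parsed up front. Returns NULL on
 * success, or the offending key at the point where a CLI tool would
 * typically abort.
 */
static const char *demo_load_eagerly(const struct demo_entry *e, size_t n,
				     struct demo_cfg_values *v)
{
	v->trust_ctime = 1;	/* illustrative defaults */
	v->abbrev_len = 7;

	for (size_t i = 0; i < n; i++) {
		if (!strcmp(e[i].key, "core.trustctime")) {
			if (demo_parse_bool(e[i].value, &v->trust_ctime))
				return e[i].key;
		} else if (!strcmp(e[i].key, "core.abbrev")) {
			char *end;
			long len = strtol(e[i].value, &end, 10);

			if (end == e[i].value || *end || len < 4 || len > 64)
				return e[i].key;
			v->abbrev_len = (int)len;
		}
	}
	return NULL;
}
```

A caller that only needs 'trust_ctime' still fails when the abbreviation
length is malformed, which is the unrelated-error problem described
above.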

In contrast, lazy-loading postpones parsing until the variable is
actually required, which prevents unrelated configuration errors.
However, it is significantly trickier to migrate to. If a malformed
configuration value triggers a 'die()' in the middle of an operation, it
risks causing data corruption. Moreover, lazy-loading changes the timing
of error reporting and struggles to replicate eager-loading behavior
when multiple configuration keys affect a single variable [5].
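
The change in error timing can be illustrated with a small sketch. The
names 'struct demo_lazy_bool' and 'demo_lazy_get()' are hypothetical and
do not exist in Git. The raw text is stored at startup and parsed only
on first use, so a malformed value is first noticed at whatever point
the program happens to read it.

```c
#include <string.h>

/* Hypothetical lazily parsed boolean setting. */
struct demo_lazy_bool {
	const char *raw;	/* unparsed text as read from the config */
	int parsed;		/* 1 once 'value' holds a valid result */
	int value;
};

/*
 * Parse on first access. Returns 0 and stores the value in *out, or
 * returns -1 for a malformed value. The error surfaces here, possibly
 * in the middle of an operation, not at startup.
 */
static int demo_lazy_get(struct demo_lazy_bool *b, int *out)
{
	if (!b->parsed) {
		if (!strcmp(b->raw, "true"))
			b->value = 1;
		else if (!strcmp(b->raw, "false"))
			b->value = 0;
		else
			return -1;
		b->parsed = 1;
	}
	*out = b->value;
	return 0;
}
```

If the caller reacted to the -1 by aborting, any work already done
before the first read would be left half-finished, which is why each
variable needs to be reviewed before it is made lazy.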

If lazy-loading is considered safe for a variable, Git provides two APIs
depending on the performance requirements:

   * The 'repo_config_get*' function set is suitable for variables that
     are accessed infrequently, because every call pays the underlying
     string hashing cost. Using this API for such variables also avoids
     bloating 'struct repo_settings' [4].

   * For frequently accessed variables, caching them within 'struct
     repo_settings' is preferred, as it amortizes the hashing cost and
     turns later reads into plain field accesses.
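
The difference between the two options can be sketched as follows. This
is a toy model, not Git's implementation: 'struct demo_kv', 'struct
demo_repo', 'demo_get_int()' and 'demo_cached_flag()' are hypothetical,
and a linear scan stands in for the real keyed lookup. The 'lookups'
counter exists only to make the cost visible.

```c
#include <string.h>

/* Hypothetical parsed config entry. */
struct demo_kv {
	const char *key;
	int value;
};

/* Hypothetical repository with a small cached-settings area. */
struct demo_repo {
	const struct demo_kv *entries;
	size_t nr;
	unsigned lookups;	/* number of keyed lookups performed */
	int settings_ready;	/* 1 once 'cached_flag' is filled in */
	int cached_flag;
};

/* Keyed lookup on every call: returns 0 if found, 1 if missing. */
static int demo_get_int(struct demo_repo *r, const char *key, int *dest)
{
	r->lookups++;
	for (size_t i = 0; i < r->nr; i++) {
		if (!strcmp(r->entries[i].key, key)) {
			*dest = r->entries[i].value;
			return 0;
		}
	}
	return 1;
}

/* Fill the cached field once; later calls return immediately. */
static void demo_prepare_settings(struct demo_repo *r)
{
	if (r->settings_ready)
		return;
	r->cached_flag = 0;	/* default when the key is missing */
	demo_get_int(r, "demo.flag", &r->cached_flag);
	r->settings_ready = 1;
}

/* Hot-path accessor: one lookup in total, then plain field reads. */
static int demo_cached_flag(struct demo_repo *r)
{
	demo_prepare_settings(r);
	return r->cached_flag;
}
```

Calling 'demo_cached_flag()' in a loop performs a single keyed lookup,
whereas calling 'demo_get_int()' in the same loop performs one per
iteration. The price of caching is an extra field per setting, which is
why rarely used variables are better served by the on-demand lookup.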

There is no silver bullet for migrating globals. Because transitioning
these variables requires a deep understanding of the codebase,
communication with mentors and the community is essential.


About Gentle Reading:
---------------------

Current config readers rely on 'die()' to handle error cases. While
pragmatic for CLI tools, fatal exits are unacceptable for a library, as
they terminate the host process. Building upon Derrick Stolee's recent
introduction of gentle parsing functions [6], I propose implementing
'_maybe' variants for core configuration readers. Since libification
requires removing every 'die()' call eventually, the config readers will
have to be converted sooner or later. Using the gentle functions for
newly migrated global variables now will reduce that future work.
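
As a rough illustration of what a '_maybe' variant could look like, the
sketch below pairs a gentle parser with a fatal wrapper. Both function
names are hypothetical and are not taken from Git or from the series in
[6]. The exit status 128 mirrors what 'die()' uses, and the error
message wording is made up for the example.

```c
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Gentle variant (hypothetical name): returns 0 and stores the result
 * in *out, or returns -1 and leaves *out untouched. It never exits.
 */
static int demo_config_int_maybe(const char *value, int *out)
{
	char *end;
	long n;

	if (!value || !*value)
		return -1;
	errno = 0;
	n = strtol(value, &end, 10);
	if (errno || *end || n < INT_MIN || n > INT_MAX)
		return -1;
	*out = (int)n;
	return 0;
}

/*
 * Fatal variant kept for CLI callers: a thin wrapper that turns the
 * error code into a process exit.
 */
static int demo_config_int(const char *key, const char *value)
{
	int n;

	if (demo_config_int_maybe(value, &n)) {
		fprintf(stderr, "fatal: bad numeric config value '%s' for '%s'\n",
			value ? value : "(null)", key);
		exit(128);
	}
	return n;
}
```

Library callers would use the gentle variant and decide for themselves
how to report the failure, while existing command-line code paths could
keep the fatal behavior unchanged.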

Applying this gentle API to widely used functions risks creating
unreviewable patches and merge conflicts. To avoid this, I plan to use a
function wrapper approach, similar to the strategy used in early
the_repository migrations [7]. However, the_repository changes are more
mechanical than the gentle transition. In complex call stacks, a gentle
transition risks causing regressions or scope creep. In those cases, I
will fall back to the "normal" config helpers.
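
The wrapper idea can be sketched in a few lines. Everything here is a
hypothetical stand-in ('struct demo_ctx', 'demo_the_ctx' and both
functions); the sketch only shows the shape of the technique, not the
code from [7]. The new variant takes its context explicitly, and the old
entry point keeps its signature by forwarding the global.

```c
/* Hypothetical per-repository context. */
struct demo_ctx {
	int ignore_case;
};

/* Stand-in for a process-wide default such as 'the_repository'. */
static struct demo_ctx demo_default_ctx = { 1 };
static struct demo_ctx *demo_the_ctx = &demo_default_ctx;

/* New variant: all state comes in through the parameter. */
static int demo_ctx_ignore_case(const struct demo_ctx *ctx)
{
	return ctx->ignore_case;
}

/*
 * Old entry point kept as a thin wrapper, so existing callers compile
 * unchanged and can be converted one patch at a time.
 */
static int demo_ignore_case(void)
{
	return demo_ctx_ignore_case(demo_the_ctx);
}
```

Because each converted caller is a one-line change from the wrapper to
the explicit variant, patches stay small and the wrapper can be deleted
once its last caller is gone.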


Availability:
-------------

I plan to dedicate 40+ hours per week to this project during my active
coding period. However, I want to be completely transparent about my
university's academic calendar to set realistic expectations.

In Turkey, the university summer break begins in July and ends in late
September. During May and June, my schedule will be heavily occupied by
final exams and major group project deadlines. For this reason, my
availability during these two months will be limited to around 10-15
hours per week. I will use this time to stay active on the mailing list,
participate in architectural discussions and submit smaller, preparatory
patches.

To ensure the highest quality of work, I propose utilizing GSoC's
officially supported flexible timeline. I am completely free during
July, August, and September (with no summer school or internships).
During these three months, I will dedicate 40+ hours per week entirely
to Git.


Community Bonding (May 1 - May 24):
- Analyze environment.c and create a detailed migration plan for each
   variable.
- Discuss the plan with mentors to identify potential roadblocks or edge
   cases.
- Set up a blog to share bi-weekly updates throughout the project.

Phase 1 (May 25 - June 28):
- Introduce the '_maybe' versions of the config readers and write tests
   for them.
- Begin migrating "low-hanging" globals. To avoid wasting time while
   waiting for reviews, start drafting the next patches.
- Publish the first progress reports on the blog.

Phase 2 (June 29 - September 15):
- Discuss with mentors the globals whose migration might cause
   behavioral changes.
- Shift focus to the more complex cases, specifically those involving
   eager-lazy or '_maybe' transitions.
- Continue publishing regular blog updates.

Phase 3 (September 16 - September 30):
- Use this as a buffer period to respond to final feedback on patches
   still under review.
- Complete the final project report and publish it on the blog.

References:
-----------

[1] https://github.com/bkkaracay/caret
[2] https://lore.kernel.org/git/7vpsr6ymg3.fsf_-_@assigned-by-dhcp.cox.net/
[3] https://cloobtech.hashnode.dev/week-5-and-6-design-reviews-rfcs-and-refining-the-path-forward
[4] https://lore.kernel.org/git/xmqq1pk3lmu3.fsf@gitster.g/
[5] https://lore.kernel.org/git/23428022-ab13-4a3e-90ed-ff91ef93f051@gmail.com/
[6] https://lore.kernel.org/all/pull.2044.v3.git.1771849615.gitgitgadget@gmail.com/
[7] https://lore.kernel.org/git/20260109213021.2546-2-l.s.r@web.de/
