git.vger.kernel.org archive mirror
* [GSOC] [PROPOSAL V1]: Refactoring in order to reduce Git’s global state
@ 2025-03-26  5:26 Ayush Chandekar
  2025-03-28 13:06 ` shejialuo
                   ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Ayush Chandekar @ 2025-03-26  5:26 UTC (permalink / raw)
  To: git; +Cc: ps, karthik.188, shejialuo, christian.couder, shyamthakkar001

Hello,
This is my GSoC 2025 proposal for the project "Refactoring in order to reduce Git’s global state".
You can view docs version here: 
https://docs.google.com/document/d/1tJrtWxo1UGKChB3hu5eZ-ljm0FtU_fsv0TnIRwu3EKY/edit?usp=sharing

---------

Refactoring in order to reduce Git’s global state

My Information:
---------------

Name: Ayush Chandekar
Email: ayu.chandekar@gmail.com
Mobile No: (+91) 9372496874
Education: UG Sophomore, IIT Roorkee
Github: https://github.com/ayu-ch
Blog: https://ayu-ch.github.io


About me:
---------

I'm Ayush Chandekar, a UG Sophomore studying at Indian Institute of
Technology, Roorkee. I like participating in various software development
and tech-development endeavors, usually hackathons, CTFs, and projects at
SDSLabs. SDSLabs is a student-run technical group that includes passionate
developers and designers interested in various fields and involved in multiple
software development projects that aim to foster a software development
culture on campus. Being a part of this group has exposed me to different
software development methodologies, tools and frameworks and helped me become
comfortable contributing to an open-source project with multiple contributors.
Some open-source contributions I made here are: [1], [2] & [3]

I see this project as a meaningful opportunity to deepen my involvement in
the Git community and to build a foundation for continued contributions to
open source development in the future.


Overview:
---------

Git currently uses a global object called `the_repository`, which refers to a
single instance of `struct repository`. Many internal functions rely on this
global object rather than accepting a `struct repository` as an explicit
parameter. This design inherently assumes a single active repository,
making it difficult to support multi-repository use cases and obstructing
the long-term goal of libification of Git.

A key architectural limitation is that while `struct repository` encapsulates
some repository-specific information, many important environment variables
and configuration settings that logically belong to a repository are still
stored as global variables, primarily in `environment.c`, not within the
`repository` struct. As a result, even if multiple repositories were to
exist concurrently, they would still share this global state, leading to
incorrect behavior, race conditions, or subtle bugs.
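
To make the problem concrete, here is a minimal sketch in C. The type and
function names (`struct demo_repo`, `demo_abbrev_len_global`,
`abbrev_len_implicit()`, `abbrev_len_explicit()`) are simplified stand-ins
invented for illustration and are not Git's actual definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for `struct repository`; not Git's real layout. */
struct demo_repo {
	const char *gitdir;
	int abbrev_len; /* setting stored per repository */
};

/* Global-state style: a single value shared by the whole process. */
int demo_abbrev_len_global = 7;

/* Implicitly tied to the one global; cannot distinguish repositories. */
int abbrev_len_implicit(void)
{
	return demo_abbrev_len_global;
}

/* Explicit style: the caller says which repository it means. */
int abbrev_len_explicit(const struct demo_repo *repo)
{
	assert(repo != NULL);
	return repo->abbrev_len;
}
```

With two `struct demo_repo` instances holding different values, the explicit
variant returns the right value for each, while the implicit variant can only
ever reflect whichever value was written to the global last.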

This project aims to refactor Git’s environment handling by relocating
these global variables into appropriate local contexts, primarily into
`struct repository` and `struct repo_settings`. This change will not
only make the environment state repository-specific, but also improve the
modularity and maintainability of the codebase. The work involves identifying
environment-related global variables, designing a suitable structure to
house them within the repository context, and updating all affected code
paths accordingly.

The difficulty of this project is medium, and it is estimated to take 
175 to 350 hours.


Pre-GSOC:
---------

I started exploring Git’s codebase and documentation around the end of
January, familiarizing myself with its structure and development practices. I
submitted a microproject, which helped me navigate the code and contribution
workflow.

After selecting the project on refactoring Git’s state, I studied the
surrounding code and reviewed past patches ([4], [5], [6], [7], [8] & [9])
to understand the reasoning behind previous changes. To better prepare
for the GSoC timeline, I also submitted a patch related to the project,
gaining hands-on experience with both the implementation details and the
submission process.


Patches:
--------

+ (Microproject) t6423: fix suppression of Git’s exit code in tests
	Thread:
	https://public-inbox.org/git/20250202120926.322417-1-ayu.chandekar@gmail.com/
	Status: Merged into master 
	Commit Hash: 7c1d34fe5d1229362f2c3ecf2d493167a1f555a2 
	Description: Instead of executing a Git command as the upstream component of
				 a pipe, which can result in the exit status being lost, redirect
				 its output to a file and then process that file in two steps to
				 ensure the exit status is properly preserved.

+ midx: implement progress reporting for QSORT operation
	Thread:
	https://public-inbox.org/git/20250210074623.136599-1-ayu.chandekar@gmail.com/
	Status: Dropped 
	Description: Add progress reporting during the QSORT operation in
				 multi-pack-index verification. I found this TODO while
				 going through the code and thought it was interesting.
				 However, my approach assumed that the qsort() operation
				 processes elements in a structured order, which isn't
				 guaranteed.

+ Stop depending on `the_repository` for core.attributesfile
	Thread:
	https://public-inbox.org/git/20250310151048.69825-1-ayu.chandekar@gmail.com/
	Status: WIP, needs more discussion.  
	Description: This patch refactors access to the `core.attributesfile`
				 configuration by moving it into the `repo_settings` struct.
				 It eliminates the global variable `git_attributes_file` and 
				 updates relevant code paths to pass the `struct repository`
				 as a parameter.
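
To illustrate the idea behind the last patch above, here is a minimal sketch
in C. It is not the code from the patch: `struct demo_settings`,
`struct demo_repo`, and `demo_get_attributes_file()` are hypothetical names,
and the "config store" is reduced to a single field so that the example is
self-contained.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; Git's real structs and helpers differ. */
struct demo_settings {
	int loaded;                  /* has the value been read yet? */
	const char *attributes_file; /* value of core.attributesfile, or NULL */
};

struct demo_repo {
	const char *configured_attributes_file; /* pretend config store */
	struct demo_settings settings;
};

/* Before: a process-wide global such as `git_attributes_file`. */

/* After: read through the repository, loading the value on first use. */
const char *demo_get_attributes_file(struct demo_repo *repo)
{
	assert(repo != NULL);
	if (!repo->settings.loaded) {
		repo->settings.attributes_file = repo->configured_attributes_file;
		repo->settings.loaded = 1;
	}
	return repo->settings.attributes_file;
}
```

Each repository carries its own copy of the setting, so two repositories with
different `core.attributesfile` values no longer overwrite each other.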


Proposed Plan:
--------------

I have been reviewing global variables across the codebase to understand their
dependencies and impact. To do this, I examined `config.c` and cross-referenced
it with `environment.c` to see how these variables are currently managed. The
goal of this project is to eliminate global variables by moving their
configurations into repository-specific settings. This involves:

-   Identifying all occurrences of these global variables.

-   Removing dependencies on `the_repository`.

-   Updating function signatures to pass `struct repository` explicitly.

-   Replacing global variable references with repository-scoped configurations.

Instead of adding all variables directly into `repo_settings`, we can group
related variables into specialized structs (e.g., `performance_config`,
`behaviour_config`, `whitespace_config`) and embed these within `repo_settings`.
This approach ensures a more modular and maintainable design while keeping 
`repo_settings` manageable.
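
A rough sketch of what such grouping could look like in C is given here. Only
the three struct names come from the paragraph above; every field name, the
wrapper `struct demo_repo_settings`, and the accessor `demo_ignore_case()` are
illustrative assumptions rather than a settled design.

```c
#include <assert.h>
#include <stddef.h>

/* Field names below are illustrative guesses, not a settled design. */
struct performance_config {
	int pack_compression_level;
};

struct behaviour_config {
	int ignore_case;
};

struct whitespace_config {
	unsigned whitespace_rule;
};

/* Simplified stand-in for `struct repo_settings` with embedded groups. */
struct demo_repo_settings {
	struct performance_config performance;
	struct behaviour_config behaviour;
	struct whitespace_config whitespace;
};

/* Callers reach a setting through the group it belongs to. */
int demo_ignore_case(const struct demo_repo_settings *s)
{
	assert(s != NULL);
	return s->behaviour.ignore_case;
}
```

Embedding the groups by value keeps a single allocation for the settings while
letting each patch series touch only one group at a time.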

I have also created a diagram explaining this structure in [10].

With this approach, I can structure the patch series by grouping the refactoring 
of related variables within specific structs. This will help maintain a clean and
organized codebase while also making the development and review process more 
systematic and efficient.

One key challenge is determining which variables should be part of
`repo_settings` and which should remain separate. While working on the patch to
refactor access to `core.attributesfile`, I received feedback from Junio that not
all global variables should be blindly moved into the `repo_settings` struct.
This reinforced the need to carefully assess which variables belong in `repo_settings`
and which should be handled differently.

This plan is flexible and may be refined through multiple iterations as I receive
feedback from the community and reviewers.

Timeline:
---------

Pre-GSOC: 
(Until 8 May) 
-	Explore the codebase more, focusing on environment-related code paths.
-	Document how each global variable is used and how it can be moved to 
	repository settings.  
-	Study Git’s Coding Guidelines and the Pro Git Book to align with best practices.

----------

Community Bonding: 
(May 8 - June 1) 
-	Engage with mentors to discuss different environment variables, their 
	dependencies, and the best approach for refactoring.
-	Finalize an implementation plan based on discussions.
-	Since I will be on summer vacation, I can start coding early and make progress 
	on the project.

----------

Coding Period: 
(June 2 - August 25) 
-	Refactor global variables, replacing them with repository-scoped 
	configurations.  
-	Modify function signatures to pass `struct repository` explicitly instead
	of relying on `the_repository`.
-	Categorize variables into specialized structs to improve modularity and
	maintainability.  
-	Continuously submit patches for review and incorporate feedback from mentors
	and the community.  
-	I plan to write weekly blog posts documenting what I did during that
	week.

----------

Final Week: 
(August 25 - September 1) 
-	Write a detailed report on the entire project.  
-	Fix bugs if any.  
-	Reflect on the project, noting challenges faced and lessons learned.


Blogging:
---------

I have also set up a blogging page at [11]. While reading blogs from previous
GSoC contributors, I found them useful in understanding the challenges
they faced and how they approached their projects. Their experiences gave
me a better idea of what to expect and how to navigate the development
process. Inspired by this, I decided to start my own blog to document my
journey throughout GSoC. This will not only help me track my own progress but
also serve as a resource for future contributors who might work on similar
projects. I plan to share updates on my work, challenges encountered and
insights gained from discussions with mentors and the community.

Additionally, I hope my blog encourages more people to contribute to open
source by providing a transparent look into the development process. Writing
about my experience will also help me reflect on my work and improve my
ability to communicate technical ideas effectively.

I liked the format and structure of Chandra's blog, so I decided to use the
same template for my own blogging page.


Availability:
-------------

As a college student, I intend to use my summer break from May to July
to work on the project. After completing my university exams in April, I can
start working in May. I can dedicate 40 hours a week from May to July, and
about 25 hours a week in August once classes resume.

There are no exams or vacations planned during the coding period, and I have
no other commitments for the summer. I will keep the community updated on my
status and maintain transparency throughout the project.


Post-GSOC:
----------

Beyond contributing code, I strongly believe in giving back to the community
and helping others grow. Open source thrives on mentorship, knowledge sharing,
and long-term involvement, and I would love to continue contributing even
after GSoC ends.

I have always valued mentorship, both as a mentee and as someone who enjoys
guiding others. If given the opportunity, I would be more than happy to
mentor/co-mentor future GSoC contributors. By staying involved in the
community, whether through contributing, reviewing patches, or mentoring,
I hope to help sustain and expand the project’s reach. I look at GSoC not
just as a one-time contribution but as a step toward a longer-term relationship
with open source.

I will continue to be involved with Git even after GSoC by contributing patches,
reviewing code, and participating in discussions. My work on refactoring Git’s 
state aligns with long-term improvements to the codebase, and I plan to keep 
refining it beyond the program. I see GSoC as just the beginning of my journey
with Git.

Appreciation:
-------------

I appreciate the Git community for its excellent documentation, which made it 
much easier for me to understand Git in depth. The well-structured resources 
helped me navigate the codebase and gain a deeper understanding of how Git 
works internally.

Beyond the documentation, I am also grateful for how welcoming and supportive 
the community has been. Whether through discussions on the mailing list or 
feedback on my patches, the information and guidance I received made my 
experience even better.

Additionally, I read the blogs and proposals of Chandra, Jialuo, and Ghanashyam, 
which provided valuable insights into their journeys and helped me shape my 
own approach to contributing.

Thanks for reviewing this proposal.

References:
-----------

[1] https://github.com/sdslabs/beast/pull/374

[2] https://github.com/sdslabs/beast/tree/add-teams-with-hint

[3] https://github.com/sdslabs/playCTF/pull/177

[4] https://public-inbox.org/git/pull.1826.git.git.1730926082.gitgitgadget@gmail.com/

[5] https://public-inbox.org/git/20250303-b4-pks-objects-without-the-repository-v1-0-c5dd43f2476e@pks.im/

[6] https://public-inbox.org/git/20250206-b4-pks-path-drop-the-repository-v1-0-4e77f0313206@pks.im/

[7] https://public-inbox.org/git/pull.1829.git.1731653548549.gitgitgadget@gmail.com/#t

[8] https://public-inbox.org/git/cover.1733236936.git.karthik.188@gmail.com/

[9] https://public-inbox.org/git/cover.1724923648.git.ps@pks.im/

[10] https://www.mermaidchart.com/raw/327324ea-af1d-4a98-8bff-254479b3a79c?theme=light&version=v0.1&format=svg

[11] https://ayu-ch.github.io


* Re: [GSOC] [PROPOSAL V1]: Refactoring in order to reduce Git’s global state
  2025-03-26  5:26 [GSOC] [PROPOSAL V1]: Refactoring in order to reduce Git’s global state Ayush Chandekar
@ 2025-03-28 13:06 ` shejialuo
  2025-03-29  9:54   ` Ayush Chandekar
  2025-04-04  8:51 ` [GSOC] [PROPOSAL v2]: " Ayush Chandekar
  2025-04-08 12:52 ` [GSOC] [PROPOSAL v3]: " Ayush Chandekar
  2 siblings, 1 reply; 17+ messages in thread
From: shejialuo @ 2025-03-28 13:06 UTC (permalink / raw)
  To: Ayush Chandekar; +Cc: git, ps, karthik.188, christian.couder, shyamthakkar001

On Wed, Mar 26, 2025 at 10:56:00AM +0530, Ayush Chandekar wrote:
> Hello,
> This is my GSoC 2025 proposal for the project "Refactoring in order to reduce Git’s global state".
> You can view docs version here: 
> https://docs.google.com/document/d/1tJrtWxo1UGKChB3hu5eZ-ljm0FtU_fsv0TnIRwu3EKY/edit?usp=sharing
> 
> ---------
> 
> Refactoring in order to reduce Git’s global state
> 
> My Information:
> ---------------
> 
> Name: Ayush Chandekar
> Email: ayu.chandekar@gmail.com
> Mobile No: (+91) 9372496874
> Education: UG Sophomore, IIT Roorkee
> Github: https://github.com/ayu-ch
> Blog: https://ayu-ch.github.io
> 
> 
> About me:
> ---------
> 
> I'm Ayush Chandekar, a UG Sophomore studying at Indian Institute of
> Technology, Roorkee. I like participating in various software development
> and tech-development endeavors, usually hackathons, CTFs, and projects at
> SDSLabs. SDSLabs is a student-run technical group that includes passionate
> developers and designers interested in various fields and involved in multiple
> software development projects that aim to foster a software development
> culture on campus. Being a part of this group has exposed me to different
> software development methodologies, tools and frameworks and helped me become
> comfortable contributing to an open-source project with multiple contributors.
> Some open-source contributions I made here are: [1], [2] & [3]
> 
> I see this project as a meaningful opportunity to deepen my involvement in
> the Git community and to build a foundation for continued contributions to
> open source development in the future.
> 
> 
> Overview:
> ---------
> 
> Git currently uses a global object called `the_repository`, which refers to a
> single instance of `struct repository`. Many internal functions rely on this
> global object rather than accepting a `struct repository` as an explicit
> parameter. This design inherently assumes a single active repository,
> making it difficult to support multi-repository use cases and obstructing
> the long-term goal of libification of Git.
> 
> A key architectural limitation is that while `struct repository` encapsulates
> some repository-specific information, many important environment variables
> and configuration settings that logically belong to a repository are still
> stored as global variables, primarily in `environment.c`, not within the
> `repository` struct. As a result, even if multiple repositories were to
> exist concurrently, they would still share this global state, leading to
> incorrect behavior, race conditions, or subtle bugs.
> 
> This project aims to refactor Git’s environment handling by relocating
> these global variables into appropriate local contexts, primarily into
> `struct repository` and `struct repo_settings`. This change will not

I think this statement could be improved. Some global variables may
apply to only one or two subsystems. In such cases, we could move the
variable into that subsystem's own context rather than into "struct
repository" or "struct repo_settings".
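
As a rough illustration of the distinction, here is a minimal sketch in C.
All names in it (`struct demo_repo`, `struct demo_subsystem_ctx`,
`demo_subsystem_init()`, `demo_subsystem_within_limit()`, `max_depth`) are
hypothetical and not taken from Git's source.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical repository handle; only what the sketch needs. */
struct demo_repo {
	const char *gitdir;
};

/*
 * Hypothetical context owned by a single subsystem. The setting lives
 * here because no code outside the subsystem reads it.
 */
struct demo_subsystem_ctx {
	struct demo_repo *repo; /* which repository this context serves */
	int max_depth;          /* formerly a file-scope global */
};

void demo_subsystem_init(struct demo_subsystem_ctx *ctx,
			 struct demo_repo *repo, int max_depth)
{
	assert(ctx != NULL && repo != NULL);
	ctx->repo = repo;
	ctx->max_depth = max_depth;
}

int demo_subsystem_within_limit(const struct demo_subsystem_ctx *ctx, int depth)
{
	assert(ctx != NULL);
	return depth <= ctx->max_depth;
}
```

The repository struct stays small, and the setting is visible only to the code
that actually uses it.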

> only make the environment state repository-specific, but also improve the
> modularity and maintainability of the codebase. The work involves identifying
> environment-related global variables, designing a suitable structure to
> house them within the repository context, and updating all affected code
> paths accordingly.
> 
> The difficulty of this project is medium, and it is estimated to take 
> 175 to 350 hours.
> 
> 
> Pre-GSOC:
> ---------
> 
> I started exploring Git’s codebase and documentation around the end of
> January, familiarizing myself with its structure and development practices. I
> submitted a microproject, which helped me navigate the code and contribution
> workflow.
> 
> After selecting the project on refactoring Git’s state, I studied the
> surrounding code and reviewed past patches ([4], [5], [6], [7], [8] & [9])
> to understand the reasoning behind previous changes. To better prepare
> for the GSoC timeline, I also submitted a patch related to the project,
> gaining hands-on experience with both the implementation details and the
> submission process.
> 
> 
> Patches:
> --------
> 
> + (Microproject) t6423: fix suppression of Git’s exit code in tests
> 	Thread:
> 	https://public-inbox.org/git/20250202120926.322417-1-ayu.chandekar@gmail.com/
> 	Status: Merged into master 
> 	Commit Hash: 7c1d34fe5d1229362f2c3ecf2d493167a1f555a2 
> 	Description: Instead of executing a Git command as the upstream component of
> 				 a pipe, which can result in the exit status being lost, redirect
> 				 its output to a file and then process that file in two steps to
> 				 ensure the exit status is properly preserved.
> 
> + midx: implement progress reporting for QSORT operation
> 	Thread:
> 	https://public-inbox.org/git/20250210074623.136599-1-ayu.chandekar@gmail.com/
> 	Status: Dropped 
> 	Description: Add progress reporting during the QSORT operation in 
> 				 multi-pack-index verification. While going through the code, 
> 				 I found this TODO, which I thought was interesting however my 
> 				 approach assumed that the qsort() operation processes elements
> 				 in a structured order, which isn't guaranteed.
> 
> + Stop depending on `the_repository` for core.attributesfile
> 	Thread:
> 	https://public-inbox.org/git/20250310151048.69825-1-ayu.chandekar@gmail.com/
> 	Status: WIP, needs more discussion.  
> 	Description: This patch refactors access to the `core.attributesfile`
> 				 configuration by moving it into the `repo_settings` struct.
> 				 It eliminates the global variable `git_attributes_file` and 
> 				 updates relevant code paths to pass the `struct repository`
> 				 as a parameter.
> 
> 
> Proposed Plan:
> --------------
> 
> I have been reviewing global variables across the codebase to understand their
> dependencies and impact. To do this, I examined `config.c` and cross-referenced
> it with `environment.c` to see how these variables are currently managed. The
> goal of this project is to eliminate global variables by moving their
> configurations into repository-specific settings. This involves:
> 
> -   Identifying all occurrences of these global variables.
> 
> -   Removing dependencies on `the_repository`.
> 
> -   Updating function signatures to pass `struct repository` explicitly.
> 

When reading this, I feel a little weird, because I think that in [1] you
have already realized that we should move some global variables into
specific subsystems.

[1] https://public-inbox.org/git/20250310151048.69825-1-ayu.chandekar@gmail.com/ 

> -   Replacing global variable references with repository-scoped configurations.
> 
> Instead of adding all variables directly into `repo_settings`, we can group
> related variables into specialized structs (e.g., `performance_config`,
> `behaviour_config`, `whitespace_config`) and embed these within `repo_settings`.
> This approach ensures a more modular and maintainable design while keeping 
> `repo_settings` manageable.
> 
> I have also created a diagram explaining this structure in [10].
> 
> With this approach, I can structure the patch series by grouping the refactoring 
> of related variables within specific structs. This will help maintain a clean and
> organized codebase while also making the development and review process more 
> systematic and efficient.
> 

Yes, using sub-structures to make the code cleaner is a good idea.
However, from my own experience as a GSoC student, we should not
consider this for now, because we risk over-engineering it.

You would need a lot of time and effort to convince the community why
the design is good and why a given variable belongs in a given
sub-structure.

Instead, you'd better focus first on which variables you want to remove
and how to remove them in a few simple steps. This would let you
concentrate on the work you need to do and reduce the risk.

> One key challenge is determining which variables should be part of
> `repo_settings` and which should remain separate. While working on the patch to
> refactor access to `core.attributesfile`, I received feedback from Junio that not
> all global variables should be blindly moved into the `repo_settings` struct.
> This reinforced the need to carefully assess which variables belong in `repo_settings`
> and which should be handled differently.
> 

Yes, this is correct. I wonder whether we should move this paragraph
into the Pre-GSoC part. I think you found this out when submitting the
patch that removes one global variable, and by communicating with the
community you came to better understand the requirements and details of
this project.

And in your plan, you should just say that we need to do this. Would
this be better?

> This plan is flexible and may be refined through multiple iterations as I receive
> feedback from the community and reviewers.
> 
> Timeline:
> ---------
> 
> Pre-GSOC: 
> (Until 8 May) 
> -	Explore the codebase more, focusing on environment-related code paths.
> -	Document how each global variable is used and how it can be moved to 
> 	repository settings.  
> -	Study Git’s Coding Guidelines and the Pro Git Book to align with best practices.
> 
> ----------
> 
> Community Bonding: 
> (May 8 - June 1) 
> -	Engage with mentors to discuss different environment variables, their 
> 	dependencies, and the best approach for refactoring.
> -	Finalize an implementation plan based on discussions.
> -	Since I will be on summer vacation, I can start coding early and make progress 
> 	on the project.
> 
> ----------
> 
> Coding Period: 
> (June 2 - August 25) 
> -	Refactor global variables, replacing them with repository-scoped 
> 	configurations.  
> -	Modify function signatures to pass `struct repository` explicitly instead
> 	of relying on `the_repository`.
> -	Categorize variables into specialized structs to improve modularity and
> 	maintainability.  

As I have said, this is a high-risk task. Categorization needs many
iterations, and we could do it after GSoC.

> -	Continuously submit patches for review and incorporate feedback from mentors
> 	and the community.  
> -	I plan to write weekly blogs which will document what I did in the whole 
> 	week.
> 
> ----------
> 
> Final Week: 
> (August 25 - September 1) 
> -	Write a detailed report on the entire project.  
> -	Fix bugs if any.  
> -	Reflect on the project, noting challenges faced and lessons learned.
> 
> 
> Blogging:
> ---------
> 
> I have also set up a blogging page at [11]. While reading blogs from previous
> GSoC contributors, I found them useful in understanding the challenges
> they faced and how they approached their projects. Their experiences gave
> me a better idea of what to expect and how to navigate the development
> process. Inspired by this, I decided to start my own blog to document my
> journey throughout GSoC. This will not only help me track my own progress but
> also serve as a resource for future contributors who might work on similar
> projects. I plan to share updates on my work, challenges encountered and
> insights gained from discussions with mentors and the community.
> 
> Additionally, I hope my blog encourages more people to contribute to open
> source by providing a transparent look into the development process. Writing
> about my experience will also help me reflect on my work and improve my
> ability to communicate technical ideas effectively.
> 
> I liked the format and structure of Chandra's blog, so I decided to use the
> same template for my own blogging page.
> 
> 
> Availability:
> -------------
> 
> As a college student, I intend to utilise my summer breaks from May to July
> to work on the project. After completing my University exams in April, I can
> start working in May. I can dedicate 40 hours a week from May to July, while
> in August after the classes commence, I can dedicate about 25 hours a week.
> 
> There are no exams or planned vacations throughout the coding period. Besides
> this project, I have no commitments/vacations planned for the summer. I shall
> keep my status posted to all the community members and maintain transparency
> in the project.
> 
> 
> Post-GSOC:
> ----------
> 
> Beyond contributing code, I strongly believe in giving back to the community
> and helping others grow. Open source thrives on mentorship, knowledge sharing,
> and long-term involvement, and I would love to continue contributing even
> after GSoC ends.
> 
> I have always valued mentorship, both as a mentee and as someone who enjoys
> guiding others. If given the opportunity, I would be more than happy to
> mentor/co-mentor future GSoC contributors. By staying involved in the
> community, whether through contributing, reviewing patches, or mentoring,
> I hope to help sustain and expand the project’s reach. I look at GSoC as not 
> just as a one-time contribution but as a step toward a longer-term relationship
> with open source.
> 
> I will continue to be involved with Git even after GSoC by contributing patches,
> reviewing code, and participating in discussions. My work on refactoring Git’s 
> state aligns with long-term improvements to the codebase, and I plan to keep 
> refining it beyond the program. I see GSoC as just the beginning of my journey
> with Git.
> 
> Appreciation:
> -------------
> 
> I appreciate the Git community for its excellent documentation, which made it 
> much easier for me to understand Git in depth. The well-structured resources 
> helped me navigate the codebase and gain a deeper understanding of how Git 
> works internally.
> 
> Beyond the documentation, I am also grateful for how welcoming and supportive 
> the community has been. Whether through discussions on the mailing list or 
> feedback on my patches, the information and guidance I received made my 
> experience even better.
> 
> Additionally, I read the blogs and proposals of Chandra, Jialuo, and Ghanashyam, 
> which provided valuable insights into their journeys and helped me shape my 
> own approach to contributing.

I'm happy that my blogs help you.

> 
> Thanks for reviewing this proposal.
> 

Thanks for your proposal!

> References:
> -----------
> 
> [1] https://github.com/sdslabs/beast/pull/374
> 
> [2] https://github.com/sdslabs/beast/tree/add-teams-with-hint
> 
> [3] https://github.com/sdslabs/playCTF/pull/177
> 
> [4] https://public-inbox.org/git/pull.1826.git.git.1730926082.gitgitgadget@gmail.com/
> 
> [5] https://public-inbox.org/git/20250303-b4-pks-objects-without-the-repository-v1-0-c5dd43f2476e@pks.im/
> 
> [6] https://public-inbox.org/git/20250206-b4-pks-path-drop-the-repository-v1-0-4e77f0313206@pks.im/
> 
> [7] https://public-inbox.org/git/pull.1829.git.1731653548549.gitgitgadget@gmail.com/#t
> 
> [8] https://public-inbox.org/git/cover.1733236936.git.karthik.188@gmail.com/
> 
> [9] https://public-inbox.org/git/cover.1724923648.git.ps@pks.im/
> 
> [10] https://www.mermaidchart.com/raw/327324ea-af1d-4a98-8bff-254479b3a79c?theme=light&version=v0.1&format=svg
> 
> [11] https://ayu-ch.github.io

Jialuo


* Re: [GSOC] [PROPOSAL V1]: Refactoring in order to reduce Git’s global state
  2025-03-28 13:06 ` shejialuo
@ 2025-03-29  9:54   ` Ayush Chandekar
  2025-03-31 14:17     ` shejialuo
  0 siblings, 1 reply; 17+ messages in thread
From: Ayush Chandekar @ 2025-03-29  9:54 UTC (permalink / raw)
  To: shejialuo; +Cc: git, ps, karthik.188, christian.couder, shyamthakkar001

> > This project aims to refactor Git’s environment handling by relocating
> > these global variables into appropriate local contexts, primarily into
> > `struct repository` and `struct repo_settings`. This change will not
>
> I think this statement could be improved. Some global variables may
> apply to only one or two subsystems. In such cases, we could move the
> variable into that subsystem's own context rather than into "struct
> repository" or "struct repo_settings".

Right, I was generalizing in my statement, but I agree that some global
variables may belong in subsystem-specific contexts rather than
`struct repository` or `struct repo_settings`. I'll make sure to account
for that distinction in my proposal and implementation.

> >
> > Proposed Plan:
> > --------------
> >
> > I have been reviewing global variables across the codebase to understand their
> > dependencies and impact. To do this, I examined `config.c` and cross-referenced
> > it with `environment.c` to see how these variables are currently managed. The
> > goal of this project is to eliminate global variables by moving their
> > configurations into repository-specific settings. This involves:
> >
> > -   Identifying all occurrences of these global variables.
> >
> > -   Removing dependencies on `the_repository`.
> >
> > -   Updating function signatures to pass `struct repository` explicitly.
> >
>
> When reading this, I feel a little weird, because I think that in [1] you
> have already realized that we should move some global variables into
> specific subsystems.
>
> [1] https://public-inbox.org/git/20250310151048.69825-1-ayu.chandekar@gmail.com/
>

I see your point. I was generalizing again, but I’ll make sure to clarify
this distinction in my proposal. Thanks for pointing it out!

> > -   Replacing global variable references with repository-scoped configurations.
> >
> > Instead of adding all variables directly into `repo_settings`, we can group
> > related variables into specialized structs (e.g., `performance_config`,
> > `behaviour_config`, `whitespace_config`) and embed these within `repo_settings`.
> > This approach ensures a more modular and maintainable design while keeping
> > `repo_settings` manageable.
> >
> > I have also created a diagram explaining this structure in [10].
> >
> > With this approach, I can structure the patch series by grouping the refactoring
> > of related variables within specific structs. This will help maintain a clean and
> > organized codebase while also making the development and review process more
> > systematic and efficient.
> >
>
> Yes, it's a good idea to use sub-structures to make the code cleaner.
> However, from my own experience as a GSoC student, we should not
> consider this now because it risks over-engineering.
>
> You need a lot of time and effort to convince the community why the
> design is good and why we should put this variable into this
> sub-structure.
>
> Instead, you'd better focus first on which variables you want to remove
> and how you would remove them in a few simple steps. This would help
> you concentrate on the work you need to do and reduce the risk.
>

That makes sense. My intent was to provide a structured approach, but
I can see how it may introduce unnecessary complexity at this stage.
I'll focus on identifying and managing the global variables before
considering additional structuring. I think this is something that can
be done at the end of the project. That is, once all the variables are
handled, they can then be grouped into specific structures, if needed.

> > One key challenge is determining which variables should be part of
> > `repo_settings` and which should remain separate. While working on the patch to
> > refactor access to `core.attributesfile`, I received feedback from Junio that not
> > all global variables should be blindly moved into the `repo_settings` struct.
> > This reinforced the need to carefully assess which variables belong in `repo_settings`
> > and which should be handled differently.
> >
>
> Yes, this is correct. I wonder whether we should move this paragraph
> into the Pre-GSoC part. I think you found this out when adding a patch
> to remove one global variable. By communicating with the community,
> you then understood the requirements and details of this project
> better.

Yep, since I encountered this while working on the patch, it fits well
in the Pre-GSoC section. Moving it there would better show how I learned
more about the project's scope through community feedback.

>
> And in your plan, you should just say that we need to do this. Would
> this be better?
>
So, I should remove all the categorization stuff and just say that I
would focus on each variable and discuss in the community, while sending
patches, whether it should belong in the struct repo_settings/repository
or not?
I felt that keeping it general might seem vague, but that's the nature
of the project. Every variable is unique and would need a different
approach, and outlining the approach for each variable in the proposal
is not very feasible, as these decisions need to happen collaboratively
through discussions in the community.

Should I still mention that once the project is complete, we could
consider structuring related stuff if the community sees value in it?

> > This plan is flexible and may be refined through multiple iterations as I receive
> > feedback from the community and reviewers.
> >
> > Timeline:
> > ---------
> >
> > Pre-GSOC:
> > (Until 8 May)
> > -     Explore the codebase more, focusing on environment-related code paths.
> > -     Document how each global variable is used and how it can be moved to
> >       repository settings.
> > -     Study Git’s Coding Guidelines and the Pro Git Book to align with best practices.
> >
> > ----------
> >
> > Community Bonding:
> > (May 8 - June 1)
> > -     Engage with mentors to discuss different environment variables, their
> >       dependencies, and the best approach for refactoring.
> > -     Finalize an implementation plan based on discussions.
> > -     Since I will be on summer vacation, I can start coding early and make progress
> >       on the project.
> >
> > ----------
> >
> > Coding Period:
> > (June 2 - August 25)
> > -     Refactor global variables, replacing them with repository-scoped
> >       configurations.
> > -     Modify function signatures to pass `struct repository` explicitly instead
> >       of relying on `the_repository`.
> > -     Categorize variables into specialized structs to improve modularity and
> >       maintainability.
>
> As I have said, this is a high-risk task. Categorization needs many
> iterations. And we may do this after GSoC.

Yep, will update that.

Thanks for your review, again:)

Ayush

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [GSOC] [PROPOSAL V1]: Refactoring in order to reduce Git’s global state
  2025-03-29  9:54   ` Ayush Chandekar
@ 2025-03-31 14:17     ` shejialuo
  2025-03-31 15:04       ` Ayush Chandekar
  0 siblings, 1 reply; 17+ messages in thread
From: shejialuo @ 2025-03-31 14:17 UTC (permalink / raw)
  To: Ayush Chandekar; +Cc: git, ps, karthik.188, christian.couder, shyamthakkar001

On Sat, Mar 29, 2025 at 03:24:05PM +0530, Ayush Chandekar wrote:

[snip]

> > > One key challenge is determining which variables should be part of
> > > `repo_settings` and which should remain separate. While working on the patch to
> > > refactor access to `core.attributesfile`, I received feedback from Junio that not
> > > all global variables should be blindly moved into the `repo_settings` struct.
> > > This reinforced the need to carefully assess which variables belong in `repo_settings`
> > > and which should be handled differently.
> > >
> >
> > Yes, this is correct. I somehow think whether we should put this
> > paragraph into Pre-GSoC part? I think that you have found this when
> > adding a patch to remove one global variable. And thus by communicating
> > with the community, you have further understood that the requirement and
> > the detail of this project.
> 
> Yep, since I encountered this while working on the patch, it fits well
> in the Pre-GSoC section.
> Moving it there would better show how I learned more about the
> project's scope through
> community feedback.
> 

Yes, that is my intention. This shows your ability to interact with the
community and act on feedback, and that is what we want to see.

> >
> > And in your plan, you should just say that we need to do this. Would
> > this be better?
> >
> So, I should remove all the categorization stuff and just say that I
> would focus on
> each variable, discuss in the community whether it should belong in the struct
> repo_settings/repo or not while sending patches?

I think you should put the categorization stuff into the after-GSoC part.
Well, I don't think you could focus on _each_ variable. It is impossible
to describe the approach for _each_ variable. I think you could just
write the proposal about how to handle one or two global variables.

You have already touched one setting, "core.attributesfile", right? You
may just elaborate more on that in the proposal.

> I felt that keeping it general might seem vague, but that's the nature
> of the project. Every variable
> is unique and would need a different approach and outlining the
> approach of each variable
> in the proposal is not very feasible, as these decisions need to
> happen collaboratively through
> discussions in the community.
> 

Yes, so you could first describe how you want to handle the global
variables at a high level, and then give some concrete examples to
demonstrate your idea.

> Should I still mention that once the project is complete, we could
> consider structuring related
> stuff if the community sees value in it.
> 

You could mention this in the after-GSoC part.

> > > This plan is flexible and may be refined through multiple iterations as I receive
> > > feedback from the community and reviewers.
> > >
> > > Timeline:
> > > ---------
> > >
> > > Pre-GSOC:
> > > (Until 8 May)
> > > -     Explore the codebase more, focusing on environment-related code paths.
> > > -     Document how each global variable is used and how it can be moved to
> > >       repository settings.
> > > -     Study Git’s Coding Guidelines and the Pro Git Book to align with best practices.
> > >
> > > ----------
> > >
> > > Community Bonding:
> > > (May 8 - June 1)
> > > -     Engage with mentors to discuss different environment variables, their
> > >       dependencies, and the best approach for refactoring.
> > > -     Finalize an implementation plan based on discussions.
> > > -     Since I will be on summer vacation, I can start coding early and make progress
> > >       on the project.
> > >
> > > ----------
> > >
> > > Coding Period:
> > > (June 2 - August 25)
> > > -     Refactor global variables, replacing them with repository-scoped
> > >       configurations.
> > > -     Modify function signatures to pass `struct repository` explicitly instead
> > >       of relying on `the_repository`.
> > > -     Categorize variables into specialized structs to improve modularity and
> > >       maintainability.
> >
> > As I have said, this is a high-risk task. Categorization needs many
> > iterations. And we may do this after GSoC.
> 
> Yep, will update that.
> 
> Thanks for your review, again:)
> 
> Ayush

Thanks,
Jialuo

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [GSOC] [PROPOSAL V1]: Refactoring in order to reduce Git’s global state
  2025-03-31 14:17     ` shejialuo
@ 2025-03-31 15:04       ` Ayush Chandekar
  2025-03-31 15:18         ` Ayush Chandekar
  0 siblings, 1 reply; 17+ messages in thread
From: Ayush Chandekar @ 2025-03-31 15:04 UTC (permalink / raw)
  To: shejialuo; +Cc: git, ps, karthik.188, christian.couder, shyamthakkar001

> > Yep, since I encountered this while working on the patch, it fits well
> > in the Pre-GSoC section.
> > Moving it there would better show how I learned more about the
> > project's scope through
> > community feedback.
> >
>
> Yes, this is my intention. This represents your ability where you
> interact with the community and get feedback. And this is what we want
> to see.
>

Got it!

> > So, I should remove all the categorization stuff and just say that I
> > would focus on
> > each variable, discuss in the community whether it should belong in the struct
> > repo_settings/repo or not while sending patches?
>
> I think you should put the categorization stuff into after-GSoC part.
> Well, I don't think you could focus on _each_ variable. This is
> impossible for you to talk about the way for _each_ variable. I somehow
> think that you could just write the proposal about how to handle one or
> two global variables.
>

Right, I can do that.

> You already touch one setting "core.attributesfile" right? You may just
> elaborate more in the proposal.
>

Yep!

> > I felt that keeping it general might seem vague, but that's the nature
> > of the project. Every variable
> > is unique and would need a different approach and outlining the
> > approach of each variable
> > in the proposal is not very feasible, as these decisions need to
> > happen collaboratively through
> > discussions in the community.
> >
>
> Yes, so you could firstly give how you want to handle the global
> variables from top. And give some concrete examples to demonstrate your
> idea.
>

Alright, will do.

I'll send a new iteration of the proposal soon.

Thank you so much for your inputs:)
Ayush

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [GSOC] [PROPOSAL V1]: Refactoring in order to reduce Git’s global state
  2025-03-31 15:04       ` Ayush Chandekar
@ 2025-03-31 15:18         ` Ayush Chandekar
  0 siblings, 0 replies; 17+ messages in thread
From: Ayush Chandekar @ 2025-03-31 15:18 UTC (permalink / raw)
  To: shejialuo; +Cc: git, ps, karthik.188, christian.couder, shyamthakkar001

Please excuse the top posting in my previous mail.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [GSoC PROPOSAL v1] Refactoring in order to reduce Git’s global state
@ 2025-04-02 18:14 Arnav Bhate
  2025-04-03  9:59 ` Patrick Steinhardt
  0 siblings, 1 reply; 17+ messages in thread
From: Arnav Bhate @ 2025-04-02 18:14 UTC (permalink / raw)
  To: git

## Personal Information

- Full name: Arnav Akshaya Bhate
- Email address: bhatearnav@gmail.com
- Mobile no.: +91 8291328838
- Time zone: UTC+05:30
- Education: IIT Bombay
- Year: Second year
- GitHub: https://github.com/arnavbhate

## About Me

I'm Arnav Bhate, a second-year UG student at Indian Institute of
Technology Bombay. I love coding and so I am a member of IIT Bombay's
Developers' Community (DevCom), which is a group of roughly 40 people
developing software for use by students and staff of the institute. Most
of the software developed is not open source, so I can not include
examples of my work there in this proposal. Being a member of DevCom has
exposed me to collaborative software development.

A common link in all software I have worked on is that Git has been used
for version control. I thus see this project as my way of giving back to
the Git community in particular and open source in general. This will be
my first significant contribution to the open source community, and I
wish to stick around afterwards.

## Overview

Git currently uses many global variables, most significantly
`the_repository`, which are used in roughly 290 files. Apart from
`the_repository`, there are many other global variables, some of which
logically belong in `struct repository`, as they represent information
specific to a repository. So even if all instances of `the_repository`
were converted into an extra repository argument for the function, there
would still be many global variables left.

The use of such variables assumes that Git will only operate on one
repository at a time, which renders multi-repository handling
impossible without kludges.

This project aims to move such variables from global scope into more
appropriate local contexts, mainly `struct repository` and
`struct repo_settings`. This will not only make the environment
repository-specific, allowing easy multi-repository handling, but also
make maintaining the code easier.

The project involves identifying suitable locations for environment
variables in repository-specific structs, moving them there and updating
all the code affected by the move.

## Pre-GSoC

I first got into Git's codebase in February 2025, with my first
contribution in March. My first patch was on my microproject and since
then I have submitted two more patches on a similar topic.

### Patches

- (Microproject) decorate: fix sign comparison warnings  
  Thread: https://lore.kernel.org/git/afa6b428-3190-42ae-9eac-540c95b576fd@gmail.com/  
  Status: Merged into master  
  Commit hash: 2bfd3b368572cbf1ce287de09db08b7e7e429ecd  
  Description: Refactoring of decorate.c to replace signed variables
  with unsigned ones when they are used to iterate over arrays whose
  sizes are represented by unsigned variables, and remove 2 unnecessary
  variables which just hold the value of another variable without being
  modified, replacing them with the variable whose value they were
  holding.

- rm: fix sign comparison warnings  
  Thread: https://lore.kernel.org/git/38de63ce-6d4e-4f1f-95b1-049df78d9cfc@gmail.com/  
  Status: Under discussion  
  Description: Refactoring of rm.c to make iterators over arrays whose
  sizes are represented by unsigned variables unsigned. Specifically in
  `get_ours_cache_pos`, where before a signed variable was being passed
  and then inverted in the function, now the already inverted variable
  is passed as an unsigned variable, with the inversion moved to the
  function call.

- pathspec: fix sign comparison warnings  
  Thread: https://lore.kernel.org/git/a3aa5f99-63ce-4be5-8d64-fb6e226b3bf9@gmail.com/  
  Status: Under discussion  
  Description: Refactoring of pathspec.c to make array iterator
  variables match the type of the variable storing the array's size.
  Where replacing the variable's type is not possible, because of the
  large-scale cascade replacements it would cause, an appropriate cast
  has been added.
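
As a rough illustration of the kind of change these patches make, here
is a made-up example. It is not code from decorate.c, rm.c or
pathspec.c, and `struct item_list` and `count_nonzero` are hypothetical
names used only for this sketch.

```c
#include <stddef.h>

/* Hypothetical container; its size field is unsigned. */
struct item_list {
	const int *items;
	size_t nr;
};

/*
 * A loop written as `for (int i = 0; i < list->nr; i++)` compares a
 * signed int with an unsigned size_t and triggers -Wsign-compare.
 * Giving the iterator the same type as the size avoids the warning
 * without needing a cast.
 */
static size_t count_nonzero(const struct item_list *list)
{
	size_t count = 0;

	for (size_t i = 0; i < list->nr; i++)
		if (list->items[i])
			count++;
	return count;
}
```

Where the iterator's type cannot be changed without cascading edits, a
cast at the comparison is the fallback, as in the pathspec patch.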

## Proposed Plan

- Identifying all occurrences of `the_repository` and updating them to
  use a `struct repository` passed to the function.

- Identifying global variables that should be moved and identifying
  suitable locations: some could be moved directly into
  `struct repository`, some into its existing sub-structs and some
  into newly created sub-structs.

- Identifying and updating occurrences of these variables to reference
  their new locations.

It makes sense that all the variables need not be in the same struct, as
separation would keep the codebase organised, and thus easier to
maintain. It would also make it easier to introduce these changes
systematically, as a group of related variables, combined together in a
struct, could be introduced in a single patch series.
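
The first step of the plan can be sketched as follows. This is a
minimal, self-contained model and not Git's actual code: the
`struct repository` here is a one-field stand-in for the real struct,
and `gitdir_implicit` and `gitdir_explicit` are hypothetical function
names.

```c
#include <stddef.h>

/* Stand-in for Git's real `struct repository`; only one field is modelled. */
struct repository {
	const char *gitdir;
};

/* Stand-in for the global instance the project wants to stop relying on. */
static struct repository the_repository_instance = { ".git" };
static struct repository *the_repository = &the_repository_instance;

/* Before: the function can only ever describe the global repository. */
static const char *gitdir_implicit(void)
{
	return the_repository->gitdir;
}

/* After: the caller chooses which repository to operate on. */
static const char *gitdir_explicit(const struct repository *repo)
{
	return repo->gitdir;
}
```

Existing callers can pass `the_repository` to the new signature at
first, so each function can be converted without changing behaviour.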

### Timeline

#### Pre-GSoC (Until May 8)

- Explore the codebase, identifying global variables and how they are
  used.

- Start to identify suitable locations for global variables.

#### Community Bonding Period (May 8 - June 1)

- Interact with mentor, discussing best ways to refactor various
  variables and make a plan based on that.

- If time is left, start coding early, as my summer break will have
  started.

#### Coding Period (June 2 - August 25)

- Modify functions to add a `struct repository` argument where they
  depend on `the_repository` and replace all occurrences of it.

- Move global variables to their new locations in various structs,
  and refactor functions that depend on them to use their new locations.

#### Final Week (August 25 - September 1)

- Fix any bugs that may be left.

- Write final report.

### Availability

My summer break from college lasts from May to July. I am currently
planning to take a vacation of about one week during this period, but
the dates have not been decided. Outside of this vacation, I am not
occupied during the break and can devote up to 60 hours a week to the
project. In August, once classes recommence, I will be available for
20 hours a week.

## Post-GSoC

After completing my project, I plan to stay active by contributing
patches and starting to review code.

-- 
Regards,
Arnav Bhate
(He/Him)


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [GSoC PROPOSAL v1] Refactoring in order to reduce Git’s global state
  2025-04-02 18:14 [GSoC PROPOSAL v1] " Arnav Bhate
@ 2025-04-03  9:59 ` Patrick Steinhardt
  2025-04-03 15:26   ` Arnav Bhate
  0 siblings, 1 reply; 17+ messages in thread
From: Patrick Steinhardt @ 2025-04-03  9:59 UTC (permalink / raw)
  To: Arnav Bhate; +Cc: git

On Wed, Apr 02, 2025 at 11:44:12PM +0530, Arnav Bhate wrote:
[snip]
> ## Proposed Plan
> 
> - Identifying all occurrences of `the_repository` and updating them to
>   use a `struct repository` passed to the function.

I think that might be overly ambitious :) After all we're talking about
~3500 occurrences, and it won't be feasible to replace them all in a
couple of months. This is rather a multi-year project, and one that has
already been going on for quite a while.

> - Identifying global variables that should be moved and identifying
>   suitable locations, some could be moved directly into
>   `struct repository`, some in its sub-structs that already exist and
>   some in newly created sub-structs.

Likewise, I would recommend to properly scope _which_ variables you want
to replace. There's a ton of global state, so you should try to limit
the project to a reasonable workload.

> - Identifying and updating occurrences of these variables to reference
>   their new locations.
> 
> It makes sense that all the variables need not be in the same struct, as
> separation would keep the codebase organised, and thus easier to
> maintain. It would also make it easier to introduce these changes
> systematically, as a group of related variables, combined together in a
> struct, could be introduced in a single patch series.
> 
> ### Timeline
> 
> #### Pre-GSoC (Until May 8)
> 
> - Explore the codebase, identifying global variables and how they are
>   used.
> 
> - Start to identify suitable locations for global variables.
> 
> #### Community Bonding Period (May 8 - June 1)
> 
> - Interact with mentor, discussing best ways to refactor various
>   variables and make a plan based on that.
> 
> - If time is left, start coding early, as my summer break will have
>   started.
> 
> #### Coding Period (June 2 - August 25)
> 
> - Modify functions to add a `struct repository` argument where they
>   depend on `the_repository` and replace all occurrences of it.
> 
> - Move global variables to their new locations in various structs,
>   and refactor functions that depend on them to use their new locations.

In large-scale projects like these it typically makes sense to work in
batches. Instead of having three separate phases to "define the
problem", "develop the solution" and "deploy the improvement" I would
strongly encourage you to define and tie together smaller batches of
work.

Thanks!

Patrick

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [GSoC PROPOSAL v1] Refactoring in order to reduce Git’s global state
  2025-04-03  9:59 ` Patrick Steinhardt
@ 2025-04-03 15:26   ` Arnav Bhate
  2025-04-04  9:19     ` Patrick Steinhardt
  0 siblings, 1 reply; 17+ messages in thread
From: Arnav Bhate @ 2025-04-03 15:26 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git

Patrick Steinhardt <ps@pks.im> writes:
> On Wed, Apr 02, 2025 at 11:44:12PM +0530, Arnav Bhate wrote:
> [snip]
>> ## Proposed Plan
>>
>> - Identifying all occurrences of `the_repository` and updating them to
>>   use a `struct repository` passed to the function.
> 
> I think that might be overly ambitious :) After all we're talking about
> ~3500 occurrences, and it won't be feasible to replace them all in a
> couple of months. This is rather a multi-year project, and one that has
> already been going on for quite a while.
> 
>> - Identifying global variables that should be moved and identifying
>>   suitable locations, some could be moved directly into
>>   `struct repository`, some in its sub-structs that already exist and
>>   some in newly created sub-structs.
> 
> Likewise, I would recommend to properly scope _which_ variables you want
> to replace. There's a ton of global state, so you should try to limit
> the project to a reasonable workload.

I could do all the global variables in environment.c. I feel like that
is doable. Once I am finished with that, I could start replacing
the_repository.

>> - Identifying and updating occurrences of these variables to reference
>>   their new locations.
>>
>> It makes sense that all the variables need not be in the same struct, as
>> separation would keep the codebase organised, and thus easier to
>> maintain. It would also make it easier to introduce these changes
>> systematically, as a group of related variables, combined together in a
>> struct, could be introduced in a single patch series.
>>
>> ### Timeline
>>
>> #### Pre-GSoC (Until May 8)
>>
>> - Explore the codebase, identifying global variables and how they are
>>   used.
>>
>> - Start to identify suitable locations for global variables.
>>
>> #### Community Bonding Period (May 8 - June 1)
>>
>> - Interact with mentor, discussing best ways to refactor various
>>   variables and make a plan based on that.
>>
>> - If time is left, start coding early, as my summer break will have
>>   started.
>>
>> #### Coding Period (June 2 - August 25)
>>
>> - Modify functions to add a `struct repository` argument where they
>>   depend on `the_repository` and replace all occurrences of it.
>>
>> - Move global variables to their new locations in various structs,
>>   and refactor functions that depend on them to use their new locations.
> 
> In large-scale projects like these it typically makes sense to work in
> batches. Instead of having three separate phases to "define the
> problem", "develop the solution" and "deploy the improvement" I would
> strongly encourage you to define and tie together smaller batches of
> work.

What I meant is that, before coding starts, I want to finalise all the
new locations for the global variables with my mentor, and then modify
the code in batches, struct-by-struct. Are you suggesting that the new
locations not be finalised beforehand, or are we misinterpreting each
other?

> Thanks!
> 
> Patrick

-- 
Regards,
Arnav Bhate
(He/Him)


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [GSOC] [PROPOSAL v2]: Refactoring in order to reduce Git’s global state
  2025-03-26  5:26 [GSOC] [PROPOSAL V1]: Refactoring in order to reduce Git’s global state Ayush Chandekar
  2025-03-28 13:06 ` shejialuo
@ 2025-04-04  8:51 ` Ayush Chandekar
  2025-04-04 14:45   ` Karthik Nayak
  2025-04-07  8:42   ` Ayush Chandekar
  2025-04-08 12:52 ` [GSOC] [PROPOSAL v3]: " Ayush Chandekar
  2 siblings, 2 replies; 17+ messages in thread
From: Ayush Chandekar @ 2025-04-04  8:51 UTC (permalink / raw)
  To: ayu.chandekar
  Cc: christian.couder, git, karthik.188, ps, shejialuo,
	shyamthakkar001

Hello,
This is my GSoC 2025 proposal for the project "Refactoring in order to reduce Git’s global state".
You can view docs version here: 
https://docs.google.com/document/d/1tJrtWxo1UGKChB3hu5eZ-ljm0FtU_fsv0TnIRwu3EKY/edit?usp=sharing

---------

Refactoring in order to reduce git’s state

My Information:
---------------

Name: Ayush Chandekar
Email: ayu.chandekar@gmail.com
Mobile No: (+91) 9372496874
Education: UG Sophomore, IIT Roorkee
Github: https://github.com/ayu-ch
Blog: https://ayu-ch.github.io


About me:
---------

I'm Ayush Chandekar, a UG Sophomore studying at Indian Institute of
Technology, Roorkee. I like participating in various software development
and tech-development endeavors, usually hackathons, CTFs, and projects at
SDSLabs. SDSLabs is a student-run technical group that includes passionate
developers and designers interested in various fields and involved in multiple
software development projects that aim to foster a software development
culture on campus. Being a part of this group has exposed me to different
software development methodologies, tools and frameworks and helped me become
comfortable contributing to an open-source project with multiple contributors.
Some open-source contributions I made here are: [1], [2] & [3]

I see this project as a meaningful opportunity to deepen my involvement in
the Git community and to build a foundation for continued contributions to
open source development in the future.


Overview:
---------

Git currently uses a global object called `the_repository`, which refers to a
single instance of `struct repository`. Many internal functions rely on this
global object rather than accepting a `struct repository` as an explicit
parameter. This design inherently assumes a single active repository,
making it difficult to support multi-repository use cases and obstructing
the long-term goal of libification of Git.

A key architectural limitation is that while `struct repository` encapsulates
some repository-specific information, many important environment variables
and configuration settings that logically belong to a repository are still
stored as global variables, primarily in `environment.c`, not within the
`repository` struct. As a result, even if multiple repositories were to
exist concurrently, they would still share this global state, leading to
incorrect behavior, race conditions, or subtle bugs.
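
The problem can be illustrated with a small, self-contained sketch. The
names below are hypothetical and simplified: `global_ignore_case` stands
in for any of the globals in `environment.c`, and the `struct
repo_settings` shown here models a single field rather than Git's real
layout.

```c
/* Hypothetical global, standing in for a variable kept in environment.c. */
static int global_ignore_case;

/* Simplified per-repository settings; the real struct has other fields. */
struct repo_settings {
	int ignore_case;
};

struct repository {
	struct repo_settings settings;
};

/* Loading configuration for any repository overwrites the one global. */
static void load_config_global(int ignore_case)
{
	global_ignore_case = ignore_case;
}

/* Loading configuration into the repository keeps each value separate. */
static void load_config_scoped(struct repository *repo, int ignore_case)
{
	repo->settings.ignore_case = ignore_case;
}
```

If two repositories are loaded one after the other with
`load_config_global()`, the second value silently replaces the first,
whereas `load_config_scoped()` leaves each repository with its own
setting.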

This project aims to refactor Git’s environment handling by relocating global
variables into more appropriate local contexts, primarily within 
struct repository and struct repo_settings. However, some global variables may
only apply to specific subsystems. In such cases, rather than placing them in
struct repository or struct repo_settings, they should be moved into a
context that better reflects their scope.

This change will not only make the environment state repository-specific but
also improve the modularity and maintainability of the codebase. The work
involves identifying environment-related global variables, determining the
most suitable structure to house them, and updating all affected code paths
accordingly.

The difficulty of this project is medium, and it is estimated to take 
175 to 350 hours.


Pre-GSOC:
---------

I started exploring Git’s codebase and documentation around the end of
January, familiarizing myself with its structure and development practices. I
submitted a microproject, which helped me navigate the code and contribution
workflow.

After selecting the project on refactoring Git’s state, I studied the
surrounding code and reviewed past patches ([4], [5], [6], [7], [8] & [9])
to understand the reasoning behind previous changes. 

To better prepare for the GSoC timeline, I submitted a patch related to the
project, to gain hands-on experience with both the implementation details
and the submission process. The patch focused on refactoring access to
`core.attributesfile`.

Through discussions and feedback from the community, I gained a clearer
understanding of a key aspect of the project: determining whether certain
variables should belong to repo_settings/repository or be part of a
separate subsystem.

Junio pointed out in his feedback that not all global variables should
be blindly moved into `repo_settings`.
Specifically, for `git_attributes_file`, adding it to the repository struct
doesn’t make sense. He explained that it’s similar to how index_state is
handled: while index_state knows which repository it belongs to, the
repository struct only holds a pointer to a single index_state instance
and isn’t aware of other instances.

Following this approach, instead of placing `git_attributes_file` in the
repository struct, we can house it within an attribute set and pass a
pointer to that set wherever needed.
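
A rough sketch of that idea could look like the following. This is only an
illustration under my own assumptions: `demo_attr_set`, its fields, and both
helper functions are hypothetical names, not existing Git APIs. The set keeps
a back-pointer to its repository, similar in spirit to the index_state
analogy above:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for `struct repository`. */
struct demo_repository {
	const char *gitdir;
};

/* Hypothetical attribute set: it knows its repository, but the
 * repository does not need to know about every attribute set. */
struct demo_attr_set {
	struct demo_repository *repo;
	const char *attributes_file; /* value of core.attributesfile, or NULL */
};

void demo_attr_set_init(struct demo_attr_set *set,
			struct demo_repository *repo,
			const char *attributes_file)
{
	assert(set != NULL && repo != NULL);
	set->repo = repo;
	set->attributes_file = attributes_file;
}

/* Callers receive the set explicitly instead of reading a global. */
const char *demo_attr_set_file(const struct demo_attr_set *set)
{
	assert(set != NULL);
	return set->attributes_file;
}
```

Several sets can point at the same repository while carrying different
attribute files, which a single global variable cannot express.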

This practice patch gave me a clearer understanding of the project.

Patches:
--------

For git:

+ (Microproject) t6423: fix suppression of Git’s exit code in tests
	Thread:
	https://public-inbox.org/git/20250202120926.322417-1-ayu.chandekar@gmail.com/
	Status: Merged into master 
	Commit Hash: 7c1d34fe5d1229362f2c3ecf2d493167a1f555a2 
	Description: Instead of executing a Git command as the upstream component of
				 a pipe, which can result in the exit status being lost, redirect
				 its output to a file and then process that file in two steps to
				 ensure the exit status is properly preserved.

+ midx: implement progress reporting for QSORT operation
	Thread:
	https://public-inbox.org/git/20250210074623.136599-1-ayu.chandekar@gmail.com/
	Status: Dropped 
	Description: Add progress reporting during the QSORT operation in
				 multi-pack-index verification. While going through the code,
				 I found this TODO, which I thought was interesting. However, my
				 approach assumed that the qsort() operation processes elements
				 in a structured order, which isn't guaranteed.

+ Stop depending on `the_repository` for core.attributesfile
	Thread:
	https://public-inbox.org/git/20250310151048.69825-1-ayu.chandekar@gmail.com/
	Status: WIP, needs more discussion.  
	Description: This patch refactors access to the `core.attributesfile`
				 configuration by moving it into the `repo_settings` struct.
				 It eliminates the global variable `git_attributes_file` and 
				 updates relevant code paths to pass the `struct repository`
				 as a parameter.

For git.github.io:

+ GSoC-participants: add GSoC 2024 participants to the list #762
	Status: Merged into master
	Description: Adding GSoC 2024 participants will help new
				 contributors understand their journey, making it easier for them 
				 to navigate the program and the project.

Proposed Plan:
--------------

I have been reviewing global variables across the codebase to understand their
dependencies and impact. To do this, I examined `config.c` and cross-referenced
it with `environment.c` to see how these variables are currently managed. The
goal of this project is to eliminate global variables by moving their
configurations into their local contexts. 

The general approach for handling a global variable begins with understanding
its purpose. This involves tracing its usage across the codebase and identifying
the subsystem it should belong to. If the variable is closely tied to
repository-related functionality, it may belong in struct repository or
struct repo_settings. Otherwise, it should be placed in a more suitable
context based on its scope.

Additionally, it's important to review previous attempts or related patches
to understand past design decisions and ensure consistency with ongoing efforts.
Finally, the global instance is eliminated by relocating the variable into the
appropriate context and passing it through the relevant code paths.

Example: Handling `is_bare_repository_cfg`
The variable `is_bare_repository_cfg` determines whether a repository is bare,
meaning it lacks a working directory. Since this property is fundamental to
how a repository functions, it should be placed in struct repository.

I have also gone through the code paths and analyzed how this variable is
initialized. We can initialize it similarly to how hash_algo is set through
the repository format. The repository format already contains an `is_bare`
field, which we can use to set this variable inside struct repository.

However, I still have some questions regarding why the is_bare_repository()
function checks for `repo->worktree` and why the `worktree struct` itself has
an `is_bare` variable. If a repository is considered bare when !repo->worktree
is true, the role of `worktree->is_bare` needs further clarification. I believe
that by engaging with the community, my understanding will become clearer.
I also went through [4] to see how John Cai approached this.

The other global variables can be handled in the same way.
Through multiple iterations, this approach will be refined based on feedback, 
edge cases, and community input.
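
The `is_bare_repository_cfg` example above can be sketched roughly as
follows. This is a simplified illustration, not a patch: the `demo_*` types
and functions are hypothetical, and the bareness check is only modeled on the
`repo->worktree` check discussed above, so it may differ from what Git
finally needs:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the repository format, which already
 * carries an `is_bare` field (-1 unknown, 0 not bare, 1 bare). */
struct demo_repository_format {
	int is_bare;
};

/* Hypothetical stand-in for `struct repository` with the former
 * global stored as a member. */
struct demo_repository {
	const char *worktree; /* NULL when there is no working tree */
	int is_bare_cfg;
};

/* Copy the value from the format, much like hash_algo is set. */
void demo_repo_apply_format(struct demo_repository *repo,
			    const struct demo_repository_format *format)
{
	assert(repo != NULL && format != NULL);
	repo->is_bare_cfg = format->is_bare;
}

/* Assumed rule: bare unless configured as not bare, and only when
 * the repository has no working tree. */
int demo_repo_is_bare(const struct demo_repository *repo)
{
	assert(repo != NULL);
	return repo->is_bare_cfg && !repo->worktree;
}
```

Because the value now lives in the struct, two repositories opened in the
same process can disagree about being bare without interfering.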


Timeline:
---------

Pre-GSOC: 
(Until 8 May) 
-	Explore the codebase more, focusing on environment-related code paths.
-	Document how each global variable is used and how it can be moved to 
	repository settings.  
-	Study Git’s Coding Guidelines and the Pro Git Book to align with best practices.

----------

Community Bonding: 
(May 8 - June 1) 
-	Engage with mentors to discuss different environment variables, their 
	dependencies, and the best approach for refactoring.
-	Finalize an implementation plan based on discussions.
-	Since I will be on summer vacation, I can start coding early and make progress 
	on the project.

----------

Coding Period: 
(June 2 - August 25) 
-	Identify the appropriate subsystem for each global variable and relocate it 
	into struct repository, struct repo_settings, or other suitable contexts.
-	Modify function signatures to pass the new contexts explicitly, replacing 
	reliance on global variables.
-	Continuously submit patches for review and incorporate feedback from mentors
	and the community.  
-	Write a weekly blog post documenting that week's work.

----------

Final Week: 
(August 25 - September 1) 
-	Write a detailed report on the entire project.  
-	Fix bugs if any.  
-	Reflect on the project, noting challenges faced and lessons learned.


Blogging:
---------

I have also set up a blogging page at [10]. While reading blogs from previous
GSoC contributors, I found them useful in understanding the challenges
they faced and how they approached their projects. Their experiences gave
me a better idea of what to expect and how to navigate the development
process. Inspired by this, I decided to start my own blog to document my
journey throughout GSoC. This will not only help me track my own progress but
also serve as a resource for future contributors who might work on similar
projects. I plan to share updates on my work, challenges encountered and
insights gained from discussions with mentors and the community.

Additionally, I hope my blog encourages more people to contribute to open
source by providing a transparent look into the development process. Writing
about my experience will also help me reflect on my work and improve my
ability to communicate technical ideas effectively.

I liked the format and structure of Chandra's blog, so I decided to use the
same template for my own blogging page.


Availability:
-------------

As a college student, I intend to use my summer break from May to July
to work on the project. After completing my university exams in April, I can
start working in May. I can dedicate 40 hours a week from May to July; in
August, after classes commence, I can dedicate about 25 hours a week.

I have no exams, vacations, or other commitments planned during the coding
period. I will keep the community updated on my status and maintain
transparency throughout the project.


Post-GSOC:
----------

Beyond contributing code, I strongly believe in giving back to the community
and helping others grow. Open source thrives on mentorship, knowledge sharing,
and long-term involvement, and I would love to continue contributing even
after GSoC ends.

I have always valued mentorship, both as a mentee and as someone who enjoys
guiding others. If given the opportunity, I would be more than happy to
mentor/co-mentor future GSoC contributors. By staying involved in the
community, whether through contributing, reviewing patches, or mentoring,
I hope to help sustain and expand the project’s reach. I look at GSoC not
just as a one-time contribution but as a step toward a longer-term relationship
with open source.

I will continue to be involved with Git even after GSoC by contributing patches,
reviewing code, and participating in discussions. My work on refactoring Git’s 
state aligns with long-term improvements to the codebase, and I plan to keep 
refining it beyond the program. I see GSoC as just the beginning of my journey
with Git.

Appreciation:
-------------

I appreciate the Git community for its excellent documentation, which made it 
much easier for me to understand Git in depth. The well-structured resources 
helped me navigate the codebase and gain a deeper understanding of how Git 
works internally.

Beyond the documentation, I am also grateful for how welcoming and supportive 
the community has been. Whether through discussions on the mailing list or 
feedback on my patches, the information and guidance I received made my 
experience even better.

Additionally, I read the blogs and proposals of Chandra, Jialuo, and Ghanshyam,
which provided valuable insights into their journeys and helped me shape my 
own approach to contributing.

Thanks for reviewing this proposal.

References:
-----------

[1] https://github.com/sdslabs/beast/pull/374

[2] https://github.com/sdslabs/beast/tree/add-teams-with-hint

[3] https://github.com/sdslabs/playCTF/pull/177

[4] https://public-inbox.org/git/pull.1826.git.git.1730926082.gitgitgadget@gmail.com/

[5] https://public-inbox.org/git/20250303-b4-pks-objects-without-the-repository-v1-0-c5dd43f2476e@pks.im/

[6] https://public-inbox.org/git/20250206-b4-pks-path-drop-the-repository-v1-0-4e77f0313206@pks.im/

[7] https://public-inbox.org/git/pull.1829.git.1731653548549.gitgitgadget@gmail.com/#t

[8] https://public-inbox.org/git/cover.1733236936.git.karthik.188@gmail.com/

[9] https://public-inbox.org/git/cover.1724923648.git.ps@pks.im/

[10] https://ayu-ch.github.io

* Re: [GSoC PROPOSAL v1] Refactoring in order to reduce Git’s global state
  2025-04-03 15:26   ` Arnav Bhate
@ 2025-04-04  9:19     ` Patrick Steinhardt
  0 siblings, 0 replies; 17+ messages in thread
From: Patrick Steinhardt @ 2025-04-04  9:19 UTC (permalink / raw)
  To: Arnav Bhate; +Cc: git

On Thu, Apr 03, 2025 at 08:56:45PM +0530, Arnav Bhate wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> > On Wed, Apr 02, 2025 at 11:44:12PM +0530, Arnav Bhate wrote:
> >> ### Timeline
> >>
> >> #### Pre-GSoC (Until May 8)
> >>
> >> - Explore the codebase, identifying global variables and how they are
> >>   used.
> >>
> >> - Start to identify suitable locations for global variables.
> >>
> >> #### Community Bonding Period (May 8 - June 1)
> >>
> >> - Interact with mentor, discussing best ways to refactor various
> >>   variables and make a plan based on that.
> >>
> >> - If time is left, start coding early, as my summer break will have
> >>   started.
> >>
> >> #### Coding Period (June 2 - August 25)
> >>
> >> - Modify functions to add a `struct repository` argument where they
> >>   depend on `the_repository` and replace all occurrences of it.
> >>
> >> - Move global variables to their new locations in various structs,
> >>   and refactor functions that depend on them to use their new locations.
> > 
> > In large-scale projects like these it typically makes sense to work in
> > batches. Instead of having three separate phases to "define the
> > problem", "develop the solution" and "deploy the improvement" I would
> > strongly encourage you to define and tie together smaller batches of
> > work.
> 
> What I meant is, before coding started, I want to finalise all the new
> locations for the global variables with my mentor, then I would actually
> modify the code in batches, struct-by-struct. Are you suggesting that
> the new locations not be finalised beforehand, or are we misinterpreting
> each other?

The problem I see is that you only have one large "Coding Period". What
we would like to see though is that you define smaller, self-contained
batches of work that you can try to land individually, as well as an
estimate of how long each of these batches will take you to both
develop and land in Git itself.

Patrick

* Re: [GSOC] [PROPOSAL v2]: Refactoring in order to reduce Git’s global state
  2025-04-04  8:51 ` [GSOC] [PROPOSAL v2]: " Ayush Chandekar
@ 2025-04-04 14:45   ` Karthik Nayak
  2025-04-06 10:44     ` Ayush Chandekar
  2025-04-07  8:42   ` Ayush Chandekar
  1 sibling, 1 reply; 17+ messages in thread
From: Karthik Nayak @ 2025-04-04 14:45 UTC (permalink / raw)
  To: Ayush Chandekar; +Cc: christian.couder, git, ps, shejialuo, shyamthakkar001

Ayush Chandekar <ayu.chandekar@gmail.com> writes:

[snip]

> Proposed Plan:
> --------------
>
> I have been reviewing global variables across the codebase to understand their
> dependencies and impact. To do this, I examined `config.c` and cross-referenced
> it with `environment.c` to see how these variables are currently managed. The
> goal of this project is to eliminate global variables by moving their
> configurations into their local contexts.
>
> The general approach for handling a global variable begins with understanding
> its purpose. This involves tracing its usage across the codebase and identifying
> the subsystem it should belong to. If the variable is closely tied to
> repository-related functionality, it may belong in struct repository or
> struct repo_settings. Otherwise, it should be placed in a more suitable
> context based on its scope.
>
> Additionally, it's important to review previous attempts or related patches
> to understand past design decisions and ensure consistency with ongoing efforts.
> Finally, the global instance is eliminated by relocating the variable into the
> appropriate context and passing it through the relevant code paths.
>
> Example: Handling `is_bare_repository_cfg`
> The variable `is_bare_repository_cfg` determines whether a repository is bare,
> meaning it lacks a working directory. Since this property is fundamental to
> how a repository functions, it should be placed in struct repository.
>
> I have also gone through the code paths and analyzed how this variable is
> initialized. We can initialize it similarly to how hash_algo is set through
> the repository format. The repository format already contains an `is_bare`
> field, which we can use to set this variable inside struct repository.
>
> However, I still have some questions regarding why the is_bare_repository()
> function checks for `repo->worktree` and why the `worktree struct` itself has
> an `is_bare` variable. If a repository is considered bare when !repo->worktree
> is true, the role of `worktree->is_bare` needs further clarification. I believe
> that by engaging with the community, my understanding will become clearer.
> I also went through [4] to see how John Cai's approach was.
>
> This is how we can also approach for other global variables.
> Through multiple iterations, this approach will be refined based on feedback,
> edge cases, and community input.
>

So the approach you suggest is to comb through the global variables and
config and find new locations for them to be stored. While this is
definitely a big chunk of the problem, shouldn't we also talk about
how we can reduce usage of some of these variables?

In particular, I'm wondering how you'd want to tackle 'the_repository'
usage. There is some previous work done here, where Patrick added the
'#define USE_THE_REPOSITORY_VARIABLE' definition, which tracks which
files still use the global variable.

A possible approach which has been followed is to simply go from the
bottom layers of the code upwards, cleaning up usage of global variables
and ensuring we can remove '#define USE_THE_REPOSITORY_VARIABLE' from
files. This is also the approach taken in some of the patches that
you've linked.

[snip]

* Re: [GSOC] [PROPOSAL v2]: Refactoring in order to reduce Git’s global state
  2025-04-04 14:45   ` Karthik Nayak
@ 2025-04-06 10:44     ` Ayush Chandekar
  2025-04-07  9:06       ` Christian Couder
  0 siblings, 1 reply; 17+ messages in thread
From: Ayush Chandekar @ 2025-04-06 10:44 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: christian.couder, git, ps, shejialuo, shyamthakkar001

>
> So the approach you suggest is to comb through the global variables and
> config and find new locations for them to be stored. While this is
> definitely a big chunk of the problem, shouldn't we also talk about
> how we can reduce usage of some of these variables?
>
> In particular, I'm wondering how you'd want to tackle 'the_repository'
> usage. There is some previous work done here, where Patrick added the
> '#define USE_THE_REPOSITORY_VARIABLE' definition which tracks usage of
> global variable and usage of them in different files.
>
> A possible approach which has been followed is to simply go from the
> bottom layers of the code upwards, cleaning up usage of global variables
> and ensuring we can remove '#define USE_THE_REPOSITORY_VARIABLE' from
> files. This is also the approach taken in some of the patches that
> you've linked.
>

Your approach makes a lot of sense to me, that is, picking a specific
subsystem or file and aiming to remove the `#define USE_THE_REPOSITORY_VARIABLE`
definition, and thus eventually 'the_repository' itself. This is the method
Patrick used to tackle the object subsystem in [1] and the path subsystem
in [2], and that you used to tackle the packfile subsystem in [3]. This
approach also helps remove some of the global variables used within that
particular subsystem, which is a nice bonus.

However, this approach might not be feasible for global variables that
aren't tightly tied to a single subsystem. So for removing `the_repository`,
I can follow the approach you mentioned, and for relocating the more
general global variables, I can use the approach I described in the
proposal.
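
As I understand the macro-based approach, it can be sketched in a single
file like this. The `demo_*` and `DEMO_*` names are hypothetical and only
mirror the idea: the global is declared solely for files that opt in, so a
converted file that no longer defines the macro cannot refer to it by
accident:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for `struct repository`. */
struct demo_repository {
	const char *gitdir;
};

/* "Header" part: the global is only visible to files that opt in by
 * defining the macro before including the header. */
#ifdef DEMO_USE_THE_REPOSITORY_VARIABLE
extern struct demo_repository *demo_the_repository;
#endif

/* Converted code: this file does not define the macro, so any
 * leftover use of demo_the_repository would fail to compile. The
 * repository is passed explicitly instead. */
const char *demo_get_git_dir(const struct demo_repository *repo)
{
	assert(repo != NULL);
	return repo->gitdir;
}
```

Working from the bottom layers upwards, each file whose functions all take
the repository explicitly can drop the define, which makes the remaining
work easy to see.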

What do you think?

[1]: https://public-inbox.org/git/20250303-b4-pks-objects-without-the-repository-v1-0-c5dd43f2476e@pks.im/
[2]: https://public-inbox.org/git/20250206-b4-pks-path-drop-the-repository-v1-0-4e77f0313206@pks.im/
[3]: https://public-inbox.org/git/cover.1733236936.git.karthik.188@gmail.com/

* Re: [GSOC] [PROPOSAL v2]: Refactoring in order to reduce Git’s global state
  2025-04-04  8:51 ` [GSOC] [PROPOSAL v2]: " Ayush Chandekar
  2025-04-04 14:45   ` Karthik Nayak
@ 2025-04-07  8:42   ` Ayush Chandekar
  1 sibling, 0 replies; 17+ messages in thread
From: Ayush Chandekar @ 2025-04-07  8:42 UTC (permalink / raw)
  To: Ayush Chandekar, Patrick Steinhardt
  Cc: christian.couder, git, karthik nayak, shejialuo,
	Ghanshyam Thakkar

Hey Patrick,

It would be great if you could take a look at my proposal, especially since
you've worked on this area before. Any feedback would be really appreciated!

Thanks!
Ayush

* Re: [GSOC] [PROPOSAL v2]: Refactoring in order to reduce Git’s global state
  2025-04-06 10:44     ` Ayush Chandekar
@ 2025-04-07  9:06       ` Christian Couder
  2025-04-07 10:07         ` Ayush Chandekar
  0 siblings, 1 reply; 17+ messages in thread
From: Christian Couder @ 2025-04-07  9:06 UTC (permalink / raw)
  To: Ayush Chandekar; +Cc: Karthik Nayak, git, ps, shejialuo, shyamthakkar001

On Sun, Apr 6, 2025 at 12:44 PM Ayush Chandekar <ayu.chandekar@gmail.com> wrote:
>
> >
> > So the approach you suggest is to comb through the global variables and
> > config and find new locations for them to be stored. While this is
> > definitely a big chunk of the problem, shouldn't we also talk about
> > how we can reduce usage of some of these variables?
> >
> > In particular, I'm wondering how you'd want to tackle 'the_repository'
> > usage. There is some previous work done here, where Patrick added the
> > '#define USE_THE_REPOSITORY_VARIABLE' definition which tracks usage of
> > global variable and usage of them in different files.
> >
> > A possible approach which has been followed is to simply go from the
> > bottom layers of the code upwards, cleaning up usage of global variables
> > and ensuring we can remove '#define USE_THE_REPOSITORY_VARIABLE' from
> > files. This is also the approach taken in some of the patches that
> > you've linked.
> >
>
> Your approach makes a lot of sense to me, that is, picking a specific
> subsystem or file and aiming to remove the `#define USE_THE_REPOSITORY_VARIABLE`
> definition and thus 'the_repository' eventually. This was the method
> used by Patrick to tackle
> the object subsystem in [1]  and the path subsystem in [2] and you to
> tackle the packfile in [3].
> This approach also helps in removing some of the global variables used
> within that particular
> subsystem, which is a nice bonus.
>
> However, this approach might not be feasible for the global variables that
> arent tightly tied to a single subsystem.

Well, initially 'the_repository' wasn't tightly tied to a single
subsystem and even now I am not sure we could say it's tightly tied to
a single subsystem. Or maybe I don't understand what you mean.

Do you mean that it's tightly tied because it needs `#define
USE_THE_REPOSITORY_VARIABLE`?

But for other global variables it could be possible to define and use
similar macros. This way it might be possible to remove those
variables step by step only in some files.

> So what I can do is, for removing
> `the_repository`, I can follow the approach you mentioned, and for relocating
> the more general global variables, I can use the approach which I
> talked about in the
> proposal.
>
> What do you think?

If removing `the_repository` is part of your proposal, then yeah,
describing the approach you will use to remove it is a good idea.

* Re: [GSOC] [PROPOSAL v2]: Refactoring in order to reduce Git’s global state
  2025-04-07  9:06       ` Christian Couder
@ 2025-04-07 10:07         ` Ayush Chandekar
  0 siblings, 0 replies; 17+ messages in thread
From: Ayush Chandekar @ 2025-04-07 10:07 UTC (permalink / raw)
  To: Christian Couder; +Cc: Karthik Nayak, git, ps, shejialuo, shyamthakkar001

>
> Well, initially 'the_repository' wasn't tightly tied to a single
> subsystem and even now I am not sure we could say it's tightly tied to
> a single subsystem. Or maybe I don't understand what you mean.
>
> Do you mean that it's tightly tied because it needs `#define
> USE_THE_REPOSITORY_VARIABLE`?
>
Sorry if I was not clear earlier. I wasn't referring to 'the_repository'
being tied to a subsystem; I was talking about other global variables.
What I meant is that the approach of picking a subsystem and removing the
`#define USE_THE_REPOSITORY_VARIABLE` is really effective for removing
'the_repository'. It also helps in localizing the global variables from
environment.h that are specific to that subsystem, either into the
subsystem itself or into struct repository / repo_settings.

But if a global variable is shared by two or three different subsystems,
this approach would not be feasible for that variable. I would need to
tackle such a variable individually, which is the approach I described in
my proposal.

So I can move forward using these two approaches according to the
situation.

> But for other global variables it could be possible to define and use
> similar macros. This way it might be possible to remove those
> variables step by step only in some files.
>
Yes, I still need to think through how that would align with the
approach I mentioned.
Defining a single macro like `#define USE_GLOBAL_VARIABLES` is
something I can look into.

> > So what I can do is, for removing
> > `the_repository`, I can follow the approach you mentioned, and for relocating
> > the more general global variables, I can use the approach which I
> > talked about in the
> > proposal.
> >
> > What do you think?
>
> If removing `the_repository` is part of your proposal, then yeah,
> describing the approach you will use to remove is a good idea.

Yes, it is a part of the project but I haven't added this specific
approach in the proposal yet and was hence asking if I can.
Thanks:)

* [GSOC] [PROPOSAL v3]: Refactoring in order to reduce Git’s global state
  2025-03-26  5:26 [GSOC] [PROPOSAL V1]: Refactoring in order to reduce Git’s global state Ayush Chandekar
  2025-03-28 13:06 ` shejialuo
  2025-04-04  8:51 ` [GSOC] [PROPOSAL v2]: " Ayush Chandekar
@ 2025-04-08 12:52 ` Ayush Chandekar
  2 siblings, 0 replies; 17+ messages in thread
From: Ayush Chandekar @ 2025-04-08 12:52 UTC (permalink / raw)
  To: ayu.chandekar
  Cc: christian.couder, git, karthik.188, ps, shejialuo,
	shyamthakkar001

Hello,
This is the third version of my GSoC 2025 proposal for the project 
"Refactoring in order to reduce Git’s global state".

The key change from v2 to this v3 is that I’ve added how I plan to tackle 
the 'the_repository' global object.

You can view docs version here: 
https://docs.google.com/document/d/1tJrtWxo1UGKChB3hu5eZ-ljm0FtU_fsv0TnIRwu3EKY/edit?usp=sharing

---------

Refactoring in order to reduce git’s state

My Information:
---------------

Name: Ayush Chandekar
Email: ayu.chandekar@gmail.com
Mobile No: (+91) 9372496874
Education: UG Sophomore, IIT Roorkee
Github: https://github.com/ayu-ch
Blog: https://ayu-ch.github.io


About me:
---------

I'm Ayush Chandekar, a UG Sophomore studying at Indian Institute of
Technology, Roorkee. I like participating in various software development
and tech-development endeavors, usually hackathons, CTFs, and projects at
SDSLabs. SDSLabs is a student-run technical group that includes passionate
developers and designers interested in various fields and involved in multiple
software development projects that aim to foster a software development
culture on campus. Being a part of this group has exposed me to different
software development methodologies, tools and frameworks and helped me become
comfortable contributing to an open-source project with multiple contributors.
Some open-source contributions I made here are: [1], [2] & [3]

I see this project as a meaningful opportunity to deepen my involvement in
the Git community and to build a foundation for continued contributions to
open source development in the future.


Overview:
---------

Git currently uses a global object called `the_repository`, which refers to a
single instance of `struct repository`. Many internal functions rely on this
global object rather than accepting a `struct repository` as an explicit
parameter. This design inherently assumes a single active repository,
making it difficult to support multi-repository use cases and obstructing
the long-term goal of libification of Git.

A key architectural limitation is that while `struct repository` encapsulates
some repository-specific information, many important environment variables
and configuration settings that logically belong to a repository are still
stored as global variables, primarily in `environment.c`, not within the
`repository` struct. As a result, even if multiple repositories were to
exist concurrently, they would still share this global state, leading to
incorrect behavior, race conditions, or subtle bugs.

This project aims to refactor Git’s environment handling by relocating global
variables into more appropriate local contexts, primarily within 
struct repository and struct repo_settings. However, some global variables may
only apply to specific subsystems. In such cases, rather than placing them in
struct repository or struct repo_settings, they should be moved into a
context that better reflects their scope.

This change will not only make the environment state repository-specific but
also improve the modularity and maintainability of the codebase. The work
involves identifying environment-related global variables, determining the
most suitable structure to house them, and updating all affected code paths
accordingly.

The difficulty of this project is medium, and it is estimated to take 
175 to 350 hours.


Pre-GSOC:
---------

I started exploring Git’s codebase and documentation around the end of
January, familiarizing myself with its structure and development practices. I
submitted a microproject, which helped me navigate the code and contribution
workflow.

After selecting the project on refactoring Git’s state, I studied the
surrounding code and reviewed past patches ([4], [5], [6], [7], [8] & [9])
to understand the reasoning behind previous changes. 

To better prepare for the GSoC timeline, I submitted a patch related to the
project, to gain hands-on experience with both the implementation details
and the submission process. The patch focused on refactoring access to
`core.attributesfile`.

Through discussions and feedback from the community, I gained a clearer
understanding of a key aspect of the project: determining whether certain
variables should belong to repo_settings/repository or be part of a
separate subsystem.

Junio pointed out in his feedback that not all global variables should
be blindly moved into `repo_settings`.
Specifically, for `git_attributes_file`, adding it to the repository struct
doesn’t make sense. He explained that it’s similar to how index_state is
handled: while index_state knows which repository it belongs to, the
repository struct only holds a pointer to a single index_state instance
and isn’t aware of other instances.

Following this approach, instead of placing `git_attributes_file` in the
repository struct, we can house it within an attribute set and pass a
pointer to that set wherever needed.

This practice patch gave me a clearer understanding of the project.

Patches:
--------

For git:

+ (Microproject) t6423: fix suppression of Git’s exit code in tests
	Thread:
	https://public-inbox.org/git/20250202120926.322417-1-ayu.chandekar@gmail.com/
	Status: Merged into master 
	Commit Hash: 7c1d34fe5d1229362f2c3ecf2d493167a1f555a2 
	Description: Instead of executing a Git command as the upstream component of
				 a pipe, which can result in the exit status being lost, redirect
				 its output to a file and then process that file in two steps to
				 ensure the exit status is properly preserved.

+ midx: implement progress reporting for QSORT operation
	Thread:
	https://public-inbox.org/git/20250210074623.136599-1-ayu.chandekar@gmail.com/
	Status: Dropped 
	Description: Add progress reporting during the QSORT operation in
				 multi-pack-index verification. I found this TODO while going
				 through the code and thought it was interesting. However, my
				 approach assumed that the qsort() operation processes elements
				 in a structured order, which isn't guaranteed.

+ Stop depending on `the_repository` for core.attributesfile
	Thread:
	https://public-inbox.org/git/20250310151048.69825-1-ayu.chandekar@gmail.com/
	Status: WIP, needs more discussion.  
	Description: This patch refactors access to the `core.attributesfile`
				 configuration by moving it into the `repo_settings` struct.
				 It eliminates the global variable `git_attributes_file` and 
				 updates relevant code paths to pass the `struct repository`
				 as a parameter.

For git.github.io:

+ GSoC-participants: add GSoC 2024 participants to the list #762
	Status: Merged into master
	Description: Adding GSoC 2024 participants will help new
				 contributors understand their journey, making it easier for them 
				 to navigate the program and the project.

+ Rename references from *.txt to *.adoc in documentation paths. #769
	Status: Merged into master
	Description: Since the documentation in Git has moved from the *.txt to the
				 *.adoc format, update references to reflect that change.

+ Rename references from *.txt to *.adoc in Rev News editions. #770
	Status: Merged into master
	Description: Since the documentation in Git has moved from the *.txt to the
				 *.adoc format, update references in previous editions to
				 reflect that change.

Proposed Plan:
--------------

I have been reviewing global variables across the codebase to understand their
dependencies and impact. To do this, I examined `config.c` and cross-referenced
it with `environment.c` to see how these variables are currently managed. The
goal of this project is to eliminate global variables by moving their
configurations into their local contexts. 

The general approach for handling a global variable begins with understanding
its purpose. This involves tracing its usage across the codebase and identifying
the subsystem it should belong to. If the variable is closely tied to
repository-related functionality, it may belong in struct repository or
struct repo_settings. Otherwise, it should be placed in a more suitable
context based on its scope.

Additionally, it's important to review previous attempts or related patches
to understand past design decisions and ensure consistency with ongoing efforts.
Finally, the global instance is eliminated by relocating the variable into the
appropriate context and passing it through the relevant code paths.
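
The steps above can be sketched roughly as follows. This is only an
illustration under assumed names: `example_repo`, `example_repo_settings`
and `some_option` are hypothetical and do not correspond to a specific Git
variable. The former global becomes a field of a per-repository settings
struct, and the reader function takes the repository as a parameter:

```c
#include <assert.h>

/*
 * Before (conceptually): a process-wide global shared by every repository,
 * for example `int some_option = 1;`, read directly by any function.
 */

/* After: the value lives in a per-repository settings struct. */
struct example_repo_settings {
	int initialized;
	int some_option;
};

struct example_repo {
	struct example_repo_settings settings;
};

/*
 * Fill in defaults once per repository. Real code would consult that
 * repository's configuration here; this sketch only sets a default.
 */
void example_prepare_settings(struct example_repo *r)
{
	assert(r);
	if (r->settings.initialized)
		return;
	r->settings.some_option = 1;
	r->settings.initialized = 1;
}

/* The accessor now receives the repository explicitly. */
int example_get_some_option(struct example_repo *r)
{
	example_prepare_settings(r);
	return r->settings.some_option;
}
```

Because each repository carries its own copy, changing the option for one
repository no longer affects another one loaded in the same process.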

Example: Handling `is_bare_repository_cfg`
The variable `is_bare_repository_cfg` determines whether a repository is bare,
meaning it lacks a working directory. Since this property is fundamental to
how a repository functions, it should be placed in struct repository.

I have also gone through the code paths and analyzed how this variable is
initialized. We can initialize it similarly to how hash_algo is set through
the repository format. The repository format already contains an `is_bare`
field, which we can use to set this variable inside struct repository.
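
A minimal sketch of this idea, assuming simplified and hypothetical
structs (`example_repo_format`, `example_repo`) rather than Git's real
definitions: the bare flag is copied from the format at the same point
where the hash algorithm is copied. How the flag is combined with the
worktree check in `example_is_bare()` is my assumption for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-in for the repository format data. */
struct example_repo_format {
	int hash_algo;
	int is_bare; /* -1 when the configuration does not say */
};

/* Hypothetical stand-in for the repository struct. */
struct example_repo {
	int hash_algo;
	int is_bare_cfg;      /* would replace the global is_bare_repository_cfg */
	const char *worktree; /* NULL when there is no working tree */
};

/* Copy per-repository properties out of the format in one place. */
void example_repo_apply_format(struct example_repo *r,
			       const struct example_repo_format *fmt)
{
	assert(r && fmt);
	r->hash_algo = fmt->hash_algo;
	r->is_bare_cfg = fmt->is_bare;
}

/* Assumed check: bare unless configured otherwise, and no worktree set. */
int example_is_bare(const struct example_repo *r)
{
	assert(r);
	return r->is_bare_cfg && !r->worktree;
}
```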

However, I still have some questions about why the is_bare_repository()
function checks `repo->worktree` and why `struct worktree` itself has
an `is_bare` member. If a repository is considered bare when !repo->worktree
is true, the role of `worktree->is_bare` needs further clarification. I believe
that engaging with the community will make this clearer.
I also went through [4] to study John Cai's approach.

The same approach can be applied to other global variables.
Through multiple iterations, it will be refined based on feedback,
edge cases, and community input.

Apart from that, there is the global object `the_repository`. To phase out
its use, Patrick introduced the macro
`#define USE_THE_REPOSITORY_VARIABLE` in this patch: [10]. One approach I can
follow is to pick a subsystem or file and remove the macro from it, thereby
eliminating the usage of `the_repository` in that subsystem/file by
passing the `struct repository` explicitly through the call chain.
This approach also helps in removing some of the global variables in that
particular subsystem.
Patrick follows it in [5] and [6] to tackle the object and path subsystems
respectively, and Karthik follows it in [8] to tackle the packfile subsystem.
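
To illustrate the pattern, here is a small sketch with hypothetical names
(`EXAMPLE_USE_THE_REPOSITORY_VARIABLE`, `example_the_repository`,
`example_get_gitdir`); it is not Git's code. The global is only declared
for files that opt in through the macro, so once a file passes the
repository explicitly it can drop the macro and still compile:

```c
#include <assert.h>

struct example_repo {
	const char *gitdir;
};

/*
 * The global is visible only to files that define the opt-in macro
 * before this point. This file does not define it.
 */
#ifdef EXAMPLE_USE_THE_REPOSITORY_VARIABLE
extern struct example_repo *example_the_repository;

/* Before: an implicit dependency on the global. */
const char *example_get_gitdir_global(void)
{
	return example_the_repository->gitdir;
}
#endif

/* After: the repository is an explicit parameter. */
const char *example_get_gitdir(const struct example_repo *r)
{
	assert(r);
	return r->gitdir;
}

/* Callers forward the repository they were given down the call chain. */
const char *example_describe(const struct example_repo *r)
{
	return example_get_gitdir(r);
}
```

Any leftover use of the global in such a file then fails at compile time,
which makes it easy to confirm that the conversion is complete.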


Timeline:
---------

Pre-GSOC: 
(Until 8 May) 
-	Explore the codebase more, focusing on environment-related code paths.
-	Document how each global variable is used and how it can be moved to 
	repository settings.  
-	Study Git’s Coding Guidelines and the Pro Git Book to align with best practices.

----------

Community Bonding: 
(May 8 - June 1) 
-	Engage with mentors to discuss different environment variables, their 
	dependencies, and the best approach for refactoring.
-	Finalize an implementation plan based on discussions.
-	Since I will be on summer vacation, I can start coding early and make progress 
	on the project.

----------

Coding Period: 
(June 2 - August 25) 
-	Identify the appropriate subsystem for each global variable and relocate it 
	into struct repository, struct repo_settings, or other suitable contexts.
-	Modify function signatures to pass the new contexts explicitly, replacing 
	reliance on global variables.
-	Pick subsystems and remove the macro
	#define USE_THE_REPOSITORY_VARIABLE, thereby eliminating usage
	of the global variable ‘the_repository’.
-	Continuously submit patches for review and incorporate feedback from mentors
	and the community.  
-	Write weekly blog posts documenting what I did during that week.

----------

Final Week: 
(August 25 - September 1) 
-	Write a detailed report on the entire project.  
-	Fix bugs if any.  
-	Reflect on the project, noting challenges faced and lessons learned.


Blogging:
---------

I have also set up a blogging page at [11]. While reading blogs from previous
GSoC contributors, I found them useful in understanding the challenges
they faced and how they approached their projects. Their experiences gave
me a better idea of what to expect and how to navigate the development
process. Inspired by this, I decided to start my own blog to document my
journey throughout GSoC. This will not only help me track my own progress but
also serve as a resource for future contributors who might work on similar
projects. I plan to share updates on my work, challenges encountered and
insights gained from discussions with mentors and the community.

Additionally, I hope my blog encourages more people to contribute to open
source by providing a transparent look into the development process. Writing
about my experience will also help me reflect on my work and improve my
ability to communicate technical ideas effectively.

I liked the format and structure of Chandra's blog, so I decided to use the
same template for my own blogging page.


Availability:
-------------

As a college student, I intend to utilise my summer breaks from May to July
to work on the project. After completing my University exams in April, I can
start working in May. I can dedicate 40 hours a week from May to July, while
in August after the classes commence, I can dedicate about 25 hours a week.

There are no exams or vacations planned during the coding period, and I have
no other commitments for the summer besides this project. I will keep the
community updated on my status and maintain transparency throughout the
project.


Post-GSOC:
----------

Beyond contributing code, I strongly believe in giving back to the community
and helping others grow. Open source thrives on mentorship, knowledge sharing,
and long-term involvement, and I would love to continue contributing even
after GSoC ends.

I have always valued mentorship, both as a mentee and as someone who enjoys
guiding others. If given the opportunity, I would be more than happy to
mentor/co-mentor future GSoC contributors. By staying involved in the
community, whether through contributing, reviewing patches, or mentoring,
I hope to help sustain and expand the project’s reach. I see GSoC not
just as a one-time contribution but as a step toward a longer-term relationship
with open source.

I will continue to be involved with Git even after GSoC by contributing patches,
reviewing code, and participating in discussions. My work on refactoring Git’s 
state aligns with long-term improvements to the codebase, and I plan to keep 
refining it beyond the program. I see GSoC as just the beginning of my journey
with Git.

Appreciation:
-------------

I appreciate the Git community for its excellent documentation, which made it 
much easier for me to understand Git in depth. The well-structured resources 
helped me navigate the codebase and gain a deeper understanding of how Git 
works internally.

Beyond the documentation, I am also grateful for how welcoming and supportive 
the community has been. Whether through discussions on the mailing list or 
feedback on my patches, the information and guidance I received made my 
experience even better.

Additionally, I read the blogs and proposals of Chandra, Jialuo, and Ghanashyam, 
which provided valuable insights into their journeys and helped me shape my 
own approach to contributing.

Thanks for reviewing this proposal.

References:
-----------

[1] https://github.com/sdslabs/beast/pull/374

[2] https://github.com/sdslabs/beast/tree/add-teams-with-hint

[3] https://github.com/sdslabs/playCTF/pull/177

[4] https://public-inbox.org/git/pull.1826.git.git.1730926082.gitgitgadget@gmail.com/

[5] https://public-inbox.org/git/20250303-b4-pks-objects-without-the-repository-v1-0-c5dd43f2476e@pks.im/

[6] https://public-inbox.org/git/20250206-b4-pks-path-drop-the-repository-v1-0-4e77f0313206@pks.im/

[7] https://public-inbox.org/git/pull.1829.git.1731653548549.gitgitgadget@gmail.com/#t

[8] https://public-inbox.org/git/cover.1733236936.git.karthik.188@gmail.com/

[9] https://public-inbox.org/git/cover.1724923648.git.ps@pks.im/

[10] https://public-inbox.org/git/cover.1718347699.git.ps@pks.im/

[11] https://ayu-ch.github.io
