* [PATCH] read-cache: use index state repository for trace2 logging
@ 2026-03-27 10:08 Jayesh Daga via GitGitGadget
2026-03-27 13:48 ` Derrick Stolee
2026-03-28 7:14 ` [PATCH v2] read-cache: use istate->repo " Jayesh Daga via GitGitGadget
0 siblings, 2 replies; 5+ messages in thread
From: Jayesh Daga via GitGitGadget @ 2026-03-27 10:08 UTC (permalink / raw)
To: git
Cc: Justin Tobler, Ayush Chandekar, Siddharth Asthana, Jayesh Daga,
jayesh0104
From: jayesh0104 <jayeshdaga99@gmail.com>
Replace uses of the_repository in trace2_data_intmax() with
istate->repo, which represents the repository associated with
the index state.
This avoids relying on global repository state and aligns with
other parts of the codebase (e.g., sparse-index.c) that pass the
repository instance explicitly.
No functional change intended.
Signed-off-by: jayesh0104 <jayeshdaga99@gmail.com>
---
[GSoC] read-cache: use index state repository for trace2 logging
HIGH LEVEL
==========
The current implementation of trace2_data_intmax() in read-cache.c
relies on the global the_repository instance.
As part of the ongoing effort to "lib-ify" the Git codebase and reduce
dependence on global state, this patch transitions those calls to use
the repository instance associated with the index_state.
Low-level (Implementation & Justification)
==========================================
In read-cache.c, the index_state (istate) typically carries a pointer to
its associated repository. However, because istate->repo is not
guaranteed to be initialized in all code paths (e.g., certain low-level
utility or testing contexts), this patch implements a defensive fallback
pattern.
Changes:
Introduced a local repository pointer r that prefers istate->repo but
falls back to the_repository if the former is NULL.
Updated trace2_data_intmax() calls to use this context-aware pointer.
+ struct repository *r = istate->repo ? istate->repo : the_repository;
- trace2_data_intmax("index", the_repository, "read/version", istate->version);
+ trace2_data_intmax("index", r, "read/version", istate->version);
Benefits:
Thread Safety & Modernization: Aligns with the project's goal of moving
away from the_repository.
Robustness: The ternary fallback ensures we avoid potential NULL pointer
dereferences while maintaining existing logging behavior in edge cases.
Consistency: Follows patterns seen in other modernized areas of the
codebase.
Summary
=======
Transitioned trace2 logging in read-cache.c from global to local
repository context.
Implemented a safety fallback to the_repository to handle uninitialized
istate->repo pointers.
No functional changes to telemetry output are intended.
cc :Karthik Nayak karthik.188@gmail.com
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-2253%2Fjayesh0104%2Ftrace2-istate-repo-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-2253/jayesh0104/trace2-istate-repo-v1
Pull-Request: https://github.com/git/git/pull/2253
read-cache.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/read-cache.c b/read-cache.c
index 5049f9baca..2c5c5165e0 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -2206,6 +2206,7 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist)
size_t extension_offset = 0;
int nr_threads, cpus;
struct index_entry_offset_table *ieot = NULL;
+ struct repository *r;
if (istate->initialized)
return istate->cache_nr;
@@ -2313,9 +2314,10 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist)
* TODO trace2: replace "the_repository" with the actual repo instance
* that is associated with the given "istate".
*/
- trace2_data_intmax("index", the_repository, "read/version",
+ r = istate->repo ? istate->repo : the_repository;
+ trace2_data_intmax("index", r, "read/version",
istate->version);
- trace2_data_intmax("index", the_repository, "read/cache_nr",
+ trace2_data_intmax("index", r, "read/cache_nr",
istate->cache_nr);
/*
base-commit: ca1db8a0f7dc0dbea892e99f5b37c5fe5861be71
--
gitgitgadget
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH] read-cache: use index state repository for trace2 logging
2026-03-27 10:08 [PATCH] read-cache: use index state repository for trace2 logging Jayesh Daga via GitGitGadget
@ 2026-03-27 13:48 ` Derrick Stolee
2026-03-27 16:28 ` Junio C Hamano
2026-03-28 7:14 ` [PATCH v2] read-cache: use istate->repo " Jayesh Daga via GitGitGadget
1 sibling, 1 reply; 5+ messages in thread
From: Derrick Stolee @ 2026-03-27 13:48 UTC (permalink / raw)
To: Jayesh Daga via GitGitGadget, git
Cc: Justin Tobler, Ayush Chandekar, Siddharth Asthana, Jayesh Daga
On 3/27/2026 6:08 AM, Jayesh Daga via GitGitGadget wrote:
> Robustness: The ternary fallback ensures we avoid potential NULL pointer
> dereferences while maintaining existing logging behavior in edge cases.
> + r = istate->repo ? istate->repo : the_repository;
If I understand correctly, it is a bug if istate->repo is NULL.
Did you try running the test suite with istate->repo as a replacement for
the_repository in these tracing calls? Is there a legitimate scenario
where this would be NULL?
> + trace2_data_intmax("index", r, "read/version",
> istate->version);
> - trace2_data_intmax("index", the_repository, "read/cache_nr",
> + trace2_data_intmax("index", r, "read/cache_nr",
> istate->cache_nr);
Other than that, this is a minor improvement in the right direction. I'd
rather that it be more complete if you are working in this file.
There are several places where we use the_repository and there is an
'istate' right there. If this change works, then why not apply that same
transformation in the other places?
There are TODO comments for many of these, including the hunk you are
editing (be sure to remove these!).
There are other cases that I see in this file:
* refresh_index() uses the_repository for progress.
* tweak_untracked_cache() has a local pointer 'r' that could
be set to istate->repo
* tweak_split_index() passes the_repository to
repo_config_get_split_index().
* do_read_index() uses the_repository when it could use
istate->repo.
* read_index_from() uses the_repository for tracing.
* verify_index_from() uses the_repository->hash_algo but
has an istate->repo that could be used.
* do_write_index() has several instances.
(At this point, I stopped taking inventory.)
There are other uses of the_repository that will be harder to remove as
there isn't a local istate at the time, so those aren't worth combining
with this kind of effort.
If you are already working in this space, then I recommend figuring out
how much we can rely on istate->repo and then apply that knowledge to
these cases as separate commits:
1. Replace the uses in the trace2 calls with istate->repo
and delete the TODO comments.
2. Replace the other uses of the_repository when an istate
exists already.
Thanks, -Stolee
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] read-cache: use index state repository for trace2 logging
2026-03-27 13:48 ` Derrick Stolee
@ 2026-03-27 16:28 ` Junio C Hamano
2026-03-28 5:25 ` jayesh0104
0 siblings, 1 reply; 5+ messages in thread
From: Junio C Hamano @ 2026-03-27 16:28 UTC (permalink / raw)
To: Derrick Stolee
Cc: Jayesh Daga via GitGitGadget, git, Justin Tobler, Ayush Chandekar,
Siddharth Asthana, Jayesh Daga
Derrick Stolee <stolee@gmail.com> writes:
> On 3/27/2026 6:08 AM, Jayesh Daga via GitGitGadget wrote:
>
>> Robustness: The ternary fallback ensures we avoid potential NULL pointer
>> dereferences while maintaining existing logging behavior in edge cases.
>
>> + r = istate->repo ? istate->repo : the_repository;
>
> If I understand correctly, it is a bug if istate->repo is NULL.
Because INDEX_STATE_INIT(r) assigns the repository as the first
thing, I tend to agree. A (bare) repository can lack the index
so repo->index might be NULL, but if you have an istate instance,
it should always know which repository it came from.
>> + trace2_data_intmax("index", r, "read/version",
>> istate->version);
>> - trace2_data_intmax("index", the_repository, "read/cache_nr",
>> + trace2_data_intmax("index", r, "read/cache_nr",
>> istate->cache_nr);
>
> Other than that, this is a minor improvement in the right direction. I'd
> rather that it be more complete if you are working in this file.
;-) Long timers always aim higher than posted patches.
> If you are already working in this space, then I recommend figuring out
> how much we can rely on istate->repo and then apply that knowledge to
> these cases as separate commits:
>
> 1. Replace the uses in the trace2 calls with istate->repo
> and delete the TODO comments.
>
> 2. Replace the other uses of the_repository when an istate
> exists already.
Excellent suggestion.
Thanks.
By the way, Jeyesh, do you really want to be known with a numbered
"jayesh0104" as your name? These author identities are cast in
stone in commit objects and will stay with the project.
Also see Documentation/SubmittingPatches::[dco,real-name].
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] read-cache: use index state repository for trace2 logging
2026-03-27 16:28 ` Junio C Hamano
@ 2026-03-28 5:25 ` jayesh0104
0 siblings, 0 replies; 5+ messages in thread
From: jayesh0104 @ 2026-03-28 5:25 UTC (permalink / raw)
To: gitster
Cc: ayu.chandekar, git, gitgitgadget, jayeshdaga99, jltobler,
siddharthasthana31, stolee
Hi Junio, Derrick,
Thanks for the detailed review and suggestions.
On the fallback to `the_repository`: I agree with your observation that `istate->repo` being NULL would indicate a bug rather than a scenario to defensively handle. My initial intent was to be conservative in case there were edge paths where `istate->repo` might not be initialized, but given that INDEX_STATE_INIT(r) sets this unconditionally, it makes sense to rely on that invariant instead of masking potential issues. I will drop the fallback and use `istate->repo` directly (and verify via the test suite).
Regarding scope, Derrick’s suggestion to split this into separate commits makes sense. I’ll proceed as follows:
1. A focused patch that replaces `the_repository` with `istate->repo` in the trace2 calls within this file and removes the associated TODO comments.
2. A follow-up patch that replaces other uses of `the_repository` in places where an `istate` is already available (e.g., `refresh_index()`, `tweak_untracked_cache()`, `do_write_index()`, etc.), keeping changes logically grouped for easier review.
I’ll also run the full test suite after removing the fallback to confirm there are no hidden assumptions.
Junio, thanks also for pointing out the author identity. I’ll update it to use my real name in the next version.
Thanks again for the guidance.
Best,
Jayesh
^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH v2] read-cache: use istate->repo for trace2 logging
2026-03-27 10:08 [PATCH] read-cache: use index state repository for trace2 logging Jayesh Daga via GitGitGadget
2026-03-27 13:48 ` Derrick Stolee
@ 2026-03-28 7:14 ` Jayesh Daga via GitGitGadget
1 sibling, 0 replies; 5+ messages in thread
From: Jayesh Daga via GitGitGadget @ 2026-03-28 7:14 UTC (permalink / raw)
To: git
Cc: Karthik Nayak, Justin Tobler, Ayush Chandekar, Siddharth Asthana,
Jayesh Daga, Jayesh Daga
From: Jayesh Daga <jayeshdaga99@gmail.com>
trace2_data_intmax() calls in do_read_index() currently use the
global 'the_repository' instance, even though the index_state
already carries an explicit repository pointer (istate->repo).
index_state instances are initialized via INDEX_STATE_INIT(r),
which sets istate->repo. Using the_repository here is therefore
redundant and obscures the actual repository context associated
with the index being read.
In particular, using the global repository can lead to misleading
trace2 output in scenarios where multiple repository instances are
in use (such as tests or future refactoring toward better library
boundaries), as events may be attributed to the wrong repository.
Switch these calls to use istate->repo directly, making the
association between the index and its repository explicit.
Since istate->repo is expected to be non-NULL, enforce this
assumption with a BUG() check so that any violation of this
invariant is caught early.
Also remove the now-obsolete TODO comment.
Signed-off-by: Jayesh Daga <jayeshdaga99@gmail.com>
---
[GSoC] read-cache: use index state repository for trace2 logging
HIGH LEVEL
==========
trace2_data_intmax() calls in read-cache.c currently use the global
'the_repository' instance, even though index_state already carries an
explicit repository pointer (istate->repo).
Use istate->repo directly in these calls, making the repository context
associated with the index explicit.
Low-level (Implementation & Justification)
==========================================
index_state instances are initialized via INDEX_STATE_INIT(r), which
sets istate->repo. Therefore, istate->repo is expected to be non-NULL in
normal code paths.
Using the global 'the_repository' in this context is redundant and can
obscure the actual repository associated with the index being read.
In particular, relying on the global repository may lead to misleading
trace2 output in scenarios where multiple repository instances are in
use (e.g., tests or future refactoring toward improved library
boundaries), as events could be attributed to the wrong repository.
Changes:
* Replace uses of the_repository in trace2_data_intmax() with
istate->repo
* Enforce the expectation that istate->repo is non-NULL via BUG()
* Remove the now-obsolete TODO comment
Example:
* trace2_data_intmax("index", the_repository, "read/version", ...)
* trace2_data_intmax("index", istate->repo, "read/version", ...)
Summary
=======
Use the repository associated with index_state for trace2 logging
instead of the global repository instance, making the repository context
explicit and consistent with existing initialization guarantees.
Changes since v1:
=================
* Dropped fallback to the_repository; rely on istate->repo invariant
* Added BUG() check for NULL istate->repo
* Removed TODO comment
* Improved commit message with detailed rationale
* Cleaned up duplicate commits and fixed history
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-2253%2Fjayesh0104%2Ftrace2-istate-repo-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-2253/jayesh0104/trace2-istate-repo-v2
Pull-Request: https://github.com/git/git/pull/2253
Range-diff vs v1:
1: ba6d2add1d ! 1: 7fdfaaef9b read-cache: use index state repository for trace2 logging
@@
## Metadata ##
-Author: jayesh0104 <jayeshdaga99@gmail.com>
+Author: Jayesh Daga <jayeshdaga99@gmail.com>
## Commit message ##
- read-cache: use index state repository for trace2 logging
+ read-cache: use istate->repo for trace2 logging
- Replace uses of the_repository in trace2_data_intmax() with
- istate->repo, which represents the repository associated with
- the index state.
+ trace2_data_intmax() calls in do_read_index() currently use the
+ global 'the_repository' instance, even though the index_state
+ already carries an explicit repository pointer (istate->repo).
- This avoids relying on global repository state and aligns with
- other parts of the codebase (e.g., sparse-index.c) that pass the
- repository instance explicitly.
+ index_state instances are initialized via INDEX_STATE_INIT(r),
+ which sets istate->repo. Using the_repository here is therefore
+ redundant and obscures the actual repository context associated
+ with the index being read.
- No functional change intended.
+ In particular, using the global repository can lead to misleading
+ trace2 output in scenarios where multiple repository instances are
+ in use (such as tests or future refactoring toward better library
+ boundaries), as events may be attributed to the wrong repository.
- Signed-off-by: jayesh0104 <jayeshdaga99@gmail.com>
+ Switch these calls to use istate->repo directly, making the
+ association between the index and its repository explicit.
+
+ Since istate->repo is expected to be non-NULL, enforce this
+ assumption with a BUG() check so that any violation of this
+ invariant is caught early.
+
+ Also remove the now-obsolete TODO comment.
+
+ Signed-off-by: Jayesh Daga <jayeshdaga99@gmail.com>
## read-cache.c ##
@@ read-cache.c: int do_read_index(struct index_state *istate, const char *path, int must_exist)
@@ read-cache.c: int do_read_index(struct index_state *istate, const char *path, in
if (istate->initialized)
return istate->cache_nr;
@@ read-cache.c: int do_read_index(struct index_state *istate, const char *path, int must_exist)
- * TODO trace2: replace "the_repository" with the actual repo instance
- * that is associated with the given "istate".
- */
+ }
+ munmap((void *)mmap, mmap_size);
+
+- /*
+- * TODO trace2: replace "the_repository" with the actual repo instance
+- * that is associated with the given "istate".
+- */
- trace2_data_intmax("index", the_repository, "read/version",
-+ r = istate->repo ? istate->repo : the_repository;
++ r=istate->repo;
++ if (!r)
++ BUG("istate->repo is NULL in do_read_index");
+ trace2_data_intmax("index", r, "read/version",
istate->version);
- trace2_data_intmax("index", the_repository, "read/cache_nr",
read-cache.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/read-cache.c b/read-cache.c
index 5049f9baca..46ffb49cab 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -2206,6 +2206,7 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist)
size_t extension_offset = 0;
int nr_threads, cpus;
struct index_entry_offset_table *ieot = NULL;
+ struct repository *r;
if (istate->initialized)
return istate->cache_nr;
@@ -2309,13 +2310,12 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist)
}
munmap((void *)mmap, mmap_size);
- /*
- * TODO trace2: replace "the_repository" with the actual repo instance
- * that is associated with the given "istate".
- */
- trace2_data_intmax("index", the_repository, "read/version",
+ r=istate->repo;
+ if (!r)
+ BUG("istate->repo is NULL in do_read_index");
+ trace2_data_intmax("index", r, "read/version",
istate->version);
- trace2_data_intmax("index", the_repository, "read/cache_nr",
+ trace2_data_intmax("index", r, "read/cache_nr",
istate->cache_nr);
/*
base-commit: ca1db8a0f7dc0dbea892e99f5b37c5fe5861be71
--
gitgitgadget
^ permalink raw reply related [flat|nested] 5+ messages in thread
end of thread, other threads:[~2026-03-28 7:14 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-03-27 10:08 [PATCH] read-cache: use index state repository for trace2 logging Jayesh Daga via GitGitGadget
2026-03-27 13:48 ` Derrick Stolee
2026-03-27 16:28 ` Junio C Hamano
2026-03-28 5:25 ` jayesh0104
2026-03-28 7:14 ` [PATCH v2] read-cache: use istate->repo " Jayesh Daga via GitGitGadget
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox