* Re: [PATCH] read-cache: use index state repository for trace2 logging
2026-03-27 10:08 [PATCH] read-cache: use index state repository for trace2 logging Jayesh Daga via GitGitGadget
@ 2026-03-27 13:48 ` Derrick Stolee
2026-03-27 16:28 ` Junio C Hamano
2026-03-28 7:14 ` [PATCH v2] read-cache: use istate->repo " Jayesh Daga via GitGitGadget
1 sibling, 1 reply; 5+ messages in thread
From: Derrick Stolee @ 2026-03-27 13:48 UTC (permalink / raw)
To: Jayesh Daga via GitGitGadget, git
Cc: Justin Tobler, Ayush Chandekar, Siddharth Asthana, Jayesh Daga
On 3/27/2026 6:08 AM, Jayesh Daga via GitGitGadget wrote:
> Robustness: The ternary fallback ensures we avoid potential NULL pointer
> dereferences while maintaining existing logging behavior in edge cases.
> + r = istate->repo ? istate->repo : the_repository;
If I understand correctly, it is a bug if istate->repo is NULL.
Did you try running the test suite with istate->repo as a replacement for
the_repository in these tracing calls? Is there a legitimate scenario
where this would be NULL?
> + trace2_data_intmax("index", r, "read/version",
> istate->version);
> - trace2_data_intmax("index", the_repository, "read/cache_nr",
> + trace2_data_intmax("index", r, "read/cache_nr",
> istate->cache_nr);
Other than that, this is a minor improvement in the right direction. I'd
rather that it be more complete if you are working in this file.
There are several places where we use the_repository and there is an
'istate' right there. If this change works, then why not apply that same
transformation in the other places?
There are TODO comments for many of these, including the hunk you are
editing (be sure to remove these!).
There are other cases that I see in this file:
* refresh_index() uses the_repository for progress.
* tweak_untracked_cache() has a local pointer 'r' that could
be set to istate->repo
* tweak_split_index() passes the_repository to
repo_config_get_split_index().
* do_read_index() uses the_repository when it could use
istate->repo.
* read_index_from() uses the_repository for tracing.
* verify_index_from() uses the_repository->hash_algo but
has an istate->repo that could be used.
* do_write_index() has several instances.
(At this point, I stopped taking inventory.)
There are other uses of the_repository that will be harder to remove as
there isn't a local istate at the time, so those aren't worth combining
with this kind of effort.
If you are already working in this space, then I recommend figuring out
how much we can rely on istate->repo and then apply that knowledge to
these cases as separate commits:
1. Replace the uses in the trace2 calls with istate->repo
and delete the TODO comments.
2. Replace the other uses of the_repository when an istate
exists already.
Thanks, -Stolee
^ permalink raw reply [flat|nested] 5+ messages in thread* [PATCH v2] read-cache: use istate->repo for trace2 logging
2026-03-27 10:08 [PATCH] read-cache: use index state repository for trace2 logging Jayesh Daga via GitGitGadget
2026-03-27 13:48 ` Derrick Stolee
@ 2026-03-28 7:14 ` Jayesh Daga via GitGitGadget
1 sibling, 0 replies; 5+ messages in thread
From: Jayesh Daga via GitGitGadget @ 2026-03-28 7:14 UTC (permalink / raw)
To: git
Cc: Karthik Nayak, Justin Tobler, Ayush Chandekar, Siddharth Asthana,
Jayesh Daga, Jayesh Daga
From: Jayesh Daga <jayeshdaga99@gmail.com>
trace2_data_intmax() calls in do_read_index() currently use the
global 'the_repository' instance, even though the index_state
already carries an explicit repository pointer (istate->repo).
index_state instances are initialized via INDEX_STATE_INIT(r),
which sets istate->repo. Using the_repository here is therefore
redundant and obscures the actual repository context associated
with the index being read.
In particular, using the global repository can lead to misleading
trace2 output in scenarios where multiple repository instances are
in use (such as tests or future refactoring toward better library
boundaries), as events may be attributed to the wrong repository.
Switch these calls to use istate->repo directly, making the
association between the index and its repository explicit.
Since istate->repo is expected to be non-NULL, enforce this
assumption with a BUG() check so that any violation of this
invariant is caught early.
Also remove the now-obsolete TODO comment.
Signed-off-by: Jayesh Daga <jayeshdaga99@gmail.com>
---
[GSoC] read-cache: use index state repository for trace2 logging
HIGH LEVEL
==========
trace2_data_intmax() calls in read-cache.c currently use the global
'the_repository' instance, even though index_state already carries an
explicit repository pointer (istate->repo).
Use istate->repo directly in these calls, making the repository context
associated with the index explicit.
Low-level (Implementation & Justification)
==========================================
index_state instances are initialized via INDEX_STATE_INIT(r), which
sets istate->repo. Therefore, istate->repo is expected to be non-NULL in
normal code paths.
Using the global 'the_repository' in this context is redundant and can
obscure the actual repository associated with the index being read.
In particular, relying on the global repository may lead to misleading
trace2 output in scenarios where multiple repository instances are in
use (e.g., tests or future refactoring toward improved library
boundaries), as events could be attributed to the wrong repository.
Changes:
* Replace uses of the_repository in trace2_data_intmax() with
istate->repo
* Enforce the expectation that istate->repo is non-NULL via BUG()
* Remove the now-obsolete TODO comment
Example:
* trace2_data_intmax("index", the_repository, "read/version", ...)
* trace2_data_intmax("index", istate->repo, "read/version", ...)
Summary
=======
Use the repository associated with index_state for trace2 logging
instead of the global repository instance, making the repository context
explicit and consistent with existing initialization guarantees.
Changes since v1:
=================
* Dropped fallback to the_repository; rely on istate->repo invariant
* Added BUG() check for NULL istate->repo
* Removed TODO comment
* Improved commit message with detailed rationale
* Cleaned up duplicate commits and fixed history
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-2253%2Fjayesh0104%2Ftrace2-istate-repo-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-2253/jayesh0104/trace2-istate-repo-v2
Pull-Request: https://github.com/git/git/pull/2253
Range-diff vs v1:
1: ba6d2add1d ! 1: 7fdfaaef9b read-cache: use index state repository for trace2 logging
@@
## Metadata ##
-Author: jayesh0104 <jayeshdaga99@gmail.com>
+Author: Jayesh Daga <jayeshdaga99@gmail.com>
## Commit message ##
- read-cache: use index state repository for trace2 logging
+ read-cache: use istate->repo for trace2 logging
- Replace uses of the_repository in trace2_data_intmax() with
- istate->repo, which represents the repository associated with
- the index state.
+ trace2_data_intmax() calls in do_read_index() currently use the
+ global 'the_repository' instance, even though the index_state
+ already carries an explicit repository pointer (istate->repo).
- This avoids relying on global repository state and aligns with
- other parts of the codebase (e.g., sparse-index.c) that pass the
- repository instance explicitly.
+ index_state instances are initialized via INDEX_STATE_INIT(r),
+ which sets istate->repo. Using the_repository here is therefore
+ redundant and obscures the actual repository context associated
+ with the index being read.
- No functional change intended.
+ In particular, using the global repository can lead to misleading
+ trace2 output in scenarios where multiple repository instances are
+ in use (such as tests or future refactoring toward better library
+ boundaries), as events may be attributed to the wrong repository.
- Signed-off-by: jayesh0104 <jayeshdaga99@gmail.com>
+ Switch these calls to use istate->repo directly, making the
+ association between the index and its repository explicit.
+
+ Since istate->repo is expected to be non-NULL, enforce this
+ assumption with a BUG() check so that any violation of this
+ invariant is caught early.
+
+ Also remove the now-obsolete TODO comment.
+
+ Signed-off-by: Jayesh Daga <jayeshdaga99@gmail.com>
## read-cache.c ##
@@ read-cache.c: int do_read_index(struct index_state *istate, const char *path, int must_exist)
@@ read-cache.c: int do_read_index(struct index_state *istate, const char *path, in
if (istate->initialized)
return istate->cache_nr;
@@ read-cache.c: int do_read_index(struct index_state *istate, const char *path, int must_exist)
- * TODO trace2: replace "the_repository" with the actual repo instance
- * that is associated with the given "istate".
- */
+ }
+ munmap((void *)mmap, mmap_size);
+
+- /*
+- * TODO trace2: replace "the_repository" with the actual repo instance
+- * that is associated with the given "istate".
+- */
- trace2_data_intmax("index", the_repository, "read/version",
-+ r = istate->repo ? istate->repo : the_repository;
++ r=istate->repo;
++ if (!r)
++ BUG("istate->repo is NULL in do_read_index");
+ trace2_data_intmax("index", r, "read/version",
istate->version);
- trace2_data_intmax("index", the_repository, "read/cache_nr",
read-cache.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/read-cache.c b/read-cache.c
index 5049f9baca..46ffb49cab 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -2206,6 +2206,7 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist)
size_t extension_offset = 0;
int nr_threads, cpus;
struct index_entry_offset_table *ieot = NULL;
+ struct repository *r;
if (istate->initialized)
return istate->cache_nr;
@@ -2309,13 +2310,12 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist)
}
munmap((void *)mmap, mmap_size);
- /*
- * TODO trace2: replace "the_repository" with the actual repo instance
- * that is associated with the given "istate".
- */
- trace2_data_intmax("index", the_repository, "read/version",
+ r=istate->repo;
+ if (!r)
+ BUG("istate->repo is NULL in do_read_index");
+ trace2_data_intmax("index", r, "read/version",
istate->version);
- trace2_data_intmax("index", the_repository, "read/cache_nr",
+ trace2_data_intmax("index", r, "read/cache_nr",
istate->cache_nr);
/*
base-commit: ca1db8a0f7dc0dbea892e99f5b37c5fe5861be71
--
gitgitgadget
^ permalink raw reply related [flat|nested] 5+ messages in thread