* [PATCH 1/4] refs: expose `ref_iterator` via 'refs.h'
  2025-07-01 15:03 [PATCH 0/4] for-each-ref: introduce seeking functionality via '--skip-until' Karthik Nayak
@ 2025-07-01 15:03 ` Karthik Nayak
  2025-07-01 15:03 ` [PATCH 2/4] ref-cache: remove unused function 'find_ref_entry()' Karthik Nayak
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 102+ messages in thread
From: Karthik Nayak @ 2025-07-01 15:03 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak
The `ref_iterator` is a structure internal to the 'refs/'
sub-directory, which allows iteration over refs. All reference iteration
is built on top of these iterators.

External clients of the 'refs' subsystem use the various
'refs_for_each...()' functions to iterate over refs. However, since these
are wrapper functions, each combination of functionality requires a new
wrapper function. This does not scale, as the functions pile up when
requirements grow. Expose the internal reference iterator, so
advanced users can mix and match options as needed.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
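Illustration only, not part of the patch: a minimal, self-contained sketch
of how the `do_for_each_ref_flags` bits combine, based solely on the
comments in the enum below. The enum values are copied from this patch;
the helper `toy_should_yield()` and its two boolean inputs are
hypothetical and do not exist in git. The per-worktree and root-ref flags
are deliberately left out of the model.

```c
/* Flag values copied from the enum this patch moves into refs.h. */
enum do_for_each_ref_flags {
	DO_FOR_EACH_INCLUDE_BROKEN = (1 << 0),
	DO_FOR_EACH_PER_WORKTREE_ONLY = (1 << 1),
	DO_FOR_EACH_OMIT_DANGLING_SYMREFS = (1 << 2),
	DO_FOR_EACH_INCLUDE_ROOT_REFS = (1 << 3),
};

/*
 * Hypothetical helper (not a git function): decide whether a ref would
 * be yielded, following the flag comments only.
 *
 *  - is_broken: points to a missing object or has an illegal name
 *  - is_dangling_symref: symref whose target does not exist
 */
static int toy_should_yield(unsigned int flags, int is_broken,
			    int is_dangling_symref)
{
	if (is_dangling_symref) {
		/* Dangling symrefs only show up with INCLUDE_BROKEN ... */
		if (!(flags & DO_FOR_EACH_INCLUDE_BROKEN))
			return 0;
		/* ... and can then be filtered out again. */
		return !(flags & DO_FOR_EACH_OMIT_DANGLING_SYMREFS);
	}
	if (is_broken)
		return !!(flags & DO_FOR_EACH_INCLUDE_BROKEN);
	return 1;
}
```

With the iterator exposed, a caller would pass such an OR-ed flag set
directly to refs_ref_iterator_begin() rather than needing a dedicated
wrapper for each combination.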
 refs.h               | 148 +++++++++++++++++++++++++++++++++++++++++++++++++++
 refs/refs-internal.h | 145 +------------------------------------------------
 2 files changed, 150 insertions(+), 143 deletions(-)
diff --git a/refs.h b/refs.h
index 46a6008e07..c05be6d0ac 100644
--- a/refs.h
+++ b/refs.h
@@ -1190,4 +1190,152 @@ int repo_migrate_ref_storage_format(struct repository *repo,
 				    unsigned int flags,
 				    struct strbuf *err);
 
+/*
+ * Reference iterators
+ *
+ * A reference iterator encapsulates the state of an in-progress
+ * iteration over references. Create an instance of `struct
+ * ref_iterator` via one of the functions in this module.
+ *
+ * A freshly-created ref_iterator doesn't yet point at a reference. To
+ * advance the iterator, call ref_iterator_advance(). If successful,
+ * this sets the iterator's refname, oid, and flags fields to describe
+ * the next reference and returns ITER_OK. The data pointed at by
+ * refname and oid belong to the iterator; if you want to retain them
+ * after calling ref_iterator_advance() again or calling
+ * ref_iterator_free(), you must make a copy. When the iteration has
+ * been exhausted, ref_iterator_advance() releases any resources
+ * associated with the iteration, frees the ref_iterator object, and
+ * returns ITER_DONE. If you want to abort the iteration early, call
+ * ref_iterator_free(), which also frees the ref_iterator object and
+ * any associated resources. If there was an internal error advancing
+ * to the next entry, ref_iterator_advance() aborts the iteration,
+ * frees the ref_iterator, and returns ITER_ERROR.
+ *
+ * The reference currently being looked at can be peeled by calling
+ * ref_iterator_peel(). This function is often faster than peel_ref(),
+ * so it should be preferred when iterating over references.
+ *
+ * Putting it all together, a typical iteration looks like this:
+ *
+ *     int ok;
+ *     struct ref_iterator *iter = ...;
+ *
+ *     while ((ok = ref_iterator_advance(iter)) == ITER_OK) {
+ *             if (want_to_stop_iteration()) {
+ *                     ok = ITER_DONE;
+ *                     break;
+ *             }
+ *
+ *             // Access information about the current reference:
+ *             if (!(iter->flags & REF_ISSYMREF))
+ *                     printf("%s is %s\n", iter->refname, oid_to_hex(iter->oid));
+ *
+ *             // If you need to peel the reference:
+ *             ref_iterator_peel(iter, &oid);
+ *     }
+ *
+ *     if (ok != ITER_DONE)
+ *             handle_error();
+ *     ref_iterator_free(iter);
+ */
+struct ref_iterator;
+
+/*
+ * These flags are passed to refs_ref_iterator_begin() (and do_for_each_ref(),
+ * which feeds it).
+ */
+enum do_for_each_ref_flags {
+	/*
+	 * Include broken references in a do_for_each_ref*() iteration, which
+	 * would normally be omitted. This includes both refs that point to
+	 * missing objects (a true repository corruption), ones with illegal
+	 * names (which we prefer not to expose to callers), as well as
+	 * dangling symbolic refs (i.e., those that point to a non-existent
+	 * ref; this is not a corruption, but as they have no valid oid, we
+	 * omit them from normal iteration results).
+	 */
+	DO_FOR_EACH_INCLUDE_BROKEN = (1 << 0),
+
+	/*
+	 * Only include per-worktree refs in a do_for_each_ref*() iteration.
+	 * Normally this will be used with a files ref_store, since that's
+	 * where all reference backends will presumably store their
+	 * per-worktree refs.
+	 */
+	DO_FOR_EACH_PER_WORKTREE_ONLY = (1 << 1),
+
+	/*
+	 * Omit dangling symrefs from output; this only has an effect with
+	 * INCLUDE_BROKEN, since they are otherwise not included at all.
+	 */
+	DO_FOR_EACH_OMIT_DANGLING_SYMREFS = (1 << 2),
+
+	/*
+	 * Include root refs i.e. HEAD and pseudorefs along with the regular
+	 * refs.
+	 */
+	DO_FOR_EACH_INCLUDE_ROOT_REFS = (1 << 3),
+};
+
+/*
+ * Return an iterator that goes over each reference in `refs` for
+ * which the refname begins with prefix. If trim is non-zero, then
+ * trim that many characters off the beginning of each refname.
+ * The output is ordered by refname.
+ */
+struct ref_iterator *refs_ref_iterator_begin(
+		struct ref_store *refs,
+		const char *prefix, const char **exclude_patterns,
+		int trim, enum do_for_each_ref_flags flags);
+
+/*
+ * Advance the iterator to the first or next item and return ITER_OK.
+ * If the iteration is exhausted, free the resources associated with
+ * the ref_iterator and return ITER_DONE. On errors, free the iterator
+ * resources and return ITER_ERROR. It is a bug to use ref_iterator or
+ * call this function again after it has returned ITER_DONE or
+ * ITER_ERROR.
+ */
+int ref_iterator_advance(struct ref_iterator *ref_iterator);
+
+/*
+ * Seek the iterator to the first reference with the given prefix.
+ * The prefix is matched as a literal string, without regard for path
+ * separators. If prefix is NULL or the empty string, seek the iterator to the
+ * first reference again.
+ *
+ * This function is expected to behave as if a new ref iterator with the same
+ * prefix had been created, but allows reuse of iterators and thus may allow
+ * the backend to optimize. Parameters other than the prefix that have been
+ * passed when creating the iterator will remain unchanged.
+ *
+ * Returns 0 on success, a negative error code otherwise.
+ */
+int ref_iterator_seek(struct ref_iterator *ref_iterator,
+		      const char *prefix);
+
+/*
+ * If possible, peel the reference currently being viewed by the
+ * iterator. Return 0 on success.
+ */
+int ref_iterator_peel(struct ref_iterator *ref_iterator,
+		      struct object_id *peeled);
+
+/* Free the reference iterator and any associated resources. */
+void ref_iterator_free(struct ref_iterator *ref_iterator);
+
+/*
+ * The common backend for the for_each_*ref* functions. Call fn for
+ * each reference in iter. If the iterator itself ever returns
+ * ITER_ERROR, return -1. If fn ever returns a non-zero value, stop
+ * the iteration and return that value. Otherwise, return 0. In any
+ * case, free the iterator when done. This function is basically an
+ * adapter between the callback style of reference iteration and the
+ * iterator style.
+ */
+int do_for_each_ref_iterator(struct ref_iterator *iter,
+			     each_ref_fn fn, void *cb_data);
+
+
 #endif /* REFS_H */
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index f868870851..03f5df04d5 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -244,90 +244,8 @@ const char *find_descendant_ref(const char *dirname,
 #define SYMREF_MAXDEPTH 5
 
 /*
- * These flags are passed to refs_ref_iterator_begin() (and do_for_each_ref(),
- * which feeds it).
- */
-enum do_for_each_ref_flags {
-	/*
-	 * Include broken references in a do_for_each_ref*() iteration, which
-	 * would normally be omitted. This includes both refs that point to
-	 * missing objects (a true repository corruption), ones with illegal
-	 * names (which we prefer not to expose to callers), as well as
-	 * dangling symbolic refs (i.e., those that point to a non-existent
-	 * ref; this is not a corruption, but as they have no valid oid, we
-	 * omit them from normal iteration results).
-	 */
-	DO_FOR_EACH_INCLUDE_BROKEN = (1 << 0),
-
-	/*
-	 * Only include per-worktree refs in a do_for_each_ref*() iteration.
-	 * Normally this will be used with a files ref_store, since that's
-	 * where all reference backends will presumably store their
-	 * per-worktree refs.
-	 */
-	DO_FOR_EACH_PER_WORKTREE_ONLY = (1 << 1),
-
-	/*
-	 * Omit dangling symrefs from output; this only has an effect with
-	 * INCLUDE_BROKEN, since they are otherwise not included at all.
-	 */
-	DO_FOR_EACH_OMIT_DANGLING_SYMREFS = (1 << 2),
-
-	/*
-	 * Include root refs i.e. HEAD and pseudorefs along with the regular
-	 * refs.
-	 */
-	DO_FOR_EACH_INCLUDE_ROOT_REFS = (1 << 3),
-};
-
-/*
- * Reference iterators
- *
- * A reference iterator encapsulates the state of an in-progress
- * iteration over references. Create an instance of `struct
- * ref_iterator` via one of the functions in this module.
- *
- * A freshly-created ref_iterator doesn't yet point at a reference. To
- * advance the iterator, call ref_iterator_advance(). If successful,
- * this sets the iterator's refname, oid, and flags fields to describe
- * the next reference and returns ITER_OK. The data pointed at by
- * refname and oid belong to the iterator; if you want to retain them
- * after calling ref_iterator_advance() again or calling
- * ref_iterator_free(), you must make a copy. When the iteration has
- * been exhausted, ref_iterator_advance() releases any resources
- * associated with the iteration, frees the ref_iterator object, and
- * returns ITER_DONE. If you want to abort the iteration early, call
- * ref_iterator_free(), which also frees the ref_iterator object and
- * any associated resources. If there was an internal error advancing
- * to the next entry, ref_iterator_advance() aborts the iteration,
- * frees the ref_iterator, and returns ITER_ERROR.
- *
- * The reference currently being looked at can be peeled by calling
- * ref_iterator_peel(). This function is often faster than peel_ref(),
- * so it should be preferred when iterating over references.
- *
- * Putting it all together, a typical iteration looks like this:
- *
- *     int ok;
- *     struct ref_iterator *iter = ...;
- *
- *     while ((ok = ref_iterator_advance(iter)) == ITER_OK) {
- *             if (want_to_stop_iteration()) {
- *                     ok = ITER_DONE;
- *                     break;
- *             }
- *
- *             // Access information about the current reference:
- *             if (!(iter->flags & REF_ISSYMREF))
- *                     printf("%s is %s\n", iter->refname, oid_to_hex(iter->oid));
- *
- *             // If you need to peel the reference:
- *             ref_iterator_peel(iter, &oid);
- *     }
- *
- *     if (ok != ITER_DONE)
- *             handle_error();
- *     ref_iterator_free(iter);
+ * Data structure for holding a reference iterator. See refs.h for
+ * more details and usage instructions.
  */
 struct ref_iterator {
 	struct ref_iterator_vtable *vtable;
@@ -337,42 +255,6 @@ struct ref_iterator {
 	unsigned int flags;
 };
 
-/*
- * Advance the iterator to the first or next item and return ITER_OK.
- * If the iteration is exhausted, free the resources associated with
- * the ref_iterator and return ITER_DONE. On errors, free the iterator
- * resources and return ITER_ERROR. It is a bug to use ref_iterator or
- * call this function again after it has returned ITER_DONE or
- * ITER_ERROR.
- */
-int ref_iterator_advance(struct ref_iterator *ref_iterator);
-
-/*
- * Seek the iterator to the first reference with the given prefix.
- * The prefix is matched as a literal string, without regard for path
- * separators. If prefix is NULL or the empty string, seek the iterator to the
- * first reference again.
- *
- * This function is expected to behave as if a new ref iterator with the same
- * prefix had been created, but allows reuse of iterators and thus may allow
- * the backend to optimize. Parameters other than the prefix that have been
- * passed when creating the iterator will remain unchanged.
- *
- * Returns 0 on success, a negative error code otherwise.
- */
-int ref_iterator_seek(struct ref_iterator *ref_iterator,
-		      const char *prefix);
-
-/*
- * If possible, peel the reference currently being viewed by the
- * iterator. Return 0 on success.
- */
-int ref_iterator_peel(struct ref_iterator *ref_iterator,
-		      struct object_id *peeled);
-
-/* Free the reference iterator and any associated resources. */
-void ref_iterator_free(struct ref_iterator *ref_iterator);
-
 /*
  * An iterator over nothing (its first ref_iterator_advance() call
  * returns ITER_DONE).
@@ -384,17 +266,6 @@ struct ref_iterator *empty_ref_iterator_begin(void);
  */
 int is_empty_ref_iterator(struct ref_iterator *ref_iterator);
 
-/*
- * Return an iterator that goes over each reference in `refs` for
- * which the refname begins with prefix. If trim is non-zero, then
- * trim that many characters off the beginning of each refname.
- * The output is ordered by refname.
- */
-struct ref_iterator *refs_ref_iterator_begin(
-		struct ref_store *refs,
-		const char *prefix, const char **exclude_patterns,
-		int trim, enum do_for_each_ref_flags flags);
-
 /*
  * A callback function used to instruct merge_ref_iterator how to
  * interleave the entries from iter0 and iter1. The function should
@@ -520,18 +391,6 @@ struct ref_iterator_vtable {
  */
 extern struct ref_iterator *current_ref_iter;
 
-/*
- * The common backend for the for_each_*ref* functions. Call fn for
- * each reference in iter. If the iterator itself ever returns
- * ITER_ERROR, return -1. If fn ever returns a non-zero value, stop
- * the iteration and return that value. Otherwise, return 0. In any
- * case, free the iterator when done. This function is basically an
- * adapter between the callback style of reference iteration and the
- * iterator style.
- */
-int do_for_each_ref_iterator(struct ref_iterator *iter,
-			     each_ref_fn fn, void *cb_data);
-
 struct ref_store;
 
 /* refs backends */
-- 
2.49.0
^ permalink raw reply related	[flat|nested] 102+ messages in thread
* [PATCH 2/4] ref-cache: remove unused function 'find_ref_entry()'
  2025-07-01 15:03 [PATCH 0/4] for-each-ref: introduce seeking functionality via '--skip-until' Karthik Nayak
  2025-07-01 15:03 ` [PATCH 1/4] refs: expose `ref_iterator` via 'refs.h' Karthik Nayak
@ 2025-07-01 15:03 ` Karthik Nayak
  2025-07-14 15:46   ` Junio C Hamano
  2025-07-01 15:03 ` [PATCH 3/4] refs: selectively set prefix in the seek functions Karthik Nayak
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 102+ messages in thread
From: Karthik Nayak @ 2025-07-01 15:03 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak
The 'find_ref_entry' function is no longer used, so remove it.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
 refs/ref-cache.c | 14 --------------
 refs/ref-cache.h |  7 -------
 2 files changed, 21 deletions(-)
diff --git a/refs/ref-cache.c b/refs/ref-cache.c
index c1f1bab1d5..8aaffa8c6b 100644
--- a/refs/ref-cache.c
+++ b/refs/ref-cache.c
@@ -194,20 +194,6 @@ static struct ref_dir *find_containing_dir(struct ref_dir *dir,
 	return dir;
 }
 
-struct ref_entry *find_ref_entry(struct ref_dir *dir, const char *refname)
-{
-	int entry_index;
-	struct ref_entry *entry;
-	dir = find_containing_dir(dir, refname);
-	if (!dir)
-		return NULL;
-	entry_index = search_ref_dir(dir, refname, strlen(refname));
-	if (entry_index == -1)
-		return NULL;
-	entry = dir->entries[entry_index];
-	return (entry->flag & REF_DIR) ? NULL : entry;
-}
-
 /*
  * Emit a warning and return true iff ref1 and ref2 have the same name
  * and the same oid. Die if they have the same name but different
diff --git a/refs/ref-cache.h b/refs/ref-cache.h
index 5f04e518c3..f635d2d824 100644
--- a/refs/ref-cache.h
+++ b/refs/ref-cache.h
@@ -201,13 +201,6 @@ void free_ref_cache(struct ref_cache *cache);
  */
 void add_entry_to_dir(struct ref_dir *dir, struct ref_entry *entry);
 
-/*
- * Find the value entry with the given name in dir, sorting ref_dirs
- * and recursing into subdirectories as necessary.  If the name is not
- * found or it corresponds to a directory entry, return NULL.
- */
-struct ref_entry *find_ref_entry(struct ref_dir *dir, const char *refname);
-
 /*
  * Start iterating over references in `cache`. If `prefix` is
  * specified, only include references whose names start with that
-- 
2.49.0
^ permalink raw reply related	[flat|nested] 102+ messages in thread
* Re: [PATCH 2/4] ref-cache: remove unused function 'find_ref_entry()'
  2025-07-01 15:03 ` [PATCH 2/4] ref-cache: remove unused function 'find_ref_entry()' Karthik Nayak
@ 2025-07-14 15:46   ` Junio C Hamano
  0 siblings, 0 replies; 102+ messages in thread
From: Junio C Hamano @ 2025-07-14 15:46 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: git
Karthik Nayak <karthik.188@gmail.com> writes:
> The 'find_ref_entry' function is no longer used, so remove it.
It seems that ba1c052f (ref_store: implement `refs_peel_ref()`
generically, 2017-09-25) removed the last caller of it.  This is
long overdue ;-)
> Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
> ---
>  refs/ref-cache.c | 14 --------------
>  refs/ref-cache.h |  7 -------
>  2 files changed, 21 deletions(-)
>
> diff --git a/refs/ref-cache.c b/refs/ref-cache.c
> index c1f1bab1d5..8aaffa8c6b 100644
> --- a/refs/ref-cache.c
> +++ b/refs/ref-cache.c
> @@ -194,20 +194,6 @@ static struct ref_dir *find_containing_dir(struct ref_dir *dir,
>  	return dir;
>  }
>  
> -struct ref_entry *find_ref_entry(struct ref_dir *dir, const char *refname)
> -{
> -	int entry_index;
> -	struct ref_entry *entry;
> -	dir = find_containing_dir(dir, refname);
> -	if (!dir)
> -		return NULL;
> -	entry_index = search_ref_dir(dir, refname, strlen(refname));
> -	if (entry_index == -1)
> -		return NULL;
> -	entry = dir->entries[entry_index];
> -	return (entry->flag & REF_DIR) ? NULL : entry;
> -}
> -
>  /*
>   * Emit a warning and return true iff ref1 and ref2 have the same name
>   * and the same oid. Die if they have the same name but different
> diff --git a/refs/ref-cache.h b/refs/ref-cache.h
> index 5f04e518c3..f635d2d824 100644
> --- a/refs/ref-cache.h
> +++ b/refs/ref-cache.h
> @@ -201,13 +201,6 @@ void free_ref_cache(struct ref_cache *cache);
>   */
>  void add_entry_to_dir(struct ref_dir *dir, struct ref_entry *entry);
>  
> -/*
> - * Find the value entry with the given name in dir, sorting ref_dirs
> - * and recursing into subdirectories as necessary.  If the name is not
> - * found or it corresponds to a directory entry, return NULL.
> - */
> -struct ref_entry *find_ref_entry(struct ref_dir *dir, const char *refname);
> -
>  /*
>   * Start iterating over references in `cache`. If `prefix` is
>   * specified, only include references whose names start with that
^ permalink raw reply	[flat|nested] 102+ messages in thread
* [PATCH 3/4] refs: selectively set prefix in the seek functions
  2025-07-01 15:03 [PATCH 0/4] for-each-ref: introduce seeking functionality via '--skip-until' Karthik Nayak
  2025-07-01 15:03 ` [PATCH 1/4] refs: expose `ref_iterator` via 'refs.h' Karthik Nayak
  2025-07-01 15:03 ` [PATCH 2/4] ref-cache: remove unused function 'find_ref_entry()' Karthik Nayak
@ 2025-07-01 15:03 ` Karthik Nayak
  2025-07-03  5:55   ` Patrick Steinhardt
  2025-07-01 15:03 ` [PATCH 4/4] for-each-ref: introduce a '--skip-until' option Karthik Nayak
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 102+ messages in thread
From: Karthik Nayak @ 2025-07-01 15:03 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak
The ref iterator exposes a `ref_iterator_seek()` function. The name
suggests that this would seek the iterator to a specific reference,
similar to how `fseek()` works for files.

However, the function actually sets the prefix for refs iteration, so
further iteration only yields references which match that prefix. This
is a bit confusing.

Let's add a 'set_prefix' parameter to the function. When set, it sets
the prefix for the iteration, in line with the existing behavior. When
it is not set, the reference backends simply seek to the specified
reference without setting the prefix. This allows users to start
iteration from a specific reference.

In the packed and reftable backends, since references are available in a
sorted list, the changes are simply setting the prefix if needed. The
changes on the files-backend are a little more involved, since the files
backend uses the 'ref-cache' mechanism. We move out the existing logic
within `cache_ref_iterator_seek()` to `cache_ref_iterator_set_prefix()`
which is called when `set_prefix` is set. We then parse the provided
seek string and set the required levels and their indexes to ensure that
seeking is possible.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
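Illustration only, not part of the patch: a toy model of the two seek
modes described above, using a sorted array of refnames in place of a
real backend. `struct toy_iter`, `toy_seek()` and `toy_advance()` are
hypothetical names invented for this sketch; only the `seek` and
`set_prefix` parameter names mirror the patch. It assumes the ref list is
sorted and that the caller keeps the seek string alive, since the sketch
stores the pointer instead of copying it.

```c
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a backend iterator over a sorted list of refnames. */
struct toy_iter {
	const char **refs;	/* sorted refnames */
	size_t nr;		/* number of entries in refs */
	size_t pos;		/* index of the next ref to yield */
	const char *prefix;	/* NULL means "no prefix filter" */
};

/*
 * Position the iterator at the first ref that sorts at or after `seek`.
 * Only when `set_prefix` is non-zero is the prefix filter replaced;
 * otherwise the existing filter (if any) is left untouched.
 */
static int toy_seek(struct toy_iter *it, const char *seek, int set_prefix)
{
	size_t i = 0;

	if (seek && *seek)
		while (i < it->nr && strcmp(it->refs[i], seek) < 0)
			i++;
	it->pos = i;

	if (set_prefix)
		it->prefix = (seek && *seek) ? seek : NULL;
	return 0;
}

/* Return the next refname that passes the prefix filter, or NULL. */
static const char *toy_advance(struct toy_iter *it)
{
	while (it->pos < it->nr) {
		const char *name = it->refs[it->pos++];
		int cmp;

		if (!it->prefix)
			return name;
		cmp = strncmp(name, it->prefix, strlen(it->prefix));
		if (!cmp)
			return name;
		if (cmp > 0)
			break;	/* sorted input: we are past the prefix range */
	}
	it->pos = it->nr;
	return NULL;
}
```

In this model, seeking to "refs/heads/b" with set_prefix=1 yields only
refs starting with that string, whereas set_prefix=0 on an unfiltered
iterator yields "refs/heads/b" and every ref that sorts after it.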
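 refs.c                  |  2 +-
 refs.h                  | 19 +++++++-----
 refs/debug.c            |  7 +++--
 refs/files-backend.c    |  7 +++--
 refs/iterator.c         | 24 +++++++++------
 refs/packed-backend.c   | 15 +++++----
 refs/ref-cache.c        | 81 ++++++++++++++++++++++++++++++++++++++++++++++---
 refs/refs-internal.h    |  7 +++--
 refs/reftable-backend.c | 17 ++++++-----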
 9 files changed, 134 insertions(+), 45 deletions(-)
diff --git a/refs.c b/refs.c
index dce5c49ca2..a4220d3537 100644
--- a/refs.c
+++ b/refs.c
@@ -2669,7 +2669,7 @@ enum ref_transaction_error refs_verify_refnames_available(struct ref_store *refs
 			if (!iter) {
 				iter = refs_ref_iterator_begin(refs, dirname.buf, NULL, 0,
 							       DO_FOR_EACH_INCLUDE_BROKEN);
-			} else if (ref_iterator_seek(iter, dirname.buf) < 0) {
+			} else if (ref_iterator_seek(iter, dirname.buf, 1) < 0) {
 				goto cleanup;
 			}
 
diff --git a/refs.h b/refs.h
index c05be6d0ac..c5e08db0ff 100644
--- a/refs.h
+++ b/refs.h
@@ -1300,20 +1300,25 @@ struct ref_iterator *refs_ref_iterator_begin(
 int ref_iterator_advance(struct ref_iterator *ref_iterator);
 
 /*
- * Seek the iterator to the first reference with the given prefix.
- * The prefix is matched as a literal string, without regard for path
+ * Seek the iterator to the first reference matching the given seek string.
+ * The seek string is matched as a literal string, without regard for path
  * separators. If prefix is NULL or the empty string, seek the iterator to the
  * first reference again.
  *
- * This function is expected to behave as if a new ref iterator with the same
- * prefix had been created, but allows reuse of iterators and thus may allow
- * the backend to optimize. Parameters other than the prefix that have been
- * passed when creating the iterator will remain unchanged.
+ * When set_prefix is true, this function behaves as if a new ref iterator
+ * with the same prefix had been created, setting the prefix for subsequent
+ * iteration. When set_prefix is false, the iterator simply seeks to the
+ * specified reference without changing the existing prefix, allowing
+ * iteration to start from that specific reference.
+ *
+ * This function allows reuse of iterators and thus may allow the backend
+ * to optimize. Parameters other than the prefix that have been passed when
+ * creating the iterator will remain unchanged.
  *
  * Returns 0 on success, a negative error code otherwise.
  */
 int ref_iterator_seek(struct ref_iterator *ref_iterator,
-		      const char *prefix);
+		      const char *seek, int set_prefix);
 
 /*
  * If possible, peel the reference currently being viewed by the
diff --git a/refs/debug.c b/refs/debug.c
index 485e3079d7..7c04bcba10 100644
--- a/refs/debug.c
+++ b/refs/debug.c
@@ -170,12 +170,13 @@ static int debug_ref_iterator_advance(struct ref_iterator *ref_iterator)
 }
 
 static int debug_ref_iterator_seek(struct ref_iterator *ref_iterator,
-				   const char *prefix)
+				   const char *seek, int set_prefix)
 {
 	struct debug_ref_iterator *diter =
 		(struct debug_ref_iterator *)ref_iterator;
-	int res = diter->iter->vtable->seek(diter->iter, prefix);
-	trace_printf_key(&trace_refs, "iterator_seek: %s: %d\n", prefix ? prefix : "", res);
+	int res = diter->iter->vtable->seek(diter->iter, seek, set_prefix);
+	trace_printf_key(&trace_refs, "iterator_seek: %s set_prefix: %d: %d\n",
+			 seek ? seek : "", set_prefix, res);
 	return res;
 }
 
diff --git a/refs/files-backend.c b/refs/files-backend.c
index bf6f89b1d1..827b15981c 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -929,11 +929,11 @@ static int files_ref_iterator_advance(struct ref_iterator *ref_iterator)
 }
 
 static int files_ref_iterator_seek(struct ref_iterator *ref_iterator,
-				   const char *prefix)
+				   const char *seek, int set_prefix)
 {
 	struct files_ref_iterator *iter =
 		(struct files_ref_iterator *)ref_iterator;
-	return ref_iterator_seek(iter->iter0, prefix);
+	return ref_iterator_seek(iter->iter0, seek, set_prefix);
 }
 
 static int files_ref_iterator_peel(struct ref_iterator *ref_iterator,
@@ -2316,7 +2316,8 @@ static int files_reflog_iterator_advance(struct ref_iterator *ref_iterator)
 }
 
 static int files_reflog_iterator_seek(struct ref_iterator *ref_iterator UNUSED,
-				      const char *prefix UNUSED)
+				      const char *seek UNUSED,
+				      int set_prefix UNUSED)
 {
 	BUG("ref_iterator_seek() called for reflog_iterator");
 }
diff --git a/refs/iterator.c b/refs/iterator.c
index 766d96e795..1f99045d40 100644
--- a/refs/iterator.c
+++ b/refs/iterator.c
@@ -16,9 +16,9 @@ int ref_iterator_advance(struct ref_iterator *ref_iterator)
 }
 
 int ref_iterator_seek(struct ref_iterator *ref_iterator,
-		      const char *prefix)
+		      const char *seek, int set_prefix)
 {
-	return ref_iterator->vtable->seek(ref_iterator, prefix);
+	return ref_iterator->vtable->seek(ref_iterator, seek, set_prefix);
 }
 
 int ref_iterator_peel(struct ref_iterator *ref_iterator,
@@ -57,7 +57,8 @@ static int empty_ref_iterator_advance(struct ref_iterator *ref_iterator UNUSED)
 }
 
 static int empty_ref_iterator_seek(struct ref_iterator *ref_iterator UNUSED,
-				   const char *prefix UNUSED)
+				   const char *seek UNUSED,
+				   int set_prefix UNUSED)
 {
 	return 0;
 }
@@ -224,7 +225,7 @@ static int merge_ref_iterator_advance(struct ref_iterator *ref_iterator)
 }
 
 static int merge_ref_iterator_seek(struct ref_iterator *ref_iterator,
-				   const char *prefix)
+				   const char *seek, int set_prefix)
 {
 	struct merge_ref_iterator *iter =
 		(struct merge_ref_iterator *)ref_iterator;
@@ -234,11 +235,11 @@ static int merge_ref_iterator_seek(struct ref_iterator *ref_iterator,
 	iter->iter0 = iter->iter0_owned;
 	iter->iter1 = iter->iter1_owned;
 
-	ret = ref_iterator_seek(iter->iter0, prefix);
+	ret = ref_iterator_seek(iter->iter0, seek, set_prefix);
 	if (ret < 0)
 		return ret;
 
-	ret = ref_iterator_seek(iter->iter1, prefix);
+	ret = ref_iterator_seek(iter->iter1, seek, set_prefix);
 	if (ret < 0)
 		return ret;
 
@@ -407,13 +408,16 @@ static int prefix_ref_iterator_advance(struct ref_iterator *ref_iterator)
 }
 
 static int prefix_ref_iterator_seek(struct ref_iterator *ref_iterator,
-				    const char *prefix)
+				    const char *seek, int set_prefix)
 {
 	struct prefix_ref_iterator *iter =
 		(struct prefix_ref_iterator *)ref_iterator;
-	free(iter->prefix);
-	iter->prefix = xstrdup_or_null(prefix);
-	return ref_iterator_seek(iter->iter0, prefix);
+
+	if (set_prefix) {
+		free(iter->prefix);
+		iter->prefix = xstrdup_or_null(seek);
+	}
+	return ref_iterator_seek(iter->iter0, seek, set_prefix);
 }
 
 static int prefix_ref_iterator_peel(struct ref_iterator *ref_iterator,
diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 7fd73a0e6d..dca886c5cc 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -1004,19 +1004,22 @@ static int packed_ref_iterator_advance(struct ref_iterator *ref_iterator)
 }
 
 static int packed_ref_iterator_seek(struct ref_iterator *ref_iterator,
-				    const char *prefix)
+				    const char *seek, int set_prefix)
 {
 	struct packed_ref_iterator *iter =
 		(struct packed_ref_iterator *)ref_iterator;
 	const char *start;
 
-	if (prefix && *prefix)
-		start = find_reference_location(iter->snapshot, prefix, 0);
+	if (seek && *seek)
+		start = find_reference_location(iter->snapshot, seek, 0);
 	else
 		start = iter->snapshot->start;
 
-	free(iter->prefix);
-	iter->prefix = xstrdup_or_null(prefix);
+	if (set_prefix) {
+		free(iter->prefix);
+		iter->prefix = xstrdup_or_null(seek);
+	}
+
 	iter->pos = start;
 	iter->eof = iter->snapshot->eof;
 
@@ -1194,7 +1197,7 @@ static struct ref_iterator *packed_ref_iterator_begin(
 	iter->repo = ref_store->repo;
 	iter->flags = flags;
 
-	if (packed_ref_iterator_seek(&iter->base, prefix) < 0) {
+	if (packed_ref_iterator_seek(&iter->base, prefix, 1) < 0) {
 		ref_iterator_free(&iter->base);
 		return NULL;
 	}
diff --git a/refs/ref-cache.c b/refs/ref-cache.c
index 8aaffa8c6b..656e6cd9ff 100644
--- a/refs/ref-cache.c
+++ b/refs/ref-cache.c
@@ -434,11 +434,9 @@ static int cache_ref_iterator_advance(struct ref_iterator *ref_iterator)
 	}
 }
 
-static int cache_ref_iterator_seek(struct ref_iterator *ref_iterator,
-				   const char *prefix)
+static int cache_ref_iterator_set_prefix(struct cache_ref_iterator *iter,
+					 const char *prefix)
 {
-	struct cache_ref_iterator *iter =
-		(struct cache_ref_iterator *)ref_iterator;
 	struct cache_ref_iterator_level *level;
 	struct ref_dir *dir;
 
@@ -469,6 +467,79 @@ static int cache_ref_iterator_seek(struct ref_iterator *ref_iterator,
 	return 0;
 }
 
+static int cache_ref_iterator_seek(struct ref_iterator *ref_iterator,
+				   const char *seek, int set_prefix)
+{
+	struct cache_ref_iterator *iter =
+		(struct cache_ref_iterator *)ref_iterator;
+
+	if (set_prefix) {
+		return cache_ref_iterator_set_prefix(iter, seek);
+	} else if (seek && *seek) {
+		struct cache_ref_iterator_level *level;
+		const char *slash = seek;
+		struct ref_dir *dir;
+
+		dir = get_ref_dir(iter->cache->root);
+
+		if (iter->prime_dir)
+			prime_ref_dir(dir, seek);
+
+		iter->levels_nr = 1;
+		level = &iter->levels[0];
+		level->index = -1;
+		level->dir = dir;
+
+		/*
+		 * Breakdown the provided seek path and assign the correct
+		 * indexing to each level as needed.
+		 */
+		do {
+			int len, idx;
+			int cmp = 0;
+
+			sort_ref_dir(dir);
+
+			slash = strchr(slash, '/');
+			len = slash ? slash - seek : (int)strlen(seek);
+
+			for (idx = 0; idx < dir->nr; idx++) {
+				cmp = strncmp(seek, dir->entries[idx]->name, len);
+				if (cmp <= 0)
+					break;
+			}
+			/* don't overflow the index */
+			idx = idx >= dir->nr ? dir->nr - 1 : idx;
+
+			if (slash)
+				slash = slash + 1;
+
+			level->index = idx;
+			if (dir->entries[idx]->flag & REF_DIR) {
+				/* push down a level */
+				dir = get_ref_dir(dir->entries[idx]);
+
+				ALLOC_GROW(iter->levels, iter->levels_nr + 1,
+					   iter->levels_alloc);
+				level = &iter->levels[iter->levels_nr++];
+				level->dir = dir;
+				level->index = -1;
+			} else {
+				/* reduce the index so the leaf node is iterated over */
+				if (cmp <= 0 && !slash)
+					level->index = idx - 1;
+				/*
+				 * while the seek path may not be exhausted, our
+				 * match is exhausted at a leaf node.
+				 */
+				break;
+			}
+		} while (slash);
+	}
+
+	return 0;
+}
+
 static int cache_ref_iterator_peel(struct ref_iterator *ref_iterator,
 				   struct object_id *peeled)
 {
@@ -509,7 +580,7 @@ struct ref_iterator *cache_ref_iterator_begin(struct ref_cache *cache,
 	iter->cache = cache;
 	iter->prime_dir = prime_dir;
 
-	if (cache_ref_iterator_seek(&iter->base, prefix) < 0) {
+	if (cache_ref_iterator_seek(&iter->base, prefix, 1) < 0) {
 		ref_iterator_free(&iter->base);
 		return NULL;
 	}
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index 03f5df04d5..cee377696c 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -353,11 +353,12 @@ void base_ref_iterator_init(struct ref_iterator *iter,
 typedef int ref_iterator_advance_fn(struct ref_iterator *ref_iterator);
 
 /*
- * Seek the iterator to the first reference matching the given prefix. Should
- * behave the same as if a new iterator was created with the same prefix.
+ * Seek the iterator to the first matching reference. If set_prefix is set,
+ * it would behave the same as if a new iterator was created with the same
+ * prefix.
  */
 typedef int ref_iterator_seek_fn(struct ref_iterator *ref_iterator,
-				 const char *prefix);
+				 const char *seek, int set_prefix);
 
 /*
  * Peels the current ref, returning 0 for success or -1 for failure.
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index 4c3817f4ec..81fb6a9028 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -719,15 +719,17 @@ static int reftable_ref_iterator_advance(struct ref_iterator *ref_iterator)
 }
 
 static int reftable_ref_iterator_seek(struct ref_iterator *ref_iterator,
-				      const char *prefix)
+				      const char *seek, int set_prefix)
 {
 	struct reftable_ref_iterator *iter =
 		(struct reftable_ref_iterator *)ref_iterator;
 
-	free(iter->prefix);
-	iter->prefix = xstrdup_or_null(prefix);
-	iter->prefix_len = prefix ? strlen(prefix) : 0;
-	iter->err = reftable_iterator_seek_ref(&iter->iter, prefix);
+	if (set_prefix) {
+		free(iter->prefix);
+		iter->prefix = xstrdup_or_null(seek);
+		iter->prefix_len = seek ? strlen(seek) : 0;
+	}
+	iter->err = reftable_iterator_seek_ref(&iter->iter, seek);
 
 	return iter->err;
 }
@@ -839,7 +841,7 @@ static struct reftable_ref_iterator *ref_iterator_for_stack(struct reftable_ref_
 	if (ret)
 		goto done;
 
-	ret = reftable_ref_iterator_seek(&iter->base, prefix);
+	ret = reftable_ref_iterator_seek(&iter->base, prefix, 1);
 	if (ret)
 		goto done;
 
@@ -2042,7 +2044,8 @@ static int reftable_reflog_iterator_advance(struct ref_iterator *ref_iterator)
 }
 
 static int reftable_reflog_iterator_seek(struct ref_iterator *ref_iterator UNUSED,
-					 const char *prefix UNUSED)
+					 const char *seek UNUSED,
+					 int set_prefix UNUSED)
 {
 	BUG("reftable reflog iterator cannot be seeked");
 	return -1;
-- 
2.49.0
^ permalink raw reply related	[flat|nested] 102+ messages in thread
* Re: [PATCH 3/4] refs: selectively set prefix in the seek functions
  2025-07-01 15:03 ` [PATCH 3/4] refs: selectively set prefix in the seek functions Karthik Nayak
@ 2025-07-03  5:55   ` Patrick Steinhardt
  2025-07-03  9:40     ` Karthik Nayak
  0 siblings, 1 reply; 102+ messages in thread
From: Patrick Steinhardt @ 2025-07-03  5:55 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: git
On Tue, Jul 01, 2025 at 05:03:29PM +0200, Karthik Nayak wrote:
> The ref iterator exposes a `ref_iterator_seek()` function. The name
> suggests that this would seek the iterator to a specific reference in
> some ways similar to how `fseek()` works for the filesystem.
> 
> However, the function actually sets the prefix for refs iteration. So
> further iteration would only yield references which match the particular
> prefix. This is a bit confusing.
> 
> Let's add a 'set_prefix' field to the function, which when set, will set
> the prefix for the iteration in-line with the existing behavior. But
> when the 'set_prefix' field is not set, the reference backends will
> simply seek to the specified reference without setting prefix. This
> allows users to start iteration from a specific reference.
> 
> In the packed and reftable backend, since references are available in a
> sorted list, the changes are simply setting the prefix if needed. The
> changes on the files-backend are a little more involved, since the files
> backend uses the 'ref-cache' mechanism. We move out the existing logic
> within `cache_ref_iterator_seek()` to `cache_ref_iterator_set_prefix()`
> which is called when `set_prefix` is set. We then parse the provided
> seek string and set the required levels and their indexes to ensure that
> seeking is possible.
That solution makes sense.
> diff --git a/refs.c b/refs.c
> index dce5c49ca2..a4220d3537 100644
> --- a/refs.c
> +++ b/refs.c
> @@ -2669,7 +2669,7 @@ enum ref_transaction_error refs_verify_refnames_available(struct ref_store *refs
>  			if (!iter) {
>  				iter = refs_ref_iterator_begin(refs, dirname.buf, NULL, 0,
>  							       DO_FOR_EACH_INCLUDE_BROKEN);
> -			} else if (ref_iterator_seek(iter, dirname.buf) < 0) {
> +			} else if (ref_iterator_seek(iter, dirname.buf, 1) < 0) {
>  				goto cleanup;
>  			}
>  
This is quite unreadable, as you have no idea what `1` could mean. Let's
make this an `unsigned flags` variable instead so that we can provide
meaningful names.
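To illustrate the suggestion, here is a minimal sketch of a flags-based
seek signature. It is not code from this series: `toy_iter`,
`toy_seek()` and `TOY_SEEK_SET_PREFIX` are hypothetical names chosen for
the example, and the iterator state is reduced to two pointers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag name, for illustration only. */
enum toy_seek_flag {
	/* Also replace the iterator's prefix with the seek string. */
	TOY_SEEK_SET_PREFIX = (1 << 0),
};

/* Toy stand-in for an iterator; real iterators carry much more state. */
struct toy_iter {
	const char *prefix; /* only refs with this prefix are yielded */
	const char *start;  /* string the iterator was last seeked to */
};

static int toy_seek(struct toy_iter *iter, const char *seek,
		    unsigned int flags)
{
	assert(iter);

	if (flags & TOY_SEEK_SET_PREFIX)
		iter->prefix = seek;
	iter->start = seek;

	return 0;
}
```

A call site then reads `toy_seek(iter, dirname, TOY_SEEK_SET_PREFIX)`
rather than `toy_seek(iter, dirname, 1)`, and further flags can be added
later without touching existing callers.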
> diff --git a/refs.h b/refs.h
> index c05be6d0ac..c5e08db0ff 100644
> --- a/refs.h
> +++ b/refs.h
> @@ -1300,20 +1300,25 @@ struct ref_iterator *refs_ref_iterator_begin(
>  int ref_iterator_advance(struct ref_iterator *ref_iterator);
>  
>  /*
> - * Seek the iterator to the first reference with the given prefix.
> - * The prefix is matched as a literal string, without regard for path
> + * Seek the iterator to the first reference matching the given seek string.
> + * The seek string is matched as a literal string, without regard for path
>   * separators. If prefix is NULL or the empty string, seek the iterator to the
>   * first reference again.
>   *
> - * This function is expected to behave as if a new ref iterator with the same
> - * prefix had been created, but allows reuse of iterators and thus may allow
> - * the backend to optimize. Parameters other than the prefix that have been
> - * passed when creating the iterator will remain unchanged.
> + * When set_prefix is true, this function behaves as if a new ref iterator
> + * with the same prefix had been created, setting the prefix for subsequent
> + * iteration. When set_prefix is false, the iterator simply seeks to the
> + * specified reference without changing the existing prefix, allowing
> + * iteration to start from that specific reference.
I think we should detangle this paragraph a bit.
    This function is expected to behave as if a new ref iterator has
    been created, but allows reuse of it
^ permalink raw reply	[flat|nested] 102+ messages in thread
* Re: [PATCH 3/4] refs: selectively set prefix in the seek functions
  2025-07-03  5:55   ` Patrick Steinhardt
@ 2025-07-03  9:40     ` Karthik Nayak
  0 siblings, 0 replies; 102+ messages in thread
From: Karthik Nayak @ 2025-07-03  9:40 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git
Patrick Steinhardt <ps@pks.im> writes:
> On Tue, Jul 01, 2025 at 05:03:29PM +0200, Karthik Nayak wrote:
>> The ref iterator exposes a `ref_iterator_seek()` function. The name
>> suggests that this would seek the iterator to a specific reference in
>> some ways similar to how `fseek()` works for the filesystem.
>>
>> However, the function actually sets the prefix for refs iteration. So
>> further iteration would only yield references which match the particular
>> prefix. This is a bit confusing.
>>
>> Let's add a 'set_prefix' field to the function, which when set, will set
>> the prefix for the iteration in-line with the existing behavior. But
>> when the 'set_prefix' field is not set, the reference backends will
>> simply seek to the specified reference without setting prefix. This
>> allows users to start iteration from a specific reference.
>>
>> In the packed and reftable backend, since references are available in a
>> sorted list, the changes are simply setting the prefix if needed. The
>> changes on the files-backend are a little more involved, since the files
>> backend uses the 'ref-cache' mechanism. We move out the existing logic
>> within `cache_ref_iterator_seek()` to `cache_ref_iterator_set_prefix()`
>> which is called when `set_prefix` is set. We then parse the provided
>> seek string and set the required levels and their indexes to ensure that
>> seeking is possible.
>
> That solution makes sense.
>
>> diff --git a/refs.c b/refs.c
>> index dce5c49ca2..a4220d3537 100644
>> --- a/refs.c
>> +++ b/refs.c
>> @@ -2669,7 +2669,7 @@ enum ref_transaction_error refs_verify_refnames_available(struct ref_store *refs
>>  			if (!iter) {
>>  				iter = refs_ref_iterator_begin(refs, dirname.buf, NULL, 0,
>>  							       DO_FOR_EACH_INCLUDE_BROKEN);
>> -			} else if (ref_iterator_seek(iter, dirname.buf) < 0) {
>> +			} else if (ref_iterator_seek(iter, dirname.buf, 1) < 0) {
>>  				goto cleanup;
>>  			}
>>
>
> This is quite unreadable, as you have no idea what `1` could mean. Let's
> make this an `unsigned flags` variable instead so that we can provide
> meaningful names.
>
Yeah, that would make a lot more sense. Will amend.
>> diff --git a/refs.h b/refs.h
>> index c05be6d0ac..c5e08db0ff 100644
>> --- a/refs.h
>> +++ b/refs.h
>> @@ -1300,20 +1300,25 @@ struct ref_iterator *refs_ref_iterator_begin(
>>  int ref_iterator_advance(struct ref_iterator *ref_iterator);
>>
>>  /*
>> - * Seek the iterator to the first reference with the given prefix.
>> - * The prefix is matched as a literal string, without regard for path
>> + * Seek the iterator to the first reference matching the given seek string.
>> + * The seek string is matched as a literal string, without regard for path
>>   * separators. If prefix is NULL or the empty string, seek the iterator to the
>>   * first reference again.
>>   *
>> - * This function is expected to behave as if a new ref iterator with the same
>> - * prefix had been created, but allows reuse of iterators and thus may allow
>> - * the backend to optimize. Parameters other than the prefix that have been
>> - * passed when creating the iterator will remain unchanged.
>> + * When set_prefix is true, this function behaves as if a new ref iterator
>> + * with the same prefix had been created, setting the prefix for subsequent
>> + * iteration. When set_prefix is false, the iterator simply seeks to the
>> + * specified reference without changing the existing prefix, allowing
>> + * iteration to start from that specific reference.
>
> I think we should detangle this paragraph a bit.
>
>     This function is expected to behave as if a new ref iterator has
>     been created, but allows reuse of it
Sure, let me add this in. Thanks!
^ permalink raw reply	[flat|nested] 102+ messages in thread
* [PATCH 4/4] for-each-ref: introduce a '--skip-until' option
  2025-07-01 15:03 [PATCH 0/4] for-each-ref: introduce seeking functionality via '--skip-until' Karthik Nayak
                   ` (2 preceding siblings ...)
  2025-07-01 15:03 ` [PATCH 3/4] refs: selectively set prefix in the seek functions Karthik Nayak
@ 2025-07-01 15:03 ` Karthik Nayak
  2025-07-03  5:55   ` Patrick Steinhardt
  2025-07-01 17:08 ` [PATCH 0/4] for-each-ref: introduce seeking functionality via '--skip-until' Junio C Hamano
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 102+ messages in thread
From: Karthik Nayak @ 2025-07-01 15:03 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak
The `git-for-each-ref(1)` command is used to iterate over references
present in a repository. In large repositories with millions of
references, it would be optimal to paginate this output such that we
can start iteration from a given reference. This would avoid having to
iterate over all references from the beginning each time when paginating
through results.
The previous commit added 'seek' functionality to the reference
backends. Utilize this and expose a '--skip-until' option in
'git-for-each-ref(1)'. When used, the reference iteration seeks to the
first matching reference and iterates from there onward.
This enables efficient pagination workflows like:
    git for-each-ref --count=100
    git for-each-ref --count=100 --skip-until=refs/heads/branch-100
    git for-each-ref --count=100 --skip-until=refs/heads/branch-200
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
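 Documentation/git-for-each-ref.adoc |   6 +-
 builtin/for-each-ref.c              |   5 +
 ref-filter.c                        |  57 ++++++++----
 ref-filter.h                        |   1 +
 t/t6302-for-each-ref-filter.sh      | 180 ++++++++++++++++++++++++++++++++++++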
 5 files changed, 230 insertions(+), 19 deletions(-)
diff --git a/Documentation/git-for-each-ref.adoc b/Documentation/git-for-each-ref.adoc
index 5ef89fc0fe..4bf7c66b8c 100644
--- a/Documentation/git-for-each-ref.adoc
+++ b/Documentation/git-for-each-ref.adoc
@@ -14,7 +14,7 @@ SYNOPSIS
 		   [--points-at=<object>]
 		   [--merged[=<object>]] [--no-merged[=<object>]]
 		   [--contains[=<object>]] [--no-contains[=<object>]]
-		   [--exclude=<pattern> ...]
+		   [--exclude=<pattern> ...] [--skip-until=<pattern>]
 
 DESCRIPTION
 -----------
@@ -108,6 +108,10 @@ TAB %(refname)`.
 --include-root-refs::
 	List root refs (HEAD and pseudorefs) apart from regular refs.
 
+--skip-until::
+    Skip references up to the specified pattern. Cannot be used with
+    general pattern matching.
+
 FIELD NAMES
 -----------
 
diff --git a/builtin/for-each-ref.c b/builtin/for-each-ref.c
index 3d2207ec77..543013cd11 100644
--- a/builtin/for-each-ref.c
+++ b/builtin/for-each-ref.c
@@ -13,6 +13,7 @@ static char const * const for_each_ref_usage[] = {
 	N_("git for-each-ref [--points-at <object>]"),
 	N_("git for-each-ref [--merged [<commit>]] [--no-merged [<commit>]]"),
 	N_("git for-each-ref [--contains [<commit>]] [--no-contains [<commit>]]"),
+	N_("git for-each-ref [--skip-until <pattern>]"),
 	NULL
 };
 
@@ -44,6 +45,7 @@ int cmd_for_each_ref(int argc,
 		OPT_GROUP(""),
 		OPT_INTEGER( 0 , "count", &format.array_opts.max_count, N_("show only <n> matched refs")),
 		OPT_STRING(  0 , "format", &format.format, N_("format"), N_("format to use for the output")),
+		OPT_STRING(  0 , "skip-until", &filter.seek, N_("skip-until"), N_("skip references until")),
 		OPT__COLOR(&format.use_color, N_("respect format colors")),
 		OPT_REF_FILTER_EXCLUDE(&filter),
 		OPT_REF_SORT(&sorting_options),
@@ -100,6 +102,9 @@ int cmd_for_each_ref(int argc,
 		filter.name_patterns = argv;
 	}
 
+	if (filter.seek && filter.name_patterns && filter.name_patterns[0])
+		die(_("cannot use --skip-until with patterns"));
+
 	if (include_root_refs)
 		flags |= FILTER_REFS_ROOT_REFS | FILTER_REFS_DETACHED_HEAD;
 
diff --git a/ref-filter.c b/ref-filter.c
index 7a274633cf..9d0255d5db 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -2692,10 +2692,13 @@ static int for_each_fullref_in_pattern(struct ref_filter *filter,
 				       each_ref_fn cb,
 				       void *cb_data)
 {
+	struct ref_iterator *iter;
+	int flags = 0, ret = 0;
+
 	if (filter->kind & FILTER_REFS_ROOT_REFS) {
 		/* In this case, we want to print all refs including root refs. */
-		return refs_for_each_include_root_refs(get_main_ref_store(the_repository),
-						       cb, cb_data);
+		flags |= DO_FOR_EACH_INCLUDE_ROOT_REFS;
+		goto non_prefix_iter;
 	}
 
 	if (!filter->match_as_path) {
@@ -2704,8 +2707,7 @@ static int for_each_fullref_in_pattern(struct ref_filter *filter,
 		 * prefixes like "refs/heads/" etc. are stripped off,
 		 * so we have to look at everything:
 		 */
-		return refs_for_each_fullref_in(get_main_ref_store(the_repository),
-						"", NULL, cb, cb_data);
+		goto non_prefix_iter;
 	}
 
 	if (filter->ignore_case) {
@@ -2714,20 +2716,28 @@ static int for_each_fullref_in_pattern(struct ref_filter *filter,
 		 * so just return everything and let the caller
 		 * sort it out.
 		 */
-		return refs_for_each_fullref_in(get_main_ref_store(the_repository),
-						"", NULL, cb, cb_data);
+		goto non_prefix_iter;
 	}
 
 	if (!filter->name_patterns[0]) {
 		/* no patterns; we have to look at everything */
-		return refs_for_each_fullref_in(get_main_ref_store(the_repository),
-						 "", filter->exclude.v, cb, cb_data);
+		goto non_prefix_iter;
 	}
 
 	return refs_for_each_fullref_in_prefixes(get_main_ref_store(the_repository),
 						 NULL, filter->name_patterns,
 						 filter->exclude.v,
 						 cb, cb_data);
+
+non_prefix_iter:
+	iter = refs_ref_iterator_begin(get_main_ref_store(the_repository), "",
+				       NULL, 0, flags);
+	if (filter->seek)
+		ret = ref_iterator_seek(iter, filter->seek, 0);
+	if (ret)
+		return ret;
+
+	return do_for_each_ref_iterator(iter, cb, cb_data);
 }
 
 /*
@@ -3200,6 +3210,8 @@ static int do_filter_refs(struct ref_filter *filter, unsigned int type, each_ref
 	if (!filter->kind)
 		die("filter_refs: invalid type");
 	else {
+		const char *prefix = NULL;
+
 		/*
 		 * For common cases where we need only branches or remotes or tags,
 		 * we only iterate through those refs. If a mix of refs is needed,
@@ -3207,19 +3219,28 @@ static int do_filter_refs(struct ref_filter *filter, unsigned int type, each_ref
 		 * of filter_ref_kind().
 		 */
 		if (filter->kind == FILTER_REFS_BRANCHES)
-			ret = refs_for_each_fullref_in(get_main_ref_store(the_repository),
-						       "refs/heads/", NULL,
-						       fn, cb_data);
+			prefix = "refs/heads/";
 		else if (filter->kind == FILTER_REFS_REMOTES)
-			ret = refs_for_each_fullref_in(get_main_ref_store(the_repository),
-						       "refs/remotes/", NULL,
-						       fn, cb_data);
+			prefix = "refs/remotes/";
 		else if (filter->kind == FILTER_REFS_TAGS)
-			ret = refs_for_each_fullref_in(get_main_ref_store(the_repository),
-						       "refs/tags/", NULL, fn,
-						       cb_data);
-		else if (filter->kind & FILTER_REFS_REGULAR)
+			prefix = "refs/tags/";
+
+		if (prefix) {
+			struct ref_iterator *iter;
+
+			iter = refs_ref_iterator_begin(get_main_ref_store(the_repository),
+						       "", NULL, 0, 0);
+
+			if (filter->seek)
+				ret = ref_iterator_seek(iter, filter->seek, 0);
+			else if (prefix)
+				ret = ref_iterator_seek(iter, prefix, 1);
+
+			if (!ret)
+				ret = do_for_each_ref_iterator(iter, fn, cb_data);
+		} else if (filter->kind & FILTER_REFS_REGULAR) {
 			ret = for_each_fullref_in_pattern(filter, fn, cb_data);
+		}
 
 		/*
 		 * When printing all ref types, HEAD is already included,
diff --git a/ref-filter.h b/ref-filter.h
index c98c4fbd4c..9e97c65bc2 100644
--- a/ref-filter.h
+++ b/ref-filter.h
@@ -64,6 +64,7 @@ struct ref_array {
 
 struct ref_filter {
 	const char **name_patterns;
+	const char *seek;
 	struct strvec exclude;
 	struct oid_array points_at;
 	struct commit_list *with_commit;
diff --git a/t/t6302-for-each-ref-filter.sh b/t/t6302-for-each-ref-filter.sh
index bb02b86c16..af2c60a2ce 100755
--- a/t/t6302-for-each-ref-filter.sh
+++ b/t/t6302-for-each-ref-filter.sh
@@ -541,4 +541,184 @@ test_expect_success 'validate worktree atom' '
 	test_cmp expect actual
 '
 
+test_expect_success 'skip until with empty value' '
+	cat >expect <<-\EOF &&
+	refs/heads/main
+	refs/heads/main_worktree
+	refs/heads/side
+	refs/odd/spot
+	refs/tags/annotated-tag
+	refs/tags/doubly-annotated-tag
+	refs/tags/doubly-signed-tag
+	refs/tags/foo1.10
+	refs/tags/foo1.3
+	refs/tags/foo1.6
+	refs/tags/four
+	refs/tags/one
+	refs/tags/signed-tag
+	refs/tags/three
+	refs/tags/two
+	EOF
+	git for-each-ref --format="%(refname)" --skip-until="" >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'skip until to a specific reference' '
+	cat >expect <<-\EOF &&
+	refs/odd/spot
+	refs/tags/annotated-tag
+	refs/tags/doubly-annotated-tag
+	refs/tags/doubly-signed-tag
+	refs/tags/foo1.10
+	refs/tags/foo1.3
+	refs/tags/foo1.6
+	refs/tags/four
+	refs/tags/one
+	refs/tags/signed-tag
+	refs/tags/three
+	refs/tags/two
+	EOF
+	git for-each-ref --format="%(refname)" --skip-until=refs/odd/spot >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'skip until to a specific reference with partial match' '
+	cat >expect <<-\EOF &&
+	refs/odd/spot
+	refs/tags/annotated-tag
+	refs/tags/doubly-annotated-tag
+	refs/tags/doubly-signed-tag
+	refs/tags/foo1.10
+	refs/tags/foo1.3
+	refs/tags/foo1.6
+	refs/tags/four
+	refs/tags/one
+	refs/tags/signed-tag
+	refs/tags/three
+	refs/tags/two
+	EOF
+	git for-each-ref --format="%(refname)" --skip-until=refs/odd/sp >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'skip until just behind a specific reference' '
+	cat >expect <<-\EOF &&
+	refs/odd/spot
+	refs/tags/annotated-tag
+	refs/tags/doubly-annotated-tag
+	refs/tags/doubly-signed-tag
+	refs/tags/foo1.10
+	refs/tags/foo1.3
+	refs/tags/foo1.6
+	refs/tags/four
+	refs/tags/one
+	refs/tags/signed-tag
+	refs/tags/three
+	refs/tags/two
+	EOF
+	git for-each-ref --format="%(refname)" --skip-until=refs/odd/parrot >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'skip until to specific directory' '
+	cat >expect <<-\EOF &&
+	refs/odd/spot
+	refs/tags/annotated-tag
+	refs/tags/doubly-annotated-tag
+	refs/tags/doubly-signed-tag
+	refs/tags/foo1.10
+	refs/tags/foo1.3
+	refs/tags/foo1.6
+	refs/tags/four
+	refs/tags/one
+	refs/tags/signed-tag
+	refs/tags/three
+	refs/tags/two
+	EOF
+	git for-each-ref --format="%(refname)" --skip-until=refs/odd >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'skip until just behind a specific directory' '
+	cat >expect <<-\EOF &&
+	refs/odd/spot
+	refs/tags/annotated-tag
+	refs/tags/doubly-annotated-tag
+	refs/tags/doubly-signed-tag
+	refs/tags/foo1.10
+	refs/tags/foo1.3
+	refs/tags/foo1.6
+	refs/tags/four
+	refs/tags/one
+	refs/tags/signed-tag
+	refs/tags/three
+	refs/tags/two
+	EOF
+	git for-each-ref --format="%(refname)" --skip-until=refs/lost >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'skip until to specific directory with trailing slash' '
+	cat >expect <<-\EOF &&
+	refs/odd/spot
+	refs/tags/annotated-tag
+	refs/tags/doubly-annotated-tag
+	refs/tags/doubly-signed-tag
+	refs/tags/foo1.10
+	refs/tags/foo1.3
+	refs/tags/foo1.6
+	refs/tags/four
+	refs/tags/one
+	refs/tags/signed-tag
+	refs/tags/three
+	refs/tags/two
+	EOF
+	git for-each-ref --format="%(refname)" --skip-until=refs/odd/ >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'skip until overflow specific reference length' '
+	cat >expect <<-\EOF &&
+	refs/tags/annotated-tag
+	refs/tags/doubly-annotated-tag
+	refs/tags/doubly-signed-tag
+	refs/tags/foo1.10
+	refs/tags/foo1.3
+	refs/tags/foo1.6
+	refs/tags/four
+	refs/tags/one
+	refs/tags/signed-tag
+	refs/tags/three
+	refs/tags/two
+	EOF
+	git for-each-ref --format="%(refname)" --skip-until=refs/odd/spotnew >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'skip until overflow specific reference path' '
+	cat >expect <<-\EOF &&
+	refs/tags/annotated-tag
+	refs/tags/doubly-annotated-tag
+	refs/tags/doubly-signed-tag
+	refs/tags/foo1.10
+	refs/tags/foo1.3
+	refs/tags/foo1.6
+	refs/tags/four
+	refs/tags/one
+	refs/tags/signed-tag
+	refs/tags/three
+	refs/tags/two
+	EOF
+	git for-each-ref --format="%(refname)" --skip-until=refs/odd/spot/new >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'skip until used with a pattern' '
+	cat >expect <<-\EOF &&
+	fatal: cannot use --skip-until with patterns
+	EOF
+	test_must_fail git for-each-ref --format="%(refname)" --skip-until=refs/odd/spot refs/tags  2>actual &&
+	test_cmp expect actual
+'
+
 test_done
-- 
2.49.0
^ permalink raw reply related	[flat|nested] 102+ messages in thread
* Re: [PATCH 4/4] for-each-ref: introduce a '--skip-until' option
  2025-07-01 15:03 ` [PATCH 4/4] for-each-ref: introduce a '--skip-until' option Karthik Nayak
@ 2025-07-03  5:55   ` Patrick Steinhardt
  2025-07-03 10:02     ` Karthik Nayak
  0 siblings, 1 reply; 102+ messages in thread
From: Patrick Steinhardt @ 2025-07-03  5:55 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: git
On Tue, Jul 01, 2025 at 05:03:30PM +0200, Karthik Nayak wrote:
> diff --git a/Documentation/git-for-each-ref.adoc b/Documentation/git-for-each-ref.adoc
> index 5ef89fc0fe..4bf7c66b8c 100644
> --- a/Documentation/git-for-each-ref.adoc
> +++ b/Documentation/git-for-each-ref.adoc
> @@ -14,7 +14,7 @@ SYNOPSIS
>  		   [--points-at=<object>]
>  		   [--merged[=<object>]] [--no-merged[=<object>]]
>  		   [--contains[=<object>]] [--no-contains[=<object>]]
> -		   [--exclude=<pattern> ...]
> +		   [--exclude=<pattern> ...] [--skip-until=<pattern>]
>  
>  DESCRIPTION
>  -----------
> @@ -108,6 +108,10 @@ TAB %(refname)`.
>  --include-root-refs::
>  	List root refs (HEAD and pseudorefs) apart from regular refs.
>  
> +--skip-until::
> +    Skip references up to the specified pattern. Cannot be used with
> +    general pattern matching.
> +
>  FIELD NAMES
>  -----------
>  
Is it "up to and including the specified pattern" or "up to but
excluding the specified pattern"? It would help to make it very explicit
whether the pattern itself would be yielded or not.
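The expected output in the tests above is consistent with "seek to the
first reference that sorts at or after the given string", so a reference
that matches exactly is itself yielded. A minimal sketch of that
reading, assuming plain byte-wise ordering over a flat sorted list;
`toy_seek_index()` is a hypothetical helper, not part of the refs API:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return the index of the first entry in the sorted array `refs` that
 * compares greater than or equal to `seek`, or `nr` if there is no
 * such entry. Iteration would start at the returned index.
 */
static size_t toy_seek_index(const char **refs, size_t nr, const char *seek)
{
	size_t lo = 0, hi = nr;

	assert(refs && seek);

	/* Classic lower-bound binary search. */
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (strcmp(refs[mid], seek) < 0)
			lo = mid + 1;
		else
			hi = mid;
	}

	return lo;
}
```

Under this reading, seeking to "refs/odd/spot" or "refs/odd/sp" starts
at "refs/odd/spot", while "refs/odd/spotnew" and "refs/odd/spot/new"
both sort after it and therefore start at the first tag.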
> diff --git a/ref-filter.c b/ref-filter.c
> index 7a274633cf..9d0255d5db 100644
> --- a/ref-filter.c
> +++ b/ref-filter.c
> @@ -2714,20 +2716,28 @@ static int for_each_fullref_in_pattern(struct ref_filter *filter,
>  		 * so just return everything and let the caller
>  		 * sort it out.
>  		 */
> -		return refs_for_each_fullref_in(get_main_ref_store(the_repository),
> -						"", NULL, cb, cb_data);
> +		goto non_prefix_iter;
>  	}
>  
>  	if (!filter->name_patterns[0]) {
>  		/* no patterns; we have to look at everything */
> -		return refs_for_each_fullref_in(get_main_ref_store(the_repository),
> -						 "", filter->exclude.v, cb, cb_data);
> +		goto non_prefix_iter;
>  	}
>  
>  	return refs_for_each_fullref_in_prefixes(get_main_ref_store(the_repository),
>  						 NULL, filter->name_patterns,
>  						 filter->exclude.v,
>  						 cb, cb_data);
> +
> +non_prefix_iter:
> +	iter = refs_ref_iterator_begin(get_main_ref_store(the_repository), "",
> +				       NULL, 0, flags);
> +	if (filter->seek)
> +		ret = ref_iterator_seek(iter, filter->seek, 0);
Hm, this interface is somewhat weird now, as we have a split in what the
prefix-string means when creating the iterator and seeking it. I think
we should align those two functions.
> +	if (ret)
> +		return ret;
> +
> +	return do_for_each_ref_iterator(iter, cb, cb_data);
>  }
>  
>  /*
> @@ -3200,6 +3210,8 @@ static int do_filter_refs(struct ref_filter *filter, unsigned int type, each_ref
>  	if (!filter->kind)
>  		die("filter_refs: invalid type");
>  	else {
The `if` branch now needs to be updated to have curly braces, as well.
Patrick
^ permalink raw reply	[flat|nested] 102+ messages in thread
* Re: [PATCH 4/4] for-each-ref: introduce a '--skip-until' option
  2025-07-03  5:55   ` Patrick Steinhardt
@ 2025-07-03 10:02     ` Karthik Nayak
  2025-07-03 10:59       ` Patrick Steinhardt
  0 siblings, 1 reply; 102+ messages in thread
From: Karthik Nayak @ 2025-07-03 10:02 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git
Patrick Steinhardt <ps@pks.im> writes:
> On Tue, Jul 01, 2025 at 05:03:30PM +0200, Karthik Nayak wrote:
>> diff --git a/Documentation/git-for-each-ref.adoc b/Documentation/git-for-each-ref.adoc
>> index 5ef89fc0fe..4bf7c66b8c 100644
>> --- a/Documentation/git-for-each-ref.adoc
>> +++ b/Documentation/git-for-each-ref.adoc
>> @@ -14,7 +14,7 @@ SYNOPSIS
>>  		   [--points-at=<object>]
>>  		   [--merged[=<object>]] [--no-merged[=<object>]]
>>  		   [--contains[=<object>]] [--no-contains[=<object>]]
>> -		   [--exclude=<pattern> ...]
>> +		   [--exclude=<pattern> ...] [--skip-until=<pattern>]
>>
>>  DESCRIPTION
>>  -----------
>> @@ -108,6 +108,10 @@ TAB %(refname)`.
>>  --include-root-refs::
>>  	List root refs (HEAD and pseudorefs) apart from regular refs.
>>
>> +--skip-until::
>> +    Skip references up to the specified pattern. Cannot be used with
>> +    general pattern matching.
>> +
>>  FIELD NAMES
>>  -----------
>>
>
> Is it "up to and including the specified pattern" or "up to but
> excluding the specified pattern"? It would help to make it very explicit
> whether the pattern itself would be yielded or not.
>
It is "up to and including", will modify to make this clearer.
>> diff --git a/ref-filter.c b/ref-filter.c
>> index 7a274633cf..9d0255d5db 100644
>> --- a/ref-filter.c
>> +++ b/ref-filter.c
>> @@ -2714,20 +2716,28 @@ static int for_each_fullref_in_pattern(struct ref_filter *filter,
>>  		 * so just return everything and let the caller
>>  		 * sort it out.
>>  		 */
>> -		return refs_for_each_fullref_in(get_main_ref_store(the_repository),
>> -						"", NULL, cb, cb_data);
>> +		goto non_prefix_iter;
>>  	}
>>
>>  	if (!filter->name_patterns[0]) {
>>  		/* no patterns; we have to look at everything */
>> -		return refs_for_each_fullref_in(get_main_ref_store(the_repository),
>> -						 "", filter->exclude.v, cb, cb_data);
>> +		goto non_prefix_iter;
>>  	}
>>
>>  	return refs_for_each_fullref_in_prefixes(get_main_ref_store(the_repository),
>>  						 NULL, filter->name_patterns,
>>  						 filter->exclude.v,
>>  						 cb, cb_data);
>> +
>> +non_prefix_iter:
>> +	iter = refs_ref_iterator_begin(get_main_ref_store(the_repository), "",
>> +				       NULL, 0, flags);
>> +	if (filter->seek)
>> +		ret = ref_iterator_seek(iter, filter->seek, 0);
>
> Hm, this interface is somewhat weird now, as we have a split in what the
> prefix-string means when creating the iterator and seeking it. I think
> we should align those two functions.
>
The `refs_ref_iterator_begin()` takes in a `prefix` string, which sets
the prefix.
The `ref_iterator_seek()` takes in a `seek` string, but a flag allows it
to also set the prefix.
I think this is okay since the naming matches what it does.
The alternative would be for `refs_ref_iterator_begin()` to also take in a
`seek` string with a flag to also set the prefix. What do you think? I'm
okay either way.
>> +	if (ret)
>> +		return ret;
>> +
>> +	return do_for_each_ref_iterator(iter, cb, cb_data);
>>  }
>>
>>  /*
>> @@ -3200,6 +3210,8 @@ static int do_filter_refs(struct ref_filter *filter, unsigned int type, each_ref
>>  	if (!filter->kind)
>>  		die("filter_refs: invalid type");
>>  	else {
>
> The `if` branch now needs to be updated to have curly braces, as well.
>
> Patrick
Yes, will add.
Thanks for the review!
^ permalink raw reply	[flat|nested] 102+ messages in thread
* Re: [PATCH 4/4] for-each-ref: introduce a '--skip-until' option
  2025-07-03 10:02     ` Karthik Nayak
@ 2025-07-03 10:59       ` Patrick Steinhardt
  0 siblings, 0 replies; 102+ messages in thread
From: Patrick Steinhardt @ 2025-07-03 10:59 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: git
On Thu, Jul 03, 2025 at 03:02:03AM -0700, Karthik Nayak wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> > On Tue, Jul 01, 2025 at 05:03:30PM +0200, Karthik Nayak wrote:
> >> diff --git a/ref-filter.c b/ref-filter.c
> >> index 7a274633cf..9d0255d5db 100644
> >> --- a/ref-filter.c
> >> +++ b/ref-filter.c
> >> @@ -2714,20 +2716,28 @@ static int for_each_fullref_in_pattern(struct ref_filter *filter,
> >>  		 * so just return everything and let the caller
> >>  		 * sort it out.
> >>  		 */
> >> -		return refs_for_each_fullref_in(get_main_ref_store(the_repository),
> >> -						"", NULL, cb, cb_data);
> >> +		goto non_prefix_iter;
> >>  	}
> >>
> >>  	if (!filter->name_patterns[0]) {
> >>  		/* no patterns; we have to look at everything */
> >> -		return refs_for_each_fullref_in(get_main_ref_store(the_repository),
> >> -						 "", filter->exclude.v, cb, cb_data);
> >> +		goto non_prefix_iter;
> >>  	}
> >>
> >>  	return refs_for_each_fullref_in_prefixes(get_main_ref_store(the_repository),
> >>  						 NULL, filter->name_patterns,
> >>  						 filter->exclude.v,
> >>  						 cb, cb_data);
> >> +
> >> +non_prefix_iter:
> >> +	iter = refs_ref_iterator_begin(get_main_ref_store(the_repository), "",
> >> +				       NULL, 0, flags);
> >> +	if (filter->seek)
> >> +		ret = ref_iterator_seek(iter, filter->seek, 0);
> >
> > Hm, this interface is somewhat weird now, as we have a split in what the
> > prefix-string means when creating the iterator and seeking it. I think
> > we should align those two functions.
> >
> 
> The `refs_ref_iterator_begin()` takes in a `prefix` string, which sets
> the prefix.
> 
> The `ref_iterator_seek()` takes in a `seek` string, but a flag allows it
> to also set the prefix.
> 
> I think this is okay since the naming matches what it does.
> 
> The alternative would be for `refs_ref_iterator_begin()` to also take in
> a `seek` string with a flag to also set the prefix. What do you think?
> I'm okay either way.
I just think that the interface is a bit confusing. It's weird that the
needle that we're seeking for may or may not be used to update internal
state, and that this is inconsistent with the similar fields that you
pass to the iterator when creating it. So after seeking it sometimes
acts like you have created a new iterator with the needle, sometimes it
does not because we retain internal state. This kind of inconsistency
invites mistakes.
Patrick
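The inconsistency described in the message above can be illustrated
with a toy model. This is only a sketch: `struct toy_iter` and the
`toy_iter_*()` functions are hypothetical stand-ins over a sorted array
of names, not the real `ref_iterator` code. It assumes that seeking
positions the iterator at the first name sorting at or after the
needle, and that `set_prefix` additionally replaces the stored prefix.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model: a sorted array of names stands in for a ref backend. */
struct toy_iter {
	const char **names; /* sorted by strcmp() */
	size_t nr;
	size_t pos;         /* index of the next candidate */
	const char *prefix; /* only names with this prefix are yielded */
};

static void toy_iter_begin(struct toy_iter *it, const char **names,
			   size_t nr, const char *prefix)
{
	assert(it && names && prefix);
	it->names = names;
	it->nr = nr;
	it->pos = 0;
	it->prefix = prefix;
}

/*
 * Move to the first name that sorts at or after `seek`. With
 * `set_prefix`, the needle also replaces the stored prefix, so the
 * same call changes which names are yielded afterwards.
 */
static void toy_iter_seek(struct toy_iter *it, const char *seek,
			  int set_prefix)
{
	size_t i = 0;

	while (i < it->nr && strcmp(it->names[i], seek) < 0)
		i++;
	it->pos = i;
	if (set_prefix)
		it->prefix = seek;
}

/* Return the next name matching the prefix, or NULL when exhausted. */
static const char *toy_iter_next(struct toy_iter *it)
{
	size_t len = strlen(it->prefix);

	while (it->pos < it->nr) {
		const char *name = it->names[it->pos++];
		if (!strncmp(name, it->prefix, len))
			return name;
	}
	return NULL;
}

/* Count the names that remain to be yielded (consumes the iterator). */
static size_t toy_iter_count(struct toy_iter *it)
{
	size_t n = 0;

	while (toy_iter_next(it))
		n++;
	return n;
}
```

With the names refs/heads/a, refs/heads/b and refs/tags/v1 and an
iterator created with the prefix "refs/", seeking to "refs/heads/b"
without `set_prefix` leaves two names to iterate, while the same seek
with `set_prefix` leaves only one. The needle is identical in both
calls; only the flag decides whether the iterator behaves as if it had
been created anew.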
^ permalink raw reply	[flat|nested] 102+ messages in thread
* Re: [PATCH 0/4] for-each-ref: introduce seeking functionality via '--skip-until'
  2025-07-01 15:03 [PATCH 0/4] for-each-ref: introduce seeking functionality via '--skip-until' Karthik Nayak
                   ` (3 preceding siblings ...)
  2025-07-01 15:03 ` [PATCH 4/4] for-each-ref: introduce a '--skip-until' option Karthik Nayak
@ 2025-07-01 17:08 ` Junio C Hamano
  2025-07-02 16:45   ` Karthik Nayak
  2025-07-01 21:37 ` Junio C Hamano
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 102+ messages in thread
From: Junio C Hamano @ 2025-07-01 17:08 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: git
Karthik Nayak <karthik.188@gmail.com> writes:
> The `git-for-each-ref(1)` command is used to iterate over references
> present in a repository. In large repositories with millions of
> references, it would be optimal to paginate this output such that we
> can start iteration from a given reference.
I haven't looked at the patches, but should the end-user's mental
model of the process be like this?
 - We have a native order in which references are sorted and that is
   what "git for-each-ref" without "--sort" option gives them.
 - They can use the "--skip-until" option to seek in the above order
   and start iterating in the middle.
 - If they give "--sort", the set of refs to be shown would not
   change; skipping is done in the native order and then the
   remainder is given sorted.
Please make sure that the documentation is clear enough to avoid a
misunderstanding that this feature would kick in after we grab all
refs and sort them.  If it worked that way, it would allow us to say
"going from newer to older, but skipping the most recent ones that
were touched within a week", which would have been nice, but that is
not what we are doing with this feature---I think it is OK but we
need to be clear about it in the documentation.
> This series adds a '--skip-until' option in 'git-for-each-ref(1)'. When
> used, the reference iteration seeks to the first matching reference and
> iterates from there onward.
OK.  Even for the filesystem-backed ones we internally sort after doing
the readdir() loop, so this is feasible.  Nice.
> Initially I was also planning to clean up all the `refs_for_each...()`
> functions in 'refs.h' by simply using the iterator, but this bloated the
> series. So I've left that for another day.
OK.
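The mental model outlined in the message above (skip in the native
order first, then sort only the remainder) can be sketched as follows.
This is an illustration under stated assumptions, not the ref-filter
implementation: `struct toy_ref`, its `date` field and
`skip_then_sort()` are hypothetical names, and the native order is
taken to be plain strcmp() order on the refname.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct toy_ref {
	const char *name;
	long date; /* hypothetical sort key, e.g. a committer date */
};

/* qsort() comparator: newest first. */
static int cmp_date_desc(const void *a, const void *b)
{
	const struct toy_ref *x = a, *y = b;

	return (x->date < y->date) - (x->date > y->date);
}

/*
 * `refs` must be sorted by name (the "native" order). Copy every ref
 * that sorts at or after `seek` into `out`, then sort only that
 * remainder by date. Returns the number of refs written to `out`,
 * which must have room for `nr` entries.
 */
static size_t skip_then_sort(const struct toy_ref *refs, size_t nr,
			     const char *seek, struct toy_ref *out)
{
	size_t start = 0, n;

	assert(refs && seek && out);
	while (start < nr && strcmp(refs[start].name, seek) < 0)
		start++;
	n = nr - start;
	memcpy(out, refs + start, n * sizeof(*out));
	qsort(out, n, sizeof(*out), cmp_date_desc);
	return n;
}
```

Given refs/heads/a (date 30), refs/heads/b (date 10) and refs/heads/c
(date 20), seeking to "refs/heads/b" yields c and then b. The newest
ref, refs/heads/a, is dropped because the skip happens before the sort,
which is exactly the behaviour the documentation would need to spell
out.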
^ permalink raw reply	[flat|nested] 102+ messages in thread
* Re: [PATCH 0/4] for-each-ref: introduce seeking functionality via '--skip-until'
  2025-07-01 17:08 ` [PATCH 0/4] for-each-ref: introduce seeking functionality via '--skip-until' Junio C Hamano
@ 2025-07-02 16:45   ` Karthik Nayak
  0 siblings, 0 replies; 102+ messages in thread
From: Karthik Nayak @ 2025-07-02 16:45 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
Junio C Hamano <gitster@pobox.com> writes:
> Karthik Nayak <karthik.188@gmail.com> writes:
>
>> The `git-for-each-ref(1)` command is used to iterate over references
>> present in a repository. In large repositories with millions of
>> references, it would be optimal to paginate this output such that we
>> can start iteration from a given reference.
>
> I haven't looked at the patches, but should the end-user's mental
> model of the process be like this?
>
>  - We have a native order in which references are sorted and that is
>    what "git for-each-ref" without "--sort" option gives them.
>
>  - They can use the "--skip-until" option to seek in the above order
>    and start iterating in the middle.
>
>  - If they give "--sort", the set of refs to be shown would not
>    change; skipping is done in the native order and then the
>    remainder is given sorted.
>
> Please make sure that the documentation is clear enough to avoid a
> misunderstanding that this feature would kick in after we grab all
> refs and sort them.  If it worked that way, it would allow us to say
> "going from newer to older, but skipping the most recent ones that
> were touched within a week", which would have been nice, but that is
> not what we are doing with this feature---I think it is OK but we
> need to be clear about it in the documentation.
>
I totally didn't consider '--sort'. I do agree that we should document
that behavior if we intend to keep it as is. I wonder if it is worthwhile
to even prevent the two from being used together. I find the whole "we
skip before sorting" to be very confusing.
>> This series adds a '--skip-until' option in 'git-for-each-ref(1)'. When
>> used, the reference iteration seeks to the first matching reference and
>> iterates from there onward.
>
> OK.  Even for the filesystem-backed ones we internally sort after doing
> the readdir() loop, so this is feasible.  Nice.
>
Yup. We have 'sort_ref_dir()' to sort each directory parsed.
>> Initially I was also planning to clean up all the `refs_for_each...()`
>> functions in 'refs.h' by simply using the iterator, but this bloated the
>> series. So I've left that for another day.
>
> OK.
^ permalink raw reply	[flat|nested] 102+ messages in thread
* Re: [PATCH 0/4] for-each-ref: introduce seeking functionality via '--skip-until'
  2025-07-01 15:03 [PATCH 0/4] for-each-ref: introduce seeking functionality via '--skip-until' Karthik Nayak
                   ` (4 preceding siblings ...)
  2025-07-01 17:08 ` [PATCH 0/4] for-each-ref: introduce seeking functionality via '--skip-until' Junio C Hamano
@ 2025-07-01 21:37 ` Junio C Hamano
  2025-07-02 18:19   ` Karthik Nayak
  2025-07-02 14:14 ` Phillip Wood
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 102+ messages in thread
From: Junio C Hamano @ 2025-07-01 21:37 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: git, Toon Claes
Offtopic.  After applying this topic, I asked clang-format if it
wants to change anything.
    $ git clang-format --diff $(git merge-base HEAD master)
The result was disastrous.  Can "clang-format --diff" mode be
taught to be a bit more focused, to avoid touching existing entries in the
same array (in this case opts[] that has tons of options for the
"git for-each-ref" command), when only one new entry was added, I
wonder?
Also I am not impressed by the change it made to the code that is
commented out (in refs.h).
Line wrapping it did to refs_ref_iterator_begin() is an improvement,
but those to ref_iterator_seek() and do_for_each_ref_iterator() are
unnecessary (both of these were more readable in the original).
Even though I found its output better for Toon's "last-modified"
changes, I am not impressed by what clang-format suggested for this
series.
diff --git a/builtin/for-each-ref.c b/builtin/for-each-ref.c
index 543013cd11..39056557d4 100644
--- a/builtin/for-each-ref.c
+++ b/builtin/for-each-ref.c
@@ -8,7 +8,7 @@
 #include "strbuf.h"
 #include "strvec.h"
 
-static char const * const for_each_ref_usage[] = {
+static char const *const for_each_ref_usage[] = {
 	N_("git for-each-ref [<options>] [<pattern>]"),
 	N_("git for-each-ref [--points-at <object>]"),
 	N_("git for-each-ref [--merged [<commit>]] [--no-merged [<commit>]]"),
@@ -33,32 +33,41 @@ int cmd_for_each_ref(int argc,
 	struct option opts[] = {
 		OPT_BIT('s', "shell", &format.quote_style,
 			N_("quote placeholders suitably for shells"), QUOTE_SHELL),
-		OPT_BIT('p', "perl",  &format.quote_style,
+		OPT_BIT('p', "perl", &format.quote_style,
 			N_("quote placeholders suitably for perl"), QUOTE_PERL),
-		OPT_BIT(0 , "python", &format.quote_style,
-			N_("quote placeholders suitably for python"), QUOTE_PYTHON),
-		OPT_BIT(0 , "tcl",  &format.quote_style,
+		OPT_BIT(0, "python", &format.quote_style,
+			N_("quote placeholders suitably for python"),
+			QUOTE_PYTHON),
+		OPT_BIT(0, "tcl", &format.quote_style,
 			N_("quote placeholders suitably for Tcl"), QUOTE_TCL),
-		OPT_BOOL(0, "omit-empty",  &format.array_opts.omit_empty,
-			N_("do not output a newline after empty formatted refs")),
+		OPT_BOOL(0, "omit-empty", &format.array_opts.omit_empty,
+			 N_("do not output a newline after empty formatted refs")),
 
 		OPT_GROUP(""),
-		OPT_INTEGER( 0 , "count", &format.array_opts.max_count, N_("show only <n> matched refs")),
-		OPT_STRING(  0 , "format", &format.format, N_("format"), N_("format to use for the output")),
-		OPT_STRING(  0 , "skip-until", &filter.seek, N_("skip-until"), N_("skip references until")),
+		OPT_INTEGER(0, "count", &format.array_opts.max_count,
+			    N_("show only <n> matched refs")),
+		OPT_STRING(0, "format", &format.format, N_("format"),
+			   N_("format to use for the output")),
+		OPT_STRING(0, "skip-until", &filter.seek, N_("skip-until"),
+			   N_("skip references until")),
 		OPT__COLOR(&format.use_color, N_("respect format colors")),
 		OPT_REF_FILTER_EXCLUDE(&filter),
 		OPT_REF_SORT(&sorting_options),
-		OPT_CALLBACK(0, "points-at", &filter.points_at,
-			     N_("object"), N_("print only refs which points at the given object"),
+		OPT_CALLBACK(0, "points-at", &filter.points_at, N_("object"),
+			     N_("print only refs which points at the given object"),
 			     parse_opt_object_name),
 		OPT_MERGED(&filter, N_("print only refs that are merged")),
 		OPT_NO_MERGED(&filter, N_("print only refs that are not merged")),
-		OPT_CONTAINS(&filter.with_commit, N_("print only refs which contain the commit")),
-		OPT_NO_CONTAINS(&filter.no_commit, N_("print only refs which don't contain the commit")),
-		OPT_BOOL(0, "ignore-case", &icase, N_("sorting and filtering are case insensitive")),
-		OPT_BOOL(0, "stdin", &from_stdin, N_("read reference patterns from stdin")),
-		OPT_BOOL(0, "include-root-refs", &include_root_refs, N_("also include HEAD ref and pseudorefs")),
+		OPT_CONTAINS(&filter.with_commit,
+			     N_("print only refs which contain the commit")),
+		OPT_NO_CONTAINS(&filter.no_commit,
+				N_("print only refs which don't contain the commit")),
+		OPT_BOOL(0, "ignore-case", &icase,
+			 N_("sorting and filtering are case insensitive")),
+		OPT_BOOL(0, "stdin", &from_stdin,
+			 N_("read reference patterns from stdin")),
+		OPT_BOOL(0, "include-root-refs", &include_root_refs,
+			 N_("also include HEAD ref and pseudorefs")),
 		OPT_END(),
 	};
 
diff --git a/refs.h b/refs.h
index c5e08db0ff..518b17c748 100644
--- a/refs.h
+++ b/refs.h
@@ -1229,7 +1229,8 @@ int repo_migrate_ref_storage_format(struct repository *repo,
  *
  *             // Access information about the current reference:
  *             if (!(iter->flags & REF_ISSYMREF))
- *                     printf("%s is %s\n", iter->refname, oid_to_hex(iter->oid));
+ *                     printf("%s is %s\n", iter->refname,
+ * oid_to_hex(iter->oid));
  *
  *             // If you need to peel the reference:
  *             ref_iterator_peel(iter, &oid);
@@ -1284,10 +1285,11 @@ enum do_for_each_ref_flags {
  * trim that many characters off the beginning of each refname.
  * The output is ordered by refname.
  */
-struct ref_iterator *refs_ref_iterator_begin(
-		struct ref_store *refs,
-		const char *prefix, const char **exclude_patterns,
-		int trim, enum do_for_each_ref_flags flags);
+struct ref_iterator *refs_ref_iterator_begin(struct ref_store *refs,
+					     const char *prefix,
+					     const char **exclude_patterns,
+					     int trim,
+					     enum do_for_each_ref_flags flags);
 
 /*
  * Advance the iterator to the first or next item and return ITER_OK.
@@ -1317,8 +1319,8 @@ int ref_iterator_advance(struct ref_iterator *ref_iterator);
  *
  * Returns 0 on success, a negative error code otherwise.
  */
-int ref_iterator_seek(struct ref_iterator *ref_iterator,
-		      const char *seek, int set_prefix);
+int ref_iterator_seek(struct ref_iterator *ref_iterator, const char *seek,
+		      int set_prefix);
 
 /*
  * If possible, peel the reference currently being viewed by the
@@ -1339,8 +1341,7 @@ void ref_iterator_free(struct ref_iterator *ref_iterator);
  * adapter between the callback style of reference iteration and the
  * iterator style.
  */
-int do_for_each_ref_iterator(struct ref_iterator *iter,
-			     each_ref_fn fn, void *cb_data);
-
+int do_for_each_ref_iterator(struct ref_iterator *iter, each_ref_fn fn,
+			     void *cb_data);
 
 #endif /* REFS_H */
diff --git a/refs/iterator.c b/refs/iterator.c
index 1f99045d40..2b7f019c3e 100644
--- a/refs/iterator.c
+++ b/refs/iterator.c
@@ -15,8 +15,8 @@ int ref_iterator_advance(struct ref_iterator *ref_iterator)
 	return ref_iterator->vtable->advance(ref_iterator);
 }
 
-int ref_iterator_seek(struct ref_iterator *ref_iterator,
-		      const char *seek, int set_prefix)
+int ref_iterator_seek(struct ref_iterator *ref_iterator, const char *seek,
+		      int set_prefix)
 {
 	return ref_iterator->vtable->seek(ref_iterator, seek, set_prefix);
 }
@@ -57,8 +57,7 @@ static int empty_ref_iterator_advance(struct ref_iterator *ref_iterator UNUSED)
 }
 
 static int empty_ref_iterator_seek(struct ref_iterator *ref_iterator UNUSED,
-				   const char *seek UNUSED,
-				   int set_prefix UNUSED)
+				   const char *seek UNUSED, int set_prefix UNUSED)
 {
 	return 0;
 }
diff --git a/refs/ref-cache.c b/refs/ref-cache.c
index 656e6cd9ff..b812520dc7 100644
--- a/refs/ref-cache.c
+++ b/refs/ref-cache.c
@@ -525,7 +525,8 @@ static int cache_ref_iterator_seek(struct ref_iterator *ref_iterator,
 				level->dir = dir;
 				level->index = -1;
 			} else {
-				/* reduce the index so the leaf node is iterated over */
+				/* reduce the index so the leaf node is iterated
+				 * over */
 				if (cmp <= 0 && !slash)
 					level->index = idx - 1;
 				/*
^ permalink raw reply related	[flat|nested] 102+ messages in thread
* Re: [PATCH 0/4] for-each-ref: introduce seeking functionality via '--skip-until'
  2025-07-01 21:37 ` Junio C Hamano
@ 2025-07-02 18:19   ` Karthik Nayak
  2025-07-03  8:41     ` Karthik Nayak
  0 siblings, 1 reply; 102+ messages in thread
From: Karthik Nayak @ 2025-07-02 18:19 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Toon Claes
Junio C Hamano <gitster@pobox.com> writes:
> Offtopic.  After applying this topic, I asked clang-format if it
> wants to change anything.
>
>     $ git clang-format --diff $(git merge-base HEAD master)
>
> The result was disastrous.  Can "clang-format --diff" mode be
> taught to be a bit more focused, to avoid touching existing entries in the
> same array (in this case opts[] that has tons of options for the
> "git for-each-ref" command), when only one new entry was added, I
> wonder?
>
I couldn't find any way to do something like this.
> Also I am not impressed by the change it made to the code that is
> commented out (in refs.h).
>
> Line wrapping it did to refs_ref_iterator_begin() is an improvement,
> but those to ref_iterator_seek() and do_for_each_ref_iterator() are
> unnecessary (both of these were more readable in the original).
>
> Even though I found its output better for Toon's "last-modified"
> changes, I am not impressed by what clang-format suggested for this
> series.
>
It indeed looks really bad. I had a go with the new changes from
'gitster/kn/clang-format-updates', which seem a lot better.
However, this does show a problem with using 'RemoveBracesLLVM', where
it formats the following:
  if (...) {
     ...
     ...
  } else {
     ...
  }
to:
  if (...) {
     ...
     ...
  } else
     ...
Which isn't our style, I think we should completely drop this too, from
my patch series. Let me go ahead and do that. I really want to strip out
as many rules as possible to make the number of false positives 0 so we
can actually start enforcing clang-format. Once we enforce it, we can
slowly see what additional rules work well for us.
diff --git a/builtin/for-each-ref.c b/builtin/for-each-ref.c
index 543013cd11..2ec96eff74 100644
--- a/builtin/for-each-ref.c
+++ b/builtin/for-each-ref.c
@@ -8,7 +8,7 @@
 #include "strbuf.h"
 #include "strvec.h"
-static char const * const for_each_ref_usage[] = {
+static char const *const for_each_ref_usage[] = {
 	N_("git for-each-ref [<options>] [<pattern>]"),
 	N_("git for-each-ref [--points-at <object>]"),
 	N_("git for-each-ref [--merged [<commit>]] [--no-merged [<commit>]]"),
@@ -33,19 +33,19 @@ int cmd_for_each_ref(int argc,
 	struct option opts[] = {
 		OPT_BIT('s', "shell", &format.quote_style,
 			N_("quote placeholders suitably for shells"), QUOTE_SHELL),
-		OPT_BIT('p', "perl",  &format.quote_style,
+		OPT_BIT('p', "perl", &format.quote_style,
 			N_("quote placeholders suitably for perl"), QUOTE_PERL),
-		OPT_BIT(0 , "python", &format.quote_style,
+		OPT_BIT(0, "python", &format.quote_style,
 			N_("quote placeholders suitably for python"), QUOTE_PYTHON),
-		OPT_BIT(0 , "tcl",  &format.quote_style,
+		OPT_BIT(0, "tcl", &format.quote_style,
 			N_("quote placeholders suitably for Tcl"), QUOTE_TCL),
-		OPT_BOOL(0, "omit-empty",  &format.array_opts.omit_empty,
-			N_("do not output a newline after empty formatted refs")),
+		OPT_BOOL(0, "omit-empty", &format.array_opts.omit_empty,
+			 N_("do not output a newline after empty formatted refs")),
 		OPT_GROUP(""),
-		OPT_INTEGER( 0 , "count", &format.array_opts.max_count, N_("show only <n> matched refs")),
-		OPT_STRING(  0 , "format", &format.format, N_("format"), N_("format to use for the output")),
-		OPT_STRING(  0 , "skip-until", &filter.seek, N_("skip-until"), N_("skip references until")),
+		OPT_INTEGER(0, "count", &format.array_opts.max_count, N_("show only <n> matched refs")),
+		OPT_STRING(0, "format", &format.format, N_("format"), N_("format to use for the output")),
+		OPT_STRING(0, "skip-until", &filter.seek, N_("skip-until"), N_("skip references until")),
 		OPT__COLOR(&format.use_color, N_("respect format colors")),
 		OPT_REF_FILTER_EXCLUDE(&filter),
 		OPT_REF_SORT(&sorting_options),
diff --git a/refs.c b/refs.c
index a4220d3537..d492e1b423 100644
--- a/refs.c
+++ b/refs.c
@@ -2669,23 +2669,21 @@ enum ref_transaction_error refs_verify_refnames_available(struct ref_store *refs
 			if (!iter) {
 				iter = refs_ref_iterator_begin(refs, dirname.buf, NULL, 0,
 							       DO_FOR_EACH_INCLUDE_BROKEN);
-			} else if (ref_iterator_seek(iter, dirname.buf, 1) < 0) {
-				goto cleanup;
-			}
+				else if (ref_iterator_seek(iter, dirname.buf, 1) < 0) goto cleanup;
-			while ((ok = ref_iterator_advance(iter)) == ITER_OK) {
-				if (skip &&
-				    string_list_has_string(skip, iter->refname))
-					continue;
+				while ((ok = ref_iterator_advance(iter)) == ITER_OK) {
+					if (skip &&
+					    string_list_has_string(skip, iter->refname))
+						continue;
-				if (transaction && ref_transaction_maybe_set_rejected(
-					    transaction, *update_idx,
-					    REF_TRANSACTION_ERROR_NAME_CONFLICT))
-					continue;
+					if (transaction && ref_transaction_maybe_set_rejected(
+								   transaction, *update_idx,
+								   REF_TRANSACTION_ERROR_NAME_CONFLICT))
+						continue;
-				strbuf_addf(err, _("'%s' exists; cannot create '%s'"),
-					    iter->refname, refname);
-				goto cleanup;
+					strbuf_addf(err, _("'%s' exists; cannot create '%s'"),
+						    iter->refname, refname);
+					goto cleanup;
 			}
 			if (ok != ITER_DONE)
diff --git a/refs.h b/refs.h
index c5e08db0ff..41fe96d688 100644
--- a/refs.h
+++ b/refs.h
@@ -1285,9 +1285,9 @@ enum do_for_each_ref_flags {
  * The output is ordered by refname.
  */
 struct ref_iterator *refs_ref_iterator_begin(
-		struct ref_store *refs,
-		const char *prefix, const char **exclude_patterns,
-		int trim, enum do_for_each_ref_flags flags);
+	struct ref_store *refs,
+	const char *prefix, const char **exclude_patterns,
+	int trim, enum do_for_each_ref_flags flags);
 /*
  * Advance the iterator to the first or next item and return ITER_OK.
@@ -1342,5 +1342,4 @@ void ref_iterator_free(struct ref_iterator *ref_iterator);
 int do_for_each_ref_iterator(struct ref_iterator *iter,
 			     each_ref_fn fn, void *cb_data);
-
 #endif /* REFS_H */
^ permalink raw reply related	[flat|nested] 102+ messages in thread
* Re: [PATCH 0/4] for-each-ref: introduce seeking functionality via '--skip-until'
  2025-07-02 18:19   ` Karthik Nayak
@ 2025-07-03  8:41     ` Karthik Nayak
  0 siblings, 0 replies; 102+ messages in thread
From: Karthik Nayak @ 2025-07-03  8:41 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Toon Claes
Karthik Nayak <karthik.188@gmail.com> writes:
> Junio C Hamano <gitster@pobox.com> writes:
>
>> Offtopic.  After applying this topic, I asked clang-format if it
>> wants to change anything.
>>
>>     $ git clang-format --diff $(git merge-base HEAD master)
>>
>> The result was disastrous.  Can "clang-format --diff" mode be
>> taught to be a bit more focused, to avoid touching existing entries in the
>> same array (in this case opts[] that has tons of options for the
>> "git for-each-ref" command), when only one new entry was added, I
>> wonder?
>>
>
> I couldn't find any way to do something like this.
>
>> Also I am not impressed by the change it made to the code that is
>> commented out (in refs.h).
>>
>> Line wrapping it did to refs_ref_iterator_begin() is an improvement,
>> but those to ref_iterator_seek() and do_for_each_ref_iterator() are
>> unnecessary (both of these were more readable in the original).
>>
>> Even though I found its output better for Toon's "last-modified"
>> changes, I am not impressed by what clang-format suggested for this
>> series.
>>
>
> It indeed looks really bad. I had a go with the new changes from
> 'gitster/kn/clang-format-updates', which seem a lot better.
>
> However, this does show a problem with using 'RemoveBracesLLVM', where
> it formats the following:
>
>   if (...) {
>      ...
>      ...
>   } else {
>      ...
>   }
>
> to:
>
>   if (...) {
>      ...
>      ...
>   } else
>      ...
>
> Which isn't our style, I think we should completely drop this too, from
> my patch series. Let me go ahead and do that. I really want to strip out
> as many rules as possible to make the number of false positives 0 so we
> can actually start enforcing clang-format. Once we enforce it, we can
> slowly see what additional rules work well for us.
>
I did some more testing here, and it seems like this was because this
particular instance was more like
   if (...) {
      ...
   } else {
      ...
   }
Where both clauses had a single-line statement, but we only modified
the 'else' clause in this patch series, so clang-format only
suggested removing the braces from the 'else' clause.
So all is good here. I think we can go ahead with
'gitster/kn/clang-format-updates' and merge it to 'next'. Sorry for
the false alarm; I thought I had missed testing a particular case
in the series.
^ permalink raw reply	[flat|nested] 102+ messages in thread
* Re: [PATCH 0/4] for-each-ref: introduce seeking functionality via '--skip-until'
  2025-07-01 15:03 [PATCH 0/4] for-each-ref: introduce seeking functionality via '--skip-until' Karthik Nayak
                   ` (5 preceding siblings ...)
  2025-07-01 21:37 ` Junio C Hamano
@ 2025-07-02 14:14 ` Phillip Wood
  2025-07-02 20:33   ` Karthik Nayak
  2025-07-04 13:02 ` [PATCH v2 " Karthik Nayak
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 102+ messages in thread
From: Phillip Wood @ 2025-07-02 14:14 UTC (permalink / raw)
  To: Karthik Nayak, git
Hi Karthik
On 01/07/2025 16:03, Karthik Nayak wrote:
> 
> This enables efficient pagination workflows like:
>      git for-each-ref --count=100
>      git for-each-ref --count=100 --skip-until=refs/heads/branch-100
>      git for-each-ref --count=100 --skip-until=refs/heads/branch-200
Doesn't that require you to know the name of the ref after the last one 
returned by the previous batch? If the use case here is pagination then 
being able to provide a numeric offset might be a better fit. For example
	git for-each-ref --count=100 --start=200
would show refs 200 to 300
Thanks
Phillip
^ permalink raw reply	[flat|nested] 102+ messages in thread
* Re: [PATCH 0/4] for-each-ref: introduce seeking functionality via '--skip-until'
  2025-07-02 14:14 ` Phillip Wood
@ 2025-07-02 20:33   ` Karthik Nayak
  2025-07-03  5:18     ` Patrick Steinhardt
  0 siblings, 1 reply; 102+ messages in thread
From: Karthik Nayak @ 2025-07-02 20:33 UTC (permalink / raw)
  To: phillip.wood, git
Phillip Wood <phillip.wood123@gmail.com> writes:
Hello Phillip,
> Hi Karthik
>
> On 01/07/2025 16:03, Karthik Nayak wrote:
>>
>> This enables efficient pagination workflows like:
>>      git for-each-ref --count=100
>>      git for-each-ref --count=100 --skip-until=refs/heads/branch-100
>>      git for-each-ref --count=100 --skip-until=refs/heads/branch-200
>
> Doesn't that require you to know the name of the ref after the last one
> returned by the previous batch? If the use case here is pagination then
> being able to provide a numeric offset might be a better fit. For example
>
It does require that you know the last ref from the previous batch.
The reason for picking a reference offset is mostly performance. Our
reference backends are built with prefix matching in mind: in short,
they do a binary search through the reference namespace to find the
required prefix. By using a reference offset we can reuse this binary
search mechanism to arrive at the offset.
Using a count offset would require iteration to reach the desired
offset (basically an O(N) operation). This wouldn't really matter in
repositories with ~10^3 refs, but in larger repositories with around
~10^6 refs this starts to make a large difference.
> 	git for-each-ref --count=100 --start=200
>
> would show refs 200 to 300
>
> Thanks
>
> Phillip
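The cost difference argued in the message above can be sketched with a
sorted array of names. This is a simplified model, not the files or
reftable backend: `fill_names()`, `seek_by_name()` and
`seek_by_count()` are hypothetical helpers, and the step counters exist
only to make the comparison visible.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define NAME_LEN 40

/*
 * Hypothetical helper: fill `buf` with zero-padded branch names so
 * that lexicographic order equals numeric order, and point `names`
 * at them.
 */
static void fill_names(char (*buf)[NAME_LEN], const char **names, size_t nr)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		snprintf(buf[i], NAME_LEN, "refs/heads/branch-%06zu", i);
		names[i] = buf[i];
	}
}

/*
 * Seek by name: binary search for the first entry >= needle.
 * `*steps` counts the comparisons made.
 */
static size_t seek_by_name(const char **names, size_t nr,
			   const char *needle, size_t *steps)
{
	size_t lo = 0, hi = nr;

	assert(names && needle && steps);
	*steps = 0;
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		(*steps)++;
		if (strcmp(names[mid], needle) < 0)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo;
}

/*
 * Seek by count: without an index from position to entry, every
 * skipped entry has to be stepped over. `*steps` counts those steps.
 */
static size_t seek_by_count(size_t nr, size_t offset, size_t *steps)
{
	size_t pos = 0;

	assert(steps);
	*steps = 0;
	while (pos < nr && pos < offset) {
		pos++;
		(*steps)++;
	}
	return pos;
}
```

For 1024 names, seeking to the name at index 500 takes at most 11
comparisons, while skipping 500 entries by count takes 500 steps. The
gap widens with the number of refs, since the first grows
logarithmically and the second linearly.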
^ permalink raw reply	[flat|nested] 102+ messages in thread
* Re: [PATCH 0/4] for-each-ref: introduce seeking functionality via '--skip-until'
  2025-07-02 20:33   ` Karthik Nayak
@ 2025-07-03  5:18     ` Patrick Steinhardt
  2025-07-03  5:56       ` Junio C Hamano
  0 siblings, 1 reply; 102+ messages in thread
From: Patrick Steinhardt @ 2025-07-03  5:18 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: phillip.wood, git
On Wed, Jul 02, 2025 at 03:33:47PM -0500, Karthik Nayak wrote:
> Phillip Wood <phillip.wood123@gmail.com> writes:
> 
> Hello Phillip,
> 
> > Hi Karthik
> >
> > On 01/07/2025 16:03, Karthik Nayak wrote:
> >>
> >> This enables efficient pagination workflows like:
> >>      git for-each-ref --count=100
> >>      git for-each-ref --count=100 --skip-until=refs/heads/branch-100
> >>      git for-each-ref --count=100 --skip-until=refs/heads/branch-200
> >
> > Doesn't that require you to know the name of the ref after the last one
> > returned by the previous batch? If the use case here is pagination then
> > being able to provide a numeric offset might be a better fit. For example
> >
> 
> It does require that you know the last ref from the previous batch.
> 
> The reason for picking a reference offset is mostly performance. Our
> reference backends are built with prefix matching in mind: in short,
> they do a binary search through the reference namespace to find the
> required prefix. By using a reference offset we can reuse this binary
> search mechanism to arrive at the offset.
> 
> Using a count offset would require iteration to reach the desired
> offset (basically an O(N) operation). This wouldn't really matter in
> repositories with ~10^3 refs, but in larger repositories with around
> ~10^6 refs this starts to make a large difference.
Even more importantly though, a numeric offset would be invalidated by a
concurrent write in case that write ends up inserting a ref in the range
of refs you intend to skip now.
Patrick
^ permalink raw reply	[flat|nested] 102+ messages in thread
* Re: [PATCH 0/4] for-each-ref: introduce seeking functionality via '--skip-until'
  2025-07-03  5:18     ` Patrick Steinhardt
@ 2025-07-03  5:56       ` Junio C Hamano
  2025-07-03  8:19         ` Patrick Steinhardt
  0 siblings, 1 reply; 102+ messages in thread
From: Junio C Hamano @ 2025-07-03  5:56 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: Karthik Nayak, phillip.wood, git
Patrick Steinhardt <ps@pks.im> writes:
> Even more importantly though, a numeric offset would be invalidated by a
> concurrent write in case that write ends up inserting a ref in the range
> of refs you intend to skip now.
That argument cuts both ways, no?  You have shown up to some ref
which you remember in the last cycle, and then while you are
planning to formulate another query with --skip-until naming that
ref, somebody removes that ref, then what happens?  Or somebody
inserts a new ref that sorts earlier than the ref you stopped at the
last time.
^ permalink raw reply	[flat|nested] 102+ messages in thread
* Re: [PATCH 0/4] for-each-ref: introduce seeking functionality via '--skip-until'
  2025-07-03  5:56       ` Junio C Hamano
@ 2025-07-03  8:19         ` Patrick Steinhardt
  2025-07-03  8:48           ` Karthik Nayak
  0 siblings, 1 reply; 102+ messages in thread
From: Patrick Steinhardt @ 2025-07-03  8:19 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Karthik Nayak, phillip.wood, git
On Wed, Jul 02, 2025 at 10:56:18PM -0700, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > Even more importantly though, a numeric offset would be invalidated by a
> > concurrent write in case that write ends up inserting a ref in the range
> > of refs you intend to skip now.
> 
> That argument cuts both ways, no?  You have shown up to some ref
> which you remember in the last cycle, and then while you are
> planning to formulate another query with --skip-until naming that
> ref, somebody removes that ref, then what happens?
This ref was already yielded, and it wouldn't and shouldn't be yielded
on the next page. This works as expected with the proposal, as
`--skip-until` does not care whether the value itself actually exists.
> Or somebody inserts a new ref that sorts earlier than the ref you
> stopped at the last time.
It wouldn't and shouldn't be shown. When I have already yielded all refs
up to refs/heads/something, I don't expect to see any ref that sorts
before refs/heads/something on the next page.
Patrick
^ permalink raw reply	[flat|nested] 102+ messages in thread
* Re: [PATCH 0/4] for-each-ref: introduce seeking functionality via '--skip-until'
  2025-07-03  8:19         ` Patrick Steinhardt
@ 2025-07-03  8:48           ` Karthik Nayak
  0 siblings, 0 replies; 102+ messages in thread
From: Karthik Nayak @ 2025-07-03  8:48 UTC (permalink / raw)
  To: Patrick Steinhardt, Junio C Hamano; +Cc: phillip.wood, git
Patrick Steinhardt <ps@pks.im> writes:
> On Wed, Jul 02, 2025 at 10:56:18PM -0700, Junio C Hamano wrote:
>> Patrick Steinhardt <ps@pks.im> writes:
>>
>> > Even more importantly though, a numeric offset would be invalidated by a
>> > concurrent write in case that write ends up inserting a ref in the range
>> > of refs you intend to skip now.
>>
>> That argument cuts both ways, no?  You have shown up to some ref
>> which you remember in the last cycle, and then while you are
>> planning to formulate another query with --skip-until naming that
>> ref, somebody removes that ref, then what happens?
>
> This ref was already yielded, and it wouldn't and shouldn't be yielded
> on the next page. This works as expected with the proposal, as
> `--skip-until` does not care whether the value itself actually exists.
The current version of the series will include the reference provided to
'--skip-until', if it exists. But your latter statement still holds, if
the reference doesn't exist, it will still work by finding the next
reference in the default sort order.
>> Or somebody inserts a new ref that sorts earlier than the ref you
>> stopped at the last time.
>
> It wouldn't and shouldn't be shown. When I have already yielded all refs
> up to refs/heads/something, I don't expect to see any ref that sorts
> before refs/heads/something on the next page.
>
Yeah, this was my thought too. Another way to think of this is that in a
cursor-based approach, a particular reference is guaranteed never to
occur again, even with modifications to the repository made between
requests. In a count-based approach, however, this guarantee doesn't hold.
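To make the difference concrete, here is a minimal sketch in plain C. It
is a toy model, not Git code: the ref store is just a sorted array of
refnames, and both helper names are hypothetical. It also assumes the
next page starts strictly after the last ref already shown, which is a
simplification (as noted above, the current series includes the named
ref itself if it exists).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Count-based page start: skip a fixed number of entries, clamped to
 * the end of the list. The result depends on how many entries sort
 * before the page, so a concurrent insert or delete shifts it.
 */
size_t page_start_by_offset(size_t nr, size_t offset)
{
	return offset < nr ? offset : nr;
}

/*
 * Cursor-based page start: index of the first entry that sorts
 * strictly after `cursor` in the sorted array `refs`. The cursor does
 * not need to exist in the array.
 */
size_t page_start_after_cursor(const char **refs, size_t nr,
			       const char *cursor)
{
	size_t lo = 0, hi = nr;

	assert(cursor);
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (strcmp(refs[mid], cursor) <= 0)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo;
}
```

Suppose the first page showed refs/heads/a and refs/heads/b, and
refs/heads/aa is created before the second request. An offset of 2 now
lands on refs/heads/b again, while the cursor lookup for refs/heads/b
lands on refs/heads/c. If refs/heads/b is deleted instead, the cursor
lookup still lands on refs/heads/c.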
> Patrick
^ permalink raw reply	[flat|nested] 102+ messages in thread
* [PATCH v2 0/4] for-each-ref: introduce seeking functionality via '--skip-until'
  2025-07-01 15:03 [PATCH 0/4] for-each-ref: introduce seeking functionality via '--skip-until' Karthik Nayak
                   ` (6 preceding siblings ...)
  2025-07-02 14:14 ` Phillip Wood
@ 2025-07-04 13:02 ` Karthik Nayak
  2025-07-04 13:02   ` [PATCH v2 1/4] refs: expose `ref_iterator` via 'refs.h' Karthik Nayak
                     ` (4 more replies)
  2025-07-08 13:47 ` [PATCH v3 0/4] for-each-ref: introduce seeking functionality via '--start-after' Karthik Nayak
                   ` (2 subsequent siblings)
  10 siblings, 5 replies; 102+ messages in thread
From: Karthik Nayak @ 2025-07-04 13:02 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak, gitster, ps
The `git-for-each-ref(1)` command is used to iterate over references
present in a repository. In large repositories with millions of
references, it would be optimal to paginate this output such that we
can start iteration from a given reference. This would avoid having to
iterate over all references from the beginning each time when paginating
through results.
This series adds a '--skip-until' option in 'git-for-each-ref(1)'. When
used, the reference iteration seeks to the first matching reference and
iterates from there onward.
This enables efficient pagination workflows like:
    git for-each-ref --count=100
    git for-each-ref --count=100 --skip-until=refs/heads/branch-100
    git for-each-ref --count=100 --skip-until=refs/heads/branch-200
To add this functionality, we expose the `ref_iterator` outside the
'refs/' namespace and modify `ref_iterator_seek()` to actually seek
to a given reference and only set the prefix when the
'REF_ITERATOR_SEEK_SET_PREFIX' flag is set.
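As a rough illustration of the intended seek semantics, here is a toy
iterator over a sorted array of refnames. This is only a sketch: the
`toy_iter` type, `toy_iter_seek()`, `toy_iter_next()` and
`TOY_SEEK_SET_PREFIX` are hypothetical names modelled on the series, not
the actual implementation in any backend.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag, modelled on REF_ITERATOR_SEEK_SET_PREFIX. */
enum toy_seek_flag {
	TOY_SEEK_SET_PREFIX = (1 << 0),
};

/* Toy iterator over a sorted array of refnames; not Git's ref_iterator. */
struct toy_iter {
	const char **refs;
	size_t nr;
	size_t pos;         /* index of the next entry to yield */
	const char *prefix; /* NULL means "no prefix filter" */
};

/*
 * Position the iterator at the first entry that sorts at or after
 * `seek`. Any earlier prefix is dropped; `seek` becomes the new prefix
 * only when TOY_SEEK_SET_PREFIX is passed.
 */
void toy_iter_seek(struct toy_iter *it, const char *seek, unsigned int flags)
{
	size_t lo = 0, hi;

	assert(it);
	hi = it->nr;

	it->prefix = NULL;
	if (flags & TOY_SEEK_SET_PREFIX)
		it->prefix = seek;

	/* A NULL seek string rewinds to the first entry. */
	while (seek && lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (strcmp(it->refs[mid], seek) < 0)
			lo = mid + 1;
		else
			hi = mid;
	}
	it->pos = lo;
}

/* Yield the next refname, or NULL once the list or prefix range ends. */
const char *toy_iter_next(struct toy_iter *it)
{
	if (it->pos >= it->nr)
		return NULL;
	if (it->prefix &&
	    strncmp(it->refs[it->pos], it->prefix, strlen(it->prefix)))
		return NULL;
	return it->refs[it->pos++];
}
```

Seeking to refs/heads/b without the flag yields refs/heads/b and then
continues into refs/tags/, whereas seeking to refs/heads/ with the flag
stops at the end of the refs/heads/ range.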
On the reftable and packed backends, the changes are simple. But since
the files backend uses 'ref-cache' for reference handling, the changes
there are a little more involved, since we need to set up the right
levels and the indexing.
Initially I was also planning to clean up all the `refs_for_each...()`
functions in 'refs.h' by simply using the iterator, but this bloated the
series. So I've left that for another day.
Changes in v2:
- Modify 'ref_iterator_seek()' to take in flags instead of a
  'set_prefix' variable. This improves readability, where users would
  use the 'REF_ITERATOR_SEEK_SET_PREFIX' instead of simply passing '1'.
- When the set prefix flag isn't used, reset any previously set prefix.
  This ensures that the internal prefix state is always reset whenever
  we seek and unifies the behavior between 'ref_iterator_seek' and
  'ref_iterator_begin'.
- Don't allow '--skip-until' to be run with '--sort', since the seeking
  always takes place before any sorting and this can be confusing.
- Some styling fixes:
  - Remove extra newline
  - Skip braces around single lined if...else clause
  - Add braces around 'if' clause
  - Fix indentation
- Link to v1: https://lore.kernel.org/git/20250701-306-git-for-each-ref-pagination-v1-0-4f0ae7c0688f@gmail.com/
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
 Documentation/git-for-each-ref.adoc |   6 +-
 builtin/for-each-ref.c              |   8 ++
 ref-filter.c                        |  61 ++++++++----
 ref-filter.h                        |   1 +
 refs.c                              |   6 +-
 refs.h                              | 158 ++++++++++++++++++++++++++++++
 refs/debug.c                        |   7 +-
 refs/files-backend.c                |   7 +-
 refs/iterator.c                     |  26 ++---
 refs/packed-backend.c               |  17 ++--
 refs/ref-cache.c                    |  99 +++++++++++++++----
 refs/ref-cache.h                    |   7 --
 refs/refs-internal.h                | 152 ++---------------------------
 refs/reftable-backend.c             |  21 ++--
 t/t6302-for-each-ref-filter.sh      | 188 ++++++++++++++++++++++++++++++++++++
 15 files changed, 538 insertions(+), 226 deletions(-)
Karthik Nayak (4):
      refs: expose `ref_iterator` via 'refs.h'
      ref-cache: remove unused function 'find_ref_entry()'
      refs: selectively set prefix in the seek functions
      for-each-ref: introduce a '--skip-until' option
Range-diff versus v1:
1:  a20ef2c841 ! 1:  d19553c365 refs: expose `ref_iterator` via 'refs.h'
    @@ refs.h: int repo_migrate_ref_storage_format(struct repository *repo,
     + * The output is ordered by refname.
     + */
     +struct ref_iterator *refs_ref_iterator_begin(
    -+		struct ref_store *refs,
    -+		const char *prefix, const char **exclude_patterns,
    -+		int trim, enum do_for_each_ref_flags flags);
    ++	struct ref_store *refs,
    ++	const char *prefix, const char **exclude_patterns,
    ++	int trim, enum do_for_each_ref_flags flags);
     +
     +/*
     + * Advance the iterator to the first or next item and return ITER_OK.
    @@ refs.h: int repo_migrate_ref_storage_format(struct repository *repo,
     + */
     +int do_for_each_ref_iterator(struct ref_iterator *iter,
     +			     each_ref_fn fn, void *cb_data);
    -+
     +
      #endif /* REFS_H */
     
2:  96f3e6eb05 = 2:  1d3936132b ref-cache: remove unused function 'find_ref_entry()'
3:  fa19b53a37 ! 3:  aab2011494 refs: selectively set prefix in the seek functions
    @@ Commit message
         further iteration would only yield references which match the particular
         prefix. This is a bit confusing.
     
    -    Let's add a 'set_prefix' field to the function, which when set, will set
    -    the prefix for the iteration in-line with the existing behavior. But
    -    when the 'set_prefix' field is not set, the reference backends will
    -    simply seek to the specified reference without setting prefix. This
    -    allows users to start iteration from a specific reference.
    +    Let's add a 'flags' field to the function, which when set with the
    +    'REF_ITERATOR_SEEK_SET_PREFIX' flag, will set the prefix for the
    +    iteration in-line with the existing behavior. Otherwise, the reference
    +    backends will simply seek to the specified reference and clears any
    +    previously set prefix. This allows users to start iteration from a
    +    specific reference.
     
         In the packed and reftable backend, since references are available in a
         sorted list, the changes are simply setting the prefix if needed. The
         changes on the files-backend are a little more involved, since the files
         backend uses the 'ref-cache' mechanism. We move out the existing logic
         within `cache_ref_iterator_seek()` to `cache_ref_iterator_set_prefix()`
    -    which is called when `set_prefix` is set. We then parse the provided
    -    seek string and set the required levels and their indexes to ensure that
    -    seeking is possible.
    +    which is called when the 'REF_ITERATOR_SEEK_SET_PREFIX' flag is set. We
    +    then parse the provided seek string and set the required levels and
    +    their indexes to ensure that seeking is possible.
     
    +    Helped-by: Patrick Steinhardt <ps@pks.im>
         Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
     
      ## refs.c ##
     @@ refs.c: enum ref_transaction_error refs_verify_refnames_available(struct ref_store *refs
    - 			if (!iter) {
    + 		if (!initial_transaction) {
    + 			int ok;
    + 
    +-			if (!iter) {
    ++			if (!iter)
      				iter = refs_ref_iterator_begin(refs, dirname.buf, NULL, 0,
      							       DO_FOR_EACH_INCLUDE_BROKEN);
     -			} else if (ref_iterator_seek(iter, dirname.buf) < 0) {
    -+			} else if (ref_iterator_seek(iter, dirname.buf, 1) < 0) {
    ++			else if (ref_iterator_seek(iter, dirname.buf,
    ++						   REF_ITERATOR_SEEK_SET_PREFIX) < 0)
      				goto cleanup;
    - 			}
    +-			}
      
    + 			while ((ok = ref_iterator_advance(iter)) == ITER_OK) {
    + 				if (skip &&
     
      ## refs.h ##
     @@ refs.h: struct ref_iterator *refs_ref_iterator_begin(
    +  */
      int ref_iterator_advance(struct ref_iterator *ref_iterator);
      
    ++enum ref_iterator_seek_flag {
    ++	/*
    ++	 * Also set the seek pattern as a prefix for iteration. This ensures
    ++	 * that only references which match the prefix are yielded.
    ++	 */
    ++	REF_ITERATOR_SEEK_SET_PREFIX = (1 << 0),
    ++};
    ++
      /*
     - * Seek the iterator to the first reference with the given prefix.
     - * The prefix is matched as a literal string, without regard for path
    +- * separators. If prefix is NULL or the empty string, seek the iterator to the
     + * Seek the iterator to the first reference matching the given seek string.
     + * The seek string is matched as a literal string, without regard for path
    -  * separators. If prefix is NULL or the empty string, seek the iterator to the
    ++ * separators. If seek is NULL or the empty string, seek the iterator to the
       * first reference again.
       *
     - * This function is expected to behave as if a new ref iterator with the same
     - * prefix had been created, but allows reuse of iterators and thus may allow
     - * the backend to optimize. Parameters other than the prefix that have been
     - * passed when creating the iterator will remain unchanged.
    -+ * When set_prefix is true, this function behaves as if a new ref iterator
    -+ * with the same prefix had been created, setting the prefix for subsequent
    -+ * iteration. When set_prefix is false, the iterator simply seeks to the
    -+ * specified reference without changing the existing prefix, allowing
    -+ * iteration to start from that specific reference.
    ++ * This function is expected to behave as if a new ref iterator has been
    ++ * created, but allows reuse of existing iterators for optimization.
     + *
    -+ * This function allows reuse of iterators and thus may allow the backend
    -+ * to optimize. Parameters other than the prefix that have been passed when
    -+ * creating the iterator will remain unchanged.
    ++ * When the REF_ITERATOR_SEEK_SET_PREFIX flag is set, the iterator's prefix is
    ++ * updated to match the seek string, affecting all subsequent iterations. If
    ++ * not, the iterator seeks to the specified reference and clears any previously
    ++ * set prefix.
       *
       * Returns 0 on success, a negative error code otherwise.
       */
    - int ref_iterator_seek(struct ref_iterator *ref_iterator,
    +-int ref_iterator_seek(struct ref_iterator *ref_iterator,
     -		      const char *prefix);
    -+		      const char *seek, int set_prefix);
    ++int ref_iterator_seek(struct ref_iterator *ref_iterator, const char *seek,
    ++		      unsigned int flags);
      
      /*
       * If possible, peel the reference currently being viewed by the
    @@ refs/debug.c: static int debug_ref_iterator_advance(struct ref_iterator *ref_ite
      
      static int debug_ref_iterator_seek(struct ref_iterator *ref_iterator,
     -				   const char *prefix)
    -+				   const char *seek, int set_prefix)
    ++				   const char *seek, unsigned int flags)
      {
      	struct debug_ref_iterator *diter =
      		(struct debug_ref_iterator *)ref_iterator;
     -	int res = diter->iter->vtable->seek(diter->iter, prefix);
     -	trace_printf_key(&trace_refs, "iterator_seek: %s: %d\n", prefix ? prefix : "", res);
    -+	int res = diter->iter->vtable->seek(diter->iter, seek, set_prefix);
    -+	trace_printf_key(&trace_refs, "iterator_seek: %s set_prefix: %d: %d\n",
    -+			 seek ? seek : "", set_prefix, res);
    ++	int res = diter->iter->vtable->seek(diter->iter, seek, flags);
    ++	trace_printf_key(&trace_refs, "iterator_seek: %s flags: %d: %d\n",
    ++			 seek ? seek : "", flags, res);
      	return res;
      }
      
    @@ refs/files-backend.c: static int files_ref_iterator_advance(struct ref_iterator
      
      static int files_ref_iterator_seek(struct ref_iterator *ref_iterator,
     -				   const char *prefix)
    -+				   const char *seek, int set_prefix)
    ++				   const char *seek, unsigned int flags)
      {
      	struct files_ref_iterator *iter =
      		(struct files_ref_iterator *)ref_iterator;
     -	return ref_iterator_seek(iter->iter0, prefix);
    -+	return ref_iterator_seek(iter->iter0, seek, set_prefix);
    ++	return ref_iterator_seek(iter->iter0, seek, flags);
      }
      
      static int files_ref_iterator_peel(struct ref_iterator *ref_iterator,
    @@ refs/files-backend.c: static int files_reflog_iterator_advance(struct ref_iterat
      static int files_reflog_iterator_seek(struct ref_iterator *ref_iterator UNUSED,
     -				      const char *prefix UNUSED)
     +				      const char *seek UNUSED,
    -+				      int set_prefix UNUSED)
    ++				      unsigned int flags UNUSED)
      {
      	BUG("ref_iterator_seek() called for reflog_iterator");
      }
     
      ## refs/iterator.c ##
     @@ refs/iterator.c: int ref_iterator_advance(struct ref_iterator *ref_iterator)
    + 	return ref_iterator->vtable->advance(ref_iterator);
      }
      
    - int ref_iterator_seek(struct ref_iterator *ref_iterator,
    +-int ref_iterator_seek(struct ref_iterator *ref_iterator,
     -		      const char *prefix)
    -+		      const char *seek, int set_prefix)
    ++int ref_iterator_seek(struct ref_iterator *ref_iterator, const char *seek,
    ++		      unsigned int flags)
      {
     -	return ref_iterator->vtable->seek(ref_iterator, prefix);
    -+	return ref_iterator->vtable->seek(ref_iterator, seek, set_prefix);
    ++	return ref_iterator->vtable->seek(ref_iterator, seek, flags);
      }
      
      int ref_iterator_peel(struct ref_iterator *ref_iterator,
    @@ refs/iterator.c: static int empty_ref_iterator_advance(struct ref_iterator *ref_
      static int empty_ref_iterator_seek(struct ref_iterator *ref_iterator UNUSED,
     -				   const char *prefix UNUSED)
     +				   const char *seek UNUSED,
    -+				   int set_prefix UNUSED)
    ++				   unsigned int flags UNUSED)
      {
      	return 0;
      }
    @@ refs/iterator.c: static int merge_ref_iterator_advance(struct ref_iterator *ref_
      
      static int merge_ref_iterator_seek(struct ref_iterator *ref_iterator,
     -				   const char *prefix)
    -+				   const char *seek, int set_prefix)
    ++				   const char *seek, unsigned int flags)
      {
      	struct merge_ref_iterator *iter =
      		(struct merge_ref_iterator *)ref_iterator;
    @@ refs/iterator.c: static int merge_ref_iterator_seek(struct ref_iterator *ref_ite
      	iter->iter1 = iter->iter1_owned;
      
     -	ret = ref_iterator_seek(iter->iter0, prefix);
    -+	ret = ref_iterator_seek(iter->iter0, seek, set_prefix);
    ++	ret = ref_iterator_seek(iter->iter0, seek, flags);
      	if (ret < 0)
      		return ret;
      
     -	ret = ref_iterator_seek(iter->iter1, prefix);
    -+	ret = ref_iterator_seek(iter->iter1, seek, set_prefix);
    ++	ret = ref_iterator_seek(iter->iter1, seek, flags);
      	if (ret < 0)
      		return ret;
      
    @@ refs/iterator.c: static int prefix_ref_iterator_advance(struct ref_iterator *ref
      
      static int prefix_ref_iterator_seek(struct ref_iterator *ref_iterator,
     -				    const char *prefix)
    -+				    const char *seek, int set_prefix)
    ++				    const char *seek, unsigned int flags)
      {
      	struct prefix_ref_iterator *iter =
      		(struct prefix_ref_iterator *)ref_iterator;
    @@ refs/iterator.c: static int prefix_ref_iterator_advance(struct ref_iterator *ref
     -	iter->prefix = xstrdup_or_null(prefix);
     -	return ref_iterator_seek(iter->iter0, prefix);
     +
    -+	if (set_prefix) {
    ++	if (flags & REF_ITERATOR_SEEK_SET_PREFIX) {
     +		free(iter->prefix);
     +		iter->prefix = xstrdup_or_null(seek);
     +	}
    -+	return ref_iterator_seek(iter->iter0, seek, set_prefix);
    ++	return ref_iterator_seek(iter->iter0, seek, flags);
      }
      
      static int prefix_ref_iterator_peel(struct ref_iterator *ref_iterator,
    @@ refs/packed-backend.c: static int packed_ref_iterator_advance(struct ref_iterato
      
      static int packed_ref_iterator_seek(struct ref_iterator *ref_iterator,
     -				    const char *prefix)
    -+				    const char *seek, int set_prefix)
    ++				    const char *seek, unsigned int flags)
      {
      	struct packed_ref_iterator *iter =
      		(struct packed_ref_iterator *)ref_iterator;
    @@ refs/packed-backend.c: static int packed_ref_iterator_advance(struct ref_iterato
      
     -	free(iter->prefix);
     -	iter->prefix = xstrdup_or_null(prefix);
    -+	if (set_prefix) {
    -+		free(iter->prefix);
    ++	/* Unset any previously set prefix */
    ++	FREE_AND_NULL(iter->prefix);
    ++
    ++	if (flags & REF_ITERATOR_SEEK_SET_PREFIX)
     +		iter->prefix = xstrdup_or_null(seek);
    -+	}
     +
      	iter->pos = start;
      	iter->eof = iter->snapshot->eof;
    @@ refs/packed-backend.c: static struct ref_iterator *packed_ref_iterator_begin(
      	iter->flags = flags;
      
     -	if (packed_ref_iterator_seek(&iter->base, prefix) < 0) {
    -+	if (packed_ref_iterator_seek(&iter->base, prefix, 1) < 0) {
    ++	if (packed_ref_iterator_seek(&iter->base, prefix,
    ++				     REF_ITERATOR_SEEK_SET_PREFIX) < 0) {
      		ref_iterator_free(&iter->base);
      		return NULL;
      	}
    @@ refs/ref-cache.c: static int cache_ref_iterator_seek(struct ref_iterator *ref_it
      }
      
     +static int cache_ref_iterator_seek(struct ref_iterator *ref_iterator,
    -+				   const char *seek, int set_prefix)
    ++				   const char *seek, unsigned int flags)
     +{
     +	struct cache_ref_iterator *iter =
     +		(struct cache_ref_iterator *)ref_iterator;
     +
    -+	if (set_prefix) {
    ++	if (flags & REF_ITERATOR_SEEK_SET_PREFIX) {
     +		return cache_ref_iterator_set_prefix(iter, seek);
     +	} else if (seek && *seek) {
     +		struct cache_ref_iterator_level *level;
    @@ refs/ref-cache.c: static int cache_ref_iterator_seek(struct ref_iterator *ref_it
     +		level->index = -1;
     +		level->dir = dir;
     +
    ++		/* Unset any previously set prefix */
    ++		FREE_AND_NULL(iter->prefix);
    ++
     +		/*
     +		 * Breakdown the provided seek path and assign the correct
     +		 * indexing to each level as needed.
    @@ refs/ref-cache.c: struct ref_iterator *cache_ref_iterator_begin(struct ref_cache
      	iter->prime_dir = prime_dir;
      
     -	if (cache_ref_iterator_seek(&iter->base, prefix) < 0) {
    -+	if (cache_ref_iterator_seek(&iter->base, prefix, 1) < 0) {
    ++	if (cache_ref_iterator_seek(&iter->base, prefix,
    ++				    REF_ITERATOR_SEEK_SET_PREFIX) < 0) {
      		ref_iterator_free(&iter->base);
      		return NULL;
      	}
    @@ refs/refs-internal.h: void base_ref_iterator_init(struct ref_iterator *iter,
       */
      typedef int ref_iterator_seek_fn(struct ref_iterator *ref_iterator,
     -				 const char *prefix);
    -+				 const char *seek, int set_prefix);
    ++				 const char *seek, unsigned int flags);
      
      /*
       * Peels the current ref, returning 0 for success or -1 for failure.
    @@ refs/reftable-backend.c: static int reftable_ref_iterator_advance(struct ref_ite
      
      static int reftable_ref_iterator_seek(struct ref_iterator *ref_iterator,
     -				      const char *prefix)
    -+				      const char *seek, int set_prefix)
    ++				      const char *seek, unsigned int flags)
      {
      	struct reftable_ref_iterator *iter =
      		(struct reftable_ref_iterator *)ref_iterator;
    @@ refs/reftable-backend.c: static int reftable_ref_iterator_advance(struct ref_ite
     -	iter->prefix = xstrdup_or_null(prefix);
     -	iter->prefix_len = prefix ? strlen(prefix) : 0;
     -	iter->err = reftable_iterator_seek_ref(&iter->iter, prefix);
    -+	if (set_prefix) {
    -+		free(iter->prefix);
    ++	/* Unset any previously set prefix */
    ++	FREE_AND_NULL(iter->prefix);
    ++	iter->prefix_len = 0;
    ++
    ++	if (flags & REF_ITERATOR_SEEK_SET_PREFIX) {
     +		iter->prefix = xstrdup_or_null(seek);
     +		iter->prefix_len = seek ? strlen(seek) : 0;
     +	}
    @@ refs/reftable-backend.c: static struct reftable_ref_iterator *ref_iterator_for_s
      		goto done;
      
     -	ret = reftable_ref_iterator_seek(&iter->base, prefix);
    -+	ret = reftable_ref_iterator_seek(&iter->base, prefix, 1);
    ++	ret = reftable_ref_iterator_seek(&iter->base, prefix,
    ++					 REF_ITERATOR_SEEK_SET_PREFIX);
      	if (ret)
      		goto done;
      
    @@ refs/reftable-backend.c: static int reftable_reflog_iterator_advance(struct ref_
      static int reftable_reflog_iterator_seek(struct ref_iterator *ref_iterator UNUSED,
     -					 const char *prefix UNUSED)
     +					 const char *seek UNUSED,
    -+					 int set_prefix UNUSED)
    ++					 unsigned int flags UNUSED)
      {
      	BUG("reftable reflog iterator cannot be seeked");
      	return -1;
4:  0cfc879a93 ! 4:  dc6ffa9002 for-each-ref: introduce a '--skip-until' option
    @@ Documentation/git-for-each-ref.adoc: TAB %(refname)`.
      	List root refs (HEAD and pseudorefs) apart from regular refs.
      
     +--skip-until::
    -+    Skip references up to the specified pattern. Cannot be used with
    -+    general pattern matching.
    ++    Skip references up to but excluding the specified pattern. Cannot be used
    ++    with general pattern matching or custom sort options.
     +
      FIELD NAMES
      -----------
    @@ builtin/for-each-ref.c: int cmd_for_each_ref(int argc,
      		OPT__COLOR(&format.use_color, N_("respect format colors")),
      		OPT_REF_FILTER_EXCLUDE(&filter),
      		OPT_REF_SORT(&sorting_options),
    +@@ builtin/for-each-ref.c: int cmd_for_each_ref(int argc,
    + 	if (verify_ref_format(&format))
    + 		usage_with_options(for_each_ref_usage, opts);
    + 
    ++	if (filter.seek && sorting_options.nr > 1)
    ++		die(_("cannot use --skip-until custom sort options"));
    ++
    + 	sorting = ref_sorting_options(&sorting_options);
    + 	ref_sorting_set_sort_flags_all(sorting, REF_SORTING_ICASE, icase);
    + 	filter.ignore_case = icase;
     @@ builtin/for-each-ref.c: int cmd_for_each_ref(int argc,
      		filter.name_patterns = argv;
      	}
    @@ ref-filter.c: static int for_each_fullref_in_pattern(struct ref_filter *filter,
      
      /*
     @@ ref-filter.c: static int do_filter_refs(struct ref_filter *filter, unsigned int type, each_ref
    - 	if (!filter->kind)
    + 	init_contains_cache(&filter->internal.no_contains_cache);
    + 
    + 	/*  Simple per-ref filtering */
    +-	if (!filter->kind)
    ++	if (!filter->kind) {
      		die("filter_refs: invalid type");
    - 	else {
    +-	else {
    ++	} else {
     +		const char *prefix = NULL;
     +
      		/*
    @@ t/t6302-for-each-ref-filter.sh: test_expect_success 'validate worktree atom' '
     +	cat >expect <<-\EOF &&
     +	fatal: cannot use --skip-until with patterns
     +	EOF
    -+	test_must_fail git for-each-ref --format="%(refname)" --skip-until=refs/odd/spot refs/tags  2>actual &&
    ++	test_must_fail git for-each-ref --format="%(refname)" --skip-until=refs/odd/spot refs/tags 2>actual &&
    ++	test_cmp expect actual
    ++'
    ++
    ++test_expect_success 'skip until used with custom sort order' '
    ++	cat >expect <<-\EOF &&
    ++	fatal: cannot use --skip-until custom sort options
    ++	EOF
    ++	test_must_fail git for-each-ref --format="%(refname)" --skip-until=refs/odd/spot --sort=author 2>actual &&
     +	test_cmp expect actual
     +'
     +
base-commit: cf6f63ea6bf35173e02e18bdc6a4ba41288acff9
change-id: 20250605-306-git-for-each-ref-pagination-0ba8a29ae646
Thanks
- Karthik
^ permalink raw reply	[flat|nested] 102+ messages in thread
* [PATCH v2 1/4] refs: expose `ref_iterator` via 'refs.h'
  2025-07-04 13:02 ` [PATCH v2 " Karthik Nayak
@ 2025-07-04 13:02   ` Karthik Nayak
  2025-07-04 13:02   ` [PATCH v2 2/4] ref-cache: remove unused function 'find_ref_entry()' Karthik Nayak
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 102+ messages in thread
From: Karthik Nayak @ 2025-07-04 13:02 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak, gitster, ps
The `ref_iterator` is an internal structure to the 'refs/'
sub-directory, which allows iteration over refs. All reference iteration
is built on top of these iterators.
External clients of the 'refs' subsystem use the various
'refs_for_each...()' functions to iterate over refs. However since these
are wrapper functions, each combination of functionality requires a new
wrapper function. This is not feasible as the functions pile up with the
increase in requirements. Expose the internal reference iterator, so
advanced users can mix and match options as needed.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
 refs.h               | 147 +++++++++++++++++++++++++++++++++++++++++++++++++++
 refs/refs-internal.h | 145 +-------------------------------------------------
 2 files changed, 149 insertions(+), 143 deletions(-)
diff --git a/refs.h b/refs.h
index 46a6008e07..7c21aaef3d 100644
--- a/refs.h
+++ b/refs.h
@@ -1190,4 +1190,151 @@ int repo_migrate_ref_storage_format(struct repository *repo,
 				    unsigned int flags,
 				    struct strbuf *err);
 
+/*
+ * Reference iterators
+ *
+ * A reference iterator encapsulates the state of an in-progress
+ * iteration over references. Create an instance of `struct
+ * ref_iterator` via one of the functions in this module.
+ *
+ * A freshly-created ref_iterator doesn't yet point at a reference. To
+ * advance the iterator, call ref_iterator_advance(). If successful,
+ * this sets the iterator's refname, oid, and flags fields to describe
+ * the next reference and returns ITER_OK. The data pointed at by
+ * refname and oid belong to the iterator; if you want to retain them
+ * after calling ref_iterator_advance() again or calling
+ * ref_iterator_free(), you must make a copy. When the iteration has
+ * been exhausted, ref_iterator_advance() releases any resources
+ * associated with the iteration, frees the ref_iterator object, and
+ * returns ITER_DONE. If you want to abort the iteration early, call
+ * ref_iterator_free(), which also frees the ref_iterator object and
+ * any associated resources. If there was an internal error advancing
+ * to the next entry, ref_iterator_advance() aborts the iteration,
+ * frees the ref_iterator, and returns ITER_ERROR.
+ *
+ * The reference currently being looked at can be peeled by calling
+ * ref_iterator_peel(). This function is often faster than peel_ref(),
+ * so it should be preferred when iterating over references.
+ *
+ * Putting it all together, a typical iteration looks like this:
+ *
+ *     int ok;
+ *     struct ref_iterator *iter = ...;
+ *
+ *     while ((ok = ref_iterator_advance(iter)) == ITER_OK) {
+ *             if (want_to_stop_iteration()) {
+ *                     ok = ITER_DONE;
+ *                     break;
+ *             }
+ *
+ *             // Access information about the current reference:
+ *             if (!(iter->flags & REF_ISSYMREF))
+ *                     printf("%s is %s\n", iter->refname, oid_to_hex(iter->oid));
+ *
+ *             // If you need to peel the reference:
+ *             ref_iterator_peel(iter, &oid);
+ *     }
+ *
+ *     if (ok != ITER_DONE)
+ *             handle_error();
+ *     ref_iterator_free(iter);
+ */
+struct ref_iterator;
+
+/*
+ * These flags are passed to refs_ref_iterator_begin() (and do_for_each_ref(),
+ * which feeds it).
+ */
+enum do_for_each_ref_flags {
+	/*
+	 * Include broken references in a do_for_each_ref*() iteration, which
+	 * would normally be omitted. This includes both refs that point to
+	 * missing objects (a true repository corruption), ones with illegal
+	 * names (which we prefer not to expose to callers), as well as
+	 * dangling symbolic refs (i.e., those that point to a non-existent
+	 * ref; this is not a corruption, but as they have no valid oid, we
+	 * omit them from normal iteration results).
+	 */
+	DO_FOR_EACH_INCLUDE_BROKEN = (1 << 0),
+
+	/*
+	 * Only include per-worktree refs in a do_for_each_ref*() iteration.
+	 * Normally this will be used with a files ref_store, since that's
+	 * where all reference backends will presumably store their
+	 * per-worktree refs.
+	 */
+	DO_FOR_EACH_PER_WORKTREE_ONLY = (1 << 1),
+
+	/*
+	 * Omit dangling symrefs from output; this only has an effect with
+	 * INCLUDE_BROKEN, since they are otherwise not included at all.
+	 */
+	DO_FOR_EACH_OMIT_DANGLING_SYMREFS = (1 << 2),
+
+	/*
+	 * Include root refs i.e. HEAD and pseudorefs along with the regular
+	 * refs.
+	 */
+	DO_FOR_EACH_INCLUDE_ROOT_REFS = (1 << 3),
+};
+
+/*
+ * Return an iterator that goes over each reference in `refs` for
+ * which the refname begins with prefix. If trim is non-zero, then
+ * trim that many characters off the beginning of each refname.
+ * The output is ordered by refname.
+ */
+struct ref_iterator *refs_ref_iterator_begin(
+	struct ref_store *refs,
+	const char *prefix, const char **exclude_patterns,
+	int trim, enum do_for_each_ref_flags flags);
+
+/*
+ * Advance the iterator to the first or next item and return ITER_OK.
+ * If the iteration is exhausted, free the resources associated with
+ * the ref_iterator and return ITER_DONE. On errors, free the iterator
+ * resources and return ITER_ERROR. It is a bug to use ref_iterator or
+ * call this function again after it has returned ITER_DONE or
+ * ITER_ERROR.
+ */
+int ref_iterator_advance(struct ref_iterator *ref_iterator);
+
+/*
+ * Seek the iterator to the first reference with the given prefix.
+ * The prefix is matched as a literal string, without regard for path
+ * separators. If prefix is NULL or the empty string, seek the iterator to the
+ * first reference again.
+ *
+ * This function is expected to behave as if a new ref iterator with the same
+ * prefix had been created, but allows reuse of iterators and thus may allow
+ * the backend to optimize. Parameters other than the prefix that have been
+ * passed when creating the iterator will remain unchanged.
+ *
+ * Returns 0 on success, a negative error code otherwise.
+ */
+int ref_iterator_seek(struct ref_iterator *ref_iterator,
+		      const char *prefix);
+
+/*
+ * If possible, peel the reference currently being viewed by the
+ * iterator. Return 0 on success.
+ */
+int ref_iterator_peel(struct ref_iterator *ref_iterator,
+		      struct object_id *peeled);
+
+/* Free the reference iterator and any associated resources. */
+void ref_iterator_free(struct ref_iterator *ref_iterator);
+
+/*
+ * The common backend for the for_each_*ref* functions. Call fn for
+ * each reference in iter. If the iterator itself ever returns
+ * ITER_ERROR, return -1. If fn ever returns a non-zero value, stop
+ * the iteration and return that value. Otherwise, return 0. In any
+ * case, free the iterator when done. This function is basically an
+ * adapter between the callback style of reference iteration and the
+ * iterator style.
+ */
+int do_for_each_ref_iterator(struct ref_iterator *iter,
+			     each_ref_fn fn, void *cb_data);
+
 #endif /* REFS_H */
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index f868870851..03f5df04d5 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -244,90 +244,8 @@ const char *find_descendant_ref(const char *dirname,
 #define SYMREF_MAXDEPTH 5
 
 /*
- * These flags are passed to refs_ref_iterator_begin() (and do_for_each_ref(),
- * which feeds it).
- */
-enum do_for_each_ref_flags {
-	/*
-	 * Include broken references in a do_for_each_ref*() iteration, which
-	 * would normally be omitted. This includes both refs that point to
-	 * missing objects (a true repository corruption), ones with illegal
-	 * names (which we prefer not to expose to callers), as well as
-	 * dangling symbolic refs (i.e., those that point to a non-existent
-	 * ref; this is not a corruption, but as they have no valid oid, we
-	 * omit them from normal iteration results).
-	 */
-	DO_FOR_EACH_INCLUDE_BROKEN = (1 << 0),
-
-	/*
-	 * Only include per-worktree refs in a do_for_each_ref*() iteration.
-	 * Normally this will be used with a files ref_store, since that's
-	 * where all reference backends will presumably store their
-	 * per-worktree refs.
-	 */
-	DO_FOR_EACH_PER_WORKTREE_ONLY = (1 << 1),
-
-	/*
-	 * Omit dangling symrefs from output; this only has an effect with
-	 * INCLUDE_BROKEN, since they are otherwise not included at all.
-	 */
-	DO_FOR_EACH_OMIT_DANGLING_SYMREFS = (1 << 2),
-
-	/*
-	 * Include root refs i.e. HEAD and pseudorefs along with the regular
-	 * refs.
-	 */
-	DO_FOR_EACH_INCLUDE_ROOT_REFS = (1 << 3),
-};
-
-/*
- * Reference iterators
- *
- * A reference iterator encapsulates the state of an in-progress
- * iteration over references. Create an instance of `struct
- * ref_iterator` via one of the functions in this module.
- *
- * A freshly-created ref_iterator doesn't yet point at a reference. To
- * advance the iterator, call ref_iterator_advance(). If successful,
- * this sets the iterator's refname, oid, and flags fields to describe
- * the next reference and returns ITER_OK. The data pointed at by
- * refname and oid belong to the iterator; if you want to retain them
- * after calling ref_iterator_advance() again or calling
- * ref_iterator_free(), you must make a copy. When the iteration has
- * been exhausted, ref_iterator_advance() releases any resources
- * associated with the iteration, frees the ref_iterator object, and
- * returns ITER_DONE. If you want to abort the iteration early, call
- * ref_iterator_free(), which also frees the ref_iterator object and
- * any associated resources. If there was an internal error advancing
- * to the next entry, ref_iterator_advance() aborts the iteration,
- * frees the ref_iterator, and returns ITER_ERROR.
- *
- * The reference currently being looked at can be peeled by calling
- * ref_iterator_peel(). This function is often faster than peel_ref(),
- * so it should be preferred when iterating over references.
- *
- * Putting it all together, a typical iteration looks like this:
- *
- *     int ok;
- *     struct ref_iterator *iter = ...;
- *
- *     while ((ok = ref_iterator_advance(iter)) == ITER_OK) {
- *             if (want_to_stop_iteration()) {
- *                     ok = ITER_DONE;
- *                     break;
- *             }
- *
- *             // Access information about the current reference:
- *             if (!(iter->flags & REF_ISSYMREF))
- *                     printf("%s is %s\n", iter->refname, oid_to_hex(iter->oid));
- *
- *             // If you need to peel the reference:
- *             ref_iterator_peel(iter, &oid);
- *     }
- *
- *     if (ok != ITER_DONE)
- *             handle_error();
- *     ref_iterator_free(iter);
+ * Data structure for holding a reference iterator. See refs.h for
+ * more details and usage instructions.
  */
 struct ref_iterator {
 	struct ref_iterator_vtable *vtable;
@@ -337,42 +255,6 @@ struct ref_iterator {
 	unsigned int flags;
 };
 
-/*
- * Advance the iterator to the first or next item and return ITER_OK.
- * If the iteration is exhausted, free the resources associated with
- * the ref_iterator and return ITER_DONE. On errors, free the iterator
- * resources and return ITER_ERROR. It is a bug to use ref_iterator or
- * call this function again after it has returned ITER_DONE or
- * ITER_ERROR.
- */
-int ref_iterator_advance(struct ref_iterator *ref_iterator);
-
-/*
- * Seek the iterator to the first reference with the given prefix.
- * The prefix is matched as a literal string, without regard for path
- * separators. If prefix is NULL or the empty string, seek the iterator to the
- * first reference again.
- *
- * This function is expected to behave as if a new ref iterator with the same
- * prefix had been created, but allows reuse of iterators and thus may allow
- * the backend to optimize. Parameters other than the prefix that have been
- * passed when creating the iterator will remain unchanged.
- *
- * Returns 0 on success, a negative error code otherwise.
- */
-int ref_iterator_seek(struct ref_iterator *ref_iterator,
-		      const char *prefix);
-
-/*
- * If possible, peel the reference currently being viewed by the
- * iterator. Return 0 on success.
- */
-int ref_iterator_peel(struct ref_iterator *ref_iterator,
-		      struct object_id *peeled);
-
-/* Free the reference iterator and any associated resources. */
-void ref_iterator_free(struct ref_iterator *ref_iterator);
-
 /*
  * An iterator over nothing (its first ref_iterator_advance() call
  * returns ITER_DONE).
@@ -384,17 +266,6 @@ struct ref_iterator *empty_ref_iterator_begin(void);
  */
 int is_empty_ref_iterator(struct ref_iterator *ref_iterator);
 
-/*
- * Return an iterator that goes over each reference in `refs` for
- * which the refname begins with prefix. If trim is non-zero, then
- * trim that many characters off the beginning of each refname.
- * The output is ordered by refname.
- */
-struct ref_iterator *refs_ref_iterator_begin(
-		struct ref_store *refs,
-		const char *prefix, const char **exclude_patterns,
-		int trim, enum do_for_each_ref_flags flags);
-
 /*
  * A callback function used to instruct merge_ref_iterator how to
  * interleave the entries from iter0 and iter1. The function should
@@ -520,18 +391,6 @@ struct ref_iterator_vtable {
  */
 extern struct ref_iterator *current_ref_iter;
 
-/*
- * The common backend for the for_each_*ref* functions. Call fn for
- * each reference in iter. If the iterator itself ever returns
- * ITER_ERROR, return -1. If fn ever returns a non-zero value, stop
- * the iteration and return that value. Otherwise, return 0. In any
- * case, free the iterator when done. This function is basically an
- * adapter between the callback style of reference iteration and the
- * iterator style.
- */
-int do_for_each_ref_iterator(struct ref_iterator *iter,
-			     each_ref_fn fn, void *cb_data);
-
 struct ref_store;
 
 /* refs backends */
-- 
2.49.0
* [PATCH v2 2/4] ref-cache: remove unused function 'find_ref_entry()'
  2025-07-04 13:02 ` [PATCH v2 " Karthik Nayak
  2025-07-04 13:02   ` [PATCH v2 1/4] refs: expose `ref_iterator` via 'refs.h' Karthik Nayak
@ 2025-07-04 13:02   ` Karthik Nayak
  2025-07-04 13:02   ` [PATCH v2 3/4] refs: selectively set prefix in the seek functions Karthik Nayak
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 102+ messages in thread
From: Karthik Nayak @ 2025-07-04 13:02 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak, gitster, ps
The 'find_ref_entry' function is no longer used, so remove it.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
 refs/ref-cache.c | 14 --------------
 refs/ref-cache.h |  7 -------
 2 files changed, 21 deletions(-)
diff --git a/refs/ref-cache.c b/refs/ref-cache.c
index c1f1bab1d5..8aaffa8c6b 100644
--- a/refs/ref-cache.c
+++ b/refs/ref-cache.c
@@ -194,20 +194,6 @@ static struct ref_dir *find_containing_dir(struct ref_dir *dir,
 	return dir;
 }
 
-struct ref_entry *find_ref_entry(struct ref_dir *dir, const char *refname)
-{
-	int entry_index;
-	struct ref_entry *entry;
-	dir = find_containing_dir(dir, refname);
-	if (!dir)
-		return NULL;
-	entry_index = search_ref_dir(dir, refname, strlen(refname));
-	if (entry_index == -1)
-		return NULL;
-	entry = dir->entries[entry_index];
-	return (entry->flag & REF_DIR) ? NULL : entry;
-}
-
 /*
  * Emit a warning and return true iff ref1 and ref2 have the same name
  * and the same oid. Die if they have the same name but different
diff --git a/refs/ref-cache.h b/refs/ref-cache.h
index 5f04e518c3..f635d2d824 100644
--- a/refs/ref-cache.h
+++ b/refs/ref-cache.h
@@ -201,13 +201,6 @@ void free_ref_cache(struct ref_cache *cache);
  */
 void add_entry_to_dir(struct ref_dir *dir, struct ref_entry *entry);
 
-/*
- * Find the value entry with the given name in dir, sorting ref_dirs
- * and recursing into subdirectories as necessary.  If the name is not
- * found or it corresponds to a directory entry, return NULL.
- */
-struct ref_entry *find_ref_entry(struct ref_dir *dir, const char *refname);
-
 /*
  * Start iterating over references in `cache`. If `prefix` is
  * specified, only include references whose names start with that
-- 
2.49.0
* [PATCH v2 3/4] refs: selectively set prefix in the seek functions
  2025-07-04 13:02 ` [PATCH v2 " Karthik Nayak
  2025-07-04 13:02   ` [PATCH v2 1/4] refs: expose `ref_iterator` via 'refs.h' Karthik Nayak
  2025-07-04 13:02   ` [PATCH v2 2/4] ref-cache: remove unused function 'find_ref_entry()' Karthik Nayak
@ 2025-07-04 13:02   ` Karthik Nayak
  2025-07-04 13:02   ` [PATCH v2 4/4] for-each-ref: introduce a '--skip-until' option Karthik Nayak
  2025-07-04 13:41   ` [PATCH v2 0/4] for-each-ref: introduce seeking functionality via '--skip-until' Andreas Schwab
  4 siblings, 0 replies; 102+ messages in thread
From: Karthik Nayak @ 2025-07-04 13:02 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak, gitster, ps
The ref iterator exposes a `ref_iterator_seek()` function. The name
suggests that this would seek the iterator to a specific reference, in
some ways similar to how `fseek()` works for the filesystem.

However, the function actually sets the prefix for refs iteration, so
further iteration only yields references which match that prefix. This
is a bit confusing.

Let's add a 'flags' parameter to the function. When the
'REF_ITERATOR_SEEK_SET_PREFIX' flag is set, the function sets the prefix
for the iteration, in line with the existing behavior. Otherwise, the
reference backends simply seek to the specified reference and clear any
previously set prefix. This allows users to start iteration from a
specific reference.

In the packed and reftable backends, since references are available in a
sorted list, the changes simply set the prefix if needed. The changes in
the files backend are a little more involved, since the files backend
uses the 'ref-cache' mechanism. We move the existing logic out of
`cache_ref_iterator_seek()` into `cache_ref_iterator_set_prefix()`,
which is called when the 'REF_ITERATOR_SEEK_SET_PREFIX' flag is set. We
then parse the provided seek string and set the required levels and
their indexes to ensure that seeking is possible.

Helped-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
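The difference between the two seek modes described above can be
sketched with a toy model. This is only an illustration under stated
assumptions: `toy_iter`, `toy_seek()`, `toy_advance()`, `toy_count()`
and `TOY_SEEK_SET_PREFIX` are hypothetical names that stand in for the
real iterator, and the refs are assumed to live in a flat array sorted
with strcmp(), which is not how the files backend stores them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for REF_ITERATOR_SEEK_SET_PREFIX; not git's definition. */
enum toy_seek_flag { TOY_SEEK_SET_PREFIX = 1 << 0 };

/* Hypothetical iterator over a sorted array of refnames. */
struct toy_iter {
	const char **refs;  /* sorted with strcmp() */
	size_t nr;
	size_t pos;         /* index of the next ref to yield */
	const char *prefix; /* NULL means "no prefix filter" */
};

/* Position at the first ref >= seek; optionally keep seek as a prefix. */
void toy_seek(struct toy_iter *it, const char *seek, unsigned int flags)
{
	assert(it);
	it->pos = 0;
	it->prefix = NULL; /* always drop any previously set prefix */
	if (seek && *seek) {
		while (it->pos < it->nr && strcmp(it->refs[it->pos], seek) < 0)
			it->pos++;
		if (flags & TOY_SEEK_SET_PREFIX)
			it->prefix = seek;
	}
}

/* Return the next refname, or NULL once the iteration is exhausted. */
const char *toy_advance(struct toy_iter *it)
{
	if (it->pos >= it->nr)
		return NULL;
	/* Sorted input: the first ref past the prefix ends the iteration. */
	if (it->prefix &&
	    strncmp(it->refs[it->pos], it->prefix, strlen(it->prefix)))
		return NULL;
	return it->refs[it->pos++];
}

/* Count how many refs a seek with the given flags would yield. */
size_t toy_count(const char **refs, size_t nr, const char *seek,
		 unsigned int flags)
{
	struct toy_iter it = { refs, nr, 0, NULL };
	size_t n = 0;

	toy_seek(&it, seek, flags);
	while (toy_advance(&it))
		n++;
	return n;
}
```

In this model, seeking to "refs/heads/b" without the flag yields that
ref and everything sorting after it, including tags, whereas seeking to
"refs/heads/" with the flag yields only the branches.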
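 refs.c                  |  6 ++--
 refs.h                  | 29 +++++++++++------
 refs/debug.c            |  7 ++--
 refs/files-backend.c    |  7 ++--
 refs/iterator.c         | 26 ++++++++-------
 refs/packed-backend.c   | 17 ++++++----
 refs/ref-cache.c        | 85 ++++++++++++++++++++++++++++++++++++++++++++++---
 refs/refs-internal.h    |  7 ++--
 refs/reftable-backend.c | 21 ++++++++----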
 9 files changed, 155 insertions(+), 50 deletions(-)
diff --git a/refs.c b/refs.c
index dce5c49ca2..243e6898b8 100644
--- a/refs.c
+++ b/refs.c
@@ -2666,12 +2666,12 @@ enum ref_transaction_error refs_verify_refnames_available(struct ref_store *refs
 		if (!initial_transaction) {
 			int ok;
 
-			if (!iter) {
+			if (!iter)
 				iter = refs_ref_iterator_begin(refs, dirname.buf, NULL, 0,
 							       DO_FOR_EACH_INCLUDE_BROKEN);
-			} else if (ref_iterator_seek(iter, dirname.buf) < 0) {
+			else if (ref_iterator_seek(iter, dirname.buf,
+						   REF_ITERATOR_SEEK_SET_PREFIX) < 0)
 				goto cleanup;
-			}
 
 			while ((ok = ref_iterator_advance(iter)) == ITER_OK) {
 				if (skip &&
diff --git a/refs.h b/refs.h
index 7c21aaef3d..7852ad36f3 100644
--- a/refs.h
+++ b/refs.h
@@ -1299,21 +1299,32 @@ struct ref_iterator *refs_ref_iterator_begin(
  */
 int ref_iterator_advance(struct ref_iterator *ref_iterator);
 
+enum ref_iterator_seek_flag {
+	/*
+	 * Also set the seek pattern as a prefix for iteration. This ensures
+	 * that only references which match the prefix are yielded.
+	 */
+	REF_ITERATOR_SEEK_SET_PREFIX = (1 << 0),
+};
+
 /*
- * Seek the iterator to the first reference with the given prefix.
- * The prefix is matched as a literal string, without regard for path
- * separators. If prefix is NULL or the empty string, seek the iterator to the
+ * Seek the iterator to the first reference matching the given seek string.
+ * The seek string is matched as a literal string, without regard for path
+ * separators. If seek is NULL or the empty string, seek the iterator to the
  * first reference again.
  *
- * This function is expected to behave as if a new ref iterator with the same
- * prefix had been created, but allows reuse of iterators and thus may allow
- * the backend to optimize. Parameters other than the prefix that have been
- * passed when creating the iterator will remain unchanged.
+ * This function is expected to behave as if a new ref iterator has been
+ * created, but allows reuse of existing iterators for optimization.
+ *
+ * When the REF_ITERATOR_SEEK_SET_PREFIX flag is set, the iterator's prefix is
+ * updated to match the seek string, affecting all subsequent iterations. If
+ * not, the iterator seeks to the specified reference and clears any previously
+ * set prefix.
  *
  * Returns 0 on success, a negative error code otherwise.
  */
-int ref_iterator_seek(struct ref_iterator *ref_iterator,
-		      const char *prefix);
+int ref_iterator_seek(struct ref_iterator *ref_iterator, const char *seek,
+		      unsigned int flags);
 
 /*
  * If possible, peel the reference currently being viewed by the
diff --git a/refs/debug.c b/refs/debug.c
index 485e3079d7..2ed8cff2aa 100644
--- a/refs/debug.c
+++ b/refs/debug.c
@@ -170,12 +170,13 @@ static int debug_ref_iterator_advance(struct ref_iterator *ref_iterator)
 }
 
 static int debug_ref_iterator_seek(struct ref_iterator *ref_iterator,
-				   const char *prefix)
+				   const char *seek, unsigned int flags)
 {
 	struct debug_ref_iterator *diter =
 		(struct debug_ref_iterator *)ref_iterator;
-	int res = diter->iter->vtable->seek(diter->iter, prefix);
-	trace_printf_key(&trace_refs, "iterator_seek: %s: %d\n", prefix ? prefix : "", res);
+	int res = diter->iter->vtable->seek(diter->iter, seek, flags);
+	trace_printf_key(&trace_refs, "iterator_seek: %s flags: %u: %d\n",
+			 seek ? seek : "", flags, res);
 	return res;
 }
 
diff --git a/refs/files-backend.c b/refs/files-backend.c
index bf6f89b1d1..0e63013319 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -929,11 +929,11 @@ static int files_ref_iterator_advance(struct ref_iterator *ref_iterator)
 }
 
 static int files_ref_iterator_seek(struct ref_iterator *ref_iterator,
-				   const char *prefix)
+				   const char *seek, unsigned int flags)
 {
 	struct files_ref_iterator *iter =
 		(struct files_ref_iterator *)ref_iterator;
-	return ref_iterator_seek(iter->iter0, prefix);
+	return ref_iterator_seek(iter->iter0, seek, flags);
 }
 
 static int files_ref_iterator_peel(struct ref_iterator *ref_iterator,
@@ -2316,7 +2316,8 @@ static int files_reflog_iterator_advance(struct ref_iterator *ref_iterator)
 }
 
 static int files_reflog_iterator_seek(struct ref_iterator *ref_iterator UNUSED,
-				      const char *prefix UNUSED)
+				      const char *seek UNUSED,
+				      unsigned int flags UNUSED)
 {
 	BUG("ref_iterator_seek() called for reflog_iterator");
 }
diff --git a/refs/iterator.c b/refs/iterator.c
index 766d96e795..f2364bd6e7 100644
--- a/refs/iterator.c
+++ b/refs/iterator.c
@@ -15,10 +15,10 @@ int ref_iterator_advance(struct ref_iterator *ref_iterator)
 	return ref_iterator->vtable->advance(ref_iterator);
 }
 
-int ref_iterator_seek(struct ref_iterator *ref_iterator,
-		      const char *prefix)
+int ref_iterator_seek(struct ref_iterator *ref_iterator, const char *seek,
+		      unsigned int flags)
 {
-	return ref_iterator->vtable->seek(ref_iterator, prefix);
+	return ref_iterator->vtable->seek(ref_iterator, seek, flags);
 }
 
 int ref_iterator_peel(struct ref_iterator *ref_iterator,
@@ -57,7 +57,8 @@ static int empty_ref_iterator_advance(struct ref_iterator *ref_iterator UNUSED)
 }
 
 static int empty_ref_iterator_seek(struct ref_iterator *ref_iterator UNUSED,
-				   const char *prefix UNUSED)
+				   const char *seek UNUSED,
+				   unsigned int flags UNUSED)
 {
 	return 0;
 }
@@ -224,7 +225,7 @@ static int merge_ref_iterator_advance(struct ref_iterator *ref_iterator)
 }
 
 static int merge_ref_iterator_seek(struct ref_iterator *ref_iterator,
-				   const char *prefix)
+				   const char *seek, unsigned int flags)
 {
 	struct merge_ref_iterator *iter =
 		(struct merge_ref_iterator *)ref_iterator;
@@ -234,11 +235,11 @@ static int merge_ref_iterator_seek(struct ref_iterator *ref_iterator,
 	iter->iter0 = iter->iter0_owned;
 	iter->iter1 = iter->iter1_owned;
 
-	ret = ref_iterator_seek(iter->iter0, prefix);
+	ret = ref_iterator_seek(iter->iter0, seek, flags);
 	if (ret < 0)
 		return ret;
 
-	ret = ref_iterator_seek(iter->iter1, prefix);
+	ret = ref_iterator_seek(iter->iter1, seek, flags);
 	if (ret < 0)
 		return ret;
 
@@ -407,13 +408,16 @@ static int prefix_ref_iterator_advance(struct ref_iterator *ref_iterator)
 }
 
 static int prefix_ref_iterator_seek(struct ref_iterator *ref_iterator,
-				    const char *prefix)
+				    const char *seek, unsigned int flags)
 {
 	struct prefix_ref_iterator *iter =
 		(struct prefix_ref_iterator *)ref_iterator;
-	free(iter->prefix);
-	iter->prefix = xstrdup_or_null(prefix);
-	return ref_iterator_seek(iter->iter0, prefix);
+
+	if (flags & REF_ITERATOR_SEEK_SET_PREFIX) {
+		free(iter->prefix);
+		iter->prefix = xstrdup_or_null(seek);
+	}
+	return ref_iterator_seek(iter->iter0, seek, flags);
 }
 
 static int prefix_ref_iterator_peel(struct ref_iterator *ref_iterator,
diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 7fd73a0e6d..11a363d246 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -1004,19 +1004,23 @@ static int packed_ref_iterator_advance(struct ref_iterator *ref_iterator)
 }
 
 static int packed_ref_iterator_seek(struct ref_iterator *ref_iterator,
-				    const char *prefix)
+				    const char *seek, unsigned int flags)
 {
 	struct packed_ref_iterator *iter =
 		(struct packed_ref_iterator *)ref_iterator;
 	const char *start;
 
-	if (prefix && *prefix)
-		start = find_reference_location(iter->snapshot, prefix, 0);
+	if (seek && *seek)
+		start = find_reference_location(iter->snapshot, seek, 0);
 	else
 		start = iter->snapshot->start;
 
-	free(iter->prefix);
-	iter->prefix = xstrdup_or_null(prefix);
+	/* Unset any previously set prefix */
+	FREE_AND_NULL(iter->prefix);
+
+	if (flags & REF_ITERATOR_SEEK_SET_PREFIX)
+		iter->prefix = xstrdup_or_null(seek);
+
 	iter->pos = start;
 	iter->eof = iter->snapshot->eof;
 
@@ -1194,7 +1198,8 @@ static struct ref_iterator *packed_ref_iterator_begin(
 	iter->repo = ref_store->repo;
 	iter->flags = flags;
 
-	if (packed_ref_iterator_seek(&iter->base, prefix) < 0) {
+	if (packed_ref_iterator_seek(&iter->base, prefix,
+				     REF_ITERATOR_SEEK_SET_PREFIX) < 0) {
 		ref_iterator_free(&iter->base);
 		return NULL;
 	}
diff --git a/refs/ref-cache.c b/refs/ref-cache.c
index 8aaffa8c6b..01dfbeb50c 100644
--- a/refs/ref-cache.c
+++ b/refs/ref-cache.c
@@ -434,11 +434,9 @@ static int cache_ref_iterator_advance(struct ref_iterator *ref_iterator)
 	}
 }
 
-static int cache_ref_iterator_seek(struct ref_iterator *ref_iterator,
-				   const char *prefix)
+static int cache_ref_iterator_set_prefix(struct cache_ref_iterator *iter,
+					 const char *prefix)
 {
-	struct cache_ref_iterator *iter =
-		(struct cache_ref_iterator *)ref_iterator;
 	struct cache_ref_iterator_level *level;
 	struct ref_dir *dir;
 
@@ -469,6 +467,82 @@ static int cache_ref_iterator_seek(struct ref_iterator *ref_iterator,
 	return 0;
 }
 
+static int cache_ref_iterator_seek(struct ref_iterator *ref_iterator,
+				   const char *seek, unsigned int flags)
+{
+	struct cache_ref_iterator *iter =
+		(struct cache_ref_iterator *)ref_iterator;
+
+	if (flags & REF_ITERATOR_SEEK_SET_PREFIX) {
+		return cache_ref_iterator_set_prefix(iter, seek);
+	} else if (seek && *seek) {
+		struct cache_ref_iterator_level *level;
+		const char *slash = seek;
+		struct ref_dir *dir;
+
+		dir = get_ref_dir(iter->cache->root);
+
+		if (iter->prime_dir)
+			prime_ref_dir(dir, seek);
+
+		iter->levels_nr = 1;
+		level = &iter->levels[0];
+		level->index = -1;
+		level->dir = dir;
+
+		/* Unset any previously set prefix */
+		FREE_AND_NULL(iter->prefix);
+
+		/*
+		 * Break down the provided seek path and assign the correct
+		 * indexing to each level as needed.
+		 */
+		do {
+			int len, idx;
+			int cmp = 0;
+
+			sort_ref_dir(dir);
+
+			slash = strchr(slash, '/');
+			len = slash ? slash - seek : (int)strlen(seek);
+
+			for (idx = 0; idx < dir->nr; idx++) {
+				cmp = strncmp(seek, dir->entries[idx]->name, len);
+				if (cmp <= 0)
+					break;
+			}
+			/* don't overflow the index */
+			idx = idx >= dir->nr ? dir->nr - 1 : idx;
+
+			if (slash)
+				slash = slash + 1;
+
+			level->index = idx;
+			if (dir->entries[idx]->flag & REF_DIR) {
+				/* push down a level */
+				dir = get_ref_dir(dir->entries[idx]);
+
+				ALLOC_GROW(iter->levels, iter->levels_nr + 1,
+					   iter->levels_alloc);
+				level = &iter->levels[iter->levels_nr++];
+				level->dir = dir;
+				level->index = -1;
+			} else {
+				/* reduce the index so the leaf node is iterated over */
+				if (cmp <= 0 && !slash)
+					level->index = idx - 1;
+				/*
+				 * while the seek path may not be exhausted, our
+				 * match is exhausted at a leaf node.
+				 */
+				break;
+			}
+		} while (slash);
+	}
+
+	return 0;
+}
+
 static int cache_ref_iterator_peel(struct ref_iterator *ref_iterator,
 				   struct object_id *peeled)
 {
@@ -509,7 +583,8 @@ struct ref_iterator *cache_ref_iterator_begin(struct ref_cache *cache,
 	iter->cache = cache;
 	iter->prime_dir = prime_dir;
 
-	if (cache_ref_iterator_seek(&iter->base, prefix) < 0) {
+	if (cache_ref_iterator_seek(&iter->base, prefix,
+				    REF_ITERATOR_SEEK_SET_PREFIX) < 0) {
 		ref_iterator_free(&iter->base);
 		return NULL;
 	}
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index 03f5df04d5..6376a3b379 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -353,11 +353,12 @@ void base_ref_iterator_init(struct ref_iterator *iter,
 typedef int ref_iterator_advance_fn(struct ref_iterator *ref_iterator);
 
 /*
- * Seek the iterator to the first reference matching the given prefix. Should
- * behave the same as if a new iterator was created with the same prefix.
+ * Seek the iterator to the first matching reference. If the
+ * REF_ITERATOR_SEEK_SET_PREFIX flag is set, behave the same as if a new
+ * iterator was created with the seek string as its prefix.
  */
 typedef int ref_iterator_seek_fn(struct ref_iterator *ref_iterator,
-				 const char *prefix);
+				 const char *seek, unsigned int flags);
 
 /*
  * Peels the current ref, returning 0 for success or -1 for failure.
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index 4c3817f4ec..d627221b65 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -719,15 +719,20 @@ static int reftable_ref_iterator_advance(struct ref_iterator *ref_iterator)
 }
 
 static int reftable_ref_iterator_seek(struct ref_iterator *ref_iterator,
-				      const char *prefix)
+				      const char *seek, unsigned int flags)
 {
 	struct reftable_ref_iterator *iter =
 		(struct reftable_ref_iterator *)ref_iterator;
 
-	free(iter->prefix);
-	iter->prefix = xstrdup_or_null(prefix);
-	iter->prefix_len = prefix ? strlen(prefix) : 0;
-	iter->err = reftable_iterator_seek_ref(&iter->iter, prefix);
+	/* Unset any previously set prefix */
+	FREE_AND_NULL(iter->prefix);
+	iter->prefix_len = 0;
+
+	if (flags & REF_ITERATOR_SEEK_SET_PREFIX) {
+		iter->prefix = xstrdup_or_null(seek);
+		iter->prefix_len = seek ? strlen(seek) : 0;
+	}
+	iter->err = reftable_iterator_seek_ref(&iter->iter, seek);
 
 	return iter->err;
 }
@@ -839,7 +844,8 @@ static struct reftable_ref_iterator *ref_iterator_for_stack(struct reftable_ref_
 	if (ret)
 		goto done;
 
-	ret = reftable_ref_iterator_seek(&iter->base, prefix);
+	ret = reftable_ref_iterator_seek(&iter->base, prefix,
+					 REF_ITERATOR_SEEK_SET_PREFIX);
 	if (ret)
 		goto done;
 
@@ -2042,7 +2048,8 @@ static int reftable_reflog_iterator_advance(struct ref_iterator *ref_iterator)
 }
 
 static int reftable_reflog_iterator_seek(struct ref_iterator *ref_iterator UNUSED,
-					 const char *prefix UNUSED)
+					 const char *seek UNUSED,
+					 unsigned int flags UNUSED)
 {
 	BUG("reftable reflog iterator cannot be seeked");
 	return -1;
-- 
2.49.0
* [PATCH v2 4/4] for-each-ref: introduce a '--skip-until' option
  2025-07-04 13:02 ` [PATCH v2 " Karthik Nayak
                     ` (2 preceding siblings ...)
  2025-07-04 13:02   ` [PATCH v2 3/4] refs: selectively set prefix in the seek functions Karthik Nayak
@ 2025-07-04 13:02   ` Karthik Nayak
  2025-07-07 15:30     ` Junio C Hamano
  2025-07-04 13:41   ` [PATCH v2 0/4] for-each-ref: introduce seeking functionality via '--skip-until' Andreas Schwab
  4 siblings, 1 reply; 102+ messages in thread
From: Karthik Nayak @ 2025-07-04 13:02 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak, gitster, ps
The `git-for-each-ref(1)` command is used to iterate over references
present in a repository. In large repositories with millions of
references, it would be optimal to paginate this output such that we
can start iteration from a given reference. This would avoid having to
iterate over all references from the beginning each time when paginating
through results.

The previous commit added 'seek' functionality to the reference
backends. Utilize this and expose a '--skip-until' option in
'git-for-each-ref(1)'. When used, the reference iteration seeks to the
first matching reference and iterates from there onward.

This enables efficient pagination workflows like:

    git for-each-ref --count=100
    git for-each-ref --count=100 --skip-until=refs/heads/branch-100
    git for-each-ref --count=100 --skip-until=refs/heads/branch-200

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
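As a rough illustration of the pagination idea in the commit message,
the sketch below pages through a sorted list using a count and a
"skip until" marker. It is a toy model, not the implementation in this
patch: `toy_page()` is a hypothetical helper, and the refs are assumed
to be available as an array sorted with strcmp().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of git: copy up to `count` refnames
 * into `out`, starting at the first entry that sorts at or after
 * `skip_until`. A NULL or empty `skip_until` starts from the beginning.
 * `refs` must be sorted with strcmp().
 */
size_t toy_page(const char **refs, size_t nr, const char *skip_until,
		size_t count, const char **out)
{
	size_t pos = 0, n = 0;

	assert(refs && out);
	/* Skip every ref that sorts before the marker. */
	if (skip_until && *skip_until)
		while (pos < nr && strcmp(refs[pos], skip_until) < 0)
			pos++;
	/* Emit at most `count` refs from there onward. */
	while (pos < nr && n < count)
		out[n++] = refs[pos++];
	return n;
}
```

Note that in this model the marker itself is part of the page, matching
the "up to but excluding" wording in the documentation hunk below. A
caller that reuses the last ref of one page as the next marker would
therefore see that ref twice.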
  |   6 +-
               |   8 ++
                         |  61 ++++++++----
                         |   1 +
       | 188 ++++++++++++++++++++++++++++++++++++
 5 files changed, 243 insertions(+), 21 deletions(-)
diff --git a/Documentation/git-for-each-ref.adoc b/Documentation/git-for-each-ref.adoc
index 5ef89fc0fe..e369fee9a1 100644
--- a/Documentation/git-for-each-ref.adoc
+++ b/Documentation/git-for-each-ref.adoc
@@ -14,7 +14,7 @@ SYNOPSIS
 		   [--points-at=<object>]
 		   [--merged[=<object>]] [--no-merged[=<object>]]
 		   [--contains[=<object>]] [--no-contains[=<object>]]
-		   [--exclude=<pattern> ...]
+		   [--exclude=<pattern> ...] [--skip-until=<pattern>]
 
 DESCRIPTION
 -----------
@@ -108,6 +108,10 @@ TAB %(refname)`.
 --include-root-refs::
 	List root refs (HEAD and pseudorefs) apart from regular refs.
 
+--skip-until::
+    Skip references up to but excluding the specified pattern. Cannot be used
+    with general pattern matching or custom sort options.
+
 FIELD NAMES
 -----------
 
diff --git a/builtin/for-each-ref.c b/builtin/for-each-ref.c
index 3d2207ec77..aee2e7489a 100644
--- a/builtin/for-each-ref.c
+++ b/builtin/for-each-ref.c
@@ -13,6 +13,7 @@ static char const * const for_each_ref_usage[] = {
 	N_("git for-each-ref [--points-at <object>]"),
 	N_("git for-each-ref [--merged [<commit>]] [--no-merged [<commit>]]"),
 	N_("git for-each-ref [--contains [<commit>]] [--no-contains [<commit>]]"),
+	N_("git for-each-ref [--skip-until <pattern>]"),
 	NULL
 };
 
@@ -44,6 +45,7 @@ int cmd_for_each_ref(int argc,
 		OPT_GROUP(""),
 		OPT_INTEGER( 0 , "count", &format.array_opts.max_count, N_("show only <n> matched refs")),
 		OPT_STRING(  0 , "format", &format.format, N_("format"), N_("format to use for the output")),
+		OPT_STRING(  0 , "skip-until", &filter.seek, N_("skip-until"), N_("skip references until")),
 		OPT__COLOR(&format.use_color, N_("respect format colors")),
 		OPT_REF_FILTER_EXCLUDE(&filter),
 		OPT_REF_SORT(&sorting_options),
@@ -79,6 +81,9 @@ int cmd_for_each_ref(int argc,
 	if (verify_ref_format(&format))
 		usage_with_options(for_each_ref_usage, opts);
 
+	if (filter.seek && sorting_options.nr > 1)
+		die(_("cannot use --skip-until with custom sort options"));
+
 	sorting = ref_sorting_options(&sorting_options);
 	ref_sorting_set_sort_flags_all(sorting, REF_SORTING_ICASE, icase);
 	filter.ignore_case = icase;
@@ -100,6 +105,9 @@ int cmd_for_each_ref(int argc,
 		filter.name_patterns = argv;
 	}
 
+	if (filter.seek && filter.name_patterns && filter.name_patterns[0])
+		die(_("cannot use --skip-until with patterns"));
+
 	if (include_root_refs)
 		flags |= FILTER_REFS_ROOT_REFS | FILTER_REFS_DETACHED_HEAD;
 
diff --git a/ref-filter.c b/ref-filter.c
index 7a274633cf..56bb5312bd 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -2692,10 +2692,13 @@ static int for_each_fullref_in_pattern(struct ref_filter *filter,
 				       each_ref_fn cb,
 				       void *cb_data)
 {
+	struct ref_iterator *iter;
+	int flags = 0, ret = 0;
+
 	if (filter->kind & FILTER_REFS_ROOT_REFS) {
 		/* In this case, we want to print all refs including root refs. */
-		return refs_for_each_include_root_refs(get_main_ref_store(the_repository),
-						       cb, cb_data);
+		flags |= DO_FOR_EACH_INCLUDE_ROOT_REFS;
+		goto non_prefix_iter;
 	}
 
 	if (!filter->match_as_path) {
@@ -2704,8 +2707,7 @@ static int for_each_fullref_in_pattern(struct ref_filter *filter,
 		 * prefixes like "refs/heads/" etc. are stripped off,
 		 * so we have to look at everything:
 		 */
-		return refs_for_each_fullref_in(get_main_ref_store(the_repository),
-						"", NULL, cb, cb_data);
+		goto non_prefix_iter;
 	}
 
 	if (filter->ignore_case) {
@@ -2714,20 +2716,28 @@ static int for_each_fullref_in_pattern(struct ref_filter *filter,
 		 * so just return everything and let the caller
 		 * sort it out.
 		 */
-		return refs_for_each_fullref_in(get_main_ref_store(the_repository),
-						"", NULL, cb, cb_data);
+		goto non_prefix_iter;
 	}
 
 	if (!filter->name_patterns[0]) {
 		/* no patterns; we have to look at everything */
-		return refs_for_each_fullref_in(get_main_ref_store(the_repository),
-						 "", filter->exclude.v, cb, cb_data);
+		goto non_prefix_iter;
 	}
 
 	return refs_for_each_fullref_in_prefixes(get_main_ref_store(the_repository),
 						 NULL, filter->name_patterns,
 						 filter->exclude.v,
 						 cb, cb_data);
+
+non_prefix_iter:
+	iter = refs_ref_iterator_begin(get_main_ref_store(the_repository), "",
+				       NULL, 0, flags);
+	if (filter->seek)
+		ret = ref_iterator_seek(iter, filter->seek, 0);
+	if (ret)
+		return ret;
+
+	return do_for_each_ref_iterator(iter, cb, cb_data);
 }
 
 /*
@@ -3197,9 +3207,11 @@ static int do_filter_refs(struct ref_filter *filter, unsigned int type, each_ref
 	init_contains_cache(&filter->internal.no_contains_cache);
 
 	/*  Simple per-ref filtering */
-	if (!filter->kind)
+	if (!filter->kind) {
 		die("filter_refs: invalid type");
-	else {
+	} else {
+		const char *prefix = NULL;
+
 		/*
 		 * For common cases where we need only branches or remotes or tags,
 		 * we only iterate through those refs. If a mix of refs is needed,
@@ -3207,19 +3219,28 @@ static int do_filter_refs(struct ref_filter *filter, unsigned int type, each_ref
 		 * of filter_ref_kind().
 		 */
 		if (filter->kind == FILTER_REFS_BRANCHES)
-			ret = refs_for_each_fullref_in(get_main_ref_store(the_repository),
-						       "refs/heads/", NULL,
-						       fn, cb_data);
+			prefix = "refs/heads/";
 		else if (filter->kind == FILTER_REFS_REMOTES)
-			ret = refs_for_each_fullref_in(get_main_ref_store(the_repository),
-						       "refs/remotes/", NULL,
-						       fn, cb_data);
+			prefix = "refs/remotes/";
 		else if (filter->kind == FILTER_REFS_TAGS)
-			ret = refs_for_each_fullref_in(get_main_ref_store(the_repository),
-						       "refs/tags/", NULL, fn,
-						       cb_data);
-		else if (filter->kind & FILTER_REFS_REGULAR)
+			prefix = "refs/tags/";
+
+		if (prefix) {
+			struct ref_iterator *iter;
+
+			iter = refs_ref_iterator_begin(get_main_ref_store(the_repository),
+						       "", NULL, 0, 0);
+
+			if (filter->seek)
+				ret = ref_iterator_seek(iter, filter->seek, 0);
+			else if (prefix)
+				ret = ref_iterator_seek(iter, prefix, 1);
+
+			if (!ret)
+				ret = do_for_each_ref_iterator(iter, fn, cb_data);
+		} else if (filter->kind & FILTER_REFS_REGULAR) {
 			ret = for_each_fullref_in_pattern(filter, fn, cb_data);
+		}
 
 		/*
 		 * When printing all ref types, HEAD is already included,
diff --git a/ref-filter.h b/ref-filter.h
index c98c4fbd4c..9e97c65bc2 100644
--- a/ref-filter.h
+++ b/ref-filter.h
@@ -64,6 +64,7 @@ struct ref_array {
 
 struct ref_filter {
 	const char **name_patterns;
+	const char *seek;
 	struct strvec exclude;
 	struct oid_array points_at;
 	struct commit_list *with_commit;
diff --git a/t/t6302-for-each-ref-filter.sh b/t/t6302-for-each-ref-filter.sh
index bb02b86c16..3f1823e95b 100755
--- a/t/t6302-for-each-ref-filter.sh
+++ b/t/t6302-for-each-ref-filter.sh
@@ -541,4 +541,192 @@ test_expect_success 'validate worktree atom' '
 	test_cmp expect actual
 '
 
+test_expect_success 'skip until with empty value' '
+	cat >expect <<-\EOF &&
+	refs/heads/main
+	refs/heads/main_worktree
+	refs/heads/side
+	refs/odd/spot
+	refs/tags/annotated-tag
+	refs/tags/doubly-annotated-tag
+	refs/tags/doubly-signed-tag
+	refs/tags/foo1.10
+	refs/tags/foo1.3
+	refs/tags/foo1.6
+	refs/tags/four
+	refs/tags/one
+	refs/tags/signed-tag
+	refs/tags/three
+	refs/tags/two
+	EOF
+	git for-each-ref --format="%(refname)" --skip-until="" >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'skip until to a specific reference' '
+	cat >expect <<-\EOF &&
+	refs/odd/spot
+	refs/tags/annotated-tag
+	refs/tags/doubly-annotated-tag
+	refs/tags/doubly-signed-tag
+	refs/tags/foo1.10
+	refs/tags/foo1.3
+	refs/tags/foo1.6
+	refs/tags/four
+	refs/tags/one
+	refs/tags/signed-tag
+	refs/tags/three
+	refs/tags/two
+	EOF
+	git for-each-ref --format="%(refname)" --skip-until=refs/odd/spot >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'skip until to a specific reference with partial match' '
+	cat >expect <<-\EOF &&
+	refs/odd/spot
+	refs/tags/annotated-tag
+	refs/tags/doubly-annotated-tag
+	refs/tags/doubly-signed-tag
+	refs/tags/foo1.10
+	refs/tags/foo1.3
+	refs/tags/foo1.6
+	refs/tags/four
+	refs/tags/one
+	refs/tags/signed-tag
+	refs/tags/three
+	refs/tags/two
+	EOF
+	git for-each-ref --format="%(refname)" --skip-until=refs/odd/sp >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'skip until just behind a specific reference' '
+	cat >expect <<-\EOF &&
+	refs/odd/spot
+	refs/tags/annotated-tag
+	refs/tags/doubly-annotated-tag
+	refs/tags/doubly-signed-tag
+	refs/tags/foo1.10
+	refs/tags/foo1.3
+	refs/tags/foo1.6
+	refs/tags/four
+	refs/tags/one
+	refs/tags/signed-tag
+	refs/tags/three
+	refs/tags/two
+	EOF
+	git for-each-ref --format="%(refname)" --skip-until=refs/odd/parrot >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'skip until to specific directory' '
+	cat >expect <<-\EOF &&
+	refs/odd/spot
+	refs/tags/annotated-tag
+	refs/tags/doubly-annotated-tag
+	refs/tags/doubly-signed-tag
+	refs/tags/foo1.10
+	refs/tags/foo1.3
+	refs/tags/foo1.6
+	refs/tags/four
+	refs/tags/one
+	refs/tags/signed-tag
+	refs/tags/three
+	refs/tags/two
+	EOF
+	git for-each-ref --format="%(refname)" --skip-until=refs/odd >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'skip until to specific directory with trailing slash' '
+	cat >expect <<-\EOF &&
+	refs/odd/spot
+	refs/tags/annotated-tag
+	refs/tags/doubly-annotated-tag
+	refs/tags/doubly-signed-tag
+	refs/tags/foo1.10
+	refs/tags/foo1.3
+	refs/tags/foo1.6
+	refs/tags/four
+	refs/tags/one
+	refs/tags/signed-tag
+	refs/tags/three
+	refs/tags/two
+	EOF
+	git for-each-ref --format="%(refname)" --skip-until=refs/odd/ >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'skip until just behind a specific directory' '
+	cat >expect <<-\EOF &&
+	refs/odd/spot
+	refs/tags/annotated-tag
+	refs/tags/doubly-annotated-tag
+	refs/tags/doubly-signed-tag
+	refs/tags/foo1.10
+	refs/tags/foo1.3
+	refs/tags/foo1.6
+	refs/tags/four
+	refs/tags/one
+	refs/tags/signed-tag
+	refs/tags/three
+	refs/tags/two
+	EOF
+	git for-each-ref --format="%(refname)" --skip-until=refs/lost >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'skip until overflow specific reference length' '
+	cat >expect <<-\EOF &&
+	refs/tags/annotated-tag
+	refs/tags/doubly-annotated-tag
+	refs/tags/doubly-signed-tag
+	refs/tags/foo1.10
+	refs/tags/foo1.3
+	refs/tags/foo1.6
+	refs/tags/four
+	refs/tags/one
+	refs/tags/signed-tag
+	refs/tags/three
+	refs/tags/two
+	EOF
+	git for-each-ref --format="%(refname)" --skip-until=refs/odd/spotnew >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'skip until overflow specific reference path' '
+	cat >expect <<-\EOF &&
+	refs/tags/annotated-tag
+	refs/tags/doubly-annotated-tag
+	refs/tags/doubly-signed-tag
+	refs/tags/foo1.10
+	refs/tags/foo1.3
+	refs/tags/foo1.6
+	refs/tags/four
+	refs/tags/one
+	refs/tags/signed-tag
+	refs/tags/three
+	refs/tags/two
+	EOF
+	git for-each-ref --format="%(refname)" --skip-until=refs/odd/spot/new >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'skip until used with a pattern' '
+	cat >expect <<-\EOF &&
+	fatal: cannot use --skip-until with patterns
+	EOF
+	test_must_fail git for-each-ref --format="%(refname)" --skip-until=refs/odd/spot refs/tags 2>actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'skip until used with custom sort order' '
+	cat >expect <<-\EOF &&
+	fatal: cannot use --skip-until custom sort options
+	EOF
+	test_must_fail git for-each-ref --format="%(refname)" --skip-until=refs/odd/spot --sort=author 2>actual &&
+	test_cmp expect actual
+'
+
 test_done
-- 
2.49.0
^ permalink raw reply related	[flat|nested] 102+ messages in thread
* Re: [PATCH v2 4/4] for-each-ref: introduce a '--skip-until' option
  2025-07-04 13:02   ` [PATCH v2 4/4] for-each-ref: introduce a '--skip-until' option Karthik Nayak
@ 2025-07-07 15:30     ` Junio C Hamano
  2025-07-07 18:31       ` Karthik Nayak
  0 siblings, 1 reply; 102+ messages in thread
From: Junio C Hamano @ 2025-07-07 15:30 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: git, ps
Karthik Nayak <karthik.188@gmail.com> writes:
> +	if (filter.seek && sorting_options.nr > 1)
> +		die(_("cannot use --skip-until custom sort options"));
Missing "with" before "custom sort".
When I commented on the previous iteration about sorting, I didn't
mean to suggest making them incompatible---it may have some use case
to grab a batch out of the underlying refstore, sort refs in that
batch, and then show them.  But from usability's point of view, I
tend to agree with this design decision.  Such an unnatural batching
and sorting operation is probably not worth supporting.
^ permalink raw reply	[flat|nested] 102+ messages in thread
* Re: [PATCH v2 4/4] for-each-ref: introduce a '--skip-until' option
  2025-07-07 15:30     ` Junio C Hamano
@ 2025-07-07 18:31       ` Karthik Nayak
  0 siblings, 0 replies; 102+ messages in thread
From: Karthik Nayak @ 2025-07-07 18:31 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, ps
[-- Attachment #1: Type: text/plain, Size: 881 bytes --]
Junio C Hamano <gitster@pobox.com> writes:
> Karthik Nayak <karthik.188@gmail.com> writes:
>
>> +	if (filter.seek && sorting_options.nr > 1)
>> +		die(_("cannot use --skip-until custom sort options"));
>
> Missing "with" before "custom sort".
>
Thanks, will fix in the next version.
> When I commented on the previous iteration about sorting, I didn't
> mean to suggest making them incompatible---it may have some use case
> to grab a batch out of the underlying refstore, sort refs in that
> batch, and then show them.  But from usability's point of view, I
> tend to agree with this design decision.  Such an unnatural batching
> and sorting operation is probably not worth supporting.
I think so too. There might be some use case; once that use case is more
concrete we can probably revisit this.
For now, it is confusing and it is just easier to not support them
together.
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]
^ permalink raw reply	[flat|nested] 102+ messages in thread
* Re: [PATCH v2 0/4] for-each-ref: introduce seeking functionality via '--skip-until'
  2025-07-04 13:02 ` [PATCH v2 " Karthik Nayak
                     ` (3 preceding siblings ...)
  2025-07-04 13:02   ` [PATCH v2 4/4] for-each-ref: introduce a '--skip-until' option Karthik Nayak
@ 2025-07-04 13:41   ` Andreas Schwab
  2025-07-04 14:02     ` Karthik Nayak
  4 siblings, 1 reply; 102+ messages in thread
From: Andreas Schwab @ 2025-07-04 13:41 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: git, gitster, ps
On Jul 04 2025, Karthik Nayak wrote:
> This series adds a '--skip-until' option in 'git-for-each-ref(1)'. When
> used, the reference iteration seeks to the first matching reference and
> iterates from there onward.
I would have named the option --start-with.  It has the advantage that
it is clear whether the matched ref is included.
-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."
^ permalink raw reply	[flat|nested] 102+ messages in thread
* Re: [PATCH v2 0/4] for-each-ref: introduce seeking functionality via '--skip-until'
  2025-07-04 13:41   ` [PATCH v2 0/4] for-each-ref: introduce seeking functionality via '--skip-until' Andreas Schwab
@ 2025-07-04 14:02     ` Karthik Nayak
  2025-07-04 14:52       ` Andreas Schwab
  2025-07-04 16:39       ` Junio C Hamano
  0 siblings, 2 replies; 102+ messages in thread
From: Karthik Nayak @ 2025-07-04 14:02 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: git, gitster, ps
[-- Attachment #1: Type: text/plain, Size: 1174 bytes --]
Andreas Schwab <schwab@linux-m68k.org> writes:
> On Jul 04 2025, Karthik Nayak wrote:
>
>> This series adds a '--skip-until' option in 'git-for-each-ref(1)'. When
>> used, the reference iteration seeks to the first matching reference and
>> iterates from there onward.
>
> I would have named the option --start-with.  It has the advantage that
> it is clear whether the matched ref is included.
>
We did discuss this internally, some other names we thought of:
--skip-to
--start-after
--start-from
--seek
--skip-before
--start-at
I think I was a bit against '--start-from' and '--start-at', because
they imply that the reference provided must exist.
Consider the example
  $ git for-each-ref
  refs/heads/bar
  refs/heads/foo
  refs/heads/main
  $ git for-each-ref --seek=refs/heads/cat
  refs/heads/foo
  refs/heads/main
You can see that the reference doesn't have to exist. So implying that
it should can be a bit confusing.
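A minimal sketch of the seek behaviour shown above, emulated over a fixed
list with POSIX tools rather than Git itself. The helpers `seek_refs` and
`all_refs` are hypothetical names used only for illustration, and the sketch
assumes refs are ordered bytewise:

```shell
#!/bin/sh
# seek_refs MARKER: read sorted ref names on stdin and print every name
# that sorts at or after MARKER in byte order. Hypothetical helper, not
# part of Git; it only mimics the inclusive seek discussed here.
seek_refs () {
	# Appending "" forces awk to compare the operands as strings.
	SEEK_MARKER="$1" LC_ALL=C awk '($0 "") >= (ENVIRON["SEEK_MARKER"] "")'
}

# The three example refs from the message.
all_refs () {
	printf '%s\n' refs/heads/bar refs/heads/foo refs/heads/main
}

# refs/heads/cat does not exist, so output begins at the position where
# it would have sorted: refs/heads/foo, then refs/heads/main.
all_refs | seek_refs refs/heads/cat
```

The marker is only compared against the names, never looked up, which is why
a missing reference still yields a well-defined starting point.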
But I'm open to changing this, if we can conclude on any flag name...
> --
> Andreas Schwab, schwab@linux-m68k.org
> GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
> "And now for something completely different."
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]
^ permalink raw reply	[flat|nested] 102+ messages in thread
* Re: [PATCH v2 0/4] for-each-ref: introduce seeking functionality via '--skip-until'
  2025-07-04 14:02     ` Karthik Nayak
@ 2025-07-04 14:52       ` Andreas Schwab
  2025-07-04 14:58         ` Karthik Nayak
  2025-07-04 16:39       ` Junio C Hamano
  1 sibling, 1 reply; 102+ messages in thread
From: Andreas Schwab @ 2025-07-04 14:52 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: git, gitster, ps
On Jul 04 2025, Karthik Nayak wrote:
> Consider the example
>
>   $ git for-each-ref
>   refs/heads/bar
>   refs/heads/foo
>   refs/heads/main
>
>   $ git for-each-ref --seek=refs/heads/cat
>   refs/heads/foo
>   refs/heads/main
>
> You can see that the reference doesn't have to exist.
That is even more confusing.  What is the first matching ref if none of
them match?  Doesn't that mean skipping _all_ refs?
-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."
^ permalink raw reply	[flat|nested] 102+ messages in thread
* Re: [PATCH v2 0/4] for-each-ref: introduce seeking functionality via '--skip-until'
  2025-07-04 14:52       ` Andreas Schwab
@ 2025-07-04 14:58         ` Karthik Nayak
  2025-07-04 15:55           ` Andreas Schwab
  0 siblings, 1 reply; 102+ messages in thread
From: Karthik Nayak @ 2025-07-04 14:58 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: git, gitster, ps
[-- Attachment #1: Type: text/plain, Size: 895 bytes --]
Andreas Schwab <schwab@linux-m68k.org> writes:
> On Jul 04 2025, Karthik Nayak wrote:
>
>> Consider the example
>>
>>   $ git for-each-ref
>>   refs/heads/bar
>>   refs/heads/foo
>>   refs/heads/main
>>
>>   $ git for-each-ref --seek=refs/heads/cat
>>   refs/heads/foo
>>   refs/heads/main
>>
>> You can see that the reference doesn't have to exist.
>
> That is even more confusing.  What is the first matching ref if none of
> them match?  Doesn't that mean skipping _all_ refs?
>
Well the idea is it would seek to the offset where the reference would
fit in.
This is to ensure that seeking to a reference which was deleted
concurrently doesn't leave the client hanging with no results while
paginating over all references.
> --
> Andreas Schwab, schwab@linux-m68k.org
> GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
> "And now for something completely different."
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]
^ permalink raw reply	[flat|nested] 102+ messages in thread
* Re: [PATCH v2 0/4] for-each-ref: introduce seeking functionality via '--skip-until'
  2025-07-04 14:58         ` Karthik Nayak
@ 2025-07-04 15:55           ` Andreas Schwab
  2025-07-07  8:52             ` Karthik Nayak
  0 siblings, 1 reply; 102+ messages in thread
From: Andreas Schwab @ 2025-07-04 15:55 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: git, gitster, ps
On Jul 04 2025, Karthik Nayak wrote:
> Andreas Schwab <schwab@linux-m68k.org> writes:
>
>> On Jul 04 2025, Karthik Nayak wrote:
>>
>>> Consider the example
>>>
>>>   $ git for-each-ref
>>>   refs/heads/bar
>>>   refs/heads/foo
>>>   refs/heads/main
>>>
>>>   $ git for-each-ref --seek=refs/heads/cat
>>>   refs/heads/foo
>>>   refs/heads/main
>>>
>>> You can see that the reference doesn't have to exist.
>>
>> That is even more confusing.  What is the first matching ref if none of
>> them match?  Doesn't that mean skipping _all_ refs?
>>
>
> Well the idea is it would seek to the offset where the reference would
> fit in.
>
> This is to ensure that seeking to a reference which was deleted
> concurrently doesn't leave the client hanging with no results while
> paginating over all references.
Then don't call it a pattern.  Pattern matching is a set operation,
independent of sorting.  What you really have is a marker that divides
the sorted list in two parts according to how the marker sorts.  And
that makes --start-with more descriptive and less ambiguous.
-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."
^ permalink raw reply	[flat|nested] 102+ messages in thread
* Re: [PATCH v2 0/4] for-each-ref: introduce seeking functionality via '--skip-until'
  2025-07-04 15:55           ` Andreas Schwab
@ 2025-07-07  8:52             ` Karthik Nayak
  0 siblings, 0 replies; 102+ messages in thread
From: Karthik Nayak @ 2025-07-07  8:52 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: git, gitster, ps
[-- Attachment #1: Type: text/plain, Size: 1406 bytes --]
Andreas Schwab <schwab@linux-m68k.org> writes:
> On Jul 04 2025, Karthik Nayak wrote:
>
>> Andreas Schwab <schwab@linux-m68k.org> writes:
>>
>>> On Jul 04 2025, Karthik Nayak wrote:
>>>
>>>> Consider the example
>>>>
>>>>   $ git for-each-ref
>>>>   refs/heads/bar
>>>>   refs/heads/foo
>>>>   refs/heads/main
>>>>
>>>>   $ git for-each-ref --seek=refs/heads/cat
>>>>   refs/heads/foo
>>>>   refs/heads/main
>>>>
>>>> You can see that the reference doesn't have to exist.
>>>
>>> That is even more confusing.  What is the first matching ref if none of
>>> them match?  Doesn't that mean skipping _all_ refs?
>>>
>>
>> Well the idea is it would seek to the offset where the reference would
>> fit in.
>>
>> This is to ensure that seeking to a reference which was deleted
>> concurrently doesn't leave the client hanging with no results while
>> paginating over all references.
>
> Then don't call it a pattern.  Pattern matching is a set operation,
> independent of sorting.  What you really have is a marker that divides
> the sorted list in two parts according to how the marker sorts.  And
> that makes --start-with more descriptive and less ambiguous.
>
Fair enough. I'll change the documentation and description in the next
version.
> --
> Andreas Schwab, schwab@linux-m68k.org
> GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
> "And now for something completely different."
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]
^ permalink raw reply	[flat|nested] 102+ messages in thread
* Re: [PATCH v2 0/4] for-each-ref: introduce seeking functionality via '--skip-until'
  2025-07-04 14:02     ` Karthik Nayak
  2025-07-04 14:52       ` Andreas Schwab
@ 2025-07-04 16:39       ` Junio C Hamano
  2025-07-07  8:59         ` Karthik Nayak
  1 sibling, 1 reply; 102+ messages in thread
From: Junio C Hamano @ 2025-07-04 16:39 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: Andreas Schwab, git, ps
Karthik Nayak <karthik.188@gmail.com> writes:
> I think I was a bit against '--start-from' and '--start-at', because
> they imply that the reference provided must exist.
It also implies that if the reference does exist, that would be the
first one that is shown.  But I do not think you want that, as ...
> Consider the example
>
>   $ git for-each-ref
>   refs/heads/bar
>   refs/heads/foo
>   refs/heads/main
... after a paging application starts from the beginning and showed
a single page of some items, it knows the "last" one it showed.
That last entry may have been refs/heads/bar.  The application may
not have seen the next entry (i.e. refs/heads/foo).  So if it has to
use '--start-at=refs/heads/bar', the first entry it gets from such a
request may be for refs/heads/bar again.  The application needs to
remember the "last" one it showed and skip that, which is a bit
awkward, isn't it?
>   $ git for-each-ref --seek=refs/heads/cat
>   refs/heads/foo
>   refs/heads/main
>
> You can see that the reference doesn't have to exist. So implying that
> it should can be a bit confusing.
For that reason, whatever verb you pick from seek or start or skip,
it would be great if the option name also made it explicit that the
named one, if exists, is not shown.  Conceptually, it is "skip
everything that sorts before the named item, including the named
item itself" that such a paging application would want, wouldn't it?
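The difference between the two semantics can be sketched as follows. This is
an emulation over a fixed list, not Git itself; `from_marker`, `after_marker`
and `all_refs` are hypothetical helpers, and bytewise ordering is assumed:

```shell
#!/bin/sh
# from_marker MARKER: inclusive, keeps names sorting at or after MARKER.
from_marker () {
	SEEK_MARKER="$1" LC_ALL=C awk '($0 "") >= (ENVIRON["SEEK_MARKER"] "")'
}

# after_marker MARKER: exclusive, keeps only names sorting after MARKER.
after_marker () {
	SEEK_MARKER="$1" LC_ALL=C awk '($0 "") > (ENVIRON["SEEK_MARKER"] "")'
}

all_refs () { printf '%s\n' refs/heads/bar refs/heads/foo refs/heads/main; }

# The pager only remembers the last entry it displayed.
last_shown=refs/heads/bar

# Inclusive: the next page starts with the entry that was already shown.
echo "inclusive:"
all_refs | from_marker "$last_shown" | head -n 1

# Exclusive: the next page starts with the first unseen entry.
echo "exclusive:"
all_refs | after_marker "$last_shown" | head -n 1
```

With the exclusive form the caller can pass the last name it displayed
unchanged and never has to drop a duplicate first row.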
Thanks.
^ permalink raw reply	[flat|nested] 102+ messages in thread
* Re: [PATCH v2 0/4] for-each-ref: introduce seeking functionality via '--skip-until'
  2025-07-04 16:39       ` Junio C Hamano
@ 2025-07-07  8:59         ` Karthik Nayak
  2025-07-07  9:45           ` Phillip Wood
  0 siblings, 1 reply; 102+ messages in thread
From: Karthik Nayak @ 2025-07-07  8:59 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Andreas Schwab, git, ps
[-- Attachment #1: Type: text/plain, Size: 1842 bytes --]
Junio C Hamano <gitster@pobox.com> writes:
> Karthik Nayak <karthik.188@gmail.com> writes:
>
>> I think I was a bit against '--start-from' and '--start-at', because
>> they imply that the reference provided must exist.
>
> It also implies that if the reference does exist, that would be the
> first one that is shown.  But I do not think you want that, as ...
>
>> Consider the example
>>
>>   $ git for-each-ref
>>   refs/heads/bar
>>   refs/heads/foo
>>   refs/heads/main
>
>
> ... after a paging application starts from the beginning and showed
> a single page of some items, it knows the "last" one it showed.
> That last entry may have been refs/heads/bar.  The application may
> not have seen the next entry (i.e. refs/heads/foo).  So if it has to
> use '--start-at=refs/heads/bar', the first entry it gets from such a
> request may be for refs/heads/bar again.  The application needs to
> remember the "last" one it showed and skip that, which is a bit
> awkward, isn't it?
>
I do agree, I was modelling this after what would be the best approach
within the Git codebase. But I think it would be nicer for the clients
if we skip the provided reference.
>>   $ git for-each-ref --seek=refs/heads/cat
>>   refs/heads/foo
>>   refs/heads/main
>>
>> You can see that the reference doesn't have to exist. So implying that
>> it should can be a bit confusing.
>
> For that reason, whatever verb you pick from seek or start or skip,
> it would be great if the option name also made it explicit that the
> named one, if exists, is not shown.  Conceptually, it is "skip
> everything that sorts before the named item, including the named
> item itself" that such a paging application would want, wouldn't it?
>
> Thanks.
>
With that I think '--start-after' sounds like the best option. I'll
modify for the next version accordingly.
Thanks!
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]
^ permalink raw reply	[flat|nested] 102+ messages in thread
* Re: [PATCH v2 0/4] for-each-ref: introduce seeking functionality via '--skip-until'
  2025-07-07  8:59         ` Karthik Nayak
@ 2025-07-07  9:45           ` Phillip Wood
  2025-07-08 11:39             ` Karthik Nayak
  0 siblings, 1 reply; 102+ messages in thread
From: Phillip Wood @ 2025-07-07  9:45 UTC (permalink / raw)
  To: Karthik Nayak, Junio C Hamano; +Cc: Andreas Schwab, git, ps
On 07/07/2025 09:59, Karthik Nayak wrote:
> Junio C Hamano <gitster@pobox.com> writes:
> 
>> Karthik Nayak <karthik.188@gmail.com> writes:
> I do agree, I was modelling this after what would be the best approach
> within the Git codebase. 
That was my fear when I asked about using a numeric offset. Patrick has 
made a principled argument for using a ref name rather than a numeric 
offset - I think you should build the motivation for this series around
that and the documentation should explain the implications of references 
being added and deleted while paging them.
> With that I think '--start-after' sounds like the best option. I'll
> modify for the next version accordingly.
That sounds like a good name
Thanks
Phillip
^ permalink raw reply	[flat|nested] 102+ messages in thread
* Re: [PATCH v2 0/4] for-each-ref: introduce seeking functionality via '--skip-until'
  2025-07-07  9:45           ` Phillip Wood
@ 2025-07-08 11:39             ` Karthik Nayak
  0 siblings, 0 replies; 102+ messages in thread
From: Karthik Nayak @ 2025-07-08 11:39 UTC (permalink / raw)
  To: phillip.wood, Junio C Hamano; +Cc: Andreas Schwab, git, ps
[-- Attachment #1: Type: text/plain, Size: 921 bytes --]
Phillip Wood <phillip.wood123@gmail.com> writes:
> On 07/07/2025 09:59, Karthik Nayak wrote:
>> Junio C Hamano <gitster@pobox.com> writes:
>>
>>> Karthik Nayak <karthik.188@gmail.com> writes:
>> I do agree, I was modelling this after what would be the best approach
>> within the Git codebase.
>
> That was my fear when I asked about using a numeric offset. Patrick has
> made a principled argument for using a ref name rather than a numeric
> offset - I think you should build the motivation for this series around
> that and the documentation should explain the implications of references
> being added and deleted while paging them.
>
Yeah, I'll add something on those lines in the next version. I do
appreciate these checks/questions.
>> With that I think '--start-after' sounds like the best option. I'll
>> modify for the next version accordingly.
> That sounds like a good name
>
Thanks!
> Thanks
>
> Phillip
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]
^ permalink raw reply	[flat|nested] 102+ messages in thread
* [PATCH v3 0/4] for-each-ref: introduce seeking functionality via '--start-after'
  2025-07-01 15:03 [PATCH 0/4] for-each-ref: introduce seeking functionality via '--skip-until' Karthik Nayak
                   ` (7 preceding siblings ...)
  2025-07-04 13:02 ` [PATCH v2 " Karthik Nayak
@ 2025-07-08 13:47 ` Karthik Nayak
  2025-07-08 13:47   ` [PATCH v3 1/4] refs: expose `ref_iterator` via 'refs.h' Karthik Nayak
                     ` (3 more replies)
  2025-07-11 16:18 ` [PATCH v4 0/4] for-each-ref: introduce seeking functionality via '--start-after' Karthik Nayak
  2025-07-15 11:28 ` [PATCH v5 0/5] " Karthik Nayak
  10 siblings, 4 replies; 102+ messages in thread
From: Karthik Nayak @ 2025-07-08 13:47 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak, gitster, ps, schwab, phillip.wood123
The `git-for-each-ref(1)` command is used to iterate over references
present in a repository. In large repositories with millions of
references, it would be optimal to paginate this output such that we
can start iteration from a given reference. This would avoid having to
iterate over all references from the beginning each time when paginating
through results.
This series adds a '--start-after' option in 'git-for-each-ref(1)'. When
used, the reference iteration seeks to the first reference following the
marker alphabetically. When paging, it should be noted that references
may be deleted, modified or added between invocations. Output will only
yield those references which follow the marker lexicographically. If the
marker does not exist, output begins from the first reference that would
come after it alphabetically.
This enables efficient pagination workflows like:
    git for-each-ref --count=100
    git for-each-ref --count=100 --start-after=refs/heads/branch-100
    git for-each-ref --count=100 --start-after=refs/heads/branch-200
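A paging loop built on this workflow might look like the sketch below.
`list_page` and `walk_pages` are hypothetical helpers; `list_page` is emulated
over a fixed list so the sketch runs without this series, and the comment
shows the Git invocation it is assumed to stand in for:

```shell
#!/bin/sh
# list_page MARKER COUNT: print up to COUNT ref names sorting after MARKER.
# Hypothetical stand-in. With this series applied, the body is assumed to
# be replaceable by something like:
#   git for-each-ref --format='%(refname)' --count="$2" --start-after="$1"
# (the very first call would simply omit --start-after).
list_page () {
	printf '%s\n' refs/heads/a refs/heads/b refs/heads/c \
		refs/heads/d refs/heads/e |
	PAGE_MARKER="$1" LC_ALL=C awk '($0 "") > (ENVIRON["PAGE_MARKER"] "")' |
	head -n "$2"
}

# walk_pages COUNT: fetch page after page, using the last name of each
# page as the marker for the next request, until a page comes back empty.
walk_pages () {
	marker=
	while page=$(list_page "$marker" "$1") && [ -n "$page" ]
	do
		printf 'page: %s\n' "$(printf '%s' "$page" | tr '\n' ' ')"
		marker=$(printf '%s\n' "$page" | tail -n 1)
	done
}

walk_pages 2
```

Because each request carries only a name, a reference deleted between two
requests does not break the loop; the next page still starts at the first
name sorting after the marker.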
To add this functionality, we expose the `ref_iterator` outside the
'refs/' namespace and modify the `ref_iterator_seek()` to actually seek
to a given reference and only set the prefix when the `set_prefix` field
is set.
On the reftable and packed backend, the changes are simple. But since
the files backend uses 'ref-cache' for reference handling, the changes
there are a little more involved, since we need to set up the right
levels and the indexing.
Initially I was also planning to clean up all the `refs_for_each...()`
functions in 'refs.h' by simply using the iterator, but this bloated the
series. So I've left that for another day.
Changes in v3:
- Change the behavior of the command to exclude the marker provided, and
  rename the flag to '--start-after' accordingly.
- Extend the documentation to add a note about concurrent modifications
  to the reference database.
- Link to v2: https://lore.kernel.org/r/20250704-306-git-for-each-ref-pagination-v2-0-bcde14acdd81@gmail.com
Changes in v2:
- Modify 'ref_iterator_seek()' to take in flags instead of a
  'set_prefix' variable. This improves readability, where users would
  use the 'REF_ITERATOR_SEEK_SET_PREFIX' instead of simply passing '1'.
- When the set prefix flag isn't used, reset any previously set prefix.
  This ensures that the internal prefix state is always reset whenever
  we seek and unifies the behavior between 'ref_iterator_seek' and
  'ref_iterator_begin'.
- Don't allow '--skip-until' to be run with '--sort', since the seeking
  always takes place before any sorting and this can be confusing.
- Some styling fixes:
  - Remove extra newline
  - Skip braces around single lined if...else clause
  - Add braces around 'if' clause
  - Fix indentation
- Link to v1: https://lore.kernel.org/git/20250701-306-git-for-each-ref-pagination-v1-0-4f0ae7c0688f@gmail.com/
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
 Documentation/git-for-each-ref.adoc |  11 +-
 builtin/for-each-ref.c              |   8 ++
 ref-filter.c                        |  80 +++++++++++----
 ref-filter.h                        |   1 +
 refs.c                              |   6 +-
 refs.h                              | 158 +++++++++++++++++++++++++++++
 refs/debug.c                        |   7 +-
 refs/files-backend.c                |   7 +-
 refs/iterator.c                     |  26 +++--
 refs/packed-backend.c               |  17 ++--
 refs/ref-cache.c                    |  99 ++++++++++++++----
 refs/ref-cache.h                    |   7 --
 refs/refs-internal.h                | 152 ++--------------------------
 refs/reftable-backend.c             |  21 ++--
 t/t6302-for-each-ref-filter.sh      | 194 ++++++++++++++++++++++++++++++++++++
 15 files changed, 568 insertions(+), 226 deletions(-)
Karthik Nayak (4):
      refs: expose `ref_iterator` via 'refs.h'
      ref-cache: remove unused function 'find_ref_entry()'
      refs: selectively set prefix in the seek functions
      for-each-ref: introduce a '--start-after' option
Range-diff versus v2:
1:  c0ce873c35 = 1:  dbb03c2aa9 refs: expose `ref_iterator` via 'refs.h'
2:  2c50d1eba2 = 2:  fa5a0cb722 ref-cache: remove unused function 'find_ref_entry()'
3:  fae849749f = 3:  9940d390cc refs: selectively set prefix in the seek functions
4:  a0725a6647 ! 4:  ebe864095a for-each-ref: introduce a '--skip-until' option
    @@ Metadata
     Author: Karthik Nayak <karthik.188@gmail.com>
     
      ## Commit message ##
    -    for-each-ref: introduce a '--skip-until' option
    +    for-each-ref: introduce a '--start-after' option
     
         The `git-for-each-ref(1)` command is used to iterate over references
         present in a repository. In large repositories with millions of
    @@ Commit message
         through results.
     
         The previous commit added 'seek' functionality to the reference
    -    backends. Utilize this and expose a '--skip-until' option in
    +    backends. Utilize this and expose a '--start-after' option in
         'git-for-each-ref(1)'. When used, the reference iteration seeks to the
    -    first matching reference and iterates from there onward.
    +    lexicographically next reference and iterates from there onward.
     
         This enables efficient pagination workflows like:
             git for-each-ref --count=100
    -        git for-each-ref --count=100 --skip-until=refs/heads/branch-100
    -        git for-each-ref --count=100 --skip-until=refs/heads/branch-200
    +        git for-each-ref --count=100 --start-after=refs/heads/branch-100
    +        git for-each-ref --count=100 --start-after=refs/heads/branch-200
    +
    +    Since the reference iterators only allow seeking to a specified marker
    +    via the `ref_iterator_seek()`, we introduce a helper function
    +    `start_ref_iterator_after()`, which seeks to next reference by simply
    +    adding (char) 1 to the marker.
    +
    +    We must note that pagination always continues from the provided marker,
    +    as such any concurrent reference updates lexicographically behind the
    +    marker will not be output. Document the same.
     
         Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
     
    @@ Documentation/git-for-each-ref.adoc: SYNOPSIS
      		   [--merged[=<object>]] [--no-merged[=<object>]]
      		   [--contains[=<object>]] [--no-contains[=<object>]]
     -		   [--exclude=<pattern> ...]
    -+		   [--exclude=<pattern> ...] [--skip-until=<pattern>]
    ++		   [--exclude=<pattern> ...] [--start-after=<marker>]
      
      DESCRIPTION
      -----------
    @@ Documentation/git-for-each-ref.adoc: TAB %(refname)`.
      --include-root-refs::
      	List root refs (HEAD and pseudorefs) apart from regular refs.
      
    -+--skip-until::
    -+    Skip references up to but excluding the specified pattern. Cannot be used
    -+    with general pattern matching or custom sort options.
    ++--start-after::
    ++    Allows paginating the output by skipping references up to and including the
    ++    specified marker. When paging, it should be noted that references may be
    ++    deleted, modified or added between invocations. Output will only yield those
    ++    references which follow the marker lexicographically. If the marker does not
    ++    exist, output begins from the first reference that would come after it
    ++    alphabetically. Cannot be used with general pattern matching or custom
    ++    sort options.
     +
      FIELD NAMES
      -----------
    @@ builtin/for-each-ref.c: static char const * const for_each_ref_usage[] = {
      	N_("git for-each-ref [--points-at <object>]"),
      	N_("git for-each-ref [--merged [<commit>]] [--no-merged [<commit>]]"),
      	N_("git for-each-ref [--contains [<commit>]] [--no-contains [<commit>]]"),
    -+	N_("git for-each-ref [--skip-until <pattern>]"),
    ++	N_("git for-each-ref [--start-after <marker>]"),
      	NULL
      };
      
    @@ builtin/for-each-ref.c: int cmd_for_each_ref(int argc,
      		OPT_GROUP(""),
      		OPT_INTEGER( 0 , "count", &format.array_opts.max_count, N_("show only <n> matched refs")),
      		OPT_STRING(  0 , "format", &format.format, N_("format"), N_("format to use for the output")),
    -+		OPT_STRING(  0 , "skip-until", &filter.seek, N_("skip-until"), N_("skip references until")),
    ++		OPT_STRING(  0 , "start-after", &filter.start_after, N_("start-start"), N_("start iteration after the provided marker")),
      		OPT__COLOR(&format.use_color, N_("respect format colors")),
      		OPT_REF_FILTER_EXCLUDE(&filter),
      		OPT_REF_SORT(&sorting_options),
    @@ builtin/for-each-ref.c: int cmd_for_each_ref(int argc,
      	if (verify_ref_format(&format))
      		usage_with_options(for_each_ref_usage, opts);
      
    -+	if (filter.seek && sorting_options.nr > 1)
    -+		die(_("cannot use --skip-until custom sort options"));
    ++	if (filter.start_after && sorting_options.nr > 1)
    ++		die(_("cannot use --start-after with custom sort options"));
     +
      	sorting = ref_sorting_options(&sorting_options);
      	ref_sorting_set_sort_flags_all(sorting, REF_SORTING_ICASE, icase);
    @@ builtin/for-each-ref.c: int cmd_for_each_ref(int argc,
      		filter.name_patterns = argv;
      	}
      
    -+	if (filter.seek && filter.name_patterns && filter.name_patterns[0])
    -+		die(_("cannot use --skip-until with patterns"));
    ++	if (filter.start_after && filter.name_patterns && filter.name_patterns[0])
    ++		die(_("cannot use --start-after with patterns"));
     +
      	if (include_root_refs)
      		flags |= FILTER_REFS_ROOT_REFS | FILTER_REFS_DETACHED_HEAD;
      
     
      ## ref-filter.c ##
    +@@ ref-filter.c: static int filter_exclude_match(struct ref_filter *filter, const char *refname)
    + 	return match_pattern(filter->exclude.v, refname, filter->ignore_case);
    + }
    + 
    ++/*
    ++ * We need to seek to the reference right after a given marker but excluding any
    ++ * matching references. So we seek to the lexicographically next reference.
    ++ */
    ++static int start_ref_iterator_after(struct ref_iterator *iter, const char *marker)
    ++{
    ++	struct strbuf sb = STRBUF_INIT;
    ++	int ret;
    ++
    ++	strbuf_addstr(&sb, marker);
    ++	strbuf_addch(&sb, 1);
    ++
    ++	ret = ref_iterator_seek(iter, sb.buf, 0);
    ++
    ++	strbuf_release(&sb);
    ++	return ret;
    ++}
    ++
    + /*
    +  * This is the same as for_each_fullref_in(), but it tries to iterate
    +  * only over the patterns we'll care about. Note that it _doesn't_ do a full
     @@ ref-filter.c: static int for_each_fullref_in_pattern(struct ref_filter *filter,
      				       each_ref_fn cb,
      				       void *cb_data)
    @@ ref-filter.c: static int for_each_fullref_in_pattern(struct ref_filter *filter,
     +non_prefix_iter:
     +	iter = refs_ref_iterator_begin(get_main_ref_store(the_repository), "",
     +				       NULL, 0, flags);
    -+	if (filter->seek)
    -+		ret = ref_iterator_seek(iter, filter->seek, 0);
    ++	if (filter->start_after)
    ++		ret = start_ref_iterator_after(iter, filter->start_after);
    ++
     +	if (ret)
     +		return ret;
     +
    @@ ref-filter.c: static int do_filter_refs(struct ref_filter *filter, unsigned int
     +			iter = refs_ref_iterator_begin(get_main_ref_store(the_repository),
     +						       "", NULL, 0, 0);
     +
    -+			if (filter->seek)
    -+				ret = ref_iterator_seek(iter, filter->seek, 0);
    ++			if (filter->start_after)
    ++				ret = start_ref_iterator_after(iter, filter->start_after);
     +			else if (prefix)
     +				ret = ref_iterator_seek(iter, prefix, 1);
     +
    @@ ref-filter.h: struct ref_array {
      
      struct ref_filter {
      	const char **name_patterns;
    -+	const char *seek;
    ++	const char *start_after;
      	struct strvec exclude;
      	struct oid_array points_at;
      	struct commit_list *with_commit;
    @@ t/t6302-for-each-ref-filter.sh: test_expect_success 'validate worktree atom' '
      	test_cmp expect actual
      '
      
    -+test_expect_success 'skip until with empty value' '
    ++test_expect_success 'start after with empty value' '
     +	cat >expect <<-\EOF &&
     +	refs/heads/main
     +	refs/heads/main_worktree
    @@ t/t6302-for-each-ref-filter.sh: test_expect_success 'validate worktree atom' '
     +	refs/tags/three
     +	refs/tags/two
     +	EOF
    -+	git for-each-ref --format="%(refname)" --skip-until="" >actual &&
    ++	git for-each-ref --format="%(refname)" --start-after="" >actual &&
     +	test_cmp expect actual
     +'
     +
    -+test_expect_success 'skip until to a specific reference' '
    ++test_expect_success 'start after a specific reference' '
     +	cat >expect <<-\EOF &&
    -+	refs/odd/spot
     +	refs/tags/annotated-tag
     +	refs/tags/doubly-annotated-tag
     +	refs/tags/doubly-signed-tag
    @@ t/t6302-for-each-ref-filter.sh: test_expect_success 'validate worktree atom' '
     +	refs/tags/three
     +	refs/tags/two
     +	EOF
    -+	git for-each-ref --format="%(refname)" --skip-until=refs/odd/spot >actual &&
    ++	git for-each-ref --format="%(refname)" --start-after=refs/odd/spot >actual &&
     +	test_cmp expect actual
     +'
     +
    -+test_expect_success 'skip until to a specific reference with partial match' '
    ++test_expect_success 'start after a specific reference with partial match' '
     +	cat >expect <<-\EOF &&
     +	refs/odd/spot
     +	refs/tags/annotated-tag
    @@ t/t6302-for-each-ref-filter.sh: test_expect_success 'validate worktree atom' '
     +	refs/tags/three
     +	refs/tags/two
     +	EOF
    -+	git for-each-ref --format="%(refname)" --skip-until=refs/odd/sp >actual &&
    ++	git for-each-ref --format="%(refname)" --start-after=refs/odd/sp >actual &&
     +	test_cmp expect actual
     +'
     +
    -+test_expect_success 'skip until just behind a specific reference' '
    ++test_expect_success 'start after, just behind a specific reference' '
     +	cat >expect <<-\EOF &&
     +	refs/odd/spot
     +	refs/tags/annotated-tag
    @@ t/t6302-for-each-ref-filter.sh: test_expect_success 'validate worktree atom' '
     +	refs/tags/three
     +	refs/tags/two
     +	EOF
    -+	git for-each-ref --format="%(refname)" --skip-until=refs/odd/parrot >actual &&
    ++	git for-each-ref --format="%(refname)" --start-after=refs/odd/parrot >actual &&
     +	test_cmp expect actual
     +'
     +
    -+test_expect_success 'skip until to specific directory' '
    ++test_expect_success 'start after with specific directory match' '
     +	cat >expect <<-\EOF &&
     +	refs/odd/spot
     +	refs/tags/annotated-tag
    @@ t/t6302-for-each-ref-filter.sh: test_expect_success 'validate worktree atom' '
     +	refs/tags/three
     +	refs/tags/two
     +	EOF
    -+	git for-each-ref --format="%(refname)" --skip-until=refs/odd >actual &&
    ++	git for-each-ref --format="%(refname)" --start-after=refs/odd >actual &&
     +	test_cmp expect actual
     +'
     +
    -+test_expect_success 'skip until to specific directory with trailing slash' '
    ++test_expect_success 'start after with specific directory and trailing slash' '
     +	cat >expect <<-\EOF &&
     +	refs/odd/spot
     +	refs/tags/annotated-tag
    @@ t/t6302-for-each-ref-filter.sh: test_expect_success 'validate worktree atom' '
     +	refs/tags/three
     +	refs/tags/two
     +	EOF
    -+	git for-each-ref --format="%(refname)" --skip-until=refs/lost >actual &&
    ++	git for-each-ref --format="%(refname)" --start-after=refs/lost >actual &&
     +	test_cmp expect actual
     +'
     +
    -+test_expect_success 'skip until just behind a specific directory' '
    ++test_expect_success 'start after, just behind a specific directory' '
     +	cat >expect <<-\EOF &&
     +	refs/odd/spot
     +	refs/tags/annotated-tag
    @@ t/t6302-for-each-ref-filter.sh: test_expect_success 'validate worktree atom' '
     +	refs/tags/three
     +	refs/tags/two
     +	EOF
    -+	git for-each-ref --format="%(refname)" --skip-until=refs/odd/ >actual &&
    ++	git for-each-ref --format="%(refname)" --start-after=refs/odd/ >actual &&
     +	test_cmp expect actual
     +'
     +
    -+test_expect_success 'skip until overflow specific reference length' '
    ++test_expect_success 'start after, overflow specific reference length' '
     +	cat >expect <<-\EOF &&
     +	refs/tags/annotated-tag
     +	refs/tags/doubly-annotated-tag
    @@ t/t6302-for-each-ref-filter.sh: test_expect_success 'validate worktree atom' '
     +	refs/tags/three
     +	refs/tags/two
     +	EOF
    -+	git for-each-ref --format="%(refname)" --skip-until=refs/odd/spotnew >actual &&
    ++	git for-each-ref --format="%(refname)" --start-after=refs/odd/spotnew >actual &&
     +	test_cmp expect actual
     +'
     +
    -+test_expect_success 'skip until overflow specific reference path' '
    ++test_expect_success 'start after, overflow specific reference path' '
     +	cat >expect <<-\EOF &&
     +	refs/tags/annotated-tag
     +	refs/tags/doubly-annotated-tag
    @@ t/t6302-for-each-ref-filter.sh: test_expect_success 'validate worktree atom' '
     +	refs/tags/three
     +	refs/tags/two
     +	EOF
    -+	git for-each-ref --format="%(refname)" --skip-until=refs/odd/spot/new >actual &&
    ++	git for-each-ref --format="%(refname)" --start-after=refs/odd/spot/new >actual &&
    ++	test_cmp expect actual
    ++'
    ++
    ++test_expect_success 'start after, last reference' '
    ++	cat >expect <<-\EOF &&
    ++	EOF
    ++	git for-each-ref --format="%(refname)" --start-after=refs/tags/two >actual &&
     +	test_cmp expect actual
     +'
     +
    -+test_expect_success 'skip until used with a pattern' '
    ++test_expect_success 'start after used with a pattern' '
     +	cat >expect <<-\EOF &&
    -+	fatal: cannot use --skip-until with patterns
    ++	fatal: cannot use --start-after with patterns
     +	EOF
    -+	test_must_fail git for-each-ref --format="%(refname)" --skip-until=refs/odd/spot refs/tags 2>actual &&
    ++	test_must_fail git for-each-ref --format="%(refname)" --start-after=refs/odd/spot refs/tags 2>actual &&
     +	test_cmp expect actual
     +'
     +
    -+test_expect_success 'skip until used with custom sort order' '
    ++test_expect_success 'start after used with custom sort order' '
     +	cat >expect <<-\EOF &&
    -+	fatal: cannot use --skip-until custom sort options
    ++	fatal: cannot use --start-after with custom sort options
     +	EOF
    -+	test_must_fail git for-each-ref --format="%(refname)" --skip-until=refs/odd/spot --sort=author 2>actual &&
    ++	test_must_fail git for-each-ref --format="%(refname)" --start-after=refs/odd/spot --sort=author 2>actual &&
     +	test_cmp expect actual
     +'
     +
base-commit: cf6f63ea6bf35173e02e18bdc6a4ba41288acff9
change-id: 20250605-306-git-for-each-ref-pagination-0ba8a29ae646
Thanks
- Karthik
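The `(char) 1` trick from the commit message above can be illustrated with a minimal standalone sketch. It does not use Git's ref backends: a sorted array of strings stands in for the ref store, and `seek_index()` and `start_after_index()` are hypothetical names chosen for this illustration. The sketch assumes refnames never contain bytes below 0x20, so "<marker>\001" sorts after the marker itself but before any real refname that extends it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return the index of the first entry in the sorted array `refs` that
 * compares >= `key` (a lower-bound binary search), or `nr` if there is
 * none. This stands in for a backend seek on a sorted ref list.
 */
static size_t seek_index(const char **refs, size_t nr, const char *key)
{
	size_t lo = 0, hi = nr;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (strcmp(refs[mid], key) < 0)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo;
}

/*
 * Hypothetical stand-in for start_ref_iterator_after(): build the key
 * "<marker>\001" and seek to it. The marker itself sorts before the
 * key, so it is skipped; anything sorting after the marker is kept.
 */
static size_t start_after_index(const char **refs, size_t nr,
				const char *marker)
{
	char key[256];
	size_t len;

	assert(marker);
	len = strlen(marker);
	assert(len + 2 <= sizeof(key));

	memcpy(key, marker, len);
	key[len] = 1;		/* the appended (char) 1 */
	key[len + 1] = '\0';

	return seek_index(refs, nr, key);
}
```

Calling `start_after_index()` with a marker that names an existing ref returns the index of the entry right after it. A marker that matches no ref returns the index of the first entry that sorts after it, which mirrors the "partial match" tests in t6302.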
* [PATCH v3 1/4] refs: expose `ref_iterator` via 'refs.h'
  2025-07-08 13:47 ` [PATCH v3 0/4] for-each-ref: introduce seeking functionality via '--start-after' Karthik Nayak
@ 2025-07-08 13:47   ` Karthik Nayak
  2025-07-08 13:47   ` [PATCH v3 2/4] ref-cache: remove unused function 'find_ref_entry()' Karthik Nayak
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 102+ messages in thread
From: Karthik Nayak @ 2025-07-08 13:47 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak, gitster, ps, schwab, phillip.wood123
The `ref_iterator` is a structure internal to the 'refs/'
sub-directory, which allows iteration over refs. All reference iteration
is built on top of these iterators.
External clients of the 'refs' subsystem use the various
'refs_for_each...()' functions to iterate over refs. However, since these
are wrapper functions, each combination of functionality requires a new
wrapper function. This does not scale, as the wrappers pile up whenever
requirements grow. Expose the internal reference iterator, so
advanced users can mix and match options as needed.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
 refs.h               | 147 +++++++++++++++++++++++++++++++++++++++++++++++++++
 refs/refs-internal.h | 145 +-------------------------------------------------
 2 files changed, 149 insertions(+), 143 deletions(-)
diff --git a/refs.h b/refs.h
index 46a6008e07..7c21aaef3d 100644
--- a/refs.h
+++ b/refs.h
@@ -1190,4 +1190,151 @@ int repo_migrate_ref_storage_format(struct repository *repo,
 				    unsigned int flags,
 				    struct strbuf *err);
 
+/*
+ * Reference iterators
+ *
+ * A reference iterator encapsulates the state of an in-progress
+ * iteration over references. Create an instance of `struct
+ * ref_iterator` via one of the functions in this module.
+ *
+ * A freshly-created ref_iterator doesn't yet point at a reference. To
+ * advance the iterator, call ref_iterator_advance(). If successful,
+ * this sets the iterator's refname, oid, and flags fields to describe
+ * the next reference and returns ITER_OK. The data pointed at by
+ * refname and oid belong to the iterator; if you want to retain them
+ * after calling ref_iterator_advance() again or calling
+ * ref_iterator_free(), you must make a copy. When the iteration has
+ * been exhausted, ref_iterator_advance() releases any resources
+ * associated with the iteration, frees the ref_iterator object, and
+ * returns ITER_DONE. If you want to abort the iteration early, call
+ * ref_iterator_free(), which also frees the ref_iterator object and
+ * any associated resources. If there was an internal error advancing
+ * to the next entry, ref_iterator_advance() aborts the iteration,
+ * frees the ref_iterator, and returns ITER_ERROR.
+ *
+ * The reference currently being looked at can be peeled by calling
+ * ref_iterator_peel(). This function is often faster than peel_ref(),
+ * so it should be preferred when iterating over references.
+ *
+ * Putting it all together, a typical iteration looks like this:
+ *
+ *     int ok;
+ *     struct ref_iterator *iter = ...;
+ *
+ *     while ((ok = ref_iterator_advance(iter)) == ITER_OK) {
+ *             if (want_to_stop_iteration()) {
+ *                     ok = ITER_DONE;
+ *                     break;
+ *             }
+ *
+ *             // Access information about the current reference:
+ *             if (!(iter->flags & REF_ISSYMREF))
+ *                     printf("%s is %s\n", iter->refname, oid_to_hex(iter->oid));
+ *
+ *             // If you need to peel the reference:
+ *             ref_iterator_peel(iter, &oid);
+ *     }
+ *
+ *     if (ok != ITER_DONE)
+ *             handle_error();
+ *     ref_iterator_free(iter);
+ */
+struct ref_iterator;
+
+/*
+ * These flags are passed to refs_ref_iterator_begin() (and do_for_each_ref(),
+ * which feeds it).
+ */
+enum do_for_each_ref_flags {
+	/*
+	 * Include broken references in a do_for_each_ref*() iteration, which
+	 * would normally be omitted. This includes both refs that point to
+	 * missing objects (a true repository corruption), ones with illegal
+	 * names (which we prefer not to expose to callers), as well as
+	 * dangling symbolic refs (i.e., those that point to a non-existent
+	 * ref; this is not a corruption, but as they have no valid oid, we
+	 * omit them from normal iteration results).
+	 */
+	DO_FOR_EACH_INCLUDE_BROKEN = (1 << 0),
+
+	/*
+	 * Only include per-worktree refs in a do_for_each_ref*() iteration.
+	 * Normally this will be used with a files ref_store, since that's
+	 * where all reference backends will presumably store their
+	 * per-worktree refs.
+	 */
+	DO_FOR_EACH_PER_WORKTREE_ONLY = (1 << 1),
+
+	/*
+	 * Omit dangling symrefs from output; this only has an effect with
+	 * INCLUDE_BROKEN, since they are otherwise not included at all.
+	 */
+	DO_FOR_EACH_OMIT_DANGLING_SYMREFS = (1 << 2),
+
+	/*
+	 * Include root refs i.e. HEAD and pseudorefs along with the regular
+	 * refs.
+	 */
+	DO_FOR_EACH_INCLUDE_ROOT_REFS = (1 << 3),
+};
+
+/*
+ * Return an iterator that goes over each reference in `refs` for
+ * which the refname begins with prefix. If trim is non-zero, then
+ * trim that many characters off the beginning of each refname.
+ * The output is ordered by refname.
+ */
+struct ref_iterator *refs_ref_iterator_begin(
+	struct ref_store *refs,
+	const char *prefix, const char **exclude_patterns,
+	int trim, enum do_for_each_ref_flags flags);
+
+/*
+ * Advance the iterator to the first or next item and return ITER_OK.
+ * If the iteration is exhausted, free the resources associated with
+ * the ref_iterator and return ITER_DONE. On errors, free the iterator
+ * resources and return ITER_ERROR. It is a bug to use ref_iterator or
+ * call this function again after it has returned ITER_DONE or
+ * ITER_ERROR.
+ */
+int ref_iterator_advance(struct ref_iterator *ref_iterator);
+
+/*
+ * Seek the iterator to the first reference with the given prefix.
+ * The prefix is matched as a literal string, without regard for path
+ * separators. If prefix is NULL or the empty string, seek the iterator to the
+ * first reference again.
+ *
+ * This function is expected to behave as if a new ref iterator with the same
+ * prefix had been created, but allows reuse of iterators and thus may allow
+ * the backend to optimize. Parameters other than the prefix that have been
+ * passed when creating the iterator will remain unchanged.
+ *
+ * Returns 0 on success, a negative error code otherwise.
+ */
+int ref_iterator_seek(struct ref_iterator *ref_iterator,
+		      const char *prefix);
+
+/*
+ * If possible, peel the reference currently being viewed by the
+ * iterator. Return 0 on success.
+ */
+int ref_iterator_peel(struct ref_iterator *ref_iterator,
+		      struct object_id *peeled);
+
+/* Free the reference iterator and any associated resources. */
+void ref_iterator_free(struct ref_iterator *ref_iterator);
+
+/*
+ * The common backend for the for_each_*ref* functions. Call fn for
+ * each reference in iter. If the iterator itself ever returns
+ * ITER_ERROR, return -1. If fn ever returns a non-zero value, stop
+ * the iteration and return that value. Otherwise, return 0. In any
+ * case, free the iterator when done. This function is basically an
+ * adapter between the callback style of reference iteration and the
+ * iterator style.
+ */
+int do_for_each_ref_iterator(struct ref_iterator *iter,
+			     each_ref_fn fn, void *cb_data);
+
 #endif /* REFS_H */
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index f868870851..03f5df04d5 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -244,90 +244,8 @@ const char *find_descendant_ref(const char *dirname,
 #define SYMREF_MAXDEPTH 5
 
 /*
- * These flags are passed to refs_ref_iterator_begin() (and do_for_each_ref(),
- * which feeds it).
- */
-enum do_for_each_ref_flags {
-	/*
-	 * Include broken references in a do_for_each_ref*() iteration, which
-	 * would normally be omitted. This includes both refs that point to
-	 * missing objects (a true repository corruption), ones with illegal
-	 * names (which we prefer not to expose to callers), as well as
-	 * dangling symbolic refs (i.e., those that point to a non-existent
-	 * ref; this is not a corruption, but as they have no valid oid, we
-	 * omit them from normal iteration results).
-	 */
-	DO_FOR_EACH_INCLUDE_BROKEN = (1 << 0),
-
-	/*
-	 * Only include per-worktree refs in a do_for_each_ref*() iteration.
-	 * Normally this will be used with a files ref_store, since that's
-	 * where all reference backends will presumably store their
-	 * per-worktree refs.
-	 */
-	DO_FOR_EACH_PER_WORKTREE_ONLY = (1 << 1),
-
-	/*
-	 * Omit dangling symrefs from output; this only has an effect with
-	 * INCLUDE_BROKEN, since they are otherwise not included at all.
-	 */
-	DO_FOR_EACH_OMIT_DANGLING_SYMREFS = (1 << 2),
-
-	/*
-	 * Include root refs i.e. HEAD and pseudorefs along with the regular
-	 * refs.
-	 */
-	DO_FOR_EACH_INCLUDE_ROOT_REFS = (1 << 3),
-};
-
-/*
- * Reference iterators
- *
- * A reference iterator encapsulates the state of an in-progress
- * iteration over references. Create an instance of `struct
- * ref_iterator` via one of the functions in this module.
- *
- * A freshly-created ref_iterator doesn't yet point at a reference. To
- * advance the iterator, call ref_iterator_advance(). If successful,
- * this sets the iterator's refname, oid, and flags fields to describe
- * the next reference and returns ITER_OK. The data pointed at by
- * refname and oid belong to the iterator; if you want to retain them
- * after calling ref_iterator_advance() again or calling
- * ref_iterator_free(), you must make a copy. When the iteration has
- * been exhausted, ref_iterator_advance() releases any resources
- * associated with the iteration, frees the ref_iterator object, and
- * returns ITER_DONE. If you want to abort the iteration early, call
- * ref_iterator_free(), which also frees the ref_iterator object and
- * any associated resources. If there was an internal error advancing
- * to the next entry, ref_iterator_advance() aborts the iteration,
- * frees the ref_iterator, and returns ITER_ERROR.
- *
- * The reference currently being looked at can be peeled by calling
- * ref_iterator_peel(). This function is often faster than peel_ref(),
- * so it should be preferred when iterating over references.
- *
- * Putting it all together, a typical iteration looks like this:
- *
- *     int ok;
- *     struct ref_iterator *iter = ...;
- *
- *     while ((ok = ref_iterator_advance(iter)) == ITER_OK) {
- *             if (want_to_stop_iteration()) {
- *                     ok = ITER_DONE;
- *                     break;
- *             }
- *
- *             // Access information about the current reference:
- *             if (!(iter->flags & REF_ISSYMREF))
- *                     printf("%s is %s\n", iter->refname, oid_to_hex(iter->oid));
- *
- *             // If you need to peel the reference:
- *             ref_iterator_peel(iter, &oid);
- *     }
- *
- *     if (ok != ITER_DONE)
- *             handle_error();
- *     ref_iterator_free(iter);
+ * Data structure for holding a reference iterator. See refs.h for
+ * more details and usage instructions.
  */
 struct ref_iterator {
 	struct ref_iterator_vtable *vtable;
@@ -337,42 +255,6 @@ struct ref_iterator {
 	unsigned int flags;
 };
 
-/*
- * Advance the iterator to the first or next item and return ITER_OK.
- * If the iteration is exhausted, free the resources associated with
- * the ref_iterator and return ITER_DONE. On errors, free the iterator
- * resources and return ITER_ERROR. It is a bug to use ref_iterator or
- * call this function again after it has returned ITER_DONE or
- * ITER_ERROR.
- */
-int ref_iterator_advance(struct ref_iterator *ref_iterator);
-
-/*
- * Seek the iterator to the first reference with the given prefix.
- * The prefix is matched as a literal string, without regard for path
- * separators. If prefix is NULL or the empty string, seek the iterator to the
- * first reference again.
- *
- * This function is expected to behave as if a new ref iterator with the same
- * prefix had been created, but allows reuse of iterators and thus may allow
- * the backend to optimize. Parameters other than the prefix that have been
- * passed when creating the iterator will remain unchanged.
- *
- * Returns 0 on success, a negative error code otherwise.
- */
-int ref_iterator_seek(struct ref_iterator *ref_iterator,
-		      const char *prefix);
-
-/*
- * If possible, peel the reference currently being viewed by the
- * iterator. Return 0 on success.
- */
-int ref_iterator_peel(struct ref_iterator *ref_iterator,
-		      struct object_id *peeled);
-
-/* Free the reference iterator and any associated resources. */
-void ref_iterator_free(struct ref_iterator *ref_iterator);
-
 /*
  * An iterator over nothing (its first ref_iterator_advance() call
  * returns ITER_DONE).
@@ -384,17 +266,6 @@ struct ref_iterator *empty_ref_iterator_begin(void);
  */
 int is_empty_ref_iterator(struct ref_iterator *ref_iterator);
 
-/*
- * Return an iterator that goes over each reference in `refs` for
- * which the refname begins with prefix. If trim is non-zero, then
- * trim that many characters off the beginning of each refname.
- * The output is ordered by refname.
- */
-struct ref_iterator *refs_ref_iterator_begin(
-		struct ref_store *refs,
-		const char *prefix, const char **exclude_patterns,
-		int trim, enum do_for_each_ref_flags flags);
-
 /*
  * A callback function used to instruct merge_ref_iterator how to
  * interleave the entries from iter0 and iter1. The function should
@@ -520,18 +391,6 @@ struct ref_iterator_vtable {
  */
 extern struct ref_iterator *current_ref_iter;
 
-/*
- * The common backend for the for_each_*ref* functions. Call fn for
- * each reference in iter. If the iterator itself ever returns
- * ITER_ERROR, return -1. If fn ever returns a non-zero value, stop
- * the iteration and return that value. Otherwise, return 0. In any
- * case, free the iterator when done. This function is basically an
- * adapter between the callback style of reference iteration and the
- * iterator style.
- */
-int do_for_each_ref_iterator(struct ref_iterator *iter,
-			     each_ref_fn fn, void *cb_data);
-
 struct ref_store;
 
 /* refs backends */
-- 
2.49.0
* [PATCH v3 2/4] ref-cache: remove unused function 'find_ref_entry()'
  2025-07-08 13:47 ` [PATCH v3 0/4] for-each-ref: introduce seeking functionality via '--start-after' Karthik Nayak
  2025-07-08 13:47   ` [PATCH v3 1/4] refs: expose `ref_iterator` via 'refs.h' Karthik Nayak
@ 2025-07-08 13:47   ` Karthik Nayak
  2025-07-08 13:47   ` [PATCH v3 3/4] refs: selectively set prefix in the seek functions Karthik Nayak
  2025-07-08 13:47   ` [PATCH v3 4/4] for-each-ref: introduce a '--start-after' option Karthik Nayak
  3 siblings, 0 replies; 102+ messages in thread
From: Karthik Nayak @ 2025-07-08 13:47 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak, gitster, ps, schwab, phillip.wood123
The 'find_ref_entry' function is no longer used, so remove it.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
 refs/ref-cache.c | 14 --------------
 refs/ref-cache.h |  7 -------
 2 files changed, 21 deletions(-)
diff --git a/refs/ref-cache.c b/refs/ref-cache.c
index c1f1bab1d5..8aaffa8c6b 100644
--- a/refs/ref-cache.c
+++ b/refs/ref-cache.c
@@ -194,20 +194,6 @@ static struct ref_dir *find_containing_dir(struct ref_dir *dir,
 	return dir;
 }
 
-struct ref_entry *find_ref_entry(struct ref_dir *dir, const char *refname)
-{
-	int entry_index;
-	struct ref_entry *entry;
-	dir = find_containing_dir(dir, refname);
-	if (!dir)
-		return NULL;
-	entry_index = search_ref_dir(dir, refname, strlen(refname));
-	if (entry_index == -1)
-		return NULL;
-	entry = dir->entries[entry_index];
-	return (entry->flag & REF_DIR) ? NULL : entry;
-}
-
 /*
  * Emit a warning and return true iff ref1 and ref2 have the same name
  * and the same oid. Die if they have the same name but different
diff --git a/refs/ref-cache.h b/refs/ref-cache.h
index 5f04e518c3..f635d2d824 100644
--- a/refs/ref-cache.h
+++ b/refs/ref-cache.h
@@ -201,13 +201,6 @@ void free_ref_cache(struct ref_cache *cache);
  */
 void add_entry_to_dir(struct ref_dir *dir, struct ref_entry *entry);
 
-/*
- * Find the value entry with the given name in dir, sorting ref_dirs
- * and recursing into subdirectories as necessary.  If the name is not
- * found or it corresponds to a directory entry, return NULL.
- */
-struct ref_entry *find_ref_entry(struct ref_dir *dir, const char *refname);
-
 /*
  * Start iterating over references in `cache`. If `prefix` is
  * specified, only include references whose names start with that
-- 
2.49.0
* [PATCH v3 3/4] refs: selectively set prefix in the seek functions
  2025-07-08 13:47 ` [PATCH v3 0/4] for-each-ref: introduce seeking functionality via '--start-after' Karthik Nayak
  2025-07-08 13:47   ` [PATCH v3 1/4] refs: expose `ref_iterator` via 'refs.h' Karthik Nayak
  2025-07-08 13:47   ` [PATCH v3 2/4] ref-cache: remove unused function 'find_ref_entry()' Karthik Nayak
@ 2025-07-08 13:47   ` Karthik Nayak
  2025-07-10  6:44     ` Patrick Steinhardt
  2025-07-08 13:47   ` [PATCH v3 4/4] for-each-ref: introduce a '--start-after' option Karthik Nayak
  3 siblings, 1 reply; 102+ messages in thread
From: Karthik Nayak @ 2025-07-08 13:47 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak, gitster, ps, schwab, phillip.wood123
The ref iterator exposes a `ref_iterator_seek()` function. The name
suggests that this would seek the iterator to a specific reference, in
some ways similar to how `fseek()` works for the filesystem.
However, the function actually sets the prefix for refs iteration, so
further iteration only yields references which match that
prefix. This is a bit confusing.
Let's add a 'flags' parameter to the function. When the
'REF_ITERATOR_SEEK_SET_PREFIX' flag is set, the function sets the prefix
for the iteration, in line with the existing behavior. Otherwise, the
reference backends simply seek to the specified reference and clear any
previously set prefix. This allows users to start iteration from a
specific reference.
In the packed and reftable backends, since references are available in a
sorted list, the changes simply set the prefix if needed. The
changes to the files backend are a little more involved, since the files
backend uses the 'ref-cache' mechanism. We move the existing logic
within `cache_ref_iterator_seek()` out to `cache_ref_iterator_set_prefix()`,
which is called when the 'REF_ITERATOR_SEEK_SET_PREFIX' flag is set. We
then parse the provided seek string and set the required levels and
their indexes to ensure that seeking is possible.
Helped-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
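The flag semantics described above can be sketched with a toy iterator
over a sorted array of refnames. This is only an illustration, not the
code from this patch: `toy_iter`, `toy_seek()`, `toy_advance()`,
`toy_count()` and `TOY_SEEK_SET_PREFIX` are hypothetical names, and the
linear scan stands in for whatever lookup a real backend uses.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for REF_ITERATOR_SEEK_SET_PREFIX. */
enum toy_seek_flag {
	TOY_SEEK_SET_PREFIX = (1 << 0),
};

/* Toy iterator over a sorted, caller-owned array of refnames. */
struct toy_iter {
	const char *const *refs; /* sorted refnames */
	size_t nr;               /* number of refnames */
	size_t pos;              /* index of the next candidate */
	const char *prefix;      /* borrowed; NULL means "no prefix" */
};

/*
 * Position the iterator at the first refname that sorts at or after
 * `refname`. Any old prefix is dropped; a new one is installed only
 * when TOY_SEEK_SET_PREFIX is passed.
 */
static int toy_seek(struct toy_iter *it, const char *refname,
		    unsigned int flags)
{
	assert(it);
	it->pos = 0;
	if (refname && *refname) {
		while (it->pos < it->nr &&
		       strcmp(it->refs[it->pos], refname) < 0)
			it->pos++;
	}

	it->prefix = NULL;
	if (flags & TOY_SEEK_SET_PREFIX)
		it->prefix = refname;
	return 0;
}

/* Return the next refname, or NULL once the iteration is exhausted. */
static const char *toy_advance(struct toy_iter *it)
{
	assert(it);
	if (it->pos >= it->nr)
		return NULL;
	if (it->prefix &&
	    strncmp(it->refs[it->pos], it->prefix, strlen(it->prefix)))
		return NULL; /* stepped outside the prefix */
	return it->refs[it->pos++];
}

/* Count the refnames left in the iteration, consuming the iterator. */
static size_t toy_count(struct toy_iter *it)
{
	size_t n = 0;

	while (toy_advance(it))
		n++;
	return n;
}
```

With the flag, the iteration is bounded by the seek string. Without it,
the iteration starts at the seek string and runs to the end of the
list, since no prefix is left to stop it.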
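 refs.c                  |  6 ++--
 refs.h                  | 29 +++++++++++------
 refs/debug.c            |  7 ++--
 refs/files-backend.c    |  7 ++--
 refs/iterator.c         | 26 ++++++++-------
 refs/packed-backend.c   | 17 ++++++----
 refs/ref-cache.c        | 85 ++++++++++++++++++++++++++++++++++++++++++++++---
 refs/refs-internal.h    |  7 ++--
 refs/reftable-backend.c | 21 ++++++++----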
 9 files changed, 155 insertions(+), 50 deletions(-)
diff --git a/refs.c b/refs.c
index dce5c49ca2..243e6898b8 100644
--- a/refs.c
+++ b/refs.c
@@ -2666,12 +2666,12 @@ enum ref_transaction_error refs_verify_refnames_available(struct ref_store *refs
 		if (!initial_transaction) {
 			int ok;
 
-			if (!iter) {
+			if (!iter)
 				iter = refs_ref_iterator_begin(refs, dirname.buf, NULL, 0,
 							       DO_FOR_EACH_INCLUDE_BROKEN);
-			} else if (ref_iterator_seek(iter, dirname.buf) < 0) {
+			else if (ref_iterator_seek(iter, dirname.buf,
+						   REF_ITERATOR_SEEK_SET_PREFIX) < 0)
 				goto cleanup;
-			}
 
 			while ((ok = ref_iterator_advance(iter)) == ITER_OK) {
 				if (skip &&
diff --git a/refs.h b/refs.h
index 7c21aaef3d..7852ad36f3 100644
--- a/refs.h
+++ b/refs.h
@@ -1299,21 +1299,32 @@ struct ref_iterator *refs_ref_iterator_begin(
  */
 int ref_iterator_advance(struct ref_iterator *ref_iterator);
 
+enum ref_iterator_seek_flag {
+	/*
+	 * Also set the seek pattern as a prefix for iteration. This ensures
+	 * that only references which match the prefix are yielded.
+	 */
+	REF_ITERATOR_SEEK_SET_PREFIX = (1 << 0),
+};
+
 /*
- * Seek the iterator to the first reference with the given prefix.
- * The prefix is matched as a literal string, without regard for path
- * separators. If prefix is NULL or the empty string, seek the iterator to the
+ * Seek the iterator to the first reference matching the given seek string.
+ * The seek string is matched as a literal string, without regard for path
+ * separators. If seek is NULL or the empty string, seek the iterator to the
  * first reference again.
  *
- * This function is expected to behave as if a new ref iterator with the same
- * prefix had been created, but allows reuse of iterators and thus may allow
- * the backend to optimize. Parameters other than the prefix that have been
- * passed when creating the iterator will remain unchanged.
+ * This function is expected to behave as if a new ref iterator has been
+ * created, but allows reuse of existing iterators for optimization.
+ *
+ * When the REF_ITERATOR_SEEK_SET_PREFIX flag is set, the iterator's prefix is
+ * updated to match the seek string, affecting all subsequent iterations. If
+ * not, the iterator seeks to the specified reference and clears any previously
+ * set prefix.
  *
  * Returns 0 on success, a negative error code otherwise.
  */
-int ref_iterator_seek(struct ref_iterator *ref_iterator,
-		      const char *prefix);
+int ref_iterator_seek(struct ref_iterator *ref_iterator, const char *seek,
+		      unsigned int flags);
 
 /*
  * If possible, peel the reference currently being viewed by the
diff --git a/refs/debug.c b/refs/debug.c
index 485e3079d7..2ed8cff2aa 100644
--- a/refs/debug.c
+++ b/refs/debug.c
@@ -170,12 +170,13 @@ static int debug_ref_iterator_advance(struct ref_iterator *ref_iterator)
 }
 
 static int debug_ref_iterator_seek(struct ref_iterator *ref_iterator,
-				   const char *prefix)
+				   const char *seek, unsigned int flags)
 {
 	struct debug_ref_iterator *diter =
 		(struct debug_ref_iterator *)ref_iterator;
-	int res = diter->iter->vtable->seek(diter->iter, prefix);
-	trace_printf_key(&trace_refs, "iterator_seek: %s: %d\n", prefix ? prefix : "", res);
+	int res = diter->iter->vtable->seek(diter->iter, seek, flags);
+	trace_printf_key(&trace_refs, "iterator_seek: %s flags: %d: %d\n",
+			 seek ? seek : "", flags, res);
 	return res;
 }
 
diff --git a/refs/files-backend.c b/refs/files-backend.c
index bf6f89b1d1..0e63013319 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -929,11 +929,11 @@ static int files_ref_iterator_advance(struct ref_iterator *ref_iterator)
 }
 
 static int files_ref_iterator_seek(struct ref_iterator *ref_iterator,
-				   const char *prefix)
+				   const char *seek, unsigned int flags)
 {
 	struct files_ref_iterator *iter =
 		(struct files_ref_iterator *)ref_iterator;
-	return ref_iterator_seek(iter->iter0, prefix);
+	return ref_iterator_seek(iter->iter0, seek, flags);
 }
 
 static int files_ref_iterator_peel(struct ref_iterator *ref_iterator,
@@ -2316,7 +2316,8 @@ static int files_reflog_iterator_advance(struct ref_iterator *ref_iterator)
 }
 
 static int files_reflog_iterator_seek(struct ref_iterator *ref_iterator UNUSED,
-				      const char *prefix UNUSED)
+				      const char *seek UNUSED,
+				      unsigned int flags UNUSED)
 {
 	BUG("ref_iterator_seek() called for reflog_iterator");
 }
diff --git a/refs/iterator.c b/refs/iterator.c
index 766d96e795..f2364bd6e7 100644
--- a/refs/iterator.c
+++ b/refs/iterator.c
@@ -15,10 +15,10 @@ int ref_iterator_advance(struct ref_iterator *ref_iterator)
 	return ref_iterator->vtable->advance(ref_iterator);
 }
 
-int ref_iterator_seek(struct ref_iterator *ref_iterator,
-		      const char *prefix)
+int ref_iterator_seek(struct ref_iterator *ref_iterator, const char *seek,
+		      unsigned int flags)
 {
-	return ref_iterator->vtable->seek(ref_iterator, prefix);
+	return ref_iterator->vtable->seek(ref_iterator, seek, flags);
 }
 
 int ref_iterator_peel(struct ref_iterator *ref_iterator,
@@ -57,7 +57,8 @@ static int empty_ref_iterator_advance(struct ref_iterator *ref_iterator UNUSED)
 }
 
 static int empty_ref_iterator_seek(struct ref_iterator *ref_iterator UNUSED,
-				   const char *prefix UNUSED)
+				   const char *seek UNUSED,
+				   unsigned int flags UNUSED)
 {
 	return 0;
 }
@@ -224,7 +225,7 @@ static int merge_ref_iterator_advance(struct ref_iterator *ref_iterator)
 }
 
 static int merge_ref_iterator_seek(struct ref_iterator *ref_iterator,
-				   const char *prefix)
+				   const char *seek, unsigned int flags)
 {
 	struct merge_ref_iterator *iter =
 		(struct merge_ref_iterator *)ref_iterator;
@@ -234,11 +235,11 @@ static int merge_ref_iterator_seek(struct ref_iterator *ref_iterator,
 	iter->iter0 = iter->iter0_owned;
 	iter->iter1 = iter->iter1_owned;
 
-	ret = ref_iterator_seek(iter->iter0, prefix);
+	ret = ref_iterator_seek(iter->iter0, seek, flags);
 	if (ret < 0)
 		return ret;
 
-	ret = ref_iterator_seek(iter->iter1, prefix);
+	ret = ref_iterator_seek(iter->iter1, seek, flags);
 	if (ret < 0)
 		return ret;
 
@@ -407,13 +408,16 @@ static int prefix_ref_iterator_advance(struct ref_iterator *ref_iterator)
 }
 
 static int prefix_ref_iterator_seek(struct ref_iterator *ref_iterator,
-				    const char *prefix)
+				    const char *seek, unsigned int flags)
 {
 	struct prefix_ref_iterator *iter =
 		(struct prefix_ref_iterator *)ref_iterator;
-	free(iter->prefix);
-	iter->prefix = xstrdup_or_null(prefix);
-	return ref_iterator_seek(iter->iter0, prefix);
+
+	if (flags & REF_ITERATOR_SEEK_SET_PREFIX) {
+		free(iter->prefix);
+		iter->prefix = xstrdup_or_null(seek);
+	}
+	return ref_iterator_seek(iter->iter0, seek, flags);
 }
 
 static int prefix_ref_iterator_peel(struct ref_iterator *ref_iterator,
diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 7fd73a0e6d..11a363d246 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -1004,19 +1004,23 @@ static int packed_ref_iterator_advance(struct ref_iterator *ref_iterator)
 }
 
 static int packed_ref_iterator_seek(struct ref_iterator *ref_iterator,
-				    const char *prefix)
+				    const char *seek, unsigned int flags)
 {
 	struct packed_ref_iterator *iter =
 		(struct packed_ref_iterator *)ref_iterator;
 	const char *start;
 
-	if (prefix && *prefix)
-		start = find_reference_location(iter->snapshot, prefix, 0);
+	if (seek && *seek)
+		start = find_reference_location(iter->snapshot, seek, 0);
 	else
 		start = iter->snapshot->start;
 
-	free(iter->prefix);
-	iter->prefix = xstrdup_or_null(prefix);
+	/* Unset any previously set prefix */
+	FREE_AND_NULL(iter->prefix);
+
+	if (flags & REF_ITERATOR_SEEK_SET_PREFIX)
+		iter->prefix = xstrdup_or_null(seek);
+
 	iter->pos = start;
 	iter->eof = iter->snapshot->eof;
 
@@ -1194,7 +1198,8 @@ static struct ref_iterator *packed_ref_iterator_begin(
 	iter->repo = ref_store->repo;
 	iter->flags = flags;
 
-	if (packed_ref_iterator_seek(&iter->base, prefix) < 0) {
+	if (packed_ref_iterator_seek(&iter->base, prefix,
+				     REF_ITERATOR_SEEK_SET_PREFIX) < 0) {
 		ref_iterator_free(&iter->base);
 		return NULL;
 	}
diff --git a/refs/ref-cache.c b/refs/ref-cache.c
index 8aaffa8c6b..01dfbeb50c 100644
--- a/refs/ref-cache.c
+++ b/refs/ref-cache.c
@@ -434,11 +434,9 @@ static int cache_ref_iterator_advance(struct ref_iterator *ref_iterator)
 	}
 }
 
-static int cache_ref_iterator_seek(struct ref_iterator *ref_iterator,
-				   const char *prefix)
+static int cache_ref_iterator_set_prefix(struct cache_ref_iterator *iter,
+					 const char *prefix)
 {
-	struct cache_ref_iterator *iter =
-		(struct cache_ref_iterator *)ref_iterator;
 	struct cache_ref_iterator_level *level;
 	struct ref_dir *dir;
 
@@ -469,6 +467,82 @@ static int cache_ref_iterator_seek(struct ref_iterator *ref_iterator,
 	return 0;
 }
 
+static int cache_ref_iterator_seek(struct ref_iterator *ref_iterator,
+				   const char *seek, unsigned int flags)
+{
+	struct cache_ref_iterator *iter =
+		(struct cache_ref_iterator *)ref_iterator;
+
+	if (flags & REF_ITERATOR_SEEK_SET_PREFIX) {
+		return cache_ref_iterator_set_prefix(iter, seek);
+	} else if (seek && *seek) {
+		struct cache_ref_iterator_level *level;
+		const char *slash = seek;
+		struct ref_dir *dir;
+
+		dir = get_ref_dir(iter->cache->root);
+
+		if (iter->prime_dir)
+			prime_ref_dir(dir, seek);
+
+		iter->levels_nr = 1;
+		level = &iter->levels[0];
+		level->index = -1;
+		level->dir = dir;
+
+		/* Unset any previously set prefix */
+		FREE_AND_NULL(iter->prefix);
+
+		/*
+		 * Breakdown the provided seek path and assign the correct
+		 * indexing to each level as needed.
+		 */
+		do {
+			int len, idx;
+			int cmp = 0;
+
+			sort_ref_dir(dir);
+
+			slash = strchr(slash, '/');
+			len = slash ? slash - seek : (int)strlen(seek);
+
+			for (idx = 0; idx < dir->nr; idx++) {
+				cmp = strncmp(seek, dir->entries[idx]->name, len);
+				if (cmp <= 0)
+					break;
+			}
+			/* don't overflow the index */
+			idx = idx >= dir->nr ? dir->nr - 1 : idx;
+
+			if (slash)
+				slash = slash + 1;
+
+			level->index = idx;
+			if (dir->entries[idx]->flag & REF_DIR) {
+				/* push down a level */
+				dir = get_ref_dir(dir->entries[idx]);
+
+				ALLOC_GROW(iter->levels, iter->levels_nr + 1,
+					   iter->levels_alloc);
+				level = &iter->levels[iter->levels_nr++];
+				level->dir = dir;
+				level->index = -1;
+			} else {
+				/* reduce the index so the leaf node is iterated over */
+				if (cmp <= 0 && !slash)
+					level->index = idx - 1;
+				/*
+				 * while the seek path may not be exhausted, our
+				 * match is exhausted at a leaf node.
+				 */
+				break;
+			}
+		} while (slash);
+	}
+
+	return 0;
+}
+
 static int cache_ref_iterator_peel(struct ref_iterator *ref_iterator,
 				   struct object_id *peeled)
 {
@@ -509,7 +583,8 @@ struct ref_iterator *cache_ref_iterator_begin(struct ref_cache *cache,
 	iter->cache = cache;
 	iter->prime_dir = prime_dir;
 
-	if (cache_ref_iterator_seek(&iter->base, prefix) < 0) {
+	if (cache_ref_iterator_seek(&iter->base, prefix,
+				    REF_ITERATOR_SEEK_SET_PREFIX) < 0) {
 		ref_iterator_free(&iter->base);
 		return NULL;
 	}
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index 03f5df04d5..6376a3b379 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -353,11 +353,12 @@ void base_ref_iterator_init(struct ref_iterator *iter,
 typedef int ref_iterator_advance_fn(struct ref_iterator *ref_iterator);
 
 /*
- * Seek the iterator to the first reference matching the given prefix. Should
- * behave the same as if a new iterator was created with the same prefix.
+ * Seek the iterator to the first matching reference. If set_prefix is set,
+ * it would behave the same as if a new iterator was created with the same
+ * prefix.
  */
 typedef int ref_iterator_seek_fn(struct ref_iterator *ref_iterator,
-				 const char *prefix);
+				 const char *seek, unsigned int flags);
 
 /*
  * Peels the current ref, returning 0 for success or -1 for failure.
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index 4c3817f4ec..d627221b65 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -719,15 +719,20 @@ static int reftable_ref_iterator_advance(struct ref_iterator *ref_iterator)
 }
 
 static int reftable_ref_iterator_seek(struct ref_iterator *ref_iterator,
-				      const char *prefix)
+				      const char *seek, unsigned int flags)
 {
 	struct reftable_ref_iterator *iter =
 		(struct reftable_ref_iterator *)ref_iterator;
 
-	free(iter->prefix);
-	iter->prefix = xstrdup_or_null(prefix);
-	iter->prefix_len = prefix ? strlen(prefix) : 0;
-	iter->err = reftable_iterator_seek_ref(&iter->iter, prefix);
+	/* Unset any previously set prefix */
+	FREE_AND_NULL(iter->prefix);
+	iter->prefix_len = 0;
+
+	if (flags & REF_ITERATOR_SEEK_SET_PREFIX) {
+		iter->prefix = xstrdup_or_null(seek);
+		iter->prefix_len = seek ? strlen(seek) : 0;
+	}
+	iter->err = reftable_iterator_seek_ref(&iter->iter, seek);
 
 	return iter->err;
 }
@@ -839,7 +844,8 @@ static struct reftable_ref_iterator *ref_iterator_for_stack(struct reftable_ref_
 	if (ret)
 		goto done;
 
-	ret = reftable_ref_iterator_seek(&iter->base, prefix);
+	ret = reftable_ref_iterator_seek(&iter->base, prefix,
+					 REF_ITERATOR_SEEK_SET_PREFIX);
 	if (ret)
 		goto done;
 
@@ -2042,7 +2048,8 @@ static int reftable_reflog_iterator_advance(struct ref_iterator *ref_iterator)
 }
 
 static int reftable_reflog_iterator_seek(struct ref_iterator *ref_iterator UNUSED,
-					 const char *prefix UNUSED)
+					 const char *seek UNUSED,
+					 unsigned int flags UNUSED)
 {
 	BUG("reftable reflog iterator cannot be seeked");
 	return -1;
-- 
2.49.0
* Re: [PATCH v3 3/4] refs: selectively set prefix in the seek functions
  2025-07-08 13:47   ` [PATCH v3 3/4] refs: selectively set prefix in the seek functions Karthik Nayak
@ 2025-07-10  6:44     ` Patrick Steinhardt
  2025-07-11  9:44       ` Karthik Nayak
  2025-07-14 16:09       ` Junio C Hamano
  0 siblings, 2 replies; 102+ messages in thread
From: Patrick Steinhardt @ 2025-07-10  6:44 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: git, gitster, schwab, phillip.wood123
On Tue, Jul 08, 2025 at 03:47:48PM +0200, Karthik Nayak wrote:
> diff --git a/refs.h b/refs.h
> index 7c21aaef3d..7852ad36f3 100644
> --- a/refs.h
> +++ b/refs.h
> @@ -1299,21 +1299,32 @@ struct ref_iterator *refs_ref_iterator_begin(
>   */
>  int ref_iterator_advance(struct ref_iterator *ref_iterator);
>  
> +enum ref_iterator_seek_flag {
> +	/*
> +	 * Also set the seek pattern as a prefix for iteration. This ensures
> +	 * that only references which match the prefix are yielded.
> +	 */
> +	REF_ITERATOR_SEEK_SET_PREFIX = (1 << 0),
> +};
> +
Nit: I think it's a tiny bit confusing that the documentation of this
enum is split up across here and the doc of `ref_iterator_seek()`. I
think it would be sensible to move the last paragraph of the function
over here so that the whole behaviour of the enum is explained in a
single place.
>  /*
> - * Seek the iterator to the first reference with the given prefix.
> - * The prefix is matched as a literal string, without regard for path
> - * separators. If prefix is NULL or the empty string, seek the iterator to the
> + * Seek the iterator to the first reference matching the given seek string.
> + * The seek string is matched as a literal string, without regard for path
> + * separators. If seek is NULL or the empty string, seek the iterator to the
>   * first reference again.
>   *
> - * This function is expected to behave as if a new ref iterator with the same
> - * prefix had been created, but allows reuse of iterators and thus may allow
> - * the backend to optimize. Parameters other than the prefix that have been
> - * passed when creating the iterator will remain unchanged.
> + * This function is expected to behave as if a new ref iterator has been
> + * created, but allows reuse of existing iterators for optimization.
> + *
> + * When the REF_ITERATOR_SEEK_SET_PREFIX flag is set, the iterator's prefix is
> + * updated to match the seek string, affecting all subsequent iterations. If
> + * not, the iterator seeks to the specified reference and clears any previously
> + * set prefix.
>   *
>   * Returns 0 on success, a negative error code otherwise.
>   */
> -int ref_iterator_seek(struct ref_iterator *ref_iterator,
> -		      const char *prefix);
> +int ref_iterator_seek(struct ref_iterator *ref_iterator, const char *seek,
> +		      unsigned int flags);
Another tiny nit: instead of calling the variable `seek` we can just
call it `refname`. That might give a bit more of a hint what you're
actually seeking for.
But other than that I'm happy with the new behaviour, where we are now
consistently either setting or resetting the prefix depending on whether
or not the caller set the flag.
> diff --git a/refs/iterator.c b/refs/iterator.c
> index 766d96e795..f2364bd6e7 100644
> --- a/refs/iterator.c
> +++ b/refs/iterator.c
> @@ -407,13 +408,16 @@ static int prefix_ref_iterator_advance(struct ref_iterator *ref_iterator)
>  }
>  
>  static int prefix_ref_iterator_seek(struct ref_iterator *ref_iterator,
> -				    const char *prefix)
> +				    const char *seek, unsigned int flags)
>  {
>  	struct prefix_ref_iterator *iter =
>  		(struct prefix_ref_iterator *)ref_iterator;
> -	free(iter->prefix);
> -	iter->prefix = xstrdup_or_null(prefix);
> -	return ref_iterator_seek(iter->iter0, prefix);
> +
> +	if (flags & REF_ITERATOR_SEEK_SET_PREFIX) {
> +		free(iter->prefix);
> +		iter->prefix = xstrdup_or_null(seek);
> +	}
> +	return ref_iterator_se
* Re: [PATCH v3 3/4] refs: selectively set prefix in the seek functions
  2025-07-10  6:44     ` Patrick Steinhardt
@ 2025-07-11  9:44       ` Karthik Nayak
  2025-07-14 16:09       ` Junio C Hamano
  1 sibling, 0 replies; 102+ messages in thread
From: Karthik Nayak @ 2025-07-11  9:44 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, gitster, schwab, phillip.wood123
Patrick Steinhardt <ps@pks.im> writes:
> On Tue, Jul 08, 2025 at 03:47:48PM +0200, Karthik Nayak wrote:
>> diff --git a/refs.h b/refs.h
>> index 7c21aaef3d..7852ad36f3 100644
>> --- a/refs.h
>> +++ b/refs.h
>> @@ -1299,21 +1299,32 @@ struct ref_iterator *refs_ref_iterator_begin(
>>   */
>>  int ref_iterator_advance(struct ref_iterator *ref_iterator);
>>
>> +enum ref_iterator_seek_flag {
>> +	/*
>> +	 * Also set the seek pattern as a prefix for iteration. This ensures
>> +	 * that only references which match the prefix are yielded.
>> +	 */
>> +	REF_ITERATOR_SEEK_SET_PREFIX = (1 << 0),
>> +};
>> +
>
> Nit: I think it's a tiny bit confusing that the documentation of this
> enum is split up across here and the doc of `ref_iterator_seek()`. I
> think it would be sensible to move the last paragraph of the function
> over here so that the whole behaviour of the enum is explained in a
> single place.
>
Yeah I think that makes sense.
>>  /*
>> - * Seek the iterator to the first reference with the given prefix.
>> - * The prefix is matched as a literal string, without regard for path
>> - * separators. If prefix is NULL or the empty string, seek the iterator to the
>> + * Seek the iterator to the first reference matching the given seek string.
>> + * The seek string is matched as a literal string, without regard for path
>> + * separators. If seek is NULL or the empty string, seek the iterator to the
>>   * first reference again.
>>   *
>> - * This function is expected to behave as if a new ref iterator with the same
>> - * prefix had been created, but allows reuse of iterators and thus may allow
>> - * the backend to optimize. Parameters other than the prefix that have been
>> - * passed when creating the iterator will remain unchanged.
>> + * This function is expected to behave as if a new ref iterator has been
>> + * created, but allows reuse of existing iterators for optimization.
>> + *
>> + * When the REF_ITERATOR_SEEK_SET_PREFIX flag is set, the iterator's prefix is
>> + * updated to match the seek string, affecting all subsequent iterations. If
>> + * not, the iterator seeks to the specified reference and clears any previously
>> + * set prefix.
>>   *
>>   * Returns 0 on success, a negative error code otherwise.
>>   */
>> -int ref_iterator_seek(struct ref_iterator *ref_iterator,
>> -		      const char *prefix);
>> +int ref_iterator_seek(struct ref_iterator *ref_iterator, const char *seek,
>> +		      unsigned int flags);
>
> Another tiny nit: instead of calling the variable `seek` we can just
> call it `refname`. That might give a bit more of a hint what you're
> actually seeking for.
>
Fair enough, let me change that.
> But other than that I'm happy with the new behaviour, where we are now
> consistently either setting or resetting the prefix depending on whether
> or not the caller set the flag.
>
Thanks for the review!
>> diff --git a/refs/iterator.c b/refs/iterator.c
>> index 766d96e795..f2364bd6e7 100644
>> --- a/refs/iterator.c
>> +++ b/refs/iterator.c
>> @@ -407,13 +408,16 @@ static int prefix_ref_iterator_advance(struct ref_iterator *ref_iterator)
>>  }
>>
>>  static int prefix_ref_iterator_seek(struct ref_iterator *ref_iterator,
>> -				    const char *prefix)
>> +				    const char *seek, unsigned int flags)
>>  {
>>  	struct prefix_ref_iterator *iter =
>>  		(struct prefix_ref_iterator *)ref_iterator;
>> -	free(iter->prefix);
>> -	iter->prefix = xstrdup_or_null(prefix);
>> -	return ref_iterator_seek(iter->iter0, prefix);
>> +
>> +	if (flags & REF_ITERATOR_SEEK_SET_PREFIX) {
>> +		free(iter->prefix);
>> +		iter->prefix = xstrdup_or_null(seek);
>> +	}
>> +	return ref_iterator_se
* Re: [PATCH v3 3/4] refs: selectively set prefix in the seek functions
  2025-07-10  6:44     ` Patrick Steinhardt
  2025-07-11  9:44       ` Karthik Nayak
@ 2025-07-14 16:09       ` Junio C Hamano
  2025-07-15  9:49         ` Karthik Nayak
  1 sibling, 1 reply; 102+ messages in thread
From: Junio C Hamano @ 2025-07-14 16:09 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: Karthik Nayak, git, schwab, phillip.wood123
Patrick Steinhardt <ps@pks.im> writes:
>> + * When the REF_ITERATOR_SEEK_SET_PREFIX flag is set, the iterator's prefix is
>> + * updated to match the seek string, affecting all subsequent iterations. If
>> + * not, the iterator seeks to the specified reference and clears any previously
>> + * set prefix.
>>   *
>>   * Returns 0 on success, a negative error code otherwise.
>>   */
>> -int ref_iterator_seek(struct ref_iterator *ref_iterator,
>> -		      const char *prefix);
>> +int ref_iterator_seek(struct ref_iterator *ref_iterator, const char *seek,
>> +		      unsigned int flags);
>
> Another tiny nit: instead of calling the variable `seek` we can just
> call it `refname`. That might give a bit more of a hint what you're
> actually seeking for.
>
> But other than that I'm happy with the new behaviour, where we are now
> consistently either setting or resetting the prefix depending on whether
> or not the caller set the flag.
I am not sure.  The way the "prefix" is used, if I understand correctly, is
 - it is set by iterator-begin, typically to the area to iterate
   over (e.g. "refs/heads/" for iterating over branches) in the
   for_each_ref_*() family of helpers, and internally we seek to
   that area (skipping anything that comes strictly before
   "refs/heads/" for example).
 - iterator-advance looks at it and decides we are done when the
   iterator points beyond that prefix
So if you are iterating inside "refs/heads/" hierarchy and seek to
"refs/heads/m", don't you still want to stop when you step outside
"refs/heads/" by keeping the original prefix, instead of unsetting
the prefix to empty?  A postimage of this patch for packed backend
(picked at random) reads like this:
        static int packed_ref_iterator_seek(struct ref_iterator *ref_iterator,
                                            const char *seek, unsigned int flags)
        {
                struct packed_ref_iterator *iter =
                        (struct packed_ref_iterator *)ref_iterator;
                const char *start;
                if (seek && *seek)
                        start = find_reference_location(iter->snapshot, seek, 0);
                else
                        start = iter->snapshot->start;
                /* Unset any previously set prefix */
                FREE_AND_NULL(iter->prefix);
                if (flags & REF_ITERATOR_SEEK_SET_PREFIX)
                        iter->prefix = xstrdup_or_null(seek);
so after (true) seeking that does not have the SET_PREFIX flag on,
wouldn't our iterator-advance run through the end since it no longer
is aware of where to stop?
Thanks.
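The two bullets above (seek to the start of the prefix area, then stop
once the iterator points beyond it) can be sketched over a sorted array
of names. This is an illustrative assumption about the mechanism, not
the backend code: `first_at_or_after()` and `walk_prefix()` are
hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Binary search: index of the first name that sorts at or after `key`. */
static size_t first_at_or_after(const char *const *names, size_t nr,
				const char *key)
{
	size_t lo = 0, hi = nr;

	assert(names && key);
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (strcmp(names[mid], key) < 0)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo;
}

/*
 * Walk the names under `prefix`: start at the first name that sorts at
 * or after the prefix and stop at the first name that no longer starts
 * with it. Stores the start index in *start and returns the number of
 * names visited.
 */
static size_t walk_prefix(const char *const *names, size_t nr,
			  const char *prefix, size_t *start)
{
	size_t len = strlen(prefix);
	size_t i = first_at_or_after(names, nr, prefix);
	size_t n = 0;

	*start = i;
	for (; i < nr && !strncmp(names[i], prefix, len); i++)
		n++;
	return n;
}
```

Because the list is sorted, every name under the prefix is contiguous,
so one comparison per step is enough to know when to stop.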
* Re: [PATCH v3 3/4] refs: selectively set prefix in the seek functions
  2025-07-14 16:09       ` Junio C Hamano
@ 2025-07-15  9:49         ` Karthik Nayak
  2025-07-15 16:35           ` Junio C Hamano
  0 siblings, 1 reply; 102+ messages in thread
From: Karthik Nayak @ 2025-07-15  9:49 UTC (permalink / raw)
  To: Junio C Hamano, Patrick Steinhardt; +Cc: git, schwab, phillip.wood123
Junio C Hamano <gitster@pobox.com> writes:
> Patrick Steinhardt <ps@pks.im> writes:
>
>>> + * When the REF_ITERATOR_SEEK_SET_PREFIX flag is set, the iterator's prefix is
>>> + * updated to match the seek string, affecting all subsequent iterations. If
>>> + * not, the iterator seeks to the specified reference and clears any previously
>>> + * set prefix.
>>>   *
>>>   * Returns 0 on success, a negative error code otherwise.
>>>   */
>>> -int ref_iterator_seek(struct ref_iterator *ref_iterator,
>>> -		      const char *prefix);
>>> +int ref_iterator_seek(struct ref_iterator *ref_iterator, const char *seek,
>>> +		      unsigned int flags);
>>
>> Another tiny nit: instead of calling the variable `seek` we can just
>> call it `refname`. That might give a bit more of a hint what you're
>> actually seeking for.
>>
>> But other than that I'm happy with the new behaviour, where we are now
>> consistently either setting or resetting the prefix depending on whether
>> or not the caller set the flag.
>
> I am not sure.  The way the "prefix" is used, if I understand correctly, is
>
>  - it is set by iterator-begin, typically to the area to iterate
>    over (e.g. "refs/heads/" for iterating over branches) in the
>    for_each_ref_*() family of helpers, and internally we seek to
>    that area (skipping anything that comes strictly before
>    "refs/heads/" for example).
>
>  - iterator-advance looks at it and decides we are done when the
>    iterator points beyond that prefix
>
That's right.
> So if you are iterating inside "refs/heads/" hierarchy and seek to
> "refs/heads/m", don't you still want to stop when you step outside
> "refs/heads/" by keeping the original prefix, instead of unsetting
> the prefix to empty?  A postimage of this patch for packed backend
> (picked at random) reads like this:
>
>         static int packed_ref_iterator_seek(struct ref_iterator *ref_iterator,
>                                             const char *seek, unsigned int flags)
>         {
>                 struct packed_ref_iterator *iter =
>                         (struct packed_ref_iterator *)ref_iterator;
>                 const char *start;
>
>                 if (seek && *seek)
>                         start = find_reference_location(iter->snapshot, seek, 0);
>                 else
>                         start = iter->snapshot->start;
>
>                 /* Unset any previously set prefix */
>                 FREE_AND_NULL(iter->prefix);
>
>                 if (flags & REF_ITERATOR_SEEK_SET_PREFIX)
>                         iter->prefix = xstrdup_or_null(seek);
>
> so after (true) seeking that does not have the SET_PREFIX flag on,
> wouldn't our iterator-advance run through the end since it no longer
> is aware of where to stop?
>
That's also right and that is indeed the intention. We're trying to make
the actions more intentional.
So if a user sets a 'prefix' for the iterator, all previous state of the
iterator is reset. So, the same function for seeking an iterator should
also have the same side-effect of resetting the previous state.
There could be a use case where we add support for keeping the prefix
while also seeking the iterator. That would be an explicit change
(perhaps with a corresponding flag?) that we'd have to build, add tests
for and call out. Until then, we explicitly reset the state whenever a
user calls 'ref_iterator_seek()', so they can be sure that any previous
state is reset.
> Thanks.
* Re: [PATCH v3 3/4] refs: selectively set prefix in the seek functions
  2025-07-15  9:49         ` Karthik Nayak
@ 2025-07-15 16:35           ` Junio C Hamano
  2025-07-16 14:40             ` Karthik Nayak
  0 siblings, 1 reply; 102+ messages in thread
From: Junio C Hamano @ 2025-07-15 16:35 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: Patrick Steinhardt, git, schwab, phillip.wood123
Karthik Nayak <karthik.188@gmail.com> writes:
>> so after (true) seeking that does not have the SET_PREFIX flag on,
>> wouldn't our iterator-advance run through the end since it no longer
>> is aware of where to stop?
>>
>
> That's also right and that is indeed the intention. We're trying to make
> the actions more intentional.
>
> So if a user sets a 'prefix' for the iterator, all previous state of the
> iterator is reset. So, the same function for seeking an iterator should
> also have the same side-effect of resetting the previous state.
Perhaps we have different definitions of "previous state" in mind?
So let's imagine an iterator is walking over all branches (i.e. the
prefix is set to refs/heads/, to allow it to stop once it steps
outside refs/heads/ and moves over to refs/imerge).  It starts
iterating and I see branches whose name sorts early in alphabetical
order.  I tell it to seek to refs/heads/master and keep iterating.
Wouldn't it be a lot more natural if it still stops iterating after
it finishes showing the last branch, iow, a ref in refs/heads/
hierarchy?  In other words, I am not sure why ...
> There could be a use case where we add support for keeping the prefix,
> while also seeking the iterator. That would be an explicit change
... that is the optional and unimplemented feature, not the other
way around.  Is it just the ease of implementation?
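The difference between the two behaviours under discussion can be
sketched as a count of what a walk yields after a seek. This is a toy
model only: `yielded_after_seek()` and its `keep_prefix` argument are
hypothetical and do not correspond to any flag in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Count the names yielded after seeking to `target` in a sorted list
 * that was being walked under `prefix`. With keep_prefix set, the walk
 * still ends at the last name under the prefix; without it, the walk
 * runs to the end of the list, as in the v3 patch. Assumes `target`
 * itself lies under `prefix`.
 */
static size_t yielded_after_seek(const char *const *names, size_t nr,
				 const char *prefix, const char *target,
				 int keep_prefix)
{
	size_t len;
	size_t i = 0, n = 0;

	assert(names && prefix && target);
	len = strlen(prefix);

	/* seek: skip everything that sorts strictly before the target */
	while (i < nr && strcmp(names[i], target) < 0)
		i++;

	for (; i < nr; i++) {
		if (keep_prefix && strncmp(names[i], prefix, len))
			break; /* stepped outside the original prefix */
		n++;
	}
	return n;
}
```

For a walk over "refs/heads/" that seeks to "refs/heads/master", the
first mode stops after the last branch, while the second continues into
whatever sorts after "refs/heads/".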
* Re: [PATCH v3 3/4] refs: selectively set prefix in the seek functions
  2025-07-15 16:35           ` Junio C Hamano
@ 2025-07-16 14:40             ` Karthik Nayak
  2025-07-16 15:39               ` Junio C Hamano
  2025-07-16 20:02               ` Junio C Hamano
  0 siblings, 2 replies; 102+ messages in thread
From: Karthik Nayak @ 2025-07-16 14:40 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Patrick Steinhardt, git, schwab, phillip.wood123
Junio C Hamano <gitster@pobox.com> writes:
> Karthik Nayak <karthik.188@gmail.com> writes:
>
>>> so after (true) seeking that does not have the SET_PREFIX flag on,
>>> wouldn't our iterator-advance run through the end since it no longer
>>> is aware of where to stop?
>>>
>>
>> That's also right and that is indeed the intention. We're trying to make
>> the actions more intentional.
>>
>> So if a user sets a 'prefix' for the iterator, all previous state of the
>> iterator is reset. So, the same function for seeking an iterator should
>> also have the same side-effect of resetting the previous state.
>
> Perhaps we have different definition of "previous state" in mind?
> So let's imagine an iterator is walking over all branches (i.e. the
> prefix is set to refs/heads/, to allow it to stop once it steps
> outside refs/heads/ and moves over to refs/imerge).  It starts
> iterating and I see branches whose name sorts early in alphabetical
> order.  I tell it to seek to refs/heads/master and keep iterating.
>
I get what you're saying and indeed that would be natural. Let me give
another example to draw the contrast.
Let's say a user is iterating with a prefix set to 'refs/heads/', this
would iterate over all the refs with that prefix. But mid-way the user
realizes that they only care about 'refs/heads/feature/' prefix and they
ask the iterator to set that as the prefix.
In such a situation, the iterator seeks to 'refs/heads/feature/' and
will only yield references with that prefix. In short, the previous
prefix state was reset.
So there are two scenarios to tell apart:
1. Only seek the iterator but maintain the prefix
2. Seek and set a new prefix, losing the old prefix
One resets the prefix while the other doesn't. To avoid that ambiguity,
we make it explicit and say that whenever 'ref_iterator_seek' is called,
any previously set prefix is reset. I do see the other way around too,
where the prefix isn't treated as previous state.
What I was trying to argue was that there could be a situation like the
one you mentioned, where a user might want to retain a prefix. That
should be an explicit feature, which is not implemented in this series.
So as of this series, you cannot set a prefix and then seek and expect
to retain the prefix.
> Wouldn't it be a lot more natural if it still stops iterating after
> it finishes showing the last branch, iow, a ref in refs/heads/
> hierarchy?  In other words, I am not sure why ...
>
>> There could be a usecase where we add support for keeping the prefix,
>> while also seeking the iterator. That would be an explicit change
>
> ... that is the optional and unimplemented feature, not the other
> way around.  Is it just the ease of implementation?
This series did start out that way around, so ease of implementation
isn't it. It was more of a side-effect of not clearing state. But I
would be more comfortable if this wasn't a side-effect but rather a
conscious choice with tests and adequate documentation.
^ permalink raw reply	[flat|nested] 102+ messages in thread
* Re: [PATCH v3 3/4] refs: selectively set prefix in the seek functions
  2025-07-16 14:40             ` Karthik Nayak
@ 2025-07-16 15:39               ` Junio C Hamano
  2025-07-16 20:02               ` Junio C Hamano
  1 sibling, 0 replies; 102+ messages in thread
From: Junio C Hamano @ 2025-07-16 15:39 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: Patrick Steinhardt, git, schwab, phillip.wood123
Karthik Nayak <karthik.188@gmail.com> writes:
> Let's say a user is iterating with a prefix set to 'refs/heads/', this
> would iterate over all the refs with that prefix. But mid-way the user
> realizes that they only care about 'refs/heads/feature/' prefix and they
> ask the iterator to set that as the prefix.
But is that user who changes their mind in the middle "seeking"?
It is more like "I am abandoning the enumeration I started earlier
over refs/heads/, and I want a different enumeration over
refs/heads/feature/, but because I know the implementation detail
that abandoning an iterator and creating another is more expensive,
let me reuse the one in use to repurpose it".
I wouldn't call it "seeking"; it sounds more like "resetting".
^ permalink raw reply	[flat|nested] 102+ messages in thread
* Re: [PATCH v3 3/4] refs: selectively set prefix in the seek functions
  2025-07-16 14:40             ` Karthik Nayak
  2025-07-16 15:39               ` Junio C Hamano
@ 2025-07-16 20:02               ` Junio C Hamano
  2025-07-17  9:01                 ` Karthik Nayak
  1 sibling, 1 reply; 102+ messages in thread
From: Junio C Hamano @ 2025-07-16 20:02 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: Patrick Steinhardt, git, schwab, phillip.wood123
Karthik Nayak <karthik.188@gmail.com> writes:
> Let's say a user is iterating with a prefix set to 'refs/heads/', this
> would iterate over all the refs with that prefix. But mid-way the user
> realizes that they only care about 'refs/heads/feature/' prefix and they
> ask the iterator to set that as the prefix.
>
> In such a situation, the iterator seeks to 'refs/heads/feature/' and
> will only yield references with that prefix. In short, the previous
> prefix state was reset.
Yes, even though I wouldn't call such an operation "seek", "Ah, I do
not need the entire refs/heads/ walked, only refs/heads/feature/ is
enough" is an operation mode that makes sense.
But not for paging, though.
If your web application is showing all branches, one pageful at a
time, and the first page ended at refs/heads/feature/something and
you ended up "seeking" to refs/heads/feature/ to start the second
page, you do not want your second page to end when the iteration
goes out of refs/heads/feature/ hierarchy, no?
It seems to me that the root cause of the confusion is that
prefix, which is to let iteration finish way before the data runs
out (instead finish when the iteration steps out of a given
subhierarchy denoted by the prefix), is somehow abused as the
current position of the cursor.  Shouldn't they be two separate
concepts?  The cursor needs to fall within the prefix while the
iterator is active, so they are not two totally independent things,
but prefix is pretty much static while the cursor position is very
dynamic.
> This series did start out that way around, so ease of implementation
> isn't it. It was more of a side-effect of not clearing state.
I am even more worried about the usability and correctness aspects of
what was described here now.  After seeking to refs/heads/feature/,
do we continue to iterate and step out of refs/heads/feature/
hierarchy or can we cut off a particular page that started with a
ref within refs/heads/feature/ subhierarchy when we exhaust refs in
refs/heads/feature/ and have to wait for getting asked for the next
page before we show refs/heads/gsomething that is outside
refs/heads/feature/ and sorts after?  The "I reset to iterate over
refs/heads/feature/ because the entire refs/heads/ is not what I
care about" example makes me worried about this.
Thanks.
^ permalink raw reply	[flat|nested] 102+ messages in thread
* Re: [PATCH v3 3/4] refs: selectively set prefix in the seek functions
  2025-07-16 20:02               ` Junio C Hamano
@ 2025-07-17  9:01                 ` Karthik Nayak
  2025-07-17 17:31                   ` Junio C Hamano
  0 siblings, 1 reply; 102+ messages in thread
From: Karthik Nayak @ 2025-07-17  9:01 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Patrick Steinhardt, git, schwab, phillip.wood123
Junio C Hamano <gitster@pobox.com> writes:
> Karthik Nayak <karthik.188@gmail.com> writes:
>
>> Let's say a user is iterating with a prefix set to 'refs/heads/', this
>> would iterate over all the refs with that prefix. But mid-way the user
>> realizes that they only care about 'refs/heads/feature/' prefix and they
>> ask the iterator to set that as the prefix.
>>
>> In such a situation, the iterator seeks to 'refs/heads/feature/' and
>> will only yield references with that prefix. In short, the previous
>> prefix state was reset.
>
> Yes, even though I wouldn't call such an operation "seek", "Ah, I do
> not need the entire refs/heads/ walked, only refs/heads/feature/ is
> enough" is an operation mode that makes sense.
>
> But not for paging, though.
>
> If your web application is showing all branches, one pageful at a
> time, and the first page ended at refs/heads/feature/something and
> you ended up "seeking" to refs/heads/feature/ to start the second
> page, you do not want your second page to end when the iteration
> goes out of refs/heads/feature/ hierarchy, no?
>
Yup, and this (we show all references beyond the seek) is the current
implementation. I was talking about the internal implementation of
'ref_iterator_seek()', which is the function used both for seeking and
for setting the prefix.
To clarify, this is the current implementation:
$ git for-each-ref
6f4b58c4968eb82277cf5b1cf8775117e5b83de0 commit	refs/heads/bar
6f4b58c4968eb82277cf5b1cf8775117e5b83de0 commit	refs/heads/feature/x
6f4b58c4968eb82277cf5b1cf8775117e5b83de0 commit	refs/heads/feature/y
6f4b58c4968eb82277cf5b1cf8775117e5b83de0 commit	refs/heads/foo
6f4b58c4968eb82277cf5b1cf8775117e5b83de0 commit	refs/heads/goo/x
6f4b58c4968eb82277cf5b1cf8775117e5b83de0 commit	refs/heads/goo/y
6f4b58c4968eb82277cf5b1cf8775117e5b83de0 commit	refs/heads/master
6f4b58c4968eb82277cf5b1cf8775117e5b83de0 commit	refs/tags/tagged/2
6f4b58c4968eb82277cf5b1cf8775117e5b83de0 commit	refs/tags/tagged/3
6f4b58c4968eb82277cf5b1cf8775117e5b83de0 commit	refs/tags/v1
6f4b58c4968eb82277cf5b1cf8775117e5b83de0 commit	refs/tags/v2
$ git for-each-ref --format="%(refname)" --start-after=refs/heads/goo
refs/heads/goo/x
refs/heads/goo/y
refs/heads/master
refs/tags/tagged/2
refs/tags/tagged/3
refs/tags/v1
refs/tags/v2
$ git for-each-ref --format="%(refname)" --start-after=refs/heads/master
refs/tags/tagged/2
refs/tags/tagged/3
refs/tags/v1
refs/tags/v2
$ git for-each-ref --format="%(refname)" --start-after=refs/heads/goo/x
refs/heads/goo/y
refs/heads/master
refs/tags/tagged/2
refs/tags/tagged/3
refs/tags/v1
refs/tags/v2
$ git for-each-ref --format="%(refname)" refs/heads/feature
refs/heads/feature/x
refs/heads/feature/y
You can see we list all references beyond the seek.
> It seems to me that the root cause of the confusion is because
> prefix, which is to let iteration finish way before the data runs
> out (instead finish when the iteration steps out of a given
> subhierarchy denoted by the prefix), is somehow abused as the
> current position of the cursor.  Shouldn't they be two separate
> concepts?  The cursor needs to fall within the prefix while the
> iterator is active, so they are not two totally independent things,
> but prefix is pretty much static while the cursor position is very
> dynamic.
>
The prefix setup in 'ref_iterator_seek' does two things. Let's consider
the prefix 'refs/heads/feature':
1. It sets the cursor to seek to 'refs/heads/feature'
2. It also sets the internal prefix matching to 'refs/heads/feature'
In contrast, plain seeking via 'ref_iterator_seek' only sets the cursor
to 'refs/heads/feature'.
To make this simpler, we've changed 'ref_iterator_seek' to do:
1. Seek the cursor to the requested reference
2. Set the prefix if the REF_ITERATOR_SEEK_SET_PREFIX flag is set, and
unset the prefix otherwise.
The state reset I was talking about in my previous emails refers to step
#2 here: when 'REF_ITERATOR_SEEK_SET_PREFIX' is not set, we remove any
previously set prefix.
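As a rough illustration of the two steps above, here is a toy model in C.
It is only a sketch under stated assumptions: `struct toy_iter`,
`toy_seek()`, `toy_advance()` and `TOY_SEEK_SET_PREFIX` are made-up names
that operate on a sorted in-memory array, not the actual refs API, and the
linear scan is there only for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag, standing in for REF_ITERATOR_SEEK_SET_PREFIX. */
#define TOY_SEEK_SET_PREFIX (1u << 0)

/* Toy iterator over a list of refnames sorted with strcmp(). */
struct toy_iter {
	const char **refs;
	size_t nr;
	size_t pos;		/* cursor: index of the next ref to yield */
	const char *prefix;	/* NULL means "no prefix, run to the end" */
};

/*
 * Step 1: move the cursor to the first ref sorting at or after `refname`.
 * Step 2: set the prefix only when asked to; otherwise drop any old prefix.
 */
void toy_seek(struct toy_iter *it, const char *refname, unsigned int flags)
{
	size_t i = 0;

	assert(it && refname);
	while (i < it->nr && strcmp(it->refs[i], refname) < 0)
		i++;
	it->pos = i;
	it->prefix = (flags & TOY_SEEK_SET_PREFIX) ? refname : NULL;
}

/* Yield the next ref, or NULL once the data or the prefix range runs out. */
const char *toy_advance(struct toy_iter *it)
{
	const char *ref;

	assert(it);
	if (it->pos >= it->nr)
		return NULL;
	ref = it->refs[it->pos];
	if (it->prefix && strncmp(ref, it->prefix, strlen(it->prefix)))
		return NULL;
	it->pos++;
	return ref;
}
```

With bar, feature/x, feature/y and foo under refs/heads/ plus refs/tags/v1,
seeking to 'refs/heads/feature/' with the flag yields only the two feature
refs, while seeking with flags set to 0 continues through refs/heads/foo
and refs/tags/v1.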
>> This series did start out that way around, so ease of implementation
>> isn't it. It was more of a side-effect of not clearing state.
>
> I am even more worried about usability and correctness aspect of
> what was described here now.  After seeking to refs/heads/feature/,
> do we continue to iterate and step out of refs/heads/feature/
> hierarchy or can we cut off a particular page that started with a
> ref within refs/heads/feature/ subhierarchy when we exhaust refs in
> refs/heads/feature/ and have to wait for getting asked for the next
> page before we show refs/heads/gsomething that is outside
> refs/heads/feature/ and sorts after?  The "I reset to iterate over
> refs/heads/feature/ because the entire refs/heads/ is not what I
> care about" example makes me worried about this.
>
> Thanks.
I think we're talking past each other. I hope the
examples above clarify things. The current implementation doesn't
support '--start-after' and prefix setting at the same time:
$ git for-each-ref --format="%(refname)" --start-after=refs/heads/master refs/heads
fatal: cannot use --start-after with patterns
Happy to clarify if this doesn't make sense.
Thanks
^ permalink raw reply	[flat|nested] 102+ messages in thread
* Re: [PATCH v3 3/4] refs: selectively set prefix in the seek functions
  2025-07-17  9:01                 ` Karthik Nayak
@ 2025-07-17 17:31                   ` Junio C Hamano
  0 siblings, 0 replies; 102+ messages in thread
From: Junio C Hamano @ 2025-07-17 17:31 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: Patrick Steinhardt, git, schwab, phillip.wood123
Karthik Nayak <karthik.188@gmail.com> writes:
>> Yes, even though I wouldn't call such an operation "seek", "Ah, I do
>> not need the entire refs/heads/ walked, only refs/heads/feature/ is
>> enough" is an operation mode that makes sense.
>>
>> But not for paging, though.
Actually, you are not using the new value for prefix; you are
unsetting prefix to nothing, so you do not stop when you go over the
boundary of refs/heads/feature/ hierarchy but will run through to
the end, so my worry is unfounded.
Which is good.
But the application has to decide when to stop as "We obtained an
iterator in order to list all branches in refs/heads/ and the last
round we gave refs/heads/main out.  Seek to it and continue the
output" will no longer stop after making a callback with
refs/heads/zz but will continue yielding non-branch refs after that.
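To make that caller-side stop condition concrete, here is a small sketch
in C. The helper names `in_hierarchy()` and `resume_listing()` are
hypothetical, the refs live in a sorted in-memory array rather than behind
a real iterator, and `last_shown` is assumed to lie inside `prefix`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Nonzero when `refname` lies inside the hierarchy named by `prefix`. */
int in_hierarchy(const char *refname, const char *prefix)
{
	return !strncmp(refname, prefix, strlen(prefix));
}

/*
 * Resume a listing of `prefix` after `last_shown`, over refs sorted with
 * strcmp(). Copies at most `max` pointers into `out` and returns how many
 * were copied.
 */
size_t resume_listing(const char **refs, size_t nr, const char *last_shown,
		      const char *prefix, const char **out, size_t max)
{
	size_t i, n = 0;

	assert(refs && last_shown && prefix && out);
	for (i = 0; i < nr && n < max; i++) {
		if (strcmp(refs[i], last_shown) <= 0)
			continue;	/* already shown on an earlier page */
		if (!in_hierarchy(refs[i], prefix))
			break;		/* the iterator would go on; the caller stops */
		out[n++] = refs[i];
	}
	return n;
}
```

The point of the sketch is only that, once the seek has dropped the
prefix, the check against "refs/heads/" has to live in the caller.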
> The current implementation doesn't
> support '--start-after' and prefix setting at the same time:
>
> $ git for-each-ref --format="%(refname)"
> --start-after=refs/heads/master refs/heads
> fatal: cannot use --start-after with patterns
Good---in that case, "unsetting prefix to nothing" does not make any
difference and cannot introduce any confusing behaviour, as it has to
run to the end with or without "--start-after" anyway.
Thanks.
^ permalink raw reply	[flat|nested] 102+ messages in thread
* [PATCH v3 4/4] for-each-ref: introduce a '--start-after' option
  2025-07-08 13:47 ` [PATCH v3 0/4] for-each-ref: introduce seeking functionality via '--start-after' Karthik Nayak
                     ` (2 preceding siblings ...)
  2025-07-08 13:47   ` [PATCH v3 3/4] refs: selectively set prefix in the seek functions Karthik Nayak
@ 2025-07-08 13:47   ` Karthik Nayak
  2025-07-08 20:25     ` Junio C Hamano
  3 siblings, 1 reply; 102+ messages in thread
From: Karthik Nayak @ 2025-07-08 13:47 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak, gitster, ps, schwab, phillip.wood123
The `git-for-each-ref(1)` command is used to iterate over references
present in a repository. In large repositories with millions of
references, it would be optimal to paginate this output such that we
can start iteration from a given reference. This would avoid having to
iterate over all references from the beginning each time when paginating
through results.
The previous commit added 'seek' functionality to the reference
backends. Utilize this and expose a '--start-after' option in
'git-for-each-ref(1)'. When used, the reference iteration seeks to the
lexicographically next reference and iterates from there onward.
This enables efficient pagination workflows like:
    git for-each-ref --count=100
    git for-each-ref --count=100 --start-after=refs/heads/branch-100
    git for-each-ref --count=100 --start-after=refs/heads/branch-200
Since the reference iterators only allow seeking to a specified marker
via `ref_iterator_seek()`, we introduce a helper function
`start_ref_iterator_after()`, which seeks to the next reference by simply
appending (char) 1 to the marker.
Note that pagination always continues from the provided marker; as such,
any concurrent reference updates that sort lexicographically before the
marker will not be output. Document the same.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
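 Documentation/git-for-each-ref.adoc |  11 +-
 builtin/for-each-ref.c              |   8 ++
 ref-filter.c                        |  80 +++++++++++----
 ref-filter.h                        |   1 +
 t/t6302-for-each-ref-filter.sh      | 194 ++++++++++++++++++++++++++++++++++++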
 5 files changed, 273 insertions(+), 21 deletions(-)
diff --git a/Documentation/git-for-each-ref.adoc b/Documentation/git-for-each-ref.adoc
index 5ef89fc0fe..e099d1ba7c 100644
--- a/Documentation/git-for-each-ref.adoc
+++ b/Documentation/git-for-each-ref.adoc
@@ -14,7 +14,7 @@ SYNOPSIS
 		   [--points-at=<object>]
 		   [--merged[=<object>]] [--no-merged[=<object>]]
 		   [--contains[=<object>]] [--no-contains[=<object>]]
-		   [--exclude=<pattern> ...]
+		   [--exclude=<pattern> ...] [--start-after=<marker>]
 
 DESCRIPTION
 -----------
@@ -108,6 +108,15 @@ TAB %(refname)`.
 --include-root-refs::
 	List root refs (HEAD and pseudorefs) apart from regular refs.
 
+--start-after::
+    Allows paginating the output by skipping references up to and including the
+    specified marker. When paging, it should be noted that references may be
+    deleted, modified or added between invocations. Output will only yield those
+    references which follow the marker lexicographically. If the marker does not
+    exist, output begins from the first reference that would come after it
+    alphabetically. Cannot be used with general pattern matching or custom
+    sort options.
+
 FIELD NAMES
 -----------
 
diff --git a/builtin/for-each-ref.c b/builtin/for-each-ref.c
index 3d2207ec77..3f21598046 100644
--- a/builtin/for-each-ref.c
+++ b/builtin/for-each-ref.c
@@ -13,6 +13,7 @@ static char const * const for_each_ref_usage[] = {
 	N_("git for-each-ref [--points-at <object>]"),
 	N_("git for-each-ref [--merged [<commit>]] [--no-merged [<commit>]]"),
 	N_("git for-each-ref [--contains [<commit>]] [--no-contains [<commit>]]"),
+	N_("git for-each-ref [--start-after <marker>]"),
 	NULL
 };
 
@@ -44,6 +45,7 @@ int cmd_for_each_ref(int argc,
 		OPT_GROUP(""),
 		OPT_INTEGER( 0 , "count", &format.array_opts.max_count, N_("show only <n> matched refs")),
 		OPT_STRING(  0 , "format", &format.format, N_("format"), N_("format to use for the output")),
+		OPT_STRING(  0 , "start-after", &filter.start_after, N_("marker"), N_("start iteration after the provided marker")),
 		OPT__COLOR(&format.use_color, N_("respect format colors")),
 		OPT_REF_FILTER_EXCLUDE(&filter),
 		OPT_REF_SORT(&sorting_options),
@@ -79,6 +81,9 @@ int cmd_for_each_ref(int argc,
 	if (verify_ref_format(&format))
 		usage_with_options(for_each_ref_usage, opts);
 
+	if (filter.start_after && sorting_options.nr > 1)
+		die(_("cannot use --start-after with custom sort options"));
+
 	sorting = ref_sorting_options(&sorting_options);
 	ref_sorting_set_sort_flags_all(sorting, REF_SORTING_ICASE, icase);
 	filter.ignore_case = icase;
@@ -100,6 +105,9 @@ int cmd_for_each_ref(int argc,
 		filter.name_patterns = argv;
 	}
 
+	if (filter.start_after && filter.name_patterns && filter.name_patterns[0])
+		die(_("cannot use --start-after with patterns"));
+
 	if (include_root_refs)
 		flags |= FILTER_REFS_ROOT_REFS | FILTER_REFS_DETACHED_HEAD;
 
diff --git a/ref-filter.c b/ref-filter.c
index 7a274633cf..2dfd385313 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -2683,6 +2683,24 @@ static int filter_exclude_match(struct ref_filter *filter, const char *refname)
 	return match_pattern(filter->exclude.v, refname, filter->ignore_case);
 }
 
+/*
+ * We need to seek to the reference right after a given marker but excluding any
+ * matching references. So we seek to the lexicographically next reference.
+ */
+static int start_ref_iterator_after(struct ref_iterator *iter, const char *marker)
+{
+	struct strbuf sb = STRBUF_INIT;
+	int ret;
+
+	strbuf_addstr(&sb, marker);
+	strbuf_addch(&sb, 1);
+
+	ret = ref_iterator_seek(iter, sb.buf, 0);
+
+	strbuf_release(&sb);
+	return ret;
+}
+
 /*
  * This is the same as for_each_fullref_in(), but it tries to iterate
  * only over the patterns we'll care about. Note that it _doesn't_ do a full
@@ -2692,10 +2710,13 @@ static int for_each_fullref_in_pattern(struct ref_filter *filter,
 				       each_ref_fn cb,
 				       void *cb_data)
 {
+	struct ref_iterator *iter;
+	int flags = 0, ret = 0;
+
 	if (filter->kind & FILTER_REFS_ROOT_REFS) {
 		/* In this case, we want to print all refs including root refs. */
-		return refs_for_each_include_root_refs(get_main_ref_store(the_repository),
-						       cb, cb_data);
+		flags |= DO_FOR_EACH_INCLUDE_ROOT_REFS;
+		goto non_prefix_iter;
 	}
 
 	if (!filter->match_as_path) {
@@ -2704,8 +2725,7 @@ static int for_each_fullref_in_pattern(struct ref_filter *filter,
 		 * prefixes like "refs/heads/" etc. are stripped off,
 		 * so we have to look at everything:
 		 */
-		return refs_for_each_fullref_in(get_main_ref_store(the_repository),
-						"", NULL, cb, cb_data);
+		goto non_prefix_iter;
 	}
 
 	if (filter->ignore_case) {
@@ -2714,20 +2734,29 @@ static int for_each_fullref_in_pattern(struct ref_filter *filter,
 		 * so just return everything and let the caller
 		 * sort it out.
 		 */
-		return refs_for_each_fullref_in(get_main_ref_store(the_repository),
-						"", NULL, cb, cb_data);
+		goto non_prefix_iter;
 	}
 
 	if (!filter->name_patterns[0]) {
 		/* no patterns; we have to look at everything */
-		return refs_for_each_fullref_in(get_main_ref_store(the_repository),
-						 "", filter->exclude.v, cb, cb_data);
+		goto non_prefix_iter;
 	}
 
 	return refs_for_each_fullref_in_prefixes(get_main_ref_store(the_repository),
 						 NULL, filter->name_patterns,
 						 filter->exclude.v,
 						 cb, cb_data);
+
+non_prefix_iter:
+	iter = refs_ref_iterator_begin(get_main_ref_store(the_repository), "",
+				       NULL, 0, flags);
+	if (filter->start_after)
+		ret = start_ref_iterator_after(iter, filter->start_after);
+
+	if (ret)
+		return ret;
+
+	return do_for_each_ref_iterator(iter, cb, cb_data);
 }
 
 /*
@@ -3197,9 +3226,11 @@ static int do_filter_refs(struct ref_filter *filter, unsigned int type, each_ref
 	init_contains_cache(&filter->internal.no_contains_cache);
 
 	/*  Simple per-ref filtering */
-	if (!filter->kind)
+	if (!filter->kind) {
 		die("filter_refs: invalid type");
-	else {
+	} else {
+		const char *prefix = NULL;
+
 		/*
 		 * For common cases where we need only branches or remotes or tags,
 		 * we only iterate through those refs. If a mix of refs is needed,
@@ -3207,19 +3238,28 @@ static int do_filter_refs(struct ref_filter *filter, unsigned int type, each_ref
 		 * of filter_ref_kind().
 		 */
 		if (filter->kind == FILTER_REFS_BRANCHES)
-			ret = refs_for_each_fullref_in(get_main_ref_store(the_repository),
-						       "refs/heads/", NULL,
-						       fn, cb_data);
+			prefix = "refs/heads/";
 		else if (filter->kind == FILTER_REFS_REMOTES)
-			ret = refs_for_each_fullref_in(get_main_ref_store(the_repository),
-						       "refs/remotes/", NULL,
-						       fn, cb_data);
+			prefix = "refs/remotes/";
 		else if (filter->kind == FILTER_REFS_TAGS)
-			ret = refs_for_each_fullref_in(get_main_ref_store(the_repository),
-						       "refs/tags/", NULL, fn,
-						       cb_data);
-		else if (filter->kind & FILTER_REFS_REGULAR)
+			prefix = "refs/tags/";
+
+		if (prefix) {
+			struct ref_iterator *iter;
+
+			iter = refs_ref_iterator_begin(get_main_ref_store(the_repository),
+						       "", NULL, 0, 0);
+
+			if (filter->start_after)
+				ret = start_ref_iterator_after(iter, filter->start_after);
+			else if (prefix)
+				ret = ref_iterator_seek(iter, prefix, 1);
+
+			if (!ret)
+				ret = do_for_each_ref_iterator(iter, fn, cb_data);
+		} else if (filter->kind & FILTER_REFS_REGULAR) {
 			ret = for_each_fullref_in_pattern(filter, fn, cb_data);
+		}
 
 		/*
 		 * When printing all ref types, HEAD is already included,
diff --git a/ref-filter.h b/ref-filter.h
index c98c4fbd4c..f22ca94b49 100644
--- a/ref-filter.h
+++ b/ref-filter.h
@@ -64,6 +64,7 @@ struct ref_array {
 
 struct ref_filter {
 	const char **name_patterns;
+	const char *start_after;
 	struct strvec exclude;
 	struct oid_array points_at;
 	struct commit_list *with_commit;
diff --git a/t/t6302-for-each-ref-filter.sh b/t/t6302-for-each-ref-filter.sh
index bb02b86c16..a43e099118 100755
--- a/t/t6302-for-each-ref-filter.sh
+++ b/t/t6302-for-each-ref-filter.sh
@@ -541,4 +541,198 @@ test_expect_success 'validate worktree atom' '
 	test_cmp expect actual
 '
 
+test_expect_success 'start after with empty value' '
+	cat >expect <<-\EOF &&
+	refs/heads/main
+	refs/heads/main_worktree
+	refs/heads/side
+	refs/odd/spot
+	refs/tags/annotated-tag
+	refs/tags/doubly-annotated-tag
+	refs/tags/doubly-signed-tag
+	refs/tags/foo1.10
+	refs/tags/foo1.3
+	refs/tags/foo1.6
+	refs/tags/four
+	refs/tags/one
+	refs/tags/signed-tag
+	refs/tags/three
+	refs/tags/two
+	EOF
+	git for-each-ref --format="%(refname)" --start-after="" >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'start after a specific reference' '
+	cat >expect <<-\EOF &&
+	refs/tags/annotated-tag
+	refs/tags/doubly-annotated-tag
+	refs/tags/doubly-signed-tag
+	refs/tags/foo1.10
+	refs/tags/foo1.3
+	refs/tags/foo1.6
+	refs/tags/four
+	refs/tags/one
+	refs/tags/signed-tag
+	refs/tags/three
+	refs/tags/two
+	EOF
+	git for-each-ref --format="%(refname)" --start-after=refs/odd/spot >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'start after a specific reference with partial match' '
+	cat >expect <<-\EOF &&
+	refs/odd/spot
+	refs/tags/annotated-tag
+	refs/tags/doubly-annotated-tag
+	refs/tags/doubly-signed-tag
+	refs/tags/foo1.10
+	refs/tags/foo1.3
+	refs/tags/foo1.6
+	refs/tags/four
+	refs/tags/one
+	refs/tags/signed-tag
+	refs/tags/three
+	refs/tags/two
+	EOF
+	git for-each-ref --format="%(refname)" --start-after=refs/odd/sp >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'start after, just behind a specific reference' '
+	cat >expect <<-\EOF &&
+	refs/odd/spot
+	refs/tags/annotated-tag
+	refs/tags/doubly-annotated-tag
+	refs/tags/doubly-signed-tag
+	refs/tags/foo1.10
+	refs/tags/foo1.3
+	refs/tags/foo1.6
+	refs/tags/four
+	refs/tags/one
+	refs/tags/signed-tag
+	refs/tags/three
+	refs/tags/two
+	EOF
+	git for-each-ref --format="%(refname)" --start-after=refs/odd/parrot >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'start after with specific directory match' '
+	cat >expect <<-\EOF &&
+	refs/odd/spot
+	refs/tags/annotated-tag
+	refs/tags/doubly-annotated-tag
+	refs/tags/doubly-signed-tag
+	refs/tags/foo1.10
+	refs/tags/foo1.3
+	refs/tags/foo1.6
+	refs/tags/four
+	refs/tags/one
+	refs/tags/signed-tag
+	refs/tags/three
+	refs/tags/two
+	EOF
+	git for-each-ref --format="%(refname)" --start-after=refs/odd >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'start after with specific directory and trailing slash' '
+	cat >expect <<-\EOF &&
+	refs/odd/spot
+	refs/tags/annotated-tag
+	refs/tags/doubly-annotated-tag
+	refs/tags/doubly-signed-tag
+	refs/tags/foo1.10
+	refs/tags/foo1.3
+	refs/tags/foo1.6
+	refs/tags/four
+	refs/tags/one
+	refs/tags/signed-tag
+	refs/tags/three
+	refs/tags/two
+	EOF
+	git for-each-ref --format="%(refname)" --start-after=refs/lost >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'start after, just behind a specific directory' '
+	cat >expect <<-\EOF &&
+	refs/odd/spot
+	refs/tags/annotated-tag
+	refs/tags/doubly-annotated-tag
+	refs/tags/doubly-signed-tag
+	refs/tags/foo1.10
+	refs/tags/foo1.3
+	refs/tags/foo1.6
+	refs/tags/four
+	refs/tags/one
+	refs/tags/signed-tag
+	refs/tags/three
+	refs/tags/two
+	EOF
+	git for-each-ref --format="%(refname)" --start-after=refs/odd/ >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'start after, overflow specific reference length' '
+	cat >expect <<-\EOF &&
+	refs/tags/annotated-tag
+	refs/tags/doubly-annotated-tag
+	refs/tags/doubly-signed-tag
+	refs/tags/foo1.10
+	refs/tags/foo1.3
+	refs/tags/foo1.6
+	refs/tags/four
+	refs/tags/one
+	refs/tags/signed-tag
+	refs/tags/three
+	refs/tags/two
+	EOF
+	git for-each-ref --format="%(refname)" --start-after=refs/odd/spotnew >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'start after, overflow specific reference path' '
+	cat >expect <<-\EOF &&
+	refs/tags/annotated-tag
+	refs/tags/doubly-annotated-tag
+	refs/tags/doubly-signed-tag
+	refs/tags/foo1.10
+	refs/tags/foo1.3
+	refs/tags/foo1.6
+	refs/tags/four
+	refs/tags/one
+	refs/tags/signed-tag
+	refs/tags/three
+	refs/tags/two
+	EOF
+	git for-each-ref --format="%(refname)" --start-after=refs/odd/spot/new >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'start after, last reference' '
+	cat >expect <<-\EOF &&
+	EOF
+	git for-each-ref --format="%(refname)" --start-after=refs/tags/two >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'start after used with a pattern' '
+	cat >expect <<-\EOF &&
+	fatal: cannot use --start-after with patterns
+	EOF
+	test_must_fail git for-each-ref --format="%(refname)" --start-after=refs/odd/spot refs/tags 2>actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'start after used with custom sort order' '
+	cat >expect <<-\EOF &&
+	fatal: cannot use --start-after with custom sort options
+	EOF
+	test_must_fail git for-each-ref --format="%(refname)" --start-after=refs/odd/spot --sort=author 2>actual &&
+	test_cmp expect actual
+'
+
 test_done
-- 
2.49.0
^ permalink raw reply related	[flat|nested] 102+ messages in thread
* Re: [PATCH v3 4/4] for-each-ref: introduce a '--start-after' option
  2025-07-08 13:47   ` [PATCH v3 4/4] for-each-ref: introduce a '--start-after' option Karthik Nayak
@ 2025-07-08 20:25     ` Junio C Hamano
  2025-07-09  9:53       ` Karthik Nayak
  0 siblings, 1 reply; 102+ messages in thread
From: Junio C Hamano @ 2025-07-08 20:25 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: git, ps, schwab, phillip.wood123
Karthik Nayak <karthik.188@gmail.com> writes:
> The `git-for-each-ref(1)` command is used to iterate over references
> present in a repository. In large repositories with millions of
> references, it would be optimal to paginate this output such that we
> can start iteration from a given reference. This would avoid having to
> iterate over all references from the beginning each time when paginating
> through results.
>
> The previous commit added 'seek' functionality to the reference
> backends. Utilize this and expose a '--start-after' option in
> 'git-for-each-ref(1)'. When used, the reference iteration seeks to the
> lexicographically next reference and iterates from there onward.
>
> This enables efficient pagination workflows like:
>     git for-each-ref --count=100
>     git for-each-ref --count=100 --start-after=refs/heads/branch-100
>     git for-each-ref --count=100 --start-after=refs/heads/branch-200
It is a bit hard to understand how this leads to "efficient
pagination" unless the reader is told what the calling script does
after the first call before making the second call.  It remembers
the last output from the Nth call and prepares the N+1th call by
using that last output entry.
But that probably belongs to the end-user facing documentation, not
in the log message.
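The calling pattern described above can be sketched in C as follows. This
is a simulation under stated assumptions, not the command itself:
`fetch_page()` and `walk_all_pages()` are hypothetical helpers that emulate
`--count` and `--start-after` over a sorted in-memory array, and the linear
scan stands in for the seek that the real option performs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_MAX 16

/*
 * Emulate `git for-each-ref --count=<count> --start-after=<marker>` over a
 * list sorted with strcmp(). A NULL marker starts from the beginning.
 * Stores the page in `page` and returns the number of entries in it.
 */
size_t fetch_page(const char **refs, size_t nr, const char *marker,
		  size_t count, const char **page)
{
	size_t i, n = 0;

	assert(refs && page);
	for (i = 0; i < nr && n < count; i++) {
		if (marker && strcmp(refs[i], marker) <= 0)
			continue;	/* at or before the marker: skip */
		page[n++] = refs[i];
	}
	return n;
}

/* Fetch page after page; returns how many non-empty pages were seen. */
size_t walk_all_pages(const char **refs, size_t nr, size_t count)
{
	const char *page[PAGE_MAX];
	const char *marker = NULL;
	size_t pages = 0, n;

	if (!count || count > PAGE_MAX)
		return 0;
	while ((n = fetch_page(refs, nr, marker, count, page)) > 0) {
		marker = page[n - 1];	/* remember the last entry shown */
		pages++;
	}
	return pages;
}
```

The only state carried from the Nth call to the N+1th call is the last
refname of the previous page, which is what makes the scheme stateless on
the Git side.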
> +--start-after::
`--start-after=<marker>`::
> +    Allows paginating the output by skipping references up to and including the
> +    specified marker. When paging, it should be noted that references may be
> +    deleted, modified or added between invocations. Output will only yield those
> +    references which follow the marker lexicographically. If the marker does not
> +    exist, output begins from the first reference that would come after it
> +    alphabetically.
It is true that the first entry shown would be what would come
immediately _after_ the given <marker>, whether the marker does or
does not exist.  So "If the marker does not exist, output begins..."
-> "Output begins ..."
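To illustrate the point above, here is a minimal sketch of "first entry
strictly after the marker" over a sorted list of refnames. It is not Git
code: `first_after()` is a hypothetical helper, and byte-wise `strcmp()`
ordering is an assumption standing in for the backend's ordering. The
result is computed the same way whether or not the marker is present in
the list.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of Git: return the index of the first
 * entry in the sorted array `refs` that compares strictly greater than
 * `marker` (byte-wise, via strcmp), or `nr` if there is none.
 */
size_t first_after(const char **refs, size_t nr, const char *marker)
{
	size_t lo = 0, hi = nr;

	assert(refs && marker);
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (strcmp(refs[mid], marker) <= 0)
			lo = mid + 1;	/* refs[mid] is at or before the marker */
		else
			hi = mid;	/* refs[mid] is still a candidate */
	}
	return lo;
}
```

With the list { "refs/heads/a", "refs/heads/c", "refs/heads/e" }, an
existing marker "refs/heads/c" and a missing marker "refs/heads/d" both
resolve to the entry "refs/heads/e", which is why the documentation does
not need a separate sentence for the missing-marker case.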
Other than that, looked pretty good to me.
Thanks, will queue.
* Re: [PATCH v3 4/4] for-each-ref: introduce a '--start-after' option
  2025-07-08 20:25     ` Junio C Hamano
@ 2025-07-09  9:53       ` Karthik Nayak
  0 siblings, 0 replies; 102+ messages in thread
From: Karthik Nayak @ 2025-07-09  9:53 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, ps, schwab, phillip.wood123
Junio C Hamano <gitster@pobox.com> writes:
> Karthik Nayak <karthik.188@gmail.com> writes:
>
>> The `git-for-each-ref(1)` command is used to iterate over references
>> present in a repository. In large repositories with millions of
>> references, it would be optimal to paginate this output such that we
>> can start iteration from a given reference. This would avoid having to
>> iterate over all references from the beginning each time when paginating
>> through results.
>>
>> The previous commit added 'seek' functionality to the reference
>> backends. Utilize this and expose a '--start-after' option in
>> 'git-for-each-ref(1)'. When used, the reference iteration seeks to the
>> lexicographically next reference and iterates from there onward.
>>
>> This enables efficient pagination workflows like:
>>     git for-each-ref --count=100
>>     git for-each-ref --count=100 --start-after=refs/heads/branch-100
>>     git for-each-ref --count=100 --start-after=refs/heads/branch-200
>
> It is a bit hard to understand how this leads to "efficient
> pagination" unless the reader is told what the calling script does
> after the first call before making the second call.  It remembers
> the last output from the Nth call and prepares the N+1th call by
> using that last output entry.
>
> But that probably belongs to the end-user facing documentation, not
> in the log message.
>
I added a small line in the commit message to clarify this. I'm not sure
this belongs in the user facing documentation. Mostly I see this in the
commit message to explain the intention behind adding the flag. The
documentation already mentions how the flag can be used, so we should be
good there.
>> +--start-after::
>
> `--start-after=<marker>`::
>
Oops. Thanks
>> +    Allows paginating the output by skipping references up to and including the
>> +    specified marker. When paging, it should be noted that references may be
>> +    deleted, modified or added between invocations. Output will only yield those
>> +    references which follow the marker lexicographically. If the marker does not
>> +    exist, output begins from the first reference that would come after it
>> +    alphabetically.
>
> It is true that the first entry shown would be what would come
> immediately _after_ the given <marker>, whether the marker does or
> does not exist.  So "If the marker does not exist, output begins..."
> -> "Output begins ..."
>
> Other than that, looked pretty good to me.
>
Good point, will amend this.
> Thanks, will queue.
Thank you for the review. I'll add the changes locally and push a new
version after a day or two.
* [PATCH v4 0/4] for-each-ref: introduce seeking functionality via '--start-after'
  2025-07-01 15:03 [PATCH 0/4] for-each-ref: introduce seeking functionality via '--skip-until' Karthik Nayak
                   ` (8 preceding siblings ...)
  2025-07-08 13:47 ` [PATCH v3 0/4] for-each-ref: introduce seeking functionality via '--start-after' Karthik Nayak
@ 2025-07-11 16:18 ` Karthik Nayak
  2025-07-11 16:18   ` [PATCH v4 1/4] refs: expose `ref_iterator` via 'refs.h' Karthik Nayak
                     ` (4 more replies)
  2025-07-15 11:28 ` [PATCH v5 0/5] " Karthik Nayak
  10 siblings, 5 replies; 102+ messages in thread
From: Karthik Nayak @ 2025-07-11 16:18 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak, gitster, ps, schwab, phillip.wood123
The `git-for-each-ref(1)` command is used to iterate over references
present in a repository. In large repositories with millions of
references, it would be optimal to paginate this output such that we
can start iteration from a given reference. This would avoid having to
iterate over all references from the beginning each time when paginating
through results.
This series adds a '--start-after' option in 'git-for-each-ref(1)'. When
used, the reference iteration seeks to the first reference following the
marker alphabetically. When paging, it should be noted that references
may be deleted, modified or added between invocations. Output will only
yield those references which follow the marker lexicographically. If the
marker does not exist, output begins from the first reference that would
come after it alphabetically.
This enables efficient pagination workflows like:
    git for-each-ref --count=100
    git for-each-ref --count=100 --start-after=refs/heads/branch-100
    git for-each-ref --count=100 --start-after=refs/heads/branch-200
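The calling pattern behind these three commands can be sketched as
follows. This is a toy model, not Git code: `list_page()` is a
hypothetical stand-in for a single `git for-each-ref --count=<count>
[--start-after=<marker>]` call over an in-memory sorted list, and it
scans linearly for brevity, whereas the point of the series is that the
backends seek instead of scanning. `count_pages()` plays the role of the
calling script, which remembers the last entry of each page and passes
it as the next marker.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for one paginated listing call. Copies up to
 * `count` names that sort after `marker` into `page` and returns how
 * many were copied. A NULL marker means "start at the top".
 */
size_t list_page(const char **refs, size_t nr, const char *marker,
		 size_t count, const char **page)
{
	size_t i, out = 0;

	assert(refs && page);
	for (i = 0; i < nr && out < count; i++) {
		if (marker && strcmp(refs[i], marker) <= 0)
			continue;	/* at or before the marker: skip */
		page[out++] = refs[i];
	}
	return out;
}

/*
 * Walk the whole list page by page, feeding the last name of each page
 * back in as the next marker. Returns the number of non-empty pages.
 */
size_t count_pages(const char **refs, size_t nr, size_t count)
{
	const char *page[16];
	const char *marker = NULL;
	size_t pages = 0, got;

	assert(count > 0 && count <= 16);
	while ((got = list_page(refs, nr, marker, count, page)) > 0) {
		marker = page[got - 1];	/* remember the last entry shown */
		pages++;
	}
	return pages;
}
```

Because each call depends only on the marker string and not on an
offset, a reference deleted or added between calls does not shift the
page boundaries of the entries that follow the marker.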
To add this functionality, we expose the `ref_iterator` outside the
'refs/' namespace and modify `ref_iterator_seek()` to actually seek
to a given reference and only set the prefix when the
`REF_ITERATOR_SEEK_SET_PREFIX` flag is passed.
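The difference between a plain seek and a prefix-setting seek can be
sketched with a toy iterator over a sorted array. None of this is the
actual implementation: `struct toy_iter`, `toy_seek()`, `toy_next()` and
`TOY_SEEK_SET_PREFIX` are hypothetical names, and positioning at the
first entry greater than or equal to the given name is an assumption
modelled on the comparison visible in the range-diff below.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy flag; the name and value are illustrative only. */
enum toy_seek_flag {
	TOY_SEEK_SET_PREFIX = (1 << 0),
};

/* Hypothetical iterator over a sorted array of refnames. */
struct toy_iter {
	const char **refs;
	size_t nr;
	size_t pos;		/* next entry to consider */
	const char *prefix;	/* NULL means "no prefix filter" */
};

/*
 * Position the iterator at the first entry >= refname. Any previously
 * set prefix is dropped; a new one is installed only when
 * TOY_SEEK_SET_PREFIX is passed.
 */
void toy_seek(struct toy_iter *it, const char *refname, unsigned int flags)
{
	assert(it);
	it->prefix = NULL;
	if (flags & TOY_SEEK_SET_PREFIX)
		it->prefix = refname;
	it->pos = 0;
	while (refname && it->pos < it->nr &&
	       strcmp(it->refs[it->pos], refname) < 0)
		it->pos++;
}

/* Return the next refname, or NULL once the iteration is exhausted. */
const char *toy_next(struct toy_iter *it)
{
	assert(it);
	if (it->pos >= it->nr)
		return NULL;
	if (it->prefix &&
	    strncmp(it->refs[it->pos], it->prefix, strlen(it->prefix)))
		return NULL;	/* sorted input: no later entry can match */
	return it->refs[it->pos++];
}
```

Seeking to "refs/heads/b" without the flag yields that entry and
everything after it, including names under other hierarchies; seeking to
"refs/heads/" with the flag stops as soon as an entry outside that
prefix is reached.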
On the reftable and packed backend, the changes are simple. But since
the files backend uses 'ref-cache' for reference handling, the changes
there are a little more involved, since we need to set up the right
levels and the indexing.
Initially I was also planning to clean up all the `refs_for_each...()`
functions in 'refs.h' by simply using the iterator, but this bloated the
series. So I've left that for another day.
Changes in v4:
- Patch 3/4: Move around the documentation for the flag and rename the
  seek variable to refname.
- Patch 4/4: Clean up the commit message and also the documentation.
- Link to v3: https://lore.kernel.org/r/20250708-306-git-for-each-ref-pagination-v3-0-8cfba1080be4@gmail.com
Changes in v3:
- Change the working of the command to exclude the marker provided. With
  this rename the flag to '--start-after'.
- Extend the documentation to add a note about concurrent modifications
  to the reference database.
- Link to v2: https://lore.kernel.org/r/20250704-306-git-for-each-ref-pagination-v2-0-bcde14acdd81@gmail.com
Changes in v2:
- Modify 'ref_iterator_seek()' to take in flags instead of a
  'set_prefix' variable. This improves readability, where users would
  use the 'REF_ITERATOR_SEEK_SET_PREFIX' instead of simply passing '1'.
- When the set prefix flag isn't used, reset any previously set prefix.
  This ensures that the internal prefix state is always reset whenever
  we seek and unifies the behavior between 'ref_iterator_seek' and
  'ref_iterator_begin'.
- Don't allow '--skip-until' to be run with '--sort', since the seeking
  always takes place before any sorting and this can be confusing.
- Some styling fixes:
  - Remove extra newline
  - Skip braces around single lined if...else clause
  - Add braces around 'if' clause
  - Fix indentation
- Link to v1: https://lore.kernel.org/git/20250701-306-git-for-each-ref-pagination-v1-0-4f0ae7c0688f@gmail.com/
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
 Documentation/git-for-each-ref.adoc |  10 +-
 builtin/for-each-ref.c              |   8 ++
 ref-filter.c                        |  80 +++++++++++----
 ref-filter.h                        |   1 +
 refs.c                              |   6 +-
 refs.h                              | 155 ++++++++++++++++++++++++++++
 refs/debug.c                        |   7 +-
 refs/files-backend.c                |   7 +-
 refs/iterator.c                     |  26 +++--
 refs/packed-backend.c               |  17 ++--
 refs/ref-cache.c                    |  99 ++++++++++++++----
 refs/ref-cache.h                    |   7 --
 refs/refs-internal.h                | 152 ++--------------------------
 refs/reftable-backend.c             |  21 ++--
 t/t6302-for-each-ref-filter.sh      | 194 ++++++++++++++++++++++++++++++++++++
 15 files changed, 564 insertions(+), 226 deletions(-)
Karthik Nayak (4):
      refs: expose `ref_iterator` via 'refs.h'
      ref-cache: remove unused function 'find_ref_entry()'
      refs: selectively set prefix in the seek functions
      for-each-ref: introduce a '--start-after' option
Range-diff versus v3:
1:  eed39162f5 = 1:  9e6ecff291 refs: expose `ref_iterator` via 'refs.h'
2:  b9db49d31b = 2:  22f5222e4f ref-cache: remove unused function 'find_ref_entry()'
3:  502e2696fd ! 3:  0e71d8ffd9 refs: selectively set prefix in the seek functions
    @@ refs.h: struct ref_iterator *refs_ref_iterator_begin(
      
     +enum ref_iterator_seek_flag {
     +	/*
    -+	 * Also set the seek pattern as a prefix for iteration. This ensures
    -+	 * that only references which match the prefix are yielded.
    ++	 * When the REF_ITERATOR_SEEK_SET_PREFIX flag is set, the iterator's prefix is
    ++	 * updated to match the provided string, affecting all subsequent iterations. If
    ++	 * not, the iterator seeks to the specified reference and clears any previously
    ++	 * set prefix.
     +	 */
     +	REF_ITERATOR_SEEK_SET_PREFIX = (1 << 0),
     +};
    @@ refs.h: struct ref_iterator *refs_ref_iterator_begin(
     - * passed when creating the iterator will remain unchanged.
     + * This function is expected to behave as if a new ref iterator has been
     + * created, but allows reuse of existing iterators for optimization.
    -+ *
    -+ * When the REF_ITERATOR_SEEK_SET_PREFIX flag is set, the iterator's prefix is
    -+ * updated to match the seek string, affecting all subsequent iterations. If
    -+ * not, the iterator seeks to the specified reference and clears any previously
    -+ * set prefix.
       *
       * Returns 0 on success, a negative error code otherwise.
       */
     -int ref_iterator_seek(struct ref_iterator *ref_iterator,
     -		      const char *prefix);
    -+int ref_iterator_seek(struct ref_iterator *ref_iterator, const char *seek,
    ++int ref_iterator_seek(struct ref_iterator *ref_iterator, const char *refname,
     +		      unsigned int flags);
      
      /*
    @@ refs/debug.c: static int debug_ref_iterator_advance(struct ref_iterator *ref_ite
      
      static int debug_ref_iterator_seek(struct ref_iterator *ref_iterator,
     -				   const char *prefix)
    -+				   const char *seek, unsigned int flags)
    ++				   const char *refname, unsigned int flags)
      {
      	struct debug_ref_iterator *diter =
      		(struct debug_ref_iterator *)ref_iterator;
     -	int res = diter->iter->vtable->seek(diter->iter, prefix);
     -	trace_printf_key(&trace_refs, "iterator_seek: %s: %d\n", prefix ? prefix : "", res);
    -+	int res = diter->iter->vtable->seek(diter->iter, seek, flags);
    ++	int res = diter->iter->vtable->seek(diter->iter, refname, flags);
     +	trace_printf_key(&trace_refs, "iterator_seek: %s flags: %d: %d\n",
    -+			 seek ? seek : "", flags, res);
    ++			 refname ? refname : "", flags, res);
      	return res;
      }
      
    @@ refs/files-backend.c: static int files_ref_iterator_advance(struct ref_iterator
      
      static int files_ref_iterator_seek(struct ref_iterator *ref_iterator,
     -				   const char *prefix)
    -+				   const char *seek, unsigned int flags)
    ++				   const char *refname, unsigned int flags)
      {
      	struct files_ref_iterator *iter =
      		(struct files_ref_iterator *)ref_iterator;
     -	return ref_iterator_seek(iter->iter0, prefix);
    -+	return ref_iterator_seek(iter->iter0, seek, flags);
    ++	return ref_iterator_seek(iter->iter0, refname, flags);
      }
      
      static int files_ref_iterator_peel(struct ref_iterator *ref_iterator,
    @@ refs/files-backend.c: static int files_reflog_iterator_advance(struct ref_iterat
      
      static int files_reflog_iterator_seek(struct ref_iterator *ref_iterator UNUSED,
     -				      const char *prefix UNUSED)
    -+				      const char *seek UNUSED,
    ++				      const char *refname UNUSED,
     +				      unsigned int flags UNUSED)
      {
      	BUG("ref_iterator_seek() called for reflog_iterator");
    @@ refs/iterator.c: int ref_iterator_advance(struct ref_iterator *ref_iterator)
      
     -int ref_iterator_seek(struct ref_iterator *ref_iterator,
     -		      const char *prefix)
    -+int ref_iterator_seek(struct ref_iterator *ref_iterator, const char *seek,
    ++int ref_iterator_seek(struct ref_iterator *ref_iterator, const char *refname,
     +		      unsigned int flags)
      {
     -	return ref_iterator->vtable->seek(ref_iterator, prefix);
    -+	return ref_iterator->vtable->seek(ref_iterator, seek, flags);
    ++	return ref_iterator->vtable->seek(ref_iterator, refname, flags);
      }
      
      int ref_iterator_peel(struct ref_iterator *ref_iterator,
    @@ refs/iterator.c: static int empty_ref_iterator_advance(struct ref_iterator *ref_
      
      static int empty_ref_iterator_seek(struct ref_iterator *ref_iterator UNUSED,
     -				   const char *prefix UNUSED)
    -+				   const char *seek UNUSED,
    ++				   const char *refname UNUSED,
     +				   unsigned int flags UNUSED)
      {
      	return 0;
    @@ refs/iterator.c: static int merge_ref_iterator_advance(struct ref_iterator *ref_
      
      static int merge_ref_iterator_seek(struct ref_iterator *ref_iterator,
     -				   const char *prefix)
    -+				   const char *seek, unsigned int flags)
    ++				   const char *refname, unsigned int flags)
      {
      	struct merge_ref_iterator *iter =
      		(struct merge_ref_iterator *)ref_iterator;
    @@ refs/iterator.c: static int merge_ref_iterator_seek(struct ref_iterator *ref_ite
      	iter->iter1 = iter->iter1_owned;
      
     -	ret = ref_iterator_seek(iter->iter0, prefix);
    -+	ret = ref_iterator_seek(iter->iter0, seek, flags);
    ++	ret = ref_iterator_seek(iter->iter0, refname, flags);
      	if (ret < 0)
      		return ret;
      
     -	ret = ref_iterator_seek(iter->iter1, prefix);
    -+	ret = ref_iterator_seek(iter->iter1, seek, flags);
    ++	ret = ref_iterator_seek(iter->iter1, refname, flags);
      	if (ret < 0)
      		return ret;
      
    @@ refs/iterator.c: static int prefix_ref_iterator_advance(struct ref_iterator *ref
      
      static int prefix_ref_iterator_seek(struct ref_iterator *ref_iterator,
     -				    const char *prefix)
    -+				    const char *seek, unsigned int flags)
    ++				    const char *refname, unsigned int flags)
      {
      	struct prefix_ref_iterator *iter =
      		(struct prefix_ref_iterator *)ref_iterator;
    @@ refs/iterator.c: static int prefix_ref_iterator_advance(struct ref_iterator *ref
     +
     +	if (flags & REF_ITERATOR_SEEK_SET_PREFIX) {
     +		free(iter->prefix);
    -+		iter->prefix = xstrdup_or_null(seek);
    ++		iter->prefix = xstrdup_or_null(refname);
     +	}
    -+	return ref_iterator_seek(iter->iter0, seek, flags);
    ++	return ref_iterator_seek(iter->iter0, refname, flags);
      }
      
      static int prefix_ref_iterator_peel(struct ref_iterator *ref_iterator,
    @@ refs/packed-backend.c: static int packed_ref_iterator_advance(struct ref_iterato
      
      static int packed_ref_iterator_seek(struct ref_iterator *ref_iterator,
     -				    const char *prefix)
    -+				    const char *seek, unsigned int flags)
    ++				    const char *refname, unsigned int flags)
      {
      	struct packed_ref_iterator *iter =
      		(struct packed_ref_iterator *)ref_iterator;
    @@ refs/packed-backend.c: static int packed_ref_iterator_advance(struct ref_iterato
      
     -	if (prefix && *prefix)
     -		start = find_reference_location(iter->snapshot, prefix, 0);
    -+	if (seek && *seek)
    -+		start = find_reference_location(iter->snapshot, seek, 0);
    ++	if (refname && *refname)
    ++		start = find_reference_location(iter->snapshot, refname, 0);
      	else
      		start = iter->snapshot->start;
      
    @@ refs/packed-backend.c: static int packed_ref_iterator_advance(struct ref_iterato
     +	FREE_AND_NULL(iter->prefix);
     +
     +	if (flags & REF_ITERATOR_SEEK_SET_PREFIX)
    -+		iter->prefix = xstrdup_or_null(seek);
    ++		iter->prefix = xstrdup_or_null(refname);
     +
      	iter->pos = start;
      	iter->eof = iter->snapshot->eof;
    @@ refs/ref-cache.c: static int cache_ref_iterator_seek(struct ref_iterator *ref_it
      }
      
     +static int cache_ref_iterator_seek(struct ref_iterator *ref_iterator,
    -+				   const char *seek, unsigned int flags)
    ++				   const char *refname, unsigned int flags)
     +{
     +	struct cache_ref_iterator *iter =
     +		(struct cache_ref_iterator *)ref_iterator;
     +
     +	if (flags & REF_ITERATOR_SEEK_SET_PREFIX) {
    -+		return cache_ref_iterator_set_prefix(iter, seek);
    -+	} else if (seek && *seek) {
    ++		return cache_ref_iterator_set_prefix(iter, refname);
    ++	} else if (refname && *refname) {
     +		struct cache_ref_iterator_level *level;
    -+		const char *slash = seek;
    ++		const char *slash = refname;
     +		struct ref_dir *dir;
     +
     +		dir = get_ref_dir(iter->cache->root);
     +
     +		if (iter->prime_dir)
    -+			prime_ref_dir(dir, seek);
    ++			prime_ref_dir(dir, refname);
     +
     +		iter->levels_nr = 1;
     +		level = &iter->levels[0];
    @@ refs/ref-cache.c: static int cache_ref_iterator_seek(struct ref_iterator *ref_it
     +			sort_ref_dir(dir);
     +
     +			slash = strchr(slash, '/');
    -+			len = slash ? slash - seek : (int)strlen(seek);
    ++			len = slash ? slash - refname : (int)strlen(refname);
     +
     +			for (idx = 0; idx < dir->nr; idx++) {
    -+				cmp = strncmp(seek, dir->entries[idx]->name, len);
    ++				cmp = strncmp(refname, dir->entries[idx]->name, len);
     +				if (cmp <= 0)
     +					break;
     +			}
    @@ refs/refs-internal.h: void base_ref_iterator_init(struct ref_iterator *iter,
       */
      typedef int ref_iterator_seek_fn(struct ref_iterator *ref_iterator,
     -				 const char *prefix);
    -+				 const char *seek, unsigned int flags);
    ++				 const char *refname, unsigned int flags);
      
      /*
       * Peels the current ref, returning 0 for success or -1 for failure.
    @@ refs/reftable-backend.c: static int reftable_ref_iterator_advance(struct ref_ite
      
      static int reftable_ref_iterator_seek(struct ref_iterator *ref_iterator,
     -				      const char *prefix)
    -+				      const char *seek, unsigned int flags)
    ++				      const char *refname, unsigned int flags)
      {
      	struct reftable_ref_iterator *iter =
      		(struct reftable_ref_iterator *)ref_iterator;
    @@ refs/reftable-backend.c: static int reftable_ref_iterator_advance(struct ref_ite
     +	iter->prefix_len = 0;
     +
     +	if (flags & REF_ITERATOR_SEEK_SET_PREFIX) {
    -+		iter->prefix = xstrdup_or_null(seek);
    -+		iter->prefix_len = seek ? strlen(seek) : 0;
    ++		iter->prefix = xstrdup_or_null(refname);
    ++		iter->prefix_len = refname ? strlen(refname) : 0;
     +	}
    -+	iter->err = reftable_iterator_seek_ref(&iter->iter, seek);
    ++	iter->err = reftable_iterator_seek_ref(&iter->iter, refname);
      
      	return iter->err;
      }
    @@ refs/reftable-backend.c: static int reftable_reflog_iterator_advance(struct ref_
      
      static int reftable_reflog_iterator_seek(struct ref_iterator *ref_iterator UNUSED,
     -					 const char *prefix UNUSED)
    -+					 const char *seek UNUSED,
    ++					 const char *refname UNUSED,
     +					 unsigned int flags UNUSED)
      {
      	BUG("reftable reflog iterator cannot be seeked");
4:  a571579886 ! 4:  e4e9dddd15 for-each-ref: introduce a '--start-after' option
    @@ Commit message
         'git-for-each-ref(1)'. When used, the reference iteration seeks to the
         lexicographically next reference and iterates from there onward.
     
    -    This enables efficient pagination workflows like:
    +    This enables efficient pagination workflows, where the calling script
    +    can remember the last provided reference and use that as the starting
    +    point for the next set of references:
             git for-each-ref --count=100
             git for-each-ref --count=100 --start-after=refs/heads/branch-100
             git for-each-ref --count=100 --start-after=refs/heads/branch-200
    @@ Documentation/git-for-each-ref.adoc: TAB %(refname)`.
      --include-root-refs::
      	List root refs (HEAD and pseudorefs) apart from regular refs.
      
    -+--start-after::
    ++--start-after=<marker>::
     +    Allows paginating the output by skipping references up to and including the
     +    specified marker. When paging, it should be noted that references may be
     +    deleted, modified or added between invocations. Output will only yield those
    -+    references which follow the marker lexicographically. If the marker does not
    -+    exist, output begins from the first reference that would come after it
    -+    alphabetically. Cannot be used with general pattern matching or custom
    -+    sort options.
    ++    references which follow the marker lexicographically. Output begins from the
    ++    first reference that would come after the marker alphabetically. Cannot be
    ++    used with general pattern matching or custom sort options.
     +
      FIELD NAMES
      -----------
base-commit: cf6f63ea6bf35173e02e18bdc6a4ba41288acff9
change-id: 20250605-306-git-for-each-ref-pagination-0ba8a29ae646
Thanks
- Karthik
* [PATCH v4 1/4] refs: expose `ref_iterator` via 'refs.h'
  2025-07-11 16:18 ` [PATCH v4 0/4] for-each-ref: introduce seeking functionality via '--start-after' Karthik Nayak
@ 2025-07-11 16:18   ` Karthik Nayak
  2025-07-11 16:18   ` [PATCH v4 2/4] ref-cache: remove unused function 'find_ref_entry()' Karthik Nayak
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 102+ messages in thread
From: Karthik Nayak @ 2025-07-11 16:18 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak, gitster, ps, schwab, phillip.wood123
The `ref_iterator` is an internal structure to the 'refs/'
sub-directory, which allows iteration over refs. All reference iteration
is built on top of these iterators.
External clients of the 'refs' subsystem use the various
'refs_for_each...()' functions to iterate over refs. However since these
are wrapper functions, each combination of functionality requires a new
wrapper function. This is not feasible as the functions pile up with the
increase in requirements. Expose the internal reference iterator, so
advanced users can mix and match options as needed.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
 refs.h               | 147 +++++++++++++++++++++++++++++++++++++++++++++++++++
 refs/refs-internal.h | 145 +-------------------------------------------------
 2 files changed, 149 insertions(+), 143 deletions(-)
diff --git a/refs.h b/refs.h
index 46a6008e07..7c21aaef3d 100644
--- a/refs.h
+++ b/refs.h
@@ -1190,4 +1190,151 @@ int repo_migrate_ref_storage_format(struct repository *repo,
 				    unsigned int flags,
 				    struct strbuf *err);
 
+/*
+ * Reference iterators
+ *
+ * A reference iterator encapsulates the state of an in-progress
+ * iteration over references. Create an instance of `struct
+ * ref_iterator` via one of the functions in this module.
+ *
+ * A freshly-created ref_iterator doesn't yet point at a reference. To
+ * advance the iterator, call ref_iterator_advance(). If successful,
+ * this sets the iterator's refname, oid, and flags fields to describe
+ * the next reference and returns ITER_OK. The data pointed at by
+ * refname and oid belong to the iterator; if you want to retain them
+ * after calling ref_iterator_advance() again or calling
+ * ref_iterator_free(), you must make a copy. When the iteration has
+ * been exhausted, ref_iterator_advance() releases any resources
+ * associated with the iteration, frees the ref_iterator object, and
+ * returns ITER_DONE. If you want to abort the iteration early, call
+ * ref_iterator_free(), which also frees the ref_iterator object and
+ * any associated resources. If there was an internal error advancing
+ * to the next entry, ref_iterator_advance() aborts the iteration,
+ * frees the ref_iterator, and returns ITER_ERROR.
+ *
+ * The reference currently being looked at can be peeled by calling
+ * ref_iterator_peel(). This function is often faster than peel_ref(),
+ * so it should be preferred when iterating over references.
+ *
+ * Putting it all together, a typical iteration looks like this:
+ *
+ *     int ok;
+ *     struct ref_iterator *iter = ...;
+ *
+ *     while ((ok = ref_iterator_advance(iter)) == ITER_OK) {
+ *             if (want_to_stop_iteration()) {
+ *                     ok = ITER_DONE;
+ *                     break;
+ *             }
+ *
+ *             // Access information about the current reference:
+ *             if (!(iter->flags & REF_ISSYMREF))
+ *                     printf("%s is %s\n", iter->refname, oid_to_hex(iter->oid));
+ *
+ *             // If you need to peel the reference:
+ *             ref_iterator_peel(iter, &oid);
+ *     }
+ *
+ *     if (ok != ITER_DONE)
+ *             handle_error();
+ *     ref_iterator_free(iter);
+ */
+struct ref_iterator;
+
+/*
+ * These flags are passed to refs_ref_iterator_begin() (and do_for_each_ref(),
+ * which feeds it).
+ */
+enum do_for_each_ref_flags {
+	/*
+	 * Include broken references in a do_for_each_ref*() iteration, which
+	 * would normally be omitted. This includes both refs that point to
+	 * missing objects (a true repository corruption), ones with illegal
+	 * names (which we prefer not to expose to callers), as well as
+	 * dangling symbolic refs (i.e., those that point to a non-existent
+	 * ref; this is not a corruption, but as they have no valid oid, we
+	 * omit them from normal iteration results).
+	 */
+	DO_FOR_EACH_INCLUDE_BROKEN = (1 << 0),
+
+	/*
+	 * Only include per-worktree refs in a do_for_each_ref*() iteration.
+	 * Normally this will be used with a files ref_store, since that's
+	 * where all reference backends will presumably store their
+	 * per-worktree refs.
+	 */
+	DO_FOR_EACH_PER_WORKTREE_ONLY = (1 << 1),
+
+	/*
+	 * Omit dangling symrefs from output; this only has an effect with
+	 * INCLUDE_BROKEN, since they are otherwise not included at all.
+	 */
+	DO_FOR_EACH_OMIT_DANGLING_SYMREFS = (1 << 2),
+
+	/*
+	 * Include root refs i.e. HEAD and pseudorefs along with the regular
+	 * refs.
+	 */
+	DO_FOR_EACH_INCLUDE_ROOT_REFS = (1 << 3),
+};
+
+/*
+ * Return an iterator that goes over each reference in `refs` for
+ * which the refname begins with prefix. If trim is non-zero, then
+ * trim that many characters off the beginning of each refname.
+ * The output is ordered by refname.
+ */
+struct ref_iterator *refs_ref_iterator_begin(
+	struct ref_store *refs,
+	const char *prefix, const char **exclude_patterns,
+	int trim, enum do_for_each_ref_flags flags);
+
+/*
+ * Advance the iterator to the first or next item and return ITER_OK.
+ * If the iteration is exhausted, free the resources associated with
+ * the ref_iterator and return ITER_DONE. On errors, free the iterator
+ * resources and return ITER_ERROR. It is a bug to use ref_iterator or
+ * call this function again after it has returned ITER_DONE or
+ * ITER_ERROR.
+ */
+int ref_iterator_advance(struct ref_iterator *ref_iterator);
+
+/*
+ * Seek the iterator to the first reference with the given prefix.
+ * The prefix is matched as a literal string, without regard for path
+ * separators. If prefix is NULL or the empty string, seek the iterator to the
+ * first reference again.
+ *
+ * This function is expected to behave as if a new ref iterator with the same
+ * prefix had been created, but allows reuse of iterators and thus may allow
+ * the backend to optimize. Parameters other than the prefix that have been
+ * passed when creating the iterator will remain unchanged.
+ *
+ * Returns 0 on success, a negative error code otherwise.
+ */
+int ref_iterator_seek(struct ref_iterator *ref_iterator,
+		      const char *prefix);
+
+/*
+ * If possible, peel the reference currently being viewed by the
+ * iterator. Return 0 on success.
+ */
+int ref_iterator_peel(struct ref_iterator *ref_iterator,
+		      struct object_id *peeled);
+
+/* Free the reference iterator and any associated resources. */
+void ref_iterator_free(struct ref_iterator *ref_iterator);
+
+/*
+ * The common backend for the for_each_*ref* functions. Call fn for
+ * each reference in iter. If the iterator itself ever returns
+ * ITER_ERROR, return -1. If fn ever returns a non-zero value, stop
+ * the iteration and return that value. Otherwise, return 0. In any
+ * case, free the iterator when done. This function is basically an
+ * adapter between the callback style of reference iteration and the
+ * iterator style.
+ */
+int do_for_each_ref_iterator(struct ref_iterator *iter,
+			     each_ref_fn fn, void *cb_data);
+
 #endif /* REFS_H */
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index f868870851..03f5df04d5 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -244,90 +244,8 @@ const char *find_descendant_ref(const char *dirname,
 #define SYMREF_MAXDEPTH 5
 
 /*
- * These flags are passed to refs_ref_iterator_begin() (and do_for_each_ref(),
- * which feeds it).
- */
-enum do_for_each_ref_flags {
-	/*
-	 * Include broken references in a do_for_each_ref*() iteration, which
-	 * would normally be omitted. This includes both refs that point to
-	 * missing objects (a true repository corruption), ones with illegal
-	 * names (which we prefer not to expose to callers), as well as
-	 * dangling symbolic refs (i.e., those that point to a non-existent
-	 * ref; this is not a corruption, but as they have no valid oid, we
-	 * omit them from normal iteration results).
-	 */
-	DO_FOR_EACH_INCLUDE_BROKEN = (1 << 0),
-
-	/*
-	 * Only include per-worktree refs in a do_for_each_ref*() iteration.
-	 * Normally this will be used with a files ref_store, since that's
-	 * where all reference backends will presumably store their
-	 * per-worktree refs.
-	 */
-	DO_FOR_EACH_PER_WORKTREE_ONLY = (1 << 1),
-
-	/*
-	 * Omit dangling symrefs from output; this only has an effect with
-	 * INCLUDE_BROKEN, since they are otherwise not included at all.
-	 */
-	DO_FOR_EACH_OMIT_DANGLING_SYMREFS = (1 << 2),
-
-	/*
-	 * Include root refs i.e. HEAD and pseudorefs along with the regular
-	 * refs.
-	 */
-	DO_FOR_EACH_INCLUDE_ROOT_REFS = (1 << 3),
-};
-
-/*
- * Reference iterators
- *
- * A reference iterator encapsulates the state of an in-progress
- * iteration over references. Create an instance of `struct
- * ref_iterator` via one of the functions in this module.
- *
- * A freshly-created ref_iterator doesn't yet point at a reference. To
- * advance the iterator, call ref_iterator_advance(). If successful,
- * this sets the iterator's refname, oid, and flags fields to describe
- * the next reference and returns ITER_OK. The data pointed at by
- * refname and oid belong to the iterator; if you want to retain them
- * after calling ref_iterator_advance() again or calling
- * ref_iterator_free(), you must make a copy. When the iteration has
- * been exhausted, ref_iterator_advance() releases any resources
- * associated with the iteration, frees the ref_iterator object, and
- * returns ITER_DONE. If you want to abort the iteration early, call
- * ref_iterator_free(), which also frees the ref_iterator object and
- * any associated resources. If there was an internal error advancing
- * to the next entry, ref_iterator_advance() aborts the iteration,
- * frees the ref_iterator, and returns ITER_ERROR.
- *
- * The reference currently being looked at can be peeled by calling
- * ref_iterator_peel(). This function is often faster than peel_ref(),
- * so it should be preferred when iterating over references.
- *
- * Putting it all together, a typical iteration looks like this:
- *
- *     int ok;
- *     struct ref_iterator *iter = ...;
- *
- *     while ((ok = ref_iterator_advance(iter)) == ITER_OK) {
- *             if (want_to_stop_iteration()) {
- *                     ok = ITER_DONE;
- *                     break;
- *             }
- *
- *             // Access information about the current reference:
- *             if (!(iter->flags & REF_ISSYMREF))
- *                     printf("%s is %s\n", iter->refname, oid_to_hex(iter->oid));
- *
- *             // If you need to peel the reference:
- *             ref_iterator_peel(iter, &oid);
- *     }
- *
- *     if (ok != ITER_DONE)
- *             handle_error();
- *     ref_iterator_free(iter);
+ * Data structure for holding a reference iterator. See refs.h for
+ * more details and usage instructions.
  */
 struct ref_iterator {
 	struct ref_iterator_vtable *vtable;
@@ -337,42 +255,6 @@ struct ref_iterator {
 	unsigned int flags;
 };
 
-/*
- * Advance the iterator to the first or next item and return ITER_OK.
- * If the iteration is exhausted, free the resources associated with
- * the ref_iterator and return ITER_DONE. On errors, free the iterator
- * resources and return ITER_ERROR. It is a bug to use ref_iterator or
- * call this function again after it has returned ITER_DONE or
- * ITER_ERROR.
- */
-int ref_iterator_advance(struct ref_iterator *ref_iterator);
-
-/*
- * Seek the iterator to the first reference with the given prefix.
- * The prefix is matched as a literal string, without regard for path
- * separators. If prefix is NULL or the empty string, seek the iterator to the
- * first reference again.
- *
- * This function is expected to behave as if a new ref iterator with the same
- * prefix had been created, but allows reuse of iterators and thus may allow
- * the backend to optimize. Parameters other than the prefix that have been
- * passed when creating the iterator will remain unchanged.
- *
- * Returns 0 on success, a negative error code otherwise.
- */
-int ref_iterator_seek(struct ref_iterator *ref_iterator,
-		      const char *prefix);
-
-/*
- * If possible, peel the reference currently being viewed by the
- * iterator. Return 0 on success.
- */
-int ref_iterator_peel(struct ref_iterator *ref_iterator,
-		      struct object_id *peeled);
-
-/* Free the reference iterator and any associated resources. */
-void ref_iterator_free(struct ref_iterator *ref_iterator);
-
 /*
  * An iterator over nothing (its first ref_iterator_advance() call
  * returns ITER_DONE).
@@ -384,17 +266,6 @@ struct ref_iterator *empty_ref_iterator_begin(void);
  */
 int is_empty_ref_iterator(struct ref_iterator *ref_iterator);
 
-/*
- * Return an iterator that goes over each reference in `refs` for
- * which the refname begins with prefix. If trim is non-zero, then
- * trim that many characters off the beginning of each refname.
- * The output is ordered by refname.
- */
-struct ref_iterator *refs_ref_iterator_begin(
-		struct ref_store *refs,
-		const char *prefix, const char **exclude_patterns,
-		int trim, enum do_for_each_ref_flags flags);
-
 /*
  * A callback function used to instruct merge_ref_iterator how to
  * interleave the entries from iter0 and iter1. The function should
@@ -520,18 +391,6 @@ struct ref_iterator_vtable {
  */
 extern struct ref_iterator *current_ref_iter;
 
-/*
- * The common backend for the for_each_*ref* functions. Call fn for
- * each reference in iter. If the iterator itself ever returns
- * ITER_ERROR, return -1. If fn ever returns a non-zero value, stop
- * the iteration and return that value. Otherwise, return 0. In any
- * case, free the iterator when done. This function is basically an
- * adapter between the callback style of reference iteration and the
- * iterator style.
- */
-int do_for_each_ref_iterator(struct ref_iterator *iter,
-			     each_ref_fn fn, void *cb_data);
-
 struct ref_store;
 
 /* refs backends */
-- 
2.49.0
^ permalink raw reply related	[flat|nested] 102+ messages in thread
* [PATCH v4 2/4] ref-cache: remove unused function 'find_ref_entry()'
  2025-07-11 16:18 ` [PATCH v4 0/4] for-each-ref: introduce seeking functionality via '--start-after' Karthik Nayak
  2025-07-11 16:18   ` [PATCH v4 1/4] refs: expose `ref_iterator` via 'refs.h' Karthik Nayak
@ 2025-07-11 16:18   ` Karthik Nayak
  2025-07-11 16:18   ` [PATCH v4 3/4] refs: selectively set prefix in the seek functions Karthik Nayak
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 102+ messages in thread
From: Karthik Nayak @ 2025-07-11 16:18 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak, gitster, ps, schwab, phillip.wood123
The 'find_ref_entry' function is no longer used, so remove it.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
 refs/ref-cache.c | 14 --------------
 refs/ref-cache.h |  7 -------
 2 files changed, 21 deletions(-)
diff --git a/refs/ref-cache.c b/refs/ref-cache.c
index c1f1bab1d5..8aaffa8c6b 100644
--- a/refs/ref-cache.c
+++ b/refs/ref-cache.c
@@ -194,20 +194,6 @@ static struct ref_dir *find_containing_dir(struct ref_dir *dir,
 	return dir;
 }
 
-struct ref_entry *find_ref_entry(struct ref_dir *dir, const char *refname)
-{
-	int entry_index;
-	struct ref_entry *entry;
-	dir = find_containing_dir(dir, refname);
-	if (!dir)
-		return NULL;
-	entry_index = search_ref_dir(dir, refname, strlen(refname));
-	if (entry_index == -1)
-		return NULL;
-	entry = dir->entries[entry_index];
-	return (entry->flag & REF_DIR) ? NULL : entry;
-}
-
 /*
  * Emit a warning and return true iff ref1 and ref2 have the same name
  * and the same oid. Die if they have the same name but different
diff --git a/refs/ref-cache.h b/refs/ref-cache.h
index 5f04e518c3..f635d2d824 100644
--- a/refs/ref-cache.h
+++ b/refs/ref-cache.h
@@ -201,13 +201,6 @@ void free_ref_cache(struct ref_cache *cache);
  */
 void add_entry_to_dir(struct ref_dir *dir, struct ref_entry *entry);
 
-/*
- * Find the value entry with the given name in dir, sorting ref_dirs
- * and recursing into subdirectories as necessary.  If the name is not
- * found or it corresponds to a directory entry, return NULL.
- */
-struct ref_entry *find_ref_entry(struct ref_dir *dir, const char *refname);
-
 /*
  * Start iterating over references in `cache`. If `prefix` is
  * specified, only include references whose names start with that
-- 
2.49.0
^ permalink raw reply related	[flat|nested] 102+ messages in thread
* [PATCH v4 3/4] refs: selectively set prefix in the seek functions
  2025-07-11 16:18 ` [PATCH v4 0/4] for-each-ref: introduce seeking functionality via '--start-after' Karthik Nayak
  2025-07-11 16:18   ` [PATCH v4 1/4] refs: expose `ref_iterator` via 'refs.h' Karthik Nayak
  2025-07-11 16:18   ` [PATCH v4 2/4] ref-cache: remove unused function 'find_ref_entry()' Karthik Nayak
@ 2025-07-11 16:18   ` Karthik Nayak
  2025-07-14 10:34     ` Christian Couder
  2025-07-11 16:18   ` [PATCH v4 4/4] for-each-ref: introduce a '--start-after' option Karthik Nayak
  2025-07-14 16:34   ` [PATCH v4 0/4] for-each-ref: introduce seeking functionality via '--start-after' Christian Couder
  4 siblings, 1 reply; 102+ messages in thread
From: Karthik Nayak @ 2025-07-11 16:18 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak, gitster, ps, schwab, phillip.wood123
The ref iterator exposes a `ref_iterator_seek()` function. The name
suggests that this would seek the iterator to a specific reference, in
some ways similar to how `fseek()` works for files.
However, the function actually sets the prefix for refs iteration. So
further iteration would only yield references which match the particular
prefix. This is a bit confusing.
Let's add a 'flags' parameter to the function, which when set with the
'REF_ITERATOR_SEEK_SET_PREFIX' flag, will set the prefix for the
iteration in line with the existing behavior. Otherwise, the reference
backends will simply seek to the specified reference and clear any
previously set prefix. This allows users to start iteration from a
specific reference.
In the packed and reftable backends, since references are available in a
sorted list, the changes simply set the prefix if needed. The
changes to the files backend are a little more involved, since the files
backend uses the 'ref-cache' mechanism. We move the existing logic
from `cache_ref_iterator_seek()` into `cache_ref_iterator_set_prefix()`,
which is called when the 'REF_ITERATOR_SEEK_SET_PREFIX' flag is set. We
then parse the provided seek string and set the required levels and
their indexes to ensure that seeking is possible.
Helped-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
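 refs.c                  |  6 ++--
 refs.h                  | 26 +++++++++------
 refs/debug.c            |  7 ++--
 refs/files-backend.c    |  7 ++--
 refs/iterator.c         | 26 ++++++++-------
 refs/packed-backend.c   | 17 ++++++----
 refs/ref-cache.c        | 85 ++++++++++++++++++++++++++++++++++++++++++++++---
 refs/refs-internal.h    |  7 ++--
 refs/reftable-backend.c | 21 ++++++++----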
 9 files changed, 152 insertions(+), 50 deletions(-)
diff --git a/refs.c b/refs.c
index dce5c49ca2..243e6898b8 100644
--- a/refs.c
+++ b/refs.c
@@ -2666,12 +2666,12 @@ enum ref_transaction_error refs_verify_refnames_available(struct ref_store *refs
 		if (!initial_transaction) {
 			int ok;
 
-			if (!iter) {
+			if (!iter)
 				iter = refs_ref_iterator_begin(refs, dirname.buf, NULL, 0,
 							       DO_FOR_EACH_INCLUDE_BROKEN);
-			} else if (ref_iterator_seek(iter, dirname.buf) < 0) {
+			else if (ref_iterator_seek(iter, dirname.buf,
+						   REF_ITERATOR_SEEK_SET_PREFIX) < 0)
 				goto cleanup;
-			}
 
 			while ((ok = ref_iterator_advance(iter)) == ITER_OK) {
 				if (skip &&
diff --git a/refs.h b/refs.h
index 7c21aaef3d..e6780a8848 100644
--- a/refs.h
+++ b/refs.h
@@ -1299,21 +1299,29 @@ struct ref_iterator *refs_ref_iterator_begin(
  */
 int ref_iterator_advance(struct ref_iterator *ref_iterator);
 
+enum ref_iterator_seek_flag {
+	/*
+	 * When the REF_ITERATOR_SEEK_SET_PREFIX flag is set, the iterator's prefix is
+	 * updated to match the provided string, affecting all subsequent iterations. If
+	 * not, the iterator seeks to the specified reference and clears any previously
+	 * set prefix.
+	 */
+	REF_ITERATOR_SEEK_SET_PREFIX = (1 << 0),
+};
+
 /*
- * Seek the iterator to the first reference with the given prefix.
- * The prefix is matched as a literal string, without regard for path
- * separators. If prefix is NULL or the empty string, seek the iterator to the
+ * Seek the iterator to the first reference matching the given seek string.
+ * The seek string is matched as a literal string, without regard for path
+ * separators. If seek is NULL or the empty string, seek the iterator to the
  * first reference again.
  *
- * This function is expected to behave as if a new ref iterator with the same
- * prefix had been created, but allows reuse of iterators and thus may allow
- * the backend to optimize. Parameters other than the prefix that have been
- * passed when creating the iterator will remain unchanged.
+ * This function is expected to behave as if a new ref iterator has been
+ * created, but allows reuse of existing iterators for optimization.
  *
  * Returns 0 on success, a negative error code otherwise.
  */
-int ref_iterator_seek(struct ref_iterator *ref_iterator,
-		      const char *prefix);
+int ref_iterator_seek(struct ref_iterator *ref_iterator, const char *refname,
+		      unsigned int flags);
 
 /*
  * If possible, peel the reference currently being viewed by the
diff --git a/refs/debug.c b/refs/debug.c
index 485e3079d7..da300efaf3 100644
--- a/refs/debug.c
+++ b/refs/debug.c
@@ -170,12 +170,13 @@ static int debug_ref_iterator_advance(struct ref_iterator *ref_iterator)
 }
 
 static int debug_ref_iterator_seek(struct ref_iterator *ref_iterator,
-				   const char *prefix)
+				   const char *refname, unsigned int flags)
 {
 	struct debug_ref_iterator *diter =
 		(struct debug_ref_iterator *)ref_iterator;
-	int res = diter->iter->vtable->seek(diter->iter, prefix);
-	trace_printf_key(&trace_refs, "iterator_seek: %s: %d\n", prefix ? prefix : "", res);
+	int res = diter->iter->vtable->seek(diter->iter, refname, flags);
+	trace_printf_key(&trace_refs, "iterator_seek: %s flags: %d: %d\n",
+			 refname ? refname : "", flags, res);
 	return res;
 }
 
diff --git a/refs/files-backend.c b/refs/files-backend.c
index bf6f89b1d1..8b282f2a60 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -929,11 +929,11 @@ static int files_ref_iterator_advance(struct ref_iterator *ref_iterator)
 }
 
 static int files_ref_iterator_seek(struct ref_iterator *ref_iterator,
-				   const char *prefix)
+				   const char *refname, unsigned int flags)
 {
 	struct files_ref_iterator *iter =
 		(struct files_ref_iterator *)ref_iterator;
-	return ref_iterator_seek(iter->iter0, prefix);
+	return ref_iterator_seek(iter->iter0, refname, flags);
 }
 
 static int files_ref_iterator_peel(struct ref_iterator *ref_iterator,
@@ -2316,7 +2316,8 @@ static int files_reflog_iterator_advance(struct ref_iterator *ref_iterator)
 }
 
 static int files_reflog_iterator_seek(struct ref_iterator *ref_iterator UNUSED,
-				      const char *prefix UNUSED)
+				      const char *refname UNUSED,
+				      unsigned int flags UNUSED)
 {
 	BUG("ref_iterator_seek() called for reflog_iterator");
 }
diff --git a/refs/iterator.c b/refs/iterator.c
index 766d96e795..17ef841d8a 100644
--- a/refs/iterator.c
+++ b/refs/iterator.c
@@ -15,10 +15,10 @@ int ref_iterator_advance(struct ref_iterator *ref_iterator)
 	return ref_iterator->vtable->advance(ref_iterator);
 }
 
-int ref_iterator_seek(struct ref_iterator *ref_iterator,
-		      const char *prefix)
+int ref_iterator_seek(struct ref_iterator *ref_iterator, const char *refname,
+		      unsigned int flags)
 {
-	return ref_iterator->vtable->seek(ref_iterator, prefix);
+	return ref_iterator->vtable->seek(ref_iterator, refname, flags);
 }
 
 int ref_iterator_peel(struct ref_iterator *ref_iterator,
@@ -57,7 +57,8 @@ static int empty_ref_iterator_advance(struct ref_iterator *ref_iterator UNUSED)
 }
 
 static int empty_ref_iterator_seek(struct ref_iterator *ref_iterator UNUSED,
-				   const char *prefix UNUSED)
+				   const char *refname UNUSED,
+				   unsigned int flags UNUSED)
 {
 	return 0;
 }
@@ -224,7 +225,7 @@ static int merge_ref_iterator_advance(struct ref_iterator *ref_iterator)
 }
 
 static int merge_ref_iterator_seek(struct ref_iterator *ref_iterator,
-				   const char *prefix)
+				   const char *refname, unsigned int flags)
 {
 	struct merge_ref_iterator *iter =
 		(struct merge_ref_iterator *)ref_iterator;
@@ -234,11 +235,11 @@ static int merge_ref_iterator_seek(struct ref_iterator *ref_iterator,
 	iter->iter0 = iter->iter0_owned;
 	iter->iter1 = iter->iter1_owned;
 
-	ret = ref_iterator_seek(iter->iter0, prefix);
+	ret = ref_iterator_seek(iter->iter0, refname, flags);
 	if (ret < 0)
 		return ret;
 
-	ret = ref_iterator_seek(iter->iter1, prefix);
+	ret = ref_iterator_seek(iter->iter1, refname, flags);
 	if (ret < 0)
 		return ret;
 
@@ -407,13 +408,16 @@ static int prefix_ref_iterator_advance(struct ref_iterator *ref_iterator)
 }
 
 static int prefix_ref_iterator_seek(struct ref_iterator *ref_iterator,
-				    const char *prefix)
+				    const char *refname, unsigned int flags)
 {
 	struct prefix_ref_iterator *iter =
 		(struct prefix_ref_iterator *)ref_iterator;
-	free(iter->prefix);
-	iter->prefix = xstrdup_or_null(prefix);
-	return ref_iterator_seek(iter->iter0, prefix);
+
+	if (flags & REF_ITERATOR_SEEK_SET_PREFIX) {
+		free(iter->prefix);
+		iter->prefix = xstrdup_or_null(refname);
+	}
+	return ref_iterator_seek(iter->iter0, refname, flags);
 }
 
 static int prefix_ref_iterator_peel(struct ref_iterator *ref_iterator,
diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 7fd73a0e6d..5fa4ae6655 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -1004,19 +1004,23 @@ static int packed_ref_iterator_advance(struct ref_iterator *ref_iterator)
 }
 
 static int packed_ref_iterator_seek(struct ref_iterator *ref_iterator,
-				    const char *prefix)
+				    const char *refname, unsigned int flags)
 {
 	struct packed_ref_iterator *iter =
 		(struct packed_ref_iterator *)ref_iterator;
 	const char *start;
 
-	if (prefix && *prefix)
-		start = find_reference_location(iter->snapshot, prefix, 0);
+	if (refname && *refname)
+		start = find_reference_location(iter->snapshot, refname, 0);
 	else
 		start = iter->snapshot->start;
 
-	free(iter->prefix);
-	iter->prefix = xstrdup_or_null(prefix);
+	/* Unset any previously set prefix */
+	FREE_AND_NULL(iter->prefix);
+
+	if (flags & REF_ITERATOR_SEEK_SET_PREFIX)
+		iter->prefix = xstrdup_or_null(refname);
+
 	iter->pos = start;
 	iter->eof = iter->snapshot->eof;
 
@@ -1194,7 +1198,8 @@ static struct ref_iterator *packed_ref_iterator_begin(
 	iter->repo = ref_store->repo;
 	iter->flags = flags;
 
-	if (packed_ref_iterator_seek(&iter->base, prefix) < 0) {
+	if (packed_ref_iterator_seek(&iter->base, prefix,
+				     REF_ITERATOR_SEEK_SET_PREFIX) < 0) {
 		ref_iterator_free(&iter->base);
 		return NULL;
 	}
diff --git a/refs/ref-cache.c b/refs/ref-cache.c
index 8aaffa8c6b..1d95b56d40 100644
--- a/refs/ref-cache.c
+++ b/refs/ref-cache.c
@@ -434,11 +434,9 @@ static int cache_ref_iterator_advance(struct ref_iterator *ref_iterator)
 	}
 }
 
-static int cache_ref_iterator_seek(struct ref_iterator *ref_iterator,
-				   const char *prefix)
+static int cache_ref_iterator_set_prefix(struct cache_ref_iterator *iter,
+					 const char *prefix)
 {
-	struct cache_ref_iterator *iter =
-		(struct cache_ref_iterator *)ref_iterator;
 	struct cache_ref_iterator_level *level;
 	struct ref_dir *dir;
 
@@ -469,6 +467,82 @@ static int cache_ref_iterator_seek(struct ref_iterator *ref_iterator,
 	return 0;
 }
 
+static int cache_ref_iterator_seek(struct ref_iterator *ref_iterator,
+				   const char *refname, unsigned int flags)
+{
+	struct cache_ref_iterator *iter =
+		(struct cache_ref_iterator *)ref_iterator;
+
+	if (flags & REF_ITERATOR_SEEK_SET_PREFIX) {
+		return cache_ref_iterator_set_prefix(iter, refname);
+	} else if (refname && *refname) {
+		struct cache_ref_iterator_level *level;
+		const char *slash = refname;
+		struct ref_dir *dir;
+
+		dir = get_ref_dir(iter->cache->root);
+
+		if (iter->prime_dir)
+			prime_ref_dir(dir, refname);
+
+		iter->levels_nr = 1;
+		level = &iter->levels[0];
+		level->index = -1;
+		level->dir = dir;
+
+		/* Unset any previously set prefix */
+		FREE_AND_NULL(iter->prefix);
+
+		/*
+		 * Break down the provided seek path and assign the correct
+		 * indexing to each level as needed.
+		 */
+		do {
+			int len, idx;
+			int cmp = 0;
+
+			sort_ref_dir(dir);
+
+			slash = strchr(slash, '/');
+			len = slash ? slash - refname : (int)strlen(refname);
+
+			for (idx = 0; idx < dir->nr; idx++) {
+				cmp = strncmp(refname, dir->entries[idx]->name, len);
+				if (cmp <= 0)
+					break;
+			}
+			/* don't overflow the index */
+			idx = idx >= dir->nr ? dir->nr - 1 : idx;
+
+			if (slash)
+				slash = slash + 1;
+
+			level->index = idx;
+			if (dir->entries[idx]->flag & REF_DIR) {
+				/* push down a level */
+				dir = get_ref_dir(dir->entries[idx]);
+
+				ALLOC_GROW(iter->levels, iter->levels_nr + 1,
+					   iter->levels_alloc);
+				level = &iter->levels[iter->levels_nr++];
+				level->dir = dir;
+				level->index = -1;
+			} else {
+				/* reduce the index so the leaf node is iterated over */
+				if (cmp <= 0 && !slash)
+					level->index = idx - 1;
+				/*
+				 * while the seek path may not be exhausted, our
+				 * match is exhausted at a leaf node.
+				 */
+				break;
+			}
+		} while (slash);
+	}
+
+	return 0;
+}
+
 static int cache_ref_iterator_peel(struct ref_iterator *ref_iterator,
 				   struct object_id *peeled)
 {
@@ -509,7 +583,8 @@ struct ref_iterator *cache_ref_iterator_begin(struct ref_cache *cache,
 	iter->cache = cache;
 	iter->prime_dir = prime_dir;
 
-	if (cache_ref_iterator_seek(&iter->base, prefix) < 0) {
+	if (cache_ref_iterator_seek(&iter->base, prefix,
+				    REF_ITERATOR_SEEK_SET_PREFIX) < 0) {
 		ref_iterator_free(&iter->base);
 		return NULL;
 	}
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index 03f5df04d5..90de7837f8 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -353,11 +353,12 @@ void base_ref_iterator_init(struct ref_iterator *iter,
 typedef int ref_iterator_advance_fn(struct ref_iterator *ref_iterator);
 
 /*
- * Seek the iterator to the first reference matching the given prefix. Should
- * behave the same as if a new iterator was created with the same prefix.
+ * Seek the iterator to the first matching reference. If set_prefix is set,
+ * it would behave the same as if a new iterator was created with the same
+ * prefix.
  */
 typedef int ref_iterator_seek_fn(struct ref_iterator *ref_iterator,
-				 const char *prefix);
+				 const char *refname, unsigned int flags);
 
 /*
  * Peels the current ref, returning 0 for success or -1 for failure.
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index 4c3817f4ec..c3d48cc412 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -719,15 +719,20 @@ static int reftable_ref_iterator_advance(struct ref_iterator *ref_iterator)
 }
 
 static int reftable_ref_iterator_seek(struct ref_iterator *ref_iterator,
-				      const char *prefix)
+				      const char *refname, unsigned int flags)
 {
 	struct reftable_ref_iterator *iter =
 		(struct reftable_ref_iterator *)ref_iterator;
 
-	free(iter->prefix);
-	iter->prefix = xstrdup_or_null(prefix);
-	iter->prefix_len = prefix ? strlen(prefix) : 0;
-	iter->err = reftable_iterator_seek_ref(&iter->iter, prefix);
+	/* Unset any previously set prefix */
+	FREE_AND_NULL(iter->prefix);
+	iter->prefix_len = 0;
+
+	if (flags & REF_ITERATOR_SEEK_SET_PREFIX) {
+		iter->prefix = xstrdup_or_null(refname);
+		iter->prefix_len = refname ? strlen(refname) : 0;
+	}
+	iter->err = reftable_iterator_seek_ref(&iter->iter, refname);
 
 	return iter->err;
 }
@@ -839,7 +844,8 @@ static struct reftable_ref_iterator *ref_iterator_for_stack(struct reftable_ref_
 	if (ret)
 		goto done;
 
-	ret = reftable_ref_iterator_seek(&iter->base, prefix);
+	ret = reftable_ref_iterator_seek(&iter->base, prefix,
+					 REF_ITERATOR_SEEK_SET_PREFIX);
 	if (ret)
 		goto done;
 
@@ -2042,7 +2048,8 @@ static int reftable_reflog_iterator_advance(struct ref_iterator *ref_iterator)
 }
 
 static int reftable_reflog_iterator_seek(struct ref_iterator *ref_iterator UNUSED,
-					 const char *prefix UNUSED)
+					 const char *refname UNUSED,
+					 unsigned int flags UNUSED)
 {
 	BUG("reftable reflog iterator cannot be seeked");
 	return -1;
-- 
2.49.0
^ permalink raw reply related	[flat|nested] 102+ messages in thread
* Re: [PATCH v4 3/4] refs: selectively set prefix in the seek functions
  2025-07-11 16:18   ` [PATCH v4 3/4] refs: selectively set prefix in the seek functions Karthik Nayak
@ 2025-07-14 10:34     ` Christian Couder
  2025-07-15  8:19       ` Karthik Nayak
  0 siblings, 1 reply; 102+ messages in thread
From: Christian Couder @ 2025-07-14 10:34 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: git, gitster, ps, schwab, phillip.wood123
On Fri, Jul 11, 2025 at 6:20 PM Karthik Nayak <karthik.188@gmail.com> wrote:
> diff --git a/refs/refs-internal.h b/refs/refs-internal.h
> index 03f5df04d5..90de7837f8 100644
> --- a/refs/refs-internal.h
> +++ b/refs/refs-internal.h
> @@ -353,11 +353,12 @@ void base_ref_iterator_init(struct ref_iterator *iter,
>  typedef int ref_iterator_advance_fn(struct ref_iterator *ref_iterator);
>
>  /*
> - * Seek the iterator to the first reference matching the given prefix. Should
> - * behave the same as if a new iterator was created with the same prefix.
> + * Seek the iterator to the first matching reference. If set_prefix is set,
s/If set_prefix is set/If the REF_ITERATOR_SEEK_SET_PREFIX flag is set/
> + * it would behave the same as if a new iterator was created with the same
> + * prefix.
Maybe: s/with the same prefix/at the same reference/
>   */
>  typedef int ref_iterator_seek_fn(struct ref_iterator *ref_iterator,
> -                                const char *prefix);
> +                                const char *refname, unsigned int flags);
^ permalink raw reply	[flat|nested] 102+ messages in thread
* Re: [PATCH v4 3/4] refs: selectively set prefix in the seek functions
  2025-07-14 10:34     ` Christian Couder
@ 2025-07-15  8:19       ` Karthik Nayak
  0 siblings, 0 replies; 102+ messages in thread
From: Karthik Nayak @ 2025-07-15  8:19 UTC (permalink / raw)
  To: Christian Couder; +Cc: git, gitster, ps, schwab, phillip.wood123
Christian Couder <christian.couder@gmail.com> writes:
> On Fri, Jul 11, 2025 at 6:20 PM Karthik Nayak <karthik.188@gmail.com> wrote:
>
>> diff --git a/refs/refs-internal.h b/refs/refs-internal.h
>> index 03f5df04d5..90de7837f8 100644
>> --- a/refs/refs-internal.h
>> +++ b/refs/refs-internal.h
>> @@ -353,11 +353,12 @@ void base_ref_iterator_init(struct ref_iterator *iter,
>>  typedef int ref_iterator_advance_fn(struct ref_iterator *ref_iterator);
>>
>>  /*
>> - * Seek the iterator to the first reference matching the given prefix. Should
>> - * behave the same as if a new iterator was created with the same prefix.
>> + * Seek the iterator to the first matching reference. If set_prefix is set,
>
> s/If set_prefix is set/If the REF_ITERATOR_SEEK_SET_PREFIX flag is set/
>
Will change, thanks.
>> + * it would behave the same as if a new iterator was created with the same
>> + * prefix.
>
> Maybe: s/with the same prefix/at the same reference/
>
Changed it to
  If the REF_ITERATOR_SEEK_SET_PREFIX flag is set, it would behave the
  same as if a new iterator was created with the provided refname as prefix.
>>   */
>>  typedef int ref_iterator_seek_fn(struct ref_iterator *ref_iterator,
>> -                                const char *prefix);
>> +                                const char *refname, unsigned int flags);
^ permalink raw reply	[flat|nested] 102+ messages in thread
* [PATCH v4 4/4] for-each-ref: introduce a '--start-after' option
  2025-07-11 16:18 ` [PATCH v4 0/4] for-each-ref: introduce seeking functionality via '--start-after' Karthik Nayak
                     ` (2 preceding siblings ...)
  2025-07-11 16:18   ` [PATCH v4 3/4] refs: selectively set prefix in the seek functions Karthik Nayak
@ 2025-07-11 16:18   ` Karthik Nayak
  2025-07-14 16:04     ` Christian Couder
  2025-07-14 16:34   ` [PATCH v4 0/4] for-each-ref: introduce seeking functionality via '--start-after' Christian Couder
  4 siblings, 1 reply; 102+ messages in thread
From: Karthik Nayak @ 2025-07-11 16:18 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak, gitster, ps, schwab, phillip.wood123
The `git-for-each-ref(1)` command is used to iterate over references
present in a repository. In large repositories with millions of
references, it would be optimal to paginate this output such that we
can start iteration from a given reference. This would avoid having to
iterate over all references from the beginning each time when paginating
through results.
The previous commit added 'seek' functionality to the reference
backends. Utilize this and expose a '--start-after' option in
'git-for-each-ref(1)'. When used, the reference iteration seeks to the
lexicographically next reference and iterates from there onward.
This enables efficient pagination workflows, where the calling script
can remember the last provided reference and use that as the starting
point for the next set of references:
    git for-each-ref --count=100
    git for-each-ref --count=100 --start-after=refs/heads/branch-100
    git for-each-ref --count=100 --start-after=refs/heads/branch-200
Since the reference iterators only allow seeking to a specified marker
via `ref_iterator_seek()`, we introduce a helper function
`start_ref_iterator_after()`, which seeks to the next reference by simply
adding (char) 1 to the marker.
Note that pagination always continues from the provided marker. As
such, any concurrent reference updates that sort lexicographically
before the marker will not be output. Document the same.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
  |  10 +-
               |   8 ++
                         |  80 +++++++++++----
                         |   1 +
       | 194 ++++++++++++++++++++++++++++++++++++
 5 files changed, 272 insertions(+), 21 deletions(-)
diff --git a/Documentation/git-for-each-ref.adoc b/Documentation/git-for-each-ref.adoc
index 5ef89fc0fe..ae61ba642a 100644
--- a/Documentation/git-for-each-ref.adoc
+++ b/Documentation/git-for-each-ref.adoc
@@ -14,7 +14,7 @@ SYNOPSIS
 		   [--points-at=<object>]
 		   [--merged[=<object>]] [--no-merged[=<object>]]
 		   [--contains[=<object>]] [--no-contains[=<object>]]
-		   [--exclude=<pattern> ...]
+		   [--exclude=<pattern> ...] [--start-after=<marker>]
 
 DESCRIPTION
 -----------
@@ -108,6 +108,14 @@ TAB %(refname)`.
 --include-root-refs::
 	List root refs (HEAD and pseudorefs) apart from regular refs.
 
+--start-after=<marker>::
+    Allows paginating the output by skipping references up to and including the
+    specified marker. When paging, it should be noted that references may be
+    deleted, modified or added between invocations. Output will only yield those
+    references which follow the marker lexicographically. Output begins from the
+    first reference that would come after the marker alphabetically. Cannot be
+    used with general pattern matching or custom sort options.
+
 FIELD NAMES
 -----------
 
diff --git a/builtin/for-each-ref.c b/builtin/for-each-ref.c
index 3d2207ec77..3f21598046 100644
--- a/builtin/for-each-ref.c
+++ b/builtin/for-each-ref.c
@@ -13,6 +13,7 @@ static char const * const for_each_ref_usage[] = {
 	N_("git for-each-ref [--points-at <object>]"),
 	N_("git for-each-ref [--merged [<commit>]] [--no-merged [<commit>]]"),
 	N_("git for-each-ref [--contains [<commit>]] [--no-contains [<commit>]]"),
+	N_("git for-each-ref [--start-after <marker>]"),
 	NULL
 };
 
@@ -44,6 +45,7 @@ int cmd_for_each_ref(int argc,
 		OPT_GROUP(""),
 		OPT_INTEGER( 0 , "count", &format.array_opts.max_count, N_("show only <n> matched refs")),
 		OPT_STRING(  0 , "format", &format.format, N_("format"), N_("format to use for the output")),
+		OPT_STRING(  0 , "start-after", &filter.start_after, N_("marker"), N_("start iteration after the provided marker")),
 		OPT__COLOR(&format.use_color, N_("respect format colors")),
 		OPT_REF_FILTER_EXCLUDE(&filter),
 		OPT_REF_SORT(&sorting_options),
@@ -79,6 +81,9 @@ int cmd_for_each_ref(int argc,
 	if (verify_ref_format(&format))
 		usage_with_options(for_each_ref_usage, opts);
 
+	if (filter.start_after && sorting_options.nr > 1)
+		die(_("cannot use --start-after with custom sort options"));
+
 	sorting = ref_sorting_options(&sorting_options);
 	ref_sorting_set_sort_flags_all(sorting, REF_SORTING_ICASE, icase);
 	filter.ignore_case = icase;
@@ -100,6 +105,9 @@ int cmd_for_each_ref(int argc,
 		filter.name_patterns = argv;
 	}
 
+	if (filter.start_after && filter.name_patterns && filter.name_patterns[0])
+		die(_("cannot use --start-after with patterns"));
+
 	if (include_root_refs)
 		flags |= FILTER_REFS_ROOT_REFS | FILTER_REFS_DETACHED_HEAD;
 
diff --git a/ref-filter.c b/ref-filter.c
index 7a274633cf..2dfd385313 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -2683,6 +2683,24 @@ static int filter_exclude_match(struct ref_filter *filter, const char *refname)
 	return match_pattern(filter->exclude.v, refname, filter->ignore_case);
 }
 
+/*
+ * We need to seek to the reference right after a given marker but excluding any
+ * matching references. So we seek to the lexicographically next reference.
+ */
+static int start_ref_iterator_after(struct ref_iterator *iter, const char *marker)
+{
+	struct strbuf sb = STRBUF_INIT;
+	int ret;
+
+	strbuf_addstr(&sb, marker);
+	strbuf_addch(&sb, 1);
+
+	ret = ref_iterator_seek(iter, sb.buf, 0);
+
+	strbuf_release(&sb);
+	return ret;
+}
+
 /*
  * This is the same as for_each_fullref_in(), but it tries to iterate
  * only over the patterns we'll care about. Note that it _doesn't_ do a full
@@ -2692,10 +2710,13 @@ static int for_each_fullref_in_pattern(struct ref_filter *filter,
 				       each_ref_fn cb,
 				       void *cb_data)
 {
+	struct ref_iterator *iter;
+	int flags = 0, ret = 0;
+
 	if (filter->kind & FILTER_REFS_ROOT_REFS) {
 		/* In this case, we want to print all refs including root refs. */
-		return refs_for_each_include_root_refs(get_main_ref_store(the_repository),
-						       cb, cb_data);
+		flags |= DO_FOR_EACH_INCLUDE_ROOT_REFS;
+		goto non_prefix_iter;
 	}
 
 	if (!filter->match_as_path) {
@@ -2704,8 +2725,7 @@ static int for_each_fullref_in_pattern(struct ref_filter *filter,
 		 * prefixes like "refs/heads/" etc. are stripped off,
 		 * so we have to look at everything:
 		 */
-		return refs_for_each_fullref_in(get_main_ref_store(the_repository),
-						"", NULL, cb, cb_data);
+		goto non_prefix_iter;
 	}
 
 	if (filter->ignore_case) {
@@ -2714,20 +2734,29 @@ static int for_each_fullref_in_pattern(struct ref_filter *filter,
 		 * so just return everything and let the caller
 		 * sort it out.
 		 */
-		return refs_for_each_fullref_in(get_main_ref_store(the_repository),
-						"", NULL, cb, cb_data);
+		goto non_prefix_iter;
 	}
 
 	if (!filter->name_patterns[0]) {
 		/* no patterns; we have to look at everything */
-		return refs_for_each_fullref_in(get_main_ref_store(the_repository),
-						 "", filter->exclude.v, cb, cb_data);
+		goto non_prefix_iter;
 	}
 
 	return refs_for_each_fullref_in_prefixes(get_main_ref_store(the_repository),
 						 NULL, filter->name_patterns,
 						 filter->exclude.v,
 						 cb, cb_data);
+
+non_prefix_iter:
+	iter = refs_ref_iterator_begin(get_main_ref_store(the_repository), "",
+				       NULL, 0, flags);
+	if (filter->start_after)
+		ret = start_ref_iterator_after(iter, filter->start_after);
+
+	if (ret)
+		return ret;
+
+	return do_for_each_ref_iterator(iter, cb, cb_data);
 }
 
 /*
@@ -3197,9 +3226,11 @@ static int do_filter_refs(struct ref_filter *filter, unsigned int type, each_ref
 	init_contains_cache(&filter->internal.no_contains_cache);
 
 	/*  Simple per-ref filtering */
-	if (!filter->kind)
+	if (!filter->kind) {
 		die("filter_refs: invalid type");
-	else {
+	} else {
+		const char *prefix = NULL;
+
 		/*
 		 * For common cases where we need only branches or remotes or tags,
 		 * we only iterate through those refs. If a mix of refs is needed,
@@ -3207,19 +3238,28 @@ static int do_filter_refs(struct ref_filter *filter, unsigned int type, each_ref
 		 * of filter_ref_kind().
 		 */
 		if (filter->kind == FILTER_REFS_BRANCHES)
-			ret = refs_for_each_fullref_in(get_main_ref_store(the_repository),
-						       "refs/heads/", NULL,
-						       fn, cb_data);
+			prefix = "refs/heads/";
 		else if (filter->kind == FILTER_REFS_REMOTES)
-			ret = refs_for_each_fullref_in(get_main_ref_store(the_repository),
-						       "refs/remotes/", NULL,
-						       fn, cb_data);
+			prefix = "refs/remotes/";
 		else if (filter->kind == FILTER_REFS_TAGS)
-			ret = refs_for_each_fullref_in(get_main_ref_store(the_repository),
-						       "refs/tags/", NULL, fn,
-						       cb_data);
-		else if (filter->kind & FILTER_REFS_REGULAR)
+			prefix = "refs/tags/";
+
+		if (prefix) {
+			struct ref_iterator *iter;
+
+			iter = refs_ref_iterator_begin(get_main_ref_store(the_repository),
+						       "", NULL, 0, 0);
+
+			if (filter->start_after)
+				ret = start_ref_iterator_after(iter, filter->start_after);
+			else if (prefix)
+				ret = ref_iterator_seek(iter, prefix, 1);
+
+			if (!ret)
+				ret = do_for_each_ref_iterator(iter, fn, cb_data);
+		} else if (filter->kind & FILTER_REFS_REGULAR) {
 			ret = for_each_fullref_in_pattern(filter, fn, cb_data);
+		}
 
 		/*
 		 * When printing all ref types, HEAD is already included,
diff --git a/ref-filter.h b/ref-filter.h
index c98c4fbd4c..f22ca94b49 100644
--- a/ref-filter.h
+++ b/ref-filter.h
@@ -64,6 +64,7 @@ struct ref_array {
 
 struct ref_filter {
 	const char **name_patterns;
+	const char *start_after;
 	struct strvec exclude;
 	struct oid_array points_at;
 	struct commit_list *with_commit;
diff --git a/t/t6302-for-each-ref-filter.sh b/t/t6302-for-each-ref-filter.sh
index bb02b86c16..a43e099118 100755
--- a/t/t6302-for-each-ref-filter.sh
+++ b/t/t6302-for-each-ref-filter.sh
@@ -541,4 +541,198 @@ test_expect_success 'validate worktree atom' '
 	test_cmp expect actual
 '
 
+test_expect_success 'start after with empty value' '
+	cat >expect <<-\EOF &&
+	refs/heads/main
+	refs/heads/main_worktree
+	refs/heads/side
+	refs/odd/spot
+	refs/tags/annotated-tag
+	refs/tags/doubly-annotated-tag
+	refs/tags/doubly-signed-tag
+	refs/tags/foo1.10
+	refs/tags/foo1.3
+	refs/tags/foo1.6
+	refs/tags/four
+	refs/tags/one
+	refs/tags/signed-tag
+	refs/tags/three
+	refs/tags/two
+	EOF
+	git for-each-ref --format="%(refname)" --start-after="" >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'start after a specific reference' '
+	cat >expect <<-\EOF &&
+	refs/tags/annotated-tag
+	refs/tags/doubly-annotated-tag
+	refs/tags/doubly-signed-tag
+	refs/tags/foo1.10
+	refs/tags/foo1.3
+	refs/tags/foo1.6
+	refs/tags/four
+	refs/tags/one
+	refs/tags/signed-tag
+	refs/tags/three
+	refs/tags/two
+	EOF
+	git for-each-ref --format="%(refname)" --start-after=refs/odd/spot >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'start after a specific reference with partial match' '
+	cat >expect <<-\EOF &&
+	refs/odd/spot
+	refs/tags/annotated-tag
+	refs/tags/doubly-annotated-tag
+	refs/tags/doubly-signed-tag
+	refs/tags/foo1.10
+	refs/tags/foo1.3
+	refs/tags/foo1.6
+	refs/tags/four
+	refs/tags/one
+	refs/tags/signed-tag
+	refs/tags/three
+	refs/tags/two
+	EOF
+	git for-each-ref --format="%(refname)" --start-after=refs/odd/sp >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'start after, just behind a specific reference' '
+	cat >expect <<-\EOF &&
+	refs/odd/spot
+	refs/tags/annotated-tag
+	refs/tags/doubly-annotated-tag
+	refs/tags/doubly-signed-tag
+	refs/tags/foo1.10
+	refs/tags/foo1.3
+	refs/tags/foo1.6
+	refs/tags/four
+	refs/tags/one
+	refs/tags/signed-tag
+	refs/tags/three
+	refs/tags/two
+	EOF
+	git for-each-ref --format="%(refname)" --start-after=refs/odd/parrot >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'start after with specific directory match' '
+	cat >expect <<-\EOF &&
+	refs/odd/spot
+	refs/tags/annotated-tag
+	refs/tags/doubly-annotated-tag
+	refs/tags/doubly-signed-tag
+	refs/tags/foo1.10
+	refs/tags/foo1.3
+	refs/tags/foo1.6
+	refs/tags/four
+	refs/tags/one
+	refs/tags/signed-tag
+	refs/tags/three
+	refs/tags/two
+	EOF
+	git for-each-ref --format="%(refname)" --start-after=refs/odd >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'start after with specific directory and trailing slash' '
+	cat >expect <<-\EOF &&
+	refs/odd/spot
+	refs/tags/annotated-tag
+	refs/tags/doubly-annotated-tag
+	refs/tags/doubly-signed-tag
+	refs/tags/foo1.10
+	refs/tags/foo1.3
+	refs/tags/foo1.6
+	refs/tags/four
+	refs/tags/one
+	refs/tags/signed-tag
+	refs/tags/three
+	refs/tags/two
+	EOF
+	git for-each-ref --format="%(refname)" --start-after=refs/lost >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'start after, just behind a specific directory' '
+	cat >expect <<-\EOF &&
+	refs/odd/spot
+	refs/tags/annotated-tag
+	refs/tags/doubly-annotated-tag
+	refs/tags/doubly-signed-tag
+	refs/tags/foo1.10
+	refs/tags/foo1.3
+	refs/tags/foo1.6
+	refs/tags/four
+	refs/tags/one
+	refs/tags/signed-tag
+	refs/tags/three
+	refs/tags/two
+	EOF
+	git for-each-ref --format="%(refname)" --start-after=refs/odd/ >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'start after, overflow specific reference length' '
+	cat >expect <<-\EOF &&
+	refs/tags/annotated-tag
+	refs/tags/doubly-annotated-tag
+	refs/tags/doubly-signed-tag
+	refs/tags/foo1.10
+	refs/tags/foo1.3
+	refs/tags/foo1.6
+	refs/tags/four
+	refs/tags/one
+	refs/tags/signed-tag
+	refs/tags/three
+	refs/tags/two
+	EOF
+	git for-each-ref --format="%(refname)" --start-after=refs/odd/spotnew >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'start after, overflow specific reference path' '
+	cat >expect <<-\EOF &&
+	refs/tags/annotated-tag
+	refs/tags/doubly-annotated-tag
+	refs/tags/doubly-signed-tag
+	refs/tags/foo1.10
+	refs/tags/foo1.3
+	refs/tags/foo1.6
+	refs/tags/four
+	refs/tags/one
+	refs/tags/signed-tag
+	refs/tags/three
+	refs/tags/two
+	EOF
+	git for-each-ref --format="%(refname)" --start-after=refs/odd/spot/new >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'start after, last reference' '
+	cat >expect <<-\EOF &&
+	EOF
+	git for-each-ref --format="%(refname)" --start-after=refs/tags/two >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'start after used with a pattern' '
+	cat >expect <<-\EOF &&
+	fatal: cannot use --start-after with patterns
+	EOF
+	test_must_fail git for-each-ref --format="%(refname)" --start-after=refs/odd/spot refs/tags 2>actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'start after used with custom sort order' '
+	cat >expect <<-\EOF &&
+	fatal: cannot use --start-after with custom sort options
+	EOF
+	test_must_fail git for-each-ref --format="%(refname)" --start-after=refs/odd/spot --sort=author 2>actual &&
+	test_cmp expect actual
+'
+
 test_done
-- 
2.49.0
* Re: [PATCH v4 4/4] for-each-ref: introduce a '--start-after' option
  2025-07-11 16:18   ` [PATCH v4 4/4] for-each-ref: introduce a '--start-after' option Karthik Nayak
@ 2025-07-14 16:04     ` Christian Couder
  2025-07-14 16:42       ` Junio C Hamano
  2025-07-15  8:42       ` Karthik Nayak
  0 siblings, 2 replies; 102+ messages in thread
From: Christian Couder @ 2025-07-14 16:04 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: git, gitster, ps, schwab, phillip.wood123
On Fri, Jul 11, 2025 at 6:21 PM Karthik Nayak <karthik.188@gmail.com> wrote:
>  /*
>   * This is the same as for_each_fullref_in(), but it tries to iterate
>   * only over the patterns we'll care about. Note that it _doesn't_ do a full
> @@ -2692,10 +2710,13 @@ static int for_each_fullref_in_pattern(struct ref_filter *filter,
>                                        each_ref_fn cb,
>                                        void *cb_data)
>  {
> +       struct ref_iterator *iter;
> +       int flags = 0, ret = 0;
> +
>         if (filter->kind & FILTER_REFS_ROOT_REFS) {
>                 /* In this case, we want to print all refs including root refs. */
> -               return refs_for_each_include_root_refs(get_main_ref_store(the_repository),
> -                                                      cb, cb_data);
> +               flags |= DO_FOR_EACH_INCLUDE_ROOT_REFS;
> +               goto non_prefix_iter;
>         }
>
>         if (!filter->match_as_path) {
> @@ -2704,8 +2725,7 @@ static int for_each_fullref_in_pattern(struct ref_filter *filter,
>                  * prefixes like "refs/heads/" etc. are stripped off,
>                  * so we have to look at everything:
>                  */
> -               return refs_for_each_fullref_in(get_main_ref_store(the_repository),
> -                                               "", NULL, cb, cb_data);
> +               goto non_prefix_iter;
>         }
>
>         if (filter->ignore_case) {
> @@ -2714,20 +2734,29 @@ static int for_each_fullref_in_pattern(struct ref_filter *filter,
>                  * so just return everything and let the caller
>                  * sort it out.
>                  */
> -               return refs_for_each_fullref_in(get_main_ref_store(the_repository),
> -                                               "", NULL, cb, cb_data);
> +               goto non_prefix_iter;
>         }
>
>         if (!filter->name_patterns[0]) {
>                 /* no patterns; we have to look at everything */
> -               return refs_for_each_fullref_in(get_main_ref_store(the_repository),
> -                                                "", filter->exclude.v, cb, cb_data);
> +               goto non_prefix_iter;
>         }
>
>         return refs_for_each_fullref_in_prefixes(get_main_ref_store(the_repository),
>                                                  NULL, filter->name_patterns,
>                                                  filter->exclude.v,
>                                                  cb, cb_data);
> +
> +non_prefix_iter:
> +       iter = refs_ref_iterator_begin(get_main_ref_store(the_repository), "",
> +                                      NULL, 0, flags);
> +       if (filter->start_after)
> +               ret = start_ref_iterator_after(iter, filter->start_after);
> +
> +       if (ret)
> +               return ret;
> +
> +       return do_for_each_ref_iterator(iter, cb, cb_data);
>  }
Nit: I wonder if what is under the 'non_prefix_iter' label could be in
a new function and instead of `goto non_prefix_iter` we could return
the result of the new function.
>  /*
> @@ -3197,9 +3226,11 @@ static int do_filter_refs(struct ref_filter *filter, unsigned int type, each_ref
>         init_contains_cache(&filter->internal.no_contains_cache);
>
>         /*  Simple per-ref filtering */
> -       if (!filter->kind)
> +       if (!filter->kind) {
>                 die("filter_refs: invalid type");
> -       else {
> +       } else {
Nit: the `else` could be removed altogether here, but maybe that
should be done in a preparatory patch.
> +               const char *prefix = NULL;
> +
[...]
> +test_expect_success 'start after with specific directory and trailing slash' '
> +       cat >expect <<-\EOF &&
> +       refs/odd/spot
> +       refs/tags/annotated-tag
> +       refs/tags/doubly-annotated-tag
> +       refs/tags/doubly-signed-tag
> +       refs/tags/foo1.10
> +       refs/tags/foo1.3
> +       refs/tags/foo1.6
> +       refs/tags/four
> +       refs/tags/one
> +       refs/tags/signed-tag
> +       refs/tags/three
> +       refs/tags/two
> +       EOF
> +       git for-each-ref --format="%(refname)" --start-after=refs/lost >actual &&
I don't see a trailing slash.
> +       test_cmp expect actual
> +'
> +
> +test_expect_success 'start after, just behind a specific directory' '
> +       cat >expect <<-\EOF &&
> +       refs/odd/spot
> +       refs/tags/annotated-tag
> +       refs/tags/doubly-annotated-tag
> +       refs/tags/doubly-signed-tag
> +       refs/tags/foo1.10
> +       refs/tags/foo1.3
> +       refs/tags/foo1.6
> +       refs/tags/four
> +       refs/tags/one
> +       refs/tags/signed-tag
> +       refs/tags/three
> +       refs/tags/two
> +       EOF
> +       git for-each-ref --format="%(refname)" --start-after=refs/odd/ >actual &&
Here there is a trailing slash though.
> +       test_cmp expect actual
> +'
* Re: [PATCH v4 4/4] for-each-ref: introduce a '--start-after' option
  2025-07-14 16:04     ` Christian Couder
@ 2025-07-14 16:42       ` Junio C Hamano
  2025-07-15  8:42       ` Karthik Nayak
  1 sibling, 0 replies; 102+ messages in thread
From: Junio C Hamano @ 2025-07-14 16:42 UTC (permalink / raw)
  To: Christian Couder; +Cc: Karthik Nayak, git, ps, schwab, phillip.wood123
Christian Couder <christian.couder@gmail.com> writes:
>>         /*  Simple per-ref filtering */
>> -       if (!filter->kind)
>> +       if (!filter->kind) {
>>                 die("filter_refs: invalid type");
>> -       else {
>> +       } else {
>
> Nit: the `else` could be removed altogether here, but maybe that
> should be done in a preparatory patch.
Good eyes.  Thanks for carefully reading it over.
* Re: [PATCH v4 4/4] for-each-ref: introduce a '--start-after' option
  2025-07-14 16:04     ` Christian Couder
  2025-07-14 16:42       ` Junio C Hamano
@ 2025-07-15  8:42       ` Karthik Nayak
  1 sibling, 0 replies; 102+ messages in thread
From: Karthik Nayak @ 2025-07-15  8:42 UTC (permalink / raw)
  To: Christian Couder; +Cc: git, gitster, ps, schwab, phillip.wood123
Christian Couder <christian.couder@gmail.com> writes:
> On Fri, Jul 11, 2025 at 6:21 PM Karthik Nayak <karthik.188@gmail.com> wrote:
>
>>  /*
>>   * This is the same as for_each_fullref_in(), but it tries to iterate
>>   * only over the patterns we'll care about. Note that it _doesn't_ do a full
>> @@ -2692,10 +2710,13 @@ static int for_each_fullref_in_pattern(struct ref_filter *filter,
>>                                        each_ref_fn cb,
>>                                        void *cb_data)
>>  {
>> +       struct ref_iterator *iter;
>> +       int flags = 0, ret = 0;
>> +
>>         if (filter->kind & FILTER_REFS_ROOT_REFS) {
>>                 /* In this case, we want to print all refs including root refs. */
>> -               return refs_for_each_include_root_refs(get_main_ref_store(the_repository),
>> -                                                      cb, cb_data);
>> +               flags |= DO_FOR_EACH_INCLUDE_ROOT_REFS;
>> +               goto non_prefix_iter;
>>         }
>>
>>         if (!filter->match_as_path) {
>> @@ -2704,8 +2725,7 @@ static int for_each_fullref_in_pattern(struct ref_filter *filter,
>>                  * prefixes like "refs/heads/" etc. are stripped off,
>>                  * so we have to look at everything:
>>                  */
>> -               return refs_for_each_fullref_in(get_main_ref_store(the_repository),
>> -                                               "", NULL, cb, cb_data);
>> +               goto non_prefix_iter;
>>         }
>>
>>         if (filter->ignore_case) {
>> @@ -2714,20 +2734,29 @@ static int for_each_fullref_in_pattern(struct ref_filter *filter,
>>                  * so just return everything and let the caller
>>                  * sort it out.
>>                  */
>> -               return refs_for_each_fullref_in(get_main_ref_store(the_repository),
>> -                                               "", NULL, cb, cb_data);
>> +               goto non_prefix_iter;
>>         }
>>
>>         if (!filter->name_patterns[0]) {
>>                 /* no patterns; we have to look at everything */
>> -               return refs_for_each_fullref_in(get_main_ref_store(the_repository),
>> -                                                "", filter->exclude.v, cb, cb_data);
>> +               goto non_prefix_iter;
>>         }
>>
>>         return refs_for_each_fullref_in_prefixes(get_main_ref_store(the_repository),
>>                                                  NULL, filter->name_patterns,
>>                                                  filter->exclude.v,
>>                                                  cb, cb_data);
>> +
>> +non_prefix_iter:
>> +       iter = refs_ref_iterator_begin(get_main_ref_store(the_repository), "",
>> +                                      NULL, 0, flags);
>> +       if (filter->start_after)
>> +               ret = start_ref_iterator_after(iter, filter->start_after);
>> +
>> +       if (ret)
>> +               return ret;
>> +
>> +       return do_for_each_ref_iterator(iter, cb, cb_data);
>>  }
>
> Nit: I wonder if what is under the 'non_prefix_iter' label could be in
> a new function and instead of `goto non_prefix_iter` we could return
> the result of the new function.
>
Yeah, that would work too. Let me do that and make it nicer! Thanks for
the suggestion.
>>  /*
>> @@ -3197,9 +3226,11 @@ static int do_filter_refs(struct ref_filter *filter, unsigned int type, each_ref
>>         init_contains_cache(&filter->internal.no_contains_cache);
>>
>>         /*  Simple per-ref filtering */
>> -       if (!filter->kind)
>> +       if (!filter->kind) {
>>                 die("filter_refs: invalid type");
>> -       else {
>> +       } else {
>
> Nit: the `else` could be removed altogether here, but maybe that
> should be done in a preparatory patch.
>
Indeed, since I plan to re-roll with the changes you've suggested, I
will add this in too.
>> +               const char *prefix = NULL;
>> +
>
> [...]
>
>> +test_expect_success 'start after with specific directory and trailing slash' '
>> +       cat >expect <<-\EOF &&
>> +       refs/odd/spot
>> +       refs/tags/annotated-tag
>> +       refs/tags/doubly-annotated-tag
>> +       refs/tags/doubly-signed-tag
>> +       refs/tags/foo1.10
>> +       refs/tags/foo1.3
>> +       refs/tags/foo1.6
>> +       refs/tags/four
>> +       refs/tags/one
>> +       refs/tags/signed-tag
>> +       refs/tags/three
>> +       refs/tags/two
>> +       EOF
>> +       git for-each-ref --format="%(refname)" --start-after=refs/lost >actual &&
>
> I don't see a trailing slash.
>
>> +       test_cmp expect actual
>> +'
>> +
>> +test_expect_success 'start after, just behind a specific directory' '
>> +       cat >expect <<-\EOF &&
>> +       refs/odd/spot
>> +       refs/tags/annotated-tag
>> +       refs/tags/doubly-annotated-tag
>> +       refs/tags/doubly-signed-tag
>> +       refs/tags/foo1.10
>> +       refs/tags/foo1.3
>> +       refs/tags/foo1.6
>> +       refs/tags/four
>> +       refs/tags/one
>> +       refs/tags/signed-tag
>> +       refs/tags/three
>> +       refs/tags/two
>> +       EOF
>> +       git for-each-ref --format="%(refname)" --start-after=refs/odd/ >actual &&
>
> Here there is a trailing slash though.
I think these tests would make more sense in the newer versions with the
values swapped. Let me do that.
Thanks Christian for the thorough review.
* Re: [PATCH v4 0/4] for-each-ref: introduce seeking functionality via '--start-after'
  2025-07-11 16:18 ` [PATCH v4 0/4] for-each-ref: introduce seeking functionality via '--start-after' Karthik Nayak
                     ` (3 preceding siblings ...)
  2025-07-11 16:18   ` [PATCH v4 4/4] for-each-ref: introduce a '--start-after' option Karthik Nayak
@ 2025-07-14 16:34   ` Christian Couder
  2025-07-14 16:49     ` Junio C Hamano
  4 siblings, 1 reply; 102+ messages in thread
From: Christian Couder @ 2025-07-14 16:34 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: git, gitster, ps, schwab, phillip.wood123
On Fri, Jul 11, 2025 at 6:20 PM Karthik Nayak <karthik.188@gmail.com> wrote:
> Initially I was also planning to cleanup all the `refs_for_each...()`
> functions in 'refs.h' by simply using the iterator, but this bloated the
> series. So I've left that for another day.
I wonder if there is a plan to add the '--start-after' option to `git
branch` and `git tag` too?
> Karthik Nayak (4):
>       refs: expose `ref_iterator` via 'refs.h'
>       ref-cache: remove unused function 'find_ref_entry()'
>       refs: selectively set prefix in the seek functions
>       for-each-ref: introduce a '--start-after' option
Except for the few small comments I left on the last two patches and
the one below, this looks good to me.
[ ... ]
> Range-diff versus v3:
[ ... ]
>      +  struct cache_ref_iterator *iter =
>      +          (struct cache_ref_iterator *)ref_iterator;
>      +
>      +  if (flags & REF_ITERATOR_SEEK_SET_PREFIX) {
>     -+          return cache_ref_iterator_set_prefix(iter, seek);
>     -+  } else if (seek && *seek) {
>     ++          return cache_ref_iterator_set_prefix(iter, refname);
>     ++  } else if (refname && *refname) {
Nit: the `else` here could be removed, but yeah it might be better to
do that in a preparatory patch.
>      +          struct cache_ref_iterator_level *level;
>     -+          const char *slash = seek;
>     ++          const char *slash = refname;
>      +          struct ref_dir *dir;
* Re: [PATCH v4 0/4] for-each-ref: introduce seeking functionality via '--start-after'
  2025-07-14 16:34   ` [PATCH v4 0/4] for-each-ref: introduce seeking functionality via '--start-after' Christian Couder
@ 2025-07-14 16:49     ` Junio C Hamano
  2025-07-15  9:49       ` Karthik Nayak
  0 siblings, 1 reply; 102+ messages in thread
From: Junio C Hamano @ 2025-07-14 16:49 UTC (permalink / raw)
  To: Christian Couder; +Cc: Karthik Nayak, git, ps, schwab, phillip.wood123
Christian Couder <christian.couder@gmail.com> writes:
> On Fri, Jul 11, 2025 at 6:20 PM Karthik Nayak <karthik.188@gmail.com> wrote:
>
>> Initially I was also planning to cleanup all the `refs_for_each...()`
>> functions in 'refs.h' by simply using the iterator, but this bloated the
>> series. So I've left that for another day.
>
> I wonder if there is a plan to add the '--start-after' option to `git
> branch` and `git tag` too?
Good question.
"git for-each-ref" is for scripters, "git branch/tag" are for
humans.  And humans do not page (outside "more/less").
So while it may be trivial to expose the feature to these Porcelain
commands, it is not obvious that it is a good idea worth cluttering
"git tag -h" output.
* Re: [PATCH v4 0/4] for-each-ref: introduce seeking functionality via '--start-after'
  2025-07-14 16:49     ` Junio C Hamano
@ 2025-07-15  9:49       ` Karthik Nayak
  0 siblings, 0 replies; 102+ messages in thread
From: Karthik Nayak @ 2025-07-15  9:49 UTC (permalink / raw)
  To: Junio C Hamano, Christian Couder; +Cc: git, ps, schwab, phillip.wood123
Junio C Hamano <gitster@pobox.com> writes:
> Christian Couder <christian.couder@gmail.com> writes:
>
>> On Fri, Jul 11, 2025 at 6:20 PM Karthik Nayak <karthik.188@gmail.com> wrote:
>>
>>> Initially I was also planning to cleanup all the `refs_for_each...()`
>>> functions in 'refs.h' by simply using the iterator, but this bloated the
>>> series. So I've left that for another day.
>>
>> I wonder if there is a plan to add the '--start-after' option to `git
>> branch` and `git tag` too?
>
> Good question.
>
> "git for-each-ref" is for scripters, "git branch/tag" are for
> humans.  And humans do not page (outside "more/less").
>
> So while it may be trivial to expose the feature to these Porcelain
> commands, it is not obvious that it is a good idea worth cluttering
> "git tag -h" output.
Agreed, since the functionality is built into 'ref-filter', it should be
easy to expand to these commands. But I don't see the need for it.
Karthik
* [PATCH v5 0/5] for-each-ref: introduce seeking functionality via '--start-after'
  2025-07-01 15:03 [PATCH 0/4] for-each-ref: introduce seeking functionality via '--skip-until' Karthik Nayak
                   ` (9 preceding siblings ...)
  2025-07-11 16:18 ` [PATCH v4 0/4] for-each-ref: introduce seeking functionality via '--start-after' Karthik Nayak
@ 2025-07-15 11:28 ` Karthik Nayak
  2025-07-15 11:28   ` [PATCH v5 1/5] refs: expose `ref_iterator` via 'refs.h' Karthik Nayak
                     ` (6 more replies)
  10 siblings, 7 replies; 102+ messages in thread
From: Karthik Nayak @ 2025-07-15 11:28 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak, gitster, ps, schwab, phillip.wood123,
	Christian Couder
The `git-for-each-ref(1)` command is used to iterate over references
present in a repository. In large repositories with millions of
references, it is useful to paginate this output so that iteration
can start from a given reference. This avoids having to iterate over
all references from the beginning each time when paginating through
results.
This series adds a '--start-after' option in 'git-for-each-ref(1)'. When
used, the reference iteration seeks to the first reference that follows
the marker lexicographically. When paging, it should be noted that
references may be deleted, modified or added between invocations. Output
will only yield those references which follow the marker
lexicographically. If the marker does not exist, output begins from the
first reference that would come after it lexicographically.
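A minimal sketch of these semantics, assuming the references sit in a
plain array sorted byte-wise (which is not how any Git backend stores
them). `first_ref_after()` is a hypothetical helper and not a Git
function. The series itself seeks to the marker with a byte of value 1
appended; for NUL-terminated names that should land on the same entry as
the strictly-greater search used here.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of Git: return the index of the first
 * entry in the byte-wise sorted array `refs` that sorts strictly after
 * `marker`, or `nr` if no such entry exists.
 */
size_t first_ref_after(const char **refs, size_t nr, const char *marker)
{
	size_t lo = 0, hi = nr;

	/* Binary search for the first name with strcmp(name, marker) > 0. */
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (strcmp(refs[mid], marker) <= 0)
			lo = mid + 1;	/* name is at or before the marker */
		else
			hi = mid;	/* name is after the marker */
	}
	return lo;
}
```

The marker does not have to name an existing reference: a partial name
such as "refs/odd/sp" simply positions the search in front of
"refs/odd/spot", and a marker equal to the last reference yields `nr`,
that is, no output.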
This enables efficient pagination workflows like:
    git for-each-ref --count=100
    git for-each-ref --count=100 --start-after=refs/heads/branch-100
    git for-each-ref --count=100 --start-after=refs/heads/branch-200
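The workflow above can be modelled as a loop in which the last name of
each page becomes the marker for the next one. This is only a sketch
under stated assumptions: `fetch_page()` and `walk_all_pages()` are
hypothetical names, the references live in a sorted in-memory array, and
the linear scan merely models the result of one invocation (the point of
the series is that the real iterator seeks instead of scanning).

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical model of one paged invocation: copy up to `count` names
 * that sort strictly after `marker` (or from the start when `marker` is
 * NULL) into `page`, and return how many were copied.
 */
size_t fetch_page(const char **refs, size_t nr, const char *marker,
		  size_t count, const char **page)
{
	size_t i, n = 0;

	for (i = 0; i < nr && n < count; i++) {
		if (marker && strcmp(refs[i], marker) <= 0)
			continue;	/* at or before the marker: skip */
		page[n++] = refs[i];
	}
	return n;
}

/* Walk the whole list page by page; return the number of names seen. */
size_t walk_all_pages(const char **refs, size_t nr, size_t count,
		      const char **page)
{
	const char *marker = NULL;
	size_t total = 0, n;

	while ((n = fetch_page(refs, nr, marker, count, page)) > 0) {
		total += n;
		/* The last name of this page is the next marker. */
		marker = page[n - 1];
	}
	return total;
}
```

Because each page is computed from the marker alone, a reference created
or deleted between two calls cannot cause an already seen name to be
repeated; it can only change what appears after the marker.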
To add this functionality, we expose the `ref_iterator` outside the
'refs/' namespace and modify `ref_iterator_seek()` to actually seek
to a given reference and only set the prefix when the
'REF_ITERATOR_SEEK_SET_PREFIX' flag is passed.
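To illustrate the difference between a plain seek and a seek that also
sets the prefix, here is a toy model over a sorted array. Everything in
it (`struct toy_iter`, `toy_seek()`, `toy_next()`, `SEEK_SET_PREFIX`) is
a hypothetical stand-in and not the real `ref_iterator` API; it only
mirrors the behaviour described in this cover letter, including the
prefix being reset on every seek that does not ask for one.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for REF_ITERATOR_SEEK_SET_PREFIX. */
#define SEEK_SET_PREFIX (1u << 0)

struct toy_iter {
	const char **refs;	/* byte-wise sorted names */
	size_t nr;
	size_t pos;		/* index of the next name to yield */
	const char *prefix;	/* NULL means "no prefix filter" */
};

void toy_seek(struct toy_iter *it, const char *refname, unsigned int flags)
{
	/* The prefix is always reset and only set again when asked for. */
	it->prefix = (flags & SEEK_SET_PREFIX) ? refname : NULL;

	/* Position on the first name that sorts at or after `refname`. */
	it->pos = 0;
	while (it->pos < it->nr && strcmp(it->refs[it->pos], refname) < 0)
		it->pos++;
}

const char *toy_next(struct toy_iter *it)
{
	const char *name;

	if (it->pos >= it->nr)
		return NULL;
	name = it->refs[it->pos];
	/* Names sharing a prefix are contiguous, so a mismatch ends it. */
	if (it->prefix && strncmp(name, it->prefix, strlen(it->prefix)))
		return NULL;
	it->pos++;
	return name;
}
```

With the flag, `toy_seek(&it, "refs/heads/", SEEK_SET_PREFIX)` yields only
names under "refs/heads/". Without it, `toy_seek(&it, "refs/heads/side", 0)`
starts at that name and keeps going to the end of the array.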
On the reftable and packed backends, the changes are simple. But since
the files backend uses 'ref-cache' for reference handling, the changes
there are a little more involved, since we need to set up the right
levels and the indexing.
Initially I was also planning to clean up all the `refs_for_each...()`
functions in 'refs.h' by simply using the iterator, but this bloated the
series. So I've left that for another day.
Changes in v5:
- Changes to the comments to refer to the flag
  'REF_ITERATOR_SEEK_SET_PREFIX' instead of a variable used in older
  versions. Also other small grammar fixes.
- Added a commit to remove an unnecessary else clause.
- Move seeking functionality within `for_each_fullref_in_pattern` to its
  own function.
- Fix incorrect naming in the tests.
- Link to v4: https://lore.kernel.org/r/20250711-306-git-for-each-ref-pagination-v4-0-ed3303ad5b89@gmail.com
Changes in v4:
- Patch 3/4: Move around the documentation for the flag and rename the
  seek variable to refname.
- Patch 4/4: Cleanup the commit message and also the documentation.
- Link to v3: https://lore.kernel.org/r/20250708-306-git-for-each-ref-pagination-v3-0-8cfba1080be4@gmail.com
Changes in v3:
- Change the working of the command to exclude the marker provided. With
  this rename the flag to '--start-after'.
- Extend the documentation to add a note about concurrent modifications
  to the reference database.
- Link to v2: https://lore.kernel.org/r/20250704-306-git-for-each-ref-pagination-v2-0-bcde14acdd81@gmail.com
Changes in v2:
- Modify 'ref_iterator_seek()' to take in flags instead of a
  'set_prefix' variable. This improves readability, where users would
  use the 'REF_ITERATOR_SEEK_SET_PREFIX' instead of simply passing '1'.
- When the set prefix flag isn't used, reset any previously set prefix.
  This ensures that the internal prefix state is always reset whenever
  we seek and unifies the behavior between 'ref_iterator_seek' and
  'ref_iterator_begin'.
- Don't allow '--skip-until' to be run with '--sort', since the seeking
  always takes place before any sorting and this can be confusing.
- Some styling fixes:
  - Remove extra newline
  - Skip braces around single-line if...else clause
  - Add braces around 'if' clause
  - Fix indentation
- Link to v1: https://lore.kernel.org/git/20250701-306-git-for-each-ref-pagination-v1-0-4f0ae7c0688f@gmail.com/
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
 Documentation/git-for-each-ref.adoc |  10 +-
 builtin/for-each-ref.c              |   8 ++
 ref-filter.c                        | 116 ++++++++++++++-------
 ref-filter.h                        |   1 +
 refs.c                              |   6 +-
 refs.h                              | 155 ++++++++++++++++++++++++++++
 refs/debug.c                        |   7 +-
 refs/files-backend.c                |   7 +-
 refs/iterator.c                     |  26 +++--
 refs/packed-backend.c               |  17 ++--
 refs/ref-cache.c                    |  99 ++++++++++++++----
 refs/ref-cache.h                    |   7 --
 refs/refs-internal.h                | 152 ++--------------------------
 refs/reftable-backend.c             |  21 ++--
 t/t6302-for-each-ref-filter.sh      | 194 ++++++++++++++++++++++++++++++++++++
 15 files changed, 583 insertions(+), 243 deletions(-)
Karthik Nayak (5):
      refs: expose `ref_iterator` via 'refs.h'
      ref-cache: remove unused function 'find_ref_entry()'
      refs: selectively set prefix in the seek functions
      ref-filter: remove unnecessary else clause
      for-each-ref: introduce a '--start-after' option
Range-diff versus v4:
1:  dde167f421 = 1:  f9c9a7fdd9 refs: expose `ref_iterator` via 'refs.h'
2:  e392e93520 = 2:  83bee35517 ref-cache: remove unused function 'find_ref_entry()'
3:  711ffcac00 ! 3:  3b6019a1e7 refs: selectively set prefix in the seek functions
    @@ refs/refs-internal.h: void base_ref_iterator_init(struct ref_iterator *iter,
      /*
     - * Seek the iterator to the first reference matching the given prefix. Should
     - * behave the same as if a new iterator was created with the same prefix.
    -+ * Seek the iterator to the first matching reference. If set_prefix is set,
    -+ * it would behave the same as if a new iterator was created with the same
    -+ * prefix.
    ++ * Seek the iterator to the first matching reference. If the
    ++ * REF_ITERATOR_SEEK_SET_PREFIX flag is set, it would behave the same as if a
    ++ * new iterator was created with the provided refname as prefix.
       */
      typedef int ref_iterator_seek_fn(struct ref_iterator *ref_iterator,
     -				 const char *prefix);
-:  ---------- > 4:  3f89eeef26 ref-filter: remove unnecessary else clause
4:  3a0c89acbe ! 5:  7ee7d83cf0 for-each-ref: introduce a '--start-after' option
    @@ ref-filter.c: static int filter_exclude_match(struct ref_filter *filter, const c
     +	strbuf_release(&sb);
     +	return ret;
     +}
    ++
    ++static int for_each_fullref_with_seek(struct ref_filter *filter, each_ref_fn cb,
    ++				       void *cb_data, unsigned int flags)
    ++{
    ++	struct ref_iterator *iter;
    ++	int ret = 0;
    ++
    ++	iter = refs_ref_iterator_begin(get_main_ref_store(the_repository), "",
    ++				       NULL, 0, flags);
    ++	if (filter->start_after)
    ++		ret = start_ref_iterator_after(iter, filter->start_after);
    ++
    ++	if (ret)
    ++		return ret;
    ++
    ++	return do_for_each_ref_iterator(iter, cb, cb_data);
    ++}
     +
      /*
       * This is the same as for_each_fullref_in(), but it tries to iterate
       * only over the patterns we'll care about. Note that it _doesn't_ do a full
     @@ ref-filter.c: static int for_each_fullref_in_pattern(struct ref_filter *filter,
    - 				       each_ref_fn cb,
    - 				       void *cb_data)
      {
    -+	struct ref_iterator *iter;
    -+	int flags = 0, ret = 0;
    -+
      	if (filter->kind & FILTER_REFS_ROOT_REFS) {
      		/* In this case, we want to print all refs including root refs. */
     -		return refs_for_each_include_root_refs(get_main_ref_store(the_repository),
     -						       cb, cb_data);
    -+		flags |= DO_FOR_EACH_INCLUDE_ROOT_REFS;
    -+		goto non_prefix_iter;
    ++		return for_each_fullref_with_seek(filter, cb, cb_data,
    ++						  DO_FOR_EACH_INCLUDE_ROOT_REFS);
      	}
      
      	if (!filter->match_as_path) {
    @@ ref-filter.c: static int for_each_fullref_in_pattern(struct ref_filter *filter,
      		 */
     -		return refs_for_each_fullref_in(get_main_ref_store(the_repository),
     -						"", NULL, cb, cb_data);
    -+		goto non_prefix_iter;
    ++		return for_each_fullref_with_seek(filter, cb, cb_data, 0);
      	}
      
      	if (filter->ignore_case) {
    @@ ref-filter.c: static int for_each_fullref_in_pattern(struct ref_filter *filter,
      		 */
     -		return refs_for_each_fullref_in(get_main_ref_store(the_repository),
     -						"", NULL, cb, cb_data);
    -+		goto non_prefix_iter;
    ++		return for_each_fullref_with_seek(filter, cb, cb_data, 0);
      	}
      
      	if (!filter->name_patterns[0]) {
      		/* no patterns; we have to look at everything */
     -		return refs_for_each_fullref_in(get_main_ref_store(the_repository),
     -						 "", filter->exclude.v, cb, cb_data);
    -+		goto non_prefix_iter;
    ++		return for_each_fullref_with_seek(filter, cb, cb_data, 0);
      	}
      
      	return refs_for_each_fullref_in_prefixes(get_main_ref_store(the_repository),
    - 						 NULL, filter->name_patterns,
    - 						 filter->exclude.v,
    - 						 cb, cb_data);
    -+
    -+non_prefix_iter:
    -+	iter = refs_ref_iterator_begin(get_main_ref_store(the_repository), "",
    -+				       NULL, 0, flags);
    -+	if (filter->start_after)
    -+		ret = start_ref_iterator_after(iter, filter->start_after);
    -+
    -+	if (ret)
    -+		return ret;
    -+
    -+	return do_for_each_ref_iterator(iter, cb, cb_data);
    - }
    +@@ ref-filter.c: void filter_is_base(struct repository *r,
      
    - /*
    -@@ ref-filter.c: static int do_filter_refs(struct ref_filter *filter, unsigned int type, each_ref
    - 	init_contains_cache(&filter->internal.no_contains_cache);
    + static int do_filter_refs(struct ref_filter *filter, unsigned int type, each_ref_fn fn, void *cb_data)
    + {
    ++	const char *prefix = NULL;
    + 	int ret = 0;
      
    - 	/*  Simple per-ref filtering */
    --	if (!filter->kind)
    -+	if (!filter->kind) {
    - 		die("filter_refs: invalid type");
    --	else {
    -+	} else {
    -+		const char *prefix = NULL;
    -+
    - 		/*
    - 		 * For common cases where we need only branches or remotes or tags,
    - 		 * we only iterate through those refs. If a mix of refs is needed,
    + 	filter->kind = type & FILTER_REFS_KIND_MASK;
     @@ ref-filter.c: static int do_filter_refs(struct ref_filter *filter, unsigned int type, each_ref
    - 		 * of filter_ref_kind().
    - 		 */
    - 		if (filter->kind == FILTER_REFS_BRANCHES)
    --			ret = refs_for_each_fullref_in(get_main_ref_store(the_repository),
    --						       "refs/heads/", NULL,
    --						       fn, cb_data);
    -+			prefix = "refs/heads/";
    - 		else if (filter->kind == FILTER_REFS_REMOTES)
    --			ret = refs_for_each_fullref_in(get_main_ref_store(the_repository),
    --						       "refs/remotes/", NULL,
    --						       fn, cb_data);
    -+			prefix = "refs/remotes/";
    - 		else if (filter->kind == FILTER_REFS_TAGS)
    --			ret = refs_for_each_fullref_in(get_main_ref_store(the_repository),
    --						       "refs/tags/", NULL, fn,
    --						       cb_data);
    --		else if (filter->kind & FILTER_REFS_REGULAR)
    -+			prefix = "refs/tags/";
    + 	 * of filter_ref_kind().
    + 	 */
    + 	if (filter->kind == FILTER_REFS_BRANCHES)
    +-		ret = refs_for_each_fullref_in(get_main_ref_store(the_repository),
    +-					       "refs/heads/", NULL,
    +-					       fn, cb_data);
    ++		prefix = "refs/heads/";
    + 	else if (filter->kind == FILTER_REFS_REMOTES)
    +-		ret = refs_for_each_fullref_in(get_main_ref_store(the_repository),
    +-					       "refs/remotes/", NULL,
    +-					       fn, cb_data);
    ++		prefix = "refs/remotes/";
    + 	else if (filter->kind == FILTER_REFS_TAGS)
    +-		ret = refs_for_each_fullref_in(get_main_ref_store(the_repository),
    +-					       "refs/tags/", NULL, fn,
    +-					       cb_data);
    +-	else if (filter->kind & FILTER_REFS_REGULAR)
    ++		prefix = "refs/tags/";
     +
    -+		if (prefix) {
    -+			struct ref_iterator *iter;
    ++	if (prefix) {
    ++		struct ref_iterator *iter;
     +
    -+			iter = refs_ref_iterator_begin(get_main_ref_store(the_repository),
    -+						       "", NULL, 0, 0);
    ++		iter = refs_ref_iterator_begin(get_main_ref_store(the_repository),
    ++					       "", NULL, 0, 0);
     +
    -+			if (filter->start_after)
    -+				ret = start_ref_iterator_after(iter, filter->start_after);
    -+			else if (prefix)
    -+				ret = ref_iterator_seek(iter, prefix, 1);
    ++		if (filter->start_after)
    ++			ret = start_ref_iterator_after(iter, filter->start_after);
    ++		else if (prefix)
    ++			ret = ref_iterator_seek(iter, prefix, 1);
     +
    -+			if (!ret)
    -+				ret = do_for_each_ref_iterator(iter, fn, cb_data);
    -+		} else if (filter->kind & FILTER_REFS_REGULAR) {
    - 			ret = for_each_fullref_in_pattern(filter, fn, cb_data);
    -+		}
    ++		if (!ret)
    ++			ret = do_for_each_ref_iterator(iter, fn, cb_data);
    ++	} else if (filter->kind & FILTER_REFS_REGULAR) {
    + 		ret = for_each_fullref_in_pattern(filter, fn, cb_data);
    ++	}
      
    - 		/*
    - 		 * When printing all ref types, HEAD is already included,
    + 	/*
    + 	 * When printing all ref types, HEAD is already included,
     
      ## ref-filter.h ##
     @@ ref-filter.h: struct ref_array {
    @@ t/t6302-for-each-ref-filter.sh: test_expect_success 'validate worktree atom' '
     +	refs/tags/three
     +	refs/tags/two
     +	EOF
    -+	git for-each-ref --format="%(refname)" --start-after=refs/lost >actual &&
    ++	git for-each-ref --format="%(refname)" --start-after=refs/odd/ >actual &&
     +	test_cmp expect actual
     +'
     +
    @@ t/t6302-for-each-ref-filter.sh: test_expect_success 'validate worktree atom' '
     +	refs/tags/three
     +	refs/tags/two
     +	EOF
    -+	git for-each-ref --format="%(refname)" --start-after=refs/odd/ >actual &&
    ++	git for-each-ref --format="%(refname)" --start-after=refs/lost >actual &&
     +	test_cmp expect actual
     +'
     +
base-commit: cf6f63ea6bf35173e02e18bdc6a4ba41288acff9
change-id: 20250605-306-git-for-each-ref-pagination-0ba8a29ae646
Thanks
- Karthik
^ permalink raw reply	[flat|nested] 102+ messages in thread
* [PATCH v5 1/5] refs: expose `ref_iterator` via 'refs.h'
  2025-07-15 11:28 ` [PATCH v5 0/5] " Karthik Nayak
@ 2025-07-15 11:28   ` Karthik Nayak
  2025-07-15 11:28   ` [PATCH v5 2/5] ref-cache: remove unused function 'find_ref_entry()' Karthik Nayak
                     ` (5 subsequent siblings)
  6 siblings, 0 replies; 102+ messages in thread
From: Karthik Nayak @ 2025-07-15 11:28 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak, gitster, ps, schwab, phillip.wood123,
	Christian Couder
The `ref_iterator` is an internal structure to the 'refs/'
sub-directory, which allows iteration over refs. All reference iteration
is built on top of these iterators.
External clients of the 'refs' subsystem use the various
'refs_for_each...()' functions to iterate over refs. However since these
are wrapper functions, each combination of functionality requires a new
wrapper function. This is not feasible as the functions pile up with the
increase in requirements. Expose the internal reference iterator, so
advanced users can mix and match options as needed.
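The trade-off described above can be illustrated with a small, self-contained
sketch. It deliberately does not use Git's real API: `toy_iter`,
`toy_iter_begin()`, `toy_iter_advance()`, `toy_count()` and the `TOY_*`
constants are hypothetical stand-ins that walk a sorted in-memory array. The
only point is that a single iterator with a prefix and a flags argument can
replace a family of single-purpose wrappers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for ITER_OK / ITER_DONE; not Git's definitions. */
enum { TOY_ITER_OK = 0, TOY_ITER_DONE = -1 };

/* Toy flag, loosely modelled on DO_FOR_EACH_INCLUDE_ROOT_REFS. */
#define TOY_INCLUDE_ROOT_REFS (1u << 0)

/* A sorted, in-memory "ref store" used only for this sketch. */
static const char *toy_refs[] = {
	"HEAD",
	"refs/heads/main",
	"refs/heads/topic",
	"refs/tags/v1.0",
};
#define TOY_NR (sizeof(toy_refs) / sizeof(toy_refs[0]))

struct toy_iter {
	size_t pos;          /* index of the next candidate entry */
	const char *prefix;  /* only yield names starting with this */
	unsigned int flags;  /* TOY_* flags */
	const char *refname; /* current entry, valid after TOY_ITER_OK */
};

static void toy_iter_begin(struct toy_iter *it, const char *prefix,
			   unsigned int flags)
{
	assert(it && prefix);
	it->pos = 0;
	it->prefix = prefix;
	it->flags = flags;
	it->refname = NULL;
}

/* Move to the next matching entry, or report that none is left. */
static int toy_iter_advance(struct toy_iter *it)
{
	while (it->pos < TOY_NR) {
		const char *name = toy_refs[it->pos++];
		int is_root = strncmp(name, "refs/", 5) != 0;

		if (is_root && !(it->flags & TOY_INCLUDE_ROOT_REFS))
			continue;
		if (strncmp(name, it->prefix, strlen(it->prefix)))
			continue;
		it->refname = name;
		return TOY_ITER_OK;
	}
	return TOY_ITER_DONE;
}

/* Callers combine prefix and flags freely; no new wrapper is needed. */
static size_t toy_count(const char *prefix, unsigned int flags)
{
	struct toy_iter it;
	size_t n = 0;

	toy_iter_begin(&it, prefix, flags);
	while (toy_iter_advance(&it) == TOY_ITER_OK)
		n++;
	return n;
}
```

In this toy model, `toy_count("refs/heads/", 0)` and
`toy_count("", TOY_INCLUDE_ROOT_REFS)` exercise two different combinations
through the same entry point, which is the kind of mixing and matching the
wrapper functions cannot offer without growing in number.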
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
 refs.h               | 147 +++++++++++++++++++++++++++++++++++++++++++++++++++
 refs/refs-internal.h | 145 +-------------------------------------------------
 2 files changed, 149 insertions(+), 143 deletions(-)
diff --git a/refs.h b/refs.h
index 46a6008e07..7c21aaef3d 100644
--- a/refs.h
+++ b/refs.h
@@ -1190,4 +1190,151 @@ int repo_migrate_ref_storage_format(struct repository *repo,
 				    unsigned int flags,
 				    struct strbuf *err);
 
+/*
+ * Reference iterators
+ *
+ * A reference iterator encapsulates the state of an in-progress
+ * iteration over references. Create an instance of `struct
+ * ref_iterator` via one of the functions in this module.
+ *
+ * A freshly-created ref_iterator doesn't yet point at a reference. To
+ * advance the iterator, call ref_iterator_advance(). If successful,
+ * this sets the iterator's refname, oid, and flags fields to describe
+ * the next reference and returns ITER_OK. The data pointed at by
+ * refname and oid belong to the iterator; if you want to retain them
+ * after calling ref_iterator_advance() again or calling
+ * ref_iterator_free(), you must make a copy. When the iteration has
+ * been exhausted, ref_iterator_advance() releases any resources
+ * associated with the iteration, frees the ref_iterator object, and
+ * returns ITER_DONE. If you want to abort the iteration early, call
+ * ref_iterator_free(), which also frees the ref_iterator object and
+ * any associated resources. If there was an internal error advancing
+ * to the next entry, ref_iterator_advance() aborts the iteration,
+ * frees the ref_iterator, and returns ITER_ERROR.
+ *
+ * The reference currently being looked at can be peeled by calling
+ * ref_iterator_peel(). This function is often faster than peel_ref(),
+ * so it should be preferred when iterating over references.
+ *
+ * Putting it all together, a typical iteration looks like this:
+ *
+ *     int ok;
+ *     struct ref_iterator *iter = ...;
+ *
+ *     while ((ok = ref_iterator_advance(iter)) == ITER_OK) {
+ *             if (want_to_stop_iteration()) {
+ *                     ok = ITER_DONE;
+ *                     break;
+ *             }
+ *
+ *             // Access information about the current reference:
+ *             if (!(iter->flags & REF_ISSYMREF))
+ *                     printf("%s is %s\n", iter->refname, oid_to_hex(iter->oid));
+ *
+ *             // If you need to peel the reference:
+ *             ref_iterator_peel(iter, &oid);
+ *     }
+ *
+ *     if (ok != ITER_DONE)
+ *             handle_error();
+ *     ref_iterator_free(iter);
+ */
+struct ref_iterator;
+
+/*
+ * These flags are passed to refs_ref_iterator_begin() (and do_for_each_ref(),
+ * which feeds it).
+ */
+enum do_for_each_ref_flags {
+	/*
+	 * Include broken references in a do_for_each_ref*() iteration, which
+	 * would normally be omitted. This includes both refs that point to
+	 * missing objects (a true repository corruption), ones with illegal
+	 * names (which we prefer not to expose to callers), as well as
+	 * dangling symbolic refs (i.e., those that point to a non-existent
+	 * ref; this is not a corruption, but as they have no valid oid, we
+	 * omit them from normal iteration results).
+	 */
+	DO_FOR_EACH_INCLUDE_BROKEN = (1 << 0),
+
+	/*
+	 * Only include per-worktree refs in a do_for_each_ref*() iteration.
+	 * Normally this will be used with a files ref_store, since that's
+	 * where all reference backends will presumably store their
+	 * per-worktree refs.
+	 */
+	DO_FOR_EACH_PER_WORKTREE_ONLY = (1 << 1),
+
+	/*
+	 * Omit dangling symrefs from output; this only has an effect with
+	 * INCLUDE_BROKEN, since they are otherwise not included at all.
+	 */
+	DO_FOR_EACH_OMIT_DANGLING_SYMREFS = (1 << 2),
+
+	/*
+	 * Include root refs i.e. HEAD and pseudorefs along with the regular
+	 * refs.
+	 */
+	DO_FOR_EACH_INCLUDE_ROOT_REFS = (1 << 3),
+};
+
+/*
+ * Return an iterator that goes over each reference in `refs` for
+ * which the refname begins with prefix. If trim is non-zero, then
+ * trim that many characters off the beginning of each refname.
+ * The output is ordered by refname.
+ */
+struct ref_iterator *refs_ref_iterator_begin(
+	struct ref_store *refs,
+	const char *prefix, const char **exclude_patterns,
+	int trim, enum do_for_each_ref_flags flags);
+
+/*
+ * Advance the iterator to the first or next item and return ITER_OK.
+ * If the iteration is exhausted, free the resources associated with
+ * the ref_iterator and return ITER_DONE. On errors, free the iterator
+ * resources and return ITER_ERROR. It is a bug to use ref_iterator or
+ * call this function again after it has returned ITER_DONE or
+ * ITER_ERROR.
+ */
+int ref_iterator_advance(struct ref_iterator *ref_iterator);
+
+/*
+ * Seek the iterator to the first reference with the given prefix.
+ * The prefix is matched as a literal string, without regard for path
+ * separators. If prefix is NULL or the empty string, seek the iterator to the
+ * first reference again.
+ *
+ * This function is expected to behave as if a new ref iterator with the same
+ * prefix had been created, but allows reuse of iterators and thus may allow
+ * the backend to optimize. Parameters other than the prefix that have been
+ * passed when creating the iterator will remain unchanged.
+ *
+ * Returns 0 on success, a negative error code otherwise.
+ */
+int ref_iterator_seek(struct ref_iterator *ref_iterator,
+		      const char *prefix);
+
+/*
+ * If possible, peel the reference currently being viewed by the
+ * iterator. Return 0 on success.
+ */
+int ref_iterator_peel(struct ref_iterator *ref_iterator,
+		      struct object_id *peeled);
+
+/* Free the reference iterator and any associated resources. */
+void ref_iterator_free(struct ref_iterator *ref_iterator);
+
+/*
+ * The common backend for the for_each_*ref* functions. Call fn for
+ * each reference in iter. If the iterator itself ever returns
+ * ITER_ERROR, return -1. If fn ever returns a non-zero value, stop
+ * the iteration and return that value. Otherwise, return 0. In any
+ * case, free the iterator when done. This function is basically an
+ * adapter between the callback style of reference iteration and the
+ * iterator style.
+ */
+int do_for_each_ref_iterator(struct ref_iterator *iter,
+			     each_ref_fn fn, void *cb_data);
+
 #endif /* REFS_H */
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index f868870851..03f5df04d5 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -244,90 +244,8 @@ const char *find_descendant_ref(const char *dirname,
 #define SYMREF_MAXDEPTH 5
 
 /*
- * These flags are passed to refs_ref_iterator_begin() (and do_for_each_ref(),
- * which feeds it).
- */
-enum do_for_each_ref_flags {
-	/*
-	 * Include broken references in a do_for_each_ref*() iteration, which
-	 * would normally be omitted. This includes both refs that point to
-	 * missing objects (a true repository corruption), ones with illegal
-	 * names (which we prefer not to expose to callers), as well as
-	 * dangling symbolic refs (i.e., those that point to a non-existent
-	 * ref; this is not a corruption, but as they have no valid oid, we
-	 * omit them from normal iteration results).
-	 */
-	DO_FOR_EACH_INCLUDE_BROKEN = (1 << 0),
-
-	/*
-	 * Only include per-worktree refs in a do_for_each_ref*() iteration.
-	 * Normally this will be used with a files ref_store, since that's
-	 * where all reference backends will presumably store their
-	 * per-worktree refs.
-	 */
-	DO_FOR_EACH_PER_WORKTREE_ONLY = (1 << 1),
-
-	/*
-	 * Omit dangling symrefs from output; this only has an effect with
-	 * INCLUDE_BROKEN, since they are otherwise not included at all.
-	 */
-	DO_FOR_EACH_OMIT_DANGLING_SYMREFS = (1 << 2),
-
-	/*
-	 * Include root refs i.e. HEAD and pseudorefs along with the regular
-	 * refs.
-	 */
-	DO_FOR_EACH_INCLUDE_ROOT_REFS = (1 << 3),
-};
-
-/*
- * Reference iterators
- *
- * A reference iterator encapsulates the state of an in-progress
- * iteration over references. Create an instance of `struct
- * ref_iterator` via one of the functions in this module.
- *
- * A freshly-created ref_iterator doesn't yet point at a reference. To
- * advance the iterator, call ref_iterator_advance(). If successful,
- * this sets the iterator's refname, oid, and flags fields to describe
- * the next reference and returns ITER_OK. The data pointed at by
- * refname and oid belong to the iterator; if you want to retain them
- * after calling ref_iterator_advance() again or calling
- * ref_iterator_free(), you must make a copy. When the iteration has
- * been exhausted, ref_iterator_advance() releases any resources
- * associated with the iteration, frees the ref_iterator object, and
- * returns ITER_DONE. If you want to abort the iteration early, call
- * ref_iterator_free(), which also frees the ref_iterator object and
- * any associated resources. If there was an internal error advancing
- * to the next entry, ref_iterator_advance() aborts the iteration,
- * frees the ref_iterator, and returns ITER_ERROR.
- *
- * The reference currently being looked at can be peeled by calling
- * ref_iterator_peel(). This function is often faster than peel_ref(),
- * so it should be preferred when iterating over references.
- *
- * Putting it all together, a typical iteration looks like this:
- *
- *     int ok;
- *     struct ref_iterator *iter = ...;
- *
- *     while ((ok = ref_iterator_advance(iter)) == ITER_OK) {
- *             if (want_to_stop_iteration()) {
- *                     ok = ITER_DONE;
- *                     break;
- *             }
- *
- *             // Access information about the current reference:
- *             if (!(iter->flags & REF_ISSYMREF))
- *                     printf("%s is %s\n", iter->refname, oid_to_hex(iter->oid));
- *
- *             // If you need to peel the reference:
- *             ref_iterator_peel(iter, &oid);
- *     }
- *
- *     if (ok != ITER_DONE)
- *             handle_error();
- *     ref_iterator_free(iter);
+ * Data structure for holding a reference iterator. See refs.h for
+ * more details and usage instructions.
  */
 struct ref_iterator {
 	struct ref_iterator_vtable *vtable;
@@ -337,42 +255,6 @@ struct ref_iterator {
 	unsigned int flags;
 };
 
-/*
- * Advance the iterator to the first or next item and return ITER_OK.
- * If the iteration is exhausted, free the resources associated with
- * the ref_iterator and return ITER_DONE. On errors, free the iterator
- * resources and return ITER_ERROR. It is a bug to use ref_iterator or
- * call this function again after it has returned ITER_DONE or
- * ITER_ERROR.
- */
-int ref_iterator_advance(struct ref_iterator *ref_iterator);
-
-/*
- * Seek the iterator to the first reference with the given prefix.
- * The prefix is matched as a literal string, without regard for path
- * separators. If prefix is NULL or the empty string, seek the iterator to the
- * first reference again.
- *
- * This function is expected to behave as if a new ref iterator with the same
- * prefix had been created, but allows reuse of iterators and thus may allow
- * the backend to optimize. Parameters other than the prefix that have been
- * passed when creating the iterator will remain unchanged.
- *
- * Returns 0 on success, a negative error code otherwise.
- */
-int ref_iterator_seek(struct ref_iterator *ref_iterator,
-		      const char *prefix);
-
-/*
- * If possible, peel the reference currently being viewed by the
- * iterator. Return 0 on success.
- */
-int ref_iterator_peel(struct ref_iterator *ref_iterator,
-		      struct object_id *peeled);
-
-/* Free the reference iterator and any associated resources. */
-void ref_iterator_free(struct ref_iterator *ref_iterator);
-
 /*
  * An iterator over nothing (its first ref_iterator_advance() call
  * returns ITER_DONE).
@@ -384,17 +266,6 @@ struct ref_iterator *empty_ref_iterator_begin(void);
  */
 int is_empty_ref_iterator(struct ref_iterator *ref_iterator);
 
-/*
- * Return an iterator that goes over each reference in `refs` for
- * which the refname begins with prefix. If trim is non-zero, then
- * trim that many characters off the beginning of each refname.
- * The output is ordered by refname.
- */
-struct ref_iterator *refs_ref_iterator_begin(
-		struct ref_store *refs,
-		const char *prefix, const char **exclude_patterns,
-		int trim, enum do_for_each_ref_flags flags);
-
 /*
  * A callback function used to instruct merge_ref_iterator how to
  * interleave the entries from iter0 and iter1. The function should
@@ -520,18 +391,6 @@ struct ref_iterator_vtable {
  */
 extern struct ref_iterator *current_ref_iter;
 
-/*
- * The common backend for the for_each_*ref* functions. Call fn for
- * each reference in iter. If the iterator itself ever returns
- * ITER_ERROR, return -1. If fn ever returns a non-zero value, stop
- * the iteration and return that value. Otherwise, return 0. In any
- * case, free the iterator when done. This function is basically an
- * adapter between the callback style of reference iteration and the
- * iterator style.
- */
-int do_for_each_ref_iterator(struct ref_iterator *iter,
-			     each_ref_fn fn, void *cb_data);
-
 struct ref_store;
 
 /* refs backends */
-- 
2.49.0
^ permalink raw reply related	[flat|nested] 102+ messages in thread
* [PATCH v5 2/5] ref-cache: remove unused function 'find_ref_entry()'
  2025-07-15 11:28 ` [PATCH v5 0/5] " Karthik Nayak
  2025-07-15 11:28   ` [PATCH v5 1/5] refs: expose `ref_iterator` via 'refs.h' Karthik Nayak
@ 2025-07-15 11:28   ` Karthik Nayak
  2025-07-17 14:48     ` Junio C Hamano
  2025-07-15 11:28   ` [PATCH v5 3/5] refs: selectively set prefix in the seek functions Karthik Nayak
                     ` (4 subsequent siblings)
  6 siblings, 1 reply; 102+ messages in thread
From: Karthik Nayak @ 2025-07-15 11:28 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak, gitster, ps, schwab, phillip.wood123,
	Christian Couder
The 'find_ref_entry' function is no longer used, so remove it.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
 refs/ref-cache.c | 14 --------------
 refs/ref-cache.h |  7 -------
 2 files changed, 21 deletions(-)
diff --git a/refs/ref-cache.c b/refs/ref-cache.c
index c1f1bab1d5..8aaffa8c6b 100644
--- a/refs/ref-cache.c
+++ b/refs/ref-cache.c
@@ -194,20 +194,6 @@ static struct ref_dir *find_containing_dir(struct ref_dir *dir,
 	return dir;
 }
 
-struct ref_entry *find_ref_entry(struct ref_dir *dir, const char *refname)
-{
-	int entry_index;
-	struct ref_entry *entry;
-	dir = find_containing_dir(dir, refname);
-	if (!dir)
-		return NULL;
-	entry_index = search_ref_dir(dir, refname, strlen(refname));
-	if (entry_index == -1)
-		return NULL;
-	entry = dir->entries[entry_index];
-	return (entry->flag & REF_DIR) ? NULL : entry;
-}
-
 /*
  * Emit a warning and return true iff ref1 and ref2 have the same name
  * and the same oid. Die if they have the same name but different
diff --git a/refs/ref-cache.h b/refs/ref-cache.h
index 5f04e518c3..f635d2d824 100644
--- a/refs/ref-cache.h
+++ b/refs/ref-cache.h
@@ -201,13 +201,6 @@ void free_ref_cache(struct ref_cache *cache);
  */
 void add_entry_to_dir(struct ref_dir *dir, struct ref_entry *entry);
 
-/*
- * Find the value entry with the given name in dir, sorting ref_dirs
- * and recursing into subdirectories as necessary.  If the name is not
- * found or it corresponds to a directory entry, return NULL.
- */
-struct ref_entry *find_ref_entry(struct ref_dir *dir, const char *refname);
-
 /*
  * Start iterating over references in `cache`. If `prefix` is
  * specified, only include references whose names start with that
-- 
2.49.0
^ permalink raw reply related	[flat|nested] 102+ messages in thread
* Re: [PATCH v5 2/5] ref-cache: remove unused function 'find_ref_entry()'
  2025-07-15 11:28   ` [PATCH v5 2/5] ref-cache: remove unused function 'find_ref_entry()' Karthik Nayak
@ 2025-07-17 14:48     ` Junio C Hamano
  2025-07-17 19:31       ` Karthik Nayak
  0 siblings, 1 reply; 102+ messages in thread
From: Junio C Hamano @ 2025-07-17 14:48 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: git, ps, schwab, phillip.wood123, Christian Couder
Karthik Nayak <karthik.188@gmail.com> writes:
> The 'find_ref_entry' function is no longer used, so remove it.
If my spelunking is correct, ba1c052f (ref_store: implement
`refs_peel_ref()` generically, 2017-09-25) is the commit that
removed the last caller of it.  Which may be worth noting here.
> Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
> ---
>  refs/ref-cache.c | 14 --------------
>  refs/ref-cache.h |  7 -------
>  2 files changed, 21 deletions(-)
>
> diff --git a/refs/ref-cache.c b/refs/ref-cache.c
> index c1f1bab1d5..8aaffa8c6b 100644
> --- a/refs/ref-cache.c
> +++ b/refs/ref-cache.c
> @@ -194,20 +194,6 @@ static struct ref_dir *find_containing_dir(struct ref_dir *dir,
>  	return dir;
>  }
>  
> -struct ref_entry *find_ref_entry(struct ref_dir *dir, const char *refname)
> -{
> -	int entry_index;
> -	struct ref_entry *entry;
> -	dir = find_containing_dir(dir, refname);
> -	if (!dir)
> -		return NULL;
> -	entry_index = search_ref_dir(dir, refname, strlen(refname));
> -	if (entry_index == -1)
> -		return NULL;
> -	entry = dir->entries[entry_index];
> -	return (entry->flag & REF_DIR) ? NULL : entry;
> -}
> -
>  /*
>   * Emit a warning and return true iff ref1 and ref2 have the same name
>   * and the same oid. Die if they have the same name but different
> diff --git a/refs/ref-cache.h b/refs/ref-cache.h
> index 5f04e518c3..f635d2d824 100644
> --- a/refs/ref-cache.h
> +++ b/refs/ref-cache.h
> @@ -201,13 +201,6 @@ void free_ref_cache(struct ref_cache *cache);
>   */
>  void add_entry_to_dir(struct ref_dir *dir, struct ref_entry *entry);
>  
> -/*
> - * Find the value entry with the given name in dir, sorting ref_dirs
> - * and recursing into subdirectories as necessary.  If the name is not
> - * found or it corresponds to a directory entry, return NULL.
> - */
> -struct ref_entry *find_ref_entry(struct ref_dir *dir, const char *refname);
> -
>  /*
>   * Start iterating over references in `cache`. If `prefix` is
>   * specified, only include references whose names start with that
^ permalink raw reply	[flat|nested] 102+ messages in thread
* Re: [PATCH v5 2/5] ref-cache: remove unused function 'find_ref_entry()'
  2025-07-17 14:48     ` Junio C Hamano
@ 2025-07-17 19:31       ` Karthik Nayak
  2025-07-17 20:32         ` Junio C Hamano
  0 siblings, 1 reply; 102+ messages in thread
From: Karthik Nayak @ 2025-07-17 19:31 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, ps, schwab, phillip.wood123, Christian Couder
Junio C Hamano <gitster@pobox.com> writes:
> Karthik Nayak <karthik.188@gmail.com> writes:
>
>> The 'find_ref_entry' function is no longer used, so remove it.
>
> If my spelunking is correct, ba1c052f (ref_store: implement
> `refs_peel_ref()` generically, 2017-09-25) is the commit that
> removed the last caller of it.  Which may be worth noting here.
>
Indeed, I could also verify this by running
$ git log -S find_ref_entry --oneline
2c90b85801 ref-cache: remove unused function 'find_ref_entry()'
ba1c052fa6 ref_store: implement `refs_peel_ref()` generically
9939b33d6a packed-backend: rip out some now-unused code
....
And looking at `ba1c052fa6`. I should've done this before. But thanks
for the digging!
I plan to address a few comments on this version, but I also see that
you've merged it to master. Should I raise followups for noteworthy
changes or a new version?
Karthik
^ permalink raw reply	[flat|nested] 102+ messages in thread
* Re: [PATCH v5 2/5] ref-cache: remove unused function 'find_ref_entry()'
  2025-07-17 19:31       ` Karthik Nayak
@ 2025-07-17 20:32         ` Junio C Hamano
  0 siblings, 0 replies; 102+ messages in thread
From: Junio C Hamano @ 2025-07-17 20:32 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: git, ps, schwab, phillip.wood123, Christian Couder
Karthik Nayak <karthik.188@gmail.com> writes:
> Junio C Hamano <gitster@pobox.com> writes:
>
>> Karthik Nayak <karthik.188@gmail.com> writes:
>>
>>> The 'find_ref_entry' function is no longer used, so remove it.
>>
>> If my spelunking is correct, ba1c052f (ref_store: implement
>> `refs_peel_ref()` generically, 2017-09-25) is the commit that
>> removed the last caller of it.  Which may be worth noting here.
>>
>
> Indeed, I could also verify this by running
>
> $ git log -S find_ref_entry --oneline
> 2c90b85801 ref-cache: remove unused function 'find_ref_entry()'
> ba1c052fa6 ref_store: implement `refs_peel_ref()` generically
> 9939b33d6a packed-backend: rip out some now-unused code
> ....
>
> And looking at `ba1c052fa6`. I should've done this before. But thanks
> for the digging!
>
> I plan to address a few comments on this version, but I also see that
> you've merged it to master. Should I raise followups for noteworthy
> changes or a new version?
You mean it is now in 'next'?  Yes, please give incremental patches.
Finding some more gotchas even after v5 is a sign that they are
tricky enough to deserve a separate explanation from the main part
of the series.  And I think the msan one Kyle and Peff found is also
tricky enough that it evaded reviewers' eyes.
Thanks.
^ permalink raw reply	[flat|nested] 102+ messages in thread
* [PATCH v5 3/5] refs: selectively set prefix in the seek functions
  2025-07-15 11:28 ` [PATCH v5 0/5] " Karthik Nayak
  2025-07-15 11:28   ` [PATCH v5 1/5] refs: expose `ref_iterator` via 'refs.h' Karthik Nayak
  2025-07-15 11:28   ` [PATCH v5 2/5] ref-cache: remove unused function 'find_ref_entry()' Karthik Nayak
@ 2025-07-15 11:28   ` Karthik Nayak
  2025-07-17  2:09     ` Jeff King
  2025-07-15 11:28   ` [PATCH v5 4/5] ref-filter: remove unnecessary else clause Karthik Nayak
                     ` (3 subsequent siblings)
  6 siblings, 1 reply; 102+ messages in thread
From: Karthik Nayak @ 2025-07-15 11:28 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak, gitster, ps, schwab, phillip.wood123,
	Christian Couder
The ref iterator exposes a `ref_iterator_seek()` function. The name
suggests that this would seek the iterator to a specific reference in
some ways similar to how `fseek()` works for the filesystem.
However, the function actually sets the prefix for refs iteration. So
further iteration would only yield references which match the particular
prefix. This is a bit confusing.
Let's add a 'flags' parameter to the function which, when the
'REF_ITERATOR_SEEK_SET_PREFIX' flag is set, sets the prefix for the
iteration, in line with the existing behavior. Otherwise, the reference
backends simply seek to the specified reference and clear any
previously set prefix. This allows users to start iteration from a
specific reference.
In the packed and reftable backends, since references are available in a
sorted list, the changes are simply setting the prefix if needed. The
changes on the files-backend are a little more involved, since the files
backend uses the 'ref-cache' mechanism. We move out the existing logic
within `cache_ref_iterator_seek()` to `cache_ref_iterator_set_prefix()`
which is called when the 'REF_ITERATOR_SEEK_SET_PREFIX' flag is set. We
then parse the provided seek string and set the required levels and
their indexes to ensure that seeking is possible.
Helped-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
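To make the two seek modes concrete, here is a minimal, self-contained
sketch of the semantics described in the commit message. It is a toy
model over a sorted array and not the Git implementation: `toy_iter`,
`toy_seek()`, `toy_advance()` and `TOY_SEEK_SET_PREFIX` are hypothetical
names used only for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical flag mirroring the role of REF_ITERATOR_SEEK_SET_PREFIX. */
enum toy_seek_flag { TOY_SEEK_SET_PREFIX = 1 << 0 };

struct toy_iter {
	const char **refs;  /* sorted reference names */
	size_t nr;          /* number of entries in refs */
	size_t pos;         /* index of the next entry to yield */
	const char *prefix; /* borrowed; NULL means "no prefix filter" */
};

/*
 * Position the iterator at the first entry that sorts at or after
 * `refname`. Any previously set prefix is cleared; it is set again only
 * when TOY_SEEK_SET_PREFIX is passed.
 */
static void toy_seek(struct toy_iter *it, const char *refname,
		     unsigned int flags)
{
	size_t i = 0;

	if (refname && *refname)
		while (i < it->nr && strcmp(it->refs[i], refname) < 0)
			i++;

	it->pos = i;
	it->prefix = NULL;
	if (flags & TOY_SEEK_SET_PREFIX)
		it->prefix = refname;
}

/* Yield the next name, or NULL once the list or the prefix is exhausted. */
static const char *toy_advance(struct toy_iter *it)
{
	const char *ref;

	if (it->pos >= it->nr)
		return NULL;

	ref = it->refs[it->pos];
	/* The list is sorted, so the first mismatch ends a prefix scan. */
	if (it->prefix && strncmp(ref, it->prefix, strlen(it->prefix)))
		return NULL;

	it->pos++;
	return ref;
}
```

With the flag set, iteration stops at the first name outside the prefix;
without it, iteration starts at the given name and runs to the end of the
list, regardless of any prefix that was set earlier.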
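 refs.c                  |  6 ++--
 refs.h                  | 26 +++++++++------
 refs/debug.c            |  7 ++--
 refs/files-backend.c    |  7 ++--
 refs/iterator.c         | 26 ++++++++-------
 refs/packed-backend.c   | 17 ++++++----
 refs/ref-cache.c        | 85 ++++++++++++++++++++++++++++++++++++++++++++++---
 refs/refs-internal.h    |  7 ++--
 refs/reftable-backend.c | 21 ++++++++----
 9 files changed, 152 insertions(+), 50 deletions(-)
diff --git a/refs.c b/refs.c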
index dce5c49ca2..243e6898b8 100644
--- a/refs.c
+++ b/refs.c
@@ -2666,12 +2666,12 @@ enum ref_transaction_error refs_verify_refnames_available(struct ref_store *refs
 		if (!initial_transaction) {
 			int ok;
 
-			if (!iter) {
+			if (!iter)
 				iter = refs_ref_iterator_begin(refs, dirname.buf, NULL, 0,
 							       DO_FOR_EACH_INCLUDE_BROKEN);
-			} else if (ref_iterator_seek(iter, dirname.buf) < 0) {
+			else if (ref_iterator_seek(iter, dirname.buf,
+						   REF_ITERATOR_SEEK_SET_PREFIX) < 0)
 				goto cleanup;
-			}
 
 			while ((ok = ref_iterator_advance(iter)) == ITER_OK) {
 				if (skip &&
diff --git a/refs.h b/refs.h
index 7c21aaef3d..e6780a8848 100644
--- a/refs.h
+++ b/refs.h
@@ -1299,21 +1299,29 @@ struct ref_iterator *refs_ref_iterator_begin(
  */
 int ref_iterator_advance(struct ref_iterator *ref_iterator);
 
+enum ref_iterator_seek_flag {
+	/*
+	 * When the REF_ITERATOR_SEEK_SET_PREFIX flag is set, the iterator's prefix is
+	 * updated to match the provided string, affecting all subsequent iterations. If
+	 * not, the iterator seeks to the specified reference and clears any previously
+	 * set prefix.
+	 */
+	REF_ITERATOR_SEEK_SET_PREFIX = (1 << 0),
+};
+
 /*
- * Seek the iterator to the first reference with the given prefix.
- * The prefix is matched as a literal string, without regard for path
- * separators. If prefix is NULL or the empty string, seek the iterator to the
+ * Seek the iterator to the first reference matching the given seek string.
+ * The seek string is matched as a literal string, without regard for path
+ * separators. If seek is NULL or the empty string, seek the iterator to the
  * first reference again.
  *
- * This function is expected to behave as if a new ref iterator with the same
- * prefix had been created, but allows reuse of iterators and thus may allow
- * the backend to optimize. Parameters other than the prefix that have been
- * passed when creating the iterator will remain unchanged.
+ * This function is expected to behave as if a new ref iterator has been
+ * created, but allows reuse of existing iterators for optimization.
  *
  * Returns 0 on success, a negative error code otherwise.
  */
-int ref_iterator_seek(struct ref_iterator *ref_iterator,
-		      const char *prefix);
+int ref_iterator_seek(struct ref_iterator *ref_iterator, const char *refname,
+		      unsigned int flags);
 
 /*
  * If possible, peel the reference currently being viewed by the
diff --git a/refs/debug.c b/refs/debug.c
index 485e3079d7..da300efaf3 100644
--- a/refs/debug.c
+++ b/refs/debug.c
@@ -170,12 +170,13 @@ static int debug_ref_iterator_advance(struct ref_iterator *ref_iterator)
 }
 
 static int debug_ref_iterator_seek(struct ref_iterator *ref_iterator,
-				   const char *prefix)
+				   const char *refname, unsigned int flags)
 {
 	struct debug_ref_iterator *diter =
 		(struct debug_ref_iterator *)ref_iterator;
-	int res = diter->iter->vtable->seek(diter->iter, prefix);
-	trace_printf_key(&trace_refs, "iterator_seek: %s: %d\n", prefix ? prefix : "", res);
+	int res = diter->iter->vtable->seek(diter->iter, refname, flags);
+	trace_printf_key(&trace_refs, "iterator_seek: %s flags: %d: %d\n",
+			 refname ? refname : "", flags, res);
 	return res;
 }
 
diff --git a/refs/files-backend.c b/refs/files-backend.c
index bf6f89b1d1..8b282f2a60 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -929,11 +929,11 @@ static int files_ref_iterator_advance(struct ref_iterator *ref_iterator)
 }
 
 static int files_ref_iterator_seek(struct ref_iterator *ref_iterator,
-				   const char *prefix)
+				   const char *refname, unsigned int flags)
 {
 	struct files_ref_iterator *iter =
 		(struct files_ref_iterator *)ref_iterator;
-	return ref_iterator_seek(iter->iter0, prefix);
+	return ref_iterator_seek(iter->iter0, refname, flags);
 }
 
 static int files_ref_iterator_peel(struct ref_iterator *ref_iterator,
@@ -2316,7 +2316,8 @@ static int files_reflog_iterator_advance(struct ref_iterator *ref_iterator)
 }
 
 static int files_reflog_iterator_seek(struct ref_iterator *ref_iterator UNUSED,
-				      const char *prefix UNUSED)
+				      const char *refname UNUSED,
+				      unsigned int flags UNUSED)
 {
 	BUG("ref_iterator_seek() called for reflog_iterator");
 }
diff --git a/refs/iterator.c b/refs/iterator.c
index 766d96e795..17ef841d8a 100644
--- a/refs/iterator.c
+++ b/refs/iterator.c
@@ -15,10 +15,10 @@ int ref_iterator_advance(struct ref_iterator *ref_iterator)
 	return ref_iterator->vtable->advance(ref_iterator);
 }
 
-int ref_iterator_seek(struct ref_iterator *ref_iterator,
-		      const char *prefix)
+int ref_iterator_seek(struct ref_iterator *ref_iterator, const char *refname,
+		      unsigned int flags)
 {
-	return ref_iterator->vtable->seek(ref_iterator, prefix);
+	return ref_iterator->vtable->seek(ref_iterator, refname, flags);
 }
 
 int ref_iterator_peel(struct ref_iterator *ref_iterator,
@@ -57,7 +57,8 @@ static int empty_ref_iterator_advance(struct ref_iterator *ref_iterator UNUSED)
 }
 
 static int empty_ref_iterator_seek(struct ref_iterator *ref_iterator UNUSED,
-				   const char *prefix UNUSED)
+				   const char *refname UNUSED,
+				   unsigned int flags UNUSED)
 {
 	return 0;
 }
@@ -224,7 +225,7 @@ static int merge_ref_iterator_advance(struct ref_iterator *ref_iterator)
 }
 
 static int merge_ref_iterator_seek(struct ref_iterator *ref_iterator,
-				   const char *prefix)
+				   const char *refname, unsigned int flags)
 {
 	struct merge_ref_iterator *iter =
 		(struct merge_ref_iterator *)ref_iterator;
@@ -234,11 +235,11 @@ static int merge_ref_iterator_seek(struct ref_iterator *ref_iterator,
 	iter->iter0 = iter->iter0_owned;
 	iter->iter1 = iter->iter1_owned;
 
-	ret = ref_iterator_seek(iter->iter0, prefix);
+	ret = ref_iterator_seek(iter->iter0, refname, flags);
 	if (ret < 0)
 		return ret;
 
-	ret = ref_iterator_seek(iter->iter1, prefix);
+	ret = ref_iterator_seek(iter->iter1, refname, flags);
 	if (ret < 0)
 		return ret;
 
@@ -407,13 +408,16 @@ static int prefix_ref_iterator_advance(struct ref_iterator *ref_iterator)
 }
 
 static int prefix_ref_iterator_seek(struct ref_iterator *ref_iterator,
-				    const char *prefix)
+				    const char *refname, unsigned int flags)
 {
 	struct prefix_ref_iterator *iter =
 		(struct prefix_ref_iterator *)ref_iterator;
-	free(iter->prefix);
-	iter->prefix = xstrdup_or_null(prefix);
-	return ref_iterator_seek(iter->iter0, prefix);
+
+	if (flags & REF_ITERATOR_SEEK_SET_PREFIX) {
+		free(iter->prefix);
+		iter->prefix = xstrdup_or_null(refname);
+	}
+	return ref_iterator_seek(iter->iter0, refname, flags);
 }
 
 static int prefix_ref_iterator_peel(struct ref_iterator *ref_iterator,
diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 7fd73a0e6d..5fa4ae6655 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -1004,19 +1004,23 @@ static int packed_ref_iterator_advance(struct ref_iterator *ref_iterator)
 }
 
 static int packed_ref_iterator_seek(struct ref_iterator *ref_iterator,
-				    const char *prefix)
+				    const char *refname, unsigned int flags)
 {
 	struct packed_ref_iterator *iter =
 		(struct packed_ref_iterator *)ref_iterator;
 	const char *start;
 
-	if (prefix && *prefix)
-		start = find_reference_location(iter->snapshot, prefix, 0);
+	if (refname && *refname)
+		start = find_reference_location(iter->snapshot, refname, 0);
 	else
 		start = iter->snapshot->start;
 
-	free(iter->prefix);
-	iter->prefix = xstrdup_or_null(prefix);
+	/* Unset any previously set prefix */
+	FREE_AND_NULL(iter->prefix);
+
+	if (flags & REF_ITERATOR_SEEK_SET_PREFIX)
+		iter->prefix = xstrdup_or_null(refname);
+
 	iter->pos = start;
 	iter->eof = iter->snapshot->eof;
 
@@ -1194,7 +1198,8 @@ static struct ref_iterator *packed_ref_iterator_begin(
 	iter->repo = ref_store->repo;
 	iter->flags = flags;
 
-	if (packed_ref_iterator_seek(&iter->base, prefix) < 0) {
+	if (packed_ref_iterator_seek(&iter->base, prefix,
+				     REF_ITERATOR_SEEK_SET_PREFIX) < 0) {
 		ref_iterator_free(&iter->base);
 		return NULL;
 	}
diff --git a/refs/ref-cache.c b/refs/ref-cache.c
index 8aaffa8c6b..1d95b56d40 100644
--- a/refs/ref-cache.c
+++ b/refs/ref-cache.c
@@ -434,11 +434,9 @@ static int cache_ref_iterator_advance(struct ref_iterator *ref_iterator)
 	}
 }
 
-static int cache_ref_iterator_seek(struct ref_iterator *ref_iterator,
-				   const char *prefix)
+static int cache_ref_iterator_set_prefix(struct cache_ref_iterator *iter,
+					 const char *prefix)
 {
-	struct cache_ref_iterator *iter =
-		(struct cache_ref_iterator *)ref_iterator;
 	struct cache_ref_iterator_level *level;
 	struct ref_dir *dir;
 
@@ -469,6 +467,82 @@ static int cache_ref_iterator_seek(struct ref_iterator *ref_iterator,
 	return 0;
 }
 
+static int cache_ref_iterator_seek(struct ref_iterator *ref_iterator,
+				   const char *refname, unsigned int flags)
+{
+	struct cache_ref_iterator *iter =
+		(struct cache_ref_iterator *)ref_iterator;
+
+	if (flags & REF_ITERATOR_SEEK_SET_PREFIX) {
+		return cache_ref_iterator_set_prefix(iter, refname);
+	} else if (refname && *refname) {
+		struct cache_ref_iterator_level *level;
+		const char *slash = refname;
+		struct ref_dir *dir;
+
+		dir = get_ref_dir(iter->cache->root);
+
+		if (iter->prime_dir)
+			prime_ref_dir(dir, refname);
+
+		iter->levels_nr = 1;
+		level = &iter->levels[0];
+		level->index = -1;
+		level->dir = dir;
+
+		/* Unset any previously set prefix */
+		FREE_AND_NULL(iter->prefix);
+
+		/*
+		 * Breakdown the provided seek path and assign the correct
+		 * indexing to each level as needed.
+		 */
+		do {
+			int len, idx;
+			int cmp = 0;
+
+			sort_ref_dir(dir);
+
+			slash = strchr(slash, '/');
+			len = slash ? slash - refname : (int)strlen(refname);
+
+			for (idx = 0; idx < dir->nr; idx++) {
+				cmp = strncmp(refname, dir->entries[idx]->name, len);
+				if (cmp <= 0)
+					break;
+			}
+			/* don't overflow the index */
+			idx = idx >= dir->nr ? dir->nr - 1 : idx;
+
+			if (slash)
+				slash = slash + 1;
+
+			level->index = idx;
+			if (dir->entries[idx]->flag & REF_DIR) {
+				/* push down a level */
+				dir = get_ref_dir(dir->entries[idx]);
+
+				ALLOC_GROW(iter->levels, iter->levels_nr + 1,
+					   iter->levels_alloc);
+				level = &iter->levels[iter->levels_nr++];
+				level->dir = dir;
+				level->index = -1;
+			} else {
+				/* reduce the index so the leaf node is iterated over */
+				if (cmp <= 0 && !slash)
+					level->index = idx - 1;
+				/*
+				 * while the seek path may not be exhausted, our
+				 * match is exhausted at a leaf node.
+				 */
+				break;
+			}
+		} while (slash);
+	}
+
+	return 0;
+}
+
 static int cache_ref_iterator_peel(struct ref_iterator *ref_iterator,
 				   struct object_id *peeled)
 {
@@ -509,7 +583,8 @@ struct ref_iterator *cache_ref_iterator_begin(struct ref_cache *cache,
 	iter->cache = cache;
 	iter->prime_dir = prime_dir;
 
-	if (cache_ref_iterator_seek(&iter->base, prefix) < 0) {
+	if (cache_ref_iterator_seek(&iter->base, prefix,
+				    REF_ITERATOR_SEEK_SET_PREFIX) < 0) {
 		ref_iterator_free(&iter->base);
 		return NULL;
 	}
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index 03f5df04d5..40c1c0f93d 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -353,11 +353,12 @@ void base_ref_iterator_init(struct ref_iterator *iter,
 typedef int ref_iterator_advance_fn(struct ref_iterator *ref_iterator);
 
 /*
- * Seek the iterator to the first reference matching the given prefix. Should
- * behave the same as if a new iterator was created with the same prefix.
+ * Seek the iterator to the first matching reference. If the
+ * REF_ITERATOR_SEEK_SET_PREFIX flag is set, it would behave the same as if a
+ * new iterator was created with the provided refname as prefix.
  */
 typedef int ref_iterator_seek_fn(struct ref_iterator *ref_iterator,
-				 const char *prefix);
+				 const char *refname, unsigned int flags);
 
 /*
  * Peels the current ref, returning 0 for success or -1 for failure.
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index 4c3817f4ec..c3d48cc412 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -719,15 +719,20 @@ static int reftable_ref_iterator_advance(struct ref_iterator *ref_iterator)
 }
 
 static int reftable_ref_iterator_seek(struct ref_iterator *ref_iterator,
-				      const char *prefix)
+				      const char *refname, unsigned int flags)
 {
 	struct reftable_ref_iterator *iter =
 		(struct reftable_ref_iterator *)ref_iterator;
 
-	free(iter->prefix);
-	iter->prefix = xstrdup_or_null(prefix);
-	iter->prefix_len = prefix ? strlen(prefix) : 0;
-	iter->err = reftable_iterator_seek_ref(&iter->iter, prefix);
+	/* Unset any previously set prefix */
+	FREE_AND_NULL(iter->prefix);
+	iter->prefix_len = 0;
+
+	if (flags & REF_ITERATOR_SEEK_SET_PREFIX) {
+		iter->prefix = xstrdup_or_null(refname);
+		iter->prefix_len = refname ? strlen(refname) : 0;
+	}
+	iter->err = reftable_iterator_seek_ref(&iter->iter, refname);
 
 	return iter->err;
 }
@@ -839,7 +844,8 @@ static struct reftable_ref_iterator *ref_iterator_for_stack(struct reftable_ref_
 	if (ret)
 		goto done;
 
-	ret = reftable_ref_iterator_seek(&iter->base, prefix);
+	ret = reftable_ref_iterator_seek(&iter->base, prefix,
+					 REF_ITERATOR_SEEK_SET_PREFIX);
 	if (ret)
 		goto done;
 
@@ -2042,7 +2048,8 @@ static int reftable_reflog_iterator_advance(struct ref_iterator *ref_iterator)
 }
 
 static int reftable_reflog_iterator_seek(struct ref_iterator *ref_iterator UNUSED,
-					 const char *prefix UNUSED)
+					 const char *refname UNUSED,
+					 unsigned int flags UNUSED)
 {
 	BUG("reftable reflog iterator cannot be seeked");
 	return -1;
-- 
2.49.0
^ permalink raw reply related	[flat|nested] 102+ messages in thread
* Re: [PATCH v5 3/5] refs: selectively set prefix in the seek functions
  2025-07-15 11:28   ` [PATCH v5 3/5] refs: selectively set prefix in the seek functions Karthik Nayak
@ 2025-07-17  2:09     ` Jeff King
  2025-07-17 19:49       ` Karthik Nayak
  0 siblings, 1 reply; 102+ messages in thread
From: Jeff King @ 2025-07-17  2:09 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: git, gitster, ps, schwab, phillip.wood123, Christian Couder
On Tue, Jul 15, 2025 at 01:28:28PM +0200, Karthik Nayak wrote:
> +static int cache_ref_iterator_seek(struct ref_iterator *ref_iterator,
> +				   const char *refname, unsigned int flags)
> [...]
> +		do {
> +			int len, idx;
> +			int cmp = 0;
> +
> +			sort_ref_dir(dir);
> +
> +			slash = strchr(slash, '/');
> +			len = slash ? slash - refname : (int)strlen(refname);
I was looking at this code due to a nearby thread and noticed this funny
cast to int. I guess you added it to silence -Wsign-compare, but why are
we not using a size_t in the first place?
This kind of conversion can sometimes have security implications because
a very large "refname" would cause "len" to become negative (i.e., if
it's between 2GB and 4GB).
In this particular case it ends up cast back to a size_t via strncmp:
> +			for (idx = 0; idx < dir->nr; idx++) {
> +				cmp = strncmp(refname, dir->entries[idx]->name, len);
> +				if (cmp <= 0)
> +					break;
> +			}
so we get the original value back. We'd still get truncation for a
refname value over 4GB, which would presumably give us a slightly wrong
answer. But I don't think we'd ever look outside the array.
Such sizes are probably unlikely if we are feeding filesystem paths. But
we probably should not set a bad example, and just do:
diff --git a/refs/ref-cache.c b/refs/ref-cache.c
index 1d95b56d40..3949d145e8 100644
--- a/refs/ref-cache.c
+++ b/refs/ref-cache.c
@@ -498,13 +498,14 @@ static int cache_ref_iterator_seek(struct ref_iterator *ref_iterator,
 		 * indexing to each level as needed.
 		 */
 		do {
-			int len, idx;
+			size_t len;
+			int idx;
 			int cmp = 0;
 
 			sort_ref_dir(dir);
 
 			slash = strchr(slash, '/');
-			len = slash ? slash - refname : (int)strlen(refname);
+			len = slash ? slash - refname : strlen(refname);
 
 			for (idx = 0; idx < dir->nr; idx++) {
 				cmp = strncmp(refname, dir->entries[idx]->name, len);
-Peff
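As a small illustration of the narrowing concern raised above, the
following sketch shows a range-checked conversion from size_t to int.
The helper name `size_to_int_checked()` is hypothetical and exists only
for this example; the point is that any length above INT_MAX cannot be
stored in an int without changing its value, which is why keeping the
length in a size_t avoids the problem entirely.

```c
#include <limits.h>
#include <stddef.h>

/*
 * Hypothetical helper: store `n` in `*out` and return 1 only when the
 * value fits in an int. Otherwise leave `*out` untouched and return 0,
 * instead of silently producing a negative or truncated length.
 */
static int size_to_int_checked(size_t n, int *out)
{
	if (n > (size_t)INT_MAX)
		return 0;

	*out = (int)n;
	return 1;
}
```

A caller that must hand a length to an int-based interface could check
the return value and bail out, rather than relying on a bare cast.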
^ permalink raw reply related	[flat|nested] 102+ messages in thread
* Re: [PATCH v5 3/5] refs: selectively set prefix in the seek functions
  2025-07-17  2:09     ` Jeff King
@ 2025-07-17 19:49       ` Karthik Nayak
  2025-07-17 21:55         ` Jeff King
  0 siblings, 1 reply; 102+ messages in thread
From: Karthik Nayak @ 2025-07-17 19:49 UTC (permalink / raw)
  To: Jeff King; +Cc: git, gitster, ps, schwab, phillip.wood123, Christian Couder
Jeff King <peff@peff.net> writes:
> On Tue, Jul 15, 2025 at 01:28:28PM +0200, Karthik Nayak wrote:
>
>> +static int cache_ref_iterator_seek(struct ref_iterator *ref_iterator,
>> +				   const char *refname, unsigned int flags)
>> [...]
>> +		do {
>> +			int len, idx;
>> +			int cmp = 0;
>> +
>> +			sort_ref_dir(dir);
>> +
>> +			slash = strchr(slash, '/');
>> +			len = slash ? slash - refname : (int)strlen(refname);
>
> I was looking at this code due to a nearby thread and noticed this funny
> cast to int. I guess you added it to silence -Wsign-compare, but why are
> we not using a size_t in the first place?
>
That's an oversight from my side.
> This kind of conversion can sometimes have security implications because
> a very large "refname" would cause "len" to become negative (i.e., if
> it's between 2GB and 4GB).
>
Indeed, I didn't think of that.
> In this particular case it ends up cast back to a size_t via strncmp:
>
>> +			for (idx = 0; idx < dir->nr; idx++) {
>> +				cmp = strncmp(refname, dir->entries[idx]->name, len);
>> +				if (cmp <= 0)
>> +					break;
>> +			}
>
> so we get the original value back. We'd still get truncation for a
> refname value over 4GB, which would presumably give us a slightly wrong
> answer. But I don't think we'd ever look outside the array.
>
> Such sizes are probably unlikely if we are feeding filesystem paths. But
> we probably should not set a bad example, and just do:
>
Agreed.
> diff --git a/refs/ref-cache.c b/refs/ref-cache.c
> index 1d95b56d40..3949d145e8 100644
> --- a/refs/ref-cache.c
> +++ b/refs/ref-cache.c
> @@ -498,13 +498,14 @@ static int cache_ref_iterator_seek(struct ref_iterator *ref_iterator,
>  		 * indexing to each level as needed.
>  		 */
>  		do {
> -			int len, idx;
> +			size_t len;
> +			int idx;
>  			int cmp = 0;
>
>  			sort_ref_dir(dir);
>
>  			slash = strchr(slash, '/');
> -			len = slash ? slash - refname : (int)strlen(refname);
> +			len = slash ? slash - refname : strlen(refname);
>
>  			for (idx = 0; idx < dir->nr; idx++) {
>  				cmp = strncmp(refname, dir->entries[idx]->name, len);
>
> -Peff
Thanks, I think we have to typecast `slash - refname` to size_t, but
this is the right way to do it. Thanks for the review!
Karthik
^ permalink raw reply	[flat|nested] 102+ messages in thread
* Re: [PATCH v5 3/5] refs: selectively set prefix in the seek functions
  2025-07-17 19:49       ` Karthik Nayak
@ 2025-07-17 21:55         ` Jeff King
  0 siblings, 0 replies; 102+ messages in thread
From: Jeff King @ 2025-07-17 21:55 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: git, gitster, ps, schwab, phillip.wood123, Christian Couder
On Thu, Jul 17, 2025 at 12:49:33PM -0700, Karthik Nayak wrote:
> > diff --git a/refs/ref-cache.c b/refs/ref-cache.c
> > index 1d95b56d40..3949d145e8 100644
> > --- a/refs/ref-cache.c
> > +++ b/refs/ref-cache.c
> > @@ -498,13 +498,14 @@ static int cache_ref_iterator_seek(struct ref_iterator *ref_iterator,
> >  		 * indexing to each level as needed.
> >  		 */
> >  		do {
> > -			int len, idx;
> > +			size_t len;
> > +			int idx;
> >  			int cmp = 0;
> >
> >  			sort_ref_dir(dir);
> >
> >  			slash = strchr(slash, '/');
> > -			len = slash ? slash - refname : (int)strlen(refname);
> > +			len = slash ? slash - refname : strlen(refname);
> >
> >  			for (idx = 0; idx < dir->nr; idx++) {
> >  				cmp = strncmp(refname, dir->entries[idx]->name, len);
> >
> > -Peff
> 
> Thanks, I think we have to typecast `slash - refname` to size_t, but
> this is the right way to do it. Thanks for the review!
Ah, yeah. I mistakenly test-compiled without DEVELOPER=1. ;)
I do think that cast is a lesser evil, though. It is a ptrdiff_t, but we
know it is non-negative because "slash >= refname" via strchr.  I
wish there was a good way to use the type system to tell the compiler
that.
-Peff
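The form the discussion above converges on can be sketched as a small
standalone function. This is an illustration only, not code from the
series: `level_prefix_len()` is a hypothetical name, and the function
merely isolates the length computation from the loop in
`cache_ref_iterator_seek()`. The pointer difference is a ptrdiff_t, and
the explicit cast to size_t is safe because strchr() never returns a
pointer that lies before the start of the string it searched.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: search for the next '/' starting at `from`, which
 * must point into `refname`. Return the length of the leading part of
 * `refname` that ends just before that slash, or the full length when no
 * slash remains. `*next` is set to the character after the slash, or to
 * NULL when the path is exhausted.
 */
static size_t level_prefix_len(const char *refname, const char *from,
			       const char **next)
{
	const char *slash = strchr(from, '/');

	*next = slash ? slash + 1 : NULL;
	return slash ? (size_t)(slash - refname) : strlen(refname);
}
```

For "refs/heads/main", successive calls yield 4, 10 and 15, one length
per directory level that the seek has to descend through.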
^ permalink raw reply	[flat|nested] 102+ messages in thread
* [PATCH v5 4/5] ref-filter: remove unnecessary else clause
  2025-07-15 11:28 ` [PATCH v5 0/5] " Karthik Nayak
                     ` (2 preceding siblings ...)
  2025-07-15 11:28   ` [PATCH v5 3/5] refs: selectively set prefix in the seek functions Karthik Nayak
@ 2025-07-15 11:28   ` Karthik Nayak
  2025-07-15 11:28   ` [PATCH v5 5/5] for-each-ref: introduce a '--start-after' option Karthik Nayak
                     ` (2 subsequent siblings)
  6 siblings, 0 replies; 102+ messages in thread
From: Karthik Nayak @ 2025-07-15 11:28 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak, gitster, ps, schwab, phillip.wood123,
	Christian Couder
In 'ref-filter.c', there is an 'else' clause within `do_filter_refs()`.
This is unnecessary since the 'if' clause calls `die()`, which would
exit the program. So let's remove the unnecessary 'else' clause. This
improves readability since the indentation is also reduced and flow is
simpler.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
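The reasoning in the commit message can be shown with a tiny standalone
sketch. `toy_die()` and `describe_kind()` are hypothetical names for this
example; `toy_die()` only stands in for a function that never returns,
and the sketch assumes a compiler that accepts the C11 `_Noreturn`
specifier.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for die(): report the error and never return. */
static _Noreturn void toy_die(const char *msg)
{
	fprintf(stderr, "fatal: %s\n", msg);
	exit(128);
}

/* Map a non-zero kind to 1 (odd) or 2 (even); a zero kind is fatal. */
static int describe_kind(unsigned int kind)
{
	if (!kind)
		toy_die("invalid type");

	/*
	 * Control reaches this point only when kind is non-zero, so the
	 * code below needs no 'else' and no extra level of indentation.
	 */
	return (kind & 1) ? 1 : 2;
}
```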
 ref-filter.c | 60 ++++++++++++++++++++++++++++++------------------------------
 1 file changed, 30 insertions(+), 30 deletions(-)
diff --git a/ref-filter.c b/ref-filter.c
index 7a274633cf..da663c7ac8 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -3199,37 +3199,37 @@ static int do_filter_refs(struct ref_filter *filter, unsigned int type, each_ref
 	/*  Simple per-ref filtering */
 	if (!filter->kind)
 		die("filter_refs: invalid type");
-	else {
-		/*
-		 * For common cases where we need only branches or remotes or tags,
-		 * we only iterate through those refs. If a mix of refs is needed,
-		 * we iterate over all refs and filter out required refs with the help
-		 * of filter_ref_kind().
-		 */
-		if (filter->kind == FILTER_REFS_BRANCHES)
-			ret = refs_for_each_fullref_in(get_main_ref_store(the_repository),
-						       "refs/heads/", NULL,
-						       fn, cb_data);
-		else if (filter->kind == FILTER_REFS_REMOTES)
-			ret = refs_for_each_fullref_in(get_main_ref_store(the_repository),
-						       "refs/remotes/", NULL,
-						       fn, cb_data);
-		else if (filter->kind == FILTER_REFS_TAGS)
-			ret = refs_for_each_fullref_in(get_main_ref_store(the_repository),
-						       "refs/tags/", NULL, fn,
-						       cb_data);
-		else if (filter->kind & FILTER_REFS_REGULAR)
-			ret = for_each_fullref_in_pattern(filter, fn, cb_data);
 
-		/*
-		 * When printing all ref types, HEAD is already included,
-		 * so we don't want to print HEAD again.
-		 */
-		if (!ret && !(filter->kind & FILTER_REFS_ROOT_REFS) &&
-		    (filter->kind & FILTER_REFS_DETACHED_HEAD))
-			refs_head_ref(get_main_ref_store(the_repository), fn,
-				      cb_data);
-	}
+	/*
+	 * For common cases where we need only branches or remotes or tags,
+	 * we only iterate through those refs. If a mix of refs is needed,
+	 * we iterate over all refs and filter out required refs with the help
+	 * of filter_ref_kind().
+	 */
+	if (filter->kind == FILTER_REFS_BRANCHES)
+		ret = refs_for_each_fullref_in(get_main_ref_store(the_repository),
+					       "refs/heads/", NULL,
+					       fn, cb_data);
+	else if (filter->kind == FILTER_REFS_REMOTES)
+		ret = refs_for_each_fullref_in(get_main_ref_store(the_repository),
+					       "refs/remotes/", NULL,
+					       fn, cb_data);
+	else if (filter->kind == FILTER_REFS_TAGS)
+		ret = refs_for_each_fullref_in(get_main_ref_store(the_repository),
+					       "refs/tags/", NULL, fn,
+					       cb_data);
+	else if (filter->kind & FILTER_REFS_REGULAR)
+		ret = for_each_fullref_in_pattern(filter, fn, cb_data);
+
+	/*
+	 * When printing all ref types, HEAD is already included,
+	 * so we don't want to print HEAD again.
+	 */
+	if (!ret && !(filter->kind & FILTER_REFS_ROOT_REFS) &&
+	    (filter->kind & FILTER_REFS_DETACHED_HEAD))
+		refs_head_ref(get_main_ref_store(the_repository), fn,
+			      cb_data);
+
 
 	clear_contains_cache(&filter->internal.contains_cache);
 	clear_contains_cache(&filter->internal.no_contains_cache);
-- 
2.49.0
^ permalink raw reply related	[flat|nested] 102+ messages in thread
* [PATCH v5 5/5] for-each-ref: introduce a '--start-after' option
  2025-07-15 11:28 ` [PATCH v5 0/5] " Karthik Nayak
                     ` (3 preceding siblings ...)
  2025-07-15 11:28   ` [PATCH v5 4/5] ref-filter: remove unnecessary else clause Karthik Nayak
@ 2025-07-15 11:28   ` Karthik Nayak
  2025-07-17 15:31     ` Junio C Hamano
  2025-07-15 19:00   ` [PATCH v5 0/5] for-each-ref: introduce seeking functionality via '--start-after' Junio C Hamano
  2025-07-23 21:51   ` [PATCH] ref-iterator-seek: correctly initialize the prefix_state for a new level Junio C Hamano
  6 siblings, 1 reply; 102+ messages in thread
From: Karthik Nayak @ 2025-07-15 11:28 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak, gitster, ps, schwab, phillip.wood123,
	Christian Couder
The `git-for-each-ref(1)` command is used to iterate over references
present in a repository. In large repositories with millions of
references, it would be optimal to paginate this output such that we
can start iteration from a given reference. This would avoid having to
iterate over all references from the beginning each time when paginating
through results.
The previous commit added 'seek' functionality to the reference
backends. Utilize this and expose a '--start-after' option in
'git-for-each-ref(1)'. When used, the reference iteration seeks to the
lexicographically next reference and iterates from there onward.
This enables efficient pagination workflows, where the calling script
can remember the last provided reference and use that as the starting
point for the next set of references:
    git for-each-ref --count=100
    git for-each-ref --count=100 --start-after=refs/heads/branch-100
    git for-each-ref --count=100 --start-after=refs/heads/branch-200
Since the reference iterators only allow seeking to a specified marker
via `ref_iterator_seek()`, we introduce a helper function
`start_ref_iterator_after()`, which seeks to the next reference by simply
appending (char) 1 to the marker.
Note that pagination always continues from the provided marker; as such,
any concurrent reference updates that sort lexicographically before the
marker will not be output. Document this.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
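The "append (char) 1" trick from the commit message can be sketched
outside of Git as follows. This is a toy model under stated assumptions:
`seek_key_after()` and `first_at_or_after()` are hypothetical names, the
reference list is a plain sorted array, and the ordering is plain
strcmp() byte order. It also assumes, as `git check-ref-format` requires,
that reference names contain no ASCII control characters, so that
"marker\001" sorts after the marker itself but before every valid name
that merely extends it.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: return a newly allocated copy of `marker` with a
 * single byte of value 1 appended, or NULL if allocation fails. The
 * caller frees the result.
 */
static char *seek_key_after(const char *marker)
{
	size_t len = strlen(marker);
	char *key = malloc(len + 2);

	if (!key)
		return NULL;
	memcpy(key, marker, len);
	key[len] = 1;
	key[len + 1] = '\0';
	return key;
}

/* Index of the first name in the sorted list that is not before `key`. */
static size_t first_at_or_after(const char **refs, size_t nr,
				const char *key)
{
	size_t i = 0;

	while (i < nr && strcmp(refs[i], key) < 0)
		i++;
	return i;
}
```

Seeking with the extended key skips the marker itself while still
yielding names such as "refs/heads/branch-1000" when the marker is
"refs/heads/branch-100".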
  |  10 +-
               |   8 ++
                         |  78 +++++++++++----
                         |   1 +
       | 194 ++++++++++++++++++++++++++++++++++++
 5 files changed, 272 insertions(+), 19 deletions(-)
diff --git a/Documentation/git-for-each-ref.adoc b/Documentation/git-for-each-ref.adoc
index 5ef89fc0fe..ae61ba642a 100644
--- a/Documentation/git-for-each-ref.adoc
+++ b/Documentation/git-for-each-ref.adoc
@@ -14,7 +14,7 @@ SYNOPSIS
 		   [--points-at=<object>]
 		   [--merged[=<object>]] [--no-merged[=<object>]]
 		   [--contains[=<object>]] [--no-contains[=<object>]]
-		   [--exclude=<pattern> ...]
+		   [--exclude=<pattern> ...] [--start-after=<marker>]
 
 DESCRIPTION
 -----------
@@ -108,6 +108,14 @@ TAB %(refname)`.
 --include-root-refs::
 	List root refs (HEAD and pseudorefs) apart from regular refs.
 
+--start-after=<marker>::
+    Allows paginating the output by skipping references up to and including the
+    specified marker. When paging, it should be noted that references may be
+    deleted, modified or added between invocations. Output will only yield those
+    references which follow the marker lexicographically. Output begins from the
+    first reference that would come after the marker alphabetically. Cannot be
+    used with general pattern matching or custom sort options.
+
 FIELD NAMES
 -----------
 
diff --git a/builtin/for-each-ref.c b/builtin/for-each-ref.c
index 3d2207ec77..3f21598046 100644
--- a/builtin/for-each-ref.c
+++ b/builtin/for-each-ref.c
@@ -13,6 +13,7 @@ static char const * const for_each_ref_usage[] = {
 	N_("git for-each-ref [--points-at <object>]"),
 	N_("git for-each-ref [--merged [<commit>]] [--no-merged [<commit>]]"),
 	N_("git for-each-ref [--contains [<commit>]] [--no-contains [<commit>]]"),
+	N_("git for-each-ref [--start-after <marker>]"),
 	NULL
 };
 
@@ -44,6 +45,7 @@ int cmd_for_each_ref(int argc,
 		OPT_GROUP(""),
 		OPT_INTEGER( 0 , "count", &format.array_opts.max_count, N_("show only <n> matched refs")),
 		OPT_STRING(  0 , "format", &format.format, N_("format"), N_("format to use for the output")),
+		OPT_STRING(  0 , "start-after", &filter.start_after, N_("start-start"), N_("start iteration after the provided marker")),
 		OPT__COLOR(&format.use_color, N_("respect format colors")),
 		OPT_REF_FILTER_EXCLUDE(&filter),
 		OPT_REF_SORT(&sorting_options),
@@ -79,6 +81,9 @@ int cmd_for_each_ref(int argc,
 	if (verify_ref_format(&format))
 		usage_with_options(for_each_ref_usage, opts);
 
+	if (filter.start_after && sorting_options.nr > 1)
+		die(_("cannot use --start-after with custom sort options"));
+
 	sorting = ref_sorting_options(&sorting_options);
 	ref_sorting_set_sort_flags_all(sorting, REF_SORTING_ICASE, icase);
 	filter.ignore_case = icase;
@@ -100,6 +105,9 @@ int cmd_for_each_ref(int argc,
 		filter.name_patterns = argv;
 	}
 
+	if (filter.start_after && filter.name_patterns && filter.name_patterns[0])
+		die(_("cannot use --start-after with patterns"));
+
 	if (include_root_refs)
 		flags |= FILTER_REFS_ROOT_REFS | FILTER_REFS_DETACHED_HEAD;
 
diff --git a/ref-filter.c b/ref-filter.c
index da663c7ac8..c8a6b7f1af 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -2683,6 +2683,41 @@ static int filter_exclude_match(struct ref_filter *filter, const char *refname)
 	return match_pattern(filter->exclude.v, refname, filter->ignore_case);
 }
 
+/*
+ * We need to seek to the reference right after a given marker but excluding any
+ * matching references. So we seek to the lexicographically next reference.
+ */
+static int start_ref_iterator_after(struct ref_iterator *iter, const char *marker)
+{
+	struct strbuf sb = STRBUF_INIT;
+	int ret;
+
+	strbuf_addstr(&sb, marker);
+	strbuf_addch(&sb, 1);
+
+	ret = ref_iterator_seek(iter, sb.buf, 0);
+
+	strbuf_release(&sb);
+	return ret;
+}
+
+static int for_each_fullref_with_seek(struct ref_filter *filter, each_ref_fn cb,
+				       void *cb_data, unsigned int flags)
+{
+	struct ref_iterator *iter;
+	int ret = 0;
+
+	iter = refs_ref_iterator_begin(get_main_ref_store(the_repository), "",
+				       NULL, 0, flags);
+	if (filter->start_after)
+		ret = start_ref_iterator_after(iter, filter->start_after);
+
+	if (ret)
+		return ret;
+
+	return do_for_each_ref_iterator(iter, cb, cb_data);
+}
+
 /*
  * This is the same as for_each_fullref_in(), but it tries to iterate
  * only over the patterns we'll care about. Note that it _doesn't_ do a full
@@ -2694,8 +2729,8 @@ static int for_each_fullref_in_pattern(struct ref_filter *filter,
 {
 	if (filter->kind & FILTER_REFS_ROOT_REFS) {
 		/* In this case, we want to print all refs including root refs. */
-		return refs_for_each_include_root_refs(get_main_ref_store(the_repository),
-						       cb, cb_data);
+		return for_each_fullref_with_seek(filter, cb, cb_data,
+						  DO_FOR_EACH_INCLUDE_ROOT_REFS);
 	}
 
 	if (!filter->match_as_path) {
@@ -2704,8 +2739,7 @@ static int for_each_fullref_in_pattern(struct ref_filter *filter,
 		 * prefixes like "refs/heads/" etc. are stripped off,
 		 * so we have to look at everything:
 		 */
-		return refs_for_each_fullref_in(get_main_ref_store(the_repository),
-						"", NULL, cb, cb_data);
+		return for_each_fullref_with_seek(filter, cb, cb_data, 0);
 	}
 
 	if (filter->ignore_case) {
@@ -2714,14 +2748,12 @@ static int for_each_fullref_in_pattern(struct ref_filter *filter,
 		 * so just return everything and let the caller
 		 * sort it out.
 		 */
-		return refs_for_each_fullref_in(get_main_ref_store(the_repository),
-						"", NULL, cb, cb_data);
+		return for_each_fullref_with_seek(filter, cb, cb_data, 0);
 	}
 
 	if (!filter->name_patterns[0]) {
 		/* no patterns; we have to look at everything */
-		return refs_for_each_fullref_in(get_main_ref_store(the_repository),
-						 "", filter->exclude.v, cb, cb_data);
+		return for_each_fullref_with_seek(filter, cb, cb_data, 0);
 	}
 
 	return refs_for_each_fullref_in_prefixes(get_main_ref_store(the_repository),
@@ -3189,6 +3221,7 @@ void filter_is_base(struct repository *r,
 
 static int do_filter_refs(struct ref_filter *filter, unsigned int type, each_ref_fn fn, void *cb_data)
 {
+	const char *prefix = NULL;
 	int ret = 0;
 
 	filter->kind = type & FILTER_REFS_KIND_MASK;
@@ -3207,19 +3240,28 @@ static int do_filter_refs(struct ref_filter *filter, unsigned int type, each_ref
 	 * of filter_ref_kind().
 	 */
 	if (filter->kind == FILTER_REFS_BRANCHES)
-		ret = refs_for_each_fullref_in(get_main_ref_store(the_repository),
-					       "refs/heads/", NULL,
-					       fn, cb_data);
+		prefix = "refs/heads/";
 	else if (filter->kind == FILTER_REFS_REMOTES)
-		ret = refs_for_each_fullref_in(get_main_ref_store(the_repository),
-					       "refs/remotes/", NULL,
-					       fn, cb_data);
+		prefix = "refs/remotes/";
 	else if (filter->kind == FILTER_REFS_TAGS)
-		ret = refs_for_each_fullref_in(get_main_ref_store(the_repository),
-					       "refs/tags/", NULL, fn,
-					       cb_data);
-	else if (filter->kind & FILTER_REFS_REGULAR)
+		prefix = "refs/tags/";
+
+	if (prefix) {
+		struct ref_iterator *iter;
+
+		iter = refs_ref_iterator_begin(get_main_ref_store(the_repository),
+					       "", NULL, 0, 0);
+
+		if (filter->start_after)
+			ret = start_ref_iterator_after(iter, filter->start_after);
+		else if (prefix)
+			ret = ref_iterator_seek(iter, prefix, 1);
+
+		if (!ret)
+			ret = do_for_each_ref_iterator(iter, fn, cb_data);
+	} else if (filter->kind & FILTER_REFS_REGULAR) {
 		ret = for_each_fullref_in_pattern(filter, fn, cb_data);
+	}
 
 	/*
 	 * When printing all ref types, HEAD is already included,
diff --git a/ref-filter.h b/ref-filter.h
index c98c4fbd4c..f22ca94b49 100644
--- a/ref-filter.h
+++ b/ref-filter.h
@@ -64,6 +64,7 @@ struct ref_array {
 
 struct ref_filter {
 	const char **name_patterns;
+	const char *start_after;
 	struct strvec exclude;
 	struct oid_array points_at;
 	struct commit_list *with_commit;
diff --git a/t/t6302-for-each-ref-filter.sh b/t/t6302-for-each-ref-filter.sh
index bb02b86c16..e097db6b02 100755
--- a/t/t6302-for-each-ref-filter.sh
+++ b/t/t6302-for-each-ref-filter.sh
@@ -541,4 +541,198 @@ test_expect_success 'validate worktree atom' '
 	test_cmp expect actual
 '
 
+test_expect_success 'start after with empty value' '
+	cat >expect <<-\EOF &&
+	refs/heads/main
+	refs/heads/main_worktree
+	refs/heads/side
+	refs/odd/spot
+	refs/tags/annotated-tag
+	refs/tags/doubly-annotated-tag
+	refs/tags/doubly-signed-tag
+	refs/tags/foo1.10
+	refs/tags/foo1.3
+	refs/tags/foo1.6
+	refs/tags/four
+	refs/tags/one
+	refs/tags/signed-tag
+	refs/tags/three
+	refs/tags/two
+	EOF
+	git for-each-ref --format="%(refname)" --start-after="" >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'start after a specific reference' '
+	cat >expect <<-\EOF &&
+	refs/tags/annotated-tag
+	refs/tags/doubly-annotated-tag
+	refs/tags/doubly-signed-tag
+	refs/tags/foo1.10
+	refs/tags/foo1.3
+	refs/tags/foo1.6
+	refs/tags/four
+	refs/tags/one
+	refs/tags/signed-tag
+	refs/tags/three
+	refs/tags/two
+	EOF
+	git for-each-ref --format="%(refname)" --start-after=refs/odd/spot >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'start after a specific reference with partial match' '
+	cat >expect <<-\EOF &&
+	refs/odd/spot
+	refs/tags/annotated-tag
+	refs/tags/doubly-annotated-tag
+	refs/tags/doubly-signed-tag
+	refs/tags/foo1.10
+	refs/tags/foo1.3
+	refs/tags/foo1.6
+	refs/tags/four
+	refs/tags/one
+	refs/tags/signed-tag
+	refs/tags/three
+	refs/tags/two
+	EOF
+	git for-each-ref --format="%(refname)" --start-after=refs/odd/sp >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'start after, just behind a specific reference' '
+	cat >expect <<-\EOF &&
+	refs/odd/spot
+	refs/tags/annotated-tag
+	refs/tags/doubly-annotated-tag
+	refs/tags/doubly-signed-tag
+	refs/tags/foo1.10
+	refs/tags/foo1.3
+	refs/tags/foo1.6
+	refs/tags/four
+	refs/tags/one
+	refs/tags/signed-tag
+	refs/tags/three
+	refs/tags/two
+	EOF
+	git for-each-ref --format="%(refname)" --start-after=refs/odd/parrot >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'start after with specific directory match' '
+	cat >expect <<-\EOF &&
+	refs/odd/spot
+	refs/tags/annotated-tag
+	refs/tags/doubly-annotated-tag
+	refs/tags/doubly-signed-tag
+	refs/tags/foo1.10
+	refs/tags/foo1.3
+	refs/tags/foo1.6
+	refs/tags/four
+	refs/tags/one
+	refs/tags/signed-tag
+	refs/tags/three
+	refs/tags/two
+	EOF
+	git for-each-ref --format="%(refname)" --start-after=refs/odd >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'start after with specific directory and trailing slash' '
+	cat >expect <<-\EOF &&
+	refs/odd/spot
+	refs/tags/annotated-tag
+	refs/tags/doubly-annotated-tag
+	refs/tags/doubly-signed-tag
+	refs/tags/foo1.10
+	refs/tags/foo1.3
+	refs/tags/foo1.6
+	refs/tags/four
+	refs/tags/one
+	refs/tags/signed-tag
+	refs/tags/three
+	refs/tags/two
+	EOF
+	git for-each-ref --format="%(refname)" --start-after=refs/odd/ >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'start after, just behind a specific directory' '
+	cat >expect <<-\EOF &&
+	refs/odd/spot
+	refs/tags/annotated-tag
+	refs/tags/doubly-annotated-tag
+	refs/tags/doubly-signed-tag
+	refs/tags/foo1.10
+	refs/tags/foo1.3
+	refs/tags/foo1.6
+	refs/tags/four
+	refs/tags/one
+	refs/tags/signed-tag
+	refs/tags/three
+	refs/tags/two
+	EOF
+	git for-each-ref --format="%(refname)" --start-after=refs/lost >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'start after, overflow specific reference length' '
+	cat >expect <<-\EOF &&
+	refs/tags/annotated-tag
+	refs/tags/doubly-annotated-tag
+	refs/tags/doubly-signed-tag
+	refs/tags/foo1.10
+	refs/tags/foo1.3
+	refs/tags/foo1.6
+	refs/tags/four
+	refs/tags/one
+	refs/tags/signed-tag
+	refs/tags/three
+	refs/tags/two
+	EOF
+	git for-each-ref --format="%(refname)" --start-after=refs/odd/spotnew >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'start after, overflow specific reference path' '
+	cat >expect <<-\EOF &&
+	refs/tags/annotated-tag
+	refs/tags/doubly-annotated-tag
+	refs/tags/doubly-signed-tag
+	refs/tags/foo1.10
+	refs/tags/foo1.3
+	refs/tags/foo1.6
+	refs/tags/four
+	refs/tags/one
+	refs/tags/signed-tag
+	refs/tags/three
+	refs/tags/two
+	EOF
+	git for-each-ref --format="%(refname)" --start-after=refs/odd/spot/new >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'start after, last reference' '
+	cat >expect <<-\EOF &&
+	EOF
+	git for-each-ref --format="%(refname)" --start-after=refs/tags/two >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'start after used with a pattern' '
+	cat >expect <<-\EOF &&
+	fatal: cannot use --start-after with patterns
+	EOF
+	test_must_fail git for-each-ref --format="%(refname)" --start-after=refs/odd/spot refs/tags 2>actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'start after used with custom sort order' '
+	cat >expect <<-\EOF &&
+	fatal: cannot use --start-after with custom sort options
+	EOF
+	test_must_fail git for-each-ref --format="%(refname)" --start-after=refs/odd/spot --sort=author 2>actual &&
+	test_cmp expect actual
+'
+
 test_done
-- 
2.49.0
^ permalink raw reply related	[flat|nested] 102+ messages in thread
* Re: [PATCH v5 5/5] for-each-ref: introduce a '--start-after' option
  2025-07-15 11:28   ` [PATCH v5 5/5] for-each-ref: introduce a '--start-after' option Karthik Nayak
@ 2025-07-17 15:31     ` Junio C Hamano
  2025-07-22  8:07       ` Karthik Nayak
  0 siblings, 1 reply; 102+ messages in thread
From: Junio C Hamano @ 2025-07-17 15:31 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: git, ps, schwab, phillip.wood123, Christian Couder
Karthik Nayak <karthik.188@gmail.com> writes:
> The `git-for-each-ref(1)` command is used to iterate over references
> present in a repository. In large repositories with millions of
> references, it would be optimal to paginate this output such that we
> can start iteration from a given reference. This would avoid having to
> iterate over all references from the beginning each time when paginating
> through results.
>
> The previous commit added 'seek' functionality to the reference
> backends. Utilize this and expose a '--start-after' option in
> 'git-for-each-ref(1)'. When used, the reference iteration seeks to the
> lexicographically next reference and iterates from there onward.
>
> This enables efficient pagination workflows, where the calling script
> can remember the last provided reference and use that as the starting
> point for the next set of references:
>     git for-each-ref --count=100
>     git for-each-ref --count=100 --start-after=refs/heads/branch-100
>     git for-each-ref --count=100 --start-after=refs/heads/branch-200
>
> Since the reference iterators only allow seeking to a specified marker
> via the `ref_iterator_seek()`, we introduce a helper function
> `start_ref_iterator_after()`, which seeks to next reference by simply
> adding (char) 1 to the marker.
>
> We must note that pagination always continues from the provided marker,
> as such any concurrent reference updates lexicographically behind the
> marker will not be output. Document the same.
>
> Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
> ---
>  Documentation/git-for-each-ref.adoc |  10 +-
>  builtin/for-each-ref.c              |   8 ++
>  ref-filter.c                        |  78 +++++++++++----
>  ref-filter.h                        |   1 +
>  t/t6302-for-each-ref-filter.sh      | 194 ++++++++++++++++++++++++++++++++++++
>  5 files changed, 272 insertions(+), 19 deletions(-)
>
> diff --git a/Documentation/git-for-each-ref.adoc b/Documentation/git-for-each-ref.adoc
> index 5ef89fc0fe..ae61ba642a 100644
> --- a/Documentation/git-for-each-ref.adoc
> +++ b/Documentation/git-for-each-ref.adoc
> @@ -14,7 +14,7 @@ SYNOPSIS
>  		   [--points-at=<object>]
>  		   [--merged[=<object>]] [--no-merged[=<object>]]
>  		   [--contains[=<object>]] [--no-contains[=<object>]]
> -		   [--exclude=<pattern> ...]
> +		   [--exclude=<pattern> ...] [--start-after=<marker>]
Not a problem this patch introduces, but as I noticed it, let me
leave a #leftoverbits comment here (it is OK to have a preliminary
clean-up patch).
 * "--exclude=<pattern>" should be enclosed inside a pair of
   (parentheses), just like the way how [(--sort=<key>)...] is
   shown.
 * [--stdin | <pattern>...] should be moved to the end.  There is no
   reason to require "--stdin" to be the end of dashed options, but
   the <pattern>... must be, as they are positional, not dashed.
> +--start-after=<marker>::
> +    Allows paginating the output by skipping references up to and including the
> +    specified marker. When paging, it should be noted that references may be
> +    deleted, modified or added between invocations. Output will only yield those
> +    references which follow the marker lexicographically. Output begins from the
> +    first reference that would come after the marker alphabetically. Cannot be
> +    used with general pattern matching or custom sort options.
It is unclear what "general" in "general pattern matching" refers
to.
    Cannot be used with `--sort=<key>` or `--stdin` options, or
    the _<pattern>_ argument(s) to limit the refs.
or something, perhaps?  It is curious how `--exclude=<pattern>`
interacts with the feature.  Presumably the exclusion is done so
late in the output phase that it does not have any effect?  It does
not have to be mentioned in this documentation if that is the case
as it is a mere implementation detail.  
    Side note.  The limitation that sorting and name_patterns cannot
    be used with the feature also comes from implementation
    (i.e. the name_patterns optimization will compete with this
    feature to take advantage of the "prefix" thing in an
    incompatible way), so while the reason does not have to be
    stated in the end-user facing documentation, the effect needs
    documenting.
> @@ -3189,6 +3221,7 @@ void filter_is_base(struct repository *r,
>  
>  static int do_filter_refs(struct ref_filter *filter, unsigned int type, each_ref_fn fn, void *cb_data)
>  {
> +	const char *prefix = NULL;
> ...
> +
> +	if (prefix) {
> +		struct ref_iterator *iter;
> +
> +		iter = refs_ref_iterator_begin(get_main_ref_store(the_repository),
> +					       "", NULL, 0, 0);
> +
> +		if (filter->start_after)
The start_after of the filter comes from "--start-after=<mark>".
Can it be true with non-NULL prefix at this point?  Unless you add
support for the option to "git branch/tag", it would not happen, I
guess.
More importantly, when you do add support to "git branch/tag", the
code needs to be updated to keep the original prefix while seeking
the cursor to the specified <mark>, instead of clearing it.
> +			ret = start_ref_iterator_after(iter, filter->start_after);
> +		else if (prefix)
> +			ret = ref_iterator_seek(iter, prefix, 1);
We have "REF_ITERATOR_SEEK_SET_PREFIX" for that "1"?
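
As a standalone illustration of the seek trick discussed above (not the
ref-filter.c code itself), the sketch below models the ref store as a
sorted array of names. `seek_sorted()` and `seek_after()` are
hypothetical helpers invented for this example. It relies on the fact
that, for NUL-terminated strings compared bytewise, the smallest string
sorting strictly after a marker is the marker followed by byte 1, which
is what `start_ref_iterator_after()` appends before seeking.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for a ref backend: `refs` is an array of `nr`
 * names sorted with strcmp(). Returns the index of the first entry
 * that sorts at or after `key` (or `nr` if there is none).
 */
size_t seek_sorted(const char **refs, size_t nr, const char *key)
{
	size_t lo = 0, hi = nr;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (strcmp(refs[mid], key) < 0)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo;
}

/*
 * Returns the index of the first entry that sorts strictly after
 * `marker`, by seeking to `marker` with a single byte of value 1
 * appended. The marker itself, if present, is skipped.
 */
size_t seek_after(const char **refs, size_t nr, const char *marker)
{
	size_t len, pos;
	char *key;

	assert(marker != NULL);
	len = strlen(marker);
	key = malloc(len + 2);
	if (!key)
		abort();

	memcpy(key, marker, len);
	key[len] = 1;
	key[len + 1] = '\0';

	pos = seek_sorted(refs, nr, key);
	free(key);
	return pos;
}
```

With the names used in the t6302 tests, a marker of `refs/odd/sp` lands
on `refs/odd/spot`, while a marker of `refs/odd/spot` lands on the first
`refs/tags/` entry, which matches the expectations in the test script.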
^ permalink raw reply	[flat|nested] 102+ messages in thread
* Re: [PATCH v5 5/5] for-each-ref: introduce a '--start-after' option
  2025-07-17 15:31     ` Junio C Hamano
@ 2025-07-22  8:07       ` Karthik Nayak
  0 siblings, 0 replies; 102+ messages in thread
From: Karthik Nayak @ 2025-07-22  8:07 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, ps, schwab, phillip.wood123, Christian Couder
Junio C Hamano <gitster@pobox.com> writes:
> Karthik Nayak <karthik.188@gmail.com> writes:
>
>> The `git-for-each-ref(1)` command is used to iterate over references
>> present in a repository. In large repositories with millions of
>> references, it would be optimal to paginate this output such that we
>> can start iteration from a given reference. This would avoid having to
>> iterate over all references from the beginning each time when paginating
>> through results.
>>
>> The previous commit added 'seek' functionality to the reference
>> backends. Utilize this and expose a '--start-after' option in
>> 'git-for-each-ref(1)'. When used, the reference iteration seeks to the
>> lexicographically next reference and iterates from there onward.
>>
>> This enables efficient pagination workflows, where the calling script
>> can remember the last provided reference and use that as the starting
>> point for the next set of references:
>>     git for-each-ref --count=100
>>     git for-each-ref --count=100 --start-after=refs/heads/branch-100
>>     git for-each-ref --count=100 --start-after=refs/heads/branch-200
>>
>> Since the reference iterators only allow seeking to a specified marker
>> via the `ref_iterator_seek()`, we introduce a helper function
>> `start_ref_iterator_after()`, which seeks to next reference by simply
>> adding (char) 1 to the marker.
>>
>> We must note that pagination always continues from the provided marker,
>> as such any concurrent reference updates lexicographically behind the
>> marker will not be output. Document the same.
>>
>> Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
>> ---
>>  Documentation/git-for-each-ref.adoc |  10 +-
>>  builtin/for-each-ref.c              |   8 ++
>>  ref-filter.c                        |  78 +++++++++++----
>>  ref-filter.h                        |   1 +
>>  t/t6302-for-each-ref-filter.sh      | 194 ++++++++++++++++++++++++++++++++++++
>>  5 files changed, 272 insertions(+), 19 deletions(-)
>>
>> diff --git a/Documentation/git-for-each-ref.adoc b/Documentation/git-for-each-ref.adoc
>> index 5ef89fc0fe..ae61ba642a 100644
>> --- a/Documentation/git-for-each-ref.adoc
>> +++ b/Documentation/git-for-each-ref.adoc
>> @@ -14,7 +14,7 @@ SYNOPSIS
>>  		   [--points-at=<object>]
>>  		   [--merged[=<object>]] [--no-merged[=<object>]]
>>  		   [--contains[=<object>]] [--no-contains[=<object>]]
>> -		   [--exclude=<pattern> ...]
>> +		   [--exclude=<pattern> ...] [--start-after=<marker>]
>
> Not a problem this patch introduces, but as I noticed it, let me
> leave a #leftoverbits comment here (it is OK to have a preliminary
> clean-up patch).
>
>  * "--exclude=<pattern>" should be enclosed inside a pair of
>    (parentheses), just like the way how [(--sort=<key>)...] is
>    shown.
>
>  * [--stdin | <pattern>...] should be moved to the end.  There is no
>    reason to require "--stdin" to be the end of dashed options, but
>    the <pattern>... must be, as they are positional, not dashed.
>
I'm sending in a series of these small fixes, so I'll add this in.
>> +--start-after=<marker>::
>> +    Allows paginating the output by skipping references up to and including the
>> +    specified marker. When paging, it should be noted that references may be
>> +    deleted, modified or added between invocations. Output will only yield those
>> +    references which follow the marker lexicographically. Output begins from the
>> +    first reference that would come after the marker alphabetically. Cannot be
>> +    used with general pattern matching or custom sort options.
>
> It is unclear what "general" in "general pattern matching" refers
> to.
>
>     Cannot be used with `--sort=<key>` or `--stdin` options, or
>     the _<pattern>_ argument(s) to limit the refs.
>
This does read much better. I'll also add this.
> or something, perhaps?  It is curious how `--exclude=<pattern>`
> interacts with the feature.  Presumably the exclusion is done so
> late in the output phase that it does not have any effect?  It does
> not have to be mentioned in this documentation if that is the case
> as it is a mere implementation detail.
That is indeed correct. While this doesn't have to be documented, I
think we would benefit from a test, so I'll add that in.
>
>     Side note.  The limitation that sorting and name_patterns cannot
>     be used with the feature also comes from implementation
>     (i.e. the name_patterns optimization will compete with this
>     feature to take advantage of the "prefix" thing in an
>     incompatible way), so while the reason does not have to be
>     stated in the end-user facing documentation, the effect needs
>     documenting.
>
>> @@ -3189,6 +3221,7 @@ void filter_is_base(struct repository *r,
>>
>>  static int do_filter_refs(struct ref_filter *filter, unsigned int type, each_ref_fn fn, void *cb_data)
>>  {
>> +	const char *prefix = NULL;
>> ...
>> +
>> +	if (prefix) {
>> +		struct ref_iterator *iter;
>> +
>> +		iter = refs_ref_iterator_begin(get_main_ref_store(the_repository),
>> +					       "", NULL, 0, 0);
>> +
>> +		if (filter->start_after)
>
> The start_after of the filter comes from "--start-after=<mark>".
> Can it be true with non-NULL prefix at this point?  Unless you add
> support for the option to "git branch/tag", it would not happen, I
> guess.
>
> More importantly, when you do add support to "git branch/tag", the
> code needs to be updated to keep the original prefix while seeking
> the cursor to the specified <mark>, instead of clearing it.
>
Exactly, if we do add '<pattern>' and '--start-after' compatibility,
we'll have to make that change.
>> +			ret = start_ref_iterator_after(iter, filter->start_after);
>> +		else if (prefix)
>> +			ret = ref_iterator_seek(iter, prefix, 1);
>
> We have "REF_ITERATOR_SEEK_SET_PREFIX" for that "1"?
Yup, also the 'if (prefix)' can be dropped too.
^ permalink raw reply	[flat|nested] 102+ messages in thread
* Re: [PATCH v5 0/5] for-each-ref: introduce seeking functionality via '--start-after'
  2025-07-15 11:28 ` [PATCH v5 0/5] " Karthik Nayak
                     ` (4 preceding siblings ...)
  2025-07-15 11:28   ` [PATCH v5 5/5] for-each-ref: introduce a '--start-after' option Karthik Nayak
@ 2025-07-15 19:00   ` Junio C Hamano
  2025-07-17  1:19     ` Kyle Lippincott
  2025-07-23 21:51   ` [PATCH] ref-iterator-seek: correctly initialize the prefix_state for a new level Junio C Hamano
  6 siblings, 1 reply; 102+ messages in thread
From: Junio C Hamano @ 2025-07-15 19:00 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: git, ps, schwab, phillip.wood123, Christian Couder
Karthik Nayak <karthik.188@gmail.com> writes:
> Changes in v5:
> - Changes to the comments to refer to the flag
>   'REF_ITERATOR_SEEK_SET_PREFIX' instead of a variable used in older
>   versions. Also other small grammar fixes.
> - Added a commit to remove an unnecessary else clause.
> - Move seeking functionality within `for_each_fullref_in_pattern` to its
>   own function.
> - Fix incorrect naming in the tests.
> - Link to v4: https://lore.kernel.org/r/20250711-306-git-for-each-ref-pagination-v4-0-ed3303ad5b89@gmail.com
The two refactoring differences relative to the previous round do
make the result more pleasant to the eyes.  Looking great.
Will replace.  Thanks.
^ permalink raw reply	[flat|nested] 102+ messages in thread
* Re: [PATCH v5 0/5] for-each-ref: introduce seeking functionality via '--start-after'
  2025-07-15 19:00   ` [PATCH v5 0/5] for-each-ref: introduce seeking functionality via '--start-after' Junio C Hamano
@ 2025-07-17  1:19     ` Kyle Lippincott
  2025-07-17  1:54       ` Jeff King
  0 siblings, 1 reply; 102+ messages in thread
From: Kyle Lippincott @ 2025-07-17  1:19 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Karthik Nayak, git, ps, schwab, phillip.wood123, Christian Couder
There's something in this series that's triggering an msan warning in
t/t6302-for-each-ref-filter:
==2147==WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0x562a64e923cc in cache_ref_iterator_advance refs/ref-cache.c:409:27
    #1 0x562a64e88dc0 in ref_iterator_advance refs/iterator.c:15:9
    #2 0x562a64e88dc0 in merge_ref_iterator_advance refs/iterator.c:164:13
    #3 0x562a64e8850b in ref_iterator_advance refs/iterator.c:15:9
    #4 0x562a64e85f5a in files_ref_iterator_advance refs/files-backend.c:902:15
    #5 0x562a64e88bb4 in ref_iterator_advance refs/iterator.c:15:9
    #6 0x562a64e88bb4 in do_for_each_ref_iterator refs/iterator.c:478:15
    #7 0x562a64e64593 in for_each_fullref_with_seek ref-filter.c:2718:9
    #8 0x562a64e5cfe8 in for_each_fullref_in_pattern ref-filter.c
    #9 0x562a64e5cfe8 in do_filter_refs ref-filter.c:3263:9
    #10 0x562a64e5d7fc in filter_and_format_refs ref-filter.c:3364:3
    #11 0x562a64af0235 in cmd_for_each_ref builtin/for-each-ref.c:115:2
    #12 0x562a64a3ebdc in run_builtin git.c:480:11
    #13 0x562a64a3d342 in handle_builtin git.c:746:9
    #14 0x562a64a3be33 in run_argv git.c:813:4
    #15 0x562a64a3be33 in cmd_main git.c:953:19
    #16 0x562a64c2f12f in main common-main.c:9:11
SUMMARY: MemorySanitizer: use-of-uninitialized-value
refs/ref-cache.c:409:27 in cache_ref_iterator_advance
Unfortunately I can't provide great instructions for reproducing this
locally, because it relies on our internal build stack (which uses
blaze). Getting MemorySanitizer running can be quite annoying, though
you might not have any issues if this test doesn't invoke any third
party libraries (like zlib).
I need to sign off for the night soon, but if this isn't sufficient
enough information to identify what's happening here, I can try to dig
deeper tomorrow. This run was executed on an import of upstream commit
4ea3c74afd42a503b3e0d60e1fec33bc0431e7bc (Junio's merge of this
series)
^ permalink raw reply	[flat|nested] 102+ messages in thread
* Re: [PATCH v5 0/5] for-each-ref: introduce seeking functionality via '--start-after'
  2025-07-17  1:19     ` Kyle Lippincott
@ 2025-07-17  1:54       ` Jeff King
  2025-07-17 17:08         ` Kyle Lippincott
  0 siblings, 1 reply; 102+ messages in thread
From: Jeff King @ 2025-07-17  1:54 UTC (permalink / raw)
  To: Kyle Lippincott
  Cc: Junio C Hamano, Karthik Nayak, git, ps, schwab, phillip.wood123,
	Christian Couder
On Wed, Jul 16, 2025 at 06:19:32PM -0700, Kyle Lippincott wrote:
> Unfortunately I can't provide great instructions for reproducing this
> locally, because it relies on our internal build stack (which uses
> blaze). Getting MemorySanitizer running can be quite annoying, though
> you might not have any issues if this test doesn't invoke any third
> party libraries (like zlib).
> 
> I need to sign off for the night soon, but if this isn't sufficient
> enough information to identify what's happening here, I can try to dig
> deeper tomorrow. This run was executed on an import of upstream commit
> 4ea3c74afd42a503b3e0d60e1fec33bc0431e7bc (Junio's merge of this
> series)
valgrind can often find the same issues as MSan without as much headache
to get it running (the downside is that it is _way_ slower). And indeed:
  git checkout 4ea3c74afd42a503b3e0d60e1fec33bc0431e7bc &&
  make &&
  (cd t && ./t6302-for-each-ref-filter.sh --valgrind-only=48)
yields:
  ==2177572== Conditional jump or move depends on uninitialised value(s)
  ==2177572==    at 0x3BC380: cache_ref_iterator_advance (ref-cache.c:409)
  ==2177572==    by 0x3B69D7: ref_iterator_advance (iterator.c:15)
  ==2177572==    by 0x3B6CC3: merge_ref_iterator_advance (iterator.c:179)
  ==2177572==    by 0x3B69D7: ref_iterator_advance (iterator.c:15)
  ==2177572==    by 0x3A9770: files_ref_iterator_advance (files-backend.c:902)
  ==2177572==    by 0x3B69D7: ref_iterator_advance (iterator.c:15)
  ==2177572==    by 0x3B7457: do_for_each_ref_iterator (iterator.c:478)
  ==2177572==    by 0x399B43: for_each_fullref_with_seek (ref-filter.c:2718)
  ==2177572==    by 0x399C09: for_each_fullref_in_pattern (ref-filter.c:2756)
  ==2177572==    by 0x39B031: do_filter_refs (ref-filter.c:3263)
  ==2177572==    by 0x39B2B7: filter_and_format_refs (ref-filter.c:3364)
  ==2177572==    by 0x18C1D2: cmd_for_each_ref (for-each-ref.c:115)
  ==2177572==  Uninitialised value was created by a heap allocation
  ==2177572==    at 0x484BDD0: realloc (vg_replace_malloc.c:1801)
  ==2177572==    by 0x44E941: xrealloc (wrapper.c:140)
  ==2177572==    by 0x3BCAD9: cache_ref_iterator_begin (ref-cache.c:580)
  ==2177572==    by 0x3A988A: files_ref_iterator_begin (files-backend.c:995)
  ==2177572==    by 0x3A295E: refs_ref_iterator_begin (refs.c:1776)
  ==2177572==    by 0x399AF6: for_each_fullref_with_seek (ref-filter.c:2710)
  ==2177572==    by 0x399C09: for_each_fullref_in_pattern (ref-filter.c:2756)
  ==2177572==    by 0x39B031: do_filter_refs (ref-filter.c:3263)
  ==2177572==    by 0x39B2B7: filter_and_format_refs (ref-filter.c:3364)
  ==2177572==    by 0x18C1D2: cmd_for_each_ref (for-each-ref.c:115)
  ==2177572==    by 0x128C90: run_builtin (git.c:480)
  ==2177572==    by 0x1290EB: handle_builtin (git.c:746)
Bisecting doesn't tell us much, though (the first commit that introduces
the test shows the problem). I didn't dig further than that.
-Peff
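
The trace above points at memory obtained from realloc() being read
before it was written. As a generic illustration of that class of bug,
and not the actual refs/ref-cache.c code, the sketch below grows a stack
of per-level state with realloc() and assigns every field of a new
entry before it is used. `struct level`, `struct level_stack` and
`push_level()` are hypothetical names; only the `prefix_state` field
name is borrowed from this thread.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-level iteration state; fields are illustrative. */
struct level {
	int index;
	int prefix_state;
};

/* Hypothetical growable stack of levels. */
struct level_stack {
	struct level *levels;
	size_t nr, alloc;
};

/*
 * Push a new level and return it. realloc() leaves the newly grown
 * region indeterminate, so every field of the new entry is assigned
 * here; forgetting one (say, prefix_state) is what tools such as
 * valgrind or MemorySanitizer flag when the field is later read.
 */
struct level *push_level(struct level_stack *st, int prefix_state)
{
	struct level *lvl;

	assert(st != NULL);
	if (st->nr == st->alloc) {
		size_t alloc = st->alloc ? st->alloc * 2 : 4;
		struct level *grown = realloc(st->levels,
					      alloc * sizeof(*grown));

		if (!grown)
			abort();
		st->levels = grown;
		st->alloc = alloc;
	}

	lvl = &st->levels[st->nr++];
	lvl->index = -1;
	lvl->prefix_state = prefix_state;
	return lvl;
}
```

Initializing at the single place where an entry is created keeps later
readers from having to know which fields a particular caller set.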
^ permalink raw reply	[flat|nested] 102+ messages in thread
* Re: [PATCH v5 0/5] for-each-ref: introduce seeking functionality via '--start-after'
  2025-07-17  1:54       ` Jeff King
@ 2025-07-17 17:08         ` Kyle Lippincott
  2025-07-17 19:26           ` Karthik Nayak
  0 siblings, 1 reply; 102+ messages in thread
From: Kyle Lippincott @ 2025-07-17 17:08 UTC (permalink / raw)
  To: Jeff King
  Cc: Junio C Hamano, Karthik Nayak, git, ps, schwab, phillip.wood123,
	Christian Couder
On Wed, Jul 16, 2025 at 6:54 PM Jeff King <peff@peff.net> wrote:
>
> On Wed, Jul 16, 2025 at 06:19:32PM -0700, Kyle Lippincott wrote:
>
> > Unfortunately I can't provide great instructions for reproducing this
> > locally, because it relies on our internal build stack (which uses
> > blaze). Getting MemorySanitizer running can be quite annoying, though
> > you might not have any issues if this test doesn't invoke any third
> > party libraries (like zlib).
> >
> > I need to sign off for the night soon, but if this isn't sufficient
> > enough information to identify what's happening here, I can try to dig
> > deeper tomorrow. This run was executed on an import of upstream commit
> > 4ea3c74afd42a503b3e0d60e1fec33bc0431e7bc (Junio's merge of this
> > series)
>
> valgrind can often find the same issues as MSan without as much headache
> to get it running (the downside is that it is _way_ slower). And indeed:
>
>   git checkout 4ea3c74afd42a503b3e0d60e1fec33bc0431e7bc &&
>   make &&
>   (cd t && ./t6302-for-each-ref-filter.sh --valgrind-only=48)
>
> yields:
>
>   ==2177572== Conditional jump or move depends on uninitialised value(s)
>   ==2177572==    at 0x3BC380: cache_ref_iterator_advance (ref-cache.c:409)
>   ==2177572==    by 0x3B69D7: ref_iterator_advance (iterator.c:15)
>   ==2177572==    by 0x3B6CC3: merge_ref_iterator_advance (iterator.c:179)
>   ==2177572==    by 0x3B69D7: ref_iterator_advance (iterator.c:15)
>   ==2177572==    by 0x3A9770: files_ref_iterator_advance (files-backend.c:902)
>   ==2177572==    by 0x3B69D7: ref_iterator_advance (iterator.c:15)
>   ==2177572==    by 0x3B7457: do_for_each_ref_iterator (iterator.c:478)
>   ==2177572==    by 0x399B43: for_each_fullref_with_seek (ref-filter.c:2718)
>   ==2177572==    by 0x399C09: for_each_fullref_in_pattern (ref-filter.c:2756)
>   ==2177572==    by 0x39B031: do_filter_refs (ref-filter.c:3263)
>   ==2177572==    by 0x39B2B7: filter_and_format_refs (ref-filter.c:3364)
>   ==2177572==    by 0x18C1D2: cmd_for_each_ref (for-each-ref.c:115)
>   ==2177572==  Uninitialised value was created by a heap allocation
>   ==2177572==    at 0x484BDD0: realloc (vg_replace_malloc.c:1801)
>   ==2177572==    by 0x44E941: xrealloc (wrapper.c:140)
>   ==2177572==    by 0x3BCAD9: cache_ref_iterator_begin (ref-cache.c:580)
>   ==2177572==    by 0x3A988A: files_ref_iterator_begin (files-backend.c:995)
>   ==2177572==    by 0x3A295E: refs_ref_iterator_begin (refs.c:1776)
>   ==2177572==    by 0x399AF6: for_each_fullref_with_seek (ref-filter.c:2710)
>   ==2177572==    by 0x399C09: for_each_fullref_in_pattern (ref-filter.c:2756)
>   ==2177572==    by 0x39B031: do_filter_refs (ref-filter.c:3263)
>   ==2177572==    by 0x39B2B7: filter_and_format_refs (ref-filter.c:3364)
>   ==2177572==    by 0x18C1D2: cmd_for_each_ref (for-each-ref.c:115)
>   ==2177572==    by 0x128C90: run_builtin (git.c:480)
>   ==2177572==    by 0x1290EB: handle_builtin (git.c:746)
>
> Bisecting doesn't tell us much, though (the first commit that introduces
> the test shows the problem). I didn't dig further than that.
>
> -Peff
Thanks for that, it helped me too, as it provides more information
than I was getting out of MemorySanitizer (I suspect MemorySanitizer
was producing the information but it just wasn't going to stderr, or
maybe I was missing a flag to get it to report more).
I'm not sure what the right fix would be. My guess is that the places
where we set levels_nr and initialize the other fields in level
(around lines 488 and 527 in ref-cache.c) should also set
prefix_state; indeed, setting prefix_state to PREFIX_CONTAINS_DIR
(the 0 value of the enum) makes the test pass even under valgrind.
Unfortunately, without much more in-depth knowledge of the code and
the enum values I can't definitively state that those are the correct
values. I can say that setting it to PREFIX_WITHIN_DIR causes both
additional valgrind failures and test failures even without valgrind,
but setting it to PREFIX_EXCLUDES_DIR doesn't seem to be a problem.
I also moved the `if` around line 409 into the following `if`,
because that was the only place entry_prefix_state was used. I'd been
thinking that maybe it needed the check for entry->flag & REF_DIR
before referencing level->prefix_state, but that didn't resolve it on
its own.
I don't mind if anyone else picks up this fix and runs with it, but
I'm not comfortable sending this patch myself: I don't have enough
knowledge of this area of the code to know if it's right, just that it
fixes the issue we encountered, and I'm extremely overloaded right now
and can't get that knowledge or see the patch through to the end.
diff --git a/refs/ref-cache.c b/refs/ref-cache.c
index 1d95b56d40..24feb33fcb 100644
--- a/refs/ref-cache.c
+++ b/refs/ref-cache.c
@@ -391,7 +391,6 @@ static int cache_ref_iterator_advance(struct ref_iterator *ref_iterator)
                        &iter->levels[iter->levels_nr - 1];
                struct ref_dir *dir = level->dir;
                struct ref_entry *entry;
-               enum prefix_state entry_prefix_state;
                if (level->index == -1)
                        sort_ref_dir(dir);
@@ -406,16 +405,17 @@ static int cache_ref_iterator_advance(struct ref_iterator *ref_iterator)
                entry = dir->entries[level->index];
-               if (level->prefix_state == PREFIX_WITHIN_DIR) {
-                       entry_prefix_state = overlaps_prefix(entry->name, iter->prefix);
-                       if (entry_prefix_state == PREFIX_EXCLUDES_DIR ||
-                           (entry_prefix_state == PREFIX_WITHIN_DIR && !(entry->flag & REF_DIR)))
-                               continue;
-               } else {
-                       entry_prefix_state = level->prefix_state;
-               }
-
                if (entry->flag & REF_DIR) {
+                       enum prefix_state entry_prefix_state;
+                       if (level->prefix_state == PREFIX_WITHIN_DIR) {
+                               entry_prefix_state = overlaps_prefix(entry->name, iter->prefix);
+                               if (entry_prefix_state == PREFIX_EXCLUDES_DIR ||
+                                   (entry_prefix_state == PREFIX_WITHIN_DIR && !(entry->flag & REF_DIR)))
+                                       continue;
+                       } else {
+                               entry_prefix_state = level->prefix_state;
+                       }
+
                        /* push down a level */
                        ALLOC_GROW(iter->levels, iter->levels_nr + 1,
                                   iter->levels_alloc);
@@ -489,6 +489,7 @@ static int cache_ref_iterator_seek(struct ref_iterator *ref_iterator,
                level = &iter->levels[0];
                level->index = -1;
                level->dir = dir;
+               level->prefix_state = PREFIX_EXCLUDES_DIR;      // FIXME: PROBABLY NOT CORRECT
                /* Unset any previously set prefix */
                FREE_AND_NULL(iter->prefix);
@@ -527,6 +528,7 @@ static int cache_ref_iterator_seek(struct ref_iterator *ref_iterator,
                                level = &iter->levels[iter->levels_nr++];
                                level->dir = dir;
                                level->index = -1;
+                               level->prefix_state = PREFIX_EXCLUDES_DIR;      // FIXME: PROBABLY NOT CORRECT
                        } else {
                                /* reduce the index so the leaf node is iterated over */
                                if (cmp <= 0 && !slash)
* Re: [PATCH v5 0/5] for-each-ref: introduce seeking functionality via '--start-after'
  2025-07-17 17:08         ` Kyle Lippincott
@ 2025-07-17 19:26           ` Karthik Nayak
  2025-07-17 19:35             ` Kyle Lippincott
  2025-07-17 22:21             ` Junio C Hamano
  0 siblings, 2 replies; 102+ messages in thread
From: Karthik Nayak @ 2025-07-17 19:26 UTC (permalink / raw)
  To: Kyle Lippincott, Jeff King, Patrick Steinhardt
  Cc: Junio C Hamano, git, schwab, phillip.wood123, Christian Couder
Kyle Lippincott <spectral@google.com> writes:
> On Wed, Jul 16, 2025 at 6:54 PM Jeff King <peff@peff.net> wrote:
>>
>> On Wed, Jul 16, 2025 at 06:19:32PM -0700, Kyle Lippincott wrote:
>>
>> > Unfortunately I can't provide great instructions for reproducing this
>> > locally, because it relies on our internal build stack (which uses
>> > blaze). Getting MemorySanitizer running can be quite annoying, though
>> > you might not have any issues if this test doesn't invoke any third
>> > party libraries (like zlib).
>> >
>> > I need to sign off for the night soon, but if this isn't sufficient
>> > enough information to identify what's happening here, I can try to dig
>> > deeper tomorrow. This run was executed on an import of upstream commit
>> > 4ea3c74afd42a503b3e0d60e1fec33bc0431e7bc (Junio's merge of this
>> > series)
>>
>> valgrind can often find the same issues as MSan without as much headache
>> to get it running (the downside is that it is _way_ slower). And indeed:
>>
>>   git checkout 4ea3c74afd42a503b3e0d60e1fec33bc0431e7bc &&
>>   make &&
>>   (cd t && ./t6302-for-each-ref-filter.sh --valgrind-only=48)
>>
>> yields:
>>
>>   ==2177572== Conditional jump or move depends on uninitialised value(s)
>>   ==2177572==    at 0x3BC380: cache_ref_iterator_advance (ref-cache.c:409)
>>   ==2177572==    by 0x3B69D7: ref_iterator_advance (iterator.c:15)
>>   ==2177572==    by 0x3B6CC3: merge_ref_iterator_advance (iterator.c:179)
>>   ==2177572==    by 0x3B69D7: ref_iterator_advance (iterator.c:15)
>>   ==2177572==    by 0x3A9770: files_ref_iterator_advance (files-backend.c:902)
>>   ==2177572==    by 0x3B69D7: ref_iterator_advance (iterator.c:15)
>>   ==2177572==    by 0x3B7457: do_for_each_ref_iterator (iterator.c:478)
>>   ==2177572==    by 0x399B43: for_each_fullref_with_seek (ref-filter.c:2718)
>>   ==2177572==    by 0x399C09: for_each_fullref_in_pattern (ref-filter.c:2756)
>>   ==2177572==    by 0x39B031: do_filter_refs (ref-filter.c:3263)
>>   ==2177572==    by 0x39B2B7: filter_and_format_refs (ref-filter.c:3364)
>>   ==2177572==    by 0x18C1D2: cmd_for_each_ref (for-each-ref.c:115)
>>   ==2177572==  Uninitialised value was created by a heap allocation
>>   ==2177572==    at 0x484BDD0: realloc (vg_replace_malloc.c:1801)
>>   ==2177572==    by 0x44E941: xrealloc (wrapper.c:140)
>>   ==2177572==    by 0x3BCAD9: cache_ref_iterator_begin (ref-cache.c:580)
>>   ==2177572==    by 0x3A988A: files_ref_iterator_begin (files-backend.c:995)
>>   ==2177572==    by 0x3A295E: refs_ref_iterator_begin (refs.c:1776)
>>   ==2177572==    by 0x399AF6: for_each_fullref_with_seek (ref-filter.c:2710)
>>   ==2177572==    by 0x399C09: for_each_fullref_in_pattern (ref-filter.c:2756)
>>   ==2177572==    by 0x39B031: do_filter_refs (ref-filter.c:3263)
>>   ==2177572==    by 0x39B2B7: filter_and_format_refs (ref-filter.c:3364)
>>   ==2177572==    by 0x18C1D2: cmd_for_each_ref (for-each-ref.c:115)
>>   ==2177572==    by 0x128C90: run_builtin (git.c:480)
>>   ==2177572==    by 0x1290EB: handle_builtin (git.c:746)
>>
>> Bisecting doesn't tell us much, though (the first commit that introduces
>> the test shows the problem). I didn't dig further than that.
>>
>> -Peff
>
> Thanks for that, that helped me a bit too as it provides more
> information than I was getting out of MemorySanitizer (I suspect
> MemorySanitizer was producing the information it just wasn't going to
> stderr or something, or maybe I was missing a flag to get it to report
> more).
>
Thanks both for raising the issue. Thanks, Jeff, also for the valgrind
instructions.
On a side note, I was discussing this at work and Patrick mentioned
that we could also try clang's MemorySanitizer. It also raises issues
on master, though, which made it hard to find the report matching the
valgrind output.
$ git checkout master
$ CC=clang meson setup --reconfigure memory_build . -Db_sanitize=memory
$ cd memory_build
$ meson test -i --test-args="-ix" t6302-for-each-ref-filter
...
==3275333==WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0x557bd886f4bb in git_mkstemps_mode ../wrapper.c:487:27
    #1 0x557bd886fb55 in git_mkstemp_mode ../wrapper.c:509:9
    #2 0x557bd8100d1a in create_tmpfile ../object-file.c:736:7
    #3 0x557bd80f1630 in start_loose_object_common ../object-file.c:781:7
    #4 0x557bd80f5203 in write_loose_object ../object-file.c:881:7
    #5 0x557bd80f4875 in write_object_file_flags ../object-file.c:1086:6
    #6 0x557bd80f9f65 in write_object_file ../object-file.h:181:9
    #7 0x557bd8101eb8 in index_mem ../object-file.c:1177:9
    #8 0x557bd80f8bd5 in index_core ../object-file.c:1247:10
    #9 0x557bd80f731d in index_fd ../object-file.c:1274:9
    #10 0x557bd80f95e4 in index_path ../object-file.c:1295:7
    #11 0x557bd831132d in add_to_index ../read-cache.c:771:7
    #12 0x557bd8313cb1 in add_file_to_index ../read-cache.c:804:9
    #13 0x557bd73f892c in add_files ../builtin/add.c:355:7
    #14 0x557bd73f4752 in cmd_add ../builtin/add.c:578:18
    #15 0x557bd7a38b6f in run_builtin ../git.c:480:11
    #16 0x557bd7a31d54 in handle_builtin ../git.c:746:9
    #17 0x557bd7a36644 in run_argv ../git.c:813:4
    #18 0x557bd7a30e09 in cmd_main ../git.c:953:19
    #19 0x557bd7a3ca01 in main ../common-main.c:9:11
    #20 0x7f7e3f02a4d7 in __libc_start_call_main (/nix/store/g2jzxk3s7cnkhh8yq55l4fbvf639zy37-glibc-2.40-66/lib/libc.so.6+0x2a4d7) (BuildId: f117ee0f586dfa828cbdd08e37393c8f04f6480a)
    #21 0x7f7e3f02a59a in __libc_start_main@GLIBC_2.2.5 (/nix/store/g2jzxk3s7cnkhh8yq55l4fbvf639zy37-glibc-2.40-66/lib/libc.so.6+0x2a59a) (BuildId: f117ee0f586dfa828cbdd08e37393c8f04f6480a)
    #22 0x557bd7352b34 in _start (git+0x5db34)
Possibly something we need to look into cleaning up.
> I'm not sure what the right fix would be; my guess is that the
> fix would be to modify the places where we set levels_nr and
> initialize the other fields in level to also set it to prefix_state
> (around lines 488 and 527 in ref-cache.c); and indeed setting the
> prefix_state to PREFIX_CONTAINS_DIR (the 0 value of the enum) makes
> the test pass even under valgrind. Unfortunately without a much more
> in-depth knowledge of the code and the enum values I can't
> definitively state that those are the correct values. I can say that
> setting it to PREFIX_WITHIN_DIR causes both additional valgrind
> failures and test failures even without valgrind, but setting it to
> PREFIX_EXCLUDES_DIR doesn't seem to be a problem. I also moved the
> `if` around line 409 into the following if, because that was the only
> time entry_prefix_state was used, I'd been thinking that maybe it
> needed the check for entry->flag & REF_DIR prior to referencing
> level->prefix_state, but that didn't resolve it on its own.
>
> I don't mind if anyone else picks up this fix and runs with it, but
> I'm not comfortable sending this patch myself because I don't have
> enough knowledge of this area of the code to know if it's right, just
> that it fixes the issue we encountered, and I'm extremely overloaded
> right now and can't get that knowledge nor see the patch through to
> the end.
>
Thanks for taking a stab at this, your inference is correct. Let me
clarify some parts of it.
The 'ref-cache' iteration logic provides iteration over loose refs
(which consist of directories and entries). Any time we come across a
directory, we push it onto the 'levels' array, which acts as a stack.
When all entries under the current level have been yielded, we pop the
stack to obtain the next level to iterate. This ensures we iterate over
all directories recursively.
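To make the stack behaviour concrete, here is a minimal sketch in C. It
is not the actual 'ref-cache' code: `struct entry`, `push_level()` and
`next_leaf()` are hypothetical stand-ins, and sorting and prefix
handling are left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a loose-ref cache entry (not git's struct). */
struct entry {
	const char *name;
	int is_dir;              /* non-zero for a directory */
	struct entry **children; /* entries inside a directory */
	size_t nr;               /* number of children */
};

/* One stack frame: a directory plus the position reached inside it. */
struct level {
	struct entry *dir;
	int index; /* -1 until the first entry has been visited */
};

struct iter {
	struct level *levels;
	size_t nr, alloc;
};

static void push_level(struct iter *it, struct entry *dir)
{
	if (it->nr == it->alloc) {
		it->alloc = it->alloc ? 2 * it->alloc : 4;
		it->levels = realloc(it->levels, it->alloc * sizeof(*it->levels));
		assert(it->levels);
	}
	it->levels[it->nr].dir = dir;
	it->levels[it->nr].index = -1;
	it->nr++;
}

/* Return the next non-directory entry in depth-first order, or NULL. */
static const char *next_leaf(struct iter *it)
{
	while (it->nr) {
		struct level *level = &it->levels[it->nr - 1];
		struct entry *e;

		if ((size_t)++level->index == level->dir->nr) {
			it->nr--; /* directory exhausted: pop the stack */
			continue;
		}
		e = level->dir->children[level->index];
		if (!e->is_dir)
			return e->name;
		push_level(it, e); /* may realloc, so `level` is stale now */
	}
	return NULL;
}
```

Pushing the root directory once and then calling `next_leaf()` until it
returns NULL visits every leaf, descending into each directory before
moving on to its next sibling.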
Before this series, the seek function was only used to set the prefix
for iteration, which meant we needed to find the directory matching the
prefix and iterate only over that level and its subdirs. If the prefix
provided was a directory like 'refs/heads/', then all refs under it
would be yielded (PREFIX_CONTAINS_DIR). If the prefix was
'refs/heads/foo', then the level would be set to 'refs/heads/' with the
PREFIX_WITHIN_DIR flag set, since only some refs within the dir would
match the prefix. Entries which don't overlap the prefix are denoted by
PREFIX_EXCLUDES_DIR.
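As a rough illustration of the three states, the sketch below
classifies a directory name against a prefix. The enum names and their
order match the ones discussed here (PREFIX_CONTAINS_DIR being the 0
value), but `classify_dir()` is a hypothetical helper written for this
mail; the real code uses overlaps_prefix(), whose body may differ.

```c
#include <assert.h>

enum prefix_state {
	PREFIX_CONTAINS_DIR, /* every ref under the directory matches the prefix */
	PREFIX_WITHIN_DIR,   /* only some refs under the directory can match */
	PREFIX_EXCLUDES_DIR  /* no ref under the directory can match */
};

/* Hypothetical helper: how does `dirname` relate to `prefix`? */
static enum prefix_state classify_dir(const char *dirname, const char *prefix)
{
	assert(dirname && prefix);

	/* Skip the part that both strings have in common. */
	while (*prefix && *dirname == *prefix) {
		dirname++;
		prefix++;
	}
	if (!*prefix)
		return PREFIX_CONTAINS_DIR; /* prefix used up: dir lies inside it */
	if (!*dirname)
		return PREFIX_WITHIN_DIR;   /* dir used up: prefix points into it */
	return PREFIX_EXCLUDES_DIR;         /* the two diverge */
}
```

With this sketch, 'refs/heads/' against the prefix 'refs/heads/' is
PREFIX_CONTAINS_DIR, 'refs/heads/' against 'refs/heads/foo' is
PREFIX_WITHIN_DIR, and 'refs/tags/' against 'refs/heads/foo' is
PREFIX_EXCLUDES_DIR.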
This series allows the seek function to set the cursor without setting
the prefix, which is a requirement for pagination. So there is no need
to set 'prefix_state' for this functionality, which is why I didn't set
it: I assumed the default value of '0' (PREFIX_CONTAINS_DIR) would be
the correct setting for all dirs. However, as the valgrind trace shows,
the levels array comes from realloc and is not zeroed, so the field is
left uninitialized. This causes the issue.
So the only fix required would be
diff --git a/refs/ref-cache.c b/refs/ref-cache.c
index 1d95b56d40..ceef3a2008 100644
--- a/refs/ref-cache.c
+++ b/refs/ref-cache.c
@@ -527,6 +527,7 @@ static int cache_ref_iterator_seek(struct ref_iterator *ref_iterator,
 				level = &iter->levels[iter->levels_nr++];
 				level->dir = dir;
 				level->index = -1;
+				level->prefix_state = PREFIX_CONTAINS_DIR;
 			} else {
 				/* reduce the index so the leaf node is iterated over */
 				if (cmp <= 0 && !slash)
The other location (line 488) does not need the change, because that is
the root directory and the 'prefix_state' for it is set in
'cache_ref_iterator_set_prefix()' when the iterator begins.
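The general pattern behind the one-line fix can be sketched as follows.
This is an illustration under assumptions, not the 'ref-cache' code:
`struct level_stack` and this `push_level()` are hypothetical, and only
the enum names are taken from the discussion. The point is that memory
obtained from realloc() has indeterminate contents and that a popped
slot keeps its old values, so every field has to be assigned on each
push.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

enum prefix_state {
	PREFIX_CONTAINS_DIR, /* the 0 value */
	PREFIX_WITHIN_DIR,
	PREFIX_EXCLUDES_DIR
};

/* Hypothetical stack frame with the three fields discussed above. */
struct level {
	const void *dir;
	int index;
	enum prefix_state prefix_state;
};

struct level_stack {
	struct level *items;
	size_t nr, alloc;
};

static struct level *push_level(struct level_stack *s, const void *dir)
{
	struct level *level;

	if (s->nr == s->alloc) {
		s->alloc = s->alloc ? 2 * s->alloc : 4;
		/* realloc() does not zero the newly added slots */
		s->items = realloc(s->items, s->alloc * sizeof(*s->items));
		assert(s->items);
	}
	level = &s->items[s->nr++];

	/* Assign every field; never rely on an implicit 0. */
	level->dir = dir;
	level->index = -1;
	level->prefix_state = PREFIX_CONTAINS_DIR;
	return level;
}
```

Even when a slot is reused after a pop, the explicit assignment resets
'prefix_state' rather than inheriting whatever the previous occupant
left behind.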
>
> diff --git a/refs/ref-cache.c b/refs/ref-cache.c
> index 1d95b56d40..24feb33fcb 100644
> --- a/refs/ref-cache.c
> +++ b/refs/ref-cache.c
> @@ -391,7 +391,6 @@ static int cache_ref_iterator_advance(struct
> ref_iterator *ref_iterator)
>                         &iter->levels[iter->levels_nr - 1];
>                 struct ref_dir *dir = level->dir;
>                 struct ref_entry *entry;
> -               enum prefix_state entry_prefix_state;
>
>                 if (level->index == -1)
>                         sort_ref_dir(dir);
> @@ -406,16 +405,17 @@ static int cache_ref_iterator_advance(struct
> ref_iterator *ref_iterator)
>
>                 entry = dir->entries[level->index];
>
> -               if (level->prefix_state == PREFIX_WITHIN_DIR) {
> -                       entry_prefix_state =
> overlaps_prefix(entry->name, iter->prefix);
> -                       if (entry_prefix_state == PREFIX_EXCLUDES_DIR ||
> -                           (entry_prefix_state == PREFIX_WITHIN_DIR
> && !(entry->flag & REF_DIR)))
> -                               continue;
> -               } else {
> -                       entry_prefix_state = level->prefix_state;
> -               }
> -
>                 if (entry->flag & REF_DIR) {
> +                       enum prefix_state entry_prefix_state;
> +                       if (level->prefix_state == PREFIX_WITHIN_DIR) {
> +                               entry_prefix_state =
> overlaps_prefix(entry->name, iter->prefix);
> +                               if (entry_prefix_state == PREFIX_EXCLUDES_DIR ||
> +                                   (entry_prefix_state ==
> PREFIX_WITHIN_DIR && !(entry->flag & REF_DIR)))
> +                                       continue;
> +                       } else {
> +                               entry_prefix_state = level->prefix_state;
> +                       }
> +
>                         /* push down a level */
>                         ALLOC_GROW(iter->levels, iter->levels_nr + 1,
>                                    iter->levels_alloc);
> @@ -489,6 +489,7 @@ static int cache_ref_iterator_seek(struct
> ref_iterator *ref_iterator,
>                 level = &iter->levels[0];
>                 level->index = -1;
>                 level->dir = dir;
> +               level->prefix_state = PREFIX_EXCLUDES_DIR;      //
> FIXME: PROBABLY NOT CORRECT
>
>                 /* Unset any previously set prefix */
>                 FREE_AND_NULL(iter->prefix);
> @@ -527,6 +528,7 @@ static int cache_ref_iterator_seek(struct
> ref_iterator *ref_iterator,
>                                 level = &iter->levels[iter->levels_nr++];
>                                 level->dir = dir;
>                                 level->index = -1;
> +                               level->prefix_state =
> PREFIX_EXCLUDES_DIR;      // FIXME: PROBABLY NOT CORRECT
>                         } else {
>                                 /* reduce the index so the leaf node
> is iterated over */
>                                 if (cmp <= 0 && !slash)
* Re: [PATCH v5 0/5] for-each-ref: introduce seeking functionality via '--start-after'
  2025-07-17 19:26           ` Karthik Nayak
@ 2025-07-17 19:35             ` Kyle Lippincott
  2025-07-17 22:09               ` Jeff King
  2025-07-17 22:21             ` Junio C Hamano
  1 sibling, 1 reply; 102+ messages in thread
From: Kyle Lippincott @ 2025-07-17 19:35 UTC (permalink / raw)
  To: Karthik Nayak
  Cc: Jeff King, Patrick Steinhardt, Junio C Hamano, git, schwab,
	phillip.wood123, Christian Couder
On Thu, Jul 17, 2025 at 12:26 PM Karthik Nayak <karthik.188@gmail.com> wrote:
>
> Kyle Lippincott <spectral@google.com> writes:
>
> > On Wed, Jul 16, 2025 at 6:54 PM Jeff King <peff@peff.net> wrote:
> >>
> >> On Wed, Jul 16, 2025 at 06:19:32PM -0700, Kyle Lippincott wrote:
> >>
> >> > Unfortunately I can't provide great instructions for reproducing this
> >> > locally, because it relies on our internal build stack (which uses
> >> > blaze). Getting MemorySanitizer running can be quite annoying, though
> >> > you might not have any issues if this test doesn't invoke any third
> >> > party libraries (like zlib).
> >> >
> >> > I need to sign off for the night soon, but if this isn't sufficient
> >> > enough information to identify what's happening here, I can try to dig
> >> > deeper tomorrow. This run was executed on an import of upstream commit
> >> > 4ea3c74afd42a503b3e0d60e1fec33bc0431e7bc (Junio's merge of this
> >> > series)
> >>
> >> valgrind can often find the same issues as MSan without as much headache
> >> to get it running (the downside is that it is _way_ slower). And indeed:
> >>
> >>   git checkout 4ea3c74afd42a503b3e0d60e1fec33bc0431e7bc &&
> >>   make &&
> >>   (cd t && ./t6302-for-each-ref-filter.sh --valgrind-only=48)
> >>
> >> yields:
> >>
> >>   ==2177572== Conditional jump or move depends on uninitialised value(s)
> >>   ==2177572==    at 0x3BC380: cache_ref_iterator_advance (ref-cache.c:409)
> >>   ==2177572==    by 0x3B69D7: ref_iterator_advance (iterator.c:15)
> >>   ==2177572==    by 0x3B6CC3: merge_ref_iterator_advance (iterator.c:179)
> >>   ==2177572==    by 0x3B69D7: ref_iterator_advance (iterator.c:15)
> >>   ==2177572==    by 0x3A9770: files_ref_iterator_advance (files-backend.c:902)
> >>   ==2177572==    by 0x3B69D7: ref_iterator_advance (iterator.c:15)
> >>   ==2177572==    by 0x3B7457: do_for_each_ref_iterator (iterator.c:478)
> >>   ==2177572==    by 0x399B43: for_each_fullref_with_seek (ref-filter.c:2718)
> >>   ==2177572==    by 0x399C09: for_each_fullref_in_pattern (ref-filter.c:2756)
> >>   ==2177572==    by 0x39B031: do_filter_refs (ref-filter.c:3263)
> >>   ==2177572==    by 0x39B2B7: filter_and_format_refs (ref-filter.c:3364)
> >>   ==2177572==    by 0x18C1D2: cmd_for_each_ref (for-each-ref.c:115)
> >>   ==2177572==  Uninitialised value was created by a heap allocation
> >>   ==2177572==    at 0x484BDD0: realloc (vg_replace_malloc.c:1801)
> >>   ==2177572==    by 0x44E941: xrealloc (wrapper.c:140)
> >>   ==2177572==    by 0x3BCAD9: cache_ref_iterator_begin (ref-cache.c:580)
> >>   ==2177572==    by 0x3A988A: files_ref_iterator_begin (files-backend.c:995)
> >>   ==2177572==    by 0x3A295E: refs_ref_iterator_begin (refs.c:1776)
> >>   ==2177572==    by 0x399AF6: for_each_fullref_with_seek (ref-filter.c:2710)
> >>   ==2177572==    by 0x399C09: for_each_fullref_in_pattern (ref-filter.c:2756)
> >>   ==2177572==    by 0x39B031: do_filter_refs (ref-filter.c:3263)
> >>   ==2177572==    by 0x39B2B7: filter_and_format_refs (ref-filter.c:3364)
> >>   ==2177572==    by 0x18C1D2: cmd_for_each_ref (for-each-ref.c:115)
> >>   ==2177572==    by 0x128C90: run_builtin (git.c:480)
> >>   ==2177572==    by 0x1290EB: handle_builtin (git.c:746)
> >>
> >> Bisecting doesn't tell us much, though (the first commit that introduces
> >> the test shows the problem). I didn't dig further than that.
> >>
> >> -Peff
> >
> > Thanks for that, that helped me a bit too as it provides more
> > information than I was getting out of MemorySanitizer (I suspect
> > MemorySanitizer was producing the information it just wasn't going to
> > stderr or something, or maybe I was missing a flag to get it to report
> > more).
> >
>
> Thanks both for raising the issue. Thanks Jeff for also the valgrind
> instructions.
>
> On a sidenote, was discussing this at work and Patrick also mentioned
> that we could try clang's MemorySanitizer. This seems to also be raising
> issues on master, so it was hard to find the exact output that valgrind
> was providing.
>
> $ git checkout master
> $ CC=clang meson setup --reconfigure memory_build . -Db_sanitize=memory
> $ cd memory_build
> $ meson test -i --test-args="-ix" t6302-for-each-ref-filter
> ...
> ==3275333==WARNING: MemorySanitizer: use-of-uninitialized-value
>     #0 0x557bd886f4bb in git_mkstemps_mode ../wrapper.c:487:27
>     #1 0x557bd886fb55 in git_mkstemp_mode ../wrapper.c:509:9
>     #2 0x557bd8100d1a in create_tmpfile ../object-file.c:736:7
>     #3 0x557bd80f1630 in start_loose_object_common ../object-file.c:781:7
>     #4 0x557bd80f5203 in write_loose_object ../object-file.c:881:7
>     #5 0x557bd80f4875 in write_object_file_flags ../object-file.c:1086:6
>     #6 0x557bd80f9f65 in write_object_file ../object-file.h:181:9
>     #7 0x557bd8101eb8 in index_mem ../object-file.c:1177:9
>     #8 0x557bd80f8bd5 in index_core ../object-file.c:1247:10
>     #9 0x557bd80f731d in index_fd ../object-file.c:1274:9
>     #10 0x557bd80f95e4 in index_path ../object-file.c:1295:7
>     #11 0x557bd831132d in add_to_index ../read-cache.c:771:7
>     #12 0x557bd8313cb1 in add_file_to_index ../read-cache.c:804:9
>     #13 0x557bd73f892c in add_files ../builtin/add.c:355:7
>     #14 0x557bd73f4752 in cmd_add ../builtin/add.c:578:18
>     #15 0x557bd7a38b6f in run_builtin ../git.c:480:11
>     #16 0x557bd7a31d54 in handle_builtin ../git.c:746:9
>     #17 0x557bd7a36644 in run_argv ../git.c:813:4
>     #18 0x557bd7a30e09 in cmd_main ../git.c:953:19
>     #19 0x557bd7a3ca01 in main ../common-main.c:9:11
>     #20 0x7f7e3f02a4d7 in __libc_start_call_main
> (/nix/store/g2jzxk3s7cnkhh8yq55l4fbvf639zy37-glibc-2.40-66/lib/libc.so.6+0x2a4d7)
> (BuildId: f117ee0f586dfa828cbdd08e37393c8f04f6480a)
>     #21 0x7f7e3f02a59a in __libc_start_main@GLIBC_2.2.5
> (/nix/store/g2jzxk3s7cnkhh8yq55l4fbvf639zy37-glibc-2.40-66/lib/libc.so.6+0x2a59a)
> (BuildId: f117ee0f586dfa828cbdd08e37393c8f04f6480a)
>     #22 0x557bd7352b34 in _start (git+0x5db34)
>
> Possibly something we need to look into cleaning up.
I also saw those msan issues when trying `make
CFLAGS=-fsanitize=memory CC=clang`, but not with Google's internal
msan build. I don't know which variable in wrapper.c:487 it's
complaining about - you'd think it'd be `letters`, but if it's `v`,
then that potentially comes from OpenSSL or some other library, and
that library would also need to be built with msan (which is why it's
such a pain to get msan builds working - EVERY library needs to be
built with memory sanitizer).
>
> > I'm not sure what the right fix would be; my guess is that the
> > fix would be to modify the places where we set levels_nr and
> > initialize the other fields in level to also set it to prefix_state
> > (around lines 488 and 527 in ref-cache.c); and indeed setting the
> > prefix_state to PREFIX_CONTAINS_DIR (the 0 value of the enum) makes
> > the test pass even under valgrind. Unfortunately without a much more
> > in-depth knowledge of the code and the enum values I can't
> > definitively state that those are the correct values. I can say that
> > setting it to PREFIX_WITHIN_DIR causes both additional valgrind
> > failures and test failures even without valgrind, but setting it to
> > PREFIX_EXCLUDES_DIR doesn't seem to be a problem. I also moved the
> > `if` around line 409 into the following if, because that was the only
> > time entry_prefix_state was used, I'd been thinking that maybe it
> > needed the check for entry->flag & REF_DIR prior to referencing
> > level->prefix_state, but that didn't resolve it on its own.
> >
> > I don't mind if anyone else picks up this fix and runs with it, but
> > I'm not comfortable sending this patch myself because I don't have
> > enough knowledge of this area of the code to know if it's right, just
> > that it fixes the issue we encountered, and I'm extremely overloaded
> > right now and can't get that knowledge nor see the patch through to
> > the end.
> >
>
> Thanks for taking a stab at this, your inference is correct. Let me
> clarify some parts of it.
>
> So the 'ref-cache' iteration logic is used to provide iteration over
> loose refs (which consists of directories and entries). Anytime we come
> across a directory, we add it to the level variable, which acts as
> stack, when all entries under the current level are yielded, we pop the
> stack to obtain the next level to iterate. This ensures we iterate over
> all directories recursively.
>
> Before this series, the seek function was used to set the prefix for
> iteration, which meant we need to find the directory for matching the
> prefix and only iterate over that level and its subdirs. If the prefix
> provided was a directory like 'refs/heads/' then all refs under that
> would be yielded (PREFIX_CONTAINS_DIR). If the prefix was
> 'refs/heads/foo', then the level would be set to 'refs/heads/' with the
> PREFIX_WITHIN_DIR flag set since only some refs within the dir would
> match the prefix. Entries which didn't overlap the prefix are denoted by
> PREFIX_EXCLUDES_DIR.
>
> This series allows the seek function to set the cursor without setting
> the prefix, which is a requirement for pagination. So there is no need
> to set 'prefix_state' for this functionality. Which is why I didn't set
> it, since the default value of '0' (PREFIX_CONTAINS_DIR) would be the
> correct setting for all dirs. This causes the issue.
>
> So the only fix required would be
>
> diff --git a/refs/ref-cache.c b/refs/ref-cache.c
> index 1d95b56d40..ceef3a2008 100644
> --- a/refs/ref-cache.c
> +++ b/refs/ref-cache.c
> @@ -527,6 +527,7 @@ static int cache_ref_iterator_seek(struct
> ref_iterator *ref_iterator,
>                                 level = &iter->levels[iter->levels_nr++];
>                                 level->dir = dir;
>                                 level->index = -1;
> +                               level->prefix_state = PREFIX_CONTAINS_DIR;
>                         } else {
>                                 /* reduce the index so the leaf node is iterated over */
>                                 if (cmp <= 0 && !slash)
>
>
> The other location (Line 488), is not needed because that is the root
> directory and the 'prefix_state' for it is set in
> 'cache_ref_iterator_set_prefix()' when the iterator begins.
>
> >
> > diff --git a/refs/ref-cache.c b/refs/ref-cache.c
> > index 1d95b56d40..24feb33fcb 100644
> > --- a/refs/ref-cache.c
> > +++ b/refs/ref-cache.c
> > @@ -391,7 +391,6 @@ static int cache_ref_iterator_advance(struct
> > ref_iterator *ref_iterator)
> >                         &iter->levels[iter->levels_nr - 1];
> >                 struct ref_dir *dir = level->dir;
> >                 struct ref_entry *entry;
> > -               enum prefix_state entry_prefix_state;
> >
> >                 if (level->index == -1)
> >                         sort_ref_dir(dir);
> > @@ -406,16 +405,17 @@ static int cache_ref_iterator_advance(struct
> > ref_iterator *ref_iterator)
> >
> >                 entry = dir->entries[level->index];
> >
> > -               if (level->prefix_state == PREFIX_WITHIN_DIR) {
> > -                       entry_prefix_state =
> > overlaps_prefix(entry->name, iter->prefix);
> > -                       if (entry_prefix_state == PREFIX_EXCLUDES_DIR ||
> > -                           (entry_prefix_state == PREFIX_WITHIN_DIR
> > && !(entry->flag & REF_DIR)))
> > -                               continue;
> > -               } else {
> > -                       entry_prefix_state = level->prefix_state;
> > -               }
> > -
> >                 if (entry->flag & REF_DIR) {
> > +                       enum prefix_state entry_prefix_state;
> > +                       if (level->prefix_state == PREFIX_WITHIN_DIR) {
> > +                               entry_prefix_state =
> > overlaps_prefix(entry->name, iter->prefix);
> > +                               if (entry_prefix_state == PREFIX_EXCLUDES_DIR ||
> > +                                   (entry_prefix_state ==
> > PREFIX_WITHIN_DIR && !(entry->flag & REF_DIR)))
> > +                                       continue;
> > +                       } else {
> > +                               entry_prefix_state = level->prefix_state;
> > +                       }
> > +
> >                         /* push down a level */
> >                         ALLOC_GROW(iter->levels, iter->levels_nr + 1,
> >                                    iter->levels_alloc);
> > @@ -489,6 +489,7 @@ static int cache_ref_iterator_seek(struct
> > ref_iterator *ref_iterator,
> >                 level = &iter->levels[0];
> >                 level->index = -1;
> >                 level->dir = dir;
> > +               level->prefix_state = PREFIX_EXCLUDES_DIR;      //
> > FIXME: PROBABLY NOT CORRECT
> >
> >                 /* Unset any previously set prefix */
> >                 FREE_AND_NULL(iter->prefix);
> > @@ -527,6 +528,7 @@ static int cache_ref_iterator_seek(struct
> > ref_iterator *ref_iterator,
> >                                 level = &iter->levels[iter->levels_nr++];
> >                                 level->dir = dir;
> >                                 level->index = -1;
> > +                               level->prefix_state =
> > PREFIX_EXCLUDES_DIR;      // FIXME: PROBABLY NOT CORRECT
> >                         } else {
> >                                 /* reduce the index so the leaf node
> > is iterated over */
> >                                 if (cmp <= 0 && !slash)
^ permalink raw reply	[flat|nested] 102+ messages in thread
* Re: [PATCH v5 0/5] for-each-ref: introduce seeking functionality via '--start-after'
  2025-07-17 19:35             ` Kyle Lippincott
@ 2025-07-17 22:09               ` Jeff King
  2025-07-17 22:16                 ` Jeff King
  2025-07-21 14:27                 ` Karthik Nayak
  0 siblings, 2 replies; 102+ messages in thread
From: Jeff King @ 2025-07-17 22:09 UTC (permalink / raw)
  To: Kyle Lippincott
  Cc: Karthik Nayak, Patrick Steinhardt, Junio C Hamano, git, schwab,
	phillip.wood123, Christian Couder
On Thu, Jul 17, 2025 at 12:35:58PM -0700, Kyle Lippincott wrote:
> > ==3275333==WARNING: MemorySanitizer: use-of-uninitialized-value
> >     #0 0x557bd886f4bb in git_mkstemps_mode ../wrapper.c:487:27
> >     #1 0x557bd886fb55 in git_mkstemp_mode ../wrapper.c:509:9
> >     #2 0x557bd8100d1a in create_tmpfile ../object-file.c:736:7
> >     #3 0x557bd80f1630 in start_loose_object_common ../object-file.c:781:7
> >     #4 0x557bd80f5203 in write_loose_object ../object-file.c:881:7
> >     #5 0x557bd80f4875 in write_object_file_flags ../object-file.c:1086:6
> >     #6 0x557bd80f9f65 in write_object_file ../object-file.h:181:9
> >     #7 0x557bd8101eb8 in index_mem ../object-file.c:1177:9
> >     #8 0x557bd80f8bd5 in index_core ../object-file.c:1247:10
> >     #9 0x557bd80f731d in index_fd ../object-file.c:1274:9
> >     #10 0x557bd80f95e4 in index_path ../object-file.c:1295:7
> >     #11 0x557bd831132d in add_to_index ../read-cache.c:771:7
> >     #12 0x557bd8313cb1 in add_file_to_index ../read-cache.c:804:9
> >     #13 0x557bd73f892c in add_files ../builtin/add.c:355:7
> >     #14 0x557bd73f4752 in cmd_add ../builtin/add.c:578:18
> >     #15 0x557bd7a38b6f in run_builtin ../git.c:480:11
> >     #16 0x557bd7a31d54 in handle_builtin ../git.c:746:9
> >     #17 0x557bd7a36644 in run_argv ../git.c:813:4
> >     #18 0x557bd7a30e09 in cmd_main ../git.c:953:19
> >     #19 0x557bd7a3ca01 in main ../common-main.c:9:11
> >     #20 0x7f7e3f02a4d7 in __libc_start_call_main
> > (/nix/store/g2jzxk3s7cnkhh8yq55l4fbvf639zy37-glibc-2.40-66/lib/libc.so.6+0x2a4d7)
> > (BuildId: f117ee0f586dfa828cbdd08e37393c8f04f6480a)
> >     #21 0x7f7e3f02a59a in __libc_start_main@GLIBC_2.2.5
> > (/nix/store/g2jzxk3s7cnkhh8yq55l4fbvf639zy37-glibc-2.40-66/lib/libc.so.6+0x2a59a)
> > (BuildId: f117ee0f586dfa828cbdd08e37393c8f04f6480a)
> >     #22 0x557bd7352b34 in _start (git+0x5db34)
> >
> > Possibly something we need to look into cleaning up.
> 
> I also saw those msan issues when trying `make
> CFLAGS=-fsanitize=memory CC=clang`, but not with Google's internal
> msan build. I don't know which variable in wrapper.c:487 it's
> complaining about - you'd think it'd be `letters`, but if it's `v`,
> then that potentially comes from OpenSSL or some other library, and
> that library would also need to be built with msan (which is why it's
> such a pain to get msan builds working - EVERY library needs to be
> built with memory sanitizer).
Yeah, presumably it is "v" from csprng_bytes(). If there are only a few
such spots, we can manually "unpoison" memory coming from libraries. On
my system, I didn't hit the case shown above but do have trouble with
bytes coming back from zlib.
Applying this ancient patch:
  https://lore.kernel.org/git/20171004101932.pai6wzcv2eohsicr@sigill.intra.peff.net/
and building with "make SANITIZE=memory CC=clang" let me run t6302 to
completion, modulo the bug that started this thread (and which I
confirmed goes away both with MSan and valgrind with the fix Karthik
posted).
Probably:
diff --git a/wrapper.c b/wrapper.c
index 2f00d2ac87..6a4c1c1c29 100644
--- a/wrapper.c
+++ b/wrapper.c
@@ -482,6 +482,8 @@ int git_mkstemps_mode(char *pattern, int suffix_len, int mode)
 		if (csprng_bytes(&v, sizeof(v), 0) < 0)
 			return error_errno("unable to get random bytes for temporary file");
 
+		msan_unpoison(&v, sizeof(v));
+
 		/* Fill in the random bits. */
 		for (i = 0; i < num_x; i++) {
 			filename_template[i] = letters[v % num_letters];
on top of that would fix the problem you guys are seeing. I don't know
if that path leads to insanity, though. Using MSan-enabled libraries is
probably a better direction (should increase accuracy, and we don't have
to carry these manual annotations around).
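
For illustration, here is a minimal self-contained sketch of the unpoison-at-the-source idea. The macro name mirrors the msan_unpoison() call in the diff above, but its definition here is an assumption (the linked patch is not reproduced in this thread); __msan_unpoison() itself comes from clang's <sanitizer/msan_interface.h>. stub_csprng_bytes() and fill_random_letters() are hypothetical stand-ins, not Git functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Assumed definition: forward to __msan_unpoison() only when the
 * compiler reports MemorySanitizer support, otherwise a no-op.
 */
#if defined(__has_feature)
#  if __has_feature(memory_sanitizer)
#    include <sanitizer/msan_interface.h>
#    define msan_unpoison(ptr, len) __msan_unpoison((ptr), (len))
#  endif
#endif
#ifndef msan_unpoison
#  define msan_unpoison(ptr, len) do { (void)(ptr); (void)(len); } while (0)
#endif

/* Hypothetical stand-in for bytes produced by an uninstrumented library. */
static int stub_csprng_bytes(void *buf, size_t len)
{
	memset(buf, 0x5a, len);
	return 0;
}

/* Write n letters plus a terminating NUL into out, as the loop above does. */
int fill_random_letters(char *out, size_t n)
{
	static const char letters[] =
		"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
	const size_t num_letters = sizeof(letters) - 1;
	uint64_t v;
	size_t i;

	assert(out != NULL);
	if (stub_csprng_bytes(&v, sizeof(v)) < 0)
		return -1;

	/* Tell MSan these bytes are initialized before they are first read. */
	msan_unpoison(&v, sizeof(v));

	for (i = 0; i < n; i++) {
		out[i] = letters[v % num_letters];
		v /= num_letters;
	}
	out[n] = '\0';
	return 0;
}
```

Without MSan the macro compiles away, so the annotation costs nothing in regular builds; the downside is the one noted above, namely that each such spot has to be found and carried by hand.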
-Peff
^ permalink raw reply related	[flat|nested] 102+ messages in thread
* Re: [PATCH v5 0/5] for-each-ref: introduce seeking functionality via '--start-after'
  2025-07-17 22:09               ` Jeff King
@ 2025-07-17 22:16                 ` Jeff King
  2025-07-21 14:27                 ` Karthik Nayak
  1 sibling, 0 replies; 102+ messages in thread
From: Jeff King @ 2025-07-17 22:16 UTC (permalink / raw)
  To: Kyle Lippincott
  Cc: Karthik Nayak, Patrick Steinhardt, Junio C Hamano, git, schwab,
	phillip.wood123, Christian Couder
On Thu, Jul 17, 2025 at 06:09:29PM -0400, Jeff King wrote:
> Probably:
> 
> diff --git a/wrapper.c b/wrapper.c
> index 2f00d2ac87..6a4c1c1c29 100644
> --- a/wrapper.c
> +++ b/wrapper.c
> @@ -482,6 +482,8 @@ int git_mkstemps_mode(char *pattern, int suffix_len, int mode)
>  		if (csprng_bytes(&v, sizeof(v), 0) < 0)
>  			return error_errno("unable to get random bytes for temporary file");
>  
> +		msan_unpoison(&v, sizeof(v));
> +
>  		/* Fill in the random bits. */
>  		for (i = 0; i < num_x; i++) {
>  			filename_template[i] = letters[v % num_letters];
> 
> 
> on top of that would fix the problem you guys are seeing. I don't know
> if that path leads to insanity, though. Using MSan-enabled libraries is
> probably a better direction (should increase accuracy, and we don't have
> to carry these manual annotations around).
Hmm, probably insanity. Just for fun I tried to run the whole suite, but
got this doozy:
  Uninitialized bytes in fopen64 at offset 0 inside [0x7020000109c0, 25)
  ==2568195==WARNING: MemorySanitizer: use-of-uninitialized-value
      #0 0x7f90fe14fa46 in BIO_new_file (/lib/x86_64-linux-gnu/libcrypto.so.3+0x14fa46) (BuildId: 07a8321bad67632b52b47ad026125c79b7ebaab4)
      #1 0x7f90fe1a659c  (/lib/x86_64-linux-gnu/libcrypto.so.3+0x1a659c) (BuildId: 07a8321bad67632b52b47ad026125c79b7ebaab4)
      #2 0x7f90fe1a8453 in CONF_modules_load_file_ex (/lib/x86_64-linux-gnu/libcrypto.so.3+0x1a8453) (BuildId: 07a8321bad67632b52b47ad026125c79b7ebaab4)
      #3 0x7f90fe1a8807  (/lib/x86_64-linux-gnu/libcrypto.so.3+0x1a8807) (BuildId: 07a8321bad67632b52b47ad026125c79b7ebaab4)
      #4 0x7f90fe27274e  (/lib/x86_64-linux-gnu/libcrypto.so.3+0x27274e) (BuildId: 07a8321bad67632b52b47ad026125c79b7ebaab4)
      #5 0x7f90fea01bc6 in __pthread_once_slow nptl/pthread_once.c:116:7
      #6 0x7f90fea01c38 in __pthread_once nptl/pthread_once.c:143:12
      #7 0x7f90fe287f3c in CRYPTO_THREAD_run_once (/lib/x86_64-linux-gnu/libcrypto.so.3+0x287f3c) (BuildId: 07a8321bad67632b52b47ad026125c79b7ebaab4)
      #8 0x7f90fe272fd9 in OPENSSL_init_crypto (/lib/x86_64-linux-gnu/libcrypto.so.3+0x272fd9) (BuildId: 07a8321bad67632b52b47ad026125c79b7ebaab4)
      #9 0x7f90fe78a6d7 in OPENSSL_init_ssl (/lib/x86_64-linux-gnu/libssl.so.3+0x396d7) (BuildId: a0d77cb273378dec1d74a115ac1c9e40306e675d)
      #10 0x7f90fee2a503  (/lib/x86_64-linux-gnu/libcurl.so.4+0x9e503) (BuildId: 61ee7a8d1799c0e6c38a99b4d739e0c90391a05f)
      #11 0x7f90fedc2e42  (/lib/x86_64-linux-gnu/libcurl.so.4+0x36e42) (BuildId: 61ee7a8d1799c0e6c38a99b4d739e0c90391a05f)
      #12 0x7f90fedc33c9 in curl_global_init (/lib/x86_64-linux-gnu/libcurl.so.4+0x373c9) (BuildId: 61ee7a8d1799c0e6c38a99b4d739e0c90391a05f)
      #13 0x55fbbe211623 in http_init http.c:1347:6
      #14 0x55fbbe1f5b98 in cmd_main remote-curl.c:1583:2
      #15 0x55fbbe244571 in main common-main.c:9:11
      #16 0x7f90fe993ca7 in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
      #17 0x7f90fe993d64 in __libc_start_main csu/../csu/libc-start.c:360:3
      #18 0x55fbbe15bb60 in _start (git-remote-http+0x43b60) (BuildId: dfc63b9261f6d575776d30b4e048b235389a7b20)
  
  SUMMARY: MemorySanitizer: use-of-uninitialized-value (/lib/x86_64-linux-gnu/libcrypto.so.3+0x14fa46) (BuildId: 07a8321bad67632b52b47ad026125c79b7ebaab4) in BIO_new_file
So MSan complaining about stuff deep within curl/openssl, and AFAICT not
something we could influence or annotate as OK.
-Peff
^ permalink raw reply	[flat|nested] 102+ messages in thread
* Re: [PATCH v5 0/5] for-each-ref: introduce seeking functionality via '--start-after'
  2025-07-17 22:09               ` Jeff King
  2025-07-17 22:16                 ` Jeff King
@ 2025-07-21 14:27                 ` Karthik Nayak
  2025-07-21 21:22                   ` Jeff King
  1 sibling, 1 reply; 102+ messages in thread
From: Karthik Nayak @ 2025-07-21 14:27 UTC (permalink / raw)
  To: Jeff King, Kyle Lippincott
  Cc: Patrick Steinhardt, Junio C Hamano, git, schwab, phillip.wood123,
	Christian Couder
[-- Attachment #1: Type: text/plain, Size: 4082 bytes --]
Jeff King <peff@peff.net> writes:
> On Thu, Jul 17, 2025 at 12:35:58PM -0700, Kyle Lippincott wrote:
>
>> > ==3275333==WARNING: MemorySanitizer: use-of-uninitialized-value
>> >     #0 0x557bd886f4bb in git_mkstemps_mode ../wrapper.c:487:27
>> >     #1 0x557bd886fb55 in git_mkstemp_mode ../wrapper.c:509:9
>> >     #2 0x557bd8100d1a in create_tmpfile ../object-file.c:736:7
>> >     #3 0x557bd80f1630 in start_loose_object_common ../object-file.c:781:7
>> >     #4 0x557bd80f5203 in write_loose_object ../object-file.c:881:7
>> >     #5 0x557bd80f4875 in write_object_file_flags ../object-file.c:1086:6
>> >     #6 0x557bd80f9f65 in write_object_file ../object-file.h:181:9
>> >     #7 0x557bd8101eb8 in index_mem ../object-file.c:1177:9
>> >     #8 0x557bd80f8bd5 in index_core ../object-file.c:1247:10
>> >     #9 0x557bd80f731d in index_fd ../object-file.c:1274:9
>> >     #10 0x557bd80f95e4 in index_path ../object-file.c:1295:7
>> >     #11 0x557bd831132d in add_to_index ../read-cache.c:771:7
>> >     #12 0x557bd8313cb1 in add_file_to_index ../read-cache.c:804:9
>> >     #13 0x557bd73f892c in add_files ../builtin/add.c:355:7
>> >     #14 0x557bd73f4752 in cmd_add ../builtin/add.c:578:18
>> >     #15 0x557bd7a38b6f in run_builtin ../git.c:480:11
>> >     #16 0x557bd7a31d54 in handle_builtin ../git.c:746:9
>> >     #17 0x557bd7a36644 in run_argv ../git.c:813:4
>> >     #18 0x557bd7a30e09 in cmd_main ../git.c:953:19
>> >     #19 0x557bd7a3ca01 in main ../common-main.c:9:11
>> >     #20 0x7f7e3f02a4d7 in __libc_start_call_main
>> > (/nix/store/g2jzxk3s7cnkhh8yq55l4fbvf639zy37-glibc-2.40-66/lib/libc.so.6+0x2a4d7)
>> > (BuildId: f117ee0f586dfa828cbdd08e37393c8f04f6480a)
>> >     #21 0x7f7e3f02a59a in __libc_start_main@GLIBC_2.2.5
>> > (/nix/store/g2jzxk3s7cnkhh8yq55l4fbvf639zy37-glibc-2.40-66/lib/libc.so.6+0x2a59a)
>> > (BuildId: f117ee0f586dfa828cbdd08e37393c8f04f6480a)
>> >     #22 0x557bd7352b34 in _start (git+0x5db34)
>> >
>> > Possibly something we need to look into cleaning up.
>>
>> I also saw those msan issues when trying `make
>> CFLAGS=-fsanitize=memory CC=clang`, but not with Google's internal
>> msan build. I don't know which variable in wrapper.c:487 it's
>> complaining about - you'd think it'd be `letters`, but if it's `v`,
>> then that potentially comes from OpenSSL or some other library, and
>> that library would also need to be built with msan (which is why it's
>> such a pain to get msan builds working - EVERY library needs to be
>> built with memory sanitizer).
>
> Yeah, presumably it is "v" from csprng_bytes(). If there are only a few
> such spots, we can manually "unpoison" memory coming from libraries. On
> my system, I didn't hit the case shown above but do have trouble with
> bytes coming back from zlib.
>
> Applying this ancient patch:
>
>   https://lore.kernel.org/git/20171004101932.pai6wzcv2eohsicr@sigill.intra.peff.net/
>
> and building with "make SANITIZE=memory CC=clang" let me run t6302 to
> completion, modulo the bug that started this thread (and which I
> confirmed goes away both with MSan and valgrind with the fix Karthik
> posted).
>
> Probably:
>
> diff --git a/wrapper.c b/wrapper.c
> index 2f00d2ac87..6a4c1c1c29 100644
> --- a/wrapper.c
> +++ b/wrapper.c
> @@ -482,6 +482,8 @@ int git_mkstemps_mode(char *pattern, int suffix_len, int mode)
>  		if (csprng_bytes(&v, sizeof(v), 0) < 0)
>  			return error_errno("unable to get random bytes for temporary file");
>
> +		msan_unpoison(&v, sizeof(v));
> +
>  		/* Fill in the random bits. */
>  		for (i = 0; i < num_x; i++) {
>  			filename_template[i] = letters[v % num_letters];
>
>
> on top of that would fix the problem you guys are seeing. I don't know
> if that path leads to insanity, though. Using MSan-enabled libraries is
> probably a better direction (should increase accuracy, and we don't have
> to carry these manual annotations around).
>
I wonder if an alternative is to use '-fsanitize-ignorelist', since the
MemorySanitizer is supposed to work with that too [1].
[1]: https://clang.llvm.org/docs/MemorySanitizer.html#ignorelist
> -Peff
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]
^ permalink raw reply	[flat|nested] 102+ messages in thread
* Re: [PATCH v5 0/5] for-each-ref: introduce seeking functionality via '--start-after'
  2025-07-21 14:27                 ` Karthik Nayak
@ 2025-07-21 21:22                   ` Jeff King
  2025-07-22  8:44                     ` Karthik Nayak
  0 siblings, 1 reply; 102+ messages in thread
From: Jeff King @ 2025-07-21 21:22 UTC (permalink / raw)
  To: Karthik Nayak
  Cc: Kyle Lippincott, Patrick Steinhardt, Junio C Hamano, git, schwab,
	phillip.wood123, Christian Couder
On Mon, Jul 21, 2025 at 02:27:45PM +0000, Karthik Nayak wrote:
> > Applying this ancient patch:
> >
> >   https://lore.kernel.org/git/20171004101932.pai6wzcv2eohsicr@sigill.intra.peff.net/
> >
> > and building with "make SANITIZE=memory CC=clang" let me run t6302 to
> > completion, modulo the bug that started this thread (and which I
> > confirmed goes away both with MSan and valgrind with the fix Karthik
> > posted).
> [...]
> 
> I wonder if an alternative is to use '-fsanitize-ignorelist', since the
> MemorySanitizer is supposed to work with that too [1].
I think you could do that, but it isn't quite what we want: it is
annotating the access of those (false-positive) "uninitialized" bytes.
So you have to mark every spot that touches bytes that come from zlib,
which in Git is a lot of places. And so the patch linked above was an
attempt to silence all of those with a single line: marking the bytes
coming out of zlib as OK.
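
As a rough sketch of that "single line" approach: wrap the uninstrumented library call once and mark only the bytes it produced as initialized, so callers need no annotation. Everything named here is hypothetical; stub_library_decode() merely stands in for a library routine such as a decompressor, and the msan_unpoison() definition is an assumption built on clang's __msan_unpoison().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed definition: a no-op unless built with MemorySanitizer. */
#if defined(__has_feature)
#  if __has_feature(memory_sanitizer)
#    include <sanitizer/msan_interface.h>
#    define msan_unpoison(ptr, len) __msan_unpoison((ptr), (len))
#  endif
#endif
#ifndef msan_unpoison
#  define msan_unpoison(ptr, len) do { (void)(ptr); (void)(len); } while (0)
#endif

/* Hypothetical stand-in for a call into a library not built with MSan. */
static size_t stub_library_decode(const unsigned char *in, size_t in_len,
				  unsigned char *out, size_t out_len)
{
	size_t n = in_len < out_len ? in_len : out_len;
	memcpy(out, in, n);
	return n;
}

/*
 * The one choke point: every caller goes through this wrapper, so the
 * library's output is unpoisoned here and nowhere else.
 */
size_t decode_and_unpoison(const unsigned char *in, size_t in_len,
			   unsigned char *out, size_t out_len)
{
	size_t produced;

	assert(in != NULL && out != NULL);
	produced = stub_library_decode(in, in_len, out, out_len);

	/* Only the bytes actually written are marked as initialized. */
	msan_unpoison(out, produced);
	return produced;
}
```

Unpoisoning just the produced prefix keeps MSan able to flag reads of the untouched tail of the buffer, which an ignorelist entry covering the reading function would hide.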
-Peff
^ permalink raw reply	[flat|nested] 102+ messages in thread
* Re: [PATCH v5 0/5] for-each-ref: introduce seeking functionality via '--start-after'
  2025-07-21 21:22                   ` Jeff King
@ 2025-07-22  8:44                     ` Karthik Nayak
  0 siblings, 0 replies; 102+ messages in thread
From: Karthik Nayak @ 2025-07-22  8:44 UTC (permalink / raw)
  To: Jeff King
  Cc: Kyle Lippincott, Patrick Steinhardt, Junio C Hamano, git, schwab,
	phillip.wood123, Christian Couder
[-- Attachment #1: Type: text/plain, Size: 1053 bytes --]
Jeff King <peff@peff.net> writes:
> On Mon, Jul 21, 2025 at 02:27:45PM +0000, Karthik Nayak wrote:
>
>> > Applying this ancient patch:
>> >
>> >   https://lore.kernel.org/git/20171004101932.pai6wzcv2eohsicr@sigill.intra.peff.net/
>> >
>> > and building with "make SANITIZE=memory CC=clang" let me run t6302 to
>> > completion, modulo the bug that started this thread (and which I
>> > confirmed goes away both with MSan and valgrind with the fix Karthik
>> > posted).
>> [...]
>>
>> I wonder if an alternative is to use '-fsanitize-ignorelist', since the
>> MemorySanitizer is supposed to work with that too [1].
>
> I think you could do that, but it isn't quite what we want: it is
> annotating the access of those (false-positive) "uninitialized" bytes.
> So you have to mark every spot that touches bytes that come from zlib,
> which in Git is a lot of places. And so the patch linked above was an
> attempt to silence all of those with a single line: marking the bytes
> coming out of zlib as OK.
>
That makes sense, thanks for explaining.
> -Peff
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]
^ permalink raw reply	[flat|nested] 102+ messages in thread
* Re: [PATCH v5 0/5] for-each-ref: introduce seeking functionality via '--start-after'
  2025-07-17 19:26           ` Karthik Nayak
  2025-07-17 19:35             ` Kyle Lippincott
@ 2025-07-17 22:21             ` Junio C Hamano
  1 sibling, 0 replies; 102+ messages in thread
From: Junio C Hamano @ 2025-07-17 22:21 UTC (permalink / raw)
  To: Karthik Nayak
  Cc: Kyle Lippincott, Jeff King, Patrick Steinhardt, git, schwab,
	phillip.wood123, Christian Couder
Karthik Nayak <karthik.188@gmail.com> writes:
> This series allows the seek function to set the cursor without setting
> the prefix, which is a requirement for pagination. So there is no need
> to set 'prefix_state' for this functionality. Which is why I didn't set
> it, since the default value of '0' (PREFIX_CONTAINS_DIR) would be the
> correct setting for all dirs. This causes the issue.
>
> So the only fix required would be
>
> diff --git a/refs/ref-cache.c b/refs/ref-cache.c
> index 1d95b56d40..ceef3a2008 100644
> --- a/refs/ref-cache.c
> +++ b/refs/ref-cache.c
> @@ -527,6 +527,7 @@ static int cache_ref_iterator_seek(struct
> ref_iterator *ref_iterator,
>  				level = &iter->levels[iter->levels_nr++];
>  				level->dir = dir;
>  				level->index = -1;
> +				level->prefix_state = PREFIX_CONTAINS_DIR;
>  			} else {
>  				/* reduce the index so the leaf node is iterated over */
>  				if (cmp <= 0 && !slash)
Yup, that is inside the code added by this series.  It does look
like it fixes the complaint from the checker.
Thanks.
^ permalink raw reply	[flat|nested] 102+ messages in thread
* [PATCH] ref-iterator-seek: correctly initialize the prefix_state for a new level
  2025-07-15 11:28 ` [PATCH v5 0/5] " Karthik Nayak
                     ` (5 preceding siblings ...)
  2025-07-15 19:00   ` [PATCH v5 0/5] for-each-ref: introduce seeking functionality via '--start-after' Junio C Hamano
@ 2025-07-23 21:51   ` Junio C Hamano
  2025-07-23 21:57     ` Kyle Lippincott
                       ` (2 more replies)
  6 siblings, 3 replies; 102+ messages in thread
From: Junio C Hamano @ 2025-07-23 21:51 UTC (permalink / raw)
  To: Karthik Nayak
  Cc: git, ps, schwab, phillip.wood123, Christian Couder,
	Kyle Lippincott, Jeff King
When cache_ref_iterator_seek() "jumps" to a middle of the sorted ref
list, it forgets to set the .prefix_state member of the new
(i.e. deeper) level it just initialized.  This later causes
cache_ref_iterator_advance() to look at this uninitialized member
to base its decision on what to do next.
Kyle Lippincott [*] and Jeff King noticed this with MSAN and
Valgrind, and Karthik Nayak as the original author located exactly
where the missing initialization is.
[*] <CAO_smVg9TDakUnubepjPGmLyOzW6n8Z=MDbnZKvkwN2=kN2RRw@mail.gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 refs/ref-cache.c | 1 +
 1 file changed, 1 insertion(+)
 * I had this as "fixup!" on top of your topic for quite a while and
   forgot to ask you to send in an official fix.  As Kyle's
   discovery was after the topic hit 'next' (understandable, as
   their internal edition of Git is based on 'next'), we need a
   separate fix on top.
   To prepare for merging down the whole thing to 'master', I wrote
   the proposed log message to help expedite the process.  Comments?
diff --git a/refs/ref-cache.c b/refs/ref-cache.c
index 1d95b56d40..ceef3a2008 100644
--- a/refs/ref-cache.c
+++ b/refs/ref-cache.c
@@ -527,6 +527,7 @@ static int cache_ref_iterator_seek(struct ref_iterator *ref_iterator,
 				level = &iter->levels[iter->levels_nr++];
 				level->dir = dir;
 				level->index = -1;
+				level->prefix_state = PREFIX_CONTAINS_DIR;
 			} else {
 				/* reduce the index so the leaf node is iterated over */
 				if (cmp <= 0 && !slash)
-- 
2.50.1-521-gf11ee0bd80
^ permalink raw reply related	[flat|nested] 102+ messages in thread
* Re: [PATCH] ref-iterator-seek: correctly initialize the prefix_state for a new level
  2025-07-23 21:51   ` [PATCH] ref-iterator-seek: correctly initialize the prefix_state for a new level Junio C Hamano
@ 2025-07-23 21:57     ` Kyle Lippincott
  2025-07-23 23:52     ` Jeff King
  2025-07-24  8:12     ` Karthik Nayak
  2 siblings, 0 replies; 102+ messages in thread
From: Kyle Lippincott @ 2025-07-23 21:57 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Karthik Nayak, git, ps, schwab, phillip.wood123, Christian Couder,
	Jeff King
On Wed, Jul 23, 2025 at 2:51 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> When cache_ref_iterator_seek() "jumps" to a middle of the sorted ref
> list, it forgets to set the .prefix_state member of the new
> (i.e. deeper) level it just initialized.  This later causes
> cache_ref_iterator_advance() to look at this uninitialized member
> to base its decision on what to do next.
>
> Kyle Lippincott [*] and Jeff King noticed this with MSAN and
> Valgrind, and Karthik Nayak as the original author located exactly
> where the missing initialization is.
>
> [*] <CAO_smVg9TDakUnubepjPGmLyOzW6n8Z=MDbnZKvkwN2=kN2RRw@mail.gmail.com>
>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Looks good to me, thanks!
> ---
>  refs/ref-cache.c | 1 +
>  1 file changed, 1 insertion(+)
>
>  * I had this as "fixup!" on top of your topic for quite a while and
>    forgot to ask you to send in an official fix.  As Kyle's
>    discovery was after the topic hit 'next' (understandable, as
>    their internal edition of Git is based on 'next'), we need a
>    separate fix on top.
>
>    To prepare for merging down the whole thing to 'master', I wrote
>    the proposed log message to help expedite the process.  Comments?
>
> diff --git a/refs/ref-cache.c b/refs/ref-cache.c
> index 1d95b56d40..ceef3a2008 100644
> --- a/refs/ref-cache.c
> +++ b/refs/ref-cache.c
> @@ -527,6 +527,7 @@ static int cache_ref_iterator_seek(struct ref_iterator *ref_iterator,
>                                 level = &iter->levels[iter->levels_nr++];
>                                 level->dir = dir;
>                                 level->index = -1;
> +                               level->prefix_state = PREFIX_CONTAINS_DIR;
>                         } else {
>                                 /* reduce the index so the leaf node is iterated over */
>                                 if (cmp <= 0 && !slash)
> --
> 2.50.1-521-gf11ee0bd80
>
^ permalink raw reply	[flat|nested] 102+ messages in thread
* Re: [PATCH] ref-iterator-seek: correctly initialize the prefix_state for a new level
  2025-07-23 21:51   ` [PATCH] ref-iterator-seek: correctly initialize the prefix_state for a new level Junio C Hamano
  2025-07-23 21:57     ` Kyle Lippincott
@ 2025-07-23 23:52     ` Jeff King
  2025-07-24  8:12     ` Karthik Nayak
  2 siblings, 0 replies; 102+ messages in thread
From: Jeff King @ 2025-07-23 23:52 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Karthik Nayak, git, ps, schwab, phillip.wood123, Christian Couder,
	Kyle Lippincott
On Wed, Jul 23, 2025 at 02:51:50PM -0700, Junio C Hamano wrote:
> When cache_ref_iterator_seek() "jumps" to a middle of the sorted ref
> list, it forgets to set the .prefix_state member of the new
> (i.e. deeper) level it just initialized.  This later causes
> cache_ref_iterator_advance() to look at this uninitialized member
> to base its decision on what to do next.
> 
> Kyle Lippincott [*] and Jeff King noticed this with MSAN and
> Valgrind, and Karthik Nayak as the original author located exactly
> where the missing initialization is.
This explanation makes sense to me (from my admittedly rusty view of the
ref iteration code). And certainly the patch looks right.
-Peff
^ permalink raw reply	[flat|nested] 102+ messages in thread
* Re: [PATCH] ref-iterator-seek: correctly initialize the prefix_state for a new level
  2025-07-23 21:51   ` [PATCH] ref-iterator-seek: correctly initialize the prefix_state for a new level Junio C Hamano
  2025-07-23 21:57     ` Kyle Lippincott
  2025-07-23 23:52     ` Jeff King
@ 2025-07-24  8:12     ` Karthik Nayak
  2025-07-24 17:01       ` Junio C Hamano
  2 siblings, 1 reply; 102+ messages in thread
From: Karthik Nayak @ 2025-07-24  8:12 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, ps, schwab, phillip.wood123, Christian Couder,
	Kyle Lippincott, Jeff King
[-- Attachment #1: Type: text/plain, Size: 3400 bytes --]
Junio C Hamano <gitster@pobox.com> writes:
> When cache_ref_iterator_seek() "jumps" to a middle of the sorted ref
> list, it forgets to set the .prefix_state member of the new
> (i.e. deeper) level it just initialized.  This later causes
> cache_ref_iterator_advance() to look at this uninitialized member
> to base its decision on what to do next.
>
I think the explanation is correct. For reference I had some more
details in my local patch, but this is totally okay.
  ref-cache: set prefix_state when seeking
  In 090eb5336c (refs: selectively set prefix in the seek functions,
  2025-07-15) we separated the seeking functionality of reference
  iterators from the functionality to set prefix to an iterator. This
  allows users of ref iterators to seek to a particular reference to
  provide pagination support.
  The files backend uses the ref-cache iterator to iterate over loose
  refs. The iterator tracks directories and entries already processed via
  a stack of levels. Each level corresponds to a directory under the files
  backend. New levels are added to the stack, and when all entries from a
  level are yielded, the corresponding level is popped from the stack.
  To accommodate seeking, we need to populate and traverse the levels to
  stop the requested seek marker at the appropriate level and its entry
  index. Each level also contains a 'prefix_state' which is used for
  prefix matching; this allows the iterator to skip levels/entries which
  don't match a prefix. The default value of 'prefix_state' is
  PREFIX_CONTAINS_DIR, which yields all entries within a level. When
  purely seeking without prefix matching, we want to yield all entries.
  The commit, however, skips setting the value explicitly. This causes the
  MemorySanitizer to issue a 'use-of-uninitialized-value' error when
  running 't/t6302-for-each-ref-filter'.
  Set the value explicitly to fix the issue.
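
The level stack described above can be sketched in isolation as follows. This is a simplified illustration, not the code in refs/ref-cache.c: struct level, struct level_stack, push_level() and the integer dir_id are hypothetical, and the sketch assumes the array is grown with a realloc-style allocator, which does not zero new slots. Only the enum names and the zero value of PREFIX_CONTAINS_DIR are taken from the discussion.

```c
#include <assert.h>
#include <stdlib.h>

enum prefix_state {
	PREFIX_CONTAINS_DIR = 0,	/* every entry in the directory matches */
	PREFIX_WITHIN_DIR,		/* some entries might match */
	PREFIX_EXCLUDES_DIR		/* no entry can match */
};

/* Hypothetical, simplified level: dir_id stands in for a directory pointer. */
struct level {
	int dir_id;
	int index;
	enum prefix_state prefix_state;
};

struct level_stack {
	struct level *levels;
	size_t nr, alloc;
};

/*
 * Push a new level for a pure seek (no prefix filtering). Slots handed
 * back by realloc() hold indeterminate bytes, so every member is set
 * here, including prefix_state.
 */
struct level *push_level(struct level_stack *st, int dir_id)
{
	struct level *level;

	if (st->nr == st->alloc) {
		size_t new_alloc = st->alloc ? st->alloc * 2 : 4;
		struct level *grown =
			realloc(st->levels, new_alloc * sizeof(*st->levels));
		if (!grown)
			return NULL;
		st->levels = grown;
		st->alloc = new_alloc;
	}

	level = &st->levels[st->nr++];
	level->dir_id = dir_id;
	level->index = -1;
	level->prefix_state = PREFIX_CONTAINS_DIR;	/* the missing line */
	return level;
}

/* What the advance step later relies on: a defined prefix_state. */
int level_yields_everything(const struct level *level)
{
	assert(level != NULL);
	return level->prefix_state == PREFIX_CONTAINS_DIR;
}
```

Dropping the prefix_state assignment leaves level_yields_everything() reading whatever realloc() returned, which is the kind of read MSan and Valgrind report.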
> Kyle Lippincott [*] and Jeff King noticed this with MSAN and
> Valgrind, and Karthik Nayak as the original author located exactly
> where the missing initialization is.
>
> [*] <CAO_smVg9TDakUnubepjPGmLyOzW6n8Z=MDbnZKvkwN2=kN2RRw@mail.gmail.com>
>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
>  refs/ref-cache.c | 1 +
>  1 file changed, 1 insertion(+)
>
>  * I had this as "fixup!" on top of your topic for quite a while and
>    forgot to ask you to send in an official fix.  As Kyle's
>    discovery was after the topic hit 'next' (understandable, as
>    their internal edition of Git is based on 'next'), we need a
>    separate fix on top.
>
>    To prepare for merging down the whole thing to 'master', I wrote
>    the proposed log message to help expedite the process.  Comments?
>
I had a set of patches locally, I just didn't get around to sending it.
Will send the others, omitting this. Thanks for doing it!
> diff --git a/refs/ref-cache.c b/refs/ref-cache.c
> index 1d95b56d40..ceef3a2008 100644
> --- a/refs/ref-cache.c
> +++ b/refs/ref-cache.c
> @@ -527,6 +527,7 @@ static int cache_ref_iterator_seek(struct ref_iterator *ref_iterator,
>  				level = &iter->levels[iter->levels_nr++];
>  				level->dir = dir;
>  				level->index = -1;
> +				level->prefix_state = PREFIX_CONTAINS_DIR;
>  			} else {
>  				/* reduce the index so the leaf node is iterated over */
>  				if (cmp <= 0 && !slash)
> --
> 2.50.1-521-gf11ee0bd80
The patch looks good.
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]
^ permalink raw reply	[flat|nested] 102+ messages in thread
* Re: [PATCH] ref-iterator-seek: correctly initialize the prefix_state for a new level
  2025-07-24  8:12     ` Karthik Nayak
@ 2025-07-24 17:01       ` Junio C Hamano
  2025-07-24 22:11         ` [PATCH] ref-cache: set prefix_state when seeking Karthik Nayak
  0 siblings, 1 reply; 102+ messages in thread
From: Junio C Hamano @ 2025-07-24 17:01 UTC (permalink / raw)
  To: Karthik Nayak
  Cc: git, ps, schwab, phillip.wood123, Christian Couder,
	Kyle Lippincott, Jeff King
Karthik Nayak <karthik.188@gmail.com> writes:
>>  * I had this as "fixup!" on top of your topic for quite a while and
>>    forgot to ask you to send in an official fix.  As Kyle's
>>    discovery was after the topic hit 'next' (understandable, as
>>    their internal edition of Git is based on 'next'), we need a
>>    separate fix on top.
>>
>>    To prepare for merging down the whole thing to 'master', I wrote
>>    the proposed log message to help expedite the process.  Comments?
>>
>
> I had a set of patches locally, I just didn't get around to sending it.
> Will send the others, omitting this. Thanks for doing it!
I do not mind discarding what I sent out at all.  I actually prefer
if it came from you.
Thanks.
^ permalink raw reply	[flat|nested] 102+ messages in thread
* [PATCH] ref-cache: set prefix_state when seeking
  2025-07-24 17:01       ` Junio C Hamano
@ 2025-07-24 22:11         ` Karthik Nayak
  2025-07-24 22:30           ` Junio C Hamano
  0 siblings, 1 reply; 102+ messages in thread
From: Karthik Nayak @ 2025-07-24 22:11 UTC (permalink / raw)
  To: karthik.188; +Cc: git, gitster, spectral, peff
In 090eb5336c (refs: selectively set prefix in the seek functions,
2025-07-15) we separated the seeking functionality of reference
iterators from the functionality to set prefix to an iterator. This
allows users of ref iterators to seek to a particular reference to
provide pagination support.
The files backend uses the ref-cache iterator to iterate over loose
refs. The iterator tracks directories and entries already processed via
a stack of levels. Each level corresponds to a directory under the files
backend. New levels are added to the stack, and when all entries from a
level are yielded, the corresponding level is popped from the stack.
To accommodate seeking, we need to populate and traverse the levels to
stop the requested seek marker at the appropriate level and its entry
index. Each level also contains a 'prefix_state' which is used for
prefix matching; this allows the iterator to skip levels/entries which
don't match a prefix. The default value of 'prefix_state' is
PREFIX_CONTAINS_DIR, which yields all entries within a level. When
purely seeking without prefix matching, we want to yield all entries.
The commit, however, skips setting the value explicitly. This causes the
MemorySanitizer to issue a 'use-of-uninitialized-value' error when
running 't/t6302-for-each-ref-filter'.
Set the value explicitly to fix the issue.
Reported-by: Kyle Lippincott <spectral@google.com>
Helped-by: Kyle Lippincott <spectral@google.com>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
Here is my version of the same patch!
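For anyone not familiar with the level stack, below is a minimal,
self-contained sketch of why the missing assignment matters. It is not
the code in refs/ref-cache.c: 'struct level', 'struct level_stack',
push_level() and level_yields_all() are hypothetical names made up for
illustration, and the enum members other than PREFIX_CONTAINS_DIR are
assumed here. The point is only that a level carved out of a
realloc()-grown array holds indeterminate bytes until every field is
assigned, so reading 'prefix_state' later is what a sanitizer flags.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

enum prefix_state {
	PREFIX_CONTAINS_DIR,	/* every entry in the directory matches */
	PREFIX_WITHIN_DIR,	/* some entries may match (assumed name) */
	PREFIX_EXCLUDES_DIR	/* no entry can match (assumed name) */
};

/* Hypothetical, simplified stand-in for one level of the iterator stack. */
struct level {
	const char *dir;
	int index;
	enum prefix_state prefix_state;
};

/* Hypothetical growable stack of levels. */
struct level_stack {
	struct level *levels;
	size_t nr, alloc;
};

/*
 * Push a new level for 'dir'. realloc() does not zero the new slots, so
 * each field has to be assigned before anything reads it. Returns NULL
 * if the allocation fails.
 */
static struct level *push_level(struct level_stack *st, const char *dir)
{
	struct level *level;

	if (st->nr == st->alloc) {
		size_t alloc = st->alloc ? 2 * st->alloc : 4;
		struct level *grown = realloc(st->levels,
					      alloc * sizeof(*grown));
		if (!grown)
			return NULL;
		st->levels = grown;
		st->alloc = alloc;
	}

	level = &st->levels[st->nr++];
	level->dir = dir;
	level->index = -1;
	/* Without this line the read in level_yields_all() is undefined. */
	level->prefix_state = PREFIX_CONTAINS_DIR;
	return level;
}

/* A pure seek (no prefix) wants every entry of the level to be yielded. */
static int level_yields_all(const struct level *level)
{
	return level->prefix_state == PREFIX_CONTAINS_DIR;
}
```

In this sketch, a caller would push a level for something like
"refs/heads/" and then expect level_yields_all() to be true for it,
which only holds reliably once the field is set in push_level().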
 refs/ref-cache.c | 1 +
 1 file changed, 1 insertion(+)
diff --git a/refs/ref-cache.c b/refs/ref-cache.c
index 1d95b56d40..ceef3a2008 100644
--- a/refs/ref-cache.c
+++ b/refs/ref-cache.c
@@ -527,6 +527,7 @@ static int cache_ref_iterator_seek(struct ref_iterator *ref_iterator,
 				level = &iter->levels[iter->levels_nr++];
 				level->dir = dir;
 				level->index = -1;
+				level->prefix_state = PREFIX_CONTAINS_DIR;
 			} else {
 				/* reduce the index so the leaf node is iterated over */
 				if (cmp <= 0 && !slash)
-- 
2.49.0
^ permalink raw reply related	[flat|nested] 102+ messages in thread
* Re: [PATCH] ref-cache: set prefix_state when seeking
  2025-07-24 22:11         ` [PATCH] ref-cache: set prefix_state when seeking Karthik Nayak
@ 2025-07-24 22:30           ` Junio C Hamano
  0 siblings, 0 replies; 102+ messages in thread
From: Junio C Hamano @ 2025-07-24 22:30 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: git, spectral, peff
Karthik Nayak <karthik.188@gmail.com> writes:
> In 090eb5336c (refs: selectively set prefix in the seek functions,
> 2025-07-15) we separated the seeking functionality of reference
> iterators from the functionality to set prefix to an iterator. This
> allows users of ref iterators to seek to a particular reference to
> provide pagination support.
>
> The files backend uses the ref-cache iterator to iterate over loose
> refs. The iterator tracks directories and entries already processed via
> a stack of levels. Each level corresponds to a directory under the files
> backend. New levels are added to the stack, and when all entries from a
> level are yielded, the corresponding level is popped from the stack.
>
> To accommodate seeking, we need to populate and traverse the levels to
> stop the requested seek marker at the appropriate level and its entry
> index. Each level also contains a 'prefix_state' which is used for
> prefix matching; this allows the iterator to skip levels/entries which
> don't match a prefix. The default value of 'prefix_state' is
> PREFIX_CONTAINS_DIR, which yields all entries within a level. When
> purely seeking without prefix matching, we want to yield all entries.
> That commit, however, skips setting the value explicitly. This causes the
> MemorySanitizer to issue a 'use-of-uninitialized-value' error when
> running 't/t6302-for-each-ref-filter'.
>
> Set the value explicitly to fix the issue.
>
> Reported-by: Kyle Lippincott <spectral@google.com>
> Helped-by: Kyle Lippincott <spectral@google.com>
> Helped-by: Jeff King <peff@peff.net>
> Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
> ---
>
> Here is my version of the same patch!
Thanks!
>
>  refs/ref-cache.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/refs/ref-cache.c b/refs/ref-cache.c
> index 1d95b56d40..ceef3a2008 100644
> --- a/refs/ref-cache.c
> +++ b/refs/ref-cache.c
> @@ -527,6 +527,7 @@ static int cache_ref_iterator_seek(struct ref_iterator *ref_iterator,
>  				level = &iter->levels[iter->levels_nr++];
>  				level->dir = dir;
>  				level->index = -1;
> +				level->prefix_state = PREFIX_CONTAINS_DIR;
>  			} else {
>  				/* reduce the index so the leaf node is iterated over */
>  				if (cmp <= 0 && !slash)
^ permalink raw reply	[flat|nested] 102+ messages in thread