All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: "Patrick Steinhardt" <ps@pks.im>,
	"Øystein Walle" <oystwa@gmail.com>,
	"Kristoffer Haugsbakk" <code@khaugsbakk.name>,
	"Victoria Dye" <vdye@github.com>
Subject: [PATCH v2 00/10] for-each-ref optimizations & usability improvements
Date: Tue, 14 Nov 2023 19:53:48 +0000	[thread overview]
Message-ID: <pull.1609.v2.git.1699991638.gitgitgadget@gmail.com> (raw)
In-Reply-To: <pull.1609.git.1699320361.gitgitgadget@gmail.com>

This series is a bit of an informal follow-up to [1], adding some more
substantial optimizations and usability fixes around ref
filtering/formatting. Some of the changes here affect user-facing behavior,
some are internal-only, but they're all interdependent enough to warrant
putting them together in one series.

[1]
https://lore.kernel.org/git/pull.1594.v2.git.1696888736.gitgitgadget@gmail.com/

Patch 1 changes the behavior of the '--no-sort' option in 'for-each-ref',
'tag', and 'branch'. Currently, it just removes previous sort keys and, if
no further keys are specified, falls back on ascending refname sort (which,
IMO, makes the name '--no-sort' somewhat misleading). Now, '--no-sort'
completely disables sorting (unless subsequent '--sort' options are
provided).

Patches 2-7 incrementally refactor various parts of the ref
filtering/formatting workflows in order to create a
'filter_and_format_refs()' function. If certain conditions are met (sorting
disabled, no reachability filtering or ahead-behind formatting), ref
filtering & formatting is done within a single 'for_each_fullref_in'
callback. Especially in large repositories, this makes a huge difference in
memory usage & runtime for certain usages of 'for-each-ref', since it's no
longer writing everything to a 'struct ref_array' then repeatedly whittling
down/updating its contents.

Patch 8 updates the 'for-each-ref' documentation, making the '--format'
description a bit less jumbled and more clearly explaining the '*' prefix
(to be updated in the next patch)

Patch 9 changes the dereferencing done by the '*' format prefix from a
single dereference to a recursive peel. See [1] + replies for the discussion
that led to this approach (as opposed to a command line option or new format
specifier).

[1] https://lore.kernel.org/git/ZUoWWo7IEKsiSx-C@tanuki/

Finally, patch 10 adds performance tests for 'for-each-ref', showing the
effects of optimizations made throughout the series. Here are some sample
results from my Ubuntu VM (test names shortened for space):

Test                                                         HEAD
----------------------------------------------------------------------------
6300.2: (loose)                                              4.68(0.98+3.64)
6300.3: (loose, no sort)                                     4.65(0.91+3.67)
6300.4: (loose, --count=1)                                   4.50(0.84+3.60)
6300.5: (loose, --count=1, no sort)                          4.24(0.46+3.71)
6300.6: (loose, tags)                                        2.41(0.45+1.93)
6300.7: (loose, tags, no sort)                               2.33(0.48+1.83)
6300.8: (loose, tags, dereferenced)                          3.65(1.66+1.95)
6300.9: (loose, tags, dereferenced, no sort)                 3.48(1.59+1.87)
6300.10: for-each-ref + cat-file (loose, tags)               4.48(2.27+2.22)
6300.12: (packed)                                            0.90(0.68+0.18)
6300.13: (packed, no sort)                                   0.71(0.55+0.06)
6300.14: (packed, --count=1)                                 0.77(0.52+0.16)
6300.15: (packed, --count=1, no sort)                        0.03(0.01+0.02)
6300.16: (packed, tags)                                      0.45(0.33+0.10)
6300.17: (packed, tags, no sort)                             0.39(0.33+0.03)
6300.18: (packed, tags, dereferenced)                        1.83(1.67+0.10)
6300.19: (packed, tags, dereferenced, no sort)               1.42(1.28+0.08)
6300.20: for-each-ref + cat-file (packed, tags)              2.36(2.11+0.29)


 * Victoria


Changes since V1
================

 * Restructured commit message of patch 1 for better readability
 * Re-added 'ref_sorting_release(sorting)' to 'ls-remote'
 * Dropped patch 2 so we don't commit to behavior we don't want in
   'for-each-ref --omit-empty --count'
 * Split patch 6 into one that renames 'ref_filter_handler()' to
   'filter_one()' and another that creates helper functions from existing
   code
 * Added/updated code comments in patch 7, changed ref iteration "break"
   return value from -1 to 1
 * Added a patch to reword 'for-each-ref' documentation in anticipation of
   updating the description of what '*' does in the format
 * Removed command-line option '--full-deref' for peeling tags in '*' format
   fields in favor of simply cutting over from the current single
   dereference to recursive dereference in all cases. Updated tests to match
   new behavior.
 * Added the '--count=1' tests back to p6300 (I must have unintentionally
   removed them before submitting V1)

Victoria Dye (10):
  ref-filter.c: really don't sort when using --no-sort
  ref-filter.h: add max_count and omit_empty to ref_format
  ref-filter.h: move contains caches into filter
  ref-filter.h: add functions for filter/format & format-only
  ref-filter.c: rename 'ref_filter_handler()' to 'filter_one()'
  ref-filter.c: refactor to create common helper functions
  ref-filter.c: filter & format refs in the same callback
  for-each-ref: clean up documentation of --format
  ref-filter.c: use peeled tag for '*' format fields
  t/perf: add perf tests for for-each-ref

 Documentation/git-for-each-ref.txt |  23 +--
 builtin/branch.c                   |  42 +++--
 builtin/for-each-ref.c             |  39 +----
 builtin/ls-remote.c                |  11 +-
 builtin/tag.c                      |  32 +---
 ref-filter.c                       | 272 ++++++++++++++++++++---------
 ref-filter.h                       |  25 +++
 t/perf/p6300-for-each-ref.sh       |  87 +++++++++
 t/t3200-branch.sh                  |  68 +++++++-
 t/t6300-for-each-ref.sh            |  43 +++++
 t/t6302-for-each-ref-filter.sh     |   4 +-
 t/t7004-tag.sh                     |  45 +++++
 12 files changed, 517 insertions(+), 174 deletions(-)
 create mode 100755 t/perf/p6300-for-each-ref.sh


base-commit: bc5204569f7db44d22477485afd52ea410d83743
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1609%2Fvdye%2Fvdye%2Ffor-each-ref-optimizations-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1609/vdye/vdye/for-each-ref-optimizations-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/1609

Range-diff vs v1:

  1:  dea8d7d1e86 !  1:  074da1ff3e8 ref-filter.c: really don't sort when using --no-sort
     @@ Metadata
       ## Commit message ##
          ref-filter.c: really don't sort when using --no-sort
      
     -    Update 'ref_sorting_options()' to return a NULL 'struct ref_sorting *' if
     -    the string list provided to it is empty, rather than returning the default
     -    refname sort structure. Also update 'ref_array_sort()' to explicitly skip
     -    sorting if its 'struct ref_sorting *' arg is NULL. Other functions using
     -    'struct ref_sorting *' do not need any changes because they already properly
     -    ignore NULL values.
     +    When '--no-sort' is passed to 'for-each-ref', 'tag', and 'branch', the
     +    printed refs are still sorted by ascending refname. Change the handling of
     +    sort options in these commands so that '--no-sort' to truly disables
     +    sorting.
     +
     +    '--no-sort' does not disable sorting in these commands is because their
     +    option parsing does not distinguish between "the absence of '--sort'"
     +    (and/or values for tag.sort & branch.sort) and '--no-sort'. Both result in
     +    an empty 'sorting_options' string list, which is parsed by
     +    'ref_sorting_options()' to create the 'struct ref_sorting *' for the
     +    command. If the string list is empty, 'ref_sorting_options()' interprets
     +    that as "the absence of '--sort'" and returns the default ref sorting
     +    structure (equivalent to "refname" sort).
      
     -    The goal of this change is to have the '--no-sort' option truly disable
     -    sorting in commands like 'for-each-ref, 'tag', and 'branch'. Right now,
     -    '--no-sort' will still trigger refname sorting by default in 'for-each-ref',
     -    'tag', and 'branch'.
     +    To handle '--no-sort' properly while preserving the "refname" sort in the
     +    "absence of --sort'" case, first explicitly add "refname" to the string list
     +    *before* parsing options. This alone doesn't actually change any behavior,
     +    since 'compare_refs()' already falls back on comparing refnames if two refs
     +    are equal w.r.t all other sort keys.
      
     -    To match existing behavior as closely as possible, explicitly add "refname"
     -    to the list of sort keys in 'for-each-ref', 'tag', and 'branch' before
     -    parsing options (if no config-based sort keys are set). This ensures that
     -    sorting will only be fully disabled if '--no-sort' is provided as an option;
     -    otherwise, "refname" sorting will remain the default. Note: this also means
     -    that even when sort keys are provided on the command line, "refname" will be
     -    the final sort key in the sorting structure. This doesn't actually change
     -    any behavior, since 'compare_refs()' already falls back on comparing
     -    refnames if two refs are equal w.r.t all other sort keys.
     +    Now that the string list is populated by default, '--no-sort' is the only
     +    way to empty the 'sorting_options' string list. Update
     +    'ref_sorting_options()' to return a NULL 'struct ref_sorting *' if the
     +    string list is empty, and add a condition to 'ref_array_sort()' to skip the
     +    sort altogether if the sort structure is NULL. Note that other functions
     +    using 'struct ref_sorting *' do not need any changes because they already
     +    ignore NULL values.
      
          Finally, remove the condition around sorting in 'ls-remote', since it's no
     -    longer necessary. Unlike 'for-each-ref' et. al., it does *not* set any sort
     -    keys by default. The default empty list of sort keys will produce a NULL
     -    'struct ref_sorting *', which causes the sorting to be skipped in
     -    'ref_array_sort()'.
     +    longer necessary. Unlike 'for-each-ref' et. al., it does *not* do any
     +    sorting by default. This default is preserved by simply leaving its sort key
     +    string list empty before parsing options; if no additional sort keys are
     +    set, 'struct ref_sorting *' is NULL and sorting is skipped.
      
          Signed-off-by: Victoria Dye <vdye@github.com>
      
     @@ builtin/ls-remote.c: int cmd_ls_remote(int argc, const char **argv, const char *
       
       	for (i = 0; i < ref_array.nr; i++) {
       		const struct ref_array_item *ref = ref_array.items[i];
     +@@ builtin/ls-remote.c: int cmd_ls_remote(int argc, const char **argv, const char *prefix)
     + 		status = 0; /* we found something */
     + 	}
     + 
     ++	ref_sorting_release(sorting);
     + 	ref_array_clear(&ref_array);
     + 	if (transport_disconnect(transport))
     + 		status = 1;
      
       ## builtin/tag.c ##
      @@ builtin/tag.c: int cmd_tag(int argc, const char **argv, const char *prefix)
  2:  88eba4146cd <  -:  ----------- for-each-ref: clarify interaction of --omit-empty & --count
  3:  2e2f9738205 =  2:  adac101bc60 ref-filter.h: add max_count and omit_empty to ref_format
  4:  6c66445ee31 !  3:  f44c4b42c93 ref-filter.h: move contains caches into filter
     @@ Commit message
          filter struct will support, so they are updated to be internally accessible
          wherever the filter is used.
      
     -    The design used here is mirrors what was introduced in 576de3d956
     +    The design used here mirrors what was introduced in 576de3d956
          (unpack_trees: start splitting internal fields from public API, 2023-02-27)
          for 'unpack_trees_options'.
      
  5:  f5be57eea7d =  4:  187b1d6610f ref-filter.h: add functions for filter/format & format-only
  -:  ----------- >  5:  040d291ca45 ref-filter.c: rename 'ref_filter_handler()' to 'filter_one()'
  6:  8c77452e5dd !  6:  633c0c74c2e ref-filter.c: refactor to create common helper functions
     @@ Commit message
          ref-filter.c: refactor to create common helper functions
      
          Factor out parts of 'ref_array_push()', 'ref_filter_handler()', and
     -    'filter_refs()' into new helper functions ('ref_array_append()',
     -    'apply_ref_filter()', and 'do_filter_refs()' respectively), as well as
     -    rename 'ref_filter_handler()' to 'filter_one()'. In this and later
     -    patches, these helpers will be used by new ref-filter API functions. This
     -    patch does not result in any user-facing behavior changes or changes to
     -    callers outside of 'ref-filter.c'.
     +    'filter_refs()' into new helper functions:
      
     -    The changes are as follows:
     +    * Extract the code to grow a 'struct ref_array' and append a given 'struct
     +      ref_array_item *' to it from 'ref_array_push()' into 'ref_array_append()'.
     +    * Extract the code to filter a given ref by refname & object ID then create
     +      a new 'struct ref_array_item *' from 'filter_one()' into
     +      'apply_ref_filter()'.
     +    * Extract the code for filter pre-processing, contains cache creation, and
     +      ref iteration from 'filter_refs()' into 'do_filter_refs()'.
      
     -    * The logic to grow a 'struct ref_array' and append a given 'struct
     -      ref_array_item *' to it is extracted from 'ref_array_push()' into
     -      'ref_array_append()'.
     -    * 'ref_filter_handler()' is renamed to 'filter_one()' to more clearly
     -      distinguish it from other ref filtering callbacks that will be added in
     -      later patches. The "*_one()" naming convention is common throughout the
     -      codebase for iteration callbacks.
     -    * The code to filter a given ref by refname & object ID then create a new
     -      'struct ref_array_item' is moved out of 'filter_one()' and into
     -      'apply_ref_filter()'. 'apply_ref_filter()' returns either NULL (if the ref
     -      does not match the given filter) or a 'struct ref_array_item *' created
     -      with 'new_ref_array_item()'; 'filter_one()' appends that item to
     -      its ref array with 'ref_array_append()'.
     -    * The filter pre-processing, contains cache creation, and ref iteration of
     -      'filter_refs()' is extracted into 'do_filter_refs()'. 'do_filter_refs()'
     -      takes its ref iterator function & callback data as an input from the
     -      caller, setting it up to be used with additional filtering callbacks in
     -      later patches.
     +    In later patches, these helpers will be used by new ref-filter API
     +    functions. This patch does not result in any user-facing behavior changes or
     +    changes to callers outside of 'ref-filter.c'.
      
          Signed-off-by: Victoria Dye <vdye@github.com>
      
     @@ ref-filter.c: static int filter_ref_kind(struct ref_filter *filter, const char *
      - * A call-back given to for_each_ref().  Filter refs and keep them for
      - * later object processing.
      - */
     --static int ref_filter_handler(const char *refname, const struct object_id *oid, int flag, void *cb_data)
     +-static int filter_one(const char *refname, const struct object_id *oid, int flag, void *cb_data)
      +static struct ref_array_item *apply_ref_filter(const char *refname, const struct object_id *oid,
      +			    int flag, struct ref_filter *filter)
       {
     @@ ref-filter.c: static int filter_ref_kind(struct ref_filter *filter, const char *
       
       	/*
       	 * A merge filter is applied on refs pointing to commits. Hence
     -@@ ref-filter.c: static int ref_filter_handler(const char *refname, const struct object_id *oid,
     +@@ ref-filter.c: static int filter_one(const char *refname, const struct object_id *oid, int flag
       	    filter->with_commit || filter->no_commit || filter->verbose) {
       		commit = lookup_commit_reference_gently(the_repository, oid, 1);
       		if (!commit)
     @@ ref-filter.c: static int ref_filter_handler(const char *refname, const struct ob
       	}
       
       	/*
     -@@ ref-filter.c: static int ref_filter_handler(const char *refname, const struct object_id *oid,
     +@@ ref-filter.c: static int filter_one(const char *refname, const struct object_id *oid, int flag
       	 * to do its job and the resulting list may yet to be pruned
       	 * by maxcount logic.
       	 */
     @@ ref-filter.c: int filter_refs(struct ref_array *array, struct ref_filter *filter
       		 * of filter_ref_kind().
       		 */
       		if (filter->kind == FILTER_REFS_BRANCHES)
     --			ret = for_each_fullref_in("refs/heads/", ref_filter_handler, &ref_cbdata);
     +-			ret = for_each_fullref_in("refs/heads/", filter_one, &ref_cbdata);
      +			ret = for_each_fullref_in("refs/heads/", fn, cb_data);
       		else if (filter->kind == FILTER_REFS_REMOTES)
     --			ret = for_each_fullref_in("refs/remotes/", ref_filter_handler, &ref_cbdata);
     +-			ret = for_each_fullref_in("refs/remotes/", filter_one, &ref_cbdata);
      +			ret = for_each_fullref_in("refs/remotes/", fn, cb_data);
       		else if (filter->kind == FILTER_REFS_TAGS)
     --			ret = for_each_fullref_in("refs/tags/", ref_filter_handler, &ref_cbdata);
     +-			ret = for_each_fullref_in("refs/tags/", filter_one, &ref_cbdata);
      +			ret = for_each_fullref_in("refs/tags/", fn, cb_data);
       		else if (filter->kind & FILTER_REFS_ALL)
     --			ret = for_each_fullref_in_pattern(filter, ref_filter_handler, &ref_cbdata);
     +-			ret = for_each_fullref_in_pattern(filter, filter_one, &ref_cbdata);
      +			ret = for_each_fullref_in_pattern(filter, fn, cb_data);
       		if (!ret && (filter->kind & FILTER_REFS_DETACHED_HEAD))
     --			head_ref(ref_filter_handler, &ref_cbdata);
     +-			head_ref(filter_one, &ref_cbdata);
      +			head_ref(fn, cb_data);
       	}
       
  7:  84db440896c !  7:  91a77c1a834 ref-filter.c: filter & format refs in the same callback
     @@ ref-filter.c: static void free_array_item(struct ref_array_item *item)
      +	strbuf_release(&err);
      +	free_array_item(ref);
      +
     ++	/*
     ++	 * Increment the running count of refs that match the filter. If
     ++	 * max_count is set and we've reached the max, stop the ref
     ++	 * iteration by returning a nonzero value.
     ++	 */
      +	if (ref_cbdata->format->array_opts.max_count &&
      +	    ++ref_cbdata->internal.count >= ref_cbdata->format->array_opts.max_count)
     -+		return -1;
     ++		return 1;
      +
      +	return 0;
      +}
     @@ ref-filter.c: int filter_refs(struct ref_array *array, struct ref_filter *filter
      +					  struct ref_format *format)
      +{
      +	/*
     -+	 * Refs can be filtered and formatted in the same iteration as long
     -+	 * as we aren't filtering on reachability, sorting the results, or
     -+	 * including ahead-behind information in the formatted output.
     ++	 * Filtering & formatting results within a single ref iteration
     ++	 * callback is not compatible with options that require
     ++	 * post-processing a filtered ref_array. These include:
     ++	 * - filtering on reachability
     ++	 * - sorting the filtered results
     ++	 * - including ahead-behind information in the formatted output
      +	 */
      +	return !(filter->reachable_from ||
      +		 filter->unreachable_from ||
  -:  ----------- >  8:  8eb2fc2950c for-each-ref: clean up documentation of --format
  8:  352b5c42ac3 !  9:  48254d8e161 for-each-ref: add option to fully dereference tags
     @@ Metadata
      Author: Victoria Dye <vdye@github.com>
      
       ## Commit message ##
     -    for-each-ref: add option to fully dereference tags
     +    ref-filter.c: use peeled tag for '*' format fields
      
     -    Add a boolean flag '--full-deref' that, when enabled, fills '%(*fieldname)'
     -    format fields using the fully peeled target of tag objects, rather than the
     -    immediate target.
     -
     -    In other builtins ('rev-parse', 'show-ref'), "dereferencing" tags typically
     -    means peeling them down to their non-tag target. Unlike these commands,
     -    'for-each-ref' dereferences only one "level" of tags in '*' format fields
     -    (like "%(*objectname)"). For most annotated tags, one level of dereferencing
     -    is enough, since most tags point to commits or trees. However, nested tags
     -    (annotated tags whose target is another annotated tag) dereferenced once
     -    will point to their target tag, different a full peel to e.g. a commit.
     +    In most builtins ('rev-parse <revision>^{}', 'show-ref --dereference'),
     +    "dereferencing" a tag refers to a recursive peel of the tag object. Unlike
     +    these cases, the dereferencing prefix ('*') in 'for-each-ref' format
     +    specifiers triggers only a single, non-recursive dereference of a given tag
     +    object. For most annotated tags, a single dereference is all that is needed
     +    to access the tag's associated commit or tree; "recursive" and
     +    "non-recursive" dereferencing are functionally equivalent in these cases.
     +    However, nested tags (annotated tags whose target is another annotated tag)
     +    dereferenced once return another tag, where a recursive dereference would
     +    return the commit or tree.
      
          Currently, if a user wants to filter & format refs and include information
     -    about the fully dereferenced tag, they can do so with something like
     +    about a recursively-dereferenced tag, they can do so with something like
          'cat-file --batch-check':
      
              git for-each-ref --format="%(objectname)^{} %(refname)" <pattern> |
                  git cat-file --batch-check="%(objectname) %(rest)"
      
          But the combination of commands is inefficient. So, to improve the
     -    efficiency of this use case, add a '--full-deref' option that causes
     -    'for-each-ref' to fully dereference tags when formatting with '*' fields.
     +    performance of this use case and align the defererencing behavior of
     +    'for-each-ref' with that of other commands, update the ref formatting code
     +    to use the peeled tag (from 'peel_iterated_oid()') to populate '*' fields
     +    rather than the tag's immediate target object (from 'get_tagged_oid()').
     +
     +    Additionally, add a test to 't6300-for-each-ref' to verify new nested tag
     +    behavior and update 't6302-for-each-ref-filter.sh' to print the correct
     +    value for nested dereferenced fields.
      
          Signed-off-by: Victoria Dye <vdye@github.com>
      
       ## Documentation/git-for-each-ref.txt ##
     -@@ Documentation/git-for-each-ref.txt: SYNOPSIS
     - 'git for-each-ref' [--count=<count>] [--shell|--perl|--python|--tcl]
     - 		   [(--sort=<key>)...] [--format=<format>]
     - 		   [ --stdin | <pattern>... ]
     -+		   [--full-deref]
     - 		   [--points-at=<object>]
     - 		   [--merged[=<object>]] [--no-merged[=<object>]]
     - 		   [--contains[=<object>]] [--no-contains[=<object>]]
     -@@ Documentation/git-for-each-ref.txt: OPTIONS
     - 	the specified host language.  This is meant to produce
     - 	a scriptlet that can directly be `eval`ed.
     +@@ Documentation/git-for-each-ref.txt: from the `committer` or `tagger` fields depending on the object type.
     + These are intended for working on a mix of annotated and lightweight tags.
       
     -+--full-deref::
     -+	Populate dereferenced format fields (indicated with an asterisk (`*`)
     -+	prefix before the fieldname) with information about the fully-peeled
     -+	target object of a tag ref, rather than its immediate target object.
     -+	This only affects the output for nested annotated tags, where the tag's
     -+	immediate target is another tag but its fully-peeled target is another
     -+	object type (e.g. a commit).
     -+
     - --points-at=<object>::
     - 	Only list refs which points at the given object.
     + For tag objects, a `fieldname` prefixed with an asterisk (`*`) expands to
     +-the `fieldname` value of object the tag points at, rather than that of the
     +-tag object itself.
     ++the `fieldname` value of the peeled object, rather than that of the tag
     ++object itself.
       
     -
     - ## builtin/for-each-ref.c ##
     -@@ builtin/for-each-ref.c: int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
     - 		OPT_INTEGER( 0 , "count", &format.array_opts.max_count, N_("show only <n> matched refs")),
     - 		OPT_STRING(  0 , "format", &format.format, N_("format"), N_("format to use for the output")),
     - 		OPT__COLOR(&format.use_color, N_("respect format colors")),
     -+		OPT_BOOL(0, "full-deref", &format.full_deref,
     -+			 N_("fully dereference tags to populate '*' format fields")),
     - 		OPT_REF_FILTER_EXCLUDE(&filter),
     - 		OPT_REF_SORT(&sorting_options),
     - 		OPT_CALLBACK(0, "points-at", &filter.points_at,
     + Fields that have name-email-date tuple as its value (`author`,
     + `committer`, and `tagger`) can be suffixed with `name`, `email`,
      
       ## ref-filter.c ##
     -@@ ref-filter.c: static struct used_atom {
     - 		char *head;
     - 	} u;
     - } *used_atom;
     --static int used_atom_cnt, need_tagged, need_symref;
     -+static int used_atom_cnt, need_symref;
     -+
     -+enum tag_dereference_mode {
     -+	NO_DEREF = 0,
     -+	DEREF_ONE,
     -+	DEREF_ALL
     -+};
     -+static enum tag_dereference_mode need_tagged;
     - 
     - /*
     -  * Expand string, append it to strbuf *sb, then return error code ret.
     -@@ ref-filter.c: static int parse_ref_filter_atom(struct ref_format *format,
     - 	memset(&used_atom[at].u, 0, sizeof(used_atom[at].u));
     - 	if (valid_atom[i].parser && valid_atom[i].parser(format, &used_atom[at], arg, err))
     - 		return -1;
     --	if (*atom == '*')
     --		need_tagged = 1;
     -+	if (*atom == '*' && !need_tagged)
     -+		need_tagged = format->full_deref ? DEREF_ALL : DEREF_ONE;
     - 	if (i == ATOM_SYMREF)
     - 		need_symref = 1;
     - 	return at;
      @@ ref-filter.c: static int populate_value(struct ref_array_item *ref, struct strbuf *err)
     - 	 * If it is a tag object, see if we use a value that derefs
     - 	 * the object, and if we do grab the object it refers to.
     + 		return 0;
     + 
     + 	/*
     +-	 * If it is a tag object, see if we use a value that derefs
     +-	 * the object, and if we do grab the object it refers to.
     ++	 * If it is a tag object, see if we use the peeled value. If we do,
     ++	 * grab the peeled OID.
       	 */
      -	oi_deref.oid = *get_tagged_oid((struct tag *)obj);
     -+	if (need_tagged == DEREF_ALL) {
     -+		if (peel_iterated_oid(&obj->oid, &oi_deref.oid))
     -+			die("bad tag");
     -+	} else {
     -+		oi_deref.oid = *get_tagged_oid((struct tag *)obj);
     -+	}
     ++	if (need_tagged && peel_iterated_oid(&obj->oid, &oi_deref.oid))
     ++		die("bad tag");
       
      -	/*
      -	 * NEEDSWORK: This derefs tag only once, which
     @@ ref-filter.c: static int populate_value(struct ref_array_item *ref, struct strbu
       }
       
      
     - ## ref-filter.h ##
     -@@ ref-filter.h: struct ref_format {
     - 	const char *rest;
     - 	int quote_style;
     - 	int use_color;
     -+	int full_deref;
     - 
     - 	/* Internal state to ref-filter */
     - 	int need_color_reset_at_eol;
     -
       ## t/t6300-for-each-ref.sh ##
      @@ t/t6300-for-each-ref.sh: test_expect_success 'git for-each-ref with non-existing refs' '
       	test_must_be_empty actual
     @@ t/t6300-for-each-ref.sh: test_expect_success 'git for-each-ref with non-existing
      +	nest1_tag_oid="$(git rev-parse refs/tags/nested/nest1)" &&
      +	nest2_tag_oid="$(git rev-parse refs/tags/nested/nest2)" &&
      +
     -+	# Without full dereference
     -+	cat >expect <<-EOF &&
     -+	refs/tags/nested/base $base_tag_oid tag $head_oid commit
     -+	refs/tags/nested/nest1 $nest1_tag_oid tag $base_tag_oid tag
     -+	refs/tags/nested/nest2 $nest2_tag_oid tag $nest1_tag_oid tag
     -+	EOF
     -+
     -+	git for-each-ref --format="%(refname) %(objectname) %(objecttype) %(*objectname) %(*objecttype)" \
     -+		refs/tags/nested/ >actual &&
     -+	test_cmp expect actual &&
     -+
     -+	# With full dereference
      +	cat >expect <<-EOF &&
      +	refs/tags/nested/base $base_tag_oid tag $head_oid commit
      +	refs/tags/nested/nest1 $nest1_tag_oid tag $head_oid commit
      +	refs/tags/nested/nest2 $nest2_tag_oid tag $head_oid commit
      +	EOF
      +
     -+	git for-each-ref --full-deref \
     ++	git for-each-ref \
      +		--format="%(refname) %(objectname) %(objecttype) %(*objectname) %(*objecttype)" \
      +		refs/tags/nested/ >actual &&
      +	test_cmp expect actual
     @@ t/t6300-for-each-ref.sh: test_expect_success 'git for-each-ref with non-existing
       GRADE_FORMAT="%(signature:grade)%0a%(signature:key)%0a%(signature:signer)%0a%(signature:fingerprint)%0a%(signature:primarykeyfingerprint)"
       TRUSTLEVEL_FORMAT="%(signature:trustlevel)%0a%(signature:key)%0a%(signature:signer)%0a%(signature:fingerprint)%0a%(signature:primarykeyfingerprint)"
       
     +
     + ## t/t6302-for-each-ref-filter.sh ##
     +@@ t/t6302-for-each-ref-filter.sh: test_expect_success 'check signed tags with --points-at' '
     + 	sed -e "s/Z$//" >expect <<-\EOF &&
     + 	refs/heads/side Z
     + 	refs/tags/annotated-tag four
     +-	refs/tags/doubly-annotated-tag An annotated tag
     +-	refs/tags/doubly-signed-tag A signed tag
     ++	refs/tags/doubly-annotated-tag four
     ++	refs/tags/doubly-signed-tag four
     + 	refs/tags/four Z
     + 	refs/tags/signed-tag four
     + 	EOF
  9:  a409d773057 ! 10:  d51d073aa4a t/perf: add perf tests for for-each-ref
     @@ Commit message
          Add performance tests for 'for-each-ref'. The tests exercise different
          combinations of filters/formats/options, as well as the overall performance
          of 'git for-each-ref | git cat-file --batch-check' to demonstrate the
     -    performance difference vs. 'git for-each-ref --full-deref'.
     +    performance difference vs. 'git for-each-ref' with "%(*fieldname)" format
     +    specifiers.
      
          All tests are run against a repository with 40k loose refs - 10k commits,
          each having a unique:
     @@ t/perf/p6300-for-each-ref.sh (new)
      +run_tests () {
      +	test_for_each_ref "$1"
      +	test_for_each_ref "$1, no sort" --no-sort
     ++	test_for_each_ref "$1, --count=1" --count=1
     ++	test_for_each_ref "$1, --count=1, no sort" --no-sort --count=1
      +	test_for_each_ref "$1, tags" refs/tags/
      +	test_for_each_ref "$1, tags, no sort" --no-sort refs/tags/
     -+	test_for_each_ref "$1, tags, shallow deref" '--format="%(refname) %(objectname) %(*objectname)"' refs/tags/
     -+	test_for_each_ref "$1, tags, shallow deref, no sort" --no-sort '--format="%(refname) %(objectname) %(*objectname)"' refs/tags/
     -+	test_for_each_ref "$1, tags, full deref" --full-deref '--format="%(refname) %(objectname) %(*objectname)"' refs/tags/
     -+	test_for_each_ref "$1, tags, full deref, no sort" --no-sort --full-deref '--format="%(refname) %(objectname) %(*objectname)"' refs/tags/
     ++	test_for_each_ref "$1, tags, dereferenced" '--format="%(refname) %(objectname) %(*objectname)"' refs/tags/
     ++	test_for_each_ref "$1, tags, dereferenced, no sort" --no-sort '--format="%(refname) %(objectname) %(*objectname)"' refs/tags/
      +
     -+	test_perf "for-each-ref ($1, tags) + cat-file --batch-check (full deref)" "
     ++	test_perf "for-each-ref ($1, tags) + cat-file --batch-check (dereferenced)" "
      +		for i in \$(test_seq $test_iteration_count); do
      +			git for-each-ref --format='%(objectname)^{} %(refname) %(objectname)' refs/tags/ | \
      +				git cat-file --batch-check='%(objectname) %(rest)' >/dev/null

-- 
gitgitgadget

  parent reply	other threads:[~2023-11-14 19:54 UTC|newest]

Thread overview: 49+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-11-07  1:25 [PATCH 0/9] for-each-ref optimizations & usability improvements Victoria Dye via GitGitGadget
2023-11-07  1:25 ` [PATCH 1/9] ref-filter.c: really don't sort when using --no-sort Victoria Dye via GitGitGadget
2023-11-07 10:49   ` Patrick Steinhardt
2023-11-07 18:13     ` Victoria Dye
2023-11-07  1:25 ` [PATCH 2/9] for-each-ref: clarify interaction of --omit-empty & --count Victoria Dye via GitGitGadget
2023-11-07 19:23   ` Øystein Walle
2023-11-07 19:30     ` Victoria Dye
2023-11-08  7:53       ` Øystein Walle
2023-11-08 10:00         ` Kristoffer Haugsbakk
2023-11-07  1:25 ` [PATCH 3/9] ref-filter.h: add max_count and omit_empty to ref_format Victoria Dye via GitGitGadget
2023-11-07  1:25 ` [PATCH 4/9] ref-filter.h: move contains caches into filter Victoria Dye via GitGitGadget
2023-11-07 10:49   ` Patrick Steinhardt
2023-11-07  1:25 ` [PATCH 5/9] ref-filter.h: add functions for filter/format & format-only Victoria Dye via GitGitGadget
2023-11-07  1:25 ` [PATCH 6/9] ref-filter.c: refactor to create common helper functions Victoria Dye via GitGitGadget
2023-11-07 10:49   ` Patrick Steinhardt
2023-11-07 18:41     ` Victoria Dye
2023-11-07  1:25 ` [PATCH 7/9] ref-filter.c: filter & format refs in the same callback Victoria Dye via GitGitGadget
2023-11-07 10:49   ` Patrick Steinhardt
2023-11-07 19:45     ` Victoria Dye
2023-11-07  1:26 ` [PATCH 8/9] for-each-ref: add option to fully dereference tags Victoria Dye via GitGitGadget
2023-11-07 10:50   ` Patrick Steinhardt
2023-11-08  1:13     ` Victoria Dye
2023-11-08  3:14       ` Junio C Hamano
2023-11-08  7:19         ` Patrick Steinhardt
2023-11-08 18:02         ` Victoria Dye
2023-11-09  1:22           ` Junio C Hamano
2023-11-09  1:23           ` Junio C Hamano
2023-11-09  1:32             ` Junio C Hamano
2023-11-07  1:26 ` [PATCH 9/9] t/perf: add perf tests for for-each-ref Victoria Dye via GitGitGadget
2023-11-07  2:36 ` [PATCH 0/9] for-each-ref optimizations & usability improvements Junio C Hamano
2023-11-07  2:48   ` Victoria Dye
2023-11-07  3:04     ` Junio C Hamano
2023-11-07 10:49     ` Patrick Steinhardt
2023-11-08  1:31       ` Victoria Dye
2023-11-14 19:53 ` Victoria Dye via GitGitGadget [this message]
2023-11-14 19:53   ` [PATCH v2 01/10] ref-filter.c: really don't sort when using --no-sort Victoria Dye via GitGitGadget
2023-11-16  5:29     ` Junio C Hamano
2023-11-14 19:53   ` [PATCH v2 02/10] ref-filter.h: add max_count and omit_empty to ref_format Victoria Dye via GitGitGadget
2023-11-16 12:06     ` Øystein Walle
2023-11-14 19:53   ` [PATCH v2 03/10] ref-filter.h: move contains caches into filter Victoria Dye via GitGitGadget
2023-11-14 19:53   ` [PATCH v2 04/10] ref-filter.h: add functions for filter/format & format-only Victoria Dye via GitGitGadget
2023-11-16  5:39     ` Junio C Hamano
2023-11-14 19:53   ` [PATCH v2 05/10] ref-filter.c: rename 'ref_filter_handler()' to 'filter_one()' Victoria Dye via GitGitGadget
2023-11-14 19:53   ` [PATCH v2 06/10] ref-filter.c: refactor to create common helper functions Victoria Dye via GitGitGadget
2023-11-14 19:53   ` [PATCH v2 07/10] ref-filter.c: filter & format refs in the same callback Victoria Dye via GitGitGadget
2023-11-14 19:53   ` [PATCH v2 08/10] for-each-ref: clean up documentation of --format Victoria Dye via GitGitGadget
2023-11-14 19:53   ` [PATCH v2 09/10] ref-filter.c: use peeled tag for '*' format fields Victoria Dye via GitGitGadget
2023-11-16  5:48     ` Junio C Hamano
2023-11-14 19:53   ` [PATCH v2 10/10] t/perf: add perf tests for for-each-ref Victoria Dye via GitGitGadget

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=pull.1609.v2.git.1699991638.gitgitgadget@gmail.com \
    --to=gitgitgadget@gmail.com \
    --cc=code@khaugsbakk.name \
    --cc=git@vger.kernel.org \
    --cc=oystwa@gmail.com \
    --cc=ps@pks.im \
    --cc=vdye@github.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.