From: Patrick Steinhardt <ps@pks.im>
To: Victoria Dye via GitGitGadget <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, Victoria Dye <vdye@github.com>
Subject: Re: [PATCH 7/9] ref-filter.c: filter & format refs in the same callback
Date: Tue, 7 Nov 2023 11:49:56 +0100 [thread overview]
Message-ID: <ZUoWVPSE1GcJdHFE@tanuki> (raw)
In-Reply-To: <84db440896c162bcbeeaaf00d528839056aefaa5.1699320362.git.gitgitgadget@gmail.com>
[-- Attachment #1: Type: text/plain, Size: 5290 bytes --]
On Tue, Nov 07, 2023 at 01:25:59AM +0000, Victoria Dye via GitGitGadget wrote:
> From: Victoria Dye <vdye@github.com>
>
> Update 'filter_and_format_refs()' to try to perform ref filtering &
> formatting in a single ref iteration, without an intermediate 'struct
> ref_array'. This can only be done if no operations need to be performed on a
> pre-filtered array; specifically, if the refs are
>
> - filtered on reachability,
> - sorted, or
> - formatted with ahead-behind information
>
> they cannot be filtered & formatted in the same iteration. In that case,
> fall back on the current filter-then-sort-then-format flow.
>
> This optimization substantially improves memory usage due to no longer
> storing a ref array in memory. In some cases, it also dramatically reduces
> runtime (e.g. 'git for-each-ref --no-sort --count=1', which no longer loads
> all refs into a 'struct ref_array' to printing only the first ref).
>
> Signed-off-by: Victoria Dye <vdye@github.com>
> ---
> ref-filter.c | 80 ++++++++++++++++++++++++++++++++++++++++++++++++----
> 1 file changed, 74 insertions(+), 6 deletions(-)
>
> diff --git a/ref-filter.c b/ref-filter.c
> index ff00ab4b8d8..384cf1595ff 100644
> --- a/ref-filter.c
> +++ b/ref-filter.c
> @@ -2863,6 +2863,44 @@ static void free_array_item(struct ref_array_item *item)
> free(item);
> }
>
> +struct ref_filter_and_format_cbdata {
> + struct ref_filter *filter;
> + struct ref_format *format;
> +
> + struct ref_filter_and_format_internal {
> + int count;
> + } internal;
> +};
> +
> +static int filter_and_format_one(const char *refname, const struct object_id *oid, int flag, void *cb_data)
> +{
> + struct ref_filter_and_format_cbdata *ref_cbdata = cb_data;
> + struct ref_array_item *ref;
> + struct strbuf output = STRBUF_INIT, err = STRBUF_INIT;
> +
> + ref = apply_ref_filter(refname, oid, flag, ref_cbdata->filter);
> + if (!ref)
> + return 0;
> +
> + if (format_ref_array_item(ref, ref_cbdata->format, &output, &err))
> + die("%s", err.buf);
> +
> + if (output.len || !ref_cbdata->format->array_opts.omit_empty) {
> + fwrite(output.buf, 1, output.len, stdout);
> + putchar('\n');
> + }
> +
> + strbuf_release(&output);
> + strbuf_release(&err);
> + free_array_item(ref);
> +
> + if (ref_cbdata->format->array_opts.max_count &&
> + ++ref_cbdata->internal.count >= ref_cbdata->format->array_opts.max_count)
> + return -1;
It feels a bit weird to return a negative value here, which usually
indicates that an error has happened whereas we only use it here to
abort the iteration. But we ignore the return value of
`do_iterate_refs()` anyway, so it doesn't make much of a difference.
> + return 0;
> +}
> +
> /* Free all memory allocated for ref_array */
> void ref_array_clear(struct ref_array *array)
> {
> @@ -3046,16 +3084,46 @@ int filter_refs(struct ref_array *array, struct ref_filter *filter, unsigned int
> return ret;
> }
>
> +static inline int can_do_iterative_format(struct ref_filter *filter,
> + struct ref_sorting *sorting,
> + struct ref_format *format)
> +{
> + /*
> + * Refs can be filtered and formatted in the same iteration as long
> + * as we aren't filtering on reachability, sorting the results, or
> + * including ahead-behind information in the formatted output.
> + */
Do we want to format this as a bulleted list so that it's more readily
extensible if we ever need to pay attention to new options here? Also, I
noted that this commit doesn't add any new tests -- do we already
exercise all of these conditions?
More generally, I worry a bit about maintainability of this code snippet
as we need to remember to always update this condition whenever we add a
new option, and this can be quite easy to miss. The performance benefit
might be worth the effort though.
Patrick
> + return !(filter->reachable_from ||
> + filter->unreachable_from ||
> + sorting ||
> + format->bases.nr);
> +}
> +
> void filter_and_format_refs(struct ref_filter *filter, unsigned int type,
> struct ref_sorting *sorting,
> struct ref_format *format)
> {
> - struct ref_array array = { 0 };
> - filter_refs(&array, filter, type);
> - filter_ahead_behind(the_repository, format, &array);
> - ref_array_sort(sorting, &array);
> - print_formatted_ref_array(&array, format);
> - ref_array_clear(&array);
> + if (can_do_iterative_format(filter, sorting, format)) {
> + int save_commit_buffer_orig;
> + struct ref_filter_and_format_cbdata ref_cbdata = {
> + .filter = filter,
> + .format = format,
> + };
> +
> + save_commit_buffer_orig = save_commit_buffer;
> + save_commit_buffer = 0;
> +
> + do_filter_refs(filter, type, filter_and_format_one, &ref_cbdata);
> +
> + save_commit_buffer = save_commit_buffer_orig;
> + } else {
> + struct ref_array array = { 0 };
> + filter_refs(&array, filter, type);
> + filter_ahead_behind(the_repository, format, &array);
> + ref_array_sort(sorting, &array);
> + print_formatted_ref_array(&array, format);
> + ref_array_clear(&array);
> + }
> }
>
> static int compare_detached_head(struct ref_array_item *a, struct ref_array_item *b)
> --
> gitgitgadget
>
>
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
next prev parent reply other threads:[~2023-11-07 10:50 UTC|newest]
Thread overview: 49+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-11-07 1:25 [PATCH 0/9] for-each-ref optimizations & usability improvements Victoria Dye via GitGitGadget
2023-11-07 1:25 ` [PATCH 1/9] ref-filter.c: really don't sort when using --no-sort Victoria Dye via GitGitGadget
2023-11-07 10:49 ` Patrick Steinhardt
2023-11-07 18:13 ` Victoria Dye
2023-11-07 1:25 ` [PATCH 2/9] for-each-ref: clarify interaction of --omit-empty & --count Victoria Dye via GitGitGadget
2023-11-07 19:23 ` Øystein Walle
2023-11-07 19:30 ` Victoria Dye
2023-11-08 7:53 ` Øystein Walle
2023-11-08 10:00 ` Kristoffer Haugsbakk
2023-11-07 1:25 ` [PATCH 3/9] ref-filter.h: add max_count and omit_empty to ref_format Victoria Dye via GitGitGadget
2023-11-07 1:25 ` [PATCH 4/9] ref-filter.h: move contains caches into filter Victoria Dye via GitGitGadget
2023-11-07 10:49 ` Patrick Steinhardt
2023-11-07 1:25 ` [PATCH 5/9] ref-filter.h: add functions for filter/format & format-only Victoria Dye via GitGitGadget
2023-11-07 1:25 ` [PATCH 6/9] ref-filter.c: refactor to create common helper functions Victoria Dye via GitGitGadget
2023-11-07 10:49 ` Patrick Steinhardt
2023-11-07 18:41 ` Victoria Dye
2023-11-07 1:25 ` [PATCH 7/9] ref-filter.c: filter & format refs in the same callback Victoria Dye via GitGitGadget
2023-11-07 10:49 ` Patrick Steinhardt [this message]
2023-11-07 19:45 ` Victoria Dye
2023-11-07 1:26 ` [PATCH 8/9] for-each-ref: add option to fully dereference tags Victoria Dye via GitGitGadget
2023-11-07 10:50 ` Patrick Steinhardt
2023-11-08 1:13 ` Victoria Dye
2023-11-08 3:14 ` Junio C Hamano
2023-11-08 7:19 ` Patrick Steinhardt
2023-11-08 18:02 ` Victoria Dye
2023-11-09 1:22 ` Junio C Hamano
2023-11-09 1:23 ` Junio C Hamano
2023-11-09 1:32 ` Junio C Hamano
2023-11-07 1:26 ` [PATCH 9/9] t/perf: add perf tests for for-each-ref Victoria Dye via GitGitGadget
2023-11-07 2:36 ` [PATCH 0/9] for-each-ref optimizations & usability improvements Junio C Hamano
2023-11-07 2:48 ` Victoria Dye
2023-11-07 3:04 ` Junio C Hamano
2023-11-07 10:49 ` Patrick Steinhardt
2023-11-08 1:31 ` Victoria Dye
2023-11-14 19:53 ` [PATCH v2 00/10] " Victoria Dye via GitGitGadget
2023-11-14 19:53 ` [PATCH v2 01/10] ref-filter.c: really don't sort when using --no-sort Victoria Dye via GitGitGadget
2023-11-16 5:29 ` Junio C Hamano
2023-11-14 19:53 ` [PATCH v2 02/10] ref-filter.h: add max_count and omit_empty to ref_format Victoria Dye via GitGitGadget
2023-11-16 12:06 ` Øystein Walle
2023-11-14 19:53 ` [PATCH v2 03/10] ref-filter.h: move contains caches into filter Victoria Dye via GitGitGadget
2023-11-14 19:53 ` [PATCH v2 04/10] ref-filter.h: add functions for filter/format & format-only Victoria Dye via GitGitGadget
2023-11-16 5:39 ` Junio C Hamano
2023-11-14 19:53 ` [PATCH v2 05/10] ref-filter.c: rename 'ref_filter_handler()' to 'filter_one()' Victoria Dye via GitGitGadget
2023-11-14 19:53 ` [PATCH v2 06/10] ref-filter.c: refactor to create common helper functions Victoria Dye via GitGitGadget
2023-11-14 19:53 ` [PATCH v2 07/10] ref-filter.c: filter & format refs in the same callback Victoria Dye via GitGitGadget
2023-11-14 19:53 ` [PATCH v2 08/10] for-each-ref: clean up documentation of --format Victoria Dye via GitGitGadget
2023-11-14 19:53 ` [PATCH v2 09/10] ref-filter.c: use peeled tag for '*' format fields Victoria Dye via GitGitGadget
2023-11-16 5:48 ` Junio C Hamano
2023-11-14 19:53 ` [PATCH v2 10/10] t/perf: add perf tests for for-each-ref Victoria Dye via GitGitGadget
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZUoWVPSE1GcJdHFE@tanuki \
--to=ps@pks.im \
--cc=git@vger.kernel.org \
--cc=gitgitgadget@gmail.com \
--cc=vdye@github.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.