From: Junio C Hamano <gitster@pobox.com>
To: "Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, Eric Sunshine <sunshine@sunshineco.com>,
Patrick Steinhardt <ps@pks.im>, Victoria Dye <vdye@github.com>
Subject: Re: [PATCH v2 11/17] mktree: overwrite duplicate entries
Date: Thu, 20 Jun 2024 15:05:25 -0700 [thread overview]
Message-ID: <xmqqh6dns21m.fsf@gitster.g> (raw)
In-Reply-To: <fb555658057f834d94f232f1d8b380a6304a3671.1718834285.git.gitgitgadget@gmail.com> (Victoria Dye via GitGitGadget's message of "Wed, 19 Jun 2024 21:57:59 +0000")
"Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes:
> From: Victoria Dye <vdye@github.com>
>
> If multiple tree entries with the same name are provided as input to
> 'mktree', only write the last one to the tree. Entries are considered
> duplicates if they have identical names (*not* considering mode); if a blob
> and a tree with the same name are provided, only the last one will be
> written to the tree. A tree with duplicate entries is invalid (per 'git
> fsck'), so that condition should be avoided wherever possible.
The "should be avoided" in the last sentence can be satisified
either by the callers being extra careful, or the callee ignoring
earlier entries with the same path. I do not have a strong
objection against allowing looser callers, but if that is what is
going on, perhaps
By teaching "mktree" to ignore the earlier entries for the
same path in the input, the callers can be more casual about
sending duplicate entries in order to avoid creating an
invalid tree objects.
is a more honest justification for this setp?
> diff --git a/Documentation/git-mktree.txt b/Documentation/git-mktree.txt
> index 5f3a6dfe38e..cf1fd82f754 100644
> --- a/Documentation/git-mktree.txt
> +++ b/Documentation/git-mktree.txt
> @@ -54,7 +54,8 @@ cannot be represented in a tree object. The command will fail without
> writing the tree if a higher order stage is specified for any entry.
>
> The order of the tree entries is normalized by `mktree` so pre-sorting the
> -input by path is not required.
> +input by path is not required. Multiple entries provided with the same path
> +are deduplicated, with only the last one specified added to the tree.
OK.
> struct tree_entry {
> + /* Internal */
> + size_t order;
> +
> unsigned mode;
> struct object_id oid;
> int len;
> @@ -74,15 +77,49 @@ static void append_to_tree(unsigned mode, struct object_id *oid, const char *pat
> ent->len = len;
> oidcpy(&ent->oid, oid);
>
> + ent->order = arr->nr;
> tree_entry_array_push(arr, ent);
> }
>
> -static int ent_compare(const void *a_, const void *b_)
> +static int ent_compare(const void *a_, const void *b_, void *ctx)
> {
> + int cmp;
> struct tree_entry *a = *(struct tree_entry **)a_;
> struct tree_entry *b = *(struct tree_entry **)b_;
> - return base_name_compare(a->name, a->len, a->mode,
> - b->name, b->len, b->mode);
> + int ignore_mode = *((int *)ctx);
> +
> + if (ignore_mode)
> + cmp = name_compare(a->name, a->len, b->name, b->len);
> + else
> + cmp = base_name_compare(a->name, a->len, a->mode,
> + b->name, b->len, b->mode);
> + return cmp ? cmp : b->order - a->order;
> +}
Having two similar functions that could go out of sync has bothered
me somewhat. We could instead do
int a_mode = ignore_mode ? 0 : a->mode;
int b_mode = ignore_mode ? 0 : b->mode;
cmp = base_name_compare(a->name, a->len, a_mode,
b->name, b->len, b_mode);
but that should be done by rewriting name_compare() in terms of
base_name_compare(), which will help more callers, not just this
one.
> +static void sort_and_dedup_tree_entry_array(struct tree_entry_array *arr)
> +{
> + size_t count = arr->nr;
> + struct tree_entry *prev = NULL;
> +
> + int ignore_mode = 1;
> + QSORT_S(arr->entries, arr->nr, ent_compare, &ignore_mode);
Swap the decl for ignore_mode and the blank line above it?
If the callback context only needs a single bit, ent_compare() could
just use the NULL-ness of ctx as "do we want to ignore mode?" bit.
> + arr->nr = 0;
> + for (size_t i = 0; i < count; i++) {
> + struct tree_entry *curr = arr->entries[i];
> + if (prev &&
> + !name_compare(prev->name, prev->len,
> + curr->name, curr->len)) {
> + FREE_AND_NULL(curr);
> + } else {
> + arr->entries[arr->nr++] = curr;
> + prev = curr;
> + }
> + }
As long as this is done for a single tree (i.e. the paths do not
have any slashes in them), this "sort them all and keep the last
one" is a good strategy.
> + /* Sort again to order the entries for tree insertion */
> + ignore_mode = 0;
> + QSORT_S(arr->entries, arr->nr, ent_compare, &ignore_mode);
OK. We from time to time find need to do this, and I always regret
that we didn't design the sort order of paths in a tree (and in the
index) like so [*]. But that is almost 20 years too late ;-).
Looking good.
[Footnote]
* A directory entry $T should have sorted after a non-directory
entry $T but before any non-directory entry whose path has $T
as its prefix (e.g. even a blob whose path is $T + "\001" should
sort after a tree $T). That way we didn't have to worry about a
blob at ($T + '-') sorting before a tree at $T but a blob at ($T
+ '0') sorting after that tree.
next prev parent reply other threads:[~2024-06-20 22:05 UTC|newest]
Thread overview: 65+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-06-11 18:24 [PATCH 00/16] mktree: support more flexible usage Victoria Dye via GitGitGadget
2024-06-11 18:24 ` [PATCH 01/16] mktree: use OPT_BOOL Victoria Dye via GitGitGadget
2024-06-11 18:24 ` [PATCH 02/16] mktree: rename treeent to tree_entry Victoria Dye via GitGitGadget
2024-06-12 9:40 ` Patrick Steinhardt
2024-06-11 18:24 ` [PATCH 03/16] mktree: use non-static tree_entry array Victoria Dye via GitGitGadget
2024-06-11 18:45 ` Eric Sunshine
2024-06-12 9:40 ` Patrick Steinhardt
2024-06-11 18:24 ` [PATCH 04/16] update-index: generalize 'read_index_info' Victoria Dye via GitGitGadget
2024-06-11 22:45 ` Junio C Hamano
2024-06-11 18:24 ` [PATCH 05/16] index-info.c: identify empty input lines in read_index_info Victoria Dye via GitGitGadget
2024-06-11 22:52 ` Junio C Hamano
2024-06-18 17:33 ` Victoria Dye
2024-06-11 18:24 ` [PATCH 06/16] index-info.c: parse object type in provided " Victoria Dye via GitGitGadget
2024-06-12 1:54 ` Junio C Hamano
2024-06-11 18:24 ` [PATCH 07/16] mktree: use read_index_info to read stdin lines Victoria Dye via GitGitGadget
2024-06-12 2:11 ` Junio C Hamano
2024-06-12 9:40 ` Patrick Steinhardt
2024-06-12 18:35 ` Junio C Hamano
2024-06-11 18:24 ` [PATCH 08/16] mktree: add a --literally option Victoria Dye via GitGitGadget
2024-06-12 2:18 ` Junio C Hamano
2024-06-11 18:24 ` [PATCH 09/16] mktree: validate paths more carefully Victoria Dye via GitGitGadget
2024-06-12 2:26 ` Junio C Hamano
2024-06-12 19:01 ` Victoria Dye
2024-06-12 19:45 ` Junio C Hamano
2024-06-11 18:24 ` [PATCH 10/16] mktree: overwrite duplicate entries Victoria Dye via GitGitGadget
2024-06-12 9:40 ` Patrick Steinhardt
2024-06-12 18:48 ` Victoria Dye
2024-06-11 18:24 ` [PATCH 11/16] mktree: create tree using an in-core index Victoria Dye via GitGitGadget
2024-06-12 9:40 ` Patrick Steinhardt
2024-06-11 18:24 ` [PATCH 12/16] mktree: use iterator struct to add tree entries to index Victoria Dye via GitGitGadget
2024-06-12 9:40 ` Patrick Steinhardt
2024-06-13 18:38 ` Victoria Dye
2024-06-11 18:24 ` [PATCH 13/16] mktree: add directory-file conflict hashmap Victoria Dye via GitGitGadget
2024-06-11 18:24 ` [PATCH 14/16] mktree: optionally add to an existing tree Victoria Dye via GitGitGadget
2024-06-12 9:40 ` Patrick Steinhardt
2024-06-12 19:50 ` Junio C Hamano
2024-06-17 19:23 ` Victoria Dye
2024-06-11 18:24 ` [PATCH 15/16] mktree: allow deeper paths in input Victoria Dye via GitGitGadget
2024-06-11 18:24 ` [PATCH 16/16] mktree: remove entries when mode is 0 Victoria Dye via GitGitGadget
2024-06-19 21:57 ` [PATCH v2 00/17] mktree: support more flexible usage Victoria Dye via GitGitGadget
2024-06-19 21:57 ` [PATCH v2 01/17] mktree: use OPT_BOOL Victoria Dye via GitGitGadget
2024-06-19 21:57 ` [PATCH v2 02/17] mktree: rename treeent to tree_entry Victoria Dye via GitGitGadget
2024-06-19 21:57 ` [PATCH v2 03/17] mktree: use non-static tree_entry array Victoria Dye via GitGitGadget
2024-06-19 21:57 ` [PATCH v2 04/17] update-index: generalize 'read_index_info' Victoria Dye via GitGitGadget
2024-06-19 21:57 ` [PATCH v2 05/17] index-info.c: return unrecognized lines to caller Victoria Dye via GitGitGadget
2024-06-19 21:57 ` [PATCH v2 06/17] index-info.c: parse object type in provided in read_index_info Victoria Dye via GitGitGadget
2024-06-19 21:57 ` [PATCH v2 07/17] mktree: use read_index_info to read stdin lines Victoria Dye via GitGitGadget
2024-06-20 20:18 ` Junio C Hamano
2024-06-19 21:57 ` [PATCH v2 08/17] mktree.c: do not fail on mismatched submodule type Victoria Dye via GitGitGadget
2024-06-19 21:57 ` [PATCH v2 09/17] mktree: add a --literally option Victoria Dye via GitGitGadget
2024-06-19 21:57 ` [PATCH v2 10/17] mktree: validate paths more carefully Victoria Dye via GitGitGadget
2024-06-19 21:57 ` [PATCH v2 11/17] mktree: overwrite duplicate entries Victoria Dye via GitGitGadget
2024-06-20 22:05 ` Junio C Hamano [this message]
2024-06-19 21:58 ` [PATCH v2 12/17] mktree: create tree using an in-core index Victoria Dye via GitGitGadget
2024-06-20 22:26 ` Junio C Hamano
2024-06-19 21:58 ` [PATCH v2 13/17] mktree: use iterator struct to add tree entries to index Victoria Dye via GitGitGadget
2024-06-26 21:10 ` Junio C Hamano
2024-06-19 21:58 ` [PATCH v2 14/17] mktree: add directory-file conflict hashmap Victoria Dye via GitGitGadget
2024-06-19 21:58 ` [PATCH v2 15/17] mktree: optionally add to an existing tree Victoria Dye via GitGitGadget
2024-06-26 21:23 ` Junio C Hamano
2024-06-19 21:58 ` [PATCH v2 16/17] mktree: allow deeper paths in input Victoria Dye via GitGitGadget
2024-06-27 19:29 ` Junio C Hamano
2024-06-19 21:58 ` [PATCH v2 17/17] mktree: remove entries when mode is 0 Victoria Dye via GitGitGadget
2024-06-25 23:26 ` [PATCH v2 00/17] mktree: support more flexible usage Junio C Hamano
2024-07-10 21:40 ` Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=xmqqh6dns21m.fsf@gitster.g \
--to=gitster@pobox.com \
--cc=git@vger.kernel.org \
--cc=gitgitgadget@gmail.com \
--cc=ps@pks.im \
--cc=sunshine@sunshineco.com \
--cc=vdye@github.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).