From: Junio C Hamano <gitster@pobox.com>
To: "Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, Eric Sunshine <sunshine@sunshineco.com>,
Patrick Steinhardt <ps@pks.im>, Victoria Dye <vdye@github.com>
Subject: Re: [PATCH v2 12/17] mktree: create tree using an in-core index
Date: Thu, 20 Jun 2024 15:26:09 -0700 [thread overview]
Message-ID: <xmqqtthnqmim.fsf@gitster.g> (raw)
In-Reply-To: <2333775ba5bd71766a6aece87e39a6d189aeaead.1718834285.git.gitgitgadget@gmail.com> (Victoria Dye via GitGitGadget's message of "Wed, 19 Jun 2024 21:58:00 +0000")
"Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes:
> @@ -60,17 +66,25 @@ static void append_to_tree(unsigned mode, struct object_id *oid, const char *pat
> if (literally) {
> FLEX_ALLOC_MEM(ent, name, path, len);
> } else {
> + size_t len_to_copy = len;
> +
> /* Normalize and validate entry path */
> if (S_ISDIR(mode)) {
> - while(len > 0 && is_dir_sep(path[len - 1]))
> - len--;
> + while(len_to_copy > 0 && is_dir_sep(path[len_to_copy - 1]))
Let's fix the style issue while at it, as we are doing other changes
in this step anyway. "while(" -> "while (".
> + len_to_copy--;
> + len = len_to_copy + 1; /* add space for trailing slash */
Do we need to do st_add() here? Perhaps not, but I just noticed the
careful use of st_add3() below, so...
> + ent = xcalloc(1, st_add3(sizeof(struct tree_entry), len, 1));
> + memcpy(ent->name, path, len_to_copy);
>
> if (!verify_path(ent->name, mode))
> die(_("invalid path '%s'"), path);
> if (strchr(ent->name, '/'))
> die("path %s contains slash", path);
> +
> + /* Add trailing slash to dir */
> + if (S_ISDIR(mode))
> + ent->name[len - 1] = '/';
OK.
> @@ -88,11 +102,14 @@ static int ent_compare(const void *a_, const void *b_, void *ctx)
> struct tree_entry *b = *(struct tree_entry **)b_;
> int ignore_mode = *((int *)ctx);
>
> - if (ignore_mode)
> - cmp = name_compare(a->name, a->len, b->name, b->len);
> - else
> - cmp = base_name_compare(a->name, a->len, a->mode,
> - b->name, b->len, b->mode);
> + size_t a_len = a->len, b_len = b->len;
> +
> + if (ignore_mode) {
> + a_len = df_path_len(a_len, a->mode);
> + b_len = df_path_len(b_len, b->mode);
> + }
> +
> + cmp = name_compare(a->name, a_len, b->name, b_len);
> return cmp ? cmp : b->order - a->order;
> }
OK, now the "mode" is sort of "encoded" already in the "name" by the
slash at the end, the way "ignore-mode" works needs to be redesigned.
If we are ignoring mode, we are dropping the trailing '/' and
otherwise we just feed the name with possible trailing '/', and the
same name_compare() can be used. OK.
> @@ -108,8 +125,8 @@ static void sort_and_dedup_tree_entry_array(struct tree_entry_array *arr)
> for (size_t i = 0; i < count; i++) {
> struct tree_entry *curr = arr->entries[i];
> if (prev &&
> - !name_compare(prev->name, prev->len,
> - curr->name, curr->len)) {
> + !name_compare(prev->name, df_path_len(prev->len, prev->mode),
> + curr->name, df_path_len(curr->len, curr->mode))) {
> FREE_AND_NULL(curr);
And here is the matching adjustment for the dedup comparison, which
makes sense.
> @@ -122,24 +139,43 @@ static void sort_and_dedup_tree_entry_array(struct tree_entry_array *arr)
> QSORT_S(arr->entries, arr->nr, ent_compare, &ignore_mode);
> }
>
> +static int add_tree_entry_to_index(struct index_state *istate,
> + struct tree_entry *ent)
> +{
> + struct cache_entry *ce;
> + struct strbuf ce_name = STRBUF_INIT;
> + strbuf_add(&ce_name, ent->name, ent->len);
> +
Perhaps swap the first statement (which is strbuf_add()) and the
blank line that ought to separate the decls and the first statement?
> + ce = make_cache_entry(istate, ent->mode, &ent->oid, ent->name, 0, 0);
> + if (!ce)
> + return error(_("make_cache_entry failed for path '%s'"), ent->name);
> +
> + add_index_entry(istate, ce, ADD_CACHE_JUST_APPEND);
> + strbuf_release(&ce_name);
> + return 0;
> +}
This is only to append; presumably the caller drives this function
out of a sorted list.
> static void write_tree(struct tree_entry_array *arr, struct object_id *oid)
> {
> + struct index_state istate = INDEX_STATE_INIT(the_repository);
> + istate.sparse_index = 1;
>
> sort_and_dedup_tree_entry_array(arr);
>
> + /* Construct an in-memory index from the provided entries */
> for (size_t i = 0; i < arr->nr; i++) {
> struct tree_entry *ent = arr->entries[i];
> +
> + if (add_tree_entry_to_index(&istate, ent))
> + die(_("failed to add tree entry '%s'"), ent->name);
> }
> + /* Write out new tree */
> + if (cache_tree_update(&istate, WRITE_TREE_SILENT | WRITE_TREE_MISSING_OK))
> + die(_("failed to write tree"));
Hmph. Are we doing any run-time verification of what we produce
(e.g., if sort_and_dedup_tree_entry_array() fails to dedup or sort
correctly due to a bug or two, would cache_tree_update() notice that
the in-core index array is fishy)? I am not suggesting to add an
unconditional "we appended to the index, so we should sort the
entries in it" step before cache_tree_update() call. It is the
opposite---if we have extra checks in cache_tree_udpate() to slow us
down and if we are confident that the loop that added tree entries
to the index is correct, if we can bypass such checks.
> + oidcpy(oid, &istate.cache_tree->oid);
> +
> + release_index(&istate);
> }
This is the gem of the whole series. Clever.
What is so satisfying is that it takes not that much of code to
replace the "here is a flat buffer of what the contents of a single
tree object ought to look like" with "let's build in-core index and
write it out just like write-tree would". Nice.
next prev parent reply other threads:[~2024-06-20 22:26 UTC|newest]
Thread overview: 65+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-06-11 18:24 [PATCH 00/16] mktree: support more flexible usage Victoria Dye via GitGitGadget
2024-06-11 18:24 ` [PATCH 01/16] mktree: use OPT_BOOL Victoria Dye via GitGitGadget
2024-06-11 18:24 ` [PATCH 02/16] mktree: rename treeent to tree_entry Victoria Dye via GitGitGadget
2024-06-12 9:40 ` Patrick Steinhardt
2024-06-11 18:24 ` [PATCH 03/16] mktree: use non-static tree_entry array Victoria Dye via GitGitGadget
2024-06-11 18:45 ` Eric Sunshine
2024-06-12 9:40 ` Patrick Steinhardt
2024-06-11 18:24 ` [PATCH 04/16] update-index: generalize 'read_index_info' Victoria Dye via GitGitGadget
2024-06-11 22:45 ` Junio C Hamano
2024-06-11 18:24 ` [PATCH 05/16] index-info.c: identify empty input lines in read_index_info Victoria Dye via GitGitGadget
2024-06-11 22:52 ` Junio C Hamano
2024-06-18 17:33 ` Victoria Dye
2024-06-11 18:24 ` [PATCH 06/16] index-info.c: parse object type in provided " Victoria Dye via GitGitGadget
2024-06-12 1:54 ` Junio C Hamano
2024-06-11 18:24 ` [PATCH 07/16] mktree: use read_index_info to read stdin lines Victoria Dye via GitGitGadget
2024-06-12 2:11 ` Junio C Hamano
2024-06-12 9:40 ` Patrick Steinhardt
2024-06-12 18:35 ` Junio C Hamano
2024-06-11 18:24 ` [PATCH 08/16] mktree: add a --literally option Victoria Dye via GitGitGadget
2024-06-12 2:18 ` Junio C Hamano
2024-06-11 18:24 ` [PATCH 09/16] mktree: validate paths more carefully Victoria Dye via GitGitGadget
2024-06-12 2:26 ` Junio C Hamano
2024-06-12 19:01 ` Victoria Dye
2024-06-12 19:45 ` Junio C Hamano
2024-06-11 18:24 ` [PATCH 10/16] mktree: overwrite duplicate entries Victoria Dye via GitGitGadget
2024-06-12 9:40 ` Patrick Steinhardt
2024-06-12 18:48 ` Victoria Dye
2024-06-11 18:24 ` [PATCH 11/16] mktree: create tree using an in-core index Victoria Dye via GitGitGadget
2024-06-12 9:40 ` Patrick Steinhardt
2024-06-11 18:24 ` [PATCH 12/16] mktree: use iterator struct to add tree entries to index Victoria Dye via GitGitGadget
2024-06-12 9:40 ` Patrick Steinhardt
2024-06-13 18:38 ` Victoria Dye
2024-06-11 18:24 ` [PATCH 13/16] mktree: add directory-file conflict hashmap Victoria Dye via GitGitGadget
2024-06-11 18:24 ` [PATCH 14/16] mktree: optionally add to an existing tree Victoria Dye via GitGitGadget
2024-06-12 9:40 ` Patrick Steinhardt
2024-06-12 19:50 ` Junio C Hamano
2024-06-17 19:23 ` Victoria Dye
2024-06-11 18:24 ` [PATCH 15/16] mktree: allow deeper paths in input Victoria Dye via GitGitGadget
2024-06-11 18:24 ` [PATCH 16/16] mktree: remove entries when mode is 0 Victoria Dye via GitGitGadget
2024-06-19 21:57 ` [PATCH v2 00/17] mktree: support more flexible usage Victoria Dye via GitGitGadget
2024-06-19 21:57 ` [PATCH v2 01/17] mktree: use OPT_BOOL Victoria Dye via GitGitGadget
2024-06-19 21:57 ` [PATCH v2 02/17] mktree: rename treeent to tree_entry Victoria Dye via GitGitGadget
2024-06-19 21:57 ` [PATCH v2 03/17] mktree: use non-static tree_entry array Victoria Dye via GitGitGadget
2024-06-19 21:57 ` [PATCH v2 04/17] update-index: generalize 'read_index_info' Victoria Dye via GitGitGadget
2024-06-19 21:57 ` [PATCH v2 05/17] index-info.c: return unrecognized lines to caller Victoria Dye via GitGitGadget
2024-06-19 21:57 ` [PATCH v2 06/17] index-info.c: parse object type in provided in read_index_info Victoria Dye via GitGitGadget
2024-06-19 21:57 ` [PATCH v2 07/17] mktree: use read_index_info to read stdin lines Victoria Dye via GitGitGadget
2024-06-20 20:18 ` Junio C Hamano
2024-06-19 21:57 ` [PATCH v2 08/17] mktree.c: do not fail on mismatched submodule type Victoria Dye via GitGitGadget
2024-06-19 21:57 ` [PATCH v2 09/17] mktree: add a --literally option Victoria Dye via GitGitGadget
2024-06-19 21:57 ` [PATCH v2 10/17] mktree: validate paths more carefully Victoria Dye via GitGitGadget
2024-06-19 21:57 ` [PATCH v2 11/17] mktree: overwrite duplicate entries Victoria Dye via GitGitGadget
2024-06-20 22:05 ` Junio C Hamano
2024-06-19 21:58 ` [PATCH v2 12/17] mktree: create tree using an in-core index Victoria Dye via GitGitGadget
2024-06-20 22:26 ` Junio C Hamano [this message]
2024-06-19 21:58 ` [PATCH v2 13/17] mktree: use iterator struct to add tree entries to index Victoria Dye via GitGitGadget
2024-06-26 21:10 ` Junio C Hamano
2024-06-19 21:58 ` [PATCH v2 14/17] mktree: add directory-file conflict hashmap Victoria Dye via GitGitGadget
2024-06-19 21:58 ` [PATCH v2 15/17] mktree: optionally add to an existing tree Victoria Dye via GitGitGadget
2024-06-26 21:23 ` Junio C Hamano
2024-06-19 21:58 ` [PATCH v2 16/17] mktree: allow deeper paths in input Victoria Dye via GitGitGadget
2024-06-27 19:29 ` Junio C Hamano
2024-06-19 21:58 ` [PATCH v2 17/17] mktree: remove entries when mode is 0 Victoria Dye via GitGitGadget
2024-06-25 23:26 ` [PATCH v2 00/17] mktree: support more flexible usage Junio C Hamano
2024-07-10 21:40 ` Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=xmqqtthnqmim.fsf@gitster.g \
--to=gitster@pobox.com \
--cc=git@vger.kernel.org \
--cc=gitgitgadget@gmail.com \
--cc=ps@pks.im \
--cc=sunshine@sunshineco.com \
--cc=vdye@github.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.