* [PATCH 00/16] mktree: support more flexible usage
From: Victoria Dye via GitGitGadget @ 2024-06-11 18:24 UTC (permalink / raw)
To: git; +Cc: Victoria Dye
The goal of this series is to make 'git mktree' a much more flexible and
powerful tool for constructing arbitrary trees in memory without the use of
an index or worktree. The main additions are:
* Using an optional "base tree" to add or replace entries in an existing
tree rather than creating a new one from scratch.
* Building off of this, having entries with mode "0" indicate "remove
this entry, if it exists, from the tree"
* Handling tree entries inside of subtrees (e.g., folder1/my-file.txt)
It also introduces some quality-of-life updates:
* Using the same input parsing as 'update-index' to allow a wider variety
of tree entry formats.
* Adding deduplication of input entries & more thorough validation of
inputs (with an option to disable both - plus input sorting - if desired
with '--literally').
The implementation change underpinning the new features is completely
revamping how the tree is constructed in memory. Instead of writing a single
tree object into a strbuf and hashing it into the object database, we
construct an in-core sparse index and write out the root tree, as well as
any new subtrees, using the cache tree infrastructure.
The series is organized as follows:
* Commits 1-3 contain miscellaneous small renames/refactors to make the
code more readable & prepare for larger refactoring later.
* Commits 4-7 generalize the input parsing performed by 'read_index_info()'
in 'update-index' and update 'mktree' to use it.
* Commit 8 adds the '--literally' option to 'mktree'. Practically, this
option allows tests that currently use 'mktree' to generate corrupt trees
to continue functioning after we strengthen input validations.
* Commits 9 & 10 add input path validation & entry deduplication,
respectively.
* Commit 11 replaces the strbuf-to-object tree creation with construction
of an in-core index & writing out the cache tree.
* Commits 12-14 add the ability to add tree entries to an existing "base"
tree. Takes 3 commits to do it because it requires a bit of finesse
around directory/file deduplication and iterating over a tree with
'read_tree()' with a parallel iteration over the input tree entries.
* Commit 15 allows for deeper paths in the input.
* Commit 16 adds handling for mode '0' as "removal" entries.
I also plan to add a '--strict' option that runs 'fsck' checks on the new
tree(s) before writing to the object database (similar to 'mktag
--strict'), but this series is pretty long as it is and that part can easily
be separated out into its own series.
Thanks!
- Victoria
Victoria Dye (16):
mktree: use OPT_BOOL
mktree: rename treeent to tree_entry
mktree: use non-static tree_entry array
update-index: generalize 'read_index_info'
index-info.c: identify empty input lines in read_index_info
index-info.c: parse object type in provided in read_index_info
mktree: use read_index_info to read stdin lines
mktree: add a --literally option
mktree: validate paths more carefully
mktree: overwrite duplicate entries
mktree: create tree using an in-core index
mktree: use iterator struct to add tree entries to index
mktree: add directory-file conflict hashmap
mktree: optionally add to an existing tree
mktree: allow deeper paths in input
mktree: remove entries when mode is 0
Documentation/git-mktree.txt | 42 +-
Makefile | 1 +
builtin/mktree.c | 595 +++++++++++++++++++++++------
builtin/update-index.c | 119 ++----
index-info.c | 104 +++++
index-info.h | 14 +
t/t1010-mktree.sh | 354 ++++++++++++++++-
t/t1014-read-tree-confusing.sh | 6 +-
t/t1450-fsck.sh | 4 +-
t/t1601-index-bogus.sh | 2 +-
t/t1700-split-index.sh | 6 +-
t/t2107-update-index-basic.sh | 32 ++
t/t7008-filter-branch-null-sha1.sh | 6 +-
t/t7417-submodule-path-url.sh | 2 +-
t/t7450-bad-git-dotfiles.sh | 8 +-
15 files changed, 1055 insertions(+), 240 deletions(-)
create mode 100644 index-info.c
create mode 100644 index-info.h
base-commit: 8d94cfb54504f2ec9edc7ca3eb5c29a3dd3675ae
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1746%2Fvdye%2Fvdye%2Fmktree-recursive-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1746/vdye/vdye/mktree-recursive-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1746
--
gitgitgadget
* [PATCH 01/16] mktree: use OPT_BOOL
From: Victoria Dye via GitGitGadget @ 2024-06-11 18:24 UTC (permalink / raw)
To: git; +Cc: Victoria Dye, Victoria Dye
From: Victoria Dye <vdye@github.com>
Replace 'OPT_SET_INT' with 'OPT_BOOL' for the options '--missing' and
'--batch'. The use of 'OPT_SET_INT' in these options is identical to
'OPT_BOOL', but 'OPT_BOOL' provides slightly simpler syntax.
Signed-off-by: Victoria Dye <vdye@github.com>
---
builtin/mktree.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/builtin/mktree.c b/builtin/mktree.c
index 9a22d4e2773..8b19d440747 100644
--- a/builtin/mktree.c
+++ b/builtin/mktree.c
@@ -162,8 +162,8 @@ int cmd_mktree(int ac, const char **av, const char *prefix)
const struct option option[] = {
OPT_BOOL('z', NULL, &nul_term_line, N_("input is NUL terminated")),
- OPT_SET_INT( 0 , "missing", &allow_missing, N_("allow missing objects"), 1),
- OPT_SET_INT( 0 , "batch", &is_batch_mode, N_("allow creation of more than one tree"), 1),
+ OPT_BOOL(0, "missing", &allow_missing, N_("allow missing objects")),
+ OPT_BOOL(0, "batch", &is_batch_mode, N_("allow creation of more than one tree")),
OPT_END()
};
--
gitgitgadget
* [PATCH 02/16] mktree: rename treeent to tree_entry
From: Victoria Dye via GitGitGadget @ 2024-06-11 18:24 UTC (permalink / raw)
To: git; +Cc: Victoria Dye, Victoria Dye
From: Victoria Dye <vdye@github.com>
Rename the type for better readability, clearly specifying "entry" (instead
of the "ent" abbreviation) and separating "tree" from "entry".
Signed-off-by: Victoria Dye <vdye@github.com>
---
builtin/mktree.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/builtin/mktree.c b/builtin/mktree.c
index 8b19d440747..c02feb06aff 100644
--- a/builtin/mktree.c
+++ b/builtin/mktree.c
@@ -12,7 +12,7 @@
#include "parse-options.h"
#include "object-store-ll.h"
-static struct treeent {
+static struct tree_entry {
unsigned mode;
struct object_id oid;
int len;
@@ -22,7 +22,7 @@ static int alloc, used;
static void append_to_tree(unsigned mode, struct object_id *oid, char *path)
{
- struct treeent *ent;
+ struct tree_entry *ent;
size_t len = strlen(path);
if (strchr(path, '/'))
die("path %s contains slash", path);
@@ -38,8 +38,8 @@ static void append_to_tree(unsigned mode, struct object_id *oid, char *path)
static int ent_compare(const void *a_, const void *b_)
{
- struct treeent *a = *(struct treeent **)a_;
- struct treeent *b = *(struct treeent **)b_;
+ struct tree_entry *a = *(struct tree_entry **)a_;
+ struct tree_entry *b = *(struct tree_entry **)b_;
return base_name_compare(a->name, a->len, a->mode,
b->name, b->len, b->mode);
}
@@ -56,7 +56,7 @@ static void write_tree(struct object_id *oid)
strbuf_init(&buf, size);
for (i = 0; i < used; i++) {
- struct treeent *ent = entries[i];
+ struct tree_entry *ent = entries[i];
strbuf_addf(&buf, "%o %s%c", ent->mode, ent->name, '\0');
strbuf_add(&buf, ent->oid.hash, the_hash_algo->rawsz);
}
--
gitgitgadget
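For readers following along: the 'ent_compare()' hunk above defers to 'base_name_compare()', which orders entries the way the tree object format expects, with directory names compared as if they ended in '/'. Below is a minimal standalone sketch of that ordering. The names 'sketch_is_dir' and 'sketch_base_name_compare' are hypothetical, the mode test is a stand-in for S_ISDIR(), and this is an illustration under those assumptions rather than the code in Git.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for S_ISDIR(): tree entries carry mode 040000. */
static int sketch_is_dir(unsigned int mode)
{
	return (mode & 0170000) == 0040000;
}

/*
 * Compare two NUL-terminated entry names. When one name is a prefix of
 * the other, a directory is treated as if its name ended in '/'.
 */
static int sketch_base_name_compare(const char *name1, size_t len1, unsigned int mode1,
				    const char *name2, size_t len2, unsigned int mode2)
{
	size_t len = len1 < len2 ? len1 : len2;
	int cmp = memcmp(name1, name2, len);
	unsigned char c1, c2;

	if (cmp)
		return cmp;

	/* One name is a prefix of the other: look at the next byte. */
	c1 = (unsigned char)name1[len];
	c2 = (unsigned char)name2[len];
	if (!c1 && sketch_is_dir(mode1))
		c1 = '/';
	if (!c2 && sketch_is_dir(mode2))
		c2 = '/';
	return (c1 < c2) ? -1 : (c1 > c2) ? 1 : 0;
}
```

The practical consequence is that a file "a" sorts before "a.txt", while a directory "a" sorts after it, because '/' compares greater than '.'.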
* [PATCH 03/16] mktree: use non-static tree_entry array
From: Victoria Dye via GitGitGadget @ 2024-06-11 18:24 UTC (permalink / raw)
To: git; +Cc: Victoria Dye, Victoria Dye
From: Victoria Dye <vdye@github.com>
Replace the static 'struct tree_entry **entries' with a non-static 'struct
tree_entry_array' instance. In later commits, we'll want to be able to
create additional 'struct tree_entry_array' instances utilizing common
functionality (create, push, clear, free). To avoid code duplication, create
the 'struct tree_entry_array' type and add functions that perform those
basic operations.
Signed-off-by: Victoria Dye <vdye@github.com>
---
builtin/mktree.c | 67 +++++++++++++++++++++++++++++++++---------------
1 file changed, 47 insertions(+), 20 deletions(-)
diff --git a/builtin/mktree.c b/builtin/mktree.c
index c02feb06aff..15bd908702a 100644
--- a/builtin/mktree.c
+++ b/builtin/mktree.c
@@ -12,15 +12,39 @@
#include "parse-options.h"
#include "object-store-ll.h"
-static struct tree_entry {
+struct tree_entry {
unsigned mode;
struct object_id oid;
int len;
char name[FLEX_ARRAY];
-} **entries;
-static int alloc, used;
+};
+
+struct tree_entry_array {
+ size_t nr, alloc;
+ struct tree_entry **entries;
+};
-static void append_to_tree(unsigned mode, struct object_id *oid, char *path)
+static void tree_entry_array_push(struct tree_entry_array *arr, struct tree_entry *ent)
+{
+ ALLOC_GROW(arr->entries, arr->nr + 1, arr->alloc);
+ arr->entries[arr->nr++] = ent;
+}
+
+static void clear_tree_entry_array(struct tree_entry_array *arr)
+{
+ for (size_t i = 0; i < arr->nr; i++)
+ FREE_AND_NULL(arr->entries[i]);
+ arr->nr = 0;
+}
+
+static void release_tree_entry_array(struct tree_entry_array *arr)
+{
+ FREE_AND_NULL(arr->entries);
+ arr->nr = arr->alloc = 0;
+}
+
+static void append_to_tree(unsigned mode, struct object_id *oid, const char *path,
+ struct tree_entry_array *arr)
{
struct tree_entry *ent;
size_t len = strlen(path);
@@ -32,8 +56,8 @@ static void append_to_tree(unsigned mode, struct object_id *oid, char *path)
ent->len = len;
oidcpy(&ent->oid, oid);
- ALLOC_GROW(entries, used + 1, alloc);
- entries[used++] = ent;
+ /* Append the update */
+ tree_entry_array_push(arr, ent);
}
static int ent_compare(const void *a_, const void *b_)
@@ -44,19 +68,18 @@ static int ent_compare(const void *a_, const void *b_)
b->name, b->len, b->mode);
}
-static void write_tree(struct object_id *oid)
+static void write_tree(struct tree_entry_array *arr, struct object_id *oid)
{
struct strbuf buf;
- size_t size;
- int i;
+ size_t size = 0;
- QSORT(entries, used, ent_compare);
- for (size = i = 0; i < used; i++)
- size += 32 + entries[i]->len;
+ QSORT(arr->entries, arr->nr, ent_compare);
+ for (size_t i = 0; i < arr->nr; i++)
+ size += 32 + arr->entries[i]->len;
strbuf_init(&buf, size);
- for (i = 0; i < used; i++) {
- struct tree_entry *ent = entries[i];
+ for (size_t i = 0; i < arr->nr; i++) {
+ struct tree_entry *ent = arr->entries[i];
strbuf_addf(&buf, "%o %s%c", ent->mode, ent->name, '\0');
strbuf_add(&buf, ent->oid.hash, the_hash_algo->rawsz);
}
@@ -70,7 +93,8 @@ static const char *mktree_usage[] = {
NULL
};
-static void mktree_line(char *buf, int nul_term_line, int allow_missing)
+static void mktree_line(char *buf, int nul_term_line, int allow_missing,
+ struct tree_entry_array *arr)
{
char *ptr, *ntr;
const char *p;
@@ -146,7 +170,7 @@ static void mktree_line(char *buf, int nul_term_line, int allow_missing)
}
}
- append_to_tree(mode, &oid, path);
+ append_to_tree(mode, &oid, path, arr);
free(to_free);
}
@@ -158,6 +182,7 @@ int cmd_mktree(int ac, const char **av, const char *prefix)
int allow_missing = 0;
int is_batch_mode = 0;
int got_eof = 0;
+ struct tree_entry_array arr = { 0 };
strbuf_getline_fn getline_fn;
const struct option option[] = {
@@ -182,9 +207,9 @@ int cmd_mktree(int ac, const char **av, const char *prefix)
break;
die("input format error: (blank line only valid in batch mode)");
}
- mktree_line(sb.buf, nul_term_line, allow_missing);
+ mktree_line(sb.buf, nul_term_line, allow_missing, &arr);
}
- if (is_batch_mode && got_eof && used < 1) {
+ if (is_batch_mode && got_eof && arr.nr < 1) {
/*
* Execution gets here if the last tree entry is terminated with a
* new-line. The final new-line has been made optional to be
@@ -192,12 +217,14 @@ int cmd_mktree(int ac, const char **av, const char *prefix)
*/
; /* skip creating an empty tree */
} else {
- write_tree(&oid);
+ write_tree(&arr, &oid);
puts(oid_to_hex(&oid));
fflush(stdout);
}
- used=0; /* reset tree entry buffer for re-use in batch mode */
+ clear_tree_entry_array(&arr); /* reset tree entry buffer for re-use in batch mode */
}
+
+ release_tree_entry_array(&arr);
strbuf_release(&sb);
return 0;
}
--
gitgitgadget
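The push/clear/release split introduced by this patch can be tried outside of Git with a rough standalone sketch like the one below. It assumes plain realloc() doubling in place of Git's ALLOC_GROW and free() in place of FREE_AND_NULL; every 'sketch_' name is hypothetical and the entry struct is trimmed to the fields needed for the illustration.

```c
#include <stdlib.h>
#include <string.h>

struct sketch_entry {
	unsigned int mode;
	size_t len;
	char name[];	/* C99 flexible array member, in the role of FLEX_ARRAY */
};

struct sketch_entry_array {
	size_t nr, alloc;
	struct sketch_entry **entries;
};

/* Allocate one entry with its name stored inline. */
static struct sketch_entry *sketch_entry_new(unsigned int mode, const char *name)
{
	size_t len = strlen(name);
	struct sketch_entry *ent = malloc(sizeof(*ent) + len + 1);

	if (!ent)
		abort();
	ent->mode = mode;
	ent->len = len;
	memcpy(ent->name, name, len + 1);
	return ent;
}

/* Append an entry, growing the pointer array when it is full. */
static void sketch_array_push(struct sketch_entry_array *arr, struct sketch_entry *ent)
{
	if (arr->nr + 1 > arr->alloc) {
		size_t new_alloc = arr->alloc ? arr->alloc * 2 : 8;
		struct sketch_entry **grown =
			realloc(arr->entries, new_alloc * sizeof(*grown));

		if (!grown)
			abort();
		arr->entries = grown;
		arr->alloc = new_alloc;
	}
	arr->entries[arr->nr++] = ent;
}

/* Free the entries but keep the pointer array for re-use. */
static void sketch_array_clear(struct sketch_entry_array *arr)
{
	for (size_t i = 0; i < arr->nr; i++) {
		free(arr->entries[i]);
		arr->entries[i] = NULL;
	}
	arr->nr = 0;
}

/* Free the pointer array itself; entries must already be cleared. */
static void sketch_array_release(struct sketch_entry_array *arr)
{
	free(arr->entries);
	arr->entries = NULL;
	arr->nr = arr->alloc = 0;
}
```

As in the diff, "clear" keeps the allocation so a batch loop can refill the array cheaply, and "release" only frees the pointer array, so it relies on "clear" having run first.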
* [PATCH 04/16] update-index: generalize 'read_index_info'
From: Victoria Dye via GitGitGadget @ 2024-06-11 18:24 UTC (permalink / raw)
To: git; +Cc: Victoria Dye, Victoria Dye
From: Victoria Dye <vdye@github.com>
Move 'read_index_info()' into a new header 'index-info.h' and generalize the
function to call a provided callback for each parsed line. Update
'update-index.c' to use this generalized 'read_index_info()', adding the
callback 'apply_index_info()' to verify the parsed line and update the index
according to its contents.
The input parsing done by 'read_index_info()' is similar to, but more
flexible than, the parsing done in 'mktree' by 'mktree_line()' (handling not
only 'git ls-tree' output but also the output of 'git apply --index-info'
and 'git ls-files --stage'). To make 'mktree' more flexible, a later
patch will replace mktree's custom parsing with 'read_index_info()'.
Signed-off-by: Victoria Dye <vdye@github.com>
---
Makefile | 1 +
builtin/update-index.c | 116 ++++++++--------------------------
index-info.c | 91 ++++++++++++++++++++++++++
index-info.h | 11 ++++
t/t2107-update-index-basic.sh | 27 ++++++++
5 files changed, 155 insertions(+), 91 deletions(-)
create mode 100644 index-info.c
create mode 100644 index-info.h
diff --git a/Makefile b/Makefile
index 2f5f16847ae..db9604e59c3 100644
--- a/Makefile
+++ b/Makefile
@@ -1037,6 +1037,7 @@ LIB_OBJS += hex.o
LIB_OBJS += hex-ll.o
LIB_OBJS += hook.o
LIB_OBJS += ident.o
+LIB_OBJS += index-info.o
LIB_OBJS += json-writer.o
LIB_OBJS += kwset.o
LIB_OBJS += levenshtein.o
diff --git a/builtin/update-index.c b/builtin/update-index.c
index d343416ae26..77df380cb54 100644
--- a/builtin/update-index.c
+++ b/builtin/update-index.c
@@ -11,6 +11,7 @@
#include "gettext.h"
#include "hash.h"
#include "hex.h"
+#include "index-info.h"
#include "lockfile.h"
#include "quote.h"
#include "cache-tree.h"
@@ -509,100 +510,29 @@ static void update_one(const char *path)
report("add '%s'", path);
}
-static void read_index_info(int nul_term_line)
+static int apply_index_info(unsigned int mode, struct object_id *oid, int stage,
+ const char *path_name, void *cbdata UNUSED)
{
- const int hexsz = the_hash_algo->hexsz;
- struct strbuf buf = STRBUF_INIT;
- struct strbuf uq = STRBUF_INIT;
- strbuf_getline_fn getline_fn;
+ if (!verify_path(path_name, mode)) {
+ fprintf(stderr, "Ignoring path %s\n", path_name);
+ return 0;
+ }
- getline_fn = nul_term_line ? strbuf_getline_nul : strbuf_getline_lf;
- while (getline_fn(&buf, stdin) != EOF) {
- char *ptr, *tab;
- char *path_name;
- struct object_id oid;
- unsigned int mode;
- unsigned long ul;
- int stage;
-
- /* This reads lines formatted in one of three formats:
- *
- * (1) mode SP sha1 TAB path
- * The first format is what "git apply --index-info"
- * reports, and used to reconstruct a partial tree
- * that is used for phony merge base tree when falling
- * back on 3-way merge.
- *
- * (2) mode SP type SP sha1 TAB path
- * The second format is to stuff "git ls-tree" output
- * into the index file.
- *
- * (3) mode SP sha1 SP stage TAB path
- * This format is to put higher order stages into the
- * index file and matches "git ls-files --stage" output.
+ if (!mode) {
+ /* mode == 0 means there is no such path -- remove */
+ if (remove_file_from_index(the_repository->index, path_name))
+ die("git update-index: unable to remove %s", path_name);
+ }
+ else {
+ /* mode ' ' sha1 '\t' name
+ * ptr[-1] points at tab,
+ * ptr[-41] is at the beginning of sha1
*/
- errno = 0;
- ul = strtoul(buf.buf, &ptr, 8);
- if (ptr == buf.buf || *ptr != ' '
- || errno || (unsigned int) ul != ul)
- goto bad_line;
- mode = ul;
-
- tab = strchr(ptr, '\t');
- if (!tab || tab - ptr < hexsz + 1)
- goto bad_line;
-
- if (tab[-2] == ' ' && '0' <= tab[-1] && tab[-1] <= '3') {
- stage = tab[-1] - '0';
- ptr = tab + 1; /* point at the head of path */
- tab = tab - 2; /* point at tail of sha1 */
- }
- else {
- stage = 0;
- ptr = tab + 1; /* point at the head of path */
- }
-
- if (get_oid_hex(tab - hexsz, &oid) ||
- tab[-(hexsz + 1)] != ' ')
- goto bad_line;
-
- path_name = ptr;
- if (!nul_term_line && path_name[0] == '"') {
- strbuf_reset(&uq);
- if (unquote_c_style(&uq, path_name, NULL)) {
- die("git update-index: bad quoting of path name");
- }
- path_name = uq.buf;
- }
-
- if (!verify_path(path_name, mode)) {
- fprintf(stderr, "Ignoring path %s\n", path_name);
- continue;
- }
-
- if (!mode) {
- /* mode == 0 means there is no such path -- remove */
- if (remove_file_from_index(the_repository->index, path_name))
- die("git update-index: unable to remove %s",
- ptr);
- }
- else {
- /* mode ' ' sha1 '\t' name
- * ptr[-1] points at tab,
- * ptr[-41] is at the beginning of sha1
- */
- ptr[-(hexsz + 2)] = ptr[-1] = 0;
- if (add_cacheinfo(mode, &oid, path_name, stage))
- die("git update-index: unable to update %s",
- path_name);
- }
- continue;
-
- bad_line:
- die("malformed index info %s", buf.buf);
+ if (add_cacheinfo(mode, oid, path_name, stage))
+ die("git update-index: unable to update %s", path_name);
}
- strbuf_release(&buf);
- strbuf_release(&uq);
+
+ return 0;
}
static const char * const update_index_usage[] = {
@@ -849,6 +779,7 @@ static enum parse_opt_result stdin_cacheinfo_callback(
const char *arg, int unset)
{
int *nul_term_line = opt->value;
+ int ret;
BUG_ON_OPT_NEG(unset);
BUG_ON_OPT_ARG(arg);
@@ -856,7 +787,10 @@ static enum parse_opt_result stdin_cacheinfo_callback(
if (ctx->argc != 1)
return error("option '%s' must be the last argument", opt->long_name);
allow_add = allow_replace = allow_remove = 1;
- read_index_info(*nul_term_line);
+ ret = read_index_info(*nul_term_line, apply_index_info, NULL);
+ if (ret)
+ return -1;
+
return 0;
}
diff --git a/index-info.c b/index-info.c
new file mode 100644
index 00000000000..0b68e34c361
--- /dev/null
+++ b/index-info.c
@@ -0,0 +1,91 @@
+#include "git-compat-util.h"
+#include "index-info.h"
+#include "hash.h"
+#include "hex.h"
+#include "strbuf.h"
+#include "quote.h"
+
+int read_index_info(int nul_term_line, each_index_info_fn fn, void *cbdata)
+{
+ const int hexsz = the_hash_algo->hexsz;
+ struct strbuf buf = STRBUF_INIT;
+ struct strbuf uq = STRBUF_INIT;
+ strbuf_getline_fn getline_fn;
+ int ret = 0;
+
+ getline_fn = nul_term_line ? strbuf_getline_nul : strbuf_getline_lf;
+ while (getline_fn(&buf, stdin) != EOF) {
+ char *ptr, *tab;
+ char *path_name;
+ struct object_id oid;
+ unsigned int mode;
+ unsigned long ul;
+ int stage;
+
+ /* This reads lines formatted in one of three formats:
+ *
+ * (1) mode SP sha1 TAB path
+ * The first format is what "git apply --index-info"
+ * reports, and used to reconstruct a partial tree
+ * that is used for phony merge base tree when falling
+ * back on 3-way merge.
+ *
+ * (2) mode SP type SP sha1 TAB path
+ * The second format is to stuff "git ls-tree" output
+ * into the index file.
+ *
+ * (3) mode SP sha1 SP stage TAB path
+ * This format is to put higher order stages into the
+ * index file and matches "git ls-files --stage" output.
+ */
+ errno = 0;
+ ul = strtoul(buf.buf, &ptr, 8);
+ if (ptr == buf.buf || *ptr != ' '
+ || errno || (unsigned int) ul != ul)
+ goto bad_line;
+ mode = ul;
+
+ tab = strchr(ptr, '\t');
+ if (!tab || tab - ptr < hexsz + 1)
+ goto bad_line;
+
+ if (tab[-2] == ' ' && '0' <= tab[-1] && tab[-1] <= '3') {
+ stage = tab[-1] - '0';
+ ptr = tab + 1; /* point at the head of path */
+ tab = tab - 2; /* point at tail of sha1 */
+ } else {
+ stage = 0;
+ ptr = tab + 1; /* point at the head of path */
+ }
+
+ if (get_oid_hex(tab - hexsz, &oid) ||
+ tab[-(hexsz + 1)] != ' ')
+ goto bad_line;
+
+ path_name = ptr;
+ if (!nul_term_line && path_name[0] == '"') {
+ strbuf_reset(&uq);
+ if (unquote_c_style(&uq, path_name, NULL)) {
+ ret = error("bad quoting of path name");
+ break;
+ }
+ path_name = uq.buf;
+ }
+
+ ret = fn(mode, &oid, stage, path_name, cbdata);
+ if (ret) {
+ ret = -1;
+ break;
+ }
+
+ continue;
+
+ bad_line:
+ ret = error("malformed input line '%s'", buf.buf);
+ break;
+ }
+ strbuf_release(&buf);
+ strbuf_release(&uq);
+
+ return ret;
+}
diff --git a/index-info.h b/index-info.h
new file mode 100644
index 00000000000..d650498325a
--- /dev/null
+++ b/index-info.h
@@ -0,0 +1,11 @@
+#ifndef INDEX_INFO_H
+#define INDEX_INFO_H
+
+#include "hash.h"
+
+typedef int (*each_index_info_fn)(unsigned int, struct object_id *, int, const char *, void *);
+
+/* Iterate over parsed index info from stdin */
+int read_index_info(int nul_term_line, each_index_info_fn fn, void *cbdata);
+
+#endif /* INDEX_INFO_H */
diff --git a/t/t2107-update-index-basic.sh b/t/t2107-update-index-basic.sh
index cc72ead79f3..29696ade0d0 100755
--- a/t/t2107-update-index-basic.sh
+++ b/t/t2107-update-index-basic.sh
@@ -142,4 +142,31 @@ test_expect_success '--index-version' '
test_must_be_empty actual
'
+test_expect_success '--index-info fails on malformed input' '
+ # empty line
+ echo "" |
+ test_must_fail git update-index --index-info 2>err &&
+ grep "malformed input line" err &&
+
+ # bad whitespace
+ printf "100644 $EMPTY_BLOB A" |
+ test_must_fail git update-index --index-info 2>err &&
+ grep "malformed input line" err &&
+
+ # invalid stage value
+ printf "100644 $EMPTY_BLOB 5\tA" |
+ test_must_fail git update-index --index-info 2>err &&
+ grep "malformed input line" err &&
+
+ # invalid OID length
+ printf "100755 abc123\tA" |
+ test_must_fail git update-index --index-info 2>err &&
+ grep "malformed input line" err &&
+
+ # bad quoting
+ printf "100644 $EMPTY_BLOB\t\"A" |
+ test_must_fail git update-index --index-info 2>err &&
+ grep "bad quoting of path name" err
+'
+
test_done
--
gitgitgadget
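To make the callback shape of this patch easier to picture, here is a rough standalone sketch that parses a single line and hands the fields to a caller-supplied function. It is not the code from 'index-info.c': it works on one string instead of stdin, assumes 40-character SHA-1 hex names, passes the object name as text, and skips C-style path unquoting. All 'sketch_' names are hypothetical.

```c
#include <ctype.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_HEXSZ 40	/* assumption: SHA-1; Git reads the_hash_algo->hexsz */

typedef int (*sketch_line_fn)(unsigned int mode, const char *hex_oid, int stage,
			      const char *path, void *cbdata);

/*
 * Parse "mode SP [type SP] oid [SP stage] TAB path" and call fn with the
 * fields. Returns 0 on success, -1 on malformed input or callback failure.
 * As in this patch, a type token is tolerated but not interpreted.
 */
static int sketch_parse_line(const char *line, sketch_line_fn fn, void *cbdata)
{
	char hex[SKETCH_HEXSZ + 1];
	const char *tab, *oid_end;
	char *ptr;
	unsigned long ul;
	int stage = 0;

	errno = 0;
	ul = strtoul(line, &ptr, 8);
	if (ptr == line || *ptr != ' ' || errno || (unsigned int)ul != ul)
		return -1;

	tab = strchr(ptr, '\t');
	if (!tab || tab - ptr < SKETCH_HEXSZ + 1)
		return -1;

	oid_end = tab;
	if (tab[-2] == ' ' && '0' <= tab[-1] && tab[-1] <= '3') {
		stage = tab[-1] - '0';
		oid_end = tab - 2;	/* the OID ends before " <stage>" */
	}
	if (oid_end - ptr < SKETCH_HEXSZ + 1 ||
	    oid_end[-(SKETCH_HEXSZ + 1)] != ' ')
		return -1;

	memcpy(hex, oid_end - SKETCH_HEXSZ, SKETCH_HEXSZ);
	hex[SKETCH_HEXSZ] = '\0';
	for (size_t i = 0; i < SKETCH_HEXSZ; i++)
		if (!isxdigit((unsigned char)hex[i]))
			return -1;

	return fn((unsigned int)ul, hex, stage, tab + 1, cbdata) ? -1 : 0;
}

/* Example callback: remember the most recently parsed line. */
struct sketch_seen {
	unsigned int mode;
	int stage;
	char oid[SKETCH_HEXSZ + 1];
	char path[64];
};

static int sketch_record(unsigned int mode, const char *hex_oid, int stage,
			 const char *path, void *cbdata)
{
	struct sketch_seen *seen = cbdata;

	seen->mode = mode;
	seen->stage = stage;
	snprintf(seen->oid, sizeof(seen->oid), "%s", hex_oid);
	snprintf(seen->path, sizeof(seen->path), "%s", path);
	return 0;
}
```

Keeping the parser ignorant of what happens to each entry is what lets 'update-index' and, later in the series, 'mktree' share it while applying different validation in their callbacks.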
* [PATCH 05/16] index-info.c: identify empty input lines in read_index_info
From: Victoria Dye via GitGitGadget @ 2024-06-11 18:24 UTC (permalink / raw)
To: git; +Cc: Victoria Dye, Victoria Dye
From: Victoria Dye <vdye@github.com>
Update 'read_index_info()' to return INDEX_INFO_EMPTY_LINE (value 1), rather
than the default error code (value -1), when the function encounters an empty
line in stdin. This grants the caller the flexibility to handle such
scenarios differently than a typical error. In the case of 'update-index',
we'll still exit with a "malformed input line" error. However, when
'read_index_info()' is used to process the input to 'mktree' in a later
patch, the empty line return value will signal a new tree in --batch mode.
Signed-off-by: Victoria Dye <vdye@github.com>
---
builtin/update-index.c | 4 +++-
index-info.c | 5 +++++
index-info.h | 2 ++
3 files changed, 10 insertions(+), 1 deletion(-)
diff --git a/builtin/update-index.c b/builtin/update-index.c
index 77df380cb54..b1b334807f8 100644
--- a/builtin/update-index.c
+++ b/builtin/update-index.c
@@ -788,7 +788,9 @@ static enum parse_opt_result stdin_cacheinfo_callback(
return error("option '%s' must be the last argument", opt->long_name);
allow_add = allow_replace = allow_remove = 1;
ret = read_index_info(*nul_term_line, apply_index_info, NULL);
- if (ret)
+ if (ret == INDEX_INFO_EMPTY_LINE)
+ return error("malformed input line ''");
+ else if (ret < 0)
return -1;
return 0;
diff --git a/index-info.c b/index-info.c
index 0b68e34c361..735cbf1f476 100644
--- a/index-info.c
+++ b/index-info.c
@@ -22,6 +22,11 @@ int read_index_info(int nul_term_line, each_index_info_fn fn, void *cbdata)
unsigned long ul;
int stage;
+ if (!buf.len) {
+ ret = INDEX_INFO_EMPTY_LINE;
+ break;
+ }
+
/* This reads lines formatted in one of three formats:
*
* (1) mode SP sha1 TAB path
diff --git a/index-info.h b/index-info.h
index d650498325a..1884972021d 100644
--- a/index-info.h
+++ b/index-info.h
@@ -5,6 +5,8 @@
typedef int (*each_index_info_fn)(unsigned int, struct object_id *, int, const char *, void *);
+#define INDEX_INFO_EMPTY_LINE 1
+
/* Iterate over parsed index info from stdin */
int read_index_info(int nul_term_line, each_index_info_fn fn, void *cbdata);
--
gitgitgadget
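One way to picture why a distinct return value helps: a batch-style caller can resume reading after an empty line, while a strict caller treats the same value as an error. The sketch below is only an illustration under assumptions. It reads from an in-memory array of lines rather than stdin, the 'sketch_' names are hypothetical, and skipping empty groups is a simplification rather than a statement about how 'mktree --batch' behaves.

```c
#include <stddef.h>

#define SKETCH_EMPTY_LINE 1

typedef int (*sketch_each_fn)(const char *line, void *cbdata);

/*
 * Consume lines starting at *pos. Returns 0 at end of input,
 * SKETCH_EMPTY_LINE when an empty line is hit, or -1 when fn fails.
 */
static int sketch_read_lines(const char **lines, size_t nr, size_t *pos,
			     sketch_each_fn fn, void *cbdata)
{
	while (*pos < nr) {
		const char *line = lines[(*pos)++];

		if (!*line)
			return SKETCH_EMPTY_LINE;
		if (fn(line, cbdata))
			return -1;
	}
	return 0;
}

/* Callback that only counts the lines it is given. */
static int sketch_count(const char *line, void *cbdata)
{
	(void)line;
	++*(size_t *)cbdata;
	return 0;
}

/* Batch-style caller: an empty line closes one group and starts the next. */
static int sketch_count_batches(const char **lines, size_t nr)
{
	size_t pos = 0;
	int batches = 0, ret;

	do {
		size_t entries = 0;

		ret = sketch_read_lines(lines, nr, &pos, sketch_count, &entries);
		if (ret < 0)
			return -1;
		if (entries)
			batches++;	/* simplification: skip empty groups */
	} while (ret == SKETCH_EMPTY_LINE);

	return batches;
}
```

A strict caller, like the 'update-index' hunk above, would instead map SKETCH_EMPTY_LINE to an error on the first occurrence.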
* [PATCH 06/16] index-info.c: parse object type in provided in read_index_info
From: Victoria Dye via GitGitGadget @ 2024-06-11 18:24 UTC (permalink / raw)
To: git; +Cc: Victoria Dye, Victoria Dye
From: Victoria Dye <vdye@github.com>
If the object type (e.g. "blob", "tree") is identified on a stdin line read
by 'read_index_info()' (i.e. on lines formatted like the output of 'git
ls-tree'), parse it into an 'enum object_type' and provide it to the
'read_index_info()' callback as an argument. If the type is not provided,
pass 'OBJ_NONE' instead. If the object type is invalid, return an error.
The goal of this change is to allow for more thorough validation of the
provided object type (e.g. against the provided mode) in 'mktree' once
'mktree_line' is replaced with 'read_index_info()'. Note, though, that this
change also strengthens the validation done by 'update-index', since invalid
type names now trigger an error.
Signed-off-by: Victoria Dye <vdye@github.com>
---
builtin/update-index.c | 3 ++-
index-info.c | 16 ++++++++++++----
index-info.h | 3 ++-
t/t2107-update-index-basic.sh | 5 +++++
4 files changed, 21 insertions(+), 6 deletions(-)
diff --git a/builtin/update-index.c b/builtin/update-index.c
index b1b334807f8..8882433b644 100644
--- a/builtin/update-index.c
+++ b/builtin/update-index.c
@@ -510,7 +510,8 @@ static void update_one(const char *path)
report("add '%s'", path);
}
-static int apply_index_info(unsigned int mode, struct object_id *oid, int stage,
+static int apply_index_info(unsigned int mode, struct object_id *oid,
+ enum object_type obj_type UNUSED, int stage,
const char *path_name, void *cbdata UNUSED)
{
if (!verify_path(path_name, mode)) {
diff --git a/index-info.c b/index-info.c
index 735cbf1f476..5d61e61e28f 100644
--- a/index-info.c
+++ b/index-info.c
@@ -18,6 +18,7 @@ int read_index_info(int nul_term_line, each_index_info_fn fn, void *cbdata)
char *ptr, *tab;
char *path_name;
struct object_id oid;
+ enum object_type obj_type = OBJ_NONE;
unsigned int mode;
unsigned long ul;
int stage;
@@ -56,18 +57,17 @@ int read_index_info(int nul_term_line, each_index_info_fn fn, void *cbdata)
if (tab[-2] == ' ' && '0' <= tab[-1] && tab[-1] <= '3') {
stage = tab[-1] - '0';
- ptr = tab + 1; /* point at the head of path */
+ path_name = tab + 1; /* point at the head of path */
tab = tab - 2; /* point at tail of sha1 */
} else {
stage = 0;
- ptr = tab + 1; /* point at the head of path */
+ path_name = tab + 1; /* point at the head of path */
}
if (get_oid_hex(tab - hexsz, &oid) ||
tab[-(hexsz + 1)] != ' ')
goto bad_line;
- path_name = ptr;
if (!nul_term_line && path_name[0] == '"') {
strbuf_reset(&uq);
if (unquote_c_style(&uq, path_name, NULL)) {
@@ -77,7 +77,15 @@ int read_index_info(int nul_term_line, each_index_info_fn fn, void *cbdata)
path_name = uq.buf;
}
- ret = fn(mode, &oid, stage, path_name, cbdata);
+ /* Get the type, if provided */
+ if (tab - hexsz - 1 > ptr + 1) {
+ if (*(tab - hexsz - 1) != ' ')
+ goto bad_line;
+ *(tab - hexsz - 1) = '\0';
+ obj_type = type_from_string(ptr + 1);
+ }
+
+ ret = fn(mode, &oid, obj_type, stage, path_name, cbdata);
if (ret) {
ret = -1;
break;
diff --git a/index-info.h b/index-info.h
index 1884972021d..767cf304213 100644
--- a/index-info.h
+++ b/index-info.h
@@ -2,8 +2,9 @@
#define INDEX_INFO_H
#include "hash.h"
+#include "object.h"
-typedef int (*each_index_info_fn)(unsigned int, struct object_id *, int, const char *, void *);
+typedef int (*each_index_info_fn)(unsigned int, struct object_id *, enum object_type, int, const char *, void *);
#define INDEX_INFO_EMPTY_LINE 1
diff --git a/t/t2107-update-index-basic.sh b/t/t2107-update-index-basic.sh
index 29696ade0d0..9c19d24cd4a 100755
--- a/t/t2107-update-index-basic.sh
+++ b/t/t2107-update-index-basic.sh
@@ -153,6 +153,11 @@ test_expect_success '--index-info fails on malformed input' '
test_must_fail git update-index --index-info 2>err &&
grep "malformed input line" err &&
+ # invalid type
+ printf "100644 bad $EMPTY_BLOB\tA" |
+ test_must_fail git update-index --index-info 2>err &&
+ grep "invalid object type" err &&
+
# invalid stage value
printf "100644 $EMPTY_BLOB 5\tA" |
test_must_fail git update-index --index-info 2>err &&
--
gitgitgadget
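The optional type field sits between the space after the mode and the space before the object name, so detecting it comes down to comparing those two positions. A small standalone sketch of that check follows. The enum mirrors the ordering of Git's 'enum object_type', but the 'sketch_' names are hypothetical, and the sketch returns a "bad" value where the patch relies on 'type_from_string()' to report the error.

```c
#include <stddef.h>
#include <string.h>

enum sketch_type {
	SKETCH_BAD = -1,
	SKETCH_NONE = 0,
	SKETCH_COMMIT = 1,
	SKETCH_TREE = 2,
	SKETCH_BLOB = 3,
	SKETCH_TAG = 4
};

/* Map a type name of the given length to its enum value. */
static enum sketch_type sketch_type_from_name(const char *name, size_t len)
{
	static const char *names[] = { NULL, "commit", "tree", "blob", "tag" };

	for (int i = 1; i <= 4; i++)
		if (strlen(names[i]) == len && !memcmp(names[i], name, len))
			return (enum sketch_type)i;
	return SKETCH_BAD;
}

/*
 * mode_end points at the space right after the mode; oid_sep points at
 * the space right before the object name. When both are the same space,
 * the line carries no type.
 */
static enum sketch_type sketch_optional_type(const char *mode_end, const char *oid_sep)
{
	if (oid_sep <= mode_end + 1)
		return SKETCH_NONE;
	return sketch_type_from_name(mode_end + 1,
				     (size_t)(oid_sep - (mode_end + 1)));
}
```

A callback that receives the parsed type can then cross-check it against the mode, which is the validation the commit message says 'mktree' will want later in the series.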
* [PATCH 07/16] mktree: use read_index_info to read stdin lines
From: Victoria Dye via GitGitGadget @ 2024-06-11 18:24 UTC (permalink / raw)
To: git; +Cc: Victoria Dye, Victoria Dye
From: Victoria Dye <vdye@github.com>
Replace the custom input parsing of 'mktree' with 'read_index_info()', which
handles not only the 'ls-tree' output format it already handles but also the
other formats compatible with 'update-index'. This lends some consistency
across the commands (avoiding the need for two similar implementations for
input parsing) and adds flexibility to mktree.
Update 'Documentation/git-mktree.txt' to reflect the more permissive input
format.
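As an illustration only, the three line layouts documented for
'update-index --index-info' can be told apart with a few lines of standalone
C. This is not the code in 'index-info.c': 'parse_entry_line()',
'struct entry_line' and the fixed 40-character 'HEXSZ' (SHA-1 only) are
names and assumptions made up for this sketch, and it skips path unquoting
and object type validation.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define HEXSZ 40 /* assumption: SHA-1 object names only */

/* Hypothetical result type; not a structure from git.git. */
struct entry_line {
	unsigned int mode;
	char type[16];       /* "" when the line names no type */
	char oid[HEXSZ + 1];
	int stage;           /* 0 when the line names no stage */
	const char *path;    /* points into the caller's line */
};

static int is_hex_oid(const char *s)
{
	for (int i = 0; i < HEXSZ; i++)
		if (!isxdigit((unsigned char)s[i]))
			return 0;
	return 1;
}

/* Returns 0 on success, -1 on a malformed line. */
static int parse_entry_line(const char *line, struct entry_line *out)
{
	const char *tab, *p;
	char *end;
	unsigned long mode;
	ptrdiff_t rest;

	assert(line && out);
	tab = strchr(line, '\t');
	if (!tab)
		return -1;

	mode = strtoul(line, &end, 8);
	if (end == line || *end != ' ')
		return -1;
	p = end + 1;
	rest = tab - p; /* bytes between the mode and the TAB */

	out->mode = (unsigned int)mode;
	out->type[0] = '\0';
	out->stage = 0;
	out->path = tab + 1;

	if (rest == HEXSZ) {
		; /* mode SP oid TAB path */
	} else if (rest == HEXSZ + 2 && p[HEXSZ] == ' ') {
		/* mode SP oid SP stage TAB path */
		if (tab[-1] < '0' || tab[-1] > '3')
			return -1;
		out->stage = tab[-1] - '0';
	} else if (rest > HEXSZ + 1 && tab[-HEXSZ - 1] == ' ') {
		/* mode SP type SP oid TAB path */
		size_t tlen = (size_t)(rest - HEXSZ - 1);
		if (tlen >= sizeof(out->type))
			return -1;
		memcpy(out->type, p, tlen);
		out->type[tlen] = '\0';
		p = tab - HEXSZ;
	} else {
		return -1;
	}

	if (!is_hex_oid(p))
		return -1;
	memcpy(out->oid, p, HEXSZ);
	out->oid[HEXSZ] = '\0';
	return 0;
}
```

In this sketch an empty 'type' plays the role that a zero 'obj_type' plays
in the callback: the caller only cross-checks the type against the mode
when the line actually carried one.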
Signed-off-by: Victoria Dye <vdye@github.com>
---
Documentation/git-mktree.txt | 17 +++--
builtin/mktree.c | 139 ++++++++++++-----------------------
t/t1010-mktree.sh | 66 +++++++++++++++++
3 files changed, 125 insertions(+), 97 deletions(-)
diff --git a/Documentation/git-mktree.txt b/Documentation/git-mktree.txt
index 383f09dd333..507682ed23e 100644
--- a/Documentation/git-mktree.txt
+++ b/Documentation/git-mktree.txt
@@ -3,7 +3,7 @@ git-mktree(1)
NAME
----
-git-mktree - Build a tree-object from ls-tree formatted text
+git-mktree - Build a tree-object from formatted tree entries
SYNOPSIS
@@ -13,15 +13,13 @@ SYNOPSIS
DESCRIPTION
-----------
-Reads standard input in non-recursive `ls-tree` output format, and creates
-a tree object. The order of the tree entries is normalized by mktree so
-pre-sorting the input is not required. The object name of the tree object
-built is written to the standard output.
+Reads entry information from stdin and creates a tree object from those entries.
+The object name of the tree object built is written to the standard output.
OPTIONS
-------
-z::
- Read the NUL-terminated `ls-tree -z` output instead.
+ Input lines are separated with NUL rather than LF.
--missing::
Allow missing objects. The default behaviour (without this option)
@@ -35,6 +33,13 @@ OPTIONS
optional. Note - if the `-z` option is used, lines are terminated
with NUL.
+INPUT FORMAT
+------------
+Tree entries may be specified in any of the formats compatible with the
+`--index-info` option to linkgit:git-update-index[1]. The order of the tree
+entries is normalized by `mktree` so pre-sorting the input by path is not
+required.
+
GIT
---
Part of the linkgit:git[1] suite
diff --git a/builtin/mktree.c b/builtin/mktree.c
index 15bd908702a..5530257252d 100644
--- a/builtin/mktree.c
+++ b/builtin/mktree.c
@@ -6,6 +6,7 @@
#include "builtin.h"
#include "gettext.h"
#include "hex.h"
+#include "index-info.h"
#include "quote.h"
#include "strbuf.h"
#include "tree.h"
@@ -93,123 +94,80 @@ static const char *mktree_usage[] = {
NULL
};
-static void mktree_line(char *buf, int nul_term_line, int allow_missing,
- struct tree_entry_array *arr)
+struct mktree_line_data {
+ struct tree_entry_array *arr;
+ int allow_missing;
+};
+
+static int mktree_line(unsigned int mode, struct object_id *oid,
+ enum object_type obj_type, int stage UNUSED,
+ const char *path, void *cbdata)
{
- char *ptr, *ntr;
- const char *p;
- unsigned mode;
- enum object_type mode_type; /* object type derived from mode */
- enum object_type obj_type; /* object type derived from sha */
+ struct mktree_line_data *data = cbdata;
+ enum object_type mode_type = object_type(mode);
struct object_info oi = OBJECT_INFO_INIT;
- char *path, *to_free = NULL;
- struct object_id oid;
+ enum object_type parsed_obj_type;
- ptr = buf;
- /*
- * Read non-recursive ls-tree output format:
- * mode SP type SP sha1 TAB name
- */
- mode = strtoul(ptr, &ntr, 8);
- if (ptr == ntr || !ntr || *ntr != ' ')
- die("input format error: %s", buf);
- ptr = ntr + 1; /* type */
- ntr = strchr(ptr, ' ');
- if (!ntr || parse_oid_hex(ntr + 1, &oid, &p) ||
- *p != '\t')
- die("input format error: %s", buf);
-
- /* It is perfectly normal if we do not have a commit from a submodule */
- if (S_ISGITLINK(mode))
- allow_missing = 1;
-
-
- *ntr++ = 0; /* now at the beginning of SHA1 */
-
- path = (char *)p + 1; /* at the beginning of name */
- if (!nul_term_line && path[0] == '"') {
- struct strbuf p_uq = STRBUF_INIT;
- if (unquote_c_style(&p_uq, path, NULL))
- die("invalid quoting");
- path = to_free = strbuf_detach(&p_uq, NULL);
- }
+ if (obj_type && mode_type != obj_type)
+ die("object type (%s) doesn't match mode type (%s)",
+ type_name(obj_type), type_name(mode_type));
- /*
- * Object type is redundantly derivable three ways.
- * These should all agree.
- */
- mode_type = object_type(mode);
- if (mode_type != type_from_string(ptr)) {
- die("entry '%s' object type (%s) doesn't match mode type (%s)",
- path, ptr, type_name(mode_type));
- }
+ oi.typep = &parsed_obj_type;
- /* Check the type of object identified by oid without fetching objects */
- oi.typep = &obj_type;
- if (oid_object_info_extended(the_repository, &oid, &oi,
+ if (oid_object_info_extended(the_repository, oid, &oi,
OBJECT_INFO_LOOKUP_REPLACE |
OBJECT_INFO_QUICK |
OBJECT_INFO_SKIP_FETCH_OBJECT) < 0)
- obj_type = -1;
+ parsed_obj_type = -1;
- if (obj_type < 0) {
- if (allow_missing) {
- ; /* no problem - missing objects are presumed to be of the right type */
+ if (parsed_obj_type < 0) {
+ if (data->allow_missing || S_ISGITLINK(mode)) {
+ ; /* no problem - missing objects & submodules are presumed to be of the right type */
} else {
- die("entry '%s' object %s is unavailable", path, oid_to_hex(&oid));
- }
- } else {
- if (obj_type != mode_type) {
- /*
- * The object exists but is of the wrong type.
- * This is a problem regardless of allow_missing
- * because the new tree entry will never be correct.
- */
- die("entry '%s' object %s is a %s but specified type was (%s)",
- path, oid_to_hex(&oid), type_name(obj_type), type_name(mode_type));
+ die("entry '%s' object %s is unavailable", path, oid_to_hex(oid));
}
+ } else if (parsed_obj_type != mode_type) {
+ /*
+ * The object exists but is of the wrong type.
+ * This is a problem regardless of allow_missing
+ * because the new tree entry will never be correct.
+ */
+ die("entry '%s' object %s is a %s but specified type was (%s)",
+ path, oid_to_hex(oid), type_name(parsed_obj_type), type_name(mode_type));
}
- append_to_tree(mode, &oid, path, arr);
- free(to_free);
+ append_to_tree(mode, oid, path, data->arr);
+ return 0;
}
int cmd_mktree(int ac, const char **av, const char *prefix)
{
- struct strbuf sb = STRBUF_INIT;
struct object_id oid;
int nul_term_line = 0;
- int allow_missing = 0;
int is_batch_mode = 0;
- int got_eof = 0;
struct tree_entry_array arr = { 0 };
- strbuf_getline_fn getline_fn;
+ struct mktree_line_data mktree_line_data = { .arr = &arr };
+ int ret;
const struct option option[] = {
OPT_BOOL('z', NULL, &nul_term_line, N_("input is NUL terminated")),
- OPT_BOOL(0, "missing", &allow_missing, N_("allow missing objects")),
+ OPT_BOOL(0, "missing", &mktree_line_data.allow_missing, N_("allow missing objects")),
OPT_BOOL(0, "batch", &is_batch_mode, N_("allow creation of more than one tree")),
OPT_END()
};
ac = parse_options(ac, av, prefix, option, mktree_usage, 0);
- getline_fn = nul_term_line ? strbuf_getline_nul : strbuf_getline_lf;
-
- while (!got_eof) {
- while (1) {
- if (getline_fn(&sb, stdin) == EOF) {
- got_eof = 1;
- break;
- }
- if (sb.buf[0] == '\0') {
- /* empty lines denote tree boundaries in batch mode */
- if (is_batch_mode)
- break;
- die("input format error: (blank line only valid in batch mode)");
- }
- mktree_line(sb.buf, nul_term_line, allow_missing, &arr);
- }
- if (is_batch_mode && got_eof && arr.nr < 1) {
+
+ do {
+ ret = read_index_info(nul_term_line, mktree_line, &mktree_line_data);
+ if (ret < 0)
+ break;
+
+ /* empty lines denote tree boundaries in batch mode */
+ if (ret > 0 && !is_batch_mode)
+ die("input format error: (blank line only valid in batch mode)");
+
+ if (is_batch_mode && !ret && arr.nr < 1) {
/*
* Execution gets here if the last tree entry is terminated with a
* new-line. The final new-line has been made optional to be
@@ -222,9 +180,8 @@ int cmd_mktree(int ac, const char **av, const char *prefix)
fflush(stdout);
}
clear_tree_entry_array(&arr); /* reset tree entry buffer for re-use in batch mode */
- }
+ } while (ret > 0);
release_tree_entry_array(&arr);
- strbuf_release(&sb);
- return 0;
+ return !!ret;
}
diff --git a/t/t1010-mktree.sh b/t/t1010-mktree.sh
index 22875ba598c..9b2ab0c97ad 100755
--- a/t/t1010-mktree.sh
+++ b/t/t1010-mktree.sh
@@ -54,11 +54,36 @@ test_expect_success 'ls-tree output in wrong order given to mktree (2)' '
test_cmp tree.withsub actual
'
+test_expect_success '--batch creates multiple trees' '
+ cat top >multi-tree &&
+ echo "" >>multi-tree &&
+ cat top.withsub >>multi-tree &&
+
+ cat tree >expect &&
+ cat tree.withsub >>expect &&
+ git mktree --batch <multi-tree >actual &&
+ test_cmp expect actual
+'
+
test_expect_success 'allow missing object with --missing' '
git mktree --missing <top.missing >actual &&
test_cmp tree.missing actual
'
+test_expect_success 'mktree with invalid submodule OIDs' '
+ # non-existent OID - ok
+ printf "160000 commit $(test_oid numeric)\tA\n" >in &&
+ git mktree <in >tree.actual &&
+ git ls-tree $(cat tree.actual) >actual &&
+ test_cmp in actual &&
+
+ # existing OID, wrong type - error
+ tree_oid="$(cat tree)" &&
+ printf "160000 commit $tree_oid\tA" |
+ test_must_fail git mktree 2>err &&
+ grep "object $tree_oid is a tree but specified type was (commit)" err
+'
+
test_expect_success 'mktree refuses to read ls-tree -r output (1)' '
test_must_fail git mktree <all
'
@@ -67,4 +92,45 @@ test_expect_success 'mktree refuses to read ls-tree -r output (2)' '
test_must_fail git mktree <all.withsub
'
+test_expect_success 'mktree fails on malformed input' '
+ # empty line without --batch
+ echo "" |
+ test_must_fail git mktree 2>err &&
+ grep "blank line only valid in batch mode" err &&
+
+ # bad whitespace
+ printf "100644 blob $EMPTY_BLOB A" |
+ test_must_fail git mktree 2>err &&
+ grep "malformed input line" err &&
+
+ # invalid type
+ printf "100644 bad $EMPTY_BLOB\tA" |
+ test_must_fail git mktree 2>err &&
+ grep "invalid object type" err &&
+
+ # invalid OID length
+ printf "100755 blob abc123\tA" |
+ test_must_fail git mktree 2>err &&
+ grep "malformed input line" err &&
+
+ # bad quoting
+ printf "100644 blob $EMPTY_BLOB\t\"A" |
+ test_must_fail git mktree 2>err &&
+ grep "bad quoting of path name" err
+'
+
+test_expect_success 'mktree fails on mode mismatch' '
+ tree_oid="$(cat tree)" &&
+
+ # mode-type mismatch
+ printf "100644 tree $tree_oid\tA" |
+ test_must_fail git mktree 2>err &&
+ grep "object type (tree) doesn${SQ}t match mode type (blob)" err &&
+
+ # mode-object mismatch (no --missing)
+ printf "100644 $tree_oid\tA" |
+ test_must_fail git mktree 2>err &&
+ grep "object $tree_oid is a tree but specified type was (blob)" err
+'
+
test_done
--
gitgitgadget
* [PATCH 08/16] mktree: add a --literally option
From: Victoria Dye via GitGitGadget @ 2024-06-11 18:24 UTC (permalink / raw)
To: git; +Cc: Victoria Dye, Victoria Dye
From: Victoria Dye <vdye@github.com>
Add the '--literally' option to 'git mktree' to allow constructing a tree
with invalid contents. For now, the only changes compared to the normal
'git mktree' behavior are that the inputs are no longer sorted and that
paths containing a slash are no longer rejected; in later commits,
deduplication and path validation will be added to the command and
'--literally' will skip those as well.
Certain tests use 'git mktree' to intentionally generate corrupt trees.
Update these tests to use '--literally' so that they continue functioning
properly when additional input cleanup & validation is added to the base
command. Note that, because 'mktree --literally' does not sort entries, some
of the tests are updated to provide their inputs in tree order; otherwise,
the test would fail with an "incorrect order" error instead of the error the
test expects.
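To make "tree order" concrete, here is a minimal standalone sketch of the
comparison, assuming the usual rule that a tree entry sorts as if its name
had a trailing '/'. The names 'tree_order_cmp()', 'is_tree_sorted()',
'struct entry' and 'MODE_IS_TREE' are invented for the example; it mirrors
the idea behind 'base_name_compare()' but is not that function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Local stand-in for S_ISDIR() so the sketch does not depend on <sys/stat.h>. */
#define MODE_IS_TREE(mode) (((mode) & 0170000) == 0040000)

/* Hypothetical entry type for this example only. */
struct entry {
	unsigned int mode;
	const char *name;
};

/* Compare two entries the way tree objects order them. */
static int tree_order_cmp(const struct entry *a, const struct entry *b)
{
	size_t alen = strlen(a->name), blen = strlen(b->name);
	size_t len = alen < blen ? alen : blen;
	unsigned char c1, c2;
	int cmp = memcmp(a->name, b->name, len);

	if (cmp)
		return cmp;
	/* One name is a prefix of the other: compare the next byte. */
	c1 = (unsigned char)a->name[len];
	c2 = (unsigned char)b->name[len];
	if (!c1 && MODE_IS_TREE(a->mode))
		c1 = '/'; /* trees sort as "name/" */
	if (!c2 && MODE_IS_TREE(b->mode))
		c2 = '/';
	return (c1 < c2) ? -1 : (c1 > c2) ? 1 : 0;
}

/* Returns 1 if no entry sorts before its predecessor, 0 otherwise. */
static int is_tree_sorted(const struct entry *ents, size_t nr)
{
	assert(ents || !nr);
	for (size_t i = 1; i < nr; i++)
		if (tree_order_cmp(&ents[i - 1], &ents[i]) > 0)
			return 0;
	return 1;
}
```

Under this rule a blob named "test" sorts before "test-", while a tree
named "test" sorts after "test.txt" and before "test0", so input that is
sorted by plain path name is not necessarily in tree order.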
Signed-off-by: Victoria Dye <vdye@github.com>
---
Documentation/git-mktree.txt | 9 ++++++-
builtin/mktree.c | 36 +++++++++++++++++++++++----
t/t1010-mktree.sh | 40 ++++++++++++++++++++++++++++++
t/t1014-read-tree-confusing.sh | 6 ++---
t/t1450-fsck.sh | 4 +--
t/t1601-index-bogus.sh | 2 +-
t/t1700-split-index.sh | 6 ++---
t/t7008-filter-branch-null-sha1.sh | 6 ++---
t/t7417-submodule-path-url.sh | 2 +-
t/t7450-bad-git-dotfiles.sh | 8 +++---
10 files changed, 96 insertions(+), 23 deletions(-)
diff --git a/Documentation/git-mktree.txt b/Documentation/git-mktree.txt
index 507682ed23e..fb07e40cef0 100644
--- a/Documentation/git-mktree.txt
+++ b/Documentation/git-mktree.txt
@@ -9,7 +9,7 @@ git-mktree - Build a tree-object from formatted tree entries
SYNOPSIS
--------
[verse]
-'git mktree' [-z] [--missing] [--batch]
+'git mktree' [-z] [--missing] [--literally] [--batch]
DESCRIPTION
-----------
@@ -27,6 +27,13 @@ OPTIONS
object. This option has no effect on the treatment of gitlink entries
(aka "submodules") which are always allowed to be missing.
+--literally::
+ Create the tree from the tree entries provided to stdin in the order
+ they are provided without performing additional sorting, deduplication,
+ or path validation on them. This option is primarily useful for creating
+ invalid tree objects to use in tests of how Git deals with various forms
+ of tree corruption.
+
--batch::
Allow building of more than one tree object before exiting. Each
tree is separated by a single blank line. The final newline is
diff --git a/builtin/mktree.c b/builtin/mktree.c
index 5530257252d..48019448c1f 100644
--- a/builtin/mktree.c
+++ b/builtin/mktree.c
@@ -45,11 +45,11 @@ static void release_tree_entry_array(struct tree_entry_array *arr)
}
static void append_to_tree(unsigned mode, struct object_id *oid, const char *path,
- struct tree_entry_array *arr)
+ struct tree_entry_array *arr, int literally)
{
struct tree_entry *ent;
size_t len = strlen(path);
- if (strchr(path, '/'))
+ if (!literally && strchr(path, '/'))
die("path %s contains slash", path);
FLEX_ALLOC_MEM(ent, name, path, len);
@@ -89,14 +89,35 @@ static void write_tree(struct tree_entry_array *arr, struct object_id *oid)
strbuf_release(&buf);
}
+static void write_tree_literally(struct tree_entry_array *arr,
+ struct object_id *oid)
+{
+ struct strbuf buf;
+ size_t size = 0;
+
+ for (size_t i = 0; i < arr->nr; i++)
+ size += 32 + arr->entries[i]->len;
+
+ strbuf_init(&buf, size);
+ for (size_t i = 0; i < arr->nr; i++) {
+ struct tree_entry *ent = arr->entries[i];
+ strbuf_addf(&buf, "%o %s%c", ent->mode, ent->name, '\0');
+ strbuf_add(&buf, ent->oid.hash, the_hash_algo->rawsz);
+ }
+
+ write_object_file(buf.buf, buf.len, OBJ_TREE, oid);
+ strbuf_release(&buf);
+}
+
static const char *mktree_usage[] = {
- "git mktree [-z] [--missing] [--batch]",
+ "git mktree [-z] [--missing] [--literally] [--batch]",
NULL
};
struct mktree_line_data {
struct tree_entry_array *arr;
int allow_missing;
+ int literally;
};
static int mktree_line(unsigned int mode, struct object_id *oid,
@@ -136,7 +157,7 @@ static int mktree_line(unsigned int mode, struct object_id *oid,
path, oid_to_hex(oid), type_name(parsed_obj_type), type_name(mode_type));
}
- append_to_tree(mode, oid, path, data->arr);
+ append_to_tree(mode, oid, path, data->arr, data->literally);
return 0;
}
@@ -152,6 +173,8 @@ int cmd_mktree(int ac, const char **av, const char *prefix)
const struct option option[] = {
OPT_BOOL('z', NULL, &nul_term_line, N_("input is NUL terminated")),
OPT_BOOL(0, "missing", &mktree_line_data.allow_missing, N_("allow missing objects")),
+ OPT_BOOL(0, "literally", &mktree_line_data.literally,
+ N_("do not sort, deduplicate, or validate paths of tree entries")),
OPT_BOOL(0, "batch", &is_batch_mode, N_("allow creation of more than one tree")),
OPT_END()
};
@@ -175,7 +198,10 @@ int cmd_mktree(int ac, const char **av, const char *prefix)
*/
; /* skip creating an empty tree */
} else {
- write_tree(&arr, &oid);
+ if (mktree_line_data.literally)
+ write_tree_literally(&arr, &oid);
+ else
+ write_tree(&arr, &oid);
puts(oid_to_hex(&oid));
fflush(stdout);
}
diff --git a/t/t1010-mktree.sh b/t/t1010-mktree.sh
index 9b2ab0c97ad..e0687cb529f 100755
--- a/t/t1010-mktree.sh
+++ b/t/t1010-mktree.sh
@@ -133,4 +133,44 @@ test_expect_success 'mktree fails on mode mismatch' '
grep "object $tree_oid is a tree but specified type was (blob)" err
'
+test_expect_success '--literally can create invalid trees' '
+ tree_oid="$(cat tree)" &&
+ blob_oid="$(git rev-parse ${tree_oid}:one)" &&
+
+ # duplicate entries
+ {
+ printf "040000 tree $tree_oid\tmy-tree\n" &&
+ printf "100644 blob $blob_oid\ttest-file\n" &&
+ printf "100755 blob $blob_oid\ttest-file\n"
+ } | git mktree --literally >tree.bad &&
+ git cat-file tree $(cat tree.bad) >top.bad &&
+ test_must_fail git hash-object --stdin -t tree <top.bad 2>err &&
+ grep "contains duplicate file entries" err &&
+
+ # disallowed path
+ {
+ printf "100644 blob $blob_oid\t.git\n"
+ } | git mktree --literally >tree.bad &&
+ git cat-file tree $(cat tree.bad) >top.bad &&
+ test_must_fail git hash-object --stdin -t tree <top.bad 2>err &&
+ grep "contains ${SQ}.git${SQ}" err &&
+
+ # nested entry
+ {
+ printf "100644 blob $blob_oid\tdeeper/my-file\n"
+ } | git mktree --literally >tree.bad &&
+ git cat-file tree $(cat tree.bad) >top.bad &&
+ test_must_fail git hash-object --stdin -t tree <top.bad 2>err &&
+ grep "contains full pathnames" err &&
+
+ # bad entry ordering
+ {
+ printf "100644 blob $blob_oid\tB\n" &&
+ printf "040000 tree $tree_oid\tA\n"
+ } | git mktree --literally >tree.bad &&
+ git cat-file tree $(cat tree.bad) >top.bad &&
+ test_must_fail git hash-object --stdin -t tree <top.bad 2>err &&
+ grep "not properly sorted" err
+'
+
test_done
diff --git a/t/t1014-read-tree-confusing.sh b/t/t1014-read-tree-confusing.sh
index 8ea8d36818b..762eb789704 100755
--- a/t/t1014-read-tree-confusing.sh
+++ b/t/t1014-read-tree-confusing.sh
@@ -30,13 +30,13 @@ while read path pretty; do
esac
test_expect_success "reject $pretty at end of path" '
printf "100644 blob %s\t%s" "$blob" "$path" >tree &&
- bogus=$(git mktree <tree) &&
+ bogus=$(git mktree --literally <tree) &&
test_must_fail git read-tree $bogus
'
test_expect_success "reject $pretty as subtree" '
printf "040000 tree %s\t%s" "$tree" "$path" >tree &&
- bogus=$(git mktree <tree) &&
+ bogus=$(git mktree --literally <tree) &&
test_must_fail git read-tree $bogus
'
done <<-EOF
@@ -58,7 +58,7 @@ test_expect_success 'utf-8 paths allowed with core.protectHFS off' '
test_when_finished "git read-tree HEAD" &&
test_config core.protectHFS false &&
printf "100644 blob %s\t%s" "$blob" ".gi${u200c}t" >tree &&
- ok=$(git mktree <tree) &&
+ ok=$(git mktree --literally <tree) &&
git read-tree $ok
'
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 8a456b1142d..532d2770e88 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -316,7 +316,7 @@ check_duplicate_names () {
*) printf "100644 blob %s\t%s\n" $blob "$name" ;;
esac
done >badtree &&
- badtree=$(git mktree <badtree) &&
+ badtree=$(git mktree --literally <badtree) &&
test_must_fail git fsck 2>out &&
test_grep "$badtree" out &&
test_grep "error in tree .*contains duplicate file entries" out
@@ -614,7 +614,7 @@ while read name path pretty; do
tree=$(git rev-parse HEAD^{tree}) &&
value=$(eval "echo \$$type") &&
printf "$mode $type %s\t%s" "$value" "$path" >bad &&
- bad_tree=$(git mktree <bad) &&
+ bad_tree=$(git mktree --literally <bad) &&
git fsck 2>out &&
test_grep "warning.*tree $bad_tree" out
)'
diff --git a/t/t1601-index-bogus.sh b/t/t1601-index-bogus.sh
index 4171f1e1410..54e8ae038b7 100755
--- a/t/t1601-index-bogus.sh
+++ b/t/t1601-index-bogus.sh
@@ -4,7 +4,7 @@ test_description='test handling of bogus index entries'
. ./test-lib.sh
test_expect_success 'create tree with null sha1' '
- tree=$(printf "160000 commit $ZERO_OID\\tbroken\\n" | git mktree)
+ tree=$(printf "160000 commit $ZERO_OID\\tbroken\\n" | git mktree --literally)
'
test_expect_success 'read-tree refuses to read null sha1' '
diff --git a/t/t1700-split-index.sh b/t/t1700-split-index.sh
index ac4a5b2734c..97b58aa3cca 100755
--- a/t/t1700-split-index.sh
+++ b/t/t1700-split-index.sh
@@ -478,12 +478,12 @@ test_expect_success 'writing split index with null sha1 does not write cache tre
git config splitIndex.maxPercentChange 0 &&
git commit -m "commit" &&
{
- git ls-tree HEAD &&
- printf "160000 commit $ZERO_OID\\tbroken\\n"
+ printf "160000 commit $ZERO_OID\\tbroken\\n" &&
+ git ls-tree HEAD
} >broken-tree &&
echo "add broken entry" >msg &&
- tree=$(git mktree <broken-tree) &&
+ tree=$(git mktree --literally <broken-tree) &&
test_tick &&
commit=$(git commit-tree $tree -p HEAD <msg) &&
git update-ref HEAD "$commit" &&
diff --git a/t/t7008-filter-branch-null-sha1.sh b/t/t7008-filter-branch-null-sha1.sh
index 93fbc92b8db..a1b4c295c01 100755
--- a/t/t7008-filter-branch-null-sha1.sh
+++ b/t/t7008-filter-branch-null-sha1.sh
@@ -12,12 +12,12 @@ test_expect_success 'setup: base commits' '
test_expect_success 'setup: a commit with a bogus null sha1 in the tree' '
{
- git ls-tree HEAD &&
- printf "160000 commit $ZERO_OID\\tbroken\\n"
+ printf "160000 commit $ZERO_OID\\tbroken\\n" &&
+ git ls-tree HEAD
} >broken-tree &&
echo "add broken entry" >msg &&
- tree=$(git mktree <broken-tree) &&
+ tree=$(git mktree --literally <broken-tree) &&
test_tick &&
commit=$(git commit-tree $tree -p HEAD <msg) &&
git update-ref HEAD "$commit"
diff --git a/t/t7417-submodule-path-url.sh b/t/t7417-submodule-path-url.sh
index dbbb3853dc0..5d3c98e99a7 100755
--- a/t/t7417-submodule-path-url.sh
+++ b/t/t7417-submodule-path-url.sh
@@ -42,7 +42,7 @@ test_expect_success MINGW 'submodule paths disallows trailing spaces' '
tree=$(git -C super write-tree) &&
git -C super ls-tree $tree >tree &&
sed "s/sub/sub /" <tree >tree.new &&
- tree=$(git -C super mktree <tree.new) &&
+ tree=$(git -C super mktree --literally <tree.new) &&
commit=$(echo with space | git -C super commit-tree $tree) &&
git -C super update-ref refs/heads/main $commit &&
diff --git a/t/t7450-bad-git-dotfiles.sh b/t/t7450-bad-git-dotfiles.sh
index 4a9c22c9e2b..de2d45d2244 100755
--- a/t/t7450-bad-git-dotfiles.sh
+++ b/t/t7450-bad-git-dotfiles.sh
@@ -203,11 +203,11 @@ check_dotx_symlink () {
content=$(git hash-object -w ../.gitmodules) &&
target=$(printf "$tricky" | git hash-object -w --stdin) &&
{
- printf "100644 blob $content\t$tricky\n" &&
- printf "120000 blob $target\t$path\n"
+ printf "120000 blob $target\t$path\n" &&
+ printf "100644 blob $content\t$tricky\n"
} >bad-tree
) &&
- tree=$(git -C $dir mktree <$dir/bad-tree)
+ tree=$(git -C $dir mktree --literally <$dir/bad-tree)
'
test_expect_success "fsck detects symlinked $name ($type)" '
@@ -261,7 +261,7 @@ test_expect_success 'fsck detects non-blob .gitmodules' '
cp ../.gitmodules subdir/file &&
git add subdir/file &&
git commit -m ok &&
- git ls-tree HEAD | sed s/subdir/.gitmodules/ | git mktree &&
+ git ls-tree HEAD | sed s/subdir/.gitmodules/ | git mktree --literally &&
test_must_fail git fsck 2>output &&
test_grep gitmodulesBlob output
--
gitgitgadget
* [PATCH 09/16] mktree: validate paths more carefully
From: Victoria Dye via GitGitGadget @ 2024-06-11 18:24 UTC (permalink / raw)
To: git; +Cc: Victoria Dye, Victoria Dye
From: Victoria Dye <vdye@github.com>
Use 'verify_path' to validate the paths provided as tree entries, ensuring
we do not create entries with paths not allowed in trees (e.g., .git). Also,
remove trailing slashes on directories before validating, allowing users to
provide 'folder-name/' as the path for a tree object entry.
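The normalize-then-validate order can be sketched in standalone C as
follows. This is a deliberately reduced stand-in: the real 'verify_path()'
checks considerably more (for example the HFS and NTFS protections), and
'normalize_entry_name()' and 'MODE_IS_TREE' are names invented for the
example. Rejecting a name that is empty after stripping is an assumption of
the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Local stand-in for S_ISDIR() so the sketch does not depend on <sys/stat.h>. */
#define MODE_IS_TREE(mode) (((mode) & 0170000) == 0040000)

/*
 * Returns the length of the usable entry name (trailing slashes dropped
 * for trees), or -1 if the name must be rejected.
 */
static long normalize_entry_name(const char *path, unsigned int mode)
{
	size_t len;

	assert(path);
	len = strlen(path);

	/* Only tree entries may be written as "folder-name/". */
	if (MODE_IS_TREE(mode))
		while (len > 0 && path[len - 1] == '/')
			len--;

	if (!len)
		return -1; /* assumption: empty names are rejected */
	if (memchr(path, '/', len))
		return -1; /* nested paths are not handled at this point */
	if (len == 1 && path[0] == '.')
		return -1;
	if (len == 2 && !memcmp(path, "..", 2))
		return -1;
	if (len == 4 && !memcmp(path, ".git", 4))
		return -1;
	return (long)len;
}
```

Because the slash is stripped only for tree modes, "test/" given with a
blob mode still contains a slash when it is validated and is rejected,
matching the behavior described above.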
Signed-off-by: Victoria Dye <vdye@github.com>
---
builtin/mktree.c | 20 +++++++++++++++++---
t/t1010-mktree.sh | 33 +++++++++++++++++++++++++++++++++
2 files changed, 50 insertions(+), 3 deletions(-)
diff --git a/builtin/mktree.c b/builtin/mktree.c
index 48019448c1f..29e9dc6ce69 100644
--- a/builtin/mktree.c
+++ b/builtin/mktree.c
@@ -8,6 +8,7 @@
#include "hex.h"
#include "index-info.h"
#include "quote.h"
+#include "read-cache-ll.h"
#include "strbuf.h"
#include "tree.h"
#include "parse-options.h"
@@ -49,10 +50,23 @@ static void append_to_tree(unsigned mode, struct object_id *oid, const char *pat
{
struct tree_entry *ent;
size_t len = strlen(path);
- if (!literally && strchr(path, '/'))
- die("path %s contains slash", path);
- FLEX_ALLOC_MEM(ent, name, path, len);
+ if (literally) {
+ FLEX_ALLOC_MEM(ent, name, path, len);
+ } else {
+ /* Normalize and validate entry path */
+ if (S_ISDIR(mode)) {
+ while(len > 0 && is_dir_sep(path[len - 1]))
+ len--;
+ }
+ FLEX_ALLOC_MEM(ent, name, path, len);
+
+ if (!verify_path(ent->name, mode))
+ die(_("invalid path '%s'"), path);
+ if (strchr(ent->name, '/'))
+ die("path %s contains slash", path);
+ }
+
ent->mode = mode;
ent->len = len;
oidcpy(&ent->oid, oid);
diff --git a/t/t1010-mktree.sh b/t/t1010-mktree.sh
index e0687cb529f..e0263cb2bf8 100755
--- a/t/t1010-mktree.sh
+++ b/t/t1010-mktree.sh
@@ -173,4 +173,37 @@ test_expect_success '--literally can create invalid trees' '
grep "not properly sorted" err
'
+test_expect_success 'mktree validates path' '
+ tree_oid="$(cat tree)" &&
+ blob_oid="$(git rev-parse $tree_oid:a/one)" &&
+ head_oid="$(git rev-parse HEAD)" &&
+
+ # Valid: tree with or without trailing slash, blob without trailing slash
+ {
+ printf "040000 tree $tree_oid\tfolder1/\n" &&
+ printf "040000 tree $tree_oid\tfolder2\n" &&
+ printf "100644 blob $blob_oid\tfile.txt\n"
+ } | git mktree >actual &&
+
+ # Invalid: blob with trailing slash
+ printf "100644 blob $blob_oid\ttest/" |
+ test_must_fail git mktree 2>err &&
+ grep "invalid path ${SQ}test/${SQ}" err &&
+
+ # Invalid: dotdot
+ printf "040000 tree $tree_oid\t../" |
+ test_must_fail git mktree 2>err &&
+ grep "invalid path ${SQ}../${SQ}" err &&
+
+ # Invalid: dot
+ printf "040000 tree $tree_oid\t." |
+ test_must_fail git mktree 2>err &&
+ grep "invalid path ${SQ}.${SQ}" err &&
+
+ # Invalid: .git
+ printf "040000 tree $tree_oid\t.git/" |
+ test_must_fail git mktree 2>err &&
+ grep "invalid path ${SQ}.git/${SQ}" err
+'
+
test_done
--
gitgitgadget
* [PATCH 10/16] mktree: overwrite duplicate entries
From: Victoria Dye via GitGitGadget @ 2024-06-11 18:24 UTC (permalink / raw)
To: git; +Cc: Victoria Dye, Victoria Dye
From: Victoria Dye <vdye@github.com>
If multiple tree entries with the same name are provided as input to
'mktree', only write the last one to the tree. Entries are considered
duplicates if they have identical names (*not* considering mode); if a blob
and a tree with the same name are provided, only the last one will be
written to the tree. A tree with duplicate entries is invalid (per 'git
fsck'), so that condition should be avoided wherever possible.
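The "last one wins" rule can be sketched in standalone C by sorting on name
with the input position as a descending tie-breaker and then keeping the
first entry of each run. 'struct entry', 'cmp_name_then_latest()' and
'dedup_keep_last()' are invented for this example, plain 'qsort()' stands
in for the project's sorting helpers, and the final re-sort into tree order
is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical entry type for this example only. */
struct entry {
	const char *name;
	unsigned int mode;
	size_t order; /* position in the input, filled in below */
};

/* Sort by name only; among equal names the later input comes first. */
static int cmp_name_then_latest(const void *a_, const void *b_)
{
	const struct entry *a = a_, *b = b_;
	int cmp = strcmp(a->name, b->name);

	if (cmp)
		return cmp;
	return (a->order < b->order) - (a->order > b->order);
}

/* Deduplicates in place and returns the number of entries kept. */
static size_t dedup_keep_last(struct entry *ents, size_t nr)
{
	size_t kept = 0;

	assert(ents || !nr);
	for (size_t i = 0; i < nr; i++)
		ents[i].order = i;
	if (nr)
		qsort(ents, nr, sizeof(*ents), cmp_name_then_latest);

	for (size_t i = 0; i < nr; i++) {
		/* Same name as the entry just kept: an older duplicate. */
		if (kept && !strcmp(ents[kept - 1].name, ents[i].name))
			continue;
		ents[kept++] = ents[i];
	}
	return kept;
}
```

The mode is ignored when comparing, so a blob and a tree that share a name
count as duplicates of each other. The result here is in plain name order;
writing a valid tree would still require sorting the survivors with the
mode-aware comparison.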
Signed-off-by: Victoria Dye <vdye@github.com>
---
Documentation/git-mktree.txt | 8 ++++---
builtin/mktree.c | 45 ++++++++++++++++++++++++++++++++----
t/t1010-mktree.sh | 36 +++++++++++++++++++++++++++--
3 files changed, 80 insertions(+), 9 deletions(-)
diff --git a/Documentation/git-mktree.txt b/Documentation/git-mktree.txt
index fb07e40cef0..afbc846d077 100644
--- a/Documentation/git-mktree.txt
+++ b/Documentation/git-mktree.txt
@@ -43,9 +43,11 @@ OPTIONS
INPUT FORMAT
------------
Tree entries may be specified in any of the formats compatible with the
-`--index-info` option to linkgit:git-update-index[1]. The order of the tree
-entries is normalized by `mktree` so pre-sorting the input by path is not
-required.
+`--index-info` option to linkgit:git-update-index[1].
+
+The order of the tree entries is normalized by `mktree` so pre-sorting the input
+by path is not required. Multiple entries provided with the same path are
+deduplicated, with only the last one specified added to the tree.
GIT
---
diff --git a/builtin/mktree.c b/builtin/mktree.c
index 29e9dc6ce69..e9e2134136f 100644
--- a/builtin/mktree.c
+++ b/builtin/mktree.c
@@ -15,6 +15,9 @@
#include "object-store-ll.h"
struct tree_entry {
+ /* Internal */
+ size_t order;
+
unsigned mode;
struct object_id oid;
int len;
@@ -72,15 +75,49 @@ static void append_to_tree(unsigned mode, struct object_id *oid, const char *pat
oidcpy(&ent->oid, oid);
/* Append the update */
+ ent->order = arr->nr;
tree_entry_array_push(arr, ent);
}
-static int ent_compare(const void *a_, const void *b_)
+static int ent_compare(const void *a_, const void *b_, void *ctx)
{
+ int cmp;
struct tree_entry *a = *(struct tree_entry **)a_;
struct tree_entry *b = *(struct tree_entry **)b_;
- return base_name_compare(a->name, a->len, a->mode,
- b->name, b->len, b->mode);
+ int ignore_mode = *((int *)ctx);
+
+ if (ignore_mode)
+ cmp = name_compare(a->name, a->len, b->name, b->len);
+ else
+ cmp = base_name_compare(a->name, a->len, a->mode,
+ b->name, b->len, b->mode);
+ return cmp ? cmp : b->order - a->order;
+}
+
+static void sort_and_dedup_tree_entry_array(struct tree_entry_array *arr)
+{
+ size_t count = arr->nr;
+ struct tree_entry *prev = NULL;
+
+ int ignore_mode = 1;
+ QSORT_S(arr->entries, arr->nr, ent_compare, &ignore_mode);
+
+ arr->nr = 0;
+ for (size_t i = 0; i < count; i++) {
+ struct tree_entry *curr = arr->entries[i];
+ if (prev &&
+ !name_compare(prev->name, prev->len,
+ curr->name, curr->len)) {
+ FREE_AND_NULL(curr);
+ } else {
+ arr->entries[arr->nr++] = curr;
+ prev = curr;
+ }
+ }
+
+ /* Sort again to order the entries for tree insertion */
+ ignore_mode = 0;
+ QSORT_S(arr->entries, arr->nr, ent_compare, &ignore_mode);
}
static void write_tree(struct tree_entry_array *arr, struct object_id *oid)
@@ -88,7 +125,7 @@ static void write_tree(struct tree_entry_array *arr, struct object_id *oid)
struct strbuf buf;
size_t size = 0;
- QSORT(arr->entries, arr->nr, ent_compare);
+ sort_and_dedup_tree_entry_array(arr);
for (size_t i = 0; i < arr->nr; i++)
size += 32 + arr->entries[i]->len;
diff --git a/t/t1010-mktree.sh b/t/t1010-mktree.sh
index e0263cb2bf8..956692347f0 100755
--- a/t/t1010-mktree.sh
+++ b/t/t1010-mktree.sh
@@ -6,11 +6,16 @@ TEST_PASSES_SANITIZE_LEAK=true
. ./test-lib.sh
test_expect_success setup '
- for d in a a- a0
+ for d in folder folder- folder0
do
mkdir "$d" && echo "$d/one" >"$d/one" &&
git add "$d" || return 1
done &&
+ for f in before folder.txt later
+ do
+ echo "$f" >"$f" &&
+ git add "$f" || return 1
+ done &&
echo zero >one &&
git update-index --add --info-only one &&
git write-tree --missing-ok >tree.missing &&
@@ -175,7 +180,7 @@ test_expect_success '--literally can create invalid trees' '
test_expect_success 'mktree validates path' '
tree_oid="$(cat tree)" &&
- blob_oid="$(git rev-parse $tree_oid:a/one)" &&
+ blob_oid="$(git rev-parse $tree_oid:folder.txt)" &&
head_oid="$(git rev-parse HEAD)" &&
# Valid: tree with or without trailing slash, blob without trailing slash
@@ -206,4 +211,31 @@ test_expect_success 'mktree validates path' '
grep "invalid path ${SQ}.git/${SQ}" err
'
+test_expect_success 'mktree with duplicate entries' '
+ tree_oid=$(cat tree) &&
+ folder_oid=$(git rev-parse ${tree_oid}:folder) &&
+ before_oid=$(git rev-parse ${tree_oid}:before) &&
+ head_oid=$(git rev-parse HEAD) &&
+
+ {
+ printf "100755 blob $before_oid\ttest\n" &&
+ printf "040000 tree $folder_oid\ttest-\n" &&
+ printf "160000 commit $head_oid\ttest.txt\n" &&
+ printf "040000 tree $folder_oid\ttest\n" &&
+ printf "100644 blob $before_oid\ttest0\n" &&
+ printf "160000 commit $head_oid\ttest-\n"
+ } >top.dup &&
+ git mktree <top.dup >tree.actual &&
+
+ {
+ printf "160000 commit $head_oid\ttest-\n" &&
+ printf "160000 commit $head_oid\ttest.txt\n" &&
+ printf "040000 tree $folder_oid\ttest\n" &&
+ printf "100644 blob $before_oid\ttest0\n"
+ } >expect &&
+ git ls-tree $(cat tree.actual) >actual &&
+
+ test_cmp expect actual
+'
+
test_done
--
gitgitgadget
* [PATCH 11/16] mktree: create tree using an in-core index
2024-06-11 18:24 [PATCH 00/16] mktree: support more flexible usage Victoria Dye via GitGitGadget
` (9 preceding siblings ...)
2024-06-11 18:24 ` [PATCH 10/16] mktree: overwrite duplicate entries Victoria Dye via GitGitGadget
@ 2024-06-11 18:24 ` Victoria Dye via GitGitGadget
2024-06-12 9:40 ` Patrick Steinhardt
2024-06-11 18:24 ` [PATCH 12/16] mktree: use iterator struct to add tree entries to index Victoria Dye via GitGitGadget
` (5 subsequent siblings)
16 siblings, 1 reply; 65+ messages in thread
From: Victoria Dye via GitGitGadget @ 2024-06-11 18:24 UTC (permalink / raw)
To: git; +Cc: Victoria Dye, Victoria Dye
From: Victoria Dye <vdye@github.com>
Rather than manually write out the contents of a tree object file, construct
an in-memory sparse index from the provided tree entries and create the tree
by writing out its corresponding cache tree.
This patch does not change the behavior of the 'mktree' command. However,
constructing the tree this way will substantially simplify future extensions
to the command's functionality, including handling deeper-than-toplevel tree
entries and applying the provided entries to an existing tree.
Signed-off-by: Victoria Dye <vdye@github.com>
---
builtin/mktree.c | 74 +++++++++++++++++++++++++++++++++++-------------
1 file changed, 55 insertions(+), 19 deletions(-)
diff --git a/builtin/mktree.c b/builtin/mktree.c
index e9e2134136f..12f68187221 100644
--- a/builtin/mktree.c
+++ b/builtin/mktree.c
@@ -4,6 +4,7 @@
* Copyright (c) Junio C Hamano, 2006, 2009
*/
#include "builtin.h"
+#include "cache-tree.h"
#include "gettext.h"
#include "hex.h"
#include "index-info.h"
@@ -24,6 +25,11 @@ struct tree_entry {
char name[FLEX_ARRAY];
};
+static inline size_t df_path_len(size_t pathlen, unsigned int mode)
+{
+ return S_ISDIR(mode) ? pathlen - 1 : pathlen;
+}
+
struct tree_entry_array {
size_t nr, alloc;
struct tree_entry **entries;
@@ -57,17 +63,25 @@ static void append_to_tree(unsigned mode, struct object_id *oid, const char *pat
if (literally) {
FLEX_ALLOC_MEM(ent, name, path, len);
} else {
+ size_t len_to_copy = len;
+
/* Normalize and validate entry path */
if (S_ISDIR(mode)) {
- while(len > 0 && is_dir_sep(path[len - 1]))
- len--;
+ while(len_to_copy > 0 && is_dir_sep(path[len_to_copy - 1]))
+ len_to_copy--;
+ len = len_to_copy + 1; /* add space for trailing slash */
}
- FLEX_ALLOC_MEM(ent, name, path, len);
+ ent = xcalloc(1, st_add3(sizeof(struct tree_entry), len, 1));
+ memcpy(ent->name, path, len_to_copy);
if (!verify_path(ent->name, mode))
die(_("invalid path '%s'"), path);
if (strchr(ent->name, '/'))
die("path %s contains slash", path);
+
+ /* Add trailing slash to dir */
+ if (S_ISDIR(mode))
+ ent->name[len - 1] = '/';
}
ent->mode = mode;
@@ -86,11 +100,14 @@ static int ent_compare(const void *a_, const void *b_, void *ctx)
struct tree_entry *b = *(struct tree_entry **)b_;
int ignore_mode = *((int *)ctx);
- if (ignore_mode)
- cmp = name_compare(a->name, a->len, b->name, b->len);
- else
- cmp = base_name_compare(a->name, a->len, a->mode,
- b->name, b->len, b->mode);
+ size_t a_len = a->len, b_len = b->len;
+
+ if (ignore_mode) {
+ a_len = df_path_len(a_len, a->mode);
+ b_len = df_path_len(b_len, b->mode);
+ }
+
+ cmp = name_compare(a->name, a_len, b->name, b_len);
return cmp ? cmp : b->order - a->order;
}
@@ -106,8 +123,8 @@ static void sort_and_dedup_tree_entry_array(struct tree_entry_array *arr)
for (size_t i = 0; i < count; i++) {
struct tree_entry *curr = arr->entries[i];
if (prev &&
- !name_compare(prev->name, prev->len,
- curr->name, curr->len)) {
+ !name_compare(prev->name, df_path_len(prev->len, prev->mode),
+ curr->name, df_path_len(curr->len, curr->mode))) {
FREE_AND_NULL(curr);
} else {
arr->entries[arr->nr++] = curr;
@@ -120,24 +137,43 @@ static void sort_and_dedup_tree_entry_array(struct tree_entry_array *arr)
QSORT_S(arr->entries, arr->nr, ent_compare, &ignore_mode);
}
+static int add_tree_entry_to_index(struct index_state *istate,
+ struct tree_entry *ent)
+{
+ struct cache_entry *ce;
+ struct strbuf ce_name = STRBUF_INIT;
+ strbuf_add(&ce_name, ent->name, ent->len);
+
+ ce = make_cache_entry(istate, ent->mode, &ent->oid, ent->name, 0, 0);
+ if (!ce)
+ return error(_("make_cache_entry failed for path '%s'"), ent->name);
+
+ add_index_entry(istate, ce, ADD_CACHE_JUST_APPEND);
+ strbuf_release(&ce_name);
+ return 0;
+}
+
static void write_tree(struct tree_entry_array *arr, struct object_id *oid)
{
- struct strbuf buf;
- size_t size = 0;
+ struct index_state istate = INDEX_STATE_INIT(the_repository);
+ istate.sparse_index = 1;
sort_and_dedup_tree_entry_array(arr);
- for (size_t i = 0; i < arr->nr; i++)
- size += 32 + arr->entries[i]->len;
- strbuf_init(&buf, size);
+ /* Construct an in-memory index from the provided entries */
for (size_t i = 0; i < arr->nr; i++) {
struct tree_entry *ent = arr->entries[i];
- strbuf_addf(&buf, "%o %s%c", ent->mode, ent->name, '\0');
- strbuf_add(&buf, ent->oid.hash, the_hash_algo->rawsz);
+
+ if (add_tree_entry_to_index(&istate, ent))
+ die(_("failed to add tree entry '%s'"), ent->name);
}
- write_object_file(buf.buf, buf.len, OBJ_TREE, oid);
- strbuf_release(&buf);
+ /* Write out new tree */
+ if (cache_tree_update(&istate, WRITE_TREE_SILENT | WRITE_TREE_MISSING_OK))
+ die(_("failed to write tree"));
+ oidcpy(oid, &istate.cache_tree->oid);
+
+ release_index(&istate);
}
static void write_tree_literally(struct tree_entry_array *arr,
--
gitgitgadget
* [PATCH 12/16] mktree: use iterator struct to add tree entries to index
2024-06-11 18:24 [PATCH 00/16] mktree: support more flexible usage Victoria Dye via GitGitGadget
` (10 preceding siblings ...)
2024-06-11 18:24 ` [PATCH 11/16] mktree: create tree using an in-core index Victoria Dye via GitGitGadget
@ 2024-06-11 18:24 ` Victoria Dye via GitGitGadget
2024-06-12 9:40 ` Patrick Steinhardt
2024-06-11 18:24 ` [PATCH 13/16] mktree: add directory-file conflict hashmap Victoria Dye via GitGitGadget
` (4 subsequent siblings)
16 siblings, 1 reply; 65+ messages in thread
From: Victoria Dye via GitGitGadget @ 2024-06-11 18:24 UTC (permalink / raw)
To: git; +Cc: Victoria Dye, Victoria Dye
From: Victoria Dye <vdye@github.com>
Create 'struct tree_entry_iterator' to manage iteration through a 'struct
tree_entry_array'. Using an iterator allows for conditional iteration; this
functionality will be necessary in later commits when performing parallel
iteration through multiple sets of tree entries.
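To illustrate what "conditional iteration" means here, the following is a
minimal standalone sketch, not the code added by this patch. It assumes a
simplified model in which entries are plain strings; 'struct name_array',
'struct name_iterator' and 'consume_before()' are hypothetical names used only
for illustration. As in the patch, advancing returns the value 'current' held
before the call.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for 'struct tree_entry_array': a sorted list of names. */
struct name_array {
	const char **names;
	size_t nr;
};

struct name_iterator {
	const char *current;	/* NULL once the array is exhausted */
	const struct name_array *arr;
	size_t idx;
};

static void name_iterator_init(struct name_iterator *iter,
			       const struct name_array *arr)
{
	assert(iter && arr);
	iter->arr = arr;
	iter->idx = 0;
	iter->current = arr->nr ? arr->names[0] : NULL;
}

/* Step forward and return the value 'current' held before the call. */
static const char *name_iterator_advance(struct name_iterator *iter)
{
	const char *prev = iter->current;

	iter->current = (iter->idx + 1 < iter->arr->nr)
		? iter->arr->names[++iter->idx]
		: NULL;
	return prev;
}

/*
 * Conditional iteration: consume only the names that sort before 'limit',
 * leaving the iterator parked on the first name that does not.
 */
static size_t consume_before(struct name_iterator *iter, const char *limit)
{
	size_t consumed = 0;

	while (iter->current && strcmp(iter->current, limit) < 0) {
		name_iterator_advance(iter);
		consumed++;
	}
	return consumed;
}
```

Because the iterator keeps its position between calls, a caller can stop
partway through one list, do other work, and resume later, which a plain
'for' loop over the array cannot express as directly.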
Signed-off-by: Victoria Dye <vdye@github.com>
---
builtin/mktree.c | 40 +++++++++++++++++++++++++++++++++++++---
1 file changed, 37 insertions(+), 3 deletions(-)
diff --git a/builtin/mktree.c b/builtin/mktree.c
index 12f68187221..bee359e9978 100644
--- a/builtin/mktree.c
+++ b/builtin/mktree.c
@@ -137,6 +137,38 @@ static void sort_and_dedup_tree_entry_array(struct tree_entry_array *arr)
QSORT_S(arr->entries, arr->nr, ent_compare, &ignore_mode);
}
+struct tree_entry_iterator {
+ struct tree_entry *current;
+
+ /* private */
+ struct {
+ struct tree_entry_array *arr;
+ size_t idx;
+ } priv;
+};
+
+static void init_tree_entry_iterator(struct tree_entry_iterator *iter,
+ struct tree_entry_array *arr)
+{
+ iter->priv.arr = arr;
+ iter->priv.idx = 0;
+ iter->current = 0 < arr->nr ? arr->entries[0] : NULL;
+}
+
+/*
+ * Advance the tree entry iterator to the next entry in the array. If no entries
+ * remain, 'current' is set to NULL. Returns the previous 'current' value of the
+ * iterator.
+ */
+static struct tree_entry *advance_tree_entry_iterator(struct tree_entry_iterator *iter)
+{
+ struct tree_entry *prev = iter->current;
+ iter->current = (iter->priv.idx + 1) < iter->priv.arr->nr
+ ? iter->priv.arr->entries[++iter->priv.idx]
+ : NULL;
+ return prev;
+}
+
static int add_tree_entry_to_index(struct index_state *istate,
struct tree_entry *ent)
{
@@ -155,15 +187,17 @@ static int add_tree_entry_to_index(struct index_state *istate,
static void write_tree(struct tree_entry_array *arr, struct object_id *oid)
{
+ struct tree_entry_iterator iter = { NULL };
+ struct tree_entry *ent;
struct index_state istate = INDEX_STATE_INIT(the_repository);
istate.sparse_index = 1;
sort_and_dedup_tree_entry_array(arr);
- /* Construct an in-memory index from the provided entries */
- for (size_t i = 0; i < arr->nr; i++) {
- struct tree_entry *ent = arr->entries[i];
+ init_tree_entry_iterator(&iter, arr);
+ /* Construct an in-memory index from the provided entries & base tree */
+ while ((ent = advance_tree_entry_iterator(&iter))) {
if (add_tree_entry_to_index(&istate, ent))
die(_("failed to add tree entry '%s'"), ent->name);
}
--
gitgitgadget
* [PATCH 13/16] mktree: add directory-file conflict hashmap
2024-06-11 18:24 [PATCH 00/16] mktree: support more flexible usage Victoria Dye via GitGitGadget
` (11 preceding siblings ...)
2024-06-11 18:24 ` [PATCH 12/16] mktree: use iterator struct to add tree entries to index Victoria Dye via GitGitGadget
@ 2024-06-11 18:24 ` Victoria Dye via GitGitGadget
2024-06-11 18:24 ` [PATCH 14/16] mktree: optionally add to an existing tree Victoria Dye via GitGitGadget
` (3 subsequent siblings)
16 siblings, 0 replies; 65+ messages in thread
From: Victoria Dye via GitGitGadget @ 2024-06-11 18:24 UTC (permalink / raw)
To: git; +Cc: Victoria Dye, Victoria Dye
From: Victoria Dye <vdye@github.com>
Create a hashmap member of a 'struct tree_entry_array' that contains all of
the (de-duplicated) provided tree entries, indexed by the hash of their path
with *no* trailing slash. This hashmap will be used in a later commit to
avoid adding a file to an existing tree that has the same path as a
directory, or vice versa.
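As a rough standalone illustration of the key used for this lookup (not the
patch's code), the sketch below assumes that directory names are stored with a
trailing slash and that the key is the name with that slash dropped. 'struct
entry', 'df_key_len()' and 'df_key_equal()' are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical simplified entry; directories are stored as "name/". */
struct entry {
	const char *name;
	size_t len;	/* length of 'name', including any trailing slash */
	int is_dir;
};

/* Length of the lookup key: the path without a directory's trailing slash. */
static size_t df_key_len(const struct entry *e)
{
	assert(!e->is_dir || e->len > 0);
	return e->is_dir ? e->len - 1 : e->len;
}

/* Nonzero when two entries map to the same key, whatever their type. */
static int df_key_equal(const struct entry *a, const struct entry *b)
{
	size_t a_len = df_key_len(a), b_len = df_key_len(b);

	return a_len == b_len && !memcmp(a->name, b->name, a_len);
}
```

Under these assumptions the directory "folder/" and the file "folder" share a
key, while "folder0" does not, so a lookup by key finds an entry of either
type at the same path.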
Signed-off-by: Victoria Dye <vdye@github.com>
---
builtin/mktree.c | 39 +++++++++++++++++++++++++++++++++++++++
1 file changed, 39 insertions(+)
diff --git a/builtin/mktree.c b/builtin/mktree.c
index bee359e9978..09b3c5c6244 100644
--- a/builtin/mktree.c
+++ b/builtin/mktree.c
@@ -16,6 +16,8 @@
#include "object-store-ll.h"
struct tree_entry {
+ struct hashmap_entry ent;
+
/* Internal */
size_t order;
@@ -33,8 +35,33 @@ static inline size_t df_path_len(size_t pathlen, unsigned int mode)
struct tree_entry_array {
size_t nr, alloc;
struct tree_entry **entries;
+
+ struct hashmap df_name_hash;
};
+static int df_name_hash_cmp(const void *cmp_data UNUSED,
+ const struct hashmap_entry *eptr,
+ const struct hashmap_entry *entry_or_key,
+ const void *keydata UNUSED)
+{
+ const struct tree_entry *e1, *e2;
+ size_t e1_len, e2_len;
+
+ e1 = container_of(eptr, const struct tree_entry, ent);
+ e2 = container_of(entry_or_key, const struct tree_entry, ent);
+
+ e1_len = df_path_len(e1->len, e1->mode);
+ e2_len = df_path_len(e2->len, e2->mode);
+
+ return e1_len != e2_len ||
+ name_compare(e1->name, e1_len, e2->name, e2_len);
+}
+
+static void init_tree_entry_array(struct tree_entry_array *arr)
+{
+ hashmap_init(&arr->df_name_hash, df_name_hash_cmp, NULL, 0);
+}
+
static void tree_entry_array_push(struct tree_entry_array *arr, struct tree_entry *ent)
{
ALLOC_GROW(arr->entries, arr->nr + 1, arr->alloc);
@@ -43,6 +70,7 @@ static void tree_entry_array_push(struct tree_entry_array *arr, struct tree_entr
static void clear_tree_entry_array(struct tree_entry_array *arr)
{
+ hashmap_clear(&arr->df_name_hash);
for (size_t i = 0; i < arr->nr; i++)
FREE_AND_NULL(arr->entries[i]);
arr->nr = 0;
@@ -50,6 +78,7 @@ static void clear_tree_entry_array(struct tree_entry_array *arr)
static void release_tree_entry_array(struct tree_entry_array *arr)
{
+ hashmap_clear(&arr->df_name_hash);
FREE_AND_NULL(arr->entries);
arr->nr = arr->alloc = 0;
}
@@ -135,6 +164,14 @@ static void sort_and_dedup_tree_entry_array(struct tree_entry_array *arr)
/* Sort again to order the entries for tree insertion */
ignore_mode = 0;
QSORT_S(arr->entries, arr->nr, ent_compare, &ignore_mode);
+
+ /* Finally, initialize the directory-file conflict hash map */
+ for (size_t i = 0; i < count; i++) {
+ struct tree_entry *curr = arr->entries[i];
+ hashmap_entry_init(&curr->ent,
+ memhash(curr->name, df_path_len(curr->len, curr->mode)));
+ hashmap_put(&arr->df_name_hash, &curr->ent);
+ }
}
struct tree_entry_iterator {
@@ -302,6 +339,8 @@ int cmd_mktree(int ac, const char **av, const char *prefix)
ac = parse_options(ac, av, prefix, option, mktree_usage, 0);
+ init_tree_entry_array(&arr);
+
do {
ret = read_index_info(nul_term_line, mktree_line, &mktree_line_data);
if (ret < 0)
--
gitgitgadget
* [PATCH 14/16] mktree: optionally add to an existing tree
2024-06-11 18:24 [PATCH 00/16] mktree: support more flexible usage Victoria Dye via GitGitGadget
` (12 preceding siblings ...)
2024-06-11 18:24 ` [PATCH 13/16] mktree: add directory-file conflict hashmap Victoria Dye via GitGitGadget
@ 2024-06-11 18:24 ` Victoria Dye via GitGitGadget
2024-06-12 9:40 ` Patrick Steinhardt
2024-06-11 18:24 ` [PATCH 15/16] mktree: allow deeper paths in input Victoria Dye via GitGitGadget
` (2 subsequent siblings)
16 siblings, 1 reply; 65+ messages in thread
From: Victoria Dye via GitGitGadget @ 2024-06-11 18:24 UTC (permalink / raw)
To: git; +Cc: Victoria Dye, Victoria Dye
From: Victoria Dye <vdye@github.com>
Allow users to specify a single "tree-ish" value as a positional argument.
If provided, the contents of the given tree serve as the basis for the new
tree (or trees, in --batch mode) created by 'mktree', on top of which all of
the stdin-provided tree entries are applied.
At a high level, the entries are "applied" to a base tree by iterating
through the base tree using 'read_tree' in parallel with iterating through
the sorted & deduplicated stdin entries via their iterator. That is, for
each call to the 'build_index_from_tree' callback of 'read_tree':
* If the iterator entry precedes the base tree entry, add it to the in-core
index, increment the iterator, and repeat.
* If the iterator entry has the same name as the base tree entry, add the
iterator entry to the index, increment the iterator, and return from the
callback to continue the 'read_tree' iteration.
* If the iterator entry follows the base tree entry, first check
'df_name_hash' to ensure we won't be adding an entry with the same name
later (with a different mode). If there's no directory/file conflict, add
the base tree entry to the index. In either case, return from the callback
to continue the 'read_tree' iteration.
Finally, once 'read_tree' is complete, add the remaining entries in the
iterator to the index and write out the index as a tree.
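The steps above can be modelled outside of Git as a merge of two sorted lists.
The sketch below is only an approximation under stated assumptions: entries
are plain strings compared with strcmp(), directories carry a trailing slash,
and a linear scan ('input_has_key()') stands in for the 'df_name_hash' lookup.
'struct ent', 'key_len()', 'input_has_key()' and 'merge_entries()' are
hypothetical names, not part of this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical simplified entry: a name plus a tag saying where it came from. */
struct ent {
	const char *name;	/* directories carry a trailing '/' */
	char source;		/* 'B' = base tree, 'I' = stdin input */
};

/* Length of the name without a directory's trailing slash. */
static size_t key_len(const char *name)
{
	size_t len = strlen(name);

	return (len && name[len - 1] == '/') ? len - 1 : len;
}

/* Linear stand-in for the hashmap lookup: is this path also in the input? */
static int input_has_key(const struct ent *input, size_t nr, const char *name)
{
	size_t klen = key_len(name);

	for (size_t i = 0; i < nr; i++)
		if (key_len(input[i].name) == klen &&
		    !memcmp(input[i].name, name, klen))
			return 1;
	return 0;
}

/*
 * Merge sorted 'input' over sorted 'base'. 'out' must have room for
 * base_nr + input_nr entries. Returns the number of entries written.
 */
static size_t merge_entries(const struct ent *base, size_t base_nr,
			    const struct ent *input, size_t input_nr,
			    struct ent *out)
{
	size_t in = 0, n = 0;

	assert(out);
	for (size_t b = 0; b < base_nr; b++) {
		int replaced = 0;

		/* Emit input entries that precede or equal the base entry. */
		while (in < input_nr) {
			int cmp = strcmp(input[in].name, base[b].name);

			if (cmp > 0)
				break;
			out[n++] = input[in++];
			if (!cmp) {
				replaced = 1;
				break;
			}
		}

		/* Drop the base entry if the input overrides its path. */
		if (replaced || input_has_key(input, input_nr, base[b].name))
			continue;
		out[n++] = base[b];
	}

	/* Base tree exhausted: append whatever input remains. */
	while (in < input_nr)
		out[n++] = input[in++];
	return n;
}
```

With a base of "test-/", "test.txt", "test/", "test0" and an input of "a/",
"test", "test.xyz", "z", the base directory "test/" is dropped because the
input supplies a file at the same path; every other base entry survives.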
Signed-off-by: Victoria Dye <vdye@github.com>
---
Documentation/git-mktree.txt | 7 +-
builtin/mktree.c | 134 ++++++++++++++++++++++++++++++-----
t/t1010-mktree.sh | 36 ++++++++++
3 files changed, 157 insertions(+), 20 deletions(-)
diff --git a/Documentation/git-mktree.txt b/Documentation/git-mktree.txt
index afbc846d077..99abd3c31a6 100644
--- a/Documentation/git-mktree.txt
+++ b/Documentation/git-mktree.txt
@@ -9,7 +9,7 @@ git-mktree - Build a tree-object from formatted tree entries
SYNOPSIS
--------
[verse]
-'git mktree' [-z] [--missing] [--literally] [--batch]
+'git mktree' [-z] [--missing] [--literally] [--batch] [--] [<tree-ish>]
DESCRIPTION
-----------
@@ -40,6 +40,11 @@ OPTIONS
optional. Note - if the `-z` option is used, lines are terminated
with NUL.
+<tree-ish>::
+ If provided, the tree entries provided in stdin are added to this tree
+ rather than a new empty one, replacing existing entries with identical
+ names. Not compatible with `--literally`.
+
INPUT FORMAT
------------
Tree entries may be specified in any of the formats compatible with the
diff --git a/builtin/mktree.c b/builtin/mktree.c
index 09b3c5c6244..9e9d2554cad 100644
--- a/builtin/mktree.c
+++ b/builtin/mktree.c
@@ -12,7 +12,9 @@
#include "read-cache-ll.h"
#include "strbuf.h"
#include "tree.h"
+#include "object-name.h"
#include "parse-options.h"
+#include "pathspec.h"
#include "object-store-ll.h"
struct tree_entry {
@@ -206,45 +208,122 @@ static struct tree_entry *advance_tree_entry_iterator(struct tree_entry_iterator
return prev;
}
-static int add_tree_entry_to_index(struct index_state *istate,
+struct build_index_data {
+ struct tree_entry_iterator iter;
+ struct hashmap *df_name_hash;
+ struct index_state istate;
+};
+
+static int add_tree_entry_to_index(struct build_index_data *data,
struct tree_entry *ent)
{
struct cache_entry *ce;
- struct strbuf ce_name = STRBUF_INIT;
- strbuf_add(&ce_name, ent->name, ent->len);
-
- ce = make_cache_entry(istate, ent->mode, &ent->oid, ent->name, 0, 0);
+ ce = make_cache_entry(&data->istate, ent->mode, &ent->oid, ent->name, 0, 0);
if (!ce)
return error(_("make_cache_entry failed for path '%s'"), ent->name);
- add_index_entry(istate, ce, ADD_CACHE_JUST_APPEND);
- strbuf_release(&ce_name);
+ add_index_entry(&data->istate, ce, ADD_CACHE_JUST_APPEND);
return 0;
}
-static void write_tree(struct tree_entry_array *arr, struct object_id *oid)
+static int build_index_from_tree(const struct object_id *oid,
+ struct strbuf *base, const char *filename,
+ unsigned mode, void *context)
{
- struct tree_entry_iterator iter = { NULL };
+ int result;
+ struct tree_entry *base_tree_ent;
+ struct build_index_data *cbdata = context;
+ size_t filename_len = strlen(filename);
+ size_t path_len = S_ISDIR(mode) ? st_add3(filename_len, base->len, 1)
+ : st_add(filename_len, base->len);
+
+ /* Create a tree entry from the current entry in read_tree iteration */
+ base_tree_ent = xcalloc(1, st_add3(sizeof(struct tree_entry), path_len, 1));
+ base_tree_ent->len = path_len;
+ base_tree_ent->mode = mode;
+ oidcpy(&base_tree_ent->oid, oid);
+
+ memcpy(base_tree_ent->name, base->buf, base->len);
+ memcpy(base_tree_ent->name + base->len, filename, filename_len);
+ if (S_ISDIR(mode))
+ base_tree_ent->name[base_tree_ent->len - 1] = '/';
+
+ while (cbdata->iter.current) {
+ struct tree_entry *ent = cbdata->iter.current;
+
+ int cmp = name_compare(ent->name, ent->len,
+ base_tree_ent->name, base_tree_ent->len);
+ if (!cmp || cmp < 0) {
+ advance_tree_entry_iterator(&cbdata->iter);
+
+ if (add_tree_entry_to_index(cbdata, ent) < 0) {
+ result = error(_("failed to add tree entry '%s'"), ent->name);
+ goto cleanup_and_return;
+ }
+
+ if (!cmp) {
+ result = 0;
+ goto cleanup_and_return;
+ } else
+ continue;
+ }
+
+ break;
+ }
+
+ /*
+ * If the tree entry should be replaced with an entry with the same name
+ * (but different mode), skip it.
+ */
+ hashmap_entry_init(&base_tree_ent->ent,
+ memhash(base_tree_ent->name, df_path_len(base_tree_ent->len, base_tree_ent->mode)));
+ if (hashmap_get_entry(cbdata->df_name_hash, base_tree_ent, ent, NULL)) {
+ result = 0;
+ goto cleanup_and_return;
+ }
+
+ if (add_tree_entry_to_index(cbdata, base_tree_ent)) {
+ result = -1;
+ goto cleanup_and_return;
+ }
+
+ result = 0;
+
+cleanup_and_return:
+ FREE_AND_NULL(base_tree_ent);
+ return result;
+}
+
+static void write_tree(struct tree_entry_array *arr, struct tree *base_tree,
+ struct object_id *oid)
+{
+ struct build_index_data cbdata = { 0 };
struct tree_entry *ent;
- struct index_state istate = INDEX_STATE_INIT(the_repository);
- istate.sparse_index = 1;
+ struct pathspec ps = { 0 };
sort_and_dedup_tree_entry_array(arr);
- init_tree_entry_iterator(&iter, arr);
+ index_state_init(&cbdata.istate, the_repository);
+ cbdata.istate.sparse_index = 1;
+ init_tree_entry_iterator(&cbdata.iter, arr);
+ cbdata.df_name_hash = &arr->df_name_hash;
/* Construct an in-memory index from the provided entries & base tree */
- while ((ent = advance_tree_entry_iterator(&iter))) {
- if (add_tree_entry_to_index(&istate, ent))
+ if (base_tree &&
+ read_tree(the_repository, base_tree, &ps, build_index_from_tree, &cbdata) < 0)
+ die(_("failed to create tree"));
+
+ while ((ent = advance_tree_entry_iterator(&cbdata.iter))) {
+ if (add_tree_entry_to_index(&cbdata, ent))
die(_("failed to add tree entry '%s'"), ent->name);
}
/* Write out new tree */
- if (cache_tree_update(&istate, WRITE_TREE_SILENT | WRITE_TREE_MISSING_OK))
+ if (cache_tree_update(&cbdata.istate, WRITE_TREE_SILENT | WRITE_TREE_MISSING_OK))
die(_("failed to write tree"));
- oidcpy(oid, &istate.cache_tree->oid);
+ oidcpy(oid, &cbdata.istate.cache_tree->oid);
- release_index(&istate);
+ release_index(&cbdata.istate);
}
static void write_tree_literally(struct tree_entry_array *arr,
@@ -268,7 +347,7 @@ static void write_tree_literally(struct tree_entry_array *arr,
}
static const char *mktree_usage[] = {
- "git mktree [-z] [--missing] [--literally] [--batch]",
+ "git mktree [-z] [--missing] [--literally] [--batch] [--] [<tree-ish>]",
NULL
};
@@ -326,6 +405,7 @@ int cmd_mktree(int ac, const char **av, const char *prefix)
int is_batch_mode = 0;
struct tree_entry_array arr = { 0 };
struct mktree_line_data mktree_line_data = { .arr = &arr };
+ struct tree *base_tree = NULL;
int ret;
const struct option option[] = {
@@ -338,6 +418,22 @@ int cmd_mktree(int ac, const char **av, const char *prefix)
};
ac = parse_options(ac, av, prefix, option, mktree_usage, 0);
+ if (ac > 1)
+ usage_with_options(mktree_usage, option);
+
+ if (ac) {
+ struct object_id base_tree_oid;
+
+ if (mktree_line_data.literally)
+ die(_("option '%s' and tree-ish cannot be used together"), "--literally");
+
+ if (repo_get_oid(the_repository, av[0], &base_tree_oid))
+ die(_("not a valid object name %s"), av[0]);
+
+ base_tree = parse_tree_indirect(&base_tree_oid);
+ if (!base_tree)
+ die(_("not a tree object: %s"), oid_to_hex(&base_tree_oid));
+ }
init_tree_entry_array(&arr);
@@ -361,7 +457,7 @@ int cmd_mktree(int ac, const char **av, const char *prefix)
if (mktree_line_data.literally)
write_tree_literally(&arr, &oid);
else
- write_tree(&arr, &oid);
+ write_tree(&arr, base_tree, &oid);
puts(oid_to_hex(&oid));
fflush(stdout);
}
diff --git a/t/t1010-mktree.sh b/t/t1010-mktree.sh
index 956692347f0..ea5a011405e 100755
--- a/t/t1010-mktree.sh
+++ b/t/t1010-mktree.sh
@@ -238,4 +238,40 @@ test_expect_success 'mktree with duplicate entries' '
test_cmp expect actual
'
+test_expect_success 'mktree with base tree' '
+ tree_oid=$(cat tree) &&
+ folder_oid=$(git rev-parse ${tree_oid}:folder) &&
+ before_oid=$(git rev-parse ${tree_oid}:before) &&
+ head_oid=$(git rev-parse HEAD) &&
+
+ {
+ printf "040000 tree $folder_oid\ttest\n" &&
+ printf "100644 blob $before_oid\ttest.txt\n" &&
+ printf "040000 tree $folder_oid\ttest-\n" &&
+ printf "160000 commit $head_oid\ttest0\n"
+ } >top.base &&
+ git mktree <top.base >tree.base &&
+
+ {
+ printf "100755 blob $before_oid\tz\n" &&
+ printf "160000 commit $head_oid\ttest.xyz\n" &&
+ printf "040000 tree $folder_oid\ta\n" &&
+ printf "100644 blob $before_oid\ttest\n"
+ } >top.append &&
+ git mktree $(cat tree.base) <top.append >tree.actual &&
+
+ {
+ printf "040000 tree $folder_oid\ta\n" &&
+ printf "100644 blob $before_oid\ttest\n" &&
+ printf "040000 tree $folder_oid\ttest-\n" &&
+ printf "100644 blob $before_oid\ttest.txt\n" &&
+ printf "160000 commit $head_oid\ttest.xyz\n" &&
+ printf "160000 commit $head_oid\ttest0\n" &&
+ printf "100755 blob $before_oid\tz\n"
+ } >expect &&
+ git ls-tree $(cat tree.actual) >actual &&
+
+ test_cmp expect actual
+'
+
test_done
--
gitgitgadget
* [PATCH 15/16] mktree: allow deeper paths in input
2024-06-11 18:24 [PATCH 00/16] mktree: support more flexible usage Victoria Dye via GitGitGadget
` (13 preceding siblings ...)
2024-06-11 18:24 ` [PATCH 14/16] mktree: optionally add to an existing tree Victoria Dye via GitGitGadget
@ 2024-06-11 18:24 ` Victoria Dye via GitGitGadget
2024-06-11 18:24 ` [PATCH 16/16] mktree: remove entries when mode is 0 Victoria Dye via GitGitGadget
2024-06-19 21:57 ` [PATCH v2 00/17] mktree: support more flexible usage Victoria Dye via GitGitGadget
16 siblings, 0 replies; 65+ messages in thread
From: Victoria Dye via GitGitGadget @ 2024-06-11 18:24 UTC (permalink / raw)
To: git; +Cc: Victoria Dye, Victoria Dye
From: Victoria Dye <vdye@github.com>
Update 'git mktree' to handle entries nested inside of directories (e.g.
'path/to/a/file.txt'). This functionality requires a series of changes:
* In 'sort_and_dedup_tree_entry_array()', remove entries whose containing
directory is specified after them in input order (the later directory entry
takes precedence).
* Also in 'sort_and_dedup_tree_entry_array()', mark directories that contain
entries that come after them in input order (e.g., 'folder/' followed by
'folder/file.txt') as "need to expand".
* In 'add_tree_entry_to_index()', if a tree entry is marked as "need to
expand", recurse into it with 'read_tree_at()' & 'build_index_from_tree'.
* In 'build_index_from_tree()', if a user-specified tree entry is contained
within the current iterated entry, return 'READ_TREE_RECURSIVE' to recurse
into the iterated tree.
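The first two steps can be sketched in isolation as a single pass over the
sorted entries with a stack of enclosing directories. This is a simplified
model rather than the patch's implementation: it assumes entries are already
sorted by name, that directories end in '/', and that the caller supplies
scratch space for the stack. 'struct ent', 'is_dir()', 'has_prefix()' and
'prune_nested()' are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical simplified entry. */
struct ent {
	const char *name;	/* directories end with '/' */
	size_t order;		/* position of the entry in the input */
	int expand_dir;		/* set when a later entry lives inside this dir */
};

static int is_dir(const struct ent *e)
{
	size_t len = strlen(e->name);

	return len && e->name[len - 1] == '/';
}

static int has_prefix(const char *s, const char *prefix)
{
	return !strncmp(s, prefix, strlen(prefix));
}

/*
 * 'ents' must be sorted by name; 'stack' needs room for 'nr' pointers.
 * Compacts 'ents' in place and returns the number of entries kept.
 */
static size_t prune_nested(struct ent *ents, size_t nr, struct ent **stack)
{
	size_t depth = 0, kept = 0;

	assert(stack || !nr);
	for (size_t i = 0; i < nr; i++) {
		struct ent curr = ents[i];
		int skip = 0;

		/* Pop directories that do not contain the current entry. */
		while (depth && !has_prefix(curr.name, stack[depth - 1]->name))
			depth--;

		if (depth) {
			struct ent *parent = stack[depth - 1];

			if (parent->order > curr.order)
				skip = 1;		/* directory given later wins */
			else
				parent->expand_dir = 1;	/* merge entry into dir */
		}
		if (skip)
			continue;

		ents[kept] = curr;
		if (is_dir(&ents[kept]))
			stack[depth++] = &ents[kept];
		kept++;
	}
	return kept;
}
```

In this model, "folder/" given after "folder/one" causes "folder/one" to be
dropped, whereas "early/" given before "early/one" keeps both and marks
"early/" as needing expansion.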
Signed-off-by: Victoria Dye <vdye@github.com>
---
Documentation/git-mktree.txt | 5 ++
builtin/mktree.c | 101 ++++++++++++++++++++++++++++++---
t/t1010-mktree.sh | 107 +++++++++++++++++++++++++++++++++--
3 files changed, 200 insertions(+), 13 deletions(-)
diff --git a/Documentation/git-mktree.txt b/Documentation/git-mktree.txt
index 99abd3c31a6..db90fdcdc8f 100644
--- a/Documentation/git-mktree.txt
+++ b/Documentation/git-mktree.txt
@@ -50,6 +50,11 @@ INPUT FORMAT
Tree entries may be specified in any of the formats compatible with the
`--index-info` option to linkgit:git-update-index[1].
+Entries may use full pathnames containing directory separators to specify
+entries nested within one or more directories. These entries are inserted into
+the appropriate tree in the base tree-ish if one exists. Otherwise, empty parent
+trees are created to contain the entries.
+
The order of the tree entries is normalized by `mktree` so pre-sorting the input
by path is not required. Multiple entries provided with the same path are
deduplicated, with only the last one specified added to the tree.
diff --git a/builtin/mktree.c b/builtin/mktree.c
index 9e9d2554cad..00b77869a56 100644
--- a/builtin/mktree.c
+++ b/builtin/mktree.c
@@ -22,6 +22,7 @@ struct tree_entry {
/* Internal */
size_t order;
+ int expand_dir;
unsigned mode;
struct object_id oid;
@@ -39,6 +40,7 @@ struct tree_entry_array {
struct tree_entry **entries;
struct hashmap df_name_hash;
+ int has_nested_entries;
};
static int df_name_hash_cmp(const void *cmp_data UNUSED,
@@ -70,6 +72,13 @@ static void tree_entry_array_push(struct tree_entry_array *arr, struct tree_entr
arr->entries[arr->nr++] = ent;
}
+static struct tree_entry *tree_entry_array_pop(struct tree_entry_array *arr)
+{
+ if (!arr->nr)
+ return NULL;
+ return arr->entries[--arr->nr];
+}
+
static void clear_tree_entry_array(struct tree_entry_array *arr)
{
hashmap_clear(&arr->df_name_hash);
@@ -107,8 +116,10 @@ static void append_to_tree(unsigned mode, struct object_id *oid, const char *pat
if (!verify_path(ent->name, mode))
die(_("invalid path '%s'"), path);
- if (strchr(ent->name, '/'))
- die("path %s contains slash", path);
+
+ /* mark has_nested_entries if needed */
+ if (!arr->has_nested_entries && strchr(ent->name, '/'))
+ arr->has_nested_entries = 1;
/* Add trailing slash to dir */
if (S_ISDIR(mode))
@@ -167,6 +178,46 @@ static void sort_and_dedup_tree_entry_array(struct tree_entry_array *arr)
ignore_mode = 0;
QSORT_S(arr->entries, arr->nr, ent_compare, &ignore_mode);
+ if (arr->has_nested_entries) {
+ struct tree_entry_array parent_dir_ents = { 0 };
+
+ count = arr->nr;
+ arr->nr = 0;
+
+ /* Remove any entries where one of its parent dirs has a higher 'order' */
+ for (size_t i = 0; i < count; i++) {
+ const char *skipped_prefix;
+ struct tree_entry *parent;
+ struct tree_entry *curr = arr->entries[i];
+ int skip_entry = 0;
+
+ while ((parent = tree_entry_array_pop(&parent_dir_ents))) {
+ if (!skip_prefix(curr->name, parent->name, &skipped_prefix))
+ continue;
+
+ /* entry in dir, so we push the parent back onto the stack */
+ tree_entry_array_push(&parent_dir_ents, parent);
+
+ if (parent->order > curr->order)
+ skip_entry = 1;
+ else
+ parent->expand_dir = 1;
+
+ break;
+ }
+
+ if (!skip_entry) {
+ arr->entries[arr->nr++] = curr;
+ if (S_ISDIR(curr->mode))
+ tree_entry_array_push(&parent_dir_ents, curr);
+ } else {
+ FREE_AND_NULL(curr);
+ }
+ }
+
+ release_tree_entry_array(&parent_dir_ents);
+ }
+
/* Finally, initialize the directory-file conflict hash map */
for (size_t i = 0; i < count; i++) {
struct tree_entry *curr = arr->entries[i];
@@ -214,15 +265,40 @@ struct build_index_data {
struct index_state istate;
};
+static int build_index_from_tree(const struct object_id *oid,
+ struct strbuf *base, const char *filename,
+ unsigned mode, void *context);
+
static int add_tree_entry_to_index(struct build_index_data *data,
struct tree_entry *ent)
{
- struct cache_entry *ce;
- ce = make_cache_entry(&data->istate, ent->mode, &ent->oid, ent->name, 0, 0);
- if (!ce)
- return error(_("make_cache_entry failed for path '%s'"), ent->name);
+ if (ent->expand_dir) {
+ int ret = 0;
+ struct pathspec ps = { 0 };
+ struct tree *subtree = parse_tree_indirect(&ent->oid);
+ struct strbuf base_path = STRBUF_INIT;
+ strbuf_add(&base_path, ent->name, ent->len);
+
+ if (!subtree)
+ ret = error(_("not a tree object: %s"), oid_to_hex(&ent->oid));
+ else if (read_tree_at(the_repository, subtree, &base_path, 0, &ps,
+ build_index_from_tree, data) < 0)
+ ret = -1;
+
+ strbuf_release(&base_path);
+ if (ret)
+ return ret;
+
+ } else {
+ struct cache_entry *ce = make_cache_entry(&data->istate,
+ ent->mode, &ent->oid,
+ ent->name, 0, 0);
+ if (!ce)
+ return error(_("make_cache_entry failed for path '%s'"), ent->name);
+
+ add_index_entry(&data->istate, ce, ADD_CACHE_JUST_APPEND);
+ }
- add_index_entry(&data->istate, ce, ADD_CACHE_JUST_APPEND);
return 0;
}
@@ -249,10 +325,12 @@ static int build_index_from_tree(const struct object_id *oid,
base_tree_ent->name[base_tree_ent->len - 1] = '/';
while (cbdata->iter.current) {
+ const char *skipped_prefix;
struct tree_entry *ent = cbdata->iter.current;
+ int cmp;
- int cmp = name_compare(ent->name, ent->len,
- base_tree_ent->name, base_tree_ent->len);
+ cmp = name_compare(ent->name, ent->len,
+ base_tree_ent->name, base_tree_ent->len);
if (!cmp || cmp < 0) {
advance_tree_entry_iterator(&cbdata->iter);
@@ -266,6 +344,11 @@ static int build_index_from_tree(const struct object_id *oid,
goto cleanup_and_return;
} else
continue;
+ } else if (skip_prefix(ent->name, base_tree_ent->name, &skipped_prefix) &&
+ S_ISDIR(base_tree_ent->mode)) {
+ /* The entry is in the current traversed tree entry, so we recurse */
+ result = READ_TREE_RECURSIVE;
+ goto cleanup_and_return;
}
break;
diff --git a/t/t1010-mktree.sh b/t/t1010-mktree.sh
index ea5a011405e..1d6365141fc 100755
--- a/t/t1010-mktree.sh
+++ b/t/t1010-mktree.sh
@@ -89,12 +89,21 @@ test_expect_success 'mktree with invalid submodule OIDs' '
grep "object $tree_oid is a tree but specified type was (commit)" err
'
-test_expect_success 'mktree refuses to read ls-tree -r output (1)' '
- test_must_fail git mktree <all
+test_expect_success 'mktree reads ls-tree -r output (1)' '
+ git mktree <all >actual &&
+ test_cmp tree actual
'
-test_expect_success 'mktree refuses to read ls-tree -r output (2)' '
- test_must_fail git mktree <all.withsub
+test_expect_success 'mktree reads ls-tree -r output (2)' '
+ git mktree <all.withsub >actual &&
+ test_cmp tree.withsub actual
+'
+
+test_expect_success 'mktree de-duplicates files inside directories' '
+ git ls-tree $(cat tree) >everything &&
+ cat <all >top_and_all &&
+ git mktree <top_and_all >actual &&
+ test_cmp tree actual
'
test_expect_success 'mktree fails on malformed input' '
@@ -238,6 +247,50 @@ test_expect_success 'mktree with duplicate entries' '
test_cmp expect actual
'
+test_expect_success 'mktree adds entry after nested entry' '
+ tree_oid=$(cat tree) &&
+ folder_oid=$(git rev-parse ${tree_oid}:folder) &&
+ one_oid=$(git rev-parse ${tree_oid}:folder/one) &&
+
+ {
+ printf "040000 tree $folder_oid\tearly\n" &&
+ printf "100644 blob $one_oid\tearly/one\n" &&
+ printf "100644 blob $one_oid\tlater\n" &&
+ printf "040000 tree $EMPTY_TREE\tnew-tree\n" &&
+ printf "100644 blob $one_oid\tnew-tree/one\n" &&
+ printf "100644 blob $one_oid\tzzz\n"
+ } >top.rec &&
+ git mktree <top.rec >tree.actual &&
+
+ {
+ printf "040000 tree $folder_oid\tearly\n" &&
+ printf "100644 blob $one_oid\tlater\n" &&
+ printf "040000 tree $folder_oid\tnew-tree\n" &&
+ printf "100644 blob $one_oid\tzzz\n"
+ } >expect &&
+ git ls-tree $(cat tree.actual) >actual &&
+
+ test_cmp expect actual
+'
+
+test_expect_success 'mktree inserts entries into directories' '
+ folder_oid=$(git rev-parse ${tree_oid}:folder) &&
+ one_oid=$(git rev-parse ${tree_oid}:folder/one) &&
+ blob_oid=$(git rev-parse ${tree_oid}:before) &&
+ {
+ printf "040000 tree $folder_oid\tfolder\n" &&
+ printf "100644 blob $blob_oid\tfolder/two\n"
+ } | git mktree >actual &&
+
+ {
+ printf "100644 blob $one_oid\tfolder/one\n" &&
+ printf "100644 blob $blob_oid\tfolder/two\n"
+ } >expect &&
+ git ls-tree -r $(cat actual) >actual &&
+
+ test_cmp expect actual
+'
+
test_expect_success 'mktree with base tree' '
tree_oid=$(cat tree) &&
folder_oid=$(git rev-parse ${tree_oid}:folder) &&
@@ -274,4 +327,50 @@ test_expect_success 'mktree with base tree' '
test_cmp expect actual
'
+test_expect_success 'mktree with base tree (deep)' '
+ tree_oid=$(cat tree) &&
+ folder_oid=$(git rev-parse ${tree_oid}:folder) &&
+ before_oid=$(git rev-parse ${tree_oid}:before) &&
+ folder_one_oid=$(git rev-parse ${tree_oid}:folder/one) &&
+ head_oid=$(git rev-parse HEAD) &&
+
+ {
+ printf "100755 blob $before_oid\tfolder/before\n" &&
+ printf "100644 blob $before_oid\tfolder/one.txt\n" &&
+ printf "160000 commit $head_oid\tfolder/sub\n" &&
+ printf "040000 tree $folder_oid\tfolder/one\n" &&
+ printf "040000 tree $folder_oid\tfolder/one/deeper\n"
+ } >top.append &&
+ git mktree <top.append $(cat tree) >tree.actual &&
+
+ {
+ printf "100755 blob $before_oid\tfolder/before\n" &&
+ printf "100644 blob $before_oid\tfolder/one.txt\n" &&
+ printf "100644 blob $folder_one_oid\tfolder/one/deeper/one\n" &&
+ printf "100644 blob $folder_one_oid\tfolder/one/one\n" &&
+ printf "160000 commit $head_oid\tfolder/sub\n"
+ } >expect &&
+ git ls-tree -r $(cat tree.actual) -- folder/ >actual &&
+
+ test_cmp expect actual
+'
+
+test_expect_success 'mktree fails on directory-file conflict' '
+ tree_oid="$(cat tree)" &&
+ blob_oid="$(git rev-parse $tree_oid:folder.txt)" &&
+
+ {
+ printf "100644 blob $blob_oid\ttest\n" &&
+ printf "100644 blob $blob_oid\ttest/deeper\n"
+ } |
+ test_must_fail git mktree 2>err &&
+ grep "You have both test and test/deeper" err &&
+
+ {
+ printf "100644 blob $blob_oid\tfolder/one/deeper/deep\n"
+ } |
+ test_must_fail git mktree $tree_oid 2>err &&
+ grep "You have both folder/one and folder/one/deeper/deep" err
+'
+
test_done
--
gitgitgadget
* [PATCH 16/16] mktree: remove entries when mode is 0
2024-06-11 18:24 [PATCH 00/16] mktree: support more flexible usage Victoria Dye via GitGitGadget
` (14 preceding siblings ...)
2024-06-11 18:24 ` [PATCH 15/16] mktree: allow deeper paths in input Victoria Dye via GitGitGadget
@ 2024-06-11 18:24 ` Victoria Dye via GitGitGadget
2024-06-19 21:57 ` [PATCH v2 00/17] mktree: support more flexible usage Victoria Dye via GitGitGadget
16 siblings, 0 replies; 65+ messages in thread
From: Victoria Dye via GitGitGadget @ 2024-06-11 18:24 UTC (permalink / raw)
To: git; +Cc: Victoria Dye, Victoria Dye
From: Victoria Dye <vdye@github.com>
If tree entries are specified with a mode with value '0', remove them from
the tree instead of adding/updating them. If the mode is '0', both the
provided type string (if specified) and the object ID of the entry are
ignored.
Note that entries with mode '0' are added to the 'struct tree_entry_array'
with a trailing slash so that they are always treated like directories. This is
a bit of a hack to ensure that the removal supersedes any preceding entries
with matching names, as well as any nested inside a directory matching its
name.
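
The trailing-slash normalization described above can be illustrated with a
small standalone sketch. It is not the patch's code: 'is_dir_like()' and
'normalize_entry_name()' are hypothetical helpers, the mode constants are
local stand-ins for S_IFMT/S_IFDIR, and only 'df_path_len()' mirrors a
function that appears in the diff below.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Local stand-ins for S_IFMT / S_IFDIR so the sketch is self-contained. */
#define SKETCH_S_IFMT  0170000u
#define SKETCH_S_IFDIR 0040000u

/* Hypothetical helper: directories and removals (mode 0) are "dir-like". */
static int is_dir_like(unsigned mode)
{
	return (mode & SKETCH_S_IFMT) == SKETCH_S_IFDIR || !mode;
}

/*
 * Hypothetical helper: return a newly allocated entry name. Dir-like
 * entries have trailing slashes collapsed to exactly one; all other
 * entries are copied unchanged.
 */
static char *normalize_entry_name(unsigned mode, const char *path)
{
	size_t len = strlen(path);
	char *name;

	if (!is_dir_like(mode)) {
		name = malloc(len + 1);
		if (!name)
			abort();
		memcpy(name, path, len + 1);
		return name;
	}

	/* Strip any trailing slashes, then add back a single one. */
	while (len > 0 && path[len - 1] == '/')
		len--;
	name = malloc(len + 2);
	if (!name)
		abort();
	memcpy(name, path, len);
	name[len] = '/';
	name[len + 1] = '\0';
	return name;
}

/* Same shape as the patched df_path_len(): ignore the trailing slash. */
static size_t df_path_len(size_t pathlen, unsigned mode)
{
	return is_dir_like(mode) ? pathlen - 1 : pathlen;
}
```

With this, a removal of "folder" is stored as "folder/", the same form a tree
entry for "folder" would take, which is presumably what lets one removal
cover both a same-named entry and anything nested beneath it.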
Signed-off-by: Victoria Dye <vdye@github.com>
---
Documentation/git-mktree.txt | 4 +++
builtin/mktree.c | 64 ++++++++++++++++++++----------------
t/t1010-mktree.sh | 38 +++++++++++++++++++++
3 files changed, 77 insertions(+), 29 deletions(-)
diff --git a/Documentation/git-mktree.txt b/Documentation/git-mktree.txt
index db90fdcdc8f..a660438c67f 100644
--- a/Documentation/git-mktree.txt
+++ b/Documentation/git-mktree.txt
@@ -55,6 +55,10 @@ entries nested within one or more directories. These entries are inserted into
the appropriate tree in the base tree-ish if one exists. Otherwise, empty parent
trees are created to contain the entries.
+An entry with a mode of "0" will remove an entry of the same name from the base
+tree-ish. If no tree-ish argument is given, or the entry does not exist in that
+tree, the entry is ignored.
+
The order of the tree entries is normalized by `mktree` so pre-sorting the input
by path is not required. Multiple entries provided with the same path are
deduplicated, with only the last one specified added to the tree.
diff --git a/builtin/mktree.c b/builtin/mktree.c
index 00b77869a56..e94c9ca7e87 100644
--- a/builtin/mktree.c
+++ b/builtin/mktree.c
@@ -32,7 +32,7 @@ struct tree_entry {
static inline size_t df_path_len(size_t pathlen, unsigned int mode)
{
- return S_ISDIR(mode) ? pathlen - 1 : pathlen;
+ return (S_ISDIR(mode) || !mode) ? pathlen - 1 : pathlen;
}
struct tree_entry_array {
@@ -106,7 +106,7 @@ static void append_to_tree(unsigned mode, struct object_id *oid, const char *pat
size_t len_to_copy = len;
/* Normalize and validate entry path */
- if (S_ISDIR(mode)) {
+ if (S_ISDIR(mode) || !mode) {
while(len_to_copy > 0 && is_dir_sep(path[len_to_copy - 1]))
len_to_copy--;
len = len_to_copy + 1; /* add space for trailing slash */
@@ -122,7 +122,7 @@ static void append_to_tree(unsigned mode, struct object_id *oid, const char *pat
arr->has_nested_entries = 1;
/* Add trailing slash to dir */
- if (S_ISDIR(mode))
+ if (S_ISDIR(mode) || !mode)
ent->name[len - 1] = '/';
}
@@ -208,7 +208,7 @@ static void sort_and_dedup_tree_entry_array(struct tree_entry_array *arr)
if (!skip_entry) {
arr->entries[arr->nr++] = curr;
- if (S_ISDIR(curr->mode))
+ if (S_ISDIR(curr->mode) || !curr->mode)
tree_entry_array_push(&parent_dir_ents, curr);
} else {
FREE_AND_NULL(curr);
@@ -272,6 +272,9 @@ static int build_index_from_tree(const struct object_id *oid,
static int add_tree_entry_to_index(struct build_index_data *data,
struct tree_entry *ent)
{
+ if (!ent->mode)
+ return 0;
+
if (ent->expand_dir) {
int ret = 0;
struct pathspec ps = { 0 };
@@ -445,36 +448,39 @@ static int mktree_line(unsigned int mode, struct object_id *oid,
const char *path, void *cbdata)
{
struct mktree_line_data *data = cbdata;
- enum object_type mode_type = object_type(mode);
- struct object_info oi = OBJECT_INFO_INIT;
- enum object_type parsed_obj_type;
- if (obj_type && mode_type != obj_type)
- die("object type (%s) doesn't match mode type (%s)",
- type_name(obj_type), type_name(mode_type));
+ if (mode) {
+ struct object_info oi = OBJECT_INFO_INIT;
+ enum object_type parsed_obj_type;
+ enum object_type mode_type = object_type(mode);
- oi.typep = &parsed_obj_type;
+ if (obj_type && mode_type != obj_type)
+ die("object type (%s) doesn't match mode type (%s)",
+ type_name(obj_type), type_name(mode_type));
- if (oid_object_info_extended(the_repository, oid, &oi,
- OBJECT_INFO_LOOKUP_REPLACE |
- OBJECT_INFO_QUICK |
- OBJECT_INFO_SKIP_FETCH_OBJECT) < 0)
- parsed_obj_type = -1;
+ oi.typep = &parsed_obj_type;
- if (parsed_obj_type < 0) {
- if (data->allow_missing || S_ISGITLINK(mode)) {
- ; /* no problem - missing objects & submodules are presumed to be of the right type */
- } else {
- die("entry '%s' object %s is unavailable", path, oid_to_hex(oid));
+ if (oid_object_info_extended(the_repository, oid, &oi,
+ OBJECT_INFO_LOOKUP_REPLACE |
+ OBJECT_INFO_QUICK |
+ OBJECT_INFO_SKIP_FETCH_OBJECT) < 0)
+ parsed_obj_type = -1;
+
+ if (parsed_obj_type < 0) {
+ if (data->allow_missing || S_ISGITLINK(mode)) {
+ ; /* no problem - missing objects & submodules are presumed to be of the right type */
+ } else {
+ die("entry '%s' object %s is unavailable", path, oid_to_hex(oid));
+ }
+ } else if (parsed_obj_type != mode_type) {
+ /*
+ * The object exists but is of the wrong type.
+ * This is a problem regardless of allow_missing
+ * because the new tree entry will never be correct.
+ */
+ die("entry '%s' object %s is a %s but specified type was (%s)",
+ path, oid_to_hex(oid), type_name(parsed_obj_type), type_name(mode_type));
}
- } else if (parsed_obj_type != mode_type) {
- /*
- * The object exists but is of the wrong type.
- * This is a problem regardless of allow_missing
- * because the new tree entry will never be correct.
- */
- die("entry '%s' object %s is a %s but specified type was (%s)",
- path, oid_to_hex(oid), type_name(parsed_obj_type), type_name(mode_type));
}
append_to_tree(mode, oid, path, data->arr, data->literally);
diff --git a/t/t1010-mktree.sh b/t/t1010-mktree.sh
index 1d6365141fc..7cb88e32d4f 100755
--- a/t/t1010-mktree.sh
+++ b/t/t1010-mktree.sh
@@ -373,4 +373,42 @@ test_expect_success 'mktree fails on directory-file conflict' '
grep "You have both folder/one and folder/one/deeper/deep" err
'
+test_expect_success 'mktree with remove entries' '
+ tree_oid="$(cat tree)" &&
+ blob_oid="$(git rev-parse $tree_oid:folder.txt)" &&
+
+ {
+ printf "100644 blob $blob_oid\ttest/deeper/deep.txt\n" &&
+ printf "100644 blob $blob_oid\ttest.txt\n" &&
+ printf "100644 blob $blob_oid\texample\n" &&
+ printf "100644 blob $blob_oid\texample.a/file\n" &&
+ printf "100644 blob $blob_oid\texample.txt\n" &&
+ printf "040000 tree $tree_oid\tfolder\n" &&
+ printf "0 $ZERO_OID\tfolder\n" &&
+ printf "0 $ZERO_OID\tmissing\n"
+ } | git mktree >tree.base &&
+
+ {
+ printf "0 $ZERO_OID\texample.txt\n" &&
+ printf "0 $ZERO_OID\ttest/deeper\n"
+ } | git mktree $(cat tree.base) >tree.actual &&
+
+ {
+ printf "100644 blob $blob_oid\texample\n" &&
+ printf "100644 blob $blob_oid\texample.a/file\n" &&
+ printf "100644 blob $blob_oid\ttest.txt\n"
+ } >expect &&
+ git ls-tree -r $(cat tree.actual) >actual &&
+
+ test_cmp expect actual
+'
+
+test_expect_success 'type and oid not checked if entry mode is 0' '
+ # type and oid do not match
+ printf "0 commit $EMPTY_TREE\tfolder.txt\n" |
+ git mktree >tree.actual &&
+
+ test "$(cat tree.actual)" = $EMPTY_TREE
+'
+
test_done
--
gitgitgadget
* Re: [PATCH 03/16] mktree: use non-static tree_entry array
2024-06-11 18:24 ` [PATCH 03/16] mktree: use non-static tree_entry array Victoria Dye via GitGitGadget
@ 2024-06-11 18:45 ` Eric Sunshine
2024-06-12 9:40 ` Patrick Steinhardt
1 sibling, 0 replies; 65+ messages in thread
From: Eric Sunshine @ 2024-06-11 18:45 UTC (permalink / raw)
To: Victoria Dye via GitGitGadget; +Cc: git, Victoria Dye
On Tue, Jun 11, 2024 at 2:25 PM Victoria Dye via GitGitGadget
<gitgitgadget@gmail.com> wrote:
> Replace the static 'struct tree_entry **entries' with a non-static 'struct
> tree_entry_array' instance. In later commits, we'll want to be able to
> create additional 'struct tree_entry_array' instances utilizing common
> functionality (create, push, clear, free). To avoid code duplication, create
> the 'struct tree_entry_array' type and add functions that perform those
> basic operations.
>
> Signed-off-by: Victoria Dye <vdye@github.com>
> ---
> diff --git a/builtin/mktree.c b/builtin/mktree.c
> @@ -12,15 +12,39 @@
> +struct tree_entry_array {
> + size_t nr, alloc;
> + struct tree_entry **entries;
> +};
>
> +static void clear_tree_entry_array(struct tree_entry_array *arr)
> +{
> + for (size_t i = 0; i < arr->nr; i++)
> + FREE_AND_NULL(arr->entries[i]);
> + arr->nr = 0;
> +}
> +
> +static void release_tree_entry_array(struct tree_entry_array *arr)
> +{
> + FREE_AND_NULL(arr->entries);
> + arr->nr = arr->alloc = 0;
> +}
For robustness, to make it less likely for future code to leak the
items pointed to by `arr->entries`, it might make sense for
release_tree_entry_array() to call clear_tree_entry_array() before
calling FREE_AND_NULL().
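
A minimal standalone sketch of that suggestion, assuming plain malloc/free
in place of Git's ALLOC_GROW and FREE_AND_NULL, and a stripped-down
'struct tree_entry' (the real one carries more fields):

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-in; the real struct tree_entry has more fields. */
struct tree_entry {
	unsigned mode;
};

struct tree_entry_array {
	size_t nr, alloc;
	struct tree_entry **entries;
};

/* Append an entry, growing the pointer array as needed. */
static void tree_entry_array_push(struct tree_entry_array *arr,
				  struct tree_entry *ent)
{
	if (arr->nr == arr->alloc) {
		size_t new_alloc = arr->alloc ? 2 * arr->alloc : 4;
		struct tree_entry **grown =
			realloc(arr->entries, new_alloc * sizeof(*grown));
		if (!grown)
			abort();
		arr->entries = grown;
		arr->alloc = new_alloc;
	}
	arr->entries[arr->nr++] = ent;
}

/* Free every entry but keep the pointer array around for reuse. */
static void clear_tree_entry_array(struct tree_entry_array *arr)
{
	for (size_t i = 0; i < arr->nr; i++) {
		free(arr->entries[i]);
		arr->entries[i] = NULL;
	}
	arr->nr = 0;
}

/* Free the entries first, then the array itself, so neither can leak. */
static void release_tree_entry_array(struct tree_entry_array *arr)
{
	clear_tree_entry_array(arr);
	free(arr->entries);
	arr->entries = NULL;
	arr->nr = arr->alloc = 0;
}
```

One caveat with this shape: in patch 16's context lines, entries are also
pushed onto a second array ('parent_dir_ents') while the main array still
holds them. An array that only borrows entries like that would presumably
need a release path that skips the clear step, otherwise the entries would
be freed twice.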
> - ALLOC_GROW(entries, used + 1, alloc);
> - entries[used++] = ent;
> + /* Append the update */
> + tree_entry_array_push(arr, ent);
Nit: the new comment seems superfluous
* Re: [PATCH 04/16] update-index: generalize 'read_index_info'
2024-06-11 18:24 ` [PATCH 04/16] update-index: generalize 'read_index_info' Victoria Dye via GitGitGadget
@ 2024-06-11 22:45 ` Junio C Hamano
0 siblings, 0 replies; 65+ messages in thread
From: Junio C Hamano @ 2024-06-11 22:45 UTC (permalink / raw)
To: Victoria Dye via GitGitGadget; +Cc: git, Victoria Dye
"Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes:
> From: Victoria Dye <vdye@github.com>
>
> Move 'read_index_info()' into a new header 'index-info.h' and generalize the
> function to call a provided callback for each parsed line. Update
> 'update-index.c' to use this generalized 'read_index_info()', adding the
> callback 'apply_index_info()' to verify the parsed line and update the index
> according to its contents.
>
> The input parsing done by 'read_index_info()' is similar to, but more
> flexible than, the parsing done in 'mktree' by 'mktree_line()' (handling not
> only 'git ls-tree' output but also the outputs of 'git apply --index-info'
> and 'git ls-files --stage'). To make 'mktree' more flexible, a later
> patch will replace mktree's custom parsing with 'read_index_info()'.
"git apply --index-info"?
That is a blast from the past. It no longer exists since 7a988699
(apply: get rid of --index-info in favor of --build-fake-ancestor,
2007-09-17).
As to the scriptability, supporting "ls-files -s" and "ls-tree -r"
output as our input does help, but the third one is not natively
emitted and it is very unlikely that there are third-party tools
that give output in that format. After all these years, I suspect
that it is sufficient to say
"update-index --index-info" and "mktree" both read information
necessary to eventually build trees, but having two separate
parsers is a maintenance burden, so we are massaging the code
from the former to be reusable.
without mentioning where the old third format comes from.
> diff --git a/builtin/update-index.c b/builtin/update-index.c
> index d343416ae26..77df380cb54 100644
> --- a/builtin/update-index.c
> +++ b/builtin/update-index.c
> @@ -11,6 +11,7 @@
> #include "gettext.h"
> #include "hash.h"
> #include "hex.h"
> +#include "index-info.h"
> #include "lockfile.h"
> #include "quote.h"
> #include "cache-tree.h"
> @@ -509,100 +510,29 @@ static void update_one(const char *path)
> report("add '%s'", path);
> }
>
> +static int apply_index_info(unsigned int mode, struct object_id *oid, int stage,
> + const char *path_name, void *cbdata UNUSED)
> {
> + if (!verify_path(path_name, mode)) {
> + fprintf(stderr, "Ignoring path %s\n", path_name);
> + return 0;
> + }
>
> + if (!mode) {
> + /* mode == 0 means there is no such path -- remove */
> + if (remove_file_from_index(the_repository->index, path_name))
> + die("git update-index: unable to remove %s", path_name);
This changes the error message. We used to feed "ptr" (no longer
visible to this function, as the caller unquotes before calling us)
that pointed at the original the user gave to the program; now we
report the path_name which is the result of the unquoting.
> + }
> + else {
> + /* mode ' ' sha1 '\t' name
> + * ptr[-1] points at tab,
> + * ptr[-41] is at the beginning of sha1
> */
> + if (add_cacheinfo(mode, oid, path_name, stage))
> + die("git update-index: unable to update %s", path_name);
But this side used to report the path_name as the result of
unquoting in the original. So the above change would probably be OK
in the name of consistency?
973d6a20 (update-index --index-info: adjust for funny-path quoting.,
2005-10-16) was the origin of the unquoting, and looking at that
commit, I have a feeling that the "ptr" thing above (i.e., the one I
pointed out as changing the behaviour) was simply forgotten (as
opposed to deliberately made to report the original) while updating
the code to deal with quoted original into unquoted paths.
So I think the change is more than OK. It is a very welcome (belated)
bugfix for 973d6a20 ;-).
> }
> +
> + return 0;
> }
It looks a bit disappointing that we die in the callback like above,
when the main parser loop that moved to the other file to be more
reusable is now capable of returning to the caller with an error,
but at this step, it is a good place to stop. A refactor that does
not change the behaviour.
Nicely done.
> diff --git a/t/t2107-update-index-basic.sh b/t/t2107-update-index-basic.sh
> index cc72ead79f3..29696ade0d0 100755
> --- a/t/t2107-update-index-basic.sh
> +++ b/t/t2107-update-index-basic.sh
> @@ -142,4 +142,31 @@ test_expect_success '--index-version' '
> test_must_be_empty actual
> '
>
> +test_expect_success '--index-info fails on malformed input' '
> + # empty line
> + echo "" |
> + test_must_fail git update-index --index-info 2>err &&
> + grep "malformed input line" err &&
Using "test_grep" would make it easier to diagnose when test breaks.
A failing "grep" will be silent. A failing "test_grep" will tell us
"I was told to find THIS, but didn't find any in THAT".
> + # bad whitespace
> + printf "100644 $EMPTY_BLOB A" |
> + test_must_fail git update-index --index-info 2>err &&
> + grep "malformed input line" err &&
> +
> + # invalid stage value
> + printf "100644 $EMPTY_BLOB 5\tA" |
> + test_must_fail git update-index --index-info 2>err &&
> + grep "malformed input line" err &&
> +
> + # invalid OID length
> + printf "100755 abc123\tA" |
> + test_must_fail git update-index --index-info 2>err &&
> + grep "malformed input line" err &&
> +
> + # bad quoting
> + printf "100644 $EMPTY_BLOB\t\"A" |
> + test_must_fail git update-index --index-info 2>err &&
> + grep "bad quoting of path name" err
> +'
> +
> test_done
* Re: [PATCH 05/16] index-info.c: identify empty input lines in read_index_info
2024-06-11 18:24 ` [PATCH 05/16] index-info.c: identify empty input lines in read_index_info Victoria Dye via GitGitGadget
@ 2024-06-11 22:52 ` Junio C Hamano
2024-06-18 17:33 ` Victoria Dye
0 siblings, 1 reply; 65+ messages in thread
From: Junio C Hamano @ 2024-06-11 22:52 UTC (permalink / raw)
To: Victoria Dye via GitGitGadget; +Cc: git, Victoria Dye
"Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes:
> From: Victoria Dye <vdye@github.com>
>
> Update 'read_index_info()' to return INDEX_INFO_EMPTY_LINE (value 1), rather
> than the default error code (value -1) when the function encounters an empty
> line in stdin. This grants the caller the flexibility to handle such
> scenarios differently than a typical error. In the case of 'update-index',
> we'll still exit with a "malformed input line" error. However, when
> 'read_index_info()' is used to process the input to 'mktree' in a later
> patch, the empty line return value will signal a new tree in --batch mode.
Interesting. We could even introduce "# commented input" but that
is a different story ;-).
I also wonder if we can flip it around and teach read_index_info()
to (1) silently accept and do a callback when it recognises the
input line is one of the supported formats, and (2) send any
unrecognised line, not just an empty one, with "unrecognised" status
code. That way, the caller can handle more than a single kind of
"special input line" more easily, perhaps?
Thanks.
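
To make the "flip it around" idea above concrete, here is a rough standalone
sketch. Everything prefixed 'sketch_' or 'SKETCH_' is hypothetical and not
part of the series; the format check is a deliberately loose stand-in for
the real parsing in read_index_info().

```c
#include <assert.h>
#include <string.h>

/* Hypothetical status code for "not one of the supported formats". */
#define SKETCH_INFO_UNRECOGNIZED 2

typedef int (*sketch_line_fn)(const char *line, void *cbdata);

/* Loose stand-in for format detection: "<octal mode> SP ... TAB <path>". */
static int sketch_looks_like_entry(const char *line)
{
	const char *p = line;
	const char *tab;

	while (*p >= '0' && *p <= '7')
		p++;
	if (p == line || *p != ' ')
		return 0;
	tab = strchr(p, '\t');
	return tab && tab[1] != '\0';
}

/*
 * Call fn for a recognised line; hand anything else (including an empty
 * line) back to the caller with the "unrecognised" status.
 */
static int sketch_read_line(const char *line, sketch_line_fn fn, void *cbdata)
{
	if (!sketch_looks_like_entry(line))
		return SKETCH_INFO_UNRECOGNIZED;
	return fn(line, cbdata) ? -1 : 0;
}

/* Toy callback: count the entries seen. */
static int sketch_count_entry(const char *line, void *cbdata)
{
	(void)line;
	(*(int *)cbdata)++;
	return 0;
}

enum sketch_action {
	SKETCH_ENTRY,
	SKETCH_NEW_TREE,
	SKETCH_COMMENT,
	SKETCH_ERROR
};

/* The caller, not the parser, decides what each special line means. */
static enum sketch_action sketch_handle_line(const char *line, int *nr_entries)
{
	int ret = sketch_read_line(line, sketch_count_entry, nr_entries);

	if (!ret)
		return SKETCH_ENTRY;
	if (ret == SKETCH_INFO_UNRECOGNIZED) {
		if (!*line)
			return SKETCH_NEW_TREE;	/* e.g. mktree --batch */
		if (*line == '#')
			return SKETCH_COMMENT;	/* e.g. "# commented input" */
	}
	return SKETCH_ERROR;
}
```

In this arrangement a caller like 'update-index' could simply treat every
unrecognised line as malformed, while 'mktree' could pick out the empty
line, without the parser needing a dedicated status per special case.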
* Re: [PATCH 06/16] index-info.c: parse object type in provided in read_index_info
2024-06-11 18:24 ` [PATCH 06/16] index-info.c: parse object type in provided " Victoria Dye via GitGitGadget
@ 2024-06-12 1:54 ` Junio C Hamano
0 siblings, 0 replies; 65+ messages in thread
From: Junio C Hamano @ 2024-06-12 1:54 UTC (permalink / raw)
To: Victoria Dye via GitGitGadget; +Cc: git, Victoria Dye
"Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes:
> From: Victoria Dye <vdye@github.com>
>
> If the object type (e.g. "blob", "tree") is identified on a stdin line read
> by 'read_index_info()' (i.e. on lines formatted like the output of 'git
> ls-tree'), parse it into an 'enum object_type' and provide it to the
> 'read_index_info()' callback as an argument. If the type is not provided,
> pass 'OBJ_NONE' instead. If the object type is invalid, return an error.
My recollection is, when we do not know what to expect, we tend to
use OBJ_ANY rather than OBJ_NONE as convention to signal that fact
(e.g., object-name.c:peel_to_type()).
As long as the code path this series touches is internally
consistent, using OBJ_NONE may not hurt but once they need to start
interacting with existing code paths that use OBJ_ANY for that
purpose, we may need to adjust one to match the other.
> The goal of this change is to allow for more thorough validation of the
> provided object type (e.g. against the provided mode) in 'mktree' once
> 'mktree_line' is replaced with 'read_index_info()'. Note, though, that this
> change also strengthens the validation done by 'update-index', since invalid
> type names now trigger an error.
Nice.
> Signed-off-by: Victoria Dye <vdye@github.com>
> ---
> builtin/update-index.c | 3 ++-
> index-info.c | 16 ++++++++++++----
> index-info.h | 3 ++-
> t/t2107-update-index-basic.sh | 5 +++++
> 4 files changed, 21 insertions(+), 6 deletions(-)
>
> diff --git a/builtin/update-index.c b/builtin/update-index.c
> index b1b334807f8..8882433b644 100644
> --- a/builtin/update-index.c
> +++ b/builtin/update-index.c
> @@ -510,7 +510,8 @@ static void update_one(const char *path)
> report("add '%s'", path);
> }
>
> -static int apply_index_info(unsigned int mode, struct object_id *oid, int stage,
> +static int apply_index_info(unsigned int mode, struct object_id *oid,
> + enum object_type obj_type UNUSED, int stage,
> const char *path_name, void *cbdata UNUSED)
> {
> if (!verify_path(path_name, mode)) {
> diff --git a/index-info.c b/index-info.c
> index 735cbf1f476..5d61e61e28f 100644
> --- a/index-info.c
> +++ b/index-info.c
> @@ -18,6 +18,7 @@ int read_index_info(int nul_term_line, each_index_info_fn fn, void *cbdata)
> char *ptr, *tab;
> char *path_name;
> struct object_id oid;
> + enum object_type obj_type = OBJ_NONE;
> unsigned int mode;
> unsigned long ul;
> int stage;
> @@ -56,18 +57,17 @@ int read_index_info(int nul_term_line, each_index_info_fn fn, void *cbdata)
>
> if (tab[-2] == ' ' && '0' <= tab[-1] && tab[-1] <= '3') {
> stage = tab[-1] - '0';
> - ptr = tab + 1; /* point at the head of path */
> + path_name = tab + 1; /* point at the head of path */
> tab = tab - 2; /* point at tail of sha1 */
> } else {
> stage = 0;
> - ptr = tab + 1; /* point at the head of path */
> + path_name = tab + 1; /* point at the head of path */
> }
>
> if (get_oid_hex(tab - hexsz, &oid) ||
> tab[-(hexsz + 1)] != ' ')
> goto bad_line;
>
> - path_name = ptr;
> if (!nul_term_line && path_name[0] == '"') {
> strbuf_reset(&uq);
> if (unquote_c_style(&uq, path_name, NULL)) {
> @@ -77,7 +77,15 @@ int read_index_info(int nul_term_line, each_index_info_fn fn, void *cbdata)
> path_name = uq.buf;
> }
>
> - ret = fn(mode, &oid, stage, path_name, cbdata);
> + /* Get the type, if provided */
> + if (tab - hexsz - 1 > ptr + 1) {
> + if (*(tab - hexsz - 1) != ' ')
> + goto bad_line;
> + *(tab - hexsz - 1) = '\0';
> + obj_type = type_from_string(ptr + 1);
> + }
> +
> + ret = fn(mode, &oid, obj_type, stage, path_name, cbdata);
> if (ret) {
> ret = -1;
> break;
> diff --git a/index-info.h b/index-info.h
> index 1884972021d..767cf304213 100644
> --- a/index-info.h
> +++ b/index-info.h
> @@ -2,8 +2,9 @@
> #define INDEX_INFO_H
>
> #include "hash.h"
> +#include "object.h"
>
> -typedef int (*each_index_info_fn)(unsigned int, struct object_id *, int, const char *, void *);
> +typedef int (*each_index_info_fn)(unsigned int, struct object_id *, enum object_type, int, const char *, void *);
>
> #define INDEX_INFO_EMPTY_LINE 1
>
> diff --git a/t/t2107-update-index-basic.sh b/t/t2107-update-index-basic.sh
> index 29696ade0d0..9c19d24cd4a 100755
> --- a/t/t2107-update-index-basic.sh
> +++ b/t/t2107-update-index-basic.sh
> @@ -153,6 +153,11 @@ test_expect_success '--index-info fails on malformed input' '
> test_must_fail git update-index --index-info 2>err &&
> grep "malformed input line" err &&
>
> + # invalid type
> + printf "100644 bad $EMPTY_BLOB\tA" |
> + test_must_fail git update-index --index-info 2>err &&
> + grep "invalid object type" err &&
> +
> # invalid stage value
> printf "100644 $EMPTY_BLOB 5\tA" |
> test_must_fail git update-index --index-info 2>err &&
* Re: [PATCH 07/16] mktree: use read_index_info to read stdin lines
2024-06-11 18:24 ` [PATCH 07/16] mktree: use read_index_info to read stdin lines Victoria Dye via GitGitGadget
@ 2024-06-12 2:11 ` Junio C Hamano
2024-06-12 9:40 ` Patrick Steinhardt
1 sibling, 0 replies; 65+ messages in thread
From: Junio C Hamano @ 2024-06-12 2:11 UTC (permalink / raw)
To: Victoria Dye via GitGitGadget; +Cc: git, Victoria Dye
"Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes:
> From: Victoria Dye <vdye@github.com>
>
> Replace the custom input parsing of 'mktree' with 'read_index_info()', which
> handles not only the 'ls-tree' output format it already handles but also the
> other formats compatible with 'update-index'.
Yay.
> This lends some consistency
> across the commands (avoiding the need for two similar implementations for
> input parsing) and adds flexibility to mktree.
>
> Update 'Documentation/git-mktree.txt' to reflect the more permissive input
> format.
Nice.
> DESCRIPTION
> -----------
> -Reads standard input in non-recursive `ls-tree` output format, and creates
> -a tree object. The order of the tree entries is normalized by mktree so
> -pre-sorting the input is not required. The object name of the tree object
> -built is written to the standard output.
> +Reads entry information from stdin and creates a tree object from those entries.
> +The object name of the tree object built is written to the standard output.
pre-sorting is now required? Ah, such details are left to the
section dedicated for the input format. Makes sense.
The lines are getting overly long (the first line is now exactly
80 columns); wrapping them to leave a bit of room to grow, at
around 72-76 columns, would be appreciated.
> +INPUT FORMAT
> +------------
> +Tree entries may be specified in any of the formats compatible with the
> +`--index-info` option to linkgit:git-update-index[1]. The order of the tree
> +entries is normalized by `mktree` so pre-sorting the input by path is not
> +required.
OK. We might want to split the description of the three formats
into a separate file and include it in here and in the original (I'd
certainly insist doing so if we had three places that want to refer
to it), but we have only two so let's just remember to do so when we
may want to add the third place in the future.
> diff --git a/builtin/mktree.c b/builtin/mktree.c
> index 15bd908702a..5530257252d 100644
> --- a/builtin/mktree.c
> +++ b/builtin/mktree.c
> @@ -6,6 +6,7 @@
> #include "builtin.h"
> #include "gettext.h"
> #include "hex.h"
> +#include "index-info.h"
> #include "quote.h"
> #include "strbuf.h"
> #include "tree.h"
> @@ -93,123 +94,80 @@ static const char *mktree_usage[] = {
> NULL
> };
>
> -static void mktree_line(char *buf, int nul_term_line, int allow_missing,
> - struct tree_entry_array *arr)
> +struct mktree_line_data {
> + struct tree_entry_array *arr;
> + int allow_missing;
> +};
> +
> +static int mktree_line(unsigned int mode, struct object_id *oid,
> + enum object_type obj_type, int stage UNUSED,
> + const char *path, void *cbdata)
> {
> + struct mktree_line_data *data = cbdata;
> + enum object_type mode_type = object_type(mode);
> struct object_info oi = OBJECT_INFO_INIT;
> + enum object_type parsed_obj_type;
>
> + if (obj_type && mode_type != obj_type)
> + die("object type (%s) doesn't match mode type (%s)",
> + type_name(obj_type), type_name(mode_type));
>
> + oi.typep = &parsed_obj_type;
>
> + if (oid_object_info_extended(the_repository, oid, &oi,
> OBJECT_INFO_LOOKUP_REPLACE |
> OBJECT_INFO_QUICK |
> OBJECT_INFO_SKIP_FETCH_OBJECT) < 0)
> + parsed_obj_type = -1;
>
> + if (parsed_obj_type < 0) {
> + if (data->allow_missing || S_ISGITLINK(mode)) {
> + ; /* no problem - missing objects & submodules are presumed to be of the right type */
Overlong line?
> } else {
> + die("entry '%s' object %s is unavailable", path, oid_to_hex(oid));
> }
Each side of if/else has only a single statement block that does not
want {braces} around it. I wonder if flipping the polarity makes it
easier to follow the logic flow:
if (!data->allow_missing && !S_ISGITLINK(mode))
die("...");
I wonder if we even want to do the oid_object_info_extended() when
we are expecting to see a gitlink. We do not expect to have the
commit in our history (as it is part of the history of a submodule,
which is from a separate project), so even if we found such an
object in our object database, we do not want to do anything with
the information we learn about the object.
So I am wondering if the whole cascade should read more like
if (S_ISGITLINK(mode)) {
... anything goes ...
} else if (oid_object_info_extended(...) < 0 &&
!data->allow_missing) {
... not found ...
} else if (parsed_obj_type != mode_type) {
... found something different from what we expected ...
}
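
The cascade above can be modelled in a standalone sketch, under the
assumption that the object lookup is reduced to a callback. All 'sketch_'
names are hypothetical; the toy lookup merely stands in for
oid_object_info_extended() and keys off the first character of its argument.

```c
#include <assert.h>

enum sketch_type {
	SKETCH_MISSING = -1,
	SKETCH_COMMIT = 1,
	SKETCH_TREE,
	SKETCH_BLOB
};

enum sketch_verdict {
	SKETCH_OK,
	SKETCH_UNAVAILABLE,
	SKETCH_WRONG_TYPE
};

typedef enum sketch_type (*sketch_lookup_fn)(const char *oid_hex);

/* Map a tree entry mode to the object type it implies. */
static enum sketch_type sketch_mode_type(unsigned mode)
{
	if ((mode & 0170000u) == 0160000u)
		return SKETCH_COMMIT;	/* gitlink */
	if ((mode & 0170000u) == 0040000u)
		return SKETCH_TREE;
	return SKETCH_BLOB;
}

static enum sketch_verdict sketch_check_entry(unsigned mode,
					      const char *oid_hex,
					      int allow_missing,
					      sketch_lookup_fn lookup)
{
	enum sketch_type found;

	/* Gitlinks: anything goes, so do not even look the object up. */
	if (sketch_mode_type(mode) == SKETCH_COMMIT)
		return SKETCH_OK;

	found = lookup(oid_hex);
	if (found == SKETCH_MISSING)
		return allow_missing ? SKETCH_OK : SKETCH_UNAVAILABLE;
	if (found != sketch_mode_type(mode))
		return SKETCH_WRONG_TYPE;
	return SKETCH_OK;
}

static int sketch_lookups;	/* number of toy lookups performed */

/* Toy object database: the first character decides the type. */
static enum sketch_type sketch_toy_lookup(const char *oid_hex)
{
	sketch_lookups++;
	switch (oid_hex[0]) {
	case 'b':
		return SKETCH_BLOB;
	case 't':
		return SKETCH_TREE;
	case 'c':
		return SKETCH_COMMIT;
	default:
		return SKETCH_MISSING;
	}
}
```

The sketch keeps the missing-object case as its own branch, so an object
that is allowed to be missing is never compared against the mode type.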
The main loop, thanks to read_index_info() refactoring, got really
easier to read, i.e. compact and clear.
* Re: [PATCH 08/16] mktree: add a --literally option
2024-06-11 18:24 ` [PATCH 08/16] mktree: add a --literally option Victoria Dye via GitGitGadget
@ 2024-06-12 2:18 ` Junio C Hamano
0 siblings, 0 replies; 65+ messages in thread
From: Junio C Hamano @ 2024-06-12 2:18 UTC (permalink / raw)
To: Victoria Dye via GitGitGadget; +Cc: git, Victoria Dye
"Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes:
> From: Victoria Dye <vdye@github.com>
>
> Add the '--literally' option to 'git mktree' to allow constructing a tree
> with invalid contents. For now, the only change this represents compared to
> the normal 'git mktree' behavior is no longer sorting the inputs; in later
> commits, deduplication and path validation will be added to the command and
> '--literally' will skip those as well.
Hmph, the end state of the series as a whole may be good, but
the above makes me wonder if we broke bisectability with the
previous step 07/16 where we introduced type checks without touching
any existing tests?
> Certain tests use 'git mktree' to intentionally generate corrupt trees.
> Update these tests to use '--literally' so that they continue functioning
> properly when additional input cleanup & validation is added to the base
> command. Note that, because 'mktree --literally' does not sort entries, some
> of the tests are updated to provide their inputs in tree order; otherwise,
> the test would fail with an "incorrect order" error instead of the error the
> test expects.
Makes sense.
> diff --git a/Documentation/git-mktree.txt b/Documentation/git-mktree.txt
> index 507682ed23e..fb07e40cef0 100644
> --- a/Documentation/git-mktree.txt
> +++ b/Documentation/git-mktree.txt
> @@ -9,7 +9,7 @@ git-mktree - Build a tree-object from formatted tree entries
> SYNOPSIS
> --------
> [verse]
> -'git mktree' [-z] [--missing] [--batch]
> +'git mktree' [-z] [--missing] [--literally] [--batch]
>
> DESCRIPTION
> -----------
> @@ -27,6 +27,13 @@ OPTIONS
> object. This option has no effect on the treatment of gitlink entries
> (aka "submodules") which are always allowed to be missing.
>
> +--literally::
> + Create the tree from the tree entries provided to stdin in the order
> + they are provided without performing additional sorting, deduplication,
> + or path validation on them. This option is primarily useful for creating
> + invalid tree objects to use in tests of how Git deals with various forms
> + of tree corruption.
> +
OK.
> diff --git a/builtin/mktree.c b/builtin/mktree.c
> index 5530257252d..48019448c1f 100644
> --- a/builtin/mktree.c
> +++ b/builtin/mktree.c
> @@ -45,11 +45,11 @@ static void release_tree_entry_array(struct tree_entry_array *arr)
> }
>
> static void append_to_tree(unsigned mode, struct object_id *oid, const char *path,
> - struct tree_entry_array *arr)
> + struct tree_entry_array *arr, int literally)
> {
> struct tree_entry *ent;
> size_t len = strlen(path);
> - if (strchr(path, '/'))
> + if (!literally && strchr(path, '/'))
> die("path %s contains slash", path);
;-).
A tree_entry with a slash in it. Our fsck should be catching them
already, but this will allow constructing a test case more easily.
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [PATCH 09/16] mktree: validate paths more carefully
2024-06-11 18:24 ` [PATCH 09/16] mktree: validate paths more carefully Victoria Dye via GitGitGadget
@ 2024-06-12 2:26 ` Junio C Hamano
2024-06-12 19:01 ` Victoria Dye
0 siblings, 1 reply; 65+ messages in thread
From: Junio C Hamano @ 2024-06-12 2:26 UTC (permalink / raw)
To: Victoria Dye via GitGitGadget; +Cc: git, Victoria Dye
"Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes:
> From: Victoria Dye <vdye@github.com>
>
> Use 'verify_path' to validate the paths provided as tree entries, ensuring
> we do not create entries with paths not allowed in trees (e.g.,
> .git).
Sensible.
> Also,
> remove trailing slashes on directories before validating, allowing users to
> provide 'folder-name/' as the path for a tree object entry.
Is that a good idea for a plumbing command like this? We would
silently accept these after silently stripping the trailing slash?
040000 tree 82a33d5150d9316378ef1955a49f2a5bf21aaeb2 templates/
100644 blob 1f89ffab4c32bc02b5d955851401628a5b9a540e thread-utils.c/
The former _might_ count as "usability improvement", but if we are
doing the same for the latter we might be going a bit too lenient.
Let's see what really happens in the code.
> @@ -49,10 +50,23 @@ static void append_to_tree(unsigned mode, struct object_id *oid, const char *pat
> {
> struct tree_entry *ent;
> size_t len = strlen(path);
> - if (!literally && strchr(path, '/'))
> - die("path %s contains slash", path);
>
> - FLEX_ALLOC_MEM(ent, name, path, len);
> + if (literally) {
> + FLEX_ALLOC_MEM(ent, name, path, len);
> + } else {
> + /* Normalize and validate entry path */
> + if (S_ISDIR(mode)) {
> + while(len > 0 && is_dir_sep(path[len - 1]))
> + len--;
> + }
Leave a single SP after "while", please.
We do this only to subtree entries, and all trailing slashes, not
just a single one. OK, but I am not sure if the extra leniency is a
good idea to begin with. "ls-tree" output does not have such
trailing slashes, so it is unclear whom we are trying to be extra
nice to with this.
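For reference, the normalization being discussed can be sketched in isolation as below. This is an illustration only: `normalized_path_len()` and `is_sep()` are hypothetical names, the mode constants are redefined locally, and only '/' is treated as a separator (Git's `is_dir_sep()` may accept more on some platforms).

```c
#include <stddef.h>
#include <string.h>

#define MODE_TYPE_MASK 0170000u
#define MODE_TREE      0040000u

/* Assumption: only '/' counts as a directory separator in this sketch. */
static int is_sep(char c)
{
	return c == '/';
}

/*
 * Return how many bytes of 'path' to keep for an entry with 'mode'.
 * All trailing separators are dropped, but only for tree entries;
 * any other mode keeps its path untouched so validation can reject it.
 */
static size_t normalized_path_len(const char *path, unsigned mode)
{
	size_t len = strlen(path);

	if ((mode & MODE_TYPE_MASK) == MODE_TREE)
		while (len > 0 && is_sep(path[len - 1]))
			len--;
	return len;
}
```

Under these assumptions "templates/" with mode 040000 is stored as "templates", while "thread-utils.c/" with mode 100644 keeps its slash and is left for the path check to refuse.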
> + FLEX_ALLOC_MEM(ent, name, path, len);
> +
> + if (!verify_path(ent->name, mode))
> + die(_("invalid path '%s'"), path);
This is the crux of the change. And it is so simple. Very nice.
> + if (strchr(ent->name, '/'))
> + die("path %s contains slash", path);
> + }
> diff --git a/t/t1010-mktree.sh b/t/t1010-mktree.sh
> index e0687cb529f..e0263cb2bf8 100755
> --- a/t/t1010-mktree.sh
> +++ b/t/t1010-mktree.sh
> @@ -173,4 +173,37 @@ test_expect_success '--literally can create invalid trees' '
> grep "not properly sorted" err
> '
>
> +test_expect_success 'mktree validates path' '
> + tree_oid="$(cat tree)" &&
> + blob_oid="$(git rev-parse $tree_oid:a/one)" &&
> + head_oid="$(git rev-parse HEAD)" &&
> +
> + # Valid: tree with or without trailing slash, blob without trailing slash
> + {
> + printf "040000 tree $tree_oid\tfolder1/\n" &&
> + printf "040000 tree $tree_oid\tfolder2\n" &&
> + printf "100644 blob $blob_oid\tfile.txt\n"
> + } | git mktree >actual &&
> +
> + # Invalid: blob with trailing slash
> + printf "100644 blob $blob_oid\ttest/" |
> + test_must_fail git mktree 2>err &&
> + grep "invalid path ${SQ}test/${SQ}" err &&
> +
> + # Invalid: dotdot
> + printf "040000 tree $tree_oid\t../" |
> + test_must_fail git mktree 2>err &&
> + grep "invalid path ${SQ}../${SQ}" err &&
> +
> + # Invalid: dot
> + printf "040000 tree $tree_oid\t." |
> + test_must_fail git mktree 2>err &&
> + grep "invalid path ${SQ}.${SQ}" err &&
> +
> + # Invalid: .git
> + printf "040000 tree $tree_oid\t.git/" |
> + test_must_fail git mktree 2>err &&
> + grep "invalid path ${SQ}.git/${SQ}" err
> +'
> +
> test_done
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [PATCH 02/16] mktree: rename treeent to tree_entry
2024-06-11 18:24 ` [PATCH 02/16] mktree: rename treeent to tree_entry Victoria Dye via GitGitGadget
@ 2024-06-12 9:40 ` Patrick Steinhardt
0 siblings, 0 replies; 65+ messages in thread
From: Patrick Steinhardt @ 2024-06-12 9:40 UTC (permalink / raw)
To: Victoria Dye via GitGitGadget; +Cc: git, Victoria Dye
On Tue, Jun 11, 2024 at 06:24:34PM +0000, Victoria Dye via GitGitGadget wrote:
> From: Victoria Dye <vdye@github.com>
>
> Rename the type for better readability, clearly specifying "entry" (instead
> of the "ent" abbreviation) and separating "tree" from "entry".
>
> Signed-off-by: Victoria Dye <vdye@github.com>
> ---
> builtin/mktree.c | 10 +++++-----
> 1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/builtin/mktree.c b/builtin/mktree.c
> index 8b19d440747..c02feb06aff 100644
> --- a/builtin/mktree.c
> +++ b/builtin/mktree.c
> @@ -12,7 +12,7 @@
> #include "parse-options.h"
> #include "object-store-ll.h"
>
> -static struct treeent {
> +static struct tree_entry {
> unsigned mode;
> struct object_id oid;
> int len;
This reads a ton better compared to `treeent`, thanks! I've never been a
fan of abbreviations like this in code.
Patrick
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [PATCH 03/16] mktree: use non-static tree_entry array
2024-06-11 18:24 ` [PATCH 03/16] mktree: use non-static tree_entry array Victoria Dye via GitGitGadget
2024-06-11 18:45 ` Eric Sunshine
@ 2024-06-12 9:40 ` Patrick Steinhardt
1 sibling, 0 replies; 65+ messages in thread
From: Patrick Steinhardt @ 2024-06-12 9:40 UTC (permalink / raw)
To: Victoria Dye via GitGitGadget; +Cc: git, Victoria Dye
On Tue, Jun 11, 2024 at 06:24:35PM +0000, Victoria Dye via GitGitGadget wrote:
> From: Victoria Dye <vdye@github.com>
>
> Replace the static 'struct tree_entry **entries' with a non-static 'struct
> tree_entry_array' instance. In later commits, we'll want to be able to
> create additional 'struct tree_entry_array' instances utilizing common
> functionality (create, push, clear, free). To avoid code duplication, create
> the 'struct tree_entry_array' type and add functions that perform those
> basic operations.
Thanks for getting rid of more global state, I really appreciate this.
> Signed-off-by: Victoria Dye <vdye@github.com>
> ---
> builtin/mktree.c | 67 +++++++++++++++++++++++++++++++++---------------
> 1 file changed, 47 insertions(+), 20 deletions(-)
>
> diff --git a/builtin/mktree.c b/builtin/mktree.c
> index c02feb06aff..15bd908702a 100644
> --- a/builtin/mktree.c
> +++ b/builtin/mktree.c
> @@ -12,15 +12,39 @@
> #include "parse-options.h"
> #include "object-store-ll.h"
>
> -static struct tree_entry {
> +struct tree_entry {
> unsigned mode;
> struct object_id oid;
> int len;
> char name[FLEX_ARRAY];
> -} **entries;
> -static int alloc, used;
> +};
> +
> +struct tree_entry_array {
> + size_t nr, alloc;
> + struct tree_entry **entries;
> +};
>
> -static void append_to_tree(unsigned mode, struct object_id *oid, char *path)
> +static void tree_entry_array_push(struct tree_entry_array *arr, struct tree_entry *ent)
> +{
> + ALLOC_GROW(arr->entries, arr->nr + 1, arr->alloc);
> + arr->entries[arr->nr++] = ent;
> +}
> +
> +static void clear_tree_entry_array(struct tree_entry_array *arr)
> +{
> + for (size_t i = 0; i < arr->nr; i++)
> + FREE_AND_NULL(arr->entries[i]);
> + arr->nr = 0;
> +}
> +
> +static void release_tree_entry_array(struct tree_entry_array *arr)
> +{
> + FREE_AND_NULL(arr->entries);
> + arr->nr = arr->alloc = 0;
> +}
Nit: should these be called `tree_entry_array_clear()` and
`tree_entry_array_release()`? This is one of the areas where our coding
guidelines aren't sufficiently clear in my opinion. I personally
strongly prefer `<noun>_<action>` syntax because it groups together
related functionality much better. And while I personally do not use
code completion, it does help others that do because they can simply
type in the noun as prefix and then, via completion, learn about all
related functions.
Most of our general-purpose interfaces follow this naming scheme
(strbuf, string_list, strvec, oidset, ...). I think we should document
this accordingly.
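As an illustration of the `<noun>_<action>` grouping, the three helpers from the patch could be spelled as below. This is a sketch in plain standard C: the simplified `struct tree_entry` and the use of `realloc()`/`free()` in place of Git's `ALLOC_GROW()` and `FREE_AND_NULL()` are assumptions made to keep it self-contained.

```c
#include <stdlib.h>

/* Simplified stand-in for the real entry type. */
struct tree_entry {
	unsigned mode;
	char name[64];
};

struct tree_entry_array {
	size_t nr, alloc;
	struct tree_entry **entries;
};

/* Append 'ent', growing the backing storage when it is full. */
static void tree_entry_array_push(struct tree_entry_array *arr,
				  struct tree_entry *ent)
{
	if (arr->nr + 1 > arr->alloc) {
		size_t alloc = arr->alloc ? arr->alloc * 2 : 8;
		struct tree_entry **grown =
			realloc(arr->entries, alloc * sizeof(*grown));
		if (!grown)
			abort();
		arr->entries = grown;
		arr->alloc = alloc;
	}
	arr->entries[arr->nr++] = ent;
}

/* Free the entries but keep the backing storage for reuse. */
static void tree_entry_array_clear(struct tree_entry_array *arr)
{
	for (size_t i = 0; i < arr->nr; i++) {
		free(arr->entries[i]);
		arr->entries[i] = NULL;
	}
	arr->nr = 0;
}

/* Free the backing storage itself; entries must already be cleared. */
static void tree_entry_array_release(struct tree_entry_array *arr)
{
	free(arr->entries);
	arr->entries = NULL;
	arr->nr = arr->alloc = 0;
}
```

Typing the `tree_entry_array_` prefix is then enough to discover all three operations.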
Patrick
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [PATCH 07/16] mktree: use read_index_info to read stdin lines
2024-06-11 18:24 ` [PATCH 07/16] mktree: use read_index_info to read stdin lines Victoria Dye via GitGitGadget
2024-06-12 2:11 ` Junio C Hamano
@ 2024-06-12 9:40 ` Patrick Steinhardt
2024-06-12 18:35 ` Junio C Hamano
1 sibling, 1 reply; 65+ messages in thread
From: Patrick Steinhardt @ 2024-06-12 9:40 UTC (permalink / raw)
To: Victoria Dye via GitGitGadget; +Cc: git, Victoria Dye
On Tue, Jun 11, 2024 at 06:24:39PM +0000, Victoria Dye via GitGitGadget wrote:
> diff --git a/Documentation/git-mktree.txt b/Documentation/git-mktree.txt
> index 383f09dd333..507682ed23e 100644
> --- a/Documentation/git-mktree.txt
> +++ b/Documentation/git-mktree.txt
> @@ -13,15 +13,13 @@ SYNOPSIS
>
> DESCRIPTION
> -----------
> -Reads standard input in non-recursive `ls-tree` output format, and creates
> -a tree object. The order of the tree entries is normalized by mktree so
> -pre-sorting the input is not required. The object name of the tree object
> -built is written to the standard output.
> +Reads entry information from stdin and creates a tree object from those entries.
> +The object name of the tree object built is written to the standard output.
It makes perfect sense to not single out git-ls-tree(1) anymore. But I
think we should help the reader a bit by continuing to point out which
commands can be used as input here. That can be either here in the
description, further down in the new "INPUT FORMAT" section, or in both
places.
Patrick
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [PATCH 10/16] mktree: overwrite duplicate entries
2024-06-11 18:24 ` [PATCH 10/16] mktree: overwrite duplicate entries Victoria Dye via GitGitGadget
@ 2024-06-12 9:40 ` Patrick Steinhardt
2024-06-12 18:48 ` Victoria Dye
0 siblings, 1 reply; 65+ messages in thread
From: Patrick Steinhardt @ 2024-06-12 9:40 UTC (permalink / raw)
To: Victoria Dye via GitGitGadget; +Cc: git, Victoria Dye
On Tue, Jun 11, 2024 at 06:24:42PM +0000, Victoria Dye via GitGitGadget wrote:
> From: Victoria Dye <vdye@github.com>
>
> If multiple tree entries with the same name are provided as input to
> 'mktree', only write the last one to the tree. Entries are considered
> duplicates if they have identical names (*not* considering mode); if a blob
> and a tree with the same name are provided, only the last one will be
> written to the tree. A tree with duplicate entries is invalid (per 'git
> fsck'), so that condition should be avoided wherever possible.
>
> Signed-off-by: Victoria Dye <vdye@github.com>
> ---
> Documentation/git-mktree.txt | 8 ++++---
> builtin/mktree.c | 45 ++++++++++++++++++++++++++++++++----
> t/t1010-mktree.sh | 36 +++++++++++++++++++++++++++--
> 3 files changed, 80 insertions(+), 9 deletions(-)
>
> diff --git a/Documentation/git-mktree.txt b/Documentation/git-mktree.txt
> index fb07e40cef0..afbc846d077 100644
> --- a/Documentation/git-mktree.txt
> +++ b/Documentation/git-mktree.txt
> @@ -43,9 +43,11 @@ OPTIONS
> INPUT FORMAT
> ------------
> Tree entries may be specified in any of the formats compatible with the
> -`--index-info` option to linkgit:git-update-index[1]. The order of the tree
> -entries is normalized by `mktree` so pre-sorting the input by path is not
> -required.
> +`--index-info` option to linkgit:git-update-index[1].
> +
> +The order of the tree entries is normalized by `mktree` so pre-sorting the input
> +by path is not required. Multiple entries provided with the same path are
> +deduplicated, with only the last one specified added to the tree.
Hm. I'm not sure whether this is a good idea. With git-mktree(1) being
part of our plumbing layer, you can expect that it's mostly going to be
fed input from scripts. And any script that generates duplicate tree
entries is broken, but we now start to paper over such brokenness
without giving the user any indicator of this. As a user of git-mktree(1)
in Gitaly I can certainly say that I'd rather see it die instead
of silently fixing my inputs so that I start to notice my own bugs.
So without seeing a strong motivating use case for this feature I'd think
that git-mktree(1) should reject such inputs and return an error such
that the user can fix their tooling.
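To show what rejecting would involve, here is a rough sketch of duplicate detection over a list of entry names. It is not taken from the series: `find_duplicate_name()` is a hypothetical helper, and it sorts with plain `strcmp()` order rather than Git's tree order, which is enough for spotting identical names but not for writing a tree.

```c
#include <stdlib.h>
#include <string.h>

/* qsort() comparator over an array of C strings. */
static int cmp_names(const void *a, const void *b)
{
	return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/*
 * Sort 'names' in place and return the first name that occurs more
 * than once, or NULL if every name is unique. A caller that prefers
 * rejection over "last one wins" would die() on a non-NULL result.
 */
static const char *find_duplicate_name(const char **names, size_t nr)
{
	qsort(names, nr, sizeof(*names), cmp_names);
	for (size_t i = 1; i < nr; i++)
		if (!strcmp(names[i - 1], names[i]))
			return names[i];
	return NULL;
}
```

The check costs one extra pass over the sorted names.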
Patrick
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [PATCH 11/16] mktree: create tree using an in-core index
2024-06-11 18:24 ` [PATCH 11/16] mktree: create tree using an in-core index Victoria Dye via GitGitGadget
@ 2024-06-12 9:40 ` Patrick Steinhardt
0 siblings, 0 replies; 65+ messages in thread
From: Patrick Steinhardt @ 2024-06-12 9:40 UTC (permalink / raw)
To: Victoria Dye via GitGitGadget; +Cc: git, Victoria Dye
On Tue, Jun 11, 2024 at 06:24:43PM +0000, Victoria Dye via GitGitGadget wrote:
> From: Victoria Dye <vdye@github.com>
> diff --git a/builtin/mktree.c b/builtin/mktree.c
> index e9e2134136f..12f68187221 100644
> --- a/builtin/mktree.c
> +++ b/builtin/mktree.c
> @@ -24,6 +25,11 @@ struct tree_entry {
> char name[FLEX_ARRAY];
> };
>
> +static inline size_t df_path_len(size_t pathlen, unsigned int mode)
> +{
> + return S_ISDIR(mode) ? pathlen - 1 : pathlen;
I wonder whether we want to have a sanity check that ensures that the
path really does have a trailing slash.
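Such a check could look roughly like the sketch below. It is only an illustration: `df_path_len_checked()` is a hypothetical variant that also takes the path so it can inspect the last byte, the mode constants are redefined locally, and `assert()` stands in for whatever Git would really use (most likely its `BUG()` macro).

```c
#include <assert.h>
#include <stddef.h>

#define MODE_TYPE_MASK 0170000u
#define MODE_TREE      0040000u

/*
 * Like df_path_len(), but verify that a directory path really ends in
 * a slash before dropping it from the length.
 */
static size_t df_path_len_checked(const char *path, size_t pathlen,
				  unsigned mode)
{
	if ((mode & MODE_TYPE_MASK) != MODE_TREE)
		return pathlen;
	assert(pathlen > 0 && path[pathlen - 1] == '/');
	return pathlen - 1;
}
```

Besides catching a missing slash, the `pathlen > 0` test also guards against the unsigned underflow that `pathlen - 1` would otherwise hit on an empty path.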
> @@ -120,24 +137,43 @@ static void sort_and_dedup_tree_entry_array(struct tree_entry_array *arr)
> QSORT_S(arr->entries, arr->nr, ent_compare, &ignore_mode);
> }
>
> +static int add_tree_entry_to_index(struct index_state *istate,
> + struct tree_entry *ent)
> +{
> + struct cache_entry *ce;
> + struct strbuf ce_name = STRBUF_INIT;
> + strbuf_add(&ce_name, ent->name, ent->len);
> +
> + ce = make_cache_entry(istate, ent->mode, &ent->oid, ent->name, 0, 0);
> + if (!ce)
> + return error(_("make_cache_entry failed for path '%s'"), ent->name);
I noticed that `make_cache_entry()` will skip over index entries which
are up-to-date, and it will replace entries which are part of the index
but with different information. Is this the motivator for the preceding
commit where we start to overwrite duplicate entries?
Patrick
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [PATCH 12/16] mktree: use iterator struct to add tree entries to index
2024-06-11 18:24 ` [PATCH 12/16] mktree: use iterator struct to add tree entries to index Victoria Dye via GitGitGadget
@ 2024-06-12 9:40 ` Patrick Steinhardt
2024-06-13 18:38 ` Victoria Dye
0 siblings, 1 reply; 65+ messages in thread
From: Patrick Steinhardt @ 2024-06-12 9:40 UTC (permalink / raw)
To: Victoria Dye via GitGitGadget; +Cc: git, Victoria Dye
On Tue, Jun 11, 2024 at 06:24:44PM +0000, Victoria Dye via GitGitGadget wrote:
> From: Victoria Dye <vdye@github.com>
>
> Create 'struct tree_entry_iterator' to manage iteration through a 'struct
> tree_entry_array'. Using an iterator allows for conditional iteration; this
> functionality will be necessary in later commits when performing parallel
> iteration through multiple sets of tree entries.
>
> Signed-off-by: Victoria Dye <vdye@github.com>
> ---
> builtin/mktree.c | 40 +++++++++++++++++++++++++++++++++++++---
> 1 file changed, 37 insertions(+), 3 deletions(-)
>
> diff --git a/builtin/mktree.c b/builtin/mktree.c
> index 12f68187221..bee359e9978 100644
> --- a/builtin/mktree.c
> +++ b/builtin/mktree.c
> @@ -137,6 +137,38 @@ static void sort_and_dedup_tree_entry_array(struct tree_entry_array *arr)
> QSORT_S(arr->entries, arr->nr, ent_compare, &ignore_mode);
> }
>
> +struct tree_entry_iterator {
> + struct tree_entry *current;
> +
> + /* private */
> + struct {
> + struct tree_entry_array *arr;
> + size_t idx;
> + } priv;
> +};
> +
> +static void init_tree_entry_iterator(struct tree_entry_iterator *iter,
> + struct tree_entry_array *arr)
> +{
> + iter->priv.arr = arr;
> + iter->priv.idx = 0;
> + iter->current = 0 < arr->nr ? arr->entries[0] : NULL;
> +}
Nit: Same comment as before, I think these should rather be named
`tree_entry_iterator_init()` and `tree_entry_iterator_advance()`.
> +/*
> + * Advance the tree entry iterator to the next entry in the array. If no entries
> + * remain, 'current' is set to NULL. Returns the previous 'current' value of the
> + * iterator.
> + */
> +static struct tree_entry *advance_tree_entry_iterator(struct tree_entry_iterator *iter)
> +{
> + struct tree_entry *prev = iter->current;
> + iter->current = (iter->priv.idx + 1) < iter->priv.arr->nr
> + ? iter->priv.arr->entries[++iter->priv.idx]
> + : NULL;
> + return prev;
> +}
I think it's somewhat confusing to have this return a different value
than `current`. When I call `next()`, then I expect the iterator to
return the next item. And after having called `next()`, I expect that
the current value is the one that the previous call to `next()` has
returned.
To avoid confusion, I'd propose to get rid of the `current` member
altogether. It's not needed, as we already save the current index, and
dropping it avoids the confusion.
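One possible shape for an iterator without the `current` member is sketched below. The simplified structs and the `tree_entry_iterator_peek()` / `tree_entry_iterator_next()` names are hypothetical, not something the series defines; `peek` is included on the assumption that the parallel iteration mentioned in the commit message needs to look at an entry before deciding whether to consume it.

```c
#include <stddef.h>

/* Simplified stand-ins for the types in builtin/mktree.c. */
struct tree_entry {
	const char *name;
};

struct tree_entry_array {
	size_t nr;
	struct tree_entry **entries;
};

struct tree_entry_iterator {
	struct tree_entry_array *arr;
	size_t idx; /* index of the entry the next call will return */
};

static void tree_entry_iterator_init(struct tree_entry_iterator *iter,
				     struct tree_entry_array *arr)
{
	iter->arr = arr;
	iter->idx = 0;
}

/* Look at the upcoming entry without consuming it; NULL when exhausted. */
static struct tree_entry *
tree_entry_iterator_peek(const struct tree_entry_iterator *iter)
{
	return iter->idx < iter->arr->nr ? iter->arr->entries[iter->idx] : NULL;
}

/* Return the upcoming entry and step past it; NULL when exhausted. */
static struct tree_entry *
tree_entry_iterator_next(struct tree_entry_iterator *iter)
{
	struct tree_entry *ent = tree_entry_iterator_peek(iter);

	if (ent)
		iter->idx++;
	return ent;
}
```

Here the value returned by `next()` is always the entry that `peek()` showed just before the call, so there is no separate piece of state to keep in sync.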
Patrick
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [PATCH 14/16] mktree: optionally add to an existing tree
2024-06-11 18:24 ` [PATCH 14/16] mktree: optionally add to an existing tree Victoria Dye via GitGitGadget
@ 2024-06-12 9:40 ` Patrick Steinhardt
2024-06-12 19:50 ` Junio C Hamano
2024-06-17 19:23 ` Victoria Dye
0 siblings, 2 replies; 65+ messages in thread
From: Patrick Steinhardt @ 2024-06-12 9:40 UTC (permalink / raw)
To: Victoria Dye via GitGitGadget; +Cc: git, Victoria Dye
On Tue, Jun 11, 2024 at 06:24:46PM +0000, Victoria Dye via GitGitGadget wrote:
> diff --git a/Documentation/git-mktree.txt b/Documentation/git-mktree.txt
> index afbc846d077..99abd3c31a6 100644
> --- a/Documentation/git-mktree.txt
> +++ b/Documentation/git-mktree.txt
> @@ -40,6 +40,11 @@ OPTIONS
> optional. Note - if the `-z` option is used, lines are terminated
> with NUL.
>
> +<tree-ish>::
> + If provided, the tree entries provided in stdin are added to this tree
> + rather than a new empty one, replacing existing entries with identical
> + names. Not compatible with `--literally`.
I think it'd be a bit more intuitive if this were an option, like
`--base-tree=` or just `--base=`.
One question that comes up naturally in this context: when I have a base
tree, how do I remove entries from it?
Patrick
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [PATCH 07/16] mktree: use read_index_info to read stdin lines
2024-06-12 9:40 ` Patrick Steinhardt
@ 2024-06-12 18:35 ` Junio C Hamano
0 siblings, 0 replies; 65+ messages in thread
From: Junio C Hamano @ 2024-06-12 18:35 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: Victoria Dye via GitGitGadget, git, Victoria Dye
Patrick Steinhardt <ps@pks.im> writes:
> It makes perfect sense to not single out git-ls-tree(1) anymore. But I
> think we should help the reader a bit by continuing to point out which
> commands can be used as input here. That can be either here in the
> description, further down in the new "INPUT FORMAT" section, or in both
> places.
Here is a way to do so, which I alluded to earlier. The original
text is too specific to "update-index" in that it talked about
"stuffing them into the index", which does not apply in the context
of "mktree".
And then it made me realize that "ls-files -s" output has the stage
information, which of course is needed for "update-index" to be able
to recreate the index state from a textual dump, but "mktree" should
reject if given a higher stage entry.
It seems that the code after applying all these 16 patches does not
diagnose it as an error if you feed a non-zero stage. The callback
starts like so.
	static int mktree_line(unsigned int mode, struct object_id *oid,
			       enum object_type obj_type, int stage UNUSED,
			       const char *path, void *cbdata)
	{
I _think_ it should be made an error if the input has non-zero
stage, which would be a sign that it was taken from "ls-files -s"
(or even "ls-files -u"), out of which "git write-tree" will REFUSE
to create a tree object. "mktree" should behave the same way, no?
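A minimal sketch of such a check, pulled out of the callback so it stands alone: the helper name `mktree_line_check_stage()` and the message wording are made up for illustration, and `fprintf()` stands in for Git's `error()`.

```c
#include <stdio.h>

/*
 * Hypothetical helper for the top of the line callback: refuse any
 * entry whose stage is non-zero, since only fully merged (stage 0)
 * entries can be written into a tree. Returns 0 if the entry may be
 * used, -1 otherwise.
 */
static int mktree_line_check_stage(int stage, const char *path)
{
	if (stage) {
		fprintf(stderr,
			"error: path '%s' has stage %d; only stage 0 entries are allowed\n",
			path, stage);
		return -1;
	}
	return 0;
}
```

The callback would then drop the UNUSED annotation on `stage` and bail out early when the helper returns non-zero.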
In any case, here is the documentation split/refactor.
Documentation/git-mktree.txt | 4 +++-
Documentation/git-update-index.txt | 14 +-------------
Documentation/index-info-formats.txt | 13 +++++++++++++
3 files changed, 17 insertions(+), 14 deletions(-)
diff --git c/Documentation/git-mktree.txt w/Documentation/git-mktree.txt
index a660438c67..fefaa83d29 100644
--- c/Documentation/git-mktree.txt
+++ w/Documentation/git-mktree.txt
@@ -48,7 +48,9 @@ OPTIONS
INPUT FORMAT
------------
Tree entries may be specified in any of the formats compatible with the
-`--index-info` option to linkgit:git-update-index[1].
+`--index-info` option to linkgit:git-update-index[1]. That is:
+
+include::index-info-formats.txt[]
Entries may use full pathnames containing directory separators to specify
entries nested within one or more directories. These entries are inserted into
diff --git c/Documentation/git-update-index.txt w/Documentation/git-update-index.txt
index 7128aed540..2287a5d4be 100644
--- c/Documentation/git-update-index.txt
+++ w/Documentation/git-update-index.txt
@@ -280,19 +280,7 @@ USING --INDEX-INFO
multiple entry definitions from the standard input, and designed
specifically for scripts. It can take inputs of three formats:
- . mode SP type SP sha1 TAB path
-+
-This format is to stuff `git ls-tree` output into the index.
-
- . mode SP sha1 SP stage TAB path
-+
-This format is to put higher order stages into the
-index file and matches 'git ls-files --stage' output.
-
- . mode SP sha1 TAB path
-+
-This format is no longer produced by any Git command, but is
-and will continue to be supported by `update-index --index-info`.
+include::index-info-formats.txt[]
To place a higher stage entry to the index, the path should
first be removed by feeding a mode=0 entry for the path, and
diff --git c/Documentation/index-info-formats.txt w/Documentation/index-info-formats.txt
new file mode 100644
index 0000000000..037ebd2432
--- /dev/null
+++ w/Documentation/index-info-formats.txt
@@ -0,0 +1,13 @@
+ . mode SP type SP sha1 TAB path
++
+This format is to use `git ls-tree` output.
+
+ . mode SP sha1 SP stage TAB path
++
+This format allows higher order stages to appear and
+matches 'git ls-files --stage' output.
+
+ . mode SP sha1 TAB path
++
+This format is no longer produced by any Git command, but is
+and will continue to be supported.
^ permalink raw reply related [flat|nested] 65+ messages in thread
* Re: [PATCH 10/16] mktree: overwrite duplicate entries
2024-06-12 9:40 ` Patrick Steinhardt
@ 2024-06-12 18:48 ` Victoria Dye
0 siblings, 0 replies; 65+ messages in thread
From: Victoria Dye @ 2024-06-12 18:48 UTC (permalink / raw)
To: Patrick Steinhardt, Victoria Dye via GitGitGadget; +Cc: git
Patrick Steinhardt wrote:
> On Tue, Jun 11, 2024 at 06:24:42PM +0000, Victoria Dye via GitGitGadget wrote:
>> From: Victoria Dye <vdye@github.com>
>>
>> If multiple tree entries with the same name are provided as input to
>> 'mktree', only write the last one to the tree. Entries are considered
>> duplicates if they have identical names (*not* considering mode); if a blob
>> and a tree with the same name are provided, only the last one will be
>> written to the tree. A tree with duplicate entries is invalid (per 'git
>> fsck'), so that condition should be avoided wherever possible.
>>
>> Signed-off-by: Victoria Dye <vdye@github.com>
>> ---
>> Documentation/git-mktree.txt | 8 ++++---
>> builtin/mktree.c | 45 ++++++++++++++++++++++++++++++++----
>> t/t1010-mktree.sh | 36 +++++++++++++++++++++++++++--
>> 3 files changed, 80 insertions(+), 9 deletions(-)
>>
>> diff --git a/Documentation/git-mktree.txt b/Documentation/git-mktree.txt
>> index fb07e40cef0..afbc846d077 100644
>> --- a/Documentation/git-mktree.txt
>> +++ b/Documentation/git-mktree.txt
>> @@ -43,9 +43,11 @@ OPTIONS
>> INPUT FORMAT
>> ------------
>> Tree entries may be specified in any of the formats compatible with the
>> -`--index-info` option to linkgit:git-update-index[1]. The order of the tree
>> -entries is normalized by `mktree` so pre-sorting the input by path is not
>> -required.
>> +`--index-info` option to linkgit:git-update-index[1].
>> +
>> +The order of the tree entries is normalized by `mktree` so pre-sorting the input
>> +by path is not required. Multiple entries provided with the same path are
>> +deduplicated, with only the last one specified added to the tree.
>
> Hm. I'm not sure whether this is a good idea. With git-mktree(1) being
> part of our plumbing layer, you can expect that it's mostly going to be
> fed input from scripts. And any script that generates duplicate tree
> entries is broken, but we now start to paper over such brokenness
> without giving the user any indicator of this. As a user of git-mktree(1)
> in Gitaly I can certainly say that I'd rather see it die instead
> of silently fixing my inputs so that I start to notice my own bugs.
'git mktree' already does some cleaning of the inputs by sorting the
entries, presumably so that a valid tree is created rather than one with
ordering errors. Deduplication is also a cleanup of user inputs to ensure a
valid tree is created, so to me it's a consistent extension to existing
behavior. Conversely, rejecting the inputs and failing would be introducing
an error scenario where none existed previously, which to me would be a
bigger deviation.
One potential way to get the kind of functionality you're looking for,
though, might be to combine something like '--literally' and a '--strict'
that validates the tree before writing. Like I mentioned in the cover letter
[1], I do plan to submit a follow-up series with '--strict' (it's just that
this series is already pretty long and it would add 4-ish more patches).
[1] https://lore.kernel.org/git/pull.1746.git.1718130288.gitgitgadget@gmail.com/
> So without seeing a strong motivating use case for this feature I'd think
> that git-mktree(1) should reject such inputs and return an error such
> that the user can fix their tooling.
Practically, there are a couple of reasons that led me to wanting this
behavior. One is that it allows using data structures with more rigid
integrity checks (like the index & cache tree). The other is that, once the
ability to add nested entries is introduced, the concept of a "duplicate"
gets fuzzier and blocking them entirely could lead to inconsistencies and/or
limited flexibility. If, for example, a user wants to create a tree with a
directory 'folder1/' with OID '0123456789012345678901234567890123456789',
but update a blob 'folder1/file1' in it to OID
'0987654321098765432109876543210987654321', the latter is technically a
"duplicate", but rejecting it would make it impossible to create the tree
without first expanding 'folder1/' with something like 'ls-tree', replacing the
appropriate entry, then calling 'mktree'.
>
> Patrick
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [PATCH 09/16] mktree: validate paths more carefully
2024-06-12 2:26 ` Junio C Hamano
@ 2024-06-12 19:01 ` Victoria Dye
2024-06-12 19:45 ` Junio C Hamano
0 siblings, 1 reply; 65+ messages in thread
From: Victoria Dye @ 2024-06-12 19:01 UTC (permalink / raw)
To: Junio C Hamano, Victoria Dye via GitGitGadget; +Cc: git
Junio C Hamano wrote:
> "Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>> Also,
>> remove trailing slashes on directories before validating, allowing users to
>> provide 'folder-name/' as the path for a tree object entry.
>
> Is that a good idea for a plumbing command like this? We would
> silently accept these after silently stripping the trailing slash?
>
> 040000 tree 82a33d5150d9316378ef1955a49f2a5bf21aaeb2 templates/
> 100644 blob 1f89ffab4c32bc02b5d955851401628a5b9a540e thread-utils.c/
>
> The former _might_ count as "usability improvement", but if we are
> doing the same for the latter we might be going a bit too lenient.
The trailing slashes are only ignored on tree entries (with mode 040000), so
the latter case would not be allowed (and triggers a 'die()' as it would
today).
>
> Let's see what really happens in the code.
>
>> @@ -49,10 +50,23 @@ static void append_to_tree(unsigned mode, struct object_id *oid, const char *pat
>> {
>> struct tree_entry *ent;
>> size_t len = strlen(path);
>> - if (!literally && strchr(path, '/'))
>> - die("path %s contains slash", path);
>>
>> - FLEX_ALLOC_MEM(ent, name, path, len);
>> + if (literally) {
>> + FLEX_ALLOC_MEM(ent, name, path, len);
>> + } else {
>> + /* Normalize and validate entry path */
>> + if (S_ISDIR(mode)) {
>> + while(len > 0 && is_dir_sep(path[len - 1]))
>> + len--;
>> + }
>
> Leave a single SP after "while", please.
Ah, sorry about that, thanks for catching it.
> We do this only to subtree entries, and all trailing slashes, not
> just a single one. OK, but I am not sure if the extra leniency is a
> good idea to begin with. "ls-tree" output does not have such
> trailing slashes, so it is unclear whom we are trying to be extra
> nice with this.
It might be a bit niche, but 'git ls-files -s --sparse' does print
directories with a trailing slash, and in a format that is otherwise
accepted by the command after switching to 'read_index_info' for input
parsing.
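For illustration, the rule described in this reply (strip trailing slashes for tree entries only, then reject any remaining slash) can be sketched in standalone C. This is a minimal sketch, not the code from the patch: 'sketch_is_dir_sep', 'sketch_normalized_len' and 'sketch_has_slash' are hypothetical names, and '/' is assumed to be the only directory separator.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/stat.h>

/* Stand-in for Git's is_dir_sep(); assumes '/' is the only separator. */
static int sketch_is_dir_sep(char c)
{
	return c == '/';
}

/*
 * Return the length of 'path' to use as the entry name. Trailing
 * separators are dropped for tree entries (mode 040000) only; all
 * other modes keep the path length unchanged.
 */
static size_t sketch_normalized_len(unsigned int mode, const char *path)
{
	size_t len;

	assert(path);
	len = strlen(path);
	if (S_ISDIR(mode)) {
		while (len > 0 && sketch_is_dir_sep(path[len - 1]))
			len--;
	}
	return len;
}

/* Return 1 if the first 'len' bytes of 'path' contain a slash. */
static int sketch_has_slash(const char *path, size_t len)
{
	return memchr(path, '/', len) != NULL;
}
```

With this sketch, 'templates/' under mode 040000 normalizes to a slash-free name, while 'thread-utils.c/' under mode 100644 keeps its slash and would still be rejected by the caller.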
* Re: [PATCH 09/16] mktree: validate paths more carefully
2024-06-12 19:01 ` Victoria Dye
@ 2024-06-12 19:45 ` Junio C Hamano
0 siblings, 0 replies; 65+ messages in thread
From: Junio C Hamano @ 2024-06-12 19:45 UTC (permalink / raw)
To: Victoria Dye; +Cc: Victoria Dye via GitGitGadget, git
Victoria Dye <vdye@github.com> writes:
> It might be a bit niche, but 'git ls-files -s --sparse' does print
> directories with a trailing slash, ...
OK.
* Re: [PATCH 14/16] mktree: optionally add to an existing tree
2024-06-12 9:40 ` Patrick Steinhardt
@ 2024-06-12 19:50 ` Junio C Hamano
2024-06-17 19:23 ` Victoria Dye
1 sibling, 0 replies; 65+ messages in thread
From: Junio C Hamano @ 2024-06-12 19:50 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: Victoria Dye via GitGitGadget, git, Victoria Dye
Patrick Steinhardt <ps@pks.im> writes:
> On Tue, Jun 11, 2024 at 06:24:46PM +0000, Victoria Dye via GitGitGadget wrote:
>> diff --git a/Documentation/git-mktree.txt b/Documentation/git-mktree.txt
>> index afbc846d077..99abd3c31a6 100644
>> --- a/Documentation/git-mktree.txt
>> +++ b/Documentation/git-mktree.txt
>> @@ -40,6 +40,11 @@ OPTIONS
>> optional. Note - if the `-z` option is used, lines are terminated
>> with NUL.
>>
>> +<tree-ish>::
>> + If provided, the tree entries provided in stdin are added to this tree
>> + rather than a new empty one, replacing existing entries with identical
>> + names. Not compatible with `--literally`.
>
> I think it'd be a bit more intuitive if this were an option, like
> `--base-tree=` or just `--base=`.
>
> One question that comes up naturally in this context: when I have a base
> tree, how do I remove entries from it?
Presumably the same way you remove entries with "update-index --index-info"?
I.e. mode=0 entry in the input would serve as a signal to remove the
path?
* Re: [PATCH 12/16] mktree: use iterator struct to add tree entries to index
2024-06-12 9:40 ` Patrick Steinhardt
@ 2024-06-13 18:38 ` Victoria Dye
0 siblings, 0 replies; 65+ messages in thread
From: Victoria Dye @ 2024-06-13 18:38 UTC (permalink / raw)
To: Patrick Steinhardt, Victoria Dye via GitGitGadget; +Cc: git
Patrick Steinhardt wrote:
> On Tue, Jun 11, 2024 at 06:24:44PM +0000, Victoria Dye via GitGitGadget wrote:
>> From: Victoria Dye <vdye@github.com>
>>
>> Create 'struct tree_entry_iterator' to manage iteration through a 'struct
>> tree_entry_array'. Using an iterator allows for conditional iteration; this
>> functionality will be necessary in later commits when performing parallel
>> iteration through multiple sets of tree entries.
>>
>> Signed-off-by: Victoria Dye <vdye@github.com>
>> ---
>> builtin/mktree.c | 40 +++++++++++++++++++++++++++++++++++++---
>> 1 file changed, 37 insertions(+), 3 deletions(-)
>>
>> diff --git a/builtin/mktree.c b/builtin/mktree.c
>> index 12f68187221..bee359e9978 100644
>> --- a/builtin/mktree.c
>> +++ b/builtin/mktree.c
>> @@ -137,6 +137,38 @@ static void sort_and_dedup_tree_entry_array(struct tree_entry_array *arr)
>> QSORT_S(arr->entries, arr->nr, ent_compare, &ignore_mode);
>> }
>>
>> +struct tree_entry_iterator {
>> + struct tree_entry *current;
>> +
>> + /* private */
>> + struct {
>> + struct tree_entry_array *arr;
>> + size_t idx;
>> + } priv;
>> +};
>> +
>> +static void init_tree_entry_iterator(struct tree_entry_iterator *iter,
>> + struct tree_entry_array *arr)
>> +{
>> + iter->priv.arr = arr;
>> + iter->priv.idx = 0;
>> + iter->current = 0 < arr->nr ? arr->entries[0] : NULL;
>> +}
>
> Nit: Same comment as before, I think these should rather be named
> `tree_entry_iterator_init()` and `tree_entry_iterator_advance()`.
That works for me. I'm not attached to the naming convention I used and your
justification for changing it in [1] is reasonable.
[1] https://lore.kernel.org/git/ZmltDQ5SlVvrEDGP@tanuki/
>> +/*
>> + * Advance the tree entry iterator to the next entry in the array. If no entries
>> + * remain, 'current' is set to NULL. Returns the previous 'current' value of the
>> + * iterator.
>> + */
>> +static struct tree_entry *advance_tree_entry_iterator(struct tree_entry_iterator *iter)
>> +{
>> + struct tree_entry *prev = iter->current;
>> + iter->current = (iter->priv.idx + 1) < iter->priv.arr->nr
>> + ? iter->priv.arr->entries[++iter->priv.idx]
>> + : NULL;
>> + return prev;
>> +}
>
> I think it's somewhat confusing to have this return a different value
> than `current`. When I call `next()`, then I expect the iterator to
> return the next item. And after having called `next()`, I expect that
> the current value is the one that the previous call to `next()` has
> returned.
I do see how it's confusing. I was attempting to mimic the various
array/stack "pop" methods throughout the codebase (which return the "popped"
value while moving the stack pointer), but that doesn't really work here
with an iterator.
The only real benefit of this was that it simplified a loop somewhere later
on, but not by a ton. I'll drop the 'tree_entry *' return value from the
method and access 'iter->current' directly where it's needed.
> To avoid confusion, I'd propose to get rid of the `current` member
> altogether. It's not needed as we already save the current index and
> avoids the confusion.
The idea of the iterator is to have callers only ever reference the
'current' value to avoid needing to deal with the array & current index
directly; I find that it majorly simplifies the parallel iteration through
the base tree and entry array in [2]. IOW, in a language with support for
it, 'idx' would be private & 'current' would be public. So I would like to
keep the 'current' value as the publicly-accessible way of interacting with
the iterator (although, as mentioned above, I'm happy to drop it from the
'advance' method return value).
[2] https://lore.kernel.org/git/df0c50dfea3cb77e0070246efdf7a3f070b2ad97.1718130288.git.gitgitgadget@gmail.com/
>
> Patrick
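The iterator shape agreed on in this exchange (a public 'current' member, private array and index, and an 'advance' that returns nothing) might look roughly like the following. This is only a sketch under assumptions: 'struct tree_entry' and 'struct tree_entry_array' are reduced to stand-ins, and 'sketch_count_entries' is a hypothetical caller added to show the loop.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the structures in builtin/mktree.c. */
struct tree_entry {
	unsigned int mode;
	const char *name;
};

struct tree_entry_array {
	size_t nr;
	struct tree_entry **entries;
};

struct tree_entry_iterator {
	struct tree_entry *current; /* public: NULL once exhausted */

	/* private */
	struct {
		struct tree_entry_array *arr;
		size_t idx;
	} priv;
};

static void tree_entry_iterator_init(struct tree_entry_iterator *iter,
				     struct tree_entry_array *arr)
{
	assert(iter && arr);
	iter->priv.arr = arr;
	iter->priv.idx = 0;
	iter->current = arr->nr ? arr->entries[0] : NULL;
}

/* Move to the next entry; 'current' becomes NULL when none remain. */
static void tree_entry_iterator_advance(struct tree_entry_iterator *iter)
{
	assert(iter);
	if (iter->priv.idx + 1 < iter->priv.arr->nr)
		iter->current = iter->priv.arr->entries[++iter->priv.idx];
	else
		iter->current = NULL;
}

/* Hypothetical caller: count entries by looking at 'current' only. */
static size_t sketch_count_entries(struct tree_entry_array *arr)
{
	struct tree_entry_iterator iter;
	size_t n = 0;

	for (tree_entry_iterator_init(&iter, arr); iter.current;
	     tree_entry_iterator_advance(&iter))
		n++;
	return n;
}
```

The caller never touches 'priv', which is the property argued for above: 'current' is the only thing a loop needs to read.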
* Re: [PATCH 14/16] mktree: optionally add to an existing tree
2024-06-12 9:40 ` Patrick Steinhardt
2024-06-12 19:50 ` Junio C Hamano
@ 2024-06-17 19:23 ` Victoria Dye
1 sibling, 0 replies; 65+ messages in thread
From: Victoria Dye @ 2024-06-17 19:23 UTC (permalink / raw)
To: Patrick Steinhardt, Victoria Dye via GitGitGadget; +Cc: git
Patrick Steinhardt wrote:
> On Tue, Jun 11, 2024 at 06:24:46PM +0000, Victoria Dye via GitGitGadget wrote:
>> diff --git a/Documentation/git-mktree.txt b/Documentation/git-mktree.txt
>> index afbc846d077..99abd3c31a6 100644
>> --- a/Documentation/git-mktree.txt
>> +++ b/Documentation/git-mktree.txt
>> @@ -40,6 +40,11 @@ OPTIONS
>> optional. Note - if the `-z` option is used, lines are terminated
>> with NUL.
>>
>> +<tree-ish>::
>> + If provided, the tree entries provided in stdin are added to this tree
>> + rather than a new empty one, replacing existing entries with identical
>> + names. Not compatible with `--literally`.
>
> I think it'd be a bit more intuitive if this were an option, like
> `--base-tree=` or just `--base=`.
To me, the positional '<tree-ish>' is more intuitive; it's reminiscent of
'read-tree' (but with '--empty' being the default, since there's no
equivalent to the existing index to overwrite). I consider 'read-tree'
relevant in this case because the updated 'mktree' allows users to create
trees like:
$ git read-tree <tree-ish>
$ git update-index <entries
$ git write-tree
without the intermediate on-disk index. Conversely, there isn't really an
equivalent option to base the name on ('--base' is a bit overloaded, as it
typically refers to a merge/diff base), and I'd like to avoid adding more
potentially-confusing names to the overall Git UX if I can help it (even if
this is a plumbing command).
However, looking at other command documentation, I should at least drop
'[--]' from the usage string. While that is a separator used to signify "end
of options" using 'parse_options()', it's typically only included in the
usage string to separate different sets of positional arguments (e.g.
revisions from pathspecs).
>
> One question that comes up naturally in this context: when I have a base
> tree, how do I remove entries from it?
In patch 16 [1], entries with mode "0" are removed from the tree (similar to
'update-index').
[1] https://lore.kernel.org/git/a90d6d0c943283e9e7bd181cd6e9bb6d4572aaeb.1718130288.git.gitgitgadget@gmail.com/
>
> Patrick
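To make the base-tree semantics concrete (input entries replace base entries with the same name, and mode "0" removes them), here is a rough single-level model in C. It is a sketch under stated assumptions, not the series' implementation: 'struct sketch_entry' and 'sketch_apply_updates' are hypothetical, both lists are assumed to be sorted with plain 'strcmp' and free of duplicates, and Git's real tree ordering and nested paths are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct sketch_entry {
	unsigned int mode; /* 0 means "remove this path" */
	const char *name;
};

/*
 * Merge 'base' and 'updates' (both sorted by name, no duplicates) into
 * 'out', which must have room for base_nr + updates_nr entries.
 * Returns the number of entries written.
 */
static size_t sketch_apply_updates(const struct sketch_entry *base, size_t base_nr,
				   const struct sketch_entry *updates, size_t updates_nr,
				   struct sketch_entry *out)
{
	size_t b = 0, u = 0, n = 0;

	assert(out);
	while (b < base_nr || u < updates_nr) {
		int cmp;

		if (b == base_nr)
			cmp = 1;  /* only updates remain */
		else if (u == updates_nr)
			cmp = -1; /* only base entries remain */
		else
			cmp = strcmp(base[b].name, updates[u].name);

		if (cmp < 0) {
			out[n++] = base[b++]; /* untouched base entry */
		} else {
			if (updates[u].mode)
				out[n++] = updates[u]; /* add or replace */
			u++;
			if (!cmp)
				b++; /* base entry was replaced or removed */
		}
	}
	return n;
}
```

A mode-0 update for a name that is absent from the base is simply skipped, matching the "remove this entry, if it exists" wording in the cover letter.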
* Re: [PATCH 05/16] index-info.c: identify empty input lines in read_index_info
2024-06-11 22:52 ` Junio C Hamano
@ 2024-06-18 17:33 ` Victoria Dye
0 siblings, 0 replies; 65+ messages in thread
From: Victoria Dye @ 2024-06-18 17:33 UTC (permalink / raw)
To: Junio C Hamano, Victoria Dye via GitGitGadget; +Cc: git
Junio C Hamano wrote:
> "Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>> From: Victoria Dye <vdye@github.com>
>>
>> Update 'read_index_info()' to return INDEX_INFO_EMPTY_LINE (value 1), rather
>> than the default error code (value -1) when the function encounters an empty
>> line in stdin. This grants the caller the flexibility to handle such
>> scenarios differently than a typical error. In the case of 'update-index',
>> we'll still exit with a "malformed input line" error. However, when
>> 'read_index_info()' is used to process the input to 'mktree' in a later
>> patch, the empty line return value will signal a new tree in --batch mode.
>
> Interesting. We could even introduce "# commented input" but that
> is a different story ;-).
>
> I also wonder if we can flip it around and teach read_index_info()
> to (1) silently accept and do a callback when it recognises the
> input line is one of the supported formats, and (2) send any
> unrecognised line, not just an empty one, with "unrecognised" status
> code. That way, the caller can handle more than single kind of
> "special input line" more easily, perhaps?
This is an interesting idea. The simplest way to do this would probably be to
have 'bad_line' stop printing the "malformed input line" error and instead
return INDEX_INFO_UNRECOGNIZED_LINE, and pass in a strbuf so that the
"malformed" line is available to the caller.
That seems simple enough, so I'll include it in V2; I can always revert back
to INDEX_INFO_EMPTY_LINE if the generalized approach doesn't work out for
some reason.
>
> Thanks.
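The "hand unrecognized lines back to the caller" idea can be sketched outside of Git as follows. Everything here is an assumption made for illustration: the parser only understands "<octal mode> SP <path>", a fixed-size char buffer stands in for the strbuf, lines longer than the buffer are not handled, and 'sketch_read_entries', 'sketch_entry_fn' and 'SKETCH_UNRECOGNIZED_LINE' are hypothetical names rather than the real 'read_index_info()' interface.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_UNRECOGNIZED_LINE 1

/* Callbacks return 0 on success and a negative value on error. */
typedef int (*sketch_entry_fn)(unsigned int mode, const char *path, void *cbdata);

/*
 * Read "<octal mode> SP <path>" lines from 'in', calling 'fn' for each.
 * Returns 0 at EOF (with 'line' emptied), the callback's non-zero value
 * if it fails, or SKETCH_UNRECOGNIZED_LINE with the offending line left
 * in 'line' for the caller to inspect.
 */
static int sketch_read_entries(FILE *in, sketch_entry_fn fn, void *cbdata,
			       char *line, size_t line_size)
{
	assert(in && fn && line && line_size);
	while (fgets(line, (int)line_size, in)) {
		char *end;
		unsigned long mode;
		int ret;

		line[strcspn(line, "\n")] = '\0'; /* drop the newline */
		mode = strtoul(line, &end, 8);
		if (end == line || *end != ' ' || !end[1])
			return SKETCH_UNRECOGNIZED_LINE;
		ret = fn((unsigned int)mode, end + 1, cbdata);
		if (ret)
			return ret;
	}
	line[0] = '\0';
	return 0;
}

/* Example callback: count the entries seen. */
static int sketch_count_entry(unsigned int mode, const char *path, void *cbdata)
{
	(void)mode;
	(void)path;
	++*(size_t *)cbdata;
	return 0;
}
```

A caller in the style of 'update-index' would turn the unrecognized-line return into a "malformed input line" error, while a caller in the style of 'mktree --batch' could check whether the returned line is empty and, if so, finish the current tree and call the reader again.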
* [PATCH v2 00/17] mktree: support more flexible usage
2024-06-11 18:24 [PATCH 00/16] mktree: support more flexible usage Victoria Dye via GitGitGadget
` (15 preceding siblings ...)
2024-06-11 18:24 ` [PATCH 16/16] mktree: remove entries when mode is 0 Victoria Dye via GitGitGadget
@ 2024-06-19 21:57 ` Victoria Dye via GitGitGadget
2024-06-19 21:57 ` [PATCH v2 01/17] mktree: use OPT_BOOL Victoria Dye via GitGitGadget
` (17 more replies)
16 siblings, 18 replies; 65+ messages in thread
From: Victoria Dye via GitGitGadget @ 2024-06-19 21:57 UTC (permalink / raw)
To: git; +Cc: Eric Sunshine, Patrick Steinhardt, Victoria Dye
The goal of this series is to make 'git mktree' a much more flexible and
powerful tool for constructing arbitrary trees in memory without the use of
an index or worktree. The main additions are:
* Using an optional "base tree" to add or replace entries in an existing
tree rather than creating a new one from scratch.
* Building off of this, having entries with mode "0" indicate "remove
this entry, if it exists, from the tree"
* Handling tree entries inside of subtrees (e.g., folder1/my-file.txt)
It also introduces some quality-of-life updates:
* Using the same input parsing as 'update-index' to allow a wider variety
of tree entry formats.
* Adding deduplication of input entries & more thorough validation of
inputs (with an option to disable both - plus input sorting - if desired
with '--literally').
The implementation change underpinning the new features is completely
revamping how the tree is constructed in memory. Instead of writing a single
tree object into a strbuf and hashing it into the object database, we
construct an in-core sparse index and write out the root tree, as well as
any new subtrees, using the cache tree infrastructure.
The series is organized as follows:
* Commits 1-3 contain miscellaneous small renames/refactors to make the
code more readable & prepare for larger refactoring later.
* Commits 4-7 generalize the input parsing performed by 'read_index_info()'
in 'update-index' and update 'mktree' to use it.
* Commit 8 removes the check on object existence & type from submodule
entries.
* Commit 9 adds the '--literally' option to 'mktree'. Practically, this
option allows tests that currently use 'mktree' to generate corrupt trees
to continue functioning after we strengthen input validations.
* Commits 10 & 11 add input path validation & entry deduplication,
respectively.
* Commit 12 replaces the strbuf-to-object tree creation with construction
of an in-core index & writing out the cache tree.
* Commits 13-15 add the ability to add tree entries to an existing "base"
tree. Takes 3 commits to do it because it requires a bit of finesse
around directory/file deduplication and iterating over a tree with
'read_tree()' with a parallel iteration over the input tree entries.
* Commit 16 allows for deeper paths in the input.
* Commit 17 adds handling for mode '0' as "removal" entries.
I also plan to add a '--strict' option that runs 'fsck' checks on the new
tree(s) before writing to the object database (similar to 'mktag
--strict'), but this series is pretty long as it is and that part can easily
be separated out into its own series.
Changes since V1
================
* Renamed the 'tree_entry_array' & 'tree_entry_iterator' functions to use
'tree_entry_array_' & 'tree_entry_iterator_' prefixes, respectively (e.g.
'clear_tree_entry_array' is now 'tree_entry_array_clear').
* Removed the return value from 'tree_entry_iterator_advance'.
* Added call to 'tree_entry_array_clear' in 'tree_entry_array_release',
updated both methods to optionally free tree entries based on a
'free_entries' arg.
* Updated 'read_index_info()':
* Replaced INDEX_INFO_EMPTY_LINE with INDEX_INFO_UNRECOGNIZED_LINE which
is returned when any "malformed" line is found, setting an input strbuf
to the line contents for the caller to deal with.
* Updated default object type value (when no type info is specified) to
'OBJ_ANY' from 'OBJ_NONE'.
* Updated 'mktree' to 'die()' on malformed line with "input format error",
rather than 'error()' with "malformed input line", to avoid unnecessarily
changing the error message.
* Removed check for object existence & type when the entry type is a
submodule, rearranged checks for readability.
* Replaced use of 'grep' with 'test_grep' in new/updated tests.
* Wrapped lines in documentation updates to 76 characters.
* Applied documentation refactor patch in [1].
* Dropped '[--]' from the 'mktree' usage string.
[1] https://lore.kernel.org/git/xmqqle3aovpq.fsf@gitster.g/
Thanks!
* Victoria
Victoria Dye (17):
mktree: use OPT_BOOL
mktree: rename treeent to tree_entry
mktree: use non-static tree_entry array
update-index: generalize 'read_index_info'
index-info.c: return unrecognized lines to caller
index-info.c: parse object type in provided in read_index_info
mktree: use read_index_info to read stdin lines
mktree.c: do not fail on mismatched submodule type
mktree: add a --literally option
mktree: validate paths more carefully
mktree: overwrite duplicate entries
mktree: create tree using an in-core index
mktree: use iterator struct to add tree entries to index
mktree: add directory-file conflict hashmap
mktree: optionally add to an existing tree
mktree: allow deeper paths in input
mktree: remove entries when mode is 0
Documentation/git-mktree.txt | 50 ++-
Documentation/git-update-index.txt | 16 +-
Documentation/index-info-formats.txt | 13 +
Makefile | 1 +
builtin/mktree.c | 592 ++++++++++++++++++++++-----
builtin/update-index.c | 135 ++----
index-info.c | 100 +++++
index-info.h | 15 +
t/t1010-mktree.sh | 350 +++++++++++++++-
t/t1014-read-tree-confusing.sh | 6 +-
t/t1450-fsck.sh | 4 +-
t/t1601-index-bogus.sh | 2 +-
t/t1700-split-index.sh | 6 +-
t/t2107-update-index-basic.sh | 32 ++
t/t7008-filter-branch-null-sha1.sh | 6 +-
t/t7417-submodule-path-url.sh | 2 +-
t/t7450-bad-git-dotfiles.sh | 8 +-
17 files changed, 1088 insertions(+), 250 deletions(-)
create mode 100644 Documentation/index-info-formats.txt
create mode 100644 index-info.c
create mode 100644 index-info.h
base-commit: 8d94cfb54504f2ec9edc7ca3eb5c29a3dd3675ae
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1746%2Fvdye%2Fvdye%2Fmktree-recursive-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1746/vdye/vdye/mktree-recursive-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/1746
Range-diff vs v1:
1: 074dc98acc7 = 1: 074dc98acc7 mktree: use OPT_BOOL
2: 4558f35e7bf = 2: 4558f35e7bf mktree: rename treeent to tree_entry
3: 5ade145352f ! 3: d0d5523a32b mktree: use non-static tree_entry array
@@ builtin/mktree.c
+ size_t nr, alloc;
+ struct tree_entry **entries;
+};
-
--static void append_to_tree(unsigned mode, struct object_id *oid, char *path)
++
+static void tree_entry_array_push(struct tree_entry_array *arr, struct tree_entry *ent)
+{
+ ALLOC_GROW(arr->entries, arr->nr + 1, arr->alloc);
+ arr->entries[arr->nr++] = ent;
+}
+
-+static void clear_tree_entry_array(struct tree_entry_array *arr)
++static void tree_entry_array_clear(struct tree_entry_array *arr, int free_entries)
+{
-+ for (size_t i = 0; i < arr->nr; i++)
-+ FREE_AND_NULL(arr->entries[i]);
++ if (free_entries) {
++ for (size_t i = 0; i < arr->nr; i++)
++ FREE_AND_NULL(arr->entries[i]);
++ }
+ arr->nr = 0;
+}
-+
-+static void release_tree_entry_array(struct tree_entry_array *arr)
+
+-static void append_to_tree(unsigned mode, struct object_id *oid, char *path)
++static void tree_entry_array_release(struct tree_entry_array *arr, int free_entries)
+{
++ tree_entry_array_clear(arr, free_entries);
+ FREE_AND_NULL(arr->entries);
-+ arr->nr = arr->alloc = 0;
++ arr->alloc = 0;
+}
+
+static void append_to_tree(unsigned mode, struct object_id *oid, const char *path,
@@ builtin/mktree.c: static void append_to_tree(unsigned mode, struct object_id *oi
- ALLOC_GROW(entries, used + 1, alloc);
- entries[used++] = ent;
-+ /* Append the update */
+ tree_entry_array_push(arr, ent);
}
@@ builtin/mktree.c: int cmd_mktree(int ac, const char **av, const char *prefix)
fflush(stdout);
}
- used=0; /* reset tree entry buffer for re-use in batch mode */
-+ clear_tree_entry_array(&arr); /* reset tree entry buffer for re-use in batch mode */
++ tree_entry_array_clear(&arr, 1); /* reset tree entry buffer for re-use in batch mode */
}
+
-+ release_tree_entry_array(&arr);
++ tree_entry_array_release(&arr, 1);
strbuf_release(&sb);
return 0;
}
4: 9d0689e9c28 ! 4: f5473764236 update-index: generalize 'read_index_info'
@@ Commit message
callback 'apply_index_info()' to verify the parsed line and update the index
according to its contents.
- The input parsing done by 'read_index_info()' is similar to, but more
- flexible than, the parsing done in 'mktree' by 'mktree_line()' (handling not
- only 'git ls-tree' output but also the outputs of 'git apply --index-info'
- and 'git ls-files --stage' outputs). To make 'mktree' more flexible, a later
- patch will replace mktree's custom parsing with 'read_index_info()'.
+ Switching to using a callback to validate the parsed entry in 'update-index'
+ results in a slight change to the error message indicating a file could not
+ be removed from the index. The original implementation uses the raw, quoted
+ pathname in the error message, whereas the callback (without access to the
+ raw pathname) uses the unquoted value. However, this change makes the failed
+ removal message consistent with all other error messages in the function,
+ and that consistency is likely more beneficial than not to a user.
+ The motivation for this change is to consolidate the already-similar input
+ parsing logic in 'git update-index' and 'git mktree', avoiding code
+ duplication and the associated maintenance burden. The input formats
+ accepted by 'update-index' are a superset of those accepted by 'mktree', so
+ in a later commit we can replace the input parsing of the latter with
+ 'read_index_info()' without breaking existing usage.
+
+ Co-authored-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Victoria Dye <vdye@github.com>
+ ## Documentation/git-update-index.txt ##
+@@ Documentation/git-update-index.txt: USING --INDEX-INFO
+
+ `--index-info` is a more powerful mechanism that lets you feed
+ multiple entry definitions from the standard input, and designed
+-specifically for scripts. It can take inputs of three formats:
++specifically for scripts. It can take inputs in the following formats:
+
+- . mode SP type SP sha1 TAB path
+-+
+-This format is to stuff `git ls-tree` output into the index.
+-
+- . mode SP sha1 SP stage TAB path
+-+
+-This format is to put higher order stages into the
+-index file and matches 'git ls-files --stage' output.
+-
+- . mode SP sha1 TAB path
+-+
+-This format is no longer produced by any Git command, but is
+-and will continue to be supported by `update-index --index-info`.
++include::index-info-formats.txt[]
+
+ To place a higher stage entry to the index, the path should
+ first be removed by feeding a mode=0 entry for the path, and
+
+ ## Documentation/index-info-formats.txt (new) ##
+@@
++ . mode SP type SP sha1 TAB path
+++
++This format is to use `git ls-tree` output.
++
++ . mode SP sha1 SP stage TAB path
+++
++This format allows higher order stages to appear and
++matches 'git ls-files --stage' output.
++
++ . mode SP sha1 TAB path
+++
++This format is no longer produced by any Git command, but is
++and will continue to be supported.
+
## Makefile ##
@@ Makefile: LIB_OBJS += hex.o
LIB_OBJS += hex-ll.o
@@ builtin/update-index.c: static void update_one(const char *path)
static const char * const update_index_usage[] = {
@@ builtin/update-index.c: static enum parse_opt_result stdin_cacheinfo_callback(
+ struct parse_opt_ctx_t *ctx, const struct option *opt,
const char *arg, int unset)
{
- int *nul_term_line = opt->value;
-+ int ret;
+- int *nul_term_line = opt->value;
++ int ret = 0;
BUG_ON_OPT_NEG(unset);
BUG_ON_OPT_ARG(arg);
-@@ builtin/update-index.c: static enum parse_opt_result stdin_cacheinfo_callback(
- if (ctx->argc != 1)
- return error("option '%s' must be the last argument", opt->long_name);
- allow_add = allow_replace = allow_remove = 1;
+
+- if (ctx->argc != 1)
+- return error("option '%s' must be the last argument", opt->long_name);
+- allow_add = allow_replace = allow_remove = 1;
- read_index_info(*nul_term_line);
-+ ret = read_index_info(*nul_term_line, apply_index_info, NULL);
-+ if (ret)
-+ return -1;
+- return 0;
++ if (ctx->argc != 1) {
++ ret = error("option '%s' must be the last argument", opt->long_name);
++ } else {
++ int *nul_term_line = opt->value;
+
- return 0;
++ allow_add = allow_replace = allow_remove = 1;
++ ret = read_index_info(*nul_term_line, apply_index_info, NULL);
++ if (ret)
++ ret = -1;
++ }
++
++ return ret;
}
+ static enum parse_opt_result stdin_callback(
## index-info.c (new) ##
@@
@@ index-info.c (new)
+ continue;
+
+ bad_line:
-+ ret = error("malformed input line '%s'", buf.buf);
-+ break;
++ die("malformed input line '%s'", buf.buf);
+ }
+ strbuf_release(&buf);
+ strbuf_release(&uq);
@@ t/t2107-update-index-basic.sh: test_expect_success '--index-version' '
+ # empty line
+ echo "" |
+ test_must_fail git update-index --index-info 2>err &&
-+ grep "malformed input line" err &&
++ test_grep "malformed input line" err &&
+
+ # bad whitespace
+ printf "100644 $EMPTY_BLOB A" |
+ test_must_fail git update-index --index-info 2>err &&
-+ grep "malformed input line" err &&
++ test_grep "malformed input line" err &&
+
+ # invalid stage value
+ printf "100644 $EMPTY_BLOB 5\tA" |
+ test_must_fail git update-index --index-info 2>err &&
-+ grep "malformed input line" err &&
++ test_grep "malformed input line" err &&
+
+ # invalid OID length
+ printf "100755 abc123\tA" |
+ test_must_fail git update-index --index-info 2>err &&
-+ grep "malformed input line" err &&
++ test_grep "malformed input line" err &&
+
+ # bad quoting
+ printf "100644 $EMPTY_BLOB\t\"A" |
+ test_must_fail git update-index --index-info 2>err &&
-+ grep "bad quoting of path name" err
++ test_grep "bad quoting of path name" err
+'
+
test_done
5: 7e3bcc16e23 ! 5: 4f4d54c8d07 index-info.c: identify empty input lines in read_index_info
@@ Metadata
Author: Victoria Dye <vdye@github.com>
## Commit message ##
- index-info.c: identify empty input lines in read_index_info
+ index-info.c: return unrecognized lines to caller
- Update 'read_index_info()' to return INDEX_INFO_EMPTY_LINE (value 1), rather
- than the default error code (value -1) when the function encounters an empty
- line in stdin. This grants the caller the flexibility to handle such
- scenarios differently than a typical error. In the case of 'update-index',
- we'll still exit with a "malformed input line" error. However, when
- 'read_index_info()' is used to process the input to 'mktree' in a later
- patch, the empty line return value will signal a new tree in --batch mode.
+ Update 'read_index_info()' to return INDEX_INFO_UNRECOGNIZED_LINE (value 1),
+ rather than die()-ing when the function encounters a line that cannot be
+ parsed according to one of the accepted formats. This grants the caller the
+ flexibility to fall back on custom handling for such lines rather than a
+ returning a catch-all error. In the case of 'update-index', we'll still exit
+ with a "malformed input line" error. However, when 'read_index_info()' is
+ used to process the input to 'mktree' in a later patch, an empty line return
+ value will signal a new tree in --batch mode.
Signed-off-by: Victoria Dye <vdye@github.com>
## builtin/update-index.c ##
@@ builtin/update-index.c: static enum parse_opt_result stdin_cacheinfo_callback(
- return error("option '%s' must be the last argument", opt->long_name);
- allow_add = allow_replace = allow_remove = 1;
- ret = read_index_info(*nul_term_line, apply_index_info, NULL);
-- if (ret)
-+ if (ret == INDEX_INFO_EMPTY_LINE)
-+ return error("malformed input line ''");
-+ else if (ret < 0)
- return -1;
-
- return 0;
+ ret = error("option '%s' must be the last argument", opt->long_name);
+ } else {
+ int *nul_term_line = opt->value;
++ struct strbuf line = STRBUF_INIT;
+
+ allow_add = allow_replace = allow_remove = 1;
+- ret = read_index_info(*nul_term_line, apply_index_info, NULL);
+- if (ret)
++ ret = read_index_info(*nul_term_line, apply_index_info, NULL, &line);
++
++ if (ret == INDEX_INFO_UNRECOGNIZED_LINE)
++ ret = error("malformed input line '%s'", line.buf);
++ else if (ret)
+ ret = -1;
++ strbuf_release(&line);
+ }
+
+ return ret;
## index-info.c ##
+@@
+ #include "strbuf.h"
+ #include "quote.h"
+
+-int read_index_info(int nul_term_line, each_index_info_fn fn, void *cbdata)
++int read_index_info(int nul_term_line, each_index_info_fn fn, void *cbdata,
++ struct strbuf *line)
+ {
+ const int hexsz = the_hash_algo->hexsz;
+- struct strbuf buf = STRBUF_INIT;
+ struct strbuf uq = STRBUF_INIT;
+ strbuf_getline_fn getline_fn;
+ int ret = 0;
+
+ getline_fn = nul_term_line ? strbuf_getline_nul : strbuf_getline_lf;
+- while (getline_fn(&buf, stdin) != EOF) {
++ while (getline_fn(line, stdin) != EOF) {
+ char *ptr, *tab;
+ char *path_name;
+ struct object_id oid;
+@@ index-info.c: int read_index_info(int nul_term_line, each_index_info_fn fn, void *cbdata)
+ * index file and matches "git ls-files --stage" output.
+ */
+ errno = 0;
+- ul = strtoul(buf.buf, &ptr, 8);
+- if (ptr == buf.buf || *ptr != ' '
++ ul = strtoul(line->buf, &ptr, 8);
++ if (ptr == line->buf || *ptr != ' '
+ || errno || (unsigned int) ul != ul)
+ goto bad_line;
+ mode = ul;
@@ index-info.c: int read_index_info(int nul_term_line, each_index_info_fn fn, void *cbdata)
- unsigned long ul;
- int stage;
+ continue;
-+ if (!buf.len) {
-+ ret = INDEX_INFO_EMPTY_LINE;
-+ break;
-+ }
-+
- /* This reads lines formatted in one of three formats:
- *
- * (1) mode SP sha1 TAB path
+ bad_line:
+- die("malformed input line '%s'", buf.buf);
++ ret = INDEX_INFO_UNRECOGNIZED_LINE;
++ break;
+ }
+- strbuf_release(&buf);
+ strbuf_release(&uq);
++ if (!ret)
++ strbuf_reset(line);
+
+ return ret;
+ }
## index-info.h ##
@@
typedef int (*each_index_info_fn)(unsigned int, struct object_id *, int, const char *, void *);
-+#define INDEX_INFO_EMPTY_LINE 1
++#define INDEX_INFO_UNRECOGNIZED_LINE 1
+
/* Iterate over parsed index info from stdin */
- int read_index_info(int nul_term_line, each_index_info_fn fn, void *cbdata);
+-int read_index_info(int nul_term_line, each_index_info_fn fn, void *cbdata);
++int read_index_info(int nul_term_line, each_index_info_fn fn, void *cbdata,
++ struct strbuf *line);
+ #endif /* INDEX_INFO_H */
6: f56eee0b48d ! 6: 472efcaf1dd index-info.c: parse object type in provided in read_index_info
@@ Commit message
by 'read_index_info()' (i.e. on lines formatted like the output of 'git
ls-tree'), parse it into an 'enum object_type' and provide it to the
'read_index_info()' callback as an argument. If the type is not provided,
- pass 'OBJ_NONE' instead. If the object type is invalid, return an error.
+ pass 'OBJ_ANY' instead. If the object type is invalid, return an error.
The goal of this change is to allow for more thorough validation of the
provided object type (e.g. against the provided mode) in 'mktree' once
@@ builtin/update-index.c: static void update_one(const char *path)
if (!verify_path(path_name, mode)) {
## index-info.c ##
-@@ index-info.c: int read_index_info(int nul_term_line, each_index_info_fn fn, void *cbdata)
+@@ index-info.c: int read_index_info(int nul_term_line, each_index_info_fn fn, void *cbdata,
char *ptr, *tab;
char *path_name;
struct object_id oid;
-+ enum object_type obj_type = OBJ_NONE;
++ enum object_type obj_type = OBJ_ANY;
unsigned int mode;
unsigned long ul;
int stage;
-@@ index-info.c: int read_index_info(int nul_term_line, each_index_info_fn fn, void *cbdata)
+@@ index-info.c: int read_index_info(int nul_term_line, each_index_info_fn fn, void *cbdata,
if (tab[-2] == ' ' && '0' <= tab[-1] && tab[-1] <= '3') {
stage = tab[-1] - '0';
@@ index-info.c: int read_index_info(int nul_term_line, each_index_info_fn fn, void
if (!nul_term_line && path_name[0] == '"') {
strbuf_reset(&uq);
if (unquote_c_style(&uq, path_name, NULL)) {
-@@ index-info.c: int read_index_info(int nul_term_line, each_index_info_fn fn, void *cbdata)
+@@ index-info.c: int read_index_info(int nul_term_line, each_index_info_fn fn, void *cbdata,
path_name = uq.buf;
}
@@ index-info.h
-typedef int (*each_index_info_fn)(unsigned int, struct object_id *, int, const char *, void *);
+typedef int (*each_index_info_fn)(unsigned int, struct object_id *, enum object_type, int, const char *, void *);
- #define INDEX_INFO_EMPTY_LINE 1
+ #define INDEX_INFO_UNRECOGNIZED_LINE 1
## t/t2107-update-index-basic.sh ##
@@ t/t2107-update-index-basic.sh: test_expect_success '--index-info fails on malformed input' '
test_must_fail git update-index --index-info 2>err &&
- grep "malformed input line" err &&
+ test_grep "malformed input line" err &&
+ # invalid type
+ printf "100644 bad $EMPTY_BLOB\tA" |
+ test_must_fail git update-index --index-info 2>err &&
-+ grep "invalid object type" err &&
++ test_grep "invalid object type" err &&
+
# invalid stage value
printf "100644 $EMPTY_BLOB 5\tA" |
7: 8d1e1eaa70b ! 7: 9dc8e16a7fc mktree: use read_index_info to read stdin lines
@@ Commit message
across the commands (avoiding the need for two similar implementations for
input parsing) and adds flexibility to mktree.
+ It should be noted that, while the error messages are largely preserved in
+ the refactor, one does change: "fatal: invalid quoting" is now "error: bad
+ quoting of path name".
+
Update 'Documentation/git-mktree.txt' to reflect the more permissive input
- format.
+ format, as well as make a note about rejecting stage values higher than 0.
+ Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Victoria Dye <vdye@github.com>
## Documentation/git-mktree.txt ##
@@ Documentation/git-mktree.txt: SYNOPSIS
-a tree object. The order of the tree entries is normalized by mktree so
-pre-sorting the input is not required. The object name of the tree object
-built is written to the standard output.
-+Reads entry information from stdin and creates a tree object from those entries.
-+The object name of the tree object built is written to the standard output.
++Reads entry information from stdin and creates a tree object from those
++entries. The object name of the tree object built is written to the standard
++output.
OPTIONS
-------
@@ Documentation/git-mktree.txt: OPTIONS
+INPUT FORMAT
+------------
+Tree entries may be specified in any of the formats compatible with the
-+`--index-info` option to linkgit:git-update-index[1]. The order of the tree
-+entries is normalized by `mktree` so pre-sorting the input by path is not
-+required.
++`--index-info` option to linkgit:git-update-index[1]:
++
++include::index-info-formats.txt[]
++
++Note that if the `stage` of a tree entry is given, the value must be 0.
++Higher stages represent conflicted files in an index; this information
++cannot be represented in a tree object. The command will fail without
++writing the tree if a higher order stage is specified for any entry.
++
++The order of the tree entries is normalized by `mktree` so pre-sorting the
++input by path is not required.
+
GIT
---
@@ builtin/mktree.c: static const char *mktree_usage[] = {
+};
+
+static int mktree_line(unsigned int mode, struct object_id *oid,
-+ enum object_type obj_type, int stage UNUSED,
++ enum object_type obj_type, int stage,
+ const char *path, void *cbdata)
{
- char *ptr, *ntr;
@@ builtin/mktree.c: static const char *mktree_usage[] = {
- die("invalid quoting");
- path = to_free = strbuf_detach(&p_uq, NULL);
- }
-+ if (obj_type && mode_type != obj_type)
-+ die("object type (%s) doesn't match mode type (%s)",
-+ type_name(obj_type), type_name(mode_type));
++ if (stage)
++ die(_("path '%s' is unmerged"), path);
- /*
- * Object type is redundantly derivable three ways.
@@ builtin/mktree.c: static const char *mktree_usage[] = {
- die("entry '%s' object type (%s) doesn't match mode type (%s)",
- path, ptr, type_name(mode_type));
- }
++ if (obj_type != OBJ_ANY && mode_type != obj_type)
++ die("object type (%s) doesn't match mode type (%s)",
++ type_name(obj_type), type_name(mode_type));
++
+ oi.typep = &parsed_obj_type;
- /* Check the type of object identified by oid without fetching objects */
@@ builtin/mktree.c: static const char *mktree_usage[] = {
OBJECT_INFO_QUICK |
OBJECT_INFO_SKIP_FETCH_OBJECT) < 0)
- obj_type = -1;
-+ parsed_obj_type = -1;
-
+-
- if (obj_type < 0) {
- if (allow_missing) {
- ; /* no problem - missing objects are presumed to be of the right type */
-+ if (parsed_obj_type < 0) {
-+ if (data->allow_missing || S_ISGITLINK(mode)) {
-+ ; /* no problem - missing objects & submodules are presumed to be of the right type */
- } else {
+- } else {
- die("entry '%s' object %s is unavailable", path, oid_to_hex(&oid));
- }
- } else {
@@ builtin/mktree.c: static const char *mktree_usage[] = {
- */
- die("entry '%s' object %s is a %s but specified type was (%s)",
- path, oid_to_hex(&oid), type_name(obj_type), type_name(mode_type));
+- }
++ parsed_obj_type = -1;
++
++ if (parsed_obj_type < 0) {
++ /*
++ * There are two conditions where the object being missing
++ * is acceptable:
++ *
++ * - We're explicitly allowing it with --missing.
++ * - The object is a submodule, which we wouldn't expect to
++ * be in this repo anyway.
++ *
++ * If neither condition is met, die().
++ */
++ if (!data->allow_missing && !S_ISGITLINK(mode))
+ die("entry '%s' object %s is unavailable", path, oid_to_hex(oid));
- }
++
+ } else if (parsed_obj_type != mode_type) {
+ /*
+ * The object exists but is of the wrong type.
@@ builtin/mktree.c: static const char *mktree_usage[] = {
struct tree_entry_array arr = { 0 };
- strbuf_getline_fn getline_fn;
+ struct mktree_line_data mktree_line_data = { .arr = &arr };
++ struct strbuf line = STRBUF_INIT;
+ int ret;
const struct option option[] = {
@@ builtin/mktree.c: static const char *mktree_usage[] = {
- break;
- }
- if (sb.buf[0] == '\0') {
-- /* empty lines denote tree boundaries in batch mode */
-- if (is_batch_mode)
-- break;
-- die("input format error: (blank line only valid in batch mode)");
-- }
-- mktree_line(sb.buf, nul_term_line, allow_missing, &arr);
-- }
-- if (is_batch_mode && got_eof && arr.nr < 1) {
+
+ do {
-+ ret = read_index_info(nul_term_line, mktree_line, &mktree_line_data);
++ ret = read_index_info(nul_term_line, mktree_line, &mktree_line_data, &line);
+ if (ret < 0)
+ break;
+
-+ /* empty lines denote tree boundaries in batch mode */
-+ if (ret > 0 && !is_batch_mode)
-+ die("input format error: (blank line only valid in batch mode)");
++ if (ret == INDEX_INFO_UNRECOGNIZED_LINE) {
++ if (line.len)
++ die("input format error: %s", line.buf);
++ else if (!is_batch_mode)
+ /* empty lines denote tree boundaries in batch mode */
+- if (is_batch_mode)
+- break;
+ die("input format error: (blank line only valid in batch mode)");
+- }
+- mktree_line(sb.buf, nul_term_line, allow_missing, &arr);
+ }
+- if (is_batch_mode && got_eof && arr.nr < 1) {
+
+ if (is_batch_mode && !ret && arr.nr < 1) {
/*
@@ builtin/mktree.c: static const char *mktree_usage[] = {
@@ builtin/mktree.c: int cmd_mktree(int ac, const char **av, const char *prefix)
fflush(stdout);
}
- clear_tree_entry_array(&arr); /* reset tree entry buffer for re-use in batch mode */
+ tree_entry_array_clear(&arr, 1); /* reset tree entry buffer for re-use in batch mode */
- }
+ } while (ret > 0);
- release_tree_entry_array(&arr);
++ strbuf_release(&line);
+ tree_entry_array_release(&arr, 1);
- strbuf_release(&sb);
- return 0;
+ return !!ret;
@@ t/t1010-mktree.sh: test_expect_success 'ls-tree output in wrong order given to m
+ tree_oid="$(cat tree)" &&
+ printf "160000 commit $tree_oid\tA" |
+ test_must_fail git mktree 2>err &&
-+ grep "object $tree_oid is a tree but specified type was (commit)" err
++ test_grep "object $tree_oid is a tree but specified type was (commit)" err
+'
+
test_expect_success 'mktree refuses to read ls-tree -r output (1)' '
@@ t/t1010-mktree.sh: test_expect_success 'mktree refuses to read ls-tree -r output
+ # empty line without --batch
+ echo "" |
+ test_must_fail git mktree 2>err &&
-+ grep "blank line only valid in batch mode" err &&
++ test_grep "blank line only valid in batch mode" err &&
+
+ # bad whitespace
+ printf "100644 blob $EMPTY_BLOB A" |
+ test_must_fail git mktree 2>err &&
-+ grep "malformed input line" err &&
++ test_grep "input format error" err &&
+
+ # invalid type
+ printf "100644 bad $EMPTY_BLOB\tA" |
+ test_must_fail git mktree 2>err &&
-+ grep "invalid object type" err &&
++ test_grep "invalid object type" err &&
+
+ # invalid OID length
+ printf "100755 blob abc123\tA" |
+ test_must_fail git mktree 2>err &&
-+ grep "malformed input line" err &&
++ test_grep "input format error" err &&
+
+ # bad quoting
+ printf "100644 blob $EMPTY_BLOB\t\"A" |
+ test_must_fail git mktree 2>err &&
-+ grep "bad quoting of path name" err
++ test_grep "bad quoting of path name" err
+'
+
+test_expect_success 'mktree fails on mode mismatch' '
@@ t/t1010-mktree.sh: test_expect_success 'mktree refuses to read ls-tree -r output
+ # mode-type mismatch
+ printf "100644 tree $tree_oid\tA" |
+ test_must_fail git mktree 2>err &&
-+ grep "object type (tree) doesn${SQ}t match mode type (blob)" err &&
++ test_grep "object type (tree) doesn${SQ}t match mode type (blob)" err &&
+
+ # mode-object mismatch (no --missing)
+ printf "100644 $tree_oid\tA" |
+ test_must_fail git mktree 2>err &&
-+ grep "object $tree_oid is a tree but specified type was (blob)" err
++ test_grep "object $tree_oid is a tree but specified type was (blob)" err
+'
+
test_done
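
The return-value handling in the new 'cmd_mktree()' loop above can be summarized as a small decision function. This is only a sketch under stated assumptions: 'toy_classify()' and the 'TOY_*' names are hypothetical and do not exist in builtin/mktree.c. It also simplifies the end-of-input case, where the real code skips writing a tree in batch mode if no entries were read.

```c
#include <stddef.h>

/* Hypothetical stand-in that mirrors INDEX_INFO_UNRECOGNIZED_LINE. */
#define TOY_UNRECOGNIZED_LINE 1

enum toy_action {
	TOY_STOP_ERROR,		/* the parser reported a failure (ret < 0) */
	TOY_DIE_FORMAT,		/* non-empty line the parser did not recognize */
	TOY_DIE_BLANK,		/* blank line seen outside of --batch */
	TOY_TREE_BOUNDARY,	/* blank line in --batch: write tree, keep reading */
	TOY_FINISH		/* ret == 0: end of input */
};

/*
 * Map one parser return value, plus the length of the line it stopped
 * on, to what the caller does next.
 */
static enum toy_action toy_classify(int ret, size_t line_len, int is_batch_mode)
{
	if (ret < 0)
		return TOY_STOP_ERROR;
	if (ret == TOY_UNRECOGNIZED_LINE) {
		if (line_len)
			return TOY_DIE_FORMAT;
		if (!is_batch_mode)
			return TOY_DIE_BLANK;
		return TOY_TREE_BOUNDARY;
	}
	return TOY_FINISH;
}
```

The point of passing the line back to the caller is visible here: only the caller knows whether an empty unrecognized line is an error or a tree boundary.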
-: ----------- > 8: 8a3264afd0c mktree.c: do not fail on mismatched submodule type
8: b497dc90687 ! 9: e640a385b3d mktree: add a --literally option
@@ Documentation/git-mktree.txt: OPTIONS
+--literally::
+ Create the tree from the tree entries provided to stdin in the order
-+ they are provided without performing additional sorting, deduplication,
-+ or path validation on them. This option is primarily useful for creating
-+ invalid tree objects to use in tests of how Git deals with various forms
-+ of tree corruption.
++ they are provided without performing additional sorting,
++ deduplication, or path validation on them. This option is primarily
++ useful for creating invalid tree objects to use in tests of how Git
++ deals with various forms of tree corruption.
+
--batch::
Allow building of more than one tree object before exiting. Each
tree is separated by a single blank line. The final newline is
## builtin/mktree.c ##
-@@ builtin/mktree.c: static void release_tree_entry_array(struct tree_entry_array *arr)
+@@ builtin/mktree.c: static void tree_entry_array_release(struct tree_entry_array *arr, int free_entr
}
static void append_to_tree(unsigned mode, struct object_id *oid, const char *path,
@@ builtin/mktree.c: static void write_tree(struct tree_entry_array *arr, struct ob
static int mktree_line(unsigned int mode, struct object_id *oid,
@@ builtin/mktree.c: static int mktree_line(unsigned int mode, struct object_id *oid,
- path, oid_to_hex(oid), type_name(parsed_obj_type), type_name(mode_type));
+ }
}
- append_to_tree(mode, oid, path, data->arr);
@@ builtin/mktree.c: int cmd_mktree(int ac, const char **av, const char *prefix)
## t/t1010-mktree.sh ##
@@ t/t1010-mktree.sh: test_expect_success 'mktree fails on mode mismatch' '
- grep "object $tree_oid is a tree but specified type was (blob)" err
+ test_grep "object $tree_oid is a tree but specified type was (blob)" err
'
+test_expect_success '--literally can create invalid trees' '
@@ t/t1010-mktree.sh: test_expect_success 'mktree fails on mode mismatch' '
+ } | git mktree --literally >tree.bad &&
+ git cat-file tree $(cat tree.bad) >top.bad &&
+ test_must_fail git hash-object --stdin -t tree <top.bad 2>err &&
-+ grep "contains duplicate file entries" err &&
++ test_grep "contains duplicate file entries" err &&
+
+ # disallowed path
+ {
@@ t/t1010-mktree.sh: test_expect_success 'mktree fails on mode mismatch' '
+ } | git mktree --literally >tree.bad &&
+ git cat-file tree $(cat tree.bad) >top.bad &&
+ test_must_fail git hash-object --stdin -t tree <top.bad 2>err &&
-+ grep "contains ${SQ}.git${SQ}" err &&
++ test_grep "contains ${SQ}.git${SQ}" err &&
+
+ # nested entry
+ {
@@ t/t1010-mktree.sh: test_expect_success 'mktree fails on mode mismatch' '
+ } | git mktree --literally >tree.bad &&
+ git cat-file tree $(cat tree.bad) >top.bad &&
+ test_must_fail git hash-object --stdin -t tree <top.bad 2>err &&
-+ grep "contains full pathnames" err &&
++ test_grep "contains full pathnames" err &&
+
+ # bad entry ordering
+ {
@@ t/t1010-mktree.sh: test_expect_success 'mktree fails on mode mismatch' '
+ } | git mktree --literally >tree.bad &&
+ git cat-file tree $(cat tree.bad) >top.bad &&
+ test_must_fail git hash-object --stdin -t tree <top.bad 2>err &&
-+ grep "not properly sorted" err
++ test_grep "not properly sorted" err
+'
+
test_done
9: 4f9f77e693c ! 10: 2eb207064f8 mktree: validate paths more carefully
@@ builtin/mktree.c: static void append_to_tree(unsigned mode, struct object_id *oi
## t/t1010-mktree.sh ##
@@ t/t1010-mktree.sh: test_expect_success '--literally can create invalid trees' '
- grep "not properly sorted" err
+ test_grep "not properly sorted" err
'
+test_expect_success 'mktree validates path' '
@@ t/t1010-mktree.sh: test_expect_success '--literally can create invalid trees' '
+ # Invalid: blob with trailing slash
+ printf "100644 blob $blob_oid\ttest/" |
+ test_must_fail git mktree 2>err &&
-+ grep "invalid path ${SQ}test/${SQ}" err &&
++ test_grep "invalid path ${SQ}test/${SQ}" err &&
+
+ # Invalid: dotdot
+ printf "040000 tree $tree_oid\t../" |
+ test_must_fail git mktree 2>err &&
-+ grep "invalid path ${SQ}../${SQ}" err &&
++ test_grep "invalid path ${SQ}../${SQ}" err &&
+
+ # Invalid: dot
+ printf "040000 tree $tree_oid\t." |
+ test_must_fail git mktree 2>err &&
-+ grep "invalid path ${SQ}.${SQ}" err &&
++ test_grep "invalid path ${SQ}.${SQ}" err &&
+
+ # Invalid: .git
+ printf "040000 tree $tree_oid\t.git/" |
+ test_must_fail git mktree 2>err &&
-+ grep "invalid path ${SQ}.git/${SQ}" err
++ test_grep "invalid path ${SQ}.git/${SQ}" err
+'
+
test_done
10: b59a4ad8ab4 ! 11: fb555658057 mktree: overwrite duplicate entries
@@ Commit message
Signed-off-by: Victoria Dye <vdye@github.com>
## Documentation/git-mktree.txt ##
-@@ Documentation/git-mktree.txt: OPTIONS
- INPUT FORMAT
- ------------
- Tree entries may be specified in any of the formats compatible with the
--`--index-info` option to linkgit:git-update-index[1]. The order of the tree
--entries is normalized by `mktree` so pre-sorting the input by path is not
--required.
-+`--index-info` option to linkgit:git-update-index[1].
-+
-+The order of the tree entries is normalized by `mktree` so pre-sorting the input
-+by path is not required. Multiple entries provided with the same path are
-+deduplicated, with only the last one specified added to the tree.
+@@ Documentation/git-mktree.txt: cannot be represented in a tree object. The command will fail without
+ writing the tree if a higher order stage is specified for any entry.
+
+ The order of the tree entries is normalized by `mktree` so pre-sorting the
+-input by path is not required.
++input by path is not required. Multiple entries provided with the same path
++are deduplicated, with only the last one specified added to the tree.
GIT
---
@@ builtin/mktree.c
struct object_id oid;
int len;
@@ builtin/mktree.c: static void append_to_tree(unsigned mode, struct object_id *oid, const char *pat
+ ent->len = len;
oidcpy(&ent->oid, oid);
- /* Append the update */
+ ent->order = arr->nr;
tree_entry_array_push(arr, ent);
}
@@ t/t1010-mktree.sh: test_expect_success '--literally can create invalid trees' '
# Valid: tree with or without trailing slash, blob without trailing slash
@@ t/t1010-mktree.sh: test_expect_success 'mktree validates path' '
- grep "invalid path ${SQ}.git/${SQ}" err
+ test_grep "invalid path ${SQ}.git/${SQ}" err
'
+test_expect_success 'mktree with duplicate entries' '
11: 130413f2404 = 12: 2333775ba5b mktree: create tree using an in-core index
12: 94d6615d634 ! 13: 56f28efff54 mktree: use iterator struct to add tree entries to index
@@ builtin/mktree.c: static void sort_and_dedup_tree_entry_array(struct tree_entry_
+ } priv;
+};
+
-+static void init_tree_entry_iterator(struct tree_entry_iterator *iter,
++static void tree_entry_iterator_init(struct tree_entry_iterator *iter,
+ struct tree_entry_array *arr)
+{
+ iter->priv.arr = arr;
@@ builtin/mktree.c: static void sort_and_dedup_tree_entry_array(struct tree_entry_
+}
+
+/*
-+ * Advance the tree entry iterator to the next entry in the array. If no entries
-+ * remain, 'current' is set to NULL. Returns the previous 'current' value of the
-+ * iterator.
++ * Advance the tree entry iterator to the next entry in the array. If no
++ * entries remain, 'current' is set to NULL.
+ */
-+static struct tree_entry *advance_tree_entry_iterator(struct tree_entry_iterator *iter)
++static void tree_entry_iterator_advance(struct tree_entry_iterator *iter)
+{
-+ struct tree_entry *prev = iter->current;
+ iter->current = (iter->priv.idx + 1) < iter->priv.arr->nr
+ ? iter->priv.arr->entries[++iter->priv.idx]
+ : NULL;
-+ return prev;
+}
+
static int add_tree_entry_to_index(struct index_state *istate,
@@ builtin/mktree.c: static int add_tree_entry_to_index(struct index_state *istate,
static void write_tree(struct tree_entry_array *arr, struct object_id *oid)
{
+ struct tree_entry_iterator iter = { NULL };
-+ struct tree_entry *ent;
struct index_state istate = INDEX_STATE_INIT(the_repository);
istate.sparse_index = 1;
@@ builtin/mktree.c: static int add_tree_entry_to_index(struct index_state *istate,
- /* Construct an in-memory index from the provided entries */
- for (size_t i = 0; i < arr->nr; i++) {
- struct tree_entry *ent = arr->entries[i];
-+ init_tree_entry_iterator(&iter, arr);
-
++ tree_entry_iterator_init(&iter, arr);
++
+ /* Construct an in-memory index from the provided entries & base tree */
-+ while ((ent = advance_tree_entry_iterator(&iter))) {
++ while (iter.current) {
++ struct tree_entry *ent = iter.current;
++ tree_entry_iterator_advance(&iter);
+
if (add_tree_entry_to_index(&istate, ent))
die(_("failed to add tree entry '%s'"), ent->name);
- }
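
The v2 iterator shape above (the advance function returns nothing; callers read 'current' and then advance) can be tried in isolation. The sketch below uses hypothetical 'toy_*' stand-ins rather than the structs in builtin/mktree.c, and the body of 'toy_iterator_init()' is an assumption, since the range-diff does not show the full init function.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the structs in builtin/mktree.c. */
struct toy_entry {
	const char *name;
};

struct toy_entry_array {
	struct toy_entry **entries;
	size_t nr;
};

struct toy_entry_iterator {
	struct toy_entry *current;
	struct {
		struct toy_entry_array *arr;
		size_t idx;
	} priv;
};

/* Assumed initialization: point 'current' at the first entry, if any. */
static void toy_iterator_init(struct toy_entry_iterator *iter,
			      struct toy_entry_array *arr)
{
	iter->priv.arr = arr;
	iter->priv.idx = 0;
	iter->current = arr->nr ? arr->entries[0] : NULL;
}

/* Step forward; 'current' becomes NULL once the array is exhausted. */
static void toy_iterator_advance(struct toy_entry_iterator *iter)
{
	iter->current = (iter->priv.idx + 1) < iter->priv.arr->nr
			? iter->priv.arr->entries[++iter->priv.idx]
			: NULL;
}

/* Visit every entry using the "read current, then advance" loop. */
static size_t toy_count_entries(struct toy_entry_array *arr)
{
	struct toy_entry_iterator iter = { NULL };
	size_t seen = 0;

	toy_iterator_init(&iter, arr);
	while (iter.current) {
		struct toy_entry *ent = iter.current;
		toy_iterator_advance(&iter);
		if (ent->name)
			seen++;
	}
	return seen;
}
```

Reading 'current' before advancing keeps the loop body free to advance the same iterator from elsewhere, which is what the later base-tree callback relies on.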
13: 68acdd3c5ee ! 14: 6f6d78ae7ac mktree: add directory-file conflict hashmap
@@ builtin/mktree.c: static inline size_t df_path_len(size_t pathlen, unsigned int
+ name_compare(e1->name, e1_len, e2->name, e2_len);
+}
+
-+static void init_tree_entry_array(struct tree_entry_array *arr)
++static void tree_entry_array_init(struct tree_entry_array *arr)
+{
+ hashmap_init(&arr->df_name_hash, df_name_hash_cmp, NULL, 0);
+}
@@ builtin/mktree.c: static inline size_t df_path_len(size_t pathlen, unsigned int
static void tree_entry_array_push(struct tree_entry_array *arr, struct tree_entry *ent)
{
ALLOC_GROW(arr->entries, arr->nr + 1, arr->alloc);
-@@ builtin/mktree.c: static void tree_entry_array_push(struct tree_entry_array *arr, struct tree_entr
-
- static void clear_tree_entry_array(struct tree_entry_array *arr)
- {
-+ hashmap_clear(&arr->df_name_hash);
- for (size_t i = 0; i < arr->nr; i++)
- FREE_AND_NULL(arr->entries[i]);
+@@ builtin/mktree.c: static void tree_entry_array_clear(struct tree_entry_array *arr, int free_entrie
+ FREE_AND_NULL(arr->entries[i]);
+ }
arr->nr = 0;
-@@ builtin/mktree.c: static void clear_tree_entry_array(struct tree_entry_array *arr)
-
- static void release_tree_entry_array(struct tree_entry_array *arr)
- {
+ hashmap_clear(&arr->df_name_hash);
- FREE_AND_NULL(arr->entries);
- arr->nr = arr->alloc = 0;
}
+
+ static void tree_entry_array_release(struct tree_entry_array *arr, int free_entries)
@@ builtin/mktree.c: static void sort_and_dedup_tree_entry_array(struct tree_entry_array *arr)
/* Sort again to order the entries for tree insertion */
ignore_mode = 0;
@@ builtin/mktree.c: int cmd_mktree(int ac, const char **av, const char *prefix)
ac = parse_options(ac, av, prefix, option, mktree_usage, 0);
-+ init_tree_entry_array(&arr);
++ tree_entry_array_init(&arr);
+
do {
- ret = read_index_info(nul_term_line, mktree_line, &mktree_line_data);
+ ret = read_index_info(nul_term_line, mktree_line, &mktree_line_data, &line);
if (ret < 0)
14: df0c50dfea3 ! 15: 4b88f84b933 mktree: optionally add to an existing tree
@@ Documentation/git-mktree.txt: git-mktree - Build a tree-object from formatted tr
--------
[verse]
-'git mktree' [-z] [--missing] [--literally] [--batch]
-+'git mktree' [-z] [--missing] [--literally] [--batch] [--] [<tree-ish>]
++'git mktree' [-z] [--missing] [--literally] [--batch] [<tree-ish>]
DESCRIPTION
-----------
@@ Documentation/git-mktree.txt: OPTIONS
with NUL.
+<tree-ish>::
-+ If provided, the tree entries provided in stdin are added to this tree
-+ rather than a new empty one, replacing existing entries with identical
-+ names. Not compatible with `--literally`.
++ If provided, the tree entries provided in stdin are added to this
++ tree rather than a new empty one, replacing existing entries with
++ identical names. Not compatible with `--literally`.
+
INPUT FORMAT
------------
@@ builtin/mktree.c
#include "object-store-ll.h"
struct tree_entry {
-@@ builtin/mktree.c: static struct tree_entry *advance_tree_entry_iterator(struct tree_entry_iterator
- return prev;
+@@ builtin/mktree.c: static void tree_entry_iterator_advance(struct tree_entry_iterator *iter)
+ : NULL;
}
-static int add_tree_entry_to_index(struct index_state *istate,
@@ builtin/mktree.c: static struct tree_entry *advance_tree_entry_iterator(struct t
+ unsigned mode, void *context)
{
- struct tree_entry_iterator iter = { NULL };
+- struct index_state istate = INDEX_STATE_INIT(the_repository);
+- istate.sparse_index = 1;
+ int result;
+ struct tree_entry *base_tree_ent;
+ struct build_index_data *cbdata = context;
@@ builtin/mktree.c: static struct tree_entry *advance_tree_entry_iterator(struct t
+ int cmp = name_compare(ent->name, ent->len,
+ base_tree_ent->name, base_tree_ent->len);
+ if (!cmp || cmp < 0) {
-+ advance_tree_entry_iterator(&cbdata->iter);
++ tree_entry_iterator_advance(&cbdata->iter);
+
+ if (add_tree_entry_to_index(cbdata, ent) < 0) {
+ result = error(_("failed to add tree entry '%s'"), ent->name);
@@ builtin/mktree.c: static struct tree_entry *advance_tree_entry_iterator(struct t
+ struct object_id *oid)
+{
+ struct build_index_data cbdata = { 0 };
- struct tree_entry *ent;
-- struct index_state istate = INDEX_STATE_INIT(the_repository);
-- istate.sparse_index = 1;
+ struct pathspec ps = { 0 };
sort_and_dedup_tree_entry_array(arr);
-- init_tree_entry_iterator(&iter, arr);
+- tree_entry_iterator_init(&iter, arr);
+ index_state_init(&cbdata.istate, the_repository);
+ cbdata.istate.sparse_index = 1;
-+ init_tree_entry_iterator(&cbdata.iter, arr);
++ tree_entry_iterator_init(&cbdata.iter, arr);
+ cbdata.df_name_hash = &arr->df_name_hash;
/* Construct an in-memory index from the provided entries & base tree */
-- while ((ent = advance_tree_entry_iterator(&iter))) {
-- if (add_tree_entry_to_index(&istate, ent))
+- while (iter.current) {
+- struct tree_entry *ent = iter.current;
+- tree_entry_iterator_advance(&iter);
+ if (base_tree &&
+ read_tree(the_repository, base_tree, &ps, build_index_from_tree, &cbdata) < 0)
+ die(_("failed to create tree"));
+
-+ while ((ent = advance_tree_entry_iterator(&cbdata.iter))) {
++ while (cbdata.iter.current) {
++ struct tree_entry *ent = cbdata.iter.current;
++ tree_entry_iterator_advance(&cbdata.iter);
+
+- if (add_tree_entry_to_index(&istate, ent))
+ if (add_tree_entry_to_index(&cbdata, ent))
die(_("failed to add tree entry '%s'"), ent->name);
}
@@ builtin/mktree.c: static void write_tree_literally(struct tree_entry_array *arr,
static const char *mktree_usage[] = {
- "git mktree [-z] [--missing] [--literally] [--batch]",
-+ "git mktree [-z] [--missing] [--literally] [--batch] [--] [<tree-ish>]",
++ "git mktree [-z] [--missing] [--literally] [--batch] [<tree-ish>]",
NULL
};
@@ builtin/mktree.c: int cmd_mktree(int ac, const char **av, const char *prefix)
- int is_batch_mode = 0;
struct tree_entry_array arr = { 0 };
struct mktree_line_data mktree_line_data = { .arr = &arr };
+ struct strbuf line = STRBUF_INIT;
+ struct tree *base_tree = NULL;
int ret;
@@ builtin/mktree.c: int cmd_mktree(int ac, const char **av, const char *prefix)
+ die(_("not a tree object: %s"), oid_to_hex(&base_tree_oid));
+ }
- init_tree_entry_array(&arr);
+ tree_entry_array_init(&arr);
@@ builtin/mktree.c: int cmd_mktree(int ac, const char **av, const char *prefix)
if (mktree_line_data.literally)
15: 058354f45f7 ! 16: 46756c4e314 mktree: allow deeper paths in input
@@ Commit message
Signed-off-by: Victoria Dye <vdye@github.com>
## Documentation/git-mktree.txt ##
-@@ Documentation/git-mktree.txt: INPUT FORMAT
- Tree entries may be specified in any of the formats compatible with the
- `--index-info` option to linkgit:git-update-index[1].
+@@ Documentation/git-mktree.txt: Higher stages represent conflicted files in an index; this information
+ cannot be represented in a tree object. The command will fail without
+ writing the tree if a higher order stage is specified for any entry.
+Entries may use full pathnames containing directory separators to specify
-+entries nested within one or more directories. These entries are inserted into
-+the appropriate tree in the base tree-ish if one exists. Otherwise, empty parent
-+trees are created to contain the entries.
++entries nested within one or more directories. These entries are inserted
++into the appropriate tree in the base tree-ish if one exists. Otherwise,
++empty parent trees are created to contain the entries.
+
- The order of the tree entries is normalized by `mktree` so pre-sorting the input
- by path is not required. Multiple entries provided with the same path are
- deduplicated, with only the last one specified added to the tree.
+ The order of the tree entries is normalized by `mktree` so pre-sorting the
+ input by path is not required. Multiple entries provided with the same path
+ are deduplicated, with only the last one specified added to the tree.
## builtin/mktree.c ##
@@ builtin/mktree.c: struct tree_entry {
@@ builtin/mktree.c: static void tree_entry_array_push(struct tree_entry_array *arr
+ return arr->entries[--arr->nr];
+}
+
- static void clear_tree_entry_array(struct tree_entry_array *arr)
+ static void tree_entry_array_clear(struct tree_entry_array *arr, int free_entries)
{
- hashmap_clear(&arr->df_name_hash);
+ if (free_entries) {
@@ builtin/mktree.c: static void append_to_tree(unsigned mode, struct object_id *oid, const char *pat
if (!verify_path(ent->name, mode))
@@ builtin/mktree.c: static void sort_and_dedup_tree_entry_array(struct tree_entry_
+ }
+ }
+
-+ release_tree_entry_array(&parent_dir_ents);
++ tree_entry_array_release(&parent_dir_ents, 0);
+ }
+
/* Finally, initialize the directory-file conflict hash map */
@@ builtin/mktree.c: static int build_index_from_tree(const struct object_id *oid,
+ cmp = name_compare(ent->name, ent->len,
+ base_tree_ent->name, base_tree_ent->len);
if (!cmp || cmp < 0) {
- advance_tree_entry_iterator(&cbdata->iter);
+ tree_entry_iterator_advance(&cbdata->iter);
@@ builtin/mktree.c: static int build_index_from_tree(const struct object_id *oid,
goto cleanup_and_return;
@@ builtin/mktree.c: static int build_index_from_tree(const struct object_id *oid,
## t/t1010-mktree.sh ##
@@ t/t1010-mktree.sh: test_expect_success 'mktree with invalid submodule OIDs' '
- grep "object $tree_oid is a tree but specified type was (commit)" err
+ done
'
-test_expect_success 'mktree refuses to read ls-tree -r output (1)' '
@@ t/t1010-mktree.sh: test_expect_success 'mktree with base tree' '
+ printf "100644 blob $blob_oid\ttest/deeper\n"
+ } |
+ test_must_fail git mktree 2>err &&
-+ grep "You have both test and test/deeper" err &&
++ test_grep "You have both test and test/deeper" err &&
+
+ {
+ printf "100644 blob $blob_oid\tfolder/one/deeper/deep\n"
+ } |
+ test_must_fail git mktree $tree_oid 2>err &&
-+ grep "You have both folder/one and folder/one/deeper/deep" err
++ test_grep "You have both folder/one and folder/one/deeper/deep" err
+'
+
test_done
16: a90d6d0c943 ! 17: d392c440b8a mktree: remove entries when mode is 0
@@ Commit message
Signed-off-by: Victoria Dye <vdye@github.com>
## Documentation/git-mktree.txt ##
-@@ Documentation/git-mktree.txt: entries nested within one or more directories. These entries are inserted into
- the appropriate tree in the base tree-ish if one exists. Otherwise, empty parent
- trees are created to contain the entries.
+@@ Documentation/git-mktree.txt: entries nested within one or more directories. These entries are inserted
+ into the appropriate tree in the base tree-ish if one exists. Otherwise,
+ empty parent trees are created to contain the entries.
-+An entry with a mode of "0" will remove an entry of the same name from the base
-+tree-ish. If no tree-ish argument is given, or the entry does not exist in that
-+tree, the entry is ignored.
++An entry with a mode of "0" will remove an entry of the same name from the
++base tree-ish. If no tree-ish argument is given, or the entry does not exist
++in that tree, the entry is ignored.
+
- The order of the tree entries is normalized by `mktree` so pre-sorting the input
- by path is not required. Multiple entries provided with the same path are
- deduplicated, with only the last one specified added to the tree.
+ The order of the tree entries is normalized by `mktree` so pre-sorting the
+ input by path is not required. Multiple entries provided with the same path
+ are deduplicated, with only the last one specified added to the tree.
## builtin/mktree.c ##
@@ builtin/mktree.c: struct tree_entry {
@@ builtin/mktree.c: static int build_index_from_tree(const struct object_id *oid,
int ret = 0;
struct pathspec ps = { 0 };
@@ builtin/mktree.c: static int mktree_line(unsigned int mode, struct object_id *oid,
- const char *path, void *cbdata)
- {
- struct mktree_line_data *data = cbdata;
-- enum object_type mode_type = object_type(mode);
-- struct object_info oi = OBJECT_INFO_INIT;
-- enum object_type parsed_obj_type;
-
-- if (obj_type && mode_type != obj_type)
-- die("object type (%s) doesn't match mode type (%s)",
-- type_name(obj_type), type_name(mode_type));
-+ if (mode) {
-+ struct object_info oi = OBJECT_INFO_INIT;
-+ enum object_type parsed_obj_type;
-+ enum object_type mode_type = object_type(mode);
-
-- oi.typep = &parsed_obj_type;
-+ if (obj_type && mode_type != obj_type)
-+ die("object type (%s) doesn't match mode type (%s)",
-+ type_name(obj_type), type_name(mode_type));
+ if (stage)
+ die(_("path '%s' is unmerged"), path);
-- if (oid_object_info_extended(the_repository, oid, &oi,
-- OBJECT_INFO_LOOKUP_REPLACE |
-- OBJECT_INFO_QUICK |
-- OBJECT_INFO_SKIP_FETCH_OBJECT) < 0)
-- parsed_obj_type = -1;
-+ oi.typep = &parsed_obj_type;
-
-- if (parsed_obj_type < 0) {
-- if (data->allow_missing || S_ISGITLINK(mode)) {
-- ; /* no problem - missing objects & submodules are presumed to be of the right type */
-- } else {
-- die("entry '%s' object %s is unavailable", path, oid_to_hex(oid));
-+ if (oid_object_info_extended(the_repository, oid, &oi,
-+ OBJECT_INFO_LOOKUP_REPLACE |
-+ OBJECT_INFO_QUICK |
-+ OBJECT_INFO_SKIP_FETCH_OBJECT) < 0)
-+ parsed_obj_type = -1;
++ /* OID ignored for zero-mode entries; append unconditionally */
++ if (!mode)
++ goto append_entry;
+
-+ if (parsed_obj_type < 0) {
-+ if (data->allow_missing || S_ISGITLINK(mode)) {
-+ ; /* no problem - missing objects & submodules are presumed to be of the right type */
-+ } else {
-+ die("entry '%s' object %s is unavailable", path, oid_to_hex(oid));
-+ }
-+ } else if (parsed_obj_type != mode_type) {
-+ /*
-+ * The object exists but is of the wrong type.
-+ * This is a problem regardless of allow_missing
-+ * because the new tree entry will never be correct.
-+ */
-+ die("entry '%s' object %s is a %s but specified type was (%s)",
-+ path, oid_to_hex(oid), type_name(parsed_obj_type), type_name(mode_type));
+ if (obj_type != OBJ_ANY && mode_type != obj_type)
+ die("object type (%s) doesn't match mode type (%s)",
+ type_name(obj_type), type_name(mode_type));
+@@ builtin/mktree.c: static int mktree_line(unsigned int mode, struct object_id *oid,
}
-- } else if (parsed_obj_type != mode_type) {
-- /*
-- * The object exists but is of the wrong type.
-- * This is a problem regardless of allow_missing
-- * because the new tree entry will never be correct.
-- */
-- die("entry '%s' object %s is a %s but specified type was (%s)",
-- path, oid_to_hex(oid), type_name(parsed_obj_type), type_name(mode_type));
}
++append_entry:
append_to_tree(mode, oid, path, data->arr, data->literally);
+ return 0;
+ }
## t/t1010-mktree.sh ##
@@ t/t1010-mktree.sh: test_expect_success 'mktree fails on directory-file conflict' '
- grep "You have both folder/one and folder/one/deeper/deep" err
+ test_grep "You have both folder/one and folder/one/deeper/deep" err
'
+test_expect_success 'mktree with remove entries' '
--
gitgitgadget
* [PATCH v2 01/17] mktree: use OPT_BOOL
2024-06-19 21:57 ` [PATCH v2 00/17] mktree: support more flexible usage Victoria Dye via GitGitGadget
@ 2024-06-19 21:57 ` Victoria Dye via GitGitGadget
2024-06-19 21:57 ` [PATCH v2 02/17] mktree: rename treeent to tree_entry Victoria Dye via GitGitGadget
` (16 subsequent siblings)
17 siblings, 0 replies; 65+ messages in thread
From: Victoria Dye via GitGitGadget @ 2024-06-19 21:57 UTC (permalink / raw)
To: git; +Cc: Eric Sunshine, Patrick Steinhardt, Victoria Dye, Victoria Dye
From: Victoria Dye <vdye@github.com>
Replace 'OPT_SET_INT' with 'OPT_BOOL' for the options '--missing' and
'--batch'. The use of 'OPT_SET_INT' in these options is identical to
'OPT_BOOL', but 'OPT_BOOL' provides slightly simpler syntax.
Signed-off-by: Victoria Dye <vdye@github.com>
---
builtin/mktree.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/builtin/mktree.c b/builtin/mktree.c
index 9a22d4e2773..8b19d440747 100644
--- a/builtin/mktree.c
+++ b/builtin/mktree.c
@@ -162,8 +162,8 @@ int cmd_mktree(int ac, const char **av, const char *prefix)
const struct option option[] = {
OPT_BOOL('z', NULL, &nul_term_line, N_("input is NUL terminated")),
- OPT_SET_INT( 0 , "missing", &allow_missing, N_("allow missing objects"), 1),
- OPT_SET_INT( 0 , "batch", &is_batch_mode, N_("allow creation of more than one tree"), 1),
+ OPT_BOOL(0, "missing", &allow_missing, N_("allow missing objects")),
+ OPT_BOOL(0, "batch", &is_batch_mode, N_("allow creation of more than one tree")),
OPT_END()
};
--
gitgitgadget
* [PATCH v2 02/17] mktree: rename treeent to tree_entry
2024-06-19 21:57 ` [PATCH v2 00/17] mktree: support more flexible usage Victoria Dye via GitGitGadget
2024-06-19 21:57 ` [PATCH v2 01/17] mktree: use OPT_BOOL Victoria Dye via GitGitGadget
@ 2024-06-19 21:57 ` Victoria Dye via GitGitGadget
2024-06-19 21:57 ` [PATCH v2 03/17] mktree: use non-static tree_entry array Victoria Dye via GitGitGadget
` (15 subsequent siblings)
17 siblings, 0 replies; 65+ messages in thread
From: Victoria Dye via GitGitGadget @ 2024-06-19 21:57 UTC (permalink / raw)
To: git; +Cc: Eric Sunshine, Patrick Steinhardt, Victoria Dye, Victoria Dye
From: Victoria Dye <vdye@github.com>
Rename the type for better readability, clearly specifying "entry" (instead
of the "ent" abbreviation) and separating "tree" from "entry".
Signed-off-by: Victoria Dye <vdye@github.com>
---
builtin/mktree.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/builtin/mktree.c b/builtin/mktree.c
index 8b19d440747..c02feb06aff 100644
--- a/builtin/mktree.c
+++ b/builtin/mktree.c
@@ -12,7 +12,7 @@
#include "parse-options.h"
#include "object-store-ll.h"
-static struct treeent {
+static struct tree_entry {
unsigned mode;
struct object_id oid;
int len;
@@ -22,7 +22,7 @@ static int alloc, used;
static void append_to_tree(unsigned mode, struct object_id *oid, char *path)
{
- struct treeent *ent;
+ struct tree_entry *ent;
size_t len = strlen(path);
if (strchr(path, '/'))
die("path %s contains slash", path);
@@ -38,8 +38,8 @@ static void append_to_tree(unsigned mode, struct object_id *oid, char *path)
static int ent_compare(const void *a_, const void *b_)
{
- struct treeent *a = *(struct treeent **)a_;
- struct treeent *b = *(struct treeent **)b_;
+ struct tree_entry *a = *(struct tree_entry **)a_;
+ struct tree_entry *b = *(struct tree_entry **)b_;
return base_name_compare(a->name, a->len, a->mode,
b->name, b->len, b->mode);
}
@@ -56,7 +56,7 @@ static void write_tree(struct object_id *oid)
strbuf_init(&buf, size);
for (i = 0; i < used; i++) {
- struct treeent *ent = entries[i];
+ struct tree_entry *ent = entries[i];
strbuf_addf(&buf, "%o %s%c", ent->mode, ent->name, '\0');
strbuf_add(&buf, ent->oid.hash, the_hash_algo->rawsz);
}
--
gitgitgadget
* [PATCH v2 03/17] mktree: use non-static tree_entry array
2024-06-19 21:57 ` [PATCH v2 00/17] mktree: support more flexible usage Victoria Dye via GitGitGadget
2024-06-19 21:57 ` [PATCH v2 01/17] mktree: use OPT_BOOL Victoria Dye via GitGitGadget
2024-06-19 21:57 ` [PATCH v2 02/17] mktree: rename treeent to tree_entry Victoria Dye via GitGitGadget
@ 2024-06-19 21:57 ` Victoria Dye via GitGitGadget
2024-06-19 21:57 ` [PATCH v2 04/17] update-index: generalize 'read_index_info' Victoria Dye via GitGitGadget
` (14 subsequent siblings)
17 siblings, 0 replies; 65+ messages in thread
From: Victoria Dye via GitGitGadget @ 2024-06-19 21:57 UTC (permalink / raw)
To: git; +Cc: Eric Sunshine, Patrick Steinhardt, Victoria Dye, Victoria Dye
From: Victoria Dye <vdye@github.com>
Replace the static 'struct tree_entry **entries' with a non-static 'struct
tree_entry_array' instance. In later commits, we'll want to be able to
create additional 'struct tree_entry_array' instances utilizing common
functionality (create, push, clear, free). To avoid code duplication, create
the 'struct tree_entry_array' type and add functions that perform those
basic operations.
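
For readers unfamiliar with the pattern, the push/clear/release split
described above can be sketched in plain C. This is only an illustration
under stated assumptions: 'struct entry', 'entry_new()' and the
'entry_array_*' names are hypothetical stand-ins, and plain realloc()/free()
replace Git's ALLOC_GROW() and FREE_AND_NULL() helpers.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the patch's 'struct tree_entry'. */
struct entry {
	unsigned mode;
	char name[64];
};

struct entry_array {
	size_t nr, alloc;
	struct entry **entries;
};

/* Allocate a zeroed entry; the name is truncated to fit and NUL-terminated. */
static struct entry *entry_new(unsigned mode, const char *name)
{
	struct entry *ent = calloc(1, sizeof(*ent));
	if (!ent)
		return NULL;
	ent->mode = mode;
	strncpy(ent->name, name, sizeof(ent->name) - 1);
	return ent;
}

/* Append 'ent', doubling the backing storage when it is full. */
static int entry_array_push(struct entry_array *arr, struct entry *ent)
{
	assert(arr != NULL);
	if (arr->nr == arr->alloc) {
		size_t new_alloc = arr->alloc ? arr->alloc * 2 : 8;
		struct entry **grown = realloc(arr->entries,
					       new_alloc * sizeof(*grown));
		if (!grown)
			return -1;
		arr->entries = grown;
		arr->alloc = new_alloc;
	}
	arr->entries[arr->nr++] = ent;
	return 0;
}

/* Drop all entries but keep the allocation, e.g. between batches. */
static void entry_array_clear(struct entry_array *arr, int free_entries)
{
	if (free_entries) {
		for (size_t i = 0; i < arr->nr; i++) {
			free(arr->entries[i]);
			arr->entries[i] = NULL;
		}
	}
	arr->nr = 0;
}

/* Clear, then give back the backing storage as well. */
static void entry_array_release(struct entry_array *arr, int free_entries)
{
	entry_array_clear(arr, free_entries);
	free(arr->entries);
	arr->entries = NULL;
	arr->alloc = 0;
}
```

Keeping "clear" separate from "release" is what lets a batch-style caller
reuse one allocation across many trees and only free it once at exit.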
Signed-off-by: Victoria Dye <vdye@github.com>
---
builtin/mktree.c | 69 ++++++++++++++++++++++++++++++++++--------------
1 file changed, 49 insertions(+), 20 deletions(-)
diff --git a/builtin/mktree.c b/builtin/mktree.c
index c02feb06aff..a96ea10bf95 100644
--- a/builtin/mktree.c
+++ b/builtin/mktree.c
@@ -12,15 +12,42 @@
#include "parse-options.h"
#include "object-store-ll.h"
-static struct tree_entry {
+struct tree_entry {
unsigned mode;
struct object_id oid;
int len;
char name[FLEX_ARRAY];
-} **entries;
-static int alloc, used;
+};
+
+struct tree_entry_array {
+ size_t nr, alloc;
+ struct tree_entry **entries;
+};
+
+static void tree_entry_array_push(struct tree_entry_array *arr, struct tree_entry *ent)
+{
+ ALLOC_GROW(arr->entries, arr->nr + 1, arr->alloc);
+ arr->entries[arr->nr++] = ent;
+}
+
+static void tree_entry_array_clear(struct tree_entry_array *arr, int free_entries)
+{
+ if (free_entries) {
+ for (size_t i = 0; i < arr->nr; i++)
+ FREE_AND_NULL(arr->entries[i]);
+ }
+ arr->nr = 0;
+}
-static void append_to_tree(unsigned mode, struct object_id *oid, char *path)
+static void tree_entry_array_release(struct tree_entry_array *arr, int free_entries)
+{
+ tree_entry_array_clear(arr, free_entries);
+ FREE_AND_NULL(arr->entries);
+ arr->alloc = 0;
+}
+
+static void append_to_tree(unsigned mode, struct object_id *oid, const char *path,
+ struct tree_entry_array *arr)
{
struct tree_entry *ent;
size_t len = strlen(path);
@@ -32,8 +59,7 @@ static void append_to_tree(unsigned mode, struct object_id *oid, char *path)
ent->len = len;
oidcpy(&ent->oid, oid);
- ALLOC_GROW(entries, used + 1, alloc);
- entries[used++] = ent;
+ tree_entry_array_push(arr, ent);
}
static int ent_compare(const void *a_, const void *b_)
@@ -44,19 +70,18 @@ static int ent_compare(const void *a_, const void *b_)
b->name, b->len, b->mode);
}
-static void write_tree(struct object_id *oid)
+static void write_tree(struct tree_entry_array *arr, struct object_id *oid)
{
struct strbuf buf;
- size_t size;
- int i;
+ size_t size = 0;
- QSORT(entries, used, ent_compare);
- for (size = i = 0; i < used; i++)
- size += 32 + entries[i]->len;
+ QSORT(arr->entries, arr->nr, ent_compare);
+ for (size_t i = 0; i < arr->nr; i++)
+ size += 32 + arr->entries[i]->len;
strbuf_init(&buf, size);
- for (i = 0; i < used; i++) {
- struct tree_entry *ent = entries[i];
+ for (size_t i = 0; i < arr->nr; i++) {
+ struct tree_entry *ent = arr->entries[i];
strbuf_addf(&buf, "%o %s%c", ent->mode, ent->name, '\0');
strbuf_add(&buf, ent->oid.hash, the_hash_algo->rawsz);
}
@@ -70,7 +95,8 @@ static const char *mktree_usage[] = {
NULL
};
-static void mktree_line(char *buf, int nul_term_line, int allow_missing)
+static void mktree_line(char *buf, int nul_term_line, int allow_missing,
+ struct tree_entry_array *arr)
{
char *ptr, *ntr;
const char *p;
@@ -146,7 +172,7 @@ static void mktree_line(char *buf, int nul_term_line, int allow_missing)
}
}
- append_to_tree(mode, &oid, path);
+ append_to_tree(mode, &oid, path, arr);
free(to_free);
}
@@ -158,6 +184,7 @@ int cmd_mktree(int ac, const char **av, const char *prefix)
int allow_missing = 0;
int is_batch_mode = 0;
int got_eof = 0;
+ struct tree_entry_array arr = { 0 };
strbuf_getline_fn getline_fn;
const struct option option[] = {
@@ -182,9 +209,9 @@ int cmd_mktree(int ac, const char **av, const char *prefix)
break;
die("input format error: (blank line only valid in batch mode)");
}
- mktree_line(sb.buf, nul_term_line, allow_missing);
+ mktree_line(sb.buf, nul_term_line, allow_missing, &arr);
}
- if (is_batch_mode && got_eof && used < 1) {
+ if (is_batch_mode && got_eof && arr.nr < 1) {
/*
* Execution gets here if the last tree entry is terminated with a
* new-line. The final new-line has been made optional to be
@@ -192,12 +219,14 @@ int cmd_mktree(int ac, const char **av, const char *prefix)
*/
; /* skip creating an empty tree */
} else {
- write_tree(&oid);
+ write_tree(&arr, &oid);
puts(oid_to_hex(&oid));
fflush(stdout);
}
- used=0; /* reset tree entry buffer for re-use in batch mode */
+ tree_entry_array_clear(&arr, 1); /* reset tree entry buffer for re-use in batch mode */
}
+
+ tree_entry_array_release(&arr, 1);
strbuf_release(&sb);
return 0;
}
--
gitgitgadget
* [PATCH v2 04/17] update-index: generalize 'read_index_info'
2024-06-19 21:57 ` [PATCH v2 00/17] mktree: support more flexible usage Victoria Dye via GitGitGadget
` (2 preceding siblings ...)
2024-06-19 21:57 ` [PATCH v2 03/17] mktree: use non-static tree_entry array Victoria Dye via GitGitGadget
@ 2024-06-19 21:57 ` Victoria Dye via GitGitGadget
2024-06-19 21:57 ` [PATCH v2 05/17] index-info.c: return unrecognized lines to caller Victoria Dye via GitGitGadget
` (13 subsequent siblings)
17 siblings, 0 replies; 65+ messages in thread
From: Victoria Dye via GitGitGadget @ 2024-06-19 21:57 UTC (permalink / raw)
To: git; +Cc: Eric Sunshine, Patrick Steinhardt, Victoria Dye, Victoria Dye
From: Victoria Dye <vdye@github.com>
Move 'read_index_info()' into a new header 'index-info.h' and generalize the
function to call a provided callback for each parsed line. Update
'update-index.c' to use this generalized 'read_index_info()', adding the
callback 'apply_index_info()' to verify the parsed line and update the index
according to its contents.
Switching to using a callback to validate the parsed entry in 'update-index'
results in a slight change to the error message indicating a file could not
be removed from the index. The original implementation uses the raw, quoted
pathname in the error message, whereas the callback (without access to the
raw pathname) uses the unquoted value. However, this change makes the failed
removal message consistent with all other error messages in the function,
and that consistency is likely more beneficial than not to a user.
The motivation for this change is to consolidate the already-similar input
parsing logic in 'git update-index' and 'git mktree', avoiding code
duplication and the associated maintenance burden. The input formats
accepted by 'update-index' are a superset of those accepted by 'mktree', so
in a later commit we can replace the input parsing of the latter with
'read_index_info()' without breaking existing usage.
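
As a rough illustration of the callback-driven parsing described above, the
following standalone sketch splits a single input line and hands the fields
to a caller-supplied function. It is not the code from this patch: the names
'parse_index_info_line', 'each_line_fn', 'struct seen' and 'record_entry'
are hypothetical, the OID is kept as a hex string rather than a 'struct
object_id', the hash length is hard-coded as an assumption, and path
unquoting is omitted.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define HEXSZ 40 /* assumption: SHA-1 hex length; Git asks the_hash_algo */

typedef int (*each_line_fn)(unsigned int mode, const char *oid_hex, int stage,
			    const char *path, void *cbdata);

/*
 * Split one "mode SP [type SP] oid [SP stage] TAB path" line in place and
 * hand the fields to 'fn'. Returns fn's result, or -1 for a malformed line.
 * The optional type field is skipped, not validated, in this sketch.
 */
static int parse_index_info_line(char *line, each_line_fn fn, void *cbdata)
{
	char *ptr, *tab, *path;
	unsigned long ul;
	int stage = 0;

	assert(line && fn);

	errno = 0;
	ul = strtoul(line, &ptr, 8);
	if (ptr == line || *ptr != ' ' || errno || (unsigned int)ul != ul)
		return -1;

	tab = strchr(ptr, '\t');
	if (!tab || tab - ptr < HEXSZ + 1)
		return -1;
	path = tab + 1;

	if (tab[-2] == ' ' && '0' <= tab[-1] && tab[-1] <= '3') {
		stage = tab[-1] - '0';
		tab -= 2; /* now points just past the OID */
		if (tab - ptr < HEXSZ + 1)
			return -1;
	}

	if (tab[-(HEXSZ + 1)] != ' ')
		return -1;
	for (int i = 1; i <= HEXSZ; i++)
		if (!isxdigit((unsigned char)tab[-i]))
			return -1;

	*tab = '\0'; /* terminate the OID so the callback sees a C string */
	return fn((unsigned int)ul, tab - HEXSZ, stage, path, cbdata);
}

/* Example callback state: remembers the most recent entry. */
struct seen {
	unsigned int mode;
	int stage;
	char oid_hex[HEXSZ + 1];
	char path[64];
};

static int record_entry(unsigned int mode, const char *oid_hex, int stage,
			const char *path, void *cbdata)
{
	struct seen *s = cbdata;

	s->mode = mode;
	s->stage = stage;
	memcpy(s->oid_hex, oid_hex, HEXSZ + 1); /* includes the NUL */
	strncpy(s->path, path, sizeof(s->path) - 1);
	s->path[sizeof(s->path) - 1] = '\0';
	return 0;
}
```

Because the parser knows nothing about what the callback does with an entry,
the same loop could in principle drive an index update in one command and
tree construction in another, which is the point of the consolidation.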
Co-authored-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Victoria Dye <vdye@github.com>
---
Documentation/git-update-index.txt | 16 +---
Documentation/index-info-formats.txt | 13 +++
Makefile | 1 +
builtin/update-index.c | 129 +++++++--------------------
index-info.c | 90 +++++++++++++++++++
index-info.h | 11 +++
t/t2107-update-index-basic.sh | 27 ++++++
7 files changed, 177 insertions(+), 110 deletions(-)
create mode 100644 Documentation/index-info-formats.txt
create mode 100644 index-info.c
create mode 100644 index-info.h
diff --git a/Documentation/git-update-index.txt b/Documentation/git-update-index.txt
index 7128aed5405..e52aecb845d 100644
--- a/Documentation/git-update-index.txt
+++ b/Documentation/git-update-index.txt
@@ -278,21 +278,9 @@ USING --INDEX-INFO
`--index-info` is a more powerful mechanism that lets you feed
multiple entry definitions from the standard input, and designed
-specifically for scripts. It can take inputs of three formats:
+specifically for scripts. It can take inputs in the following formats:
- . mode SP type SP sha1 TAB path
-+
-This format is to stuff `git ls-tree` output into the index.
-
- . mode SP sha1 SP stage TAB path
-+
-This format is to put higher order stages into the
-index file and matches 'git ls-files --stage' output.
-
- . mode SP sha1 TAB path
-+
-This format is no longer produced by any Git command, but is
-and will continue to be supported by `update-index --index-info`.
+include::index-info-formats.txt[]
To place a higher stage entry to the index, the path should
first be removed by feeding a mode=0 entry for the path, and
diff --git a/Documentation/index-info-formats.txt b/Documentation/index-info-formats.txt
new file mode 100644
index 00000000000..037ebd24321
--- /dev/null
+++ b/Documentation/index-info-formats.txt
@@ -0,0 +1,13 @@
+ . mode SP type SP sha1 TAB path
++
+This format is to use `git ls-tree` output.
+
+ . mode SP sha1 SP stage TAB path
++
+This format allows higher order stages to appear and
+matches 'git ls-files --stage' output.
+
+ . mode SP sha1 TAB path
++
+This format is no longer produced by any Git command, but is
+and will continue to be supported.
diff --git a/Makefile b/Makefile
index 2f5f16847ae..db9604e59c3 100644
--- a/Makefile
+++ b/Makefile
@@ -1037,6 +1037,7 @@ LIB_OBJS += hex.o
LIB_OBJS += hex-ll.o
LIB_OBJS += hook.o
LIB_OBJS += ident.o
+LIB_OBJS += index-info.o
LIB_OBJS += json-writer.o
LIB_OBJS += kwset.o
LIB_OBJS += levenshtein.o
diff --git a/builtin/update-index.c b/builtin/update-index.c
index d343416ae26..fddf59b54c1 100644
--- a/builtin/update-index.c
+++ b/builtin/update-index.c
@@ -11,6 +11,7 @@
#include "gettext.h"
#include "hash.h"
#include "hex.h"
+#include "index-info.h"
#include "lockfile.h"
#include "quote.h"
#include "cache-tree.h"
@@ -509,100 +510,29 @@ static void update_one(const char *path)
report("add '%s'", path);
}
-static void read_index_info(int nul_term_line)
+static int apply_index_info(unsigned int mode, struct object_id *oid, int stage,
+ const char *path_name, void *cbdata UNUSED)
{
- const int hexsz = the_hash_algo->hexsz;
- struct strbuf buf = STRBUF_INIT;
- struct strbuf uq = STRBUF_INIT;
- strbuf_getline_fn getline_fn;
+ if (!verify_path(path_name, mode)) {
+ fprintf(stderr, "Ignoring path %s\n", path_name);
+ return 0;
+ }
- getline_fn = nul_term_line ? strbuf_getline_nul : strbuf_getline_lf;
- while (getline_fn(&buf, stdin) != EOF) {
- char *ptr, *tab;
- char *path_name;
- struct object_id oid;
- unsigned int mode;
- unsigned long ul;
- int stage;
-
- /* This reads lines formatted in one of three formats:
- *
- * (1) mode SP sha1 TAB path
- * The first format is what "git apply --index-info"
- * reports, and used to reconstruct a partial tree
- * that is used for phony merge base tree when falling
- * back on 3-way merge.
- *
- * (2) mode SP type SP sha1 TAB path
- * The second format is to stuff "git ls-tree" output
- * into the index file.
- *
- * (3) mode SP sha1 SP stage TAB path
- * This format is to put higher order stages into the
- * index file and matches "git ls-files --stage" output.
+ if (!mode) {
+ /* mode == 0 means there is no such path -- remove */
+ if (remove_file_from_index(the_repository->index, path_name))
+ die("git update-index: unable to remove %s", path_name);
+ }
+ else {
+ /* mode ' ' sha1 '\t' name
+ * ptr[-1] points at tab,
+ * ptr[-41] is at the beginning of sha1
*/
- errno = 0;
- ul = strtoul(buf.buf, &ptr, 8);
- if (ptr == buf.buf || *ptr != ' '
- || errno || (unsigned int) ul != ul)
- goto bad_line;
- mode = ul;
-
- tab = strchr(ptr, '\t');
- if (!tab || tab - ptr < hexsz + 1)
- goto bad_line;
-
- if (tab[-2] == ' ' && '0' <= tab[-1] && tab[-1] <= '3') {
- stage = tab[-1] - '0';
- ptr = tab + 1; /* point at the head of path */
- tab = tab - 2; /* point at tail of sha1 */
- }
- else {
- stage = 0;
- ptr = tab + 1; /* point at the head of path */
- }
-
- if (get_oid_hex(tab - hexsz, &oid) ||
- tab[-(hexsz + 1)] != ' ')
- goto bad_line;
-
- path_name = ptr;
- if (!nul_term_line && path_name[0] == '"') {
- strbuf_reset(&uq);
- if (unquote_c_style(&uq, path_name, NULL)) {
- die("git update-index: bad quoting of path name");
- }
- path_name = uq.buf;
- }
-
- if (!verify_path(path_name, mode)) {
- fprintf(stderr, "Ignoring path %s\n", path_name);
- continue;
- }
-
- if (!mode) {
- /* mode == 0 means there is no such path -- remove */
- if (remove_file_from_index(the_repository->index, path_name))
- die("git update-index: unable to remove %s",
- ptr);
- }
- else {
- /* mode ' ' sha1 '\t' name
- * ptr[-1] points at tab,
- * ptr[-41] is at the beginning of sha1
- */
- ptr[-(hexsz + 2)] = ptr[-1] = 0;
- if (add_cacheinfo(mode, &oid, path_name, stage))
- die("git update-index: unable to update %s",
- path_name);
- }
- continue;
-
- bad_line:
- die("malformed index info %s", buf.buf);
+ if (add_cacheinfo(mode, oid, path_name, stage))
+ die("git update-index: unable to update %s", path_name);
}
- strbuf_release(&buf);
- strbuf_release(&uq);
+
+ return 0;
}
static const char * const update_index_usage[] = {
@@ -848,16 +778,23 @@ static enum parse_opt_result stdin_cacheinfo_callback(
struct parse_opt_ctx_t *ctx, const struct option *opt,
const char *arg, int unset)
{
- int *nul_term_line = opt->value;
+ int ret = 0;
BUG_ON_OPT_NEG(unset);
BUG_ON_OPT_ARG(arg);
- if (ctx->argc != 1)
- return error("option '%s' must be the last argument", opt->long_name);
- allow_add = allow_replace = allow_remove = 1;
- read_index_info(*nul_term_line);
- return 0;
+ if (ctx->argc != 1) {
+ ret = error("option '%s' must be the last argument", opt->long_name);
+ } else {
+ int *nul_term_line = opt->value;
+
+ allow_add = allow_replace = allow_remove = 1;
+ ret = read_index_info(*nul_term_line, apply_index_info, NULL);
+ if (ret)
+ ret = -1;
+ }
+
+ return ret;
}
static enum parse_opt_result stdin_callback(
diff --git a/index-info.c b/index-info.c
new file mode 100644
index 00000000000..8ccaac5487b
--- /dev/null
+++ b/index-info.c
@@ -0,0 +1,90 @@
+#include "git-compat-util.h"
+#include "index-info.h"
+#include "hash.h"
+#include "hex.h"
+#include "strbuf.h"
+#include "quote.h"
+
+int read_index_info(int nul_term_line, each_index_info_fn fn, void *cbdata)
+{
+ const int hexsz = the_hash_algo->hexsz;
+ struct strbuf buf = STRBUF_INIT;
+ struct strbuf uq = STRBUF_INIT;
+ strbuf_getline_fn getline_fn;
+ int ret = 0;
+
+ getline_fn = nul_term_line ? strbuf_getline_nul : strbuf_getline_lf;
+ while (getline_fn(&buf, stdin) != EOF) {
+ char *ptr, *tab;
+ char *path_name;
+ struct object_id oid;
+ unsigned int mode;
+ unsigned long ul;
+ int stage;
+
+ /* This reads lines formatted in one of three formats:
+ *
+ * (1) mode SP sha1 TAB path
+ * The first format is what "git apply --index-info"
+ * reports, and used to reconstruct a partial tree
+ * that is used for phony merge base tree when falling
+ * back on 3-way merge.
+ *
+ * (2) mode SP type SP sha1 TAB path
+ * The second format is to stuff "git ls-tree" output
+ * into the index file.
+ *
+ * (3) mode SP sha1 SP stage TAB path
+ * This format is to put higher order stages into the
+ * index file and matches "git ls-files --stage" output.
+ */
+ errno = 0;
+ ul = strtoul(buf.buf, &ptr, 8);
+ if (ptr == buf.buf || *ptr != ' '
+ || errno || (unsigned int) ul != ul)
+ goto bad_line;
+ mode = ul;
+
+ tab = strchr(ptr, '\t');
+ if (!tab || tab - ptr < hexsz + 1)
+ goto bad_line;
+
+ if (tab[-2] == ' ' && '0' <= tab[-1] && tab[-1] <= '3') {
+ stage = tab[-1] - '0';
+ ptr = tab + 1; /* point at the head of path */
+ tab = tab - 2; /* point at tail of sha1 */
+ } else {
+ stage = 0;
+ ptr = tab + 1; /* point at the head of path */
+ }
+
+ if (get_oid_hex(tab - hexsz, &oid) ||
+ tab[-(hexsz + 1)] != ' ')
+ goto bad_line;
+
+ path_name = ptr;
+ if (!nul_term_line && path_name[0] == '"') {
+ strbuf_reset(&uq);
+ if (unquote_c_style(&uq, path_name, NULL)) {
+ ret = error("bad quoting of path name");
+ break;
+ }
+ path_name = uq.buf;
+ }
+
+ ret = fn(mode, &oid, stage, path_name, cbdata);
+ if (ret) {
+ ret = -1;
+ break;
+ }
+
+ continue;
+
+ bad_line:
+ die("malformed input line '%s'", buf.buf);
+ }
+ strbuf_release(&buf);
+ strbuf_release(&uq);
+
+ return ret;
+}
diff --git a/index-info.h b/index-info.h
new file mode 100644
index 00000000000..d650498325a
--- /dev/null
+++ b/index-info.h
@@ -0,0 +1,11 @@
+#ifndef INDEX_INFO_H
+#define INDEX_INFO_H
+
+#include "hash.h"
+
+typedef int (*each_index_info_fn)(unsigned int, struct object_id *, int, const char *, void *);
+
+/* Iterate over parsed index info from stdin */
+int read_index_info(int nul_term_line, each_index_info_fn fn, void *cbdata);
+
+#endif /* INDEX_INFO_H */
diff --git a/t/t2107-update-index-basic.sh b/t/t2107-update-index-basic.sh
index cc72ead79f3..794a5b1a184 100755
--- a/t/t2107-update-index-basic.sh
+++ b/t/t2107-update-index-basic.sh
@@ -142,4 +142,31 @@ test_expect_success '--index-version' '
test_must_be_empty actual
'
+test_expect_success '--index-info fails on malformed input' '
+ # empty line
+ echo "" |
+ test_must_fail git update-index --index-info 2>err &&
+ test_grep "malformed input line" err &&
+
+ # bad whitespace
+ printf "100644 $EMPTY_BLOB A" |
+ test_must_fail git update-index --index-info 2>err &&
+ test_grep "malformed input line" err &&
+
+ # invalid stage value
+ printf "100644 $EMPTY_BLOB 5\tA" |
+ test_must_fail git update-index --index-info 2>err &&
+ test_grep "malformed input line" err &&
+
+ # invalid OID length
+ printf "100755 abc123\tA" |
+ test_must_fail git update-index --index-info 2>err &&
+ test_grep "malformed input line" err &&
+
+ # bad quoting
+ printf "100644 $EMPTY_BLOB\t\"A" |
+ test_must_fail git update-index --index-info 2>err &&
+ test_grep "bad quoting of path name" err
+'
+
test_done
--
gitgitgadget
* [PATCH v2 05/17] index-info.c: return unrecognized lines to caller
2024-06-19 21:57 ` [PATCH v2 00/17] mktree: support more flexible usage Victoria Dye via GitGitGadget
` (3 preceding siblings ...)
2024-06-19 21:57 ` [PATCH v2 04/17] update-index: generalize 'read_index_info' Victoria Dye via GitGitGadget
@ 2024-06-19 21:57 ` Victoria Dye via GitGitGadget
2024-06-19 21:57 ` [PATCH v2 06/17] index-info.c: parse object type in provided in read_index_info Victoria Dye via GitGitGadget
` (12 subsequent siblings)
17 siblings, 0 replies; 65+ messages in thread
From: Victoria Dye via GitGitGadget @ 2024-06-19 21:57 UTC (permalink / raw)
To: git; +Cc: Eric Sunshine, Patrick Steinhardt, Victoria Dye, Victoria Dye
From: Victoria Dye <vdye@github.com>
Update 'read_index_info()' to return INDEX_INFO_UNRECOGNIZED_LINE (value 1),
rather than die()-ing, when the function encounters a line that cannot be
parsed according to one of the accepted formats. This gives the caller the
flexibility to fall back on custom handling for such lines rather than
returning a catch-all error. In the case of 'update-index', we'll still exit
with a "malformed input line" error. However, when 'read_index_info()' is
used to process the input to 'mktree' in a later patch, an unrecognized
empty line will signal the start of a new tree in --batch mode.
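
To make the "return a sentinel and let the caller decide" idea concrete,
here is a small self-contained sketch. It is an illustration only:
'UNRECOGNIZED_LINE', 'struct input', 'read_lines', 'count_entry' and
'count_batches' are hypothetical names, input comes from an in-memory array
rather than stdin, and "has no TAB" stands in for "cannot be parsed".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define UNRECOGNIZED_LINE 1 /* plays the role of INDEX_INFO_UNRECOGNIZED_LINE */

typedef int (*line_fn)(const char *line, void *cbdata);

/* Hypothetical in-memory input: a NULL-terminated list of lines. */
struct input {
	const char **lines;
	size_t pos;
};

/*
 * Feed each line to 'fn' until the input runs out. A line without a TAB is
 * not treated as fatal: it is handed back through 'bad_line' together with
 * a positive return value, and reading can be resumed by calling again.
 */
static int read_lines(struct input *in, line_fn fn, void *cbdata,
		      const char **bad_line)
{
	const char *line;

	assert(in && fn && bad_line);
	while ((line = in->lines[in->pos])) {
		in->pos++;
		if (!strchr(line, '\t')) {
			*bad_line = line;
			return UNRECOGNIZED_LINE;
		}
		if (fn(line, cbdata))
			return -1;
	}
	*bad_line = NULL;
	return 0;
}

static int count_entry(const char *line, void *cbdata)
{
	(void)line;
	++*(int *)cbdata;
	return 0;
}

/*
 * Batch-style caller: an empty unrecognized line ends the current batch,
 * any other unrecognized line is an error. Empty batches are not counted.
 */
static int count_batches(struct input *in, int *total)
{
	const char *bad_line;
	int batches = 0, ret;

	do {
		int before = *total;

		ret = read_lines(in, count_entry, total, &bad_line);
		if (ret < 0 || (ret == UNRECOGNIZED_LINE && bad_line[0]))
			return -1;
		if (*total > before)
			batches++;
	} while (ret == UNRECOGNIZED_LINE);

	return batches;
}
```

A stricter caller would simply map every UNRECOGNIZED_LINE result to an
error, which matches what 'update-index' is described as doing above.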
Signed-off-by: Victoria Dye <vdye@github.com>
---
builtin/update-index.c | 9 +++++++--
index-info.c | 16 +++++++++-------
index-info.h | 5 ++++-
3 files changed, 20 insertions(+), 10 deletions(-)
diff --git a/builtin/update-index.c b/builtin/update-index.c
index fddf59b54c1..8d0b40a6fd6 100644
--- a/builtin/update-index.c
+++ b/builtin/update-index.c
@@ -787,11 +787,16 @@ static enum parse_opt_result stdin_cacheinfo_callback(
ret = error("option '%s' must be the last argument", opt->long_name);
} else {
int *nul_term_line = opt->value;
+ struct strbuf line = STRBUF_INIT;
allow_add = allow_replace = allow_remove = 1;
- ret = read_index_info(*nul_term_line, apply_index_info, NULL);
- if (ret)
+ ret = read_index_info(*nul_term_line, apply_index_info, NULL, &line);
+
+ if (ret == INDEX_INFO_UNRECOGNIZED_LINE)
+ ret = error("malformed input line '%s'", line.buf);
+ else if (ret)
ret = -1;
+ strbuf_release(&line);
}
return ret;
diff --git a/index-info.c b/index-info.c
index 8ccaac5487b..7a02f66426a 100644
--- a/index-info.c
+++ b/index-info.c
@@ -5,16 +5,16 @@
#include "strbuf.h"
#include "quote.h"
-int read_index_info(int nul_term_line, each_index_info_fn fn, void *cbdata)
+int read_index_info(int nul_term_line, each_index_info_fn fn, void *cbdata,
+ struct strbuf *line)
{
const int hexsz = the_hash_algo->hexsz;
- struct strbuf buf = STRBUF_INIT;
struct strbuf uq = STRBUF_INIT;
strbuf_getline_fn getline_fn;
int ret = 0;
getline_fn = nul_term_line ? strbuf_getline_nul : strbuf_getline_lf;
- while (getline_fn(&buf, stdin) != EOF) {
+ while (getline_fn(line, stdin) != EOF) {
char *ptr, *tab;
char *path_name;
struct object_id oid;
@@ -39,8 +39,8 @@ int read_index_info(int nul_term_line, each_index_info_fn fn, void *cbdata)
* index file and matches "git ls-files --stage" output.
*/
errno = 0;
- ul = strtoul(buf.buf, &ptr, 8);
- if (ptr == buf.buf || *ptr != ' '
+ ul = strtoul(line->buf, &ptr, 8);
+ if (ptr == line->buf || *ptr != ' '
|| errno || (unsigned int) ul != ul)
goto bad_line;
mode = ul;
@@ -81,10 +81,12 @@ int read_index_info(int nul_term_line, each_index_info_fn fn, void *cbdata)
continue;
bad_line:
- die("malformed input line '%s'", buf.buf);
+ ret = INDEX_INFO_UNRECOGNIZED_LINE;
+ break;
}
- strbuf_release(&buf);
strbuf_release(&uq);
+ if (!ret)
+ strbuf_reset(line);
return ret;
}
diff --git a/index-info.h b/index-info.h
index d650498325a..9258011462d 100644
--- a/index-info.h
+++ b/index-info.h
@@ -5,7 +5,10 @@
typedef int (*each_index_info_fn)(unsigned int, struct object_id *, int, const char *, void *);
+#define INDEX_INFO_UNRECOGNIZED_LINE 1
+
/* Iterate over parsed index info from stdin */
-int read_index_info(int nul_term_line, each_index_info_fn fn, void *cbdata);
+int read_index_info(int nul_term_line, each_index_info_fn fn, void *cbdata,
+ struct strbuf *line);
#endif /* INDEX_INFO_H */
--
gitgitgadget
* [PATCH v2 06/17] index-info.c: parse object type in provided in read_index_info
2024-06-19 21:57 ` [PATCH v2 00/17] mktree: support more flexible usage Victoria Dye via GitGitGadget
` (4 preceding siblings ...)
2024-06-19 21:57 ` [PATCH v2 05/17] index-info.c: return unrecognized lines to caller Victoria Dye via GitGitGadget
@ 2024-06-19 21:57 ` Victoria Dye via GitGitGadget
2024-06-19 21:57 ` [PATCH v2 07/17] mktree: use read_index_info to read stdin lines Victoria Dye via GitGitGadget
` (11 subsequent siblings)
17 siblings, 0 replies; 65+ messages in thread
From: Victoria Dye via GitGitGadget @ 2024-06-19 21:57 UTC (permalink / raw)
To: git; +Cc: Eric Sunshine, Patrick Steinhardt, Victoria Dye, Victoria Dye
From: Victoria Dye <vdye@github.com>
If the object type (e.g. "blob", "tree") is identified on a stdin line read
by 'read_index_info()' (i.e. on lines formatted like the output of 'git
ls-tree'), parse it into an 'enum object_type' and provide it to the
'read_index_info()' callback as an argument. If the type is not provided,
pass 'OBJ_ANY' instead. If the object type is invalid, return an error.
The goal of this change is to allow for more thorough validation of the
provided object type (e.g. against the provided mode) in 'mktree' once
'mktree_line' is replaced with 'read_index_info()'. Note, though, that this
change also strengthens the validation done by 'update-index', since invalid
type names now trigger an error.
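
The kind of validation this enables can be sketched as follows. This is a
simplified illustration, not Git's implementation: the 'enum obj_type'
values and the 'type_from_field', 'type_from_mode' and 'type_matches_mode'
helpers are hypothetical, and only the blob, tree and commit (submodule)
cases are modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical type codes; TYPE_ANY means "no type was given on the line". */
enum obj_type { TYPE_BAD = -1, TYPE_ANY = 0, TYPE_BLOB, TYPE_TREE, TYPE_COMMIT };

/* Map the optional type field (possibly empty) to a type code. */
static enum obj_type type_from_field(const char *field, size_t len)
{
	assert(field != NULL || len == 0);
	if (!len)
		return TYPE_ANY;
	if (len == 4 && !memcmp(field, "blob", 4))
		return TYPE_BLOB;
	if (len == 4 && !memcmp(field, "tree", 4))
		return TYPE_TREE;
	if (len == 6 && !memcmp(field, "commit", 6))
		return TYPE_COMMIT;
	return TYPE_BAD;
}

/* Derive the expected type from the file-type bits of the mode. */
static enum obj_type type_from_mode(unsigned int mode)
{
	switch (mode & 0170000) {
	case 0040000:
		return TYPE_TREE;   /* directory */
	case 0160000:
		return TYPE_COMMIT; /* gitlink (submodule) */
	default:
		return TYPE_BLOB;   /* regular file or symlink */
	}
}

/* An omitted type always passes; a given type must agree with the mode. */
static int type_matches_mode(enum obj_type type, unsigned int mode)
{
	if (type == TYPE_BAD)
		return 0;
	return type == TYPE_ANY || type == type_from_mode(mode);
}
```

Passing a distinct "any" value for lines without a type keeps the older
two-field formats valid while still letting a caller reject, say, a line
that pairs a directory mode with "blob".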
Signed-off-by: Victoria Dye <vdye@github.com>
---
builtin/update-index.c | 3 ++-
index-info.c | 16 ++++++++++++----
index-info.h | 3 ++-
t/t2107-update-index-basic.sh | 5 +++++
4 files changed, 21 insertions(+), 6 deletions(-)
diff --git a/builtin/update-index.c b/builtin/update-index.c
index 8d0b40a6fd6..42a274f9ce4 100644
--- a/builtin/update-index.c
+++ b/builtin/update-index.c
@@ -510,7 +510,8 @@ static void update_one(const char *path)
report("add '%s'", path);
}
-static int apply_index_info(unsigned int mode, struct object_id *oid, int stage,
+static int apply_index_info(unsigned int mode, struct object_id *oid,
+ enum object_type obj_type UNUSED, int stage,
const char *path_name, void *cbdata UNUSED)
{
if (!verify_path(path_name, mode)) {
diff --git a/index-info.c b/index-info.c
index 7a02f66426a..9c986cd9093 100644
--- a/index-info.c
+++ b/index-info.c
@@ -18,6 +18,7 @@ int read_index_info(int nul_term_line, each_index_info_fn fn, void *cbdata,
char *ptr, *tab;
char *path_name;
struct object_id oid;
+ enum object_type obj_type = OBJ_ANY;
unsigned int mode;
unsigned long ul;
int stage;
@@ -51,18 +52,17 @@ int read_index_info(int nul_term_line, each_index_info_fn fn, void *cbdata,
if (tab[-2] == ' ' && '0' <= tab[-1] && tab[-1] <= '3') {
stage = tab[-1] - '0';
- ptr = tab + 1; /* point at the head of path */
+ path_name = tab + 1; /* point at the head of path */
tab = tab - 2; /* point at tail of sha1 */
} else {
stage = 0;
- ptr = tab + 1; /* point at the head of path */
+ path_name = tab + 1; /* point at the head of path */
}
if (get_oid_hex(tab - hexsz, &oid) ||
tab[-(hexsz + 1)] != ' ')
goto bad_line;
- path_name = ptr;
if (!nul_term_line && path_name[0] == '"') {
strbuf_reset(&uq);
if (unquote_c_style(&uq, path_name, NULL)) {
@@ -72,7 +72,15 @@ int read_index_info(int nul_term_line, each_index_info_fn fn, void *cbdata,
path_name = uq.buf;
}
- ret = fn(mode, &oid, stage, path_name, cbdata);
+ /* Get the type, if provided */
+ if (tab - hexsz - 1 > ptr + 1) {
+ if (*(tab - hexsz - 1) != ' ')
+ goto bad_line;
+ *(tab - hexsz - 1) = '\0';
+ obj_type = type_from_string(ptr + 1);
+ }
+
+ ret = fn(mode, &oid, obj_type, stage, path_name, cbdata);
if (ret) {
ret = -1;
break;
diff --git a/index-info.h b/index-info.h
index 9258011462d..adea453b197 100644
--- a/index-info.h
+++ b/index-info.h
@@ -2,8 +2,9 @@
#define INDEX_INFO_H
#include "hash.h"
+#include "object.h"
-typedef int (*each_index_info_fn)(unsigned int, struct object_id *, int, const char *, void *);
+typedef int (*each_index_info_fn)(unsigned int, struct object_id *, enum object_type, int, const char *, void *);
#define INDEX_INFO_UNRECOGNIZED_LINE 1
diff --git a/t/t2107-update-index-basic.sh b/t/t2107-update-index-basic.sh
index 794a5b1a184..9e0e77bbf9e 100755
--- a/t/t2107-update-index-basic.sh
+++ b/t/t2107-update-index-basic.sh
@@ -153,6 +153,11 @@ test_expect_success '--index-info fails on malformed input' '
test_must_fail git update-index --index-info 2>err &&
test_grep "malformed input line" err &&
+ # invalid type
+ printf "100644 bad $EMPTY_BLOB\tA" |
+ test_must_fail git update-index --index-info 2>err &&
+ test_grep "invalid object type" err &&
+
# invalid stage value
printf "100644 $EMPTY_BLOB 5\tA" |
test_must_fail git update-index --index-info 2>err &&
--
gitgitgadget
* [PATCH v2 07/17] mktree: use read_index_info to read stdin lines
2024-06-19 21:57 ` [PATCH v2 00/17] mktree: support more flexible usage Victoria Dye via GitGitGadget
` (5 preceding siblings ...)
2024-06-19 21:57 ` [PATCH v2 06/17] index-info.c: parse object type in provided in read_index_info Victoria Dye via GitGitGadget
@ 2024-06-19 21:57 ` Victoria Dye via GitGitGadget
2024-06-20 20:18 ` Junio C Hamano
2024-06-19 21:57 ` [PATCH v2 08/17] mktree.c: do not fail on mismatched submodule type Victoria Dye via GitGitGadget
` (10 subsequent siblings)
17 siblings, 1 reply; 65+ messages in thread
From: Victoria Dye via GitGitGadget @ 2024-06-19 21:57 UTC (permalink / raw)
To: git; +Cc: Eric Sunshine, Patrick Steinhardt, Victoria Dye, Victoria Dye
From: Victoria Dye <vdye@github.com>
Replace the custom input parsing of 'mktree' with 'read_index_info()', which
accepts not only the 'ls-tree' output format that 'mktree' already handles
but also the other formats compatible with 'update-index'. This makes the two
commands consistent (avoiding the need for two similar input parsing
implementations) and adds flexibility to 'mktree'.
It should be noted that, while the error messages are largely preserved in
the refactor, one does change: "fatal: invalid quoting" is now "error: bad
quoting of path name".
Update 'Documentation/git-mktree.txt' to reflect the more permissive input
format, as well as make a note about rejecting stage values higher than 0.
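
To make the accepted line shapes concrete, here is a minimal standalone
sketch of a parser for the three 'update-index --index-info' style formats
(mode SP oid TAB path, mode SP type SP oid TAB path, and mode SP oid SP stage
TAB path). It is not the code in index-info.c: 'parse_entry_line()', 'struct
entry_line' and the fixed 40-character hex length are assumptions made for
illustration, and path unquoting is left out.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define HEXSZ 40 /* SHA-1 hex length; an assumption for this sketch */

/* Hypothetical result type, not a structure from git. */
struct entry_line {
	unsigned int mode;
	char type[16];       /* empty string when no type was given */
	char oid[HEXSZ + 1];
	int stage;           /* 0 when no stage was given */
	const char *path;    /* points into the input line */
};

/* True if the first 'len' bytes of 's' look like a full hex object ID. */
static int is_hex_oid(const char *s, size_t len)
{
	if (len != HEXSZ)
		return 0;
	return strspn(s, "0123456789abcdef") >= HEXSZ;
}

/* Returns 0 on success, -1 on a malformed line. */
static int parse_entry_line(const char *line, struct entry_line *out)
{
	char *end;
	const char *tab, *p, *sp;

	assert(line && out);
	memset(out, 0, sizeof(*out));

	out->mode = (unsigned int)strtoul(line, &end, 8);
	if (end == line || *end != ' ')
		return -1;
	p = end + 1;

	/* Everything after the first TAB is the path. */
	tab = strchr(p, '\t');
	if (!tab || tab[1] == '\0')
		return -1;
	out->path = tab + 1;

	sp = memchr(p, ' ', (size_t)(tab - p));
	if (!sp) {
		/* mode SP oid TAB path */
		if (!is_hex_oid(p, (size_t)(tab - p)))
			return -1;
		memcpy(out->oid, p, HEXSZ);
	} else if (is_hex_oid(p, (size_t)(sp - p))) {
		/* mode SP oid SP stage TAB path */
		if (tab - sp != 2 || sp[1] < '0' || sp[1] > '3')
			return -1;
		memcpy(out->oid, p, HEXSZ);
		out->stage = sp[1] - '0';
	} else {
		/* mode SP type SP oid TAB path */
		size_t type_len = (size_t)(sp - p);

		if (!type_len || type_len >= sizeof(out->type))
			return -1;
		if (!is_hex_oid(sp + 1, (size_t)(tab - sp - 1)))
			return -1;
		memcpy(out->type, p, type_len);
		memcpy(out->oid, sp + 1, HEXSZ);
	}
	return 0;
}
```

A caller modeled on 'mktree' would then reject any entry whose 'stage' is
nonzero, since a tree object cannot represent conflicted entries.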
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Victoria Dye <vdye@github.com>
---
Documentation/git-mktree.txt | 26 ++++--
builtin/mktree.c | 156 +++++++++++++++--------------------
t/t1010-mktree.sh | 66 +++++++++++++++
3 files changed, 151 insertions(+), 97 deletions(-)
diff --git a/Documentation/git-mktree.txt b/Documentation/git-mktree.txt
index 383f09dd333..c187403c6bd 100644
--- a/Documentation/git-mktree.txt
+++ b/Documentation/git-mktree.txt
@@ -3,7 +3,7 @@ git-mktree(1)
NAME
----
-git-mktree - Build a tree-object from ls-tree formatted text
+git-mktree - Build a tree-object from formatted tree entries
SYNOPSIS
@@ -13,15 +13,14 @@ SYNOPSIS
DESCRIPTION
-----------
-Reads standard input in non-recursive `ls-tree` output format, and creates
-a tree object. The order of the tree entries is normalized by mktree so
-pre-sorting the input is not required. The object name of the tree object
-built is written to the standard output.
+Reads entry information from stdin and creates a tree object from those
+entries. The object name of the tree object built is written to the standard
+output.
OPTIONS
-------
-z::
- Read the NUL-terminated `ls-tree -z` output instead.
+ Input lines are separated with NUL rather than LF.
--missing::
Allow missing objects. The default behaviour (without this option)
@@ -35,6 +34,21 @@ OPTIONS
optional. Note - if the `-z` option is used, lines are terminated
with NUL.
+INPUT FORMAT
+------------
+Tree entries may be specified in any of the formats compatible with the
+`--index-info` option to linkgit:git-update-index[1]:
+
+include::index-info-formats.txt[]
+
+Note that if the `stage` of a tree entry is given, the value must be 0.
+Higher stages represent conflicted files in an index; this information
+cannot be represented in a tree object. The command will fail without
+writing the tree if a higher order stage is specified for any entry.
+
+The order of the tree entries is normalized by `mktree` so pre-sorting the
+input by path is not required.
+
GIT
---
Part of the linkgit:git[1] suite
diff --git a/builtin/mktree.c b/builtin/mktree.c
index a96ea10bf95..03a9899bc11 100644
--- a/builtin/mktree.c
+++ b/builtin/mktree.c
@@ -6,6 +6,7 @@
#include "builtin.h"
#include "gettext.h"
#include "hex.h"
+#include "index-info.h"
#include "quote.h"
#include "strbuf.h"
#include "tree.h"
@@ -95,123 +96,96 @@ static const char *mktree_usage[] = {
NULL
};
-static void mktree_line(char *buf, int nul_term_line, int allow_missing,
- struct tree_entry_array *arr)
+struct mktree_line_data {
+ struct tree_entry_array *arr;
+ int allow_missing;
+};
+
+static int mktree_line(unsigned int mode, struct object_id *oid,
+ enum object_type obj_type, int stage,
+ const char *path, void *cbdata)
{
- char *ptr, *ntr;
- const char *p;
- unsigned mode;
- enum object_type mode_type; /* object type derived from mode */
- enum object_type obj_type; /* object type derived from sha */
+ struct mktree_line_data *data = cbdata;
+ enum object_type mode_type = object_type(mode);
struct object_info oi = OBJECT_INFO_INIT;
- char *path, *to_free = NULL;
- struct object_id oid;
+ enum object_type parsed_obj_type;
- ptr = buf;
- /*
- * Read non-recursive ls-tree output format:
- * mode SP type SP sha1 TAB name
- */
- mode = strtoul(ptr, &ntr, 8);
- if (ptr == ntr || !ntr || *ntr != ' ')
- die("input format error: %s", buf);
- ptr = ntr + 1; /* type */
- ntr = strchr(ptr, ' ');
- if (!ntr || parse_oid_hex(ntr + 1, &oid, &p) ||
- *p != '\t')
- die("input format error: %s", buf);
-
- /* It is perfectly normal if we do not have a commit from a submodule */
- if (S_ISGITLINK(mode))
- allow_missing = 1;
-
-
- *ntr++ = 0; /* now at the beginning of SHA1 */
-
- path = (char *)p + 1; /* at the beginning of name */
- if (!nul_term_line && path[0] == '"') {
- struct strbuf p_uq = STRBUF_INIT;
- if (unquote_c_style(&p_uq, path, NULL))
- die("invalid quoting");
- path = to_free = strbuf_detach(&p_uq, NULL);
- }
+ if (stage)
+ die(_("path '%s' is unmerged"), path);
- /*
- * Object type is redundantly derivable three ways.
- * These should all agree.
- */
- mode_type = object_type(mode);
- if (mode_type != type_from_string(ptr)) {
- die("entry '%s' object type (%s) doesn't match mode type (%s)",
- path, ptr, type_name(mode_type));
- }
+ if (obj_type != OBJ_ANY && mode_type != obj_type)
+ die("object type (%s) doesn't match mode type (%s)",
+ type_name(obj_type), type_name(mode_type));
+
+ oi.typep = &parsed_obj_type;
- /* Check the type of object identified by oid without fetching objects */
- oi.typep = &obj_type;
- if (oid_object_info_extended(the_repository, &oid, &oi,
+ if (oid_object_info_extended(the_repository, oid, &oi,
OBJECT_INFO_LOOKUP_REPLACE |
OBJECT_INFO_QUICK |
OBJECT_INFO_SKIP_FETCH_OBJECT) < 0)
- obj_type = -1;
-
- if (obj_type < 0) {
- if (allow_missing) {
- ; /* no problem - missing objects are presumed to be of the right type */
- } else {
- die("entry '%s' object %s is unavailable", path, oid_to_hex(&oid));
- }
- } else {
- if (obj_type != mode_type) {
- /*
- * The object exists but is of the wrong type.
- * This is a problem regardless of allow_missing
- * because the new tree entry will never be correct.
- */
- die("entry '%s' object %s is a %s but specified type was (%s)",
- path, oid_to_hex(&oid), type_name(obj_type), type_name(mode_type));
- }
+ parsed_obj_type = -1;
+
+ if (parsed_obj_type < 0) {
+ /*
+ * There are two conditions where the object being missing
+ * is acceptable:
+ *
+ * - We're explicitly allowing it with --missing.
+ * - The object is a submodule, which we wouldn't expect to
+ * be in this repo anyway.
+ *
+ * If neither condition is met, die().
+ */
+ if (!data->allow_missing && !S_ISGITLINK(mode))
+ die("entry '%s' object %s is unavailable", path, oid_to_hex(oid));
+
+ } else if (parsed_obj_type != mode_type) {
+ /*
+ * The object exists but is of the wrong type.
+ * This is a problem regardless of allow_missing
+ * because the new tree entry will never be correct.
+ */
+ die("entry '%s' object %s is a %s but specified type was (%s)",
+ path, oid_to_hex(oid), type_name(parsed_obj_type), type_name(mode_type));
}
- append_to_tree(mode, &oid, path, arr);
- free(to_free);
+ append_to_tree(mode, oid, path, data->arr);
+ return 0;
}
int cmd_mktree(int ac, const char **av, const char *prefix)
{
- struct strbuf sb = STRBUF_INIT;
struct object_id oid;
int nul_term_line = 0;
- int allow_missing = 0;
int is_batch_mode = 0;
- int got_eof = 0;
struct tree_entry_array arr = { 0 };
- strbuf_getline_fn getline_fn;
+ struct mktree_line_data mktree_line_data = { .arr = &arr };
+ struct strbuf line = STRBUF_INIT;
+ int ret;
const struct option option[] = {
OPT_BOOL('z', NULL, &nul_term_line, N_("input is NUL terminated")),
- OPT_BOOL(0, "missing", &allow_missing, N_("allow missing objects")),
+ OPT_BOOL(0, "missing", &mktree_line_data.allow_missing, N_("allow missing objects")),
OPT_BOOL(0, "batch", &is_batch_mode, N_("allow creation of more than one tree")),
OPT_END()
};
ac = parse_options(ac, av, prefix, option, mktree_usage, 0);
- getline_fn = nul_term_line ? strbuf_getline_nul : strbuf_getline_lf;
-
- while (!got_eof) {
- while (1) {
- if (getline_fn(&sb, stdin) == EOF) {
- got_eof = 1;
- break;
- }
- if (sb.buf[0] == '\0') {
+
+ do {
+ ret = read_index_info(nul_term_line, mktree_line, &mktree_line_data, &line);
+ if (ret < 0)
+ break;
+
+ if (ret == INDEX_INFO_UNRECOGNIZED_LINE) {
+ if (line.len)
+ die("input format error: %s", line.buf);
+ else if (!is_batch_mode)
/* empty lines denote tree boundaries in batch mode */
- if (is_batch_mode)
- break;
die("input format error: (blank line only valid in batch mode)");
- }
- mktree_line(sb.buf, nul_term_line, allow_missing, &arr);
}
- if (is_batch_mode && got_eof && arr.nr < 1) {
+
+ if (is_batch_mode && !ret && arr.nr < 1) {
/*
* Execution gets here if the last tree entry is terminated with a
* new-line. The final new-line has been made optional to be
@@ -224,9 +198,9 @@ int cmd_mktree(int ac, const char **av, const char *prefix)
fflush(stdout);
}
tree_entry_array_clear(&arr, 1); /* reset tree entry buffer for re-use in batch mode */
- }
+ } while (ret > 0);
+ strbuf_release(&line);
tree_entry_array_release(&arr, 1);
- strbuf_release(&sb);
- return 0;
+ return !!ret;
}
diff --git a/t/t1010-mktree.sh b/t/t1010-mktree.sh
index 22875ba598c..649842fa27c 100755
--- a/t/t1010-mktree.sh
+++ b/t/t1010-mktree.sh
@@ -54,11 +54,36 @@ test_expect_success 'ls-tree output in wrong order given to mktree (2)' '
test_cmp tree.withsub actual
'
+test_expect_success '--batch creates multiple trees' '
+ cat top >multi-tree &&
+ echo "" >>multi-tree &&
+ cat top.withsub >>multi-tree &&
+
+ cat tree >expect &&
+ cat tree.withsub >>expect &&
+ git mktree --batch <multi-tree >actual &&
+ test_cmp expect actual
+'
+
test_expect_success 'allow missing object with --missing' '
git mktree --missing <top.missing >actual &&
test_cmp tree.missing actual
'
+test_expect_success 'mktree with invalid submodule OIDs' '
+ # non-existent OID - ok
+ printf "160000 commit $(test_oid numeric)\tA\n" >in &&
+ git mktree <in >tree.actual &&
+ git ls-tree $(cat tree.actual) >actual &&
+ test_cmp in actual &&
+
+ # existing OID, wrong type - error
+ tree_oid="$(cat tree)" &&
+ printf "160000 commit $tree_oid\tA" |
+ test_must_fail git mktree 2>err &&
+ test_grep "object $tree_oid is a tree but specified type was (commit)" err
+'
+
test_expect_success 'mktree refuses to read ls-tree -r output (1)' '
test_must_fail git mktree <all
'
@@ -67,4 +92,45 @@ test_expect_success 'mktree refuses to read ls-tree -r output (2)' '
test_must_fail git mktree <all.withsub
'
+test_expect_success 'mktree fails on malformed input' '
+ # empty line without --batch
+ echo "" |
+ test_must_fail git mktree 2>err &&
+ test_grep "blank line only valid in batch mode" err &&
+
+ # bad whitespace
+ printf "100644 blob $EMPTY_BLOB A" |
+ test_must_fail git mktree 2>err &&
+ test_grep "input format error" err &&
+
+ # invalid type
+ printf "100644 bad $EMPTY_BLOB\tA" |
+ test_must_fail git mktree 2>err &&
+ test_grep "invalid object type" err &&
+
+ # invalid OID length
+ printf "100755 blob abc123\tA" |
+ test_must_fail git mktree 2>err &&
+ test_grep "input format error" err &&
+
+ # bad quoting
+ printf "100644 blob $EMPTY_BLOB\t\"A" |
+ test_must_fail git mktree 2>err &&
+ test_grep "bad quoting of path name" err
+'
+
+test_expect_success 'mktree fails on mode mismatch' '
+ tree_oid="$(cat tree)" &&
+
+ # mode-type mismatch
+ printf "100644 tree $tree_oid\tA" |
+ test_must_fail git mktree 2>err &&
+ test_grep "object type (tree) doesn${SQ}t match mode type (blob)" err &&
+
+ # mode-object mismatch (no --missing)
+ printf "100644 $tree_oid\tA" |
+ test_must_fail git mktree 2>err &&
+ test_grep "object $tree_oid is a tree but specified type was (blob)" err
+'
+
test_done
--
gitgitgadget
* [PATCH v2 08/17] mktree.c: do not fail on mismatched submodule type
2024-06-19 21:57 ` [PATCH v2 00/17] mktree: support more flexible usage Victoria Dye via GitGitGadget
` (6 preceding siblings ...)
2024-06-19 21:57 ` [PATCH v2 07/17] mktree: use read_index_info to read stdin lines Victoria Dye via GitGitGadget
@ 2024-06-19 21:57 ` Victoria Dye via GitGitGadget
2024-06-19 21:57 ` [PATCH v2 09/17] mktree: add a --literally option Victoria Dye via GitGitGadget
` (9 subsequent siblings)
17 siblings, 0 replies; 65+ messages in thread
From: Victoria Dye via GitGitGadget @ 2024-06-19 21:57 UTC (permalink / raw)
To: git; +Cc: Eric Sunshine, Patrick Steinhardt, Victoria Dye, Victoria Dye
From: Victoria Dye <vdye@github.com>
Adjust the 'git mktree' tree entry intake logic to no longer fail if an OID
specified with a S_IFGITLINK mode exists in the current repository's object
database with a different type.
While this scenario likely represents a mistake by the user, submodule OIDs
are not validated as part of object writes or in 'git fsck'. In other
commands, any object info would be ignored if such an OID was found in the
current repository with a different type.
Since this check is not needed to avoid creation of a corrupt tree, let's
remove it and make 'git mktree' less opinionated as a result.
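
The resulting decision logic can be summarized in a small standalone sketch.
This is an illustration of the behavior described above, not the code in
builtin/mktree.c: 'check_entry()', 'struct lookup' and the enum names are
hypothetical, and the object database lookup is replaced by a caller-supplied
result.

```c
#include <assert.h>
#include <stdbool.h>

#define MODE_TYPE_MASK 0170000
#define MODE_DIR       0040000
#define MODE_GITLINK   0160000

enum obj_kind { KIND_BLOB, KIND_TREE, KIND_COMMIT };
enum check_result { ENTRY_OK, ENTRY_MISSING, ENTRY_WRONG_TYPE };

/* Stand-in for the result of looking an OID up in the object database. */
struct lookup {
	bool found;
	enum obj_kind kind; /* only meaningful when 'found' is true */
};

/* Object kind implied by a tree entry mode. */
static enum obj_kind kind_from_mode(unsigned int mode)
{
	switch (mode & MODE_TYPE_MASK) {
	case MODE_DIR:
		return KIND_TREE;
	case MODE_GITLINK:
		return KIND_COMMIT;
	default:
		return KIND_BLOB;
	}
}

static enum check_result check_entry(unsigned int mode,
				     const struct lookup *obj,
				     bool allow_missing)
{
	assert(obj);

	/* Gitlinks are never looked up: missing or mistyped is fine. */
	if ((mode & MODE_TYPE_MASK) == MODE_GITLINK)
		return ENTRY_OK;

	/* A missing object is only acceptable with --missing. */
	if (!obj->found)
		return allow_missing ? ENTRY_OK : ENTRY_MISSING;

	/* An existing object of the wrong type is always an error. */
	if (obj->kind != kind_from_mode(mode))
		return ENTRY_WRONG_TYPE;

	return ENTRY_OK;
}
```

The key point is the ordering of the checks: the gitlink test comes first, so
neither the "unavailable" nor the "wrong type" error can fire for a
submodule entry.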
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Victoria Dye <vdye@github.com>
---
builtin/mktree.c | 58 ++++++++++++++++++++++-------------------------
t/t1010-mktree.sh | 18 ++++++---------
2 files changed, 34 insertions(+), 42 deletions(-)
diff --git a/builtin/mktree.c b/builtin/mktree.c
index 03a9899bc11..f509ed1a81f 100644
--- a/builtin/mktree.c
+++ b/builtin/mktree.c
@@ -107,8 +107,6 @@ static int mktree_line(unsigned int mode, struct object_id *oid,
{
struct mktree_line_data *data = cbdata;
enum object_type mode_type = object_type(mode);
- struct object_info oi = OBJECT_INFO_INIT;
- enum object_type parsed_obj_type;
if (stage)
die(_("path '%s' is unmerged"), path);
@@ -117,36 +115,34 @@ static int mktree_line(unsigned int mode, struct object_id *oid,
die("object type (%s) doesn't match mode type (%s)",
type_name(obj_type), type_name(mode_type));
- oi.typep = &parsed_obj_type;
-
- if (oid_object_info_extended(the_repository, oid, &oi,
- OBJECT_INFO_LOOKUP_REPLACE |
+ if (!S_ISGITLINK(mode)) {
+ struct object_info oi = OBJECT_INFO_INIT;
+ enum object_type parsed_obj_type;
+ unsigned int flags = OBJECT_INFO_LOOKUP_REPLACE |
OBJECT_INFO_QUICK |
- OBJECT_INFO_SKIP_FETCH_OBJECT) < 0)
- parsed_obj_type = -1;
-
- if (parsed_obj_type < 0) {
- /*
- * There are two conditions where the object being missing
- * is acceptable:
- *
- * - We're explicitly allowing it with --missing.
- * - The object is a submodule, which we wouldn't expect to
- * be in this repo anyway.
- *
- * If neither condition is met, die().
- */
- if (!data->allow_missing && !S_ISGITLINK(mode))
- die("entry '%s' object %s is unavailable", path, oid_to_hex(oid));
-
- } else if (parsed_obj_type != mode_type) {
- /*
- * The object exists but is of the wrong type.
- * This is a problem regardless of allow_missing
- * because the new tree entry will never be correct.
- */
- die("entry '%s' object %s is a %s but specified type was (%s)",
- path, oid_to_hex(oid), type_name(parsed_obj_type), type_name(mode_type));
+ OBJECT_INFO_SKIP_FETCH_OBJECT;
+
+ oi.typep = &parsed_obj_type;
+
+ if (oid_object_info_extended(the_repository, oid, &oi, flags) < 0) {
+ /*
+ * If the object is missing and we aren't explicitly
+ * allowing missing objects, die(). Otherwise, continue
+ * without error.
+ */
+ if (!data->allow_missing)
+ die("entry '%s' object %s is unavailable", path,
+ oid_to_hex(oid));
+ } else if (parsed_obj_type != mode_type) {
+ /*
+ * The object exists but is of the wrong type.
+ * This is a problem regardless of allow_missing
+ * because the new tree entry will never be correct.
+ */
+ die("entry '%s' object %s is a %s but specified type was (%s)",
+ path, oid_to_hex(oid), type_name(parsed_obj_type),
+ type_name(mode_type));
+ }
}
append_to_tree(mode, oid, path, data->arr);
diff --git a/t/t1010-mktree.sh b/t/t1010-mktree.sh
index 649842fa27c..48fc532e7af 100755
--- a/t/t1010-mktree.sh
+++ b/t/t1010-mktree.sh
@@ -71,17 +71,13 @@ test_expect_success 'allow missing object with --missing' '
'
test_expect_success 'mktree with invalid submodule OIDs' '
- # non-existent OID - ok
- printf "160000 commit $(test_oid numeric)\tA\n" >in &&
- git mktree <in >tree.actual &&
- git ls-tree $(cat tree.actual) >actual &&
- test_cmp in actual &&
-
- # existing OID, wrong type - error
- tree_oid="$(cat tree)" &&
- printf "160000 commit $tree_oid\tA" |
- test_must_fail git mktree 2>err &&
- test_grep "object $tree_oid is a tree but specified type was (commit)" err
+ for oid in "$(test_oid numeric)" "$(cat tree)"
+ do
+ printf "160000 commit $oid\tA\n" >in &&
+ git mktree <in >tree.actual &&
+ git ls-tree $(cat tree.actual) >actual &&
+ test_cmp in actual || return 1
+ done
'
test_expect_success 'mktree refuses to read ls-tree -r output (1)' '
--
gitgitgadget
* [PATCH v2 09/17] mktree: add a --literally option
2024-06-19 21:57 ` [PATCH v2 00/17] mktree: support more flexible usage Victoria Dye via GitGitGadget
` (7 preceding siblings ...)
2024-06-19 21:57 ` [PATCH v2 08/17] mktree.c: do not fail on mismatched submodule type Victoria Dye via GitGitGadget
@ 2024-06-19 21:57 ` Victoria Dye via GitGitGadget
2024-06-19 21:57 ` [PATCH v2 10/17] mktree: validate paths more carefully Victoria Dye via GitGitGadget
` (8 subsequent siblings)
17 siblings, 0 replies; 65+ messages in thread
From: Victoria Dye via GitGitGadget @ 2024-06-19 21:57 UTC (permalink / raw)
To: git; +Cc: Eric Sunshine, Patrick Steinhardt, Victoria Dye, Victoria Dye
From: Victoria Dye <vdye@github.com>
Add the '--literally' option to 'git mktree' to allow constructing a tree
with invalid contents. For now, the only change this represents compared to
the normal 'git mktree' behavior is no longer sorting the inputs; in later
commits, deduplication and path validation will be added to the command and
'--literally' will skip those as well.
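
Writing entries "literally" amounts to serializing each one, in input order,
as the octal mode, a space, the name, a NUL byte and the raw object ID. The
following standalone sketch shows that layout for a single entry. It is not
the patch's 'write_tree_literally()': 'append_raw_entry()' is a hypothetical
helper, and the 20-byte raw ID size assumes SHA-1.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define RAWSZ 20 /* raw SHA-1 size; an assumption for this sketch */

/*
 * Append one tree entry to 'buf' in the raw tree object layout:
 *   <octal mode> SP <name> NUL <raw object ID>
 * Returns the number of bytes written, or 0 if the entry does not fit.
 */
static size_t append_raw_entry(unsigned char *buf, size_t cap,
			       unsigned int mode, const char *name,
			       const unsigned char *raw_oid)
{
	char mode_str[16];
	int mode_len;
	size_t name_len, need;

	assert(buf && name && raw_oid);

	/* "%o" prints no leading zero, so a directory is "40000". */
	mode_len = snprintf(mode_str, sizeof(mode_str), "%o", mode);
	if (mode_len < 0 || (size_t)mode_len >= sizeof(mode_str))
		return 0;

	name_len = strlen(name);
	need = (size_t)mode_len + 1 + name_len + 1 + RAWSZ;
	if (need > cap)
		return 0;

	memcpy(buf, mode_str, (size_t)mode_len);
	buf += mode_len;
	*buf++ = ' ';
	memcpy(buf, name, name_len + 1); /* copies the terminating NUL too */
	buf += name_len + 1;
	memcpy(buf, raw_oid, RAWSZ);

	return need;
}
```

Nothing in this layout records how the entries were ordered or whether a
name is valid, which is why a writer that skips sorting and validation can
produce trees that 'git fsck' later rejects.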
Certain tests use 'git mktree' to intentionally generate corrupt trees.
Update these tests to use '--literally' so that they continue functioning
properly when additional input cleanup & validation is added to the base
command. Note that, because 'mktree --literally' does not sort entries, some
of the tests are updated to provide their inputs in tree order; otherwise,
the test would fail with an "incorrect order" error instead of the error the
test expects.
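
For reference, "tree order" compares names bytewise, except that a directory
is compared as if its name ended in '/'. The standalone sketch below shows
that rule as I understand it from git's name comparison for tree entries;
'tree_entry_cmp()' is a hypothetical name, not a function from the patch.

```c
#include <assert.h>
#include <string.h>

#define MODE_TYPE_MASK 0170000
#define MODE_DIR       0040000

static int is_dir_mode(unsigned int mode)
{
	return (mode & MODE_TYPE_MASK) == MODE_DIR;
}

/*
 * Compare two tree entries in tree order. Returns a negative value, zero
 * or a positive value, like strcmp().
 */
static int tree_entry_cmp(const char *n1, unsigned int m1,
			  const char *n2, unsigned int m2)
{
	size_t l1, l2, len;
	unsigned char c1, c2;
	int cmp;

	assert(n1 && n2);
	l1 = strlen(n1);
	l2 = strlen(n2);
	len = l1 < l2 ? l1 : l2;

	/* Compare the common prefix bytewise. */
	cmp = memcmp(n1, n2, len);
	if (cmp)
		return cmp < 0 ? -1 : 1;

	/* A directory name behaves as though it had a trailing '/'. */
	c1 = (unsigned char)n1[len];
	c2 = (unsigned char)n2[len];
	if (!c1 && is_dir_mode(m1))
		c1 = '/';
	if (!c2 && is_dir_mode(m2))
		c2 = '/';

	return (c1 > c2) - (c1 < c2);
}
```

Under this rule a blob named "a.c" sorts before a tree named "a", because
'.' is smaller than '/', so input that was sorted as plain strings is not
necessarily in tree order.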
Signed-off-by: Victoria Dye <vdye@github.com>
---
Documentation/git-mktree.txt | 9 ++++++-
builtin/mktree.c | 36 +++++++++++++++++++++++----
t/t1010-mktree.sh | 40 ++++++++++++++++++++++++++++++
t/t1014-read-tree-confusing.sh | 6 ++---
t/t1450-fsck.sh | 4 +--
t/t1601-index-bogus.sh | 2 +-
t/t1700-split-index.sh | 6 ++---
t/t7008-filter-branch-null-sha1.sh | 6 ++---
t/t7417-submodule-path-url.sh | 2 +-
t/t7450-bad-git-dotfiles.sh | 8 +++---
10 files changed, 96 insertions(+), 23 deletions(-)
diff --git a/Documentation/git-mktree.txt b/Documentation/git-mktree.txt
index c187403c6bd..5f3a6dfe38e 100644
--- a/Documentation/git-mktree.txt
+++ b/Documentation/git-mktree.txt
@@ -9,7 +9,7 @@ git-mktree - Build a tree-object from formatted tree entries
SYNOPSIS
--------
[verse]
-'git mktree' [-z] [--missing] [--batch]
+'git mktree' [-z] [--missing] [--literally] [--batch]
DESCRIPTION
-----------
@@ -28,6 +28,13 @@ OPTIONS
object. This option has no effect on the treatment of gitlink entries
(aka "submodules") which are always allowed to be missing.
+--literally::
+ Create the tree from the tree entries provided to stdin in the order
+ they are provided without performing additional sorting,
+ deduplication, or path validation on them. This option is primarily
+ useful for creating invalid tree objects to use in tests of how Git
+ deals with various forms of tree corruption.
+
--batch::
Allow building of more than one tree object before exiting. Each
tree is separated by a single blank line. The final newline is
diff --git a/builtin/mktree.c b/builtin/mktree.c
index f509ed1a81f..4ff99d44d79 100644
--- a/builtin/mktree.c
+++ b/builtin/mktree.c
@@ -48,11 +48,11 @@ static void tree_entry_array_release(struct tree_entry_array *arr, int free_entr
}
static void append_to_tree(unsigned mode, struct object_id *oid, const char *path,
- struct tree_entry_array *arr)
+ struct tree_entry_array *arr, int literally)
{
struct tree_entry *ent;
size_t len = strlen(path);
- if (strchr(path, '/'))
+ if (!literally && strchr(path, '/'))
die("path %s contains slash", path);
FLEX_ALLOC_MEM(ent, name, path, len);
@@ -91,14 +91,35 @@ static void write_tree(struct tree_entry_array *arr, struct object_id *oid)
strbuf_release(&buf);
}
+static void write_tree_literally(struct tree_entry_array *arr,
+ struct object_id *oid)
+{
+ struct strbuf buf;
+ size_t size = 0;
+
+ for (size_t i = 0; i < arr->nr; i++)
+ size += 32 + arr->entries[i]->len;
+
+ strbuf_init(&buf, size);
+ for (size_t i = 0; i < arr->nr; i++) {
+ struct tree_entry *ent = arr->entries[i];
+ strbuf_addf(&buf, "%o %s%c", ent->mode, ent->name, '\0');
+ strbuf_add(&buf, ent->oid.hash, the_hash_algo->rawsz);
+ }
+
+ write_object_file(buf.buf, buf.len, OBJ_TREE, oid);
+ strbuf_release(&buf);
+}
+
static const char *mktree_usage[] = {
- "git mktree [-z] [--missing] [--batch]",
+ "git mktree [-z] [--missing] [--literally] [--batch]",
NULL
};
struct mktree_line_data {
struct tree_entry_array *arr;
int allow_missing;
+ int literally;
};
static int mktree_line(unsigned int mode, struct object_id *oid,
@@ -145,7 +166,7 @@ static int mktree_line(unsigned int mode, struct object_id *oid,
}
}
- append_to_tree(mode, oid, path, data->arr);
+ append_to_tree(mode, oid, path, data->arr, data->literally);
return 0;
}
@@ -162,6 +183,8 @@ int cmd_mktree(int ac, const char **av, const char *prefix)
const struct option option[] = {
OPT_BOOL('z', NULL, &nul_term_line, N_("input is NUL terminated")),
OPT_BOOL(0, "missing", &mktree_line_data.allow_missing, N_("allow missing objects")),
+ OPT_BOOL(0, "literally", &mktree_line_data.literally,
+ N_("do not sort, deduplicate, or validate paths of tree entries")),
OPT_BOOL(0, "batch", &is_batch_mode, N_("allow creation of more than one tree")),
OPT_END()
};
@@ -189,7 +212,10 @@ int cmd_mktree(int ac, const char **av, const char *prefix)
*/
; /* skip creating an empty tree */
} else {
- write_tree(&arr, &oid);
+ if (mktree_line_data.literally)
+ write_tree_literally(&arr, &oid);
+ else
+ write_tree(&arr, &oid);
puts(oid_to_hex(&oid));
fflush(stdout);
}
diff --git a/t/t1010-mktree.sh b/t/t1010-mktree.sh
index 48fc532e7af..961c0c3e55e 100755
--- a/t/t1010-mktree.sh
+++ b/t/t1010-mktree.sh
@@ -129,4 +129,44 @@ test_expect_success 'mktree fails on mode mismatch' '
test_grep "object $tree_oid is a tree but specified type was (blob)" err
'
+test_expect_success '--literally can create invalid trees' '
+ tree_oid="$(cat tree)" &&
+ blob_oid="$(git rev-parse ${tree_oid}:one)" &&
+
+ # duplicate entries
+ {
+ printf "040000 tree $tree_oid\tmy-tree\n" &&
+ printf "100644 blob $blob_oid\ttest-file\n" &&
+ printf "100755 blob $blob_oid\ttest-file\n"
+ } | git mktree --literally >tree.bad &&
+ git cat-file tree $(cat tree.bad) >top.bad &&
+ test_must_fail git hash-object --stdin -t tree <top.bad 2>err &&
+ test_grep "contains duplicate file entries" err &&
+
+ # disallowed path
+ {
+ printf "100644 blob $blob_oid\t.git\n"
+ } | git mktree --literally >tree.bad &&
+ git cat-file tree $(cat tree.bad) >top.bad &&
+ test_must_fail git hash-object --stdin -t tree <top.bad 2>err &&
+ test_grep "contains ${SQ}.git${SQ}" err &&
+
+ # nested entry
+ {
+ printf "100644 blob $blob_oid\tdeeper/my-file\n"
+ } | git mktree --literally >tree.bad &&
+ git cat-file tree $(cat tree.bad) >top.bad &&
+ test_must_fail git hash-object --stdin -t tree <top.bad 2>err &&
+ test_grep "contains full pathnames" err &&
+
+ # bad entry ordering
+ {
+ printf "100644 blob $blob_oid\tB\n" &&
+ printf "040000 tree $tree_oid\tA\n"
+ } | git mktree --literally >tree.bad &&
+ git cat-file tree $(cat tree.bad) >top.bad &&
+ test_must_fail git hash-object --stdin -t tree <top.bad 2>err &&
+ test_grep "not properly sorted" err
+'
+
test_done
diff --git a/t/t1014-read-tree-confusing.sh b/t/t1014-read-tree-confusing.sh
index 8ea8d36818b..762eb789704 100755
--- a/t/t1014-read-tree-confusing.sh
+++ b/t/t1014-read-tree-confusing.sh
@@ -30,13 +30,13 @@ while read path pretty; do
esac
test_expect_success "reject $pretty at end of path" '
printf "100644 blob %s\t%s" "$blob" "$path" >tree &&
- bogus=$(git mktree <tree) &&
+ bogus=$(git mktree --literally <tree) &&
test_must_fail git read-tree $bogus
'
test_expect_success "reject $pretty as subtree" '
printf "040000 tree %s\t%s" "$tree" "$path" >tree &&
- bogus=$(git mktree <tree) &&
+ bogus=$(git mktree --literally <tree) &&
test_must_fail git read-tree $bogus
'
done <<-EOF
@@ -58,7 +58,7 @@ test_expect_success 'utf-8 paths allowed with core.protectHFS off' '
test_when_finished "git read-tree HEAD" &&
test_config core.protectHFS false &&
printf "100644 blob %s\t%s" "$blob" ".gi${u200c}t" >tree &&
- ok=$(git mktree <tree) &&
+ ok=$(git mktree --literally <tree) &&
git read-tree $ok
'
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 8a456b1142d..532d2770e88 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -316,7 +316,7 @@ check_duplicate_names () {
*) printf "100644 blob %s\t%s\n" $blob "$name" ;;
esac
done >badtree &&
- badtree=$(git mktree <badtree) &&
+ badtree=$(git mktree --literally <badtree) &&
test_must_fail git fsck 2>out &&
test_grep "$badtree" out &&
test_grep "error in tree .*contains duplicate file entries" out
@@ -614,7 +614,7 @@ while read name path pretty; do
tree=$(git rev-parse HEAD^{tree}) &&
value=$(eval "echo \$$type") &&
printf "$mode $type %s\t%s" "$value" "$path" >bad &&
- bad_tree=$(git mktree <bad) &&
+ bad_tree=$(git mktree --literally <bad) &&
git fsck 2>out &&
test_grep "warning.*tree $bad_tree" out
)'
diff --git a/t/t1601-index-bogus.sh b/t/t1601-index-bogus.sh
index 4171f1e1410..54e8ae038b7 100755
--- a/t/t1601-index-bogus.sh
+++ b/t/t1601-index-bogus.sh
@@ -4,7 +4,7 @@ test_description='test handling of bogus index entries'
. ./test-lib.sh
test_expect_success 'create tree with null sha1' '
- tree=$(printf "160000 commit $ZERO_OID\\tbroken\\n" | git mktree)
+ tree=$(printf "160000 commit $ZERO_OID\\tbroken\\n" | git mktree --literally)
'
test_expect_success 'read-tree refuses to read null sha1' '
diff --git a/t/t1700-split-index.sh b/t/t1700-split-index.sh
index ac4a5b2734c..97b58aa3cca 100755
--- a/t/t1700-split-index.sh
+++ b/t/t1700-split-index.sh
@@ -478,12 +478,12 @@ test_expect_success 'writing split index with null sha1 does not write cache tre
git config splitIndex.maxPercentChange 0 &&
git commit -m "commit" &&
{
- git ls-tree HEAD &&
- printf "160000 commit $ZERO_OID\\tbroken\\n"
+ printf "160000 commit $ZERO_OID\\tbroken\\n" &&
+ git ls-tree HEAD
} >broken-tree &&
echo "add broken entry" >msg &&
- tree=$(git mktree <broken-tree) &&
+ tree=$(git mktree --literally <broken-tree) &&
test_tick &&
commit=$(git commit-tree $tree -p HEAD <msg) &&
git update-ref HEAD "$commit" &&
diff --git a/t/t7008-filter-branch-null-sha1.sh b/t/t7008-filter-branch-null-sha1.sh
index 93fbc92b8db..a1b4c295c01 100755
--- a/t/t7008-filter-branch-null-sha1.sh
+++ b/t/t7008-filter-branch-null-sha1.sh
@@ -12,12 +12,12 @@ test_expect_success 'setup: base commits' '
test_expect_success 'setup: a commit with a bogus null sha1 in the tree' '
{
- git ls-tree HEAD &&
- printf "160000 commit $ZERO_OID\\tbroken\\n"
+ printf "160000 commit $ZERO_OID\\tbroken\\n" &&
+ git ls-tree HEAD
} >broken-tree &&
echo "add broken entry" >msg &&
- tree=$(git mktree <broken-tree) &&
+ tree=$(git mktree --literally <broken-tree) &&
test_tick &&
commit=$(git commit-tree $tree -p HEAD <msg) &&
git update-ref HEAD "$commit"
diff --git a/t/t7417-submodule-path-url.sh b/t/t7417-submodule-path-url.sh
index dbbb3853dc0..5d3c98e99a7 100755
--- a/t/t7417-submodule-path-url.sh
+++ b/t/t7417-submodule-path-url.sh
@@ -42,7 +42,7 @@ test_expect_success MINGW 'submodule paths disallows trailing spaces' '
tree=$(git -C super write-tree) &&
git -C super ls-tree $tree >tree &&
sed "s/sub/sub /" <tree >tree.new &&
- tree=$(git -C super mktree <tree.new) &&
+ tree=$(git -C super mktree --literally <tree.new) &&
commit=$(echo with space | git -C super commit-tree $tree) &&
git -C super update-ref refs/heads/main $commit &&
diff --git a/t/t7450-bad-git-dotfiles.sh b/t/t7450-bad-git-dotfiles.sh
index 4a9c22c9e2b..de2d45d2244 100755
--- a/t/t7450-bad-git-dotfiles.sh
+++ b/t/t7450-bad-git-dotfiles.sh
@@ -203,11 +203,11 @@ check_dotx_symlink () {
content=$(git hash-object -w ../.gitmodules) &&
target=$(printf "$tricky" | git hash-object -w --stdin) &&
{
- printf "100644 blob $content\t$tricky\n" &&
- printf "120000 blob $target\t$path\n"
+ printf "120000 blob $target\t$path\n" &&
+ printf "100644 blob $content\t$tricky\n"
} >bad-tree
) &&
- tree=$(git -C $dir mktree <$dir/bad-tree)
+ tree=$(git -C $dir mktree --literally <$dir/bad-tree)
'
test_expect_success "fsck detects symlinked $name ($type)" '
@@ -261,7 +261,7 @@ test_expect_success 'fsck detects non-blob .gitmodules' '
cp ../.gitmodules subdir/file &&
git add subdir/file &&
git commit -m ok &&
- git ls-tree HEAD | sed s/subdir/.gitmodules/ | git mktree &&
+ git ls-tree HEAD | sed s/subdir/.gitmodules/ | git mktree --literally &&
test_must_fail git fsck 2>output &&
test_grep gitmodulesBlob output
--
gitgitgadget
* [PATCH v2 10/17] mktree: validate paths more carefully
2024-06-19 21:57 ` [PATCH v2 00/17] mktree: support more flexible usage Victoria Dye via GitGitGadget
` (8 preceding siblings ...)
2024-06-19 21:57 ` [PATCH v2 09/17] mktree: add a --literally option Victoria Dye via GitGitGadget
@ 2024-06-19 21:57 ` Victoria Dye via GitGitGadget
2024-06-19 21:57 ` [PATCH v2 11/17] mktree: overwrite duplicate entries Victoria Dye via GitGitGadget
` (7 subsequent siblings)
17 siblings, 0 replies; 65+ messages in thread
From: Victoria Dye via GitGitGadget @ 2024-06-19 21:57 UTC (permalink / raw)
To: git; +Cc: Eric Sunshine, Patrick Steinhardt, Victoria Dye, Victoria Dye
From: Victoria Dye <vdye@github.com>
Use 'verify_path' to validate the paths provided as tree entries, ensuring
we do not create entries with paths not allowed in trees (e.g., .git). Also,
remove trailing slashes on directories before validating, allowing users to
provide 'folder-name/' as the path for a tree object entry.
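
The normalize-then-validate order can be illustrated with a standalone
sketch. It is a deliberately simplified stand-in: git's 'verify_path()'
performs many more checks (for example, filesystem-specific variants of
".git"), and 'normalize_entry_path()' is a hypothetical helper that only
rejects ".", "..", ".git" and names containing a slash.

```c
#include <assert.h>
#include <string.h>

#define MODE_TYPE_MASK 0170000
#define MODE_DIR       0040000

/*
 * Copy the normalized entry name into 'out' (capacity 'cap').
 * Returns 0 on success, -1 if the path is rejected or does not fit.
 */
static int normalize_entry_path(unsigned int mode, const char *path,
				char *out, size_t cap)
{
	size_t len;

	assert(path && out);
	len = strlen(path);

	/* Only directories may carry trailing slashes; strip them first. */
	if ((mode & MODE_TYPE_MASK) == MODE_DIR) {
		while (len > 0 && path[len - 1] == '/')
			len--;
	}
	if (!len || len >= cap)
		return -1;

	memcpy(out, path, len);
	out[len] = '\0';

	/* Validate the normalized name, not the raw input. */
	if (!strcmp(out, ".") || !strcmp(out, "..") || !strcmp(out, ".git"))
		return -1;
	if (strchr(out, '/'))
		return -1;

	return 0;
}
```

Because the slash is stripped only for directory modes, "folder1/" is
accepted for a tree entry while "test/" is still rejected for a blob.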
Signed-off-by: Victoria Dye <vdye@github.com>
---
builtin/mktree.c | 20 +++++++++++++++++---
t/t1010-mktree.sh | 33 +++++++++++++++++++++++++++++++++
2 files changed, 50 insertions(+), 3 deletions(-)
diff --git a/builtin/mktree.c b/builtin/mktree.c
index 4ff99d44d79..8f0af24b6b1 100644
--- a/builtin/mktree.c
+++ b/builtin/mktree.c
@@ -8,6 +8,7 @@
#include "hex.h"
#include "index-info.h"
#include "quote.h"
+#include "read-cache-ll.h"
#include "strbuf.h"
#include "tree.h"
#include "parse-options.h"
@@ -52,10 +53,23 @@ static void append_to_tree(unsigned mode, struct object_id *oid, const char *pat
{
struct tree_entry *ent;
size_t len = strlen(path);
- if (!literally && strchr(path, '/'))
- die("path %s contains slash", path);
- FLEX_ALLOC_MEM(ent, name, path, len);
+ if (literally) {
+ FLEX_ALLOC_MEM(ent, name, path, len);
+ } else {
+ /* Normalize and validate entry path */
+ if (S_ISDIR(mode)) {
+ while(len > 0 && is_dir_sep(path[len - 1]))
+ len--;
+ }
+ FLEX_ALLOC_MEM(ent, name, path, len);
+
+ if (!verify_path(ent->name, mode))
+ die(_("invalid path '%s'"), path);
+ if (strchr(ent->name, '/'))
+ die("path %s contains slash", path);
+ }
+
ent->mode = mode;
ent->len = len;
oidcpy(&ent->oid, oid);
diff --git a/t/t1010-mktree.sh b/t/t1010-mktree.sh
index 961c0c3e55e..7e750530455 100755
--- a/t/t1010-mktree.sh
+++ b/t/t1010-mktree.sh
@@ -169,4 +169,37 @@ test_expect_success '--literally can create invalid trees' '
test_grep "not properly sorted" err
'
+test_expect_success 'mktree validates path' '
+ tree_oid="$(cat tree)" &&
+ blob_oid="$(git rev-parse $tree_oid:a/one)" &&
+ head_oid="$(git rev-parse HEAD)" &&
+
+ # Valid: tree with or without trailing slash, blob without trailing slash
+ {
+ printf "040000 tree $tree_oid\tfolder1/\n" &&
+ printf "040000 tree $tree_oid\tfolder2\n" &&
+ printf "100644 blob $blob_oid\tfile.txt\n"
+ } | git mktree >actual &&
+
+ # Invalid: blob with trailing slash
+ printf "100644 blob $blob_oid\ttest/" |
+ test_must_fail git mktree 2>err &&
+ test_grep "invalid path ${SQ}test/${SQ}" err &&
+
+ # Invalid: dotdot
+ printf "040000 tree $tree_oid\t../" |
+ test_must_fail git mktree 2>err &&
+ test_grep "invalid path ${SQ}../${SQ}" err &&
+
+ # Invalid: dot
+ printf "040000 tree $tree_oid\t." |
+ test_must_fail git mktree 2>err &&
+ test_grep "invalid path ${SQ}.${SQ}" err &&
+
+ # Invalid: .git
+ printf "040000 tree $tree_oid\t.git/" |
+ test_must_fail git mktree 2>err &&
+ test_grep "invalid path ${SQ}.git/${SQ}" err
+'
+
test_done
--
gitgitgadget
* [PATCH v2 11/17] mktree: overwrite duplicate entries
2024-06-19 21:57 ` [PATCH v2 00/17] mktree: support more flexible usage Victoria Dye via GitGitGadget
` (9 preceding siblings ...)
2024-06-19 21:57 ` [PATCH v2 10/17] mktree: validate paths more carefully Victoria Dye via GitGitGadget
@ 2024-06-19 21:57 ` Victoria Dye via GitGitGadget
2024-06-20 22:05 ` Junio C Hamano
2024-06-19 21:58 ` [PATCH v2 12/17] mktree: create tree using an in-core index Victoria Dye via GitGitGadget
` (6 subsequent siblings)
17 siblings, 1 reply; 65+ messages in thread
From: Victoria Dye via GitGitGadget @ 2024-06-19 21:57 UTC (permalink / raw)
To: git; +Cc: Eric Sunshine, Patrick Steinhardt, Victoria Dye, Victoria Dye
From: Victoria Dye <vdye@github.com>
If multiple tree entries with the same name are provided as input to
'mktree', only write the last one to the tree. Entries are considered
duplicates if they have identical names (*not* considering mode); if a blob
and a tree with the same name are provided, only the last one will be
written to the tree. A tree with duplicate entries is invalid (per 'git
fsck'), so that condition should be avoided wherever possible.
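
The "last one wins" rule can be illustrated with a small standalone sketch. It assumes a simplified entry that carries only a name and its input position; 'struct sketch_entry', 'sketch_cmp()' and 'sketch_dedup()' are hypothetical names, and plain 'strcmp()' ordering stands in for the name comparison used by the patch. Sorting equal names by descending input position puts the latest entry first in each run, so keeping the first entry of every run keeps the last one specified.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct sketch_entry {
	const char *name;
	size_t order;	/* position in the input, filled in by sketch_dedup() */
};

/* Sort by name; among equal names, the entry given later sorts first. */
static int sketch_cmp(const void *a_, const void *b_)
{
	const struct sketch_entry *a = a_, *b = b_;
	int cmp = strcmp(a->name, b->name);

	if (cmp)
		return cmp;
	return (a->order < b->order) - (a->order > b->order);
}

/* Deduplicate in place by name (mode is ignored); returns entries kept. */
static size_t sketch_dedup(struct sketch_entry *ents, size_t nr)
{
	size_t kept = 0;

	assert(ents || !nr);
	for (size_t i = 0; i < nr; i++)
		ents[i].order = i;
	if (nr)
		qsort(ents, nr, sizeof(*ents), sketch_cmp);

	for (size_t i = 0; i < nr; i++) {
		if (kept && !strcmp(ents[kept - 1].name, ents[i].name))
			continue;	/* an entry given later was already kept */
		ents[kept++] = ents[i];
	}
	return kept;
}
```

Comparing input positions explicitly, rather than subtracting them, avoids relying on how an unsigned difference converts to 'int'.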
Signed-off-by: Victoria Dye <vdye@github.com>
---
Documentation/git-mktree.txt | 3 ++-
builtin/mktree.c | 45 ++++++++++++++++++++++++++++++++----
t/t1010-mktree.sh | 36 +++++++++++++++++++++++++++--
3 files changed, 77 insertions(+), 7 deletions(-)
diff --git a/Documentation/git-mktree.txt b/Documentation/git-mktree.txt
index 5f3a6dfe38e..cf1fd82f754 100644
--- a/Documentation/git-mktree.txt
+++ b/Documentation/git-mktree.txt
@@ -54,7 +54,8 @@ cannot be represented in a tree object. The command will fail without
writing the tree if a higher order stage is specified for any entry.
The order of the tree entries is normalized by `mktree` so pre-sorting the
-input by path is not required.
+input by path is not required. Multiple entries provided with the same path
+are deduplicated, with only the last one specified added to the tree.
GIT
---
diff --git a/builtin/mktree.c b/builtin/mktree.c
index 8f0af24b6b1..a91d3a7b028 100644
--- a/builtin/mktree.c
+++ b/builtin/mktree.c
@@ -15,6 +15,9 @@
#include "object-store-ll.h"
struct tree_entry {
+ /* Internal */
+ size_t order;
+
unsigned mode;
struct object_id oid;
int len;
@@ -74,15 +77,49 @@ static void append_to_tree(unsigned mode, struct object_id *oid, const char *pat
ent->len = len;
oidcpy(&ent->oid, oid);
+ ent->order = arr->nr;
tree_entry_array_push(arr, ent);
}
-static int ent_compare(const void *a_, const void *b_)
+static int ent_compare(const void *a_, const void *b_, void *ctx)
{
+ int cmp;
struct tree_entry *a = *(struct tree_entry **)a_;
struct tree_entry *b = *(struct tree_entry **)b_;
- return base_name_compare(a->name, a->len, a->mode,
- b->name, b->len, b->mode);
+ int ignore_mode = *((int *)ctx);
+
+ if (ignore_mode)
+ cmp = name_compare(a->name, a->len, b->name, b->len);
+ else
+ cmp = base_name_compare(a->name, a->len, a->mode,
+ b->name, b->len, b->mode);
+ return cmp ? cmp : b->order - a->order;
+}
+
+static void sort_and_dedup_tree_entry_array(struct tree_entry_array *arr)
+{
+ size_t count = arr->nr;
+ struct tree_entry *prev = NULL;
+
+ int ignore_mode = 1;
+ QSORT_S(arr->entries, arr->nr, ent_compare, &ignore_mode);
+
+ arr->nr = 0;
+ for (size_t i = 0; i < count; i++) {
+ struct tree_entry *curr = arr->entries[i];
+ if (prev &&
+ !name_compare(prev->name, prev->len,
+ curr->name, curr->len)) {
+ FREE_AND_NULL(curr);
+ } else {
+ arr->entries[arr->nr++] = curr;
+ prev = curr;
+ }
+ }
+
+ /* Sort again to order the entries for tree insertion */
+ ignore_mode = 0;
+ QSORT_S(arr->entries, arr->nr, ent_compare, &ignore_mode);
}
static void write_tree(struct tree_entry_array *arr, struct object_id *oid)
@@ -90,7 +127,7 @@ static void write_tree(struct tree_entry_array *arr, struct object_id *oid)
struct strbuf buf;
size_t size = 0;
- QSORT(arr->entries, arr->nr, ent_compare);
+ sort_and_dedup_tree_entry_array(arr);
for (size_t i = 0; i < arr->nr; i++)
size += 32 + arr->entries[i]->len;
diff --git a/t/t1010-mktree.sh b/t/t1010-mktree.sh
index 7e750530455..08760141d6f 100755
--- a/t/t1010-mktree.sh
+++ b/t/t1010-mktree.sh
@@ -6,11 +6,16 @@ TEST_PASSES_SANITIZE_LEAK=true
. ./test-lib.sh
test_expect_success setup '
- for d in a a- a0
+ for d in folder folder- folder0
do
mkdir "$d" && echo "$d/one" >"$d/one" &&
git add "$d" || return 1
done &&
+ for f in before folder.txt later
+ do
+ echo "$f" >"$f" &&
+ git add "$f" || return 1
+ done &&
echo zero >one &&
git update-index --add --info-only one &&
git write-tree --missing-ok >tree.missing &&
@@ -171,7 +176,7 @@ test_expect_success '--literally can create invalid trees' '
test_expect_success 'mktree validates path' '
tree_oid="$(cat tree)" &&
- blob_oid="$(git rev-parse $tree_oid:a/one)" &&
+ blob_oid="$(git rev-parse $tree_oid:folder.txt)" &&
head_oid="$(git rev-parse HEAD)" &&
# Valid: tree with or without trailing slash, blob without trailing slash
@@ -202,4 +207,31 @@ test_expect_success 'mktree validates path' '
test_grep "invalid path ${SQ}.git/${SQ}" err
'
+test_expect_success 'mktree with duplicate entries' '
+ tree_oid=$(cat tree) &&
+ folder_oid=$(git rev-parse ${tree_oid}:folder) &&
+ before_oid=$(git rev-parse ${tree_oid}:before) &&
+ head_oid=$(git rev-parse HEAD) &&
+
+ {
+ printf "100755 blob $before_oid\ttest\n" &&
+ printf "040000 tree $folder_oid\ttest-\n" &&
+ printf "160000 commit $head_oid\ttest.txt\n" &&
+ printf "040000 tree $folder_oid\ttest\n" &&
+ printf "100644 blob $before_oid\ttest0\n" &&
+ printf "160000 commit $head_oid\ttest-\n"
+ } >top.dup &&
+ git mktree <top.dup >tree.actual &&
+
+ {
+ printf "160000 commit $head_oid\ttest-\n" &&
+ printf "160000 commit $head_oid\ttest.txt\n" &&
+ printf "040000 tree $folder_oid\ttest\n" &&
+ printf "100644 blob $before_oid\ttest0\n"
+ } >expect &&
+ git ls-tree $(cat tree.actual) >actual &&
+
+ test_cmp expect actual
+'
+
test_done
--
gitgitgadget
* [PATCH v2 12/17] mktree: create tree using an in-core index
2024-06-19 21:57 ` [PATCH v2 00/17] mktree: support more flexible usage Victoria Dye via GitGitGadget
` (10 preceding siblings ...)
2024-06-19 21:57 ` [PATCH v2 11/17] mktree: overwrite duplicate entries Victoria Dye via GitGitGadget
@ 2024-06-19 21:58 ` Victoria Dye via GitGitGadget
2024-06-20 22:26 ` Junio C Hamano
2024-06-19 21:58 ` [PATCH v2 13/17] mktree: use iterator struct to add tree entries to index Victoria Dye via GitGitGadget
` (5 subsequent siblings)
17 siblings, 1 reply; 65+ messages in thread
From: Victoria Dye via GitGitGadget @ 2024-06-19 21:58 UTC (permalink / raw)
To: git; +Cc: Eric Sunshine, Patrick Steinhardt, Victoria Dye, Victoria Dye
From: Victoria Dye <vdye@github.com>
Rather than manually write out the contents of a tree object file, construct
an in-memory sparse index from the provided tree entries and create the tree
by writing out its corresponding cache tree.
This patch does not change the behavior of the 'mktree' command. However,
constructing the tree this way will substantially simplify future extensions
to the command's functionality, including handling deeper-than-toplevel tree
entries and applying the provided entries to an existing tree.
Signed-off-by: Victoria Dye <vdye@github.com>
---
builtin/mktree.c | 74 +++++++++++++++++++++++++++++++++++-------------
1 file changed, 55 insertions(+), 19 deletions(-)
diff --git a/builtin/mktree.c b/builtin/mktree.c
index a91d3a7b028..3ce8d3dc524 100644
--- a/builtin/mktree.c
+++ b/builtin/mktree.c
@@ -4,6 +4,7 @@
* Copyright (c) Junio C Hamano, 2006, 2009
*/
#include "builtin.h"
+#include "cache-tree.h"
#include "gettext.h"
#include "hex.h"
#include "index-info.h"
@@ -24,6 +25,11 @@ struct tree_entry {
char name[FLEX_ARRAY];
};
+static inline size_t df_path_len(size_t pathlen, unsigned int mode)
+{
+ return S_ISDIR(mode) ? pathlen - 1 : pathlen;
+}
+
struct tree_entry_array {
size_t nr, alloc;
struct tree_entry **entries;
@@ -60,17 +66,25 @@ static void append_to_tree(unsigned mode, struct object_id *oid, const char *pat
if (literally) {
FLEX_ALLOC_MEM(ent, name, path, len);
} else {
+ size_t len_to_copy = len;
+
/* Normalize and validate entry path */
if (S_ISDIR(mode)) {
- while(len > 0 && is_dir_sep(path[len - 1]))
- len--;
+ while(len_to_copy > 0 && is_dir_sep(path[len_to_copy - 1]))
+ len_to_copy--;
+ len = len_to_copy + 1; /* add space for trailing slash */
}
- FLEX_ALLOC_MEM(ent, name, path, len);
+ ent = xcalloc(1, st_add3(sizeof(struct tree_entry), len, 1));
+ memcpy(ent->name, path, len_to_copy);
if (!verify_path(ent->name, mode))
die(_("invalid path '%s'"), path);
if (strchr(ent->name, '/'))
die("path %s contains slash", path);
+
+ /* Add trailing slash to dir */
+ if (S_ISDIR(mode))
+ ent->name[len - 1] = '/';
}
ent->mode = mode;
@@ -88,11 +102,14 @@ static int ent_compare(const void *a_, const void *b_, void *ctx)
struct tree_entry *b = *(struct tree_entry **)b_;
int ignore_mode = *((int *)ctx);
- if (ignore_mode)
- cmp = name_compare(a->name, a->len, b->name, b->len);
- else
- cmp = base_name_compare(a->name, a->len, a->mode,
- b->name, b->len, b->mode);
+ size_t a_len = a->len, b_len = b->len;
+
+ if (ignore_mode) {
+ a_len = df_path_len(a_len, a->mode);
+ b_len = df_path_len(b_len, b->mode);
+ }
+
+ cmp = name_compare(a->name, a_len, b->name, b_len);
return cmp ? cmp : b->order - a->order;
}
@@ -108,8 +125,8 @@ static void sort_and_dedup_tree_entry_array(struct tree_entry_array *arr)
for (size_t i = 0; i < count; i++) {
struct tree_entry *curr = arr->entries[i];
if (prev &&
- !name_compare(prev->name, prev->len,
- curr->name, curr->len)) {
+ !name_compare(prev->name, df_path_len(prev->len, prev->mode),
+ curr->name, df_path_len(curr->len, curr->mode))) {
FREE_AND_NULL(curr);
} else {
arr->entries[arr->nr++] = curr;
@@ -122,24 +139,43 @@ static void sort_and_dedup_tree_entry_array(struct tree_entry_array *arr)
QSORT_S(arr->entries, arr->nr, ent_compare, &ignore_mode);
}
+static int add_tree_entry_to_index(struct index_state *istate,
+ struct tree_entry *ent)
+{
+ struct cache_entry *ce;
+ struct strbuf ce_name = STRBUF_INIT;
+ strbuf_add(&ce_name, ent->name, ent->len);
+
+ ce = make_cache_entry(istate, ent->mode, &ent->oid, ent->name, 0, 0);
+ if (!ce)
+ return error(_("make_cache_entry failed for path '%s'"), ent->name);
+
+ add_index_entry(istate, ce, ADD_CACHE_JUST_APPEND);
+ strbuf_release(&ce_name);
+ return 0;
+}
+
static void write_tree(struct tree_entry_array *arr, struct object_id *oid)
{
- struct strbuf buf;
- size_t size = 0;
+ struct index_state istate = INDEX_STATE_INIT(the_repository);
+ istate.sparse_index = 1;
sort_and_dedup_tree_entry_array(arr);
- for (size_t i = 0; i < arr->nr; i++)
- size += 32 + arr->entries[i]->len;
- strbuf_init(&buf, size);
+ /* Construct an in-memory index from the provided entries */
for (size_t i = 0; i < arr->nr; i++) {
struct tree_entry *ent = arr->entries[i];
- strbuf_addf(&buf, "%o %s%c", ent->mode, ent->name, '\0');
- strbuf_add(&buf, ent->oid.hash, the_hash_algo->rawsz);
+
+ if (add_tree_entry_to_index(&istate, ent))
+ die(_("failed to add tree entry '%s'"), ent->name);
}
- write_object_file(buf.buf, buf.len, OBJ_TREE, oid);
- strbuf_release(&buf);
+ /* Write out new tree */
+ if (cache_tree_update(&istate, WRITE_TREE_SILENT | WRITE_TREE_MISSING_OK))
+ die(_("failed to write tree"));
+ oidcpy(oid, &istate.cache_tree->oid);
+
+ release_index(&istate);
}
static void write_tree_literally(struct tree_entry_array *arr,
--
gitgitgadget
* [PATCH v2 13/17] mktree: use iterator struct to add tree entries to index
2024-06-19 21:57 ` [PATCH v2 00/17] mktree: support more flexible usage Victoria Dye via GitGitGadget
` (11 preceding siblings ...)
2024-06-19 21:58 ` [PATCH v2 12/17] mktree: create tree using an in-core index Victoria Dye via GitGitGadget
@ 2024-06-19 21:58 ` Victoria Dye via GitGitGadget
2024-06-26 21:10 ` Junio C Hamano
2024-06-19 21:58 ` [PATCH v2 14/17] mktree: add directory-file conflict hashmap Victoria Dye via GitGitGadget
` (4 subsequent siblings)
17 siblings, 1 reply; 65+ messages in thread
From: Victoria Dye via GitGitGadget @ 2024-06-19 21:58 UTC (permalink / raw)
To: git; +Cc: Eric Sunshine, Patrick Steinhardt, Victoria Dye, Victoria Dye
From: Victoria Dye <vdye@github.com>
Create 'struct tree_entry_iterator' to manage iteration through a 'struct
tree_entry_array'. Using an iterator allows for conditional iteration; this
functionality will be necessary in later commits when performing parallel
iteration through multiple sets of tree entries.
Signed-off-by: Victoria Dye <vdye@github.com>
---
builtin/mktree.c | 39 ++++++++++++++++++++++++++++++++++++---
1 file changed, 36 insertions(+), 3 deletions(-)
diff --git a/builtin/mktree.c b/builtin/mktree.c
index 3ce8d3dc524..344c9b9b6fe 100644
--- a/builtin/mktree.c
+++ b/builtin/mktree.c
@@ -139,6 +139,35 @@ static void sort_and_dedup_tree_entry_array(struct tree_entry_array *arr)
QSORT_S(arr->entries, arr->nr, ent_compare, &ignore_mode);
}
+struct tree_entry_iterator {
+ struct tree_entry *current;
+
+ /* private */
+ struct {
+ struct tree_entry_array *arr;
+ size_t idx;
+ } priv;
+};
+
+static void tree_entry_iterator_init(struct tree_entry_iterator *iter,
+ struct tree_entry_array *arr)
+{
+ iter->priv.arr = arr;
+ iter->priv.idx = 0;
+ iter->current = 0 < arr->nr ? arr->entries[0] : NULL;
+}
+
+/*
+ * Advance the tree entry iterator to the next entry in the array. If no
+ * entries remain, 'current' is set to NULL.
+ */
+static void tree_entry_iterator_advance(struct tree_entry_iterator *iter)
+{
+ iter->current = (iter->priv.idx + 1) < iter->priv.arr->nr
+ ? iter->priv.arr->entries[++iter->priv.idx]
+ : NULL;
+}
+
static int add_tree_entry_to_index(struct index_state *istate,
struct tree_entry *ent)
{
@@ -157,14 +186,18 @@ static int add_tree_entry_to_index(struct index_state *istate,
static void write_tree(struct tree_entry_array *arr, struct object_id *oid)
{
+ struct tree_entry_iterator iter = { NULL };
struct index_state istate = INDEX_STATE_INIT(the_repository);
istate.sparse_index = 1;
sort_and_dedup_tree_entry_array(arr);
- /* Construct an in-memory index from the provided entries */
- for (size_t i = 0; i < arr->nr; i++) {
- struct tree_entry *ent = arr->entries[i];
+ tree_entry_iterator_init(&iter, arr);
+
+ /* Construct an in-memory index from the provided entries & base tree */
+ while (iter.current) {
+ struct tree_entry *ent = iter.current;
+ tree_entry_iterator_advance(&iter);
if (add_tree_entry_to_index(&istate, ent))
die(_("failed to add tree entry '%s'"), ent->name);
--
gitgitgadget
* [PATCH v2 14/17] mktree: add directory-file conflict hashmap
2024-06-19 21:57 ` [PATCH v2 00/17] mktree: support more flexible usage Victoria Dye via GitGitGadget
` (12 preceding siblings ...)
2024-06-19 21:58 ` [PATCH v2 13/17] mktree: use iterator struct to add tree entries to index Victoria Dye via GitGitGadget
@ 2024-06-19 21:58 ` Victoria Dye via GitGitGadget
2024-06-19 21:58 ` [PATCH v2 15/17] mktree: optionally add to an existing tree Victoria Dye via GitGitGadget
` (3 subsequent siblings)
17 siblings, 0 replies; 65+ messages in thread
From: Victoria Dye via GitGitGadget @ 2024-06-19 21:58 UTC (permalink / raw)
To: git; +Cc: Eric Sunshine, Patrick Steinhardt, Victoria Dye, Victoria Dye
From: Victoria Dye <vdye@github.com>
Create a hashmap member of a 'struct tree_entry_array' that contains all of
the (de-duplicated) provided tree entries, indexed by the hash of their path
with *no* trailing slash. This hashmap will be used in a later commit to
avoid adding a file to an existing tree that has the same path as a
directory, or vice versa.
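
To illustrate why the key excludes the trailing slash, here is a minimal sketch, assuming (as in the previous patch) that directory names are stored with a trailing '/'. The names 'sketch_key_len()', 'sketch_hash()' and 'sketch_df_conflict()' are hypothetical, and FNV-1a is used only as a convenient stand-in hash; it is not a claim about what 'memhash()' computes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_IS_DIR(mode) (((mode) & 0170000u) == 0040000u)

/* Key length: a directory's stored trailing '/' is not part of the key. */
static size_t sketch_key_len(const char *name, unsigned mode)
{
	size_t len = strlen(name);

	assert(!SKETCH_IS_DIR(mode) || (len && name[len - 1] == '/'));
	return SKETCH_IS_DIR(mode) ? len - 1 : len;
}

/* FNV-1a over the key bytes (stand-in hash, chosen for the sketch only). */
static uint32_t sketch_hash(const char *name, unsigned mode)
{
	uint32_t h = 2166136261u;
	size_t len = sketch_key_len(name, mode);

	for (size_t i = 0; i < len; i++) {
		h ^= (unsigned char)name[i];
		h *= 16777619u;
	}
	return h;
}

/* Nonzero when a file and a directory (or two alike) share one key. */
static int sketch_df_conflict(const char *a, unsigned a_mode,
			      const char *b, unsigned b_mode)
{
	size_t a_len = sketch_key_len(a, a_mode);
	size_t b_len = sketch_key_len(b, b_mode);

	return a_len == b_len && !memcmp(a, b, a_len);
}
```

Because the tree 'folder/' and the blob 'folder' hash to the same bucket and compare equal, a lookup with either one finds the other, which is what a directory/file conflict check needs.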
Signed-off-by: Victoria Dye <vdye@github.com>
---
builtin/mktree.c | 38 ++++++++++++++++++++++++++++++++++++++
1 file changed, 38 insertions(+)
diff --git a/builtin/mktree.c b/builtin/mktree.c
index 344c9b9b6fe..b4d71dcdd02 100644
--- a/builtin/mktree.c
+++ b/builtin/mktree.c
@@ -16,6 +16,8 @@
#include "object-store-ll.h"
struct tree_entry {
+ struct hashmap_entry ent;
+
/* Internal */
size_t order;
@@ -33,8 +35,33 @@ static inline size_t df_path_len(size_t pathlen, unsigned int mode)
struct tree_entry_array {
size_t nr, alloc;
struct tree_entry **entries;
+
+ struct hashmap df_name_hash;
};
+static int df_name_hash_cmp(const void *cmp_data UNUSED,
+ const struct hashmap_entry *eptr,
+ const struct hashmap_entry *entry_or_key,
+ const void *keydata UNUSED)
+{
+ const struct tree_entry *e1, *e2;
+ size_t e1_len, e2_len;
+
+ e1 = container_of(eptr, const struct tree_entry, ent);
+ e2 = container_of(entry_or_key, const struct tree_entry, ent);
+
+ e1_len = df_path_len(e1->len, e1->mode);
+ e2_len = df_path_len(e2->len, e2->mode);
+
+ return e1_len != e2_len ||
+ name_compare(e1->name, e1_len, e2->name, e2_len);
+}
+
+static void tree_entry_array_init(struct tree_entry_array *arr)
+{
+ hashmap_init(&arr->df_name_hash, df_name_hash_cmp, NULL, 0);
+}
+
static void tree_entry_array_push(struct tree_entry_array *arr, struct tree_entry *ent)
{
ALLOC_GROW(arr->entries, arr->nr + 1, arr->alloc);
@@ -48,6 +75,7 @@ static void tree_entry_array_clear(struct tree_entry_array *arr, int free_entrie
FREE_AND_NULL(arr->entries[i]);
}
arr->nr = 0;
+ hashmap_clear(&arr->df_name_hash);
}
static void tree_entry_array_release(struct tree_entry_array *arr, int free_entries)
@@ -137,6 +165,14 @@ static void sort_and_dedup_tree_entry_array(struct tree_entry_array *arr)
/* Sort again to order the entries for tree insertion */
ignore_mode = 0;
QSORT_S(arr->entries, arr->nr, ent_compare, &ignore_mode);
+
+ /* Finally, initialize the directory-file conflict hash map */
+ for (size_t i = 0; i < arr->nr; i++) {
+ struct tree_entry *curr = arr->entries[i];
+ hashmap_entry_init(&curr->ent,
+ memhash(curr->name, df_path_len(curr->len, curr->mode)));
+ hashmap_put(&arr->df_name_hash, &curr->ent);
+ }
}
struct tree_entry_iterator {
@@ -311,6 +347,8 @@ int cmd_mktree(int ac, const char **av, const char *prefix)
ac = parse_options(ac, av, prefix, option, mktree_usage, 0);
+ tree_entry_array_init(&arr);
+
do {
ret = read_index_info(nul_term_line, mktree_line, &mktree_line_data, &line);
if (ret < 0)
--
gitgitgadget
* [PATCH v2 15/17] mktree: optionally add to an existing tree
2024-06-19 21:57 ` [PATCH v2 00/17] mktree: support more flexible usage Victoria Dye via GitGitGadget
` (13 preceding siblings ...)
2024-06-19 21:58 ` [PATCH v2 14/17] mktree: add directory-file conflict hashmap Victoria Dye via GitGitGadget
@ 2024-06-19 21:58 ` Victoria Dye via GitGitGadget
2024-06-26 21:23 ` Junio C Hamano
2024-06-19 21:58 ` [PATCH v2 16/17] mktree: allow deeper paths in input Victoria Dye via GitGitGadget
` (2 subsequent siblings)
17 siblings, 1 reply; 65+ messages in thread
From: Victoria Dye via GitGitGadget @ 2024-06-19 21:58 UTC (permalink / raw)
To: git; +Cc: Eric Sunshine, Patrick Steinhardt, Victoria Dye, Victoria Dye
From: Victoria Dye <vdye@github.com>
Allow users to specify a single "tree-ish" value as a positional argument.
If provided, the contents of the given tree serve as the basis for the new
tree (or trees, in --batch mode) created by 'mktree', on top of which all of
the stdin-provided tree entries are applied.
At a high level, the entries are "applied" to a base tree by iterating
through the base tree using 'read_tree' in parallel with iterating through
the sorted & deduplicated stdin entries via their iterator. That is, for
each call to the 'build_index_from_tree' callback of 'read_tree':
* If the iterator entry precedes the base tree entry, add it to the in-core
index, increment the iterator, and repeat.
* If the iterator entry has the same name as the base tree entry, add the
iterator entry to the index, increment the iterator, and return from the
callback to continue the 'read_tree' iteration.
* If the iterator entry follows the base tree entry, first check
'df_name_hash' to ensure we won't be adding an entry with the same name
later (with a different mode). If there's no directory/file conflict, add
the base tree entry to the index. In either case, return from the callback
to continue the 'read_tree' iteration.
Finally, once 'read_tree' is complete, add the remaining entries in the
iterator to the index and write out the index as a tree.
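
The three cases above amount to a merge of two sorted sequences. The following is a minimal sketch under simplifying assumptions: both inputs are flat arrays sorted with 'strcmp()', a string 'value' stands in for mode and object ID, and the directory/file conflict check via 'df_name_hash' is left out. 'struct sketch_ent' and 'sketch_apply()' are hypothetical names; the outer loop plays the role of the 'read_tree' callback being invoked once per base tree entry.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct sketch_ent {
	const char *name;
	const char *value;	/* stands in for mode + object ID */
};

/*
 * Apply 'input' (sorted, deduplicated) on top of 'base' (sorted).
 * 'out' must have room for nbase + ninput entries.
 * Returns the number of entries written.
 */
static size_t sketch_apply(const struct sketch_ent *base, size_t nbase,
			   const struct sketch_ent *input, size_t ninput,
			   struct sketch_ent *out)
{
	size_t it = 0, n = 0;

	assert(out);
	for (size_t b = 0; b < nbase; b++) {
		int replaced = 0;

		/* Emit input entries that precede or match the base entry. */
		while (it < ninput) {
			int cmp = strcmp(input[it].name, base[b].name);

			if (cmp > 0)
				break;
			out[n++] = input[it++];
			if (!cmp) {
				replaced = 1;	/* same name: input wins */
				break;
			}
		}
		if (!replaced)
			out[n++] = base[b];
	}

	/* Base exhausted: append whatever input remains. */
	while (it < ninput)
		out[n++] = input[it++];
	return n;
}
```

Applying the input { a, d, z } to the base { b, d } in this sketch yields a, b, d, z, with 'd' taken from the input.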
Signed-off-by: Victoria Dye <vdye@github.com>
---
Documentation/git-mktree.txt | 7 +-
builtin/mktree.c | 138 +++++++++++++++++++++++++++++------
t/t1010-mktree.sh | 36 +++++++++
3 files changed, 159 insertions(+), 22 deletions(-)
diff --git a/Documentation/git-mktree.txt b/Documentation/git-mktree.txt
index cf1fd82f754..260d0e0bd7b 100644
--- a/Documentation/git-mktree.txt
+++ b/Documentation/git-mktree.txt
@@ -9,7 +9,7 @@ git-mktree - Build a tree-object from formatted tree entries
SYNOPSIS
--------
[verse]
-'git mktree' [-z] [--missing] [--literally] [--batch]
+'git mktree' [-z] [--missing] [--literally] [--batch] [<tree-ish>]
DESCRIPTION
-----------
@@ -41,6 +41,11 @@ OPTIONS
optional. Note - if the `-z` option is used, lines are terminated
with NUL.
+<tree-ish>::
+ If provided, the tree entries provided in stdin are added to this
+ tree rather than a new empty one, replacing existing entries with
+ identical names. Not compatible with `--literally`.
+
INPUT FORMAT
------------
Tree entries may be specified in any of the formats compatible with the
diff --git a/builtin/mktree.c b/builtin/mktree.c
index b4d71dcdd02..96f06547a2a 100644
--- a/builtin/mktree.c
+++ b/builtin/mktree.c
@@ -12,7 +12,9 @@
#include "read-cache-ll.h"
#include "strbuf.h"
#include "tree.h"
+#include "object-name.h"
#include "parse-options.h"
+#include "pathspec.h"
#include "object-store-ll.h"
struct tree_entry {
@@ -204,47 +206,124 @@ static void tree_entry_iterator_advance(struct tree_entry_iterator *iter)
: NULL;
}
-static int add_tree_entry_to_index(struct index_state *istate,
+struct build_index_data {
+ struct tree_entry_iterator iter;
+ struct hashmap *df_name_hash;
+ struct index_state istate;
+};
+
+static int add_tree_entry_to_index(struct build_index_data *data,
struct tree_entry *ent)
{
struct cache_entry *ce;
- struct strbuf ce_name = STRBUF_INIT;
- strbuf_add(&ce_name, ent->name, ent->len);
-
- ce = make_cache_entry(istate, ent->mode, &ent->oid, ent->name, 0, 0);
+ ce = make_cache_entry(&data->istate, ent->mode, &ent->oid, ent->name, 0, 0);
if (!ce)
return error(_("make_cache_entry failed for path '%s'"), ent->name);
- add_index_entry(istate, ce, ADD_CACHE_JUST_APPEND);
- strbuf_release(&ce_name);
+ add_index_entry(&data->istate, ce, ADD_CACHE_JUST_APPEND);
return 0;
}
-static void write_tree(struct tree_entry_array *arr, struct object_id *oid)
+static int build_index_from_tree(const struct object_id *oid,
+ struct strbuf *base, const char *filename,
+ unsigned mode, void *context)
{
- struct tree_entry_iterator iter = { NULL };
- struct index_state istate = INDEX_STATE_INIT(the_repository);
- istate.sparse_index = 1;
+ int result;
+ struct tree_entry *base_tree_ent;
+ struct build_index_data *cbdata = context;
+ size_t filename_len = strlen(filename);
+ size_t path_len = S_ISDIR(mode) ? st_add3(filename_len, base->len, 1)
+ : st_add(filename_len, base->len);
+
+ /* Create a tree entry from the current entry in read_tree iteration */
+ base_tree_ent = xcalloc(1, st_add3(sizeof(struct tree_entry), path_len, 1));
+ base_tree_ent->len = path_len;
+ base_tree_ent->mode = mode;
+ oidcpy(&base_tree_ent->oid, oid);
+
+ memcpy(base_tree_ent->name, base->buf, base->len);
+ memcpy(base_tree_ent->name + base->len, filename, filename_len);
+ if (S_ISDIR(mode))
+ base_tree_ent->name[base_tree_ent->len - 1] = '/';
+
+ while (cbdata->iter.current) {
+ struct tree_entry *ent = cbdata->iter.current;
+
+ int cmp = name_compare(ent->name, ent->len,
+ base_tree_ent->name, base_tree_ent->len);
+ if (!cmp || cmp < 0) {
+ tree_entry_iterator_advance(&cbdata->iter);
+
+ if (add_tree_entry_to_index(cbdata, ent) < 0) {
+ result = error(_("failed to add tree entry '%s'"), ent->name);
+ goto cleanup_and_return;
+ }
+
+ if (!cmp) {
+ result = 0;
+ goto cleanup_and_return;
+ } else
+ continue;
+ }
+
+ break;
+ }
+
+ /*
+ * If the tree entry should be replaced with an entry with the same name
+ * (but different mode), skip it.
+ */
+ hashmap_entry_init(&base_tree_ent->ent,
+ memhash(base_tree_ent->name, df_path_len(base_tree_ent->len, base_tree_ent->mode)));
+ if (hashmap_get_entry(cbdata->df_name_hash, base_tree_ent, ent, NULL)) {
+ result = 0;
+ goto cleanup_and_return;
+ }
+
+ if (add_tree_entry_to_index(cbdata, base_tree_ent)) {
+ result = -1;
+ goto cleanup_and_return;
+ }
+
+ result = 0;
+
+cleanup_and_return:
+ FREE_AND_NULL(base_tree_ent);
+ return result;
+}
+
+static void write_tree(struct tree_entry_array *arr, struct tree *base_tree,
+ struct object_id *oid)
+{
+ struct build_index_data cbdata = { 0 };
+ struct pathspec ps = { 0 };
sort_and_dedup_tree_entry_array(arr);
- tree_entry_iterator_init(&iter, arr);
+ index_state_init(&cbdata.istate, the_repository);
+ cbdata.istate.sparse_index = 1;
+ tree_entry_iterator_init(&cbdata.iter, arr);
+ cbdata.df_name_hash = &arr->df_name_hash;
/* Construct an in-memory index from the provided entries & base tree */
- while (iter.current) {
- struct tree_entry *ent = iter.current;
- tree_entry_iterator_advance(&iter);
+ if (base_tree &&
+ read_tree(the_repository, base_tree, &ps, build_index_from_tree, &cbdata) < 0)
+ die(_("failed to create tree"));
+
+ while (cbdata.iter.current) {
+ struct tree_entry *ent = cbdata.iter.current;
+ tree_entry_iterator_advance(&cbdata.iter);
- if (add_tree_entry_to_index(&istate, ent))
+ if (add_tree_entry_to_index(&cbdata, ent))
die(_("failed to add tree entry '%s'"), ent->name);
}
/* Write out new tree */
- if (cache_tree_update(&istate, WRITE_TREE_SILENT | WRITE_TREE_MISSING_OK))
+ if (cache_tree_update(&cbdata.istate, WRITE_TREE_SILENT | WRITE_TREE_MISSING_OK))
die(_("failed to write tree"));
- oidcpy(oid, &istate.cache_tree->oid);
+ oidcpy(oid, &cbdata.istate.cache_tree->oid);
- release_index(&istate);
+ release_index(&cbdata.istate);
}
static void write_tree_literally(struct tree_entry_array *arr,
@@ -268,7 +347,7 @@ static void write_tree_literally(struct tree_entry_array *arr,
}
static const char *mktree_usage[] = {
- "git mktree [-z] [--missing] [--literally] [--batch]",
+ "git mktree [-z] [--missing] [--literally] [--batch] [<tree-ish>]",
NULL
};
@@ -334,6 +413,7 @@ int cmd_mktree(int ac, const char **av, const char *prefix)
struct tree_entry_array arr = { 0 };
struct mktree_line_data mktree_line_data = { .arr = &arr };
struct strbuf line = STRBUF_INIT;
+ struct tree *base_tree = NULL;
int ret;
const struct option option[] = {
@@ -346,6 +426,22 @@ int cmd_mktree(int ac, const char **av, const char *prefix)
};
ac = parse_options(ac, av, prefix, option, mktree_usage, 0);
+ if (ac > 1)
+ usage_with_options(mktree_usage, option);
+
+ if (ac) {
+ struct object_id base_tree_oid;
+
+ if (mktree_line_data.literally)
+ die(_("option '%s' and tree-ish cannot be used together"), "--literally");
+
+ if (repo_get_oid(the_repository, av[0], &base_tree_oid))
+ die(_("not a valid object name %s"), av[0]);
+
+ base_tree = parse_tree_indirect(&base_tree_oid);
+ if (!base_tree)
+ die(_("not a tree object: %s"), oid_to_hex(&base_tree_oid));
+ }
tree_entry_array_init(&arr);
@@ -373,7 +469,7 @@ int cmd_mktree(int ac, const char **av, const char *prefix)
if (mktree_line_data.literally)
write_tree_literally(&arr, &oid);
else
- write_tree(&arr, &oid);
+ write_tree(&arr, base_tree, &oid);
puts(oid_to_hex(&oid));
fflush(stdout);
}
diff --git a/t/t1010-mktree.sh b/t/t1010-mktree.sh
index 08760141d6f..435ac23bd50 100755
--- a/t/t1010-mktree.sh
+++ b/t/t1010-mktree.sh
@@ -234,4 +234,40 @@ test_expect_success 'mktree with duplicate entries' '
test_cmp expect actual
'
+test_expect_success 'mktree with base tree' '
+ tree_oid=$(cat tree) &&
+ folder_oid=$(git rev-parse ${tree_oid}:folder) &&
+ before_oid=$(git rev-parse ${tree_oid}:before) &&
+ head_oid=$(git rev-parse HEAD) &&
+
+ {
+ printf "040000 tree $folder_oid\ttest\n" &&
+ printf "100644 blob $before_oid\ttest.txt\n" &&
+ printf "040000 tree $folder_oid\ttest-\n" &&
+ printf "160000 commit $head_oid\ttest0\n"
+ } >top.base &&
+ git mktree <top.base >tree.base &&
+
+ {
+ printf "100755 blob $before_oid\tz\n" &&
+ printf "160000 commit $head_oid\ttest.xyz\n" &&
+ printf "040000 tree $folder_oid\ta\n" &&
+ printf "100644 blob $before_oid\ttest\n"
+ } >top.append &&
+ git mktree $(cat tree.base) <top.append >tree.actual &&
+
+ {
+ printf "040000 tree $folder_oid\ta\n" &&
+ printf "100644 blob $before_oid\ttest\n" &&
+ printf "040000 tree $folder_oid\ttest-\n" &&
+ printf "100644 blob $before_oid\ttest.txt\n" &&
+ printf "160000 commit $head_oid\ttest.xyz\n" &&
+ printf "160000 commit $head_oid\ttest0\n" &&
+ printf "100755 blob $before_oid\tz\n"
+ } >expect &&
+ git ls-tree $(cat tree.actual) >actual &&
+
+ test_cmp expect actual
+'
+
test_done
--
gitgitgadget
* [PATCH v2 16/17] mktree: allow deeper paths in input
2024-06-19 21:57 ` [PATCH v2 00/17] mktree: support more flexible usage Victoria Dye via GitGitGadget
` (14 preceding siblings ...)
2024-06-19 21:58 ` [PATCH v2 15/17] mktree: optionally add to an existing tree Victoria Dye via GitGitGadget
@ 2024-06-19 21:58 ` Victoria Dye via GitGitGadget
2024-06-27 19:29 ` Junio C Hamano
2024-06-19 21:58 ` [PATCH v2 17/17] mktree: remove entries when mode is 0 Victoria Dye via GitGitGadget
2024-06-25 23:26 ` [PATCH v2 00/17] mktree: support more flexible usage Junio C Hamano
17 siblings, 1 reply; 65+ messages in thread
From: Victoria Dye via GitGitGadget @ 2024-06-19 21:58 UTC (permalink / raw)
To: git; +Cc: Eric Sunshine, Patrick Steinhardt, Victoria Dye, Victoria Dye
From: Victoria Dye <vdye@github.com>
Update 'git mktree' to handle entries nested inside of directories (e.g.
'path/to/a/file.txt'). This functionality requires a series of changes:
* In 'sort_and_dedup_tree_entry_array()', remove entries inside of
directories that come after them in input order.
* Also in 'sort_and_dedup_tree_entry_array()', mark directories that contain
entries that come after them in input order (e.g., 'folder/' followed by
'folder/file.txt') as "need to expand".
* In 'add_tree_entry_to_index()', if a tree entry is marked as "need to
expand", recurse into it with 'read_tree_at()' & 'build_index_from_tree'.
* In 'build_index_from_tree()', if a user-specified tree entry is contained
within the current iterated entry, return 'READ_TREE_RECURSIVE' to recurse
into the iterated tree.
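The first two steps above can be pictured with a small standalone sketch. It is not the code from this patch: 'struct sketch_entry', 'prune_nested()' and the fixed-size stack are hypothetical simplifications, and the caller is assumed to have already sorted the entries by name (directories carrying a trailing slash).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for mktree's 'struct tree_entry'. */
struct sketch_entry {
	const char *name;	/* directories carry a trailing '/' */
	size_t order;		/* position in the original input */
	int is_dir;
	int expand_dir;		/* set when later input lands inside this dir */
};

/*
 * 'ents' must already be sorted by name. Compacts the array in place,
 * dropping entries overridden by a parent directory that was specified
 * later in the input, and returns the number of entries kept.
 */
static size_t prune_nested(struct sketch_entry **ents, size_t count)
{
	struct sketch_entry *stack[64];	/* currently "open" parent dirs */
	size_t depth = 0, kept = 0;

	for (size_t i = 0; i < count; i++) {
		struct sketch_entry *curr = ents[i];
		int skip = 0;

		/* Discard directories that are not a prefix of this path. */
		while (depth) {
			struct sketch_entry *parent = stack[depth - 1];

			if (strncmp(curr->name, parent->name, strlen(parent->name))) {
				depth--;
				continue;
			}
			if (parent->order > curr->order)
				skip = 1;		/* the later directory wins */
			else
				parent->expand_dir = 1;	/* entry goes inside the dir */
			break;
		}

		if (skip)
			continue;
		ents[kept++] = curr;
		if (curr->is_dir) {
			assert(depth < 64);
			stack[depth++] = curr;
		}
	}
	return kept;
}
```

With 'folder/' (order 0) followed by 'folder/file.txt' (order 1), both entries survive and the directory is flagged for expansion; with the orders swapped, only the directory is kept. Unlike the patch, the sketch does not free the dropped entries.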
Signed-off-by: Victoria Dye <vdye@github.com>
---
Documentation/git-mktree.txt | 5 ++
builtin/mktree.c | 101 ++++++++++++++++++++++++++++++---
t/t1010-mktree.sh | 107 +++++++++++++++++++++++++++++++++--
3 files changed, 200 insertions(+), 13 deletions(-)
diff --git a/Documentation/git-mktree.txt b/Documentation/git-mktree.txt
index 260d0e0bd7b..43cd9b10cc7 100644
--- a/Documentation/git-mktree.txt
+++ b/Documentation/git-mktree.txt
@@ -58,6 +58,11 @@ Higher stages represent conflicted files in an index; this information
cannot be represented in a tree object. The command will fail without
writing the tree if a higher order stage is specified for any entry.
+Entries may use full pathnames containing directory separators to specify
+entries nested within one or more directories. These entries are inserted
+into the appropriate tree in the base tree-ish if one exists. Otherwise,
+empty parent trees are created to contain the entries.
+
The order of the tree entries is normalized by `mktree` so pre-sorting the
input by path is not required. Multiple entries provided with the same path
are deduplicated, with only the last one specified added to the tree.
diff --git a/builtin/mktree.c b/builtin/mktree.c
index 96f06547a2a..74cec92a517 100644
--- a/builtin/mktree.c
+++ b/builtin/mktree.c
@@ -22,6 +22,7 @@ struct tree_entry {
/* Internal */
size_t order;
+ int expand_dir;
unsigned mode;
struct object_id oid;
@@ -39,6 +40,7 @@ struct tree_entry_array {
struct tree_entry **entries;
struct hashmap df_name_hash;
+ int has_nested_entries;
};
static int df_name_hash_cmp(const void *cmp_data UNUSED,
@@ -70,6 +72,13 @@ static void tree_entry_array_push(struct tree_entry_array *arr, struct tree_entr
arr->entries[arr->nr++] = ent;
}
+static struct tree_entry *tree_entry_array_pop(struct tree_entry_array *arr)
+{
+ if (!arr->nr)
+ return NULL;
+ return arr->entries[--arr->nr];
+}
+
static void tree_entry_array_clear(struct tree_entry_array *arr, int free_entries)
{
if (free_entries) {
@@ -109,8 +118,10 @@ static void append_to_tree(unsigned mode, struct object_id *oid, const char *pat
if (!verify_path(ent->name, mode))
die(_("invalid path '%s'"), path);
- if (strchr(ent->name, '/'))
- die("path %s contains slash", path);
+
+ /* mark has_nested_entries if needed */
+ if (!arr->has_nested_entries && strchr(ent->name, '/'))
+ arr->has_nested_entries = 1;
/* Add trailing slash to dir */
if (S_ISDIR(mode))
@@ -168,6 +179,46 @@ static void sort_and_dedup_tree_entry_array(struct tree_entry_array *arr)
ignore_mode = 0;
QSORT_S(arr->entries, arr->nr, ent_compare, &ignore_mode);
+ if (arr->has_nested_entries) {
+ struct tree_entry_array parent_dir_ents = { 0 };
+
+ count = arr->nr;
+ arr->nr = 0;
+
+ /* Remove any entries where one of its parent dirs has a higher 'order' */
+ for (size_t i = 0; i < count; i++) {
+ const char *skipped_prefix;
+ struct tree_entry *parent;
+ struct tree_entry *curr = arr->entries[i];
+ int skip_entry = 0;
+
+ while ((parent = tree_entry_array_pop(&parent_dir_ents))) {
+ if (!skip_prefix(curr->name, parent->name, &skipped_prefix))
+ continue;
+
+ /* entry in dir, so we push the parent back onto the stack */
+ tree_entry_array_push(&parent_dir_ents, parent);
+
+ if (parent->order > curr->order)
+ skip_entry = 1;
+ else
+ parent->expand_dir = 1;
+
+ break;
+ }
+
+ if (!skip_entry) {
+ arr->entries[arr->nr++] = curr;
+ if (S_ISDIR(curr->mode))
+ tree_entry_array_push(&parent_dir_ents, curr);
+ } else {
+ FREE_AND_NULL(curr);
+ }
+ }
+
+ tree_entry_array_release(&parent_dir_ents, 0);
+ }
+
/* Finally, initialize the directory-file conflict hash map */
for (size_t i = 0; i < count; i++) {
struct tree_entry *curr = arr->entries[i];
@@ -212,15 +263,40 @@ struct build_index_data {
struct index_state istate;
};
+static int build_index_from_tree(const struct object_id *oid,
+ struct strbuf *base, const char *filename,
+ unsigned mode, void *context);
+
static int add_tree_entry_to_index(struct build_index_data *data,
struct tree_entry *ent)
{
- struct cache_entry *ce;
- ce = make_cache_entry(&data->istate, ent->mode, &ent->oid, ent->name, 0, 0);
- if (!ce)
- return error(_("make_cache_entry failed for path '%s'"), ent->name);
+ if (ent->expand_dir) {
+ int ret = 0;
+ struct pathspec ps = { 0 };
+ struct tree *subtree = parse_tree_indirect(&ent->oid);
+ struct strbuf base_path = STRBUF_INIT;
+ strbuf_add(&base_path, ent->name, ent->len);
+
+ if (!subtree)
+ ret = error(_("not a tree object: %s"), oid_to_hex(&ent->oid));
+ else if (read_tree_at(the_repository, subtree, &base_path, 0, &ps,
+ build_index_from_tree, data) < 0)
+ ret = -1;
+
+ strbuf_release(&base_path);
+ if (ret)
+ return ret;
+
+ } else {
+ struct cache_entry *ce = make_cache_entry(&data->istate,
+ ent->mode, &ent->oid,
+ ent->name, 0, 0);
+ if (!ce)
+ return error(_("make_cache_entry failed for path '%s'"), ent->name);
+
+ add_index_entry(&data->istate, ce, ADD_CACHE_JUST_APPEND);
+ }
- add_index_entry(&data->istate, ce, ADD_CACHE_JUST_APPEND);
return 0;
}
@@ -247,10 +323,12 @@ static int build_index_from_tree(const struct object_id *oid,
base_tree_ent->name[base_tree_ent->len - 1] = '/';
while (cbdata->iter.current) {
+ const char *skipped_prefix;
struct tree_entry *ent = cbdata->iter.current;
+ int cmp;
- int cmp = name_compare(ent->name, ent->len,
- base_tree_ent->name, base_tree_ent->len);
+ cmp = name_compare(ent->name, ent->len,
+ base_tree_ent->name, base_tree_ent->len);
if (!cmp || cmp < 0) {
tree_entry_iterator_advance(&cbdata->iter);
@@ -264,6 +342,11 @@ static int build_index_from_tree(const struct object_id *oid,
goto cleanup_and_return;
} else
continue;
+ } else if (skip_prefix(ent->name, base_tree_ent->name, &skipped_prefix) &&
+ S_ISDIR(base_tree_ent->mode)) {
+ /* The entry is in the current traversed tree entry, so we recurse */
+ result = READ_TREE_RECURSIVE;
+ goto cleanup_and_return;
}
break;
diff --git a/t/t1010-mktree.sh b/t/t1010-mktree.sh
index 435ac23bd50..9b0e0cf302f 100755
--- a/t/t1010-mktree.sh
+++ b/t/t1010-mktree.sh
@@ -85,12 +85,21 @@ test_expect_success 'mktree with invalid submodule OIDs' '
done
'
-test_expect_success 'mktree refuses to read ls-tree -r output (1)' '
- test_must_fail git mktree <all
+test_expect_success 'mktree reads ls-tree -r output (1)' '
+ git mktree <all >actual &&
+ test_cmp tree actual
'
-test_expect_success 'mktree refuses to read ls-tree -r output (2)' '
- test_must_fail git mktree <all.withsub
+test_expect_success 'mktree reads ls-tree -r output (2)' '
+ git mktree <all.withsub >actual &&
+ test_cmp tree.withsub actual
+'
+
+test_expect_success 'mktree de-duplicates files inside directories' '
+ git ls-tree $(cat tree) >everything &&
+ cat <all >top_and_all &&
+ git mktree <top_and_all >actual &&
+ test_cmp tree actual
'
test_expect_success 'mktree fails on malformed input' '
@@ -234,6 +243,50 @@ test_expect_success 'mktree with duplicate entries' '
test_cmp expect actual
'
+test_expect_success 'mktree adds entry after nested entry' '
+ tree_oid=$(cat tree) &&
+ folder_oid=$(git rev-parse ${tree_oid}:folder) &&
+ one_oid=$(git rev-parse ${tree_oid}:folder/one) &&
+
+ {
+ printf "040000 tree $folder_oid\tearly\n" &&
+ printf "100644 blob $one_oid\tearly/one\n" &&
+ printf "100644 blob $one_oid\tlater\n" &&
+ printf "040000 tree $EMPTY_TREE\tnew-tree\n" &&
+ printf "100644 blob $one_oid\tnew-tree/one\n" &&
+ printf "100644 blob $one_oid\tzzz\n"
+ } >top.rec &&
+ git mktree <top.rec >tree.actual &&
+
+ {
+ printf "040000 tree $folder_oid\tearly\n" &&
+ printf "100644 blob $one_oid\tlater\n" &&
+ printf "040000 tree $folder_oid\tnew-tree\n" &&
+ printf "100644 blob $one_oid\tzzz\n"
+ } >expect &&
+ git ls-tree $(cat tree.actual) >actual &&
+
+ test_cmp expect actual
+'
+
+test_expect_success 'mktree inserts entries into directories' '
+ folder_oid=$(git rev-parse ${tree_oid}:folder) &&
+ one_oid=$(git rev-parse ${tree_oid}:folder/one) &&
+ blob_oid=$(git rev-parse ${tree_oid}:before) &&
+ {
+ printf "040000 tree $folder_oid\tfolder\n" &&
+ printf "100644 blob $blob_oid\tfolder/two\n"
+ } | git mktree >actual &&
+
+ {
+ printf "100644 blob $one_oid\tfolder/one\n" &&
+ printf "100644 blob $blob_oid\tfolder/two\n"
+ } >expect &&
+ git ls-tree -r $(cat actual) >actual &&
+
+ test_cmp expect actual
+'
+
test_expect_success 'mktree with base tree' '
tree_oid=$(cat tree) &&
folder_oid=$(git rev-parse ${tree_oid}:folder) &&
@@ -270,4 +323,50 @@ test_expect_success 'mktree with base tree' '
test_cmp expect actual
'
+test_expect_success 'mktree with base tree (deep)' '
+ tree_oid=$(cat tree) &&
+ folder_oid=$(git rev-parse ${tree_oid}:folder) &&
+ before_oid=$(git rev-parse ${tree_oid}:before) &&
+ folder_one_oid=$(git rev-parse ${tree_oid}:folder/one) &&
+ head_oid=$(git rev-parse HEAD) &&
+
+ {
+ printf "100755 blob $before_oid\tfolder/before\n" &&
+ printf "100644 blob $before_oid\tfolder/one.txt\n" &&
+ printf "160000 commit $head_oid\tfolder/sub\n" &&
+ printf "040000 tree $folder_oid\tfolder/one\n" &&
+ printf "040000 tree $folder_oid\tfolder/one/deeper\n"
+ } >top.append &&
+ git mktree <top.append $(cat tree) >tree.actual &&
+
+ {
+ printf "100755 blob $before_oid\tfolder/before\n" &&
+ printf "100644 blob $before_oid\tfolder/one.txt\n" &&
+ printf "100644 blob $folder_one_oid\tfolder/one/deeper/one\n" &&
+ printf "100644 blob $folder_one_oid\tfolder/one/one\n" &&
+ printf "160000 commit $head_oid\tfolder/sub\n"
+ } >expect &&
+ git ls-tree -r $(cat tree.actual) -- folder/ >actual &&
+
+ test_cmp expect actual
+'
+
+test_expect_success 'mktree fails on directory-file conflict' '
+ tree_oid="$(cat tree)" &&
+ blob_oid="$(git rev-parse $tree_oid:folder.txt)" &&
+
+ {
+ printf "100644 blob $blob_oid\ttest\n" &&
+ printf "100644 blob $blob_oid\ttest/deeper\n"
+ } |
+ test_must_fail git mktree 2>err &&
+ test_grep "You have both test and test/deeper" err &&
+
+ {
+ printf "100644 blob $blob_oid\tfolder/one/deeper/deep\n"
+ } |
+ test_must_fail git mktree $tree_oid 2>err &&
+ test_grep "You have both folder/one and folder/one/deeper/deep" err
+'
+
test_done
--
gitgitgadget
* [PATCH v2 17/17] mktree: remove entries when mode is 0
2024-06-19 21:57 ` [PATCH v2 00/17] mktree: support more flexible usage Victoria Dye via GitGitGadget
` (15 preceding siblings ...)
2024-06-19 21:58 ` [PATCH v2 16/17] mktree: allow deeper paths in input Victoria Dye via GitGitGadget
@ 2024-06-19 21:58 ` Victoria Dye via GitGitGadget
2024-06-25 23:26 ` [PATCH v2 00/17] mktree: support more flexible usage Junio C Hamano
17 siblings, 0 replies; 65+ messages in thread
From: Victoria Dye via GitGitGadget @ 2024-06-19 21:58 UTC (permalink / raw)
To: git; +Cc: Eric Sunshine, Patrick Steinhardt, Victoria Dye, Victoria Dye
From: Victoria Dye <vdye@github.com>
If tree entries are specified with a mode with value '0', remove them from
the tree instead of adding/updating them. If the mode is '0', both the
provided type string (if specified) and the object ID of the entry are
ignored.
Note that entries with mode '0' are added to the 'struct tree_entry_array'
with a trailing slash so that they are always treated like directories. This is
a bit of a hack to ensure that the removal supersedes any preceding entries
with matching names, as well as any nested inside a directory matching its
name.
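To illustrate the normalization described above, here is a minimal sketch. The helper names 'wants_trailing_slash()' and 'normalize_entry_name()' are hypothetical and only '/' is treated as a separator; the patch itself does this inline in 'append_to_tree()'.

```c
#include <assert.h>
#include <string.h>
#include <sys/stat.h>

/* Directories and zero-mode (removal) entries both get a trailing slash. */
static int wants_trailing_slash(unsigned mode)
{
	return S_ISDIR(mode) || !mode;
}

/*
 * Hypothetical helper: copy 'path' into 'out'. For directories and
 * removal entries, strip any trailing slashes and append exactly one.
 * Returns the length of the normalized name.
 */
static size_t normalize_entry_name(unsigned mode, const char *path,
				   char *out, size_t outsz)
{
	size_t len = strlen(path);

	if (wants_trailing_slash(mode))
		while (len > 0 && path[len - 1] == '/')
			len--;

	assert(len + 2 <= outsz);	/* room for '/' and the NUL */
	memcpy(out, path, len);
	if (wants_trailing_slash(mode))
		out[len++] = '/';
	out[len] = '\0';
	return len;
}
```

Under this scheme a removal entry for 'folder' is stored as 'folder/', so it sorts and dedups exactly like a directory of that name would.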
Signed-off-by: Victoria Dye <vdye@github.com>
---
Documentation/git-mktree.txt | 4 ++++
builtin/mktree.c | 16 +++++++++++----
t/t1010-mktree.sh | 38 ++++++++++++++++++++++++++++++++++++
3 files changed, 54 insertions(+), 4 deletions(-)
diff --git a/Documentation/git-mktree.txt b/Documentation/git-mktree.txt
index 43cd9b10cc7..52e6005c1d3 100644
--- a/Documentation/git-mktree.txt
+++ b/Documentation/git-mktree.txt
@@ -63,6 +63,10 @@ entries nested within one or more directories. These entries are inserted
into the appropriate tree in the base tree-ish if one exists. Otherwise,
empty parent trees are created to contain the entries.
+An entry with a mode of "0" will remove an entry of the same name from the
+base tree-ish. If no tree-ish argument is given, or the entry does not exist
+in that tree, the entry is ignored.
+
The order of the tree entries is normalized by `mktree` so pre-sorting the
input by path is not required. Multiple entries provided with the same path
are deduplicated, with only the last one specified added to the tree.
diff --git a/builtin/mktree.c b/builtin/mktree.c
index 74cec92a517..e7adcb384c8 100644
--- a/builtin/mktree.c
+++ b/builtin/mktree.c
@@ -32,7 +32,7 @@ struct tree_entry {
static inline size_t df_path_len(size_t pathlen, unsigned int mode)
{
- return S_ISDIR(mode) ? pathlen - 1 : pathlen;
+ return (S_ISDIR(mode) || !mode) ? pathlen - 1 : pathlen;
}
struct tree_entry_array {
@@ -108,7 +108,7 @@ static void append_to_tree(unsigned mode, struct object_id *oid, const char *pat
size_t len_to_copy = len;
/* Normalize and validate entry path */
- if (S_ISDIR(mode)) {
+ if (S_ISDIR(mode) || !mode) {
while(len_to_copy > 0 && is_dir_sep(path[len_to_copy - 1]))
len_to_copy--;
len = len_to_copy + 1; /* add space for trailing slash */
@@ -124,7 +124,7 @@ static void append_to_tree(unsigned mode, struct object_id *oid, const char *pat
arr->has_nested_entries = 1;
/* Add trailing slash to dir */
- if (S_ISDIR(mode))
+ if (S_ISDIR(mode) || !mode)
ent->name[len - 1] = '/';
}
@@ -209,7 +209,7 @@ static void sort_and_dedup_tree_entry_array(struct tree_entry_array *arr)
if (!skip_entry) {
arr->entries[arr->nr++] = curr;
- if (S_ISDIR(curr->mode))
+ if (S_ISDIR(curr->mode) || !curr->mode)
tree_entry_array_push(&parent_dir_ents, curr);
} else {
FREE_AND_NULL(curr);
@@ -270,6 +270,9 @@ static int build_index_from_tree(const struct object_id *oid,
static int add_tree_entry_to_index(struct build_index_data *data,
struct tree_entry *ent)
{
+ if (!ent->mode)
+ return 0;
+
if (ent->expand_dir) {
int ret = 0;
struct pathspec ps = { 0 };
@@ -450,6 +453,10 @@ static int mktree_line(unsigned int mode, struct object_id *oid,
if (stage)
die(_("path '%s' is unmerged"), path);
+ /* OID ignored for zero-mode entries; append unconditionally */
+ if (!mode)
+ goto append_entry;
+
if (obj_type != OBJ_ANY && mode_type != obj_type)
die("object type (%s) doesn't match mode type (%s)",
type_name(obj_type), type_name(mode_type));
@@ -484,6 +491,7 @@ static int mktree_line(unsigned int mode, struct object_id *oid,
}
}
+append_entry:
append_to_tree(mode, oid, path, data->arr, data->literally);
return 0;
}
diff --git a/t/t1010-mktree.sh b/t/t1010-mktree.sh
index 9b0e0cf302f..5ed4352054a 100755
--- a/t/t1010-mktree.sh
+++ b/t/t1010-mktree.sh
@@ -369,4 +369,42 @@ test_expect_success 'mktree fails on directory-file conflict' '
test_grep "You have both folder/one and folder/one/deeper/deep" err
'
+test_expect_success 'mktree with remove entries' '
+ tree_oid="$(cat tree)" &&
+ blob_oid="$(git rev-parse $tree_oid:folder.txt)" &&
+
+ {
+ printf "100644 blob $blob_oid\ttest/deeper/deep.txt\n" &&
+ printf "100644 blob $blob_oid\ttest.txt\n" &&
+ printf "100644 blob $blob_oid\texample\n" &&
+ printf "100644 blob $blob_oid\texample.a/file\n" &&
+ printf "100644 blob $blob_oid\texample.txt\n" &&
+ printf "040000 tree $tree_oid\tfolder\n" &&
+ printf "0 $ZERO_OID\tfolder\n" &&
+ printf "0 $ZERO_OID\tmissing\n"
+ } | git mktree >tree.base &&
+
+ {
+ printf "0 $ZERO_OID\texample.txt\n" &&
+ printf "0 $ZERO_OID\ttest/deeper\n"
+ } | git mktree $(cat tree.base) >tree.actual &&
+
+ {
+ printf "100644 blob $blob_oid\texample\n" &&
+ printf "100644 blob $blob_oid\texample.a/file\n" &&
+ printf "100644 blob $blob_oid\ttest.txt\n"
+ } >expect &&
+ git ls-tree -r $(cat tree.actual) >actual &&
+
+ test_cmp expect actual
+'
+
+test_expect_success 'type and oid not checked if entry mode is 0' '
+ # type and oid do not match
+ printf "0 commit $EMPTY_TREE\tfolder.txt\n" |
+ git mktree >tree.actual &&
+
+ test "$(cat tree.actual)" = $EMPTY_TREE
+'
+
test_done
--
gitgitgadget
* Re: [PATCH v2 07/17] mktree: use read_index_info to read stdin lines
2024-06-19 21:57 ` [PATCH v2 07/17] mktree: use read_index_info to read stdin lines Victoria Dye via GitGitGadget
@ 2024-06-20 20:18 ` Junio C Hamano
0 siblings, 0 replies; 65+ messages in thread
From: Junio C Hamano @ 2024-06-20 20:18 UTC (permalink / raw)
To: Victoria Dye via GitGitGadget
Cc: git, Eric Sunshine, Patrick Steinhardt, Victoria Dye
"Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes:
> +INPUT FORMAT
> +------------
> +Tree entries may be specified in any of the formats compatible with the
> +`--index-info` option to linkgit:git-update-index[1]:
> +
> +include::index-info-formats.txt[]
> +
> +Note that if the `stage` of a tree entry is given, the value must be 0.
> +Higher stages represent conflicted files in an index; this information
> +cannot be represented in a tree object. The command will fail without
> +writing the tree if a higher order stage is specified for any entry.
> +
> +The order of the tree entries is normalized by `mktree` so pre-sorting the
> +input by path is not required.
Nicely done. I was wondering how the common/shared text that was
made more generic in 04/17 would be made to fit in the new context,
and the "Note that" makes them mix very well.
The updated code is exactly as expected.
* Re: [PATCH v2 11/17] mktree: overwrite duplicate entries
2024-06-19 21:57 ` [PATCH v2 11/17] mktree: overwrite duplicate entries Victoria Dye via GitGitGadget
@ 2024-06-20 22:05 ` Junio C Hamano
0 siblings, 0 replies; 65+ messages in thread
From: Junio C Hamano @ 2024-06-20 22:05 UTC (permalink / raw)
To: Victoria Dye via GitGitGadget
Cc: git, Eric Sunshine, Patrick Steinhardt, Victoria Dye
"Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes:
> From: Victoria Dye <vdye@github.com>
>
> If multiple tree entries with the same name are provided as input to
> 'mktree', only write the last one to the tree. Entries are considered
> duplicates if they have identical names (*not* considering mode); if a blob
> and a tree with the same name are provided, only the last one will be
> written to the tree. A tree with duplicate entries is invalid (per 'git
> fsck'), so that condition should be avoided wherever possible.
The "should be avoided" in the last sentence can be satisfied
either by the callers being extra careful, or the callee ignoring
earlier entries with the same path. I do not have a strong
objection against allowing looser callers, but if that is what is
going on, perhaps
By teaching "mktree" to ignore the earlier entries for the
same path in the input, the callers can be more casual about
sending duplicate entries in order to avoid creating an
invalid tree object.
is a more honest justification for this step?
> diff --git a/Documentation/git-mktree.txt b/Documentation/git-mktree.txt
> index 5f3a6dfe38e..cf1fd82f754 100644
> --- a/Documentation/git-mktree.txt
> +++ b/Documentation/git-mktree.txt
> @@ -54,7 +54,8 @@ cannot be represented in a tree object. The command will fail without
> writing the tree if a higher order stage is specified for any entry.
>
> The order of the tree entries is normalized by `mktree` so pre-sorting the
> -input by path is not required.
> +input by path is not required. Multiple entries provided with the same path
> +are deduplicated, with only the last one specified added to the tree.
OK.
> struct tree_entry {
> + /* Internal */
> + size_t order;
> +
> unsigned mode;
> struct object_id oid;
> int len;
> @@ -74,15 +77,49 @@ static void append_to_tree(unsigned mode, struct object_id *oid, const char *pat
> ent->len = len;
> oidcpy(&ent->oid, oid);
>
> + ent->order = arr->nr;
> tree_entry_array_push(arr, ent);
> }
>
> -static int ent_compare(const void *a_, const void *b_)
> +static int ent_compare(const void *a_, const void *b_, void *ctx)
> {
> + int cmp;
> struct tree_entry *a = *(struct tree_entry **)a_;
> struct tree_entry *b = *(struct tree_entry **)b_;
> - return base_name_compare(a->name, a->len, a->mode,
> - b->name, b->len, b->mode);
> + int ignore_mode = *((int *)ctx);
> +
> + if (ignore_mode)
> + cmp = name_compare(a->name, a->len, b->name, b->len);
> + else
> + cmp = base_name_compare(a->name, a->len, a->mode,
> + b->name, b->len, b->mode);
> + return cmp ? cmp : b->order - a->order;
> +}
Having two similar functions that could go out of sync has bothered
me somewhat. We could instead do
int a_mode = ignore_mode ? 0 : a->mode;
int b_mode = ignore_mode ? 0 : b->mode;
cmp = base_name_compare(a->name, a->len, a_mode,
b->name, b->len, b_mode);
but that should be done by rewriting name_compare() in terms of
base_name_compare(), which will help more callers, not just this
one.
> +static void sort_and_dedup_tree_entry_array(struct tree_entry_array *arr)
> +{
> + size_t count = arr->nr;
> + struct tree_entry *prev = NULL;
> +
> + int ignore_mode = 1;
> + QSORT_S(arr->entries, arr->nr, ent_compare, &ignore_mode);
Swap the decl for ignore_mode and the blank line above it?
If the callback context only needs a single bit, ent_compare() could
just use the NULL-ness of ctx as "do we want to ignore mode?" bit.
> + arr->nr = 0;
> + for (size_t i = 0; i < count; i++) {
> + struct tree_entry *curr = arr->entries[i];
> + if (prev &&
> + !name_compare(prev->name, prev->len,
> + curr->name, curr->len)) {
> + FREE_AND_NULL(curr);
> + } else {
> + arr->entries[arr->nr++] = curr;
> + prev = curr;
> + }
> + }
As long as this is done for a single tree (i.e. the paths do not
have any slashes in them), this "sort them all and keep the last
one" is a good strategy.
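For readers following along, the "sort them all and keep the last one" strategy can be sketched in isolation as below. This is only an illustration: 'struct input_entry' and both function names are made up, plain strcmp() stands in for name_compare(), and the array holds values rather than the pointers used in the patch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical flat entry: a name plus its position in the input. */
struct input_entry {
	const char *name;
	size_t order;
};

/* Sort by name; among equal names, the entry given last sorts first. */
static int by_name_then_latest(const void *a_, const void *b_)
{
	const struct input_entry *a = a_, *b = b_;
	int cmp = strcmp(a->name, b->name);

	if (cmp)
		return cmp;
	return (a->order < b->order) - (a->order > b->order);
}

/* Sorts 'ents' and compacts it in place; returns the deduplicated count. */
static size_t sort_and_keep_last(struct input_entry *ents, size_t count)
{
	size_t kept = 0;

	qsort(ents, count, sizeof(*ents), by_name_then_latest);
	for (size_t i = 0; i < count; i++) {
		/* The first entry of each run of equal names is the latest. */
		if (kept && !strcmp(ents[kept - 1].name, ents[i].name))
			continue;
		ents[kept++] = ents[i];
	}
	return kept;
}
```

Because ties are broken by descending input order, the survivor of each run is always the entry that appeared last in the input.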
> + /* Sort again to order the entries for tree insertion */
> + ignore_mode = 0;
> + QSORT_S(arr->entries, arr->nr, ent_compare, &ignore_mode);
OK. We from time to time find need to do this, and I always regret
that we didn't design the sort order of paths in a tree (and in the
index) like so [*]. But that is almost 20 years too late ;-).
Looking good.
[Footnote]
* A directory entry $T should have sorted after a non-directory
entry $T but before any non-directory entry whose path has $T
as its prefix (e.g. even a blob whose path is $T + "\001" should
sort after a tree $T). That way we didn't have to worry about a
blob at ($T + '-') sorting before a tree at $T but a blob at ($T
+ '0') sorting after that tree.
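The quirk described in the footnote can be reproduced with a short sketch modeled on the logic of base_name_compare(). The function name 'tree_order_compare()' is made up for this illustration, and it takes NUL-terminated names instead of explicit lengths.

```c
#include <assert.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Compare two entries the way they are ordered within a tree object:
 * a directory name behaves as if it had a '/' appended.
 */
static int tree_order_compare(const char *name1, unsigned mode1,
			      const char *name2, unsigned mode2)
{
	size_t len1 = strlen(name1), len2 = strlen(name2);
	size_t len = len1 < len2 ? len1 : len2;
	unsigned char c1, c2;
	int cmp = memcmp(name1, name2, len);

	if (cmp)
		return cmp;
	/* Names agree on the common prefix; look at the next byte. */
	c1 = name1[len];
	c2 = name2[len];
	if (!c1 && S_ISDIR(mode1))
		c1 = '/';
	if (!c2 && S_ISDIR(mode2))
		c2 = '/';
	return (c1 < c2) ? -1 : (c1 > c2) ? 1 : 0;
}
```

Since '-' (0x2d) sorts before '/' (0x2f) and '0' (0x30) sorts after it, a blob "T-" lands before a tree "T" while a blob "T0" lands after it.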
* Re: [PATCH v2 12/17] mktree: create tree using an in-core index
2024-06-19 21:58 ` [PATCH v2 12/17] mktree: create tree using an in-core index Victoria Dye via GitGitGadget
@ 2024-06-20 22:26 ` Junio C Hamano
0 siblings, 0 replies; 65+ messages in thread
From: Junio C Hamano @ 2024-06-20 22:26 UTC (permalink / raw)
To: Victoria Dye via GitGitGadget
Cc: git, Eric Sunshine, Patrick Steinhardt, Victoria Dye
"Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes:
> @@ -60,17 +66,25 @@ static void append_to_tree(unsigned mode, struct object_id *oid, const char *pat
> if (literally) {
> FLEX_ALLOC_MEM(ent, name, path, len);
> } else {
> + size_t len_to_copy = len;
> +
> /* Normalize and validate entry path */
> if (S_ISDIR(mode)) {
> - while(len > 0 && is_dir_sep(path[len - 1]))
> - len--;
> + while(len_to_copy > 0 && is_dir_sep(path[len_to_copy - 1]))
Let's fix the style issue while at it, as we are doing other changes
in this step anyway. "while(" -> "while (".
> + len_to_copy--;
> + len = len_to_copy + 1; /* add space for trailing slash */
Do we need to do st_add() here? Perhaps not, but I just noticed the
careful use of st_add3() below, so...
> + ent = xcalloc(1, st_add3(sizeof(struct tree_entry), len, 1));
> + memcpy(ent->name, path, len_to_copy);
>
> if (!verify_path(ent->name, mode))
> die(_("invalid path '%s'"), path);
> if (strchr(ent->name, '/'))
> die("path %s contains slash", path);
> +
> + /* Add trailing slash to dir */
> + if (S_ISDIR(mode))
> + ent->name[len - 1] = '/';
OK.
> @@ -88,11 +102,14 @@ static int ent_compare(const void *a_, const void *b_, void *ctx)
> struct tree_entry *b = *(struct tree_entry **)b_;
> int ignore_mode = *((int *)ctx);
>
> - if (ignore_mode)
> - cmp = name_compare(a->name, a->len, b->name, b->len);
> - else
> - cmp = base_name_compare(a->name, a->len, a->mode,
> - b->name, b->len, b->mode);
> + size_t a_len = a->len, b_len = b->len;
> +
> + if (ignore_mode) {
> + a_len = df_path_len(a_len, a->mode);
> + b_len = df_path_len(b_len, b->mode);
> + }
> +
> + cmp = name_compare(a->name, a_len, b->name, b_len);
> return cmp ? cmp : b->order - a->order;
> }
OK, now the "mode" is sort of "encoded" already in the "name" by the
slash at the end, the way "ignore-mode" works needs to be redesigned.
If we are ignoring mode, we are dropping the trailing '/' and
otherwise we just feed the name with possible trailing '/', and the
same name_compare() can be used. OK.
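A standalone sketch of that comparison, under a few assumptions: 'name_compare_sketch()' and 'entry_name_compare()' are hypothetical names, the name comparison is a memcmp over the common length followed by a length tiebreak, and df_path_len() is shown as it stands in this step of the series (directories only).

```c
#include <assert.h>
#include <string.h>
#include <sys/stat.h>

/* Length of the name without the trailing slash that directories carry. */
static size_t df_path_len(size_t pathlen, unsigned mode)
{
	return S_ISDIR(mode) ? pathlen - 1 : pathlen;
}

/* Compare the common prefix, then let the shorter name sort first. */
static int name_compare_sketch(const char *n1, size_t l1,
			       const char *n2, size_t l2)
{
	size_t min_len = l1 < l2 ? l1 : l2;
	int cmp = memcmp(n1, n2, min_len);

	if (cmp)
		return cmp;
	return (l1 > l2) - (l1 < l2);
}

/* With 'ignore_mode' set, "folder/" (tree) and "folder" (blob) are equal. */
static int entry_name_compare(const char *a, unsigned a_mode,
			      const char *b, unsigned b_mode, int ignore_mode)
{
	size_t a_len = strlen(a), b_len = strlen(b);

	if (ignore_mode) {
		a_len = df_path_len(a_len, a_mode);
		b_len = df_path_len(b_len, b_mode);
	}
	return name_compare_sketch(a, a_len, b, b_len);
}
```

The first sort (ignore_mode set) therefore groups a blob and a tree of the same name together for dedup, while the second sort (ignore_mode unset) keeps the trailing slash and yields tree insertion order.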
> @@ -108,8 +125,8 @@ static void sort_and_dedup_tree_entry_array(struct tree_entry_array *arr)
> for (size_t i = 0; i < count; i++) {
> struct tree_entry *curr = arr->entries[i];
> if (prev &&
> - !name_compare(prev->name, prev->len,
> - curr->name, curr->len)) {
> + !name_compare(prev->name, df_path_len(prev->len, prev->mode),
> + curr->name, df_path_len(curr->len, curr->mode))) {
> FREE_AND_NULL(curr);
And here is the matching adjustment for the dedup comparison, which
makes sense.
> @@ -122,24 +139,43 @@ static void sort_and_dedup_tree_entry_array(struct tree_entry_array *arr)
> QSORT_S(arr->entries, arr->nr, ent_compare, &ignore_mode);
> }
>
> +static int add_tree_entry_to_index(struct index_state *istate,
> + struct tree_entry *ent)
> +{
> + struct cache_entry *ce;
> + struct strbuf ce_name = STRBUF_INIT;
> + strbuf_add(&ce_name, ent->name, ent->len);
> +
Perhaps swap the first statement (which is strbuf_add()) and the
blank line that ought to separate the decls and the first statement?
> + ce = make_cache_entry(istate, ent->mode, &ent->oid, ent->name, 0, 0);
> + if (!ce)
> + return error(_("make_cache_entry failed for path '%s'"), ent->name);
> +
> + add_index_entry(istate, ce, ADD_CACHE_JUST_APPEND);
> + strbuf_release(&ce_name);
> + return 0;
> +}
This is only to append; presumably the caller drives this function
out of a sorted list.
> static void write_tree(struct tree_entry_array *arr, struct object_id *oid)
> {
> + struct index_state istate = INDEX_STATE_INIT(the_repository);
> + istate.sparse_index = 1;
>
> sort_and_dedup_tree_entry_array(arr);
>
> + /* Construct an in-memory index from the provided entries */
> for (size_t i = 0; i < arr->nr; i++) {
> struct tree_entry *ent = arr->entries[i];
> +
> + if (add_tree_entry_to_index(&istate, ent))
> + die(_("failed to add tree entry '%s'"), ent->name);
> }
> + /* Write out new tree */
> + if (cache_tree_update(&istate, WRITE_TREE_SILENT | WRITE_TREE_MISSING_OK))
> + die(_("failed to write tree"));
Hmph. Are we doing any run-time verification of what we produce
(e.g., if sort_and_dedup_tree_entry_array() fails to dedup or sort
correctly due to a bug or two, would cache_tree_update() notice that
the in-core index array is fishy)? I am not suggesting to add an
unconditional "we appended to the index, so we should sort the
entries in it" step before cache_tree_update() call. It is the
opposite---if we have extra checks in cache_tree_update() to slow us
down and if we are confident that the loop that added tree entries
to the index is correct, I wonder if we can bypass such checks.
> + oidcpy(oid, &istate.cache_tree->oid);
> +
> + release_index(&istate);
> }
This is the gem of the whole series. Clever.
What is so satisfying is that it takes not that much of code to
replace the "here is a flat buffer of what the contents of a single
tree object ought to look like" with "let's build in-core index and
write it out just like write-tree would". Nice.
* Re: [PATCH v2 00/17] mktree: support more flexible usage
2024-06-19 21:57 ` [PATCH v2 00/17] mktree: support more flexible usage Victoria Dye via GitGitGadget
` (16 preceding siblings ...)
2024-06-19 21:58 ` [PATCH v2 17/17] mktree: remove entries when mode is 0 Victoria Dye via GitGitGadget
@ 2024-06-25 23:26 ` Junio C Hamano
2024-07-10 21:40 ` Junio C Hamano
17 siblings, 1 reply; 65+ messages in thread
From: Junio C Hamano @ 2024-06-25 23:26 UTC (permalink / raw)
To: git
Cc: Victoria Dye via GitGitGadget, Eric Sunshine, Patrick Steinhardt,
Victoria Dye
"Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes:
> The goal of this series is to make 'git mktree' a much more flexible and
> powerful tool for constructing arbitrary trees in memory without the use of
> an index or worktree.
I've read earlier parts of this series carefully (but didn't manage
to get to the end of it), and I saw Patrick and Eric gave reviews on
the earlier round, but otherwise this topic seems to have stalled.
https://lore.kernel.org/git/pull.1746.v2.git.1718834285.gitgitgadget@gmail.com/
Any more comments?
Otherwise, if I find time to read through 13-17/17 and do not find
anything glaringly wrong, I am planning to mark the topic ready for
'next'. Thanks.
* Re: [PATCH v2 13/17] mktree: use iterator struct to add tree entries to index
2024-06-19 21:58 ` [PATCH v2 13/17] mktree: use iterator struct to add tree entries to index Victoria Dye via GitGitGadget
@ 2024-06-26 21:10 ` Junio C Hamano
0 siblings, 0 replies; 65+ messages in thread
From: Junio C Hamano @ 2024-06-26 21:10 UTC (permalink / raw)
To: Victoria Dye via GitGitGadget
Cc: git, Eric Sunshine, Patrick Steinhardt, Victoria Dye
"Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes:
> @@ -157,14 +186,18 @@ static int add_tree_entry_to_index(struct index_state *istate,
>
> static void write_tree(struct tree_entry_array *arr, struct object_id *oid)
> {
> + struct tree_entry_iterator iter = { NULL };
> struct index_state istate = INDEX_STATE_INIT(the_repository);
> istate.sparse_index = 1;
>
> sort_and_dedup_tree_entry_array(arr);
>
> - /* Construct an in-memory index from the provided entries */
> - for (size_t i = 0; i < arr->nr; i++) {
> - struct tree_entry *ent = arr->entries[i];
> + tree_entry_iterator_init(&iter, arr);
> +
> + /* Construct an in-memory index from the provided entries & base tree */
> + while (iter.current) {
> + struct tree_entry *ent = iter.current;
> + tree_entry_iterator_advance(&iter);
>
> if (add_tree_entry_to_index(&istate, ent))
> die(_("failed to add tree entry '%s'"), ent->name);
OK, looking good.
If we make _iterator_init() and _iterator_advance() both return the
current entry, then the loop can still be like so:
for (ent = tree_entry_iterator_init(&iter, arr);
ent;
ent = tree_entry_iterator_advance(&iter)) {
... use ent ...
}
and .current does not need to be a non-private member, if we wanted
to (I am not convinced it is necessarily a better interface to make
.current private---especially if we end up needing a _peek() method
to learn its value, i.e. the value the most recent call to _init()
or _advance() returned. If we need write access to .current from
outside the iterator interface, then what I outlined above would
not be a good match).
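A self-contained sketch of such an interface follows. Everything in it is hypothetical ('struct name_iterator' walks an array of strings rather than tree entries); it only demonstrates that _init() and _advance() returning the current element is enough to drive a for loop without touching the iterator's fields.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical iterator over an array of names. */
struct name_iterator {
	const char **names;
	size_t nr;
	size_t pos;
};

/* Current element, or NULL once the iterator is exhausted. */
static const char *name_iterator_peek(const struct name_iterator *it)
{
	return it->pos < it->nr ? it->names[it->pos] : NULL;
}

static const char *name_iterator_init(struct name_iterator *it,
				      const char **names, size_t nr)
{
	it->names = names;
	it->nr = nr;
	it->pos = 0;
	return name_iterator_peek(it);
}

static const char *name_iterator_advance(struct name_iterator *it)
{
	if (it->pos < it->nr)
		it->pos++;
	return name_iterator_peek(it);
}

/* The loop shape from above, never reading 'it.pos' directly. */
static size_t count_names(const char **names, size_t nr)
{
	struct name_iterator it;
	size_t seen = 0;

	for (const char *n = name_iterator_init(&it, names, nr);
	     n;
	     n = name_iterator_advance(&it))
		seen++;
	return seen;
}
```

The separate _peek() helper is the cost mentioned above: any caller that needs the current element again after the loop header has to ask for it explicitly.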
Thanks.
* Re: [PATCH v2 15/17] mktree: optionally add to an existing tree
2024-06-19 21:58 ` [PATCH v2 15/17] mktree: optionally add to an existing tree Victoria Dye via GitGitGadget
@ 2024-06-26 21:23 ` Junio C Hamano
0 siblings, 0 replies; 65+ messages in thread
From: Junio C Hamano @ 2024-06-26 21:23 UTC (permalink / raw)
To: Victoria Dye via GitGitGadget
Cc: git, Eric Sunshine, Patrick Steinhardt, Victoria Dye
"Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes:
> From: Victoria Dye <vdye@github.com>
>
> Allow users to specify a single "tree-ish" value as a positional argument.
> If provided, the contents of the given tree serve as the basis for the new
> tree (or trees, in --batch mode) created by 'mktree', on top of which all of
> the stdin-provided tree entries are applied.
>
> At a high level, the entries are "applied" to a base tree by iterating
> through the base tree using 'read_tree' in parallel with iterating through
> the sorted & deduplicated stdin entries via their iterator. That is, for
> each call to the 'build_index_from_tree' callback of 'read_tree':
>
> * If the iterator entry precedes the base tree entry, add it to the in-core
> index, increment the iterator, and repeat.
"add it" -> "add the base tree entry"? The next bullet point
explicitly says it adds "the iterator entry", which makes it crystal
clear what is going on.
> * If the iterator entry has the same name as the base tree entry, add the
> iterator entry to the index, increment the iterator, and return from the
> callback to continue the 'read_tree' iteration.
> * If the iterator entry follows the base tree entry, first check
> 'df_name_hash' to ensure we won't be adding an entry with the same name
> later (with a different mode). If there's no directory/file conflict, add
> the base tree entry to the index. In either case, return from the callback
> to continue the 'read_tree' iteration.
IOW, we take advantage of the fact that both the iteration over the
base tree and the iteration over the sorted-and-deduped entries from
the standard input yield entries in sorted order, and do a simple
bog-standard "merge" of two lists?
We'd probably have many common pitfalls to avoid with the read-tree
walking the index and tree(s) in parallel (I still remember the pain
of maintaining the cache_bottom for the side that walks the index).
Makes me wonder if this opens a way to a future where somehow
read-tree also shares code with this new code in mktree (or vice
versa).
> Finally, once 'read_tree' is complete, add the remaining entries in the
> iterator to the index and write out the index as a tree.
Or vice versa? We may finish iterating over the entries read from
the standard input but there still are entries from the base tree
side remaining, which would need to be added to complete the index,
right?
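For illustration, the "merge" of two sorted lists discussed above, including draining whichever side still has entries at the end, could be sketched roughly as below. This is a simplified model and not the patch's code: merge_sorted() is a hypothetical name, entries are plain strings, plain strcmp() ordering is assumed (Git's tree ordering treats directories specially), and the directory/file conflict check is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Merge two name lists that are each already sorted. On equal names the
 * input entry replaces the base entry. 'out' and 'from_input' must have
 * room for base_nr + input_nr elements; from_input[k] records which side
 * out[k] came from. Returns the number of merged entries.
 */
static size_t merge_sorted(const char **base, size_t base_nr,
			   const char **input, size_t input_nr,
			   const char **out, int *from_input)
{
	size_t b = 0, i = 0, n = 0;

	assert(out && from_input);
	while (b < base_nr && i < input_nr) {
		int cmp = strcmp(input[i], base[b]);

		if (cmp < 0) {
			/* input entry precedes the base entry: take input */
			from_input[n] = 1;
			out[n++] = input[i++];
		} else if (!cmp) {
			/* same name: the input entry replaces the base entry */
			from_input[n] = 1;
			out[n++] = input[i++];
			b++;
		} else {
			/* input entry follows the base entry: keep base */
			from_input[n] = 0;
			out[n++] = base[b++];
		}
	}

	/* Drain whichever side still has entries left. */
	while (i < input_nr) {
		from_input[n] = 1;
		out[n++] = input[i++];
	}
	while (b < base_nr) {
		from_input[n] = 0;
		out[n++] = base[b++];
	}
	return n;
}
```

Merging a base of "a", "c", "e" with an input of "b", "c" exhausts the input first, and "e" is then picked up by the final drain loop for the base side.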
> +<tree-ish>::
> + If provided, the tree entries provided in stdin are added to this
> + tree rather than a new empty one, replacing existing entries with
> + identical names. Not compatible with `--literally`.
"replacing" might need a bit more clarification when we start
reading paths with multiple pathname components concatenated with
slashes. In the base tree, we may have
100644 blob 536e55524db72bd2acf175208aef4f3dfc148d42 D
and it can (indirectly) be replaced by the standard input stream
feeding entries like these
100644 blob b0517166ae2ad92f3b17638cbdee0f04b8170d99 D/a
100644 blob 495a54bc1397e2fd3177c2733baf4899b48d30bd D/b
which also leads us to compute a tree entry
040000 tree eccdce44520aa3ef4ac5ba090df53eadb01229ef D/
in the top-level tree?
The code looks good to me. Thanks.
* Re: [PATCH v2 16/17] mktree: allow deeper paths in input
2024-06-19 21:58 ` [PATCH v2 16/17] mktree: allow deeper paths in input Victoria Dye via GitGitGadget
@ 2024-06-27 19:29 ` Junio C Hamano
0 siblings, 0 replies; 65+ messages in thread
From: Junio C Hamano @ 2024-06-27 19:29 UTC (permalink / raw)
To: Victoria Dye via GitGitGadget
Cc: git, Eric Sunshine, Patrick Steinhardt, Victoria Dye
"Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes:
> From: Victoria Dye <vdye@github.com>
>
> Update 'git mktree' to handle entries nested inside of directories (e.g.
> 'path/to/a/file.txt'). This functionality requires a series of changes:
>
> * In 'sort_and_dedup_tree_entry_array()', remove entries inside of
> directories that come after them in input order.
So if you feed "folder/file.txt" and then "folder/", then
"folder/file.txt" gets removed? It is unclear offhand why that is
the right thing to do.
> * Also in 'sort_and_dedup_tree_entry_array()', mark directories that contain
> entries that come after them in input order (e.g., 'folder/' followed by
> 'folder/file.txt') as "need to expand".
Makes me wonder what happens to the object name recorded in the
input for "folder/" when something like this happens. Ideally,
adding (or replacing) "folder/file.txt" to the set of files we
collected out of the base tree and the input stream for "folder/"
and writing that "folder/" out as a tree would result in a tree
whose object name exactly matches it (and we will error out if it
does not)? Or is "need to expand" a signal that we should ignore
the object name in the input and we need to recompute it ourselves?
Again, it is unclear offhand what the "need to expand" mark is
used for.
> * In 'add_tree_entry_to_index()', if a tree entry is marked as "need to
> expand", recurse into it with 'read_tree_at()' & 'build_index_from_tree'.
> * In 'build_index_from_tree()', if a user-specified tree entry is contained
> within the current iterated entry, return 'READ_TREE_RECURSIVE' to recurse
> into the iterated tree.
Surely, no matter what we choose to do to the object name given with
"folder/" when the input stream also talks about "folder/file.txt",
we'd need to recurse into the subtree. But I think we need a higher
level description of what exactly we want to do to the multi-level
pathnames (i.e., "we want to handle them this way") before going into
the implementation details of how we do so (i.e., "hence we deal with
a multi-level pathname this way at these places in the code") in
these bullet points.
Especially, I do not quite understand what semantics the first
bullet point is trying to achieve.
> +Entries may use full pathnames containing directory separators to specify
> +entries nested within one or more directories. These entries are inserted
> +into the appropriate tree in the base tree-ish if one exists. Otherwise,
> +empty parent trees are created to contain the entries.
This still does not answer "how overlapping entries are handled?",
which is more complex than "for two exactly the same paths, the last
one wins", which is mentioned in the next paragraph.
> The order of the tree entries is normalized by `mktree` so pre-sorting the
> input by path is not required. Multiple entries provided with the same path
> are deduplicated, with only the last one specified added to the tree.
> diff --git a/builtin/mktree.c b/builtin/mktree.c
> index 96f06547a2a..74cec92a517 100644
> --- a/builtin/mktree.c
> +++ b/builtin/mktree.c
> ...
> +static struct tree_entry *tree_entry_array_pop(struct tree_entry_array *arr)
> +{
> + if (!arr->nr)
> + return NULL;
> + return arr->entries[--arr->nr];
> +}
> +
> static void tree_entry_array_clear(struct tree_entry_array *arr, int free_entries)
> {
> if (free_entries) {
> @@ -109,8 +118,10 @@ static void append_to_tree(unsigned mode, struct object_id *oid, const char *pat
>
> if (!verify_path(ent->name, mode))
> die(_("invalid path '%s'"), path);
> - if (strchr(ent->name, '/'))
> - die("path %s contains slash", path);
> +
> + /* mark has_nested_entries if needed */
> + if (!arr->has_nested_entries && strchr(ent->name, '/'))
> + arr->has_nested_entries = 1;
OK.
> @@ -168,6 +179,46 @@ static void sort_and_dedup_tree_entry_array(struct tree_entry_array *arr)
> ignore_mode = 0;
> QSORT_S(arr->entries, arr->nr, ent_compare, &ignore_mode);
We have already sorted the array twice (once before simple deduping,
once after). So we now have a sorted array of "last one won" paths
and their object names.
> + if (arr->has_nested_entries) {
We need to deal with overlapping entries if "has-nested-entries" is
true. Even though our input here is sorted, we'd still pay
attention to the original input "order", which may be different from
the order in which we find these entries in arr->entries[].
OK.
> + struct tree_entry_array parent_dir_ents = { 0 };
> +
> + count = arr->nr;
> + arr->nr = 0;
> +
> + /* Remove any entries where one of its parent dirs has a higher 'order' */
Is "has a higher order" equivalent to "appears later in the input"?
More importantly, can the reason why they need to be removed be
clarified? For simple deduping, we can say "we will make the last
one of multiple entries talking about the same path be used", and
that would be a sufficient explanation why we discard the one that
we have seen earlier and replace it with the newly seen one for the
same path. Can a similar and simple explanation be given for the
behaviour this loop tries to achieve? Is it "children, which appear
earlier in the input, of a directory, which appears later than these
children, are discarded, because the entry for the directory has a
concrete object name, and there is no point talking about individual
paths inside the directory. We know what the tree object that would
contain these child paths hashes to in the end. This is a natural
extension of 'last one wins' rule---a directory that comes later
trumps paths contained within that come earlier"?
> + for (size_t i = 0; i < count; i++) {
> + const char *skipped_prefix;
> + struct tree_entry *parent;
> + struct tree_entry *curr = arr->entries[i];
> + int skip_entry = 0;
> +
> + while ((parent = tree_entry_array_pop(&parent_dir_ents))) {
> + if (!skip_prefix(curr->name, parent->name, &skipped_prefix))
> + continue;
> +
> + /* entry in dir, so we push the parent back onto the stack */
> + tree_entry_array_push(&parent_dir_ents, parent);
> +
> + if (parent->order > curr->order)
> + skip_entry = 1;
> + else
> + parent->expand_dir = 1;
> +
> + break;
> + }
> +
> + if (!skip_entry) {
> + arr->entries[arr->nr++] = curr;
> + if (S_ISDIR(curr->mode))
> + tree_entry_array_push(&parent_dir_ents, curr);
> + } else {
> + FREE_AND_NULL(curr);
> + }
> + }
> +
> + tree_entry_array_release(&parent_dir_ents, 0);
> + }
> +
> /* Finally, initialize the directory-file conflict hash map */
> for (size_t i = 0; i < count; i++) {
> struct tree_entry *curr = arr->entries[i];
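As a reading aid for the semantics asked about above, here is a standalone sketch of "a directory that comes later trumps paths inside it that came earlier", modeled loosely on the quoted loop. Everything in it is an assumption made for illustration: struct entry, has_prefix(), prune_overridden() and MAX_DEPTH are hypothetical, directory names are assumed to end with '/', and the array is assumed to be sorted so that a directory immediately precedes its children.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_DEPTH 64 /* arbitrary nesting bound for this sketch */

/* Hypothetical, simplified entry; not the struct used in builtin/mktree.c. */
struct entry {
	const char *name; /* directories end with '/' in this sketch */
	int order;        /* position in the original input */
	int is_dir;
	int expand_dir;   /* set when a later child must be merged into the dir */
};

static int has_prefix(const char *s, const char *prefix)
{
	return !strncmp(s, prefix, strlen(prefix));
}

/*
 * Compact 'ents' in place, dropping entries whose nearest kept parent
 * directory appeared later in the input. Returns the new entry count.
 */
static size_t prune_overridden(struct entry **ents, size_t nr)
{
	struct entry *stack[MAX_DEPTH];
	size_t depth = 0, kept = 0;

	for (size_t i = 0; i < nr; i++) {
		struct entry *curr = ents[i];
		int skip = 0;

		/* Pop directories that do not contain the current entry. */
		while (depth && !has_prefix(curr->name, stack[depth - 1]->name))
			depth--;

		if (depth) {
			struct entry *parent = stack[depth - 1];

			if (parent->order > curr->order)
				skip = 1; /* directory came later: it wins */
			else
				parent->expand_dir = 1; /* child came later */
		}

		if (skip)
			continue;
		ents[kept++] = curr;
		if (curr->is_dir) {
			assert(depth < MAX_DEPTH);
			stack[depth++] = curr;
		}
	}
	return kept;
}
```

Under these assumptions, input "folder/file.txt" followed by "folder/" keeps only "folder/" with its recorded object name, while "zeta/" followed by "zeta/new.txt" keeps both and marks "zeta/" as needing expansion.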
* Re: [PATCH v2 00/17] mktree: support more flexible usage
2024-06-25 23:26 ` [PATCH v2 00/17] mktree: support more flexible usage Junio C Hamano
@ 2024-07-10 21:40 ` Junio C Hamano
0 siblings, 0 replies; 65+ messages in thread
From: Junio C Hamano @ 2024-07-10 21:40 UTC (permalink / raw)
To: git
Cc: Victoria Dye via GitGitGadget, Eric Sunshine, Patrick Steinhardt,
Victoria Dye
Junio C Hamano <gitster@pobox.com> writes:
> "Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>> The goal of this series is to make 'git mktree' a much more flexible and
>> powerful tool for constructing arbitrary trees in memory without the use of
>> an index or worktree.
>
> I've read earlier parts of this series carefully (but didn't manage
> to get to the end of it), and I saw Patrick and Eric gave reviews on
> the earlier round, but otherwise this topic seems to have stalled.
>
> https://lore.kernel.org/git/pull.1746.v2.git.1718834285.gitgitgadget@gmail.com/
>
> Any more comments?
>
> Otherwise, if I find time to read through 13-17/17 and did not find
> anything glaringly wrong, I am planning to mark the topic ready for
> 'next'. Thanks.
And I had a few review comments myself. Then the topic stalled. I
am not ready to mark the topic ready for 'next' in this state.
Thanks.
end of thread, other threads:[~2024-07-10 21:40 UTC | newest]
Thread overview: 65+ messages
2024-06-11 18:24 [PATCH 00/16] mktree: support more flexible usage Victoria Dye via GitGitGadget
2024-06-11 18:24 ` [PATCH 01/16] mktree: use OPT_BOOL Victoria Dye via GitGitGadget
2024-06-11 18:24 ` [PATCH 02/16] mktree: rename treeent to tree_entry Victoria Dye via GitGitGadget
2024-06-12 9:40 ` Patrick Steinhardt
2024-06-11 18:24 ` [PATCH 03/16] mktree: use non-static tree_entry array Victoria Dye via GitGitGadget
2024-06-11 18:45 ` Eric Sunshine
2024-06-12 9:40 ` Patrick Steinhardt
2024-06-11 18:24 ` [PATCH 04/16] update-index: generalize 'read_index_info' Victoria Dye via GitGitGadget
2024-06-11 22:45 ` Junio C Hamano
2024-06-11 18:24 ` [PATCH 05/16] index-info.c: identify empty input lines in read_index_info Victoria Dye via GitGitGadget
2024-06-11 22:52 ` Junio C Hamano
2024-06-18 17:33 ` Victoria Dye
2024-06-11 18:24 ` [PATCH 06/16] index-info.c: parse object type in provided " Victoria Dye via GitGitGadget
2024-06-12 1:54 ` Junio C Hamano
2024-06-11 18:24 ` [PATCH 07/16] mktree: use read_index_info to read stdin lines Victoria Dye via GitGitGadget
2024-06-12 2:11 ` Junio C Hamano
2024-06-12 9:40 ` Patrick Steinhardt
2024-06-12 18:35 ` Junio C Hamano
2024-06-11 18:24 ` [PATCH 08/16] mktree: add a --literally option Victoria Dye via GitGitGadget
2024-06-12 2:18 ` Junio C Hamano
2024-06-11 18:24 ` [PATCH 09/16] mktree: validate paths more carefully Victoria Dye via GitGitGadget
2024-06-12 2:26 ` Junio C Hamano
2024-06-12 19:01 ` Victoria Dye
2024-06-12 19:45 ` Junio C Hamano
2024-06-11 18:24 ` [PATCH 10/16] mktree: overwrite duplicate entries Victoria Dye via GitGitGadget
2024-06-12 9:40 ` Patrick Steinhardt
2024-06-12 18:48 ` Victoria Dye
2024-06-11 18:24 ` [PATCH 11/16] mktree: create tree using an in-core index Victoria Dye via GitGitGadget
2024-06-12 9:40 ` Patrick Steinhardt
2024-06-11 18:24 ` [PATCH 12/16] mktree: use iterator struct to add tree entries to index Victoria Dye via GitGitGadget
2024-06-12 9:40 ` Patrick Steinhardt
2024-06-13 18:38 ` Victoria Dye
2024-06-11 18:24 ` [PATCH 13/16] mktree: add directory-file conflict hashmap Victoria Dye via GitGitGadget
2024-06-11 18:24 ` [PATCH 14/16] mktree: optionally add to an existing tree Victoria Dye via GitGitGadget
2024-06-12 9:40 ` Patrick Steinhardt
2024-06-12 19:50 ` Junio C Hamano
2024-06-17 19:23 ` Victoria Dye
2024-06-11 18:24 ` [PATCH 15/16] mktree: allow deeper paths in input Victoria Dye via GitGitGadget
2024-06-11 18:24 ` [PATCH 16/16] mktree: remove entries when mode is 0 Victoria Dye via GitGitGadget
2024-06-19 21:57 ` [PATCH v2 00/17] mktree: support more flexible usage Victoria Dye via GitGitGadget
2024-06-19 21:57 ` [PATCH v2 01/17] mktree: use OPT_BOOL Victoria Dye via GitGitGadget
2024-06-19 21:57 ` [PATCH v2 02/17] mktree: rename treeent to tree_entry Victoria Dye via GitGitGadget
2024-06-19 21:57 ` [PATCH v2 03/17] mktree: use non-static tree_entry array Victoria Dye via GitGitGadget
2024-06-19 21:57 ` [PATCH v2 04/17] update-index: generalize 'read_index_info' Victoria Dye via GitGitGadget
2024-06-19 21:57 ` [PATCH v2 05/17] index-info.c: return unrecognized lines to caller Victoria Dye via GitGitGadget
2024-06-19 21:57 ` [PATCH v2 06/17] index-info.c: parse object type in provided in read_index_info Victoria Dye via GitGitGadget
2024-06-19 21:57 ` [PATCH v2 07/17] mktree: use read_index_info to read stdin lines Victoria Dye via GitGitGadget
2024-06-20 20:18 ` Junio C Hamano
2024-06-19 21:57 ` [PATCH v2 08/17] mktree.c: do not fail on mismatched submodule type Victoria Dye via GitGitGadget
2024-06-19 21:57 ` [PATCH v2 09/17] mktree: add a --literally option Victoria Dye via GitGitGadget
2024-06-19 21:57 ` [PATCH v2 10/17] mktree: validate paths more carefully Victoria Dye via GitGitGadget
2024-06-19 21:57 ` [PATCH v2 11/17] mktree: overwrite duplicate entries Victoria Dye via GitGitGadget
2024-06-20 22:05 ` Junio C Hamano
2024-06-19 21:58 ` [PATCH v2 12/17] mktree: create tree using an in-core index Victoria Dye via GitGitGadget
2024-06-20 22:26 ` Junio C Hamano
2024-06-19 21:58 ` [PATCH v2 13/17] mktree: use iterator struct to add tree entries to index Victoria Dye via GitGitGadget
2024-06-26 21:10 ` Junio C Hamano
2024-06-19 21:58 ` [PATCH v2 14/17] mktree: add directory-file conflict hashmap Victoria Dye via GitGitGadget
2024-06-19 21:58 ` [PATCH v2 15/17] mktree: optionally add to an existing tree Victoria Dye via GitGitGadget
2024-06-26 21:23 ` Junio C Hamano
2024-06-19 21:58 ` [PATCH v2 16/17] mktree: allow deeper paths in input Victoria Dye via GitGitGadget
2024-06-27 19:29 ` Junio C Hamano
2024-06-19 21:58 ` [PATCH v2 17/17] mktree: remove entries when mode is 0 Victoria Dye via GitGitGadget
2024-06-25 23:26 ` [PATCH v2 00/17] mktree: support more flexible usage Junio C Hamano
2024-07-10 21:40 ` Junio C Hamano