git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: Victoria Dye <vdye@github.com>, Victoria Dye <vdye@github.com>
Subject: [PATCH 08/16] mktree: add a --literally option
Date: Tue, 11 Jun 2024 18:24:40 +0000	[thread overview]
Message-ID: <b497dc90687a7c77a4d21c3a12fe5fa3bfdabc16.1718130288.git.gitgitgadget@gmail.com> (raw)
In-Reply-To: <pull.1746.git.1718130288.gitgitgadget@gmail.com>

From: Victoria Dye <vdye@github.com>

Add the '--literally' option to 'git mktree' to allow constructing a tree
with invalid contents. For now, the only change this represents compared to
the normal 'git mktree' behavior is no longer sorting the inputs; in later
commits, deduplicaton and path validation will be added to the command and
'--literally' will skip those as well.

Certain tests use 'git mktree' to intentionally generate corrupt trees.
Update these tests to use '--literally' so that they continue functioning
properly when additional input cleanup & validation is added to the base
command. Note that, because 'mktree --literally' does not sort entries, some
of the tests are updated to provide their inputs in tree order; otherwise,
the test would fail with an "incorrect order" error instead of the error the
test expects.

Signed-off-by: Victoria Dye <vdye@github.com>
---
 Documentation/git-mktree.txt       |  9 ++++++-
 builtin/mktree.c                   | 36 +++++++++++++++++++++++----
 t/t1010-mktree.sh                  | 40 ++++++++++++++++++++++++++++++
 t/t1014-read-tree-confusing.sh     |  6 ++---
 t/t1450-fsck.sh                    |  4 +--
 t/t1601-index-bogus.sh             |  2 +-
 t/t1700-split-index.sh             |  6 ++---
 t/t7008-filter-branch-null-sha1.sh |  6 ++---
 t/t7417-submodule-path-url.sh      |  2 +-
 t/t7450-bad-git-dotfiles.sh        |  8 +++---
 10 files changed, 96 insertions(+), 23 deletions(-)

diff --git a/Documentation/git-mktree.txt b/Documentation/git-mktree.txt
index 507682ed23e..fb07e40cef0 100644
--- a/Documentation/git-mktree.txt
+++ b/Documentation/git-mktree.txt
@@ -9,7 +9,7 @@ git-mktree - Build a tree-object from formatted tree entries
 SYNOPSIS
 --------
 [verse]
-'git mktree' [-z] [--missing] [--batch]
+'git mktree' [-z] [--missing] [--literally] [--batch]
 
 DESCRIPTION
 -----------
@@ -27,6 +27,13 @@ OPTIONS
 	object.  This option has no effect on the treatment of gitlink entries
 	(aka "submodules") which are always allowed to be missing.
 
+--literally::
+	Create the tree from the tree entries provided to stdin in the order
+	they are provided without performing additional sorting, deduplication,
+	or path validation on them. This option is primarily useful for creating
+	invalid tree objects to use in tests of how Git deals with various forms
+	of tree corruption.
+
 --batch::
 	Allow building of more than one tree object before exiting.  Each
 	tree is separated by a single blank line. The final newline is
diff --git a/builtin/mktree.c b/builtin/mktree.c
index 5530257252d..48019448c1f 100644
--- a/builtin/mktree.c
+++ b/builtin/mktree.c
@@ -45,11 +45,11 @@ static void release_tree_entry_array(struct tree_entry_array *arr)
 }
 
 static void append_to_tree(unsigned mode, struct object_id *oid, const char *path,
-			   struct tree_entry_array *arr)
+			   struct tree_entry_array *arr, int literally)
 {
 	struct tree_entry *ent;
 	size_t len = strlen(path);
-	if (strchr(path, '/'))
+	if (!literally && strchr(path, '/'))
 		die("path %s contains slash", path);
 
 	FLEX_ALLOC_MEM(ent, name, path, len);
@@ -89,14 +89,35 @@ static void write_tree(struct tree_entry_array *arr, struct object_id *oid)
 	strbuf_release(&buf);
 }
 
+static void write_tree_literally(struct tree_entry_array *arr,
+				 struct object_id *oid)
+{
+	struct strbuf buf;
+	size_t size = 0;
+
+	for (size_t i = 0; i < arr->nr; i++)
+		size += 32 + arr->entries[i]->len;
+
+	strbuf_init(&buf, size);
+	for (size_t i = 0; i < arr->nr; i++) {
+		struct tree_entry *ent = arr->entries[i];
+		strbuf_addf(&buf, "%o %s%c", ent->mode, ent->name, '\0');
+		strbuf_add(&buf, ent->oid.hash, the_hash_algo->rawsz);
+	}
+
+	write_object_file(buf.buf, buf.len, OBJ_TREE, oid);
+	strbuf_release(&buf);
+}
+
 static const char *mktree_usage[] = {
-	"git mktree [-z] [--missing] [--batch]",
+	"git mktree [-z] [--missing] [--literally] [--batch]",
 	NULL
 };
 
 struct mktree_line_data {
 	struct tree_entry_array *arr;
 	int allow_missing;
+	int literally;
 };
 
 static int mktree_line(unsigned int mode, struct object_id *oid,
@@ -136,7 +157,7 @@ static int mktree_line(unsigned int mode, struct object_id *oid,
 		    path, oid_to_hex(oid), type_name(parsed_obj_type), type_name(mode_type));
 	}
 
-	append_to_tree(mode, oid, path, data->arr);
+	append_to_tree(mode, oid, path, data->arr, data->literally);
 	return 0;
 }
 
@@ -152,6 +173,8 @@ int cmd_mktree(int ac, const char **av, const char *prefix)
 	const struct option option[] = {
 		OPT_BOOL('z', NULL, &nul_term_line, N_("input is NUL terminated")),
 		OPT_BOOL(0, "missing", &mktree_line_data.allow_missing, N_("allow missing objects")),
+		OPT_BOOL(0, "literally", &mktree_line_data.literally,
+			 N_("do not sort, deduplicate, or validate paths of tree entries")),
 		OPT_BOOL(0, "batch", &is_batch_mode, N_("allow creation of more than one tree")),
 		OPT_END()
 	};
@@ -175,7 +198,10 @@ int cmd_mktree(int ac, const char **av, const char *prefix)
 			 */
 			; /* skip creating an empty tree */
 		} else {
-			write_tree(&arr, &oid);
+			if (mktree_line_data.literally)
+				write_tree_literally(&arr, &oid);
+			else
+				write_tree(&arr, &oid);
 			puts(oid_to_hex(&oid));
 			fflush(stdout);
 		}
diff --git a/t/t1010-mktree.sh b/t/t1010-mktree.sh
index 9b2ab0c97ad..e0687cb529f 100755
--- a/t/t1010-mktree.sh
+++ b/t/t1010-mktree.sh
@@ -133,4 +133,44 @@ test_expect_success 'mktree fails on mode mismatch' '
 	grep "object $tree_oid is a tree but specified type was (blob)" err
 '
 
+test_expect_success '--literally can create invalid trees' '
+	tree_oid="$(cat tree)" &&
+	blob_oid="$(git rev-parse ${tree_oid}:one)" &&
+
+	# duplicate entries
+	{
+		printf "040000 tree $tree_oid\tmy-tree\n" &&
+		printf "100644 blob $blob_oid\ttest-file\n" &&
+		printf "100755 blob $blob_oid\ttest-file\n"
+	} | git mktree --literally >tree.bad &&
+	git cat-file tree $(cat tree.bad) >top.bad &&
+	test_must_fail git hash-object --stdin -t tree <top.bad 2>err &&
+	grep "contains duplicate file entries" err &&
+
+	# disallowed path
+	{
+		printf "100644 blob $blob_oid\t.git\n"
+	} | git mktree --literally >tree.bad &&
+	git cat-file tree $(cat tree.bad) >top.bad &&
+	test_must_fail git hash-object --stdin -t tree <top.bad 2>err &&
+	grep "contains ${SQ}.git${SQ}" err &&
+
+	# nested entry
+	{
+		printf "100644 blob $blob_oid\tdeeper/my-file\n"
+	} | git mktree --literally >tree.bad &&
+	git cat-file tree $(cat tree.bad) >top.bad &&
+	test_must_fail git hash-object --stdin -t tree <top.bad 2>err &&
+	grep "contains full pathnames" err &&
+
+	# bad entry ordering
+	{
+		printf "100644 blob $blob_oid\tB\n" &&
+		printf "040000 tree $tree_oid\tA\n"
+	} | git mktree --literally >tree.bad &&
+	git cat-file tree $(cat tree.bad) >top.bad &&
+	test_must_fail git hash-object --stdin -t tree <top.bad 2>err &&
+	grep "not properly sorted" err
+'
+
 test_done
diff --git a/t/t1014-read-tree-confusing.sh b/t/t1014-read-tree-confusing.sh
index 8ea8d36818b..762eb789704 100755
--- a/t/t1014-read-tree-confusing.sh
+++ b/t/t1014-read-tree-confusing.sh
@@ -30,13 +30,13 @@ while read path pretty; do
 	esac
 	test_expect_success "reject $pretty at end of path" '
 		printf "100644 blob %s\t%s" "$blob" "$path" >tree &&
-		bogus=$(git mktree <tree) &&
+		bogus=$(git mktree --literally <tree) &&
 		test_must_fail git read-tree $bogus
 	'
 
 	test_expect_success "reject $pretty as subtree" '
 		printf "040000 tree %s\t%s" "$tree" "$path" >tree &&
-		bogus=$(git mktree <tree) &&
+		bogus=$(git mktree --literally <tree) &&
 		test_must_fail git read-tree $bogus
 	'
 done <<-EOF
@@ -58,7 +58,7 @@ test_expect_success 'utf-8 paths allowed with core.protectHFS off' '
 	test_when_finished "git read-tree HEAD" &&
 	test_config core.protectHFS false &&
 	printf "100644 blob %s\t%s" "$blob" ".gi${u200c}t" >tree &&
-	ok=$(git mktree <tree) &&
+	ok=$(git mktree --literally <tree) &&
 	git read-tree $ok
 '
 
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 8a456b1142d..532d2770e88 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -316,7 +316,7 @@ check_duplicate_names () {
 			*)  printf "100644 blob %s\t%s\n" $blob "$name" ;;
 			esac
 		done >badtree &&
-		badtree=$(git mktree <badtree) &&
+		badtree=$(git mktree --literally <badtree) &&
 		test_must_fail git fsck 2>out &&
 		test_grep "$badtree" out &&
 		test_grep "error in tree .*contains duplicate file entries" out
@@ -614,7 +614,7 @@ while read name path pretty; do
 			tree=$(git rev-parse HEAD^{tree}) &&
 			value=$(eval "echo \$$type") &&
 			printf "$mode $type %s\t%s" "$value" "$path" >bad &&
-			bad_tree=$(git mktree <bad) &&
+			bad_tree=$(git mktree --literally <bad) &&
 			git fsck 2>out &&
 			test_grep "warning.*tree $bad_tree" out
 		)'
diff --git a/t/t1601-index-bogus.sh b/t/t1601-index-bogus.sh
index 4171f1e1410..54e8ae038b7 100755
--- a/t/t1601-index-bogus.sh
+++ b/t/t1601-index-bogus.sh
@@ -4,7 +4,7 @@ test_description='test handling of bogus index entries'
 . ./test-lib.sh
 
 test_expect_success 'create tree with null sha1' '
-	tree=$(printf "160000 commit $ZERO_OID\\tbroken\\n" | git mktree)
+	tree=$(printf "160000 commit $ZERO_OID\\tbroken\\n" | git mktree --literally)
 '
 
 test_expect_success 'read-tree refuses to read null sha1' '
diff --git a/t/t1700-split-index.sh b/t/t1700-split-index.sh
index ac4a5b2734c..97b58aa3cca 100755
--- a/t/t1700-split-index.sh
+++ b/t/t1700-split-index.sh
@@ -478,12 +478,12 @@ test_expect_success 'writing split index with null sha1 does not write cache tre
 	git config splitIndex.maxPercentChange 0 &&
 	git commit -m "commit" &&
 	{
-		git ls-tree HEAD &&
-		printf "160000 commit $ZERO_OID\\tbroken\\n"
+		printf "160000 commit $ZERO_OID\\tbroken\\n" &&
+		git ls-tree HEAD
 	} >broken-tree &&
 	echo "add broken entry" >msg &&
 
-	tree=$(git mktree <broken-tree) &&
+	tree=$(git mktree --literally <broken-tree) &&
 	test_tick &&
 	commit=$(git commit-tree $tree -p HEAD <msg) &&
 	git update-ref HEAD "$commit" &&
diff --git a/t/t7008-filter-branch-null-sha1.sh b/t/t7008-filter-branch-null-sha1.sh
index 93fbc92b8db..a1b4c295c01 100755
--- a/t/t7008-filter-branch-null-sha1.sh
+++ b/t/t7008-filter-branch-null-sha1.sh
@@ -12,12 +12,12 @@ test_expect_success 'setup: base commits' '
 
 test_expect_success 'setup: a commit with a bogus null sha1 in the tree' '
 	{
-		git ls-tree HEAD &&
-		printf "160000 commit $ZERO_OID\\tbroken\\n"
+		printf "160000 commit $ZERO_OID\\tbroken\\n" &&
+		git ls-tree HEAD
 	} >broken-tree &&
 	echo "add broken entry" >msg &&
 
-	tree=$(git mktree <broken-tree) &&
+	tree=$(git mktree --literally <broken-tree) &&
 	test_tick &&
 	commit=$(git commit-tree $tree -p HEAD <msg) &&
 	git update-ref HEAD "$commit"
diff --git a/t/t7417-submodule-path-url.sh b/t/t7417-submodule-path-url.sh
index dbbb3853dc0..5d3c98e99a7 100755
--- a/t/t7417-submodule-path-url.sh
+++ b/t/t7417-submodule-path-url.sh
@@ -42,7 +42,7 @@ test_expect_success MINGW 'submodule paths disallows trailing spaces' '
 	tree=$(git -C super write-tree) &&
 	git -C super ls-tree $tree >tree &&
 	sed "s/sub/sub /" <tree >tree.new &&
-	tree=$(git -C super mktree <tree.new) &&
+	tree=$(git -C super mktree --literally <tree.new) &&
 	commit=$(echo with space | git -C super commit-tree $tree) &&
 	git -C super update-ref refs/heads/main $commit &&
 
diff --git a/t/t7450-bad-git-dotfiles.sh b/t/t7450-bad-git-dotfiles.sh
index 4a9c22c9e2b..de2d45d2244 100755
--- a/t/t7450-bad-git-dotfiles.sh
+++ b/t/t7450-bad-git-dotfiles.sh
@@ -203,11 +203,11 @@ check_dotx_symlink () {
 			content=$(git hash-object -w ../.gitmodules) &&
 			target=$(printf "$tricky" | git hash-object -w --stdin) &&
 			{
-				printf "100644 blob $content\t$tricky\n" &&
-				printf "120000 blob $target\t$path\n"
+				printf "120000 blob $target\t$path\n" &&
+				printf "100644 blob $content\t$tricky\n"
 			} >bad-tree
 		) &&
-		tree=$(git -C $dir mktree <$dir/bad-tree)
+		tree=$(git -C $dir mktree --literally <$dir/bad-tree)
 	'
 
 	test_expect_success "fsck detects symlinked $name ($type)" '
@@ -261,7 +261,7 @@ test_expect_success 'fsck detects non-blob .gitmodules' '
 		cp ../.gitmodules subdir/file &&
 		git add subdir/file &&
 		git commit -m ok &&
-		git ls-tree HEAD | sed s/subdir/.gitmodules/ | git mktree &&
+		git ls-tree HEAD | sed s/subdir/.gitmodules/ | git mktree --literally &&
 
 		test_must_fail git fsck 2>output &&
 		test_grep gitmodulesBlob output
-- 
gitgitgadget


  parent reply	other threads:[~2024-06-11 18:24 UTC|newest]

Thread overview: 65+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-06-11 18:24 [PATCH 00/16] mktree: support more flexible usage Victoria Dye via GitGitGadget
2024-06-11 18:24 ` [PATCH 01/16] mktree: use OPT_BOOL Victoria Dye via GitGitGadget
2024-06-11 18:24 ` [PATCH 02/16] mktree: rename treeent to tree_entry Victoria Dye via GitGitGadget
2024-06-12  9:40   ` Patrick Steinhardt
2024-06-11 18:24 ` [PATCH 03/16] mktree: use non-static tree_entry array Victoria Dye via GitGitGadget
2024-06-11 18:45   ` Eric Sunshine
2024-06-12  9:40   ` Patrick Steinhardt
2024-06-11 18:24 ` [PATCH 04/16] update-index: generalize 'read_index_info' Victoria Dye via GitGitGadget
2024-06-11 22:45   ` Junio C Hamano
2024-06-11 18:24 ` [PATCH 05/16] index-info.c: identify empty input lines in read_index_info Victoria Dye via GitGitGadget
2024-06-11 22:52   ` Junio C Hamano
2024-06-18 17:33     ` Victoria Dye
2024-06-11 18:24 ` [PATCH 06/16] index-info.c: parse object type in provided " Victoria Dye via GitGitGadget
2024-06-12  1:54   ` Junio C Hamano
2024-06-11 18:24 ` [PATCH 07/16] mktree: use read_index_info to read stdin lines Victoria Dye via GitGitGadget
2024-06-12  2:11   ` Junio C Hamano
2024-06-12  9:40   ` Patrick Steinhardt
2024-06-12 18:35     ` Junio C Hamano
2024-06-11 18:24 ` Victoria Dye via GitGitGadget [this message]
2024-06-12  2:18   ` [PATCH 08/16] mktree: add a --literally option Junio C Hamano
2024-06-11 18:24 ` [PATCH 09/16] mktree: validate paths more carefully Victoria Dye via GitGitGadget
2024-06-12  2:26   ` Junio C Hamano
2024-06-12 19:01     ` Victoria Dye
2024-06-12 19:45       ` Junio C Hamano
2024-06-11 18:24 ` [PATCH 10/16] mktree: overwrite duplicate entries Victoria Dye via GitGitGadget
2024-06-12  9:40   ` Patrick Steinhardt
2024-06-12 18:48     ` Victoria Dye
2024-06-11 18:24 ` [PATCH 11/16] mktree: create tree using an in-core index Victoria Dye via GitGitGadget
2024-06-12  9:40   ` Patrick Steinhardt
2024-06-11 18:24 ` [PATCH 12/16] mktree: use iterator struct to add tree entries to index Victoria Dye via GitGitGadget
2024-06-12  9:40   ` Patrick Steinhardt
2024-06-13 18:38     ` Victoria Dye
2024-06-11 18:24 ` [PATCH 13/16] mktree: add directory-file conflict hashmap Victoria Dye via GitGitGadget
2024-06-11 18:24 ` [PATCH 14/16] mktree: optionally add to an existing tree Victoria Dye via GitGitGadget
2024-06-12  9:40   ` Patrick Steinhardt
2024-06-12 19:50     ` Junio C Hamano
2024-06-17 19:23     ` Victoria Dye
2024-06-11 18:24 ` [PATCH 15/16] mktree: allow deeper paths in input Victoria Dye via GitGitGadget
2024-06-11 18:24 ` [PATCH 16/16] mktree: remove entries when mode is 0 Victoria Dye via GitGitGadget
2024-06-19 21:57 ` [PATCH v2 00/17] mktree: support more flexible usage Victoria Dye via GitGitGadget
2024-06-19 21:57   ` [PATCH v2 01/17] mktree: use OPT_BOOL Victoria Dye via GitGitGadget
2024-06-19 21:57   ` [PATCH v2 02/17] mktree: rename treeent to tree_entry Victoria Dye via GitGitGadget
2024-06-19 21:57   ` [PATCH v2 03/17] mktree: use non-static tree_entry array Victoria Dye via GitGitGadget
2024-06-19 21:57   ` [PATCH v2 04/17] update-index: generalize 'read_index_info' Victoria Dye via GitGitGadget
2024-06-19 21:57   ` [PATCH v2 05/17] index-info.c: return unrecognized lines to caller Victoria Dye via GitGitGadget
2024-06-19 21:57   ` [PATCH v2 06/17] index-info.c: parse object type in provided in read_index_info Victoria Dye via GitGitGadget
2024-06-19 21:57   ` [PATCH v2 07/17] mktree: use read_index_info to read stdin lines Victoria Dye via GitGitGadget
2024-06-20 20:18     ` Junio C Hamano
2024-06-19 21:57   ` [PATCH v2 08/17] mktree.c: do not fail on mismatched submodule type Victoria Dye via GitGitGadget
2024-06-19 21:57   ` [PATCH v2 09/17] mktree: add a --literally option Victoria Dye via GitGitGadget
2024-06-19 21:57   ` [PATCH v2 10/17] mktree: validate paths more carefully Victoria Dye via GitGitGadget
2024-06-19 21:57   ` [PATCH v2 11/17] mktree: overwrite duplicate entries Victoria Dye via GitGitGadget
2024-06-20 22:05     ` Junio C Hamano
2024-06-19 21:58   ` [PATCH v2 12/17] mktree: create tree using an in-core index Victoria Dye via GitGitGadget
2024-06-20 22:26     ` Junio C Hamano
2024-06-19 21:58   ` [PATCH v2 13/17] mktree: use iterator struct to add tree entries to index Victoria Dye via GitGitGadget
2024-06-26 21:10     ` Junio C Hamano
2024-06-19 21:58   ` [PATCH v2 14/17] mktree: add directory-file conflict hashmap Victoria Dye via GitGitGadget
2024-06-19 21:58   ` [PATCH v2 15/17] mktree: optionally add to an existing tree Victoria Dye via GitGitGadget
2024-06-26 21:23     ` Junio C Hamano
2024-06-19 21:58   ` [PATCH v2 16/17] mktree: allow deeper paths in input Victoria Dye via GitGitGadget
2024-06-27 19:29     ` Junio C Hamano
2024-06-19 21:58   ` [PATCH v2 17/17] mktree: remove entries when mode is 0 Victoria Dye via GitGitGadget
2024-06-25 23:26   ` [PATCH v2 00/17] mktree: support more flexible usage Junio C Hamano
2024-07-10 21:40     ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=b497dc90687a7c77a4d21c3a12fe5fa3bfdabc16.1718130288.git.gitgitgadget@gmail.com \
    --to=gitgitgadget@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=vdye@github.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).