All of lore.kernel.org
 help / color / mirror / Atom feed
From: Karthik Nayak <karthik.188@gmail.com>
To: karthik.188@gmail.com
Cc: git@vger.kernel.org, gitster@pobox.com, me@ttaylorr.com
Subject: [PATCH v3] revision: add `--ignore-missing-links` user option
Date: Fri, 15 Sep 2023 10:34:15 +0200	[thread overview]
Message-ID: <20230915083415.263187-1-knayak@gitlab.com> (raw)
In-Reply-To: <20230912155820.136111-1-karthik.188@gmail.com>

From: Karthik Nayak <karthik.188@gmail.com>

The revision backend is used by multiple porcelain commands such as
git-rev-list(1) and git-log(1). The backend currently supports ignoring
missing links by setting the `ignore_missing_links` bit. This allows the
revision walk to skip any objects links which are missing. Expose this
bit via an `--ignore-missing-links` user option.

A scenario where this option would be used is to find the boundary
objects between different object directories. Consider a repository with
a main object directory (GIT_OBJECT_DIRECTORY) and one or more alternate
object directories (GIT_ALTERNATE_OBJECT_DIRECTORIES). In such a
repository, enabling this option along with the `--boundary` option
while disabling the alternate object directory allows us to find the
boundary objects between the main and alternate object directory.

Helped-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---

Changes since v2:
- Refactored the tests thanks to Taylor! 

Range diff against version 2:

 1:  e3f4d85732 ! 1:  a08f3637a0 revision: add `--ignore-missing-links` user option
    @@ Commit message
         while disabling the alternate object directory allows us to find the
         boundary objects between the main and alternate object directory.
     
    +    Helped-by: Taylor Blau <me@ttaylorr.com>
         Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
     
      ## Documentation/rev-list-options.txt ##
    @@ t/t6022-rev-list-alternates.sh (new)
     +# We create 5 commits and move them to the alt directory and
     +# create 5 more commits which will stay in the main odb.
     +test_expect_success 'create repository and alternate directory' '
    -+	git init main &&
    -+	test_commit_bulk -C main 5 &&
    -+	BOUNDARY_COMMIT=$(git -C main rev-parse HEAD) &&
    -+	mkdir alt &&
    -+	mv main/.git/objects/* alt &&
    -+	GIT_ALTERNATE_OBJECT_DIRECTORIES=$PWD/alt test_commit_bulk --start=6 -C main 5
    ++	test_commit_bulk 5 &&
    ++	git clone --reference=. --shared . alt &&
    ++	test_commit_bulk --start=6 -C alt 5
     +'
     +
     +# when the alternate odb is provided, all commits are listed along with the boundary
     +# commit.
     +test_expect_success 'rev-list passes with alternate object directory' '
    -+	GIT_ALTERNATE_OBJECT_DIRECTORIES=$PWD/alt git -C main rev-list HEAD >actual &&
    -+	test_stdout_line_count = 10 cat actual &&
    -+	grep $BOUNDARY_COMMIT actual
    ++	git -C alt rev-list --all --objects --no-object-names >actual.raw &&
    ++	{
    ++		git rev-list --all --objects --no-object-names &&
    ++		git -C alt rev-list --all --objects --no-object-names --not \
    ++			--alternate-refs
    ++	} >expect.raw &&
    ++	sort actual.raw >actual &&
    ++	sort expect.raw >expect &&
    ++	test_cmp expect actual
     +'
     +
    ++alt=alt/.git/objects/info/alternates
    ++
    ++hide_alternates () {
    ++	test -f "$alt.bak" || mv "$alt" "$alt.bak"
    ++}
    ++
    ++show_alternates () {
    ++	test -f "$alt" || mv "$alt.bak" "$alt"
    ++}
    ++
     +# When the alternate odb is not provided, rev-list fails since the 5th commit's
     +# parent is not present in the main odb.
     +test_expect_success 'rev-list fails without alternate object directory' '
    -+	test_must_fail git -C main rev-list HEAD
    ++	hide_alternates &&
    ++	test_must_fail git -C alt rev-list HEAD
     +'
     +
     +# With `--ignore-missing-links`, we stop the traversal when we encounter a
     +# missing link. The boundary commit is not listed as we haven't used the
     +# `--boundary` options.
     +test_expect_success 'rev-list only prints main odb commits with --ignore-missing-links' '
    -+	git -C main rev-list --ignore-missing-links HEAD >actual &&
    -+	test_stdout_line_count = 5 cat actual &&
    -+	! grep -$BOUNDARY_COMMIT actual
    ++	hide_alternates &&
    ++
    ++	git -C alt rev-list --objects --no-object-names \
    ++		--ignore-missing-links HEAD >actual.raw &&
    ++	git -C alt cat-file  --batch-check="%(objectname)" \
    ++		--batch-all-objects >expect.raw &&
    ++
    ++	sort actual.raw >actual &&
    ++	sort expect.raw >expect &&
    ++	test_must_fail git -C alt rev-list HEAD
     +'
     +
     +# With `--ignore-missing-links` and `--boundary`, we can even print those boundary
     +# commits.
     +test_expect_success 'rev-list prints boundary commit with --ignore-missing-links' '
    -+	git -C main rev-list --ignore-missing-links --boundary HEAD >actual &&
    -+	test_stdout_line_count = 6 cat actual &&
    -+	grep -$BOUNDARY_COMMIT actual
    ++	git -C alt rev-list --ignore-missing-links --boundary HEAD >got &&
    ++	grep "^-$(git rev-parse HEAD)" got
     +'
     +
    -+# The `--ignore-missing-links` option should ensure that git-rev-list(1) doesn't
    -+# fail when used alongside `--objects` when a tree is missing.
    -+test_expect_success 'rev-list --ignore-missing-links works with missing tree' '
    -+	echo "foo" >main/file &&
    -+	git -C main add file &&
    -+	GIT_ALTERNATE_OBJECT_DIRECTORIES=$PWD/alt git -C main commit -m"commit 11" &&
    -+	TREE_OID=$(git -C main rev-parse HEAD^{tree}) &&
    -+	mkdir alt/${TREE_OID:0:2} &&
    -+	mv main/.git/objects/${TREE_OID:0:2}/${TREE_OID:2} alt/${TREE_OID:0:2}/ &&
    -+	git -C main rev-list --ignore-missing-links --objects HEAD >actual &&
    -+	! grep $TREE_OID actual
    ++test_expect_success "setup for rev-list --ignore-missing-links with missing objects" '
    ++	show_alternates &&
    ++	test_commit -C alt 11
     +'
     +
    -+# Similar to above, it should also work when a blob is missing.
    -+test_expect_success 'rev-list --ignore-missing-links works with missing blob' '
    -+	echo "bar" >main/file &&
    -+	git -C main add file &&
    -+	GIT_ALTERNATE_OBJECT_DIRECTORIES=$PWD/alt git -C main commit -m"commit 12" &&
    -+	BLOB_OID=$(git -C main rev-parse HEAD:file) &&
    -+	mkdir alt/${BLOB_OID:0:2} &&
    -+	mv main/.git/objects/${BLOB_OID:0:2}/${BLOB_OID:2} alt/${BLOB_OID:0:2}/ &&
    -+	git -C main rev-list --ignore-missing-links --objects HEAD >actual &&
    -+	! grep $BLOB_OID actual
    -+'
    ++for obj in "HEAD^{tree}" "HEAD:11.t"
    ++do
    ++	# The `--ignore-missing-links` option should ensure that git-rev-list(1)
    ++	# doesn't fail when used alongside `--objects` when a tree/blob is
    ++	# missing.
    ++	test_expect_success "rev-list --ignore-missing-links with missing $type" '
    ++		oid="$(git -C alt rev-parse $obj)" &&
    ++		path="alt/.git/objects/$(test_oid_to_path $oid)" &&
    ++
    ++		mv "$path" "$path.hidden" &&
    ++		test_when_finished "mv $path.hidden $path" &&
    ++
    ++		git -C alt rev-list --ignore-missing-links --objects HEAD \
    ++			>actual &&
    ++		! grep $oid actual
    ++       '
    ++done
     +
     +test_done


 Documentation/rev-list-options.txt |  9 +++
 builtin/rev-list.c                 |  3 +-
 revision.c                         |  2 +
 t/t6022-rev-list-alternates.sh     | 93 ++++++++++++++++++++++++++++++
 4 files changed, 106 insertions(+), 1 deletion(-)
 create mode 100755 t/t6022-rev-list-alternates.sh

diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt
index a4a0cb93b2..8ee713db3d 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -227,6 +227,15 @@ explicitly.
 	Upon seeing an invalid object name in the input, pretend as if
 	the bad input was not given.
 
+--ignore-missing-links::
+	During traversal, if an object that is referenced does not
+	exist, instead of dying of a repository corruption, pretend as
+	if the reference itself does not exist. Running the command
+	with the `--boundary` option makes these missing commits,
+	together with the commits on the edge of revision ranges
+	(i.e. true boundary objects), appear on the output, prefixed
+	with '-'.
+
 ifndef::git-rev-list[]
 --bisect::
 	Pretend as if the bad bisection ref `refs/bisect/bad`
diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index ff715d6918..5239d83c76 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -266,7 +266,8 @@ static int finish_object(struct object *obj, const char *name UNUSED,
 {
 	struct rev_list_info *info = cb_data;
 	if (oid_object_info_extended(the_repository, &obj->oid, NULL, 0) < 0) {
-		finish_object__ma(obj);
+		if (!info->revs->ignore_missing_links)
+			finish_object__ma(obj);
 		return 1;
 	}
 	if (info->revs->verify_objects && !obj->parsed && obj->type != OBJ_COMMIT)
diff --git a/revision.c b/revision.c
index 2f4c53ea20..cbfcbf6e28 100644
--- a/revision.c
+++ b/revision.c
@@ -2595,6 +2595,8 @@ static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg
 		revs->limited = 1;
 	} else if (!strcmp(arg, "--ignore-missing")) {
 		revs->ignore_missing = 1;
+	} else if (!strcmp(arg, "--ignore-missing-links")) {
+		revs->ignore_missing_links = 1;
 	} else if (opt && opt->allow_exclude_promisor_objects &&
 		   !strcmp(arg, "--exclude-promisor-objects")) {
 		if (fetch_if_missing)
diff --git a/t/t6022-rev-list-alternates.sh b/t/t6022-rev-list-alternates.sh
new file mode 100755
index 0000000000..567dd21876
--- /dev/null
+++ b/t/t6022-rev-list-alternates.sh
@@ -0,0 +1,93 @@
+#!/bin/sh
+
+test_description='handling of alternates in rev-list'
+
+TEST_PASSES_SANITIZE_LEAK=true
+. ./test-lib.sh
+
+# We create 5 commits and move them to the alt directory and
+# create 5 more commits which will stay in the main odb.
+test_expect_success 'create repository and alternate directory' '
+	test_commit_bulk 5 &&
+	git clone --reference=. --shared . alt &&
+	test_commit_bulk --start=6 -C alt 5
+'
+
+# when the alternate odb is provided, all commits are listed along with the boundary
+# commit.
+test_expect_success 'rev-list passes with alternate object directory' '
+	git -C alt rev-list --all --objects --no-object-names >actual.raw &&
+	{
+		git rev-list --all --objects --no-object-names &&
+		git -C alt rev-list --all --objects --no-object-names --not \
+			--alternate-refs
+	} >expect.raw &&
+	sort actual.raw >actual &&
+	sort expect.raw >expect &&
+	test_cmp expect actual
+'
+
+alt=alt/.git/objects/info/alternates
+
+hide_alternates () {
+	test -f "$alt.bak" || mv "$alt" "$alt.bak"
+}
+
+show_alternates () {
+	test -f "$alt" || mv "$alt.bak" "$alt"
+}
+
+# When the alternate odb is not provided, rev-list fails since the 5th commit's
+# parent is not present in the main odb.
+test_expect_success 'rev-list fails without alternate object directory' '
+	hide_alternates &&
+	test_must_fail git -C alt rev-list HEAD
+'
+
+# With `--ignore-missing-links`, we stop the traversal when we encounter a
+# missing link. The boundary commit is not listed as we haven't used the
+# `--boundary` options.
+test_expect_success 'rev-list only prints main odb commits with --ignore-missing-links' '
+	hide_alternates &&
+
+	git -C alt rev-list --objects --no-object-names \
+		--ignore-missing-links HEAD >actual.raw &&
+	git -C alt cat-file  --batch-check="%(objectname)" \
+		--batch-all-objects >expect.raw &&
+
+	sort actual.raw >actual &&
+	sort expect.raw >expect &&
+	test_must_fail git -C alt rev-list HEAD
+'
+
+# With `--ignore-missing-links` and `--boundary`, we can even print those boundary
+# commits.
+test_expect_success 'rev-list prints boundary commit with --ignore-missing-links' '
+	git -C alt rev-list --ignore-missing-links --boundary HEAD >got &&
+	grep "^-$(git rev-parse HEAD)" got
+'
+
+test_expect_success "setup for rev-list --ignore-missing-links with missing objects" '
+	show_alternates &&
+	test_commit -C alt 11
+'
+
+for obj in "HEAD^{tree}" "HEAD:11.t"
+do
+	# The `--ignore-missing-links` option should ensure that git-rev-list(1)
+	# doesn't fail when used alongside `--objects` when a tree/blob is
+	# missing.
+	test_expect_success "rev-list --ignore-missing-links with missing $type" '
+		oid="$(git -C alt rev-parse $obj)" &&
+		path="alt/.git/objects/$(test_oid_to_path $oid)" &&
+
+		mv "$path" "$path.hidden" &&
+		test_when_finished "mv $path.hidden $path" &&
+
+		git -C alt rev-list --ignore-missing-links --objects HEAD \
+			>actual &&
+		! grep $oid actual
+       '
+done
+
+test_done
-- 
2.41.0


  parent reply	other threads:[~2023-09-15  8:34 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-09-08 17:42 [PATCH] revision: add `--ignore-missing-links` user option Karthik Nayak
2023-09-08 19:19 ` Junio C Hamano
2023-09-12 14:42   ` Karthik Nayak
2023-09-12 15:58 ` [PATCH v2] " Karthik Nayak
2023-09-12 17:07   ` Taylor Blau
2023-09-13  9:32     ` Karthik Nayak
2023-09-13 17:17       ` Taylor Blau
2023-09-15  8:34   ` Karthik Nayak [this message]
2023-09-15 18:54     ` [PATCH v3] " Junio C Hamano
2023-09-18 10:12       ` Karthik Nayak
2023-09-18 15:56         ` Junio C Hamano
2023-09-19  8:45           ` Karthik Nayak
2023-09-19 15:13             ` Junio C Hamano
2023-09-20 10:45     ` [PATCH v4] " Karthik Nayak
2023-09-20 15:32       ` Junio C Hamano
2023-09-21 10:53         ` Karthik Nayak
2023-09-21 19:16           ` Junio C Hamano
2023-09-24 16:14             ` Karthik Nayak
2023-09-25 16:57               ` Junio C Hamano
2023-09-27 16:26                 ` Karthik Nayak

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230915083415.263187-1-knayak@gitlab.com \
    --to=karthik.188@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=me@ttaylorr.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.