From: Karthik Nayak <karthik.188@gmail.com>
To: git@vger.kernel.org
Cc: gitster@pobox.com, Karthik Nayak <karthik.188@gmail.com>
Subject: [PATCH v2] revision: add `--ignore-missing-links` user option
Date: Tue, 12 Sep 2023 17:58:20 +0200
Message-ID: <20230912155820.136111-1-karthik.188@gmail.com>
In-Reply-To: <20230908174208.249184-1-karthik.188@gmail.com>

The revision backend is used by multiple porcelain commands such as
git-rev-list(1) and git-log(1). The backend currently supports ignoring
missing links by setting the `ignore_missing_links` bit. This allows the
revision walk to skip any object links which are missing. Expose this
bit via an `--ignore-missing-links` user option.

A scenario where this option would be used is to find the boundary
objects between different object directories. Consider a repository with
a main object directory (GIT_OBJECT_DIRECTORY) and one or more alternate
object directories (GIT_ALTERNATE_OBJECT_DIRECTORIES). In such a
repository, enabling this option along with the `--boundary` option
while disabling the alternate object directories allows us to find the
boundary objects between the main and alternate object directories.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
Changes from v1:
1. Reworded the commit message and option description to be more specific
about what changes and why.
2. Ensure the new option also works with the existing `--objects` option.
3. More specific testing for the boundary commit.
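
For readers who want to try the scenario from the commit message outside
the test suite, the setup can be sketched by hand roughly as follows. This
is only an illustration under a few assumptions: the `commit` helper, the
`work` directory, the throwaway identity, and the use of empty commits in
place of `test_commit_bulk` are made up for this sketch and are not part
of the patch. The final step is guarded because only a git built with
this patch would accept the proposed option.

```shell
#!/bin/sh
# Hand-rolled version of the main/alternate split (a sketch, not part
# of the patch).
set -e

work=$(mktemp -d)
cd "$work"

# Hypothetical helper: create an empty commit with a throwaway identity.
commit () {
	git -C main -c user.name=demo -c user.email=demo@example.com \
		commit -q --allow-empty -m "$1"
}

git init -q main
for i in 1 2 3 4 5; do commit "commit $i"; done
boundary=$(git -C main rev-parse HEAD)

# Move everything written so far into a separate directory that will
# act as the alternate object directory.
mkdir alt
mv main/.git/objects/* alt

# Commits 6..10 need the alternate to resolve their parent, but are
# themselves written to the main object directory.
export GIT_ALTERNATE_OBJECT_DIRECTORIES="$work/alt"
for i in 6 7 8 9 10; do commit "commit $i"; done

# With the alternate visible, the whole history is reachable.
with_alt=$(git -C main rev-list HEAD | wc -l)
with_alt=$((with_alt))
unset GIT_ALTERNATE_OBJECT_DIRECTORIES

# Without it, the walk hits the missing parent of commit 6 and fails.
if git -C main rev-list HEAD >/dev/null 2>&1; then
	without_alt=ok
else
	without_alt=failed
fi
echo "with alternate: $with_alt commits; without alternate: $without_alt"

# Only a git built with this patch understands the new option, so
# guard the call instead of assuming it works.
if out=$(git -C main rev-list --ignore-missing-links --boundary HEAD 2>/dev/null)
then
	printf '%s\n' "$out" | grep -e "^-$boundary" || echo "boundary not reported"
else
	echo "this git does not support --ignore-missing-links"
fi
```

The pattern in the last step is anchored with `^-` so that grep matches
the '-' prefix literally instead of parsing it as an option.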
Range diff against v1:
1: c0a4dca9b0 ! 1: e3f4d85732 revision: add `--ignore-missing-links` user option
@@ Commit message
The revision backend is used by multiple porcelain commands such as
git-rev-list(1) and git-log(1). The backend currently supports ignoring
missing links by setting the `ignore_missing_links` bit. This allows the
- revision walk to skip any objects links which are missing.
+ revision walk to skip any object links which are missing. Expose this
+ bit via an `--ignore-missing-links` user option.
- Currently there is no way to use git-rev-list(1) to traverse the objects
- of the main object directory (GIT_OBJECT_DIRECTORY) and print the
- boundary objects when moving from the main object directory to the
- alternate object directories (GIT_ALTERNATE_OBJECT_DIRECTORIES).
-
- By exposing this new flag `--ignore-missing-links`, users can set the
- required env variables (GIT_OBJECT_DIRECTORY and
- GIT_ALTERNATE_OBJECT_DIRECTORIES) along with the `--boundary` flag to
- find the boundary objects between object directories.
+ A scenario where this option would be used is to find the boundary
+ objects between different object directories. Consider a repository with
+ a main object directory (GIT_OBJECT_DIRECTORY) and one or more alternate
+ object directories (GIT_ALTERNATE_OBJECT_DIRECTORIES). In such a
+ repository, enabling this option along with the `--boundary` option
+ while disabling the alternate object directories allows us to find the
+ boundary objects between the main and alternate object directories.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
@@ Documentation/rev-list-options.txt: explicitly.
the bad input was not given.
+--ignore-missing-links::
-+ When an object points to another object that is missing, pretend as if the
-+ link did not exist. These missing links are not written to stdout unless
-+ the --boundary flag is passed.
++ During traversal, if an object that is referenced does not
++ exist, instead of dying due to repository corruption, pretend as
++ if the reference itself does not exist. Running the command
++ with the `--boundary` option makes these missing commits,
++ together with the commits on the edge of revision ranges
++ (i.e. true boundary objects), appear on the output, prefixed
++ with '-'.
+
ifndef::git-rev-list[]
--bisect::
Pretend as if the bad bisection ref `refs/bisect/bad`
+ ## builtin/rev-list.c ##
+@@ builtin/rev-list.c: static int finish_object(struct object *obj, const char *name UNUSED,
+ {
+ struct rev_list_info *info = cb_data;
+ if (oid_object_info_extended(the_repository, &obj->oid, NULL, 0) < 0) {
+- finish_object__ma(obj);
++ if (!info->revs->ignore_missing_links)
++ finish_object__ma(obj);
+ return 1;
+ }
+ if (info->revs->verify_objects && !obj->parsed && obj->type != OBJ_COMMIT)
+
## revision.c ##
@@ revision.c: static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg
revs->limited = 1;
@@ t/t6022-rev-list-alternates.sh (new)
+test_expect_success 'create repository and alternate directory' '
+ git init main &&
+ test_commit_bulk -C main 5 &&
++ BOUNDARY_COMMIT=$(git -C main rev-parse HEAD) &&
+ mkdir alt &&
+ mv main/.git/objects/* alt &&
+ GIT_ALTERNATE_OBJECT_DIRECTORIES=$PWD/alt test_commit_bulk --start=6 -C main 5
+'
+
-+# When the alternate odb is provided, all commits are listed.
++# When the alternate odb is provided, all commits are listed along with the boundary
++# commit.
+test_expect_success 'rev-list passes with alternate object directory' '
-+ GIT_ALTERNATE_OBJECT_DIRECTORIES=$PWD/alt test_stdout_line_count = 10 git -C main rev-list HEAD
++ GIT_ALTERNATE_OBJECT_DIRECTORIES=$PWD/alt git -C main rev-list HEAD >actual &&
++ test_stdout_line_count = 10 cat actual &&
++ grep $BOUNDARY_COMMIT actual
+'
+
+# When the alternate odb is not provided, rev-list fails since the 5th commit's
@@ t/t6022-rev-list-alternates.sh (new)
+'
+
+# With `--ignore-missing-links`, we stop the traversal when we encounter a
-+# missing link.
++# missing link. The boundary commit is not listed as we haven't used the
++# `--boundary` option.
+test_expect_success 'rev-list only prints main odb commits with --ignore-missing-links' '
-+ test_stdout_line_count = 5 git -C main rev-list --ignore-missing-links HEAD
++ git -C main rev-list --ignore-missing-links HEAD >actual &&
++ test_stdout_line_count = 5 cat actual &&
++ ! grep "^-$BOUNDARY_COMMIT" actual
+'
+
+# With `--ignore-missing-links` and `--boundary`, we can even print those boundary
+# commits.
+test_expect_success 'rev-list prints boundary commit with --ignore-missing-links' '
-+ git -C main rev-list --ignore-missing-links --boundary HEAD >list-output &&
-+ test_stdout_line_count = 6 cat list-output &&
-+ test_stdout_line_count = 1 cat list-output | grep "^-"
++ git -C main rev-list --ignore-missing-links --boundary HEAD >actual &&
++ test_stdout_line_count = 6 cat actual &&
++ grep "^-$BOUNDARY_COMMIT" actual
++'
++
++# The `--ignore-missing-links` option should ensure that git-rev-list(1) doesn't
++# fail when used alongside `--objects` when a tree is missing.
++test_expect_success 'rev-list --ignore-missing-links works with missing tree' '
++ echo "foo" >main/file &&
++ git -C main add file &&
++ GIT_ALTERNATE_OBJECT_DIRECTORIES=$PWD/alt git -C main commit -m"commit 11" &&
++ TREE_OID=$(git -C main rev-parse HEAD^{tree}) &&
++ mkdir -p alt/$(dirname $(test_oid_to_path $TREE_OID)) &&
++ mv main/.git/objects/$(test_oid_to_path $TREE_OID) alt/$(test_oid_to_path $TREE_OID) &&
++ git -C main rev-list --ignore-missing-links --objects HEAD >actual &&
++ ! grep $TREE_OID actual
++'
++
++# Similar to above, it should also work when a blob is missing.
++test_expect_success 'rev-list --ignore-missing-links works with missing blob' '
++ echo "bar" >main/file &&
++ git -C main add file &&
++ GIT_ALTERNATE_OBJECT_DIRECTORIES=$PWD/alt git -C main commit -m"commit 12" &&
++ BLOB_OID=$(git -C main rev-parse HEAD:file) &&
++ mkdir -p alt/$(dirname $(test_oid_to_path $BLOB_OID)) &&
++ mv main/.git/objects/$(test_oid_to_path $BLOB_OID) alt/$(test_oid_to_path $BLOB_OID) &&
++ git -C main rev-list --ignore-missing-links --objects HEAD >actual &&
++ ! grep $BLOB_OID actual
+'
+
+test_done
Documentation/rev-list-options.txt | 9 ++++
builtin/rev-list.c | 3 +-
revision.c | 2 +
t/t6022-rev-list-alternates.sh | 75 ++++++++++++++++++++++++++++++
4 files changed, 88 insertions(+), 1 deletion(-)
create mode 100755 t/t6022-rev-list-alternates.sh
diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt
index a4a0cb93b2..8ee713db3d 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -227,6 +227,15 @@ explicitly.
Upon seeing an invalid object name in the input, pretend as if
the bad input was not given.
+--ignore-missing-links::
+ During traversal, if an object that is referenced does not
+ exist, instead of dying due to repository corruption, pretend as
+ if the reference itself does not exist. Running the command
+ with the `--boundary` option makes these missing commits,
+ together with the commits on the edge of revision ranges
+ (i.e. true boundary objects), appear on the output, prefixed
+ with '-'.
+
ifndef::git-rev-list[]
--bisect::
Pretend as if the bad bisection ref `refs/bisect/bad`
diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index ff715d6918..5239d83c76 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -266,7 +266,8 @@ static int finish_object(struct object *obj, const char *name UNUSED,
{
struct rev_list_info *info = cb_data;
if (oid_object_info_extended(the_repository, &obj->oid, NULL, 0) < 0) {
- finish_object__ma(obj);
+ if (!info->revs->ignore_missing_links)
+ finish_object__ma(obj);
return 1;
}
if (info->revs->verify_objects && !obj->parsed && obj->type != OBJ_COMMIT)
diff --git a/revision.c b/revision.c
index 2f4c53ea20..cbfcbf6e28 100644
--- a/revision.c
+++ b/revision.c
@@ -2595,6 +2595,8 @@ static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg
revs->limited = 1;
} else if (!strcmp(arg, "--ignore-missing")) {
revs->ignore_missing = 1;
+ } else if (!strcmp(arg, "--ignore-missing-links")) {
+ revs->ignore_missing_links = 1;
} else if (opt && opt->allow_exclude_promisor_objects &&
!strcmp(arg, "--exclude-promisor-objects")) {
if (fetch_if_missing)
diff --git a/t/t6022-rev-list-alternates.sh b/t/t6022-rev-list-alternates.sh
new file mode 100755
index 0000000000..08d9ffde5f
--- /dev/null
+++ b/t/t6022-rev-list-alternates.sh
@@ -0,0 +1,75 @@
+#!/bin/sh
+
+test_description='handling of alternates in rev-list'
+
+TEST_PASSES_SANITIZE_LEAK=true
+. ./test-lib.sh
+
+# We create 5 commits and move them to the alt directory and
+# create 5 more commits which will stay in the main odb.
+test_expect_success 'create repository and alternate directory' '
+ git init main &&
+ test_commit_bulk -C main 5 &&
+ BOUNDARY_COMMIT=$(git -C main rev-parse HEAD) &&
+ mkdir alt &&
+ mv main/.git/objects/* alt &&
+ GIT_ALTERNATE_OBJECT_DIRECTORIES=$PWD/alt test_commit_bulk --start=6 -C main 5
+'
+
+# When the alternate odb is provided, all commits are listed along with the boundary
+# commit.
+test_expect_success 'rev-list passes with alternate object directory' '
+ GIT_ALTERNATE_OBJECT_DIRECTORIES=$PWD/alt git -C main rev-list HEAD >actual &&
+ test_stdout_line_count = 10 cat actual &&
+ grep $BOUNDARY_COMMIT actual
+'
+
+# When the alternate odb is not provided, rev-list fails since the 5th commit's
+# parent is not present in the main odb.
+test_expect_success 'rev-list fails without alternate object directory' '
+ test_must_fail git -C main rev-list HEAD
+'
+
+# With `--ignore-missing-links`, we stop the traversal when we encounter a
+# missing link. The boundary commit is not listed as we haven't used the
+# `--boundary` option.
+test_expect_success 'rev-list only prints main odb commits with --ignore-missing-links' '
+ git -C main rev-list --ignore-missing-links HEAD >actual &&
+ test_stdout_line_count = 5 cat actual &&
+ ! grep "^-$BOUNDARY_COMMIT" actual
+'
+
+# With `--ignore-missing-links` and `--boundary`, we can even print those boundary
+# commits.
+test_expect_success 'rev-list prints boundary commit with --ignore-missing-links' '
+ git -C main rev-list --ignore-missing-links --boundary HEAD >actual &&
+ test_stdout_line_count = 6 cat actual &&
+ grep "^-$BOUNDARY_COMMIT" actual
+'
+
+# The `--ignore-missing-links` option should ensure that git-rev-list(1) doesn't
+# fail when used alongside `--objects` when a tree is missing.
+test_expect_success 'rev-list --ignore-missing-links works with missing tree' '
+ echo "foo" >main/file &&
+ git -C main add file &&
+ GIT_ALTERNATE_OBJECT_DIRECTORIES=$PWD/alt git -C main commit -m"commit 11" &&
+ TREE_OID=$(git -C main rev-parse HEAD^{tree}) &&
+ mkdir -p alt/$(dirname $(test_oid_to_path $TREE_OID)) &&
+ mv main/.git/objects/$(test_oid_to_path $TREE_OID) alt/$(test_oid_to_path $TREE_OID) &&
+ git -C main rev-list --ignore-missing-links --objects HEAD >actual &&
+ ! grep $TREE_OID actual
+'
+
+# Similar to above, it should also work when a blob is missing.
+test_expect_success 'rev-list --ignore-missing-links works with missing blob' '
+ echo "bar" >main/file &&
+ git -C main add file &&
+ GIT_ALTERNATE_OBJECT_DIRECTORIES=$PWD/alt git -C main commit -m"commit 12" &&
+ BLOB_OID=$(git -C main rev-parse HEAD:file) &&
+ mkdir -p alt/$(dirname $(test_oid_to_path $BLOB_OID)) &&
+ mv main/.git/objects/$(test_oid_to_path $BLOB_OID) alt/$(test_oid_to_path $BLOB_OID) &&
+ git -C main rev-list --ignore-missing-links --objects HEAD >actual &&
+ ! grep $BLOB_OID actual
+'
+
+test_done
--
2.41.0