* [PATCH 0/4] rev-list: introduce NUL-delimited output mode @ 2025-03-10 19:28 Justin Tobler 2025-03-10 19:28 ` [PATCH 1/4] rev-list: inline `show_object_with_name()` in `show_object()` Justin Tobler ` (7 more replies) 0 siblings, 8 replies; 60+ messages in thread From: Justin Tobler @ 2025-03-10 19:28 UTC (permalink / raw) To: git; +Cc: ps, christian.couder, Justin Tobler When walking objects, git-rev-list(1) prints each object entry on a separate line in the form: <oid> LF Some options, such as `--objects`, may print additional information about the object on the same line: <oid> SP [<path>] LF In this mode, if the object path contains a newline it is truncated at the newline. When the `--missing={print,print-info}` option is provided, information about any missing objects encountered during the object walk are also printed in the form: ?<oid> [SP <token>=<value>]... LF where values containing LF or SP are printed in a token specific fashion so that the resulting encoded value does not contain either of these two problematic bytes. For example, missing object paths are quoted in the C style so they contain LF or SP. To make machine parsing easier, this series introduces a NUL-delimited output mode for git-rev-list(1) via a `-z` option following a suggestion from Junio in a previous thread[1]. In this mode, instead of LF, each object is delimited with two NUL bytes and any object metadata is separated with a single NUL byte. Examples: <oid> NUL NUL <oid> [NUL <path>] NUL NUL ?<oid> [NUL <token>=<value>]... NUL NUL In this mode, path and value info are printed as-is without any special encoding or truncation. For now this series only adds support for use with the `--objects` and `--missing` options. Usage of `-z` with other options is rejected, so it can potentially be added in the future. One idea I had, but did not implement in this version, was to also use the `<token>=<value>` format for regular non-missing object info while in the NUL-delimited mode. I could see this being a bit more flexible instead of relying strictly on order. Interested if anyone has thoughts on this. :) This series is structured as follows: - Patches 1 and 2 do some minor preparatory refactors. - Patch 3 adds the `-z` option to git-rev-list(1) to print objects in a NUL-delimited fashion. Printed object paths with the `--objects` option are also handled. - Patch 4 teaches the `--missing` option how to print info in a NUL-delimited fashion. Thanks for taking a look, -Justin [1]: <xmqq5xlor0la.fsf@gitster.g> Justin Tobler (4): rev-list: inline `show_object_with_name()` in `show_object()` rev-list: refactor early option parsing rev-list: support delimiting objects with NUL bytes rev-list: support NUL-delimited --missing option Documentation/rev-list-options.adoc | 26 +++++++++ builtin/rev-list.c | 86 ++++++++++++++++++++++------- revision.c | 8 --- revision.h | 2 - t/t6000-rev-list-misc.sh | 34 ++++++++++++ t/t6022-rev-list-missing.sh | 30 ++++++++++ 6 files changed, 155 insertions(+), 31 deletions(-) base-commit: 87a0bdbf0f72b7561f3cd50636eee33dcb7dbcc3 -- 2.49.0.rc2 ^ permalink raw reply [flat|nested] 60+ messages in thread
* [PATCH 1/4] rev-list: inline `show_object_with_name()` in `show_object()` 2025-03-10 19:28 [PATCH 0/4] rev-list: introduce NUL-delimited output mode Justin Tobler @ 2025-03-10 19:28 ` Justin Tobler 2025-03-10 20:51 ` Junio C Hamano 2025-03-10 19:28 ` [PATCH 2/4] rev-list: refactor early option parsing Justin Tobler ` (6 subsequent siblings) 7 siblings, 1 reply; 60+ messages in thread From: Justin Tobler @ 2025-03-10 19:28 UTC (permalink / raw) To: git; +Cc: ps, christian.couder, Justin Tobler The `show_object_with_name()` function only has a single call site. Inline call to `show_object_with_name()` in `show_object()` so the explicit function can be cleaned up and live closer to where it is used. While at it, factor out the code that prints the OID and newline for both objects with and without a name. In a subsequent commit, `show_object()` is modified to support printing object information in a NUL-delimited format. Signed-off-by: Justin Tobler <jltobler@gmail.com> --- builtin/rev-list.c | 13 +++++++++---- revision.c | 8 -------- revision.h | 2 -- 3 files changed, 9 insertions(+), 14 deletions(-) diff --git a/builtin/rev-list.c b/builtin/rev-list.c index bb26bee0d4..dcd079c16c 100644 --- a/builtin/rev-list.c +++ b/builtin/rev-list.c @@ -357,10 +357,15 @@ static void show_object(struct object *obj, const char *name, void *cb_data) return; } - if (arg_show_object_names) - show_object_with_name(stdout, obj, name); - else - printf("%s\n", oid_to_hex(&obj->oid)); + printf("%s", oid_to_hex(&obj->oid)); + + if (arg_show_object_names) { + putchar(' '); + for (const char *p = name; *p && *p != '\n'; p++) + putchar(*p); + } + + putchar('\n'); } static void show_edge(struct commit *commit) diff --git a/revision.c b/revision.c index c4390f0938..0eaebe4478 100644 --- a/revision.c +++ b/revision.c @@ -59,14 +59,6 @@ implement_shared_commit_slab(revision_sources, char *); static inline int want_ancestry(const struct rev_info *revs); -void show_object_with_name(FILE *out, struct object *obj, const char *name) -{ - fprintf(out, "%s ", oid_to_hex(&obj->oid)); - for (const char *p = name; *p && *p != '\n'; p++) - fputc(*p, out); - fputc('\n', out); -} - static void mark_blob_uninteresting(struct blob *blob) { if (!blob) diff --git a/revision.h b/revision.h index 71e984c452..21c6a69899 100644 --- a/revision.h +++ b/revision.h @@ -489,8 +489,6 @@ void mark_parents_uninteresting(struct rev_info *revs, struct commit *commit); void mark_tree_uninteresting(struct repository *r, struct tree *tree); void mark_trees_uninteresting_sparse(struct repository *r, struct oidset *trees); -void show_object_with_name(FILE *, struct object *, const char *); - /** * Helpers to check if a reference should be excluded. */ -- 2.49.0.rc2 ^ permalink raw reply related [flat|nested] 60+ messages in thread
* Re: [PATCH 1/4] rev-list: inline `show_object_with_name()` in `show_object()` 2025-03-10 19:28 ` [PATCH 1/4] rev-list: inline `show_object_with_name()` in `show_object()` Justin Tobler @ 2025-03-10 20:51 ` Junio C Hamano 0 siblings, 0 replies; 60+ messages in thread From: Junio C Hamano @ 2025-03-10 20:51 UTC (permalink / raw) To: Justin Tobler; +Cc: git, ps, christian.couder Justin Tobler <jltobler@gmail.com> writes: > The `show_object_with_name()` function only has a single call site. > Inline call to `show_object_with_name()` in `show_object()` so the > explicit function can be cleaned up and live closer to where it is used. > While at it, factor out the code that prints the OID and newline for > both objects with and without a name. In a subsequent commit, > `show_object()` is modified to support printing object information in a > NUL-delimited format. > > Signed-off-by: Justin Tobler <jltobler@gmail.com> > --- > builtin/rev-list.c | 13 +++++++++---- > revision.c | 8 -------- > revision.h | 2 -- > 3 files changed, 9 insertions(+), 14 deletions(-) > > diff --git a/builtin/rev-list.c b/builtin/rev-list.c > index bb26bee0d4..dcd079c16c 100644 > --- a/builtin/rev-list.c > +++ b/builtin/rev-list.c > @@ -357,10 +357,15 @@ static void show_object(struct object *obj, const char *name, void *cb_data) > return; > } > > - if (arg_show_object_names) > - show_object_with_name(stdout, obj, name); > - else > - printf("%s\n", oid_to_hex(&obj->oid)); > + printf("%s", oid_to_hex(&obj->oid)); > + > + if (arg_show_object_names) { > + putchar(' '); > + for (const char *p = name; *p && *p != '\n'; p++) > + putchar(*p); > + } > + > + putchar('\n'); > } Makes sense. ^ permalink raw reply [flat|nested] 60+ messages in thread
* [PATCH 2/4] rev-list: refactor early option parsing 2025-03-10 19:28 [PATCH 0/4] rev-list: introduce NUL-delimited output mode Justin Tobler 2025-03-10 19:28 ` [PATCH 1/4] rev-list: inline `show_object_with_name()` in `show_object()` Justin Tobler @ 2025-03-10 19:28 ` Justin Tobler 2025-03-10 20:54 ` Junio C Hamano 2025-03-10 19:28 ` [PATCH 3/4] rev-list: support delimiting objects with NUL bytes Justin Tobler ` (5 subsequent siblings) 7 siblings, 1 reply; 60+ messages in thread From: Justin Tobler @ 2025-03-10 19:28 UTC (permalink / raw) To: git; +Cc: ps, christian.couder, Justin Tobler Before invoking `setup_revisions()`, the `--missing` and `--exclude-promisor-objects` options are parsed early. In a subsequent commit, another option is added that must be parsed early. Refactor the code to parse both options in a single early pass. Signed-off-by: Justin Tobler <jltobler@gmail.com> --- builtin/rev-list.c | 17 +++++++---------- 1 file changed, 7 insertions(+), 10 deletions(-) diff --git a/builtin/rev-list.c b/builtin/rev-list.c index dcd079c16c..04d9c893b5 100644 --- a/builtin/rev-list.c +++ b/builtin/rev-list.c @@ -16,6 +16,7 @@ #include "object-file.h" #include "object-store-ll.h" #include "pack-bitmap.h" +#include "parse-options.h" #include "log-tree.h" #include "graph.h" #include "bisect.h" @@ -639,19 +640,15 @@ int cmd_rev_list(int argc, if (!strcmp(arg, "--exclude-promisor-objects")) { fetch_if_missing = 0; revs.exclude_promisor_objects = 1; - break; - } - } - for (i = 1; i < argc; i++) { - const char *arg = argv[i]; - if (skip_prefix(arg, "--missing=", &arg)) { - if (revs.exclude_promisor_objects) - die(_("options '%s' and '%s' cannot be used together"), "--exclude-promisor-objects", "--missing"); - if (parse_missing_action_value(arg)) - break; + } else if (skip_prefix(arg, "--missing=", &arg)) { + parse_missing_action_value(arg); } } + die_for_incompatible_opt2(revs.exclude_promisor_objects, + "--exclude_promisor_objects", + arg_missing_action, "--missing"); + if (arg_missing_action) revs.do_not_die_on_missing_objects = 1; -- 2.49.0.rc2 ^ permalink raw reply related [flat|nested] 60+ messages in thread
* Re: [PATCH 2/4] rev-list: refactor early option parsing 2025-03-10 19:28 ` [PATCH 2/4] rev-list: refactor early option parsing Justin Tobler @ 2025-03-10 20:54 ` Junio C Hamano 2025-03-12 21:39 ` Justin Tobler 0 siblings, 1 reply; 60+ messages in thread From: Junio C Hamano @ 2025-03-10 20:54 UTC (permalink / raw) To: Justin Tobler; +Cc: git, ps, christian.couder Justin Tobler <jltobler@gmail.com> writes: > @@ -639,19 +640,15 @@ int cmd_rev_list(int argc, > if (!strcmp(arg, "--exclude-promisor-objects")) { > fetch_if_missing = 0; > revs.exclude_promisor_objects = 1; > - break; > - } > - } > - for (i = 1; i < argc; i++) { > - const char *arg = argv[i]; > - if (skip_prefix(arg, "--missing=", &arg)) { > - if (revs.exclude_promisor_objects) > - die(_("options '%s' and '%s' cannot be used together"), "--exclude-promisor-objects", "--missing"); > - if (parse_missing_action_value(arg)) > - break; > + } else if (skip_prefix(arg, "--missing=", &arg)) { > + parse_missing_action_value(arg); > } > } There is a huge NEEDSWORK comment that essentially says that the above two loops that this patch combines into one is fundamentally broken. I suspect that the remaining two patches in this series would punt and not improve them, but offhand I am not sure if this change is making it harder to fix them properly easier or harder. ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH 2/4] rev-list: refactor early option parsing 2025-03-10 20:54 ` Junio C Hamano @ 2025-03-12 21:39 ` Justin Tobler 0 siblings, 0 replies; 60+ messages in thread From: Justin Tobler @ 2025-03-12 21:39 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, ps, christian.couder On 25/03/10 01:54PM, Junio C Hamano wrote: > Justin Tobler <jltobler@gmail.com> writes: > > > @@ -639,19 +640,15 @@ int cmd_rev_list(int argc, > > if (!strcmp(arg, "--exclude-promisor-objects")) { > > fetch_if_missing = 0; > > revs.exclude_promisor_objects = 1; > > - break; > > - } > > - } > > - for (i = 1; i < argc; i++) { > > - const char *arg = argv[i]; > > - if (skip_prefix(arg, "--missing=", &arg)) { > > - if (revs.exclude_promisor_objects) > > - die(_("options '%s' and '%s' cannot be used together"), "--exclude-promisor-objects", "--missing"); > > - if (parse_missing_action_value(arg)) > > - break; > > + } else if (skip_prefix(arg, "--missing=", &arg)) { > > + parse_missing_action_value(arg); > > } > > } > > There is a huge NEEDSWORK comment that essentially says that the > above two loops that this patch combines into one is fundamentally > broken. I suspect that the remaining two patches in this series > would punt and not improve them, but offhand I am not sure if this > change is making it harder to fix them properly easier or harder. My understanding here is that `setup_revisions()` does not provide a mechanism to reject certain individual or combinations of parsed options that do not make sense for a command. As you mentioned, this patch is just punting on that issue, but I don't think it is really making the problem any harder to be resolved in the future. In this case, the `-z` option is parsed early to avoid is being handled by the diff options parsing in `setup_revisions()`. From the perspective of git-rev-list(1), I don't think there is any reason to be parsing diff options at all. We could introduce an option to disable diff options parsing in `setup_revisions()`, but now that we also want to modify stdin argument parsing to be NUL-delimited, we would still have to parse the -z option early in order to configure `setup_revisions()` appropriately. Another way we could potentially resolve these issues would be to optionally separate the argument/option parsing logic out of `setup_revisions()` and expose it directly. That could give the caller more control over how options are handled before fully setting up the `rev_info` structure. I suspect this would be non-trivial though, so I think it would preferrable to keep the current approach for now as it isn't making the problem harder to fix IMO. -Justin ^ permalink raw reply [flat|nested] 60+ messages in thread
* [PATCH 3/4] rev-list: support delimiting objects with NUL bytes 2025-03-10 19:28 [PATCH 0/4] rev-list: introduce NUL-delimited output mode Justin Tobler 2025-03-10 19:28 ` [PATCH 1/4] rev-list: inline `show_object_with_name()` in `show_object()` Justin Tobler 2025-03-10 19:28 ` [PATCH 2/4] rev-list: refactor early option parsing Justin Tobler @ 2025-03-10 19:28 ` Justin Tobler 2025-03-10 20:59 ` Junio C Hamano 2025-03-12 7:50 ` Patrick Steinhardt 2025-03-10 19:28 ` [PATCH 4/4] rev-list: support NUL-delimited --missing option Justin Tobler ` (4 subsequent siblings) 7 siblings, 2 replies; 60+ messages in thread From: Justin Tobler @ 2025-03-10 19:28 UTC (permalink / raw) To: git; +Cc: ps, christian.couder, Justin Tobler When walking objects, git-rev-list(1) prints each object entry on a separate line. Some options, such as `--objects`, may print additional information about tree and blob object on the same line in the form: $ git rev-list --objects <rev> <tree/blob oid> SP [<path>] LF Note that in this form the SP is appended regardless of whether the tree or blob object has path information available. Paths containing a newline are also truncated at the newline. Introduce the `-z` option for git-rev-list(1) which reformats the output to use NUL-delimiters between objects and associated info. Each object line uses two NUL bytes to indicate the end of an object entry and a single NUL byte to delimit between object information in the following form: $ git rev-list -z --objects <rev> <oid> [NUL <path>] NUL NUL For now, the `--objects` flag is the only option that can be used in combination with `-z`. In this mode, the object path is not truncated at newlines. In a subsequent commit, NUL-delimiter support for other options is added. Other options that do not make sense with be used in combination with `-z` are rejected. Signed-off-by: Justin Tobler <jltobler@gmail.com> --- Documentation/rev-list-options.adoc | 18 +++++++++++++ builtin/rev-list.c | 39 +++++++++++++++++++++++++---- t/t6000-rev-list-misc.sh | 34 +++++++++++++++++++++++++ 3 files changed, 86 insertions(+), 5 deletions(-) diff --git a/Documentation/rev-list-options.adoc b/Documentation/rev-list-options.adoc index 785c0786e0..d21016d657 100644 --- a/Documentation/rev-list-options.adoc +++ b/Documentation/rev-list-options.adoc @@ -361,6 +361,24 @@ ifdef::git-rev-list[] --progress=<header>:: Show progress reports on stderr as objects are considered. The `<header>` text will be printed with each progress update. + +-z:: + Instead of being newline-delimited, each outputted object is delimited + with two NUL bytes in the following form: ++ +----------------------------------------------------------------------- +<OID> NUL NUL +----------------------------------------------------------------------- ++ +When the `--objects` option is also present, available object name information +is printed in the following form without any truncation for object names +containing newline characters: ++ +----------------------------------------------------------------------- +<OID> [NUL <object-name>] NUL NUL +----------------------------------------------------------------------- ++ +This option is only compatible with `--objects`. endif::git-rev-list[] History Simplification diff --git a/builtin/rev-list.c b/builtin/rev-list.c index 04d9c893b5..86b3ce5806 100644 --- a/builtin/rev-list.c +++ b/builtin/rev-list.c @@ -65,6 +65,7 @@ static const char rev_list_usage[] = " --abbrev-commit\n" " --left-right\n" " --count\n" +" -z\n" " special purpose:\n" " --bisect\n" " --bisect-vars\n" @@ -97,10 +98,23 @@ static int arg_show_object_names = 1; #define DEFAULT_OIDSET_SIZE (16*1024) +static int nul_delim; static int show_disk_usage; static off_t total_disk_usage; static int human_readable; +static void print_object_term(int nul_delim) +{ + char line_sep = '\n'; + + if (nul_delim) + line_sep = '\0'; + + putchar(line_sep); + if (nul_delim) + putchar(line_sep); +} + static off_t get_object_disk_usage(struct object *obj) { off_t size; @@ -264,7 +278,7 @@ static void show_commit(struct commit *commit, void *data) if (revs->commit_format == CMIT_FMT_ONELINE) putchar(' '); else if (revs->include_header) - putchar('\n'); + print_object_term(nul_delim); if (revs->verbose_header) { struct strbuf buf = STRBUF_INIT; @@ -361,12 +375,17 @@ static void show_object(struct object *obj, const char *name, void *cb_data) printf("%s", oid_to_hex(&obj->oid)); if (arg_show_object_names) { - putchar(' '); - for (const char *p = name; *p && *p != '\n'; p++) - putchar(*p); + if (nul_delim && *name) { + putchar('\0'); + printf("%s", name); + } else if (!nul_delim) { + putchar(' '); + for (const char *p = name; *p && *p != '\n'; p++) + putchar(*p); + } } - putchar('\n'); + print_object_term(nul_delim); } static void show_edge(struct commit *commit) @@ -642,6 +661,8 @@ int cmd_rev_list(int argc, revs.exclude_promisor_objects = 1; } else if (skip_prefix(arg, "--missing=", &arg)) { parse_missing_action_value(arg); + } else if (!strcmp(arg, "-z")) { + nul_delim = 1; } } @@ -757,6 +778,14 @@ int cmd_rev_list(int argc, usage(rev_list_usage); } + + if (nul_delim) { + if (revs.graph || revs.verbose_header || show_disk_usage || + info.show_timestamp || info.header_prefix || bisect_list || + use_bitmap_index || revs.edge_hint || arg_missing_action) + die(_("-z option used with unsupported option")); + } + if (revs.commit_format != CMIT_FMT_USERFORMAT) revs.include_header = 1; if (revs.commit_format != CMIT_FMT_UNSPECIFIED) { diff --git a/t/t6000-rev-list-misc.sh b/t/t6000-rev-list-misc.sh index 6289a2e8b0..25c2f2f238 100755 --- a/t/t6000-rev-list-misc.sh +++ b/t/t6000-rev-list-misc.sh @@ -182,4 +182,38 @@ test_expect_success 'rev-list --unpacked' ' test_cmp expect actual ' +test_expect_success 'rev-list -z' ' + test_when_finished rm -rf repo && + + git init repo && + test_commit -C repo 1 && + test_commit -C repo 2 && + + oid1=$(git -C repo rev-parse HEAD) && + oid2=$(git -C repo rev-parse HEAD~) && + + printf "%s\0\0%s\0\0" "$oid1" "$oid2" >expect && + git -C repo rev-list -z HEAD >actual && + + test_cmp expect actual +' + +test_expect_success 'rev-list -z --objects' ' + test_when_finished rm -rf repo && + + git init repo && + test_commit -C repo 1 && + test_commit -C repo 2 && + + oid1=$(git -C repo rev-parse HEAD:1.t) && + oid2=$(git -C repo rev-parse HEAD:2.t) && + path1=1.t && + path2=2.t && + + printf "%s\0%s\0\0%s\0%s\0\0" "$oid1" "$path1" "$oid2" "$path2" >expect && + git -C repo rev-list -z --objects HEAD:1.t HEAD:2.t >actual && + + test_cmp expect actual +' + test_done -- 2.49.0.rc2 ^ permalink raw reply related [flat|nested] 60+ messages in thread
* Re: [PATCH 3/4] rev-list: support delimiting objects with NUL bytes 2025-03-10 19:28 ` [PATCH 3/4] rev-list: support delimiting objects with NUL bytes Justin Tobler @ 2025-03-10 20:59 ` Junio C Hamano 2025-03-12 21:39 ` Justin Tobler 2025-03-12 7:50 ` Patrick Steinhardt 1 sibling, 1 reply; 60+ messages in thread From: Junio C Hamano @ 2025-03-10 20:59 UTC (permalink / raw) To: Justin Tobler; +Cc: git, ps, christian.couder Justin Tobler <jltobler@gmail.com> writes: > +-z:: > + Instead of being newline-delimited, each outputted object is delimited > + with two NUL bytes in the following form: > ++ > +----------------------------------------------------------------------- > +<OID> NUL NUL > +----------------------------------------------------------------------- > ++ > +When the `--objects` option is also present, available object name information > +is printed in the following form without any truncation for object names > +containing newline characters: > ++ > +----------------------------------------------------------------------- > +<OID> [NUL <object-name>] NUL NUL > +----------------------------------------------------------------------- > ++ > +This option is only compatible with `--objects`. > endif::git-rev-list[] As I said, I do not think we strongly want double-NUL and would prefer to do away with a single NUL if possible. > +static int nul_delim; > static int show_disk_usage; > static off_t total_disk_usage; > static int human_readable; > > +static void print_object_term(int nul_delim) > +{ > + char line_sep = '\n'; > + > + if (nul_delim) > + line_sep = '\0'; > + > + putchar(line_sep); > + if (nul_delim) > + putchar(line_sep); > +} This looks, to put it mildly, strange. The concept of the line delimiter byte (which can take any single byte) is wider than having a NUL as the line delimiter byte. Why would we even want both? IOW, wouldn't it make more sense to have line_delim as the global (or per-invocation parameter to this function) and have print_object_term() just use it? If you want to make it behave differently only when line-delimter is NUL (which I do not recommend), you can switch on the value of line_delimiter being NUL. So I do not see a merit in having two separate variables (except for confusing future readers). ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH 3/4] rev-list: support delimiting objects with NUL bytes 2025-03-10 20:59 ` Junio C Hamano @ 2025-03-12 21:39 ` Justin Tobler 0 siblings, 0 replies; 60+ messages in thread From: Justin Tobler @ 2025-03-12 21:39 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, ps, christian.couder On 25/03/10 01:59PM, Junio C Hamano wrote: > > +static int nul_delim; > > static int show_disk_usage; > > static off_t total_disk_usage; > > static int human_readable; > > > > +static void print_object_term(int nul_delim) > > +{ > > + char line_sep = '\n'; > > + > > + if (nul_delim) > > + line_sep = '\0'; > > + > > + putchar(line_sep); > > + if (nul_delim) > > + putchar(line_sep); > > +} > > This looks, to put it mildly, strange. The concept of the line > delimiter byte (which can take any single byte) is wider than having > a NUL as the line delimiter byte. Why would we even want both? This is a fair point. There are scerarios where the printed object format varies more than just the delimiter used. For example, in the normal output mode printed object paths are truncated if they contain a newline. In the NUL-delimited mode we want to print the complete path. Similarly, missing object paths are c-quoted if they contain SP or LF characters. These should also be printed as-is when in the NUL-delimited mode. Due to branching behavior, I initially thought it would be easier to follow if there was separate variable that signaled printing behavior, but as you mentioned, we could also just use the global to hold the line terminator being used and check if it is set to NUL when there is conditional behavior. > IOW, wouldn't it make more sense to have line_delim as the global > (or per-invocation parameter to this function) and have > print_object_term() just use it? If you want to make it behave > differently only when line-delimter is NUL (which I do not > recommend), you can switch on the value of line_delimiter being NUL. > So I do not see a merit in having two separate variables (except for > confusing future readers). In the next version, I'll use a global to hold the configured line delimiter. I think it also makes sense to have a separate global variable for the metadata delimiter since in normal mode, it is usually a SP character. When the -z option is detected, we can set both of these to a NUL byte. -Justin ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH 3/4] rev-list: support delimiting objects with NUL bytes 2025-03-10 19:28 ` [PATCH 3/4] rev-list: support delimiting objects with NUL bytes Justin Tobler 2025-03-10 20:59 ` Junio C Hamano @ 2025-03-12 7:50 ` Patrick Steinhardt 2025-03-12 21:41 ` Justin Tobler 1 sibling, 1 reply; 60+ messages in thread From: Patrick Steinhardt @ 2025-03-12 7:50 UTC (permalink / raw) To: Justin Tobler; +Cc: git, christian.couder On Mon, Mar 10, 2025 at 02:28:28PM -0500, Justin Tobler wrote: > diff --git a/builtin/rev-list.c b/builtin/rev-list.c > index 04d9c893b5..86b3ce5806 100644 > --- a/builtin/rev-list.c > +++ b/builtin/rev-list.c > @@ -757,6 +778,14 @@ int cmd_rev_list(int argc, > usage(rev_list_usage); > > } > + > + if (nul_delim) { > + if (revs.graph || revs.verbose_header || show_disk_usage || > + info.show_timestamp || info.header_prefix || bisect_list || > + use_bitmap_index || revs.edge_hint || arg_missing_action) > + die(_("-z option used with unsupported option")); > + } > + Not sure whether it's worth it, but do we maybe want to add a comment here that mentions that this isn't an inherent limitation, but rather that the initial implementation simply didn't implement compatibility with these options? This would explicitly keep the door open for any future improvements in this area. Patrick ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH 3/4] rev-list: support delimiting objects with NUL bytes 2025-03-12 7:50 ` Patrick Steinhardt @ 2025-03-12 21:41 ` Justin Tobler 0 siblings, 0 replies; 60+ messages in thread From: Justin Tobler @ 2025-03-12 21:41 UTC (permalink / raw) To: Patrick Steinhardt; +Cc: git, christian.couder On 25/03/12 08:50AM, Patrick Steinhardt wrote: > On Mon, Mar 10, 2025 at 02:28:28PM -0500, Justin Tobler wrote: > > diff --git a/builtin/rev-list.c b/builtin/rev-list.c > > index 04d9c893b5..86b3ce5806 100644 > > --- a/builtin/rev-list.c > > +++ b/builtin/rev-list.c > > @@ -757,6 +778,14 @@ int cmd_rev_list(int argc, > > usage(rev_list_usage); > > > > } > > + > > + if (nul_delim) { > > + if (revs.graph || revs.verbose_header || show_disk_usage || > > + info.show_timestamp || info.header_prefix || bisect_list || > > + use_bitmap_index || revs.edge_hint || arg_missing_action) > > + die(_("-z option used with unsupported option")); > > + } > > + > > Not sure whether it's worth it, but do we maybe want to add a comment > here that mentions that this isn't an inherent limitation, but rather > that the initial implementation simply didn't implement compatibility > with these options? This would explicitly keep the door open for any > future improvements in this area. That's fair. I'll mention this is a comment that way we know support NUL-delimited mode support can be added for some of the options in the future. Thanks -Justin ^ permalink raw reply [flat|nested] 60+ messages in thread
* [PATCH 4/4] rev-list: support NUL-delimited --missing option 2025-03-10 19:28 [PATCH 0/4] rev-list: introduce NUL-delimited output mode Justin Tobler ` (2 preceding siblings ...) 2025-03-10 19:28 ` [PATCH 3/4] rev-list: support delimiting objects with NUL bytes Justin Tobler @ 2025-03-10 19:28 ` Justin Tobler 2025-03-10 20:37 ` [PATCH 0/4] rev-list: introduce NUL-delimited output mode Junio C Hamano ` (3 subsequent siblings) 7 siblings, 0 replies; 60+ messages in thread From: Justin Tobler @ 2025-03-10 19:28 UTC (permalink / raw) To: git; +Cc: ps, christian.couder, Justin Tobler The `--missing={print,print-info}` option for git-rev-list(1) prints missing objects found while performing the revision walk. Add support for printing missing objects in a NUL-delimited format when the `-z` option is enabled. $ git rev-list -z --missing=print-info <rev> <oid> NUL NUL ?<oid> [NUL <token>=<value>]... NUL NUL In this mode, values containing special characters or spaces are printed as-is without being escaped or quoted. Signed-off-by: Justin Tobler <jltobler@gmail.com> --- Documentation/rev-list-options.adoc | 10 +++++++++- builtin/rev-list.c | 27 +++++++++++++++++++------- t/t6022-rev-list-missing.sh | 30 +++++++++++++++++++++++++++++ 3 files changed, 59 insertions(+), 8 deletions(-) diff --git a/Documentation/rev-list-options.adoc b/Documentation/rev-list-options.adoc index d21016d657..48648b7507 100644 --- a/Documentation/rev-list-options.adoc +++ b/Documentation/rev-list-options.adoc @@ -378,7 +378,15 @@ containing newline characters: <OID> [NUL <object-name>] NUL NUL ----------------------------------------------------------------------- + -This option is only compatible with `--objects`. +When the `--missing` option is provided, missing objects are printed in the +following form where value is printed as-is without any token specific +encoding: ++ +----------------------------------------------------------------------- +?<OID> [NUL <token>=<value>]... NUL NUL +----------------------------------------------------------------------- ++ +This option is only compatible with `--objects` and `--missing`. endif::git-rev-list[] History Simplification diff --git a/builtin/rev-list.c b/builtin/rev-list.c index 86b3ce5806..5bbc4a787e 100644 --- a/builtin/rev-list.c +++ b/builtin/rev-list.c @@ -145,25 +145,38 @@ static void print_missing_object(struct missing_objects_map_entry *entry, int print_missing_info) { struct strbuf sb = STRBUF_INIT; + char info_sep = ' '; + + if (nul_delim) + info_sep = '\0'; + + printf("?%s", oid_to_hex(&entry->entry.oid)); if (!print_missing_info) { - printf("?%s\n", oid_to_hex(&entry->entry.oid)); + print_object_term(nul_delim); return; } if (entry->path && *entry->path) { struct strbuf path = STRBUF_INIT; - strbuf_addstr(&sb, " path="); - quote_path(entry->path, NULL, &path, QUOTE_PATH_QUOTE_SP); - strbuf_addbuf(&sb, &path); + strbuf_addf(&sb, "%cpath=", info_sep); + + if (nul_delim) { + strbuf_addstr(&sb, entry->path); + } else { + quote_path(entry->path, NULL, &path, QUOTE_PATH_QUOTE_SP); + strbuf_addbuf(&sb, &path); + } strbuf_release(&path); } if (entry->type) - strbuf_addf(&sb, " type=%s", type_name(entry->type)); + strbuf_addf(&sb, "%ctype=%s", info_sep, type_name(entry->type)); + + fwrite(sb.buf, sizeof(char), sb.len, stdout); + print_object_term(nul_delim); - printf("?%s%s\n", oid_to_hex(&entry->entry.oid), sb.buf); strbuf_release(&sb); } @@ -782,7 +795,7 @@ int cmd_rev_list(int argc, if (nul_delim) { if (revs.graph || revs.verbose_header || show_disk_usage || info.show_timestamp || info.header_prefix || bisect_list || - use_bitmap_index || revs.edge_hint || arg_missing_action) + use_bitmap_index || revs.edge_hint) die(_("-z option used with unsupported option")); } diff --git a/t/t6022-rev-list-missing.sh b/t/t6022-rev-list-missing.sh index 3e2790d4c8..3ae25e4cfb 100755 --- a/t/t6022-rev-list-missing.sh +++ b/t/t6022-rev-list-missing.sh @@ -198,4 +198,34 @@ do ' done +test_expect_success "-z nul-delimited --missing" ' + test_when_finished rm -rf repo && + + git init repo && + ( + cd repo && + git commit --allow-empty -m first && + + path="foo bar" && + echo foobar >"$path" && + git add -A && + git commit -m second && + + oid=$(git rev-parse "HEAD:$path") && + type="$(git cat-file -t $oid)" && + + obj_path=".git/objects/$(test_oid_to_path $oid)" && + + git rev-list -z --objects --no-object-names \ + HEAD ^"$oid" >expect && + printf "?%s\0path=%s\0type=%s\0\0" "$oid" "$path" "$type" >>expect && + + mv "$obj_path" "$obj_path.hidden" && + git rev-list -z --objects --no-object-names \ + --missing=print-info HEAD >actual && + + test_cmp expect actual + ) +' + test_done -- 2.49.0.rc2 ^ permalink raw reply related [flat|nested] 60+ messages in thread
* Re: [PATCH 0/4] rev-list: introduce NUL-delimited output mode 2025-03-10 19:28 [PATCH 0/4] rev-list: introduce NUL-delimited output mode Justin Tobler ` (3 preceding siblings ...) 2025-03-10 19:28 ` [PATCH 4/4] rev-list: support NUL-delimited --missing option Justin Tobler @ 2025-03-10 20:37 ` Junio C Hamano 2025-03-10 21:08 ` Junio C Hamano 2025-03-11 23:19 ` Justin Tobler 2025-03-10 22:38 ` D. Ben Knoble ` (2 subsequent siblings) 7 siblings, 2 replies; 60+ messages in thread From: Junio C Hamano @ 2025-03-10 20:37 UTC (permalink / raw) To: Justin Tobler; +Cc: git, ps, christian.couder Justin Tobler <jltobler@gmail.com> writes: > ?<oid> [SP <token>=<value>]... LF > > where values containing LF or SP are printed in a token specific fashion > so that the resulting encoded value does not contain either of these two > problematic bytes. For example, missing object paths are quoted in the C > style so they contain LF or SP. "so" -> "when"??? > To make machine parsing easier, this series introduces a NUL-delimited > output mode for git-rev-list(1) via a `-z` option following a suggestion > from Junio in a previous thread[1]. In this mode, instead of LF, each > object is delimited with two NUL bytes and any object metadata is > separated with a single NUL byte. Examples: > > <oid> NUL NUL > <oid> [NUL <path>] NUL NUL Why do we need double-NUL in the above two cases? > ?<oid> [NUL <token>=<value>]... NUL NUL This one I understand; we could do without double-NUL and take the lack of "=" in the token after NUL termination as the sign that the previous record ended, though, to avoid double-NUL while keeping the format extensible. As this topic is designing essentially a new and machine parseable format, we could even unify all three formats into one. For example, the format could be like this: <oid> NUL [<attr>=<value> NUL]... where (1) A record ends when a new record begins. (2) The beginning of a new record is signaled by <oid> that is all hexadecimal and does not have any '=' in it. (3) The traditional "rev-list --objects" output that gives path in addition to the object name uses "path" as the <attr> name, i.e. such a record looks like "<oid> NUL path=<path> NUL". (4) The traditional "rev-list --missing" output loses the leading "?"; it is replaced by "missing" as the <attr> name, i.e. such a record may look like "<oid> NUL missing=yes NUL..." together with other "<token>=<value> NUL" pairs appended as needed at the end. Hmm? ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH 0/4] rev-list: introduce NUL-delimited output mode 2025-03-10 20:37 ` [PATCH 0/4] rev-list: introduce NUL-delimited output mode Junio C Hamano @ 2025-03-10 21:08 ` Junio C Hamano 2025-03-11 23:24 ` Justin Tobler 2025-03-11 23:19 ` Justin Tobler 1 sibling, 1 reply; 60+ messages in thread From: Junio C Hamano @ 2025-03-10 21:08 UTC (permalink / raw) To: Justin Tobler; +Cc: git, ps, christian.couder Junio C Hamano <gitster@pobox.com> writes: > As this topic is designing essentially a new and machine parseable > format, we could even unify all three formats into one. For example, > the format could be like this: > > <oid> NUL [<attr>=<value> NUL]... > > where (0) "rev-list" that gives only a sequence of "<oid>" for commit-ish, as well as "rev-list --boundary", would fall out as a natural consequence. Bog-standard "list of commits" would see a sequence of "<oid> NUL", while a boundary object would see "<oid> NUL boundary=yes NUL". > (1) A record ends when a new record begins. > > (2) The beginning of a new record is signaled by <oid> that is all > hexadecimal and does not have any '=' in it. > > (3) The traditional "rev-list --objects" output that gives path in > addition to the object name uses "path" as the <attr> name, > i.e. such a record looks like "<oid> NUL path=<path> NUL". > > (4) The traditional "rev-list --missing" output loses the leading > "?"; it is replaced by "missing" as the <attr> name, i.e. such > a record may look like "<oid> NUL missing=yes NUL..." together > with other "<token>=<value> NUL" pairs appended as needed at > the end. ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH 0/4] rev-list: introduce NUL-delimited output mode 2025-03-10 21:08 ` Junio C Hamano @ 2025-03-11 23:24 ` Justin Tobler 0 siblings, 0 replies; 60+ messages in thread From: Justin Tobler @ 2025-03-11 23:24 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, ps, christian.couder On 25/03/10 02:08PM, Junio C Hamano wrote: > Junio C Hamano <gitster@pobox.com> writes: > > > As this topic is designing essentially a new and machine parseable > > format, we could even unify all three formats into one. For example, > > the format could be like this: > > > > <oid> NUL [<attr>=<value> NUL]... > > > > where > > (0) "rev-list" that gives only a sequence of "<oid>" for commit-ish, > as well as "rev-list --boundary", would fall out as a natural > consequence. Bog-standard "list of commits" would see a > sequence of "<oid> NUL", while a boundary object would see > "<oid> NUL boundary=yes NUL". I had not considered handling the `--boundary` option. It looks like boundary objects are printed as part of `show_commit()`, so I can adapt the handling and do something similar to missing objects: $ git rev-list -z --boundary <rev> <oid> NUL boundary=yes NUL This would remain consistent with the unified format. -Justin ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH 0/4] rev-list: introduce NUL-delimited output mode 2025-03-10 20:37 ` [PATCH 0/4] rev-list: introduce NUL-delimited output mode Junio C Hamano 2025-03-10 21:08 ` Junio C Hamano @ 2025-03-11 23:19 ` Justin Tobler 2025-03-11 23:44 ` Junio C Hamano 1 sibling, 1 reply; 60+ messages in thread From: Justin Tobler @ 2025-03-11 23:19 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, ps, christian.couder On 25/03/10 01:37PM, Junio C Hamano wrote: > Justin Tobler <jltobler@gmail.com> writes: > > To make machine parsing easier, this series introduces a NUL-delimited > > output mode for git-rev-list(1) via a `-z` option following a suggestion > > from Junio in a previous thread[1]. In this mode, instead of LF, each > > object is delimited with two NUL bytes and any object metadata is > > separated with a single NUL byte. Examples: > > > > <oid> NUL NUL > > <oid> [NUL <path>] NUL NUL > > Why do we need double-NUL in the above two cases? In the `<oid> [NUL <path>] NUL NUL` case, it would technically be possible for an object path to match an OID. The use of two NUL bytes signals when the object record ends. Without someother mechanism to know when a record starts/stops, even the `<oid> NUL NUL` case would need the two trailing NUL bytes to avoid being considered a potential path. If the output format would not result in any additional object metadata being appended, we could use a single NUL byte to delimit between objects in this case, but always using two NUL bytes allowed for a more consistent format. > > > ?<oid> [NUL <token>=<value>]... NUL NUL > > This one I understand; we could do without double-NUL and take the > lack of "=" in the token after NUL termination as the sign that the > previous record ended, though, to avoid double-NUL while keeping the > format extensible. > > As this topic is designing essentially a new and machine parseable > format, we could even unify all three formats into one. For example, > the format could be like this: > > <oid> NUL [<attr>=<value> NUL]... I was also considering something similar. This format could allow other object metadata like `--timestamp` to be supported in the future with a more flexible format. In the next version I'll implement a unified format here. > > where > > (1) A record ends when a new record begins. > > (2) The beginning of a new record is signaled by <oid> that is all > hexadecimal and does not have any '=' in it. I think this is a good idea. By always appending printed object metadata in the form `<token>=<value>`, we know that any entry without '=' must be the start of a new record. This removes the need for the two NUL bytes to indicate the end of a record. I'll use only a single NUL byte to delimit in the next version. > > (3) The traditional "rev-list --objects" output that gives path in > addition to the object name uses "path" as the <attr> name, > i.e. such a record looks like "<oid> NUL path=<path> NUL". > > (4) The traditional "rev-list --missing" output loses the leading > "?"; it is replaced by "missing" as the <attr> name, i.e. such > a record may look like "<oid> NUL missing=yes NUL..." together > with other "<token>=<value> NUL" pairs appended as needed at > the end. I think this is good. Instead of prefixing missing OIDs with '?', we can just append another token/value pair `missing=yes`. Thanks, -Justin ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH 0/4] rev-list: introduce NUL-delimited output mode 2025-03-11 23:19 ` Justin Tobler @ 2025-03-11 23:44 ` Junio C Hamano 2025-03-12 7:37 ` Patrick Steinhardt 0 siblings, 1 reply; 60+ messages in thread From: Junio C Hamano @ 2025-03-11 23:44 UTC (permalink / raw) To: Justin Tobler; +Cc: git, ps, christian.couder Justin Tobler <jltobler@gmail.com> writes: >> (4) The traditional "rev-list --missing" output loses the leading >> "?"; it is replaced by "missing" as the <attr> name, i.e. such >> a record may look like "<oid> NUL missing=yes NUL..." together >> with other "<token>=<value> NUL" pairs appended as needed at >> the end. > > I think this is good. Instead of prefixing missing OIDs with '?', we can > just append another token/value pair `missing=yes`. And we may want to avoid excessive bloat of the output that is not primarily meant for human consumption, in which case perhaps a common things like "missing" and "path" can be given a shorter token, perhaps like "m=yes" and "p=Makefile"? ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH 0/4] rev-list: introduce NUL-delimited output mode 2025-03-11 23:44 ` Junio C Hamano @ 2025-03-12 7:37 ` Patrick Steinhardt 2025-03-12 21:45 ` Justin Tobler 0 siblings, 1 reply; 60+ messages in thread From: Patrick Steinhardt @ 2025-03-12 7:37 UTC (permalink / raw) To: Junio C Hamano; +Cc: Justin Tobler, git, christian.couder On Tue, Mar 11, 2025 at 04:44:10PM -0700, Junio C Hamano wrote: > Justin Tobler <jltobler@gmail.com> writes: > > >> (4) The traditional "rev-list --missing" output loses the leading > >> "?"; it is replaced by "missing" as the <attr> name, i.e. such > >> a record may look like "<oid> NUL missing=yes NUL..." together > >> with other "<token>=<value> NUL" pairs appended as needed at > >> the end. > > > > I think this is good. Instead of prefixing missing OIDs with '?', we can > > just append another token/value pair `missing=yes`. > > And we may want to avoid excessive bloat of the output that is not > primarily meant for human consumption, in which case perhaps a > common things like "missing" and "path" can be given a shorter > token, perhaps like "m=yes" and "p=Makefile"? I would prefer to keep longer paths here: - While the output typically shouldn't be seen by a human, the code handling it both on the printing and on the receiving side would. And there it helps the reader quite a bit to get additional context rather than cryptic single-letter abbreviations that one would have to always look up. - By keeping with long names we also avoid "letter squatting" when two attributes of an object start with the same letter. - I doubt that the couple of extra characters is really going to matter much given that most of the time will most likely be spent reading and decompressing the objects from disk anyway. Patrick ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH 0/4] rev-list: introduce NUL-delimited output mode 2025-03-12 7:37 ` Patrick Steinhardt @ 2025-03-12 21:45 ` Justin Tobler 0 siblings, 0 replies; 60+ messages in thread From: Justin Tobler @ 2025-03-12 21:45 UTC (permalink / raw) To: Patrick Steinhardt; +Cc: Junio C Hamano, git, christian.couder On 25/03/12 08:37AM, Patrick Steinhardt wrote: > On Tue, Mar 11, 2025 at 04:44:10PM -0700, Junio C Hamano wrote: > > Justin Tobler <jltobler@gmail.com> writes: > > > > >> (4) The traditional "rev-list --missing" output loses the leading > > >> "?"; it is replaced by "missing" as the <attr> name, i.e. such > > >> a record may look like "<oid> NUL missing=yes NUL..." together > > >> with other "<token>=<value> NUL" pairs appended as needed at > > >> the end. > > > > > > I think this is good. Instead of prefixing missing OIDs with '?', we can > > > just append another token/value pair `missing=yes`. > > > > And we may want to avoid excessive bloat of the output that is not > > primarily meant for human consumption, in which case perhaps a > > common things like "missing" and "path" can be given a shorter > > token, perhaps like "m=yes" and "p=Makefile"? > > I would prefer to keep longer paths here: > > - While the output typically shouldn't be seen by a human, the code > handling it both on the printing and on the receiving side would. > And there it helps the reader quite a bit to get additional context > rather than cryptic single-letter abbreviations that one would have > to always look up. > > - By keeping with long names we also avoid "letter squatting" when two > attributes of an object start with the same letter. > > - I doubt that the couple of extra characters is really going to > matter much given that most of the time will most likely be spent > reading and decompressing the objects from disk anyway. I also would prefer to keep the longer token names here. I don't think the extra bytes should be much of a concern and it allows us to avoid having to conditionally change the printed token name when in the NUL-delimited mode for missing object metadata. -Justin ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH 0/4] rev-list: introduce NUL-delimited output mode 2025-03-10 19:28 [PATCH 0/4] rev-list: introduce NUL-delimited output mode Justin Tobler ` (4 preceding siblings ...) 2025-03-10 20:37 ` [PATCH 0/4] rev-list: introduce NUL-delimited output mode Junio C Hamano @ 2025-03-10 22:38 ` D. Ben Knoble 2025-03-11 22:59 ` Justin Tobler 2025-03-11 23:57 ` Jeff King 2025-03-13 0:17 ` [PATCH v2 0/6] " Justin Tobler 7 siblings, 1 reply; 60+ messages in thread From: D. Ben Knoble @ 2025-03-10 22:38 UTC (permalink / raw) To: Justin Tobler; +Cc: git, ps, christian.couder, Junio C Hamano On Mon, Mar 10, 2025 at 3:32 PM Justin Tobler <jltobler@gmail.com> wrote: > > When walking objects, git-rev-list(1) prints each object entry on a > separate line in the form: > > <oid> LF > > Some options, such as `--objects`, may print additional information > about the object on the same line: > > <oid> SP [<path>] LF > > In this mode, if the object path contains a newline it is truncated at > the newline. > > When the `--missing={print,print-info}` option is provided, information > about any missing objects encountered during the object walk are also > printed in the form: > > ?<oid> [SP <token>=<value>]... LF > > where values containing LF or SP are printed in a token specific fashion > so that the resulting encoded value does not contain either of these two > problematic bytes. For example, missing object paths are quoted in the C > style so they contain LF or SP. > > To make machine parsing easier, this series introduces a NUL-delimited > output mode for git-rev-list(1) via a `-z` option following a suggestion > from Junio in a previous thread[1]. In this mode, instead of LF, each > object is delimited with two NUL bytes and any object metadata is > separated with a single NUL byte. Examples: > > <oid> NUL NUL > <oid> [NUL <path>] NUL NUL > ?<oid> [NUL <token>=<value>]... NUL NUL > > In this mode, path and value info are printed as-is without any special > encoding or truncation. > > For now this series only adds support for use with the `--objects` and > `--missing` options. Usage of `-z` with other options is rejected, so it > can potentially be added in the future. > > One idea I had, but did not implement in this version, was to also use > the `<token>=<value>` format for regular non-missing object info while > in the NUL-delimited mode. I could see this being a bit more flexible > instead of relying strictly on order. Interested if anyone has thoughts > on this. :) Without taking a deeper look, I think token=value has the benefit of being self-describing at the cost of more output bytes (which might matter over the wire, for example). Generally I like the idea; sometimes I find it troublesome having to parse prose manuals for the specifics of output formats like field order, especially when I end up coding a parser for the format. If the field order doesn’t matter to the consumer, then perhaps using ordered fields AWK-style is inappropriately terse? OTOH, the -z format is for machines, and they don’t need human labels ;) [I think token labels would be a great parser-writing and debugging aid] Best, Ben > > This series is structured as follows: > > - Patches 1 and 2 do some minor preparatory refactors. > > - Patch 3 adds the `-z` option to git-rev-list(1) to print > objects in a NUL-delimited fashion. Printed object paths with > the `--objects` option are also handled. > > - Patch 4 teaches the `--missing` option how to print info in a > NUL-delimited fashion. > > Thanks for taking a look, > -Justin > > [1]: <xmqq5xlor0la.fsf@gitster.g> > > Justin Tobler (4): > rev-list: inline `show_object_with_name()` in `show_object()` > rev-list: refactor early option parsing > rev-list: support delimiting objects with NUL bytes > rev-list: support NUL-delimited --missing option > > Documentation/rev-list-options.adoc | 26 +++++++++ > builtin/rev-list.c | 86 ++++++++++++++++++++++------- > revision.c | 8 --- > revision.h | 2 - > t/t6000-rev-list-misc.sh | 34 ++++++++++++ > t/t6022-rev-list-missing.sh | 30 ++++++++++ > 6 files changed, 155 insertions(+), 31 deletions(-) > > > base-commit: 87a0bdbf0f72b7561f3cd50636eee33dcb7dbcc3 > -- > 2.49.0.rc2 > > -- D. Ben Knoble ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH 0/4] rev-list: introduce NUL-delimited output mode 2025-03-10 22:38 ` D. Ben Knoble @ 2025-03-11 22:59 ` Justin Tobler 0 siblings, 0 replies; 60+ messages in thread From: Justin Tobler @ 2025-03-11 22:59 UTC (permalink / raw) To: D. Ben Knoble; +Cc: git, ps, christian.couder, Junio C Hamano On 25/03/10 06:38PM, D. Ben Knoble wrote: > On Mon, Mar 10, 2025 at 3:32 PM Justin Tobler <jltobler@gmail.com> wrote: > > > > When walking objects, git-rev-list(1) prints each object entry on a > > separate line in the form: > > > > <oid> LF > > > > Some options, such as `--objects`, may print additional information > > about the object on the same line: > > > > <oid> SP [<path>] LF > > > > In this mode, if the object path contains a newline it is truncated at > > the newline. > > > > When the `--missing={print,print-info}` option is provided, information > > about any missing objects encountered during the object walk are also > > printed in the form: > > > > ?<oid> [SP <token>=<value>]... LF > > > > where values containing LF or SP are printed in a token specific fashion > > so that the resulting encoded value does not contain either of these two > > problematic bytes. For example, missing object paths are quoted in the C > > style so they contain LF or SP. > > > > To make machine parsing easier, this series introduces a NUL-delimited > > output mode for git-rev-list(1) via a `-z` option following a suggestion > > from Junio in a previous thread[1]. In this mode, instead of LF, each > > object is delimited with two NUL bytes and any object metadata is > > separated with a single NUL byte. Examples: > > > > <oid> NUL NUL > > <oid> [NUL <path>] NUL NUL > > ?<oid> [NUL <token>=<value>]... NUL NUL > > > > In this mode, path and value info are printed as-is without any special > > encoding or truncation. > > > > For now this series only adds support for use with the `--objects` and > > `--missing` options. Usage of `-z` with other options is rejected, so it > > can potentially be added in the future. > > > > One idea I had, but did not implement in this version, was to also use > > the `<token>=<value>` format for regular non-missing object info while > > in the NUL-delimited mode. I could see this being a bit more flexible > > instead of relying strictly on order. Interested if anyone has thoughts > > on this. :) > > Without taking a deeper look, I think token=value has the benefit of > being self-describing at the cost of more output bytes (which might > matter over the wire, for example). Generally I like the idea; > sometimes I find it troublesome having to parse prose manuals for the > specifics of output formats like field order, especially when I end up > coding a parser for the format. If the field order doesn’t matter to > the consumer, then perhaps using ordered fields AWK-style is > inappropriately terse? > > OTOH, the -z format is for machines, and they don’t need human labels > ;) [I think token labels would be a great parser-writing and debugging > aid] One of the challenges with parsing git-rev-list(1) is all the various forms it can take based on the options provided. For example: $ git rev-list --timestamp --objects --parents <rev> timestamp SP <oid> [SP <parent oid>] LF (commit) <oid> SP [<path>] LF (tree/blob) Relying strictly on order can be a bit tricky to parse due to how the output format can change even line to line. So even for machine parsing, labels may help simplify things if all object records follow something along the lines of: <oid> NUL [<token>=<value> NUL]... As you mentioned, this could potentially also be useful for users since the attributes would be self-describing. This series is currently focussed on the machine parsing side, but I think support for this mode in a human-readable format could be added via a separate option in the future. -Justin ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH 0/4] rev-list: introduce NUL-delimited output mode 2025-03-10 19:28 [PATCH 0/4] rev-list: introduce NUL-delimited output mode Justin Tobler ` (5 preceding siblings ...) 2025-03-10 22:38 ` D. Ben Knoble @ 2025-03-11 23:57 ` Jeff King 2025-03-12 7:42 ` Patrick Steinhardt 2025-03-12 22:09 ` Justin Tobler 2025-03-13 0:17 ` [PATCH v2 0/6] " Justin Tobler 7 siblings, 2 replies; 60+ messages in thread From: Jeff King @ 2025-03-11 23:57 UTC (permalink / raw) To: Justin Tobler; +Cc: git, ps, christian.couder On Mon, Mar 10, 2025 at 02:28:25PM -0500, Justin Tobler wrote: > To make machine parsing easier, this series introduces a NUL-delimited > output mode for git-rev-list(1) via a `-z` option following a suggestion > from Junio in a previous thread[1]. In this mode, instead of LF, each > object is delimited with two NUL bytes and any object metadata is > separated with a single NUL byte. Examples: > > <oid> NUL NUL > <oid> [NUL <path>] NUL NUL > ?<oid> [NUL <token>=<value>]... NUL NUL > > In this mode, path and value info are printed as-is without any special > encoding or truncation. I think this is a good direction, but I have two compatibility questions: 1. What should "git rev-list -z --stdin" do? In most other programs with a "-z" option it affects both input and output. I don't particularly care about this case myself, but it will be hard to change later. So we probably want to decide now. 2. I was a little surprised that rev-list already takes a "-z" option, but it doesn't seem to do anything. I guess it's probably picked up via diff_opt_parse(), though (I think) you can't actually trigger a diff via rev-list itself. So even though this is a change in behavior, probably nobody would have been using it until now? If it is possible to see some effect from "-z" now (I didn't dig very far), then it may be better to continue to let the diff options parser handle it, and simply pick the result out of revs.diffopt.line_termination. As your patch 3 is written, I think the diff code probably doesn't see it anymore at all. -Peff ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH 0/4] rev-list: introduce NUL-delimited output mode 2025-03-11 23:57 ` Jeff King @ 2025-03-12 7:42 ` Patrick Steinhardt 2025-03-12 15:56 ` Junio C Hamano 2025-03-12 22:09 ` Justin Tobler 1 sibling, 1 reply; 60+ messages in thread From: Patrick Steinhardt @ 2025-03-12 7:42 UTC (permalink / raw) To: Jeff King; +Cc: Justin Tobler, git, christian.couder On Tue, Mar 11, 2025 at 07:57:20PM -0400, Jeff King wrote: > On Mon, Mar 10, 2025 at 02:28:25PM -0500, Justin Tobler wrote: > > > To make machine parsing easier, this series introduces a NUL-delimited > > output mode for git-rev-list(1) via a `-z` option following a suggestion > > from Junio in a previous thread[1]. In this mode, instead of LF, each > > object is delimited with two NUL bytes and any object metadata is > > separated with a single NUL byte. Examples: > > > > <oid> NUL NUL > > <oid> [NUL <path>] NUL NUL > > ?<oid> [NUL <token>=<value>]... NUL NUL > > > > In this mode, path and value info are printed as-is without any special > > encoding or truncation. > > I think this is a good direction, but I have two compatibility > questions: > > 1. What should "git rev-list -z --stdin" do? In most other programs > with a "-z" option it affects both input and output. I don't > particularly care about this case myself, but it will be hard to > change later. So we probably want to decide now. I would lean into the direction of making "-z" change the format both for stdin and stdout. That's what we do in most cases, and in those cases where we didn't we came to regret it (git-cat-file(1)). Patrick ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH 0/4] rev-list: introduce NUL-delimited output mode 2025-03-12 7:42 ` Patrick Steinhardt @ 2025-03-12 15:56 ` Junio C Hamano 2025-03-13 7:46 ` Patrick Steinhardt 0 siblings, 1 reply; 60+ messages in thread From: Junio C Hamano @ 2025-03-12 15:56 UTC (permalink / raw) To: Patrick Steinhardt; +Cc: Jeff King, Justin Tobler, git, christian.couder Patrick Steinhardt <ps@pks.im> writes: > On Tue, Mar 11, 2025 at 07:57:20PM -0400, Jeff King wrote: >> On Mon, Mar 10, 2025 at 02:28:25PM -0500, Justin Tobler wrote: >> >> > To make machine parsing easier, this series introduces a NUL-delimited >> > output mode for git-rev-list(1) via a `-z` option following a suggestion >> > from Junio in a previous thread[1]. In this mode, instead of LF, each >> > object is delimited with two NUL bytes and any object metadata is >> > separated with a single NUL byte. Examples: >> > >> > <oid> NUL NUL >> > <oid> [NUL <path>] NUL NUL >> > ?<oid> [NUL <token>=<value>]... NUL NUL >> > >> > In this mode, path and value info are printed as-is without any special >> > encoding or truncation. >> >> I think this is a good direction, but I have two compatibility >> questions: >> >> 1. What should "git rev-list -z --stdin" do? In most other programs >> with a "-z" option it affects both input and output. I don't >> particularly care about this case myself, but it will be hard to >> change later. So we probably want to decide now. > > I would lean into the direction of making "-z" change the format both > for stdin and stdout. That's what we do in most cases, and in those > cases where we didn't we came to regret it (git-cat-file(1)). I've seen "-Z", in addition to "-z", used to differentiate between input and output in some commands. If we are not going to do that, I agree that making "-z" to affect both input and output is less surprising than having to remember which side is still text. ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH 0/4] rev-list: introduce NUL-delimited output mode 2025-03-12 15:56 ` Junio C Hamano @ 2025-03-13 7:46 ` Patrick Steinhardt 0 siblings, 0 replies; 60+ messages in thread From: Patrick Steinhardt @ 2025-03-13 7:46 UTC (permalink / raw) To: Junio C Hamano; +Cc: Jeff King, Justin Tobler, git, christian.couder On Wed, Mar 12, 2025 at 08:56:01AM -0700, Junio C Hamano wrote: > Patrick Steinhardt <ps@pks.im> writes: > > > On Tue, Mar 11, 2025 at 07:57:20PM -0400, Jeff King wrote: > >> On Mon, Mar 10, 2025 at 02:28:25PM -0500, Justin Tobler wrote: > >> > >> > To make machine parsing easier, this series introduces a NUL-delimited > >> > output mode for git-rev-list(1) via a `-z` option following a suggestion > >> > from Junio in a previous thread[1]. In this mode, instead of LF, each > >> > object is delimited with two NUL bytes and any object metadata is > >> > separated with a single NUL byte. Examples: > >> > > >> > <oid> NUL NUL > >> > <oid> [NUL <path>] NUL NUL > >> > ?<oid> [NUL <token>=<value>]... NUL NUL > >> > > >> > In this mode, path and value info are printed as-is without any special > >> > encoding or truncation. > >> > >> I think this is a good direction, but I have two compatibility > >> questions: > >> > >> 1. What should "git rev-list -z --stdin" do? In most other programs > >> with a "-z" option it affects both input and output. I don't > >> particularly care about this case myself, but it will be hard to > >> change later. So we probably want to decide now. > > > > I would lean into the direction of making "-z" change the format both > > for stdin and stdout. That's what we do in most cases, and in those > > cases where we didn't we came to regret it (git-cat-file(1)). > > I've seen "-Z", in addition to "-z", used to differentiate between > input and output in some commands. If we are not going to do that, > I agree that making "-z" to affect both input and output is less > surprising than having to remember which side is still text. Yeah, "-Z" is what I meant with git-cat-file(1) above. I just think it's safer to always use NUL as delimiter in machine parsing -- and while one often thinks initially that it may not be needed either in stdin or stdout, my learning is that eventually you always find an edge case where you do need NUL termination in both. So I'd rather want to push application developers into the right direction and make "-z" adjust both streams. Patrick ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH 0/4] rev-list: introduce NUL-delimited output mode 2025-03-11 23:57 ` Jeff King 2025-03-12 7:42 ` Patrick Steinhardt @ 2025-03-12 22:09 ` Justin Tobler 2025-03-13 5:33 ` Jeff King 1 sibling, 1 reply; 60+ messages in thread From: Justin Tobler @ 2025-03-12 22:09 UTC (permalink / raw) To: Jeff King; +Cc: git, ps, christian.couder On 25/03/11 07:57PM, Jeff King wrote: > On Mon, Mar 10, 2025 at 02:28:25PM -0500, Justin Tobler wrote: > > > To make machine parsing easier, this series introduces a NUL-delimited > > output mode for git-rev-list(1) via a `-z` option following a suggestion > > from Junio in a previous thread[1]. In this mode, instead of LF, each > > object is delimited with two NUL bytes and any object metadata is > > separated with a single NUL byte. Examples: > > > > <oid> NUL NUL > > <oid> [NUL <path>] NUL NUL > > ?<oid> [NUL <token>=<value>]... NUL NUL > > > > In this mode, path and value info are printed as-is without any special > > encoding or truncation. > > I think this is a good direction, but I have two compatibility > questions: > > 1. What should "git rev-list -z --stdin" do? In most other programs > with a "-z" option it affects both input and output. I don't > particularly care about this case myself, but it will be hard to > change later. So we probably want to decide now. As others suggested in this thread, in the next version I'll make revision and pathspec parsing on stdin also be NUL-delimited when -z is used. > 2. I was a little surprised that rev-list already takes a "-z" option, > but it doesn't seem to do anything. I guess it's probably picked up > via diff_opt_parse(), though (I think) you can't actually trigger a > diff via rev-list itself. So even though this is a change in > behavior, probably nobody would have been using it until now? This is correct. Technically git-rev-list(1) accepts all the common diff options because it invokes `setup_revisions()`. From my understanding it isn't possible to trigger diffs so I think parsing diff options is unnecessary to begin with. As it also isn't a documented option, I figured -z being an accepted option for git-rev-list(1) is largely an unintended consequence of it using `setup_revisions()` and should be fine to use. > If it is possible to see some effect from "-z" now (I didn't dig > very far), then it may be better to continue to let the diff > options parser handle it, and simply pick the result out of > revs.diffopt.line_termination. As your patch 3 is written, I think > the diff code probably doesn't see it anymore at all. As currently implemented, the early parsing of -z doesn't effect the diff options parsing in `setup_revisions()`. The early parsing doesn't remove the option and thus it continues to be set in the diff options. Furthermore, revision and pathspec argument parsing is all handled in `setup_revisions()` so if we want to NUL-delimit arguments parsed on stdin with -z, we would still need to parse the option early anyway. I think it should be fine to leave the early -z option parsing as-is. Thanks, -Justin ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH 0/4] rev-list: introduce NUL-delimited output mode 2025-03-12 22:09 ` Justin Tobler @ 2025-03-13 5:33 ` Jeff King 2025-03-13 16:41 ` Justin Tobler 0 siblings, 1 reply; 60+ messages in thread From: Jeff King @ 2025-03-13 5:33 UTC (permalink / raw) To: Justin Tobler; +Cc: git, ps, christian.couder On Wed, Mar 12, 2025 at 05:09:41PM -0500, Justin Tobler wrote: > > If it is possible to see some effect from "-z" now (I didn't dig > > very far), then it may be better to continue to let the diff > > options parser handle it, and simply pick the result out of > > revs.diffopt.line_termination. As your patch 3 is written, I think > > the diff code probably doesn't see it anymore at all. > > As currently implemented, the early parsing of -z doesn't effect the > diff options parsing in `setup_revisions()`. The early parsing doesn't > remove the option and thus it continues to be set in the diff options. Ah, OK. From the diff context I didn't realize it was not in the main option parsing loop. That makes sense. > Furthermore, revision and pathspec argument parsing is all handled in > `setup_revisions()` so if we want to NUL-delimit arguments parsed on > stdin with -z, we would still need to parse the option early anyway. I > think it should be fine to leave the early -z option parsing as-is. Makes sense. And I guess we might not want to have setup_revisions() do that handling of "-z" for input, as that would make: git log --stdin --raw -z behave differently (since it does not currently change stdin handling, only the diff output). Though that does mean that these two commands will behave differently: git log --stdin -z git rev-list --stdin -z which seems...not great. My earlier suggestion to tie "-z" to stdin handling was for consistency with other tools like grep. But if we already have cases where "-z" is only for output, maybe it is better to stay consistent with other parts of git. I.e., I was worried about us painting ourselves into a corner with your patches, but we may have already done so years ago. ;) -Peff ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH 0/4] rev-list: introduce NUL-delimited output mode 2025-03-13 5:33 ` Jeff King @ 2025-03-13 16:41 ` Justin Tobler 2025-03-14 2:49 ` Jeff King 0 siblings, 1 reply; 60+ messages in thread From: Justin Tobler @ 2025-03-13 16:41 UTC (permalink / raw) To: Jeff King; +Cc: git, ps, christian.couder On 25/03/13 01:33AM, Jeff King wrote: > > Furthermore, revision and pathspec argument parsing is all handled in > > `setup_revisions()` so if we want to NUL-delimit arguments parsed on > > stdin with -z, we would still need to parse the option early anyway. I > > think it should be fine to leave the early -z option parsing as-is. > > Makes sense. And I guess we might not want to have setup_revisions() do > that handling of "-z" for input, as that would make: > > git log --stdin --raw -z > > behave differently (since it does not currently change stdin handling, > only the diff output). Yes, we won't want to include this '-z' parsing directly in `setup_revisions()` or else it would change the behavior of other commands. In version two of this series, NUL-delimited stdin handling for `setup_revisions()` is triggered by setting a `nul_delim_stdin` field in `setup_revision_opt`. This gives the `setup_revisions()` caller the ability to control the parsing delimiter itself. Only in git-rev-list(1) does the stdin parsing behavior change if '-z' is also present. The behavior stdin parsing for `git log -z --stdin` remains unchanged. > Though that does mean that these two commands > will behave differently: > > git log --stdin -z > git rev-list --stdin -z > > which seems...not great. My earlier suggestion to tie "-z" to stdin > handling was for consistency with other tools like grep. But if we > already have cases where "-z" is only for output, maybe it is better to > stay consistent with other parts of git. I.e., I was worried about us > painting ourselves into a corner with your patches, but we may have > already done so years ago. ;) I think to some extent Git is already inconsistent here. IMO it would be preferable for both input and output to use NUL as the delimiter when machine parsing in git-rev-list(1) as that is the behavior I would personally expect. I also agree with Patrick's reasoning else where in this thread[1]. I'm open to discuss further though :) Thanks, -Justin [1]: <Z9KNQ8XliWrrYgAT@pks.im> ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH 0/4] rev-list: introduce NUL-delimited output mode 2025-03-13 16:41 ` Justin Tobler @ 2025-03-14 2:49 ` Jeff King 2025-03-14 17:02 ` Junio C Hamano 0 siblings, 1 reply; 60+ messages in thread From: Jeff King @ 2025-03-14 2:49 UTC (permalink / raw) To: Justin Tobler; +Cc: git, ps, christian.couder On Thu, Mar 13, 2025 at 11:41:56AM -0500, Justin Tobler wrote: > > Though that does mean that these two commands > > will behave differently: > > > > git log --stdin -z > > git rev-list --stdin -z > > > > which seems...not great. My earlier suggestion to tie "-z" to stdin > > handling was for consistency with other tools like grep. But if we > > already have cases where "-z" is only for output, maybe it is better to > > stay consistent with other parts of git. I.e., I was worried about us > > painting ourselves into a corner with your patches, but we may have > > already done so years ago. ;) > > I think to some extent Git is already inconsistent here. IMO it would be > preferable for both input and output to use NUL as the delimiter when > machine parsing in git-rev-list(1) as that is the behavior I would > personally expect. I also agree with Patrick's reasoning else where in > this thread[1]. > > I'm open to discuss further though :) Nope, I don't have anything further to add. I just wanted to make sure my suggestion in the earlier part of the thread was not leading us accidentally down a bad path. ;) If the inconsistencies have been considered and we're OK with them, that sounds good to me (and the option handling in your v2 does seem to faithfully implement that). Thanks for all your responses. -Peff ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH 0/4] rev-list: introduce NUL-delimited output mode 2025-03-14 2:49 ` Jeff King @ 2025-03-14 17:02 ` Junio C Hamano 2025-03-14 18:59 ` Jeff King 0 siblings, 1 reply; 60+ messages in thread From: Junio C Hamano @ 2025-03-14 17:02 UTC (permalink / raw) To: Jeff King; +Cc: Justin Tobler, git, ps, christian.couder Jeff King <peff@peff.net> writes: > On Thu, Mar 13, 2025 at 11:41:56AM -0500, Justin Tobler wrote: > >> > Though that does mean that these two commands >> > will behave differently: >> > >> > git log --stdin -z >> > git rev-list --stdin -z >> > >> > which seems...not great. My earlier suggestion to tie "-z" to stdin >> > handling was for consistency with other tools like grep. But if we >> > already have cases where "-z" is only for output, maybe it is better to >> > stay consistent with other parts of git. I.e., I was worried about us >> > painting ourselves into a corner with your patches, but we may have >> > already done so years ago. ;) >> >> I think to some extent Git is already inconsistent here. IMO it would be >> preferable for both input and output to use NUL as the delimiter when >> machine parsing in git-rev-list(1) as that is the behavior I would >> personally expect. I also agree with Patrick's reasoning else where in >> this thread[1]. >> >> I'm open to discuss further though :) > > Nope, I don't have anything further to add. I just wanted to make sure > my suggestion in the earlier part of the thread was not leading us > accidentally down a bad path. ;) If the inconsistencies have been > considered and we're OK with them, that sounds good to me (and the > option handling in your v2 does seem to faithfully implement that). > > Thanks for all your responses. The developers who know that the revision walk infrastructure is shared between rev-list and log may wish that "-z" to mean the same thing for these two commands, but the end-users do not have to, no? After all, "git log" accepts "-z" but "git rev-list" does not in the current system and nobody complained about the discrepancy so far. Having said that, at the plumbing level, my preference is to have two independent options "--nul-delimited-{output,input}". It does not prevent us from starting with a single "-z" that works as a short-hand that flips both on (and is inconsistent with "git log" at the Porcelain level), but we can make "-z" only for output for consistency. As long as we agree on the design to allow us to control both sides independently, starting with "-z" that is only for output may be the best way forward. ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH 0/4] rev-list: introduce NUL-delimited output mode 2025-03-14 17:02 ` Junio C Hamano @ 2025-03-14 18:59 ` Jeff King 2025-03-14 19:53 ` Justin Tobler 0 siblings, 1 reply; 60+ messages in thread From: Jeff King @ 2025-03-14 18:59 UTC (permalink / raw) To: Junio C Hamano; +Cc: Justin Tobler, git, ps, christian.couder On Fri, Mar 14, 2025 at 10:02:11AM -0700, Junio C Hamano wrote: > The developers who know that the revision walk infrastructure is > shared between rev-list and log may wish that "-z" to mean the same > thing for these two commands, but the end-users do not have to, no? I think of them as related in a user-facing way, but it is likely that my mind is poisoned by knowing too much of Git's internals. :) > After all, "git log" accepts "-z" but "git rev-list" does not in the > current system and nobody complained about the discrepancy so far. Well, rev-list _does_ take "-z", but it just happens that it cannot ever do anything because you cannot convince it to produce a diff. But even knowing that is true is probably again a sign of my poisoned mind. > Having said that, at the plumbing level, my preference is to have > two independent options "--nul-delimited-{output,input}". It does > not prevent us from starting with a single "-z" that works as a > short-hand that flips both on (and is inconsistent with "git log" at > the Porcelain level), but we can make "-z" only for output for > consistency. As long as we agree on the design to allow us to > control both sides independently, starting with "-z" that is only > for output may be the best way forward. Yeah, I almost suggested earlier having longer, unambiguous names. And then that punts the issue from "which functionality should be available" to "which functionality should be mapped to short-and-sweet -z". I do think it's still worth considering what "-z" should do _now_, though, because it will be painful/impossible to switch its behavior later. And people seemed to like the "both input and output" direction. That would leave the longer names as escape hatches. I.e., I'd expect: git rev-list -z --no-nul-delimited-input --stdin to use newlines for the input and NULs for the output. -Peff ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH 0/4] rev-list: introduce NUL-delimited output mode 2025-03-14 18:59 ` Jeff King @ 2025-03-14 19:53 ` Justin Tobler 2025-03-14 21:16 ` Junio C Hamano 0 siblings, 1 reply; 60+ messages in thread From: Justin Tobler @ 2025-03-14 19:53 UTC (permalink / raw) To: Jeff King; +Cc: Junio C Hamano, git, ps, christian.couder On 25/03/14 02:59PM, Jeff King wrote: > On Fri, Mar 14, 2025 at 10:02:11AM -0700, Junio C Hamano wrote: > > Having said that, at the plumbing level, my preference is to have > > two independent options "--nul-delimited-{output,input}". It does > > not prevent us from starting with a single "-z" that works as a > > short-hand that flips both on (and is inconsistent with "git log" at > > the Porcelain level), but we can make "-z" only for output for > > consistency. As long as we agree on the design to allow us to > > control both sides independently, starting with "-z" that is only > > for output may be the best way forward. > > Yeah, I almost suggested earlier having longer, unambiguous names. And > then that punts the issue from "which functionality should be available" > to "which functionality should be mapped to short-and-sweet -z". > > I do think it's still worth considering what "-z" should do _now_, > though, because it will be painful/impossible to switch its behavior > later. And people seemed to like the "both input and output" direction. > That would leave the longer names as escape hatches. I.e., I'd expect: > > git rev-list -z --no-nul-delimited-input --stdin > > to use newlines for the input and NULs for the output. If we want to adopt less ambiguous long options names for NUL-delimited input/output options as an alternative to "-z", maybe we could do something like: $ git rev-list --nul-delimited={all,input,output} where the default for the `--nul-delimited` could be both input/output. If we want to do something along these lines, I can send another verison of this series where we drop "-z" in favor of using this option. -Justin ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH 0/4] rev-list: introduce NUL-delimited output mode 2025-03-14 19:53 ` Justin Tobler @ 2025-03-14 21:16 ` Junio C Hamano 2025-03-19 15:58 ` Justin Tobler 0 siblings, 1 reply; 60+ messages in thread From: Junio C Hamano @ 2025-03-14 21:16 UTC (permalink / raw) To: Justin Tobler; +Cc: Jeff King, git, ps, christian.couder Justin Tobler <jltobler@gmail.com> writes: > If we want to adopt less ambiguous long options names for NUL-delimited > input/output options as an alternative to "-z", maybe we could do > something like: > > $ git rev-list --nul-delimited={all,input,output} > > where the default for the `--nul-delimited` could be both input/output. I'd prefer not to see that route taken, as it does not look any "less ambiguous" at least to me. Making individual selections are almost the same in either syntax, and the only difference is that --nul-delimited-input --nul-delimited-output can be independently chosen and given and happen to end up selecting both. But with --nul-delimited=<value>, you have to plan ahead and choose "all". When your script first wants NUL delimited I/O on the output side, you'd write "output". When later you want to allow it to optionally take NUL delimited I/O on the input side, you have to notice that you have "output" there already and replace it with "all". If the initial version did not have NUL-delimited output, your change to add support for NUL-delimited input would be different. And you also have to remember that it has to be spelled "all" and not "both" when you replace existing "output". In other words, I'd prefer to leave independent/orthogonal things as such, even if such a general design principle may make the result a bit more verbose, at the plumbing level. Thanks. ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH 0/4] rev-list: introduce NUL-delimited output mode 2025-03-14 21:16 ` Junio C Hamano @ 2025-03-19 15:58 ` Justin Tobler 0 siblings, 0 replies; 60+ messages in thread From: Justin Tobler @ 2025-03-19 15:58 UTC (permalink / raw) To: Junio C Hamano; +Cc: Jeff King, git, ps, christian.couder On 25/03/14 02:16PM, Junio C Hamano wrote: > Justin Tobler <jltobler@gmail.com> writes: > > > If we want to adopt less ambiguous long options names for NUL-delimited > > input/output options as an alternative to "-z", maybe we could do > > something like: > > > > $ git rev-list --nul-delimited={all,input,output} > > > > where the default for the `--nul-delimited` could be both input/output. > > I'd prefer not to see that route taken, as it does not look any > "less ambiguous" at least to me. Making individual selections are > almost the same in either syntax, and the only difference is that > --nul-delimited-input --nul-delimited-output can be independently > chosen and given and happen to end up selecting both. > > But with --nul-delimited=<value>, you have to plan ahead and choose > "all". When your script first wants NUL delimited I/O on the output > side, you'd write "output". When later you want to allow it to > optionally take NUL delimited I/O on the input side, you have to > notice that you have "output" there already and replace it with "all". > If the initial version did not have NUL-delimited output, your change > to add support for NUL-delimited input would be different. > > And you also have to remember that it has to be spelled "all" and > not "both" when you replace existing "output". > > In other words, I'd prefer to leave independent/orthogonal things as > such, even if such a general design principle may make the result a > bit more verbose, at the plumbing level. That's fair. I agree the explicitness of having two separate options is nice and, while more verbose, that is probably not a big deal at the plumbing level. In my next version I'll return the "-z" option to only setting the output to be NUL-delimited which would better match the log family of commands. If we also want to support NUL-delimited stdin parsing, I can submit a followup series which teaches git-rev-list(1) the "--nul-delimited-{input,output}" options. On a side note, I also noticed that git-commit(1) uses the "--null" option as an alias to "-z". I think it would be nice if going forward there was greater consistency around the options used to control input/output delimiters. Maybe "--nul-delimited-{input,output}" could be that in the future. Thanks, -Justin ^ permalink raw reply [flat|nested] 60+ messages in thread
* [PATCH v2 0/6] rev-list: introduce NUL-delimited output mode 2025-03-10 19:28 [PATCH 0/4] rev-list: introduce NUL-delimited output mode Justin Tobler ` (6 preceding siblings ...) 2025-03-11 23:57 ` Jeff King @ 2025-03-13 0:17 ` Justin Tobler 2025-03-13 0:17 ` [PATCH v2 1/6] rev-list: inline `show_object_with_name()` in `show_object()` Justin Tobler ` (6 more replies) 7 siblings, 7 replies; 60+ messages in thread From: Justin Tobler @ 2025-03-13 0:17 UTC (permalink / raw) To: git; +Cc: ps, christian.couder, peff, ben.knoble, Justin Tobler When walking objects, git-rev-list(1) prints each object entry on a separate line in the form: <oid> LF Some options, such as `--objects`, may print additional information about the object on the same line: <oid> SP [<path>] LF In this mode, if the object path contains a newline it is truncated at the newline. The `--boundary` option also modifies output by prefixing boundary objects with `-`: -<oid> LF When the `--missing={print,print-info}` option is provided, information about any missing objects encountered during the object walk are also printed in the form: ?<oid> [SP <token>=<value>]... LF where values containing LF or SP are printed in a token specific fashion so that the resulting encoded value does not contain either of these two problematic bytes. For example, missing object paths are quoted in the C style when they contain LF or SP. To make machine parsing easier, this series introduces a NUL-delimited output mode for git-rev-list(1) via a `-z` option. In this mode, the output format for object records is unified such that each object and its accompanying metadata is formatted without relying on object metadata order. This format follows the existing `<token>=<value>` used by the `--missing` option to represent object metadata in the form: <oid> NUL [<token>=<value> NUL]... # Examples <oid> LF -> <oid> NUL <oid> SP <path> LF -> <oid> NUL path=<path> NUL -<oid> LF -> <oid> NUL boundary=yes NUL ?<oid> [SP <token>=<value>]... -> <oid> NUL missing=yes NUL [<token>=<value> NUL]... Note that token value info is printed as-is without any special encoding or truncation. Prefixes such as '-' and '?' are dropped in favor using a token/value pair to signal the same information. While in this mode, if the `--sdtin` option is used, revision and pathspec arguments read from stdin are separated with a NUL byte instead of being newline delimited. For now this series only adds support for use with the `--objects`, `--boundary` and `--missing` output options. Usage of `-z` with other options is rejected, so it can potentially be added in the future. This series is structured as follows: - Patches 1 and 2 do some minor preparatory refactors. - Patch 3 modifies stdin argument parsing handled by `setup_revisions()` to support NUL-delimited arguments. - Patch 4 adds the `-z` option to git-rev-list(1) to print objects in a NUL-delimited fashion. Arguments parsed on stdin while in the mode are also NUL-delimited. - Patch 5 teaches the `--boundary` option how to print info in a NUL-delimited fashino using the unified output format. - Patch 6 teaches the `--missing` option how to print info in a NUL-delimited fashion using the unified output format. Changes since V1: - Use unified output format with `<token>=<value>` pairs for all object metadata. - Add support for the `--boundary` option in NUL-delimited mode. - Add support for NUL-delimited stdin argument parsing in NUL-delimited mode. - Instead of using two NUL bytes to delimit between object records, a single NUL byte is used. Now that object metadata is always in the form `<token>=<value>`, we know a new object record starts when there is an OID entry which will not contain '='. Thanks for taking a look, -Justin Justin Tobler (6): rev-list: inline `show_object_with_name()` in `show_object()` rev-list: refactor early option parsing revision: support NUL-delimited --stdin mode rev-list: support delimiting objects with NUL bytes rev-list: support NUL-delimited --boundary option rev-list: support NUL-delimited --missing option Documentation/rev-list-options.adoc | 26 ++++++++ builtin/rev-list.c | 92 ++++++++++++++++++++++------- revision.c | 27 ++++----- revision.h | 5 +- t/t6000-rev-list-misc.sh | 51 ++++++++++++++++ t/t6017-rev-list-stdin.sh | 9 +++ t/t6022-rev-list-missing.sh | 31 ++++++++++ 7 files changed, 200 insertions(+), 41 deletions(-) Range-diff against v1: 1: d2eded3ac7 = 1: d2eded3ac7 rev-list: inline `show_object_with_name()` in `show_object()` 2: 03cd08c859 = 2: 03cd08c859 rev-list: refactor early option parsing -: ---------- > 3: 803a49933a revision: support NUL-delimited --stdin mode 3: 41c5cb7737 ! 4: d3b3c4ef89 rev-list: support delimiting objects with NUL bytes @@ Commit message newline are also truncated at the newline. Introduce the `-z` option for git-rev-list(1) which reformats the output - to use NUL-delimiters between objects and associated info. Each object - line uses two NUL bytes to indicate the end of an object entry and a - single NUL byte to delimit between object information in the following - form: + to use NUL-delimiters between objects and associated info in the + following form: $ git rev-list -z --objects <rev> - <oid> [NUL <path>] NUL NUL + <oid> NUL [path=<path> NUL] - For now, the `--objects` flag is the only option that can be used in - combination with `-z`. In this mode, the object path is not truncated at - newlines. In a subsequent commit, NUL-delimiter support for other - options is added. Other options that do not make sense with be used in - combination with `-z` are rejected. + In this form, the start of each record is signaled by an OID entry that + is all hexidecimal and does not contain any '='. Additional path info + from `--objects` is appended to the record as a token/value pair + `path=<path>` as-is without any truncation. + + In this mode, revision and pathspec arguments provided on stdin with the + `--stdin` option are also separated by a NUL byte instead of being + newline delimited. + + For now, the `--objects` and `--stdin` flag are the only options that + can be used in combination with `-z`. In a subsequent commit, + NUL-delimited support for other options is added. Other options that do + not make sense with be used in combination with `-z` are rejected. Signed-off-by: Justin Tobler <jltobler@gmail.com> @@ Documentation/rev-list-options.adoc: ifdef::git-rev-list[] `<header>` text will be printed with each progress update. + +-z:: -+ Instead of being newline-delimited, each outputted object is delimited -+ with two NUL bytes in the following form: ++ Instead of being newline-delimited, each outputted object and its ++ accompanying metadata is delimited using NUL bytes in the following ++ form: ++ +----------------------------------------------------------------------- -+<OID> NUL NUL ++<OID> NUL [<token>=<value> NUL]... +----------------------------------------------------------------------- ++ -+When the `--objects` option is also present, available object name information -+is printed in the following form without any truncation for object names -+containing newline characters: ++Additional object metadata, such as object paths, is printed using the ++`<token>=<value>` form. Token values are printed as-is without any ++encoding/truncation. An OID entry never contains a '=' character and thus ++is used to signal the start of a new object record. Examples: ++ +----------------------------------------------------------------------- -+<OID> [NUL <object-name>] NUL NUL ++<OID> NUL ++<OID> NUL path=<path> NUL +----------------------------------------------------------------------- ++ -+This option is only compatible with `--objects`. ++This mode is only compatible with the `--objects` output option. Also, revision ++and pathspec argument parsing on stdin with the `--stdin` option is NUL byte ++delimited instead of using newlines while in this mode. endif::git-rev-list[] History Simplification @@ builtin/rev-list.c: static int arg_show_object_names = 1; #define DEFAULT_OIDSET_SIZE (16*1024) -+static int nul_delim; ++static char line_term = '\n'; ++static char info_term = ' '; ++ static int show_disk_usage; static off_t total_disk_usage; static int human_readable; - -+static void print_object_term(int nul_delim) -+{ -+ char line_sep = '\n'; -+ -+ if (nul_delim) -+ line_sep = '\0'; -+ -+ putchar(line_sep); -+ if (nul_delim) -+ putchar(line_sep); -+} -+ - static off_t get_object_disk_usage(struct object *obj) - { - off_t size; @@ builtin/rev-list.c: static void show_commit(struct commit *commit, void *data) if (revs->commit_format == CMIT_FMT_ONELINE) putchar(' '); else if (revs->include_header) - putchar('\n'); -+ print_object_term(nul_delim); ++ putchar(line_term); if (revs->verbose_header) { struct strbuf buf = STRBUF_INIT; @@ builtin/rev-list.c: static void show_object(struct object *obj, const char *name - putchar(' '); - for (const char *p = name; *p && *p != '\n'; p++) - putchar(*p); -+ if (nul_delim && *name) { -+ putchar('\0'); -+ printf("%s", name); -+ } else if (!nul_delim) { -+ putchar(' '); ++ if (line_term) { ++ putchar(info_term); + for (const char *p = name; *p && *p != '\n'; p++) + putchar(*p); ++ } else if (*name) { ++ printf("%cpath=%s", info_term, name); + } } - putchar('\n'); -+ print_object_term(nul_delim); ++ putchar(line_term); } static void show_edge(struct commit *commit) @@ builtin/rev-list.c: int cmd_rev_list(int argc, } else if (skip_prefix(arg, "--missing=", &arg)) { parse_missing_action_value(arg); + } else if (!strcmp(arg, "-z")) { -+ nul_delim = 1; ++ s_r_opt.nul_delim_stdin = 1; ++ line_term = '\0'; ++ info_term = '\0'; } } @@ builtin/rev-list.c: int cmd_rev_list(int argc, } + -+ if (nul_delim) { ++ /* ++ * Reject options currently incompatible with -z. For some options, this ++ * is not an inherent limitation and support may be implemented in the ++ * future. ++ */ ++ if (!line_term) { + if (revs.graph || revs.verbose_header || show_disk_usage || + info.show_timestamp || info.header_prefix || bisect_list || -+ use_bitmap_index || revs.edge_hint || arg_missing_action) ++ use_bitmap_index || revs.edge_hint || revs.left_right || ++ revs.cherry_mark || arg_missing_action || revs.boundary) + die(_("-z option used with unsupported option")); + } + @@ t/t6000-rev-list-misc.sh: test_expect_success 'rev-list --unpacked' ' + oid1=$(git -C repo rev-parse HEAD) && + oid2=$(git -C repo rev-parse HEAD~) && + -+ printf "%s\0\0%s\0\0" "$oid1" "$oid2" >expect && ++ printf "%s\0%s\0" "$oid1" "$oid2" >expect && + git -C repo rev-list -z HEAD >actual && + + test_cmp expect actual @@ t/t6000-rev-list-misc.sh: test_expect_success 'rev-list --unpacked' ' + path1=1.t && + path2=2.t && + -+ printf "%s\0%s\0\0%s\0%s\0\0" "$oid1" "$path1" "$oid2" "$path2" >expect && ++ printf "%s\0path=%s\0%s\0path=%s\0" "$oid1" "$path1" "$oid2" "$path2" \ ++ >expect && + git -C repo rev-list -z --objects HEAD:1.t HEAD:2.t >actual && + + test_cmp expect actual +' + test_done + + ## t/t6017-rev-list-stdin.sh ## +@@ t/t6017-rev-list-stdin.sh: test_expect_success '--not via stdin does not influence revisions from command l + test_cmp expect actual + ' + ++test_expect_success 'NUL-delimited stdin' ' ++ printf "%s\0%s\0%s\0" "HEAD" "--" "file-1" > input && ++ ++ git rev-list -z --objects HEAD -- file-1 >expect && ++ git rev-list -z --objects --stdin <input >actual && ++ ++ test_cmp expect actual ++' ++ + test_done -: ---------- > 5: 5e4fc41976 rev-list: support NUL-delimited --boundary option 4: 007adbac25 ! 6: 7744966514 rev-list: support NUL-delimited --missing option @@ Commit message rev-list: support NUL-delimited --missing option The `--missing={print,print-info}` option for git-rev-list(1) prints - missing objects found while performing the revision walk. Add support - for printing missing objects in a NUL-delimited format when the `-z` - option is enabled. + missing objects found while performing the object walk in the form: + + $ git rev-list --missing=print-info <rev> + ?<oid> [SP <token>=<value>]... LF + + Add support for printing missing objects in a NUL-delimited format when + the `-z` option is enabled. $ git rev-list -z --missing=print-info <rev> - <oid> NUL NUL - ?<oid> [NUL <token>=<value>]... NUL NUL + <oid> NUL missing=yes NUL [<token>=<value> NUL]... In this mode, values containing special characters or spaces are printed - as-is without being escaped or quoted. + as-is without being escaped or quoted. Instead of prefixing the missing + OID with '?', a separate `missing=yes` token/value pair is appended. Signed-off-by: Justin Tobler <jltobler@gmail.com> ## Documentation/rev-list-options.adoc ## -@@ Documentation/rev-list-options.adoc: containing newline characters: - <OID> [NUL <object-name>] NUL NUL +@@ Documentation/rev-list-options.adoc: ifdef::git-rev-list[] + <OID> NUL [<token>=<value> NUL]... + ----------------------------------------------------------------------- + + +-Additional object metadata, such as object paths or boundary objects, is +-printed using the `<token>=<value>` form. Token values are printed as-is ++Additional object metadata, such as object paths or boundary/missing objects, ++is printed using the `<token>=<value>` form. Token values are printed as-is + without any encoding/truncation. An OID entry never contains a '=' character + and thus is used to signal the start of a new object record. Examples: + + +@@ Documentation/rev-list-options.adoc: and thus is used to signal the start of a new object record. Examples: + <OID> NUL + <OID> NUL path=<path> NUL + <OID> NUL boundary=yes NUL ++<OID> NUL missing=yes NUL [<token>=<value> NUL]... ----------------------------------------------------------------------- + --This option is only compatible with `--objects`. -+When the `--missing` option is provided, missing objects are printed in the -+following form where value is printed as-is without any token specific -+encoding: -++ -+----------------------------------------------------------------------- -+?<OID> [NUL <token>=<value>]... NUL NUL -+----------------------------------------------------------------------- -++ -+This option is only compatible with `--objects` and `--missing`. +-This mode is only compatible with the `--objects` and `--boundary` output +-options. Also, revision and pathspec argument parsing on stdin with the +-`--stdin` option is NUL byte delimited instead of using newlines while in this +-mode. ++This mode is only compatible with the `--objects`, `--boundary`, and ++`--missing` output options. Also, revision and pathspec argument parsing on ++stdin with the `--stdin` option is NUL byte delimited instead of using newlines ++while in this mode. endif::git-rev-list[] History Simplification ## builtin/rev-list.c ## @@ builtin/rev-list.c: static void print_missing_object(struct missing_objects_map_entry *entry, - int print_missing_info) { struct strbuf sb = STRBUF_INIT; -+ char info_sep = ' '; + ++ if (line_term) ++ putchar('?'); + -+ if (nul_delim) -+ info_sep = '\0'; ++ printf("%s", oid_to_hex(&entry->entry.oid)); ++ ++ if (!line_term) ++ printf("%cmissing=yes", info_term); + -+ printf("?%s", oid_to_hex(&entry->entry.oid)); - if (!print_missing_info) { - printf("?%s\n", oid_to_hex(&entry->entry.oid)); -+ print_object_term(nul_delim); ++ putchar(line_term); return; } @@ builtin/rev-list.c: static void print_missing_object(struct missing_objects_map_ - strbuf_addstr(&sb, " path="); - quote_path(entry->path, NULL, &path, QUOTE_PATH_QUOTE_SP); - strbuf_addbuf(&sb, &path); -+ strbuf_addf(&sb, "%cpath=", info_sep); ++ strbuf_addf(&sb, "%cpath=", info_term); + -+ if (nul_delim) { -+ strbuf_addstr(&sb, entry->path); -+ } else { ++ if (line_term) { + quote_path(entry->path, NULL, &path, QUOTE_PATH_QUOTE_SP); + strbuf_addbuf(&sb, &path); ++ } else { ++ strbuf_addstr(&sb, entry->path); + } strbuf_release(&path); } if (entry->type) - strbuf_addf(&sb, " type=%s", type_name(entry->type)); -+ strbuf_addf(&sb, "%ctype=%s", info_sep, type_name(entry->type)); ++ strbuf_addf(&sb, "%ctype=%s", info_term, type_name(entry->type)); + + fwrite(sb.buf, sizeof(char), sb.len, stdout); -+ print_object_term(nul_delim); ++ putchar(line_term); - printf("?%s%s\n", oid_to_hex(&entry->entry.oid), sb.buf); strbuf_release(&sb); } @@ builtin/rev-list.c: int cmd_rev_list(int argc, - if (nul_delim) { if (revs.graph || revs.verbose_header || show_disk_usage || info.show_timestamp || info.header_prefix || bisect_list || -- use_bitmap_index || revs.edge_hint || arg_missing_action) -+ use_bitmap_index || revs.edge_hint) + use_bitmap_index || revs.edge_hint || revs.left_right || +- revs.cherry_mark || arg_missing_action) ++ revs.cherry_mark) die(_("-z option used with unsupported option")); } @@ t/t6022-rev-list-missing.sh: do + + git rev-list -z --objects --no-object-names \ + HEAD ^"$oid" >expect && -+ printf "?%s\0path=%s\0type=%s\0\0" "$oid" "$path" "$type" >>expect && ++ printf "%s\0missing=yes\0path=%s\0type=%s\0" "$oid" "$path" \ ++ "$type" >>expect && + + mv "$obj_path" "$obj_path.hidden" && + git rev-list -z --objects --no-object-names \ base-commit: 87a0bdbf0f72b7561f3cd50636eee33dcb7dbcc3 -- 2.49.0.rc2 ^ permalink raw reply [flat|nested] 60+ messages in thread
* [PATCH v2 1/6] rev-list: inline `show_object_with_name()` in `show_object()` 2025-03-13 0:17 ` [PATCH v2 0/6] " Justin Tobler @ 2025-03-13 0:17 ` Justin Tobler 2025-03-13 0:17 ` [PATCH v2 2/6] rev-list: refactor early option parsing Justin Tobler ` (5 subsequent siblings) 6 siblings, 0 replies; 60+ messages in thread From: Justin Tobler @ 2025-03-13 0:17 UTC (permalink / raw) To: git; +Cc: ps, christian.couder, peff, ben.knoble, Justin Tobler The `show_object_with_name()` function only has a single call site. Inline call to `show_object_with_name()` in `show_object()` so the explicit function can be cleaned up and live closer to where it is used. While at it, factor out the code that prints the OID and newline for both objects with and without a name. In a subsequent commit, `show_object()` is modified to support printing object information in a NUL-delimited format. Signed-off-by: Justin Tobler <jltobler@gmail.com> --- builtin/rev-list.c | 13 +++++++++---- revision.c | 8 -------- revision.h | 2 -- 3 files changed, 9 insertions(+), 14 deletions(-) diff --git a/builtin/rev-list.c b/builtin/rev-list.c index bb26bee0d4..dcd079c16c 100644 --- a/builtin/rev-list.c +++ b/builtin/rev-list.c @@ -357,10 +357,15 @@ static void show_object(struct object *obj, const char *name, void *cb_data) return; } - if (arg_show_object_names) - show_object_with_name(stdout, obj, name); - else - printf("%s\n", oid_to_hex(&obj->oid)); + printf("%s", oid_to_hex(&obj->oid)); + + if (arg_show_object_names) { + putchar(' '); + for (const char *p = name; *p && *p != '\n'; p++) + putchar(*p); + } + + putchar('\n'); } static void show_edge(struct commit *commit) diff --git a/revision.c b/revision.c index c4390f0938..0eaebe4478 100644 --- a/revision.c +++ b/revision.c @@ -59,14 +59,6 @@ implement_shared_commit_slab(revision_sources, char *); static inline int want_ancestry(const struct rev_info *revs); -void show_object_with_name(FILE *out, struct object *obj, const char *name) -{ - fprintf(out, "%s ", oid_to_hex(&obj->oid)); - for (const char *p = name; *p && *p != '\n'; p++) - fputc(*p, out); - fputc('\n', out); -} - static void mark_blob_uninteresting(struct blob *blob) { if (!blob) diff --git a/revision.h b/revision.h index 71e984c452..21c6a69899 100644 --- a/revision.h +++ b/revision.h @@ -489,8 +489,6 @@ void mark_parents_uninteresting(struct rev_info *revs, struct commit *commit); void mark_tree_uninteresting(struct repository *r, struct tree *tree); void mark_trees_uninteresting_sparse(struct repository *r, struct oidset *trees); -void show_object_with_name(FILE *, struct object *, const char *); - /** * Helpers to check if a reference should be excluded. */ -- 2.49.0.rc2 ^ permalink raw reply related [flat|nested] 60+ messages in thread
* [PATCH v2 2/6] rev-list: refactor early option parsing 2025-03-13 0:17 ` [PATCH v2 0/6] " Justin Tobler 2025-03-13 0:17 ` [PATCH v2 1/6] rev-list: inline `show_object_with_name()` in `show_object()` Justin Tobler @ 2025-03-13 0:17 ` Justin Tobler 2025-03-13 0:17 ` [PATCH v2 3/6] revision: support NUL-delimited --stdin mode Justin Tobler ` (4 subsequent siblings) 6 siblings, 0 replies; 60+ messages in thread From: Justin Tobler @ 2025-03-13 0:17 UTC (permalink / raw) To: git; +Cc: ps, christian.couder, peff, ben.knoble, Justin Tobler Before invoking `setup_revisions()`, the `--missing` and `--exclude-promisor-objects` options are parsed early. In a subsequent commit, another option is added that must be parsed early. Refactor the code to parse both options in a single early pass. Signed-off-by: Justin Tobler <jltobler@gmail.com> --- builtin/rev-list.c | 17 +++++++---------- 1 file changed, 7 insertions(+), 10 deletions(-) diff --git a/builtin/rev-list.c b/builtin/rev-list.c index dcd079c16c..04d9c893b5 100644 --- a/builtin/rev-list.c +++ b/builtin/rev-list.c @@ -16,6 +16,7 @@ #include "object-file.h" #include "object-store-ll.h" #include "pack-bitmap.h" +#include "parse-options.h" #include "log-tree.h" #include "graph.h" #include "bisect.h" @@ -639,19 +640,15 @@ int cmd_rev_list(int argc, if (!strcmp(arg, "--exclude-promisor-objects")) { fetch_if_missing = 0; revs.exclude_promisor_objects = 1; - break; - } - } - for (i = 1; i < argc; i++) { - const char *arg = argv[i]; - if (skip_prefix(arg, "--missing=", &arg)) { - if (revs.exclude_promisor_objects) - die(_("options '%s' and '%s' cannot be used together"), "--exclude-promisor-objects", "--missing"); - if (parse_missing_action_value(arg)) - break; + } else if (skip_prefix(arg, "--missing=", &arg)) { + parse_missing_action_value(arg); } } + die_for_incompatible_opt2(revs.exclude_promisor_objects, + "--exclude_promisor_objects", + arg_missing_action, "--missing"); + if (arg_missing_action) revs.do_not_die_on_missing_objects = 1; -- 2.49.0.rc2 ^ permalink raw reply related [flat|nested] 60+ messages in thread
* [PATCH v2 3/6] revision: support NUL-delimited --stdin mode 2025-03-13 0:17 ` [PATCH v2 0/6] " Justin Tobler 2025-03-13 0:17 ` [PATCH v2 1/6] rev-list: inline `show_object_with_name()` in `show_object()` Justin Tobler 2025-03-13 0:17 ` [PATCH v2 2/6] rev-list: refactor early option parsing Justin Tobler @ 2025-03-13 0:17 ` Justin Tobler 2025-03-13 0:17 ` [PATCH v2 4/6] rev-list: support delimiting objects with NUL bytes Justin Tobler ` (3 subsequent siblings) 6 siblings, 0 replies; 60+ messages in thread From: Justin Tobler @ 2025-03-13 0:17 UTC (permalink / raw) To: git; +Cc: ps, christian.couder, peff, ben.knoble, Justin Tobler When `setup_revisions()` parses the `--stdin` option, revision and pathspec arguments are read from stdin. Each line of input is handled as a separate argument. Introduce the `nul_delim_stdin` field to `setup_revision_opt` that, when enabled, uses a NUL byte to delimit between stdin arguments instead of newline. Signed-off-by: Justin Tobler <jltobler@gmail.com> --- revision.c | 19 +++++++++++-------- revision.h | 3 ++- 2 files changed, 13 insertions(+), 9 deletions(-) diff --git a/revision.c b/revision.c index 0eaebe4478..5de6309830 100644 --- a/revision.c +++ b/revision.c @@ -2275,10 +2275,10 @@ int handle_revision_arg(const char *arg, struct rev_info *revs, int flags, unsig return ret; } -static void read_pathspec_from_stdin(struct strbuf *sb, - struct strvec *prune) +static void read_pathspec_from_stdin(struct strbuf *sb, struct strvec *prune, + int line_term) { - while (strbuf_getline(sb, stdin) != EOF) + while (strbuf_getdelim_strip_crlf(sb, stdin, line_term) != EOF) strvec_push(prune, sb->buf); } @@ -2905,8 +2905,8 @@ static int handle_revision_pseudo_opt(struct rev_info *revs, return 1; } -static void read_revisions_from_stdin(struct rev_info *revs, - struct strvec *prune) +static void read_revisions_from_stdin(struct rev_info *revs, struct strvec *prune, + int line_term) { struct strbuf sb; int seen_dashdash = 0; @@ -2918,7 +2918,7 @@ static void read_revisions_from_stdin(struct rev_info *revs, warn_on_object_refname_ambiguity = 0; strbuf_init(&sb, 1000); - while (strbuf_getline(&sb, stdin) != EOF) { + while (strbuf_getdelim_strip_crlf(&sb, stdin, line_term) != EOF) { if (!sb.len) break; @@ -2946,7 +2946,7 @@ static void read_revisions_from_stdin(struct rev_info *revs, die("bad revision '%s'", sb.buf); } if (seen_dashdash) - read_pathspec_from_stdin(&sb, prune); + read_pathspec_from_stdin(&sb, prune, line_term); strbuf_release(&sb); warn_on_object_refname_ambiguity = save_warning; @@ -3019,13 +3019,16 @@ int setup_revisions(int argc, const char **argv, struct rev_info *revs, struct s } if (!strcmp(arg, "--stdin")) { + int term = opt && opt->nul_delim_stdin ? '\0' : '\n'; + if (revs->disable_stdin) { argv[left++] = arg; continue; } if (revs->read_from_stdin++) die("--stdin given twice?"); - read_revisions_from_stdin(revs, &prune_data); + read_revisions_from_stdin(revs, &prune_data, + term); continue; } diff --git a/revision.h b/revision.h index 21c6a69899..0e680c3667 100644 --- a/revision.h +++ b/revision.h @@ -439,7 +439,8 @@ struct setup_revision_opt { void (*tweak)(struct rev_info *); unsigned int assume_dashdash:1, allow_exclude_promisor_objects:1, - free_removed_argv_elements:1; + free_removed_argv_elements:1, + nul_delim_stdin:1; unsigned revarg_opt; }; int setup_revisions(int argc, const char **argv, struct rev_info *revs, -- 2.49.0.rc2 ^ permalink raw reply related [flat|nested] 60+ messages in thread
* [PATCH v2 4/6] rev-list: support delimiting objects with NUL bytes 2025-03-13 0:17 ` [PATCH v2 0/6] " Justin Tobler ` (2 preceding siblings ...) 2025-03-13 0:17 ` [PATCH v2 3/6] revision: support NUL-delimited --stdin mode Justin Tobler @ 2025-03-13 0:17 ` Justin Tobler 2025-03-13 12:55 ` Patrick Steinhardt 2025-03-13 0:17 ` [PATCH v2 5/6] rev-list: support NUL-delimited --boundary option Justin Tobler ` (2 subsequent siblings) 6 siblings, 1 reply; 60+ messages in thread From: Justin Tobler @ 2025-03-13 0:17 UTC (permalink / raw) To: git; +Cc: ps, christian.couder, peff, ben.knoble, Justin Tobler When walking objects, git-rev-list(1) prints each object entry on a separate line. Some options, such as `--objects`, may print additional information about tree and blob object on the same line in the form: $ git rev-list --objects <rev> <tree/blob oid> SP [<path>] LF Note that in this form the SP is appended regardless of whether the tree or blob object has path information available. Paths containing a newline are also truncated at the newline. Introduce the `-z` option for git-rev-list(1) which reformats the output to use NUL-delimiters between objects and associated info in the following form: $ git rev-list -z --objects <rev> <oid> NUL [path=<path> NUL] In this form, the start of each record is signaled by an OID entry that is all hexidecimal and does not contain any '='. Additional path info from `--objects` is appended to the record as a token/value pair `path=<path>` as-is without any truncation. In this mode, revision and pathspec arguments provided on stdin with the `--stdin` option are also separated by a NUL byte instead of being newline delimited. For now, the `--objects` and `--stdin` flag are the only options that can be used in combination with `-z`. In a subsequent commit, NUL-delimited support for other options is added. Other options that do not make sense with be used in combination with `-z` are rejected. Signed-off-by: Justin Tobler <jltobler@gmail.com> --- Documentation/rev-list-options.adoc | 23 ++++++++++++++++++ builtin/rev-list.c | 36 +++++++++++++++++++++++++---- t/t6000-rev-list-misc.sh | 35 ++++++++++++++++++++++++++++ t/t6017-rev-list-stdin.sh | 9 ++++++++ 4 files changed, 98 insertions(+), 5 deletions(-) diff --git a/Documentation/rev-list-options.adoc b/Documentation/rev-list-options.adoc index 785c0786e0..166d3cd19e 100644 --- a/Documentation/rev-list-options.adoc +++ b/Documentation/rev-list-options.adoc @@ -361,6 +361,29 @@ ifdef::git-rev-list[] --progress=<header>:: Show progress reports on stderr as objects are considered. The `<header>` text will be printed with each progress update. + +-z:: + Instead of being newline-delimited, each outputted object and its + accompanying metadata is delimited using NUL bytes in the following + form: ++ +----------------------------------------------------------------------- +<OID> NUL [<token>=<value> NUL]... +----------------------------------------------------------------------- ++ +Additional object metadata, such as object paths, is printed using the +`<token>=<value>` form. Token values are printed as-is without any +encoding/truncation. An OID entry never contains a '=' character and thus +is used to signal the start of a new object record. Examples: ++ +----------------------------------------------------------------------- +<OID> NUL +<OID> NUL path=<path> NUL +----------------------------------------------------------------------- ++ +This mode is only compatible with the `--objects` output option. Also, revision +and pathspec argument parsing on stdin with the `--stdin` option is NUL byte +delimited instead of using newlines while in this mode. endif::git-rev-list[] History Simplification diff --git a/builtin/rev-list.c b/builtin/rev-list.c index 04d9c893b5..f048500679 100644 --- a/builtin/rev-list.c +++ b/builtin/rev-list.c @@ -65,6 +65,7 @@ static const char rev_list_usage[] = " --abbrev-commit\n" " --left-right\n" " --count\n" +" -z\n" " special purpose:\n" " --bisect\n" " --bisect-vars\n" @@ -97,6 +98,9 @@ static int arg_show_object_names = 1; #define DEFAULT_OIDSET_SIZE (16*1024) +static char line_term = '\n'; +static char info_term = ' '; + static int show_disk_usage; static off_t total_disk_usage; static int human_readable; @@ -264,7 +268,7 @@ static void show_commit(struct commit *commit, void *data) if (revs->commit_format == CMIT_FMT_ONELINE) putchar(' '); else if (revs->include_header) - putchar('\n'); + putchar(line_term); if (revs->verbose_header) { struct strbuf buf = STRBUF_INIT; @@ -361,12 +365,16 @@ static void show_object(struct object *obj, const char *name, void *cb_data) printf("%s", oid_to_hex(&obj->oid)); if (arg_show_object_names) { - putchar(' '); - for (const char *p = name; *p && *p != '\n'; p++) - putchar(*p); + if (line_term) { + putchar(info_term); + for (const char *p = name; *p && *p != '\n'; p++) + putchar(*p); + } else if (*name) { + printf("%cpath=%s", info_term, name); + } } - putchar('\n'); + putchar(line_term); } static void show_edge(struct commit *commit) @@ -642,6 +650,10 @@ int cmd_rev_list(int argc, revs.exclude_promisor_objects = 1; } else if (skip_prefix(arg, "--missing=", &arg)) { parse_missing_action_value(arg); + } else if (!strcmp(arg, "-z")) { + s_r_opt.nul_delim_stdin = 1; + line_term = '\0'; + info_term = '\0'; } } @@ -757,6 +769,20 @@ int cmd_rev_list(int argc, usage(rev_list_usage); } + + /* + * Reject options currently incompatible with -z. For some options, this + * is not an inherent limitation and support may be implemented in the + * future. + */ + if (!line_term) { + if (revs.graph || revs.verbose_header || show_disk_usage || + info.show_timestamp || info.header_prefix || bisect_list || + use_bitmap_index || revs.edge_hint || revs.left_right || + revs.cherry_mark || arg_missing_action || revs.boundary) + die(_("-z option used with unsupported option")); + } + if (revs.commit_format != CMIT_FMT_USERFORMAT) revs.include_header = 1; if (revs.commit_format != CMIT_FMT_UNSPECIFIED) { diff --git a/t/t6000-rev-list-misc.sh b/t/t6000-rev-list-misc.sh index 6289a2e8b0..dfbbc0aee6 100755 --- a/t/t6000-rev-list-misc.sh +++ b/t/t6000-rev-list-misc.sh @@ -182,4 +182,39 @@ test_expect_success 'rev-list --unpacked' ' test_cmp expect actual ' +test_expect_success 'rev-list -z' ' + test_when_finished rm -rf repo && + + git init repo && + test_commit -C repo 1 && + test_commit -C repo 2 && + + oid1=$(git -C repo rev-parse HEAD) && + oid2=$(git -C repo rev-parse HEAD~) && + + printf "%s\0%s\0" "$oid1" "$oid2" >expect && + git -C repo rev-list -z HEAD >actual && + + test_cmp expect actual +' + +test_expect_success 'rev-list -z --objects' ' + test_when_finished rm -rf repo && + + git init repo && + test_commit -C repo 1 && + test_commit -C repo 2 && + + oid1=$(git -C repo rev-parse HEAD:1.t) && + oid2=$(git -C repo rev-parse HEAD:2.t) && + path1=1.t && + path2=2.t && + + printf "%s\0path=%s\0%s\0path=%s\0" "$oid1" "$path1" "$oid2" "$path2" \ + >expect && + git -C repo rev-list -z --objects HEAD:1.t HEAD:2.t >actual && + + test_cmp expect actual +' + test_done diff --git a/t/t6017-rev-list-stdin.sh b/t/t6017-rev-list-stdin.sh index 4821b90e74..362a8b126a 100755 --- a/t/t6017-rev-list-stdin.sh +++ b/t/t6017-rev-list-stdin.sh @@ -148,4 +148,13 @@ test_expect_success '--not via stdin does not influence revisions from command l test_cmp expect actual ' +test_expect_success 'NUL-delimited stdin' ' + printf "%s\0%s\0%s\0" "HEAD" "--" "file-1" > input && + + git rev-list -z --objects HEAD -- file-1 >expect && + git rev-list -z --objects --stdin <input >actual && + + test_cmp expect actual +' + test_done -- 2.49.0.rc2 ^ permalink raw reply related [flat|nested] 60+ messages in thread
* Re: [PATCH v2 4/6] rev-list: support delimiting objects with NUL bytes 2025-03-13 0:17 ` [PATCH v2 4/6] rev-list: support delimiting objects with NUL bytes Justin Tobler @ 2025-03-13 12:55 ` Patrick Steinhardt 2025-03-13 14:44 ` Justin Tobler 0 siblings, 1 reply; 60+ messages in thread From: Patrick Steinhardt @ 2025-03-13 12:55 UTC (permalink / raw) To: Justin Tobler; +Cc: git, christian.couder, peff, ben.knoble On Wed, Mar 12, 2025 at 07:17:04PM -0500, Justin Tobler wrote: > diff --git a/Documentation/rev-list-options.adoc b/Documentation/rev-list-options.adoc > index 785c0786e0..166d3cd19e 100644 > --- a/Documentation/rev-list-options.adoc > +++ b/Documentation/rev-list-options.adoc > @@ -361,6 +361,29 @@ ifdef::git-rev-list[] > --progress=<header>:: > Show progress reports on stderr as objects are considered. The > `<header>` text will be printed with each progress update. > + > +-z:: > + Instead of being newline-delimited, each outputted object and its > + accompanying metadata is delimited using NUL bytes in the following > + form: > ++ > +----------------------------------------------------------------------- > +<OID> NUL [<token>=<value> NUL]... > +----------------------------------------------------------------------- > ++ > +Additional object metadata, such as object paths, is printed using the > +`<token>=<value>` form. Token values are printed as-is without any > +encoding/truncation. An OID entry never contains a '=' character and thus > +is used to signal the start of a new object record. Examples: > ++ > +----------------------------------------------------------------------- > +<OID> NUL > +<OID> NUL path=<path> NUL > +----------------------------------------------------------------------- > ++ > +This mode is only compatible with the `--objects` output option. Also, revision > +and pathspec argument parsing on stdin with the `--stdin` option is NUL byte > +delimited instead of using newlines while in this mode. > endif::git-rev-list[] > > History Simplification I feel like this last paragraph, where we talk about `--stdin` being NUL-delimited, should already be mentioned in the first paragraph. Patrick ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH v2 4/6] rev-list: support delimiting objects with NUL bytes 2025-03-13 12:55 ` Patrick Steinhardt @ 2025-03-13 14:44 ` Justin Tobler 0 siblings, 0 replies; 60+ messages in thread From: Justin Tobler @ 2025-03-13 14:44 UTC (permalink / raw) To: Patrick Steinhardt; +Cc: git, christian.couder, peff, ben.knoble On 25/03/13 01:55PM, Patrick Steinhardt wrote: > On Wed, Mar 12, 2025 at 07:17:04PM -0500, Justin Tobler wrote: > > diff --git a/Documentation/rev-list-options.adoc b/Documentation/rev-list-options.adoc > > index 785c0786e0..166d3cd19e 100644 > > --- a/Documentation/rev-list-options.adoc > > +++ b/Documentation/rev-list-options.adoc > > @@ -361,6 +361,29 @@ ifdef::git-rev-list[] > > --progress=<header>:: > > Show progress reports on stderr as objects are considered. The > > `<header>` text will be printed with each progress update. > > + > > +-z:: > > + Instead of being newline-delimited, each outputted object and its > > + accompanying metadata is delimited using NUL bytes in the following > > + form: > > ++ > > +----------------------------------------------------------------------- > > +<OID> NUL [<token>=<value> NUL]... > > +----------------------------------------------------------------------- > > ++ > > +Additional object metadata, such as object paths, is printed using the > > +`<token>=<value>` form. Token values are printed as-is without any > > +encoding/truncation. An OID entry never contains a '=' character and thus > > +is used to signal the start of a new object record. Examples: > > ++ > > +----------------------------------------------------------------------- > > +<OID> NUL > > +<OID> NUL path=<path> NUL > > +----------------------------------------------------------------------- > > ++ > > +This mode is only compatible with the `--objects` output option. Also, revision > > +and pathspec argument parsing on stdin with the `--stdin` option is NUL byte > > +delimited instead of using newlines while in this mode. > > endif::git-rev-list[] > > > > History Simplification > > I feel like this last paragraph, where we talk about `--stdin` being > NUL-delimited, should already be mentioned in the first paragraph. That's fair. I'll move the `--stdin` part to the beginning. -Justin ^ permalink raw reply [flat|nested] 60+ messages in thread
* [PATCH v2 5/6] rev-list: support NUL-delimited --boundary option 2025-03-13 0:17 ` [PATCH v2 0/6] " Justin Tobler ` (3 preceding siblings ...) 2025-03-13 0:17 ` [PATCH v2 4/6] rev-list: support delimiting objects with NUL bytes Justin Tobler @ 2025-03-13 0:17 ` Justin Tobler 2025-03-13 0:17 ` [PATCH v2 6/6] rev-list: support NUL-delimited --missing option Justin Tobler 2025-03-13 23:57 ` [PATCH v3 0/6] rev-list: introduce NUL-delimited output mode Justin Tobler 6 siblings, 0 replies; 60+ messages in thread From: Justin Tobler @ 2025-03-13 0:17 UTC (permalink / raw) To: git; +Cc: ps, christian.couder, peff, ben.knoble, Justin Tobler The `--boundary` option for git-rev-list(1) prints boundary objects found while performing the object walk in the form: $ git rev-list --boundary <rev> -<oid> LF Add support for printing boundary objects in a NUL-delimited format when the `-z` option is enabled. $ git rev-list -z --boundary <rev> <oid> NUL boundary=yes NUL In this mode, instead of prefixing the boundary OID with '-', a separate `boundary=yes` token/value pair is appended. Signed-off-by: Justin Tobler <jltobler@gmail.com> --- Documentation/rev-list-options.adoc | 16 +++++++++------- builtin/rev-list.c | 9 +++++++-- t/t6000-rev-list-misc.sh | 16 ++++++++++++++++ 3 files changed, 32 insertions(+), 9 deletions(-) diff --git a/Documentation/rev-list-options.adoc b/Documentation/rev-list-options.adoc index 166d3cd19e..d400d76cf2 100644 --- a/Documentation/rev-list-options.adoc +++ b/Documentation/rev-list-options.adoc @@ -371,19 +371,21 @@ ifdef::git-rev-list[] <OID> NUL [<token>=<value> NUL]... ----------------------------------------------------------------------- + -Additional object metadata, such as object paths, is printed using the -`<token>=<value>` form. Token values are printed as-is without any -encoding/truncation. An OID entry never contains a '=' character and thus -is used to signal the start of a new object record. Examples: +Additional object metadata, such as object paths or boundary objects, is +printed using the `<token>=<value>` form. Token values are printed as-is +without any encoding/truncation. An OID entry never contains a '=' character +and thus is used to signal the start of a new object record. Examples: + ----------------------------------------------------------------------- <OID> NUL <OID> NUL path=<path> NUL +<OID> NUL boundary=yes NUL ----------------------------------------------------------------------- + -This mode is only compatible with the `--objects` output option. Also, revision -and pathspec argument parsing on stdin with the `--stdin` option is NUL byte -delimited instead of using newlines while in this mode. +This mode is only compatible with the `--objects` and `--boundary` output +options. Also, revision and pathspec argument parsing on stdin with the +`--stdin` option is NUL byte delimited instead of using newlines while in this +mode. endif::git-rev-list[] History Simplification diff --git a/builtin/rev-list.c b/builtin/rev-list.c index f048500679..7c6d4b25b0 100644 --- a/builtin/rev-list.c +++ b/builtin/rev-list.c @@ -240,13 +240,18 @@ static void show_commit(struct commit *commit, void *data) fputs(info->header_prefix, stdout); if (revs->include_header) { - if (!revs->graph) + if (!revs->graph && line_term) fputs(get_revision_mark(revs, commit), stdout); if (revs->abbrev_commit && revs->abbrev) fputs(repo_find_unique_abbrev(the_repository, &commit->object.oid, revs->abbrev), stdout); else fputs(oid_to_hex(&commit->object.oid), stdout); + + if (!line_term) { + if (commit->object.flags & BOUNDARY) + printf("%cboundary=yes", info_term); + } } if (revs->print_parents) { struct commit_list *parents = commit->parents; @@ -779,7 +784,7 @@ int cmd_rev_list(int argc, if (revs.graph || revs.verbose_header || show_disk_usage || info.show_timestamp || info.header_prefix || bisect_list || use_bitmap_index || revs.edge_hint || revs.left_right || - revs.cherry_mark || arg_missing_action || revs.boundary) + revs.cherry_mark || arg_missing_action) die(_("-z option used with unsupported option")); } diff --git a/t/t6000-rev-list-misc.sh b/t/t6000-rev-list-misc.sh index dfbbc0aee6..349bf5ec3d 100755 --- a/t/t6000-rev-list-misc.sh +++ b/t/t6000-rev-list-misc.sh @@ -217,4 +217,20 @@ test_expect_success 'rev-list -z --objects' ' test_cmp expect actual ' +test_expect_success 'rev-list -z --boundary' ' + test_when_finished rm -rf repo && + + git init repo && + test_commit -C repo 1 && + test_commit -C repo 2 && + + oid1=$(git -C repo rev-parse HEAD) && + oid2=$(git -C repo rev-parse HEAD~) && + + printf "%s\0%s\0boundary=yes\0" "$oid1" "$oid2" >expect && + git -C repo rev-list -z --boundary HEAD~.. >actual && + + test_cmp expect actual +' + test_done -- 2.49.0.rc2 ^ permalink raw reply related [flat|nested] 60+ messages in thread
* [PATCH v2 6/6] rev-list: support NUL-delimited --missing option 2025-03-13 0:17 ` [PATCH v2 0/6] " Justin Tobler ` (4 preceding siblings ...) 2025-03-13 0:17 ` [PATCH v2 5/6] rev-list: support NUL-delimited --boundary option Justin Tobler @ 2025-03-13 0:17 ` Justin Tobler 2025-03-13 12:55 ` Patrick Steinhardt 2025-03-13 23:57 ` [PATCH v3 0/6] rev-list: introduce NUL-delimited output mode Justin Tobler 6 siblings, 1 reply; 60+ messages in thread From: Justin Tobler @ 2025-03-13 0:17 UTC (permalink / raw) To: git; +Cc: ps, christian.couder, peff, ben.knoble, Justin Tobler The `--missing={print,print-info}` option for git-rev-list(1) prints missing objects found while performing the object walk in the form: $ git rev-list --missing=print-info <rev> ?<oid> [SP <token>=<value>]... LF Add support for printing missing objects in a NUL-delimited format when the `-z` option is enabled. $ git rev-list -z --missing=print-info <rev> <oid> NUL missing=yes NUL [<token>=<value> NUL]... In this mode, values containing special characters or spaces are printed as-is without being escaped or quoted. Instead of prefixing the missing OID with '?', a separate `missing=yes` token/value pair is appended. Signed-off-by: Justin Tobler <jltobler@gmail.com> --- Documentation/rev-list-options.adoc | 13 ++++++------ builtin/rev-list.c | 29 ++++++++++++++++++++------- t/t6022-rev-list-missing.sh | 31 +++++++++++++++++++++++++++++ 3 files changed, 60 insertions(+), 13 deletions(-) diff --git a/Documentation/rev-list-options.adoc b/Documentation/rev-list-options.adoc index d400d76cf2..145ded5c78 100644 --- a/Documentation/rev-list-options.adoc +++ b/Documentation/rev-list-options.adoc @@ -371,8 +371,8 @@ ifdef::git-rev-list[] <OID> NUL [<token>=<value> NUL]... ----------------------------------------------------------------------- + -Additional object metadata, such as object paths or boundary objects, is -printed using the `<token>=<value>` form. Token values are printed as-is +Additional object metadata, such as object paths or boundary/missing objects, +is printed using the `<token>=<value>` form. Token values are printed as-is without any encoding/truncation. An OID entry never contains a '=' character and thus is used to signal the start of a new object record. Examples: + @@ -380,12 +380,13 @@ and thus is used to signal the start of a new object record. Examples: <OID> NUL <OID> NUL path=<path> NUL <OID> NUL boundary=yes NUL +<OID> NUL missing=yes NUL [<token>=<value> NUL]... ----------------------------------------------------------------------- + -This mode is only compatible with the `--objects` and `--boundary` output -options. Also, revision and pathspec argument parsing on stdin with the -`--stdin` option is NUL byte delimited instead of using newlines while in this -mode. +This mode is only compatible with the `--objects`, `--boundary`, and +`--missing` output options. Also, revision and pathspec argument parsing on +stdin with the `--stdin` option is NUL byte delimited instead of using newlines +while in this mode. endif::git-rev-list[] History Simplification diff --git a/builtin/rev-list.c b/builtin/rev-list.c index 7c6d4b25b0..d7b4dd48ff 100644 --- a/builtin/rev-list.c +++ b/builtin/rev-list.c @@ -136,24 +136,39 @@ static void print_missing_object(struct missing_objects_map_entry *entry, { struct strbuf sb = STRBUF_INIT; + if (line_term) + putchar('?'); + + printf("%s", oid_to_hex(&entry->entry.oid)); + + if (!line_term) + printf("%cmissing=yes", info_term); + if (!print_missing_info) { - printf("?%s\n", oid_to_hex(&entry->entry.oid)); + putchar(line_term); return; } if (entry->path && *entry->path) { struct strbuf path = STRBUF_INIT; - strbuf_addstr(&sb, " path="); - quote_path(entry->path, NULL, &path, QUOTE_PATH_QUOTE_SP); - strbuf_addbuf(&sb, &path); + strbuf_addf(&sb, "%cpath=", info_term); + + if (line_term) { + quote_path(entry->path, NULL, &path, QUOTE_PATH_QUOTE_SP); + strbuf_addbuf(&sb, &path); + } else { + strbuf_addstr(&sb, entry->path); + } strbuf_release(&path); } if (entry->type) - strbuf_addf(&sb, " type=%s", type_name(entry->type)); + strbuf_addf(&sb, "%ctype=%s", info_term, type_name(entry->type)); + + fwrite(sb.buf, sizeof(char), sb.len, stdout); + putchar(line_term); - printf("?%s%s\n", oid_to_hex(&entry->entry.oid), sb.buf); strbuf_release(&sb); } @@ -784,7 +799,7 @@ int cmd_rev_list(int argc, if (revs.graph || revs.verbose_header || show_disk_usage || info.show_timestamp || info.header_prefix || bisect_list || use_bitmap_index || revs.edge_hint || revs.left_right || - revs.cherry_mark || arg_missing_action) + revs.cherry_mark) die(_("-z option used with unsupported option")); } diff --git a/t/t6022-rev-list-missing.sh b/t/t6022-rev-list-missing.sh index 3e2790d4c8..08e92dd002 100755 --- a/t/t6022-rev-list-missing.sh +++ b/t/t6022-rev-list-missing.sh @@ -198,4 +198,35 @@ do ' done +test_expect_success "-z nul-delimited --missing" ' + test_when_finished rm -rf repo && + + git init repo && + ( + cd repo && + git commit --allow-empty -m first && + + path="foo bar" && + echo foobar >"$path" && + git add -A && + git commit -m second && + + oid=$(git rev-parse "HEAD:$path") && + type="$(git cat-file -t $oid)" && + + obj_path=".git/objects/$(test_oid_to_path $oid)" && + + git rev-list -z --objects --no-object-names \ + HEAD ^"$oid" >expect && + printf "%s\0missing=yes\0path=%s\0type=%s\0" "$oid" "$path" \ + "$type" >>expect && + + mv "$obj_path" "$obj_path.hidden" && + git rev-list -z --objects --no-object-names \ + --missing=print-info HEAD >actual && + + test_cmp expect actual + ) +' + test_done -- 2.49.0.rc2 ^ permalink raw reply related [flat|nested] 60+ messages in thread
* Re: [PATCH v2 6/6] rev-list: support NUL-delimited --missing option 2025-03-13 0:17 ` [PATCH v2 6/6] rev-list: support NUL-delimited --missing option Justin Tobler @ 2025-03-13 12:55 ` Patrick Steinhardt 2025-03-13 14:51 ` Justin Tobler 0 siblings, 1 reply; 60+ messages in thread From: Patrick Steinhardt @ 2025-03-13 12:55 UTC (permalink / raw) To: Justin Tobler; +Cc: git, christian.couder, peff, ben.knoble On Wed, Mar 12, 2025 at 07:17:06PM -0500, Justin Tobler wrote: > diff --git a/Documentation/rev-list-options.adoc b/Documentation/rev-list-options.adoc > index d400d76cf2..145ded5c78 100644 > --- a/Documentation/rev-list-options.adoc > +++ b/Documentation/rev-list-options.adoc > @@ -371,8 +371,8 @@ ifdef::git-rev-list[] > <OID> NUL [<token>=<value> NUL]... > ----------------------------------------------------------------------- > + > -Additional object metadata, such as object paths or boundary objects, is > -printed using the `<token>=<value>` form. Token values are printed as-is > +Additional object metadata, such as object paths or boundary/missing objects, > +is printed using the `<token>=<value>` form. Token values are printed as-is > without any encoding/truncation. An OID entry never contains a '=' character > and thus is used to signal the start of a new object record. Examples: > + Nit: I don't think we need to update this paragraph here as it is written as a non-exhaustive list anyway. > diff --git a/builtin/rev-list.c b/builtin/rev-list.c > index 7c6d4b25b0..d7b4dd48ff 100644 > --- a/builtin/rev-list.c > +++ b/builtin/rev-list.c > @@ -136,24 +136,39 @@ static void print_missing_object(struct missing_objects_map_entry *entry, > { > struct strbuf sb = STRBUF_INIT; > > + if (line_term) > + putchar('?'); > + > + printf("%s", oid_to_hex(&entry->entry.oid)); > + > + if (!line_term) > + printf("%cmissing=yes", info_term); > + > if (!print_missing_info) { > - printf("?%s\n", oid_to_hex(&entry->entry.oid)); > + putchar(line_term); > return; > } > > if (entry->path && *entry->path) { > struct strbuf path = STRBUF_INIT; Nit: the variable and its cleanup could be moved closer to where it's used. > - strbuf_addstr(&sb, " path="); > - quote_path(entry->path, NULL, &path, QUOTE_PATH_QUOTE_SP); > - strbuf_addbuf(&sb, &path); > + strbuf_addf(&sb, "%cpath=", info_term); > + > + if (line_term) { > + quote_path(entry->path, NULL, &path, QUOTE_PATH_QUOTE_SP); > + strbuf_addbuf(&sb, &path); > + } else { > + strbuf_addstr(&sb, entry->path); > + } > > strbuf_release(&path); > } > if (entry->type) > - strbuf_addf(&sb, " type=%s", type_name(entry->type)); > + strbuf_addf(&sb, "%ctype=%s", info_term, type_name(entry->type)); > + > + fwrite(sb.buf, sizeof(char), sb.len, stdout); > + putchar(line_term); > > - printf("?%s%s\n", oid_to_hex(&entry->entry.oid), sb.buf); > strbuf_release(&sb); > } > Patrick ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH v2 6/6] rev-list: support NUL-delimited --missing option 2025-03-13 12:55 ` Patrick Steinhardt @ 2025-03-13 14:51 ` Justin Tobler 0 siblings, 0 replies; 60+ messages in thread From: Justin Tobler @ 2025-03-13 14:51 UTC (permalink / raw) To: Patrick Steinhardt; +Cc: git, christian.couder, peff, ben.knoble On 25/03/13 01:55PM, Patrick Steinhardt wrote: > On Wed, Mar 12, 2025 at 07:17:06PM -0500, Justin Tobler wrote: > > diff --git a/Documentation/rev-list-options.adoc b/Documentation/rev-list-options.adoc > > index d400d76cf2..145ded5c78 100644 > > --- a/Documentation/rev-list-options.adoc > > +++ b/Documentation/rev-list-options.adoc > > @@ -371,8 +371,8 @@ ifdef::git-rev-list[] > > <OID> NUL [<token>=<value> NUL]... > > ----------------------------------------------------------------------- > > + > > -Additional object metadata, such as object paths or boundary objects, is > > -printed using the `<token>=<value>` form. Token values are printed as-is > > +Additional object metadata, such as object paths or boundary/missing objects, > > +is printed using the `<token>=<value>` form. Token values are printed as-is > > without any encoding/truncation. An OID entry never contains a '=' character > > and thus is used to signal the start of a new object record. Examples: > > + > > Nit: I don't think we need to update this paragraph here as it is > written as a non-exhaustive list anyway. Ok, I'll omit this change and only keep the added example. > > diff --git a/builtin/rev-list.c b/builtin/rev-list.c > > index 7c6d4b25b0..d7b4dd48ff 100644 > > --- a/builtin/rev-list.c > > +++ b/builtin/rev-list.c > > @@ -136,24 +136,39 @@ static void print_missing_object(struct missing_objects_map_entry *entry, > > { > > struct strbuf sb = STRBUF_INIT; > > > > + if (line_term) > > + putchar('?'); > > + > > + printf("%s", oid_to_hex(&entry->entry.oid)); > > + > > + if (!line_term) > > + printf("%cmissing=yes", info_term); > > + > > if (!print_missing_info) { > > - printf("?%s\n", oid_to_hex(&entry->entry.oid)); > > + putchar(line_term); > > return; > > } > > > > if (entry->path && *entry->path) { > > struct strbuf path = STRBUF_INIT; > > Nit: the variable and its cleanup could be moved closer to where it's > used. Will do. Thanks -Justin ^ permalink raw reply [flat|nested] 60+ messages in thread
* [PATCH v3 0/6] rev-list: introduce NUL-delimited output mode 2025-03-13 0:17 ` [PATCH v2 0/6] " Justin Tobler ` (5 preceding siblings ...) 2025-03-13 0:17 ` [PATCH v2 6/6] rev-list: support NUL-delimited --missing option Justin Tobler @ 2025-03-13 23:57 ` Justin Tobler 2025-03-13 23:57 ` [PATCH v3 1/6] rev-list: inline `show_object_with_name()` in `show_object()` Justin Tobler ` (6 more replies) 6 siblings, 7 replies; 60+ messages in thread From: Justin Tobler @ 2025-03-13 23:57 UTC (permalink / raw) To: git; +Cc: ps, christian.couder, peff, ben.knoble, Justin Tobler When walking objects, git-rev-list(1) prints each object entry on a separate line in the form: <oid> LF Some options, such as `--objects`, may print additional information about the object on the same line: <oid> SP [<path>] LF In this mode, if the object path contains a newline it is truncated at the newline. The `--boundary` option also modifies output by prefixing boundary objects with `-`: -<oid> LF When the `--missing={print,print-info}` option is provided, information about any missing objects encountered during the object walk are also printed in the form: ?<oid> [SP <token>=<value>]... LF where values containing LF or SP are printed in a token specific fashion so that the resulting encoded value does not contain either of these two problematic bytes. For example, missing object paths are quoted in the C style when they contain LF or SP. To make machine parsing easier, this series introduces a NUL-delimited output mode for git-rev-list(1) via a `-z` option. In this mode, the output format for object records is unified such that each object and its accompanying metadata is formatted without relying on object metadata order. This format follows the existing `<token>=<value>` used by the `--missing` option to represent object metadata in the form: <oid> NUL [<token>=<value> NUL]... # Examples <oid> LF -> <oid> NUL <oid> SP <path> LF -> <oid> NUL path=<path> NUL -<oid> LF -> <oid> NUL boundary=yes NUL ?<oid> [SP <token>=<value>]... -> <oid> NUL missing=yes NUL [<token>=<value> NUL]... Note that token value info is printed as-is without any special encoding or truncation. Prefixes such as '-' and '?' are dropped in favor using a token/value pair to signal the same information. While in this mode, if the `--sdtin` option is used, revision and pathspec arguments read from stdin are separated with a NUL byte instead of being newline delimited. For now this series only adds support for use with the `--objects`, `--boundary` and `--missing` output options. Usage of `-z` with other options is rejected, so it can potentially be added in the future. This series is structured as follows: - Patches 1 and 2 do some minor preparatory refactors. - Patch 3 modifies stdin argument parsing handled by `setup_revisions()` to support NUL-delimited arguments. - Patch 4 adds the `-z` option to git-rev-list(1) to print objects in a NUL-delimited fashion. Arguments parsed on stdin while in the mode are also NUL-delimited. - Patch 5 teaches the `--boundary` option how to print info in a NUL-delimited fashino using the unified output format. - Patch 6 teaches the `--missing` option how to print info in a NUL-delimited fashion using the unified output format. Changes since V2: - In patch 4, the documentation for the -z option now points out the `--stdin` behavior change earlier. - Minor code style and documentation changes in patch 6. Changes since V1: - Use unified output format with `<token>=<value>` pairs for all object metadata. - Add support for the `--boundary` option in NUL-delimited mode. - Add support for NUL-delimited stdin argument parsing in NUL-delimited mode. - Instead of using two NUL bytes to delimit between object records, a single NUL byte is used. Now that object metadata is always in the form `<token>=<value>`, we know a new object record starts when there is an OID entry which will not contain '='. Thanks for taking a look, -Justin Justin Tobler (6): rev-list: inline `show_object_with_name()` in `show_object()` rev-list: refactor early option parsing revision: support NUL-delimited --stdin mode rev-list: support delimiting objects with NUL bytes rev-list: support NUL-delimited --boundary option rev-list: support NUL-delimited --missing option Documentation/rev-list-options.adoc | 26 ++++++++ builtin/rev-list.c | 94 +++++++++++++++++++++-------- revision.c | 27 ++++----- revision.h | 5 +- t/t6000-rev-list-misc.sh | 51 ++++++++++++++++ t/t6017-rev-list-stdin.sh | 9 +++ t/t6022-rev-list-missing.sh | 31 ++++++++++ 7 files changed, 200 insertions(+), 43 deletions(-) Range-diff against v2: 1: d2eded3ac7 = 1: d2eded3ac7 rev-list: inline `show_object_with_name()` in `show_object()` 2: 03cd08c859 = 2: 03cd08c859 rev-list: refactor early option parsing 3: 803a49933a = 3: 803a49933a revision: support NUL-delimited --stdin mode 4: d3b3c4ef89 ! 4: 8eb7669089 rev-list: support delimiting objects with NUL bytes @@ Documentation/rev-list-options.adoc: ifdef::git-rev-list[] + +-z:: + Instead of being newline-delimited, each outputted object and its -+ accompanying metadata is delimited using NUL bytes in the following -+ form: ++ accompanying metadata is delimited using NUL bytes. In this mode, when ++ the `--stdin` option is provided, revision and pathspec arguments on ++ stdin are also delimited using a NUL byte. Output is printed in the ++ following form: ++ +----------------------------------------------------------------------- +<OID> NUL [<token>=<value> NUL]... @@ Documentation/rev-list-options.adoc: ifdef::git-rev-list[] +<OID> NUL path=<path> NUL +----------------------------------------------------------------------- ++ -+This mode is only compatible with the `--objects` output option. Also, revision -+and pathspec argument parsing on stdin with the `--stdin` option is NUL byte -+delimited instead of using newlines while in this mode. ++This mode is only compatible with the `--objects` output option. endif::git-rev-list[] History Simplification 5: 5e4fc41976 ! 5: 591a2c7dac rev-list: support NUL-delimited --boundary option @@ Documentation/rev-list-options.adoc: ifdef::git-rev-list[] +<OID> NUL boundary=yes NUL ----------------------------------------------------------------------- + --This mode is only compatible with the `--objects` output option. Also, revision --and pathspec argument parsing on stdin with the `--stdin` option is NUL byte --delimited instead of using newlines while in this mode. +-This mode is only compatible with the `--objects` output option. +This mode is only compatible with the `--objects` and `--boundary` output -+options. Also, revision and pathspec argument parsing on stdin with the -+`--stdin` option is NUL byte delimited instead of using newlines while in this -+mode. ++options. endif::git-rev-list[] History Simplification 6: 7744966514 ! 6: 669b3b5d9f rev-list: support NUL-delimited --missing option @@ Commit message Signed-off-by: Justin Tobler <jltobler@gmail.com> ## Documentation/rev-list-options.adoc ## -@@ Documentation/rev-list-options.adoc: ifdef::git-rev-list[] - <OID> NUL [<token>=<value> NUL]... - ----------------------------------------------------------------------- - + --Additional object metadata, such as object paths or boundary objects, is --printed using the `<token>=<value>` form. Token values are printed as-is -+Additional object metadata, such as object paths or boundary/missing objects, -+is printed using the `<token>=<value>` form. Token values are printed as-is - without any encoding/truncation. An OID entry never contains a '=' character - and thus is used to signal the start of a new object record. Examples: - + @@ Documentation/rev-list-options.adoc: and thus is used to signal the start of a new object record. Examples: <OID> NUL <OID> NUL path=<path> NUL @@ Documentation/rev-list-options.adoc: and thus is used to signal the start of a n ----------------------------------------------------------------------- + -This mode is only compatible with the `--objects` and `--boundary` output --options. Also, revision and pathspec argument parsing on stdin with the --`--stdin` option is NUL byte delimited instead of using newlines while in this --mode. +-options. +This mode is only compatible with the `--objects`, `--boundary`, and -+`--missing` output options. Also, revision and pathspec argument parsing on -+stdin with the `--stdin` option is NUL byte delimited instead of using newlines -+while in this mode. ++`--missing` output options. endif::git-rev-list[] History Simplification @@ builtin/rev-list.c: static void print_missing_object(struct missing_objects_map_ struct strbuf sb = STRBUF_INIT; + if (line_term) -+ putchar('?'); -+ -+ printf("%s", oid_to_hex(&entry->entry.oid)); -+ -+ if (!line_term) -+ printf("%cmissing=yes", info_term); ++ printf("?%s", oid_to_hex(&entry->entry.oid)); ++ else ++ printf("%s%cmissing=yes", oid_to_hex(&entry->entry.oid), ++ info_term); + if (!print_missing_info) { - printf("?%s\n", oid_to_hex(&entry->entry.oid)); @@ builtin/rev-list.c: static void print_missing_object(struct missing_objects_map_ } if (entry->path && *entry->path) { - struct strbuf path = STRBUF_INIT; +- struct strbuf path = STRBUF_INIT; ++ strbuf_addf(&sb, "%cpath=", info_term); ++ ++ if (line_term) { ++ struct strbuf path = STRBUF_INIT; - strbuf_addstr(&sb, " path="); - quote_path(entry->path, NULL, &path, QUOTE_PATH_QUOTE_SP); - strbuf_addbuf(&sb, &path); -+ strbuf_addf(&sb, "%cpath=", info_term); -+ -+ if (line_term) { + quote_path(entry->path, NULL, &path, QUOTE_PATH_QUOTE_SP); + strbuf_addbuf(&sb, &path); + +- strbuf_release(&path); ++ strbuf_release(&path); + } else { + strbuf_addstr(&sb, entry->path); + } - - strbuf_release(&path); } if (entry->type) - strbuf_addf(&sb, " type=%s", type_name(entry->type)); base-commit: 87a0bdbf0f72b7561f3cd50636eee33dcb7dbcc3 -- 2.49.0.rc2 ^ permalink raw reply [flat|nested] 60+ messages in thread
* [PATCH v3 1/6] rev-list: inline `show_object_with_name()` in `show_object()` 2025-03-13 23:57 ` [PATCH v3 0/6] rev-list: introduce NUL-delimited output mode Justin Tobler @ 2025-03-13 23:57 ` Justin Tobler 2025-03-13 23:57 ` [PATCH v3 2/6] rev-list: refactor early option parsing Justin Tobler ` (5 subsequent siblings) 6 siblings, 0 replies; 60+ messages in thread From: Justin Tobler @ 2025-03-13 23:57 UTC (permalink / raw) To: git; +Cc: ps, christian.couder, peff, ben.knoble, Justin Tobler The `show_object_with_name()` function only has a single call site. Inline call to `show_object_with_name()` in `show_object()` so the explicit function can be cleaned up and live closer to where it is used. While at it, factor out the code that prints the OID and newline for both objects with and without a name. In a subsequent commit, `show_object()` is modified to support printing object information in a NUL-delimited format. Signed-off-by: Justin Tobler <jltobler@gmail.com> --- builtin/rev-list.c | 13 +++++++++---- revision.c | 8 -------- revision.h | 2 -- 3 files changed, 9 insertions(+), 14 deletions(-) diff --git a/builtin/rev-list.c b/builtin/rev-list.c index bb26bee0d4..dcd079c16c 100644 --- a/builtin/rev-list.c +++ b/builtin/rev-list.c @@ -357,10 +357,15 @@ static void show_object(struct object *obj, const char *name, void *cb_data) return; } - if (arg_show_object_names) - show_object_with_name(stdout, obj, name); - else - printf("%s\n", oid_to_hex(&obj->oid)); + printf("%s", oid_to_hex(&obj->oid)); + + if (arg_show_object_names) { + putchar(' '); + for (const char *p = name; *p && *p != '\n'; p++) + putchar(*p); + } + + putchar('\n'); } static void show_edge(struct commit *commit) diff --git a/revision.c b/revision.c index c4390f0938..0eaebe4478 100644 --- a/revision.c +++ b/revision.c @@ -59,14 +59,6 @@ implement_shared_commit_slab(revision_sources, char *); static inline int want_ancestry(const struct rev_info *revs); -void show_object_with_name(FILE *out, struct object *obj, const char *name) -{ - fprintf(out, "%s ", oid_to_hex(&obj->oid)); - for (const char *p = name; *p && *p != '\n'; p++) - fputc(*p, out); - fputc('\n', out); -} - static void mark_blob_uninteresting(struct blob *blob) { if (!blob) diff --git a/revision.h b/revision.h index 71e984c452..21c6a69899 100644 --- a/revision.h +++ b/revision.h @@ -489,8 +489,6 @@ void mark_parents_uninteresting(struct rev_info *revs, struct commit *commit); void mark_tree_uninteresting(struct repository *r, struct tree *tree); void mark_trees_uninteresting_sparse(struct repository *r, struct oidset *trees); -void show_object_with_name(FILE *, struct object *, const char *); - /** * Helpers to check if a reference should be excluded. */ -- 2.49.0.rc2 ^ permalink raw reply related [flat|nested] 60+ messages in thread
* [PATCH v3 2/6] rev-list: refactor early option parsing 2025-03-13 23:57 ` [PATCH v3 0/6] rev-list: introduce NUL-delimited output mode Justin Tobler 2025-03-13 23:57 ` [PATCH v3 1/6] rev-list: inline `show_object_with_name()` in `show_object()` Justin Tobler @ 2025-03-13 23:57 ` Justin Tobler 2025-03-13 23:57 ` [PATCH v3 3/6] revision: support NUL-delimited --stdin mode Justin Tobler ` (4 subsequent siblings) 6 siblings, 0 replies; 60+ messages in thread From: Justin Tobler @ 2025-03-13 23:57 UTC (permalink / raw) To: git; +Cc: ps, christian.couder, peff, ben.knoble, Justin Tobler Before invoking `setup_revisions()`, the `--missing` and `--exclude-promisor-objects` options are parsed early. In a subsequent commit, another option is added that must be parsed early. Refactor the code to parse both options in a single early pass. Signed-off-by: Justin Tobler <jltobler@gmail.com> --- builtin/rev-list.c | 17 +++++++---------- 1 file changed, 7 insertions(+), 10 deletions(-) diff --git a/builtin/rev-list.c b/builtin/rev-list.c index dcd079c16c..04d9c893b5 100644 --- a/builtin/rev-list.c +++ b/builtin/rev-list.c @@ -16,6 +16,7 @@ #include "object-file.h" #include "object-store-ll.h" #include "pack-bitmap.h" +#include "parse-options.h" #include "log-tree.h" #include "graph.h" #include "bisect.h" @@ -639,19 +640,15 @@ int cmd_rev_list(int argc, if (!strcmp(arg, "--exclude-promisor-objects")) { fetch_if_missing = 0; revs.exclude_promisor_objects = 1; - break; - } - } - for (i = 1; i < argc; i++) { - const char *arg = argv[i]; - if (skip_prefix(arg, "--missing=", &arg)) { - if (revs.exclude_promisor_objects) - die(_("options '%s' and '%s' cannot be used together"), "--exclude-promisor-objects", "--missing"); - if (parse_missing_action_value(arg)) - break; + } else if (skip_prefix(arg, "--missing=", &arg)) { + parse_missing_action_value(arg); } } + die_for_incompatible_opt2(revs.exclude_promisor_objects, + "--exclude_promisor_objects", + arg_missing_action, "--missing"); + if (arg_missing_action) revs.do_not_die_on_missing_objects = 1; -- 2.49.0.rc2 ^ permalink raw reply related [flat|nested] 60+ messages in thread
* [PATCH v3 3/6] revision: support NUL-delimited --stdin mode 2025-03-13 23:57 ` [PATCH v3 0/6] rev-list: introduce NUL-delimited output mode Justin Tobler 2025-03-13 23:57 ` [PATCH v3 1/6] rev-list: inline `show_object_with_name()` in `show_object()` Justin Tobler 2025-03-13 23:57 ` [PATCH v3 2/6] rev-list: refactor early option parsing Justin Tobler @ 2025-03-13 23:57 ` Justin Tobler 2025-03-13 23:57 ` [PATCH v3 4/6] rev-list: support delimiting objects with NUL bytes Justin Tobler ` (3 subsequent siblings) 6 siblings, 0 replies; 60+ messages in thread From: Justin Tobler @ 2025-03-13 23:57 UTC (permalink / raw) To: git; +Cc: ps, christian.couder, peff, ben.knoble, Justin Tobler When `setup_revisions()` parses the `--stdin` option, revision and pathspec arguments are read from stdin. Each line of input is handled as a separate argument. Introduce the `nul_delim_stdin` field to `setup_revision_opt` that, when enabled, uses a NUL byte to delimit between stdin arguments instead of newline. Signed-off-by: Justin Tobler <jltobler@gmail.com> --- revision.c | 19 +++++++++++-------- revision.h | 3 ++- 2 files changed, 13 insertions(+), 9 deletions(-) diff --git a/revision.c b/revision.c index 0eaebe4478..5de6309830 100644 --- a/revision.c +++ b/revision.c @@ -2275,10 +2275,10 @@ int handle_revision_arg(const char *arg, struct rev_info *revs, int flags, unsig return ret; } -static void read_pathspec_from_stdin(struct strbuf *sb, - struct strvec *prune) +static void read_pathspec_from_stdin(struct strbuf *sb, struct strvec *prune, + int line_term) { - while (strbuf_getline(sb, stdin) != EOF) + while (strbuf_getdelim_strip_crlf(sb, stdin, line_term) != EOF) strvec_push(prune, sb->buf); } @@ -2905,8 +2905,8 @@ static int handle_revision_pseudo_opt(struct rev_info *revs, return 1; } -static void read_revisions_from_stdin(struct rev_info *revs, - struct strvec *prune) +static void read_revisions_from_stdin(struct rev_info *revs, struct strvec *prune, + int line_term) { struct strbuf sb; int seen_dashdash = 0; @@ -2918,7 +2918,7 @@ static void read_revisions_from_stdin(struct rev_info *revs, warn_on_object_refname_ambiguity = 0; strbuf_init(&sb, 1000); - while (strbuf_getline(&sb, stdin) != EOF) { + while (strbuf_getdelim_strip_crlf(&sb, stdin, line_term) != EOF) { if (!sb.len) break; @@ -2946,7 +2946,7 @@ static void read_revisions_from_stdin(struct rev_info *revs, die("bad revision '%s'", sb.buf); } if (seen_dashdash) - read_pathspec_from_stdin(&sb, prune); + read_pathspec_from_stdin(&sb, prune, line_term); strbuf_release(&sb); warn_on_object_refname_ambiguity = save_warning; @@ -3019,13 +3019,16 @@ int setup_revisions(int argc, const char **argv, struct rev_info *revs, struct s } if (!strcmp(arg, "--stdin")) { + int term = opt && opt->nul_delim_stdin ? '\0' : '\n'; + if (revs->disable_stdin) { argv[left++] = arg; continue; } if (revs->read_from_stdin++) die("--stdin given twice?"); - read_revisions_from_stdin(revs, &prune_data); + read_revisions_from_stdin(revs, &prune_data, + term); continue; } diff --git a/revision.h b/revision.h index 21c6a69899..0e680c3667 100644 --- a/revision.h +++ b/revision.h @@ -439,7 +439,8 @@ struct setup_revision_opt { void (*tweak)(struct rev_info *); unsigned int assume_dashdash:1, allow_exclude_promisor_objects:1, - free_removed_argv_elements:1; + free_removed_argv_elements:1, + nul_delim_stdin:1; unsigned revarg_opt; }; int setup_revisions(int argc, const char **argv, struct rev_info *revs, -- 2.49.0.rc2 ^ permalink raw reply related [flat|nested] 60+ messages in thread
* [PATCH v3 4/6] rev-list: support delimiting objects with NUL bytes 2025-03-13 23:57 ` [PATCH v3 0/6] rev-list: introduce NUL-delimited output mode Justin Tobler ` (2 preceding siblings ...) 2025-03-13 23:57 ` [PATCH v3 3/6] revision: support NUL-delimited --stdin mode Justin Tobler @ 2025-03-13 23:57 ` Justin Tobler 2025-03-19 12:35 ` Christian Couder 2025-03-13 23:57 ` [PATCH v3 5/6] rev-list: support NUL-delimited --boundary option Justin Tobler ` (2 subsequent siblings) 6 siblings, 1 reply; 60+ messages in thread From: Justin Tobler @ 2025-03-13 23:57 UTC (permalink / raw) To: git; +Cc: ps, christian.couder, peff, ben.knoble, Justin Tobler When walking objects, git-rev-list(1) prints each object entry on a separate line. Some options, such as `--objects`, may print additional information about tree and blob object on the same line in the form: $ git rev-list --objects <rev> <tree/blob oid> SP [<path>] LF Note that in this form the SP is appended regardless of whether the tree or blob object has path information available. Paths containing a newline are also truncated at the newline. Introduce the `-z` option for git-rev-list(1) which reformats the output to use NUL-delimiters between objects and associated info in the following form: $ git rev-list -z --objects <rev> <oid> NUL [path=<path> NUL] In this form, the start of each record is signaled by an OID entry that is all hexidecimal and does not contain any '='. Additional path info from `--objects` is appended to the record as a token/value pair `path=<path>` as-is without any truncation. In this mode, revision and pathspec arguments provided on stdin with the `--stdin` option are also separated by a NUL byte instead of being newline delimited. For now, the `--objects` and `--stdin` flag are the only options that can be used in combination with `-z`. In a subsequent commit, NUL-delimited support for other options is added. Other options that do not make sense with be used in combination with `-z` are rejected. Signed-off-by: Justin Tobler <jltobler@gmail.com> --- Documentation/rev-list-options.adoc | 23 ++++++++++++++++++ builtin/rev-list.c | 36 +++++++++++++++++++++++++---- t/t6000-rev-list-misc.sh | 35 ++++++++++++++++++++++++++++ t/t6017-rev-list-stdin.sh | 9 ++++++++ 4 files changed, 98 insertions(+), 5 deletions(-) diff --git a/Documentation/rev-list-options.adoc b/Documentation/rev-list-options.adoc index 785c0786e0..14d82fdfbf 100644 --- a/Documentation/rev-list-options.adoc +++ b/Documentation/rev-list-options.adoc @@ -361,6 +361,29 @@ ifdef::git-rev-list[] --progress=<header>:: Show progress reports on stderr as objects are considered. The `<header>` text will be printed with each progress update. + +-z:: + Instead of being newline-delimited, each outputted object and its + accompanying metadata is delimited using NUL bytes. In this mode, when + the `--stdin` option is provided, revision and pathspec arguments on + stdin are also delimited using a NUL byte. Output is printed in the + following form: ++ +----------------------------------------------------------------------- +<OID> NUL [<token>=<value> NUL]... +----------------------------------------------------------------------- ++ +Additional object metadata, such as object paths, is printed using the +`<token>=<value>` form. Token values are printed as-is without any +encoding/truncation. An OID entry never contains a '=' character and thus +is used to signal the start of a new object record. Examples: ++ +----------------------------------------------------------------------- +<OID> NUL +<OID> NUL path=<path> NUL +----------------------------------------------------------------------- ++ +This mode is only compatible with the `--objects` output option. endif::git-rev-list[] History Simplification diff --git a/builtin/rev-list.c b/builtin/rev-list.c index 04d9c893b5..f048500679 100644 --- a/builtin/rev-list.c +++ b/builtin/rev-list.c @@ -65,6 +65,7 @@ static const char rev_list_usage[] = " --abbrev-commit\n" " --left-right\n" " --count\n" +" -z\n" " special purpose:\n" " --bisect\n" " --bisect-vars\n" @@ -97,6 +98,9 @@ static int arg_show_object_names = 1; #define DEFAULT_OIDSET_SIZE (16*1024) +static char line_term = '\n'; +static char info_term = ' '; + static int show_disk_usage; static off_t total_disk_usage; static int human_readable; @@ -264,7 +268,7 @@ static void show_commit(struct commit *commit, void *data) if (revs->commit_format == CMIT_FMT_ONELINE) putchar(' '); else if (revs->include_header) - putchar('\n'); + putchar(line_term); if (revs->verbose_header) { struct strbuf buf = STRBUF_INIT; @@ -361,12 +365,16 @@ static void show_object(struct object *obj, const char *name, void *cb_data) printf("%s", oid_to_hex(&obj->oid)); if (arg_show_object_names) { - putchar(' '); - for (const char *p = name; *p && *p != '\n'; p++) - putchar(*p); + if (line_term) { + putchar(info_term); + for (const char *p = name; *p && *p != '\n'; p++) + putchar(*p); + } else if (*name) { + printf("%cpath=%s", info_term, name); + } } - putchar('\n'); + putchar(line_term); } static void show_edge(struct commit *commit) @@ -642,6 +650,10 @@ int cmd_rev_list(int argc, revs.exclude_promisor_objects = 1; } else if (skip_prefix(arg, "--missing=", &arg)) { parse_missing_action_value(arg); + } else if (!strcmp(arg, "-z")) { + s_r_opt.nul_delim_stdin = 1; + line_term = '\0'; + info_term = '\0'; } } @@ -757,6 +769,20 @@ int cmd_rev_list(int argc, usage(rev_list_usage); } + + /* + * Reject options currently incompatible with -z. For some options, this + * is not an inherent limitation and support may be implemented in the + * future. + */ + if (!line_term) { + if (revs.graph || revs.verbose_header || show_disk_usage || + info.show_timestamp || info.header_prefix || bisect_list || + use_bitmap_index || revs.edge_hint || revs.left_right || + revs.cherry_mark || arg_missing_action || revs.boundary) + die(_("-z option used with unsupported option")); + } + if (revs.commit_format != CMIT_FMT_USERFORMAT) revs.include_header = 1; if (revs.commit_format != CMIT_FMT_UNSPECIFIED) { diff --git a/t/t6000-rev-list-misc.sh b/t/t6000-rev-list-misc.sh index 6289a2e8b0..dfbbc0aee6 100755 --- a/t/t6000-rev-list-misc.sh +++ b/t/t6000-rev-list-misc.sh @@ -182,4 +182,39 @@ test_expect_success 'rev-list --unpacked' ' test_cmp expect actual ' +test_expect_success 'rev-list -z' ' + test_when_finished rm -rf repo && + + git init repo && + test_commit -C repo 1 && + test_commit -C repo 2 && + + oid1=$(git -C repo rev-parse HEAD) && + oid2=$(git -C repo rev-parse HEAD~) && + + printf "%s\0%s\0" "$oid1" "$oid2" >expect && + git -C repo rev-list -z HEAD >actual && + + test_cmp expect actual +' + +test_expect_success 'rev-list -z --objects' ' + test_when_finished rm -rf repo && + + git init repo && + test_commit -C repo 1 && + test_commit -C repo 2 && + + oid1=$(git -C repo rev-parse HEAD:1.t) && + oid2=$(git -C repo rev-parse HEAD:2.t) && + path1=1.t && + path2=2.t && + + printf "%s\0path=%s\0%s\0path=%s\0" "$oid1" "$path1" "$oid2" "$path2" \ + >expect && + git -C repo rev-list -z --objects HEAD:1.t HEAD:2.t >actual && + + test_cmp expect actual +' + test_done diff --git a/t/t6017-rev-list-stdin.sh b/t/t6017-rev-list-stdin.sh index 4821b90e74..362a8b126a 100755 --- a/t/t6017-rev-list-stdin.sh +++ b/t/t6017-rev-list-stdin.sh @@ -148,4 +148,13 @@ test_expect_success '--not via stdin does not influence revisions from command l test_cmp expect actual ' +test_expect_success 'NUL-delimited stdin' ' + printf "%s\0%s\0%s\0" "HEAD" "--" "file-1" > input && + + git rev-list -z --objects HEAD -- file-1 >expect && + git rev-list -z --objects --stdin <input >actual && + + test_cmp expect actual +' + test_done -- 2.49.0.rc2 ^ permalink raw reply related [flat|nested] 60+ messages in thread
* Re: [PATCH v3 4/6] rev-list: support delimiting objects with NUL bytes 2025-03-13 23:57 ` [PATCH v3 4/6] rev-list: support delimiting objects with NUL bytes Justin Tobler @ 2025-03-19 12:35 ` Christian Couder 2025-03-19 16:02 ` Justin Tobler 0 siblings, 1 reply; 60+ messages in thread From: Christian Couder @ 2025-03-19 12:35 UTC (permalink / raw) To: Justin Tobler; +Cc: git, ps, peff, ben.knoble On Fri, Mar 14, 2025 at 1:01 AM Justin Tobler <jltobler@gmail.com> wrote: > For now, the `--objects` and `--stdin` flag are the only options that > can be used in combination with `-z`. In a subsequent commit, > NUL-delimited support for other options is added. Other options that do > not make sense with be used in combination with `-z` are rejected. s/with be used/when used/ [...] > +test_expect_success 'rev-list -z' ' > + test_when_finished rm -rf repo && > + > + git init repo && > + test_commit -C repo 1 && > + test_commit -C repo 2 && > + > + oid1=$(git -C repo rev-parse HEAD) && > + oid2=$(git -C repo rev-parse HEAD~) && It seems to me that HEAD is at commit 2 and HEAD~ at commit 1 instead of the other way around. It looks like there is the same issue in the test added in the next patch ("[PATCH v3 5/6] rev-list: support NUL-delimited --boundary option") > + printf "%s\0%s\0" "$oid1" "$oid2" >expect && > + git -C repo rev-list -z HEAD >actual && > + > + test_cmp expect actual > +' Otherwise the whole patch series looks good to me. Thanks. ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH v3 4/6] rev-list: support delimiting objects with NUL bytes 2025-03-19 12:35 ` Christian Couder @ 2025-03-19 16:02 ` Justin Tobler 0 siblings, 0 replies; 60+ messages in thread From: Justin Tobler @ 2025-03-19 16:02 UTC (permalink / raw) To: Christian Couder; +Cc: git, ps, peff, ben.knoble On 25/03/19 01:35PM, Christian Couder wrote: > On Fri, Mar 14, 2025 at 1:01 AM Justin Tobler <jltobler@gmail.com> wrote: > > > +test_expect_success 'rev-list -z' ' > > + test_when_finished rm -rf repo && > > + > > + git init repo && > > + test_commit -C repo 1 && > > + test_commit -C repo 2 && > > + > > + oid1=$(git -C repo rev-parse HEAD) && > > + oid2=$(git -C repo rev-parse HEAD~) && > > It seems to me that HEAD is at commit 2 and HEAD~ at commit 1 instead > of the other way around. In this case, oid1 and oid2 were ordered based on how they would show up in ouput, but this is somewhat confusing because its not the order they were committed in. I'll change it to be in commit order instead. Thanks, -Justin ^ permalink raw reply [flat|nested] 60+ messages in thread
* [PATCH v3 5/6] rev-list: support NUL-delimited --boundary option 2025-03-13 23:57 ` [PATCH v3 0/6] rev-list: introduce NUL-delimited output mode Justin Tobler ` (3 preceding siblings ...) 2025-03-13 23:57 ` [PATCH v3 4/6] rev-list: support delimiting objects with NUL bytes Justin Tobler @ 2025-03-13 23:57 ` Justin Tobler 2025-03-13 23:57 ` [PATCH v3 6/6] rev-list: support NUL-delimited --missing option Justin Tobler 2025-03-19 18:34 ` [PATCH v4 0/5] rev-list: introduce NUL-delimited output mode Justin Tobler 6 siblings, 0 replies; 60+ messages in thread From: Justin Tobler @ 2025-03-13 23:57 UTC (permalink / raw) To: git; +Cc: ps, christian.couder, peff, ben.knoble, Justin Tobler The `--boundary` option for git-rev-list(1) prints boundary objects found while performing the object walk in the form: $ git rev-list --boundary <rev> -<oid> LF Add support for printing boundary objects in a NUL-delimited format when the `-z` option is enabled. $ git rev-list -z --boundary <rev> <oid> NUL boundary=yes NUL In this mode, instead of prefixing the boundary OID with '-', a separate `boundary=yes` token/value pair is appended. Signed-off-by: Justin Tobler <jltobler@gmail.com> --- Documentation/rev-list-options.adoc | 12 +++++++----- builtin/rev-list.c | 9 +++++++-- t/t6000-rev-list-misc.sh | 16 ++++++++++++++++ 3 files changed, 30 insertions(+), 7 deletions(-) diff --git a/Documentation/rev-list-options.adoc b/Documentation/rev-list-options.adoc index 14d82fdfbf..92ac31a8e8 100644 --- a/Documentation/rev-list-options.adoc +++ b/Documentation/rev-list-options.adoc @@ -373,17 +373,19 @@ ifdef::git-rev-list[] <OID> NUL [<token>=<value> NUL]... ----------------------------------------------------------------------- + -Additional object metadata, such as object paths, is printed using the -`<token>=<value>` form. Token values are printed as-is without any -encoding/truncation. An OID entry never contains a '=' character and thus -is used to signal the start of a new object record. Examples: +Additional object metadata, such as object paths or boundary objects, is +printed using the `<token>=<value>` form. Token values are printed as-is +without any encoding/truncation. An OID entry never contains a '=' character +and thus is used to signal the start of a new object record. Examples: + ----------------------------------------------------------------------- <OID> NUL <OID> NUL path=<path> NUL +<OID> NUL boundary=yes NUL ----------------------------------------------------------------------- + -This mode is only compatible with the `--objects` output option. +This mode is only compatible with the `--objects` and `--boundary` output +options. endif::git-rev-list[] History Simplification diff --git a/builtin/rev-list.c b/builtin/rev-list.c index f048500679..7c6d4b25b0 100644 --- a/builtin/rev-list.c +++ b/builtin/rev-list.c @@ -240,13 +240,18 @@ static void show_commit(struct commit *commit, void *data) fputs(info->header_prefix, stdout); if (revs->include_header) { - if (!revs->graph) + if (!revs->graph && line_term) fputs(get_revision_mark(revs, commit), stdout); if (revs->abbrev_commit && revs->abbrev) fputs(repo_find_unique_abbrev(the_repository, &commit->object.oid, revs->abbrev), stdout); else fputs(oid_to_hex(&commit->object.oid), stdout); + + if (!line_term) { + if (commit->object.flags & BOUNDARY) + printf("%cboundary=yes", info_term); + } } if (revs->print_parents) { struct commit_list *parents = commit->parents; @@ -779,7 +784,7 @@ int cmd_rev_list(int argc, if (revs.graph || revs.verbose_header || show_disk_usage || info.show_timestamp || info.header_prefix || bisect_list || use_bitmap_index || revs.edge_hint || revs.left_right || - revs.cherry_mark || arg_missing_action || revs.boundary) + revs.cherry_mark || arg_missing_action) die(_("-z option used with unsupported option")); } diff --git a/t/t6000-rev-list-misc.sh b/t/t6000-rev-list-misc.sh index dfbbc0aee6..349bf5ec3d 100755 --- a/t/t6000-rev-list-misc.sh +++ b/t/t6000-rev-list-misc.sh @@ -217,4 +217,20 @@ test_expect_success 'rev-list -z --objects' ' test_cmp expect actual ' +test_expect_success 'rev-list -z --boundary' ' + test_when_finished rm -rf repo && + + git init repo && + test_commit -C repo 1 && + test_commit -C repo 2 && + + oid1=$(git -C repo rev-parse HEAD) && + oid2=$(git -C repo rev-parse HEAD~) && + + printf "%s\0%s\0boundary=yes\0" "$oid1" "$oid2" >expect && + git -C repo rev-list -z --boundary HEAD~.. >actual && + + test_cmp expect actual +' + test_done -- 2.49.0.rc2 ^ permalink raw reply related [flat|nested] 60+ messages in thread
* [PATCH v3 6/6] rev-list: support NUL-delimited --missing option 2025-03-13 23:57 ` [PATCH v3 0/6] rev-list: introduce NUL-delimited output mode Justin Tobler ` (4 preceding siblings ...) 2025-03-13 23:57 ` [PATCH v3 5/6] rev-list: support NUL-delimited --boundary option Justin Tobler @ 2025-03-13 23:57 ` Justin Tobler 2025-03-19 18:34 ` [PATCH v4 0/5] rev-list: introduce NUL-delimited output mode Justin Tobler 6 siblings, 0 replies; 60+ messages in thread From: Justin Tobler @ 2025-03-13 23:57 UTC (permalink / raw) To: git; +Cc: ps, christian.couder, peff, ben.knoble, Justin Tobler The `--missing={print,print-info}` option for git-rev-list(1) prints missing objects found while performing the object walk in the form: $ git rev-list --missing=print-info <rev> ?<oid> [SP <token>=<value>]... LF Add support for printing missing objects in a NUL-delimited format when the `-z` option is enabled. $ git rev-list -z --missing=print-info <rev> <oid> NUL missing=yes NUL [<token>=<value> NUL]... In this mode, values containing special characters or spaces are printed as-is without being escaped or quoted. Instead of prefixing the missing OID with '?', a separate `missing=yes` token/value pair is appended. Signed-off-by: Justin Tobler <jltobler@gmail.com> --- Documentation/rev-list-options.adoc | 5 +++-- builtin/rev-list.c | 31 ++++++++++++++++++++--------- t/t6022-rev-list-missing.sh | 31 +++++++++++++++++++++++++++++ 3 files changed, 56 insertions(+), 11 deletions(-) diff --git a/Documentation/rev-list-options.adoc b/Documentation/rev-list-options.adoc index 92ac31a8e8..f4764b72f5 100644 --- a/Documentation/rev-list-options.adoc +++ b/Documentation/rev-list-options.adoc @@ -382,10 +382,11 @@ and thus is used to signal the start of a new object record. Examples: <OID> NUL <OID> NUL path=<path> NUL <OID> NUL boundary=yes NUL +<OID> NUL missing=yes NUL [<token>=<value> NUL]... ----------------------------------------------------------------------- + -This mode is only compatible with the `--objects` and `--boundary` output -options. +This mode is only compatible with the `--objects`, `--boundary`, and +`--missing` output options. endif::git-rev-list[] History Simplification diff --git a/builtin/rev-list.c b/builtin/rev-list.c index 7c6d4b25b0..036fcc26d5 100644 --- a/builtin/rev-list.c +++ b/builtin/rev-list.c @@ -136,24 +136,37 @@ static void print_missing_object(struct missing_objects_map_entry *entry, { struct strbuf sb = STRBUF_INIT; + if (line_term) + printf("?%s", oid_to_hex(&entry->entry.oid)); + else + printf("%s%cmissing=yes", oid_to_hex(&entry->entry.oid), + info_term); + if (!print_missing_info) { - printf("?%s\n", oid_to_hex(&entry->entry.oid)); + putchar(line_term); return; } if (entry->path && *entry->path) { - struct strbuf path = STRBUF_INIT; + strbuf_addf(&sb, "%cpath=", info_term); + + if (line_term) { + struct strbuf path = STRBUF_INIT; - strbuf_addstr(&sb, " path="); - quote_path(entry->path, NULL, &path, QUOTE_PATH_QUOTE_SP); - strbuf_addbuf(&sb, &path); + quote_path(entry->path, NULL, &path, QUOTE_PATH_QUOTE_SP); + strbuf_addbuf(&sb, &path); - strbuf_release(&path); + strbuf_release(&path); + } else { + strbuf_addstr(&sb, entry->path); + } } if (entry->type) - strbuf_addf(&sb, " type=%s", type_name(entry->type)); + strbuf_addf(&sb, "%ctype=%s", info_term, type_name(entry->type)); + + fwrite(sb.buf, sizeof(char), sb.len, stdout); + putchar(line_term); - printf("?%s%s\n", oid_to_hex(&entry->entry.oid), sb.buf); strbuf_release(&sb); } @@ -784,7 +797,7 @@ int cmd_rev_list(int argc, if (revs.graph || revs.verbose_header || show_disk_usage || info.show_timestamp || info.header_prefix || bisect_list || use_bitmap_index || revs.edge_hint || revs.left_right || - revs.cherry_mark || arg_missing_action) + revs.cherry_mark) die(_("-z option used with unsupported option")); } diff --git a/t/t6022-rev-list-missing.sh b/t/t6022-rev-list-missing.sh index 3e2790d4c8..08e92dd002 100755 --- a/t/t6022-rev-list-missing.sh +++ b/t/t6022-rev-list-missing.sh @@ -198,4 +198,35 @@ do ' done +test_expect_success "-z nul-delimited --missing" ' + test_when_finished rm -rf repo && + + git init repo && + ( + cd repo && + git commit --allow-empty -m first && + + path="foo bar" && + echo foobar >"$path" && + git add -A && + git commit -m second && + + oid=$(git rev-parse "HEAD:$path") && + type="$(git cat-file -t $oid)" && + + obj_path=".git/objects/$(test_oid_to_path $oid)" && + + git rev-list -z --objects --no-object-names \ + HEAD ^"$oid" >expect && + printf "%s\0missing=yes\0path=%s\0type=%s\0" "$oid" "$path" \ + "$type" >>expect && + + mv "$obj_path" "$obj_path.hidden" && + git rev-list -z --objects --no-object-names \ + --missing=print-info HEAD >actual && + + test_cmp expect actual + ) +' + test_done -- 2.49.0.rc2 ^ permalink raw reply related [flat|nested] 60+ messages in thread
* [PATCH v4 0/5] rev-list: introduce NUL-delimited output mode 2025-03-13 23:57 ` [PATCH v3 0/6] rev-list: introduce NUL-delimited output mode Justin Tobler ` (5 preceding siblings ...) 2025-03-13 23:57 ` [PATCH v3 6/6] rev-list: support NUL-delimited --missing option Justin Tobler @ 2025-03-19 18:34 ` Justin Tobler 2025-03-19 18:34 ` [PATCH v4 1/5] rev-list: inline `show_object_with_name()` in `show_object()` Justin Tobler ` (4 more replies) 6 siblings, 5 replies; 60+ messages in thread From: Justin Tobler @ 2025-03-19 18:34 UTC (permalink / raw) To: git; +Cc: ps, christian.couder, Justin Tobler When walking objects, git-rev-list(1) prints each object entry on a separate line in the form: <oid> LF Some options, such as `--objects`, may print additional information about the object on the same line: <oid> SP [<path>] LF In this mode, if the object path contains a newline it is truncated at the newline. The `--boundary` option also modifies output by prefixing boundary objects with `-`: -<oid> LF When the `--missing={print,print-info}` option is provided, information about any missing objects encountered during the object walk are also printed in the form: ?<oid> [SP <token>=<value>]... LF where values containing LF or SP are printed in a token specific fashion so that the resulting encoded value does not contain either of these two problematic bytes. For example, missing object paths are quoted in the C style when they contain LF or SP. To make machine parsing easier, this series introduces a NUL-delimited output mode for git-rev-list(1) via a `-z` option. In this mode, the output format for object records is unified such that each object and its accompanying metadata is formatted without relying on object metadata order. This format follows the existing `<token>=<value>` used by the `--missing` option to represent object metadata in the form: <oid> NUL [<token>=<value> NUL]... # Examples <oid> LF -> <oid> NUL <oid> SP <path> LF -> <oid> NUL path=<path> NUL -<oid> LF -> <oid> NUL boundary=yes NUL ?<oid> [SP <token>=<value>]... -> <oid> NUL missing=yes NUL [<token>=<value> NUL]... Note that token value info is printed as-is without any special encoding or truncation. Prefixes such as '-' and '?' are dropped in favor using a token/value pair to signal the same information. For now this series only adds support for use with the `--objects`, `--boundary` and `--missing` output options. Usage of `-z` with other options is rejected, so it can potentially be added in the future. This series is structured as follows: - Patches 1 and 2 do some minor preparatory refactors. - Patch 3 adds the `-z` option to git-rev-list(1) to print objects in a NUL-delimited fashion. - Patch 4 teaches the `--boundary` option how to print info in a NUL-delimited fashino using the unified output format. - Patch 5 teaches the `--missing` option how to print info in a NUL-delimited fashion using the unified output format. Changes since V3: - The -z option now only makes output NUL-delimited. Input parsed on stdin via the `--stdin` option remains unchanged. This is done to remain more consistent with other log family commands. Support for more explicit options to control NUL-delimited input/ouput behavior may be added in a future series via `--NUL-delimited-{input,output}` options. - Changed some variable names in tests that were a little confusing. Changes since V2: - In patch 4, the documentation for the -z option now points out the `--stdin` behavior change earlier. - Minor code style and documentation changes in patch 6. Changes since V1: - Use unified output format with `<token>=<value>` pairs for all object metadata. - Add support for the `--boundary` option in NUL-delimited mode. - Add support for NUL-delimited stdin argument parsing in NUL-delimited mode. - Instead of using two NUL bytes to delimit between object records, a single NUL byte is used. Now that object metadata is always in the form `<token>=<value>`, we know a new object record starts when there is an OID entry which will not contain '='. Thanks for taking a look, -Justin Justin Tobler (5): rev-list: inline `show_object_with_name()` in `show_object()` rev-list: refactor early option parsing rev-list: support delimiting objects with NUL bytes rev-list: support NUL-delimited --boundary option rev-list: support NUL-delimited --missing option Documentation/rev-list-options.adoc | 24 ++++++++ builtin/rev-list.c | 93 +++++++++++++++++++++-------- revision.c | 8 --- revision.h | 2 - t/t6000-rev-list-misc.sh | 51 ++++++++++++++++ t/t6022-rev-list-missing.sh | 31 ++++++++++ 6 files changed, 175 insertions(+), 34 deletions(-) Range-diff against v3: 1: d2eded3ac7 = 1: d2eded3ac7 rev-list: inline `show_object_with_name()` in `show_object()` 2: 03cd08c859 = 2: 03cd08c859 rev-list: refactor early option parsing 3: 803a49933a < -: ---------- revision: support NUL-delimited --stdin mode 4: 8eb7669089 ! 3: f6ee01571d rev-list: support delimiting objects with NUL bytes @@ Commit message from `--objects` is appended to the record as a token/value pair `path=<path>` as-is without any truncation. - In this mode, revision and pathspec arguments provided on stdin with the - `--stdin` option are also separated by a NUL byte instead of being - newline delimited. - - For now, the `--objects` and `--stdin` flag are the only options that - can be used in combination with `-z`. In a subsequent commit, - NUL-delimited support for other options is added. Other options that do - not make sense with be used in combination with `-z` are rejected. + For now, the `--objects` flag is the only options that can be used in + combination with `-z`. In a subsequent commit, NUL-delimited support for + other options is added. Other options that do not make sense when used + in combination with `-z` are rejected. Signed-off-by: Justin Tobler <jltobler@gmail.com> @@ Documentation/rev-list-options.adoc: ifdef::git-rev-list[] + +-z:: + Instead of being newline-delimited, each outputted object and its -+ accompanying metadata is delimited using NUL bytes. In this mode, when -+ the `--stdin` option is provided, revision and pathspec arguments on -+ stdin are also delimited using a NUL byte. Output is printed in the -+ following form: ++ accompanying metadata is delimited using NUL bytes. Output is printed ++ in the following form: ++ +----------------------------------------------------------------------- +<OID> NUL [<token>=<value> NUL]... @@ builtin/rev-list.c: int cmd_rev_list(int argc, } else if (skip_prefix(arg, "--missing=", &arg)) { parse_missing_action_value(arg); + } else if (!strcmp(arg, "-z")) { -+ s_r_opt.nul_delim_stdin = 1; + line_term = '\0'; + info_term = '\0'; } @@ t/t6000-rev-list-misc.sh: test_expect_success 'rev-list --unpacked' ' + test_commit -C repo 1 && + test_commit -C repo 2 && + -+ oid1=$(git -C repo rev-parse HEAD) && -+ oid2=$(git -C repo rev-parse HEAD~) && ++ oid1=$(git -C repo rev-parse HEAD~) && ++ oid2=$(git -C repo rev-parse HEAD) && + -+ printf "%s\0%s\0" "$oid1" "$oid2" >expect && ++ printf "%s\0%s\0" "$oid2" "$oid1" >expect && + git -C repo rev-list -z HEAD >actual && + + test_cmp expect actual @@ t/t6000-rev-list-misc.sh: test_expect_success 'rev-list --unpacked' ' +' + test_done - - ## t/t6017-rev-list-stdin.sh ## -@@ t/t6017-rev-list-stdin.sh: test_expect_success '--not via stdin does not influence revisions from command l - test_cmp expect actual - ' - -+test_expect_success 'NUL-delimited stdin' ' -+ printf "%s\0%s\0%s\0" "HEAD" "--" "file-1" > input && -+ -+ git rev-list -z --objects HEAD -- file-1 >expect && -+ git rev-list -z --objects --stdin <input >actual && -+ -+ test_cmp expect actual -+' -+ - test_done 5: 591a2c7dac ! 4: ccf6bd8d35 rev-list: support NUL-delimited --boundary option @@ t/t6000-rev-list-misc.sh: test_expect_success 'rev-list -z --objects' ' + test_commit -C repo 1 && + test_commit -C repo 2 && + -+ oid1=$(git -C repo rev-parse HEAD) && -+ oid2=$(git -C repo rev-parse HEAD~) && ++ oid1=$(git -C repo rev-parse HEAD~) && ++ oid2=$(git -C repo rev-parse HEAD) && + -+ printf "%s\0%s\0boundary=yes\0" "$oid1" "$oid2" >expect && ++ printf "%s\0%s\0boundary=yes\0" "$oid2" "$oid1" >expect && + git -C repo rev-list -z --boundary HEAD~.. >actual && + + test_cmp expect actual 6: 669b3b5d9f = 5: b1bd245155 rev-list: support NUL-delimited --missing option base-commit: 87a0bdbf0f72b7561f3cd50636eee33dcb7dbcc3 -- 2.49.0.rc2 ^ permalink raw reply [flat|nested] 60+ messages in thread
* [PATCH v4 1/5] rev-list: inline `show_object_with_name()` in `show_object()` 2025-03-19 18:34 ` [PATCH v4 0/5] rev-list: introduce NUL-delimited output mode Justin Tobler @ 2025-03-19 18:34 ` Justin Tobler 2025-03-19 18:34 ` [PATCH v4 2/5] rev-list: refactor early option parsing Justin Tobler ` (3 subsequent siblings) 4 siblings, 0 replies; 60+ messages in thread From: Justin Tobler @ 2025-03-19 18:34 UTC (permalink / raw) To: git; +Cc: ps, christian.couder, Justin Tobler The `show_object_with_name()` function only has a single call site. Inline call to `show_object_with_name()` in `show_object()` so the explicit function can be cleaned up and live closer to where it is used. While at it, factor out the code that prints the OID and newline for both objects with and without a name. In a subsequent commit, `show_object()` is modified to support printing object information in a NUL-delimited format. Signed-off-by: Justin Tobler <jltobler@gmail.com> --- builtin/rev-list.c | 13 +++++++++---- revision.c | 8 -------- revision.h | 2 -- 3 files changed, 9 insertions(+), 14 deletions(-) diff --git a/builtin/rev-list.c b/builtin/rev-list.c index bb26bee0d4..dcd079c16c 100644 --- a/builtin/rev-list.c +++ b/builtin/rev-list.c @@ -357,10 +357,15 @@ static void show_object(struct object *obj, const char *name, void *cb_data) return; } - if (arg_show_object_names) - show_object_with_name(stdout, obj, name); - else - printf("%s\n", oid_to_hex(&obj->oid)); + printf("%s", oid_to_hex(&obj->oid)); + + if (arg_show_object_names) { + putchar(' '); + for (const char *p = name; *p && *p != '\n'; p++) + putchar(*p); + } + + putchar('\n'); } static void show_edge(struct commit *commit) diff --git a/revision.c b/revision.c index c4390f0938..0eaebe4478 100644 --- a/revision.c +++ b/revision.c @@ -59,14 +59,6 @@ implement_shared_commit_slab(revision_sources, char *); static inline int want_ancestry(const struct rev_info *revs); -void show_object_with_name(FILE *out, struct object *obj, const char *name) -{ - fprintf(out, "%s ", oid_to_hex(&obj->oid)); - for (const char *p = name; *p && *p != '\n'; p++) - fputc(*p, out); - fputc('\n', out); -} - static void mark_blob_uninteresting(struct blob *blob) { if (!blob) diff --git a/revision.h b/revision.h index 71e984c452..21c6a69899 100644 --- a/revision.h +++ b/revision.h @@ -489,8 +489,6 @@ void mark_parents_uninteresting(struct rev_info *revs, struct commit *commit); void mark_tree_uninteresting(struct repository *r, struct tree *tree); void mark_trees_uninteresting_sparse(struct repository *r, struct oidset *trees); -void show_object_with_name(FILE *, struct object *, const char *); - /** * Helpers to check if a reference should be excluded. */ -- 2.49.0.rc2 ^ permalink raw reply related [flat|nested] 60+ messages in thread
* [PATCH v4 2/5] rev-list: refactor early option parsing 2025-03-19 18:34 ` [PATCH v4 0/5] rev-list: introduce NUL-delimited output mode Justin Tobler 2025-03-19 18:34 ` [PATCH v4 1/5] rev-list: inline `show_object_with_name()` in `show_object()` Justin Tobler @ 2025-03-19 18:34 ` Justin Tobler 2025-03-19 18:34 ` [PATCH v4 3/5] rev-list: support delimiting objects with NUL bytes Justin Tobler ` (2 subsequent siblings) 4 siblings, 0 replies; 60+ messages in thread From: Justin Tobler @ 2025-03-19 18:34 UTC (permalink / raw) To: git; +Cc: ps, christian.couder, Justin Tobler Before invoking `setup_revisions()`, the `--missing` and `--exclude-promisor-objects` options are parsed early. In a subsequent commit, another option is added that must be parsed early. Refactor the code to parse both options in a single early pass. Signed-off-by: Justin Tobler <jltobler@gmail.com> --- builtin/rev-list.c | 17 +++++++---------- 1 file changed, 7 insertions(+), 10 deletions(-) diff --git a/builtin/rev-list.c b/builtin/rev-list.c index dcd079c16c..04d9c893b5 100644 --- a/builtin/rev-list.c +++ b/builtin/rev-list.c @@ -16,6 +16,7 @@ #include "object-file.h" #include "object-store-ll.h" #include "pack-bitmap.h" +#include "parse-options.h" #include "log-tree.h" #include "graph.h" #include "bisect.h" @@ -639,19 +640,15 @@ int cmd_rev_list(int argc, if (!strcmp(arg, "--exclude-promisor-objects")) { fetch_if_missing = 0; revs.exclude_promisor_objects = 1; - break; - } - } - for (i = 1; i < argc; i++) { - const char *arg = argv[i]; - if (skip_prefix(arg, "--missing=", &arg)) { - if (revs.exclude_promisor_objects) - die(_("options '%s' and '%s' cannot be used together"), "--exclude-promisor-objects", "--missing"); - if (parse_missing_action_value(arg)) - break; + } else if (skip_prefix(arg, "--missing=", &arg)) { + parse_missing_action_value(arg); } } + die_for_incompatible_opt2(revs.exclude_promisor_objects, + "--exclude_promisor_objects", + arg_missing_action, "--missing"); + if (arg_missing_action) revs.do_not_die_on_missing_objects = 1; -- 2.49.0.rc2 ^ permalink raw reply related [flat|nested] 60+ messages in thread
* [PATCH v4 3/5] rev-list: support delimiting objects with NUL bytes 2025-03-19 18:34 ` [PATCH v4 0/5] rev-list: introduce NUL-delimited output mode Justin Tobler 2025-03-19 18:34 ` [PATCH v4 1/5] rev-list: inline `show_object_with_name()` in `show_object()` Justin Tobler 2025-03-19 18:34 ` [PATCH v4 2/5] rev-list: refactor early option parsing Justin Tobler @ 2025-03-19 18:34 ` Justin Tobler 2025-03-19 18:34 ` [PATCH v4 4/5] rev-list: support NUL-delimited --boundary option Justin Tobler 2025-03-19 18:34 ` [PATCH v4 5/5] rev-list: support NUL-delimited --missing option Justin Tobler 4 siblings, 0 replies; 60+ messages in thread From: Justin Tobler @ 2025-03-19 18:34 UTC (permalink / raw) To: git; +Cc: ps, christian.couder, Justin Tobler When walking objects, git-rev-list(1) prints each object entry on a separate line. Some options, such as `--objects`, may print additional information about tree and blob object on the same line in the form: $ git rev-list --objects <rev> <tree/blob oid> SP [<path>] LF Note that in this form the SP is appended regardless of whether the tree or blob object has path information available. Paths containing a newline are also truncated at the newline. Introduce the `-z` option for git-rev-list(1) which reformats the output to use NUL-delimiters between objects and associated info in the following form: $ git rev-list -z --objects <rev> <oid> NUL [path=<path> NUL] In this form, the start of each record is signaled by an OID entry that is all hexidecimal and does not contain any '='. Additional path info from `--objects` is appended to the record as a token/value pair `path=<path>` as-is without any truncation. For now, the `--objects` flag is the only options that can be used in combination with `-z`. In a subsequent commit, NUL-delimited support for other options is added. Other options that do not make sense when used in combination with `-z` are rejected. Signed-off-by: Justin Tobler <jltobler@gmail.com> --- Documentation/rev-list-options.adoc | 21 +++++++++++++++++ builtin/rev-list.c | 35 ++++++++++++++++++++++++----- t/t6000-rev-list-misc.sh | 35 +++++++++++++++++++++++++++++ 3 files changed, 86 insertions(+), 5 deletions(-) diff --git a/Documentation/rev-list-options.adoc b/Documentation/rev-list-options.adoc index 785c0786e0..aef83813b8 100644 --- a/Documentation/rev-list-options.adoc +++ b/Documentation/rev-list-options.adoc @@ -361,6 +361,27 @@ ifdef::git-rev-list[] --progress=<header>:: Show progress reports on stderr as objects are considered. The `<header>` text will be printed with each progress update. + +-z:: + Instead of being newline-delimited, each outputted object and its + accompanying metadata is delimited using NUL bytes. Output is printed + in the following form: ++ +----------------------------------------------------------------------- +<OID> NUL [<token>=<value> NUL]... +----------------------------------------------------------------------- ++ +Additional object metadata, such as object paths, is printed using the +`<token>=<value>` form. Token values are printed as-is without any +encoding/truncation. An OID entry never contains a '=' character and thus +is used to signal the start of a new object record. Examples: ++ +----------------------------------------------------------------------- +<OID> NUL +<OID> NUL path=<path> NUL +----------------------------------------------------------------------- ++ +This mode is only compatible with the `--objects` output option. endif::git-rev-list[] History Simplification diff --git a/builtin/rev-list.c b/builtin/rev-list.c index 04d9c893b5..17de99d9ca 100644 --- a/builtin/rev-list.c +++ b/builtin/rev-list.c @@ -65,6 +65,7 @@ static const char rev_list_usage[] = " --abbrev-commit\n" " --left-right\n" " --count\n" +" -z\n" " special purpose:\n" " --bisect\n" " --bisect-vars\n" @@ -97,6 +98,9 @@ static int arg_show_object_names = 1; #define DEFAULT_OIDSET_SIZE (16*1024) +static char line_term = '\n'; +static char info_term = ' '; + static int show_disk_usage; static off_t total_disk_usage; static int human_readable; @@ -264,7 +268,7 @@ static void show_commit(struct commit *commit, void *data) if (revs->commit_format == CMIT_FMT_ONELINE) putchar(' '); else if (revs->include_header) - putchar('\n'); + putchar(line_term); if (revs->verbose_header) { struct strbuf buf = STRBUF_INIT; @@ -361,12 +365,16 @@ static void show_object(struct object *obj, const char *name, void *cb_data) printf("%s", oid_to_hex(&obj->oid)); if (arg_show_object_names) { - putchar(' '); - for (const char *p = name; *p && *p != '\n'; p++) - putchar(*p); + if (line_term) { + putchar(info_term); + for (const char *p = name; *p && *p != '\n'; p++) + putchar(*p); + } else if (*name) { + printf("%cpath=%s", info_term, name); + } } - putchar('\n'); + putchar(line_term); } static void show_edge(struct commit *commit) @@ -642,6 +650,9 @@ int cmd_rev_list(int argc, revs.exclude_promisor_objects = 1; } else if (skip_prefix(arg, "--missing=", &arg)) { parse_missing_action_value(arg); + } else if (!strcmp(arg, "-z")) { + line_term = '\0'; + info_term = '\0'; } } @@ -757,6 +768,20 @@ int cmd_rev_list(int argc, usage(rev_list_usage); } + + /* + * Reject options currently incompatible with -z. For some options, this + * is not an inherent limitation and support may be implemented in the + * future. + */ + if (!line_term) { + if (revs.graph || revs.verbose_header || show_disk_usage || + info.show_timestamp || info.header_prefix || bisect_list || + use_bitmap_index || revs.edge_hint || revs.left_right || + revs.cherry_mark || arg_missing_action || revs.boundary) + die(_("-z option used with unsupported option")); + } + if (revs.commit_format != CMIT_FMT_USERFORMAT) revs.include_header = 1; if (revs.commit_format != CMIT_FMT_UNSPECIFIED) { diff --git a/t/t6000-rev-list-misc.sh b/t/t6000-rev-list-misc.sh index 6289a2e8b0..886e2fc710 100755 --- a/t/t6000-rev-list-misc.sh +++ b/t/t6000-rev-list-misc.sh @@ -182,4 +182,39 @@ test_expect_success 'rev-list --unpacked' ' test_cmp expect actual ' +test_expect_success 'rev-list -z' ' + test_when_finished rm -rf repo && + + git init repo && + test_commit -C repo 1 && + test_commit -C repo 2 && + + oid1=$(git -C repo rev-parse HEAD~) && + oid2=$(git -C repo rev-parse HEAD) && + + printf "%s\0%s\0" "$oid2" "$oid1" >expect && + git -C repo rev-list -z HEAD >actual && + + test_cmp expect actual +' + +test_expect_success 'rev-list -z --objects' ' + test_when_finished rm -rf repo && + + git init repo && + test_commit -C repo 1 && + test_commit -C repo 2 && + + oid1=$(git -C repo rev-parse HEAD:1.t) && + oid2=$(git -C repo rev-parse HEAD:2.t) && + path1=1.t && + path2=2.t && + + printf "%s\0path=%s\0%s\0path=%s\0" "$oid1" "$path1" "$oid2" "$path2" \ + >expect && + git -C repo rev-list -z --objects HEAD:1.t HEAD:2.t >actual && + + test_cmp expect actual +' + test_done -- 2.49.0.rc2 ^ permalink raw reply related [flat|nested] 60+ messages in thread
* [PATCH v4 4/5] rev-list: support NUL-delimited --boundary option 2025-03-19 18:34 ` [PATCH v4 0/5] rev-list: introduce NUL-delimited output mode Justin Tobler ` (2 preceding siblings ...) 2025-03-19 18:34 ` [PATCH v4 3/5] rev-list: support delimiting objects with NUL bytes Justin Tobler @ 2025-03-19 18:34 ` Justin Tobler 2025-03-19 18:34 ` [PATCH v4 5/5] rev-list: support NUL-delimited --missing option Justin Tobler 4 siblings, 0 replies; 60+ messages in thread From: Justin Tobler @ 2025-03-19 18:34 UTC (permalink / raw) To: git; +Cc: ps, christian.couder, Justin Tobler The `--boundary` option for git-rev-list(1) prints boundary objects found while performing the object walk in the form: $ git rev-list --boundary <rev> -<oid> LF Add support for printing boundary objects in a NUL-delimited format when the `-z` option is enabled. $ git rev-list -z --boundary <rev> <oid> NUL boundary=yes NUL In this mode, instead of prefixing the boundary OID with '-', a separate `boundary=yes` token/value pair is appended. Signed-off-by: Justin Tobler <jltobler@gmail.com> --- Documentation/rev-list-options.adoc | 12 +++++++----- builtin/rev-list.c | 9 +++++++-- t/t6000-rev-list-misc.sh | 16 ++++++++++++++++ 3 files changed, 30 insertions(+), 7 deletions(-) diff --git a/Documentation/rev-list-options.adoc b/Documentation/rev-list-options.adoc index aef83813b8..3fc9902d6b 100644 --- a/Documentation/rev-list-options.adoc +++ b/Documentation/rev-list-options.adoc @@ -371,17 +371,19 @@ ifdef::git-rev-list[] <OID> NUL [<token>=<value> NUL]... ----------------------------------------------------------------------- + -Additional object metadata, such as object paths, is printed using the -`<token>=<value>` form. Token values are printed as-is without any -encoding/truncation. An OID entry never contains a '=' character and thus -is used to signal the start of a new object record. Examples: +Additional object metadata, such as object paths or boundary objects, is +printed using the `<token>=<value>` form. Token values are printed as-is +without any encoding/truncation. An OID entry never contains a '=' character +and thus is used to signal the start of a new object record. Examples: + ----------------------------------------------------------------------- <OID> NUL <OID> NUL path=<path> NUL +<OID> NUL boundary=yes NUL ----------------------------------------------------------------------- + -This mode is only compatible with the `--objects` output option. +This mode is only compatible with the `--objects` and `--boundary` output +options. endif::git-rev-list[] History Simplification diff --git a/builtin/rev-list.c b/builtin/rev-list.c index 17de99d9ca..bcb880f109 100644 --- a/builtin/rev-list.c +++ b/builtin/rev-list.c @@ -240,13 +240,18 @@ static void show_commit(struct commit *commit, void *data) fputs(info->header_prefix, stdout); if (revs->include_header) { - if (!revs->graph) + if (!revs->graph && line_term) fputs(get_revision_mark(revs, commit), stdout); if (revs->abbrev_commit && revs->abbrev) fputs(repo_find_unique_abbrev(the_repository, &commit->object.oid, revs->abbrev), stdout); else fputs(oid_to_hex(&commit->object.oid), stdout); + + if (!line_term) { + if (commit->object.flags & BOUNDARY) + printf("%cboundary=yes", info_term); + } } if (revs->print_parents) { struct commit_list *parents = commit->parents; @@ -778,7 +783,7 @@ int cmd_rev_list(int argc, if (revs.graph || revs.verbose_header || show_disk_usage || info.show_timestamp || info.header_prefix || bisect_list || use_bitmap_index || revs.edge_hint || revs.left_right || - revs.cherry_mark || arg_missing_action || revs.boundary) + revs.cherry_mark || arg_missing_action) die(_("-z option used with unsupported option")); } diff --git a/t/t6000-rev-list-misc.sh b/t/t6000-rev-list-misc.sh index 886e2fc710..33881274a4 100755 --- a/t/t6000-rev-list-misc.sh +++ b/t/t6000-rev-list-misc.sh @@ -217,4 +217,20 @@ test_expect_success 'rev-list -z --objects' ' test_cmp expect actual ' +test_expect_success 'rev-list -z --boundary' ' + test_when_finished rm -rf repo && + + git init repo && + test_commit -C repo 1 && + test_commit -C repo 2 && + + oid1=$(git -C repo rev-parse HEAD~) && + oid2=$(git -C repo rev-parse HEAD) && + + printf "%s\0%s\0boundary=yes\0" "$oid2" "$oid1" >expect && + git -C repo rev-list -z --boundary HEAD~.. >actual && + + test_cmp expect actual +' + test_done -- 2.49.0.rc2 ^ permalink raw reply related [flat|nested] 60+ messages in thread
* [PATCH v4 5/5] rev-list: support NUL-delimited --missing option 2025-03-19 18:34 ` [PATCH v4 0/5] rev-list: introduce NUL-delimited output mode Justin Tobler ` (3 preceding siblings ...) 2025-03-19 18:34 ` [PATCH v4 4/5] rev-list: support NUL-delimited --boundary option Justin Tobler @ 2025-03-19 18:34 ` Justin Tobler 4 siblings, 0 replies; 60+ messages in thread From: Justin Tobler @ 2025-03-19 18:34 UTC (permalink / raw) To: git; +Cc: ps, christian.couder, Justin Tobler The `--missing={print,print-info}` option for git-rev-list(1) prints missing objects found while performing the object walk in the form: $ git rev-list --missing=print-info <rev> ?<oid> [SP <token>=<value>]... LF Add support for printing missing objects in a NUL-delimited format when the `-z` option is enabled. $ git rev-list -z --missing=print-info <rev> <oid> NUL missing=yes NUL [<token>=<value> NUL]... In this mode, values containing special characters or spaces are printed as-is without being escaped or quoted. Instead of prefixing the missing OID with '?', a separate `missing=yes` token/value pair is appended. Signed-off-by: Justin Tobler <jltobler@gmail.com> --- Documentation/rev-list-options.adoc | 5 +++-- builtin/rev-list.c | 31 ++++++++++++++++++++--------- t/t6022-rev-list-missing.sh | 31 +++++++++++++++++++++++++++++ 3 files changed, 56 insertions(+), 11 deletions(-) diff --git a/Documentation/rev-list-options.adoc b/Documentation/rev-list-options.adoc index 3fc9902d6b..0e5605a85e 100644 --- a/Documentation/rev-list-options.adoc +++ b/Documentation/rev-list-options.adoc @@ -380,10 +380,11 @@ and thus is used to signal the start of a new object record. Examples: <OID> NUL <OID> NUL path=<path> NUL <OID> NUL boundary=yes NUL +<OID> NUL missing=yes NUL [<token>=<value> NUL]... ----------------------------------------------------------------------- + -This mode is only compatible with the `--objects` and `--boundary` output -options. +This mode is only compatible with the `--objects`, `--boundary`, and +`--missing` output options. endif::git-rev-list[] History Simplification diff --git a/builtin/rev-list.c b/builtin/rev-list.c index bcb880f109..e6ee3f82ee 100644 --- a/builtin/rev-list.c +++ b/builtin/rev-list.c @@ -136,24 +136,37 @@ static void print_missing_object(struct missing_objects_map_entry *entry, { struct strbuf sb = STRBUF_INIT; + if (line_term) + printf("?%s", oid_to_hex(&entry->entry.oid)); + else + printf("%s%cmissing=yes", oid_to_hex(&entry->entry.oid), + info_term); + if (!print_missing_info) { - printf("?%s\n", oid_to_hex(&entry->entry.oid)); + putchar(line_term); return; } if (entry->path && *entry->path) { - struct strbuf path = STRBUF_INIT; + strbuf_addf(&sb, "%cpath=", info_term); + + if (line_term) { + struct strbuf path = STRBUF_INIT; - strbuf_addstr(&sb, " path="); - quote_path(entry->path, NULL, &path, QUOTE_PATH_QUOTE_SP); - strbuf_addbuf(&sb, &path); + quote_path(entry->path, NULL, &path, QUOTE_PATH_QUOTE_SP); + strbuf_addbuf(&sb, &path); - strbuf_release(&path); + strbuf_release(&path); + } else { + strbuf_addstr(&sb, entry->path); + } } if (entry->type) - strbuf_addf(&sb, " type=%s", type_name(entry->type)); + strbuf_addf(&sb, "%ctype=%s", info_term, type_name(entry->type)); + + fwrite(sb.buf, sizeof(char), sb.len, stdout); + putchar(line_term); - printf("?%s%s\n", oid_to_hex(&entry->entry.oid), sb.buf); strbuf_release(&sb); } @@ -783,7 +796,7 @@ int cmd_rev_list(int argc, if (revs.graph || revs.verbose_header || show_disk_usage || info.show_timestamp || info.header_prefix || bisect_list || use_bitmap_index || revs.edge_hint || revs.left_right || - revs.cherry_mark || arg_missing_action) + revs.cherry_mark) die(_("-z option used with unsupported option")); } diff --git a/t/t6022-rev-list-missing.sh b/t/t6022-rev-list-missing.sh index 3e2790d4c8..08e92dd002 100755 --- a/t/t6022-rev-list-missing.sh +++ b/t/t6022-rev-list-missing.sh @@ -198,4 +198,35 @@ do ' done +test_expect_success "-z nul-delimited --missing" ' + test_when_finished rm -rf repo && + + git init repo && + ( + cd repo && + git commit --allow-empty -m first && + + path="foo bar" && + echo foobar >"$path" && + git add -A && + git commit -m second && + + oid=$(git rev-parse "HEAD:$path") && + type="$(git cat-file -t $oid)" && + + obj_path=".git/objects/$(test_oid_to_path $oid)" && + + git rev-list -z --objects --no-object-names \ + HEAD ^"$oid" >expect && + printf "%s\0missing=yes\0path=%s\0type=%s\0" "$oid" "$path" \ + "$type" >>expect && + + mv "$obj_path" "$obj_path.hidden" && + git rev-list -z --objects --no-object-names \ + --missing=print-info HEAD >actual && + + test_cmp expect actual + ) +' + test_done -- 2.49.0.rc2 ^ permalink raw reply related [flat|nested] 60+ messages in thread
end of thread, other threads:[~2025-03-19 18:37 UTC | newest] Thread overview: 60+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-03-10 19:28 [PATCH 0/4] rev-list: introduce NUL-delimited output mode Justin Tobler 2025-03-10 19:28 ` [PATCH 1/4] rev-list: inline `show_object_with_name()` in `show_object()` Justin Tobler 2025-03-10 20:51 ` Junio C Hamano 2025-03-10 19:28 ` [PATCH 2/4] rev-list: refactor early option parsing Justin Tobler 2025-03-10 20:54 ` Junio C Hamano 2025-03-12 21:39 ` Justin Tobler 2025-03-10 19:28 ` [PATCH 3/4] rev-list: support delimiting objects with NUL bytes Justin Tobler 2025-03-10 20:59 ` Junio C Hamano 2025-03-12 21:39 ` Justin Tobler 2025-03-12 7:50 ` Patrick Steinhardt 2025-03-12 21:41 ` Justin Tobler 2025-03-10 19:28 ` [PATCH 4/4] rev-list: support NUL-delimited --missing option Justin Tobler 2025-03-10 20:37 ` [PATCH 0/4] rev-list: introduce NUL-delimited output mode Junio C Hamano 2025-03-10 21:08 ` Junio C Hamano 2025-03-11 23:24 ` Justin Tobler 2025-03-11 23:19 ` Justin Tobler 2025-03-11 23:44 ` Junio C Hamano 2025-03-12 7:37 ` Patrick Steinhardt 2025-03-12 21:45 ` Justin Tobler 2025-03-10 22:38 ` D. Ben Knoble 2025-03-11 22:59 ` Justin Tobler 2025-03-11 23:57 ` Jeff King 2025-03-12 7:42 ` Patrick Steinhardt 2025-03-12 15:56 ` Junio C Hamano 2025-03-13 7:46 ` Patrick Steinhardt 2025-03-12 22:09 ` Justin Tobler 2025-03-13 5:33 ` Jeff King 2025-03-13 16:41 ` Justin Tobler 2025-03-14 2:49 ` Jeff King 2025-03-14 17:02 ` Junio C Hamano 2025-03-14 18:59 ` Jeff King 2025-03-14 19:53 ` Justin Tobler 2025-03-14 21:16 ` Junio C Hamano 2025-03-19 15:58 ` Justin Tobler 2025-03-13 0:17 ` [PATCH v2 0/6] " Justin Tobler 2025-03-13 0:17 ` [PATCH v2 1/6] rev-list: inline `show_object_with_name()` in `show_object()` Justin Tobler 2025-03-13 0:17 ` [PATCH v2 2/6] rev-list: refactor early option parsing Justin Tobler 2025-03-13 0:17 ` [PATCH v2 3/6] revision: support NUL-delimited --stdin mode Justin Tobler 2025-03-13 0:17 ` [PATCH v2 4/6] rev-list: support delimiting objects with NUL bytes Justin Tobler 2025-03-13 12:55 ` Patrick Steinhardt 2025-03-13 14:44 ` Justin Tobler 2025-03-13 0:17 ` [PATCH v2 5/6] rev-list: support NUL-delimited --boundary option Justin Tobler 2025-03-13 0:17 ` [PATCH v2 6/6] rev-list: support NUL-delimited --missing option Justin Tobler 2025-03-13 12:55 ` Patrick Steinhardt 2025-03-13 14:51 ` Justin Tobler 2025-03-13 23:57 ` [PATCH v3 0/6] rev-list: introduce NUL-delimited output mode Justin Tobler 2025-03-13 23:57 ` [PATCH v3 1/6] rev-list: inline `show_object_with_name()` in `show_object()` Justin Tobler 2025-03-13 23:57 ` [PATCH v3 2/6] rev-list: refactor early option parsing Justin Tobler 2025-03-13 23:57 ` [PATCH v3 3/6] revision: support NUL-delimited --stdin mode Justin Tobler 2025-03-13 23:57 ` [PATCH v3 4/6] rev-list: support delimiting objects with NUL bytes Justin Tobler 2025-03-19 12:35 ` Christian Couder 2025-03-19 16:02 ` Justin Tobler 2025-03-13 23:57 ` [PATCH v3 5/6] rev-list: support NUL-delimited --boundary option Justin Tobler 2025-03-13 23:57 ` [PATCH v3 6/6] rev-list: support NUL-delimited --missing option Justin Tobler 2025-03-19 18:34 ` [PATCH v4 0/5] rev-list: introduce NUL-delimited output mode Justin Tobler 2025-03-19 18:34 ` [PATCH v4 1/5] rev-list: inline `show_object_with_name()` in `show_object()` Justin Tobler 2025-03-19 18:34 ` [PATCH v4 2/5] rev-list: refactor early option parsing Justin Tobler 2025-03-19 18:34 ` [PATCH v4 3/5] rev-list: support delimiting objects with NUL bytes Justin Tobler 2025-03-19 18:34 ` [PATCH v4 4/5] rev-list: support NUL-delimited --boundary option Justin Tobler 2025-03-19 18:34 ` [PATCH v4 5/5] rev-list: support NUL-delimited --missing option Justin Tobler
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).