git.vger.kernel.org archive mirror
From: Justin Tobler <jltobler@gmail.com>
To: git@vger.kernel.org
Cc: ps@pks.im, christian.couder@gmail.com, peff@peff.net,
	ben.knoble@gmail.com, Justin Tobler <jltobler@gmail.com>
Subject: [PATCH v3 0/6] rev-list: introduce NUL-delimited output mode
Date: Thu, 13 Mar 2025 18:57:41 -0500
Message-ID: <20250313235747.9583-1-jltobler@gmail.com>
In-Reply-To: <20250313001706.3390502-1-jltobler@gmail.com>

When walking objects, git-rev-list(1) prints each object entry on a
separate line in the form:

        <oid> LF

Some options, such as `--objects`, may print additional information
about the object on the same line:

        <oid> SP [<path>] LF

In this mode, if the object path contains a newline, the path is
truncated at the first newline.
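For illustration, here is a minimal C sketch of that truncation when
formatting a `--objects` line. `format_object_line()` is a hypothetical
helper that only mirrors the behavior described above; it is not git's
actual code.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format "<oid> SP <path> LF" into `out`, stopping
 * the path at its first LF as the newline-delimited output does.
 * Returns snprintf()'s result, i.e. the length of the full line even if
 * it was cut short by `outsz`.
 */
static int format_object_line(char *out, size_t outsz,
                              const char *oid, const char *path)
{
	/* Length of the path up to, but not including, the first LF. */
	size_t path_len = strcspn(path, "\n");

	return snprintf(out, outsz, "%s %.*s\n", oid, (int)path_len, path);
}
```

Whatever follows the newline in the path is simply lost, which is part
of why a newline-delimited format is hard to parse reliably.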

The `--boundary` option also modifies output by prefixing boundary
objects with `-`:

        -<oid> LF

When the `--missing={print,print-info}` option is provided, information
about any missing objects encountered during the object walk is also
printed in the form:

        ?<oid> [SP <token>=<value>]... LF

where values containing LF or SP are printed in a token-specific fashion
so that the resulting encoded value does not contain either of these two
problematic bytes. For example, missing object paths are quoted in the C
style when they contain LF or SP.
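A simplified C sketch of C-style path quoting, to make the idea
concrete. It assumes a reduced escape set (LF, TAB, '"' and '\') and
quotes the whole path whenever any of those bytes or SP is present.
It is not git's quote_path(), which also escapes other control
characters and, depending on `core.quotePath`, bytes above 0x80;
`c_quote_path()` is a hypothetical name.

```c
#include <stdio.h>
#include <string.h>

/* Append one byte, always leaving room for the terminating NUL. */
static void put_byte(char *out, size_t outsz, size_t *o, char c)
{
	if (*o + 1 < outsz)
		out[(*o)++] = c;
}

/*
 * Hypothetical, simplified C-style quoting: paths without SP, LF, TAB,
 * '"' or '\' are copied unchanged; otherwise the path is wrapped in
 * double quotes and LF, TAB, '"' and '\' are backslash-escaped.
 */
static void c_quote_path(char *out, size_t outsz, const char *path)
{
	size_t o = 0;
	const char *p;

	if (!outsz)
		return;
	if (!strpbrk(path, " \n\t\"\\")) {
		snprintf(out, outsz, "%s", path);
		return;
	}
	put_byte(out, outsz, &o, '"');
	for (p = path; *p; p++) {
		switch (*p) {
		case '\n':
			put_byte(out, outsz, &o, '\\');
			put_byte(out, outsz, &o, 'n');
			break;
		case '\t':
			put_byte(out, outsz, &o, '\\');
			put_byte(out, outsz, &o, 't');
			break;
		case '"':
		case '\\':
			put_byte(out, outsz, &o, '\\');
			put_byte(out, outsz, &o, *p);
			break;
		default:
			/* SP is harmless once the path is inside quotes. */
			put_byte(out, outsz, &o, *p);
			break;
		}
	}
	put_byte(out, outsz, &o, '"');
	out[o] = '\0';
}
```

The encoded value never contains a raw LF or SP outside of quotes, so
the line stays splittable on SP and LF, at the cost of requiring the
reader to unquote it.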

To make machine parsing easier, this series introduces a NUL-delimited
output mode for git-rev-list(1) via a `-z` option. In this mode, the
output format for object records is unified such that each object and
its accompanying metadata is formatted without relying on object
metadata order. This format follows the existing `<token>=<value>` convention
used by the `--missing` option to represent object metadata in the form:

        <oid> NUL [<token>=<value> NUL]...

        # Examples
        <oid> LF                       -> <oid> NUL
        <oid> SP <path> LF             -> <oid> NUL path=<path> NUL
        -<oid> LF                      -> <oid> NUL boundary=yes NUL
        ?<oid> [SP <token>=<value>]... -> <oid> NUL missing=yes NUL [<token>=<value> NUL]...

Note that token values are printed as-is without any special encoding
or truncation. Prefixes such as '-' and '?' are dropped in favor of using
a token/value pair to signal the same information.
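As a rough sketch of the format above, the following C function
serializes one object record into a buffer. `struct token_value` and
`format_nul_record()` are hypothetical names chosen for this example;
the point is that every field, including the OID, is followed by a
single NUL and values are copied verbatim.

```c
#include <stddef.h>
#include <string.h>

/* One metadata pair, e.g. { "path", "dir/file" }. Hypothetical type. */
struct token_value {
	const char *token;
	const char *value;
};

/*
 * Write "<oid> NUL [<token>=<value> NUL]..." into `out`, copying values
 * as-is (no quoting, no truncation). Returns the number of bytes used,
 * or 0 if the record does not fit. The result contains embedded NULs,
 * so use the returned length, not strlen().
 */
static size_t format_nul_record(char *out, size_t outsz, const char *oid,
                                const struct token_value *meta, size_t nmeta)
{
	size_t len = 0, i;
	size_t oid_len = strlen(oid);

	if (oid_len + 1 > outsz)
		return 0;
	memcpy(out, oid, oid_len);
	len = oid_len;
	out[len++] = '\0';

	for (i = 0; i < nmeta; i++) {
		size_t tl = strlen(meta[i].token);
		size_t vl = strlen(meta[i].value);

		if (len + tl + 1 + vl + 1 > outsz)
			return 0;
		memcpy(out + len, meta[i].token, tl);
		len += tl;
		out[len++] = '=';
		memcpy(out + len, meta[i].value, vl);
		len += vl;
		out[len++] = '\0';
	}
	return len;
}
```

Because the output contains embedded NUL bytes, a caller would emit it
with something like `fwrite(buf, 1, n, stdout)` rather than `fputs()`.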

While in this mode, if the `--stdin` option is used, revision and
pathspec arguments read from stdin are separated with a NUL byte instead
of being newline-delimited.
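On the input side, a NUL-delimited `--stdin` stream could be split into
arguments along the lines of the C sketch below. `split_nul_args()` is a
hypothetical helper, not the parser added by patch 3; it assumes every
argument, including the last, is terminated by a NUL byte and ignores
any unterminated trailing bytes.

```c
#include <stddef.h>
#include <string.h>

/*
 * Split a NUL-delimited buffer into at most `max_args` arguments.
 * Pointers stored in `argv` refer into `buf` and are NUL-terminated,
 * so an argument may itself contain LF. Returns the argument count.
 */
static size_t split_nul_args(const char *buf, size_t len,
                             const char **argv, size_t max_args)
{
	size_t start = 0, count = 0;

	while (start < len && count < max_args) {
		const char *nul = memchr(buf + start, '\0', len - start);

		if (!nul)
			break; /* unterminated trailing bytes: ignored here */
		argv[count++] = buf + start;
		start = (size_t)(nul - buf) + 1;
	}
	return count;
}
```

Feeding it `HEAD`, `--` and a pathspec containing a newline, each
NUL-terminated, yields three arguments with the newline kept intact.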

For now this series only adds support for use with the `--objects`,
`--boundary` and `--missing` output options. Using `-z` with any other
option is rejected, so that support for it can potentially be added in
the future.

This series is structured as follows:

        - Patches 1 and 2 do some minor preparatory refactors.

        - Patch 3 modifies stdin argument parsing handled by
          `setup_revisions()` to support NUL-delimited arguments.

        - Patch 4 adds the `-z` option to git-rev-list(1) to print
          objects in a NUL-delimited fashion. Arguments parsed on stdin
          while in this mode are also NUL-delimited.

        - Patch 5 teaches the `--boundary` option how to print info in a
          NUL-delimited fashion using the unified output format.

        - Patch 6 teaches the `--missing` option how to print info in a
          NUL-delimited fashion using the unified output format.

Changes since V2:

        - In patch 4, the documentation for the `-z` option now describes
          the `--stdin` behavior change up front rather than at the end.

        - Minor code style and documentation changes in patch 6.

Changes since V1:

        - Use unified output format with `<token>=<value>` pairs for
          all object metadata.

        - Add support for the `--boundary` option in NUL-delimited mode.

        - Add support for NUL-delimited stdin argument parsing in
          NUL-delimited mode.

        - Instead of using two NUL bytes to delimit object records, a
          single NUL byte is used. Now that object metadata is always in
          the form `<token>=<value>`, a new object record can be
          recognized as starting at any entry that does not contain
          '=', since an OID entry never does.
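A consumer can rely on that rule directly. The C sketch below walks a
buffer of `-z` output and treats every entry without '=' as the start of
a new record; the callback types and `parse_nul_records()` are
hypothetical. Assuming tokens themselves never contain '=', only the
first '=' separates the token from its value, so a value such as a path
that itself contains '=' is kept whole.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical callbacks invoked for each OID and each metadata pair. */
typedef void (*record_fn)(const char *oid, void *data);
typedef void (*meta_fn)(const char *token, size_t token_len,
                        const char *value, void *data);

/*
 * Walk NUL-terminated entries in `buf`. An entry without '=' is an OID
 * and starts a new record; otherwise it is <token>=<value> metadata for
 * the current record. Either callback may be NULL. Returns the number
 * of records seen.
 */
static size_t parse_nul_records(const char *buf, size_t len,
                                record_fn on_record, meta_fn on_meta,
                                void *data)
{
	size_t start = 0, records = 0;

	while (start < len) {
		const char *entry = buf + start;
		const char *nul = memchr(entry, '\0', len - start);
		const char *eq;

		if (!nul)
			break; /* incomplete trailing entry */
		/* Split on the first '=' only; values may contain '='. */
		eq = memchr(entry, '=', (size_t)(nul - entry));
		if (!eq) {
			records++;
			if (on_record)
				on_record(entry, data);
		} else if (on_meta) {
			on_meta(entry, (size_t)(eq - entry), eq + 1, data);
		}
		start = (size_t)(nul - buf) + 1;
	}
	return records;
}
```

The token is passed with an explicit length because it is terminated by
'=' rather than NUL, while the OID and the value are plain C strings.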

Thanks for taking a look,
-Justin

Justin Tobler (6):
  rev-list: inline `show_object_with_name()` in `show_object()`
  rev-list: refactor early option parsing
  revision: support NUL-delimited --stdin mode
  rev-list: support delimiting objects with NUL bytes
  rev-list: support NUL-delimited --boundary option
  rev-list: support NUL-delimited --missing option

 Documentation/rev-list-options.adoc | 26 ++++++++
 builtin/rev-list.c                  | 94 +++++++++++++++++++++--------
 revision.c                          | 27 ++++-----
 revision.h                          |  5 +-
 t/t6000-rev-list-misc.sh            | 51 ++++++++++++++++
 t/t6017-rev-list-stdin.sh           |  9 +++
 t/t6022-rev-list-missing.sh         | 31 ++++++++++
 7 files changed, 200 insertions(+), 43 deletions(-)

Range-diff against v2:
1:  d2eded3ac7 = 1:  d2eded3ac7 rev-list: inline `show_object_with_name()` in `show_object()`
2:  03cd08c859 = 2:  03cd08c859 rev-list: refactor early option parsing
3:  803a49933a = 3:  803a49933a revision: support NUL-delimited --stdin mode
4:  d3b3c4ef89 ! 4:  8eb7669089 rev-list: support delimiting objects with NUL bytes
    @@ Documentation/rev-list-options.adoc: ifdef::git-rev-list[]
     +
     +-z::
     +	Instead of being newline-delimited, each outputted object and its
    -+	accompanying metadata is delimited using NUL bytes in the following
    -+	form:
    ++	accompanying metadata is delimited using NUL bytes. In this mode, when
    ++	the `--stdin` option is provided, revision and pathspec arguments on
    ++	stdin are also delimited using a NUL byte. Output is printed in the
    ++	following form:
     ++
     +-----------------------------------------------------------------------
     +<OID> NUL [<token>=<value> NUL]...
    @@ Documentation/rev-list-options.adoc: ifdef::git-rev-list[]
     +<OID> NUL path=<path> NUL
     +-----------------------------------------------------------------------
     ++
    -+This mode is only compatible with the `--objects` output option. Also, revision
    -+and pathspec argument parsing on stdin with the `--stdin` option is NUL byte
    -+delimited instead of using newlines while in this mode.
    ++This mode is only compatible with the `--objects` output option.
      endif::git-rev-list[]
      
      History Simplification
5:  5e4fc41976 ! 5:  591a2c7dac rev-list: support NUL-delimited --boundary option
    @@ Documentation/rev-list-options.adoc: ifdef::git-rev-list[]
     +<OID> NUL boundary=yes NUL
      -----------------------------------------------------------------------
      +
    --This mode is only compatible with the `--objects` output option. Also, revision
    --and pathspec argument parsing on stdin with the `--stdin` option is NUL byte
    --delimited instead of using newlines while in this mode.
    +-This mode is only compatible with the `--objects` output option.
     +This mode is only compatible with the `--objects` and `--boundary` output
    -+options. Also, revision and pathspec argument parsing on stdin with the
    -+`--stdin` option is NUL byte delimited instead of using newlines while in this
    -+mode.
    ++options.
      endif::git-rev-list[]
      
      History Simplification
6:  7744966514 ! 6:  669b3b5d9f rev-list: support NUL-delimited --missing option
    @@ Commit message
         Signed-off-by: Justin Tobler <jltobler@gmail.com>
     
      ## Documentation/rev-list-options.adoc ##
    -@@ Documentation/rev-list-options.adoc: ifdef::git-rev-list[]
    - <OID> NUL [<token>=<value> NUL]...
    - -----------------------------------------------------------------------
    - +
    --Additional object metadata, such as object paths or boundary objects, is
    --printed using the `<token>=<value>` form. Token values are printed as-is
    -+Additional object metadata, such as object paths or boundary/missing objects,
    -+is printed using the `<token>=<value>` form. Token values are printed as-is
    - without any encoding/truncation. An OID entry never contains a '=' character
    - and thus is used to signal the start of a new object record. Examples:
    - +
     @@ Documentation/rev-list-options.adoc: and thus is used to signal the start of a new object record. Examples:
      <OID> NUL
      <OID> NUL path=<path> NUL
    @@ Documentation/rev-list-options.adoc: and thus is used to signal the start of a n
      -----------------------------------------------------------------------
      +
     -This mode is only compatible with the `--objects` and `--boundary` output
    --options. Also, revision and pathspec argument parsing on stdin with the
    --`--stdin` option is NUL byte delimited instead of using newlines while in this
    --mode.
    +-options.
     +This mode is only compatible with the `--objects`, `--boundary`, and
    -+`--missing` output options. Also, revision and pathspec argument parsing on
    -+stdin with the `--stdin` option is NUL byte delimited instead of using newlines
    -+while in this mode.
    ++`--missing` output options.
      endif::git-rev-list[]
      
      History Simplification
    @@ builtin/rev-list.c: static void print_missing_object(struct missing_objects_map_
      	struct strbuf sb = STRBUF_INIT;
      
     +	if (line_term)
    -+		putchar('?');
    -+
    -+	printf("%s", oid_to_hex(&entry->entry.oid));
    -+
    -+	if (!line_term)
    -+		printf("%cmissing=yes", info_term);
    ++		printf("?%s", oid_to_hex(&entry->entry.oid));
    ++	else
    ++		printf("%s%cmissing=yes", oid_to_hex(&entry->entry.oid),
    ++		       info_term);
     +
      	if (!print_missing_info) {
     -		printf("?%s\n", oid_to_hex(&entry->entry.oid));
    @@ builtin/rev-list.c: static void print_missing_object(struct missing_objects_map_
      	}
      
      	if (entry->path && *entry->path) {
    - 		struct strbuf path = STRBUF_INIT;
    +-		struct strbuf path = STRBUF_INIT;
    ++		strbuf_addf(&sb, "%cpath=", info_term);
    ++
    ++		if (line_term) {
    ++			struct strbuf path = STRBUF_INIT;
      
     -		strbuf_addstr(&sb, " path=");
     -		quote_path(entry->path, NULL, &path, QUOTE_PATH_QUOTE_SP);
     -		strbuf_addbuf(&sb, &path);
    -+		strbuf_addf(&sb, "%cpath=", info_term);
    -+
    -+		if (line_term) {
     +			quote_path(entry->path, NULL, &path, QUOTE_PATH_QUOTE_SP);
     +			strbuf_addbuf(&sb, &path);
    + 
    +-		strbuf_release(&path);
    ++			strbuf_release(&path);
     +		} else {
     +			strbuf_addstr(&sb, entry->path);
     +		}
    - 
    - 		strbuf_release(&path);
      	}
      	if (entry->type)
     -		strbuf_addf(&sb, " type=%s", type_name(entry->type));

base-commit: 87a0bdbf0f72b7561f3cd50636eee33dcb7dbcc3
-- 
2.49.0.rc2



Thread overview: 60+ messages
2025-03-10 19:28 [PATCH 0/4] rev-list: introduce NUL-delimited output mode Justin Tobler
2025-03-10 19:28 ` [PATCH 1/4] rev-list: inline `show_object_with_name()` in `show_object()` Justin Tobler
2025-03-10 20:51   ` Junio C Hamano
2025-03-10 19:28 ` [PATCH 2/4] rev-list: refactor early option parsing Justin Tobler
2025-03-10 20:54   ` Junio C Hamano
2025-03-12 21:39     ` Justin Tobler
2025-03-10 19:28 ` [PATCH 3/4] rev-list: support delimiting objects with NUL bytes Justin Tobler
2025-03-10 20:59   ` Junio C Hamano
2025-03-12 21:39     ` Justin Tobler
2025-03-12  7:50   ` Patrick Steinhardt
2025-03-12 21:41     ` Justin Tobler
2025-03-10 19:28 ` [PATCH 4/4] rev-list: support NUL-delimited --missing option Justin Tobler
2025-03-10 20:37 ` [PATCH 0/4] rev-list: introduce NUL-delimited output mode Junio C Hamano
2025-03-10 21:08   ` Junio C Hamano
2025-03-11 23:24     ` Justin Tobler
2025-03-11 23:19   ` Justin Tobler
2025-03-11 23:44     ` Junio C Hamano
2025-03-12  7:37       ` Patrick Steinhardt
2025-03-12 21:45         ` Justin Tobler
2025-03-10 22:38 ` D. Ben Knoble
2025-03-11 22:59   ` Justin Tobler
2025-03-11 23:57 ` Jeff King
2025-03-12  7:42   ` Patrick Steinhardt
2025-03-12 15:56     ` Junio C Hamano
2025-03-13  7:46       ` Patrick Steinhardt
2025-03-12 22:09   ` Justin Tobler
2025-03-13  5:33     ` Jeff King
2025-03-13 16:41       ` Justin Tobler
2025-03-14  2:49         ` Jeff King
2025-03-14 17:02           ` Junio C Hamano
2025-03-14 18:59             ` Jeff King
2025-03-14 19:53               ` Justin Tobler
2025-03-14 21:16                 ` Junio C Hamano
2025-03-19 15:58                   ` Justin Tobler
2025-03-13  0:17 ` [PATCH v2 0/6] " Justin Tobler
2025-03-13  0:17   ` [PATCH v2 1/6] rev-list: inline `show_object_with_name()` in `show_object()` Justin Tobler
2025-03-13  0:17   ` [PATCH v2 2/6] rev-list: refactor early option parsing Justin Tobler
2025-03-13  0:17   ` [PATCH v2 3/6] revision: support NUL-delimited --stdin mode Justin Tobler
2025-03-13  0:17   ` [PATCH v2 4/6] rev-list: support delimiting objects with NUL bytes Justin Tobler
2025-03-13 12:55     ` Patrick Steinhardt
2025-03-13 14:44       ` Justin Tobler
2025-03-13  0:17   ` [PATCH v2 5/6] rev-list: support NUL-delimited --boundary option Justin Tobler
2025-03-13  0:17   ` [PATCH v2 6/6] rev-list: support NUL-delimited --missing option Justin Tobler
2025-03-13 12:55     ` Patrick Steinhardt
2025-03-13 14:51       ` Justin Tobler
2025-03-13 23:57   ` Justin Tobler [this message]
2025-03-13 23:57     ` [PATCH v3 1/6] rev-list: inline `show_object_with_name()` in `show_object()` Justin Tobler
2025-03-13 23:57     ` [PATCH v3 2/6] rev-list: refactor early option parsing Justin Tobler
2025-03-13 23:57     ` [PATCH v3 3/6] revision: support NUL-delimited --stdin mode Justin Tobler
2025-03-13 23:57     ` [PATCH v3 4/6] rev-list: support delimiting objects with NUL bytes Justin Tobler
2025-03-19 12:35       ` Christian Couder
2025-03-19 16:02         ` Justin Tobler
2025-03-13 23:57     ` [PATCH v3 5/6] rev-list: support NUL-delimited --boundary option Justin Tobler
2025-03-13 23:57     ` [PATCH v3 6/6] rev-list: support NUL-delimited --missing option Justin Tobler
2025-03-19 18:34     ` [PATCH v4 0/5] rev-list: introduce NUL-delimited output mode Justin Tobler
2025-03-19 18:34       ` [PATCH v4 1/5] rev-list: inline `show_object_with_name()` in `show_object()` Justin Tobler
2025-03-19 18:34       ` [PATCH v4 2/5] rev-list: refactor early option parsing Justin Tobler
2025-03-19 18:34       ` [PATCH v4 3/5] rev-list: support delimiting objects with NUL bytes Justin Tobler
2025-03-19 18:34       ` [PATCH v4 4/5] rev-list: support NUL-delimited --boundary option Justin Tobler
2025-03-19 18:34       ` [PATCH v4 5/5] rev-list: support NUL-delimited --missing option Justin Tobler
