All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Jerry Zhang via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: Jerry Zhang <jerry@skydio.com>
Subject: [PATCH v4 0/6] patch-id fixes and improvements
Date: Thu, 20 Oct 2022 23:16:49 +0000	[thread overview]
Message-ID: <pull.1359.v4.git.1666307815.gitgitgadget@gmail.com> (raw)
In-Reply-To: <pull.1359.v3.git.1665737804.gitgitgadget@gmail.com>

These patches add fixes and features to the "git patch-id" command, mostly
discovered through our usage of patch-id in the revup project
(https://github.com/Skydio/revup). On top of that I've tried to make general
cleanup changes where I can.

Summary:

1: Fixed a bug in the combination of --stable with binary files and
header-only, and expanded the test to cover both binary and non-binary
files.

2: Switch internal usage of patch-id in rebase / cherry-pick to use the
stable variant to reduce the number of code paths and improve testing for
bugs like above.

3: Fixed bugs with patch-id and binary diffs. Previously patch-id did not
behave correctly for binary diffs regardless of whether "--binary" was given
to "diff".

4: Fixed bugs with patch-id and mode changes. Previously mode changes were
incorrectly excluded from the patch-id.

5: Add a new "--include-whitespace" mode to patch-id that prevents
whitespace from being stripped during id calculation. Also add a config
option for the same behavior.

6: Remove unused prefix from patch-id logic.

V1->V2: Fixed comment style V2->V3: The ---/+++ lines no longer get added to
the patch-id of binary diffs. Also added patches 3-7 in the series. V3->V3:
Dropped patch7. Updated flag name to --verbatim. Updated commit message
descriptions.

Signed-off-by: Jerry Zhang jerry@skydio.com

Jerry Zhang (6):
  patch-id: fix stable patch id for binary / header-only
  patch-id: use stable patch-id for rebases
  builtin: patch-id: fix patch-id with binary diffs
  patch-id: fix patch-id for mode changes
  builtin: patch-id: add --verbatim as a command mode
  builtin: patch-id: remove unused diff-tree prefix

 Documentation/git-patch-id.txt |  24 ++++---
 builtin/log.c                  |   2 +-
 builtin/patch-id.c             | 113 ++++++++++++++++++++++++---------
 diff.c                         |  75 +++++++++++-----------
 diff.h                         |   2 +-
 patch-ids.c                    |  10 +--
 patch-ids.h                    |   2 +-
 t/t3419-rebase-patch-id.sh     |  63 +++++++++++++++---
 t/t4204-patch-id.sh            |  95 +++++++++++++++++++++++++--
 9 files changed, 287 insertions(+), 99 deletions(-)


base-commit: 45c9f05c44b1cb6bd2d6cb95a22cf5e3d21d5b63
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1359%2Fjerry-skydio%2Fjerry%2Frevup%2Fmaster%2Fpatch_ids-v4
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1359/jerry-skydio/jerry/revup/master/patch_ids-v4
Pull-Request: https://github.com/gitgitgadget/git/pull/1359

Range-diff vs v3:

 1:  7d4c2e91ce0 ! 1:  321757ef919 patch-id: fix stable patch id for binary / header-only
     @@ Metadata
       ## Commit message ##
          patch-id: fix stable patch id for binary / header-only
      
     -    Previous logic here skipped flushing the hunks for binary
     -    and header-only patch ids, which would always result in a
     -    patch-id of 0000.
     +    Patch-ids for binary patches are found by hashing the object
     +    ids of the before and after objects in succession. However in
     +    the --stable case, there is a bug where hunks are not flushed
     +    for binary and header-only patch ids, which would always result
     +    in a patch-id of 0000. The --unstable case is currently correct.
      
          Reorder the logic to branch into 3 cases for populating the
          patch body: header-only which populates nothing, binary which
 2:  25e28b7dab3 = 2:  ec4a2422d5b patch-id: use stable patch-id for rebases
 3:  21642128927 ! 3:  81501355313 builtin: patch-id: fix patch-id with binary diffs
     @@ Commit message
          builtin: patch-id: fix patch-id with binary diffs
      
          "git patch-id" currently doesn't produce correct output if the
     -    incoming diff has any binary files. Add logic to
     -    get_one_patchid to handle the different possible styles of binary
     -    diff. This attempts to keep resulting patch-ids identical to what
     -    would be produced by the counterpart logic in diff.c, that is it
     -    produces the id by hashing the a and b oids in succession.
     +    incoming diff has any binary files. Add logic to get_one_patchid
     +    to handle the different possible styles of binary diff. This
     +    attempts to keep resulting patch-ids identical to what would be
     +    produced by the counterpart logic in diff.c, that is it produces
     +    the id by hashing the a and b oids in succession.
      
          In general we handle binary diffs by first caching the object ids from
          the "index" line and using those if we then find an indication
 4:  6e07cfd5691 = 4:  bb0b4add03c patch-id: fix patch-id for mode changes
 5:  bbaa2425ad0 ! 5:  b160f2ae49f builtin: patch-id: add --include-whitespace as a command mode
     @@ Metadata
      Author: Jerry Zhang <jerry@skydio.com>
      
       ## Commit message ##
     -    builtin: patch-id: add --include-whitespace as a command mode
     +    builtin: patch-id: add --verbatim as a command mode
      
     -    There are situations where the user might not want the default setting
     -    where patch-id strips all whitespace. They might be working in a
     -    language where white space is syntactically important, or they might
     -    have CI testing that enforces strict whitespace linting. In these cases,
     -    a whitespace change would result in the patch fundamentally changing,
     -    and thus deserving of a different id.
     +    There are situations where the user might not want the default
     +    setting where patch-id strips all whitespace. They might be working
     +    in a language where white space is syntactically important, or they
     +    might have CI testing that enforces strict whitespace linting. In
     +    these cases, a whitespace change would result in the patch
     +    fundamentally changing, and thus deserving of a different id.
      
          Add a new mode that is exclusive of --stable and --unstable called
     -    --include-whitespace. It also corresponds to the config
     -    patchid.include_whitespace = true. In this mode, the stable algorithm
     -    is used and whitespace is not stripped from the patch text.
     +    --verbatim. It also corresponds to the config
     +    patchid.verbatim = true. In this mode, the stable algorithm is
     +    used and whitespace is not stripped from the patch text.
     +
     +    Users of --unstable mainly care about compatibility with old git
     +    versions, which unstripping the whitespace would break. Thus there
     +    isn't a usecase for the combination of --verbatim and --unstable,
     +    and we don't expose this so as to not add maintainence burden.
      
          Signed-off-by: Jerry Zhang <jerry@skydio.com>
          fixes https://github.com/Skydio/revup/issues/2
     @@ Documentation/git-patch-id.txt: git-patch-id - Compute unique ID for a patch
       --------
       [verse]
      -'git patch-id' [--stable | --unstable]
     -+'git patch-id' [--stable | --unstable | --include-whitespace]
     ++'git patch-id' [--stable | --unstable | --verbatim]
       
       DESCRIPTION
       -----------
     @@ Documentation/git-patch-id.txt: This can be used to make a mapping from patch ID
       OPTIONS
       -------
       
     -+--include-whitespace::
     -+	Use the "stable" algorithm described below and also don't strip whitespace
     -+	from lines when calculating the patch-id.
     ++--verbatim::
     ++	Calculate the patch-id of the input as it is given, do not strip
     ++	any whitespace.
      +
     -+	This is the default if patchid.includeWhitespace is true and implies
     -+	patchid.stable.
     ++	This is the default if patchid.verbatim is true.
      +
       --stable::
       	Use a "stable" sum of hashes as the patch ID. With this option:
     @@ builtin/patch-id.c: static int scan_hunk_header(const char *p, int *p_before, in
       
       static int get_one_patchid(struct object_id *next_oid, struct object_id *result,
      -			   struct strbuf *line_buf, int stable)
     -+			   struct strbuf *line_buf, int stable, int include_whitespace)
     ++			   struct strbuf *line_buf, int stable, int verbatim)
       {
       	int patchlen = 0, found_next = 0;
       	int before = -1, after = -1;
     @@ builtin/patch-id.c: static int get_one_patchid(struct object_id *next_oid, struc
       		    !skip_prefix(line, "From ", &p) &&
      -		    starts_with(line, "\\ ") && 12 < strlen(line))
      +		    starts_with(line, "\\ ") && 12 < strlen(line)) {
     -+			if (include_whitespace)
     ++			if (verbatim)
      +				the_hash_algo->update_fn(&ctx, line, strlen(line));
       			continue;
      +		}
     @@ builtin/patch-id.c: static int get_one_patchid(struct object_id *next_oid, struc
      -		/* Compute the sha without whitespace */
      -		len = remove_space(line);
      +		/* Add line to hash algo (possibly removing whitespace) */
     -+		len = include_whitespace ? strlen(line) : remove_space(line);
     ++		len = verbatim ? strlen(line) : remove_space(line);
       		patchlen += len;
       		the_hash_algo->update_fn(&ctx, line, len);
       	}
     @@ builtin/patch-id.c: static int get_one_patchid(struct object_id *next_oid, struc
       }
       
      -static void generate_id_list(int stable)
     -+static void generate_id_list(int stable, int include_whitespace)
     ++static void generate_id_list(int stable, int verbatim)
       {
       	struct object_id oid, n, result;
       	int patchlen;
     @@ builtin/patch-id.c: static void generate_id_list(int stable)
       	oidclr(&oid);
       	while (!feof(stdin)) {
      -		patchlen = get_one_patchid(&n, &result, &line_buf, stable);
     -+		patchlen = get_one_patchid(&n, &result, &line_buf, stable, include_whitespace);
     ++		patchlen = get_one_patchid(&n, &result, &line_buf, stable, verbatim);
       		flush_current_id(patchlen, &oid, &result);
       		oidcpy(&oid, &n);
       	}
     @@ builtin/patch-id.c: static void generate_id_list(int stable)
       }
       
      -static const char patch_id_usage[] = "git patch-id [--stable | --unstable]";
     -+static const char * const patch_id_usage[] = {
     -+	N_("git patch-id [--stable | --unstable | --include-whitespace]"),
     -+	NULL
     ++static const char *const patch_id_usage[] = {
     ++	N_("git patch-id [--stable | --unstable | --verbatim]"), NULL
      +};
      +
      +struct patch_id_opts {
      +	int stable;
     -+	int include_whitespace;
     ++	int verbatim;
      +};
       
       static int git_patch_id_config(const char *var, const char *value, void *cb)
     @@ builtin/patch-id.c: static void generate_id_list(int stable)
      +		opts->stable = git_config_bool(var, value);
      +		return 0;
      +	}
     -+	if (!strcmp(var, "patchid.includewhitespace")) {
     -+		opts->include_whitespace = git_config_bool(var, value);
     ++	if (!strcmp(var, "patchid.verbatim")) {
     ++		opts->verbatim = git_config_bool(var, value);
       		return 0;
       	}
       
     @@ builtin/patch-id.c: static int git_patch_id_config(const char *var, const char *
      +	int opts = 0;
      +	struct option builtin_patch_id_options[] = {
      +		OPT_CMDMODE(0, "unstable", &opts,
     -+			N_("use the unstable patch-id algorithm"), 1),
     ++		    N_("use the unstable patch-id algorithm"), 1),
      +		OPT_CMDMODE(0, "stable", &opts,
     -+			N_("use the stable patch-id algorithm"), 2),
     -+		OPT_CMDMODE(0, "include-whitespace", &opts,
     -+			N_("use the stable algorithm and don't strip whitespace"), 3),
     ++		    N_("use the stable patch-id algorithm"), 2),
     ++		OPT_CMDMODE(0, "verbatim", &opts,
     ++			N_("don't strip whitespace from the patch"), 3),
      +		OPT_END()
      +	};
      +
      +	git_config(git_patch_id_config, &config);
      +
     -+	/* includeWhitespace implies stable */
     -+	if (config.include_whitespace)
     ++	/* verbatim implies stable */
     ++	if (config.verbatim)
      +		config.stable = 1;
      +
      +	argc = parse_options(argc, argv, prefix, builtin_patch_id_options,
      +			     patch_id_usage, 0);
      +
      +	generate_id_list(opts ? opts > 1 : config.stable,
     -+			 opts ? opts == 3 : config.include_whitespace);
     ++			 opts ? opts == 3 : config.verbatim);
       	return 0;
       }
      
     @@ t/t4204-patch-id.sh: test_expect_success 'file order is relevant with --unstable
       	test_patch_id_file_order relevant --unstable --unstable
       '
       
     -+test_expect_success 'whitespace is relevant with --include-whitespace' '
     -+	test_patch_id_whitespace relevant --include-whitespace --include-whitespace
     ++test_expect_success 'whitespace is relevant with --verbatim' '
     ++	test_patch_id_whitespace relevant --verbatim --verbatim
      +'
      +
     -+test_expect_success 'whitespace is irrelevant without --include-whitespace' '
     ++test_expect_success 'whitespace is irrelevant without --verbatim' '
      +	test_patch_id_whitespace irrelevant --stable --stable
      +'
      +
     @@ t/t4204-patch-id.sh: test_expect_success 'patchid.stable = false is unstable' '
       	test_patch_id relevant patchid.stable=false
       '
       
     -+test_expect_success 'patchid.includeWhitespace = true is correct and stable' '
     -+	test_config patchid.includeWhitespace true &&
     -+	test_patch_id_whitespace relevant patchid.includeWhitespace=true &&
     -+	test_patch_id irrelevant patchid.includeWhitespace=true
     ++test_expect_success 'patchid.verbatim = true is correct and stable' '
     ++	test_config patchid.verbatim true &&
     ++	test_patch_id_whitespace relevant patchid.verbatim=true &&
     ++	test_patch_id irrelevant patchid.verbatim=true
      +'
      +
     -+test_expect_success 'patchid.includeWhitespace = false is unstable' '
     -+	test_config patchid.includeWhitespace false &&
     -+	test_patch_id relevant patchid.includeWhitespace=false
     ++test_expect_success 'patchid.verbatim = false is unstable' '
     ++	test_config patchid.verbatim false &&
     ++	test_patch_id relevant patchid.verbatim=false
      +'
      +
       test_expect_success '--unstable overrides patchid.stable = true' '
     @@ t/t4204-patch-id.sh: test_expect_success '--stable overrides patchid.stable = fa
       	test_patch_id irrelevant patchid.stable=false--stable --stable
       '
       
     -+test_expect_success '--include-whitespace overrides patchid.stable = false' '
     ++test_expect_success '--verbatim overrides patchid.stable = false' '
      +	test_config patchid.stable false &&
     -+	test_patch_id_whitespace relevant stable=false--include-whitespace --include-whitespace
     ++	test_patch_id_whitespace relevant stable=false--verbatim --verbatim
      +'
      +
       test_expect_success 'patch-id supports git-format-patch MIME output' '
     @@ t/t4204-patch-id.sh: test_expect_success 'patch-id handles no-nl-at-eof markers'
       	calc_patch_id withnl <withnl &&
      -	test_cmp patch-id_nonl patch-id_withnl
      +	test_cmp patch-id_nonl patch-id_withnl &&
     -+	calc_patch_id nonl-inc-ws --include-whitespace <nonl &&
     -+	calc_patch_id withnl-inc-ws --include-whitespace <withnl &&
     ++	calc_patch_id nonl-inc-ws --verbatim <nonl &&
     ++	calc_patch_id withnl-inc-ws --verbatim <withnl &&
      +	! test_cmp patch-id_nonl-inc-ws patch-id_withnl-inc-ws
       '
       
 6:  a1f6f36d487 ! 6:  dcdfac7a153 builtin: patch-id: remove unused diff-tree prefix
     @@ builtin/patch-id.c: static int get_one_patchid(struct object_id *next_oid, struc
      +		if (!skip_prefix(line, "commit ", &p) &&
       		    !skip_prefix(line, "From ", &p) &&
       		    starts_with(line, "\\ ") && 12 < strlen(line)) {
     - 			if (include_whitespace)
     + 			if (verbatim)
 7:  69440797f30 < -:  ----------- documentation: format-patch: clarify requirements for patch-ids to match

-- 
gitgitgadget

  parent reply	other threads:[~2022-10-20 23:17 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-09-20  5:58 [PATCH 0/2] update internal patch-id to use "stable" algorithm Jerry Zhang via GitGitGadget
2022-09-20  5:58 ` [PATCH 1/2] patch-id: fix stable patch id for binary / header-only Jerry Zhang via GitGitGadget
2022-09-20  5:58 ` [PATCH 2/2] patch-id: use stable patch-id for rebases Jerry Zhang via GitGitGadget
2022-09-20  6:20 ` [PATCH v2 0/2] update internal patch-id to use "stable" algorithm Jerry Zhang via GitGitGadget
2022-09-20  6:20   ` [PATCH v2 1/2] patch-id: fix stable patch id for binary / header-only Jerry Zhang via GitGitGadget
2022-09-20  6:20   ` [PATCH v2 2/2] patch-id: use stable patch-id for rebases Jerry Zhang via GitGitGadget
2022-10-14  8:56   ` [PATCH v3 0/7] patch-id fixes and improvements Jerry Zhang via GitGitGadget
2022-10-14  8:56     ` [PATCH v3 1/7] patch-id: fix stable patch id for binary / header-only Jerry Zhang via GitGitGadget
2022-10-14  8:56     ` [PATCH v3 2/7] patch-id: use stable patch-id for rebases Jerry Zhang via GitGitGadget
2022-10-14  8:56     ` [PATCH v3 3/7] builtin: patch-id: fix patch-id with binary diffs Jerry Zhang via GitGitGadget
2022-10-14 17:13       ` Junio C Hamano
2022-10-14 22:33         ` Jerry Zhang
2022-10-14 21:12       ` Junio C Hamano
2022-10-14 22:34         ` Jerry Zhang
2022-10-17 15:23           ` Junio C Hamano
2022-10-14  8:56     ` [PATCH v3 4/7] patch-id: fix patch-id for mode changes Jerry Zhang via GitGitGadget
2022-10-14 21:17       ` Junio C Hamano
2022-10-14  8:56     ` [PATCH v3 5/7] builtin: patch-id: add --include-whitespace as a command mode Jerry Zhang via GitGitGadget
2022-10-14 21:24       ` Junio C Hamano
2022-10-14 22:55         ` Jerry Zhang
2022-10-17 15:38         ` Junio C Hamano
2022-10-18 22:12           ` Jerry Zhang
2022-10-14  8:56     ` [PATCH v3 6/7] builtin: patch-id: remove unused diff-tree prefix Jerry Zhang via GitGitGadget
2022-10-14 22:03       ` Junio C Hamano
2022-10-14  8:56     ` [PATCH v3 7/7] documentation: format-patch: clarify requirements for patch-ids to match Jerry Zhang via GitGitGadget
2022-10-17 15:18       ` Junio C Hamano
2022-10-18 21:57         ` Jerry Zhang
2022-10-19 16:19           ` Junio C Hamano
2022-10-20 23:16     ` Jerry Zhang via GitGitGadget [this message]
2022-10-20 23:16       ` [PATCH v4 1/6] patch-id: fix stable patch id for binary / header-only Jerry Zhang via GitGitGadget
2022-10-20 23:16       ` [PATCH v4 2/6] patch-id: use stable patch-id for rebases Jerry Zhang via GitGitGadget
2022-10-20 23:16       ` [PATCH v4 3/6] builtin: patch-id: fix patch-id with binary diffs Jerry Zhang via GitGitGadget
2022-10-20 23:16       ` [PATCH v4 4/6] patch-id: fix patch-id for mode changes Jerry Zhang via GitGitGadget
2022-10-20 23:16       ` [PATCH v4 5/6] builtin: patch-id: add --verbatim as a command mode Jerry Zhang via GitGitGadget
2022-10-20 23:16       ` [PATCH v4 6/6] builtin: patch-id: remove unused diff-tree prefix Jerry Zhang via GitGitGadget
2022-10-21  9:33         ` Junio C Hamano
2022-10-24 20:07       ` [PATCH v5 0/6] patch-id fixes and improvements Jerry Zhang via GitGitGadget
2022-10-24 20:07         ` [PATCH v5 1/6] patch-id: fix stable patch id for binary / header-only Jerry Zhang via GitGitGadget
2022-10-24 20:07         ` [PATCH v5 2/6] patch-id: use stable patch-id for rebases Jerry Zhang via GitGitGadget
2022-10-24 20:07         ` [PATCH v5 3/6] builtin: patch-id: fix patch-id with binary diffs Jerry Zhang via GitGitGadget
2022-10-24 20:07         ` [PATCH v5 4/6] patch-id: fix patch-id for mode changes Jerry Zhang via GitGitGadget
2022-10-24 20:07         ` [PATCH v5 5/6] builtin: patch-id: add --verbatim as a command mode Jerry Zhang via GitGitGadget
2022-10-24 20:07         ` [PATCH v5 6/6] builtin: patch-id: remove unused diff-tree prefix Jerry Zhang via GitGitGadget
2022-10-24 22:55         ` [PATCH v5 0/6] patch-id fixes and improvements Junio C Hamano
2022-09-21 19:16 ` [PATCH 0/2] update internal patch-id to use "stable" algorithm Junio C Hamano
2022-09-21 20:59   ` Jerry Zhang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=pull.1359.v4.git.1666307815.gitgitgadget@gmail.com \
    --to=gitgitgadget@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=jerry@skydio.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.