All of lore.kernel.org
 help / color / mirror / Atom feed
From: Elijah Newren <newren@gmail.com>
To: <git@vger.kernel.org>
Cc: <larsxschneider@gmail.com>, <sandals@crustytoothpaste.net>,
	<peff@peff.net>, <me@ttaylorr.com>, <jrnieder@gmail.com>,
	<gitster@pobox.com>, Elijah Newren <newren@gmail.com>
Subject: [PATCH v2 11/11] fast-export: add --always-show-modify-after-rename
Date: Tue, 13 Nov 2018 16:26:00 -0800	[thread overview]
Message-ID: <20181114002600.29233-12-newren@gmail.com> (raw)
In-Reply-To: <20181114002600.29233-1-newren@gmail.com>

I wanted a way to gather all the following information efficiently
(with as few history traversals as possible):
  * Get all blob sizes
  * Map blob shas to filename(s) they appeared under in the history
  * Find when files and directories were deleted (and whether they
    were later reinstated, since that means they aren't actually gone)
  * Find sets of filenames referring to the same logical 'file'. (e.g.
    foo->bar in commit A and bar->baz in commit B mean that
    {foo,bar,baz} refer to the same 'file', so someone wanting to just
    "keep baz and its history" need all versions of those three
    filenames).  I need to know about things like another foo or bar
    being introduced after the rename though, since that breaks the
    connection between filenames)
and then I would generate various aggregations on the data and display
some type of report for the user.

The only way I know of to get blob sizes is via
  cat-file --batch-all-objects --batch-check

The rest of the data would traditionally be gathered from a log command,
e.g.

  git log --format='%H%n%P%n%cd' --date=short --topo-order --reverse \
      -M --diff-filter=RAMD --no-abbrev --raw -c

however, parsing log output seems slightly dangerous given that it is a
porcelain command.  While we have specified --format and --raw to try
to avoid the most obvious problems, I'm still slightly concerned about
--date=short, the combinations of --raw and -c, options that might
colorize the output, and also the --diff-filter (there is no current
option named --no-find-copies or --no-break-rewrites, but what if those
turn on by default in the future much as we changed the default with
detecting renames?).  Each of those is a small worry, but they add up.

A command meant for data serialization, such as fast-export, seems like
a better candidate for this job.  There's just one missing item: in
order to connect blob sizes to filenames, I need fast-export to tell me
the blob sha1sum of any file changes.  It does this for modifies, but
not always for renames.  In particular, if a file is a 100% rename, it
only prints
    R oldname newname
instead of
    R oldname newname
    M 100644 $SHA1 newname
as occurs when there is a rename+modify.  Add an option which allows us
to force the latter output even when commits have exact renames of
files.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 Documentation/git-fast-export.txt | 11 ++++++++++
 builtin/fast-export.c             |  7 +++++-
 t/t9350-fast-export.sh            | 36 +++++++++++++++++++++++++++++++
 3 files changed, 53 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-fast-export.txt b/Documentation/git-fast-export.txt
index 64c01ba918..b663b6f8af 100644
--- a/Documentation/git-fast-export.txt
+++ b/Documentation/git-fast-export.txt
@@ -129,6 +129,17 @@ marks the same across runs.
 	for intermediary filters (e.g. for rewriting commit messages
 	which refer to older commits, or for stripping blobs by id).
 
+--always-show-modify-after-rename::
+	When a rename is detected, fast-export normally issues both a
+	'R' (rename) and a 'M' (modify) directive.  However, if the
+	contents of the old and new filename match exactly, it will
+	only issue the rename directive.  Use this flag to have it
+	always issue the modify directive after the rename, which may
+	be useful for tools which are using the fast-export stream as
+	a mechanism for gathering statistics about a repository.  Note
+	that this option only has effect when rename detection is
+	active (see the -M option).
+
 --refspec::
 	Apply the specified refspec to each ref exported. Multiple of them can
 	be specified.
diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index e0f794811e..31ad43077a 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -38,6 +38,7 @@ static int use_done_feature;
 static int no_data;
 static int full_tree;
 static int reference_excluded_commits;
+static int always_show_modify_after_rename;
 static int show_original_ids;
 static struct string_list extra_refs = STRING_LIST_INIT_NODUP;
 static struct string_list tag_refs = STRING_LIST_INIT_NODUP;
@@ -407,7 +408,8 @@ static void show_filemodify(struct diff_queue_struct *q,
 				putchar('\n');
 
 				if (oideq(&ospec->oid, &spec->oid) &&
-				    ospec->mode == spec->mode)
+				    ospec->mode == spec->mode &&
+				    !always_show_modify_after_rename)
 					break;
 			}
 			/* fallthrough */
@@ -1105,6 +1107,9 @@ int cmd_fast_export(int argc, const char **argv, const char *prefix)
 			 &reference_excluded_commits, N_("Reference parents which are not in fast-export stream by sha1sum")),
 		OPT_BOOL(0, "show-original-ids", &show_original_ids,
 			    N_("Show original sha1sums of blobs/commits")),
+		OPT_BOOL(0, "always-show-modify-after-rename",
+			    &always_show_modify_after_rename,
+			 N_("Always provide 'M' directive after 'R'")),
 
 		OPT_END()
 	};
diff --git a/t/t9350-fast-export.sh b/t/t9350-fast-export.sh
index 5690fe2810..5c20065e39 100755
--- a/t/t9350-fast-export.sh
+++ b/t/t9350-fast-export.sh
@@ -630,4 +630,40 @@ test_expect_success 'merge commit gets exported with --import-marks' '
 	)
 '
 
+test_expect_success 'rename detection and --always-show-modify-after-rename' '
+	test_create_repo renames &&
+	(
+		cd renames &&
+		test_seq 0  9  >single_digit &&
+		test_seq 10 98 >double_digit &&
+		git add . &&
+		git commit -m initial &&
+
+		echo 99 >>double_digit &&
+		git mv single_digit single-digit &&
+		git mv double_digit double-digit &&
+		git add double-digit &&
+		git commit -m renames &&
+
+		# First, check normal fast-export -M output
+		git fast-export -M --no-data master >out &&
+
+		grep double-digit out >out2 &&
+		test_line_count = 2 out2 &&
+
+		grep single-digit out >out2 &&
+		test_line_count = 1 out2 &&
+
+		# Now, test with --always-show-modify-after-rename; should
+		# have an extra "M" directive for "single-digit".
+		git fast-export -M --no-data --always-show-modify-after-rename master >out &&
+
+		grep double-digit out >out2 &&
+		test_line_count = 2 out2 &&
+
+		grep single-digit out >out2 &&
+		test_line_count = 2 out2
+	)
+'
+
 test_done
-- 
2.19.1.1063.g2b8e4a4f82.dirty


  parent reply	other threads:[~2018-11-14  0:26 UTC|newest]

Thread overview: 90+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-09-23 13:04 Import/Export as a fast way to purge files from Git? Lars Schneider
2018-09-23 14:55 ` Eric Sunshine
2018-09-23 15:58   ` Lars Schneider
2018-09-23 15:53 ` brian m. carlson
2018-09-23 17:04   ` Jeff King
2018-09-24 17:24 ` Elijah Newren
2018-10-31 19:15   ` Lars Schneider
2018-11-01  7:12     ` Elijah Newren
2018-11-11  6:23       ` [PATCH 00/10] fast export and import fixes and features Elijah Newren
2018-11-11  6:23         ` [PATCH 01/10] git-fast-import.txt: fix documentation for --quiet option Elijah Newren
2018-11-11  6:33           ` Jeff King
2018-11-11  6:23         ` [PATCH 02/10] git-fast-export.txt: clarify misleading documentation about rev-list args Elijah Newren
2018-11-11  6:36           ` Jeff King
2018-11-11  7:17             ` Elijah Newren
2018-11-13 23:25               ` Elijah Newren
2018-11-13 23:39                 ` Jonathan Nieder
2018-11-14  0:02                   ` Elijah Newren
2018-11-11  6:23         ` [PATCH 03/10] fast-export: use value from correct enum Elijah Newren
2018-11-11  6:36           ` Jeff King
2018-11-11 20:10             ` Ævar Arnfjörð Bjarmason
2018-11-12  9:12               ` Ævar Arnfjörð Bjarmason
2018-11-12 11:31               ` Jeff King
2018-11-11  6:23         ` [PATCH 04/10] fast-export: avoid dying when filtering by paths and old tags exist Elijah Newren
2018-11-11  6:44           ` Jeff King
2018-11-11  7:38             ` Elijah Newren
2018-11-12 12:32               ` Jeff King
2018-11-12 22:50             ` brian m. carlson
2018-11-13 14:38               ` Jeff King
2018-11-11  6:23         ` [PATCH 05/10] fast-export: move commit rewriting logic into a function for reuse Elijah Newren
2018-11-11  6:47           ` Jeff King
2018-11-11  6:23         ` [PATCH 06/10] fast-export: when using paths, avoid corrupt stream with non-existent mark Elijah Newren
2018-11-11  6:53           ` Jeff King
2018-11-11  8:01             ` Elijah Newren
2018-11-12 12:45               ` Jeff King
2018-11-12 15:36                 ` Elijah Newren
2018-11-11  6:23         ` [PATCH 07/10] fast-export: ensure we export requested refs Elijah Newren
2018-11-11  7:02           ` Jeff King
2018-11-11  8:20             ` Elijah Newren
2018-11-11  6:23         ` [PATCH 08/10] fast-export: add --reference-excluded-parents option Elijah Newren
2018-11-11  7:11           ` Jeff King
2018-11-11  6:23         ` [PATCH 09/10] fast-export: add a --show-original-ids option to show original names Elijah Newren
2018-11-11  7:20           ` Jeff King
2018-11-11  8:32             ` Elijah Newren
2018-11-12 12:53               ` Jeff King
2018-11-12 15:46                 ` Elijah Newren
2018-11-12 16:31                   ` Jeff King
2018-11-11  6:23         ` [PATCH 10/10] fast-export: add --always-show-modify-after-rename Elijah Newren
2018-11-11  7:23           ` Jeff King
2018-11-11  8:42             ` Elijah Newren
2018-11-12 12:58               ` Jeff King
2018-11-12 18:08                 ` Elijah Newren
2018-11-13 14:45                   ` Jeff King
2018-11-13 17:10                     ` Elijah Newren
2018-11-14  7:14                       ` Jeff King
2018-11-11  7:27         ` [PATCH 00/10] fast export and import fixes and features Jeff King
2018-11-11  8:44           ` Elijah Newren
2018-11-12 13:00             ` Jeff King
2018-11-14  0:25         ` [PATCH v2 00/11] " Elijah Newren
2018-11-14  0:25           ` [PATCH v2 01/11] git-fast-import.txt: fix documentation for --quiet option Elijah Newren
2018-11-14  0:25           ` [PATCH v2 02/11] git-fast-export.txt: clarify misleading documentation about rev-list args Elijah Newren
2018-11-14  0:25           ` [PATCH v2 03/11] fast-export: use value from correct enum Elijah Newren
2018-11-14  0:25           ` [PATCH v2 04/11] fast-export: avoid dying when filtering by paths and old tags exist Elijah Newren
2018-11-14 19:17             ` SZEDER Gábor
2018-11-14 23:13               ` Elijah Newren
2018-11-14  0:25           ` [PATCH v2 05/11] fast-export: move commit rewriting logic into a function for reuse Elijah Newren
2018-11-14  0:25           ` [PATCH v2 06/11] fast-export: when using paths, avoid corrupt stream with non-existent mark Elijah Newren
2018-11-14  0:25           ` [PATCH v2 07/11] fast-export: ensure we export requested refs Elijah Newren
2018-11-14  0:25           ` [PATCH v2 08/11] fast-export: add --reference-excluded-parents option Elijah Newren
2018-11-14 19:27             ` SZEDER Gábor
2018-11-14 23:16               ` Elijah Newren
2018-11-14  0:25           ` [PATCH v2 09/11] fast-import: remove unmaintained duplicate documentation Elijah Newren
2018-11-14  0:25           ` [PATCH v2 10/11] fast-export: add a --show-original-ids option to show original names Elijah Newren
2018-11-14  0:26           ` Elijah Newren [this message]
2018-11-14  7:25           ` [PATCH v2 00/11] fast export and import fixes and features Jeff King
2018-11-16  7:59           ` [PATCH v3 " Elijah Newren
2018-11-16  7:59             ` [PATCH v3 01/11] fast-export: convert sha1 to oid Elijah Newren
2018-11-16  7:59             ` [PATCH v3 02/11] git-fast-import.txt: fix documentation for --quiet option Elijah Newren
2018-11-16  7:59             ` [PATCH v3 03/11] git-fast-export.txt: clarify misleading documentation about rev-list args Elijah Newren
2018-11-16  7:59             ` [PATCH v3 04/11] fast-export: use value from correct enum Elijah Newren
2018-11-16  7:59             ` [PATCH v3 05/11] fast-export: avoid dying when filtering by paths and old tags exist Elijah Newren
2018-11-16  7:59             ` [PATCH v3 06/11] fast-export: move commit rewriting logic into a function for reuse Elijah Newren
2018-11-16  7:59             ` [PATCH v3 07/11] fast-export: when using paths, avoid corrupt stream with non-existent mark Elijah Newren
2018-11-16  7:59             ` [PATCH v3 08/11] fast-export: ensure we export requested refs Elijah Newren
2018-11-16  7:59             ` [PATCH v3 09/11] fast-export: add --reference-excluded-parents option Elijah Newren
2018-11-16  7:59             ` [PATCH v3 10/11] fast-import: remove unmaintained duplicate documentation Elijah Newren
2018-11-16  7:59             ` [PATCH v3 11/11] fast-export: add a --show-original-ids option to show original names Elijah Newren
2018-11-16 12:29               ` SZEDER Gábor
2018-11-16  8:50             ` [PATCH v3 00/11] fast export and import fixes and features Jeff King
2018-11-12  9:17       ` Import/Export as a fast way to purge files from Git? Ævar Arnfjörð Bjarmason
2018-11-12 15:34         ` Elijah Newren

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20181114002600.29233-12-newren@gmail.com \
    --to=newren@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jrnieder@gmail.com \
    --cc=larsxschneider@gmail.com \
    --cc=me@ttaylorr.com \
    --cc=peff@peff.net \
    --cc=sandals@crustytoothpaste.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.