From: Elijah Newren <newren@gmail.com>
To: <git@vger.kernel.org>
Cc: <larsxschneider@gmail.com>, <sandals@crustytoothpaste.net>,
<peff@peff.net>, <me@ttaylorr.com>, <jrnieder@gmail.com>,
<gitster@pobox.com>, Elijah Newren <newren@gmail.com>
Subject: [PATCH v2 11/11] fast-export: add --always-show-modify-after-rename
Date: Tue, 13 Nov 2018 16:26:00 -0800 [thread overview]
Message-ID: <20181114002600.29233-12-newren@gmail.com> (raw)
In-Reply-To: <20181114002600.29233-1-newren@gmail.com>
I wanted a way to gather all the following information efficiently
(with as few history traversals as possible):
* Get all blob sizes
* Map blob shas to filename(s) they appeared under in the history
* Find when files and directories were deleted (and whether they
were later reinstated, since that means they aren't actually gone)
* Find sets of filenames referring to the same logical 'file'. (e.g.
foo->bar in commit A and bar->baz in commit B mean that
{foo,bar,baz} refer to the same 'file', so someone wanting to just
"keep baz and its history" need all versions of those three
filenames). I need to know about things like another foo or bar
being introduced after the rename though, since that breaks the
connection between filenames)
and then I would generate various aggregations on the data and display
some type of report for the user.
The only way I know of to get blob sizes is via
cat-file --batch-all-objects --batch-check
The rest of the data would traditionally be gathered from a log command,
e.g.
git log --format='%H%n%P%n%cd' --date=short --topo-order --reverse \
-M --diff-filter=RAMD --no-abbrev --raw -c
however, parsing log output seems slightly dangerous given that it is a
porcelain command. While we have specified --format and --raw to try
to avoid the most obvious problems, I'm still slightly concerned about
--date=short, the combinations of --raw and -c, options that might
colorize the output, and also the --diff-filter (there is no current
option named --no-find-copies or --no-break-rewrites, but what if those
turn on by default in the future much as we changed the default with
detecting renames?). Each of those is a small worry, but they add up.
A command meant for data serialization, such as fast-export, seems like
a better candidate for this job. There's just one missing item: in
order to connect blob sizes to filenames, I need fast-export to tell me
the blob sha1sum of any file changes. It does this for modifies, but
not always for renames. In particular, if a file is a 100% rename, it
only prints
R oldname newname
instead of
R oldname newname
M 100644 $SHA1 newname
as occurs when there is a rename+modify. Add an option which allows us
to force the latter output even when commits have exact renames of
files.
Signed-off-by: Elijah Newren <newren@gmail.com>
---
Documentation/git-fast-export.txt | 11 ++++++++++
builtin/fast-export.c | 7 +++++-
t/t9350-fast-export.sh | 36 +++++++++++++++++++++++++++++++
3 files changed, 53 insertions(+), 1 deletion(-)
diff --git a/Documentation/git-fast-export.txt b/Documentation/git-fast-export.txt
index 64c01ba918..b663b6f8af 100644
--- a/Documentation/git-fast-export.txt
+++ b/Documentation/git-fast-export.txt
@@ -129,6 +129,17 @@ marks the same across runs.
for intermediary filters (e.g. for rewriting commit messages
which refer to older commits, or for stripping blobs by id).
+--always-show-modify-after-rename::
+ When a rename is detected, fast-export normally issues both a
+ 'R' (rename) and a 'M' (modify) directive. However, if the
+ contents of the old and new filename match exactly, it will
+ only issue the rename directive. Use this flag to have it
+ always issue the modify directive after the rename, which may
+ be useful for tools which are using the fast-export stream as
+ a mechanism for gathering statistics about a repository. Note
+ that this option only has effect when rename detection is
+ active (see the -M option).
+
--refspec::
Apply the specified refspec to each ref exported. Multiple of them can
be specified.
diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index e0f794811e..31ad43077a 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -38,6 +38,7 @@ static int use_done_feature;
static int no_data;
static int full_tree;
static int reference_excluded_commits;
+static int always_show_modify_after_rename;
static int show_original_ids;
static struct string_list extra_refs = STRING_LIST_INIT_NODUP;
static struct string_list tag_refs = STRING_LIST_INIT_NODUP;
@@ -407,7 +408,8 @@ static void show_filemodify(struct diff_queue_struct *q,
putchar('\n');
if (oideq(&ospec->oid, &spec->oid) &&
- ospec->mode == spec->mode)
+ ospec->mode == spec->mode &&
+ !always_show_modify_after_rename)
break;
}
/* fallthrough */
@@ -1105,6 +1107,9 @@ int cmd_fast_export(int argc, const char **argv, const char *prefix)
&reference_excluded_commits, N_("Reference parents which are not in fast-export stream by sha1sum")),
OPT_BOOL(0, "show-original-ids", &show_original_ids,
N_("Show original sha1sums of blobs/commits")),
+ OPT_BOOL(0, "always-show-modify-after-rename",
+ &always_show_modify_after_rename,
+ N_("Always provide 'M' directive after 'R'")),
OPT_END()
};
diff --git a/t/t9350-fast-export.sh b/t/t9350-fast-export.sh
index 5690fe2810..5c20065e39 100755
--- a/t/t9350-fast-export.sh
+++ b/t/t9350-fast-export.sh
@@ -630,4 +630,40 @@ test_expect_success 'merge commit gets exported with --import-marks' '
)
'
+test_expect_success 'rename detection and --always-show-modify-after-rename' '
+ test_create_repo renames &&
+ (
+ cd renames &&
+ test_seq 0 9 >single_digit &&
+ test_seq 10 98 >double_digit &&
+ git add . &&
+ git commit -m initial &&
+
+ echo 99 >>double_digit &&
+ git mv single_digit single-digit &&
+ git mv double_digit double-digit &&
+ git add double-digit &&
+ git commit -m renames &&
+
+ # First, check normal fast-export -M output
+ git fast-export -M --no-data master >out &&
+
+ grep double-digit out >out2 &&
+ test_line_count = 2 out2 &&
+
+ grep single-digit out >out2 &&
+ test_line_count = 1 out2 &&
+
+ # Now, test with --always-show-modify-after-rename; should
+ # have an extra "M" directive for "single-digit".
+ git fast-export -M --no-data --always-show-modify-after-rename master >out &&
+
+ grep double-digit out >out2 &&
+ test_line_count = 2 out2 &&
+
+ grep single-digit out >out2 &&
+ test_line_count = 2 out2
+ )
+'
+
test_done
--
2.19.1.1063.g2b8e4a4f82.dirty
next prev parent reply other threads:[~2018-11-14 0:26 UTC|newest]
Thread overview: 90+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-09-23 13:04 Import/Export as a fast way to purge files from Git? Lars Schneider
2018-09-23 14:55 ` Eric Sunshine
2018-09-23 15:58 ` Lars Schneider
2018-09-23 15:53 ` brian m. carlson
2018-09-23 17:04 ` Jeff King
2018-09-24 17:24 ` Elijah Newren
2018-10-31 19:15 ` Lars Schneider
2018-11-01 7:12 ` Elijah Newren
2018-11-11 6:23 ` [PATCH 00/10] fast export and import fixes and features Elijah Newren
2018-11-11 6:23 ` [PATCH 01/10] git-fast-import.txt: fix documentation for --quiet option Elijah Newren
2018-11-11 6:33 ` Jeff King
2018-11-11 6:23 ` [PATCH 02/10] git-fast-export.txt: clarify misleading documentation about rev-list args Elijah Newren
2018-11-11 6:36 ` Jeff King
2018-11-11 7:17 ` Elijah Newren
2018-11-13 23:25 ` Elijah Newren
2018-11-13 23:39 ` Jonathan Nieder
2018-11-14 0:02 ` Elijah Newren
2018-11-11 6:23 ` [PATCH 03/10] fast-export: use value from correct enum Elijah Newren
2018-11-11 6:36 ` Jeff King
2018-11-11 20:10 ` Ævar Arnfjörð Bjarmason
2018-11-12 9:12 ` Ævar Arnfjörð Bjarmason
2018-11-12 11:31 ` Jeff King
2018-11-11 6:23 ` [PATCH 04/10] fast-export: avoid dying when filtering by paths and old tags exist Elijah Newren
2018-11-11 6:44 ` Jeff King
2018-11-11 7:38 ` Elijah Newren
2018-11-12 12:32 ` Jeff King
2018-11-12 22:50 ` brian m. carlson
2018-11-13 14:38 ` Jeff King
2018-11-11 6:23 ` [PATCH 05/10] fast-export: move commit rewriting logic into a function for reuse Elijah Newren
2018-11-11 6:47 ` Jeff King
2018-11-11 6:23 ` [PATCH 06/10] fast-export: when using paths, avoid corrupt stream with non-existent mark Elijah Newren
2018-11-11 6:53 ` Jeff King
2018-11-11 8:01 ` Elijah Newren
2018-11-12 12:45 ` Jeff King
2018-11-12 15:36 ` Elijah Newren
2018-11-11 6:23 ` [PATCH 07/10] fast-export: ensure we export requested refs Elijah Newren
2018-11-11 7:02 ` Jeff King
2018-11-11 8:20 ` Elijah Newren
2018-11-11 6:23 ` [PATCH 08/10] fast-export: add --reference-excluded-parents option Elijah Newren
2018-11-11 7:11 ` Jeff King
2018-11-11 6:23 ` [PATCH 09/10] fast-export: add a --show-original-ids option to show original names Elijah Newren
2018-11-11 7:20 ` Jeff King
2018-11-11 8:32 ` Elijah Newren
2018-11-12 12:53 ` Jeff King
2018-11-12 15:46 ` Elijah Newren
2018-11-12 16:31 ` Jeff King
2018-11-11 6:23 ` [PATCH 10/10] fast-export: add --always-show-modify-after-rename Elijah Newren
2018-11-11 7:23 ` Jeff King
2018-11-11 8:42 ` Elijah Newren
2018-11-12 12:58 ` Jeff King
2018-11-12 18:08 ` Elijah Newren
2018-11-13 14:45 ` Jeff King
2018-11-13 17:10 ` Elijah Newren
2018-11-14 7:14 ` Jeff King
2018-11-11 7:27 ` [PATCH 00/10] fast export and import fixes and features Jeff King
2018-11-11 8:44 ` Elijah Newren
2018-11-12 13:00 ` Jeff King
2018-11-14 0:25 ` [PATCH v2 00/11] " Elijah Newren
2018-11-14 0:25 ` [PATCH v2 01/11] git-fast-import.txt: fix documentation for --quiet option Elijah Newren
2018-11-14 0:25 ` [PATCH v2 02/11] git-fast-export.txt: clarify misleading documentation about rev-list args Elijah Newren
2018-11-14 0:25 ` [PATCH v2 03/11] fast-export: use value from correct enum Elijah Newren
2018-11-14 0:25 ` [PATCH v2 04/11] fast-export: avoid dying when filtering by paths and old tags exist Elijah Newren
2018-11-14 19:17 ` SZEDER Gábor
2018-11-14 23:13 ` Elijah Newren
2018-11-14 0:25 ` [PATCH v2 05/11] fast-export: move commit rewriting logic into a function for reuse Elijah Newren
2018-11-14 0:25 ` [PATCH v2 06/11] fast-export: when using paths, avoid corrupt stream with non-existent mark Elijah Newren
2018-11-14 0:25 ` [PATCH v2 07/11] fast-export: ensure we export requested refs Elijah Newren
2018-11-14 0:25 ` [PATCH v2 08/11] fast-export: add --reference-excluded-parents option Elijah Newren
2018-11-14 19:27 ` SZEDER Gábor
2018-11-14 23:16 ` Elijah Newren
2018-11-14 0:25 ` [PATCH v2 09/11] fast-import: remove unmaintained duplicate documentation Elijah Newren
2018-11-14 0:25 ` [PATCH v2 10/11] fast-export: add a --show-original-ids option to show original names Elijah Newren
2018-11-14 0:26 ` Elijah Newren [this message]
2018-11-14 7:25 ` [PATCH v2 00/11] fast export and import fixes and features Jeff King
2018-11-16 7:59 ` [PATCH v3 " Elijah Newren
2018-11-16 7:59 ` [PATCH v3 01/11] fast-export: convert sha1 to oid Elijah Newren
2018-11-16 7:59 ` [PATCH v3 02/11] git-fast-import.txt: fix documentation for --quiet option Elijah Newren
2018-11-16 7:59 ` [PATCH v3 03/11] git-fast-export.txt: clarify misleading documentation about rev-list args Elijah Newren
2018-11-16 7:59 ` [PATCH v3 04/11] fast-export: use value from correct enum Elijah Newren
2018-11-16 7:59 ` [PATCH v3 05/11] fast-export: avoid dying when filtering by paths and old tags exist Elijah Newren
2018-11-16 7:59 ` [PATCH v3 06/11] fast-export: move commit rewriting logic into a function for reuse Elijah Newren
2018-11-16 7:59 ` [PATCH v3 07/11] fast-export: when using paths, avoid corrupt stream with non-existent mark Elijah Newren
2018-11-16 7:59 ` [PATCH v3 08/11] fast-export: ensure we export requested refs Elijah Newren
2018-11-16 7:59 ` [PATCH v3 09/11] fast-export: add --reference-excluded-parents option Elijah Newren
2018-11-16 7:59 ` [PATCH v3 10/11] fast-import: remove unmaintained duplicate documentation Elijah Newren
2018-11-16 7:59 ` [PATCH v3 11/11] fast-export: add a --show-original-ids option to show original names Elijah Newren
2018-11-16 12:29 ` SZEDER Gábor
2018-11-16 8:50 ` [PATCH v3 00/11] fast export and import fixes and features Jeff King
2018-11-12 9:17 ` Import/Export as a fast way to purge files from Git? Ævar Arnfjörð Bjarmason
2018-11-12 15:34 ` Elijah Newren
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20181114002600.29233-12-newren@gmail.com \
--to=newren@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=jrnieder@gmail.com \
--cc=larsxschneider@gmail.com \
--cc=me@ttaylorr.com \
--cc=peff@peff.net \
--cc=sandals@crustytoothpaste.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).