From: Elijah Newren <newren@gmail.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org, "Eric Sunshine" <sunshine@sunshineco.com>,
"Johannes Schindelin" <Johannes.Schindelin@gmx.de>,
"Johannes Sixt" <j6t@kdbg.org>,
"Torsten Bögershausen" <tboegi@web.de>,
"Elijah Newren" <newren@gmail.com>
Subject: [PATCH v6 5/5] fast-export: do automatic reencoding of commit messages only if requested
Date: Mon, 13 May 2019 21:31:02 -0700 [thread overview]
Message-ID: <20190514043102.31979-6-newren@gmail.com> (raw)
In-Reply-To: <20190514043102.31979-1-newren@gmail.com>
Automatic re-encoding of commit messages (and dropping of the encoding
header) hurts attempts to do reversible history rewrites (e.g. sha1sum
<-> sha256sum transitions, some subtree rewrites), and seems
inconsistent with the general principle followed elsewhere in
fast-export of requiring explicit user requests to modify the output
(e.g. --signed-tags=strip, --tag-of-filtered-object=rewrite). Add a
--reencode flag that the user can use to specify, and like other
fast-export flags, default it to 'abort'.
Signed-off-by: Elijah Newren <newren@gmail.com>
---
Documentation/git-fast-export.txt | 7 +++++
builtin/fast-export.c | 46 +++++++++++++++++++++++++++++--
t/t9350-fast-export.sh | 38 +++++++++++++++++++++++--
3 files changed, 85 insertions(+), 6 deletions(-)
diff --git a/Documentation/git-fast-export.txt b/Documentation/git-fast-export.txt
index 64c01ba918..11427acdde 100644
--- a/Documentation/git-fast-export.txt
+++ b/Documentation/git-fast-export.txt
@@ -129,6 +129,13 @@ marks the same across runs.
for intermediary filters (e.g. for rewriting commit messages
which refer to older commits, or for stripping blobs by id).
+--reencode=(yes|no|abort)::
+ Specify how to handle `encoding` header in commit objects. When
+ asking to 'abort' (which is the default), this program will die
+ when encountering such a commit object. With 'yes', the commit
+ message will be reencoded into UTF-8. With 'no', the original
+ encoding will be preserved.
+
--refspec::
Apply the specified refspec to each ref exported. Multiple of them can
be specified.
diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index 66331fa401..c22cef3b2f 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -33,6 +33,7 @@ static const char *fast_export_usage[] = {
static int progress;
static enum { SIGNED_TAG_ABORT, VERBATIM, WARN, WARN_STRIP, STRIP } signed_tag_mode = SIGNED_TAG_ABORT;
static enum { TAG_FILTERING_ABORT, DROP, REWRITE } tag_of_filtered_mode = TAG_FILTERING_ABORT;
+static enum { REENCODE_ABORT, REENCODE_YES, REENCODE_NO } reencode_mode = REENCODE_ABORT;
static int fake_missing_tagger;
static int use_done_feature;
static int no_data;
@@ -77,6 +78,31 @@ static int parse_opt_tag_of_filtered_mode(const struct option *opt,
return 0;
}
+static int parse_opt_reencode_mode(const struct option *opt,
+ const char *arg, int unset)
+{
+ if (unset) {
+ reencode_mode = REENCODE_ABORT;
+ return 0;
+ }
+
+ switch (git_parse_maybe_bool(arg)) {
+ case 0:
+ reencode_mode = REENCODE_NO;
+ break;
+ case 1:
+ reencode_mode = REENCODE_YES;
+ break;
+ default:
+ if (!strcasecmp(arg, "abort"))
+ reencode_mode = REENCODE_ABORT;
+ else
+ return error("Unknown reencoding mode: %s", arg);
+ }
+
+ return 0;
+}
+
static struct decoration idnums;
static uint32_t last_idnum;
@@ -633,10 +659,21 @@ static void handle_commit(struct commit *commit, struct rev_info *rev,
}
mark_next_object(&commit->object);
- if (anonymize)
+ if (anonymize) {
reencoded = anonymize_commit_message(message);
- else if (!is_encoding_utf8(encoding))
- reencoded = reencode_string(message, "UTF-8", encoding);
+ } else if (encoding) {
+ switch(reencode_mode) {
+ case REENCODE_YES:
+ reencoded = reencode_string(message, "UTF-8", encoding);
+ break;
+ case REENCODE_NO:
+ break;
+ case REENCODE_ABORT:
+ die("Encountered commit-specific encoding %s in commit "
+ "%s; use --reencode=[yes|no] to handle it",
+ encoding, oid_to_hex(&commit->object.oid));
+ }
+ }
if (!commit->parents)
printf("reset %s\n", refname);
printf("commit %s\nmark :%"PRIu32"\n", refname, last_idnum);
@@ -1091,6 +1128,9 @@ int cmd_fast_export(int argc, const char **argv, const char *prefix)
OPT_CALLBACK(0, "tag-of-filtered-object", &tag_of_filtered_mode, N_("mode"),
N_("select handling of tags that tag filtered objects"),
parse_opt_tag_of_filtered_mode),
+ OPT_CALLBACK(0, "reencode", &reencode_mode, N_("mode"),
+ N_("select handling of commit messages in an alternate encoding"),
+ parse_opt_reencode_mode),
OPT_STRING(0, "export-marks", &export_filename, N_("file"),
N_("Dump marks to this file")),
OPT_STRING(0, "import-marks", &import_filename, N_("file"),
diff --git a/t/t9350-fast-export.sh b/t/t9350-fast-export.sh
index e2ab8eddc0..b4004e05c2 100755
--- a/t/t9350-fast-export.sh
+++ b/t/t9350-fast-export.sh
@@ -94,14 +94,14 @@ test_expect_success 'fast-export --show-original-ids | git fast-import' '
test $MUSS = $(git rev-parse --verify refs/tags/muss)
'
-test_expect_success 'iso-8859-7' '
+test_expect_success 'reencoding iso-8859-7' '
test_when_finished "git reset --hard HEAD~1" &&
test_config i18n.commitencoding iso-8859-7 &&
test_tick &&
echo rosten >file &&
git commit -s -F "$TEST_DIRECTORY/t9350/simple-iso-8859-7-commit-message.txt" file &&
- git fast-export wer^..wer >iso-8859-7.fi &&
+ git fast-export --reencode=yes wer^..wer >iso-8859-7.fi &&
sed "s/wer/i18n/" iso-8859-7.fi |
(cd new &&
git fast-import &&
@@ -118,13 +118,45 @@ test_expect_success 'iso-8859-7' '
! grep ^encoding actual)
'
+test_expect_success 'aborting on iso-8859-7' '
+
+ test_when_finished "git reset --hard HEAD~1" &&
+ test_config i18n.commitencoding iso-8859-7 &&
+ echo rosten >file &&
+ git commit -s -F "$TEST_DIRECTORY/t9350/simple-iso-8859-7-commit-message.txt" file &&
+ test_must_fail git fast-export --reencode=abort wer^..wer >iso-8859-7.fi
+'
+
+test_expect_success 'preserving iso-8859-7' '
+
+ test_when_finished "git reset --hard HEAD~1" &&
+ test_config i18n.commitencoding iso-8859-7 &&
+ echo rosten >file &&
+ git commit -s -F "$TEST_DIRECTORY/t9350/simple-iso-8859-7-commit-message.txt" file &&
+ git fast-export --reencode=no wer^..wer >iso-8859-7.fi &&
+ sed "s/wer/i18n-no-recoding/" iso-8859-7.fi |
+ (cd new &&
+ git fast-import &&
+ # The commit object, if not re-encoded, is 240 bytes.
+ # Removing the "encoding iso-8859-7\n" header would drops 20
+ # bytes. Re-encoding the Pi character from \xF0 (\360) in
+ # iso-8859-7 to \xCF\x80 (\317\200) in UTF-8 adds a byte.
+ # Check for the expected size...
+ test 240 -eq "$(git cat-file -s i18n-no-recoding)" &&
+ # ...as well as the expected byte.
+ git cat-file commit i18n-no-recoding >actual &&
+ grep $(printf "\360") actual &&
+ # Also make sure the commit has the "encoding" header
+ grep ^encoding actual)
+'
+
test_expect_success 'encoding preserved if reencoding fails' '
test_when_finished "git reset --hard HEAD~1" &&
test_config i18n.commitencoding iso-8859-7 &&
echo rosten >file &&
git commit -s -F "$TEST_DIRECTORY/t9350/broken-iso-8859-7-commit-message.txt" file &&
- git fast-export wer^..wer >iso-8859-7.fi &&
+ git fast-export --reencode=yes wer^..wer >iso-8859-7.fi &&
sed "s/wer/i18n-invalid/" iso-8859-7.fi |
(cd new &&
git fast-import &&
--
2.21.0.782.gd18f03d1bf
next prev parent reply other threads:[~2019-05-14 4:31 UTC|newest]
Thread overview: 45+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-04-30 18:25 [PATCH v2 0/5] Fix and extend encoding handling in fast export/import Elijah Newren
2019-04-30 18:25 ` [PATCH v2 1/5] t9350: fix encoding test to actually test reencoding Elijah Newren
2019-04-30 18:25 ` [PATCH v2 2/5] fast-import: support 'encoding' commit header Elijah Newren
2019-04-30 18:25 ` [PATCH v2 3/5] fast-export: avoid stripping encoding header if we cannot reencode Elijah Newren
2019-04-30 18:25 ` [PATCH v2 4/5] fast-export: differentiate between explicitly utf-8 and implicitly utf-8 Elijah Newren
2019-04-30 18:25 ` [PATCH v2 5/5] fast-export: do automatic reencoding of commit messages only if requested Elijah Newren
2019-05-10 20:53 ` [PATCH v3 0/5] Fix and extend encoding handling in fast export/import Elijah Newren
2019-05-10 20:53 ` [PATCH v3 1/5] t9350: fix encoding test to actually test reencoding Elijah Newren
2019-05-10 20:53 ` [PATCH v3 2/5] fast-import: support 'encoding' commit header Elijah Newren
2019-05-10 20:53 ` [PATCH v3 3/5] fast-export: avoid stripping encoding header if we cannot reencode Elijah Newren
2019-05-10 20:53 ` [PATCH v3 4/5] fast-export: differentiate between explicitly utf-8 and implicitly utf-8 Elijah Newren
2019-05-10 20:53 ` [PATCH v3 5/5] fast-export: do automatic reencoding of commit messages only if requested Elijah Newren
2019-05-11 21:07 ` Torsten Bögershausen
2019-05-11 21:42 ` Elijah Newren
2019-05-13 7:48 ` Junio C Hamano
2019-05-13 13:24 ` Elijah Newren
2019-05-13 10:23 ` Johannes Schindelin
2019-05-13 12:56 ` Torsten Bögershausen
2019-05-13 13:29 ` Elijah Newren
2019-05-13 16:41 ` Elijah Newren
2019-05-13 10:14 ` [PATCH v3 0/5] Fix and extend encoding handling in fast export/import Johannes Schindelin
2019-05-13 16:47 ` [PATCH v4 " Elijah Newren
2019-05-13 16:47 ` [PATCH v4 1/5] t9350: fix encoding test to actually test reencoding Elijah Newren
2019-05-13 16:47 ` [PATCH v4 2/5] fast-import: support 'encoding' commit header Elijah Newren
2019-05-13 16:47 ` [PATCH v4 3/5] fast-export: avoid stripping encoding header if we cannot reencode Elijah Newren
2019-05-13 16:47 ` [PATCH v4 4/5] fast-export: differentiate between explicitly utf-8 and implicitly utf-8 Elijah Newren
2019-05-13 16:47 ` [PATCH v4 5/5] fast-export: do automatic reencoding of commit messages only if requested Elijah Newren
2019-05-13 22:32 ` Junio C Hamano
2019-05-13 23:17 ` [PATCH v5 0/5] Fix and extend encoding handling in fast export/import Elijah Newren
2019-05-13 23:17 ` [PATCH v5 1/5] t9350: fix encoding test to actually test reencoding Elijah Newren
2019-05-14 2:50 ` Torsten Bögershausen
2019-05-13 23:17 ` [PATCH v5 2/5] fast-import: support 'encoding' commit header Elijah Newren
2019-05-13 23:17 ` [PATCH v5 3/5] fast-export: avoid stripping encoding header if we cannot reencode Elijah Newren
2019-05-14 2:56 ` Torsten Bögershausen
2019-05-13 23:17 ` [PATCH v5 4/5] fast-export: differentiate between explicitly utf-8 and implicitly utf-8 Elijah Newren
2019-05-14 3:01 ` Torsten Bögershausen
2019-05-13 23:17 ` [PATCH v5 5/5] fast-export: do automatic reencoding of commit messages only if requested Elijah Newren
2019-05-14 0:19 ` Eric Sunshine
2019-05-14 4:30 ` [PATCH v6 0/5] Fix and extend encoding handling in fast export/import Elijah Newren
2019-05-14 4:30 ` [PATCH v6 1/5] t9350: fix encoding test to actually test reencoding Elijah Newren
2019-05-14 4:30 ` [PATCH v6 2/5] fast-import: support 'encoding' commit header Elijah Newren
2019-05-14 4:31 ` [PATCH v6 3/5] fast-export: avoid stripping encoding header if we cannot reencode Elijah Newren
2019-05-14 4:31 ` [PATCH v6 4/5] fast-export: differentiate between explicitly UTF-8 and implicitly UTF-8 Elijah Newren
2019-05-14 4:31 ` Elijah Newren [this message]
2019-05-16 18:15 ` [PATCH v6 0/5] Fix and extend encoding handling in fast export/import Torsten Bögershausen
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20190514043102.31979-6-newren@gmail.com \
--to=newren@gmail.com \
--cc=Johannes.Schindelin@gmx.de \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=j6t@kdbg.org \
--cc=sunshine@sunshineco.com \
--cc=tboegi@web.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.