* [PATCH] fast-import docs: LT is valid in email, GT is not @ 2010-04-24 0:45 Mark Lodato 2010-04-24 16:06 ` [PATCH] fsck: check ident lines in commit objects Jonathan Nieder 2010-04-24 16:12 ` [PATCH] fast-import docs: LT is valid in email, GT is not Jonathan Nieder 0 siblings, 2 replies; 13+ messages in thread From: Mark Lodato @ 2010-04-24 0:45 UTC (permalink / raw) To: git; +Cc: Mark Lodato In git-fast-import(1), fix a mistake that said LT and LF were the invalid characters in email addresses. This should have been GT and LF, since the GT ends the email address and LF ends the command. Signed-off-by: Mark Lodato <lodatom@gmail.com> --- Documentation/git-fast-import.txt | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/Documentation/git-fast-import.txt b/Documentation/git-fast-import.txt index 19082b0..6dcd583 100644 --- a/Documentation/git-fast-import.txt +++ b/Documentation/git-fast-import.txt @@ -394,7 +394,7 @@ Here `<name>` is the person's display name (for example and greater-than (\x3e) symbols. These are required to delimit the email address from the other fields in the line. Note that `<name>` is free-form and may contain any sequence of bytes, except -`LT` and `LF`. It is typically UTF-8 encoded. +`GT` and `LF`. It is typically UTF-8 encoded. The time of the change is specified by `<when>` using the date format that was selected by the \--date-format=<fmt> command line option. -- 1.7.0.2 ^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH] fsck: check ident lines in commit objects 2010-04-24 0:45 [PATCH] fast-import docs: LT is valid in email, GT is not Mark Lodato @ 2010-04-24 16:06 ` Jonathan Nieder 2010-04-24 16:59 ` Jonathan Nieder 2010-04-24 19:04 ` Shawn O. Pearce 2010-04-24 16:12 ` [PATCH] fast-import docs: LT is valid in email, GT is not Jonathan Nieder 1 sibling, 2 replies; 13+ messages in thread From: Jonathan Nieder @ 2010-04-24 16:06 UTC (permalink / raw) To: Mark Lodato; +Cc: git, Shawn O. Pearce, Nicolas Pitre Check that email addresses do not contain <, >, or newline so they can be quickly scanned without trouble. The copy() function in ident.c already ensures that ordinary git commands will not write email addresses without this property. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> --- Thoughts? Should some of these errors be warnings? git fast-import is capable of producing commits with some of these problems: for example, it is fine with committer C O Mitter <foo@b>ar.net> 005 - +5 fsck.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++ t/t1450-fsck.sh | 25 +++++++++++++++++++++++++ 2 files changed, 72 insertions(+), 0 deletions(-) diff --git a/fsck.c b/fsck.c index 89278c1..ae9ae1a 100644 --- a/fsck.c +++ b/fsck.c @@ -222,12 +222,47 @@ static int fsck_tree(struct tree *item, int strict, fsck_error error_func) return retval; } +static int fsck_ident(char **ident, struct object *obj, fsck_error error_func) +{ + if (**ident == '<' || **ident == '\n') + return error_func(obj, FSCK_ERROR, "invalid author/committer line - missing space before email"); + *ident += strcspn(*ident, "<\n"); + if ((*ident)[-1] != ' ') + return error_func(obj, FSCK_ERROR, "invalid author/committer line - missing space before email"); + if (**ident != '<') + return error_func(obj, FSCK_ERROR, "invalid author/committer line - missing email"); + (*ident)++; + *ident += strcspn(*ident, "<>\n"); + if (**ident != '>') + return error_func(obj, FSCK_ERROR, "invalid author/committer line - bad email"); + (*ident)++; + if (**ident != ' ') + return error_func(obj, FSCK_ERROR, "invalid author/committer line - missing space before date"); + (*ident)++; + if (**ident == '0' && (*ident)[1] != ' ') + return error_func(obj, FSCK_ERROR, "invalid author/committer line - zero-padded date"); + *ident += strspn(*ident, "0123456789"); + if (**ident != ' ') + return error_func(obj, FSCK_ERROR, "invalid author/committer line - bad date"); + (*ident)++; + if ((**ident != '+' && **ident != '-') || + !isdigit((*ident)[1]) || + !isdigit((*ident)[2]) || + !isdigit((*ident)[3]) || + !isdigit((*ident)[4]) || + ((*ident)[5] != '\n')) + return error_func(obj, FSCK_ERROR, "invalid author/committer line - bad time zone"); + (*ident) += 6; + return 0; +} + static int fsck_commit(struct commit *commit, fsck_error error_func) { char *buffer = commit->buffer; unsigned char tree_sha1[20], sha1[20]; struct commit_graft *graft; int parents = 0; + int err; if (commit->date == ULONG_MAX) return error_func(&commit->object, FSCK_ERROR, "invalid author/committer line"); @@ -266,6 +301,18 @@ static int fsck_commit(struct commit *commit, fsck_error error_func) } if (memcmp(buffer, "author ", 7)) return error_func(&commit->object, FSCK_ERROR, "invalid format - expected 'author' line"); + buffer += 7; + err = fsck_ident(&buffer, &commit->object, error_func); + if (err) + return err; + if (memcmp(buffer, "committer ", strlen("committer "))) + return error_func(&commit->object, FSCK_ERROR, "invalid format - expected 'committer' line"); + buffer += strlen("committer "); + err = fsck_ident(&buffer, &commit->object, error_func); + if (err) + return err; + if (*buffer != '\n') + return error_func(&commit->object, FSCK_ERROR, "invalid format - expected blank line"); if (!commit->tree) return error_func(&commit->object, FSCK_ERROR, "could not load commit's tree %s", sha1_to_hex(tree_sha1)); diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh index 49cae3e..d8eed9b 100755 --- a/t/t1450-fsck.sh +++ b/t/t1450-fsck.sh @@ -57,6 +57,31 @@ test_expect_success 'branch pointing to non-commit' ' git update-ref -d refs/heads/invalid ' +new=nothing +test_expect_success 'email without @ is okay' ' + git cat-file commit HEAD >basis && + sed "s/@/AT/" basis >okay && + new=$(git hash-object -t commit -w --stdin <okay) && + echo "$new" && + git update-ref refs/heads/bogus "$new" && + git fsck +' +git update-ref -d refs/heads/bogus +rm -f ".git/objects/$new" + +new=nothing +test_expect_success 'email with embedded > is not okay' ' + git cat-file commit HEAD >basis && + sed "s/@[a-z]/&>/" basis >bad-email && + new=$(git hash-object -t commit -w --stdin <bad-email) && + echo "$new" && + git update-ref refs/heads/bogus "$new" && + git fsck 2>out && + grep "error in commit $new" out +' +git update-ref -d refs/heads/bogus +rm -f ".git/objects/$new" + cat > invalid-tag <<EOF object ffffffffffffffffffffffffffffffffffffffff type commit -- 1.7.0.6.2.g02f3f0.dirty ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH] fsck: check ident lines in commit objects 2010-04-24 16:06 ` [PATCH] fsck: check ident lines in commit objects Jonathan Nieder @ 2010-04-24 16:59 ` Jonathan Nieder 2010-04-24 19:04 ` Shawn O. Pearce 1 sibling, 0 replies; 13+ messages in thread From: Jonathan Nieder @ 2010-04-24 16:59 UTC (permalink / raw) To: Mark Lodato; +Cc: git, Shawn O. Pearce, Nicolas Pitre Jonathan Nieder wrote: > Check that email addresses do not contain <, >, or newline so they can > be quickly scanned without trouble. Test was bogus: the object format tests in fsck do not affect its exit code. Here’s a fixup. Sorry for the trouble, Jonathan diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh index d8eed9b..22a80c8 100755 --- a/t/t1450-fsck.sh +++ b/t/t1450-fsck.sh @@ -64,7 +64,9 @@ test_expect_success 'email without @ is okay' ' new=$(git hash-object -t commit -w --stdin <okay) && echo "$new" && git update-ref refs/heads/bogus "$new" && - git fsck + git fsck 2>out && + cat out && + ! grep "error in commit $new" out ' git update-ref -d refs/heads/bogus rm -f ".git/objects/$new" @@ -77,6 +79,7 @@ test_expect_success 'email with embedded > is not okay' ' echo "$new" && git update-ref refs/heads/bogus "$new" && git fsck 2>out && + cat out && grep "error in commit $new" out ' git update-ref -d refs/heads/bogus ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH] fsck: check ident lines in commit objects 2010-04-24 16:06 ` [PATCH] fsck: check ident lines in commit objects Jonathan Nieder 2010-04-24 16:59 ` Jonathan Nieder @ 2010-04-24 19:04 ` Shawn O. Pearce 2010-04-24 20:38 ` [PATCH 0/2] fast-import: tighten up parsing ident line Jonathan Nieder 1 sibling, 1 reply; 13+ messages in thread From: Shawn O. Pearce @ 2010-04-24 19:04 UTC (permalink / raw) To: Jonathan Nieder; +Cc: Mark Lodato, git, Nicolas Pitre Jonathan Nieder <jrnieder@gmail.com> wrote: > Check that email addresses do not contain <, >, or newline so they can > be quickly scanned without trouble. The copy() function in ident.c > already ensures that ordinary git commands will not write email > addresses without this property. > > Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> > --- > Thoughts? Should some of these errors be warnings? These should be errors. We should never see this sort of thing occur in a live repository. > git fast-import is capable of producing commits with some of these > problems: for example, it is fine with > > committer C O Mitter <foo@b>ar.net> 005 - +5 Yuck. We probably should tighten up the parser in fast-import a bit more. The above is pretty insane for it to produce into the repository. I can't even begin to count how many ways the above line is just wrong... :-) -- Shawn. ^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH 0/2] fast-import: tighten up parsing ident line 2010-04-24 19:04 ` Shawn O. Pearce @ 2010-04-24 20:38 ` Jonathan Nieder 2010-04-24 20:50 ` [PATCH 1/2] fast-import: be strict about formatting of raw dates Jonathan Nieder 2010-04-24 21:10 ` [PATCH 2/2] fast-import: validate entire ident string Jonathan Nieder 0 siblings, 2 replies; 13+ messages in thread From: Jonathan Nieder @ 2010-04-24 20:38 UTC (permalink / raw) To: Shawn O. Pearce; +Cc: Mark Lodato, git, Nicolas Pitre Shawn O. Pearce wrote: > Jonathan Nieder <jrnieder@gmail.com> wrote: >> git fast-import is capable of producing commits with some of these >> problems: for example, it is fine with >> >> committer C O Mitter <foo@b>ar.net> 005 - +5 > > Yuck. We probably should tighten up the parser in fast-import a > bit more. How about this? Jonathan Nieder (2): fast-import: be strict about formatting of dates fast-import: validate entire ident string Documentation/git-fast-import.txt | 9 ++-- fast-import.c | 63 ++++++++++++++-------- t/t9300-fast-import.sh | 105 +++++++++++++++++++++++++++++++++++++ 3 files changed, 150 insertions(+), 27 deletions(-) ^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH 1/2] fast-import: be strict about formatting of raw dates 2010-04-24 20:38 ` [PATCH 0/2] fast-import: tighten up parsing ident line Jonathan Nieder @ 2010-04-24 20:50 ` Jonathan Nieder 2010-04-24 21:10 ` [PATCH 2/2] fast-import: validate entire ident string Jonathan Nieder 1 sibling, 0 replies; 13+ messages in thread From: Jonathan Nieder @ 2010-04-24 20:50 UTC (permalink / raw) To: Shawn O. Pearce; +Cc: Mark Lodato, git, Nicolas Pitre git proper never zero- or space-pads its dates and its time zones are always 4 digits long. Require fast-import front-ends to behave likewise to avoid generating puzzling objects. Since this makes the input format more strict, some front-ends may not be happy. But they were warned: This is the Git native format and is <time> SP <offutc>. It is also fast-import’s default format, if --date-format was not specified. Unlike the rfc2822 format, this format is very strict. Any variation in formatting will cause fast-import to reject the value. Aside from ensuring the format is predictable so tools like git can handle it, making the date format this strict ensures that there is only one valid representation for a given date and time zone, which would be useful for round-trip conversion of objects to and from other formats (for storage by other version control systems, for example). Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> --- Is -0000 the same time zone as +0000? I wasn’t sure so I erred on the side of not worrying about it. fast-import.c | 9 +++++++-- t/t9300-fast-import.sh | 30 ++++++++++++++++++++++++++++++ 2 files changed, 37 insertions(+), 2 deletions(-) diff --git a/fast-import.c b/fast-import.c index 74f08bd..1701cf1 100644 --- a/fast-import.c +++ b/fast-import.c @@ -1906,6 +1906,8 @@ static int validate_raw_date(const char *src, char *result, int maxlen) errno = 0; + if ((*src == '0' && isdigit(src[1])) || !isdigit(*src)) + return -1; num = strtoul(src, &endp, 10); /* NEEDSWORK: perhaps check for reasonable values? */ if (errno || endp == src || *endp != ' ') @@ -1915,8 +1917,11 @@ static int validate_raw_date(const char *src, char *result, int maxlen) if (*src != '-' && *src != '+') return -1; - num = strtoul(src + 1, &endp, 10); - if (errno || endp == src + 1 || *endp || (endp - orig_src) >= maxlen || + src++; + if (*src != '0' && *src != '1') + return -1; + num = strtoul(src, &endp, 10); + if (errno || endp != src + 4 || *endp || (endp - orig_src) >= maxlen || 1400 < num) return -1; diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh index 131f032..ed653a7 100755 --- a/t/t9300-fast-import.sh +++ b/t/t9300-fast-import.sh @@ -348,6 +348,36 @@ test_expect_success \ cat >input <<INPUT_END commit refs/heads/branch +author $GIT_AUTHOR_NAME <$GIT_AUTHOR_EMAIL> 1170783301 - 0500 +committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> 117078330 -0500 +data <<COMMIT +Malformed time zone +COMMIT + +from refs/heads/branch^0 + +INPUT_END +test_expect_success 'E: blanks in raw time zone' ' + test_must_fail git fast-import --date-format=raw <input +' + +cat >input <<INPUT_END +commit refs/heads/branch +author $GIT_AUTHOR_NAME <$GIT_AUTHOR_EMAIL> 01170783301 -0500 +committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> 117078330 -0500 +data <<COMMIT +Malformed date +COMMIT + +from refs/heads/branch^0 + +INPUT_END +test_expect_success 'E: leading zero in raw date' ' + test_must_fail git fast-import --date-format=raw <input +' + +cat >input <<INPUT_END +commit refs/heads/branch author $GIT_AUTHOR_NAME <$GIT_AUTHOR_EMAIL> Tue Feb 6 11:22:18 2007 -0500 committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> Tue Feb 6 12:35:02 2007 -0500 data <<COMMIT -- 1.7.1.rc1 ^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH 2/2] fast-import: validate entire ident string 2010-04-24 20:38 ` [PATCH 0/2] fast-import: tighten up parsing ident line Jonathan Nieder 2010-04-24 20:50 ` [PATCH 1/2] fast-import: be strict about formatting of raw dates Jonathan Nieder @ 2010-04-24 21:10 ` Jonathan Nieder 2010-04-26 16:02 ` Shawn O. Pearce 1 sibling, 1 reply; 13+ messages in thread From: Jonathan Nieder @ 2010-04-24 21:10 UTC (permalink / raw) To: Shawn O. Pearce; +Cc: Mark Lodato, git, Nicolas Pitre The author, committer, and tagger name and email should not include any embedded <, >, or newline characters. The format of the identification string is ('author'|'committer'|'tagger') sp name sp < email > sp date If an object has no name attached, then git expects to find two spaces in a row. Helped-by: Mark Lodato <lodatom@gmail.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> --- For malformed input, the parser in pretty.c and ‘git commit --amend’ tend to end up with different ideas of who the author is. A lot of the time, commit --amend gives up with "fatal: invalid commit". Documentation/git-fast-import.txt | 9 ++-- fast-import.c | 54 ++++++++++++++++---------- t/t9300-fast-import.sh | 75 +++++++++++++++++++++++++++++++++++++ 3 files changed, 113 insertions(+), 25 deletions(-) diff --git a/Documentation/git-fast-import.txt b/Documentation/git-fast-import.txt index 19082b0..ee725c6 100644 --- a/Documentation/git-fast-import.txt +++ b/Documentation/git-fast-import.txt @@ -337,8 +337,8 @@ change to the project. .... 'commit' SP <ref> LF mark? - ('author' (SP <name>)? SP LT <email> GT SP <when> LF)? - 'committer' (SP <name>)? SP LT <email> GT SP <when> LF + ('author' SP <name>? SP LT <email> GT SP <when> LF)? + 'committer' SP <name>? SP LT <email> GT SP <when> LF data ('from' SP <committish> LF)? ('merge' SP <committish> LF)? @@ -393,8 +393,9 @@ Here `<name>` is the person's display name (for example (``cm@example.com''). `LT` and `GT` are the literal less-than (\x3c) and greater-than (\x3e) symbols. These are required to delimit the email address from the other fields in the line. Note that -`<name>` is free-form and may contain any sequence of bytes, except -`LT` and `LF`. It is typically UTF-8 encoded. +`<name>` and `<email>` are free-form and may contain any sequence +of bytes that are not `LT`, `GT`, or `LF`. Both are typically UTF-8 +encoded. The time of the change is specified by `<when>` using the date format that was selected by the \--date-format=<fmt> command line option. diff --git a/fast-import.c b/fast-import.c index 1701cf1..d919168 100644 --- a/fast-import.c +++ b/fast-import.c @@ -19,8 +19,8 @@ Format of STDIN stream: new_commit ::= 'commit' sp ref_str lf mark? - ('author' (sp name)? sp '<' email '>' sp when lf)? - 'committer' (sp name)? sp '<' email '>' sp when lf + ('author' sp name? sp '<' email '>' sp when lf)? + 'committer' sp name? sp '<' email '>' sp when lf commit_msg ('from' sp committish lf)? ('merge' sp committish lf)* @@ -47,7 +47,7 @@ Format of STDIN stream: new_tag ::= 'tag' sp tag_str lf 'from' sp committish lf - ('tagger' (sp name)? sp '<' email '>' sp when lf)? + ('tagger' sp name? sp '<' email '>' sp when lf)? tag_msg; tag_msg ::= data; @@ -123,9 +123,8 @@ Format of STDIN stream: sha1exp ::= # Any valid GIT SHA1 expression; hexsha1 ::= # SHA1 in hexadecimal format; - # note: name and email are UTF8 strings, however name must not - # contain '<' or lf and email must not contain any of the - # following: '<', '>', lf. + # note: name and email are UTF8 strings, however name and email + # must not contain any of the following: '<', '>', lf. # name ::= # valid GIT author/committer name; email ::= # valid GIT author/committer email; @@ -1929,34 +1928,47 @@ static int validate_raw_date(const char *src, char *result, int maxlen) return 0; } -static char *parse_ident(const char *buf) +static size_t parse_name_and_email(const char *src, char **result, size_t extra) { - const char *gt; + const char *lt, *gt; size_t name_len; - char *ident; - gt = strrchr(buf, '>'); - if (!gt) - die("Missing > in ident string: %s", buf); + lt = src + strcspn(src, "<>\n"); + if (lt == src || lt[-1] != ' ' || *lt != '<') + die("Invalid name in ident string: %s", src); + gt = lt + 1 + strcspn(lt + 1, "<>\n"); + if (*gt != '>') + die("Invalid email in ident string: %s", src); gt++; if (*gt != ' ') - die("Missing space after > in ident string: %s", buf); + die("Missing space after > in ident string: %s", src); gt++; - name_len = gt - buf; - ident = xmalloc(name_len + 24); - strncpy(ident, buf, name_len); + name_len = gt - src; + *result = xmalloc(name_len + extra); + memcpy(*result, src, name_len); + return name_len; +} + +static char *parse_ident(const char *buf) +{ + const char *date; + size_t name_len; + char *ident; + + name_len = parse_name_and_email(buf, &ident, 24); + date = buf + name_len; switch (whenspec) { case WHENSPEC_RAW: - if (validate_raw_date(gt, ident + name_len, 24) < 0) - die("Invalid raw date \"%s\" in ident: %s", gt, buf); + if (validate_raw_date(date, ident + name_len, 24) < 0) + die("Invalid raw date \"%s\" in ident: %s", date, buf); break; case WHENSPEC_RFC2822: - if (parse_date(gt, ident + name_len, 24) < 0) - die("Invalid rfc2822 date \"%s\" in ident: %s", gt, buf); + if (parse_date(date, ident + name_len, 24) < 0) + die("Invalid rfc2822 date \"%s\" in ident: %s", date, buf); break; case WHENSPEC_NOW: - if (strcmp("now", gt)) + if (strcmp("now", date)) die("Date in ident must be 'now': %s", buf); datestamp(ident + name_len, 24); break; diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh index ed653a7..a7e379f 100755 --- a/t/t9300-fast-import.sh +++ b/t/t9300-fast-import.sh @@ -348,6 +348,81 @@ test_expect_success \ cat >input <<INPUT_END commit refs/heads/branch +author <$GIT_AUTHOR_EMAIL> Sat, 24 Apr 2010 14:49:52 -0500 +committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> Tue Feb 6 12:35:01 2007 -0500 +data <<COMMIT +Nameless author, first attempt +COMMIT + +from refs/heads/branch^0 + +INPUT_END +test_expect_success 'E: require space after author name' ' + test_must_fail git fast-import --date-format=rfc2822 <input +' + +cat >input <<INPUT_END +commit refs/heads/branch +author <$GIT_AUTHOR_EMAIL> Sat, 24 Apr 2010 14:49:52 -0500 +committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> Tue Feb 6 12:35:01 2007 -0500 +data <<COMMIT +Nameless author +COMMIT + +from refs/heads/branch^0 + +INPUT_END +test_expect_success 'E: do not require author name, though' ' + git fast-import --date-format=rfc2822 <input +' + +cat >input <<INPUT_END +commit refs/heads/branch +author $GIT_AUTHOR_NAME <$GIT_AUTHOR_EMAIL> Sat, 24 Apr 2010 14:49:52 -0500 +committer C O >Mitter <$GIT_COMMITTER_EMAIL> Tue Feb 6 12:35:01 2007 -0500 +data <<COMMIT +Odd committer +COMMIT + +from refs/heads/branch^0 + +INPUT_END +test_expect_success 'E: unparsable committer' ' + test_must_fail git fast-import --date-format=rfc2822 <input +' + +cat >input <<INPUT_END +commit refs/heads/branch +author $GIT_AUTHOR_NAME <$GIT_AUTHOR_EMAIL> Sat, 24 Apr 2010 15:05:27 -0500 +committer $GIT_COMMITTER_NAME <aggh@<example.com> Tue Feb 6 12:35:01 2007 -0500 +data <<COMMIT +Odd email +COMMIT + +from refs/heads/branch^0 + +INPUT_END +test_expect_success 'E: unparsable email' ' + test_must_fail git fast-import --date-format=rfc2822 <input +' + +cat >input <<INPUT_END +commit refs/heads/branch +author $GIT_AUTHOR_NAME <$GIT_AUTHOR_EMAIL> Sat, 24 Apr 2010 15:05:27 -0500 +committer $GIT_COMMITTER_NAME <äggh!some!other!machine!example> Tue Feb 6 12:35:01 2007 -0500 +data <<COMMIT +Bang path +COMMIT + +from refs/heads/branch^0 + +INPUT_END +test_expect_success 'E: okay email' ' + git fast-import --date-format=rfc2822 <input +' + +cat >input <<INPUT_END +commit refs/heads/branch author $GIT_AUTHOR_NAME <$GIT_AUTHOR_EMAIL> 1170783301 - 0500 committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> 117078330 -0500 data <<COMMIT -- 1.7.1.rc1 ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH 2/2] fast-import: validate entire ident string 2010-04-24 21:10 ` [PATCH 2/2] fast-import: validate entire ident string Jonathan Nieder @ 2010-04-26 16:02 ` Shawn O. Pearce 2010-04-26 16:24 ` Jonathan Nieder 2010-05-04 17:11 ` Junio C Hamano 0 siblings, 2 replies; 13+ messages in thread From: Shawn O. Pearce @ 2010-04-26 16:02 UTC (permalink / raw) To: Jonathan Nieder; +Cc: Mark Lodato, git, Nicolas Pitre Jonathan Nieder <jrnieder@gmail.com> wrote: > The author, committer, and tagger name and email should not include > any embedded <, >, or newline characters. The format of the > identification string is > > ('author'|'committer'|'tagger') sp name sp < email > sp date > > If an object has no name attached, then git expects to find two spaces > in a row. This is going to be a problem I think. Some importers are probably writing "committer <bob> ...." when pulling from systems that don't have a concept of name vs. email (e.g. CVS or SVN). I highly suspect that requiring two spaces here will cause a lot of importers to fail. If we really need to require two spaces, I think we need to honor the documented input format but rewrite the line inside of the import process to match the two space convention. -- Shawn. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/2] fast-import: validate entire ident string 2010-04-26 16:02 ` Shawn O. Pearce @ 2010-04-26 16:24 ` Jonathan Nieder 2010-04-26 16:30 ` Jonathan Nieder 2010-05-04 17:11 ` Junio C Hamano 1 sibling, 1 reply; 13+ messages in thread From: Jonathan Nieder @ 2010-04-26 16:24 UTC (permalink / raw) To: Shawn O. Pearce; +Cc: Mark Lodato, git, Nicolas Pitre Shawn O. Pearce wrote: > Some importers are probably > writing "committer <bob> ...." when pulling from systems that don't > have a concept of name vs. email (e.g. CVS or SVN). I highly suspect > that requiring two spaces here will cause a lot of importers to fail. > > If we really need to require two spaces, It is not a huge deal, but ‘git commit --amend’ will die with "invalid commit" if it does not find a “ <” sequence after the “author ” string. Maybe that should be changed. Patch below. > I think we need to honor > the documented input format but rewrite the line inside of the > import process to match the two space convention. Yes, that’s doable. Thanks for the feedback, Jonathan diff --git a/builtin/commit.c b/builtin/commit.c index c5ab683..c56f2c9 100644 --- a/builtin/commit.c +++ b/builtin/commit.c @@ -449,19 +449,23 @@ static void determine_author_info(void) date = getenv("GIT_AUTHOR_DATE"); if (use_message && !renew_authorship) { - const char *a, *lb, *rb, *eol; + const char *a, *n, *lb, *rb, *eol; a = strstr(use_message_buffer, "\nauthor "); if (!a) die("invalid commit: %s", use_message); - lb = strstr(a + 8, " <"); - rb = strstr(a + 8, "> "); - eol = strchr(a + 8, '\n'); + n = a + strlen("\nauthor"); + lb = strstr(n, " <"); + rb = strstr(lb + 2, "> "); + eol = strchr(rb + 2, '\n'); if (!lb || !rb || !eol) die("invalid commit: %s", use_message); - name = xstrndup(a + 8, lb - (a + 8)); + if (lb == n) + name = xstrndup("", 0); + else + name = xstrndup(n + 1, lb - (n + 1)); email = xstrndup(lb + 2, rb - (lb + 2)); date = xstrndup(rb + 2, eol - (rb + 2)); } @@ -470,7 +474,7 @@ static void determine_author_info(void) const char *lb = strstr(force_author, " <"); const char *rb = strchr(force_author, '>'); - if (!lb || !rb) + if (!lb || !rb || rb < lb) die("malformed --author parameter"); name = xstrndup(force_author, lb - force_author); email = xstrndup(lb + 2, rb - (lb + 2)); -- ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH 2/2] fast-import: validate entire ident string 2010-04-26 16:24 ` Jonathan Nieder @ 2010-04-26 16:30 ` Jonathan Nieder 0 siblings, 0 replies; 13+ messages in thread From: Jonathan Nieder @ 2010-04-26 16:30 UTC (permalink / raw) To: Shawn O. Pearce; +Cc: Mark Lodato, git, Nicolas Pitre Jonathan Nieder wrote: > - lb = strstr(a + 8, " <"); > - rb = strstr(a + 8, "> "); > - eol = strchr(a + 8, '\n'); > + n = a + strlen("\nauthor"); > + lb = strstr(n, " <"); > + rb = strstr(lb + 2, "> "); > + eol = strchr(rb + 2, '\n'); > if (!lb || !rb || !eol) > die("invalid commit: %s", use_message); Err, this will segv when it fails; better to use lb = a + strlen("\nauthor "); lb = strchrnul(lb, '<'); rb = strchrnul(lb, '>'); eol = strchrnul(rb, '\n'); if (!*lb || !*rb || !*eol) die("invalid commit: %s", use_message); This is even more permissive, but I think that’s okay. Sorry for the noise. Jonathan ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/2] fast-import: validate entire ident string 2010-04-26 16:02 ` Shawn O. Pearce 2010-04-26 16:24 ` Jonathan Nieder @ 2010-05-04 17:11 ` Junio C Hamano 1 sibling, 0 replies; 13+ messages in thread From: Junio C Hamano @ 2010-05-04 17:11 UTC (permalink / raw) To: Shawn O. Pearce; +Cc: Jonathan Nieder, Mark Lodato, git, Nicolas Pitre "Shawn O. Pearce" <spearce@spearce.org> writes: > Jonathan Nieder <jrnieder@gmail.com> wrote: >> The author, committer, and tagger name and email should not include >> any embedded <, >, or newline characters. The format of the >> identification string is >> >> ('author'|'committer'|'tagger') sp name sp < email > sp date >> >> If an object has no name attached, then git expects to find two spaces >> in a row. > > This is going to be a problem I think. Some importers are probably > writing "committer <bob> ...." when pulling from systems that don't > have a concept of name vs. email (e.g. CVS or SVN). I highly suspect > that requiring two spaces here will cause a lot of importers to fail. > > If we really need to require two spaces, I think we need to honor > the documented input format but rewrite the line inside of the > import process to match the two space convention. It probably should document [sp name] (or [name sp]) an "zero or one" item, if we want to be lenient. I also think it may not be such a bad idea to allow fast-import to add a phoney "name" by taking everything in e-mail before the first '@', just like how the git-cvsimport and git-svn does, when the input stream does not have a "name". I am not sure how usable "shortlog" output would be otherwise, without any "name" field. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] fast-import docs: LT is valid in email, GT is not 2010-04-24 0:45 [PATCH] fast-import docs: LT is valid in email, GT is not Mark Lodato 2010-04-24 16:06 ` [PATCH] fsck: check ident lines in commit objects Jonathan Nieder @ 2010-04-24 16:12 ` Jonathan Nieder 2010-04-24 16:59 ` Mark Lodato 1 sibling, 1 reply; 13+ messages in thread From: Jonathan Nieder @ 2010-04-24 16:12 UTC (permalink / raw) To: Mark Lodato; +Cc: git Hi Mark, Mark Lodato wrote: > +++ b/Documentation/git-fast-import.txt > @@ -394,7 +394,7 @@ Here `<name>` is the person's display name (for example > and greater-than (\x3e) symbols. These are required to delimit > the email address from the other fields in the line. Note that > `<name>` is free-form and may contain any sequence of bytes, except > -`LT` and `LF`. It is typically UTF-8 encoded. > +`GT` and `LF`. It is typically UTF-8 encoded. Here <name> is the person’s display name (for example “Com M Itter”) So the original text is correct --- a <name> cannot contain LT because a less-than sign marks the boundary between a name and email address. Maybe you were wondering what characters are valid in an e-mail address? The comments in fast-import.c and code in ident.c are consistent about this: the forbidden characters are <, >, and LF, though no one seems to check (see also my other reply). A patch to explain this (including a reference to git-commit-tree(1), I guess) might be useful. git won’t understand an email with embedded > or LF. I’m not sure a < would cause problems, but I don’t mind that it is disallowed. Hope that helps, Jonathan ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] fast-import docs: LT is valid in email, GT is not 2010-04-24 16:12 ` [PATCH] fast-import docs: LT is valid in email, GT is not Jonathan Nieder @ 2010-04-24 16:59 ` Mark Lodato 0 siblings, 0 replies; 13+ messages in thread From: Mark Lodato @ 2010-04-24 16:59 UTC (permalink / raw) To: Jonathan Nieder; +Cc: git On Sat, Apr 24, 2010 at 12:12 PM, Jonathan Nieder <jrnieder@gmail.com> wrote: > Mark Lodato wrote: >> +++ b/Documentation/git-fast-import.txt >> @@ -394,7 +394,7 @@ Here `<name>` is the person's display name (for example >> and greater-than (\x3e) symbols. These are required to delimit >> the email address from the other fields in the line. Note that >> `<name>` is free-form and may contain any sequence of bytes, except >> -`LT` and `LF`. It is typically UTF-8 encoded. >> +`GT` and `LF`. It is typically UTF-8 encoded. > > Here <name> is the person’s display name (for example > “Com M Itter”) > > So the original text is correct --- a <name> cannot contain LT because > a less-than sign marks the boundary between a name and email address. Ah, you're right, sorry. I thought it was <email>, not <name>. > Maybe you were wondering what characters are valid in an e-mail address? > The comments in fast-import.c and code in ident.c are consistent about > this: the forbidden characters are <, >, and LF, though no one seems to > check (see also my other reply). A patch to explain this (including a > reference to git-commit-tree(1), I guess) might be useful. > > git won’t understand an email with embedded > or LF. I’m not sure a < > would cause problems, but I don’t mind that it is disallowed. It seems like it would be good to disallow <, >, and LF in both name and email. With your other patch, > is allowed in the name. Thanks for the clarification, Mark ^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2010-05-04 17:11 UTC | newest] Thread overview: 13+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2010-04-24 0:45 [PATCH] fast-import docs: LT is valid in email, GT is not Mark Lodato 2010-04-24 16:06 ` [PATCH] fsck: check ident lines in commit objects Jonathan Nieder 2010-04-24 16:59 ` Jonathan Nieder 2010-04-24 19:04 ` Shawn O. Pearce 2010-04-24 20:38 ` [PATCH 0/2] fast-import: tighten up parsing ident line Jonathan Nieder 2010-04-24 20:50 ` [PATCH 1/2] fast-import: be strict about formatting of raw dates Jonathan Nieder 2010-04-24 21:10 ` [PATCH 2/2] fast-import: validate entire ident string Jonathan Nieder 2010-04-26 16:02 ` Shawn O. Pearce 2010-04-26 16:24 ` Jonathan Nieder 2010-04-26 16:30 ` Jonathan Nieder 2010-05-04 17:11 ` Junio C Hamano 2010-04-24 16:12 ` [PATCH] fast-import docs: LT is valid in email, GT is not Jonathan Nieder 2010-04-24 16:59 ` Mark Lodato
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).