* Git commit amend empty emails
@ 2014-12-10 13:17 Simon
2014-12-10 15:39 ` Jeff King
0 siblings, 1 reply; 7+ messages in thread
From: Simon @ 2014-12-10 13:17 UTC (permalink / raw)
To: git
Git is having empty email problems I think, I'm on git v2.1.3.
Steps to reproduce:
$ git init
Initialized empty Git repository in /tmp/test_git/.git/
$ echo 'test' > abc
$ git add --all 1 ↵
$ git commit --message 'test'
[master (root-commit) 3cc2793] test
1 file changed, 1 insertion(+)
create mode 100644 abc
$ echo 'test2' > abc
$ git commit --amend
fatal: Malformed ident string: 'admin <> 1418217345 +0000'
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Git commit amend empty emails
2014-12-10 13:17 Git commit amend empty emails Simon
@ 2014-12-10 15:39 ` Jeff King
2014-12-10 15:42 ` [PATCH 1/2] commit: loosen ident checks when generating template Jeff King
` (2 more replies)
0 siblings, 3 replies; 7+ messages in thread
From: Jeff King @ 2014-12-10 15:39 UTC (permalink / raw)
To: Simon; +Cc: Junio C Hamano, git
On Thu, Dec 11, 2014 at 12:17:35AM +1100, Simon wrote:
> Git is having empty email problems I think, I'm on git v2.1.3.
>
> Steps to reproduce:
>
> $ git init
> Initialized empty Git repository in /tmp/test_git/.git/
> $ echo 'test' > abc
> $ git add --all 1 ↵
> $ git commit --message 'test'
> [master (root-commit) 3cc2793] test
> 1 file changed, 1 insertion(+)
> create mode 100644 abc
> $ echo 'test2' > abc
> $ git commit --amend
> fatal: Malformed ident string: 'admin <> 1418217345 +0000'
Yes, this looks like a regression in v2.1.0, due to my 4701026 (commit:
use split_ident_line to compare author/committer, 2014-05-01). That
commit added a call to sane_ident_split() when preparing the commit
message template, which is what is causing the error you see (and why
your first commit succeeds, but the latter fails).
The absolute simplest fix would be to remove the sane_ident_split call.
Though I have a different patch prepared, which argues that we should
not be doing any validation in this code path at all (see below).
However, there's something else going on. I am surprised that we allow
empty emails at all and the code here is quite strange. The first check
on the ident format is when we feed the data to fmt_ident to generate
the string that goes into the commit object. We disallow empty _names_
there, but not empty _emails_. I'm not sure if this is an oversight, or
an intentional historic compatibility thing.
And then it gets weirder. We take the output of fmt_ident, and then feed
that back into split_ident_line, so that we can set the GIT_AUTHOR_*
variables in the environment. And that does its _own_ set of checks, via
sane_ident_split.
Once upon a time, it relied only on split_ident_lane to report problems.
But Junio's e27ddb6 (split_ident_line(): make best effort when parsing
author/committer line, 2012-08-31) made split_ident_line more lenient,
and introduced sane_ident_split to cover the difference. Except that it
did more than that: besides checking whether the name is empty (which
the original split_ident_line used to do), it also complains if the
email is empty (which is new in that commit).
So we now notice the empty email in this code path, but the only thing
we do is avoid writing out the environment variables and continue. Which
means that the actual string generated by fmt_ident (complete with empty
email) is what goes into the commit. So why are we setting the
environment variables at all?
It looks like they are for the benefit of hooks, as evidenced by 7dfe8ad
(commit: pass author/committer info to hooks, 2012-03-11). So that means
that we are sometimes passing totally bogus data to hooks (i.e., an
ident is good enough to make it into a commit message, but
sane_ident_split prevents us from putting the data into the
environment).
So I think e27ddb6 was a little over-anxious in adding the email
validation, but nobody noticed because those checks don't actually
affect whether or not we commit. Hooks might see bogus data, but it is
in such a rare set of cases that nobody has noticed.
Here are two patches to improve this. These are on top of the
jk/commit-date-approxidate topic, as that is where the regression was
introduced.
The first one fixes the regression and can stand by itself. The second
fixes the GIT_AUTHOR problem, but AFAIK that has been there for years.
So it is not as urgent, but is still maint-worthy, in my opinion.
[1/2]: commit: loosen ident checks when generating template
[2/2]: commit: always populate GIT_AUTHOR_* variables
If we did want to truly disallow empty emails, we could do a follow-on
3/2 that teaches fmt_ident to reject them (that is the right place
because it is where the validation checks for the author go, and also
because we would probably want the same validation for the committer).
But I do not think we should do that lightly. It has been this way for
years, and clearly at least one person is depending on it. If we're
going to change it, we might want a warning/deprecation period.
-Peff
^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH 1/2] commit: loosen ident checks when generating template
2014-12-10 15:39 ` Jeff King
@ 2014-12-10 15:42 ` Jeff King
2014-12-11 20:26 ` Eric Sunshine
2014-12-10 15:43 ` [PATCH 2/2] commit: always populate GIT_AUTHOR_* variables Jeff King
2014-12-10 18:46 ` Git commit amend empty emails Junio C Hamano
2 siblings, 1 reply; 7+ messages in thread
From: Jeff King @ 2014-12-10 15:42 UTC (permalink / raw)
To: Simon; +Cc: Junio C Hamano, git
When we generate the commit-message template, we try to
report an author or committer ident that will be of interest
to the user: an author that does not match the committer, or
a committer that was auto-configured.
When doing so, if we encounter what we consider to be a
bogus ident, we immediately die. This is a bad idea, because
our use of the idents here is purely informational. Any
ident rules should be enforced elsewhere, because commits
that do not invoke the editor will not even hit this code
path (e.g., "git commit -mfoo" would work, but "git commit"
would not). So at best, we are redundant with other checks,
and at worse, we actively prevent commits that should
otherwise be allowed.
We should therefore do the minimal parsing we can to get a
value and not do any validation (i.e., drop the call to
sane_ident_split()).
In theory we could notice when even our minimal parsing
fails to work, and do the sane thing for each check (e.g.,
if we have an author but can't parse the committer, assume
they are different and print the author). But we can
actually simplify this even further.
We know that the author and committer strings we are parsing
have been generated by us earlier in the program, and
therefore they must be parseable. We could just call
split_ident_line without even checking its return value,
knowing that it will put _something_ in the name/mail
fields. Of course, to protect ourselves against future
changes to the code, it makes sense to turn this into an
assert, so we are not surprised if our assumption fails.
Signed-off-by: Jeff King <peff@peff.net>
---
builtin/commit.c | 23 ++++++++++++++---------
1 file changed, 14 insertions(+), 9 deletions(-)
diff --git a/builtin/commit.c b/builtin/commit.c
index d1c90db..2be5506 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -502,6 +502,12 @@ static int is_a_merge(const struct commit *current_head)
return !!(current_head->parents && current_head->parents->next);
}
+static void assert_split_ident(struct ident_split *id, const struct strbuf *buf)
+{
+ if (split_ident_line(id, buf->buf, buf->len))
+ die("BUG: unable to parse our own ident: %s", buf->buf);
+}
+
static void export_one(const char *var, const char *s, const char *e, int hack)
{
struct strbuf buf = STRBUF_INIT;
@@ -608,13 +614,6 @@ static void determine_author_info(struct strbuf *author_ident)
}
}
-static void split_ident_or_die(struct ident_split *id, const struct strbuf *buf)
-{
- if (split_ident_line(id, buf->buf, buf->len) ||
- !sane_ident_split(id))
- die(_("Malformed ident string: '%s'"), buf->buf);
-}
-
static int author_date_is_interesting(void)
{
return author_message || force_date;
@@ -822,8 +821,14 @@ static int prepare_to_commit(const char *index_file, const char *prefix,
status_printf_ln(s, GIT_COLOR_NORMAL,
"%s", only_include_assumed);
- split_ident_or_die(&ai, author_ident);
- split_ident_or_die(&ci, &committer_ident);
+ /*
+ * These should never fail because we they come from our own
+ * fmt_ident. They may fail the sane_ident test, but we know
+ * that the name and mail pointers will at least be valid,
+ * which is enough for our tests and printing here.
+ */
+ assert_split_ident(&ai, author_ident);
+ assert_split_ident(&ci, &committer_ident);
if (ident_cmp(&ai, &ci))
status_printf_ln(s, GIT_COLOR_NORMAL,
--
2.2.0.454.g7eca6b7
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH 2/2] commit: always populate GIT_AUTHOR_* variables
2014-12-10 15:39 ` Jeff King
2014-12-10 15:42 ` [PATCH 1/2] commit: loosen ident checks when generating template Jeff King
@ 2014-12-10 15:43 ` Jeff King
2014-12-10 18:46 ` Git commit amend empty emails Junio C Hamano
2 siblings, 0 replies; 7+ messages in thread
From: Jeff King @ 2014-12-10 15:43 UTC (permalink / raw)
To: Simon; +Cc: Junio C Hamano, git
To figure out the author ident for a commit, we call
determine_author_info(). This function collects information
from the environment, other commits (in the case of
"--amend" or "-c/-C"), and the "--author" option. It then
uses fmt_ident to generate the final ident string that goes
into the commit object. fmt_ident is therefore responsible
for any quality or validation checks on what is allowed to
go into a commit.
Before returning, though, we call split_ident_line on the
result, and feed the individual components to hooks via the
GIT_AUTHOR_* variables. Furthermore, we do extra validation
by feeding the split to sane_ident_split(), which is pickier
than fmt_ident (in particular, it will complain about an empty
email field). If this parsing or validation fails, we skip
updating the environment variables.
This is bad, because it means that hooks may silently see a
different ident than what we are putting into the commit. We
should drop the extra sane_ident_split checks entirely, and
take whatever fmt_ident has fed us (and what will go into
the commit object).
If parsing fails, we should actually abort here rather than
continuing (and feeding the hooks bogus data). However,
split_ident_line should never fail here. The ident was just
generated by fmt_ident, so we know that it's sane. We can
use assert_split_ident to double-check this.
Note that we also teach that assertion to check that we
found a date (it always should, but until now, no caller
cared whether we found a date or not). Checking the return
value of sane_ident_split is enough to ensure we have the
name/email pointers set, and checking date_begin is enough
to know that all of the date/tz variables are set.
Signed-off-by: Jeff King <peff@peff.net>
---
builtin/commit.c | 26 +++++---------------------
1 file changed, 5 insertions(+), 21 deletions(-)
diff --git a/builtin/commit.c b/builtin/commit.c
index 2be5506..f1a9e07 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -504,7 +504,7 @@ static int is_a_merge(const struct commit *current_head)
static void assert_split_ident(struct ident_split *id, const struct strbuf *buf)
{
- if (split_ident_line(id, buf->buf, buf->len))
+ if (split_ident_line(id, buf->buf, buf->len) || !id->date_begin)
die("BUG: unable to parse our own ident: %s", buf->buf);
}
@@ -518,20 +518,6 @@ static void export_one(const char *var, const char *s, const char *e, int hack)
strbuf_release(&buf);
}
-static int sane_ident_split(struct ident_split *person)
-{
- if (!person->name_begin || !person->name_end ||
- person->name_begin == person->name_end)
- return 0; /* no human readable name */
- if (!person->mail_begin || !person->mail_end ||
- person->mail_begin == person->mail_end)
- return 0; /* no usable mail */
- if (!person->date_begin || !person->date_end ||
- !person->tz_begin || !person->tz_end)
- return 0;
- return 1;
-}
-
static int parse_force_date(const char *in, char *out, int len)
{
if (len < 1)
@@ -606,12 +592,10 @@ static void determine_author_info(struct strbuf *author_ident)
}
strbuf_addstr(author_ident, fmt_ident(name, email, date, IDENT_STRICT));
- if (!split_ident_line(&author, author_ident->buf, author_ident->len) &&
- sane_ident_split(&author)) {
- export_one("GIT_AUTHOR_NAME", author.name_begin, author.name_end, 0);
- export_one("GIT_AUTHOR_EMAIL", author.mail_begin, author.mail_end, 0);
- export_one("GIT_AUTHOR_DATE", author.date_begin, author.tz_end, '@');
- }
+ assert_split_ident(&author, author_ident);
+ export_one("GIT_AUTHOR_NAME", author.name_begin, author.name_end, 0);
+ export_one("GIT_AUTHOR_EMAIL", author.mail_begin, author.mail_end, 0);
+ export_one("GIT_AUTHOR_DATE", author.date_begin, author.tz_end, '@');
}
static int author_date_is_interesting(void)
--
2.2.0.454.g7eca6b7
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: Git commit amend empty emails
2014-12-10 15:39 ` Jeff King
2014-12-10 15:42 ` [PATCH 1/2] commit: loosen ident checks when generating template Jeff King
2014-12-10 15:43 ` [PATCH 2/2] commit: always populate GIT_AUTHOR_* variables Jeff King
@ 2014-12-10 18:46 ` Junio C Hamano
2014-12-10 19:45 ` Jeff King
2 siblings, 1 reply; 7+ messages in thread
From: Junio C Hamano @ 2014-12-10 18:46 UTC (permalink / raw)
To: Jeff King; +Cc: Simon, git
Jeff King <peff@peff.net> writes:
> However, there's something else going on. I am surprised that we allow
> empty emails at all and the code here is quite strange. The first check
> on the ident format is when we feed the data to fmt_ident to generate
> the string that goes into the commit object. We disallow empty _names_
> there, but not empty _emails_. I'm not sure if this is an oversight, or
> an intentional historic compatibility thing.
Looking at e27ddb6 you cited, I think we knew about historical
mistakes that allowed an empty names, but not an empty e-mail
address. We probably have tried to kill both in one stone.
> Once upon a time, it relied only on split_ident_lane to report problems.
> But Junio's e27ddb6 (split_ident_line(): make best effort when parsing
> author/committer line, 2012-08-31) made split_ident_line more lenient,
> and introduced sane_ident_split to cover the difference. Except that it
> did more than that: besides checking whether the name is empty (which
> the original split_ident_line used to do), it also complains if the
> email is empty (which is new in that commit).
> So we now notice the empty email in this code path, but the only thing
> we do is avoid writing out the environment variables and continue. Which
> means that the actual string generated by fmt_ident (complete with empty
> email) is what goes into the commit. So why are we setting the
> environment variables at all?
I think that part was more underthinking than oversight.
We didn't want to abort the commit but we didn't want to contaminate
the environment variables with known-to-be-bad values to spread the
problem further. But there is no guarantee that not exporting the
environment variables would give us more comformant name and e-mail
address, so that thinking is flawed.
> Here are two patches to improve this. These are on top of the
> jk/commit-date-approxidate topic, as that is where the regression was
> introduced.
>
> The first one fixes the regression and can stand by itself. The second
> fixes the GIT_AUTHOR problem, but AFAIK that has been there for years.
> So it is not as urgent, but is still maint-worthy, in my opinion.
>
> [1/2]: commit: loosen ident checks when generating template
> [2/2]: commit: always populate GIT_AUTHOR_* variables
>
> If we did want to truly disallow empty emails, we could do a follow-on
> 3/2 that teaches fmt_ident to reject them (that is the right place
> because it is where the validation checks for the author go, and also
> because we would probably want the same validation for the committer).
>
> But I do not think we should do that lightly. It has been this way for
> years, and clearly at least one person is depending on it. If we're
> going to change it, we might want a warning/deprecation period.
>
> -Peff
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Git commit amend empty emails
2014-12-10 18:46 ` Git commit amend empty emails Junio C Hamano
@ 2014-12-10 19:45 ` Jeff King
0 siblings, 0 replies; 7+ messages in thread
From: Jeff King @ 2014-12-10 19:45 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Simon, git
On Wed, Dec 10, 2014 at 10:46:16AM -0800, Junio C Hamano wrote:
> > So we now notice the empty email in this code path, but the only thing
> > we do is avoid writing out the environment variables and continue. Which
> > means that the actual string generated by fmt_ident (complete with empty
> > email) is what goes into the commit. So why are we setting the
> > environment variables at all?
>
> I think that part was more underthinking than oversight.
>
> We didn't want to abort the commit but we didn't want to contaminate
> the environment variables with known-to-be-bad values to spread the
> problem further. But there is no guarantee that not exporting the
> environment variables would give us more comformant name and e-mail
> address, so that thinking is flawed.
That sort of makes sense, but I agree it is flawed. The real spread is
when the bogus data makes it into the commit objects themselves.
> > The first one fixes the regression and can stand by itself. The second
> > fixes the GIT_AUTHOR problem, but AFAIK that has been there for years.
> > So it is not as urgent, but is still maint-worthy, in my opinion.
> >
> > [1/2]: commit: loosen ident checks when generating template
> > [2/2]: commit: always populate GIT_AUTHOR_* variables
By the way, as I said I built these on the original regression. They
will have some minor textual conflicts if you merge them up to master,
due to the jk/commit-author-parsing topic. Please me know if you run
into any trouble merging.
-Peff
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 1/2] commit: loosen ident checks when generating template
2014-12-10 15:42 ` [PATCH 1/2] commit: loosen ident checks when generating template Jeff King
@ 2014-12-11 20:26 ` Eric Sunshine
0 siblings, 0 replies; 7+ messages in thread
From: Eric Sunshine @ 2014-12-11 20:26 UTC (permalink / raw)
To: Jeff King; +Cc: Simon, Junio C Hamano, Git List
On Wed, Dec 10, 2014 at 10:42 AM, Jeff King <peff@peff.net> wrote:
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> diff --git a/builtin/commit.c b/builtin/commit.c
> index d1c90db..2be5506 100644
> --- a/builtin/commit.c
> +++ b/builtin/commit.c
> @@ -822,8 +821,14 @@ static int prepare_to_commit(const char *index_file, const char *prefix,
> status_printf_ln(s, GIT_COLOR_NORMAL,
> "%s", only_include_assumed);
>
> - split_ident_or_die(&ai, author_ident);
> - split_ident_or_die(&ci, &committer_ident);
> + /*
> + * These should never fail because we they come from our own
s/we//
> + * fmt_ident. They may fail the sane_ident test, but we know
> + * that the name and mail pointers will at least be valid,
> + * which is enough for our tests and printing here.
> + */
> + assert_split_ident(&ai, author_ident);
> + assert_split_ident(&ci, &committer_ident);
>
> if (ident_cmp(&ai, &ci))
> status_printf_ln(s, GIT_COLOR_NORMAL,
> --
> 2.2.0.454.g7eca6b7
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2014-12-11 20:26 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-12-10 13:17 Git commit amend empty emails Simon
2014-12-10 15:39 ` Jeff King
2014-12-10 15:42 ` [PATCH 1/2] commit: loosen ident checks when generating template Jeff King
2014-12-11 20:26 ` Eric Sunshine
2014-12-10 15:43 ` [PATCH 2/2] commit: always populate GIT_AUTHOR_* variables Jeff King
2014-12-10 18:46 ` Git commit amend empty emails Junio C Hamano
2014-12-10 19:45 ` Jeff King
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).