git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: Simon <simonzack@gmail.com>
Cc: Junio C Hamano <gitster@pobox.com>, git@vger.kernel.org
Subject: Re: Git commit amend empty emails
Date: Wed, 10 Dec 2014 10:39:53 -0500	[thread overview]
Message-ID: <20141210153952.GA14910@peff.net> (raw)
In-Reply-To: <548847EF.7080805@gmail.com>

On Thu, Dec 11, 2014 at 12:17:35AM +1100, Simon wrote:

> Git is having empty email problems I think, I'm on git v2.1.3.
> 
> Steps to reproduce:
> 
>     $ git init
>     Initialized empty Git repository in /tmp/test_git/.git/
>     $ echo 'test' > abc
>     $ git add --all 1 ↵
>     $ git commit --message 'test'
>     [master (root-commit) 3cc2793] test
>      1 file changed, 1 insertion(+)
>      create mode 100644 abc
>     $ echo 'test2' > abc
>     $ git commit --amend
>     fatal: Malformed ident string: 'admin <> 1418217345 +0000'

Yes, this looks like a regression in v2.1.0, due to my 4701026 (commit:
use split_ident_line to compare author/committer, 2014-05-01).  That
commit added a call to sane_ident_split() when preparing the commit
message template, which is what is causing the error you see (and why
your first commit succeeds, but the latter fails).

The absolute simplest fix would be to remove the sane_ident_split call.
Though I have a different patch prepared, which argues that we should
not be doing any validation in this code path at all (see below).

However, there's something else going on. I am surprised that we allow
empty emails at all and the code here is quite strange. The first check
on the ident format is when we feed the data to fmt_ident to generate
the string that goes into the commit object.  We disallow empty _names_
there, but not empty _emails_.  I'm not sure if this is an oversight, or
an intentional historic compatibility thing.

And then it gets weirder. We take the output of fmt_ident, and then feed
that back into split_ident_line, so that we can set the GIT_AUTHOR_*
variables in the environment. And that does its _own_ set of checks, via
sane_ident_split.

Once upon a time, it relied only on split_ident_lane to report problems.
But Junio's e27ddb6 (split_ident_line(): make best effort when parsing
author/committer line, 2012-08-31) made split_ident_line more lenient,
and introduced sane_ident_split to cover the difference. Except that it
did more than that: besides checking whether the name is empty (which
the original split_ident_line used to do), it also complains if the
email is empty (which is new in that commit).

So we now notice the empty email in this code path, but the only thing
we do is avoid writing out the environment variables and continue. Which
means that the actual string generated by fmt_ident (complete with empty
email) is what goes into the commit. So why are we setting the
environment variables at all?

It looks like they are for the benefit of hooks, as evidenced by 7dfe8ad
(commit: pass author/committer info to hooks, 2012-03-11). So that means
that we are sometimes passing totally bogus data to hooks (i.e., an
ident is good enough to make it into a commit message, but
sane_ident_split prevents us from putting the data into the
environment).

So I think e27ddb6 was a little over-anxious in adding the email
validation, but nobody noticed because those checks don't actually
affect whether or not we commit. Hooks might see bogus data, but it is
in such a rare set of cases that nobody has noticed.

Here are two patches to improve this. These are on top of the
jk/commit-date-approxidate topic, as that is where the regression was
introduced.

The first one fixes the regression and can stand by itself. The second
fixes the GIT_AUTHOR problem, but AFAIK that has been there for years.
So it is not as urgent, but is still maint-worthy, in my opinion.

  [1/2]: commit: loosen ident checks when generating template
  [2/2]: commit: always populate GIT_AUTHOR_* variables

If we did want to truly disallow empty emails, we could do a follow-on
3/2 that teaches fmt_ident to reject them (that is the right place
because it is where the validation checks for the author go, and also
because we would probably want the same validation for the committer).

But I do not think we should do that lightly. It has been this way for
years, and clearly at least one person is depending on it. If we're
going to change it, we might want a warning/deprecation period.

-Peff

  reply	other threads:[~2014-12-10 15:40 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-12-10 13:17 Git commit amend empty emails Simon
2014-12-10 15:39 ` Jeff King [this message]
2014-12-10 15:42   ` [PATCH 1/2] commit: loosen ident checks when generating template Jeff King
2014-12-11 20:26     ` Eric Sunshine
2014-12-10 15:43   ` [PATCH 2/2] commit: always populate GIT_AUTHOR_* variables Jeff King
2014-12-10 18:46   ` Git commit amend empty emails Junio C Hamano
2014-12-10 19:45     ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20141210153952.GA14910@peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=simonzack@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).