From: Junio C Hamano <gitster@pobox.com>
To: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Cc: Mike Hommey <mh@glandium.org>, git@vger.kernel.org
Subject: Re: [PATCH] Use GIT_COMMITTER_IDENT instead of hardcoded values in import-tars.perl
Date: Mon, 08 Sep 2008 13:40:20 -0700
Message-ID: <7vzlmie5hn.fsf@gitster.siamese.dyndns.org>
In-Reply-To: <alpine.DEB.1.00.0809081649040.13830@pacific.mpi-cbg.de.mpi-cbg.de> (Johannes Schindelin's message of "Mon, 8 Sep 2008 16:51:45 +0200 (CEST)")

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> On Sun, 7 Sep 2008, Mike Hommey wrote:
>
>> -my $committer_name = 'T Ar Creator';
>> -my $committer_email = 'tar@example.com';
>> +chomp(my $committer_ident = `git var GIT_COMMITTER_IDENT`);
>> +die 'You need to set user name and email'
>> + unless ($committer_ident =~ s/(.+ <[^>]+>).*/\1/);
>
> I have at least one script that will be broken by this change in behavior.
>
> To me, the issue is just like git-cvsimport, which sets the committer not
> to the actual committer, so that two people can end up with identical
> commit names, even if they cvsimported independently. I'd like the same
> behavior for import-tars. I actually use it that way.

I sense there are conflicting goals here.

cvsimport has only partial information about the author (the short account
name and nothing else), and by replicating it as-is, without trying to
interpret it, you can achieve reproducibility. At the other extreme, you
can use the author-name mapping file to get more readable resulting
history with real names, at the cost of reproducibility with other people
who do not have an identical mapping file. cvsimport lets you do both.
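For reference, here is a minimal sketch of reading such a mapping file. It assumes the `short=Full Name <email>` line format that git-cvsimport documents for its -A file (e.g. `exon=Andreas Ericsson <ae@op5.se>`). The fallback of reusing the short name for both fields is an assumption of this sketch. It is there to show the reproducible-but-unreadable end of the trade-off, and is not necessarily what cvsimport itself does.

```python
import re
from typing import Dict, Tuple

# One "short=Full Name <email>" entry per line; any trailing text is ignored.
ENTRY = re.compile(r"^([^=\s]+)\s*=\s*(.+?)\s*<([^>]*)>")


def parse_author_map(text: str) -> Dict[str, Tuple[str, str]]:
    """Map short account names to (full name, email) pairs."""
    authors: Dict[str, Tuple[str, str]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        m = ENTRY.match(line)
        if m:
            authors[m.group(1)] = (m.group(2), m.group(3))
    return authors


def cvs_identity(short: str, authors: Dict[str, Tuple[str, str]]) -> Tuple[str, str]:
    # Without a mapping entry, fall back to the short name for both fields
    # (assumption of this sketch): reproducible, but not very readable.
    return authors.get(short, (short, short))
```

Anyone with the same mapping file gets the same identities. Anyone without it gets the short names, which is the reproducibility trade-off described above.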

With the hardcoded 'T Ar Creator', you have no choice but strict
reproducibility without readable names. With Mike's original patch, which
made it consistent with git-import.{sh,perl}, you still cannot have both,
because setting GIT_COMMITTER_NAME does not override what the user.name
configuration says. With "git var GIT_COMMITTER_IDENT", however, you could.
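Here is why the environment-versus-configuration distinction matters. The following is a simplified model of the precedence that "git var GIT_COMMITTER_IDENT" follows: GIT_COMMITTER_NAME/GIT_COMMITTER_EMAIL win over user.name/user.email. It is an assumption-laden sketch, not git's code; real git also falls back to system account information when both are unset. It also includes the trailing-timestamp stripping that Mike's regex performs.

```python
import re
from typing import Mapping, Optional, Tuple

# "Name <email>" optionally followed by the " <epoch> <tz>" suffix that
# `git var GIT_COMMITTER_IDENT` prints.
IDENT_RE = re.compile(r"^(?P<name>.+) <(?P<email>[^>]+)>(?: \d+ [+-]\d{4})?$")


def resolve_committer(env: Mapping[str, str],
                      config: Mapping[str, str]) -> Optional[Tuple[str, str]]:
    """Simplified model: environment variables win over configuration."""
    name = env.get("GIT_COMMITTER_NAME") or config.get("user.name")
    email = env.get("GIT_COMMITTER_EMAIL") or config.get("user.email")
    if not name or not email:
        return None  # real git would try system defaults here
    return name, email


def strip_ident(ident: str) -> Optional[str]:
    """Keep only "Name <email>", dropping the timestamp, like the patch."""
    m = IDENT_RE.match(ident.strip())
    if m is None:
        return None
    return f"{m.group('name')} <{m.group('email')}>"
```

With this precedence, a script can pin the old 'T Ar Creator' identity through the environment for reproducible imports. Interactive users still get their configured name by default.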

This makes me wonder if it might be a better design to:

* Make fast-import feeders preserve as much information from the
  source material as possible, but take nothing from the environment.
  This is half-similar in spirit to what cvsimport does: it does not
  know the timezone, so it always uses GMT, and it uses the short
  account name because that is the only thing available, but it does
  not use a hardcoded "cvs", and the environment can affect it further
  by setting up an author mapping file. Here I am saying a fast-import
  feeder shouldn't (and does not have to) take the environment into
  account if it does not have good data in the source material.

  In the context of importing tarballs, zipfiles, and an existing
  directory that is an extracted tarball, there is not much authorship
  information in the source material (each entry in a tarball may carry
  owner information, but what if your tarball has more than one file,
  with different owners?).

* Invent a fast-import stream filter that lets you munge author and
  committer information selectively. Splice it into the pipeline
  between the feeder and fast-import if you want more readable
  resulting history (e.g. by using an author mapping file). Or you can
  choose not to use such a filter and get a reproducible result.
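For the first point, here is a sketch of what a tarball actually offers a feeder, using Python's tarfile module to read each member's owner name and mtime. It shows the multiple-owners problem directly. It also keeps the committer identity fixed (here, the old hardcoded value), so the result depends on the tarball alone. Using the newest mtime as the commit time is an assumption of this sketch.

```python
import io
import tarfile
from typing import Set, Tuple


def tar_authorship(data: bytes) -> Tuple[Set[str], int]:
    """Collect the owner names and the newest mtime recorded in a tarball."""
    owners: Set[str] = set()
    newest = 0
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:*") as tar:
        for member in tar.getmembers():
            if member.uname:
                owners.add(member.uname)
            newest = max(newest, int(member.mtime))
    return owners, newest


def committer_line(newest: int) -> str:
    """Build a fast-import committer line from source data only.

    The identity is fixed, so it never depends on the importer's
    environment; only the timestamp comes from the tarball (in UTC).
    """
    return f"committer T Ar Creator <tar@example.com> {newest} +0000"
```

Two people importing the same tarball this way get the same committer line, even though `owners` may hold several names that no single committer field can represent.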

If the "filter" turns out to be simple enough, it might even make sense to
make it part of fast-import itself, but that is an implementation detail.
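Here is a rough sketch of the filter from the second point, meant to be spliced between the feeder and git fast-import. The mapping keyed by email is a hypothetical format. The detail that matters is that `data` payloads are copied verbatim, in both the exact-byte-count and the `<<delimiter` forms, so a commit-message line that happens to start with "committer " is never rewritten.

```python
import io
import re
from typing import BinaryIO, Dict

# author/committer/tagger SP [name SP] <email> SP when LF
IDENT_CMD = re.compile(rb"^(author|committer|tagger) (?:(.*?) )?<([^>]*)> (.*)\n$")


def filter_stream(src: BinaryIO, dst: BinaryIO, mapping: Dict[bytes, bytes]) -> None:
    """Copy a fast-import stream, replacing identities whose email is mapped.

    `mapping` (hypothetical format) maps an old email to a full
    replacement b"Name <email>".
    """
    while True:
        line = src.readline()
        if not line:
            break
        if line.startswith(b"data "):
            dst.write(line)
            arg = line[5:].rstrip(b"\n")
            if arg.startswith(b"<<"):
                # Delimited form: copy lines up to and including the delimiter.
                delim = arg[2:] + b"\n"
                while True:
                    body = src.readline()
                    dst.write(body)
                    if not body or body == delim:
                        break
            else:
                # Exact byte count: copy the payload untouched.
                dst.write(src.read(int(arg)))
            continue
        m = IDENT_CMD.match(line)
        if m and m.group(3) in mapping:
            line = m.group(1) + b" " + mapping[m.group(3)] + b" " + m.group(4) + b"\n"
        dst.write(line)


def filter_bytes(stream: bytes, mapping: Dict[bytes, bytes]) -> bytes:
    """Convenience wrapper for in-memory streams."""
    out = io.BytesIO()
    filter_stream(io.BytesIO(stream), out, mapping)
    return out.getvalue()
```

Connected to stdin and stdout, it would sit in a pipeline such as `feeder | filter | git fast-import`. With an empty mapping it passes the stream through unchanged, which preserves the reproducible result.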