Re: RFD: fast-import is picky with author names (and maybe it should - but how much so?)

From: Michael J Gruber <git@drmicha.warpmail.net>
To: Felipe Contreras <felipe.contreras@gmail.com>
Cc: Jeff King <peff@peff.net>, A Large Angry SCM <gitzilla@gmail.com>,
	Git Mailing List <git@vger.kernel.org>,
	Junio C Hamano <gitster@pobox.com>
Subject: Re: RFD: fast-import is picky with author names (and maybe it should - but how much so?)
Date: Tue, 13 Nov 2012 11:15:21 +0100
Message-ID: <50A21DB9.7070700@drmicha.warpmail.net>
In-Reply-To: <CAMP44s1gA1P-Lr1M=7RDRqFQmvQAtNnB+yAJfKC1gk3XUjbfCQ@mail.gmail.com>

Felipe Contreras venit, vidit, dixit 12.11.2012 23:47:
> On Mon, Nov 12, 2012 at 10:41 PM, Jeff King <peff@peff.net> wrote:
>> On Sun, Nov 11, 2012 at 07:48:14PM +0100, Felipe Contreras wrote:
>>
>>>>   3. Exporters should not use it if they have any broken-down
>>>>      representation at all. Even knowing that the first half is a human
>>>>      name and the second half is something else would give it a better
>>>>      shot at cleaning than fast-import would get.
>>>
>>> I'm not sure what you mean by this. If they have name and email, then
>>> sure, it's easy.
>>
>> But not as easy as just printing it. What if you have this:
>>
>>   name="Peff <angle brackets> King"
>>   email="<peff@peff.net>"
>>
>> Concatenating them does not produce a valid git author name. Sending the
>> concatenation through fast-import's cleanup function would lose
>> information (namely, the location of the boundary between name and
>> email).
> 
> Right. Unfortunately I'm not aware of any DSCM that does that.
> 
>> Similarly, one might have other structured data (e.g., CVS username)
>> where the structure is a useful hint, but some conversion to name+email
>> is still necessary.
> 
> CVS might be the only one that has such structured data. I think in
> subversion the username has no meaning. A 'felipec' subversion
> username is as bad as a mercurial 'felipec' username.

In subversion, the username has the clearly defined meaning of being a
username on the subversion host. If the host is, e.g., a sourceforge
site then I can easily look up the user profile and convert the username
into a valid e-mail address (<username>@users.sf.net). That is the
advantage that the exporter (together with user knowledge) has over the
importer.
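
A minimal sketch of that conversion, assuming a SourceForge-hosted
repository. The function name sf_ident and the choice to reuse the
username as the display name are illustrative assumptions, not part of
any existing remote helper:

```python
def sf_ident(username, domain="users.sf.net"):
    """Build a git-style 'Name <email>' ident from a host-local username.

    Hypothetical helper: the username doubles as the display name because
    the exporter knows nothing better without a profile lookup.
    """
    username = username.strip()
    # Refuse anything that could not form a valid ident on its own.
    if not username or any(c in username for c in "<>\n"):
        raise ValueError(f"cannot map username: {username!r}")
    return f"{username} <{username}@{domain}>"


print(sf_ident("felipec"))
```

The domain is a parameter because only the user (or the exporter's
configuration) knows which host the usernames belong to.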

If the initial clone process aborts after every single "unknown" user
it's no fun, of course. On the other hand, if an incremental clone
(fetch) lets commits with unknown authors sneak in, it's no fun either
(because I may want to fetch in crontab and publish that converted beast
automatically). That is why I proposed neither approach.

Most conveniently, the export side of a remote helper would

- do "obvious" automatic lossless transformations
- use an author map for other names
- for names not covered by the above (or having an empty map entry):
  stop exporting commits, but continue parsing commits and amend the
  author map with any unknown usernames (empty entry), and warn the
  user. (A crontab script can notify me based on the return code.)
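
The three steps above can be sketched roughly as follows. This is only
an illustration under stated assumptions: the names resolve_author and
export_commits, the in-memory dict used as the author map, and the
(commit_id, raw_author) input shape are all hypothetical, and the only
"obvious" transformation shown is passing through names that already
look like "Name <email>":

```python
import re
import sys

# A name that already has the "Name <email>" shape needs no mapping.
IDENT_RE = re.compile(r"^([^<>\n]*) <([^<>\n]*)>$")


def resolve_author(raw, author_map):
    """Return a git ident for raw, or None if the author is unknown."""
    raw = raw.strip()
    if IDENT_RE.match(raw):
        return raw                      # obvious, lossless case
    return author_map.get(raw) or None  # missing or empty entry: unknown


def export_commits(commits, author_map):
    """Export (commit_id, raw_author) pairs until the first unknown author.

    After the first unknown author nothing more is exported, but parsing
    continues so that every unknown name gets an empty map entry.
    Returns (exported, exit_code); a non-zero code signals unknown authors.
    """
    exported = []
    blocked = False
    for commit_id, raw in commits:
        ident = resolve_author(raw, author_map)
        if ident is None:
            author_map.setdefault(raw.strip(), "")  # entry for the user to fill in
            print(f"warning: unknown author {raw!r}", file=sys.stderr)
            blocked = True
        elif not blocked:
            exported.append((commit_id, ident))
    return exported, (1 if blocked else 0)
```

On the first run the caller gets a non-zero code and a map with empty
entries; once those are filled in, a second run exports everything.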

If the cloning involves a "foreign clone" (like the hg clone behind the
scenes) then the runtime of the second pass should be much smaller. In
principle, one could even store all blobs and trees on the first run and
skip that step on the second, but that would rely on immutability on the
foreign side, so I dunno. (And to check the sha1, we have to get the
blob anyway.)

As for the format for incomplete entries (foo <some@where>), a technical
guideline should suffice for those that follow guidelines.

Michael
