From: Felipe Contreras <felipe.contreras@gmail.com>
To: Michael J Gruber <git@drmicha.warpmail.net>
Cc: Jeff King <peff@peff.net>, A Large Angry SCM <gitzilla@gmail.com>,
Git Mailing List <git@vger.kernel.org>,
Junio C Hamano <gitster@pobox.com>
Subject: Re: RFD: fast-import is picky with author names (and maybe it should - but how much so?)
Date: Tue, 13 Nov 2012 19:15:59 +0100
Message-ID: <CAMP44s18diic3KQtH5weCv-sVJXj4Pv-QnAaTeHTbrxk-=+3Gw@mail.gmail.com>
In-Reply-To: <50A21DB9.7070700@drmicha.warpmail.net>
On Tue, Nov 13, 2012 at 11:15 AM, Michael J Gruber
<git@drmicha.warpmail.net> wrote:
> Felipe Contreras venit, vidit, dixit 12.11.2012 23:47:
>> On Mon, Nov 12, 2012 at 10:41 PM, Jeff King <peff@peff.net> wrote:
>>> On Sun, Nov 11, 2012 at 07:48:14PM +0100, Felipe Contreras wrote:
>>>
>>>>> 3. Exporters should not use it if they have any broken-down
>>>>> representation at all. Even knowing that the first half is a human
>>>>> name and the second half is something else would give it a better
>>>>> shot at cleaning than fast-import would get.
>>>>
>>>> I'm not sure what you mean by this. If they have name and email, then
>>>> sure, it's easy.
>>>
>>> But not as easy as just printing it. What if you have this:
>>>
>>> name="Peff <angle brackets> King"
>>> email="<peff@peff.net>"
>>>
>>> Concatenating them does not produce a valid git author name. Sending the
>>> concatenation through fast-import's cleanup function would lose
>>> information (namely, the location of the boundary between name and
>>> email).
>>
>> Right. Unfortunately I'm not aware of any DSCM that does that.
>>
>>> Similarly, one might have other structured data (e.g., CVS username)
>>> where the structure is a useful hint, but some conversion to name+email
>>> is still necessary.
>>
>> CVS might be the only one that has such structured data. I think in
>> subversion the username has no meaning. A 'felipec' subversion
>> username is as bad as a mercurial 'felipec' username.
>
> In subversion, the username has the clearly defined meaning of being a
> username on the subversion host. If the host is, e.g., a sourceforge
> site then I can easily look up the user profile and convert the username
> into a valid e-mail address (<username>@users.sf.net). That is the
> advantage that the exporter (together with user knowledge) has over the
> importer.
>
> If the initial clone process aborts after every single "unknown" user
> it's no fun, of course. On the other hand, if an incremental clone
> (fetch) lets commits with unknown author sneak in it's no fun either
> (because I may want to fetch in crontab and publish that converted beast
> automatically). That is why I proposed neither approach.
>
> Most conveniently, the export side of a remote helper would
>
> - do "obvious" automatic lossless transformations
> - use an author map for other names
This should be done by fast-import. It doesn't make any sense that
every remote helper and fast-exporter out there has its own way of
mapping authors (or none at all).
> - For names not covered by the above (or having an empty map entry):
> Stop exporting commits but continue parsing commits and amend the author
> map with any unknown usernames (empty entry), and warn the user.
> (crontab script can notify me based on the return code.)
Stop exporting commits but continue parsing commits? I don't know what
that means.

fast-import should try its best to clean it up and warn the user, sure,
but also store the missing entry in a file, so that it can be filled in
later (if the user so wishes).
> If the cloning involves a "foreign clone" (like the hg clone behind the
> scene) then the runtime of the second pass should be much smaller. In
> principle, one could even store all blobs and trees on the first run and
> skip that step on the second, but that would rely on immutability on the
> foreign side, so I dunno. (And to check the sha1, we have to get the
> blob anyways.)
No. There's no concept of partial clones... Either you clone, or you don't.

What if the remote helper didn't notice that the author was bad?
fast-import could just leave everything up to that point in place, warn
about what happened, and exit, but the exporter side would die in the
middle of exporting, and it might end up in a bad state, not saving
marks, or who knows what.

It wouldn't work.

The cloning should be full, and the bad authors stored in a scaffold author map.
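
To make the scaffold idea concrete, here is a minimal sketch, assuming
a map file with one 'raw = Name <email>' entry per line (similar in
spirit to the git-svn authors file), where an empty right-hand side
means "not filled in yet". The function names and the file format are
my assumptions, not anything fast-import implements:

```python
def load_author_map(lines):
    """Parse 'raw = Name <email>' lines into a dict.

    Blank lines and '#' comments are skipped. An empty right-hand side
    marks an entry the user has not filled in yet.
    """
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith('#'):
            continue
        # Splits on the first '=', so raw names containing '=' are not
        # supported by this sketch.
        raw, _, ident = line.partition('=')
        mapping[raw.strip()] = ident.strip()
    return mapping


def add_scaffold_entries(mapping, seen_authors):
    """Add an empty entry for each author missing from the map.

    Returns the newly added raw names in first-seen order, so the
    caller can warn about them after the full import has finished.
    """
    missing = []
    for raw in seen_authors:
        if raw not in mapping:
            mapping[raw] = ''
            missing.append(raw)
    return missing


def dump_author_map(mapping):
    """Serialize the map back into the same line format."""
    return ''.join(('%s = %s' % (raw, ident)).rstrip() + '\n'
                   for raw, ident in mapping.items())
```

The import itself never stops: unknown authors get whatever the cleanup
produced, and the user can fill in the empty entries afterwards.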
> As for the format for incomplete entries (foo <some@where>), a technical
> guideline should suffice for those that follow guidelines.
fast-import should do that.
Cheers.
--
Felipe Contreras