From: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
To: Elijah Newren via GitGitGadget <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, peff@peff.net, jrnieder@gmail.com,
Elijah Newren <newren@gmail.com>,
Junio C Hamano <gitster@pobox.com>
Subject: Why does fast-import need to check the validity of idents? + Other ident adventures
Date: Wed, 03 Feb 2021 12:57:08 +0100
Message-ID: <87bld8ov9q.fsf@evledraar.gmail.com>
In-Reply-To: <pull.795.v3.git.git.1590870357549.gitgitgadget@gmail.com>
[Originally sent 5 days ago, but seems to have been a victim of the
vger.kernel.org problems at the time, re-sending]
On Sat, May 30 2020, Elijah Newren via GitGitGadget wrote:
> From: Elijah Newren <newren@gmail.com>
Full snipped E-Mail in the archive:
https://lore.kernel.org/git/pull.795.v3.git.git.1590870357549.gitgitgadget@gmail.com/
> There are multiple repositories in the wild with random, invalid
> timezones. Most notably is a commit from rails.git with a timezone of
> "+051800"[1]. A few searches will find other repos with that same
> invalid timezone as well. Further, Peff reports that GitHub relaxed
> their fsck checks in August 2011 to accept any timezone value[2], and
> there have been multiple reports to filter-repo about fast-import
> crashing while trying to import their existing repositories since they
> had timezone values such as "-7349423" and "-43455309"[3].
I've been looking at some of our duplicate logic here after my mktag
series where we now use fsck validation. It had a hardcoded "1400"
offset value, which I see fast-import.c still has.
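For readers unfamiliar with that check, here is a minimal sketch of what a "1400" limit on a raw timezone amounts to. It is only an illustration: the function name tz_looks_valid() and its "strict" flag are hypothetical, and this is not the code in fast-import.c or fsck.c.

```c
#include <ctype.h>
#include <stdlib.h>

/*
 * Hypothetical helper, NOT git's actual code: decide whether a raw
 * timezone string such as "+0100" is acceptable.
 *
 * In both modes the string must be a sign followed only by digits.
 * In strict mode the numeric value must additionally be <= 1400,
 * mirroring the hardcoded limit discussed above.
 */
static int tz_looks_valid(const char *tz, int strict)
{
	char *end;
	unsigned long num;

	if (*tz != '+' && *tz != '-')
		return 0;
	/* Require at least one digit; also keeps strtoul() from
	 * skipping whitespace or accepting a second sign. */
	if (!isdigit((unsigned char)tz[1]))
		return 0;

	num = strtoul(tz + 1, &end, 10);
	if (*end != '\0')
		return 0; /* trailing garbage after the digits */

	if (strict && num > 1400)
		return 0;
	return 1;
}
```

With this sketch, "+0100" passes in either mode, while the "+051800" from rails.git and values like "-7349423" pass only when strict is 0.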
Then in mailmap.c we have parse_name_and_email(), then there's
split_ident_line() in ident.c, and of course fsck_ident(). There's also
record_person_from_buf() in fmt-merge-msg.c, and copy_name() and
copy_email() in ref-filter.c. Maybe handle_from() in mailinfo.c also
counts. Anyway, aside from the last, these are all parsers for
"author/committer" lines in commits one way or another.
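To make the overlap concrete, here is a rough sketch of the job each of those parsers does at its core: splitting "Name <email> timestamp tz" into its parts. The struct ident_parts type and the split_ident_sketch() function are made up for this illustration. They are not the interface of split_ident_line() or of any other function named above, and a real parser handles many more edge cases.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result type: pointers into the original line, no copies. */
struct ident_parts {
	const char *name;
	size_t name_len;
	const char *mail;
	size_t mail_len;
	const char *date; /* "<timestamp> <tz>" remainder, or NULL if absent */
};

/*
 * Split "Name <email> timestamp tz" (illustration only).
 * Returns 0 on success, -1 if '<' or a following '>' is missing.
 */
static int split_ident_sketch(const char *line, struct ident_parts *out)
{
	const char *lt = strchr(line, '<');
	const char *gt = lt ? strchr(lt, '>') : NULL;

	if (!lt || !gt)
		return -1;

	out->name = line;
	out->name_len = (size_t)(lt - line);
	/* Trim the spaces that separate the name from '<'. */
	while (out->name_len && out->name[out->name_len - 1] == ' ')
		out->name_len--;

	out->mail = lt + 1;
	out->mail_len = (size_t)(gt - (lt + 1));

	/* Whatever follows "> " is the date part; it may be absent. */
	out->date = (gt[1] == ' ' && gt[2] != '\0') ? gt + 2 : NULL;
	return 0;
}
```

Note that this sketch does no validation of the date part at all; that is exactly the piece where the callers listed above differ in how strict they are.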
But I was wondering about fast-import.c in particular. I think Elijah's
patch here is obviously a good incremental improvement. But stepping
back a bit: who cares about sort-of-fsck validation in fast-import.c
anyway?
Shouldn't it just pretty much be importing data as-is, and then we could
document "if you don't trust it, run fsck afterwards"?
Or, if it's a use-case people actually care about, then I might see
about unifying some of these parser functions as part of a series I'm
preparing.
Thread overview: 12+ messages
2020-05-28 19:15 [PATCH] fast-import: accept invalid timezones so we can import existing repos Elijah Newren via GitGitGadget
2020-05-28 19:26 ` Jonathan Nieder
2020-05-28 20:40 ` [PATCH v2] fast-import: add new --date-format=raw-permissive format Elijah Newren via GitGitGadget
2020-05-28 23:08 ` Junio C Hamano
2020-05-29 0:20 ` Jonathan Nieder
2020-05-29 6:13 ` Jeff King
2020-05-29 17:19 ` Junio C Hamano
2020-05-30 20:25 ` [PATCH v3] " Elijah Newren via GitGitGadget
2020-05-30 23:13 ` Jeff King
2021-02-03 11:57 ` Ævar Arnfjörð Bjarmason [this message]
2021-02-03 19:20 ` Why does fast-import need to check the validity of idents? + Other ident adventures Junio C Hamano
2021-02-05 15:25 ` Ævar Arnfjörð Bjarmason