From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: Elijah Newren via GitGitGadget <gitgitgadget@gmail.com>,
git@vger.kernel.org, peff@peff.net, jrnieder@gmail.com,
Elijah Newren <newren@gmail.com>
Subject: Re: Why does fast-import need to check the validity of idents? + Other ident adventures
Date: Fri, 05 Feb 2021 16:25:23 +0100
Message-ID: <87k0rmcza4.fsf@evledraar.gmail.com>
In-Reply-To: <xmqq7dnpc610.fsf@gitster.c.googlers.com>

On Wed, Feb 03 2021, Junio C Hamano wrote:
> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>
>> But I was wondering about fast-import.c in particular. I think Elijah's
>> patch here is obviously a good incremental improvement. But stepping
>> back a bit: who cares about sort-of-fsck validation in fast-import.c
>> anyway?
>
> Those who want to notice and verify the procedure they used to
> produce the import data from the original before it is too late?
>
> I.e. data gets imported to Git, victory is declared, and then the old
> SCM gets turned off, and only then is the resulting imported repository
> found not to pass fsck.
>
>> Shouldn't it just pretty much be importing data as-is, and then we could
>> document "if you don't trust it, run fsck afterwards"?
>
> If it is a small import, the distinction does not matter, but for a
> huge import, the procedure to produce the data is likely to be
> mechanical, so even after processing just a very small portion of the
> early part of the data stream, systematic errors would be noticed
> before fast-import wastes time importing too much garbage that would
> need to be discarded after running such an fsck. So in that sense, I
> suspect that there is value in the early validation.

What I was fishing for here is that perhaps, since fast-import was
originally written, this use case of converting primary data in place
on a server has become too obscure to care about, as opposed to doing
the conversion locally and then "git push"-ing it to a remote that has
transfer.fsckObjects enabled.

>> Or, if it's a use-case people actually care about, then I might see
>> about unifying some of these parser functions as part of a series I'm
>> preparing.
>
> I think allowing people to loosen particular checks for fast-import
> (or elsewhere for that matter) is a good idea, and you can do so
> more easily once the existing checking is migrated to your new
> scheme that shares code with the fsck machinery.

...all right, depending on how much of a hassle that is, I might just
add tests for the differences and leave this particular problem to
someone else :)